BMC Bioinformatics 2021

This site accompanies the publication "Using sound to understand protein sequence data: new sonification algorithms for protein sequences and multiple sequence alignments" by Martin et al. (2021) in BMC Bioinformatics.

The paper presents five algorithms for the sonification of protein sequence and multiple sequence alignment (MSA) data. Algorithms I, II, and III sonify protein sequences. Algorithms IV and V sonify multiple sequence alignments.

Some details of the algorithms are included here; for a full explanation of the sonifications, please see the paper.
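For readers new to sonification, the general idea of turning residues into sound can be sketched as below. This is not a reproduction of any of the five algorithms in the paper: the alphabetical residue-to-pitch mapping, the base note, and the names `AMINO_ACIDS`, `BASE_NOTE`, `sequence_to_midi`, and `midi_to_hz` are all assumptions made purely for illustration.

```python
# Illustrative only: an arbitrary mapping from residues to MIDI notes.
# The paper's Algorithms I-III use their own mappings; see the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
BASE_NOTE = 48                        # MIDI note for C3 (arbitrary choice)


def sequence_to_midi(sequence, base_note=BASE_NOTE):
    """Map each residue to a MIDI note number, one semitone per residue type.

    Characters outside the 20 standard residues become None (a rest).
    """
    notes = []
    for residue in sequence.upper():
        index = AMINO_ACIDS.find(residue)
        notes.append(base_note + index if index >= 0 else None)
    return notes


def midi_to_hz(note):
    """Convert a MIDI note number to frequency in Hz (A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


if __name__ == "__main__":
    # First ten residues of the human major prion protein (P04156).
    for residue, note in zip("MANLGCWMLV", sequence_to_midi("MANLGCWMLV")):
        print(residue, note, round(midi_to_hz(note), 1))
```

With a mapping like this, a repeated stretch of residues produces a repeated melodic figure, which is the kind of pattern a listener can pick out by ear.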

Code and documentation are available from GitHub: https://github.com/sonifyed/Protein_Sound

Questionnaire

https://github.com/sonifyed/Protein_Sound/blob/main/Questionnaire.pdf

We used a questionnaire to assess the effectiveness of these methods. Participants were set tasks to complete using two of the sonification algorithms (Algorithms I and IV). You are welcome to open the questionnaire via the link above and try the tasks; it should take about 15 minutes to complete. Download the PDF for clickable links to the sound examples. Please note that we are not collecting responses.

Protein Examples

Major Prion Protein - Homo sapiens (Human)

https://www.uniprot.org/uniprot/P04156

>sp|P04156|PRIO_HUMAN Major prion protein OS=Homo sapiens OX=9606 GN=PRNP PE=1 SV=1
MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQP
HGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGA
VVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV
NITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPV
ILLISFLIFLIVG

The example of the human major prion protein demonstrates the effectiveness of sonification in identifying amino acid repeats (AARs). In the sequence above, for instance, the motif PHGGGWGQ occurs four times in succession.
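The repeat that the sonification makes audible can also be located in the text of the sequence. The sketch below is a plain string search and is not part of the paper's algorithms. The function name `find_tandem_runs` is hypothetical, and the motif PHGGGWGQ is read directly from the FASTA record above.

```python
import re

# Human major prion protein (UniProt P04156), copied from the record above.
PRION = (
    "MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQP"
    "HGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGA"
    "VVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV"
    "NITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPV"
    "ILLISFLIFLIVG"
)


def find_tandem_runs(sequence, motif, min_copies=2):
    """Find runs of at least `min_copies` back-to-back copies of `motif`.

    Returns a list of (start, end, copies) tuples using 0-based,
    half-open coordinates.
    """
    pattern = re.compile("(?:%s){%d,}" % (re.escape(motif), min_copies))
    runs = []
    for match in pattern.finditer(sequence):
        copies = (match.end() - match.start()) // len(motif)
        runs.append((match.start(), match.end(), copies))
    return runs


if __name__ == "__main__":
    for start, end, copies in find_tandem_runs(PRION, "PHGGGWGQ"):
        # Report 1-based inclusive residue positions, as UniProt does.
        print("PHGGGWGQ x%d at residues %d-%d" % (copies, start + 1, end))
```

Run on the sequence above, this reports a single run of four copies spanning residues 60 to 91.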

Transmembrane protein 14C - Homo sapiens (Human)

This example applies the protein sequence algorithms (Algorithms I, II, and III) to a transmembrane protein. Information about the protein is available from UniProt: https://www.uniprot.org/uniprot/Q9P0S9

https://en.wikipedia.org/wiki/Transmembrane_protein

>sp|Q9P0S9|TM14C_HUMAN Transmembrane protein 14C OS=Homo sapiens OX=9606 GN=TMEM14C PE=1 SV=1
MQDTGSVVPLHWFGFGYAALVASGGIIGYVKAGSVPSLAAGLLFGSLAGLGAYQLSQDPR
NVWVFLATSGTLAGIMGMRFYHSGKFMPAGLIAGASLLMVAKVGVSMFNRPH
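The records on this page are in FASTA format with UniProt-style headers. For readers who want to load a sequence into their own scripts, a minimal parsing sketch follows. The function name `parse_fasta_record` and the dictionary keys are hypothetical, and the code assumes a single record with a `db|accession|entry_name` identifier, as in the records shown here.

```python
TM14C_FASTA = """\
>sp|Q9P0S9|TM14C_HUMAN Transmembrane protein 14C OS=Homo sapiens OX=9606 GN=TMEM14C PE=1 SV=1
MQDTGSVVPLHWFGFGYAALVASGGIIGYVKAGSVPSLAAGLLFGSLAGLGAYQLSQDPR
NVWVFLATSGTLAGIMGMRFYHSGKFMPAGLIAGASLLMVAKVGVSMFNRPH
"""


def parse_fasta_record(text):
    """Parse one FASTA record into a dict of header fields and the sequence."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")

    # Header: identifier up to the first space, free-text description after it.
    identifier, _, description = lines[0][1:].partition(" ")
    parts = identifier.split("|")
    accession = parts[1] if len(parts) == 3 else identifier
    entry_name = parts[2] if len(parts) == 3 else ""

    return {
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
        # Sequence lines are wrapped at 60 residues; join them back together.
        "sequence": "".join(lines[1:]),
    }


if __name__ == "__main__":
    record = parse_fasta_record(TM14C_FASTA)
    print(record["accession"], record["entry_name"], len(record["sequence"]))
```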

Algorithm I

Algorithm II

Algorithm III

 

Insulin (globular protein)

https://en.wikipedia.org/wiki/Insulin

https://www.uniprot.org/uniprot/P01308

https://en.wikipedia.org/wiki/Globular_protein

>sp|P01308|INS_HUMAN Insulin OS=Homo sapiens OX=9606 GN=INS PE=1 SV=1
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN

 

Histone H4 (intrinsically disordered protein)

https://en.wikipedia.org/wiki/Histone_H4

https://www.uniprot.org/uniprot/P62805

https://en.wikipedia.org/wiki/Intrinsically_disordered_proteins

https://www.ideal-db.org

>sp|P62805|H4_HUMAN Histone H4 OS=Homo sapiens OX=9606 GN=H4C1 PE=1 SV=2
MSGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLK
VFLENVIRDAVTYTEHAKRKTVTAMDVVYALKRQGRTLYGFGG

 

Multiple Sequence Alignment Examples

For each example below, a gappy and a compact MSA are provided for comparison. The two differ in the gap-opening setting used to build the alignment: gappy MSAs were generated using MUSCLE 3.8.31 (-gapopen -3), and compact MSAs were generated using MUSCLE 3.8.31 (-gapopen 1). Both MSAs in each pair were built from the same unaligned sequences.
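The two settings described above can be expressed as command lines, sketched below. The `-gapopen` values come from the text; the file names are hypothetical, and the `-in` and `-out` options are assumed from the MUSCLE 3.8 command-line interface, so check them against your installed version. The helper `gap_fraction` is also hypothetical and simply gives one number for comparing how gappy two alignments are.

```python
def muscle_command(input_fasta, output_fasta, gap_open):
    """Build (but do not run) a MUSCLE 3.8-style command line.

    Pass the returned list to subprocess.run(...) to execute it,
    assuming a `muscle` binary is on the PATH.
    """
    return [
        "muscle",
        "-in", input_fasta,
        "-out", output_fasta,
        "-gapopen", str(gap_open),
    ]


def gap_fraction(alignment):
    """Fraction of all alignment cells that are gap characters ('-')."""
    total = sum(len(row) for row in alignment)
    gaps = sum(row.count("-") for row in alignment)
    return gaps / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical file names; the same unaligned input feeds both runs.
    print(" ".join(muscle_command("unaligned.fasta", "gappy.fasta", -3)))
    print(" ".join(muscle_command("unaligned.fasta", "compact.fasta", 1)))

    # Toy rows for illustration only, not data from the paper.
    toy_gappy = ["AC--GT", "A-C-GT", "ACT-GT"]
    toy_compact = ["ACGT", "ACGT", "ACTG"]
    print(gap_fraction(toy_gappy), gap_fraction(toy_compact))
```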

GAPDH

https://en.wikipedia.org/wiki/Glyceraldehyde_3-phosphate_dehydrogenase

Compact Visualisation (AliView)

Gappy Visualisation (AliView)

Algorithm IV

 

Algorithm V

Insulin

https://en.wikipedia.org/wiki/Insulin

Compact Visualisation (AliView)

Gappy Visualisation (AliView)

Algorithm IV

 

Algorithm V