BMC Bioinformatics 2021
This site accompanies the publication "Using sound to understand protein sequence data: new sonification algorithms for protein sequences and multiple sequence alignments" by Martin et al. (2021) in BMC Bioinformatics.
The paper presents five algorithms for sonifying protein sequence and multiple sequence alignment (MSA) data. Algorithms I, II, and III sonify protein sequences. Algorithms IV and V sonify multiple sequence alignments.
Some details of the algorithms are given here. For a full explanation of the sonifications, please see the paper.
Code and documentation are available on GitHub: https://github.com/sonifyed/Protein_Sound
Questionnaire
https://github.com/sonifyed/Protein_Sound/blob/main/Questionnaire.pdf
We used a questionnaire to evaluate the effectiveness of these methods. Participants completed tasks using two of the sonification algorithms (Algorithms I and IV). You are welcome to open the questionnaire at the link above and try the tasks; it should take about 15 minutes. Download the PDF to get clickable links to the sound examples. Please note that we are not collecting responses.
Protein Examples
Major Prion Protein - Homo sapiens (Human)
https://www.uniprot.org/uniprot/P04156
>sp|P04156|PRIO_HUMAN Major prion protein OS=Homo sapiens OX=9606 GN=PRNP PE=1 SV=1
MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQP
HGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGA
VVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV
NITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPV
ILLISFLIFLIVG
The human major prion protein example shows how sonification can help identify amino acid repeats (AARs). For instance, the sequence contains four consecutive copies of the glycine-rich octapeptide PHGGGWGQ near its N-terminus.
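As a point of comparison with listening for repeats, the octapeptide run can also be located by a simple text search. The sketch below is a hypothetical helper and is not part of the Protein_Sound code base. It assumes you already know the motif you are looking for and only reports exact, back-to-back copies.

```python
# Hypothetical helper, not part of the Protein_Sound code base:
# find the longest run of exact, back-to-back copies of a known motif.

# Human major prion protein (UniProt P04156), copied from the record above.
PRNP_HUMAN = (
    "MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQP"
    "HGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGA"
    "VVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV"
    "NITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPV"
    "ILLISFLIFLIVG"
)


def longest_tandem_run(seq: str, motif: str) -> tuple[int, int]:
    """Return (start, copies) for the longest tandem run of motif in seq.

    start is a 0-based index (-1 if the motif never occurs);
    copies is the number of consecutive, non-overlapping copies.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    k = len(motif)
    best_start, best_copies = -1, 0
    for i in range(len(seq) - k + 1):
        copies = 0
        # Extend the run while the next k residues repeat the motif exactly.
        while seq.startswith(motif, i + copies * k):
            copies += 1
        if copies > best_copies:
            best_start, best_copies = i, copies
    return best_start, best_copies


if __name__ == "__main__":
    start, copies = longest_tandem_run(PRNP_HUMAN, "PHGGGWGQ")
    print(f"{copies} copies starting at residue {start + 1}")
```

Running it reports 4 copies starting at residue 60. Because the search only finds exact copies of a motif chosen in advance, it will not find degenerate repeats or repeats you have not thought to look for.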
Transmembrane protein 14C - Homo sapiens (Human)
This example applies the protein sequence algorithms (Algorithms I, II, and III) to a transmembrane protein. Information about the protein is available on the UniProt website: https://www.uniprot.org/uniprot/Q9P0S9
https://en.wikipedia.org/wiki/Transmembrane_protein
>sp|Q9P0S9|TM14C_HUMAN Transmembrane protein 14C OS=Homo sapiens OX=9606 GN=TMEM14C PE=1 SV=1
MQDTGSVVPLHWFGFGYAALVASGGIIGYVKAGSVPSLAAGLLFGSLAGLGAYQLSQDPR
NVWVFLATSGTLAGIMGMRFYHSGKFMPAGLIAGASLLMVAKVGVSMFNRPH
Algorithm I
Algorithm II
Algorithm III
Insulin (globular protein)
https://en.wikipedia.org/wiki/Insulin
https://www.uniprot.org/uniprot/P01308
https://en.wikipedia.org/wiki/Globular_protein
>sp|P01308|INS_HUMAN Insulin OS=Homo sapiens OX=9606 GN=INS PE=1 SV=1
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
Histone (intrinsically disordered protein)
https://en.wikipedia.org/wiki/Histone_H4
https://www.uniprot.org/uniprot/P62805
https://en.wikipedia.org/wiki/Intrinsically_disordered_proteins
>sp|P62805|H4_HUMAN Histone H4 OS=Homo sapiens OX=9606 GN=H4C1 PE=1 SV=2
MSGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLK
VFLENVIRDAVTYTEHAKRKTVTAMDVVYALKRQGRTLYGFGG
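The protein examples on this page are given in FASTA format: a ">" header line followed by one or more sequence lines. If you want to load them into your own scripts, a minimal reader is sketched below. It is an illustrative assumption and not the parser used in the Protein_Sound repository. It keys each record by the first word of its header and joins the wrapped sequence lines.

```python
# Minimal FASTA reader (illustrative sketch, not the Protein_Sound parser).

def read_fasta(text: str) -> dict[str, str]:
    """Map each record's ID (first word after '>') to its full sequence."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close previous record
            fields = line[1:].split(maxsplit=1)
            header, chunks = (fields[0] if fields else ""), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)  # close the final record
    return records


# Histone H4 record as shown above.
EXAMPLE = """>sp|P62805|H4_HUMAN Histone H4 OS=Homo sapiens OX=9606 GN=H4C1 PE=1 SV=2
MSGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLK
VFLENVIRDAVTYTEHAKRKTVTAMDVVYALKRQGRTLYGFGG
"""

if __name__ == "__main__":
    for ident, seq in read_fasta(EXAMPLE).items():
        print(ident, len(seq))
```

For anything beyond quick experiments, Biopython's `Bio.SeqIO.parse(handle, "fasta")` is the usual, more robust choice.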
Multiple Sequence Alignment Examples
For each example below, a gappy and a compact MSA are provided for comparison. They differ only in the gap-opening penalty used to build the alignment. Gappy MSAs were generated with MUSCLE 3.8.31 (-gapopen -3), and compact MSAs with MUSCLE 3.8.31 (-gapopen 1). Each gappy/compact pair was built from the same unaligned input sequences.
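A simple way to quantify the difference between a gappy and a compact MSA is the proportion of gap characters in the alignment. The sketch below assumes the alignment is already loaded as a list of equal-length strings and that gaps are written as "-" or ".". The two toy alignments are invented for illustration and are not the GAPDH or insulin alignments shown here.

```python
# Illustrative sketch: fraction of alignment cells that are gaps.

def gap_fraction(alignment: list[str], gap_chars: str = "-.") -> float:
    """Return gaps / (rows * columns) for an alignment of equal-length rows."""
    if not alignment:
        raise ValueError("alignment is empty")
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("aligned sequences must all have the same length")
    total = width * len(alignment)
    gaps = sum(row.count(c) for row in alignment for c in gap_chars)
    return gaps / total if total else 0.0


# Toy data (hypothetical): the same two sequences, MKTAY and MKAYQ,
# aligned compactly and with extra gaps.
COMPACT = ["MKTAY-", "MK-AYQ"]
GAPPY = ["MK-TAY-", "M-K-AYQ"]

if __name__ == "__main__":
    print(f"compact: {gap_fraction(COMPACT):.3f}")
    print(f"gappy:   {gap_fraction(GAPPY):.3f}")
```

A single global number like this hides where the gaps fall. Locating gap-rich regions is the kind of information the MSA sonifications are intended to convey by ear.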
GAPDH
https://en.wikipedia.org/wiki/Glyceraldehyde_3-phosphate_dehydrogenase
Compact Visualisation (AliView)
Algorithm IV
Algorithm V
Insulin
https://en.wikipedia.org/wiki/Insulin
Compact Visualisation (AliView)
Algorithm IV
Algorithm V