CBRG - Computational Biochemistry Research Group

Would you eat that cake? - Exercise Week 12

Peter von Rohr

The topic of this exercise is to identify an unknown protein based on molecular weights.

Let us assume that you bought some cake. Before eating it, you analyze the cake chemically. The analysis shows that the cake contains the following unknown protein. Of course, you want to identify this protein before eating the cake, just to make sure you are not eating something strange.

In reality you would not know the sequence of the unknown protein. All you would know about it are the weights of its fragments, measured by mass spectrometry after digestion. For the sake of this exercise, however, let us also simulate the chemical analysis. For this simulation we need to know the protein sequence, but we still pretend that we do not know the name of the protein.

So here is the sequence of the 'unknown' protein:

ProtSeq := 'MKLEVLPLDQKTFSAYGDVIETQERDFFHINNGLVERYHDLAKVEVLEQDRTLISINRAQPAAMPIVVHELERHPLGTQAFVPMNGEAFVVIVALGDDKPELSTLRAFISNGRQGVNYHRNVWHHPLFAWQTVTDFLTVDRGGSDNCDVESIPTHELCFA':

1. Digest the unknown protein with trypsin

Just as a reminder, trypsin cleaves a protein after every arginine (R) and lysine (K) residue that is not followed by a proline (P).

Write a Darwin procedure that takes a protein sequence as an argument and returns a list of strings corresponding to the fragments obtained by trypsin digestion.
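The cleavage rule can be illustrated with a short sketch. It is written in Python rather than Darwin, so it only shows the logic your Darwin procedure should implement; the function name `trypsin_digest` is hypothetical and not part of Darwin.

```python
def trypsin_digest(seq):
    """Split a protein sequence after every K or R that is not followed by P.

    Illustrative Python sketch of the cleavage rule (not Darwin code).
    """
    fragments = []
    start = 0
    for i, aa in enumerate(seq):
        is_last = i + 1 == len(seq)
        # cleave after K or R, except when the next residue is a proline
        if aa in "KR" and (is_last or seq[i + 1] != "P"):
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        # C-terminal remainder that does not end in a cleavage site
        fragments.append(seq[start:])
    return fragments
```

For example, `trypsin_digest("AKPGRKLM")` returns `["AKPGR", "K", "LM"]`: the K in position 2 is not cleaved because a P follows it. Applied to the 'unknown' protein above, the sketch yields 13 fragments, starting with `MK`.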

In the second step, we have to determine the weights of the digested fragments. This can be done using Darwin's procedure GetMolWeight.

The following two statements show how to get help on the procedure and how to call it for a list of digested fragments, here called DigProt.

?GetMolWeight
ProtWeight := GetMolWeight(DigProt);
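GetMolWeight does this computation for you. The Python sketch below shows roughly what such a computation involves: the peptide mass is the sum of the residue masses plus one water molecule for the free termini. It assumes monoisotopic residue masses. Darwin's GetMolWeight may use a different mass table, such as average masses, so its numbers can differ slightly. The names `peptide_mass` and `fragment_weights` are hypothetical.

```python
# Monoisotopic residue masses in Daltons (amino acid minus one H2O).
# Assumption: GetMolWeight in Darwin may use a different (e.g. average) table.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O restores the free N- and C-termini


def peptide_mass(peptide):
    """Mass of a linear peptide; raises KeyError on non-standard letters."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_weights(fragments):
    """One weight per fragment, analogous to GetMolWeight(DigProt)."""
    return [peptide_mass(f) for f in fragments]
```

For instance, `peptide_mass("G")` gives about 75.032 Da, the monoisotopic mass of free glycine.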

2. Digest all known proteins of the database

Next, we have to load a database and digest all the proteins in it. For the purpose of this exercise, we do not use the standard SwissProt database. Instead, you can download a smaller test database. If you work in the student computer room, you can load this database with the following command:

ReadDb('/home/darwin/DB/testdb');

Digesting the proteins in the database yields a list of fragments for each protein. For each of these fragments we again compute the weight with the Darwin procedure GetMolWeight.

3. Find the best fit for your digestion with a digested protein in the database

Next, compute how far each fragment weight of the unknown protein deviates from the closest fragment weight of each protein in the database. Remember that the lists of weights per protein are not ordered, so a simple one-to-one comparison will not work in general.
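One way to handle unordered weight lists is to match each query weight to its nearest neighbour in the database protein's list. The Python sketch below sorts the reference weights once and uses binary search to find the nearest neighbour. It then picks the protein with the smallest summed deviation. The names `total_deviation` and `best_match`, and the representation of the database as a dict from protein name to weight list, are hypothetical assumptions for illustration. This is the naive lowest-deviation criterion discussed below, not a probabilistic score.

```python
from bisect import bisect_left


def total_deviation(query_weights, protein_weights):
    """Sum, over the query fragments, of the distance to the nearest fragment
    weight of one database protein. The lists need not be ordered or of equal
    length."""
    ref = sorted(protein_weights)
    if not ref:
        return float("inf")  # a protein without fragments cannot match
    total = 0.0
    for w in query_weights:
        i = bisect_left(ref, w)
        # the nearest neighbour is either ref[i-1] or ref[i]
        neighbours = ref[max(i - 1, 0):i + 1]
        total += min(abs(w - r) for r in neighbours)
    return total


def best_match(query_weights, database):
    """database: hypothetical dict mapping protein name -> fragment weights."""
    return min(database,
               key=lambda name: total_deviation(query_weights, database[name]))
```

For example, `best_match([100.0, 205.0], {"A": [99.0, 210.0], "B": [100.0, 205.5]})` returns `"B"`, whose summed deviation is 0.5 compared with 6.0 for `"A"`.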

Based on these deviations, one can compute the probability that such a fragment match occurs just by chance. In this exercise, we will stick with the approximation of choosing the match with the lowest deviation, but this is flawed (why?) and should not be done in real-world applications. See the notes of last week's lecture and the biorecipes on mass spectra analysis for details on how to compute those probabilities.

Compare your result with the output of the Darwin procedure SearchMassDb. It not only performs the computations you did in this exercise but also compares the probability of the observed match with the probability of a random match, which is the correct way to do this. Here is an example of how to call this procedure:

res := SearchMassDb( Protein(DigestionWeights('Trypsin', op(ProtWeight))), 5);
print(res);

And now, after all this, would you still eat that cake?

© 2012 ETH Zurich | Imprint | Disclaimer | 8 December 2011