printlogo
http://www.ethz.ch/index_EN
CBRG - Computational Biochemistry Research Group
 
print
  

ML Trees - Solution

This is the solution to the ML tree exercise.

First, we need to set up the parameters and create the Dayhoff matrices. We will need the amino acid frequencies AF and the logPAM1 matrix created by CreateDayMatrices().

b1 := 10;
b2 := 15;
b3 := 15;
b4 := 20;
b5 := 10;
A := 'TGEELPNDLQTIANELFGTDQPFTNFNGIGRFATPGFSPFGAALDNMSVGGLNHIAIKHSGEIEDSVRSN':
B := 'TGEELPFDIETIANELFGTDQPFTNFNGLGQMATPGFNPFGAAADNLATGGVNHIAIEHSGEIEPSVRSN':
C := 'TGEELPNDLQTISEEKFGTDQPFTNFNGIGRFAAPGFNPFGAALDNLSSGGTNHVVIEHSGDIQQSVRSN':
D := 'TGEELPNDAQAITEQLFGTDKPFTNFNGIGRFATPGPDPMGAELDNLEVGGVNHIAVQSTGEIEPSVRSN':   
CreateDayMatrices();

Step 1: Function to compute the likelihood

To compute the likelihood, we need the mutation matrices for all 5 distances on the tree. Then we loop over all positions of the sequences (1 to N). At each position we have a double-loop to consider all possible combinations of amino acids at nodes X and Y. Since the multiplications in this double loop are executed 400 times for each position, we should try to make it a bit more efficient. This is achieved by precomputign the terms that depend only on X and those that depend only on Y. The array PX[Z] contains the partial product for a Z at node X. PY[Z] is almost the same, but for node Y.

ComputeLogL := proc(S1, S2, S3 ,S4, d1, d2, d3 ,d4 ,d5)
N := length(S1);
M1 := exp(d1*logPAM1);
M2 := exp(d2*logPAM1);
M3 := exp(d3*logPAM1);
M4 := exp(d4*logPAM1);
M5 := exp(d5*logPAM1);
logL := 0;
PX := CreateArray(1..20):
PY := CreateArray(1..20):
for i to N do
    # precompute the parts that depend only on X or Y
    for Z to 20 do
        PX[Z] := AF[Z]*M1[AToInt(S1[i]),Z]*M2[AToInt(S2[i]),Z];
        PY[Z] := M3[AToInt(S3[i]),Z]*M4[AToInt(S4[i]),Z];
    od:
    Li := 0:
    for X to 20 do
        for Y to 20 do
            Li := Li + PX[X]*M5[Y,X]*PY[Y];
        od:
    od:
    logL := logL+log(Li);
od:
return(logL);
end:

Step 2: Maximizing the middle branch

The simple (but inefficient) solution to optimize the middle branch, is by trying all possible branch lengths within a reasonable range, for example between 0 and 15 with steps of 0.1:

maxL := -DBL_MAX;
for d5 from 0 to 15 by 0.1 do
   ll := ComputeLogL(A,B,C,D,b1,b2,b3,b4,d5);
   if ll>maxL then maxL := ll; best5 := d5 fi;
od:

The result should be an optimal middle branch of 1.9 with a log-likelihood of -400.1191.

 

Wichtiger Hinweis:
Diese Website wird in älteren Versionen von Netscape ohne graphische Elemente dargestellt. Die Funktionalität der Website ist aber trotzdem gewährleistet. Wenn Sie diese Website regelmässig benutzen, empfehlen wir Ihnen, auf Ihrem Computer einen aktuellen Browser zu installieren. Weitere Informationen finden Sie auf
folgender Seite.

Important Note:
The content in this site is accessible to any browser or Internet device, however, some graphics will display correctly only in the newer versions of Netscape. To get the most out of our site we suggest you upgrade to a newer browser.
More information

© 2012 ETH Zurich | Imprint | Disclaimer | 12 November 2008
top