printlogo
http://www.ethz.ch/index_EN
CBRG - Computational Biochemistry Research Group
 
print
  

Distance Tree Solution

Step 1 - Finding the sequences

By executing these commands we get N=13 entries that contain ("OPSG_" OR "OPSR_") AND "Mammalia", in other words all red and green opsins in mammalia.

ReadDb('/home/darwin/DB/SwissProt.db');
entries := [SearchDb({'OPSG_','OPSR_'},'Mammalia')];
N := length(entries);

Step 2 - All vs. all alignments

A double loop is needed to align each sequence against each other sequence. We save the PAM distances and variances in matrices called D and V.

D := CreateArray(1..N,1..N):
V := CreateArray(1..N,1..N):

CreateDayMatrices();
for a to N do for b from a+1 to N do
    al := Align(entries[a],entries[b],DMS);
    D[a,b] := D[b,a] := al[PamDistance];
    V[a,b] := V[b,a] := al[PamVariance];
od od:

Step 3 - Creating a least-squares tree

Given D and V, the MinSquareTree function of Darwin can be used. It also needs a list of labels, we use the ID tag for that purpose:

labels := CreateArray(1..N):
for i to N do
    labels[i] := SearchTag('ID',entries[i]);
od:
tree := MinSquareTree(D,V,labels);

T := Tree_matrix(tree);

The DrawTree commands has many options to draw the tree. Here the 'Unrooted' option is used to make an unrooted tree (since we have no information about the position of the root), and 'Legend' removes the circles at the leaves.

DrawTree(tree,Unrooted,Legend);
View();
Distance tree of 13 red and green mammalian opsins
Distance tree of 13 red and green mammalian opsins

Analysis - Least-squares fit

The distance on the tree (T) and the estimated distances are not the same:

T := Tree_matrix(tree):
print(T-D);

This is because there are more pairwise distances than free parameters (branches on the tree) Only in very ideal situations would the distances perfectly fit on a tree. The least-squares tree minimizes the squared differences between the pairwise distances (D) and the tree distances(T), divided by the variance of the estimates (V):

ssq := sum(sum((T[i,j]-D[i,j])^2/V[i,j],j=i+1..N),i=1..N);

To normalize this sum of squared errors, it is divided by the degrees of freedom, the difference between the number of given equations (N*(N-1)/2 estimated distances) and the number of free parameters (2*N-3 branch lengths):

df := N*(N-1)/2-(2*N-3);
ssq := ssq/df;

The MinSquareTree function stores this value in the MST_Qual variable:

MST_Qual;
 

Wichtiger Hinweis:
Diese Website wird in älteren Versionen von Netscape ohne graphische Elemente dargestellt. Die Funktionalität der Website ist aber trotzdem gewährleistet. Wenn Sie diese Website regelmässig benutzen, empfehlen wir Ihnen, auf Ihrem Computer einen aktuellen Browser zu installieren. Weitere Informationen finden Sie auf
folgender Seite.

Important Note:
The content in this site is accessible to any browser or Internet device, however, some graphics will display correctly only in the newer versions of Netscape. To get the most out of our site we suggest you upgrade to a newer browser.
More information

© 2012 ETH Zurich | Imprint | Disclaimer | 7 November 2011
top