|
|||||||||||
Open Positions
Research
Step 1 - Finding the sequences
By executing these commands we get N=13 entries that contain ("OPSG_" OR "OPSR_") AND "Mammalia", in other words all red and green opsins in mammalia.
ReadDb('/home/darwin/DB/SwissProt.db');
entries := [SearchDb({'OPSG_','OPSR_'},'Mammalia')];
N := length(entries);
Step 2 - All vs. all alignments
A double loop is needed to align each sequence against each other sequence. We save the PAM distances and variances in matrices called D and V.
D := CreateArray(1..N,1..N):
V := CreateArray(1..N,1..N):
CreateDayMatrices();
for a to N do for b from a+1 to N do
al := Align(entries[a],entries[b],DMS);
D[a,b] := D[b,a] := al[PamDistance];
V[a,b] := V[b,a] := al[PamVariance];
od od:
Step 3 - Creating a least-squares tree
Given D and V, the MinSquareTree function of Darwin can be used. It also needs a list of labels, we use the ID tag for that purpose:
labels := CreateArray(1..N):
for i to N do
labels[i] := SearchTag('ID',entries[i]);
od:
tree := MinSquareTree(D,V,labels);
T := Tree_matrix(tree);
The DrawTree commands has many options to draw the tree. Here the 'Unrooted' option is used to make an unrooted tree (since we have no information about the position of the root), and 'Legend' removes the circles at the leaves.
DrawTree(tree,Unrooted,Legend); View();

Analysis - Least-squares fit
The distance on the tree (T) and the estimated distances are not the same:
T := Tree_matrix(tree): print(T-D);
This is because there are more pairwise distances than free parameters (branches on the tree) Only in very ideal situations would the distances perfectly fit on a tree. The least-squares tree minimizes the squared differences between the pairwise distances (D) and the tree distances(T), divided by the variance of the estimates (V):
ssq := sum(sum((T[i,j]-D[i,j])^2/V[i,j],j=i+1..N),i=1..N);
To normalize this sum of squared errors, it is divided by the degrees of freedom, the difference between the number of given equations (N*(N-1)/2 estimated distances) and the number of free parameters (2*N-3 branch lengths):
df := N*(N-1)/2-(2*N-3); ssq := ssq/df;
The MinSquareTree function stores this value in the MST_Qual variable:
MST_Qual;
Wichtiger Hinweis:
Diese Website wird in älteren Versionen von Netscape ohne
graphische Elemente dargestellt. Die Funktionalität der
Website ist aber trotzdem gewährleistet. Wenn Sie diese
Website regelmässig benutzen, empfehlen wir Ihnen, auf
Ihrem Computer einen aktuellen Browser zu installieren. Weitere
Informationen finden Sie auf
folgender
Seite.
Important Note:
The content in this site is accessible to any browser or
Internet device, however, some graphics will display correctly
only in the newer versions of Netscape. To get the most out of
our site we suggest you upgrade to a newer browser.
More
information