CBRG - Computational Biochemistry Research Group
 
  

Synthetic Evolution Exercise - Week 11

In this exercise we investigate the effects of model violations on two different types of tree-building algorithms. For this purpose we use a synthetic evolution framework to create evolutionary scenarios. The framework takes a tree and evolves a set of random start sequences along its branches. We then use the output sequences to re-estimate the tree and check how many times the correct tree was reconstructed.

IMPORTANT NOTE: The method to compute parsimony trees is currently unstable on machines running OS X. If you use your own Mac, use an ssh connection to one of the lab machines (stud{nr}-h56.inf.ethz.ch) to run your code.

We use the simulation framework ALF, which was developed in Darwin by our group. ALF is not included in the general Darwin installation and has to be installed separately.

First, download a copy of the stand-alone version from this website: ALF service. Note that you don't have to enter any personal information in the popup window; just click the download button.

Second, extract the downloaded archive (e.g. with tar xvzf ALF_standalone.tar.gz). Don't run the installer script, since we only use ALF for this single exercise. To run ALF, change to its directory and call bin/alfsim parameterfile.drw.

We will use the following tree for the simulations:

Synthetic Evolution Tree

In newick format, the tree can be represented like this:

((S1:11,S3:41):1,(S2:13,(S4:20,S5:13):31):2):0;
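To get a feeling for the shape of this tree, it can help to sum the branch lengths from the root to each leaf. The following is a minimal sketch in Python (not Darwin, and not part of ALF); the function names parse_newick and root_to_leaf are made up for this illustration, and the small parser only handles simple Newick strings like the one above.

```python
NEWICK = "((S1:11,S3:41):1,(S2:13,(S4:20,S5:13):31):2):0;"


def parse_newick(text):
    """Parse a simple Newick string into nested (name, length, children) tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
        # Optional node label (empty for internal nodes in this tree).
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        # Optional branch length after ":".
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()


def root_to_leaf(tree, depth=0.0, out=None):
    """Return a dict mapping each leaf name to its summed distance from the root."""
    out = {} if out is None else out
    name, length, children = tree
    depth += length
    if not children:
        out[name] = depth
    for child in children:
        root_to_leaf(child, depth, out)
    return out


dists = root_to_leaf(parse_newick(NEWICK))
for leaf in sorted(dists):
    print(leaf, dists[leaf])
```

The output shows that S3, S4 and S5 sit much further from the root than S1 and S2, while the internal branches next to the root are very short.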

Save the line above in a text file called "synEvol_tree.newick" in the ALF folder. Additionally, the parameter file for the simulation is available here.

The parameters you will have to change for the exercise are at the top of the file. You might want to change the name of the simulation, the working directory and the path to the tree file. Run the simulation with all other options unchanged: bin/alfsim synEvol_parameters.drw

The results will be stored in a subfolder structure in the folder you assigned in the parameter file. In particular, the simulated sequences are stored as FASTA files in the subfolder MSA. To read them into Darwin, you can use the function ReadFasta().

Next we will re-estimate the tree topology from each set of simulated orthologous sequences. You will probably want to implement the following as a function, because we will re-use the code. We want to compare the performance of least squares and parsimony trees. Use the Darwin function PhylogeneticTree() for this purpose. Make sure that the labels of the estimated tree correspond to those of the original one. This should give you two lists of trees (one with parsimony trees and one with least squares trees). Now you can use the function RobinsonFoulds to compute the distance between these trees and the original one:

dist := RobinsonFoulds([tree, op(lstrees)]):

dist is a distance matrix, but we are only interested in its first row, which contains the Robinson-Foulds distances between the original tree and each of the reconstructed trees (the first entry is the distance of the original tree to itself and should be skipped). To obtain the number of correctly re-estimated trees, count how many of the reconstructed trees have distance 0 to the original tree.
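To illustrate what this distance measures, here is a minimal sketch in Python (not Darwin). It assumes the common definition of the Robinson-Foulds distance as the number of non-trivial splits found in only one of two unrooted trees; Darwin's RobinsonFoulds may scale the values differently. Trees are written as nested tuples of leaf labels, and all names (leaves, splits, rf, the example trees) are made up for this illustration.

```python
def leaves(tree):
    """Return the set of leaf labels below a node (nested tuples of strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of a tree, ignoring where the root is."""
    all_taxa = leaves(tree)
    ref = min(all_taxa)  # reference taxon used to pick a canonical side
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        other = all_taxa - side
        if len(side) > 1 and len(other) > 1:
            # Store the side that does not contain the reference taxon.
            found.add(other if ref in side else side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def rf(a, b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(a) ^ splits(b))


true_tree = (("S1", "S3"), ("S2", ("S4", "S5")))
estimated = [
    ("S1", ("S3", ("S2", ("S4", "S5")))),    # same topology, different rooting
    (("S1", "S2"), ("S3", ("S4", "S5"))),    # one split wrong
    (("S1", "S4"), ("S2", ("S3", "S5"))),    # both splits wrong
    (("S1", "S3"), ("S2", ("S4", "S5"))),    # identical
]

# Mirror the first row of the distance matrix: original tree first.
trees = [true_tree] + estimated
first_row = [rf(trees[0], t) for t in trees]

# Skip the first entry (original tree vs. itself) when counting.
n_correct = sum(1 for d in first_row[1:] if d == 0)
print(first_row, n_correct)
```

Because splits are compared rather than rooted subtrees, a reconstructed tree that differs from the original only in the position of the root still counts as correct.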

Did the reconstruction of the tree work for all orthologous groups?

Now we want to see how a perturbation of the model affects the reconstruction by modifying the fraction of invariable sites (the variable motifFreq in the parameter file). Run the simulation again for 10, 30, 50, 70 and 90 percent invariable sites and for each run re-estimate the gene trees.

Finally, plot the number of correct trees vs. percent invariable sites. You could use a function like this one:

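# d: one row per simulation run; column 1 holds the x value (percent
#    invariable sites), columns 2.. hold the number of correct trees per method.
# legend: one label per method, in the same order as columns 2.. of d.
DrawResult := proc(d:matrix(numeric), legend:list(string) )
    colors := [[1,0,0],[0,0,1]]:   # red and blue, i.e. at most two methods
    cmds := []:
    for j from 2 to length(d[1]) do
        # connect consecutive data points of column j with line segments
        cmds := append(cmds,
           seq( LINE(d[i,1],d[i,j],d[i+1,1],d[i+1,j], 'color'=colors[j-1]), i=1..length(d)-1));
        # legend entry in the color of the corresponding curve
        cmds := append( cmds, LTEXT(0,40-5*j, legend[j-1],'color'=colors[j-1]) ):
    od:
    DrawPlot(cmds,axis):
end: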

You can also investigate how other parameters affect tree reconstruction, e.g. by varying the sequence length (change the parameters minGeneLength and gammaLengthDist) or by introducing site heterogeneity (change the parameter areas). You may also want to look at the sample parameter file alf-params, which contains even more parameters than the example we used in this exercise.

 

© 2012 ETH Zurich | 29 November 2011