CBRG - Computational Biochemistry Research Group

Progressive MSA Reconstruction using Consensus Sequences

In this exercise you will write a program in Darwin that reconstructs a progressive multiple sequence alignment (MSA). A progressive MSA method traverses a guide tree from the leaves up to the root and reconstructs an ancestral sequence for every inner node of the tree. For simplicity, we propose using the consensus sequence as the reconstructed ancestral sequence.

You are given the following 5 homologous sequences:

s1 := 'RYEDGVGNVIMQGRVMLAWRSVIALNTEKFLDVGPEDEQALEIKRNGTKAQLHLLVLMTVLAADNPAEENRKAVEMYAFLTEFDFRADGMVRVRGNLFPFMGLGLKR';
s2 := 'RYEDSIENLLLQSRLVLAYRTAVALNTEKFLGVPEDDEKELQIQRNGTKAQLKDWVLMHVLAAPAEENREAFLTPLDFHADAMVEVRGLIKQGLHR';
s3 := 'RYKRPVVNATLQSRLILAFRSVVALNTEEFLEVRPDDEAGEIERNGTKAELNLTVQMHVLAAPLEENRQAFLTPVDFHAAGMILVRGNGFPLIKLGLKR';
s4 := 'DPADGLDNKREENRVDLCCVPVVALNTPQFVNIGPDPAALTIKDNGSHVALEVLRWSLAAPTEENFQRFDLTKDFAVCMVAVRMNEFPKPGKDLLR';
s5 := 'DEADGLDNNREENRVKLCCVPVYAINTPQFIRAGPDSAALAIKANGSSGHVLLQVLRFSLAAPSEENFLRFDLTRDFAICMVAMRGNEFPKPGSDLKR';

The figure below shows the associated guide tree, including branch lengths.

[Figure: Guide Tree]

Our progressive alignment method can be written in pseudocode as

AlignProgressive := proc(leftMSA:MSA, rightMSA:MSA, evolDist:numeric)
  cons1 := ConsensusSequence(leftMSA);
  cons2 := ConsensusSequence(rightMSA);
  consAlignment := OptimalGlobalAlignment(cons1, cons2, evolDist);
  return ( MergeMSA(leftMSA, rightMSA, consAlignment) );
end:

meaning that computing the MSA at any inner node of the guide tree involves the following 3 steps:

  1. Compute a consensus ancestral sequence for each of the left and right sub-MSAs.
  2. Compute an optimal global alignment of the two consensus sequences at the given evolutionary distance.
  3. Merge the left and right sub-MSAs based on the alignment of the consensus sequences.

Task 1

For our progressive alignment method to work, we need several ingredients:

First, we need a method to convert each original sequence into an MSA containing only that one sequence. To represent an MSA, we suggest using an N x M matrix, where N is the number of sequences in the MSA and M is its length (number of columns). Hence the method SequenceToMSA should take a sequence as input and return a 1 x length(sequence) matrix containing the corresponding residues.
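A minimal sketch of SequenceToMSA, written in Python rather than Darwin, so the name sequence_to_msa and the list-of-rows layout are assumptions of this sketch: the matrix is stored as a list of N rows, each holding M characters.

```python
def sequence_to_msa(seq):
    """Return the 1 x len(seq) matrix (one row, one column per residue) for seq."""
    return [list(seq)]


msa = sequence_to_msa("RYEDG")
print(len(msa), len(msa[0]))  # 1 5
```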

Second, write a method ConsensusSequence that computes the consensus sequence of a given MSA, i.e. a sequence whose i-th character is the most frequent character in column i. In case of ties, any of the tied characters is fine. (Hint 1: the sort function might simplify this task. Hint 2: think carefully about the role of gaps in the consensus sequence.)
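One possible answer to the gap hint, sketched in Python (the name consensus_sequence and the gap symbol '-' are assumptions): gaps are ignored when counting, so every column contributes exactly one consensus character. The consensus is then exactly as long as the MSA, which keeps the later merge a simple column-by-column walk. Sorting by descending count and then by character plays the role of the hinted sort and breaks ties deterministically.

```python
from collections import Counter

GAP = "-"  # gap symbol assumed for this sketch


def consensus_sequence(msa):
    """Most frequent non-gap character of every column of msa (a list of rows)."""
    consensus = []
    for column in zip(*msa):
        # A column made only of gaps cannot arise when merging gap-free input
        # sequences, so counts is never empty here.
        counts = Counter(c for c in column if c != GAP)
        # Rank by descending count, then alphabetically, and keep the winner.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        consensus.append(ranked[0][0])
    return "".join(consensus)


print(consensus_sequence([list("AC-"), list("AGT"), list("-GT")]))  # AGT
```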

The optimal global alignment of the consensus sequences can be computed with Darwin's Align function, using DayMatrix(evolDist) as the scoring matrix. Remember to call CreateDayMatrices() beforehand.
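Darwin's Align and DayMatrix are not reproduced here. As a stand-in, this Python sketch computes a Needleman-Wunsch global alignment with hypothetical scores (match 2, mismatch -1, linear gap -2). Unlike DayMatrix(evolDist), these scores do not depend on the evolutionary distance. The function returns the two gapped strings, analogous to the aligned sequences obtained with DynProgStrings.

```python
GAP = "-"  # gap symbol assumed for this sketch


def global_align(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # S[i][j] = best score for aligning a[:i] with b[:j]
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("ACGT", "ACT"))  # ('ACGT', 'AC-T')
```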

Third, we need to write a method MergeMSA that merges two MSAs based on a global pairwise alignment of their two consensus sequences. To get the aligned sequences from a Darwin Alignment, use DynProgStrings. Based on these sequences, you can merge the two sub-MSAs. Hint: ask yourself, first, how a column of a sub-MSA corresponds to a character of its aligned consensus sequence and, second, which merging scenarios are possible.
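A Python sketch of MergeMSA (the name merge_msa and the gap symbol '-' are assumptions). It assumes each consensus sequence has exactly one character per column of its sub-MSA, which holds when gaps are ignored while computing the consensus. The k-th non-gap character of an aligned consensus then stands for column k of its sub-MSA. Three scenarios remain: both characters are residues, so the two columns are stacked; the left consensus has a gap, so the left rows get a gap column; or the right consensus has a gap, so the right rows get one. A pairwise alignment never pairs two gaps.

```python
GAP = "-"  # gap symbol assumed for this sketch


def merge_msa(left, right, cons1_aligned, cons2_aligned):
    """Merge two MSAs (lists of rows) using their aligned consensus strings."""
    rows = [[] for _ in range(len(left) + len(right))]
    i = j = 0  # next unused column of left / right
    for c1, c2 in zip(cons1_aligned, cons2_aligned):
        if c1 != GAP:
            col_left = [row[i] for row in left]
            i += 1
        else:  # gap in the left consensus: insert a gap column into left
            col_left = [GAP] * len(left)
        if c2 != GAP:
            col_right = [row[j] for row in right]
            j += 1
        else:  # gap in the right consensus: insert a gap column into right
            col_right = [GAP] * len(right)
        for row, ch in zip(rows, col_left + col_right):
            row.append(ch)
    return rows


merged = merge_msa([list("AC"), list("A-")], [list("C")], "AC", "-C")
for row in merged:
    print("".join(row))  # AC, A-, -C
```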

Finally, we need to implement the pseudocode function AlignProgressive above in proper Darwin.

Now we have all the ingredients to reconstruct the MSA. First convert the input sequences into trivial MSAs using your SequenceToMSA function, then apply your AlignProgressive function to them in the order given by the guide tree.
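A self-contained Python sketch of the whole Task 1 pipeline, applied to the five sequences. All function names are hypothetical Python counterparts of the Darwin methods, and the gap symbol '-' is an assumption. The alignment step is a Needleman-Wunsch placeholder with made-up scores rather than Align with DayMatrix(evolDist), so evolDist is dropped from the signature. The merge order follows the topology ((s1,(s2,s3)),(s4,s5)) of guideTree in Task 2, with the deepest sibling pairs aligned first.

```python
from collections import Counter

GAP = "-"  # gap symbol assumed for this sketch


def sequence_to_msa(seq):
    return [list(seq)]  # 1 x len(seq) matrix


def consensus_sequence(msa):
    # Most frequent non-gap residue of every column; ties broken alphabetically.
    cons = []
    for col in zip(*msa):
        counts = Counter(c for c in col if c != GAP)
        cons.append(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0][0])
    return "".join(cons)


def global_align(a, b, match=2, mismatch=-1, gap=-2):
    # Needleman-Wunsch with hypothetical scores, standing in for Align/DayMatrix.
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          S[i - 1][j] + gap, S[i][j - 1] + gap)
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:  # traceback of one optimal path
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def merge_msa(left, right, cons1, cons2):
    # A gap in an aligned consensus becomes a gap column in that sub-MSA.
    rows = [[] for _ in range(len(left) + len(right))]
    i = j = 0
    for c1, c2 in zip(cons1, cons2):
        col = [r[i] for r in left] if c1 != GAP else [GAP] * len(left)
        col += [r[j] for r in right] if c2 != GAP else [GAP] * len(right)
        if c1 != GAP:
            i += 1
        if c2 != GAP:
            j += 1
        for row, ch in zip(rows, col):
            row.append(ch)
    return rows


def align_progressive(left, right):
    cons1, cons2 = global_align(consensus_sequence(left), consensus_sequence(right))
    return merge_msa(left, right, cons1, cons2)


s1 = 'RYEDGVGNVIMQGRVMLAWRSVIALNTEKFLDVGPEDEQALEIKRNGTKAQLHLLVLMTVLAADNPAEENRKAVEMYAFLTEFDFRADGMVRVRGNLFPFMGLGLKR'
s2 = 'RYEDSIENLLLQSRLVLAYRTAVALNTEKFLGVPEDDEKELQIQRNGTKAQLKDWVLMHVLAAPAEENREAFLTPLDFHADAMVEVRGLIKQGLHR'
s3 = 'RYKRPVVNATLQSRLILAFRSVVALNTEEFLEVRPDDEAGEIERNGTKAELNLTVQMHVLAAPLEENRQAFLTPVDFHAAGMILVRGNGFPLIKLGLKR'
s4 = 'DPADGLDNKREENRVDLCCVPVVALNTPQFVNIGPDPAALTIKDNGSHVALEVLRWSLAAPTEENFQRFDLTKDFAVCMVAVRMNEFPKPGKDLLR'
s5 = 'DEADGLDNNREENRVKLCCVPVYAINTPQFIRAGPDSAALAIKANGSSGHVLLQVLRFSLAAPSEENFLRFDLTRDFAICMVAMRGNEFPKPGSDLKR'
seqs = [s1, s2, s3, s4, s5]
t1, t2, t3, t4, t5 = (sequence_to_msa(s) for s in seqs)

# Guide tree ((s1,(s2,s3)),(s4,s5)): align the deepest sibling pairs first.
msa23 = align_progressive(t2, t3)
msa123 = align_progressive(t1, msa23)
msa45 = align_progressive(t4, t5)
final = align_progressive(msa123, msa45)
for row in final:
    print("".join(row))
```

Because the placeholder scores ignore the evolutionary distance, the resulting MSA may differ from the one obtained with Darwin's Align and DayMatrix(evolDist).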

Task 2 (optional)

Instead of traversing the guide tree by hand as in Task 1, write a method that recursively traverses the guide tree in depth-first order, extracts the relevant information, and computes the resulting MSA. Note: you may want to consult the Darwin help pages for Tree and Leaf.

guideTree := Tree(Tree(Leaf(s1,-60),-30,Tree(Leaf(s2,-70),-40,Leaf(s3,-90))),0,Tree(Leaf(s4,-100),-80,Leaf(s5,-90)));
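A Python sketch of the recursive depth-first traversal. Leaf and Tree here are hypothetical Python stand-ins that mirror the argument order used in guideTree: Leaf(label, height) and Tree(left, height, right). The evolutionary distance at an inner node is assumed to be the sum of the two branch lengths below it, i.e. the height differences between the node and its children; the exercise does not state this. The step that combines two subtrees is passed in as a function, which keeps the traversal independent of the alignment code.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:  # hypothetical stand-in for Darwin's Leaf(label, height)
    label: str
    height: float


@dataclass
class Tree:  # hypothetical stand-in for Darwin's Tree(left, height, right)
    left: "Node"
    height: float
    right: "Node"


Node = Union[Leaf, Tree]


def traverse(node, at_leaf, combine):
    """Depth-first (post-order): solve both subtrees, then combine them."""
    if isinstance(node, Leaf):
        return at_leaf(node.label)
    left = traverse(node.left, at_leaf, combine)
    right = traverse(node.right, at_leaf, combine)
    # Assumed evolutionary distance: sum of the two branch lengths below this node.
    dist = abs(node.height - node.left.height) + abs(node.height - node.right.height)
    return combine(left, right, dist)


guide_tree = Tree(Tree(Leaf("s1", -60), -30, Tree(Leaf("s2", -70), -40, Leaf("s3", -90))),
                  0, Tree(Leaf("s4", -100), -80, Leaf("s5", -90)))

steps = []


def record(left, right, dist):
    # Placeholder for AlignProgressive: log the call, return a Newick-like string.
    steps.append((left, right, dist))
    return f"({left},{right})"


print(traverse(guide_tree, lambda label: label, record))  # ((s1,(s2,s3)),(s4,s5))
print(steps)
```

To compute the actual MSA, at_leaf would map a label to the trivial MSA of its sequence, and combine would be AlignProgressive(leftMSA, rightMSA, evolDist).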

Task 3 (to complete your own MSA package)

The only thing still missing is a way to estimate the guide tree. A common approach is to reconstruct a distance tree from the pairwise alignments of all sequence pairs. Do you get the same guide tree? How does it influence the final MSA you obtain?
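One simple distance-tree method is UPGMA (average-linkage clustering). The exercise does not prescribe a method, and neighbor joining is a common alternative that does not assume a molecular clock. The Python sketch below builds a rooted tree topology, as nested tuples, from a symmetric distance matrix. In practice the distances would come from the pairwise alignments; the toy matrix in the usage example is made up.

```python
def upgma(labels, d):
    """UPGMA clustering of labels given a symmetric distance matrix d (list of lists)."""
    # Active clusters: id -> (subtree, number of leaves); distances keyed by (i, j), i < j.
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    dist = {(i, j): d[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    while len(clusters) > 1:
        a, b = min(dist, key=dist.get)  # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Size-weighted average distance from the merged cluster to every other one.
        new_dists = {
            k: (size_a * dist[tuple(sorted((a, k)))] + size_b * dist[tuple(sorted((b, k)))])
            / (size_a + size_b)
            for k in clusters
        }
        dist = {p: v for p, v in dist.items() if a not in p and b not in p}
        for k, v in new_dists.items():
            dist[(k, next_id)] = v  # k < next_id, so the key stays ordered
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


toy = [[0, 2, 6, 10],
       [2, 0, 6, 10],
       [6, 6, 0, 10],
       [10, 10, 10, 0]]
print(upgma(["a", "b", "c", "d"], toy))  # ('d', ('c', ('a', 'b')))
```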

Good Luck & Have Fun!

© 2012 ETH Zurich | 25 November 2010