|
|||||||||||
Open Positions
Research
This is the solution to the Progressive MSA Reconstruction exercise
First, we enter our 5 homologous sequences and initialize the Dayhoff matrices:
s1 := 'RYEDGVGNVIMQGRVMLAWRSVIALNTEKFLDVGPEDEQALEIKRNGTKAQLHLLVLMTVLAADNPAEENRKAVEMYAFLTEFDFRADGMVRVRGNLFPFMGLGLKR'; s2 := 'RYEDSIENLLLQSRLVLAYRTAVALNTEKFLGVPEDDEKELQIQRNGTKAQLKDWVLMHVLAAPAEENREAFLTPLDFHADAMVEVRGLIKQGLHR'; s3 := 'RYKRPVVNATLQSRLILAFRSVVALNTEEFLEVRPDDEAGEIERNGTKAELNLTVQMHVLAAPLEENRQAFLTPVDFHAAGMILVRGNGFPLIKLGLKR'; s4 := 'DPADGLDNKREENRVDLCCVPVVALNTPQFVNIGPDPAALTIKDNGSHVALEVLRWSLAAPTEENFQRFDLTKDFAVCMVAVRMNEFPKPGKDLLR'; s5 := 'DEADGLDNNREENRVKLCCVPVYAINTPQFIRAGPDSAALAIKANGSSGHVLLQVLRFSLAAPSEENFLRFDLTRDFAICMVAMRGNEFPKPGSDLKR'; CreateDayMatrices();
The main idea is to reduce the alignment of two MSAs to the alignment of two sequences. Therefore, we need to compress each MSA into a consensus sequence, align those sequences using global alignment, and expand each character in the resulting pairwise alignment to the corresponding column to get the resulting MSA.
First, let us compute a consensus sequence from a MSA, which is represented as a matrix of characters. The easiest way to achieve this is by sorting each column of the MSA and determining the most frequent character in a linear pass. Gap characters are not allowed in a consensus sequence, therefore we skip them:
ConsensusSequence := proc(m:matrix)
sequence := '':
# transpose matrix to get columns in an easy way
mt := transpose(m):
# loop over all positions
for position to length(mt) do
# get array with sites at this position
# sort it alphabetically (all same characters
# are now clustered... )
sites := sort(mt[position]):
# what is the best site and its frequency?
bestSite := '$':
bestFreq := 0:
# what is the current site and its frequency?
currSite := '$':
currFreq := 1:
# loop over all sites (i.e. all sequences)
for j to length(sites) do
# if its a gap, skip it - no gaps in consensus sequences
if sites[j] = '_' or sites[j] = '-' then next fi:
# if its the same character as currSite, increase frequency
if sites[j] = currSite then
currFreq := currFreq + 1:
else
# if frequency of current character is higher than
# current highest frequency, update 'bestSite' and
# 'bestFreq'
if currFreq > bestFreq then
bestFreq := currFreq:
bestSite := sites[j]:
fi;
# in any case, update current site and reset its
# frequency
currSite := sites[j]:
currFreq := 1:
fi:
od:
# attach best site to consensus sequence
sequence := sequence.best:
od:
return(sequence):
end:
Then, we need to invert the compression of columns into single characters. This is done in the MergeMSA procedure, which takes a pairwise sequence alignment and expands it to a MSA:
MergeMSA := proc(m1:matrix, m2:matrix, a:Alignment)
alignedStrings := DynProgStrings(a)[2..3]:
len := length(alignedStrings[1]):
# create a new empty MSA (i.e. list of lists)
msa := CreateArray(1..length(m1)+length(m2), 1..len, '_'):
pos1 := pos2 := globalPos := 1:
# run through every position in the aligned consensus seqs
for globalPos to len do
# check first cons.seq: if current position is not a gap,
if alignedStrings[1,globalPos] <> '_' then
# loop over sequences in first MSA and attach characters at
# position 'pos1' to new MSA and increase pos1
for k to length(m1) do msa[k,globalPos] := m1[k,pos1] od:
pos1 := pos1 + 1:
fi:
# same for second cons.seq and second MSA
if alignedStrings[2,globalPos] <> '_' then
for k to length(m2) do msa[k+length(m1),globalPos] := m2[k,pos2] od:
pos2 := pos2 + 1:
fi:
od:
return(msa):
end:
At the leaves of the tree we have to convert our sequences into simple 'MSAs' to be able to apply the AlignProgressive() procedure consistently:
SequenceToMSA := proc(s:string)
[[seq(s[i],i=1..length(s))]]:
end:
Finally, we rewrite our progressive alignment method with Darwin code:
AlignProgressive := proc(m1:matrix, m2:matrix, dist:numeric)
s1 := ConsensusSequence(m1):
s2 := ConsensusSequence(m2):
al12 := Align(s1, s2, DayMatrix(dist), Global):
MergeMSA(m1,m2,al12):
end:
Now we can compute the MSA progressively along the given guide tree:
m1 := SequenceToMSA(s1): m2 := SequenceToMSA(s2): m3 := SequenceToMSA(s3): m4 := SequenceToMSA(s4): m5 := SequenceToMSA(s5): m23 := AlignProgressive(m2,m3,30+50): m45 := AlignProgressive(m4,m5,20+10): m13 := AlignProgressive(m23,m1,10+30): res := AlignProgressive(m13,m45,30+80):
One last thing we have to do is define a method for pretty-printing the MSA:
PrintMSA := proc(m:matrix)
for i in m do
s := '';
for j in i do
s := s.j:
od:
print(s):
od:
end:
PrintMSA(res):
AlignProgressiveTree := proc(sequences:list(string), labels:list(string), tree:Tree)
if type(tree,Leaf) then
i := SearchArray(tree[Label],labels);
m := SequenceToMSA(sequences[i]);
else
m1 := AlignProgressiveTree(sequences,labels,tree[Right]):
m2 := AlignProgressiveTree(sequences,labels,tree[Left]):
m := AlignProgressive(m1,m2,2*tree[Height]-tree[Right,Height]-tree[Left,Height]):
fi;
return(m):
end:
tree := Tree(Tree(Leaf('s1',-60),-30,Tree(Leaf('s2',-70),-40,Leaf('s3',-90))),0,Tree(Leaf('s4',-100),-80,Leaf('s5',
sequences := [s1,s2,s3,s4,s5]:
labels := ['s1','s2','s3','s4','s5']:
res := AlignProgressiveTree(sequences,labels,tree):
PrintMSA(res):
FullAlignProgressive := proc(sequences:list(string), labels:list(string))
tree := PhylogeneticTree(sequences,labels,DISTANCE):
print(tree);
m := AlignProgressiveTree(sequences,labels,tree):
return(m):
end:
res := FullAlignProgressive(sequences,labels):
PrintMSA(res):
Wichtiger Hinweis:
Diese Website wird in älteren Versionen von Netscape ohne
graphische Elemente dargestellt. Die Funktionalität der
Website ist aber trotzdem gewährleistet. Wenn Sie diese
Website regelmässig benutzen, empfehlen wir Ihnen, auf
Ihrem Computer einen aktuellen Browser zu installieren. Weitere
Informationen finden Sie auf
folgender
Seite.
Important Note:
The content in this site is accessible to any browser or
Internet device, however, some graphics will display correctly
only in the newer versions of Netscape. To get the most out of
our site we suggest you upgrade to a newer browser.
More
information