|
|||||||||||
Open Positions
Research
The goal of the exercise wass to identify potential orthologs
between two genomes (human and mouse) and then using a third genome (the dog)
to verify the orthologs and exclude paralogs.
As a preparation, we load the databases:
humanDB := ReadDb('human.db');
mouseDB := ReadDb('mouse.db');
dogDB := ReadDb('dog.db');
humanDB := Peptide file(human.db(53580), 15 entries, 7278 aminoacids) mouseDB := Peptide file(mouse.db(52446), 17 entries, 8600 aminoacids) dogDB := Peptide file(dog.db(32084), 12 entries, 5426 aminoacids)
and create the Dayhoff matrices
CreateDayMatrices();
We also save the number of sequences in each species:
DB := humanDB; NH := DB[TotEntries];
DB := Peptide file(human.db(53580), 15 entries, 7278 aminoacids) NH := 15
DB := mouseDB; NM := DB[TotEntries];
DB := Peptide file(mouse.db(52446), 17 entries, 8600 aminoacids) NM := 17
DB := dogDB; ND := DB[TotEntries];
DB := Peptide file(dog.db(32084), 12 entries, 5426 aminoacids) ND := 12
In a first step, we create a table with the scores of all alignments in the human genome against all sequences in mouse. To speed up the many alignments, we first align with one matrix and only for a score above 150 we do the optimal alignment with distance estimation (which is about 13 times slower).
HumMus := CreateArray(1..NH,1..NM):
dm := DayMatrix(20):
for a to NH do
DB := humanDB;
e1 := Entry(a);
DB := mouseDB;
for b to NM do
e2 := Entry(b);
al := Align(e1,e2,dm);
if al[Score]>150 then al := Align(e1,e2,DMS) fi;
HumMus[a,b] := al[Score];
od;
od:
This table makes it easy to find all stable pairs. Every match with a score above 200 which is the highest score in its row and its column is a stable pair:
stable := []:
for a to NH do for b to NM do
if HumMus[a,b]>200 then
t := HumMus[a,b];
if t=max(HumMus[a]) and t=max(transpose(HumMus)[b]) then
stable := append(stable,[a,b]);
fi;
fi;
od od:
lprint(stable);
[[1, 6], [2, 2], [3, 17], [4, 3], [5, 5], [7, 9], [13, 4]]
For the stable pair verification, we need to loop through every stable pair and every pair of dog entries. If none of the pairs of dog sequences contradicts the stable pair, we can keep it.
verified := []:
for sp in stable do
is_ok := true;
DB := humanDB;
hum_x := Entry(sp[1]);
DB := mouseDB;
mus_y := Entry(sp[2]);
DB := dogDB;
for a to ND do
dog_z1 := Entry(a);
al := Align(dog_z1,hum_x,DMS);
if al[Score]>200 then d_z1_x := al[PamDistance];
else next fi;
al := Align(dog_z1,mus_y,DMS);
if al[Score]>200 then d_z1_y := al[PamDistance];
else next fi;
for b from a+1 to ND do
dog_z2 := Entry(b);
al := Align(dog_z2,hum_x,DMS);
if al[Score]>200 then d_z2_x := al[PamDistance];
else next fi;
al := Align(dog_z2,mus_y,DMS);
if al[Score]>200 then d_z2_y := al[PamDistance];
else next fi;
if (d_z1_x<d_z1_y and d_z2_y<d_z2_x) or
(d_z2_x<d_z2_y and d_z1_y<d_z1_x) then
lprint(sp,'are not orthologous because of',a,'and',b);
is_ok := false;
fi;
od;
od;
if is_ok then verified := append(verified,sp) fi;
od:
lprint(verified);
[13, 4] are not orthologous because of 2 and 4 [[1, 6], [2, 2], [3, 17], [4, 3], [5, 5], [7, 9]]
for vp in verified do
DB := humanDB;
e1 := Entry(vp[1]);
DB := mouseDB;
e2 := Entry(vp[2]);
lprint(SearchTag(DE,e1),SearchTag(DE,e2));
od:
Zinc finger and BTB domain-containing protein 24 (Zinc finger protein 450). zinc finger and BTB domain containing 24 40S ribosomal protein S9. ribosomal protein S9 Dynein light chain 2, cytoplasmic (Dynein light chain LC8-type 2). dynein light chain LC8-type 2 F-box/LRR-repeat protein 19 (F-box and leucine-rich repeat protein 19). F-box and leucine-rich repeat protein 19 Heat shock 70 kDa protein 12B. heat shock protein 12B core 2 beta-1,6-N-acetylglucosaminyltransferase 3 PREDICTED: similar to core 2 beta-1,6-N-acetylglucosaminyltransferase 3
Wichtiger Hinweis:
Diese Website wird in älteren Versionen von Netscape ohne
graphische Elemente dargestellt. Die Funktionalität der
Website ist aber trotzdem gewährleistet. Wenn Sie diese
Website regelmässig benutzen, empfehlen wir Ihnen, auf
Ihrem Computer einen aktuellen Browser zu installieren. Weitere
Informationen finden Sie auf
folgender
Seite.
Important Note:
The content in this site is accessible to any browser or
Internet device, however, some graphics will display correctly
only in the newer versions of Netscape. To get the most out of
our site we suggest you upgrade to a newer browser.
More
information