printlogo
http://www.ethz.ch/index_EN
CBRG - Computational Biochemistry Research Group
 
print
  

Solution - Finding and Verifying Orthologs

(by Christophe Dessimoz)

Part II: Find orthologs yourself!


The goal of the exercise wass to identify potential orthologs between two genomes (human and mouse) and then using a third genome (the dog) to verify the orthologs and exclude paralogs.

As a preparation, we load the databases:

humanDB := ReadDb('human.db');
mouseDB := ReadDb('mouse.db');
dogDB := ReadDb('dog.db');
humanDB := Peptide file(human.db(53580), 15 entries, 7278 aminoacids)
mouseDB := Peptide file(mouse.db(52446), 17 entries, 8600 aminoacids)
dogDB := Peptide file(dog.db(32084), 12 entries, 5426 aminoacids)

and create the Dayhoff matrices

CreateDayMatrices();

We also save the number of sequences in each species:

DB := humanDB;
NH := DB[TotEntries];
DB := Peptide file(human.db(53580), 15 entries, 7278 aminoacids)
NH := 15
DB := mouseDB;
NM := DB[TotEntries];
DB := Peptide file(mouse.db(52446), 17 entries, 8600 aminoacids)
NM := 17
DB := dogDB;
ND := DB[TotEntries];
DB := Peptide file(dog.db(32084), 12 entries, 5426 aminoacids)
ND := 12

1. Stable Pairs

In a first step, we create a table with the scores of all alignments in the human genome against all sequences in mouse. To speed up the many alignments, we first align with one matrix and only for a score above 150 we do the optimal alignment with distance estimation (which is about 13 times slower).

HumMus := CreateArray(1..NH,1..NM):
dm := DayMatrix(20):
for a to NH do 
    DB := humanDB;
    e1 := Entry(a);
    DB := mouseDB;
    for b to NM do
	e2 := Entry(b);
	al := Align(e1,e2,dm);
	if al[Score]>150 then al := Align(e1,e2,DMS) fi;
	HumMus[a,b] := al[Score];
    od;
od:

This table makes it easy to find all stable pairs. Every match with a score above 200 which is the highest score in its row and its column is a stable pair:

stable := []:
for a to NH do for b to NM do
    if HumMus[a,b]>200 then
	t := HumMus[a,b];
	if t=max(HumMus[a]) and t=max(transpose(HumMus)[b]) then
	    stable := append(stable,[a,b]);
	fi;
    fi;
od od:
lprint(stable);
[[1, 6], [2, 2], [3, 17], [4, 3], [5, 5], [7, 9], [13, 4]]

2. Verified Pairs

For the stable pair verification, we need to loop through every stable pair and every pair of dog entries. If none of the pairs of dog sequences contradicts the stable pair, we can keep it.

verified := []:
for sp in stable do
    is_ok := true;
    DB := humanDB;
    hum_x := Entry(sp[1]);
    DB := mouseDB;
    mus_y := Entry(sp[2]);
    DB := dogDB;
    for a to ND do 
	dog_z1 := Entry(a);
	al := Align(dog_z1,hum_x,DMS);
	if al[Score]>200 then d_z1_x := al[PamDistance];
	else next fi;	
	al := Align(dog_z1,mus_y,DMS);
	if al[Score]>200 then d_z1_y := al[PamDistance];
	else next fi;	
	for b from a+1 to ND do
	    dog_z2 := Entry(b);	
	    al := Align(dog_z2,hum_x,DMS);
	    if al[Score]>200 then d_z2_x := al[PamDistance];
	    else next fi;
	    al := Align(dog_z2,mus_y,DMS);
	    if al[Score]>200 then d_z2_y := al[PamDistance];
	    else next fi;
	    if (d_z1_x<d_z1_y and d_z2_y<d_z2_x) or
	       (d_z2_x<d_z2_y and d_z1_y<d_z1_x) then
		    lprint(sp,'are not orthologous because of',a,'and',b);
		    is_ok := false;
	    fi;
	od;
    od;
    if is_ok then verified := append(verified,sp) fi;
od:
lprint(verified);
[13, 4] are not orthologous because of 2 and 4
[[1, 6], [2, 2], [3, 17], [4, 3], [5, 5], [7, 9]]

3. Check the Result

for vp in verified do
    DB := humanDB;
    e1 := Entry(vp[1]);
    DB := mouseDB;
    e2 := Entry(vp[2]);
    lprint(SearchTag(DE,e1),SearchTag(DE,e2));
od:
Zinc finger and BTB domain-containing protein 24 (Zinc finger protein 450). 
zinc finger and BTB domain containing 24
40S ribosomal protein S9. ribosomal protein S9
Dynein light chain 2, cytoplasmic (Dynein light chain LC8-type 2). 
dynein light chain LC8-type 2
F-box/LRR-repeat protein 19 (F-box and leucine-rich repeat protein 19). 
F-box and leucine-rich repeat protein 19
Heat shock 70 kDa protein 12B. heat shock protein 12B
core 2 beta-1,6-N-acetylglucosaminyltransferase 3 
PREDICTED: similar to core 2 beta-1,6-N-acetylglucosaminyltransferase 3
 

Wichtiger Hinweis:
Diese Website wird in älteren Versionen von Netscape ohne graphische Elemente dargestellt. Die Funktionalität der Website ist aber trotzdem gewährleistet. Wenn Sie diese Website regelmässig benutzen, empfehlen wir Ihnen, auf Ihrem Computer einen aktuellen Browser zu installieren. Weitere Informationen finden Sie auf
folgender Seite.

Important Note:
The content in this site is accessible to any browser or Internet device, however, some graphics will display correctly only in the newer versions of Netscape. To get the most out of our site we suggest you upgrade to a newer browser.
More information

© 2012 ETH Zurich | Imprint | Disclaimer | 20 December 2010
top