CBRG - Computational Biochemistry Research Group
 
  

Exercise - Finding and Verifying Orthologs

Part I: Interactive, web-based exploration of orthology

(by Christophe Dessimoz)

The following tools are recommended to solve part I:
- Orthology: http://omabrowser.org
- MSA: http://www.ebi.ac.uk/mafft/
- Distance Trees: http://omabrowser.org/PhylogeneticTree.html

1. Orthology and function

Consider the following protein sequence:

MAINPQYEEIGKGFVTQYYALFDDSTQRPSLVNLYNAELSFMTFEGQQIQGAAKILEKLQSLTFQNIKRVLTAVDSQPMFDGGVLINVLGRLQCDEDPPHAYSQTFVLKPLGGTFFCAHDIFRLNIHNSA

  1. In which species can you find this protein sequence?
  2. Consider now the orthologs predicted by OMA. In which type of organism are they present?
  3. What could be the function of this sequence? Does such a function make sense in the organisms that have it?

2. Orthology and Distance Tree Reconstruction

Now let us consider the protein with SwissProt ID ADC_HUMAN.

  1. Create a FASTA file with 6 sequences: the ADC_HUMAN sequence; its orthologs in chimpanzee (PANTR), mouse (MOUSE), elephant (LOXAF) and chicken (CHICK); and the paralog DCOR_HUMAN.
  2. Reconstruct a distance tree (format "Phylogram") from these sequences. Is the tree obtained consistent with OMA's predictions of orthology? Why?
  3. The constructed tree is rooted. Can you trust that the rooting is correct? What would be the alternative topologies?
  4. If we consider the tree correct, what is the most likely evolutionary relation between DCOR_HUMAN and the protein from chicken?
  5. To increase your confidence in the tree, add the protein with Ensembl ID ENSGALP00000036115 to this set of sequences, and reconstruct the tree on the extended set. Discuss the result.
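
For step 1 of this section, the FASTA file can be assembled by hand or with a few lines of code. The following is a minimal Python sketch, not part of the original exercise. The function name `to_fasta` is hypothetical, and the sequences shown are placeholders that must be replaced by the real sequences retrieved from the OMA browser.

```python
def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text.

    Each sequence is wrapped to `width` residues per line.
    """
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


# Placeholder sequences only: substitute the real sequences before use.
example_records = [
    ("ADC_HUMAN", "AAAAAAAAAA"),
    ("DCOR_HUMAN", "CCCCCCCCCC"),
]
fasta_text = to_fasta(example_records, width=4)
print(fasta_text)
```

The resulting text can be saved to a file or pasted directly into the MSA and tree reconstruction tools listed above.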


:) GOOD LUCK :)

Part II: Find orthologs yourself!

1. Initial pairs of orthologs


The goal of this Darwin exercise is to identify potential orthologs between two genomes (human and mouse) and then to use a third genome (the dog) to verify the orthologs and exclude paralogs. This is a simplified version of the OMA algorithm, which is described here.

To speed up the procedure, we have prepared small genome databases with only a few sequences each for human, mouse and dog. These can be read into Darwin:

humanDB := ReadDb('human.db');
mouseDB := ReadDb('mouse.db');
dogDB := ReadDb('dog.db');
humanDB := Peptide file(human.db(53580), 15 entries, 7278 aminoacids)
mouseDB := Peptide file(mouse.db(52446), 17 entries, 8600 aminoacids)
dogDB := Peptide file(dog.db(32084), 12 entries, 5426 aminoacids)

To change the current database (e.g. for the Entry function), reassign the variable DB:

DB := humanDB;
DB := Peptide file(human.db(53580), 15 entries, 7278 aminoacids)

The total number of entries in a database can be found using the selector TotEntries:

DB[TotEntries];
15

In a first step, the stable pairs (also called "mutual best hits") between human and mouse need to be identified. This is done by aligning all human sequences against all mouse sequences. To keep the procedure simple, only alignments with a score above 200 are considered significant, and no confidence interval is used. Make a list of all matches (hum_x, mus_y) that are significant. The stable pairs are then the sequence pairs (hum_x, mus_y) such that no pair (hum_x, mus_z) or (hum_z, mus_y) has a higher score. Save all stable pairs as potential orthologs.

To verify: you should now have a list of 7 stable pairs.
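
The selection logic for stable pairs can be sketched independently of Darwin. The Python example below is an illustration only, not the Darwin solution. It assumes the all-against-all alignment scores have already been computed and stored in a dictionary; the function name `stable_pairs` and the scores are invented for the example.

```python
def stable_pairs(scores, min_score=200.0):
    """Return the mutual best hits among significant alignments.

    scores maps (human_id, mouse_id) to an alignment score.
    A pair is stable if no other significant pair sharing its human
    or its mouse sequence has a strictly higher score.
    """
    # Keep only alignments above the significance threshold.
    significant = {pair: s for pair, s in scores.items() if s > min_score}

    # Best score seen for every human and every mouse sequence.
    best_for_human = {}
    best_for_mouse = {}
    for (hum, mus), s in significant.items():
        best_for_human[hum] = max(s, best_for_human.get(hum, s))
        best_for_mouse[mus] = max(s, best_for_mouse.get(mus, s))

    return sorted(
        (hum, mus)
        for (hum, mus), s in significant.items()
        if s == best_for_human[hum] and s == best_for_mouse[mus]
    )


# Invented scores, for illustration only.
example_scores = {
    ("hum_1", "mus_1"): 850.0,
    ("hum_1", "mus_2"): 400.0,
    ("hum_2", "mus_2"): 620.0,
    ("hum_3", "mus_2"): 300.0,
    ("hum_3", "mus_3"): 150.0,  # below threshold, ignored
}
example_stable = stable_pairs(example_scores)
print(example_stable)
```

Note that ties are kept: if two pairs share a sequence and have exactly the same score, neither has a higher score than the other, so both satisfy the definition above.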

2. Verified Pairs

Sequences that form stable pairs can still be paralogs because of gene losses in the two species (or in their ancestors). To detect such paralogs, a third genome is used in which both paralogs are still present. In this exercise, we use the dog genome. For each stable pair, we have to search for two sequences in dog, dog_z1 and dog_z2, such that

d(hum_x, dog_z1) < d(mus_y, dog_z1) and

d(mus_y, dog_z2) < d(hum_x, dog_z2),

where d is the PAM distance between two sequences.

Again, consider only alignments with a score above 200. If we find two sequences fulfilling these criteria, it means that hum_x and mus_y are not orthologous.

To verify: exactly one of the stable pairs should be invalidated by this procedure.
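
The verification test can likewise be sketched outside Darwin. The Python example below is an illustration under stated assumptions, not the Darwin solution: the dictionary `dist` is assumed to hold PAM distances only for alignments that scored above 200, a dog sequence is assumed to count only when both of its alignments (to hum_x and to mus_y) are significant, and the function name `is_refuted` and all distances are invented.

```python
def is_refuted(hum, mus, dog_ids, dist):
    """Check whether a stable pair (hum, mus) is refuted by the dog genome.

    dist maps (sequence_id, dog_id) to a PAM distance and contains
    only significant alignments. The pair is refuted if one dog
    sequence is closer to hum than to mus, and another dog sequence
    is closer to mus than to hum.
    """
    found_z1 = False  # some dog_z1 with d(hum, z1) < d(mus, z1)
    found_z2 = False  # some dog_z2 with d(mus, z2) < d(hum, z2)
    for dog in dog_ids:
        d_hum = dist.get((hum, dog))
        d_mus = dist.get((mus, dog))
        if d_hum is None or d_mus is None:
            continue  # at least one alignment was not significant
        if d_hum < d_mus:
            found_z1 = True
        elif d_mus < d_hum:
            found_z2 = True
    return found_z1 and found_z2


# Invented distances, for illustration only.
example_dist = {
    ("hum_1", "dog_1"): 10.0, ("mus_1", "dog_1"): 40.0,
    ("hum_1", "dog_2"): 45.0, ("mus_1", "dog_2"): 12.0,
    ("hum_2", "dog_3"): 15.0, ("mus_2", "dog_3"): 20.0,
}
example_dogs = ["dog_1", "dog_2", "dog_3"]
print(is_refuted("hum_1", "mus_1", example_dogs, example_dist))
print(is_refuted("hum_2", "mus_2", example_dogs, example_dist))
```

Because both inequalities are strict, a single dog sequence can never play the role of both dog_z1 and dog_z2, so the two witnesses are automatically distinct.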

3. Check the Result

Finally, check your result by printing the descriptions of all pairs of orthologs.

HAVE FUN :)

 


© 2012 ETH Zurich | 9 December 2010