Molecular Phylogeny: Darwinian, Genomic and Phenotypic

Speaker: Charles Kurland, Professor Emeritus, Lund University
Host: Birgitte Regenberg, Cell Biology and Physiology

Abstract:
The ancestry and descent of extant organisms, represented by several hundred genome sequences, have been reconstructed as a tree of life (ToL) from genome-scale distributions of protein domains.

Annotation of hundreds of genome sequences identifies circa 1800 unique protein domains at the superfamily level of the SCOP hierarchy. Superfamilies (SFs) are the core domains of proteins, which alone or in different combinations define the structures and functions of proteins. The individual members of a superfamily are all homologous to one another. SFs are excellent complex phylogenetic characters: they are phenotypic characters retrieved from genome sequences with the help of hidden Markov models (HMMs).

The genomes of representative Archaea, Bacteria and Eukaryotes encode roughly equivalent numbers of unique SFs. The intrinsic diversity of the SF proteomes of Akaryotes is not very different from that of Eukaryotes: ca. 600 versus nearly 900 SFs, respectively. The enormous size differences between Eukaryote and Akaryote coding sequences are due to correspondingly large differences in the numbers of duplicated SFs that the respective genomes encode.
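The distinction between unique SFs and duplicated SFs can be illustrated with a minimal sketch. It assumes each genome is summarized as a flat list of SF assignments, one per annotated domain; the function name `sf_summary` and the labels `sf_a`, `sf_b`, `sf_c` are hypothetical and not taken from the talk.

```python
from collections import Counter

def sf_summary(domain_hits):
    """Summarize a genome's SF assignments (one label per annotated domain).

    Returns (unique SFs, total domain copies, duplicated copies).
    """
    counts = Counter(domain_hits)
    unique = len(counts)           # SF diversity of the genome
    total = sum(counts.values())   # all domain copies, duplicates included
    return unique, total, total - unique

# Hypothetical toy genome: three distinct SFs, one of them present three times.
toy_genome = ["sf_a", "sf_b", "sf_a", "sf_c", "sf_a"]
summary = sf_summary(toy_genome)
```

Under this view, two genomes can have similar numbers of unique SFs while differing greatly in total domain copies.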

Using the SF content of genomes as taxa, a Sankoff version of Maximum Parsimony with penalties has been implemented to reconstruct an intrinsically rooted ToL. The root of the modern ToL, the universal common ancestor (UCA), is complex, with circa 1400 of the 1800 SFs that make up a 336-genome sampling. The three superkingdoms descend in two major lineages from that very complex UCA, with the Archaea and Bacteria as sister clades on one branch (Akaryotes) and the Eukaryotes on the other.
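To make the general idea of Sankoff parsimony concrete, here is a minimal sketch for a single SF scored as absent (0) or present (1) on a small rooted tree. It is not the speaker's implementation: the penalty values, the tree and the genome names are assumptions chosen only for illustration, and a real analysis would run over all SFs and many candidate trees.

```python
from math import inf

# States: 0 = SF absent, 1 = SF present.
# Hypothetical penalties (assumption): a gain costs 2, a loss costs 1.
COST = [
    [0, 2],  # parent absent:  stay absent = 0, gain = 2
    [1, 0],  # parent present: loss = 1, stay present = 0
]

def sankoff(tree, leaf_states):
    """Return [cost if node is absent, cost if node is present] for `tree`.

    tree: a leaf name (str) or a tuple of subtrees, e.g. (("A", "B"), "C").
    leaf_states: dict mapping each leaf name to its observed state (0 or 1).
    """
    if isinstance(tree, str):
        # A leaf can only take its observed state.
        return [0 if state == leaf_states[tree] else inf for state in (0, 1)]
    child_costs = [sankoff(child, leaf_states) for child in tree]
    # For each parent state, add the cheapest transition into every child.
    return [
        sum(min(COST[parent][s] + costs[s] for s in (0, 1)) for costs in child_costs)
        for parent in (0, 1)
    ]

# Hypothetical example: the SF is present in genomes A and B, absent in C.
example_tree = (("A", "B"), "C")
example_states = {"A": 1, "B": 1, "C": 0}
root_costs = sankoff(example_tree, example_states)  # [2, 1]
```

With these assumed penalties, the cheapest explanation places the SF at the root and requires one loss (cost 1), whereas an SF-free root requires one gain (cost 2). Summing such costs over all SFs gives the parsimony score of a tree.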

The Akaryotes are not ancestors to the Eukaryotes. Horizontal Gene Transfer (HGT) in general is limited to a small percentage of taxa, and there is no evidence at all for an endosymbiotic origin of Eukaryotes. Most of the modern SF proteome of mitochondria, which is almost half of the Eukaryote SF proteome, has been identified in the ancestor, the UCA. Phylogeny shows that mitochondria descend from the UCA, not from bacteria.