The Tree of Life: Tangled Roots and Sexy Shoots
Tracing the genetic pathway from the Last Universal Common Ancestor to Homo sapiens
Chris King - Genotype 1.2.6 - 4 Jan 2013
PDF Active site: http://www.dhushara.com/unravel/
This article is a fully referenced research review of progress in unraveling the details of the evolutionary Tree of Life, from life's first occurrence in the hypothetical RNA era to humanity's own emergence and diversification through migration and intermarriage. It uses research diagrams and brief discussion of the current state of the art, in response to summaries of some of these developments (Lawton 2009, Lane 2009).
The Tree of Life, in biological terms, has come to be identified with the evolutionary tree of biological diversity. It is this tree which represents the climax fruitfulness of the biosphere and the genetic foundation of our existence, embracing not just the higher Eukaryotes (plants, animals and fungi) but also Protista, Eubacteria and Archaea, the realm that includes the extreme heat- and salt-loving organisms and appears to lie almost at the root of life itself.
Fig 1: The Tree of Life (King)
LUCA: The Last Universal Common Ancestor
Following a phase of biogenesis possibly based on cosmic symmetry-breaking (King 1978, 2004) and on spontaneous prebiotic RNA synthesis (Powner et al. 2009, 2010), recent research suggests that the last universal common ancestor (LUCA) of all life on the planet may have arisen before the first cells, at a phase interface between alkaline hydrogen-emitting undersea vents and the archaic acidified iron-rich ocean (Martin and Russell 2003). In this setting, differential dynamics in membranous micropores in the vents managed to concentrate polypeptides and polynucleotides to biologically sustainable levels (Baaske et al. 2007, Budin et al. 2009), giving rise to the RNA era. At the same time, fatty acids were concentrated above their critical aggregate concentration, forming membranous microcellular interfaces across which proton transport provided a free energy source. The transition to enclosed cells is likely to have occurred in an active iron-sulphur reaction phase still present in living cells and associated with sodium-proton antiporters coupled to ATP generation (Lane and Martin 2012), leading in turn to electron transport and some of the most ancient proteins, such as ferredoxin.
Fig 1a: Proposed scheme for the universal common ancestor (Martin and Russell 2003)
The universal common ancestor of the three domains of life may have thus been a proton-pumping membranous interface from which archaea and bacteria emerged as free-living adaptations. This is suggested by fundamental differences in their cell walls and other details of evolutionary relationships among some of the oldest genes.
Among the archaea, halobacteria still use a form of photosynthesis that generates ATP from H+ gradients produced by a rhodopsin protein, while those in hydrothermal vents rely on Na+-H+ antiporters to generate ion gradients. Their membrane proteins, such as the ATP synthase, are compatible with gradients of either sodium ions or protons (Lane and Martin 2012, Yong 2012).
Fig 1b: (Above) founding metabolism based on Na+-H+ antiporters, ATP synthase and FeS- and NiS-containing vents (Lane and Martin 2012). The extremely ancient origin of the rhodopsin family of heptahelical receptors can be seen from the ultra-primitive archaeal photosynthesis in Halobacteria, which relies on direct coupling between photo-stimulated chemiosmotic H+ pumping and H+-driven ATP formation. It is based on bacteriorhodopsin, which is heptahelical, uses a form of retinal, and has helices that share a distant sequence homology with vertebrate rhodopsin (Ihara et al. 1999).
It has also been proposed that the last universal common ancestor had a mixed RNA-DNA metabolism based on reverse transcriptase, placing it in the latter phases of the RNA era (Leipe et al. 1999). The proposal rests on the highly conserved transcription and translation proteins common to all life, set against the apparently independent emergence of distinct DNA replication enzymes in archaea/eukaryotes and in eubacteria.
To get a characterization of LUCA at the point it diversified into the three domains of life, Archaea, Eukaryotes and Bacteria, one cannot rely on nucleotide gene sequences, because these would have mutated beyond recognition. Amino acid sequences change more slowly, because neutral (synonymous) mutations leave the amino acid sequence fixed, and the tertiary folded structure of a protein is even more strongly conserved.
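The point about neutral mutations can be illustrated with a minimal sketch. It assumes a small excerpt of the standard genetic code and two made-up toy RNA sequences (both hypothetical, chosen only for illustration): the sequences differ at three nucleotide positions yet encode the same amino acids.

```python
# Excerpt of the standard genetic code (RNA codons); only the entries used below.
CODON_TABLE = {
    "GCU": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GAU": "Asp", "GAC": "Asp",
    "AUG": "Met",
}


def translate(rna):
    """Read an RNA string in consecutive triplets and return its amino acids."""
    return [CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3)]


def nucleotide_differences(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical toy sequences: the second has a mutation in the third
# position of three of its four codons.
ancestral = "AUGGCUGGUGAU"
descendant = "AUGGCGGGAGAC"

print(nucleotide_differences(ancestral, descendant), "nucleotide changes")
print(translate(ancestral))
print(translate(descendant))
```

A quarter of the nucleotides have changed in this toy case, while the protein sequence is untouched, which is why protein-level comparisons reach further back in time than nucleotide comparisons.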
To reconstruct the set of proteins LUCA could make, Kim and Caetano-Anollés searched a database of proteins from 420 modern organisms, looking for structures that were common to all. Of the structures they found, just 5 to 11 per cent were universal, meaning they were conserved enough to have originated in LUCA. From the functions of these structures, they conclude that LUCA had enzymes to break down and extract energy from nutrients, and some protein-making equipment, but lacked the enzymes for making and reading DNA molecules.
Fig 1d: Phylogenomic tree of proteomes describing the evolution of 420 free-living (FL) organisms, from a phylogenomic study of protein domain structure in the proteomes of 420 free-living, fully sequenced organisms. Domains were defined at the highly conserved fold superfamily (FSF) level of structural classification (Kim and Caetano-Anollés).
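The idea of looking for structures common to all organisms can be sketched as a set intersection. This is only a presence/absence illustration, not the phylogenomic method the authors used; the organism names and fold labels below are hypothetical placeholders.

```python
def universal_structures(proteomes):
    """Return the structures present in every proteome (set intersection)."""
    return set.intersection(*proteomes.values())


def universal_fraction(proteomes):
    """Fraction of all observed structures that are shared by every proteome."""
    all_structures = set.union(*proteomes.values())
    return len(universal_structures(proteomes)) / len(all_structures)


# Hypothetical toy data: three organisms, six fold labels in total.
toy_proteomes = {
    "archaeon_A": {"fold1", "fold2", "fold3", "fold5"},
    "bacterium_B": {"fold1", "fold2", "fold4"},
    "eukaryote_C": {"fold1", "fold2", "fold3", "fold6"},
}

print(sorted(universal_structures(toy_proteomes)))
print(f"{universal_fraction(toy_proteomes):.0%} of structures are universal")
```

With hundreds of organisms, the universal fraction shrinks as more lineages are added, since a single loss anywhere removes a structure from the intersection.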
Organelles were thought to be the preserve of eukaryotes, but in 2003 researchers found an organelle called the acidocalcisome also occurred in bacteria. Caetano-Anollés' team has now found that tiny granules in some archaea are also acidocalcisomes, or at least their precursors. That means acidocalcisomes are found in all three domains of life, and date back to LUCA (Seufferheld et al.).
Acidocalcisomes were originally discovered in trypanosomes (sleeping sickness and Chagas disease) but have since been found in Toxoplasma gondii (toxoplasmosis), Plasmodium (malaria), Chlamydomonas reinhardtii (a green alga), Dictyostelium discoideum (a slime mould), bacteria and human platelets. Their membranes contain a number of protein pumps and antiporters, including aquaporins, ATPases and Ca2+/H+ and Na+/H+ antiporters. Acidocalcisomes have been implicated in osmoregulation. They were detected in the vicinity of the contractile vacuole in Trypanosoma cruzi and were shown to fuse with the vacuole when the cells were exposed to osmotic stress. Presumably the acidocalcisomes empty their ion contents into the contractile vacuole, thereby increasing the vacuole's osmolarity. This then causes water from the cytoplasm to enter the vacuole, until the latter gathers a certain amount of water and expels it out of the cell.
Fig 1e: Tangled web linking acidocalcisomes in extant archaea, bacteria and eucaryote species (Seufferheld et al.), overlaying electron micrographs of acidocalcisomes in Agrobacterium tumefaciens (a, b) and Methanosarcina acetivorans (c, d).
LUCA may have used RNA rather than DNA, as there is no evidence LUCA possessed ribonucleotide reductases, which create the deoxy versions of ribonucleotides, the building blocks of DNA (Lundin et al.). Rather, it appears these functions have been transferred from bacteria to archaea by horizontal transfer on at least two separate occasions (arrows in fig 1f). Eucaryotes (mid green) would also have received theirs after LUCA diversification.
Fig 1f: Ribonucleotide reductase trees showing bacterial, eucaryote and archaeal branches, with evidence of two events of horizontal transfer from bacteria to archaea (arrows) after the diversification of LUCA (Lundin et al).
LUCA was a "progenote". Progenotes can make proteins using genes as a template, but the process is so error-prone that the proteins can be quite unlike what the gene specified. Both Di Giulio and Caetano-Anollés have found evidence that systems that make protein synthesis accurate appear long after LUCA. In order to cope, the early cells must have shared their genes and proteins with each other. Caetano-Anollés says the free exchange and lack of competition mean this living primordial ocean essentially functioned as a single mega-organism.
Three Domains of Life
Life today is informationally based on the sequences of the four bases A, G, T and C in DNA, with messenger copies of the genetic sequence in mRNA (with U replacing T) forming intermediates in the assembly of proteins, the cell's primary active chemical and structural agents. This is achieved through a process of translation at the ribosome - a supra-molecular complex composed of some 50 chaperoning proteins surrounding a core composed of three rRNA units, fed by amino-acid coupled tRNAs. The RNAs carry out the essential function, supporting the idea that translation was at first a purely RNA-based process of protein construction. In line with this and other RNA fossils found particularly in Eukaryotes, it is widely believed that life began based on RNA, which shares both DNA's capacity for complementary replication and the ability of proteins to form 3-dimensional chemically reactive conformations. The ribosome then evolved, transferring the reactive burden on to proteins sequenced through the genetic code. Some time later, the informational genome was consolidated into more stable DNA.
Fig 2: The initial tree of rRNAs shows three distinct founding domains (Woese 1987)
Originally the Bacteria and Archaea were thought to be one large diverse family of prokaryotes, until Carl Woese (1977, 1978, 1987, 1990) and others investigated the evolutionary tree of ribosomal RNAs and found that there were three distinct founding evolutionary domains, then named eubacteria, archaebacteria and eukaryotes.
This gave the Eukaryotes a closer founding status as well, by contrast with the idea that the procaryotic bacteria came first and that the higher Eukaryote organisms then somehow arose from them with their complex cellular structures. These include, among others, the endoplasmic reticulum, nuclear envelope and Golgi apparatus (all parts of a common complex of internal membranous partitions), the architecture of microtubules, including centrioles and the Eukaryote flagellum, and the Eukaryotes' endosymbiont mitochondria and chloroplasts.
Fig 3: Key structural differences separating the larger rRNA units of the three domains (Woese 1987)
In addition to their evolutionary sequence divergence, the smaller 30S ribosomal RNAs of each domain show distinct structural features characteristic of their own domain, while also emphasizing structural links between Bacteria and Archaea on the one hand and Archaea and Eukaryotes on the other, qualitatively confirming the central place of the Archaea in the divergence.
Fig 4: Small and large ribosomal subunits of the eubacterium Thermus thermophilus and the archaeon Haloarcula marismortui.
RNA orange and yellow, protein blue and active site green. (Wikipedia Ribosome)
The validity of the RNA-era concept, and the capacity of RNAs to act as both replicating informational molecules and active ribo-enzymes, is emphasized by the continuing dependence of the ribosome on rRNA rather than on its protein components. The 3-dimensional realizations of the two subunits in fig 4 show that the rRNA molecules are still carrying out the central task of protein assembly, with only minor modification due to the 'chaperoning' proteins, despite 3.8 billion years of evolution.
Fig 5: Further elaboration of the rRNA tree (Pace 1997)
Norman Pace subsequently enlarged the scope and accuracy of the rRNA tree, including a greater diversity of organisms. This tree has become the basis of several other studies (see e.g. fig 10).
Fig 6: Lower right: A third rRNA tree which suggests Archaea lie very close to the root is contrasted with that for the enzyme HMGCoA reductase, which also shows evidence of horizontal transfer to an Archaean (ex Doolittle 2000).
The Copernican principle asserts that the Earth is a typical rocky planet in a typical planetary system, located in an unexceptional region of a common barred-spiral galaxy, hence it is probable that the universe teems with complex life. This is supported to a reasonable extent by the discovery of an increasing number of planets including some putative "Goldilocks" zone planets where water would be liquid and life as we know it could potentially exist. Set against this, the Rare Earth hypothesis argues that the emergence of complex life requires a host of fortuitous circumstances including a galactic habitable zone, a central star and planetary system having the requisite character, the circumstellar habitable zone, the size of the planet, the advantage of a large satellite, conditions needed to assure the planet has a magnetosphere and plate tectonics, the chemistry of the lithosphere, atmosphere, and oceans, the role of "evolutionary pumps" such as massive glaciation and rare bolide impacts, and whatever led to the still mysterious Cambrian explosion of animal phyla. This might mean that planets able to support a bacterial level of life are not so uncommon, but those supporting complex multicellular life might be.
Bringing this question to a pivotal crux in our context, the emergence of mitochondria as endosymbionts has been proposed to be a critical bottleneck which allowed complex life to evolve only once, because only in this effectively fractal cellular architecture can the membrane surface areas be achieved that are necessary to support the chemical reactions enabling the vastly larger number of genes in a complex organism's genome to maintain metabolic stability (Lane and Martin). Whether such endo-symbiosis is rare, or a common extreme of parasitic relationships, would then determine how likely or unlikely complex life might be.
Left: Bacterium Gemmata obscuriglobus with internal nuclear envelope and vacuoles (Rachel Melwig & Christine Panagiotidis / EMBL). Right: Ultrathin EM section of a mimivirus in amoeba (Jean-Michel Claverie) Inset: Mimivirus infected by Sputnik phage.
Offset against both the uniqueness of the mitochondrial endo-symbiosis and the closely linked, but independent, question of the origin of the nucleus and nuclear envelope, has been the discovery of mimiviruses and mamaviruses infecting amoebae (Raoult et al.) and related very large aquatic viruses such as CroV infecting single-celled plankton species. Despite their recent discovery, these appear from ocean gene analyses to be potentially ubiquitous in the oceans, possibly playing a crucial role in regulating atmospheric-oceanic pathways such as carbon sequestration (Fisher et al.). They occupy an intermediate genetic position between viruses and cells, having the largest viral genomes, which carry extensive cellular machinery and are larger than the smallest completely autonomous bacterial and archaeal genomes.
As an illustration of genes in mimivirus normally appearing only in cellular genomes, the mimivirus has genes for central protein-translation components, including four amino-acyl transfer RNA synthetases, peptide release factor 1, translation elongation factor EF-TU, and translation initiation factor 1. The genome also exhibits six tRNAs. Other notable features include the presence of both type I and type II topoisomerases, components of all DNA repair pathways, many polysaccharide synthesis enzymes, and one intein-containing gene. Inteins are protein-splicing domains encoded by mobile intervening sequences (IVSs). They self-catalyze their excision from the host protein, ligating their former flanks by a peptide bond. They have been found in all domains of life (Eukarya, Archaea, and Eubacteria), but their distribution is highly sporadic. Only a few instances of viral inteins have been described. Self-splicing type I introns are a different type of mobile IVS, self-excising at the mRNA level. They are rare in viruses. Mimivirus exhibits four instances of self-excising introns, all in RNA polymerase genes.
Evolutionary diversification of Mimiviruses from nucleocytoplasmic large DNA viruses (Fisher et al.) and in relation to the three domains of cellular life based on the concatenated sequences of seven universally conserved protein sequences (Raoult et al.)
Mimiviruses also host parasitic virophages, affectionately named Sputnik as viral satellites, which piggyback on the metabolism of the large viral factories set up by these giant viral genomes, causing the mimiviruses to sicken. These virophages also contain genes that are linked to viruses infecting each of the three domains of life, Eukarya, Archaea and Bacteria (La Scola et al.). It has thus been suggested that they have a primary role in the establishment of cellular life and that they may have been instrumental in the emergence of the nuclear envelope.
Tangled Roots of Horizontal Transfer
Despite the division into three domains, further investigations of proteins in the three domains began to reveal a much more confused and complicated picture. Firstly, the ribosomal proteins, like the rRNAs, show distinct, easily differentiated morphologies, with some correspondences linking one pair of domains and others another pair (Forterre 2006b, Woese 2000). Secondly, the proteins in Eukaryotes appear to have a mixed origin, with the informational ones having an evolutionary relationship with Archaea but the metabolic enzymes appearing to have a bacterial origin. This suggests that the Eukaryote genome has either resulted from one or more symbiotic fusions, e.g. of an archaeal and a bacterial genome, and/or that there has been a high degree of horizontal gene transfer between bacteria and Eukaryotes.
Fig 7: Evolution of iron-sulphur cluster proteins of mitochondria is linked to α-proteobacteria (Emelyanov 2003)
The evidence for symbiotic inclusions is clear from the fact that all Eukaryotes, except for a few primitive anaerobic varieties such as the metamonad human gut parasite Giardia lamblia, have endosymbiotic respiring mitochondria, which are evolutionarily related to α-proteobacteria such as Rickettsiae (Emelyanov 2003). Plants also have photosynthetic chloroplasts derived from cyanobacteria. α-proteobacteria, including Rickettsiae (and related Wolbachia and Agrobacterium), obligately live in the cytoplasm of other cells and so are naturally adapted to becoming an endo-symbiont of a glycolytic organism, providing respiratory energy to the host's metabolism and resulting in the mitochondrion. Giardia still retains traces of mitochondrial proteins, so it appears to have lost its respiring organelles rather than occupying a place in the tree before mitochondria were incorporated into eucarya (Adam 2000).
Fig 8: Tangled roots (Doolittle 2000)
The picture of horizontal transfer is even more tangled in bacterial and archaeal genomes, which contain a great number of shared and exchanged genes, probably through viral transfer.
This has led Doolittle (1998), Woese (2002) and others to propose a tangled root to the tree of life, involving a transition from a regime in which there was a much higher rate of horizontal exchange and effective global optimization of genomes to tree-like vertical evolution of genomes, once the more complex genomes of the Eukaryote domain became established.
Fig 9: High resolution tree of the three domains of life (Ciccarelli, Bork et al. 2006).
Purple eubacteria, green archaea, red Eukaryotes.
Subsequently, to try to clarify the taxonomic relationships founding the tree of life, Peer Bork and his team produced a refined evolutionary tree by selecting only universal proteins that had not been subjected to horizontal transfer, providing the most detailed tree root diagram to date, although admittedly on only a skeleton gene set comprising some 1% of the respective genomes. The phylogenetic tree has its basis in a cleaned and concatenated alignment of 31 universal protein families and covers 191 species whose genomes have been fully sequenced. Merhej et al. have further demonstrated convergent evolution among specialized bacterial groups.
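What a concatenated alignment means can be shown with a minimal sketch, assuming each protein family has already been aligned separately. The function name, species labels and sequences are hypothetical, and padding a missing species with gaps is an assumption made here for robustness rather than a detail taken from the study, which used only universal families.

```python
def concatenate_alignments(families, species):
    """Join per-family alignments end to end into one long row per species.

    families: list of dicts mapping species name -> aligned sequence;
              all sequences within one family must have equal length.
    species:  the species to include, in output order.
    """
    supermatrix = {name: "" for name in species}
    for alignment in families:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a family must be equal length")
        width = lengths.pop()
        for name in species:
            # Assumption: a species lacking this family is padded with gaps.
            supermatrix[name] += alignment.get(name, "-" * width)
    return supermatrix


# Hypothetical toy alignments for two protein families.
family_1 = {"sp1": "MKV-L", "sp2": "MRVAL", "sp3": "MKIAL"}
family_2 = {"sp1": "GG-S", "sp3": "GGTS"}

combined = concatenate_alignments([family_1, family_2], ["sp1", "sp2", "sp3"])
for name, row in combined.items():
    print(name, row)
```

The resulting rows all have the same length, so a single tree can be inferred from the combined columns instead of one tree per gene.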
Fig 9b: Tree diagram of the birth, transfer, duplication and loss of key genes in the redox and electron transport pathways, in a founding burst of gene evolution between 3.3 and 2.7 billion years ago (David and Alm 2010).
Then Lawrence David and Eric Alm (2010) produced the above tree investigating the central genes common to a wide spectrum of life forms, involving the founding steps of redox reactions and electron transport, demonstrating a rapid evolutionary innovation during an Archaean genetic expansion between 3.3 and 2.7 billion years ago. They mapped the evolutionary history of 3983 gene families that occur in a wide range of modern species. They were able to show that 27 per cent of these gene families appeared in a short evolutionary burst. Many of the genes from this time were involved in electron transport - a key step in respiration and photosynthesis, which ultimately led to oxygen-producing photosynthesis and the "great oxygenation event" 2.4 billion years ago, when the atmosphere became oxygen rich.
This lends support to the idea that the collective primordial genome functioned as a supercomputer (King 2010) based on parallel genetic algorithms combined with horizontal genetic transfer, whose bit computation rate through mutation and recombination is sufficient to generate the functional conformations, through protein folding, to solve the key metabolic pathways over a period no longer than 300 million years.
Bacteria engage in much more radical forms of pan-sexuality than higher organisms, involving viruses and plasmids, themselves separate mobile genetic elements, acting as agents of genetic transfer, accelerating the pace of bacterial evolution (Maxmen 2010). This enables the genetic sequences of bacteria, archaea and protists to move around in the genome and to be exchanged between cells, and even between different species. Sexual exchange of material can happen both through viral exchange and through a conjugation plasmid, which can spool DNA from one bacterium into another, resulting in a net donation of genes from one strain or species to another, which ensures a broad exchange of genetic material throughout bacterial ecosystems, resulting in rapid accumulation of advantageous genes exemplified by plasmid borne infectious drug resistance.
To give a very rough idea of the computing power of the combined bacterial genome alone, taking into account bacterial soil densities (~10^9/g), effective surface area (~10^18 cm^2), genome sizes (~10^6), combined reproduction and mutation rates (~10^-3/s) gives a combined presentation rate of new combinations of up to 10^30 bits per second, roughly 10^13 times greater than the current fastest computer at 2 petaflops or about 10^17 bit ops per second. Corresponding rates for complex life forms would be much lower, at around 10^17 per second because they are fewer in total number and have lower reproduction rates and longer generation times, but they are still vying with the computation rates of the fastest supercomputer on earth.
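The order-of-magnitude arithmetic above can be checked with a short sketch. Two assumptions are made here and are not stated in the text: that multiplying a per-gram density by an area amounts to counting roughly one gram of soil per square centimetre, and that a floating-point operation is counted as 64 bit operations.

```python
import math

# Order-of-magnitude factors quoted in the text, stored as powers of ten.
# Assumption: roughly one gram of soil is counted per cm^2 of surface,
# so that density (per gram) times area (cm^2) gives a number of cells.
FACTOR_EXPONENTS = {
    "bacterial cells per gram of soil": 9,
    "effective surface area (cm^2)": 18,
    "genome size (bases)": 6,
    "reproduction and mutation rate (per second)": -3,
}

# Multiplying powers of ten means adding their exponents.
bacterial_exponent = sum(FACTOR_EXPONENTS.values())

# 2 petaflops = 2e15 operations per second; assuming 64-bit operands
# (an assumption) this is about 1e17 bit operations per second.
supercomputer_exponent = round(math.log10(2e15 * 64))

advantage_exponent = bacterial_exponent - supercomputer_exponent

print(f"bacterial genome: ~10^{bacterial_exponent} bits per second")
print(f"supercomputer:    ~10^{supercomputer_exponent} bit ops per second")
print(f"ratio:            ~10^{advantage_exponent}")
```

Working in exponents keeps the estimate honest about its precision: each factor is only known to within a power of ten or so, and the same holds for the result.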
This picture of bit rates coincides closely with the Archaean expansion scenario noted above and suggests that evolution has been a two-phase process. In the first phase, the collective single-celled genome, with its much higher bit rates under promiscuous sexuality and horizontal transfer, arrived at a global genetic solution to the protein folding problems of the central metabolic, electro-chemical and even root developmental pathways. In the second, multi-celled organisms capitalized on these solutions through gene duplication and loss, as well as the creation of new specialized genes, at a much lower rate.
Fig 10: Horizontal transfers across the bacterial tree under two thresholds, 10 and 5 genes (Dagan et al.).
The massive extent of horizontal transfer in eubacteria, as well as archaea, has also become clear, suggesting that large components of procaryote genomes are effectively globally optimized for their niches by frequent genetic transfer. Dagan et al. have characterized the extent of horizontal transfer for a series of thresholds, as well as establishing specific modularity of horizontal transfer of functions between groups.
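Viewing horizontal transfer under a series of thresholds can be sketched as filtering the links of a network by the number of genes shared between two lineages. The lineage labels and counts below are hypothetical, and treating the threshold as "at least this many genes" is an assumption, since the text does not say how Dagan et al. define it.

```python
def transfers_above_threshold(shared_gene_counts, threshold):
    """Keep only lineage pairs linked by at least `threshold` shared genes."""
    return {
        pair: count
        for pair, count in shared_gene_counts.items()
        if count >= threshold
    }


# Hypothetical counts of laterally shared genes between pairs of lineages.
toy_counts = {
    ("A", "B"): 12,
    ("A", "C"): 7,
    ("B", "C"): 3,
    ("C", "D"): 5,
}

for threshold in (10, 5):
    kept = transfers_above_threshold(toy_counts, threshold)
    print(f"threshold {threshold}: {sorted(kept)}")
```

Lowering the threshold adds weaker links to the picture, which is why the network drawn at 5 genes is denser than the one drawn at 10.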
Fig 11: The evolutionary root of the tree of life and its diversification into archaea, bacteria and Eukaryotes appears to have gone through an early period of cool temperature consistent with an RNA era, followed by a hot period (Ananthaswamy, Boussau et al.).
Surviving Archaea are known to inhabit extreme environments, including hot volcanic pools, hydrothermal vents and extremely salty environments, and several arrangements of the root of the tree, including the work of Bork's team, suggest a hot origin for life. However, other research (Brochier and Philippe, Boussau et al.) concludes the base root may have been at about 25 °C, a more viable temperature for a simple RNA metabolism, with a succeeding period of high temperature adaptations shortly after the differentiation of the three domains in evolutionary time.
Fig 12: Genetic diffusion at the root of the tree (Dagan and Martin)
Critics of the validity of the tree root concept, such as Dagan and Martin, emphasize the small proportion (1%) of the genome used in Bork's study and stress both the lateral (or horizontal) gene transfer events uniting the prokaryote realms and the apparent chimaeric nature of the Eukaryote genome, which appears to contain both archaea-related informational genes and eubacterial metabolic ones, in addition to the obvious endosymbiont events of the mitochondrion and chloroplast.
Fig 13: Pattern of invasions of the SPIN element (Lisch)
The case for horizontal transfer of genes between unrelated Eukaryote species through infectious elements invading new and hence non-resistant species is also well established. The SPIN element is present in a diverse unrelated set of species, spanning amphibians, reptiles, marsupials and mammals while absent from closely related species (Lisch).
The Eukaryote Nuclear Genome as a Genetic Fusion
Fig 14: The proposed ring of life (Rivera and Lake)
In a further study, Rivera and Lake used a new algorithm to take account of possible genetic fusion events by forming a genetic ring through matching partial trees into a most parsimonious whole, inferring that the Eukaryote genome has arisen from a fusion of an archaeal (possibly eocyte) genome with that of either a cyanobacterium or possibly a γ-proteobacterium.
The method used cannot definitively determine whether or not the eubacterial genome could have come from the mitochondrial event, which, to an even greater extent than the more recent chloroplast, has resulted in a high net transfer of genes from the mitochondrial chromosome to the nucleus. This leaves open the possibility that in addition to the mitochondrial symbiosis, and the later chloroplast one, there may have been an additional genetic fusion. Lane and Archibald have cited further major endosymbiosis events involving complex three-genome interaction in protista, where both green and red algae have been incorporated by endosymbiosis into other protists. These demonstrate both that endosymbiosis has occurred many times and the genomic complexity of nuclear symbiont gene exchange.
Fig 15: Proposed fusion between two genomes - informational, from Archaea (red), and metabolic, from Eubacteria (blue) as well as mitochondrial genes migrating to the nucleus (green) (Horiike et al.). Up to 75% of nuclear genes whose ancestry has been elucidated may come from bacteria (Lane).
The idea of a genetic fusion between an archaeon and a γ-proteobacterium is also supported by several other lines of research. These include the evolution of glycolytic enzymes (Emelyanov); a 'homology-hit analysis' of non-mitochondrial genes, which determined the number of yeast ORFs orthologous to the ORFs in six Archaea and nine Bacteria in each functional category at several thresholds, suggesting an archaeal parasite engulfed by a eubacterium (Horiike et al.); and the proposal that close association between a central methanogenic archaebacterium (archaea) and a close-knit surrounding clump of ancestral sulfate-respiring δ-proteobacteria could also have led to the nucleus and endoplasmic reticulum (Moreira and Lopez-Garcia).
An intriguing question is raised by the bacterium Gemmata obscuriglobus, a member of the phylum Planctomycetes (see fig 9 enlargement). It appears to have both a nuclear envelope and endoplasmic reticulum-like intra-cellular membranes, and the ability to take up proteins present in the external milieu in an energy-dependent process analogous to eukaryotic endocytosis. This is consistent with autogenous evolution of endocytosis and the endomembrane system in an ancestral noneukaryote cell (Lonhienne et al.).
Fig 16: Gene replacement tree root (Makarova). Hartman and Fedorov's list of putative chronocyte genes corresponds to the 359 above.
Other theories stick to the three domain paradigm and propose that a primitive Eukaryote precursor, possibly including genes for endoplasmic reticulum and microtubules, engulfed both an archaeon and a eubacterium. This precursor may still have retained an RNA-based genome, as Woese (1998) suggested might be the case for the progenote (first root life form) that was the last universal common ancestor of all three domains. Hartman and Fedorov cite a collection of such genes, including those for ribosomal proteins as well, naming the organism a chronocyte.
This is also consistent with the much greater complexity of use of RNA in Eukaryotes, including alternative splicing, the use of introns, interfering-RNAs in gene regulation, micro-RNAs and the use of the nucleus to contain a diversely functioning RNA informational metabolism not unlike that of a putative progenote.
A more detailed evolutionary picture showing the relationship between metazoa and protozoans and the bridge between choanoflagellates and sponges has been developed in association with elucidating the genome of the single-celled choanoflagellate Monosiga brevicollis (King N et al. 2008 The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans Nature 451 783-4 doi:10.1038/nature06617).
Trees of eucaryotes based on translation elongation factor EF-2 and β-tubulin genes (King and Carroll)
Viral Influences on the Nuclear Genome
Fig 17: Proposed viral contribution of DNA polymerases FvA etc. founder viruses (Forterre)
Other cellular and viral genealogies are possible and the scheme is merely representative.
Forterre looks likewise to a three component origin, but his emphasis is on the idea that viruses have contributed major components to the genome of all three groups, possibly providing each of three RNA-based cell lineages with independent transitions to DNA-based genomes by contributing DNA-polymerases, thus radically improving the stability and competitiveness of these cell lines, which became the eventual survivors. In addition to the ribosomal proteins and rRNAs having distinct qualitative features in each domain, many DNA informational proteins exist in different nonhomologous families (usually with several versions for one family). There are already six known nonhomologous families of cellular DNA polymerases. In the case of DNA polymerases of the B family, there is one version in Bacteria (only found in some proteobacteria), one in Archaea, and several in Eukarya. The distribution of the different versions and families of cellular DNA informational proteins among domains is erratic most of the time and does not fit with any of the models proposed for the universal tree, suggesting abrupt insertion into the cellular genomes by viral transfer.
First glimpse at the viral birth of DNA (New Scientist Apr 2012) DNA-based RNA-viral genes in endemic viruses show reverse transcriptase has made multiple RNA to DNA transitions of other viral genes, probably when RNA, DNA and retroviruses cohabited cells.
Evolution of viruses such as phages appears to have no evolutionary tree with genomes across widely diverse habitats consisting of cut and paste components implying viral adaption has resulted almost entirely from utilization of advantageous genes from horizontal transfer. Around 10% of all bacterial genes sequenced to date consist of ORFans that bear no resemblance to genes seen anywhere else, suggesting horizontal viral origin (Hamilton).
Fig 18: Evolutionary tree of DNA polymerase amino termini (Villareal and Defilippis) (click to enlarge).
Villareal and Defilippis have likewise investigated the idea that DNA viruses are the origin of DNA replication proteins, by investigating the amino terminus and constructing an evolutionary tree which shows DNA polymerases of DNA viruses, eukaryotes (α, δ), archaea, E. coli and two phages rooted in a tree consistent with a viral origin.
This idea has a great deal of plausibility because viruses are now known to have a potentially primal origin, rather than being recent escapees from cellular genomes which have undergone reductive parasitic changes to their genome. Viruses clearly also have retained RNA-RNA, DNA-DNA and retrotranscription DNA-RNA-DNA replication, using both RNA and DNA stages in their capsid viral forms, so they retain all the transitional states between RNA and DNA-based replication.
Fig 19: Bacterial DNA polymerases also show viral members (underlined) close to the root of the tree (File et. al.).
Furthermore the retroviruses and related mobile genetic elements have a common ancient evolutionary origin, which is related to telomerase, which itself uses an RNA template to extend chromosome ends during chromosome duplication. There is thus a plausible case that telomerase is in fact a biological fossil of a retroviral conversion of the founding Eukaryote cell line to a DNA genome.
The Symbiotic Face of Eukaryote Mobile Elements
Fig 20: (Left) Human transposable element evolutionary history of L1-LINEs (cream), Alu elements (lt. blue), retrovirus-like LTR (long-terminal repeat) elements (green) and DNA transposons (dark brown). Older L2-LINEs and SINEs are in yellow and dark blue. This history extends back over 200 million years indicating the very ancient basis of this potentially symbiotic relationship (Human Genome Consortium) (click to enlarge). Comparison with the mouse genome can be accessed at Waterston et. al. (2002). (Right) L1 replication: L1 is transcribed in open reading frames ORF1, an RNA-binding protein, and ORF2 an endonuclease/reverse-transcriptase. The bound RNA-protein complex RNP is transported to the nucleus where target-primed reverse transcription to chromosomal DNA takes place (Han and Boeke).
In 1978, following the work of Darryl Reanny (1974-6), I proposed (1978, 1992) that viruses and transposable elements, far from just being selfish genes (Dawkins 1976), formed part of a dynamical system of genetic symbiosis between the hosts and the mobile genetic elements, because the mobile elements permitted forms of coordinated gene expression and the formation of new genes in a modular manner, which would otherwise be impossible, achieving in return perpetuation of their own genomes over evolutionary time scales. Most of the details of this proposal have since been borne out. The ENCODE project has demonstrated involvement of all the major classes of human transposable element in regulatory enhancer activity, most specific to a single cell type (Thurman 2012).
Fig 20b: ENCODE data showing involvement of the major classes of transposable element in enhancer activity (Thurman 2012).
By some reckonings, 40 to 50 per cent of the human genome consists of DNA imported horizontally by viruses, some of which has taken on vital biological functions. Taken together, virus-like genes represent a staggering 90 per cent of the human genome (Hamilton). Coding sequences comprise less than 5% of the human genome, whereas repeat sequences account for at least 50% and probably much more. Transposable LINEs (long interspersed nuclear elements), retroelements common to mammals (Han and Boeke) and insects (Jensen et al, Sheen et al), with a history running back to the Eukaryote origin, are specifically activated in both sperm and eggs during meiosis (Branciforte and Martin, Tchénio et. al., Trelogan and Martin), although subjected to down regulation by interfering piRNAs (Aravin et al). They replicate from transcribed RNA copies of themselves, thus using RNA to instruct DNA copies, indicating an origin in RNA-based life, as does the active RNA processing of our own Eukaryote cells. Their RNA-based reverse transcriptase shows homologies with the telomerase essential for maintaining immortality in our germ line, indicating a common and symbiotic origin. 100,000 partially defective LINEs, around 100 of which remain fully active in humans, and their 300,000 dependent smaller fellow traveller Alu SINEs make up a significant portion of the human and mammalian genomes, along with pseudogenes, apparently defective copies of existing genes translocated by elements such as LINEs.
These elements travel passively down the germ line with chromosomal DNA, so their specific activation during meiosis suggests they may perform a role of coordinated regulatory mutation. This suggests that the type of symbiotic sexuality embraced by bacteria and plasmids also continues to function in higher organisms in a form of sexual symbiosis between our chromosomes and transposable genetic elements. This is consistent with the 1.4% point mutation divergence between humans and chimps being overshadowed by an additional 3.9% divergence, to 5.4% overall (Britten), when insertions and deletions are accounted for.
SINEs such as human Alu, which is derived from the small cellular RNA used to insert nascent proteins through the membrane and is a free-rider on the LINE reverse transcriptase, are in turn implicated in active functional genes (Reynolds, Schmid), particularly some involved in cellular stress reactions, again suggesting genetic symbiosis. Humans have about 13 times as many RNA edits as non-primate species, including inosine insertions associated with Alu elements, as well as intron deletions (Holmes) and newly inserted exons (Ast), which may differentiate humans from other apes through alternative splicing of genes expressed in the brain. RNA editing is abundant in brain tissue, where editing defects have been linked to depression, epilepsy and motor neuron disease. There is a new Alu insert about every 100 births. As many as three quarters of all human genes are subject to alternative splice editing.
The recent explosion of research into interfering miRNAs as regulatory elements in gametogenesis and development (Großhans) has provided an explanation of how pseudogenes, including those retrotransposed via LINE elements, can gain functional regulatory significance even though they do not produce translatable mRNAs.
Fig 21: Pseudogene-mediated production of endogenous small interfering RNAs (endo-siRNAs). Pseudogenes can arise through the copying of a parent gene (by duplication or by retrotransposition). (a) An antisense transcript of the pseudogene and an mRNA transcript of its parent gene can then form a double-stranded RNA. (b) Pseudogenic endo-siRNAs can also arise through copying of the parent gene as in a and then nearby duplication and inversion of this copy. The subsequent transcription of both copies results in a long RNA, which folds into a hairpin, as one half of it is complementary to its other half. In both a and b, the double-stranded RNA is cut by Dicer into 21-nucleotide endo-siRNAs, which are guided by the RISC complex to interact with, and degrade, the parent gene's remaining mRNA transcripts. The mRNA from genes is in red and that from pseudogenes is in blue. Green arrows indicate DNA rearrangements (Sasidharan and Gerstein).
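The pathway in Fig 21(a) can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the sequences are invented, the pseudogene is taken to be a full-length copy of its parent with one base change, Dicer is modelled as cutting consecutive 21-nucleotide fragments, and a fragment is assumed to guide degradation only if it is perfectly complementary to the parent mRNA. The function names are hypothetical.

```python
# Toy model of pseudogene-derived endo-siRNAs (Fig 21a).
# All sequences and function names are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the antisense strand, read 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def dice(rna: str, size: int = 21) -> list:
    """Cut an RNA into consecutive fragments of `size` nt, dropping any short remainder."""
    return [rna[i:i + size] for i in range(0, len(rna) - size + 1, size)]


def silencing_guides(parent_mrna: str, pseudogene_antisense: str, size: int = 21) -> list:
    """Keep the diced fragments that are perfectly complementary to the parent mRNA."""
    return [frag for frag in dice(pseudogene_antisense, size)
            if reverse_complement(frag) in parent_mrna]


# Invented 42-nt parent transcript and a pseudogene copy with one base change.
parent_mrna = "AUGGCUAGCUAGGAUCCGAUC" + "GAUUAGCCGAUAGCUAGCUAA"
pseudogene = parent_mrna[:24] + "G" + parent_mrna[25:]

antisense_transcript = reverse_complement(pseudogene)
guides = silencing_guides(parent_mrna, antisense_transcript)
```

In this toy case the fragment derived from the mutated half of the pseudogene no longer matches the parent, while the fragment from the unchanged half still does, so sequence drift in a pseudogene would gradually reduce, rather than abruptly abolish, its regulatory reach.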
Although the data from the human genome project indicated that human LINEs are becoming less active as a group by comparison with the corresponding elements in the more rapidly evolving mouse genome, there remain about 60 active human LINE elements which are known to be responsible for mutations in humans. More recent investigation (Boissinot et. al.) shows that the most recent families are highly active. Around four million years ago, shortly after the chimp-human split, a new family Ta-L1 LINE-1 emerged and is still active, with about half the Ta insertions being polymorphic, varying across human populations. Moreover 90% of Ta-1d, the most recent subfamily, are polymorphic, showing highly active LINEs remain present. LINEs are more heavily distributed on the sex chromosomes, with X chromosomes containing 3 times as many full length potentially active elements and the Y chromosome 9 times as many! This is consistent with a continuing mutational load on humans which is removed more slowly from the sex chromosomes by crossing over, in proportion to the degree to which crossing over is inhibited in each (i.e. totally on the Y and largely in males in the X but not in females). Sexual recombination protects against the progressive accumulation of deleterious mutations known as Muller's ratchet.
Fig 22: Evolution of reverse transcriptases from a common ancestor bearing a LINE archetype (Xiong and Eickbush, Nakamura et. al.). The root of their evolution goes back to the transfer from RNA to DNA at the beginning of life. They form a complementary evolutionary tree to that of cellular life as genetic symbionts of metazoa travelling down the germ line. Their group includes telomerases essential to the reproductive cycle.
LINEs are preferentially expressed in both steroidogenic and germ-line tissues in mice (Branciforte and Martin, Trelogan and Martin), suggesting stress could interact with meiosis. L1 expression occurs in embryogenesis, at several stages of spermatogenesis including leptotene, and in the primary oocytes of females poised at prophase I. Conversely the SRY-group male determining gene SOX has been found to regulate LINE retrotransposition (Tchénio et. al.). Similarly LINE elements have been proposed to be 'boosters' in the inactivation of one X chromosome that happens in female embryogenesis (Lyon). This could enable somatic stress to have a potential effect on translocation in the germ-line, which might enable a form of genetic adaptation in long-lived species such as humans. LINEs have diverse means both to cause mutational damage and to generate novel alleles (Han and Boeke).
Both L1 and Alu elements may be able to self-regulate rates of replication, through the existence of stealth drivers, viable elements which maintain a low transcription rate of active elements, with little genomic impact and hence little negative selection. These occasionally seed daughter master elements, which may replicate actively to form new families when conditions permit. This picture is consistent with long periods of quiescence, punctuated by bursts of 'saltatory' replication leading to large copy numbers (Han et. al.).
Further evidence of a symbiotic relationship comes from Drosophila telomeres, which are maintained by the non-LTR retrotransposons, the LINE-like TART (Jensen et al, Sheen et al) and HeT-A (Biessmann et al). Likewise the recombination activating gene protein RAG1/RAG2, essential for the mutational variability of the vertebrate immune system, appears to have evolved from an ancient DNA transposon common to the metazoa (Agrawal et al). Significant similarities exist in the catalytic proteins of Hermes hAT transposase in insects, the V(D)J recombinase RAG, and retroviral integrase superfamily transposases, thereby linking the movement of transposable elements and V(D)J recombination (Zhou et al).
Fig 22b: Transib evolutionary tree spans the eucaryotes (Kapitonov and Jurka).
The approximately 600-amino acid 'core' region of RAG1 required for its catalytic activity is significantly similar to the transposase encoded by DNA transposons of the Transib superfamily discovered recently based on computational analysis of the fruit fly and African malaria mosquito genomes. Transib transposons also are present in the genomes of sea urchin, yellow fever mosquito, silkworm, dog hookworm, hydra, and soybean rust (Kapitonov and Jurka).
The G-protein linked receptor family, a very ancient gene family and one of the largest in the human genome, illustrates the varying pace of evolutionary change. It has roots going back to the first eucaryotes, with the two major types of serotonin receptor, 5HT1 and 5HT2, originating between 750 million and 1 billion years ago, before the molluscs, arthropods and vertebrates diverged. Consequently serotonin functions in mood and circadian rhythms in a similar manner in insects and humans. The diversity of neurotransmitters in humans, particularly the amines serotonin, dopamine, norepinephrine and histamine and the amino acids glutamate, gamma-amino-butyric acid and glycine, originates from the need of single celled eucaryotes to communicate major pathways of strategic survival from nutrition through aversion to reproduction and sporulation. The family also includes the opsins of visual perception, development receptors and a diverse array of olfactory receptors which are evolving far more rapidly. Serotonin appears with the first photosynthetic bacteria that used tryptophan to hold the porphyrin reactive center and continues, along with melatonin, to play a crucial role in light and circadian cycles in humans, as well as mood and social responsiveness.
Fig 22c: (a) The two major serotonin receptor types 5HT1 and 5HT2 separated before the molluscs, arthropods and vertebrates diverged (Blenau & Thamm). (b) Evolutionary tree of the human G-protein linked receptors with examples highlighted in color. On the α branch are amine receptors - serotonin 5HT1A and 5HT2A, dopamine D1, and D2 (DRD1, DRD2), adrenergic α2a (ADRA2A), muscarinic acetylcholine (CHRM2), trace amine TAR1, as well as rhodopsin (RHO) and encephalopsin (OPM3). On the glutamate branch are metabotropic glutamate mGluR2 and GABA GABBR1. On the β branch is oxytocin (OXTR) surrounded by vasopressin receptors and Ghrelin. On the γ branch are opioid κ and μ (OPRK1, OPRM1). Olfactory and the non-rhodopsin receptors are linked to their respective points on the rhodopsin family tree. (Fredriksson R et al, Zozulya S. et al). (c) Insect tree of receptors for serotonin, dopamine, tyramine and octopamine neurotransmitters (Blenau & Baumann).
The evolution of the metazoan sodium channel, essential for the neuronal action potential, from the calcium channel shared by fungi and animals occurred in single celled eucaryotes before the metazoa evolved from choanoflagellate-like ancestors. Metazoan eyes also appear to have a common origin, as indicated by the capacity of both jellyfish and mouse pax genes to elicit ectopic compound eyes on fruit flies.
Fig 22d: (a) Common involvement of PAX genes in eye formation from jellyfish, insects and vertebrates suggests a single common origin despite the differing mechanisms. The jellyfish pax genes, like mouse pax-6, induce ectopic compound eyes in fruit fly (right) (Suga et al, Kozmik et al). The small camera eyes of a jellyfish are shown top right (yellow arrow). It has even been suggested that the jellyfish could have gained the eye development pathway through symbiosis with certain single-celled dinoflagellates which possess an eyespot ocelloid, complete with lens and retinoid organelle (lower right), and may have in turn inherited this functionality from cyanobacterial chloroplasts via red algae (Pennisi, Keim). (b) Evolutionary diversification of Na+ channels from Ca++ channels, essential for the action potential, appears to have occurred before the existence of nervous systems in founding single-celled eucaryotes leading to the metazoa before the choanoflagellates such as Monosiga (Liebeskind et al).
Endogenous retroviruses, or ERVs, also travel down the germ line as free-riders, although some may retain infectious capacity. They may be essential for placental function: every mammal tested has placental blooms of endogenous retroviruses, which appear to aid both the formation of the syncytium, the super-cellular fused membrane that enables diffusion from the mother to the baby, and the immune suppression which prevents rejection of the embryo, both characteristics of retroviruses such as HIV.
Mi and colleagues (2000) found a placental gene whose sequence was homologous to several retroviral envelope proteins. The sequence, now called syncytin, is identical to the envelope protein of the HERV-W retrovirus (Blond et. al.) which exists in around 40 apparently defective viral copies, including those in which the two syncytin viral env genes are fully functional (Mi et. al.). Syncytin is expressed at high levels in the syncytiotrophoblast (and at low levels in the testes) and nowhere else. Most of the other genes of the provirus have been mutated, suggesting that the envelope glycoprotein function was specifically selected. If cultured cells are made to express syncytin, they will fuse together, and this fusion can be blocked with antibodies against syncytin. HERV-W is only found in primates, but mice have similar retroviral blooms and ERV-related Syncytin genes have also been found in them (Dupressoir et. al.). The ability of mammals and thus ourselves to form a viable placenta and give birth to live young may thus depend on the mammals having harnessed a viral gene somewhere in our evolutionary lineage.
Magiorkinis G et. al. (2012) have verified that the loss of the Env gene, which enables cell infection, is associated with super-amplification of germ-line retroviral elements by a factor of about 30, as exogenous retroviruses switch to endogenous modes of selection. They have investigated the widespread occurrence of retroviruses, including intracisternal A-particles, or IAPs, across the diversity of mammal groups. Notably, following Jern and Coffin (2008), the evolutionary tree of retroviruses both spans the vertebrates and includes both endogenous and exogenous habits, rooted in exogenous viral types.
Fig 23: (a) The seven retroviral genera: alpha-, beta-, gamma-, delta-, epsilon-, lenti-, and spuma-like retroviruses and their intermediate groups based on Pol sequences. Black branches indicate viruses known only in exogenous infectious forms (XRV); red branches indicate viruses present in both XRV and endogenous (ERV) forms; and blue branches indicate ERVs (Jern and Coffin 2008). (b) Phylogeny of mammals with ERV megafamilies shown as colored circles, with area proportional to the percentage of the ERV loci in the genome represented by that family (Magiorkinis G et. al. 2012). When retrotransposons are included (fig 22) they extend to all Eukaryote realms.
The defective copies of endogenous retroviruses may also serve to protect the host against further infection by becoming transcribed and causing incorporation of defective elements into the replicating virus (Best et. al.).
The Cambrian Radiation, Homeotic Genes, Metamorphosis and Hybridization
One of the most stunning and puzzling aspects of evolution is the Cambrian radiation some 550 million years ago, which over a very short period of geological time gave rise to the major phyla of multicellular animals we see today. This radiation forms the core of the evolutionary tree of fig 1. The previous evolutionary epoch, the Ediacaran, by contrast has far fewer and less elaborate fossil forms, such as Charnia, Dickinsonia and Spriggina in fig 1, and in particular few organisms with well-preserved mineral skeletons.
There have been many proposals why such a rapid and abundant radiation could have occurred, including geological scenarios involving the ending of a snowball earth epoch in which the earth became frozen and thus reflected radiant heat, until rising CO2 levels caused a rapid thawing, setting off a major expansion of cyanobacteria, filling the atmosphere with oxygen and changing the ocean from an acidic state with dissolved iron and little oxygen to one capable of harbouring diverse forms of multicellular life.
Fig 24: Homeotic genes specifying differentiation along the bodily axis have closely related sequences and are organized in a parallel scheme in arthropods and vertebrates. Mutations of related genes in maize cause disruption of leaf development.
However the underlying reasons may be genetic and more to do with the evolution of a pangenomic algorithm for generating the multicellular body plan based on homeotic and related developmental genes which are highly conserved and spread across the major animal and even plant kingdoms. Closely related schemes of homeotic genes drive the vertebrate and arthropod development along the bilateral axis, being involved in segmentation and notochord differentiation.
Fig 25: Ectopic compound eye on the leg of a fruit fly induced by mouse pax-6.
For example pax-6, a gene involved in eye formation in the mouse, will induce ectopic eyes on the fruit fly, intriguingly the compound eyes insects usually have, indicating a deep commonality between the genes organizing the body plan in these two pivotal phyla.
This suggests it may have taken evolution a considerable time to come up with such an algorithmic regulatory process, but that once it came into play, it permitted almost symphonic variations, leading to the diversity of the major animal phyla over a short geological time.
Pivotally, given a context in which such organisms were still in the early stages of genetic diversification in their radiative adaption, it is possible that occasional wholesale horizontal transfer of whole genomes may have occurred as a result of adventitious hybridization.
The conventional theory of insect morphogenesis is the evolution from eggs giving rise to small adult form individuals to delayed maturation of embryonic forms in the form of larvae, which did not compete with the adults in food consumption and habit, thus leading to a two-stage life cycle, with non-competing foraging larval and reproductive adult forms (Jabr). A controversial theory (Ryan) posits that aspects of metamorphosis, which we also see in insects, but more pivotally in marine organisms such as echinoderms, may have resulted from early hybridization even between organisms which have now become distinct phyla, such as vertebrates and echinoderms. For example Nectocaris pteryx appears to have a body plan looking like a chimera of an arthropod head and the abdomen of an entirely different phylum, although this may be a result of the way fossils are depicted in drawings and this species may simply be an early cephalopod (Smith & Caron). The validity of this idea, at least in insects, is highly controversial and hotly disputed (Hart & Grosberg, Williamson).
Fig 26: Fossil and two very different scientific illustrations of Nectocaris pteryx and two views of the larva of Luidia sarsi emitting an echinodermal 'offspring'.
Studies attempting to trace the evolutionary tree depending on the simpler larval forms from which one would expect the adult form to have evolved have yielded contradictory results, suggesting the two might have independent genetic origins. Genetic analysis of sea squirts, which have vertebrate larval forms with a notochord and a primitive brain but metamorphose into fixed sea-floor feeders, suggests they have two genomic components, one coming from vertebrates and the other from an unknown but now extinct non-vertebrate at a very early stage in the evolution of animals. Echinoderms themselves have larvae with a bilateral body plan which later becomes colonized by pluripotent cells in the abdominal cavity forming radially symmetric organisms which grow into adults. In the starfish Luidia sarsi, the embryonic form some 4 cm long survives for several months as a vegetarian living off phytoplankton after its starfish 'offspring' have burst out to their carnivorous habit of hunting other starfish.
Donald Williamson, who in the 1950s advanced the 'larval transfer' theory, claimed to have successfully hybridized fertilised eggs from the sea squirt Ascidia mentula with sperm from the sea urchin Echinus esculentus. Then in 2002, in an unpublished study with Sebastian Holmes and Nic Boerboom, he did the reverse cross, using eggs from the urchin and sperm from the sea squirt. Both crosses resulted in large numbers of offspring, the majority of eggs developing into easel-shaped larvae - the 'pluteus' form typical of sea urchins, rather than the tadpole larvae that are the hallmark of sea squirts. Most of these larvae subsequently metamorphosed to a rounded adult form, which Williamson called a 'spheroid'. The first cross created spheroids with a suction cup that enabled them to attach to surfaces. Most intriguingly, the second produced spheroids that reproduced asexually through budding, the pinching off of a section of the body to create a clone. However these were never subjected to genetic analysis. A cross between two echinoderm species has however resulted in new developmental phenotypes with confirmed hybridized genomes.
Fig 26b: Conflict in the tree of mammalian diversification. The detailed traditional DNA-based evolutionary tree of the mammals (right) (Meredith et al, dos Reis et al) tends to have a different order of diversification from one based on the number of new miRNAs appearing in successive branches (top left, Dolgin). miRNA numbers have also been suggested to be correlated with neural complexity (bottom left, Technau).
Mammalian Radiative Adaption: Traditional DNA versus Micro RNAs
Different assay methods are shedding an intriguing light on radiative adaptation and diversification of animal species, from the Cambrian through to the present day. The detailed branching of the tree of life when calculated by traditional mutational DNA methods (Meredith et al, dos Reis et al), appears to differ significantly from a new technique developed by Kevin Peterson (Dolgin) that depends on the number of newly accrued micro miRNAs which modulate gene expression by selectively binding to specific messengers inhibiting their expression.
A single miRNA is thus able to modulate the expression of a diverse array of mRNAs to which it binds, thus providing for sophisticated forms of coordinated regulation conducive to phylogenetic complexity. It has also been suggested that neuronal complexity correlates with the number of miRNAs (Technau, Grimson et al) an interesting question in itself to do with how complex nervous systems are generated in development. Notice here that humans have fewer protein genes than a mouse, roughly 21,000 against 22,000 although we have a brain with 10,000 times as many neurons, so we need to have an idea how organismic complexity evolves in terms of sophisticated gene regulation, and miRNAs do just that.
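How one miRNA can regulate many messengers can be illustrated with a minimal Python sketch. It assumes the common simplification that targeting is determined by a short 'seed' (taken here as nucleotides 2 to 8 of the miRNA) pairing with a perfectly complementary site in the mRNA; real target recognition is more nuanced. The sequences, transcript names and function names are illustrative toy data, not taken from the studies cited.

```python
# Toy model: one miRNA seed matching sites in several mRNAs.
# Seed positions (nt 2-8) are a simplifying assumption; sequences are illustrative.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the complementary strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna: str) -> str:
    """mRNA motif complementary to the miRNA seed (nucleotides 2-8, 1-based)."""
    return reverse_complement(mirna[1:8])


def targets(mirna: str, transcripts: dict) -> list:
    """Names of all transcripts that contain a site matching the seed."""
    site = seed_site(mirna)
    return [name for name, seq in transcripts.items() if site in seq]


mirna = "UGAGGUAGUAGGUUGUAUAGUU"
transcripts = {
    "mRNA_A": "AUGGCUCUACCUCAAGGUUAA",
    "mRNA_B": "AUGCCCGGGAAAUUUCUACCUCUGA",
    "mRNA_C": "AUGAAACCCGGGUUUACGUGA",
}

regulated = targets(mirna, transcripts)  # mRNA_A and mRNA_B carry the site
```

Because the matching motif is only seven nucleotides long, it can recur in many unrelated transcripts, which is what allows a single newly acquired miRNA to bring a whole set of genes under coordinated control.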
Consistent with such a role in multi-celled evolution, the appearance of miRNAs goes back to the earliest multicelled animals. Sea anemones already carry up to 40. Metazoa, from sponges to bilateria, also share the two classes of piRNA, the second of which plays a role in suppressing transposable elements in gametogenesis, by containing a sequence complementary to a transposase mRNA. In fruit flies these are directed against DNA transposons, but in mammals they target LINE L1 and IAP transcription during meiosis in the germ line, by methylating L1 and IAP DNA sequences (Aravin et al).
While a traditional DNA-based tree places primates and humans much closer to rodents, as highly evolved branches, with the elephants diverging earliest, an miRNA analysis places rodents as branching out earliest, something which might seem to be consistent with their possibly closer correspondence to the founding shrew-like mammalian type. The critical question determining the fate of the miRNA perspective is what the rate of loss of these small RNA molecules is in evolution. A higher rate of loss would tend to remove the inconsistency. While the picture is consistent with retaining miRNAs in mammalian diversification, in insects and a primitive chordate sudden losses have occurred. Ida's controversial place in the primate family tree.
Emergence and Diversification of Modern Humans
Fig 27: One hypothetical evolutionary tree for humans and related apes. There is much debate about the actual form of such a tree.
Homo sapiens appears to have evolved into a single dominant species on the planet, after a preceding period in which fossil evidence suggests several different anthropoid species coexisted.
The final stage of this process was the disappearance of Homo erectus and Neanderthal, the latter after a well-defined period of coexistence in Europe at the end of the last ice age.
Our own evolutionary and cultural roots appear to lie in Africa, with evidence of culture and cosmetics running back over 100,000 years, in addition to evidence for tools and weapons. An alternative regional development theory has proposed that humans evolved through a considerable amount of interbreeding over the whole African and Asian continental region; however, genetic evidence is coming to point towards an African origin with at most very occasional cross fertilization with related species.
Fig 28: Human-Neanderthal-Chimp divergences (Green et. al.)
According to genetic analysis, Neanderthals diverged from Homo sapiens ~500,000 years ago. There has been no major interbreeding, but possibly some transfer of genes e.g. from human males to Neanderthal females, although candidate human genes conferring natural advantage do have a profile consistent with transfer from Neanderthals (Green et. al.).
Fig 29: The "Out of Africa" hypothesis may be consistent with a degree of regional development involving some sexual interbreeding with Neanderthals and Homo erectus (New Scientist).
Specific genes such a s PDHA1 consist of two families with the last common ancestor 1.8 million years ago, and microcephalin variants appearing 40,000 years ago also have differences suggesting an original divergence 1 million years ago suggesting 'introgression' from Neanderthals (Jones). An even more ancient divergence in the pseudogene RRM2P4 in East Asian people suggests interbreeding with Homo Erectus. Some evidence from skeletons is also consistent with this picture. However more recent sequencing of the Neanderthal nuclear genome (Callaway) suggests little or no interbreeding with Homo sapiens and has cast doubt on the existence of the microcephalin variant in Neanderthals, as well as a gene associated with increased fertility in Icelanders also attributed to transfer from Neanderthals.
Comprehensive investigation of the Neanderthal genome (Green, R. E. et al. (2010) Science 328, 710-722., Nature | doi:10.1038/news.2010.225) suggests that there was a period of interbreeding between Neanderthals and humans in the Near East around the time of the first migration out of Africa, rather than more recently in Europe, as the putative sequences are shared by non-African French, Han and Papuan, but not by the African Yoruba or San. It is estimated that among the former, 1-4% of the genome derives from Neanderthal sequences, although there is little evidence for these corresponding to the specific genes suggested by Lahn's team. Other transfers could have occurred but are not apparent in the research.
The situation has been complicated by two finds. Firstly we have the 'hobbit' human remains found on Flores, named Homo floresiensis. These are variously claimed to be a Separate human species possibly related to Homo erectus, or disclaimed as microcephalic human pygmy peoples. More recently we have the discovery of remains of Denisovians, a further species branching off from the Neanderthals and their lines branching from Homo sapiens some 800,000 years ago. Genetic analysis of the remains indicates a significant interbreeding specifically with Melanesian people of some 6% (Reich et. al 2010, Meyer et al 2012).
Chromosomes contain a variety of markers that can be used to compare diverse populations and infer an evolutionary relationship between them. These include the slowly varying protein polymorphisms of coding regions, which are useful for long-term trends; single nucleotide polymorphisms and non-coding region changes (mutation rates about 2.5 x 10^-8 per base pair per generation, and useful for reconstructing evolutionary history only over millions of years); insertion and deletion events (about 8% of polymorphisms, extending from one to millions of nucleotides), particularly those driven by transposable elements such as the LINEs and the even more frequent SINEs; and non-coding micro-satellites (mutation rate 10^-5 to 10^-2 due to repeat slippage) and mini-satellite regions of repeating DNA (mutation rates as high as 2 x 10^-1 due to meiotic recombination in sperm). The micro- and mini-satellites both evolve rapidly and are not subject to the strong selection of coding regions, so they can differentiate changes over the much shorter time scales of modern human migration.
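As a rough illustration of why these marker classes resolve different time scales, the following Python sketch converts the per-generation rates quoted above into the time a single lineage needs to accumulate one expected mutation at a locus. The 25-year generation time and the 1000 base pair region length are assumptions made for the example and do not come from the cited studies, and the names used are hypothetical.

```python
# Rates are those quoted in the text (per generation). The generation time
# and the length of the non-coding region are illustrative assumptions.
GENERATION_YEARS = 25   # assumed human generation time
SNP_REGION_BP = 1000    # assumed length of a sequenced non-coding region

MARKER_RATES = {
    "SNPs, 1 kb non-coding region": 2.5e-8 * SNP_REGION_BP,  # per-bp rate x length
    "microsatellite (slowest)": 1e-5,
    "microsatellite (fastest)": 1e-2,
    "minisatellite (fastest)": 2e-1,
}


def years_per_expected_mutation(rate_per_generation, generation_years=GENERATION_YEARS):
    """Years for one lineage to accumulate one expected mutation at a locus."""
    return generation_years / rate_per_generation


for name, rate in MARKER_RATES.items():
    print(f"{name:30s} ~{years_per_expected_mutation(rate):,.0f} years")
```

Under these assumptions the slowly mutating sequence region works on a scale of about a million years per expected change, while the fastest satellites change within a few thousand years or less, which is the contrast the paragraph draws.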
The insertions and deletions of the million or so Alu elements in the human genome are particularly useful, as the most active sub-population of about 1000 Alu is actively transcribing and undergoing rapid change. A subpopulation of Alu are capable of generating new coding regions (exons), when inserted into non-coding introns between spliced sections of a translated mRNA, because one base-pair change within Alu leads to formation of a new exon reading into the surrounding DNA. This is not necessarily deleterious because alternative splicing still allows the original protein to be made as well. We have the highest number of introns per gene of any organism, and thus have to have gained an advantage from this costly error-prone process. Alus may have given rise, through alternative splicing, to new proteins that drove primates' divergence from other mammals. Recent studies have shown that the nearly identical genes of humans and chimps produce essentially the same proteins in most tissues, except in parts of the brain, where certain human genes are more active and others generate significantly different proteins through alternative splicing of gene transcripts. Our divergence from other primates may thus be due in part to alternative splicing.
If we consider the likely effects of the out of Africa hypothesis, we would expect that founding African populations not subject to active expansion and migration would have greater genetic diversity and that the genetic makeup of other world populations would come from a subset of the African diversity, consisting of those subgroups who migrated. This picture is complicated by the evidence for one or more bottlenecks that reduced the genetic diversity of the surviving human population to 3000-10,000 breeding pairs around 70,000 years ago, which has been associated with the supervolcanic Toba eruption in Sumatra.
Fig 29b: The Volcanic Winter/Weak Garden of Eden model proposed in Ambrose 1998. Population subdivision due to dispersal within Africa and to other continents during the early Late Pleistocene is followed by bottlenecks caused by volcanic winter, resulting from the eruption of Toba, around 71,000 years ago. The bottleneck may have lasted either 1000 years, during the hyper-cold stadial period between Dansgaard-Oeschger events 19 and 20, or 10,000 years, during oxygen isotope stage 4. Population bottlenecks and releases are both synchronous. More individuals survived in Africa because tropical refugia were largest there, resulting in greater genetic diversity in Africa.
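The effect of such a bottleneck on diversity can be sketched with the standard Wright-Fisher drift relation, in which expected heterozygosity decays by a factor (1 - 1/(2N)) per generation in a population of constant effective size N. The Python sketch below applies it to the 3000-10,000 breeding pairs and the 1000 or 10,000 year durations mentioned above. The 25-year generation time, the conversion of breeding pairs to an effective size of two individuals per pair, and the function name are assumptions for illustration only.

```python
GENERATION_YEARS = 25  # assumed human generation time


def heterozygosity_retained(n_e, generations):
    """Fraction of expected heterozygosity left after drift alone.

    Wright-Fisher model, constant effective size n_e:
    H_t / H_0 = (1 - 1 / (2 * n_e)) ** t
    """
    return (1.0 - 1.0 / (2.0 * n_e)) ** generations


for pairs in (3000, 10_000):
    n_e = 2 * pairs  # assumption: both members of each breeding pair count
    for years in (1000, 10_000):
        t = years // GENERATION_YEARS
        kept = heterozygosity_retained(n_e, t)
        print(f"{pairs:>6} pairs, {years:>6} years: {kept:.4f} of heterozygosity retained")
```

This tracks heterozygosity only. It says nothing about the loss of rare alleles or whole lineages, which a bottleneck removes more readily, and it ignores mutation and population subdivision.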
In the case of mitochondrial DNA (mtDNA, mutation rate about 2.5 x 10^-7) and its hyper-variable D-loop (mutation rates as high as 4 x 10^-3), which is transmitted only down the maternal line (see Tishkoff and Verrelli for caveat), and the non-recombining majority of the Y-chromosome, which is transmitted only down the paternal line, each with no recombination, we would expect greater diversity going deeper into the historical tree of divergence, with certain existing groups who have retained the founding patterns of survival and have not undergone rapid population expansions retaining an increasingly diverse source variation. All these features are broadly observed in the genetic data to date.
Fig 30: (a) MtDNA tree for African groups showing haplotypes of !Kung, Mbuti and Biaka as well as the line coming out of Africa (Chen et. al.). (b) Diagram of world migration and regional differentiation of successive mtDNA haplotypes (Gilbert). (c) mtDNA distances between founding African groups including Hadza (clicks) Khwe is from (Knight et. al.). Recent mtDNA evidence suggests a first wave of migration down the coast of Asia all the way to Australia (Forster et. al.).
Most studies of non-coding regions of autosomal, X-chromosome, and mitochondrial mtDNA genetic variation (which are desirable markers because they are not so subject to selection and thus have relatively neutral drift) show higher levels of genetic variation in African populations compared to non-African populations, using many types of markers. Although some studies of Y-chromosome variation have observed higher heterozygosity levels in non-African populations, the African populations have higher levels of pairwise sequence differences, consistent with these populations being ancestral. High levels of diversity in African populations alone do not prove that African populations are ancestral. A recent bottleneck event and/or colonization and extinction events among non-African populations, or a more recent onset of population growth in non-Africans, could also cause a decrease in genetic diversity (Tishkoff and Verrelli). In fact the complete inter-fertility of all human populations and the relative lack of genetic divergence by comparison with the few remaining chimp colonies in the wild (Hrdy 183) does indicate a significant bottleneck. The genetic data is consistent with a human emergence from a population of only 10,000 around 100,000 years ago. This is also consistent with the delayed maturation, long birth spacings as a result of prolonged lactation and high infant mortality seen in gather-hunter populations such as the !Kung. At such low growth rates a population of 100 would take 50,000 years to reach 10,000 (Hrdy 183).
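The growth figure quoted from Hrdy can be checked with a short calculation. The Python sketch below assumes simple continuous exponential growth, which is a modelling assumption of this example rather than something stated in the source, and the function names are hypothetical.

```python
import math


def implied_growth_rate(n_start, n_end, years):
    """Continuous annual growth rate r with n_end = n_start * exp(r * years)."""
    return math.log(n_end / n_start) / years


def doubling_time(rate):
    """Years needed for the population to double at continuous rate `rate`."""
    return math.log(2) / rate


# Figures from the text: 100 people growing to 10,000 over 50,000 years.
r = implied_growth_rate(100, 10_000, 50_000)
print(f"growth rate: {r:.2e} per year")
print(f"doubling time: {doubling_time(r):,.0f} years")
```

On this reading the population grows by less than one hundredth of a percent per year and takes several thousand years to double, which gives a sense of how slow the growth rates in question are.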
Fig 31: Patterns of male migration. The Genographic Project - a partnership between National Geographic and IBM - will collect DNA samples from over 100,000 people worldwide to provide a high-resolution genetic map of human migration.
However, studies of protein polymorphisms as well as mtDNA haplotypes, X-chromosome and Y-chromosome haplotypes, autosomal microsatellites and minisatellites, Alu elements, and autosomal haplotypes indicate that the roots of the population trees constructed from these data are composed of African populations and/or that Africans have the most divergent lineages, as expected under a recent African origin rather than a multi-regional emergence model. Additionally, studies of autosomal, X-chromosomal haplotype and mtDNA variation indicate that Africans have the largest number of population-specific alleles and that non-African populations harbor a subset of the genetic diversity that is present in Africa, as expected if there was a genetic bottleneck when modern humans migrated out of Africa. Analysis of genetic variation among ethnically diverse human populations indicates that populations cluster by geographic region (i.e., Africa, Europe/Middle East, Asia, Oceania, New World) and that African populations are highly divergent. The mtDNA studies hypothesize a primal female ancestor - the African Eve - around 150,000 years ago (Chen et. al.) while the Y-chromosome Adam is more recent, at around 90,000 years ago (Underhill et. al.), consistent with the greater reproductive variance of males than females. Differences between the Y- and mtDNA distributions indicate how migration, intermarriage and female exogamy have affected the gene pool.
The genetic patterns of both these and autosomal microsatellites (Zhivotovsky et. al.) are consistent with founding African diversity with migratory radiations to form other world populations, with deep founding radiations to the Khoisan click-language speaking !Kung-san bushmen of Botswana and the Sandawe of Tanzania, and possibly the Hadzabe, as well as the forest people such as the Mbuti and Biaka 'pygmies', who have adopted the Bantu languages of the farming neighbours with whom they now share semi-symbiotic relationships. Along with some Ethiopian and Sudanese sub-populations, these groups may represent some of the oldest and most deeply diversified branches of modern humans.
Fig 32: (Right) Genographic project study of mitochondrial origins shows a deep split separating Khoisan mitochondrial inheritance from other groups, including those migrating out of Africa, suggesting a separation of some 100,000 years, possibly caused by long term drought in Africa (Behar et al.).
Such recent genetic evidence has laid bare the relationships between some of the founding human groups spread across Africa from the 'Cushite' horn of Ethiopia to the southern Kalahari. Mitochondrial DNA studies have highlighted the ancient origin of the !Kung San and of pygmy peoples of the Congo Basin such as the Mbuti and the Biaka.
Two key genes highlight unique features of human genetic evolution. Mutations in both cause microcephaly, implicating them in increased brain size, and the rapid spread of their variants through the human population may coincide with spurts in human culture. A Microcephalin variant (Evans et. al.) appeared ~37,000 years ago, coinciding with the birth of culture, and an ASPM variant spread from the Near East around 5000 years ago (Mekel-Bobrov et. al.). However, studies of these variants have failed to find differences in intelligence, and the results remain highly controversial (DOI:10.1126/science.314.5807.1872). Nevertheless, these results are consistent with an overall examination of linkage disequilibrium in single nucleotide polymorphisms (Moyzis et. al.), which indicates that about 7% of our genes have been subject to selection in the last 50,000 years, a figure similar to that for the domestication of maize, including genes for protein metabolism, disease resistance and brain function.
Fig 33: (a) Non-recombining Y-chromosome evolutionary tree (Underhill et. al.). (b) Geographical distribution showing the ancient haplotype shared by the San and Ethiopian and Sudanese sub-populations. (c) Genetic distances between Khoisan and forest peoples sharing M112, a Y-chromosome allele common only in these groups, showing great genetic distance between Hadzabe and San peoples (Knight et. al.). (d) Autosome satellite analysis confirming ancient divergence of San and forest peoples leading to migration from Africa (Zhivotovsky et. al.).
Y-chromosome studies have shown the !Kung share a most ancient haplotype with sub-populations from Ethiopia and the Sudan, suggesting they are parts of an ancient widespread population later divided by the Bantu expansion. According to an overall survey of genetic research by Sarah Tishkoff of the University of Maryland, the most deeply ancestral known human DNA lineages may be those of East Africans, such as the Sandawe, who share many phenotypic features and a click language with the !Kung. This suggests southern Khoisan-speaking peoples originated in East Africa. The most ancient populations are now believed to also include the Sandawe, Burunge, Gorowaa and Datog people of Tanzania. The Burunge and Gorowaa migrated to Tanzania from Ethiopia within the last 5,000 years consistent with an ancient founding population in this area. Echoes of the earliest language spoken by ancient humans tens of thousands of years ago may have been preserved in the distinctive clicking sounds still spoken by some existing African tribes.
The genetic structure of 126 Ethiopian and 139 Senegalese Y chromosomes was investigated by a hierarchical analysis of 30 diagnostic biallelic markers selected from the worldwide Y-chromosome genealogy (Semino et al.). The study reveals that only the Ethiopians share with the Khoisan the deepest human Y-chromosome clades. This confirms the ancestral affinity between the Ethiopians and the Khoisan, which has previously been suggested by both archaeological and genetic findings.
In a counterpoint to these studies, (Hein, Rohde et. al.) estimate that, because family trees are repeatedly spread by sexually recombining mobile populations with differing reproductive rates, the most recent common ancestor of our global populations existed just 3,500 years ago, excepting these most isolated groups.
Fig 34: Human divergence trees calculated by single nucleotide polymorphisms (SNPs): top left (Li et. al.), bottom right (Jakobsson et. al.). Trees for haplotypes and copy number variation between populations (Li et. al.).
Further studies of the nuclear genome, using SNPs (single nucleotide polymorphisms), CNVs (copy number variations) and haplotypes, have thrown up reasonably consistent maps of regional divergence of principal human groups, demonstrating correspondence to the "Out of Africa" hypothesis and consistent with major patterns of migration.
Fig 35: In 2009, Tishkoff et. al. reported on a major study of African and African American evolution containing the most detailed information on African diversity to date.
In 2012 an in-depth study into human origins (Schlebusch et al) found no single founding location but mixing and divergence of populations, with the Khoe-San diverging from other human groups over 100,000 years ago, and a further division later between North and South Kalahari populations around 35,000 years ago, but in addition, deep complexity both within and without Khoe-San populations.
Fig 36: Evolutionary tree of Indo-European languages suggests a possible radiation corresponding to the Kurgans occurred around 4,900 BC (6,900 BP) and that they were preceded by Hittite migrations into Anatolia. Time scales in red are BP (Gray and Atkinson). Significantly Tocharian appears in Buddhist writings from China's Xinjiang province, indicating early far-eastern spread. Inset: hypothetical relationship between Indo-European and wider language groups such as Afro-Asiatic.
The evolutionary tree of human ethnic and migratory peoples bears an interesting relationship with the corresponding tree of languages, which has been resolved at least as far as the founding Indo-European languages (Gray and Atkinson).
For the absolutely latest on this, as of Sep 2012, here is the current research burst:
Fig 37: Hypothetical core of all human languages from "The Shape and Fabric of Language Evolution", further work associated with that of Gray and Atkinson. The Nature article above implies "languages evolve in their own idiosyncratic ways, rather than being governed by universal rules set down in human brain patterns".
Fig 38: The Mandala of Evolution (Dion Wright).
Conclusion: The Tree of Life, the Selfish Gene, and Climax Genetic Diversity
The picture conveyed by the significance of endosymbiosis, genome fusion and horizontal transfer as key evolutionary processes complementing the vertical transmission of the tree of life, makes clear that evolution is not just a matter of competitive survival of the fittest gene, individual, or species, but of dynamic survival of genes in a surviving ecosystem. Although Dawkins' (1978) notion of the "selfish gene" was pivotal in drawing attention to the fact that it was the survival of genes and not organisms, or even species, that was the key evolutionary process, attributing the human sentiment of selfishness to a gene is somewhat of a self-serving advertising distraction on the part of the author, which diminishes the subtlety and complexity of the sometimes apparently paradoxical ways genes actually interact to bring about beneficial outcomes in the evolutionary dynamics of the ecosystem.
Although the idea of selection of genes has been pivotal in defining the need to consider evolutionarily stable strategies under genetic variation in ways which have been subsequently confirmed time and time again in situations such as the sexual genetics of social insects such as bees and ants, social selection is by no means ineffectual, or much of sociobiology, including the biological basis of morality as an extension of reciprocal altruism, would cease to exist.
Moreover, from what we have seen, particularly about horizontal gene transfer, and the capacity of mobile elements to induce modulated changes in nuclear genomes, it is not the 'selfishness' of a genetic element alone that results in survival of both a gene and its hosts, but dynamic feedbacks and relationships which ultimately contribute to a massive sharing of information in the manner of parallel genetic algorithms fundamental to the replicative genetic process, which enable global forms of genetic and genome optimization central to the overall viability of life as complex systems.
A predator such as a lion survives not because it is a selfish beast thinking only of eating the next gazelle, but because, although it survives by killing individual antelopes, it maintains a degree of stability in population dynamics, without which the herbivores might multiply, causing a massive famine and leading to cycles of boom and bust and the potential extinction or attrition of antelopes, lions and the grasslands.
Likewise, although we may think of individual genes, transposable elements, or viruses as 'selfish' for reproducing sufficiently to ensure their own survival, and sometimes behaving as noxious parasites, the overall effect of this process in evolution can be to enrich the genetic potential of many unrelated organisms along the way, changing forever the face of the ecosystems in which they exist, and enabling organisms of far greater complexity to evolve and to survive in the closing circle of the biosphere.