The Tree of Life: Tangled Roots and Sexy Shoots
Tracing the genetic pathway from the Last Universal Common Ancestor to Homo sapiens
Chris King - Genotype 1.3.17 - Dec 2009 - Feb 2016
PDF Active site: http://www.dhushara.com/unravel/
This article is a fully referenced research review giving an overview of progress in unravelling the details of the evolutionary Tree of Life, from life's first occurrence in the hypothetical RNA era to humanity's own emergence and diversification through migration and intermarriage, using research diagrams and brief discussion of the current state of the art. The Tree of Life, in biological terms, has come to be identified with the evolutionary tree of biological diversity. It is this tree which represents the climax fruitfulness of the biosphere and the genetic foundation of our existence, embracing not just the higher Eucaryotes (plants, animals and fungi) but also Protista, Eubacteria and Archaea, the domain which includes the extreme heat- and salt-loving organisms and which appears to lie almost at the root of life itself.
To a certain extent the notion of a tree based on generational evolution has become complicated by a variety of compounding factors. Genes are not only transferred vertically, down the generations. There is also evidence for promiscuous horizontal gene transfer, genetic symbiosis, hybridization and even the formation of chimeras. This review will cover all these aspects, from the first life on Earth to Homo sapiens.
Fig 1: The Tree of Life (King)
LUCA: The Last Universal Common Ancestor
Following a phase of biogenesis possibly based on cosmic symmetry-breaking (King 1978, 2004) and on spontaneous prebiotic RNA synthesis (Powner et al. 2009, 2010), recent research suggests that the last universal common ancestor (LUCA) of all life on the planet may have arisen before the first cells. It may have formed at the interface between alkaline, hydrogen-emitting undersea vents and the archaic, acidified, iron-rich ocean (Martin and Russell 2003). Differential dynamics in the membranous micropores of the vents could have concentrated polypeptides and polynucleotides to biologically sustainable levels (Baaske et al. 2007, Budin et al. 2009), giving rise to the RNA era. At the same time, fatty acids concentrated above their critical aggregate concentration could have formed membranous microcellular interfaces, and proton transport across these would have provided a free energy source. The transition to enclosed cells is likely to have occurred in an active iron-sulphur reaction phase, still present in living cells, associated with sodium-proton antiporters driving ATP synthesis (Lane and Martin 2012, Lane 2009b). This led in turn to electron transport and to some of the most ancient proteins, such as ferredoxin.
Fig 1a: Proposed scheme for the universal common ancestor (Martin and Russell 2003)
The universal common ancestor of the three domains of life may thus have been a proton-pumping membranous interface from which archaea and bacteria emerged as free-living adaptations. This is suggested by fundamental differences in their cell walls and by the evolutionary relationships among some of the oldest genes.
Among the archaea, halobacteria still use a simple form of photosynthesis, making ATP from H+ gradients produced by a rhodopsin protein, while those in hydrothermal vents rely on Na+/H+ antiporters to generate ion gradients. The membrane proteins of these archaea, such as ATP synthase, are compatible with gradients of either sodium ions or protons (Lane and Martin 2012, Yong 2012).
Fig 1b: (Above) founding metabolism based on Na+/H+ antiporters, ATP synthase and FeS/NiS-containing vents (Lane and Martin 2012). The extremely ancient origin of the rhodopsin family of heptahelical receptors can be seen from the ultra-primitive archaeal photosynthesis in Halobacteria, which relies on direct coupling between photo-stimulated chemiosmotic H+ pumping and H+-driven ATP formation. It is based on bacteriorhodopsin, which is heptahelical, uses a form of retinal and has helices sharing a distant sequence homology with vertebrate rhodopsin (Ihara et al 1999).
It has also been proposed, on the basis of the highly conserved transcription and translation proteins common to all life, together with the apparently independent emergence of distinct DNA replication enzymes in archaea/eucaryotes and eubacteria, that the last universal common ancestor had a mixed RNA-DNA metabolism based on reverse transcriptase, placing it in the latter phases of the RNA era (Leipe et al. 1999).
To characterize LUCA at the point where it diversified into the three domains of life (Archaea, Eucaryotes and Bacteria), one cannot rely on nucleotide gene sequences, because these would have mutated beyond recognition. Amino acid sequences change more slowly, because many nucleotide substitutions are synonymous and leave the amino acid sequence unchanged, and the tertiary folded structure of a protein is even more strongly conserved.
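Why protein sequences outlast the genes that encode them can be illustrated by counting how often a single-base substitution leaves the encoded amino acid unchanged. The minimal Python sketch below assumes the standard genetic code (NCBI translation table 1, written as a 64-letter string in UCAG order) and ignores codon usage and mutation bias; the function name is illustrative. It computes the synonymous fraction separately for each codon position.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI table 1); codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_fraction_by_position():
    """For each codon position, the fraction of single-base substitutions that
    leave the amino acid unchanged (stop codons are not used as starting points)."""
    fractions = []
    for pos in range(3):
        synonymous = total = 0
        for codon, aa in CODE.items():
            if aa == "*":
                continue
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if CODE[mutant] == aa:
                    synonymous += 1
        fractions.append(synonymous / total)
    return fractions


for pos, frac in enumerate(synonymous_fraction_by_position(), start=1):
    print(f"codon position {pos}: {frac:.2f} of substitutions are synonymous")
```

Run as written, it shows that third-position changes are mostly synonymous, second-position changes never are, and first-position changes rarely are (only for some leucine and arginine codons), which is why amino acid sequences drift more slowly than the underlying DNA.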
The validity of the RNA-era concept, and the capacity of RNAs to act both as replicating information carriers and as active ribozymes, is emphasized by the ribosome's continuing dependence on its rRNA rather than its protein components. The 3D realizations of the two subunits in fig 1c1 show that the rRNA molecules still carry out the central task of protein assembly, with only minor modification by the 'chaperoning' proteins, despite 3.8 billion years of evolution.
Fig 1c1: Small and large rRNA subunits of the eubacterium Thermus thermophilus and the archaeon Haloarcula marismortui.
RNA orange and yellow, protein blue and active site green. (Wikipedia Ribosome)
Brooks et al. (2002) found that the amino acid composition of gene sections common to all life, which are believed to originate with LUCA, reflects the relative abundance of those amino acids in primitive synthesis, indicating that the first translated genes used the amino acids that were spontaneously available.
One intriguing indication of the state of genetic translation in LUCA is the incorporation of selenocysteine into the genetic code. Selenoenzymes, which contain selenocysteine as a genetically translated amino acid, occur in all three domains of life and trace back to LUCA, even though there was no free codon for selenocysteine, the 21st coded amino acid. An ingenious piece of genetic software engineering evolved in which the opal stop codon UGA is overridden if the mRNA possesses a motif called SECIS (selenocysteine insertion sequence): selenocysteine is then inserted instead of termination, and translation continues.
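The SECIS-dependent override can be expressed as one extra rule in a codon-by-codon translator. This is a hypothetical, simplified sketch: the function `translate` and its `has_secis` flag are illustrative names, it assumes the standard genetic code (in which UGA, the opal codon, is the stop codon recoded as selenocysteine), and it treats the presence of SECIS as a single yes/no switch, ignoring the element's position and the protein factors that deliver selenocysteyl-tRNA to the ribosome.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI table 1), codons in UCAG order
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str, has_secis: bool) -> str:
    """Translate an open reading frame codon by codon.

    Simplified model: if the transcript carries a SECIS element, an in-frame
    UGA is read as selenocysteine ('U' in one-letter code) instead of stop.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UGA" and has_secis:
            protein.append("U")  # selenocysteine inserted, translation continues
            continue
        aa = CODE[codon]
        if aa == "*":
            break  # ordinary termination (UAA, UAG, or UGA without SECIS)
        protein.append(aa)
    return "".join(protein)


orf = "AUGGCUUGAGGCUAA"  # hypothetical toy ORF: Met-Ala-UGA-Gly-stop
print(translate(orf, has_secis=False))  # MA
print(translate(orf, has_secis=True))   # MAUG
```

Keeping the override as a separate check before the ordinary stop test mirrors the mechanism described above: the codon itself is unchanged, and only the context supplied by SECIS changes how it is read.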
Fig 1c2: Left: Evolutionary tree of selenophosphate synthetase (Romero et al. 2005), spanning the three domains of life. Centre: SECIS hairpins of archaea (A), bacteria (B) and corresponding eukaryote variants (C, D) (Moldave, ed., 2006). Top right: Tertiary structure of SECIS showing highly conserved regions (hot) (Walczak et al. 1996). Lower right: SECIS acts as an RNA-enzyme to attach the selenocysteine tRNA to the nascent protein.
SECIS is an unusual hairpin loop structure which takes different forms in archaea and bacteria, with both forms appearing in eucaryotes. The forms share a highly conserved hairpin loop forming an RNA translational catalyst, which literally takes over some of the ribosomal RNA function, binding the selenocysteine tRNA and coupling selenocysteine to the nascent protein chain, as shown in fig 1c2. It is clear that this unique piece of genetic software engineering evolved in LUCA, because the tRNAs for three other essential amino acids, lysine, glutamine and glutamic acid (those encoded by the two-codon XAA/XAG pairs, the fourth such pair being the ochre and amber stop codons), all depend on a modified 2-selenouridine base at the wobble position to function. This base has to be generated from selenophosphate, which in turn is made by selenophosphate synthetase. As shown in fig 1c2 (left), this enzyme has an evolutionary tree extending back to LUCA, confirming the obvious: the genetic code cannot exist without the 21st, software-engineered amino acid selenocysteine!
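The XAA/XAG codon pairs mentioned above can be checked directly against the standard genetic code. This short Python sketch assumes NCBI translation table 1; the helper name `naa_nag_boxes` is illustrative. It lists what each pair encodes for every possible first base.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI table 1), codons in UCAG order
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
NAMES = {"K": "lysine", "Q": "glutamine", "E": "glutamic acid", "*": "stop"}


def naa_nag_boxes():
    """Map each first base X to the meanings of codons XAA and XAG."""
    return {x: (CODE[x + "AA"], CODE[x + "AG"]) for x in BASES}


for x, (aa1, aa2) in naa_nag_boxes().items():
    print(f"{x}AA/{x}AG -> {NAMES[aa1]} / {NAMES[aa2]}")
```

Three of the four pairs encode lysine, glutamine and glutamic acid, and the fourth is the stop pair UAA (ochre) / UAG (amber).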
To reconstruct the set of proteins LUCA could make, Kim and Caetano-Anollés searched a database of proteins from 420 modern organisms, looking for structures common to all. Of the structures they found, just 5 to 11 per cent were universal, meaning they were conserved enough to have originated in LUCA. Judging by the functions of these structures, they conclude that LUCA had an advanced metabolic network, especially rich in nucleotide metabolism enzymes, primordial pathways for the biosynthesis of membrane glycerol ether and ester lipids, crucial elements of translation, including aminoacyl-tRNA synthetases and regulatory factors, and a primordial ribosome with protein synthesis capabilities. It lacked, however, transcription from DNA to RNA, processes for extracellular communication and enzymes for deoxyribonucleotide synthesis, and even at an advanced evolutionary stage it stored genetic information in RNA (not DNA) molecules.
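The core idea of searching for structures "common to all" can be sketched as a set intersection over the fold superfamilies (FSFs) found in each proteome. The proteome names and FSF labels below are hypothetical toy data, and a plain intersection is a simpler criterion than the phylogenomic analysis Kim and Caetano-Anollés actually used; real surveys must also allow for gene loss and incomplete annotation.

```python
from functools import reduce

# Hypothetical toy data: fold superfamily (FSF) labels present in each proteome.
proteomes = {
    "archaeon_1": {"FSF_a", "FSF_b", "FSF_c", "FSF_d"},
    "bacterium_1": {"FSF_a", "FSF_b", "FSF_e", "FSF_f"},
    "eukaryote_1": {"FSF_a", "FSF_b", "FSF_c", "FSF_g", "FSF_h"},
}


def universal_folds(proteome_sets):
    """Fold superfamilies shared by every proteome (a candidate LUCA set)."""
    return reduce(set.intersection, proteome_sets.values())


def universal_fraction(proteome_sets):
    """Fraction of all observed fold superfamilies that are universal."""
    all_folds = set().union(*proteome_sets.values())
    return len(universal_folds(proteome_sets)) / len(all_folds)


print(sorted(universal_folds(proteomes)))  # ['FSF_a', 'FSF_b']
print(f"{universal_fraction(proteomes):.0%} universal")  # 25% universal
```

With hundreds of real proteomes, the universal share shrinks sharply, which is the kind of filtering that leaves only a small percentage of structures as LUCA candidates.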
Fig 1d: Phylogenomic tree describing the evolution of 420 free-living organisms, from a study of protein domain structure in their fully sequenced proteomes. Domains were defined at the highly conserved fold superfamily (FSF) level of structural classification (Kim and Caetano-Anollés).
Organelles were thought to be the preserve of eukaryotes, but in 2003 researchers found that an organelle called the acidocalcisome also occurs in bacteria. Caetano-Anollés' team has now found that tiny granules in some archaea are also acidocalcisomes, or at least their precursors. That means acidocalcisomes are found in all three domains of life and date back to LUCA (Seufferheld et al.).
Acidocalcisomes were originally discovered in Trypanosomes (sleeping sickness and Chagas disease) but have since been found in Toxoplasma gondii (toxoplasmosis), Plasmodium (malaria), Chlamydomonas reinhardtii (a green alga), Dictyostelium discoideum (a slime mould), bacteria and human platelets. Their membranes contain a number of protein pumps and antiporters, including aquaporins, ATPases and Ca2+/H+ and Na+/H+ antiporters. Acidocalcisomes have been implicated in osmoregulation. They were detected in the vicinity of the contractile vacuole in Trypanosoma cruzi and were shown to fuse with the vacuole when the cells were exposed to osmotic stress. Presumably the acidocalcisomes empty their ion contents into the contractile vacuole, thereby increasing the vacuole's osmolarity. This then causes water from the cytoplasm to enter the vacuole, until the latter gathers a certain amount of water and expels it from the cell.
Fig 1e: Tangled web linking acidocalcisomes in extant archaea, bacteria and eucaryote species (Seufferheld et al.), overlaying electron micrographs of acidocalcisomes in Agrobacterium tumefaciens (a, b) and Methanosarcina acetivorans (c, d).
LUCA may have used RNA rather than DNA, as there is no evidence that LUCA possessed ribonucleotide reductases, which convert ribonucleotides into the deoxyribonucleotides that are the building blocks of DNA (Lundin et al.). Rather, it appears these functions were transferred from bacteria to archaea by horizontal transfer on at least two separate occasions (arrows in fig 2). Eucaryotes (mid green) would also have received theirs after LUCA diversified.
Fig 2: Ribonucleotide reductase trees showing bacterial, eucaryote and archaeal branches, with evidence of two events of horizontal transfer from bacteria to archaea (arrows) after the diversification of LUCA (Lundin et al).
LUCA was a "progenote". Progenotes can make proteins using genes as a template, but the process is so error-prone that the proteins can be quite unlike what the gene specified. Both Di Giulio and Caetano-Anollés have found evidence that the systems making protein synthesis accurate appeared long after LUCA. To cope, the early cells must have shared their genes and proteins with each other. Caetano-Anollés says the free exchange and lack of competition mean this living primordial ocean essentially functioned as a single mega-organism.
Two or Three Domains of Life?
Life today is informationally based on the sequences of the four bases A, G, T and C in DNA, with messenger copies of the genetic sequence in mRNA (with U replacing T) forming intermediates in the assembly of proteins, the cell's primary active chemical and structural agents. This is achieved through translation at the ribosome, a supra-molecular complex of some 50 chaperoning proteins surrounding a core of three rRNA units, fed by amino-acid-coupled tRNAs. The RNAs carry out the essential function, supporting the idea that translation was at first a purely RNA-based process of protein construction. In line with this and other RNA fossils found particularly in Eukaryotes, it is widely believed that life began based on RNA, which combines the capacity for complementary replication, like DNA, with the ability to form chemically reactive 3-dimensional conformations, like proteins. The ribosome then evolved, transferring the reactive burden onto proteins sequenced through the genetic code, and some time later the informational genome was consolidated into more stable DNA.
Fig 3: The initial tree of rRNAs shows three distinct founding domains (Woese 1987)
Originally the Bacteria and Archaea were thought to form one large, diverse family of prokaryotes, until Carl Woese (1977, 1978, 1987, 1990) and others investigated the evolutionary tree of ribosomal RNAs and found three distinct founding evolutionary domains, then named eubacteria and archaebacteria, alongside the eukaryotes.
This gave the Eukaryotes a founding status as well, in contrast with the idea that the procaryotic bacteria came first and that the higher Eukaryote organisms somehow emerged later with their complex cellular structures: the endoplasmic reticulum, nuclear envelope and Golgi apparatus (all parts of a common complex of internal membranous partitions), the architecture of microtubules, including centrioles and the Eukaryote flagellum, and the endosymbiont mitochondria and chloroplasts.
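How a tree of domains can emerge from rRNA sequence comparisons can be illustrated with a toy distance-based clustering. This sketch is not Woese's original procedure: it uses made-up, pre-aligned fragments (all sequence names are hypothetical), the simple proportion of differing sites (p-distance) and the UPGMA algorithm, which assumes a roughly constant rate of change along all lineages. Real rRNA trees use much longer alignments and model-based methods.

```python
from itertools import combinations

# Hypothetical, pre-aligned toy "rRNA" fragments (not real sequences).
seqs = {
    "bacterium_A": "GGGGGGAAAAAA",
    "bacterium_B": "GGGGGGAAAAAC",
    "archaeon_A": "CCCCGGUUAAAA",
    "archaeon_B": "CCCCGGUUAAUA",
    "eukaryote_A": "CCCCCCUUUAAA",
}


def p_distance(a, b):
    """Proportion of aligned sites that differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Cluster sequences by UPGMA and return the tree as nested tuples."""
    clusters = {name: (name, 1) for name in seqs}  # label -> (subtree, size)
    dist = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(seqs, 2)}
    while len(clusters) > 1:
        # merge the closest pair of current clusters
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        new = f"({a},{b})"
        for c in clusters:
            # size-weighted average distance from the merged cluster to c
            dist[frozenset((new, c))] = (n_a * dist[frozenset((a, c))]
                                         + n_b * dist[frozenset((b, c))]) / (n_a + n_b)
        clusters[new] = ((tree_a, tree_b), n_a + n_b)
    return next(iter(clusters.values()))[0]


print(upgma(seqs))
```

With these toy data the archaeal fragments group with the eukaryote while the bacteria form the other branch, the topology of the three-domain tree; different toy data would of course give a different tree.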
Fig 4: Key structural differences separating the larger rRNA units of the three domains (Woese 1987).
In addition to their evolutionary sequence divergence, the small-subunit ribosomal RNAs (from the 30S subunit in prokaryotes) of each domain show distinct structural features characteristic of that domain, but also structural links between Bacteria and Archaea on the one hand and between Archaea and Eukaryotes on the other, qualitatively confirming the central place of the Archaea in the divergence.
Fig 5: (a) Further elaboration of the rRNA tree (Pace 1997). (b) A third rRNA tree, which suggests Archaea lie very close to the root, is contrasted with that for the enzyme HMGCoA reductase (c), which also shows evidence of horizontal transfer to an archaeon (ex Doolittle 2000).
Norman Pace subsequently enlarged the scope and accuracy of the rRNA tree, including a greater diversity of organisms. This tree has become the basis of several other studies. Surviving Archaea are known to inhabit extreme environments, including hot volcanic pools, hydrothermal vents and extremely salty environments, and several placements of the root of the tree, including those from Bork's team, suggest a hot origin for life. However, other research (Brochier and Philippe, Boussau et al.) concludes that the base of the tree may have been at about 25 °C, a more viable temperature for a simple RNA metabolism, followed by a period of high-temperature adaptations shortly after the differentiation of the three domains in evolutionary time.
Fig 5b: Left: The evolutionary root of the tree of life and its diversification into archaea, bacteria and Eukaryotes appears to have gone through an early period of cool temperature consistent with an RNA era, followed by a hot period (Ananthaswamy, Boussau et al.). Right: Three domains (a) is contrasted with a recent version of the "eocyte" hypothesis (b), showing the eucaryotes emerging from the wider crenarchaeote grouping (TACK) after its divergence from the Euryarchaeota, implying that the amoeboid ancestor of the eucaryotes was an "eocyte" (Williams et al. 2013).
However, James Lake (1988) had already challenged the notion of three domains, with an analysis claiming that the eucaryotes instead branched off from only one line of the archaea, the eocytes or Crenarchaeota. This view has been confirmed by accumulating genetic studies (Williams & Embley 2014, Williams et al. 2013, Foster, Cox & Embley 2009, Cox et al. 2008) in which the TACK group of archaea (fig 5b right) has a pivotal relationship with eucaryotes.
The Copernican principle asserts that the Earth is a typical rocky planet in a typical planetary system, located in an unexceptional region of a common barred-spiral galaxy, hence it is probable that the universe teems with complex life. This is supported to a reasonable extent by the discovery of an increasing number of planets including some putative "Goldilocks" zone planets where water would be liquid and life as we know it could potentially exist. Set against this, the "rare earth" hypothesis argues that the emergence of complex life requires a host of fortuitous circumstances, including a galactic habitable zone, a central star and planetary system having the requisite character, the circumstellar habitable zone, the size of the planet, the advantage of a large satellite, conditions needed to assure the planet has a magnetosphere and plate tectonics, the chemistry of the lithosphere, atmosphere, and oceans, the role of "evolutionary pumps" such as massive glaciation and rare bolide impacts, and whatever led to the still mysterious Cambrian explosion of animal phyla. This might mean that planets able to support a bacterial level of life are not so uncommon, but those supporting complex multicellular life might be.
Fig 6: The metabolic power of eucaryote cells per haploid genome, and hence their capacity for genomic complexity, depends on the respiratory power of mitochondria (Lane and Martin).
Bringing this question to a pivotal crux in our context, the emergence of mitochondria as endosymbionts has been proposed as a critical bottleneck which allowed complex life to evolve only once on Earth, because only this effectively fractal cellular architecture provides the membrane surface area needed to support the chemical reactions that allow the vastly larger number of genes in a complex organism's genome to maintain metabolic stability (Lane and Martin 2010, 2012). Lane and Martin note "The cornerstone of eukaryotic complexity is a vastly expanded repertoire of novel protein folds, protein interactions and regulatory cascades. The eukaryote common ancestor increased its genetic repertoire by some 3,000 novel gene families. The invention of new protein folds in the eukaryotes was the most intense phase of gene invention since the origin of life. Eukaryotes invented five times as many protein folds as eubacteria, and ten times as many as archaea. Even median protein length is 30% greater in eukaryotes than in prokaryotes". Whether such endosymbiosis is rare, or a common extreme of parasitic or predatory relationships, would then determine how likely or unlikely complex life might be.
That said, one can also see how an endosymbiotic relationship between an archaeote and a respiring bacterium could set up a mutually beneficial energetic relationship which could lead to the eucaryote emergence. Martin and Muller (1998) have proposed that mitochondrial symbiosis might emerge as an interaction under low oxygen between a respiring hydrogen-producing bacterium and an archaeal cell utilizing hydrogen to make ATP.
This massive increase in complexity remains obscure in the genetic and fossil records and requires some ingenious model construction to envisage how mitosis, meiosis, sexuality, the nuclear envelope, endoplasmic reticulum, cytoskeleton, and all the complexities of eucaryote regulation evolved. For a seminal work on this see (Cavalier-Smith 2010).
Regardless of this, Lane and Martin's metabolic approach neatly explains why there is little sign of any of these structures in any existing prokaryote. In effect, endosymbiosis created a completely new energetic regime in which the only niche players were the newly formed endosymbiotic chimeras themselves, which then underwent a massive adaptive radiation to form ever more complex cellular machinery, ultimately giving rise to LECA (the last eucaryote common ancestor) and the diversity of eucaryotes as we now know them. There are echoes in this metabolic shangri-la of the conditions in the Lost City vents that, we are coming to understand, may likewise have given rise much earlier to LUCA.
Fig 6b: Left: Bacterium Gemmata obscuriglobus with internal nuclear envelope and vacuoles (Rachel Melwig & Christine Panagiotidis / EMBL). Right: Ultrathin EM section of a mimivirus in amoeba (Jean-Michel Claverie). Inset: Mamavirus infected by Sputnik virophage.
Offset against both the uniqueness of the mitochondrial endosymbiosis and the closely linked but independent question of the origin of the nucleus and nuclear envelope has been the discovery of mimi-, mama-, mega- and pandoraviruses infecting amoebae (Raoult et al., Philippe et al.) and of related very large aquatic viruses, such as CroV infecting single-celled plankton species (Fischer et al.). Despite their recent discovery, ocean gene analyses suggest these viruses are widespread in the oceans and may play a crucial role in regulating atmospheric-oceanic pathways such as carbon sequestration. They occupy an intermediate genetic position between viruses and cells: they have the largest viral genomes, encode extensive cellular machinery, and their genomes are larger than those of the smallest completely autonomous bacteria and archaea.
Megavirus chilensis, for example, is 10 to 20 times wider than the average virus. The particle measures about 0.7 micrometres (thousandths of a millimetre) in diameter. It just beats the previous record holder, Mimivirus, which was found in a water cooling tower in the UK in 1992. A study of the megavirus's DNA shows it to have more than a thousand genes. The mimivirus genome is a linear, double-stranded DNA molecule 1.18 Mbp in length; Megavirus has 1.25 Mbp. Like Mimivirus, Megavirus has hair-like structures, or fibrils, on the exterior of its shell, or capsid, that probably attract unsuspecting amoebas looking to prey on bacteria displaying similar features. These viruses show many characteristics at the boundary of living and non-living. They are as large as several bacterial species, such as Rickettsia conorii and Tropheryma whipplei, possess genomes of comparable size to several bacteria, including those above, and code for products previously not thought to be encoded by viruses. Mimivirus has genes coding for nucleotide and amino acid synthesis, which even some small obligate intracellular bacteria lack. However, it lacks genes for ribosomal proteins, making it dependent on a host cell for protein translation and energy metabolism.
In mid-2013, an even larger virus, with a 2.5 Mb genome and no morphological or genomic resemblance to any previously defined virus family, was discovered by the same researchers who found mimivirus, both in the same ocean sample off Peru and in a freshwater pond in Australia. It was named pandoravirus, reflecting its lack of similarity to previously described microorganisms and the surprises expected from its future study. The researchers suspect that giant viruses evolved from cells. They think that at some point cellular life on Earth was much more diverse than the three domains of bacteria, archaea and eukaryotes. Some cells gave rise to modern life, and others survived by parasitizing them and evolving into viruses. Pandoravirus might thus provide a complementary relic of the genomes of this wider founding group (Philippe et al). Using the Global Ocean Sampling (GOS) Expedition data to explore variants of recA (the universal DNA repair enzyme) and rpoB (the beta subunit of bacterial RNA polymerase), a team associated with Craig Venter has discovered branches which may also point to a fourth domain (Wu et al).
Fig 6c: Evolutionary tree of B-family DNA polymerase showing the relationship of pandoravirus to other viruses and eucaryotes. The inset shows pandoraviruses invading Acanthamoeba (Philippe et al).
As an illustration of genes in mimivirus normally appearing only in cellular genomes, mimivirus has genes for central protein-translation components, including four aminoacyl-tRNA synthetases, peptide release factor 1, translation elongation factor EF-Tu and translation initiation factor 1. The genome also encodes six tRNAs. Other notable features include both type I and type II topoisomerases (although the topoisomerase 1B has a different header structure from the eucaryote form; Brochier-Armanet, Gribaldo & Forterre 2008), components of all DNA repair pathways, many polysaccharide synthesis enzymes and one intein-containing gene. Inteins are protein-splicing domains encoded by mobile intervening sequences (IVSs). They self-catalyze their excision from the host protein, ligating their former flanks by a peptide bond. They have been found in all domains of life (Eukarya, Archaea and Eubacteria), but their distribution is highly sporadic, and only a few viral inteins have been described. Self-splicing group I introns are a different type of mobile IVS, self-excising at the RNA level. They are rare in viruses, yet mimivirus has four self-splicing introns, all in RNA polymerase genes.
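The net result of intein excision, or of a self-splicing intron at the RNA level, can be pictured with a deliberately minimal string model: remove the intervening sequence and join its flanks directly. The function name `splice_out` and the toy sequences are hypothetical, and the sketch models only the outcome of splicing, not its chemistry or the residues that real splice junctions require.

```python
def splice_out(precursor: str, start: int, end: int) -> str:
    """Excise precursor[start:end] (an intein in a protein, or a self-splicing
    intron in an RNA) and join the flanking sequences directly."""
    if not 0 <= start < end <= len(precursor):
        raise ValueError("invalid intervening-sequence boundaries")
    return precursor[:start] + precursor[end:]


# Hypothetical toy precursor: N-extein + intein (lower case) + C-extein.
n_extein, intein, c_extein = "MKTAY", "csgegilnvhn", "SLEHH"
precursor = n_extein + intein + c_extein
start = len(n_extein)
end = start + len(intein)
print(splice_out(precursor, start, end))  # MKTAYSLEHH
```

Because the excised segment leaves no trace in the joined product, an intein or self-splicing intron can sit inside an essential gene without disrupting the final protein, which helps explain its sporadic persistence.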
Fig 6d: Evolutionary diversification of Mimiviruses from nucleocytoplasmic large DNA viruses (Fischer et al.), and in relation to the three domains of cellular life based on concatenated sequences of seven universally conserved proteins (Raoult et al.)
Mamaviruses also host parasitic virophages, viral satellites affectionately named Sputnik (Pearson 2008), which piggyback on the large viral factories set up by these giant viral genomes and cause the host viruses to sicken. These virophages also contain genes linked to viruses infecting each of the three domains of life, Eukarya, Archaea and Bacteria (La Scola et al.). It has thus been suggested that these viruses played a primary role in the establishment of cellular life and may have been instrumental in the emergence of the nuclear envelope.
Tangled Roots of Horizontal Transfer
Darwin (1859) was the first person to publish an evolutionary tree of life on page 133 of his seminal work. The basis of such a tree in the genetic age has become the vertical transfer of genetic information through reproduction coupled to mutation and selective advantage. This is the basis of the tree diagram itself and all evolutionary trees constructed on genetic data. However, despite the division into three domains, further investigations of proteins in the three domains began to reveal a much more confused and complicated picture.
Firstly, the ribosomal proteins, like the rRNAs, show distinct, easily differentiated morphologies, with some correspondences linking one pair of domains and others linking another pair (Forterre 2006b, Woese 2000). Secondly, horizontal transfer of genes, e.g. through viral interaction, has occurred at fluctuating rates throughout all the domains of life. Lawton (2009) provides an in-depth review of this debate. Thirdly, the proteins in Eukaryotes appear to have a mixed origin, with the informational ones having an evolutionary relationship with archaea but the metabolic enzymes appearing to have a bacterial origin. This suggests that the Eukaryote genome has resulted from one or more symbiotic fusions, e.g. of an archaeal and a bacterial genome, and/or that there has been a high degree of horizontal gene transfer between bacteria and Eukaryotes.
Fig 7: Evolution of iron-sulphur cluster proteins of mitochondria is linked to α-proteobacteria (Emelyanov 2003)
The evidence for symbiotic inclusions is clear from the fact that all Eukaryotes have endosymbiotic respiring mitochondria. Plants also have photosynthetic chloroplasts derived from cyanobacteria. The only apparent exceptions are a few primitive anaerobic organisms, such as the metamonad human gut parasite Giardia lamblia, which nevertheless has a mitochondrial remnant. Mitochondria are evolutionarily related to α-proteobacteria, in particular the Rickettsiales and the SAR11 clade (Emelyanov 2003, Thrash et al. 2011), named after its discovery in the Sargasso Sea.
SAR11 clade organisms, unlike the Rickettsiae causing diseases such as typhus, are free-living and dominant in the ocean. Pelagibacter ubique, discovered in the Sargasso Sea, together with its relatives constitutes 25% of all microbial plankton cells, making these the most abundant ocean bacteria and possibly the most abundant bacteria on Earth. It is one of the smallest self-replicating cells known, with a length of 0.37-0.89 µm and a diameter of only 0.12-0.20 µm, and 30% of the cell's volume is taken up by its genome. It has the smallest genome of any free-living organism, 1.30 Mbp, encoding only 1,354 open reading frames (1,389 genes in total); the only species with smaller genomes are intracellular symbionts and parasites, such as Mycoplasma genitalium. It recycles dissolved organic carbon and undergoes regular seasonal cycles in abundance, in summer reaching ~50% of the cells in temperate ocean surface waters, so it plays a major role in the Earth's carbon cycle.
Fig 7b: Nitrosopumilus maritimus, one of the ubiquitous Thaumarchaea, and Pelagibacter ubique from the SAR11 clade.
Complementing this, among the close archaeal relatives of the eucaryote endosymbiosis are the Crenarchaeota and the closely related TACK sub-clade Thaumarchaeota, discovered from their wide occurrence in ocean samples, which may have an even closer relationship with eucaryotes (Brochier-Armanet et al. 2008). The wider grouping of Crenarchaeota was originally thought to consist of extreme thermophiles, such as the aerobic Sulfolobus solfataricus, found in an Italian hot pool and growing at 80 °C at pH 2-4. Since then, however, ubiquitous low-temperature Thaumarchaeote species, such as Cenarchaeum symbiosum and Nitrosopumilus maritimus, have been discovered in the cool, oxic ocean. Nitrosopumilus is one of the smallest living organisms, at 0.2 µm in diameter. It has a genome of 1.65 Mbp and lives by oxidizing ammonia to nitrite. Based on measurements of their signature lipids in ocean samples, these organisms are thought to be very abundant, estimated at 10^28 cells in the world's oceans (Konneke et al. 2005), suggesting that they have a major role in global biogeochemical cycles and are one of the main contributors to the fixation of carbon. DNA sequences from Crenarchaea have also been found in soil and freshwater environments, suggesting that this phylum is ubiquitous in most environments.
Significantly, these two organisms have been shown to possess a eucaryote-type topoisomerase 1B (swivelase), which plays a major role in DNA replication, transcription and chromatin assembly in eucaryotes and is distinct from the 1B types found in some viruses and bacteria (Brochier-Armanet, Gribaldo & Forterre 2008). This confirms that the founding eucaryote had a DNA genome. The closely related Caldiarchaeum subterraneum harbors a ubiquitin-like protein modifier system with structural motifs specific to eukaryotic system proteins. The presence of such a eukaryote-type system is unprecedented in prokaryotes, and indicates that a prototype of the eukaryotic protein modifier system is present in the Archaea (Nunoura et al. 2010).
Thus the two cell types believed to be involved in eucaryote endosymbiosis are both closely related to existing ubiquitous species dominant on a global basis.
Fig 8: Tangled roots (Doolittle 2000)
Many α-proteobacteria, including Rickettsiae (and the related Wolbachia and Agrobacterium), obligately live in the cytoplasm of other cells, and so are naturally suited to becoming an endosymbiont of a glycolytic organism, supplying respiratory energy to the host's metabolism, as the mitochondrion does. Giardia still retains traces of mitochondrial proteins, so it appears to have lost its respiring organelles through specializing for an anaerobic parasitic habitat, rather than occupying a place in the tree before mitochondria were incorporated into eucarya (Adam 2000). In fact Giardia was found to have mitochondrial Fe-S proteins located in vestigial mitochondria, now called mitosomes (Zimmer 2009). Some protista, such as Trichomonads, and fungi, such as chytridiomycetes from the cattle rumen, have hydrogen-generating hydrogenosomes which share evolutionary homology with mitochondria (Martin & Mentel 2010). Some mitochondria in worms and molluscs are also able to shift to a low-energy anaerobic mode.
Having discovered the hydrogenosome, Martin and Muller (1998) suggested that the advent of mitochondrial symbiosis might be explained as an interaction under anoxic conditions between a respiring hydrogen-producing bacterium and an archaeal cell utilizing hydrogen to make ATP, as many archaea still do. Between 1000 and 600 Mya, the ocean underwent an anoxic period, possibly due to a snowball Earth phase which was broken by atmospheric CO2 buildup, and a major oxygen increase generated by cyanobacteria, which then precipitated the acidic ocean iron and triggered the Cambrian.
There are striking instances of horizontal transfer in higher organisms: cows, for example, have a gene that originated in snakes. The picture of horizontal transfer is even more tangled in bacterial and archaeal genomes, which contain a great number of shared and exchanged genes, spread through promiscuous viral transfer between species.
This has led to Doolittle (1998), Woese (2002), and others, proposing a tangled root to the tree of life involving a transition from a regime in which there was a much higher rate of horizontal exchange and effective global optimization of genomes, to tree-like vertical evolution of genomes, once the more complex genomes of the Eukaryote domains became established.
Fig 9: Left: Superfamily fold incidence evolutionary tree of eucaryotes and the three domains of life. The number of folds connecting each group is shown lower left (Yang, Doolittle & Bourne 2005).
Right: High resolution tree of the three domains of life - eubacteria, archaea, eukaryotes (Ciccarelli, Bork et. al. 2006).
One way of testing whether the three branches actually form a meaningful evolutionary tree is to use the simple presence or absence of a given superfamily protein fold as a classifier (Yang, Doolittle & Bourne 2005). Classification by fold occurrence proves more accurate than many other taxonomic measures: it correctly divides the Archaea into crenarchaea and euryarchaea and groups the Eukarya into animals, plants, fungi and others (protists). The method yields the evolutionary tree shown on the left of fig 9, implying that LECA was a well-defined organism with a rich gene complement and not just a quasi-genome shaped by horizontal transfer.
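The idea of building a tree purely from which folds an organism has can be sketched in a few lines. This is not Yang, Doolittle & Bourne's pipeline: it assumes a Jaccard distance between fold-presence sets and average-linkage (UPGMA) clustering, and the taxa and fold labels (`a1`, `d2`, ...) are invented for illustration only.

```python
from itertools import combinations

# Hypothetical fold-presence profiles: invented labels, not real SCOP superfamilies
FOLDS = {
    "human":      {"a1", "a2", "b1", "b2", "c1", "e1"},
    "yeast":      {"a1", "a2", "b1", "b2", "c1"},
    "sulfolobus": {"a1", "b1", "d1", "d2"},
    "methanogen": {"a1", "b1", "d1", "d3"},
    "e_coli":     {"a1", "f1", "f2", "f3"},
}


def jaccard_distance(x: set, y: set) -> float:
    """1 - |x & y| / |x | y|: 0.0 for identical fold sets, 1.0 for disjoint ones."""
    return 1.0 - len(x & y) / len(x | y)


def upgma(profiles: dict) -> str:
    """Average-linkage (UPGMA) clustering of fold profiles; returns a Newick-style topology."""
    clusters = {name: [name] for name in profiles}  # cluster label -> member taxa

    def distance(a: str, b: str) -> float:
        # UPGMA: mean of all pairwise distances between members of the two clusters
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(jaccard_distance(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the closest pair of clusters into a new labelled cluster
        a, b = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))


print(upgma(FOLDS))  # (e_coli,((human,yeast),(sulfolobus,methanogen)))
```

With these toy profiles the two eukaryote-like sets join first, then the two archaeal-like sets, and the bacterial-like set attaches last. UPGMA is chosen only for brevity; it assumes roughly equal rates along all branches, which real fold-content studies need not.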
To try to clarify the taxonomic relationships founding the tree of life, Peer Bork and his team produced a refined evolutionary tree (fig 9 right) by selecting only universal proteins that had not been subjected to horizontal transfer, giving the most detailed tree root diagram to date, although admittedly based on a skeleton gene set comprising only some 1% of the respective genomes. The phylogenetic tree is based on a cleaned and concatenated alignment of 31 universal protein families and covers 191 species whose genomes have been fully sequenced. Merhej et al. have further demonstrated convergent evolution among specialized bacterial groups.
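The "concatenated alignment" step can be illustrated with a minimal supermatrix sketch. It assumes each protein family has already been aligned separately; the family names, species names and sequences below are hypothetical, and the cleaning and tree-inference stages of the actual study are not shown.

```python
# Hypothetical per-family alignments: family -> {species: aligned sequence}
ALIGNMENTS = {
    "family_01": {"sp_A": "MKV-LE", "sp_B": "MKVQLE", "sp_C": "MRV-LD"},
    "family_02": {"sp_A": "GRNN", "sp_C": "GRNQ"},  # sp_B lacks this family
}


def concatenate(alignments: dict) -> dict:
    """Join per-family alignments into one supermatrix row per species."""
    species = sorted({sp for aln in alignments.values() for sp in aln})
    rows = {sp: [] for sp in species}
    for family, aln in alignments.items():
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{family}: sequences are not aligned to a common length")
        width = widths.pop()
        for sp in species:
            # Missing families are coded '?' (unknown), distinct from alignment gaps '-'
            rows[sp].append(aln.get(sp, "?" * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}


for sp, row in concatenate(ALIGNMENTS).items():
    print(sp, row)
```

Concatenation pools the weak signal of many short genes into one long alignment, which is why restricting the input to families without evident horizontal transfer matters: a single transferred family would inject a conflicting history into every downstream tree.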
Fig 10: Tree diagram of the birth, transfer, duplication and loss of key genes in the redox and electron transport pathways, in a founding burst of gene evolution between 3.3 and 2.7 billion years ago (David and Alm 2010).
Then Lawrence David and Eric Alm (2010) produced the above tree investigating the central genes common to a wide spectrum of life forms, involving the founding steps of redox reactions and electron transport, demonstrating a rapid evolutionary innovation during an Archaean genetic expansion between 3.3 and 2.7 billion years ago. They mapped the evolutionary history of 3983 gene families that occur in a wide range of modern species. They were able to show that 27 per cent of these gene families appeared in a short evolutionary burst. Many of the genes from this time were involved in electron transport - a key step in respiration and photosynthesis, which ultimately led to oxygen-producing photosynthesis and the "great oxygenation event" 2.4 billion years ago, when the atmosphere became oxygen rich.
This lends support to the idea that the collective primordial genome functioned as a supercomputer (King 2010) based on parallel genetic algorithms combined with horizontal genetic transfer, whose bit computation rate through mutation and recombination is sufficient to generate the functional conformations, through protein folding, to solve the key metabolic pathways over a period no longer than 300 million years.
Fig 10b: A tree of orphan genes in metazoa, with charts for the mouse and fruit fly, shows the emergence of orphans throughout the span of evolution, with a peak in both at 800 million years ago, when the earth emerged from its “snowball” phase, and the current peaks corresponding to newborn genes, many of which will be lost. About 20 percent of new genes in fruit flies appear to be required for survival, and many others show signs of natural selection (Tautz & Domazet-Loso 2011).
Contrasting with the early emergence of key functional genes, it has been discovered that regions of non-coding DNA have repeatedly been activated to become de novo 'orphan' genes, which cannot be accounted for by gene duplication, by conversion to new functions through exon shuffling into new modular arrangements, or by generation from transposable elements. While orphan genes might seem exponentially improbable, on the basis that only a tiny fraction of the 4ⁿ possible arrangements of n base pairs could randomly become functional, such genes have recently been found to be ubiquitous: 10-20% of genes in all taxa so far explored lack homologs in other species. About 2/3 of the domains of unknown function (DUF) open reading frames (ORFs) whose 3-D structures have been analysed show folds which are likely outlier extremes of existing gene families not recognized by sequence comparison tools such as BLAST (Jaroszewski et al. 2009), while many are also de novo orphans.
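The scale of the 4ⁿ sequence space behind the improbability argument is easiest to see in powers of ten. The lengths below are illustrative assumptions (300 bp is roughly the coding length of a 100-residue protein), not values from the studies cited.

```python
import math


def log10_sequence_space(n_bp: int) -> float:
    """Return log10(4**n): the number of distinct DNA sequences of length n, as a power of ten."""
    return n_bp * math.log10(4)


# Illustrative lengths only
for n in (10, 100, 300):
    print(f"n = {n:3d} bp: 4^n is about 10^{log10_sequence_space(n):.1f}")
```

Even a short 300 bp open reading frame has about 10¹⁸⁰ possible sequences, which is why the ubiquity of de novo genes implies that functional sequences are far less rare in that space than a naive count suggests.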
A clear example is the Pldi gene in the mouse M. musculus, which has arisen within the past 2.5-3.5 million years in a large intergenic region present in many mammals, including humans, thus excluding gene duplication, transposable elements, or other genome rearrangements. The gene has three exons, shows alternative splicing, and is specifically expressed in postmeiotic cells of the testis enhancing sperm motility (Heinen et al. 2009). Its emergence correlates with indel mutations in the 5' regulatory region. A recent selective sweep is associated with the transcript region in M. musculus populations.
In humans, at least one de novo gene is active in the brain, leading some scientists to speculate such genes may have helped drive the brain’s evolution. Others are linked to cancer when mutated, suggesting they have an important function in the cell. De novo genes are often short, and produce small proteins. Rather than folding into a precise structure they have a more disordered architecture, allowing the protein to promiscuously bind to a broader array of molecules (Singer 2015). Investigations suggest that such taxon-specific genes drive morphological specification, enabling organisms to adapt to changing conditions in the generation of morphological diversity, and innate defence (Khalturin et al. 2009).
Bacteria engage in much more radical forms of pan-sexuality than higher organisms, with viruses and plasmids, themselves separate mobile genetic elements, acting as agents of genetic transfer and accelerating the pace of bacterial evolution (Maxmen 2010). This enables the genetic sequences of bacteria, archaea and protists to move around within the genome and to be exchanged between cells, and even between different species. Genetic material can be exchanged both by viruses and through conjugative plasmids, which can spool DNA from one bacterium into another, resulting in a net donation of genes from one strain or species to another. This ensures a broad exchange of genetic material throughout bacterial ecosystems and the rapid accumulation of advantageous genes, exemplified by plasmid-borne infectious drug resistance.
To give a very rough idea of the computing power of the combined bacterial genome alone: multiplying bacterial soil densities (~10⁹/g), effective surface area (~10¹⁸ cm²), genome sizes (~10⁶ bp) and combined reproduction and mutation rates (~10⁻³/s) gives a combined presentation rate of new combinations of up to 10³⁰ bits per second, roughly 10¹³ times greater than the current fastest computer at 33 petaflops, or about 10¹⁷ bit ops per second. Corresponding rates for complex life forms would be much lower, at around 10¹⁷ per second, because they are fewer in total number and have lower reproduction rates and longer generation times, but they still vie with the computation rates of the fastest supercomputer on earth.
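The order-of-magnitude estimate above can be retraced by multiplying the quoted factors. One assumption is introduced here that the text leaves implicit: roughly 1 g of soil per cm² of effective area, so that a density per gram converts into cells over the stated surface area.

```python
import math

# Rounded order-of-magnitude inputs quoted in the text
cells_per_gram = 1e9          # bacterial density in soil (cells/g)
effective_area_cm2 = 1e18     # effective surface area (cm^2)
genome_size_bp = 1e6          # typical bacterial genome size (bp)
event_rate_per_s = 1e-3       # combined reproduction and mutation rate (per s)
computer_bit_ops = 1e17       # ~33 petaflops expressed as bit operations per s

# Assumption (not stated in the text): ~1 g of soil per cm^2 of effective area
grams_per_cm2 = 1.0

cells = cells_per_gram * grams_per_cm2 * effective_area_cm2
bits_per_s = cells * genome_size_bp * event_rate_per_s
ratio = bits_per_s / computer_bit_ops

print(f"~10^{math.log10(bits_per_s):.0f} bits/s, ~10^{math.log10(ratio):.0f} x the supercomputer")
```

Because every input is a rounded power of ten, the result carries at least an order of magnitude of uncertainty in either direction; the calculation shows how the headline figure is assembled rather than pinning it down.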
An even higher figure has been given by Landenmark et al. (doi:10.1371/journal.pbio.1002168.t001). Using information on the typical mass per cell for each domain and group, and on genome size, they estimate the total amount of DNA in the biosphere to be 5.3 × 10³¹ (±3.6 × 10³¹) megabases (Mb) of DNA (their Table 1). This quantity corresponds to approximately 5 × 10¹⁰ tonnes of DNA, assuming that 978 Mb of DNA is equivalent to one picogram. Assuming the commonly used density for DNA of 1.7 g/cm³, this DNA is equivalent to the volume of approximately 1 billion standard (6.1 × 2.44 × 2.44 m) shipping containers. The DNA is incorporated within approximately 2 × 10¹² tonnes of biomass and approximately 5 × 10³⁰ living cells, the latter dominated by prokaryotes. By analogy, it would require 10²¹ computers with the mean storage capacity of the world’s four most powerful supercomputers (Tianhe-2, Titan, Sequoia and K computer) to store this information. If all the DNA in the biosphere were being transcribed at an estimated rate of 30 bases per second, the potential computational power of the biosphere would be approximately 10¹⁵ yottaNOPS (yotta = 10²⁴), about 10²² times more processing power than the Tianhe-2 supercomputer, which has a processing power of 33.86 petaflops, on the order of 10⁵ teraFLOPS (tera = 10¹²).
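The chain of unit conversions in this estimate can be checked directly. The sketch uses only the central values quoted in the paragraph (the ±3.6 × 10³¹ Mb uncertainty is ignored) and treats one transcribed base as one operation, as the text does.

```python
# Inputs as quoted in the text (central estimates only)
total_dna_mb = 5.3e31                # megabases of DNA in the biosphere
mb_per_pg = 978                      # megabases per picogram of DNA
dna_density_g_cm3 = 1.7              # density of DNA (g/cm^3)
container_m3 = 6.1 * 2.44 * 2.44     # one standard shipping container (m^3)
bases_per_s = 30                     # assumed transcription rate
tianhe2_flops = 33.86e15             # 33.86 petaflops

mass_g = total_dna_mb / mb_per_pg * 1e-12          # pg -> g
mass_tonnes = mass_g / 1e6                          # g -> tonnes
volume_m3 = mass_g / dna_density_g_cm3 / 1e6        # cm^3 -> m^3
containers = volume_m3 / container_m3
ops_per_s = total_dna_mb * 1e6 * bases_per_s        # Mb -> bases, times bases/s
yotta_nops = ops_per_s / 1e24
ratio_to_tianhe2 = ops_per_s / tianhe2_flops

print(f"mass       ~ {mass_tonnes:.1e} t")
print(f"containers ~ {containers:.1e}")
print(f"compute    ~ {yotta_nops:.1e} yottaNOPS, {ratio_to_tianhe2:.1e} x Tianhe-2")
```

The results come out at about 5.4 × 10¹⁰ tonnes, just under 9 × 10⁸ containers (the "approximately 1 billion" of the text), about 1.6 × 10¹⁵ yottaNOPS and a factor of roughly 5 × 10²² over Tianhe-2, matching the quoted orders of magnitude.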
This picture of bit rates coincides closely with the Archaean expansion scenario noted above and suggests that evolution has been a two-phase process, in which the much higher bit rates of the collective single-celled genome, under promiscuous sexuality and horizontal transfer, have arrived at a global genetic solution to the protein-folding problems of the central metabolic, electrochemical and even root developmental pathways, which are later capitalized on by multi-celled organisms through gene duplication and loss, as well as through the creation of new specialized genes at a much lower rate.
Fig 11: Horizontal transfers across the bacterial tree under two thresholds 10, 5 genes (Dagan et. al.).
The massive extent of horizontal transfer in eubacteria, as well as archaea, has also become clear, suggesting that large components of procaryote genomes are effectively globally optimized for their niches by frequent genetic transfer. Dagan et al. have characterized the extent of horizontal transfer for a series of thresholds, as well as establishing specific modularity in the horizontal transfer of functions between groups.
Fig 12: Genetic diffusion at the root of the tree (Dagan and Martin)
Critics of the validity of the tree root concept, such as Dagan and Martin emphasize the small proportion (1%) of the genome used in Bork's study and stress both the lateral (or horizontal) gene transfer events uniting the prokaryote realms and the apparent chimaeric nature of the Eukaryote genome, which appears to contain both archaea-related informational genes and eubacterial metabolic ones, in addition to obvious endosymbiont events of the mitochondrion and chloroplast.
The case for horizontal transfer of genes between unrelated Eukaryote species through infectious elements invading new and hence non-resistant species is also well established. Genome-wide comparative and phylogenetic analyses show that HGT in animals typically gives rise to tens or hundreds of active ‘foreign’ genes, largely concerned with metabolism (Crisp et al. 2015). The SPIN element is present in a diverse unrelated set of species, spanning amphibians, reptiles, marsupials and mammals while absent from closely related species (Lisch).
Fig 13: Left: Spread of the HAS1 hyaluronan synthase genes across diverse groups - chordates, other metazoa, fungi, plants, bacteria and archaea (Crisp et al. 2015). Right: Pattern of invasions of the SPIN element (Lisch)
The Eukaryote Nuclear Genome as a Genetic Fusion
Fig 14: The proposed ring of life (Rivera and Lake)
In a further study, Rivera and Lake used a new algorithm to take account of possible genetic fusion events by forming a genetic ring through matching partial trees into a most parsimonious whole, inferring that the Eukaryote genome has arisen from a fusion of an archaeal (possibly eocyte) genome with that of either a cyanobacterium or possibly a γ-proteobacterium.
The method used cannot definitively determine whether or not the eubacterial genome could have come from the mitochondrial event, which, to an even greater extent than the more recent chloroplast event, has resulted in a high net transfer of genes from the mitochondrial chromosome to the nucleus. This leaves open the possibility that, in addition to the mitochondrial symbiosis and the later chloroplast one, there may have been an additional genetic fusion. Lane and Archibald have cited further major endosymbiosis events involving complex three-genome interactions in protista, where both green and red algae have been incorporated by endosymbiosis into other protists, demonstrating both that endosymbiosis has occurred many times and the genomic complexity of nuclear-symbiont gene exchange.
Fig 15: Proposed fusion between two genomes - informational, from Archaea (red), and metabolic, from Eubacteria (blue) as well as mitochondrial genes migrating to the nucleus (green) (Horiike et. al.). Up to 75% of nuclear genes whose ancestry has been elucidated may come from bacteria (Lane).
The idea of a genetic fusion between a member of the archaea and a γ-proteobacterium is also supported by several other lines of research. These include the evolution of glycolytic enzymes (Emelyanov); a ‘homology-hit analysis’ of non-mitochondrial genes, which determined the number of yeast ORFs in each functional category with orthologs among the ORFs of six Archaea and nine Bacteria at several thresholds, suggesting an archaeal parasite engulfed by a eubacterium (Horiike et al.); and the proposal that a close association between a central methanogenic archaebacterium (archaea) and a close-knit surrounding clump of ancestral sulfate-respiring δ-proteobacteria could also have led to the nucleus and endoplasmic reticulum (Moreira and Lopez-Garcia). See also fig 16b2.
Recent work shows that the mitochondrial and nuclear genomes are in a feedback relationship, rather than the former being merely a denuded genetic skeleton reduced to its bare-bones functions. While the transfer of mitochondrial genes to the nucleus has left a tally of only 37 mitochondrial genes, comprising respiration enzymes and essential genes involved in mitochondrial DNA and protein synthesis, the highly compact 16,569 bp human mitochondrial genome nevertheless contains up to 500 overlapping open reading frames (Yen et al. 2013), as well as abundant mitochondrial genome-encoded small RNAs (Ro et al. 2013), which appear to be products of currently unidentified mitochondrial ribonucleases and may have a regulatory role. Humanin, a ubiquitous protein involved in human stress protection (Yen et al. 2013), appears to be generated from the mitochondrial genome, although duplicate copies also appear to have been transferred to the nucleus. This is also consistent with the metabolically responsive roles of the mitochondrion, for example in initiating apoptosis (controlled cell death), in which humanin plays an inhibitory role, and in enriching synaptic contacts in neurons (Sun et al. 2013). Mitochondrial genomes evolve 5-15 times faster than the nuclear genome. Until recently, it was generally accepted that all mitochondrial DNA molecules are identical at birth; however, recent work has shown that ~25% of healthy individuals inherit a mixture of wild-type and variant mtDNA, generally involving the non-coding hypervariable mtDNA D-loop responsible for initiating DNA replication and transcription (Payne et al. 2013). Variant mtDNA lineage expansions have been found in Tibetan sherpas living at different altitudes, suggesting evolutionary adaptations on a timescale of the order of 10² years (Kang et al. 2013).
In turn, it has also been found that endurance exercise can turn over mitochondria, effectively removing those with reduced function due to accumulated mutations (Safdar et al. 2011).
Bacteria generally have a nucleoid region of distinctive cytoplasm around the DNA, but no nuclear membrane. An intriguing question is raised by the bacterium Gemmata obscuriglobus, a member of the phylum Planctomycetes (see fig 9 enlargement), which appears to have both a nuclear envelope and endoplasmic reticulum-like intracellular membranes, as well as the ability to take up proteins present in the external milieu in an energy-dependent process analogous to eukaryotic endocytosis. This is consistent with autogenous evolution of endocytosis and the endomembrane system in an ancestral non-eukaryote cell (Lonhienne et al.).
Fig 16: Gene replacement tree root (Makarova). Hartman and Fedorov's list of putative chronocyte genes corresponds to the 359 above.
Other theories stick to the three-domain paradigm and propose that a primitive eukaryote precursor engulfed both a member of the archaea and a eubacterium. This precursor may still have retained an RNA-based genome, as Woese (1998) suggested for the progenote (the first root life form), making it the last universal common ancestor of all three domains, and may already have possessed genes for the endoplasmic reticulum and microtubules. Hartman and Fedorov cite a collection of such genes, including those for ribosomal proteins, naming the organism a chronocyte.
This is also consistent with the much greater complexity of use of RNA in Eukaryotes, including alternative splicing, the use of introns, interfering-RNAs in gene regulation, micro-RNAs and the use of the nucleus to contain a diversely functioning RNA informational metabolism not unlike that of a putative progenote.
In 2015 a new archaeal phylum, Lokiarchaeota, was discovered at the Arctic Mid-Ocean Spreading Ridge near the Loki's Castle vent. It belongs to the TACK superphylum, already known to include homologs of actins, tubulins and cell division sorting proteins (Guy & Ettema), illustrated in the tree, and it has a closer evolutionary relationship with eucaryotes than any other known archaeon, as well as the largest complement of ESPs, or eucaryote signature proteins, so far discovered (Spang et al.). The analysis of the genome of Lokiarchaeum revealed about 175 proteins related to eukaryotic proteins. These included actins; ubiquitin-modifying proteins; a diverse set of Ras-superfamily GTPases surpassed only by the eucaryote Naegleria gruberi (see below); an ESCRT gene cluster, which in eucaryotes is an essential component of the multivesicular endosome pathway for lysosomal degradation of damaged or superfluous proteins and plays a role in several budding processes, including cytokinesis, autophagy and viral budding; and several additional proteins homologous to components of the eukaryotic multivesicular endosome pathway, including curvature-sensing protein families involved in various aspects of vesicle/membrane trafficking or remodelling processes in eukaryotes.
LECA: Finding the Roots of the Eucaryote and Metazoan Trees
Just as with the enigmatic nature of LUCA, the last common ancestor of all life on earth, the nature of LECA, the last eucaryote common ancestor, remains obscure. We can now turn from the question of the genetic fusion founding the eucaryotes to the question of what genetic and cellular regulation and signalling systems our common ancestor possessed, and how early the key components enabling multicellular life evolved.
Fig 16a: (a) Complement of signalling systems found in Naegleria gruberi (Fritz-Laylin et al. 2010), a free-living single-celled bikont amoebo-flagellate belonging to the excavates, which include some of the most primitive eucaryotes, such as Giardia and Trichomonads. Nevertheless it is capable of both oxidative respiration and anaerobic metabolism and can switch between amoeboid and flagellated modes of behavior, regenerating complete centrioles and flagella de novo (Fritz-Laylin & Cande 2010). The Naegleria genome encodes actin and microtubule cytoskeletons, mitotic and meiotic machinery suggesting cryptic sex, several transcription factors and a rich repertoire of signalling molecules, including G-protein coupled receptors, histidine kinases and second messengers including cAMP. One strain analyzed is a composite of two distinct haplotypes, indicating hybridization. Although sexual mating has not been observed in Naegleria, the heterozygosity found in its genome is typical of a sexual organism, with perhaps infrequent matings. Additionally, identification of the core RNAi machinery indicates that Naegleria may use this mechanism. Naegleria fowleri is a notorious warm-water species causing meningitis. (b) Two of four genes involved in meiosis: HOP1, believed to be a component of the synaptonemal complex essential for sexual recombination, and MRE11 (meiotic recombination 11), involved in recombination, telomere processing and double-stranded DNA break repair. Both show a wide evolutionary distribution across eucaryotes including Giardia, implying that LECA, the last eucaryote common ancestor, was a sexual organism (Ramesh, Malik & Logsdon 2005). MRE11, along with RAD51, also shows homology with Archaea, suggesting an even deeper origin of sexuality.
The fact that primitive eucaryotes such as Giardia and Trichomonads appear to lack key structures and processes, such as mitochondria and sexual recombination, initially led to the notion that the founding eucaryote was a simple amoeboid cell lacking the genetic complexity of more advanced protista and higher organisms. However, it is now clear that a tree reconstruction artefact known as long-branch attraction is responsible for the apparent early emergence of the fast-evolving Archezoa in the eukaryotic tree. The notion that all extant eukaryotes are ancestrally mitochondrial is strongly supported by the discovery of rudimentary mitochondrial organelles in all analysed Archezoa. Part of the problem comes from the fact that organisms such as Giardia and Trichomonads have specialized for parasitic microenvironments, in particular anaerobic conditions, where they can suffer rapid loss of inessential genes, distorting the evolutionary picture. When we look at close free-living relatives of these organisms, we discover a complex genetic complement containing most of the functionalities of "advanced" eucaryotes.
Naegleria gruberi transitions from amoeboid to flagellate behavior and back (https://youtu.be/yy5zI3LBK24).
The free-living amoebo-flagellate Naegleria gruberi for example (Fritz-Laylin et al. 2010) belongs to a varied and ubiquitous protist clade (Heterolobosea) that diverged from other eukaryotic lineages over a billion years ago and is very close to the hypothetical root of the eucaryote tree (see fig 16b5). The Naegleria genome, analyzed in the context of other protists, reveals a remarkably complex ancestral eukaryote with a rich repertoire of cytoskeletal, sexual, signaling, and metabolic modules.
Fig 16b: Consensus tree for eucaryote root (Brinkmann & Philippe 2007). The assumption underlying Woese's paradigm, that simple organisms (e.g., prokaryotes and Archezoa) represent genuinely primitive intermediates in the progressive construction of complex eukaryotic cells, has, on the basis of more recent genetic analysis, been replaced by that of a complex last common eucaryote ancestor (LECA/LCAEE). In the consensus picture, any feature present in some opisthokonts (e.g., animals) and in some bikonts (e.g., plants) is necessarily ancestral, i.e., inherited from the LECA. This implies that the LECA was, in Brinkmann and Philippe's words: "amazingly more complex than previously thought. Among others, the LCAEE already possessed mitochondria, an efficient cytoskeleton associated with several intracellular transport systems, an endo-membrane system interconnected by a complicated vesicular transport machinery, including an endoplasmic reticulum, a Golgi apparatus, a standard nucleus, efficient secretory and uptake pathways, recycling of food‐vacuoles, peroxisomes, spliceosomal introns, and flagella‐dependent motility. Therefore, simple extant eukaryotes have evolved from a complex LCAEE mainly by loss and secondary simplification".
Recently, a molecular dating study based on a large phylogenomic dataset, with a relaxed molecular clock and multiple time intervals, yielded a surprisingly recent estimate of 1085 Mya for the origin of extant eukaryotic diversity. Extant eukaryotes therefore seem to be the product of a massive radiation that happened rather late, at least compared with prokaryotic diversity (Brinkmann & Philippe 2007). Given the large number of novel eucaryote genes and folds (Lane & Martin 2010), this late diversification needs to be carefully reconciled with the results of David and Alm (2010) shown in fig 10, and with the fact that the cyanobacterial chloroplast was incorporated by plants between 1200 and 1800 Mya (Dorrell & Howe 2012). The oldest undisputedly eukaryotic microfossils [containing mitochondria] go back 1450 Mya in the fossil record (Martin & Mentel 2010).
Fig 16b2: Left: The inside-out hypothesis for the origin of the eucaryote cell has the original eocyte, a cell bounded by what became the nuclear envelope, budding out external blebs through the nuclear pore structures to facilitate symbiosis with γ-proteobacteria, later giving rise to the endoplasmic reticulum and engulfing the mitochondria as the blebs evolved into a continuous external membrane. Right: two archaeal Candidatus Giganthauma karukerense cells surrounded by ectosymbiotic γ-proteobacteria (Baum & Baum 2014).
The eucaryote tree of life remains an enigma, particularly in terms of locating the root of the tree. It was originally portrayed in terms of five kingdoms: plants, animals and fungi protruding from the single-celled protista, complemented by the eubacteria and archaeotes classed as monera. However, the tree has now been revised (Adl 2012), and it is now recognized that multicellularity has arisen independently in many branches of the eucaryotes (Brown et al. 2012).
Fig 16b3: (Left) Traditional five-kingdom eucaryote tree of life (Whittaker). (Right) The multiple evolution of multicellularity in diverse eucaryote super-groups extends to many groups besides fungal and animal Opisthokonts and plant and algal Archaeplastida (Brown et al. 2012).
However, a key step in multicellularity has been linked to a single critical mutation. To form and maintain organized tissues, cells must coordinate how they divide relative to the position of their neighbours. One important aspect of this process is the orientation of the mitotic spindle, a structure inside the dividing cell that distributes the chromosomes, and the genetic material they carry, between the daughter cells. When the spindle is not oriented properly, malformed tissues and cancer can result. In a diverse range of animals, the orientation of the spindle is controlled by an ancient scaffolding protein that links the spindle to “marker” proteins on the edge of the cell. Anderson et al. (2015) used a technique called ancestral protein reconstruction to investigate how this molecular complex evolved its ability to position the spindle. They first inferred the amino acid sequences of the scaffolding protein’s ancient progenitors, which existed before the origin of the most primitive animals on Earth, by computationally retracing the evolution of large numbers of present-day scaffolding protein sequences down the tree of life into the deep past. Living cells were then made to produce the ancient proteins, allowing their properties to be examined experimentally. By dissecting successive ancestral versions of the scaffolding protein, they deduced how the molecular complex that it anchors came to control spindle orientation. This new ability evolved through a number of 'molecular exploitation' events, which repurposed parts of the protein for new roles. The progenitor of the scaffolding protein was actually an enzyme, but the evolution of its spindle-orienting ability can be recapitulated by introducing a single amino acid change that happened many hundreds of millions of years ago.
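The core idea of ancestral reconstruction, inferring the residue at an internal node of a tree from the residues of present-day sequences, can be sketched for a single alignment column. This does not reproduce Anderson et al.'s method: reconstructions of this kind generally use probabilistic (maximum-likelihood or Bayesian) models over whole alignments, whereas the sketch swaps in Fitch's simpler small-parsimony algorithm. The tree, taxon names (`t1`...`t5`) and residues are hypothetical.

```python
from __future__ import annotations
from typing import Union

Tree = Union[str, tuple]  # a leaf name, or a (left, right) pair of subtrees


def fitch_sets(tree: Tree, states: dict) -> tuple[dict, int]:
    """Bottom-up Fitch pass: candidate residue sets for every node plus the parsimony score."""
    sets: dict = {}
    score = 0

    def visit(node: Tree) -> set:
        nonlocal score
        if isinstance(node, str):
            sets[node] = {states[node]}
        else:
            left, right = (visit(child) for child in node)
            shared = left & right
            if shared:
                sets[node] = shared
            else:
                sets[node] = left | right
                score += 1  # one substitution is needed on a branch below this node
        return sets[node]

    visit(tree)
    return sets, score


def assign_states(tree: Tree, sets: dict, parent: str | None = None, out: dict | None = None) -> dict:
    """Top-down pass: keep the parent's residue where possible, else pick one deterministically."""
    out = {} if out is None else out
    options = sets[tree]
    out[tree] = parent if parent in options else min(options)
    if not isinstance(tree, str):
        for child in tree:
            assign_states(child, sets, out[tree], out)
    return out


# One alignment column for five hypothetical present-day sequences
tree = ((("t1", "t2"), "t3"), ("t4", "t5"))
column = {"t1": "S", "t2": "S", "t3": "S", "t4": "P", "t5": "S"}

sets, score = fitch_sets(tree, column)
ancestors = assign_states(tree, sets)
print(score, ancestors[tree], ancestors[("t4", "t5")])  # 1 S S
```

Here the single substitution S to P is placed on the branch leading to `t4`, and every ancestor carries S. Repeating this column by column yields a full ancestral sequence, which is what can then be synthesized and tested experimentally; the likelihood methods used in practice additionally weigh branch lengths and substitution probabilities.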
Fig 16b3b: Tree of Dlg protein evolution leading to animal multicellular mitotic division (Anderson et al. 2015)
Closer studies of the genetics and phylogeny of a large number of eucaryote proteins have led to a revision of the tree in which single-celled species such as amoebae are grouped with animals and fungi in the opisthokonts, which are separated at the very root of the tree from multicellular plants forming the archaeplastids. Animals are closer cousins to single-celled choanoflagellates than to other multicellular organisms, such as fungi and plants. Giant kelp are closer relatives of single-celled diatoms than of multicellular red seaweeds or plants. Likewise, the separate groups Rhizaria, Alveolates and Stramenopiles have been found to form a single supergroup, now called SAR.
Fig 16b4: Revised eucaryote tree (Milius 2015) taking into account supergroupings which include both single-celled and multicelled organisms, rooted as closely as possible given several lines of phylogenomic evidence (Burki 2014).
Attempts to find the root of the eucaryote tree naturally revolve around linking eucaryotes to their founding archaea, using RNAs and proteins involved in nucleic acid processing and other processes inherited from the founding archaeal genome. However, the archaeal sequences differ substantially from their eukaryotic counterparts, resulting in extremely long phylogenetic distances between archaea and eukaryotes. Using such a distant outgroup in phylogenetic reconstruction is highly problematic: the remaining phylogenetic signal is very weak, so the signal for correctly positioning the root is weaker still, and nonphylogenetic signal is often stronger than the phylogenetic signal. This favours long-branch attraction, so that fast-evolving eukaryotes are consistently found at the base of all other eukaryotes. Rare cytological and genomic changes specific to some eukaryotic lineages have also been considered for rooting the eukaryotic tree, but an innovative alternative strategy is to use the eubacterial genes which have been incorporated into eucaryote lineages, particularly the alpha-proteobacterial genes originally incorporated with the mitochondria. Derelle et al. (2015) combined the "ALPHA-PROT" dataset, which uses 42 eukaryotic proteins with a mitochondrial function, encoded by the nuclear or mitochondrial genomes, as phylogenetic markers, with the "EUBAC" dataset of 37 eukaryotic genes acquired by ancient lateral gene transfers from different eubacterial sources, and included a wider set of newly sequenced species, arriving at a consistent rooted tree (fig 16c). This places malawimonads such as Malawimonas jakobiformis and members of the Discoba such as Naegleria gruberi on opposite sides of the root, despite the fact that both are classed as excavates (Cavalier-Smith 2010, 2014), which are otherwise regarded as monophyletic on the basis of their flagellar ultrastructure.
Fig 16c: Left: Consistent rooted eucaryote tree using two phylogenetic measures based on eubacterial-origin eucaryote gene sets. Malawimonas jakobiformis and Naegleria gruberi are illustrated to show common features close to the root of the tree. Right: The early origin of sexuality is attested to by recent research into the extended family tree of amoebae, which shows that sexuality is likely to have arisen in the common ancestor and been subsequently lost in asexual protist species, rather than the reverse (Lahr et al 2011). Amoeboid organisms (left) are highlighted on the eucaryote tree of life; SAR indicates the group composed of Stramenopila, Alveolata and Rhizaria. The two principal branches (right), unikont amoebozoa and dikont rhizaria, show confirmed (black), direct, such as meiosis (grey), and indirect (white) evidence for sexuality, indicating it is likely a founding characteristic. Unequivocal evidence for sex in Giardia implies sexuality arose in the last common ancestor of all eucaryotes, very early in evolution (Lane 2009a).
Contrary to the idea that sexuality and sexual recombination are late adaptations of advanced eucaryotes, investigation of the meiosis-related genes HOP1, MRE11, MLH/PMS and RAD51/DMC1 produces evolutionary trees showing these genes occurring across the eucaryotes from Giardia to Homo, implying that sexuality is a founding characteristic of all extant eucaryotes (Ramesh, Malik & Logsdon 2005). Amoeboid forms of both unikont amoebozoa and dikont rhizaria show trends consistent with founding sexuality, later lost in some branches (Lahr et al 2011).
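The inference that sexuality is ancestral and was later lost is, at heart, a parsimony argument about where on a tree a trait changes. The sketch below illustrates that reasoning with Fitch parsimony for a binary trait, assuming equal costs for gains and losses. The tree, the taxon names A to E and their trait states are hypothetical, and this is not the method used by Lahr et al. or Ramesh et al.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf is a taxon name, an internal node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, traits: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch parsimony: return the candidate ancestral states at this node
    and the minimum number of trait changes needed in the subtree below it."""
    if isinstance(tree, str):
        return {traits[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, traits)
    right_states, right_cost = fitch(right, traits)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_cost + right_cost
    # Children disagree: one change must occur somewhere below this node.
    return left_states | right_states, left_cost + right_cost + 1


# Hypothetical tree ((A,B),(C,(D,E))) with asexual tips scattered among sexual ones.
tree = (("A", "B"), ("C", ("D", "E")))
traits = {"A": "sex", "B": "asex", "C": "sex", "D": "sex", "E": "asex"}
root_states, changes = fitch(tree, traits)
print(root_states, changes)  # {'sex'} 2
```

Here a sexual root with two independent losses (on B and E) needs two changes, whereas an asexual root would need at least three. Published analyses typically use likelihood models and may weight gains and losses differently.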
A detailed evolutionary picture showing the relationship between metazoa and protozoans and the bridge between choanoflagellates and sponges has been developed in association with elucidating the genome of the single-celled choanoflagellate Monosiga brevicollis (King et al. 2008).
Fig 16d: (a) Tree of organisms expressing various forms of integrin-related proteins, spanning α-actinin to α-integrin, which includes diverse branches of single-celled eucaryotes (Sebe-Pedros et al. 2010). α-actinin was expressed in all branches. Both α- and β-integrin are expressed in those marked with (*), including the single-celled Capsaspora owczarzaki and Amastigomonas sp. (b) Tree of tyrosine kinases (Suga et al. 2012). Cytoplasmic CTKs have a deep eucaryotic origin, while receptor RTKs evolved later. (c) The tree of GPCR components extends back to LECA (Mendoza et al. 2014), making this signalling pathway, central to the nervous systems of higher animals, a founding unit of eucaryote evolution. All clades except those marked with * had one or more GPCR components, and all clades included species possessing canonical GPCRs, across both unikonta and bikonta, implying a common origin in LECA.
Examination of three key gene systems associated with the emergence of metazoa, the integrin pathway (Sebe-Pedros et al. 2010), tyrosine kinases (Suga et al. 2012) and G-protein coupled receptors (Mendoza et al. 2014), shows that the key components evolved long before, in single-celled eucaryotes, and in the case of G-protein system components even in the last eucaryote common ancestor. In both tyrosine kinases and GPCRs this allowed a burst of diversification of receptor tyrosine kinases and of diverse G-protein coupled receptors as the metazoa emerged.
The view of the root of the metazoan tree has shifted significantly with the discovery of new genetic data, novel primitive species and new fossil evidence. Research into the absence of the miRNA pathway in ctenophores (Maxwell et al. 2012), together with their apparently separate evolution of neural nets and apparent absence of HOX genes (Moroz et al. 2014), suggests that, rather than being sister organisms of the cnidaria, ctenophores may be more ancient life forms originating in the Ediacaran. Fossils of another organism, Eoandromeda octobrachiata, dating back 580 million years (Tang et al. 2011), are consistent with this picture. The discovery in Australia of a living genus, Dendrogramma, with characteristics also consistent with a medusoid Ediacaran origin (Just et al. 2014) would place it even earlier than the emergence of the ctenophores, and also lends weight to the biological nature of Mawsonites spriggi (Seilacher et al. 2005). Furthermore, the discovery of large colonial fossils dating back 2.1 billion years (Albani et al. 2010), in addition to the earlier discovery of the putative tubular alga Grypania spiralis of similar age (Han & Runnegar 1992), suggests a potentially very early origin for multicellular organisms.
Fig 16e: Left: Trees of eucaryotes based on translation elongation factor EF-2 and β-tubulin genes (King and Carroll). Right: Finding the root of the metazoan tree. (a) "Ctenophora-first" tree showing where identified components, from miRNAs to neurons and muscles, became fixed (Moroz et al. 2014). (b) Evidence for early emergence of the Ctenophora, exemplified by Mnemiopsis leidyi, based on their lack of miRNA processing, suggests they diverged before both sponges such as Amphimedon queenslandica and the placozoan Trichoplax adhaerens (Maxwell et al. 2012). (c) Possible evolutionary position of Dendrogramma (Just et al. 2014).
Viral Influences on the Nuclear Genome
Fig 17: Proposed viral contribution of DNA polymerases; FvA etc. denote founder viruses (Forterre).
Other cellular and viral genealogies are possible and the scheme is merely representative.
Forterre likewise looks to a three-component origin, but his emphasis is on the idea that viruses have contributed major components to the genomes of all three groups, possibly providing each of three RNA-based cell lineages with an independent transition to a DNA-based genome by contributing DNA polymerases, thus radically improving the stability and competitiveness of the cell lines that became the eventual survivors. In addition to the ribosomal proteins and rRNAs having distinct qualitative features in each domain, many DNA informational proteins exist in different nonhomologous families (usually with several versions for one family). There are already six known nonhomologous families of cellular DNA polymerases. In the case of the B family of DNA polymerases, there is one version in Bacteria (found only in some proteobacteria), one in Archaea, and several in Eukarya. The distribution of the different versions and families of cellular DNA informational proteins among domains is erratic most of the time and does not fit any of the models proposed for the universal tree, suggesting abrupt insertion into cellular genomes by viral transfer.
Fig 17b: The differing DNA and RNA viral taxonomy of the Rep and capsid genes of BSL RHDV (Diemer & Stedman 2012).
A very unusual virus has given an indication of how the transfer from RNA to DNA could have been mediated by viruses. Although viruses are very promiscuous, they generally recombine only with viruses of a similar type, or at least with the same mode of replication. Thus until recently no instances were known of viral recombination bridging the three major groups: RNA viruses, DNA viruses and retroviruses, which encode DNA from RNA instructions using reverse transcriptase. However, a chimeric, circular, putatively single-stranded DNA virus, BSL RHDV, encoding a major capsid protein similar to those found only in single-stranded RNA viruses, was discovered in a hot acidic lake (Diemer & Stedman 2012). The researchers also found that something very similar had turned up in samples of ocean water sequenced by a team led by Craig Venter. This begins to explain how RNA-viral genes could have come to be carried in DNA viruses, presumably via reverse transcriptase, and how other viral and cellular genes may have made multiple RNA-to-DNA transitions, probably when RNA viruses, DNA viruses and retroviruses cohabited cells.
Viruses such as phages appear to have no coherent evolutionary tree, their genomes across widely diverse habitats consisting of cut-and-paste components, implying that viral adaptation has resulted almost entirely from the use of advantageous genes acquired by horizontal transfer. Around 10% of all bacterial genes sequenced to date are ORFans that bear no resemblance to genes seen anywhere else, suggesting a horizontal viral origin (Hamilton).
Fig 18: Left: Evolutionary tree of DNA polymerase amino termini (Villarreal and DeFilippis).
Right: Bacterial DNA polymerases also show viral members (underlined) close to the root of the tree (File et al.).
Villarreal and DeFilippis have likewise investigated the idea that DNA viruses are the origin of DNA replication proteins, by examining the amino terminus and constructing an evolutionary tree which shows DNA polymerases of DNA viruses, eukaryotes (α, δ), archaea, E. coli and two phages rooted in a tree consistent with a viral origin.
This idea has a great deal of plausibility because viruses are now known to have a potentially primal origin, rather than being recent escapees from cellular genomes that have undergone reductive parasitic changes. Viruses have also clearly retained RNA-to-RNA, DNA-to-DNA and retrotranscribing (DNA-RNA-DNA) replication, using both RNA and DNA stages in their encapsidated forms, so they preserve all the transitional states between RNA- and DNA-based replication.
Furthermore, the retroviruses and related mobile genetic elements share an ancient evolutionary origin with telomerase, itself a reverse transcriptase that uses its own RNA as a template to extend chromosome ends. There is thus a plausible case that telomerase is in fact a biological fossil of a retroviral conversion of the founding Eukaryote cell line to a DNA genome.
The Symbiotic Face of Eukaryote Mobile Elements
Fig 19: (Left) Human transposable element evolutionary history of L1-LINEs (cream), Alu elements (lt. blue), retrovirus-like LTR (long-terminal repeat) elements (green) and DNA transposons (dark brown). Older L2-LINEs and SINEs are in yellow and dark blue. This history extends back over 200 million years, indicating the very ancient basis of this potentially symbiotic relationship (Human Genome Consortium). A comparison with the mouse genome can be found in Waterston et al. (2002). (Right) L1 replication: L1 encodes two open reading frames, ORF1, an RNA-binding protein, and ORF2, an endonuclease/reverse transcriptase. The resulting ribonucleoprotein (RNP) complex is transported to the nucleus, where target-primed reverse transcription into chromosomal DNA takes place (Han and Boeke).
In 1978, following the work of Darryl Reanney (1974-6), I proposed (1978, 1992) that viruses and transposable elements, far from being merely selfish genes (Dawkins 1976), form part of a dynamical system of genetic symbiosis between hosts and mobile genetic elements: the mobile elements permit forms of coordinated gene expression and the modular formation of new genes that would otherwise be impossible, and in return achieve perpetuation of their own genomes over evolutionary time scales. Most of the details of this proposal have since been borne out. The ENCODE project has demonstrated the involvement of all the major classes of human transposable element in regulatory enhancer activity, most of it specific to a single cell type (Thurman 2012).
Fig 20: ENCODE data showing involvement of the major classes of transposable element in enhancer activity (Thurman 2012).
By some reckonings, 40 to 50 per cent of the human genome consists of DNA imported horizontally by viruses, some of which has taken on vital biological functions. Taken together, virus-like genes represent a staggering 90 per cent of the human genome (Hamilton). Coding sequences comprise less than 5% of the human genome, whereas repeat sequences account for at least 50% and probably much more. Transposable LINEs (long interspersed nuclear elements), retroelements common to mammals (Han and Boeke) and insects (Jensen et al, Sheen et al), with a history running back to the Eukaryote origin, are specifically activated in both sperm and eggs during meiosis (Branciforte and Martin, Tchénio et al., Trelogan and Martin), although they are subject to downregulation by interfering piRNAs (Aravin et al). They replicate from transcribed RNA copies of themselves, thus using RNA to instruct DNA copies, indicating an origin in RNA-based life, as does the active RNA processing of our own Eukaryote cells. Their reverse transcriptase shows homologies with the telomerase essential for maintaining immortality in our germ line, indicating a common and symbiotic origin. 100,000 partially defective LINEs, around 100 of which remain fully active in humans, together with their 300,000 dependent, smaller fellow-traveller Alu SINEs, make up a significant portion of the human and mammalian genomes, along with pseudogenes, apparently defective copies of existing genes translocated by elements such as LINEs.
These elements travel passively down the germ line with chromosomal DNA, so their specific activation during meiosis suggests they may act as agents of coordinated regulatory mutation. It also suggests that the type of symbiotic sexuality embraced by bacteria and plasmids continues to function in higher organisms, as a form of sexual symbiosis between our chromosomes and transposable genetic elements. This is consistent with the finding that the 1.4% point-substitution divergence between humans and chimpanzees is overshadowed by an additional 3.9% divergence, giving about 5.4% overall, when insertions and deletions are taken into account (Britten).
SINEs such as the human Alu element, which is derived from the 7SL RNA of the signal recognition particle (the small cellular RNA that guides nascent proteins through the membrane) and free-rides on the LINE reverse transcriptase, are in turn implicated in active functional genes (Reynolds, Schmid), particularly some involved in cellular stress reactions, again suggesting genetic symbiosis. Humans have about 13 times as many RNA edits as non-primate species, including adenosine-to-inosine (A-to-I) edits associated with Alu elements, as well as intron deletions (Holmes) and newly inserted exons (Ast), which may differentiate humans from other apes through alternative splicing of genes expressed in the brain. RNA editing is abundant in brain tissue, where editing defects have been linked to depression, epilepsy and motor neuron disease. There is a new Alu insertion about every 100 births. As many as three quarters of all human genes are subject to alternative splicing.
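Because reverse transcriptase reads inosine as guanosine, A-to-I editing shows up as A-to-G differences between genomic DNA and the corresponding cDNA. A minimal sketch of that comparison follows, assuming the two sequences are already aligned, gap-free and on the same strand; real pipelines must also handle the opposite strand (where edits appear as T-to-C) and filter out SNPs and sequencing errors. The function names and example sequences are hypothetical.

```python
from collections import Counter
from typing import List


def mismatch_spectrum(genomic: str, cdna: str) -> Counter:
    """Count (genomic base, cDNA base) pairs at every mismatched position."""
    if len(genomic) != len(cdna):
        raise ValueError("sequences must be aligned to the same length")
    return Counter(
        (g, c) for g, c in zip(genomic.upper(), cdna.upper()) if g != c
    )


def candidate_a_to_i_sites(genomic: str, cdna: str) -> List[int]:
    """0-based positions where the genome has A but the cDNA reads G,
    the signature of adenosine-to-inosine editing on the transcribed strand."""
    if len(genomic) != len(cdna):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, (g, c) in enumerate(zip(genomic.upper(), cdna.upper()))
        if g == "A" and c == "G"
    ]


genomic = "GCAAGTACAT"  # hypothetical genomic DNA
cdna = "GCAGGTGCAT"     # hypothetical cDNA from the same locus
print(candidate_a_to_i_sites(genomic, cdna))  # [3, 6]
print(mismatch_spectrum(genomic, cdna))       # Counter({('A', 'G'): 2})
```

Across many reads, a strong excess of A-to-G over other mismatch types in the spectrum is what distinguishes genuine editing from random error.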
The recent explosion of research into interfering small RNAs such as miRNAs as regulatory elements in gametogenesis and development (Großhans) has provided an explanation of how pseudogenes, including those retrotransposed via LINE elements, can gain regulatory significance even though they do not produce translatable mRNAs.
Fig 21: Pseudogene-mediated production of endogenous small interfering RNAs (endo-siRNAs). Pseudogenes can arise through the copying of a parent gene (by duplication or by retrotransposition). (a) An antisense transcript of the pseudogene and an mRNA transcript of its parent gene can then form a double-stranded RNA. (b) Pseudogenic endo-siRNAs can also arise through copying of the parent gene as in a and then nearby duplication and inversion of this copy. The subsequent transcription of both copies results in a long RNA, which folds into a hairpin, as one half of it is complementary to its other half. In both a and b, the double-stranded RNA is cut by Dicer into 21-nucleotide endo-siRNAs, which are guided by the RISC complex to interact with, and degrade, the parent gene's remaining mRNA transcripts. The mRNA from genes is in red and that from pseudogenes is in blue. Green arrows indicate DNA rearrangements (Sasidharan and Gerstein).
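Both routes in Fig 21 come down to base pairing. The toy sketch below assumes perfect complementarity and a Dicer that simply cuts the duplex into consecutive 21-nucleotide pieces; real Dicer products carry 2-nt 3' overhangs, and RISC tolerates some mismatches. All sequences and function names are hypothetical.

```python
from typing import List

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the strand that base-pairs antiparallel with `rna`."""
    return "".join(PAIR[base] for base in reversed(rna))


def dicer(sense: str, antisense: str, size: int = 21) -> List[str]:
    """Toy Dicer: if the two strands form a perfect duplex, cut the antisense
    strand into consecutive `size`-nt guide strands (any remainder is dropped)."""
    if reverse_complement(antisense) != sense:
        raise ValueError("strands do not form a perfect duplex")
    return [antisense[i:i + size] for i in range(0, len(antisense) - size + 1, size)]


def risc_cleaves(mrna: str, guide: str) -> bool:
    """A loaded guide targets any mRNA that contains its complementary site."""
    return reverse_complement(guide) in mrna


# Route (a): an antisense pseudogene transcript pairs with the parent gene's mRNA.
parent_mrna = "AUGGCUACGUUAGC" * 3  # 42 nt, hypothetical
pseudogene_antisense = reverse_complement(parent_mrna)
guides = dicer(parent_mrna, pseudogene_antisense)

# Route (b): an inverted duplicate yields one long RNA that folds into a hairpin;
# the two arms of its stem form the same duplex.
loop = "GAAA"
hairpin = parent_mrna + loop + reverse_complement(parent_mrna)
stem = len(parent_mrna)
hairpin_guides = dicer(hairpin[:stem], hairpin[stem + len(loop):])

print(len(guides), all(risc_cleaves(parent_mrna, g) for g in guides))  # 2 True
print(hairpin_guides == guides)  # True
```

Both routes yield the same guides, and every guide finds a complementary site on the parent mRNA, which is the sense in which a non-coding pseudogene can silence its parent gene.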
Although the data from the human genome project indicated that human LINEs are becoming less active as a group compared with the corresponding elements in the more rapidly evolving mouse genome, there remain about 60 active human LINE elements which are known to be responsible for mutations in humans. More recent investigation (Boissinot et al.) shows that the most recent families are highly active. Around four million years ago, shortly after the chimp-human split, a new LINE-1 family, Ta-L1, emerged and is still active, with about half of the Ta insertions being polymorphic, varying across human populations. Moreover, 90% of insertions of Ta-1d, the most recent subfamily, are polymorphic, showing that highly active lineages remain present. LINEs are more heavily distributed on the sex chromosomes, the X chromosome containing 3 times as many full-length potentially active elements and the Y chromosome 9 times as many! This is consistent with a continuing mutational load on humans that is removed more slowly from the sex chromosomes, in proportion to the degree to which crossing over is inhibited on each (almost entirely on the Y, and on the X in males but not in females). Sexual recombination protects against the irreversible accumulation of deleterious mutations in non-recombining genomes, a process known as Muller's ratchet.
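To see why recombination matters here, a toy simulation of Muller's ratchet can help. The sketch below assumes a small Wright-Fisher population, a fixed deleterious mutation rate u per genome per generation, multiplicative fitness (1 - s) per mutation, no back mutation and, in the sexual case, free recombination between parents. All parameter values are illustrative, not taken from the studies cited. In the asexual population the least-loaded class can be lost but never rebuilt, so its load can only ratchet upward.

```python
import math
import random
from typing import FrozenSet, List

Genome = FrozenSet[int]  # set of loci carrying a deleterious mutation


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson variate by Knuth's multiplication method (fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def mutate(rng: random.Random, genome: Genome, u: float, sites: int) -> Genome:
    """Add Poisson(u) new mutations at random loci; there is no back mutation."""
    new = set(genome)
    for _ in range(poisson(rng, u)):
        new.add(rng.randrange(sites))
    return frozenset(new)


def recombine(rng: random.Random, a: Genome, b: Genome) -> Genome:
    """Free recombination: each locus is inherited from a randomly chosen parent."""
    child = set()
    for locus in a | b:
        if (locus in a and locus in b) or rng.random() < 0.5:
            child.add(locus)
    return frozenset(child)


def simulate(sexual: bool, n: int = 50, gens: int = 200, u: float = 0.5,
             s: float = 0.02, sites: int = 100_000, seed: int = 1) -> List[int]:
    """Wright-Fisher population with fitness (1 - s) ** load.
    Returns the load of the least-loaded individual in each generation."""
    rng = random.Random(seed)
    pop: List[Genome] = [frozenset()] * n
    history = []
    for _ in range(gens):
        weights = [(1 - s) ** len(g) for g in pop]
        if sexual:
            pairs = [rng.choices(pop, weights, k=2) for _ in range(n)]
            pop = [recombine(rng, a, b) for a, b in pairs]
        else:
            pop = rng.choices(pop, weights, k=n)
        pop = [mutate(rng, g, u, sites) for g in pop]
        history.append(min(len(g) for g in pop))
    return history


asexual = simulate(sexual=False)
sexual = simulate(sexual=True)
print("least-loaded class at the end:", asexual[-1], "(asexual) vs", sexual[-1], "(sexual)")
```

Only the asexual history is guaranteed never to decrease, since every offspring carries all of its parent's mutations; in the sexual population recombination can reassemble a genome less loaded than either parent, so the minimum can fall again.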
Fig 22: Evolution of reverse transcriptases from a common ancestor bearing a LINE archetype (Xiong and Eickbush, Nakamura et. al.). The root of their evolution goes back to the transfer from RNA to DNA at the beginning of life. They form a complementary evolutionary tree to that of cellular life as genetic symbionts of metazoa travelling down the germ line. Their group includes telomerases essential to the reproductive cycle.
LINEs are preferentially expressed in both steroidogenic and germ-line tissues in mice (Branciforte and Martin, Trelogan and Martin), suggesting stress could interact with meiosis. L1 expression occurs in embryogenesis, at several stages of spermatogenesis including leptotene, and in the primary oocytes of females poised at prophase I. Conversely, the male-determining gene SRY and related SOX genes have been found to regulate LINE retrotransposition (Tchénio et al.). Similarly, LINE elements have been proposed to act as 'boosters' in the inactivation of one X chromosome that happens in female embryogenesis (Lyon). This could allow somatic stress to affect translocation in the germ line, which might enable a form of genetic adaptation in long-lived species such as humans. LINEs have diverse means of causing both mutational damage and novel alleles (Han and Boeke).
Both L1 and Alu elements may be able to self-regulate rates of replication, through the existence of stealth drivers, viable elements which maintain a low transcription rate of active elements, with little genomic impact and hence little negative selection. These occasionally seed daughter master elements, which may replicate actively to form new families when conditions permit. This picture is consistent with long periods of quiescence, punctuated by bursts of 'saltatory' replication leading to large copy numbers (Han et. al.).
Further evidence of a symbiotic relationship comes from Drosophila telomeres, which are maintained by the non-LTR retrotransposons, the LINE-like TART (Jensen et al, Sheen et al) and HeT-A (Biessmann et al). Likewise the recombination activating gene proteins RAG1/RAG2, essential for the mutational variability of the vertebrate immune system, appear to have evolved from an ancient DNA transposon common to the metazoa (Agrawal et al.). Significant similarities exist in the catalytic proteins of the Hermes hAT transposase in insects, the V(D)J recombinase RAG, and retroviral integrase superfamily transposases, thereby linking the movement of transposable elements and V(D)J recombination (Zhou et al).
Fig 22b: Transib evolutionary tree spans the eucaryotes (Kapitonov and Jurka).
The approximately 600-amino-acid "core" region of RAG1 required for its catalytic activity is significantly similar to the transposase encoded by DNA transposons of the Transib superfamily, discovered recently through computational analysis of the fruit fly and African malaria mosquito genomes. Transib transposons are also present in the genomes of sea urchin, yellow fever mosquito, silkworm, dog hookworm, hydra, and soybean rust (Kapitonov and Jurka).
The G-protein linked receptor family, one of the largest in the human genome, illustrates the varying pace of evolutionary change in a very ancient gene family. Its roots go back to the first eucaryotes, and the two major types of serotonin receptor, 5HT1 and 5HT2, diverged before the molluscs, arthropods and vertebrates did, between 750 million and 1 billion years ago. Consequently serotonin functions in mood and circadian rhythms in a similar manner in insects and humans. The diversity of neurotransmitters in humans, particularly the amines serotonin, dopamine, norepinephrine and histamine and the amino acids glutamate, gamma-amino-butyric acid and glycine, originates from the need of single-celled eucaryotes to communicate major pathways of strategic survival, from nutrition through aversion to reproduction and sporulation. The family also includes the opsins of visual perception, developmental receptors and a diverse array of olfactory receptors, which are evolving far more rapidly. Serotonin appears with the first photosynthetic bacteria, which used tryptophan to hold the porphyrin reactive center, and continues, along with melatonin, to play a crucial role in light and circadian cycles in humans, as well as in mood and social responsiveness.
Fig 22c: (a) The two major serotonin receptor types 5HT1 and 5HT2 separated before the molluscs, arthropods and vertebrates diverged (Blenau & Thamm). (b) Evolutionary tree of the human G-protein linked receptors with examples highlighted in color. On the α branch are amine receptors (serotonin 5HT1A and 5HT2A, dopamine D1 and D2 (DRD1, DRD2), adrenergic α2a (ADRA2A), muscarinic acetylcholine (CHRM2), trace amine TAR1), as well as rhodopsin (RHO) and encephalopsin (OPN3). On the glutamate branch are metabotropic glutamate mGluR2 and GABA GABBR1. On the β branch is oxytocin (OXTR), surrounded by vasopressin receptors and the ghrelin receptor. On the γ branch are opioid κ and μ (OPRK1, OPRM1). Olfactory and the non-rhodopsin receptors are linked to their respective points on the rhodopsin family tree (Fredriksson R et al, Zozulya S. et al). (c) Insect tree of receptors for the serotonin, dopamine, tyramine and octopamine neurotransmitters (Blenau & Baumann).
The evolution of the metazoan sodium channel, essential for the neuronal action potential, from the calcium channel shared by fungi and animals occurred in single-celled eucaryotes before the metazoa evolved from choanoflagellate-like ancestors. Metazoan eyes also appear to have a common origin, as indicated by the capacity of both jellyfish and mouse pax genes to elicit ectopic compound eyes in fruit flies.
Fig 22d (a) Common involvement of PAX genes in eye formation from jellyfish, insects and vertebrates suggests a single common origin despite the differing mechanisms. Jellyfish pax genes, like mouse pax-6, induce ectopic compound eyes in the fruit fly (right) (Suga et al, Kozmik et al). The small camera eyes of a jellyfish are shown top right (yellow arrow). It has even been suggested that jellyfish could have gained the eye development pathway through symbiosis with certain single-celled dinoflagellates which possess an eyespot ocelloid, complete with lens and retinoid organelle (lower right), and which may in turn have inherited this functionality from cyanobacterial chloroplasts via red algae (Pennisi, Keim, Le Page). Detailed analysis (Gavelis et al. 2015) shows the ocelloid to be a compound endosymbiotic structure, involving both a mitochondrial 'cornea' and a red-algal plastid-derived retinal body made of stacked wave-form membranes derived from chloroplast thylakoids and surrounded by pigmented lipid droplets. Dinoflagellates are thus an example of multiple endosymbiotic engulfments, in which first mitochondria and then single-celled red algae complete with plastids have been incorporated. (b) The evolutionary diversification of Na+ channels from Ca++ channels, essential for the action potential, appears to have occurred before the existence of nervous systems, in single-celled eucaryotes ancestral to the metazoa, before choanoflagellates such as Monosiga diverged (Liebeskind et al).
Endogenous retroviruses, or ERVs, which also travel down the germ line as free-riders, although some may retain infectious capacity, may be essential for placental function. Every mammal tested has placental blooms of endogenous retroviruses, which appear to aid both the formation of the syncytium, the super-cellular fused membrane that enables diffusion from mother to fetus, and the immune suppression that prevents rejection of the embryo, both of which are characteristics of retroviruses such as HIV.
Mi and colleagues (2000) found a placental gene whose sequence was homologous to several retroviral envelope proteins. The sequence, now called syncytin, is identical to the envelope protein of the HERV-W retrovirus (Blond et. al.) which exists in around 40 apparently defective viral copies, including those in which the two syncytin viral env genes are fully functional (Mi et. al.). Syncytin is expressed at high levels in the syncytiotrophoblast (and at low levels in the testes) and nowhere else. Most of the other genes of the provirus have been mutated, suggesting that the envelope glycoprotein function was specifically selected. If cultured cells are made to express syncytin, they will fuse together, and this fusion can be blocked with antibodies against syncytin. HERV-W is only found in primates, but mice have similar retroviral blooms and ERV-related Syncytin genes have also been found in them (Dupressoir et. al.). The ability of mammals and thus ourselves to form a viable placenta and give birth to live young may thus depend on the mammals having harnessed a viral gene somewhere in our evolutionary lineage.
Fig 22e: HIV appears to have been transmitted to humans on three separate occasions leading to distinct evolving genotypes (Left: Sharp & Hahn 2010 - HIV genotypes red. Right: HIV groups M, N & O. evolution.berkeley.edu/evolibrary/article/medicine_04).
HERV-K, the most recent endogenous retrovirus to infect the human line, entered both before and after the divergence from chimpanzees. It still has members with open reading frames, is expressed from the 8-cell stage of embryogenesis and appears to protect the embryo against infection by other viruses (Grow et al. 2015). HERV-K is transcribed during normal human embryogenesis, beginning with embryonic genome activation at the 8-cell stage, continuing through the emergence of epiblast cells in preimplantation blastocysts, and ceasing during human embryonic stem cell derivation from blastocyst outgrowths. HERV-K viral-like particles and Gag proteins can be found in human blastocysts, indicating that early human development proceeds in the presence of retroviral products. Expression of the HERV-K accessory protein Rec may also inhibit viral infection. Moreover, Rec directly binds a subset of cellular RNAs and modulates their ribosome occupancy, indicating that complex interactions between retroviral proteins and host factors can fine-tune pathways of early human development.
Fig 22f: Evolution of the Ebola single-stranded RNA virus in the 2014 Ebola epidemic. (A) Phylogenetic and temporal placement of 188 Liberian EBOV genomes relative to 734 sequences from Guinea, Mali, and Sierra Leone. Three distinct lineages are represented in the Liberian samples: GN1, SL1, and SL2. (B) Median-joining haplotype network including 175 Liberian sequences with ≥97% genome coverage and 466 sequences representative of lineages circulating elsewhere in Western Africa (Ladner et al. 2015).
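Haplotype networks like panel (B) are built from pairwise differences between aligned genomes. The minimal sketch below assumes pre-aligned sequences of equal length with N marking uncovered positions. It applies a genome-coverage filter like the ≥97% threshold in the caption, then joins the haplotypes with a minimum spanning tree (Prim's algorithm) on Hamming distances. This is a simplification: median-joining, the method named in the caption, additionally infers unsampled "median" haplotypes, which this sketch does not do. The names and sequences are hypothetical.

```python
from typing import Dict, List, Tuple


def coverage(seq: str) -> float:
    """Fraction of positions with a called base (anything other than N)."""
    return 1 - seq.upper().count("N") / len(seq)


def hamming(a: str, b: str) -> int:
    """Differences between two aligned sequences, skipping uncalled (N) positions."""
    return sum(
        1 for x, y in zip(a.upper(), b.upper()) if x != y and "N" not in (x, y)
    )


def minimum_spanning_tree(seqs: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Prim's algorithm on the complete graph of Hamming distances.
    Ties are broken by (distance, name, name) so the result is deterministic."""
    names = list(seqs)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        dist, a, b = min(
            (hamming(seqs[x], seqs[y]), x, y)
            for x in in_tree
            for y in names
            if y not in in_tree
        )
        edges.append((a, b, dist))
        in_tree.add(b)
    return edges


genomes = {
    "h1": "ACGTACGTAC",
    "h2": "ACGTACGTAT",
    "h3": "ACGTACGAAT",
    "h4": "NNNNNCGTAC",  # only 50% coverage, filtered out below
}
kept = {k: v for k, v in genomes.items() if coverage(v) >= 0.97}
print(minimum_spanning_tree(kept))  # [('h1', 'h2', 1), ('h2', 'h3', 1)]
```

Each edge links a haplotype to its nearest already-connected neighbour, so chains of single-mutation steps, like the lineage-specific branches in panel (B), emerge directly from the distances.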
Retroviruses are divided between a predominantly exogenous infectious habit, such as HIV and SIV, and conversion to endogenous transmission down the germ line, such as the diverse HERV types. Magiorkinis et al. (2012) have verified that loss of the Env gene, which enables cell infection, is associated with super-amplification of germ-line retroviral elements by a factor of about 30 as exogenous retroviruses switch to endogenous modes of selection. They have investigated the widespread occurrence of retroviruses, including intracisternal A-particles, or IAPs, across diverse mammal groups. Notably, following Jern and Coffin (2008), the evolutionary tree of retroviruses spans the vertebrates and includes both endogenous and exogenous habits, rooted in exogenous viral types.
Fig 23: (a) The seven retroviral genera (alpha-, beta-, gamma-, delta-, epsilon-, lenti- and spuma-like retroviruses) and their intermediate groups, based on Pol sequences. Black branches indicate viruses known only in exogenous infectious forms (XRV); red branches indicate viruses present in both XRV and endogenous (ERV) forms; and blue branches indicate ERVs (Jern and Coffin 2008). (b) Phylogeny of mammals with ERV megafamilies shown as colored circles, the area of each proportional to the percentage of the genome's ERV loci represented by that family (Magiorkinis et al. 2012). When retrotransposons are included (fig 22), they extend to all Eukaryote realms.
The defective copies of endogenous retroviruses may also serve to protect the host against further infection by becoming transcribed and causing incorporation of defective elements into the replicating virus (Best et. al.).
Fig 23b: Root of the animal tree, with ctenophores as an ancient outlier and xenacoelomorphs forming a newly recognized primitive phylum (Rouse et al. 2016).
The Cambrian Radiation, Homeotic Genes, Metamorphosis and Hybridization
One of the most stunning and puzzling aspects of evolution is the Cambrian radiation some 550 million years ago, which over a very short period of geological time gave rise to the major phyla of multicellular animals we see today. This radiation forms the core of the evolutionary tree of fig 1. The preceding Ediacaran period, by contrast, has far fewer and less elaborate fossil forms, such as Charnia, Dickinsonia and Spriggina in fig 1, and particularly few organisms with well-preserved mineral skeletons.
There have been many proposals for why such a rapid and abundant radiation could have occurred, including geological scenarios involving the end of a snowball Earth epoch, in which the Earth became frozen and thus reflected radiant heat until rising CO2 levels caused a rapid thaw. This set off a major expansion of cyanobacteria, filling the atmosphere with oxygen and changing the ocean from an acidic state with dissolved iron and little oxygen to one capable of harbouring diverse forms of multicellular life.
Fig 24: Center to right: Homeotic genes specifying differentiation along the body axis have closely related sequences, are organized in a parallel scheme in arthropods and vertebrates, and extend to the cnidaria, predating the bilateria. Mutations such as antennapedia and bithorax in the fruit fly alter the sequential specialization of segments. HOX genes share sequence similarity, in the helix-turn-helix DNA-binding region, with phage lambda genes such as cro that alter gene expression in prokaryotes. Left: Mutations of related genes in maize disrupt leaf development. Upper left: The antennapedia homeodomain, which induces legs in place of antennae in the fruit fly, bound to DNA. Lower left: Ectopic compound eye on the leg of a fruit fly induced by mouse pax-6.
However, the underlying reasons may be genetic, having more to do with the evolution of a pangenomic algorithm for generating the multicellular body plan, based on homeotic and related developmental genes that are highly conserved and spread across the major animal and even plant kingdoms. Closely related schemes of homeotic genes drive vertebrate and arthropod development along the body axis, being involved in segmentation and notochord differentiation. For example, pax-6, a gene involved in eye formation in the mouse, will induce ectopic eyes in the fruit fly, intriguingly the compound eyes insects normally have rather than mouse-like eyes, indicating a deep commonality between the genes organizing the body plan in these two pivotal phyla.
This suggests it may have taken evolution a considerable time to arrive at such an algorithmic regulatory process, but that once it came into play it permitted almost symphonic variations, leading to the diversity of the major animal phyla over a relatively short geological time.
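One way to picture this "algorithmic" character is the textbook posterior-prevalence rule for Hox genes: each gene switches on from an anterior boundary, and where several genes are expressed, the most posterior one sets segment identity. The toy sketch below works under that assumption. The gene names are a subset of the fruit-fly Hox genes, but the segment boundaries are invented, so it illustrates the logic of homeotic transformations such as bithorax and antennapedia rather than real fly anatomy.

```python
from typing import Dict, List, Optional

# Subset of fly Hox genes in chromosomal (= anterior-to-posterior) order.
HOX_ORDER = ["lab", "Dfd", "Scr", "Antp", "Ubx", "abd-A", "Abd-B"]
# Hypothetical anterior expression boundaries (segment index where each gene turns on).
BOUNDARY = {"lab": 1, "Dfd": 2, "Scr": 3, "Antp": 4, "Ubx": 6, "abd-A": 7, "Abd-B": 9}


def segment_identities(
    n_segments: int,
    boundary: Dict[str, int],
    ectopic: Optional[Dict[int, List[str]]] = None,
) -> List[Optional[str]]:
    """Assign each segment the identity of its most posterior expressed Hox gene."""
    ectopic = ectopic or {}
    identities: List[Optional[str]] = []
    for seg in range(n_segments):
        expressed = [g for g, start in boundary.items() if start <= seg]
        expressed += ectopic.get(seg, [])
        # Posterior prevalence: the gene furthest along the cluster wins.
        identities.append(max(expressed, key=HOX_ORDER.index) if expressed else None)
    return identities


wild_type = segment_identities(10, BOUNDARY)
# Loss of Ubx (bithorax-like): its segments fall back to the next gene anterior to it.
no_ubx = segment_identities(10, {g: s for g, s in BOUNDARY.items() if g != "Ubx"})
# Ectopic Antp in a Hox-free head segment (antennapedia-like gain of function).
ectopic_antp = segment_identities(10, BOUNDARY, ectopic={0: ["Antp"]})

print(wild_type[6], no_ubx[6], wild_type[0], ectopic_antp[0])  # Ubx Antp None Antp
```

A single rule applied segment by segment turns one gene loss or misexpression into a whole-segment transformation, which is why small changes to such a regulatory program can produce large, coherent changes in body plan.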
Fig 25: Body schemes of the bilateria (Martín-Durán et al. 2012).
A central division in the Cambrian radiation is the one that led to the bilateria, organisms with the left-right symmetry possessed by both arthropods and vertebrates. The conventional argument is that the founding event differentiating bilateria from earlier organisms such as the cnidaria, which have only a mouth, is the symmetry introduced by the formation of an anus and an intestinal tube. Here things get complicated, because in embryonic development the blastopore, the single opening that becomes the mouth in cnidaria, can develop either into the mouth, with the anus forming later, or into the anus, with the mouth forming later. In the conventional division the deuterostomes, including the vertebrates, go "arse-first", i.e. anus > mouth, while the protostomes go mouth > anus. The difficulty is that key protostomes such as the priapulid worm Priapulus caudatus, which was abundant in the middle Cambrian, actually develop on a deuterostome plan, although in terms of molecular evolution their expression of bra, cdx, foxA, gsc, and otx during early development is similar to that of nematodes and arthropods, spanning the "skin-shedding" ecdysozoa. Given that the Chaetognatha, or arrow worms, also show this pattern, deuterostomy appears to be ancestral, with the protostomes forming a diverse set of body plans (Martín-Durán et al. 2012).
The conventional theory of insect metamorphosis is that eggs which originally gave rise to small adult-form individuals evolved toward delayed maturation, with embryonic forms persisting as larvae that did not compete with the adults for food and habitat. This led to a two-stage life cycle of non-competing foraging larvae and reproductive adults (Jabr). A controversial theory (Ryan) posits that aspects of metamorphosis, which we see in insects but more pivotally in marine organisms such as echinoderms, may have resulted from early hybridization, even between organisms that have now become distinct phyla, such as vertebrates and echinoderms. For example, Nectocaris pteryx appears to have a body plan resembling a chimera of an arthropod head and the abdomen of an entirely different phylum, although this may be an artifact of how the fossils have been depicted in drawings, and the species may simply be an early cephalopod (Smith & Caron). The validity of this idea, at least in insects, is highly controversial and hotly disputed (Hart & Grosberg, Williamson).
Fig 26: Fossil and two very different scientific illustrations (insets) of Nectocaris pteryx, and two views of the larva of Luidia sarsi emitting an echinodermal 'offspring'.
Studies attempting to trace the evolutionary tree through the simpler larval forms, from which one would expect the adult forms to have evolved, have yielded contradictory results, suggesting the two might have independent genetic origins. Genetic analysis of sea squirts, which have vertebrate-like larvae with a notochord and a primitive brain but metamorphose into sessile sea-floor filter feeders, suggests they have two genomic components, one coming from vertebrates and the other from an unknown, now extinct non-vertebrate at a very early stage in the evolution of animals. Echinoderms themselves have larvae with a bilateral body plan, which later become colonized by pluripotent cells in the abdominal cavity that form radially symmetric organisms, which grow into adults. In the starfish Luidia sarsi, the larva, some 4 cm long, survives for several months as a vegetarian living off phytoplankton after its starfish 'offspring' have burst out to take up their carnivorous habit of hunting other starfish.
Donald Williamson, who in the 1950s advanced the 'larval transfer' theory, claimed to have successfully fertilised eggs from the sea squirt Ascidia mentula with sperm from the sea urchin Echinus esculentus. Then in 2002, in an unpublished study with Sebastian Holmes and Nic Boerboom, he performed the reverse cross, using eggs from the urchin and sperm from the sea squirt. Both crosses resulted in large numbers of offspring, the majority of eggs developing into easel-shaped larvae, the 'pluteus' form typical of sea urchins, rather than the tadpole larvae that are the hallmark of sea squirts. Most of these larvae subsequently metamorphosed into a rounded adult form, which Williamson called a 'spheroid'. The first cross created spheroids with a suction cup that enabled them to attach to surfaces. Most intriguingly, the second produced spheroids that reproduced asexually through budding, the pinching off of a section of the body to create a clone. These spheroids, however, were never subjected to genetic analysis, although a cross between two echinoderm species has resulted in new developmental phenotypes with confirmed hybridized genomes.
John Long, David Choo
Vertebrate Penetrative Sex: A Tortuous History
Recent fossil evidence suggests that penetrative sex has evolved four times in the vertebrate tree and was the original form of sex in the ancient placoderms that gave rise to all the jawed vertebrates (Long et al. 2014). It still appears in stingrays, although they have lost the bony claspers and replaced them with cartilaginous ones, and it has re-emerged in both guppies and land vertebrates, which have evolved a variety of intromittent organs. This suggests that fundamental developmental components have been preserved over long epochs, so that penetrative sex did not need to be bootstrapped from new developmental genes each time it arose.
Fig 26b: Conflict in the tree of mammalian diversification. A detailed traditional DNA-based evolutionary tree of the mammals (right) (Meredith et al, dos Reis et al) tends to show a different order of diversification from one based on the number of new miRNAs appearing in successive branches (top left, Dolgin). miRNA numbers have also been suggested to correlate with neural complexity (bottom left, Technau).
Mammalian Radiative Adaptation: Traditional DNA versus MicroRNAs
Different assay methods are shedding an intriguing light on the radiative adaptation and diversification of animal species, from the Cambrian through to the present day. The detailed branching of the tree of life when calculated by traditional mutational DNA methods (Meredith et al, dos Reis et al) appears to differ significantly from that given by a new technique developed by Kevin Peterson (Dolgin), which depends on the number of newly acquired microRNAs (miRNAs). These modulate gene expression by selectively binding to specific messenger RNAs and inhibiting their translation.
A single miRNA is thus able to modulate the expression of a diverse array of mRNAs to which it binds, providing sophisticated forms of coordinated regulation conducive to phylogenetic complexity. It has also been suggested that neuronal complexity correlates with the number of miRNAs (Technau, Grimson et al), an interesting question in itself, bearing on how complex nervous systems are generated in development. Notice here that humans have fewer protein-coding genes than a mouse, roughly 21,000 against 22,000, although our brain has over a thousand times as many neurons. We therefore need an account of how organismic complexity evolves through sophisticated gene regulation, and miRNAs provide just that.
Consistent with such a role in multicellular evolution, miRNAs go back to the earliest multicellular animals: sea anemones already carry up to 40. Metazoa, from sponges to bilateria, also share two classes of piRNA, the second of which helps suppress transposable elements during gametogenesis by containing a sequence complementary to a transposase mRNA. In fruit flies these are directed against DNA transposons, but in mammals they target LINE L1 and IAP transcription during meiosis in the germ line by methylating L1 and IAP DNA sequences (Aravin et al).
While a traditional DNA-based tree places primates and humans much closer to rodents, as highly evolved branches, with the elephants diverging earliest, a miRNA analysis places rodents as branching out earliest, which might seem consistent with their possibly closer correspondence to the founding shrew-like mammalian type. The critical question determining the fate of the miRNA perspective is the rate at which these small RNA molecules are lost during evolution: a higher rate of loss would tend to remove the inconsistency. While the picture is consistent with miRNAs being retained during mammalian diversification, sudden losses have occurred in insects and in a primitive chordate.
For modern lineages of birds and mammals, few fossils have been found that predate the Cretaceous–Palaeogene (K-Pg) boundary, although molecular studies using fossil calibrations have shown that many of these lineages existed at that time. One intriguing way of checking the evolution of mammals and its timing is to examine the parasites of birds, reptiles and mammals and to develop a "tree of lice". Smith et al. (2011) demonstrate that the major louse suborders began to radiate before the K-Pg boundary, lending support to an earlier Cretaceous diversification of many modern bird and mammal lineages.
Fig 26c: Evolutionary tree of mammalian male infanticide (circled species), which occurs in around half of the 200 species investigated. It is commonest in social species (dark grey), less so in solitary species (light grey) and even less in monogamous species (black) (Lukas & Huchard 2014).
Mammals give birth to live young and lactate. This has caused the reproductive investment of the two sexes to become highly skewed, with males investing primarily in fertilization and females primarily in parenting. This has a variety of consequences. For example, only around 3% of mammal species are socially monogamous, the rest being either polygynous with harems or having promiscuous females. This skews reproductive strategies further, because of secondary consequences. Females are less likely to come into heat and become pregnant while they are lactating a brood, and the offspring of other males will compete with a male's own, so in species with competing males there is a double incentive to kill the offspring of competitors: male infanticide. Around half of mammalian species, including the ancestors of the great apes and humans, are inveterate infanticiders, as promiscuous chimps and harem-forming gorillas are, and the trait appears to have evolved multiple times. In turn, females adapt through promiscuous mating, often with an advertised estrus, to make it as difficult as possible for males to determine paternity (Lukas & Huchard 2014). Finally, males adapt by evolving larger testes to cope with the sperm competition that promiscuity brings.
Emergence and Diversification of Modern Humans
Fig 27: One hypothetical evolutionary tree for humans and related apes. There is much debate about the actual form of such a tree.
Homo sapiens appears to have become the single dominant hominin species on the planet, after a preceding period in which fossil evidence suggests several different hominin species coexisted.
The final stage of this process was the disappearance of Homo erectus and the Neanderthals, the latter after a well-defined period of coexistence with modern humans in Europe during the last ice age.
Our own evolutionary and cultural roots appear to lie in Africa, with evidence of culture and cosmetics going back over 100,000 years, in addition to evidence for tools and weapons. An alternative regional development (multiregional) theory has proposed that humans evolved through considerable interbreeding across the whole African and Asian continental region. However, genetic evidence increasingly points towards an African origin, with at most very occasional interbreeding with related species.
Fig 28: Human-Neanderthal-Chimp divergences (Green et. al. 2006)
According to genetic analysis, Neanderthals diverged from Homo sapiens ~500,000 years ago. There has been no major interbreeding, but possibly some transfer of genes, e.g. from human males to Neanderthal females, while candidate human genes conferring a selective advantage have a profile consistent with transfer from Neanderthals (Pennisi).
Specific genes such as PDHA1 have two families of variants whose last common ancestor lived 1.8 million years ago, and microcephalin variants appearing 40,000 years ago also show differences implying an original divergence 1 million years ago, suggesting 'introgression' from Neanderthals (Jones). An even more ancient divergence in the pseudogene RRM2P4 in East Asian people suggests interbreeding with Homo erectus. Some skeletal evidence is also consistent with this picture. However, more recent sequencing of the Neanderthal nuclear genome (Callaway 2008) suggests little or no interbreeding with Homo sapiens, and has cast doubt on the presence of the microcephalin variant in Neanderthals, as well as of a gene associated with increased fertility in Icelanders that was also attributed to transfer from Neanderthals.
Fig 29: The "Out of Africa" hypothesis may be consistent with a degree of regional development involving some sexual interbreeding with Neanderthals and Homo erectus (New Scientist).
Comprehensive investigation of the Neanderthal genome (Green et al. 2010, Dalton 2010) suggests that there was a period of interbreeding between Neanderthals and humans in the Near East around the time of the first migration out of Africa, rather than more recently in Europe, as the putative sequences are shared by non-African French, Han and Papuan, but not by the African Yoruba or San. It is estimated that among the former, 1-4% of the genome derives from Neanderthal sequences, although there is little evidence for these corresponding to the specific genes suggested by Lahn's team. Other transfers could have occurred but are not apparent in the research.
The situation has been complicated by two finds. Firstly, there are the 'hobbit' human remains found on Flores, named Homo floresiensis. These are variously claimed to be a separate human species, possibly related to Homo erectus, or dismissed as microcephalic human pygmies. More recently there has been the discovery of remains of the Denisovans (Callaway 2010), a further group that branched off from the Neanderthals after their common lineage split from that of Homo sapiens some 800,000 years ago. Genetic analysis of the remains indicates significant interbreeding specifically with the ancestors of Melanesian people, who derive some 6% of their genome from Denisovans (Reich et. al 2010, Meyer et al 2012).
More recent investigations show that interbreeding with other hominins was critical to the globalization of Homo sapiens. Human leukocyte antigens (HLAs), a family of about 200 genes that are essential to our immune system, include some of the most variable human genes: hundreds of versions, or alleles, of each gene exist in the population, allowing our bodies to react to a huge number of disease-causing agents and adapt to new ones. One allele, HLA-C*0702, is common in modern Europeans and Asians but never seen in Africans; Peter Parham has found it in the Neanderthal genome, suggesting it made its way into H. sapiens of non-African descent through interbreeding. HLA-A*11 has a similar story: it is mostly found in Asians and never in Africans, and Parham found it in the Denisovan genome, again suggesting its source was interbreeding outside of Africa. This tallies with interbreeding giving H. sapiens pivotal resistance to non-African diseases. While only 6 per cent of the non-African modern human genome comes from other hominins, the share of HLAs acquired through interbreeding is much higher. Half of European HLA-A alleles come from other hominins, says Parham, and that figure rises to 72 per cent for people in China, and over 90 per cent for those in Papua New Guinea (Marshall 2011).
In a 2014 study by David Reich and coworkers, genes for keratin filaments that lend toughness to skin, hair and nails were enriched with Neanderthal DNA. This may have helped provide the newcomers with thicker insulation against cold conditions, the scientists suggest. But other genes are implicated in human illnesses, such as type 2 diabetes, long-term depression, lupus, biliary cirrhosis (an autoimmune disease of the liver) and Crohn's disease. Other regions of the human genome, including the X-chromosome, are devoid of Neanderthal sequences, suggesting these were selected against as deleterious. One genome region lacking Neanderthal genes includes FOXP2, thought to play an important role in human speech (Sankararaman et al 2014, Vernot & Akey 2014).
Fig 29c: The recent discovery of 400,000-year-old bones with a mitochondrial relationship to the Denisovans has raised further questions about early human emergence (Meyer et al. 2013).
Some doubt has subsequently been cast on the Neanderthal interbreeding idea, with the effects attributed instead to shared sequences arising from isolated African populations of the two species that separated some 300,000-350,000 years ago. However, these results are contested by the original researchers from Pääbo's team, who place the last exchange of genetic material some 47,000-65,000 years ago and say recent analyses actually firm up the case for interbreeding. Their evidence suggests non-Africans have shared these sequences with Neanderthals for only a few tens of thousands of years, so the sharing cannot be a relic of populations predating the origin of Neanderthals (Marshall 2012).
In 2014, more accurate dating of the demise of Neanderthals in Europe suggested that they were already in serious decline 50,000 years ago, probably as a result of a climatic cooling phase, and that by the time sapiens arrived they were only a small, fragile remnant population in scattered, isolated bands (Brahic 2014, Benazzi 2011). By 39,000 years ago they had largely vanished. This doesn't imply that they were actively killed off by Homo sapiens, but rather that their territory and resources were compromised by a new invasive species. Nor is it clear that they were intellectually inferior to modern humans, as artifacts from both species show similar innovations (Barras 2014).
Fig 29d: A more complete picture of introgression of genes from one hominin species to another has been developed as a result of further sequencing of a slightly older Neanderthal toe bone, also from Denisova cave, and analysis of the Denisovan sequence implying gene flow from an older hominin, possibly erectus or heidelbergensis. Notably, the Neanderthal female was highly inbred, consistent with being the offspring of half-siblings with a common mother (Prufer et al 2013).
There is also suggestive evidence for up to 2% interbreeding with another hominin species in ancient African populations of Biaka Pygmies and San Bushmen (Kaplan 2011, Hammer et al. 2011). Although this is based only on statistical divergences at some loci and lacks a sister-species reference sequence, it suggests interbreeding around 35,000 years ago with a population that had diverged from the Homo sapiens line some 700,000 years before.
The female Neanderthal sequenced in the above study also showed extensive inbreeding (Marshall 2013), with up to an eighth of her genome devoid of allele variation, implying breeding between half-siblings, possibly as a consequence of small isolated populations. This may have led to a notable incidence of deformities: the remains studied show a range of deformities, many of which are rare in modern humans (Wu et al. 2013). Our own genomes likewise still carry traces of past small population bottlenecks. A 2010 study concluded that our ancestors 1.2 million years ago had an effective population of just 18,500 individuals, spread over a vast area (Huff et al. 2010).
Fig 29e: Relationships of intestinal bacteria in the evolution of great apes and Homo sapiens. Change in the microbiome was slow and clock-like during African ape diversification, but human microbiomes have deviated from the ancestral state at an accelerated rate. Human microbiomes have lost ancestral microbial diversity while becoming adapted for animal-based diets. (Moeller et al. 2014).
Chromosomes contain a variety of markers that can be used to compare diverse populations and infer evolutionary relationships between them. These include slowly varying protein polymorphisms in coding regions, which are useful for long-term trends; single nucleotide polymorphisms and changes in non-coding regions (mutation rates about 2.5 × 10⁻⁸ per base pair per generation, useful for reconstructing evolutionary history only over millions of years); insertion and deletion events (about 8% of polymorphisms, extending from one to millions of nucleotides), particularly those driven by transposable elements such as the LINEs and the even more frequent SINEs; non-coding microsatellites (mutation rates 10⁻⁵ to 10⁻² per locus per generation, due to repeat slippage); and minisatellite regions of repeating DNA (mutation rates as high as 2 × 10⁻¹, due to meiotic recombination in sperm). Because micro- and minisatellites evolve rapidly and are not subject to the strong selection acting on coding regions, they can distinguish changes over the much shorter time scales of modern human migration.
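To make the contrast in time scales concrete, here is a minimal Python sketch that converts the per-generation mutation rates quoted above into the expected number of mutation events at a single site or locus over a given span. It assumes, for illustration only, a constant generation time of 25 years and treats every quoted rate as a per-generation rate; the names `RATES`, `generations` and `expected_events` are hypothetical, and the calculation ignores selection, back-mutation and saturation.

```python
# Illustrative sketch: expected mutation events at one site/locus over a time span.
# Assumption (hypothetical, for illustration): constant generation time of 25 years.
GENERATION_YEARS = 25

# Per-generation mutation rates as quoted in the text.
RATES = {
    "non-coding base pair": 2.5e-8,
    "microsatellite (low)": 1e-5,
    "microsatellite (high)": 1e-2,
    "minisatellite (high)": 2e-1,
}


def generations(years: float, generation_years: float = GENERATION_YEARS) -> float:
    """Number of generations elapsed in `years`."""
    return years / generation_years


def expected_events(rate_per_generation: float, years: float) -> float:
    """Expected mutation events along one lineage: rate x number of generations.

    This counts events, not observable differences; at high rates repeated
    mutations at the same locus mean observed differences saturate.
    """
    return rate_per_generation * generations(years)


if __name__ == "__main__":
    span = 50_000  # years, roughly the time scale of modern human migration
    for marker, rate in RATES.items():
        print(f"{marker:>22}: {expected_events(rate, span):.3g} expected events")
```

Over an assumed 2,000 generations (50,000 years) this gives about 5 × 10⁻⁵ expected events per non-coding base pair, against roughly 0.02 to 20 at microsatellite loci and up to 400 at the fastest minisatellites. That is why the rapidly mutating satellite markers can resolve the short time scales of human migration while individual base pairs cannot, although at such high rates repeated hits on the same locus mean the event count overstates what can actually be observed.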
The insertions and deletions of the million or so Alu elements in the human genome are particularly useful, as the most active subpopulation of about 1,000 Alu elements remains active and is undergoing rapid change. Some Alu elements can generate new coding regions (exons) when inserted into the non-coding introns between the spliced exons of a gene, because a single base-pair change within an Alu can create a new splice site, turning part of it into a new exon. This is not necessarily deleterious, because alternative splicing still allows the original protein to be made as well. We have the highest number of introns per gene of any organism, and presumably have gained some advantage from this costly, error-prone process. Alus may have given rise, through alternative splicing, to new proteins that drove primates' divergence from other mammals. Recent studies have shown that the nearly identical genes of humans and chimps produce essentially the same proteins in most tissues, except in parts of the brain, where certain human genes are more active and others generate significantly different proteins through alternative splicing of gene transcripts. Our divergence from other primates may thus be due in part to alternative splicing.
If we consider the likely effects of the out of Africa hypothesis, we would expect that founding African populations not subject to active expansion and migration would have greater genetic diversity and that the genetic makeup of other world populations would come from a subset of the African diversity, consisting of those subgroups who migrated. This picture is complicated by the evidence for one or more bottlenecks that reduced the genetic diversity of the surviving human population to 3000-10,000 breeding pairs around 70,000 years ago, which has been associated with the supervolcanic Toba eruption in Sumatra.
Fig 29f: The Volcanic Winter/Weak Garden of Eden model proposed in Ambrose 1998. Population subdivision due to dispersal within Africa and to other continents during the early Late Pleistocene is followed by bottlenecks caused by a volcanic winter resulting from the eruption of Toba, around 71,000 years ago. The bottleneck may have lasted either 1,000 years, during the hyper-cold stadial period between Dansgaard-Oeschger events 19 and 20, or 10,000 years, during oxygen isotope stage 4. Population bottlenecks and releases are both synchronous. More individuals survived in Africa because tropical refugia were largest there, resulting in greater genetic diversity in Africa.
Mitochondrial DNA (mtDNA; mutation rate about 2.5 × 10⁻⁷), with its hypervariable D-loop (mutation rates as high as 4 × 10⁻³), is transmitted only down the maternal line (see Tishkoff and Verrelli for a caveat), while the non-recombining majority of the Y-chromosome is transmitted only down the paternal line, and neither undergoes recombination. For both we would therefore expect diversity to increase going deeper into the historical tree of divergence, with existing groups that have retained the founding patterns of survival, and have not undergone rapid population expansions, retaining the most diverse source variation. All these features are broadly observed in the genetic data to date.
Fig 30: (a) mtDNA tree for African groups showing haplotypes of the !Kung, Mbuti and Biaka, as well as the line coming out of Africa (Chen et. al.). (b) Diagram of world migration and regional differentiation of successive mtDNA haplotypes (Gilbert). (c) mtDNA distances between founding African groups, including the Hadza (click speakers) and the Khwe (Knight et. al.). Recent mtDNA evidence suggests a first wave of migration along the coast of Asia all the way to Australia (Forster et. al.).
Most studies of non-coding regions of autosomal, X-chromosome and mtDNA variation (desirable markers because they are less subject to selection and so drift relatively neutrally) show higher levels of genetic variation in African than in non-African populations, across many types of markers. Although some studies of Y-chromosome variation have observed higher heterozygosity in non-African populations, African populations have higher levels of pairwise sequence differences, consistent with their being ancestral. High diversity in African populations alone does not prove that they are ancestral: a recent bottleneck and/or colonization and extinction events among non-African populations, or a more recent onset of population growth in non-Africans, could also reduce genetic diversity (Tishkoff and Verrelli). In fact, the complete inter-fertility of all human populations and their relative lack of genetic divergence compared with the few remaining chimp populations in the wild (Hrdy 183) does indicate a significant bottleneck. The genetic data are consistent with human emergence from a population of only 10,000 around 100,000 years ago. This is also consistent with the delayed maturation, long birth spacing resulting from prolonged lactation, and high infant mortality seen in hunter-gatherer populations such as the !Kung. At such low growth rates a population of 100 would take 50,000 years to reach 10,000 (Hrdy 183).
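The population-growth figure can be checked with a simple constant exponential growth model, N(t) = N0·e^(r·t). This is a simplifying assumption rather than Hrdy's own calculation; the minimal Python sketch below (function names are illustrative) recovers the implied annual growth rate and doubling time for a population growing from 100 to 10,000 in 50,000 years.

```python
import math


def implied_growth_rate(n_start: float, n_end: float, years: float) -> float:
    """Constant per-year rate r such that n_end = n_start * exp(r * years)."""
    return math.log(n_end / n_start) / years


def doubling_time(rate_per_year: float) -> float:
    """Years needed to double at a constant exponential rate."""
    return math.log(2) / rate_per_year


if __name__ == "__main__":
    r = implied_growth_rate(100, 10_000, 50_000)
    print(f"implied growth rate: {r:.2e} per year ({100 * r:.4f}% per year)")
    print(f"doubling time: {doubling_time(r):,.0f} years")
```

Under this model r ≈ 9.2 × 10⁻⁵ per year (under 0.01% a year), a doubling time of roughly 7,500 years, meaning births only barely outpace deaths, in keeping with the slow reproduction described above.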
Fig 31: Patterns of male migration. The Genographic Project - a partnership between National Geographic and IBM - will collect DNA samples from over 100,000 people worldwide to provide a high-resolution genetic map of human migration.
However, studies of protein polymorphisms, as well as of mtDNA haplotypes, X-chromosome and Y-chromosome haplotypes, autosomal microsatellites and minisatellites, Alu elements, and autosomal haplotypes, indicate that the roots of the population trees constructed from these data are composed of African populations and/or that Africans have the most divergent lineages, as expected under a recent African origin rather than a multiregional emergence model. Additionally, studies of autosomal, X-chromosomal haplotype and mtDNA variation indicate that Africans have the largest number of population-specific alleles and that non-African populations harbor a subset of the genetic diversity present in Africa, as expected if there was a genetic bottleneck when modern humans migrated out of Africa. Analysis of genetic variation among ethnically diverse human populations indicates that populations cluster by geographic region (i.e., Africa, Europe/Middle East, Asia, Oceania, New World) and that African populations are highly divergent. The mtDNA studies point to a primal female ancestor, the African Eve, around 150,000 years ago (Chen et. al.), while the Y-chromosome Adam is more recent, at around 90,000 years ago (Underhill et. al.), consistent with the greater reproductive variance of males than females. Differences between the Y-chromosome and mtDNA distributions indicate how migration, intermarriage and female exogamy have affected the gene pool. The genetic patterns of both these markers and of autosomal microsatellites (Zhivotovsky et. al.) are consistent with founding African diversity and migratory radiations forming the other world populations, with deep founding radiations to the Khoisan click-language-speaking !Kung San bushmen of Botswana, the Sandawe of Tanzania and possibly the Hadzabe, as well as to forest peoples such as the Mbuti and Biaka 'pygmies', who have adopted the Bantu languages of the farming neighbours with whom they now share semi-symbiotic relationships. Along with some Ethiopian and Sudanese sub-populations, these groups may represent some of the oldest and most deeply diversified branches of modern humans.
Fig 32: (Right) A Genographic Project study of mitochondrial origins shows a deep split separating Khoisan mitochondrial inheritance from that of other groups, including those migrating out of Africa, suggesting a separation of some 100,000 years, possibly caused by long-term drought in Africa (Behar et al.). (Left) Phylogeny of 526 complete mitochondrial genomes depicting the earliest diverged modern human maternal lineages, including the first ancient Khoesan mtDNA (StHe) within the L0d2c lineage. All non-L0d2c genomes have been collapsed, with each triangle representing the relative diversity of the corresponding haplogroups and subclades. The skeleton of a male marine forager discovered at St. Helena, radiocarbon dated to 2,330 ± 25 years before the present, carries one of the oldest mitochondrial clades, L0d2c1c. Unlike its sister clades associated with Khoe-language speakers (L0d2c1a and L0d2c1b), it is most closely related to the lineages of contemporary indigenous San speakers (specifically Ju), whose ancestors diverged from other humans roughly 150,000 years ago. The forager lived before Khoekhoe-speaking pastoralists arrived some 500 years later (Morris et al. 2014).
Such recent genetic evidence has laid bare the relationships between some of the founding human groups spread across Africa from the 'Cushite' horn of Ethiopia to the southern Kalahari. Mitochondrial DNA studies have highlighted the ancient origin of the !Kung San and of pygmy peoples of the Congo Basin such as the Mbuti and the Biaka.
Highlighting unique features of human genetic evolution are two key genes, microcephalin and ASPM, whose mutations cause microcephaly, implicating them in the evolution of increased brain size; new variants of both spread rapidly through the human population, possibly coinciding with spurts in human culture. A microcephalin variant (Evans et. al.) appeared ~37,000 years ago, coinciding with the birth of culture, and an ASPM variant spread from the Near East around 5,000 years ago (Mekel-Bobrov et. al.). However, studies of these variants have failed to find associated differences in intelligence, and the results remain highly controversial (DOI:10.1126/science.314.5807.1872). Nevertheless, these results are consistent with an overall examination of linkage disequilibrium in single nucleotide polymorphisms (Moyzis et. al.), which indicates that about 7% of our genes have been subject to selection in the last 50,000 years, a figure similar to that for the domestication of maize, including genes for protein metabolism, disease resistance and brain function.
Fig 33: Left: (a) Non-recombining Y-chromosome evolutionary tree (Underhill et al.). (b) Geographical distribution showing the ancient haplotype shared by the San and by Ethiopian and Sudanese sub-populations. (c) Genetic distances between Khoisan and forest peoples sharing M112, a Y-chromosome allele common only in these groups, showing a great genetic distance between Hadzabe and San peoples (Knight et al.). (d) Autosomal microsatellite analysis confirming the ancient divergence of San and forest peoples, which preceded the migration out of Africa (Zhivotovsky et al.). Right: The genetic structure of 126 Ethiopian and 139 Senegalese Y chromosomes was investigated by a hierarchical analysis of 30 diagnostic biallelic markers selected from the worldwide Y-chromosome genealogy. Of the two groups, only the Ethiopians share the deepest human Y-chromosome clades with the Khoisan. This confirms the ancestral affinity between the Ethiopians and the Khoisan, which had previously been suggested by both archaeological and genetic findings (Semino et al.).
Y-chromosome studies have shown that the !Kung share a very ancient haplotype with sub-populations from Ethiopia and the Sudan, suggesting that they are parts of an ancient, widespread population later divided by the Bantu expansion. According to an overall survey of genetic research by Sarah Tishkoff of the University of Maryland, the most deeply ancestral known human DNA lineages may be those of East Africans such as the Sandawe, who share many phenotypic features and a click language with the !Kung. This suggests that southern Khoisan-speaking peoples originated in East Africa. Besides the Sandawe, the most ancient populations are now believed to include the Burunge, Gorowaa and Datog peoples of Tanzania. The Burunge and Gorowaa migrated to Tanzania from Ethiopia within the last 5,000 years, consistent with an ancient founding population in this area. Echoes of the earliest language spoken by ancient humans tens of thousands of years ago may have been preserved in the distinctive click sounds still used by some existing African tribes.
Fig 34: Human divergence trees calculated from single nucleotide polymorphisms (SNPs): top left (Li et al.), bottom right (Jakobsson et al.). Trees for haplotypes and copy number variation between populations (Li et al.).
In counterpoint to these studies, Hein, Rohde et al. estimate that, because mobile, sexually recombining populations repeatedly spread family trees across the globe and reproductive rates differ, the most recent common ancestor of all present-day human populations may have lived just 3,500 years ago, excepting the most isolated groups.
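Such a recent date is possible because the number of genealogical ancestors doubles with each generation back, so in a well-mixed population everyone's family tree soon converges on shared individuals. The sketch below illustrates this under deliberately crude assumptions: a single randomly mating population of constant size, no sexes and no geography. The function name and parameters are hypothetical, and this is not the model of Hein, Rohde et al., whose estimate depends on realistic, limited migration between regions. The ancestor found is a genealogical one, who need not have passed any DNA to all of today's descendants.

```python
import random


def genealogical_mrca_generations(pop_size, seed=0, max_generations=1000):
    """Generations back until one individual is a genealogical ancestor of everyone alive.

    Toy model: constant population size, random mating, every individual has two
    distinct parents drawn from the previous generation; no sexes and no geography.
    """
    rng = random.Random(seed)
    everyone = (1 << pop_size) - 1
    # descendants[j] is a bitmask of the present-day individuals descended from j
    descendants = [1 << i for i in range(pop_size)]
    for generation in range(1, max_generations + 1):
        parents = [0] * pop_size
        for child_mask in descendants:
            mother, father = rng.sample(range(pop_size), 2)
            # both parents inherit every present-day descendant of this child
            parents[mother] |= child_mask
            parents[father] |= child_mask
        descendants = parents
        if everyone in descendants:
            return generation
    return None


for n in (100, 1000, 10000):
    print(n, genealogical_mrca_generations(n))
```

In this idealized model the number of generations grows only roughly as log2 of the population size (Chang 1999), so even thousands of people share an ancestor within a dozen or so generations; geographic isolation slows the process greatly, which is one reason realistic models arrive at thousands of years rather than centuries.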
Further studies of the nuclear genome, using SNPs (single nucleotide polymorphisms), CNVs (copy number variation) and haplotypes, have produced reasonably consistent maps of the regional divergence of principal human groups, corresponding to the "Out of Africa" hypothesis and consistent with major patterns of migration.
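Trees like those in Fig 34 are built by first reducing genotypes to pairwise genetic distances and then clustering. As a minimal sketch, assuming a simple allele-sharing distance between individuals genotyped as 0, 1 or 2 copies of a reference allele, the code below computes those distances and joins the closest clusters with UPGMA. The genotypes are hypothetical toy data, and UPGMA is chosen here only for brevity: it assumes a constant rate of change along all branches, and the cited studies used their own, more elaborate methods.

```python
from itertools import combinations


def allele_sharing_distance(g1, g2):
    """Mean allele difference between two genotype vectors (0, 1 or 2 copies)."""
    return sum(abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def upgma(names, dist):
    """Cluster taxa by UPGMA and return a Newick string with branch lengths."""
    n = len(names)
    clusters = {i: (names[i], 1, 0.0) for i in range(n)}  # id -> (subtree, size, height)
    d = {frozenset(pair): dist[pair[0]][pair[1]] for pair in combinations(range(n), 2)}
    next_id = n
    while len(clusters) > 1:
        closest = min(d, key=d.get)                 # the pair of clusters nearest each other
        i, j = sorted(closest)
        tree_i, size_i, h_i = clusters.pop(i)
        tree_j, size_j, h_j = clusters.pop(j)
        height = d.pop(closest) / 2                 # the merge sits halfway between them
        for k in clusters:
            # distance to the new cluster is the size-weighted average of the old ones
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        subtree = f"({tree_i}:{height - h_i:.3g},{tree_j}:{height - h_j:.3g})"
        clusters[next_id] = (subtree, size_i + size_j, height)
        next_id += 1
    (tree, _, _), = clusters.values()
    return tree + ";"


# Hypothetical genotypes at six SNPs for four individuals
genotypes = {
    "A": [0, 0, 0, 0, 2, 2],
    "B": [0, 0, 0, 1, 2, 2],
    "C": [2, 2, 2, 2, 0, 0],
    "D": [2, 2, 1, 2, 0, 0],
}
names = list(genotypes)
dist = [[allele_sharing_distance(genotypes[a], genotypes[b]) for b in names] for a in names]
print(upgma(names, dist))
```

The toy data yield two deep branches, ((A,B),(C,D)), the same kind of structure that real SNP data resolve into continental groups; with thousands of SNPs the distances become far more stable than in this six-marker example.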
Biallelic deletion-based tree including Neanderthals and Denisovans as outliers (Sudmant et al. 2015).
In 2015, Sudmant et al. expanded the study of CNVs (deletions and gene duplications) to 236 individuals from 125 distinct human populations, including an in-depth exploration of duplications, which require more advanced techniques to assess. In total, 7.01% of the human genome is variable due to CNVs, in contrast to 1.1% due to single-nucleotide variation. Deletions (loss of sequence) were less common, covering 2.77% of the genome, than duplications, covering 4.4%, suggesting that many duplications are fixed because they are advantageous. CNVs mapping to segmental duplications were larger on average (median 14.4 kbp) than CNVs mapping to the unique portions of the genome (median 6.2 kbp).
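Percentages like these are, in effect, the fraction of the reference genome covered by the union of all variant intervals. The sketch below, using hypothetical toy coordinates, shows that calculation. It also shows one way the deletion and duplication figures can add up to more than the total (2.77% + 4.4% = 7.17% versus 7.01%): if some regions are deleted in some individuals and duplicated in others, they count in both subtotals but only once in the combined figure. Whether that is the actual source of the difference in Sudmant et al. is an assumption here.

```python
def merge_intervals(intervals):
    """Merge overlapping half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # overlaps (or abuts) the previous interval: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_fraction(intervals, genome_length):
    """Fraction of the genome covered by at least one interval."""
    return sum(end - start for start, end in merge_intervals(intervals)) / genome_length


# Hypothetical toy genome of 1,000 bp
genome_length = 1000
deletions = [(100, 150), (400, 460)]
duplications = [(140, 200), (700, 800)]   # 140-150 is also deleted in someone else

print(covered_fraction(deletions, genome_length))                 # 0.11
print(covered_fraction(duplications, genome_length))              # 0.16
print(covered_fraction(deletions + duplications, genome_length))  # 0.26, not 0.27
```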
In 2012, an in-depth study of human origins (Schlebusch et al.) found no single founding location but rather mixing and divergence of populations, with the Khoe-San diverging from other human groups over 100,000 years ago and a further division between North and South Kalahari populations around 35,000 years ago, together with deep complexity both within and outside Khoe-San populations.
Fig 35: Left: In 2009, Tishkoff et al. reported on a major study of African and African American evolution containing the most detailed information on African diversity to date. Right: A reproductive bottleneck in Y-chromosome diversity began about 10,000 years ago and continued for several millennia (Karmin et al. 2015). The inset shows the 11 independent areas of primal agriculture discovered so far. Evidence of animal husbandry 10,500 years ago has also been found in Turkey ("The real first farmers: How agriculture was a global invention", New Scientist, 28 Oct 2015).
In 2015, research comparing the population diversity of maternal mitochondrial DNA and the male Y-chromosome revealed an astounding contrast. Around 10,000 years ago, corresponding to the birth of agriculture, the diversity of the Y-chromosome collapsed across vast areas of the human-colonized planet. There is no evidence that this resulted from direct biological or genetic factors, since the collapse affected the different Y-chromosome clades alike. The conclusion is that it was driven by cultural changes associated with agriculture, in which powerful men were able to reproductively exploit large numbers of women and pass their reproductive success on to their male heirs, squeezing the majority of males out of the reproductive race. Estimates for this phase of extreme reproductive polygyny suggest that for every reproducing male there were 17 reproducing females, effectively making harems the predominant form of sexual relationship (Karmin et al. 2015).
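The 17:1 figure is a ratio of effective population sizes: the Y chromosome is transmitted only by breeding males and mitochondrial DNA only by breeding females, so each records the effective number of its own sex, while autosomes blend both. A minimal sketch using Wright's classic formulas for autosomal and X-linked loci, with hypothetical breeding numbers, shows how strongly such a skew shrinks effective size relative to head count.

```python
def effective_sizes(n_males, n_females):
    """Effective sizes implied by the numbers of breeding males and females."""
    autosomal = 4 * n_males * n_females / (n_males + n_females)       # Wright's formula
    x_linked = 9 * n_males * n_females / (4 * n_males + 2 * n_females)
    return {
        "breeders": n_males + n_females,
        "autosomal": autosomal,
        "x_linked": x_linked,
        "y_chromosome_breeders": n_males,   # Y passes only through fathers
        "mtdna_breeders": n_females,        # mtDNA passes only through mothers
    }


balanced = effective_sizes(900, 900)
skewed = effective_sizes(100, 1700)     # 17 reproducing females per reproducing male

for label, sizes in (("equal sexes", balanced), ("17:1 polygyny", skewed)):
    print(f"{label}: {sizes['breeders']} breeders, "
          f"autosomal Ne {sizes['autosomal']:.0f}, X-linked Ne {sizes['x_linked']:.0f}, "
          f"mtDNA/Y breeders {sizes['mtdna_breeders']}/{sizes['y_chromosome_breeders']}")
```

With the same 1,800 breeders, the autosomal effective size drops from 1,800 to about 378 under 17:1 polygyny, and the X chromosome, spending two thirds of its time in females, ends up with a slightly larger effective size than the autosomes.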
A member of the research team hypothesizes that, somehow, only a few men accumulated great wealth and power, leaving nothing for others. These men could then pass their wealth on to their sons, perpetuating this pattern of elite reproductive success. Then, as further millennia passed, the number of men reproducing, relative to women, rose again. In more recent history, as a global average, about four or five women reproduced for every man, still a highly polygynous picture that leads into some of the great patriarchs of history, from Genghis Khan, whose Y-chromosome lineage persists in about 8% of men in 16 populations spanning Asia and some 0.5% of males worldwide (Zerjal, T. et al. 2003 Am. J. Hum. Genet. 72, 717-21), to Tamerlane, who was said to keep 10,000 virgins behind flaming walls. Several other great founders of Y-chromosome lineages have been discovered (Callaway E 2015 Nature doi:10.1038/nature.2015.16767, Balaresque, P et al. 2014 Eur. J. Hum. Genet. doi:10.1038/ejhg.2014.285).
This comes as an ironic twist, since agriculture is widely assumed to have been an invention of women emerging from their role as gatherers in gatherer-hunter societies, and it provides a new perspective on the societies of the planter queens, where female deities appear to have been worshipped at the same time as this extreme form of male reproductive elitism prevailed. Equally striking is that this effect was repeated widely across disparate world cultures, from China through the Near East to Europe and even pre-Columbian America.
When three populations (Khoisan from Africa, Mongolian Khalkha and Papua New Guinea Highlanders) were examined for the difference in age between the Y-chromosome Adam and the mitochondrial Eve, all three showed Eve to be older by a factor of roughly two (Y-chromosome vs mtDNA: SAN 73.6 kya vs 176.5 kya, MNG 43.6 kya vs 134.4 kya and PNG 45.5 kya vs 81.05 kya, ratios ranging from about 1.8:1 to 3:1). These results are most consistent with a higher female effective population size, skewed toward an excess of females by sex-biased demographic processes. They indicate that throughout the last 100,000 years of human evolution the female effective population size has exceeded the male one by a factor of around 2:1, so that human reproduction has been effectively polygynous.
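The link between these ages and effective population size comes from coalescent theory: under neutrality and constant population size, the expected time back to the common ancestor of a sample of haploid lineages is 2N(1 - 1/n) generations, where N is the effective number of gene copies and n the sample size. Because that time scales linearly with N, the ratio of the mtDNA age to the Y-chromosome age estimates the female-to-male effective size ratio. The sketch below applies this to the ages quoted above; it assumes neutrality, constant size and equal generation times in both sexes, and the helper names are hypothetical.

```python
def expected_tmrca(effective_size, sample_size):
    """Expected generations to the most recent common ancestor of a sample of
    haploid lineages under the standard neutral coalescent: 2N(1 - 1/n)."""
    return 2 * effective_size * (1 - 1 / sample_size)


# Ages quoted above, in thousands of years: (Y-chromosome "Adam", mitochondrial "Eve")
ages_kya = {"SAN": (73.6, 176.5), "MNG": (43.6, 134.4), "PNG": (45.5, 81.05)}

# Since expected TMRCA is proportional to effective size, the mtDNA/Y ratio
# estimates the ratio of female to male effective population sizes.
ratios = {pop: mt / y for pop, (y, mt) in ages_kya.items()}
for pop, ratio in ratios.items():
    print(f"{pop}: mtDNA/Y TMRCA ratio = {ratio:.2f}")
```

This gives ratios of about 2.40, 3.08 and 1.78, which is the basis for the rough 2:1 summary.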
Fig 36: Left: An evolutionary tree of Indo-European languages suggests that a possible radiation corresponding to the Kurgans occurred around 6,900 years ago and was preceded by Hittite migrations into Anatolia. Time scales in red are BP (Gray and Atkinson). Significantly, Tocharian appears in Buddhist writings from China's Xinjiang province, indicating an early far-eastern spread. Inset: hypothetical relationship between Indo-European and wider language groups such as Afro-Asiatic. Looking at the Indo-European origin geographically, Bouckaert et al. (2012) found decisive support for an Anatolian origin over a steppe origin. Both the inferred timing and the root location of the Indo-European language trees fit an agricultural expansion from Anatolia beginning 8,000 to 9,500 years ago. Right: DNA analysis of widespread ancient and modern genomes has confirmed a great Yamnaya migration from the steppe around 4,500 years ago, which contributed about three-quarters of the ancestry of the later Corded Ware people of central Europe, largely replacing the earlier populations there (Haak et al. 2015). See also Allentoft et al. (2015).
The evolutionary tree of human ethnic and migratory peoples bears an interesting relationship to the corresponding tree of languages, in which language appears to have a cultural evolutionary capacity of its own, operating more rapidly than genetic evolution and complementing the biological evolution of human populations.
Counterposing the idea of a hardwired genetic basis for the human capacity for spoken language, as exemplified by Chomsky's generative grammar, is the theory of language as an evolutionary 'parasite', converging towards internal efficiency through the modularity of its grammar and word set. Darwin (1904), the founder of the evolutionary approach (1859), speculated that language was potentially an invention: "Man not only uses inarticulate cries, gestures and expressions, but has invented articulate language, if indeed the word invented can be applied to a process completed by innumerable steps half consciously made". Morten Christiansen and Simon Kirby (Christiansen and Kirby 2003) question the need to invoke a Chomskian generative grammar. Instead, they argue, language has adapted to use more general cognitive processing capacities that were already part of our ancestors' brains before language came along. Among these, Christiansen focuses on 'sequential learning', the ability to encode and represent the order of the discrete elements in a sequence. This ability is not unique to humans: mountain gorillas, for example, use it in the complicated preparation of certain 'spiky' plant foods, where a sequence of tasks is required to remove the edible part. Language, he says, is a 'non-obligate mutualistic endosymbiont', a kind of evolutionary structure like a 'symbolic virus'. Kirby suggests that our brains are not so specifically designed for language, and that we appear to be biologically adapted to language because language, which evolves much faster than biology, has culturally adapted to us, gaining semantic power and representational efficiency as it evolves. Languages as different as Danish and Hindi have evolved in less than 5,000 years from a common Proto-Indo-European ancestor, yet it took up to 200,000 years for modern humans to evolve from archaic Homo sapiens.
This tallies well with the fact that written language cannot possibly have a hard-wired basis, having existed only for the last 5,000 or so years and having arisen in only a few cultures, yet we can readily adapt our visual pattern recognition to become fully literate.
Confirmation that the tree of language evolution reflects a cultural evolutionary phenomenon, rather than cognitive universals that would by implication be genetically determined (Ball 2011), has come with the work of Russell Gray and coworkers (Dunn et al. 2011). The Nature editorial "Universal Truths" makes the scope of this clear. Two theories of language universality delineate the field. Noam Chomsky proposed that the brain is genetically endowed with rules, embodied in brain modules, that express a universal grammar. Joseph Greenberg took a more empirical approach, identifying traits (particularly in word order) shared by many languages, which are considered to represent biases resulting from cognitive constraints. Gray and his colleagues put both to the test, using phylogenetic methods to examine four language family trees that between them represent more than 2,000 languages. They considered whether what we call prepositions occur before or after a noun ("in the boat" versus "the boat in") and how the word order of subject and object works out in either case ("I put the dog in the boat" versus "I the dog put the boat in"). A generative grammar should show patterns of language change that are independent of the family tree or the pathway tracked through it, whereas Greenbergian universality predicts strong co-dependencies between particular types of word-order relations (and not others). Neither pattern is borne out by the analysis, suggesting that the structures of languages are lineage-specific and not governed by universals.
Quentin Atkinson (2011) has taken this a step further. Human genetic and phenotypic diversity declines with distance from Africa, as predicted by a serial founder effect in which successive population bottlenecks during range expansion progressively reduce diversity, consistent with the "out of Africa" hypothesis. Likewise, Atkinson showed that the number of phonemes used in a global sample of 504 languages is clinal and fits a serial founder-effect model of expansion from an inferred origin in Africa, in effect a cultural evolutionary process. In Atkinson's words, this "points to parallel mechanisms shaping genetic and linguistic diversity and supports an African origin of modern human languages."
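The serial founder effect can be written down directly for the genetic case: a founding group of n diploid individuals retains, on average, a fraction 1 - 1/(2n) of its parent population's heterozygosity, and a chain of such founder events compounds the loss. The sketch below treats each founder event as a single generation of drift and uses hypothetical starting values and group sizes; Atkinson applied the same serial-founder logic to phoneme inventories by regressing diversity on distance from a candidate origin, which this sketch does not attempt.

```python
def serial_founder_heterozygosity(h0, founder_sizes):
    """Expected heterozygosity after each founder event in a chain of colonizations.

    A founding group of n diploid individuals retains, on average, a fraction
    (1 - 1/(2n)) of the parental heterozygosity (one generation of drift).
    """
    h = h0
    trajectory = [h]
    for n in founder_sizes:
        h *= 1 - 1 / (2 * n)
        trajectory.append(h)
    return trajectory


# Hypothetical chain of ten founder events of 25 individuals each,
# starting from an ancestral heterozygosity of 0.8
trajectory = serial_founder_heterozygosity(0.8, [25] * 10)
for step, h in enumerate(trajectory):
    print(f"founder event {step:2d}: expected heterozygosity {h:.3f}")
```

Because each step multiplies by the same kind of factor, diversity falls steadily with the number of founder events, and hence with distance along the migration route, which is the cline seen in both genes and phonemes.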
To quote Jabr (2011): "Earlier research has shown that the more people speak a language, the higher its phonemic diversity. Africa turned out to have the greatest phonemic diversity - it is the only place in the world where languages incorporate clicks of the tongue into their vocabularies, for instance - while South America and Oceania have the smallest. Remarkably, this echoes genetic analyses showing that African populations have higher genetic diversity than European, Asian and American populations."
Fig 37: Hypothetical core of all human languages from Greenhill et al. (2010), further work associated with Gray and Atkinson's research. The Nature article above implies that "languages evolve in their own idiosyncratic ways, rather than being governed by universal rules set down in human brain patterns".
Fig 38: Tree of world religions, included to turn the tables on creationist deniers of evolution. It lacks detail for African tribal religions (see Culture out of Africa).
Finally, we note that, contrary to creationist and intelligent-design notions of life needing a design specification from a third-party "God" with no detectable presence in the natural universe, religions themselves can be seen to show a similar form of cultural evolution to the languages in which they are expressed. The evolutionary tree of life remains the root and branch from which, through the muck and slime of sexual recombination, human intelligence, culture and religion have sprung. Thus evolution is fecundly capable of spawning religion, but religion cannot legitimately deny evolution. Nature thus reigns supreme, and we fantasize against it at our peril.
See also: Nature, Violence, Consciousness, Sexuality and World Religion: an unveiling exposé of the lethal fallacies of violence, sexuality, nature and consciousness that underlie religious traditions, which between them are followed by a majority of people on this planet.
Conclusion: The Tree of Life, the Selfish Gene, and Climax Genetic Diversity
The picture conveyed by the significance of endosymbiosis, genome fusion and horizontal transfer as key evolutionary processes complementing the vertical transmission of the tree of life makes clear that evolution is not just a matter of competitive survival of the fittest gene, individual or species, but of the dynamic survival of genes in a surviving ecosystem. Although Dawkins' (1978) notion of the "selfish gene" was pivotal in drawing attention to the fact that the survival of genes, not of organisms or even species, is the key evolutionary process, attributing the human sentiment of selfishness to a gene is something of a self-serving advertising distraction on the part of the author, which diminishes the subtlety and complexity of the sometimes apparently paradoxical ways in which genes actually interact to bring about beneficial outcomes in the evolutionary dynamics of the ecosystem.
Although the idea of gene selection has been pivotal in defining the need to consider evolutionarily stable strategies under genetic variation, in ways that have since been confirmed time and again in cases such as the haplodiploid genetics of social insects like bees and ants, social selection is by no means ineffectual; if it were, much of sociobiology, including the biological basis of morality as an extension of reciprocal altruism, would cease to exist.
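The social insects are the textbook case because of haplodiploidy: females develop from fertilized eggs and males from unfertilized ones, so full sisters share more of their genes with one another than a mother shares with her own daughters. The sketch below computes the standard relatedness coefficients and applies Hamilton's rule, under which helping is favoured when rB > C. The trade-off in the example is hypothetical and deliberately simplified; it ignores, for instance, queens that mate with several males.

```python
from fractions import Fraction


def relatedness_haplodiploid():
    """Coefficients of relatedness under haplodiploidy (females diploid, males haploid)."""
    half = Fraction(1, 2)
    # Full sisters: the paternal half of the genome is identical (the father is
    # haploid), and the maternal half is shared with probability 1/2.
    sister_sister = half * 1 + half * half
    # Mother to daughter: half of the daughter's genome comes from the mother.
    mother_daughter = half
    # Sister to brother: her paternal genes are absent from him, and each of her
    # maternal genes is carried by him with probability 1/2.
    sister_brother = half * half
    return {"sister": sister_sister, "daughter": mother_daughter, "brother": sister_brother}


def hamilton_favours(r, benefit, cost):
    """Hamilton's rule: helping is favoured when r * B > C."""
    return r * benefit > cost


r = relatedness_haplodiploid()
print(r)  # {'sister': Fraction(3, 4), 'daughter': Fraction(1, 2), 'brother': Fraction(1, 4)}
# Hypothetical trade-off: a worker forgoes one daughter (cost in gene copies
# r_daughter * 1) in order to raise one extra sister (benefit r_sister * 1).
print(hamilton_favours(r["sister"], 1, r["daughter"] * 1))   # True: 3/4 > 1/2
```

The same rule explains why helping to raise brothers (r = 1/4) pays less than raising daughters, one of the asymmetries that drives conflict over sex ratios in ant and bee colonies.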
Moreover, from what we have seen, particularly of horizontal gene transfer and the capacity of mobile elements to induce modulated changes in nuclear genomes, it is not the 'selfishness' of a genetic element alone that results in the survival of both a gene and its hosts, but dynamic feedbacks and relationships, which ultimately contribute to a massive sharing of information in the manner of parallel genetic algorithms. This sharing is fundamental to the replicative genetic process and enables global forms of genetic and genome optimization central to the overall viability of life as a complex system.
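The genetic-algorithm analogy can be made concrete. The sketch below is a minimal genetic algorithm on the toy "OneMax" problem (maximize the number of 1s in a bit string), assuming tournament selection, uniform crossover, bit-flip mutation and simple elitism; every name and parameter is hypothetical. Recombination lets improvements discovered in different lineages be combined in one genome, a loose stand-in for the information sharing described above rather than a model of horizontal transfer itself.

```python
import random


def one_max(genome):
    """Toy fitness: the number of 1 bits."""
    return sum(genome)


def evolve(pop_size=40, length=32, generations=60, mutation_rate=0.01, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    initial_best = max(one_max(g) for g in population)

    def tournament():
        # pick two individuals at random and keep the fitter one
        a, b = rng.sample(population, 2)
        return a if one_max(a) >= one_max(b) else b

    for _ in range(generations):
        # elitism: carry the current best genome over unchanged
        offspring = [max(population, key=one_max)[:]]
        while len(offspring) < pop_size:
            mother, father = tournament(), tournament()
            # uniform crossover: each locus comes from either parent, so useful
            # bits found in separate lineages can end up in the same genome
            child = [m if rng.random() < 0.5 else f for m, f in zip(mother, father)]
            # bit-flip mutation
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            offspring.append(child)
        population = offspring

    return initial_best, max(one_max(g) for g in population)


start, end = evolve()
print(f"best fitness: {start} at the start, {end} after selection and recombination")
```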
Fig 39: The Mandala of Evolution (Dion Wright).
A predator such as a lion survives not because it is a selfish beast thinking only of eating the next gazelle, but because, although it survives by killing individual antelopes, it maintains a degree of stability in population dynamics without which the herbivores might multiply, causing massive famine, cycles of boom and bust, and the potential extinction or attrition of antelopes, lions and the grasslands.
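This stabilizing role of predation is classically modelled with a Lotka-Volterra predator-prey system in which the prey, left alone, grows logistically up to a carrying capacity K set by its food supply. The sketch below integrates that model with fourth-order Runge-Kutta steps; the parameter values are hypothetical, chosen only so that a coexistence equilibrium exists. With predators present the populations settle, through damped oscillations, at an equilibrium where the prey stay well below K; without predators the prey climb to the carrying capacity and press against their food supply.

```python
def derivatives(prey, predators, r=1.0, K=100.0, a=0.02, b=0.5, m=0.4):
    """Lotka-Volterra rates with logistic prey growth (hypothetical parameters).

    r: prey growth rate, K: prey carrying capacity, a: attack rate,
    b: conversion of eaten prey into predators, m: predator death rate.
    """
    d_prey = r * prey * (1 - prey / K) - a * prey * predators
    d_predators = b * a * prey * predators - m * predators
    return d_prey, d_predators


def simulate(prey, predators, t_end=200.0, dt=0.01):
    """Integrate the model with classic fourth-order Runge-Kutta steps."""
    for _ in range(int(round(t_end / dt))):
        k1 = derivatives(prey, predators)
        k2 = derivatives(prey + dt / 2 * k1[0], predators + dt / 2 * k1[1])
        k3 = derivatives(prey + dt / 2 * k2[0], predators + dt / 2 * k2[1])
        k4 = derivatives(prey + dt * k3[0], predators + dt * k3[1])
        prey += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        predators += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return prey, predators


with_lions = simulate(prey=10.0, predators=5.0)
without_lions = simulate(prey=10.0, predators=0.0)
print("with predators:    prey %.1f, predators %.1f" % with_lions)
print("without predators: prey %.1f, predators %.1f" % without_lions)
```

With these values the coexistence equilibrium is prey = m/(a·b) = 40 and predators = r(1 - 40/K)/a = 30, less than half the prey level of 100 reached when predators are absent.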
Likewise, although we may think of individual genes, transposable elements or viruses as 'selfish' for reproducing sufficiently to ensure their own survival, and sometimes behaving as noxious parasites, the overall effect of this process in evolution can be to enrich the genetic potential of many unrelated organisms along the way, changing forever the face of the ecosystems in which they exist and enabling organisms of far greater complexity to evolve and survive in the closing circle of the biosphere.