The Tree of Life: Tangled Roots and Sexy Shoots
Tracing the genetic pathway from the Last Universal Common Ancestor to Homo sapiens
Chris King - Genotype 1.3.31 - Dec 2009 - May 2017
PDF (with hi-res images) Active site: http://www.dhushara.com/unravel/ Please repost this article! Click on tree images to enlarge.
Abstract: This article is a fully referenced research review of progress in unraveling the details of the evolutionary Tree of Life, from life's first occurrence in the RNA era to humanity's emergence and diversification through migration and intermarriage. The Tree of Life, in biological terms, has come to be identified with the evolutionary tree of biological diversity. It is this tree which represents the climax fruitfulness of the biosphere and the genetic foundation of our existence, embracing not just higher Eucaryotes, plants, animals and fungi, but Protista, Eubacteria and Archaea, the realm, including the extreme heat and salt-loving organisms, which appears to lie almost at the root of life itself. The notion of a tree of evolution running vertically down the generations has become complicated by evidence for promiscuous horizontal gene transfer and for genetic symbiosis at the root of the eucaryote tree. This review will cover all these aspects, from the first life on Earth to Homo sapiens.
Prequel: Biocosmology, a definitive overview of the research to elucidate the origin of life on Earth and to establish life as an interactive manifestation of cosmic symmetry-breaking.
Fig 1: Comprehensive Evolutionary Tree of Life (King)
Introduction: The Comprehensive Tree
Linked in figure 1 is a high-resolution image of the evolutionary tree of life, from viruses through bacteria and archaea to protista, plants, animals and fungi, with a selection of representative species illustrated. I have updated and amended this several times as new research has clarified specific parts of the trunk and branches. The evolutionary tree of life is our immortal progenitor, not just of ourselves, but of all the species with which we co-depend, so we need to both understand it and protect it for the future generations. This initial tree forms a good representation of the evolution of higher plants and fungi, so the remainder of the article will examine the tortuous route from the last common ancestor, through the eucaryotes to metazoa, and ultimately to humanity, language and culture.
This article seeks to be a real-time account of the discovery processes showing us, in ever-increasing detail, the nature of the tree and its many tangled interactions, both at the genetic and organismic level. It also strives to be a fully up-to-date scientific account of the discovery process, for which we all owe a vote of thanks to the many researchers whose work is illustrated and cited in this extensive review article.
Where the trees are complicated and detailed, high-resolution versions can be viewed by clicking each of the images. A high-resolution PDF version is also provided.
LUCA: The Last Universal Common Ancestor
Following a phase of biogenesis possibly emerging directly from cosmic symmetry-breaking (King 1978, 2004), based on spontaneous prebiotic RNA synthesis (Powner et al. 2009, 2010), recent research suggests that the last universal common ancestor (LUCA) of all life on the planet may have arisen before the first cells, from a phase interface between alkaline hydrogen-emitting undersea vents and the archaic acidified iron-rich ocean (Martin and Russell 2003). In this scenario, differential dynamics in membranous micropores in the vents concentrated polypeptides and polynucleotides to biologically sustainable levels (Baaske et al. 2007, Budin et al. 2009), giving rise to the RNA era, while at the same time providing a free energy source based on proton transport across membranous microcellular interfaces resulting from fatty acids also being concentrated above their critical aggregate concentration. The transition to enclosed cells is likely to have been in an active iron-sulphur reaction phase still present in living cells and associated with sodium-proton antiporters activating ATP (Lane and Martin 2012, Lane 2009b), leading in turn to electron transport and some of the most ancient proteins, such as ferredoxin.
Fig 1a: Proposed scheme for the universal common ancestor (Martin and Russell 2003)
The universal common ancestor of the three domains of life may have thus been a proton-pumping membranous interface from which archaea and bacteria emerged as free-living adaptions. This is suggested by fundamental differences in their cell walls and other details of evolutionary relationships among some of the oldest genes.
Although it has been suggested that glycolysis evolved before ion pumping and electron transport (Alberts et al. 2002, Koonin 2003), respiratory electron transport is universal to the three domains of life including eucaryote mitochondria and chloroplasts and both bacteria and archaea (Schafer et al. 1996, 1999, see fig 1d). Among the archaea, halobacteria still use a form of photosynthesis generating ATP from H+ gradients generated by a rhodopsin protein and those in hydrothermal vents rely on Na+-H+ antiporters to generate ion gradients, and their membrane proteins, such as the ATP synthase, are compatible with gradients of sodium ions or protons (Lane and Martin 2012, Yong 2012). The archaea also use a unique form of electron transport in methanogenesis (Schafer 2004).
Fig 1b1: (Above) founding metabolism based on Na+-H+ antiporters, ATP synthetase and FeS/NiS-containing vents (Lane and Martin 2012). The extremely ancient origin of the rhodopsin family of heptahelical receptors can be seen from the ultra-primitive archaeal photosynthesis in Halobacteria, which relies on direct coupling between photo-stimulated chemiosmotic H+ pumping and H+ generated ATP formation, based on bacteriorhodopsin, which is heptahelical, uses a form of retinal and whose helices share a distant sequence homology with vertebrate rhodopsin (Ihara et al 1999).
The H+-dependent ATP synthase universal to the chemiosmotic coupling of electron transport to ATP production is a rotary motor which appears to have evolved from two separate subunits, one of which has been proposed to be a helicase (Doering et al. 1995, Crofts 1996). Hexameric helicases are found in the SF3 superfamily in viruses (Hickman & Dyda 2005), and the MCM helicases are critical to replication forks in diverse organisms from humans to archaea (Fletcher et al. 2003, Onesti & MacNeill 2013, Sharma et al. 2006). The viral SF3 superfamily (Leitão 2015) helicase tree shows variants active on both RNA and DNA substrates, consistent with an origin in the RNA era (Caprari et al. 2015). Supporting the notion of subunits, a beta-chain of ATP synthase is homologous to a hepatic lipoprotein receptor (Martinez et al. 2003).
Fig 1b2: Left: Rotary action of ATP synthase, shown centre. Right: Evolutionary tree of viral RNA helicase includes forms active in both RNA and single and double-stranded DNA viruses (Caprari et al. 2015).
Respiratory electron transport occurs in both aerobic and anaerobic organisms and the terminal oxidases, iron-sulphur proteins and flavin-binding polypeptides all show evolutionary trees reaching back to the common ancestor of the three domains, implying terminal oxidases predate oxygenic photosynthesis. The fact that many components of archaeal electron transport are significantly different in structure from those of bacteria implies these evolved separately and that archaeal electron transport is not simply a more recent result of horizontal transfer (Schafer 2004). Terminal oxidases belonging to oxygen, nitrate, sulfate, and sulfur respiratory pathways have been sequenced in members of both bacteria and archaea including cytochrome oxidase, nitrate reductase, adenylylsulfate reductase, sulfite reductase, and polysulfide reductase which can likewise be assigned to LUCA (Castresana & Moreira 1999). Similar considerations apply to ferredoxins, one of the most ancient coded proteins (Fitch & Bruschi 1987, Hall, Cammack & Rao 1974).
Fig 1b3: Evolutionary trees for two components of the electron transport chain, Fe-S proteins (left) and flavin-binding polypeptides (right; archaea lower right, Homo sapiens upper left), span the three domains of life (Schafer et al. 1996, Schafer 2004).
It has also been proposed, on the basis of the highly-conserved commonality of transcription and translation proteins to all life, but the apparently independent emergence of distinct DNA replication enzymes in archaea/eucaryotes and eubacteria, that the last universal common ancestor had a mixed RNA-DNA metabolism based on reverse transcriptase, pinpointing it to the latter phases of the RNA era (Leipe et. al. 1999).
To characterize LUCA at the point it diversified into the three domains of life, Archaea, Eucaryotes and Bacteria, one cannot rely on nucleotide gene sequences because these would have mutated beyond recognition. Amino acid sequences change more slowly, because synonymous (neutral) mutations leave the amino acid sequence fixed, and the tertiary folded structure of a protein is even more strongly conserved.
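The point about neutral mutations can be illustrated with a minimal Python sketch. It assumes only a small excerpt of the standard genetic code, and the two RNA sequences are invented for illustration rather than taken from any real gene. Every third-position change below is synonymous, so the nucleotide sequences drift apart while the encoded peptide stays identical.

```python
# Excerpt of the standard genetic code (RNA codons -> one-letter amino acids).
# Only the codons needed for this toy example are listed.
CODON_TABLE = {
    "GGU": "G", "GGC": "G",  # glycine
    "GAU": "D", "GAC": "D",  # aspartate
    "GAA": "E", "GAG": "E",  # glutamate
    "AAA": "K", "AAG": "K",  # lysine
}


def translate(rna):
    """Translate an RNA string codon by codon using the excerpt above."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3))


def nucleotide_identity(a, b):
    """Fraction of aligned positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical sequences: the second differs at every third codon position.
ancestral = "GGUGAUGAAAAA"
derived = "GGCGACGAGAAG"

print(translate(ancestral), translate(derived))   # same peptide for both
print(nucleotide_identity(ancestral, derived))    # nucleotides have diverged
```

In this toy case a third of the nucleotides have changed while the protein sequence is untouched, which is why protein sequence and fold are the more useful signal over deep time.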
The validity of the RNA-era concept and the capacity for RNAs to be both replicating informational and active ribo-enzymes is emphasized by the continuing dependence of the ribosome on rRNA rather than the protein components demonstrated by the 3D realizations of the two subunits in fig 1c1, which show that the rRNA molecules are still carrying out the central task of protein assembly with only minor modification due to the 'chaperoning' proteins, despite 3.8 billion years of evolution.
Fig 1c1: Small and large rRNA subunits of the eubacterium Thermus thermophilus and the archaeon Haloarcula marismortui.
RNA orange and yellow, protein blue and active site green. (Wikipedia Ribosome)
Brooks et al. (2002) have found that the amino acids used in sections of genes common to life which are believed to originate with LUCA show amino acid distributions reflecting the relative abundance of such amino acids in primitive synthesis, indicating that the first translational genes used the amino acids which were spontaneously available, consistent with my original hypothesis on origin of the genetic code in Biocosmology. A specific model of the evolution of the ribosome envisages that the smaller subunit, which binds to and moves along the mRNA, began first as an RNA-based RNA helicase which was essential to avoid the RNA era ending in non-replicating double-stranded hairpins (Zenkin 2012). This would have then coupled to the larger subunit, which could have assembled transfer RNAs coupled to amino acids via ribozymes, resulting in a simple genetic code, for example based on polar and non-polar amino acids.
One intriguing indication of the state of genetic translation in LUCA is the incorporation of selenocysteine into the genetic code. Selenoenzymes which contain selenocysteine as a genetically translated amino acid are essential to the three domains of life and source back to LUCA, despite the fact that the 21st coded amino acid selenocysteine could not be fitted into the genetic code. An ingenious piece of genetic software engineering evolved in which the opal stop codon UGA is overridden if the mRNA possesses a motif called SECIS (selenocysteine insertion sequence). Selenocysteine is then inserted instead of termination, and translation continues.
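The recoding logic can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a model of the real machinery: the presence of a SECIS element is reduced to a boolean flag (`has_secis`, a hypothetical parameter), the codon table is a tiny excerpt, and the mRNA is invented. The recoded stop codon is taken to be UGA, the codon read as selenocysteine in known selenoprotein genes, and `U` is the one-letter symbol for selenocysteine.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Tiny excerpt of the standard genetic code, enough for the toy mRNA below.
CODON_TABLE = {"AUG": "M", "GGU": "G", "UGU": "C", "AAA": "K"}


def translate(mrna, has_secis=False):
    """Translate an mRNA; read UGA as selenocysteine when a SECIS element is flagged.

    `has_secis` is an illustrative stand-in for the hairpin motif; real
    recoding depends on the position and structure of the SECIS element.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UGA" and has_secis:
            protein.append("U")  # selenocysteine inserted instead of terminating
            continue
        if codon in STOP_CODONS:
            break  # ordinary termination
        protein.append(CODON_TABLE[codon])
    return "".join(protein)


# Hypothetical message: AUG GGU UGA AAA UAA
mrna = "AUGGGUUGAAAAUAA"
print(translate(mrna))                  # terminates at the in-frame UGA
print(translate(mrna, has_secis=True))  # reads through UGA as selenocysteine
```

The same message yields a truncated product without the flag and a longer selenoprotein with it, which is the sense in which the stop codon is "overridden".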
Fig 1c2: Left: Evolutionary tree of selenophosphate synthetase (Romero et al. 2005) spans the three domains of life. Centre: SECIS hairpins of archaea (A), bacteria (B) and corresponding eukaryote variants (C, D) (Moldave ed 2006). Top right: Tertiary structure of SECIS showing highly conserved regions (hot) (Walczak et al. 1996). Lower right: SECIS acts as an RNA-enzyme to attach the selenocysteine t-RNA to the nascent protein.
SECIS is an unusual hairpin loop structure which has varying forms in archaea and bacteria, with both forms appearing in eucaryotes, but they have a common feature of a highly conserved hairpin loop forming an RNA translational catalyst, which literally takes over some of the ribosomal RNA function, binding to the selenocysteine t-RNA and coupling selenocysteine to the nascent protein chain, as shown in fig 1c2. It is clear that this unique piece of genetic software engineering evolved in LUCA because the wobble positions of three other essential amino acid t-RNAs, lysine, glutamine and glutamic acid (those with two wobble positions XAA-XAG, the fourth set being amber and ochre stop codons), all depend on a modified 2-seleno-uridine base to function and this has to be generated from selenophosphate, which in turn is generated by selenophosphate synthetase. As shown in fig 1c2 (left), this enzyme has an evolutionary tree extending back to LUCA, confirming the obvious - that the genetic code cannot exist without the 21st software-engineered amino acid selenocysteine!
In a ground-breaking project to identify genes that can illuminate the biology of LUCA, a team associated with Martin (Weiss et al. 2016) took a phylogenetic approach to decoding the LUCA metabolism. Among proteins encoded in sequenced prokaryotic genomes, they sought those that: (1) are present in at least two higher taxa of bacteria and archaea, and (2) have trees that recover bacterial and archaeal monophyly. Genes meeting both criteria are unlikely to have undergone transdomain lateral gene transfer (LGT), and thus were probably present in LUCA and inherited within domains since then. By focusing on phylogeny rather than universal gene presence, they identified genes involved in LUCA's physiology - the ways that cells access carbon, energy and nutrients from the environment for growth.
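The two criteria can be expressed as a small filter over gene trees. The Python sketch below is not the pipeline of Weiss et al.; it is a toy rendering under stated assumptions: trees are nested tuples of leaf names, `domain_of` and `taxon_of` are hypothetical lookup tables, criterion (1) is read as requiring at least two higher taxa in each domain, and monophyly is tested as the existence of a single branch separating all bacterial leaves from all archaeal leaves.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set below every internal node of the tree."""
    if isinstance(tree, str):
        return
    yield leaves(tree)
    for child in tree:
        yield from clades(child)


def passes_criteria(tree, domain_of, taxon_of):
    """Apply both filters to one protein-family tree."""
    tips = leaves(tree)
    bacteria = {t for t in tips if domain_of[t] == "bacteria"}
    archaea = tips - bacteria
    # Criterion 1 (as read here): at least two higher taxa in each domain.
    for group in (bacteria, archaea):
        if len({taxon_of[t] for t in group}) < 2:
            return False
    # Criterion 2: some branch splits the leaves into bacteria versus archaea.
    return any(c == bacteria or c == archaea for c in clades(tree))


# Hypothetical leaves and taxon labels.
domain_of = {"b1": "bacteria", "b2": "bacteria", "a1": "archaea", "a2": "archaea"}
taxon_of = {"b1": "bac_taxon_1", "b2": "bac_taxon_2",
            "a1": "arc_taxon_1", "a2": "arc_taxon_2"}

vertical = (("b1", "b2"), ("a1", "a2"))      # domains cleanly separated
transferred = (("b1", "a1"), ("b2", "a2"))   # pattern expected after transdomain LGT

print(passes_criteria(vertical, domain_of, taxon_of))
print(passes_criteria(transferred, domain_of, taxon_of))
```

A family whose tree interleaves bacterial and archaeal sequences fails the second test, which is the signature of transdomain transfer that the filter is meant to exclude.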
The presence of the thermophile-specific enzyme reverse gyrase implies that LUCA was a thermophile. A rotator-stator ATP synthase subunit suggests LUCA was able to harness ion gradients for energy. LUCA also appears to have had a gene for a 'revolving door' protein that could swap sodium and hydrogen ions across this gradient. Earlier studies by Martin and Nick Lane of University College London suggest that such a protein would have been absolutely crucial for exploiting the natural gradient at vents. The only energy pathway enzymes present were those of the Wood-Ljungdahl (WL) pathway, which uses H2 as an electron donor and CO2 as electron acceptor. The H2 must have come from geological sources, since it could not have been made through fermentation. Analysis of the phylogenetic trees constructed from the 355 protein families places Clostridia and methanogens as the earliest-diverging organisms - both of which are anaerobic, H2-dependent and use the WL pathway. In methanogens and acetogenic clostridia, methyl groups are central to growth, comprising the very core of carbon and energy metabolism. The implication of this work is that LUCA was very much dependent on abiotic sources of H2 to provide it with energy, consistent with a metabolism associated with lost-city vents in which alkaline mineral-rich water enters the acidic high-CO2 ocean.
Fig 1c3: Elements of LUCA's metabolism elucidated in Weiss et al. (2016). (a) The overall metabolic pathways in LUCA. CODH/ACS, carbon monoxide dehydrogenase/acetyl CoA-synthase; Nif, nitrogenase; GS, glutamine synthetase; Mrp, MrP type Na+/H+ antiporter; CH3-R, methyl groups; HS-R, organic thiols. (b) Prominent methyl groups, and S and Se modifications. (c) Methyl transfer from tetrahydrofolate to methane and other C-containing biomolecules. (d) Molybdenum-containing MoCo. (e) SAM S-adenosyl-methionine attached to an FeS center. (f) The WL pathway showing how electron transfer from H2 to CO2 enables incorporation of metabolic molecules.
Cells conserve energy via chemiosmotic coupling with rotor-stator-type ATP synthases or via substrate-level phosphorylation. LUCA's genes encompass both a phosphotransacetylase (PTA) and an ATP synthase subunit. PTA generates acetylphosphate from acetyl-CoA, conserving the energy in the thioester bond, which can phosphorylate ADP or other substrates. LUCA's WL enzymes are replete with FeS and FeNiS centres, indicating transition-metal requirements and requiring organic cofactors: flavin, F420, methanofuran, two pterins (the molybdenum cofactor MoCo and tetrahydromethanopterin) and corrins such as cobalamin (fig 27), as well as nucleotide and other cofactors.
LUCA's genes for RNA nucleoside modification indicate that it performed chemical modification of nucleosides in both tRNA and rRNA. Four of LUCA's nucleoside modifications are methylations requiring SAM. In the modern code, several base modifications are required for codon-anticodon interactions at the wobble position. Consistent with the recurrent role of methyl groups in LUCA's biology, by far the most common tRNA and rRNA nucleoside modifications that are conserved across the archaeal bacterial divide are methylations, although thio-methylations and incorporation of sulfur and selenium are observed. Notably Selenophosphate synthase is included in the LUCA list, as well as nitrogenase molybdenum-iron protein alpha and beta chains as well as NifH, confirming the LUCA hypothesis for nitrogen fixation (Leigh 2000, Raymond et al. 2004, Boyd & Peters 2013). This picture indicates the antiquity and functional significance of methylated bases in the evolution of the ribosome and the genetic code and forges links between the genetic code, primitive carbon and energy metabolism and hydrothermal environments.
LUCA's gene list reveals only nine nucleotide biosynthesis and five amino acid biosynthesis proteins. The paucity of enzymes for essential amino acid, nucleoside and cofactor biosyntheses suggests that LUCA might not yet have evolved the genes in question prior to the bacterial-archaeal split, with the pathway products for LUCA being still provided by primordial geochemistry.
The late heavy bombardment (LHB) of Earth by comets and asteroids approximately 4-3.8 billion years ago probably resulted in Earth being periodically heated to the point that the oceans were vaporized and probably led to bottlenecks in the diversity of life at the time, meaning that only hyperthermophiles survived. The amount of oxygen available for biological cells was negligible and all life was anaerobic. When we look at the inferred metabolism of LUCA, we are looking at the dominant and most successful kind of metabolism on the planet before the Bacteria and Archaea diverged.
There have been a variety of other studies that have attempted to find a critical minimal gene set for life, or to find the minimal gene set that has homologous members in all three domains. These come to varying conclusions depending on the stringency of their criteria and the choice of representative organisms (Mat et al. 2008, Forterre et al. 2005, Harris et al. 2003).
To reconstruct the set of proteins LUCA could make, Kim and Caetano-Anollés (2011; see also Wang et al. 2007) searched a database of proteins from 420 modern organisms, looking for structures that were common to all. Of the structures found, just 5 to 11 per cent were universal, meaning they were conserved enough to have originated in LUCA. By looking at their function, they conclude that LUCA had an advanced metabolic network, especially rich in nucleotide metabolism enzymes, had primordial pathways for the biosynthesis of membrane glycerol ether and ester lipids, crucial elements of translation, including aminoacyl-tRNA synthases, regulatory factors, and a primordial ribosome with protein synthesis capabilities. It lacked, however, transcription from DNA to RNA, processes for extracellular communication, and enzymes for deoxyribonucleotide synthesis, and in advanced evolutionary stages stored genetic information in RNA (not DNA) molecules.
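The core idea of "structures common to all" amounts to a set intersection across proteomes. The Python sketch below uses invented organism names and fold-superfamily labels, and it applies a strict all-or-nothing definition of universality; it is an illustration of the principle only, not the scoring actually used in the study.

```python
def universal_structures(proteomes):
    """Return the structures present in every proteome and their share of all structures.

    `proteomes` maps an organism name to the set of fold superfamilies
    detected in it (all names here are hypothetical).
    """
    all_structures = set().union(*proteomes.values())
    universal = set.intersection(*proteomes.values())
    return universal, len(universal) / len(all_structures)


# Toy data: three organisms, five structures in total.
proteomes = {
    "bacterium_x": {"fsf_A", "fsf_B", "fsf_C"},
    "archaeon_y": {"fsf_A", "fsf_C", "fsf_D"},
    "eukaryote_z": {"fsf_A", "fsf_C", "fsf_E"},
}

universal, fraction = universal_structures(proteomes)
print(sorted(universal), fraction)
```

With real data the universal fraction shrinks as more organisms are added, since a single lineage-specific loss removes a structure from the intersection.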
Fig 1d: Phylogenomic tree of proteomes describing the evolution of 420 free-living, fully sequenced organisms, from a phylogenomic study of protein domain structure in their proteomes. Domains were defined at the highly conserved fold superfamily (FSF) level of structural classification (Kim and Caetano-Anollés).
Organelles were thought to be the preserve of eukaryotes, but in 2003 researchers found an organelle called the acidocalcisome also occurred in bacteria. Caetano-Anollés' team has now found that tiny granules in some archaea are also acidocalcisomes, or at least their precursors. That means acidocalcisomes are found in all three domains of life, and date back to LUCA (Seufferheld et al.).
Acidocalcisomes were originally discovered in Trypanosomes (sleeping sickness and Chagas disease) but have since been found in Toxoplasma gondii (toxoplasmosis), Plasmodium (malaria), Chlamydomonas reinhardtii (a green alga), Dictyostelium discoideum (a slime mould), bacteria and human platelets. Their membranes contain a number of protein pumps and antiporters, including aquaporins, ATPases and Ca2+/H+ and Na+/H+ antiporters. Acidocalcisomes have been implicated in osmoregulation. They were detected in the vicinity of the contractile vacuole in Trypanosoma cruzi and were shown to fuse with the vacuole when the cells were exposed to osmotic stress. Presumably the acidocalcisomes empty their ion contents into the contractile vacuole, thereby increasing the vacuole's osmolarity. This then causes water from the cytoplasm to enter the vacuole, until the latter gathers a certain amount of water and expels it out of the cell.
Fig 1e: Tangled web linking acidocalcisomes in existent archaea, bacteria and eucaryote species (Seufferheld et al.), overlaying electron micrographs of acidocalcisomes in Agrobacterium tumefaciens (a, b) and Methanosarcina acetivorans (c, d).
LUCA may have used RNA rather than DNA, as there is no evidence LUCA possessed ribonucleotide reductases, which create the deoxy versions of ribonucleotides, the building blocks of DNA (Lundin et al). Rather it appears these functions have been transferred from bacteria back to archaea by horizontal transfer on at least two separate occasions (arrows in fig 2). Eucaryotes (mid green) would also have received theirs after LUCA diversification.
Fig 2: Ribonucleotide reductase trees showing bacterial, eucaryote and archaeal branches, with evidence of two events of horizontal transfer from bacteria to archaea (arrows) after the diversification of LUCA (Lundin et al).
LUCA was a "progenote". Progenotes can make proteins using genes as a template, but the process is so error-prone that the proteins can be quite unlike what the gene specified. Both Di Giulio and Caetano-Anollés have found evidence that systems that make protein synthesis accurate appear long after LUCA. In order to cope, the early cells must have shared their genes and proteins with each other. Caetano-Anollés says the free exchange and lack of competition mean this living primordial ocean essentially functioned as a single mega-organism.
Fig 2b: Ancestral tryptophan synthase (Busch et al.).
A picture of the efficiency of enzymes in the last bacterial common ancestor (LBCA), which although more recent than LUCA still dates back some 3.5 billion years, has come from reconstructing the most probable sequence of the ancestral tryptophan synthase (TS) enzyme of the common ancestor of a selection of bacteria and archaea. The tree is rooted within the bacteria, because Euryarchaeota have most likely obtained the TS by a more recent horizontal gene-transfer event from a bacterial predecessor. The reconstructed enzyme proved to have efficient functionality when inserted into E. coli. The LBCA TS subunits are thermostable, exhibit high catalytic activity and form an αββα complex whose crystal structure is similar to modern TS (Busch et al. 2016).
At the absolute limit of molecular simplicity, with hints of the RNA era that preceded LUCA, are the two families of viroid, small circular RNA molecules that cause disease in higher plants. Unlike viruses and all other life forms, these do not encode any genes, and they survive passively through being copied by RNA polymerases either in the nucleus or the chloroplasts, using a rolling circle replication that requires no primers or tags. They have pathological effects because their nucleotide sequences cause RNA interference (RNAi) with essential plant genes, giving rise to potato spindle tuber disease and similar diseases in avocados and other plants. Although they haven't been isolated in the wild from cyanobacteria or other prokaryotes, they have been found to be able to replicate in cyanobacteria, hinting at an early origin.
Fig 2c: The two known types of viroid: above, causing potato spindle tuber disease and replicating asymmetrically in the nucleus; below, causing avocado disease and replicating symmetrically in the chloroplast, also functioning as ribozymes in their replication cycle (Daros et al 2006).
Diener's (1989, 2016) hypothesis proposed that the unique properties of viroids make them plausible "living relics" of the RNA world, possessing six primal attributes:
1. viroids' small size, imposed by error-prone replication
2. their high guanine and cytosine content, which increases stability and replication fidelity
3. their circular structure, which assures complete replication without genomic tags
4. existence of structural periodicity, which permits modular assembly into enlarged genomes
5. their lack of protein-coding ability, consistent with a ribosome-free habitat
6. replication mediated in some by ribozymes - the fingerprint of the RNA world.
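The second attribute in the list above is a simple sequence statistic. As a minimal Python sketch, GC content can be computed as the fraction of G and C bases; the sequence used here is invented for illustration and is not a real viroid genome.

```python
def gc_content(rna):
    """Return the fraction of bases in the sequence that are G or C."""
    rna = rna.upper()
    return sum(base in "GC" for base in rna) / len(rna)


# Hypothetical short RNA fragment, not taken from any viroid.
toy_fragment = "GGCCAUGGCC"
print(gc_content(toy_fragment))
```

Because G-C pairs form three hydrogen bonds against two for A-U pairs, a higher value is generally associated with more stable base-paired structure.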
Some introns, the non-coding sections of DNA that punctuate modular coding sections of eucaryote and some procaryote genes, can also self-splice as ribozymes, but their link with the RNA era is less encompassing.
Life today is informationally based on the sequences of the four bases A, G, T and C in DNA, with messenger copies of the genetic sequence in mRNA (with U replacing T) forming intermediates in the assembly of proteins, as the cell's primary active chemical and structural agents. This is achieved through a process of translation at the ribosome - a supra-molecular complex composed of some 50 chaperoning proteins surrounding a core composed of three rRNA units, fed by amino-acid coupled tRNAs. The RNAs carry out the essential function, supporting the idea that translation was at first a purely RNA-based process of protein construction. In line with this and other RNA fossils found particularly in Eukaryotes, it is widely believed that life began based on RNA, which shares both the capacity for complementary replication of DNA and the formation of 3-dimensional chemically reactive conformations, similar to proteins, after which the ribosome evolved, transferring the reactive burden on to proteins sequenced through the genetic code. Some time later, the informational genome was consolidated into more stable DNA.
Fig 3: The initial tree of rRNAs shows three distinct founding domains (Woese 1987)
Originally the Bacteria and Archaea were thought to be one large diverse family of prokaryotes until Carl Woese (1977, 1978, 1987, 1990) and others investigated the evolutionary tree of ribosomal RNAs and found that there were three distinct founding evolutionary domains, then named eubacteria, archaebacteria along with the eukaryotes.
This gave the Eukaryotes a closer founding status as well, by contrast with the idea that the procaryotic bacteria came first and then, somehow the higher Eukaryote organisms with their complex cellular structures, including among others - the endoplasmic reticulum, along with the nuclear envelope and Golgi apparatus - all parts of a common complex of internal membranous partitions - and the architecture of microtubules, including centrioles, and the Eukaryote flagellum, as well as the Eukaryotes endosymbiont mitochondria and chloroplasts.
Fig 4: Key structural differences separating the larger rRNA units of the three domains (Woese 1987).
In addition to their evolutionary sequence divergence, the smaller 30S ribosomal RNAs of each domain show distinct structural features characteristic of their own domain, but also emphasizing structural links between Bacteria and Archaea on the one hand and Archaea and Eukaryotes on the other, qualitatively confirming the central place of the Archaea in the divergence.
Fig 5: (a) Further elaboration of the rRNA tree (Pace 1997) (b) A third rRNA tree which suggests Archaea lie very close to the root is contrasted with that for the enzyme HMGCoA reductase (c), which also shows evidence of horizontal transfer to an Archaean (ex Doolittle 2000).
Norman Pace subsequently enlarged the scope and accuracy of the rRNA tree, including a greater diversity of organisms. This tree has become the basis of several other studies. Surviving Archaea are known to inhabit extreme environments, including hot volcanic pools, hydrothermal vents and extreme salty environments, and several arrangements of the root of the tree, including work by Bork's team, suggest a hot origin for life. However, other research (Brochier and Philippe, Boussau et al.) concludes the base root may have been at about 25 °C, a more viable temperature for a simple RNA metabolism, with a succeeding period of high temperature adaptions shortly after the differentiation of the three domains in evolutionary time.
However, James Lake (1988) had already challenged the notion of three domains, with an analysis claiming that the eucaryotes instead branched off from only one line of the archaea, the eocytes or crenarchaeota. This view has been confirmed by accumulating genetic studies (Williams & Embley 2014, Williams et al. 2013, Foster, Cox & Embley 2009, Cox et al. 2008) in which the TACK group of archaea (fig 5b right) have a pivotal relationship with eucaryotes.
Fig 5b: Left: Evolutionary root of the tree of life and its diversification into archaea, bacteria and Eukaryotes appears to have gone through an early period of cool temperature consistent with an RNA era, followed by a hot period (Ananthaswamy, Boussau et al.). Right: Three domains (a) is contrasted with a recent version of the "eocyte" hypothesis (b) showing the eucaryotes emerging from the wider crenarchaeota grouping (TACK) after divergence from euryarchaeota, implying the amoeboid ancestor of the eucaryotes was an "eocyte" (Williams et al. 2013).
The Copernican principle asserts that the Earth is a typical rocky planet in a typical planetary system, located in an unexceptional region of a common barred-spiral galaxy, hence it is probable that the universe teems with complex life. This is supported to a reasonable extent by the discovery of an increasing number of planets including some putative "Goldilocks" zone planets where water would be liquid and life as we know it could potentially exist. Set against this, the "rare earth" hypothesis argues that the emergence of complex life requires a host of fortuitous circumstances, including a galactic habitable zone, a central star and planetary system having the requisite character, the circumstellar habitable zone, the size of the planet, the advantage of a large satellite, conditions needed to assure the planet has a magnetosphere and plate tectonics, the chemistry of the lithosphere, atmosphere, and oceans, the role of "evolutionary pumps" such as massive glaciation and rare bolide impacts, and whatever led to the still mysterious Cambrian explosion of animal phyla. This might mean that planets able to support a bacterial level of life are not so uncommon, but those supporting complex multicellular life might be.
Fig 6: Metabolic power of eucaryote cells per haploid genome and hence the capacity for genomic complexity depends on the respiratory power of mitochondria (Lane and Martin).
Bringing this question to a pivotal crux in our context, the emergence of mitochondria as endosymbionts has been proposed to be a critical bottleneck which allowed complex life to evolve only once on Earth, because only in this effectively fractal cellular architecture can the membrane surface areas be achieved that are necessary to support the chemical reactions enabling the vastly larger number of genes in a complex organism's genome to maintain metabolic stability (Lane and Martin 2010, 2012). Lane and Martin note "The cornerstone of eukaryotic complexity is a vastly expanded repertoire of novel protein folds, protein interactions and regulatory cascades. The eukaryote common ancestor increased its genetic repertoire by some 3,000 novel gene families. The invention of new protein folds in the eukaryotes was the most intense phase of gene invention since the origin of life. Eukaryotes invented five times as many protein folds as eubacteria, and ten times as many as archaea. Even median protein length is 30% greater in eukaryotes than in prokaryotes". Whether such endo-symbiosis is rare, or a common extreme of parasitic or predatory relationships, would then determine how likely or unlikely complex life might be.
That said, one can also see how an endosymbiotic relationship between an archaeote and a respiring bacterium could set up a mutually beneficial energetic relationship which could lead to the eucaryote emergence. Martin and Muller (1998) have proposed that mitochondrial symbiosis might emerge as an interaction under low oxygen between a respiring hydrogen-producing bacterium and an archaeal cell utilizing hydrogen to make ATP.
This massive increase in complexity remains obscure in the genetic and fossil records and requires some ingenious model construction to envisage how mitosis, meiosis, sexuality, the nuclear envelope, endoplasmic reticulum, cytoskeleton, and all the complexities of eucaryote regulation evolved. For a seminal work on this see (Cavalier-Smith 2010).
Regardless of this, Lane and Martin's metabolic approach explains neatly why there is little sign of any of these structures in any existing prokaryote. In effect endo-symbiosis created a completely new energetic regime, in which the only niche players were the newly formed endo-symbiotic chimeras themselves, who then underwent a massive adaptive radiation to form ever more complex forms of cellular machinery and ultimately LECA and the diversity of eucaryotes as we now know them. There are echoes in this metabolic Shangri-La of the conditions in Lost City vents that we are coming to understand may have likewise given rise much earlier to LUCA.
Are Giant Viruses a Fourth Domain of Life?
Fig 6b: Left: Bacterium Gemmata obscuriglobus with internal nuclear envelope and vacuoles (Rachel Melwig & Christine Panagiotidis / EMBL). Right: Ultrathin EM section of a mimivirus in amoeba (Jean-Michel Claverie) Inset: Mamavirus infected by sputnik phage.
Offset against both the uniqueness of the mitochondrial endo-symbiosis and the closely linked, but independent, question of the origin of the nucleus and nuclear envelope, has been the discovery of mimi-, mama-, mega- and pandora-viruses infecting amoeba (Raoult et al., Philippe et al.) and related very large aquatic viruses such as CroV infecting single-celled plankton species (Fisher et al.). Despite their recent discovery, these appear from ocean gene analyses to be potentially ubiquitous and widespread in the oceans, and possibly play a crucial role in regulating atmospheric-oceanic pathways such as carbon sequestration.
These form an intermediate genetic position between viruses and cells, having the largest genomes, with extensive cellular machinery, including protein translation, and larger than the smallest completely autonomous bacterial and archaeal genomes.
Megavirus chilensis, for example, is 10 to 20 times wider than the average virus. The particle measures about 0.7 micrometres (thousandths of a millimetre) in diameter. It just beats the previous record holder, Mimivirus, which was found in a water cooling tower in the UK in 1992. A study of the megavirus's DNA shows it to have more than a thousand genes. The mimivirus genome is a linear, double-stranded DNA molecule 1.18 Mbp in length. Megavirus has 1.25 Mbp. Like Mimivirus, Megavirus has hair-like structures, or fibrils, on the exterior of its shell, or capsid, that probably attract unsuspecting amoebas looking to prey on bacteria displaying similar features. These viruses show many characteristics at the boundary of living and non-living. They are as large as several bacterial species, such as Rickettsia conorii and Tropheryma whipplei, possess a genome of comparable size to several bacteria, including those above, and code for products previously not thought to be encoded by viruses. Mimivirus has genes coding for nucleotide and amino acid synthesis, which even some small obligate intracellular bacteria lack. However, it lacks genes for ribosomal proteins, making it dependent on a host cell for protein translation and energy metabolism.
As of mid-2013, an even larger virus, with a 2.5 Mb genome and without morphological or genomic resemblance to any previously defined virus families, has been discovered by the same researchers that found mimivirus, in both the same ocean sample off Peru and in a freshwater pond in Australia. It has been named pandoravirus, reflecting the lack of similarity with previously described microorganisms and the surprises expected from future study. The researchers suspect that giant viruses evolved from cells. They think that at some point, the dynasty on Earth was much bigger than the three domains of bacteria, archaea and eukaryotes. Some cells gave rise to modern life, and others survived by parasitizing them and evolving into viruses. Pandora might thus provide a complementary relic of the genomes of this wider founding group (Philippe et al.). Using the Global Ocean Sampling (GOS) Expedition data to explore variants of recA (the universal DNA repair enzyme) and rpoB (the beta subunit of bacterial RNA polymerase), a team associated with Craig Venter have discovered branches which may also point to a fourth domain (Wu et al.).
Fig 6c: Evolutionary tree of B-family DNA polymerase showing relationship of pandoravirus to other viruses and eucaryotes. Inset is shown pandoraviruses invading acanthamoeba (Philippe et al).
As an illustration of genes in mimivirus normally appearing only in cellular genomes, the mimivirus has genes for central protein-translation components, including four amino-acyl transfer RNA synthetases, peptide release factor 1, translation elongation factor EF-TU, and translation initiation factor 1. The genome also exhibits six tRNAs. Other notable features include the presence of both type I and type II topoisomerases, components of all DNA repair pathways, although the topoisomerase 1B has a different header structure from the eucaryote form (Brochier-Armanet, Gribaldo & Forterre 2008), many polysaccharide synthesis enzymes, and one intein-containing gene. Inteins are protein-splicing domains encoded by mobile intervening sequences (IVSs). They self-catalyze their excision from the host protein, ligating their former flanks by a peptide bond. They have been found in all domains of life (Eukarya, Archaea, and Eubacteria), but their distribution is highly sporadic. Only a few instances of viral inteins have been described. Self-splicing type I introns are a different type of mobile IVS, self-excising at the mRNA level. They are rare in viruses. Mimivirus exhibits four instances of self-excising intron, all in RNA polymerase genes.
Fig 6d: Evolutionary diversification of Mimiviruses from nucleocytoplasmic large DNA viruses (Fisher et al.) and in relation to the three domains of cellular life based on the concatenated sequences of seven universally conserved protein sequences (Raoult et al.)
Mamaviruses also host parasitic virophages, affectionately named sputnik (Pearson 2008), which act as viral satellites, piggy-backing on the metabolism of the large viral factories set up by these giant viral genomes and causing the mimiviruses to sicken. These virophages also contain genes that are linked to viruses infecting each of the three domains of life, Eukarya, Archaea and Bacteria (La Scola et al.). It has thus been suggested that such viruses have a primary role in the establishment of cellular life and that they may have been instrumental in the emergence of the nuclear envelope.
Fig 6e: (Left) Genome bins of the Klosneuviruses. From outside to inside: In the first ring, solid circles indicate genes exclusively shared with nucleocytoplasmic large DNA viruses (NCLDVs) (blue), genes specific for Klosneuviruses (white), genes shared with eukaryotes (red), genes shared with Bacteria (green), genes represented in all three domains of cellular life (yellow), and singletons (gray). The second ring displays positions of genes (gray) either on the minus or the plus strand. The next track depicts GC content in shades of gray ranging from 20% (white) to 50% (dark gray). Links connect paralogs (gray) and nearly identical repeats (orange). (Right) Genome evolution of Klosneuviruses. A maximum likelihood tree from a concatenated alignment of five core nucleocytoplasmic virus orthologous genes.
However, an investigation of another newly discovered group of very large viruses, the Klosneuviruses, in metagenomic data from Austrian sewage (Schulz et al. 2017) shows they have arisen through multiple aggregation events. Compared with other giant viruses, the Klosneuviruses encode an expanded translation machinery, including aminoacyl transfer RNA synthetases with specificities for all 20 amino acids. Notwithstanding the prevalence of translation system components, comprehensive phylogenomic analysis of these genes indicates that Klosneuviruses did not evolve from a cellular ancestor but rather are derived from a much smaller virus through extensive gain of host genes.
CRISPR-Cas has become famous for its potential to perform genome editing. About 90% of archaea and 30% of bacteria have some form of CRISPR-Cas immunity, but how did bacteria and archaea come to possess such sophisticated immune systems? Viruses outnumber prokaryotes by ten to one and are said to kill half of the world's bacteria every two days. Prokaryotes also swap scraps of DNA called plasmids, which can be parasitic. Prokaryotes have evolved a slew of weapons to cope with these threats. Restriction enzymes, for example, cut DNA at or near a specific sequence, but these defences are blunt. Each enzyme is programmed to recognize certain sequences, and a microbe is protected only if it has a copy of the right gene. CRISPR-Cas is more dynamic. It adapts to and remembers specific genetic invaders in a similar way to how human antibodies provide long-term immunity after an infection. The leading theory of the origin of these systems is that they are derived from transposons, which can hop from one position to another in the genome. Evolutionary biologist Eugene Koonin and colleagues (Krupovic et al. 2014) have found a class of these, which they have called casposons, that encode the protein Cas1 involved in inserting spacers into the genome. They describe a new superfamily of archaeal and bacterial mobile elements which encode Cas1 endonuclease, a key enzyme of the CRISPR-Cas adaptive immunity systems of archaea and bacteria. The casposons share several features with self-synthesizing eukaryotic DNA transposons of the Polinton/Maverick class, including terminal inverted repeats and genes for B family DNA polymerases. However, unlike any other known mobile elements, the casposons are predicted to rely on Cas1 for integration and excision, via a mechanism similar to the integration of new spacers into CRISPR loci.
Tangled Roots of Horizontal Transfer
Darwin (1859) was the first person to publish an evolutionary tree of life on page 133 of his seminal work. The basis of such a tree in the genetic age has become the vertical transfer of genetic information through reproduction coupled to mutation and selective advantage. This is the basis of the tree diagram itself and all evolutionary trees constructed on genetic data. However, despite the division into three domains, further investigations of proteins in the three domains began to reveal a much more confused and complicated picture.
Firstly, the ribosomal proteins, like the rRNAs, show distinct, easily differentiated morphologies, with some correspondences linking one pair of domains and others linking another pair (Forterre 2006b, Woese 2000). Secondly, horizontal transfer of genes, e.g. through viral interaction, has occurred at fluctuating rates throughout all the domains of life. Lawton (2009) provides an in-depth review of this debate. Thirdly, the proteins in Eukaryotes appear to have a mixed origin, with the informational ones having an evolutionary relationship with archaea but the metabolic enzymes appearing to have a bacterial origin. This suggests that the Eukaryote genome has either resulted from one or more symbiotic fusions, e.g. of an archaeal and a bacterial genome, and/or that there has been a high degree of horizontal gene transfer between bacteria and Eukaryotes.
Fig 7: Evolution of iron-sulphur cluster proteins of mitochondria is linked to α-proteobacteria (Emelyanov 2003)
The evidence for symbiotic inclusions is clear from the fact that all Eukaryotes have endosymbiotic respiring mitochondria. Plants also have photosynthetic chloroplasts derived from cyanobacteria. The only apparent exceptions are a few primitive anaerobic organisms, such as the metamonad human gut parasite Giardia lamblia, which nevertheless has a mitochondrial remnant. Mitochondria are evolutionarily related to α-proteobacteria and in particular the SAR11 clade of Rickettsiae (Emelyanov 2003, Thrash et al. 2011), named after their discovery in the Sargasso Sea.
SAR11 clade organisms, unlike the Rickettsiae causing diseases such as typhus, are free-living dominant ocean organisms. Pelagibacter ubique, discovered in the Sargasso Sea, along with its relatives constitutes 25% of all microbial plankton cells - the most abundant ocean bacteria and possibly the most abundant bacteria on Earth. It is one of the smallest self-replicating cells known, with a length of 0.37-0.89 µm and a diameter of only 0.12-0.20 µm. 30% of the cell's volume is taken up by its genome. It has the smallest genome (1.30 Mbp) of any free-living organism, encoding only 1,354 open reading frames (1,389 genes total). The only species with smaller genomes are intracellular symbionts and parasites, such as Mycoplasma genitalium. It recycles dissolved organic carbon and undergoes regular seasonal cycles in abundance - in summer reaching ~50% of the cells in the temperate ocean surface waters. Thus it plays a major role in the Earth's carbon cycle.
Fig 7b: Nitrosopumilus maritimus, one of the ubiquitous Thaumarchaea, and Pelagibacter ubique from the SAR11 clade.
Complementing this, in terms of close archaeal relatives of the eucaryote endosymbiosis, are the Crenarchaeota and the closely related TACK sub-clade Thaumarchaeota, discovered from their wide occurrence in ocean samples, which may have an even closer relationship with eucaryotes (Brochier-Armanet et al. 2008). The wider grouping of Crenarchaeota were originally thought to be extreme thermophiles, such as the aerobic Sulfolobus solfataricus found in an Italian hot pool growing at 80 °C in a pH of 2-4. However, since then ubiquitous low-temperature Thaumarchaeote species, such as Cenarchaeum symbiosum and Nitrosopumilus maritimus, have been discovered in cool oxic ocean. Nitrosopumilus is one of the smallest living organisms at 0.2 µm in diameter. It has a genome of 1.65 Mbp and lives by oxidizing ammonia to nitrite. Based on measurements of their signature lipids taken from ocean samples, these organisms are thought to be very abundant - estimated at 10^28 cells in the world's oceans (Konneke et al. 2005) - suggesting that they have a major role in global biogeochemical cycles and are one of the main contributors to the fixation of carbon. DNA sequences from Crenarchaea have also been found in soil and freshwater environments, suggesting that this phylum is ubiquitous to most environments.
Significantly, these two organisms have been shown to possess a eucaryote-type topoisomerase 1B (swivelase), which plays a major role in DNA replication and chromatin assembly in eucaryotes and is distinct from the 1B types found in some viruses and bacteria (Brochier-Armanet, Gribaldo & Forterre 2008). This confirms the founding eucaryote had a DNA genome. The closely related Caldiarchaeum subterraneum harbors a ubiquitin-like protein modifier system with structural motifs specific to eukaryotic system proteins. The presence of such a eukaryote-type system is unprecedented in prokaryotes, and indicates that a prototype of the eukaryotic protein modifier system is present in the Archaea (Nunoura et al. 2010).
Thus the two cell types believed to be involved in eucaryote endosymbiosis are both closely related to existing ubiquitous species dominant on a global basis.
Fig 8: Tangled roots (Doolittle 2000)
Many α-proteobacteria, including Rickettsiae (and related Wolbachia and Agrobacterium), obligately live in the cytoplasm of other cells and so are naturally adapted to becoming an endo-symbiont of a glycolytic organism by providing respiring energy to the host's metabolism, resulting in the mitochondrion. Giardia still retains traces of mitochondrial proteins, so appears to have lost its respiring organelles through specializing for an anaerobic parasitic habitat, rather than occupying a place in the tree before mitochondria were incorporated into eucarya (Adam 2000). In fact Giardia was found to have mitochondrial Fe-S proteins which showed up in vestigial mitochondria, now called mitosomes (Zimmer 2009). Some protista, such as Trichomonads, and fungi such as Chytridiomycetes from cattle rumen have hydrogen-generating hydrogenosomes which share evolutionary homology with mitochondria (Martin & Mentel 2010). Some mitochondria in worms and molluscs are also able to shift to a low-energy anaerobic mode.
Having discovered the hydrogenosome, Martin and Muller (1998) suggested that the advent of mitochondrial symbiosis might be explained as an interaction under anoxic conditions between a respiring hydrogen-producing bacterium and an archaeal cell utilizing hydrogen to make ATP, as many archaea still do. Between 1000 and 600 Mya, the ocean underwent an anoxic period, possibly due to a snowball Earth phase which was broken by atmospheric CO2 buildup, and a major oxygen increase generated by cyanobacteria, which then precipitated the acidic ocean iron and triggered the Cambrian.
There are glaring incidences of horizontal transfer in higher organisms, where for example, cows have a gene that originated in snakes. The picture of horizontal transfer is even more tangled in bacterial and archaeal genomes, which contain a great number of shared and exchanged genes, through promiscuous viral transfer between species.
This has led to Doolittle (1998), Woese (2002), and others, proposing a tangled root to the tree of life involving a transition from a regime in which there was a much higher rate of horizontal exchange and effective global optimization of genomes, to tree-like vertical evolution of genomes, once the more complex genomes of the Eukaryote domains became established.
Fig 9: Left: Superfamily fold incidence evolutionary tree of eucaryotes and the three domains of life. The number of folds connecting each group is shown lower left (Yang, Doolittle & Bourne 2005).
Right: High resolution tree of the three domains of life - eubacteria, archaea, eukaryotes. (Ciccarelli, Bork et al. 2006) (click to enlarge).
One way of testing whether the three branches actually have a meaningful evolutionary tree is to use the simple presence of a given super-family protein fold as a classifier (Yang, Doolittle & Bourne 2005). This proves to be a more accurate measure of taxonomy than many others based on fold frequency, and correctly divides the Archaea into crenarchs and euryarchs and groups the Eukarya into animals, plants, fungi, and others (protists). This method leads to the evolutionary tree illustrated left in fig 9, with the implication that LECA was a well-defined organism with a rich gene complement and not just a quasi-genome shaped by horizontal transfer.
To try to clarify the taxonomic relationships founding the tree of life, Peer Bork and his team produced a refined evolutionary tree (fig 9 right) by selecting only universal proteins that had not been subjected to horizontal transfer, providing the most detailed tree root diagram to date, although admittedly on only a skeleton gene set comprising some 1% of the respective genomes. The phylogenetic tree has its basis in a cleaned and concatenated alignment of 31 universal protein families and covers 191 species whose genomes have been fully sequenced. Merhej et al. have further demonstrated convergent evolution among specialized bacterial groups.
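As a rough illustration of how fold presence alone can separate groups, the sketch below computes a Jaccard distance between sets of folds. The genome names, fold labels and the choice of Jaccard distance are all assumptions made for illustration; this is not the procedure or the data of Yang, Doolittle & Bourne.

```python
from itertools import combinations

# Hypothetical presence/absence data: genome -> set of fold superfamily labels.
# Genome names and fold labels are invented for illustration only.
fold_sets = {
    "archaeon_A": {"f1", "f2", "f3", "f4"},
    "archaeon_B": {"f1", "f2", "f3", "f5"},
    "eukaryote_A": {"f1", "f2", "f6", "f7", "f8"},
    "eukaryote_B": {"f1", "f2", "f6", "f7", "f9"},
}

def jaccard_distance(a: set, b: set) -> float:
    """1 - |intersection| / |union|; 0 for identical sets, 1 for disjoint sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Pairwise distances between all genomes, keyed by alphabetically ordered name pairs.
distances = {
    (x, y): jaccard_distance(fold_sets[x], fold_sets[y])
    for x, y in combinations(sorted(fold_sets), 2)
}

# Print pairs from most similar to least similar.
for (x, y), d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{x} vs {y}: {d:.2f}")
```

In this toy data the two archaea and the two eukaryotes are each closer to one another than to members of the other group. A distance matrix of this kind could in principle be passed to a tree-building method.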
Fig 10: Tree diagram of the birth, transfer, duplication and loss of key genes in the redox and electron transport pathways, in a founding burst of gene evolution between 3.3 and 2.7 billion years ago (David and Alm 2010).
Then Lawrence David and Eric Alm (2010) produced the above tree investigating the central genes common to a wide spectrum of life forms, involving the founding steps of redox reactions and electron transport, demonstrating a rapid evolutionary innovation during an Archaean genetic expansion between 3.3 and 2.7 billion years ago. They mapped the evolutionary history of 3983 gene families that occur in a wide range of modern species. They were able to show that 27 per cent of these gene families appeared in a short evolutionary burst. Many of the genes from this time were involved in electron transport - a key step in respiration and photosynthesis, which ultimately led to oxygen-producing photosynthesis and the "great oxygenation event" 2.4 billion years ago, when the atmosphere became oxygen rich.
This lends support to the idea that the collective primordial genome functioned as a supercomputer (King 2010) based on parallel genetic algorithms combined with horizontal genetic transfer, whose bit computation rate through mutation and recombination is sufficient to generate the functional conformations, through protein folding, to solve the key metabolic pathways over a period no longer than 300 million years.
Fig 10b: Tree of orphan genes in metazoa with charts for the mouse and fruit fly, show the emergence of orphans throughout the span of evolution, with a peak in both at 800 million years ago when earth emerged from its “snowball” phase, with the current peaks corresponding to newborn genes, many of which will be lost. About 20 percent of new genes in fruit flies appear to be required for survival. And many others show signs of natural selection (Tautz & Domazet-Loso 2011).
Contrasting with the early emergence of key functional genes, it has been discovered that regions of non-coding DNA have been repeatedly activated to become de-novo 'orphan' genes, which cannot be accounted for by gene duplication, conversion to new functions through exon shuffling to make new modular arrangements, or genes generated from transposable elements. While orphan genes might seem exponentially improbable, on the basis that a stretch of n base pairs, with 4^n possible arrangements, would have to become functional at random, such genes have recently been found to be ubiquitous. 10-20% of genes in all taxa so far explored lack homologs in other species. About 2/3 of domains of unknown function (DUF) open reading frames (ORF) whose 3-D structures have been analysed show folds which are likely outlier extremes of existing gene families not recognized by gene comparison systems such as BLAST (Jaroszewski et al. 2009), but many are also de-novo orphans.
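To give a sense of the scale behind the improbability argument, the number of possible sequences of n base pairs is 4 to the power n (4^n). The sketch below simply evaluates this on a log scale for a few lengths. The lengths are illustrative choices, not values taken from the cited studies.

```python
import math

def sequence_space_log10(n_base_pairs: int) -> float:
    """Return log10 of the number of distinct DNA sequences of length n, i.e. log10(4**n)."""
    return n_base_pairs * math.log10(4)

# Illustrative lengths only: a short motif, a small ORF, a longer gene.
for n in (30, 300, 1000):
    print(f"n = {n:4d} bp: about 10^{sequence_space_log10(n):.0f} possible sequences")
```

The count only bounds the space of arrangements. It says nothing about what fraction of those sequences would be functional, which is the quantity the orphan gene findings bear on.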
A clear example is the Pldi gene in the mouse M. musculus, which has arisen within the past 2.5-3.5 million years in a large intergenic region present in many mammals, including humans, thus excluding gene duplication, transposable elements, or other genome rearrangements. The gene has three exons, shows alternative splicing, and is specifically expressed in postmeiotic cells of the testis enhancing sperm motility (Heinen et al. 2009). Its emergence correlates with indel mutations in the 5' regulatory region. A recent selective sweep is associated with the transcript region in M. musculus populations.
In humans, at least one de novo gene is active in the brain, leading some scientists to speculate such genes may have helped drive the brain’s evolution. Others are linked to cancer when mutated, suggesting they have an important function in the cell. De novo genes are often short, and produce small proteins. Rather than folding into a precise structure they have a more disordered architecture, allowing the protein to promiscuously bind to a broader array of molecules (Singer 2015). Investigations suggest that such taxon-specific genes drive morphological specification, enabling organisms to adapt to changing conditions in the generation of morphological diversity, and innate defence (Khalturin et al. 2009).
Bacteria engage in much more radical forms of pan-sexuality than higher organisms, involving viruses and plasmids, themselves separate mobile genetic elements, acting as agents of genetic transfer, accelerating the pace of bacterial evolution (Maxmen 2010). This enables the genetic sequences of bacteria, archaea and protists to move around in the genome and to be exchanged between cells, and even between different species. Sexual exchange of material can happen both through viral exchange and through a conjugation plasmid, which can spool DNA from one bacterium into another, resulting in a net donation of genes from one strain or species to another, which ensures a broad exchange of genetic material throughout bacterial ecosystems, resulting in rapid accumulation of advantageous genes exemplified by plasmid borne infectious drug resistance.
To give a very rough idea of the computing power of the combined bacterial genome alone, taking into account bacterial soil densities (~10^9/g), effective surface area (~10^18 cm^2), genome sizes (~10^6), combined reproduction and mutation rates (~10^-3/s) gives a combined presentation rate of new combinations of up to 10^30 bits per second, roughly 10^12 times greater than the current fastest computer at 33 petaflops or about 10^17 bit ops per second. Corresponding rates for complex life forms would be much lower, at around 10^17 per second, because they are fewer in total number and have lower reproduction rates and longer generation times, but they are still vying with the computation rates of the fastest supercomputer on earth.
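The order-of-magnitude estimate above can be reproduced in a few lines. The sketch below is only a check on the arithmetic, not a model of bacterial evolution. It multiplies the four quoted factors and compares the product with the quoted supercomputer figure. It assumes, as the text does implicitly, about one gram of soil per square centimetre of surface, so that cells per gram times area approximates a cell count. The variable names are illustrative.

```python
import math

# Rough factors quoted in the text (orders of magnitude only).
cells_per_gram = 1e9       # bacterial soil density, cells per gram
surface_area_cm2 = 1e18    # effective surface area, cm^2
genome_size_bp = 1e6       # typical bacterial genome size, base pairs
rate_per_second = 1e-3     # combined reproduction and mutation rate, per second

# Assumption (not stated in the text): roughly 1 g of soil per cm^2 of surface,
# so cells_per_gram * surface_area_cm2 approximates the total number of cells.
new_combinations_per_second = (
    cells_per_gram * surface_area_cm2 * genome_size_bp * rate_per_second
)

# The text's figure for a 33-petaflop machine, in bit operations per second.
supercomputer_bit_ops = 1e17

ratio = new_combinations_per_second / supercomputer_bit_ops

print(f"bacterial presentation rate ~ 10^{round(math.log10(new_combinations_per_second))} per second")
print(f"ratio to supercomputer ~ 10^{round(math.log10(ratio))}")
```

With exactly these inputs the ratio comes out nearer 10^13 than the rough figure quoted, a difference that lies well within the uncertainty of each factor.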
An even higher figure has been given by Landenmark et al. (doi:10.1371/journal.pbio.1002168.t001). Using information on the typical mass per cell for each domain and group and the genome size, they estimate the total amount of DNA in the biosphere to be 5.3 x 10^31 (±3.6 x 10^31) megabases (Mb) of DNA (Table 1). This quantity corresponds to approximately 5 x 10^10 tonnes of DNA, assuming that 978 Mb of DNA is equivalent to one picogram. Assuming the commonly used density for DNA of 1.7 g/cm^3, then this DNA is equivalent to the volume of approximately 1 billion standard (6.1 x 2.44 x 2.44 m) shipping containers. The DNA is incorporated within approximately 2 x 10^12 tonnes of biomass and approximately 5 x 10^30 living cells, the latter dominated by prokaryotes. By analogy, it would require 10^21 computers with the mean storage capacity of the world's four most powerful supercomputers (Tianhe-2, Titan, Sequoia, and K computer) to store this information. If all the DNA in the biosphere was being transcribed at reported rates, taking an estimated transcription rate of 30 bases per second, then the potential computational power of the biosphere would be approximately 10^15 yottaNOPS (yotta = 10^24), about 10^22 times more processing power than the Tianhe-2 supercomputer, which has a processing power of 33.86 petaflops, on the order of 10^5 teraFLOPS (tera = 10^12).
This picture of bit rates coincides closely with the Archaean expansion scenario noted above and suggests that evolution has been a two-phase process, in which the collective single-celled genome, with its much higher bit rates under promiscuous sexuality and horizontal transfer, has arrived at a global genetic solution to the protein folding problems of the central metabolic, electro-chemical and even root developmental pathways, which are then later capitalized on by multi-celled organisms, through gene duplication and loss as well as the creation of new specialized genes at a much lower rate.
Fig 11: Horizontal transfers across the bacterial tree under two thresholds, 10 and 5 genes (Dagan et al.).
The massive extent of horizontal transfer in eubacteria, as well as archaea, has also become clear, suggesting large components of procaryote genomes are effectively globally optimized for their niches by frequent genetic transfer. Dagan et al. have characterized the extent of horizontal transfer for a series of thresholds as well as establishing specific modularity of horizontal transfer of functions between groups.
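The conversions in the biosphere DNA estimate above can be checked directly from the quoted inputs. The following sketch is a plain unit-conversion exercise using only figures given in the text. The variable names are illustrative, and the results are rounded orders of magnitude rather than independent estimates.

```python
# Figures quoted in the text.
total_dna_mb = 5.3e31              # megabases of DNA in the biosphere
mb_per_picogram = 978              # megabases of DNA per picogram
dna_density_g_per_cm3 = 1.7        # commonly used density of DNA
container_m3 = 6.1 * 2.44 * 2.44   # volume of a standard shipping container, m^3
transcription_bases_per_s = 30     # estimated transcription rate
tianhe2_flops = 33.86e15           # 33.86 petaFLOPS

# Mass: megabases -> picograms -> grams -> tonnes.
dna_mass_g = total_dna_mb / mb_per_picogram * 1e-12
dna_mass_tonnes = dna_mass_g / 1e6

# Volume: grams -> cm^3 -> m^3 -> number of containers.
dna_volume_m3 = dna_mass_g / dna_density_g_per_cm3 / 1e6
containers = dna_volume_m3 / container_m3

# Processing: every base transcribed at the quoted rate, one operation per base.
nops = total_dna_mb * 1e6 * transcription_bases_per_s
yotta_nops = nops / 1e24
ratio_to_tianhe2 = nops / tianhe2_flops

print(f"DNA mass: {dna_mass_tonnes:.1e} tonnes")
print(f"Shipping containers: {containers:.1e}")
print(f"Processing: {yotta_nops:.1e} yottaNOPS, {ratio_to_tianhe2:.1e} x Tianhe-2")
```

Each result agrees with the rounded value quoted in the text: about 5 x 10^10 tonnes, close to a billion containers, about 10^15 yottaNOPS and a ratio on the order of 10^22.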
Fig 12: Genetic diffusion at the root of the tree (Dagan and Martin)
Critics of the validity of the tree root concept, such as Dagan and Martin emphasize the small proportion (1%) of the genome used in Bork's study and stress both the lateral (or horizontal) gene transfer events uniting the prokaryote realms and the apparent chimaeric nature of the Eukaryote genome, which appears to contain both archaea-related informational genes and eubacterial metabolic ones, in addition to obvious endosymbiont events of the mitochondrion and chloroplast.
The case for horizontal transfer of genes between unrelated Eukaryote species through infectious elements invading new and hence non-resistant species is also well established. Genome-wide comparative and phylogenetic analyses show that HGT in animals typically gives rise to tens or hundreds of active ‘foreign’ genes, largely concerned with metabolism (Crisp et al. 2015). The SPIN element is present in a diverse unrelated set of species, spanning amphibians, reptiles, marsupials and mammals while absent from closely related species (Lisch).
Fig 13: Left: Spread of the HAS1 hyaluronan synthase genes across diverse groups - chordates, other metazoa, fungi, plants, bacteria and archaea (Crisp et al. 2015). Right: Pattern of invasions of the SPIN element (Lisch)
The estimate of the number of phyla of as yet undiscovered bacteria continues to grow in an unbounded trajectory. A research team from Berkeley (Brown et al. 2015) gathered water samples from the Colorado River, passed the water through a pair of increasingly fine filters - with pores 0.2 and 0.1 microns wide - and then analyzed the bacteria captured by the filters, many of which were very small and included hair-like structures (inset fig 13b), reconstructing the scrambled short pieces of DNA into complete and draft genomes. They divided the 789 organisms into 35 phyla - 28 of which were newly discovered. They based the sorting on the organisms’ evolutionary history and on similarities in the code on the organisms’ 16S rRNA genes - those with at least 75% of their code in common went into the same phylum. With these new additions, there are now roughly 90 identified bacterial phyla. This is a lot more than there were a year ago, but also far fewer than the 1,300 to 1,500 phyla that microbiologists estimate we’ll have once a complete accounting is finished.
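The idea of sorting organisms by a sequence identity threshold can be sketched as follows. This is a deliberately simplified illustration, not the pipeline used by Brown et al.: it assumes the sequences are already aligned to equal length, uses a greedy first-match grouping chosen here for brevity, and runs on invented toy sequences far shorter than a real 16S rRNA gene. The function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def greedy_groups(sequences: dict, threshold: float = 0.75) -> list:
    """Put each sequence in the first group whose representative it matches at >= threshold."""
    groups = []  # each group is a list of names; the first member is the representative
    for name, seq in sequences.items():
        for group in groups:
            if percent_identity(seq, sequences[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            # No existing representative is similar enough, so start a new group.
            groups.append([name])
    return groups

# Invented toy sequences for illustration only.
toy = {
    "org1": "ACGTACGTACGTACGTACGT",
    "org2": "ACGTACGTACGTACGTACGA",  # 19 of 20 positions match org1
    "org3": "TTTTACGTGGGGACGTCCCC",  # 11 of 20 positions match org1
}
print(greedy_groups(toy))
```

With the 75% threshold, org1 and org2 fall into one group and org3 starts a second. Greedy grouping depends on input order, so a real analysis would normally rely on proper alignment and phylogenetic placement rather than this shortcut.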
Fig 13b: New phyla of bacteria (Brown et al. 2015).
The Eukaryote Nuclear Genome as a Genetic Fusion
Fig 14: The proposed ring of life (Rivera and Lake)
Rivera and Lake used a new algorithm to take account of possible genetic fusion events, forming a genetic ring through matching partial trees into a most parsimonious whole, inferring that the Eukaryote genome has arisen from a fusion of an archaeal (possibly eocyte) genome with that of either a cyanobacterium or possibly a γ-proteobacterium.
The method used cannot definitively determine whether or not the eubacterial genome could have come from the mitochondrial event, which, to an even greater extent than the more recent chloroplast, has resulted in a high net transfer of genes from the mitochondrial chromosome to the nucleus. This leaves open the possibility that in addition to the mitochondrial symbiosis, and the later chloroplast one, there may have been an additional genetic fusion. Lane and Archibald have cited further major endosymbiosis events involving complex three-genome interaction in protista, where both green and red algae have been incorporated by endosymbiosis into other protists. These demonstrate both that endosymbiosis has occurred many times and the genomic complexity of nuclear symbiont gene exchange.
Fig 15: Proposed fusion between two genomes - informational, from Archaea (red), and metabolic, from Eubacteria (blue) as well as mitochondrial genes migrating to the nucleus (green) (Horiike et al.). Up to 75% of nuclear genes whose ancestry has been elucidated may come from bacteria (Lane).
The idea of a genetic fusion between a member of the archaea and a γ-proteobacterium is also supported by several other lines of research. These include the evolution of glycolytic enzymes (Emelyanov); a ‘homology-hit analysis’ of non-mitochondrial genes, which determined the number of yeast ORFs in each functional category orthologous to the ORFs in six Archaea and nine Bacteria at several thresholds, suggesting an archaeal parasite engulfed by a eubacterium (Horiike et al.); and the proposal that close association between a central methanogenic archaebacterium (archaea) and a close-knit surrounding clump of ancestral sulfate-respiring δ-proteobacteria could have also led to the nucleus and endoplasmic reticulum (Moreira and Lopez-Garcia). See also fig 16b2.
Recent work shows that the mitochondrial and nuclear genomes are in a feedback relationship, rather than the former being merely a denuded genetic skeleton reduced to its bare bones functions. While the process of transfer of mitochondrial genes to the nucleus has resulted in a tally of mitochondrial genes of only 37, comprising respiration enzymes and essential genes involved in mitochondrial DNA and protein synthesis, the highly-compactified human 16,569 bp mitochondrial genome nevertheless contains up to 500 overlapping open reading frames (Yen et al. 2013), as well as abundant mitochondrial genome-encoded small RNAs (Ro et al. 2013), which appear to be products of currently unidentified mitochondrial ribonucleases and may have a regulatory role. Humanin, a ubiquitous protein involved in human stress protection (Yen et al. 2013), appears to be generated from the mitochondrial genome, although duplicate copies also appear to have been transferred to the nucleus. This is also consistent with the metabolically responsive roles of the mitochondrion, for example in initiating apoptosis (controlled cell death), in which humanin plays an inhibitory role, and enriching synaptic contacts in neurons (Sun et al. 2013). Mitochondrial genomes evolve 5-15 times faster than the nuclear genome. Until recently, it was generally accepted that all mitochondrial DNA molecules are identical at birth. However, recent work has shown that ~25% of healthy individuals inherit a mixture of wild-type and variant mtDNA, generally involving the non-coding hypervariable mtDNA D-loop responsible for initiating DNA replication and transcription (Payne et al. 2013). Variant mtDNA lineage expansions have been found in Tibetan sherpas living at different altitudes, suggesting evolutionary adaptations on the order of 10^2 years (Kang et al. 2013).
In turn, it has also been found that endurance exercise can turn over mitochondria, effectively removing those with reduced function due to accumulated mutations (Safdar et al. 2011).
Bacteria generally have a nucleoid region of distinctive cytoplasm around the DNA, but no nuclear membrane. An intriguing question is raised by the bacterium Gemmata obscuriglobus, a member of the phylum Planctomycetes (see fig 9 enlargement), which appears to have both a nuclear envelope and endoplasmic reticulum-like intracellular membranes, as well as the ability to take up proteins present in the external milieu in an energy-dependent process analogous to eukaryotic endocytosis. This is consistent with autogenous evolution of endocytosis and the endomembrane system in an ancestral non-eukaryote cell (Lonhienne et al.).
Fig 16: Gene replacement tree root (Makarova). Hartman and Fedorov's list of putative chronocyte genes corresponds to the 359 above.
Other theories stick to the three-domain paradigm and propose a primitive Eukaryote precursor, possibly still retaining an RNA-based genome, as Woese (1998) suggested might be the case for the progenote (the first root life form), the last universal common ancestor of all three domains. This precursor, possibly including genes for endoplasmic reticulum and microtubules, later engulfed both a member of the archaea and a eubacterium. Hartman and Fedorov cite a collection of such genes, including those for ribosomal proteins as well, naming the organism a chronocyte.
This is also consistent with the much greater complexity of use of RNA in Eukaryotes, including alternative splicing, the use of introns, interfering-RNAs in gene regulation, micro-RNAs and the use of the nucleus to contain a diversely functioning RNA informational metabolism not unlike that of a putative progenote.
Fig 16a1: In 2015 a new archaeal phylum, Lokiarchaeota, from the TACK super-phylum, already known to include homologs of actins and tubulins and cell division sorting proteins (Guy & Ettema) illustrated in the tree, was discovered at the Arctic Mid-Ocean Spreading Ridge near the Loki's Castle vent. It has a closer evolutionary relationship with eucaryotes than any other known archaeote, and also has the largest complement of ESPs, or eucaryote signature proteins, so far discovered (Spang et al.). The analysis of the genome of Lokiarchaeum revealed about 175 proteins that were related to eukaryotic proteins. These included actins, ubiquitin modifying proteins, diverse Ras-superfamily GTP-ases surpassed only by the eucaryote Naegleria gruberi (see below), and an ESCRT gene cluster, which in eucaryotes is an essential component of the multivesicular endosome pathway for lysosomal degradation of damaged or superfluous proteins and plays a role in several budding processes including cytokinesis, autophagy and viral budding. They also included several additional proteins homologous to components of the eukaryotic multivesicular endosome pathway, including curvature sensing protein families involved in various aspects of vesicle/membrane trafficking or remodelling processes in eukaryotes.
LECA: Finding the Roots of the Eucaryote and Metazoan Trees
Just as we have investigated the enigmatic nature of LUCA, the last common ancestor of all life on earth, the nature of LECA, the last eucaryote common ancestor remains obscure. We have seen that certain archaea such as the Lokiarchaeota possess genes such as structural proteins which are signature of eucaryotes and may have enabled such species to engulf mitochondria. We can now turn from the question of genetic fusion founding the eucaryotes to the question of what genetic and cellular regulation and signalling systems our common ancestor possessed and how early the key components enabling multi-cellular life evolved.
Fig 16a2: 1.6 billion-year-old fossils resembling red algae found in the Tirohan Dolomite of the Lower Vindhyan in central India (doi:10.1371/journal.pbio.2000735).
Cyanobacteria enter the fossil record in the form of stromatolite mats around 3.5 billion years ago. William Schopf (Scientific American Feb 1991) found remnants of 3.6 billion-year-old stromatolites lying near fossils of 3.5 billion-year-old cells that resemble modern cyanobacteria, forming strings of putative microscopic cells. In 2016, Nutman et al. discovered putative stromatolites dating to 3.7 billion years in the Isua formation in Greenland. Thus oxygen-generating photosynthesis, which provides an energetic basis for eucaryote respiratory metabolisms to survive, arose very early. Evidence for an increase in oxygen in the shallow oceans due to photosynthesis, based on analysis of the Manzimnyama Banded Iron Formation, South Africa, goes back 3.2 billion years (doi:10.1016/j.epsl.2015.08.007). Variations in selenium isotope concentrations in ancient rocks suggest that 2.32 billion to 2.1 billion years ago, shallow coastal waters held enough oxygen to support oxygen-hungry life-forms (doi:10.1073/pnas.1615867114). This timing corresponds to the so-called great oxygenation event, crisis or catastrophe, in which molecular O2 entered the atmosphere for the first time, after a long period where any oxygen released was taken up by oceanic iron and other terrestrial minerals entering the oxidized state, resulting in an approximate doubling of mineral diversity. But fossil and evolutionary evidence for eucaryote cells doesn't appear convincingly until around 1.6 billion years ago, from when fossils resembling crown red algae have been found, indicating eucaryotes were already well diversified into multicellular forms (doi:10.1371/journal.pbio.2000735). By comparison, eukaryotic microfossils containing mitochondria go back only 1.45 billion years in the fossil record (Martin & Mentel 2010 - fig 16b2).
Fig 16a3: Naegleria gruberi (Fritz-Laylin et al. 2010), a free-living single celled bikont amoebo-flagellate, belonging to the excavata, which include some of the most primitive eucaryotes such as Giardia and Trichomonads. Nevertheless it is capable of both oxidative respiration and anaerobic metabolism and can switch between amoeboid and flagellated modes of behavior, regenerating complete centrioles and flagella de novo (Fritz-Laylin & Cande 2010). The Naegleria genome sequence encodes actin and microtubule cytoskeletons, mitotic and meiotic machinery, suggesting cryptic sex, several transcription factors and a rich repertoire of signalling molecules, including G-protein coupled receptors, histidine kinases and second messengers including cAMP. One strain analyzed is a composite of two distinct haplotypes, indicating hybridization. Although sexual mating has not been observed in Naegleria, the heterozygosity found in its genome is typical of a sexual organism, with perhaps infrequent matings. Additionally, identification of the core RNAi machinery indicates that Naegleria may use this mechanism. Naegleria fowleri is a notorious warm-water species causing meningitis. (b) Two of four genes involved in meiosis, HOP1, believed to be a component of the synaptonemal complex essential for sexual recombination, and MRE11 (meiotic recombination 11), involved in recombination, telomere processing and double-stranded DNA break repair. Both show a wide evolutionary distribution across eucaryotes including Giardia, implying that LECA, the last eucaryote common ancestor, was a sexual organism (Ramesh, Malik & Logsdon 2005). MRE11, along with RAD51, also shows homology with Archaea, suggesting an even deeper origin of sexuality.
The fact that primitive eucaryotes such as Giardia and Trichomonads appear to lack key structures and processes such as mitochondria and sexual recombination initially led to the notion that the founding eucaryote was a simple amoeboid cell lacking the genetic complexity of more advanced protista and higher organisms. However it is now clear that a tree reconstruction artefact, known as Long Branch Attraction, is responsible for the apparent early emergence of the fast evolving Archezoa in the eukaryotic tree. The notion that all extant eukaryotes are ancestrally mitochondrial is strongly supported by the discovery of rudimentary mitochondrial organelles in all analysed Archezoa. Part of the problem comes from the fact that organisms such as Giardia and Trichomonads have specialized for parasitic microenvironments and in particular anaerobic conditions where they can suffer rapid loss of inessential genes, distorting the evolutionary picture. When we look at close free-living relatives of these organisms, we discover a complex genetic complement containing most of the functionalities of "advanced" eucaryotes.
Naegleria gruberi transitions from amoeboid to flagellate behavior and back (https://youtu.be/yy5zI3LBK24).
The free-living amoebo-flagellate Naegleria gruberi for example (Fritz-Laylin et al. 2010) belongs to a varied and ubiquitous protist clade (Heterolobosea) that diverged from other eukaryotic lineages over a billion years ago and is very close to the hypothetical root of the eucaryote tree (see fig 16b5). The Naegleria genome, analyzed in the context of other protists, reveals a remarkably complex ancestral eukaryote with a rich repertoire of cytoskeletal, sexual, signaling, and metabolic modules.
Fig 16b: Consensus tree for eucaryote root (Brinkmann & Philippe 2007). The assumption underlying Woese's paradigm - that simple organisms (e.g., prokaryotes and Archezoa) represent genuinely primitive intermediates in the progressive construction of complex eukaryotic cells - has on the basis of more recent genetic analysis been replaced by that of a complex last common eucaryote ancestor (LECA/LCAEE). In the consensus picture, any feature present in some opisthokonts (e.g., animals) and in some bikonts (e.g., plants) is necessarily ancestral, i.e., inherited from the LECA. This implies that the LECA was in Brinkmann and Philippe's words: "amazingly more complex than previously thought. Among others, the LCAEE already possessed mitochondria, an efficient cytoskeleton associated with several intracellular transport systems, an endo-membrane system interconnected by a complicated vesicular transport machinery, including an endoplasmic reticulum, a Golgi apparatus, a standard nucleus, efficient secretory and uptake pathways, recycling of food‐vacuoles, peroxisomes, spliceosomal introns, and flagella‐dependent motility. Therefore, simple extant eukaryotes have evolved from a complex LCAEE mainly by loss and secondary simplification".
Recently, a molecular dating study based on a large phylogenomic dataset with a relaxed molecular clock and multiple time intervals yielded a surprisingly recent time estimate of 1085 Mya for the origin of the extant eukaryotic diversity. Therefore, extant eukaryotes seem to be the product of a massive radiation that happened rather late, at least in terms of prokaryotic diversity (Brinkmann & Philippe 2007). Given the large number of novel eucaryote genes and folds (Lane & Martin 2010), this late diversification needs to be carefully reconciled with the results of David and Alm (2010) shown in fig 10 and with the fact that the cyanobacterial chloroplast was incorporated by plants between 1200 and 1800 Mya (Dorrell & Howe 2012).
Fig 16b2: Left: The inside-out hypothesis for the origin of the eucaryote cell has the original eocyte budding out from a cell comprising the nuclear envelope to form external blebs through the nuclear pore structures to facilitate symbiosis with γ-proteobacteria, later forming the endoplasmic reticulum and engulfing the mitochondria as the blebs evolved into a continuous external membrane. Centre: two archaeal Candidatus Giganthauma karukerense cells surrounded by ectosymbiotic γ-proteobacteria (Baum & Baum 2014). Right: Eucaryote fossil cell dated to around 1.45 billion years (Martin & Mentel 2010).
The eucaryote tree of life remains an enigma, particularly in terms of locating the root of the tree. It was originally portrayed in terms of five kingdoms, with plants, animals and fungi protruding from the single-celled protista, complemented by the eubacteria and archaeotes classed as monera. However the tree has now been revised (Adl 2012). It is now recognized that multi-cellularity has arisen independently in many branches of the eucaryotes (Brown et al. 2012).
Fig 16b3: (Left) Traditional five-kingdom eucaryote tree of life (Whittaker). (Right) Multiple evolution of multicellularity in diverse eucaryote super-groups extends to many groups besides fungal and animal Opisthokonts and plant and algal Archaeplastida (Brown et al. 2012).
However a key step in multicellularity has been linked to a single critical mutation. To form and maintain organized tissues, cells must coordinate how they divide relative to the position of their neighbours. One important aspect of this process is orientation of the mitotic spindle, a structure inside the dividing cell that distributes the chromosomes, and the genetic material they carry, between the daughter cells. When the spindle is not oriented properly, malformed tissues and cancer can result. In a diverse range of animals, the orientation of the spindle is controlled by an ancient scaffolding protein that links the spindle to “marker” proteins on the edge of the cell. Anderson et al. (2015) have now used a technique called ancestral protein reconstruction to investigate how this molecular complex evolved its ability to position the spindle. First, the amino acid sequences of the scaffolding protein’s ancient progenitors, which existed before the origin of the most primitive animals on Earth, were determined by computationally retracing the evolution of large numbers of present-day scaffolding protein sequences down the tree of life, into the deep past. Living cells were then made to produce the ancient proteins, allowing their properties to be experimentally examined. By dissecting successive ancestral versions of the scaffolding protein, they deduced how the molecular complex that it anchors came to control spindle orientation. This new ability evolved by a number of 'molecular exploitation' events, which repurposed parts of the protein for new roles. The progenitor of the scaffolding protein was actually an enzyme, but the evolution of its spindle-orienting ability can be recapitulated by introducing a single amino acid change that happened many hundreds of millions of years ago.
Fig 16b3b: Tree of Dlg protein evolution to animal multicellular mitotic division (Anderson et al. 2015)
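The idea of retracing present-day sequences down the tree to an ancestral state can be illustrated with a deliberately simplified sketch. It applies Fitch parsimony to a single alignment column on a small rooted binary tree. This is an assumption made for illustration only: ancestral protein reconstruction of the kind described above generally relies on statistical models applied to whole alignments, and the tree shape and residues below are hypothetical, not taken from Anderson et al.

```python
def fitch(tree):
    """Bottom-up Fitch pass for one alignment column.

    A tree is either a leaf (a one-letter residue string) or a pair
    (left, right) of subtrees. Returns the set of most-parsimonious
    candidate states at the root of the tree and the minimum number
    of substitutions needed to explain the leaves.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_states, left_changes = fitch(tree[0])
    right_states, right_changes = fitch(tree[1])
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # Children disagree: keep both options and count one substitution.
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical column: two leaves carry 'A', three carry 'G'.
toy_tree = ((("A", "A"), "G"), ("G", "G"))

root_states, changes = fitch(toy_tree)
print(root_states, changes)
```

For this toy tree the root is inferred to carry G, with a single substitution to A on the lineage leading to the two A-bearing leaves. That is the same kind of inference, in miniature, as pinpointing one historical amino acid change on a protein family tree.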
Closer studies of the genetics and phylogeny of a large number of eucaryote proteins have led to a revision of the tree in which single-celled species such as amoebae are grouped with animals and fungi in the opisthokonts, which are separated to the very root of the tree from multi-cellular plants forming the archaeplastids. Animals are closer cousins to single-celled choanoflagellates than to other multi-cellular organisms, such as fungi and plants. Giant kelp are closer relatives to single-celled diatoms than to multicelled red seaweeds or plants. Likewise separate groups consisting of Rhizaria, Alveolates and Stramenopiles have been found to form a single supergroup now called SAR.
Fig 16b4: Revised eucaryote tree (Milius 2015) taking into account supergroupings which include both single-celled and multicelled organisms, rooted as closely as possible given several lines of phylogenomic evidence (Burki 2014).
Attempts to find the root of the eucaryote tree naturally revolve around linking eucaryotes to their founding archaeota using RNAs and proteins linked to nucleic acid processing and other processes inherited from the founding archaeal genome. However, the archaeal sequences differ substantially from their eukaryotic counterparts, resulting in extremely long phylogenetic distances between archaea and eukaryotes. The use of a distant outgroup in phylogenetic reconstruction is highly problematic because the remaining phylogenetic signal is very weak, and so support for the correct positioning of the root is even weaker. The resulting nonphylogenetic signal is often stronger than the phylogenetic signal, thereby favoring long-branch attraction, so that fast evolving eukaryotes are constantly found at the base of all other eukaryotes. Rare cytological and genomic changes specific to some eukaryotic lineages have also been considered for rooting of the eukaryotic tree, but an innovative alternative strategy is to use the eubacterial genes which have been incorporated into eucaryote lines, particularly the alpha-proteobacterial genes originally incorporated with the mitochondria. Derelle et al. (2015) combined the "ALPHA-PROT" dataset, which uses 42 eukaryotic proteins with a mitochondrial function encoded by the nuclear or mitochondrial genomes as phylogenetic markers, with the "EUBAC" dataset, which uses 37 eukaryotic genes acquired by ancient lateral gene transfers from different eubacterial sources, and included a wider set of newly sequenced species, arriving at a consistent rooted tree. This places malawimonads such as Malawimonas jakobiformis and discoba such as Naegleria gruberi on opposite sides of the root (fig 16c), despite the fact that they are both classed as excavates (Cavalier-Smith 2010, 2014), which are otherwise regarded as monophyletic based on their flagellar ultrastructure.
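Why very long phylogenetic distances leave so little usable signal can be sketched with the Jukes-Cantor (JC69) substitution model. JC69 is a nucleotide model chosen here only for simplicity and is an assumption of this illustration, not the model used in the studies cited; the function names are likewise illustrative. Under JC69 the observed fraction of differing sites saturates towards 0.75 as the true distance grows, so distant outgroups become nearly indistinguishable from random sequence.

```python
import math


def expected_p_distance(d: float) -> float:
    """Expected fraction of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p: float) -> float:
    """Estimate substitutions per site from an observed fraction of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) under JC69")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Observed difference flattens out as true distance increases.
for d in (0.1, 0.5, 1.0, 2.0, 4.0):
    print(f"true distance {d:>4}: observed difference {expected_p_distance(d):.3f}")
```

Near saturation a tiny change in the observed difference corresponds to a very large change in the inferred distance, so sampling noise and model misfit can easily outweigh the genuine signal. This is the regime in which long-branch attraction artefacts become likely.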
Fig 16c: Left: Consistent rooted eucaryote tree using two phylogenetic measures based on eubacterial-origin eucaryote gene sets. Malawimonas jakobiformis and Naegleria gruberi are illustrated to show common features close to the root of the tree. Right: The early origin of sexuality is attested to by recent research into the extended family tree of amoebae, which shows that sexuality is likely to have arisen in the common ancestor and been subsequently lost in asexual protist species, rather than the reverse (Lahr et al 2011). Amoeboid organisms (left) highlighted on the eucaryote tree of life. SAR indicates the group composed of Stramenopila, Alveolata and Rhizaria. Two principal branches (right), unikont amoebozoa and bikont rhizaria, show confirmed (black), direct evidence such as meiosis (grey) and indirect evidence (white) for sexuality, showing it is a likely founding characteristic. Unequivocal evidence for sex in Giardia implies sexuality arose in the last common ancestor of all eucaryotes, very early in evolution (Lane 2009a).
Contrary to the idea that sexuality and sexual recombination are late adaptations of advanced eucaryotes, investigation of meiosis-related genes HOP1, MRE11, MLH/PMS and RAD51/DMC1 produces evolutionary trees showing the occurrence of these genes across the eucaryotes from Giardia to Homo, implying sexuality is a founding characteristic of all extant eucaryotes (Ramesh, Malik & Logsdon 2005). Amoeboid forms of both unikont amoebozoa and bikont rhizaria show trends consistent with founding sexuality, later lost in some branches (Lahr et al 2011).
A detailed evolutionary picture showing the relationship between metazoa and protozoans and the bridge between choanoflagellates and sponges has been developed in association with elucidating the genome of the single-celled choanoflagellate Monosiga brevicollis (King et al. 2008).
Fig 16d: (a) Tree of organisms expressing various forms of integrin-related proteins, from α-actinin to α-integrin, includes diverse branches of single-celled eucaryotes (Sebe-Pedros et al. 2010). α-actinin was expressed in all branches. Both α- and β-integrin are expressed in those marked with (*) including single-celled Capsaspora owczarzaki and Amastigomonas sp. (b) Tree of tyrosine kinases (Suga et al. 2012). Cytoplasmic CTKs have a deep eucaryotic origin while receptor RTKs evolve later. (c) Tree of GPCR components extends back to LECA (Mendoza et al. 2014) making this signalling pathway, central to the nervous systems of higher animals, a founding unit of eucaryote evolution. All except those marked with * had one or more GPCR components and all clades included species possessing canonical GPCR, across both unikonta and bikonta, implying a common origin with LECA.
Examination of three key gene systems associated with the emergence of metazoa, the integrin pathway (Sebe-Pedros et al. 2010), tyrosine kinases (Suga et al. 2012) and G-protein coupled receptors (Mendoza et al. 2014) shows the key components evolved long before in single-celled eucaryotes and even the last eucaryote common ancestor in the case of G-protein system components. In both tyrosine kinases and GPCRs this allowed for a burst diversification of receptor-based tyrosine kinases and diverse GP-coupled receptors as the metazoa emerged.
Fig 16e: Two controversial fossils suggesting an early origin for multicellular life, the left at 2.1 byrs (El Albani et al. 2010) and the right at 1.5 byrs (Zhu et al. 2016).
The view of the root of the metazoan tree has shifted significantly with the discovery of new genetic data, novel primitive species and new fossil evidence. Research into the absence of the miRNA pathway in ctenophores (Maxwell et al. 2012) and an apparently separate evolution of neural nets and apparent absence of HOX genes (Moroz et al. 2014) suggests that, rather than being sister organisms of the cnidaria, they may be more ancient life forms originating in the Ediacaran. Fossils of another organism, Eoandromeda octobrachiata, dating back to 580 million years (Tang et al. 2011) are consistent with this picture. The discovery of large colonial fossils dating back to 2.1 billion years (El Albani et al. 2010), in addition to the previous discovery of the putative tubular alga Grypania spiralis dating back to a similar age (Han & Runnegar 1992), sets a potentially very early origin for multicellular organisms, also lending weight to the biological nature of Mawsonites spriggi (Seilacher et al. 2005).
The discovery of a living genus Dendrogramma in Australia (Just et al. 2014) originally put it earlier than the emergence of the ctenophores, but later genetic evidence indicates this is a siphonophore, and hence in the cnidaria (Gough M 2016 Origin of mystery deep-sea mushroom revealed BBC 7 Jun).
Fig 16f: Left: Trees of eucaryotes based on translation elongation factor EF-2 and β-tubulin genes (King and Carroll). Right: Finding the root of the metazoan tree. (a) Ctenophora-first tree elucidating fixation of identified components from miRNA to neurones and muscles (Moroz et al. 2014). (b) Evidence for early emergence of the Ctenophora, exemplified by Mnemiopsis leidyi, based on their lack of miRNA processing suggests they diverged before both sponges such as Amphimedon queenslandica and the placozoan Trichoplax adhaerens (Maxwell et al. 2012). (c) Possible evolutionary position of Dendrogramma (Just et al. 2014).
Looking at the varying pace of evolutionary change in a very ancient gene family and one of the largest in the human genome, the G-protein linked receptor family has roots going back to the first eucaryotes, with two major types of serotonin receptor, 5HT1 and 5HT2, diverging before the molluscs, arthropods and vertebrates diverged and originating between 750 million and 1 billion years ago. Consequently serotonin functions in mood and circadian rhythms in a similar manner in insects and humans. The diversity of neurotransmitters in humans, particularly the amines serotonin, dopamine, norepinephrine and histamine and the amino acids glutamate, gamma-amino-butyric acid and glycine, originates from the need of single celled eucaryotes to communicate major pathways of strategic survival from nutrition through aversion to reproduction and sporulation. The family also includes the opsins of visual perception, development receptors and a diverse array of olfactory receptors which are evolving far more rapidly. Serotonin appears with the first photosynthetic bacteria that used tryptophan to hold the porphyrin reactive center and continues, along with melatonin, to play a crucial role in light and circadian cycles in humans, as well as mood and social responsiveness.
Fig 16g: (a) The two major serotonin receptor types 5HT1 and 5HT2 separated before the molluscs, arthropods and vertebrates diverged (Blenau & Thamm). (b) Evolutionary tree of the human G-protein linked receptors with examples highlighted in color. On the α branch are amine receptors - serotonin 5HT1A and 5HT2A, dopamine D1, and D2 (DRD1, DRD2), adrenergic α2a (ADRA2A), muscarinic acetylcholine (CHRM2), trace amine TAR1, as well as rhodopsin (RHO) and encephalopsin (OPN3). On the glutamate branch are metabotropic glutamate mGluR2 and GABA GABBR1. On the β branch is oxytocin (OXTR) surrounded by vasopressin receptors and Ghrelin. On the γ branch are opioid κ and μ (OPRK1, OPRM1). Olfactory and the non-rhodopsin receptors are linked to their respective points on the rhodopsin family tree. (Fredriksson R et al, Zozulya S. et al). (c) Insect tree of receptors for serotonin, dopamine, tyramine and octopamine neurotransmitters (Blenau & Baumann).
The evolution of the metazoan sodium channel essential for the neuronal action potential, from the calcium channel shared by fungi and animals, occurred in single celled eucaryotes before the metazoa evolved from choanoflagellate-like ancestors. Metazoan eyes also appear to have a common origin, as indicated by the capacity of both jellyfish and mouse pax genes to elicit ectopic compound eyes on fruit flies.
Fig 16h: (a) Common involvement of PAX genes in eye formation from jellyfish, insects and vertebrates suggests a single common origin despite the differing mechanisms. The jellyfish pax genes, like mouse pax-6, induce ectopic compound eyes in fruit fly (right) (Suga et al, Kozmik et al). The small camera eyes of a jellyfish are shown top right (yellow arrow). It has even been suggested that the jellyfish could have gained the eye development pathway through symbiosis with certain single-celled dinoflagellates which possess an eyespot ocelloid, complete with lens and retinoid organelle (lower right) and may have in turn inherited this functionality from cyanobacterial chloroplasts via red algae (Pennisi, Keim, Le Page). Detailed analysis (Gavelis et al. 2015) shows it to be a compound endosymbiotic structure involving both a mitochondrial 'cornea' and red-alga plastid derived retinal body comprising stacked wave-form membranes derived from chloroplast thylakoids surrounded by pigmented lipid droplets. Dinoflagellates are thus an example of multiple engulfing endosymbiont events in which both mitochondria and then single-celled red-algae complete with plastids have been incorporated. (b) Evolutionary diversification of Na+ channels from Ca++ channels, essential for the action potential, appears to have occurred before the existence of nervous systems in founding single-celled eucaryotes leading to the metazoa before the choanoflagellates such as Monosiga (Liebeskind et al).
Trees of Life based on Integrating Phylogenetic relationships and Environmental Genetic Diversity
There have been several recent efforts to construct comprehensive trees of life that include phylogenetic relationships that unite all lineages. To reconstruct a comprehensive tree of life, Hinchliff et al. (2015) synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny, presenting a draft tree containing 2.3 million tips - the Open Tree of Life.
Fig 16i: Phylogenies representing the tree. The depicted tree is limited to lineages containing at least 500 descendants. (A) Colors represent proportion of lineages represented in NCBI databases. (B) Colors represent the amount of diversity measured by number of descendant tips. (C) Dark lineages have at least one representative in an input source tree.
Fig 16j: Although the Open Tree of Life contains only one resolution at any given node, the underlying graph database contains conflict between trees and taxonomy, highlighting ongoing conflict near the base of Eukaryota (A) and Metazoa (B).
Following a complementary course, Hug et al. (2016) combined published sequences with new genomic data from over 1,000 uncultivated and little known organisms to infer a dramatically expanded version of the tree of life. The new data were obtained by metagenomics - a shotgun sequencing-based method in which DNA isolated directly from the environment is sequenced, and the reconstructed genome fragments are assigned to draft genomes. The results reveal the dominance of bacterial diversification and underline the importance of organisms lacking isolated representatives, with substantial evolution concentrated in a major radiation of such organisms. This tree highlights major lineages currently underrepresented in biogeochemical models and identifies radiations that are probably important for future evolutionary analyses.
Fig 16k: Tree of life including uncharacterized species.
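The step in which reconstructed genome fragments are assigned to draft genomes can be pictured with a toy example. The sketch below is not the method of Hug et al.; it is a deliberately simplified stand-in that groups fragments by GC content alone, whereas real binning pipelines combine richer sequence composition and coverage signals. The contig names, sequences and the tolerance value are all hypothetical, invented for illustration.

```python
def gc_content(seq):
    """Fraction of G or C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def bin_contigs(contigs, tolerance=0.10):
    """Greedy binning: place each contig in the first bin whose mean GC
    content lies within `tolerance`, otherwise open a new bin."""
    bins = []  # each bin is a list of (name, gc) pairs
    for name, seq in contigs.items():
        gc = gc_content(seq)
        for members in bins:
            mean_gc = sum(g for _, g in members) / len(members)
            if abs(gc - mean_gc) <= tolerance:
                members.append((name, gc))
                break
        else:
            # No existing bin was close enough, so start a new draft genome.
            bins.append([(name, gc)])
    return [[name for name, _ in members] for members in bins]


# Hypothetical contigs, invented purely for illustration.
contigs = {
    "contig_1": "ATATATTTAAATATGCAT",  # AT-rich
    "contig_2": "GCGCGGCCGCGGGCCCAT",  # GC-rich
    "contig_3": "ATTTATAAATTAGCATAT",  # AT-rich
    "contig_4": "GGCCGCGCGCCGGGTACC",  # GC-rich
}
draft_genomes = bin_contigs(contigs)
print(draft_genomes)
```

With these invented sequences the AT-rich and GC-rich fragments fall into two separate bins, each standing in for one draft genome.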
A third comprehensive tree of life (Gringer) is based on completely sequenced genomes, with genome sizes arrayed around the boundary.
Fig 16l: Tree of life based on complete genome sequences.
Viral Influences on the Nuclear Genome
Fig 17: Proposed viral contribution of DNA polymerases FvA etc. founder viruses (Forterre)
Other cellular and viral genealogies are possible and the scheme is merely representative.
Forterre looks likewise to a three-component origin, but his emphasis is on the idea that viruses have contributed major components to the genome of all three groups, possibly providing each of three RNA-based cell lineages with independent transitions to DNA-based genomes by contributing DNA-polymerases, thus radically improving the stability and competitiveness of these cell lines, which became the eventual survivors. In addition to the ribosomal proteins and rRNAs having distinct qualitative features in each domain, many DNA informational proteins exist in different nonhomologous families (usually with several versions for one family). There are already six known nonhomologous families of cellular DNA polymerases. In the case of DNA polymerases of the B family, there is one version in Bacteria (only found in some proteobacteria), one in Archaea, and several in Eukarya. The distribution of the different versions and families of cellular DNA informational proteins among domains is erratic most of the time and does not fit with any of the models proposed for the universal tree, suggesting abrupt insertion into the cellular genomes by viral transfer.
Fig 17b: The differing DNA and RNA viral taxonomy of the Rep and capsid genes of BSL RHDV (Diemer & Stedman 2012).
A very unusual virus has given an indication of how the transfer from RNA to DNA could have been mediated by viruses. Although viruses are very promiscuous, they generally only recombine with viruses of a similar type or at least the same mode of replication. Thus until recently no instances were known of viral recombination bridging the three major groups: RNA viruses, DNA viruses and retroviruses encoding DNA from RNA instructions by reverse transcriptase. However a chimeric circular, putatively single-stranded DNA virus, BSL RHDV, encoding a major capsid protein similar to those found only in single-stranded RNA viruses, was discovered in a hot acidic lake (Diemer & Stedman 2012). They also found that something very similar had turned up in samples of ocean water sequenced by a team led by Craig Venter. This gives the beginning of an explanation of how DNA-based RNA-viral genes in endemic viruses, presumably via reverse transcriptase, have made multiple RNA to DNA transitions of other viral and cellular genes, probably when RNA, DNA and retroviruses cohabited cells.
Viruses such as phages appear to have no evolutionary tree, with genomes across widely diverse habitats consisting of cut and paste components, implying viral adaption has resulted almost entirely from utilization of advantageous genes from horizontal transfer. Around 10% of all bacterial genes sequenced to date consist of ORFans that bear no resemblance to genes seen anywhere else, suggesting horizontal viral origin (Hamilton).
Fig 18: Left: Evolutionary tree of DNA polymerase amino termini (Villareal and Defilippis)
Right: Bacterial DNA polymerases also show viral members (underlined) close to the root of the tree (File et. al.).
Villareal and Defilippis have likewise investigated the idea that DNA viruses are the origin of DNA replication proteins, by investigating the amino terminus and constructing an evolutionary tree which shows DNA polymerases of DNA viruses, eukaryotes (α, δ), archaea, E. coli and two phages rooted in a tree consistent with a viral origin.
This idea has a great deal of plausibility because viruses are now known to have a potentially primal origin, rather than being recent escapees from cellular genomes which have undergone reductive parasitic changes to their genome. Viruses clearly also have retained all of RNA-RNA, DNA-DNA and retrotranscription DNA-RNA-DNA replication, using both RNA and DNA stages in their capsid viral forms, so they retain all the transitional states between RNA and DNA-based replication.
Furthermore the retroviruses and related mobile genetic elements have a common ancient evolutionary origin, which is related to telomerase, which itself uses an RNA template to extend the chromosome ends during duplication. There is thus a plausible case that telomerase is in fact a biological fossil of a retroviral conversion of the founding Eukaryote cell line to a DNA genome.
The Symbiotic Face of Eukaryote Mobile Elements
Fig 19: (Left) Human transposable element evolutionary history of L1-LINEs (cream), Alu elements (lt. blue), retrovirus-like LTR (long-terminal repeat) elements (green) and DNA transposons (dark brown). Older L2-LINEs and SINEs are in yellow and dark blue. This history extends back over 200 million years indicating the very ancient basis of this potentially symbiotic relationship (Human Genome Consortium) (click to enlarge). Comparison with the mouse genome can be accessed at Waterston et. al. (2002). (Right) L1 replication: L1 is transcribed in open reading frames ORF1, an RNA-binding protein, and ORF2 an endonuclease/reverse-transcriptase. The bound RNA-protein complex RNP is transported to the nucleus where target-primed reverse transcription to chromosomal DNA takes place (Han and Boeke).
Following the work of Darryl Reanny (1974-6), I proposed (1978, 1992) that viruses and transposable elements, far from just being selfish genes (Dawkins 1976), formed part of a dynamical system of genetic symbiosis between the hosts and the mobile genetic elements, because the mobile elements permitted forms of coordinated gene expression and the formation of new genes in a modular manner, which would otherwise be impossible, achieving in return perpetuation of their own genomes over evolutionary time scales. Most of the details of this proposal have since been borne out. The ENCODE project has demonstrated involvement of all the major classes of human transposable element in regulatory enhancer activity, most specific to a single cell type (Thurman 2012).
Fig 20: ENCODE data showing involvement of the major classes of transposable element in enhancer activity (Thurman 2012).
By some reckonings, 40 to 50 per cent of the human genome consists of DNA imported horizontally by viruses, some of which has taken on vital biological functions. Taken together, virus-like genes represent a staggering 90 per cent of the human genome (Hamilton). Coding sequences comprise less than 5% of the human genome, whereas repeat sequences account for at least 50% and probably much more. Transposable LINEs, or long interspersed nuclear elements, are retroelements common to mammals (Han and Boeke) and insects (Jensen et al, Sheen et al) with a history running back to the Eukaryote origin. They are specifically activated in both sperm and eggs during meiosis (Branciforte and Martin, Tchénio et. al., Trelogan and Martin), although subjected to down-regulation by interfering piRNAs (Aravin et al). They replicate from transcribed RNA copies of themselves, thus using RNA to instruct DNA copies, indicating an origin in RNA-based life, as does the active RNA processing of our own Eukaryote cells. Their RNA-based reverse transcriptase shows homologies with the telomerase essential for maintaining immortality in our germ line, indicating a common and symbiotic origin. 100,000-950,000 partially defective LINEs, around 100 of which remain fully active in humans, and their 300,000 dependent smaller fellow traveller Alu SINEs make up a significant portion of the human and mammalian genomes, along with pseudogenes, apparently defective copies of existing genes translocated by elements such as LINEs.
These elements travel passively down the germ line with chromosomal DNA, so their specific activation during meiosis suggests they may perform a role of coordinated regulatory mutation. This suggests that the type of symbiotic sexuality embraced by bacteria and plasmids also continues to function in higher organisms in a form of sexual symbiosis between our chromosomes and transposable genetic elements. This is consistent with the 1.4% point mutation divergence between humans and chimps being overshadowed by an additional 3.9% divergence, to 5.4% overall (Britten), when insertions and deletions are accounted for.
SINEs such as human Alu, a free-rider on the LINE reverse transcriptase which is derived from the small cellular RNA used to insert nascent proteins through the membrane, are in turn implicated in active functional genes (Reynolds, Schmid), particularly some involved in cellular stress reactions, again suggesting genetic symbiosis. Humans have about 13 times as many RNA edits as non-primate species, including inosine insertions associated with Alu elements, as well as intron deletions (Holmes) and newly inserted exons (Ast), which may differentiate humans from other apes through alternative splicing of genes expressed in the brain. RNA editing is abundant in brain tissue, where editing defects have been linked to depression, epilepsy and motor neuron disease. There is a new Alu insert about every 100 births. As many as three quarters of all human genes are subject to alternative splice editing.
Somatic variation in LINE L1 insertions has been found in the human brain (Erwin et al. 2016). The healthy human brain is a mosaic of varied genomes. L1 retrotransposition is known to create mosaicism by inserting L1 sequences into new locations of somatic cell genomes. Somatic L1-associated variants in the brain (SLAVs) are composed of: (a) L1 retrotransposition insertions and (b) retrotransposition-independent L1-associated variants. A subset of SLAVs comprise somatic deletions generated by L1 endonuclease cutting activity. Retrotransposition-independent rearrangements in inherited L1s resulted in the deletion of proximal genomic regions. These rearrangements were resolved by microhomology-mediated repair, suggesting L1-associated genomic regions are hotspots for somatic copy number variants in the brain and therefore a heritable genetic contributor to somatic mosaicism. SLAVs are present in crucial neural genes, such as DLG2, and affect 44–63% of the cells in the healthy brain. Some transposon hopping may be a reaction to stress that may allow brain cells to develop capabilities not initially encoded in the genome which could influence behavior, thinking and personality. Even identical twins may have genetically different brain cells because of transposon hopping after the embryo splits.
The recent explosion of research into interfering miRNAs as regulatory elements in gametogenesis and development (Großhans) has provided an explanation of how pseudogenes, including those retrotransposed via LINE elements, can gain functional regulatory significance even though they do not produce translatable mRNAs.
Fig 21: Pseudogene-mediated production of endogenous small interfering RNAs (endo-siRNAs). Pseudogenes can arise through the copying of a parent gene (by duplication or by retrotransposition). (a) An antisense transcript of the pseudogene and an mRNA transcript of its parent gene can then form a double-stranded RNA. (b) Pseudogenic endo-siRNAs can also arise through copying of the parent gene as in a and then nearby duplication and inversion of this copy. The subsequent transcription of both copies results in a long RNA, which folds into a hairpin, as one half of it is complementary to its other half. In both a and b, the double-stranded RNA is cut by Dicer into 21-nucleotide endo-siRNAs, which are guided by the RISC complex to interact with, and degrade, the parent gene's remaining mRNA transcripts. The mRNA from genes is in red and that from pseudogenes is in blue. Green arrows indicate DNA rearrangements (Sasidharan and Gerstein).
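The pathway in panel (a) of fig 21 can be sketched in a few lines: an antisense pseudogene transcript pairs with the parent mRNA, the duplex is cut into 21-nucleotide pieces, and each piece guides the search for its complementary site on the remaining parent transcripts. This is a minimal illustration under strong assumptions: the sequence is hypothetical, the antisense transcript is assumed to be perfectly complementary to the first 42 nucleotides of the parent mRNA, and details of real Dicer processing (such as overhangs and strand selection) are ignored.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def dice(rna, length=21):
    """Cut an RNA strand into consecutive fragments of `length` nucleotides,
    discarding any shorter remainder."""
    return [rna[i:i + length] for i in range(0, len(rna) - length + 1, length)]


def target_sites(mrna, guides):
    """For each guide fragment, locate the complementary site on the mRNA.
    Returns (position, guide) pairs for the guides that find a match."""
    sites = []
    for guide in guides:
        pos = mrna.find(reverse_complement(guide))
        if pos != -1:
            sites.append((pos, guide))
    return sites


# Hypothetical parent mRNA, invented for illustration.
parent_mrna = "AUGGCUAGCUUAGGCAUCGAUCGGAUUCAGCUAGGCUAACGUAGCUAGGAUCC"

# Assumed antisense pseudogene transcript covering the first 42 nucleotides.
antisense = reverse_complement(parent_mrna[:42])

guides = dice(antisense)  # two 21-nt endo-siRNA-like fragments
sites = target_sites(parent_mrna, guides)
for pos, guide in sites:
    print(pos, guide)
```

Each guide fragment maps back to a distinct 21-nucleotide stretch of the parent mRNA, which is the sense in which a non-coding pseudogene copy can still target its parent gene's transcripts.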
Although the data from the human genome project indicated that human LINEs are becoming less active as a group by comparison with the corresponding elements in the more rapidly evolving mouse genome, there remain about 60 active human LINE elements which are known to be responsible for mutations in humans. More recent investigation (Boissinot et. al.) shows that the most recent families are highly active. Around four million years ago, shortly after the chimp-human split, a new family Ta-L1 LINE-1 emerged and is still active, with about half the Ta insertions being polymorphic, varying across human populations. Moreover 90% of Ta-1d, the most recent subfamily, is polymorphic, showing highly active LINEs remain present. LINEs are more heavily distributed on the sex chromosomes, with X chromosomes containing 3 times as many full length potentially active elements and the Y chromosome 9 times as many! This is consistent with a continuing mutational load on humans, which crossing over removes more slowly from the sex chromosomes, in proportion to the degree to which crossing over is inhibited in each (i.e. totally on the Y, and on the X largely in males but not in females). Sexual recombination protects against the irreversible accumulation of mutational error known as Muller's ratchet.
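Muller's ratchet can be illustrated with a toy simulation of a population that never recombines, as a loose analogy for a non-recombining chromosome such as the Y. The sketch below is a standard textbook-style model, not taken from the cited papers, and every parameter value (population size, mutation probability, selection coefficient, random seed) is an arbitrary assumption. Each offspring copies one parent's mutation load and may gain one further deleterious mutation; with no recombination and no back mutation, the load of the fittest class can only stay the same or rise.

```python
import random


def mullers_ratchet(pop_size=100, generations=200, mutation_prob=0.3,
                    selection=0.02, seed=1):
    """Simulate an asexual population and return, for each generation,
    the mutation load of the least-loaded individual."""
    rng = random.Random(seed)
    loads = [0] * pop_size  # deleterious mutations carried by each individual
    least_loaded = []
    for _ in range(generations):
        # Fitness falls multiplicatively with each mutation carried.
        weights = [(1 - selection) ** k for k in loads]
        # Each offspring picks a single parent: no recombination.
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Offspring inherit the parental load and may gain one new mutation.
        loads = [k + (1 if rng.random() < mutation_prob else 0)
                 for k in parents]
        least_loaded.append(min(loads))
    return least_loaded


history = mullers_ratchet()
print("least-loaded class, first and last generation:",
      history[0], history[-1])
```

Whenever the least-loaded class is lost by chance, it cannot be rebuilt, so the minimum load never decreases. Recombination would break this constraint by allowing mutation-free segments from different parents to be reassembled.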
Fig 22: Evolution of reverse transcriptases from a common ancestor bearing a LINE archetype (Xiong and Eickbush, Nakamura et. al.). The root of their evolution goes back to the transfer from RNA to DNA at the beginning of life. They form a complementary evolutionary tree to that of cellular life as genetic symbionts of metazoa travelling down the germ line. Their group includes telomerases essential to the reproductive cycle.
LINEs are preferentially expressed in both steroidogenic and germ-line tissues in mice (Branciforte and Martin, Trelogan and Martin), suggesting stress could interact with meiosis. L1 expression occurs in embryogenesis, at several stages of spermatogenesis including leptotene, and in the primary oocytes of females poised at prophase I. Conversely the SRY-group male determining gene SOX has been found to regulate LINE retrotransposition (Tchénio et. al.). Similarly LINE elements have been proposed to be 'boosters' in the inactivation of one X chromosome that happens in female embryogenesis (Lyon). This could enable somatic stress to have a potential effect on translocation in the germ-line, which might enable a form of genetic adaptation in long-lived species such as humans. LINEs have diverse means to cause both mutational damage and novel alleles (Han and Boeke).
Both L1 and Alu elements may be able to self-regulate rates of replication, through the existence of stealth drivers, viable elements which maintain a low transcription rate of active elements, with little genomic impact and hence little negative selection. These occasionally seed daughter master elements, which may replicate actively to form new families when conditions permit. This picture is consistent with long periods of quiescence, punctuated by bursts of 'saltatory' replication leading to large copy numbers (Han et. al.).
Further evidence of a symbiotic relationship comes from Drosophila telomeres, which are maintained by the non-LTR retrotransposons, the LINE-like TART (Jensen et al, Sheen et al) and HeT-A (Biessmann et al). Likewise the recombination activating gene protein RAG1/RAG2, essential for the mutational variability of the vertebrate immune system, appears to have evolved from an ancient DNA transposon common to the metazoa (Agrawal et al). Significant similarities exist in the catalytic proteins of Hermes hAT transposase in insects, the V(D)J recombinase RAG, and retroviral integrase superfamily transposases, thereby linking the movement of transposable elements and V(D)J recombination (Zhou et al).
Fig 22b: Transib evolutionary tree spans the eucaryotes (Kapitonov and Jurka).
Researchers had long suspected that two DNA-cutting enzymes called RAG1 and RAG2 are encoded by relics of a DNA transposon, but no one had ever found a transposon that uses those proteins. Working with the primitive chordates called lancelets, Xu and colleagues found a DNA transposon called ProtoRAG, implying that the DNA transposon that gave rise to the two enzymes jumped into an ancestor of lancelets and jawed vertebrates about 550 million years ago (Saey 2017). The approximately 600-amino acid 'core' region of RAG1 required for its catalytic activity is also significantly similar to the transposase encoded by DNA transposons of the Transib superfamily, discovered recently based on computational analysis of the fruit fly and African malaria mosquito genomes. Transib transposons are also present in the genomes of sea urchin, yellow fever mosquito, silkworm, dog hookworm, hydra, and soybean rust (Kapitonov and Jurka).
Transposable elements have major evolutionary influences on higher eucaryote evolution, in ways which make them central to phenotypic complexity and identity (Saey 2011, 2017). As entities that make their living by getting copied into RNA over and over again, retrotransposons are littered with transcription factor binding sites. Gene modules such as those arising from the bounding long terminal repeats or LTRs of transposable elements can disseminate whole libraries of binding sites that over time become complex gene-regulating switches.
Some of these recycled transposon factors may have helped humans fight viruses. About 45 million to 60 million years ago, a retrovirus called MER41 invaded the genome of a primate ancestor of humans. MER41 includes binding sites for transcription factors involved in fighting infections which are alerted to infection by interferon gamma. The retrovirus may have used the interferon gamma signal to boost its own production. But over time, the mammalian hosts turned that weapon against the virus.
Rewiring gene activity in humans happened, in part, when transposons inserted themselves into the genomes of human ancestors after the split from chimpanzees. Remains of the transposons that infected humans have been recycled into more than a thousand regulatory switches found only in humans.
MicroRNAs (miRNAs) are crucial regulators of gene expression at the post-transcriptional level in eukaryotes, acting by targeting gene 3'-untranslated regions. Researchers identified 409 human miRNAs derived from TEs, 386 of which overlapped with TEs, indicating that TEs play important roles in the origin of human miRNAs, with humans also having more TE-derived miRNAs than other mammals and vertebrates.
Into the transgenetic milieu also come lincRNAs - non-coding RNAs over 200 nucleotides in length that are generally, but not necessarily, intergenic. These have remained enigmatic but are increasingly associated with key protein traffic management roles in the eucaryote cell: serving as guides showing proteins where to go, tethering proteins to different types of RNA or to DNA, and acting as decoys that distract regulatory molecules from their usual assignments. In doing so they mould cellular development and the phenotypic features of every species, which are defined by regulatory variations in a largely similar complement of protein-coding genes. Although genes that code for proteins make up only 1.5% of the mouse genome, more than 63% of the genome's DNA is copied into RNA. In humans the number is even higher, with up to 93% of the genome made into RNA, even though protein-coding genes make up less than 2% of the genome. According to experimental estimates there are 6,736 - 10,000 long noncoding RNAs encoded in the human genome, a figure comparable to the 20,000 protein coding genes. More than 30 percent of long noncoding RNAs are repurposed transposable elements. Humans have several lincRNAs that are found in no other species. Many of those RNAs are made in the brain, leading scientists to speculate that the molecules may be at least partially responsible for that important organ's evolution.
Examples are XIST and its nemesis TSIX, which produce competing RNAs that inactivate one of the two X-chromosomes in females by coating one X in XIST. Another is HOTAIR, which is functionally involved in 854 distinct locations and inactivates HOXD developmental genes also involved in cancer, and its complementing activator HOTTIP. A computerized search of the human genome has identified 7,000 genes whose mRNAs could act as microRNA decoys in 248,000 interactions. A pseudogene (an inactive mutated version of a gene originating from a transposable element) has been confirmed to act as a decoy attracting microRNAs that bind to and inactivate mRNAs for important genes such as PTEN involved in cancer suppression. The lincRNA linc-MD1 is important in muscle development. It sponges two microRNAs away from the messenger RNA of two genes, allowing more muscle-building proteins to be made from those genes.
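A rough sense of how a computerized search for microRNA decoys might begin is given by the sketch below, which counts canonical seed-match sites (complementary to nucleotides 2-8 of the microRNA) in different transcripts. It is an assumption-laden illustration, not the search method of the studies cited: the microRNA and all transcript sequences are invented, and real target prediction also weighs site context, conservation and expression levels. A transcript sharing seed sites with a gene's mRNA is a candidate decoy, since it can compete for the same microRNA.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna):
    """Return the mRNA motif complementary to the miRNA seed
    (nucleotides 2-8, counted from the 5' end)."""
    seed = mirna[1:8]
    return seed.translate(COMPLEMENT)[::-1]


def count_sites(transcript, mirna):
    """Count non-overlapping seed-match sites for `mirna` in `transcript`."""
    return transcript.count(seed_site(mirna))


# Hypothetical sequences, invented purely for illustration.
mirna = "ACGUGCAUUGGACUUAGCCGAU"
transcripts = {
    "gene_3utr": "GGCUAUGCACGUUAACCAUGCACGGA",
    "pseudogene_rna": "CCAUGCACGAAUU",
    "unrelated_rna": "GGCCUUAAGGCC",
}

site_counts = {name: count_sites(seq, mirna)
               for name, seq in transcripts.items()}
# Transcripts other than the gene itself that carry at least one shared site.
decoy_candidates = [name for name, n in site_counts.items()
                    if n > 0 and name != "gene_3utr"]
print(site_counts, decoy_candidates)
```

In this invented example the pseudogene transcript carries the same seed-match motif as the gene's 3'-untranslated region, while the unrelated transcript does not, so only the pseudogene is flagged as a potential decoy.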
Fig 22b2: Top row: Human blastocyst expression of HERV-K. DAPI is a fluorescent dye binding to AT-rich regions, OCT4 is a key embryogenesis stem factor (Grow et al. 2015).
Bottom left: Graduated involvement of various lincRNAs in embryogenic differentiation. Bottom right: XIST binding to one of the two X's.
Retroviral DNA - remnants of ancient retrovirus infections of germline cells - comprises 8% of the modern human genome. Retroviruses are one of the oldest viral groups, with evidence for an origin >450 mya with the first marine vertebrates or earlier (see fig 22b2). Endogenous retroviruses, or ERVs, which also travel down the germ line as free-riders, although some may retain infectious capacity, may be essential for placental function, as every mammal tested has placental blooms of endogenous retroviruses. These appear to aid both the formation of the syncytium, the super-cellular fused membrane that enables diffusion from the mother to the baby, and the immunity suppression which prevents rejection of the embryo, both characteristics of retroviruses such as HIV.
Pluripotent stem cells are capable of generating all embryonic cell lineages but, until recently, scientists could seldom manipulate induced pluripotent stem cells (iPSCs) and embryonic stem cells (ESCs) to generate extra-embryonic cell types, such as placental cells. Prior work had shown that a small number of cells independently develop the potential to produce extra-embryonic cell types, and that the process was linked to endogenous retroviruses. Choi et al. (2017) have now shown that removing microRNA miR-34a from a stem cell can kick off a molecular pathway that induces endogenous retroviruses and, at the same time, enables iPSCs and ESCs to consistently form extra-embryonic cells in a dish. The results suggest that a particular class of noncoding RNA works in concert with the latent viral elements of the genome to limit stem cell potential, and that removing a key miRNA can lift this limitation.
Brattas et al. (2017) have determined that almost 10,000, primarily primate-specific, ERVs may serve as "docking platforms" for a protein called TRIM28. Two years earlier, Johan Jakobsson's team had shown that ERVs have a regulatory role specifically in mouse neurons; their 2017 study was carried out using human cells. TRIM28 has the ability to "switch off" not only viruses but also the standard genes adjacent to them in the DNA helix, allowing the presence of ERVs to affect gene expression. These results uncover a gene regulatory network based on ERVs that participates in control of gene expression of protein-coding transcripts important for brain development. This switching-off mechanism may also behave differently in different people, since retroviruses are a type of potentially transposable genetic material that may end up in different places in the genome. This makes it a possible tool for evolution, and even a possible underlying cause of neurological diseases. There are further studies that indicate a deviating regulation of ERVs in several neurological diseases such as ALS, schizophrenia and bipolar disorder.
Fig 22b3: Left evolution of functional TRIM28-binding ERVs with full length ERVs in red.
Right: Developmental expression of ERVs.
When the team analyzed ERV expression in detail, they found distinct differences between pluripotent human embryonic stem cells, which correspond to a developmental stage before germline commitment, and samples obtained from human embryonic brain. Although the majority of reads in neural cells originate from non-internal ERV fragments, hESCs transcribe a large number of internal ERV fragments, indicating that more complete ERV loci are primarily expressed in hESC, whereas long terminal repeat (LTR) fragments dominate in embryonic brain samples. Because the majority of ERVs expressed during human brain development were incomplete fragments, these loci may primarily be passively expressed due to their position in a transcriptionally active genomic region.
Only one retrovirus, the most recent endogenous retrovirus to infect the human line, the HML2 subgroup of Human Endogenous Retrovirus K (HERV-K), has continued to reinfect the human line after the divergence from the lineage leading to chimpanzees and bonobos approximately 6 million years ago. HERV-K has remained active since then, reinfecting germ lineage cells of Neanderthals and Denisovans multiple times, around the time of, or subsequent to, the divergence of the archaic hominin lineages from that leading to modern humans (Agoni et al. 2012, Lee et al. 2014). One of the proviruses was shared by Neanderthals and Denisovans, consistent with these archaic humans sharing a common ancestor more recently than they shared one with the lineage leading to modern humans.
Fig 22b4: Left: The Big Bang of picorna-like virus evolution antedates the radiation of eukaryotic supergroups (Koonin 2008). Picornaviruses are nonenveloped viruses that represent a large family of small, cytoplasmic, plus-strand RNA (~7.5 kb) viruses, also responsible for acute respiratory illness in humans. Compare fig 16b. Right: Retroviral phylogeny illustrating how foamy viruses (FVs) and amphibian and fish foamy-like endogenous retroviruses (FLERVs) relate to other retroviruses. These phylogenetic analyses suggest that this major retroviral lineage, and therefore retroviruses as a whole, have an ancient marine origin and originated together with, if not before, their jawed vertebrate hosts >450 million years ago in the Ordovician period, early Palaeozoic Era (Aiewsakun & Katzourakis 2017). An even older origin date for retroviruses can be inferred from the autonomous DNA transposon Polinton family parasitizing protists, fungi and animals, which acquired a retroviral integrase at least 1 billion years ago (doi:10.1073/pnas.0600833103).
HERV-K still has members with open reading frames and has been found to be expressed at the 8-cell stage of embryogenesis, where it appears to protect the embryo against infection from other viruses (Grow et al. 2015). HERV-K is transcribed during normal human embryogenesis, beginning with embryonic genome activation at the 8-cell stage, continuing through the emergence of epiblast cells in preimplantation blastocysts, and ceasing during human embryonic stem cell derivation from blastocyst outgrowths. Unlike most other human ERVs, HERV-K retained multiple copies of intact open reading frames encoding retroviral proteins. It is transcriptionally silenced by the host, except in certain pathological contexts such as germ-cell tumours, melanoma or human immunodeficiency virus (HIV) infection. DNA hypomethylation at long terminal repeat elements representing the most recent genomic integrations, together with transactivation by OCT4 (which inhibits differentiation of stem cells in the pre-implantation embryo), synergistically facilitate HERV-K expression in the early embryo from the 8-cell stage to the blastula. HERV-K viral-like particles and Gag proteins can be found in human blastocysts, indicating that early human development proceeds in the presence of retroviral products. Expression of the HERV-K accessory protein Rec may also inhibit viral infection. Moreover, Rec directly binds a subset of cellular RNAs and modulates their ribosome occupancy, indicating that complex interactions between retroviral proteins and host factors can fine-tune pathways of early human development.
Fig 22c: The horizontally spread retrovirus HIV appears to have been transmitted to humans on three separate occasions leading to distinct evolving genotypes (Left: Sharp & Hahn 2010 - HIV genotypes red. Right: HIV groups M, N & O. evolution.berkeley.edu/evolibrary/article/medicine_04).
An X-linked complete version of HERV-K Xq21.33 probably capable of replication has been discovered, along with 36 undocumented provirus forms, in a search of the 1000 genomes project and a subset of the Human Genome Diversity Project panel (Wildschutte et al. 2016).
Mi and colleagues (2000) found a placental gene whose sequence was homologous to several retroviral envelope proteins. The sequence, now called syncytin, is identical to the envelope protein of the HERV-W retrovirus (Blond et. al.) which exists in around 40 apparently defective viral copies, including those in which the two syncytin viral env genes are fully functional (Mi et. al.). Syncytin is expressed at high levels in the syncytiotrophoblast (and at low levels in the testes) and nowhere else. Most of the other genes of the provirus have been mutated, suggesting that the envelope glycoprotein function was specifically selected. If cultured cells are made to express syncytin, they will fuse together, and this fusion can be blocked with antibodies against syncytin. HERV-W is only found in primates, but mice have similar retroviral blooms and ERV-related Syncytin genes have also been found in them (Dupressoir et. al.). The ability of mammals and thus ourselves to form a viable placenta and give birth to live young may thus depend on the mammals having harnessed a viral gene somewhere in our evolutionary lineage.
Fig 22d: Evolution of the ebola single stranded RNA virus in the 2014 ebola epidemic. (A) Phylogenetic and temporal placement of 188 Liberian EBOV genomes relative to 734 sequences from Guinea, Mali, and Sierra Leone. Three distinct lineages are represented in the Liberian samples: GN1, SL1, and SL2. (B) Median-joining haplotype network including 175 Liberian sequences with ≥97% genome coverage and 466 sequences representative of lineages circulating elsewhere in Western Africa (Ladner et al. 2015).
The fact that placental mammals depend on fusion between embryonic and maternal cells in the uterus and the fact that there are several different strains, e.g. in mice and men, which have clearly been 'borrowed' multiple times from retroviruses suggests an intimate evolutionary relationship between mammals and their retroviral counterparts. The discovery that syncytin is active in bone osteoclasts (Søe et al. 2011) and immature muscle cell myoblasts (Bjerregard et al. 2014), both of which involve cell fusion as well as the placenta, indicates a wider role. Redelsperger et al. (2016) discovered that knocking out both mouse versions, syncytin A and B, is lethal, whereas knocking out only syncytin B results in scrawny males with depleted muscle fibres, showing that syncytin B is important in male muscle generation and appears to have a role in the greater muscle mass of many male mammals. Pivotally, muscle is generated by the fusion of myoblasts to form multi-nucleated muscle fibres, indicating a key role for syncytin and explaining why the knockout mice have depleted muscle function, but why the effect is apparently confined to males remains to be resolved.
Retroviruses are divided between those with a predominantly exogenous infectious habit, such as HIV and SIV, and those that have converted to endogenous transmission down the germ line, such as the diverse HERV types. Magiorkinis G et al. (2012) have verified that loss of the Env gene, which enables cell infection, is associated with super-amplification of germ-line retroviral elements by a factor of about 30, as exogenous retroviruses switch to endogenous modes of selection. They investigated the widespread occurrence of retroviruses, including intracisternal A-particles, or IAPs, across the diversity of mammal groups. Notably, following Jern and Coffin (2008), the evolutionary tree of retroviruses both spans the vertebrates and includes both endogenous and exogenous habits, rooted in exogenous viral types.
Fig 23: (a) The seven retroviral genera: alpha-, beta-, gamma-, delta-, epsilon-, lenti-, and spuma-like retroviruses and their intermediate groups based on Pol sequences. Black branches indicate viruses known only in exogenous infectious forms (XRV); red branches indicate viruses present in both XRV and endogenous (ERV) forms; and blue branches indicate ERVs (Jern and Coffin 2008). (b) Phylogeny of mammals with ERV megafamilies shown as colored circles (area is proportional to the percentage of the ERV loci in the genome represented by that family) (Magiorkinis G et al. 2012). When retrotransposons are included (fig 22) they extend to all Eukaryote realms.
The defective copies of endogenous retroviruses may also serve to protect the host against further infection by becoming transcribed and causing incorporation of defective elements into the replicating virus (Best et. al.).
Fig 23b: (left) Root of the animal tree with ctenophores being an ancient outlier and xenocoelomorphs forming a newly-discovered primitive phylum (Rouse et al. 2016).
Right: More recent evolutionary tree taking into account multiple genetic trends puts sponges back at the base division (Simion et al. 2017).
The Cambrian Radiation, Homeotic Genes, Metamorphosis and Hybridization
One of the most stunning and puzzling aspects of evolution is the Cambrian radiation some 550 million years ago, which over a very short period of geological time gave rise to the major phyla of multicellular animals we see today. This radiation forms the core of the evolutionary tree of fig 1. The previous evolutionary epoch - the Ediacaran - by contrast has far fewer and less elaborate fossil forms, such as Charnia, Dickinsonia and Spriggina in fig 1, and particularly few organisms with well-preserved mineral skeletons.
There have been many proposals why such a rapid and abundant radiation could have occurred, including geological scenarios involving the ending of a snowball earth epoch, in which the earth became frozen and thus reflected radiant heat until rising CO2 levels caused a rapid thawing, setting off a major expansion of cyanobacteria, filling the atmosphere with oxygen and changing the ocean from an acidic state with dissolved iron and little oxygen to one capable of harbouring diverse forms of multicellular life.
Fig 24: Center to right: Homeotic genes specifying differentiation along the bodily axis have closely related sequences and are organized in a parallel scheme in arthropods and vertebrates and extend to cnidaria, predating bilateria. Mutations such as antennapedia and bithorax in the fruit fly alter sequential specialization of segments. HOX genes share extensive sequence homology with phage lambda genes such as cro altering gene expression in prokaryotes. Left: Mutations of related genes in maize cause disruption of leaf development. Upper left: The homeobox gene antennapedia, which induces legs in the place of antennae in the fruit fly, binding to DNA. Lower left: Ectopic compound eye on the leg of a fruit fly induced by mouse pax-6.
However, the underlying reasons may be genetic, and more to do with the evolution of a pangenomic algorithm for generating the multicellular body plan, based on homeotic and related developmental genes which are highly conserved and spread across the major animal and even plant kingdoms. Closely related schemes of homeotic genes drive vertebrate and arthropod development along the bilateral axis, being involved in segmentation and notochord differentiation. For example pax-6, a gene involved in eye formation in the mouse, will induce ectopic eyes on the fruit fly, intriguingly of the compound type that insects usually have, indicating a deep commonality between the genes organizing the body plan in these two pivotal phyla.
This suggests it may have taken evolution a considerable time to come up with such an algorithmic regulatory process, but that once it came into play, it permitted almost symphonic variations, leading to the diversity of the major animal phyla over a relatively short geological time.
Fig 25: Body schemes of the bilateria (Martín-Durán et al. 2012).
A central division in the Cambrian radiation is the one that led to the bilateria - organisms with the left-right symmetry possessed by both arthropods and vertebrates. The conventional argument is that the founding event differentiating bilateria from earlier organisms such as the cnidaria, which have a mouth only, is the symmetry introduced by the formation of an anus and an intestinal tube. Here things get complicated, because embryonic development can go either mouth first, based on the blastopore originating in cnidaria, and then to anus, or vice versa. In the conventional division the deuterostomes, including the vertebrates, go "arse-first", i.e. anus > mouth, but the protostomes go mouth > anus. The difficulty is that key protostomes such as the priapulid worm Priapulus caudatus, which was abundant in the middle Cambrian, actually develop on a deuterostome plan, although in terms of molecular evolution the expression of bra, cdx, foxA, gsc, and otx during early development is similar to that in nematodes and arthropods, spanning the "skin-shedding" ecdysozoa. Given that the Chaetognatha or arrow worms also have this pattern, the deuterostome pattern appears to be ancestral, with the protostomes forming a diverse set of body plans (Martín-Durán et al. 2012).
The conventional theory of insect morphogenesis is the evolution from eggs giving rise to small adult-form individuals to delayed maturation of embryonic forms in the form of larvae, which did not compete with the adults in food consumption and habit, thus leading to a two-stage life cycle, with non-competing foraging larval and reproductive adult forms (Jabr). A controversial theory (Ryan) posits that aspects of metamorphosis, which we also see in insects, but more pivotally in marine organisms such as echinoderms, may have resulted from early hybridization, even between organisms which have now become distinct phyla, such as vertebrates and echinoderms. For example Nectocaris pteryx appears to have a body plan looking like a chimera of an arthropod head and the abdomen of an entirely different phylum, although this may be a result of the way fossils are depicted in drawings and this species may simply be an early cephalopod (Smith & Caron). The validity of this idea, at least in insects, is highly controversial and hotly disputed (Hart & Grosberg, Williamson).
Fig 26: Fossil and two very different scientific illustrations (insets) of Nectocaris pteryx and two views of the larva of Luidia sarsi emitting an echinodermal 'offspring'.
Studies attempting to trace the evolutionary tree using the simpler larval forms, from which one would expect the adult form to have evolved, have yielded contradictory results, suggesting the two might have independent genetic origins. Genetic analysis of sea squirts, which have vertebrate larval forms with a notochord and a primitive brain but metamorphose into fixed sea-floor feeders, suggests they have two genomic components, one coming from vertebrates and the other from an unknown but now extinct non-vertebrate at a very early stage in the evolution of animals. Echinoderms themselves have larvae with a bilateral body plan, which later becomes colonized by pluripotent cells in the abdominal cavity forming radially symmetric organisms which grow into adults. In the starfish Luidia sarsi, the embryonic form some 4 cm long survives for several months as a vegetarian living off phytoplankton after its starfish 'offspring' have burst out to their carnivorous habit of hunting other starfish.
Donald Williamson, who in the 1950s advanced the 'larval transfer' theory, claimed to have successfully fertilised eggs from the sea squirt Ascidia mentula with sperm from the sea urchin Echinus esculentus to produce hybrids. Then in 2002, in an unpublished study with Sebastian Holmes and Nic Boerboom, he did the reverse cross, using eggs from the urchin and sperm from the sea squirt. Both crosses resulted in large numbers of offspring, the majority of eggs developing into easel-shaped larvae - the 'pluteus' form typical of sea urchins, rather than the tadpole larvae that are the hallmark of sea squirts. Most of these larvae subsequently metamorphosed to a rounded adult form, which Williamson called a 'spheroid'. The first cross created spheroids with a suction cup that enabled them to attach to surfaces. Most intriguingly, the second produced spheroids that reproduced asexually through budding, the pinching off of a section of the body to create a clone. However, these were never subjected to genetic analysis. A cross between two echinoderm species has, however, resulted in new developmental phenotypes with confirmed hybridized genomes.
John Long, David Choo
Vertebrate Penetrative Sex: A Tortuous History
Recent fossil evidence suggests that penetrative sex has evolved four times in the vertebrate tree and was the original form of sex in the ancient placoderms that gave rise to all the jawed vertebrates (Long et al. 2014). It still appears in stingrays, although they have lost the bony claspers and replaced them with cartilaginous ones, and has re-emerged in both guppies and land vertebrates, which have evolved a variety of intromittent penises. This suggests that fundamental components of development have been preserved for long epochs, so that each re-emergence of penetrative sex did not need to bootstrap its developmental genes from scratch.
Fig 26b: Conflict in the tree of mammalian diversification. Detailed traditional DNA-based evolutionary tree of the mammals (right) (Meredith et al, dos Reis et al) tends to have a different order of diversification from one based on the number of new miRNAs appearing in successive branches (top left Dolgin). Micro miRNA numbers have also been suggested to be correlated with neural complexity (bottom left Technau).
Mammalian Radiative Adaption: Traditional DNA versus Micro RNAs
Different assay methods are shedding an intriguing light on radiative adaptation and diversification of animal species, from the Cambrian through to the present day. The detailed branching of the tree of life when calculated by traditional mutational DNA methods (Meredith et al, dos Reis et al) appears to differ significantly from that given by a new technique developed by Kevin Peterson (Dolgin), which depends on the number of newly accrued micro miRNAs. These modulate gene expression by selectively binding to specific messenger RNAs and inhibiting their expression.
A single miRNA is thus able to modulate the expression of a diverse array of mRNAs to which it binds, thus providing for sophisticated forms of coordinated regulation conducive to phylogenetic complexity. It has also been suggested that neuronal complexity correlates with the number of miRNAs (Technau, Grimson et al), an interesting question in itself, to do with how complex nervous systems are generated in development. Notice here that humans have fewer protein genes than a mouse, roughly 21,000 against 22,000, although we have a brain with on the order of 1,000 times as many neurons, so we need to have an idea how organismic complexity evolves in terms of sophisticated gene regulation, and miRNAs do just that.
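The one-to-many relationship between a miRNA and its messengers can be illustrated with a minimal sketch. It assumes the simplified textbook rule that a target site is a perfect complement of the miRNA "seed" (nucleotides 2-8); real target recognition is more nuanced. The let-7a sequence is used only as an example and should be checked against a curated database, and the three mRNA fragments and all function names are hypothetical.

```python
# Simplified seed-match scan: one miRNA, several candidate mRNAs.
# Assumption: a site is a perfect complement of miRNA nucleotides 2-8.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna):
    """mRNA motif (5'->3') that pairs with miRNA nucleotides 2-8."""
    return reverse_complement(mirna[1:8])


def find_targets(mirna, mrnas):
    """Map each mRNA name to the start positions of seed-match sites.

    mRNAs without any site are omitted from the result.
    """
    site = seed_site(mirna)
    hits = {}
    for name, seq in mrnas.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq.startswith(site, i)]
        if positions:
            hits[name] = positions
    return hits


# Example miRNA (let-7a, quoted as an illustration only).
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"

# Hypothetical mRNA fragments, invented for this sketch.
toy_mrnas = {
    "mRNA_a": "AAGCUACCUCAGGA",
    "mRNA_b": "CUACCUCUUUUCUACCUCA",
    "mRNA_c": "GGGAAACCCUUU",
}

targets = find_targets(LET7A, toy_mrnas)
print(seed_site(LET7A))
print(targets)
```

In this toy data a single miRNA finds sites in two of the three messengers, one of them twice, which is the sense in which one small RNA can coordinate the regulation of many genes.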
Consistent with such a role in multi-celled evolution, the appearance of miRNAs goes back to the earliest multicelled animals. Sea anemones already carry up to 40. Metazoa, from sponges to bilateria, also share the two classes of piRNA, the second of which plays a role in suppressing transposable elements in gametogenesis, by containing a sequence complementary to a transposase mRNA. In fruit flies these are directed against DNA transposons, but in mammals they target LINE L1 and IAP transcription during meiosis in the germ line, by methylating L1 and IAP DNA sequences (Aravin et al).
Ida's controversial place in the primate family tree.
While a traditional DNA-based tree places primates and humans much closer to rodents, as highly evolved branches, with the elephants diverging earliest, an miRNA analysis places rodents as branching out earliest, something which might seem to be consistent with their possibly closer correspondence to the founding shrew-like mammalian type. The critical question determining the fate of the miRNA perspective is what the rate of loss of these small RNA molecules is in evolution. A higher rate of loss would tend to remove the inconsistency. While the picture is consistent with retaining miRNAs in mammalian diversification, in insects and a primitive chordate sudden losses have occurred.
For modern lineages of birds and mammals, few fossils have been found that predate the Cretaceous–Palaeogene (K-Pg) boundary, although molecular studies using fossil calibrations have shown that many of these lineages existed at that time. One intriguing way of checking the evolution of mammals and its timing is to examine the parasites of birds, reptiles and mammals and to develop a "tree of lice". Smith et al. (2011) demonstrate that the major louse suborders began to radiate before the K-Pg boundary, lending support to an earlier Cretaceous diversification of many modern bird and mammal lineages.
Fig 26c: Evolutionary tree of mammalian male infanticide (circled species) which occurs in around half of the 200 species investigated. It is commonest in social species (dark grey) and less so in solitary species (light grey) and even less in monogamous species (black) (Lukas & Huchard 2014).
Mammals give birth to live young and lactate. This has caused the reproductive investment of the two sexes to become highly skewed, with males investing primarily in fertilization, while females invest primarily in parenting. This has a variety of consequences. For example only around 3% of mammal species are socially monogamous, with the rest being either polygynous with harems, or having promiscuous females. This skews the reproductive strategies further, because there are secondary consequences. Females are less likely to go into heat and become pregnant while they are suckling a brood, and the offspring of other males will compete with a male's own, so in species with competing males there is a double incentive to kill the offspring of competitors - male infanticide. Around half of mammalian species, including the ancestors of the great apes and humans, are inveterate infanticiders, as promiscuous chimps and harem-forming gorillas are, and the trait appears to have evolved multiple times. In turn, the females adapt to promiscuous mating, often with an advertised estrus, to make it as difficult as possible for males to determine paternity (Lukas & Huchard 2014). Finally the males adapt by forming larger testes to deal with the issues of sperm competition involved in promiscuity.
Emergence and Diversification of Modern Humans
Fig 27: One hypothetical evolutionary tree for humans and related apes. There is much debate about the actual form of such a tree.
Homo sapiens appears to have evolved into a single dominant species on the planet, after a preceding period in which fossil evidence suggests several different anthropoid species coexisted.
The final stage of this process was the disappearance of Homo erectus and Neanderthal, the latter after a well-defined period of coexistence in Europe at the end of the last ice age.
Our own evolutionary and cultural roots appear to lie in Africa, with evidence of culture and cosmetics running back over 100,000 years, in addition to evidence for tools and weapons. An alternative regional development theory has proposed that humans evolved through a considerable amount of interbreeding over the whole African and Asian continental region, however genetic evidence is coming to point towards an African origin with only at most very occasional cross fertilization with related species.
Fig 28: Human-Neanderthal-Chimp divergences (Green et. al. 2006)
According to genetic analysis, Neanderthals diverged from Homo sapiens ~500,000 years ago. There has been no major interbreeding, but possibly some transfer of genes, e.g. from human males to Neanderthal females, although candidate human genes conferring natural advantage do have a profile consistent with transfer from Neanderthals (Pennisi).
Specific genes such as PDHA1 consist of two families with a last common ancestor 1.8 million years ago, and microcephalin variants appearing 40,000 years ago also have differences suggesting an original divergence 1 million years ago, pointing to 'introgression' from Neanderthals (Jones). An even more ancient divergence in the pseudogene RRM2P4 in East Asian people suggests interbreeding with Homo erectus. Some evidence from skeletons is also consistent with this picture. However, more recent sequencing of the Neanderthal nuclear genome (Callaway 2008) suggests little or no interbreeding with Homo sapiens and has cast doubt on the existence of the microcephalin variant in Neanderthals, as well as of a gene associated with increased fertility in Icelanders that was also attributed to transfer from Neanderthals.
Fig 29: The "Out of Africa" hypothesis may be consistent with a degree of regional development involving some sexual interbreeding with Neanderthals and Homo erectus (New Scientist).
Comprehensive investigation of the Neanderthal genome (Green et al. 2010, Dalton 2010) suggests that there was a period of interbreeding between Neanderthals and humans in the Near East around the time of the first migration out of Africa, rather than more recently in Europe, as the putative sequences are shared by non-African French, Han and Papuan, but not by the African Yoruba or San. It is estimated that among the former, 1-4% of the genome derives from Neanderthal sequences, although there is little evidence for these corresponding to the specific genes suggested by Lahn's team. Other transfers could have occurred but are not apparent in the research.
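The logic of "shared by non-Africans but not by Africans" can be made concrete with an ABBA-BABA count, often summarised as a D-statistic. The sketch below is an illustration of that general idea on invented toy data, not a reconstruction of the published analysis; the function name, the site encoding and the counts are all assumptions made for this example.

```python
# Toy ABBA-BABA (D-statistic) count.
# Each site lists the allele carried by (P1, P2, P3, outgroup), e.g.
# P1 = an African genome, P2 = a non-African genome,
# P3 = a Neanderthal genome, outgroup = chimpanzee.
# "A" marks the ancestral (outgroup) state and "B" the derived state.


def d_statistic(sites):
    """Return (ABBA - BABA) / (ABBA + BABA) for an iterable of sites.

    ABBA: P2 shares the derived allele with P3, P1 does not.
    BABA: P1 shares the derived allele with P3, P2 does not.
    Without gene flow the two patterns are expected equally often
    (D near 0); an excess of ABBA (D > 0) suggests extra sharing
    between P2 and P3.
    """
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if p1 == out and p2 == p3 and p2 != out:
            abba += 1
        elif p2 == out and p1 == p3 and p1 != out:
            baba += 1
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / total


# Invented counts: 6 ABBA sites, 4 BABA sites, 2 uninformative sites.
toy_sites = ["ABBA"] * 6 + ["BABA"] * 4 + ["AAAA", "BBAA"]
print(d_statistic(toy_sites))
```

With these made-up counts D is positive, the pattern one would expect if P2 (the non-African genome) carried some P3 (Neanderthal) ancestry that P1 lacks. A real analysis needs genome-wide data and an estimate of statistical uncertainty.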
The situation has been complicated by two finds. Firstly we have the 'hobbit' human remains found on Flores, named Homo floresiensis. These are variously claimed to be a separate human species possibly related to Homo erectus, or dismissed as microcephalic human pygmy peoples. More recently we have the discovery of remains of Denisovans (Callaway 2010), a further group that branched off from the Neanderthals, their common line having branched from Homo sapiens some 800,000 years ago. Genetic analysis of the remains indicates significant interbreeding, specifically with Melanesian people, of some 6% (Reich et al. 2010, Meyer et al. 2012).
More recent investigations show that interbreeding with other hominins was critical to the globalization of Homo sapiens. Human leukocyte antigens (HLAs), a family of about 200 genes that are essential to our immune system, also include some of the most variable human genes: hundreds of versions - or alleles - exist of each gene in the population, allowing our bodies to react to a huge number of disease-causing agents and adapt to new ones. One allele, HLA-C*0702, is common in modern Europeans and Asians but never seen in Africans; Peter Parham has found it in the Neanderthal genome, suggesting it made its way into H. sapiens of non-African descent through interbreeding. HLA-A*11 has a similar story: it is mostly found in Asians and never in Africans, and Parham found it in the Denisovan genome, again suggesting its source was interbreeding outside of Africa. This tallies with interbreeding giving H. sapiens pivotal resistance to non-African diseases. While only 6 per cent of the non-African modern human genome comes from other hominins, the share of HLAs acquired during interbreeding is much higher. Half of European HLA-A alleles come from other hominins, says Parham, and that figure rises to 72 per cent for people in China, and over 90 per cent for those in Papua New Guinea (Marshall 2011).
The contribution to immunity from Neanderthals has now become strongly evident. Separate studies of macrophages and monocytes have highlighted highly significant differences in immune response to pathogens between African and non-African populations. These differences involve genes with Neanderthal homology, implying a Neanderthal-derived shift in the immune response to cope with new kinds of pathogens in the new areas the migrants found themselves in. The greater intensity of immune response in African populations may also explain the three-fold higher rates of auto-immune disease in African women (Reardon S. 2016 Neanderthal DNA affects ethnic differences in immune response Nature doi:10.1038/nature.2016.20854).
In a 2014 study by David Reich and coworkers, genes for keratin filaments, which lend toughness to skin, hair and nails, were found to be enriched with Neanderthal DNA. This may have helped provide the newcomers with thicker insulation against cold conditions, the scientists suggest. But other genes are implicated in human illnesses, such as type 2 diabetes, long-term depression, lupus, biliary cirrhosis - an autoimmune disease of the liver - and Crohn's disease. Other regions of the human genome, including the X-chromosome, are devoid of Neanderthal sequences, suggesting they were selected against as deleterious. A genome region that lacks Neanderthal genes includes FOXP2, thought to play an important role in human speech (Sankararaman et al. 2014, Vernot & Akey 2014).
More recently, selection for specific Toll-like receptors significant in disease resistance, and acclimatization to high altitudes in Tibetans, have both been tied to Neanderthal alleles in a wide survey of relative contributions to disease-related genes (Callaway E 2015). In a 2016 study, traits linked to hypercoagulation, depression and tobacco addiction correlated with Neanderthal alleles. Although these may be disadvantageous in older modern populations, they must have conferred either advantages during reproductive age or general advantages for the population of the time. Coagulation may have protected against injury but makes people more prone to stroke in modern populations (Simonti et al. 2016). A second 2016 study by Akey and co-workers confirms that hybridization with Neanderthals and Denisovans provided an important reservoir of advantageous mutations for modern humans, enabling adaptation to emergent selective pressures as they dispersed out of Africa. Their results show that immune and pigmentation traits were frequent substrates of adaptive introgression and that in many cases adaptive archaic haplotypes also contribute to disease susceptibility in contemporary individuals. Many positively selected archaic haplotypes act as expression quantitative trait loci, which modulate the quantitative expression of particular genes, suggesting that modulation of transcript abundance was a common mechanism facilitating adaptive introgression (Gittelman et al. 2016).
Two studies examining immune response to infection in people of African and European descent highlight such differences. Nédélec, Y. et al. (2016) measured how gene expression in macrophages changed in response to infection. About 30% of the approximately 12,000 genes that they tested were expressed differently between the two groups, even before infection, and many of the genes whose activity changed the most during the immune reaction had sequences that were very similar between Europeans and Neanderthals, but not Africans. Quach, H. et al. (2016) grew monocytes in a dish and infected them with bacteria and viruses. Once again, the two groups showed differences in the activity of numerous genes, and Neanderthal-like gene variants in the European group played a major role in altering their immune response. The differences were especially stark in the way that those of African descent and those of European descent responded to viral infection. For some diseases, such as tuberculosis, a lower immune response tends to help with survival, and modern humans in Europe adopted the Neanderthal traits that helped with this. Overactive immune systems could help to explain why African American women, for instance, are up to three times more prone to the autoimmune disease lupus than white Americans.
The oligoadenylate synthetase (OAS) locus consists of three genes - OAS1, OAS2, and OAS3 - that encode enzymes involved in the innate immune response against viruses and are among the core genes that are important to stop viral replication. A further comparison of OAS sequences among human populations revealed that the OAS allele carried by Neanderthals is found in about 60 percent of individuals in Africa. However, outside of Africa, it is only found in individuals that harbor the Neanderthal haplotype. It is likely that Neanderthals were better adapted to the pathogens present in non-African environments than anatomically modern humans that had newly moved into these regions. It appears that this allele was lost during the out-of-Africa migration and that the Neanderthal haplotype resurrected it after the bottleneck following the human migration out of Africa (Sams A et al. 2016 Adaptively introgressed Neandertal haplotype at the OAS locus functionally impacts innate immune responses in humans Genome Biology 17 246).
McCoy, Wakefield and Akey (2017) note that there is significant downregulation of Neanderthal genes in both the brain and testes, indicating that these genes are mildly deleterious: "Recent theoretical work predicts that Neanderthals suffered a high load of weakly deleterious mutations accumulated during extended population bottlenecks. Assuming additive fitness effects, this mutational burden was estimated to have reduced Neanderthal fitness by at least 40% compared to modern humans. Under this model, deleterious haplotypes introgressed into larger modern human populations would have been subject to strong selection during the first ~20 generations after hybridization - a prediction with growing empirical support from genetic data. Nevertheless, many weakly deleterious variants are predicted to persist in present-day human populations, with a cumulative impact comparable to that of the Out-of-Africa bottleneck. Contributing to this result, we observed a striking bias toward downregulation of Neanderthal alleles in the brain and testes. Brain regions had significantly lower expression of Neanderthal alleles than non-brain tissues, particularly in the neuron-rich cerebellum and basal ganglia regions. This level of downregulation is exceptional, as equal-sized samples of non-introgressed SNPs matched for sample sizes of individuals and tissues showed no such bias. Further consistent with these data, brain regions including the cerebellum were enriched for significantly down-regulated compared to significantly upregulated Neanderthal SNPs. Significant downregulation of introgressed alleles in the brain is particularly remarkable given the previous observation that brain-expressed genes show less allele-specific expression (ASE) overall, a finding that was attributed to reduced levels of genetic diversity in this gene set.
One brain-specific gene that exemplifies this pattern of down-regulation is NTRK2, which encodes a neurotrophic tyrosine receptor kinase that regulates neuron survival and differentiation as well as synapse formation."
Neanderthal versions of genes in the testes, including some needed for sperm function, were also less active than human varieties. That finding is consistent with earlier studies that suggested male human-Neanderthal hybrids may have been infertile. But Neanderthal genes don't always lose. In particular, the Neanderthal version of an immunity gene called TLR1 is more active than the human version. Lopsided gene activity may help explain why carrying Neanderthal versions of some genes has been linked to human diseases, such as lupus and depression.
Functionally important regions are deficient in Neanderthal ancestry (Sankararaman et al 2014).
A study of the Neanderthal Y chromosome, which is probably extinct in the human population despite interbreeding, has shown that several key male histocompatibility genes on the Y are mutated in a way which could have led to a maternal immune response and miscarriages, forming a barrier to interfertility (Mendez F et al. 2016). Moreover there are signs of hybrid infertility, suggesting only about 1 in 50 inter-matings resulted in fertile offspring, as regions in the X-chromosome and on other chromosomes linked to testis genes, and mitochondrial DNA, are all devoid of Neanderthal genes, a pattern common to interspecies infertility. Evidence from a study of European Ice Age genomes shows the proportion of Neanderthal DNA has been declining due to natural selection (Fu et al. 2016).
Fig 29c: Recent discovery of 400,000-year-old bones with a mitochondrial relationship to Denisovans has raised further questions about early human emergence (Meyer et al. 2013).
Some doubt has been cast on the Neanderthal interbreeding idea, attributing the effects instead to shared sequences arising from isolated African populations of the two species separating some 300-350,000 years ago, with the last genetic material exchanged some 47-65,000 years ago. However, these results are contested by the original researchers from Pääbo's team. They say recent analyses actually firm up the case for interbreeding. Their evidence suggests non-Africans have shared genes in common with Neanderthals for only a few tens of thousands of years, so these genes cannot predate the origin of Neanderthals (Marshall 2012).
In 2014, more accurate dating of the demise of Neanderthals in Europe suggested they were already in serious decline 50,000 years ago, probably as a result of a climate cooling phase, and that by the time sapiens arrived they were only a small fragile remnant population in scattered isolated bands (Brahic 2014, Benazzi 2011). By 39,000 years ago they had largely vanished. This doesn't imply they were actively killed off by Homo sapiens, but that their territory and resources were compromised by a new invasive species. Neither is it clear that they were manifestly intellectually inferior to modern humans, as artifacts from both species show similar innovations (Barras 2014).
Fig 29d: A more complete picture of introgression of genes from one hominin species to another has been developed as a result of further sequencing of a slightly older Neanderthal toe bone also from Denisova cave and the analysis of the Denisovan sequence implying gene flow from an older hominin, possibly erectus or heidelbergensis. Notably the Neanderthal female was highly inbred corresponding to being the offspring of half-siblings with a common mother (Prufer et al 2013).
There is also suggestive evidence for up to 2% interbreeding with another hominin species in ancient African populations of Biaka Pygmies and San Bushmen (Kaplan 2011, Hammer et al. 2011). Although this is based only on statistical divergences in some loci and lacks a sister species reference sequence, it suggests interbreeding around 35,000 years ago with a population that originally diverged from the Homo sapiens line some 700,000 years before. A further introgression from one or more unknown hominins, possibly erectus, has been found in the genomes of Andaman Islanders (Mondal M et al. 2016).
Fig 29e: Left: Earlier interbreeding also occurred 100,000 years ago, in which human genes and those of another mystery hominin were transferred to Eastern Neanderthals and Denisovans respectively (Callaway E 2016 Evidence mounts for interbreeding bonanza in ancient human species Nature doi:10.1038/nature.2016.19394). Centre and right: The same events giving percentages and population sizes (Kuhlwilm M et al. 2016 Ancient gene flow from early modern humans into Eastern Neanderthals Nature doi:10.1038/nature16544).
The female Neanderthal sequenced in the above study also had extensive inbreeding (Marshall 2013), with up to an eighth of the genome devoid of alelle variation implying breeding between half siblings, possibly as a consequence of small isolated populations. This led to notable incidence of deformities. Those he studied have a range of deformities, many of which are rare in modern humans (Wu et al. 2013). Our genomes likewise still carry traces of past small population bottlenecks. A 2010 study concluded that our ancestors 1.2 million years ago had a population of just 18,500 individuals, spread over a vast area (Huff et al. 2010).
Fig 29f: Relationships of intestinal bacteria in the evolution of great apes and Homo sapiens. Change in the microbiome was slow and clock-like during African ape diversification, but human microbiomes have deviated from the ancestral state at an accelerated rate. Human microbiomes have lost ancestral microbial diversity while becoming adapted for animal-based diets. (Moeller et al. 2014).
Chromosomes contain a variety of markers that can be used to compare diverse populations and infer an evolutionary relationship between them. These include the slowly varying protein polymorphisms of coding regions, which are useful for long-term trends; single nucleotide polymorphisms and non-coding region changes (mutation rates about 2.5 x 10^-8 per base pair per generation, useful for reconstructing evolutionary history only over millions of years); insertion and deletion events (about 8% of polymorphisms, extending from one to millions of nucleotides), particularly those driven by transposable elements such as the LINEs and the even more frequent SINEs; non-coding micro-satellites (mutation rate 10^-5 to 10^-2 due to repeat slippage); and mini-satellite regions of repeating DNA (mutation rates as high as 2 x 10^-1 due to meiotic recombination in sperm). The micro- and mini-satellites both evolve rapidly and are not subject to the strong selection of coding regions, so they can differentiate changes over the much shorter time scales of modern human migration.
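To make the contrast in time scales concrete, here is a minimal sketch of the expected number of mutational differences accumulated between two lineages, using the simple neutral expectation d = 2 x mu x t. The mutation rates are those quoted above; the 10,000 base-pair region, the 25-year generation time, the 50,000-year split and the choice of 10^-3 from within the microsatellite range are illustrative assumptions, not values taken from the cited studies.

```python
# Expected neutral mutation events separating two lineages that split
# t generations ago: d = 2 * mu * sites * t (both lineages mutate independently).

GENERATION_YEARS = 25  # assumed generation time, for illustration only


def expected_differences(mu, sites, years, generation_years=GENERATION_YEARS):
    """Expected mutation events between two lineages split `years` ago.

    mu    -- mutation rate per site (or per locus) per generation
    sites -- number of sites (or loci) compared
    """
    generations = years / generation_years
    return 2 * mu * sites * generations


if __name__ == "__main__":
    split_years = 50_000  # hypothetical time since two populations separated
    # Point mutations across an assumed 10,000 bp non-coding region
    snp = expected_differences(2.5e-8, 10_000, split_years)
    # A single microsatellite locus, rate picked from the 10^-5 to 10^-2 range
    msat = expected_differences(1e-3, 1, split_years)
    print(f"point mutations in 10 kb: {snp:.2f}")
    print(f"microsatellite events at one locus: {msat:.2f}")
```

Under these assumptions a 10 kb stretch accumulates on the order of one point difference over the period of modern human migration, while a single fast microsatellite locus experiences several mutation events, which is why the rapidly evolving markers give finer resolution on recent history.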
The insertions and deletions of the million or so Alu elements in the human genome are particularly useful, as the most active sub-population of about 1000 Alu is actively transcribing and undergoing rapid change. A subpopulation of Alu are capable of generating new coding regions (exons), when inserted into non-coding introns between spliced sections of a translated mRNA, because one base-pair change within Alu leads to formation of a new exon reading into the surrounding DNA. This is not necessarily deleterious because alternative splicing still allows the original protein to be made as well. We have the highest number of introns per gene of any organism, and thus have to have gained an advantage from this costly error-prone process. Alus may have given rise, through alternative splicing, to new proteins that drove primates' divergence from other mammals. Recent studies have shown that the nearly identical genes of humans and chimps produce essentially the same proteins in most tissues, except in parts of the brain, where certain human genes are more active and others generate significantly different proteins through alternative splicing of gene transcripts. Our divergence from other primates may thus be due in part to alternative splicing.
If we consider the likely effects of the out of Africa hypothesis, we would expect that founding African populations not subject to active expansion and migration would have greater genetic diversity and that the genetic makeup of other world populations would come from a subset of the African diversity, consisting of those subgroups who migrated. This picture is complicated by the evidence for one or more bottlenecks that reduced the genetic diversity of the surviving human population to 3000-10,000 breeding pairs around 70,000 years ago, which has been associated with the supervolcanic Toba eruption in Sumatra.
Fig 29f: The Volcanic Winter/Weak Garden of Eden model proposed in Ambrose 1998. Population subdivision due to dispersal within Africa and to other continents during the early Late Pleistocene is followed by bottlenecks caused by volcanic winter, resulting from the eruption of Toba, around 71,000 years ago. The bottleneck may have lasted either 1000 years, during the hyper-cold stadial period between Dansgaard-Oeschger events 19 and 20, or 10,000 years, during oxygen isotope stage 4. Population bottlenecks and releases are both synchronous. More individuals survived in Africa because tropical refugia were largest there, resulting in greater genetic diversity in Africa.
In the case of mitochondrial DNA (mtDNA, mutation rate about 2.5 x 10^-7) and its hyper-variable D-loop (mutation rates as high as 4 x 10^-3), which is transmitted only down the maternal line (see Tishkoff and Verrelli for caveat), and the non-recombining majority of the Y-chromosome, which is transmitted only down the paternal line, each with no recombination, we would expect greater diversity going deeper into the historical tree of divergence. Certain existing groups, which have retained the founding patterns of survival and have not undergone rapid population expansions, would be expected to retain an increasingly diverse source variation. All these features are broadly observed in the genetic data to date.
Fig 30: (a) MtDNA tree for African groups showing haplotypes of !Kung, Mbuti and Biaka as well as the line coming out of Africa (Chen et al.). (b) Diagram of world migration and regional differentiation of successive mtDNA haplotypes (Gilbert). (c) mtDNA distances between founding African groups including Hadza (clicks) Khwe is from (Knight et al.). Recent mtDNA evidence suggests a first wave of migration down the coast of Asia all the way to Australia (Forster et al.).
Most studies of non-coding regions of autosomal, X-chromosome, and mitochondrial mtDNA genetic variation (which are desirable markers because they are not so subject to selection and thus have relatively neutral drift) show higher levels of genetic variation in African populations compared to non-African populations, using many types of markers. Although some studies of Y-chromosome variation have observed higher heterozygosity levels in non-African populations, the African populations have higher levels of pairwise sequence differences, consistent with these populations being ancestral. High levels of diversity in African populations alone do not prove that African populations are ancestral. A recent bottleneck event and/or colonization and extinction events among non-African populations, or a more recent onset of population growth in non-Africans, could also cause a decrease in genetic diversity (Tishkoff and Verrelli). In fact the complete inter-fertility of all human populations and the relative lack of genetic divergence by comparison with the few remaining chimp colonies in the wild (Hrdy 183) does indicate a significant bottleneck. The genetic data is consistent with a human emergence from a population of only 10,000 around 100,000 years ago. This is also consistent with the delayed maturation, long birth spacings as a result of prolonged lactation and high infant mortality seen in gather-hunter populations such as the !Kung. At such low growth rates a population of 100 would take 50,000 years to reach 10,000 (Hrdy 183).
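The growth figure quoted from Hrdy can be checked with a few lines of arithmetic. The sketch below assumes simple continuous exponential growth, which is an idealization introduced here for illustration rather than a model taken from the source, and works out the annual rate implied by a population of 100 taking 50,000 years to reach 10,000.

```python
import math


def implied_growth_rate(start, end, years):
    """Annual exponential growth rate r such that end = start * exp(r * years)."""
    return math.log(end / start) / years


def doubling_time(rate):
    """Years needed for a population growing at `rate` per year to double."""
    return math.log(2) / rate


if __name__ == "__main__":
    r = implied_growth_rate(100, 10_000, 50_000)
    print(f"implied growth rate: {r * 100:.4f}% per year")
    print(f"doubling time: {doubling_time(r):.0f} years")
```

On this idealized reading the implied growth rate is under one hundredth of a percent per year, with the population doubling only about once every seven and a half thousand years.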
Fig 31: Patterns of male migration. The Genographic Project - a partnership between National Geographic and IBM - will collect DNA samples from over 100,000 people worldwide to provide a high-resolution genetic map of human migration.
However, studies of protein polymorphisms as well as mtDNA haplotypes, X-chromosome and Y-chromosome haplotypes, autosomal microsatellites and minisatellites, Alu elements, and autosomal haplotypes indicate that the roots of the population trees constructed from these data are composed of African populations and/or that Africans have the most divergent lineages, as expected under a recent African origin rather than a multi-regional emergence model. Additionally, studies of autosomal, X-chromosomal haplotype and mtDNA variation indicate that Africans have the largest number of population-specific alleles and that non-African populations harbor a subset of the genetic diversity that is present in Africa, as expected if there was a genetic bottleneck when modern humans migrated out of Africa. Analysis of genetic variation among ethnically diverse human populations indicates that populations cluster by geographic region (i.e., Africa, Europe/Middle East, Asia, Oceania, New World) and that African populations are highly divergent. The mtDNA studies hypothesize a primal female ancestor - the African Eve - around 150,000 years ago (Chen et al.) while the Y-chromosome Adam is more recent, at around 90,000 years ago (Underhill et al.), consistent with the greater reproductive variance of males than females. Differences between the Y- and mtDNA distributions indicate how migration, intermarriage and female exogamy have affected the gene pool. The genetic patterns of both these and autosomal microsatellites (Zhivotovsky et al.) are consistent with founding African diversity with migratory radiations to form other world populations, with deep founding radiations to the Khoisan click-language speaking !Kung-san bushmen of Botswana and the Sandawe of Tanzania, and possibly the Hadzabe, as well as the forest people such as the Mbuti and Biaka 'pygmies', who have adopted the Bantu languages of the farming neighbours with whom they now share semi-symbiotic relationships. Along with some Ethiopian and Sudanese sub-populations, these groups may represent some of the oldest and most deeply diversified branches of modern humans.
Fig 32: (Right) Genographic project study of mitochondrial origins shows a deep split separating Khoisan mitochondrial inheritance from other groups, including those migrating out of Africa, suggesting a separation of some 100,000 years possibly caused by long term drought in Africa (Behar et al.). (Left) Phylogeny of 526 complete mitochondrial genomes depicting the earliest diverged modern human maternal lineages, including the first ancient Khoesan mtDNA (StHe) within the L0d2c lineage. All non-L0d2c genomes have been collapsed with each triangle representing the relative diversity of the corresponding haplogroups and subclades. In 2014 it was reported that the skeleton of a male marine forager discovered at St. Helena Bay, carbon dated to 2,330 ± 25 years before the present, some 500 years before Khoekhoe speaking pastoralists arrived, displays one of the oldest mitochondrial clades, L0d2c1c. Unlike its Khoe-language based sister-clades (L0d2c1a and L0d2c1b), this clade is most closely related to contemporary indigenous San-speakers (specifically Ju), whose ancestors diverged from other humans roughly 150,000 years ago (Morris et al. 2014).
Such recent genetic evidence has laid bare the relationships between some of the founding human groups spread across Africa from the 'Cushite' horn of Ethiopia to the southern Kalahari. Mitochondrial DNA studies have highlighted the ancient origin of the !Kung San and of pygmy peoples of the Congo Basin such as the Mbuti and the Biaka.
Fig 32b: The largest ever study of global genetic variation in the human Y chromosome has elucidated the phylogenetic tree of men. Some parts of the tree were more like a bush, with many branches originating at the same point, indicating there was an explosive increase in the number of men carrying that type of Y chromosome. The earliest expansions occurred 50,000-55,000 years ago across Asia and Europe, and 15,000 years ago in the Americas. There were also later expansions in sub-Saharan Africa, Western Europe, South Asia and East Asia, between 8,000 and 4,000 years ago. The earlier population increases probably resulted from the first peopling by modern humans of vast continents, where plenty of resources were available, and the later ones from advances in technology that could be controlled by small groups of men: wheeled transport, metal working and organized warfare (Poznik, G et al. 2016).
(Above) Baka Adam vs Mbuti Eve. The new data from Poznik et al (2013) does not necessarily mean Eve is exactly the same age as Adam. (Below) New resolution of the Y-spread immediately out of Africa.
Highlighting unique features of human genetic evolution are two key genes whose mutations cause microcephaly, consistent with a role in increased brain size, and whose rapid spread through the human population may coincide with spurts in human culture. A variant of Microcephalin (Evans et al.) appeared ~37,000 years ago, coinciding with the birth of culture, and a variant of ASPM spread from the Near East around 5000 years ago (Mekel-Bobrov et al.). However, studies of these variants have failed to find associated differences in intelligence and the results remain highly controversial (DOI:10.1126/science.314.5807.1872). Nevertheless, these results are consistent with an overall examination of linkage disequilibrium in single nucleotide polymorphisms (Moyzis et al.), which indicates that about 7% of our genes have been subject to selection in the last 50,000 years, a figure similar to that seen in the domestication of maize, including genes for protein metabolism, disease resistance and brain function.
Fig 33: Left: (a) Non-recombining Y-chromosome evolutionary tree (Underhill et al.). (b) Geographical distribution showing the ancient haplotype shared by the San and Ethiopian and Sudanese sub-populations. (c) Genetic distances between Khoisan and forest peoples sharing M112, a Y-chromosome allele common only in these groups, showing great genetic distance between Hadzabe and San peoples (Knight et al.). (d) Autosome satellite analysis confirming ancient divergence of San and forest peoples leading to migration from Africa (Zhivotovsky et al.). Right: The genetic structure of 126 Ethiopian and 139 Senegalese Y chromosomes was investigated by a hierarchical analysis of 30 diagnostic biallelic markers selected from the worldwide Y-chromosome genealogy. The present study reveals that only the Ethiopians share with the Khoisan the deepest human Y-chromosome clades. This confirms the ancestral affinity between the Ethiopians and the Khoisan, which has previously been suggested by both archaeological and genetic findings (Semino et al.).
Y-chromosome studies have shown the !Kung share a most ancient haplotype with sub-populations from Ethiopia and the Sudan, suggesting they are parts of an ancient widespread population later divided by the Bantu expansion. According to an overall survey of genetic research by Sarah Tishkoff of the University of Maryland, the most deeply ancestral known human DNA lineages may be those of East Africans, such as the Sandawe, who share many phenotypic features and a click language with the !Kung. This suggests southern Khoisan-speaking peoples originated in East Africa. The most ancient populations are now believed to also include the Sandawe, Burunge, Gorowaa and Datog people of Tanzania. The Burunge and Gorowaa migrated to Tanzania from Ethiopia within the last 5,000 years consistent with an ancient founding population in this area. Echoes of the earliest language spoken by ancient humans tens of thousands of years ago may have been preserved in the distinctive clicking sounds still spoken by some existing African tribes.
Fig 34: Human divergence trees calculated by single nucleotide polymorphisms (SNPs): top left (Li et al.), bottom right (Jakobsson et al.). Trees for haplotypes and copy number variation between populations (Li et al.). (click to enlarge).
In a counterpoint to these studies, Hein and Rohde et al. estimate that the repeated spreading of family trees by sexually recombining mobile populations, together with differences in reproductive rates, implies that the most recent common ancestor of our global populations existed just 3,500 years ago, excepting these most isolated groups.
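Why genealogical common ancestors can be so recent is easier to see in a toy simulation. The sketch below is not the geographically structured model of Rohde et al.; it assumes a single randomly mating population of constant size in which every person draws two distinct parents at random from the previous generation, and it counts generations back until some individual is an ancestor of everyone alive today. The function name and all parameter values are hypothetical.

```python
import random


def generations_to_common_ancestor(pop_size, seed=0):
    """Generations back until one person is an ancestor of the whole present
    population, in a toy constant-size, random-mating, two-parent model."""
    rng = random.Random(seed)
    everyone = (1 << pop_size) - 1
    # descendants[i]: bitmask of present-day people descended from person i
    descendants = [1 << i for i in range(pop_size)]
    generation = 0
    while all(mask != everyone for mask in descendants):
        parent_masks = [0] * pop_size
        for child_mask in descendants:
            # Each person has two distinct parents in the previous generation
            mother, father = rng.sample(range(pop_size), 2)
            parent_masks[mother] |= child_mask
            parent_masks[father] |= child_mask
        descendants = parent_masks
        generation += 1
    return generation


if __name__ == "__main__":
    for n in (100, 1000):
        g = generations_to_common_ancestor(n, seed=1)
        print(f"population {n}: common ancestor {g} generations back")
```

Because the number of ancestors of each person roughly doubles every generation, the count of generations grows only slowly with population size in this idealized setting, which is the intuition behind very recent estimates for a universal genealogical ancestor. Real populations are geographically structured, so actual figures depend heavily on migration.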
Further studies of the nuclear genome, using SNPs (single nucleotide polymorphisms), CNVs (copy number variation) and haplotypes, have thrown up reasonably consistent maps of regional divergence of principal human groups, demonstrating correspondence to the "Out of Africa" hypothesis and consistent with major patterns of migration.
Biallelic deletion-based tree including Neanderthals and Denisovans as outliers (Sudmant et al. 2015).
In 2015 (Sudmant et al.) the study of CNVs (deletions and gene duplications) was expanded to 236 individuals from 125 distinct human populations, including an in-depth exploration of duplications, which require more advanced techniques to assess. In total, 7.01% of the human genome is variable due to CNVs in contrast to 1.1% due to single-nucleotide variation. Deletions (loss of sequence) were less common (representing 2.77% of the genome) compared to duplications (4.4% of the genome), suggesting that many duplications are fixed because they are advantageous. CNVs mapping to segmental duplications were larger on average (median of 14.4 kbp) than CNVs mapping to the unique portions of the genome (median of 6.2 kbp).
In 2012 an in-depth study into human origins (Schlebusch et al.) found no single founding location but mixing and divergence of populations, with the Khoe-San diverging from other human groups over 100,000 years ago, and a further division later between North and South Kalahari populations around 35,000 years ago, but in addition, deep complexity both within and without Khoe-San populations.
Fig 35: Left: In 2009, Tishkoff et al. reported on a major study of African and African American evolution containing the most detailed information on African diversity to date (click to enlarge). Right: Reproductive bottleneck in Y-chromosome diversity began about 10,000 years ago and continued for several millennia (Karmin et al. 2015). Inset shows 11 independent areas of primal agriculture discovered. Evidence of animal husbandry has also been found in Turkey 10,500 years ago. (The real first farmers: How agriculture was a global invention New Sci 28 Oct 2015).
In 2015, research into the comparative population diversity of maternal mitochondrial DNA and the male Y-chromosome led to an astounding contrast. Around 10,000 years ago, corresponding to the birth of agriculture, the diversity of the Y-chromosome underwent a collapse across vast areas on the human-colonized planet. There is no evidence this was a result of direct biological or genetic factors as there were no differences between differing Y-clades. The conclusion is that the effect was driven by cultural changes associated with agriculture in which powerful men were able to reproductively exploit large numbers of women and transmit their reproductive success on to their male heirs, squeezing the majority of males out of the reproductive race. Estimates of this phase of extreme reproductive polygyny suggest that for every reproducing male there were 17 reproductive females effectively making harems the predominant form of sexual relationship (Karmin et al. 2015).
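The genetic cost of such a skewed breeding ratio can be illustrated with Wright's textbook formula for effective population size with unequal numbers of breeding males and females, Ne = 4 x Nm x Nf / (Nm + Nf). This is a standard population-genetics approximation for autosomal loci, offered here as a sketch; it is not the method Karmin et al. used to infer the 17:1 figure, and the head counts below are hypothetical.

```python
def effective_size(n_males, n_females):
    """Wright's autosomal effective population size for unequal breeding sexes:
    Ne = 4 * Nm * Nf / (Nm + Nf)."""
    return 4 * n_males * n_females / (n_males + n_females)


if __name__ == "__main__":
    # Hypothetical breeding population of 18,000 under two mating systems
    balanced = effective_size(9_000, 9_000)
    skewed = effective_size(1_000, 17_000)  # one breeding male per 17 females
    print(f"balanced sexes: Ne = {balanced:.0f}")
    print(f"17 females per male: Ne = {skewed:.0f}")
    print(f"skewed / balanced = {skewed / balanced:.2f}")
```

With the same number of breeders, concentrating paternity in one male for every 17 females cuts the autosomal effective size to roughly a fifth of its balanced value under this formula, and the effect on the strictly paternal Y-chromosome is more severe still.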
A member of the research team hypothesizes that somehow, only a few men accumulated lots of wealth and power, leaving nothing for others. These men could then pass their wealth on to their sons, perpetuating this pattern of elitist reproductive success. Then, as more thousands of years passed, the numbers of men reproducing, compared to women, rose again. In more recent history, as a global average, about four or five women reproduced for every one man, still a highly polygynous picture that leads into some of the great patriarchs of history, from Genghis Khan, whose Y-chromosome continues to exist in 8% of men in 16 populations spanning Asia and some 0.5% of males worldwide (Zerjal, T. et al. 2003 Am. J. Hum. Genet. 72, 717-21), to Udayama, who was said to keep 16,000 virgins behind flaming walls (R577, R735 99). Several other great founders of Y-chromosome lineages have been discovered (Callaway E 2015 Nature doi:10.1038/nature.2015.16767, Balaresque, P et al. 2014 Eur. J. Hum. Genet. doi:10.1038/ejhg.2014.285).
This comes as an ironical twist, since it is assumed that agriculture was an invention of women coming out of their role as gatherers in gather-hunter societies, and provides a new perspective on the societies of the planter queens, where female deities appear to have been worshipped at the same time as this extreme form of male reproductive elitism. The other thing that is really stunning about this effect is that it has been repeated widely across disparate world cultures, from China through the Near East to Europe and even pre-Columbian America.
When three populations, Khoisan from Africa, Mongolian Khalks and Papua New Guinea Highlanders, were examined for the difference in age between the Y-chromosome Adam and the mitochondrial Eve, all three groups showed a roughly 2:1 difference (SAN 73.6 kya vs 176.5 kya, MNG 43.6 kya vs 134.4 kya and PNG 45.5 kya vs 81.05 kya). These results are most consistent with a higher female effective population size, produced by sex-biased demographic processes. They suggest that throughout the last 100,000 years of human evolution, effective reproductive population sizes have been polygynous by a factor of around 2:1.
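The ratios behind the "roughly 2:1" summary can be computed directly from the ages quoted above. The sketch assumes that in each pair the first figure is the Y-chromosome age and the second the mitochondrial age, following the order in which the text names them; the dictionary and function names are illustrative only.

```python
# Ages in thousands of years (kya), as quoted in the text.
# Assumed order in each pair: (Y-chromosome Adam, mitochondrial Eve).
TMRCA_KYA = {
    "SAN": (73.6, 176.5),
    "MNG": (43.6, 134.4),
    "PNG": (45.5, 81.05),
}


def eve_to_adam_ratios(ages):
    """Ratio of mitochondrial to Y-chromosome coalescence age per population."""
    return {pop: mt / y for pop, (y, mt) in ages.items()}


if __name__ == "__main__":
    ratios = eve_to_adam_ratios(TMRCA_KYA)
    for pop, ratio in ratios.items():
        print(f"{pop}: {ratio:.2f}")
    print(f"mean: {sum(ratios.values()) / len(ratios):.2f}")
```

The individual ratios range from a little under 2 to a little over 3, so "roughly 2:1" should be read as a rounded summary rather than a precise constant across populations.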
For an authoritative recent overview of the peopling of the planet in terms of genomics see Nielsen et al. (2017).
Language and Cultural Evolution
Fig 36: Left: Evolutionary tree of Indo-European languages suggests a possible radiation corresponding to the Kurgans occurred around 6,900 years ago and that they were preceded by Hittite migrations into Anatolia. Time scales in red are BP (Gray and Atkinson). Significantly Tocharian appears in Buddhist writings from China's Xinjiang province, indicating early far-eastern spread. Inset: hypothetical relationship between Indo-European and wider language groups such as Afro-Asiatic (click to enlarge). Looking at the Indo-European origin geographically Bouckaert et al. (2012) found decisive support for an Anatolian origin over a steppe origin. Both the inferred timing and root location of the Indo-European language trees fit with an agricultural expansion from Anatolia beginning 8000 to 9500 years ago. Right: The DNA analysis of widespread fossil and current genomes has led to confirmation of a great Yamnaya migration from the Steppe around 4500 years ago which almost completely replaced the gatherer-hunter populations of Europe (Haak et al. 2015). See also Allentoft et al. (2015).
The evolutionary tree of human ethnic and migratory peoples bears an interesting relationship with the corresponding tree of languages, in which language appears to have a cultural evolutionary capacity of its own occurring more rapidly than genetic evolution, complementing the biological evolution of human populations.
Counterposing the idea of a hardwired genetic basis for the human capacity for spoken language, as exemplified by Chomsky's generative grammar, is the theory of languages as evolutionary 'parasites' converging towards internal efficiency through the modularity of their grammar and word set. Darwin (1904), the founder of the evolutionary approach (1859), speculated that language was potentially an invention: "Man not only uses inarticulate cries, gestures and expressions, but has invented articulate language, if indeed the word invented can be applied to a process completed by innumerable steps half consciously made". Morten Christiansen and Simon Kirby (Christiansen and Kirby 2003) question the need to invoke a Chomskian generative grammar. Instead, they argue, language has adapted to utilize more general cognitive processing capacities that were already part of our ancestors' brains before language came along. Among these, Christiansen focuses on 'sequential learning' - the ability to encode and represent the order of the discrete elements in a sequence. This ability is not unique to humans: mountain gorillas, for example, use it in the complicated preparation of certain 'spiky' plant foods, where a sequence of tasks is required to remove the edible part. Language, he says, is a 'non-obligate mutualistic endosymbiont' - a kind of evolutionary structure like a 'symbolic virus'. Kirby suggests our brains are not so specifically designed for language, and that we appear to be biologically adapted to language because language, which evolves much faster than biology, has culturally adapted to us, gaining semantic power and representational efficiency as it evolves. Languages as different as Danish and Hindi have evolved in less than 5000 years from a common Proto-Indo-European ancestor. Yet it took up to 200,000 years for modern humans to evolve from archaic Homo sapiens.
This tallies well with the fact that written languages cannot possibly have a hard-wired basis, having only existed for the last 4000 or so years and being a product of only a few cultures, yet we can adapt our visual pattern recognition readily to become fully literate.
Confirmation that the tree of life of language evolution is a cultural evolutionary phenomenon, rather than a cognitive universal that is by implication determined genetically (Ball 2011), has come with the work of Russell Gray and coworkers (Dunn et al. 2011). In the Nature editorial "Universal Truths" the scope of this is made clear. There are two theories of language universality which delineate the field. Noam Chomsky proposed that the brain is genetically endowed with rules providing brain modules which express a universal grammar. Joseph Greenberg takes a more empirical approach, identifying traits (particularly in word order) shared by many languages, which are considered to represent biases that result from cognitive constraints. Gray and his colleagues have put both to the test using phylogenetic methods to examine four family trees that between them represent more than 2,000 languages. They considered whether what we call prepositions occur before or after a noun ("in the boat" versus "the boat in") and how the word order of subject and object work out in either case ("I put the dog in the boat" versus "I the dog put the canoe in"). A generative grammar should show patterns of language change that are independent of the family tree or the pathway tracked through it, whereas Greenbergian universality predicts strong co-dependencies between particular types of word-order relations (and not others). Neither of these patterns is borne out by the analysis, suggesting that the structures of the languages are lineage-specific and not governed by universals.
Quentin Atkinson (2011) has taken this a step further. Human genetic and phenotypic diversity declines with distance from Africa, as predicted by a serial founder effect in which successive population bottlenecks during range expansion progressively reduce diversity, consistent with the "out of Africa" hypothesis. Likewise Atkinson showed that the number of phonemes used in a global sample of 504 languages is clinal and fits a serial founder-effect model of expansion from an inferred origin in Africa - in effect a cultural evolutionary process. In Atkinson's words this "points to parallel mechanisms shaping genetic and linguistic diversity and supports an African origin of modern human languages."
To quote Jabr (2011): "Earlier research has shown that the more people speak a language, the higher its phonemic diversity. Africa turned out to have the greatest phonemic diversity - it is the only place in the world where languages incorporate clicks of the tongue into their vocabularies, for instance - while South America and Oceania have the smallest. Remarkably, this echoes genetic analyses showing that African populations have higher genetic diversity than European, Asian and American populations."
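The genetic side of the serial founder effect has a simple textbook form: a single generation at a bottleneck of N diploid founders reduces expected heterozygosity by the factor (1 - 1/(2N)), so after k successive founding events H_k = H_0 x (1 - 1/(2N))^k. The sketch below applies that relation, ignoring mutation, migration and regrowth between steps; the starting diversity, founder number and number of steps are hypothetical values chosen only to show the shape of the decline, and the analogy to phoneme inventories is Atkinson's, not something this calculation demonstrates.

```python
def serial_founder_heterozygosity(h0, founders, steps):
    """Expected heterozygosity after each of `steps` successive founding events,
    each a one-generation bottleneck of `founders` diploid individuals.

    Returns a list [H_0, H_1, ..., H_steps] with
    H_k = H_0 * (1 - 1 / (2 * founders)) ** k.
    """
    retain = 1 - 1 / (2 * founders)
    return [h0 * retain ** k for k in range(steps + 1)]


if __name__ == "__main__":
    # Hypothetical values: source diversity 0.8, 50 founders per colony
    trajectory = serial_founder_heterozygosity(h0=0.8, founders=50, steps=20)
    for k in range(0, 21, 5):
        print(f"after {k:2d} founding events: H = {trajectory[k]:.3f}")
```

Each colony along the expansion route starts with slightly less diversity than its parent population, so diversity falls steadily with the number of founding steps, and hence with distance from the origin.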
Fig 37: Hypothetical core of all human languages from Greenhill et al. (2010) further work associated with Gray and Atkinson's research (click to enlarge). The Nature article above implies "languages evolve in their own idiosyncratic ways, rather than being governed by universal rules set down in human brain patterns".
Fig 38: Tree of world religions included to turn the tables on creationist deniers of evolution (click to enlarge). Lacks detail for African tribal religions (see Culture out of Africa).
Finally we note that, contrary to creationist and intelligent design notions of life needing a design specification from a third party "God" with no detectable presence in the natural universe, religions themselves can be seen to show a similar form of cultural evolution to the languages they are expressed in. The evolutionary tree of life remains the root and branch from which, through the muck and slime of sexual recombination, human intelligence, culture and religion have sprung. Thus evolution is fecundly capable of spawning religion but religion cannot legitimately deny evolution. Nature thus reigns supreme and we fantasize against it at our folly.
Resplendence: A Paradigm Shift from Religion to Reparadise the Earth
Conclusion: The Tree of Life, the Selfish Gene, and Climax Genetic Diversity
The picture conveyed by the significance of endosymbiosis, genome fusion and horizontal transfer as key evolutionary processes complementing the vertical transmission of the tree of life, makes clear that evolution is not just a matter of competitive survival of the fittest gene, individual, or species, but of dynamic survival of genes in a surviving ecosystem. Although Dawkins' (1978) notion of the "selfish gene" was pivotal in drawing attention to the fact that it was the survival of genes and not organisms, or even species, that was the key evolutionary process, attributing the human sentiment of selfishness to a gene is somewhat of a self-serving advertising distraction on the part of the author, which diminishes the subtlety and complexity of the sometimes apparently paradoxical ways genes actually interact to bring about beneficial outcomes in the evolutionary dynamics of the ecosystem.
Although the idea of selection of genes has been pivotal in defining the need to consider evolutionarily stable strategies under genetic variation in ways which have been subsequently confirmed time and time again in situations such as the sexual genetics of social insects such as bees and ants, social selection is by no means ineffectual, or much of sociobiology, including the biological basis of morality as an extension of reciprocal altruism, would cease to exist.
Moreover, from what we have seen, particularly about horizontal gene transfer, and the capacity of mobile elements to induce modulated changes in nuclear genomes, it is not the 'selfishness' of a genetic element alone that results in survival of both a gene and its hosts, but dynamic feedbacks and relationships which ultimately contribute to a massive sharing of information in the manner of parallel genetic algorithms fundamental to the replicative genetic process, which enable global forms of genetic and genome optimization central to the overall viability of life as complex systems.
Fig 39: The Mandala of Evolution (Dion Wright) Click to see full image.
A predator such as a lion survives, not because it is a selfish beast thinking only of eating the next gazelle, but because the predator, although it survives by killing individual antelopes, maintains a degree of stability in population dynamics. Without this, the herbivores might multiply, causing a massive famine and leading to cycles of boom and bust and the potential extinction or attrition of antelopes, lions and the grasslands.
Likewise, although we may think of individual genes, transposable elements, or viruses as 'selfish' for reproducing sufficiently to ensure their own survival, and sometimes behaving as noxious parasites, the overall effects of this process, in evolution can be to enrich the genetic potential of many unrelated organisms along the way, changing forever the face of the ecosystems in which they exist, enabling organisms of far greater complexity to evolve and to survive in the closing circle of the biosphere.
What humanity needs to learn to come to terms with is that our own survival is inextricably entwined with the survival of the immortal evolutionary tree of the diversity of life, and it is this tree, and the biosphere that contains it, that we need to give our unswerving devotion to, or we will fail the acid test of being a species co-dependent in a perennial ecosystem. By corrupting the tree of life through human impacts of climate change and habitat destruction we are invoking a mass extinction of life and may all too easily become one of the many casualties unless we care for the Tree as our primary goal in life.