[There] are doubts stemming from the fact that many genes give believably different phylogenies for the same organisms… almost certainly because they have been "laterally transferred." If instances of LGT can no longer be dismissed as "exceptions that prove the rule," it must be admitted (i) that it is not logical to equate gene phylogeny and organismal phylogeny and (ii) that, unless organisms are construed as either less or more than the sum of their genes, there is no unique organismal phylogeny. Thus, there is a problem with the very conceptual basis of phylogenetic classification… — W. Ford Doolittle (1999)
The concept of "genome" is somewhat ambiguous, referring either to some of the hereditary material carried by an organism or to all of it. It can refer to chromosome(s) or to more than just chromosome(s). Genomes furthermore can consist of DNA, exclusively so for cellular organisms, or of RNA for some viruses. One may or may not also include under the concept of "genome" various epigenetic modifications, e.g., methylation, carried by the genetic material. A genome, however, represents without question some collection of hereditary material that, except for mutation or recombination and perhaps epigenetic modification as well, is not altered going from parent to offspring.
How genomes function is the purview of molecular genetics and, more recently, genomics. There are certain functions that are necessary for a genome to encode. These include those required to assure a genome's replication and segregation to daughter cells along with a genome's gene expression. The latter, especially, should be viewed in terms of transcription (i.e., transcriptomics) but also may be viewed, though to a more limited extent, in terms of the translation of proteins (proteomics). As with much of evolutionary biology and ecology, understanding how "higher level" functions operate, e.g., ecological functions, can be well informed by gaining an understanding of the "lower level" molecular functions associated with gene expression (with "level" being used in an emergent properties sense). Nonetheless, here our emphasis will be limited to how genomes change through time.
Our goal will be to consider especially general principles of how the molecular mechanisms and features associated with genomes can be impacted by natural selection and genetic migration. Emphasis, of course, will be on microorganism genome evolution. Given the ease with which genome sequencing may be accomplished, and the strong emphasis by microbiologists with an evolutionary bent on genotype-based evolutionary concepts, this chapter represents most closely what many would describe as microbial evolution, sensu stricto. Perhaps deviating from expectations, I nonetheless will continue to emphasize an evolutionary ecological perspective in covering issues of microbe genome evolution, that is, fitness and adaptation.
Table: Genome Evolution- and Phylogenetics-Associated Terms and Concepts
| |
Term | Definition/Discussion |
Clade | Group of related species that is both fully inclusive and excludes less closely related species. |
A clade is a monophyletic taxon, that is, a correctly defined taxon. Such a taxon includes a particular, typically ancestral species along with all of the descendants of that species. An example of a clade thus could be all of the upright-walking apes, which would include whatever species served as the original upright-walking ape and all of that species' descendant species, whether extant (i.e., Homo sapiens) or instead extinct. (In this example, note that the upright-walking apes are found nested, presumably, within the "tribe" known as Hominini, which may or may not consist exclusively of upright-walking apes but which by definition does not include the common ancestor between humans and chimpanzees.) While defining clades is crucial to documenting evolutionary relatedness, horizontal gene transfer between species and especially between clades can hugely complicate this documentation, to the point where it can, for microorganisms, be difficult to conclude just what exactly a clade truly represents. More traditionally, it is convergent evolution that has complicated the grouping of organisms into clades. Convergent evolution can give rise to errors in organism classification, but those errors in principle can be overcome. The issue with horizontal transfer, by contrast, is that the complications in fact are not errors but instead simply complications. | |
Cladogram | Visual representation of a phylogeny. |
A cladogram is a graphical representation of phylogenetic relationships. Cladograms visualize in particular how closely related specific groups of individuals (e.g., species) are relative to other groups of individuals. This is accomplished by indicating relations between individuals in terms of their mutual relationship to what typically is a hypothetical common ancestor. The closer that ancestor is in terms of time and/or similarity (particularly for microbes in terms of genetic similarity), the more closely related two individuals are assumed to be, that is, the more recently the two individuals shared a common ancestor. Less closely related individuals will have shared a common ancestor less recently and will be less similar genetically. Cladograms can be constructed in terms of individual genes or instead in terms of whole organisms. The key pieces of information provided by cladograms are branch points, which represent speciation events where some (again, often hypothetical) common ancestor, in this case an individual species, gives rise to two species which, by definition, no longer share gene pools to a substantial extent. Various concepts are relevant to generating cladograms including homoplasies (products of convergent evolution but also evolutionary reversals), plesiomorphies (features present in ancestors to a common ancestor), and synapomorphies (features present in a common ancestor but not its ancestors). Cladograms ideally are constructed on the basis of synapomorphies. In this course it is not the means of generation of cladograms that will be stressed but instead their interpretation once generated, and particularly so in terms of the impact of horizontal gene transfer. | |
Complexity hypothesis | Argument that genes that are more easily integrated into a recipient organism's metabolism following acquisition via horizontal gene transfer are more likely to be retained within a genome. |
Complexity is in terms especially of direct physical interactions between macromolecules such as protein-protein interactions or protein-RNA interactions (such as within ribosomes). The more numerous such interactions, the more important such interactions are to gene product functioning, and the more those interactions vary going from donor to recipient organism, the less likely that a product of horizontal gene transfer will be subject to positive selection, and this is especially so given orthologous replacement. That is, when one gene via homologous recombination comes to replace an equivalent gene within an organism's genome, the more physical interactions that gene or its product has with other genes or gene products, the less likely that the newly received version of the gene will benefit the receiving organism. This is (or would be) particularly so because the received gene will be more likely to be detrimental due to flaws in its ability to interact with other macromolecules previously present within the receiving organism. The more complex the interactions of a component with other components, the more difficult it is to swap out that component for something that is somewhat different. One result of this is that it can be simpler to swap out components as a group of interacting components versus individually (e.g., a complete engine swap versus dropping the cam shaft from one engine into a completely unrelated engine, or a complete bacteriophage tail assembly versus individual genes/proteins making up that tail). | |
Core genes | Sections of organism genomes that are relatively resistant to horizontal gene transfer. |
Core genes are those that serve to define a species and might also serve to define a species' basic ecology. They tend to be subject to stabilizing selection (and hence their retention within genomes) and may be somewhat resistant to orthologous replacement (gene swapping) due to physical associations of their gene products with those of other core genes within their native genomes (i.e., see complexity hypothesis). Due to their resistance to orthologous replacement, core genes can serve to define organismal phylogenies. Note that "core" in this term generally is meant to imply "central" to an organism's functioning or phylogeny, though it also, in light of interactions with other genes, can be viewed as the actual, physical position occupied by these genes (i.e., as their products may be found at the core of multimolecular complexes). This latter connotation I point out, however, only for the purpose of visualization rather than to imply that the products of core genes are necessarily also associated with the "cores" of multimolecular complexes. | |
Functional redundancy | Two sections of genomes that upon expression give rise to equivalent traits. |
When two genes are functionally redundant then the presence of one in a genome does not result in any increase in functionality, particularly as impacts fitness, given the presence of the other as well. This means that a gene acquired via horizontal gene transfer, if it is functionally redundant, will not be subject to positive selection, and thus may be retained within a genome only via luck (that is, genetic drift). Gene duplication events, for example, tend to result in functional redundancy. On the other hand, it is possible for a gene product to be functionally redundant in some ways but not others. As a consequence, a gene can, for example, be completely functionally redundant to another gene under some conditions while under other circumstances not be completely functionally redundant and therefore be at least potentially independently useful. The result can be a selective retention of functional redundancy within genomes that in fact is for reasons other than solely for the sake of retention of functional redundancy, such as optimal functionality of gene products at different temperatures in combination with residual and therefore redundant functionality over a broader range of temperatures. | |
Gene duplication | Gaining of redundant information by a genome through the copying of genetic sections already found within a genome. |
Gene duplication represents a classic explanation for how new genes evolve within organisms, and particularly an explanation that predates more modern appreciation of the importance of horizontal gene transfer particularly to microorganism evolution. The "problem" with gene duplication is that at least initially it can supply no new information and therefore may not be subject to positive selection (assuming that the gene is duplicated in full versus, for example, recombining with other genes to generate novel sequences/genes). A different issue is that it can be difficult to distinguish over longer time periods gene duplication events from gene insertion events that instead are associated with horizontal gene transfer. The consequence of gene duplication events nevertheless is the formation of what are known as paralogs. | |
Gene phylogeny | Description of evolutionary relationships between organisms that may have been impacted by horizontal gene transfer. |
A gene phylogeny is a depiction of the evolutionary relationships as seen among individual alleles. These gene phylogenies may or may not be equivalent to phylogenies associated with the organisms containing the gene or genes in question. This is because, given horizontal gene transfer, individual genes (or larger or smaller sections of genomes) can display evolutionary histories that are independent from each other and thus independent from that of the organism itself. Contrast gene phylogenies, that is, with organismal phylogenies. A result of these independent evolutionary histories (where explicitly what is occurring is gene evolution that takes place while the gene is not yet present within the genome of a given organism) is that phylogenies that are based on individual genes (especially comparisons of otherwise equivalent gene sequences between organisms) will not necessarily be equivalent to the phylogenies based on other genes that nonetheless are found within the same genome. | |
Genome | The bulk of hereditary material associated with individual cells or viruses. |
Genomes can be haploid, diploid, or polyploid, and may or may not conceptually include accessory genetic elements, such as plasmids. Genomes are encoded particularly by chromosomes. Nonetheless, the concept of genome is an ambiguous one that always should be considered to be at least a little bit "fuzzy" unless an author has made an effort to explicitly define what is meant by "genome". Here we will employ the more "fuzzy" sense of what a genome is, using the word to cover a number of circumstances in which the bulk of an organism's hereditary material is being referred to. | |
Homologs (homologues; homologous genes) | Two or more genes that share sequence similarity presumably as a consequence of shared descent. |
Homologs have descended from an ancestral gene that also shared this homology. The concept is not identical to that of homologous chromosome. Note that "Homolog", "Homologue", and "Homologous gene", as used here, are just different spellings of the same term. | |
Monophyletic | Evolutionarily correctly specified taxa. |
Monophyletic refers to a taxon that contains all extant members along with the common ancestor to all extant members. A monophyletic taxon, that is, is a clade. Note that the word "all" is used a little loosely here. That is, while it is correct that in fact all descendants of that common ancestor must be present to make up the monophyly, it is worth keeping in mind that not all of those members will necessarily be known (in fact, there really is no way that all members will be known). Furthermore, it is highly likely that even the common ancestor will not be known. Lastly, especially for fossil species, there will not necessarily be conclusive evidence of membership. Thus, while in taxonomy the designation of monophyletic taxa is the goal, that does not mean that this process is foolproof, easy, or ever really followed through to an unambiguous completion. | |
Mosaic evolution | The impact of successful horizontal gene transfer on organisms. |
Mosaic evolution is the genomic consequence of horizontal gene transfer and represents the existence in genomes of xenologs. Mosaicism by definition represents a lack of congruity between organism and gene phylogenies. In highly mosaic genomes, as one sees with various viruses such as tailed bacteriophages, it can be difficult or even impossible to define core genes or even "what came from what" since ultimately what one is working with is similarities between some segments of genomes with one set of organisms and other similarities for other segments of genomes with other sets of organisms. In other words, a BLAST hit only tells you that two sequences are similar rather than that one gave rise to the other (or, more generally, correlation does not imply causality, nor do extant organisms evolutionarily give rise to one another but instead all evolved from more or less no longer living ancestral organisms). | |
Organismal phylogeny | Description of evolutionary relationships between organisms that are not a consequence of nor have been impacted by horizontal gene transfer. |
With the evolutionary relationships described by organismal phylogenies there is an assumption, often implicit, that some reasonable number of genes found within a taxon, e.g., core genes, possess effectively identical evolutionary histories. In many cases, given appropriate gene choice, this is probably a reasonable assumption, though nonetheless not necessarily an assumption without violation, both since in biology there almost "always" are exceptions and since "appropriate gene choice" really is key. Nevertheless, the important point is that within a given taxon there will be defining synapomorphies and these along with plesiomorphies, so long as these are present as a consequence of truly vertical descent from a common ancestor, will allow the defining of an organismal phylogeny. See, equivalently, species phylogeny. Note that the existence of an organismal phylogeny does not imply that horizontal gene transfer does not occur, nor even that the genes that are used to generate the organismal phylogeny are immune to horizontal gene transfer. Instead, any genes that do undergo substantial changes in the course of horizontal gene transfer (i.e., as following orthologous replacement) are (ideally) not used to generate an organismal phylogeny, while any genes that are used to generate the organismal phylogeny have not been subject to easily recognized change in the course of any horizontal gene transfer. | |
Orthologous replacement | Swapping of one gene for another homologous gene in the course of horizontal gene transfer. |
Orthologous replacement can result in increases in the identity of genes found in different strains or species as a consequence of horizontal gene transfer. Specifically, greater similarity can be seen among two alleles found in two different lineages – if those two alleles are related as a consequence of orthologous replacement – than is consistent with their organismal phylogeny (AA|BBCCBC → AA|BACBC). The result can be a lack of equivalence between gene and organismal phylogenies. Note that orthologous replacement occurs as a consequence of homologous recombination and the retention of these swaps, at least in terms of natural selection, will be a function of whether the new allele can give rise in the recipient organism to the same fitness benefits as supplied by the original allele that has been replaced (note too the concepts considered under the heading of complexity hypothesis). | |
Orthologs (orthologues; orthologous genes) | Divergent nucleotide sequences that are products of vertical inheritance from a common ancestor. |
Orthologs are homologous genes that are shared by two or more species. Numerous orthologs exist, for example, among humans and chimpanzees, or indeed among humans and domestic mice, and the standard explanation, in these species, is that equivalent homologous genes also existed in the common ancestors to the extant species. In species in which horizontal gene transfer occurs to greater extents, such as among bacteria, the explanation for why gene homology may exist between species can be more complicated (see, e.g., orthologous replacement). Commonly, though, orthologs are described as homologs that have been separated by speciation events, which is to say, in the course of the vertical gene transfer events that give rise to organismal phylogenies. Homologs thus are not necessarily orthologs though orthologs are necessarily homologs. Note that the three listed terms (orthologs, orthologues, and orthologous genes) should be viewed as more or less equivalent synonyms. | |
Paralogs (paralogues; paralogous genes) | Products of gene duplication events residing in the same genome. |
Paralogs are homologous genes that are found within the same individual where the similarity is a consequence of a gene duplication event. Note that among multiple loci in multiple species there can be multiple paralogous and orthologous relationships where orthologous relationships are a function of possessing the same locus whereas two paralogs, by definition, will not possess the same locus. As a result, different species can possess both orthologous and paralogous genes, though within individuals only paralogous relationships are possible, i.e., homology shared between different loci. Even more confusingly, gene duplication events may or may not be products of horizontal gene transfer events (i.e., within-genome duplications, e.g., products simply of heterologous recombination, versus insertions of genetic material sourced from different individuals, i.e., sex). | |
Paraphyletic | Evolutionarily incorrectly specified taxa due to omission of legitimate members. |
Paraphyletic refers to a taxon in which a sub-group has been intentionally not included for convenience or historical reasons, e.g., not including birds among dinosaurs or humans among great apes. Paraphylies are not clades and therefore taxonomically are mistakes, but nonetheless these often, at least in modern times, can be intentional mistakes. Most relevant to this text are cases where paraphylies are created as a consequence of horizontal gene transfer, e.g., the separation out of a taxon from a larger taxon due to the presence or absence of a specific endosymbiont. Among microorganisms, when these errors are caught, particularly in the course of nuclear genome sequencing, they can give rise to the merging of groups that previously had been considered to be separate, resulting in new groups that may not make as much "sense" as previous groupings perhaps had (e.g., the grouping of the apicomplexans, which are parasitic protists such as Plasmodium, the cause of malaria, with dinoflagellates, which are often photosynthetic flagellates, as well as with the heterotrophic ciliates, etc.). | |
Phylogeny | Description of the degree to which different organisms, especially different species, are related to each other. |
A phylogeny is a description of the evolutionary relationship of organisms, typically indicating differences in the closeness of the genetic relationship between organisms. In the phylogeny of humans, bonobos, and lemurs, humans thus are more closely related to bonobos than bonobos are related to lemurs, or humans to lemurs (bonobos are also referred to as "pygmy chimpanzees"). Note that graphical representations of phylogenies are described as cladograms. | |
Polyphyletic | Evolutionarily incorrectly specified taxa due to inclusion of illegitimate members. |
Polyphyletic refers to taxa that fail to include all extant members, usually because two or more species have been incorrectly grouped together due to superficial resemblance, e.g., a taxon that consists of the flying vertebrates, birds and bats, but that excludes crocodiles (a moderately bird-like reptile) and lemurs (a mammal). Polyphylies typically result from cases of convergent evolution (i.e., the mistaking of analogies for homologies). The most important thing to recognize about polyphylies is that they are to be avoided as well as eliminated whenever they are identified. Nevertheless, horizontal gene transfer can confuse this issue since it can result in phylogenies that are monophyletic for one gene (a gene phylogeny) that nonetheless, were that phylogeny applied directly to the encoding organisms (organismal phylogeny), may imply a polyphyletic taxon: a mistake, as is often the case for polyphylies, that is a consequence of having been based on insufficient information. | |
Species phylogeny | Description of evolutionary relationships between organisms that are not a consequence of nor have been impacted by horizontal gene transfer. |
See, equivalently, organismal phylogeny. | |
Synologs (synologues; synologous genes) | Two or more loci that are homologous but for unknown reasons. |
These could be orthologs in which the role of horizontal gene transfer in their establishment is unknown. That is, are they or are they not as similar as they are as a consequence of orthologous replacement? Alternatively, are two paralogous genes a consequence of intra-genomic recombination events or instead of inter-genomic recombination (the latter, but not the former, being associated with horizontal gene transfer)? The concept of synolog captures nicely the complexity that arises in determining evolutionary relationships as well as genome evolution in the face of possible but not confirmed horizontal gene transfer events. | |
Taxon (taxa) | Groups of species whose members are more closely related to each other than they are to other such groups. |
Taxa are groupings of organisms according to "blood"/genetic relations where members of a given grouping should be more closely related to each other than any are related to members of other such non-overlapping groupings. For example, members of domain Bacteria should be more closely related to other members of domain Bacteria than they are to members of either domain Archaea or domain Eukarya. As should be a theme by now, the existence of horizontal gene transfer complicates the grouping of organisms into taxa. Higher level taxa that are properly defined, that is, groupings of species, ideally are defined as clades, a.k.a., monophyletic taxa. | |
Xenologs (xenologues; xenologous genes) | Two or more loci that are both homologous and related as a consequence of horizontal gene transfer. |
Xenologs are genes that are homologous in different individuals as a consequence of horizontal gene transfer, including as a consequence of orthologous replacement, rather than due exclusively to vertical descent from a common ancestor. In a sense xenologs therefore are homologs that are products of horizontal gene transfer, whether via orthologous replacement or illegitimate recombination, rather than strictly a consequence of vertical inheritance (i.e., rather than strictly being either orthologs or paralogs). A subset of synologs (HGT uncertain) therefore are xenologs (HGT certain) or instead orthologs or paralogs (not products of HGT). | |
Zone of paralogy | Description of the potential for enhanced likelihood of retention of horizontally acquired products of illegitimate recombination. |
The zone of paralogy, potentially better described as a 'zone of xenology', is an argument that the acquisition of novel functions may be more likely than the orthologous replacement of existing genes with divergent variants. This is particularly so to the extent that acquired xenologs neither disrupt preexisting functions within a genome (i.e., as consistent with the complexity hypothesis) nor are so divergent from an organism's other genes or functions as to be unlikely to display an immediately useful function. The zone of paralogy is not exactly a claim of increased likelihood of acquisition of additional genetic material (i.e., as following illegitimate recombination) versus products of homologous recombination (e.g., orthologous replacement) but instead a claim that products of illegitimate recombination are more easily recognized after the fact than orthologous replacement involving highly similar or instead identical alleles. |
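The relationship between clades, gene phylogenies, and organismal phylogenies described in the table can be made concrete with a small sketch. What follows is only an illustration under stated assumptions, not a method taken from this text: trees are written as rooted, nested Python tuples, the taxon names A through E are hypothetical, and the "gene" tree is an invented case in which the gene found in D was acquired from the lineage leading to A. The functions `leaves`, `clades`, and `is_monophyletic` are illustrative names, not part of any library.

```python
def leaves(tree):
    """Return the set of leaf names found beneath a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return every clade in a rooted tree, each as a frozenset of leaf names."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    found = {leaves(tree)}
    for child in tree:
        found |= clades(child)
    return found


def is_monophyletic(tree, group):
    """A group is monophyletic if it is exactly the leaf set of some node."""
    return frozenset(group) in clades(tree)


# Hypothetical organismal phylogeny: A and B are sister taxa, as are D and E.
organismal = ((("A", "B"), "C"), ("D", "E"))

# Hypothetical gene phylogeny for one gene: the copy in D groups with A,
# as might follow orthologous replacement (an assumption for illustration).
gene = (((("A", "D"), "B"), "C"), "E")

# Groupings supported by the gene tree but absent from the organismal tree.
conflicts = clades(gene) - clades(organismal)

if __name__ == "__main__":
    print(is_monophyletic(organismal, {"A", "B"}))
    print(is_monophyletic(gene, {"A", "B"}))
    for group in sorted(conflicts, key=lambda g: (len(g), sorted(g))):
        print(sorted(group))
```

Here the grouping of A with B is a clade in the organismal tree but not in the gene tree, whereas A with D is a clade only in the gene tree. Applying the gene tree directly to the organisms would therefore suggest a grouping that the organismal phylogeny treats as polyphyletic. Real analyses generally work with unrooted trees, branch support, and many genes, none of which this sketch attempts.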