Recall that evolution can be considered from two perspectives, roughly fitness (or adaptation) versus change in allele frequency. Alternatively, one can view these differences as corresponding (also roughly) to phenotype versus genotype, or instead to mechanisms of natural selection versus the consequences of evolutionary forces in general. Throughout this text the emphasis tends toward phenotypic/fitness/natural selection considerations, though the other evolutionary forces (i.e., mutation, drift, and migration) are fairly well considered too. In this section, by contrast, the emphasis is more toward genetic change per se. The reason for this change of emphasis is twofold. The first is that I will consider issues of phylogenetics, i.e., the evolutionary and especially macroevolutionary relationships between individual organisms. Much of phylogenetics, in this modern era, stems from an analysis of nucleotide sequence, that is, of genotype. The second reason is that I also consider microevolutionary forces, especially migration and natural selection, as they shape genomes, which generally are also analyzed at the level of nucleotide sequence.
Notwithstanding this emphasis on genotype, I want to take a moment to discuss phenotype and its relationship to genotype and therefore to genomes. Genomes contain genes that are expressed through transcription and, often, through translation as well. The resulting proteins play structural, regulatory, and catalytic roles that give rise to metabolism, physiology, morphology, and behavior. The latter, collectively, can be described as organismal properties, which in turn can both impact environments beyond the expressing organism and display emergent properties when the characteristics of multiple organisms are combined. Thus, genomes produce gene products which together, and often in complicated ways, interface with environments as phenotypes. Environments, in turn, feed back on both the molecular and the organismal levels, modifying genotype evolutionarily, i.e., as a consequence of natural selection, while also modifying phenotype directly and thus giving rise to physiological, morphological, or behavioral plasticity. Collectively, we can describe phenotypic modifications that are not associated with genotypic modifications as phenotypic plasticity.
These various organismal properties can be described in terms of Darwinian fitness. It is important, however, to keep in mind that fitness can have many components and, further, that adaptive solutions in one area can result in reduced adaptation in another (i.e., antagonistic pleiotropy/tradeoffs). Genomes, as genotype, overall display propensities to survive and reproduce that are functions of how their phenotypes interface with environments and, in the course of these interactions and especially in terms of population characteristics, genomes are themselves modified. My goal here is the identification and discussion of general principles regarding those modifications. These include basic principles of phylogenetics, which ultimately, especially in this modern era, is a description of genome-genome relatedness; the concept of species and speciation, which can be viewed in terms of genome reproductive or recombinogenic isolation from other genomes; and how natural selection can give rise to a retention of certain genomic features, such as linked genes, while rejecting others.
Phylogenetics is the study of the evolutionary interrelationship of organisms. Originally a phenotypic undertaking, modern phylogenetics has become essentially synonymous with molecular phylogenetics. This is especially so for microbial phylogenetics, where phenotype information is less useful to phylogenetic reconstruction and genotype information is almost painfully abundant. The overarching triumph of molecular microbial phylogenetics was the discovery of domain Archaea and the subsequent classification of all cellular organisms on the basis of small subunit ribosomal RNA gene sequence (ssu rRNA; i.e., 16S and 18S rRNAs). The perhaps less stellar but no less significant discovery has been that of extensive horizontal gene transfer among organisms, especially microorganisms. In both cases what at first appeared to be anomalous results – a new form of life or evolutionary trees that differ for different genes within the same organisms – came to be viewed as profound alterations in our basic understanding of organismal relationship and genomic evolution.
To some degree the utility of the domain system of cellular-organism classification – as outstanding and useful as it otherwise clearly is – is in conflict with the view that horizontal gene transfer is so pervasive. The conflict is that, on the one hand, we now have a means of unambiguously distinguishing and classifying lineages (i.e., as based on ssu rRNA gene sequence), whereas on the other hand we can no longer be fully sure of how or why such unambiguous differentiation of lineages is even possible. That is, if microorganisms can so readily exchange DNA, then how is it that ssu rRNA genes themselves are not also exchanged, thereby invalidating otherwise carefully honed phylogenies? Alternatively, if everything (seemingly) but ssu rRNA genes may be horizontally exchanged, then what meaning is one left with vis-à-vis ssu rRNA sequence and resulting phylogenies? The answer, seemingly, is that horizontal gene exchange has limits, and those limits are (perhaps) what microorganism phylogenies actually represent, with ssu rRNA phylogenies only one of many ways that the limitations to horizontal gene exchange may be classified. Another way of stating these points is that "it must be admitted (i) that it is not logical to equate gene phylogeny and organismal phylogeny and (ii) that, unless organisms are construed as either less or more than the sum of their genes, there is no unique organismal phylogeny" (Doolittle, 1999). Alternatively, one can say that "a 'vapour' of transient genes surrounds a stable set of core genes" (Gogarten and Townsend, 2005).
A phylogeny is a description of organismal relatedness, often as viewed from above the level of the species, that is, of macroevolutionary relationships. Typically phylogenies are presented as cladograms in which the important features are the relative locations of nodes, which can be viewed as speciation points, i.e., the occurrence of divergence between lineages. Absolute distances between both nodes and organisms ideally can represent actual temporal distances between organisms and ancestors (or descendants), but usually instead represent degrees of divergence of features, such as of nucleotide sequence. Ideally phylogenies allow for the assemblage of species or isolates into monophyletic groupings, that is, into taxa, which one otherwise calls clades. A clade consists of a common ancestor together with all of its descendants, extant or extinct, and excludes ancestors that predate that common ancestor.
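As a rough illustration of what "degree of divergence" in nucleotide sequence can mean, the sketch below computes the proportion of aligned sites at which two sequences differ (often called a p-distance). This is only one simple measure among many, and it is not necessarily the one used to build any particular tree. The taxon names and sequences are hypothetical, invented here for illustration, and the sequences are assumed to be already aligned.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites at which two aligned sequences differ.

    Assumes the sequences are already aligned and of equal length.
    Sites with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped sites
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical aligned fragments (not real data).
toy_alignment = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTACGTAT",
    "taxon_C": "ACGAACCTAT",
}

# Pairwise distances for every unordered pair of taxa.
names = sorted(toy_alignment)
distances = {
    (x, y): p_distance(toy_alignment[x], toy_alignment[y])
    for i, x in enumerate(names)
    for y in names[i + 1:]
}

for pair, d in distances.items():
    print(pair, round(d, 2))
```

In this toy case taxon_A and taxon_B differ at the fewest sites, so a distance-based tree would place them closest together. Whether such distances also reflect elapsed time depends on assumptions about rates of change that the sketch does not address.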
Thus, for example, the clade of humans and chimpanzees (including bonobos) is represented by all extant human and chimpanzee-like species as well as the common ancestor to all human and chimpanzee-like species. This clade, however, does not include the common ancestor to, for example, humans, chimpanzees, and gorillas, which is a more distant relative but which would be included in the human-chimpanzee-gorilla clade. Note that clades can overlap hierarchically, i.e., the human-chimpanzee clade is nested within the human-chimpanzee-gorilla clade. Note also that horizontal gene transfer complicates clade-based classification schemes. For example, imagine the classification problems were it the case that genes had been transferred directly between humans and gorillas such that some human and gorilla genes were substantially more closely related to each other than those genes were related to those of chimpanzees.
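The nesting of clades, and the trouble that the imagined human-gorilla gene transfer would cause, can be sketched with trees written as nested tuples. This is a minimal illustration rather than a phylogenetic method: the tuple representation, the function names, and the orangutan outgroup are all assumptions introduced here, and the gene tree is purely hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    result = frozenset()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """Return every clade with two or more members, each as a frozenset."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clades(child)
    return found


# Organismal tree as described in the text; the orangutan outgroup is an
# assumption added only so that the tree has a root.
organismal = ((("human", "chimpanzee"), "gorilla"), "orangutan")

# Hypothetical gene tree for the imagined human-gorilla transfer.
gene = ((("human", "gorilla"), "chimpanzee"), "orangutan")

human_chimp = frozenset({"human", "chimpanzee"})
human_chimp_gorilla = frozenset({"human", "chimpanzee", "gorilla"})

org_clades = clades(organismal)
gene_clades = clades(gene)

# The human-chimpanzee clade exists and is nested in the larger clade.
print(human_chimp in org_clades, human_chimp < human_chimp_gorilla)

# The transferred gene does not support the human-chimpanzee clade.
print(human_chimp in gene_clades)

# Groupings supported by one tree but not the other.
print(sorted(sorted(c) for c in org_clades ^ gene_clades))
```

The groupings that appear in only one of the two trees are exactly the points at which the gene phylogeny and the organismal phylogeny disagree.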
Two common sources of error arise in the course of determining clades. The first is the generation of polyphyletic taxa. This occurs when the common ancestor to two (or more) species is thought to have existed later in evolutionary time than it actually did. That is, two lineages are thought to be more closely related than they actually are. These errors typically occur as a consequence of convergent evolution, where similar adaptations are observed in otherwise ecologically similar organisms but where those adaptations, in fact, do not stem from a common ancestor that also displayed the adaptations in question (that is, they are not synapomorphies). The great utility of molecular phylogenetics, in addition to the ease with which data (nucleotide sequence) may be collected, along with a relative lack of ambiguity in those data, is the potential to avoid creating polyphylies.
An example of such errors could be a grouping of Staphylococcus aureus with Escherichia coli based solely on each displaying a facultatively anaerobic oxygen requirement. Alternatively, horizontal gene transfer can give rise to polyphyletic classifications that in fact seem to be supported by sequence relationship, though here the problem can be viewed as too little data, i.e., insufficient sequence data to indicate the horizontal transfer origin, the 'vapour' status, of a given feature. Nonetheless, the result can be gene phylogenies, such as perhaps those of genes underlying fermentation and cellular respiration, that are not necessarily identical to organismal phylogenies.
Even without considering phenotypic similarities, an important route towards what can appear to be convergence, such as of sequence data, is horizontal gene transfer. In this modern age of sequence-based phylogenetics, what once were described as polyphylies, where phylogenies are essentially assembled on the mistaken conclusion that one or more features of different organisms are identical by descent, are today described as gene phylogenies that deviate from organismal phylogenies. That is, a conclusion that is not actually mistaken is made, namely that one or more genes are identical by descent, but the resulting tree describes two groups as being more closely related than they actually are, and therefore as possessing a more recent common ancestor than they actually have. The mistake here is not in assigning common descent to two features that lack it (e.g., bird and beetle wings) but instead in assuming that common descent stems from vertical rather than horizontal inheritance, which especially with microorganisms is often not the case.
The second common error in phylogenetic reconstruction is the generation of so-called paraphyletic taxa. These can be viewed as errors, but ones that are intentional. In a paraphyletic taxon a descendant species is not classified with other descendants of an ancestral species. An example of a paraphyletic taxon would be dinosaurs defined so as to exclude birds, even though birds are dinosaur descendants (specifically of theropod dinosaurs). Uncorrected paraphyletic taxa can be considered to represent consensus value-judgments by researchers, that is, to some extent they are useful despite being misleading in a phylogenetic sense.
In contrast to paraphyletic taxa, polyphyletic taxa are simply mistakes. The question then is what to do when in fact monophyletic gene-based phylogenies, which are not incorrect but nonetheless are not a consequence of solely vertical inheritance, give rise to organismal phylogenies that mistakenly appear to be polyphyletic. To some degree, however, we can disregard this concern if we simply accept that there can be conflicts between gene-based and more genome-based organismal phylogenies.
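The distinctions drawn above among monophyletic, paraphyletic, and polyphyletic groupings can be sketched as a check on a tree's leaves. The rule used here is a leaf-based simplification and an assumption of this example: a group is called paraphyletic if the descendants it leaves out form a single clade of their own, and polyphyletic otherwise. A fuller treatment would ask whether the group's defining feature was present in the common ancestor. The tree is a deliberately simplified toy, and the function names are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    result = frozenset()
    for child in tree:
        result |= leaves(child)
    return result


def mrca(tree, group):
    """Smallest subtree whose leaves include every member of group."""
    if not isinstance(tree, str):
        for child in tree:
            if group <= leaves(child):
                return mrca(child, group)
    return tree


def classify(tree, group):
    """Label a set of leaves using a simplified, leaf-based rule."""
    group = frozenset(group)
    if not group or not group <= leaves(tree):
        raise ValueError("group must be a non-empty subset of the leaves")
    ancestor = mrca(tree, group)
    excluded = leaves(ancestor) - group
    if not excluded:
        return "monophyletic"
    # Assumption: excluded descendants forming one clade means paraphyly.
    if not (leaves(mrca(ancestor, excluded)) & group):
        return "paraphyletic"
    return "polyphyletic"


# Simplified toy tree: one genus stands in for each lineage.
toy_tree = ("Crocodylus",
            ("Triceratops",
             ("Brachiosaurus",
              ("Tyrannosaurus", "Gallus"))))

# Theropod dinosaur plus bird: a complete clade.
print(classify(toy_tree, {"Tyrannosaurus", "Gallus"}))

# "Dinosaurs without birds": one descendant lineage is left out.
print(classify(toy_tree, {"Triceratops", "Brachiosaurus", "Tyrannosaurus"}))

# A hypothetical grouping of two distantly related lineages.
print(classify(toy_tree, {"Crocodylus", "Tyrannosaurus"}))
```

Note that a check of this kind only evaluates a grouping against one given tree. If that tree is a gene tree shaped by horizontal transfer, a grouping can come out as monophyletic for the gene while being polyphyletic for the organisms, which is the conflict described above.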