Rethinking phylogenetics using Caryophyllales (angiosperms), matK gene and trnK intron as experimental platform

Virginia Tech

The recent call to reconstruct a detailed picture of the tree of life for all organisms has forever changed the field of molecular phylogenetics. Sequencing technology has improved to the point that scientists can now routinely sequence complete plastid and mitochondrial genomes, so vast amounts of data can be used to reconstruct phylogenies. These data are accumulating in DNA sequence repositories such as GenBank, where everyone can benefit from the rapid growth of information. The trend toward datasets rich in genomic regions has far outpaced the expansion of datasets through sampling a broader array of taxa. We show here that expanding a dataset with GenBank data, by adding both genomic regions and sampled species, can provide a robust phylogeny for the plant order Caryophyllales (angiosperms), despite the missing data inherent in GenBank-derived matrices. We also investigate the utility of the trnK intron for phylogeny reconstruction at a relatively deep evolutionary level (the caryophyllid order) by comparing it with the rapidly evolving matK gene. We show that the trnK intron is comparable to matK in the proportion of variable sites, the proportion of parsimony-informative sites, the distribution of those sites among rate classes, and phylogenetic informativeness across the history of the order. This is especially useful because the trnK intron is often sequenced together with matK: including it increases the phylogenetic utility of a single genomic region (rapidly evolving matK/trnK) and saves time and resources. Finally, we show that including RNA-edited sites in datasets for phylogeny reconstruction did not appear to affect resolution or support in the Gnetales, indicating that edited sites at such low proportions need not be a consideration when building datasets.
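The comparison of matK and the trnK intron rests on two standard per-site counts: a site is variable if it shows more than one character state, and parsimony-informative if at least two states each occur in at least two taxa. The sketch below computes both for an aligned DNA matrix. It is an illustration, not the procedure used in this work; the function name `site_stats`, the choice to ignore gaps, `?` and `N` as missing data, and the treatment of IUPAC ambiguity codes as ordinary states are all assumptions.

```python
from typing import Dict, List

# Assumption: gaps, '?' and 'N' are treated as missing data and ignored.
MISSING = set("-?N")


def site_stats(alignment: Dict[str, str]) -> Dict[str, int]:
    """Count variable and parsimony-informative sites in an aligned DNA matrix.

    Hypothetical helper for illustration; ambiguity codes (R, Y, ...) are
    simply counted as distinct states.
    """
    seqs: List[str] = [s.upper() for s in alignment.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    variable = informative = 0
    for i in range(length):
        # Tally the character states observed at this column.
        counts: Dict[str, int] = {}
        for s in seqs:
            base = s[i]
            if base in MISSING:
                continue
            counts[base] = counts.get(base, 0) + 1
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states, each shared by at least two taxa.
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative += 1
    return {"sites": length, "variable": variable, "informative": informative}


if __name__ == "__main__":
    toy = {"A": "ACGTAC", "B": "ACGTTC", "C": "AGGATC", "D": "TGGAA-"}
    print(site_stats(toy))  # {'sites': 6, 'variable': 4, 'informative': 3}
```

In the toy matrix the first column is variable but not informative (a single T), while columns 2, 4 and 5 each split the taxa two against two; the gap in the last column is ignored, leaving that column invariant.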
We also propose an alternative start codon for matK in Ephedra, based on the presence in several species of a 38 base pair indel that would otherwise result in premature stop codons, and we report 20 RNA-edited sites in two Zamiaceae and three Pinaceae species.
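The reasoning behind an alternative start codon can be illustrated by reading a sequence in frame from two candidate starts and checking whether a stop codon appears before the final codon. The sketch below does this on a toy sequence; it is not the analysis performed for Ephedra. The helper names `in_frame_stops` and `is_premature` are hypothetical, and it assumes the stop codons TAA, TAG and TGA, which are shared by the standard and the bacterial/plastid genetic codes.

```python
from typing import List

# Stop codons of the standard and bacterial/plastid genetic codes.
STOPS = {"TAA", "TAG", "TGA"}


def in_frame_stops(seq: str, start: int) -> List[int]:
    """Return 0-based positions of stop codons read in frame from `start`."""
    seq = seq.upper()
    return [i for i in range(start, len(seq) - 2, 3) if seq[i:i + 3] in STOPS]


def is_premature(seq: str, start: int) -> bool:
    """True if any in-frame stop codon occurs before the last complete codon."""
    last_codon = start + 3 * ((len(seq) - start) // 3 - 1)
    return any(pos < last_codon for pos in in_frame_stops(seq, start))


if __name__ == "__main__":
    toy = "ATGAAATAGCCCATGCCCGGGTAA"
    # Reading from the first ATG hits TAG early; the downstream ATG does not.
    print(is_premature(toy, 0), is_premature(toy, 12))  # True False
```

On real data the candidate starts would come from an alignment of matK sequences rather than a toy string, and a frame-shifting indel is exactly what moves a downstream stop codon into or out of frame.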

gnetophytes, RNA editing, matK, trnK intron, caryophyllids, missing data, phylogeny