The Phylogenetic Reconstruction of the Grass Family (Poaceae) Using matK Gene Sequences
Liang, Hongping III
Comparative DNA sequencing of matK, a maturase-encoding gene located within the intron of the chloroplast trnK gene, was evaluated for phylogenetic utility above the family level and within the grass family (Poaceae). The research had three major objectives: first, to study the utility of the matK gene in plant evolution; second, to characterize the matK gene in the grass family; and third, to address phylogenetic questions in the Poaceae using matK sequences from representatives of different grass groups. To study the potential application of matK to plant systematics above the family level, eleven complete sequences from GenBank, representing seed plants and liverworts, and nine partial sequences generated for genera representing the monocot families Poaceae, Joinvilleaceae, Cyperaceae, and Smilacaceae were analyzed. The study underscored the following useful properties of the matK gene for phylogenetic reconstruction: reasonable size (1500 bp), a high rate of substitution, a large proportion of variation at the first and second codon positions, a low transition-transversion ratio, and the presence of mutationally conserved sectors. Analyses using different sectors of the gene and the cumulative inclusion of informative sites showed that the 3' region was the most useful in resolving phylogeny, and that the topology and robustness of the tree reached a plateau after the inclusion of 100 informative sites. The presence of a relatively conserved 3' region and a less conserved 5' region provides two sets of characters that can be used at different taxonomic levels, from the tribal to the division level. The study also demonstrated the potential of partial sequencing for resolving systematic relationships from the tribe to the division level.
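As an illustration of the transition-transversion ratio mentioned above, the following minimal Python sketch counts transitions (purine to purine or pyrimidine to pyrimidine) and transversions between aligned sequences and averages the ratio over all sequence pairs. The function names and the pairwise-averaging scheme are assumptions made for illustration; the abstract does not state how the ratio was actually computed.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences."""
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip identical sites, gaps, and ambiguity codes.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def mean_pairwise_ts_tv(alignment):
    """Average tr/tv over all sequence pairs (hypothetical averaging scheme).

    Pairs with no transversions are skipped to avoid division by zero.
    """
    ratios = []
    for a, b in combinations(alignment.values(), 2):
        ts, tv = count_ts_tv(a, b)
        if tv:
            ratios.append(ts / tv)
    return sum(ratios) / len(ratios) if ratios else float("nan")


# Toy alignment, not real matK data.
toy = {
    "taxon1": "ACGTACGT",
    "taxon2": "GCGTACTT",
    "taxon3": "ACGTCCGA",
}
print(mean_pairwise_ts_tv(toy))
```

On real data, the ratio would normally be estimated from a full alignment, often under a substitution model rather than by raw pairwise counts.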
The matK gene in the Poaceae was characterized with complete sequences from 11 grass genera, representing 7 subfamilies and 11 tribes, and one outgroup (Joinvillea plicata, Joinvilleaceae). The alignment of 1632 base pairs from 14 species yielded a data set of 601 (36.8%) variable sites and 246 (15.1%) informative sites. Variation at the nucleotide and amino acid levels is distributed throughout the entire gene, although the 5' region appears to have more variation than the 3' region. The number of changes at the third codon position is very low compared with the combined total at the first and second positions. This has led to a similar pattern of variation at the nucleotide and amino acid levels. The average tr/tv ratio generated from the 14 entire matK sequences is 1.29. Intriguingly, the tr/tv ratios appear to be related to gene region. RASA analysis of the alignment data indicated a relatively high phylogenetic signal in the data set of 14 taxa. When the gene was analyzed in two halves, the tRASA of the 5' half (0.43) was not significant, whereas the 3' half showed a significant phylogenetic signal. Among the five sectors of the 14 entire matK sequences, only the fourth sector contains a statistically significant phylogenetic signal. These results indicate that matK is a phylogenetically valuable gene and that the 3' region of the matK gene contains strong phylogenetic information. A single most parsimonious tree was obtained from the 246 informative sites of the 14 entire matK sequences. Seven major groups were well resolved on the most parsimonious tree, corresponding to the seven commonly recognized subfamilies: Arundinoideae, Bambusoideae, Centothecoideae, Chloridoideae, Panicoideae, Pooideae, and Oryzoideae. Approximately 960 base pairs of the matK gene were sequenced from grass species representing 48 genera, 21 tribes, and seven subfamilies to reconstruct a phylogeny for the Poaceae. Joinvillea plicata (Joinvilleaceae) was used as the outgroup species.
The aligned sequences showed that 495 nucleotides (51%) were variable and 390 (36%) were phylogenetically informative. RASA indicated that a highly significant phylogenetic signal exists in this data set. The cumulative addition of informative sites starting at the internal end of the sequences revealed that at 300 sites, tree topology and bootstrap values matched those of the consensus tree based on the entire sequence. Parsimony analyses using PAUP resulted in six most parsimonious trees and a strict consensus tree showing major lineages supported by high bootstrap values. These lineages corresponded to six subfamilies: Bambusoideae, Oryzoideae, Pooideae, Chloridoideae, Panicoideae, and Arundinoideae. The Bambusoideae, including woody and herbaceous taxa, diverged as the most basal lineage, and the monophyletic oryzoid species formed a sister group. The Chloridoideae, Panicoideae, Arundinoideae, and the centothecoid Zeugitis (the PACC group) emerged as a monophyletic assemblage with 95% bootstrap support. The Aristideae branched off as a monophyletic line basal to the chloridoid clade. The Stipeae appeared as a sister taxon to the Pooideae. The matK-based phylogeny did not reveal a major dichotomy in the family. The matK gene provided sequence information sufficient for good resolution of the major grass lineages.
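The variable and informative site counts reported above can be illustrated with a short Python sketch. It assumes the standard parsimony definition, under which a site is variable if it shows at least two character states and informative if at least two states each occur in two or more taxa. The function names and the toy alignment are hypothetical, and gaps and ambiguity codes are simply ignored here, which may differ from how the original analysis treated them.

```python
from collections import Counter


def classify_sites(alignment):
    """Return (variable, informative) column counts for aligned sequences."""
    sequences = [s.upper() for s in alignment.values()]
    variable = informative = 0
    for column in zip(*sequences):
        # Tally unambiguous bases only; gaps and ambiguity codes are ignored.
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) < 2:
            continue  # constant site
        variable += 1
        # Parsimony-informative: at least two states, each in >= 2 taxa.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return variable, informative


def percent(part, total):
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100 * part / total, 1)


# Toy alignment, not real matK data.
toy = {
    "taxon1": "ACGTAA",
    "taxon2": "ACGTCA",
    "taxon3": "ATGACG",
    "taxon4": "ATGAGG",
}
print(classify_sites(toy))   # (4, 3)
print(percent(601, 1632))    # 36.8
print(percent(246, 1632))    # 15.1
```

The last two lines reproduce the percentages quoted for the 1632 bp alignment of complete sequences (601 variable and 246 informative sites).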
- Doctoral Dissertations