Integrative Genomic Approaches for Plant Trait Discovery, Fungal Pathogen Identification and Taxonomy 

Date

2026-03-02

Publisher

Virginia Tech

Abstract

Genomics underpins crop improvement and plant health surveillance, yet its application remains constrained by incomplete detection of complex genetic variation, difficulty recovering pathogen genomes directly from diseased tissues, and inconsistent reference resources for fungal and oomycete identification. This dissertation integrates long-read sequencing, metagenomic approaches, and genome-based classification to address these limitations. First, long-read resequencing of 29 food-grade soybean and edamame genotypes enabled discovery of structural variants (SVs) associated with agronomic and seed-quality traits and established a workflow for SV candidate discovery and marker development. Experimental validation confirmed SV-phenotype relationships, including a 1,443-bp deletion between Kunitz trypsin inhibitor (KTI) genes associated with reduced expression and decreased seed KTI content. Second, long-read metagenomic sequencing was applied to vascular streak dieback, an emerging disease of woody ornamentals in the U.S., in contexts where culturing is infeasible. Sequencing of 106 samples from 34 host species across seven states identified Ceratobasidium sp. as the only pathogen consistently detected across samples and made it possible to assemble 17 high-quality genomes. Comparative phylogenomics and pangenome analyses indicated that U.S. isolates form a distinct cluster relative to Ceratobasidium theobromae and revealed gene-content differences, including candidate effectors and secondary metabolite gene clusters, which may contribute to host interaction and support improved diagnostics. Third, this dissertation introduces Myco-genomeRxiv, a web platform implementing an ANI-based Life Identification Number (LIN) system for genome-based identification and strain typing of fungi and oomycetes. Populated with 19,155 genomes from the NCBI Assembly database, the system uses genome-based classification to flag misassigned taxonomic identifiers and likely contamination and circumscribes 17,702 putative species using existing genome membership or a provisional 99% ANI threshold. Collectively, these studies integrate long-read sequencing, metagenomics, and genome-scale classification into a unified framework that expands discovery of trait-associated variation, enables genome-resolved investigation of disease from complex plant samples, and improves the stability and reproducibility of fungal and oomycete taxonomy for agricultural, clinical, and biosecurity applications.

Keywords

structural variation, metagenomics, pathogen detection, classification, long-read sequencing, nanopore sequencing
