Integrative Genomic Approaches for Plant Trait Discovery, Fungal Pathogen Identification and Taxonomy 

dc.contributor.author: Belay, Kassaye Hussen [en]
dc.contributor.committeechair: Vinatzer, Boris A. [en]
dc.contributor.committeecochair: Li, Song [en]
dc.contributor.committeemember: Brown, C. Titus [en]
dc.contributor.committeemember: Zeng, Yuan [en]
dc.contributor.committeemember: Haak, David C. [en]
dc.contributor.department: Genetics, Bioinformatics, and Computational Biology [en]
dc.date.accessioned: 2026-03-03T09:00:16Z [en]
dc.date.available: 2026-03-03T09:00:16Z [en]
dc.date.issued: 2026-03-02 [en]
dc.description.abstract: Genomics underpins crop improvement and plant health surveillance, yet its application remains constrained by incomplete detection of complex genetic variation, difficulty recovering pathogen genomes directly from diseased tissues, and inconsistent reference resources for fungal and oomycete identification. This dissertation integrates long-read sequencing, metagenomic approaches, and genome-based classification to address these limitations. First, long-read resequencing of 29 food-grade soybean and edamame genotypes enabled discovery of structural variants (SVs) associated with agronomic and seed-quality traits and established a workflow for SV candidate discovery and marker development. Experimental validation confirmed SV-phenotype relationships, including a 1,443-bp deletion between Kunitz trypsin inhibitor (KTI) genes associated with reduced expression and decreased seed KTI content. Second, long-read metagenomic sequencing was applied to vascular streak dieback, an emerging disease of woody ornamentals in the U.S., in contexts where culturing is infeasible. Sequencing of 106 samples from 34 host species across seven states identified Ceratobasidium sp. as the only pathogen consistently detected across samples and enabled assembly of 17 high-quality genomes. Comparative phylogenomics and pangenome analyses indicated that U.S. isolates form a distinct cluster relative to Ceratobasidium theobromae and revealed gene-content differences, including candidate effectors and secondary metabolite gene clusters, which may contribute to host interaction and support improved diagnostics. Third, this dissertation introduces Myco-genomeRxiv, a web platform implementing an ANI-based Life Identification Number (LIN) system for genome-based identification and strain typing of fungi and oomycetes.
Populated with 19,155 genomes from the NCBI Assembly database, the system uses genome-based classification to flag misassigned taxonomic identifiers and likely contamination and circumscribes 17,702 putative species using existing genome membership or a provisional 99% ANI threshold. Collectively, these studies integrate long-read sequencing, metagenomics, and genome-scale classification into a unified framework that expands discovery of trait-associated variation, enables genome-resolved investigation of disease from complex plant samples, and improves the stability and reproducibility of fungal and oomycete taxonomy for agricultural, clinical, and biosecurity applications. [en]
dc.description.abstractgeneral: Improving crops and managing plant diseases increasingly depends on rapid, accurate genomic analysis. However, many agriculturally important crop variants are missed by short-read sequencing, and numerous plant diseases are caused by fungi and oomycetes that are difficult to culture and to identify reliably due to incomplete or inconsistent reference databases. This dissertation applies long-read sequencing, direct DNA sequencing from diseased plant tissues, and genome-scale classification to address these limitations and to make genomic information more application-relevant for breeding, diagnostics, and biosecurity. In the first part, long-read whole-genome resequencing was used to characterize structural variants (large DNA changes, including insertions and deletions) in 29 soybean genotypes and to link these variants to agronomic and seed-quality traits. This work generated a high-coverage long-read dataset, discovered previously unrecognized structural variants associated with phenotypic differences, and experimentally validated specific deletions that alter gene expression and seed composition. In the second part, long-read metagenomic sequencing (direct sequencing of DNA from symptomatic plant tissues) was applied to vascular streak dieback of woody ornamentals in the United States. Sequencing 106 diseased samples from seven states enabled consistent detection of the associated Ceratobasidium sp. fungus and recovery of 17 genomes. Finally, the dissertation introduces Myco-genomeRxiv, a web platform that identifies fungi and oomycetes using whole-genome similarity. Populated with 19,155 genomes, the system supports rapid, standardized identification and strain-level typing, and helps detect mislabeling and contamination in public genome collections.
Overall, this work demonstrates that integrating long-read sequencing with genome-based identification can enhance trait discovery for crop improvement, support genome-resolved investigation of emerging diseases directly from plant samples, and provide a more robust framework for fungal and oomycete identification in research, agricultural, and biosecurity settings. [en]
dc.description.degree: Doctor of Philosophy [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:45681 [en]
dc.identifier.uri: https://hdl.handle.net/10919/141634 [en]
dc.language.iso: en [en]
dc.publisher: Virginia Tech [en]
dc.rights: Creative Commons Attribution-NonCommercial 4.0 International [en]
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/ [en]
dc.subject: structural variation [en]
dc.subject: metagenomics [en]
dc.subject: pathogen detection [en]
dc.subject: classification [en]
dc.subject: long-read sequencing [en]
dc.subject: nanopore sequencing [en]
dc.title: Integrative Genomic Approaches for Plant Trait Discovery, Fungal Pathogen Identification and Taxonomy [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Genetics, Bioinformatics, and Computational Biology [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Doctor of Philosophy [en]

Files

Original bundle
Name: Belay_KH_D_2026.pdf
Size: 16.08 MB
Format: Adobe Portable Document Format