Computational Tools for Improved Detection, Identification, and Classification of Plant Pathogens Using Genomics and Metagenomics

Date

2023-02-13

Publisher

Virginia Tech

Abstract

Plant pathogens are one of the biggest threats to plant health and food security worldwide. To contain plant disease outbreaks effectively, accurate classification and precise identification of pathogens are crucial for determining treatment and preventive measures. Conventional detection methods such as PCR may not be sufficient when the pathogen in question is unknown. Advances in sequencing technology have made it possible to sequence entire genomes and metagenomes in real time and at relatively low cost, opening an opportunity to develop alternative methods for detecting novel and unknown plant pathogens. Within this dissertation, an integrated approach is used to reclassify a high-impact group of plant pathogens. Additionally, the application of metagenomics and nanopore sequencing using the Oxford Nanopore Technologies (ONT) MinION for the detection and precise identification of fungal and bacterial plant pathogens is demonstrated. To improve the classification of strains belonging to the Ralstonia solanacearum species complex (RSSC), we performed a meta-analysis using comparative genomics and reverse ecology approaches to accurately portray and refine the understanding of the diversity and evolution of the RSSC. The groups identified by these approaches were circumscribed and made publicly available through the LINbase web server so that future isolates can be properly classified. To develop a culture-free method for detecting plant pathogens, we used metagenomes of various plants and long-read nanopore sequencing to identify plant pathogens precisely to the strain level and performed phylogenetic analysis with SNP resolution. In the first paper, we used tomato plants to demonstrate the power of this approach for detecting bacterial plant pathogens, and we compared bioinformatics tools for strain-level detection using reads and assemblies. In the second paper, we used a read-based approach to test whether the methodology can precisely detect the fungal pathogen that causes boxwood blight. Lastly, as nanopore sequencing improved, we used grapevine petioles to investigate whether we could go beyond detection and identification and perform a phylogenetic analysis. We assembled a metagenome-assembled genome (MAG) of almost the same quality as genomes obtained from cultured isolates and performed a phylogenetic analysis with SNP resolution. Finally, for cases in which the database may contain no genome related to the pathogen in question, we used machine learning and metagenomics to develop a reference-free approach to plant disease detection. We trained eight different machine learning models with reads from healthy and infected plant metagenomes and compared how accurately each model classified reads as belonging to a healthy or an infected plant. In this comparison, random forest was the best model in terms of the computational resources required while maintaining high accuracy (> 0.90).
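As an illustration of how sequencing reads might be turned into numeric input for a reference-free classifier of the kind described above, the following minimal Python sketch converts a read into a normalized k-mer frequency vector. The abstract does not state which features the dissertation used, so the k-mer representation, the function name `kmer_profile`, and the default `k = 3` are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product
from typing import List


def kmer_profile(read: str, k: int = 3) -> List[float]:
    """Return normalized k-mer frequencies for one read.

    Hypothetical helper for illustration; not the dissertation's method.
    Features are ordered lexicographically over the alphabet A, C, G, T,
    so every read maps to a vector of length 4**k.
    """
    read = read.upper()
    # Count every overlapping window of length k in the read.
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    # Fixed vocabulary: k-mers containing other symbols (e.g. N) are ignored.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in vocab)
    if total == 0:
        # Read shorter than k, or no unambiguous k-mers: all-zero vector.
        return [0.0] * len(vocab)
    return [counts[kmer] / total for kmer in vocab]


# Example: one feature vector per read, paired with a healthy/infected label.
example_reads = ["ACGTACGTGG", "TTGACCANTA"]
features = [kmer_profile(r, k=2) for r in example_reads]
```

Vectors like these, paired with a label for each read (healthy or infected), could then be passed to any standard classifier, for example a random forest implementation such as scikit-learn's `RandomForestClassifier`.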

Keywords

Metagenomics, plant disease detection, plant pathogen identification, long-read sequencing, nanopore sequencing
