Tackling the current limitations of bacterial taxonomy with genome-based classification and identification on a crowdsourcing Web service

DC Field | Value | Language
dc.contributor.author | Tian, Long | en
dc.contributor.committeechair | Vinatzer, Boris A. | en
dc.contributor.committeemember | Heath, Lenwood S. | en
dc.contributor.committeemember | Marek, Paul E. | en
dc.contributor.committeemember | Zhang, Liqing | en
dc.contributor.department | Genetics, Bioinformatics, and Computational Biology | en
dc.date.accessioned | 2021-04-18T06:00:28Z | en
dc.date.available | 2021-04-18T06:00:28Z | en
dc.date.issued | 2019-10-25 | en
dc.description.abstract | Bacterial taxonomy is the science of classifying, naming, and identifying bacteria. The scope and practice of taxonomy have evolved throughout history with our understanding of life and with our growing and changing needs in research, medicine, and industry. As in animal and plant taxonomy, the species is the fundamental unit of taxonomy, but the genetic and phenotypic diversity within a single bacterial species is substantially higher than within an animal or plant species. Therefore, the current "type"-centered classification scheme, which describes a species based on a single type strain, is not sufficient to classify bacterial diversity. This is particularly true for human, animal, and plant pathogens, for which disease outbreaks must be traced back to their source. Here we discuss the current needs and limitations of classic bacterial taxonomy and introduce LINbase, a Web service that not only implements current species-based bacterial taxonomy but also addresses its limitations. It does so by providing a new framework for genome sequence-based classification and identification that is independent of the type-centric species. LINbase uses a sequence similarity-based framework to cluster bacteria into hierarchical taxa, which we call LINgroups, at multiple levels of relatedness. It crowdsources users' expertise by encouraging them to circumscribe these groups as taxa from the genus level to the intraspecies level. Using the LINbase Web interface, users can circumscribe a group of bacteria as a LINgroup, add a phenotypic description, and give the LINgroup a name. This lets them share new taxa instantly and complements the lengthy and laborious process of publishing a named species. Furthermore, unknown isolates can be identified immediately as members of a newly described LINgroup using fast and precise algorithms based on their genome sequences, allowing species- and intraspecies-level identification. The employed algorithms combine the alignment-based algorithm BLASTN with the alignment-free method Sourmash, which is based on k-mers and the MinHash algorithm. The potential of LINbase is demonstrated using examples of plant pathogenic bacteria. | en
dc.description.abstractgeneral | Life is always easier when people talk to each other in the same language. Taxonomy is the language that biologists use to communicate about life by 1. classifying organisms into groups, 2. giving names to these groups, and 3. identifying individuals as members of these named groups. When most scientists and the general public think of taxonomy, they think of the hierarchical structure of “Life”, “Domain”, “Kingdom”, “Phylum”, “Class”, “Order”, “Family”, “Genus”, and “Species”. However, the basic goal of taxonomy is to allow the identification of an organism as a member of a group that is predictive of its characteristics. Taxonomy also provides a name for communicating about that group with other scientists and the public. In the world of microorganisms, taxonomy is extremely important, since there are an estimated 10,000,000 to 1,000,000,000 different bacterial species. Moreover, microbiologists and pathologists need to consider differences among bacterial isolates even within the same species, a level that the current taxonomic system does not cover. Therefore, we developed a Web service, LINbase, which uses genome sequences to classify individual microbial isolates. The database at the back end of LINbase assigns Life Identification Numbers (LINs) that express how individual microbial isolates are related to each other above, at, and below the species level. LINbase is designed to be an interactive, Web-based encyclopedia of microorganisms. There, users can share everything they know about microorganisms, whether individual isolates or groups of isolates, for professional and scientific purposes. To build LINbase, efficient computer programs were developed and implemented. To show how LINbase can be used, several groups of bacteria that cause plant diseases were classified and described. | en
dc.description.degree | Doctor of Philosophy | en
dc.format.medium | ETD | en
dc.identifier.other | vt_gsexam:22137 | en
dc.identifier.uri | http://hdl.handle.net/10919/103055 | en
dc.publisher | Virginia Tech | en
dc.rights | Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International | en
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | en
dc.subject | Bacterial taxonomy | en
dc.subject | average nucleotide identity | en
dc.subject | ANI | en
dc.subject | min-wise independent permutations | en
dc.subject | locality sensitive hashing | en
dc.subject | MinHash | en
dc.subject | Web service | en
dc.subject | crowdsourcing | en
dc.title | Tackling the current limitations of bacterial taxonomy with genome-based classification and identification on a crowdsourcing Web service | en
dc.type | Dissertation | en
thesis.degree.discipline | Genetics, Bioinformatics, and Computational Biology | en
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en
thesis.degree.level | doctoral | en
thesis.degree.name | Doctor of Philosophy | en
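
The abstract states that identification combines BLASTN with Sourmash, which relies on k-mers and the MinHash algorithm. The Python sketch below illustrates only the MinHash idea. It breaks a genome into canonical k-mers, keeps the smallest hash values as a bottom-k sketch, and estimates the Jaccard similarity of two genomes from their sketches. It is not code from LINbase or sourmash. The function names, the k-mer size of 21, the sketch size of 1,000, and the use of BLAKE2b as the hash function are all assumptions chosen for illustration.

```python
from __future__ import annotations

import hashlib
import random

# Maps each base to its complement; used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int):
    """Yield canonical k-mers: the lexicographic minimum of a k-mer and its reverse complement."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing ambiguous bases such as N
        rc = kmer.translate(_COMPLEMENT)[::-1]
        yield min(kmer, rc)


def hash64(kmer: str) -> int:
    """64-bit hash of a k-mer (BLAKE2b here, an assumption; sourmash uses its own hash)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def bottom_k_sketch(seq: str, k: int = 21, num: int = 1000) -> set[int]:
    """Bottom-k MinHash sketch: the `num` smallest distinct k-mer hash values."""
    hashes = {hash64(kmer) for kmer in canonical_kmers(seq, k)}
    return set(sorted(hashes)[:num])


def jaccard_estimate(a: set[int], b: set[int], num: int = 1000) -> float:
    """Estimate the Jaccard similarity of two k-mer sets from their bottom-k sketches."""
    union_bottom = set(sorted(a | b)[:num])  # bottom-k sketch of the union
    if not union_bottom:
        return 0.0
    shared = union_bottom & a & b  # hashes of the union sketch present in both genomes
    return len(shared) / len(union_bottom)


if __name__ == "__main__":
    random.seed(0)
    genome = "".join(random.choice("ACGT") for _ in range(50_000))
    variant = list(genome)
    for i in range(0, len(variant), 500):  # introduce one substitution every 500 bp
        variant[i] = "A" if variant[i] != "A" else "C"
    variant = "".join(variant)
    print(jaccard_estimate(bottom_k_sketch(genome), bottom_k_sketch(variant)))
```

Because each sketch keeps a fixed number of hashes, comparing two genomes costs roughly the same regardless of genome size. This is what makes alignment-free screening fast before a slower alignment-based comparison such as BLASTN.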

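The general abstract describes LINs as numbers that express how isolates are related above, at, and below the species level. The Python sketch below shows one way such codes could be assigned. It assumes that each LIN position corresponds to an average nucleotide identity (ANI) threshold, and that a new genome copies the positions it shares with its closest match, then takes the next unused value at the first position it does not share. The threshold values, the function names (shared_positions, assign_lin, in_lingroup), and the rule that the last position always differs are hypothetical and are not taken from LINbase. The ANI value itself would come from an alignment tool such as BLASTN, which is not reproduced here.

```python
from __future__ import annotations

# Hypothetical ANI thresholds (in %), one per LIN position, from coarse to fine.
# Illustrative values only, not the thresholds used by LINbase.
THRESHOLDS: tuple[float, ...] = (70, 75, 80, 85, 90, 95, 96, 97, 98, 98.5, 99, 99.5, 99.9, 99.99)

Lin = tuple[int, ...]


def shared_positions(ani: float, thresholds: tuple[float, ...] = THRESHOLDS) -> int:
    """Count the leading thresholds met by the ANI between a new genome and its closest match."""
    count = 0
    for t in thresholds:
        if ani < t:
            break
        count += 1
    return count


def assign_lin(existing: list[Lin], best_match: Lin | None, ani: float,
               thresholds: tuple[float, ...] = THRESHOLDS) -> Lin:
    """Assign a LIN to a new genome, given the LIN of its closest match and their ANI."""
    n = len(thresholds)
    if best_match is None:  # first genome in the database
        return (0,) * n
    # Assumption: the last position always differs, so every genome gets a unique LIN.
    shared = min(shared_positions(ani, thresholds), n - 1)
    prefix = best_match[:shared]
    # Next unused value at the first non-shared position among LINs with the same prefix.
    used = [lin[shared] for lin in existing if lin[:shared] == prefix]
    new_value = max(used, default=-1) + 1
    return prefix + (new_value,) + (0,) * (n - shared - 1)


def in_lingroup(lin: Lin, lingroup_prefix: Lin) -> bool:
    """Model a LINgroup as all LINs that start with a given prefix."""
    return lin[:len(lingroup_prefix)] == lingroup_prefix


if __name__ == "__main__":
    db: list[Lin] = []
    first = assign_lin(db, None, 0.0)
    db.append(first)
    second = assign_lin(db, first, 96.5)  # meets the thresholds up to 96%
    db.append(second)
    third = assign_lin(db, first, 80.5)  # meets only the 70%, 75%, and 80% thresholds
    db.append(third)
    for lin in db:
        print(",".join(map(str, lin)))
    print(in_lingroup(second, first[:7]))  # True: both share the first seven positions
```

In this model, a shared prefix defines a group at every level of relatedness at once. Identifying an unknown isolate as a member of a LINgroup then reduces to a prefix comparison once its LIN is known.
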
Files

Original bundle
Name: Tian_L_D_2019.pdf
Size: 13.07 MB
Format: Adobe Portable Document Format

Name: Tian_L_D_2019_support_1.pdf
Size: 39.54 KB
Format: Adobe Portable Document Format
Description: Supporting documents