Tackling the current limitations of bacterial taxonomy with genome-based classification and identification on a crowdsourcing Web service
MetadataShow full item record
Bacterial taxonomy is the science of classifying, naming, and identifying bacteria. The scope and practice of taxonomy has evolved through history with our understanding of life and our growing and changing needs in research, medicine, and industry. As in animal and plant taxonomy, the species is the fundamental unit of taxonomy, but the genetic and phenotypic diversity that exists within a single bacterial species is substantially higher compared to animal or plant species. Therefore, the current "type"-centered classification scheme that describes a species based on a single type strain is not sufficient to classify bacterial diversity, in particular in regard to human, animal, and plant pathogens, for which it is necessary to trace disease outbreaks back to their source. Here we discuss the current needs and limitations of classic bacterial taxonomy and introduce LINbase, a Web service that not only implements current species-based bacterial taxonomy but complements its limitations by providing a new framework for genome sequence-based classification and identification independently of the type-centric species. LINbase uses a sequence similarity-based framework to cluster bacteria into hierarchical taxa, which we call LINgroups, at multiple levels of relatedness and crowdsources users' expertise by encouraging them to circumscribe these groups as taxa from the genus-level to the intraspecies-level. Circumscribing a group of bacteria as a LINgroup, adding a phenotypic description, and giving the LINgroup a name using the LINbase Web interface allows users to instantly share new taxa and complements the lengthy and laborious process of publishing a named species. Furthermore, unknown isolates can be identified immediately as members of a newly described LINgroup with fast and precise algorithms based on their genome sequences, allowing species- and intraspecies-level identification. The employed algorithms are based on a combination of the alignment-based algorithm BLASTN and the alignment-free method Sourmash, which is based on k-mers, and the MinHash algorithm. The potential of LINbase is shown by using examples of plant pathogenic bacteria.
- Doctoral Dissertations