Named Entity Recognition for Bacterial Type IV Secretion Systems

Field | Value | Language
dc.contributor.author | Ananiadou, Sophia | en
dc.contributor.author | Sullivan, Dan | en
dc.contributor.author | Black, William | en
dc.contributor.author | Levow, Gina-Anne | en
dc.contributor.author | Gillespie, Joseph J. | en
dc.contributor.author | Mao, Chunhong | en
dc.contributor.author | Pyysalo, Sampo | en
dc.contributor.author | Kolluru, BalaKrishna | en
dc.contributor.author | Tsujii, Junichi | en
dc.contributor.author | Sobral, Bruno | en
dc.date.accessed | 2014-04-19 | en
dc.date.accessioned | 2014-06-17T20:12:05Z | en
dc.date.available | 2014-06-17T20:12:05Z | en
dc.date.issued | 2011-03-29 | en
dc.description.abstract | Research on specialized biological systems is often hampered by a lack of consistent terminology, especially across species. In bacterial Type IV secretion systems, genes within one set of orthologs may have over a dozen different names. Classifying research publications based on biological processes, cellular components, molecular functions, and microorganism species should improve the precision and recall of literature searches, allowing researchers to keep up with the exponentially growing literature, through resources such as the Pathosystems Resource Integration Center (PATRIC, patricbrc.org). We developed named entity recognition (NER) tools for four entities related to Type IV secretion systems: 1) bacteria names, 2) biological processes, 3) molecular functions, and 4) cellular components. These four entities are important to pathogenesis and virulence research but have received less attention than other entities, e.g., genes and proteins. Based on an annotated corpus, large domain terminological resources, and machine learning techniques, we developed recognizers for these entities. High accuracy rates (>80%) are achieved for bacteria, biological processes, and molecular function. Contrastive experiments highlighted the effectiveness of alternate recognition strategies; results of term extraction on contrasting document sets demonstrated the utility of these classes for identifying T4SS-related documents. | en
dc.description.sponsorship | This work was supported by the National Institutes of Health (http://www.nih.gov/) [grant numbers HHSN272200900040C and HHSN266200400035C] to BWSS. Sophia Ananiadou and Gina-Anne Levow acknowledge support by the Biotechnology and Biological Sciences Research Council (http://www.bbsrc.ac.uk/) through grant numbers BBS/B/13640, BB/F006039/1. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. | en
dc.identifier.citation | Ananiadou S, Sullivan D, Black W, Levow G-A, Gillespie JJ, et al. (2011) Named Entity Recognition for Bacterial Type IV Secretion Systems. PLoS ONE 6(3): e14780. doi:10.1371/journal.pone.0014780 | en
dc.identifier.doi | https://doi.org/10.1371/journal.pone.0014780 | en
dc.identifier.issn | 1932-6203 | en
dc.identifier.uri | http://hdl.handle.net/10919/48981 | en
dc.identifier.url | http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0014780 | en
dc.language.iso | en_US | en
dc.publisher | Public Library of Science | en
dc.rights | In Copyright | en
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en
dc.subject | Bacteria | en
dc.subject | Bacterial Pathology | en
dc.subject | Gene ontologies | en
dc.subject | Machine learning | en
dc.subject | Named entity recognition | en
dc.subject | Secretion system | en
dc.subject | Syntax | en
dc.subject | Text mining | en
dc.title | Named Entity Recognition for Bacterial Type IV Secretion Systems | en
dc.title.serial | PLoS ONE | en
dc.type | Article - Refereed | en
Files
Original bundle
Name: journal_pone_0014780.pdf
Size: 316.38 KB
Format: Adobe Portable Document Format