Browsing by Author "Mao, Chunhong"
Now showing 1 - 19 of 19
- Analysis of tall fescue ESTs representing different abiotic stresses, tissue types and developmental stages
  Mian, M. A. Rouf; Zhang, Yan; Wang, Zeng-Yu; Zhang, Ji-Yi; Cheng, Xiaofei; Chen, Lei; Chekhovskiy, Konstantin; Dai, Xinbin; Mao, Chunhong; Cheung, Foo; Zhao, Xuechun; He, Ji; Scott, Angela D.; Town, Christopher D.; May, Gregory D. (2008-03-04)
  Background: Tall fescue (Festuca arundinacea Schreb.) is a major cool season forage and turf grass species grown in the temperate regions of the world. In this paper we report the generation of a tall fescue expressed sequence tag (EST) database developed from nine cDNA libraries representing tissues from different plant organs, developmental stages, and abiotic stress factors. The results of inter-library and library-specific in silico expression analyses of these ESTs are also reported. Results: A total of 41,516 ESTs were generated from nine cDNA libraries of tall fescue representing tissues from different plant organs, developmental stages, and abiotic stress conditions. The Festuca Gene Index (FaGI) has been established. To date, this represents the first publicly available tall fescue EST database. In silico gene expression studies using these ESTs were performed to understand stress responses in tall fescue. A large number of ESTs of known stress response genes were identified from stressed tissue libraries. These ESTs represent gene homologues of heat-shock and oxidative stress proteins, and various transcription factor protein families. Highly expressed ESTs representing genes of unknown functions were also identified in the stressed tissue libraries. Conclusion: FaGI provides a useful resource for genomics studies of tall fescue and other closely related forage and turf grass species. Comparative genomic analyses between tall fescue and other grass species, including ryegrasses (Lolium sp.), meadow fescue (F. pratensis) and tetraploid fescue (F. arundinacea var. glaucescens) will benefit from this database. These ESTs are an excellent resource for the development of simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) PCR-based molecular markers.
- Anopheles mosquitoes reveal new principles of 3D genome organization in insects
  Lukyanchikova, Varvara; Nuriddinov, Miroslav; Belokopytova, Polina; Taskina, Alena; Liang, Jiangtao; Reijnders, Maarten J. M. F.; Ruzzante, Livio; Feron, Romain; Waterhouse, Robert M.; Wu, Yang; Mao, Chunhong; Tu, Zhijian Jake; Sharakhov, Igor V.; Fishman, Veniamin (Nature Portfolio, 2022-04-12)
  Chromosomes are hierarchically folded within cell nuclei into territories, domains and subdomains, but the functional importance and evolutionary dynamics of these hierarchies are poorly defined. Here, we comprehensively profile genome organizations of five Anopheles mosquito species and show how different levels of chromatin architecture influence each other. Patterns observed on Hi-C maps are associated with known cytological structures, epigenetic profiles, and gene expression levels. Evolutionary analysis reveals conservation of chromatin architecture within synteny blocks for tens of millions of years and enrichment of synteny breakpoints in regions with increased genomic insulation. However, in-depth analysis shows a confounding effect of gene density on both insulation and distribution of synteny breakpoints, suggesting limited causal relationship between breakpoints and regions with increased genomic insulation. At the level of individual loci, we identify specific, extremely long-ranged looping interactions, conserved for ~100 million years. We demonstrate that the mechanisms underlying these looping contacts differ from previously described Polycomb-dependent interactions and clustering of active chromatin.
- Antimicrobial Resistance Prediction in PATRIC and RAST
  Davis, James J.; Boisvert, Sebastien; Brettin, Thomas; Kenyon, Ronald W.; Mao, Chunhong; Olson, Robert D.; Overbeek, Ross; Santerre, John; Shukla, Maulik; Wattam, Alice R.; Will, Rebecca; Xia, Fangfang; Stevens, Rick L. (Springer Nature, 2016-06-14)
  The emergence and spread of antimicrobial resistance (AMR) mechanisms in bacterial pathogens, coupled with the dwindling number of effective antibiotics, has created a global health crisis. Being able to identify the genetic mechanisms of AMR and predict the resistance phenotypes of bacterial pathogens prior to culturing could inform clinical decision-making and improve reaction time. At PATRIC (http://patricbrc.org/), we have been collecting bacterial genomes with AMR metadata for several years. In order to advance phenotype prediction and the identification of genomic regions relating to AMR, we have updated the PATRIC FTP server to enable access to genomes that are binned by their AMR phenotypes, as well as metadata including minimum inhibitory concentrations. Using this infrastructure, we custom built AdaBoost (adaptive boosting) machine learning classifiers for identifying carbapenem resistance in Acinetobacter baumannii, methicillin resistance in Staphylococcus aureus, and beta-lactam and co-trimoxazole resistance in Streptococcus pneumoniae with accuracies ranging from 88-99%. We also did this for isoniazid, kanamycin, ofloxacin, rifampicin, and streptomycin resistance in Mycobacterium tuberculosis, achieving accuracies ranging from 71-88%. This set of classifiers has been used to provide an initial framework for species-specific AMR phenotype and genomic feature prediction in the RAST and PATRIC annotation services.
- The Beginning of the End: A Chromosomal Assembly of the New World Malaria Mosquito Ends with a Novel Telomere
  Compton, Austin; Liang, Jiangtao; Chen, Chujia; Lukyanchikova, Varvara; Qi, Yumin; Potters, Mark B.; Settlage, Robert; Miller, Dustin; Deschamps, Stephane; Mao, Chunhong; Llaca, Victor; Sharakhov, Igor V.; Tu, Zhijian Jake (Genetics Society of America, 2020-10-01)
  Chromosome level assemblies are accumulating in various taxonomic groups including mosquitoes. However, even in the few reference-quality mosquito assemblies, a significant portion of the heterochromatic regions including telomeres remain unresolved. Here we produce a de novo assembly of the New World malaria mosquito, Anopheles albimanus, by integrating Oxford Nanopore sequencing, Illumina, Hi-C and optical mapping. This 172.6 Mbps female assembly, which we call AalbS3, is obtained by scaffolding polished large contigs (contig N50 = 13.7 Mbps) into three chromosomes. All chromosome arms end with telomeric repeats, which is the first in mosquito assemblies and represents a significant step toward the completion of a genome assembly. These telomeres consist of tandem repeats of a novel 30-32 bp Telomeric Repeat Unit (TRU) and are confirmed by analyzing the termini of long reads and through both chromosomal in situ hybridization and a Bal31 sensitivity assay. The AalbS3 assembly included previously uncharacterized centromeric and rDNA clusters and more than doubled the content of transposable elements and other repetitive sequences. This telomere-to-telomere assembly, although still containing gaps, represents a significant step toward resolving biologically important but previously hidden genomic components. The comparison of different scaffolding methods will also inform future efforts to obtain reference-quality genomes for other mosquito species.
- Bioinformatic Analysis of Coronary Disease Associated SNPs and Genes to Identify Proteins Potentially Involved in the Pathogenesis of Atherosclerosis
  Mao, Chunhong; Howard, Timothy D.; Sullivan, Dan; Fu, Zongming; Yu, Guoqiang; Parker, Sarah J.; Will, Rebecca; Vander Heide, Richard S.; Wang, Yue; Hixson, James; Van Eyk, Jennifer; Herrington, David M. (Open Access Pub, 2017-03-04)
  Factors that contribute to the onset of atherosclerosis may be elucidated by bioinformatic techniques applied to multiple sources of genomic and proteomic data. The results of genome wide association studies, such as the CardioGramPlusC4D study, expression data, such as that available from expression quantitative trait loci (eQTL) databases, along with protein interaction and pathway data available in Ingenuity Pathway Analysis (IPA), constitute a substantial set of data amenable to bioinformatics analysis. This study used bioinformatic analyses of recent genome wide association data to identify a seed set of genes likely associated with atherosclerosis. The set was expanded to include protein interaction candidates to create a network of proteins possibly influencing the onset and progression of atherosclerosis. Local average connectivity (LAC), eigenvector centrality, and betweenness metrics were calculated for the interaction network to identify top gene and protein candidates for a better understanding of the atherosclerotic disease process. The top ranking genes included some known to be involved with cardiovascular disease (APOA1, APOA5, APOB, APOC1, APOC2, APOE, CDKN1A, CXCL12, SCARB1, SMARCA4 and TERT), and others that are less obvious and require further investigation (TP53, MYC, PPARG, YWHAQ, RB1, AR, ESR1, EGFR, UBC and YWHAZ). Collectively these data help define a more focused set of genes that likely play a pivotal role in the pathogenesis of atherosclerosis and are therefore natural targets for novel therapeutic interventions.
- The chromosome-scale genome assembly for the West Nile vector Culex quinquefasciatus uncovers patterns of genome evolution in mosquitoes
  Ryazansky, Sergei S.; Chen, Chujia; Potters, Mark; Naumenko, Anastasia N.; Lukyanchikova, Varvara; Masri, Reem A.; Brusentsov, Ilya I.; Karagodin, Dmitriy A.; Yurchenko, Andrey A.; dos Anjos, Vitor L.; Haba, Yuki; Rose, Noah H.; Hoffman, Jinna; Guo, Rong; Menna, Theresa; Kelley, Melissa; Ferrill, Emily; Schultz, Karen E.; Qi, Yumin; Sharma, Atashi; Deschamps, Stéphane; Llaca, Victor; Mao, Chunhong; Murphy, Terence D.; Baricheva, Elina M.; Emrich, Scott; Fritz, Megan L.; Benoit, Joshua B.; Sharakhov, Igor V.; McBride, Carolyn S.; Tu, Zhijian; Sharakhova, Maria V. (2024-01-25)
  Background: Understanding genome organization and evolution is important for species involved in transmission of human diseases, such as mosquitoes. Anophelinae and Culicinae subfamilies of mosquitoes show striking differences in genome sizes, sex chromosome arrangements, behavior, and ability to transmit pathogens. However, the genomic basis of these differences is not fully understood. Methods: In this study, we used a combination of advanced genome technologies such as Oxford Nanopore Technology sequencing, Hi-C scaffolding, Bionano, and cytogenetic mapping to develop an improved chromosome-scale genome assembly for the West Nile vector Culex quinquefasciatus. Results: We then used this assembly to annotate odorant receptors, odorant binding proteins, and transposable elements. A genomic region containing male-specific sequences on chromosome 1 and a polymorphic inversion on chromosome 3 were identified in the Cx. quinquefasciatus genome. In addition, the genome of Cx. quinquefasciatus was compared with the genomes of other mosquitoes such as malaria vectors An. coluzzii and An. albimanus, and the vector of arboviruses Ae. aegypti. Our work confirms significant expansion of the two chemosensory gene families in Cx. quinquefasciatus, as well as a significant increase and relocation of the transposable elements in both Cx. quinquefasciatus and Ae. aegypti relative to the Anophelines. Phylogenetic analysis clarifies the divergence time between the mosquito species. Our study provides new insights into chromosomal evolution in mosquitoes and finds that the X chromosome of Anophelinae and the sex-determining chromosome 1 of Culicinae have a significantly higher rate of evolution than autosomes. Conclusion: The improved Cx. quinquefasciatus genome assembly uncovered new details of mosquito genome evolution and has the potential to speed up the development of novel vector control strategies.
- Comparative nutritional and chemical phenome of Clostridium difficile isolates determined using phenotype microarrays
  Scaria, Joy; Chen, Jenn-Wei; Useh, Nicodemus; He, Hongxuan; McDonough, Sean P.; Mao, Chunhong; Sobral, Bruno; Chang, Yung-Fu (International Society for Infectious Diseases, 2014-10)
  Objectives: Clostridium difficile infection (CDI) is the leading cause of infectious diarrhea in North America and Europe. The risk of CDI increases significantly in the case where antimicrobial treatment reduces the number of competing bacteria in the gut, thus leading to the increased availability of nutrients and loss of colonization resistance. The objective of this study was to determine comprehensive nutritional utilization and the chemical sensitivity profile of historic and newer C. difficile isolates and to examine the possible role of the phenotype diversity in C. difficile virulence. Methods: Phenotype microarrays (PMs) were used to elucidate the complete nutritional and chemical sensitivity profile of six C. difficile isolates. Results: Of the 760 nutrient sources tested, 285 compounds were utilized by at least one strain. Among the C. difficile isolates compared, R20291, a recent hypervirulent outbreak-associated strain, appears to have an expanded nutrient utilization profile when compared to all other strains. Conclusions: The expanded nutritional utilization profile of some newer C. difficile strains could be one of the reasons for infections in patients who are not exposed to the hospital environment or not undergoing antibiotic treatment. This nutritional profile could be used to design tube feeding formulas that reduce the risk of CDI.
- Differential Stress Transcriptome Landscape of Historic and Recently Emerged Hypervirulent Strains of Clostridium difficile Strains Determined Using RNA-seq
  Scaria, Joy; Mao, Chunhong; Chen, Jenn-Wei; McDonough, Sean P.; Sobral, Bruno; Chang, Yung-Fu (Public Library of Science, 2014-11-07)
  C. difficile is the most common cause of nosocomial diarrhea in North America and Europe. Genomes of individual strains of C. difficile are highly divergent. To determine how divergent strains respond to environmental changes, the transcriptomes of two historic and two recently isolated hypervirulent strains were analyzed following nutrient shift and osmotic shock. Illumina based RNA-seq was used to sequence these transcriptomes. Our results reveal that although C. difficile strains contain a large number of shared and strain specific genes, the majority of the differentially expressed genes were core genes. We also detected a number of transcriptionally active regions that were not part of the primary genome annotation. Some of these are likely to be small regulatory RNAs.
- Genomic composition and evolution of Aedes aegypti chromosomes revealed by the analysis of physically mapped supercontigs
  Timoshevskiy, Vladimir A.; Kinney, Nicholas A.; deBruyn, Becky S.; Mao, Chunhong; Tu, Zhijian Jake; Severson, D. W.; Sharakhov, Igor V.; Sharakhova, Maria V. (Biomed Central, 2014-04-14)
  Background: An initial comparative genomic study of the malaria vector Anopheles gambiae and the yellow fever mosquito Aedes aegypti revealed striking differences in the genome assembly size and in the abundance of transposable elements between the two species. However, the chromosome arms homology between An. gambiae and Ae. aegypti, as well as the distribution of genes and repetitive elements in chromosomes of Ae. aegypti, remained largely unexplored because of the lack of a detailed physical genome map for the yellow fever mosquito. Results: Using a molecular landmark-guided fluorescent in situ hybridization approach, we mapped 624-Mb of the Ae. aegypti genome to mitotic chromosomes. We used this map to analyze the distribution of genes, tandem repeats and transposable elements along the chromosomes and to explore the patterns of chromosome homology and rearrangements between Ae. aegypti and An. gambiae. The study demonstrated that the q arm of the sex-determining chromosome 1 had the lowest gene content and the highest density of minisatellites. A comparative genomic analysis with An. gambiae determined that the previously proposed whole-arm synteny is not fully preserved; a number of pericentric inversions have occurred between the two species. The sex-determining chromosome 1 had a higher rate of genome rearrangements than observed in autosomes 2 and 3 of Ae. aegypti. Conclusions: The study developed a physical map of 45% of the Ae. aegypti genome and provided new insights into genomic composition and evolution of Ae. aegypti chromosomes. Our data suggest that minisatellites rather than transposable elements played a major role in rapid evolution of chromosome 1 in the Aedes lineage. The research tools and information generated by this study contribute to a more complete understanding of the genome organization and evolution in mosquitoes.
- Guy1, a Y-linked embryonic signal, regulates dosage compensation in Anopheles stephensi by increasing X gene expression
  Qi, Yumin; Wu, Yang; Saunders, Randy; Chen, Xiaoguang; Mao, Chunhong; Biedler, James K.; Tu, Zhijian Jake (2019-03-19)
  We previously showed that Guy1, a primary signal expressed from the Y chromosome, is a strong candidate for a male-determining factor that confers female-specific lethality in Anopheles stephensi (Criscione et al., 2016). Here, we present evidence that Guy1 increases X gene expression in Guy1-transgenic females from two independent lines, providing a mechanism underlying the Guy1-conferred female lethality. The median level gene expression (MGE) of X-linked genes is significantly higher than autosomal genes in Guy1-transgenic females while there is no significant difference in MGE between X and autosomal genes in wild-type females. Furthermore, Guy1 significantly upregulates at least 40% of the 996 genes across the X chromosome in transgenic females. Guy1-conferred female-specific lethality is remarkably stable and completely penetrant. These findings indicate that Guy1 regulates dosage compensation in An. stephensi and components of dosage compensation may be explored to develop novel strategies to control mosquito-borne diseases.
- Identification of new genes in Sinorhizobium meliloti using the Genome Sequencer FLX system
  Mao, Chunhong; Evans, Clive; Jensen, Roderick V.; Sobral, Bruno (2008-05-02)
  Background: Sinorhizobium meliloti is an agriculturally important model symbiont. There is an ongoing need to update and improve its genome annotation. In this study, we used a high-throughput pyrosequencing approach to sequence the transcriptome of S. meliloti, and search for new bacterial genes missed in the previous genome annotation. This is the first report of sequencing a bacterial transcriptome using the pyrosequencing technology. Results: Our pilot sequencing run generated 19,005 reads with an average length of 136 nucleotides per read. From these data, we identified 20 new genes. These new gene transcripts were confirmed by RT-PCR and their possible functions were analyzed. Conclusion: Our results indicate that high-throughput sequence analysis of bacterial transcriptomes is feasible and next-generation sequencing technologies will greatly facilitate the discovery of new genes and improve genome annotation.
- Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center
  Wattam, Alice R.; Davis, James J.; Assaf, Rida; Boisvert, Sebastien; Brettin, Thomas; Bun, Christopher; Conrad, Neal; Dietrich, Emily M.; Disz, Terry L.; Gabbard, Joseph L.; Gerdes, Svetlana; Henry, Christopher S.; Kenyon, Ronald W.; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K.; Olsen, Gary J.; Murphy-Olson, Daniel E.; Olson, Robert D.; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D.; Shukla, Maulik; Vonstein, Veronika; Warren, Andrew S.; Xia, Fangfang; Yoo, Hyunseung; Stevens, Rick L. (2017-01-04)
  The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by 'virtual integration' to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.
- Mapping the Regulatory Network for Salmonella enterica Serovar Typhimurium Invasion
  Smith, Carol; Stringer, Anne M.; Mao, Chunhong; Palumbo, Michael J.; Wade, Joseph T. (American Society for Microbiology, 2016-09)
  Salmonella enterica pathogenicity island 1 (SPI-1) encodes proteins required for invasion of gut epithelial cells. The timing of invasion is tightly controlled by a complex regulatory network. The transcription factor (TF) HilD is the master regulator of this process and senses environmental signals associated with invasion. HilD activates transcription of genes within and outside SPI-1, including six other TFs. Thus, the transcriptional program associated with host cell invasion is controlled by at least 7 TFs. However, very few of the regulatory targets are known for these TFs, and the extent of the regulatory network is unclear. In this study, we used complementary genomic approaches to map the direct regulatory targets of all 7 TFs. Our data reveal a highly complex and interconnected network that includes many previously undescribed regulatory targets. Moreover, the network extends well beyond the 7 TFs, due to the inclusion of many additional TFs and noncoding RNAs. By comparing gene expression profiles of regulatory targets for the 7 TFs, we identified many uncharacterized genes that are likely to play direct roles in invasion. We also uncovered cross talk between SPI-1 regulation and other regulatory pathways, which, in turn, identified gene clusters that likely share related functions. Our data are freely available through an intuitive online browser and represent a valuable resource for the bacterial research community. IMPORTANCE: Invasion of epithelial cells is an early step during infection by Salmonella enterica and requires secretion of specific proteins into host cells via a type III secretion system (T3SS). Most T3SS-associated proteins required for invasion are encoded in a horizontally acquired genomic locus known as Salmonella pathogenicity island 1 (SPI-1). Multiple regulators respond to environmental signals to ensure appropriate timing of SPI-1 gene expression. In particular, there are seven transcription regulators that are known to be involved in coordinating expression of SPI-1 genes. We have used complementary genome-scale approaches to map the gene targets of these seven regulators. Our data reveal a highly complex and interconnected regulatory network that includes many previously undescribed target genes. Moreover, our data functionally implicate many uncharacterized genes in the invasion process and reveal cross talk between SPI-1 regulation and other regulatory pathways. All datasets are freely available through an intuitive online browser.
- Named Entity Recognition for Bacterial Type IV Secretion Systems
  Ananiadou, Sophia; Sullivan, Dan; Black, William; Levow, Gina-Anne; Gillespie, Joseph J.; Mao, Chunhong; Pyysalo, Sampo; Kolluru, BalaKrishna; Tsujii, Junichi; Sobral, Bruno (Public Library of Science, 2011-03-29)
  Research on specialized biological systems is often hampered by a lack of consistent terminology, especially across species. In bacterial Type IV secretion systems, genes within one set of orthologs may have over a dozen different names. Classifying research publications based on biological processes, cellular components, molecular functions, and microorganism species should improve the precision and recall of literature searches, allowing researchers to keep up with the exponentially growing literature through resources such as the Pathosystems Resource Integration Center (PATRIC, patricbrc.org). We developed named entity recognition (NER) tools for four entities related to Type IV secretion systems: 1) bacteria names, 2) biological processes, 3) molecular functions, and 4) cellular components. These four entities are important to pathogenesis and virulence research but have received less attention than other entities, e.g., genes and proteins. Based on an annotated corpus, large domain terminological resources, and machine learning techniques, we developed recognizers for these entities. High accuracy rates (>80%) are achieved for bacteria, biological processes, and molecular function. Contrastive experiments highlighted the effectiveness of alternate recognition strategies; results of term extraction on contrasting document sets demonstrated the utility of these classes for identifying T4SS-related documents.
- Nix alone is sufficient to convert female Aedes aegypti into fertile males and myo-sex is needed for male flight
  Aryan, Azadeh; Anderson, Michelle A. E.; Biedler, James K.; Qi, Yumin; Overcash, Justin M.; Naumenko, Anastasia N.; Sharakhova, Maria V.; Mao, Chunhong; Adelman, Zach N.; Tu, Zhijian Jake (NAS, 2020-06-12)
  A dominant male-determining locus (M-locus) establishes the male sex (M/m) in the yellow fever mosquito, Aedes aegypti. Nix, a gene in the M-locus, was shown to be a male-determining factor (M factor) as somatic knockout of Nix led to feminized males (M/m) while transient expression of Nix resulted in partially masculinized females (m/m), with male reproductive organs but retained female antennae. It was not clear whether any of the other 29 genes in the 1.3-Mb M-locus are also needed for complete sex conversion. Here, we report the generation of multiple transgenic lines that express Nix under the control of its own promoter. Genetic and molecular analyses of these lines provided insights unattainable from previous transient experiments. We show that the Nix transgene alone, in the absence of the M-locus, was sufficient to convert females into males with all male-specific sexually dimorphic features and male-like gene expression. The converted m/m males are flightless, unable to perform the nuptial flight required for mating. However, they were able to father sex-converted progeny when presented with cold-anesthetized wild-type females. We show that myo-sex, a myosin heavy-chain gene also in the M-locus, was required for male flight as knockout of myo-sex rendered wild-type males flightless. We also show that Nix-mediated female-to-male conversion was 100% penetrant and stable over many generations. Therefore, Nix has great potential for developing mosquito control strategies to reduce vector populations by female-to-male sex conversion, or to aid in a sterile insect technique that requires releasing only non-biting males.
- Overview of the ID, EPI and REL tasks of BioNLP Shared Task 2011
  Pyysalo, Sampo; Ohta, Tomoko; Rak, Rafal; Sullivan, Dan; Mao, Chunhong; Wang, Chunxia; Sobral, Bruno; Tsujii, Jun'ichi; Ananiadou, Sophia (2012-06-26)
  We present the preparation, resources, results and analysis of three tasks of the BioNLP Shared Task 2011: the main tasks on Infectious Diseases (ID) and Epigenetics and Post-translational Modifications (EPI), and the supporting task on Entity Relations (REL). The two main tasks represent extensions of the event extraction model introduced in the BioNLP Shared Task 2009 (ST'09) to two new areas of biomedical scientific literature, each motivated by the needs of specific biocuration tasks. The ID task concerns the molecular mechanisms of infection, virulence and resistance, focusing in particular on the functions of a class of signaling systems that are ubiquitous in bacteria. The EPI task is dedicated to the extraction of statements regarding chemical modifications of DNA and proteins, with particular emphasis on changes relating to the epigenetic control of gene expression. By contrast to these two application-oriented main tasks, the REL task seeks to support extraction in general by separating challenges relating to part-of relations into a subproblem that can be addressed by independent systems. Seven groups participated in each of the two main tasks and four groups in the supporting task. The participating systems indicated advances in the capability of event extraction methods and demonstrated generalization in many aspects: from abstracts to full texts, from previously considered subdomains to new ones, and from the ST'09 extraction targets to other entities and events. The highest performance achieved in the supporting task REL, 58% F-score, is broadly comparable with levels reported for other relation extraction tasks. For the ID task, the highest-performing system achieved 56% F-score, comparable to the state-of-the-art performance at the established ST'09 task. In the EPI task, the best result was 53% F-score for the full set of extraction targets and 69% F-score for a reduced set of core extraction targets, approaching a level of performance sufficient for user-facing applications. In this study, we extend on previously reported results and perform further analyses of the outputs of the participating systems. We place specific emphasis on aspects of system performance relating to real-world applicability, considering alternate evaluation metrics and performing additional manual analysis of system outputs. We further demonstrate that the strengths of extraction systems can be combined to improve on the performance achieved by any system in isolation. The manually annotated corpora, supporting resources, and evaluation tools for all tasks are available from http://www.bionlp-st.org and the tasks continue as open challenges for all interested parties.
- The PATRIC Bioinformatics Resource Center: expanding data and analysis capabilities
  Davis, James J.; Wattam, Alice R.; Aziz, Ramy K.; Brettin, Thomas; Butler, Ralph; Butler, Rory M.; Chlenski, Philippe; Conrad, Neal; Dickerman, Allan W.; Dietrich, Emily M.; Gabbard, Joseph L.; Gerdes, Svetlana; Guard, Andrew; Kenyon, Ronald W.; Machi, Dustin; Mao, Chunhong; Murphy-Olson, Daniel E.; Nguyen, Marcus; Nordberg, Eric K.; Olsen, Gary J.; Olson, Robert D.; Overbeek, Jamie C.; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D.; Shukla, Maulik; Thomas, Chris; VanOeffelen, Margo; Vonstein, Veronika; Warren, Andrew S.; Xia, Fangfang; Xie, Dawen; Yoo, Hyunseung; Stevens, Rick L. (2020-01-08)
  The PathoSystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center funded by the National Institute of Allergy and Infectious Diseases (https://www.patricbrc.org). PATRIC supports bioinformatic analyses of all bacteria with a special emphasis on pathogens, offering a rich comparative analysis environment that provides users with access to over 250 000 uniformly annotated and publicly available genomes with curated metadata. PATRIC offers web-based visualization and comparative analysis tools, a private workspace in which users can analyze their own data in the context of the public collections, services that streamline complex bioinformatic workflows and command-line tools for bulk data analysis. Over the past several years, as genomic and other omics-related experiments have become more cost-effective and widespread, we have observed considerable growth in the usage of and demand for easy-to-use, publicly available bioinformatic tools and services. Here we report the recent updates to the PATRIC resource, including new web-based comparative analysis tools, eight new services and the release of a command-line interface to access, query and analyze data.
- PATRIC, the bacterial bioinformatics database and analysis resource
  Wattam, Alice R.; Abraham, David; Dalay, Oral; Disz, Terry L.; Driscoll, Timothy; Gabbard, Joseph L.; Gillespie, Joseph J.; Gough, Roger; Hix, Deborah; Kenyon, Ronald W.; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K.; Olson, Robert; Overbeek, Ross; Pusch, Gordon D.; Shukla, Maulik; Schulman, Julie; Stevens, Rick L.; Sullivan, Daniel E.; Vonstein, Veronika; Warren, Andrew S.; Will, Rebecca; Wilson, Meredith J. C.; Yoo, Hyunseung; Zhang, Chengdong; Zhang, Yan; Sobral, Bruno (2014-01)
  The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein-protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Data types are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10 000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available.
  This manuscript describes updates to PATRIC since its initial report in the 2007 NAR Database Issue.
- Whole Exome Sequencing to Identify Genetic Variants Associated with Raised Atherosclerotic Lesions in Young Persons
  Hixson, James E.; Jun, Goo; Shimmin, Lawrence C.; Wang, Yizhi; Yu, Guoqiang; Mao, Chunhong; Warren, Andrew S.; Howard, Timothy D.; Vander Heide, Richard S.; Van Eyk, Jennifer E.; Wang, Yue; Herrington, David M. (Springer Nature, 2017-06-22)
  We investigated the influence of genetic variants on atherosclerosis using whole exome sequencing in cases and controls from the autopsy study "Pathobiological Determinants of Atherosclerosis in Youth (PDAY)". We identified a PDAY case group with the highest total amounts of raised lesions (n = 359) for comparisons with a control group with no detectable raised lesions (n = 626). In addition to the standard exome capture, we included genome-wide proximal promoter regions that contain sequences that regulate gene expression. Our statistical analyses included single variant analysis for common variants (MAF > 0.01) and rare variant analysis for low frequency and rare variants (MAF < 0.05). In addition, we investigated known CAD genes previously identified by meta-analysis of GWAS studies. We did not identify individual common variants that reached exome-wide significance using single variant analysis. In analysis limited to 60 CAD genes, we detected strong associations with COL4A2/COL4A1 that also previously showed associations with myocardial infarction and arterial stiffness, as well as coronary artery calcification. Likewise, rare variant analysis did not identify genes that reached exome-wide significance. Among the 60 CAD genes, the strongest association was with NBEAL1 that was also identified in gene-based analysis of whole exome sequencing for early onset myocardial infarction.