Virginia Tech

    A RNA Virus Reference Database (RVRD) to Enhance Virus Detection in Metagenomic Data

    Lei_S_T_2018.pdf (1.326Mb)
    Date
    2018-10-16
    Author
    Lei, Shaohua
    Abstract
    Metagenomics holds great promise for exploring virome composition and discovering novel virus species, which creates a pressing demand for comprehensive, up-to-date reference databases to support downstream bioinformatics analysis. In this study, an RNA virus reference database (RVRD) was developed by manual and computational curation of RNA virus genomes downloaded from three major virus sequence databases: NCBI, ViralZone, and ViPR. To reduce redundancy caused by identical or nearly identical sequences, the sequences were first clustered; within each cluster of sequences sharing more than 98% identity with one another, all but one sequence were removed. Other identity cutoffs were also examined, and Hepatitis C virus genomes were studied in detail as an example. Using the 98% identity cutoff, sequences obtained from ViPR were combined with the unique RNA virus references from NCBI and ViralZone to generate the final RVRD. The resulting RVRD contained 23,085 sequences, nearly 5 times the size of the NCBI RNA virus reference, and had broad coverage of RNA virus families, with significant expansion in circular ssRNA virus and pathogenic virus families. In a performance evaluation against the NCBI RNA virus reference, using RVRD as the reference database identified more RNA virus species in RNA-seq data derived from wastewater samples. Moreover, using RVRD as the reference database also led to the discovery of porcine rotavirus as the etiology of unexplained diarrhea observed in pigs. RVRD is publicly available for enhancing RNA virus metagenomics.
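The redundancy-reduction step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering tool or the exact identity definition used, so everything here is an assumption for illustration: the function names `identity` and `dedupe_by_identity` are hypothetical, the clustering is a simple greedy scheme, and `difflib.SequenceMatcher` stands in for a proper alignment-based identity measure. It is suitable only for toy inputs, not for genome-scale data.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Approximate pairwise identity in [0, 1].

    Assumption: SequenceMatcher's ratio is only a stand-in for the
    alignment-based identity a real clustering tool would compute.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def dedupe_by_identity(records, cutoff=0.98):
    """Greedy clustering that keeps one representative per cluster.

    Sequences are visited longest first. A sequence becomes a new
    representative only if its identity to every representative kept
    so far is at or below `cutoff`; otherwise it is treated as
    redundant and dropped.
    """
    representatives = []
    for name, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        if all(identity(seq, rep) <= cutoff for _, rep in representatives):
            representatives.append((name, seq))
    return representatives


# Toy example with made-up sequences (not real viral genomes).
base = "ACGT" * 25                     # 100 nt
records = [
    ("A", base),
    ("B", base[:-1] + "A"),            # one mismatch: ~99% identical to A
    ("C", "T" * 100),                  # clearly different from A
]
kept = [name for name, _ in dedupe_by_identity(records)]
print(kept)  # ['A', 'C']
```

Raising or lowering `cutoff` in this sketch mirrors the idea of examining other identity cutoffs: a lower cutoff merges more sequences into each cluster and leaves fewer representatives.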
    General Audience Abstract
    Next-generation sequencing technology has demonstrated the capability to detect viruses in various samples, but one challenge in bioinformatics analysis is the lack of well-curated reference databases, especially for RNA viruses. In this study, an RNA virus reference database (RVRD) was developed by manual and computational curation from three commonly used resources: NCBI, ViralZone, and ViPR. RVRD was designed to be comprehensive, with broad coverage of RNA virus families, while clustering was performed to reduce redundant sequences. The performance of RVRD was compared with that of the NCBI RNA virus reference database using FastViromeExplorer, a pipeline recently developed by our lab. The results showed that more RNA viruses were identified in several metagenomic datasets using RVRD, indicating improved performance in practice.
    URI
    http://hdl.handle.net/10919/85388
    Collections
    • Masters Theses [21074]

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us
     

     
