Computational Analysis of Viruses in Metagenomic Data

Field | Value | Language
dc.contributor.author | Tithi, Saima Sultana | en
dc.contributor.committeechair | Zhang, Liqing | en
dc.contributor.committeemember | Jensen, Roderick V. | en
dc.contributor.committeemember | Meng, Na | en
dc.contributor.committeemember | Raghvendra, Sharath | en
dc.contributor.committeemember | Liu, Linshu | en
dc.contributor.department | Computer Science | en
dc.date.accessioned | 2020-03-04T21:26:45Z | en
dc.date.available | 2020-03-04T21:26:45Z | en
dc.date.issued | 2019-10-24 | en
dc.description.abstract | Viruses have a major impact on controlling diseases and regulating many key ecosystem processes. Because metagenomic data can contain the genomes of many microorganisms, including many viruses, analyzing metagenomic data lets us study many viruses at the same time. The first step in analyzing metagenomic data is to identify which viruses are present and to quantify their abundances. To address this problem, we developed a computational pipeline, FastViromeExplorer. FastViromeExplorer uses a pseudoalignment-based approach, which is faster than traditional alignment-based approaches, to map millions or billions of reads quickly. Applying FastViromeExplorer to both human gut samples and environmental samples shows that the tool identifies viruses and quantifies their abundances quickly and accurately, even for large data sets. Although viruses have received increasing attention in recent years, most viruses are still unknown or uncategorized. To discover novel viruses from metagenomic data, we developed a computational pipeline named FVE-novel, which combines reference-based and de novo assembly approaches to recover novel viruses. By applying FVE-novel to an ocean metagenome sample, we recovered two novel viruses and two different strains of known phages. Analysis of viral assemblies from metagenomic data reveals that they often contain assembly errors such as chimeric sequences, in which parts of more than one viral genome are incorrectly assembled together. To identify and fix these errors, we developed a computational tool called VirChecker. VirChecker detects and corrects errors caused by chimeric assembly, extends the assembly as far as possible toward completion, and then annotates the extended and improved assembly. Applying VirChecker to viral scaffolds collected from an ocean metagenome sample shows that the tool fixes the assembly errors and extends the genomes of two novel viruses and two strains of known phages. | en
dc.description.abstractgeneral | Viruses, the most abundant biological entities on Earth, have a profound impact on human health and the environment. Analyzing metagenomic data for viruses makes it possible to study many viruses at once without cultivating them in the laboratory. In this dissertation, we address three research problems in analyzing viruses from metagenomic data. The first question to answer is which viruses are present and in what quantities. To answer this question, we developed a computational pipeline, FastViromeExplorer, which identifies viruses in metagenomic data and quantifies their abundances quickly and accurately, even for large data sets. To recover novel virus genomes from metagenomic data, we developed a computational pipeline named FVE-novel. By applying FVE-novel to an ocean metagenome sample, we recovered two novel viruses and two strains of known phages. Examination of viral assemblies from metagenomic data reveals that, owing to the complex nature of metagenomic data, viral assemblies often contain errors and are often incomplete. To solve this problem, we developed a computational pipeline, named VirChecker, to polish, extend, and annotate viral assemblies. Applying VirChecker to virus genomes recovered from an ocean metagenome sample shows that the tool successfully extended and completed those genomes. | en
dc.description.degree | Doctor of Philosophy | en
dc.format.medium | ETD | en
dc.identifier.other | vt_gsexam:22525 | en
dc.identifier.uri | http://hdl.handle.net/10919/97194 | en
dc.publisher | Virginia Tech | en
dc.rights | In Copyright | en
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en
dc.subject | Metagenomics | en
dc.subject | Virus | en
dc.subject | Phage | en
dc.subject | Viral Read Classification | en
dc.subject | Viral Genome Assembly | en
dc.subject | Improvement of Virus Assembly | en
dc.subject | Development of Computational Pipeline | en
dc.title | Computational Analysis of Viruses in Metagenomic Data | en
dc.type | Dissertation | en
thesis.degree.discipline | Computer Science and Applications | en
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en
thesis.degree.level | doctoral | en
thesis.degree.name | Doctor of Philosophy | en

Files

Original bundle
Name: Tithi_S_D_2019.pdf
Size: 26.52 MB
Format: Adobe Portable Document Format
Name: Tithi_S_D_2019_support_1.pdf
Size: 287.19 KB
Format: Adobe Portable Document Format
Description: Supporting documents