Computational Analysis of Viruses in Metagenomic Data

Tithi, Saima Sultana

Computational Analysis of Viruses in Metagenomic Data

dc.contributor.author	Tithi, Saima Sultana	en
dc.contributor.committeechair	Zhang, Liqing	en
dc.contributor.committeemember	Jensen, Roderick V.	en
dc.contributor.committeemember	Meng, Na	en
dc.contributor.committeemember	Raghvendra, Sharath	en
dc.contributor.committeemember	Liu, Linshu	en
dc.contributor.department	Computer Science	en
dc.date.accessioned	2020-03-04T21:26:45Z	en
dc.date.available	2020-03-04T21:26:45Z	en
dc.date.issued	2019-10-24	en
dc.description.abstract	Viruses have huge impact on controlling diseases and regulating many key ecosystem processes. As metagenomic data can contain many microbiomes including many viruses, by analyzing metagenomic data we can analyze many viruses at the same time. The first step towards analyzing metagenomic data is to identify and quantify viruses present in the data. In order to answer this question, we developed a computational pipeline, FastViromeExplorer. FastViromeExplorer leverages a pseudoalignment based approach, which is faster than the traditional alignment based approach to quickly align millions/billions of reads. Application of FastViromeExplorer on both human gut samples and environmental samples shows that our tool can successfully identify viruses and quantify the abundances of viruses quickly and accurately even for a large data set. As viruses are getting increased attention in recent times, most of the viruses are still unknown or uncategorized. To discover novel viruses from metagenomic data, we developed a computational pipeline named FVE-novel. FVE-novel leverages a hybrid of both reference based and de novo assembly approach to recover novel viruses from metagenomic data. By applying FVE-novel to an ocean metagenome sample, we successfully recovered two novel viruses and two different strains of known phages. Analysis of viral assemblies from metagenomic data reveals that viral assemblies often contain assembly errors like chimeric sequences which means more than one viral genomes are incorrectly assembled together. In order to identify and fix these types of assembly errors, we developed a computational tool called VirChecker. Our tool can identify and fix assembly errors due to chimeric assembly. VirChecker also extends the assembly as much as possible to complete it and then annotates the extended and improved assembly. Application of VirChecker to viral scaffolds collected from an ocean meatgenome sample shows that our tool successfully fixes the assembly errors and extends two novel virus genomes and two strains of known phage genomes.	en
dc.description.abstractgeneral	Virus, the most abundant micro-organism on earth has a profound impact on human health and environment. Analyzing metagenomic data for viruses has the beneFIt of analyzing many viruses at a time without the need of cultivating them in the lab environment. Here, in this dissertation, we addressed three research problems of analyzing viruses from metagenomic data. To analyze viruses in metagenomic data, the first question needs to answer is what viruses are there and at what quantity. To answer this question, we developed a computational pipeline, FastViromeExplorer. Our tool can identify viruses from metagenomic data and quantify the abundances of viruses present in the data quickly and accurately even for a large data set. To recover novel virus genomes from metagenomic data, we developed a computational pipeline named FVE-novel. By applying FVE-novel to an ocean metagenome sample, we successfully recovered two novel viruses and two strains of known phages. Examination of viral assemblies from metagenomic data reveals that due to the complex nature of metagenome data, viral assemblies often contain assembly errors and are incomplete. To solve this problem, we developed a computational pipeline, named VirChecker, to polish, extend and annotate viral assemblies. Application of VirChecker to virus genomes recovered from an ocean metagenome sample shows that our tool successfully extended and completed those virus genomes.	en
dc.description.degree	Doctor of Philosophy	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:22525	en
dc.identifier.uri	http://hdl.handle.net/10919/97194	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Metagenomics	en
dc.subject	Virus	en
dc.subject	Phage	en
dc.subject	Viral Read Classification	en
dc.subject	Viral Genome Assembly	en
dc.subject	Improvement of Virus Assembly	en
dc.subject	Development of Computational Pipeline	en
dc.title	Computational Analysis of Viruses in Metagenomic Data	en
dc.type	Dissertation	en
thesis.degree.discipline	Computer Science and Applications	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Doctor of Philosophy	en

Files

Original bundle

Now showing 1 - 2 of 2

Name:: Tithi_S_D_2019.pdf
Size:: 26.52 MB
Format:: Adobe Portable Document Format

Download

Name:: Tithi_S_D_2019_support_1.pdf
Size:: 287.19 KB
Format:: Adobe Portable Document Format
Description:: Supporting documents

Download

Collections

Doctoral Dissertations