Scholarly Works, Fralin Life Sciences Institute
Browsing Scholarly Works, Fralin Life Sciences Institute by Issue Date
- Towards a calculus of biological networks. Reidys, Christian Michael; Mortveit, Henning S. (2002). In this paper we present a new framework for studying the dynamics of biological networks. A specific class of dynamical systems, Sequential Dynamical Systems (SDS), is introduced. These systems allow one to investigate the interplay between structural properties of the network and its phase space. We will show in detail how to find a reduced system that captures key features of a given system. This reduction is based on a special graph-theoretic relation between the two networks. We will study the reduction of SDS over n-cubes in detail and we will present several examples.
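As an illustrative aside, not taken from the paper: a minimal Python sketch of a sequential dynamical system, assuming binary vertex states, a NOR local function, and a 4-vertex circle graph. The names `sds_map` and `nor` are hypothetical. Each vertex is updated in a fixed order from its own state and its neighbours' current states, and enumerating every state gives the phase space.

```python
from itertools import product


def sds_map(adjacency, order, local_fn):
    """Build the global update map of a sequential dynamical system.

    adjacency: dict mapping each vertex to a list of its neighbours
    order:     sequence of vertices giving the update order
    local_fn:  function from a list of 0/1 states to a new 0/1 state
    """
    def step(state):
        state = list(state)
        for v in order:
            # Vertex v sees its own state plus its neighbours' states,
            # including any updates already made earlier in this sweep.
            neighbourhood = [state[v]] + [state[u] for u in adjacency[v]]
            state[v] = local_fn(neighbourhood)
        return tuple(state)

    return step


def nor(bits):
    """Return 1 only if every input bit is 0 (assumed local function)."""
    return int(not any(bits))


# Assumed example network: a circle graph on four vertices.
adjacency = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
step = sds_map(adjacency, order=[0, 1, 2, 3], local_fn=nor)

# The phase space maps every possible state to its successor.
phase_space = {s: step(s) for s in product((0, 1), repeat=4)}
```

Changing `order` while keeping the graph and local function fixed generally changes the phase space, which is one way to explore how network structure and dynamics interact.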
- Reproducibility, bioinformatic analysis and power of the SAGE method to evaluate changes in transcriptome. Dinel, S.; Bolduc, C.; Belleau, P.; Boivin, A.; Yoshioka, M.; Calvo, E.; Piedboeuf, B.; Snyder, E. E.; Labrie, F.; St-Amand, J. (2005-01-01). The serial analysis of gene expression (SAGE) method is used to study global gene expression in cells or tissues in various experimental conditions. However, its reproducibility has not yet been definitively assessed. In this study, we have evaluated the reproducibility of the SAGE method and identified the factors that affect it. The determination coefficient (R²) for the reproducibility of SAGE is 0.96. However, there are some factors that can affect the reproducibility of SAGE, such as the replication of concatemers and ditags, the number of sequenced tags and double PCR amplification of ditags. Thus, corrections for these factors must be made to ensure the reproducibility and accuracy of SAGE results. A bioinformatic analysis of SAGE data is also presented in order to eliminate these artifacts. Finally, the current study shows that increasing the number of sequenced tags improves the power of the method to detect transcripts and their regulation by experimental conditions.
- Mitochondrial-encoded membrane protein transcripts are pyrimidine-rich while soluble protein transcripts and ribosomal RNA are purine-rich. Bradshaw, Patrick C.; Rathi, Anand; Samuels, David C. (2005-09-26). Background: Eukaryotic organisms contain mitochondria, organelles capable of producing large amounts of ATP by oxidative phosphorylation. Each cell contains many mitochondria with many copies of mitochondrial DNA in each organelle. The mitochondrial DNA encodes a small but functionally critical portion of the oxidative phosphorylation machinery, a few other species-specific proteins, and the rRNA and tRNA used for the translation of these transcripts. Because the microenvironment of the mitochondrion is unique, mitochondrial genes may be subject to different selectional pressures than those affecting nuclear genes. Results: From an analysis of the mitochondrial genomes of a wide range of eukaryotic species we show that there are three simple rules for the pyrimidine and purine abundances in mitochondrial DNA transcripts. Mitochondrial membrane protein transcripts are pyrimidine-rich, rRNA transcripts are purine-rich and the soluble protein transcripts are purine-rich. The transitions between pyrimidine and purine-rich regions of the genomes are rapid and are easily visible on a pyrimidine-purine walk graph. These rules are followed, with few exceptions, independent of which strand encodes the gene. Despite the robustness of these rules across a diverse set of species, the magnitude of the differences between the pyrimidine and purine content is fairly small. Typically, the mitochondrial membrane protein transcripts have a pyrimidine richness of 56%, the rRNA transcripts are 55% purine, and the soluble protein transcripts are only 53% purine. Conclusion: The pyrimidine richness of mitochondrial-encoded membrane protein transcripts is partly driven by U nucleotides in the second codon position in all species, which yields hydrophobic amino acids. The purine-richness of soluble protein transcripts is mainly driven by A nucleotides in the first codon position. The purine-richness of rRNA is also due to an abundance of A nucleotides. Possible mechanisms as to how these trends are maintained in mtDNA genomes of such diverse ancestry, size and variability of A-T richness are discussed.
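As an illustrative aside, not the authors' code: a short Python sketch of how purine content and a purine-pyrimidine walk might be computed for a transcript. The function names and the walk convention (+1 for a purine, -1 for a pyrimidine) are assumptions; the abstract does not state which convention the paper uses.

```python
def purine_fraction(seq):
    """Fraction of A and G among the unambiguous bases of a transcript."""
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA spelling
    counted = [base for base in seq if base in "ACGU"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, U or T bases")
    purines = sum(base in "AG" for base in counted)
    return purines / len(counted)


def purine_walk(seq):
    """Cumulative walk: step up for each purine, down for each pyrimidine.

    Ambiguous symbols (for example N) are skipped. The sign convention
    is an assumption made for this sketch.
    """
    position = 0
    walk = []
    for base in seq.upper().replace("T", "U"):
        if base in "AG":
            position += 1
        elif base in "CU":
            position -= 1
        else:
            continue
        walk.append(position)
    return walk
```

Under this convention a purine-rich region shows as a rising stretch of the walk and a pyrimidine-rich region as a falling one, so a change of slope marks a transition between the two.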
- The distribution of SNPs in human gene regulatory regions. Guo, Yongjian; Jamison, D. Curtis (2005-10-06). Background: As a result of high-throughput genotyping methods, millions of human genetic variants have been reported in recent years. To efficiently identify those with significant biological functions, a practical strategy is to concentrate on variants located in important sequence regions such as gene regulatory regions. Results: Analysis of the most common type of variant, single nucleotide polymorphisms (SNPs), shows that in gene promoter regions more SNPs occur in close proximity to transcriptional start sites than in regions further upstream, and a disproportionate number of those SNPs represent nucleotide transversions. Additionally, the number of SNPs found in the predicted transcription factor binding sites is higher than in non-binding site sequences. Conclusion: Current information about transcription factor binding site sequence patterns may not be exhaustive, and SNPs may be actively involved in influencing gene expression by affecting the transcription factor binding sites.
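As an illustrative aside, not the authors' pipeline: the standard distinction between transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or the reverse) can be sketched in Python as follows. The function name `classify_snp` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Classify a single-nucleotide substitution.

    Returns "transition" when both alleles are purines or both are
    pyrimidines, and "transversion" otherwise.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("alleles must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternate alleles are identical")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

Of the twelve possible substitutions, four are transitions and eight are transversions, which is the baseline against which any claim of a disproportionate number of transversions would be judged.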
- VMD: a community annotation database for oomycetes and microbial genomes. Tripathy, Sucheta; Pandey, Varun N.; Fang, Bing; Salas, Fidel; Tyler, Brett M. (2006-01-01). The VBI Microbial Database (VMD) is a database system designed to host a range of microbial genome sequences. At present, the database contains genome sequence and annotation data of two plant pathogens, Phytophthora sojae and Phytophthora ramorum. With the completion of the draft genome sequences of these pathogens in collaboration with the DOE Joint Genome Institute (JGI), we have created this resource to make the sequences publicly available. The genome sequences (95 MB for P. sojae and 65 MB for P. ramorum) were annotated with ~19 000 and ~16 000 gene models, respectively. We used two different statistical methods to validate these gene models, Fickett's and a log-likelihood method. Functional annotation of the gene models is based on results from BlastX and InterProScan screens. From the InterProScan results, we could assign putative functions to 17 694 genes in P. sojae and 14 700 genes in P. ramorum. We created an easy-to-use genome browser to view the genome sequence data, which opens to detailed annotation pages for each gene model. A community annotation interface is available for registered community members to add or edit annotations. There are ~1600 gene models for P. sojae and ~700 models for P. ramorum that have already been manually curated. A toolkit is provided as an additional resource for users to perform a variety of sequence analysis jobs. The database is publicly available at http://phytophthora.vbi.vt.edu/.
- Tomato Expression Database (TED): a suite of data presentation and analysis tools. Fei, Zhangjun; Tang, Xuemei; Alba, Rob; Giovannoni, James (2006-01-01). The Tomato Expression Database (TED) includes three integrated components. The Tomato Microarray Data Warehouse serves as a central repository for raw gene expression data derived from the public tomato cDNA microarray. In addition to expression data, TED stores experimental design and array information in compliance with the MIAME guidelines and provides web interfaces for researchers to retrieve data for their own analysis and use. The Tomato Microarray Expression Database contains normalized and processed microarray data for ten time points with nine pair-wise comparisons during fruit development and ripening in a normal tomato variety and nearly isogenic single gene mutants impacting fruit development and ripening. Finally, the Tomato Digital Expression Database contains raw and normalized digital expression (EST abundance) data derived from analysis of the complete public tomato EST collection containing >150 000 ESTs derived from 27 different non-normalized EST libraries. This last component also includes tools for the comparison of tomato and Arabidopsis digital expression data. A set of query interfaces and analysis and visualization tools has been developed and incorporated into TED to help users identify and decipher biologically important information from our datasets. TED can be accessed at http://ted.bti.cornell.edu.
- The statistics of identifying differentially expressed genes in Expresso and TM4: a comparison. Sioson, Allan A.; Mane, Shrinivasrao P.; Li, Pinghua; Sha, Wei; Heath, Lenwood S.; Bohnert, Hans J.; Grene, Ruth (2006-04-20). Background: Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns differential expression of genes between two conditions, together with a statistical significance assessment. This process typically consists of two steps: data normalization and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The complement of tools in TM4, on the other hand, is based on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filter, and analysis methods form an analysis pipeline. TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software as input to its normalization steps. By contrast, Expresso can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analysis of two experiments and assess the results against qRT-PCR data. Results: The Expresso analysis using MIV data consistently identifies more genes as differentially expressed, when compared to Expresso analysis with IIV data. The typical TM4 normalization and filtering pipeline corrects systematic intensity-specific bias on a per-microarray basis. Subsequent statistical analysis with Expresso or a TM4 t-test can effectively identify differentially expressed genes. The best agreement with qRT-PCR data is obtained through the use of Expresso analysis and MIV data. Conclusion: The results of this research are of practical value to biologists who analyze microarray data sets. The TM4 normalization and filtering pipeline corrects microarray-specific systematic bias and complements the normalization stage in Expresso analysis. The results of Expresso using MIV data have the best agreement with qRT-PCR results. In one experiment, MIV is a better choice than IIV as input to data normalization and statistical analysis methods, as it yields a greater number of statistically significant differentially expressed genes; TM4 does not support the choice of MIV input data. Overall, the more flexible and extensive statistical models of Expresso achieve more accurate analytical results, when judged by the yardstick of qRT-PCR data, in the context of an experimental design of modest complexity.
- GenomeBlast: a web tool for small genome comparison. Lu, Guoqing; Jiang, Liying; Helikar, Resa M. K.; Rowley, Thaine W.; Zhang, Luwen; Chen, Xianfeng; Moriyama, Etsuko N. (2006-12-12). Background: Comparative genomics has become an essential approach for identifying homologous gene candidates and their functions, and for studying genome evolution. There are many tools available for genome comparisons. Unfortunately, most of them are not applicable for the identification of unique genes and the inference of phylogenetic relationships in a given set of genomes. Results: GenomeBlast is a Web tool developed for comparative analysis of multiple small genomes. A new parameter called "coverage" was introduced and used along with sequence identity to evaluate global similarity between genes. With GenomeBlast, the following results can be obtained: (1) unique genes in each genome; (2) homologous gene candidates among compared genomes; (3) 2D plots of homologous gene candidates across all pairwise genome comparisons; and (4) a table of gene presence/absence information and a genome phylogeny. We demonstrated the functions in GenomeBlast with an example of multiple herpesviral genome analysis and illustrated how GenomeBlast is useful for small genome comparison. Conclusion: We developed a Web tool for comparative analysis of small genomes, which allows the user not only to identify unique genes and homologous gene candidates among multiple genomes, but also to view their graphical distributions on genomes, and to reconstruct genome phylogeny. GenomeBlast runs on a Linux server with 4 CPUs and 4 GB memory. The online version of GenomeBlast is publicly available through a Web browser at http://bioinfo-srv1.awh.unomaha.edu/genomeblast/.
- Computational prediction of host-pathogen protein–protein interactions. Dyer, Matthew D.; Murali, T. M.; Sobral, Bruno (Oxford University Press, 2007). Motivation: Infectious diseases such as malaria result in millions of deaths each year. An important aspect of any host-pathogen system is the mechanism by which a pathogen can infect its host. One method of infection is via protein–protein interactions (PPIs) where pathogen proteins target host proteins. Developing computational methods that identify which PPIs enable a pathogen to infect a host has great implications in identifying potential targets for therapeutics. Results: We present a method that integrates known intra-species PPIs with protein-domain profiles to predict PPIs between host and pathogen proteins. Given a set of intra-species PPIs, we identify the functional domains in each of the interacting proteins. For every pair of functional domains, we use Bayesian statistics to assess the probability that two proteins with that pair of domains will interact. We apply our method to the Homo sapiens – Plasmodium falciparum host-pathogen system. Our system predicts 516 PPIs between proteins from these two organisms. We show that pairs of human proteins we predict to interact with the same Plasmodium protein are close to each other in the human PPI network and that Plasmodium pairs predicted to interact with the same human protein are co-expressed in DNA microarray datasets measured during various stages of the Plasmodium life cycle. Finally, we identify functionally enriched sub-networks spanned by the predicted interactions and discuss the plausibility of our predictions.
- A syntactic model to design and verify synthetic genetic constructs derived from standard biological parts. Cai, Y.; Hartnett, B.; Gustafsson, C.; Peccoud, Jean (2007). Motivation: The sequence of artificial genetic constructs is composed of multiple functional fragments, or genetic parts, involved in different molecular steps of gene expression mechanisms. Biologists have deciphered structural rules that the design of genetic constructs needs to follow in order to ensure a successful completion of the gene expression process, but these rules have not been formalized, making it challenging for non-specialists to benefit from the recent progress in gene synthesis. Results: We show that context-free grammars (CFG) can formalize these design principles. This approach provides a path to organizing libraries of genetic parts according to their biological functions, which correspond to the syntactic categories of the CFG. It also provides a framework for the systematic design of new genetic constructs consistent with the design principles expressed in the CFG. Using parsing algorithms, this syntactic model enables the verification of existing constructs. We illustrate these possibilities by describing a CFG that generates the most common architectures of genetic constructs in Escherichia coli. Availability: A web site allows readers to experiment with the algorithms presented in this article: www.genocad.org
- PATRIC: The VBI PathoSystems Resource Integration Center. Snyder, E. E.; Kampanya, N.; Lu, J.; Nordberg, E. K.; Karur, H. R.; Shukla, Maulik; Soneja, J.; Tian, Y.; Xue, T.; Yoo, H.; Zhang, F.; Dharmanolla, C.; Dongre, N. V.; Gillespie, J. J.; Hamelius, J.; Hance, M.; Huntington, K. I.; Jukneliene, D.; Koziski, J.; Mackasmiel, L.; Mane, S. P.; Nguyen, V.; Purkayastha, A.; Shallom, J.; Yu, G.; Guo, Y.; Gabbard, Joseph L.; Hix, D.; Azad, A. F.; Baker, S. C.; Boyle, Stephen M.; Khudyakov, Y.; Meng, Xiang-Jin; Rupprecht, C.; Vinje, J.; Crasta, Oswald R.; Czar, M. J.; Dickerman, Allan W.; Eckart, J. D.; Kenyon, R.; Will, R.; Setubal, Joao C.; Sobral, Bruno (2007-01). The PathoSystems Resource Integration Center (PATRIC) is one of eight Bioinformatics Resource Centers (BRCs) funded by the National Institute of Allergy and Infectious Diseases (NIAID) to create a data and analysis resource for selected NIAID priority pathogens, specifically proteobacteria of the genera Brucella, Rickettsia and Coxiella, and corona-, calici- and lyssaviruses and viruses associated with hepatitis A and E. The goal of the project is to provide a comprehensive bioinformatics resource for these pathogens, including consistently annotated genome, proteome and metabolic pathway data to facilitate research into counter-measures, including drugs, vaccines and diagnostics. The project's curation strategy has three prongs: 'breadth first' beginning with whole-genome and proteome curation using standardized protocols, a 'targeted' approach addressing the specific needs of researchers and an integrative strategy to leverage high-throughput experimental data (e.g. microarrays, proteomics) and literature. The PATRIC infrastructure consists of a relational database, analytical pipelines and a website which supports browsing, querying, data visualization and the ability to download raw and curated data in standard formats. At present, the site warehouses complete sequences for 17 bacterial and 332 viral genomes. The PATRIC website (https://patric.vbi.vt.edu) will continually grow with the addition of data, analysis and functionality over the course of the project.
- MvirDB - a microbial database of protein toxins, virulence factors and antibiotic resistance genes for bio-defence applications. Zhou, C. E.; Smith, J.; Lam, M.; Zemla, A.; Dyer, Matthew D.; Slezak, Tom (2007-01). Knowledge of toxins, virulence factors and antibiotic resistance genes is essential for bio-defense applications aimed at identifying 'functional' signatures for characterizing emerging or engineered pathogens. Whereas genetic signatures identify a pathogen, functional signatures identify what a pathogen is capable of. To facilitate rapid identification of sequences and characterization of genes for signature discovery, we have collected and organized all publicly available (as of this writing) sequences representing known toxins, virulence factors, and antibiotic resistance genes in one convenient database, which we believe will be of use to the bio-defense research community. MvirDB integrates DNA and protein sequence information from Tox-Prot, SCORPION, the PRINTS virulence factors, VFDB, TVFac, Islander, ARGO and a subset of VIDA. Entries in MvirDB are hyperlinked back to their original sources. A blast tool allows the user to blast against all DNA or protein sequences in MvirDB, and a browser tool allows the user to search the database to retrieve virulence factor descriptions, sequences, and classifications, and to download sequences of interest. MvirDB has an automated weekly update mechanism. Each protein sequence in MvirDB is annotated using our fully automated protein annotation system and is linked to that system's browser tool. MvirDB can be accessed at http://mvirdb.llnl.gov/.
- GeneTrees: a phylogenomics resource for prokaryotes. Tian, Yuying; Dickerman, Allan W. (2007-01). The GeneTrees phylogenomics system pursues comparative genomic analyses from the perspective of gene phylogenies for individual genes. The GeneTrees project has the goal of providing detailed evolutionary models for all protein-coding gene components of the fully sequenced genomes. Currently, a database of alignments and trees for all protein sequences for 325 fully sequenced and annotated prokaryote genomes is available. The prokaryote database contains 890 000 protein sequences organized into over 100 000 alignments, each described by a phylogenetic tree. An original homology group discovery tool assembles sets of related proteins from all versus all pairwise alignments. Multiple alignments for each homology group are stored and subjected to phylogenetic tree inference. A graphical web interface provides visual exploration of the GeneTrees database. Homology groups can be queried by sequence identifiers or annotation terms. Genomes can be browsed visually on a gene map of each chromosome or plasmid. Phylogenetic trees with support values are displayed in conjunction with the associated sequence alignment. A variety of classes of information can be selected to label the tree tips to aid in visual evaluation of annotation and gene function. This web interface is available at http://genetrees.vbi.vt.edu.
- Plasmids and Rickettsial Evolution: Insight from Rickettsia felis. Gillespie, Joseph J.; Beier, Magda S.; Rahman, M. Sayeedur; Ammerman, Nicole C.; Shallom, Joshua M.; Purkayastha, Anjan; Sobral, Bruno; Azad, Abdu F. (PLOS, 2007-03-07). Background: The genome sequence of Rickettsia felis revealed a number of rickettsial genetic anomalies that likely contribute not only to a large genome size relative to other rickettsiae, but also to phenotypic oddities that have confounded the categorization of R. felis as either typhus group (TG) or spotted fever group (SFG) rickettsiae. Most intriguing was the first report from rickettsiae of a conjugative plasmid (pRF) that contains 68 putative open reading frames, several of which are predicted to encode proteins with high similarity to conjugative machinery in other plasmid-containing bacteria. Methodology/Principal Findings: Using phylogeny estimation, we determined the mode of inheritance of pRF genes relative to conserved rickettsial chromosomal genes. Phylogenies of chromosomal genes were in agreement with other published rickettsial trees. However, phylogenies including pRF genes yielded different topologies and suggest a close relationship between pRF and ancestral group (AG) rickettsiae, including the recently completed genome of R. bellii str. RML369-C. This relatedness is further supported by the distribution of pRF genes across other rickettsiae, as 10 pRF genes (or inactive derivatives) also occur in AG (but not SFG) rickettsiae, with five of these genes characteristic of typical plasmids. Detailed characterization of pRF genes resulted in two novel findings: the identification of oriV and replication termination regions, and the likelihood that a second proposed plasmid, pRFδ, is an artifact of the original genome assembly. Conclusion/Significance: Altogether, we propose a new rickettsial classification scheme with the addition of a fourth lineage, transitional group (TRG) rickettsiae, that is distinct from TG and SFG rickettsiae and harbors genes from possible exchanges with AG rickettsiae via conjugation. We offer insight into the evolution of a plastic plasmid system in rickettsiae, including the role plasmids may have played in the acquisition of virulence traits in pathogenic strains, and the likely origin of plasmids within the rickettsial tree.
- Bayesian estimation of genetic parameters for multivariate threshold and continuous phenotypes and molecular genetic data in simulated horse populations using Gibbs sampling. Stock, Kathrin F.; Distl, Ottmar; Hoeschele, Ina (2007-05-09). Background: Requirements for successful implementation of multivariate animal threshold models including phenotypic and genotypic information are not known yet. Here simulated horse data were used to investigate the properties of multivariate estimators of genetic parameters for categorical, continuous and molecular genetic data in the context of important radiological health traits using mixed linear-threshold animal models via Gibbs sampling. The simulated pedigree comprised 7 generations and 40 000 animals per generation. Additive genetic values, residuals and fixed effects for one continuous trait and liabilities of four binary traits were simulated, resembling situations encountered in the Warmblood horse. Quantitative trait locus (QTL) effects and genetic marker information were simulated for one of the liabilities. Different scenarios with respect to recombination rate between genetic markers and QTL and polymorphism information content of genetic markers were studied. For each scenario ten replicates were sampled from the simulated population, and within each replicate six different datasets differing in number and distribution of animals with trait records and availability of genetic marker information were generated. (Co)Variance components were estimated using a Bayesian mixed linear-threshold animal model via Gibbs sampling. Residual variances were fixed to zero and a proper prior was used for the genetic covariance matrix. Results: Effective sample sizes (ESS) and biases of genetic parameters differed significantly between datasets. Bias of heritability estimates was -6% to +6% for the continuous trait, -6% to +10% for the binary traits of moderate heritability, and -21% to +25% for the binary traits of low heritability. Additive genetic correlations were mostly underestimated between the continuous trait and binary traits of low heritability, under- or overestimated between the continuous trait and binary traits of moderate heritability, and overestimated between two binary traits. Use of trait information on two subsequent generations of animals increased ESS and reduced bias of parameter estimates more than mere increase of the number of informative animals from one generation. Consideration of genotype information as a fixed effect in the model resulted in overestimation of polygenic heritability of the QTL trait, but increased accuracy of estimated additive genetic correlations of the QTL trait. Conclusion: Combined use of phenotype and genotype information on parents and offspring will help to identify agonistic and antagonistic genetic correlations between traits of interest, facilitating design of effective multiple trait selection schemes.
- A versatile computational pipeline for bacterial genome annotation improvement and comparative analysis, with Brucella as a use case. Yu, G. X.; Snyder, E. E.; Boyle, Stephen M.; Crasta, Oswald R.; Czar, M. J.; Mane, S. P.; Purkayastha, A.; Sobral, Bruno; Setubal, Joao C. (2007-06). We present a bacterial genome computational analysis pipeline, called GenVar. The pipeline, based on the program GeneWise, is designed to analyze an annotated genome and automatically identify missed gene calls and sequence variants such as genes with disrupted reading frames (split genes) and those with insertions and deletions (indels). For a given genome to be analyzed, GenVar relies on a database containing closely related genomes (such as other species or strains) as well as a few additional reference genomes. GenVar also helps identify gene disruptions probably caused by sequencing errors. We exemplify GenVar's capabilities by presenting results from the analysis of four Brucella genomes. Brucella is an important human pathogen and zoonotic agent. The analysis revealed hundreds of missed gene calls, new split genes and indels, several of which are species specific and hence provide valuable clues to the understanding of the genome basis of Brucella pathogenicity and host specificity.
- Role of information and communication networks in malaria survival. Mozumder, Pallab; Marathe, Achla (2007-10-10). Background: Quite often symptoms of malaria go unrecognized or untreated. According to the Multilateral Initiative on Malaria, 70% of the malaria cases that are treated at home are mismanaged. Up to 82% of all malaria episodes in sub-Saharan Africa are treated outside the formal health sector. Fast and appropriate diagnosis and treatment of malaria is extremely important in reducing morbidity and mortality. Method: Data from 70 countries are pooled to construct a panel dataset of health and socio-economic variables spanning 1960-2004. The generalized two-stage least squares and panel data models are used to investigate the impact of information and communication network (ICN) variables on malaria death probability. The intensity of ICN is represented by the number of telephone main lines per 1,000 people and the number of television sets per 1,000 people. Results: The major finding is that the intensity of ICN is associated with a reduced probability of death among people who are clinically identified as malaria infected. The results are robust for both indicators, i.e. interpersonal and mass communication networks, and for all model specifications examined. Conclusion: The results suggest that information and communication networks can substantially scale up the effectiveness of the existing resources for malaria prevention. Resources spent in preventing malaria are far less than needed. Expanded information and communication networks will widen the avenues for community-based "participatory development" that encourages the use of local information, knowledge and decision making. Timely information, immediate care and collective knowledge-based treatment can be extremely important in reducing child mortality and achieving the millennium development goal.
- A virtual look at Epstein-Barr virus infection: Biological interpretations. Duca, Karen A.; Shapiro, Michael; Delgado-Eckert, Edgar; Hadinoto, Vey; Jarrah, Abdul Salam; Laubenbacher, Reinhard C.; Lee, Kichol; Luzuriaga, Katherine; Polys, Nicholas F.; Thorley-Lawson, David A. (PLOS, 2007-10-19). The possibility of using computer simulation and mathematical modeling to gain insight into biological and other complex systems is receiving increased attention. However, it is as yet unclear to what extent these techniques will provide useful biological insights or even what the best approach is. Epstein-Barr virus (EBV) provides a good candidate to address these issues. It persistently infects most humans and is associated with several important diseases. In addition, a detailed biological model has been developed that provides an intricate understanding of EBV infection in the naturally infected human host and accounts for most of the virus' diverse and peculiar properties. We have developed an agent-based computer model/simulation (PathSim, Pathogen Simulation) of this biological model. The simulation is performed on a virtual grid that represents the anatomy of the tonsils of the nasopharyngeal cavity (Waldeyer ring) and the peripheral circulation, the sites of EBV infection and persistence. The simulation is presented via a user-friendly visual interface and reproduces quantitative and qualitative aspects of acute and persistent EBV infection. The simulation also had predictive power in validation experiments involving certain aspects of viral infection dynamics. Moreover, it allows us to identify switch points in the infection process that direct the disease course towards the end points of persistence, clearance, or death. Lastly, we were able to identify parameter sets that reproduced aspects of EBV-associated diseases. These investigations indicate that such simulations, combined with laboratory and clinical studies and animal models, will provide a powerful approach to investigating and controlling EBV infection, including the design of targeted anti-viral therapies.
- Analysis of Schistosoma mansoni genes shared with Deuterostomia and with possible roles in host interactions. Venancio, Thiago M.; DeMarco, Ricardo; Almeida, Giulliana T.; Oliveira, Katia C.; Setubal, João C.; Verjovski-Almeida, Sergio (2007-11-08). Background: Schistosoma mansoni is a blood helminth parasite that causes schistosomiasis, a disease that affects 200 million people worldwide. Many orthologs of known mammalian genes have been discovered in this parasite and evidence is accumulating that some of these genes encode proteins linked to signaling pathways in the parasite that appear to be involved with growth or development, suggesting a complex co-evolutionary process. Results: In this work we found 427 genes conserved in the Deuterostomia group that have orthologs in S. mansoni and no members in any nematodes and insects so far sequenced. Among these genes we have identified Insulin Induced Gene (INSIG), Interferon Regulatory Factor (IRF) and vasohibin orthologs, known to be involved in mammals in mevalonate metabolism, immune response and angiogenesis control, respectively. We have chosen these three genes for a more detailed characterization, which included extension of their cloned messages to obtain full-length sequences. Interestingly, SmINSIG showed a 10-fold higher expression in adult females as opposed to males, in accordance with its possible role in regulating egg production. SmIRF has a DNA binding domain, a tryptophan-rich N-terminal region and several predicted phosphorylation sites, usually important for IRF activity. Fourteen different alternatively spliced forms of the S. mansoni vasohibin (SmVASL) gene were detected that encode seven different protein isoforms including one with a complete C-terminal end, and other isoforms with shorter C-terminal portions. Using S. mansoni homologs, we have employed a parsimonious rationale to compute the total gene losses/gains in nematodes, arthropods and deuterostomes under either the Coelomata or the Ecdysozoa evolutionary hypotheses; our results show a lower losses/gains number under the latter hypothesis. Conclusion: The genes discussed, which are conserved between S. mansoni and deuterostomes, probably have an ancient origin and were lost in Ecdysozoa, being still present in Lophotrochozoa. Given their known functions in Deuterostomia, it is possible that some of them have been co-opted to perform functions related (directly or indirectly) to host adaptation or interaction with host signaling processes.
- Detecting epistatic interactions contributing to human gene expression using the CEPH family data. Li, Hua; Gao, Guimin; Li, Jian; Page, Grier P.; Zhang, Kui (2007-12-18). It is believed that epistatic interactions among loci contribute to variations in quantitative traits. Several methods are available to detect epistasis using population-based data. However, methods to characterize epistasis for quantitative traits in family-based association analysis are not well developed, especially for studying thousands of gene expression traits. Here, we proposed a linear mixed-model approach to detect epistasis for quantitative traits using family data. The proposed method was implemented in the widely used software program SOLAR. We evaluated the power of the method by simulation studies and applied this method to the analysis of the Centre d'Etude du Polymorphisme Humain family gene expression data provided by Genetic Analysis Workshop 15 (GAW15).