Browsing by Author "Anandakrishnan, Ramu"
- Accelerating Electrostatic Surface Potential Calculation with Multiscale Approximation on Graphics Processing Units. Anandakrishnan, Ramu; Scogland, Thomas R. W.; Fenley, Andrew T.; Gordon, John; Feng, Wu-chun; Onufriev, Alexey V. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2009). Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. This paper demonstrates how one can take advantage of graphics processing units (GPUs) available in today’s typical desktop computer, together with a multiscale approximation method, to significantly speed up such computations. Specifically, the electrostatic potential computation, using an analytical linearized Poisson Boltzmann (ALPB) method, is implemented on an ATI Radeon 4870 GPU in combination with the hierarchical charge partitioning (HCP) multiscale approximation. This implementation delivers a combined 1800-fold speedup for a 476,040 atom viral capsid.
- CAGm: A repository of germline microsatellite variations in the 1000 genomes project. Kinney, N.; Titus-Glover, K.; Wren, J.D.; Varghese, Ronnie; Michalak, Pawel; Liao, H.; Anandakrishnan, Ramu; Pulenthiran, A.; Kang, L.; Garner, Harold R. (Oxford University Press, 2019-01-08). The human genome harbors an abundance of repetitive DNA; however, its function continues to be debated. Microsatellites, a class of short tandem repeat, are established as an important source of genetic variation. Array length variants are common among microsatellites and affect gene expression, but efforts to understand the role and diversity of microsatellite variation have been hampered by several challenges. Without adequate depth, both long-read and short-read sequencing may not detect the variants present in a sample; additionally, large sample sizes are needed to reveal the degree of population-level polymorphism. To address these challenges we present the Comparative Analysis of Germline Microsatellites (CAGm): a database of germline microsatellites from 2529 individuals in the 1000 genomes project. A key novelty of CAGm is the ability to aggregate microsatellite variation by population, ethnicity (super population) and gender. The database provides advanced searching for microsatellites embedded in genes and functional elements. All data can be downloaded as Microsoft Excel spreadsheets. Two use-case scenarios are presented to demonstrate its utility: a mononucleotide (A) microsatellite at the BAT-26 locus and a dinucleotide (CA) microsatellite in the coding region of FGFRL1. CAGm is freely available at http://www.cagmdb.org/. © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
- Cranial manipulation affects cholinergic pathway gene expression in aged rats. Anandakrishnan, Ramu; Tobey, Hope; Nguyen, Steven; Sandoval, Osscar; Klein, Bradley G.; Costa, Blaise M. (De Gruyter, 2022-01-10). Context: Age-dependent dementia is a devastating disorder afflicting a growing older population. Although pharmacological agents improve symptoms of dementia, age-related comorbidities combined with adverse effects often outweigh their clinical benefits. Therefore, nonpharmacological therapies are being investigated as an alternative. In a previous pilot study, aged rats demonstrated improved spatial memory after osteopathic cranial manipulative medicine (OCMM) treatment. Objectives: In this continuation of the pilot study, we examine the effect of OCMM on gene expression to elicit possible explanations for the improvement in spatial memory. Methods: OCMM was performed on six of 12 elderly rats every day for 7 days. Rats were then euthanized to obtain the brain tissue, from which RNA samples were extracted. RNA from three treated and three control rats was of sufficient quality for sequencing. These samples were sequenced using next-generation sequencing on an Illumina NextSeq. The Cufflinks software suite was used to assemble transcriptomes and quantify the RNA expression level for each sample. Results: Transcriptome analysis revealed that OCMM significantly affected the expression of 36 genes in the neuronal pathway (false discovery rate [FDR] <0.004). The top five neuronal genes with the largest fold change were part of the cholinergic neurotransmission mechanism, which is known to affect cognitive function. In addition, 39.9% of 426 significant differentially expressed (SDE) genes (FDR <0.004) have been previously implicated in neurological disorders. Overall, changes in SDE genes combined with their role in central nervous system signaling pathways suggest a connection to previously reported OCMM-induced behavioral and biochemical changes in aged rats.
Conclusions: Results from this pilot study provide sufficient evidence to support a more extensive study with a larger sample size. Further investigation in this direction will provide a better understanding of the molecular mechanisms of OCMM and its potential in clinical applications. With clinical validation, OCMM could represent a much-needed low-risk adjunct treatment for age-related dementia including Alzheimer's disease.
- Crossing complexity of space-filling curves reveals entanglement of S-phase DNA. Kinney, Nick; Hickman, Molly; Anandakrishnan, Ramu; Garner, Harold R. (2020-08-31). Space-filling curves have been used for decades to study the folding principles of globular proteins, compact polymers, and chromatin. Formally, space-filling curves trace a single circuit through a set of points (x,y,z); informally, they correspond to a polymer melt. Although not quite a melt, the folding principles of human chromatin are likened to the Hilbert curve: a type of space-filling curve. Hilbert-like curves in general make biologically compelling models of chromatin; in particular, they lack knots, which facilitates chromatin folding, unfolding, and easy access to genes. Knot complexity has been intensely studied with the aid of Alexander polynomials; however, the approach does not generalize well to cases of more than one chromosome. Crossing complexity is an understudied alternative better suited for quantifying entanglement between chromosomes. Do Hilbert-like configurations limit crossing complexity between chromosomes? How does crossing complexity for Hilbert-like configurations compare to equilibrium configurations? To address these questions, we extend the Mansfield algorithm to enable sampling of Hilbert-like space-filling curves on a simple cubic lattice. We use the extended algorithm to generate equilibrium, intermediate, and Hilbert-like configurational ensembles and compute crossing complexity between curves (chromosomes) in each configurational snapshot. Our main results are twofold: (a) Hilbert-like configurations limit entanglement between chromosomes and (b) Hilbert-like configurations do not limit entanglement in a model of S-phase DNA. Our second result is particularly surprising yet easily rationalized with a geometric argument. We explore ergodicity of the extended algorithm and discuss our results in the context of more sophisticated models of chromatin.
- Differentiating between cancer and normal tissue samples using multi-hit combinations of genetic mutations. Dash, Sajal; Kinney, N.A.; Varghese, Ronnie; Garner, Harold R.; Feng, Wu-chun; Anandakrishnan, Ramu (Nature Publishing Group, 2019-01-30). Cancer is known to result from a combination of a small number of genetic defects. However, the specific combinations of mutations responsible for the vast majority of cancers have not been identified. Current computational approaches focus on identifying driver genes and mutations. Although individually these mutations can increase the risk of cancer, they do not result in cancer without additional mutations. We present a fundamentally different approach for identifying the cause of individual instances of cancer: we search for combinations of genes with carcinogenic mutations (multi-hit combinations) instead of individual driver genes or mutations. We developed an algorithm that identified a set of multi-hit combinations that differentiate between tumor and normal tissue samples with 91% sensitivity (95% Confidence Interval (CI) = 89–92%) and 93% specificity (95% CI = 91–94%) on average for seventeen cancer types. We then present an approach based on mutational profile that can be used to distinguish between driver and passenger mutations within these genes. These combinations, with experimental validation, can aid in better diagnosis, provide insights into the etiology of cancer, and provide a rational basis for designing targeted combination therapies. © 2019, The Author(s).
- Functional bias in molecular evolution rate of Arabidopsis thaliana. Warren, Andrew S.; Anandakrishnan, Ramu; Zhang, Liqing (2010-05-01). Background: Characteristics derived from mutation and other mechanisms that are advantageous for survival are often preserved during evolution by natural selection. Some genes are conserved in many organisms because they are responsible for fundamental biological function; others are conserved for their unique functional characteristics. Therefore one would expect the rate of molecular evolution for individual genes to be dependent on their biological function. Whether this expectation holds for genes duplicated by whole genome duplication is not known. Results: We empirically demonstrate here, using duplicated genes generated from the Arabidopsis thaliana α-duplication event, that the rate of molecular evolution of genes duplicated in this event depends on biological function. Using functional clustering based on gene ontology annotation of gene pairs, we show that some duplicated genes, such as defense response genes, are under weaker purifying selection or under stronger diversifying selection than other duplicated genes, such as protein translation genes, as measured by the ratio of nonsynonymous to synonymous divergence (dN/dS). Conclusions: These results provide empirical evidence indicating that molecular evolution rate for genes duplicated in whole genome duplication, as measured by dN/dS, may depend on biological function, which we characterize using gene ontology annotation. Furthermore, the general approach used here provides a framework for comparative analysis of molecular evolution rate for genes based on their biological function.
- H++3.0: automating pK prediction and the preparation of biomolecular structures for atomistic molecular modeling and simulations. Anandakrishnan, Ramu; Aguilar, Boris; Onufriev, Alexey V. (2012-07). The accuracy of atomistic biomolecular modeling and simulation studies depends on the accuracy of the input structures. Preparing these structures for an atomistic modeling task, such as molecular dynamics (MD) simulation, can involve the use of a variety of different tools for: correcting errors, adding missing atoms, filling valences with hydrogens, predicting pK values for titratable amino acids, assigning predefined partial charges and radii to all atoms, and generating force field parameter/topology files for MD. Identifying, installing and effectively using the appropriate tools for each of these tasks can be difficult for novice users and time-consuming for experienced users. H++ (http://biophysics.cs.vt.edu/) is a free open-source web server that automates the above key steps in the preparation of biomolecular structures for molecular modeling and simulations. H++ also performs extensive error and consistency checking, providing error/warning messages together with the suggested corrections. In addition to numerous minor improvements, the latest version of H++ includes several new capabilities and options: fix erroneous (flipped) side chain conformations for HIS, GLN and ASN, include a ligand in the input structure, process nucleic acid structures, and generate a solvent box with a specified number of common ions for explicit solvent MD.
- Identifying multi-hit carcinogenic gene combinations: Scaling up a weighted set cover algorithm using compressed binary matrix representation on a GPU. Al Hajri, Qais; Dash, Sajal; Feng, Wu-chun; Garner, Harold R.; Anandakrishnan, Ramu (Nature Publishing Group, 2020-02-06). Despite decades of research, effective treatments for most cancers remain elusive. One reason is that different instances of cancer result from different combinations of multiple genetic mutations (hits). Therefore, treatments that may be effective in some cases are not effective in others. We previously developed an algorithm for identifying combinations of carcinogenic genes with mutations (multi-hit combinations), which could suggest a likely cause for individual instances of cancer. Most cancers are estimated to require three or more hits. However, the computational complexity of the algorithm scales exponentially with the number of hits, making it impractical for identifying combinations of more than two hits. To identify combinations of greater than two hits, we used a compressed binary matrix representation, and optimized the algorithm for parallel execution on an NVIDIA V100 graphics processing unit (GPU). With these enhancements, the optimized GPU implementation was on average an estimated 12,144 times faster than the original integer-matrix-based CPU implementation, for the 3-hit algorithm, allowing us to identify 3-hit combinations. The 3-hit combinations identified using a training set were able to differentiate between tumor and normal samples in a separate test set with 90% overall sensitivity and 93% overall specificity. We illustrate how the distribution of mutations in tumor and normal samples in the multi-hit gene combinations can suggest potential driver mutations for further investigation. With experimental validation, these combinations may provide insight into the etiology of cancer and a rational basis for targeted combination therapy.
- Modulation of nucleosomal DNA accessibility via charge-altering post-translational modifications in histone core. Fenley, Andrew T.; Anandakrishnan, Ramu; Kidane, Yared H.; Onufriev, Alexey V. (2018-03-16). Background: Controlled modulation of nucleosomal DNA accessibility via post-translational modifications (PTM) is a critical component of many cellular functions. Charge-altering PTMs in the globular histone core (including acetylation, phosphorylation, crotonylation, propionylation, butyrylation, formylation, and citrullination) can alter the strong electrostatic interactions between the oppositely charged nucleosomal DNA and the histone proteins and thus modulate accessibility of the nucleosomal DNA, affecting processes that depend on access to the genetic information, such as transcription. However, direct experimental investigation of the effects of these PTMs is very difficult. Theoretical models can rationalize existing observations, suggest working hypotheses for future experiments, and provide a unifying framework for connecting PTMs with the observed effects. Results: A physics-based framework is proposed that predicts the effect of charge-altering PTMs in the histone core, quantitatively for several types of lysine charge-neutralizing PTMs including acetylation, and qualitatively for all phosphorylations, on the nucleosome stability and subsequent changes in DNA accessibility, making a connection to resulting biological phenotypes. The framework takes into account multiple partially assembled states of the nucleosome at the atomic resolution. The framework is validated against experimentally known nucleosome stability changes due to the acetylation of specific lysines, and their effect on transcription.
The predicted effect of charge-altering PTMs on DNA accessibility can vary dramatically, from virtually none to a strong, region-dependent increase in accessibility of the nucleosomal DNA; in some cases, e.g., H4K44, H2AK75, and H2BK57, the effect is significantly stronger than that of the extensively studied acetylation sites such as H3K56, H3K115 or H3K122. Proximity to the DNA is suggestive of the strength of the PTM effect, but there are many exceptions. For the vast majority of charge-altering PTMs, the predicted increase in the DNA accessibility should be large enough to result in a measurable modulation of transcription. However, a few possible PTMs, such as acetylation of H4K77, counterintuitively decrease the DNA accessibility, suggestive of repressed chromatin. A structural explanation for the phenomenon is provided. For the majority of charge-altering PTMs, the effect on DNA accessibility is simply additive (noncooperative), but there are exceptions, e.g., simultaneous acetylation of H4K79 and H3K122, where the combined effect is amplified. The amplification is a direct consequence of the nucleosome–DNA complex having more than two structural states. The effect of individual PTMs is classified based on changes in the accessibility of various regions throughout the nucleosomal DNA. The PTM’s resulting imprint on the DNA accessibility, “PTMprint,” is used to predict effects of many yet unexplored PTMs. For example, acetylation of H4K44 yields a PTMprint similar to the PTMprint of H3K56, and thus acetylation of H4K44 is predicted to lead to a wide range of strong biological effects. Conclusion: Charge-altering post-translational modifications in the relatively unexplored globular histone core may provide a precision mechanism for controlling accessibility to the nucleosomal DNA.
- A Partition Function Approximation Using Elementary Symmetric Functions. Anandakrishnan, Ramu (PLOS, 2012-12-12). In statistical mechanics, the canonical partition function can be used to compute equilibrium properties of a physical system. Calculating the partition function, however, is in general computationally intractable, since the computation scales exponentially with the number of particles in the system. A commonly used method for approximating equilibrium properties is the Monte Carlo (MC) method. For some problems the MC method converges slowly, requiring a very large number of MC steps. For such problems the computational cost of the Monte Carlo method can be prohibitive. Presented here is a deterministic algorithm, the direct interaction algorithm (DIA), for approximating the canonical partition function in operations. The DIA approximates the partition function as a combinatorial sum of products known as elementary symmetric functions (ESFs), which can be computed in operations. The DIA was used to compute equilibrium properties for the isotropic 2D Ising model, and the accuracy of the DIA was compared to that of the basic Metropolis Monte Carlo method. Our results show that the DIA may be a practical alternative for some problems where the Monte Carlo method converges slowly and computational speed is a critical constraint, such as for very large systems or web-based applications.
- Point Charges Optimally Placed to Represent the Multipole Expansion of Charge Distributions. Anandakrishnan, Ramu; Baker, Charles; Izadi, Saeed; Onufriev, Alexey V. (PLOS, 2013-07-04). We propose an approach for approximating electrostatic charge distributions with a small number of point charges to optimally represent the original charge distribution. By construction, the proposed optimal point charge approximation (OPCA) retains many of the useful properties of point multipole expansion, including the same far-field asymptotic behavior of the approximate potential. A general framework for numerically computing OPCA, for any given number of approximating charges, is described. We then derive a 2-charge practical point charge approximation, PPCA, which approximates the 2-charge OPCA via closed form analytical expressions, and test the PPCA on a set of charge distributions relevant to biomolecular modeling. We measure the accuracy of the new approximations as the RMS error in the electrostatic potential relative to that produced by the original charge distribution, at a distance the extent of the charge distribution (the mid-field). The error for the 2-charge PPCA is found to be on average 23% smaller than that of optimally placed point dipole approximation, and comparable to that of the point quadrupole approximation. The standard deviation in RMS error for the 2-charge PPCA is 53% lower than that of the optimal point dipole approximation, and comparable to that of the point quadrupole approximation. We also calculate the 3-charge OPCA for representing the gas phase quantum mechanical charge distribution of a water molecule. The electrostatic potential calculated by the 3-charge OPCA for water, in the mid-field (2.8 Å from the oxygen atom), is on average 33.3% more accurate than the potential due to the point multipole expansion up to the octupole order.
Compared to a 3-point-charge approximation in which the charges are placed on the atom centers, the 3-charge OPCA is seven times more accurate, by RMS error. The maximum error at the oxygen-Na distance (2.23 Å) is half that of the point multipole expansion up to the octupole order.
- Scaling out a combinatorial algorithm for discovering carcinogenic gene combinations to thousands of GPUs. Dash, Sajal; Al-Hajri, Qais; Feng, Wu-chun; Garner, Harold R.; Anandakrishnan, Ramu (IEEE, 2021-05-01). Cancer is a leading cause of death in the US, second only to heart disease. It is primarily a result of a combination of an estimated two to nine genetic mutations (multi-hit combinations). Although a body of research has identified hundreds of cancer-causing genetic mutations, we don't know the specific combination of mutations responsible for specific instances of cancer for most cancer types. An approximate algorithm for solving the weighted set cover problem was previously adapted to identify combinations of genes with mutations that may be responsible for individual instances of cancer. However, the algorithm's computational requirement scales exponentially with the number of genes, making it impractical for identifying more than three-hit combinations, even after the algorithm was parallelized and scaled up to a V100 GPU. Since most cancers have been estimated to require more than three hits, we scaled out the algorithm to identify combinations of four or more hits using 1000 nodes (6000 V100 GPUs with ≈ 48 × 10^6 processing cores) on the Summit supercomputer at Oak Ridge National Laboratory. Efficiently scaling out the algorithm required a series of algorithmic innovations and optimizations for balancing an exponentially divergent workload across processors and for minimizing memory latency and inter-node communication. We achieved an average strong scaling efficiency of 90.14% (80.96%-97.96% for 200 to 1000 nodes), compared to a 100 node run, with 84.18% scaling efficiency for 1000 nodes. With experimental validation, the multi-hit combinations identified here could provide further insight into the etiology of different cancer subtypes and provide a rational basis for targeted combination therapy.
- Statistics and Physical Origins of pK and Ionization State Changes upon Protein-Ligand Binding. Aguilar, Boris; Anandakrishnan, Ramu; Ruscio, Jory Z.; Onufriev, Alexey V. (CELL PRESS, 2010-03-01). This work investigates statistical prevalence and overall physical origins of changes in charge states of receptor proteins upon ligand binding. These changes are explored as a function of the ligand type (small molecule, protein, and nucleic acid), and distance from the binding region. Standard continuum solvent methodology is used to compute, on an equal footing, pK changes upon ligand binding for a total of 5899 ionizable residues in 20 protein-protein, 20 protein-small molecule, and 20 protein-nucleic acid high-resolution complexes. The size of the data set combined with an extensive error and sensitivity analysis allows us to make statistically justified and conservative conclusions: in 60% of all protein-small molecule, 90% of all protein-protein, and 85% of all protein-nucleic acid complexes there exists at least one ionizable residue that changes its charge state upon ligand binding at physiological conditions (pH = 6.5). Considering the most biologically relevant pH range of 4-8, the number of ionizable residues that experience substantial pK changes (ΔpK > 1.0) due to ligand binding is appreciable: on average, 6% of all ionizable residues in protein-small molecule complexes, 9% in protein-protein, and 12% in protein-nucleic acid complexes experience a substantial pK change upon ligand binding. These changes are safely above the statistical false-positive noise level. Most of the changes occur in the immediate binding interface region, where approximately one out of five ionizable residues experiences substantial pK change regardless of the ligand type.
However, the physical origins of the change differ between the types: in protein-nucleic acid complexes, the pK values of interface residues are predominantly affected by electrostatic effects, whereas in protein-protein and protein-small molecule complexes, structural changes due to the induced-fit effect play an equally important role. In protein-protein and protein-nucleic acid complexes, there is a statistically significant number of substantial pK perturbations, mostly due to the induced-fit structural changes, in regions far from the binding interface.
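As a rough illustration of why a pK shift of about one unit can matter at a fixed pH, the sketch below applies the textbook Henderson-Hasselbalch relation to a single, isolated titratable site. It is not the continuum solvent methodology used in the paper; the function names, the 0.5 threshold for "predominant state," and the example pK values are all hypothetical choices made for illustration.

```python
def protonated_fraction(pk: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an isolated titratable site that is protonated."""
    return 1.0 / (1.0 + 10.0 ** (ph - pk))


def changes_charge_state(pk_free: float, pk_bound: float, ph: float = 6.5) -> bool:
    """Assumed criterion: the predominant protonation state (fraction >= 0.5)
    differs between the free and ligand-bound forms at the given pH."""
    free_protonated = protonated_fraction(pk_free, ph) >= 0.5
    bound_protonated = protonated_fraction(pk_bound, ph) >= 0.5
    return free_protonated != bound_protonated


if __name__ == "__main__":
    # Hypothetical pK values (free, bound); both pairs shift by 1.2 units.
    for pk_free, pk_bound in [(6.0, 7.2), (4.0, 5.2)]:
        print(
            f"pK {pk_free} -> {pk_bound}: "
            f"protonated {protonated_fraction(pk_free, 6.5):.2f} -> "
            f"{protonated_fraction(pk_bound, 6.5):.2f}, "
            f"state change: {changes_charge_state(pk_free, pk_bound)}"
        )
```

Under this simplified criterion, the same 1.2-unit shift flips the predominant state only when the pK crosses the ambient pH, which is one way to see why substantial pK changes and charge-state changes are reported as separate statistics.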