Bradley Department of Electrical and Computer Engineering
Permanent URI for this community
From pervasive computing, to smart power systems, Virginia Tech ECE faculty and students delve into all major areas of electrical and computer engineering. The main campus is in Blacksburg, and the department has additional research and teaching facilities in Arlington, Falls Church, and Hampton, Virginia.
Browse
Browsing Bradley Department of Electrical and Computer Engineering by Department "Computer Science"
Now showing 1 - 18 of 18
Results Per Page
Sort Options
- Applications and Security of Next-Generation, User-Centric Wireless SystemsRamstetter, Jerry Rick; Yang, Yaling; Yao, Danfeng (Daphne) (MDPI, 2010-07-28)Pervasive wireless systems have significantly improved end-users quality of life. As manufacturing costs decrease, communications bandwidth increases, and contextual information is made more readily available, the role of next generation wireless systems in facilitating users daily activities will grow. Unique security and privacy issues exist in these wireless, context-aware, often decentralized systems. For example, the pervasive nature of such systems allows adversaries to launch stealthy attacks against them. In this review paper, we survey several emergent personal wireless systems and their applications. These systems include mobile social networks, active implantable medical devices, and consumer products. We explore each systems usage of contextual information and provide insight into its security vulnerabilities. Where possible, we describe existing solutions for defendingagainst these vulnerabilities. Finally, we point out promising future research directions for improving these systems robustness and security
- Beyond Finding Change: multitemporal Landsat for forest monitoring and managementWynne, Randolph H.; Thomas, Valerie A.; Brooks, Evan B.; Coulston, J. O.; Derwin, Jill M.; Liknes, Greg C.; Yang, Z.; Fox, Thomas R.; Ghannam, S.; Abbott, A. Lynn; House, M. N.; Saxena, R.; Watson, Layne T.; Gopalakrishnan, Ranjith (2017-07)Take homes
- Tobler’s Law still in effect with time series – spatial autocorrelation in temporal coherence can help in both preprocessing and estimation
- Continual process improvement in extant algorithms needed
- Need additional means by which variations within (parameterization) and across algorithms addressed (the Reverend…)
- Time series improving higher order products (example with NLCD TCC) enabling near continuous monitoring
- BSML: A Binding Schema Markup Language for Data Interchange in Problem Solving EnvironmentsVerstak, Alex; Ramakrishnan, Naren; Watson, Layne T.; He, Jian; Shaffer, Clifford A.; Bae, Kyung Kyoon; Jiang, Jing; Tranter, William H.; Rappaport, Theodore S. (Hindawi, 2003-01-01)We describe a binding schema markup language (BSML) for describing data interchange between scientific codes. Such a facility is an important constituent of scientific problem solving environments (PSEs). BSML is designed to integrate with a PSE or application composition system that views model specification and execution as a problem of managing semistructured data. The data interchange problem is addressed by three techniques for processing semistructured data: validation, binding, and conversion. We present BSML and describe its application to a PSE for wireless communications system design.
- Changes in RNA Splicing in Developing Soybean (Glycine max) EmbryosAghamirzaie, Delasa; Nabiyouni, Mahdi; Fang, Yihui; Klumas, Curtis; Heath, Lenwood S.; Grene, Ruth; Collakova, Eva (MDPI, 2013-11-21)Developing soybean seeds accumulate oils, proteins, and carbohydrates that are used as oxidizable substrates providing metabolic precursors and energy during seed germination. The accumulation of these storage compounds in developing seeds is highly regulated at multiple levels, including at transcriptional and post-transcriptional regulation. RNA sequencing was used to provide comprehensive information about transcriptional and post-transcriptional events that take place in developing soybean embryos. Bioinformatics analyses lead to the identification of different classes of alternatively spliced isoforms and corresponding changes in their levels on a global scale during soybean embryo development. Alternative splicing was associated with transcripts involved in various metabolic and developmental processes, including central carbon and nitrogen metabolism, induction of maturation and dormancy, and splicing itself. Detailed examination of selected RNA isoforms revealed alterations in individual domains that could result in changes in subcellular localization of the resulting proteins, protein-protein and enzyme-substrate interactions, and regulation of protein activities. Different isoforms may play an important role in regulating developmental and metabolic processes occurring at different stages in developing oilseed embryos.
- Differentiating between cancer and normal tissue samples using multi-hit combinations of genetic mutationsDash, Sajal; Kinney, N.A.; Varghese, Ronnie; Garner, Harold R.; Feng, Wu-chun; Anandakrishnan, Ramu (Nature Publishing Group, 2019-01-30)Cancer is known to result from a combination of a small number of genetic defects. However, the specific combinations of mutations responsible for the vast majority of cancers have not been identified. Current computational approaches focus on identifying driver genes and mutations. Although individually these mutations can increase the risk of cancer they do not result in cancer without additional mutations. We present a fundamentally different approach for identifying the cause of individual instances of cancer: we search for combinations of genes with carcinogenic mutations (multi-hit combinations) instead of individual driver genes or mutations. We developed an algorithm that identified a set of multi-hit combinations that differentiate between tumor and normal tissue samples with 91% sensitivity (95% Confidence Interval (CI) = 89–92%) and 93% specificity (95% CI = 91–94%) on average for seventeen cancer types. We then present an approach based on mutational profile that can be used to distinguish between driver and passenger mutations within these genes. These combinations, with experimental validation, can aid in better diagnosis, provide insights into the etiology of cancer, and provide a rational basis for designing targeted combination therapies. © 2019, The Author(s).
- A feasibility study to use machine learning as an inversion algorithm for aerosol profile and property retrieval from multi-axis differential absorption spectroscopy measurementsDong, Yun; Spinei, Elena; Karpatne, Anuj (2020-10-16)In this study, we explore a new approach based on machine learning (ML) for deriving aerosol extinction coefficient profiles, single-scattering albedo and asymmetry parameter at 360 nm from a single multi-axis differential optical absorption spectroscopy (MAX-DOAS) sky scan. Our method relies on a multi-output sequence-to-sequence model combining convolutional neural networks (CNNs) for feature extraction and long short-term memory networks (LSTMs) for profile prediction. The model was trained and evaluated using data simulated by Vector Linearized Discrete Ordinate Radiative Transfer (VLIDORT) v2.7, which contains 1 459 200 unique mappings. From the simulations, 75 % were randomly selected for training and the remaining 25 % for validation. The overall error of estimated aerosol properties (1) for total aerosol optical depth (AOD) is -1.4 +/- 10.1 %, (2) for the single-scattering albedo is 0.1 +/- 3.6 %, and (3) for the asymmetry factor is -0.1 +/- 2.1 %. The resulting model is capable of retrieving aerosol extinction coefficient profiles with degrading accuracy as a function of height. The uncertainty due to the randomness in ML training is also discussed.
- High-performance biocomputing for simulating the spread of contagion over large contact networksBisset, Keith R.; Aji, Ashwin M.; Marathe, Madhav V.; Feng, Wu-chun (BMC, 2012-04-12)Background Many important biological problems can be modeled as contagion diffusion processes over interaction networks. This article shows how the EpiSimdemics interaction-based simulation system can be applied to the general contagion diffusion problem. Two specific problems, computational epidemiology and human immune system modeling, are given as examples. We then show how the graphics processing unit (GPU) within each compute node of a cluster can effectively be used to speed-up the execution of these types of problems. Results We show that a single GPU can accelerate the EpiSimdemics computation kernel by a factor of 6 and the entire application by a factor of 3.3, compared to the execution time on a single core. When 8 CPU cores and 2 GPU devices are utilized, the speed-up of the computational kernel increases to 9.5. When combined with effective techniques for inter-node communication, excellent scalability can be achieved without significant loss of accuracy in the results. Conclusions We show that interaction-based simulation systems can be used to model disparate and highly relevant problems in biology. We also show that offloading some of the work to GPUs in distributed interaction-based simulations can be an effective way to achieve increased intra-node efficiency.
- iBLAST: Incremental BLAST of new sequences via automated e-value correctionDash, Sajal; Rahman, S. R.; Hines, H. M.; Feng, Wu-chun (PLoS, 2021-04-01)Search results from local alignment search tools use statistical scores that are sensitive to the size of the database to report the quality of the result. For example, NCBI BLAST reports the best matches using similarity scores and expect values (i.e., e-values) calculated against the database size. Given the astronomical growth in genomics data throughout a genomic research investigation, sequence databases grow as new sequences are continuously being added to these databases. As a consequence, the results (e.g., best hits) and associated statistics (e.g., e-values) for a specific set of queries may change over the course of a genomic investigation. Thus, to update the results of a previously conducted BLAST search to find the best matches on an updated database, scientists must currently rerun the BLAST search against the entire updated database, which translates into irrecoverable and, in turn, wasted execution time, money, and computational resources. To address this issue, we devise a novel and efficient method to redeem past BLAST searches by introducing iBLAST. iBLAST leverages previous BLAST search results to conduct the same query search but only on the incremental (i.e., newly added) part of the database, recomputes the associated critical statistics such as e-values, and combines these results to produce updated search results. Our experimental results and fidelity analyses show that iBLAST delivers search results that are identical to NCBI BLAST at a substantially reduced computational cost, i.e., iBLAST performs (1 + δ)/δ times faster than NCBI BLAST, where δ represents the fraction of database growth. We then present three different use cases to demonstrate that iBLAST can enable efficient biological discovery at a much faster speed with a substantially reduced computational cost.
- Identifying multi-hit carcinogenic gene combinations: Scaling up a weighted set cover algorithm using compressed binary matrix representation on a GPUAl Hajri, Qais; Dash, Sajal; Feng, Wu-chun; Garner, Harold R.; Anandakrishnan, Ramu (Nature Publishing Group, 2020-02-06)Despite decades of research, effective treatments for most cancers remain elusive. One reason is that different instances of cancer result from different combinations of multiple genetic mutations (hits). Therefore, treatments that may be effective in some cases are not effective in others. We previously developed an algorithm for identifying combinations of carcinogenic genes with mutations (multi-hit combinations), which could suggest a likely cause for individual instances of cancer. Most cancers are estimated to require three or more hits. However, the computational complexity of the algorithm scales exponentially with the number of hits, making it impractical for identifying combinations of more than two hits. To identify combinations of greater than two hits, we used a compressed binary matrix representation, and optimized the algorithm for parallel execution on an NVIDIA V100 graphics processing unit (GPU). With these enhancements, the optimized GPU implementation was on average an estimated 12,144 times faster than the original integer matrix based CPU implementation, for the 3-hit algorithm, allowing us to identify 3-hit combinations. The 3-hit combinations identified using a training set were able to differentiate between tumor and normal samples in a separate test set with 90% overall sensitivity and 93% overall specificity. We illustrate how the distribution of mutations in tumor and normal samples in the multi-hit gene combinations can suggest potential driver mutations for further investigation. With experimental validation, these combinations may provide insight into the etiology of cancer and a rational basis for targeted combination therapy.
- Identifying Transcriptional Regulatory Modules Among Different Chromatin States in Mouse Neural Stem CellsBanerjee, Sharmi; Zhu, Hongxiao; Tang, Man; Feng, Wu-chun; Wu, Xiaowei; Xie, Hehuang David (Frontiers, 2019-01-15)Gene expression regulation is a complex process involving the interplay between transcription factors and chromatin states. Significant progress has been made toward understanding the impact of chromatin states on gene expression. Nevertheless, the mechanism of transcription factors binding combinatorially in different chromatin states to enable selective regulation of gene expression remains an interesting research area. We introduce a nonparametric Bayesian clustering method for inhomogeneous Poisson processes to detect heterogeneous binding patterns of multiple proteins including transcription factors to form regulatory modules in different chromatin states. We applied this approach on ChIP-seq data for mouse neural stem cells containing 21 proteins and observed different groups or modules of proteins clustered within different chromatin states. These chromatin-state-specific regulatory modules were found to have significant influence on gene expression. We also observed different motif preferences for certain TFs between different chromatin states. Our results reveal a degree of interdependency between chromatin states and combinatorial binding of proteins in the complex transcriptional regulatory process. The software package is available on Github at - https://github.com/BSharmi/DPM-LGCP.
- Intrusion Detection System for Applications using Linux ContainersAbed, Amr S.; Clancy, Thomas Charles III; Levy, David S. (Springer, 2015-12-09)Linux containers are gaining increasing traction in both individual and industrial use, and as these containers get integrated into mission-critical systems, real-time detection of malicious cyber attacks becomes a critical operational requirement. This paper introduces a real-time host-based intrusion detection system that can be used to passively detect malfeasance against applications within Linux containers running in a standalone or in a cloud multi-tenancy environment. The demonstrated intrusion detection system uses bags of system calls monitored from the host kernel for learning the behavior of an application running within a Linux container and determining anomalous container behavior. Performance of the approach using a database application was measured and results are discussed.
- Measuring and Modeling On-chip Interconnect Power on Real HardwareAdhinarayanan, Vignesh; Paul, Indrani; Greathouse, Joseph L.; Huang, Wei; Pattnaik, Ashutosh; Feng, Wu-chun (IEEE, 2016-09-26)On-chip data movement is a major source of power consumption in modern processors, and future technology nodes will exacerbate this problem. Properly understanding the power that applications expend moving data is vital for inventing mitigation strategies. Previous studies combined data movement energy, which is required to move information across the chip, with data access energy, which is used to read or write on- chip memories. This combination can hide the severity of the problem, as memories and interconnects will scale differently to future technology nodes. Thus, increasing the fidelity of our energy measurements is of paramount concern. We propose to use physical data movement distance as a mechanism for separating movement energy from access energy. We then use this mechanism to design microbenchmarks to ascertain data movement energy on a real modern processor. Using these microbenchmarks, we study the following parameters that affect interconnect power: (i) distance, (ii) interconnect bandwidth, (iii) toggle rate, and (iv) voltage and frequency. We conduct our study on an AMD GPU built in 28nm technology and validate our results against industrial estimates for energy/bit/millimeter. We then construct an empirical model based on our characterization and use it to evaluate the interconnect power of 22 real-world applications. We show that up to 14% of the dynamic power in some applications can be consumed by the interconnect and present a range of mitigation strategies.
- MetaMorph: A Library Framework for Interoperable Kernels on Multi- and Many-Core ClustersHelal, A.; Sathre, Paul; Feng, Wu-chun (2016-11-15)To attain scalable performance efficiently, the HPC community expects future exascale systems to consist of multiple nodes, each with different types of hardware accelerators. In addition to GPUs and Intel MICs, additional candidate accelerators include embedded multiprocessors and FPGAs. End users need appropriate tools to efficiently use the available compute resources in such systems, both within a compute node and across compute nodes. As such, we present MetaMorph, a library framework designed to (automatically) extract as much computational capability as possible from HPC systems. Its design centers around three core principles: abstraction, interoperability, and adaptivity. To demonstrate its efficacy, we present a case study that uses the structured grids design pattern, which is heavily used in computational fluid dynamics. We show how MetaMorph significantly reduces the development time, while delivering performance and interoperability across an array of heterogeneous devices, including multicore CPUs, Intel MICs, AMD GPUs, and NVIDIA GPUs.
- Multi-dimensional characterization of electrostatic surface potential computation on graphics processors(2012-04-12)Background Calculating the electrostatic surface potential (ESP) of a biomolecule is critical towards understanding biomolecular function. Because of its quadratic computational complexity (as a function of the number of atoms in a molecule), there have been continual efforts to reduce its complexity either by improving the algorithm or the underlying hardware on which the calculations are performed. Results We present the combined effect of (i) a multi-scale approximation algorithm, known as hierarchical charge partitioning (HCP), when applied to the calculation of ESP and (ii) its mapping onto a graphics processing unit (GPU). To date, most molecular modeling algorithms perform an artificial partitioning of biomolecules into a grid/lattice on the GPU. In contrast, HCP takes advantage of the natural partitioning in biomolecules, which in turn, better facilitates its mapping onto the GPU. Specifically, we characterize the effect of known GPU optimization techniques like use of shared memory. In addition, we demonstrate how the cost of divergent branching on a GPU can be amortized across algorithms like HCP in order to deliver a massive performance boon. Conclusions We accelerated the calculation of ESP by 25-fold solely by parallelization on the GPU. Combining GPU and HCP, resulted in a speedup of at most 1,860-fold for our largest molecular structure. The baseline for these speedups is an implementation that has been hand-tuned SSE-optimized and parallelized across 16 cores on the CPU. The use of GPU does not deteriorate the accuracy of our results.
- Optimization and model reduction in the high dimensional parameter space of a budding yeast cell cycle modelOguz, Cihan; Laomettachit, Teeraphan; Chen, Katherine C.; Watson, Layne T.; Baumann, William T.; Tyson, John J. (Biomed Central, 2013-06-28)Background 'Parameter estimation from experimental data is critical for mathematical modeling of protein regulatory networks. For realistic networks with dozens of species and reactions, parameter estimation is an especially challenging task. In this study, we present an approach for parameter estimation that is effective in fitting a model of the budding yeast cell cycle (comprising 26 nonlinear ordinary differential equations containing 126 rate constants) to the experimentally observed phenotypes (viable or inviable) of 119 genetic strains carrying mutations of cell cycle genes. Results Starting from an initial guess of the parameter values, which correctly captures the phenotypes of only 72 genetic strains, our parameter estimation algorithm quickly improves the success rate of the model to 105-111 of the 119 strains. This success rate is comparable to the best values achieved by a skilled modeler manually choosing parameters over many weeks. The algorithm combines two search and optimization strategies. First, we use Latin hypercube sampling to explore a region surrounding the initial guess. From these samples, we choose ∼20 different sets of parameter values that correctly capture wild type viability. These sets form the starting generation of differential evolution that selects new parameter values that perform better in terms of their success rate in capturing phenotypes. In addition to producing highly successful combinations of parameter values, we analyze the results to determine the parameters that are most critical for matching experimental outcomes and the most competitive strains whose correct outcome with a given parameter vector forces numerous other strains to have incorrect outcomes. These “most critical parameters” and “most competitive strains” provide biological insights into the model. Conversely, the “least critical parameters” and “least competitive strains” suggest ways to reduce the computational complexity of the optimization. Conclusions Our approach proves to be a useful tool to help systems biologists fit complex dynamical models to large experimental datasets. In the process of fitting the model to the data, the tool identifies suggestive correlations among aspects of the model and the data.
- A Stochastic Model Correctly Predicts Changes in Budding Yeast Cell Cycle Dynamics upon Periodic Expression of CLN2Oguz, Cihan; Palmisano, Alida; Laomettachit, Teeraphan; Watson, Layne T.; Baumann, William T.; Tyson, John J. (PLOS, 2014-05-09)In this study, we focus on a recent stochastic budding yeast cell cycle model. First, we estimate the model parameters using extensive data sets: phenotypes of 110 genetic strains, single cell statistics of wild type and cln3 strains. Optimization of stochastic model parameters is achieved by an automated algorithm we recently used for a deterministic cell cycle model. Next, in order to test the predictive ability of the stochastic model, we focus on a recent experimental study in which forced periodic expression of CLN2 cyclin (driven by MET3 promoter in cln3 background) has been used to synchronize budding yeast cell colonies. We demonstrate that the model correctly predicts the experimentally observed synchronization levels and cell cycle statistics of mother and daughter cells under various experimental conditions (numerical data that is not enforced in parameter optimization), in addition to correctly predicting the qualitative changes in size control due to forced CLN2 expression. Our model also generates a novel prediction: under frequent CLN2 expression pulses, G1 phase duration is bimodal among small-born cells. These cells originate from daughters with extended budded periods due to size control during the budded period. This novel prediction and the experimental trends captured by the model illustrate the interplay between cell cycle dynamics, synchronization of cell colonies, and size control in budding yeast.
- Stochastic simulation of enzyme-catalyzed reactions with disparate timescalesBarik, Debashis; Paul, Mark R.; Baumann, William T.; Cao, Yang; Tyson, John J. (Cell Press, 2008-10-01)Many physiological characteristics of living cells are regulated by protein interaction networks. Because the total numbers of these protein species can be small, molecular noise can have significant effects on the dynamical properties of a regulatory network. Computing these stochastic effects is made difficult by the large timescale separations typical of protein interactions (e. g., complex formation may occur in fractions of a second, whereas catalytic conversions may take minutes). Exact stochastic simulation may be very inefficient under these circumstances, and methods for speeding up the simulation without sacrificing accuracy have been widely studied. We show that the "total quasi-steady-state approximation'' for enzyme-catalyzed reactions provides a useful framework for efficient and accurate stochastic simulations. The method is applied to three examples: a simple enzyme-catalyzed reaction where enzyme and substrate have comparable abundances, a Goldbeter-Koshland switch, where a kinase and phosphatase regulate the phosphorylation state of a common substrate, and coupled Goldbeter-Koshland switches that exhibit bistability. Simulations based on the total quasi-steady-state approximation accurately capture the steady-state probability distributions of all components of these reaction networks. In many respects, the approximation also faithfully reproduces time-dependent aspects of the fluctuations. The method is accurate even under conditions of poor timescale separation.
- Transcriptome-wide functional characterization reveals novel relationships among differentially expressed transcripts in developing soybean embryosAghamirzaie, Delasa; Batra, Dhruv; Heath, Lenwood S.; Schneider, Andrew; Grene, Ruth; Collakova, Eva (Biomed Central, 2015-11-14)Background Transcriptomics reveals the existence of transcripts of different coding potential and strand orientation. Alternative splicing (AS) can yield proteins with altered number and types of functional domains, suggesting the global occurrence of transcriptional and post-transcriptional events. Many biological processes, including seed maturation and desiccation, are regulated post-transcriptionally (e.g., by AS), leading to the production of more than one coding or noncoding sense transcript from a single locus. Results We present an integrated computational framework to predict isoform-specific functions of plant transcripts. This framework includes a novel plant-specific weighted support vector machine classifier called CodeWise, which predicts the coding potential of transcripts with over 96 % accuracy, and several other tools enabling global sequence similarity, functional domain, and co-expression network analyses. First, this framework was applied to all detected transcripts (103,106), out of which 13 % was predicted by CodeWise to be noncoding RNAs in developing soybean embryos. Second, to investigate the role of AS during soybean embryo development, a population of 2,938 alternatively spliced and differentially expressed splice variants was analyzed and mined with respect to timing of expression. Conserved domain analyses revealed that AS resulted in global changes in the number, types, and extent of truncation of functional domains in protein variants. Isoform-specific co-expression network analysis using ArrayMining and clustering analyses revealed specific sub-networks and potential interactions among the components of selected signaling pathways related to seed maturation and the acquisition of desiccation tolerance. These signaling pathways involved abscisic acid- and FUSCA3-related transcripts, several of which were classified as noncoding and/or antisense transcripts and were co-expressed with corresponding coding transcripts. Noncoding and antisense transcripts likely play important regulatory roles in seed maturation- and desiccation-related signaling in soybean. Conclusions This work demonstrates how our integrated framework can be implemented to make experimentally testable predictions regarding the coding potential, co-expression, co-regulation, and function of transcripts and proteins related to a biological process of interest.