Browsing by Author "Yu, Guoqiang"
Now showing 1 - 20 of 77
- Aberrant Calcium Signaling in Astrocytes Inhibits Neuronal Excitability in a Human Down Syndrome Stem Cell Model
  Mizuno, Grace O.; Wang, Yinxue; Shi, Guilai; Wang, Yizhi; Sun, Junqing; Papadopoulos, Stelios; Broussard, Gerard J.; Unger, Elizabeth K.; Deng, Wenbin; Weick, Jason; Bhattacharyya, Anita; Chen, Chao-Yin; Yu, Guoqiang; Looger, Loren L.; Tian, Lin (Elsevier, 2018-07-10)
  Down syndrome (DS) is a genetic disorder that causes cognitive impairment. The staggering effects associated with an extra copy of human chromosome 21 (HSA21) complicate mechanistic understanding of DS pathophysiology. We examined the neuron-astrocyte interplay in a fully recapitulated HSA21 trisomy cellular model differentiated from DS-patient-derived induced pluripotent stem cells (iPSCs). By combining calcium imaging with genetic approaches, we discovered functional defects of DS astroglia and their effects on neuronal excitability. Compared with control isogenic astroglia, DS astroglia exhibited more frequent spontaneous calcium fluctuations, which reduced the excitability of co-cultured neurons. Furthermore, suppressed neuronal activity could be rescued by abolishing astrocytic spontaneous calcium activity, either chemically by blocking adenosine-mediated signaling or genetically by knockdown of inositol triphosphate (IP3) receptors or S100B, a calcium-binding protein coded on HSA21. Our results suggest a mechanism by which DS alters the function of astrocytes, which subsequently disturbs neuronal excitability.
- Accurate Identification of Significant Aberrations in Cancer Genome: Implementation and Applications
  Hou, Xuchu (Virginia Tech, 2013-01-07)
  Somatic Copy Number Alterations (CNAs) are common events in human cancers. Identifying CNAs and Significant Copy number Aberrations (SCAs) in cancer genomes is a critical task in the search for cancer-associated genes. Advanced genome profiling technologies, such as SNP array technology, facilitate copy number studies at a genome-wide scale with high resolution. However, due to normal tissue contamination, the observed intensity signals are actually a mixture of copy number signals contributed by both tumor and normal cells. This genetic confounding factor significantly affects subsequent copy number analyses. To accurately identify significant aberrations in contaminated cancer genomes, we developed AISAIC (Accurate Identification of Significant Aberrations in Cancer), a Java package that incorporates two recent algorithms from the literature: BACOM (Bayesian Analysis of Copy number Mixtures) and SAIC (Significant Aberrations in Cancer). Specifically, BACOM is used to estimate the normal tissue contamination fraction and recover the "true" copy number profiles, and SAIC is then used to detect SCAs from the large set of recovered tumor samples. Considering the prevalence of modern multi-core computers and clusters, we adopted concurrent computing with the Java Fork/Join API to speed up the analysis. We evaluated the AISAIC package's empirical family-wise type I error rate and detection power on a large number of simulated datasets, with promising results. Finally, we used AISAIC to analyze real cancer data from the TCGA portal and detected many SCAs that not only cover the majority of reported cancer-associated genes but also include novel genomic regions that may merit further study.
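The AISAIC pipeline above runs BACOM per sample and then pools the corrected profiles for SAIC. Below is a minimal Python sketch of that two-stage, per-sample-parallel structure, using concurrent.futures as a stand-in for the Java Fork/Join API; the mixing model, function names, and numbers are illustrative assumptions, not the published algorithms.

```python
# Minimal sketch of AISAIC's two-stage pipeline (BACOM-style correction,
# then SAIC-style aggregation), parallelized per sample. The Java package
# uses Fork/Join; concurrent.futures is the Python stand-in here. The
# correction model and all numbers are illustrative only.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def bacom_correct(args):
    """Hypothetical per-sample step: undo normal-tissue mixing, assuming
    observed = alpha * normal + (1 - alpha) * tumor with copy-neutral
    normal signal ~ 2, so tumor = (observed - 2 * alpha) / (1 - alpha)."""
    observed, alpha = args
    return (observed - 2.0 * alpha) / (1.0 - alpha)

def saic_like_scores(corrected):
    """Toy stand-in for SAIC: per-probe mean deviation from copy number 2
    across the recovered tumor profiles."""
    return np.abs(np.mean(np.vstack(corrected), axis=0) - 2.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alphas = rng.uniform(0.2, 0.6, size=8)             # contamination fractions
    tumors = 2.0 + rng.normal(0, 0.3, size=(8, 1000))  # latent tumor profiles
    observed = [a * 2.0 + (1 - a) * t for a, t in zip(alphas, tumors)]
    with ProcessPoolExecutor() as pool:                # per-sample parallelism
        corrected = list(pool.map(bacom_correct, zip(observed, alphas)))
    print(saic_like_scores(corrected)[:5])
```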
- Adversarial RFML: Evading Deep Learning Enabled Signal Classification
  Flowers, Bryse Austin (Virginia Tech, 2019-07-24)
  Deep learning has become a ubiquitous part of research in all fields, including wireless communications. Researchers have shown the ability to leverage deep neural networks (DNNs) that operate on raw in-phase and quadrature samples, termed Radio Frequency Machine Learning (RFML), to synthesize new waveforms, control radio resources, and detect and classify signals. While there are numerous advantages to RFML, this thesis answers the question "is it secure?" DNNs have been shown, in other applications such as Computer Vision (CV), to be vulnerable to what are known as adversarial evasion attacks, which consist of corrupting an underlying example with a small, intelligently crafted perturbation that causes a DNN to misclassify the example. This thesis develops the first threat model that encompasses the unique adversarial goals and capabilities present in RFML. Attacks that occur with direct digital access to the RFML classifier are differentiated from physical attacks that must propagate over-the-air (OTA) and are thus subject to impairments due to the wireless channel or inaccuracies in the signal detection stage. The thesis first finds that RFML systems are vulnerable to current adversarial evasion attacks using the well-known Fast Gradient Sign Method (FGSM) originally developed for CV applications. However, these attacks do not account for the underlying communications, so the adversarial advantage is limited because the signal quickly becomes unintelligible. To envision new threats, the thesis goes on to develop a new adversarial evasion attack that takes into account the underlying communications and wireless channel models in order to craft attacks with more intelligible underlying communications that generalize to OTA attacks.
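The FGSM named above perturbs an input by a small step along the sign of the loss gradient. A minimal PyTorch sketch on a stand-in RFML classifier of raw I/Q samples follows; the architecture, class count, and epsilon are illustrative assumptions, not the thesis's models.

```python
# Minimal FGSM sketch: perturb an input x by epsilon * sign(grad_x loss).
# The classifier is a stand-in for an RFML model on raw I/Q samples
# (2 x N tensors); architecture and epsilon are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(2 * 128, 11))  # 11 mod classes
loss_fn = nn.CrossEntropyLoss()

def fgsm(x, y, epsilon=0.01):
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    # One step in the direction that increases the classification loss.
    return (x + epsilon * x.grad.sign()).detach()

x = torch.randn(1, 2, 128)          # one I/Q snapshot
y = torch.tensor([3])               # true class label
x_adv = fgsm(x, y)
print((x_adv - x).abs().max())      # perturbation bounded by epsilon
```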
- Algorithms and Simulation Framework for Residential Demand Response
  Adhikari, Rajendra (Virginia Tech, 2019-02-11)
  An electric power system is a complex network consisting of a large number of power generators and consumers interconnected by transmission and distribution lines. One remarkable property of the electric grid is that the amounts of electricity generated and consumed must balance continuously, at all times. Maintaining this balance is critical for stable operation of the grid, and the task is achieved in the long term, short term, and real time by operating a three-tier wholesale electricity market consisting of the capacity market, the energy market, and the ancillary services market, respectively. For a demand resource to participate in the energy and capacity markets, it needs to be able to reduce its power consumption on demand, whereas to participate in the ancillary services market, the power consumption of the demand resource needs to be varied continuously following the regulation signal sent by the grid operator. This act of changing demand to help maintain energy balance is called demand response (DR). The dissertation presents novel algorithms and tools to enable residential buildings to participate as demand resources in such markets. The residential sector consumes 37% of total U.S. electricity, and a recent consumer survey showed that 88% of consumers are either eager for or supportive of advanced technologies for energy efficiency, including demand response, which indicates that the residential sector is a very good target for DR. Two broad solutions for residential DR are presented. The first is a set of efficient algorithms that intelligently control customers' heating, ventilating, and air conditioning (HVAC) devices to provide DR services to the grid. The second is an extensible residential demand response simulation framework that can help evaluate and experiment with different residential DR algorithms. One of the algorithms presented in this dissertation reduces the aggregated demand of a set of HVACs during a DR event while respecting the customers' comfort requirements; it is efficient, simple to implement, and proven to be optimal. The second algorithm provides regulation DR while honoring customer comfort requirements; it is efficient, simple to implement, and shown to perform well in a range of real-world situations. A case study estimating the monetary benefit of implementing the algorithm in a cluster of 100 typical homes shows promising results. Finally, the dissertation presents the design of a Python-based, object-oriented residential DR simulation framework that is easy to extend as needed. The framework simulates the thermal dynamics of a residential building and supports household appliances such as the HVAC system, water heater, clothes washer/dryer, and dishwasher. A case study applying the simulation framework to various DR implementations shows that it performs well and can be a useful tool for future research in residential DR.
- Analysis of Blockchain-based Smart Contracts for Peer-to-Peer Solar Electricity Transactive Markets
  Lin, Jason (Virginia Tech, 2019-02-08)
  The emergence of blockchain technology and the increasing penetration of distributed energy resources (DERs) have created a new opportunity for peer-to-peer (P2P) energy trading. However, challenges arise in such transactive markets in ensuring individual rationality, incentive compatibility, budget balance, and economic efficiency during the trading process. This thesis creates an hour-ahead P2P energy trading network based on the Hyperledger Fabric blockchain and presents a comparative analysis of different auction mechanisms that form the basis of smart contracts. The considered auction mechanisms are discriminatory and uniform k-Double Auction (k-DA) with different k values. The thesis also investigates the effects of four consumer and prosumer bidding strategies: random, preference factor, price-only game-theoretic, and supply-demand game-theoretic. A custom simulation framework that models the behavior of the transactive market is developed. Case studies of a 100-home microgrid at various photovoltaic (PV) penetration levels are presented using typical residential load and PV generation profiles from the metropolitan Washington, D.C. area. Results indicate that regardless of PV penetration levels and employed bidding strategies, discriminatory k-DA can outperform uniform k-DA. Even so, discriminatory k-DA is more sensitive to market conditions than uniform k-DA. Additionally, results show that the price-only game-theoretic bidding strategy leads to near-ideal economic efficiencies regardless of auction mechanisms and PV penetration levels.
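For reference, a k-Double Auction matches sorted bids and asks and prices each trade at k*bid + (1 - k)*ask; under uniform pricing all trades clear at the marginal matched pair's price, while discriminatory pricing interpolates per pair. A minimal sketch under those standard definitions; the prices are illustrative and the thesis's market logic may differ in detail.

```python
# Minimal k-Double Auction sketch: match sorted bids/asks, then price each
# trade at k*bid + (1-k)*ask. Uniform pricing clears all trades at the
# marginal pair's price; discriminatory pricing interpolates per pair.
# Values are illustrative ($/kWh).
def k_double_auction(bids, asks, k=0.5, uniform=True):
    bids = sorted(bids, reverse=True)   # buyers, highest willingness first
    asks = sorted(asks)                 # sellers, lowest ask first
    trades = []
    for b, a in zip(bids, asks):
        if b < a:                       # no more profitable matches
            break
        trades.append((b, a))
    if not trades:
        return []
    if uniform:
        mb, ma = trades[-1]             # marginal matched pair
        p = k * mb + (1 - k) * ma
        return [(b, a, p) for b, a in trades]
    return [(b, a, k * b + (1 - k) * a) for b, a in trades]

print(k_double_auction([0.14, 0.12, 0.10], [0.08, 0.11, 0.13],
                       k=0.5, uniform=False))
```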
- Apply Machine Learning on Cattle Behavior Classification Using Accelerometer Data
  Zhao, Zhuqing (Virginia Tech, 2022-04-15)
  We used a 50 Hz sampling frequency to collect tri-axial acceleration data from cows. For the traditional machine learning approach, we segmented the data to calculate features, selected the important features, and applied machine learning algorithms for classification; comparing the performance of various models, we found a robust model with relatively low computation cost and high accuracy. For the deep learning approach, we designed an end-to-end trainable Convolutional Neural Network (CNN) to predict activities for given segments, and applied distillation and quantization to reduce the model size. In addition to the fixed-window-size approach, we used the CNN to predict dense labels, in which each data point receives an individual label, inspired by semantic segmentation; in this way, we obtain a more precise measurement of the composition of activities. In summary, because physically monitoring the well-being of crowded animals is labor-intensive, we proposed a solution for timely and efficient measurement of cattle's daily activities using wearable sensors and machine learning models.
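A minimal sketch of the traditional pipeline described above: segment the 50 Hz tri-axial stream into fixed windows, compute simple per-window features, and classify with a standard model. The window length, feature set, and labels are illustrative assumptions, not the thesis's exact choices.

```python
# Minimal sketch of the windowed-feature pipeline: segment 50 Hz tri-axial
# accelerometer data, compute simple per-window features, classify with a
# random forest. Features and labels here are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FS, WIN = 50, 150                       # 50 Hz sampling, 3 s windows

def features(seg):
    # seg: (WIN, 3) window of x/y/z acceleration
    return np.hstack([seg.mean(axis=0), seg.std(axis=0),
                      np.abs(np.diff(seg, axis=0)).mean(axis=0)])

rng = np.random.default_rng(0)
stream = rng.normal(size=(FS * 60, 3))                 # one minute of data
segs = stream[: len(stream) // WIN * WIN].reshape(-1, WIN, 3)
X = np.array([features(s) for s in segs])
y = rng.integers(0, 3, size=len(X))                    # e.g. graze/walk/rest
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X[:5]))
```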
- Architecting IoT-Enabled Smart Building Testbed
  Amanzadeh, Leila (Virginia Tech, 2018-10-29)
  Smart buildings' benefits range from improved occupant comfort, increased productivity, reduced energy consumption and operating costs, and lower CO2 emissions to an improved life cycle of utilities and more efficient operation of building systems [65]. Hence, modern building owners are turning toward smart buildings. However, most current smart buildings are not capable of achieving the objectives they are designed for, and there is considerable room for improvement [22]. Therefore, a newer technology, the Internet of Things (IoT), is being combined with smart buildings to improve their performance [23]. IoT is the inter-networking of things embedded with electronics, software, sensors, actuators, and network connectivity to collect and exchange data; "things" in this definition are anything and everything around us, and even ourselves. Using this technology, a door, for example, can be a thing that senses how many people have passed its sensor to enter a space, letting the lighting system prepare the appropriate amount of light or the HVAC (Heating, Ventilation, and Air Conditioning) system provide the desired temperature. IoT provides much useful information that was previously inaccessible, e.g., the condition of water pipes in winter, which helps avoid damage such as frozen or broken pipes. Despite all these benefits, however, IoT is vulnerable to cyber attacks; examples are provided in Chapter 1. In this project, among the building systems, the HVAC system is chosen to be automated with a control method called MPC (Model Predictive Control). According to the results of this project, this method is fast, very energy efficient, and regulates the space temperature to any setpoint the occupants desire with an error rate below 0.001. Furthermore, a PID (Proportional-Integral-Derivative) controller was designed for the HVAC system, and in the exact same cases MPC shows much better performance. To design controllers for the HVAC system and set the temperature to the desired value, a method to automate balancing of the heat flow is needed; therefore, a thermal model of the building must be available, so that the amount of heat flowing in and out of a space in the building under any external weather can be estimated. To automate the HVAC system using programming languages like MATLAB, the thermal model of the building must be converted to a mathematical model. This mathematical model is unique to each building, depending on how many floors it has, how wide it is, and what materials were used in its construction. The conversion requires a great deal of effort and time, even for buildings with two floors and two rooms per floor, and the engineer may still introduce errors. This project presents software that automatically converts the thermal model of a building of any size to its mathematical model, which helps improve the HVAC controllers that set the temperature to the value occupants desire and avoids the errors and time lost in both calculation and troubleshooting. In addition, a test environment has been designed and constructed as a cyber-physical system that allows IoT-enabled control systems to be tested before implementation on real buildings, their performance observed, and their adequacy judged. All cyber threats can also be explored on this testbed and solutions to those attacks evaluated. Even systems that are already deployed can be assessed on the testbed; if any cyber-security vulnerability exists, solutions can be evaluated to help the existing systems improve.
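The thermal-to-mathematical-model conversion described above is typically a mapping from a resistance-capacitance (RC) thermal network to a linear state-space system that MPC and PID design can consume. A minimal single-zone sketch under that standard formulation; the R and C values and the input set are illustrative, not the project's software.

```python
# Minimal sketch of converting a single-zone RC thermal model to the
# state-space form used by MPC/PID design: C*dT/dt = (T_out - T)/R + Q_hvac,
# i.e. dT/dt = A*T + B*u with u = [T_out, Q_hvac]. R and C are illustrative.
import numpy as np

R = 2.0     # thermal resistance to outside (K/kW)
C = 10.0    # zone thermal capacitance (kWh/K)

A = np.array([[-1.0 / (R * C)]])
B = np.array([[1.0 / (R * C), 1.0 / C]])   # inputs: outdoor temp, HVAC heat

def step(T, T_out, Q_hvac, dt=0.25):       # dt in hours (forward Euler)
    u = np.array([T_out, Q_hvac])
    return T + dt * (A @ T + B @ u)

T = np.array([20.0])                        # indoor temperature, deg C
for _ in range(4):                          # 4 * 0.25 h = one hour
    T = step(T, T_out=0.0, Q_hvac=2.5)
print(T)   # indoor temperature after one simulated hour
```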
- Armband EMG-based Lifting Detection and Load Classification Algorithms using Static and Dynamic Lifting Trials
  Taori, Sakshi Pranay (Virginia Tech, 2023-06-08)
  The high prevalence of work-related musculoskeletal disorders in occupational settings necessitates the development of economical, accurate, and convenient methods for quantifying biomechanical risk exposures. In terms of lifting, the occupational work environment does not provide resources for recording the start and end times of lifting tasks performed by individual workers; as a result, automatic detection of lift starts and ends is required for practical purposes. Occupational lifting styles vary depending on the asymmetry angle, which is the degree of shoulder or trunk rotation required by the lifting task. Predictive or machine learning (ML) algorithms have been increasingly used in the ergonomics field to identify occupational risk factors, such as lifting loads. However, such algorithms are often developed and validated using datasets collected from the same lab-based experimental set-up, which limits their external validity. The recent development of wearable armbands with surface electromyography (sEMG) electrodes provides a low-cost, wireless, and non-invasive way to collect EMG data beyond laboratory settings. Despite their tremendous potential for field-based workload estimation, these armbands have not yet been widely implemented in automated lift detection and occupational workload estimation. The objective of this study was to evaluate the performance of ML algorithms in the automatic detection of lifts and classification of hand loads during manual lifting tasks from data acquired by a wearable armband sensor with eight sEMG electrodes. Twelve healthy participants (six male and six female) performed repetitive symmetric (S), asymmetric (A), and free dynamic (F) lifts with three different hand-load levels (5 lb, 10 lb, and 15 lb) at two origin heights (24" and 36") and two destination heights (6" and 36"). Three ML algorithms were utilized: Random Forest (RF), Support Vector Machines (SVM), and Gaussian Naïve Bayes (GNB). For lift detection, a subset of four participants was analyzed as a preliminary investigation. RF showed the best performance, with mean start and end errors of 0.53 ± 0.25 seconds and 0.76 ± 0.28 seconds, respectively, and accuracy scores of 84.3 ± 3.3% for lift start and 83.3 ± 9.9% for lift end. For hand-load classification, prediction models were developed using four different lifting datasets (S, A, S+A, and F) and were cross-validated using F as the test dataset. Mean classification accuracy was significantly lower in models developed with the S dataset (78.8 ± 7.3%) compared to A (83.3 ± 7.2%), S+A (82.1 ± 7.3%), and F (83.4 ± 8.1%). Overall, the findings indicate that implementing ML algorithms with wearable EMG armbands for automatic lift detection in occupational settings can be promising. In hand-load classification, models developed with only controlled symmetric lifts were less accurate in predicting the loads of more dynamic, unconstrained lifts, which are common in real-world settings. However, since both A and S+A demonstrated model accuracy equivalent to F, EMG armbands possess strong potential for estimating the hand loads of free-dynamic lifts using constrained lift trials involving asymmetric lifts.
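The cross-dataset protocol above (train on constrained lifts, test on free-dynamic lifts) is simple to express in code. A minimal sketch with the three classifiers named in the abstract; the synthetic features and labels are placeholders for the armband EMG data.

```python
# Minimal sketch of the cross-dataset protocol: fit on one lifting condition
# (e.g. symmetric, S) and test on free-dynamic lifts (F), for the three
# classifiers named in the abstract. Data here are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X_S, y_S = rng.normal(size=(300, 8)), rng.integers(0, 3, 300)  # 8-ch features
X_F, y_F = rng.normal(size=(100, 8)), rng.integers(0, 3, 100)  # 5/10/15 lb

for name, clf in [("RF", RandomForestClassifier(random_state=0)),
                  ("SVM", SVC()), ("GNB", GaussianNB())]:
    acc = clf.fit(X_S, y_S).score(X_F, y_F)
    print(f"{name}: train-on-S, test-on-F accuracy = {acc:.2f}")
```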
- Asymmetric independence modeling identifies novel gene-environment interactions
  Yu, Guoqiang; Miller, David J.; Wu, Chiung-Ting; Hoffman, Eric P.; Liu, Chunyu; Herrington, David M.; Wang, Yue (Springer Nature, 2019-02-21)
  Most genetic or environmental factors work together in determining complex disease risk. Detecting gene-environment interactions may allow us to elucidate novel and targetable molecular mechanisms of how environmental exposures modify genetic effects. Unfortunately, standard logistic regression (LR) assumes a convenient mathematical structure for the null hypothesis that results in both poor detection power and poor control of type I error, and it is also susceptible to confounding from missing factors, imperfect surrogates, and disease heterogeneity. Here we describe a new baseline framework, the asymmetric independence model (AIM), for case-control studies, and provide mathematical proofs and simulation studies verifying its validity across a wide range of conditions. We show that AIM mathematically preserves the asymmetric nature of maintaining health versus acquiring a disease, unlike LR, and thus is more powerful and robust in detecting synergistic interactions. We present examples from four clinically discrete domains where AIM identified interactions that were previously either inconsistent or recognized with less statistical certainty.
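For context, the standard LR baseline criticized above encodes an interaction as a product term and forces a multiplicative null on the odds; in the usual textbook notation (not the paper's own):

```latex
% Standard logistic-regression gene-environment interaction test:
% the null hypothesis constrains the odds to a multiplicative structure.
\operatorname{logit} P(D = 1 \mid G, E)
  = \beta_0 + \beta_1 G + \beta_2 E + \beta_3 \, G \times E,
\qquad H_0 : \beta_3 = 0 .
```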
- Automated Analysis of Astrocyte Activities from Large-scale Time-lapse Microscopic Imaging Data
  Wang, Yizhi (Virginia Tech, 2019-12-13)
  The advent of multi-photon microscopes and highly sensitive protein sensors enables the recording of astrocyte activities over a large population of cells and a long time period in vivo. Existing tools cannot fully characterize these activities, both within single cells and at the population level, because current region-of-interest-based approaches are insufficient to describe activity that is often spatially unfixed, size-varying, and propagative. Here, we present Astrocyte Quantitative Analysis (AQuA), an analytical framework that releases astrocyte biologists from the ROI-based paradigm. The framework takes an event-based perspective to model and accurately quantify the complex activity in astrocyte imaging datasets, with an event defined jointly by its spatial occupancy and temporal dynamics. To model signal propagation in astrocytes, we developed graphical time warping (GTW) to align curves with graph-structured constraints and integrated it into AQuA. To make AQuA easy to use, we designed a comprehensive software package that implements the detection pipeline in an intuitive step-by-step GUI with visual feedback, and that also supports proof-reading and the incorporation of morphology information. With synthetic data, we showed that AQuA performs with much better accuracy than existing methods developed for astrocytic and neuronal data. We applied AQuA to a range of ex vivo and in vivo imaging datasets. Since AQuA is data-driven and based on machine learning principles, it can be applied across model organisms, fluorescent indicators, experimental modes, and imaging resolutions and speeds, enabling researchers to elucidate fundamental astrocyte physiology.
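GTW, mentioned above, generalizes dynamic time warping (DTW) by coupling warping paths with graph-structured constraints. A minimal sketch of the underlying pairwise DTW recursion that GTW builds on; the graph coupling itself is the part GTW adds and is omitted here.

```python
# Minimal dynamic time warping sketch: the pairwise alignment that graphical
# time warping (GTW) builds on. GTW additionally couples the warping paths
# of neighboring curves via graph-structured constraints (omitted here).
import numpy as np

def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of match / insertion / deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

t = np.linspace(0, 2 * np.pi, 50)
print(dtw(np.sin(t), np.sin(t - 0.5)))   # small distance despite the shift
```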
- Automated Identification and Tracking of Motile Oligodendrocyte Precursor Cells (OPCs) from Time-lapse 3D Microscopic Imaging Data of Cell Clusters in vivo
  Wang, Yinxue (Virginia Tech, 2021-06-02)
  Advances in time-lapse 3D in vivo fluorescence microscopic imaging techniques enable the observation and investigation of the migration of oligodendrocyte precursor cells (OPCs) and its role in the central nervous system. However, current practice in image-based OPC motility analysis relies heavily on manual labeling and tracking on 2D max projections of the 3D data, which entails massive human labor and suffers from subjective biases, weak reproducibility, and especially information loss and distortion. Besides, due to the lack of an OPC-specific genetically encoded indicator, OPCs can only be distinguished from other oligodendrocyte lineage cells by their observed motion patterns. Automated analytical tools are therefore needed for the identification and tracking of OPCs. In this dissertation work, we proposed an analytical framework, MicTracker (Migrating Cell Tracker), for the integrated task of identifying, segmenting, and tracking migrating cells (OPCs) in in vivo time-lapse fluorescence imaging data of high-density cell clusters composed of cells with different modes of motion. As a component of the framework, we presented a novel strategy for cell segmentation with global temporal consistency enforced, tackling the challenges caused by the highly clustered cell population and temporally inconsistently blurred boundaries between touching cells. We also designed a data association algorithm to address the violation of the usual assumption of small displacements. Recognizing that the violation occurred in the mixed cell population composed of two cell groups, while the assumption held within each group, we proposed to solve the seemingly impossible task by de-mixing the two groups of cell motion modes without known labels. We demonstrated the effectiveness of MicTracker in solving our problem on real in vivo data.
- Automated Tracking of Mouse Embryogenesis from Large-scale Fluorescence Microscopy Data
  Wang, Congchao (Virginia Tech, 2021-06-03)
  Recent breakthroughs in microscopy techniques and fluorescence probes enable the recording of mouse embryogenesis at the cellular level for days, easily generating terabytes of 3D time-lapse data. Since millions of cells are involved, this information-rich data brings a natural demand for an automated tool for comprehensive analysis. Such a tool should automatically (1) detect and segment cells at each time point and (2) track cell migration across time. Most existing cell tracking methods cannot scale to data of such large size and high complexity, and those purposely designed for embryo data analysis heavily sacrifice accuracy. Here, we present a new computational framework for mouse embryo data analysis with high accuracy and efficiency. Our framework detects and segments cells with a fully probability-principled method, which not only has high statistical power but also helps determine the desired cell territories and increase segmentation accuracy. With the cells detected at each time point, our framework reconstructs cell traces with a new minimum-cost circulation-based paradigm, CINDA (CIrculation Network-based Data Association). Compared with the widely used minimum-cost flow-based methods, CINDA guarantees the globally optimal solution with the best-known theoretical worst-case complexity and improves practical efficiency by hundreds to thousands of times. Since the information extracted from a single time point is limited, our framework iteratively refines cell detection and segmentation results based on the cell traces, which carry information from other time points; results show that this dramatically improves the accuracy of cell detection, segmentation, and tracking. To make our work easy to use, we designed standalone software, MIVAQ (Microscopic Image Visualization, Annotation, and Quantification), with our framework as the backbone and a user-friendly interface. With MIVAQ, users can easily analyze their data and visually check the results.
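Tracking-by-minimum-cost-flow, the paradigm CINDA reformulates as a minimum-cost circulation, can be sketched with networkx: detections become capacity-one nodes (split into in/out halves) and transition edges carry motion costs. The toy positions and cost weighting below are illustrative, not CINDA's formulation.

```python
# Minimal sketch of tracking-by-min-cost-flow, the paradigm CINDA speeds up
# with a min-cost circulation formulation. Each detection is split into
# in/out nodes (capacity 1) so traces cannot share a detection; transition
# cost is squared distance. Positions and costs are illustrative.
import networkx as nx

frames = [[(0.0,), (5.0,)], [(0.2,), (5.1,)], [(0.5,), (4.9,)]]  # 1-D demo
G = nx.DiGraph()
G.add_node("S", demand=-2)            # route two traces through the graph
G.add_node("T", demand=2)
for t, dets in enumerate(frames):
    for i, (x,) in enumerate(dets):
        u, v = f"in{t}_{i}", f"out{t}_{i}"
        G.add_edge(u, v, capacity=1, weight=0)      # node capacity 1
        if t == 0:
            G.add_edge("S", u, capacity=1, weight=0)
        if t == len(frames) - 1:
            G.add_edge(v, "T", capacity=1, weight=0)
for t in range(len(frames) - 1):
    for i, (x,) in enumerate(frames[t]):
        for j, (y,) in enumerate(frames[t + 1]):
            cost = int(100 * (x - y) ** 2)          # integer transition cost
            G.add_edge(f"out{t}_{i}", f"in{t+1}_{j}", capacity=1, weight=cost)
flow = nx.min_cost_flow(G)
links = [(u, v) for u in flow for v, f in flow[u].items()
         if f and "out" in u and "in" in v]
print(sorted(links))   # detection-to-detection links forming the two traces
```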
- BACOM2.0 facilitates absolute normalization and quantification of somatic copy number alterations in heterogeneous tumor
  Fu, Yi; Yu, Guoqiang; Levine, Douglas A.; Wang, Niya; Shih, Ie-Ming; Zhang, Zhen; Clarke, Robert; Wang, Yue (Springer Nature, 2015-09-09)
  Most published copy number datasets on solid tumors were obtained from specimens comprised of mixed cell populations, for which the varying tumor-stroma proportions are unknown or unreported. The inability to correct for signal mixing represents a major limitation on the use of these datasets for subsequent analyses, such as discerning deletion types or detecting driver aberrations. We describe the BACOM2.0 method with enhanced accuracy and functionality to normalize copy number signals, detect deletion types, estimate tumor purity, quantify true copy numbers, and calculate average-ploidy value. While BACOM has been validated and used with promising results, subsequent BACOM analysis of the TCGA ovarian cancer dataset found that the estimated average tumor purity was lower than expected. In this report, we first show that this lowered estimate of tumor purity is the combined result of imprecise signal normalization and parameter estimation. Then, we describe effective allele-specific absolute normalization and quantification methods that can enhance BACOM applications in many biological contexts while in the presence of various confounders. Finally, we discuss the advantages of BACOM in relation to alternative approaches. Here we detail this revised computational approach, BACOM2.0, and validate its performance in real and simulated datasets.
- Bayesian Alignment Model for Analysis of LC-MS-based Omic Data
  Tsai, Tsung-Heng (Virginia Tech, 2014-05-22)
  Liquid chromatography coupled with mass spectrometry (LC-MS) has been widely used in various omic studies for biomarker discovery. Appropriate LC-MS data preprocessing steps are needed to detect true differences between biological groups. Retention time alignment is one of the most important yet challenging preprocessing steps, needed to ensure that ion intensity measurements among multiple LC-MS runs are comparable. In this dissertation, we propose a Bayesian alignment model (BAM) for analysis of LC-MS data. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and provides estimates of the retention time variability along with uncertainty measures, enabling a natural framework for integrating information from various sources. From methodology development to practical application, we investigate the alignment problem through three research topics: 1) development of a single-profile Bayesian alignment model, 2) development of a multi-profile Bayesian alignment model, and 3) application to biomarker discovery research. Chapter 2 introduces the profile-based Bayesian alignment using a single chromatogram, e.g., the base peak chromatogram of each LC-MS run. The single-profile alignment model improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler using a block Metropolis-Hastings algorithm, and 2) an adaptive mechanism for knot specification using stochastic search variable selection (SSVS). Chapter 3 extends the model to integrate complementary information that better captures the variability in chromatographic separation. We use Gaussian process regression on the internal standards to derive a prior distribution for the mapping functions. In addition, a clustering approach is proposed to identify multiple representative chromatograms for each LC-MS run. With the Gaussian process prior, these chromatograms are simultaneously considered in the profile-based alignment, which greatly improves model estimation and facilitates the subsequent peak matching process. Chapter 4 demonstrates the applicability of the proposed Bayesian alignment model to biomarker discovery research. We integrate the proposed model into a rigorous preprocessing pipeline for LC-MS data analysis, through which candidate biomarkers for hepatocellular carcinoma (HCC) are identified and confirmed on a complementary platform.
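BAM's inference rests on Metropolis-Hastings updates. A minimal random-walk MH sketch for a toy posterior follows; the block updates and SSVS-based knot selection described in Chapter 2 are beyond this sketch.

```python
# Minimal random-walk Metropolis-Hastings sketch: the basic update that
# BAM's block MH sampler builds on. The target here is a toy log-posterior;
# block updates and SSVS knot selection are beyond this sketch.
import numpy as np

def log_post(theta):
    return -0.5 * (theta - 3.0) ** 2      # toy: N(3, 1) posterior

rng = np.random.default_rng(0)
theta, samples = 0.0, []
for _ in range(5000):
    prop = theta + rng.normal(0, 0.5)     # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop                      # accept; otherwise keep current
    samples.append(theta)
print(np.mean(samples[1000:]))            # ~3 after burn-in
```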
- Biclustering and Visualization of High Dimensional Data using VIsual Statistical Data Analyzer
  Blake, Patrick Michael (Virginia Tech, 2019-01-31)
  Many data sets have too many features for conventional pattern recognition techniques to work properly. This thesis investigates techniques that alleviate these difficulties. One such technique, biclustering, clusters data in both dimensions and is inherently resistant to the challenges posed by having too many features. However, the algorithms that implement biclustering have limitations in that the user must know at least the structure of the data and how many biclusters to expect. This is where the VIsual Statistical Data Analyzer, or VISDA, can help. It is a visualization tool that successively and progressively explores the structure of the data, identifying clusters along the way. This thesis proposes coupling VISDA with biclustering to overcome some of the challenges of data sets with too many features. Further, to increase the performance, usability, and maintainability as well as reduce costs, VISDA was translated from Matlab to a Python version called VISDApy. Both VISDApy and the overall process were demonstrated with real and synthetic data sets. The results of this work have the potential to improve analysts' understanding of the relationships within complex data sets and their ability to make informed decisions from such data.
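A minimal scikit-learn biclustering sketch that also illustrates the limitation noted above: the user must supply the expected structure (here, the checkerboard bicluster counts) up front. VISDA/VISDApy's progressive exploration is what relaxes this requirement.

```python
# Minimal biclustering sketch with scikit-learn, illustrating the limitation
# noted above: the expected bicluster structure (n_clusters) must be given
# in advance. Data are synthetic with a planted checkerboard pattern.
import numpy as np
from sklearn.cluster import SpectralBiclustering
from sklearn.datasets import make_checkerboard

X, rows, cols = make_checkerboard(shape=(60, 60), n_clusters=(3, 2),
                                  noise=5, random_state=0)
model = SpectralBiclustering(n_clusters=(3, 2), random_state=0)
model.fit(X)
print(model.row_labels_[:10], model.column_labels_[:10])
```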
- Bioinformatic Analysis of Coronary Disease Associated SNPs and Genes to Identify Proteins Potentially Involved in the Pathogenesis of Atherosclerosis
  Mao, Chunhong; Howard, Timothy D.; Sullivan, Dan; Fu, Zongming; Yu, Guoqiang; Parker, Sarah J.; Will, Rebecca; Vander Heide, Richard S.; Wang, Yue; Hixson, James; Van Eyk, Jennifer; Herrington, David M. (Open Access Pub, 2017-03-04)
  Factors that contribute to the onset of atherosclerosis may be elucidated by bioinformatic techniques applied to multiple sources of genomic and proteomic data. The results of genome wide association studies, such as the CardioGramPlusC4D study, expression data, such as that available from expression quantitative trait loci (eQTL) databases, along with protein interaction and pathway data available in Ingenuity Pathway Analysis (IPA), constitute a substantial set of data amenable to bioinformatics analysis. This study used bioinformatic analyses of recent genome wide association data to identify a seed set of genes likely associated with atherosclerosis. The set was expanded to include protein interaction candidates to create a network of proteins possibly influencing the onset and progression of atherosclerosis. Local average connectivity (LAC), eigenvector centrality, and betweenness metrics were calculated for the interaction network to identify top gene and protein candidates for a better understanding of the atherosclerotic disease process. The top ranking genes included some known to be involved with cardiovascular disease (APOA1, APOA5, APOB, APOC1, APOC2, APOE, CDKN1A, CXCL12, SCARB1, SMARCA4 and TERT), and others that are less obvious and require further investigation (TP53, MYC, PPARG, YWHAQ, RB1, AR, ESR1, EGFR, UBC and YWHAZ). Collectively these data help define a more focused set of genes that likely play a pivotal role in the pathogenesis of atherosclerosis and are therefore natural targets for novel therapeutic interventions.
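The network metrics above are straightforward to compute with networkx. A minimal sketch on a toy interaction graph drawn from gene names in the abstract; the edges are illustrative, and the LAC computed here is a simplified stand-in (mean neighbor degree), not necessarily the paper's exact definition.

```python
# Minimal sketch of the network metrics used above, on a toy protein-
# interaction graph: eigenvector centrality, betweenness centrality, and a
# simplified local-average-connectivity (LAC) stand-in (mean neighbor
# degree). Edges are illustrative, not the paper's network.
import networkx as nx

G = nx.Graph([("APOE", "APOB"), ("APOE", "APOA1"), ("APOA1", "APOC2"),
              ("TP53", "MYC"), ("TP53", "APOE"), ("MYC", "EGFR")])
eig = nx.eigenvector_centrality(G)
btw = nx.betweenness_centrality(G)
lac = {n: sum(G.degree(m) for m in G[n]) / G.degree(n) for n in G}
for n in sorted(G, key=eig.get, reverse=True):
    print(f"{n}: eig={eig[n]:.2f} btw={btw[n]:.2f} lac={lac[n]:.2f}")
```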
- Blockchain-based Peer-to-peer Electricity Trading Framework Through Machine Learning-based Anomaly Detection Technique
  Jing, Zejia (Virginia Tech, 2022-08-31)
  With the growing installation of home photovoltaics, traditional energy trading is evolving from a unidirectional utility-to-consumer model into a more distributed peer-to-peer paradigm. Besides, with the development of building energy management platforms and demand-response-enabled smart devices, saved energy consumption, known as negawatt-hours, has also emerged as another commodity that can be exchanged. Users may tune their heating, ventilation, and air conditioning (HVAC) system setpoints to adjust a building's hourly energy consumption and thereby generate negawatt-hours. Photovoltaic (PV) energy and negawatt-hours are the two major resources in peer-to-peer electricity trading. Blockchain has been touted as an enabler of trustworthy and reliable peer-to-peer trading, facilitating the deployment of such distributed electricity trading through encrypted processes and records. Unfortunately, blockchain cannot fully detect anomalous participant behaviors or malicious inputs to the network. Consequently, end-user anomaly detection is imperative for enhancing trust in peer-to-peer electricity trading. This dissertation introduces machine learning-based anomaly detection techniques for peer-to-peer PV energy and negawatt-hour trading. These can help predict the PV energy and negawatt-hours available in the next hour and flag potential anomalies in submitted bids. As the traditional energy trading market is agnostic to tangible real-world resources, developing, evaluating, and integrating machine learning forecasting-based anomaly detection methods can give users knowledge of reasonable bid quantities. A user may intentionally or unintentionally submit extremely high or low bids that do not match their solar panels' capability or are not backed by substantial negawatt-hour and PV energy resources. Some anomalies occur because a participant's sensor suffers from integrity errors, while other abnormal offers are submitted maliciously so that the attackers benefit from market disruption. In both cases, anomalies should be detected by the algorithm and rejected by the market. Artificial Neural Networks (ANN), Recurrent Neural Networks (RNN) with Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), and Convolutional Neural Networks (CNN) are compared and studied for PV energy and negawatt-hour forecasting. The semi-supervised anomaly detection framework is explained, and its performance is demonstrated. The threshold values for anomaly detection are determined based on a model trained on historical data. Besides ambient weather information, the HVAC setpoint and building occupancy are input parameters for predicting building hourly energy consumption in negawatt-hour trading. The building model is trained and managed by negawatt-hour aggregators. CO2 monitoring devices are integrated into the cloud-based smart building platform BEMOSS™ to estimate occupancy levels, further improving building load forecasting accuracy in negawatt-hour trading. The relationship between building occupancy and CO2 measurements is analyzed. Finally, experiments based on the Hyperledger platform demonstrate blockchain-based peer-to-peer energy trading and how the platform detects anomalies.
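A minimal sketch of the forecasting-based check described above: learn a threshold from historical forecast residuals (semi-supervised, normal data only), then flag bids that deviate from the next-hour prediction by more than that threshold. The forecaster is stubbed out here; in the dissertation it would be one of the ANN/LSTM/GRU/CNN models, and all quantities are illustrative.

```python
# Minimal sketch of forecast-residual anomaly detection: a bid is flagged
# if it deviates from the model's next-hour prediction by more than a
# threshold learned from historical residuals (normal data only). The
# forecaster is a stub standing in for the ANN/LSTM/GRU/CNN models.
import numpy as np

rng = np.random.default_rng(0)
history_true = rng.uniform(2.0, 6.0, 500)                 # kWh, historical
history_pred = history_true + rng.normal(0, 0.3, 500)     # model predictions
residuals = np.abs(history_true - history_pred)
threshold = np.quantile(residuals, 0.99)   # semi-supervised threshold

def is_anomalous(bid_kwh, predicted_kwh):
    return abs(bid_kwh - predicted_kwh) > threshold

print(is_anomalous(bid_kwh=5.1, predicted_kwh=5.0))   # plausible bid
print(is_anomalous(bid_kwh=40.0, predicted_kwh=5.0))  # flagged, reject
```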
- Blockchain-enabled Secure and Trusted Personalized Health Record
  Dong, Yibin (Virginia Tech, 2022-12-20)
  A longitudinal personalized electronic health record (LPHR) provides a holistic view of an individual's health records and offers a consistent, patient-controlled information system for managing patient health care. Except for patients in the Veterans Affairs health care service, however, no LPHR is available for the general U.S. population that can integrate a patient's existing electronic health records throughout their life of care. This gap stems mainly from the fact that existing patients' electronic health records are scattered across multiple health care facilities and often not shared, due to privacy and security concerns from both patients and health care organizations. The main objective of this dissertation is to address these roadblocks by designing a scalable and interoperable LPHR with patient-controlled and mutually trusted security and privacy. Privacy and security are complex problems. Specifically, without a set of access control policies, encryption alone cannot secure patient data, due to insider threats. Moreover, in a distributed system like an LPHR, a so-called race condition occurs when access control policies are centralized while decision-making processes are localized. We propose a formal definition of a secure LPHR and develop a blockchain-enabled next generation access control (BeNGAC) model. The BeNGAC solution focuses on patient-managed secure authorization for access, and NGAC operates in open access surroundings where users can be centrally known or unknown. We also propose permissioned blockchain technology - Hyperledger Fabric (HF) - to ease the race-condition shortcoming of NGAC, which in return enhances the weak confidentiality protection in HF. Built upon BeNGAC, we further design a blockchain-enabled secure and trusted (BEST) LPHR prototype in which data are stored in a distributed yet decentralized database. The unique feature of the proposed BEST-LPHR is the use of blockchain smart contracts allowing BeNGAC policies to govern security, privacy, confidentiality, data integrity, scalability, sharing, and auditability. Interoperability is achieved by using a health care data exchange standard called Fast Healthcare Interoperability Resources. We demonstrated the feasibility of the BEST-LPHR design through use case studies. Specifically, a small-scale BEST-LPHR was built as a sharing platform for a patient and health care organizations. In the study setting, patients raised additional ethical concerns related to consent and granular control of the LPHR. We engineered a Web-delivered BEST-LPHR sharing platform with patient-controlled consent granularity, security, and privacy realized by BeNGAC. Health organizations holding the patient's electronic health record (EHR) can join the platform with trust based on validation by the patient. Mutual trust is established through a rigorous validation process involving both the patient and the built-in HF consensus mechanism. We measured system scalability and showed millisecond-range performance for LPHR permission changes. In this dissertation, we report the BEST-LPHR solution for electronically sharing and managing patients' electronic health records from multiple organizations, focusing on privacy and security concerns. While the proposed BEST-LPHR solution cannot, expectedly, address all problems in LPHR, this prototype aims to increase the EHR adoption rate and reduce LPHR implementation roadblocks. In the long run, the BEST-LPHR will contribute to improving health care efficiency and quality of life for many patients.
- Building occupancy analytics based on deep learning through the use of environmental sensor data
  Zhang, Zheyu (Virginia Tech, 2023-05-24)
  Balancing indoor comfort and energy consumption is crucial to building energy efficiency. Occupancy information is a vital aspect of this process, as it determines the energy demand. Although various sensors are used to gather occupancy information, environmental sensors stand out due to their low cost and privacy benefits. Machine learning algorithms play a critical role in estimating the relationship between occupancy levels and environmental data, and improving performance requires more complex models such as deep learning algorithms. Long Short-Term Memory (LSTM) is a powerful deep learning algorithm that has been utilized in occupancy estimation; recently, however, a mechanism named Attention has emerged that offers improved performance. This study proposes a more effective model for occupancy-level estimation by incorporating Attention into the existing Long Short-Term Memory algorithm. The results show that the proposed model is more accurate than a single algorithm alone and has the potential to be integrated into building energy control systems to conserve even more energy.
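A minimal PyTorch sketch of the LSTM-plus-Attention idea described above: an LSTM encodes a window of environmental readings, a learned attention weighting pools the sequence, and a linear head predicts the occupancy level. The feature set, shapes, and hyperparameters are illustrative assumptions, not the thesis's model.

```python
# Minimal sketch of an LSTM-plus-Attention occupancy estimator: the LSTM
# encodes a window of environmental readings, attention weights pool the
# sequence, and a linear head outputs occupancy-level logits. Shapes and
# hyperparameters are illustrative.
import torch
import torch.nn as nn

class OccupancyNet(nn.Module):
    def __init__(self, n_feat=4, hidden=32, levels=5):
        super().__init__()
        self.lstm = nn.LSTM(n_feat, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)     # attention score per time step
        self.head = nn.Linear(hidden, levels)

    def forward(self, x):                     # x: (batch, T, n_feat)
        h, _ = self.lstm(x)                   # (batch, T, hidden)
        w = torch.softmax(self.score(h), dim=1)
        ctx = (w * h).sum(dim=1)              # attention-weighted pooling
        return self.head(ctx)

x = torch.randn(8, 24, 4)                     # e.g. CO2, temp, RH, light
print(OccupancyNet()(x).shape)                # (8, 5) occupancy logits
```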
- Collimator Width Optimization in X-ray Luminescent Computed Tomography
  Mishra, Sourav (Virginia Tech, 2013-06-17)
  X-ray Luminescent Computed Tomography (XLCT) is a new imaging modality that is currently under extensive trials. The modality works by selectively exciting X-ray-sensitive nanophosphors and detecting the optical signal thus generated. This system can be used to reconstruct high-quality tomographic slices even at a low X-ray dose. Many studies have reported successful validation of the underlying philosophy. However, there is still a lack of information about the optimal settings, or combinations of imaging parameters, that would yield the best outputs. Research groups participating in this area have reported results on the basis of dose, signal-to-noise ratio, or resolution only. In this thesis, the candidate evaluated XLCT taking into consideration noise and resolution in terms of composite indices. Simulations were performed for various beam widths, and noise and resolution metrics were deduced. This information was used to evaluate image quality on the basis of a CT Figure of Merit and a modified Wang-Bovik Image Quality index. Simulations indicate the presence of an optimal setting that can be configured prior to extensive scans. The study, although focusing on a particular implementation, hopes to establish a paradigm for finding the best settings for any XLCT system. Scanning with an optimal setting preconfigured can help vastly reduce the cost and risks involved with this imaging modality.
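The Wang-Bovik universal image quality index referenced above combines correlation, luminance, and contrast terms into a single score. A minimal sketch of the original index follows; the thesis's modification is not reproduced here, and the test images are synthetic.

```python
# Minimal sketch of the Wang-Bovik universal image quality index:
# Q = 4*cov(x,y)*mx*my / ((vx + vy) * (mx^2 + my^2)), combining correlation,
# luminance distortion, and contrast distortion. Q = 1 iff x == y.
import numpy as np

def wang_bovik_q(x, y):
    x, y = x.astype(float).ravel(), y.astype(float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 4 * cov * mx * my / ((vx + vy) * (mx**2 + my**2))

rng = np.random.default_rng(0)
ref = rng.uniform(0, 1, (64, 64))
noisy = np.clip(ref + rng.normal(0, 0.05, ref.shape), 0, 1)
print(wang_bovik_q(ref, ref), wang_bovik_q(ref, noisy))  # 1.0 vs < 1.0
```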