Browsing by Author "Lou, Wenjing"
Now showing 1 - 20 of 58
- Android Application Install-time Permission Validation and Run-time Malicious Pattern Detection. Ma, Zhongmin (Virginia Tech, 2014-01-31). The open-source structure of Android applications introduces security vulnerabilities that can be readily exploited by third-party applications. We address certain vulnerabilities at both installation and runtime using machine learning. Effective classification techniques with neural networks can be used to verify application categories at installation. We devise a novel application-category verification methodology that applies machine learning to the application permissions and estimates the likelihood of each category. To detect malicious patterns at runtime, we present a Hidden Markov Model (HMM) method that analyzes activity usage by tracking Intent log information. After applying our technique to nearly 1,700 popular third-party Android applications and malware samples, we report that a major portion of the category declarations were judged correctly, demonstrating the effectiveness of neural-network decision engines in validating Android application categories. The approach of using an HMM to analyze the Intent log for detecting malicious runtime behavior is new. The test results show promise on a limited input dataset (69.7% accuracy). To improve performance, further work will increase the dataset size by adding game applications, optimize the Baum-Welch algorithm parameters, and balance the size of the Intent sequences. To better emulate participants' usage, some popular applications can be selected in advance and the remainder chosen at random.
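The abstract above describes scoring Intent-event sequences with an HMM. As a minimal sketch of how such scoring works (not the dissertation's trained model; the two-state parameters, vocabulary, and threshold below are hypothetical), the forward algorithm computes the log-likelihood of an observed sequence, and unusually low likelihood flags a potentially malicious pattern:

```python
# Minimal sketch: scoring an Intent-event sequence with a discrete HMM.
# All parameters here are hypothetical, not the dissertation's trained values.
import numpy as np

def hmm_log_likelihood(obs, pi, A, B):
    """Forward algorithm in log space.

    obs: sequence of observation indices (e.g., Intent event types)
    pi:  (S,) initial state probabilities
    A:   (S, S) transition matrix, A[i, j] = P(s_j | s_i)
    B:   (S, V) emission matrix, B[i, k] = P(obs k | state i)
    """
    log_alpha = np.log(pi) + np.log(B[:, obs[0]])
    for o in obs[1:]:
        # log-sum-exp over previous states for numerical stability
        m = log_alpha.max()
        log_alpha = m + np.log(np.exp(log_alpha - m) @ A) + np.log(B[:, o])
    m = log_alpha.max()
    return m + np.log(np.exp(log_alpha - m).sum())

# Hypothetical 2-state model over 3 Intent event types
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]])

seq = [0, 1, 0, 2, 2, 2]                              # observed Intent events
score = hmm_log_likelihood(seq, pi, A, B) / len(seq)  # length-normalized
if score < -1.5:  # threshold would be tuned on benign traces
    print("flag as potentially malicious:", score)
else:
    print("looks benign:", score)
```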
- Anomaly Detection Through System and Program Behavior Modeling. Xu, Kui (Virginia Tech, 2014-12-15). Various vulnerabilities in software applications become easy targets for attackers. The trend constantly observed in the evolution of advanced modern exploits is their growing sophistication in stealthy attacks. Code-reuse attacks such as return-oriented programming allow intruders to execute mal-intended instruction sequences on a victim machine without injecting external code. Successful exploitation leads to hijacked applications or the download of malicious software (drive-by download attacks), which usually happens without notice or permission from users. In this dissertation, we address the problem of host-based system anomaly detection, specifically by predicting expected program behaviors and detecting run-time deviations and anomalies. We first introduce an approach for detecting drive-by download attacks, one of the major vectors of malware infection. Our tool enforces the dependencies between user actions and system events, such as file-system access and process execution. It can be used to provide real-time protection of a personal computer, as well as to diagnose and evaluate untrusted websites for forensic purposes. We perform extensive experimental evaluation, including a user study with 21 participants, thousands of legitimate websites (for testing false alarms), 84 malicious websites in the wild, and lab-reproduced exploits. Our solution demonstrates a usable host-based framework for controlling and enforcing access to system resources. Second, we present a new anomaly-based detection technique that probabilistically models and learns a program's control flows for high-precision behavioral reasoning and monitoring. Existing solutions suffer from either incomplete behavioral modeling (dynamic models) or overestimation of the likelihood of call occurrences (static models). We introduce a new probabilistic anomaly detection method for modeling program behaviors; its uniqueness is the ability to quantify the static control flow in programs and to integrate the control-flow information into probabilistic machine learning algorithms, yielding significantly improved detection accuracy. We observed 11- to 28-fold improvements in detection accuracy compared to state-of-the-art HMM-based anomaly models. We further integrate context information into our detection model, achieving both strong flow-sensitivity and context-sensitivity. Our context-sensitive approach gives, on average, over a 10-fold improvement for system-call monitoring, and three orders of magnitude for library-call monitoring, over existing regular HMM methods. Evaluated with a large number of program traces and real-world exploits, our findings confirm that probabilistic modeling of program dependences provides a significant source of behavior information for building high-precision models for real-time system monitoring. Abnormal traces (obtained by reproducing exploits and synthesizing abnormal traces) are well distinguished from normal traces by our model.
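One simple baseline in this space, sketched below on assumed trace data, is a first-order transition model over system calls: learn transition probabilities from benign traces, then flag traces whose average transition log-probability is low. The dissertation's contribution goes further by quantifying static control flow; this sketch is only an illustrative contrast:

```python
# Minimal sketch of a first-order transition model over system calls, a
# purely dynamic baseline shown for illustration only.
from collections import defaultdict
import math

def train_transitions(traces):
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[a][b] += 1
    probs = {}
    for a, nxt in counts.items():
        total = sum(nxt.values())
        probs[a] = {b: c / total for b, c in nxt.items()}
    return probs

def trace_log_prob(trace, probs, floor=1e-6):
    # Unseen transitions get a small floor probability instead of zero
    lp = 0.0
    for a, b in zip(trace, trace[1:]):
        lp += math.log(probs.get(a, {}).get(b, floor))
    return lp / max(len(trace) - 1, 1)  # per-transition average

normal = [["open", "read", "write", "close"]] * 50
model = train_transitions(normal)
suspect = ["open", "mmap", "mprotect", "execve"]
print(trace_log_prob(suspect, model))  # very low -> anomalous
```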
- Attack and Defense with Hardware-Aided Security. Zhang, Ning (Virginia Tech, 2016-08-26). Riding on recent advances in computing and networking, our society is now experiencing the evolution into the age of information. While the development of these technologies brings great value to our daily life, the lucrative rewards from cyber-crime have also attracted criminals. As computing continues to play an increasing role in society, security has become a pressing issue. Failures in computing systems could result in loss of infrastructure or human life, as demonstrated in both academic research and production environments. With the continuing spread of malicious software and new vulnerabilities revealed every day, protecting the heterogeneous computing systems across the Internet has become a daunting task. Our approach to this challenge consists of two directions: the first aims to gain a better understanding of the inner workings of both attacks and defenses in the cyber environment, while the other focuses on designing secure systems in adversarial environments.
- Blockchain and Distributed Consensus: From Security Analysis to Novel Applications. Xiao, Yang (Virginia Tech, 2022-05-13). Blockchain, the technology behind cryptocurrency, enables decentralized and mutually distrustful parties to maintain a unique and consistent transaction history through consensus, without involving a central authority. The decentralization, transparency, and consensus-driven security promised by blockchain are unprecedented and can potentially enable a wide range of new applications that prevail in the decentralized zero-trust model. While blockchain represents a secure-by-design approach to building zero-trust applications, there remain outstanding security bottlenecks that hinder the technology's wider adoption, represented by the following two challenges: (1) blockchain as a distributed networked system is multi-layered in nature, with complex security implications that are not yet fully understood or addressed; (2) when we use blockchain to construct new applications, especially those previously implemented in a centralized manner, effective paradigms for customizing and augmenting blockchain's security offerings to realize domain-specific security goals are often lacking. In this work, we answer these two challenges in two coordinated efforts. In the first effort, we target the fundamental security issues caused by blockchain's multi-layered nature and its consumption of external data. Existing analyses of blockchain consensus security have overlooked an important cross-layer factor: the heterogeneity of the P2P network's connectivity. We first provide a comprehensive review of notable blockchain consensus protocols and their security properties. We then focus on one class of consensus protocols, the popular Nakamoto consensus, for which we propose a new analytical model from the networking perspective that quantifies the impact of heterogeneous network connectivity on key consensus security metrics, providing insights on the actual "51% attack" threshold (safety) and mining-revenue distribution (fairness). The truthfulness of external data is another fundamental challenge, concerning the decentralized applications running on top of blockchain: the validity of external data is key to the system's operational security but lies outside the jurisdiction of blockchain consensus. We propose DecenTruth, a system that combines a data-mining technique called truth discovery with Byzantine fault-tolerant consensus to enable decentralized nodes to collectively extract truthful information from data submitted by untrusted external sources. In the second effort, we harness the security offerings of blockchain's smart-contract functionality, along with external security tools, to enable two domain-specific applications: data usage control and a decentralized spectrum access system. First, we use blockchain to tackle a long-standing privacy challenge, data misuse. Individual data owners often lose control over how their data can be used once they share it with another party, as epitomized by the Facebook-Cambridge Analytica data scandal. We propose PrivacyGuard, a security platform that combines blockchain smart contracts and a hardware trusted execution environment (TEE) to give individual data owners fine-grained control over the usage (e.g., which operation, who can use the data, and on what condition or at what price) of their private data. A core technical innovation of PrivacyGuard is the TEE-based execution and result-commitment protocol, which extends blockchain's zero-trust security to the off-chain physical domain. Second, we employ blockchain to address the potential security and performance issues facing dynamic spectrum sharing in 5G and next-G wireless networks. The current spectrum access system (SAS) designated by the FCC follows a centralized server-client service model, which is vulnerable to single-point failures of SAS service providers and lacks an efficient, automated inter-SAS synchronization mechanism. In response, we propose a blockchain-based decentralized SAS architecture dubbed BD-SAS to provide SAS service efficiently to spectrum users and enable automated inter-SAS synchronization, without assuming trust in individual SAS service providers. We hope this work provides new insights into blockchain's fundamental security and its applicability to new security domains.
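For background on the consensus-safety analysis mentioned above, the sketch below implements Nakamoto's classic double-spend catch-up probability from the Bitcoin whitepaper, which assumes a homogeneous network; the dissertation's model refines exactly this kind of analysis by accounting for heterogeneous P2P connectivity:

```python
# Background sketch: Nakamoto's double-spend success probability (Bitcoin
# whitepaper). This baseline ignores the network layer entirely.
import math

def attacker_success(q, z):
    """q: attacker's share of hash power; z: confirmations to overcome."""
    p = 1.0 - q
    if q >= p:
        return 1.0
    lam = z * (q / p)
    s = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        s -= poisson * (1.0 - (q / p) ** (z - k))
    return s

for q in (0.10, 0.25, 0.45):
    print(q, [round(attacker_success(q, z), 4) for z in (1, 3, 6)])
```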
- Blockchain-based Peer-to-peer Electricity Trading Framework Through Machine Learning-based Anomaly Detection Technique. Jing, Zejia (Virginia Tech, 2022-08-31). With the growing installation of home photovoltaics, traditional energy trading is evolving from a unidirectional utility-to-consumer model into a more distributed peer-to-peer paradigm. In addition, with the development of building energy management platforms and demand-response-enabled smart devices, saved energy consumption, known as negawatt-hours, has emerged as another commodity that can be exchanged. Users may tune their heating, ventilation, and air conditioning (HVAC) system setpoints to adjust hourly building energy consumption and thereby generate negawatt-hours. Photovoltaic (PV) energy and negawatt-hours are the two major resources of peer-to-peer electricity trading. Blockchain has been touted as an enabler of trustworthy and reliable peer-to-peer trading, facilitating the deployment of such distributed electricity trading through encrypted processes and records. Unfortunately, blockchain cannot fully detect anomalous participant behaviors or malicious inputs to the network. Consequently, end-user anomaly detection is imperative for enhancing trust in peer-to-peer electricity trading. This dissertation introduces machine learning-based anomaly detection techniques for peer-to-peer PV energy and negawatt-hour trading. These techniques predict the next hour's available PV energy and negawatt-hours and flag potential anomalies in submitted bids. As the traditional energy trading market is agnostic to tangible real-world resources, developing, evaluating, and integrating forecasting-based anomaly detection methods can give users knowledge of reasonable bid quantities. A user may, intentionally or unintentionally, submit extremely high or low bids that do not match their solar panel capability or are not backed by substantial negawatt-hour and PV energy resources. Some anomalies occur because a participant's sensor suffers from integrity errors, while other abnormal offers are submitted maliciously so that attackers can benefit from market disruption. In both cases, anomalies should be detected by the algorithm and rejected by the market. Artificial Neural Networks (ANN), Recurrent Neural Networks (RNN) with Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), and Convolutional Neural Networks (CNN) are compared and studied for PV energy and negawatt-hour forecasting. The semi-supervised anomaly detection framework is explained and its performance demonstrated; the anomaly detection threshold values are determined from a model trained on historical data. Besides ambient weather information, the HVAC setpoint and building occupancy are input parameters for predicting hourly building energy consumption in negawatt-hour trading. The building model is trained and managed by negawatt-hour aggregators. CO2 monitoring devices are integrated into the cloud-based smart building platform BEMOSS™ to estimate occupancy levels, further improving building load forecasting accuracy in negawatt-hour trading, and the relationship between building occupancy and CO2 measurements is analyzed. Finally, experiments on the Hyperledger platform demonstrate blockchain-based peer-to-peer energy trading and how the platform detects anomalies.
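A minimal sketch of the forecast-then-threshold idea described above follows. A plain persistence forecast stands in for the ANN/LSTM/GRU/CNN models the dissertation compares, and a k-sigma rule stands in for thresholds learned from historical data; all numbers are illustrative:

```python
# Minimal sketch: flag a submitted bid when it deviates too far from the
# model's next-hour prediction. Persistence forecast + k-sigma threshold
# are illustrative stand-ins for the dissertation's learned models.
import numpy as np

history = np.array([3.1, 3.4, 3.2, 3.6, 3.5, 3.3, 3.7, 3.4])  # hourly PV kWh
residuals = np.diff(history)          # persistence-forecast errors
sigma = residuals.std(ddof=1)

forecast = history[-1]                # predicted next-hour PV energy
k = 3.0                               # threshold multiplier (tuned)

def check_bid(bid_kwh):
    if abs(bid_kwh - forecast) > k * sigma:
        return "reject: anomalous bid"
    return "accept"

print(check_bid(3.5))   # close to forecast -> accept
print(check_bid(9.0))   # far beyond plausible PV output -> reject
```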
- Building trustworthy machine learning systems in adversarial environments. Wang, Ning (Virginia Tech, 2023-05-26). Modern AI systems, particularly with the rise of big data and deep learning in the last decade, have greatly improved our daily life and, at the same time, created a long list of controversies. AI systems are often subject to malicious and stealthy subversion that jeopardizes their efficacy. Many of these issues stem from the data-driven nature of machine learning. While big data and deep models significantly boost the accuracy of machine learning models, they also create opportunities for adversaries to tamper with models or extract sensitive data. Malicious data providers can compromise machine learning systems by supplying false data and intermediate computation results. Even a well-trained model can be deceived into misbehaving by an adversary who provides carefully designed inputs. Furthermore, curious parties can derive sensitive information about the training data by interacting with a machine-learning model. These adversarial scenarios, known as poisoning attacks, adversarial example attacks, and inference attacks, have demonstrated that security, privacy, and robustness have become more important than ever for AI to gain wider adoption and societal trust. To address these problems, we propose the following solutions: (1) FLARE, which detects and mitigates stealthy poisoning attacks by leveraging latent space representations; (2) MANDA, which detects adversarial examples by utilizing evaluations from diverse sources, i.e., model-based prediction and data-based evaluation; (3) FeCo, which enhances the robustness of machine learning-based network intrusion detection systems by introducing a novel representation learning method; and (4) DP-FedMeta, which preserves data privacy and improves the privacy-accuracy trade-off in machine learning systems through a novel adaptive clipping mechanism.
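As a rough illustration of the mechanism behind DP-FedMeta's adaptive clipping, the sketch below shows the standard per-sample clipping plus Gaussian noise step from DP-SGD; the fixed clip norm C here is precisely the quantity an adaptive scheme would tune over training (parameters are illustrative, not from the dissertation):

```python
# Minimal sketch of per-sample gradient clipping plus Gaussian noise, the
# standard differentially private aggregation building block.
import numpy as np

def private_mean_gradient(per_sample_grads, C=1.0, sigma=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, C / max(norm, 1e-12)))  # L2 norm <= C
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, sigma * C, size=total.shape)    # calibrated to C
    return (total + noise) / len(per_sample_grads)

grads = [np.array([0.5, -1.2]), np.array([3.0, 4.0]), np.array([-0.1, 0.2])]
print(private_mean_gradient(grads))
```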
- Characterizing and Detecting Online Deception via Data-Driven Methods. Hu, Hang (Virginia Tech, 2020-05-27). In recent years, online deception has become a major threat to information security. The online deception that causes the most significant consequences is usually spear phishing. Spear-phishing emails come in very small volume, target a small audience, sometimes impersonate a trusted entity, and use very specific content to redirect targets to a phishing website, where the attacker tricks targets into sharing their credentials. In this thesis, we aim to measure the entire process. Starting from phishing emails, we examine anti-spoofing protocols, analyze email services' policies and warnings toward spoofing emails, and measure the email tracking ecosystem. For phishing websites, we implement a powerful tool to detect domain name impersonation and detect phishing pages using dynamic and static analysis. We also analyze credential sharing on phishing websites and measure what happens after victims share their credentials. Finally, we discuss potential phishing and privacy concerns on new platforms such as Alexa and Google Assistant. In the first part of this thesis (Chapter 3), we focus on measuring how email providers detect and handle forged emails; we also try to understand how forged emails can reach user inboxes by deliberately composing emails, and check how email providers warn users about forged emails. In the second part (Chapter 4), we measure the adoption of anti-spoofing protocols and seek to understand the reasons behind the low adoption rates. In the third part (Chapter 5), observing that many phishing emails use email tracking techniques to track targets, we collect a large dataset of email messages using disposable email services and measure the landscape of email tracking. In the fourth part (Chapter 6), we move on to phishing websites: we implement a powerful tool to detect squatting domains and train a machine learning model to classify phishing websites. In the fifth part (Chapter 7), we focus on credential leaks; more specifically, we monitor and measure potential post-phishing exploitation activities after targets' credentials are leaked. Finally, with new voice platforms such as Alexa becoming more and more popular, we ask whether new phishing and privacy concerns emerge with them; in this part (Chapter 8), we systematically assess the attack surfaces by measuring sensitive applications on voice assistant systems. This thesis measures important parts of the complete process of online deception. With a deeper understanding of phishing attacks, more complete and effective defense mechanisms can be developed to mitigate attacks in various dimensions.
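One small piece of the squatting-domain detection mentioned above can be sketched with a plain edit-distance check against protected brand names; the dissertation's tool covers more impersonation types than this, and the brand list and threshold below are illustrative:

```python
# Minimal sketch: flag domains within a small edit distance of a brand name.
def edit_distance(a, b):
    # Classic single-row dynamic-programming Levenshtein distance
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

BRANDS = ["paypal", "google", "amazon"]  # illustrative protected names

def is_squatting(domain):
    label = domain.split(".")[0]
    return any(0 < edit_distance(label, b) <= 2 for b in BRANDS)

print(is_squatting("paypa1.com"))   # True: one substitution from "paypal"
print(is_squatting("example.com"))  # False
```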
- Computational Modeling for Differential Analysis of RNA-seq and Methylation Data. Wang, Xiao (Virginia Tech, 2016-08-16). Computational systems biology is an interdisciplinary field that aims to develop computational approaches for a system-level understanding of biological systems. Advances in high-throughput biotechnology offer broad scope and high resolution in multiple disciplines. However, it is still a major challenge to extract biologically meaningful information from the overwhelming amount of data generated from biological systems, and effective computational approaches are in pressing need to reveal the functional components. In this dissertation, we aim to develop computational approaches for differential analysis of RNA-seq and methylation data to detect aberrant events associated with cancers. We develop a novel Bayesian approach, BayesIso, to identify differentially expressed isoforms from RNA-seq data. BayesIso features a joint model of the variability of RNA-seq data and the differential states of isoforms: it can not only account for the variability of RNA-seq data but also treat the differential states of isoforms as hidden variables for differential analysis. The differential states of isoforms are estimated jointly with other model parameters through a sampling process, providing improved performance in detecting isoforms that are less differentially expressed. We then develop a novel probabilistic approach, DM-BLD, in a Bayesian framework to identify differentially methylated genes. DM-BLD features a hierarchical model, built upon Markov random field models, to capture both the local dependency of measured loci and the dependency of methylation change. A Gibbs sampling procedure is designed to estimate the posterior distribution of the methylation change of CpG sites. The differential methylation score of a gene is then calculated from the estimated methylation changes of the involved CpG sites, and the significance of genes is assessed by permutation-based statistical tests. We have demonstrated the advantage of the proposed Bayesian approaches over conventional methods for differential analysis of RNA-seq and methylation data: joint estimation of the posterior distributions of the variables and model parameters via sampling improves the detection of isoforms or methylated genes with smaller differential changes. Applications to breast cancer data shed light on the molecular mechanisms underlying breast cancer recurrence, with the aim of identifying new molecular targets for breast cancer treatment.
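A minimal sketch of the permutation-based significance assessment mentioned above: shuffle group labels to build a null distribution for a differential score, here simply a difference of group means for illustration (the dissertation's score is derived from estimated methylation changes, and the data below are synthetic):

```python
# Minimal sketch of a two-group permutation test for a differential score.
import numpy as np

def permutation_pvalue(x, y, n_perm=10000, rng=None):
    rng = rng or np.random.default_rng(1)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # permute group labels
        if abs(pooled[:len(x)].mean() - pooled[len(x):].mean()) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # add-one to avoid p = 0

cases    = np.array([2.9, 3.1, 3.4, 3.0, 3.2])
controls = np.array([2.1, 2.0, 2.3, 1.9, 2.2])
print(permutation_pvalue(cases, controls))
```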
- Coping with Uncertainty in Wireless Network Optimization. Li, Shaoran (Virginia Tech, 2022-10-24). Network optimization plays an important role in 5G/next-G networks and requires knowledge of network parameters (e.g., channel state information). The majority of existing works assume that all network parameters are either given a priori or can be accurately estimated. However, in many practical scenarios, some parameters are uncertain at the time of allocating resources and can only be modeled by random variables, and we have only limited knowledge of those uncertain parameters. For instance, channel gains are not exactly known due to channel estimation errors, network delay, limited feedback, and a lack of cooperation (between networks). Therefore, a practical solution to network optimization must address such uncertainty inside wireless networks. There are three approaches to addressing network uncertainty: stochastic programming, worst-case optimization, and chance-constrained programming (CCP). Among the three, CCP has some unique benefits. Stochastic programming explicitly requires full distribution knowledge, which is usually unavailable in practice; in comparison, CCP can work with various settings of available knowledge, such as first- and second-order statistics, symmetry properties, or limited data samples, and is therefore more flexible in handling different network settings, which is important for problems in 5G/next-G networks. Worst-case optimization assumes upper or lower bounds (i.e., worst cases) for the uncertain parameters and is known to be conservative due to its focus on extreme cases; in contrast, CCP allows occasional and controllable violations of some constraints and thus offers much better resource utilization. The only drawback of CCP is that it may lead to intractability due to its probabilistic formulation and the limited knowledge of the underlying random variables. To date, CCP has not been well utilized in the wireless communication and networking community. The goal of this dissertation is to extend the state of the art of CCP techniques and address a number of challenging network optimization problems. The dissertation is correspondingly organized into two parts. In the first part, we assume the uncertain parameters are known only by their mean and covariance (without distribution knowledge), and that these statistics are stationary (i.e., time-invariant for a sufficiently long time) and thus can be accurately estimated; in this setting, we introduce a novel reformulation technique based on the mean and covariance to derive a solution. In the second part, we assume these statistics are time-varying and cannot be accurately estimated; in this setting, we employ limited data samples collected in a small time window to derive a solution. For the first part, we investigate four research problems based on the mean and covariance of the uncertain parameters:
  - In the first problem, we study how to maximize spectrum efficiency in underlay coexistence. The interference from all secondary users to each primary user must be kept below a given threshold, but there is much uncertainty about the channel gains between the primary and secondary users due to a lack of cooperation between them. We formulate probabilistic interference constraints for the primary users using CCP. For tractability, we introduce a novel and powerful reformulation technique called Exact Conic Reformulation (ECR). With limited knowledge of mean and covariance, ECR offers an equivalent reformulation of the intractable chance constraints as tractable deterministic constraints, without relaxation errors. After reformulation, we apply linearization techniques to the mixed-integer non-linear problem to reduce the computational complexity. We show that our proposed approach achieves near-optimal performance and stands as a performance benchmark for the underlay coexistence problem.
  - To make the solution to the same underlay coexistence problem usable in the real world, we need to find a solution in "real time", which here refers to finding a solution within 125 us (the minimum time slot for small cells in 5G). Our proposed solution has three steps: it employs ECR to reformulate the original CCP into a deterministic optimization problem, then decomposes the problem and narrows the search space down to a smaller but promising one, and finally, through random sampling inside the promising search space and local search, meets the 125 us requirement in 5G while achieving 90% optimality on average.
  - We further apply CCP, predicated on the reformulation technique ECR, to two other problems:
    * We study power control in concurrent transmissions, aiming to maximize energy efficiency for all transmitter-receiver pairs subject to capacity requirements. This problem is challenging due to mutual interference among transmitter-receiver pairs and the uncertain channel gain between any transmitter and receiver. We formulate a CCP, reformulate it into a deterministic problem using ECR, and then employ Geometric Programming (GP) with a tight approximation to derive a near-optimal solution.
    * We study task offloading in Mobile Edge Computing (MEC), where the number of processing cycles of a task is unknown until completion. The goal is to minimize the users' energy consumption while meeting probabilistic deadlines for the tasks. We formulate the probabilistic deadlines as chance constraints and use ECR to reformulate them into deterministic constraints. We propose a solution consisting of periodic scheduling and schedule updates to choose the offloaded tasks and the task-to-processor assignments at the base station.
  In the second part, we investigate two research problems based on limited data samples of the uncertain parameters:
  - We study MU-MIMO beamforming based on Channel State Information (CSI). The goal is to derive a beamforming solution that minimizes power consumption at the base station while meeting the probabilistic data rate requirements of the users, using very limited CSI data samples. For our CCP formulation, we explore the idea of a Wasserstein ambiguity set to quantify the distance between the true (but unknown) distribution and the empirical distribution based on the limited data samples. Our proposed solution, Data-Driven Beamforming (D^2BF), reformulates the CCP into a non-convex deterministic optimization problem based on the properties of the Wasserstein ambiguity set, and then employs a novel convex approximation that can be directly solved by commercial solvers.
  - For a solution to MU-MIMO beamforming to be useful in the real world, it must meet the "real-time" requirement, which here refers to 1 ms, one transmission time interval (TTI) under 5G numerology 0. We present ReDBeam, a Real-time Data-driven Beamforming solution for the MU-MIMO beamforming problem (minimizing power consumption while offering probabilistic data rate guarantees to the users) with limited CSI data samples. ReDBeam is a parallel algorithm, purposefully designed to take advantage of the vast parallel processing capability offered by GPUs: it generates a large number of initial solutions from a promising search space and then refines each solution via local search. We show that ReDBeam meets the 1 ms real-time requirement on a commercial GPU and is orders of magnitude faster than other state-of-the-art algorithms for the same problem.
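To make the chance-constraint machinery above concrete, the sketch below shows the textbook deterministic equivalent of a single linear chance constraint when the uncertain vector is Gaussian with known mean and covariance, checked by Monte Carlo. Note that ECR in the dissertation achieves an exact reformulation from mean and covariance alone, without the Gaussian assumption made here; all numbers are illustrative:

```python
# Minimal sketch: Gaussian chance constraint Pr(a'x <= b) >= 1 - eps becomes
# mean'x + z_{1-eps} * sqrt(x' Cov x) <= b. (ECR in the dissertation avoids
# the Gaussian assumption; this is the classical special case.)
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
mean = np.array([1.0, 2.0])
cov = np.array([[0.5, 0.1], [0.1, 0.3]])
x = np.array([0.8, 0.4])   # a candidate resource-allocation vector
eps = 0.05                 # allowed violation probability

lhs = mean @ x + norm.ppf(1 - eps) * np.sqrt(x @ cov @ x)
print("need b >=", lhs)

# Monte Carlo check: with b = lhs, violations occur ~eps of the time
a = rng.multivariate_normal(mean, cov, size=200_000)
print("empirical violation rate:", np.mean(a @ x > lhs))
```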
- Deep Learning Empowered Unsupervised Contextual Information Extraction and its applications in Communication Systems. Gusain, Kunal (Virginia Tech, 2023-01-16).
- Design and Analysis of Intrusion Detection Protocols in Cyber Physical Systems. Mitchell, Robert Raymond III (Virginia Tech, 2013-04-23). In this dissertation research we aim to design and validate intrusion detection system (IDS) protocols for a cyber physical system (CPS) comprising sensors, actuators, control units, and physical objects for controlling and protecting physical infrastructures. The design part includes host IDS, system IDS, and IDS response designs; the validation part includes a novel model-based analysis methodology with simulation validation. Our objective is to maximize CPS reliability or lifetime in the presence of malicious nodes performing attacks that can cause security failures. Our host IDS design results in a lightweight, accurate, autonomous, and adaptive protocol that runs on every node in the CPS to detect misbehavior of neighbor nodes based on state-based behavior specifications. Our system IDS design results in a robust and resilient protocol that can cope with malicious, erroneous, partly trusted, uncertain, and incomplete information in a CPS. Our IDS response design results in a highly adaptive and dynamic control protocol that can adjust detection strength in response to environmental changes in attacker strength and behavior. The end result is an energy-aware and adaptive IDS that can maximize the CPS lifetime in the presence of malicious attacks, as well as of malicious, erroneous, partly trusted, uncertain, and incomplete information. We develop a probability model based on stochastic Petri nets to describe the behavior of a CPS incorporating our proposed intrusion detection and response designs, subject to attacks by malicious nodes exhibiting a range of attacker behaviors, including reckless, random, insidious, and opportunistic attacker models. We identify optimal intrusion detection settings under which CPS reliability or lifetime is maximized for each attacker model. Adaptive control for maximizing IDS performance is achieved by dynamically adjusting detection and response strength in response to attacker strength and behavior detected at runtime. We conduct extensive analysis of our designs with four case studies: a mobile group CPS, a medical CPS, a smart grid CPS, and an unmanned aircraft CPS. The results show that our adaptive intrusion detection and response designs, operating at optimizing conditions, significantly outperform existing anomaly-based IDS techniques for CPSs.
- Differential Network Analysis based on Omic Data for Cancer Biomarker Discovery. Zuo, Yiming (Virginia Tech, 2017-06-16). Recent advances in high-throughput techniques enable the generation of large amounts of omic data such as genomics, transcriptomics, proteomics, metabolomics, glycomics, etc. Typically, differential expression analysis (e.g., Student's t-test, ANOVA) is performed to identify biomolecules (e.g., genes, proteins, metabolites, glycans) with significant changes at the individual level between biologically disparate groups (disease cases vs. healthy controls) for cancer biomarker discovery. However, differential expression analyses on independent studies of the same clinical types of patients often lead to different sets of significant biomolecules with only few in common. This may be attributed to the fact that biomolecules are members of strongly intertwined biological pathways and are highly interactive with each other; without considering these interactions, differential expression analysis can produce biased results. Network-based methods provide a natural framework to study the interactions between biomolecules. Commonly used data-driven network models include relevance networks, Bayesian networks, and Gaussian graphical models. In addition to data-driven network models, there are many publicly available databases, such as STRING, KEGG, Reactome, and ConsensusPathDB, from which one can extract various types of interactions to build knowledge-driven networks. While both data- and knowledge-driven networks have their pros and cons, an appropriate approach for incorporating prior biological knowledge from publicly available databases into data-driven network models is desirable for more robust and biologically relevant network reconstruction. Recently, there has been growing interest in differential network analysis, where a connection in the network represents a statistically significant change in the pairwise interaction between two biomolecules in different groups. From the rewired interactions shown in differential networks, biomolecules whose connectivity changes strongly between distinct biological groups can be identified; these biomolecules might play an important role in the disease under study. In fact, differential expression and differential network analyses investigate omic data from two complementary perspectives: the former focuses on changes at the individual-biomolecule level between groups, while the latter concentrates on changes at the pairwise-biomolecule level. Therefore, an approach that integrates differential expression and differential network analyses is likely to discover more reliable and powerful biomarkers. To achieve these goals, we start by proposing a novel data-driven network model (LOPC) to reconstruct sparse biological networks. The sparse networks contain only direct interactions between biomolecules, which helps researchers focus on the more informative connections. We then propose a novel method (dwgLASSO) to incorporate prior biological knowledge into a data-driven network model to build biologically relevant networks. Differential network analysis is applied to the networks constructed for biologically disparate groups to identify cancer biomarker candidates. Finally, we propose a novel network-based approach (INDEED) that integrates differential expression and differential network analyses to identify more reliable and powerful cancer biomarker candidates. INDEED is further expanded as INDEED-M to utilize omic data at different levels of the human biological system (e.g., transcriptomics, proteomics, metabolomics), which we believe is promising for increasing our understanding of cancer. Matlab and R packages for the proposed methods are developed and available at GitHub (https://github.com/Hurricaner1989) to share with the research community.
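A minimal sketch of the differential-network idea described above: estimate a correlation network per group and keep edges whose strength changes markedly between groups. Plain Pearson correlation on synthetic data is used here only to illustrate the rewiring step; LOPC and dwgLASSO build sparser, knowledge-informed networks:

```python
# Minimal sketch of differential correlation-network analysis on synthetic data.
import numpy as np

rng = np.random.default_rng(3)
cases    = rng.normal(size=(40, 5))   # 40 samples x 5 biomolecules
controls = rng.normal(size=(40, 5))
cases[:, 1] = cases[:, 0] + 0.1 * rng.normal(size=40)  # inject a case-only edge

diff = np.corrcoef(cases, rowvar=False) - np.corrcoef(controls, rowvar=False)
np.fill_diagonal(diff, 0.0)

# Edges whose correlation changed by more than a threshold are "rewired"
i, j = np.where(np.triu(np.abs(diff) > 0.6, k=1))
print(list(zip(i, j)))          # expect the injected pair (0, 1)

# Connectivity change per biomolecule: high values suggest biomarker candidates
print(np.abs(diff).sum(axis=0))
```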
- Discovery of Triggering Relations and Its Applications in Network Security and Android Malware Detection. Zhang, Hao (Virginia Tech, 2015-11-30). An increasing variety of malware, including spyware, worms, and bots, threatens data confidentiality and system integrity on computing devices ranging from backend servers to mobile devices. To address these threats, exacerbated by dynamic network traffic patterns and growing volumes, network security has been undergoing major changes to improve the accuracy and scalability of security analysis techniques. This dissertation addresses the problem of detecting network anomalies on a single device by inferring traffic dependencies and verifying that requests have legitimate root triggers. In particular, we propose a dependence model for illustrating network traffic causality. The model depicts the triggering relations of network requests and thus can be used to reason about the occurrence of network events and pinpoint stealthy malware activities. The triggering relationships can be inferred by both rule-based and learning-based approaches. The rule-based approach builds on several heuristic algorithms derived from domain knowledge. The learning-based approach discovers the triggering relationship using a pairwise comparison operation that converts requests into event pairs with comparable attributes; machine learning classifiers predict the triggering relationship and further reason about the legitimacy of requests by enforcing their root triggers. We apply our dependence model to the network traffic from a single host and from a mobile device. Evaluations with real-world malware samples and synthetic attacks confirm that the traffic dependence model provides a significant source of semantic and contextual information that detects zero-day malicious applications. This dissertation also studies the usability of visualizing traffic causality for domain experts. We design and develop a tool with a visual locality property; it supports the different levels of visual-based querying and reasoning required for the sensemaking process on complex network data. The significance of this dissertation research is that it provides deep insights into the dependency of network requests and leverages structural and semantic information, allowing us to reason about network behaviors and detect stealthy anomalies.
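As a rough sketch of the learning-based direction described above, pairs of network requests can be turned into comparable feature vectors for a triggering-relation classifier; the specific features and record fields below are invented for illustration, not the dissertation's exact attribute set:

```python
# Minimal sketch: pairwise features for a triggering-relation classifier.
# Field names and features are hypothetical examples.
def pair_features(earlier, later):
    return [
        later["ts"] - earlier["ts"],              # time gap (seconds)
        int(earlier["host"] == later["host"]),    # same host
        int(later["referrer"] == earlier["url"]), # referrer chain link
        int(earlier["process"] == later["process"]),  # same process
    ]

a = {"ts": 10.00, "host": "cdn.example.com",
     "url": "http://cdn.example.com/a.js", "referrer": "", "process": "browser"}
b = {"ts": 10.42, "host": "cdn.example.com",
     "url": "http://cdn.example.com/img.png",
     "referrer": "http://cdn.example.com/a.js", "process": "browser"}

print(pair_features(a, b))  # fed to a classifier; pairs predicted as
                            # "triggered" chain requests back to a root
                            # trigger such as a user action
```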
- DPP: Dual Path PKI for Secure Aircraft Data Communication. Buchholz, Alexander Karl (Virginia Tech, 2013-05-02). Through the application of modern technology, aviation systems are becoming more automated and are relying less on antiquated air traffic control (ATC) voice systems. Aircraft are now able to wirelessly broadcast and receive identity and location information using transponder technology. This helps reduce controller workload and allows aircraft to take more responsibility for maintaining safe separation. However, these systems lack source authentication methods and the ability to check the integrity of message content, opening the door for hackers to create fraudulent messages or manipulate message content. This thesis presents a solution to many of the potential security issues in aircraft data communication, accomplished through a Dual Path PKI (DPP) design that includes a novel approach to handling certificate revocation through session certificates. DPP defines two authentication protocols, one between aircraft and another between aircraft and ATC, to achieve source authentication. Digital signature technology is utilized to achieve message content and source integrity, as well as to enable bootstrapping DPP into current ATC systems. DPP employs cutting-edge elliptic curve cryptography (ECC) algorithms to increase performance and reduce overhead. It is found that the DPP design successfully mitigates several of the cyber security concerns in aircraft and ATC data communications. An implementation of the design shows that anticipated ATC systems can accommodate the additional processing power and bandwidth required by DPP to achieve system integrity and security.
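A minimal sketch of the ECC-based signing and verification that a design like DPP relies on, using the pyca/cryptography package; the curve choice and message format below are illustrative, not DPP's actual certificate or session-certificate scheme:

```python
# Minimal sketch of ECDSA sign/verify for an aircraft position message.
# Message layout and curve choice are illustrative assumptions.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.exceptions import InvalidSignature

aircraft_key = ec.generate_private_key(ec.SECP256R1())
message = b"ICAO=ABC123;lat=37.2296;lon=-80.4139;alt=35000"

signature = aircraft_key.sign(message, ec.ECDSA(hashes.SHA256()))

public_key = aircraft_key.public_key()
try:
    public_key.verify(signature, message, ec.ECDSA(hashes.SHA256()))
    print("message authentic and unmodified")
except InvalidSignature:
    print("reject: spoofed or tampered message")
```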
- Dynamic Trust Management for Mobile Networks and Its Applications. Bao, Fenye (Virginia Tech, 2013-06-05). Trust management in mobile networks is challenging due to dynamically changing network environments and the lack of a centralized trusted authority. In this dissertation research, we design and validate a class of dynamic trust management protocols for mobile networks and demonstrate the utility of dynamic trust management with trust-based applications. Unlike existing work, we consider social trust derived from social networks in addition to traditional quality-of-service (QoS) trust derived from communication networks, obtaining a composite trust metric as a basis for evaluating the trust of nodes in mobile network applications. Addressing gaps untreated in the literature, we design and validate trust composition, aggregation, propagation, and formation protocols for dynamic trust management that can learn from past experience and adapt to changing environmental conditions to maximize application performance and enhance operational agility. Furthermore, we propose, explore, and validate the design concept of application-level trust optimization in response to changing conditions to maximize application performance or best satisfy application requirements. We provide formal proofs of the convergence, accuracy, and resiliency properties of our trust management protocols. To identify the best trust protocol settings and optimize the use of trust for trust-based applications, we develop a novel model-based analysis methodology with simulation validation for analyzing and validating our dynamic trust management protocol design. This dissertation research provides new understanding of dynamic trust management for mobile wireless networks. We gain insight into the best trust composition and trust formation out of social and QoS trust components, as well as the best trust aggregation and propagation protocols for optimizing application performance. We gain insight into how a modeling and analysis tool can be built, allowing trust composition, aggregation, propagation, and formation designs to be incorporated, tested, and validated. We demonstrate the utility of dynamic trust management protocols for mobile networks, including mobile ad-hoc networks, delay-tolerant networks, wireless sensor networks, and Internet of Things systems, with practical applications including misbehaving node detection, trust-based survivability management, trust-based secure routing, and trust-based service composition. Through model-based analysis with simulation validation, we show that our dynamic trust management based protocols outperform non-trust-based and Bayesian trust-based protocols in the presence of malicious, erroneous, partly trusted, uncertain, and incomplete information, and are resilient to trust-related attacks.
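As a toy illustration of composing and updating a trust metric from social and QoS components, the sketch below uses a weighted blend with exponential decay; the weights, decay factor, and update rule are illustrative placeholders for the protocol parameters the dissertation designs and optimizes:

```python
# Toy sketch of composite trust and its dynamic update; all constants are
# illustrative, not the dissertation's protocol settings.
def composite_trust(social, qos, w_social=0.5):
    return w_social * social + (1.0 - w_social) * qos

def update_trust(old, direct_obs, recommendation, beta=0.8, decay=0.9):
    # Blend own observations with neighbors' recommendations, then decay the
    # prior so stale evidence loses influence over time.
    evidence = beta * direct_obs + (1.0 - beta) * recommendation
    return decay * old + (1.0 - decay) * evidence

t = composite_trust(social=0.9, qos=0.6)
for obs, rec in [(0.8, 0.7), (0.2, 0.6), (0.1, 0.3)]:  # node starts misbehaving
    t = update_trust(t, obs, rec)
    print(round(t, 3))  # trust in the node gradually drops
```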
- Efficient Algorithms for Mining Large Spatio-Temporal Data. Chen, Feng (Virginia Tech, 2013-01-21). Knowledge discovery on spatio-temporal datasets has attracted growing interest. Recent advances in remote sensing technology mean that massive amounts of spatio-temporal data are being collected, and the volume keeps increasing at an ever faster pace. It becomes critical to design efficient algorithms for identifying novel and meaningful patterns in massive spatio-temporal datasets. Unlike other data sources, this data exhibits significant space-time statistical dependence, and the assumption of i.i.d. observations is no longer valid; exactly modeling the space-time dependence would render model complexity that grows exponentially with the data size. This research focuses on the construction of efficient and effective approaches using approximate inference techniques for three main mining tasks: spatial outlier detection, robust spatio-temporal prediction, and novel applications to real-world problems.
Spatial novelty patterns, or spatial outliers, are data points whose characteristics are markedly different from those of their spatial neighbors. There are two major branches of spatial outlier detection methodologies: global Kriging-based and local Laplacian-smoothing-based. The former requires exact modeling of spatial dependence, which is time-intensive; the latter requires the i.i.d. assumption for the smoothed observations, which is not statistically solid. Both approaches are constrained to numerical data, but in real-world applications we are often faced with a variety of non-numerical data types, such as count, binary, nominal, and ordinal. To summarize, the main research challenges are: 1) how much spatial dependence can be eliminated via Laplacian smoothing; 2) how to effectively and efficiently detect outliers in large numerical spatial datasets; 3) how to generalize numerical detection methods and develop a unified outlier detection framework suitable for large non-numerical datasets; 4) how to achieve accurate spatial prediction even when the training data has been contaminated by outliers; and 5) how to deal with spatio-temporal data in the preceding problems.
To address the first and second challenges, we mathematically validated the effectiveness of Laplacian smoothing in eliminating spatial autocorrelation, providing fundamental support for existing Laplacian-smoothing-based methods. We also discovered a nontrivial side effect of Laplacian smoothing: it injects additional spatial variation into the data due to convolution effects. To capture this extra variability, we proposed a generalized local statistical model and designed two fast forward and backward outlier detection methods that achieve a better balance between computational efficiency and accuracy than most existing methods and are well suited to large numerical spatial datasets.
We addressed the third challenge by mapping non-numerical variables to latent numerical variables via a link function, such as the logit function used in logistic regression, and then utilizing error-buffer artificial variables, which follow a Student-t distribution, to capture the large variations caused by outliers. We proposed a unified statistical framework that integrates the advantages of the spatial generalized linear mixed model, the robust spatial linear model, reduced-rank dimension reduction, and Bayesian hierarchical modeling. A linear-time approximate inference algorithm was designed to infer the posterior distribution of the error-buffer artificial variables conditioned on the observations. We demonstrated that traditional numerical outlier detection methods can be applied directly to the estimated artificial variables for outlier detection. To the best of our knowledge, this is the first linear-time outlier detection algorithm that supports a variety of spatial attribute types, such as binary, count, ordinal, and nominal.
To address the fourth and fifth challenges, we proposed a robust version of the Spatio-Temporal Random Effects (STRE) model, namely the Robust STRE (R-STRE) model. The regular STRE model is a recently proposed statistical model for large spatio-temporal data with linear-order time complexity, but it is not well suited to non-Gaussian and contaminated datasets. This deficiency can be systemically addressed by increasing the robustness of the model using heavy-tailed distributions, such as the Huber, Laplace, or Student-t distribution, to model the measurement error instead of the traditional Gaussian. However, the resulting R-STRE model becomes analytically intractable, and direct application of approximate inference techniques still has cubic-order time complexity. To address the computational challenge, we reformulated the prediction problem as a maximum a posteriori (MAP) problem with a non-smooth objective function, transformed it into an equivalent quadratic programming problem, and developed an efficient interior-point numerical algorithm with near-linear-order complexity. This work presents the first near-linear-time robust prediction approach for large spatio-temporal datasets in both offline and online cases.
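A minimal sketch of the local, Laplacian-smoothing branch of spatial outlier detection discussed above: compare each observation with the average of its spatial neighbors and flag large standardized residuals. The dissertation's methods add a corrected statistical model on top of this basic operation; the grid and threshold below are synthetic:

```python
# Minimal sketch of neighbor-difference spatial outlier detection on a grid.
import numpy as np

rng = np.random.default_rng(4)
grid = rng.normal(10.0, 1.0, size=(20, 20))   # smooth spatial field
grid[7, 7] = 25.0                             # plant a spatial outlier

# Laplacian-style residual: value minus mean of the 4-neighborhood
padded = np.pad(grid, 1, mode="edge")
neighbor_mean = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                 padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
residual = grid - neighbor_mean

z = (residual - residual.mean()) / residual.std(ddof=1)
print(np.argwhere(np.abs(z) > 4))   # expect (7, 7) and some of its neighbors
```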
- Efficient Sharing of Radio Spectrum for Wireless Networks. Yuan, Xu (Virginia Tech, 2016-07-11). The radio spectrum that can be used for wireless communications is a finite but extremely valuable resource. During the past two decades, with the proliferation of new wireless applications, the use of the radio spectrum has intensified to the point that improved spectrum sharing policies and new mechanisms are needed to enhance its utilization efficiency. This dissertation studies spectrum sharing and coexistence on both licensed and unlicensed bands for wireless networks. For licensed bands, we study two coexistence paradigms: transparent coexistence (a.k.a. underlay) and policy-based network cooperation (a.k.a. overlay). These two paradigms can offer significant improvements in spectrum utilization and throughput performance over the interweave paradigm. For the unlicensed band, we study the coexistence of Wi-Fi and LTE, the two most popular wireless networks.
- Enhancing Security and Privacy in Head-Mounted Augmented Reality Systems Using Eye Gaze. Corbett, Matthew (Virginia Tech, 2024-04-22). Augmented Reality (AR) devices are set apart from other mobile devices by the immersive experience they offer. Specifically, head-mounted AR devices can accurately sense and understand their environment through an increasingly powerful array of sensors, such as cameras, depth sensors, eye gaze trackers, microphones, and inertial sensors. The ability of these devices to collect this information presents both challenges and opportunities for improving existing security and privacy techniques in this domain. In particular, eye gaze tracking is a ready-made capability for analyzing user intent, emotions, and vulnerability, and for serving as an input mechanism. However, modern AR devices lack systems that address their unique security and privacy issues: problems such as the lack of local pairing mechanisms usable while immersed in AR environments, insufficient bystander privacy protections, and the increased vulnerability to shoulder surfing while wearing AR devices all lack viable solutions. In this dissertation, I explore how readily available eye gaze sensor data can be used to improve existing methods for assuring information security and protecting the privacy of those near the device. My research presents three new systems, BystandAR, ShouldAR, and GazePair, that each leverage user eye gaze to improve security and privacy expectations in or with Augmented Reality. As these devices grow in power and number, such solutions are necessary to prevent the perception failures that hindered earlier devices. The work in this dissertation is presented in the hope that these solutions can improve and expedite the adoption of these powerful and useful devices.
- Exploring Performance Limits of Wireless Networks with Advanced Communication Technologies. Qin, Xiaoqi (Virginia Tech, 2016-10-13). Over the past decade, wireless data communication has experienced phenomenal growth, driven by the popularity of wireless devices and the growing number of bandwidth-hungry applications. During the same period, various advanced communication technologies have emerged to improve network throughput; examples include multi-input multi-output (MIMO), full duplex, cognitive radio, and mmWave, among others. An important research direction is to understand the impact of these new technologies on network throughput performance. Such investigation is critical not only for theoretical understanding but also as a guideline for designing algorithms and network protocols in the field. The goal of this dissertation is to understand the impact of advanced technologies on network throughput performance. More specifically, we investigate three technologies: MIMO, full duplex, and mmWave communication. For each technology, we explore the performance envelope of wireless networks by studying a throughput maximization problem.
- Exploring the Sensing Capability of Wireless Signals. Du, Changlai (Virginia Tech, 2018-07-06). Wireless communications are ubiquitous nowadays, especially in the new era of the Internet of Things (IoT): most IoT devices access the Internet via some kind of wireless connection. The major role of wireless signals is as a communication medium. Beyond that, taking advantage of the growing physical-layer capabilities of wireless techniques, recent research has demonstrated the possibility of reusing wireless signals for both communication and sensing. The capability of wireless sensing and the ubiquitous availability of wireless signals make it possible to meet the rising demand for pervasive environment perception. Physical-layer features, including signal attributes and channel state information (CSI), can be used for sensing the physical world. This dissertation focuses on exploring the sensing capability of wireless signals. The research approach is to first take measurements from the physical layer of wireless connections, and then develop techniques to extract or infer information about the environment from the measurements, such as the locations of signal sources or the motion of a human body. The research work in this dissertation makes three contributions. First, we analyze wireless signal attributes; specifically, we study the cyclostationarity properties of wireless signals. Taking WiFi signals as an example, we propose signal cyclostationarity models induced by the WiFi Orthogonal Frequency Division Multiplexing (OFDM) structure, including pilots, the cyclic prefix, and preambles. The induced cyclic frequencies are then applied to the signal-selective direction estimation problem. Second, based on this analysis of signal attributes, we design and implement a prototype of a single-device system, named MobTrack, which can locate indoor interfering radios. The goal of MobTrack is to provide a lightweight, handheld system that can locate interfering radios with sub-meter accuracy using as few antennas as possible; with a small antenna array, the cost, complexity, and size of the device are reduced. MobTrack is the first single-device indoor interference localization system that does not require multiple pre-deployed access points (APs). Third, channel state information is studied in applications of human motion sensing. We design WiTalk, the first system able to perform fine-grained motion sensing, such as lip reading, on smartphones using the CSI dynamics generated by human movements. WiTalk proposes a new fine-grained human motion sensing technique with a distinctive context-free feature: it generates CSI spectrograms using signal-processing techniques and extracts features by computing the contours of the CSI spectrograms. The proposed technique is verified in the application scenario of lip reading, where the fine-grained motion is the mouth movement.
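A minimal sketch of the spectrogram step that a WiTalk-style pipeline builds on: a time-frequency view of a CSI amplitude stream in which body or mouth movements appear as low-frequency energy. The synthetic "CSI" signal and all parameters below are illustrative:

```python
# Minimal sketch: spectrogram of a synthetic CSI amplitude stream.
import numpy as np
from scipy.signal import spectrogram

fs = 1000                            # CSI sampling rate (packets/s), assumed
t = np.arange(0, 4.0, 1 / fs)
csi_amplitude = (np.ones_like(t)
                 + 0.3 * np.sin(2 * np.pi * 5 * t) * (t > 2.0)  # motion after 2 s
                 + 0.02 * np.random.default_rng(5).normal(size=t.size))

f, times, Sxx = spectrogram(csi_amplitude, fs=fs, nperseg=256, noverlap=128)

# Energy below ~20 Hz rises once the simulated movement starts; contours of
# this spectrogram are the kind of features a WiTalk-style pipeline extracts.
low = f < 20
print(Sxx[low][:, times < 2.0].mean(), Sxx[low][:, times > 2.0].mean())
```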