Browsing by Author "Yao, Danfeng (Daphne)"
Now showing 1 - 20 of 88
Results Per Page
Sort Options
- Adaptive Key Protection in Complex Cryptosystems with AttributesWang, Zilong; Yao, Danfeng (Daphne); Feng, Rongquan (Department of Computer Science, Virginia Polytechnic Institute & State University, 2012)In the attribute-based encryption (ABE) model, attributes (as opposed to identities) are used to encrypt messages, and all the receivers with qualifying attributes can decrypt the ciphertext. However, compromised attribute keys may affect the communications of many users who share the same access control policies. We present the notion of forward-secure attribute-based encryption (fs-ABE) and give a concrete construction based on bilinear map and decisional bilinear Diffie-Hellman assumption. Forward security means that a compromised private key by an adversary at time t does not break the confidentiality of the communication that took place prior to t. We describe how to achieve both forward security and encryption with attributes, and formally prove our security against the adaptive chosen-ciphertext adversaries. Our scheme is non-trivial, and the key size only grows polynomially with logN (where N is the number of time periods). We further generalize our scheme to support the individualized key-updating schedule for each attribute, which provides a finer granularity for key management. Our insights on the required properties that an ABE scheme needs to possess in order to be forward-secure compatible are useful beyond the specific fs-ABE construction given. We raise an open question at the end of the paper on the escrow problem of the master key in ABE schemes.
- Addressing Challenges of Modern News Agencies via Predictive Modeling, Deep Learning, and Transfer LearningKeneshloo, Yaser (Virginia Tech, 2019-07-22)Today's news agencies are moving from traditional journalism, where publishing just a few news articles per day was sufficient, to modern content generation mechanisms, which create more than thousands of news pieces every day. With the growth of these modern news agencies comes the arduous task of properly handling this massive amount of data that is generated for each news article. Therefore, news agencies are constantly seeking solutions to facilitate and automate some of the tasks that have been previously done by humans. In this dissertation, we focus on some of these problems and provide solutions for two broad problems which help a news agency to not only have a wider view of the behaviour of readers around the article but also to provide an automated tools to ease the job of editors in summarizing news articles. These two disjoint problems are aiming at improving the users' reading experience by helping the content generator to monitor and focus on poorly performing content while allow them to promote the good-performing ones. We first focus on the task of popularity prediction of news articles via a combination of regression, classification, and clustering models. We next focus on the problem of generating automated text summaries for a long news article using deep learning models. The first problem aims at helping the content developer in understanding of how a news article is performing over the long run while the second problem provides automated tools for the content developers to generate summaries for each news article.
- Algorithms and Frameworks for Accelerating Security Applications on HPC PlatformsYu, Xiaodong (Virginia Tech, 2019-09-09)Typical cybersecurity solutions emphasize on achieving defense functionalities. However, execution efficiency and scalability are equally important, especially for real-world deployment. Straightforward mappings of cybersecurity applications onto HPC platforms may significantly underutilize the HPC devices' capacities. On the other hand, the sophisticated implementations are quite difficult: they require both in-depth understandings of cybersecurity domain-specific characteristics and HPC architecture and system model. In our work, we investigate three sub-areas in cybersecurity, including mobile software security, network security, and system security. They have the following performance issues, respectively: 1) The flow- and context-sensitive static analysis for the large and complex Android APKs are incredibly time-consuming. Existing CPU-only frameworks/tools have to set a timeout threshold to cease the program analysis to trade the precision for performance. 2) Network intrusion detection systems (NIDS) use automata processing as its searching core and requires line-speed processing. However, achieving high-speed automata processing is exceptionally difficult in both algorithm and implementation aspects. 3) It is unclear how the cache configurations impact time-driven cache side-channel attacks' performance. This question remains open because it is difficult to conduct comparative measurement to study the impacts. In this dissertation, we demonstrate how application-specific characteristics can be leveraged to optimize implementations on various types of HPC for faster and more scalable cybersecurity executions. For example, we present a new GPU-assisted framework and a collection of optimization strategies for fast Android static data-flow analysis that achieve up to 128X speedups against the plain GPU implementation. For network intrusion detection systems (IDS), we design and implement an algorithm capable of eliminating the state explosion in out-of-order packet situations, which reduces up to 400X of the memory overhead. We also present tools for improving the usability of Micron's Automata Processor. To study the cache configurations' impact on time-driven cache side-channel attacks' performance, we design an approach to conducting comparative measurement. We propose a quantifiable success rate metric to measure the performance of time-driven cache attacks and utilize the GEM5 platform to emulate the configurable cache.
- Anomaly Detection Through System and Program Behavior ModelingXu, Kui (Virginia Tech, 2014-12-15)Various vulnerabilities in software applications become easy targets for attackers. The trend constantly being observed in the evolution of advanced modern exploits is their growing sophistication in stealthy attacks. Code-reuse attacks such as return-oriented programming allow intruders to execute mal-intended instruction sequences on a victim machine without injecting external code. Successful exploitation leads to hijacked applications or the download of malicious software (drive-by download attack), which usually happens without the notice or permission from users. In this dissertation, we address the problem of host-based system anomaly detection, specifically by predicting expected behaviors of programs and detecting run-time deviations and anomalies. We first introduce an approach for detecting the drive-by download attack, which is one of the major vectors for malware infection. Our tool enforces the dependencies between user actions and system events, such as file-system access and process execution. It can be used to provide real time protection of a personal computer, as well as for diagnosing and evaluating untrusted websites for forensic purposes. We perform extensive experimental evaluation, including a user study with 21 participants, thousands of legitimate websites (for testing false alarms), 84 malicious websites in the wild, as well as lab reproduced exploits. Our solution demonstrates a usable host-based framework for controlling and enforcing the access of system resources. Secondly, we present a new anomaly-based detection technique that probabilistically models and learns a program's control flows for high-precision behavioral reasoning and monitoring. Existing solutions suffer from either incomplete behavioral modeling (for dynamic models) or overestimating the likelihood of call occurrences (for static models). We introduce a new probabilistic anomaly detection method for modeling program behaviors. Its uniqueness is the ability to quantify the static control flow in programs and to integrate the control flow information in probabilistic machine learning algorithms. The advantage of our technique is the significantly improved detection accuracy. We observed 11 up to 28-fold of improvement in detection accuracy compared to the state-of-the-art HMM-based anomaly models. We further integrate context information into our detection model, which achieves both strong flow-sensitivity and context-sensitivity. Our context-sensitive approach gives on average over 10 times of improvement for system call monitoring, and 3 orders of magnitude for library call monitoring, over existing regular HMM methods. Evaluated with a large amount of program traces and real-world exploits, our findings confirm that the probabilistic modeling of program dependences provides a significant source of behavior information for building high-precision models for real-time system monitoring. Abnormal traces (obtained through reproducing exploits and synthesized abnormal traces) can be well distinguished from normal traces by our model.
- Applications and Security of Next-Generation, User-Centric Wireless SystemsRamstetter, Jerry Rick; Yang, Yaling; Yao, Danfeng (Daphne) (MDPI, 2010-07-28)Pervasive wireless systems have significantly improved end-users quality of life. As manufacturing costs decrease, communications bandwidth increases, and contextual information is made more readily available, the role of next generation wireless systems in facilitating users daily activities will grow. Unique security and privacy issues exist in these wireless, context-aware, often decentralized systems. For example, the pervasive nature of such systems allows adversaries to launch stealthy attacks against them. In this review paper, we survey several emergent personal wireless systems and their applications. These systems include mobile social networks, active implantable medical devices, and consumer products. We explore each systems usage of contextual information and provide insight into its security vulnerabilities. Where possible, we describe existing solutions for defendingagainst these vulnerabilities. Finally, we point out promising future research directions for improving these systems robustness and security
- Attack and Defense with Hardware-Aided SecurityZhang, Ning (Virginia Tech, 2016-08-26)Riding on recent advances in computing and networking, our society is now experiencing the evolution into the age of information. While the development of these technologies brings great value to our daily life, the lucrative reward from cyber-crimes has also attracted criminals. As computing continues to play an increasing role in the society, security has become a pressing issue. Failures in computing systems could result in loss of infrastructure or human life, as demonstrated in both academic research and production environment. With the continuing widespread of malicious software and new vulnerabilities revealing every day, protecting the heterogeneous computing systems across the Internet has become a daunting task. Our approach to this challenge consists of two directions. The first direction aims to gain a better understanding of the inner working of both attacks and defenses in the cyber environment. Meanwhile, our other direction is designing secure systems in adversarial environment.
- Bluetooth Threat TaxonomyDunning, John Paul (Virginia Tech, 2010-10-08)Since its release in 1999, Bluetooth has become a commonly used technology available on billions of devices through the world. Bluetooth is a wireless technology used for information transfer by devices such as Smartphones, headsets, keyboard/mice, laptops/desktops, video game systems, automobiles, printers, heart monitors, and surveillance cameras. Dozens of threats have been developed by researchers and hackers which targets these Bluetooth enabled devices. The work in this thesis provides insight into past and current Bluetooth threats along with methods of threat mitigation. The main focus of this thesis is the Bluetooth Threat Taxonomy (BTT); it is designed for classifying threats against Bluetooth enabled technology. The BTT incorporates nine distinct classifications to categorize Bluetooth attack tools and methods and a discussion on 42 threats. In addition, several new threats developed by the author will be discussed. This research also provides means to secure Bluetooth enabled devices. The Bluetooth Attack Detection Engine (BLADE) is as a host-based Intrusion Detection System (IDS) presented to detect threats targeted toward a host system. Finally, a threat mitigation schema is provided to act as a guideline for securing Bluetooth enabled devices.
- Building CTRnet Digital Library Services using Archive-It and LucidWorks Big Data SoftwareChitturi, Kiran (Virginia Tech, 2014-03-27)When a crisis occurs, information flows rapidly in the Web through social media, blogs, and news articles. The shared information captures the reactions, impacts, and responses from the government as well as the public. Later, researchers, scholars, students, and others seek information about earlier events, sometimes for cross-event analysis or comparison. There are very few integrated systems which try to collect and permanently archive the information about an event and provide access to the crisis information at the same time. In this thesis, we describe the CTRnet Digital Library and Archive which aims to permanently archive crisis event information by using Archive-It services and then provide access to the archived information by using LucidWorks Big Data software. Through the Big Data (LWBD) software, we take advantage of text extraction, clustering, similarity, annotation, and indexing services and build digital libraries with the generated metadata that will be helpful for the system stakeholders to locate information about an event. Through this study, we collected data for 46 crises events using Archive-It. We built a CTRnet DL prototype and its services for the ``Boston Marathon Bombing" collection by using the components of LucidWorks Big Data. Running LucidWorks Big Data on a 30 node Hadoop cluster accelerates the sub-workflows processing and also provides fault tolerant execution. LWBD sub-workflows, ``ingest" and ``extract", processed the textual data present in the WARC files. Other sub-workflows ``kmeans", ``simdoc", and ``annotate" helped in grouping the search-results, deleting the duplicates and providing metadata for additional facets in the CTRnet DL prototype, respectively.
- CCS 2017- Women in Cyber Security (CyberW) WorkshopYao, Danfeng (Daphne); Bertino, Elisa (ACM, 2017)The CyberW workshop is motivated by the significant gender imbalance in all security conferences, in terms of the number of publishing authors, PC members, organizers, and attendees. What causes this gender imbalance remains unclear. However, multiple research studies have shown that a diverse group is more creative, diligent, and productive than a homogeneous group. Achieving cyber security requires a diverse group. To maintain a sustainable and creative workforce, substantial efforts need to be made by the security community to broaden the participation from underrepresented groups in cyber security research conferences. We hope this workshop can attract all underrepresented cybersecurity professionals, students, and researchers to attend top security and privacy conferences, engage in cutting-edge security and privacy research, excel in cyber security professions, and ultimately take on leadership positions.
- Characterizing and Detecting Online Deception via Data-Driven MethodsHu, Hang (Virginia Tech, 2020-05-27)In recent years, online deception has become a major threat to information security. Online deception that caused significant consequences is usually spear phishing. Spear-phishing emails come in a very small volume, target a small number of audiences, sometimes impersonate a trusted entity and use very specific content to redirect targets to a phishing website, where the attacker tricks targets sharing their credentials. In this thesis, we aim at measuring the entire process. Starting from phishing emails, we examine anti-spoofing protocols, analyze email services' policies and warnings towards spoofing emails, and measure the email tracking ecosystem. With phishing websites, we implement a powerful tool to detect domain name impersonation and detect phishing pages using dynamic and static analysis. We also analyze credential sharing on phishing websites, and measure what happens after victims share their credentials. Finally, we discuss potential phishing and privacy concerns on new platforms such as Alexa and Google Assistant. In the first part of this thesis (Chapter 3), we focus on measuring how email providers detect and handle forged emails. We also try to understand how forged emails can reach user inboxes by deliberately composing emails. Finally, we check how email providers warn users about forged emails. In the second part (Chapter 4), we measure the adoption of anti-spoofing protocols and seek to understand the reasons behind the low adoption rates. In the third part of this thesis (Chapter 5), we observe that a lot of phishing emails use email tracking techniques to track targets. We collect a large dataset of email messages using disposable email services and measure the landscape of email tracking. In the fourth part of this thesis (Chapter 6), we move on to phishing websites. We implement a powerful tool to detect squatting domains and train a machine learning model to classify phishing websites. In the fifth part (Chapter 7), we focus on the credential leaks. More specifically, we measure what happens after the targets' credentials are leaked. We monitor and measure the potential post-phishing exploiting activities. Finally, with new voice platforms such as Alexa becoming more and more popular, we wonder if new phishing and privacy concerns emerge with new platforms. In this part (Chapter 8), we systematically assess the attack surfaces by measuring sensitive applications on voice assistant systems. My thesis measures important parts of the complete process of online deception. With deeper understandings of phishing attacks, more complete and effective defense mechanisms can be developed to mitigate attacks in various dimensions.
- Coexistence of Wireless Networks for Shared Spectrum AccessGao, Bo (Virginia Tech, 2014-09-18)The radio frequency spectrum is not being efficiently utilized partly due to the current policy of allocating the frequency bands to specific services and users. In opportunistic spectrum access (OSA), the ``white spaces'' that are not occupied by primary users (a.k.a. incumbent users) can be opportunistically utilized by secondary users. To achieve this, we need to solve two problems: (i) primary-secondary incumbent protection, i.e., prevention of harmful interference from secondary users to primary users; (ii) secondary-secondary network coexistence, i.e., mitigation of mutual interference among secondary users. The first problem has been addressed by spectrum sensing techniques in cognitive radio (CR) networks and geolocation database services in database-driven spectrum sharing. The second problem is the main focus of this dissertation. To obtain a clear picture of coexistence issues, we propose a taxonomy of heterogeneous coexistence mechanisms for shared spectrum access. Based on the taxonomy, we choose to focus on four typical coexistence scenarios in this dissertation. Firstly, we study sensing-based OSA, when secondary users are capable of employing the channel aggregation technique. However, channel aggregation is not always beneficial due to dynamic spectrum availability and limited radio capability. We propose a channel usage model to analyze the impact of both primary and secondary user behaviors on the efficiency of channel aggregation. Our simulation results show that user demands in both the frequency and time domains should be carefully chosen to minimize expected cumulative delay. Secondly, we study the coexistence of homogeneous CR networks, termed as self-coexistence, when co-channel networks do not rely on inter-network coordination. We propose an uplink soft frequency reuse technique to enable globally power-efficient and locally fair spectrum sharing. We frame the self-coexistence problem as a non-cooperative game, and design a local heuristic algorithm that achieves the Nash equilibrium in a distributed manner. Our simulation results show that the proposed technique is mostly near-optimal and improves self-coexistence in spectrum utilization, power consumption, and intra-cell fairness. Thirdly, we study the coexistence of heterogeneous CR networks, when co-channel networks use different air interface standards. We propose a credit-token-based spectrum etiquette framework that enables spectrum sharing via inter-network coordination. Specifically, we propose a game-auction coexistence framework, and prove that the framework is stable. Our simulation results show that the proposed framework always converges to a near-optimal distributed solution and improves coexistence fairness and spectrum utilization. Fourthly, we study database-driven OSA, when secondary users are mobile. The use of geolocation databases is inadequate in supporting location-aided spectrum sharing if the users are mobile. We propose a probabilistic coexistence framework that supports mobile users by locally adapting their location uncertainty levels in order to find an appropriate trade-off between interference mitigation effectiveness and location update cost. Our simulation results show that the proposed framework can determine and adapt the database query intervals of mobile users to achieve near-optimal interference mitigation with minimal location updates.
- Data Leak Detection As a Service: Challenges and SolutionsShu, Xiaokui; Yao, Danfeng (Daphne) (Department of Computer Science, Virginia Polytechnic Institute & State University, 2012)We describe a network-based data-leak detection (DLD) technique, the main feature of which is that the detection does not require the data owner to reveal the content of the sensitive data. Instead, only a small amount of specialized digests are needed. Our technique – referred to as the fuzzy fingerprint – can be used to detect accidental data leaks due to human errors or application flaws. The privacy-preserving feature of our algorithms minimizes the exposure of sensitive data and enables the data owner to safely delegate the detection to others.We describe how cloud providers can offer their customers data-leak detection as an add-on service with strong privacy guarantees. We perform extensive experimental evaluation on the privacy, efficiency, accuracy and noise tolerance of our techniques. Our evaluation results under various data-leak scenarios and setups show that our method can support accurate detection with very small number of false alarms, even when the presentation of the data has been transformed. It also indicates that the detection accuracy does not degrade when partial digests are used. We further provide a quantifiable method to measure the privacy guarantee offered by our fuzzy fingerprint framework.
- Deceptive Environments for Cybersecurity Defense on Low-power DevicesKedrowitsch, Alexander Lee (Virginia Tech, 2017-06-05)The ever-evolving nature of botnets have made constant malware collection an absolute necessity for security researchers in order to analyze and investigate the latest, nefarious means by which bots exploit their targets and operate in concert with each other and their bot master. In that effort of on-going data collection, honeypots have established themselves as a curious and useful tool for deception-based security. Low-powered devices, such as the Raspberry Pi, have found a natural home with some categories of honeypots and are being embraced by the honeypot community. Due to the low cost of these devices, new techniques are being explored to employ multiple honeypots within a network to act as sensors, collecting activity reports and captured malicious binaries to back-end servers for later analysis and network threat assessments. While these techniques are just beginning to gain their stride within the security community, they are held back due to the minimal amount of deception a traditional honeypot on a low-powered device is capable of delivering. This thesis seeks to make a preliminary investigation into the viability of using Linux containers to greatly expand the deception possible on low-powered devices by providing isolation and containment of full system images with minimal resource overhead. It is argued that employing Linux containers on low-powered device honeypots enables an entire category of honeypots previously unavailable on such hardware platforms. In addition to granting previously unavailable interaction with honeypots on Raspberry Pis, the use of Linux containers grants unique advantages that have not previously been explored by security researchers, such as the ability to defeat many types of virtual environment and monitoring tool detection methods.
- Defending Against GPS Spoofing by Analyzing Visual CuesXu, Chao (Virginia Tech, 2020-05-21)Massive GPS navigation services are used by billions of people in their daily lives. GPS spoofing is quite a challenging problem nowadays. Existing Anti-GPS spoofing systems primarily focus on expensive equipment and complicated algorithms, which are not practical and deployable for most of the users. In this thesis, we explore the feasibility of a simple text-based system design for Anti-GPS spoofing. The goal is to use the lower cost and make the system more effective and robust for general spoofing attack detection. Our key idea is to only use the textual information from the physical world and build a real-time system to detect GPS spoofing. To demonstrate the feasibility, we first design image processing modules to collect sufficient textual information in panoramic images. Then, we simulate real-world spoofing attacks from two cities to build our training and testing datasets. We utilize LSTM to build a binary classifier which is the key for our Anti-GPS spoofing system. Finally, we evaluate the system performance by simulating driving tests. We prove that our system can achieve more than 98% detection accuracy when the ratio of attacked points in a driving route is more than 50%. Our system has a promising performance for general spoofing attack strategies and it proves the feasibility of using textual information for the spoofing attack detection.
- Designing Practical Software Bug Detectors Using Commodity Hardware and Common Programming PatternsZhang, Tong (Virginia Tech, 2020-01-13)Software bugs can cost millions and affect people's daily lives. However, many bug detection tools are not always practical in reality, which hinders their wide adoption. There are three main concerns regarding existing bug detectors: 1) run-time overhead in dynamic bug detectors, 2) space overhead in dynamic bug detectors, and 3) scalability and precision issues in static bug detectors. With those in mind, we propose to: 1) leverage commodity hardware to reduce run-time overhead, 2) reuse metadata maintained by one bug detector to detect other types of bugs, reducing space overhead, and 3) apply programming idioms to static analyses, improving scalability and precision. We demonstrate the effectiveness of three approaches using data race bugs, memory safety bugs, and permission check bugs, respectively. First, we leverage the commodity hardware transactional memory (HTM) selectively to use the dynamic data race detector only if necessary, thereby reducing the overhead from 11.68x to 4.65x. We then present a production-ready data race detector, which only incurs a 2.6% run-time overhead, by using performance monitoring units (PMUs) for online memory access sampling and offline unsampled memory access reconstruction. Second, for memory safety bugs, which are more common than data races, we provide practical temporal memory safety on top of the spatial memory safety of the Intel MPX in a memory-efficient manner without additional hardware support. We achieve this by reusing the existing metadata and checks already available in the Intel MPX-instrumented applications, thereby offering full memory safety at only 36% memory overhead. Finally, we design a scalable and precise function pointer analysis tool leveraging indirect call usage patterns in the Linux kernel. We applied the tool to the detection of permission check bugs; the detector found 14 previously unknown bugs within a limited time budget.
- Detection of stealthy malware activities with traffic causality and scalable triggering relation discovery(United States Patent and Trademark Office, 2018-02-06)A computer system for distinguishing user-initiated network traffic from malware-initiated network traffic comprising at least one central processing unit (CPU) and a memory communicatively coupled to the CPU. The memory includes a program code executable by the CPU to monitor individual network events to determine for an individual network event whether the event has a legitimate root-trigger. Malware-initiated traffic is identified as an individual network event that does not have a legitimate root-trigger.
- Device-Based Isolation for Securing Cryptographic KeysElish, Karim O.; Deng, Yipan; Yao, Danfeng (Daphne); Kafura, Dennis G. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2012)In this work, we describe an eective device-based isolation approach for achieving data security. Device-based isolation leverages the proliferation of personal computing devices to provide strong run-time guarantees for the condentiality of secrets. To demonstrate our isolation approach, we show its use in protecting the secrecy of highly sensitive data that is crucial to security operations, such as cryptographic keys used for decrypting ciphertext or signing digital signatures. Private key is usually encrypted when not used, however, when being used, the plaintext key is loaded into the memory of the host for access. In our threat model, the host may be compromised by attackers, and thus the condentiality of the host memory cannot be preserved. We present a novel and practical solution and its prototype called DataGuard to protect the secrecy of the highly sensitive data through the storage isolation and secure tunneling enabled by a mobile handheld device. DataGuard can be deployed for the key protection of individuals or organizations.
- DeviceGuard: External Device-Assisted System And Data SecurityDeng, Yipan (Virginia Tech, 2011-05-02)This thesis addresses the threat that personal computer faced from malware when the personal computer is connected to the Internet. Traditional host-based security approaches, such as anti-virus scanning protect the host from virus, worms, Trojans and other malwares. One of the issues of the host-based security approaches is that when the operating system is compromised by the malware, the antivirus software also becomes vulnerable. In this thesis, we present a novel approach through using an external device to enhance the host security by offloading the security solution from the host to the external device. We describe the design of the DeviceGuard framework that separate the security solution from the host and offload it to the external device, a Trusted Device. The architecture of the DeviceGuard consists of two components, the DeviceGuard application on the Trusted Device and a DeviceGuard daemon on the host. Our prototype based on Android Development Phone (ADP) shows the feasibilities and efficiency of our approach to provide security features including system file and user data integrity monitoring, secure signing and secure decryption. We use Bluetooth as the communication protocol between the host and the Trusted Device. Our experiment results indicates a practical Bluetooth throughput at about 2M Bytes per second is sufficient for short range communication between the host and the Trusted Device; Message digest with SHA-512, digital signing with 1024 bits signature and secure decryption with AES 256 bits on the Trusted device takes only the scale of 10? and 10? ms for 1K bytes and 1M bytes respectively which are also shows the feasibility and efficiency of the DeviceGuard solution. We also investigated the use of embedded system as the Trusted Device. Our solution takes advantage of the proliferation of devices, such as Smartphone, for stronger system and data security.
- Discovery of Triggering Relations and Its Applications in Network Security and Android Malware DetectionZhang, Hao (Virginia Tech, 2015-11-30)An increasing variety of malware, including spyware, worms, and bots, threatens data confidentiality and system integrity on computing devices ranging from backend servers to mobile devices. To address these threats, exacerbated by dynamic network traffic patterns and growing volumes, network security has been undergoing major changes to improve accuracy and scalability in the security analysis techniques. This dissertation addresses the problem of detecting the network anomalies on a single device by inferring the traffic dependence to ensure the root-triggers. In particular, we propose a dependence model for illustrating the network traffic causality. This model depicts the triggering relation of network requests, and thus can be used to reason about the occurrences of network events and pinpoint stealthy malware activities. The triggering relationships can be inferred by means of both rule-based and learning-based approaches. The rule-based approach originates from several heuristic algorithms based on the domain knowledge. The learning-based approach discovers the triggering relationship using a pairwise comparison operation that converts the requests into event pairs with comparable attributes. Machine learning classifiers predict the triggering relationship and further reason about the legitimacy of requests by enforcing their root-triggers. We apply our dependence model on the network traffic from a single host and a mobile device. Evaluated with real-world malware samples and synthetic attacks, our findings confirm that the traffic dependence model provides a significant source of semantic and contextual information that detects zero-day malicious applications. This dissertation also studies the usability of visualizing the traffic causality for domain experts. We design and develop a tool with a visual locality property. It supports different levels of visual based querying and reasoning required for the sensemaking process on complex network data. The significance of this dissertation research is in that it provides deep insights on the dependency of network requests, and leverages structural and semantic information, allowing us to reason about network behaviors and detect stealthy anomalies.
- DroidCat: Unified Dynamic Detection of Android MalwareCai, Haipeng; Meng, Na; Ryder, Barbara G.; Yao, Danfeng (Daphne) (Department of Computer Science, Virginia Polytechnic Institute & State University, 2016)Various dynamic approaches have been developed to detect or categorize Android malware. These approaches execute software, collect call traces, and then detect abnormal system calls or sensitive API usage. Consequently, attackers can evade these approaches by intentionally obfuscating those calls under focus. Additionally, existing approaches treat detection and categorization of malware as separate tasks, although intuitively both tasks are relevant and could be performed simultaneously. This paper presents DroidCat, the first unified dynamic malware detection approach, which not only detects malware, but also pinpoints the malware family. DroidCat leverages supervised machine learning to train a multi-class classifier using diverse behavioral profiles of benign apps and different kinds of malware. Compared with prior heuristics-based machine learning-based approaches, the feature set used in DroidCat is decided purely based on a systematic dynamic characterization study of benign and malicious apps. All differentiating features that show behavioral differences between benign and malicious apps are included. In this way, DroidCat is robust to existing evasion attacks. We evaluated DroidCat using leave-one-out cross validation with 136 benign apps and 135 malicious apps. The evaluation shows that DroidCat provided an effective and scalable unified malware detection solution with 81% precision, 82% recall, and 92% accuracy.