Browsing by Author "Chung, Taejoong Tijay"
- Chameleon Interference: Assessing Vulnerability of Magnetic Sensors to Spoofing and Signal Injection Attacks through Environmental Interference in Mobile Devices. Gleason, David Theodore (Virginia Tech, 2023-01-06). Embedded sensors are a fixture of most devices in the current computer industry. These small devices are used for a variety of purposes throughout many fields to collect whatever kind of information is needed by the user. From data on device acceleration to data on position relative to the Earth's magnetic field, embedded sensors can provide it for any number of tasks. The advent of these devices has made work and research in the computer industry significantly easier, but they are not without their drawbacks. Most of these sensors operate by drawing external data from the environment through send-and-receive signals. This mode of operation leaves them vulnerable to external malicious users who seek access to the data being stored and handled by the sensors. The security and privacy of embedded sensor data have become topics of great concern with the continued digitization of sensitive personal data. Within the last five years, studies have shown the ability to manipulate embedded magnetic sensors in order to gain access to various forms of sensitive personal data. This is of great concern to the developers of mobile devices, as most mobile devices possess embedded magnetic sensors. The vulnerability of sensors to external influence raises concerns both for data privacy and for degradation of public trust in the ability of devices to keep personal information safe and out of the wrong hands. Degradation of public trust in security methodologies is a major concern to many in the research and tech industry, as much of the work conducted to advance both security and technology depends on large amounts of public data. If the public loses trust in the ability of researchers' devices to protect the data provided to them, they may stop providing data, which would make the work of researchers and other technology workers considerably more difficult. To address these concerns, this thesis presents an introduction to magnetic sensor devices (a prominent tool for data collection), how these sensors work, and the ways they handle data. We then examine the techniques used to interfere with the functioning and output of magnetic sensors employed by mobile devices. Finally, we examine existing techniques for defending against these kinds of attacks and propose potential new ones. The end goal of this work is to provide a broader perspective on the nature of environmental/natural interference and its relationship to scientific study and technological advancement. Literature on this topic does exist; however, existing works focus exclusively on a single form of interference (e.g., light), which yields a narrow perspective that this work seeks to remedy. The end result is meant to give a broader view of multiple forms of interference, and of how they interrelate, than current narrowly focused perspectives allow.
- Defending Against Misuse of Synthetic Media: Characterizing Real-world Challenges and Building Robust Defenses. Pu, Jiameng (Virginia Tech, 2022-10-07). Recent advances in deep generative models have enabled the generation of realistic synthetic media or deepfakes, including synthetic images, videos, and text. However, synthetic media can be misused for malicious purposes and damage users' trust in online content. This dissertation aims to address several key challenges in defending against the misuse of synthetic media. Key contributions of this dissertation include the following: (1) Understanding challenges to the real-world applicability of existing synthetic media defenses. We curate synthetic videos and text from the wild, i.e., the Internet community, and assess the effectiveness of state-of-the-art defenses on synthetic content in the wild. In addition, we propose practical low-cost adversarial attacks and systematically measure the adversarial robustness of existing defenses. Our findings reveal that most defenses degrade significantly under real-world detection scenarios, which leads to the second thread of this work: (2) Building detection schemes with improved generalization and robustness for synthetic content. Most existing synthetic image detection schemes are highly content-specific, e.g., designed only for human faces, which limits their applicability. We propose an unsupervised, content-agnostic detection scheme called NoiseScope, which does not require a priori access to synthetic images and is applicable to a wide variety of generative models (e.g., GANs). NoiseScope is also resilient against a range of countermeasures mounted by a knowledgeable attacker. For the text modality, our study reveals that state-of-the-art defenses that mine sequential patterns in text using Transformer models are vulnerable to simple evasion schemes. We further explore enhancing the robustness of synthetic text detection by leveraging semantic features.
- Empirical Investigations of More Practical Fault Localization Approaches. Dao, Tung Manh (Virginia Tech, 2023-10-18). Developers often spend much of their valuable development time on software debugging and bug finding. In addition, software defects cost the software industry as a whole hundreds of billions or even a trillion US dollars. As a result, many fault localization (FL) techniques for localizing bugs automatically have been proposed. Despite its popularity, adopting FL in industrial environments has been impractical due to its unsatisfactory accuracy and high runtime overhead. Motivated by the real-world challenges of FL applicability, this dissertation addresses these issues through two main enhancements to existing FL. First, it explores different strategies for combining a variety of program execution information with Information Retrieval-based fault localization (IRFL) techniques to increase FL's accuracy. Second, it invents and experiments with the unconventional technique of Instant Fault Localization (IFL), built on the innovative concept of triggering modes. Our empirical evaluations of the proposed approaches on various types of bugs in a real software development environment show that FL's accuracy increases and its runtime cost drops significantly. We find that execution information helps increase IRFL's Top-10 accuracy by 17–33% at the class level and 62–100% at the method level. Another finding is that IFL achieves as much as 100% runtime cost reduction while gaining comparable or better accuracy. For example, on single-location bugs, IFL scores 73% MAP, compared with 56% for the conventional approach. For multi-location bugs, IFL's Top-1 performance on real bugs is 22%, just below the 24% of existing FL approaches. We hope the results and findings from this dissertation help make the adoption of FL in industry more practical and prevalent.
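  The Top-N and MAP figures quoted in this abstract are standard ranking metrics for fault localization. As a minimal Python sketch (the rankings and ground truth below are hypothetical placeholders, not data from the dissertation):

  ```python
  # Top-N: does any truly buggy location appear in the top n of the ranking?
  def top_n_hit(ranked_locations, buggy_locations, n=10):
      return any(loc in buggy_locations for loc in ranked_locations[:n])

  # Average precision for one bug; MAP is the mean of this over all bugs.
  def average_precision(ranked_locations, buggy_locations):
      hits, precision_sum = 0, 0.0
      for rank, loc in enumerate(ranked_locations, start=1):
          if loc in buggy_locations:
              hits += 1
              precision_sum += hits / rank
      return precision_sum / max(hits, 1)

  ranking = ["Foo.bar", "Baz.qux", "Foo.init"]  # hypothetical FL output
  actual = {"Baz.qux"}                          # hypothetical buggy method
  print(top_n_hit(ranking, actual), average_precision(ranking, actual))
  ```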
- Exploring the Evolution of the TLS Certificate Ecosystem. Farhan, Syed Muhammad (Virginia Tech, 2022-06-01). A vast majority of popular communication protocols for the Internet employ TLS (Transport Layer Security) to secure communication. As a result, there have been numerous efforts, including the introduction of Certificate Transparency logs and free automated CAs, to improve the SSL certificate ecosystem. Our work highlights the effectiveness of these efforts using the Certificate Transparency dataset as well as certificates collected via full IPv4 scans. We show that a large proportion of invalid certificates still exists, and we outline why these certificates are invalid and where they are hosted. Moreover, we show that the incorrect use of template certificates has led to incorrect SCTs being embedded in the certificates. Taken together, our results emphasize the need for continued involvement from the research community to improve the web's PKI ecosystem.
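  One of the simpler invalidity conditions studied in work like this is an expired validity window. A minimal sketch of checking it locally with the Python `cryptography` package (an illustrative check, not the thesis's measurement pipeline, which also involves chain and name validation):

  ```python
  from datetime import datetime
  from cryptography import x509

  def is_expired(pem_bytes: bytes) -> bool:
      """True if the certificate's notAfter date has already passed."""
      cert = x509.load_pem_x509_certificate(pem_bytes)
      # not_valid_after is a naive UTC datetime in the cryptography package
      return cert.not_valid_after < datetime.utcnow()

  # Usage (hypothetical file path):
  # print(is_expired(open("cert.pem", "rb").read()))
  ```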
- Information Freshness Optimization in Real-time Network Applications. Liu, Zhongdong (Virginia Tech, 2024-06-12). In recent years, the remarkable development of ubiquitous communication networks and smart portable devices has spawned a wide variety of real-time applications that require timely information updates (e.g., autonomous vehicular systems, industrial automation systems, and live streaming services). These real-time applications all have one thing in common: they desire their knowledge of the information source to be as fresh as possible. To measure the freshness of information, a new metric called the Age-of-Information (AoI) has been proposed. AoI is defined as the time elapsed since the generation time of the freshest delivered update. This metric is influenced by both the inter-arrival time and the delay of the updates; as a result, AoI exhibits characteristics distinct from traditional delay and throughput metrics. In this dissertation, our goal is to optimize AoI in various real-time network applications. First, we investigate the fundamental question of how exactly various scheduling policies impact AoI performance. Though there is a large body of work studying AoI performance under different scheduling policies, the use of update-size information, and its combination with other information (such as arrival-time information and service preemption), to reduce AoI has not yet been explored. Second, since AoI is a recently introduced measure of freshness, its relationship to other performance metrics remains largely ambiguous. We analyze the tradeoffs between AoI and additional performance metrics, including service performance and update cost, within real-world applications. This dissertation is organized into three parts. In the first part, we observe that scheduling policies leveraging update-size information can substantially reduce the delay, one of the key components of AoI; however, it remains largely unknown how exactly such policies impact AoI performance. To this end, we conduct a systematic and comparative study of the impact of scheduling policies on AoI performance in single-server queues and provide useful guidelines for the design of AoI-efficient scheduling policies. In the second part, we analyze the tradeoffs between AoI and other performance metrics in real-world systems. Specifically, we focus on two important tradeoffs. (i) The tradeoff between service performance and AoI that arises in data-driven real-time applications (e.g., Google Maps and stock trading applications). In these applications, computing resources are often shared for processing both updates from information sources and queries from end users, so there is a natural tradeoff between service performance (e.g., response time to queries) and AoI (i.e., the freshness of the data used to answer user queries). To address this tradeoff, we begin by introducing a simple single-server two-queue model that captures the coupled scheduling between updates and queries. We then design threshold-based scheduling policies to prioritize either updates or queries, and we conduct a rigorous analysis of their performance. (ii) The tradeoff between update cost and AoI that appears in crowdsensing-based applications (e.g., Google Waze and GasBuddy).
On the one hand, users are not satisfied if the responses to their requests are stale; on the other hand, updating the information about certain points of interest is costly for the applications, since they typically need to make monetary payments to incentivize users. To capture this tradeoff, we first formulate an optimization problem whose objective is to minimize the sum of the staleness cost (a function of the AoI) and the update cost; we then obtain a closed-form optimal threshold-based policy by reformulating the problem as a Markov decision process (MDP). In the third part, we study the joint minimization of staleness (measured by AoI) and transmission costs (e.g., energy cost) under an (arbitrary) time-varying wireless channel, both without and with machine learning (ML) advice. We consider a discrete-time system in which a resource-constrained source transmits time-sensitive data to a destination over a time-varying wireless channel. Each transmission incurs a fixed cost, while not transmitting results in a staleness cost measured by the AoI; the source needs to balance the tradeoff between these transmission and staleness costs. To tackle this challenge, we develop a robust online algorithm that aims to minimize the sum of transmission and staleness costs while ensuring a worst-case performance guarantee. While online algorithms are robust, they tend to be overly conservative and may perform poorly on average in typical scenarios. In contrast, ML algorithms, which leverage historical data and prediction models, generally perform well on average but lack worst-case performance guarantees. To harness the advantages of both approaches, we design a learning-augmented online algorithm that achieves two key properties: (i) consistency: closely approximating the optimal offline algorithm when the ML prediction is accurate and trusted; and (ii) robustness: providing a worst-case performance guarantee even when ML predictions are inaccurate.
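  To make the AoI definition in this abstract concrete, a small Python sketch (the generation and delivery times are hypothetical, not from the dissertation): AoI at time t is t minus the generation time of the freshest update delivered by t, which produces the characteristic sawtooth behavior.

  ```python
  def aoi_at(t, updates):
      """AoI at time t. updates: list of (generation_time, delivery_time)."""
      delivered = [g for (g, d) in updates if d <= t]
      # Before any delivery the source's knowledge is arbitrarily stale.
      return t - max(delivered) if delivered else float("inf")

  # Hypothetical updates: generated at 0, 2, 4; delivered at 1, 2.5, 6.
  updates = [(0.0, 1.0), (2.0, 2.5), (4.0, 6.0)]
  for t in [1.0, 2.0, 3.0, 5.0, 6.0]:
      print(t, aoi_at(t, updates))  # AoI grows, then drops on each delivery
  ```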
- Measurement and Development for Automated Secure Coding Solutions. Frantz, Miles Eugene (Virginia Tech, 2024-09-09). With the rise of development efforts, there has also been a rise in source code vulnerabilities. Advanced security tools have been created to identify these vulnerabilities throughout the lifetime of the developer's ecosystem and afterward, before the vulnerabilities are exposed. One such popular method is Static Code Analysis (SCA), which scans developers' source code to identify potential vulnerabilities. My Ph.D. work aims to help reduce exposed vulnerabilities by YIELDing, ENHANCing, and EVALUATing (EYE) SCA tools that identify vulnerabilities while the developer writes the code. We first evaluate tools that support developers with their source code by determining how accurately they identify vulnerability information. Large Language Models (LLMs) have been on the rise recently with the introduction of Chat Generative Pre-trained Transformer (ChatGPT) 3.5, ChatGPT 4.1, Google Gemini, and many more. Using a common framework, we created a zero-shot prompt instructing the LLM to identify whether there is a vulnerability in the provided source code and, if so, which Common Weakness Enumeration (CWE) value represents it. With our Python cryptographic benchmark PyCryptoBench, we sent vulnerable samples to four different LLMs and two different versions of the ChatGPT Application Programming Interface (API). The samples allow us to measure how reliable each LLM is at identifying and classifying vulnerabilities, and the ChatGPT APIs include multiple reproducibility fields that allowed us to measure how reproducible the responses are. Next, we yield a new SCA tool that applies what we learned to a current gap: Python source code has ever-increasing complexity yet lacks SCA tools compared to Java. Cryptolation, our state-of-the-art (SOA) Python SCA tool, uses constant-propagation-supported variable inference to obtain insight into the data-flow state through the program's execution. We compare Cryptolation with the other SOA SCA tools Bandit, Semgrep, and Dlint. To verify the Precision of our tool, we created the benchmark PyCryptoBench, which contains 1,836 test cases and encompasses five different language features. Next, we crawled over 1,000 cryptographic-related Python projects on GitHub and scanned each with each tool. Finally, we reviewed all PyCryptoBench results and sampled over 10,000 cryptographic-related Python projects. The results reveal that Cryptolation achieves 100% Precision on the benchmark, along with the second-highest Precision on the cryptographic-related projects. Finally, we look at enhancing SCA tools. The SOA tools already compete to have the highest Precision, Recall, and Accuracy; however, we examine several developer surveys to determine why developers do not adopt such tools. The reasons generally concern aesthetics, usability, customization, and the effort cost of consistent use. To address these, we enhance the SOA Java SCA tool CryptoGuard with the following: integrated build tools, modern terminal Command Line Interface (CLI) usage, customizable and vendor-specific output formats, and no-install demos.
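  A minimal sketch of the zero-shot setup described in this abstract: ask an LLM whether a snippet is vulnerable and which CWE applies. The model name, prompt wording, sample snippet, and use of the OpenAI client are illustrative assumptions, not the benchmark's actual harness.

  ```python
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  # Hypothetical vulnerable sample (ECB mode is a classic misuse).
  snippet = "cipher = Cipher(algorithms.AES(key), modes.ECB())"
  prompt = (
      "Does the following Python code contain a security vulnerability? "
      "Answer yes or no, and if yes, give the CWE ID.\n\n" + snippet
  )
  resp = client.chat.completions.create(
      model="gpt-4o-mini",  # placeholder model name
      messages=[{"role": "user", "content": prompt}],
      temperature=0,        # reduce run-to-run variation
  )
  print(resp.choices[0].message.content)
  ```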
- Measuring and Understanding TTL Violations in DNS Resolvers. Bhowmick, Protick (Virginia Tech, 2024-01-02). The Domain Name System (DNS) is a scalable, distributed caching architecture in which DNS records are cached at many DNS servers distributed globally. Each DNS record includes a time-to-live (TTL) value that dictates how long the record can be stored before it is evicted from the cache. TTL holds significant importance for DNS security, such as determining the caching period for DNSSEC-signed responses, as well as for performance, such as the responsiveness of CDN-managed domains. At a high level, TTL is crucial for ensuring efficient caching, load distribution, and network security in the Domain Name System, and setting appropriate TTL values is a key aspect of DNS administration. It is therefore crucial to measure how TTL violations occur in resolvers. However, assessing how DNS resolvers worldwide handle TTL is not easy and typically requires access to multiple nodes distributed globally. In this work, we introduce a novel methodology for measuring TTL violations in DNS resolvers that leverages a residential proxy service called Brightdata, enabling us to evaluate more than 27,000 resolvers across 9,500 Autonomous Systems (ASes). Among the 8,524 resolvers that had at least five distinct exit nodes, we found that 8.74% arbitrarily extend TTLs. We also find that the DNSSEC standard is disregarded by 44.1% of DNSSEC-validating resolvers, which continue to provide DNSSEC-signed responses even after the RRSIGs have expired.
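  A minimal sketch of the core measurement idea: query the same name twice through a resolver and check that the cached answer's TTL counts down rather than being reset or extended. This uses the dnspython package; the resolver address is a placeholder, and the real study routes queries through residential proxy exit nodes rather than querying directly.

  ```python
  import time
  import dns.resolver

  resolver = dns.resolver.Resolver(configure=False)
  resolver.nameservers = ["203.0.113.53"]  # hypothetical resolver under test

  first = resolver.resolve("example.com", "A").rrset.ttl
  time.sleep(5)
  second = resolver.resolve("example.com", "A").rrset.ttl

  # A TTL-honoring cache should report roughly second <= first - 5;
  # second > first suggests the resolver arbitrarily extended the TTL.
  print(first, second)
  ```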
- Message Authentication Codes On Ultra-Low SWaP Devices. Liao, Che-Hsien (Virginia Tech, 2022-05-27). This thesis focuses on specific crypto algorithms, Message Authentication Codes (MACs), running on ultra-low SWaP devices. The MACs we use are the hash-based message authentication code (HMAC) and the cipher-block-chaining message authentication code (CBC-MAC). The most important consideration for ultra-low SWaP devices is their energy usage. This thesis measures the execution times of different implementations on ultra-low SWaP devices, letting us determine which implementation is suitable for a specific device. To provide background on the crypto algorithms used, the thesis briefly introduces HMAC and CBC-MAC at a high level, including their usage and advantages. The research method is empirical: this thesis determines the execution times of different implementations. The two algorithms (HMAC and CBC-MAC) comprise three implementations, and the results come from running those implementations on the devices we used.
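  A minimal sketch of the two MAC constructions named in this abstract, not the thesis's device implementations: HMAC from the Python standard library, and CBC-MAC sketched with the `cryptography` package (fixed-length messages only, since plain CBC-MAC is insecure for variable-length messages).

  ```python
  import hashlib
  import hmac
  from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

  key, msg = b"\x00" * 16, b"sixteen byte msg"  # hypothetical key and message

  # HMAC-SHA256 tag.
  tag_hmac = hmac.new(key, msg, hashlib.sha256).digest()

  # CBC-MAC: encrypt with AES-CBC under a zero IV; the last block is the tag.
  enc = Cipher(algorithms.AES(key), modes.CBC(b"\x00" * 16)).encryptor()
  tag_cbc = (enc.update(msg) + enc.finalize())[-16:]

  print(tag_hmac.hex(), tag_cbc.hex())
  ```

  On a device, wrapping each call in a timer (e.g., `time.perf_counter()`) gives the kind of per-implementation execution-time comparison the thesis performs.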
- NoiseLearner: An Unsupervised, Content-agnostic Approach to Detect Deepfake Images. Vives, Cristian (Virginia Tech, 2022-03-21). Recent advancements in generative models have resulted in hyper-realistic synthetic images or "deepfakes" at high resolutions, making them almost indistinguishable from real images from cameras. While exciting, this technology introduces room for abuse. Deepfakes have already been misused to produce pornography, political propaganda, and misinformation. The ability to produce fully synthetic content that can cause such misinformation demands robust deepfake detection frameworks. Most deepfake detection methods are trained in a supervised manner and fail to generalize to deepfakes produced by newer and superior generative models. More importantly, such detection methods are usually focused on detecting deepfakes with a specific type of content, e.g., face deepfakes. However, other types of deepfakes are starting to emerge, e.g., deepfakes of biomedical images, satellite imagery, people, and objects shown in different settings. Taking these challenges into account, we propose NoiseLearner, an unsupervised and content-agnostic deepfake image detection method. NoiseLearner aims to detect any deepfake image regardless of the generative model of origin or the content of the image. We perform a comprehensive evaluation by testing on multiple deepfake datasets composed of different generative models and different content groups, such as faces, satellite images, landscapes, and animals. Furthermore, we include more recent state-of-the-art generative models in our evaluation, such as StyleGAN3 and denoising diffusion probabilistic models (DDPM). We observe that NoiseLearner performs well on multiple datasets, achieving 96% accuracy on both the StyleGAN and StyleGAN2 datasets.
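  A hedged sketch of the noise-residual idea that underlies content-agnostic detectors in this line of work: subtract a denoised copy of an image from the image itself, leaving a residual that can carry a generative model's fingerprint. The median filter here is an illustrative choice of denoiser, an assumption rather than the denoiser NoiseLearner actually uses.

  ```python
  import numpy as np
  from scipy.ndimage import median_filter

  def noise_residual(image: np.ndarray, size: int = 3) -> np.ndarray:
      """Return the image minus its median-filtered (denoised) version."""
      img = image.astype(np.float64)
      return img - median_filter(img, size=size)

  img = np.random.rand(64, 64)  # placeholder for a real grayscale image
  print(noise_residual(img).std())  # residual statistics feed the detector
  ```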