Browsing by Author "Zhang, Ying"
- Bayesian D-Optimal Design for Generalized Linear Models
  Zhang, Ying (Virginia Tech, 2006-12-07)
  Bayesian optimal designs have received increasing attention in recent years, especially in biomedical and clinical trials. Bayesian design procedures can utilize the available prior information of the unknown parameters so that a better design can be achieved. However, a difficulty in dealing with the Bayesian design is the lack of efficient computational methods. In this research, a hybrid computational method, which consists of the combination of a rough global optima search and a more precise local optima search, is proposed to efficiently search for the Bayesian D-optimal designs for multi-variable generalized linear models. Particularly, Poisson regression models and logistic regression models are investigated. Designs are examined for a range of prior distributions and the equivalence theorem is used to verify the design optimality. Design efficiency for various models is examined and compared with non-Bayesian designs. Bayesian D-optimal designs are found to be more efficient and robust than non-Bayesian D-optimal designs. Furthermore, the idea of the Bayesian sequential design is introduced and the Bayesian two-stage D-optimal design approach is developed for generalized linear models. With the incorporation of the first stage data information into the second stage, the two-stage design procedure can improve the design efficiency and produce more accurate and robust designs. The Bayesian two-stage D-optimal designs for Poisson and logistic regression models are evaluated based on simulation studies. The Bayesian two-stage optimal design approach is superior to the one-stage approach in terms of a design efficiency criterion.
- Broadly Enabling KLEE to Effortlessly Find Unrecoverable Errors in Rust
  Zhang, Ying; Li, Peng; Ding, Yu; Wang, Lingxiang; Williams, Dan; Meng, Na (ACM, 2024)
  Rust is a general-purpose programming language designed for performance and safety. Unrecoverable errors (e.g., Divide by Zero) in Rust programs are critical, as they signal bad program states and terminate programs abruptly. Previous work has contributed to utilizing KLEE, a dynamic symbolic test engine, to verify the program would not panic. However, it is difficult for engineers who lack domain expertise to write test code correctly. Besides, the effectiveness of KLEE in finding panics in production Rust code has not been evaluated. We created an approach, called PanicCheck, to hide the complexity of verifying Rust programs with KLEE. Using PanicCheck, engineers only need to annotate the function-to-verify with #[panic_check]. The annotation guides PanicCheck to generate test code, compile the function together with tests, and execute KLEE for verification. After applying PanicCheck to 21 open-source and 2 closed-source projects, we found 61 test inputs that triggered panics; 59 of the 61 panics have been addressed by developers so far. Our research shows promising verification results by KLEE, while revealing technical challenges in using KLEE. Our experience will shed light on future practice and research in program verification.
- FARCI: Fast and Robust Connectome Inference
  Meamardoost, Saber; Bhattacharya, Mahasweta; Hwang, Eun Jung; Komiyama, Takaki; Mewes, Claudia; Wang, Linbing; Zhang, Ying; Gunawan, Rudiyanto (MDPI, 2021-11-24)
  The inference of neuronal connectome from large-scale neuronal activity recordings, such as two-photon Calcium imaging, represents an active area of research in computational neuroscience. In this work, we developed FARCI (Fast and Robust Connectome Inference), a MATLAB package for neuronal connectome inference from high-dimensional two-photon Calcium fluorescence data. We employed partial correlations as a measure of the functional association strength between pairs of neurons to reconstruct a neuronal connectome. We demonstrated using in silico datasets from the Neural Connectomics Challenge (NCC) and those generated using the state-of-the-art simulator of Neural Anatomy and Optimal Microscopy (NAOMi) that FARCI provides an accurate connectome and its performance is robust to network sizes, missing neurons, and noise levels. Moreover, FARCI is computationally efficient and highly scalable to large networks. In comparison with the best performing connectome inference algorithm in the NCC, Generalized Transfer Entropy (GTE), and Fluorescence Single Neuron and Network Analysis Package (FluoroSNNAP), FARCI produces more accurate networks over different network sizes, while providing significantly better computational speed and scaling.
- Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection
  Li, Yuxi; Liu, Yi; Deng, Gelei; Zhang, Ying; Song, Wenjia; Shi, Ling; Wang, Kailong; Li, Yuekang; Liu, Yang; Wang, Haoyu (ACM, 2024-07-12)
  With the expanding application of Large Language Models (LLMs) in various domains, it becomes imperative to comprehensively investigate their unforeseen behaviors and consequent outcomes. In this study, we introduce and systematically explore the phenomenon of “glitch tokens”, which are anomalous tokens produced by established tokenizers and could potentially compromise the models’ quality of response. Specifically, we experiment on seven top popular LLMs utilizing three distinct tokenizers and involving a total of 182,517 tokens. We present categorizations of the identified glitch tokens and symptoms exhibited by LLMs when interacting with glitch tokens. Based on our observation that glitch tokens tend to cluster in the embedding space, we propose GlitchHunter, a novel iterative clustering-based technique, for efficient glitch token detection. The evaluation shows that our approach notably outperforms three baseline methods on eight open-source LLMs. To the best of our knowledge, we present the first comprehensive study on glitch tokens. Our new detection further provides valuable insights into mitigating tokenization-related errors in LLMs.
- A Hitchhiker's Guide to Jailbreaking ChatGPT via Prompt Engineering
  Liu, Yi; Deng, Gelei; Xu, Zhengzi; Li, Yuekang; Zheng, Yaowen; Zhang, Ying; Zhao, Lida; Zhang, Tianwei; Wang, Kailong (ACM, 2024-07-15)
  Natural language prompts serve as an essential interface between users and Large Language Models (LLMs) like GPT-3.5 and GPT-4, which are employed by ChatGPT to produce outputs across various tasks. However, prompts crafted with malicious intent, known as jailbreak prompts, can circumvent the restrictions of LLMs, posing a significant threat to systems integrated with these models. Despite their critical importance, there is a lack of systematic analysis and comprehensive understanding of jailbreak prompts. Our paper aims to address this gap by exploring key research questions to enhance the robustness of LLM systems: 1) What common patterns are present in jailbreak prompts? 2) How effectively can these prompts bypass the restrictions of LLMs? 3) With the evolution of LLMs, how does the effectiveness of jailbreak prompts change? To address our research questions, we embarked on an empirical study targeting the LLMs underpinning ChatGPT, one of today’s most advanced chatbots. Our methodology involved categorizing 78 jailbreak prompts into 10 distinct patterns, further organized into three jailbreak strategy types, and examining their distribution. We assessed the effectiveness of these prompts on GPT-3.5 and GPT-4, using a set of 3,120 questions across 8 scenarios deemed prohibited by OpenAI. Additionally, our study tracked the performance of these prompts over a 3-month period, observing the evolutionary response of ChatGPT to such inputs. Our findings offer a comprehensive view of jailbreak prompts, elucidating their taxonomy, effectiveness, and temporal dynamics. Notably, we discovered that GPT-3.5 and GPT-4 could still generate inappropriate content in response to malicious prompts without the need for jailbreaking. This underscores the critical need for effective prompt management within LLM systems and provides valuable insights and data to spur further research in LLM testing and jailbreak prevention.
- Protein Arginine Deiminase 2 Binds Calcium in an Ordered Fashion: Implications for Inhibitor Design
  Slade, Daniel J.; Fang, Pengfei; Dreyton, Christina J.; Zhang, Ying; Fuhrmann, Jakob; Rempel, Don; Bax, Benjamin D.; Coonrod, Scott A.; Lewis, Huw D.; Guo, Min; Gross, Michael L.; Thompson, Paul R. (American Chemical Society, 2015-04-01)
  Protein arginine deiminases (PADs) are calcium-dependent histone-modifying enzymes whose activity is dysregulated in inflammatory diseases and cancer. PAD2 functions as an Estrogen Receptor (ER) coactivator in breast cancer cells via the citrullination of histone tail arginine residues at ER binding sites. Although an attractive therapeutic target, the mechanisms that regulate PAD2 activity are largely unknown, especially the detailed role of how calcium facilitates enzyme activation. To gain insights into these regulatory processes, we determined the first structures of PAD2 (27 in total), and through calcium-titrations by X-ray crystallography, determined the order of binding and affinity for the six calcium ions that bind and activate this enzyme. These structures also identified several PAD2 regulatory elements, including a calcium switch that controls proper positioning of the catalytic cysteine residue, and a novel active site shielding mechanism. Additional biochemical and mass-spectrometry-based hydrogen/deuterium exchange studies support these structural findings. The identification of multiple intermediate calcium-bound structures along the PAD2 activation pathway provides critical insights that will aid the development of allosteric inhibitors targeting the PADs.
- Recent Progress in Lyme Disease and Remaining Challenges
  Bobe, Jason R.; Jutras, Brandon L.; Horn, Elizabeth J.; Embers, Monica E.; Bailey, Allison; Moritz, Robert L.; Zhang, Ying; Soloski, Mark J.; Ostfeld, Richard S.; Marconi, Richard T.; Aucott, John; Ma'ayan, Avi; Keesing, Felicia; Lewis, Kim; Ben Mamoun, Choukri; Rebman, Alison W.; McClune, Mecaila E.; Breitschwerdt, Edward B.; Reddy, Panga Jaipal; Maggi, Ricardo; Yang, Frank; Nemser, Bennett; Ozcan, Aydogan; Garner, Omai; Di Carlo, Dino; Ballard, Zachary; Joung, Hyou-Arm; Garcia-Romeu, Albert; Griffiths, Roland R.; Baumgarth, Nicole; Fallon, Brian A. (Frontiers, 2021-08-18)
  Lyme disease (also known as Lyme borreliosis) is the most common vector-borne disease in the United States with an estimated 476,000 cases per year. While historically, the long-term impact of Lyme disease on patients has been controversial, mounting evidence supports the idea that a substantial number of patients experience persistent symptoms following treatment. The research community has largely lacked the necessary funding to properly advance the scientific and clinical understanding of the disease, or to develop and evaluate innovative approaches for prevention, diagnosis, and treatment. Given the many outstanding questions raised into the diagnosis, clinical presentation and treatment of Lyme disease, and the underlying molecular mechanisms that trigger persistent disease, there is an urgent need for more support. This review article summarizes progress over the past 5 years in our understanding of Lyme and tick-borne diseases in the United States and highlights remaining challenges.
- Secure Coding Practice in Java: Automatic Detection, Repair, and Vulnerability Demonstration
  Zhang, Ying (Virginia Tech, 2023-10-12)
  The Java platform and third-party open-source libraries provide various Application Programming Interfaces (APIs) to facilitate secure coding. However, using these APIs securely is challenging for developers who lack cybersecurity training. Prior studies show that many developers use APIs insecurely, thereby introducing vulnerabilities in their software. Despite the availability of various tools designed to identify API insecure usage, their effectiveness in helping developers with secure coding practices remains unclear. This dissertation focuses on two main objectives: (1) exploring the strengths and weaknesses of the existing automated detection tools for API-related vulnerabilities, and (2) creating better tools that detect, repair, and demonstrate these vulnerabilities. Our research started with investigating the effectiveness of current tools in helping with developers' secure coding practices. We systematically explored the strengths and weaknesses of existing automated tools for detecting API-related vulnerabilities. Through comprehensive analysis, we observed that most existing tools merely report misuses, without suggesting any customized fixes. Moreover, developers often rejected tool-generated vulnerability reports due to their concerns on the correctness of detection, and the exploitability of the reported issues. To address these limitations, the second work proposed SEADER, an example-based approach to detect and repair security-API misuses. Given an exemplar ⟨insecure, secure⟩ code pair, SEADER compares the snippets to infer any API-misuse template and corresponding fixing edit. Based on the inferred information, given a program, SEADER performs inter-procedural static analysis to search for security-API misuses and to propose customized fixes. The third work leverages ChatGPT-4.0 to automatically generate security test cases. These test cases can demonstrate how vulnerable API usage facilitates supply chain attacks on specific software applications. By running such test cases during software development and maintenance, developers can gain more relevant information about exposed vulnerabilities, and may better create secure-by-design and secure-by-default software.
- Unsupervised discovery of solid-state lithium ion conductors
  Zhang, Ying; He, Xingfeng; Chen, Zhiqian; Bai, Qiang; Nolan, Adelaide M.; Roberts, Charles A.; Banerjee, Debasish; Matsunaga, Tomoya; Mo, Yifei; Ling, Chen (2019-11-20)
  Although machine learning has gained great interest in the discovery of functional materials, the advancement of reliable models is impeded by the scarcity of available materials property data. Here we propose and demonstrate a distinctive approach for materials discovery using unsupervised learning, which does not require labeled data and thus alleviates the data scarcity challenge. Using solid-state Li-ion conductors as a model problem, unsupervised materials discovery utilizes a limited quantity of conductivity data to prioritize a candidate list from a wide range of Li-containing materials for further accurate screening. Our unsupervised learning scheme discovers 16 new fast Li-conductors with conductivities of 10⁻⁴ to 10⁻¹ S cm⁻¹ predicted in ab initio molecular dynamics simulations. These compounds have structures and chemistries distinct from known systems, demonstrating the capability of unsupervised learning for discovering materials over a wide materials space with limited property data.