Browsing by Author "Hu, Yang"
Now showing 1 - 3 of 3
Results Per Page
Sort Options
- CS 5604: Information Storage and Retrieval - Webpages (WP) TeamBarry-Straume, Jostein; Vives, Cristian; Fan, Wentao; Tan, Peng; Zhang, Shuaicheng; Hu, Yang; Wilson, Tishauna (Virginia Tech, 2020-12-18)The first major goal of this project is to build a state-of-the-art information retrieval engine for searching webpages and for opening up access to existing and new webpage collections resulting from Digital Library Research Laboratory (DLRL) projects relating to eventsarchive.org. The task of the Webpage (WP) team was to provide the functionality of making any archived webpage accessible and indexed. The webpages can be obtained either through event focused crawlers or collections of data, such as WARC files containing webpages, or sets of tweets which contains embedded URLs. Toward completion of the project, the WP team worked on four major tasks: 1.) Contents of WARC files searchable through ElasticSearch. 2.) Contents of WARC files cleaned and searchable through ElasticSearch. 3.) Event focused crawler running and producing WARC files. 4.) Additional extracted/derived information (e.g., dates, classes) made searchable. The foundation of the software is a Docker container cluster employing Airflow, a Reasoner, and Kubernetes. The raw data of the information content of the given webpage collections is stored using the Network File System (NFS), while Ceph is used for persistent storage for the Docker containers. Retrieval, analysis, and visualization of the webpage collection is carried out with ElasticSearch and Kibana, respectively. These two technologies form an Elastic Stack application which serves as the vehicle with which the WP team indexes, maps, and stores the processed data and model outputs with regards to webpage collections. The software is co-designed by 7 team members of Virginia Tech graduate students, all members of the same computer science class, CS 5604: Information Storage and Retrieval. The course is taught by Professor Edward A. Fox. Dr. Fox structures the class in a way for his students to perform in a “mock” business development setting. In other words, the academic project submitted by the WP team for all intents and purposes can be viewed as a microcosm of software development within a corporate structure. This submission focuses on the work of the WP team, which creates and administers Docker containers such that various services are tested and deployed in whole. Said services pertain solely to the ingestion, cleansing, analysis, extraction, classification, and indexing of webpages and their respective content.
- FLARE: Defending Federated Learning against Model Poisoning Attacks via Latent Space RepresentationsWang, Ning; Xiao, Yang; Chen, Yimin; Hu, Yang; Lou, Wenjing; Hou, Y. Thomas (ACM, 2022-05-30)Federated learning (FL) has been shown vulnerable to a new class of adversarial attacks, known as model poisoning attacks (MPA), where one or more malicious clients try to poison the global model by sending carefully crafted local model updates to the central parameter server. Existing defenses that have been fixated on analyzing model parameters show limited effectiveness in detecting such carefully crafted poisonous models. In this work, we propose FLARE, a robust model aggregation mechanism for FL, which is resilient against state-of-the-art MPAs. Instead of solely depending on model parameters, FLARE leverages the penultimate layer representations (PLRs) of the model for characterizing the adversarial influence on each local model update. PLRs demonstrate a better capability to differentiate malicious models from benign ones than model parameter-based solutions. We further propose a trust evaluation method that estimates a trust score for each model update based on pairwise PLR discrepancies among all model updates. Under the assumption that honest clients make up the majority, FLARE assigns a trust score to each model update in a way that those far from the benign cluster are assigned low scores. FLARE then aggregates the model updates weighted by their trust scores and finally updates the global model. Extensive experimental results demonstrate the effectiveness of FLARE in defending FL against various MPAs, including semantic backdoor attacks, trojan backdoor attacks, and untargeted attacks, and safeguarding the accuracy of FL.
- On Transferability of Adversarial Examples on Machine-Learning-Based Malware ClassifiersHu, Yang (Virginia Tech, 2022-05-12)The use of Machine Learning for malware detection is essential to counter the massive growth in malware types compared with the traditional signature-based detection system. However, machine learning models could also be extremely vulnerable and sensible to transferable adversarial example (AE) attacks. The transfer AE attack does not require extra information from the victim model such as gradient information. Researchers explore mainly 2 lines of transfer-based adversarial example attacks: ensemble models and ensemble samples. \\ Although comprehensive innovations and progress have been achieved in transfer AE attacks, few works have investigated how these techniques perform in malware data. Besides, generating adversarial examples on an android APK file is not as easy and convenient as it is on image data since the generated AE of malware should also remain its functionality and executability after perturbation. Therefore, it is urgent to validate whether previous methodologies could still have their effect on malware considering the differences compared to image data. \\ In this thesis, we first have a thorough literature review for the AE attacks on malware data and general transfer AE attacks. Then we design our algorithm for the transfer AE attack. We formulate the optimization problem based on the intuition that the contribution evenness of features towards the final prediction result is highly correlated to the AE transferability. We then solve the optimization problem by gradient descent and evaluate it through extensive experiments. Implementing and experimenting with the state-of-the-art AE algorithms and transferability enhancement techniques, we analyze and summarize the weaknesses and strengths of each method.