Browsing by Author "Zhao, Yan"
Now showing 1 - 6 of 6
- Collection Management Tobacco Settlement Documents (CMT) CS5604 Fall 2019
  Muhundan, Sushmethaa; Bendelac, Alon; Zhao, Yan; Svetovidov, Andrei; Biswas, Debasmita; Marin Thomas, Ashin (Virginia Tech, 2019-12-11)
  Consumption of tobacco causes health issues, both mental and physical. Despite this widely known fact, tobacco companies have sustained their huge presence in the market over the past century owing to a variety of successful marketing strategies. This report documents the work of the Collection Management Tobacco Settlement Documents (CMT) team, the data ingestion team for the tobacco documents. We deal with an archive of tobacco documents that were produced during litigation between the United States and seven major tobacco industry organizations. Our aim is to process these documents and assist Dr. David M. Townsend, an assistant professor at Virginia Polytechnic Institute and State University (Virginia Tech) Pamplin College of Business, in his research towards understanding the marketing strategies of the tobacco companies. The team is part of a larger initiative: to build a state-of-the-art information retrieval and analysis system. We handle over 14 million tobacco settlement documents as part of this project. Our tasks include extracting the data as well as metadata from these documents. We cater to the needs of the Elasticsearch (ELS) team and the Text Analytics and Machine Learning (TML) team. We provide tobacco settlement data in suitable formats to enable them to process and feed the data into the information retrieval system. We have successfully processed both the metadata and the document texts into a usable format. For metadata, this involved collaborating with the above-mentioned teams to come up with a suitable format. We retrieved the metadata from a MySQL database and converted it into JSON for Elasticsearch ingestion. For the data, this involved lemmatization, tokenization, and text cleaning.
  We have supplied the entire dataset to the ELS and TML teams. Data, as well as metadata of these documents, were cleaned and provided. Python scripts were used to query the database and output the results in the required format. We also closely interacted with Dr. Townsend to understand his research needs in order to guide the Front-end and Kibana (FEK) team in terms of insights about features that can be used for visualizations. This way, the information retrieval system we build will add more value for our client.
- Genotyping Points to Divergent Evolution of 'Candidatus Phytoplasma asteris' Strains Causing North American Grapevine Yellows and Strains Causing Aster Yellows
  Davis, Robert E.; Dally, Ellen L.; Zhao, Yan; Wolf, Tony K. (2018-09)
  Grapevine yellows diseases occur in cultivated grapevine (Vitis vinifera L.) on several continents, where the diseases are known by different names depending upon the identities of the causal phytoplasmas. In this study, phytoplasma strains associated with grapevine yellows disease (North American grapevine yellows [NAGY]) in vineyards of Pennsylvania were characterized as belonging to 16S ribosomal RNA (rRNA) gene restriction fragment length polymorphism group 16SrI (aster yellows phytoplasma group), subgroup 16SrI-B (I-B), and variant subgroup I-B*. The strains (NAGY1 strains) were subjected to genotyping based on analyses of 16S rRNA and secY genes, and to in silico three-dimensional modeling of the SecY protein. Although the NAGY1 strains are closely related to aster yellows (AY) phytoplasma strains and are classified like AY strains in subgroup I-B or in variant subgroup I-B*, the results from genotyping and protein modeling may signal ongoing evolutionary divergence of NAGY1 strains from related strains in subgroup 16SrI-B.
- How Do Developers Reuse StackOverflow Answers in Their GitHub Projects?
  Chen, Juntong; Zhao, Yan; Meng, Na (ACM, 2024-10-27)
  StackOverflow (SO) is a widely used question-and-answer (Q&A) website for software developers and computer scientists. GitHub is an online development platform used for storing, tracking, and collaborating on software projects. Prior work relates the information mined from both platforms without carefully inspecting the answer-reuse practices. For this paper, we conducted an empirical study by mining the SO answers reused by Java projects available on GitHub. We created a hybrid approach of clone detection, keyword-based search, and manual inspection to identify the answer(s) actually used by developers. Based on those answers, we studied topics of the discussion threads, answer characteristics (e.g., scores, ages, code lengths, and text lengths), and developers’ reuse practices. We observed that most reused answers offer programs to implement specific coding tasks. Among all analyzed SO discussion threads, the reused answers often have higher scores, older ages, longer code, and longer text than unused answers. In only 9% of scenarios (40/430), developers fully copied answer code for reuse. In the remaining scenarios, they reused partial code or created brand new code from scratch. Our study characterized 130 SO discussion threads referred to by Java developers in 357 GitHub projects. Our observations can guide SO answerers to provide better answers, and shed light on future human-centric research that creates better tools to help with code reuse.
- A Lightweight Approach of Human-Like Playtest for Android Apps
  Zhao, Yan (Virginia Tech, 2022-02-01)
  Testing is recognized as a key and challenging factor that can either boost or halt game development in the mobile game industry. On one hand, manual testing is expensive and time-consuming; in particular, the wide spectrum of device hardware and software, so-called fragmentation, significantly increases the cost of testing applications on devices manually. On the other hand, automated testing is also very difficult because games pose more inherent technical issues than other mobile applications, such as non-native widgets, nondeterminism, and complex game strategies. Current testing frameworks (e.g., Android Monkey, Record and Replay) are limited because they adopt no domain knowledge to test games. Learning-based tools (e.g., Wuji) require tremendous resources and manual effort to train a model before testing any game. The high cost of manual testing and the lack of efficient testing tools for mobile games motivated the work presented in this thesis, which aims to explore easy approaches to test mobile games efficiently and effectively. A new Android mobile game testing tool, called LIT, has been developed. LIT is a lightweight approach to generalize playtest tactics from manual testing, and to adopt the tactics for automatic game testing. LIT has two phases: tactic generalization and tactic concretization. In Phase I, when a human tester plays an Android game G for a while (e.g., eight minutes), LIT records the tester's inputs and related scenes. Based on the collected data, LIT infers a set of context-aware, abstract playtest tactics that describe under what circumstances what actions can be taken. In Phase II, LIT tests G based on the generalized tactics.
  Namely, given a randomly generated game scene, LIT tentatively matches that scene with the abstract context of any inferred tactic; if the match succeeds, LIT customizes the tactic to generate an action for playtest. Our evaluation with nine games shows LIT to outperform two state-of-the-art tools and a reinforcement learning (RL)-based tool, by covering more code and triggering more errors. This implies that LIT complements existing tools and helps developers better test certain games (e.g., match3).
- Method to correct the distortion caused by amplified stimulated emission as motivated by LIF-based flow diagnostics
  Li, Xuesong; Zhao, Yan; Ma, Lin (Optical Society of America, 2012-04-01)
  Amplified stimulated emission (ASE) represents a significant issue in two-photon laser-induced fluorescence (TPLIF). The ASE effects are nonlinear and nonlocal, i.e., the ASE effects distort the LIF signal nonlinearly, and the distortion at one location depends on conditions at other locations. In this sense, the ASE effects pose a greater challenge to quantitative TPLIF than quenching and ionization. This work therefore seeks a method to correct such distortion. The method uses two LIF measurements, one with low signal-to-noise ratio (SNR) and negligible ASE distortion and another with high SNR but significant distortion, to generate a faithful measurement with high SNR. Extensive simulations were performed to evaluate the performance of this method for practical applications. © 2012 Optical Society of America. OCIS codes: 300.2530, 300.6420, 120.1740.
- Simultaneous measurements of multiple flow parameters for scramjet characterization using tunable diode-laser sensors
  Li, Fei; Yu, XiLong; Gu, Hongbin; Li, Zhi; Zhao, Yan; Ma, Lin; Chen, Lihong; Chang, Xinyu (Optical Society of America, 2011-12-01)
  This paper reports the simultaneous measurements of multiple flow parameters in a scramjet facility operating at a nominal Mach number of 2.5 using a sensing system based on tunable diode-laser absorption spectroscopy (TDLAS). The TDLAS system measures velocity, temperature, and water vapor partial pressure at three different locations of the scramjet: the inlet, the combustion region near the flame stabilization cavity, and the exit of the combustor. These measurements enable the determination of the variation of the Mach number and the combustion mode in the scramjet engine, which are critical for evaluating the combustion efficiency and optimizing engine performance. The results obtained in this work clearly demonstrated the applicability of TDLAS sensors in harsh and high-speed environments. The TDLAS system, due to its unique virtues, is expected to play an important role in the development of scramjet engines. © 2011 Optical Society of America