Browsing by Author "Han, Yi"
- Big Data Text Summarization for the NeverAgain Movement
Arora, Anuj; Miller, Chreston; Fan, Jixiang; Liu, Shuai; Han, Yi (Virginia Tech, 2018-12-10)
When you are browsing social media websites such as Twitter and Facebook, have you ever seen hashtags like #NeverAgain and #EnoughIsEnough? Do you know what they mean? Never Again is an American student-led political movement for gun control to prevent gun violence. In the United States, gun control has long been debated. According to data from the Gun Violence Archive (http://www.shootingtracker.com/), the U.S. saw a total of 346 mass shootings in 2017. Supporters of gun control claim that the proliferation of firearms directly sparks social problems such as robbery, sexual crimes, and theft, while others believe gun culture represents an integral part of their freedom. For the Never Again Gun Control Movement, we would like to generate a human-readable summary based on deep learning methods so that one can study incidents of gun violence that shocked the world, such as the 2017 Las Vegas shooting, in order to understand the impact of gun proliferation. Our project includes three steps: pre-processing, topic modeling, and abstractive summarization using deep learning. We began with a large collection of news articles associated with the #NeverAgain movement. The raw news articles needed to be pre-processed in multiple ways. An ArchiveSpark script was used to convert the WARC and CDX files to readable, parseable JSON. However, we found that at least forty percent of the data was noise. A series of restrictive word filters was applied to remove the noise. After noise removal, we identified the most frequent words to get a preliminary idea of whether we were filtering noise properly. We used the Natural Language Toolkit’s (NLTK) Named Entity chunker to generate named entities, which are phrases that form important nouns (people, places, organizations, etc.) in a sentence.
For topic modeling, we classified sentences into different buckets, or topics, which identified distinct themes in the collection. The Latent Dirichlet allocation (LDA) algorithm used for topic modeling does not take the normalized and tokenized word corpus directly, so during dictionary creation and document vectorization each article in the collection had to be converted into a vector. We chose the Bag of Words (BOW) approach, a simplifying representation used in natural language processing and information retrieval. In this model, text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity. Topic modeling also requires choosing the number of topics in advance, which means one must guess how many topics are present in a collection. There is no foolproof way of replacing human logic to weave keywords into topics with semantic meaning. To address this, we tried the coherence score approach. The coherence score is an attempt to mimic the human readability of a topic: the higher the coherence score, the more “coherent” the topics are considered. The last step of topic modeling is LDA itself, a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. Because LDA is probabilistic, it is better than some other algorithms at handling documents that mix several topics. In addition, LDA identifies topics coherently, whereas the topics from other algorithms are more disjoint. After we had our topics (three in total), we filtered the article collection based on these topics. The result was three distinct collections of articles, to each of which we could apply an abstractive summarization algorithm to produce a coherent summary.
We chose a Pointer-Generator Network (PGN), a deep learning approach designed to create abstractive summaries, to produce these summaries. We created a summary for each identified topic and performed post-processing to connect the three related topics into a single summary that flowed. The result reflected the main themes of the article collection and informed the reader of its contents in fewer than two pages.
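The bag-of-words representation described in the abstract above can be illustrated with a minimal sketch. This is not the project's code: the function name `bag_of_words` and the simple regular-expression tokenizer are assumptions made for illustration, and the example uses only the Python standard library rather than the NLTK tooling the authors mention.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Return the multiset (bag) of lowercase word tokens in `text`.

    Hypothetical helper for illustration only. Grammar and word order
    are discarded; only how often each word occurs is kept.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


doc_a = "Gun control has long been debated"
doc_b = "debated long has been control gun"

# The two documents differ only in word order, so their bags are equal.
print(bag_of_words(doc_a) == bag_of_words(doc_b))

# Multiplicity is preserved: "never" occurs twice in this text.
print(bag_of_words("never again, never again")["never"])
```

A topic model such as LDA would then consume one such count vector per article, typically after mapping each word to an integer ID in a shared dictionary.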
- Brown marmorated stink bug, Halyomorpha halys (Stål), genome: putative underpinnings of polyphagy, insecticide resistance potential and biology of a top worldwide pest
Sparks, Michael E.; Bansal, Raman; Benoit, Joshua B.; Blackburn, Michael B.; Chao, Hsu; Chen, Mengyao; Cheng, Sammy; Childers, Christopher; Dinh, Huyen; Doddapaneni, Harsha V.; Dugan, Shannon; Elpidina, Elena N.; Farrow, David W.; Friedrich, Markus; Gibbs, Richard A.; Hall, Brantley; Han, Yi; Hardy, Richard W.; Holmes, Christopher J.; Hughes, Daniel S. T.; Ioannidis, Panagiotis; Cheatle Jarvela, Alys M.; Johnston, J. Spencer; Jones, Jeffery W.; Kronmiller, Brent A.; Kung, Faith; Lee, Sandra L.; Martynov, Alexander G.; Masterson, Patrick; Maumus, Florian; Munoz-Torres, Monica; Murali, Shwetha C.; Murphy, Terence D.; Muzny, Donna M.; Nelson, David R.; Oppert, Brenda; Panfilio, Kristen A.; Paula, Débora P.; Pick, Leslie; Poelchau, Monica F.; Qu, Jiaxin; Reding, Katie; Rhoades, Joshua H.; Rhodes, Adelaide; Richards, Stephen; Richter, Rose; Robertson, Hugh M.; Rosendale, Andrew J.; Tu, Zhijian Jake; Velamuri, Arun S.; Waterhouse, Robert M.; Weirauch, Matthew T.; Wells, Jackson T.; Werren, John H.; Worley, Kim C.; Zdobnov, Evgeny M.; Gundersen-Rindal, Dawn E. (2020-03-14)
Background: Halyomorpha halys (Stål), the brown marmorated stink bug, is a highly invasive insect species due in part to its exceptionally high levels of polyphagy. This species is also a nuisance due to overwintering in human-made structures. It has caused significant agricultural losses in recent years along the Atlantic seaboard of North America and in continental Europe. Genomic resources will assist with determining the molecular basis for this species’ feeding and habitat traits, defining potential targets for pest management strategies.
Results: Analysis of the 1.15-Gb draft genome assembly has identified a wide variety of genetic elements underpinning the biological characteristics of this formidable pest species, encompassing the roles of sensory functions, digestion, immunity, detoxification and development, all of which likely support H. halys’ capacity for invasiveness. Many of the genes identified herein have potential for biomolecular pesticide applications.
Conclusions: Availability of the H. halys genome sequence will be useful for the development of environmentally friendly biomolecular pesticides to be applied in concert with more traditional, synthetic chemical-based controls.
- Microstructure Representation and Prediction via Convolutional Neural Network-Based Texture Representation and Synthesis, Towards Process Structure Linkage
Han, Yi (Virginia Tech, 2021-05-19)
Metal additive manufacturing (AM) provides a platform for microstructure optimization via process control. The ability to model the evolution of microstructures from changes in processing conditions, or even to predict the microstructures from given processing conditions, would greatly reduce the time frame and cost of the optimization process. In the first part, we present a deep learning framework to quantitatively analyze the microstructural variations of metals fabricated by AM under different processing conditions. We also demonstrate the capability of predicting new microstructures from the representation with deep learning, and we explore the physical insights of the implicitly expressed microstructure representations. We validate our framework using samples fabricated by a solid-state AM technology, additive friction stir deposition, which typically results in equiaxed microstructures. In the second part, we further improve and generalize the generating framework. A set of metrics is used to quantitatively analyze the effectiveness of the generation by comparing the microstructure characteristics of the generated samples with those of the originals. We also take advantage of image processing techniques to aid the calculation of metrics that require grain segmentation.
- Unique features of a global human ectoparasite identified through sequencing of the bed bug genome
Benoit, Joshua B.; Adelman, Zach N.; Reinhardt, Klaus; Dolan, Amanda M.; Poelchau, Monica; Jennings, Emily C.; Szuter, Elise M.; Hagan, Richard W.; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M.; Nelson, David R.; Rosendale, Andrew J.; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M.; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R.; Ioannidis, Panagiotis; Waterhouse, Robert M.; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J. Spencer; Gondhalekar, Ameya D.; Scharf, Michael E.; Peterson, Brittany F.; Raje, Kapil R.; Hottel, Benjamin A.; Armisen, David; Crumiere, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Severine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S. T.; Duncan, Elizabeth J.; Murali, Shwetha C.; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L.; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C.; Muzny, Donna M.; Wheeler, David; Panfilio, Kristen A.; Jentzsch, Iris M. Vargas; Vargo, Edward L.; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T.; Anderson, Michelle A. E.; Jones, Jeffery W.; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D.; Attardo, Geoffrey M.; Robertson, Hugh M.; Zdobnov, Evgeny M.; Ribeiro, Jose M. C.; Gibbs, Richard A.; Werren, John H.; Palli, Subba R.; Schal, Coby; Richards, Stephen (Nature, 2016-02-02)
The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host–symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human–bed bug and symbiont–bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite.