Browsing by Author "Wang, Haitao"
- CS5604: Information and Storage Retrieval Fall 2017 - FE (Front-End Team) Chon, Jieun; Wang, Haitao; Bian, Yali; Niu, Shuo (Virginia Tech, 2017-12-24) Social media and Web data are becoming important sources of information for researchers to monitor and study global events. GETAR, led by Dr. Edward Fox, is a project aiming to collect, organize, browse, visualize, study, analyze, summarize, and explore content and sources related to biodiversity, climate change, crises, disasters, elections, energy policy, environmental policy/planning, geospatial information, green engineering, human rights, inequality, migrations, nuclear power, population growth, resiliency, shootings, sustainability, violence, etc. The report introduces the work of the Front End (FE) team in analyzing users' requirements and building user interfaces for people to explore tweet/webpage data. The work of the FE team relies heavily on the results from other teams. Our duties include presenting the collected tweets/webpages, visualizing the clusters and topics, showing the indexed and clustered search results, and, last but not least, allowing users to perform customized queries and exploration. Therefore, the team needs to consider how other teams collect and manage the data, as well as how people use the information to gain insights from the data repository. Throughout Fall 2017, our team aimed to bridge the data archive and users’ needs, focusing on providing various user interfaces for tweet/webpage exploration and analysis. Overall, two main user interfaces were designed and implemented during the semester: (1) a visualization-based analytical tool for people to create categories by searching and interacting with filtering tools, which are presented in visualizations such as a bar chart, tag cloud, and node-link graph; and (2) a geo-based interface for location-based information, implemented with GeoBlacklight, enabling users to view tweets/webpages on maps.
This report documents the background, plans, schedule, design, implementation, software installation, and other related useful information. We used Solr and a triple-store to provide data, and the "getar-cs5604f17-final_shard1_replica1" collection was used in the final testing and delivery. Both an overview of the team's work and the detailed design and implementation are provided. We highlight the visualization-based interface and the location-based interface, as they provide visual tools for people to better understand the data collected by all the teams. We seek to provide information on how we extracted users' requirements, how user needs are reflected in light of the related literature, and how that led to the design of the visualization and geo-interface. A detailed installation manual is also included to help other software engineers who continue working on GETAR reuse our work.
- Hybrid Summarization of Dakota Access Pipeline Protests (NoDAPL) Chen, Xiaoyu; Wang, Haitao; Mehrotra, Maanav; Chhikara, Naman; Sun, Di (Virginia Tech, 2018-12-14) Dakota Access Pipeline Protests (known by the hashtag #NoDAPL) are grassroots movements that began in April 2016 in reaction to the approved construction of Energy Transfer Partners’ Dakota Access Pipeline in the northern United States. The NoDAPL movements have produced many Facebook messages, tweets, blogs, and news articles, which reflect different aspects of the NoDAPL events. The related information keeps increasing rapidly, which makes it difficult to understand the events efficiently. Therefore, it is invaluable to automatically, or at least semi-automatically, generate short summaries based on the big data available online. Motivated by this need, the objective of this project is to propose a novel automatic summarization approach to efficiently and effectively summarize the topics hidden in big online text data. Although automatic summarization has been investigated for more than 60 years since the publication of Luhn’s 1958 seminal paper, several challenges exist in summarizing big online text sets, such as a large proportion of noisy text, highly redundant information, and multiple latent topics. Therefore, we propose an automatic framework that requires minimal human effort to summarize big online text sets (~11,000 documents on NoDAPL) according to latent topics, with irrelevant information removed. This framework provides a hybrid model that combines the advantages of latent Dirichlet allocation (LDA) based extractive methods and deep-learning based abstractive methods. Unlike semi-automatic summarization approaches such as template-based summarization, the proposed method does not require practitioners to have a deep understanding of the events in order to create a template or to fill it in using regular expressions.
During the procedure, the only human effort needed is to manually label a few (say, 100) documents as relevant or irrelevant. We evaluate the quality of the generated automatic summary with both extrinsic and intrinsic measures. In the extrinsic subjective evaluation, we design a set of guideline questions and conduct a task-based measurement. Results show that 91.3% of sentences are within the scope of the guideline, and 69.6% of the outlined questions can be answered by reading the generated summary. The intrinsic ROUGE measurements show that our total entity coverage is 2.6% and that the ROUGE-L and ROUGE-SU4 scores are 0.148 and 0.065, respectively. Overall, the proposed hybrid model achieves decent performance on summarizing NoDAPL events. Future work includes testing the approach with more textual datasets on topics of interest, and investigating a topic modeling-supervised classification approach to minimize human effort in automatic summarization. We would also like to investigate a deep learning-based recommender system for better sentence re-ranking.
- Monitoring vegetation dynamics in Zhongwei, an arid city of Northwest China Wang, Haitao (Virginia Tech, 2014-06-10) This case study used Zhongwei City in northwest China to quantify the urbanization and revegetation processes (1990-2011) through a unified sub-pixel measure of vegetation cover. Research strategies included: (1) conduct sub-pixel vegetation mapping (1990, 1996, 2004, and 2011) with the Random Forest (RF) algorithm by integrating high (OrbView-3) and medium (Landsat TM) spatial resolution data; (2) examine the simple Dark Object Subtraction (DOS) atmospheric correction method to support temporal generalization of the sub-pixel mapping algorithm; and (3) characterize patterns of vegetation cover dynamics based on change detection analysis. We found the RF algorithm, combined with simple DOS, showed good generalization capability for sub-pixel vegetation mapping. Predicted sub-pixel vegetation proportions were consistent for "pseudo-invariant" pixels. Vegetation change analysis suggested persistent urban development within the city boundary, accompanied by a continuous expansion of revegetated area at the city fringe. Urban development occurred in both the suburban and urban core areas, and was mainly shaped by transportation networks. A transition in revegetation practices was documented: the large-scale governmental revegetation programs were replaced by commercial afforestation conducted by industries. This study showed a slight increase in vegetation cover over the time period, balanced by losses to urban expansion, and a likely severe degradation of vegetation cover due to conversion of arable land to desert vegetation. The loss of arable land and the growth of artificial desert vegetation have yielded a dynamic equilibrium in terms of overall vegetation cover from 1990 to 2011, but in the long run vegetation quality is certainly reduced.
- Spatial-Temporal Pattern of Agricultural Total Factor Productivity Change (Tfpch) in China and Its Implications for Agricultural Sustainable Development Zhang, Haonan; Chen, Zheng; Wang, Jieyong; Wang, Haitao; Zhang, Yingwen (MDPI, 2023-03-21) With increasing tension between humans and land, and rising pressure on food security in China, the improvement of total factor productivity is important to realize agricultural modernization and promote the rural revitalization strategy. In this study, we applied the DEA-Malmquist index method to measure the growth of China’s agricultural total factor productivity and its decomposition indexes at the prefecture-level city scale from 2011 to 2020. We found the average annual growth rate of agricultural total factor productivity was 4.5% during this period, with technical change being the driving factor and technical efficiency change being the suppressing factor. The Dagum Gini coefficient first decreased and then increased. Cold and hot spot areas of agricultural Tfpch were clearly formed. During the decade, the gravity center of agricultural Tfpch generally migrated from the northeast to the southwest. Based on the characteristics of agricultural Tfpch, China is classified into four zones. In the future, the Chinese government should balance government and market mechanisms, improve the agricultural science and technology innovation system and the technology adoption promotion system, and implement classified policies to improve agricultural production efficiency.
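For readers unfamiliar with the decomposition mentioned in the last abstract, the following is a sketch of the Malmquist index in its commonly used textbook form. It is not taken from the paper and may differ from the authors' exact specification. The notation is an assumption made here for illustration: $D^{t}(\cdot)$ is the distance function measured against the period-$t$ frontier, and $(x^{t}, y^{t})$ are the inputs and outputs of a decision-making unit in period $t$.

```latex
\begin{align}
\mathrm{Tfpch}
  &= \left[
       \frac{D^{t}(x^{t+1}, y^{t+1})}{D^{t}(x^{t}, y^{t})}
       \cdot
       \frac{D^{t+1}(x^{t+1}, y^{t+1})}{D^{t+1}(x^{t}, y^{t})}
     \right]^{1/2} \\
  &= \underbrace{\frac{D^{t+1}(x^{t+1}, y^{t+1})}{D^{t}(x^{t}, y^{t})}}_{\text{technical efficiency change}}
     \times
     \underbrace{\left[
       \frac{D^{t}(x^{t+1}, y^{t+1})}{D^{t+1}(x^{t+1}, y^{t+1})}
       \cdot
       \frac{D^{t}(x^{t}, y^{t})}{D^{t+1}(x^{t}, y^{t})}
     \right]^{1/2}}_{\text{technical change}}
\end{align}
```

Under this form, a value above 1 indicates growth and a value below 1 indicates decline, so the abstract's finding would correspond to a technical change factor above 1 partly offset by a technical efficiency change factor below 1.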