Browsing by Author "Lee, Sunshin"
Now showing 1 - 3 of 3
Results Per Page
Sort Options
- Analysis of Moving Events Using TweetsPatil, Supritha Basavaraj (Virginia Tech, 2019-07-02)The Digital Library Research Laboratory (DLRL) has collected over 3.5 billion tweets on different events for the Coordinated, Behaviorally-Aware Recovery for Transportation and Power Disruptions (CBAR-tpd), the Integrated Digital Event Archiving and Library (IDEAL), and the Global Event Trend Archive Research (GETAR) projects. The tweet collection topics include heart attack, solar eclipse, terrorism, etc. There are several collections on naturally occurring events such as hurricanes, floods, and solar eclipses. Such naturally occurring events are distributed across space and time. It would be beneficial to researchers if we can perform a spatial-temporal analysis to test some hypotheses, and to find any trends that tweets would reveal for such events. I apply an existing algorithm to detect locations from tweets by modifying it to work better with the type of datasets I work with. I use the time captured in tweets and also identify the tense of the sentences in tweets to perform the temporal analysis. I build a rule-based model for obtaining the tense of a tweet. The results from these two algorithms are merged to analyze naturally occurring moving events such as solar eclipses and hurricanes. Using the spatial-temporal information from tweets, I study if tweets can be a relevant source of information in understanding the movement of the event. I create visualizations to compare the actual path of the event with the information extracted by my algorithms. After examining the results from the analysis, I noted that Twitter can be a reliable source to identify places affected by moving events almost immediately. The locations obtained are at a more detailed level than in news-wires. We can also identify the time that an event affected a particular region by date.
- FishTraits version 2: integrating ecological, biogeographic and bibliographic informationXie, Zhiwu; Frimpong, Emmanuel A.; Lee, Sunshin (ACM, 2013-07-22)In this paper we describe the new development of FishTraits. Originating from an ecological database that documents and consolidates more than 100 traits for 809 fish species, the new version focuses on the integration of these traits data with the bibliographic and biogeographic information. We explain the overall design as well as the implementation details.
- Geo-Locating Tweets with Latent Location InformationLee, Sunshin (Virginia Tech, 2017-02-13)As part of our work on the NSF funded Integrated Digital Event Archiving and Library (IDEAL) project and the Global Event and Trend Archive Research (GETAR) project, we collected over 1.4 billion tweets using over 1,000 keywords, key phrases, mentions, or hashtags, starting from 2009. Since many tweets talk about events (with useful location information), such as natural disasters, emergencies, and accidents, it is important to geo-locate those tweets whenever possible. Due to possible location ambiguity, finding a tweet's location often is challenging. Many distinct places have the same geoname, e.g., "Greenville" matches 50 different locations in the U.S.A. Frequently, in tweets, explicit location information, like geonames mentioned, is insufficient, because tweets are often brief and incomplete. They have a small fraction of the full location information of an event due to the 140 character limitation. Location indicative words (LIWs) may include latent location information, for example, "Water main break near White House" does not have any geonames but it is related to a location "1600 Pennsylvania Ave NW, Washington, DC 20500 USA" indicated by the key phrase 'White House'. To disambiguate tweet locations, we first extracted geospatial named entities (geonames) and predicted implicit state (e.g., Virginia or California) information from entities using machine learning algorithms including Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest (RF). Implicit state information helps reduce ambiguity. We also studied how location information of events is expressed in tweets and how latent location indicative information can help to geo-locate tweets. We then used a machine learning (ML) approach to predict the implicit state using geonames and LIWs. We conducted experiments with tweets (e.g., about potholes), and found significant improvement in disambiguating tweet locations using a ML algorithm along with the Stanford NER. Adding state information predicted by our classifiers increased the possibility to find the state-level geo-location unambiguously by up to 80%. We also studied over 6 million tweets (3 mid-size and 2 big-size collections about water main breaks, sinkholes, potholes, car crashes, and car accidents), covering 17 months. We found that up to 91.1% of tweets have at least one type of location information (geo-coordinates or geonames), or LIWs. We also demonstrated that in most cases adding LIWs helps geo-locate tweets with less ambiguity using a geo-coding API. Finally, we conducted additional experiments with the five different tweet collections, and found significant improvement in disambiguating tweet locations using a ML approach with geonames and all LIWs that are present in tweet texts as features.