Browsing by Author "Butler, Patrick Julian Carey"
- Information Extraction of Technical Details From Scholarly Articles
  Kaushal, Kulendra Kumar (Virginia Tech, 2021-06-16)
  Researchers have made significant progress in the last few years in information extraction from short documents, including social media interactions, news articles, and email excerpts. This research aims to extract technical entities such as hardware resources, computing platforms, compute time, programming languages, and libraries from scholarly research articles. Research articles are generally long documents containing both salient and non-salient entities. Analyzing cross-sectional relations, filtering the relevant information, measuring the saliency of mentioned entities, and extracting novel entities are some of the technical challenges involved in this research. This work presents a detailed study of the performance, effectiveness, and scalability of rule-based weakly supervised algorithms. We also develop an automated end-to-end Research Entity and Relationship Extractor (E2R Extractor). Additionally, we perform a comprehensive study of the effectiveness of existing deep learning-based information extraction tools such as Dygie, Dygie++, and SciREX. The research also contributes a dataset containing novel entities annotated in BILUO format and presents baseline results using the E2R Extractor on the proposed dataset. The results indicate that the E2R Extractor successfully extracts salient entities from research articles.
- Knowledge Discovery in Intelligence Analysis
  Butler, Patrick Julian Carey (Virginia Tech, 2014-06-03)
  Intelligence analysts today face many challenges, chief among them the need to fuse disparate streams of data and to arrive rapidly at analytical decisions and quantitative predictions for use by policy makers. These problems are further exacerbated by the sheer volume of data available to intelligence analysts. Machine learning methods enable the automated transduction of such large datasets from raw feeds to actionable knowledge, but successful use of such methods requires integrated frameworks for contextualizing them within the work processes of the analyst. Intelligence analysts typically distinguish between three classes of problems: collections, analysis, and operations. This dissertation focuses on two problems in analysis: i) the reconstruction of shredded documents using a visual analytic framework that combines computer vision techniques and user input, and ii) the design and implementation of a system for event forecasting that allows an analyst not just to consume forecasts of significant societal events but also to understand the rationale behind these alerts and to use data ablation techniques to determine the strength of conclusions. This work does not attempt to replace the analyst with machine learning; instead, it outlines several methods for augmenting the analyst with machine learning. In doing so, this dissertation also explores the responsibilities of an analyst in evaluating complex models and the decisions made by these models. Finally, this dissertation defines a list of responsibilities for models designed to aid the analyst's work in evaluating and verifying the models.