Browsing by Author "Mehta, Sneha"
- Topic Analysis project in CS5604, Spring 2016: Extracting Topics from Tweets and Webpages for IDEAL
  Mehta, Sneha; Vinayagam, Radha Krishnan (2016-05-04)
  The IDEAL (Integrated Digital Event Archiving and Library) project aims to ingest tweets and web-based content from social media and the web and index it for retrieval. One of the required milestones for CS5604, a graduate-level course on Information Storage and Retrieval, is to implement a state-of-the-art information retrieval and analysis system in support of the IDEAL project. The overall objective of this project is to build a robust information retrieval system on top of Solr, a general-purpose open-source search engine. To enable search and retrieval, we use various approaches, including Latent Dirichlet Allocation, Named-Entity Recognition, Clustering, Classification, Social Network Analysis, and a front-end search interface. The project has been divided into segments, and our team has been assigned Topic Analysis. A topic in this context is a set of words that can be used to represent a document. The output of our team will be a well-defined set of topics that describe each document in our collections. The topics will facilitate facet-based search in the front-end search interface. This submission includes the project report, final presentation, LDA code, test datasets, and results. In the project report, we introduce the relevant background, the design and implementation, and the requirements to make our part functional. The developer's manual describes our approach in detail. Walk-through tutorials for related software packages are included in the user's manual. Finally, we provide exhaustive results and detailed evaluation methodologies for topic quality.
- Towards Explainable Event Detection and Extraction
  Mehta, Sneha (Virginia Tech, 2021-07-22)
  Event extraction refers to extracting specific knowledge of incidents from natural language text and consolidating it into a structured form. Important applications of event extraction include search, retrieval, question answering, and event forecasting. However, before events can be extracted, they must be detected: that is, we must identify which documents in a large collection contain events of interest and, from those, extract the sentences that might contain event-related information. This task is challenging because it is easier to obtain labels at the document level than fine-grained annotations at the sentence level. Current approaches for this task are suboptimal because they directly aggregate sentence probabilities estimated by a classifier to obtain document probabilities, resulting in error propagation. To alleviate this problem, we propose to leverage recent advances in representation learning by using attention mechanisms. Specifically, for event detection we propose a method that computes document embeddings from sentence embeddings using attention and trains a document classifier on those embeddings to mitigate the error propagation problem. However, we find that existing attention mechanisms are ill-suited to this task because they are either suboptimal or use a large number of parameters. To address this problem, we propose a lean attention mechanism that is effective for event detection. Current approaches for event extraction rely on fine-grained labels in specific domains. Extending extraction to new domains is challenging because of the difficulty of collecting fine-grained data. Machine reading comprehension (MRC) based approaches, which enable zero-shot extraction, struggle with syntactically complex sentences and long-range dependencies. To mitigate this problem, we propose a syntactic sentence simplification approach that is guided by an MRC model to improve its performance on event extraction.