Information Extraction of Technical Details From Scholarly Articles

Kaushal, Kulendra Kumar

dc.contributor.author	Kaushal, Kulendra Kumar	en
dc.contributor.committeechair	Ramakrishnan, Narendran	en
dc.contributor.committeemember	Butler, Patrick Julian Carey	en
dc.contributor.committeemember	Lu, Chang Tien	en
dc.contributor.department	Computer Science	en
dc.date.accessioned	2022-12-09T07:00:25Z	en
dc.date.available	2022-12-09T07:00:25Z	en
dc.date.issued	2021-06-16	en
dc.description.abstract	In the last few years, researchers have made significant progress in information extraction from short documents, including social media interactions, news articles, and email excerpts. This research aims to extract technical entities such as hardware resources, computing platforms, compute time, programming languages, and libraries from scholarly research articles. Research articles are generally long documents that contain both salient and non-salient entities. Analyzing cross-sectional relations, filtering the relevant information, measuring the saliency of mentioned entities, and extracting novel entities are some of the technical challenges involved in this research. This work presents a detailed study of the performance, effectiveness, and scalability of rule-based weakly supervised algorithms. We also develop an automated end-to-end Research Entity and Relationship Extractor (E2R Extractor). Additionally, we perform a comprehensive study of the effectiveness of existing deep learning-based information extraction tools such as DyGIE, DyGIE++, and SciREX. The research also contributes a dataset containing novel entities annotated in BILUO format and presents baseline results obtained with the E2R Extractor on the proposed dataset. The results indicate that the E2R Extractor successfully extracts salient entities from research articles.	en
dc.description.abstractgeneral	Information extraction is the process of automatically extracting meaningful information from unstructured text, such as articles and news feeds, and presenting it in a structured format. Researchers have made significant progress in this domain over the past few years. However, their work primarily focuses on short documents such as social media interactions, news articles, and email excerpts, not on long documents such as scholarly articles and research papers. Long documents contain a lot of redundant data, so filtering and extracting meaningful information is quite challenging. This work focuses on extracting entities such as hardware resources, compute platforms, and programming languages used in scholarly articles. We present a deep learning-based model to extract such entities from research articles and research papers. We evaluate the performance of our deep learning model against simple rule-based algorithms and other state-of-the-art models for extracting the desired entities. Our work also contributes a labeled dataset containing the entities mentioned above, together with the results obtained on this dataset using our deep learning model.	en
dc.description.degree	Master of Science	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:31609	en
dc.identifier.uri	http://hdl.handle.net/10919/112825	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Information Extraction	en
dc.subject	Long Documents	en
dc.subject	Research Articles	en
dc.subject	Named Entity Recognition	en
dc.subject	Hardware Resources	en
dc.subject	Compute Platform	en
dc.subject	Programming Language and Libraries	en
dc.title	Information Extraction of Technical Details From Scholarly Articles	en
dc.type	Thesis	en
thesis.degree.discipline	Computer Science and Applications	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	masters	en
thesis.degree.name	Master of Science	en

Files

Original bundle

Name: Kaushal_K_T_2021.pdf
Size: 2.1 MB
Format: Adobe Portable Document Format

Collections

Masters Theses