Browsing by Author "Naleshwarkar, Kanad"
- PaperPalooza: A comprehensive research support tool
  Bhagwat, Devashree; Naleshwarkar, Kanad; Shailly, Ritish; Murali, Vivek; Bhujbal, Sanket (2024-05-03)
  Researchers use multiple tools daily: they search for new papers in their field, save papers they wish to cite, and check their writing for grammatical errors. Some researchers and advisors also want to keep tabs on their ongoing projects, or simply want a tool that quickly summarizes a paper or a long piece of text so they can skim it without spending hours. PaperPalooza is a tool we developed that integrates solutions to all of these requirements into a single application that can be the one-stop shop for any researcher.
- Team 4: Language Models, Classification and Summarization
  Naleshwarkar, Kanad; Bhatambarekar, Gayatri; Desai, Zeel; Kumaran, Aishwarya; Haque, Shadab; Srinivasan Manikandan, Adithya Harish (Virginia Tech, 2023-12-17)
  The CS5604 class at Virginia Tech has been tasked with developing an information retrieval and analysis system that can handle a collection of at least 500,000 Electronic Theses and Dissertations (ETDs), under the direction of Dr. Edward A. Fox. The system should function as a search engine with a variety of capabilities, including browsing, searching, giving suggestions, and ranking search results. The class was split into six teams to execute this job, and each team was given a specific task. The goal of this report is to provide an overview of Team 4's contribution, which focuses on classification, summarization, and language models. Our prime tasks were testing various models for classification and summarization. Over the course of this project, we evaluated models developed by the previous team working on this task and explored various strategies to improve them. For the classification task, we fine-tuned the SciBERT model to produce standardized subject category labels in accordance with ProQuest. We also evaluated a large language model, LLaMA 2, for the classification task; after comparing its performance with the fine-tuned SciBERT model, we observed that LLaMA 2 was not efficient enough for the large-scale system the class was working on. For summarization, we evaluated summaries generated by various transformer, non-transformer, and LLM-based models. The five models we evaluated for summarization were TextRank, LexRank, LSA, BigBirdPegasus, and LLaMA 2 7B. We observed that although TextRank and BigBirdPegasus had comparable results, the summaries generated by TextRank were more comprehensive.
This experimentation gave us valuable insight into the complexities of processing a large set of documents and performing tasks such as classification and summarization. Additionally, it allowed us to explore the deployment of these models in a production environment to evaluate their performance at scale.
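  The TextRank approach that Team 4 found most comprehensive is an extractive method: sentences are ranked by a PageRank-style score over a sentence-similarity graph, and the top-ranked sentences are kept in document order. The report does not include the team's code; the following is a minimal stdlib-only sketch of the idea (the similarity measure and parameters here are illustrative assumptions, not the team's actual implementation).

  ```python
  import re
  from collections import Counter

  def textrank_summarize(text, n_sentences=2, damping=0.85, iters=50):
      """Extractive summary: score sentences with PageRank-style power
      iteration over a word-overlap similarity graph, then return the
      top n sentences in their original order."""
      sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
      if len(sentences) <= n_sentences:
          return sentences
      bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
      n = len(sentences)
      # Edge weight: shared-word count, length-normalized (an illustrative choice).
      sim = [[0.0] * n for _ in range(n)]
      for i in range(n):
          for j in range(n):
              if i != j:
                  shared = sum((bags[i] & bags[j]).values())
                  denom = (sum(bags[i].values()) * sum(bags[j].values())) ** 0.5
                  sim[i][j] = shared / denom if denom else 0.0
      # Power iteration: each sentence passes its score along weighted edges.
      scores = [1.0 / n] * n
      for _ in range(iters):
          new = []
          for i in range(n):
              rank = 0.0
              for j in range(n):
                  out = sum(sim[j])
                  if sim[j][i] and out:
                      rank += scores[j] * sim[j][i] / out
              new.append((1 - damping) / n + damping * rank)
          scores = new
      # Keep the n_sentences highest-scoring sentences, in document order.
      top = sorted(sorted(range(n), key=lambda i: -scores[i])[:n_sentences])
      return [sentences[i] for i in top]
  ```

  In practice, libraries such as sumy provide ready-made TextRank (and LexRank/LSA) summarizers, but the sketch shows why the method favors sentences that share vocabulary with many others, which tends to yield the comprehensive summaries the report describes.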