Browsing by Author "Sharma, Dhruv"
Now showing 1 - 3 of 3
Results Per Page
Sort Options
- CS4984/CS5984: Big Data Text Summarization Team 17 ETDsKhaghani, Farnaz; Marin Thomas, Ashin; Patnayak, Chinmaya; Sharma, Dhruv; Aromando, John (Virginia Tech, 2018-12-15)Given the current explosion of information over various media such as electronic and physical texts, concise and relevant data has become key to the understanding of things. Summarization, which essentially is the process of reducing the text to convey only the salient aspects, has emerged as a challenging task in the field of Natural Language Processing. In a scientific construct, academia has been generating voluminous amounts of data in the form of theses and dissertations. Obtaining the chapter-wise summary of an electronic thesis or dissertation can be a computationally expensive task, particularly because of its length and the subject to which it pertains to. Through this course, research and development of various summarization techniques, primarily extractive and abstractive summarization, were analyzed. There have been various developments in the field of deep learning to tackle problems related to summarization and produce coherent and meaningful summaries for news articles. In this project, tools that could be used to generate coherent and concise summaries of long electronic theses and dissertations (ETDs) were developed as well. The major concern initially was to get the text from a PDF file of an ETD. GROBID and Scienceparse were used as pre-processing tools to carry out this task and presented the text from a PDF in a structured format such as XML or JSON file. The outputs from each of the tools were compared qualitatively as well as quantitatively. After this, a transfer learning approach was adopted, wherein a pre-trained model was tweaked to fit to the task of summarizing each ETD. This came in as a challenge to make the model learn the nuances of an ETD. An iterative approach was used to explore various networks, each trying to improve the shortcomings of the previous one in its novel way. Existing deep learning models including Sequence-2-Sequence, Pointer Generator Networks, and A Hybrid Extractive-Abstractive Reinforce-Selecting Sentence Rewriting Network, were used to generate and test summaries. Further tweaks were made to these deep neural networks to account for much longer and varied datasets as compared to what they were inherently designed to work for -- in this case ETDs. A thorough evaluation of these generated summaries was also done with respect to golden standards for five dissertations and theses created during the span of the course. ROUGE-1, ROUGE-2, and ROUGE-SU4 were used to compare the generated summaries with the golden standards. The average ROUGE scores were 0.1387, 0.1224, and 0.0480 respectively. These low ROUGE scores could be attributed to the varying summary length, and also to the complexity of the task of summarizing an ETD. The scope of improvements and the underlying reasons for the performance have also been analyzed. The conclusion that can be drawn from the project is that any machine learning task is highly biased by what pattern is inherently present in the data on which it is being trained. In the context of summarization, there can be a different perspective from which an article can be summarized, and thus the quantitative evaluation measures can vary drastically even after the summary is a coherent one.
- MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domainSharma, Dhruv; Purushotham, Sanjay; Reddy, Chandan K. (Nature Portfolio, 2021-10-06)Medical images are difficult to comprehend for a person without expertise. The scarcity of medical practitioners across the globe often face the issue of physical and mental fatigue due to the high number of cases, inducing human errors during the diagnosis. In such scenarios, having an additional opinion can be helpful in boosting the confidence of the decision maker. Thus, it becomes crucial to have a reliable visual question answering (VQA) system to provide a ‘second opinion’ on medical cases. However, most of the VQA systems that work today cater to real-world problems and are not specifically tailored for handling medical images. Moreover, the VQA system for medical images needs to consider a limited amount of training data available in this domain. In this paper, we develop MedFuseNet, an attention-based multimodal deep learning model, for VQA on medical images taking the associated challenges into account. Our MedFuseNet aims at maximizing the learning with minimal complexity by breaking the problem statement into simpler tasks and predicting the answer. We tackle two types of answer prediction—categorization and generation. We conducted an extensive set of quantitative and qualitative analyses to evaluate the performance of MedFuseNet. Our experiments demonstrate that MedFuseNet outperforms the state-of-the-art VQA methods, and that visualization of the captured attentions showcases the intepretability of our model’s predicted results.
- Visual Question Answering in the Medical DomainSharma, Dhruv (Virginia Tech, 2020-07-21)Medical images are extremely complicated to comprehend for a person without expertise. The limited number of practitioners across the globe often face the issue of fatigue due to the high number of cases. This fatigue, physical and mental, can induce human-errors during the diagnosis. In such scenarios, having an additional opinion can be helpful in boosting the confidence of the decision-maker. Thus, it becomes crucial to have a reliable Visual Question Answering (VQA) system which can provide a "second opinion" on medical cases. However, most of the VQA systems that work today cater to real-world problems and are not specifically tailored for handling medical images. Moreover, the VQA system for medical images needs to consider a limited amount of training data available in this domain. In this thesis, we develop a deep learning-based model for VQA on medical images taking the associated challenges into account. Our MedFuseNet system aims at maximizing the learning with minimal complexity by breaking the problem statement into simpler tasks and weaving everything together to predict the answer. We tackle two types of answer prediction - categorization and generation. We conduct an extensive set of both quantitative and qualitative analyses to evaluate the performance of MedFuseNet. Our results conclude that MedFuseNet outperforms other state-of-the-art methods available in the literature for these tasks.