Scholarly Works, Sanghani Center for Artificial Intelligence and Data Analytics

Permanent URI for this collection

https://hdl.handle.net/10919/111717

Vision-Language Models for Biomedical Applications
Thapa, Surendrabikram; Naseem, Usman; Zhou, Luping; Kim, Jinman (ACM, 2024-10-28)
Vision-language models (VLMs) are transforming the landscape of biomedical research and healthcare by enabling the seamless integration and interpretation of complex multimodal data, including medical images and clinical texts. Recognizing the growing impact of these models, the first international workshop on Vision-Language Models for Biomedicine (VLM4Bio) was held in conjunction with ACM Multimedia 2024. The workshop aimed to address the critical need for advanced techniques that can leverage VLMs in applications such as medical imaging, diagnostics, and personalized treatment. As healthcare data increasingly involves both visual and textual information, VLM4Bio provided a platform for interdisciplinary collaboration between experts in natural language processing, computer vision, biomedical engineering, and AI ethics. This paper provides an overview of the inaugural edition of the VLM4Bio workshop, summarizing the key discussions, contributions, and future directions for expanding the workshop’s scope and influence in subsequent editions.
Semi-Supervised Code Translation Overcoming the Scarcity of Parallel Code Data
Zhu, Ming; Karim, Mohimenul; Lourentzou, Ismini; Yao, Daphne (ACM, 2024-10-27)
Neural code translation is the task of converting source code from one programming language to another. One of the main challenges is the scarcity of parallel code data, which hinders the ability of translation models to learn accurate cross-language alignments. In this paper, we introduce MIRACLE, a semi-supervised approach that improves code translation through synthesizing high-quality parallel code data and curriculum learning on code data with ascending alignment levels. MIRACLE leverages static analysis and compilation to generate synthetic parallel code datasets with enhanced quality and alignment to address the challenge of data scarcity. We evaluate the proposed method along with strong baselines including instruction-tuned Large Language Models (LLMs) for code. Our analysis reveals that LLMs pre-trained on open-source code data, regardless of their size, suffer from the “shallow translation” problem. This issue arises when translated code copies keywords, statements, and even code blocks from the source language, leading to compilation and runtime errors. Extensive experiments demonstrate that our method significantly mitigates this issue, enhancing code translation performance across multiple models in C++, Java, Python, and C. Remarkably, MIRACLE outperforms code LLMs that are ten times larger in size. MIRACLE also achieves up to a 43% improvement in C code translation with fewer than 150 annotated examples.
RUHate-MM: Identification of Hate Speech and Targets using Multimodal Data from Russia-Ukraine Crisis
Thapa, Surendrabikram; Jafri, Farhan; Rauniyar, Kritesh; Nasim, Mehwish; Naseem, Usman (ACM, 2024-05-13)
During the conflict between Ukraine and Russia, hate speech targeted toward specific groups was widespread on different social media platforms. With most social platforms allowing multimodal content, the use of multimodal content to express hate speech is widespread on the Internet. Although there has been considerable research in detecting hate speech within unimodal content, the investigation into multimodal content remains insufficient. The limited availability of annotated multimodal datasets further restricts our ability to explore new methods to interpret and identify hate speech and its targets. The availability of annotated datasets for hate speech detection during political events, such as invasions, is even more limited. To fill this gap, we introduce a comprehensive multimodal dataset consisting of 20,675 posts related to the Russia-Ukraine crisis, which were manually annotated as either ‘Hate Speech’ or ‘No Hate Speech’. Additionally, we categorize the hate speech data into three targets: ‘Individual’, ‘Organization’, and ‘Community’. Our benchmarked evaluations show that there is still room for improvement in accurately identifying hate speech and its targets. We hope that the availability of this dataset and the evaluations performed on it will encourage the development of new methods for identifying hate speech and its targets during political events like invasions and wars. The dataset and resources are made available at https://github.com/Farhan-jafri/Russia-Ukraine.
Data analysis and modeling pipelines for controlled networked social science experiments
Cedeno-Mieles, Vanessa; Hu, Zhihao; Ren, Yihui; Deng, Xinwei; Contractor, Noshir; Ekanayake, Saliya; Epstein, Joshua M.; Goode, Brian J.; Korkmaz, Gizem; Kuhlman, Christopher J.; Machi, Dustin; Macy, Michael; Marathe, Madhav V.; Ramakrishnan, Naren; Saraf, Parang; Self, Nathan (PLOS, 2020-11-24)
There is great interest in networked social science experiments for understanding human behavior at scale. Significant effort is required to perform data analytics on experimental outputs and for computational modeling of custom experiments. Moreover, experiments and modeling are often performed in a cycle, enabling iterative experimental refinement and data modeling to uncover interesting insights and to generate/refute hypotheses about social behaviors. The current practice for social analysts is to develop tailor-made computer programs and analytical scripts for experiments and modeling. This often leads to inefficiencies and duplication of effort. In this work, we propose a pipeline framework to take a significant step towards overcoming these challenges. Our contribution is to describe the design and implementation of a software system to automate many of the steps involved in analyzing social science experimental data, building models to capture the behavior of human subjects, and providing data to test hypotheses. The proposed pipeline framework consists of formal models, formal algorithms, and theoretical models as the basis for the design and implementation. We propose a formal data model, such that if an experiment can be described in terms of this model, then our pipeline software can be used to analyze data efficiently. The merits of the proposed pipeline framework are elaborated through several case studies of networked social science experiments.
Collaborative efforts to forecast seasonal influenza in the United States, 2015–2016
McGowan, Craig J.; Biggerstaff, Matthew; Johansson, Michael; Apfeldorf, Karyn M.; Ben-Nun, Michal; Brooks, Logan; Convertino, Matteo; Erraguntla, Madhav; Farrow, David C.; Freeze, John; Ghosh, Saurav; Hyun, Sangwon; Kandula, Sasikiran; Lega, Joceline; Liu, Yang; Michaud, Nicholas; Morita, Haruka; Niemi, Jarad; Ramakrishnan, Naren; Ray, Evan L.; Reich, Nicholas G.; Riley, Pete; Shaman, Jeffrey; Tibshirani, Ryan; Vespignani, Alessandro; Zhang, Qian; Reed, Carrie; Rosenfeld, Roni; Ulloa, Nehemias; Will, Katie; Turtle, James; Bacon, David; Riley, Steven; Yang, Wan; The Influenza Forecasting Working Group (Nature Publishing Group, 2019-01-24)
Since 2013, the Centers for Disease Control and Prevention (CDC) has hosted an annual influenza season forecasting challenge. The 2015–2016 challenge consisted of weekly probabilistic forecasts of multiple targets, including fourteen models submitted by eleven teams. Forecast skill was evaluated using a modified logarithmic score. We averaged submitted forecasts into a mean ensemble model and compared them against predictions based on historical trends. Forecast skill was highest for seasonal peak intensity and short-term forecasts, while forecast skill for timing of season onset and peak week was generally low. Higher forecast skill was associated with team participation in previous influenza forecasting challenges and utilization of ensemble forecasting techniques. The mean ensemble consistently performed well and outperformed historical trend predictions. CDC and contributing teams will continue to advance influenza forecasting and work to improve the accuracy and reliability of forecasts to facilitate increased incorporation into public health response efforts.
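The two ideas named in this abstract, a mean ensemble and a logarithmic score, can be illustrated with a minimal sketch. It assumes each forecast is a list of probabilities over the same outcome bins, and it uses the plain logarithmic score rather than the modified variant used in the challenge, whose details are not given here. The function names, the probability floor, and the example numbers are hypothetical.

```python
import math


def mean_ensemble(forecasts):
    """Average several probabilistic forecasts bin by bin.

    Each forecast is a list of probabilities over the same bins.
    """
    n_bins = len(forecasts[0])
    if any(len(f) != n_bins for f in forecasts):
        raise ValueError("all forecasts must use the same bins")
    return [sum(f[i] for f in forecasts) / len(forecasts) for i in range(n_bins)]


def log_score(forecast, observed_bin, floor=1e-10):
    """Plain logarithmic score: log of the probability given to the observed bin.

    The floor (an assumption of this sketch) avoids log(0) when a forecast
    assigns zero probability to the outcome that occurred.
    """
    return math.log(max(forecast[observed_bin], floor))


# Hypothetical forecasts from three models over four outcome bins.
model_a = [0.10, 0.60, 0.20, 0.10]
model_b = [0.25, 0.25, 0.25, 0.25]
model_c = [0.00, 0.30, 0.50, 0.20]

ensemble = mean_ensemble([model_a, model_b, model_c])
observed_bin = 1  # hypothetical index of the bin that actually occurred

for name, forecast in [("a", model_a), ("b", model_b), ("c", model_c), ("ensemble", ensemble)]:
    print(name, round(log_score(forecast, observed_bin), 3))
```

Scores closer to zero are better. Averaging keeps the ensemble a valid probability distribution and tends to protect it from the very low scores a single overconfident model receives when it misses.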
What to know before forecasting the flu
Chakraborty, Prithwish; Lewis, Bryan L.; Eubank, Stephen; Brownstein, John S.; Marathe, Madhav V.; Ramakrishnan, Naren (PLOS, 2018-10-12)
Accurate and timely influenza (flu) forecasting has gained significant traction in recent times. If done well, such forecasting can aid in deploying effective public health measures. Unlike other statistical or machine learning problems, however, flu forecasting brings unique challenges and considerations stemming from the nature of the surveillance apparatus and the end utility of forecasts. This article presents a set of considerations for flu forecasters to take into account prior to applying forecasting algorithms.