Identifying Drug Related Events from Social Media

Abstract

The overall goal of the project was to establish an innovative information system that can automatically detect and extract content related to drug side effects from user reviews, determine whether the reviews discuss effectiveness or adverse drug events, extract keywords or phrases related to effectiveness or adverse drug events, and visualize the resulting information for doctors and patients. Our group was provided with crawled Twitter reviews and social network forum reviews on drugs used to treat diabetes. The raw data were manually labeled with four different labels for named entity recognition in order to create training, testing, and validation sets. Using the training set, a side effect dictionary was created with PamTAT. The dictionary was then refined by removing neutral words to increase accuracy. To validate the generated side effect dictionary, the results of side effect analysis based on it were compared with results based on two general negative word dictionaries. The generated dictionary performed better in recognizing side effect entities. After validation, it was further tested on a set of user reviews of a drug used to treat stroke. Using the generated dictionary, the project was able to determine whether a review mentions a side effect of a specific drug, detecting such mentions with greater than 90% accuracy. The resulting algorithm can be used to build an information system that detects and extracts side effect content for other drugs, provided a problem-specific dictionary is created. The project should be developed further to incorporate automatic extraction of user reviews, analysis of the data, and visualization of the results.
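
As a rough illustration of the dictionary-based detection and accuracy check described in the abstract, the following Python sketch flags reviews that contain any term from a side effect dictionary and scores the predictions against manual labels. The terms, reviews, labels, and function names are all hypothetical placeholders chosen for this example; the project's actual dictionary was generated with PamTAT and is not reproduced here, and the project's real matching procedure may differ.

```python
import re

# Hypothetical side effect dictionary (placeholder terms, not the PamTAT output).
SIDE_EFFECT_TERMS = {"nausea", "dizziness", "weight gain", "headache"}


def mentions_side_effect(review, terms=SIDE_EFFECT_TERMS):
    """Return True if the review contains any dictionary term as a whole word or phrase."""
    text = review.lower()
    return any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms)


def accuracy(reviews, labels, terms=SIDE_EFFECT_TERMS):
    """Fraction of reviews whose predicted label matches the manual label."""
    if not labels or len(reviews) != len(labels):
        raise ValueError("reviews and labels must be non-empty and of equal length")
    correct = sum(mentions_side_effect(r, terms) == y for r, y in zip(reviews, labels))
    return correct / len(labels)


# Made-up reviews with made-up manual labels (True = mentions a side effect).
reviews = [
    "Started metformin last week and the nausea is awful",
    "This drug brought my blood sugar down quickly",
    "Lots of weight gain since I switched medications",
    "Picked up my prescription today",
]
labels = [True, False, True, False]

print(accuracy(reviews, labels))
```

Swapping `SIDE_EFFECT_TERMS` for a general negative word list and comparing the two accuracy values mirrors, in miniature, the dictionary comparison the abstract describes.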

Keywords

Machine learning, PamTAT, multimedia, hypertext, web crawling, confusion matrix, smoke list, statistical learning, statistical model
