A Classifier to Detect Informational vs. Non-Informational Heart Attack Tweets

Karajeh, Ola; Darweesh, Dirar; Darwish, Omar; Abu-El-Rub, Noor; Alsinglawi, Belal; Alsaedi, Nasser

dc.contributor.author	Karajeh, Ola	en
dc.contributor.author	Darweesh, Dirar	en
dc.contributor.author	Darwish, Omar	en
dc.contributor.author	Abu-El-Rub, Noor	en
dc.contributor.author	Alsinglawi, Belal	en
dc.contributor.author	Alsaedi, Nasser	en
dc.contributor.department	Computer Science	en
dc.date.accessioned	2021-01-22T18:04:21Z	en
dc.date.available	2021-01-22T18:04:21Z	en
dc.date.issued	2021-01-16	en
dc.date.updated	2021-01-22T15:47:02Z	en
dc.description.abstract	Social media sites are considered one of the most important sources of data in many fields, such as health, education, and politics. While surveys provide explicit answers to specific questions, social media posts contain the same answers implicitly in their text. This research aims to develop a method for extracting implicit answers from large tweet collections, and to demonstrate this method for an important concern: the problem of heart attacks. The approach is to collect tweets containing “heart attack” and then select from those the ones with useful information. Informational tweets are those which express real heart attack issues, e.g., “Yesterday morning, my grandfather had a heart attack while he was walking around the garden.” On the other hand, there are non-informational tweets such as “Dropped my iPhone for the first time and almost had a heart attack.” The starting point was to manually classify around 7000 tweets as either informational (11%) or non-informational (89%), thus yielding a labeled dataset to use in devising a machine learning classifier that can be applied to our large collection of over 20 million tweets. Tweets were cleaned and converted to a vector representation suitable to be fed into different machine-learning algorithms: deep neural networks (DNN), support vector machine (SVM), J48 decision tree, and naïve Bayes. Our experimentation aimed to find the best algorithm to use to build a high-quality classifier. This involved splitting the labeled dataset, with 2/3 used to train the classifier and 1/3 used for evaluation, in addition to cross-validation. The DNN classifier obtained the highest accuracy (95.2%). It also obtained the highest F1-scores: 73.6% for the informational class and 97.4% for the non-informational class.	en
dc.description.version	Published version	en
dc.format.mimetype	application/pdf	en
dc.identifier.citation	Karajeh, O.; Darweesh, D.; Darwish, O.; Abu-El-Rub, N.; Alsinglawi, B.; Alsaedi, N. A Classifier to Detect Informational vs. Non-Informational Heart Attack Tweets. Future Internet 2021, 13, 19.	en
dc.identifier.doi	https://doi.org/10.3390/fi13010019	en
dc.identifier.uri	http://hdl.handle.net/10919/102013	en
dc.language.iso	en	en
dc.publisher	MDPI	en
dc.rights	Creative Commons Attribution 4.0 International	en
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	en
dc.subject	Machine learning	en
dc.subject	classification	en
dc.subject	support vector machine	en
dc.subject	deep neural networks	en
dc.subject	tweets	en
dc.subject	heart attack	en
dc.subject	health	en
dc.title	A Classifier to Detect Informational vs. Non-Informational Heart Attack Tweets	en
dc.title.serial	Future Internet	en
dc.type	Article - Refereed	en
dc.type.dcmitype	Text	en
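
The abstract describes a pipeline of cleaning tweets, representing them as features, training a classifier on 2/3 of the labeled data, and reporting accuracy and per-class F1 on the remaining 1/3. The sketch below illustrates those steps with a small multinomial naïve Bayes model, one of the four algorithm families named in the abstract. It is not the authors' implementation: the cleaning rules, the bag-of-words features, the function names (`clean`, `train_naive_bayes`, `predict`, `f1_per_class`), and all toy tweets other than the two quoted in the abstract are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def clean(tweet):
    """Lowercase, drop URLs and @mentions, keep alphabetic tokens (assumed cleaning rules)."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z']+", tweet)

def train_naive_bayes(docs, labels):
    """Count documents per class and words per class for multinomial naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict(model, tokens):
    """Return the class with the highest log-posterior; unseen words are ignored."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for w in tokens:
            if w in vocab:
                # Laplace (add-one) smoothing over the training vocabulary
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def f1_per_class(y_true, y_pred, label):
    """F1 for one class, treating that class as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data; only the first two tweets are quoted from the abstract.
labeled = [
    ("Yesterday morning, my grandfather had a heart attack while he was walking around the garden.", "informational"),
    ("Dropped my iPhone for the first time and almost had a heart attack.", "non-informational"),
    ("My uncle was taken to hospital after a heart attack last night", "informational"),
    ("That plot twist nearly gave me a heart attack lol", "non-informational"),
    ("Dad is recovering in hospital after his heart attack", "informational"),
    ("Almost had a heart attack when my phone fell", "non-informational"),
]

# 2/3 for training, 1/3 held out for evaluation, mirroring the split in the abstract.
split = len(labeled) * 2 // 3
train, held_out = labeled[:split], labeled[split:]
model = train_naive_bayes([clean(t) for t, _ in train], [y for _, y in train])

y_true = [y for _, y in held_out]
y_pred = [predict(model, clean(t)) for t, _ in held_out]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy", accuracy)
for label in ("informational", "non-informational"):
    print(label, "F1", f1_per_class(y_true, y_pred, label))
```

Reporting F1 separately for each class matters here because the labeled set is imbalanced (11% informational, 89% non-informational): a classifier that always predicts the majority class would reach about 89% accuracy while scoring an F1 of zero on the informational class.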

Files

Original bundle

Name: futureinternet-13-00019.pdf
Size: 1.93 MB
Format: Adobe Portable Document Format

Collections

Journal Articles, Multidisciplinary Digital Publishing Institute (MDPI)
Destination Area: Data and Decisions (D&D)
Scholarly Works, Computer Science