Tobacco Settlement Documents

dc.contributor.author: Onofrio, Nick (en)
dc.contributor.author: Sorkin, Nick (en)
dc.contributor.author: Venetsanos, Devin (en)
dc.contributor.author: DiFrancisco, Michael (en)
dc.contributor.author: Johnson, Campbell (en)
dc.contributor.author: Fox, Edward A. (en)
dc.date.accessioned: 2019-07-06T20:29:24Z (en)
dc.date.available: 2019-07-06T20:29:24Z (en)
dc.date.issued: 2019-05-10 (en)
dc.description.abstract: Tobacco companies have had some of the most effective marketing strategies of the past century. It is well documented and well known that tobacco causes both mental and physical health problems, and yet these companies have found ways to remain among the largest businesses. The goal of our project is to assist Dr. Townsend in his research to understand Big Tobacco's strategies. We do this by taking some of the fourteen million documents released online by tobacco companies and presenting the data in a meaningful way so that it can be analyzed. The project is hosted on a virtual machine provided to the team by Dr. Fox and the VT Computer Science department. The pipeline begins by gathering the documents online and converting them into a usable text format, then feeds them to a Doc2Vec-based machine learning tool built with Gensim. Using a pre-trained model, we then cluster the data so that it can be presented in a usable form. Dr. Townsend and many others can thus use this system to further their research. This submission includes a report on how to use and maintain the system, so that Dr. Townsend can use it as he sees fit and future developers can understand how it works. The system is composed of several components, including a Gensim Doc2Vec model and Gensim's fast approximate nearest neighbor similarity package, which is used to cluster the data. All of this is stored and set up on the virtual machine provided by the CS department, so it should be accessible as long as the user is connected to the campus Wi-Fi. Through this project our team learned a great deal about working with a client, working with new technologies, and tracking and presenting progress to others. (en)
dc.description.notes: TobaccoPresentation.pdf: PDF version of the final presentation; TobaccoPresentation.pptx: PowerPoint version of the final presentation; TobaccoReport.docx: Word version of the final report; TobaccoReport.pdf: PDF version of the final report; TobaccoTestimonies-master.zip: archive of software developed (en)
dc.identifier.uri: http://hdl.handle.net/10919/91193 (en)
dc.language.iso: en_US (en)
dc.publisher: Virginia Tech (en)
dc.rights: Creative Commons CC0 1.0 Universal Public Domain Dedication (en)
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/ (en)
dc.subject: tobacco settlement documents (en)
dc.subject: Doc2Vec (en)
dc.subject: clustering (en)
dc.title: Tobacco Settlement Documents (en)
dc.type: Presentation (en)
dc.type: Report (en)
dc.type: Software (en)

Files

Original bundle
Name:
TobaccoTestimonies-master.zip
Size:
17.26 KB
Format:
Name:
TobaccoReport.pdf
Size:
1.99 MB
Format:
Adobe Portable Document Format
Name:
TobaccoReport.docx
Size:
2.76 MB
Format:
Microsoft Word XML
Name:
TobaccoPresentation.pptx
Size:
4.54 MB
Format:
Microsoft Powerpoint XML
Name:
TobaccoPresentation.pdf
Size:
1.2 MB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
1.5 KB
Format:
Item-specific license agreed upon to submission
Description: