Tweets Metadata


The previous CTRnet and current IDEAL projects have involved collecting large numbers of tweets about different events. Other groups also collect tweets about events, so it is important to be able to merge tweet collections. To perform such merges and to store the collections logically in a database, we need a metadata standard for describing tweet collections. We were tasked with continuing development of a tool for archiving tweet collections, merging collections, and preparing summaries of the individual collections, their union, and their overlap. Preliminary work on this was carried out by Michael Shuffett (see his report from CS6604 in VTechWorks), and we were asked to build upon his code, test it, extend it, apply new technology, and further document it where necessary.
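As a rough illustration of what merging two collections and summarizing their union and overlap could look like, consider the minimal Python sketch below. It is not the project's actual code: the `Tweet` class, its field names, and the functions `merge_collections` and `summarize` are hypothetical, and the sketch assumes that two tweets count as the same tweet exactly when their IDs match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    # Hypothetical minimal record; a real collection would carry more metadata.
    tweet_id: str
    text: str


def merge_collections(a, b):
    """Merge two tweet collections, keeping one copy of each tweet ID.

    Assumption: tweets with the same ID are duplicates. When an ID appears
    in both collections, the copy from `a` is kept.
    """
    merged = {t.tweet_id: t for t in a}
    for t in b:
        merged.setdefault(t.tweet_id, t)
    return list(merged.values())


def summarize(a, b):
    """Count distinct tweet IDs in each collection, their union, and their overlap."""
    ids_a = {t.tweet_id for t in a}
    ids_b = {t.tweet_id for t in b}
    return {
        "size_a": len(ids_a),
        "size_b": len(ids_b),
        "union": len(ids_a | ids_b),
        "overlap": len(ids_a & ids_b),
    }
```

Keying on the tweet ID keeps the merge independent of the order in which collections are supplied, apart from which duplicate copy is retained.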

We met with our client, Mohamed Magdy, a Ph.D. candidate working with QCRI and Dr. Fox, and agreed on the following project deliverables: (1) a standard for tweet sharing and for tweet collection metadata, (2) a method for merging such collections, (3) a report, and (4) a web application tool that brings all of these together.

The expected impact of this project is increased collaboration among researchers and investigators who use tweet metadata to find insights into everything from natural disasters to criminal activity and even stock market trends. A tool of this type will ease the process of merging tweet archives between researchers, leading to better analysis and less time spent reorganizing information that the tool could process instead.

Our team was able to build upon Michael Shuffett’s code, improve it, and set up new and improved wireframes for the website. We also began framing out a tool that can merge files of more than two types, whereas files previously had to be in a single format. In the future, the format requirements could be relaxed further, making it easier for users to upload different types of files.

Twitter, Metadata, tweet collections, events archive, CTRnet, IDEAL