Library Tweets Conversion

Abstract

The Digital Library Research Laboratory (DLRL) has collected billions of tweets over the course of several years. These tweets were gathered using three different data collection tools and have been organized into collections based on keywords. The tools were Social Feed Manager (SFM), yourTwapperKeeper (YTK), and Digital Methods Initiative Twitter Capture and Analysis Toolset (DMI-TCAT). Because each of these tools stores tweets differently, the DLRL aims to consolidate them so that the Library can provide a service that allows the campus to easily access and use this data.

Our job was to design a unified JSON format that could represent all of these tweets and to provide a way to convert them to this new format. Additionally, we had to provide suitable collection-level information for each distinct data collection, showing the connections between tweets and the collections they belong to. To accomplish this, we provide six conversion scripts: three convert the individual tweets, and three compile the collection-level metadata and preserve the relationship between tweets and collections. When run on the Twitter data, they provide a unified way to digest all of the collected data, regardless of the tool with which it was obtained.
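
The conversion idea can be illustrated with a minimal Python sketch. This is not the project's actual code: the unified field names, the per-tool source field names in FIELD_MAPS, and the helpers to_unified and collection_metadata are all hypothetical, chosen only to show how records from different tools might be mapped onto one JSON shape while keeping a link back to their collection.

```python
import json

# Hypothetical mappings: unified field name -> field name in each tool's export.
# The real schemas of YTK, SFM, and DMI-TCAT may differ; these are assumptions.
FIELD_MAPS = {
    "YTK": {"id": "id", "text": "text", "user": "from_user", "created_at": "created_at"},
    "SFM": {"id": "id_str", "text": "text", "user": "screen_name", "created_at": "created_at"},
    "DMI-TCAT": {"id": "id", "text": "text", "user": "from_user_name", "created_at": "created_at"},
}


def to_unified(record, source_tool, collection_id):
    """Map one tool-specific tweet record onto the (hypothetical) unified format."""
    mapping = FIELD_MAPS[source_tool]
    unified = {field: record.get(source) for field, source in mapping.items()}
    unified["id"] = str(unified["id"])  # store IDs as strings in every case
    unified["source_tool"] = source_tool
    unified["collection_id"] = collection_id  # ties the tweet to its collection
    return unified


def collection_metadata(collection_id, keywords, tweets):
    """Build collection-level metadata that records which tweets belong to it."""
    return {
        "collection_id": collection_id,
        "keywords": list(keywords),
        "tweet_count": len(tweets),
        "tweet_ids": [tweet["id"] for tweet in tweets],
    }


# Example with made-up data standing in for one YTK row.
ytk_row = {"id": 123, "text": "hello", "from_user": "alice",
           "created_at": "2016-01-01 00:00:00"}
tweet = to_unified(ytk_row, "YTK", "c1")
meta = collection_metadata("c1", ["library"], [tweet])
print(json.dumps(tweet, indent=2))
print(json.dumps(meta, indent=2))
```

Keeping the tool-specific differences in a mapping table, rather than in separate code paths, is one way to let a single function serve all three sources; separate scripts per tool, as described above, are an equally reasonable design.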

Keywords

YTK, yourtwapperkeeper, Twitter, Data conversion, Collection-level, Python, tweet, Digital Methods Initiative Twitter Capture and Analysis Toolset, DMI-TCAT, Social Feed Manager, SFM, MySQL, JSON, Library, Library tweets data

Citation