CS 5604 SP15
Course Instructor: Prof. Ed Fox (fox@cs.vt.edu)

Group Members:
Bharadwaj Bulusu - bbsb08@vt.edu
Vanessa Cedeno - vcedeno@vt.edu
Islam Harb - iharb@vt.edu
Yilong Jin - jin28@vt.edu
Sai Ravi Kiran Mallampati - sairavi5@vt.edu

Scripts:
main.py: driver script that takes a tweet file and a user file and outputs the importance of each tweet message in Avro format
multi_convert.py: converts JSON objects to Avro format. Note that the JSON objects must contain user information as well
util.py: functions called by main.py

Libraries:
glob, multiprocessing, avro, time, json, sys, etc.
All libraries used are part of the Python standard library except the Apache Avro library, which is used for I/O

Multiprocessing:
By default, multi_convert.py and main.py run in multi-process mode. The number of processes used by each script equals the number of CPU cores on the machine
multi_convert.py generates N pairs of tweet and user Avro files, where N is the number of cores
main.py takes one tweet Avro file and one user Avro file and outputs one importance file

Schema:
All three schemas used are in the schema directory
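
Example (illustrative only):
The multi-process layout described under Multiprocessing can be sketched roughly as follows. This is a minimal sketch using only the standard library, not the code in main.py or multi_convert.py. The file naming pattern (tweets_0.avro, users_0.avro, importance_0.avro), the function names, and the placeholder worker are all assumptions made for illustration; the real worker would read and write Avro files with the Apache Avro library.

```python
import multiprocessing
import os


def split_round_robin(items, n_parts):
    """Distribute items over n_parts lists, one list per worker process."""
    parts = [[] for _ in range(n_parts)]
    for i, item in enumerate(items):
        parts[i % n_parts].append(item)
    return parts


def pair_files(n_parts, out_dir="."):
    """Build one (tweet, user, importance) file-name triple per part.

    The naming pattern is hypothetical; the README does not specify it.
    """
    return [
        (
            os.path.join(out_dir, "tweets_%d.avro" % i),
            os.path.join(out_dir, "users_%d.avro" % i),
            os.path.join(out_dir, "importance_%d.avro" % i),
        )
        for i in range(n_parts)
    ]


def process_pair(paths):
    """Placeholder worker (assumption): a real worker would read the tweet
    and user files and write the importance file. Here it only returns the
    output file name."""
    tweet_file, user_file, importance_file = paths
    return importance_file


def run_all(n_procs=None, out_dir="."):
    """Process one tweet/user pair per worker; defaults to one worker per CPU core."""
    n_procs = n_procs or multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=n_procs)
    try:
        return pool.map(process_pair, pair_files(n_procs, out_dir))
    finally:
        pool.close()
        pool.join()
```

With N equal to the core count, each worker handles exactly one tweet/user pair, which matches the one-pair-in, one-importance-file-out behavior described above. Call run_all() from under an `if __name__ == "__main__":` guard so that it also works on platforms that spawn rather than fork worker processes.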