Text Transformation

Abstract

The purpose of this project is to assist VTTI in converting a large citation file into a CSV file for ease of access. This required us to develop an application that parses a text file of citations and determines how to place the data into CSV format. We wrote the program in Java and built the user interface with JavaFX, which is bundled with the latest version of Java. We produced two main tools: the developer tool and the parsing program itself. The developer tool is used to build a tree of regular expressions for parsing the citations. The top nodes of the tree are very general regexes, and the nodes become more specific toward the leaves. The developer tool can export the regex tree as a binary file, which the main parsing program then loads. The main parsing program takes three inputs: a binary regex tree file, a citation text file, and an output location. When run, it parses the citations according to the tree it was given and writes the parsed citations to a CSV file, with each citation separated into fields. Any citations that the program is unable to process are written to a separate failed-output text file. We also created an additional program as an alternative to our own solution. It uses Brown University’s FreeCite parsing program and then outputs the parsed citations to a CSV file.
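As a rough illustration of the regex-tree idea described above, the following is a minimal sketch in Java and not the project's actual code. The class and method names (RegexTreeSketch, Node, parse, toCsvRow, exampleTree) and the two citation patterns are hypothetical and chosen only for this example. A general root pattern filters candidate citations, more specific child patterns extract the fields, and any citation that reaches no matching leaf is set aside as failed. The binary export and import of the tree is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTreeSketch {

    /** One tree node: a pattern plus more specific child patterns (hypothetical design). */
    static final class Node {
        final Pattern pattern;
        final List<Node> children = new ArrayList<>();

        Node(String regex) {
            this.pattern = Pattern.compile(regex);
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /**
     * Walks the tree from general to specific. Returns the capture groups of the
     * first leaf whose pattern matches the whole citation, or empty if none does.
     */
    static Optional<List<String>> parse(Node node, String citation) {
        Matcher m = node.pattern.matcher(citation);
        if (!m.matches()) {
            return Optional.empty();
        }
        if (node.children.isEmpty()) {
            List<String> fields = new ArrayList<>();
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i));
            }
            return Optional.of(fields);
        }
        for (Node child : node.children) {
            Optional<List<String>> result = parse(child, citation);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    /** Quotes every field and doubles embedded quotes so commas inside fields are safe. */
    static String toCsvRow(List<String> fields) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                row.append(',');
            }
            String field = fields.get(i) == null ? "" : fields.get(i);
            row.append('"').append(field.replace("\"", "\"\"")).append('"');
        }
        return row.toString();
    }

    /** Illustrative tree: the root only requires a "(YYYY)" year; the leaves extract fields. */
    static Node exampleTree() {
        Node root = new Node(".*\\(\\d{4}\\).*");
        // Assumed format: author (year). title. venue.
        root.add(new Node("(.+?) \\((\\d{4})\\)\\. (.+?)\\. (.+)\\."));
        // Assumed format: author (year). title.
        root.add(new Node("(.+?) \\((\\d{4})\\)\\. (.+)\\."));
        return root;
    }

    public static void main(String[] args) {
        Node tree = exampleTree();
        List<String> citations = List.of(
                "Smith, J. (2010). Parsing citations. Journal of Examples.",
                "An entry with no recognizable year");
        List<String> csvRows = new ArrayList<>();
        List<String> failed = new ArrayList<>();
        for (String citation : citations) {
            Optional<List<String>> fields = parse(tree, citation);
            if (fields.isPresent()) {
                csvRows.add(toCsvRow(fields.get()));
            } else {
                failed.add(citation); // would be written to the failed-output file
            }
        }
        csvRows.forEach(System.out::println);
        failed.forEach(c -> System.out.println("FAILED: " + c));
    }
}
```

In this sketch the children are tried in insertion order, so the more specific pattern should be added first. How the real tool orders its nodes, and which fields it extracts, is not stated in the abstract.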

Description
Keywords
Citation, Parse, Regex, Java
Citation