Text Transformation

dc.contributor.author: Thompson, Dustin
dc.contributor.author: Henke, Zach
dc.contributor.author: Cox, Kevin
dc.contributor.author: Fenton, Kevin
dc.date.accessioned: 2015-05-15T04:05:45Z
dc.date.available: 2015-05-15T04:05:45Z
dc.date.issued: 2015-05-14
dc.description.abstract: The purpose of this project is to assist the VTTI in converting a large citation file into a CSV file for ease of access. It required us to develop an application that parses a text file of citations and determines how to put the data into CSV format. We wrote the program in Java and built the user interface with JavaFX, which is included in the latest edition of Java. We produced two main tools: the developer tool and the parsing program itself. The developer tool builds a tree of regular expressions used to parse the citations. The top nodes of the tree are very general regexes, and the leaf nodes are much more specific. The developer tool can export the regex tree as a binary file for use by the main parsing program. The main parsing program takes three inputs: a binary regex tree file, a citation text file, and an output location. Once run, it parses the citations based on the tree it was given and writes them to a CSV file with each citation separated into fields. Any citations that the program is unable to process are written to a failed-output text file. We also created an additional program as an alternative solution. It uses Brown University’s FreeCite parsing program and outputs the parsed citations to a CSV file.
dc.description.sponsorship: Nathan Hall
dc.identifier.uri: http://hdl.handle.net/10919/52338
dc.language.iso: en_US
dc.rights: Creative Commons Attribution 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/us/
dc.subject: Citation
dc.subject: Parse
dc.subject: Regex
dc.subject: Java
dc.title: Text Transformation
dc.type: Dataset
dc.type: Presentation
dc.type: Software
dc.type: Technical report
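
The regex-tree idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: the class and method names, the two patterns, and the sample citation are all hypothetical, and the binary export of the tree is not covered. A general pattern sits at the root, a more specific pattern with capture groups sits at the leaf, and a citation that reaches no matching leaf is treated as failed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a regex tree; none of these names come from the project.
class RegexTreeSketch {

    // One tree node: a regex plus the more specific regexes beneath it.
    static final class Node {
        final Pattern pattern;
        final List<Node> children = new ArrayList<>();

        Node(String regex) {
            this.pattern = Pattern.compile(regex);
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Returns the capture groups of the first matching leaf, or null when
    // the citation fails at this node or at every child beneath it.
    static List<String> parse(Node node, String citation) {
        Matcher m = node.pattern.matcher(citation);
        if (!m.matches()) {
            return null;
        }
        if (node.children.isEmpty()) {
            List<String> fields = new ArrayList<>();
            for (int i = 1; i <= m.groupCount(); i++) {
                String group = m.group(i);
                fields.add(group == null ? "" : group);
            }
            return fields;
        }
        for (Node child : node.children) {
            List<String> fields = parse(child, citation);
            if (fields != null) {
                return fields;
            }
        }
        return null; // a general node matched, but no specific leaf did
    }

    // Quotes every field and doubles embedded quotes, so commas inside a
    // field (such as "Smith, J.") do not break the CSV row.
    static String toCsvRow(List<String> fields) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                row.append(',');
            }
            row.append('"').append(fields.get(i).replace("\"", "\"\"")).append('"');
        }
        return row.toString();
    }

    public static void main(String[] args) {
        // Root: any citation containing a parenthesized four-digit year.
        // Leaf: "Authors (Year). Title. Venue." with one group per field.
        Node root = new Node(".*\\(\\d{4}\\).*")
                .add(new Node("(.+?) \\((\\d{4})\\)\\. (.+?)\\. (.+)\\."));

        String citation = "Smith, J. (2010). Driver behavior. Journal of Safety.";
        List<String> fields = parse(root, citation);
        if (fields == null) {
            System.out.println("FAILED: " + citation);
        } else {
            System.out.println(toCsvRow(fields));
        }
    }
}
```

In this sketch a citation that matches the general root pattern but none of the leaves returns null, which corresponds to the failed-output path the abstract mentions. Covering more citation styles would mean adding further leaf nodes beneath the appropriate general node.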

Files

Original bundle
Now showing 1 - 5 of 13
Name: CitationParser.jar
Size: 316.07 KB
Format: Unknown data format
Description: Executable jar that runs citation parsing tool
Name: TreeTool.jar
Size: 25.67 KB
Format: Unknown data format
Description: Executable jar that runs developer tree building tool
Name: CitationParserProject.zip
Size: 35.58 KB
Format: Unknown data format
Description: Zip file containing Eclipse project files for the citation parsing tool
Name: TreeToolProject.zip
Size: 38.83 KB
Format: Unknown data format
Description: Zip file containing Eclipse project files for the developer tree building tool
Name: FinalPresentation.pdf
Size: 231.32 KB
Format: Adobe Portable Document Format
Description: Final Project Presentation
License bundle
Now showing 1 - 1 of 1
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission
Description: