A system for document analysis, translation, and automatic hypertext linking

dc.contributor.author: Averboch, Guillermo Andres [en]
dc.contributor.committeechair: Heath, Lenwood S. [en]
dc.contributor.committeemember: Fox, Edward A. [en]
dc.contributor.committeemember: Arthur, James D. [en]
dc.contributor.department: Computer Science and Applications [en]
dc.date.accessioned: 2014-03-14T21:40:48Z [en]
dc.date.adate: 2009-07-21 [en]
dc.date.available: 2014-03-14T21:40:48Z [en]
dc.date.issued: 1995-06-05 [en]
dc.date.rdate: 2009-07-21 [en]
dc.date.sdate: 2009-07-21 [en]
dc.description.abstract: A digital library database is a heterogeneous collection of documents. Documents may become available in different formats (e.g., ASCII, SGML, typesetter languages) and they may have to be translated to a standard document representation scheme used by the digital library. This work focuses on the design of a framework that can be used to convert text documents in any format to equivalent documents in different formats and, in particular, to SGML (Standard Generalized Markup Language). In addition, the framework must be able to extract information about the analyzed documents, store that information in a permanent database, and construct hypertext links between documents and the information contained in that database and between the documents themselves. For example, information about the author of a document could be extracted and stored in the database. A link can then be established between the document and the information about its author and from there to other documents by the same author. These tasks must be performed without any human intervention, even at the risk of making a small number of mistakes. To accomplish these goals we developed a language called DELTO (Description Language for Textual Objects) that can be used to describe a document format. Given a description for a particular format, our system is able to extract information from documents in that format, to store part of that information in a permanent database, and to use that information in constructing an abstract representation of those documents that can be used to generate equivalent documents in different formats. The system originated from this work is used for constructing the database of Envision, a Virginia Tech digital library research project. [en]
dc.description.degree: Master of Science [en]
dc.format.extent: xii, 226 leaves [en]
dc.format.medium: BTD [en]
dc.format.mimetype: application/pdf [en]
dc.identifier.other: etd-07212009-040529 [en]
dc.identifier.sourceurl: http://scholar.lib.vt.edu/theses/available/etd-07212009-040529/ [en]
dc.identifier.uri: http://hdl.handle.net/10919/43809 [en]
dc.language.iso: en [en]
dc.publisher: Virginia Tech [en]
dc.relation.haspart: LD5655.V855_1995.A992.pdf [en]
dc.relation.isformatof: OCLC# 34376883 [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: formats [en]
dc.subject: computer language [en]
dc.subject: Database [en]
dc.subject.lcc: LD5655.V855 1995.A992 [en]
dc.title: A system for document analysis, translation, and automatic hypertext linking [en]
dc.type: Thesis [en]
dc.type.dcmitype: Text [en]
thesis.degree.discipline: Computer Science and Applications [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: masters [en]
thesis.degree.name: Master of Science [en]

Files

Original bundle
Name: LD5655.V855_1995.A992.pdf
Size: 7.2 MB
Format: Adobe Portable Document Format