Tracking Text in Mixed Mode Documents

Files
TR Number
TR-88-19
Date
1988
Journal Title
Journal ISSN
Volume Title
Publisher
Department of Computer Science, Virginia Polytechnic Institute & State University
Abstract

This paper describes a method for extracting arbitrarily oriented text in documents containing both text and graphics. The technique presented is inspired by the tracking algorithms frequently found in raster to vector conversion systems. By identifying text components in the document, reducing the resolution of the image by the size of the characters, and then tracking the centers of the character components, all text strings can be removed and subsequently reoriented to the horizontal. They can then be presented for automated character recognition. A by-product of the method is that characters are automatically grouped together to form words and/or phrases. We give a detailed description of the algorithm, discuss its strengths and weaknesses, and present some sample results obtained from a typical city street map.

Description
Keywords
Citation