Tracking Text in Mixed Mode Documents

TR Number

TR-88-19

Date

1988

Publisher

Department of Computer Science, Virginia Polytechnic Institute & State University

Abstract

This paper describes a method for extracting arbitrarily oriented text from documents that contain both text and graphics. The technique is inspired by the tracking algorithms frequently found in raster-to-vector conversion systems. The method identifies the text components in the document, reduces the resolution of the image according to the size of the characters, and then tracks the centers of the character components. In this way all text strings can be removed from the image and subsequently rotated to the horizontal, after which they can be passed to an automated character recognizer. A by-product of the method is that characters are automatically grouped together to form words or phrases. We give a detailed description of the algorithm, discuss its strengths and weaknesses, and present sample results obtained from a typical city street map.
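
The sketch below illustrates the general pipeline summarized in the abstract. It is a minimal illustration under stated assumptions and is not the authors' algorithm.

It assumes the following:

- The input is a binary image given as a list of rows of 0/1 values.
- Any 8-connected component whose bounding box fits within a hypothetical `max_size` is treated as a character.
- The paper's center-tracking step is replaced with a simpler rule. Character centers that fall in adjacent cells of a coarse grid are linked together. The cell size (`cell`) approximates the character pitch, which loosely mirrors "reducing the resolution of the image by the size of the characters".

Each linked group's orientation is estimated from the principal axis of its centers. Rotating the group by the negative of that angle would bring it to the horizontal. All function and parameter names are hypothetical.

```python
from collections import deque
import math

def label_components(img):
    """Return the 8-connected foreground components of a 0/1 image as pixel lists."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps

def character_centers(comps, max_size):
    """Assumed rule: components no larger than max_size are characters; return their centroids."""
    centers = []
    for pixels in comps:
        ys = [y for y, _ in pixels]
        xs = [x for _, x in pixels]
        if max(ys) - min(ys) + 1 <= max_size and max(xs) - min(xs) + 1 <= max_size:
            centers.append((sum(ys) / len(ys), sum(xs) / len(xs)))
    return centers

def link_centers(centers, cell):
    """Simplified stand-in for tracking: map centers to a coarse grid and chain adjacent cells."""
    coarse = [(int(cy // cell), int(cx // cell)) for cy, cx in centers]
    unvisited = set(range(len(centers)))
    groups = []
    while unvisited:
        start = unvisited.pop()
        group, queue = [start], deque([start])
        while queue:
            i = queue.popleft()
            neighbours = [k for k in unvisited
                          if abs(coarse[i][0] - coarse[k][0]) <= 1
                          and abs(coarse[i][1] - coarse[k][1]) <= 1]
            for j in neighbours:
                unvisited.remove(j)
                group.append(j)
                queue.append(j)
        groups.append(sorted(group))
    return sorted(groups)

def orientation_deg(points):
    """Principal-axis angle of (row, col) points in degrees; rows grow downward,
    so a positive angle means the string descends to the right."""
    n = len(points)
    my = sum(y for y, _ in points) / n
    mx = sum(x for _, x in points) / n
    sxx = sum((x - mx) ** 2 for _, x in points)
    syy = sum((y - my) ** 2 for y, _ in points)
    sxy = sum((x - mx) * (y - my) for y, x in points)
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))

def extract_strings(img, max_size, cell):
    """Return (centers, angle_deg) for each linked group of character centers."""
    centers = character_centers(label_components(img), max_size)
    return [([centers[i] for i in g], orientation_deg([centers[i] for i in g]))
            for g in link_centers(centers, cell)]

def make_demo_image(size=40):
    """Hypothetical test image: a horizontal string, a diagonal string, and a long graphic line."""
    img = [[0] * size for _ in range(size)]
    def blob(r, c):  # a 2x2 "character"
        for dy in (0, 1):
            for dx in (0, 1):
                img[r + dy][c + dx] = 1
    for c in (5, 10, 15, 20):
        blob(5, c)
    for k in (20, 25, 30):
        blob(k, k)
    for c in range(size):
        img[35][c] = 1  # graphics: far too long to pass the character size filter
    return img

for pts, angle in extract_strings(make_demo_image(), max_size=4, cell=6):
    print(len(pts), round(angle, 1))
# prints "4 0.0" and then "3 45.0"
```

On the demo image, the two character strings end up in separate groups. The long line, which stands in for map graphics, is discarded by the size filter. The coarse-grid adjacency rule is only an approximation of tracking: it will merge strings that pass within one grid cell of each other, a situation that a genuine tracker operating on a real street map would have to resolve.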