End-To-End Text Detection Using Deep Learning

dc.contributor.author: Ibrahim, Ahmed Sobhy Elnady (en)
dc.contributor.committeechair: Abbott, A. Lynn (en)
dc.contributor.committeemember: Huang, Bert (en)
dc.contributor.committeemember: Stilwell, Daniel J. (en)
dc.contributor.committeemember: Hussein, Mohamed E. (en)
dc.contributor.committeemember: Huang, Jia-Bin (en)
dc.contributor.department: Electrical and Computer Engineering (en)
dc.date.accessioned: 2017-12-20T09:00:16Z (en)
dc.date.available: 2017-12-20T09:00:16Z (en)
dc.date.issued: 2017-12-19 (en)
dc.description.abstract: Text detection in the wild is the problem of locating text in images of everyday scenes. It is challenging because everyday scenes are complex. The problem is important for many emerging applications, such as self-driving cars. Previous research in text detection has been dominated by multi-stage sequential approaches, which suffer from many limitations, including error propagation from one stage to the next. Another line of work uses deep learning techniques. The deep methods used for text detection include box detection models and fully convolutional models. Box detection models are limited by the nature of their annotations, which may be too coarse to provide detailed supervision. Fully convolutional models learn to generate pixel-wise maps that represent the locations of text instances in the input image. These models cannot produce accurate word-level annotations without heavy post-processing. To overcome these problems, we propose a novel end-to-end system based on a combination of deep learning techniques. The proposed system consists of an attention model, based on a new deep architecture proposed in this dissertation, followed by a deep network based on Faster-RCNN. The attention model produces a high-resolution map that indicates likely locations of text instances. A novel aspect of the system is an early fusion step that merges the attention map directly with the input image prior to word-box prediction. This approach suppresses contextual information but does not eliminate it from consideration. Progressively larger models were trained in three separate phases. The resulting system has demonstrated an ability to detect text under difficult conditions related to illumination, resolution, and legibility. The system has exceeded the state of the art on the ICDAR 2013 and COCO-Text benchmarks, with F-measure values of 0.875 and 0.533, respectively. (en)
dc.description.degree: Ph. D. (en)
dc.format.medium: ETD (en)
dc.identifier.other: vt_gsexam:13267 (en)
dc.identifier.uri: http://hdl.handle.net/10919/81277 (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: Deep learning (Machine learning) (en)
dc.subject: Computer Vision (en)
dc.subject: Text Detection (en)
dc.title: End-To-End Text Detection Using Deep Learning (en)
dc.type: Dissertation (en)
thesis.degree.discipline: Computer Engineering (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: doctoral (en)
thesis.degree.name: Ph. D. (en)
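
The abstract describes an early fusion step that merges the attention map with the input image so that context is suppressed but not eliminated. The abstract does not give the fusion formula, so the following is only a minimal sketch of one way such a step could look, not the dissertation's method. It assumes a pixel-wise blend; the function name `fuse_attention` and the `floor` parameter are hypothetical and chosen purely for illustration.

```python
import numpy as np


def fuse_attention(image, attention, floor=0.25):
    """Blend an attention map into an image (hypothetical fusion rule).

    image:     H x W x C array of pixel intensities.
    attention: H x W array; values are clipped to [0, 1].
    floor:     assumed minimum weight, so that low-attention regions
               are dimmed rather than zeroed out.
    """
    image = np.asarray(image, dtype=np.float32)
    attention = np.clip(np.asarray(attention, dtype=np.float32), 0.0, 1.0)
    if image.shape[:2] != attention.shape:
        raise ValueError("attention map must match the image height and width")

    # Weight is `floor` where attention is 0 and 1.0 where attention is 1.
    weight = floor + (1.0 - floor) * attention

    # Broadcast the per-pixel weight across the channel axis.
    return image * weight[..., None]
```

With `floor=0.25`, a pixel with zero attention keeps a quarter of its intensity, so a downstream box predictor would still see some surrounding context. Setting `floor=0.0` would turn the blend into a hard mask, which removes context entirely.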

Files

Original bundle
Name: Ibrahim_AS_D_2017.pdf
Size: 20.51 MB
Format: Adobe Portable Document Format