Visual word recognition with large-scale image retrieval


Abstract/Contents

Abstract
OCR techniques that were developed for scanned high-resolution images suffer from poor text recognition accuracy on camera images. In this dissertation, we cast text recognition as a word patch retrieval problem. By comparing visual text queries against a database of labeled word images, we demonstrate a text recognition system that achieves significantly improved performance on camera images.

Robust visual text features are the foundation for optimal word patch retrieval. We introduce Text Aggregated Gradients (TAG), a visual text descriptor discriminatively learned using easily obtainable training data. At the query stage, query words can be accurately recognized via Approximate Nearest Neighbor search.

In the interest of system scalability to a large number of fonts, a font recognition method is developed to facilitate fast query retrieval. We show that by employing a Bayesian network, surprisingly good font recognition accuracy can be achieved via probabilistic inference.

A compact database is highly desirable for our large-scale word patch retrieval. To eliminate inter-font redundancies from the database, we perform feature aggregation within groups of similar fonts. Using Canonical Correlation Analysis, our database can be compressed by a factor of 5x without any loss of word retrieval accuracy.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English

Creators/Contributors

Associated with Chen, Huizhong
Associated with Stanford University, Department of Electrical Engineering.
Primary advisor Girod, Bernd
Thesis advisor Girod, Bernd
Thesis advisor Guibas, Leonidas J
Thesis advisor Pauly, John (John M.)

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Huizhong Chen.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

Copyright
© 2015 by Huizhong Chen
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
