Invariance for perceptual recognition through deep learning

Abstract/Contents

Abstract
The brain implements recognition systems with remarkable competence. Our perceptual systems recognize an object from varying perspectives as it transforms through space and time. A key property of effective recognition is invariance to changes in the input: invariant representations emphasize high-level information and disregard irrelevant changes, facilitating effective recognition. It is desirable for computational simulations to capture such invariant properties. However, quantifying and designing invariance is difficult, because the input signals to a perceptual system are high-dimensional, and the number of input variations, conceived in terms of separate dimensions of variation such as position, rotation, and scale, can be exponentially large. Natural invariance resides in a subspace of this exponential space, one that, I argue, can be captured more effectively through learning than through design. To capture perceptual invariance, I take the approach of modeling with deep neural networks. These models are classic AI algorithms. A deep neural network composes simple features in lower layers into more complex representations in higher layers. Moving up the hierarchy, the network forms high-level representations that capture various forms of invariance found in natural images. Within this framework, I present three applications. First, I investigate the position-preserving invariance properties of a classical architecture, the convolutional neural network. With convolutional networks, I show results that surpass the previous state-of-the-art performance in detecting the locations of objects in images. In such models, however, translational invariance is built in by design, limiting their ability to capture the full invariance structure of real inputs. To learn invariance without design, I exploit unsupervised learning from videos using the 'slowness' principle. Concretely, the unsupervised learning algorithm discovers invariance to transformations such as rotation, out-of-plane pose changes, and warping that arise from motion in video. When measured quantitatively, the learned invariant features are more robust than hand-crafted ones, and using them consistently improves recognition in still images. Finally, I explore the development of invariant representations of number through learning from unlabeled examples in a generic neural network. By learning from examples of 'visual numbers', this network forms number representations invariant to object size. With these representations, I present novel simulations of cognitive processes underlying the 'Approximate Number Sense'. Concretely, I correlate deep-network simulations with the sensitivity of discrimination across a range of numbers. These simulations capture properties of human number representation, including approximate invariance to other stimulus factors.
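
The 'slowness' principle referred to in the abstract can be illustrated with a small, self-contained sketch. The example below is a minimal linear slow-feature-analysis variant on toy data, not the deep network used in the thesis; the synthetic "video", the dimensions, and all variable names are illustrative assumptions.

# Minimal sketch of the 'slowness' principle: features computed from
# consecutive video frames should change slowly over time. This is a linear
# slow-feature-analysis toy example, not the thesis's deep network.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Toy "video": T frames of D-dimensional signals. A smooth random walk stands
# in for real pixel data, which also changes gradually from frame to frame.
T, D, K = 500, 32, 4                       # frames, input dim, learned features
frames = np.cumsum(rng.standard_normal((T, D)), axis=0)

X = frames - frames.mean(axis=0)           # center the inputs
dX = np.diff(X, axis=0)                    # frame-to-frame differences

# Minimize the expected squared temporal change of the features w^T x, subject
# to unit variance and decorrelation (ruling out the trivial constant feature).
# This is the generalized eigenvalue problem  (Cov dX) w = lambda (Cov X) w.
A = dX.T @ dX / len(dX)                    # covariance of temporal differences
B = X.T @ X / len(X)                       # covariance of the inputs
eigvals, eigvecs = eigh(A, B)              # ascending eigenvalues = slowness

W = eigvecs[:, :K]                         # the K slowest directions
slow_features = X @ W

print("mean squared temporal change, slow features:",
      float(np.mean(np.diff(slow_features, axis=0) ** 2)))
print("mean squared temporal change, raw inputs:   ",
      float(np.mean(dX ** 2)))

The directions with the smallest generalized eigenvalues are the ones whose outputs vary least between neighboring frames while remaining decorrelated and unit variance; the thesis applies the same temporal-coherence idea, but with features learned by a deep network rather than a linear projection.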

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English

Creators/Contributors

Associated with Zou, Youzhi
Associated with Stanford University, Department of Electrical Engineering.
Primary advisor McClelland, James L
Thesis advisor Guibas, Leonidas J
Thesis advisor Widrow, Bernard, 1929-

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Youzhi (Will) Zou.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

Copyright
© 2015 by Youzhi Zou
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
