Invariance for perceptual recognition through deep learning

Abstract/Contents

Abstract
The brain implements recognition systems with remarkable competence. Our perceptual systems recognize an object from varying perspectives as it transforms through space and time. A key property of effective recognition is invariance to changes in the input: invariant representations emphasize high-level information and disregard irrelevant changes, facilitating effective recognition. It is desirable for computational simulations to capture such invariant properties. However, quantifying and designing invariance is difficult, because the input signals to a perceptual system are high-dimensional, and the number of input variations, conceived in terms of separate dimensions of variation such as position, rotation, and scale, can be exponentially large. Natural invariance resides in a subspace of this exponential space, one that, I argue, can be captured more effectively through learning than through design. To capture perceptual invariance, I take the approach of modeling with deep neural networks. These models are classic AI algorithms. A deep neural network composes simple features in lower layers into more complex representations in higher layers. Moving up the hierarchy, the network forms high-level representations that capture various forms of invariance found in natural images. Within this framework, I present three applications. First, I investigate the position-preserving invariance properties of a classical architecture, the convolutional neural network. With convolutional networks, I show results that surpass the previous state-of-the-art performance in detecting the locations of objects in images. In such models, however, translational invariance is built in by design, limiting their ability to capture the full invariance structure of real inputs. To learn invariance without design, I exploit unsupervised learning from videos using the 'slowness' principle. Concretely, the unsupervised learning algorithm discovers invariance to transformations such as rotation, out-of-plane pose changes, and warping that arise from motion in video. When measured quantitatively, the learned invariant features are more robust than hand-crafted ones, and using them consistently improves recognition in still images. Finally, I explore the development of invariant representations of number through learning from unlabeled examples in a generic neural network. By learning from examples of 'visual numbers', this network forms number representations invariant to object size. With these representations, I present novel simulations of cognitive processes underlying the 'Approximate Number Sense'. Concretely, I correlate deep-network simulations with the sensitivity of discrimination across a range of numbers. These simulations capture properties of human number representation, including approximate invariance to other stimulus factors.
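
The 'slowness' principle referred to in the abstract can be illustrated with a small, self-contained sketch. The example below is a minimal linear slow-feature-analysis variant on toy data, not the deep network used in the thesis; the synthetic "video", the dimensions, and all variable names are illustrative assumptions.

# Minimal sketch of the 'slowness' principle: features computed from
# consecutive video frames should change slowly over time. This is a linear
# slow-feature-analysis toy example, not the thesis's deep network.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Toy "video": T frames of D-dimensional signals. A smooth random walk stands
# in for real pixel data, which also changes gradually from frame to frame.
T, D, K = 500, 32, 4                       # frames, input dim, learned features
frames = np.cumsum(rng.standard_normal((T, D)), axis=0)

X = frames - frames.mean(axis=0)           # center the inputs
dX = np.diff(X, axis=0)                    # frame-to-frame differences

# Minimize the expected squared temporal change of the features w^T x, subject
# to unit variance and decorrelation (ruling out the trivial constant feature).
# This is the generalized eigenvalue problem  (Cov dX) w = lambda (Cov X) w.
A = dX.T @ dX / len(dX)                    # covariance of temporal differences
B = X.T @ X / len(X)                       # covariance of the inputs
eigvals, eigvecs = eigh(A, B)              # ascending eigenvalues = slowness

W = eigvecs[:, :K]                         # the K slowest directions
slow_features = X @ W

print("mean squared temporal change, slow features:",
      float(np.mean(np.diff(slow_features, axis=0) ** 2)))
print("mean squared temporal change, raw inputs:   ",
      float(np.mean(dX ** 2)))

The directions with the smallest generalized eigenvalues are the ones whose outputs vary least between neighboring frames while remaining decorrelated and unit variance; the thesis applies the same temporal-coherence idea, but with features learned by a deep network rather than a linear projection.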

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English

Creators/Contributors

Associated with Zou, Youzhi
Associated with Stanford University, Department of Electrical Engineering.
Primary advisor McClelland, James L
Thesis advisor Guibas, Leonidas J
Thesis advisor Widrow, Bernard, 1929-

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Youzhi (Will) Zou.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

Copyright
© 2015 by Youzhi Zou
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
