User-trainable object recognition systems via group induction


Abstract/Contents

Abstract
Despite initial optimism in the 1960s and the five decades of research since, we remain far from the goal of computers that can perceive and understand objects in the environment around them. The vast majority of research in this area has been conducted using traditional cameras, but the advent of high-quality depth sensing (perhaps most visible in the 2005 and 2007 DARPA autonomous driving challenges) opens up entirely new approaches to the object recognition problem. This dissertation is about one such new approach to object recognition, the conditions in which it can be applied, and the significant benefits it confers. The results focus primarily on object recognition in autonomous driving, but generalize to other contexts.

Most objects on and near roads actively avoid collisions with other objects, making it possible to segment and track objects seen by a depth sensor without a prior appearance or motion model. Classification can therefore occur after the segmentation and tracking steps; this is the key idea of the STAC approach ("segment, track, and classify"). This approach requires model-free segmentation and tracking, but it magnifies user annotation efficiency (entire tracks, rather than individual views, are the unit of annotation), improves classification accuracy (classifications can be aggregated over different views of an object), and enables a new form of semi-supervised learning: group induction.

Machine perception systems often require hundreds of thousands or millions of labeled training examples to produce accurate classifications. For individual users this is a major barrier, the removal of which could have significant implications for flexible manufacturing, home automation, agricultural robotics, and more. Group induction harnesses the structure in unlabeled data that naturally arises from the STAC approach, making it possible to produce accurate classifiers from roughly 10-100 user-annotated training tracks.

The utility of STAC and group induction raises the question of how they can be applied in environments where model-free segmentation and tracking are not so readily available. The final contribution of this dissertation is the STRGEN algorithm (pronounced "sturgeon"), which provides a framework for propagating object segmentations in such environments with minimal simplifying assumptions. A problem of this difficulty likely requires a wide range of cues: color and texture, image edges, depth edges, surface normal changes, optical flow vectors, and so on can all contribute probabilistically to the propagation of a segmentation mask through time. STRGEN provides a simple and elegant method of combining these diverse cues at runtime and, crucially, a method of learning from data how best to combine them when propagating a segmentation for an arbitrary, previously unseen object. This learning sits one level of abstraction above the online learning of object models often seen in state-of-the-art bounding-box trackers, and it is essential to the effective use of the rich segmentation models employed here.
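The track-level reasoning described in the abstract can be made concrete with a short sketch. The Python below is purely illustrative and uses hypothetical names and interfaces (classify_track, group_induction, the model callable, and the train callback are not taken from the dissertation): per-frame log-odds scores are summed into a single decision for the whole track, and confidently classified tracks are then inducted as additional training data in a simple self-training loop of the kind group induction formalizes.

```python
def classify_track(model, frame_descriptors):
    """Aggregate per-frame log-odds into one label for the whole track.

    `model` is any callable mapping a frame descriptor to a log-odds
    score (a hypothetical interface; the dissertation's actual frame
    classifier is not reproduced here). Summing log-odds lets many
    clear views of an object outvote a few ambiguous ones.
    """
    score = sum(model(d) for d in frame_descriptors)
    return (1 if score > 0 else -1), score


def group_induction(labeled, unlabeled, train, confidence=2.0, rounds=5):
    """Minimal self-training loop in the spirit of group induction.

    `labeled` is a list of (track, label) pairs, `unlabeled` a list of
    tracks, and `train` builds a frame-level model from (track, label)
    pairs. The unit of induction is the track: a confidently classified
    track contributes all of its frames to the next round of training.
    """
    inducted = []
    for _ in range(rounds):
        model = train(labeled + inducted)
        # Re-induct from scratch each round so early mistakes can be revoked.
        inducted = []
        for track in unlabeled:
            label, score = classify_track(model, track)
            if abs(score) > confidence:
                inducted.append((track, label))
    return train(labeled + inducted)
```

Because whole tracks rather than individual frames are annotated and inducted, a small number of user-labeled tracks can seed the loop, which is what makes the roughly 10-100 annotated training tracks cited above plausible.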

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2014
Issuance monographic
Language English

Creators/Contributors

Associated with Teichman, Alex
Associated with Stanford University, Department of Computer Science.
Primary advisor Thrun, Sebastian, 1967-
Thesis advisor Ng, Andrew Hock-soon, 1972-
Thesis advisor Savarese, Silvio

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Alex Teichman.
Note Submitted to the Department of Computer Science.
Thesis Thesis (Ph.D.)--Stanford University, 2014.
Location electronic resource

Access conditions

Copyright
© 2014 by Alexander William Teichman
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
