User-trainable object recognition systems via group induction
Abstract/Contents
- Abstract
- Despite initial optimism in the 1960s and the five decades of research since, we remain far from the goal of computers that can perceive and understand objects in the environment around them. The vast majority of research in this area has been conducted using traditional cameras, but the advent of high quality depth sensing - perhaps most visible in the 2005 and 2007 DARPA autonomous driving challenges - opens up entirely new approaches to the object recognition problem. This dissertation is about one such new approach to object recognition, the conditions in which it can be applied, and the significant benefits it confers. The results focus primarily on object recognition in autonomous driving, but generalize to other contexts. Most objects on and near roads actively avoid collision with other objects, making it possible to segment and track objects seen by a depth sensor without use of a prior appearance or motion model. Thus, classification of objects can occur after the segmentation and tracking steps; this is the key idea in the STAC approach ("segment, track, and classify"). This approach requires model-free segmentation and tracking but magnifies user annotation efficiency (because entire tracks rather than individual views are the unit of annotation), improves classification accuracy (because classifications can be aggregated over different views of an object), and enables a new form of semi-supervised learning: group induction. Machine perception systems often require hundreds of thousands or millions of labeled training examples to produce accurate classifications. For individual users this is a major barrier, the removal of which could have significant implications for flexible manufacturing, home automation, agricultural robotics, and more. Group induction harnesses the structure in unlabeled data that naturally arises from the STAC approach, making it possible to produce accurate classifiers from ~10-100 user-annotated training tracks. 
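The key classification benefit described above — aggregating evidence over all views of a tracked object rather than classifying single frames — can be illustrated with a minimal sketch. This is not the dissertation's actual implementation; it simply assumes per-view classifier outputs are log-odds scores and that views are conditionally independent given the class, so summing log-odds combines the evidence across the track. All names and example values below are hypothetical.

```python
def classify_track(frame_log_odds):
    """Aggregate per-frame classifier outputs over a whole track.

    frame_log_odds: list of dicts mapping class name -> log-odds for one
    segmented view of the tracked object. Assuming views are conditionally
    independent given the class, summing log-odds pools the evidence.
    """
    totals = {}
    for frame in frame_log_odds:
        for cls, lo in frame.items():
            totals[cls] = totals.get(cls, 0.0) + lo
    # The track-level label is the class with the highest summed log-odds.
    return max(totals, key=totals.get), totals

# Hypothetical per-view outputs: individual frames are ambiguous,
# but the track as a whole favors "pedestrian".
views = [
    {"pedestrian": 0.4, "bicyclist": 0.6},
    {"pedestrian": 1.2, "bicyclist": -0.3},
    {"pedestrian": 0.9, "bicyclist": -1.1},
]
label, scores = classify_track(views)
```

The same track-level structure is what makes annotation efficient: one user label applies to every view in the track, and, under group induction, confidently classified unlabeled tracks can be fed back as training data.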
The utility of STAC and group induction raises the question of how they can be applied in environments where model-free segmentation and tracking are not so readily available. The final contribution of this dissertation is the STRGEN algorithm (pronounced "sturgeon"), which provides a framework for propagating object segmentations in such environments with minimal simplifying assumptions. A problem of this difficulty likely requires the use of a wide range of cues: color texture, image edges, depth edges, surface normal changes, optical flow vectors, and so on can all contribute probabilistically to the propagation of a segmentation mask through time. STRGEN synthesizes a simple and elegant method of combining these diverse cues at runtime as well as -- crucially -- a method of learning from data how to best combine these cues for use in propagating a segmentation for a randomly selected, as-yet-unseen object. This learning is one level of abstraction higher than the online learning of object models often seen in state of the art bounding box trackers, and is essential to the effective use of the rich segmentation models made use of here.
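To make the cue-combination idea concrete, here is a minimal sketch of combining diverse per-pixel cues into a propagated mask. The linear log-likelihood combination, the cue names, and the threshold are all illustrative assumptions, not STRGEN's actual model; the point it demonstrates is that per-cue weights can be learned offline from data, so the same weighting generalizes to a previously unseen object rather than being re-fit online per object.

```python
def propagate_mask(cue_scores, weights, threshold=0.0):
    """Combine per-pixel cue scores into a propagated segmentation mask.

    cue_scores: dict mapping a cue name (e.g. "optical_flow", "depth_edge")
    to a list of per-pixel log-likelihood ratios (object vs. background).
    weights: per-cue weights, assumed to have been learned offline so the
    combination works for a randomly selected, as-yet-unseen object.
    """
    num_pixels = len(next(iter(cue_scores.values())))
    combined = [0.0] * num_pixels
    for cue, scores in cue_scores.items():
        w = weights[cue]
        for i, s in enumerate(scores):
            combined[i] += w * s
    # A pixel joins the propagated mask when the weighted evidence
    # that it belongs to the object exceeds the threshold.
    return [c > threshold for c in combined]

# Toy three-pixel example with two hypothetical cues.
cue_scores = {
    "optical_flow": [2.0, -1.0, 0.5],
    "depth_edge": [1.0, -2.0, -1.5],
}
weights = {"optical_flow": 1.0, "depth_edge": 0.5}
mask = propagate_mask(cue_scores, weights)
```

In practice a segmentation model would also enforce spatial coherence between neighboring pixels; this sketch omits that to isolate the learned cue-weighting step.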
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2014 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Teichman, Alex |
---|---|
Associated with | Stanford University, Department of Computer Science. |
Primary advisor | Thrun, Sebastian, 1967- |
Thesis advisor | Thrun, Sebastian, 1967- |
Thesis advisor | Ng, Andrew Hock-soon, 1972- |
Thesis advisor | Savarese, Silvio |
Subjects
Genre | Theses |
---|---|
Bibliographic information
Statement of responsibility | Alex Teichman. |
---|---|
Note | Submitted to the Department of Computer Science. |
Thesis | Thesis (Ph.D.)--Stanford University, 2014. |
Location | electronic resource |
Access conditions
- Copyright
- © 2014 by Alexander William Teichman
- License
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).