Scaling up object detection


Abstract/Contents

Abstract
Hundreds of billions of photographs are created on the web each year. An important step towards understanding the content of these photographs is to be able to understand all objects that are depicted. My research focuses on the problem of automatically naming and localizing objects in large collections of images. This is referred to as the task of object detection. The work in this thesis scales up object detection algorithms in both the number of images and the number of objects that can be recognized. I developed efficient object detection algorithms that can be applied to large image collections, and studied the use of shareable generic object attribute descriptions that can effectively describe a variety of object classes without learning individual class appearance models. The key roadblock to scaling up object detection is that extensive manual annotation is required for training the models, which can be very time-consuming and expensive. To address this roadblock, my colleagues and I created the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). ILSVRC serves as a benchmark for large-scale object recognition for hundreds of international research teams. I led the effort to construct the object detection benchmark, scaling up by more than an order of magnitude compared to previous datasets (e.g., PASCAL VOC). The construction of this dataset required developing novel crowd engineering techniques for reducing annotation cost. The availability of this large-scale data led to a revolution in object detection algorithms. I performed a detailed analysis of the current state of the field of object recognition, providing insights for future research efforts.
Thinking ahead about scaling up object detection even further, I developed a framework for bringing together the state-of-the-art automatic large-scale object detection with state-of-the-art crowd engineering techniques into a principled human-in-the-loop framework for accurately and efficiently localizing objects in images.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English

Creators/Contributors

Associated with Russakovsky, Olga
Associated with Stanford University, Department of Computer Science.
Primary advisor Li, Fei Fei, 1976-
Thesis advisor Li, Fei Fei, 1976-
Thesis advisor Bernstein, Michael
Thesis advisor Lin, Yuanqing, (Department Head of Media Analytics)

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Olga Russakovsky.
Note Submitted to the Department of Computer Science.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

Copyright
© 2015 by Olga Russakovsky
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
