Scaling up object detection
Abstract/Contents
- Abstract
- Hundreds of billions of photographs are created on the web each year. An important step towards understanding the content of these photographs is to be able to understand all objects that are depicted. My research focuses on the problem of automatically naming and localizing objects in large collections of images. This is referred to as the task of object detection. The work in this thesis scales up object detection algorithms in both the number of images and the number of objects that can be recognized. I developed efficient object detection algorithms that can be applied to large image collections, and studied the use of shareable, generic object attribute descriptions to effectively describe a variety of object classes without learning individual class appearance models. The key roadblock to scaling up object detection is that extensive manual annotation is required for training the models, which can be very time-consuming and expensive. To address this roadblock, my colleagues and I created the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). ILSVRC serves as a benchmark for large-scale object recognition for hundreds of international research teams. I led the effort to construct the object detection benchmark, scaling up by more than an order of magnitude compared to previous datasets (e.g., PASCAL VOC). The construction of this dataset required developing novel crowd engineering techniques for reducing annotation cost. The availability of this large-scale data led to a revolution in object detection algorithms. I performed a detailed analysis of the current state of the field of object recognition, providing insights for future research efforts.
Thinking ahead about scaling up object detection even further, I developed a framework for bringing together the state-of-the-art automatic large-scale object detection with state-of-the-art crowd engineering techniques into a principled human-in-the-loop framework for accurately and efficiently localizing objects in images.
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2015 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Russakovsky, Olga |
---|---|
Associated with | Stanford University, Department of Computer Science. |
Primary advisor | Li, Fei Fei, 1976- |
Thesis advisor | Li, Fei Fei, 1976- |
Thesis advisor | Bernstein, Michael |
Thesis advisor | Lin, Yuanqing, (Department Head of Media Analytics) |
Subjects
Genre | Theses |
---|
Bibliographic information
Statement of responsibility | Olga Russakovsky. |
---|---|
Note | Submitted to the Department of Computer Science. |
Thesis | Thesis (Ph.D.)--Stanford University, 2015. |
Location | electronic resource |
Access conditions
- Copyright
- © 2015 by Olga Russakovsky
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).