Scaling up object detection

Russakovsky, Olga; Stanford University, Department of Computer Science.

Abstract/Contents

Abstract: Hundreds of billions of photographs are created on the web each year. An important step towards understanding the content of these photographs is to be able to understand all objects that are depicted. My research focuses on the problem of automatically naming and localizing objects in large collections of images. This is referred to as the task of object detection. The work in this thesis scales up object detection algorithms in both the number of images and the number of objects that can be recognized. I've developed efficient object detection algorithms which can be applied to large image collections, and studied the use of shareable generic object attribute descriptions that can effectively describe a variety of object classes without learning individual class appearance models. The key roadblock to scaling up object detection is that extensive manual annotation is required for training the models, which can be very time-consuming and expensive. To address this roadblock, my colleagues and I created the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). ILSVRC serves as a benchmark for large-scale object recognition for hundreds of international research teams. I led the effort to construct the object detection benchmark, scaling up by more than an order of magnitude compared to previous datasets (e.g., the PASCAL VOC). The construction of this dataset required developing novel crowd engineering techniques for reducing annotation cost. The availability of this large-scale data led to a revolution in object detection algorithms. I performed a detailed analysis of the current state of the field of object recognition, providing insights for future research efforts.
Thinking ahead about scaling up object detection even further, I developed a principled human-in-the-loop framework that brings together state-of-the-art automatic large-scale object detection and state-of-the-art crowd engineering techniques for accurately and efficiently localizing objects in images.

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2015
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Russakovsky, Olga
Associated with	Stanford University, Department of Computer Science.
Primary advisor	Li, Fei Fei, 1976-
Thesis advisor	Li, Fei Fei, 1976-
Thesis advisor	Bernstein, Michael
Thesis advisor	Lin, Yuanqing, (Department Head of Media Analytics)
Advisor	Bernstein, Michael
Advisor	Lin, Yuanqing, (Department Head of Media Analytics)

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Olga Russakovsky.
Note	Submitted to the Department of Computer Science.
Thesis	Thesis (Ph.D.)--Stanford University, 2015.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
