Image descriptor aggregation for efficient retrieval

Abstract/Contents

Abstract
As more and more visual content is created every day, it is critical to make sense of large image databases. Searching such databases using a query image is the goal of content-based image retrieval, or visual search. At the core of this task is the trade-off between search speed and accuracy. This work proposes to use aggregation to improve this trade-off. By aggregating information across images, we can directly compare a query image with sets of images, each set represented by a single descriptor. We show that this can make the search considerably faster with a very limited loss of accuracy. The main questions explored in this work relate to how best to perform this aggregation: how to choose which images should be aggregated, how to represent a set of images with a single descriptor, and finally how to index these descriptors so as to maximize the search speed. We show that it is beneficial to aggregate images that share similar characteristics, such as images captured from nearby viewpoints, and that the higher level of abstraction achieved by searching aggregated descriptors instead of original images allows for considerable speed gains by reducing the size of the database. Our next contribution is to show that improving the representation of a given set of image descriptors can lead to additional gains: the representation using generalized max pooling is much better for retrieval tasks. More complex parametric methods do not seem to show additional benefits compared to simple pooling of optimized image descriptors. Finally, we show the importance of indexing the aggregated descriptors into a well-chosen hierarchical structure that combines the benefits of a coarse database search at the higher levels of the hierarchy with the benefits of a fine database search at the lower levels. Together, these contributions drastically improve retrieval speed without degrading accuracy.
This study is rooted in a theoretical framework that we develop as a basis for aggregation, and we then show that the insights learned from this framework carry over to a wide range of real-world applications such as 3D object retrieval, indoor localization, and person re-identification.
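
The coarse-to-fine idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes plain sum pooling followed by L2 normalization in place of generalized max pooling, a two-level index instead of a deeper hierarchy, and toy 3-dimensional descriptors. All names (`aggregate`, `build_index`, `search`, the group and image ids) are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def aggregate(descriptors):
    """Sum-pool a set of descriptors into one L2-normalized descriptor.
    Assumption: simple sum pooling, not generalized max pooling."""
    pooled = [sum(column) for column in zip(*descriptors)]
    return l2_normalize(pooled)

def build_index(groups):
    """groups maps a group id to a list of (image id, descriptor) pairs.
    Returns one aggregated descriptor per group."""
    return {gid: aggregate([d for _, d in members])
            for gid, members in groups.items()}

def search(query, groups, index, top_groups=1):
    q = l2_normalize(query)
    # Coarse step: rank groups by similarity to their aggregated descriptor.
    ranked = sorted(index, key=lambda gid: dot(q, index[gid]), reverse=True)
    # Fine step: compare only against images in the best-ranked groups.
    candidates = [(img, d) for gid in ranked[:top_groups]
                  for img, d in groups[gid]]
    return max(candidates, key=lambda c: dot(q, l2_normalize(c[1])))[0]

# Toy database: two groups of images taken from nearby viewpoints.
groups = {
    "scene_a": [("a1", [1.0, 0.1, 0.0]), ("a2", [0.9, 0.2, 0.0])],
    "scene_b": [("b1", [0.0, 0.1, 1.0]), ("b2", [0.1, 0.0, 0.9])],
}
index = build_index(groups)
best = search([0.8, 0.3, 0.0], groups, index)
```

In this sketch the query is compared with two aggregated descriptors and then with the two images of the selected group, rather than with all four images. The saving is negligible here, but it grows with the number of images per group.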

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2019; ©2019
Publication date 2019
Issuance monographic
Language English

Creators/Contributors

Author Boin, Jean-Baptiste Roger Noel
Degree supervisor Girod, Bernd
Thesis advisor Girod, Bernd
Thesis advisor Wandell, Brian A
Thesis advisor Wetzstein, Gordon
Degree committee member Wandell, Brian A
Degree committee member Wetzstein, Gordon
Associated with Stanford University, Department of Electrical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Jean-Baptiste Boin.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis Ph.D. Stanford University 2019.
Location electronic resource

Access conditions

Copyright
© 2019 by Jean-Baptiste Roger Noel Boin
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
