Image descriptor aggregation for efficient retrieval
Abstract/Contents
- Abstract
- As more and more visual content is created every day, it is critical to make sense of large image databases. Searching such databases using a query image is the goal of content-based image retrieval, or visual search. At the core of this task is the trade-off between search speed and accuracy. This work proposes to use aggregation to improve this trade-off. By aggregating information across images, we can directly compare a query image with sets of images represented by a single descriptor. We show that this can make the search considerably faster with a very limited loss of accuracy. The main questions explored in this work relate to how best to perform this aggregation: how to choose which images should be aggregated, how to represent a set of images with a single descriptor, and finally how to index these descriptors so as to maximize the search speed. We show that it is beneficial to aggregate images that share similar characteristics, such as images captured from nearby viewpoints, and that the higher level of abstraction achieved by searching aggregated descriptors instead of original images allows for considerable speed gains by reducing the size of the database. Our next contribution is to show that improving the representation of a given set of image descriptors can lead to additional gains: the representation using generalized max pooling is much better for retrieval tasks. More complex parametric methods do not seem to show additional benefits compared to simple pooling of optimized image descriptors. Finally, we show the importance of indexing the aggregated descriptors into a well-chosen hierarchical structure that combines the benefits of a coarse database search at the higher levels of the hierarchy with the benefits of a fine database search at the lower levels. Together, these contributions drastically improve retrieval speed without degrading the accuracy.
This study is rooted in a theoretical framework that we develop as a basis for aggregation, and we then show that the insights learned from this framework carry over to a wide range of real-world applications such as 3D object retrieval, indoor localization, and person re-identification.
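
The abstract's idea of representing a set of images with a single descriptor can be illustrated with a small sketch. The code below is not taken from the thesis: it assumes L2-normalized global descriptors compared by dot product, and uses the commonly published ridge-regularized form of generalized max pooling. The function names, the regularization value `lam`, and the toy data are all hypothetical.

```python
import numpy as np


def sum_pool(descriptors):
    """Aggregate a set of descriptors (one per row) by summing them."""
    return np.asarray(descriptors, dtype=float).sum(axis=0)


def generalized_max_pool(descriptors, lam=1e-8):
    """Ridge-regularized generalized max pooling (assumed formulation).

    Finds a pooled vector p such that X @ p is close to 1 for every row
    of X, so each descriptor in the set matches the aggregate equally,
    however often similar descriptors occur in the set.
    """
    X = np.asarray(descriptors, dtype=float)      # shape (N, D)
    n = X.shape[0]
    gram = X @ X.T                                # (N, N) similarities
    # Dual form of the ridge solution: p = X^T (X X^T + lam I)^-1 1
    weights = np.linalg.solve(gram + lam * np.eye(n), np.ones(n))
    return X.T @ weights                          # shape (D,)


# Toy set: three near-duplicate views of one scene plus one rare view.
rng = np.random.default_rng(0)
frequent = rng.normal(size=16)
rare = rng.normal(size=16)
X = np.stack([frequent + 0.1 * rng.normal(size=16) for _ in range(3)] + [rare])
X /= np.linalg.norm(X, axis=1, keepdims=True)     # L2-normalize each row

# Similarity of every set member to each aggregated descriptor.
print("sum pooling:", X @ sum_pool(X))
print("generalized max pooling:", X @ generalized_max_pool(X))
```

With sum pooling, the near-duplicate views dominate the aggregate and the rare view scores lower against it. Generalized max pooling equalizes the scores, which is one plausible reason it suits retrieval when a query may resemble any member of the set.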
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2019; ©2019 |
Publication date | 2019; 2019 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Boin, Jean-Baptiste Roger Noel |
---|---|
Degree supervisor | Girod, Bernd |
Thesis advisor | Girod, Bernd |
Thesis advisor | Wandell, Brian A |
Thesis advisor | Wetzstein, Gordon |
Degree committee member | Wandell, Brian A |
Degree committee member | Wetzstein, Gordon |
Associated with | Stanford University, Department of Electrical Engineering. |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Jean-Baptiste Boin. |
---|---|
Note | Submitted to the Department of Electrical Engineering. |
Thesis | Thesis Ph.D. Stanford University 2019. |
Location | electronic resource |
Access conditions
- Copyright
- © 2019 by Jean-Baptiste Roger Noel Boin
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).