Introduction to Image Retrieval
Updated: Nov 21, 2018
In this post I will give an introduction to image retrieval, with an emphasis on Content-Based Image Retrieval (CBIR). This will be the first post of a series in which I will also review the state of the art and explain the work I did during my thesis.
Image retrieval has always been an interesting topic for researchers. It is defined as the process of searching for and fetching images from a dataset. Being able to obtain a visual representation of a concept is very useful and has many different applications.
It all started with keyword searches, where the user entered words as queries. These systems relied on finding images in documents or file names that contained those keywords.
But as the saying goes:
Sometimes an image is worth a thousand words!
The next iteration was visual query search. In this case, the user input is an image. The system is expected to analyze the image, extract its most characteristic features and return the most similar images. Common visual similarity criteria are based on color distribution, texture or shape attributes.
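As a toy illustration of color-based similarity (a sketch I made up, not any particular system's method), histogram intersection compares the color distributions of two images:

```python
import numpy as np

def color_histogram(image, bins=8):
    """Concatenated per-channel histograms of an RGB image, normalized to sum to 1."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def histogram_intersection(h1, h2):
    """Overlap between two normalized histograms: 1.0 means identical distributions."""
    return np.minimum(h1, h2).sum()

# Two synthetic images with different dominant colors.
rng = np.random.default_rng(0)
reddish = rng.integers(0, 256, size=(64, 64, 3))
reddish[..., 0] = np.clip(reddish[..., 0] + 100, 0, 255)  # push the red channel up
bluish = rng.integers(0, 256, size=(64, 64, 3))
bluish[..., 2] = np.clip(bluish[..., 2] + 100, 0, 255)    # push the blue channel up

sim_same = histogram_intersection(color_histogram(reddish),
                                  color_histogram(reddish[:32]))  # crop of itself
sim_diff = histogram_intersection(color_histogram(reddish),
                                  color_histogram(bluish))
print(sim_same > sim_diff)  # True: the crop shares the color distribution
```

Note that a color histogram ignores spatial layout entirely, which is precisely why more discriminative texture and shape descriptors are also used.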
One of the main challenges in CBIR is finding image representations such that related images have a higher similarity score than dissimilar ones. One could think that a good and simple approach for obtaining similar images would be to compare them by their raw pixel values. However, this approach is not robust to changes in scale, translation or illumination. Besides, it is not efficient, as images have millions of pixels. For these reasons, images need to be characterized with a representation that is invariant to certain transformations. Another important and desirable property of these representations is compactness, that is, a small memory footprint: the more compact the representation, the lower the storage requirements and the faster the search.
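To see how fragile raw-pixel comparison is under translation, consider a minimal toy example (made up for illustration): a white square on a black background, compared against the same square shifted a few pixels.

```python
import numpy as np

# Toy image: a white square on a black background.
img = np.zeros((16, 16))
img[4:8, 4:8] = 1.0

shifted = np.roll(img, shift=6, axis=1)  # the same square, moved 6 px right
blank = np.zeros((16, 16))               # an empty image

# Euclidean distance on raw pixel values.
d_shifted = np.linalg.norm(img - shifted)
d_blank = np.linalg.norm(img - blank)

print(d_shifted > d_blank)  # True: the identical-but-shifted image scores
                            # as farther away than a completely empty one
```

A human would call the shifted copy the same image, yet pixel-wise distance ranks an empty image as more similar, which is exactly the kind of failure an invariant representation must avoid.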
In general, image retrieval systems first encode and index the dataset of images over which the search will be performed. Then, when the user poses a query, a similarity search is carried out, comparing the query representation with the ones stored in the database. After sorting the computed scores, a ranked list of images is returned. Finally, a post-processing step is usually applied to refine the initial search. An example image retrieval pipeline is depicted in the image below:
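The offline-indexing and query-time steps can be sketched in a few lines. The `encode` function below is a hypothetical stand-in for whatever feature extractor the system uses; here it is just a toy intensity histogram.

```python
import numpy as np

def encode(image):
    """Hypothetical encoder standing in for any feature extractor:
    an L2-normalized 16-bin intensity histogram."""
    hist, _ = np.histogram(image, bins=16, range=(0.0, 1.0))
    desc = hist.astype(float)
    return desc / np.linalg.norm(desc)

# 1. Offline: encode every dataset image and store the descriptors.
rng = np.random.default_rng(0)
dataset = [rng.random((32, 32)) for _ in range(100)]
index = np.stack([encode(img) for img in dataset])

# 2. Query time: encode the query and compare it against the index.
query = dataset[42].copy()       # query with a copy of a dataset image
scores = index @ encode(query)   # dot product = cosine (unit vectors)

# 3. Sort the scores to produce the ranked list of results.
ranking = np.argsort(-scores)
print(ranking[0])  # image 42 is an exact match, so it ranks first
```

Real systems replace the brute-force dot product with approximate nearest-neighbor indexes so the search scales to millions of images.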
In the first retrieval systems, features were handcrafted to satisfy these invariance properties. More recently, there has been a significant performance improvement from using deep learning models as feature extractors (learned features). In the latter case, two main ways of tackling the problem have been proposed. On the one hand, transferring the knowledge of general models trained for image classification, a task with plenty of labeled datasets available to train models with very good performance. On the other hand, training models on image retrieval datasets with labels (or soft labels) and a retrieval-based loss function. This latter approach gives the best results, but it has to confront some of the following challenges: