Findings on ranking evaluation functions for feature weighting in image retrieval
© Silva et al.; licensee Springer. 2014
Received: 19 February 2013
Accepted: 2 December 2013
Published: 6 March 2014
There are substantial benefits to be gained from ranking optimization in several information retrieval and recommendation systems. However, ranking evaluation functions (REFs), which play a major role in many ranking optimization models, need further investigation. Our analysis of previous studies on REFs found evidence that the choice of a proper REF is context-sensitive.
In this study, we analyze a broad set of REFs for feature weighting aimed at increasing image retrieval effectiveness. Ten REFs were analyzed, including the most successful and representative REFs from the literature. The REFs were embedded into a genetic algorithm (GA)-based relevance feedback (RF) model, called WLSP-C ±, which aims to improve image retrieval results by learning weights for image descriptors and image regions.
Analyses of precision-recall curves on five real-world image data sets showed that one non-parameterized REF, named F5 and not analyzed in previous studies, outperformed the recommended ones, which require parameter adjustment. We also provide a computational analysis of the GA-based RF model investigated, showing that its cost is linear in the cardinality of the image data set.
We conclude that REF F5 should be investigated in other contexts and problem scenarios centered on ranking optimization, as ranking optimization techniques rely heavily on the ranking quality measure.
Ranking optimization research studies have fostered widespread developments in information retrieval and recommendation systems [1–6]. Ranking optimization techniques can be grouped into three main classes: rank learning [2, 4, 5], rank aggregation (also known as data fusion) [7–10] and ranking (or list) diversification [1, 11, 12]. Rank learning relies on supervised queries, relevance feedback or context information to achieve an adequate model to rank items like web pages, images, etc. Normally, rank aggregation is an unsupervised method that relies on multi-criteria ranks and tries to combine them to produce a consensus rank. On the other hand, ranking or list diversification aims at balancing ‘precision’ and ‘diversity’ to reflect a broad spectrum of user interests concerning items.
Rank learning tasks are generally stated as optimization problems: finding the best model (or the best adjustment of a given model), according to some representation, for ranking items. Given this general formulation, rank learning solutions normally apply a search method guided by some ranking evaluation function. Ranking evaluation functions (REFs) are normally computed from supervised queries or user relevance feedback (RF), and they evaluate models or adjustments according to the effectiveness of the ranking produced. Regarding search methods, most research studies have employed evolutionary algorithms (EAs). The flexibility of EAs enables rank learning to be modeled in many ways, such as ranking function discovery [5, 13, 14] and weight and parameter learning [15–19], among others. Independently of the model representation, a proper evaluation function is very important for the effectiveness and efficiency of EAs.
Although REFs were shown to play a major role in rank learning almost a decade ago [13, 15–17], recent studies have paid little attention to the design and selection of more appropriate ones. Researchers have chosen popular REFs and applied them to new contexts and models without any theoretical or empirical evidence of their suitability. Moreover, few studies have focused on rank learning for image retrieval tasks, and the existing ones are not deep enough and do not cover the full spectrum of models employed in this area.
López-Pujalte et al. [15–17] studied the problem of adapting document descriptions by learning terms, weights and parameters of matching functions applied to information retrieval. These studies mainly investigated the use of different REFs as fitness functions for genetic algorithms (GAs) in relevance feedback. By analyzing the mean precision at three recall levels, they showed that retrieval effectiveness varied widely depending on which REF was used. These studies also found that utility theory-based ranking evaluation functions (UTB-REFs) are the most adequate kind of REF for rank learning applications. Moreover, the REF named F4 in the present study was recommended by López-Pujalte et al. in  as a promising one.
Fan et al.  compared seven UTB-REFs on ranking function discovery for web search using genetic programming (GP). Their experiments on a large web corpus revealed that some UTB-REFs, named F9, F7, F8 and F3 in the present study, were more effective in guiding the GP search than the others analyzed. In a subsequent investigation, Fan et al.  used the UTB-REF named F10 in this paper to increase the precision of information retrieval in two steps: first, by discovering new ranking functions using genetic programming; second, by combining the document retrieval scores of different ranking functions using genetic algorithms. The use of UTB-REF F10 was justified on the grounds that it is a standard performance measure in information retrieval studies.
Torres et al.  used GP to discover functions that combine different descriptors for content-based image retrieval (CBIR) tasks. Their method relies on a training set containing query images together with the images relevant to each query, and on a REF that guides the GP search towards a proper combination function. In this context, the authors tested seven UTB-REFs as fitness functions in the GP, the same UTB-REFs used by Fan et al. in . The UTB-REFs that produced the best results are named F6, F7 and F4 in this paper. Ferreira et al.  proposed a method similar to that of  but using RF instead of a training set of queries. That study does not compare REFs and uses the UTB-REF F4, due to its promising results in .
Stejić et al.  used a GA-based RF model to improve image retrieval results by applying learning weights to image descriptors and image regions (WLSP-C ± model). This study presented promising approaches such as the concept of local similarity patterns (LSP) and the use of continuous positive and negative weights modeling relevance and undesirability of visual features. In spite of the promising features of the model, the authors did not provide an effective mechanism for learning a proper set of weights. The use of the R-precision measure without any other REF analysis is the most critical aspect of the Stejić et al. research, as other studies had shown that UTB-REFs are more appropriate for such ranking modeling.
Silva et al.  extended the WLSP-C ± model of Stejić et al.  by proposing a new UTB-REF to replace the R-precision measure used as the objective (fitness) function of the GA. Their results showed a significant improvement in image retrieval precision and in efficiency, as the proposed UTB-REF sped up the GA search towards optimal solutions.
As the studies reported above show, there is no consensus about which REF is best for many applications, and many studies have overlooked REF analysis. Even for the same task there is no consensus, as can be seen from the REF analyses performed by Fan et al.  and Torres et al. , which employed the same set of REFs. We will therefore show that there is room for development on this issue and that new studies should analyze broad sets of REFs, because the proper choice appears to be context-sensitive.
In this paper, we used the WLSP-C ± model proposed by Stejić et al.  and used in  to investigate a broad set of REFs for feature weighting aimed at improving image retrieval performance. The choice of the WLSP-C ± model was motivated by its promising results. The REFs were applied as fitness functions in a specialized GA for learning weights. Analyses of precision-recall curves on five real-world image data sets showed that the REF design plays a key role in the effectiveness and efficiency of the WLSP-C ± model. We also found that the non-parameterized REF proposed in  and named F5 in the present paper outperformed the recommended ones, which require parameter adjustment. This result indicates that REF F5 should be investigated in other contexts and problem scenarios centered on ranking optimization, mainly for image retrieval, as ranking optimization techniques rely heavily on the ranking quality measure.
The remainder of this paper is organized as follows. The ‘Methods’ Section describes the methodology employed for the analysis of REFs on the WLSP-C ± model. The ‘Results and discussion’ Section compares a broad set of REFs for feature weighting aimed at improving image retrieval and provides a computational complexity analysis of the model. The ‘Conclusions’ Section concludes the paper highlighting the main findings and implications of the present research.
When the user carries out a search, color, shape and texture feature vectors are extracted from the query image by the feature extraction module and compared, through similarity measures, with the feature vectors of the images stored in the database. The similarity measure module returns a similarity value S_I(q,i) for each image i in the database with respect to the query image q. The images are then sorted in decreasing order of similarity (ranking), and the top-ranked ones are shown to the user. If not satisfied with the result of the search, the user can provide feedback, indicating to the system the images that are relevant from his/her point of view. Based on this feedback, the GA-based relevance feedback mechanism adjusts the similarity measure to the user's criteria through image feature vector weighting (ω_F) and region weighting (ω_R). n_g corresponds to the number of generations of the genetic algorithm.
The retrieval process is based on the local similarity pattern, where the image areas are uniformly partitioned into regions, and the similarity between images is measured by corresponding region similarities. Similarity between regions, and therefore between images, is computed through three feature vectors (F) encoding properties of color, shape and texture, represented by color moments, edge direction histogram and texture neighborhood, respectively. The distance between pairs of color feature vectors is computed by Euclidean distance, while distances between pairs of shape and texture feature vectors are computed by city-block distance.
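The region-wise distance computation described above can be sketched as follows. This is a minimal illustration, assuming each region is stored as a dict with hypothetical keys `color`, `shape` and `texture` holding plain lists of numbers; feature extraction itself (color moments, edge direction histogram, texture neighborhood) is not shown.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance, used for color-moment vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    """City-block (L1) distance, used for shape and texture vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Descriptor order and distance pairing as described in the text.
DESCRIPTORS = (("color", euclidean), ("shape", city_block), ("texture", city_block))


def region_distances(query_regions, image_regions):
    """Return an m x 3 list of distances, one row per region, one column per descriptor.

    Both arguments are lists of m region dicts (hypothetical layout); region r of
    the query is compared with region r of the database image.
    """
    return [[dist(qr[name], ir[name]) for name, dist in DESCRIPTORS]
            for qr, ir in zip(query_regions, image_regions)]
```

Pairing region r with region r reflects the uniform grid partition: corresponding regions of the two images are compared directly.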
To make comprehension easier, we present in the next subsections a detailed description of the WLSP-C ± model and the GA-based RF mechanism. Then, we describe the analyzed ranking evaluation functions and also the employed image data sets.
WLSP-C ± model
The WLSP-C ± model is optimized by fitting the weights ω_R(r) and ω_F(r,f) so that retrieval accuracy, given the query image and the set of relevant images chosen by the user, is maximized. As in  and , we solve this optimization problem using a real-coded GA that infers weights in the range [-1, 1]. Continuous negative and positive weights allow the user's concepts of relevance, irrelevance and undesirability of image visual properties to be mapped, producing better results than positive weights alone, as shown in . Since the WLSP-C ± model gave the best results, we did not analyze in this study the other models proposed by Stejić et al. in .
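Equation 1 is not spelled out in this text. As a hedged illustration only, assume that it is a weighted sum in which each region r contributes ω_R(r) times a weighted sum of its per-descriptor similarities, weighted by ω_F(r,f). The distance-to-similarity mapping 1/(1 + d) is likewise a hypothetical choice, not taken from the paper. The sketch shows how weights in [-1, 1] let a descriptor or region raise or lower an image's score.

```python
def distance_to_similarity(d):
    """Hypothetical monotone mapping from a distance d >= 0 to a similarity in (0, 1]."""
    return 1.0 / (1.0 + d)


def weighted_image_similarity(distances, w_region, w_feature):
    """Sketch of a weighted region/descriptor similarity S_I(q, i) (assumed form).

    distances[r][f]: distance between query and image for region r, descriptor f
    w_region[r]:     region weight omega_R(r) in [-1, 1]
    w_feature[r][f]: descriptor weight omega_F(r, f) in [-1, 1]
    """
    total = 0.0
    for r, row in enumerate(distances):
        # Weighted sum of descriptor similarities inside region r.
        region_score = sum(w_feature[r][f] * distance_to_similarity(d)
                           for f, d in enumerate(row))
        # Negative region weights turn a similar region into a penalty (undesirability).
        total += w_region[r] * region_score
    return total
```

For a single region with distances `[0.0, 1.0, 3.0]` and descriptor weights `[1.0, -1.0, 0.5]`, a region weight of 1.0 gives 0.625, while a region weight of -1.0 gives -0.625.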
The GA-based RF mechanism
Our RF mechanism relies on a GA designed and tuned for weight learning in . Algorithm 1 describes the main steps of the GA. The chromosome coding is similar to that employed in . As each image is partitioned into m regions, each chromosome (C) contains m genes (G1, G2, G3, …, G_m). Each gene (G_i) contains a vector of four weights: the first quantifies the importance of the region, and the other three quantify the importance of the color, shape and texture descriptors, respectively. We tested m = 4, m = 9, m = 16 and m = 25; the best result in these empirical tests was obtained with m = 16, which was adopted as the default.
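The chromosome layout just described (m genes of four real weights each) can be sketched as below. The uniform random initialization in [-1, 1] and the helper names `random_chromosome` and `decode` are illustrative assumptions, not the paper's initialization scheme.

```python
import random

M_REGIONS = 16        # default number of regions reported in the text
WEIGHTS_PER_GENE = 4  # region weight followed by color, shape and texture weights


def random_chromosome(m=M_REGIONS, rng=random):
    """A chromosome is a list of m genes; each gene holds four real weights in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(WEIGHTS_PER_GENE)] for _ in range(m)]


def decode(chromosome):
    """Split the genes into region weights omega_R(r) and descriptor weights omega_F(r, f)."""
    w_region = [gene[0] for gene in chromosome]
    w_feature = [gene[1:] for gene in chromosome]  # [color, shape, texture] per region
    return w_region, w_feature
```

Passing a seeded `random.Random` instance as `rng` makes the initial population reproducible.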
Algorithm 1 GA-based RF algorithm
Ranking evaluation functions
We compared ten REFs: two not based on utility theory (nUTB-REFs) and eight based on utility theory (UTB-REFs). UTB-REFs rely on the utility concept, in which the score contributed by a relevant element of the ranking is usually inversely proportional to its position; that is, the higher a relevant element is ranked, the higher its utility. Non-utility theory-based REFs (nUTB-REFs) are REFs that do not strictly follow the utility concept.
A REF plays the role of the GA fitness function and is applied as described in Algorithm 2. First, the image similarities (Equation 1) between the query image and each image in the data set are computed using the weights encoded by the individual . Then, the images are sorted by similarity value to form a ranking. Finally, a ranking evaluation function is applied to the ranking to obtain the fitness value. Below, we describe the ranking evaluation functions analyzed, grouped into two categories: nUTB-REF and UTB-REF. In the definitions,  denotes the fitness value of the individual for the query q, I represents the image data set, |I| denotes the cardinality of I, D represents the set of images known to be relevant to a query q, |D| denotes the cardinality of D, and pos(i) returns the position (rank) of the image i in the ranking.
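To illustrate the utility concept, the sketch below compares a position-insensitive score (precision over the top n positions) with a hypothetical utility score in which each relevant item contributes 1/pos(i). Neither function is one of F1 to F10; they only show why two rankings with the same number of relevant images near the top can receive different utility-based scores.

```python
def precision_at(ranking, relevant, n):
    """Fraction of the top-n items that are relevant; ignores where they appear."""
    return sum(1 for item in ranking[:n] if item in relevant) / n


def reciprocal_rank_utility(ranking, relevant):
    """Hypothetical utility: each relevant item at 1-based position p adds 1/p."""
    return sum(1.0 / pos for pos, item in enumerate(ranking, start=1) if item in relevant)


relevant = {"a", "b", "c"}
others = [f"x{k}" for k in range(1, 8)]  # seven non-relevant images
top = ["a", "b", "c"] + others           # relevant images ranked 1 to 3
bottom = others + ["a", "b", "c"]        # relevant images ranked 8 to 10
```

For these two hypothetical rankings, precision over the first 10 positions is 0.3 in both cases, while the utility score is about 1.83 for `top` and about 0.34 for `bottom`.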
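The three steps of Algorithm 2 can be sketched generically as below. The callables `similarity` (standing in for Equation 1) and `ref` (any REF) are hypothetical parameters introduced for illustration; in the toy usage, images and weights are replaced by plain numbers.

```python
def fitness(individual, query, images, relevant, similarity, ref):
    """Score the ranking induced by one individual's weights (sketch of Algorithm 2).

    similarity(query, image, individual) -> float  (stands in for Equation 1)
    ref(ranking, relevant) -> float                (any ranking evaluation function)
    """
    # First: similarity between the query and every image under this individual's weights.
    scored = [(similarity(query, img, individual), img) for img in images]
    # Then: sort in decreasing order of similarity to form the ranking.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    ranking = [img for _, img in scored]
    # Finally: evaluate the ranking with the chosen REF.
    return ref(ranking, relevant)


# Toy usage: similarity is higher the closer an image's value is to the individual's value.
def toy_similarity(query, img, w):
    return -abs(img - w)


def top1(ranking, relevant):
    """Toy REF: 1.0 if the top-ranked image is relevant, else 0.0."""
    return 1.0 if ranking[0] in relevant else 0.0
```

With images `[0, 1, 2, 3]` and relevant set `{2}`, the individual `2` ranks image 2 first and gets fitness 1.0, while the individual `0` gets 0.0. Sorting only on the similarity value keeps the sketch independent of whether images themselves are comparable.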
Algorithm 2 Fitness function employment
Non-utility theory-based fitness functions
The non-utility theory-based fitness functions are as follows:
Fitness function F1. This fitness function is given by the R-precision measure, a well-known measure of information retrieval effectiveness:

$$F_1 = \frac{\left|\{\, i \in D : \mathrm{pos}(i) \le n_R \,\}\right|}{n_R} \qquad (2)$$

where n_R is the number of elements considered in the query answer.
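A minimal sketch of F1, assuming it counts the relevant images (set D) among the first n_R positions of the ranking and divides by n_R. When n_R is not supplied, the function falls back to the classical R-precision choice n_R = |D|; this default is an assumption, not something stated in the text.

```python
def r_precision(ranking, relevant, n_r=None):
    """Fraction of relevant images among the top n_R positions of the ranking."""
    if n_r is None:
        n_r = len(relevant)  # classical R-precision: n_R = |D| (assumed default)
    if n_r <= 0:
        return 0.0
    hits = sum(1 for img in ranking[:n_r] if img in relevant)
    return hits / n_r
```

Note that the score depends only on how many relevant images fall inside the cutoff, not on their order within it.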
Fitness function F2. This function is based on the numbers of true positives (Rr, relevant and retrieved items), false positives (Rn, retrieved but non-relevant items) and false negatives (Nr, non-retrieved relevant items), as given by Equation 3.
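The three counts that F2 combines can be obtained from a ranking as below, assuming that the first `n_retrieved` positions (a hypothetical parameter) are treated as retrieved. The sketch does not implement Equation 3 itself; it only produces its inputs.

```python
def retrieval_counts(ranking, relevant, n_retrieved):
    """Return (Rr, Rn, Nr) for the top n_retrieved images of a ranking.

    Rr: relevant and retrieved; Rn: retrieved but non-relevant;
    Nr: relevant but not retrieved. `relevant` must be a set.
    """
    retrieved = set(ranking[:n_retrieved])
    rr = len(retrieved & relevant)
    rn = len(retrieved - relevant)
    nr = len(relevant - retrieved)
    return rr, rn, nr
```

Unlike a utility-based REF, these counts do not change if the retrieved images are reordered among themselves.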
Utility theory-based fitness functions
Utility theory-based fitness functions (UTB-FFs) are fitness functions based on UTB-REFs. We analyzed eight UTB-FFs (F3 to F10), defined as follows:
Fitness function F3 (Equation 4).
Fitness function F4 (Equation 5),
where A is a user-defined parameter with values greater than or equal to 2.
Fitness function F5 (Equation 6).
Fitness function F6 (Equation 8),
where k1 and k2 are user-defined parameters.
Fitness function F7 (Equation 9),
where k3 is a user-defined parameter.
Fitness function F8 (Equation 10),
where k4, k5, k6 and k7 are user-defined parameters.
Fitness function F9 (Equation 11),
where k8 and k9 are user-defined parameters.
Fitness function F10 (Equation 12),
where $r\big(\arg_{i}[\mathrm{pos}(i) = j]\big)$ returns 1 if the image i in the jth position of the ranking is relevant, and 0 otherwise.
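Equation 12 is built from the position indicator r(·). As an illustration of how indicator-based measures of this kind are computed, and without claiming that the sketch reproduces F10 exactly, the code below implements average precision, a standard information retrieval measure that can be written with the same indicator.

```python
def relevance_indicator(ranking, relevant, j):
    """r(j): 1 if the image at (1-based) position j of the ranking is relevant, else 0."""
    return 1 if ranking[j - 1] in relevant else 0


def average_precision(ranking, relevant):
    """Average precision expressed with the indicator r(j) (illustration only)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for j in range(1, len(ranking) + 1):
        if relevance_indicator(ranking, relevant, j):
            hits += 1
            total += hits / j  # precision at position j, counted only where r(j) = 1
    return total / len(relevant)
```

For the ranking `["a", "x", "b"]` with relevant images `{"a", "b"}`, the precision values at the relevant positions are 1 and 2/3, so the result is their mean, about 0.83.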
Fitness functions F3 and F4 were used in  for learning weights structured according to the vector space model, in the context of textual information retrieval. The fitness function F5 was proposed in , and the functions F6 to F10 were used in  and  for GP-based ranking function discovery to improve textual information retrieval and CBIR tasks, respectively.
Results and discussion
Previous studies on rank learning methods [5, 13, 17, 20] show that, in general, UTB-REFs lead to more precise information retrieval results than nUTB-REFs. These studies also show that the design of the UTB-REF by itself significantly affects the information retrieval results. In our study, we performed a systematic investigation of REFs for descriptor/region weighting in image retrieval using the successful WLSP-C ± model (Equation 1). Although our comparison of REFs was in line with the results reported in the literature, we found better results with the UTB-REF F5, which has not been investigated in other research studies.
Area under the precision-recall curve referred to in Figure 2, bounded at 25%, 50% and 75% recall, for the data sets Vistex-167, Corel-1000, DB-10000, Scenes-1044 and Caltech101-8872
Scores assigned for two hypothetical rankings with 31 retrieved images
Also, with reference to Figure 2, we found that the P&R graphs obtained using UTB-REFs (F3–F10) are noticeably different from those obtained using nUTB-REFs (F1 and F2). In general, the UTB-REFs produced substantially higher precision values than the nUTB-REFs at low recall rates. This is a very important aspect that has not been discussed by other researchers. Utility theory-based evaluation functions enable this sort of result because they appropriately model user requirements regarding ranking quality, rewarding relevant images placed at the top of the ranking.
Average number of GA generations and computational time (in seconds) spent, by data set category
Finally, we analyzed the computational complexity of the RF technique and found that it is linear in the number of images in the data set. We counted the number of similarity operations (Equation 1) computed by the fitness function during the evolutionary process, as the similarity computation is the most expensive operation in the RF process.
In Algorithm 1, step 1 has complexity O(1), as it does not depend on the number of images in the data set. In step 2, the fitness score of each individual is computed using Algorithm 2. In Algorithm 2, the image similarity operation (step 2) clearly takes O(n) time, where n is the number of images in the data set. Step 3 takes O(n log n) time to sort the similarity values of n images. However, an image similarity operation takes significantly more computational time than the value comparisons and exchanges performed by sorting algorithms, even for image data sets that would be considered extremely large today (several million elements or more). Thus, we take as the main operation of Algorithm 2, i.e., the time unit, the similarity computations performed by the query process, whose number grows as O(n).
Returning to Algorithm 1, each of steps 3 to 7 has complexity O(1) for the same reason as step 1. In summary, as the fitness function is applied a constant number of times, determined by the population size, number of generations and crossover rate, the GA-based RF algorithm is O(1)·O(n) = O(n), i.e., linear. Note that the constant hidden in the O(1) term can be large, depending on the GA parameters. However, the fitness evaluations within each GA generation can be performed in parallel.
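The areas reported for Figure 2, bounded at 25%, 50% and 75% recall, can be approximated from a single ranking as sketched below. The trapezoidal rule, the linear interpolation at the recall bound and the assumption that the curve starts at recall 0 with the first precision value are illustrative choices, not the documented procedure behind the reported areas.

```python
def precision_recall_points(ranking, relevant):
    """(recall, precision) after each position of the ranking; `relevant` must be non-empty."""
    points, hits = [], 0
    for k, img in enumerate(ranking, start=1):
        if img in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))
    return points


def area_up_to_recall(points, max_recall):
    """Trapezoidal area under (recall, precision) points, truncated at max_recall."""
    if not points:
        return 0.0
    area = 0.0
    prev_r, prev_p = 0.0, points[0][1]  # assume the curve starts at recall 0
    for r, p in points:
        if r <= prev_r:                 # recall unchanged: no width, precision just moves
            prev_p = p
            continue
        if r >= max_recall:             # segment crosses the bound: interpolate and stop
            t = (max_recall - prev_r) / (r - prev_r)
            p_cut = prev_p + t * (p - prev_p)
            return area + (max_recall - prev_r) * (prev_p + p_cut) / 2
        area += (r - prev_r) * (prev_p + p) / 2
        prev_r, prev_p = r, p
    return area
```

Bounding the area at low recall, as in the 25% column, emphasizes the top of the ranking, which is where UTB-REFs were observed to differ most from nUTB-REFs.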
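The linearity argument can be checked empirically with the generic sketch below, which counts how many times a similarity function is called. It is not Algorithm 1: the toy data, the relevant set, the reciprocal-rank fitness, binary tournament selection, arithmetic crossover and clipped Gaussian mutation are all illustrative assumptions. Because Python's `sorted` calls its key function once per element, each fitness evaluation costs n similarity calls, so the total is population size × generations × n.

```python
import random


def run_ga(n_images, pop_size=10, generations=5, n_weights=4, seed=0):
    """Generic real-coded GA for weight learning; returns the number of similarity calls."""
    rng = random.Random(seed)
    # Toy data (hypothetical): each image is a vector of n_weights feature values.
    images = [[rng.random() for _ in range(n_weights)] for _ in range(n_images)]
    query = images[0]
    relevant = set(range(min(3, n_images)))
    calls = 0

    def similarity(weights, image):
        """Weighted negative L1 distance; counts every call."""
        nonlocal calls
        calls += 1
        return -sum(w * abs(a - b) for w, a, b in zip(weights, query, image))

    def fitness(weights):
        """Rank all images (n similarity calls), then score with a reciprocal-rank utility."""
        order = sorted(range(n_images), key=lambda i: similarity(weights, images[i]), reverse=True)
        return sum(1.0 / pos for pos, i in enumerate(order, start=1) if i in relevant)

    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]

        def pick():
            """Binary tournament selection."""
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[a] if scores[a] >= scores[b] else population[b]

        offspring = []
        for _ in range(pop_size):
            p1, p2 = pick(), pick()
            alpha = rng.random()
            child = [alpha * x + (1 - alpha) * y for x, y in zip(p1, p2)]         # arithmetic crossover
            child = [min(1.0, max(-1.0, g + rng.gauss(0.0, 0.1))) for g in child]  # clipped mutation
            offspring.append(child)
        population = offspring
    return calls
```

With the defaults (10 individuals, 5 generations), `run_ga(100)` makes 5,000 similarity calls and `run_ga(200)` makes 10,000: doubling the data set doubles the work, while the GA parameters only scale the constant.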
As shown in many research studies, the objective function plays a crucial role in ranking optimization. In this study, we present an up-to-date investigation of ranking evaluation functions (REFs), a special class of objective functions employed in rank learning methods aimed at precise information retrieval. Using a GA-based RF method as a rank learning mechanism for image retrieval, we analyzed ten REFs, including the most successful REFs from previous comparative studies as well as some functions not previously investigated.
We analyzed precision-recall curves on five real-world image data sets. Our results were in line with those reported in the literature, showing that REF design plays a decisive role in rank learning; in addition, we found that the UTB-REF named here F5, which was not included in previous studies comparing REFs, provided better results than the recommended REFs. Moreover, unlike the previously recommended REFs, F5 does not require any parameter. We also found that UTB-REFs are the most appropriate class of REF for top-ranking optimization. Another important observation is that the time spent in the ranking optimization process when using a proper UTB-REF, such as F5, is significantly lower than when using a well-known nUTB-REF, such as the R-precision measure. Finally, to assess the strength of GA search for this optimization task, we compared it with multistart (MS) search and found that the GA performed significantly better, which shows that GA search is effective for learning weights through RF to optimize image retrieval results.
Our results add to those in the literature by providing a categorization and a systematic analysis of REFs and by confirming that REF design plays a key role in rank learning. To the best of our knowledge, this is the first study to investigate the importance of REFs in feature weighting for CBIR tasks.
As REFs play a key role in many ranking optimization tasks, our results indicate that REF F5 could be effectively applied in other contexts and applications focused on ranking optimization, such as recommender systems: the idea here is to provide recommendations sorted according to their expected utility, such as user rating and/or similarity according to the user’s interests. Also, we put together and compared a broad set of REFs that can be used for future research in the ranking optimization field.
We thank CNPq, CAPES and FAPESP for the financial support.
- Adomavicius G: Improving aggregate recommendation diversity using ranking-based techniques. IEEE Trans Knowl Data Eng 2012, 24(5):896–911.
- Liu TY: Learning to rank for information retrieval. Foundations Trends Inf Retrieval 2009, 3(3):225–231.
- Pedronette D, Torres R: Exploiting contextual information for image re-ranking and rank aggregation. Int J Multimedia Inf Retrieval 2012, 1:1–14. 10.1007/s13735-012-0009-1
- Qin T, Liu TY, Xu J, Li H: LETOR: a benchmark collection for research on learning to rank for information retrieval. Inf Retrieval 2010, 13(4):346–374. 10.1007/s10791-009-9123-y
- Torres RS, Falcão AX, Gonçalves MA, Papa JP, Zang B, Fan W, Fox EA: A genetic programming framework for content-based image retrieval. Pattern Recognit 2009, 42(2):283–292. 10.1016/j.patcog.2008.04.010
- Vargas S, Castells P: Rank and relevance in novelty and diversity metrics for recommender systems. In Proceedings of the fifth ACM conference on recommender systems. Chicago, 23–27 October 2011; 2011:109–116.
- Ah-Pine J: On data fusion in information retrieval using different aggregation operators. Web Intell Agent Syst 2011, 9:43–55.
- Ailon N: Aggregation of partial rankings, p-ratings and top-m lists. Algorithmica 2008, 57(2):284–300.
- Lin S: Rank aggregation methods. Wiley Interdiscip Rev: Comput Stat 2010, 2(5):555–570. 10.1002/wics.111
- Nuray R, Can F: Automatic ranking of information retrieval systems using data fusion. Inf Process Manag 2006, 42(3):595–614. 10.1016/j.ipm.2005.03.023
- Drosou M, Pitoura E: Search result diversification. ACM SIGMOD Rec 2010, 39(1):41–47. 10.1145/1860702.1860709
- Santos R, Macdonald C, Ounis I: Exploiting query reformulations for web search result diversification. In Proceedings of the 19th international conference on World Wide Web, WWW ’10. Raleigh, 26–30 April 2010; 2010:881–890.
- Fan W, Fox EA, Pathak P, Wu H: The effects of fitness functions on genetic programming-based ranking discovery for web search. J Am Soc Inf Sci Technol 2004, 55(7):628–636. 10.1002/asi.20009
- Ferreira C, Santos J, Torres RS, Gonçalves M, Rezende R, Fan W: Relevance feedback based on genetic programming for image retrieval. Pattern Recognit Lett 2011, 32(1):27–37. 10.1016/j.patrec.2010.05.015
- López-Pujalte C, Guerrero-Bote VP, De Moya-Anegón F: Genetic algorithms in relevance feedback: a second test and new contributions. Inf Process Manag 2003, 39(5):669–687. 10.1016/S0306-4573(02)00044-4
- López-Pujalte C, Guerrero Bote VP, Moya-Anegón F: A test of genetic algorithms in relevance feedback. Inf Process Manag 2002, 38(6):793–805. 10.1016/S0306-4573(01)00061-9
- López-Pujalte C, Guerrero-Bote VP, Moya-Anegón F: Order-based fitness functions for genetic algorithms applied to relevance feedback. J Am Soc Inf Sci 2003, 54(2):152–160. 10.1002/asi.10179
- Silva SF, Barcelos CAZ, Batista MA: Adaptive image retrieval through the use of a genetic algorithm. In Proceedings of IEEE international conference on tools with artificial intelligence (ICTAI). Patras, 29–31 October 2007; 2007:557–564.
- Stejić Z, Takama Y, Hirota K: Genetic algorithms for a family of image similarity models incorporated in the relevance feedback mechanism. Appl Soft Comput 2003, 2:306–327. 10.1016/S1568-4946(02)00070-4
- Fan W, Pathak P, Zhou M: Genetic-based approaches in ranking function discovery and optimization in information retrieval: a framework. Decis Support Syst 2009, 47:398–407. 10.1016/j.dss.2009.04.005
- Massachusetts Institute of Technology Media Laboratory: Vistex database. 2005. http://vismod.media.mit.edu/pub/VisTex/ (last accessed 06 Feb 2014).
- James Z. Wang’s Research Group: Corel database: Corel Corporation, Corel Gallery 3.0. 2004. http://wang.ist.psu.edu/~jwang/test1.tar (last accessed 06 Feb 2014).
- Vision Lab. in Computer Science Department: 13 scene categories database. 2004. http://vision.stanford.edu/Datasets/SceneClass13.rar (last accessed 06 Feb 2014).
- Fei-Fei L, Fergus R, Perona P: Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In IEEE CVPR 2004 workshop on generative-model based vision. Piscataway: IEEE; 2004.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License(http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.