Applying graphical oracles to evaluate image segmentation results
© The Author(s) 2017
Received: 7 July 2015
Accepted: 22 December 2016
Published: 19 January 2017
Segmentation plays an important role in the pattern recognition and image processing areas. Several techniques have been proposed aiming at solving generic issues or particular applications. Traditionally, these techniques have been evaluated by using the Overlap measure, which verifies the coincident and non-coincident areas between the image resulting from a segmentation process and an image considered correct. Albeit widely used, this type of measure does not allow flexibility in the assessment process. We here propose an approach to evaluate segmentation techniques using concepts from content-based image retrieval and considering a methodology for testing generic programs with graphical outputs, named graphic oracle. Our approach was applied to evaluate the segmentation of mammographic images, and the results indicate a performance compatible with the traditional measure with more flexibility and precision. Thus, our approach provides a contribution to allow a more flexible segmentation assessment, according to image characteristics and application objectives.
Segmentation is the process of subdividing an image into its constituent parts or objects in order to isolate a region of interest. Segmentation is essential to most image processing and pattern recognition algorithms as well as applications in related areas.
In approaches involving computer-aided diagnosis (CAD), for example, this task is the basis for locating suspicious regions in various medical imaging modalities. Zheng et al. drew attention to studies showing that the effective segmentation of mammographic masses and microcalcifications is essential for developing CAD schemes. Image segmentation also plays an important role in recognizing biometric measurements, objects in satellite images, and plant structures in agriculture [5, 6].
A segmentation scheme implemented for a given purpose must undergo an evaluation process to verify its effectiveness in terms of the problem under consideration. This process is largely composed of software testing activities. When a component is executed under specific conditions, the results are observed, and some aspects of the component are evaluated.
Evaluating segmentation algorithms is a special case of testing programs with graphical outputs. This problem has additional complexity, as it is not easy to confirm whether an output image is correct according to the system requirements. For example, with respect to mammography CAD schemes, Zheng et al. stated that evaluating automated segmentation schemes is a difficult task and can be ineffective in the case of subtle masses with irregular diffuse edges and surrounded by dense breast tissues.
A systematic review showed that most studies that evaluated segmentation schemes used the Overlap measure in this task [8, 9] and that no new approach in the CAD context has appeared in the literature in recent years. However, some proposals are cited to evaluate generic segmentation, such as a set of scalable discrepancy measures, the computation of the difference between a region extracted from a segmentation map and the corresponding one on an ideal segmentation, a metric defined as a function of various error types, a measure built according to defined quality criteria, such as shape parameters and homogeneity criterion between regions, a metric based on the distance between segmentation partitions, and more recently probabilistic metrics.
A new software testing technique for programs with graphic outputs has been proposed. This technique applies concepts of content-based image retrieval (CBIR) to automate the testing oracles. Based on this approach, this paper aims to propose, implement, and validate a software testing methodology for evaluating image segmentation results. To reach this goal, we use a previously developed framework named O-FIm (Oracle For Images) as a support tool. The validation is conducted with the outputs of a breast-region segmentation scheme for mammographic images, which is part of a CAD system.
The evaluation results from the proposed methodology were compared to the results that used a methodology based on the Overlap measure under equivalent test conditions. Thus, the consistency of our methodology could be validated against the results of a traditional measure, and its advantages and limitations could be determined. In general, the results from applying the methodology based on graphic oracles proved to be compatible with those obtained using the methodology based on the Overlap measure. A significant advantage of our approach regards its flexibility, which allows adaptation to the test criteria set for the system under evaluation.
In addition to this introductory section, this paper is organized as follows. The section “Background” offers a background regarding the main concepts used in this article as well as a literature review about assessment of segmentation schemes. The section “Research design and methodology” presents the materials and methods used to carry out the work, which includes a description of the testing methodologies evaluated, the characteristics extracted from the images, the similarity functions used, and the comparison and evaluation of the methods and the results. The section “Results and discussion” presents and discusses the results of the various experiments conducted. Lastly, the section “Conclusions” shows the final remarks.
Segmentation, content-based image retrieval, and graphic oracles are the main concepts used herein, and are presented in the next three subsections. In the section “Evaluation of segmentation,” we provide a literature review about segmentation evaluation.
Image processing techniques have been used for several applications in a wide range of knowledge areas. A common classification of such techniques considers three categories: low, intermediate, and high levels. The first category comprises techniques to smooth noise and enhance structures of interest in an image. The last category is responsible for linking the information provided by the previous steps with a knowledge base. Segmentation is in the second category. It aims at isolating a region of interest in an image. Pixels of this region have common characteristics, which are usually related to aspects of an object represented in the image. Segmentation helps in image classification, since it allows identifying structures present in images.
There is no consensus about the classification of segmentation techniques. One of the most widely used is provided by Gonzalez and Woods, which categorizes the techniques into three generic classes: thresholding, edge detection, and region growing. The applicability of each technique depends on the image contrast and on the presence of noise. Segmentation is not a trivial task and must consider the goal of the application as well as the characteristics of the image. This explains the constant publication of new specific techniques in the literature.
Content-based image retrieval
The performance and accuracy of a CBIR system depend on the choice of suitable features to capture relevant information about the images. This is done in the feature extraction step, which usually occurs after a step of pre-processing and segmentation of the image. From then on, the extraction process assigns values for aspects inherent to the segmented object or the region of interest. For most applications, features for image content representation must be insensitive to variations in size, translation, and rotation. Several extractors can be implemented in a CBIR system, and each of them refers to one aspect of the image. The set of values resulting from the feature extraction composes a feature vector.
Feature vectors are indexed in the database. Thus, given a user’s query image, its feature vector is computed and compared to the feature vectors of images stored in the database.
To perform a similarity query on an image database, it is necessary to measure the distance between the feature vectors by using a similarity function. A similarity function is an algorithm that compares two feature vectors and returns a non-negative value. The most common process for this purpose uses metric distances such as the Euclidean distance.
The search process can involve the comparison between vectors with high dimensions. Thus, it can be necessary to optimize the performance of the queries by applying adequate index structures, particularly those driven by distance measures. Some works have proposed structures for this purpose [20, 21].
Graphic oracles and the O-FIm framework
The definition of a test oracle refers to an effective mechanism that tells the tester whether the output obtained for a given test is acceptable or not [22, 23]. Oracles are well defined for trivial domains, when inputs and outputs of the program are, e.g., numbers or texts, but for more complex domains, some challenges exist.
A new approach has been presented aiming to contribute to the testing of software with graphic outputs. Using concepts of CBIR, the authors propose to automate the testing oracles for this kind of program by extracting features from the output images and comparing them with a similarity metric. This approach was named graphic oracles.
These oracles can compare one output of the program under test against an image provided as a reference for the execution of this program. Thus, given the criteria related to the characteristics of the images, defined by the tester, the oracles give a verdict regarding the correctness of the output under examination. Additionally, those researchers proposed a tool to support the definition and use of graphic oracles, the O-FIm framework.
The core of the tool responds to commands to install feature extractors and functions of similarity (plugins) developed for specific image domains. It also provides an application programming interface (API) that allows creating oracles in a simple manner. To conduct a test, the tester must provide a textual description (oracle description) that indicates which are the components installed in the tool (similarity and extractor functions) to be used, as well as their parameters when needed. The O-FIm then uses this description to create the oracle.
As an example, consider a textual description of an oracle compatible with the framework in which a graphic oracle is defined by two feature extractors, MyExtractor and OurExtractor, each receiving the parameters necessary for its execution. The oracle definition also includes the Euclidean distance as the similarity function and a threshold value (precision) that indicates the maximum acceptable distance between two compared images for them to be considered equivalent. In the next sections, we refer to the threshold value in a graphic oracle as threshold.
The oracle descriptors are a simple way to tell the framework how a graphic oracle should be created in order to carry out a comparison during the execution of a specific test. Given this scenario, the plugins are the tester contributions to create the oracles necessary for the testing activity to be conducted.
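The way such an oracle combines extractors, a similarity function, and a precision value can be illustrated with a minimal Python sketch. It is not the O-FIm API: the class `GraphicOracle`, the two toy extractors `my_extractor` and `our_extractor`, and the image representation as nested lists are assumptions made only for illustration.

```python
import math
from typing import Callable, List, Sequence

Image = List[List[int]]  # grayscale image stored as rows of pixel values (0-255)
Extractor = Callable[[Image], float]
Similarity = Callable[[Sequence[float], Sequence[float]], float]


def my_extractor(image: Image) -> float:
    """Hypothetical extractor: fraction of pixels brighter than zero."""
    pixels = [p for row in image for p in row]
    return sum(1 for p in pixels if p > 0) / len(pixels)


def our_extractor(image: Image) -> float:
    """Hypothetical extractor: mean gray level scaled to [0, 1]."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / (255 * len(pixels))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class GraphicOracle:
    """Compares an output image with a reference image through feature vectors."""

    def __init__(self, extractors: List[Extractor], similarity: Similarity,
                 precision: float) -> None:
        self.extractors = extractors
        self.similarity = similarity
        self.precision = precision  # maximum acceptable distance (threshold)

    def feature_vector(self, image: Image) -> List[float]:
        # One value per installed extractor, in a fixed order.
        return [extract(image) for extract in self.extractors]

    def distance(self, output: Image, reference: Image) -> float:
        return self.similarity(self.feature_vector(output),
                               self.feature_vector(reference))

    def accepts(self, output: Image, reference: Image) -> bool:
        """Verdict: True when the two images are close enough to be equivalent."""
        return self.distance(output, reference) <= self.precision


reference = [[0, 255], [255, 255]]
output = [[0, 0], [255, 255]]
oracle = GraphicOracle([my_extractor, our_extractor], euclidean, precision=0.4)
print(oracle.accepts(output, reference))  # True (distance is about 0.354)
```

Keeping the extractors and the similarity function as interchangeable callables mirrors the plugin idea described above: changing the test criteria only requires passing a different list of extractors or a different precision.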
Evaluation of segmentation
We conducted a systematic review of CAD systems and metrics to evaluate segmentation in such systems. From a large number of papers retrieved, 10 detailed segmentation techniques and the evaluation metrics used in the testing stage. In this context, evaluation metrics refers to metrics that use quantitative data obtained from a system execution to attribute it a performance index.
Other metrics also used in the works included were specificity and sensitivity [27–29]. These metrics are part of a set of very traditional statistic metrics used for evaluating CAD systems. They are based on the concepts of true positive diagnoses (TP), true negatives (TN), false positives (FP), and false negatives (FN) . When applied to segmentation, an approach to use these metrics is to determine the pixels for TP (belonging to the region of interest and segmented), TN (do not belong to the region of interest, and were not segmented), FP (do not belong to the region of interest but were segmented), and FN (belong to the region of interest and were not segmented).
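The pixel-wise reading of TP, TN, FP, and FN described above can be sketched as follows. The function names and the mask representation (nested lists in which values greater than zero mark the region) are assumptions for illustration; sensitivity and specificity use their standard definitions, TP/(TP+FN) and TN/(TN+FP).

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: values > 0 belong to the region


def confusion_counts(segmented: Mask, reference: Mask) -> Tuple[int, int, int, int]:
    """Count TP, TN, FP, FN pixels of a segmentation against its reference."""
    tp = tn = fp = fn = 0
    for seg_row, ref_row in zip(segmented, reference):
        for s, r in zip(seg_row, ref_row):
            in_seg, in_ref = s > 0, r > 0
            if in_seg and in_ref:
                tp += 1  # in the region of interest and segmented
            elif not in_seg and not in_ref:
                tn += 1  # outside the region and not segmented
            elif in_seg:
                fp += 1  # outside the region but segmented
            else:
                fn += 1  # in the region but not segmented
    return tp, tn, fp, fn


def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(tn: int, fp: int) -> float:
    return tn / (tn + fp) if tn + fp else 0.0


reference = [[1, 1, 0], [0, 0, 0]]
segmented = [[1, 0, 1], [0, 0, 0]]
tp, tn, fp, fn = confusion_counts(segmented, reference)
print(tp, tn, fp, fn)  # 1 3 1 1
print(sensitivity(tp, fn), specificity(tn, fp))  # 0.5 0.75
```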
The metrics presented in this section refer to the generic approaches that can be, and are, used to evaluate segmentation schemes developed for different types of applications. These metrics were introduced at this point so that they could be compared with the approach proposed herein.
Research design and methodology
We proposed, applied, and evaluated a methodology based on graphic oracles to test a segmentation scheme that is part of a CAD system. We used the O-FIm framework as base technology to conduct all the experiments. The validation of the proposed methodology was performed by comparing its results with the results of a methodology based on the Overlap measure. Thus, we verified to what extent the graphic oracle approach based on CBIR can contribute to an effective evaluation of segmentation algorithms, as well as its advantages and disadvantages when compared to methods based on metrics such as those presented in the section “Evaluation of segmentation”.
First, we defined the case study and a graphic oracle was built for evaluating images of this case study. Next, we chose the similarity functions and the suitable features to be extracted from the images. The next step was to implement these artifacts in the O-FIm framework and to define a method to evaluate segmentation. Lastly, we compared the results with an Overlap-based evaluation approach.
The segmentation algorithm used in our tests automatically isolates the breast region in mammograms. The objective of this case study was to compare the results of an automated segmentation process with the results of a manual segmentation considered correct. Gray-level mammographic images were processed considering the steps described as follows. Firstly, the image is analyzed to find the center of mass and discover on which side the breast image is located (right or left). If the breast is located on the left side, a rotation is executed to put the image on the right side. Secondly, a thresholding is executed to transform the original image into a binary one, where the white region represents the breast and the black region is the background. Then, the center point of the right border is selected and, from it, radial lines are drawn from a determined interval of angles, from this point to the first black pixel found (first background pixel). Lastly, all the points found are joined to form the breast edge and all external points to this edge receive the black color.
We used 30 test cases of this scheme in the evaluation experiments. Two individuals manually segmented the breast region in the 30 original mammograms, thus generating two sets of ground truth (GT) composed of the reference images—in the context of our approach—in the test oracles.
The graphic oracles are configured in relation to similarity functions and feature extractors. Therefore, to conduct the case study, we implemented and included different functions and extractors in the O-FIm framework. Each experiment was repeated three times, using a different similarity function in each execution. The results of a previously conducted study  determined three major similarity function groups with similar behaviors. For this study, one function from each of these groups was selected.
We used three different metric distances in the experiments. They are presented as follows, where A and B represent the feature vectors of the two images being compared. In addition, n represents the number of feature extractors used to perform the comparisons.
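The three functions named in the result sections (Euclidean distance, Canberra distance, and the χ² statistical value) can be sketched in Python as below. The Euclidean and Canberra forms are the standard ones; the χ² form shown, the sum of (A_i − B_i)² / (A_i + B_i), is one common variant and is an assumption here, since the exact expression used in the study is not reproduced in this text. Terms with a zero denominator are skipped, which is also an illustrative choice.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Square root of the sum of squared component differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canberra(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of |a_i - b_i| / (|a_i| + |b_i|), skipping 0/0 terms."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:
            total += abs(x - y) / denom
    return total


def chi_square(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed variant: sum of (a_i - b_i)^2 / (a_i + b_i), skipping 0/0 terms.

    Intended for non-negative features such as those normalized to [0, 1].
    """
    total = 0.0
    for x, y in zip(a, b):
        denom = x + y
        if denom > 0:
            total += (x - y) ** 2 / denom
    return total


# A and B are feature vectors with n = 3 components (one per extractor).
A = [0.5, 0.2, 0.0]
B = [0.3, 0.2, 0.0]
print(euclidean(A, B), canberra(A, B), chi_square(A, B))
```

For these vectors only the first component differs, so the three values are approximately 0.2, 0.25, and 0.05, which shows how the same pair of vectors is scored differently by each function.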
Three feature extractors were implemented and included in the O-FIm framework herein. All features were implemented to normalize the computed values in the interval [0,1].
Area: the Area extractor counts the number of pixels that belong to a region of interest in the image (breast area represented). In the images we used, the region of interest is represented by pixels with values greater than zero (in the grayscale), as the images are binarized. The normalization of the result is obtained by dividing the number of pixels by the total number of pixels in the image.
Signature: the value computed by the Signature extractor represents the regularity of the breast contour. The algorithm finds the center of the breast in the last column of the image and, at fixed angular intervals (in degrees), calculates the distance from this center point to the breast contour. It then calculates the standard deviation of the values computed (for a perfect circle, this value should be zero). The extractor returns the standard deviation obtained divided by the highest measure calculated in the previous step.
Perimeter: this extractor counts the number of pixels belonging to the edge of the region of interest in the image. The value obtained is divided by the total perimeter in the image.
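A minimal sketch of the Area and Perimeter extractors is given below, under stated assumptions. The `thr` parameter marks the background level (pixels with values greater than `thr` belong to the region). For the Perimeter, an edge pixel is taken to be a region pixel with at least one 4-connected neighbour that is background or lies outside the image, and the normalizing "total perimeter" is taken to be the image border length 2 × (height + width), clamped to 1; both choices are illustrative readings, not the implementation used in the study.

```python
from typing import List

Image = List[List[int]]  # binarized grayscale image as rows of pixel values


def area_feature(image: Image, thr: int = 0) -> float:
    """Fraction of pixels whose value is greater than thr."""
    total = sum(len(row) for row in image)
    inside = sum(1 for row in image for p in row if p > thr)
    return inside / total


def perimeter_feature(image: Image, thr: int = 0) -> float:
    """Edge-pixel count divided by the assumed image perimeter 2*(h+w)."""
    height, width = len(image), len(image[0])

    def is_region(r: int, c: int) -> bool:
        # Positions outside the image are treated as background.
        return 0 <= r < height and 0 <= c < width and image[r][c] > thr

    edge_pixels = 0
    for r in range(height):
        for c in range(width):
            if not is_region(r, c):
                continue
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if any(not is_region(nr, nc) for nr, nc in neighbours):
                edge_pixels += 1
    # Clamp so the feature stays in [0, 1] even for very jagged contours.
    return min(1.0, edge_pixels / (2 * (height + width)))


blob = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
print(area_feature(blob), perimeter_feature(blob))  # 0.25 0.25
```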
Methodology based on the proposed graphic oracles
To evaluate each output of the segmentation scheme (O_i, i=1…30) by comparing it with each of its reference images (G_1,i and G_2,i, i=1…30), we defined a graphic oracle by means of a textual description.
This oracle description considers the features of Area, Perimeter, and Signature (described in the section “Feature extractors”) to compose the feature vectors. Thus, the automated segmentation quality can be evaluated by comparing the output vector under test with the vector of its reference image. To do so, a similarity function (FS) calculates a measure that indicates the difference between the vectors. A threshold value sets the maximum value of this measure for which the vectors are still considered sufficiently similar, and hence the segmentation is considered correct.
All characteristics used in this case study consider all the pixels with zero value in grayscale as image background (meaning of the thr parameter that appears in the description of the graphic oracle). The Signature extractor uses an interval of 10° (parameter angleRadius) to compute the feature value (details are available in the section “Feature extractors”). Note that one of the reasons for choosing these characteristics was their simplicity. This demonstrated that extremely complex extractors are not needed to obtain evaluation results that are consistent with the Overlap measure, which served as a basis for comparison in this study.
Besides the general graphic oracle defined above, comparisons were made using each feature individually. Thus, the contribution and influence of each feature in different test scenarios constructed in this work could be evaluated.
To establish whether a tested output passed or failed the test (comparison), a threshold value must be determined. It indicates the maximum difference that can be computed between two vectors for them to still be considered similar.
Because the images of GT1 and GT2 were both processed manually, the distances between them are small. The α value represents the number of times the distance between an output and its reference image can be greater than the maximum distance between two reference images of the sets GT1 and GT2 for the same output. In this study, we conducted experiments using different values for α, varying them from 1 to 2 at fixed intervals.
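One possible reading of this threshold rule is sketched below: the threshold is α times the largest distance observed between corresponding reference images of the two ground-truth sets, and an output is approved when its distance to the reference does not exceed it. The function names, the sample distances, and the step of 0.25 between α values are hypothetical; the text only states that α varied from 1 to 2 at fixed intervals.

```python
from typing import List, Sequence


def threshold(reference_distances: Sequence[float], alpha: float) -> float:
    """alpha times the maximum distance between paired reference images."""
    return alpha * max(reference_distances)


def approved(output_distance: float, thr: float) -> bool:
    """An output passes when its distance to the reference is within thr."""
    return output_distance <= thr


def alpha_values(start: float = 1.0, stop: float = 2.0,
                 step: float = 0.25) -> List[float]:
    """Evenly spaced alpha values from start to stop (the step is an assumption)."""
    count = round((stop - start) / step)
    return [start + k * step for k in range(count + 1)]


# Hypothetical distances between the two manual segmentations of three images.
gt_distances = [0.02, 0.05, 0.03]
for alpha in alpha_values():
    thr = threshold(gt_distances, alpha)
    print(alpha, round(thr, 4), approved(0.06, thr))
```

With these sample numbers an output at distance 0.06 fails for α = 1 (threshold 0.05) and passes from α = 1.25 onward, which illustrates how raising α relaxes the test.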
Methodology based on the Overlap measure
Thus, the closer to zero the ComplOverlap value is, the better the segmentation performance. This measurement transformation was performed to allow calculating a threshold similar to that shown in the Section “Threshold definition”.
The ComplOverlap value was calculated to evaluate each output of the segmentation scheme with its respective reference images of sets GT1 and GT2. The threshold values determined were then used to evaluate whether the output passed different test scenarios (possible due to the change in value α). For automating these procedures, algorithms were implemented to calculate the ComplOverlap between a pair of images (representing the segmented regions) and to evaluate the results.
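A sketch of such a calculation is shown below. It assumes that Overlap is the ratio between the intersection and the union of the two segmented regions and that ComplOverlap is its complement, 1 − Overlap, so that zero means a perfect match; both definitions are assumptions inferred from the surrounding description rather than formulas quoted from the study.

```python
from typing import List

Mask = List[List[int]]  # binary mask: values > 0 belong to the segmented region


def overlap(segmented: Mask, reference: Mask) -> float:
    """Assumed Overlap: |A intersect B| / |A union B| over region pixels."""
    intersection = union = 0
    for seg_row, ref_row in zip(segmented, reference):
        for s, r in zip(seg_row, ref_row):
            in_seg, in_ref = s > 0, r > 0
            if in_seg and in_ref:
                intersection += 1
            if in_seg or in_ref:
                union += 1
    # Two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


def compl_overlap(segmented: Mask, reference: Mask) -> float:
    """Complement of the Overlap: 0 for identical regions, 1 for disjoint ones."""
    return 1.0 - overlap(segmented, reference)


reference = [[1, 1, 0], [0, 0, 0]]
segmented = [[1, 0, 1], [0, 0, 0]]
print(compl_overlap(segmented, reference))  # about 0.667 (1 shared pixel of 3)
print(compl_overlap(reference, reference))  # 0.0
```

Because the value grows with the disagreement between the regions, it can be compared against an α-scaled threshold in the same way as the distances computed by the graphic oracles.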
A coherent comparison was thus made between the methodology based on graphic oracles and the methodology based on the Overlap measure (ComplOverlap).
Results and discussion
The quality criterion used to determine the performance of the segmentation scheme was the number of approved images, that is, those which, according to the evaluation method used, were correctly segmented by the system. The performance measure considered was the percentage of approved images.
All the experiments that included the defined graphic oracles were performed three times. A different similarity function was used in each run (see the Section “Similarity functions”).
Results from the Canberra distance
Comparing the number of approved and disapproved images for the set of reference images GT1, the ComplOverlap methodology approved two fewer images than the “Canb + All” oracle. Using the set GT2, it approved only one fewer image. Therefore, when the two methodologies are compared, no significant variation was observed in the number of approved images, given that the greatest difference was of only two images. In this experiment, the ComplOverlap method proved to be more rigorous for evaluating the images.
Applying each extractor individually for both sets of reference, a smaller variation was obtained in the number of images approved for the Area extractor (“Canb + Area”): one more image was approved for GT1 and two more images for GT2. Larger variations, that is, greater numbers of approved images, were observed for the Perimeter (“Canb + Peri”) and Signature (“Canb + Sign”) extractors: five more images approved for GT1 and eight more images for GT2.
According to these results, for the Canberra distance, the most critical feature (the one that disapproved the most images) was the Area, which significantly influenced the results when all three extractors were applied together. Note that, in the graphic oracle approach, each extractor is individually important in the comparison. This means that the evaluator or system tester can set the characteristics that are important for a satisfactory segmentation and use them in the evaluation. In fact, the empirical results showed that, excluding only the Area extractor, the number of approved images increased to 21 when the reference set GT1 was used, and increased to 19 when GT2 was used.
The Overlap measure evaluates only the features of the area in the segmentation and the location of the segmented region. Thus, the quality of the system under evaluation is intrinsically related just to these characteristics. In the methodology based on graphic oracles, many other characteristics may be incorporated, such as the regularity and circularity of the edge, and other features, hence enriching the test process and increasing the tester flexibility.
[Table: Performances obtained with the Canberra distance by combining the two methodologies with the two reference sets (threshold with α=1). Headers: Canb + All (%); Output×GT1; Output×GT2]
[Table: Performances obtained with the Canberra distance by combining the two methodologies with the two reference sets (threshold with α=2). Headers: Canb + All (%); Output×GT1; Output×GT2]
Also important is the fact that for all of the α values used, the ComplOverlap methodology was more rigorous in the evaluation of the segmentation system, that is, fewer images were approved. This shows that the inclusion of characteristics the Overlap measure does not consider, such as Signature, influences the evaluation of the results.
[Table: Values of ComplOverlap and Canberra distances calculated from the images in Fig. 9 and their respective evaluation results. Header: Value calculated]
Results from the statistical value χ²
[Table: Performances obtained with the statistical value χ² by combining the two methodologies with the two sets of reference (threshold with α=1). Headers: χ² + All (%); Output×GT1; Output×GT2]
For the extractors applied individually with the function χ², for α=1, no characteristic was significantly more critical than the others. This result could explain the lower variation observed between the results of the ComplOverlap methodology and the results of the graphic oracle using the three extractors together.
[Table: Performances obtained with the statistical value χ² by combining the two methodologies with the two sets of reference (threshold with α=2). Headers: χ² + All (%); Output×GT1; Output×GT2]
Individually applying the extractors to the graphic oracle approach showed that the increase in thresholds highlighted the Area feature as the one that least approved outputs for both sets of reference, unlike the results from α=1, in which this difference was less significant.
Unlike the Canberra distance, the similarity function χ² was more rigorous than the ComplOverlap methodology for most thresholds.
Results from the Euclidean distance
Individually applying each extractor in the graphic oracle methodology, higher numbers of approved outputs are observed, with some performance variation between extractors. Thus, it is important to check the consistency of the results from the ComplOverlap methodology and the results from the graphic oracles, regardless of the similarity function used.
[Table: Performances obtained using the Euclidean distance by combining the two methodologies with the two sets of reference (threshold with α=1). Headers: Eucl + All (%); Output×GT1; Output×GT2]
[Table: Performances obtained using the Euclidean distance by combining the two methodologies with the two sets of reference (threshold with α=2). Headers: Eucl + All (%); Output×GT1; Output×GT2]
Again, considering the extractors individually, there were variations in the number of approved outputs comparing extractor against extractor. Even with these differences, comparing the results of the four oracles and the ComplOverlap methodology, it was verified that the performance of the segmentation schemes did not significantly differ from each other.
Considering the reference set GT1, the ComplOverlap was more rigorous than the graphic oracle only after value α=1.5. Considering the set GT2, for any value of α tested, ComplOverlap was more rigorous or produced results equivalent to those with the graphic oracle.
Comparison between the similarity functions used
The performances presented in the previous subsections were consistent across the three similarity functions used. Some differences were observed among these results, which was expected, since functions with behaviors different from each other were selected, according to a previous work on similarity functions.
It was observed that for most α values tested, the Euclidean and Canberra distances were less rigorous than the ComplOverlap methodology. However, function χ² was more rigorous than this metric in most cases.
Characteristics, advantages, and limitations of the methodology based on graphic oracles
The main limitation or disadvantage of the proposed methodology regards the computational cost involved. The more images to be tested, the more feature vectors to be computed. The greater the vector size, the more processing is required. This disadvantage can be minimized by optimizing the algorithms implemented with effective indexing techniques (for example, when there is a fixed set of images it is not necessary to always calculate the same features if their values are efficiently stored) and with size reduction techniques for the vectors.
Even with this limitation, the advantage of adapting the methodology to the evaluation criteria for each particular system, determining the specific extractors, is very interesting and powerful. The experiments conducted demonstrated the consistency of the results with results provided by a technique that is widely used in the literature. However, although the methodology based on the Overlap measure, in the specific case of the experiments discussed here, exhibited similar results, it does not have the same flexibility as the proposed methodology.
Although apparently simpler, the intersection between the images compared does not allow finding differences that could lead to improving the algorithms implemented in a CAD scheme. However, using the specific extractors, as proposed in this study, can help to more precisely identify defects in the software. For example, the pre-processing algorithms may be distorting the edges of the structures of interest, a defect that an edge extractor can more clearly point to, as is the case of the example shown in Fig. 9.
With all the results shown, the methodology based on graphic oracles was concluded to be a robust tool to evaluate the performance of segmentation schemes. A key attribute of this methodology is that the evaluator can define which criteria are deemed important for evaluating the system and transform them into feature extractors. Thus, during the tests, only the essential criteria need to be taken into consideration. For example, if the regularity of the edge of the segmented region is not so relevant to consider whether the segmentation is accurate or not, a Signature extractor will not be necessary. However, if the size and location of one or more objects in the image are relevant, then the Area and Center of Mass extractors, for example, may be effective for composing feature vectors.
Furthermore, the O-FIm framework is a free tool that is available and can be used to configure graphic oracles, as well as serve as API for implementing test scripts that use such oracles to evaluate systems with graphical outputs.
This paper presented the results of a case study in which an evaluation methodology was tested on a segmentation scheme for mammographic images. The proposed methodology is based on graphic oracles and uses the O-FIm framework as a tool for configuring the oracles and for conducting the tests.
The contribution of this paper is to provide a flexible methodology for evaluating segmentation schemes, which can include features of interest besides the segmented area. The O-FIm framework is distributed as free software and available for public access. The feature extractors used in the experiments are also available and can be easily reused. In addition, other extractors can be implemented, including extractors based on techniques used in the segmentation process itself.
The results in this work demonstrate the validity of the proposed methodology and its consistency with the results of a second methodology based on the Overlap measure, a metric that has been used in many works found in the literature to evaluate segmentation schemes. In the experiments conducted, the proposed methodology proved to be robust for the similarity function used and also flexible and adaptable to effectively evaluate segmentation schemes.
The authors would like to thank The State of São Paulo Research Foundation (Fundação de Amparo à Pesquisa do Estado de São Paulo, Fapesp), Process 2010/15691-0; the Brazilian National Council of Scientific and Technological Development (Conselho Nacional de Desenvolvimento Científico e Tecnológico, CNPq), Processes 559931/2010-7 and 401745/2013-9; and the National Institute of Science and Technology, Medicine Assisted by Scientific Computing (Instituto Nacional de Ciência e Tecnologia, Medicina Assistida por Computação Científica, INCT-MACC).
VMG contributed in the implementation of the features and similarity functions, experimental study planning and execution, and text writing. MD contributed in the framework conception and implementation, planning and evaluation of the experimental studies, and text writing. FN was involved in the conception of the project idea, conception of the case studies, definition of the features to be used in the experimental studies, and text writing. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- Gonzalez RC, Woods RE (2008) Digital Image Processing. 3rd edn. Pearson Education, New Jersey.
- Zheng B, Pu J, Park SC, Zuley M, Gur D (2008) Medical Imaging 2008: Computer-Aided Diagnosis. In: Giger ML, Karssemeijer N (eds) Medical Imaging 2008: Computer-Aided Diagnosis, 691530–169153011. SPIE. http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=2008SPIE.6915E....G%26data_type=BIBTEX%26db_key=PHY%26nocookieset=1.
- Bastos CACM, Tsang IR, Vasconcelos GS, Cavalcanti GDC (2012) Pupil segmentation using pulling and pushing and BSOM neural network. In: Systems, Man, and Cybernetics (SMC), 2012 IEEE International Conference On, 2359–2364. doi:http://dx.doi.org/10.1109/ICSMC.2012.6378095.
- Banerjee B, Surender VG, Buddhiraju KM (2012) Satellite image segmentation: a novel adaptive mean-shift clustering based approach. In: Geoscience and Remote Sensing Symposium (IGARSS), 2012 IEEE International, 4319–4322. doi:http://dx.doi.org/10.1109/IGARSS.2012.6351712.
- Deepa P, Geethalakshmi SN (2011) Improved watershed segmentation for apple fruit grading. In: Process Automation, Control and Computing (PACC), 2011 International Conference On, 1–5. doi:http://dx.doi.org/10.1109/PACC.2011.5979003.
- Huddar SR, Gowri S, Keerthana K, Vasanthi S, Rupanagudi SR (2012) Novel algorithm for segmentation and automatic identification of pests on plants using image processing. In: Computing Communication Networking Technologies (ICCCNT), 2012 Third International Conference On, 1–5. doi:http://dx.doi.org/10.1109/ICCCNT.2012.6396012.
- Gonçalves VM, Delamaro ME, Nunes FLS (2014) A systematic review on evaluation and characteristics of computer-aided diagnosis systems. Rev Bras Engenharia Biomédica 30(4): 355–383.
- Gruszauskas NP, Drukker K, Giger ML, Sennett CA, Pesce LL (2008) Performance of breast ultrasound computer-aided diagnosis: dependence on image selection. Acad Radiol 15(10): 1234–1245.
- Korfiatis P, Skiadopoulos S, Sakellaropoulos P, Kalogeropoulou C, Costaridou L (2007) Automated 3D segmentation of lung fields in thin slice CT exploiting wavelet preprocessing. In: Proceedings of the 12th International Conference on Computer Analysis of Images and Patterns, 237–244. Springer, Berlin.
- Odet C, Belaroussi B, Benoit-Cattin H (2002) Scalable discrepancy measures for segmentation evaluation. In: Image Processing. 2002. Proceedings. 2002 International Conference On, 785–7881. doi:http://dx.doi.org/10.1109/ICIP.2002.1038142.
- Goumeidane AB, Khamadja M, Belaroussi B, Benoit-Cattin H, Odet C (2003) New discrepancy measures for segmentation evaluation. In: Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference On, 411–143. doi:http://dx.doi.org/10.1109/ICIP.2003.1246704.
- Gelasca ED, Ebrahimi T, Farias MCQ, Carli M, Mitra SK (2004) Towards perceptually driven segmentation evaluation metrics. In: Computer Vision and Pattern Recognition Workshop, 2004. CVPRW ’04. Conference On, 52–52. doi:http://dx.doi.org/10.1109/CVPR.2004.191.
- Hachouf F, Ahmed Seghir Z, Zeggari A (2006) A generic methodology for image segmentation evaluation. In: Information and Communication Technologies, 2006. ICTTA ’06. 2nd, 1794–1799. doi:http://dx.doi.org/10.1109/ICTTA.2006.1684658.
- Cardoso JS, Corte-Real L (2005) Toward a generic evaluation of image segmentation. IEEE Trans Image Process 14(11): 1773–1782. doi:http://dx.doi.org/10.1109/TIP.2005.854491.
- Peng B, Li T (2013) A probabilistic measure for quantitative evaluation of image segmentation. Signal Process Lett, IEEE 20(7): 689–692. doi:http://dx.doi.org/10.1109/LSP.2013.2262938.
- Delamaro ME, Nunes FLS, Oliveira RAP (2011) Using concepts of content-based image retrieval to implement graphical testing oracles. Softw Test Verif Reliab 23(3): 171–198.
- Nunes FLS, Schiabel H, Goes C (2007) Contrast enhancement in dense breast images to aid clustered microcalcifications detection. J Digit Imaging 20(1): 53–66.
- Datta R, Joshi D, Li J, Wang JZ (2008) Image retrieval: ideas, influences, and trends of the new age. ACM Comput Surv 40(2): 5–1560.
- El-Naqa I, Yang Y, Galatsanos NP, Nishikawa RM, Wernick MN (2004) A similarity learning approach to content-based image retrieval: application to digital mammography. IEEE Trans Med Imaging 23(10): 1233–1244. doi:http://dx.doi.org/10.1109/TMI.2004.834601.
- Traina Jr C, Traina A, Faloutsos C, Seeger B (2002) Fast indexing and visualization of metric data sets using slim-trees. IEEE Trans Knowl Data Eng 14(2): 244–260.
- Petrakis EGM, Faloutsos C, Lin KI (2002) Imagemap: an image indexing method based on spatial similarity. IEEE Trans Knowl Data Eng 14(5): 979–987.
- Baresi L, Young M (2001) Test oracles. Technical Report CIS-TR-01-02, University of Oregon, Dept. of Computer and Information Science, Eugene, Oregon, USA. https://people.eecs.ku.edu/~saiedian/Teaching/Fa07/814/Resources/oracles.pdf.
- Hoffman D (1998) A taxonomy for test oracles. In: Proceedings of the 11th International Quality Week, 1–8. Software Research Institute, Inc, San Francisco.
- Schilham AMR, van Ginneken B, Loog M (2006) A computer-aided diagnosis system for detection of lung nodules in chest radiographs with an evaluation on a public database. Med Image Anal 10(2): 247–258.
- Song E, Xu S, Xu X, Zeng J, Lan Y, Zhang S, Hung CC (2010) Hybrid segmentation of mass in mammograms using template matching and dynamic programming. Acad Radiol 17(11): 1414–1424.
- Tan NM, Liu J, Wong DWK, Yin F, Lim JH, Wong TY (2010) Mixture model-based approach for optic cup segmentation. In: 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, 4817–4820. IEEE.
- Huang SF, Chaoa HY, Hsu CC, Yang SF, Kao PF (2009) A computer-aided diagnosis system for whole body bone scan using single photon emission computed tomography. In: 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 542–545. IEEE.
- Jiménez S, Alemany P, Fondón I, Foncubierta A, Acha B, Serrano C (2010) Detección automática de vasos en retinografías. Arch Soc Esp Oftalmol 85: 103–109.
- Pietka E, Kawa J, Badura P, Spinczyk D (2010) Open architecture computer-aided diagnosis system. Expert Syst 27(1): 17–39.
- Garnavi R, Aldeen M, Celebi ME (2011) Weighted performance index for objective evaluation of border detection methods in dermoscopy images. Skin Res Technol 17(1): 35–44.
- Nunes FLS, Delamaro ME, Gonçalves VM, Lauretto MS (2015) CBIR based testing oracles: an experimental evaluation of similarity functions. Int J Softw Eng Knowl Eng 25(08): 1271–1306.
- Bugatti PH, Traina AJM, Traina-Jr C (2008) Assessing the best integration between distance-function and image-feature to answer similarity queries. In: Proceedings of the 2008 ACM Symposium on Applied Computing. SAC ’08, 1225–1230. ACM, New York.
- Ponciano-Silva M, Traina AJM, Azevedo-Marques PM, Felipe JC, Traina-Jr C (2009) Including the perceptual parameter to tune the retrieval ability of pulmonary CBIR systems. In: Proceedings of the 22nd IEEE International Symposium on Computer-Based Medical Systems, 1–8. IEEE, Albuquerque.