Evaluating different strategies for integration testing of aspect-oriented programs
Journal of the Brazilian Computer Society volume 20, Article number: 9 (2014)
The determination of an order for integration and testing of aspects and classes is a difficult optimization problem. This order should be associated with the minimal possible stubbing cost. Different approaches exist to determine such an order. For example, traditional approaches are based on Tarjan’s algorithm; search-based approaches are based on metaheuristics, usually genetic algorithms (GA). In addition to such approaches, the literature offers different strategies to integrate aspect-oriented software. Some works suggest the integration of aspects and classes in a combined way. Others adopt an incremental strategy. Studies evaluating the approaches show that the multi-objective one presents better solutions. However, these studies were conducted applying only the combined strategy.
In this paper, we present experimental results comparing both strategies with three different approaches: the traditional one, a simple GA-based, and a multi-objective one.
The results show better performance of the multi-objective approach independently of the strategy adopted. A comparison of both strategies points out that the incremental strategy reaches a lower cost in most cases, considering the number of attributes and operations to be emulated in the stub.
It seems that with Incremental+, the best choice is the multi-objective approach. If the system is very complex, PAES seems to be the best MOEA.
The testing of aspect-oriented (AO) programs is an active research topic investigated by many authors. The goal is to extend the knowledge acquired in the object-oriented (OO) context and to introduce specific test criteria for AO software [1–4]. As with OO software, AO testing should be conducted in different phases. In the integration test phase, the focus is on the interaction between modules. In the AO context, a module can be either a class or an aspect, and new kinds of faults appear, as well as some new difficulties.
The main difficulty is to make sure that dependencies between aspects and classes are tested adequately. To test such dependencies, different strategies are found in the literature. Some authors suggest incremental strategies that first test the classes [4, 6]. Aspects are integrated in a second step, in an interactive way. The main motivation for this strategy is to reduce the complexity of the testing process. Other strategies generate sequences to test the interactions among classes and aspects in a combined way. This seems to be a more practical strategy, since some related modules are generally developed together. However, both strategies face the same problem, known as the integration and testing order problem. This problem refers to the determination of an integration order that minimizes the cost associated with stub construction. Stubs simulate the resources that are needed by the modules being tested. Their purpose is to provide canned answers to calls made during the test, allowing the integration and test of modules that depend on other modules that are not yet available. In the AO context, this problem is called the class and aspects integration and test order (CAITO) problem.
To solve the CAITO problem, most proposed solutions use a graph to represent the dependencies between the modules. The graph, named the object relation diagram (ORD), was extended from the OO context to represent aspects and other kinds of dependencies [7, 8]. In most systems, there is a dependency relation between two modules. When dependency cycles exist, it is necessary to break a dependency and to construct a stub to emulate the behavior of the required module. A previous study shows that it is very common to find complex dependency cycles in Java programs. In the AO context, it is common to find crosscutting concerns that depend on other crosscutting concerns, implying dependencies between aspects, and between classes and aspects. Hence, to reduce stubbing costs, it is important to determine the best sequence for integration and testing of classes and aspects. Determining such a sequence is not a trivial task, because different factors influence stub creation, such as the number of attributes to be emulated, the number of operations, the number of return types, and so on.
To break the cycles and establish the test order, different algorithms have been used. The traditional approaches are based on Tarjan’s algorithm [7, 8]. More recent approaches are search based and use metaheuristics such as genetic algorithms (GA). The approaches based on multi-objective and evolutionary optimization algorithms have presented promising results [11–13]. Such approaches allow the generation of more adequate solutions considering real constraints and diverse factors that may influence the integration and testing order problem. However, the search-based approaches have not been evaluated with the different integration strategies proposed in the literature. In these works [11–13], the aspects and classes are integrated in a combined way. Ré et al. evaluated four strategies: Combined, Incremental+, Reverse, and Random, using only the approach based on Tarjan’s algorithm. In their evaluation, the strategies Combined and Incremental+ obtained better results than the others.
Motivated by these facts and to better evaluate the integration strategies, in a previous work we presented results from applying the strategies with traditional and evolutionary search-based approaches. Three algorithms were used: Tarjan’s algorithm, a simple GA, and the multi-objective algorithm NSGA-II. The strategies were compared according to the number of stubs generated and the number of attributes and operations necessary to emulate the stub behavior. Following up on our research, this paper extends our previous work by revisiting those results and by adding two other multi-objective algorithms to the comparison: SPEA2 and PAES. Since the multi-objective approach presents better results, this extension allows the evaluation of different multi-objective algorithms to solve the problem considering both strategies. Moreover, which algorithm is more suitable to solve a particular problem is a question that needs to be answered by means of experimental results. The chosen algorithms are well-known GA-based algorithms that implement different evolution mechanisms. In this way, the main research questions addressed in this paper are:
RQ1: How are the results of the Incremental+ and Combined strategies? It is important to know, in a general case, what strategy presents lower costs related to the number of stubs, and global stubbing costs, related for example to the number of attributes and operations to be emulated. In addition to the general case, it is also important to investigate the performance of each strategy considering particular cases, i.e., system characteristics and used algorithms (or approaches).
RQ2: How is the performance of each algorithm (or approach) considering both strategies? This aims at investigating the performance of each algorithm used with both strategies. In the case of the Combined strategy, the multi-objective algorithms presented the best results [11, 12]. Is this result also valid for the Incremental+ strategy? How do both strategies impact the performance of the three evaluated multi-objective evolutionary algorithms (MOEAs)?
The paper is organized as follows: Section ‘Integration testing of classes and aspects’ reviews approaches and strategies for the integration of aspects and classes, as well as related work. Section ‘Multi-objective evolutionary algorithms’ describes the main concepts of GAs and the multi-objective algorithms used in the evaluation. Section ‘Evaluation description’ describes the evaluation conducted: systems, strategies, and algorithms used. Section ‘Results and discussion’ presents the results, which are analyzed to answer the research questions posed above, and discusses some threats to the validity of our work. Section ‘Conclusions’ concludes the paper with our final remarks.
Integration testing of classes and aspects
As mentioned before, most approaches to solve the CAITO problem are graph based, as happens in the OO context, where the most used graph is named ORD [18, 19]. In such a graph, the vertices represent the classes, and their relations are represented by the edges. Examples of relations between classes are association (As), aggregation (Ag), and inheritance (I). In the AO context, the ORD was extended to represent AO characteristics. An example of an extended ORD, taken from the literature, is presented in Figure 1. On the left side of the figure are the vertices representing only classes and their relations; the extended part is on the right, representing the aspects and the new dependency relations introduced by Ré and Masiero. The following new relationships are possible:
Crosscutting association (C) represents the association generated by a pointcut with a class method or other advice. In Figure 1, it is illustrated between the aspect Billing and class Call;
Dependency (U) is generated by a relation between advices and pointcuts, and between pointcuts;
Association dependency (As) occurs between objects involved in pointcuts. This is shown in Figure 1 by the relationship between Timing and Customer;
Inter-type declaration dependency (It) occurs when there are inter-type relationships between aspects and the base class. For example, an aspect Aa declares that class A extends B. In the example, there is this kind of dependency between Billing and Local; and among MyPersistentEntities, PersistentRoot and Connection;
Inheritance dependency (I) represents inheritance relationships between aspects or among classes and aspects, as observed for the aspects PersistentEntities and MyPersistentEntities in Figure 1.
We can see in the figure the existence of dependency cycles, for instance, between Timing and Billing. This is a situation where one of the dependencies must be broken to allow the integration testing. For the broken dependency, a stub is required.
To clarify the notion of stub, we present below an example taken from the literature. We need to integrate and test the aspect TimerLog, whose implementation is shown in Listing 1. TimerLog depends on the aspect Timing and the class Timer (Figure 1). However, in the test order, Timing and Timer are not available yet. So, stubs for both are required to perform the test. TimerLog depends on Timing because it crosscuts two advices that call the methods start() and stop() from Timer. TimerLog depends on Timer because the class is affected by the pointcuts, and the aspect accesses the attributes of the class.
Listing 1 Aspect TimerLog
The stub for the aspect Timing is presented in Listing 2. It emulates two advices and one inter-type declaration of an attribute. Listing 3 shows the stub for the class Timer, emulating two attributes and three methods.
Listing 2 Stub for the Aspect Timing
Listing 3 Stub for the Class Timer
Aiming at reducing the number of required stubs, different approaches have been used to break the cycles and to establish the test order. These approaches are presented in the next subsections. They are divided into two groups: (1) approaches that use graph algorithms and (2) approaches that use search-based algorithms. Furthermore, subsection ‘Integration strategies’ presents the integration strategies for the AO context.
Traditional approaches[7, 8] are based on Tarjan’s algorithm. The algorithm is recursively applied in the graph for identifying the cycles. The weight of each edge in the cycle is computed based on the number of incoming and outgoing dependencies. The cycle is broken by removing the edge with the greatest weight. When no more cycles remain in the graph, a reverse topological order is performed to determine the test order.
These approaches usually produce locally optimal solutions, since they do not analyze the consequences of breaking a dependency. In some cases, breaking a minimum number of cycles does not imply a lower cost. Another disadvantage is that they need some extension to be used with other factors related to the stubbing process, such as the number of attributes of a class, the number of calls or distinct operations invoked, constraints related to organizational reasons, etc. A global cost is required.
To overcome the local optimum limitations of traditional approaches, and similar to what happens in the OO context, a strategy based on GA was proposed. GAs allow the use of different factors to establish the test orders by using a fitness function based on an aggregation of objectives to be minimized, for instance, a weighted average of the number of operations and the number of attributes. However, this fitness function requires the tester to adjust the weight of each objective, and the choice of the most adequate weights for the GA is a labor-intensive task for complex cases. To reduce this effort and make the evolutionary strategy more practical for real systems, multi-objective optimization algorithms were applied in the AO context [12, 13].
In multi-objective optimization, the objectives to be optimized are usually in conflict, and the goal is to find a set of solutions representing good trade-offs among them. In this way, a set of good solutions is possible. This set forms the approximation to the Pareto front (PFapprox), composed of different non-dominated solutions. Given a set of possible solutions, the solution A dominates B if the value of at least one objective in A is better than the corresponding objective value in B and the values of the remaining objectives in A are at least equal to the corresponding values in B. A is non-dominated if it is not dominated by any other solution.
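The dominance relation defined above can be written directly as a predicate. The sketch below assumes that every objective is minimized, as is the case for the attribute and operation costs considered in this paper; the function names and the sample points are illustrative only.

```python
def dominates(a, b):
    """True if solution `a` dominates `b` (all objectives minimized).

    `a` must be no worse than `b` in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]
```

With the hypothetical (A, O) cost pairs `[(3, 5), (4, 4), (5, 5), (2, 7)]`, only `(5, 5)` is discarded, because `(3, 5)` and `(4, 4)` both dominate it; the other three are mutually incomparable trade-offs.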
The multi-objective approach presents promising results when compared with a simple GA. The multi-objective algorithms generate more adequate solutions considering real constraints and diverse factors that may influence the stubbing process.
The abovementioned approaches use distinct algorithms to break cycles and different measures for the stubbing costs. However, these approaches can be applied with different integration strategies. The most used integration testing strategy for AO programs is called incremental [2, 5]. This strategy tests the base program first and then integrates its aspects. The incremental strategy presents some advantages. It is easier to implement and may also allow easy fault localization. Another strategy suggests the integration in a combined way [7, 8]. This strategy seems to be more practical, since classes and aspects are probably tested together if both are under development.
Both strategies present points in favor and against. However, these strategies have not been compared considering different algorithms for breaking cycles. The work whose objective is most similar to ours is the study by Ré et al. In that study, four strategies were evaluated:
Combined: combines the integration of aspects and classes;
Incremental+: first integrates only classes and after, considers only aspects;
Reverse: applies the reverse combined order; and
Random: applies a random selected order.
The main results of the evaluation conducted are the following: (1) the Incremental+ and Combined strategies presented similar behavior, and the obtained results do not point out a best one; (2) the Reverse strategy produces many stubs, mainly stubs of classes. The authors conclude that it is not a good idea starting the integration from the aspects; (3) the Random strategy performed worse than Incremental+ and Combined strategies, and it was used only as a reference.
When comparing Incremental+ with Combined, Ré states that the Combined strategy has lower integration cost than the Incremental+ strategy. Furthermore, in his study, the Combined strategy minimized the number of stubs for the three evaluated systems. Another finding of Ré is the trend of balancing between the numbers of stubs for classes and aspects when Combined is used. With regards to Incremental+, there is a trend to generate a greater number of stubs for aspects than the number of stubs for classes.
These results give us some idea of the performance of the strategies. But in the evaluation of Ré and Masiero, the strategies were only applied with the traditional approach based on Briand et al.’s approach and Tarjan’s algorithm. The evolutionary approaches have been compared with traditional approaches and present better results, but the strategy used in those evaluations was the Combined one, with the number of attributes and methods as the fitness functions to be minimized. The existing works do not help us to answer our research questions. Section ‘Evaluation description’ describes a more complete comparison of both strategies and approaches, which is the goal of the present paper. First, the next section contains a brief description of GAs and the multi-objective algorithms employed in the conducted evaluation.
Multi-objective evolutionary algorithms
Multi-objective evolutionary algorithms have been widely applied in several areas, such as Search-based software engineering, to solve problems with many interdependent interests (objectives) that must be optimized simultaneously. Variants of GA adapted to multi-objective problems were proposed. A GA is a heuristic inspired by the theory of natural selection and genetic evolution. From an initial population, basic operators are applied consisting of selection, crossover, and mutation. These operators evolve the population, generation by generation. Through the selection operator, more copies of those individuals with the best values of the objective function are selected to be parents. So the best individuals (candidate solutions) will survive in the next population. The crossover operator combines parts of two parent solutions to create a new one. The mutation operator randomly modifies a solution. The descendant population created from the selection, crossover, and mutation replaces the parent population.
Three representative MOEAs that are variants of traditional GAs are non-dominated sorting genetic algorithm (NSGA-II), strength Pareto evolutionary algorithm (SPEA2), and Pareto archived evolution strategy (PAES). Each algorithm adopts different evolution and diversification strategies. They are briefly described below.
NSGA-II is based on GA with a strong elitism strategy. For each generation, it sorts the individuals from parent and offspring populations, considering the non-dominance relation, creating several fronts. The first front generated by NSGA-II is composed by all non-dominated solutions. After removal of solutions belonging to the first front, the second front is composed with the solutions which become non-dominated. In the same way, the third front is formed by the solutions that become non-dominated after the removal of the solutions belonging to the first and the second fronts, and so on until all solutions are classified. For the solutions of the same front, another sort is performed using the crowding distance to maintain the diversity of solutions. The crowding distance calculates how far away the neighbors of a given solution are and, after calculation, the solutions are decreasingly sorted. The solutions in the boundary of the search space are benefited with high values of crowding distance since the solutions are more diversified but with fewer neighbors. Both sorting procedures, front and crowding distance, are used by the selection operator. The binary tournament selects individuals of lower front. In case of same fronts, the solution with greater crowding distance is chosen. New populations are generated with crossover and mutation.
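The two sorting procedures can be illustrated with the short sketch below. It peels off the fronts one at a time, which is simpler but slower than the bookkeeping used in the original NSGA-II description, and it normalizes each objective by its range inside the front when computing the crowding distance. All objectives are assumed to be minimized, and the sample points in the usage note are hypothetical.

```python
def dominates(a, b):
    """True if `a` dominates `b` (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def sort_into_fronts(points):
    """Return lists of indices: first front, second front, and so on."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        # Non-dominated among the solutions that are still unclassified.
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


def crowding_distance(points, front):
    """Crowding distance of each index in `front` (larger = less crowded)."""
    distance = {i: 0.0 for i in front}
    for m in range(len(points[front[0]])):
        ordered = sorted(front, key=lambda i: points[i][m])
        low, high = points[ordered[0]][m], points[ordered[-1]][m]
        # Boundary solutions always receive an infinite distance.
        distance[ordered[0]] = distance[ordered[-1]] = float("inf")
        if high == low:
            continue
        for k in range(1, len(ordered) - 1):
            gap = points[ordered[k + 1]][m] - points[ordered[k - 1]][m]
            distance[ordered[k]] += gap / (high - low)
    return distance
```

For the points `[(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]`, the first three form the first front, `(3, 4)` the second, and `(5, 5)` the third. Within the first front, the two extreme points get an infinite crowding distance, so a binary tournament between solutions of the same front would prefer them over the middle point.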
SPEA2 has a specific way to store the non-dominated solutions found in the evolutionary process. It maintains an external archive that stores non-dominated solutions in addition to its regular population. The individuals for the evolutionary process are selected from the archive. For each solution in the archive and in the population, a strength value is calculated. The strength value of a solution i corresponds to the number j of individuals, belonging to the archive and to the population, dominated by i. This strength value is used in the fitness function. The archive size s is fixed, so the number n of non-dominated solutions found can be lower or greater than s. When n is lower than s, dominated solutions are used to fill the archive. On the other hand, if n exceeds s, a clustering algorithm is used to reduce n.
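The strength value can be computed as in the sketch below. The text above only states that the strength is used in the fitness function; the `raw_fitness` helper follows the usual SPEA2 formulation, in which the raw fitness of a solution is the sum of the strengths of the solutions that dominate it, and is included here as an assumption about how the strength is used. Density estimation and archive truncation are omitted.

```python
def dominates(a, b):
    """True if `a` dominates `b` (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def strength_values(points):
    """Strength of each solution: how many solutions it dominates.

    `points` is the union of the archive and the current population.
    """
    return [sum(1 for q in points if dominates(p, q)) for p in points]


def raw_fitness(points):
    """Sum of the strengths of each solution's dominators (lower is better).

    ASSUMPTION: standard SPEA2 raw fitness; non-dominated solutions get 0.
    """
    strength = strength_values(points)
    return [sum(strength[j] for j, q in enumerate(points) if dominates(q, p))
            for p in points]
```

For the hypothetical points `[(1, 5), (2, 3), (3, 4), (5, 5)]`, the strengths are 1, 2, 1 and 0, and only the two non-dominated points, `(1, 5)` and `(2, 3)`, obtain a raw fitness of zero.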
PAES is an evolutionary algorithm that works like a hill climbing algorithm. It adopts a population concept different from other evolutionary algorithm strategies, since only one solution is maintained in each generation. New individuals are generated using only the mutation operator, which makes it perform like a local search. As the algorithm works with only one solution per generation, there is no possibility of using the crossover operator. As in SPEA2, there is an external archive that is populated with the non-dominated solutions found along the evolutionary process. If the external archive size is exceeded, a diversity strategy is applied on the set of solutions in order to remove similar solutions and to maintain a wide exploration of the search space. In the literature, PAES presents promising results in comparison with NSGA-II and SPEA2.
Evaluation description
This section describes the evaluation conducted to compare strategies and approaches and to answer our research questions. Based on the results described in Section ‘Integration testing of classes and aspects’, we selected only the best strategies according to the work of Ré et al.: Combined and Incremental+. Both were applied with three approaches and different algorithms: (1) TA, the traditional one based on Briand et al.’s approach and Tarjan’s algorithm; (2) SBA, the search-based one, implemented with a simple GA and using three configurations of weights for the fitness function; and (3) MSBA, the multi-objective search-based approach, which is implemented with the algorithms NSGA-II, PAES, and SPEA2 and uses Pareto dominance concepts.
Our goals are to evaluate (1) each strategy according to stubbing costs, considering a general case, characteristics of the systems, and used algorithm; and (2) the impact of using both strategies in the performance of the approaches and different algorithms.
Next, we describe the experimental setting: systems evaluated, evaluation measures, how the algorithms were implemented and configured, and the quality indicators used to compare the algorithms used in MSBA.
In contrast to the related work, in our evaluation we used four real AspectJ systems, also used in our previous works [11–13]. We can see in Table 1 that two of them contain more than one thousand dependencies. AJHotDraw is an AO refactoring of the JHotDraw two-dimensional graphics framework. AJHSQLDB is also an AO refactoring, of HSQLDB, which is a database manager developed in Java. The Health Watcher collects and manages public health-related complaints and notifications. The Toll System Demonstrator is a proof of concept for automatic charging of tolls on roads and streets.
The search-based algorithms are guided by a fitness function that measures the quality of the produced solutions. As we desire solutions (orders) with low cost, we use in this work two coupling measures given by the number of attributes and operations to be emulated in the stub. These measures were also adopted in related works[7, 12, 13, 22].
Considering that (1) $m_i$ and $m_j$ are two coupled modules ($m_i$ depends on $m_j$), (2) modules are either classes or aspects, and (3) the ‘operation’ term represents class methods, aspect methods, and aspect advices, we define
Number of attributes (A) = The number of attributes locally declared in $m_j$ when references or pointers to instances of $m_j$ appear in the argument list of some operations in $m_i$, as the type of their return value, in the list of attributes (data members) of $m_i$, or as local parameters of operations of $m_i$ (adapted from the literature). This complexity measure counts the (maximum) number of attributes to be handled in the stub if the dependency were broken.
Number of operations (O) = The number of operations (including constructors) locally declared in $m_j$, which are invoked by operations of $m_i$ (adapted from the literature). This complexity measure counts the number of operations to be emulated in the stub if the dependency were broken.
The stubbing complexity of an order t is based on its attribute and operation coupling. Two complexities are then calculated in the following way:
Attribute complexity (A(t)) - The attribute complexity counts the maximum number of attributes that would have to be handled in the stub if the dependency were broken (attribute coupling measure). This information is an input for the algorithms and is represented by a matrix AM(i,j), where rows and columns are modules and i depends on j. Then, for a given test order t and a set of d dependencies to be broken, the attribute complexity A is calculated according to Equation 1, where n is the total number of modules and k is any module included before the module i in test order t.
$$A(t) = \sum_{i=1}^{n} \sum_{j=1,\, j \neq k}^{n} AM(i,j) \qquad (1)$$
Operation complexity (O(t)) - The operation complexity counts the number of operations that would have to be emulated in the stub if the dependency were broken (operation coupling measure). This information is an input for the algorithms and is represented by a matrix OM(i,j), where rows and columns are modules and i depends on j. Then, for a given test order t and a set of d dependencies to be broken, the operation complexity O is computed as defined by Equation 2, with n and k as in Equation 1.
$$O(t) = \sum_{i=1}^{n} \sum_{j=1,\, j \neq k}^{n} OM(i,j) \qquad (2)$$
To illustrate the use of both measures, consider the order t = [ …, TimerLog, Timing, Timer, …] for the example presented in Section ‘Integration testing of classes and aspects’. This order requires stubs for the aspect Timing and the class Timer (Listings 2 and 3). It is possible to determine the values for the measures A and O for this fragment of order. For A(t), the cost value is three, composed by one attribute implemented for an inter-type declaration in the stub for the aspect Timing and two attributes implemented in the stub for the class Timer. For this same fragment, the value of O(t) is five, composed by two advices in the stub for Timing and three methods in the stub for Timer. Hence, the cost of the fragment of t is (A = 3,O = 5).
Based on the measures presented above, the problem is the search for an order that minimizes the objectives A and O.
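The two objectives can be computed from the coupling matrices as sketched below. The sketch assumes the matrices are stored as nested dictionaries (`matrix[i][j]` is the cost of stubbing `j` for `i`, present only when `i` depends on `j`), which is a representation chosen for illustration rather than the one used by the authors. The matrix entries in the usage note are the values given in the TimerLog example above.

```python
def stubbing_cost(order, matrix):
    """Cost of a test order for one coupling matrix (AM or OM).

    A dependency of i on j costs matrix[i][j] whenever j is not yet
    integrated when i is tested, i.e. j comes after i in `order`.
    """
    position = {module: p for p, module in enumerate(order)}
    total = 0
    for i in order:
        for j, cost in matrix.get(i, {}).items():
            if position[j] > position[i]:
                total += cost
    return total


# Coupling values from the TimerLog example: one attribute and two advices
# in the Timing stub, two attributes and three methods in the Timer stub.
AM = {"TimerLog": {"Timing": 1, "Timer": 2}}
OM = {"TimerLog": {"Timing": 2, "Timer": 3}}

fragment = ["TimerLog", "Timing", "Timer"]
cost = (stubbing_cost(fragment, AM), stubbing_cost(fragment, OM))
```

For this fragment, `cost` is `(3, 5)`, matching the values derived in the text. Placing TimerLog after both Timing and Timer would remove the need for the two stubs and bring both costs of the fragment to zero.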
Reverse engineering was performed to identify the existing dependencies between modules from the program code, using the same parser adopted in our previous works [11, 13]. A parser based on AJATO (AspectJ and Java Assessment Tool; http://homepages.dcc.ufmg.br/~figueiredo/ajato/) was developed to do this. It takes the Java/AspectJ code as input and returns the syntax tree of the code. From this tree, the associations, uses, inheritances, advices, pointcuts, and inter-type declaration dependencies were identified. At the end, the parser generated as output three matrices (dependency, attribute complexity, and operation complexity) that were used as input to the algorithms. We consider that inheritance and inter-type declaration dependencies cannot be broken, similar to related works [7, 12, 13, 22].
As mentioned before, our goal is to evaluate the existing approaches, TA, SBA, and MSBA, with both integration strategies, Incremental+ and Combined. In this section, we describe how the algorithms of each approach were implemented. In the TA, Tarjan’s algorithm was implemented using the ANNAS framework. In the SBA, the implemented GAs are those provided by Bigus. They were adapted to compute the fitness based on the aggregation of both coupling measures. Regarding the MSBA, the multi-objective algorithms NSGA-II, SPEA2, and PAES were implemented using the jMetal framework. These algorithms were chosen for two main reasons. First, evolutionary algorithms, such as NSGA-II, have presented the best performance in the OO context when compared with other bio-inspired algorithms, such as PACO and MTabu. Second, they implement different evolution mechanisms, and this helps us to investigate the influence of the strategies on the search space.
We use the same representation and genetic operators to implement all evolutionary algorithms of approaches SBA and MSBA. The chromosome, a solution in the population, is represented by a vector whose positions hold integers that represent the modules. The size of this vector is equal to the number of modules, and a module must not appear twice in a test order. For both strategies, the crossover operator follows the technique of two-point crossover. In this technique, two points are selected randomly, and the genes inside them are swapped in the children. The remaining genes are used to complete the solution, from left to right. Figure 2a shows an example of the two-point crossover operator using an individual with five genes. For the mutation operator, we used the technique of swap mutation. In this technique, two genes are randomly selected and swapped in the child. Figure 2b shows an example of the swap mutation operator, using an individual with five genes. In the Incremental+ strategy, if the randomly selected gene is a class, the gene to be swapped must be another class. On the other hand, if the gene is an aspect, it must be swapped with another aspect, in order to maintain the boundary between classes and aspects in the chromosome.
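The two operators can be sketched as follows for permutation chromosomes. The description of the crossover leaves some details open, so this is one possible reading: the child keeps the genes between the two cut points from one parent and fills the remaining positions, from left to right, with the missing genes in the order they appear in the other parent. The cut points are passed explicitly to keep the example deterministic, and the `is_aspect` predicate is a hypothetical helper. The sketch does not attempt to preserve the class and aspect boundary during crossover.

```python
import random


def two_point_crossover(parent_a, parent_b, cut1, cut2):
    """One child: segment [cut1, cut2) from parent_a, the rest from parent_b."""
    inside = set(parent_a[cut1:cut2])
    # Genes still missing, in the order in which parent_b lists them.
    filler = iter(g for g in parent_b if g not in inside)
    child = []
    for pos in range(len(parent_a)):
        if cut1 <= pos < cut2:
            child.append(parent_a[pos])
        else:
            child.append(next(filler))
    return child


def swap_mutation(order, is_aspect=None, rng=random):
    """Swap two randomly chosen genes.

    If `is_aspect` is given (Incremental+), the second gene must be of the
    same kind as the first, so classes and aspects never cross the boundary.
    """
    child = list(order)
    i = rng.randrange(len(child))
    candidates = [j for j in range(len(child)) if j != i
                  and (is_aspect is None
                       or is_aspect(child[j]) == is_aspect(child[i]))]
    if candidates:
        j = rng.choice(candidates)
        child[i], child[j] = child[j], child[i]
    return child
```

For example, `two_point_crossover([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], 1, 3)` keeps `2, 3` in place and yields `[5, 2, 3, 4, 1]`. The second child would be obtained by swapping the roles of the parents.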
The use of crossover and mutation operators can generate test orders that break the precedence constraints between the modules (dependencies I, Ag, and It). These constraints mean that base modules must precede child modules in any test order t. The strategy adopted to deal with these constraints consists of checking the test order: if an invalid solution is generated, the module that breaks the constraint is placed at the end of the order according to the module type. For instance, in the Incremental+ strategy, if the module is a class, it must be placed at the end of the classes space, and analogously for aspects. The fitness function (objectives) is calculated from three matrices, inputs to the algorithms, associated with (1) dependencies between modules; (2) measure A; and (3) measure O (described in the last section).
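A possible reading of this repair step is sketched below. A module whose base modules have not all been placed yet is moved to the end of the order and reconsidered later. The `bases` mapping, the `is_aspect` predicate, and the treatment of Incremental+ as two independently repaired segments are assumptions made for illustration.

```python
def repair(order, bases):
    """Move modules that precede one of their base modules to the end.

    `bases` maps a module to the set of modules that must come before it.
    Base modules that are not part of `order` are ignored.
    """
    in_order = set(order)
    pending = list(order)
    placed, seen = [], set()
    deferrals = 0
    while pending:
        module = pending.pop(0)
        required = bases.get(module, set()) & in_order
        if required <= seen:
            placed.append(module)
            seen.add(module)
            deferrals = 0
        else:
            pending.append(module)      # retry once its bases are placed
            deferrals += 1
            if deferrals > len(pending):
                raise ValueError("cyclic precedence constraints")
    return placed


def repair_incremental(order, bases, is_aspect):
    """Incremental+: repair classes and aspects inside their own segments."""
    classes = [m for m in order if not is_aspect(m)]
    aspects = [m for m in order if is_aspect(m)]
    return repair(classes, bases) + repair(aspects, bases)
```

With `bases = {"Child": {"Base"}}`, the invalid order `["Child", "Base", "Other"]` becomes `["Base", "Other", "Child"]`. In the Incremental+ variant, a misplaced class is deferred only to the end of the class segment, so all classes still precede all aspects.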
Tarjan’s algorithm does not have parameters to be adjusted. The parameters of the GAs and MOEAs were adjusted following our previous works[11, 12], where an empirical parameter tuning was done. To configure the algorithms of approach SBA, besides the parameters related to the evolution process, it was also necessary to set the weights of the measures: attribute and operation coupling to compose the aggregated fitness function. We evaluated three combinations of weights. To verify the empirical influence of each measure in the stub construction we used a configuration to minimize only the attribute coupling (identified here as the configuration GA with attributes (GAA)). In this configuration, the weight of the measure operation coupling was set to zero. The other configuration minimizes only the operation coupling (identified here as the configuration GA with Operations (GAO)). In this configuration, the weight of the measure attribute coupling was set to zero. In the third configuration (configuration GA), equal importance was given to both measures.
Table 2 shows the parameter values adopted. Each evolutionary algorithm was executed 30 times for each system. All the algorithms executed the same number of fitness evaluations, used as the stopping criterion, in order to analyze whether they can produce similar solutions when restricted to the same resources. Furthermore, they were executed on the same computer. At the end, the set of non-dominated solutions considering all runs was obtained for each algorithm.
To compare the results presented by the MOEAs with both strategies, we used some quality indicators from the literature: coverage (C), hypervolume (HV), and Euclidean distance from an ideal solution (ED).
To calculate such indicators, some sets were obtained from the execution of the algorithms. In each run, each MOEA found an approximation set of solutions named PFapprox. Furthermore, for each MOEA, a set called PFknown was obtained, formed by all non-dominated solutions achieved in all runs. Since PFtrue is unknown, in order to calculate the indicators, we generated PFtrue for each system as the union of all solutions achieved by all algorithms, removing dominated and repeated solutions, as recommended in the literature.
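The construction of PFtrue described above amounts to a non-dominated filter over the union of several fronts. A minimal Python sketch for minimization problems follows; the function names and the sample points are illustrative and do not come from the experiments.

```python
def dominates(a, b):
    """True if a dominates b (minimization): a is no worse in every
    objective and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(points):
    """Remove repeated and dominated points; result sorted for readability."""
    unique = set(map(tuple, points))
    return sorted(p for p in unique
                  if not any(dominates(q, p) for q in unique))


def reference_front(*fronts):
    """Union of all fronts, keeping only non-dominated, distinct solutions."""
    return non_dominated(p for front in fronts for p in front)


# Illustrative (A, O) costs of the PFknown sets of three algorithms.
pf_true = reference_front(
    [(1, 5), (3, 3)],
    [(2, 4), (3, 3), (4, 4)],
    [(5, 1)],
)
```

Here `(4, 4)` is discarded because it is dominated by `(3, 3)`, and the repeated `(3, 3)` is kept only once.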
The coverage indicator C compares two Pareto fronts, PFa and PFb, in terms of dominance. The function C(PFa, PFb) maps the ordered pair (PFa, PFb) into the range [0,1] according to the proportion of solutions in PFb that are dominated by PFa. Similarly, we compute C(PFb, PFa) to obtain the proportion of solutions in PFa that are dominated by PFb. Figure 3a presents an example of the C indicator for a minimization problem with two objectives. For instance, C(Pa, Pb) corresponds to 0.5 because the Pb set has two of its four elements dominated by the Pa set. A value of 0 for C indicates that the solutions of the former set do not dominate any element of the latter set; on the other hand, a value of 1 indicates that all elements of the latter set are dominated by elements of the former set.
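The definition of C can be sketched in a few lines of Python. The sketch uses strict Pareto dominance, as in the description above (some formulations in the literature use weak dominance instead), and the two sets are made-up points chosen so that two of the four elements of `pb` are dominated; they are not the points of Figure 3a.

```python
def dominates(a, b):
    """True if a dominates b in a minimization problem."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def coverage(front_a, front_b):
    """C(front_a, front_b): fraction of front_b dominated by front_a."""
    if not front_b:
        return 0.0
    dominated = sum(
        1 for b in front_b if any(dominates(a, b) for a in front_a))
    return dominated / len(front_b)


pa = [(1, 4), (2, 2), (4, 1)]
pb = [(1, 5), (3, 3), (0, 6), (5, 0)]

c_ab = coverage(pa, pb)  # (1, 5) and (3, 3) are dominated by pa
c_ba = coverage(pb, pa)  # no element of pa is dominated by pb
```

The indicator is not symmetric, which is why both C(PFa, PFb) and C(PFb, PFa) have to be computed.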
The HV indicator is widely considered the best metric for the performance assessment of algorithms for multi-objective optimization problems. It measures the volume of the dominated portion of the objective space and is of exceptional interest because it possesses the highly desirable feature of strict Pareto compliance, i.e., whenever one approximation completely dominates another approximation, the hypervolume of the former will be greater than the hypervolume of the latter. Figure 3b presents an example of the HV indicator.
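For two objectives, the dominated volume is an area that can be computed by sweeping the front from left to right. The Python sketch below assumes a minimization problem and a user-chosen reference point that is worse than every solution; the reference point, the function name, and the sample front are assumptions made for illustration, and any normalization of the objectives is left out.

```python
def hypervolume_2d(front, reference):
    """Area dominated by `front` and bounded by `reference` (minimization)."""
    ref_x, ref_y = reference
    # Only points strictly better than the reference point contribute.
    points = sorted({(x, y) for x, y in front if x < ref_x and y < ref_y})
    area = 0.0
    best_y = ref_y
    for x, y in points:  # ascending in the first objective
        if y < best_y:   # dominated points fail this test and are skipped
            area += (ref_x - x) * (best_y - y)
            best_y = y
    return area


hv_front = hypervolume_2d([(1, 3), (2, 2), (3, 1)], reference=(4, 4))
hv_single = hypervolume_2d([(2, 2)], reference=(4, 4))
```

A front that contains `(2, 2)` together with other non-dominated points covers a larger area than `(2, 2)` alone, which reflects the compliance property mentioned above.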
The determination of a solution that minimizes all objectives is difficult in multi-objective optimization problems, and decision makers usually prefer the solution that is nearest to the ideal solution. For a minimization problem, an ideal solution has the minimum value of each objective of PFtrue. Therefore, the Euclidean distance from an ideal solution (ED) is used here to find the solutions closest to the best objective values. Figure 3c depicts an example of ED for a minimization problem with two objectives.
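The ED indicator can be sketched as follows in Python. The ideal point takes the minimum of each objective over PFtrue and is usually not an attainable solution itself; the function names and the sample points are illustrative only.

```python
import math


def ideal_point(pf_true):
    """Minimum value of each objective over the reference front."""
    return tuple(min(values) for values in zip(*pf_true))


def closest_to_ideal(front, ideal):
    """Solution of `front` with the lowest Euclidean distance to `ideal`."""
    return min(front, key=lambda p: math.dist(p, ideal))


pf_true = [(1, 5), (2, 4), (3, 3), (5, 1)]
ideal = ideal_point(pf_true)

front = [(2, 4), (3, 3), (5, 1)]
best = closest_to_ideal(front, ideal)
best_ed = math.dist(best, ideal)
```

In this example the ideal point is `(1, 1)`, and the balanced solution `(3, 3)` is closer to it than either extreme of the front.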
Results and discussion
In this section, the results are presented and analyzed aiming at answering our research questions in the following subsections.
RQ1: strategies evaluation
RQ1 investigates the performance of each strategy according to the stubbing costs, the characteristics of the systems, and the approach used. The goal is to help the tester in the selection of a strategy.
To conduct the analysis, we use Tables 3 and 4. Table 3 presents the global cost of the solutions found by each algorithm. The global cost refers to the measures A and O, which represent how many attributes and operations need to be emulated in stubs. The solutions in italics are non-dominated considering all solutions of the algorithms.
In addition to the global cost, we estimate the number of stubs required for aspects and classes to show the impact of each strategy on the results. For each obtained solution, that is, a test order, we analyze how many stubs are required for classes and for aspects, taking into account the matrix of dependencies of each system. Table 4 presents the mean number of stubs for classes (C) and for aspects (A) generated by each strategy and algorithm.
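One possible way to estimate the number of stubs from a test order and the dependencies is sketched below in Python. It is an assumption, not the counting procedure used to build Table 4: a module is taken to need a stub when some module that depends on it is tested earlier, and each stubbed module is counted once. The module names are invented.

```python
def count_stubs(order, depends_on, aspects):
    """Return (stubs for classes, stubs for aspects) for a test order.

    depends_on -- dict: module -> set of modules it depends on
    aspects    -- set of module names that are aspects
    Assumption: one stub per module that is required before it is integrated.
    """
    position = {m: i for i, m in enumerate(order)}
    stubbed = {
        target
        for module in order
        for target in depends_on.get(module, ())
        if position[target] > position[module]
    }
    aspect_stubs = sum(1 for m in stubbed if m in aspects)
    return len(stubbed) - aspect_stubs, aspect_stubs


depends_on = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "Trace": {"Log"}}
aspects = {"Trace", "Log"}

first = count_stubs(["A", "Trace", "B", "C", "Log"], depends_on, aspects)
second = count_stubs(["C", "B", "A", "Log", "Trace"], depends_on, aspects)
```

The first order needs stubs for classes `B` and `C` and for aspect `Log`; the second needs only a stub for class `A`, because the class cycle has to be broken in any order.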
We can observe that a lower number of stubs does not imply a lower global cost. A single stub can be more complex to write because of the number of dependencies to be emulated inside it. For example, although Tarjan’s algorithm has the lowest number of required stubs, the solutions achieved by it have higher global costs.
To help in the evaluation, we also use the coverage indicator, whose results are presented in Table 5. In this table, the value C(Comb,Inc+), between 0 and 1, represents how much the solutions of the Incremental+ strategy are dominated by solutions of the Combined strategy. Similarly, C(Inc+,Comb) represents how much the solutions of Combined are dominated by solutions of Incremental+. Only bold values, near or greater than 0.5, are significant.
Regarding the global costs (given by the coupling measures), we can see in Tables 3 and 5 that the solutions obtained with the Incremental+ strategy present lower costs in 9 cases out of 28, considering all systems and algorithms. In two cases, the Combined strategy is better. In the remaining 17 cases, they are similar, the majority of them (16) being associated with the less complex systems, Health Watcher and Toll System.
Considering the cost associated with the mean number of stubs, a lower number of required stubs was achieved using the Incremental+ strategy (in 12 cases out of 28); see Table 4. The Combined strategy achieved a lower number of stubs in eight cases, mainly for MOEAs and more complex systems (AJHotDraw and AJHSQLDB). A similar number of stubs was obtained for the other cases, mainly for Toll System.
Stubs for aspects are needed in few cases. In such cases, the stubs for aspects are required when using the Combined strategy. This does not happen in the orders obtained by Incremental+ because the aspects are at the end of the orders. In addition, there are probably no dependency cycles between aspects in the systems evaluated.
Only GAA and GAO find orders that require stubs for aspects (Table 4). These are situations in which only one of the measures is considered. With a multi-objective treatment, the number of stubs for aspects tends to be 0.
In short, in a general case, Incremental+ seems to be a better choice because it requires a lower number of stubs and has lower costs, with respect to the number of attributes and operations to be emulated. This strategy also presented a lower number of stubs for aspects.
As mentioned before, Ré has conducted a similar study with three systems and the traditional strategy. Our results are different from the results of his study. In our study, Combined has achieved neither the lowest number of stubs nor the lowest stubbing cost. Also, Combined has not achieved a better balance between the number of stubs for classes and for aspects than Incremental+.
We observe that the system characteristics influence the performance of the strategy. As mentioned before, there is no difference between the strategies for the small systems considering global costs. The only difference was found for GAO, where Incremental+ performs better. Considering the number of stubs and small systems, Incremental+ generated the lowest number in five cases whereas Combined achieved the lowest number in two cases for Health Watcher. In one of these two cases, Combined required a greater number of stubs for aspects, despite having generated the lowest number of stubs.
For the most complex systems, Incremental+ performed better in most cases. However, it is worth mentioning that Combined presented its best cost results for AJHSQLDB. This system has the largest number of LOC, and its solutions are more expensive than the AJHotDraw solutions in terms of the number of attributes and operations and also in the number of stubs.
Tarjan’s algorithm presents a difference only for AJHSQLDB, where the best solution is obtained using the Incremental+ strategy. Considering SBA, for the algorithm GAA, the Incremental+ strategy is better for the system AJHSQLDB and slightly better for the system AJHotDraw. For GAO, the Incremental+ strategy always finds the best solutions.
When using GA, the Combined strategy is better for the system AJHotDraw, and the Incremental+ strategy is slightly better for the system AJHSQLDB. Regarding MSBA, for the algorithm NSGA-II, the Incremental+ strategy is better for the system AJHotDraw and the Combined strategy is better for the system AJHSQLDB. For PAES, Incremental+ is slightly better than Combined for AJHSQLDB, and finally, for SPEA2, the Incremental+ strategy is slightly better for AJHotDraw.
In summary, in 15 cases, there is no difference between the strategies. Using the Incremental+ strategy, the best solutions are obtained in eleven cases; eight of them give a single objective treatment to the problem (Tarjan, GAA, and GAO). Using the Combined strategy the best solutions are obtained in only two cases, where a multi-objective treatment is given to the problem (GA using an aggregation function and with the multi-objective approach).
According to the results, the Incremental+ is the best strategy to solve the problem when using traditional and search-based approaches (TA and SBA). When there is a multi-objective treatment, Combined achieves good results, too. When applying the multi-objective approach, both strategies have a similar behavior considering the context of our work.
Although both strategies achieve satisfactory results, if the internal members of each stub are considered in the stubbing cost, a statement of Ré can be taken into account to choose a strategy. He states that if the aspects of the system under test are small and have few implemented internal members, it is possible to conjecture that Incremental+ will perform better than Combined. This happens because, in Incremental+, the number of stubs for aspects tends to be greater than the number of stubs for classes, leading to a lower number of internal members to be emulated in these stubs.
RQ2: algorithms evaluation
RQ2 aims at investigating the performance of each algorithm with both strategies. An important goal is to identify the best approach for the strategy adopted by the tester. For instance, if the tester needs to adopt the Incremental+ strategy because the aspects are not yet available for the test, which is the most suitable approach to be used? With respect to this question, we can also observe whether a strategy influences the performance of the algorithms, mainly the multi-objective ones.
We can see in Table 3 that MSBA presents the best cost independent of the strategy used; the solutions of NSGA-II and PAES represent the best trade-off between both objectives, with a greater number of non-dominated (italics) solutions for all systems. Health Watcher and Toll System have only one optimal cost solution. Some approaches have not found this solution (Tarjan and GAO), independent of the strategy used.
Hence, the result obtained in our previous works [11, 12] is also valid for the Incremental+ strategy. We can see in Table 3 that the solutions achieved by Incremental+ for AJHotDraw and AJHSQLDB in MSBA have a better trade-off between the objectives than the solutions achieved by this strategy in TA and SBA.
Regarding the number of solutions, we can observe in Table 3 that the search-based algorithms find a greater number of non-dominated solutions when using the Combined strategy. An explanation for this is that the Incremental+ strategy imposes restrictions on the algorithms, which reduces the search space and, as a consequence, the number of possible solutions. GA for the system AJHotDraw is the single exception where the Incremental+ strategy found a greater number of solutions. The other exception involves the systems with a single solution, cases where there is no difference.
Since the multi-objective approach is the best choice for both strategies, we next evaluate which is the best MOEA. To do this, we performed a visual analysis of the obtained solutions in the search space and used two quality indicators to compare the MOEAs and strategies: HV (Table 6) and ED (Table 7). In both tables, boldface is used to emphasize the strategy with the best results for the same algorithm when differences are observed.
For our visual analysis, we depict the solutions on graphs only for AJHotDraw and AJHSQLDB, since for the other two systems, a single solution was achieved (see Table 3). The graphs are presented in Figure 4. For system AJHotDraw, the solutions are in the same area of the graph, but the set of best solutions is achieved by NSGA-II and PAES, both using the Incremental+ strategy. The worst MOEA is SPEA2, independently of the strategy used. For system AJHSQLDB, the best solutions are clearly observed. These solutions are achieved by PAES with the Incremental+ strategy and PAES with the Combined strategy. We also observe that NSGA-II with the Combined strategy is better than NSGA-II with Incremental+ and SPEA2 with both strategies.
Table 6 presents the mean values of HV considering the 30 runs of each MOEA. The number in parentheses represents the standard deviation. Due to the stochastic nature of the algorithms, the Friedman test was used at a 5% significance level to perform a statistical comparison. This test is applied to raw values, and the post-test of the Friedman test indicates whether there is any statistical difference between each pair of analyzed data sets; to identify which data set has the best values, boxplot charts are used. The boxplot chart gives information about the location, spread, skewness, and tails of the data.
The Friedman statistical test does not indicate a difference between the strategies for the same MOEA. It indicates a statistical difference between algorithms only for systems AJHSQLDB and Toll System. Figure 5 presents the boxplots of the HV indicator for these systems. Regarding the system AJHSQLDB, the results achieved by PAES, independent of the strategy used, are better than those of NSGA-II and SPEA2. Between the other two MOEAs, NSGA-II with Combined is better than SPEA2 with Incremental+. In all other cases, there is no statistical difference. For the Toll System, we note that NSGA-II with the Incremental+ strategy and SPEA2 with both strategies are better than PAES with the Combined strategy, and PAES with Incremental+ is statistically equivalent to all the algorithms.
Since MOEAs return a set of solutions, we need to choose one solution to be used by the tester. Consequently, we use the ED indicator to observe the solutions closest to the ideal solution. Table 7 presents the results of the ED indicator; the cost of the solution with the lowest ED is presented in parentheses. The values in boldface correspond to the best result of all MOEAs.
For almost all the systems, PAES achieved the best ED solutions, with the exception of the AJHotDraw system with Incremental+, where the best was NSGA-II; however, the difference from the PAES solution is not great. Moreover, PAES does not present different results between strategies.
There is no difference among the MOEAs in the Combined strategy. In the Incremental+ strategy, it is possible to note that the strategy influences the results of the MOEAs, leading to a slight difference among them. PAES achieves better results than the other MOEAs in some cases considering HV and ED. However, this advantage of PAES over the other algorithms is statistically significant in only one case.
Threats to validity
In this section, threats to the validity of our work are analyzed. Regarding the construct validity, a possible threat in our evaluation is related to the model used to represent the dependencies of the AO systems, as well as the coupling measures used to calculate the stubbing costs. We know that there are other factors that could be considered. To mitigate this threat, we used the ORD model, algorithms, and measures considered in the literature and in similar studies [7, 8, 10, 22]. We intend to conduct other experiments and study other measures that can influence the problem. In such experiments, other research questions should be investigated.
To mitigate reliability threats, we executed the non-deterministic algorithms 30 times, as recommended in the literature. The experiments can be repeated by following the same methodology, since the systems are available. Regarding the internal validity, our analysis relies on quality indicators (coverage, HV, and ED) and on statistical analysis used in the literature.
The main threat to our work is related to the external validity. The number of systems evaluated can influence the generalization of the obtained results. Although we are using a greater number of systems than related works, the results cannot be generalized because the number of systems is still small. So, our findings can be considered as evidence about the performance of the approaches and strategies. To reduce this influence, we selected aspect-oriented systems with different sizes and complexities, given by the number of modules and dependencies.
Conclusions
This work described the results of an experimental evaluation of two different strategies, Incremental+ and Combined, for integration testing of AO software. The strategies were evaluated with real systems and three approaches (and seven algorithms): (1) the traditional one, based on Tarjan’s algorithm, (2) the GA-based one, implemented with three different configurations of weights, and (3) the multi-objective one, implemented with three MOEAs: NSGA-II, PAES, and SPEA2.
The strategies were evaluated according to the costs, given by the number of stubs, number of attributes and operations, characteristics of the programs, and approach adopted. The performances of the algorithms used with each strategy were also compared.
In a general case, the Incremental+ strategy presented lower costs, and it is a good choice independently of the approach and system characteristics. The Combined strategy presents a greater number of non-dominated solution options to the tester and good performance with more complex systems and the multi-objective approach. The Combined strategy generates a greater number of stubs for aspects since they are integrated and tested together with the classes. Although it does not always find the greatest number of non-dominated solutions in the multi-objective approach, Incremental+ does not require the development of stubs for aspects, and it achieves solutions with the lowest ED (preferred by the decision makers).
In the context of our study, the results show that the multi-objective approach is better than the other approaches independent of the adopted integration strategy. Given this fact, the three multi-objective algorithms were compared, considering the most complex systems. PAES achieved the best results followed by NSGA-II.
In short, it seems that the best choice is the multi-objective approach with Incremental+, since it may be more interesting for the tester to adopt a strategy that generates orders with lower global cost. If the system is very complex, PAES seems to be the best MOEA.
As future work, we intend to use in further experiments other measures that affect the stubbing costs, especially of aspects. In addition to coupling measures, other objectives could be used, for instance, to minimize the total number of stubs, or to minimize the number of stubs for classes or aspects. New experiments with other systems should be conducted to better evaluate the influence of the system characteristics in the performance of the strategies and approaches. Finally, further studies may include costs related to the execution of the test orders.
a Some works use mocks instead of stubs to simulate dependencies in the AO context. Mocks are similar to stubs, but stubs use state verification, whereas mocks use behavior verification. The adoption of stubs or mocks in the integration testing is a decision of the tester, but independently of the simulating technique used, the minimization of the required stubs/mocks is necessary.
b AJHotDraw (version 0.4): http://sourceforge.net/projects/ajhotdraw/; AJHSQLDB (version 18): http://sourceforge.net/projects/ajhsqldb/files/; Toll System (version 9): http://people.cs.kuleuven.be/~aram.hovsepyan/process_study.html; Health Watcher (version 9): http://ptolemy.cs.iastate.edu/design-study/.
References
Alexander RT, Bieman JM, Andrews AA: Towards the systematic testing of aspect-oriented programs. 2004. Colorado State University, Technical Report
Ceccato M, Tonella P, Ricca F: Is AOP code easier or harder to test than OOP code? In First Workshop on Testing Aspect-Oriented Program (WTAOP). Chicago, Illinois; 2005. 15 March 2005
Lemos OAL, Franchin IG, Masiero PC: Integration testing of object-oriented and aspect-oriented programs a structural pairwise approach for java. Sci Comput Program 2009, 74(10):861–878. 10.1016/j.scico.2009.05.001
Zhao J: Data-flow-based unit testing of aspect-oriented programs. 27th Annual International Conference on Computer Software and Applications (COMPSAC). 2003. Dallas, TX, USA, 3–6 November 2003
Zhou Y, Ziv H, Richardson DJ: Towards a practical approach to test aspect-oriented software. In Beydeda S, Gruhn V, Mayer J, Reussner R, Schweiggert F (eds) Proceedings of the workshop on testing component-based systems (TECOS 2004). Erfurt, Germany, September, 2004; 2004. Lecture notes in informatics, vol 58. p 1–16. GI, Konstanz
Massicotte P, Badri M, Badri L: Aspects-classes integration testing strategy: an incremental approach. In 2nd International Workshop on Rapid Integration of Software Engineering techniques (RISE 2005) Heraklion, Crete, Greece, 8–9 September 2005. Lectures notes in computer science, vol 3943. Heidelberg: Springer; 2005:158–173.
Ré R, Masiero PC: Integration testing of aspect-oriented programs: a characterization study to evaluate how to minimize the number of stubs. Brazilian Symposium on Software Engineering (SBES), João Pessoa, PB, Brazil, 15–19 October 2007. 2007, 411–426.
Ré R, Lemos OAL, Masiero PC: Minimizing stub creation during integration test of aspect-oriented programs. In 3rd Workshop on Testing Aspect-Oriented Program (WTAOP). Vancouver, British Columbia; 2007. 13 March 2007. pp 1–6
Melton H, Tempero E: An empirical study of cycles among classes in Java. Empir Softw Eng 2007, 12: 389–415. 10.1007/s10664-006-9033-1
Galvan R, Pozo A, Vergilio S: Establishing integration test orders for aspect-oriented programs with an evolutionary strategy. In 4th Latin American Workshop on Aspect-Oriented Software Development (LA-WASP). Salvador, BA, Brazil; 2010. 27–28 September 2010
Assunção W, Colanzi T, Vergilio S, Pozo A: Generating integration test orders for aspect-oriented software with multi-objective algorithms. Revista de Informática Teórica e Aplicada (RITA) 2013, 20(2):301–327.
Colanzi T, Assunção W, Vergilio S, Pozo A: Generating integration test orders for aspect-oriented software with multi-objective algorithms. In Latin American Workshop on Aspect-Oriented Software Development (LA-WASP). São Paulo, SP, Brazil; 2011. 26 September 2011
Colanzi T, Assunção WKG, Vergilio SR, Pozo A: Integration test of classes and aspects with a multi-evolutionary and coupling-based approach. In Third International Symposium on Search Based Software Engineering (SSBSE). Szeged, Hungary; 2011. 10–12 September 2011. pp 188–203
Assunção W, Colanzi T, Vergilio S, Pozo A: Evaluating different strategies for integration testing of aspect-oriented programs. In Latin American Workshop on Aspect-Oriented Software Development (LA-WASP). Natal, RN, Brazil; 2012. 23 September 2012
Deb K, Pratap A, Agarwal S, Meyarivan T: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 2002, 6(2):182–197. 10.1109/4235.996017
Zitzler E, Laumanns M, Thiele L: SPEA2: improving the strength Pareto evolutionary algorithm. Switzerland: Technical Report 103, Gloriastrasse 35, CH-8092 Zurich; 2001.
Knowles JD, Corne DW: Approximating the nondominated front using the Pareto archived evolution strategy. Evol Comput 2000, 8: 149–172. 10.1162/106365600568167
Kung D, Gao J, Hsia P, Toyoshima Y, Chen C: A test strategy for object-oriented programs. In 19th Computer Software and Applications Conference. Dallas, TX, USA; 1995. 9–11 August 1995
Kung DC, Gao J, Hsia P, Lin J, Toyoshima Y: Class firewall, test order and regression testing of object-oriented programs. J Object-Oriented Programming 1995, 8(2):51–65.
Ré R: A contribution to the minimization of the number of stubs during integration test of aspect-oriented programs. PhD thesis, University of São Paulo – Institute of Mathematical and Computer Sciences (ICMC-USP). In portuguese 2009.
Tarjan R: Depth-first search and linear graph algorithms. SIAM J Comput 1972, 1(2):146–160. 10.1137/0201010
Briand LC, Feng J, Labiche Y: Using genetic algorithms and coupling measures to devise optimal integration test orders. In 14th International Conference on Software Engineering and Knowledge Engineering (SEKE). Ischia, Italy; 2002. 15–19 July 2002
Briand LC, Labiche Y: An investigation of graph-based class integration test order strategies. IEEE Trans. Softw Eng 2003, 29(7):594–607. 10.1109/TSE.2003.1214324
Harman M: The current state and future of search based software engineering. In Future of Software Engineering - FOSE, Minneapolis, Minnesota, 23–25 May 2007. Washington, DC: IEEE Computer Society; 2007:342–357.
Goldberg DE: Genetic algorithms in search, optimization, and machine learning. Boston: Addison-Wesley; 1989.
Coello CAC, Lamont GB, Veldhuizen DAV: Evolutionary algorithms for solving multi-objective problems (Genetic and evolutionary computation). Secaucus: Springer-Verlag New York, Inc.; 2006.
Chicano JF, Luna F, Nebro AJ, Alba E: Using multi-objective metaheuristics to solve the software project scheduling problem. 13th Genetic and Evolutionary Computation Conference (GECCO), Dublin, Ireland, 12–16 July 2011 2011, 1915–1922.
ANNAS: Graph implementation and algorithm package. 2011. Available at http://code.google.com/p/annas/. Accessed August 2011
Bigus JP, Bigus J: Constructing intelligent agents using Java, 2nd edition. New York: John Wiley & Sons, Inc.; 2001.
Durillo J, Nebro A, Alba E: The jMetal framework for multi-objective optimization: design and architecture. In IEEE Congress on Evolutionary Computation (CEC), Barcelona, Spain. Lecture notes in computer science, vol 5467. Berlin/Heidelberg: Springer; 2010:4138–4325.
Vergilio S, Pozo A, Árias J, Cabral R, Nobre T: Multi-objective optimization algorithms applied to the class integration and test order problem. Int J Softw Tools Technol Transf (STTT) 2012, 14: 461–475. 10.1007/s10009-012-0226-1
Arcuri A, Fraser G: On parameter tuning in search based software engineering. In Proceedings of the Third International Symposium on Search Based Software Engineering, SSBSE’11 Szeged, Hungary, 10–12 September 2011. Berlin, Heidelberg: Springer-Verlag; 2011:33–47.
Zitzler E, Thiele L, Laumanns M, Fonseca CM, da Fonseca VG: Performance assessment of multiobjective optimizers: an analysis and review. IEEE Trans Evol Comput 2003, 7: 117–132. 10.1109/TEVC.2003.810758
Zitzler E, Thiele L: Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Trans Evol Comput 1999, 3(4):257–271. 10.1109/4235.797969
Cochrane J, Zeleny M: Multiple criteria decision making. Columbia; 1973.
García S, Molina D, Lozano M, Herrera F: A study on the use of non-parametric tests for analyzing the evolutionary algorithms’ behaviour: a case study on the CEC’2005 Special Session on Real Parameter Optimization. J Heuristics 2009, 15(6):617–644. 10.1007/s10732-008-9080-4
Mortensen M, Ghosh S, Bieman JM: A test driven approach for aspectualizing legacy software using mock systems. Inf Softw Technol 2008, 50(7–8):621–640.
We would like to thank CNPq and CAPES for their financial support. This paper is an extended version of a previous work presented at LA-WASP 2012.
The authors declare that they have no competing interests.
All authors have contributed to the different conceptual and experimental aspects of the study presented in this article. All authors read and approved the final manuscript.
About this article
Cite this article
Assunção, W.K.G., Colanzi, T.E., Vergilio, S.R. et al. Evaluating different strategies for integration testing of aspect-oriented programs. J Braz Comput Soc 20, 9 (2014). https://doi.org/10.1186/1678-4804-20-9
- Aspect-oriented software
- Integration testing strategies
- Evolutionary algorithms