Testing of aspect-oriented programs: difficulties and lessons learned based on theoretical and practical experience

Ferrari, Fabiano C.; P. Cafeo, Bruno B.; Levin, Thiago G.; S. Lacerda, Jésus T.; L. Lemos, Otávio A.; C. Maldonado, José; Masiero, Paulo C.

doi:10.1186/s13173-015-0040-1

Research
Open access
Published: 20 November 2015

Fabiano C. Ferrari¹,
Bruno B. P. Cafeo²,
Thiago G. Levin¹,
Jésus T. S. Lacerda¹,
Otávio A. L. Lemos³,
José C. Maldonado⁴ &
Paulo C. Masiero⁴

Journal of the Brazilian Computer Society volume 21, Article number: 20 (2015)

Abstract

Background

Since the first discussions of new challenges posed by aspect-oriented programming (AOP) to software testing, the real difficulties of testing aspect-oriented (AO) programs have not been properly analysed. Firstly, despite the customisation of traditional testing techniques to the AOP context, the literature lacks discussions on how hard it is to apply them to (even ordinary) AO programs based on practical experience. Secondly, and equally important, due to the cautious AOP adoption focused on concern refactoring, test reuse is another relevant issue that has been overlooked so far. This paper deals with these two issues. It discusses the difficulties of testing AO programs from three perspectives: (i) structural-based testing, (ii) fault-based testing and (iii) test set reuse across paradigms.

Methods

Perspectives (i) and (ii) are addressed by means of a retrospective of research done by the authors’ group. We analyse the impact of using AOP mechanisms on the testability of programs in terms of the underlying test models, the derived test requirements and the coverage of such requirements. The discussion is based on our experience in developing and applying testing approaches and tools to AspectJ programs at both unit and integration levels. Perspective (iii), on the other hand, consists of recent exploratory studies that analyse the effort to adapt test sets for refactored systems and the quality of such test sets in terms of structural coverage.

Results

Building test models for AO programs imposes higher complexity when compared to the OO paradigm. Besides this, adapting test suites for OO programs to AO equivalent programs tends to require less effort than the other way around, and the resulting suites achieve similar quality levels for small-sized applications.

Conclusions

The conclusion is that building test models for AO programs, as well as deriving and covering paradigm-specific test requirements, is not as straightforward as it has been, to some extent, for procedural and object-oriented (OO) programs. Once test suites conform to the programs implemented in both paradigms, the quality of such suites in terms of code coverage may vary depending on the size and characteristics of the applications under test.

Introduction

In 2004, Alexander et al. [1] first discussed the challenges posed by aspect-oriented programming (AOP) to software testing researchers. They enumerated potential sources of faults in aspect-oriented (AO) programs, ranging from the base code itself (i.e. not directly related to aspectual code) to emerging properties due to multiple aspect interactions. In the same report, they proposed a candidate, coarse-grained fault taxonomy for AO programs. Ever since, the software testing community has been investigating ways of dealing with the challenges they described. In summary, research on testing of AO programs (hereafter called AO testing) has been mainly concerned with: (i) the characterisation of fault types and bug patterns [2–7], (ii) the definition of underlying test models and test selection criteria [8–18] and (iii) the provision of automated tool support [11, 14, 16, 18–22]. In particular, structural-based and mutation-based testing have been the focus of several research initiatives [8, 9, 11–29].

Despite the variety of approaches for testing AO software, too little has been reported about the difficulties of applying them based on practical experience. In other words, researchers rarely discuss the difficulty of fulfilling AO-specific test requirements and the ability of their approaches to reveal faults in AO programs. For example, questions like “how hard is it for one to create a test case that traverses a specific path in an AO program graph (in structural-based testing)?” and “how hard is it for one to kill an AO mutant (in mutation-based testing)?” can hardly be answered based on the analysis and discussions presented in the existing literature. Besides this, we observe that, almost two decades after AOP was first disseminated, it is still adopted with caution by practitioners and researchers. This fact was observed in two relatively recent reports [30, 31]. From our experience and observations, when adopted, AOP is applied to refactor existing object-oriented (OO) systems to achieve better modularisation of behaviour that appears intertwined or spread across the system modules (these are the so-called crosscutting concerns [32]). Examples of AOP applied in this context can be found in the work of van Deursen et al. [2], Mortensen et al. [33], Ferrari et al. [34] and Alves et al. [35, 36], not limited to particular technologies such as Java and AspectJ.

Our previous research investigated the fault-proneness of AO programs based on faults identified during the testing of real-world AO applications [34]. This is related to the first aforementioned topic (i.e. fault characterisation). The conclusion was that, amongst the main mechanisms commonly used in AO programs, none stands out in terms of fault-proneness. In that exploratory study, we used test sets built upon the OO versions of the applications and then used such test sets to evaluate the AO counterparts with some test set customisations. Even though that study [34] addressed the reuse of test suites in refactoring scenarios, we did not provide any discussion with respect to the achieved code coverage, nor with respect to the effort required for reusing test sets.

In this paper, we revisit the contributions to AO testing made by our research group over the last decade. We discuss the challenges and difficulties of testing AO programs from three perspectives: (i) structural-based testing, (ii) fault-based testing and (iii) test set reuse across programming paradigms. Regarding perspectives (i) and (ii), we analyse the impact of using AOP mechanisms on the testability of programs in terms of the definition of the underlying models, the derivation of test requirements and the coverage of the requirements. Regarding perspective (iii), considering the OO and AO paradigms, we address the effort for adapting test suites from one paradigm to the other and analyse the quality of reused test sets in both paradigms.

We highlight upfront that this paper extends the discussions and results presented in a previous publication [37]. In order to extend our previous work, we focused on the aforementioned perspective (iii), that is, test set reuse across programming paradigms. We report on the results of a recently performed exploratory study that measures the effort (in terms of code changes) required to adapt test suites from one paradigm to the other and vice versa. Beyond this, we measure the structural coverage that results from the applied test sets. The reader should note that the points presented in this paper rely on our practical experience of establishing and applying approaches to test AO programs by means of theoretical definitions and exploratory assessments.

The remainder of this paper is organised as follows: section ‘Background’ describes basic background on structural and fault-based testing. It also presents basic concepts of aspect-oriented programming and the AspectJ language. Sections ‘Structural-based viewpoint analysis’ and ‘Mutation-based viewpoint analysis’ revisit the contributions of our research group on structural and mutation testing of AO programs, respectively. Section ‘Reuse-centred viewpoint analysis’ brings novel results of an exploratory study that addressed the reuse of test sets across the OO and AO paradigms. Examples and experimental results are presented throughout sections ‘Structural-based viewpoint analysis’, ‘Mutation-based viewpoint analysis’ and ‘Reuse-centred viewpoint analysis’. Section ‘Related work’ summarises related research. Finally, section ‘Final remarks, limitations and research directions’ points out future research directions and concludes this work.

Background

Structural testing

Structural testing—also called white-box testing—is a technique based on internal implementation details of the software. In other words, this technique establishes testing requirements based on internal structures of an application. As a consequence, the main concern of this technique is with the coverage degree of the program logic yielded by the tests [38].

In structural testing, a control flow graph (CFG) is typically used to represent the control flow of a program. A CFG is a directed graph representing the order in which the individual statements, instructions or function calls of a program are executed. In a CFG, nodes represent a statement or a block of statements, and edges represent the flow of control from one statement or block of statements to another. In the context of this paper, we define a block of statements as a sequence of program statements such that, once the first statement of the block is executed, the remaining statements are executed sequentially according to the control flow. Each block corresponds to a node in the CFG, and the transfer of control from one node to another is represented by a directed edge between the nodes.
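As an illustration of the definition above, the following minimal Java sketch stores a CFG as an adjacency list. The class `ControlFlowGraph`, its methods and the `abs` method used as the example are hypothetical names introduced here for illustration only; they are not taken from any of the tools discussed in this paper.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal CFG sketch: each node is a block of statements, each edge a transfer of control. */
final class ControlFlowGraph {
    // Adjacency list keyed by node id; LinkedHashMap keeps insertion order.
    private final Map<Integer, List<Integer>> successors = new LinkedHashMap<>();

    /** Adds a directed edge and registers both end nodes. */
    void addEdge(int from, int to) {
        successors.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        successors.computeIfAbsent(to, k -> new ArrayList<>());
    }

    List<Integer> successorsOf(int node) {
        return successors.getOrDefault(node, List.of());
    }

    int nodeCount() {
        return successors.size();
    }

    int edgeCount() {
        int total = 0;
        for (List<Integer> targets : successors.values()) {
            total += targets.size();
        }
        return total;
    }

    /*
     * CFG of a hypothetical method (assumed example):
     *   int abs(int x) {
     *       int r;
     *       if (x >= 0)     // node 1: declaration plus predicate
     *           r = x;      // node 2
     *       else
     *           r = -x;     // node 3
     *       return r;       // node 4
     *   }
     */
    static ControlFlowGraph absExample() {
        ControlFlowGraph cfg = new ControlFlowGraph();
        cfg.addEdge(1, 2);
        cfg.addEdge(1, 3);
        cfg.addEdge(2, 4);
        cfg.addEdge(3, 4);
        return cfg;
    }

    public static void main(String[] args) {
        ControlFlowGraph cfg = absExample();
        System.out.println("nodes=" + cfg.nodeCount() + ", edges=" + cfg.edgeCount());
    }
}
```

The example method yields four blocks and four edges; the two edges leaving node 1 correspond to the two outcomes of the predicate.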

Test selection criteria (or simply testing criteria) based on control flow use only information about the execution flow of the program, such as statements and branches, to determine which structures need to be tested. Typical structural-based testing criteria defined based on a CFG are all-nodes, all-edges and all-paths [38]. These criteria require test cases that exercise all nodes (i.e. all statements), all edges (i.e. all branches) and all paths (i.e. all possible combinations of nodes and edges) that compose a CFG, respectively. It is important to notice that, although desirable, full coverage of all of these criteria is infeasible in general. For instance, the coverage of the all-paths criterion may be impracticable due to the high number of paths in a CFG. This and other limitations of the control flow-based criteria motivated the introduction of data flow-based criteria.
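To make the difference between all-nodes and all-edges concrete, the sketch below computes both coverage ratios from the paths executed by a test set. It is a simplified illustration under stated assumptions: the class `ControlFlowCoverage`, the encoding of edges as `"from->to"` strings and the three-node graph are all hypothetical, and the required sets are assumed to be non-empty.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ControlFlowCoverage {
    /** all-nodes: fraction of required nodes visited by at least one executed path. */
    static double nodeCoverage(Set<Integer> requiredNodes, List<List<Integer>> executedPaths) {
        Set<Integer> covered = new HashSet<>();
        for (List<Integer> path : executedPaths) {
            covered.addAll(path);
        }
        covered.retainAll(requiredNodes);
        return (double) covered.size() / requiredNodes.size();
    }

    /** all-edges: fraction of required edges traversed by at least one executed path. */
    static double edgeCoverage(Set<String> requiredEdges, List<List<Integer>> executedPaths) {
        Set<String> covered = new HashSet<>();
        for (List<Integer> path : executedPaths) {
            // Consecutive nodes in a path form a traversed edge.
            for (int i = 0; i + 1 < path.size(); i++) {
                covered.add(path.get(i) + "->" + path.get(i + 1));
            }
        }
        covered.retainAll(requiredEdges);
        return (double) covered.size() / requiredEdges.size();
    }

    public static void main(String[] args) {
        // Assumed graph of an 'if' without 'else': 1 = predicate, 2 = then-block, 3 = exit.
        Set<Integer> nodes = Set.of(1, 2, 3);
        Set<String> edges = Set.of("1->2", "1->3", "2->3");
        List<List<Integer>> oneTest = List.of(List.of(1, 2, 3));
        System.out.println("all-nodes: " + nodeCoverage(nodes, oneTest));
        System.out.println("all-edges: " + edgeCoverage(edges, oneTest));
    }
}
```

In this assumed graph, a single test that takes the then-branch visits every node but leaves the edge from node 1 to node 3 uncovered, which is why all-edges is the more demanding criterion.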

For data flow-based testing, the def-use graph extends the CFG with information about the definitions and uses of variables [39]. Data flow-based testing uses data flow analysis as the source of information to derive testing requirements. In other words, the interactions involving definitions of variables and uses of such definitions are explored to derive test requirements. For our purposes, the occurrence of a variable in a program is classified either as a definition or as a use. We consider a value assignment to a variable to be a definition. With respect to use occurrences, a predicate use (p-use) is a use of a variable associated with the decision outcome of the predicate portion of a decision statement (e.g. if (x == 0)), and a computational use (c-use) is a use of a variable that directly affects a computation and is not a p-use (e.g. y = x + 1). P-uses are associated with the edges of the def-use graph and c-uses are associated with its nodes. A definition-clear path (def-clear path) is a path that goes from the place where a variable is defined to a subsequent c-use or p-use, such that the variable is not redefined along the way. A def-use pair with respect to some variable is then a pair of definition and subsequent use locations such that there is a def-clear path with respect to that same variable from the definition to the use location [39]. If a def-use graph is used as the underlying model, typical criteria are all-defs and all-uses [39]. In short, such data flow-based criteria require test cases that traverse paths that include the definition and subsequent uses of variables of the program. For more information about the structural testing criteria mentioned in this section, the reader may refer to seminal studies of structural testing [38, 39].
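The notions of def-clear path and def-use pair can be sketched as a graph search. The code below is a minimal illustration for a single variable, not the analysis implemented by any tool cited here. The class `DefUsePairs`, the string encoding of pairs and the four-node example are hypothetical, and the sketch assumes that a use inside a node precedes any redefinition in that same node.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class DefUsePairs {
    /**
     * Def-use pairs of one variable: "d->u" for a c-use at node u,
     * "d->(a,b)" for a p-use on edge (a,b), both reachable from the
     * definition at node d through a def-clear path.
     */
    static Set<String> pairs(Map<Integer, List<Integer>> successors,
                             Set<Integer> defNodes,
                             Set<Integer> cUseNodes,
                             Set<String> pUseEdges) {
        Set<String> result = new TreeSet<>();
        for (int def : defNodes) {
            Deque<Integer> worklist = new ArrayDeque<>();
            Set<Integer> visited = new HashSet<>();
            worklist.push(def);
            while (!worklist.isEmpty()) {
                int node = worklist.pop();
                for (int next : successors.getOrDefault(node, List.of())) {
                    String edge = "(" + node + "," + next + ")";
                    if (pUseEdges.contains(edge)) {
                        result.add(def + "->" + edge);
                    }
                    if (cUseNodes.contains(next)) {
                        result.add(def + "->" + next);
                    }
                    // A redefinition at 'next' ends the def-clear path.
                    if (!defNodes.contains(next) && visited.add(next)) {
                        worklist.push(next);
                    }
                }
            }
        }
        return result;
    }

    /*
     * Assumed example for variable x:
     *   node 1: x = input; if (x == 0)   (def; p-use on edges (1,2) and (1,3))
     *   node 2: y = x + 1;               (c-use)
     *   node 3: x = 7;                   (redefinition)
     *   node 4: print(x);                (c-use)
     */
    static Set<String> example() {
        return pairs(Map.of(1, List.of(2, 3), 2, List.of(4), 3, List.of(4)),
                Set.of(1, 3), Set.of(2, 4), Set.of("(1,2)", "(1,3)"));
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

In the assumed example, the definition at node 1 reaches the c-use at node 4 only through node 2, because the path through node 3 redefines x; the use at node 4 is therefore also paired with the definition at node 3.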

Fault-based testing and mutation testing

The fault-based testing technique derives test requirements based on information about recurring errors made by programmers during the software development process. It focuses on types of faults which designers and programmers are likely to insert into the software and on how to deal with this issue in order to demonstrate the absence of such prespecified faults [40]. In this technique, fault models (or fault taxonomies) guide the selection or design of test cases that are able to reveal fault types characterised on such models. Fault models and taxonomies can be devised from a combination of historical data, researchers’ and practitioners’ expertise and specific programming paradigm concepts and technologies.

The most investigated and applied fault-based test selection criterion is the mutant analysis [41], also known as mutation testing. Basically, it consists in creating several versions of the program under testing, each one containing a simple fault. Such modified versions of the program are called mutants and are expected to behave differently from the original program. Each mutant is executed against the test data and is expected to produce a different output when compared to the execution of the original program.

In mutation testing, given an original program P, mutation operators encapsulate a set of modification rules applied to P in order to create a set of mutants M. Then, for each mutant m ∈ M, the tester runs a test suite T originally designed for P. If there exists a test case t ∈ T such that m(t) ≠ P(t), the mutant is considered killed. If not, the tester should enhance T with a test case that reveals the difference between m and P. If m and P are equivalent, then P(t) = m(t) for all test cases that can be derived from P’s input domain, so no test case can kill m.
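The kill condition can be illustrated with a small Java sketch. The class `MutantKillCheck`, the program `abs` and its two mutants are hypothetical examples chosen for illustration; they do not come from the mutation operators or tools discussed in this paper.

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

final class MutantKillCheck {
    /** A mutant m is killed by T if some t in T yields m(t) != P(t). */
    static boolean isKilled(IntUnaryOperator original, IntUnaryOperator mutant, List<Integer> tests) {
        for (int t : tests) {
            if (original.applyAsInt(t) != mutant.applyAsInt(t)) {
                return true;
            }
        }
        return false;
    }

    // Original program P (assumed example).
    static int abs(int x) {
        return x < 0 ? -x : x;
    }

    // Mutant: relational operator '<' replaced by '>'.
    static int absRelationalMutant(int x) {
        return x > 0 ? -x : x;
    }

    // Mutant: '<' replaced by '<='; equivalent to P because -0 == 0.
    static int absEquivalentMutant(int x) {
        return x <= 0 ? -x : x;
    }

    public static void main(String[] args) {
        System.out.println(isKilled(MutantKillCheck::abs, MutantKillCheck::absRelationalMutant, List.of(0)));
        System.out.println(isKilled(MutantKillCheck::abs, MutantKillCheck::absRelationalMutant, List.of(0, 5)));
    }
}
```

With the test set {0} the relational mutant stays alive, since both programs return 0; adding the test case 5 kills it. The second mutant cannot be killed by any input, which is what makes it equivalent.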

Mutation testing can be applied with two goals: (i) evaluation of the program under test (i.e. P) or (ii) evaluation of the test data (i.e. T). In the first case, faults in P are uncovered when fault-revealing mutants are identified. Given that S is the specification of P, a mutant is said to be fault-revealing when it leads to the creation of a test case t ∈ T that shows that P(t) ≠ S(t) [42, p. 536].

In the second case, mutation testing evaluates how sensitive the test set is in order to identify as many faults simulated by mutants as possible.

Mutation testing is usually performed in four steps [41]: (1) execution of the original program, (2) generation of mutants, (3) execution of the mutants and (4) analysis of the mutants. After each cycle of mutation testing, the current result is calculated through the mutation score, which is the ratio of the number of killed mutants to the total number of generated (non-equivalent) mutants. The mutation score is a value in the interval [0,1] that reflects the quality of the test set with respect to the produced mutants. The closer to 1 the mutation score is, the higher the quality of the test set [42].
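The mutation score described above can be written as a one-line computation. The sketch below is illustrative only: the class `MutationScore`, its parameter names and the counts used in `main` are assumptions, not data from this paper.

```java
final class MutationScore {
    /**
     * Ratio of killed mutants to non-equivalent mutants.
     * Equivalent mutants are excluded from the denominator because no test can kill them.
     */
    static double score(int killed, int generated, int equivalent) {
        int nonEquivalent = generated - equivalent;
        if (equivalent < 0 || nonEquivalent <= 0 || killed < 0 || killed > nonEquivalent) {
            throw new IllegalArgumentException("inconsistent mutant counts");
        }
        return (double) killed / nonEquivalent;
    }

    public static void main(String[] args) {
        // Hypothetical counts: 50 mutants generated, 5 equivalent, 36 killed.
        System.out.println(score(36, 50, 5));
    }
}
```

With these hypothetical counts the score is 36 / 45 = 0.8; killing the remaining nine non-equivalent mutants would raise it to 1.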

Aspect-oriented programming

Aspect-oriented programming (AOP) [32] relies on the principle of separation of concerns (SoC) [43]. Software concerns, in general, may address both functional requirements (e.g. business rules) and non-functional properties (e.g. synchronisation or transaction management). In the context of AOP, a concern is handled as a coarse-grained feature that can be modularised within well-defined implementation units. In AOP, the so-called crosscutting concerns are those that cannot be properly modularised within conventional units [32]. For example, in traditional programming approaches like procedural and object-oriented programming (OOP), code that implements a crosscutting concern usually appears scattered over several modules and/or tangled with other concern-specific code. The code of the other (non-crosscutting) concerns comprises the base code of the software.

To improve the modular implementation of crosscutting concerns, AOP introduces the notion of aspects. An aspect can be either a conceptual programming unit or a concrete, specific unit named aspect (as in widely investigated languages such as AspectJ¹ and CaesarJ²). Once both aspects and base code are developed, they are combined during a weaving process [32] to produce a complete system.

In AspectJ, which is the most investigated AOP language and whose implementation model has inspired the proposition of several other languages, aspects have the ability to modify the behaviour of a program at specific points during its execution. Each of the points at which aspectual behaviour is activated is called a join point. A set of join points is identified by means of a pointcut descriptor or simply pointcut. A pointcut is represented by a language-based matching expression that identifies a set of join points that share some common characteristic (e.g. based on properties or naming conventions). This selection ability is often referred to as quantification [44].

During the program execution, once a join point is identified, a method-like construct named advice may run, possibly depending on some runtime check. Advice can be of different types depending on the supporting technology. For example, in AspectJ, advice can be defined to run at three different moments when a join point is reached: before, after or around (in place of) it.
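The interplay of join points, pointcuts and advice can be mimicked in plain Java. The sketch below is only an emulation written for illustration: it is not AspectJ syntax and does not reproduce how the AspectJ weaver works. The class `AdviceEmulation`, the naming-convention pointcut and the `trace` list are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class AdviceEmulation {
    // Records the order in which advice and base behaviour run.
    final List<String> trace = new ArrayList<>();

    /** "Pointcut": selects join points by a naming convention (assumed rule). */
    boolean pointcut(String methodName) {
        return methodName.startsWith("set");
    }

    /** Runs a join point with before and after advice when the pointcut selects it. */
    <T> T execute(String methodName, Supplier<T> body) {
        boolean advised = pointcut(methodName);
        if (advised) {
            trace.add("before:" + methodName);
        }
        T result = body.get();
        trace.add("body:" + methodName);
        if (advised) {
            trace.add("after:" + methodName);
        }
        return result;
    }

    /** Around advice: runs in place of the join point and here chooses not to proceed. */
    <T> T executeAround(String methodName, Supplier<T> proceed, T replacement) {
        if (!pointcut(methodName)) {
            return proceed.get();
        }
        trace.add("around:" + methodName);
        return replacement;
    }

    public static void main(String[] args) {
        AdviceEmulation program = new AdviceEmulation();
        program.execute("setBalance", () -> 100);
        program.execute("getBalance", () -> 100);
        System.out.println(program.trace);
    }
}
```

Only the call named `setBalance` is selected by the assumed pointcut, so `getBalance` runs without any advice. The around variant shows why such advice is hard to model statically: whether the original behaviour runs at all is decided inside the advice.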

AspectJ can also perform structural modifications of modules that comprise the base code. These modifications are achieved by the so-called intertype declarations (ITDs). Examples of intertype declarations are the introduction of a new attribute or method into a base module or a change in the class’ inheritance.

Structural-based viewpoint analysis

This section revisits the contributions of our research group on structural testing of AO programs. It addresses three main concerns of systematic testing: the establishment of underlying structural models (section ‘Creating an underlying model’), the identification of relevant test requirements based on those models (section ‘Deriving test requirements’) and the difficulty of analysing and covering such requirements (section ‘Covering and analysing test requirements’).

Creating an underlying model

As described in Section ‘Structural testing’, the basic idea behind structural testing criteria is to ensure that specific elements (control elements and data structures) in a program are exercised by a given test set, providing evidence of the quality of the testing activity. It is supposed that the underlying model represents the dynamic behaviour of programs based on static information to generate relevant test requirements. In general, such static information is extracted from the source code. However, there may be differences between what is extracted from source code and what is the real dynamic behaviour. In techniques such as OO programming, such differences can be seen in cases of, for example, member (e.g. method or attribute) overriding and method overloading. In such cases, a special representation of these cases in the underlying model can help to reveal problems related to the dynamic behaviour.

In AOP, this situation seems to be more critical. Underlying models for AO testing are often adapted from other paradigms and programming techniques. Such models adapt existing abstractions by simply adding nodes and edges to represent the integration of some aspectual behaviour with the base program [8, 15, 45]. This is a limitation because the gap between the static information used to build the underlying model in AOP and its dynamic behaviour is more evident. For example, AOP allows the use of different mechanisms, such as the cflow pointcut designator or the around advice, which are inherently runtime-dependent.

To ameliorate the aforementioned problem, our research group applies a more sophisticated approach. We devised a series of underlying test models based on static information which are closer to the dynamic behaviour of the program. We identify specific situations that happen in OO and AO programs, represent them in the models and then generate relevant test requirements for testing the dynamic behaviour of the program. We use the Java bytecode to generate the underlying model for programs written in Java and AspectJ [11, 14, 16, 24, 46]. We take advantage of the AspectJ weaving process to extract static information of two different programming languages from one unified representation (the Java bytecode). This reduces the gap between static information and dynamic behaviour of a program. Moreover, our approach handles some particular cases where the bytecode does not have sufficient information for building the underlying model. This is related to information that enables the generation of relevant test requirements for testing OO and AO programs, such as overriding, recursion and around advice.

Deriving test requirements

Structural testing uses an internal perspective of the system to define testing criteria and derive test requirements. These test requirements aim at exercising the program’s data structures and its control flow. To better analyse the issues of deriving test requirements in AO programs, we summarise some research that has proposed structural testing criteria for procedural and OO programs. Afterwards, we describe criteria adapted from procedural and OO programs to AO programs and contrast them with AO-specific criteria to emphasise the intricacies of deriving test requirements in AO programs.

Structural requirements for procedural and OO programs

Control flow- and data flow-based criteria for procedural programs (e.g. all-nodes, all-edges and all-uses) are well established. They date from 30 years ago [39] and have evolved to address the integration level [47]. The underlying models explicitly show the internal logic of units and the data interactions when either unit or integration testing is in focus.

For OO programs, control flow and data flow criteria are evolutions of criteria defined for procedural programs. For instance, Harrold and Rothermel [48] addressed the structural testing of OO programs by defining data flow-based criteria for four test levels: intra-method, inter-method, intra-class and inter-class. The authors addressed only explicit unit interactions; dealing with polymorphic calls and dynamic binding issues—i.e. OO specificities—was listed as future work [48].

Inspired by Harrold and Rothermel’s criteria, Vincenzi et al. [49] presented a set of testing criteria based on both control flow and data flow for unit (i.e. method) testing. Vincenzi et al.’s approach relies on Java bytecode analysis and is automated by the JaBUTi tool. As the reader can notice, unit interactions were again not addressed by the authors.

Structural requirements for AO programs

In our research [11], we developed an approach for unit testing of AO programs considering a method or an advice as the unit under test. We proposed a model to represent the control flow of a unit and the join points that may activate an advice. Special types of nodes, the so-called crosscutting nodes, are included in the CFG to represent additional information about the type of advice that affects that point, as well as the name of the aspect the advice belongs to. Control flow and data flow testing criteria are proposed to specifically require paths that include the crosscutting nodes and their incoming and outgoing edges.
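The idea of requiring the edges that enter and leave crosscutting nodes can be sketched as follows. This is a simplified illustration and not the implementation of the approach in [11]: the class `CrosscuttingRequirements`, the `"from->to"` edge encoding and the four-node unit are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CrosscuttingRequirements {
    /** Edge-level requirements: every edge entering or leaving a crosscutting node. */
    static Set<String> requiredEdges(Map<Integer, List<Integer>> successors,
                                     Set<Integer> crosscuttingNodes) {
        Set<String> required = new TreeSet<>();
        for (Map.Entry<Integer, List<Integer>> entry : successors.entrySet()) {
            int from = entry.getKey();
            for (int to : entry.getValue()) {
                if (crosscuttingNodes.contains(from) || crosscuttingNodes.contains(to)) {
                    required.add(from + "->" + to);
                }
            }
        }
        return required;
    }

    /*
     * Assumed unit: node 1 is a predicate, node 2 is a crosscutting node
     * (a join point affected by a before advice), node 3 is the advised
     * call and node 4 is the exit.
     */
    static Set<String> example() {
        Map<Integer, List<Integer>> successors =
                Map.of(1, List.of(2, 4), 2, List.of(3), 3, List.of(4));
        return requiredEdges(successors, Set.of(2));
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

In the assumed unit, a test that follows the edge from node 1 directly to node 4 satisfies none of these requirements, so a test set that is adequate for the base code alone may leave the aspect-related requirements uncovered.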

To address the integration level, we explored the pairwise integration testing of OO and AO programs [14]. In short, the approach combines two communicating units into a single graph. We also defined a set of control flow and data flow criteria based on such representation. Figure 1 exemplifies the integration of two units (caller and called). Note that one of the units is affected by a before advice, which is represented with the crosscutting node notation: crosscutting nodes are drawn as dashed, elliptical nodes.

Neves et al. [46] developed an approach for integration testing of OO and AO programs in which a unit is integrated with all the units that interact with it in a single level of integration depth. We presented an evolution [24] of the approaches presented by ourselves [11, 14] and by Neves et al. [46]. We augmented the integration of units considering deeper interaction chains (up to the deepest level), without making the integration testing activity too expensive, since we integrate units in a configurable level of integration depth. This augmented integration approach also brings customised control flow and data flow criteria. We highlight that all the representation models we proposed rely on Java bytecode analysis; furthermore, they all represent crosscutting nodes using a special type of node as shown in Fig. 1.

Our most recent approach characterises the whole execution context for a given piece of advice in a model that represents the execution flow from the aspect perspective [16]. A set of control flow and data flow criteria was proposed to require the execution of paths related to base code-advice integration.

Covering and analysing test requirements

In a series of preliminary assessment studies, we emphasised the effort required to cover test requirements derived from the proposed criteria for pairwise testing [14], multi-level integration testing [24] and pointcut-based integration testing [16]. A summary of the results is depicted in Table 1.

Table 1 Results of evaluation study of structural-based testing approaches
