An exception handling system for service component architectures
© The Brazilian Computer Society 2012
Received: 20 December 2011
Accepted: 16 January 2012
Published: 4 February 2012
The Service Component Architecture (SCA) makes it possible to combine existing and new services based on a variety of technologies with components built using a component-based development approach. However, when asynchronous service compositions are executed, one or more errors can occur, possibly at the same time, affecting the dependability of the composition. To guarantee that the composition succeeds or at least fails in a controlled manner, fault tolerance mechanisms must be employed. In this paper, we propose a novel exception handling model that targets the needs of dependable SCA applications. The model is applicable to service-oriented systems and allows the creation of fault-tolerant asynchronous service compositions. The EH-SCA framework instantiates the proposed model as an extension of the Apache Tuscany SCA infrastructure. Developers can apply this instantiation of the model to both new and existing applications by using a simple and flexible aspect-oriented programming model. Finally, a case study of the EH-SCA framework shows how it can be used to build dependable distributed applications.
Keywords: Exception handling; Service-component architectures; Fault tolerance; Service-oriented computing; Coordinated exception handling
Service-Oriented Architecture (SOA) is an architectural model that aims to enhance the efficiency, agility, and productivity of enterprise businesses by means of services and service compositions. A service is defined as a self-contained distributed unit composed of two loosely coupled elements: a specification, which provides an abstract interface, and an implementation. Services can be grouped to be executed in a specific order, either synchronously or asynchronously, resulting in a service composition.
Different software technologies can be used to implement the SOA paradigm, such as Web Services technology, which is based on XML standards like the Simple Object Access Protocol (SOAP) and the Web Services Description Language (WSDL). SOA can also be implemented using the Service Component Architecture (SCA), which defines a component model for implementing services and service compositions. A service implemented within the SCA component model is called a service component. SCA supports interoperability among various SOA technologies, such as Web Services, WS-BPEL, Java Message Service (JMS), JSON-RPC, CORBA, and EJB.
In particular, when asynchronous service compositions are executed, errors can occur concurrently in different services, affecting the composition’s dependability. Fault tolerance mechanisms are therefore necessary to prevent service compositions from reaching a failure state. There are two ways to recover a service composition from an error: backward and forward error recovery. The former rolls the system components back to a previous correct state, while the latter transforms the system into a new correct state using exception handling mechanisms. In the SOA context, it is not always possible to roll back services, since they are by definition autonomous and self-contained units; one must therefore rely on exception handling mechanisms to bring the system to a new correct state and thus provide fault-tolerant asynchronous service compositions. However, exception handling mechanisms for asynchronous service compositions must take into account that different exception types can be raised concurrently by different services. Different combinations of concurrent exceptions might therefore require the execution of different combinations of service handlers. This complex error handling scenario requires exception handling mechanisms that are flexible and dynamic at runtime. Moreover, some global coordination mechanism for error handling is required, such as Coordinated Atomic actions (CA actions). A CA action provides fault tolerance by integrating coordinated exception handling, cooperative multithreading, and atomic transactions.
WS-Business Process Execution Language (WS-BPEL), one of the most popular languages for creating Web service compositions, does not support fault-tolerant asynchronous service compositions. When a service signals an exception within a composition, all service invocations are terminated as soon as the exception is caught, and a single handler is executed. There is no support for coordinated error handling or for concurrent exception handling. Tartanoglu et al. propose a solution in terms of a structuring unit called Web Service Composition Action (WSCA), based on the concept of CA action without transactional guarantees. However, this solution has some drawbacks. First, for each exception raised by a service, the same exception is delivered to all participants of the composition, which reduces the flexibility available for implementing handling actions. Second, the recovery process relies heavily on compensation actions, which services are not required to implement. Moreover, the proposed solution targets only Web services technology.
In this paper, we present the design and implementation of a coordinated exception handling model that targets some of the particularities of SCA systems (Sect. 2.1). It allows the creation of fault-tolerant asynchronous service compositions in a flexible way. Also, since SCA systems may be highly dynamic, it supports a flexible notion of exception propagation in which the recovery rules that dictate how exceptions are propagated can be defined on a per-application basis. The definition of application-specific recovery rules, which are not necessarily based on compensation actions, makes our solution more general and flexible than WSCA. We describe our solution using the primitives and abstractions of the Guardian model [15, 16] for exception handling (Sect. 2.3). Guardian is a general conceptual framework for describing coordinated exception handling models and mechanisms. We have implemented the proposed exception handling model as a framework, named EH-SCA (Sect. 3), which extends the Apache Tuscany SCA platform (Sect. 2.2), an SCA infrastructure capable of integrating various SOA technologies. To use EH-SCA in their applications, developers can adopt a simple programming model based on aspect-oriented programming (AOP) that requires little effort (Sect. 4). We also provide an example of the usage of the EH-SCA framework to implement a primary-backup system (Sect. 5).
2.1 Service component architecture
A software component is a unit of modularity composed of two parts: the specification part, with explicit provided and required interfaces, and the implementation part. A service component provides support for implementing services using components. Each component can implement one or more services, where the service’s implementation is part of the component’s implementation, and the service’s specification is mapped to a component’s provided interface. An implementation realizes the business logic in a specific technology, including programming languages such as Java, C++, and Ruby, and frameworks and environments such as Spring and BPEL. A provided interface, referred to as a “service” in the SCA context, defines the operations provided by the service component. Moreover, a service component can use required interfaces provided by other service components, known as “references.”
SCA service components can be connected either manually (by a programmer) or automatically (by the SCA runtime environment), using services and references. The communication protocol used between two service components is specified by the SCA “binding” element, which separates the service component’s communication infrastructure from the business logic implementation and thus enhances the component’s reusability.
In terms of error handling, SCA poses a number of challenges. First of all, SCA systems are essentially service-oriented. They may span different administrative domains and comprise services whose implementations are not available or that are hosted by different organizations. As pointed out previously, error recovery mechanisms cannot make assumptions about the recovery capabilities of these services. In this scenario, it is not possible to employ a rollback-based approach. Second, services can be invoked asynchronously and errors may be signaled concurrently. Since it is usually not safe to assume that concurrently signaled errors are independent, some means of coordinating error recovery is necessary. A third challenge is that SCA systems can integrate a number of different technologies with different features and based on different standards. In this sense, SCA differs from Web services, since the latter are only one of the technologies that can be used in SCA-based applications. An exception handling model for SCA must be generic and flexible enough to support service compositions involving diverse technologies. At the same time, it has to be well integrated with the underlying SCA platform, the infrastructure that mediates the interactions among the parts of the composition. Finally, the exception handling model must work in a dynamic setting: components may fail or become temporarily unavailable due to failures, upgrades, reboots, etc.; services may be withdrawn, since they can be hosted by different organizations; and application needs may trigger reconfigurations, which imply different interactions and, hence, different sets of parts involved in coordinated error handling.
2.2 Apache Tuscany—an SCA platform
Tuscany’s runtime environment has a modular and pluggable architecture, so users can choose the functionality that they need. The Composite Application block represents the business application built with Tuscany and described using the XML-based SCA assembly model (Sect. 2.1). Tuscany Extensions are implemented by using the Tuscany SPI (Service Provider Interface), which offers a modularized way to define bindings, databindings, implementation types, policies, and interface types. Bindings provide support for different kinds of communication protocols. Databindings provide support for different data formats for communication among services. Implementation types provide support for different programming languages and container models. Policies provide flexibility to adjust architectural concerns, such as security and transactions, without impacting the business logic code.
To run an SCA business application, the Tuscany runtime first loads and configures the SCA SCDL (Service Component Definition Language) file. The various SCDL artifacts are inspected, and factory methods instantiate the objects that represent the service components in memory. The next step is to instantiate the runtime wires that connect the components. In this phase, runtime wires are created for component references and component services according to the bindings specified in the SCDL file.
A runtime wire is a collection of invocation chains. Each invocation chain consists of a set of invokers and interceptors. Invokers provide the invocation logic for binding protocols and implementation technologies, while interceptors are a special kind of invoker that provides additional functionality, such as data transformation, security, and transaction control. The runtime environment creates an invocation chain for each operation in a service/reference interface.
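The wire structure described above follows a chain-of-responsibility pattern, sketched below in Java. The Invoker, Interceptor, InvocationChain, and RuntimeWire types here are simplified, hypothetical stand-ins rather than Tuscany’s actual SPI, and messages are plain strings for brevity.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified types; Tuscany's real invocation SPI differs.
interface Invoker {
    String invoke(String message);
}

// An interceptor is an invoker that adds behavior and then delegates to the next invoker.
abstract class Interceptor implements Invoker {
    protected Invoker next;

    void setNext(Invoker next) {
        this.next = next;
    }
}

// Stands in for, e.g., a data transformation step: tags the message and passes it on.
class TagInterceptor extends Interceptor {
    private final String tag;

    TagInterceptor(String tag) {
        this.tag = tag;
    }

    @Override
    public String invoke(String message) {
        return next.invoke(tag + ":" + message);
    }
}

// Terminal invoker: where the binding protocol or component implementation would be called.
class ImplementationInvoker implements Invoker {
    @Override
    public String invoke(String message) {
        return "handled(" + message + ")";
    }
}

// One invocation chain: interceptors in order, ending at the terminal invoker.
class InvocationChain {
    private final Invoker head;

    InvocationChain(List<Interceptor> interceptors, Invoker terminal) {
        Invoker current = terminal;
        // Link from the back so that the first interceptor in the list runs first.
        for (int i = interceptors.size() - 1; i >= 0; i--) {
            interceptors.get(i).setNext(current);
            current = interceptors.get(i);
        }
        this.head = current;
    }

    String invoke(String message) {
        return head.invoke(message);
    }
}

// A runtime wire: one invocation chain per operation of the service/reference interface.
class RuntimeWire {
    private final Map<String, InvocationChain> chains = new HashMap<>();

    void addChain(String operation, InvocationChain chain) {
        chains.put(operation, chain);
    }

    String invoke(String operation, String message) {
        InvocationChain chain = chains.get(operation);
        if (chain == null) {
            throw new IllegalArgumentException("no chain for operation " + operation);
        }
        return chain.invoke(message);
    }
}
```

With interceptors "a" and "b" registered in that order, invoking an operation with the message "m" yields "handled(b:a:m)": each interceptor runs in list order before the terminal invoker is reached.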
2.3 The guardian model of exception handling
An exception context corresponds to an application-specific execution phase or region of a program, which has a symbolic name and a handler associated with it (e.g., in Java, a try-catch block). There are three kinds of exception contexts: signaling context, raising context, and target context. The first is the context a participant is in when an exception is signaled within it. The second is the context a participant is in when an exception is raised in it. Finally, the target context is the context in which an exception is handled. Furthermore, as depicted in Fig. 4, communication occurs between participants and guardian members, and between guardian members and the guardian group, where a guardian member is a logical replica of the guardian associated with each participant. A set of operations, called guardian primitives, is used as the communication channel. The guardian primitives are responsible for controlling contexts (enabling and removing a context) and exception propagation (throwing global exceptions) at runtime, and for checking whether there are any pending global exceptions to be delivered to a specific participant. In a coordinated exception handling mechanism, a global exception is an exception that needs to be handled cooperatively by a set of participants. In contrast, an exception that can be handled locally within a participant is called a local exception.
When one or more exceptions of different types are raised concurrently, the guardian uses a resolution tree mechanism to find a resolved exception. Finding a resolved exception consists of searching for the lowest common ancestor of the raised exceptions in a resolution tree, which establishes a hierarchy among the exceptions that can be raised concurrently. Once the resolved exception is found, the guardian relies on the recovery rules to determine which exception should be raised in each involved participant, as well as the proper target context.
3 The EH-SCA framework
In this section, we describe the proposed exception handling model in terms of the abstractions and primitives of the Guardian model. Concomitantly, we present our implementation of these abstractions and primitives, the EH-SCA framework, as an extension to the Apache Tuscany platform. Then, in the next section, we present the EH-SCA programming model, which leverages aspect-oriented programming techniques to simplify the definition of handlers and their association with specific points of an SCA system.
In the current version of EH-SCA, the components that signal, raise, and handle exceptions must be written in Java. Making EH-SCA language-independent would require a more in-depth understanding of Tuscany’s internals and a larger implementation effort. Since we organize EH-SCA in terms of the primitives of the Guardian model, which is language-independent, the model itself does not rely on the specifics of the Java language. At the same time, an SCA application whose components employ multiple technologies can still use EH-SCA, since Apache Tuscany can make these components communicate using different technologies, such as WS-BPEL, JSON-RPC, and Web Services. In scenarios where third-party services must be integrated in a fault-tolerant manner by using EH-SCA, service components written in Java are responsible for raising and handling the exceptions. This does not limit the applicability of the proposed model and implementation: since it may not be possible to modify the composed services, the composition of third-party services can be performed by using EH-SCA-based service components as proxies for them.
3.1 Exception representation
We define exceptions as classes that extend the GlobalException class. These exceptions are global, i.e., they flow between different service components. Instances of GlobalException carry the information of which participant has raised the exception, the context in which this exception was raised (the signaling context), as well as the context in which the exception should be handled (target context). The model also predefines some membership global exceptions, such as JoinException and LeaveException. A JoinException is raised when a new participant joins a group (Sect. 3.2). In a similar way, a LeaveException is raised when a participant leaves the group. These exceptions are useful to maintain compatibility with the Guardian model.
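A minimal Java sketch of this representation follows. It assumes string identifiers for participants and contexts and uses hypothetical accessor names; it also assumes that the target context is filled in later by the guardian’s recovery rules, which is why the JoinException and LeaveException constructors leave it unset.

```java
// Hypothetical sketch: field and method names are assumptions, not EH-SCA's actual API.
// Extending Exception makes global exceptions checked.
class GlobalException extends Exception {
    private static final long serialVersionUID = 1L;

    private final String signaler;          // participant that raised the exception
    private final String signalingContext;  // context in which it was raised
    private String targetContext;           // context in which it should be handled

    GlobalException(String signaler, String signalingContext, String targetContext) {
        super(signaler + " in " + signalingContext);
        this.signaler = signaler;
        this.signalingContext = signalingContext;
        this.targetContext = targetContext;
    }

    String getSignaler() { return signaler; }

    String getSignalingContext() { return signalingContext; }

    String getTargetContext() { return targetContext; }

    // Assumed hook for the guardian to set the target context chosen by a recovery rule.
    void setTargetContext(String targetContext) { this.targetContext = targetContext; }
}

// Predefined membership exceptions; the target context is left for recovery rules to decide.
class JoinException extends GlobalException {
    private static final long serialVersionUID = 1L;

    JoinException(String participant, String context) { super(participant, context, null); }
}

class LeaveException extends GlobalException {
    private static final long serialVersionUID = 1L;

    LeaveException(String participant, String context) { super(participant, context, null); }
}
```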
A program raises a global exception by calling the gthrow() method (Sect. 3.2). This method has the same name as the Guardian primitive responsible for signaling a global exception. In our Java implementation, local exceptions are raised by using the throw statement. We do not impose any constraints on how exceptions are represented within a component, since programming languages take different approaches to exceptions. Some of them, such as C, have no concept of exceptions or anything similar.
The signatures of the services in component interfaces should explicitly indicate the exceptions they raise. During composition, the compiler should check whether clients of these services handle these exceptions and, if they do not, produce an error message. In other words, global exceptions are checked exceptions. A number of modern programming languages, such as C#, Scala, and Go, do not use checked exceptions because of their well-known maintainability issues. Nonetheless, we consider that explicitly indicated exceptions provide useful documentation. In addition, they represent part of the contract that users of a service must be aware of. Furthermore, due to compiler support, they can improve system reliability by emphasizing the need for errors to be handled. Finally, if checked exceptions are employed only at the component interface level, changes to such interfaces do not necessarily imply global changes, as would be the case for finer-grained checked exceptions.
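The following Java sketch shows what an interface-level checked global exception looks like. PaymentService, PaymentRejectedException, and the minimal GlobalException stand-in are hypothetical. Because the exception is checked, a client that neither catches nor declares it fails to compile, which is the kind of compiler support the argument above relies on.

```java
// Minimal stand-in for EH-SCA's GlobalException; extending Exception makes it checked.
class GlobalException extends Exception {
    private static final long serialVersionUID = 1L;

    GlobalException(String message) { super(message); }
}

// Hypothetical global exception declared at the component interface level.
class PaymentRejectedException extends GlobalException {
    private static final long serialVersionUID = 1L;

    PaymentRejectedException(String message) { super(message); }
}

// The service signature is part of the contract: it names the global exceptions it may raise.
interface PaymentService {
    String pay(String orderId, long amountCents) throws PaymentRejectedException;
}

class SimplePaymentService implements PaymentService {
    @Override
    public String pay(String orderId, long amountCents) throws PaymentRejectedException {
        if (amountCents <= 0) {
            throw new PaymentRejectedException("non-positive amount for " + orderId);
        }
        return "receipt-" + orderId;
    }
}

class CheckoutClient {
    // Removing this try/catch without adding a throws clause is a compile-time error.
    static String checkout(PaymentService service, String orderId, long amountCents) {
        try {
            return service.pay(orderId, amountCents);
        } catch (PaymentRejectedException e) {
            return "rejected: " + e.getMessage();
        }
    }
}
```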
3.2 Exception handling contexts and coordination
The guardian group is the central entity responsible for mediating the interaction between participants of a composition when errors occur during its execution. The guardian group provides an interface to enable and disable raising, signaling, and handling exception contexts. In EH-SCA, the guardian group is itself a service component.
Each participant is associated with a set of exception handling contexts, enabled dynamically, at runtime, and explicitly, by the application. Nonetheless, applications can enable and disable contexts in a nonintrusive way, without the need to modify the source code of preexisting service components. When one or more global exceptions are raised within a context, all the participants where that context is enabled are involved in coordinated error recovery. In the proposed model, contexts are always associated with sets of participants and may be nested, similarly to Java’s try blocks (although the latter define static scopes). As discussed in the previous section, for SCA systems, it does not make sense to define finer-grained, intracomponent exception handling contexts since components might be implemented using radically different technologies and languages.
The first two methods control exception contexts. The enableContext() method adds and enables a context c in a last-in-first-out (LIFO) context list associated with a participant. The removeContext() method removes the most recently added context from the same list. The Context class implements the concept of exception context, aggregating a name and a list of exceptions that can be handled in the context.
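A per-participant context list can be sketched as a stack. In this Java sketch, Context mirrors the name-plus-handled-exceptions description above, while ContextList, current(), and identifier() are hypothetical; identifier() builds the dot-separated participant identifier from the outermost to the innermost context.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch of the Context class: a name plus the exception types it can handle.
final class Context {
    final String name;
    final List<Class<? extends Exception>> handledExceptions;

    Context(String name, List<Class<? extends Exception>> handledExceptions) {
        this.name = name;
        this.handledExceptions = List.copyOf(handledExceptions);
    }
}

// Per-participant LIFO list of enabled contexts; method names mirror the primitives.
final class ContextList {
    private final Deque<Context> stack = new ArrayDeque<>();

    // enableContext(c): push c on top of the list.
    void enableContext(Context c) {
        stack.push(c);
    }

    // removeContext(): pop the most recently enabled context.
    Context removeContext() {
        if (stack.isEmpty()) {
            throw new IllegalStateException("no enabled context");
        }
        return stack.pop();
    }

    Context current() {
        return stack.peek();
    }

    // Dot-separated identifier from the outermost to the innermost context, e.g. "MAIN.PRIMARY".
    String identifier() {
        StringJoiner joiner = new StringJoiner(".");
        Iterator<Context> it = stack.descendingIterator();  // oldest context first
        while (it.hasNext()) {
            joiner.add(it.next().name);
        }
        return joiner.toString();
    }
}
```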
The gthrow() and propagate() methods control the flow of global exceptions. The former, introduced in Sect. 3.1, is used by a participant to throw a global exception ex to a set of participants specified in participantList. Invoking gthrow() suspends all the participants listed in participantList and interrupts the invoking participant. The propagate() method determines whether the global exception ex should be handled in the current context or propagated to an upper-level context; in other words, it compares the current context with the target context specified in ex. This method is necessary because the guardian has no control over the exception flow inside a participant.
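The comparison performed by propagate() can be sketched as follows, assuming the participant’s contexts are kept as names in a stack with the innermost on top. The helpers handleHere() and propagateTo() are hypothetical; the text does not give the real primitive’s signature.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the comparison behind propagate().
// Contexts are plain names in a stack (e.g. an ArrayDeque), innermost on top.
final class Propagation {
    // True if an exception targeting targetContext should be handled in the current context.
    static boolean handleHere(Deque<String> contexts, String targetContext) {
        return targetContext.equals(contexts.peek());
    }

    // Leaves contexts (as removeContext() would) until the target context is current.
    // Returns the number of contexts left; fails if the target was never enabled.
    static int propagateTo(Deque<String> contexts, String targetContext) {
        int left = 0;
        while (!contexts.isEmpty() && !handleHere(contexts, targetContext)) {
            contexts.pop();
            left++;
        }
        if (contexts.isEmpty()) {
            throw new IllegalStateException("target context not enabled: " + targetContext);
        }
        return left;
    }
}
```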
Finally, the checkExceptionStatus() method allows a participant to check for any pending global exceptions that need to be handled. Participants execute it periodically. If there are global exceptions to be delivered, they are raised within the participant; otherwise, the method simply returns.
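This polling behavior can be sketched as a per-participant queue of pending global exceptions. The PendingExceptions class, its deliver() method, and the string participant identifiers are assumptions made for illustration; only the name checkExceptionStatus() comes from the text.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Minimal stand-in for a global exception.
class GlobalException extends Exception {
    private static final long serialVersionUID = 1L;

    GlobalException(String message) { super(message); }
}

// Hypothetical store of pending global exceptions, keyed by participant identifier.
final class PendingExceptions {
    private final Map<String, Queue<GlobalException>> pending = new ConcurrentHashMap<>();

    // Called on the guardian side once it decides to deliver ex to a participant.
    void deliver(String participant, GlobalException ex) {
        pending.computeIfAbsent(participant, k -> new ConcurrentLinkedQueue<>()).add(ex);
    }

    // Called periodically by the participant: raises the oldest pending exception, if any.
    void checkExceptionStatus(String participant) throws GlobalException {
        Queue<GlobalException> queue = pending.get(participant);
        GlobalException ex = (queue == null) ? null : queue.poll();
        if (ex != null) {
            throw ex;
        }
        // Nothing pending: simply return.
    }
}
```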
3.3 Handler attachment
In the proposed model, exception handlers can be attached to local or global contexts. Local contexts are the contexts that the underlying programming language implements. In EH-SCA, a local context corresponds to a block of statements, the only kind of exception handling context that Java supports, by means of try-catch blocks, where the catch blocks are local handlers. Local handlers address internal exceptions. Handling these exceptions does not require a coordinated approach.
In broad terms, we consider that global handlers can be attached to sets of service components taking part in a composition. At the same time, participants of a composition can have multiple global exception contexts associated with them. For each context, it is possible to attach a number of exception handlers. In fact, there are no bounds on the number of contexts per service component, nor on the number of exception handlers per global context. When a global exception is signaled by a method, it is passed on to the guardian. The latter, based on its recovery rules (Sect. 3.4), decides which exception will be raised in each participant of the composition and the context where this will happen. An exception handler is triggered in a service component if it has an exception handler attached to the selected context and targeting the raised exception.
In EH-SCA, a handler is any subclass of the AbstractHandler class. The latter defines the execute() method, which receives a single argument of type GlobalException and implements the handler logic. Components in an application that uses EH-SCA should extend the HandlerContainer class. This class implements methods for managing the handlers attached to the contexts enabled in a service component. We more carefully describe the implementation of exception handlers in EH-SCA in Sect. 4.
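The following Java sketch illustrates this arrangement, assuming contexts are identified by name. Only the class names AbstractHandler and HandlerContainer and the method names execute() and addHandler() come from the text; the handled-type constructor argument and dispatch() are hypothetical, and getGuardianReference() (Sect. 4) is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal stand-ins for global exceptions.
class GlobalException extends Exception {
    private static final long serialVersionUID = 1L;

    GlobalException(String message) { super(message); }
}

class PrimaryFailedException extends GlobalException {
    private static final long serialVersionUID = 1L;

    PrimaryFailedException() { super("primary failed"); }
}

// A handler: execute() holds the handling logic for one global exception type.
abstract class AbstractHandler {
    private final Class<? extends GlobalException> handledType;

    AbstractHandler(Class<? extends GlobalException> handledType) {
        this.handledType = handledType;
    }

    Class<? extends GlobalException> getHandledType() {
        return handledType;
    }

    abstract void execute(GlobalException ex);
}

// Base class for component implementations: keeps the handlers attached to each context.
abstract class HandlerContainer {
    private final Map<String, List<AbstractHandler>> handlers = new HashMap<>();

    void addHandler(String context, AbstractHandler handler) {
        handlers.computeIfAbsent(context, k -> new ArrayList<>()).add(handler);
    }

    // Runs every handler attached to the context that targets ex's type; returns how many ran.
    int dispatch(String context, GlobalException ex) {
        int ran = 0;
        for (AbstractHandler h : handlers.getOrDefault(context, List.of())) {
            if (h.getHandledType().isInstance(ex)) {
                h.execute(ex);
                ran++;
            }
        }
        return ran;
    }
}
```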
3.4 Exception propagation
Exception propagation is a difficult issue in service component architectures. As pointed out in Sect. 2.1, exceptions must be propagated in nonstandard ways because SCA systems are intrinsically dynamic due to user needs, heterogeneous technologies, administrative issues, and the distributed setting in which they run. As a consequence, SCA systems require more flexible policies for exception propagation, to make it possible to deal with situations such as the enabling and disabling of exception handling contexts at runtime or simply the unavailability of certain service components.
In the proposed model, recovery rules determine the exception propagation paths in an application. They establish the destination of an exception raised in a set of contexts associated with a set of participants. These rules also determine the exception that will be handled by each participant of a guardian group involved in coordinated exception handling. To better cope with the dynamism of SCA systems, recovery rules can be enabled and disabled at runtime. Hence, the propagation paths in an application can be modified dynamically, orthogonally to its structure, as required during its execution. To the best of our knowledge, this is the first exception handling model to provide this kind of flexibility to software developers.
Each participant has a dot-separated identifier built from the elements of its context list. The same structure is used to build the regular expression associated with the match attribute, where the character “*” can be used as a wildcard, meaning that the participant’s context does not matter. Also, the keyword SIGNALER can be used to select the participant that raised the exception in question.
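Because the text does not fully specify the wildcard semantics, the following sketch assumes that “*” stands for one or more whole context names, so that “MAIN.*” matches “MAIN.PRIMARY” but not “MAIN”. ContextMatcher is a hypothetical helper, not part of EH-SCA.

```java
import java.util.regex.Pattern;

// Hypothetical helper: turns a match attribute such as "MAIN.*" into a regular expression
// over dot-separated participant identifiers. Assumption: "*" stands for one or more
// whole context names.
final class ContextMatcher {
    private static final String ANY_CONTEXTS = "[^.]+(\\.[^.]+)*";

    static Pattern compile(String match) {
        StringBuilder regex = new StringBuilder();
        String[] parts = match.split("\\.", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append("\\.");
            }
            // Literal context names are quoted so that characters like '$' are not regex syntax.
            regex.append(parts[i].equals("*") ? ANY_CONTEXTS : Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String match, String participantId) {
        return compile(match).matcher(participantId).matches();
    }
}
```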
The exception that should be raised in the selected participants is determined in the throw_exception element (line 4), where the exception class is specified via the class attribute and the context where the exception will be handled via the target_context attribute. The min_participants_joined and max_participant_joined optional attributes represent, respectively, the minimum and maximum number of participants that must have joined the guardian group for the exception to be delivered to the selected participants. Finally, the optional affected_participants element (line 9) yields a subset of the selected participants, for example, the first (FIRST keyword) or the last (LAST keyword) in the selected participant list. The order of participants in the list is determined by the order in which the guardian receives the requests for association.
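The participant-selection part of a rule can be sketched as follows. DeliveryRule and its fields are hypothetical names modeled on the min_participants_joined, max_participant_joined, and affected_participants attributes; the list of selected participants is assumed to be ordered by the arrival of association requests, as the text specifies.

```java
import java.util.List;

// Hypothetical model of a recovery rule's delivery conditions.
final class DeliveryRule {
    enum Affected { ALL, FIRST, LAST }

    private final int minJoined;      // min_participants_joined (0 when absent)
    private final int maxJoined;      // max_participant_joined (Integer.MAX_VALUE when absent)
    private final Affected affected;  // affected_participants (ALL when absent)

    DeliveryRule(int minJoined, int maxJoined, Affected affected) {
        this.minJoined = minJoined;
        this.maxJoined = maxJoined;
        this.affected = affected;
    }

    // selected: participants matched by the rule, ordered by association-request arrival.
    // joined: number of participants currently joined in the guardian group.
    List<String> recipients(List<String> selected, int joined) {
        if (joined < minJoined || joined > maxJoined || selected.isEmpty()) {
            return List.of();  // conditions not met: nothing is delivered
        }
        switch (affected) {
            case FIRST:
                return List.of(selected.get(0));
            case LAST:
                return List.of(selected.get(selected.size() - 1));
            default:
                return List.copyOf(selected);
        }
    }
}
```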
3.5 Exception resolution
The proposed model, analogously to previous approaches, uses exception resolution trees to determine which exception represents a set of exceptions raised concurrently in a certain context. In summary, the exceptions that can be raised in a system are organized as the nodes of a tree. When two or more exceptions are raised in a given context, the tree is searched for the lowest node whose subtree contains all of the raised exceptions, i.e., their lowest common ancestor. This node is called the “resolved exception”, and it is the exception sent to all the participants involved in coordinated error recovery. However, the resolved exception may undergo a transformation before it is delivered to each participant of the composition. This transformation is useful to allow independently developed exception handlers to work as a unit, because they may have been defined in terms of different exception types. As a consequence, after resolution, a number of different exceptions can be delivered to the various handlers. The transformation of resolved exceptions is defined by means of recovery rules, more specifically, the throw_exception element of Fig. 8.
In the event that two exceptions are raised concurrently, EH-SCA first obtains the values stored at their corresponding positions in R, where E holds an Euler tour of the resolution tree, L holds the depth of each entry of E, and R holds the index of each node’s first occurrence in E. It then uses these values as indexes into vector L. The minimum depth found in L between these two indexes is the depth of the resolved exception, which EH-SCA obtains by inspecting the corresponding position in vector E. Both the time required to construct the E, L, and R vectors and the time to perform exception resolution are linear in the number of nodes in the tree.
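The procedure above is the classic reduction of lowest-common-ancestor queries to a range-minimum query over an Euler tour. The following Java sketch builds the three vectors with one depth-first traversal and resolves exceptions with a linear scan, consistent with the linear bounds stated above. The class and method names and the string-based exception representation are assumptions, not EH-SCA’s code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Exception resolution via an Euler tour of the resolution tree (vector E), the depth of
// each tour entry (vector L), and each node's first index in the tour (vector R).
// Exceptions are modeled as string names.
final class ResolutionTree {
    private final String root;
    private final Map<String, List<String>> children = new HashMap<>();
    private final List<String> eulerTour = new ArrayList<>();        // E
    private final List<Integer> depths = new ArrayList<>();          // L
    private final Map<String, Integer> firstIndex = new HashMap<>(); // R

    ResolutionTree(String root) {
        this.root = root;
    }

    void addChild(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    // One depth-first traversal builds E, L, and R in time linear in the number of nodes.
    void build() {
        eulerTour.clear();
        depths.clear();
        firstIndex.clear();
        visit(root, 0);
    }

    private void visit(String node, int depth) {
        firstIndex.putIfAbsent(node, eulerTour.size());
        eulerTour.add(node);
        depths.add(depth);
        for (String child : children.getOrDefault(node, List.of())) {
            visit(child, depth + 1);
            eulerTour.add(node);  // the tour returns to node after each child
            depths.add(depth);
        }
    }

    // Resolved exception for two concurrent exceptions: their lowest common ancestor.
    String resolve(String a, String b) {
        Integer i = firstIndex.get(a);
        Integer j = firstIndex.get(b);
        if (i == null || j == null) {
            throw new IllegalArgumentException("exception not in tree (or build() not called)");
        }
        int lo = Math.min(i, j);
        int hi = Math.max(i, j);
        int best = lo;
        for (int k = lo + 1; k <= hi; k++) {  // linear scan for the minimum depth in L[lo..hi]
            if (depths.get(k) < depths.get(best)) {
                best = k;
            }
        }
        return eulerTour.get(best);
    }

    // More than two exceptions: fold pairwise, since the LCA is associative.
    String resolveAll(List<String> raised) {
        String resolved = raised.get(0);
        for (int k = 1; k < raised.size(); k++) {
            resolved = resolve(resolved, raised.get(k));
        }
        return resolved;
    }
}
```

For example, in a hypothetical tree where PrimaryFailedException and BackupFailedException are both children of a ServerException node, resolving the two yields ServerException.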
3.6 Guardian member
The use of a policy avoids the need to declare the guardian members explicitly in the SCDL file. From the developer’s point of view, a participant service component communicates directly with the guardian (another service component). However, this communication is mediated by an interceptor that hides the guardian member logic, preserving the defined communication model: participants ↔ guardian members, guardian members ↔ guardian group.
As an alternative to using a policy to implement guardian members, we could have implemented guardian members as service components. In this scenario, GuardianMemberImpl would need to implement a hypothetical GuardianMember interface which extends GuardianPrimitives (Sect. 3.2). Moreover, it would have a reference to the guardian service component. This approach has two drawbacks. The first one is that there is a runtime overhead due to the need to manage extra service components. A policy is much cheaper than a component. The second one is that developers of service-oriented applications would then need to be aware of guardian member components, declaring them in the SCDL file. By using policies to represent the latter, developers only indicate the participants of a fault-tolerant composition, associating them with the guardianExceptionHandling intent.
4 AOP programming model for EH-SCA
The EH-SCA programming model, like the Guardian programming model originally defined by Miller [15, 16], consists of the invocation of the guardian primitives by a participant following a predefined programming pattern. The way in which participants invoke the guardian primitives depends on the recovery actions implemented in the application logic. However, a general structure based on the conversation concept, suitable for a large range of applications, is depicted in Fig. 13.
The code snippet enables a context within a scope, checks for any pending global exceptions, executes the application-specific code, and then removes the enabled context. There is also a handler for each global and local exception that can be raised, respectively, by the checkExceptionStatus() guardian primitive and by the application-specific code. The problem with this approach is that application logic is tangled with error recovery code. Ideally, the parts of the code that enable and disable contexts and handle exceptions should be separated from application code. In this manner, it becomes possible to change exception handlers without affecting application-specific parts of the code. At the same time, one can reuse exception handlers across components within the same application, in order to avoid duplication.
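The pattern just described can be sketched in Java as follows. The Guardian interface is a hypothetical subset of the guardian primitives with assumed signatures, FakeGuardian is an in-memory stand-in so that the sketch runs, and the context name MAIN is illustrative. Context management and both kinds of handlers surround the application code, which is the tangling that the aspect-based model is meant to remove.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal stand-in for a global exception.
class GlobalException extends Exception {
    private static final long serialVersionUID = 1L;

    GlobalException(String message) { super(message); }
}

// Hypothetical subset of the guardian primitives; the real signatures may differ.
interface Guardian {
    void enableContext(String context);

    void removeContext();

    void checkExceptionStatus() throws GlobalException;
}

// In-memory stand-in so that the pattern can run without a guardian service.
final class FakeGuardian implements Guardian {
    final Deque<String> contexts = new ArrayDeque<>();
    GlobalException pending;  // simulates an exception delivered by the guardian

    @Override public void enableContext(String context) { contexts.push(context); }

    @Override public void removeContext() { contexts.pop(); }

    @Override
    public void checkExceptionStatus() throws GlobalException {
        GlobalException ex = pending;
        pending = null;
        if (ex != null) {
            throw ex;
        }
    }
}

final class Participant {
    // Error recovery code tangled with application code, as in the conversation-based pattern.
    static String run(Guardian guardian, Runnable applicationCode) {
        guardian.enableContext("MAIN");
        try {
            guardian.checkExceptionStatus();  // may raise a pending global exception
            applicationCode.run();            // application-specific code
            return "ok";
        } catch (GlobalException e) {         // handler for a global exception
            return "global: " + e.getMessage();
        } catch (RuntimeException e) {        // handler for a local exception
            return "local: " + e.getMessage();
        } finally {
            guardian.removeContext();         // runs after any handler, leaving the context
        }
    }
}
```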
The HandlerContainer abstract class implements a number of operations to manipulate global exception handlers. These operations support the retrieval, attachment, and deletion of handlers (Fig. 14) associated with contexts of a service component. The main class implementing each service component must be a subclass of HandlerContainer in an application that uses EH-SCA. In addition, the class must provide an implementation for the getGuardianReference() abstract method to make the guardian accessible.
As briefly mentioned in Sect. 3.3, the AbstractHandler abstract class represents a generic handler for GlobalException. Application-specific exception handling strategies for a certain context are realized by implementations of the execute() abstract method. The remaining methods of AbstractHandler provide: (i) the Class object corresponding to the type of the handled exception; and (ii) the actual parameters of the method to which the Context annotation is linked. EH-SCA provides an implementation for AbstractHandler, the DefaultHandler class. The latter provides constructors that developers can use to pass information to the handler, and an empty execute() method.
The AssemblerAspect aspect combines all the aforementioned elements. It intercepts the execution of application-specific methods annotated with Context and Checkpoint in subclasses of HandlerContainer and uses the handler management methods to integrate normal and exceptional behavior. Using the aspect-oriented programming model involves the following steps: (1) to make the main class of a participant service component a subclass of HandlerContainer; (2) to implement the getGuardianReference() method so as to return the guardian; (3) to define a set of exception handlers as subclasses of AbstractHandler, attaching them to service components by means of the addHandler() and setHandlers() methods (Fig. 14); and (4) to annotate each method that takes part in a context where exceptions are handled with the Context and Checkpoint annotations.
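AssemblerAspect itself is an aspect. As a dependency-free approximation, the sketch below swaps aspect weaving for a JDK dynamic proxy (java.lang.reflect.Proxy) that intercepts interface methods carrying a Context annotation. The annotation, the Server interface, and ContextWeaver are hypothetical; the log stands in for the enableContext() and removeContext() calls, and, unlike an AspectJ aspect, a JDK proxy can only intercept calls made through an interface.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for EH-SCA's Context annotation; "value" holds the context name.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Context {
    String value();
}

// Hypothetical service interface of a participant.
interface Server {
    @Context("MAIN")
    String serve(String request);

    String status();  // not annotated: invoked without context management
}

final class ContextWeaver {
    // Records where enableContext()/removeContext() would be called.
    static final List<String> log = new ArrayList<>();

    @SuppressWarnings("unchecked")
    static <T> T weave(Class<T> iface, T target) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (proxy, method, args) -> {
                    Context ctx = method.getAnnotation(Context.class);
                    if (ctx == null) {
                        return call(method, target, args);
                    }
                    log.add("enable " + ctx.value());
                    try {
                        return call(method, target, args);
                    } finally {
                        log.add("remove " + ctx.value());
                    }
                });
    }

    // Invokes the real method and rethrows the original exception, not the reflective wrapper.
    private static Object call(Method method, Object target, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }
}
```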
5 Case study: primary-backup system
This section describes how to use EH-SCA, including the AOP programming model, to implement a primary-backup structuring technique in a client-server application. The primary-backup approach is well known in the fault tolerance literature and is therefore a good showcase for the capabilities of EH-SCA. Moreover, previous work on the Guardian model has employed a primary-backup system as a case study.
5.1 System description
Global exceptions and the contexts in which they need to be handled
Internal primary server error
Indicates the existence of a primary server
MAIN, PRIMARY, BACKUP
Internal backup server error
Indicates that a new backup has joined in the group
5.2 Execution scenarios
A normal execution of the application, with no internal errors in the server nodes, proceeds as follows. The ServerNode1 component is started in the MAIN context, which causes the guardian to raise a JoinException. Rule1 is executed, but no exception is delivered because the minimum number of participants has not yet joined the group. Thus, the component reaches the MAIN.PRIMARY context. When the ServerNode2 component starts executing, its JoinException causes the delivery of a BackupJoinedException, to be handled in the PRIMARY context of ServerNode1, and of a PrimaryExistsException, to be handled in the MAIN context of ServerNode2. When an invocation of checkExceptionStatus() raises these exceptions in the service components, the corresponding handlers are activated: ServerNode1 updates its backup list, while ServerNode2 reaches the MAIN.BACKUP context.
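The guardian-side behavior of Rule1 in this scenario can be summarized as a function from a JoinException to a list of deliveries. The following sketch is illustrative only: the Delivery and JoinRule classes, the minimum group size of two, and the convention that the first member to join is the primary are assumptions chosen to reproduce the two-node scenario just described.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record of one delivery: which exception goes to which node,
// and the context in which that node must handle it.
final class Delivery {
    final String node;
    final String exception;
    final String context;

    Delivery(String node, String exception, String context) {
        this.node = node;
        this.exception = exception;
        this.context = context;
    }

    @Override
    public String toString() {
        return exception + " -> " + node + "@" + context;
    }
}

// Illustrative sketch (not EH-SCA code) of a Rule1-like guardian rule.
final class JoinRule {
    // Assumption: at least two members are needed before anything is delivered.
    static final int MIN_PARTICIPANTS = 2;
    private final List<String> group = new ArrayList<>();

    /** Called when `node` raises JoinException; returns the exceptions to deliver. */
    List<Delivery> onJoin(String node) {
        group.add(node);
        List<Delivery> out = new ArrayList<>();
        if (group.size() < MIN_PARTICIPANTS) {
            return out; // nothing delivered: the node proceeds to MAIN.PRIMARY
        }
        String primary = group.get(0); // assumption: first member is the primary
        out.add(new Delivery(primary, "BackupJoinedException", "PRIMARY"));
        out.add(new Delivery(node, "PrimaryExistsException", "MAIN"));
        return out;
    }
}
```

For ServerNode1 the rule returns an empty list; for ServerNode2 it returns a BackupJoinedException for ServerNode1 in PRIMARY and a PrimaryExistsException for ServerNode2 in MAIN, matching the sequence above.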
If an internal error occurs in the primary server, a PrimaryFailedException is signaled to the guardian, which executes Rule2. The guardian delivers a PrimaryFailedException to both ServerNode1 and ServerNode2, but with different target contexts: INIT and MAIN, respectively. When checkExceptionStatus() is invoked within both service components, ServerNode1 is interrupted. At the same time, ServerNode2 becomes the new primary server by executing the handler for PrimaryFailedException associated with the MAIN context. A backup failure follows a similar path through recovery rule Rule3: in the end, the primary removes the backup from its backup list, and the failed backup node is interrupted.
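Recovery rules such as Rule2 and Rule3 can likewise be viewed as mappings from participants to the target contexts in which the delivered exception must be handled. In the sketch below, the Rule2 targets (INIT for the failed primary, MAIN for the backup that takes over) follow the text; the Rule3 targets (PRIMARY for the primary, INIT for the failed backup), the ordering of the member list, and the FailureRules class itself are assumptions, since the text does not spell them out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: recovery rules as maps from participant to target context.
// All names are hypothetical; only the Rule2 contexts come from the text.
final class FailureRules {
    // Group membership; index 0 is taken to be the primary (an assumption).
    private final List<String> members;

    FailureRules(List<String> members) {
        this.members = new ArrayList<>(members);
    }

    /** Rule2-like: targets for PrimaryFailedException. */
    Map<String, String> primaryFailed() {
        Map<String, String> targets = new LinkedHashMap<>();
        targets.put(members.get(0), "INIT");     // failed primary is interrupted
        if (members.size() > 1) {
            targets.put(members.get(1), "MAIN"); // first backup takes over
        }
        members.remove(0);
        return targets;
    }

    /** Rule3-like: targets for BackupFailedException (contexts assumed). */
    Map<String, String> backupFailed(String backup) {
        Map<String, String> targets = new LinkedHashMap<>();
        targets.put(members.get(0), "PRIMARY");  // primary updates its backup list
        targets.put(backup, "INIT");             // failed backup is interrupted
        members.remove(backup);
        return targets;
    }

    List<String> members() {
        return new ArrayList<>(members);
    }
}
```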
Note that exceptions are signaled to the guardian through the gthrow() guardian primitive. Although the EH-SCA aspect-oriented programming model hides most of the details of programming with the guardian primitives, some of them still need to be invoked explicitly. This applies to gthrow() and also to the propagate() primitive, which is necessary for exceptions to reach their intended contexts. Finally, INIT is the top-level context that the guardian defines for an application.
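The role of propagate() can be illustrated with a per-participant stack of nested contexts. This sketch is an assumption about the mechanics rather than the guardian implementation: ContextStack and propagateTo() are hypothetical names, and the loop models a sequence of single-step propagate() calls that exit inner contexts until the exception's target context is the innermost one. For example, ServerNode1, nested in INIT.MAIN.PRIMARY, must leave two contexts to handle PrimaryFailedException in INIT.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of one participant's nested contexts. INIT is the guardian's
// top-level context; propagateTo() models repeated propagate() steps.
final class ContextStack {
    private final Deque<String> stack = new ArrayDeque<>();

    ContextStack() {
        stack.push("INIT");
    }

    void enter(String context) {
        stack.push(context);
    }

    String current() {
        return stack.peek();
    }

    /** Outermost-first path, e.g. INIT.MAIN.PRIMARY. */
    String path() {
        List<String> names = new ArrayList<>();
        stack.descendingIterator().forEachRemaining(names::add);
        return String.join(".", names);
    }

    /**
     * Exits inner contexts until `target` is the innermost one, so that the
     * exception reaches the context it was delivered for. Returns the number
     * of contexts exited (one propagate() step each).
     */
    int propagateTo(String target) {
        if (!stack.contains(target)) {
            throw new IllegalArgumentException("not an enclosing context: " + target);
        }
        int exited = 0;
        while (!stack.peek().equals(target)) {
            stack.pop();
            exited++;
        }
        return exited;
    }
}
```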
It is important to stress that, without the resolution tree, the guardian would process concurrently signaled exceptions sequentially. Suppose that PrimaryFailedException and BackupFailedException are signaled at the same time. If PrimaryFailedException is processed first, the primary server is interrupted and the first backup in the list becomes the new primary. However, that backup is down and cannot assume the primary role. If BackupFailedException is processed first, a similar problem occurs: the primary receives a notification of the backup failure, but the primary itself has failed as well. As a consequence, the backup list is not updated, and the new primary sends messages to a failed backup.
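Resolution trees avoid this ordering problem by replacing a set of concurrently signaled exceptions with a single exception that covers all of them. The sketch below assumes, as is common in the CA action literature, that the resolved exception is the lowest common ancestor of the raised exceptions in the tree; the ResolutionTree class and the intermediate ServerFailureException node are hypothetical. It walks parent links naively, which suffices for small trees; constant-time queries would need an LCA preprocessing scheme such as the one by Bender and Farach-Colton.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: a resolution tree whose resolve() returns the lowest common
// ancestor (LCA) of concurrently raised exceptions. Names are hypothetical.
final class ResolutionTree {
    private final Map<String, String> parent = new HashMap<>();
    private final String root;

    ResolutionTree(String root) {
        this.root = root;
    }

    void add(String child, String parentName) {
        parent.put(child, parentName);
    }

    // Ancestors of `node`, ordered from the node itself up to the root.
    private List<String> pathToRoot(String node) {
        if (!node.equals(root) && !parent.containsKey(node)) {
            throw new IllegalArgumentException("unknown exception: " + node);
        }
        List<String> path = new ArrayList<>();
        for (String n = node; n != null; n = parent.get(n)) {
            path.add(n);
        }
        return path;
    }

    String resolve(Collection<String> raised) {
        Iterator<String> it = raised.iterator();
        if (!it.hasNext()) {
            throw new IllegalArgumentException("no exceptions to resolve");
        }
        List<String> common = pathToRoot(it.next());
        while (it.hasNext()) {
            Set<String> other = new HashSet<>(pathToRoot(it.next()));
            common.removeIf(n -> !other.contains(n));
        }
        if (common.isEmpty()) {
            throw new IllegalStateException("exceptions share no ancestor");
        }
        // The first surviving entry is the deepest ancestor shared by all.
        return common.get(0);
    }
}
```

In a tree where PrimaryFailedException and BackupFailedException are both children of ServerFailureException, the two concurrent failures resolve to ServerFailureException, so a single handler can deal with both at once instead of acting on one failure while ignoring the other.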
6 Related work
Garcia and Toledo  present an architecture for building fault-tolerant Web Services. It extends the Universal Description, Discovery, and Integration protocol (UDDI, the standard protocol for publishing and discovering services in Web Service-based systems) to enable quality-of-service monitoring. The architecture introduces two new elements into service-oriented applications: the monitor and the mediator. The monitor detects, notifies, and confines errors, intercepting messages in order to analyze and test services; it is also responsible for forwarding service invocations to replicas when necessary. The mediator creates and manages these replicas. This approach thus relies on replication as a means of fault tolerance. In a similar vein, Chen and Romanovsky  introduce a mechanism named WS-Mediator to improve the reliability of Web service integration. The WS-Mediator monitors Web Services, collecting information such as response time and failure rate, and applies user-defined policies depending on the information it obtains. Both approaches improve the dependability of service-oriented applications. However, neither uses coordinated exception handling, and both are applicable only to Web Services, not to SCA.
Several solutions in the literature provide fault tolerance for distributed systems based on the CA action concept. The work by Tartanoglu et al.  builds on CA actions without transactional guarantees, resulting in a structure called Web Service Composition Action (WSCA). WSCAs are described in an XML-based language called Web Service Composition Action Language (WSCAL). The solution is implemented with Web services technology and relies heavily on compensation actions. In our solution, a wide range of SOA technologies can be used (it is not restricted to Web services), and the recovery rules allow recovery actions to be provided in different ways (including compensation actions).
The work of Gorbenko et al.  proposes an approach for modeling reliable Web Services composed of unreliable components. It uses WSCA as a means to structure Web Services with CA actions. The authors apply the approach to a travel agency case study and claim that it significantly increases the number of successful Web Service requests. Since this work builds on WSCA, it shares the limitations of WSCA when compared to EH-SCA.
Silva et al.  propose composition contracts, an adaptation of coordination contracts  for the definition of concurrent fault-tolerant compositions. A coordination contract is a connection among a set of objects that is governed by the rules and restrictions necessary for their collaboration. Composition contracts extend coordination contracts with CA actions without atomicity guarantees to implement fault tolerance. Exception propagation is static in this approach. Moreover, it leaves some aspects of the underlying exception handling model undefined, such as the granularity of exception handling contexts.
Souchon and colleagues  present a model for exception handling in multi-agent systems. The model addresses two problems: (1) preserving the agent programming paradigm, and (2) supporting cooperative concurrency among agents. It implements coordinated exception handling, in which resolution is based on resolution functions. However, its recovery rules are fixed, which makes the model less flexible than the EH-SCA framework. Further, the proposed mechanism is abstract. On the one hand, this means that it may be applicable to systems built with different technologies. On the other hand, it does not consider the peculiarities of concrete approaches to building distributed systems in general, or of SOA technology in particular.
Capozucca et al.  describe a framework for developing software systems based on CA actions. Their framework, CAA-DRIP, is an evolution of the DRIP framework proposed by Zorzo and Stroud  for structuring software systems in terms of dependable multiparty interactions. None of them targets the construction of distributed systems. Furthermore, they are not aimed at integrating preexisting components, and they employ more constrained exception propagation mechanisms. Later, Capozucca et al.  conducted a survey of existing frameworks that implement CA actions. Among the frameworks they examine, only one tackles the problems of SOA technology: the work by Tartanoglu et al. , described above.
A preliminary version of this paper appeared elsewhere . It did not discuss exception resolution and handler attachment in detail, nor the AOP programming model. Furthermore, it presented the case study only superficially.
In this paper, we have presented the design and implementation of a new exception handling model that targets the so-called Service Component Architectures. We are not aware of any middleware platform, service-oriented or otherwise, that provides support for coordinated exception handling in the way that EH-SCA does. Previous work that described actual middleware platforms did not focus on coordinated exception handling and previous work targeting coordinated exception handling did not cater for middleware platforms.
The first contribution of this paper is a model that allows the creation of fault-tolerant asynchronous service compositions in an SCA architecture. The second contribution is a framework, named EH-SCA, that implements the proposed model. The EH-SCA framework defines a way to build applications in a conversation-based structuring unit, hiding the explicit use of most of the guardian primitives. Our implementation is open source and distributed under the Apache 2 license.²
Future work includes extending the resolution tree model to support different exception levels, and adding a fault tolerance mechanism to the guardian group element, which is currently a single point of failure. In addition, we would like to conduct an experimental validation of the framework's ability to structure exception handling in a practical application, and of how that affects the application's reliability and performance.
We would like to thank the anonymous referees, who helped to improve this paper. Cecília is supported by CNPq (305331/2009-4) and FAPESP (2010/00628-1). Fernando is supported by CNPq (308383/2008-7 and 475157/2010-9), FACEPE (APQ-0395-1.03/10), and by INES (CNPq 573964/2008-4 and FACEPE APQ-1037-1.03/08).
- Anderson T, Lee PA (1990) Fault tolerance: principles and practice, 2nd edn. Springer, Berlin
- Becker D (2009) Service component architecture (SCA) lets you invoke components from different technologies. http://www.ibm.com/developerworks/opensource/library/os-apache-tuscany-sca/index.html
- Bender MA, Farach-Colton M (2000) The LCA problem revisited. In: 4th Latin-American symposium on theoretical informatics, Punta del Este, Uruguay, April 2000, pp 88–94
- Capozucca A, Guelfi N, Pelliccione P, Romanovsky A, Zorzo AF (2006) CAA-DRIP: a framework for implementing coordinated atomic actions. In: Proc of IEEE international symposium on software reliability engineering, Raleigh, USA, November 2006, pp 385–394
- Capozucca A, Guelfi N, Pelliccione P, Romanovsky A, Zorzo AF (2009) Frameworks for designing and implementing dependable systems using coordinated atomic actions: a comparative study. J Syst Softw 82(2):207–228
- Chen Y, Romanovsky A (2008) Improving the dependability of web services integration. IT Prof 10(3):29–35
- Fiadeiro JL, Andrade LF (2001) Interconnecting objects via contracts. In: Proc 38th international conference on technology of object-oriented languages and systems, Zurich, Switzerland, March 2001, pp 182–183
- Garcia DZG, de Toledo MBF (2007) A fault tolerant web service architecture. In: 5th Latin-American Web congress, Santiago, Chile, October/November 2007, pp 42–49
- Gorbenko A, Kharchenko V, Romanovsky A (2007) On composing dependable web services using undependable web components. Int J Simul Process Model 3(1/2)
- Kiczales G, Lamping J, Mendhekar A, Maeda C, Lopes C, Loingtier J-M, Irwin J (1997) Aspect-oriented programming. In: Proceedings of the 11th ECOOP, June 1997
- Laddad R (2009) AspectJ in action: enterprise AOP with spring applications. Manning Publications, Cambridge
- Laws S, Combellack M, Feng R, Mahbod H, Nash S (2010) Tuscany SCA in action, 1st edn. Manning Publications, Cambridge
- Leite DS, Rubira CMF, Castor F (2011) Exception handling for service component architectures. In: 5th Latin-American symposium on dependable computing, São José dos Campos, Brazil, pp 84–93
- Margolis B, Sharpe J (2007) SOA for the business developer: concepts, BPEL, and SCA, 1st edn. MC Press, Paris
- Miller R, Tripathi A (2002) The guardian model for exception handling in distributed systems. In: SRDS’02: proceedings of the 21st IEEE symposium on reliable distributed systems, Washington, DC, USA. IEEE Computer Society, Los Alamitos, p 304
- Miller R, Tripathi A (2004) The guardian model and primitives for exception handling in distributed systems. IEEE Trans Softw Eng 30(12):1008–1022
- Randell B (1975) System structure for software fault tolerance. In: Proceedings of the international conference on reliable software, New York, NY, USA. ACM, New York, pp 437–449
- Silva R, Guerra P, Rubira C (2003) Component integration using composition contracts with exception handling. In: 3rd workshop on exception handling in object-oriented systems, Darmstadt, pp 1–20
- Souchon F, Dony C, Urtado C, Vauttier S (2004) Improving exception handling in multi-agent systems. In: Software engineering for multi-agent systems II. Springer, Berlin, pp 333–337
- Szyperski C (2002) Component software: beyond object-oriented programming, 2nd edn. Addison-Wesley, Reading
- Tartanoglu F, Issarny V, Romanovsky A, Levy N (2003) Coordinated forward error recovery for composite web services. In: 22nd symposium on reliable distributed systems (SRDS), pp 167–176
- Taveira JC, Queiroz C, Lima R, Saraiva J, Castor F, Soares S, Oliveira H, Temudo N, Barreiros E, Araujo A, Amorim J (2009) Assessing intra-application exception handling reuse with aspects. In: Proceedings of the 23rd Brazilian symposium on software engineering, Fortaleza, Brazil, October 2009. IEEE Computer Society, Los Alamitos
- Thomas E (2007) SOA principles of service design, 1st edn. Prentice Hall, New York
- van Dooren M, Steegmans E (2005) Combining the robustness of checked exceptions with the flexibility of unchecked exceptions using anchored exception declarations. In: OOPSLA’05, pp 455–471
- Web services business process execution language version 2.0 (2007) http://docs.oasis-open.org/wsbpel/2.0/wsbpel-v2.0.pdf
- Xu J et al. (1995) Fault tolerance in concurrent object-oriented software through coordinated error recovery. In: Proceedings of FTCS ’95, Washington, DC, USA. IEEE Computer Society, Los Alamitos, p 499
- Zorzo A, Stroud RJ (1999) A distributed object-oriented framework for dependable multiparty interactions. In: ACM conference on object-oriented programming, systems, languages, and applications, Denver, USA, November 1999, pp 435–446