The graphs proposed in this paper were designed to meet the following requirements: (a) they should express the different types of relations found in object-oriented systems, including relations due to dynamic calls and reflection; (b) they should support grouping objects into coarse-grained units to improve readability and scalability; (c) they should distinguish objects created by different threads; (d) to support dynamic architecture conformance checking, it should be possible to highlight relations that are expected (or not expected) when running a system. Finally, it should be possible to extract OGs from running systems in a non-invasive way.
Formal definition An OG is a directed graph that represents the dynamic behavior of the objects in an existing system. In an OG, the nodes denote objects (and classes with static members) and the edges represent possible relations between the represented nodes. In formal terms, an OG is defined as a graph (Nodes, Edges), where Nodes and Edges are the following sets:
$$\begin{aligned} Nodes&= Type \times Name\\ Type&= \{object,class\}\\ Name&= UnsignedInt \times String \times String\\ Edges&= Nodes \times Nodes \end{aligned}$$
where Type contains the two possible node types (a node represents either an object or a class) and Name is a tuple with three fields: the insertion order of the node (a non-negative integer), the name of the class of the node (a string), and the node’s color (a string). Finally, each edge is an ordered pair of nodes.
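As an illustration only, the definition above can be transcribed into Java types. The names NodeType, NodeName, OgNode, OgEdge, and ObjectGraph are hypothetical and do not come from the OG tool. The record components simply mirror the tuples of the formal definition.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical Java transcription of the formal definition of an OG.

/** Type = {object, class} */
enum NodeType { OBJECT, CLASS }

/** Name = UnsignedInt x String x String: insertion order, class name, color. */
record NodeName(int order, String className, String color) {
    NodeName {
        // Java has no unsigned int, so the non-negativity is checked explicitly.
        if (order < 0) {
            throw new IllegalArgumentException("order must be non-negative");
        }
    }
}

/** Nodes = Type x Name */
record OgNode(NodeType type, NodeName name) { }

/** Edges = Nodes x Nodes; the pair is ordered, so direction matters. */
record OgEdge(OgNode source, OgNode target) { }

/** An OG is the pair (Nodes, Edges); sets prevent duplicate elements. */
record ObjectGraph(Set<OgNode> nodes, Set<OgEdge> edges) {
    static ObjectGraph empty() {
        return new ObjectGraph(new LinkedHashSet<>(), new LinkedHashSet<>());
    }
}
```

Because records compare by value, the edges (main, invoice) and (invoice, main) are distinct elements of the edge set, which matches the definition of Edges as ordered pairs.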
In the following paragraphs, we provide more details on this definition, including information on how OGs must be displayed.
Nodes As defined by the set Type, there are two types of nodes in an OG: circle-shaped nodes denote objects and square-shaped nodes represent classes. Circle-shaped nodes have the same life span as the objects they model; that is, a circle node is inserted in an OG when an object is created in the host program and removed when the represented object is reclaimed by the garbage collector. Square-shaped nodes are created to model accesses to static members of a class. Therefore, only classes whose static members are accessed during execution are represented in an OG.
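The coupling between circle nodes and object lifetimes could be sketched with weak references and a reference queue, as below. This is an assumption about one possible mechanism, not a description of how the OG tool detects garbage collection. The class ObjectNodeTracker and its methods are hypothetical, and for brevity the sketch numbers only object nodes.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker: a circle node lives exactly as long as its object.
// Assumption: some tracer calls onCreate(...) for every new object.
final class ObjectNodeTracker {
    // The JVM enqueues a weak reference here once its referent is collected.
    private final ReferenceQueue<Object> collected = new ReferenceQueue<>();
    // Weak reference to each tracked object -> insertion order of its node.
    private final Map<Reference<?>, Integer> liveNodes = new HashMap<>();
    private int nextOrder = 0;

    /** Inserts a circle node when the host program creates an object. */
    synchronized int onCreate(Object obj) {
        int order = nextOrder++;
        // The weak reference does not keep obj alive.
        liveNodes.put(new WeakReference<>(obj, collected), order);
        return order;
    }

    /** Removes nodes whose objects have been reclaimed; returns how many. */
    synchronized int purgeCollected() {
        int removed = 0;
        Reference<?> ref;
        while ((ref = collected.poll()) != null) {
            if (liveNodes.remove(ref) != null) {
                removed++;
            }
        }
        return removed;
    }

    synchronized int liveNodeCount() {
        return liveNodes.size();
    }
}
```

A tracer would call onCreate for each allocation and purgeCollected periodically. Removing a node would also have to remove its incident edges.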
As defined by the set Name, the name of a node is a tuple with three fields. The first field is a sequential non-negative integer that indicates the order in which the node was inserted in the graph. By convention, the first node receives the number zero (typically, this node denotes the class containing the application’s main method). This number guides developers when “reading” the graph. The second field is the class name of the represented object (for circle nodes) or the name of the class whose static member has been accessed (for square nodes). Finally, the third field is the node’s color. In OGs, colors distinguish circle nodes created by different threads: nodes created by the main thread are white, and each other thread is automatically assigned a fresh color, which is used for the nodes representing the objects it creates.
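The naming scheme can be sketched as a small generator that hands out insertion orders starting at zero and picks colors per creating thread. The class NodeNamer, the color palette, and the rule that identifies the main thread by its default name "main" are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical generator of node names (order, class name, color).
final class NodeNamer {
    /** Name = (insertion order, class name, color), as in the formal definition. */
    record Name(int order, String className, String color) { }

    // Illustrative palette; not taken from the OG tool.
    private static final List<String> PALETTE =
            List.of("lightblue", "lightgreen", "orange", "pink", "yellow");

    private int nextOrder = 0;   // the first node receives 0
    private final Map<Thread, String> threadColors = new HashMap<>();

    /** Names a new node for an object of 'className' created by 'creator'. */
    synchronized Name next(String className, Thread creator) {
        return new Name(nextOrder++, className, colorOf(creator));
    }

    private String colorOf(Thread creator) {
        if ("main".equals(creator.getName())) {
            return "white";      // assumption: main thread recognized by its name
        }
        String color = threadColors.get(creator);
        if (color == null) {
            // First node from this thread: take the next palette entry
            // (entries repeat cyclically once the palette is exhausted).
            color = PALETTE.get(threadColors.size() % PALETTE.size());
            threadColors.put(creator, color);
        }
        return color;
    }
}
```

With a fixed palette, colors repeat once there are more threads than palette entries; keeping every thread distinguishable would require an unbounded source of fresh colors.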
Edges As defined by the set Edges, edges denote relations between objects and classes. Suppose that \(o_1\) and \(o_2\) are circle-shaped nodes (representing objects) and that \(c_1\) and \(c_2\) are square-shaped nodes (representing classes). The directed edge \((o_1,o_2)\) indicates that, at some point during its life span, \(o_1\) obtained a reference to \(o_2\). This reference may have been held in a field of the object, in a local variable, or in a formal parameter of a method. Similarly, the edge \((o_1,c_1)\) indicates that, at some point during its life span, \(o_1\) called a static method implemented by \(c_1\). Conversely, the edge \((c_1,o_1)\) indicates that, at some point during the program’s execution, a static member of \(c_1\) obtained a reference to \(o_1\). Finally, the edge \((c_1,c_2)\) indicates that \(c_1\) accessed a static member of \(c_2\).
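The four edge cases above depend only on the shapes of the two endpoints, which the following sketch makes explicit. The Shape enum and the EdgeKinds.meaning method are hypothetical names.

```java
// Hypothetical encoding of the four OG edge kinds.
enum Shape { CIRCLE, SQUARE }   // CIRCLE = object, SQUARE = class

final class EdgeKinds {
    /** Explains what a directed edge (source, target) records. */
    static String meaning(Shape source, Shape target) {
        if (source == Shape.CIRCLE && target == Shape.CIRCLE) {
            return "object obtained a reference to object";
        } else if (source == Shape.CIRCLE) {
            return "object called a static method of class";
        } else if (target == Shape.CIRCLE) {
            return "static member of class obtained a reference to object";
        } else {
            return "class accessed a static member of class";
        }
    }
}
```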
Edges are inserted in an OG as soon as the represented relation is established during the execution of the host program. When a node is removed from the graph, its incoming and outgoing edges are also removed. Furthermore, for the sake of readability, self-loops (i.e., edges starting and ending at the same node) are not represented.
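The update rules for edges can be sketched over an adjacency map keyed by insertion order. OgUpdater and its methods are hypothetical, and identifying nodes by their order is a simplifying assumption.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of OG edge maintenance.
final class OgUpdater {
    // Adjacency map: node order -> orders of the nodes it points to.
    private final Map<Integer, Set<Integer>> out = new LinkedHashMap<>();

    void addNode(int node) {
        out.putIfAbsent(node, new LinkedHashSet<>());
    }

    /** Records the relation (from, to); returns false if nothing was added. */
    boolean addEdge(int from, int to) {
        if (from == to || !out.containsKey(from) || !out.containsKey(to)) {
            return false;                 // self-loops and unknown nodes are skipped
        }
        return out.get(from).add(to);     // false if the edge already exists
    }

    /** Removes a node together with its incoming and outgoing edges. */
    void removeNode(int node) {
        out.remove(node);                 // drops the outgoing edges
        for (Set<Integer> targets : out.values()) {
            targets.remove(node);         // drops the incoming edges
        }
    }

    boolean hasEdge(int from, int to) {
        Set<Integer> targets = out.get(from);
        return targets != null && targets.contains(to);
    }

    int edgeCount() {
        int n = 0;
        for (Set<Integer> targets : out.values()) {
            n += targets.size();
        }
        return n;
    }
}
```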
Example (Nodes and Edges) Consider the code shown in Listing 1.
In this code, the Main class creates an object of type Invoice and calls its load method (lines 4–5). This method creates a Product and adds it to an ArrayList (lines 11–13). Figure 1 presents the OG generated by executing this code. This OG has one square-shaped node (representing the class with the main method) and three circle-shaped nodes, representing the Invoice, ArrayList, and Product objects.
The extracted OG compactly illustrates the runtime behavior of this program fragment. Following the sequential integers associated with the nodes, we can conclude that the Main class (node 0) first accessed an Invoice object (node 1). Next, this Invoice object accessed an ArrayList object (node 2). Finally, a Product was created (node 3). This Product instance was accessed by the Invoice object (responsible for its creation) and by the ArrayList object (responsible for its storage).
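To make the reading of Fig. 1 concrete, the sketch below rebuilds that OG from the node numbers and relations stated in the text and prints it in Graphviz DOT notation, using circles for objects and squares for classes. DOT, the "order: Class" label format, and the class Figure1Dot are assumptions; this section does not describe the tool's renderer.

```java
import java.util.List;

// Hypothetical rendering of the OG of Fig. 1 as Graphviz DOT.
final class Figure1Dot {
    record Node(int order, String className, boolean isClass) { }
    record Edge(int from, int to) { }

    static String toDot(List<Node> nodes, List<Edge> edges) {
        StringBuilder sb = new StringBuilder("digraph OG {\n");
        for (Node n : nodes) {
            String shape = n.isClass() ? "square" : "circle";   // class vs object
            sb.append(String.format("  n%d [shape=%s, label=\"%d: %s\"];\n",
                    n.order(), shape, n.order(), n.className()));
        }
        for (Edge e : edges) {
            sb.append(String.format("  n%d -> n%d;\n", e.from(), e.to()));
        }
        return sb.append("}\n").toString();
    }

    static String figure1() {
        List<Node> nodes = List.of(
                new Node(0, "Main", true),        // class holding main()
                new Node(1, "Invoice", false),
                new Node(2, "ArrayList", false),
                new Node(3, "Product", false));
        List<Edge> edges = List.of(
                new Edge(0, 1),   // Main accessed the Invoice object
                new Edge(1, 2),   // Invoice accessed the ArrayList
                new Edge(1, 3),   // Invoice created the Product
                new Edge(2, 3));  // the ArrayList stores the Product
        return toDot(nodes, edges);
    }
}
```

The output of Figure1Dot.figure1() can be passed to the Graphviz dot command to obtain a picture with the same topology as Fig. 1.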
Example (Threads) Consider the code presented in Listing 2. In this code, the Main class creates and starts two Box threads (lines 3–4). Each thread creates a Product object (line 10). Figure 2 presents the OG generated by the execution of this program. In this OG, the Main class (node 0) has references to two Box objects (nodes 1 and 3). Moreover, each Box references its own Product object (nodes 2 and 4). More importantly, the nodes denoting Product objects have different colors, because they were created by different threads.
Packages and domains As is common when extracting runtime diagrams, the number of nodes and edges in an OG can grow rapidly, even for small applications. Therefore, to improve the scalability of OGs, two forms of summarization are provided: by packages and by domains. When package summarization is enabled, all the objects and classes from a given package are represented as a single node. In such a summarized graph, if \(p_1\) and \(p_2\) are nodes representing packages, an edge \((p_1,p_2)\) indicates that at least one element summarized by \(p_1\) is connected to at least one element summarized by \(p_2\).
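Package summarization can be sketched as a projection of class-level edges onto package names. PackageSummarizer and the "(default)" label for the unnamed package are hypothetical. Dropping edges that stay inside a single package follows from the rule that loops are not drawn; this is our reading rather than an explicit statement of the tool's behavior.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of package summarization over class-level edges.
final class PackageSummarizer {
    record Edge(String from, String to) { }

    /** Package part of a fully qualified class name. */
    static String packageOf(String qualifiedClassName) {
        int dot = qualifiedClassName.lastIndexOf('.');
        return dot < 0 ? "(default)" : qualifiedClassName.substring(0, dot);
    }

    /** Keeps (p1, p2) if at least one edge connects an element of p1 to one of p2. */
    static Set<Edge> summarize(Set<Edge> classEdges) {
        Set<Edge> result = new LinkedHashSet<>();
        for (Edge e : classEdges) {
            String p1 = packageOf(e.from());
            String p2 = packageOf(e.to());
            if (!p1.equals(p2)) {               // intra-package edges become loops
                result.add(new Edge(p1, p2));   // set semantics merge duplicates
            }
        }
        return result;
    }
}
```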
The second form of summarization is by domain. In the context of this paper, a domain is a group of nodes explicitly defined by developers using the following syntax:
where \(\mathtt{{{<}name{>}}}\) is the domain name and \(\mathtt{{{<}classes{>}}}\) is a comma-separated list of classes. For summarization purposes, objects from the specified classes are represented in the graph by a single hexagon-shaped node. Moreover, to facilitate the specification of domains, classes can be defined using regular expressions (e.g., model.*DAO denotes the classes in the model package whose names end with DAO).
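Domain membership can be sketched with java.util.regex, treating each entry of the class list as a regular expression matched against the whole fully qualified class name. DomainMap is a hypothetical class, and resolving overlaps by taking the first matching domain is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical sketch of domain summarization with regular expressions.
final class DomainMap {
    // Domain name -> compiled class patterns, in declaration order.
    private final Map<String, Pattern[]> domains = new LinkedHashMap<>();

    /** Declares a domain from a comma-separated list of class patterns. */
    void define(String name, String classes) {
        String[] parts = classes.split(",");
        Pattern[] patterns = new Pattern[parts.length];
        for (int i = 0; i < parts.length; i++) {
            patterns[i] = Pattern.compile(parts[i].trim());
        }
        domains.put(name, patterns);
    }

    /** Returns the first domain with a pattern matching the whole class name. */
    Optional<String> domainOf(String qualifiedClassName) {
        for (Map.Entry<String, Pattern[]> d : domains.entrySet()) {
            for (Pattern p : d.getValue()) {
                if (p.matcher(qualifiedClassName).matches()) {
                    return Optional.of(d.getKey());
                }
            }
        }
        return Optional.empty();
    }

    /** Domain (hexagon) for members; the class itself for everything else. */
    String nodeLabel(String qualifiedClassName) {
        return domainOf(qualifiedClassName).orElse(qualifiedClassName);
    }
}
```

For instance, after define("Persistence", "model.*DAO"), an object of model.InvoiceDAO would be drawn in the Persistence hexagon, while an object of model.Invoice would keep its own circle.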
Domain-based summarization is more flexible than summarization by packages, because developers can explicitly choose the domain names, for example, to match architecturally relevant components and abstractions. Moreover, developers are free to define the members of each domain by mapping classes to their respective domains. By contrast, summarization by packages is more rigid, since it assumes that architecturally relevant components can be derived automatically from the package hierarchy. In our experience with OGs, the usual procedure is to start with package summarization, especially when no other documentation is available. After gaining an initial understanding of the architecture, maintainers usually know enough to define their own domains (e.g., domains that summarize packages related to persistence, when the maintenance task does not require changes in persistence concerns).
Example (Domains) Consider a hypothetical system following the model–view–controller (MVC) architecture [17]. To provide a high-level picture of this architecture, the domains presented in Listing 3 have been defined. In this listing, the View domain denotes instances of the myapp.view.IView class and of its subclasses (as indicated by the + operator) (line 1). The Controller domain includes objects from any class implemented in the myapp.controller package (line 2). The Model domain includes objects whose class names begin with myapp.model and end with the string DAO (line 3). In the specification of domains, the ** operator denotes classes from packages with a given prefix. For example, the Swing and Hibernate domains include, respectively, objects from classes in the javax.swing and org.hibernate packages, as well as objects from classes implemented in their subpackages (lines 4 and 5).
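The + and ** operators could be interpreted as in the sketch below. Because Listing 3 is not reproduced here, the concrete pattern syntax (a trailing + for subtypes, a trailing .** for a package prefix, and a single * that stays within one package) is an assumption, as is the class name DomainPattern. Subtype checks use Class.isAssignableFrom on a base class loaded without running its static initializers.

```java
import java.util.regex.Pattern;

// Hypothetical interpretation of domain pattern operators:
//   "T+"        -> class T and any class assignable to T (its subtypes)
//   "prefix.**" -> classes in package 'prefix' or any of its subpackages
//   "pkg.*"     -> classes directly in 'pkg' ('*' does not cross a '.')
final class DomainPattern {
    private final String text;
    private final Pattern regex;   // null for the '+' (subtype) operator

    DomainPattern(String text) {
        this.text = text;
        this.regex = text.endsWith("+") ? null : Pattern.compile(toRegex(text));
    }

    /** Glob-to-regex translation: '**' spans packages, '*' stays inside one. */
    private static String toRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*' && i + 1 < glob.length() && glob.charAt(i + 1) == '*') {
                sb.append(".*");
                i++;                                  // consume the second '*'
            } else if (c == '*') {
                sb.append("[^.]*");
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));   // '.' is literal
            }
        }
        return sb.toString();
    }

    boolean matches(Class<?> cls) {
        if (regex == null) {
            String baseName = text.substring(0, text.length() - 1);
            try {
                // Resolve T through cls's loader, without initializing T.
                Class<?> base = Class.forName(baseName, false, cls.getClassLoader());
                return base.isAssignableFrom(cls);    // T itself or a subtype
            } catch (ClassNotFoundException e) {
                return false;                         // T not visible: cannot be a supertype
            }
        }
        return regex.matcher(cls.getName()).matches();
    }
}
```

Under these assumptions, a View domain could use the pattern myapp.view.IView+, the Controller domain myapp.controller.*, and the Swing domain javax.swing.**.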
Figure 3 presents the OG extracted for the MVC-based system considered in this example. First, we can observe that the nodes associated with domains are displayed as hexagons. There is, however, a single circle-shaped node (node 3, Util), representing an object whose class is not included in any of the defined domains. In other words, objects or classes that are members of a defined domain are summarized by a hexagonal node, whereas objects or classes not captured by any defined domain continue to be represented by circles (objects) or squares (classes).
As can be observed in the OG presented in Fig. 3, the target system’s architecture follows the MVC pattern. For example, there is a bidirectional communication link between the View and Controller domains, and between the Controller and Model domains. Furthermore, the OG reveals that the Controller acts as a mediator between the View and the Model, as expected in MVC architectures. We can also observe that only the View relies on services provided by the Swing framework (for GUI concerns) and that only the Model is coupled to the Hibernate framework (for persistence concerns).
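The observations above can be turned into an automatic check, in the spirit of requirement (d) on highlighting expected and unexpected relations. The sketch below assumes a domain-summarized OG given as a list of edges between domain names. The allowed set simply encodes the relations discussed for Fig. 3, and MvcConformance is a hypothetical name.

```java
import java.util.List;
import java.util.Set;

// Hypothetical conformance check over a domain-summarized OG.
final class MvcConformance {
    record Edge(String from, String to) { }

    // Dependencies the intended MVC architecture allows (as discussed for Fig. 3).
    static final Set<Edge> ALLOWED = Set.of(
            new Edge("View", "Controller"), new Edge("Controller", "View"),
            new Edge("Controller", "Model"), new Edge("Model", "Controller"),
            new Edge("View", "Swing"),          // only the View uses Swing
            new Edge("Model", "Hibernate"));    // only the Model uses Hibernate

    /** Returns the observed edges that the architecture does not expect. */
    static List<Edge> violations(List<Edge> observed) {
        return observed.stream().filter(e -> !ALLOWED.contains(e)).toList();
    }
}
```

Edges to nodes outside all domains, such as Util, are reported as violations by this sketch unless they are added to the allowed set.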
Detailed information on edges OGs can also display detailed information on the object-oriented relations modeled by their edges. Suppose that \(o_1\) and \(o_2\) are nodes in an OG and that \((o_1,o_2)\) is an edge connecting them. The name of this edge is a structure in the following format:
The members of this structure are as follows:
- Edge_Order is a sequential non-negative integer that indicates the order in which the edges were created in the graph. This integer allows the graph’s edges to be read sequentially.
- O1_Order is a sequential non-negative integer, enclosed in brackets, that indicates the order in which the node \(o_1\) was inserted in the graph.
- Location represents the program location where the relation was established.
- O2_Order is a sequential non-negative integer, enclosed in brackets, that indicates the order in which the node \(o_2\) was inserted in the graph.
- O2_Service represents the service provided by \(o_2\) that was accessed to establish the edge.
- Suffix provides information about both the Location and Target elements. It can assume one of the following values:
  - () indicates access to methods.
  - (MS) indicates access to static methods.
  - (C) indicates access to constructors.
  - (A) indicates access to attributes.
  - (AS) indicates access to static attributes.
  - <new> indicates that an object has been created.
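The edge-name members can be modeled as a record, with the suffix codes as an enum. Since the textual layout of an edge name is not reproduced in this section, the sketch does not attempt to format or parse whole names; EdgeInfo, Suffix, and the field types are assumptions.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical model of the suffix codes listed above.
enum Suffix {
    METHOD("()", "access to a method"),
    STATIC_METHOD("(MS)", "access to a static method"),
    CONSTRUCTOR("(C)", "access to a constructor"),
    ATTRIBUTE("(A)", "access to an attribute"),
    STATIC_ATTRIBUTE("(AS)", "access to a static attribute"),
    NEW("<new>", "object creation");

    final String token;     // the code as written in an edge name
    final String meaning;   // its description

    Suffix(String token, String meaning) {
        this.token = token;
        this.meaning = meaning;
    }

    /** Finds the suffix written as 'token', e.g. "(MS)". */
    static Optional<Suffix> fromToken(String token) {
        return Arrays.stream(values()).filter(s -> s.token.equals(token)).findFirst();
    }
}

/** Hypothetical record holding the members of an edge name. */
record EdgeInfo(int edgeOrder, int o1Order, String location,
                int o2Order, String o2Service, Suffix suffix) { }
```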
Example (information on edges) Listing 4 shows information on the edges of the OG presented in Fig. 1. In this listing, line 1 indicates that the static method Main.main (suffix MS) has a static field (suffix AS) that references an instance of the class Invoice. Line 3 indicates that, at the location Invoice.load(), the source object created an ArrayList. Next, at the same location, this ArrayList object was assigned to the field listProducts (suffix A, line 4). Finally, the ArrayList.add() method was called (line 6).