CFDSD: a Communication Framework for Distributed Software Development

Due to geographical and/or temporal dispersion, communication between teams in distributed software projects is a critical factor for success. Notably, distributed teams suffer adverse physical and temporal dispersion effects during an information exchange. To mitigate problems arising from interactions, it is important to understand the communication structure of teams during distributed projects. The objective of this work is to present the Communication Framework for Distributed Software Development. This framework groups a set of distributed projects communication concepts and enables a unified view of all stakeholders’ intercommunications that use many interaction technologies. The goal of this framework is to portray the dynamics of the interaction between distributed teams in a multi-tier structure, and each tier approaches a single function in the intercommunication process. We analyzed all interactions from distributed teams after developing an experimental IT Project, using the content analysis method to validate the Communication Framework for Distributed Software Development. The main contribution of this work is the framework specification, and the investigation, which has the potential to help mapping communication patterns in DSD. Moreover, this framework comprises interfaces for communication assessment and includes intercommunications from automated engineering tools as bots.


Introduction
The distributed software development (DSD) is common in companies. The growth of this scenario is a reality. A Gartner report explained by [1] shows that more than 90% of the companies in the Fortune Global 500, a list that ranks the top 500 corporations worldwide according to their revenue, used external resources to deliver Information Technology (IT) services. Lee [1] reports that an engineer of a large IT company said the following: "It is nearly impossible these days to find a software team that is completely collocated. " Several companies adopted DSD to improve their customer relationships, reach new markets, enhance the quality of products and processes, attract specialized labor, reduce time to market, and reduce costs. same purpose; • An interface specification that links communication with software process.
The present work was organized as follows: the next section approaches the fundamentals, related works, and key concepts. The third section presents a detailed description of CFDSD, the subject of this paper. The fourth section exposes the methodology and research procedures employed. Finally, the last section presents the results, an interpretative analysis of the collected data and the conclusions, respectively.

Communication in DSD and related works
Fuks [15] bases his work on the definition of communication as the act or effect of sending, transmitting, and receiving messages using stipulated methods and/or processes, both through spoken or written language or other signs, signals, or symbols or using specialized technological, sound or visual devices.
The media synchronicity theory presented by [25] aggregates a set of essential concepts related to project communication. In the first concept, the author affirms that the success of completing many tasks that involve more than one person is associated with communication performance. Under these terms, the performance of communication is associated with convergence and conveyance [25]. The development of shared knowledge is a success factor for the project in these situations. In the same work, [25] presents a set of attributes associated with the media synchronicity theory, such as synchronism, communication process, speed, capability, parallelism, symbol set, rehearsability, reprocessability, and appropriation.
The media can influence communication performance and include underlying communication processes [27,28], and both, transmission of data and individual cognitive processes, to develop information. When addressing the technology-related challenges of DSD, [10] and [17] state that the most evident configuration is the synchrony in communication. Synchronous communication occurs when the participants have access to transmitted messages in real time. Examples of this are person-to-person conversations or meetings or phone calls or even software with real-time exchanging of texts, audios, and videos. An asynchronous communication, according to [10], occurs when participants The above-related works approach essentials elements for communication in a DSD environment. This set includes works with a focus on social studies, communication process, synchronicity, maturity, and others. Each work approaches a key communication concept in a distributed software development environment, and CFDSD aggregates many of these concepts. This section explains these concepts. Table 1 shows a set of concepts about software projects communication. The first column in this table is the ID, the second the name, and the third includes a list of authors that approach the concept. We approach the elements in Table 1 as below.
The concept site (1) in Table 1, for this research, is an autonomous software production organization. Sites are organizations that develop software and have at least one person (collaborator). The site defined as home-office has only one person and this person is a collaborator of an organization with a process and pre-defined tools. However, [33,34] address another kind of relation between the person and an organization called crowdsourcing. In this case, the development is task-centered, and the person executes many tasks for many organizations. Work [35] presents an essential approach to sites.
According to [36], a software process is a set of activities and results that the purpose to create software. This element is essential for creating a software product or service and many authors such as [37][38][39] highlight the process improvement.
In software development, according to [40], many stakeholders use many tools (Table 1  item 3). These tools are, for instance, IDE (integrated development environment), project management tools, text editors, spreadsheets, communications tools, bots, UML modelers, and others. Each tool, combined with the work process, can generate or update one/or more artifacts (work product). 19 Symbol set [25] Physical distance is the geographic range between sites. This feature defines DSD/GSD. Many authors approach this characteristic as [3,41]. The time zone, in distributed software development, can be associated with temporal distance (item 5 of Table 1) and physical distance. There is a temporal distance when the sites are largely physically distant, or the collaborators have different work schedules. Many authors approach this subject as [1,26].
Media, item 6 in Table 1, is the transport environment between source and destination. The media synchronicity theory aggregates many properties in this element. Researches [25] and [26] approach this subject.
Computer-mediated communication tools (CMC) allow interaction in many different ways and data exchange between individuals. Some authors who approach this theme, item 7 of Table 1, are [42,43]. Social media also compose this element, as reported by [30]. The interaction uses a medium and has content. Several authors approach this subject, such as [25,26,44].
Conveyance (item 9 in Table 1) is the transport of interaction. The authors of [25,26] use this definition as the basis for their work. In item 10, Table 1, convergence is the extraction of meaning from the information transmitted by individuals, according to authors [25,26].
Synchronous or asynchronous communication is recognized as an essential factor affecting interpersonal communication and teamwork [45]. Synchronicity (item 11, in Table 1) is a coordinate shared pattern behavior among individuals as they work together, according to [25].
Feedback, item 12 in Table 1, is associated with the media capacity. A medium with a high degree of synchronicity is able to receive immediate feedbacks [25]. In software projects, feedback is very important to support the identification of problems and the improvement of product/processes, such as in the work of [46,47]. Feedback is also (2020) 26:7 Page 6 of 21 crucial in maintaining context awareness. For example, the work of [48] systematically addresses how to reduce delay feedbacks by using continuous analysis. Based on the definition of [26], communication frequency (item 13 in Table 1) is the intensity of the interactions. Table 1, item 14, indicates turn-taking. Turn-taking is about switching of interlocutors during a conversation or explanation. This concept is common in a dyadic conversation (communication between two interlocutors) for establishing a commonly shared knowledge among stakeholders. The works of [26,49] approach this concept. According to [49], the turn-taking is composed by several components and [50] points that turn-taking involves processes for construction, contributions, response to previous comments, and transitioning to a different speaker, using a variety of linguistic and non-linguistic signs.
Parallelism (item 15 in Table 1) is defined by [25] as the set of simultaneous interactions with many senders sending many messages at the same time to many receivers.
The author of [25] defines rehearsability (item 16 in Table 1) as the media capability to improve the content before sending the message. Reprocessability (item 17 in Table 1) is the capability that the medium has to enable a message to be reexamined or processed again. In a set of interactions, different interpretations can exist between the stakeholders, that is, the inherent divergence and ambiguity of communication can lead to different and sometimes conflicting knowledge. Several authors approach the subject, such as [51]. A symbol set (item 19 in Table 1), derived from [25], is the set of symbol types used in interactions.

The CFDSD
These issues are present largely in distributed project stakeholder's communication. The CFDSD creates a structure that enables to store and analyze content, including a mechanism for evaluating the communication. The CFDSD has the following features: 1. A general nature that makes communication process understandable, regardless of its maturity, software-engineering tools used in the project, and context; 2. A structure with interactions generated by people and tools (e.g., automatic bots); 3. An interface that allows the creation of techniques and models to assess communication; 4. It allows the study of communication even in a current project; 5. It creates interactions groups into the same context; 6. It encourages the introduction of software tools to automate communication assessments and structuring in distributed projects.
The CFDSD corroborates with the above-mentioned works in the "Key concepts" subsection. The concepts addressed in the second section corroborates with the conceptual framework that frames communication elements into single units. Table 2 shows how CFDSD combines key concepts. Key concepts are presented in the previous section. This table shows the relationship among them. Figures 1 and 2 show the CFDSD and its details in the next section.

Framework details
To accomplish the previously mentioned objectives, the CFDSD was divided into six tiers. Figure 1 shows a package diagram with these six tiers, named T0_Project, T1_Purpose,

19
Symbol set CFDSD encapsulates any set of symbols, but it cannot parse them in this work. T2_Composition, T3_Interaction, T4_Evaluation, and T5_Process. Each tier has a set of common purposes. The next sections of this paper describe each tier in detail. Figure 2 presents the CFDSD in a class diagram. In this work, the communication in any distributed project consists of two elements: interaction and host. Interaction is a oneway flow of information with origin and destination. The hosts are the agents that send and/or receive this flow, such as developers, managers, customers, software engineering tools, chatbots, continuous integration tools, and other stakeholders.

The t0_Project tier
This first tier, T0_Project (Fig. 2), represents an abstraction of the project under development. This level contains all the elements that generate and receive interactions within a software project. A software project (represented by the Project class) is developed by multiple (two or more) sites composed of one or more collaborators (typified as human beings-Person class). In addition to the sites, a project can contain software engineering tools (SETool class) used to directly or indirectly support the development of the final (2020) 26:7 Page 9 of 21 product, a knowledge base (KnowledgeBase class)-repository of templates and artifacts generated by the project or through previously collaborator experiences-and external entities (ExternalEntity class)-representation of entities not directly associated with the project.

The T1_Purpose tier
The T1_Purpose tier (Fig. 2) abstracts all the elements that generate and receive messages. It also associates a set of forms of intervention that may occur during project execution. The forms of intervention, mapped in the CollaborationForm class (action, schedule, media, network, adaptation, command/control, among others), were based on the activities defined by Fuks [15] and they are represented by the generic class Purpose. This tier is the start of communication and fragments it into several communication units (Com-mUnit Class). A CommUnit is a set of interactions between origin(s)/destinations(s) that start and end with same purpose. The host class initializes the communication by creating and sending/receiving a message. This class is an abstraction of four classes of the Project tier: collaborator, SETool, KnowledgeBase, and ExternalEntity. The CFDSD approaches the host as any stakeholder that send and/or receive messages and includes interactions generated and received between collaborators, tools (software engineering and/or bots), external entities, and the knowledge base.
When elaborating a message, the host creates the CommUnit, which aggregates the purpose and has a destination. Moreover, a CommUnit is composed of a set of interactions ([0, n]) from other CommUnits, characterizing a tree structure.

The T2_Compositions tier
Once the concepts of the T1_Purpose tier have been established, the next tier (T2_Composition Fig. 2) physically addresses message construction physically. In this tier, the Message class defines the transmission method and the synchronism of the message, which has a content (Content class) and can have software project artifacts (Artifact class) attached.
Any message in a software project can be associated in a specific process phase, activity, or discipline. The CFDSD creates a relationship between a message and a specific point in the process.

The T3_Interaction tier
The T3_Interaction tier (Fig. 2) transmits the message generated in the T2_Composition tier (immediately above tier). The classes that make up this tier are Transmission and CommunicationTool. For this purpose, the Transmission class stores submission data and the communication tool (CommunicationTool class) adopted. This tier approaches the conveyance.

The T4_Evaluation tier
Communication evaluation occurs in the last tier, T4_Evaluation (Fig. 2). This tier has an abstract evaluation model (EvaluationModel template class) in which the parameter of this template defines which object will be evaluated. When the subclass (EvalEngine4CU class) is instantiated, the attributes and the collaborator (person responsible for the evaluation) are informed through the constructor method. The EvalEngine4CU class evaluates CommUnit type objects, as bind indicated in Fig. 2. The evaluation method must also be implemented. It contains a list of objects to be evaluated and their possible evaluations. Finally, the template class EvaluateObject stores the object that was evaluated and your evaluation.
Any object instantiated from the CFDSD classes can be evaluated as long as they are created from the EvaluationModel and inform the bind value; the EvalEngine4CU class, for example, is responsible for evaluating CommUnits.

The T5_Process tier
The T5_Process is the tier to abstract the process. A software process is composed of several components such as phases, activities, and guidance, and the configuration of these components depends on a set of organizational factors. It is common for the same organization to run different process instances for different projects. The ProcessInstance class is for linking the organization process static view, and the interface IProcess associates the message to the process. This structure associates software process structure to messages. The CFDSD can gather interactions to conduct an assessment, for example.

Research methods and procedures
The research method adopted was a qualitative study using content analysis. This method was applied in the works of [52][53][54][55] for example. Studies carried out by [56,57] indicate that this method can be both quantitative and qualitative and inductive or deductive. The inductive content analysis method gets an abstraction from data, and a deductive content analysis is often used for retesting existing data in a new context. This work uses the deductive approach to test CFDSD cohesion in a real environment. A group of productive sector professionals developed a robotic project pre-established in a laboratory environment. All interactions among the distributed groups were stored during project development (this data is the content) and analyzed later by the researchers.
This method was chosen to meet the environmental conditions of DSD and to track sites behavior and all communication generated during the distributed project implementation. The content analyzed here was the interactions (message content and logs). The content analysis intention was to verify if CFDSD is applicable in a real situation, proving its adherence to real interactions generated in a DSD environment. However, interactions are not purely composed of artifacts but also of particular interactions, so it was necessary to analyze casual information among participants of the project.
A physical building with several rooms inside at UTFPR-CP (Federal University of Technology -Cornélio Procópio/PR -Brazil -23 • 11' 13.3" S 50 • 39' 23.3" W) was provided for the teams to execute the distributed experimental project. The rooms were far apart enough to ensure that people in different work teams were completely isolated and did not have physical contact with each other.
The participants were professionals with at least 3 years of experience in software production companies. Four sites with four to five participants in each were configured to perform the tasks related to the development of the experimental project. In addition to these four sites, two extra sites were created. The first additional site was named logistics and the second was named orchestrator. The orchestrator in the project is responsible for organizing all the stages before, during, and after the project execution. The logistics site was used to move physical components between sites to ensure people in different sites did not have face-to-face contact during the project runtime.

Project planning
A problem scenario was introduced in the first face-to-face meeting with all stakeholders. In a second stage, the work teams were physically separated to develop the product. The main objective of the project was deliver a robot with embedded software to solve a specific problem. The teams had to assemble the robot and develop the software. Besides, they followed a macro process and used a set of tools (Eclipse, Astah and LeJOS -Lego Java Operating System) defined earlier by the orchestrator. The macro process (based on a waterfall) was introduced in the first face-to-face meeting with the team members. Figure 3 presents the experimental project execution processes divided into six steps. The first one defines the composition and the physical location of each site. The next stage corresponds to the face-to-face meeting with the stakeholders. This second stage also includes the project presentation, initial resources, and the restrictions of each site. In step three, the sites are isolated in separate rooms. The step four represents the experimental project execution.

Experimental project execution
Step five consists of the final product delivery and, finally, the last step is the completion of the experimental project.

Selecting the unit of analysis
According to [58], the unit of analysis is a the sample of small pieces that the researcher needs to determine; however, [57] points out that there are no fixed criteria to create a the unit of analysis. The interactions between stakeholders were analyzed to validate this work, and we observed the following features: (1) communication purposes. Although Fuks studies [15] were used in this work, it is understandable that the diversity and the number of purposes are related to the singularities of the project. In this study, the aim was to identify the amount and the types of purposes generated in the experiment and aggregate them into communications unities.

Data collection and instrumentation
To monitor all the interactions, the work teams used previously registered accounts in a groupware environment. This environment provided collaborators with an e-mail, forum, wiki, chat, and videoconference. Communication tools recorded the data collected during the project runtime and Internet interactions data stored in a proxy system.

Results
Project runtime was 7 h and 45 min and it was conducted in two periods. The teams were responsible for the coordination and organization of the activities. They had to define the actions, roles, and assignments for each site. After the project execution, the data extracted from Moodle, Hangouts, Gmail, and Squid proxy were standardized. Table 3 presents the fields of the final worksheet generated after formatting the data and their respective descriptions: The fields CommUnit and Purpose, described in Table 3, were completed during analysis after building the project. Therefore, these data were generated from the researchers' interpretation of the messages. Although the experiment was performed in Brazil and the language used in these interactions was Brazilian Portuguese, for illustration purposes, some interactions in this document were translated into English.

Data analysis and interpretation
The analysis of the final worksheet resulted in the identification of 152 communication units in more than 2573 interactions. This section presents the structure and sequence of some different interactions. Date and time were omitted to reduce the size of the tables presented below. Tables 4 and 5 are in chronological order. The Table 4 contains a communication unit that was developed in the project between team 1 (g1) and the orchestrator (vl). In addition, the second and third column of Table 4 (Purpose and SubPurpose -S.Purp.) was describe into Table 6.
In Table 4, five interactions were identified in order to establish this CommUnit. Team 1 (g1) misinterpreted the model for the cash flow. This team requested the help of the orchestrator, who explained that the cash flow would be in dollars and the initial resource would be $200. In the sixth row of Table 4, g1 delivers a formal artifact and, it was possible to measure the communication through this artifact.
There was a time delay between interactions in the Table 4. It is important to note that a Communication Unit is not self-exclusive, which means that there could be another CommUnit, composed of other interactions, running at the same time. Interactions 2 and 3 in Table 4 sent the same message, but the communication tool duplicated them. Message 3 was noise, for this work. The same sequence identified in Table 4 can be represented graphically, as illustrated in Fig. 4.
There was also communication between the team and the knowledge base. Table 5 shows the interaction performed for the conclusion of the work breakdown structure (WBS). The g4 (team 4) downloaded and delivered the artifact without the intervention of collaborators from other sites. The communications presented in Table 5 were integralized and achieved their purposes. Figure 5 illustrates the communication flow of this CommUnit.
A single CommUnit can have multiple origins and destinations. In the experiment, there were some situations where the same site sent or received messages from more than one site. Figure 6 illustrates this process.
The purpose of this interaction is to identify potential sellers of a particular physical component. Team 3 (g3) needed a component but was unaware of which team had this component.

Identified purposes
After analyzing the final worksheet, it was possible to identify the purposes of the interactions that occurred during the project execution. These purposes are listed and described in Table 6.
The purpose was associated with the interaction after we collected the data. Three researchers involved in this work analyzed the spreadsheet described in Table 3 and, after each one included the purpose, the results were compared. Each researcher worked on this operation at least 40 hours. Moreover, it was found that purposes can be made up  from other purposes. This way, at a certain moment, a purpose can be considered the main one, and at another time, it can become a component of another purpose. In this work, all undesired situations that could compromise the communication process are considered a noise, similarly to the computer network area, as explained in [59]. Some noise was detected during the project execution, most of them related to the technology that was being used for communication (for example, duplication of messages in the chat system), and the rest of them was human-generated. As illustrated in Fig. 7, it was possible to map the purposes (P1..Px) of this project in a hierarchical structure consisting of CommUnits. Figure 7 contains a graph used to divide the project into CUs and split the units into subunits. This figure also presents an evaluation indicative, represented by the ellipse av1, and the association of artifacts with the CUs. The interactions (SP1...SPn) are in the lowest level of this tree. The representative structure shown in Fig. 7 only includes only two communications unities.   Table 4 Unit of analysis Although we had mapped the communication units after the project execution, it was possible to group a set of interactions into a single purpose (purpose and sub-purpose). Thus, the variable was considered satisfied and 100 For the same purpose/CU, several sites can generate interactions according to interests. A single CommUnit can temporarily contain a variable set of sites and hosts. We have found CommUnits that the communication tool switched from an e-mail (asynchronous) to Google Hangouts (synchronous). This evidence establishes the possibility of using different communication tools and, consequently, different types of synchronism in the same CommUnit.
The analysis of Figs. 4, 5, and 7 allowed the quantification and qualification of the interaction, and this experiment revealed that it is also possible to evaluate a set of interactions, manually, thus qualifying a CommUnit in this scenario. The conversation between sites was effective, and the communication tools provided for the project fulfilled all the needs. The project was delivered according to the specifications, and the predefined macro process was precisely followed.
The researchers visited the sites every half hour, with no intervention, to observe the behavior and the work of the teams. Through these visits, the researchers identified some deadlocks caused by different expectations from the sites. During the execution of the tasks, the sites did not communicate with each other. This way, a site could presume that the other site was not executing a given task because it was solving internal conflicts related to the project. This presumption created the false sensation of a setback One site perception about the others is reported by the concept of awareness, according to [21,60,61].

Threats to validity
Data obtained from the project were collected, processed, and analyzed. The analysis process was very difficult since all contents were analyzed manually, a possible but unfeasible task to be done in real-time. To identify all communication units was difficult. However, the CFDSD was adherent to the project communication set. Even though we conducted the experiment and stored all interactions, our research method has constraints for generalization since we analyzed the interactions, and, although we had analyzed some variables, we tested them in a controlled environment, and we made the analysis after the project delivery. It is important to highlight that the main research method was content analysis, and this method has constraints for generalization.
Furthermore, all participants were Brazilians with experience in software development, and some spoke a foreign language; therefore, the language adopted was Portuguese. The experiment did not address homonyms (perfect homonyms) problems in communication.
The experimental project was similar to a real project since all participants had similar professional skills. We conducted experimental tests with undergraduate students prior to this IT project, and we found out that participants had problems related to software process (students in the second semester), software development (students in the first semester), maturity, and capability to solve problems (students in the last semester). All students were attending undergraduate IT courses.
During the visits to the sites, the researchers observed that these sites did not always respond to requests from other sites, which caused some anxiety. On the other hand, in some occasions, the stakeholders sent an excessive number of messages. These problems are defined by [21,61,62] as awareness in a DSD environment.

Conclusions
This work shows how communication in a distributed project can be structured in CFDSD. For this purpose, we conducted an IT project during which we monitor all the interactions. In this research, the communication generated by the project is composed of communication units, and these units are composed of interactions. This model attests that communication can be structured in a tiered structure, as illustrated in Fig. 8. Figure 8 shows the communication segment of a project. At the highest atomic level, interactions are identified; a set of these interactions with the same purpose characterizes a CommUnit. The communication of a project consists of a set of CommUnit.
In addition to the explicit knowledge mapped in the CFDSD, other implicit concepts were observed during the project execution. These implicit concepts are discussed briefly below.
The following implicit knowledge was observed in the first tier (T0_Project): internal and external perspectives caused by geographic dispersion, coordination mechanism 1 , and prior knowledge related to project and development.
The perspectives mentioned above refer to the personal perception of each site. In this case, the internal perspective is related to site performance at the project development. The external perspective is associated with the vision of a site in relation to another site during the project development. In this regard, a site may have incoherent perceptions of other sites. An example of this occurs in a project when a site takes a long time to respond to other websites because it is solving some issues related to the project's tasks. Consequently, the other sites believe the site is blocking the project development, which is not the case.
The T1_Purpose tier implicitly treats project demands and prior relations between project stakeholders, which lead to a competition and/or cooperation between them.
The following tier, T2_Composition, implicitly includes language, dialect, and expression capacity, as well as internal denominations used in projects, such as references to specific document names. The T3_Interaction tier implicitly covers different levels of formality, ranging from informal and without record (e.g., phone calls) to formal and auditable (e.g., the use of an e-mail as a contract). Moreover, in this tier, it is implicit that, for the same purpose, stakeholders can switch communication tools.
Finally, the last tier, T4_Evaluation, implicitly incorporates the way that the communication elements, such as an interaction or an artifact, are evaluated and measured.
Although the project executed by the professionals was fictional, we noticed some important factors that influence communication in distributed projects. There is a direct relationship with the stakeholder's coordination mechanism in the communication. How one site can influence (offshore is different by branch, for example) another is subjective and can jeopardize process standardization.
After the implementation of our research method and analysis of the data, we can summarize our conclusion as follows: (1) this is a new UML communication specification in distributed projects based on relevant works; (2) the CFDSD was validated by a similar real project conducted by professionals; (3) the CFDSD works, but not in real-time; (4) the CFDSD provides interfaces to connect to project management and evaluation methods; (5) the communication unit is the new element of this work and its contribution; (6) the abstract evaluation model is a new element, too; and (7) there are unmapped problems in communication yet. These problems include face-to-face communication hidden by CFDSD, message overload, coordination mechanism, and awareness. The future works identified in this paper converge on two aspects: the conceptual and the instrumental. From the conceptual aspect, a future work is to develop metrics based on CommUnit. These metrics could identify problems and may even proactively enhance project performance. The adopted instrumentation was efficient, but it was not agile or effective. Since CommUnit mapping was manual, this process was relatively slow and required too much time and labor. Real projects that demand time and more data will require computer tools to classify the interactions and map the purposes. A mechanism to drive interactions with the automatic identification of CommUnit and purposes without human intervention can be a valuable tool for future experiments or projects.