# On the analysis of the collaboration network of the Brazilian symposium on computer networks and distributed systems

- Guilherme Maia^{1} (corresponding author)
- Pedro O. S. Vaz de Melo^{1}
- Daniel L. Guidoni^{2}
- Fernanda S.H. Souza^{2}
- Thiago H. Silva^{1}
- Jussara M. Almeida^{1}
- Antonio A. F. Loureiro^{1}

**19**:109

https://doi.org/10.1007/s13173-013-0109-7

© The Brazilian Computer Society 2013

**Received: **1 October 2012

**Accepted: **11 March 2013

**Published: **26 March 2013

## Abstract

The Brazilian symposium on computer networks and distributed systems (SBRC) reached its 30th edition as the paramount scientific event in the area of computer networks and distributed systems in Brazil. Faced with this opportune moment in the event’s history, we here study the collaboration network established among authors who have jointly published in the symposium. Towards that end, we collected bibliographic data from all 30 editions, and built the co-authorship network of the event. We then analyzed the network's structural features and evolution throughout its history. Our results reveal the main kind of co-author relationship among authors, and show the most prominent communities within SBRC, the regions of Brazil that attract the most authors, the researchers with central roles in the network, as well as the importance of inter-state collaborations. Finally, we align our results with historical facts that may have had a key impact on the symposium's success.

### Keywords

Collaboration networks, Scientific networks, Social networks, SBRC

## 1 Introduction

In 2012, the Brazilian symposium on computer networks and distributed systems (SBRC) reached its 30th edition as the paramount scientific event in the area of computer networks and distributed systems in Brazil. Its importance may be evidenced by the number of papers submitted and by the number of participants in the last editions of the event. For instance, in the last few editions, the symposium received between 250 and 300 papers from about 1,000 authors, including researchers, professionals and students. Due to its wide acceptance, SBRC assembles most of the work in the areas of computer networks and distributed systems from Brazil’s academic and professional communities, besides international researchers. Scientific events play a central role in knowledge dissemination, since they are one of the few opportunities for researchers with common interests to gather together, present new ideas and establish new collaborations. SBRC is not different, as we shall present throughout this paper. Hence, given this opportune moment in the event’s history, a broad investigation of such research community is timely.

We use social network analysis (SNA) to further investigate this well established research community. Because of the popularity of online social networks and the large availability of real social data, SNA has gained a lot of momentum in the last few years [22, 26, 36, 43]. Besides online social networks [20, 23, 25], it is possible to apply SNA to discover knowledge in the most diverse systems, such as mobile operators [12, 19, 40], Internet websites [1, 13], railroads [13], citation networks [17], movies and actors [21], sports leagues [28] and many others.

In summary, a social network is composed of a set of individuals or groups connected by different kinds of relationships. Individuals, also known as *actors*, may represent a single person, a group or even an organization. Their relationships, or *ties*, may indicate, for instance, a friendship, a professional relationship or a scientific collaboration. Actors and ties are defined according to the question of interest.

A scientific collaboration network is a special type of social network in which the *actors* represent authors and *ties* indicate that the authors have published at least one paper together. Collaboration networks have been widely analyzed [31–33, 35], as these studies disclose several interesting features of the academic communities that comprise them. For instance, the analysis of topological features enables the identification of communities [2], the intensity of collaborations among authors [11] and how the network evolves over the years [25].

Therefore, in this paper we study the SBRC’s collaboration network. Towards that goal, bibliographic data from all 30 editions of the event were collected and a series of features, obtained from the topological structure of the collaboration network, was analyzed. In particular, we here investigate the evolution of the largest connected component, number of communities, importance of nodes, their degree distribution and correlations, and network homophily.^{1} Through this study, it is possible to better understand the behavior of such a vibrant community and part of the impact produced by some crucial collaborations established through the years. For example, we are interested in investigating the peculiarities of collaborations among researchers from a region with a historically very active and productive research community, and among researchers from a region with no such community. It is worth noticing that, when compared to previous studies on collaboration networks, our work stands out for three main reasons. First, we analyzed 30 years of data, which to the best of our knowledge is more than any other study available in the literature. Second, our analysis considers several features that are usually not present, such as the geographic location of the researchers and the institutions they work for, among others. Finally, we relate our findings to several historical facts that may have had a key impact on the symposium's success and may also have changed the way research is done in Brazil.

The remainder of this paper is organized as follows. Section 2 presents the related work. Then, Sect. 3 describes how the data used in this work was collected and how the network was built. Section 4 presents some statistics about the participation of authors from different regions of Brazil. Next, Sect. 5 describes the main kinds of collaborations among authors, whereas Sect. 6 presents a study of the connected components of the network. Section 7 discusses distance and clustering measures, and Sect. 8 analyzes the main communities within SBRC. Researchers with strategic positions in the network are identified in Sect. 9, and Sect. 10 analyzes homophily in the SBRC network. Section 11 presents a cross analysis among some evaluated metrics. Section 12 presents the conclusions of this work. Finally, the Appendix presents the historical aspects that may have contributed to fostering research development in Brazil.

## 2 Related work

The analysis of collaboration networks is well explored in the literature. For instance, Newman [31, 32] presents some of the pioneering studies in this area. The author analyzes three scientific communities—Computer Science, Physics and Biomedicine—and presents several structural and topological features of these communities, focusing on the main similarities and differences among them. Although these communities share some similarities, Newman shows that they also have substantial differences. In that direction, Menezes et al. [29] assess how the process of knowledge production in Computer Science happens in different geographic regions of the globe. The authors divide the globe into three main regions and evaluate how research is conducted in 30 different subfields of Computer Science for each of the considered regions, focusing on the structural and temporal features of the network. Among the main results, Menezes et al. show that the scientific production of Brazilian researchers is increasing in recent years, which they attribute to an increase in funding provided by Brazilian government agencies to foster research in the country.

Towards analyzing the Brazilian scientific production, Freire and Figueiredo [15] show the main similarities and differences between two co-authorship networks they propose: “Global”, created from all publications of the DBLP database, and “Brazilian”, which is a subset of the first network considering only researchers affiliated to Brazilian institutions. Moreover, they propose a new ranking metric to measure the importance of both an individual in the network and groups of individuals. This metric is applied to the Brazilian network and is compared with two existing ranking measurements in Brazil: the Research Fellowship Program of CNPq (an agency of the Brazilian Ministry of Science and Technology) and graduate programs in Computer Science provided by CAPES (an agency of the Brazilian Ministry of Education). The authors show that the proposed metric can accurately identify influential groups and well-established graduate programs in Brazil.

There are studies that analyze specific events and areas. Procópio et al. [38] create and analyze the co-authorship network of articles published during the first 25 years of the Brazilian Symposium on Databases (SBBD). The authors focus on the network’s structural features and temporal evolution throughout the event’s history. They present and study statistics such as average number of papers per author, average number of papers per edition of the symposium, average co-authors per paper, among others. Finally, the work shows that the studied network follows a well-known phenomenon called small world, typically found in other social networks. Silva et al. [41] create and analyze the co-authorship network of papers published in three international top conferences focused on Ubiquitous Computing (Ubicomp). They provide useful analysis for that network, such as representativeness of authors and institutions, and formation of communities. Finally, Nascimento et al. [30] analyze the co-authorship graph of the ACM Special Interest Group on Management of Data (SIGMOD) Conference. Among the main results, the authors observe that the SIGMOD community is also a small-world network. In comparison with these previous studies of co-authorship networks of specific research communities, we go further and analyze three fundamental aspects of researchers who publish in SBRC: geographic location, topological characteristics in the network and productivity statistics in the conference.

Finally, scientific collaboration networks are not limited to co-authorship networks. Bazzan and Argenta [4] create a social network of the PC (Program Committee) members of conferences sponsored by the Brazilian Computer Society (SBC). The relations among nodes of this network are established according to co-authorship data extracted from the DBLP. By using well-known network metrics, such as node degree, largest connected component and clustering coefficient, the authors show that the studied network does not fit any well-established pattern when compared to other networks studied in the literature. This is probably due to the fact that members of this network do not necessarily interact with one another in terms of co-authorship, since they belong to different sub-areas within Computer Science. One of the main findings was that the most connected nodes are non-Brazilian PC members, and they play an important role in the network by acting as connectors between Brazilian researchers. When compared to our work, we point out that SBRC includes both well-established authors and newcomers to the symposium, while the PC network is formed exclusively by senior members, which explains the difference in some of the metrics. Nevertheless, we observed that the SBRC network follows similar patterns to other previously analyzed scientific events and communities, such as the ones in [30, 38] and [41].

## 3 The network of the SBRC symposia

### 3.1 Data acquisition

Our study is based on bibliographic data of the 30 editions of SBRC, which took place from 1983 until 2012. It is focused on the collaboration network established among authors of papers published in the main track of each edition of the symposium. Thus, we collected data of full papers published in the proceedings of the event, excluding lectures, tutorials and workshop papers. For each paper, we collected its title, year of publication, list of authors with their respective affiliations, geographic location of the authors’ institutions, and the language in which the paper was written. The data comprises digital and non-digital sources, since the first editions of the event occurred before the existence of the Web. Part of the bibliographic data was obtained automatically through the website of the Brazilian Computer Society (SBC)^{2}, while the rest was collected manually from the proceedings of each edition. We manually disambiguated all author names to ensure data consistency.

### 3.2 Network creation

In this paper, the SBRC network is represented as a temporal graph \(G_y=(V_y,E_y)\), where \(V_y\) is the set of vertices, \(E_y\) is the set of edges and \(y\) is the year the network refers to. The graph \(G_y=(V_y,E_y)\) is an undirected weighted graph, where the vertices are authors and an edge indicates that two authors have published together in or before the year \(y\). Moreover, each edge has a corresponding weight, which represents the number of papers the two authors published together in or before the year \(y\).
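The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the `PAPERS` records, the author labels and the `build_graph` helper are hypothetical, and the toy data is not taken from the SBRC proceedings.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records (year, authors); not real SBRC data.
PAPERS = [
    (1983, ["A", "B"]),
    (1984, ["A", "B", "C"]),
    (1985, ["C", "D"]),
]

def build_graph(papers, y):
    """Return (V_y, E_y) for all papers published in or before year y.

    E_y maps an unordered author pair to its weight, i.e. the number
    of papers the two authors wrote together up to year y.
    """
    vertices = set()
    weights = defaultdict(int)
    for year, authors in papers:
        if year > y:
            continue  # the graph is cumulative: skip later editions only
        vertices.update(authors)
        for u, v in combinations(sorted(set(authors)), 2):
            weights[frozenset((u, v))] += 1
    return vertices, dict(weights)

V_1984, E_1984 = build_graph(PAPERS, 1984)
```

In this toy input, authors A and B share two papers by 1984, so the corresponding edge has weight 2, while the edge between C and D only appears in the 1985 graph.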

### 3.3 Network metrics

Network metrics are of great importance when investigating network representation, characterization and behavior. This section presents a summary of the key network measurements used in our analysis, which are discussed throughout the paper.

The *degree* (\(k_i\)) of a vertex \(i \in V_y\) is the number of edges incident to vertex \(i\), and the *degree distribution* (\(P(k)\)) expresses the fraction of vertices in the whole graph with degree \(k\). The assortativity measures whether vertices of high degree tend to connect to other vertices of high degree (assortative network); the network is called disassortative when vertices of high degree tend to connect to vertices of low degree. A path connecting two vertices \(i,j \in V_y\) is said to be minimal if there is no other path connecting \(i\) to \(j\) with fewer links. Accordingly, the *average path length* of \(G_y\) is the average number of links in the shortest paths between all pairs of vertices in \(V_y\). The graph *diameter* is the length of the longest shortest path between all pairs of vertices in \(V_y\). The *clustering coefficient of a vertex* \(i\) is the ratio of the number of edges between neighbors of vertex \(i\) to the upper bound on the number of edges between them. For instance, given \(i,j,k \in V_y\) and assuming that edges \((i,j), (i,k) \in E_y\), the clustering coefficient defines the probability that \((j,k)\) also belongs to the set \(E_y\). The *clustering coefficient of a graph* is the average value of the clustering coefficients of all vertices in \(G_y\). The *betweenness centrality* of a vertex \(i\) is an importance measure based on the number of shortest paths between other pairs of vertices that include vertex \(i\). The *closeness centrality* of a vertex \(i\) is defined as the inverse of its *farness*, which, in turn, is the sum of its distances to all other nodes.

*Homophily* is the tendency of people (in our case, researchers) with similar features to interact with one another more than with people with dissimilar features. The indicator function \({1\!1}_{\left[ c_i = c_j \right]}\) assumes value \(1\) if the class \(c_i\) of node \(i\) is equal to the class \(c_j\) of node \(j\), and 0 otherwise. Notice that the assortativity is the homophily when the class \(c_i\) of node \(i\) is its degree \(k_i\). Table 1 summarizes the mathematical formulas for the main network metrics outlined above. Please refer to Costa et al. [10] for a complete review of measurements.

Table 1 Network metrics

Metric | Formula | Notation |
---|---|---|
Order | \(N = \lvert V_y \rvert\) | |
Size | \(M = \lvert E_y \rvert\) | |
Degree | \(k_i = \sum \limits _{j\in V_y} a_{ij}\) | where \(a_{ij} = \left\{ \begin{array}{ll} 1, & \text{if vertices } i \text{ and } j \text{ are connected}\\ 0, & \text{otherwise} \end{array}\right.\) |
Degree distribution | \(P(k) = \frac{n_k}{N}\) | where \(n_k\) is the number of vertices with degree \(k\) |
Assortativity | \(r = \frac{\frac{1}{M}\sum \limits _{j>i}k_i k_j a_{ij} - \left[\frac{1}{M}\sum \limits _{j>i}\frac{1}{2} (k_i + k_j)a_{ij}\right]^2}{\frac{1}{M}\sum \limits _{j>i} \frac{1}{2}(k_i^2 + k_j^2)a_{ij} - \left[\frac{1}{M}\sum \limits _{j>i} \frac{1}{2} (k_i + k_j)a_{ij}\right]^2}\) | |
Average path length | \(L = \frac{1}{N(N-1)} \sum \limits _{i,j \in V_y: i \ne j} d_{ij}\) | where \(d_{ij}\) is the distance between vertices \(i\) and \(j\) |
Diameter | \(D = \max \lbrace d_{ij}\rbrace ,\; \forall i,j \in V_y, i \ne j\) | |
Clustering coefficient of a vertex | \({\text{cc}}_i = \frac{2e_i}{r_i(r_i - 1)}\) | where \(e_i\) is the number of edges between neighbors of \(i\), and \(r_i\) is the number of neighbors of vertex \(i\) |
Clustering coefficient of a graph | \(CC = \frac{1}{N} \sum \limits _{i \in V_y} {\text{cc}}_{i}\) | |
Betweenness centrality of a vertex | \(B_{i} = \sum \limits _{s,t \in V_y: s \ne t} \frac{\sigma (s,i,t)}{\sigma (s,t)},\; s \ne i, t \ne i\) | where \(\sigma (s,i,t)\) is the number of shortest paths between vertices \(s\) and \(t\) that pass through vertex \(i\) and \(\sigma (s,t)\) is the total number of shortest paths between \(s\) and \(t\) |
Closeness centrality of a vertex | \(C_i=\left[ \sum \limits _{j \in V_y} d_{ij}\right] ^{-1}\) | where \(d_{ij}\) is the distance between vertices \(i\) and \(j\) |
Homophily | \(H = \frac{\sum _{\forall (i,j) \in E_y}{1\!1}_{\left[ c_i = c_j \right]}}{2\lvert E_y \rvert}\) | where \({1\!1}_{\left[ c_i = c_j \right]}\) is an indicator function that assumes value \(1\) if the class \(c_i\) of node \(i\) is equal to the class \(c_j\) of node \(j\), and 0 otherwise |
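As an illustration of the simpler entries of Table 1, the sketch below computes the degree, the degree distribution and the homophily of a small graph. The graph `ADJ`, the class labels in `CLASS` (here, Brazilian state codes) and the function names are hypothetical. One assumption is worth stating: the homophily sum is taken over both orientations of each undirected edge, which is how we read the \(2|E_y|\) denominator; the source does not spell this out.

```python
from collections import Counter

# Hypothetical toy graph: adjacency sets plus one class label per author.
ADJ = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
CLASS = {"a": "MG", "b": "MG", "c": "RJ", "d": "RJ"}

def degrees(adj):
    """k_i: number of edges incident to each vertex."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def degree_distribution(adj):
    """P(k) = n_k / N: fraction of vertices with degree k."""
    n = len(adj)
    counts = Counter(degrees(adj).values())
    return {k: n_k / n for k, n_k in counts.items()}

def homophily(adj, cls):
    """Fraction of edge endpoints pairs whose classes match.

    Walking the adjacency sets visits every undirected edge twice
    (assumed reading of the 2|E_y| denominator).
    """
    same = sum(1 for u in adj for v in adj[u] if cls[u] == cls[v])
    twice_m = sum(len(nbrs) for nbrs in adj.values())
    return same / twice_m

DEG = degrees(ADJ)
P_K = degree_distribution(ADJ)
H = homophily(ADJ, CLASS)
```

In this toy graph two of the four edges (a-b and c-d) connect authors of the same class, so the homophily is 0.5.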

## 4 Statistics

^{3}As can be observed, the number of new authors more than doubled between the years 2000 and 2012. The same increase also happened to the number of new universities and published papers. These results show that SBRC is attracting the participation of new researchers and new institutions over the years. Moreover, they clearly reflect the increase in the number of new graduate programs in Computer Science in Brazil, especially during the 2000s, as shown in Fig. 26 of the Appendix.

Top 20 Brazilian authors

Author | Number of publications |
---|---|
Antonio A. F. Loureiro | 62 |
Otto C. M. B. Duarte | 50 |
Nelson L. S. da Fonseca | 50 |
José Ferreira de Rezende | 46 |
Liane M. Rochenbach Tarouco | 44 |
José Marcos Nogueira | 40 |
Joni da Silva Fraga | 38 |
Djamel Sadok | 35 |
Edmundo R. M. Madeira | 32 |
Paulo R. F. Cunha | 32 |
Luci Pirmez | 30 |
Luiz F. G. Soares | 25 |
Lisandro Z. Granville | 25 |
José A. S. Monteiro | 25 |
Edmundo A. de S. e Silva | 24 |
Maurício F. Magalhães | 23 |
Jean-Marie Farines | 23 |
Marinho P. Barcellos | 23 |
Jussara Almeida | 22 |
Luciano Paschoal Gaspary | 20 |

Top 20 foreign authors

Author | Number of publications |
---|---|
Guy Pujolle | 11 |
Francisco Vasques | 5 |
Alysson Neves Bessani | 5 |
José Neuman de Souza | 4 |
Serge Fdida | 4 |
Aline Carneiro Viana | 4 |
Emir Toktar | 4 |
José Marcos Nogueira | 4 |
Dominique Gaiti | 3 |
Badri Nath | 3 |
Luís Ferreira Pires | 3 |
Pedro Braconnot Velloso | 3 |
Azzedine Boukerche | 3 |
Miguel Correia | 3 |
Pierre de Saqui-Sannes | 3 |
Don Towsley | 3 |
Gregor von Bochmann | 3 |
Jean-Pierre Courtiat | 3 |
Marcelo Dias de Amorim | 3 |
Michel Hurfin | 2 |

## 5 Collaborations

As stated before, an edge between two researchers indicates a scientific collaboration between them. Thus, the degree of a node \(i\) represents the number of collaborators of researcher \(i\). The analysis of the node’s degree in a collaboration network allows the assessment of the structure of co-authorship relationships among researchers in the communities of computer networks and distributed systems in Brazil.

“*biased fit*” represent biased fits and should not be considered good fits^{4} [9]. It is worth noticing that there is a general trend towards \(\alpha \) decreasing over the years, which indicates that the variance of the degree distribution increases as the number of high-degree nodes in the network grows. For instance, in the first year of the SBRC network, all nodes have degrees of the first order of magnitude, i.e., lower than \(10\). In the last year, however, while several nodes have degrees close to the third order of magnitude, the large majority still have degrees lower than \(10\). This is an expected behavior in a collaborative network since, over time, researchers tend to consolidate and aggregate groups and communities that share the same interests. This shall be seen in more detail hereafter.

In an *assortative network*, high degree nodes tend to connect with other high degree nodes, whereas in a *disassortative network*, high degree nodes tend to be connected to low degree nodes. The assortativity values range from \(-\)1, when the network is fully disassortative, to 1, when it is fully assortative. Figure 11 shows that the SBRC collaboration network becomes disassortative over the years. In 1983, the network is completely assortative due to the presence of cliques, i.e., each node is connected to nodes having the same degree. During the initial years, the network still presents an assortative feature, due to the large presence of isolated cliques or small connected components. However, from the end of the 1990s on, the network is consolidated as disassortative, i.e., high degree nodes tend to be connected to low degree nodes. This is the natural behavior in collaboration networks, as students or newcomers (low degree nodes) tend to connect with well-established and expert researchers (high degree nodes) to grow in their academic careers.
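The two extremes described above can be reproduced with a direct transcription of the assortativity formula of Table 1. The sketch below is illustrative only: the edge lists `STAR` and `CLIQUES` are hypothetical toy graphs, and returning `None` when the denominator vanishes is a convention chosen here, not one stated in the paper.

```python
def assortativity(edges):
    """Degree assortativity r of an undirected graph given as (u, v) edges."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    m = len(edges)
    # The three edge averages that appear in the formula of Table 1.
    prod = sum(degree[u] * degree[v] for u, v in edges) / m
    half_sum = sum(0.5 * (degree[u] + degree[v]) for u, v in edges) / m
    half_sq = sum(0.5 * (degree[u] ** 2 + degree[v] ** 2) for u, v in edges) / m
    denom = half_sq - half_sum ** 2
    if denom == 0:
        return None  # every edge endpoint has the same degree: r is undefined
    return (prod - half_sum ** 2) / denom

# One experienced author with three newcomers: fully disassortative.
STAR = [("hub", "s1"), ("hub", "s2"), ("hub", "s3")]
# Isolated cliques (a triangle and a pair): fully assortative.
CLIQUES = [("a", "b"), ("b", "c"), ("a", "c"), ("x", "y")]

r_star = assortativity(STAR)
r_cliques = assortativity(CLIQUES)
```

The star yields \(r = -1\) and the set of isolated cliques yields \(r = 1\), mirroring the contrast between the consolidated network and the network of the first editions.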

## 6 Connected components

Top five largest components: size by year

Component | 1983 | 1984 | 1985 | 1986 | 1989 | 1995 | 2001 | 2007 | 2012 |
---|---|---|---|---|---|---|---|---|---|
1 | 4 | 4 | 8 | 10 | 48 | 156 | 577 | 1,108 | 1,476 |
2 | 3 | 4 | 6 | 10 | 36 | 73 | 12 | 13 | 13 |
3 | 3 | 4 | 4 | 10 | 13 | 31 | 9 | 12 | 12 |
4 | 2 | 4 | 4 | 7 | 12 | 24 | 8 | 8 | 11 |
5 | 2 | 3 | 4 | 4 | 10 | 9 | 7 | 7 | 11 |

To illustrate how individual collaborations can impact the network structure, consider the year 2001, when the SLCC merges with the LCC. This happened exclusively because of the collaboration of two researchers from the SLCC with researchers belonging to the LCC. More specifically, in 2001, Michael A. Stanton, an author in the SLCC in 2000, was a co-author with Noemi de La Rocque Rodriguez, who belonged to the LCC in 2000. Similarly, also in 2001, José Neuman de Souza, who belonged to the SLCC in 2000, co-authored a paper with Nelson L. S. da Fonseca, who belonged to the LCC in 2000. These two collaborations illustrate a geographically constrained collaboration and a non-geographically constrained collaboration, respectively. In 2001, Michael A. Stanton was working at the Federal University Fluminense, located in Niterói, RJ, and Noemi de La Rocque Rodriguez was working at the Pontifical Catholic University of Rio de Janeiro, located in Rio de Janeiro, RJ. These two cities are about 20 km from one another. In contrast, in 2001, José Neuman de Souza was working at the Federal University of Ceará, located in Fortaleza, CE, and Nelson L. S. da Fonseca was working at the State University of Campinas, located in Campinas, SP. These two cities are about 3,000 km away from one another. It is important to notice that during the 2000s, collaborations like the one between Neuman and Fonseca start to become more common due to the many technological advancements in telecommunication and transportation, and also to the expansion of Computer Science graduate programs in many regions of Brazil.
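The effect of a single bridging collaboration can be sketched with a plain depth-first search over connected components. The graph below is a hypothetical toy example (integer vertex labels, made-up edges), not the 2000 or 2001 SBRC network, and `component_sizes` is an illustrative helper.

```python
def component_sizes(vertices, edges):
    """Sizes of all connected components, largest first."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, sizes = set(), []
    for start in vertices:
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited vertex.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        sizes.append(size)
    return sorted(sizes, reverse=True)

VERTICES = [1, 2, 3, 4, 5, 6, 7]
# Two separate components: {1, 2, 3, 4} and {5, 6, 7}.
EDGES_BEFORE = [(1, 2), (2, 3), (3, 4), (5, 6), (6, 7)]
# A single new co-authorship between vertices 4 and 5 joins them.
EDGES_AFTER = EDGES_BEFORE + [(4, 5)]

sizes_before = component_sizes(VERTICES, EDGES_BEFORE)
sizes_after = component_sizes(VERTICES, EDGES_AFTER)
```

Before the new edge the component sizes are 4 and 3; afterwards a single component of size 7 remains, which is the same mechanism by which one joint paper can merge the two largest components.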

## 7 Clustering and distance

The clustering coefficient (CC) and distance are important metrics to evaluate social networks. The clustering coefficient \({\text{cc}}_i\) characterizes the density of connections close to vertex \(i\). It measures the probability that two given neighbors of node \(i\) are connected. The clustering coefficient of the network is the average of \({\text{cc}}_i\) over all \(i \in V_y\).
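A minimal sketch of both definitions follows, using the vertex formula \({\text{cc}}_i = 2e_i / (r_i(r_i - 1))\) from Table 1. The toy graph `ADJ` is hypothetical, and assigning a coefficient of 0 to vertices with fewer than two neighbors is an assumed convention that the source does not specify.

```python
from itertools import combinations

def local_clustering(adj, i):
    """cc_i = 2 e_i / (r_i (r_i - 1)) for vertex i."""
    nbrs = adj[i]
    r = len(nbrs)
    if r < 2:
        return 0.0  # assumed convention for vertices with fewer than 2 neighbors
    # e_i: number of edges among the neighbors of i.
    e = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * e / (r * (r - 1))

def average_clustering(adj):
    """CC: mean of cc_i over all vertices."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

# Hypothetical toy graph: a triangle a-b-c with a pendant vertex d on c.
ADJ = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

cc_c = local_clustering(ADJ, "c")
cc_graph = average_clustering(ADJ)
```

Vertex c has three neighbors with one edge among them, so its coefficient is 1/3, and the graph average is (1 + 1 + 1/3 + 0) / 4 = 7/12.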

## 8 Communities

One of the most relevant characteristics of graphs representing real systems is the structure of communities, i.e., the organization of vertices into clusters, with many edges between the vertices of the same cluster and relatively few edges connecting vertices of different clusters. In order to identify communities in the collaboration network, we used the \(k\)-clique community identification algorithm. A community is defined as the union of all cliques of size \(k\) that can be reached through adjacent \(k\)-cliques (two \(k\)-cliques are considered adjacent if they share \(k-1\) vertices). In other words, a \(k\)-clique community is the largest connected sub-graph obtained by the union of a \(k\)-clique and of all \(k\)-cliques which are connected to it. The implementation of this algorithm was based on Palla et al. [37].

Our main goal is to evaluate how distributed and clustered the collaborations among authors in the SBRC network are. This justifies the choice of the \(k\)-clique community algorithm, since it is well suited to identifying sub-communities and also overlapping communities [14]. In order to achieve our goal, we use the lowest value, \(k = 3\), since it is the most favorable value to capture the largest group of authors (largest connected sub-graph) that forms a community, according to the algorithm specification. When executing the \(k\)-clique community algorithm with \(k=3\), assuming a network with high collaboration between nodes, it is expected to find very few communities. However, as discussed hereafter, this is not the case for the SBRC network.
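The \(k\)-clique community definition can be illustrated with a brute-force sketch. This is not the implementation of Palla et al. [37] used in the paper: it simply enumerates all \(k\)-subsets of vertices, which is only practical for very small graphs. The edge list and the `k_clique_communities` helper are hypothetical.

```python
from itertools import combinations

def k_clique_communities(adj, k=3):
    """Union of k-cliques reachable through adjacent k-cliques.

    Two k-cliques are adjacent if they share k - 1 vertices.
    Brute-force enumeration; intended for toy graphs only.
    """
    nodes = sorted(adj)
    cliques = [frozenset(c) for c in combinations(nodes, k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    parent = list(range(len(cliques)))  # union-find over clique indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) == k - 1:
            parent[find(a)] = find(b)
    groups = {}
    for idx, clique in enumerate(cliques):
        groups.setdefault(find(idx), set()).update(clique)
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))

# Hypothetical toy graph with two overlapping communities.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f"), ("f", "g")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)

communities = k_clique_communities(ADJ, k=3)
```

With \(k = 3\), triangles a-b-c and b-c-d share an edge and merge into one community, while triangle d-e-f only shares vertex d with them and therefore forms a second, overlapping community. Vertex g belongs to no triangle and is left out. With \(k = 4\) this toy graph contains no community at all, which illustrates why larger values of \(k\) yield fewer and tighter groups.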

### 8.1 View of communities

*FOREIGN* represents researchers from institutions located outside Brazil.

After executing the \(k\)-clique community algorithm (with \(k=3\)), we would expect to find a small number of communities. But, as we can see, we identified many different communities. Obviously, with higher values of \(k\) we find communities that have authors more connected among themselves. Considering \(k = 4\), for example, the largest, second largest, and third largest communities have 42, 39, 31 authors, respectively. If we consider \(k = 5\), the number of authors in the largest, second largest, and third largest communities drops to 16, 16, 15, respectively.

A value of \(k = 3\) is particularly interesting for visualizing the general interaction among the authors of the SBRC network, but it may also produce communities that are not very strong. This is what happened for the community consisting mainly of authors from RS and RJ (the largest 3-clique community). After a closer look, we can see that the number of collaborations between these groups of authors is not as large as the number of collaborations within the groups. For instance, when we execute the algorithm considering \(k=4\), we notice that this community is divided into two communities, one formed mostly by authors from RS, and the other formed by authors from RJ. This shows that RJ and RS together, as the largest 3-clique community, do not represent a very strongly connected community.

In general, we observe that most of the interactions tend to happen among authors from particular regions and institutions. This information might be particularly interesting to support decisions towards the improvement of collaborations among researchers from different universities and regions of Brazil.

### 8.2 Community evolution over time

Figure 20b shows the cumulative distribution function (CDF) of the number of authors in the communities, considering the years of 1983, 1993, 2003, and 2012. A high number of communities, as observed in Fig. 20a, does not mean that there are many authors in all these communities. Figure 20b shows that communities with a small number of authors represent a considerable subset of all communities. Around 90 % of all communities have fewer than 10 authors, and approximately 55 % have only three authors. However, we can notice that over the years, due to an increase in the number of collaborations, communities with a higher number of authors start to arise. For example, in 1983 the largest community had only four authors, whereas in 2012 six communities had more than 30 authors.

Figure 20c shows the number of authors over the years for the following groups of communities: all communities, the 20, 10 and 5 largest communities, and the largest community. We observe that from 2004 to 2012 the number of authors per community increases considerably. As stated before, such increase is due to the growth of a few communities with a large number of authors. In this way, we observe that in 2004, the 5 largest communities represent approximately 64 % of the top 10 communities and approximately 48 % of the top 20 largest communities. Considering the year 2012, these values are 79 and 65 %, respectively. We also observe that the top 5 communities represent a significant amount (29 %) of all considered authors. This result indicates that authors in the largest communities interact with researchers outside their communities, thus increasing the size of these communities over time.

Finally, one may attribute the change in community dynamics during the 2000s, as shown in Fig. 20, to the merger of the LCC and SLCC in 2001, as previously described in Sect. 6. However, this event alone does not totally explain such a change. It is worth noting that it is during the 2000s that significant historical events start to happen in Brazil (see Appendix). For instance, we can outline the developments in the telecommunications and transportation sectors. Moreover, Brazil witnessed a rapid growth in the number of Computer Science graduate programs all over the country. Therefore, we can conclude that the combination of these events changed the way researchers used to collaborate, thus better explaining the change in community dynamics during this decade.

## 9 Important nodes

The identification of important nodes within a social network structure is a common activity in SNA. Usually, the identification of such nodes is performed by using centrality metrics, such as closeness and betweenness [6]. These metrics aim to identify nodes that possess strategic locations within the social network structure. A strategic location may indicate that a node has a high influence over other nodes, or that it holds the attention of nodes whose positions are not as convenient in the social context.

The main idea behind the closeness centrality metric is to show how close a node is to all other nodes in the network, i.e., how many edges separate a node from other nodes. On the other hand, the main idea behind the betweenness centrality is to show how often a node is in the shortest path between any two other nodes. In the perspective of a co-authorship network, the closeness centrality may indicate the authors with a favorable location in the network structure to start the dissemination of new scientific findings or research directions to the whole network. For instance, if an author with a high closeness disseminates a new scientific finding, the probability for this new finding reaching the whole network in the least amount of time is higher than if the dissemination started at an author with a lower closeness.

In the case of the betweenness centrality, it may indicate the most efficient authors to act as bridges to carry information among different authors or communities. For instance, if an author has a high betweenness, the probability that a given piece of information being disseminated passes through this researcher is higher than for an author with a lower betweenness. Therefore, we hope that these metrics are able to identify not only strategically located authors in the co-authorship network, but also distinguished researchers in the scientific community of computer networks and distributed systems.
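Both centralities can be sketched with breadth-first search on an unweighted graph. The code below follows the unnormalized formulas of Table 1 and uses Brandes' accumulation scheme for betweenness, which is a choice made here for brevity; the paper does not say which algorithm or normalization produced its reported values. The path graph `ADJ` is a hypothetical example, and distances to unreachable vertices are simply ignored.

```python
from collections import deque

def closeness(adj, i):
    """Inverse of farness: 1 / (sum of BFS distances from i)."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    farness = sum(dist.values())
    return 1 / farness if farness else 0.0

def betweenness(adj):
    """Unnormalized betweenness over ordered pairs (s, t), s != t."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest vertices back to s.
        delta = dict.fromkeys(adj, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# Hypothetical path graph a - b - c - d - e.
ADJ = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
       "d": {"c", "e"}, "e": {"d"}}

BC = betweenness(ADJ)
CLOSE_C = closeness(ADJ, "c")
```

In the path graph, the middle vertex c lies on the shortest paths of eight ordered pairs and has the smallest farness (6), so it ranks first under both metrics, whereas the end vertices have zero betweenness.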

Table 5 Top 10 betweenness authors

Name | Betweenness |
---|---|
José Neuman de Souza | 0.186 |
Nelson L. S. da Fonseca | 0.124 |
Paulo Roberto Freire Cunha | 0.109 |
José Ferreira de Rezende | 0.095 |
Maurício Ferreira Magalhães | 0.086 |
Marcos Rogério Salvador | 0.077 |
José Marcos Nogueira | 0.070 |
Artur Ziviani | 0.065 |
Liane M. R. Tarouco | 0.064 |
Luci Pirmez | 0.055 |

Table 6 Top 10 closeness authors

Name | Closeness |
---|---|
José Neuman de Souza | 0.243 |
Nelson L. S. da Fonseca | 0.229 |
José Ferreira de Rezende | 0.223 |
Luci Pirmez | 0.221 |
Jorge Luiz de Castro e Silva | 0.216 |
Paulo Roberto Freire Cunha | 0.215 |
Alexandre Lages | 0.214 |
Elias Procópio Duarte Jr. | 0.212 |
Flávia Coimbra Delicato | 0.212 |
Rossana M. C. Andrade | 0.211 |

For instance, the researcher *Alexandre Lages* is among the top 10 authors by closeness, although he has only four publications in SBRC and his last work appeared in 2007. However, a careful analysis of the collaborations of this author explains why this occurs. It also highlights that the importance of an author in the co-authorship network, as identified by the centrality metrics, is strongly influenced by the pattern of his collaborations. That is, despite Lages’ small number of publications, they were written in collaboration with very influential and central authors. For instance, in 2004, Lages’ work had as collaborators the following influential authors: *Flávia Coimbra Delicato* (16 publications in SBRC), *Luci Pirmez* (30 publications in SBRC) and *José Ferreira de Rezende* (46 publications in SBRC). Lages also has collaborations with *José Neuman de Souza* (17 publications in SBRC), *Lisandro Granville Zambenedetti* (25 publications in SBRC) and *Liane Margarida Rochenbach Tarouco* (44 publications in SBRC). These authors are identified by one or both metrics as influential within the SBRC community (although *Lisandro Granville Zambenedetti* appears in neither table, he is in the top 20 for both centrality metrics).

From this result we can conclude that when an author collaborates with central authors who have a high closeness, this researcher also increases his own closeness to all other authors in the network. For instance, in 2004, when Lages published together with *José Ferreira de Rezende*, his distance to *Otto C. M. B. Duarte* went from *not possible to reach* to two edges. Therefore, a collaboration with a central author brought Lages closer to another author who was not his direct collaborator. Notice that the same may also happen to the betweenness, i.e., when two or more authors publish a paper together, they may create a new “bridge” connecting different groups of researchers, thus increasing their own betweenness.
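The effect of a single new collaboration on distances can be illustrated with a short sketch. The network below is hypothetical and uses placeholder labels ("newcomer", "central", "distant") rather than real SBRC data; it only mirrors the situation described above, in which one new edge turns an unreachable author into one that is two edges away.

```python
from collections import deque


def distance(graph, source, target):
    """Number of edges on a shortest path, or None if target is unreachable."""
    if source not in graph or target not in graph:
        return None
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return None


def add_coauthorship(graph, a, b):
    """Add an undirected edge between authors a and b."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


# Hypothetical network with two separate components.
graph = {}
add_coauthorship(graph, "newcomer", "student")
add_coauthorship(graph, "central", "distant")

before = distance(graph, "newcomer", "distant")  # no path exists yet
add_coauthorship(graph, "newcomer", "central")   # one new joint paper
after = distance(graph, "newcomer", "distant")   # newcomer -> central -> distant

print(before, after)
```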

Looking at Tables 5 and 6 in this section and Table 2 in Sect. 4, we can notice two interesting facts. First, the top two publishers in SBRC, *Antonio A. F. Loureiro* and *Otto C. M. B. Duarte*, appear in the top 10 of neither centrality metric. Second, an author who is not among the top 30 publishers in SBRC, *José Neuman de Souza* (17 publications in SBRC), is the most central author according to both centrality metrics. Yet, if we look into the histories of Loureiro and Souza, we notice similar aspects. Both have published regularly in SBRC since 1995, they appear in almost the same number of communities (Loureiro appears in 7, Souza in 6), and they collaborate with almost the same number of universities (Loureiro has collaborators in 14 universities, Souza in 15) and states (Loureiro has collaborators in 11 states, Souza in 10).

However, once again, a careful analysis of the collaborations of these authors might explain why such facts occur. Using the same \(\langle k_{nn} \rangle\) metric as in Sect. 5, we find that the average degree of Loureiro’s collaborators is 6.42, while for Souza it is 14.28. Therefore, we can assume that while Loureiro usually publishes with his students, Souza usually publishes with senior researchers, probably acting as a “bridge” among prominent groups within the SBRC community. In particular, Souza is a collaborator of 5 authors in the top 10 betweenness and of 8 authors in the top 10 closeness. As an experiment, let us assume that Loureiro and Souza published a paper together at some point in the history of SBRC, resulting in an edge between the two authors. By adding this single collaboration, Loureiro goes from the 51st to the 13th largest closeness in the network. Considering the betweenness, Loureiro goes from the 11th to the 6th largest value; in fact, his betweenness increases by about 60%. Therefore, we can conclude that in a co-authorship network, the number of publications alone does not dictate the importance of an author within the community, but rather the pattern of his collaborations.
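The average degree of an author's collaborators can be sketched as follows. The graph is a made-up example, and the function name `average_neighbor_degree` and the node labels are illustrative assumptions; the snippet only shows why an author who publishes mostly with newcomers obtains a low value, while an author linking well-connected researchers obtains a high one.

```python
def average_neighbor_degree(graph, node):
    """Mean degree of the direct collaborators of `node`."""
    neighbors = graph[node]
    if not neighbors:
        return 0.0
    return sum(len(graph[n]) for n in neighbors) / len(neighbors)


# Hypothetical graph: "advisor" publishes with three students who have no
# other collaborators; "bridge" publishes with two well-connected hubs.
graph = {
    "advisor": {"s1", "s2", "s3"},
    "s1": {"advisor"}, "s2": {"advisor"}, "s3": {"advisor"},
    "bridge": {"hub1", "hub2"},
    "hub1": {"bridge", "a", "b", "c"},
    "hub2": {"bridge", "d", "e", "f"},
    "a": {"hub1"}, "b": {"hub1"}, "c": {"hub1"},
    "d": {"hub2"}, "e": {"hub2"}, "f": {"hub2"},
}

print(average_neighbor_degree(graph, "advisor"))  # collaborators have degree 1
print(average_neighbor_degree(graph, "bridge"))   # collaborators have degree 4
```

Note that "advisor" has more collaborators than "bridge" (three against two), yet the collaborators of "bridge" are much better connected, which is the pattern the comparison between the two authors above relies on.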

Furthermore, centrality metrics are useful tools for identifying strategic nodes in a network structure. Nevertheless, these metrics alone do not have the final word on which nodes are actually important. For instance, using these metrics alone, we identified a central author who apparently is no longer active in the community, as well as active and prolific authors who are not considered central.

*Maurício F. Magalhães*, *Paulo R. F. Cunha*, *Nelson L. S. da Fonseca* and *José Neuman de Souza* had the largest value of betweenness in different years, with the latter holding the top position since 2004. Notice that the values of closeness follow a similar behavior, which is mainly due to the arrival of new authors in the network and the emergence of new collaborations, especially after 1995. In particular, we can see that both metrics drastically increased in 2001 for the authors *José Neuman de Souza* and *Nelson L. S. da Fonseca* due to a new collaboration between them. Recall from Sect. 6 that this collaboration was responsible for merging the two largest connected components at the time. Figure 21c shows the degree evolution for the five researchers with the highest degrees in the network. It is worth noticing that four of the five researchers have few collaborations until 1995, but then experience a dramatic increase in their degrees.

## 10 Homophily and its impact

In SNA, the homophily principle states that similar nodes are more likely to connect than non-similar ones [24]. Nodes are considered similar when they share, for instance, the same gender, age, social status, religion, education, geographic location, or other types of attributes. Homophily has powerful implications in our world, limiting the information people receive, the attitudes they take, and the interactions they experience [27]. Thus, in this section we analyze homophily in the SBRC network, using the geographic location of the corresponding author as the node attribute that determines similarity, i.e., the state where the author’s institution is located. It is natural to think that researchers who are geographically closer are more likely to publish together. However, here we also show the impact of this expected geographic segregation on the spread of research information in a large country such as Brazil.

Having verified that homophily decreases over time in the SBRC network, a natural next step is to analyze whether this decrease has any impact on research. As we have seen previously, the publications are concentrated in a few states. However, a few states that were previously almost inactive have recently shown small but significant progress. For instance, the state of Pará had only two publications in the first 20 years of SBRC, in 1997 and 1998. In the last 10 years, researchers from the state of Pará published a total of nine papers in six distinct years.

In order to formalize this, we use the Gini coefficient [8, 16] to measure the inequality in the number of publications over the regions, states and universities of Brazil. The Gini coefficient was initially proposed to describe the income inequality in a population, both between and within countries [8, 16]. It has found application in the study of inequalities in several other disciplines [39], and here we apply it to measure how the publications are distributed among the states of Brazil. It assumes values from 0, which expresses perfect equality, where all values are the same, to 1, which expresses maximal inequality, where all publications are concentrated in a single state.
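A minimal sketch of how a Gini coefficient can be computed over publication counts is given below. The function `gini` and the example counts are illustrative assumptions, and the rank-based estimator used here is only one common variant; the exact estimator behind the reported figures is not specified in this section. With this variant, the maximum for \(n\) values is \((n-1)/n\), which approaches 1 as \(n\) grows.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the values sorted in ascending order.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Made-up publication counts for four hypothetical states.
print(gini([5, 5, 5, 5]))    # every state publishes the same amount
print(gini([1, 2, 3, 4]))    # moderately unequal
print(gini([0, 0, 0, 10]))   # all publications in a single state
```

For these inputs the function returns 0.0, 0.25 and 0.75, respectively, so a decreasing value over the years corresponds to publications being spread more evenly.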

In Figs. 23d–f, we show the Gini coefficient for the SBRC network computed on a yearly basis as well as over the aggregated network, considering the distribution of the publications among the geographical regions, states and universities. Like the homophily, the Gini coefficient decreases over the years, indicating that the distribution of the number of publications is becoming more equal. In fact, it decreases at practically the same rate as the homophily. The Pearson correlation coefficient between the homophily and the Gini coefficient in the SBRC network is 0.90 (aggregated) and 0.45 (yearly) among regions; 0.95 (aggregated) and 0.54 (yearly) among states; and 0.92 (aggregated) and 0.70 (yearly) among universities. This fact strongly suggests that the increase in collaborations between researchers from different backgrounds significantly contributes to diminishing the inequality in the number of publications in Brazil, indicating that the network is becoming more heterogeneous.
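For reference, the Pearson correlation between two yearly series can be computed as in the sketch below. The two lists are made-up placeholder values, not the homophily and Gini series of the SBRC network, and the function name `pearson` is an assumption for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Placeholder yearly values (hypothetical, for illustration only).
homophily_by_year = [0.90, 0.85, 0.80, 0.70]
gini_by_year = [0.80, 0.78, 0.70, 0.65]

print(round(pearson(homophily_by_year, gini_by_year), 2))
```

A value close to 1 means that the two series tend to decrease (or increase) together; by itself it does not establish which one drives the other.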

This conclusion shows the importance of inter-state and inter-country collaboration programs, such as the recently created Brazilian program “Ciência Sem Fronteiras”^{5}, and of the creation of graduate programs over the years. Such programs and other incentive mechanisms allow regions with low research activity to develop, mirroring their more productive partners.

## 11 Cross analysis

## 12 Conclusions

In this paper we analyzed the collaboration network among authors who have published in the editions of the Brazilian Symposium on Computer Networks and Distributed Systems. From this analysis, we have shown why the symposium is so relevant for the Brazilian research community and which regions have the highest number of participations. Moreover, we showed that the main kind of co-authorship is between well-established authors and newcomers to the symposium, which represents the natural kind of co-authorship between student and advisor. The most prominent communities were presented in two visualizations, one by universities and another by Brazilian states. Furthermore, we identified the researchers who have a strategic position within the collaboration network and, thus, the power to influence others. Finally, we presented some Brazilian historical aspects that may have had a great impact on the symposium’s success by allowing the collaboration of geographically distant researchers, thus strengthening the creation and establishment of new communities. As future work, it would be interesting to analyze other Brazilian symposia, such as the SBBD, SBES and SIBGRAPI. By analyzing these communities at the same level of detail as the study performed here, it would be possible to draw a bigger picture of the Computer Science research community in Brazil.

The aggregated number of authors (universities and papers) for year \(y\) is the number of unique authors (universities and papers) in all years up to \(y\). Henceforth, all aggregated results follow the same logic.

## Notes

## Declarations

### Acknowledgments

This work is partially supported by the authors’ individual grants and scholarships from CNPq, CAPES, and FAPEMIG, as well as by the Brazilian National Institute of Science and Technology for Web Research (MCT/ CNPq/ INCT Web Grant Number 573871/2008-6).

## Authors’ Affiliations

## References

- Albert R, Jeong H, Barabási AL (1999) Diameter of the world wide web. Nature 401:130–131
- Backstrom L, Huttenlocher D, Kleinberg J, Lan X (2006) Group formation in large social networks: membership, growth, and evolution. In: KDD ’06: Proceedings of the 12th ACM SIGKDD, pp 44–54
- Bayati M, Kim JH, Saberi A (2010) A sequential algorithm for generating random graphs. Algorithmica 58(4):860–910
- Bazzan A, Argenta V (2011) Network of collaboration among PC members of Brazilian computer science conferences. J Braz Comput Soc 17:133–139
- Ben-Naim E, Frauenfelder H, Toroczkai Z (2004) Complex networks. Lecture notes in physics. Springer, Berlin
- Bonacich P (1987) Power and centrality: a family of measures. Am J Sociol 95(5):1170–1182
- de Carvalho MSRM (2006) A trajetória da internet no Brasil: do surgimento das redes de computadores à instituição dos mecanismos de governança. COPPE, Federal University of Rio de Janeiro, Master’s Thesis
- Ceriani L, Verme P (2012) The origins of the Gini index: extracts from Variabilità e Mutabilità (1912) by Corrado Gini. J Econ Inequal 10(3):421–443
- Clauset A, Shalizi CR, Newman MEJ (2009) Power-law distributions in empirical data. SIAM Rev 51(4):661–703
- Costa LF, Rodrigues FA, Travieso G, Boas PRV (2007) Characterization of complex networks: a survey of measurements. Adv Phys 56:167–242
- Crandall D, Cosley D, Huttenlocher D, Kleinberg J, Suri S (2008) Feedback effects between similarity and social influence in online communities. In: Proceedings of 14th ACM SIGKDD, pp 160–168
- Du N, Faloutsos C, Wang B, Akoglu L (2009) Large human communication networks: patterns and a utility-driven generator. In: KDD ’09: Proceedings of the 15th ACM SIGKDD, pp 269–278
- Faloutsos M, Faloutsos P, Faloutsos C (1999) On power-law relationships of the internet topology. In: SIGCOMM, pp 251–262
- Fortunato S, Lancichinetti A (2009) Community detection algorithms: a comparative analysis: invited presentation, extended abstract. In: Proceedings of the fourth international ICST conference on performance evaluation methodologies and tools, VALUETOOLS ’09, pp 27:1–27:2
- Freire V, Figueiredo D (2011) Ranking in collaboration networks using a group based metric. J Braz Comput Soc 17:255–266
- Gini C (1912) Variabilità e mutabilità: contributo allo studio delle distribuzioni e delle relazioni statistiche. pt. 1. Tipogr. di P. Cuppini
- Guo Z, Zhang Z, Zhu S, Chi Y, Gong Y (2009) Knowledge discovery from citation networks. In: 2009 Ninth IEEE international conference on data mining, pp 800–805
- Hassan AE, Holt RC (2004) The small world of software reverse engineering. In: Proceedings of the 11th working conference on reverse engineering, pp 278–283
- Hidalgo CA, Rodriguez-Sickert C (2008) The dynamics of a mobile phone network. Phys A Stat Mech Appl 387(12):3017–3024
- Hill S, Nagle A (2009) Social network signatures: a framework for re-identification in networked data and experimental results. In: CASON ’09: Proceedings of the 2009 international conference on computational aspects of social networks, pp 88–97
- Jensen DD, Fast AS, Taylor BJ, Maier ME (2008) Automatic identification of quasi-experimental designs for discovering causal knowledge. In: KDD ’08: Proceeding of the 14th ACM SIGKDD, pp 372–380
- Kossinets G, Watts DJ (2006) Empirical analysis of an evolving social network. Science 311(5757):88–90
- Kumar R, Novak J, Tomkins A (2006) Structure and evolution of online social networks. In: KDD ’06: Proceedings of the 12th ACM SIGKDD, pp 611–617
- Lazarsfeld PF, Merton RK (1954) Friendship as a social process: a substantive and methodological analysis. In: Freedom and control in modern society 18(1):18–66
- Leskovec J, Kleinberg JM, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. TKDD 1(1):2
- Lewis K, Kaufman J, Gonzalez M, Wimmer A, Christakis N (2008) Tastes, ties, and time: a new social network dataset using Facebook.com. Soc Netw 30(4):330–342
- McPherson M, Lovin LS, Cook JM (2001) Birds of a feather: homophily in social networks. Ann Rev Sociol 27(1):415–444
- Vaz de Melo POS, Almeida VAF, Loureiro AAF (2008) Can complex network metrics predict the behavior of NBA teams? In: KDD ’08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 695–703
- Menezes GV, Ziviani N, Laender AH, Almeida V (2009) A geographical analysis of knowledge production in computer science. In: Proceedings of the 18th international conference on world wide web, WWW ’09, pp 1041–1050
- Nascimento MA, Sander J, Pound J (2003) Analysis of SIGMOD’s co-authorship graph. SIGMOD Rec 32(3):8–10
- Newman MEJ (2001) Scientific collaboration networks. I. Network construction and fundamental results. Phys Rev E 64(1):016131+
- Newman MEJ (2001) Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys Rev E 64(1):7
- Newman MEJ (2001) The structure of scientific collaboration networks. Proc Natl Acad Sci USA 98(2):404–409
- Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256
- Newman MEJ (2004) Coauthorship networks and patterns of scientific collaboration. In: Proceedings of the national academy of sciences, pp 5200–5205
- Newman MEJ (2011) Complex systems: a survey. Am J Phys 79(8):800–809
- Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043):814–818
- Procópio PS, Laender AHF, Moro MM (2011) Análise da rede de coautoria do simpósio brasileiro de bancos de dados. In: Simpósio Brasileiro de Banco de Dados, pp 050-1–050-8
- Sadras V, Bongiovanni R (2004) Use of Lorenz curves and Gini coefficients to assess yield inequality within paddocks. Field Crops Res 90:303–310
- Seshadri M, Machiraju S, Sridharan A, Bolot J, Faloutsos C, Leskovec J (2008) Mobile call graphs: beyond power-law and lognormal distributions. In: KDD ’08: Proceeding of the 14th ACM SIGKDD, pp 596–604
- Silva TH, Celes CSFS, Mota VFS, Loureiro AAF (2012) Overview of ubicomp research based on scientific publications. In: Proceedings of IV Simpósio Brasileiro de Computação Ubíqua e Pervasiva, SBCUP 2012
- Watts DJ (2004) Six degrees: the science of a connected age. W. W. Norton & Company, New York
- Watts DJ, Dodds PS, Newman MEJ (2002) Identity and search in social networks. Science 296(1):1302–1305
- Watts DJ, Strogatz SH (1998) Collective dynamics of “small-world” networks. Nature 393:440–442