 Research
 Open Access
Rational Erdös number and maximum flow as measurement models for scientific social network analysis
 Victor Ströele^{1},
 Renato Crivano^{2},
 Geraldo Zimbrão^{2},
 Jano M. Souza^{2},
 Fernanda Campos^{1},
 José Maria N. David^{1} and
 Regina Braga^{1}
https://doi.org/10.1186/s13173-018-0070-6
© The Author(s). 2018
 Received: 23 October 2017
 Accepted: 11 June 2018
 Published: 4 July 2018
Abstract
In social network analysis, detecting communities, i.e., groups of people with common interests, is a classical problem. Moreover, people in a community can influence one another, i.e., they can spread information among themselves. In this paper, two models based on information diffusion strategies are proposed and used to build a scientific social network in which communities are identified. The maximum flow-based and the Erdös number-based models are proposed as measures to weigh all the relationships between elements. A clustering algorithm (k-medoids) was used to identify communities of closely connected people in order to evaluate the proposed models in a scientific social network. A detailed analysis of the obtained scientific communities was conducted to compare the structure of the groups formed and to demonstrate the feasibility of the solution. The results demonstrate the viability and effectiveness of the proposed solution, showing that information reaches elements that are not directly related to the element that produces it.
Keywords
 Scientific social network analysis
 Information diffusion
 Measurement models
 Clustering algorithm
Introduction
Web evolution and the increasing availability of data allow researchers to study the ways in which connections between people are established, and how they evolve over time. Social networks emerged to represent people and their relationships, and thereafter, many efforts have been made to analyze these networks, contributing to a better understanding of the social structures [45]. A relationship is defined as a specific contact or connection type between pairs of actors. Relationships may be direct, when an actor provides information and the other receives it directly, or indirect, when information reaches its destination through intermediate actors.
Since actors are directly or indirectly related, social networks play a key role in information diffusion, increasing the spread of new information and different points of view [2]. Facebook, for example, was very powerful in the Arab Spring in 2010 [20] and Twitter in the US presidential elections in 2008 [22]. More recently, these social networks were also powerful in the 2013 Brazilian protests [5, 46]. All these events are closely linked to social networking platforms, which helped disseminate information at specific moments.
The models proposed here for scientific social network analysis include information diffusion strategies that operate through indirect relationships among actors [17, 27]. They are motivated by (i) work on information diffusion in social networks [17]; (ii) social network analysis studies [16, 50] showing that even two indirectly related people may influence each other; and (iii) previous studies from our research group on scientific social networks [40, 41].
This paper aims to describe models capable of exploiting information diffusion in scientific social networks, i.e., how information propagates through people who have different potentials for spreading information (different tie weights). Our first objective is to build a scientific social network that supports studies of information diffusion in the network itself, under the assumption that all elements can somehow influence one another. Influence is related to an individual’s ability to affect other people in a social network community. In general, in a scientific context, influential researchers tend to create or strengthen ties, as they can propagate their knowledge, reaching a larger number of people. Our second objective is to apply the proposed models to this scientific social network and to identify scientific communities using the k-medoids clustering algorithm.
As contributions, the authors can highlight the development of two evaluation measures to analyze how information diffusion occurs between pairs of researchers, namely maximum flow and weighted resistance distance. These measures consider that information travels throughout all possible paths between two researchers, like cascade analysis in information diffusion studies [27]. As a result, the influence that researchers have over each other is calculated.
Another contribution is the modeling of a scientific social network using those two evaluation measures and the use of a clustering algorithm to find scientific communities. The resulting groups are compared in order to validate the weighted resistance distance and to evaluate which measure produces the most homogeneous groups. In homogeneous groups, people of similar scientific interest are engaged. In this paper, maximum flow and weighted resistance distance are used as similarity measures.
Besides this introduction, the paper has four sections. The “Scientific social networks and information diffusion” section presents the background. The “Measurement models to define tie weight” section discusses the proposed measurement models. The “Case study” section introduces a case study and analyzes the feasibility of the proposed models. The “Final remarks” section presents the conclusions and future work.
Scientific social networks and information diffusion
The growth of social networks is due to the Web evolution, and researchers are dealing with its challenges. Some works include mining newsgroups [50], predicting the popularity of links [35, 48, 52] and videos [8], and discovering useful information and patterns from data streams in sensor networks [25].
In addition, structure analysis of social networks helps in identifying critical regions and people [42]. Identifying them is a complex but essential task, as these elements are responsible for collaboration among all other peers in the network, and they are potential elements in information diffusion. Some studies have been conducted to gather information from social networks since appropriate aggregation of multiple social networks could offer a better opportunity for deep user understanding [38, 39].
Scientific social networks are specific types of social networks where two scientists are considered connected if they have coauthored a paper [23, 32]. In the real world, the nodes of a social network tend to be tightly connected, forming groups of people who work together, named communities. This effect also occurs in scientific social networks where researchers who have high concentration of common edges define scientific communities [33, 34]. Recovering these communities has been a challenging task, and many studies on social network analysis have been developed in order to identify them [48, 50].
In general, social network analysis involves the definition of a model to represent the relationship weights among researchers. According to [2], the higher the edge width/weight which defines the relationship between two individuals, the higher the influence that they exert on each other. Studies have shown that information spreads more easily between elements that have tighter relationships. However, looser relationships also play a key role in information diffusion [16, 24].
The behavior of network elements, as captured by the network topology, has been analyzed with several objectives. Some studies modeled developers’ networks to explore the coordination performance of open-source software (OSS) [19] or the behavior of developers in OSS communities [21]. Both apply information diffusion measures to help understand those complex networks. This kind of analysis has also been carried out in the health domain, for instance, to identify influential members [37] or communities [11].
This study differs from others in that it (i) identifies and quantifies relationships and uses them in a clustering algorithm to identify scientific communities, (ii) takes information diffusion concepts into account, and (iii) defines relationship weights in scientific social networks in terms of direct and indirect influence, as well as tight and loose connections.
Measurement models to define tie weight
In some cases, social network analysis is carried out by a quantitative evaluation that indicates the link weight between related elements [16, 24, 44]. Based on such quantitative analysis, the weight of links between authors is defined, allowing not only the use of other metrics (degree, closeness, betweenness, PageRank, etc.) [43] but also the discovery of communities [50, 32, 14, 34, 49].
However, quantitative analysis alone is not enough to evaluate the effect that indirectly related elements produce on each other. Many studies are being developed in order to evaluate information diffusion [2, 17]. As mentioned above, in these studies, all the elements of the social network have some influence on information diffusion.
Previous studies have shown that there is influence between peer researchers even though they are not directly related [41]. This indirect influence can be calculated in different ways. In this study, the maximum flow-based model and the resistance distance-based model were compared to evaluate the communication potential between nodes in a scientific social network.
Maximum flow-based model
The modeling process of the scientific social network was divided into three steps as follows: number of common relationships between researchers, age of the relationships that link these researchers, and loss of knowledge when the relationship between researchers is indirect [41].
Common relationships between researchers
where TR(α) is a matrix with all relationship values of each researcher pair.
Relationship’s age
Content loss in long relationships
Formally, the network maximum flow problem can be interpreted as a graph flow problem. The social graph with flow is represented by G = (X, U, f), in which f is a vector of dimension m + 1 and can be written in the form f = (f_{0}, f_{1}, …, f_{m}).
Vector f is the flow in graph G, and each of its components indicates the value of the flow between the elements of G. The social graph is represented in this study by a non-oriented (undirected) graph, so the maximum flow from x_{i} to x_{j} is equal to the maximum flow from x_{j} to x_{i}, for all x_{i}, x_{j} ∈ X.
Let G be the social graph represented by SM defined in Eq. (5), where X is the set of researchers, U is the set of relationships between these researchers, and f is the set of maximum flows between each pair of researchers. Thus, for any x_{i}, x_{j} ∈ X, the maximum flow between these two researchers will be equal to \( {f}_w={\sum}_{v:\left(i,v\right)\in U}{\overline{f}}_{iv} \), where i is the source node, v ∈ X\{i, j}, \( {\overline{f}}_{iv} \) represents the amount of flow passing from source i to node v and 0 ≤ w ≤ m.
The maximum flow was calculated using the Edmonds-Karp algorithm [13], a variation of the Ford-Fulkerson maximum flow algorithm [15]. The main difference between the two approaches is that the Edmonds-Karp algorithm always augments the flow along a shortest path (in number of edges) between the two elements. This guarantees that the algorithm terminates after a finite number of iterations, even for non-oriented graphs.
As the Edmonds-Karp algorithm considers the shortest paths, the algorithm was adapted so that the maximum flow is penalized by a percentage that depends on the path length.
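The following is a minimal sketch of Edmonds-Karp on an undirected weighted graph, written only to illustrate the idea described above. The function name, the dictionary-based graph format, and in particular the `decay` parameter are assumptions: the text says the flow is penalized by a percentage depending on path length but does not give the rule, so a per-hop multiplicative factor is used here as a hypothetical stand-in. With `decay=1.0` the function returns the ordinary maximum flow.

```python
from collections import deque

def edmonds_karp(capacity, source, sink, decay=1.0):
    """Maximum flow between source and sink on an undirected weighted graph.

    capacity: dict mapping node -> {neighbor: capacity}; must be symmetric.
    decay: hypothetical per-hop penalty in (0, 1]; 1.0 gives plain Edmonds-Karp.
    """
    if source == sink:
        return 0.0
    # Residual capacities start as a copy of the (symmetric) capacities.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    total = 0.0
    while True:
        # Breadth-first search finds a shortest augmenting path (fewest edges).
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Walk back from the sink to recover the path and its bottleneck.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] = residual[v].get(u, 0) + bottleneck
        # Assumed penalty: every hop beyond the first scales the contribution.
        total += bottleneck * decay ** (len(path) - 1)
```

Because the graph is undirected, each edge is stored in both directions with the same capacity, so the value obtained from x_{i} to x_{j} equals the one from x_{j} to x_{i} when no penalty is applied.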
Erdös number-based model
Michael Barr proposed the rational Erdös number model (RENM) as a distance measure [31]. The idea was to treat a social network of researchers who have published together as an electric circuit of resistances. Each person is assigned a rational number representing the total resistance from that node to the center of the network, in this case the famous mathematician Paul Erdös. If two researchers coauthored one paper, there would be a 1 Ω resistor between them. If they had two coauthored papers, there would be two 1 Ω resistors connected in parallel, which, by the laws of electricity, are equivalent to a 0.5 Ω resistance. In Michael Barr’s proposal, a paper written by more than two authors is represented by a new node in the graph, with N/4 Ω resistors connecting that node to each of the N authors involved.
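As a small worked illustration of the resistor rules just described, the sketch below uses exact rational arithmetic. The function names are hypothetical; the only facts taken from the text are the 1 Ω resistor per two-author paper, the parallel combination, and the N/4 Ω star resistors for papers with N authors.

```python
from fractions import Fraction

def coauthor_resistance(num_joint_papers):
    """Equivalent resistance of k parallel 1-ohm resistors (two-author papers only)."""
    if num_joint_papers < 1:
        raise ValueError("authors with no joint paper are not directly connected")
    # Conductances of parallel resistors add up: k resistors of 1 ohm give k siemens.
    conductance = Fraction(num_joint_papers)
    return 1 / conductance

def star_resistor(num_authors):
    """Resistance between the paper node and each author for an N-author paper."""
    return Fraction(num_authors, 4)
```

For two authors, `star_resistor(2)` is 1/2 Ω, and two such resistors in series through the paper node give 1 Ω, which agrees with the direct 1 Ω resistor used for a two-author paper.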
The RENM was intended as a measure of distance to Paul Erdös, who coauthored many articles and is somehow related to most of the mathematical community. But the idea of representing a social network as an electric circuit of resistances can be applied in other contexts, and it is possible to calculate not only distances to a center but also distances between each pair of people. Once the distance between each pair of nodes of a graph is known, an algorithm can be used to identify groups of tightly connected nodes and the elements that have the shortest distance to all other elements of their groups, i.e., the medoids.
Weighted resistance distance
To compute the resistance distance, there is a method named the determinantal formula, which was proposed by Bapat et al. [4].
Unfortunately, our preliminary studies showed that if all resistances are equal, no good groups are identified. Thus, it is necessary to find an appropriate mathematical method to solve the circuit and obtain the effective resistance between each pair of authors.
The standard method to calculate the resistance between two points in an electrical network is to solve all the equations provided by Kirchhoff’s first and second laws together with Ohm’s law [30]. Although feasible, this would lead to very complex and tedious calculations. The formula for the determinant of the resistance matrix was derived by Bapat [3] and is shown to reduce to the formula obtained by Xiao and Gutman in the unweighted case [47]. Consequently, it was possible to calculate the effective resistance even if the circuit had resistors of different values.
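A minimal sketch of the determinantal formula is given below, assuming its usual statement: the resistance between nodes i and j of a connected weighted graph equals det L(i, j) / det L(i), where L is the weighted Laplacian of conductances, L(i) is L with row and column i removed, and L(i, j) is L with rows and columns i and j removed. The helper names and the edge-dictionary input format are assumptions, and exact fractions are used so that the results stay rational.

```python
from fractions import Fraction

def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)  # the determinant of the empty matrix is 1
    for col in range(n):
        # Find a row with a non-zero pivot in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def laplacian(n, conductances):
    """Weighted Laplacian from {(i, j): conductance} on nodes 0..n-1."""
    lap = [[Fraction(0)] * n for _ in range(n)]
    for (i, j), g in conductances.items():
        g = Fraction(g)
        lap[i][i] += g
        lap[j][j] += g
        lap[i][j] -= g
        lap[j][i] -= g
    return lap

def minor(matrix, removed):
    """Submatrix with the rows and columns listed in `removed` deleted."""
    keep = [k for k in range(len(matrix)) if k not in removed]
    return [[matrix[r][c] for c in keep] for r in keep]

def resistance_distance(lap, i, j):
    """Effective resistance between i and j: det(L(i,j)) / det(L(i))."""
    if i == j:
        return Fraction(0)
    return determinant(minor(lap, {i, j})) / determinant(minor(lap, {i}))
```

The sketch assumes a connected graph; otherwise det L(i) is zero and the division fails. Conductances are the reciprocals of resistances, so two 1 Ω resistors in parallel enter the Laplacian as a single edge of conductance 2.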
Multiple terminals delta-star transformation
According to Michael Barr’s original proposal, new nodes should be created to represent publications involving more than two people. Even though it is possible, the creation of more nodes would imply a bigger matrix, increasing the processing time of the algorithm that calculates the resistance distances. Therefore, this study worked with an alternative approach, which produces the same result, but does not raise the number of nodes.
The approach consists of a generalization of the well-known delta-star transformation on resistive circuits [28], and it was only possible because, in this case, all resistances between the center point (publication) and the terminals (authors) are equal. So, when more than two people coauthored a publication, the star-like array of N/4 Ω resistors was replaced by a network of equal-valued resistors connecting each author to every other author, just like in a complete graph.
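The replacement value is not stated in the text, so the sketch below derives it from the standard star-mesh rule, under which the conductance between terminals i and j is g_i g_j divided by the sum of all star conductances. With N equal star resistors of N/4 Ω this gives N²/4 Ω between every pair of authors. The function name and the return format are assumptions.

```python
from fractions import Fraction
from itertools import combinations

def star_to_mesh(authors):
    """Replace a star of N/4-ohm resistors by pairwise resistors between authors."""
    n = len(authors)
    star_conductance = 1 / Fraction(n, 4)   # each author-to-paper resistor is N/4 ohm
    total = n * star_conductance            # sum of all star conductances (always 4)
    # Star-mesh rule: g_ij = g_i * g_j / sum_k g_k; identical for every pair here.
    pair_conductance = star_conductance * star_conductance / total
    return {(a, b): 1 / pair_conductance for a, b in combinations(authors, 2)}
```

For two authors the result is a single 1 Ω resistor, consistent with the two-author rule, and for three authors each pair is joined by a 9/4 Ω resistor.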
The model in action
The weighted Laplacian matrix of conductance for this RENM network can be written as follows:
and by applying the determinantal formula to compute the resistances between each two authors, the following values can be found:
Case study
The case study was conducted in order to evaluate the measurement models proposed to define tie weight. The scope of the evaluation, based on the GQM method [29], was described as follows: “To analyze the scientific communities generated by clustering techniques and/or information diffusion in social networks for the purpose of evaluating scientific communities homogeneity and compare the two proposed approaches in relation to information diffusion from the point of view of the researchers in the context of scientific communities obtained based on researchers’ information diffusion potential in a scientific social network.”

How are scientific communities organized considering individual influences and measurement models that quantify information diffusion among researchers from a scientific social network?

RQ1: Which information diffusion approach produces more homogeneous scientific communities?

RQ2: Does the use of cluster analysis retrieve real scientific communities considering the activities developed by the researchers?

RQ3: Are researchers from the same scientific community connected through direct or indirect relationships?
In view of the above research questions, the case study was a suitable choice as a research method, considering that a contemporary phenomenon was evaluated in its “real-world context,” according to Yin [51].
Dataset and case study process
Data for the construction of the scientific social network were selected from DBLP,^{1} one of the databases commonly used in scientific studies on social networks [10, 50]. For this case study, the data of five out of eight high-quality Brazilian institutions were extracted (COPPE/UFRJ, PUCRJ, UFPE, UFRGS, and UFMG). Altogether, 169 researchers from the area of Computer Science were analyzed.
The case study process comprised the following steps:
 (1) Data extraction from DBLP and construction of the social graph, where nodes represent researchers and edges represent their coauthoring relationships. This social graph was used by both proposed models and to compare them.
 (2) After construction of this social graph, the relationship weight (tie weight) between each pair of researchers was defined. The weight was defined by the two measurement models previously described.
 (3) As a result of the application of these models, two matrices representing the communication potential between pairs of researchers were obtained.
 (4) In order to identify the scientific communities, the k-medoids clustering algorithm was applied to each of these matrices.
 (5) The obtained measurements were compared based on the quality of the clusters generated by each of them.
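The first step of the process can be sketched as follows. This is only an illustration under assumed inputs: each publication is taken to be a plain list of author names, which is a hypothetical format and not the DBLP record structure, and the edge value is simply the number of joint publications rather than the tie weight defined by either measurement model.

```python
from collections import defaultdict
from itertools import combinations

def build_coauthor_graph(publications):
    """Map each researcher to {coauthor: number of joint publications}."""
    graph = defaultdict(lambda: defaultdict(int))
    for authors in publications:
        # Every unordered pair of distinct authors on a paper shares one more publication.
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {node: dict(nbrs) for node, nbrs in graph.items()}
```

Single-author publications add no edge, so a researcher with no coauthors does not appear in the returned dictionary.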
k-medoids
The clustering algorithm developed aims to group researchers who have the greatest potential of communication with each other. It was based on the k-medoids algorithm [18].
As in the k-medoids algorithm, the proposed algorithm first selects k elements at random as the initial medoids. In the second step, each element of the data set is associated with the group (medoid) with which this element has the greatest potential for communication. In the third step, the medoids of each group are set once again, and the elements are regrouped.
The definition of medoids is based on the internal communication of each researcher group. For each researcher, the weights of the relationships with the other members of the group are added, and the researcher who has the largest sum is considered the medoid of the group. After defining the new medoids, the algorithm returns to the second step until there are no changes in the structure of the groups.
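The three steps above can be sketched as follows. This is a hedged illustration and not the authors' implementation: the function name, the `seed` and `max_iter` parameters, and the input format (a symmetric matrix in which larger values mean greater communication potential) are assumptions. For a distance such as the resistance distance, where smaller means closer, the matrix would need to be negated or inverted first.

```python
import random

def k_medoids(similarity, k, seed=0, max_iter=100):
    """Cluster items 0..n-1 given a symmetric similarity matrix (higher = closer)."""
    n = len(similarity)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)   # step 1: random initial medoids
    for _ in range(max_iter):
        # Step 2: attach each element to the medoid it communicates with best.
        groups = {m: [m] for m in medoids}
        for x in range(n):
            if x not in groups:
                best = max(medoids, key=lambda m: similarity[x][m])
                groups[best].append(x)
        # Step 3: the new medoid of a group has the largest internal weight sum.
        new_medoids = [
            max(members, key=lambda c: sum(similarity[c][o] for o in members if o != c))
            for members in groups.values()
        ]
        if set(new_medoids) == set(medoids):
            return groups
        medoids = new_medoids
    return groups
```

The returned dictionary maps each medoid to the members of its group. Ties are broken by list order, and `max_iter` only guards against the rare case in which the medoids keep alternating.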
One of the biggest difficulties of some clustering techniques, including k-medoids, is in defining the ideal number of groups. Cluster analysis aims to identify homogeneous groups so that the sum of differences within the group (intragroup) is minimized, and the sum of differences among groups (intergroup) is maximized [1]. The groups are validated by evaluating which set of groups has the best grouping structure.
There are several techniques that can be used to assist in defining this number, such as PBM index, intragroup distance, and intergroup distance [6, 36].
Another inherent difficulty in the k-medoids algorithm is the initial setting of medoids because the result of the clustering process depends on the initial selection of these elements. Therefore, to select the best grouping, it is necessary to define the number of groups and the best set of elements that compose the initial medoids. In the next section, the details for setting these parameters and the results of the clustering algorithm for the two proposed measurement models will be presented.
Experiments and results analysis
In this section, many experiments were conducted to analyze, compare, and evaluate the effectiveness of the two proposed models and answer the research questions.
Number of groups
To answer RQ1, it was necessary to set the number of groups and evaluate the best clustering, so the k-medoids algorithm was applied to the data generated by the two proposed measurement models. The value of k was evaluated in the range between 20 and 80 groups, and the values of seeds for random definition of medoids ranged from 1 to 199.
At this stage, the internal costs of the groups were analyzed for each seed. However, the sum of the internal maximum flow and the sum of the internal resistance (models proposed in the “Measurement models to define tie weight” section) produce values quantitatively and semantically different. Thus, it was necessary to define a criterion to calculate the internal distance of the groups so that the two methods could be compared.
As illustrated in Fig. 9, both models are based on a social graph where the relationship weight is defined by the scientific work that researchers have done together. Since this graph is the same for both models, the intragroup distance (Eq. (10)) will be calculated based on it. It is worth noting that the obtained groups may contain elements that are not directly related to the medoid, i.e., elements that relate to it only indirectly. Thus, the distance between an element and the related medoid was calculated based on Dijkstra’s algorithm [12], which calculates the shortest path between nodes in a graph.
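A possible sketch of this calculation is shown below. Since Eq. (10) is not reproduced here, the aggregation is an assumption: the group distance is taken as the average shortest-path distance from the other members to the medoid. The edge lengths are also assumed to be given as a symmetric dictionary of non-negative values; how they are derived from the relationship weights is not specified in this sketch.

```python
import heapq

def dijkstra(lengths, source):
    """Shortest-path distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in lengths.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def mean_distance_to_medoid(lengths, medoid, members):
    """Average shortest-path distance from the other group members to the medoid."""
    dist = dijkstra(lengths, medoid)
    others = [m for m in members if m != medoid]
    if not others:
        return 0.0
    return sum(dist[m] for m in others) / len(others)
```

A member that cannot be reached from the medoid raises a `KeyError` in this sketch, which makes disconnected groups visible instead of silently ignoring them.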
Average intragroup distances
To evaluate the behavior of the two proposed models, a detailed analysis was performed focusing on changes in the intragroup distances. Considering the boxplot graph, it is worth stressing the following three points: the middle point of the data (the median) and the middle points of its two halves. These three points divide the entire data set into quarters and are named “quartiles” (Q1, Q2, and Q3). The distance between the first (Q1) and third (Q3) quartiles is a simple dispersion measure that gives the range containing the central half of the data. This distance is named interquartile range (IQR), represented by the box on the chart. IQR can be used as a measure of how the values are spread out. If IQR is small, data dispersion is lower, i.e., the data are more homogeneous.
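The quartiles and the IQR described above can be computed as in the sketch below, which follows the "median of each half" description literally. One detail is an assumption: when the number of values is odd, the median itself is left out of both halves. Other quartile conventions exist and give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[len(data) - half:]
    return median(lower), median(data), median(upper)

def interquartile_range(values):
    """Width of the box in a boxplot: Q3 minus Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1
```

At least two values are required, since each half must contain at least one element.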
The maximum flow model has its interquartile range (variation around the median) always smaller than that of the Erdös model. This is another indication that the maximum flow model produces more homogeneous groups than the Erdös model does, regardless of the number of clusters.
Group elements
According to Fig. 14, there is a tendency that larger groups produced by the maximum flow model have fewer elements than the larger groups produced by the Erdös model. Moreover, the maximum flow model tends to increase the smaller groups, while the Erdös model produces a greater number of groups with fewer elements. While the Erdös model has the lowest average intragroup distances, as shown in Fig. 10b, this model produces many groups with few elements. The maximum flow model tends to produce groups with a better distribution of elements.
An important feature of the measurement models is the possibility of considering all the alternative paths between two elements to define their relationship weight. This feature answers RQ3, as can be seen in Fig. 15, where elements 50392 and 373964 are in group 46 due to their strong relationship with medoid 73438, and the relationship among them is defined by intermediary elements.
After the case study, it can be stated that both approaches produce good clusters, but the maximum flow-based approach produced more homogeneous groups with better distribution of the researchers among them. Moreover, the results showed that researchers participating in the same community could be indirectly related.
Final remarks
All elements in a social network have some influence on information diffusion. In addition, information reaches elements that are not directly related to the element that produces it. Therefore, this paper proposed two models (maximum flow-based model and Erdös number-based model) capable of measuring the influence that elements of a social network have on each other, even if they are not directly related.
The Erdös number-based model is a new approach for calculating resistance distance, which allows weights to be applied to resistance, considerably increasing the applicability of resistance distances in real-world applications. The maximum flow-based model is another way of calculating the relationship weight, which considers the maximum amount of information to be transmitted between two elements of the social network.
Both models were applied to scientific social networks built using the DBLP database, and a clustering algorithm (k-medoids) was used to identify scientific communities. These communities are composed of researchers who have great potential for communication among themselves. The results allowed a detailed analysis of the behavior of these models when identifying scientific communities, in order to assess whether the Erdös model also provides a good measure of relationship weight compared with the maximum flow model. This analysis shows that the results were satisfactory for both models.
By means of a case study, both the analyses and the obtained results confirmed the effectiveness of our approach using weighted resistance distance calculation in scientific social network analysis. However, additional experiments are needed so as to carry out a qualitative analysis.
As future work, the authors intend to improve the Erdös number-based model so that the elements connected by intermediate nodes can reduce their weight in the relationship, as proposed by the maximum flow-based model. Thus, the distribution of elements in the groups can be enhanced, producing a smaller number of unit groups.
In this study, we considered only the Laplacian matrix for weighted graphs in the definition of the RENM. However, the spectral graph theory studies the structural properties derived from the matrices that represent graphs. The latter lead to the spectral properties of the representation matrices, which are the central element of the spectral theory of graphs. In this sense, as the spectral graph theory deals with matrices of weighted graphs similar to Laplacian, as future work, we intend to improve the RENM by considering other matrix representations.
Moreover, considering the complexity of setting the parameters of the k-medoids algorithm, the authors intend to explore other clustering algorithms that do not require these settings, such as density-based clustering algorithms.
Declarations
Availability of data and materials
The dataset supporting the conclusions of this article is not available in an online repository, but all data used can be shared on request via email to victor.stroele@ice.ufjf.br.
Authors’ contributions
VS contributed to the data extraction, model development, results analysis, and paper writing. RC contributed to the model development, results analysis, and paper writing. GZ contributed to the students’ guideline, model development, and results analysis. JMS contributed to the students’ guideline and results analysis. FC, JMND, and RB contributed to the students’ guideline, results analysis, and paper writing. All authors read and approved the final manuscript.
Authors’ information
Victor Ströele: BS degree in Computer Science from Federal University of Juiz de Fora (2005), master’s degree (2007) and Ph.D. (2012) in Systems Engineering and Computer Science Program from Federal University of Rio de Janeiro. He is currently an Associate Professor II at Federal University of Juiz de Fora. He has experience in Computer Science, with emphasis on Data Mining and Complex Network, working mainly on the following topics: Clustering Algorithm, Social Network Analysis, Recommender Systems, and Informatics in Education.
Renato Crivano: Founding partner and hands-on CTO; Renato designed most of the technological solutions developed by the company and recruited and led the team of programmers. With the other partners, he participated in all the strategic decisions and various investment fundraising processes. Always interested in cutting-edge computer technologies, he also has a master’s degree in Database from the Systems Engineering and Computer Science Program/COPPE/UFRJ.
Geraldo Zimbrão: BS degree in Computer Science from Federal University of Rio de Janeiro (1993), master’s degree in Applied Mathematics from Federal University of Rio de Janeiro (1993), Ph.D. in Systems Engineering and Computer Science from Federal University of Rio de Janeiro (1999), postdoctoral degree from FernUniversitat (2005). He is currently a professor at Federal University of Rio de Janeiro and has a fellowship from CEDERJ (Center for Distance Higher Education of Rio de Janeiro). He has experience in the area of Computer Science, with emphasis on Computer Systems, focused mainly on the following topics: Spatial Join and Spatial Databases.
Jano M. Souza: BS degree in Mechanical Engineering from Federal University of Rio de Janeiro (1974), master’s degree in Computer Science from COPPE-Federal University of Rio de Janeiro (1978), and Ph.D. in Information Systems from the University of East Anglia (1986). Sabbatical at CERN from 1989 to 1993 (3 months a year). Researches and teaches Computer Science, focusing on the following topics: Databases, Knowledge Management, Social Networks, CSCW, Autonomic Computing, and Negotiation Support Systems.
Fernanda Campos: BS degree in Mathematics from Federal University of Juiz de Fora (1978). Master’s degree in Systems Engineering and Computer Science from Federal University of Rio de Janeiro (1994). Ph.D. in Systems Engineering and Computer Science from Federal University of Rio de Janeiro (1999). She is currently a senior professor at the Federal University of Juiz de Fora, working at the Computer Science undergraduate program, and is a permanent member of the Masters Program in Computer Science. She has experience in Computer Science, with emphasis on software engineering, specifically in the following topics: eScience, eLearning, Recommender Systems, and Ecosystems.
José Maria N. David: BS degree in Electrical Engineering from Military Institute of Engineering, IME, (1983), master’s degree (M.Sc.) in Computer Science from Federal University of Rio de Janeiro (1991), and doctoral degree (D.Sc) in Computer Science from Federal University of Rio de Janeiro (2004). He is currently an associate professor at the Federal University of Juiz de Fora, working at the Computer Science undergraduate program, and is a permanent member of the Masters Program in Computer Science. He has experience in Computer Science, focusing on Software Engineering, acting on the following topics: Groupware, CSCW, CSCL, Software Ecosystems, and Middleware.
Regina Braga: BS degree in Computer Science from Federal University of Juiz de Fora (1991). Master’s degree in Systems Engineering and Computer Science from Federal University of Rio de Janeiro (1995). Ph.D. in Systems Engineering and Computer Science from Federal University of Rio de Janeiro (2000). She is currently an associate professor at the Federal University of Juiz de Fora, working at the Computer Science undergraduate program, and is a permanent member of the Masters Program in Computer Science. She has experience in computer science, with emphasis on software engineering and databases, specifically on the following topics: Software Reuse, Ontologies, Data Integration, Ecosystems, and Scientific Workflows.
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 Aldenderfer MS, Blashfield RK (1984) Cluster analysis. Sage, Beverly Hills. https://doi.org/10.4135/9781412983648
 Bakshy E, Rosenn I, Marlow C, Adamic L (2012) The role of social networks in information diffusion. In: Proceedings of the 21st international conference on World Wide Web SE—WWW ‘12, pp 519–528
 Bapat RB (2004) Resistance matrix of a weighted graph. Communications in Mathematical and in Computer Chemistry/MATCH 50:73–82
 Bapat RB, Gutman I, Xiao W (2003) A simple method for computing resistance distance. J Phys Sci 58:494–498
 Bastos MT, Recuero RDC, Zago GDS (2014) Taking tweets to the streets: a spatial analysis of the vinegar protests in Brazil. First Monday. https://doi.org/10.5210/fm.v19i3.5227
 Bezdek JC, Pal NR (1998) Some new indexes of cluster validity. IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics) 28:301–315. https://doi.org/10.1109/3477.678624
 Cayley A (1889) A theorem on trees. Quart J Pure Appl Math 23:376–378
 Chen J, Song X, Nie L et al (2016) Micro tells macro. In: Proceedings of the 2016 ACM on Multimedia Conference—MM ‘16. ACM Press, New York, New York, USA, pp 898–907
 Collected Mathematical Papers Vol. 13, Cambridge University Press 1897, 6–28.
 Cumpsty NA (2009) Some Lessons Learned. In: Volume 7: Turbomachinery, parts A and B. ASME, Lyon, France, pp 785–794
 Dias A, Chomutare T, Botsis T (2012) Exploring the community structure of a diabetes forum. Studies in Health Technology and Informatics 180:833–837. https://doi.org/10.3233/9781614991014833
 Dijkstra EW (1959) A note on two problems in connexion with graphs. Numer Math 1:269–271. https://doi.org/10.1007/BF01386390
 Edmonds J, Karp RM (1972) Theoretical improvements in algorithmic efficiency for network flow problems. J ACM 19:248–264. https://doi.org/10.1145/321694.321699
 Evans TS, Lambiotte R, Panzarasa P (2011) Community structure and patterns of scientific collaboration in business and management. Scientometrics 89:381–396. https://doi.org/10.1007/s1119201104391
 Ford LR, Fulkerson DR (1956) Maximum flow through a network. Can J Math 8:399–404. https://doi.org/10.4153/CJM19560455
 Grabowicz P, Ramasco J, Moro E et al (2012) Social features of online networks: the strength of intermediary ties in online social media. PLoS One 7:e29358. https://doi.org/10.1371/journal.pone.0029358
 Guille A (2013) Information diffusion in online social networks. Proceedings of the 2013 Sigmod/PODS PhD symposium on PhD symposium 1:31–36. https://doi.org/10.1145/2483574.2483575
 Han J, Kamber M (2006) Data mining: concepts and techniques. Morgan Kaufmann Publishers, USA
 Hossain L, Zhu D (2009) Social networks and coordination performance of distributed software development teams. J High Technol Management Res 20:52–61. https://doi.org/10.1016/j.hitech.2009.02.007
 Howard PN, Duffy A, Freelon D et al (2011) Opening closed regimes: what was the role of social media during the Arab Spring? Project on Information Technology and Political Islam:1–30. https://doi.org/10.1007/s1339801401737.2
 Huang Y, He P, Li B (2012) IEEE Xplore—Applying centrality measures to the behavior analysis of developers in open source software Communit... In: cloud and green computing (CGC), 2012 second international conference on. IEEE pp 418–423
 Hughes AL, Palen L (2009) Twitter adoption and use in mass convergence and emergency events. Int J Emerg Manag 6:248. https://doi.org/10.1504/IJEM.2009.031564
 Ichise R, Takeda H, Ueyama K (2005) Community mining tool using bibliography data. In: Proceedings of the International Conference on Information Visualisation IEEE, pp 953–960
 Jones JJ, Settle JE, Bond RM et al (2013) Inferring tie strength from online directed behavior. PLoS One 8:e52168. https://doi.org/10.1371/journal.pone.0052168
 Jung JJ (2010) Integrating social networks for context fusion in mobile service platforms. J Universal Computer Sci 16:2099–2110
 Kaufman L, Rousseeuw PJ (2005) Finding groups in data: an introduction to cluster analysis. Wiley-Interscience 33:368. https://doi.org/10.1007/s001340060431z
 Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining—KDD ‘03, p 137. https://doi.org/10.1145/956755.956769
 Kennelly AE (1899) Equivalence of triangles and three-pointed stars in conducting networks. Electrical World and Engineer 34:413–414
 Kitchenham B, Charters S (2007) Guidelines for performing systematic literature reviews in software engineering version 2.3. Engineering 45:1051. https://doi.org/10.1145/1134285.1134500
 Kuphaldt TR (2009) Lessons In electric circuits: DC. In: Open Book Project. http://www.ibiblio.org/kuphaldt/electricCircuits/DC/, pp 1–526
 Michael Barr (2001) Rational Erdős number. https://www.oakland.edu/Assets/upload/docs/ErdosNumberProject/barr.pdf
 Newman M (2004a) Who is the best connected scientist? A study of scientific coauthorship networks. In: Complex networks. SFI working paper 001264, Santa Fe, pp 337–370
 Newman MEJ (2004b) Coauthorship networks and patterns of scientific collaboration. Proceedings of the National Academy of Sciences 101(suppl 1):5200–5205. https://doi.org/10.1073/pnas.0307545100
 Newman M (2004c) Detecting community structure in networks. Eur Phys J B 38:321–330. https://doi.org/10.1140/epjb/e200400124y
 Liben-Nowell D, Kleinberg J (2003) The link prediction problem for social networks. In: Proceedings of the twelfth international conference on information and knowledge management, vol 58, pp 556–559. https://doi.org/10.1145/956863.956972
 Pakhira MK, Bandyopadhyay S, Maulik U (2004) Validity index for crisp and fuzzy clusters. Pattern Recogn 37:487–501. https://doi.org/10.1016/j.patcog.2003.06.005
 Rice E, Tulbert E, Cederbaum J et al (2012) Mobilizing homeless youth for HIV prevention: a social network analysis of the acceptability of a face-to-face and online social networking intervention. Health Educ Res 27:226–236. https://doi.org/10.1093/her/cyr113
 Song X, Nie L, Zhang L et al (2015a) Interest inference via structure-constrained multi-source multi-task learning. In: IJCAI International Joint Conference on artificial intelligence, pp 2371–2377
 Song X, Nie L, Zhang L et al (2015b) Multiple social network learning and its application in volunteerism tendency prediction. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval—SIGIR ‘15. ACM Press, New York, New York, USA, pp 213–222
 Ströele V, Campos F, Pereira CK et al (2016) Information extraction to improve link prediction in scientific social networks. 2016 IEEE 20th International Conference on Computer Supported Cooperative Work in Design (CSCWD):515–520. https://doi.org/10.1109/CSCWD.2016.7566043
 Ströele V, Zimbrão G, Souza JM (2013) Group and link analysis of multi-relational scientific social networks. J Syst Softw 86:1819–1830. https://doi.org/10.1016/j.jss.2013.02.024
 Trajanovski S, Kuipers FA, Ilic A et al (2015) Finding critical regions and region-disjoint paths in a network. IEEE/ACM Trans Networking 23:908–921. https://doi.org/10.1109/TNET.2014.2309253
 Varlamis I, Eirinaki M, Louta M (2010) A study on social network metrics and their application in trust networks. In: Proceedings of the 2010 International Conference on Advances in Social Network Analysis and Mining, ASONAM 2010, pp 168–175
 Wang D, Pedreschi D, Song C et al (2011) Human mobility, social ties, and link prediction. In: Acm. ACM, New York, NY, USA, pp 1100–1108
 Wasserman S, Faust K (1994) Social network analysis: methods and applications. https://doi.org/10.1525/ae.1997.24.1.219
 Watts J (2013) Brazil protests: president to hold emergency meeting. In: The Guardian. http://www.guardian.co.uk/world/2013/jun/21/brazilprotestspresidentemergencymeeting. Accessed 27 Mar 2018
 Xiao W, Gutman I (2003) Resistance distance and Laplacian spectrum. Theoretical Chemistry Accounts: Theory, Computation, and Modeling (Theoretica Chimica Acta) 110:284–289. https://doi.org/10.1007/s0021400304604
 Yan E, Guns R (2014) Predicting and recommending collaborations: an author, institution, and country-level analysis. Journal of Informetrics 8:295–309. https://doi.org/10.1016/j.joi.2014.01.008
 Yang C, Ma J, Silva T et al (2014) A multilevel information mining approach for expert recommendation in online scientific communities. Comput J 58:1921–1936. https://doi.org/10.1093/comjnl/bxu033
 Yang J, Leskovec J (2012) Defining and evaluating network communities based on ground-truth. ACM SIGKDD Work Min Data Semant 42:745–754
 Yin RK (2009) Case study research: design and methods, fifth edn. Sage Publications, Beverly Hills
 Yuan G, Murukannaiah PK, Zhang Z, Singh MP (2014) Exploiting sentiment homophily for link prediction. In: Proceedings of the 8th ACM Conference on Recommender systems—RecSys ‘14, pp 17–24