 Research
 Open Access
Network-based data classification: combining K-associated optimal graphs and high-level prediction
Journal of the Brazilian Computer Society volume 20, Article number: 14 (2014)
Abstract
Background
Traditional data classification techniques usually divide the data space into subspaces, each representing a class. Such a division is carried out considering only physical attributes of the training data (e.g., distance, similarity, or distribution). This approach is called low-level classification. On the other hand, the network- or graph-based approach is able to capture spatial, functional, and topological relations among data, providing so-called high-level classification. Usually, network-based algorithms consist of two steps: network construction and classification. Although complex network measures are employed in the classification step to capture patterns of the input data, the network formation step is critical and not well explored. Some formation techniques, such as the K-nearest neighbors algorithm (KNN) and ε-radius, consider only strictly local information of the data and, moreover, depend on some parameters, which are not easy to set.
Methods
We propose a network-based classification technique, named high-level classification on K-associated optimal graph (HLKAOG), combining the K-associated optimal graph and high-level prediction. In this way, the network construction algorithm is non-parametric, and it considers both local and global information of the training data. In addition, since the proposed technique combines low-level and high-level terms, it classifies data not only by physical features but also by checking the conformity of the test instance to the formation pattern of each class component. Computer simulations are conducted to assess the effectiveness of the proposed technique.
Results
The results show that a larger portion of the high-level term is required to achieve correct classification when the data set contains a complex-shaped and well-defined pattern. In this case, we also show that traditional classification algorithms are unable to identify those data patterns. Moreover, computer simulations on real-world data sets show that HLKAOG and support vector machines provide similar results, and both outperform well-known techniques, such as decision trees and K-nearest neighbors.
Conclusions
The proposed technique requires very few parameters and obtains good predictive performance in comparison with traditional techniques. In addition, the combination of high-level and low-level algorithms operating on network components allows greater exploration of patterns in data sets.
Background
Introduction
Complex networks gather concepts from statistics, dynamical systems, and graph theory. Basically, they are large-scale graphs with nontrivial connection patterns [1]. In addition, the ability to capture spatial, functional, and topological relations is one of their salient characteristics. Nowadays, complex networks appear in many scenarios [2], such as social networks [3], biological networks [4], the Internet and the World Wide Web [5], electric energy networks [6], and classification and pattern recognition [7–11]. Thus, distinct fields of science, such as physics, mathematics, biology, computer science, and engineering, have contributed to the large advances in complex network study.
Data classification is an important task in machine learning. It concerns the construction of computer programs able to learn from labeled data sets and, subsequently, to predict unlabeled instances [12, 13]. Due to the vast number of applications, many data classification techniques have been developed. Some of the well-known ones are decision trees [14, 15], instance-based learning, e.g., the K-nearest neighbors algorithm (KNN) [16], artificial neural networks [17], Naive Bayes [18], and support vector machines (SVM) [19]. Nevertheless, most of them are highly dependent on appropriate parameter tuning. Examples include the confidence factor and the minimum number of cases to partition a set in the C4.5 decision tree; the K value in KNN; the stopping criterion, the number of neurons, the number of hidden layers, and others in artificial neural networks; and the soft margin, the kernel function, the kernel parameters, the stopping criterion, and others in SVM.
Complex networks have made considerable contributions to machine learning. However, most research on complex networks is applied to data clustering, dimensionality reduction, and semi-supervised learning [20–22]. Recently, some network-based techniques have been proposed to solve supervised learning problems, such as data classification [8, 10, 11, 23, 24]. The obtained results show that network-based techniques have advantages over traditional ones in many aspects, such as the ability to detect classes of different shapes, the absence of parameters, and the ability to classify data according to the pattern formation of the training data.
In [8], the authors proposed a network-based classification algorithm, called K-associated optimal graph (KAOG). Among other characteristics, KAOG constructs a network through a local optimization structure based on an index of purity, which measures the compactness of the graph components. In the classification stage, KAOG combines the constructed network and a Bayes optimal classifier to compute the probability of a test instance belonging to each of the classes. One of the noticeable advantages of KAOG is that it is a non-parametric technique. This is a desirable feature, which will be employed in the technique proposed in the present work. However, both KAOG and other traditional classification techniques consider exclusively the physical features of the data (e.g., distance, similarity, or distribution). This limited way of performing classification tasks is known as low-level classification [11]. In contrast, the human (animal) brain performs both low and high orders of learning, identifying patterns according to the semantic meaning of the input data. Data classification that considers not only physical attributes but also the pattern formation is referred to as high-level classification [11]. Figure 1 shows an illustrative data set in which there are two classes (black and gray circles) and a new test instance to be classified (a white circle). Applying an SVM with optimized parameters and a radial basis function kernel, the test instance is classified into the black class. The same occurs when using optimized versions of other algorithms, such as KNN and decision trees. However, one could consider that the test instance belongs to the well-defined triangle pattern formed by the gray circles. This example shows that traditional classification techniques fail to identify general data patterns.
On the other hand, in [11], the authors present a quite different kind of classification technique, called high-level classification, that is able to consider the pattern formation of the training data by using the topological structure of the underlying network. Specifically, the data pattern is identified by using some complex network measures. The test instance is classified by checking its conformity to each class of the network. Again considering Figure 1, the high-level technique is able to detect the pattern formed by the gray circle class and put the test instance into it. Furthermore, high-level and low-level classifications can work together in a single framework, as proposed in [11].
Despite the fact that high-level classification offers a new vision of data classification, the network formation still depends on some parameters, such as the parameter K in KNN and the parameter ε in the ε-radius technique. Moreover, there are other parameters in the technique, which involve the weight assigned to each network measure employed in the framework [11]. All these parameters are problem-oriented, and the selection of their values is time-consuming and has a strong influence on the quality of classification.
In this paper, we propose a network-based classification technique combining two techniques: KAOG and high-level classification. We refer to it as high-level classification on K-associated optimal graph (HLKAOG). It considers not only the physical attributes but also the pattern formation of the data. Specifically, the proposed technique provides the following:

A non-parametric way to construct the network

A high-level data classification with only one parameter

A more sensitive high-level classification obtained by examining network components instead of whole classes. This is relevant because a class can consist of several (possibly distinct) components; thus, each component is smaller than the network of a whole class. Consequently, the high-level tests are more sensitive and better results can be obtained

An automatic way to obtain the influence coefficient for the network measures. In addition, this coefficient adapts itself according to each test instance

A new complex network measure adapted to high-level classification, named component efficiency
Computer simulations have been conducted to assess the effectiveness of the proposed technique. Interestingly, the results show that a larger portion of the high-level term is required to achieve correct classification when the data set contains a complex-shaped and well-defined pattern. In this case, we also show that traditional classification algorithms are unable to identify those data patterns. Moreover, computer simulations on real-world data sets show that HLKAOG and support vector machines provide similar results, and both outperform well-known techniques, such as decision trees and K-nearest neighbors.
The remainder of the paper is organized as follows: a background and a brief overview of related works are presented in the ‘Overview’ section. The proposed technique and the contributions of this work are detailed in the ‘Methods’ section. Empirical evaluation and discussion of the proposed algorithm on artificial and real data sets are shown in the ‘Results and discussion’ section. Finally, the ‘Conclusions’ section concludes the paper.
Overview
In this section, we review the most relevant classification techniques. Firstly, we present an overview of network-based data classification. Then, we describe the network construction and classification using the K-associated optimal graph. Finally, we present the rationale behind the high-level classification technique.
Network-based data classification
In data classification, the algorithms receive as input a given training data set, denoted here X = {(inp_1, lab_1),…,(inp_n, lab_n)}, where the pair (inp_i, lab_i) is the ith data instance in the data set. Here, inp_i = (x_1,…,x_d) represents the attributes of a d-dimensional data item and lab_i ∈ L = {L_1,…,L_C} represents the target class or label associated with that data item.
In network-based classification, the training data is usually represented as a network in which each instance is a vertex and the edges (or links) represent similarity relations between vertices. The goal of the training phase is to induce a classifier from inp → lab by using the training data X.
In the prediction (classification) phase, the goal is to use the constructed classifier to predict new input instances unseen during training. So, there is a set of test instances Y = {(inp_{n+1}, lab_{n+1}),…,(inp_z, lab_z)}. In this phase, the algorithm receives only inp and uses the constructed network to predict the correct class lab for that inp.
K-associated optimal graph
KAOG uses a purity measure to construct and optimize each component of the network. The resulting network, together with the Bayes optimal classifier, is used for the classification of new instances. Specifically, a KAOG is a final network obtained by merging several K-associated graphs while maintaining or improving their purity measure. For the sake of clarity, the network construction phase can be divided into two concepts: creating a K-associated graph (Kac) and creating a K-associated optimal graph.
The K-associated graph builds up a network from a given data set and a K value, which determines the number of neighbors to be considered for each vertex. Basically, the algorithm only connects a vertex v_i to v_j if v_j is one of the K-nearest neighbors of v_i and if v_i and v_j have the same class label.
Algorithm 1 shows in detail how a K-associated graph is constructed. In Kac, V denotes the set of vertices v_i ∈ V, which represents all training instances; E provides the set of edges e_{i,j} ∈ E, which contains all links between vertices; ${\Lambda}_{{v}_{i},K}$ contains the K-nearest neighbors of vertex v_i; c_i is the class label of vertex i; findComponents(V,E) is a function that finds all existing components^{a} in the graph; C is the set of all components α ∈ C; purity(α) gives the compactness Φ_α of each component α ∈ C; and G^{(K)} represents the K-associated graph. Figure 2 shows the formation of the K-associated graph for (a) K = 1 and (b) K = 2.
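To make the construction concrete, the following sketch builds the edge set of a K-associated graph for a toy data set. It is an illustrative re-implementation, not the paper's code; Euclidean distance is assumed as the similarity criterion.

```python
import numpy as np

def k_associated_graph(X, labels, K):
    """Directed edges from each vertex to its K nearest neighbours that share
    the same class label (illustrative sketch; Euclidean distance assumed)."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    edges = set()
    for i in range(n):
        order = np.argsort(D[i])              # order[0] is i itself (distance 0)
        for j in order[1:K + 1]:              # the K nearest neighbours of v_i
            if labels[i] == labels[int(j)]:   # connect only same-class pairs
                edges.add((i, int(j)))
    return edges

# toy data: two well-separated classes
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.3, 0.0], [5.0, 5.0], [5.2, 5.0]])
y = [0, 0, 0, 1, 1]
edges = k_associated_graph(X, y, K=1)   # {(0, 1), (1, 0), (2, 1), (3, 4), (4, 3)}
```

Note that the result is a digraph: vertex 2 links to 1, but 1's single nearest neighbour is 0, illustrating the asymmetrical property discussed below.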
Two important characteristics of the K-associated graph are highlighted as follows:

1.
Asymmetrical property: according to Algorithm 1, Kac returns a digraph. Digraphs provide good representation for the asymmetric nature existing in many data sets because many times ${v}_{j}\in {\Lambda}_{{v}_{i},K}$ does not imply ${v}_{i}\in {\Lambda}_{{v}_{j},K}$.

2.
Purity: this measure expresses the level of mixture of a component in relation to other components of distinct classes. Basically, the purity measure is given by
$${\Phi}_{\alpha}=\frac{{D}_{\alpha}}{2K},$$(1)
where D_α denotes the average degree of a component α, and 2K is the maximum number of links that a vertex can have. D_α can be obtained by
$${D}_{\alpha}=\frac{1}{N}\sum_{i=1}^{N}{k}_{i},$$(2)
in which N is the total number of vertices in α and k_i is the degree^{b} of vertex i.
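A minimal sketch of the purity computation of Eq. 1, counting in- plus out-links of each vertex inside a directed component. The data representation (vertex set plus edge set) is an assumption for illustration, not the paper's implementation.

```python
def purity(component_vertices, edges, K):
    """Phi_alpha = (average degree of the component) / 2K, where the degree of
    a vertex counts its in-links plus out-links inside the component."""
    degree = {v: 0 for v in component_vertices}
    for i, j in edges:
        if i in degree and j in degree:   # consider only edges inside the component
            degree[i] += 1                # out-link of i
            degree[j] += 1                # in-link of j
    avg_degree = sum(degree.values()) / len(degree)
    return avg_degree / (2 * K)

# a mutually linked same-class pair reaches the maximum purity of 1 for K = 1,
# while a single one-way link gives half of that
assert purity({0, 1}, {(0, 1), (1, 0)}, K=1) == 1.0
assert purity({0, 1}, {(0, 1)}, K=1) == 0.5
```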
Despite the fact that the K-associated graph presents good performance on data classification, there is a drawback: it uses the same value of K to form the networks of all data classes. However, a network obtained with a single value of K is rarely able to produce the best configuration of instances in the components, in terms of the purity measure. In this way, an algorithm that is able to adapt itself to the different classes of the data set is welcome. The idea of KAOG is to obtain the optimal K value for each component in order to maximize its purity [8].
Algorithm 2 shows in detail the construction of KAOG from K-associated graphs. In the algorithm, G^{(Ot)} denotes the K-associated optimal graph and lastAvgDegree is the average degree of the network before incrementing the value of K; note that no parameter is introduced in the algorithm. In the following, we describe Algorithm 2. In the first lines, K starts with the value 1 and, thus, the 1-associated graph is considered the optimal graph (G^{opt}) at this moment. After this initial setting, a loop merges the subsequent K-associated graphs by increasing K, while improving the purity of the network encountered so far, until the optimal network, measured by the purity degree [8], is reached. Between lines 7 and 12, the algorithm verifies, for each component of the K-associated graph (${C}_{\beta}^{\left(K\right)}$), whether the condition given in line 8 is satisfied. If so, an operation removes the components that compose ${C}_{\beta}^{\left(K\right)}$ from the optimal graph (line 9). In line 10, the new component is added to G^{opt}. At the end, the algorithm returns the obtained components with their respective values of K and purities.
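The merging loop of Algorithm 2 can be sketched at a high level. The representation below (components as vertex sets paired with purities, per K level) and the acceptance rule (a new component replaces the optimal components it absorbs only when its purity is at least as high as each of theirs) are simplifications assumed here for illustration, not the published algorithm verbatim.

```python
def kaog_merge(kac_levels):
    """Sketch of the KAOG merging loop: `kac_levels` maps each K to a list of
    (vertex_set, purity) pairs taken from the K-associated graph; the optimal
    graph keeps, for every component, its purity and the K it was formed with."""
    optimal = {frozenset(c): (p, 1) for c, p in kac_levels[1]}  # 1-associated start
    for K in sorted(kac_levels)[1:]:
        for comp, p in kac_levels[K]:
            comp = frozenset(comp)
            absorbed = [c for c in optimal if c & comp]         # overlapping comps
            if absorbed and all(p >= optimal[c][0] for c in absorbed):
                for c in absorbed:                              # remove absorbed
                    del optimal[c]
                optimal[comp] = (p, K)                          # keep K and purity
    return optimal

# the K=2 component (purity 0.8) may not absorb the purer K=1 component {2, 3}
levels = {1: [({0, 1}, 0.5), ({2, 3}, 1.0)], 2: [({0, 1, 2, 3}, 0.8)]}
result = kaog_merge(levels)
```

In this example the merge is rejected, so different components of the final graph legitimately keep different K values, which is the point of the optimal graph.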
Regarding the time complexity of the KAOG network construction, given a training set with N instances and d attributes, the complexity of generating the corresponding distance matrix is N(N − 1)d, which yields the complexity order of O(N^2). Other functions, such as finding the graph components and computing the purity measure of the graph components, yield the complexity order of O(N). Therefore, the time complexity to build up the network is O(N^2) [8].
After the construction of the KAOG network, a Bayes optimal classifier (BayesOC) performs the classification of new instances using the constructed network. Let us consider a new instance y to be classified. KAOG performs the classification by computing the probability of y belonging to each component. Thus, from Bayes' theorem, the a posteriori probability of y belonging to a component α, taking the K_α nearest neighbors of this new case (Λ_y), is given by
$$P(y\in \alpha \mid {\Lambda}_{y})=\frac{P({\Lambda}_{y}\mid y\in \alpha)\,P(y\in \alpha)}{P({\Lambda}_{y})}.$$(3)
According to [8], the probability of the neighborhood Λ_{ y } given that y belongs to α is given by
The normalization term P (Λ_{ y }) is obtained by
Also, the a priori probability P (y ∈ α) is given by
In addition, if the number of components in the network is larger than the number of classes in the data set, BayesOC sums up the probabilities associated with each class j, as follows:
At the end, BayesOC chooses the class with the largest a posteriori probability.
Regarding the time complexity, the BayesOC classification yields the complexity order of O(N), which is related to the calculation of the distances between the training set and the test case [8].
High-level classification
High-level prediction considers not only physical attributes but also global semantic characteristics of the data [11]. This is due to the use of complex network measures, which are able to capture the pattern formation of the input data. The next sections provide more details about high-level classification.
Network construction As in all other network-based learning techniques, the first step is the network construction from the vector-based input data. In fact, the way the network is constructed influences, to a large extent, the classification results of the high-level prediction. In [11], the authors propose a network construction method combining the ε-radius and KNN algorithms, which is given by
where KNN and ε-radius return, respectively, the set containing the K-nearest vertices of the same class as vertex i and the set of vertices of the same class as i whose distance from i is smaller than ε. Note that K and ε are user-controllable parameters. In addition, the algorithm connects vertex i to other vertices using ε-radius when the condition is satisfied and using KNN otherwise.
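The combined construction can be sketched as follows. The switching condition used here, ε-radius when the ε-neighbourhood holds more than K same-class vertices (a dense region) and KNN otherwise, is our reading of [11], stated as an assumption.

```python
import numpy as np

def knn_eps_network(X, labels, K, eps):
    """Sketch of the combined KNN / epsilon-radius network construction of
    [11]; the switching condition is an assumption, see the lead-in above."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    adjacency = {}
    for i in range(n):
        same = [j for j in range(n) if j != i and labels[j] == labels[i]]
        eps_set = [j for j in same if D[i, j] < eps]
        if len(eps_set) > K:
            adjacency[i] = set(eps_set)               # dense region: eps-radius
        else:
            same.sort(key=lambda j: D[i, j])          # sparse region: K nearest
            adjacency[i] = set(same[:K])
    return adjacency

X = np.array([[0.0], [0.1], [0.2], [10.0]])           # vertex 3 is an outlier
adj = knn_eps_network(X, labels=[0, 0, 0, 0], K=1, eps=0.5)
```

In the toy run, the dense vertices 0–2 receive their whole ε-neighbourhood, while the outlier falls back to its single nearest same-class neighbour.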
As shown in Equation 8, the network formation algorithm depends on some parameters. Different parameter values produce very distinct results. Moreover, the model selection of the parameters is time-consuming. For this reason, we propose a non-parametric network formation method based on KAOG to work together with the high-level technique. The ‘Methods’ section describes our proposal.
Hybrid classification technique In [11], the authors propose a hybrid classification technique combining a low-level term and a high-level term, which is given by
$${M}_{y}^{\left(J\right)}=(1-\lambda)\,{P}_{y}^{\left(J\right)}+\lambda\,{H}_{y}^{\left(J\right)},$$(9)
Considering a test instance y ∈ Y, ${M}_{y}^{\left(J\right)}$ denotes the association produced by the low-level and high-level algorithms when evaluating instance y for the class J. Also in the equation, the variable ${P}_{y}^{\left(J\right)}\in [0,1]$ establishes the association produced by the low-level classifier between the instance y and the class J. On the other hand, the variable ${H}_{y}^{\left(J\right)}\in [0,1]$ denotes the association produced by the high-level technique (composed of complex network measures, such as assortativity and clustering coefficient [1]) between y and the class J. Finally, λ ∈ [0,1] is a user-controllable variable that defines the weight assigned to each produced classification. Note that λ just defines the contribution of the low-level and high-level classifications. For example, if λ = 0, only the low-level algorithm works.
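The convex combination of Equation 9 is straightforward to apply. The sketch below uses toy membership values; the λ = 0.8 weighting is arbitrary and only illustrates how a large high-level term can override the low-level decision.

```python
def hybrid_membership(P, H, lam):
    """Convex combination M = (1 - lambda) * P + lambda * H per class (the
    hybrid rule of [11]); P and H are membership dicts over the classes."""
    return {J: (1 - lam) * P[J] + lam * H[J] for J in P}

P = {'gray': 0.2, 'black': 0.8}   # low-level term favours the black class
H = {'gray': 0.9, 'black': 0.1}   # pattern-conformity term favours gray
M = hybrid_membership(P, H, lam=0.8)
predicted = max(M, key=M.get)     # a high lambda lets the pattern term win
```

Here M['gray'] = 0.76 versus M['black'] = 0.24, so the instance is put into the gray class, mirroring the triangle example of Figure 1.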
The high-level classification of a new instance y for a given class J is given by
$${H}_{y}^{\left(J\right)}=\frac{\sum_{u=1}^{Z}\delta(u)\,{f}_{y}^{\left(J\right)}(u)}{\sum_{g\in L}\sum_{u=1}^{Z}\delta(u)\,{f}_{y}^{\left(g\right)}(u)},$$(10)
where ${H}_{y}^{\left(J\right)}\in [0,1]$, u indexes the network measures employed in the high-level algorithm, δ(u) ∈ [0,1], ∀ u ∈ {1,…,Z}, is a user-controllable variable that indicates the influence of each network measure in the classification process, and ${f}_{y}^{\left(J\right)}\left(u\right)$ indicates whether the test instance y presents the same patterns as the class J, considering the uth network measure. The denominator term is only for normalization. There is also a constraint on δ(u); (10) is valid only if $\sum_{u=1}^{Z}\delta\left(u\right)=1$.
Regarding ${f}_{y}^{\left(J\right)}\left(u\right)$, it is given by
in which $\Delta {G}_{y}^{\left(J\right)}\left(u\right)\in [0,1]$ represents the variation that occurs in a complex network measure whenever a new instance y ∈ Y is inserted, and p^{(J)} ∈ [0,1] is the proportion of instances that belong to the class J.
Complex network measures In fact, complex network measures are used to provide a high-level analysis of the data [25]. So, when a new instance y needs to be classified, the technique computes the impact of inserting this new vertex into each class in an isolated way. Basically, the variation of the network measures indicates the class that y belongs to. In other words, if there is little variation in the formation pattern of a class when connecting y to it, the high-level prediction returns a large value, indicating that y is in conformity with this pattern. Conversely, if there is a great variation when linking y to a class, it returns a small value, denoting that y is not in conformity with this pattern.
In [11], three network measures are employed to check the pattern formation of the input data: assortativity, average degree, and clustering coefficient [26]. A more detailed view of the network measures employed in HLKAOG is provided in the next section.
Methods
Most machine learning algorithms perform classification exclusively based on physical attributes of the data. They are called low-level algorithms. One example is the BayesOC algorithm shown in the previous section. On the other hand, complex network-based techniques provide a different kind of classification that is able to capture formation patterns in data sets.
Actually, the principal drawback in the use of complex network measures for data classification is the network formation. Some techniques have been widely used in the literature, such as KNN and ε-radius, but they depend on parameters. This means that the technique is not able to extract such information from the data itself, so different parameter values produce very distinct results. In HLKAOG, we exploit the ability of KAOG to produce an efficient and non-parametric network to address this problem. In addition, other contributions are presented here.
This section describes the principal contributions of this investigation. The ‘Component efficiency measure’ section presents a new complex network measure for high-level classification: the component efficiency. The ‘Linking high-level prediction and KAOG algorithm’ section provides details about how the KAOG network and the high-level classifier work together. The ‘High-level classification on network components’ section describes an important conceptual modification in our high-level approach: complex network measures are employed on graph components. The ‘Non-parametric influence coefficient for the network measures’ section shows an automatic way to obtain the influence coefficients of the network measures. The ‘Complex network measures per component’ section provides the adaptation of the complex network measures to work on components instead of classes.
Component efficiency measure
The component efficiency measure quantifies the efficiency of a component in exchanging information among its vertices. It is a new network measure incorporated into the high-level classification technique. Its development is motivated by the concept of the efficiency of a network [27], which measures how efficiently the network exchanges information. Since our high-level algorithm operates at the component level, we name our measure component efficiency.
Initially, consider a vertex i in a component α. The local efficiency of i is given by
$${E}_{i}=\frac{1}{{V}_{i}}\sum_{j\in {\Lambda}_{i}}\frac{1}{{q}_{ij}},$$(12)
where V_i denotes the number of links from i, Λ_i represents the set of vertices that receive links from i, and q_{ij} is the geodesic distance between i and j.
We define the efficiency of a component α as the average of the local efficiencies of the nodes that belong to α. So, we have
$${E}^{(\alpha)}=\frac{1}{{V}_{\alpha}}\sum_{i\in \alpha}{E}_{i},$$(13)
in which V_α is the number of vertices in the component α.
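The two definitions above translate directly into code. The data layout (neighbour lists plus a precomputed geodesic-distance table) is an illustrative assumption.

```python
def local_efficiency(i, neighbours, geodesic):
    """E_i: average inverse geodesic distance from vertex i to the vertices
    it links to (our reading of the local-efficiency definition)."""
    return sum(1.0 / geodesic[(i, j)] for j in neighbours[i]) / len(neighbours[i])

def component_efficiency(vertices, neighbours, geodesic):
    """E^(alpha): mean local efficiency over the vertices of the component."""
    return sum(local_efficiency(i, neighbours, geodesic)
               for i in vertices) / len(vertices)

# toy triangle component: every vertex reaches the other two at distance 1,
# so both the local and the component efficiency are maximal (1.0)
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
dist = {(i, j): 1.0 for i in range(3) for j in range(3) if i != j}
eff = component_efficiency({0, 1, 2}, nbrs, dist)
```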
Linking high-level prediction and KAOG algorithm
We adapted the concepts of the K-associated optimal graph and the high-level algorithm to allow them to work together. Firstly, the K-associated optimal graph divides the network into components according to the purity measure. Thus, the high-level technique proposed here considers components instead of classes. In addition, differently from BayesOC, which considers only those components in which at least one vertex belongs to the nearest neighbors of the test instance, the high-level algorithm employs complex network measures to examine whether the insertion of the test instance into a component is in conformity with the formation pattern of that component. Figure 1 shows an illustrative example in which the network topology cannot be detected by the original KAOG. So, if we employed KAOG directly in our model, it would not be able to classify the new instance into the gray circle class. Instead, the instance would be classified into the black circle class. On the other hand, our model permits the correct classification because it uses the information available in each component constructed by KAOG in a different way.
Suppose a test instance y is to be classified. The classification stage of our network-based technique can be divided into two steps, according to [23]:

1.
Firstly, the proposed technique uses the component efficiency measure to determine the components into which y can be inserted. This information is important especially because it considers the local features of the components and establishes a heuristic that excludes components that are not in conformity with the insertion of y.
In a more formal definition, let us consider a component α and a set F containing the components for which the variations of the complex network measures will be computed. For each new instance y, F_y is given by
where ${e}_{y}^{(\alpha )}$ denotes the local efficiency of y with respect to the vertices that belong to component α and E^{(α)} is the component efficiency of α.

2.
The next step is the insertion of y into each α ∈ F_y. According to our technique, y makes connections with each vertex i ∈ α following the rule given by
$${\alpha}_{y}\Leftarrow \alpha \cup \left\{i \mid {e}_{y}^{\left(i\right)}\le {E}^{(\alpha )}\right\},$$(15)
where α_y comprises component α together with the connections between y and its vertices, ${e}_{y}^{\left(i\right)}$ is the local efficiency in exchanging information between y and i, and ${e}_{y}^{\left(i\right)}\le {E}^{(\alpha )}$ is the condition to be satisfied to establish a link between y and i.
Note that if F_y = ∅ in (14) (a very unusual situation), the algorithm employs the K_α value associated with each component α and verifies whether at least one vertex in α is among the K_α nearest neighbors of y. If there is at least one vertex satisfying this condition in component α, then the complex network measures are applied to this component; otherwise, α is not considered in the classification phase.
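The connection rule of Eq. 15 reduces to a filter over the component's vertices. In this sketch, `y_eff[i]` stands for a precomputed e_y^(i), the pairwise efficiency between the test instance and vertex i; the names are illustrative.

```python
def connect_test_instance(y_eff, component_vertices, component_eff):
    """Link y to every vertex i of the component whose pairwise efficiency
    e_y^(i) does not exceed the component efficiency E^(alpha) (Eq. 15)."""
    return {i for i in component_vertices if y_eff[i] <= component_eff}

y_eff = {0: 0.5, 1: 0.9, 2: 0.3}   # e_y^(i) for each vertex of the component
links = connect_test_instance(y_eff, {0, 1, 2}, component_eff=0.6)
```

With a component efficiency of 0.6, y would be linked to vertices 0 and 2 but not to vertex 1.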
High-level classification on network components
Once the network is obtained by KAOG, the high-level algorithm can be applied to classify new instances by checking the variation of complex network measures in each component before and after the insertion of the new instance. Thus, the proposed high-level technique works on the components instead of whole classes. This is an important feature introduced in this work. Since each class can encompass more than one component, each component is smaller than or equal to the corresponding whole class. In this way, the insertion of a test instance can generate more precise variations in the network measures. Consequently, it is easier to check the conformity of a test instance to the formation pattern of each class component. On the other hand, the previous work on high-level classification considers the network of a whole class of data items. In this case, the variations are weaker, and sometimes it is difficult to distinguish the conformity levels of the test instance to each class. Therefore, taking (9), the high-level classification of a new instance y for a given component α is given by
$${M}_{y}^{\left({\alpha}^{J}\right)}=(1-\lambda)\,{P}_{y}^{\left({\alpha}^{J}\right)}+\lambda\,{H}_{y}^{\left({\alpha}^{J}\right)},$$(16)
where α^J denotes a component α whose instances belong to class J, ${P}_{y}^{\left({\alpha}^{J}\right)}$ establishes the association produced by a low-level classifier between the instance y and the component α, and ${H}_{y}^{\left({\alpha}^{J}\right)}$ denotes the association produced by the high-level technique between y and the component α. The general idea behind ${H}_{y}^{\left({\alpha}^{J}\right)}$ is very simple: (i) the KAOG network finds a set of components based on the purity measure, (ii) the high-level technique examines these components in relation to the insertion of a test instance y, and (iii) the probabilities for each component α are obtained.
Regarding ${H}_{y}^{(\alpha )}$, it represents the compatibility of each component α with the new instance y and is given by
$${H}_{y}^{(\alpha)}=\frac{\sum_{u=1}^{Z}{\delta}_{y}(u)\,{f}_{y}^{(\alpha)}(u)}{\sum_{\beta\in {F}_{y}}\sum_{u=1}^{Z}{\delta}_{y}(u)\,{f}_{y}^{(\beta)}(u)},$$(17)
in which u indexes the network measures employed in the high-level algorithm, δ_y(u) ∈ [0,1], ∀ u ∈ {1,…,Z}, indicates the influence of each network measure in the classification process, and ${f}_{y}^{(\alpha )}\left(u\right)$ indicates whether the test instance y presents the same patterns as the component α, considering the uth network measure. The denominator term in (17) is only for normalization. Details about δ_y(u) are provided in the ‘Non-parametric influence coefficient for the network measures’ section.
From (17), a simple modification is performed in (11) to obtain a high-level classification on components (α) instead of classes (J). So, we have
where the denominator term in (18) is only for normalization and p^{(α)} ∈ [0,1] is the proportion of instances that belong to the component α.
Non-parametric influence coefficient for the network measures
Differently from previous works, the high-level technique proposed in this work is non-parametric not only in the network construction phase but also in the classification phase. We have developed an automatic way to assign weights to the employed network measures, i.e., the δ term in Equation 17 is determined by
in which $\Delta {G}_{y}^{(\alpha )}\left(u\right)\in [0,1]$ represents the variation that occurs in a complex network measure whenever a new instance y ∈ Y is inserted. Therefore, δ_y(u) is based on the opposite of the difference between the largest and the smallest variation of the uth network measure over all components α. The idea of determining δ_y(u) in this way is to balance all the employed network measures in the decision process and to prevent a single network measure from dominating the classification decision. Note that this equation is valid only if $\sum_{u=1}^{Z}{\delta}_{y}\left(u\right)=1$.
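The weighting scheme just described can be sketched as follows. It mirrors the verbal description (one minus the spread of each measure's variation over the components); the explicit normalisation step that makes the weights sum to one is our assumption.

```python
def measure_weights(variations):
    """Automatic influence coefficients delta_y(u): start from one minus the
    spread (max - min over components) of each measure's variation, then
    normalise so the weights sum to one."""
    raw = {u: 1.0 - (max(v.values()) - min(v.values()))
           for u, v in variations.items()}
    total = sum(raw.values())
    return {u: r / total for u, r in raw.items()}

# Delta G_y^(alpha)(u): variation of measure u on components 'a' and 'b' (toy values)
variations = {
    'assortativity': {'a': 0.1, 'b': 0.9},   # large spread -> weight damped
    'clustering':    {'a': 0.4, 'b': 0.5},   # small spread -> weight preserved
}
delta = measure_weights(variations)
```

The measure whose variation swings wildly across components receives a smaller weight, so no single measure dominates the decision for this particular test instance.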
Complex network measures per component
The complex network measures presented in the previous section work on classes. Differently from the approach proposed in [11], the approach proposed in this work considers pattern formation per component. This feature makes the algorithm more sensitive to the network measure variations. In addition, the components are constructed according to the purity measure, which gives more precise information about the data set.
In this section, we adapt the assortativity and the clustering coefficient to work on components. Also, as the time complexity of the high-level classification is directly related to the network measures employed, we present the complexity order of each measure.
Assortativity ( $\Delta {G}_{y}^{\left(\alpha \right)}\left(1\right)$ )
The assortativity measure quantifies the tendency of connections between vertices [26] in a complex network. This measure analyzes whether a link occurs preferentially between vertices with similar degree or not. The assortativity with regard to each component α of the data set is given by
where r^{(α)} ∈ [−1,1], U_{ α } = {u : i_{ u } ∈ α ∧ k_{ u } ∈ α} encompasses all the edges within the component α, u represents an edge, and i_{ u }, k_{ u } indicate the vertices at each end of edge u.
Therefore, the membership value of a test instance y ∈ Y with respect to the component α is given by
The assortativity measure yields the complexity order of O(E+V), where E and V denote, respectively, the number of edges and the number of vertices in the graph.
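A per-component assortativity in this spirit can be sketched in plain Python as the Pearson correlation between the degrees at the two ends of each edge of the component. This is a standalone illustration of the measure, not the paper's code.

```python
from collections import Counter

def component_assortativity(edges):
    """Degree assortativity r in [-1, 1] of one component, given its edge
    list: the Pearson correlation between the degrees of the two end
    vertices of each edge (Newman, 2002)."""
    deg = Counter()
    for i, k in edges:
        deg[i] += 1
        deg[k] += 1
    # collect the degree pairs in both orientations so r is symmetric
    xs, ys = [], []
    for i, k in edges:
        xs += [deg[i], deg[k]]
        ys += [deg[k], deg[i]]
    n = float(len(xs))
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    # in a regular component the correlation is undefined; return 0 here
    return cov / (sx * sy) if sx and sy else 0.0
```

A star component, for instance, is maximally disassortative (r = −1): high-degree hubs connect only to degree-one leaves.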
Clustering coefficient ( $\Delta {G}_{y}^{\left(\alpha \right)}\left(2\right)$ )
Clustering coefficient is a measure that quantifies the degree to which the nodes of a network tend to cluster together [28]. The clustering coefficient with regard to each component α of the data set is given by
in which $C{C}_{i}^{(\alpha )}\in [0,1]$ and V_{ α } denotes the number of vertices in the component α. The membership value of a test instance y ∈ Y with respect to the component α is given by:
The clustering coefficient measure yields the complexity order of O(V·p^{2}), where V and p denote, respectively, the number of vertices in the graph and the average node degree.
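The component-level clustering coefficient can be sketched as the average of the local coefficients of the component's vertices. The code below is an illustration under the common convention that vertices with degree below two contribute zero.

```python
def component_clustering(adj):
    """Average local clustering coefficient of one component.
    adj: dict mapping each vertex to its set of neighbours (undirected)."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # degree < 2: local coefficient taken as 0
        # count the edges that exist among the neighbours of v
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)
```

A triangle component yields 1.0 (every neighbourhood is fully connected), while a simple path yields 0.0.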
Average degree ( $\Delta {G}_{y}^{\left(\alpha \right)}\left(3\right)$ )
The average degree is a very simple measure: it is the mean degree of the vertices in a component. The average degree with regard to each component α is given by
in which k^{(α)} ∈ [0,1] and V_{ α } denotes the number of vertices in component α. The membership value of a test instance y ∈ Y with respect to component α is given by
The average degree measure yields the complexity order of O(V), where V denotes the number of vertices in the graph.
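Putting the pieces together, the variation ΔG used by the high-level term can be sketched generically: compute a measure on the component, insert y temporarily, recompute, and take the absolute difference. The linking rule for y below is a placeholder (the paper's rule is Eq. (15)); the average degree serves as the example measure.

```python
def avg_degree(adj):
    """Mean vertex degree of a component (adj: vertex -> set of neighbours)."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def measure_variation(measure, component_adj, y, y_links):
    """|G(alpha + y) - G(alpha)| for a given measure, inserting y
    temporarily and connecting it to the vertices in y_links
    (hypothetical linking rule; the paper's rule is Eq. (15))."""
    before = measure(component_adj)
    adj = {v: set(nbrs) for v, nbrs in component_adj.items()}  # copy
    adj[y] = set(y_links)
    for v in y_links:
        adj[v].add(y)
    after = measure(adj)
    return abs(after - before)
```

On a triangle component, inserting y linked to two of its vertices changes the average degree from 2.0 to 2.5, a variation of 0.5.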
Results and discussion
In this section, we present a set of computer simulations to assess the effectiveness of the HL-KAOG technique. The ‘Experiments on artificial data sets’ section supplies results obtained on artificial data sets, which emphasize the key features of HL-KAOG. The ‘Experiments on real data sets’ section provides simulations on real-world data sets, which highlight the strong ability of HL-KAOG to perform data classification. Note that the Euclidean distance is used as the similarity measure in all the experiments.
Experiments on artificial data sets
Initially, we use some artificial data sets presenting strong patterns to evaluate the proposed technique. These examples provide particular situations where low-level classifiers by themselves have trouble correctly classifying the data items in the test set. Thus, this section serves as a tool to better motivate the usage of the proposed model.
The first step of HL-KAOG is the construction of the KAOG network. Different from other network construction techniques, KAOG is nonparametric and builds up the network considering the purity measure (1). In the second step, HL-KAOG employs the hybrid low- and high-level techniques to classify the test instances. The low-level classification here uses the Bayes optimal classifier (3), and the high-level term uses the complex network measures given by (17) to capture the pattern formation of each component. Finally, the general framework given by (16) combines the low- and high-level techniques.
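The combination step of the framework can be sketched as a convex mixture of the two membership vectors, with λ weighting the high-level term. Labels and values below are illustrative only.

```python
def combined_membership(low, high, lam):
    """(1 - lambda) * low-level + lambda * high-level membership,
    computed per class/component label."""
    return {c: (1.0 - lam) * low[c] + lam * high[c] for c in low}
```

With `low = {'gray': 0.3, 'black': 0.7}` and `high = {'gray': 0.9, 'black': 0.1}`, λ = 0.2 picks black while λ = 0.6 picks gray, mirroring the behavior discussed for the artificial data sets below.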
Triangle data set
Figure 3 shows the KAOG network on a two-class data set. One can see that the gray class data items exhibit a strong pattern (similar to a triangle). Since traditional machine learning techniques are not able to consider the pattern formation of classes, they cannot classify the new instance y (white color vertex) correctly. In addition, these techniques consider only the physical distance among the data items, which contributes to classifying y as belonging to the black class. This happens not only with the Bayes optimal classifier but also with other traditional techniques, such as decision trees, K-nearest neighbors, and support vector machines (SVM classifiers consider a feature space transformed through a kernel function, but they are still essentially based on the physical distance among the data), i.e., none of them is able to identify the triangle pattern formed by the gray component.
Table 1 provides a detailed view of how the high-level classification is performed in HL-KAOG. Considering the new instance y and the network shown in Figure 3, the classification process starts by computing each network measure on each component that satisfies (14). Then, HL-KAOG inserts y temporarily into these components, according to (15), and computes the network measures on each of them again. In this way, the variation of each network measure before and after the insertion of the new instance can be obtained and (17) can be computed. Remember that the objective of the proposed function (19) is to balance the membership results obtained by the complex network measures, so that the classification process takes all of the applied network measures into account.
Table 2 presents the performance of the proposed technique on the artificial data sets. The ‘Classification result’ column shows the final classification probability when λ = 0.2 and λ = 0.6 (the high-level algorithm gets a higher contribution to the classification). The results shown in Table 2 emphasize some interesting properties of HL-KAOG. Firstly, one sees that, by combining a larger portion of the high-level classification (λ = 0.6), the technique is able to correctly classify the test instance in the presence of a very clear pattern. However, when using a larger portion of the KAOG classification (λ = 0.2), the pattern formation is not detected, because the Bayes optimal classifier considers only physical attributes to perform the prediction. Secondly, we employ a nonparametric technique to build up the network and to assign a weight to each network measure, so HL-KAOG has only the λ value to set. Thirdly, HL-KAOG does not work on classes but on components. This is an important difference in relation to previous high-level work, because the insertion of a single vertex into a whole class of data items may be too weak to produce large variations in the network measures. Thus, working at the component level better highlights the variation caused when y is inserted into a component.
Line I data set
Figure 4 shows the KAOG network on a two-class data set, in which the gray class presents a clear pattern of a straight line. Note that the three nearest neighbors of the new instance y belong to the following classes: gray, black, and black, respectively. In this way, the KAOG classification is not able to detect the pattern presented by the gray component. The whole classification process performed by the Bayes optimal classifier on the KAOG is described as follows: firstly, the algorithm computes the a priori probability of y belonging to each component as described in (6):
where the normalized purity measure of the components is used as the a priori probability (Φ_{gray} = 1 and Φ_{black} = 1). Then, the probability of the neighborhood Λ_{ y } given that y ∈ C_{gray} and y ∈ C_{black} is calculated as in (4):
Next, the normalization term can be computed through (5):
Finally, the result of the KAOG classification on the Line I data set can be obtained with (3) and (7):
Note that other low-level algorithms classify y as belonging to the black class too. On the other hand, Table 2 shows that, with a high contribution of the high-level term (λ = 0.6), the technique can produce the correct label for y, according to the pattern formation presented in Figure 4.
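Under simplifying assumptions, the low-level chain above (purity-based prior (6), neighborhood likelihood (4), normalization (5), and posterior (3), (7)) can be sketched as follows. The function name and data layout are hypothetical, and the paper's actual equations are richer than this reduction.

```python
def kaog_posterior(purity, neighbour_counts):
    """Posterior of y belonging to each component: normalised purity as
    the prior, times the fraction of y's nearest neighbours lying in the
    component, renormalised over all candidate components."""
    total_purity = sum(purity.values())
    k = sum(neighbour_counts.values())
    post = {c: (purity[c] / total_purity) * (neighbour_counts.get(c, 0) / k)
            for c in purity}
    z = sum(post.values())  # normalization term
    return {c: p / z for c, p in post.items()} if z else post
```

For the Line I setting (equal purities; neighbors gray, black, black), this yields 1/3 for gray and 2/3 for black, i.e., the low-level term alone votes for the black class.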
Line II data set
Figure 5 shows an interesting situation for two reasons. Firstly, the new instance is very close to the black components, which leads the traditional techniques to classify it into the black class. Secondly, the big distances between the new instance and the gray class data items make it difficult to assign it to the gray class. However, HL-KAOG can provide the correct classification due to its robustness in detecting pattern formation, as shown in Table 2.
Multiclass data set
Figure 6 shows the KAOG network on a data set with three classes and a test instance (white color) to be classified. Figure 6a presents an ambiguous case, where there is no pattern formation related to the test instance. In this case, the technique (independently of the λ value) classifies the test instance as belonging to the black class. On the other hand, Figure 6b shows more representative information about the pattern formation related to the test instance. Considering this clear pattern formation, the test instance is correctly classified when using a combination of low- and high-level algorithms with λ = 0.6 (Table 2).
Experiments on real data sets
We also conducted simulations on real-world data sets available in the UCI repository [29] and in the KEEL repository [30]. Table 3 provides details about the data sets used here.
The proposed technique is compared to decision tree, K-nearest neighbors, and support vector machine classifiers. All these traditional techniques are available in the Python machine learning module named scikit-learn [31]. A grid search algorithm is employed to perform parameter selection for these techniques. For the decision tree, scikit-learn provides an optimized version of the classification and regression tree (CART) algorithm [14]. In these experiments, two parameters are configured: the minimum density, optimized over the set {0, 0.1, 0.25, 0.5, 0.8, 1}, which controls a trade-off in an optimization heuristic, and the minimum number of samples required to be at a leaf node, here denoted as ms, which is optimized over the set ms ∈ {0, 1, 2, 3, 4, 5, 10, 15, 20, 30, 50}. Based on previous works applying KNN to real-world data sets [8], the K value is optimized over the set K ∈ {1, 3, 5, …, 31}, which is sufficient to provide the best results for this algorithm. In the SVM simulations, we reduce the search space for the optimization process by fixing a single well-known kernel, namely the radial basis function (RBF) kernel. The stopping criterion for the optimization method is defined as the Karush-Kuhn-Tucker violation being less than 10^{−3}. For each data set, the model selection is performed by considering the kernel parameter γ ∈ {2^{−4}, 2^{−3}, …, 2^{10}} and the cost parameter C ∈ {2^{−12}, 2^{−11}, …, 2^{2}}. Finally, the results obtained by each algorithm are averaged over 30 runs using the stratified 10-fold cross-validation process. Parameters were tuned using only the training data.
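The parameter selection protocol can be sketched as an exhaustive grid search. The sketch below is plain Python, independent of scikit-learn; the scoring callback stands in for the cross-validated accuracy, and the SVM exponent ranges follow one plausible reading of the printed sets (γ from 2^{−4} to 2^{10}, C from 2^{−12} to 2^{2}).

```python
from itertools import product

def grid_search(param_grid, score):
    """Exhaustive grid search: evaluate a scoring callback for every
    parameter combination and keep the best one."""
    best_params, best_score = None, float('-inf')
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        s = score(params)  # stands in for the cross-validated accuracy
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# SVM grids as read here: gamma in {2^-4, ..., 2^10}, C in {2^-12, ..., 2^2}
svm_grid = {
    'gamma': [2.0 ** e for e in range(-4, 11)],
    'C':     [2.0 ** e for e in range(-12, 3)],
}
```

In practice the same loop is what scikit-learn's grid search performs internally, with the score computed by cross-validation on the training folds only.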
On the other hand, parameter selection is unnecessary for our technique when building up the network. Differently from the parameters of traditional machine learning techniques (such as K in KNN, the kernel function in SVM, and so on), the λ variable does not influence the training phase. In this way, we fix λ_{1} = 0.2 (due to the good values found in [11]) and λ_{2} = 0.6 (to provide a larger portion of the high-level classification). In the previous section, artificial data sets provided particular situations where low-level techniques have trouble performing the classification. Here, we evaluate (i) the linear combination between the K-associated optimal graph and the high-level algorithm and (ii) the influence of λ in the context of real-world data sets. Since most techniques are essentially based on low-level characteristics, it is clear that this information contributes to a good classification. Consequently, when working together with low-level characteristics, high-level characteristics can improve the classification results by considering more than the physical attributes.
Table 4 shows the predictive performance of the algorithms on the real-world data sets presented in Table 3. In this table, ‘Acc.’ denotes the average accuracy of each technique and ‘Std.’ the standard deviation of these accuracies. In order to analyze the results statistically, we adopted a statistical test that compares multiple classifiers over multiple data sets [32]. Firstly, the Friedman test is computed to check whether the performances of the classifiers are significantly different. Using a significance level of 5%, the null hypothesis is rejected, meaning that the algorithms under study are not equivalent. As a post hoc test, the Nemenyi test is then employed (also with a significance level of 5%). The results of this test indicate that HL-KAOG and SVM provide similar results and that they outperform CART and KNN. This result is quite attractive because HL-KAOG, unlike the other traditional techniques, is able to capture spatial, functional, and topological relations in the data. In addition, the computer simulations show that a larger portion of high-level classification (λ_{2}) can improve the final prediction on some real-world data sets. This means that these data sets present well-defined patterns, which can be detected by considering mainly the topological structure of the data instead of their physical attributes.
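The first stage of the statistical protocol, the Friedman statistic over the per-data-set ranks of the classifiers [32], can be sketched in plain Python as follows (ties handled via average ranks). The Nemenyi post hoc step, which needs critical-value tables, is omitted from this sketch.

```python
def friedman_statistic(scores):
    """Friedman chi-square statistic for comparing k classifiers over
    N data sets (Demsar, 2006). scores[i][j] is the accuracy of
    classifier j on data set i; higher is better.
    Returns (statistic, average ranks)."""
    n, k = len(scores), len(scores[0])
    avg_rank = [0.0] * k
    for row in scores:
        # rank within the data set (1 = best), average ranks on ties
        order = sorted(range(k), key=lambda j: -row[j])
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            mean_rank = (i + j) / 2.0 + 1.0
            for t in range(i, j + 1):
                ranks[order[t]] = mean_rank
            i = j + 1
        for j in range(k):
            avg_rank[j] += ranks[j] / n
    stat = 12.0 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_rank) - k * (k + 1) ** 2 / 4.0)
    return stat, avg_rank
```

The statistic is then compared against a chi-square (or the F-distribution correction in [32]) to decide whether the classifiers differ significantly before running the post hoc comparisons.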
Conclusions
HL-KAOG takes advantage of the benefits provided by the K-associated optimal graph and by the high-level technique for data classification. Specifically, the former provides a nonparametric construction of the network based on the purity measure, while the latter is able to capture the pattern formation of the training data. Thus, the contributions of HL-KAOG include the following:

The technique does not work on classes but on the network components, i.e., each class can have more than one component. In this way, the insertion of a test instance can generate bigger variations in the network measures. Consequently, it is much easier to check the conformation of a test instance to the pattern formation of each class component. In contrast, the previous high-level work considers the network of a whole class of data items; in that case, the variations are very weak, and it is sometimes difficult to distinguish the conformation levels of the test instance to each class.

The use of the K-associated optimal graph to obtain a nonparametric network.

An automatic way to obtain the influence coefficient for the network measures. In addition, this coefficient adapts itself to each test instance.

The development of a new network measure, named component efficiency, to perform the high-level classification.
Computer simulations and statistical tests show that HL-KAOG presents good performance on both artificial and real data sets. In comparison with traditional machine learning techniques, computer simulations on real-world data sets showed that HL-KAOG and support vector machines provide similar results and that they outperform well-known techniques, such as decision trees and K-nearest neighbors. On the other hand, experiments performed on artificial data sets emphasized some drawbacks of the traditional machine learning techniques, which, differently from HL-KAOG, are unable to consider the formation pattern of the data.
Forthcoming work includes the incorporation of dynamical complex network measures, such as the random walk and the tourist walk, into the high-level classification algorithm, which can give a combined local and global vision of the networks under analysis in a natural way. Future research also includes a complete analysis of the high-level classification when dealing with imbalanced data sets and the investigation of complex network measures able to prevent the risk of overfitting in data classification.
Endnotes
^{a} A component is a subgraph in which every vertex v_{ i } can be reached from any other vertex v_{ j } of the component and cannot be reached from any vertex v_{ t } outside it.
^{b} The degree of a vertex v, denoted by k_{ v }, is the total number of vertices adjacent to v.
References
 1.
Newman M: The structure and function of complex networks. SIAM Rev 2003, 45(2):167–256. 10.1137/S003614450342480
 2.
Costa LDF, Oliveira ON, Travieso G, Rodrigues FA, Boas PRV, Antiqueira L, Viana MP, Da Rocha LEC: Analyzing and modeling real-world phenomena with complex networks: a survey of applications. Adv Phys 2011, 60(3):329–412.
 3.
Lu Z, Savas B, Tang W, Dhillon IS: Supervised link prediction using multiple sources. In 2010 IEEE international conference on data mining. Sydney, Australia; 2010:923–928.
 4.
Fortunato S: Community detection in graphs. Phys Rep 2010, 486(3–5):75–174.
 5.
Boccaletti S, Ivanchenko M, Latora V, Pluchino A, Rapisarda A: Detecting complex network modularity by dynamical clustering. Phys Rev E 2007, 75: 045102.
 6.
Newman M: Networks: an introduction. Oxford University Press, New York; 2010.
 7.
Bertini JR, Lopes AA, Zhao L: Partially labeled data stream classification with the semi-supervised K-associated graph. J Braz Comput Soc 2012, 18(4):299–310. 10.1007/s13173-012-0072-8
 8.
Bertini JR, Zhao L, Motta R, Lopes AA: A nonparametric classification method based on K-associated graphs. Inf Sci 2011, 181(24):5435–5456. 10.1016/j.ins.2011.07.043
 9.
Carneiro MG, Rosa JL, Lopes AA, Zhao L: Classificação de alto nível utilizando grafo K-associados ótimo [High-level classification using the K-associated optimal graph]. In IV international workshop on web and text intelligence. Curitiba, Brazil; 2012:1–10.
 10.
Cupertino TH, Carneiro MG, Zhao L: Dimensionality reduction with the K-associated optimal graph applied to image classification. In 2013 IEEE international conference on imaging systems and techniques. Beijing, China; 2013:366–371.
 11.
Silva TC, Zhao L: Network-based high level data classification. IEEE Trans Neural Netw Learn Syst 2012, 23(6):954–970.
 12.
Bishop CM: Pattern recognition and machine learning. Information science and statistics. Springer-Verlag, New York; 2006.
 13.
Mitchell T: Machine learning. McGraw-Hill series in computer science. McGraw-Hill, New York; 1997.
 14.
Breiman L: Classification and regression trees. Chapman & Hall, London; 1984.
 15.
Quinlan J: Induction of decision trees. Mach Learn 1986, 1: 81–106.
 16.
Aha DW, Kibler D, Albert M: Instancebased learning algorithms. Mach Learn 1991, 6: 37–66.
 17.
Haykin S: Neural networks: a comprehensive foundation. Prentice Hall PTR, Upper Saddle River; 1998.
 18.
Neapolitan RE: Learning Bayesian networks. Prentice-Hall, Upper Saddle River; 2003.
 19.
Cortes C, Vapnik V: Support-vector networks. Mach Learn 1995, 20(3):273–297.
 20.
Chapelle O, Scholkopf B, Zien A: Semi-supervised learning. MIT Press, Cambridge; 2006.
 21.
Schaeffer SE: Graph clustering. Comput Sci Rev 2007, 1(1):27–64. 10.1016/j.cosrev.2007.05.001
 22.
Yan S, Xu D, Zhang B, Zhang HJ, Yang Q, Lin S: Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 2007, 29(1):40–51.
 23.
Carneiro MG, Zhao L: High level classification totally based on complex networks. In Proceedings of the 1st BRICS Countries Congress. Porto de Galinhas, Brazil; 2013:1–8.
 24.
Rossi R, de Paulo Faleiros T, de Andrade Lopes A, Rezende S: Inductive model generation for text categorization using a bipartite heterogeneous network. In 2012 IEEE international conference on data mining. Brussels, Belgium; 2012:1086–1091.
 25.
Andrade RFS, Miranda JGV, Pinho STR, Lobão TP: Characterization of complex networks by higher order neighborhood properties. Eur Phys J B 2006, 61(2):28.
 26.
Newman MEJ: Assortative mixing in networks. Phys Rev Lett 2002, 89: 208701.
 27.
Latora V, Marchiori M: Efficient behavior of smallworld networks. Phys Rev Lett 2001, 87: 198701.
 28.
Watts D, Strogatz S: Collective dynamics of ‘small-world’ networks. Nature 1998, 393: 440–442. 10.1038/30918
 29.
Frank A, Asuncion A: UCI machine learning repository. 2010. http://archive.ics.uci.edu/ml. Accessed 10 Nov 2013
 30.
Alcalá-Fdez J, Fernández A, Luengo J, Derrac J, García S: KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Multiple-Valued Logic Soft Comput 2011, 17(2–3):255–287.
 31.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E: Scikit-learn: machine learning in Python. J Mach Learn Res 2011, 12: 2825–2830.
 32.
Demšar J: Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 2006, 7: 1–30.
Acknowledgements
The authors would like to acknowledge the São Paulo State Research Foundation (FAPESP) and the Brazilian National Council for Scientific and Technological Development (CNPq) for the financial support given to this research.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
MGC is the main author of this paper, which contains some results of his doctoral thesis in the Computer Science Program at the Institute of Mathematics and Computer Science, University of São Paulo. MGC and LZ contributed to the conceptual aspects of this work. The experiments and manuscript writing were performed by MGC under the supervision of LZ, JLGR, and AAL. All authors read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Carneiro, M.G., Rosa, J.L., Lopes, A.A. et al. Network-based data classification: combining K-associated optimal graphs and high-level prediction. J Braz Comput Soc 20, 14 (2014). https://doi.org/10.1186/1678-4804-20-14
Keywords
 Highlevel classification
 Complex network
 Machine learning
 Data classification