Network-based data classification: combining K-associated optimal graphs and high-level prediction
 Murillo G Carneiro^{1},
 João LG Rosa^{2},
 Alneu A Lopes^{2} and
 Liang Zhao^{3}
https://doi.org/10.1186/167848042014
© Carneiro et al.; licensee Springer. 2014
Received: 15 May 2013
Accepted: 2 April 2014
Published: 17 June 2014
Abstract
Background
Traditional data classification techniques usually divide the data space into subspaces, each representing a class. Such a division is carried out considering only physical attributes of the training data (e.g., distance, similarity, or distribution). This approach is called low-level classification. On the other hand, the network- or graph-based approach is able to capture spatial, functional, and topological relations among data, providing a so-called high-level classification. Usually, network-based algorithms consist of two steps: network construction and classification. Although complex network measures are employed in the classification step to capture patterns of the input data, the network formation step is critical and has not been well explored. Some network formation techniques, such as the K-nearest neighbors algorithm (KNN) and ε-radius, consider strictly local information of the data and, moreover, depend on parameters that are not easy to set.
Methods
We propose a network-based classification technique, named high-level classification on K-associated optimal graph (HL-KAOG), combining the K-associated optimal graph and high-level prediction. In this way, the network construction algorithm is nonparametric, and it considers both local and global information of the training data. In addition, since the proposed technique combines low-level and high-level terms, it classifies data not only by physical features but also by checking the conformation of the test instance to the formation pattern of each class component. Computer simulations are conducted to assess the effectiveness of the proposed technique.
Results
The results show that a larger portion of the high-level term is required to get correct classification when there is a complex-formed and well-defined pattern in the data set. In this case, we also show that traditional classification algorithms are unable to identify those data patterns. Moreover, computer simulations on real-world data sets show that HL-KAOG and support vector machines provide similar results and that they outperform well-known techniques, such as decision trees and K-nearest neighbors.
Conclusions
The proposed technique works with a very small number of parameters and obtains good predictive performance in comparison with traditional techniques. In addition, the combination of high-level and low-level algorithms based on network components can allow greater exploration of patterns in data sets.
Keywords
High-level classification; Complex network; Machine learning; Data classification
Background
Introduction
Complex networks gather concepts from statistics, dynamical systems, and graph theory. Basically, they are large-scale graphs with nontrivial connection patterns [1]. In addition, the ability to capture spatial, functional, and topological relations is one of their salient characteristics. Nowadays, complex networks appear in many scenarios [2], such as social networks [3], biological networks [4], the Internet and World Wide Web [5], electric energy networks [6], and classification and pattern recognition [7–11]. Thus, distinct fields of science, such as physics, mathematics, biology, computer science, and engineering, have contributed to the large advances in the study of complex networks.
Data classification is an important task in machine learning. It concerns the construction of computer programs able to learn from labeled data sets and, subsequently, to predict unlabeled instances [12, 13]. Due to the vast number of applications, many data classification techniques have been developed. Some of the well-known ones are decision trees [14, 15], instance-based learning, e.g., the K-nearest neighbors algorithm (KNN) [16], artificial neural networks [17], naive Bayes [18], and support vector machines (SVM) [19]. Nevertheless, most of them are highly dependent on appropriate parameter tuning. Examples include the confidence factor and the minimum number of cases to partition a set in the C4.5 decision tree; the K value in KNN; the stop criterion, the number of neurons, the number of hidden layers, and others in artificial neural networks; and the soft margin, the kernel function, the kernel parameters, the stopping criterion, and others in SVM.
Complex networks have made considerable contributions to the study of machine learning. However, most of the research related to complex networks is applied to data clustering, dimensionality reduction, and semi-supervised learning [20–22]. Recently, some network-based techniques have been proposed to solve supervised learning problems, such as data classification [8, 10, 11, 23, 24]. The obtained results show that network-based techniques have advantages over traditional ones in many aspects, such as the ability to detect classes of different shapes, the absence of parameters, and the ability to classify data according to the pattern formation of the training data.
Despite the fact that high-level classification offers a new vision of data classification, the network formation still depends on some parameters, such as the parameter K in KNN and the parameter ε in the ε-radius technique. Moreover, the technique has other parameters, which assign a weight to each network measure employed in the framework [11]. All these parameters are problem-oriented; selecting their values is time-consuming and strongly influences the quality of classification.
In this paper, we propose a network-based classification technique combining two techniques: KAOG and the high-level classification. We refer to it as high-level classification on K-associated optimal graph (HL-KAOG). It considers not only the physical attributes but also the pattern formation of the data. Specifically, the proposed technique provides the following:

A nonparametric way to construct the network

A high-level data classification with only one parameter

A more sensitive high-level classification, obtained by examining the network components instead of classes. This is relevant because classes can consist of several (possibly distinct) components. Thus, the components are smaller than the networks of whole classes. Consequently, the high-level tests are more sensitive and good results can be obtained

An automatic way to obtain the influence coefficient for the network measures. In addition, this coefficient adapts itself according to each test instance

A new complex network measure adapted to high-level classification, named component efficiency
Computer simulations have been conducted to assess the effectiveness of the proposed technique. Interestingly, the results show that a larger portion of the high-level term is required to get correct classification when there is a complex-formed and well-defined pattern in the data set. In this case, we also show that traditional classification algorithms are unable to identify those data patterns. Moreover, computer simulations on real-world data sets show that HL-KAOG and support vector machines provide similar results and that they outperform well-known techniques, such as decision trees and K-nearest neighbors.
The remainder of the paper is organized as follows: background and a brief overview of related work are presented in the ‘Overview’ section. The proposed technique and the contributions of this work are detailed in the ‘Methods’ section. Empirical evaluation and discussion of the proposed algorithm on artificial and real data sets are shown in the ‘Results and discussion’ section. Finally, the ‘Conclusions’ section concludes the paper.
Overview
In this section, we review the most relevant classification techniques. Firstly, we present an overview of network-based data classification. Then, we describe the network construction and classification using the K-associated optimal graph. Finally, we present the rationale behind the high-level classification technique.
Network-based data classification
In data classification, the algorithms receive as input a given training data set, denoted here X = {(inp_{1},lab_{1}),…,(inp_{n},lab_{n})}, where the pair (inp_{i},lab_{i}) is the i-th data instance in the data set. Here, inp_{i} = (x_{1},…,x_{d}) represents the attributes of a d-dimensional data item and lab_{i} ∈ L = {L_{1},…,L_{C}} represents the target class or label associated with that data item.
In network-based classification, the training data is usually represented as a network in which each instance is a vertex and the edges (or links) represent the similarity relations between vertices. The goal of the training phase is to induce a classifier from inp → lab by using the training data X.
In the prediction (classification) phase, the goal is to use the constructed classifier to predict new input instances unseen in training. So, there is a set of test instances Y = {(inp_{n+1},lab_{n+1}),…,(inp_{ z },lab_{ z })}. In this phase, the algorithm receives only the inp and uses the constructed network to predict the correct class lab for that inp.
K-associated optimal graph
KAOG uses a purity measure to construct and optimize each component of the network. The resulting network, together with the Bayes optimal classifier, is used for the classification of new instances. Specifically, a KAOG is a final network merging several K-associated graphs while maintaining or improving their purity measure. For the sake of clarity, the network construction phase can be divided into two concepts: creating a K-associated graph (Kac) and creating a K-associated optimal graph.
The K-associated graph builds up a network from a given data set and a K value, which is related to the number of neighbors to be considered for each vertex. Basically, the algorithm only connects a vertex v_{i} to v_{j} if v_{j} is one of the K nearest neighbors of v_{i} and if v_{i} and v_{j} have the same class label.
 1. Asymmetrical property: according to Algorithm 1, Kac returns a digraph. Digraphs provide a good representation of the asymmetric nature of many data sets because, in many cases, $v_j \in \Lambda_{v_i,K}$ does not imply $v_i \in \Lambda_{v_j,K}$.
 2. Purity: this measure expresses the level of mixture of a component in relation to other components of distinct classes. Basically, the purity measure is given by
$\Phi_{\alpha} = \frac{D_{\alpha}}{2K},$ (1)
in which $D_{\alpha}$ is the average degree of component α, that is, $D_{\alpha} = \frac{1}{N}\sum_{i=1}^{N} k_i$, N is the total number of vertices in α, and k_{i} is the degree^{b} of vertex i.
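The construction rule and the purity measure can be illustrated with a short sketch. It is not Algorithm 1 itself: the function names are hypothetical, the Euclidean distance is assumed, components are taken to be weakly connected components, $D_{\alpha}$ is assumed to be the average degree of the component, and the degree of a vertex is assumed to count both incoming and outgoing links.

```python
import math


def k_associated_graph(X, y, K):
    """Directed edges (i, j) such that j is one of the K nearest
    neighbors of i and both vertices carry the same class label."""
    edges = []
    for i, xi in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        others.sort(key=lambda j: math.dist(xi, X[j]))  # stable sort
        edges.extend((i, j) for j in others[:K] if y[i] == y[j])
    return edges


def weak_components(n, edges):
    """Group vertices 0..n-1 into weakly connected components."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


def purity(component, edges, K):
    """Average degree (in + out) of the component divided by 2K."""
    members = set(component)
    degree_sum = sum((i in members) + (j in members) for i, j in edges)
    return degree_sum / len(component) / (2 * K)


# Two well-separated classes on a line (illustrative data).
X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]
edges = k_associated_graph(X, y, K=2)
comps = weak_components(len(X), edges)
purities = [purity(c, edges, K=2) for c in comps]
```

In this toy data set, every vertex finds both of its two nearest neighbors inside its own class, so each component reaches the maximum average degree 2K and its purity equals 1. Mixing the classes lowers the number of same-class links and therefore the purity.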
Despite the fact that the K-associated graph presents good performance on data classification, there is a drawback: it uses the same value of K to form networks for all data classes. However, a network obtained with a single value of K is rarely able to produce the best configuration of instances in the components in terms of the purity measure. An algorithm that is able to adapt itself to different classes of the data set is therefore welcome. The idea of KAOG is to obtain the optimal K value for each component in order to maximize its purity [8].
Algorithm 2 shows in detail the construction of KAOG from K-associated graphs. In the algorithm, G^{(opt)} denotes the K-associated optimal graph and lastAvgDegree is the average degree of the network before incrementing the value of K; note that no parameter is introduced in the algorithm. In the following, we describe Algorithm 2. In the first lines, K starts with the value 1 and, thus, the 1-associated graph is considered as the optimal graph (G^{(opt)}) at this moment. After the initial setting, a loop starts to merge the subsequent K-associated graphs by increasing K, while improving the purity of the network encountered so far, until the optimal network measured by the purity degree [8] is reached. Between lines 7 and 12, the algorithm verifies, for each component of the K-associated graph ($C_{\beta}^{(K)}$), whether the condition given by line 8 is satisfied. If so, the components that compose $C_{\beta}^{(K)}$ are removed from the optimal graph (line 9). In line 10, the new component is added to G^{(opt)}. At the end, the algorithm returns the obtained components with their respective values of K and purities.
Regarding the time complexity of the KAOG network construction, given a training set with N instances and d attributes, generating the corresponding distance matrix requires N(N − 1) ∗ d operations, which yields a complexity order of O(N^{2}). Other functions, such as finding the graph components and computing the purity measure of the graph components, yield a complexity order of O(N). Therefore, the time complexity to build up the network is O(N^{2}) [8].
At the end, BayesOC chooses the class with the largest a posteriori probability.
Regarding time complexity, the BayesOC classification is of order O(N), which is related to the calculation of the distances between the training set and the test case [8].
High-level classification
High-level prediction considers not only physical attributes but also global semantic characteristics of the data [11]. This is due to the use of complex network measures, which are able to capture the pattern formation of the input data. The next sections provide more details about high-level classification.
where KNN and ε-radius return, respectively, the set containing the K nearest vertices of the same class as vertex i and the set of vertices of the same class as i whose distance from i is smaller than ε. Note that K and ε are user-controllable parameters. In addition, the algorithm connects the vertex i to other vertices using ε-radius when the condition is satisfied and using KNN otherwise.
As shown in Equation 8, the network formation algorithm depends on some parameters. Different parameter values produce very distinct results. Moreover, the model selection of the parameters is time-consuming. For this reason, we propose a nonparametric network formation method based on the KAOG to work together with the high-level technique. The ‘Methods’ section describes our proposal.
Considering a test instance y ∈ Y, $M_y^{(J)}$ denotes the association produced by the low- and high-level algorithms when evaluating instance y for the class J. Also in the equation, the variable $P_y^{(J)} \in [0,1]$ establishes the association produced by the low-level classifier between the instance y and the class J. On the other hand, the variable $H_y^{(J)} \in [0,1]$ points to an association produced by the high-level technique (composed of complex network measures, such as assortativity and clustering coefficient [1]) between y and the class J. Finally, λ ∈ [0,1] is a user-controllable variable that defines the weight assigned to each produced classification. Note that λ just defines the contribution of the low- and high-level classifications. For example, if λ = 0, only the low-level algorithm works.
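The role of λ can be illustrated with a minimal sketch. It assumes that the two associations are mixed as the convex combination $M_y^{(J)} = (1-\lambda)P_y^{(J)} + \lambda H_y^{(J)}$, which agrees with the statement that only the low-level algorithm works when λ = 0. The function names and the numeric associations are hypothetical.

```python
def combine(low, high, lam):
    """Mix low-level and high-level associations per class,
    assuming the convex form (1 - lam) * P + lam * H."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return {c: (1.0 - lam) * low[c] + lam * high[c] for c in low}


def predict(low, high, lam):
    """Return the class with the largest combined association."""
    scores = combine(low, high, lam)
    return max(scores, key=scores.get)


# Hypothetical associations for a single test instance.
low = {"gray": 0.4, "black": 0.6}    # physical attributes favor black
high = {"gray": 0.7, "black": 0.3}   # pattern formation favors gray

label_low_only = predict(low, high, lam=0.0)
label_mixed = predict(low, high, lam=0.6)
```

With these illustrative numbers, the predicted label changes from black to gray once the high-level term receives the larger weight, which is the kind of behavior the framework is designed to allow.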
where $H_y^{(J)} \in [0,1]$, u is related to the network measures employed in the high-level algorithm, δ(u) ∈ [0,1], ∀ u ∈ {1,…,Z}, is a user-controllable variable that indicates the influence of each network measure in the classification process, and $f_y^{(J)}(u)$ provides an answer as to whether the test instance y presents the same patterns as the class J or not, considering the u-th network measure applied. The denominator term is only for normalization. There is also a constraint on δ(u): (10) is valid only if $\sum_{u=1}^{Z}\delta(u)=1$.
in which $\Delta G_y^{(J)}(u) \in [0,1]$ represents the variation that occurs in a complex network measure whenever a new instance y ∈ Y is inserted and p^{(J)} ∈ [0,1] is the proportion of instances that belong to the class J.
Complex network measures
Complex network measures are used to provide a high-level analysis of the data [25]. When a new instance y needs to be classified, the technique computes the impact of inserting this new vertex into each class in isolation. Basically, the variation of the network measures indicates the class to which y belongs. In other words, if there is little variation in the formation pattern of a class when y is connected to it, high-level prediction returns a large value, indicating that y is in conformity with this pattern. Conversely, if there is a great variation when linking y to a class, it returns a small value, denoting that y is not in conformity with this pattern.
In [11], three network measures are employed to check the pattern formation of the input data: assortativity, average degree, and clustering coefficient [26]. A more detailed view of the network measures employed in HL-KAOG is provided in the next section.
Methods
Most machine learning algorithms perform the classification exclusively based on physical attributes of the data. They are called low-level algorithms. One example is the BayesOC algorithm shown in the previous section. On the other hand, complex network-based techniques provide a different kind of classification that is able to capture formation patterns in the data sets.
The principal drawback in the use of complex network measures for data classification is the network formation. Some techniques, such as KNN and ε-radius, have been widely used in the literature, but they depend on parameters. This means that the technique cannot adapt to the information in the data by itself, so different parameter values produce very distinct results. In HL-KAOG, we exploit the ability of KAOG to produce an efficient and nonparametric network to address this problem. In addition, other contributions are presented here.
This section describes the principal contributions of this investigation. The ‘Component efficiency measure’ section shows a new complex network measure for high-level classification: the component efficiency. The ‘Linking high-level prediction and KAOG algorithm’ section provides details about how the KAOG network and the high-level classifier work together. The ‘High-level classification on network components’ section presents an important conceptual modification in our high-level approach: complex network measures are employed on graph components. The ‘Nonparametric influence coefficient for the network measures’ section shows an automatic way to obtain the influence coefficient of the network measures. The ‘Complex network measures per component’ section provides the adaptation of the complex network measures to work on components instead of classes.
Component efficiency measure
The component efficiency measure quantifies the efficiency of the component in sending information between its vertices. The component efficiency is a new network measure incorporated into the high-level classification technique. Its development is motivated by the concept of efficiency of a network [27], which measures how efficiently the network exchanges information. Since our high-level algorithm works at the component level, we named our measure component efficiency.
where V_{ i }denotes the number of links from i, Λ_{ i }represents the vertex that receives links from i, and q_{ ij }is related to the geodesic distance between i and j.
in which V_{ α }is the number of vertices in the component α.
Linking high-level prediction and KAOG algorithm
We adapted the concepts of the K-associated optimal graph and the high-level algorithm so that they can work together. Firstly, the K-associated optimal graph divides the network into components according to the purity measure. Thus, the high-level technique proposed here considers components instead of classes. In addition, unlike BayesOC, which considers only those components in which at least one vertex belongs to the nearest neighbors of the test instance, the high-level algorithm employs complex network measures to examine whether the insertion of the test instance in a component is in conformity with the formation pattern of that component. Figure 1 shows an illustrative example in which the network topology cannot be detected by the original KAOG. So, if we employed KAOG directly in our model, it would not be able to classify the new instance into the gray cycle class. Instead, the instance would be classified into the black cycle class. On the other hand, our model permits the correct classification because it uses the information available in each component constructed by KAOG in a different way.
 1. Firstly, the proposed technique uses the component efficiency measure to determine the components into which y can be inserted. This information is important especially because it considers the local features of the components and establishes a heuristic that excludes components that are not in conformity with the insertion of y into them.
 2. The next step is the insertion of y in each α ∈ F_{y}. According to our technique, y makes connections with each vertex i ∈ α following the equation given by
$\alpha_y \Leftarrow \alpha \cup \{\, i \mid e_y^{(i)} \le E^{(\alpha)} \,\},$ (15)
where α_{y} includes component α and the connections between y and its vertices, $e_y^{(i)}$ is the local efficiency in exchanging information between y and i, and $e_y^{(i)} \le E^{(\alpha)}$ is the condition to be satisfied to assure a link between y and i.
Note that if F_{y} = ∅ in (14) (a very unusual situation), the algorithm employs the K_{α} value associated with each component α ∈ Λ and verifies whether any vertex in α is one of the K_{α} nearest neighbors of y. If there is at least one vertex that satisfies this condition in component α, then the complex network measures are applied to this component; otherwise, α is not considered in the classification phase.
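The insertion rule in (15) can be sketched in a few lines. Several points are assumptions made only for illustration: q_{ij} and $e_y^{(i)}$ are both read as Euclidean distances, the local efficiency of a vertex is taken as the average length of its outgoing links, E^{(α)} is taken as the mean of the local efficiencies over the component, and a vertex without outgoing links contributes zero. The function names are hypothetical.

```python
import math


def component_efficiency(points, component, edges):
    """Mean, over the vertices of the component, of the average length
    of the links leaving each vertex (assumed reading of E^(alpha))."""
    members = set(component)
    total = 0.0
    for i in component:
        lengths = [math.dist(points[i], points[j])
                   for a, j in edges if a == i and j in members]
        if lengths:  # vertices without outgoing links contribute 0
            total += sum(lengths) / len(lengths)
    return total / len(component)


def link_candidates(y, points, component, edges):
    """Vertices i of the component with e_y^(i) <= E^(alpha), taking
    e_y^(i) as the distance between y and i (assumption)."""
    threshold = component_efficiency(points, component, edges)
    return [i for i in component if math.dist(y, points[i]) <= threshold]


# A small chain-shaped component with unit-length links.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
component = [0, 1, 2]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]

near = link_candidates((1.0, 0.5), points, component, edges)
far = link_candidates((5.0, 5.0), points, component, edges)
```

Under this reading, a test instance that lies within the typical link length of the component is connected to the vertices that satisfy the condition, while an empty result means that the component would not take part in the high-level tests for that instance.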
High-level classification on network components
where α^{J} denotes a component α whose instances belong to class J, $P_y^{(\alpha^J)}$ establishes the association produced by a low-level classifier between the instance y and the component α, and $H_y^{(\alpha^J)}$ points to an association produced by the high-level technique between y and the component α. The general idea behind $H_y^{(\alpha^J)}$ is very simple: (i) the KAOG network finds a set of components based on the purity measure, (ii) the high-level technique examines these components in relation to the insertion of a test instance y, and (iii) the probabilities for each component α are obtained.
in which u is related to the network measures employed in the high-level algorithm, δ_{y}(u) ∈ [0,1], ∀ u ∈ {1,…,Z}, indicates the influence of each network measure in the classification process, and $f_y^{(\alpha)}(u)$ provides an answer as to whether the test instance y presents the same patterns as the component α or not, considering the u-th network measure. The denominator term in (17) is only for normalization. Details about δ_{y}(u) are provided in the ‘Nonparametric influence coefficient for the network measures’ section.
where the denominator term in (18) is only for normalization and p^{(α)} ∈ [0,1] is the proportion of instances that belong to the component α.
Nonparametric influence coefficient for the network measures
in which $\Delta G_y^{(\alpha)}(u) \in [0,1]$ represents the variation that occurs in a complex network measure whenever a new instance y ∈ Y is inserted. Therefore, δ_{y}(u) is based on the opposite of the difference between the biggest and the smallest u-th network measure variation over all components α. The idea of determining δ_{y}(u) in this way is to balance all the employed network measures in the decision process and not permit a single network measure to dominate the classification decision. Note that this equation is valid only if $\sum_{u=1}^{Z}\delta_y(u)=1$.
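The quantities described in this and the previous section can be traced numerically. The sketch below is a reconstruction from the verbal descriptions, not the published equations: it assumes $f_y^{(\alpha)}(u) \propto \Delta G_y^{(\alpha)}(u)\,p^{(\alpha)}$, $\delta_y(u) \propto 1 - (\max_\alpha \Delta G - \min_\alpha \Delta G)$, and $H_y^{(\alpha)} \propto \sum_u \delta_y(u)[1 - f_y^{(\alpha)}(u)]$, each normalized to sum to 1. The variations are taken from Table 1; the component proportions (1/3, 2/3) are an assumption, chosen because they reproduce the f columns of that table. Function names are hypothetical.

```python
def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def conformity(delta_g, proportions):
    """f_y^(alpha)(u) for one measure u: variation weighted by the
    component proportion, normalized over the components."""
    return normalize([dg * p for dg, p in zip(delta_g, proportions)])


def influence(delta_g_by_measure):
    """delta_y(u): one minus the spread of the u-th variation across
    components, normalized over the measures."""
    return normalize([1.0 - (max(dg) - min(dg)) for dg in delta_g_by_measure])


def high_level(delta_g_by_measure, proportions):
    """H_y^(alpha): weighted sum of (1 - f), normalized over components."""
    weights = influence(delta_g_by_measure)
    f = [conformity(dg, proportions) for dg in delta_g_by_measure]
    raw = [sum(w * (1.0 - fu[a]) for w, fu in zip(weights, f))
           for a in range(len(proportions))]
    return normalize(raw)


# Variations from Table 1, ordered (gray, black) for each measure.
delta_g = [[0.930, 0.070],   # assortativity
           [0.408, 0.592],   # clustering coefficient
           [0.657, 0.343]]   # average degree
proportions = [1 / 3, 2 / 3]  # assumed proportions (gray, black)

f_values = [[round(v, 3) for v in conformity(dg, proportions)]
            for dg in delta_g]
weights = influence(delta_g)
H = high_level(delta_g, proportions)
```

Under these assumptions, the assortativity measure, whose variation differs most between the two components, receives the smallest weight, and the high-level association of the gray component comes out at roughly 0.59.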
Complex network measures per component
The complex network measures presented in the previous section work on classes. Unlike the approach proposed in [11], the approach proposed in this work considers pattern formation per component. This feature makes the algorithm more sensitive to the network measure variations. In addition, the components are constructed according to the purity measure, which gives more precise information about the data set.
In this section, we adapt assortativity and clustering coefficient to work on components. Also, as the time complexity of the high-level classification is directly related to the network measures employed, we present the complexity order of each measure.
Assortativity ( $\Delta {G}_{y}^{\left(J\right)}\left(1\right)$ )
where r^{(α)} ∈ [−1,1], U_{α} = {u : i_{u} ∈ α ∧ k_{u} ∈ α} encompasses all the edges within the component α, u represents an edge, and i_{u}, k_{u} indicate the vertices at each end of the edge u.
The assortativity measure yields the complexity order of O(E+V), where E and V denote, respectively, the number of edges and the number of vertices in the graph.
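As an illustration of the measure, the sketch below computes the standard degree assortativity coefficient over the edges of a component. It treats the edges as undirected and uses the usual edge-based formula; whether the published equation handles link direction in the same way is an assumption, and the function name is hypothetical.

```python
def assortativity(edges):
    """Degree assortativity of an undirected edge list.
    Undefined (division by zero) when all end degrees are equal."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    n_edges = len(edges)
    # Averages over edges of the end-degree product, mean, and mean square.
    prod = sum(degree[a] * degree[b] for a, b in edges) / n_edges
    mean = sum(0.5 * (degree[a] + degree[b]) for a, b in edges) / n_edges
    sq = sum(0.5 * (degree[a] ** 2 + degree[b] ** 2) for a, b in edges) / n_edges
    return (prod - mean ** 2) / (sq - mean ** 2)


path = [(0, 1), (1, 2), (2, 3)]   # chain of four vertices
star = [(0, 1), (0, 2), (0, 3)]   # one hub with three leaves
r_path = assortativity(path)
r_star = assortativity(star)
```

One pass over the edges builds the degrees and a second pass accumulates the sums, which is in line with the O(E + V) order stated above. The star is the extreme disassortative case, since every edge joins the hub to a leaf.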
Clustering coefficient ( $\Delta {G}_{y}^{\left(J\right)}\left(2\right)$ )
The clustering coefficient measure yields the complexity order of O (V∗p^{2}), where V and p denote, respectively, the number of vertices in the graph and the average node degree.
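A minimal sketch of the measure follows, assuming the usual local clustering coefficient (fraction of connected neighbor pairs) averaged over the vertices of an undirected component, with the coefficient of vertices of degree below two taken as zero. The function name is hypothetical.

```python
def average_clustering(edges):
    """Mean local clustering coefficient of an undirected simple graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # coefficient taken as 0 for degree < 2
        nbr_list = sorted(nbrs)
        # Count the pairs of neighbors that are themselves linked.
        links = sum(1 for x in range(k) for z in range(x + 1, k)
                    if nbr_list[z] in adj[nbr_list[x]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


triangle = [(0, 1), (1, 2), (0, 2)]
triangle_with_tail = triangle + [(2, 3)]
cc_triangle = average_clustering(triangle)
cc_tail = average_clustering(triangle_with_tail)
```

The inner loop visits every pair of neighbors of each vertex, which corresponds to the O(V∗p^{2}) order mentioned above. Attaching a single pendant vertex to the triangle lowers the average, which shows how the insertion of one vertex can change this measure.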
Average degree ( $\Delta {G}_{y}^{\left(J\right)}\left(3\right)$ )
The average degree measure yields the complexity order of O (V), where V denotes the number of vertices in the graph.
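The way a measure is turned into a variation can be checked against Table 1. The sketch assumes that $\Delta G_y^{(\alpha)}(u)$ is the absolute change of the measure caused by inserting y, normalized over the components; this assumption reproduces the average degree columns of Table 1. The function name is hypothetical.

```python
def normalized_variation(before, after):
    """Absolute change of a measure per component, scaled to sum to 1."""
    changes = [abs(a - b) for b, a in zip(before, after)]
    total = sum(changes)
    return [c / total for c in changes]


# Average degree before and after inserting y (Table 1, gray then black).
before = [6.000, 8.000]
after = [5.667, 7.826]
delta_g = [round(v, 3) for v in normalized_variation(before, after)]
```

The gray component, whose average degree drops by about 0.333, receives a variation of 0.657, against 0.343 for the black component, in agreement with the table.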
Results and discussion
In this section, we present a set of computer simulations to assess the effectiveness of the HL-KAOG technique. The ‘Experiments on artificial data sets’ section supplies results obtained on artificial data sets, which emphasize the key features of HL-KAOG. The ‘Experiments on real data sets’ section provides simulations on real-world data sets, which highlight the ability of HL-KAOG to perform data classification. Note that the Euclidean distance is used as the similarity measure in all the experiments.
Experiments on artificial data sets
Initially, we use some artificial data sets presenting strong patterns to evaluate the proposed technique. These examples provide particular situations where low-level classifiers by themselves have trouble correctly classifying the data items in the test set. Thus, this section serves to better motivate the usage of the proposed model.
The first step of HL-KAOG is the construction of the KAOG network. Unlike other network construction techniques, KAOG is nonparametric and builds up the network considering the purity measure (1). In the second step, HL-KAOG employs the hybrid low- and high-level techniques to classify the test instances. The low-level classification here uses the Bayes optimal classifier (3), and the high-level term uses the complex network measures given by (17) to capture the pattern formation of each component. Finally, the general framework given by (16) combines the low- and high-level techniques.
Triangle data set
Table 1 Employed complex network measures in high-level classification on the Triangle data set
C_{α} | Assortativity | | | | Clustering coefficient | | | | Average degree | | | |
 | $r^{(\cdot)}$ | $r'^{(\cdot)}$ | $\Delta G_y^{(\cdot)}(1)$ | $f_y^{(\cdot)}(1)$ | $CC^{(\cdot)}$ | $CC'^{(\cdot)}$ | $\Delta G_y^{(\cdot)}(2)$ | $f_y^{(\cdot)}(2)$ | $E^{(\cdot)}$ | $E_y^{(\cdot)}$ | $\Delta G_y^{(\cdot)}(3)$ | $f_y^{(\cdot)}(3)$ |

C_{gray} | 0.977 | 0.968 | 0.930 | 0.869 | 0.118 | 0.101 | 0.408 | 0.256 | 6.000 | 5.667 | 0.657 | 0.489 |
C_{black} | 0.850 | 0.851 | 0.070 | 0.131 | 0.188 | 0.212 | 0.592 | 0.744 | 8.000 | 7.826 | 0.343 | 0.511 |
Table 2 Classification results (in terms of probability) obtained for each artificial data set when λ = 0.2 and λ = 0.6
Data set | Class | λ = 0.2 | λ = 0.6 |

Triangle data set | gray | 0.439 | 0.516 |
Triangle data set | black | 0.561 | 0.483 |
Line I data set | gray | 0.481 | 0.586 |
Line I data set | black | 0.519 | 0.414 |
Line II data set | gray_{1} | 0.000 | 0.000 |
Line II data set | gray_{2} | 0.200 | 0.600 |
Line II data set | black_{1} | 0.400 | 0.200 |
Line II data set | black_{2} | 0.400 | 0.200 |
Multiclass data set | black | 0.800 | 0.400 |
Multiclass data set | gray | 0.000 | 0.000 |
Multiclass data set | green | 0.200 | 0.600 |
Line I data set
Note that other low-level algorithms also classify y as belonging to the black class. On the other hand, Table 2 shows that, with a high contribution of the high-level term (λ = 0.6), the technique can produce a correct label for y, according to the pattern formation presented in Figure 4.
Line II data set
Multiclass data set
Experiments on real data sets
Table 3 Brief description of the data sets
Name  #Inst.  #Attr.  #Classes  Maj. class (%) 

Iris  150  4  3  33.33 
Glass  214  9  7  35.51 
Balance  625  4  3  46.08 
Monks2  601  6  2  65.72 
Ecoli  336  7  8  42.56 
Append.  106  7  2  80.19 
Thyroid  215  5  3  69.77 
Sonar  208  60  2  53.37 
Digits  5,620  64  10  10.18 
SPECTF  267  44  2  79.40 
The proposed technique is compared to decision tree, K-nearest neighbors, and support vector machine. All these traditional techniques are available in the Python machine learning module named scikit-learn [31]. A grid search algorithm is employed to perform parameter selection for these techniques. For decision tree, scikit-learn provides an optimized version of the classification and regression tree (CART) algorithm [14]. In these experiments, two parameters are configured: the minimum density over the set {0, 0.1, 0.25, 0.5, 0.8, 1}, which controls a trade-off in an optimization heuristic, and the minimum number of samples required to be at a leaf node, here denoted as ms, which is optimized over the set ms ∈ {0, 1, 2, 3, 4, 5, 10, 15, 20, 30, 50}. Based on previous works applying KNN to real-world data sets [8], the K value is optimized over the set K ∈ {1, 3, 5, …, 31}, which is sufficient to provide the best results for this algorithm. In SVM simulations, we reduce the search space for the optimization process by fixing a single well-known kernel, namely the radial basis function (RBF) kernel. The stopping criterion for the optimization method is defined as the Karush-Kuhn-Tucker violation being less than 10^{−3}. For each data set, the model selection is performed by considering the kernel parameter γ ∈ {2^{4}, 2^{3}, …, 2^{−10}} and the cost parameter C ∈ {2^{12}, 2^{11}, …, 2^{−2}}. Finally, the results obtained by each algorithm are averaged over 30 runs using the stratified 10-fold cross-validation process. Parameters were tuned using only the training data.
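For reference, the candidate values for KNN and SVM can be written out explicitly. This is only a sketch of the search grids: the sequences for γ and C are read as descending powers of two that end at negative exponents, which is an assumption about the notation, and the variable names are hypothetical. The dictionary keys follow the parameter names of scikit-learn's `SVC` (`C`, `gamma`) and `KNeighborsClassifier` (`n_neighbors`), so dictionaries of this shape could be passed as `param_grid` to `GridSearchCV`.

```python
# Candidate values described in the text, written as plain Python lists.
gamma_grid = [2.0 ** e for e in range(4, -11, -1)]   # 2^4, 2^3, ..., 2^-10
cost_grid = [2.0 ** e for e in range(12, -3, -1)]    # 2^12, 2^11, ..., 2^-2
k_grid = list(range(1, 32, 2))                       # 1, 3, 5, ..., 31

# Grids keyed by scikit-learn parameter names (RBF kernel fixed separately).
svm_param_grid = {"C": cost_grid, "gamma": gamma_grid}
knn_param_grid = {"n_neighbors": k_grid}

# Number of (C, gamma) pairs evaluated per training fold.
n_svm_candidates = len(cost_grid) * len(gamma_grid)
```

Under this reading, the SVM search evaluates 225 parameter pairs per training fold, compared with 16 candidate values for KNN, which gives a sense of the model selection cost that a nonparametric network construction avoids.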
On the other hand, parameter selection is unnecessary for our technique when building up a network. Unlike the parameters in traditional machine learning (such as K in KNN, the kernel function in SVM, and so on), the λ variable does not influence the training phase. In this way, we fix λ_{1} = 0.2 (due to good values found in [11]) and λ_{2} = 0.6 (to provide a much larger portion of the high-level classification). In the previous section, artificial data sets provided particular situations where low-level techniques have trouble performing the classification. Here, we evaluate (i) the linear combination between the K-associated optimal graph and the high-level algorithm, and (ii) the influence of λ in the context of real-world data sets. Since most techniques are essentially based on low-level characteristics, this information certainly contributes to a good classification. Consequently, when working together with low-level characteristics, high-level characteristics can improve the classification results by considering more than the physical attributes.
Comparative results obtained by HLKAOG, CART, KNN, and SVM on ten real-world data sets
Data set  HLKAOG (Acc. ± Std.)  CART (Acc. ± Std.)  KNN (Acc. ± Std.)  SVM (Acc. ± Std.)  
Iris  97.33 ± 3.52 (λ_{1})  93.60 ± 5.59  96.37 ± 4.63  96.28 ± 4.02 
Glass  70.78 ± 9.16 (λ_{1})  64.12 ± 9.33  72.64 ± 8.09  68.61 ± 7.78 
Balance  95.71 ± 2.40 (λ_{1})  88.20 ± 4.25  89.77 ± 1.96  99.97 ± 0.08 
Monks2  96.53 ± 2.43 (λ_{1})  95.67 ± 2.48  81.26 ± 5.02  93.79 ± 3.22 
Ecoli  84.90 ± 5.73 (λ_{2})  80.78 ± 5.55  85.99 ± 5.11  87.23 ± 5.22 
Append.  83.54 ± 7.27 (λ_{1})  77.24 ± 9.95  86.99 ± 8.71  85.72 ± 8.15 
Thyroid  97.30 ± 3.16 (λ_{1})  96.64 ± 2.95  93.58 ± 4.74  97.19 ± 2.59 
Sonar  83.75 ± 8.07 (λ_{1})  74.14 ± 9.69  81.78 ± 8.08  86.06 ± 7.43 
Digits  98.75 ± 0.35 (λ_{1})  90.27 ± 1.27  98.79 ± 0.37  99.26 ± 0.33 
SPECTF  80.07 ± 5.42 (λ_{2})  75.41 ± 6.20  77.90 ± 6.78  78.01 ± 3.92 
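One simple way to summarize the table is to rank the four algorithms on each data set (rank 1 for the highest accuracy) and average the ranks, which is the first step of the comparison procedure described in [32]. The sketch below only re-uses the accuracies listed above; it is not the statistical analysis reported by the authors, the names are hypothetical, and ties are not handled because none occur in the table.

```python
ALGORITHMS = ("HLKAOG", "CART", "KNN", "SVM")

# Mean accuracies copied from the table, in the order of ALGORITHMS.
ACCURACY = {
    "Iris":    (97.33, 93.60, 96.37, 96.28),
    "Glass":   (70.78, 64.12, 72.64, 68.61),
    "Balance": (95.71, 88.20, 89.77, 99.97),
    "Monks2":  (96.53, 95.67, 81.26, 93.79),
    "Ecoli":   (84.90, 80.78, 85.99, 87.23),
    "Append.": (83.54, 77.24, 86.99, 85.72),
    "Thyroid": (97.30, 96.64, 93.58, 97.19),
    "Sonar":   (83.75, 74.14, 81.78, 86.06),
    "Digits":  (98.75, 90.27, 98.79, 99.26),
    "SPECTF":  (80.07, 75.41, 77.90, 78.01),
}


def average_ranks(table):
    """Average rank of each algorithm over all data sets (1 = best)."""
    totals = dict.fromkeys(ALGORITHMS, 0.0)
    for accuracies in table.values():
        # Column indices sorted from the highest to the lowest accuracy.
        order = sorted(range(len(accuracies)),
                       key=lambda i: accuracies[i], reverse=True)
        for rank, column in enumerate(order, start=1):
            totals[ALGORITHMS[column]] += rank
    return {name: total / len(table) for name, total in totals.items()}


ranks = average_ranks(ACCURACY)
```

For the values in the table, this gives an average rank of 1.9 for both HLKAOG and SVM, 2.5 for KNN, and 3.7 for CART.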
Conclusions
HLKAOG takes advantage of both the K-associated optimal graph and the high-level technique for data classification. Specifically, the former provides a nonparametric construction of the network based on the purity measure, while the latter is able to capture the pattern formation of the training data. The contributions of HLKAOG include the following:

The technique does not work on classes but on network components, i.e., each class can have more than one component. As a result, the insertion of a test instance can generate larger variations in the network measures, and it is much easier to check the conformation of a test instance to the pattern formation of each class component. In contrast, previous work on high-level classification considers the network of a whole class of data items. In that case, the variations are very weak, and it is sometimes difficult to distinguish the conformation levels of the test instance to each class.

The use of the K-associated optimal graph to obtain a nonparametric network.

An automatic way to obtain the influence coefficient for the network measures. In addition, this coefficient adapts itself according to each test instance.

Development of a new network measure named component efficiency to perform the high-level classification.
Computer simulations and statistical tests show that HLKAOG performs well on both artificial and real data sets. In comparison with traditional machine learning techniques, computer simulations on real-world data sets showed that HLKAOG and support vector machines provide similar results and that both outperform well-known techniques such as decision trees and K-nearest neighbors. On the other hand, experiments with artificial data sets highlighted a drawback of traditional machine learning techniques, which, unlike HLKAOG, are unable to consider the formation pattern of the data.
Future work includes the incorporation of dynamical complex network measures, such as random walk and tourist walk, into the high-level classification algorithm, which can naturally give a combined local and global view of the networks under analysis. It also includes a complete analysis of high-level classification on imbalanced data sets and the investigation of complex network measures able to reduce the risk of overfitting in data classification.
Endnotes
^{a} A component is a subgraph α where any vertex v_{i} ∈ C can be reached from any other vertex v_{j} ∈ C and cannot be reached from any vertex v_{t} ∉ C.
^{b} The degree of a vertex v, denoted by k_{v}, is the total number of vertices adjacent to v.
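Both definitions can be made concrete with a short sketch. It assumes an undirected graph stored as a dictionary that maps each vertex to the set of its neighbors; this representation and the function names are illustrative assumptions and not part of the original method.

```python
from collections import deque


def degree(adjacency, v):
    """Number of vertices adjacent to v."""
    return len(adjacency[v])


def components(adjacency):
    """Split the vertices into components using breadth-first search.

    Every vertex in a component is reachable from every other vertex of the
    same component and from no vertex outside it.
    """
    seen = set()
    found = []
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        found.append(component)
    return found


# Toy graph: a path 1-2-3, an edge 4-5, and an isolated vertex 6.
graph = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
parts = components(graph)
```

For this toy graph, the three components are {1, 2, 3}, {4, 5}, and {6}, and vertex 2 has degree 2.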
Declarations
Acknowledgements
The authors would like to acknowledge the São Paulo State Research Foundation (FAPESP) and the Brazilian National Council for Scientific and Technological Development (CNPq) for the financial support given to this research.
References
1. Newman M: The structure and function of complex networks. SIAM Rev 2003, 45(2):167–256. 10.1137/S003614450342480
2. Costa LDF, Oliveira ON, Travieso G, Rodrigues FA, Boas PRV, Antiqueira L, Viana MP, Da Rocha LEC: Analyzing and modeling real-world phenomena with complex networks: a survey of applications. Adv Phys 2007, 60(3):103.
3. Lu Z, Savas B, Tang W, Dhillon IS: Supervised link prediction using multiple sources. In 2010 IEEE international conference on data mining. Sydney, Australia; 2010:923–928.
4. Fortunato S: Community detection in graphs. Phys Rep 2010, 486(3–5):75–174.
5. Boccaletti S, Ivanchenko M, Latora V, Pluchino A, Rapisarda A: Detecting complex network modularity by dynamical clustering. Phys Rev Lett 2007, 75: 045102.
6. Newman M: Networks: an introduction. Oxford University Press, New York; 2010.
7. Bertini JR, Lopes AA, Zhao L: Partially labeled data stream classification with the semi-supervised k-associated graph. J Braz Comput Soc 2012, 18(4):299–310. 10.1007/s13173-012-0072-8
8. Bertini JR, Zhao L, Motta R, Lopes AA: A nonparametric classification method based on k-associated graphs. Inf Sci 2011, 181(24):5435–5456. 10.1016/j.ins.2011.07.043
9. Carneiro MG, Rosa JL, Lopes AA, Zhao L: Classificação de alto nível utilizando grafo k-associados ótimo. In IV international workshop on web and text intelligence. Curitiba, Brazil; 2012:1–10.
10. Cupertino TH, Carneiro MG, Zhao L: Dimensionality reduction with the k-associated optimal graph applied to image classification. In 2013 IEEE international conference on imaging systems and techniques. Beijing, China; 2013:366–371.
11. Silva TC, Zhao L: Network-based high level data classification. IEEE Trans Neural Netw 2012, 23: 954–970.
12. Bishop CM: Pattern recognition and machine learning. Information science and statistics. Springer-Verlag, New York; 2006.
13. Mitchell T: Machine learning. McGraw-Hill series in Computer Science, McGraw-Hill, New York; 1997.
14. Breiman L: Classification and regression trees. Chapman & Hall, London; 1984.
15. Quinlan J: Induction of decision trees. Mach Learn 1986, 1: 81–106.
16. Aha DW, Kibler D, Albert M: Instance-based learning algorithms. Mach Learn 1991, 6: 37–66.
17. Haykin S: Neural networks: a comprehensive foundation. Prentice Hall PTR, Upper Saddle River; 1998.
18. Neapolitan RE: Learning Bayesian networks. Prentice-Hall, Upper Saddle River; 2003.
19. Cortes C, Vapnik V: Support-vector networks. Mach Learn 1995, 20(3):273–297.
20. Chapelle O, Scholkopf B, Zien A: Semi-supervised learning. MIT Press, Cambridge; 2006.
21. Schaeffer SE: Graph clustering. Comput Sci Rev 2007, 1(1):27–64. 10.1016/j.cosrev.2007.05.001
22. Yan S, Xu D, Zhang B, Zhang HJ, Yang Q, Lin S: Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 2007, 29(1):40–51.
23. Carneiro MG, Zhao L: High level classification totally based on complex networks. In Proceedings of the 1st BRICS Countries Congress. Porto de Galinhas, Brazil; 2013:1–8.
24. Rossi R, de Paulo Faleiros T, de Andrade Lopes A, Rezende S: Inductive model generation for text categorization using a bipartite heterogeneous network. In 2012 IEEE international conference on data mining. Brussels, Belgium; 2012:1086–1091.
25. Andrade RFS, Miranda JGV, Pinho STR, Lobão TP: Characterization of complex networks by higher order neighborhood properties. Eur Phys J B 2006, 61(2):28.
26. Newman MEJ: Assortative mixing in networks. Phys Rev Lett 2002, 89: 208701.
27. Latora V, Marchiori M: Efficient behavior of small-world networks. Phys Rev Lett 2001, 87: 198701.
28. Watts D, Strogatz S: Collective dynamics of small-world networks. Nature 1998, 393: 440–442. 10.1038/30918
29. Frank A, Asuncion A: UCI machine learning repository. 2010. http://archive.ics.uci.edu/ml. Accessed 10 Nov 2013.
30. Alcalá-Fdez J, Fernández A, Luengo J, Derrac J, García S: Multiple-Valued Logic Soft Comput 2011, 17(2–3):255–287.
31. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E: scikit-learn: machine learning in Python. J Mach Learn Res 2011, 12: 2825–2830.
32. Demšar J: Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 2006, 7: 1–30.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.