
Network-based data classification: combining K-associated optimal graphs and high-level prediction

  • Murillo G Carneiro¹ (corresponding author),
  • João LG Rosa²,
  • Alneu A Lopes² and
  • Liang Zhao³

Journal of the Brazilian Computer Society 2014, 20:14

https://doi.org/10.1186/1678-4804-20-14

Received: 15 May 2013

Accepted: 2 April 2014

Published: 17 June 2014

Abstract

Background

Traditional data classification techniques usually divide the data space into sub-spaces, each representing a class. Such a division is carried out considering only physical attributes of the training data (e.g., distance, similarity, or distribution). This approach is called low-level classification. On the other hand, network- or graph-based approaches are able to capture spatial, functional, and topological relations among data, providing a so-called high-level classification. Usually, network-based algorithms consist of two steps: network construction and classification. Although complex network measures are employed in the classification step to capture patterns of the input data, the network formation step is critical and not well explored. Some formation methods, such as the K-nearest neighbors algorithm (KNN) and ε-radius, consider strictly local information of the data and, moreover, depend on parameters which are not easy to set.

Methods

We propose a network-based classification technique, named high-level classification on K-associated optimal graph (HL-KAOG), combining the K-associated optimal graph and high-level prediction. In this way, the network construction algorithm is non-parametric, and it considers both local and global information of the training data. In addition, since the proposed technique combines low-level and high-level terms, it classifies data not only by physical features but also by checking the conformity of the test instance to the formation pattern of each class component. Computer simulations are conducted to assess the effectiveness of the proposed technique.

Results

The results show that a larger portion of the high-level term is required to get correct classification when there is a complex-formed and well-defined pattern in the data set. In this case, we also show that traditional classification algorithms are unable to identify those data patterns. Moreover, computer simulations on real-world data sets show that HL-KAOG and support vector machines provide similar results and they outperform well-known techniques, such as decision trees and K-nearest neighbors.

Conclusions

The proposed technique works with a very reduced number of parameters and is able to obtain good predictive performance in comparison with traditional techniques. In addition, the combination of high-level and low-level algorithms based on network components allows greater exploration of patterns in data sets.

Keywords

High-level classification; Complex network; Machine learning; Data classification

Background

Introduction

Complex networks gather concepts from statistics, dynamical systems, and graph theory. Basically, they are large-scale graphs with nontrivial connection patterns [1]. In addition, the ability to capture spatial, functional, and topological relations is one of their salient characteristics. Nowadays, complex networks appear in many scenarios [2], such as social networks [3], biological networks [4], the Internet and World Wide Web [5], electric energy networks [6], and classification and pattern recognition [7–11]. Thus, distinct fields of science, such as physics, mathematics, biology, computer science, and engineering, have contributed to the large advances in complex network study.

Data classification is an important task in machine learning. It concerns constructing computer programs able to learn from labeled data sets and, subsequently, to predict the labels of unseen instances [12, 13]. Due to the vast number of applications, many data classification techniques have been developed. Some of the well-known ones are decision trees [14, 15], instance-based learning, e.g., the K-nearest neighbors algorithm (KNN) [16], artificial neural networks [17], Naive Bayes [18], and support vector machines (SVM) [19]. Nevertheless, most of them are highly dependent on appropriate parameter tuning. Examples include the confidence factor and the minimum number of cases to partition a set in the C4.5 decision tree; the K value in KNN; the stopping criterion, the number of neurons, the number of hidden layers, and others in artificial neural networks; and the soft margin, the kernel function, the kernel parameters, the stopping criterion, and others in SVM.

Complex networks have made considerable contributions to machine learning. However, most research on complex networks is applied to data clustering, dimensionality reduction, and semi-supervised learning [20–22]. Recently, some network-based techniques have been proposed to solve supervised learning problems, such as data classification [8, 10, 11, 23, 24]. The obtained results show that network-based techniques have advantages over traditional ones in many aspects, such as the ability to detect classes of different shapes, the absence of parameters, and the ability to classify data according to the pattern formation of the training data.

In [8], the authors proposed a network-based classification algorithm called K-associated optimal graph (KAOG). Among other characteristics, KAOG constructs a network through a local optimization procedure based on an index of purity, which measures the compactness of the graph components. In the classification stage, KAOG combines the constructed network and a Bayes optimal classifier to compute the probability of a test instance belonging to each of the classes. One noticeable advantage of KAOG is that it is a non-parametric technique. This is a desirable feature, which will be employed in the technique proposed in the present work. However, both KAOG and other traditional classification techniques consider exclusively the physical features of the data (e.g., distance, similarity, or distribution). This limited way to perform classification tasks is known as low-level classification [11]. The human (animal) brain, in contrast, performs both low and high orders of learning, identifying patterns according to the semantic meaning of the input data. Data classification that considers not only physical attributes but also the pattern formation of the data is referred to as high-level classification [11]. Figure 1 shows an illustrative data set in which there are two classes (black and gray circles) and a new test instance to be classified (a white circle). Applying an SVM with optimized parameters and a radial basis function kernel, the test instance is classified into the black class. The same occurs when using optimized versions of other algorithms, such as KNN and decision tree. However, one could consider that the test instance belongs to the well-defined triangle pattern formed by the gray circles. This example shows that traditional classification techniques fail to identify general data patterns. On the other hand, in [11], the authors present a quite different kind of classification technique, called high-level classification, which is able to consider the pattern formation of the training data by using the topological structure of the underlying network. Specifically, the data pattern is identified by using some complex network measures. The test instance is classified by checking its conformity to each class of the network. Again, considering Figure 1, the high-level technique is able to detect the pattern formed by the gray circle class and to put the test instance into it. Furthermore, high-level and low-level classifications can work together in a unique framework, as proposed in [11].
Figure 1

A two-class data set, in which gray class data items form a triangle pattern. The white color instance needs to be classified in one of the classes.

Despite the fact that high-level classification offers a new vision on data classification, the network formation still depends on some parameters, such as the parameter K in KNN and the parameter ε in the ε-radius technique. Moreover, there are other parameters in the technique, which involve the weight assigned to each network measure employed in the framework [11]. All these parameters are problem-oriented, and the selection of their values is time-consuming and has a strong influence on the quality of the classification.

In this paper, we propose a network-based classification technique combining two techniques: KAOG and high-level classification. We refer to it as high-level classification on the K-associated optimal graph (HL-KAOG). It considers not only the physical attributes but also the pattern formation of the data. Specifically, the proposed technique provides the following:

  •  A non-parametric way to construct the network

  •  A high-level data classification with only one parameter

  •  A more sensitive high-level classification obtained by examining the network components instead of whole classes. This is relevant because classes can consist of several (possibly quite distinct) components. Thus, the components are smaller than the networks of whole classes. Consequently, the high-level tests are more sensitive and good results can be obtained

  •  An automatic way to obtain the influence coefficient for the network measures. In addition, this coefficient adapts itself according to each test instance

  •  A new complex network measure adapted to high-level classification, named component efficiency

Computer simulations have been conducted to assess the effectiveness of the proposed technique. Interestingly, the results show that a larger portion of the high-level term is required to get correct classification when there is a complex-formed and well-defined pattern in the data set. In this case, we also show that traditional classification algorithms are unable to identify those data patterns. Moreover, computer simulations on real-world data sets show that HL-KAOG and support vector machines provide similar results and they outperform well-known techniques, such as decision trees and K-nearest neighbors.

The remainder of the paper is organized as follows: a background and a brief overview of the related works are presented in the ‘Overview’ section. The proposed technique and the contributions of this work are detailed in the ‘Methods’ section. Empirical evaluation and discussion of the proposed algorithm on artificial and real data sets are shown in the ‘Results and discussion’ section. Finally, the ‘Conclusions’ section concludes the paper.

Overview

In this section, we review the most relevant classification techniques. Firstly, we present an overview on the network-based data classification. Then, we describe the network construction and classification using the K-associated optimal graph. Finally, we present the rationale behind the high-level classification technique.

Network-based data classification

In data classification, the algorithms receive as input a training data set, denoted here $X = \{(\mathrm{inp}_1, \mathrm{lab}_1), \dots, (\mathrm{inp}_n, \mathrm{lab}_n)\}$, where the pair $(\mathrm{inp}_i, \mathrm{lab}_i)$ is the i-th data instance in the data set. Here, $\mathrm{inp}_i = (x_1, \dots, x_d)$ represents the attributes of a d-dimensional data item and $\mathrm{lab}_i \in L = \{L_1, \dots, L_C\}$ represents the target class or label associated with that data item.

In network-based classification, the training data is usually represented as a network in which each instance is a vertex and the edges (or links) represent the similarity relations between vertices. The goal of the training phase is to induce a classifier from inp → lab by using the training data X.

In the prediction (classification) phase, the goal is to use the constructed classifier to predict new input instances unseen in training. So, there is a set of test instances $Y = \{(\mathrm{inp}_{n+1}, \mathrm{lab}_{n+1}), \dots, (\mathrm{inp}_z, \mathrm{lab}_z)\}$. In this phase, the algorithm receives only the $\mathrm{inp}$ and uses the constructed network to predict the correct class $\mathrm{lab}$ for that $\mathrm{inp}$.

K-associated optimal graph

KAOG uses a purity measure to construct and optimize each component of the network. The resulting network, together with the Bayes optimal classifier, is used for the classification of new instances. Specifically, a KAOG is a final network that merges several K-associated graphs while maintaining or improving their purity measure. For the sake of clarity, the network construction phase can be divided into two concepts: creating a K-associated graph (Kac) and creating a K-associated optimal graph.

The K-associated graph builds up a network from a given data set and a K value, which is related to the number of neighbors to be considered for each vertex. Basically, the algorithm connects a vertex $v_i$ to $v_j$ only if $v_j$ is one of the K-nearest neighbors of $v_i$ and if $v_i$ and $v_j$ have the same class label.

Algorithm 1 shows in detail how a K-associated graph is constructed. In Kac, $V$ denotes the set of vertices $v_i \in V$, which represent all training instances; $E$ is the set of edges $e_{i,j} \in E$, which contains all links between vertices; $\Lambda_{v_i,K}$ contains the K-nearest neighbors of vertex $v_i$; $c_i$ is the class label of vertex $i$; findComponents(V,E) is a function that finds all existing components$^a$ in the graph; $C$ is the set of all components $\alpha \in C$; purity($\alpha$) gives the compactness $\Phi_\alpha$ of each component $\alpha \in C$; and $G^{(K)}$ represents the K-associated graph. Figure 2 shows the formation of the K-associated graph for (a) K = 1 and (b) K = 2.
Figure 2

Network formation using K -associated graph algorithm.
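
The linking rule of Algorithm 1 can be summarized in a few lines of code. The sketch below is only an illustration (not the authors' implementation): using networkx and Euclidean distances, each vertex is connected to those of its K nearest neighbors that share its class label.

```python
import numpy as np
import networkx as nx

def k_associated_graph(X, labels, K):
    """Directed K-associated graph: each vertex points to the vertices, among its
    K nearest neighbors (Euclidean distance), that share its class label."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)          # a vertex is not its own neighbor
    g = nx.DiGraph()
    g.add_nodes_from(range(n))
    for i in range(n):
        for j in np.argsort(dist[i])[:K]:   # K nearest neighbors of vertex i
            if labels[i] == labels[int(j)]:
                g.add_edge(i, int(j))       # link same-class neighbors only
    return g
```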

Also, about the K-associated graph, there are two important characteristics that are highlighted as follows:
  1.

    Asymmetrical property: according to Algorithm 1, Kac returns a digraph. Digraphs provide a good representation for the asymmetric nature existing in many data sets because, often, $v_j \in \Lambda_{v_i,K}$ does not imply $v_i \in \Lambda_{v_j,K}$.

     
  2.
    Purity: this measure expresses the level of mixture of a component in relation to other components of distinct classes. Basically, the purity measure is given by
    $$\Phi_\alpha = \frac{D_\alpha}{2K},$$
    (1)
     
where $D_\alpha$ denotes the average degree of a component $\alpha$, and $2K$ is the maximum number of possible links that a vertex can have. $D_\alpha$ can be obtained by
$$D_\alpha = \frac{1}{N} \sum_{i=1}^{N} k_i,$$
(2)

in which $N$ is the total number of vertices in $\alpha$ and $k_i$ is the degree$^b$ of vertex $i$.
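
As a minimal illustration of Equations (1) and (2), the purity of a component can be computed directly from such a digraph. The sketch below assumes, consistently with the 2K normalization, that the degree of a vertex counts both its in- and out-links.

```python
import networkx as nx

def component_purity(g, nodes, K):
    """Eqs. (1)-(2): average degree of the component divided by 2K; the degree of a
    vertex is taken as in-degree plus out-degree (assumption matching the 2K bound)."""
    avg_degree = sum(g.in_degree(v) + g.out_degree(v) for v in nodes) / len(nodes)
    return avg_degree / (2.0 * K)

def components_with_purity(g, K):
    """All weakly connected components of a K-associated graph with their purities."""
    return [(set(c), component_purity(g, c, K))
            for c in nx.weakly_connected_components(g)]
```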

Despite the fact that the K-associated graph presents good performance on data classification, there is a drawback: it uses the same value of K to form the networks of all data classes. However, a network obtained with a single value of K is rarely able to produce the best configuration of instances into components in terms of the purity measure. Thus, an algorithm that is able to adapt itself to the different classes of the data set is welcome. The idea of KAOG is to obtain the optimal K value for each component in order to maximize its purity [8].

Algorithm 2 shows in detail the construction of KAOG from K-associated graphs. In the algorithm, $G^{(\mathrm{Opt})}$ denotes the K-associated optimal graph and lastAvgDegree is the average degree of the network before incrementing the value of K; note that no parameter is introduced in the algorithm. In the following, we describe Algorithm 2. In the first lines, K starts with the value 1 and, thus, the 1-associated graph is taken as the optimal graph ($G_{\mathrm{opt}}$) at this moment. After this initial setting, a loop merges the subsequent K-associated graphs by increasing K, while improving the purity of the network encountered so far, until the optimal network measured by the purity degree [8] is reached. Between lines 7 and 12, the algorithm verifies, for each component of the K-associated graph ($C_\beta^{(K)}$), whether the condition given by line 8 is satisfied. In the affirmative case, an operation is performed to remove from the optimal graph the components that compose $C_\beta^{(K)}$ (line 9). In line 10, the new component is added to $G_{\mathrm{opt}}$. At the end, the algorithm returns the obtained components with their respective values of K and purities.

Regarding the time complexity of the KAOG network construction, given a training set with N instances and d attributes, generating the corresponding distance matrix takes N(N − 1)d operations, which yields a complexity order of O(N²). Other functions, such as finding the graph components and computing their purity, have a complexity order of O(N). Therefore, the time complexity to build the network is O(N²) [8].

After the construction of the KAOG network, a Bayes optimal classifier (BayesOC) performs the classification of new instances using the constructed network. Let us consider a new instance y to be classified. KAOG performs the classification by computing the probability of y belonging to each component. Thus, from Bayes' theory, the a posteriori probability that y belongs to a component $\alpha$, given the $K_\alpha$-nearest neighbors of the new case ($\Lambda_y$), is given by
$$P(y \in \alpha \mid \Lambda_y) = \frac{P(\Lambda_y \mid y \in \alpha)\, P(y \in \alpha)}{P(\Lambda_y)}.$$
(3)
According to [8], the probability of the neighborhood Λ y given that y belongs to α is given by
$$P(\Lambda_y \mid y \in \alpha) = \frac{\left|\Lambda_{y,K_\alpha} \cap \alpha\right|}{K_\alpha}.$$
(4)
The normalization term P (Λ y ) is obtained by
$$P(\Lambda_y) = \sum_{\beta:\, N_{y,\beta} \neq 0} P(\Lambda_y \mid y \in \beta)\, P(y \in \beta).$$
(5)
Also, the a priori probability P (y α) is given by
$$P(y \in \alpha) = \frac{\Phi_\alpha}{\sum_{\beta:\, N_{y,\beta} \neq 0} \Phi_\beta}.$$
(6)
In addition, if the number of components in the network is greater than the number of classes in the data set, BayesOC sums up the probabilities associated with each class j, as follows:
$$P_y(j) = \sum_{\alpha \in j} P(y \in \alpha \mid \Lambda_y).$$
(7)

At the end, BayesOC chooses the class with the largest a posteriori probability.

Regarding the time complexity, the BayesOC classification has a complexity order of O(N), which is related to the calculation of the distances between the training set and the test case [8].
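
A minimal sketch of the BayesOC decision rule of Equations (3) to (7) is given below. The component descriptors (vertices, purity $\Phi_\alpha$, optimal $K_\alpha$, and class label) are assumed to come from the KAOG construction; the variable names are illustrative, not the authors' implementation.

```python
import numpy as np

def bayes_oc_predict(x, X_train, components):
    """Bayes optimal classifier of Eqs. (3)-(7). Each entry of `components` is a dict
    with keys 'vertices' (indices into X_train), 'purity', 'K' and 'label'."""
    dist = np.linalg.norm(np.asarray(X_train, dtype=float) - np.asarray(x, dtype=float),
                          axis=1)
    order = np.argsort(dist)
    for comp in components:
        knn = set(order[:comp['K']].tolist())                # K_alpha nearest neighbors of x
        comp['likelihood'] = len(knn & set(comp['vertices'])) / comp['K']      # Eq. (4)
    active = [c for c in components if c['likelihood'] > 0]  # components with N_{y,beta} != 0
    if not active:
        return None
    purity_sum = sum(c['purity'] for c in active)
    evidence = sum(c['likelihood'] * c['purity'] / purity_sum for c in active)  # Eq. (5)
    scores = {}
    for c in active:
        prior = c['purity'] / purity_sum                     # Eq. (6)
        posterior = c['likelihood'] * prior / evidence       # Eq. (3)
        scores[c['label']] = scores.get(c['label'], 0.0) + posterior            # Eq. (7)
    return max(scores, key=scores.get)                       # class with largest posterior
```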

High-level classification

High-level prediction considers not only physical attributes but also global semantic characteristics of the data [11]. This is due to the use of complex network measures, which are able to capture the pattern formation of the input data. The next sections provide more details about high-level classification.

Network construction As in all other network-based learning techniques, the first step here is the network construction from the vector-based input data. In fact, the way the network is constructed strongly influences the classification results of the high-level prediction. In [11], the authors propose a network construction method combining the ε-radius and KNN algorithms, which is given by
$$\mathrm{Net}(i) = \begin{cases} \epsilon\text{-radius}(\mathrm{inp}_i, \mathrm{lab}_i), & \text{if } |\epsilon\text{-radius}(\mathrm{inp}_i, \mathrm{lab}_i)| > K \\ \mathrm{KNN}(\mathrm{inp}_i, \mathrm{lab}_i), & \text{otherwise} \end{cases}$$
(8)

where KNN and ε-radius return, respectively, the set containing the K-nearest vertices of the same class as vertex i and the set of vertices of the same class as i whose distance from i is smaller than ε. Note that K and ε are user-controllable parameters. In addition, the algorithm connects the vertex i to other vertices using ε-radius when the condition is satisfied and using KNN otherwise.
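
For comparison with the KAOG construction, a direct transcription of Equation (8) is sketched below (illustrative only; K and ε are the user-supplied parameters discussed next).

```python
import numpy as np

def epsilon_knn_neighbors(i, X, labels, K, eps):
    """Eq. (8): neighbor set of vertex i -- the epsilon-radius set if it contains more
    than K same-class vertices, otherwise the K nearest same-class vertices."""
    X = np.asarray(X, dtype=float)
    same_class = [j for j in range(len(X)) if j != i and labels[j] == labels[i]]
    dist = {j: float(np.linalg.norm(X[j] - X[i])) for j in same_class}
    radius_set = [j for j in same_class if dist[j] < eps]
    if len(radius_set) > K:
        return radius_set
    return sorted(same_class, key=dist.get)[:K]
```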

As shown in Equation 8, the network formation algorithm depends on some parameters. Different parameter values produce very distinct results. Moreover, the model selection of the parameters is time-consuming. For this reason, we propose a non-parametric network formation method based on the KAOG to work together with the high-level technique. The ‘Methods’ section describes our proposal.

Hybrid classification technique In [11], the authors propose a hybrid classification technique combining a low-level term and a high-level term, which is given by
$$M_y^{(J)} = (1 - \lambda)\, P_y^{(J)} + \lambda\, H_y^{(J)}.$$
(9)

Considering a test instance $y \in Y$, $M_y^{(J)}$ denotes the association produced by the low- and high-level algorithms when evaluating instance y for the class J. Also in the equation, the variable $P_y^{(J)} \in [0,1]$ establishes the association produced by the low-level classifier between the instance y and the class J. On the other hand, the variable $H_y^{(J)} \in [0,1]$ points to the association produced by the high-level technique (composed of complex network measures, such as assortativity and clustering coefficient [1]) between y and the class J. Finally, $\lambda \in [0,1]$ is a user-controllable variable that defines the weight assigned to each produced classification. Note that λ just defines the contribution of the low- and high-level classifications. For example, if λ = 0, only the low-level algorithm works.

The high-level classification of a new instance y for a given class J is given by
$$H_y^{(J)} = \frac{\sum_{u=1}^{Z} \delta(u)\, \left[1 - f_y^{(J)}(u)\right]}{\sum_{g \in L} \sum_{u=1}^{Z} \delta(u)\, \left[1 - f_y^{(g)}(u)\right]},$$
(10)

where $H_y^{(J)} \in [0,1]$, u indexes the network measures employed in the high-level algorithm, $\delta(u) \in [0,1]$, $u \in \{1,\dots,Z\}$, is a user-controllable variable that indicates the influence of each network measure in the classification process, and $f_y^{(J)}(u)$ provides an answer whether the test instance y presents the same patterns as the class J or not, considering the u-th network measure applied. The denominator term is only for normalization. There is also a constraint on δ(u): (10) is valid only if $\sum_{u=1}^{Z} \delta(u) = 1$.

The term $f_y^{(J)}(u)$ is given by
$$f_y^{(J)}(u) = \Delta G_y^{(J)}(u)\; p^{(J)},$$
(11)

in which $\Delta G_y^{(J)}(u) \in [0,1]$ represents the variation that occurs in a complex network measure whenever a new instance $y \in Y$ is inserted and $p^{(J)} \in [0,1]$ is the proportion of instances that belong to the class J.

Complex network measures In fact, complex network measures are used to provide a high-level analysis of the data [25]. So, when a new instance y needs to be classified, the technique computes the impact of inserting this new vertex into each class in an isolated way. Basically, the variation of the network measures indicates the class that y belongs to. In other words, if there is little variation in the formation pattern of a class when connecting y to it, the high-level prediction returns a large value, indicating that y is in conformity with this pattern. Conversely, if there is a great variation when linking y to a class, it returns a small value, denoting that y is not in conformity with this pattern.

In [11], three network measures are employed to check the pattern formation of the input data: assortativity, average degree, and clustering coefficient [26]. A more detailed view about the network measures employed in HL-KAOG is provided in the next section.

Methods

Most machine learning algorithms perform the classification exclusively based on physical attributes of the data. They are called low-level algorithms. One example is the BayesOC algorithm shown in the previous section. On the other hand, complex network-based techniques provide a different kind of classification that is able to capture formation patterns in the data sets.

Actually, the principal drawback in the use of complex network measures for data classification is the network formation. Some techniques have been largely used in the literature, such as KNN and ε-radius, but they depend on parameters. This means that the technique is not able to detect such information from the data itself, so different parameter values produce very distinct results. In HL-KAOG, we exploit the ability of KAOG to produce an efficient and non-parametric network to address this problem. In addition, other contributions are presented here.

This section describes the principal contributions of this investigation. The ‘Component efficiency measure’ section shows a new complex network measure for high-level classification: the component efficiency. The ‘Linking high-level prediction and KAOG algorithm’ section provides details about how KAOG network and high-level classifier work together. The ‘High-level classification on network components’ section denotes an important conceptual modification in our high-level approach: complex network measures are employed on graph components. The ‘Non-parametric influence coefficient for the network measures’ section shows an automatic way to obtain the influence coefficient of the network measures. The ‘Complex network measures per component’ section provides the adaptation of the complex network measures to work on components instead of classes.

Component efficiency measure

The component efficiency measure quantifies the efficiency of a component in exchanging information among its vertices. It is a new network measure incorporated into the high-level classification technique. Its development is motivated by the concept of efficiency of a network [27], which measures how efficiently the network exchanges information. Since our high-level algorithm operates at the component level, we named the measure component efficiency.

Initially, consider a vertex i in a component α. The local efficiency of i is given by
$$E_i(\alpha) = \frac{1}{V_i} \sum_{j \in \Lambda_i} q_{ij},$$
(12)

where $V_i$ denotes the number of links from i, $\Lambda_i$ represents the set of vertices that receive links from i, and $q_{ij}$ is related to the geodesic distance between i and j.

We define the efficiency of a component α as the average of the local efficiency of the nodes that belong to α. So, we have
$$E(\alpha) = \frac{1}{V_\alpha} \sum_{i=1}^{V_\alpha} E_i(\alpha),$$
(13)

in which V α is the number of vertices in the component α.
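
A minimal sketch of Equations (12) and (13) follows. It assumes, following the efficiency concept of [27], that $q_{ij}$ is the reciprocal of the geodesic distance on the Euclidean-weighted graph, which for a direct link i → j reduces to the inverse of the Euclidean distance between the two data items; this is one possible reading of the definition above, not the authors' exact implementation.

```python
import numpy as np

def local_efficiency(g, i, X):
    """Eq. (12): average of q_ij over the vertices j that receive links from i.
    Assumption: q_ij = 1 / ||x_i - x_j|| for a direct link i -> j."""
    succ = list(g.successors(i))
    if not succ:
        return 0.0
    q = [1.0 / np.linalg.norm(np.asarray(X[i]) - np.asarray(X[j])) for j in succ]
    return sum(q) / len(succ)

def component_efficiency(g, alpha, X):
    """Eq. (13): average local efficiency over the vertices of component alpha."""
    alpha = list(alpha)
    return sum(local_efficiency(g, i, X) for i in alpha) / len(alpha)
```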

Linking high-level prediction and KAOG algorithm

We adapted the concepts of the K-associated optimal graph and the high-level algorithm so that they can work together. Firstly, the K-associated optimal graph divides the network into components according to the purity measure. Thus, the high-level technique proposed here considers components instead of classes. In addition, differently from BayesOC, which considers only those components in which at least one vertex belongs to the nearest neighbors of the test instance, the high-level algorithm employs complex network measures to examine whether the insertion of the test instance into a component is in conformity with the formation pattern of that component. Figure 1 shows an illustrative example in which the network topology cannot be detected by the original KAOG. If we employed KAOG directly in our model, it would not be able to classify the new instance into the gray circle class; instead, the instance would be classified into the black circle class. On the other hand, our model permits the correct classification because it uses the information available in each component constructed by KAOG in a different way.

Suppose a test instance y is to be classified. The classification stage of our network-based technique can be divided into two steps, according to [23] (a code sketch of both steps is given after this description):
  1.

    Firstly, the proposed technique uses the component efficiency measure to determine the components into which y can be inserted. This information is important especially because it considers the local features of the components and establishes a heuristic that excludes components that are not in conformity with the insertion of y.

     
In a more formal definition, let us consider a component α and a set F containing the components for which the variations of the complex network measures will be computed. For each new instance y, $F_y$ is given by
$$F_y \leftarrow F_y \cup \left\{\, \alpha \mid \min\!\big(e_y(\alpha)\big) \geq E(\alpha) \,\right\},$$
(14)
where e y ( α ) denotes the local efficiency of y to each vertex that belongs to component α and E(α) is the component efficiency of α.
  2.
    The next step is the insertion of y into each $\alpha \in F_y$. According to our technique, y makes connections with each vertex $i \in \alpha$ following the equation given by
    $$\alpha_y \leftarrow \alpha \cup \left\{\, i \mid e_y(i) \geq E(\alpha) \,\right\},$$
    (15)
     

where $\alpha_y$ comprises the component α together with the connections between y and its vertices, $e_y(i)$ is the local efficiency in exchanging information between y and i, and $e_y(i) \geq E(\alpha)$ is the condition to be satisfied to assure a link between y and i.

Note that if $F_y = \emptyset$ in (14) (a very unusual situation), the algorithm employs the $K_\alpha$ value associated with each component α and verifies whether any vertex in α is one of the $K_\alpha$-nearest neighbors of y. If there is at least one vertex satisfying this condition in component α, then the complex network measures are applied to this component; otherwise, α is not considered in the classification phase.
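
Transcribing (14) and (15) literally, the two steps can be sketched as below; the comparison direction follows the higher-is-better efficiency convention assumed in the previous section, and all names are illustrative.

```python
def candidate_components(e_y, E):
    """Eq. (14): build F_y. e_y maps each component id to a dict {vertex: local
    efficiency between y and that vertex}; E maps each component id to E(alpha)."""
    return [a for a in e_y if min(e_y[a].values()) >= E[a]]

def links_of_y(e_y_alpha, E_alpha):
    """Eq. (15): vertices of component alpha to which y is connected."""
    return [i for i, eff in e_y_alpha.items() if eff >= E_alpha]
```

If the resulting $F_y$ is empty, the KNN-based fallback described in the paragraph above is used instead.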

High-level classification on network components

Once the network is obtained by KAOG, the high-level algorithm can be applied to classify new instances by checking the variation of complex network measures in each component before and after the insertion of the new instance. Thus, the proposed high-level technique works on the components instead of whole classes. This is an important feature introduced in this work. Since each class can encompass more than one component, each component is smaller than or equal to the corresponding whole class. In this way, the insertion of a test instance can generate more precise variations in the network measures. Consequently, it is easier to check the conformity of a test instance to the formation pattern of each class component. On the other hand, the previous work on high-level classification considers the network of a whole class of data items. In this case, the variations are weaker and it is sometimes difficult to distinguish the conformity levels of the test instance to each class. Therefore, taking (9), the hybrid classification of a new instance y for a given component α is given by
$$M_y^{(\alpha_J)} = (1 - \lambda)\, P_y^{(\alpha_J)} + \lambda\, H_y^{(\alpha_J)},$$
(16)

where $\alpha_J$ denotes a component α whose instances belong to class J, $P_y^{(\alpha_J)}$ establishes the association produced by a low-level classifier between the instance y and the component α, and $H_y^{(\alpha_J)}$ points to the association produced by the high-level technique between y and the component α. The general idea behind $H_y^{(\alpha_J)}$ is very simple: (i) the KAOG network finds a set of components based on the purity measure, (ii) the high-level technique examines these components in relation to the insertion of a test instance y, and (iii) the probabilities for each component α are obtained.

$H_y^{(\alpha_J)}$ represents the compatibility of each component α with the new instance y and is given by
$$H_y^{(\alpha_J)} = \frac{\sum_{u=1}^{Z} \delta_y(u)\, \left[1 - f_y^{(\alpha)}(u)\right]}{\sum_{g \in F_y} \sum_{u=1}^{Z} \delta_y(u)\, \left[1 - f_y^{(g)}(u)\right]},$$
(17)

in which u indexes the network measures employed in the high-level algorithm, $\delta_y(u) \in [0,1]$, $u \in \{1,\dots,Z\}$, indicates the influence of each network measure in the classification process, and $f_y^{(\alpha)}(u)$ provides an answer whether the test instance y presents the same patterns as the component α or not, considering the u-th network measure. The denominator term in (17) is only for normalization. Details about $\delta_y(u)$ are provided in the ‘Non-parametric influence coefficient for the network measures’ section.

From (17), a simple modification is performed in (11) to obtain a high-level classification on components (α) instead of classes (J). So, we have
$$f_y^{(\alpha)}(u) = \frac{\Delta G_y^{(\alpha)}(u)\; p^{(\alpha)}}{\sum_{\beta} \Delta G_y^{(\beta)}(u)\; p^{(\beta)}},$$
(18)

where the denominator term in (18), which runs over all candidate components β, is only for normalization, and $p^{(\alpha)} \in [0,1]$ is the proportion of instances that belong to the component α.

Non-parametric influence coefficient for the network measures

Differently from previous works, the high-level technique proposed in this work is non-parametric not only in the network construction phase but also in the classification phase. We have developed an automatic way to assign weights to the employed network measures, i.e., the δ term in Equation 17 is determined by
$$\delta_y(u) = \frac{1 - \left(\max_{\alpha} \Delta G_y^{(\alpha)}(u) - \min_{\alpha} \Delta G_y^{(\alpha)}(u)\right)}{\sum_{u=1}^{Z} \left[1 - \left(\max_{\alpha} \Delta G_y^{(\alpha)}(u) - \min_{\alpha} \Delta G_y^{(\alpha)}(u)\right)\right]},$$
(19)

in which $\Delta G_y^{(\alpha)}(u) \in [0,1]$ represents the variation that occurs in a complex network measure whenever a new instance $y \in Y$ is inserted. Therefore, $\delta_y(u)$ is based on the complement of the difference between the largest and the smallest variation of the u-th network measure over all components α. The idea of determining $\delta_y(u)$ in this way is to balance all the employed network measures in the decision process and not to permit a single network measure to dominate the classification decision. Note that this equation is valid only if $\sum_{u=1}^{Z} \delta_y(u) = 1$.
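
Putting (17), (18), and (19) together, the high-level scores of a test instance over the candidate components can be sketched as below; the names are illustrative, and `delta_G[a][u]` holds the variation $\Delta G_y^{(a)}(u)$ of the u-th measure for component a.

```python
def high_level_scores(delta_G, proportion):
    """High-level association H_y of Eqs. (17)-(19) for each candidate component.

    delta_G:    {component: [Delta G_y for measure u, u = 0..Z-1]}
    proportion: {component: fraction of training instances in the component}
    """
    comps = list(delta_G)
    Z = len(delta_G[comps[0]])
    # Eq. (19): measure weights, balancing the variation ranges across components.
    spread = [max(delta_G[a][u] for a in comps) - min(delta_G[a][u] for a in comps)
              for u in range(Z)]
    total = sum(1.0 - s for s in spread)
    delta = [(1.0 - s) / total for s in spread]
    # Eq. (18): proportion-weighted variation, normalized over the components.
    f = {a: [delta_G[a][u] * proportion[a]
             / sum(delta_G[b][u] * proportion[b] for b in comps)
             for u in range(Z)]
         for a in comps}
    # Eq. (17): small variation -> high compatibility, normalized over the components.
    raw = {a: sum(delta[u] * (1.0 - f[a][u]) for u in range(Z)) for a in comps}
    norm = sum(raw.values())
    return {a: raw[a] / norm for a in comps}
```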

Complex network measures per component

The complex network measures presented in the previous section work on classes. Differently from the approach proposed in [11], the approach proposed in this work considers pattern formation per component. This feature makes the algorithm more sensitive to the network measure variations. In addition, the components are constructed with regard to the purity measure, which gives more precise information about the data set.

In this section, we adapted assortativity and clustering coefficient to work on components. Also, as the time complexity of the high-level classification is directly related to the network measures employed, we present the complexity order of each measure.

Assortativity ($\Delta G_y^{(J)}(1)$)

The assortativity measure quantifies the tendency of connections between vertices [26] in a complex network. This measure analyzes whether a link occurs preferentially between vertices with similar degree or not. The assortativity with regard to each component α of the data set is given by
$$r^{(\alpha)} = \frac{L^{-1} \sum_{u \in U_\alpha} i_u k_u - \left[ L^{-1} \sum_{u \in U_\alpha} \tfrac{1}{2}(i_u + k_u) \right]^2}{L^{-1} \sum_{u \in U_\alpha} \tfrac{1}{2}(i_u^2 + k_u^2) - \left[ L^{-1} \sum_{u \in U_\alpha} \tfrac{1}{2}(i_u + k_u) \right]^2},$$
(20)

where $r^{(\alpha)} \in [-1,1]$, $U_\alpha = \{u : i_u \in \alpha \wedge k_u \in \alpha\}$ encompasses all the edges within the component α, $L = |U_\alpha|$ is the number of such edges, u represents an edge, and $i_u$, $k_u$ indicate the degrees of the vertices at each end of the edge u.

Therefore, the membership value of a test instance y Y with respect to the component α is given by
$$\Delta G_y^{(\alpha)}(1) = \frac{\left| r^{(\alpha)} - r'^{(\alpha)} \right|}{\sum_{u \in U} \left| r^{(u)} - r'^{(u)} \right|}.$$
(21)

Here, $r'^{(\cdot)}$ denotes the assortativity recomputed after the temporary insertion of y, and the sum in the denominator runs over the candidate components. The assortativity measure yields a complexity order of O(|E| + |V|), where |E| and |V| denote, respectively, the number of edges and the number of vertices in the graph.

Clustering coefficient ($\Delta G_y^{(J)}(2)$)

The clustering coefficient is a measure that quantifies the degree to which the nodes of a network tend to cluster together locally [28]. The clustering coefficient with regard to each component α of the data set is given by
$$CC_i^{(\alpha)} = \frac{|e_{us}|}{k_i (k_i - 1)},$$
(22)
$$CC^{(\alpha)} = \frac{1}{V_\alpha} \sum_{i=1}^{V_\alpha} CC_i^{(\alpha)},$$
(23)
in which $CC_i^{(\alpha)} \in [0,1]$, $|e_{us}|$ is the number of edges among the neighbors of vertex i, $k_i$ is the degree of vertex i, and $V_\alpha$ denotes the number of vertices in the component α. The membership value of a test instance $y \in Y$ with respect to the component α is given by:
$$\Delta G_y^{(\alpha)}(2) = \frac{\left| CC^{(\alpha)} - CC'^{(\alpha)} \right|}{\sum_{u \in U} \left| CC^{(u)} - CC'^{(u)} \right|}.$$
(24)

Here, $CC'^{(\cdot)}$ is the clustering coefficient after the temporary insertion of y. The clustering coefficient measure yields a complexity order of O(|V| p²), where |V| and p denote, respectively, the number of vertices in the graph and the average node degree.

Average degree ($\Delta G_y^{(J)}(3)$)

The average degree is a very simple measure. It quantifies, statistically, the average degree of the vertices in a component. The average degree with regard to each component α is given by
$$k^{(\alpha)} = \frac{1}{V_\alpha} \sum_{i=1}^{V_\alpha} k_i^{(\alpha)},$$
(25)
in which $k^{(\alpha)}$ is the resulting average degree and $V_\alpha$ denotes the number of vertices in component α. The membership value of a test instance $y \in Y$ with respect to component α is given by
$$\Delta G_y^{(\alpha)}(3) = \frac{\left| k^{(\alpha)} - k'^{(\alpha)} \right|}{\sum_{u \in \Gamma} \left| k^{(u)} - k'^{(u)} \right|}.$$
(26)

Here, $k'^{(\cdot)}$ is the average degree after the temporary insertion of y. The average degree measure yields a complexity order of O(|V|), where |V| denotes the number of vertices in the graph.
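
As an illustration, the three variations can be computed with off-the-shelf networkx routines by measuring each candidate component before and after a temporary insertion of the test vertex. The sketch below assumes the links of y have already been decided by (15) and treats the component as undirected when computing the measures; it is not the authors' implementation.

```python
import math
import networkx as nx

def measure_component(g, nodes):
    """Assortativity, clustering coefficient and average degree of one component."""
    und = g.subgraph(nodes).to_undirected()
    assort = nx.degree_assortativity_coefficient(und) if und.number_of_edges() > 1 else 0.0
    if math.isnan(assort):
        assort = 0.0                                    # degenerate degree sequence
    clust = nx.average_clustering(und)
    avg_deg = 2.0 * und.number_of_edges() / und.number_of_nodes()
    return [assort, clust, avg_deg]

def measure_variations(g, components, y_links):
    """|before - after| variation of each measure when y is temporarily inserted into
    each candidate component, normalized per measure as in Eqs. (21), (24) and (26).
    components: {id: vertex ids};  y_links: {id: vertices y connects to, per Eq. (15)}."""
    raw = {}
    for a, nodes in components.items():
        before = measure_component(g, nodes)
        h = g.subgraph(nodes).copy()
        h.add_edges_from(('y', i) for i in y_links[a])  # temporary insertion of y
        after = measure_component(h, h.nodes)
        raw[a] = [abs(b - t) for b, t in zip(before, after)]
    return {a: [raw[a][u] / sum(raw[b][u] for b in raw) for u in range(3)] for a in raw}
```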

Results and discussion

In this section, we present a set of computer simulations to assess the effectiveness of the HL-KAOG technique. The ‘Experiments on artificial data sets’ section supplies results obtained on artificial data sets, which emphasize the key features of HL-KAOG. The ‘Experiments on real data sets’ section provides simulations on real-world data sets, which highlight the strong ability of HL-KAOG to perform data classification. Note that the Euclidean distance is used as the similarity measure in all the experiments.

Experiments on artificial data sets

Initially, we use some artificial data sets presenting strong patterns to evaluate the proposed technique. These examples provide particular situations where low-level classifiers by themselves have trouble correctly classifying the data items in the test set. Thus, this section serves to better motivate the usage of the proposed model.

The first step of HL-KAOG is the construction of the KAOG network. Different from other techniques of network construction, KAOG is non-parametric and it builds up the network considering the purity measure (1). In the second step, HL-KAOG employs the hybrid low- and high-level techniques to classify the test instances. The low-level classification here uses Bayes optimal classifier (3) and the high-level term uses the complex network measures given by (17) to capture the pattern formation of each component. Finally, the general framework given by (16) combines low- and high-level techniques.

Triangle data set

Figure 3 shows the KAOG network on a two-class data set. One can see that the gray class data items exhibit a strong pattern (similar to a triangle). Since traditional machine learning techniques are not able to consider the pattern formation of classes, they cannot classify the new instance y (white vertex) correctly. In addition, these techniques consider only the physical distance among the data items, which contributes to classifying y as belonging to the black class. As with the Bayes optimal classifier, the same happens if we use other traditional techniques, such as decision tree, K-nearest neighbors, and support vector machine (SVM classifiers consider a transformed feature space through a kernel function, but they are still essentially based on the physical distance among the data), i.e., they are not able to identify the triangle pattern formed by the gray component.
Figure 3

KAOG network obtained from the Triangle data set.

Table 1 provides a detailed view of how the high-level classification is performed in HL-KAOG. Considering the new instance y and the network shown in Figure 3, the classification process starts by computing each network measure on each component that satisfies (14). Then, HL-KAOG inserts y temporarily into these components, according to (15), and computes the network measures on each of them again. In this way, the variation of each network measure before and after the insertion of the new instance can be obtained and (17) can be computed. Remember that the objective of the proposed function (19) is to balance the membership results obtained by the complex network measures so that the classification process takes all of the applied network measures into account.
Table 1

Employed complex network measures in high-level classification on the Triangle data set

Columns 2–5 refer to assortativity, columns 6–9 to the clustering coefficient, and columns 10–13 to the average degree.

| C_α | r(·) | r′(·) | ΔG_y^(·)(1) | f_y^(·)(1) | CC(·) | CC′(·) | ΔG_y^(·)(2) | f_y^(·)(2) | E(·) | E_y(·) | ΔG_y^(·)(3) | f_y^(·)(3) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C_gray | 0.977 | 0.968 | 0.930 | 0.869 | 0.118 | 0.101 | 0.408 | 0.256 | 6.000 | 5.667 | 0.657 | 0.489 |
| C_black | 0.850 | 0.851 | 0.070 | 0.131 | 0.188 | 0.212 | 0.592 | 0.744 | 8.000 | 7.826 | 0.343 | 0.511 |

The calculated assortativity, clustering coefficient, and component efficiency measures for each component.

Table 2 presents the performance of the proposed technique on the artificial data sets. The classification-result columns show the final probability of classification when λ = 0.2 and λ = 0.6 (the latter giving a higher contribution of the high-level algorithm to the classification). The results shown in Table 2 emphasize some interesting properties of HL-KAOG. Firstly, one can see that, with a larger portion of the high-level classification (λ = 0.6), the technique is able to correctly classify the test instance when a very clear pattern is present. However, using a larger portion of the KAOG classification (λ = 0.2), the pattern formation is not detected because the Bayes optimal classifier considers only physical attributes to perform the prediction. Secondly, we employ a non-parametric technique to build up the network and to assign a weight to each network measure, so HL-KAOG has only the λ value to set. Thirdly, HL-KAOG does not work on classes, but on components. This is an important difference in relation to the previous work on high-level classification, because the insertion of one vertex into a whole class of data items may produce only weak variations in the network measures; the variation at the component level better highlights the change caused when y is inserted into that component.
Table 2

Classification results (in terms of probability) obtained for each artificial data set when λ = 0.2 and λ = 0.6

| Data set | Class | λ = 0.2 | λ = 0.6 |
|---|---|---|---|
| Triangle data set | gray | 0.439 | 0.516 |
| | black | 0.561 | 0.483 |
| Line I data set | gray | 0.481 | 0.586 |
| | black | 0.519 | 0.414 |
| Line II data set | gray 1 | 0.000 | 0.000 |
| | gray 2 | 0.200 | 0.600 |
| | black 1 | 0.400 | 0.200 |
| | black 2 | 0.400 | 0.200 |
| Multi-class data set | black | 0.800 | 0.400 |
| | gray | 0.000 | 0.000 |
| | green | 0.200 | 0.600 |

Italic values denote the final classification result.

Line I data set

Figure 4 shows the KAOG network on a two-class data set, in which the gray class presents a clear pattern of a straight line. Note that the three nearest neighbors of the new instance y belong to the following classes: gray, black, and black, respectively. In this way, the KAOG classification is not able to detect the pattern presented by the gray component. The whole classification process performed by the Bayes optimal classifier on the KAOG is described as follows: Firstly, the algorithm computes the a priori probability of y belonging to each component, as described in (6):
Figure 4

KAOG network obtained from Line I data set, in which gray class data items form a straight line. Line I data set is composed of two-class data items (gray and black). White data item needs to be classified. Obviously, gray data items form a straight line.

$$P(y \in C_{\text{gray}}) = 0.5, \qquad P(y \in C_{\text{black}}) = 0.5,$$
where the normalized purity measure of the components is used as the a priori probability ($\Phi_{\text{gray}} = 1$ and $\Phi_{\text{black}} = 1$). Then, the probability of the neighborhood $\Lambda_y$ given that $y \in C_{\text{gray}}$ or $y \in C_{\text{black}}$ is calculated as in (4):
$$P(\Lambda_y \mid y \in C_{\text{gray}}) = 1/2, \qquad P(\Lambda_y \mid y \in C_{\text{black}}) = 2/3.$$
Then, the normalization term can be computed through (5):
$$P(\Lambda_y) \approx 0.583.$$
Finally, the final result of the KAOG classification on the Line I data set can be obtained with (3) and (7):
$$P_y(\text{gray}) = P(y \in C_{\text{gray}} \mid \Lambda_y) \approx 0.43, \qquad P_y(\text{black}) = P(y \in C_{\text{black}} \mid \Lambda_y) \approx 0.57.$$

Note that other low-level algorithms also classify y as belonging to the black class. On the other hand, Table 2 shows that, with a high contribution of the high-level term (λ = 0.6), the technique produces the correct label for y, in accordance with the pattern formation presented in Figure 4.

Line II data set

Figure 5 shows an interesting situation for two reasons. Firstly, the new instance is very close to the black components, which leads the traditional techniques to classify it into the black class. Secondly, the large distances between the new instance and the gray class data items make it difficult to assign it to the gray class. However, HL-KAOG can provide a correct classification due to its robustness in detecting pattern formation, as shown in Table 2.
Figure 5

KAOG network generated from Line II data set. This data set is composed of two-class data items (gray and black). White data item needs to be classified.

Multi-class data set

Figure 6 shows the KAOG network on a data set with three classes. There is a test instance (white color) to be classified. Figure 6a presents a dubious case in which there is no pattern formation related to the test instance. In this case, the technique (independently of the λ value) classifies the test instance as belonging to the black class. On the other hand, Figure 6b shows more representative information about the pattern formation related to the test instance. So, considering this clear pattern formation, the test instance is correctly classified when using a combination of low- and high-level algorithms with λ = 0.6 (Table 2).
Figure 6

KAOG networks obtained from different Multi-class data set scenarios: (a) a dubious case in the detection of a pattern formation related to the test instance; (b) a clear pattern formation related to the test instance.

Experiments on real data sets

We also conducted simulations on real-world data sets available in the UCI repository [29] and in the KEEL repository [30]. Table 3 provides details about the data sets used here.
Table 3

Brief description of the data sets

| Name | #Inst. | #Attr. | #Classes | Maj. class (%) |
|---|---|---|---|---|
| Iris | 150 | 4 | 3 | 33.33 |
| Glass | 214 | 9 | 7 | 35.51 |
| Balance | 625 | 4 | 3 | 46.08 |
| Monks-2 | 601 | 6 | 2 | 65.72 |
| Ecoli | 336 | 7 | 8 | 42.56 |
| Append. | 106 | 7 | 2 | 80.19 |
| Thyroid | 215 | 5 | 3 | 69.77 |
| Sonar | 208 | 60 | 2 | 53.37 |
| Digits | 5,620 | 64 | 10 | 10.18 |
| SPECTF | 267 | 44 | 2 | 79.40 |

Name, the data set name; #Inst., the number of instances; #Attr., the number of attributes; #Classes, the number of classes; Maj. class, the percentage of instances in the majority class.

The proposed technique is compared to decision tree, K-nearest neighbors, and support vector machine. All these traditional techniques are available in the Python machine learning module named scikit-learn [31]. A grid search algorithm is employed to perform parameter selection for these techniques. For decision tree, scikit-learn provides an optimized version of the classification and regression tree (CART) algorithm [14]. In these experiments, two parameters are configured: the minimum density, optimized over the set {0, 0.1, 0.25, 0.5, 0.8, 1}, which controls a trade-off in an optimization heuristic, and the minimum number of samples required to be at a leaf node, here denoted ms, which is optimized over the set ms ∈ {0, 1, 2, 3, 4, 5, 10, 15, 20, 30, 50}. Based on previous works applying KNN to real-world data sets [8], the K value is optimized over the set K ∈ {1, 3, 5, …, 31}, which is sufficient to provide the best results for this algorithm. In the SVM simulations, we reduce the search space for the optimization process by fixing a single well-known kernel, namely the radial basis function (RBF) kernel. The stopping criterion for the optimization method is defined as the Karush-Kuhn-Tucker violation being less than $10^{-3}$. For each data set, the model selection is performed by considering the kernel parameter $\gamma \in \{2^{4}, 2^{3}, \dots, 2^{-10}\}$ and the cost parameter $C \in \{2^{12}, 2^{11}, \dots, 2^{-2}\}$. Finally, the results obtained by each algorithm are averaged over 30 runs using the stratified 10-fold cross-validation process. Parameters were tuned using only the training data.
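
A rough sketch of this model selection with current scikit-learn is given below; the min_density option belongs to the older scikit-learn release used in the paper and is therefore omitted, and ms = 0 is dropped because min_samples_leaf must be at least 1. The grids otherwise follow the sets listed above.

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def tuned_classifiers():
    """Grid-search wrappers roughly matching the parameter sets listed above."""
    cv = StratifiedKFold(n_splits=10, shuffle=True)
    cart = GridSearchCV(DecisionTreeClassifier(),
                        {'min_samples_leaf': [1, 2, 3, 4, 5, 10, 15, 20, 30, 50]}, cv=cv)
    knn = GridSearchCV(KNeighborsClassifier(),
                       {'n_neighbors': list(range(1, 32, 2))}, cv=cv)
    svm = GridSearchCV(SVC(kernel='rbf', tol=1e-3),
                       {'gamma': [2.0 ** e for e in range(4, -11, -1)],
                        'C': [2.0 ** e for e in range(12, -3, -1)]}, cv=cv)
    return {'CART': cart, 'KNN': knn, 'SVM': svm}
```

Each wrapper is fitted on the training folds only and evaluated on the held-out fold, which reproduces the protocol of tuning parameters using only the training data.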

On the other hand, parameter selection is unnecessary for our technique when building up the network. Differently from the parameters in traditional machine learning (such as K in KNN, the kernel function in SVM, and so on), the λ variable does not influence the training phase. In this way, we fix λ1 = 0.2 (due to the good values found in [11]) and λ2 = 0.6 (to provide a much larger portion of the high-level classification). In the previous section, artificial data sets provided particular situations where low-level techniques have trouble performing the classification. Here, we evaluate (i) the linear combination of the K-associated optimal graph and the high-level algorithm, and (ii) the influence of λ in the context of real-world data sets. Since most techniques are essentially based on low-level characteristics, this information certainly contributes to a good classification. Consequently, when working together with low-level characteristics, high-level characteristics can improve the classification results by considering more than the physical attributes.

Table 4 shows the predictive performance of the algorithms on the real-world data sets presented in Table 3. In this table, ‘Acc.’ denotes the average accuracy of each technique and ‘Std.’ represents the standard deviation of these accuracies. In order to analyze the results statistically, we adopted a statistical test that compares multiple classifiers over multiple data sets [32]. Firstly, the Friedman test is calculated to check whether the performances of the classifiers are significantly different. Using a significance level of 5%, the null hypothesis is rejected. This means that the algorithms under study are not equivalent. Then, the Nemenyi post hoc test is employed (also considering a significance level of 5%). The results of this test indicate that HL-KAOG and SVM provide similar results and that they outperform CART and KNN. This result is quite attractive because HL-KAOG, differently from the other traditional techniques, is able to capture spatial, functional, and topological relations in the data. In addition, the computer simulations show that a much larger portion of high-level classification (λ2) can improve the final prediction in some real-world data sets. This means that these data sets present well-defined patterns, which can be detected by considering mainly the topological structure of the data instead of their physical attributes.
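
Following [32], the first step of this comparison can be sketched with SciPy as below (the accuracy matrix has one row per data set and one column per classifier); the Nemenyi post hoc test is then applied to the resulting average ranks, e.g., with the scikit-posthocs package, if available.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def friedman_comparison(acc, alpha=0.05):
    """Friedman test over an (n_datasets x n_classifiers) accuracy matrix plus the
    average rank of each classifier (rank 1 = best on a data set)."""
    acc = np.asarray(acc, dtype=float)
    stat, p_value = friedmanchisquare(*acc.T)            # one accuracy vector per classifier
    ranks = np.vstack([rankdata(-row) for row in acc])   # higher accuracy -> lower rank
    return stat, p_value, p_value < alpha, ranks.mean(axis=0)
```
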
Table 4

Comparative results obtained by HL-KAOG, CART, KNN, and SVM on ten real-world data sets

 

| Data set | HL-KAOG (Acc. ± Std.) | CART (Acc. ± Std.) | KNN (Acc. ± Std.) | SVM (Acc. ± Std.) |
|---|---|---|---|---|
| Iris | 97.33 ± 3.52 (λ1) | 93.60 ± 5.59 | 96.37 ± 4.63 | 96.28 ± 4.02 |
| Glass | 70.78 ± 9.16 (λ1) | 64.12 ± 9.33 | 72.64 ± 8.09 | 68.61 ± 7.78 |
| Balance | 95.71 ± 2.40 (λ1) | 88.20 ± 4.25 | 89.77 ± 1.96 | 99.97 ± 0.08 |
| Monks-2 | 96.53 ± 2.43 (λ1) | 95.67 ± 2.48 | 81.26 ± 5.02 | 93.79 ± 3.22 |
| Ecoli | 84.90 ± 5.73 (λ2) | 80.78 ± 5.55 | 85.99 ± 5.11 | 87.23 ± 5.22 |
| Append. | 83.54 ± 7.27 (λ1) | 77.24 ± 9.95 | 86.99 ± 8.71 | 85.72 ± 8.15 |
| Thyroid | 97.30 ± 3.16 (λ1) | 96.64 ± 2.95 | 93.58 ± 4.74 | 97.19 ± 2.59 |
| Sonar | 83.75 ± 8.07 (λ1) | 74.14 ± 9.69 | 81.78 ± 8.08 | 86.06 ± 7.43 |
| Digits | 98.75 ± 0.35 (λ1) | 90.27 ± 1.27 | 98.79 ± 0.37 | 99.26 ± 0.33 |
| SPECTF | 80.07 ± 5.42 (λ2) | 75.41 ± 6.20 | 77.90 ± 6.78 | 78.01 ± 3.92 |

‘Acc’ and ‘Std’. denote, respectively, the average of accuracy and the standard deviation over 30 runs using the stratified 10-fold cross-validation process. In HL-KAOG, the classification result is obtained from the best value between λ1 = 0.2 and λ2 = 0.6. Italic values denote the best predictive performance among the techniques for each data set.

Conclusions

HL-KAOG takes advantage of the benefits provided by the K-associated optimal graph and the high-level technique for data classification. Specifically, the former provides a non-parametric construction of the network based on the purity measure, while the latter is able to capture the pattern formation of the training data. Thus, the contributions of HL-KAOG include the following:

  •  The technique does not work on classes, but on the network components, i.e., each class can have more than one component. In this way, the insertion of a test instance can generate larger variations in the network measures. Consequently, it is much easier to check the conformity of a test instance to the formation pattern of each class component. On the other hand, the previous work on high-level classification considers the network of a whole class of data items; in this case, the variations are very weak and it is sometimes difficult to distinguish the conformity levels of the test instance to each class.

  •  The use of K-associated optimal graph to obtain a non-parametric network.

  •  An automatic way to obtain the influence coefficient for the network measures. In addition, this coefficient adapts itself according to each test instance.

  •  Development of a new network measure named component efficiency to perform the high-level classification.

Computer simulations and statistical tests show that HL-KAOG presents good performance on both artificial and real data sets. In comparison with traditional machine learning techniques, computer simulations on real-world data sets showed that HL-KAOG and support vector machines provide similar results and that they outperform very well-known techniques, such as decision trees and K-nearest neighbors. On the other hand, experiments performed with artificial data sets emphasized some drawbacks of traditional machine learning techniques, which, differently from HL-KAOG, are unable to consider the formation pattern of the data.

Forthcoming works include the incorporation of dynamical complex network measures, such as random walks and tourist walks, into the high-level classification algorithm, which can give a combined local and global vision of the networks under analysis in a natural way. Future research also includes a complete analysis of the high-level classification when dealing with imbalanced data sets and the investigation of complex network measures able to prevent the risk of overfitting in data classification.

Endnotes

a A component is a sub-graph α in which any vertex $v_i \in \alpha$ can be reached from any other $v_j \in \alpha$ and cannot be reached from any vertex $v_t \notin \alpha$.

b The degree of a vertex v, denoted by $k_v$, is the total number of vertices adjacent to v.

Declarations

Acknowledgements

The authors would like to acknowledge the São Paulo State Research Foundation (FAPESP) and the Brazilian National Council for Scientific and Technological Development (CNPq) for the financial support given to this research.

Authors’ Affiliations

(1)
Faculty of Computing (FACOM), Federal University of Uberlândia (UFU)
(2)
Institute of Mathematics and Computer Science (ICMC), University of São Paulo (USP)
(3)
Department of Computing and Mathematics, School of Philosophy, Sciences and Literatures (FFCLRP), University of São Paulo (USP)

References

  1. Newman M: The structure and function of complex networks. SIAM Rev 2003, 45(2):167–256. doi:10.1137/S003614450342480
  2. Costa LDF, Oliveira ON, Travieso G, Rodrigues FA, Boas PRV, Antiqueira L, Viana MP, Da Rocha LEC: Analyzing and modeling real-world phenomena with complex networks: a survey of applications. Adv Phys 2007, 60(3):103.
  3. Lu Z, Savas B, Tang W, Dhillon IS: Supervised link prediction using multiple sources. In 2010 IEEE International Conference on Data Mining. Sydney, Australia; 2010:923–928.
  4. Fortunato S: Community detection in graphs. Phys Rep 2010, 486(3–5):75–174.
  5. Boccaletti S, Ivanchenko M, Latora V, Pluchino A, Rapisarda A: Detecting complex network modularity by dynamical clustering. Phys Rev Lett 2007, 75:045102.
  6. Newman M: Networks: an introduction. Oxford University Press, New York; 2010.
  7. Bertini JR, Lopes AA, Zhao L: Partially labeled data stream classification with the semi-supervised k-associated graph. J Braz Comput Soc 2012, 18(4):299–310. doi:10.1007/s13173-012-0072-8
  8. Bertini JR, Zhao L, Motta R, Lopes AA: A nonparametric classification method based on k-associated graphs. Inf Sci 2011, 181(24):5435–5456. doi:10.1016/j.ins.2011.07.043
  9. Carneiro MG, Rosa JL, Lopes AA, Zhao L: Classificação de alto nível utilizando grafo k-associados ótimo. In IV International Workshop on Web and Text Intelligence. Curitiba, Brazil; 2012:1–10.
  10. Cupertino TH, Carneiro MG, Zhao L: Dimensionality reduction with the k-associated optimal graph applied to image classification. In 2013 IEEE International Conference on Imaging Systems and Techniques. Beijing, China; 2013:366–371.
  11. Silva TC, Zhao L: Network-based high level data classification. IEEE Trans Neural Netw 2012, 23:954–970.
  12. Bishop CM: Pattern recognition and machine learning. Information Science and Statistics. Springer-Verlag, New York; 2006.
  13. Mitchell T: Machine learning. McGraw-Hill Series in Computer Science. McGraw-Hill, New York; 1997.
  14. Breiman L: Classification and regression trees. Chapman & Hall, London; 1984.
  15. Quinlan J: Induction of decision trees. Mach Learn 1986, 1:81–106.
  16. Aha DW, Kibler D, Albert M: Instance-based learning algorithms. Mach Learn 1991, 6:37–66.
  17. Haykin S: Neural networks: a comprehensive foundation. Prentice Hall PTR, Upper Saddle River; 1998.
  18. Neapolitan RE: Learning Bayesian networks. Prentice-Hall, Upper Saddle River; 2003.
  19. Cortes C, Vapnik V: Support-vector networks. Mach Learn 1995, 20(3):273–297.
  20. Chapelle O, Scholkopf B, Zien A: Semi-supervised learning. MIT Press, Cambridge; 2006.
  21. Schaeffer SE: Graph clustering. Comput Sci Rev 2007, 1(1):27–64. doi:10.1016/j.cosrev.2007.05.001
  22. Yan S, Xu D, Zhang B, Zhang HJ, Yang Q, Lin S: Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 2007, 29(1):40–51.
  23. Carneiro MG, Zhao L: High level classification totally based on complex networks. In Proceedings of the 1st BRICS Countries Congress. Porto de Galinhas, Brazil; 2013:1–8.
  24. Rossi R, de Paulo Faleiros T, de Andrade Lopes A, Rezende S: Inductive model generation for text categorization using a bipartite heterogeneous network. In 2012 IEEE International Conference on Data Mining. Brussels, Belgium; 2012:1086–1091.
  25. Andrade RFS, Miranda JGV, Pinho STR, Lobão TP: Characterization of complex networks by higher order neighborhood properties. Eur Phys J B 2006, 61(2):28.
  26. Newman MEJ: Assortative mixing in networks. Phys Rev Lett 2002, 89:208701.
  27. Latora V, Marchiori M: Efficient behavior of small-world networks. Phys Rev Lett 2001, 87:198701.
  28. Watts D, Strogatz S: Collective dynamics of small-world networks. Nature 1998, 393:440–442. doi:10.1038/30918
  29. Frank A, Asuncion A: UCI machine learning repository. 2010. http://archive.ics.uci.edu/ml. Accessed 10 Nov 2013
  30. Alcalá-Fdez J, Fernández A, Luengo J, Derrac J, García S: KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Multiple-Valued Logic Soft Comput 2011, 17(2–3):255–287.
  31. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E: Scikit-learn: machine learning in Python. J Mach Learn Res 2011, 12:2825–2830.
  32. Demšar J: Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 2006, 7:1–30.

Copyright

© Carneiro et al.; licensee Springer. 2014

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.