A graph clustering algorithm based on a clustering coefficient for weighted graphs

Nascimento, Mariá C. V.; Carvalho, André C. P. L. F.

doi:10.1007/s13173-010-0027-x

Original Paper
Open access
Published: 21 December 2010

Journal of the Brazilian Computer Society, volume 17, pages 19–29 (2011)

Abstract

Graph clustering is important in many applications that involve data analysis on graphs. However, discovering groups of highly connected nodes that can represent clusters is not an easy task. Assumptions, such as the number of clusters and whether the clusters are balanced, may need to be made before a clustering algorithm is applied. Moreover, without prior information about the data labels, there is no guarantee that the partition found by a clustering algorithm captures the relevant information present in the data. This paper proposes a new graph clustering algorithm that automatically defines the number of clusters based on a connectivity-based clustering tendency validation measure, which is also proposed in the paper. According to the computational results, the new algorithm is able to efficiently find graph clustering partitions for complete graphs.

Author information

Authors and Affiliations

Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, Caixa Postal 668, São Carlos, SP, CEP 13560-970, Brazil
Mariá C. V. Nascimento & André C. P. L. F. Carvalho

Corresponding author

Correspondence to Mariá C. V. Nascimento.

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Nascimento, M.C.V., Carvalho, A.C.P.L.F. A graph clustering algorithm based on a clustering coefficient for weighted graphs. J Braz Comput Soc 17, 19–29 (2011). https://doi.org/10.1007/s13173-010-0027-x

Received: 27 May 2010
Accepted: 29 November 2010
Published: 21 December 2010
Issue Date: March 2011
DOI: https://doi.org/10.1007/s13173-010-0027-x