scriptLattes: an open-source knowledge extraction system from the Lattes platform

Mena-Chalco, Jesús Pascual; Junior, Roberto Marcondes Cesar

doi:10.1007/BF03194511

Open access
Published: December 2009

Journal of the Brazilian Computer Society, volume 15, pages 31–39 (2009)

Abstract

The Lattes platform is the major scientific information system maintained by the National Council for Scientific and Technological Development (CNPq). It manages the curricular information of researchers and institutions working in Brazil, based on the so-called Lattes Curriculum. However, the public information is available only individually for each researcher. The platform does not automatically create reports on the scientific production of research groups. It is therefore difficult to extract and summarize useful knowledge for medium- to large-sized groups of researchers. This paper describes the design, implementation and experiences with scriptLattes, an open-source system that creates academic reports for groups based on curricula from the Lattes Database. The scriptLattes system is composed of the following modules: (a) data selection, (b) data preprocessing, (c) redundancy treatment, (d) generation of a collaboration graph among group members, (e) generation of a research map based on geographical information, and (f) automatic creation of reports on bibliographical, technical and artistic production and on academic supervisions. The system has been extensively tested on a wide variety of research groups at Brazilian institutions. The generated reports have proven to be an easy way to extract knowledge from data in the context of the Lattes platform. The source code, usage instructions and examples are available at http://scriptlattes.sourceforge.net/.
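Redundancy treatment (c) and collaboration-graph generation (d) are closely linked. A paper co-authored by two group members appears in both of their curricula, so it must be merged before shared papers can be counted. The sketch below shows one way these two steps could fit together. It assumes each member's curriculum has already been reduced to a list of publication titles. The accent/punctuation normalization, the `difflib` similarity rule with a 0.9 threshold, and the function names `normalize`, `merge_publications` and `collaboration_graph` are illustrative assumptions, not the actual scriptLattes implementation.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations


def normalize(title: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def similar(a: str, b: str, threshold: float = 0.9) -> bool:
    """Assumed matching rule: titles match if their similarity ratio >= threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def merge_publications(cvs: dict[str, list[str]],
                       threshold: float = 0.9) -> list[tuple[str, set[str]]]:
    """Merge titles listed in several members' CVs into unique entries.

    Returns (normalized title, set of members listing it) pairs.
    Every title is compared with every merged entry, so this is quadratic.
    """
    merged: list[tuple[str, set[str]]] = []
    for member, titles in cvs.items():
        for title in titles:
            key = normalize(title)
            for existing, members in merged:
                if similar(key, existing, threshold):
                    members.add(member)  # redundant entry: record another author
                    break
            else:
                merged.append((key, {member}))  # first time this title is seen
    return merged


def collaboration_graph(merged: list[tuple[str, set[str]]]) -> dict[tuple[str, str], int]:
    """Undirected weighted edges: weight = number of shared merged publications."""
    edges: dict[tuple[str, str], int] = {}
    for _, members in merged:
        for u, v in combinations(sorted(members), 2):
            edges[(u, v)] = edges.get((u, v), 0) + 1
    return edges


if __name__ == "__main__":
    # Hypothetical group of three members with overlapping publication lists.
    cvs = {
        "Ana": ["Face recognition with 3D data", "Graph matching for images"],
        "Bruno": ["Face Recognition with 3-D data.", "Texture analysis"],
        "Carla": ["Graph matching for images", "Texture analysis"],
    }
    merged = merge_publications(cvs)
    print(len(merged))  # 3
    print(collaboration_graph(merged))
    # {('Ana', 'Bruno'): 1, ('Ana', 'Carla'): 1, ('Bruno', 'Carla'): 1}
```

The pairwise title comparison is simple and tolerates small spelling differences between curricula. It may be adequate for the medium-sized groups the abstract targets. Larger groups would likely need a cheaper pre-filter, for example comparing only publications from the same year, before computing string similarity.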

Author information

Authors and Affiliations

Department of Computer Science, Institute of Mathematics and Statistics, University of São Paulo - USP, Rua do Matão, 1010, 05508-090, São Paulo, SP, Brazil
Jesús Pascual Mena-Chalco & Roberto Marcondes Cesar Junior

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Mena-Chalco, J.P., Junior, R.M.C. scriptLattes: an open-source knowledge extraction system from the Lattes platform. J Braz Comp Soc 15, 31–39 (2009). https://doi.org/10.1007/BF03194511

Received: 07 May 2009
Accepted: 15 December 2009
Issue Date: December 2009
DOI: https://doi.org/10.1007/BF03194511
