RelHunter: a machine learning method for relation extraction from text

Fernandes, Eraldo R.; Milidiú, Ruy L.; Rentería, Raúl P.

doi:10.1007/s13173-010-0018-y

Original Paper
Open access
Published: 08 August 2010

Journal of the Brazilian Computer Society, volume 16, pages 191–199 (2010)

Abstract

We propose RelHunter, a machine learning-based method for the extraction of structured information from text. RelHunter’s key idea is to model the target structures as a relation over entities. Hence, the modeling effort is reduced to the identification of entities and the generation of a candidate relation, which are simpler problems than the original one. RelHunter fits a very broad spectrum of complex computational linguistic problems. We apply it to five tasks: phrase chunking, clause identification, hedge detection, quotation extraction, and dependency parsing. We compare RelHunter to token classification approaches through several computational experiments on seven multilingual corpora. RelHunter outperforms the token classification approaches by 2.14% on average. Moreover, we compare the derived systems against state-of-the-art systems for each corpus. Our systems achieve state-of-the-art performances for three corpora: Portuguese phrase chunking, Portuguese clause identification, and English quotation extraction. Additionally, the derived systems show good quality performance for the other four corpora.
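The decomposition described in the abstract (identify entities, generate a candidate relation, then keep the candidates that belong to the target relation) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Entity`, `identify_entities`, `generate_candidates`, `extract_relation`, and `toy_accept` are hypothetical, treating every token as an entity is an assumption, and the hand-written `toy_accept` rule merely stands in for a learned binary classifier.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Set, Tuple


@dataclass(frozen=True)
class Entity:
    """A token treated as an entity (hypothetical representation)."""
    index: int
    text: str


Candidate = Tuple[Entity, Entity]


def identify_entities(tokens: List[str]) -> List[Entity]:
    # Step 1 (assumption): every token is an entity.
    return [Entity(i, t) for i, t in enumerate(tokens)]


def generate_candidates(entities: List[Entity], max_span: int = 3) -> Iterator[Candidate]:
    # Step 2 (assumption): the candidate relation contains every
    # (start, end) pair covering at most max_span consecutive tokens.
    for i, start in enumerate(entities):
        for end in entities[i:i + max_span]:
            yield (start, end)


def extract_relation(
    tokens: List[str],
    accept: Callable[[Entity, Entity], bool],
    max_span: int = 3,
) -> Set[Tuple[int, int]]:
    # Step 3: keep only the candidates that the binary decision accepts.
    entities = identify_entities(tokens)
    return {
        (start.index, end.index)
        for start, end in generate_candidates(entities, max_span)
        if accept(start, end)
    }


def toy_accept(start: Entity, end: Entity) -> bool:
    # Hand-written stand-in for a trained classifier (assumption):
    # accept spans that start with "the" and end with a known noun.
    return start.text.lower() == "the" and end.text.lower() in {"cat", "mat"}


if __name__ == "__main__":
    sentence = ["The", "cat", "sat", "on", "the", "mat"]
    print(sorted(extract_relation(sentence, toy_accept)))
```

In this toy setting, a chunk is a (start token, end token) pair, so chunking reduces to a binary decision over candidate pairs. Bounding the span length with `max_span` keeps the candidate set small, at the cost of missing longer structures.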

Author information

Authors and Affiliations

Departamento de Informática, PUC-Rio, R. Mq. de São Vicente 225, Rio de Janeiro, 22.451-900, Brasil
Eraldo R. Fernandes & Ruy L. Milidiú
Microsoft Enterprise Search Group, Redmond, USA
Raúl P. Rentería

Corresponding author

Correspondence to Eraldo R. Fernandes.

Additional information

This work was partially funded by CNPq and FAPERJ grants 557.128/2009-9 and E-26/170028/2008. The first author holds a CNPq doctoral fellowship and is supported by Instituto Federal de Educação, Ciência e Tecnologia de Goiás, Brazil.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Fernandes, E.R., Milidiú, R.L. & Rentería, R.P. RelHunter: a machine learning method for relation extraction from text. J Braz Comput Soc 16, 191–199 (2010). https://doi.org/10.1007/s13173-010-0018-y

Received: 08 February 2010
Accepted: 16 July 2010
Published: 08 August 2010
Issue Date: September 2010
DOI: https://doi.org/10.1007/s13173-010-0018-y