A systematic review of named entity recognition in biomedical texts

Goulart, Rodrigo Rafael Villarreal; Strube de Lima, Vera Lúcia; Xavier, Clarissa Castellã

doi:10.1007/s13173-011-0031-9

Original Paper
Open access
Published: 02 March 2011

Rodrigo Rafael Villarreal Goulart¹,
Vera Lúcia Strube de Lima² &
Clarissa Castellã Xavier²

Journal of the Brazilian Computer Society volume 17, pages 103–116 (2011)

Abstract

Biomedical Named Entities (NEs) are phrases or combinations of phrases that denote specific objects or groups of objects in the biomedical literature. Research on Named Entity Recognition (NER) is one of the most widespread activities in the automatic processing of biomedical scientific articles. Through a systematic review, we analyzed articles on NER in biomedical texts published from 2007 to 2009. The results identify the main methods used to recognize biomedical NEs, along with the features and methodologies used to implement a NER system. Beyond the trends identified, we detect some gaps that may offer opportunities for new studies in the area.

Author information

Authors and Affiliations

Feevale University, Novo Hamburgo, Brazil
Rodrigo Rafael Villarreal Goulart
PUCRS, Porto Alegre, Brazil
Vera Lúcia Strube de Lima & Clarissa Castellã Xavier

Corresponding author

Correspondence to Rodrigo Rafael Villarreal Goulart.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Goulart, R.R.V., Strube de Lima, V.L. & Xavier, C.C. A systematic review of named entity recognition in biomedical texts. J Braz Comput Soc 17, 103–116 (2011). https://doi.org/10.1007/s13173-011-0031-9

Received: 05 August 2010
Accepted: 03 February 2011
Published: 02 March 2011
Issue Date: June 2011
DOI: https://doi.org/10.1007/s13173-011-0031-9
