A review on Relation Extraction with an eye on Portuguese

de Abreu, Sandra Collovini; Bonamigo, Tiago Luis; Vieira, Renata

doi:10.1007/s13173-013-0116-8

Journal of the Brazilian Computer Society

Table 4 Data and evaluation methods for English


References	Data/corpora	Data size	Method	Evaluation	Performance (%)
Brin [10]	Web pages	24 million pages	Exact pattern matching	Manual evaluation of 20 books selected from a list of over 150,000	19 correct books (95 %)
Agichtein and Gravano [1]	North American News corpus	300,000 newspaper articles	Matching with a similarity function	Manual evaluation of a set of 100 tuples	93 correct tuples (93 %)
Hasegawa et al. [54]	Articles from the New York Times	1 year (1995)	Clustering	Manual evaluation of the relations for 2 domains	Person-GPE F = 80 %, Company-Company F = 75 %
Pantel and Pennacchiotti [79]	Articles from TREC-9 and CHEM	TREC-9 = 5,951,432 words, CHEM = 313,590 words	Weakly supervised classifier	Manual annotation of 680 instances from the TREC and CHEM corpora (2 experts)	TREC part-of P = 69.9 %, succession P = 49 %; CHEM is-a P = 76 %, reaction P = 91.4 %, production P = 55.8 %
Carlson et al. [17]	Web pages	200 million pages	Coupled semi-supervised learning	Freebase database as gold standard	Category average P = 83 %; relation average P = 84 %
Li et al. [64]	Wikipedia and YAGO project	Wikipedia = 4,556,821 pages, YAGO = 67,973 entity pairs	Semi-supervised multi-view ranking	5 relation types extracted by the YAGO project as gold standard	Relation average = 39 %
Banko and Cafarella [3], Yates et al. [102]	Web pages	9 million pages	Naive Bayes	Manual evaluation of 400 tuples (3 experts)	80.4 % correct tuples
Banko and Etzioni [4]	Sent500 corpus [13]	Sent500 = 500 sentences	Conditional Random Fields	Small set of labeled data for 4 relations from Sent500	Open relation F = 59.8 %; pre-specified relation F = 29.5 %
Zhu et al. [105]	Sent500 corpus and Web1M corpus	Sent500 = 500 sentences, Web1M = 1 million Web page blocks	Markov Logic Networks	Manual evaluation of the tuples extracted from Sent500	F = 76.4 %
Wu [99], Wu and Weld [100]	WSJ from Penn Treebank, Wikipedia and Web pages	–	Conditional Random Fields	Manual evaluation of 300 sentences from each corpus (2 experts)	WSJ F = 64.7 %, Wikipedia F = 57.2 %, Web F = 65 %
Culotta et al. [25]	Articles from Wikipedia	271 articles	Conditional Random Fields	Manual annotation of the 53 family relations	F = 61.36 %
Li et al. [65]	Articles from the New York Times, articles from Wikipedia [25]	New York Times = 150 articles, Wikipedia = 271 articles	Conditional Random Fields	Manual annotation of the relations	New York Times Employment F = 80 %, Wikipedia personal/social F = 51 %
Fader et al. [36]	Web pages	500 sentences	Logistic regression classifier	Manual evaluation of each extraction as correct or incorrect (2 experts)	F = 69.8 %
Liu et al. [66]	Expert-curated corpus	150K words	Semantic interpretation approach	Manual annotation of 565 relation instances for protein-organism-location	F = 74.9 %
Gamallo et al. [48]	Sentences from Wikipedia in English, Spanish, Galician and Portuguese (2010)	English = 78,826,696, Spanish = 21,208,089, Galician = 1,461,705, Portuguese = 11,714,672	Unsupervised extraction of verb-based triples	Manual evaluation of 200 sentences from English Wikipedia (2 experts)	P = 68 %
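The Performance column mixes two kinds of figures: the share of manually judged outputs found correct (a precision-style ratio, e.g. 19 of 20 books for Brin [10] or 93 of 100 tuples for Agichtein and Gravano [1]) and the F-measure. The following is a minimal Python sketch of both computations, assuming the standard definitions of precision and of the F-measure as the weighted harmonic mean of precision and recall (with β = 1 unless stated). The function names are hypothetical, and the inputs in the F-measure lines are made-up values for illustration, not figures from the surveyed papers.

```python
def precision(correct: int, evaluated: int) -> float:
    """Percentage of manually judged extractions that were correct."""
    if evaluated <= 0:
        raise ValueError("evaluated must be positive")
    return 100.0 * correct / evaluated


def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision p and recall r (both in %).

    beta > 1 weights recall more heavily; beta = 1 gives the usual F1.
    """
    if p == 0 and r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# Ratios that can be recomputed from the counts given in the table
print(precision(19, 20))   # Brin [10]: 19 of 20 books
print(precision(93, 100))  # Agichtein and Gravano [1]: 93 of 100 tuples

# Hypothetical precision/recall values, only to show the harmonic mean
print(f_measure(60.0, 40.0))
```

Because the F-measure is a harmonic mean, it stays close to the lower of the two inputs (60 % precision and 40 % recall give 48 %, not 50 %), so F scores in the table are not directly comparable with the precision-only figures reported for other systems.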