From: A review on Relation Extraction with an eye on Portuguese
References | Data/corpora | Data size | Method | Evaluation | Performance (%) |
---|---|---|---|---|---|
Brin [10] | Web pages | 24 million pages | Exact pattern matching | Manual evaluation of 20 books selected from a list of over 150,000 | 19 correct books (95 %) |
Agichtein and Gravano [1] | North American News corpus | 300,000 newspapers | Matching with similarity function | Manual evaluation of a set of 100 tuples | 93 correct tuples (93 %) |
Hasegawa et al. [54] | Articles from New York Times | 1 year (1995) | Clustering | Manual evaluation of the relations for 2 domains | Person-GPE F\(=\) 80 %, Company-Company F\(=\) 75 % |
Pantel and Pennacchiotti [79] | Articles from TREC-9 and CHEM | TREC-9 \(=\) 5,951,432 words, CHEM \(=\) 313,590 words | Weakly supervised classifier | Manual annotation of 680 instances from TREC and CHEM corpora (2 experts) | TREC part-of P\(=\) 69.9 %, succession P\(=\) 49 %, CHEM is-a P\(=\) 76 %, reaction P\(=\) 91.4 %, production P\(=\) 55.8 % |
Carlson et al. [17] | Web pages | 200 million pages | Coupled semi-supervised learning | Freebase database as gold standard | Category average P\(=\) 83 %; relation average P\(=\) 84 % |
Li et al. [64] | Wikipedia and YAGO project | Wikipedia \(=\) 4,556,821 pages, YAGO \(=\) 67,973 entity pairs | Semi-supervised multi-view ranking | 5 relation types extracted by the YAGO project as gold standard | Relation average \(=\) 39 % |
| | Web pages | 9 million pages | Naive Bayes | Manual evaluation of 400 tuples (3 experts) | 80.4 % correct tuples |
Banko and Etzioni [4] | Sent500 corpus [13] | Sent500 \(=\) 500 sentences | Conditional Random Fields | Small set of labeled data for 4 relations from Sent500 | Open relation F\(=\) 59.8 %; pre-specified relation F\(=\) 29.5 % |
Zhu et al. [105] | Sent500 corpus and Web1M corpus | Sent500 \(=\) 500 sentences, Web1M \(=\) 1 million blocks of Web pages | Markov Logic Networks | Manual evaluation of the extracted tuples from Sent500 | F\(=\) 76.4 % |
| | WSJ from Penn Treebank, Wikipedia and Web pages | – | Conditional Random Fields | Manual evaluation of 300 sentences from each corpus (2 experts) | WSJ F\(=\) 64.7 %, Wikipedia F\(=\) 57.2 %, Web F\(=\) 65 % |
Culotta et al. [25] | Articles from Wikipedia | 271 articles | Conditional Random Fields | Manual annotation of the 53 family relations | F\(=\) 61.36 % |
Li et al. [65] | Articles from New York Times, articles from Wikipedia [25] | New York Times \(=\) 150 articles, Wikipedia \(=\) 271 articles | Conditional Random Fields | Manual annotation of the relations | New York Times Employment F\(=\) 80 %, Wikipedia personal/social F\(=\) 51 % |
Fader et al. [36] | Web pages | 500 sentences | Logistic regression classifier | Manual evaluation of each extraction as correct or incorrect (2 experts) | F\(=\) 69.8 % |
Liu et al. [66] | Expert-curated corpus | 150K words | Semantic interpretation approach | Manual annotation of 565 relation instances for protein-organism-location | F\(=\) 74.9 % |
Gamallo et al. [48] | Sentences from Wikipedia in English, Spanish, Galician, Portuguese (2010) | English \(=\) 78,826,696, Spanish \(=\) 21,208,089, Galician \(=\) 1,461,705, Portuguese \(=\) 11,714,672 | Unsupervised extraction of verb-based triples | Manual evaluation of 200 sentences from English Wikipedia (2 experts) | P\(=\) 68 % |
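
For the manually evaluated systems, the percentages in the Performance column follow from the counts in the Evaluation column (for example, 19 correct out of 20 evaluated books). The sketch below shows that arithmetic. The function names `precision` and `f_measure` are illustrative and do not come from the reviewed papers. The balanced (harmonic mean) F-measure is an assumption, since the table does not state which F variant each paper reports, and the recall value in the last call is hypothetical.

```python
def precision(correct: int, evaluated: int) -> float:
    """Percentage of manually evaluated extractions judged correct."""
    if evaluated <= 0:
        raise ValueError("evaluated must be a positive count")
    return 100.0 * correct / evaluated


def f_measure(p: float, r: float) -> float:
    """Balanced F-measure (harmonic mean of precision p and recall r).

    Assumption: balanced F1; the table does not say which variant is used.
    """
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# Counts taken from the table above.
brin = precision(19, 20)        # Brin [10]: 19 of 20 books correct
snowball = precision(93, 100)   # Agichtein and Gravano [1]: 93 of 100 tuples

# Hypothetical precision/recall pair, only to exercise f_measure.
example_f = f_measure(50.0, 50.0)

print(brin, snowball, example_f)
```

Because most rows rely on manual evaluation of a small sample, the reported figures are precision on that sample rather than corpus-wide scores, which is worth keeping in mind when comparing rows.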