Table 5 Data and evaluation methods for Portuguese

From: A review on Relation Extraction with an eye on Portuguese

| References | Data/corpora | Data size | Method | Evaluation | Performance (%) |
| --- | --- | --- | --- | --- | --- |
| Brucksen et al. [11] | HAREM/ReRelEM Golden Collection | 4,417 words | Set of heuristics based on morphosyntactic and semantic information | Golden Standard annotated manually | All relations F = 36% |
| Cardoso [15] | HAREM/ReRelEM Golden Collection | 4,417 words | Set of grammar rules | Golden Standard annotated manually | All relations F = 45% |
| Chaves [19] | HAREM/ReRelEM Golden Collection | 4,417 words | Set of grammar rules | Golden Standard annotated manually | All relations F = 27% |
| Xavier and de Lima [101] | Tourism category from Wikipedia | – | Semi-automatic method based on Wikipedia structure and syntactic heuristics | Golden Standard for the Tourism domain | F = 85% |
| Santos et al. [85] | Biographical texts from Wikipedia, CETEMPúblico corpus | CETEMPúblico = 110 sentences | Rule-based approach | Manual evaluation of the family relations | Wikipedia F = 29%; CETEMPúblico F = 36% |
| Ferreira et al. [39] | MedAlert corpus | 2,724,860 tokens | REMMA system | MedAlert Golden Standard composed of 20 manually annotated texts | Inclusion F = 89% |
| Tanev et al. [97] | Portuguese news articles about security and disaster-related topics | News articles = 3.4 million titles; disaster-related articles = 100 (April 2009) | Ontopopulis system | Comparison of the results against a Portuguese baseline | Dead F = 69%, Wounded F = 51%, Kidnapped F = 67%, Arrested F = 47% |
| Fernandes et al. [38] | GLOBO QUOTES from Globo.com | Around 13.5 million tokens | Entropy Guided Transformation Learning | Manually constructed baseline system | Quotation-Author F = 79.02% |