Table 5 Data and evaluation methods for Portuguese

From: A review on Relation Extraction with an eye on Portuguese

| References | Data/corpora | Data size | Method | Evaluation | Performance (%) |
| --- | --- | --- | --- | --- | --- |
| Brucksen et al. [11] | HAREM/ReRelEM Golden Collection | 4,417 words | Set of heuristics based on morphosyntactic and semantic information | Golden Standard annotated manually | All relations F = 36% |
| Cardoso [15] | HAREM/ReRelEM Golden Collection | 4,417 words | Set of grammar rules | Golden Standard annotated manually | All relations F = 45% |
| Chaves [19] | HAREM/ReRelEM Golden Collection | 4,417 words | Set of grammar rules | Golden Standard annotated manually | All relations F = 27% |
| Xavier and de Lima [101] | Tourism category from Wikipedia | | Semi-automatic method based on structure from Wikipedia and syntactic heuristics | Golden Standard | The domain of Tourism F = 85% |
| Santos et al. [85] | Biographical texts from Wikipedia, CETEMPúblico corpus | CETEMPúblico = 110 sentences | Rule-based approach | Manual evaluation of the family relations | Wikipedia F = 29%, CETEMPúblico F = 36% |
| Ferreira et al. [39] | MedAlert corpus | 2,724,860 tokens | REMMA system | MedAlert Golden Standard composed of 20 manually annotated texts | Inclusion F = 89% |
| Tanev et al. [97] | News articles in Portuguese about security and disaster-related topics | News articles = 3.4 million titles, disaster-related articles = 100 (April 2009) | Ontopopulis system | Comparative evaluation between Baseline Portuguese and the results | Dead F = 69%, Wounded F = 51%, Kidnapped F = 67%, Arrested F = 47% |
| Fernandes et al. [38] | GLOBO QUOTES from Globo.com | Around 13.5 million tokens | Entropy Guided Transformation Learning | Baseline system manually constructed | Quotation-Author F = 79.02% |
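
The Performance column reports F-measure values. As a reading aid, here is a minimal sketch of how an F-measure is computed from precision and recall. It assumes the balanced F1 variant (beta = 1), which the table itself does not confirm for every system. The function name `f_measure` and the input values are hypothetical and are not taken from any of the cited works.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score for precision and recall given in [0, 1].

    With beta = 1 this is the harmonic mean of precision and recall (F1).
    Assumption: the cited systems report balanced F1; the table does not say.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # No correct predictions at all: define F as 0 to avoid dividing by zero.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Hypothetical inputs for illustration only (not from the table).
score = f_measure(precision=0.5, recall=0.4)
print(f"F = {100 * score:.2f} %")  # prints "F = 44.44 %"
```

Because F is a harmonic mean, it is pulled toward the lower of the two inputs, so a system with high precision but low recall (or the reverse) still gets a modest score.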