Research in KA has focused on acquiring common-sense knowledge through the collaborative effort of specialists as well as of web users in general. The CYC project [16] is one of the oldest examples of a common-sense base built by specialists. In the early years of the project, CYC already had 1.6 million rules and 180,000 concepts. The initial knowledge acquisition effort was carried out by a group of specialists who were paid to perform this task. A drawback of this approach is that the acquired knowledge depends on specialists. As an improvement, [39] propose a KA system based on interactive dialogue. Their methodology is similar to the one proposed here: before including a concept, the user chooses a similar concept from the CYC base and then interactively accepts or rejects the assertions of that similar concept as candidates for the new concept. For example, if the user wishes to include the concept “computer” and the CYC base contains the concept “notebook”, the user can select the latter and be guided through a question-and-answer process aimed at acquiring common-sense facts for “computer” based on what is already known about “notebook”. The authors of [39] do not present an evaluation of this KA method.
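The interactive dialogue of [39] can be pictured as proposing each assertion of the chosen similar concept for the new concept and keeping only those the user confirms. The Python sketch below is a minimal illustration under that assumption, not the system of [39]: the toy knowledge base, the (relation, concept, value) triple layout, and the names `KB` and `acquire_by_analogy` are hypothetical, and the `accept` callback stands in for the user's answer during the dialogue.

```python
from typing import Callable

# (relation, concept, value), the same order used for triples elsewhere in this text
Triple = tuple[str, str, str]

# Hypothetical toy knowledge base: concept -> list of (relation, value) assertions.
KB: dict[str, list[tuple[str, str]]] = {
    "notebook": [
        ("hasPart", "keyboard"),
        ("usedFor", "writing documents"),
        ("isA", "portable device"),
    ],
}


def acquire_by_analogy(
    new_concept: str,
    similar_concept: str,
    kb: dict[str, list[tuple[str, str]]],
    accept: Callable[[Triple], bool],
) -> list[Triple]:
    """Propose each assertion of `similar_concept` for `new_concept`,
    store the ones the user accepts in `kb`, and return them."""
    acquired: list[Triple] = []
    for relation, value in kb.get(similar_concept, []):
        candidate = (relation, new_concept, value)
        if accept(candidate):  # in a real dialogue, the user answers here
            acquired.append(candidate)
    kb.setdefault(new_concept, []).extend((r, v) for r, _, v in acquired)
    return acquired


# Simulated user who rejects the assertion that a computer is a portable device.
result = acquire_by_analogy(
    "computer", "notebook", KB, accept=lambda t: t[2] != "portable device"
)
print(result)
# [('hasPart', 'computer', 'keyboard'), ('usedFor', 'computer', 'writing documents')]
```

In an interactive tool, `accept` could wrap a prompt to the user; keeping it as a parameter makes the acquisition logic testable without a dialogue.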
In 2000, the Open Mind Common Sense (OMCS) project [34] was launched to collect, from the Internet and from volunteer contributors, sentences expressing facts of everyday life. The OMCS corpus gave rise to the common-sense knowledge triples of ConceptNet [12]. The newer version of OMCS [35] provides functionalities that help users refine and validate the collected knowledge. Version 3.0 of the project extended it to other languages and added common-sense relations of a negative nature, such as “dogs cannot fly.” ConceptNet, currently in version 5.0, is the largest common-sense base, with 35,854,766 relations; this version combines knowledge acquired from other bases and corpora, such as Wikipedia.
Speer et al. [36] propose an interactive game called 20 Questions with the dual purpose of motivating voluntary contributions to the OMCS project and increasing the rate at which new knowledge is acquired. The game uses a hierarchical clustering model to define a set of 20 questions that engage the user and determine a cluster of concepts. For example, to acquire the concept “apple”, the game asks the following questions:
- Is it an example of a place? Answer: No
- Is it an example of food? Answer: Yes
- Can you find it in a store? Answer: Yes
- ...
Based on these responses, the clustering algorithm can determine that the new concept “apple” belongs to the same cluster as the concepts “cheese”, “bread”, “meat”, etc. This KA method was evaluated in two ways. The first evaluation was a questionnaire in which users of the game compared the proposed method with the traditional way of including common-sense relations in the OMCS project, for example by rating how much more amusing and how intuitive the game is. On average, 80 % of users found the 20 Questions game more amusing than the traditional method; however, 56 % of users did not consider it intuitive. In the second evaluation, the authors measured the time needed to include a concept with and without the game: users of the game took 50 % less time than users who did not use it. The quality of the acquired content was not assessed.
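To make the clustering step concrete, the sketch below encodes each concept as a vector of yes/no answers and assigns a new concept to the cluster whose centroid is nearest. This is a deliberate simplification, not the hierarchical clustering model of Speer et al. [36]: the questions mirror the “apple” example, while the clusters, their member vectors, and the names `CLUSTERS` and `assign_cluster` are hypothetical.

```python
# Questions from the "apple" example; answers are encoded as 1 (yes) / 0 (no).
QUESTIONS = [
    "Is it an example of a place?",
    "Is it an example of food?",
    "Can you find it in a store?",
]

Answers = tuple[int, ...]

# Hypothetical clusters of known concepts with their answer vectors.
CLUSTERS: dict[str, dict[str, Answers]] = {
    "food": {"cheese": (0, 1, 1), "bread": (0, 1, 1), "meat": (0, 1, 1)},
    "places": {"park": (1, 0, 0), "beach": (1, 0, 0)},
    "tools": {"hammer": (0, 0, 1), "screwdriver": (0, 0, 1)},
}


def centroid(vectors: list[Answers]) -> tuple[float, ...]:
    """Component-wise mean of the answer vectors of one cluster."""
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def squared_distance(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_cluster(answers: Answers, clusters: dict[str, dict[str, Answers]]) -> str:
    """Return the name of the cluster whose centroid is closest to `answers`."""
    return min(
        clusters,
        key=lambda name: squared_distance(answers, centroid(list(clusters[name].values()))),
    )


apple = (0, 1, 1)  # place? no; food? yes; found in a store? yes
print(assign_cluster(apple, CLUSTERS))  # food
```

A hierarchical model would additionally organize clusters into a tree and choose the next question adaptively; the flat nearest-centroid rule only shows how answers place a concept among known ones.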
The Verbosity project [1] is also an interactive game for common-sense KA. Like [36], its main idea is to turn common-sense KA into something amusing and interesting. It is a guessing game between a Narrator and a Guesser: the Narrator chooses a word and gives hints to help the Guesser discover it. Hints are formulated from templates covering a predetermined set of common-sense relation types (contain, is a type of, is about, is the opposite of, is used for, is within, etc.). For example, the Narrator chooses the concept “computer” and formulates hints such as “It contains a keyboard.”, continuing until the Guesser discovers the chosen concept. If the Guesser succeeds, the hints given during the round are stored as common-sense relations about the concept; in the example, the relation “computer contains keyboard” is added to the knowledge base. In the evaluation of this method, users contributed on average 29.58 common-sense relations in an average usage time of 23.58 min.
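The way Verbosity turns hints into relations can be sketched as template matching: each hint is matched against a template, and if the Guesser names the secret concept, every hint given so far becomes a (relation, concept, value) triple. The template patterns, the camelCase relation names, and the functions `hint_to_relation` and `play_round` are assumptions made for illustration, not Verbosity's actual implementation.

```python
import re
from typing import Optional

# Hypothetical hint templates: pattern -> relation name.
TEMPLATES = [
    (re.compile(r"It contains an? (.+)\.", re.IGNORECASE), "contains"),
    (re.compile(r"It is a type of (.+)\.", re.IGNORECASE), "isATypeOf"),
    (re.compile(r"It is used for (.+)\.", re.IGNORECASE), "isUsedFor"),
    (re.compile(r"It is the opposite of (.+)\.", re.IGNORECASE), "isTheOppositeOf"),
]

Triple = tuple[str, str, str]  # (relation, concept, value)


def hint_to_relation(hint: str) -> Optional[tuple[str, str]]:
    """Match a hint against the templates; return (relation, value) or None."""
    for pattern, relation in TEMPLATES:
        match = pattern.fullmatch(hint.strip())
        if match:
            return relation, match.group(1).lower()
    return None


def play_round(secret: str, hints: list[str], guesses: list[str]) -> list[Triple]:
    """One round: the Guesser makes one guess after each hint. If a guess names
    `secret`, the hints given so far become triples; otherwise nothing is kept."""
    given: list[str] = []
    for hint, guess in zip(hints, guesses):
        given.append(hint)
        if guess.strip().lower() == secret.lower():
            triples: list[Triple] = []
            for h in given:
                parsed = hint_to_relation(h)
                if parsed is not None:  # hints outside the templates are skipped
                    relation, value = parsed
                    triples.append((relation, secret, value))
            return triples
    return []


print(play_round(
    "computer",
    ["It contains a Keyboard.", "It is used for browsing the web."],
    ["typewriter", "computer"],
))
# [('contains', 'computer', 'keyboard'), ('isUsedFor', 'computer', 'browsing the web')]
```

Discarding the round when the Guesser fails reflects the idea, described above, that relations are acquired only once the concept has been discovered, which acts as a lightweight validation of the hints.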
ReVerb [12] is a system for extracting open (non-domain-specific) common-sense relations that relies on a set of syntactic and lexical constraints. In general, it uses regular expressions to identify relation phrases in sentences and applies morphological normalization, such as converting verbs to the infinitive form. The lexical constraint is intended to discard poorly formed or overly complex relations. For example, from the sentence “The Obama administration is offering only modest targets for reducing greenhouse gases at the conference”, ReVerb extracts the relation “X is offering only modest targets for reducing greenhouse gases at Y” with the arguments X = “Obama” and Y = “conference”. This relation does not satisfy the lexical constraint because it is too specific. ReVerb also uses a ranking algorithm to exclude potentially meaningless or incomplete relations, i.e., relations that carry no relevant information. The model is specific to the English language. To evaluate the system, 500 relations extracted by ReVerb from web texts were chosen at random and reviewed by two evaluators, who corroborated 86 % of them.
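As far as we understand the original ReVerb work, its lexical constraint is enforced by requiring a relation phrase to occur with a minimum number of distinct argument pairs in a large corpus, which rules out overly specific phrases such as the one in the example above. The sketch below assumes that formulation; the tiny corpus, the threshold of 3, and the name `lexical_filter` are hypothetical, and a real deployment would use a much larger corpus and a higher threshold.

```python
from collections import defaultdict

# (arg1, relation phrase, arg2), the shape of an open IE extraction
Extraction = tuple[str, str, str]


def lexical_filter(extractions: list[Extraction], min_distinct_args: int) -> list[Extraction]:
    """Keep extractions whose relation phrase occurs with at least
    `min_distinct_args` distinct (arg1, arg2) pairs in the whole corpus."""
    args_per_relation = defaultdict(set)
    for arg1, relation, arg2 in extractions:
        args_per_relation[relation].add((arg1, arg2))
    return [
        e for e in extractions
        if len(args_per_relation[e[1]]) >= min_distinct_args
    ]


# Toy corpus: hypothetical, except for the first extraction, taken from the example.
corpus: list[Extraction] = [
    ("Obama", "is offering only modest targets for reducing greenhouse gases at", "conference"),
    ("Obama", "is offering", "targets"),
    ("the company", "is offering", "discounts"),
    ("the city", "is offering", "free parking"),
]
kept = lexical_filter(corpus, min_distinct_args=3)
print([relation for _, relation, _ in kept])
# ['is offering', 'is offering', 'is offering']
```

The overly specific phrase occurs with only one argument pair, so it is discarded, while the general phrase “is offering” survives because it links several distinct pairs.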
In [9], the authors propose an automatic method that generates new knowledge triples based on common-sense metarules. The algorithm searches an extended WordNet base for concepts that have a given property and generates new axioms from common-sense facts. Consider, for example, the acquisition of new relations for the concept “glass”: if “glass” has the property “transparent”, and “see through” is a characteristic of “transparent”, then “see through” is also a characteristic of “glass”. The method was evaluated by human validation: about 50 axioms generated by the method were chosen at random, and users were asked which seemed correct and which did not make sense. Overall, the method achieved slightly more than 98 % accuracy.
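The “glass” example applies a single metarule: if (hasProperty, X, P) and (hasCharacteristic, P, C), then infer (hasCharacteristic, X, C). The sketch below applies that one rule over a list of triples; the relation names and the function `apply_metarule` are hypothetical, and the actual method of [9] searches an extended WordNet base rather than a Python list.

```python
Triple = tuple[str, str, str]  # (relation, concept, value)


def apply_metarule(facts: list[Triple]) -> list[Triple]:
    """Apply one hypothetical metarule:
    (hasProperty, X, P) and (hasCharacteristic, P, C) => (hasCharacteristic, X, C).
    Returns only triples not already present, in sorted order."""
    known = set(facts)
    inferred = set()
    for rel1, concept, prop in facts:
        if rel1 != "hasProperty":
            continue
        for rel2, subject, characteristic in facts:
            if rel2 == "hasCharacteristic" and subject == prop:
                new_fact = ("hasCharacteristic", concept, characteristic)
                if new_fact not in known:
                    inferred.add(new_fact)
    return sorted(inferred)


facts = [
    ("hasProperty", "glass", "transparent"),
    ("hasCharacteristic", "transparent", "see through"),
]
print(apply_metarule(facts))  # [('hasCharacteristic', 'glass', 'see through')]
```

Returning only new triples mirrors the goal of generating axioms that are not yet in the base; running the rule repeatedly until nothing new appears would give a simple forward-chaining closure.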
For the Portuguese language, there are two important common-sense KA projects. The Open Mind Common Sense Brazil (OMCS-Br) project collects common-sense knowledge in Portuguese from web contributors [3], following the traditional KA strategy of OMCS; it currently has around 255,000 common-sense relations. The InferenceNet Conceptual Base was initially translated from ConceptNet 2.0 by expert human translators, after which heuristics were applied to generate new common-sense and inferentialist relations [27]. It currently has 700,000 common-sense and inferentialist relations.
Our method is not intended to supplant these or other methods for acquiring common-sense knowledge. Instead, it is a complementary solution to leverage the KA process. In this sense, the distinguishing feature of the method proposed in this paper is the retrieval of similar content from the existing knowledge base, which makes the interactions for acquiring common-sense and inferentialist knowledge about new concepts more productive. They are more productive because the process helps the user recall common-sense relations from the content of related concepts. For example, to acquire the new concept “crime passional” [“crime of passion”], the algorithm proposes semantic relations retrieved from the conceptual content of “paixão” [“passion”], namely (eventPreRequisitOf, “paixão”, “amante”, Pre), (effectOf, “paixão”, “sofrimento”, Pos), and (usedFor, “paixão”, “romance”, Pre), where “amante”, “sofrimento”, and “romance” mean “lover”, “suffering”, and “romance”. These relations enrich the KA of the new concept.
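A minimal sketch of this retrieval step, assuming a simple grammatical heuristic: split the new concept into words, map each word to a base concept either directly or through a derivational lexicon (here, “passional” to “paixão”), and propose the relations of every concept found. The triples for “paixão” are those quoted above; the lexicon, the dictionary layout of the base, and the name `suggest_relations` are hypothetical and not necessarily the heuristics used in our method.

```python
# Fragment of a conceptual base: concept -> triples (relation, concept, value, tag).
# The "paixão" triples are the ones quoted in the text; the layout is an assumption.
BASE: dict[str, list[tuple[str, str, str, str]]] = {
    "paixão": [
        ("eventPreRequisitOf", "paixão", "amante", "Pre"),
        ("effectOf", "paixão", "sofrimento", "Pos"),
        ("usedFor", "paixão", "romance", "Pre"),
    ],
}

# Hypothetical derivational lexicon: derived word -> base concept.
DERIVATIONS: dict[str, str] = {"passional": "paixão"}


def suggest_relations(
    new_concept: str,
    base: dict[str, list[tuple[str, str, str, str]]],
    derivations: dict[str, str],
) -> list[tuple[str, str, str, str]]:
    """Propose the relations of every base concept reachable from the words
    of `new_concept`, either directly or through the derivational lexicon."""
    suggestions: list[tuple[str, str, str, str]] = []
    seen: set[str] = set()
    for word in new_concept.lower().split():
        for candidate in (word, derivations.get(word)):
            if candidate is not None and candidate in base and candidate not in seen:
                seen.add(candidate)  # each related concept contributes only once
                suggestions.extend(base[candidate])
    return suggestions


for triple in suggest_relations("crime passional", BASE, DERIVATIONS):
    print(triple)
```

The suggestions are shown to the user as prompts rather than stored automatically, which is what lets related content help the user recall relations for the new concept.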
The comparative analysis with the state of the art presented here allows us to position our proposal relative to the work of [12, 36], which bear some resemblance to it: all of them, in one way or another, use a base of existing relations and concepts as a baseline for KA. None of these works presents an evaluation that would allow a comparison of the quality of the acquired knowledge. Our KA process uses heuristics based on the grammatical structure of concepts, which increases the number of related concepts that can serve as a baseline for KA and elicit ideas about common-sense relations from the user. Verbosity [1] and ReVerb [12] differ from the proposal of this work because they do not use a conceptual base to support KA.