Link prediction using a probabilistic description logic
 José Eduardo Ochoa Luna^{1},
 Kate Revoredo^{2} and
 Fabio Gagliardi Cozman^{1}
https://doi.org/10.1007/s13173-013-0108-8
© The Brazilian Computer Society 2013
Received: 23 November 2012
Accepted: 7 March 2013
Published: 25 April 2013
Abstract
Due to the growing interest in social networks, link prediction has received significant attention. Link prediction is mostly based on graph-based features, with some recent approaches focusing on domain semantics. We propose algorithms for link prediction that use a probabilistic ontology to enhance the analysis of the domain and to handle the unavoidable uncertainty in the task (the ontology is specified in the probabilistic description logic cr\(\mathcal{ALC }\)). The scalability of the approach is investigated through a combination of semantic assumptions and graph-based features. We empirically evaluate our proposal and compare it with standard solutions in the literature.
1 Introduction
Many social, biological, and information systems can be well described as networks, where nodes represent objects (individuals), and links denote the relations or interactions between nodes. Predicting a possible link in a network is an interesting issue that has received significant attention. For instance, one may be interested in finding potential friendships between two persons in a social network, or a potential collaboration between two researchers. In short, link prediction aims at predicting whether two nodes should be connected, given previous information about their relationships or interests.
Mohammad and Mohammed [18] survey representative link prediction methods, classifying them into three groups. In the first group, feature-based methods construct pairwise features to use in classification. The majority of the features are extracted from the graph topology by computing similarity based on the neighborhood of the pair of nodes, or based on ensembles of paths between the pair of nodes [15]. Semantic information has also been used as features [26, 32]. The second group includes probabilistic approaches that model the joint probability for entities in a network by Bayesian graphical models [31]. The third group employs linear algebraic approaches that compute the similarity between nodes in a network by rank-reduced similarity matrices [14].
We present an approach for link prediction that combines Bayesian graphical models and semantic-based features. Hence, our proposal belongs to the first two categories mentioned in the previous paragraph. To represent semantic-based features, we employ a probabilistic description logic called Credal \(\mathcal{ALC }\) (cr\(\mathcal{ALC }\)) [5]. This probabilistic description logic extends the popular logic \(\mathcal{ALC }\) [27] with probabilistic inclusions. These are sentences, such as \(P(\mathsf{Professor }\mid \mathsf{Researcher })=0.4\), specifying the probability that an element of the domain is a \(\mathsf{Professor }\) given that it is a \(\mathsf{Researcher }\). Exact and approximate inference algorithms for cr\(\mathcal{ALC }\) have been proposed [5], using ideas inherited from the theory of Relational Bayesian Networks [12]. We benefit from such algorithms, and add some techniques to make our approach scalable to real domains. We also present experimental validation of our proposal.
The paper is organized as follows. Section 2 reviews basic concepts of probabilistic description logics and of link prediction. Our proposals for a scalable semantic link prediction approach appear in Sect. 3. Section 4 describes experiments, and Sect. 5 concludes the paper and discusses some future work.
2 Background
This section briefly reviews probabilistic description logics and link prediction methods, with a focus on concepts and techniques that are used later.
2.1 Probabilistic description logics and cr\(\mathcal{ALC }\)
Description logics (DLs) form a family of representation languages that are typically decidable fragments of first-order logic (FOL) [3]. Knowledge is expressed in terms of individuals, concepts, and roles. The semantics of a description is given by a domain \(\mathcal{D }\) (a set) and an interpretation \(\cdot ^{\mathcal{I }}\) (a functor). Individuals represent objects through names from a set \(N_\mathsf{I }=\{a,b,\ldots \}\). Each concept in the set \(N_\mathsf{C }=\{C,D,\ldots \}\) is interpreted as a subset of the domain \(\mathcal{D }\). Each role in the set \(N_\mathsf{R }=\{r,s,\ldots \}\) is interpreted as a binary relation on the domain. An assertion states that an individual belongs to a concept or that a pair of individuals satisfies a role. An ABox is a set of assertions. In \(\mathcal{ALC }\), complex concepts are built with conjunction, disjunction, negation, existential restriction, and value restriction, interpreted as follows:

\((C \sqcap D)^\mathcal{I } = C^\mathcal{I } \cap D^\mathcal{I }\);

\((C \sqcup D)^\mathcal{I } = C^\mathcal{I } \cup D^\mathcal{I }\);

\((\lnot C)^\mathcal{I } = \mathcal{D } \backslash C^\mathcal{I }\);

\((\exists r.C)^\mathcal{I }=\{x \in \mathcal{D } \mid \exists y : (x,y)\in r^\mathcal{I } \wedge y \in C^\mathcal{I }\}\);

\((\forall r.C)^\mathcal{I } = \{ x \in \mathcal{D } \mid \forall y : (x,y) \in r^\mathcal{I } \rightarrow y \in C^\mathcal{I }\}\).
Several probabilistic description logics have appeared in the literature [13, 17]; here we just indicate a few representative proposals.
The semantics of cr\(\mathcal{ALC }\) is based on probability measures over the space of interpretations, for a fixed domain. To make sure a terminology specifies a single probability measure, a number of additional assumptions are adopted: the domain is assumed finite, fixed, and known; the unique-name assumption and the rigidity assumption for individuals (as usual in first-order probabilistic logic [6]) are assumed; a single concept name appears in the left-hand side of any inclusion or definition and in the conditioned side of any probabilistic inclusion; and finally a Markov condition imposes independence of any grounding of concept/role conditional on the groundings of its corresponding parents in the graph \(\mathcal{G }(\mathcal{T })\) [5]. Given these assumptions, a set of sentences \(\mathcal{T }\) in cr\(\mathcal{ALC }\) defines a relational Bayesian network [12] whose underlying graph is exactly \(\mathcal{G }(\mathcal{T })\).
Inferences, such as \(P(\mathsf{A }_o(\mathsf{a }_0)\mid \mathcal{A })\) for an ABox \(\mathcal{A }\), can be computed by grounding, thus generating a Bayesian network where one “slice” is built for each individual. For instance, in the Bayesian network depicted in Fig. 2, two slices are built, one for individual \(\mathsf{bob }\) and another for individual \(\mathsf{paper }\). For large domains, exact probabilistic inference is in general quite hard. Variational algorithms that approximate such probabilities are available in the literature [5].
2.2 Link prediction
The task we are interested in can be defined as follows [15]. One is given a network (a graph) \(G\) consisting of a set of nodes \(V\) (represented by letters \(a,b\), etc.) and a set of edges \(E\), where an edge represents an interaction between nodes. Interactions may be tagged with times, and the link prediction problem may be one of predicting the existence of edges in a time interval, given the edges observed in another time interval. Here we are interested in a static problem where we are given nodes and edges, except for the edge between two nodes \(a\) and \(b\), and we must then predict whether there is an edge between \(a\) and \(b\).
Approaches to link prediction can be understood not only by considering the kinds of tools employed, but also by examining the model that is used to represent the network as a whole. Typically, one assumes some sort of probabilistic mechanism that at least partially explains the existence of edges, perhaps together with domain-specific knowledge (for instance, domain theories about human relationships) [9, 19]. The simplest network model is the Erdős–Rényi random graph: each pair of nodes is connected with identical probability. More sophisticated models resort to hierarchical specification of link probabilities, or to grouping of nodes within blocks of varying probability.
One way to capture the probabilistic structure of a network is through graph-based models such as Markov random fields or Bayesian networks [23]. However, these languages are well suited to express independence relations between a fixed set of random variables; when nodes and links in graphs are to be dealt with, it is best to consider modeling languages that can specify Markov random fields and Bayesian networks over relational structures. Indeed many proposals for link prediction resort to such languages, starting from seminal work by Getoor et al. [8] and Taskar et al. [29]. The presence of relational structure lets one represent properties of individual nodes, of links, and of communities; one can then compute the probability of specific links, and estimate such probabilities from data. In this paper, we follow this modeling strategy; the difference between our modeling language and previous proposals is that we adopt a language based on description logics, as already indicated in the previous section. Our interest in models based on description logics is justified given recent results on the importance of ontologies in organizing information that can be used in link prediction [2, 4, 30].
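As an illustration of the simplest model mentioned above, the following minimal Python sketch samples the edge set of a random graph in which every pair of nodes is linked independently with the same probability. The function name `erdos_renyi_edges` and its parameters are illustrative choices, not part of the original work.

```python
import itertools
import random


def erdos_renyi_edges(num_nodes, p, seed=None):
    """Sample an edge set where each unordered pair is linked with probability p."""
    rng = random.Random(seed)  # local generator so results are reproducible
    edges = set()
    for u, v in itertools.combinations(range(num_nodes), 2):
        # Every pair is treated identically: one independent coin flip per pair.
        if rng.random() < p:
            edges.add((u, v))
    return edges


# Small demonstration with illustrative values.
sample = erdos_renyi_edges(num_nodes=6, p=0.3, seed=42)
print(sorted(sample))
```

Because all pairs share one probability, such a model carries no information about which specific link is likely, which is why richer models are needed for link prediction.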
3 Link prediction with cr\(\mathcal{ALC }\)
Given a network \(G\) where many links are observed, one is interested in predicting whether a link between nodes \(a\) and \(b\) exists (presumably the linkage between \(a\) and \(b\) has not been observed). We address this problem by considering, in addition to topological information about the network, knowledge about the domain concerning network entities. To do so, domain knowledge is represented through a probabilistic ontology using cr\(\mathcal{ALC }\). Among the concepts (\(N_C\)) and roles (\(N_R\)) in the ontology, there is a concept \({{\hat{C}}}\) that indicates which elements of the domain are nodes in \(G\), and a role \({\hat{r}}\) that indicates which pairs of elements are linked—hence \({\hat{C}}\) and \({\hat{r}}\) describe the network itself, while other concepts and roles describe the remaining domain knowledge. In our experience, it is important to explicitly indicate which elements of the domain are nodes, to make sure inference runs only with the required elements (in effect this is providing a type that separates network nodes from other elements of the domain).
Our first link prediction algorithm is described in Algorithm 1.^{1}
The algorithm starts by going through all pairs of instances of the concept \({\hat{C}}\) (that is, all nodes). For each pair, it checks whether a link between the corresponding nodes exists in the network; if not, the probability of the link is computed using the relational Bayesian network extracted from the ontology \(\mathcal{O }\). If the probability is greater than a threshold, then the corresponding link is added to the set of suggested links. (Alternatively, when the threshold is not given, a list of links, ranked by their probability, can be produced.)
The evidence is the given set of assertions; the size of this set has great impact on inference effort. When inferences are computed, the ontology is turned into a relational Bayesian network, whose grounding is a Bayesian network; each assertion may generate a new slice of nodes in this grounded Bayesian network. Approximate algorithms are necessary for inference; in this work we employ the variational inference method described in Ref. [5]. While one can suppose that more assertions lead to more accurate predictions, the computational effort involved in inference may be so large as to generate bad approximations. Hence it is important to filter out assertions and to focus on the most relevant ones.
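The loop just described can be sketched in Python as follows. This is only a sketch of the prose description, not a reproduction of Algorithm 1: the function `suggest_links` and its argument names are hypothetical, and `link_probability` is a placeholder callable standing in for inference in the relational Bayesian network, which is not implemented here.

```python
from itertools import combinations


def suggest_links(nodes, observed_links, link_probability, threshold=None):
    """Return unobserved pairs as (a, b, probability), filtered or ranked.

    nodes: instances of the node concept (hypothetical input format).
    observed_links: set of frozenset({a, b}) pairs already linked.
    link_probability: placeholder callable (a, b) -> probability; in the
        paper this role is played by probabilistic inference.
    """
    scored = []
    for a, b in combinations(sorted(nodes), 2):
        if frozenset((a, b)) in observed_links:
            continue  # link already present in the network
        scored.append((a, b, link_probability(a, b)))
    # Rank candidates by decreasing probability.
    scored.sort(key=lambda item: item[2], reverse=True)
    if threshold is None:
        return scored  # no threshold given: return the ranked list
    return [item for item in scored if item[2] > threshold]


# Toy usage with made-up probabilities.
toy_probs = {("ann", "bob"): 0.8, ("bob", "john"): 0.3}
toy_links = {frozenset(("ann", "john"))}
suggested = suggest_links({"ann", "bob", "john"}, toy_links,
                          lambda a, b: toy_probs[(a, b)], threshold=0.5)
print(suggested)
```

Passing the probability computation as a callable keeps the pair enumeration separate from the inference engine, so the same loop works whether the threshold is given or a ranked list is wanted.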
We are interested in predicting a relationship between two specific nodes, \(a\) and \(b\). Therefore, assertions directly related to these two objects and to other objects strongly related to them in the network are more relevant for link prediction than assertions on other objects in the network. We can make our link prediction algorithm scalable if we only consider assertions about \(a,b\) and about the objects strongly related to them in our inferences. To do so, we must specify the set \(\mathcal{A }(a,b)\) of elements of the domain that are deemed strongly related to \(a\) and \(b\).
Liben-Nowell and Kleinberg [15] compute similarities between two nodes using ensembles of paths between the two nodes (so as to decide whether to include a link between the nodes). It seems reasonable to adopt the same strategy, and define \(\mathcal{A }(a,b)\) to contain nodes in paths between \(a\) and \(b\). Although we could consider all possible paths between two nodes, computing them could be expensive; hence, we restrict ourselves to a path size of five. Therefore, in Algorithm 1 the evidence must be specialized for each pair of nodes; given \(a\) and \(b\), the set \(\mathcal{A }(a,b)\) must be constructed and the relevant assertions are then collected into \(E\).
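A minimal sketch of this evidence selection is given below. It makes several assumptions that the text does not fix: the path bound is read as "at most five edges", paths are simple (no repeated nodes), the network is an adjacency dictionary, and an assertion is kept only if every individual it mentions belongs to \(\mathcal{A }(a,b)\). The names `nodes_on_short_paths` and `select_evidence` are hypothetical.

```python
def nodes_on_short_paths(adj, a, b, max_len=5):
    """Collect nodes lying on simple paths from a to b with at most max_len edges."""
    related = {a, b}  # the two endpoints are always relevant

    def extend(path, visited):
        last = path[-1]
        if last == b:
            related.update(path)  # a complete a-b path: keep all its nodes
            return
        if len(path) - 1 >= max_len:
            return  # edge budget exhausted without reaching b
        for nxt in adj.get(last, ()):
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.discard(nxt)

    extend([a], {a})
    return related


def select_evidence(assertions, related):
    """Keep assertions (predicate, individuals) that mention only related nodes."""
    return [(pred, args) for pred, args in assertions
            if all(x in related for x in args)]


# Toy network: the a-b link itself is the one to be predicted, so it is absent.
toy_adj = {"a": ["c", "d"], "b": ["c"], "c": ["a", "b"],
           "d": ["a", "e"], "e": ["d"]}
toy_assertions = [("Researcher", ("a",)), ("Researcher", ("e",)),
                  ("sharePublication", ("a", "c")),
                  ("sharePublication", ("d", "e"))]
related_ab = nodes_on_short_paths(toy_adj, "a", "b")
evidence_ab = select_evidence(toy_assertions, related_ab)
print(sorted(related_ab), evidence_ab)
```

Enumerating simple paths is exponential in the worst case; the length bound is what keeps the enumeration tractable on sparse collaboration networks.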
4 Experiments
Experiments have been conducted to evaluate our approach to semantic link prediction. A real-world data repository, the Lattes curriculum platform, was used. Our algorithm was combined with state-of-the-art classifiers for link prediction. This section reports the steps involved in this process.
4.1 Scenario description
The Lattes platform is the public repository of Brazilian scientific curricula, which consists of approximately a million registered documents. Information is encoded in HTML format, ranging from personal information such as name and professional address to publication lists, administrative tasks, research areas, research projects and advising/advisor information. There is implicit relational information in these web pages; for instance, collaboration networks are built by advising/advisor links, shared publications, and so on.
Table 1 Research areas and number of coauthored collaborations

| Research area | Code | Number |
| --- | --- | --- |
| Agricultural Sciences | A1 | 17,157 |
| Biological Sciences | A2 | 23,222 |
| Exact and Earth Sciences | A3 | 18,440 |
| Human Sciences | A4 | 2,281 |
| Social Sciences | A5 | 4,462 |
| Health Sciences | A6 | 17,255 |
| Engineering | A7 | 10,879 |
| Languages and Arts | A8 | 1,315 |
Assertions were extracted from the Lattes platform concerning these researchers. For instance, if a parser finds that a researcher John has four publications (\(p_1,p_2,p_3,p_4\)) and a researcher Ann has two (\(p_2,p_5\)), where \(p_2\) was done in collaboration with John, then assertions such as the following are extracted:
\(\mathsf{Researcher(john) },\mathsf{Researcher(ann) }\),
\(\mathsf{Publication(p_1) },\mathsf{Publication(p_2) }, \mathsf{Publication(p_3) }\),
\(\mathsf{Publication(p_4) },\mathsf{Publication(p_5) }\),
\(\mathsf{sharePublication(john,ann) }\).
A probabilistic ontology was then learned using algorithms in the literature [20, 24]. This ontology comprises 24 probabilistic inclusions and 17 concept definitions. Because learning is mainly concerned with deterministic and probabilistic inclusions, the learned ontology was enlarged with 4 relevant roles. Parts of the final ontology can be seen in Figs. 3 and 4.
In this probabilistic ontology, concepts and probabilistic inclusions typically denote mutual research interests. In short, in this ontology a \(\mathsf{ResearcherLattes }\) is a person who has publications, advises other people and participates in examination boards. On the other hand, a \(\mathsf{SupervisionCollaborator }\) is a probabilistic inclusion that denotes a kind of researcher who was advised by another researcher. The \(\mathsf{SameInstitution }\) concept denotes researchers who work at the same institution. Similarly, the \(\mathsf{SameBoard }\) concept denotes researchers who have participated in the same examination boards. The \(\mathsf{NearCollaborator }\) is a probabilistic inclusion that denotes researchers working at the same institution who have shared publications. The \(\mathsf{FacultyNearCollaborator }\) is a near collaborator who also participates in the same examination boards. The \(\mathsf{NullMobilityResearcher }\) concept denotes researchers who have low mobility, i.e., they remain at the same institution where they were advised. The \(\mathsf{StrongRelatedResearcher }\) denotes a strong relationship between two researchers (advisor and advisee) who also share publications.
The concept \(\mathsf{Researcher }\) indicates whether an element of the domain is a node in the network (hence \(\mathsf{Researcher }\) is \({\hat{C}}\)) and the role \(\mathsf{sharePublication }\) indicates whether a pair of elements of the domain is linked in the network (hence \(\mathsf{sharePublication }\) is \({\hat{r}}\)).
4.2 Methodology
In this section, we describe our main design choices to run experiments.
Given the 8,000 selected researchers, there exist \(31{,}996{,}000\) possible link relationships. To perform link prediction we have considered collaborations based on coauthorship on publications (there are \(2{,}837{,}206\) publications). After analysing these publications we identified \(95{,}100\) true positive links among researchers based on coauthorship. Table 1 details true coauthorship collaborations for every research area.
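As a quick check of these counts (a worked calculation added for illustration), the number of unordered pairs among the 8,000 researchers and the resulting share of positive links are:

\[
\binom{8000}{2} = \frac{8000 \times 7999}{2} = 31{,}996{,}000, \qquad \frac{95{,}100}{31{,}996{,}000} \approx 0.0030 .
\]

That is, roughly 0.3% of all candidate pairs are true coauthorship links, so the classes are strongly imbalanced.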
Table 2 Lattes datasets: number of positive (\(+\)) and negative (\(-\)) examples

| Name | # Examples (\(+/-\)) |
| --- | --- |
| Lattes I (General) | 90,000 |
| Lattes II (Biological Sciences) | 20,000 |
| Lattes III (Exact and Earth Sciences) | 18,000 |
Although we can use probabilistic inference to decide whether there is a link between two nodes, to perform comparisons with previous approaches we resort to a classification approach. This paradigm allows us to combine several metrics (topological, semantic and probabilistic) as features of a classification algorithm. In this sense, we can compare which feature is more relevant by adding, deleting and combining features and observing the classification results.
To perform classification we resort to logistic regression, which outputs values between 0 and 1 (due to the logistic function) and spares us from doing feature normalization. A threshold of 0.5 was used to decide a classification.
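The decision rule can be illustrated with a short Python sketch. The weights and bias below are made-up values rather than fitted parameters, the function names are hypothetical, and treating a score of exactly 0.5 as a negative decision is an assumption, since the text does not say how ties are handled.

```python
import math


def logistic(z):
    """Numerically stable logistic function, mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def predict_link(features, weights, bias, threshold=0.5):
    """Return (score, decision) for one candidate pair of nodes."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    score = logistic(z)
    return score, score > threshold  # assumption: strict comparison


# Illustrative feature vector, e.g. two numerical scores for a pair of nodes.
score, is_link = predict_link(features=[1.0, 2.0], weights=[0.5, 0.5], bias=0.0)
print(round(score, 4), is_link)
```

Each feature (topological, semantic, or probabilistic) contributes one entry of `features`, so adding or removing a feature only changes the length of the vector and of the weight list.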
The features used in the classification for link prediction (defined in Sect. 2.2) are commonly extracted from topological graph properties such as neighborhood and paths between nodes. In addition, numerical features are also computed from joint probability distributions and semantics.
Two baseline graph-based numerical features, the Katz and Adamic-Adar measures, have been used in our experiments. For the first one, since computing all paths (\(\infty \)) is expensive, we only consider paths of length at most four (\(i\le 4\)).
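Both measures can be sketched in a few lines of Python. The sketch assumes the usual definitions from the link prediction literature: the truncated Katz score sums \(\beta ^i\) times the number of length-\(i\) walks between the two nodes for \(i\le 4\), and Adamic-Adar sums \(1/\log \) of the degree of each common neighbor. The damping factor `beta` and the adjacency-dictionary format are illustrative assumptions; the value of \(\beta \) used in the experiments is not stated.

```python
import math


def katz_truncated(adj, a, b, beta=0.05, max_len=4):
    """Sum of beta**i * (number of walks of length i from a to b), i <= max_len."""
    counts = {a: 1}  # counts[v] = number of walks of the current length from a to v
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = {}
        for v, c in counts.items():
            for w in adj.get(v, ()):
                nxt[w] = nxt.get(w, 0) + c
        counts = nxt
        score += (beta ** length) * counts.get(b, 0)
    return score


def adamic_adar(adj, a, b):
    """Sum of 1 / log(degree) over the common neighbors of a and b."""
    common = set(adj.get(a, ())) & set(adj.get(b, ()))
    # A neighbor of degree 1 cannot be shared by two distinct nodes; guard anyway.
    return sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1)


# Toy undirected graph given as adjacency lists.
toy_graph = {"a": ["c", "d"], "b": ["c", "d"], "c": ["a", "b"],
             "d": ["a", "b", "e"], "e": ["d"]}
katz_ab = katz_truncated(toy_graph, "a", "b", beta=0.5)
aa_ab = adamic_adar(toy_graph, "a", "b")
print(katz_ab, aa_ab)
```

Propagating walk counts level by level avoids enumerating paths explicitly, so the cost grows with the number of edges times the length bound.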
The two semantic features are:
 (i) The keyword match count between two researchers [10].
 (ii) The cosine between the TF-IDF feature vectors of two researchers [31].
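These two semantic features can be sketched as follows. The sketch assumes each researcher is represented by a list of keyword tokens and uses one common TF-IDF variant (raw term frequency times \(\log (N/\mathrm{df})\)); the exact weighting and the text fields used in the experiments are not specified, and the function names are hypothetical.

```python
import math
from collections import Counter


def keyword_match_count(keywords_a, keywords_b):
    """Number of distinct keywords shared by two researchers."""
    return len(set(keywords_a) & set(keywords_b))


def tfidf_vectors(docs):
    """Map each name to a sparse TF-IDF vector (dict term -> weight)."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: count each term once per doc
    vectors = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        vectors[name] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy keyword profiles (made-up data).
profiles = {"john": ["logic", "bayes", "logic"],
            "ann": ["logic", "graph"],
            "bob": ["protein"]}
vecs = tfidf_vectors(profiles)
print(keyword_match_count(profiles["john"], profiles["ann"]),
      cosine(vecs["john"], vecs["ann"]))
```

Unlike the match count, the cosine down-weights keywords that appear in many profiles, which may help explain why it is the stronger of the two features in the results reported below.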
4.3 Results
In order to evaluate the suitability of our approach in predicting coauthorships in the Lattes dataset, several experiments were run. The experiments were performed in three stages, incrementally considering topological, semantic and probabilistic-logic scores.
Table 3 Classification results for datasets Lattes I, Lattes II and Lattes III on accuracy (\(\%\)) for baseline features: Adamic-Adar (Adamic), Katz, Word matching (Match), Cosine, CrALC and a combination of them

| Stage | Feature | Lattes I (acc.) | Lattes II (acc.) | Lattes III (acc.) |
| --- | --- | --- | --- | --- |
| 1 | Adamic | 83.34 \(\pm \) 1.87 | 82.5 \(\pm \) 1.35 | 81.23 \(\pm \) 1.46 |
|  | Katz | 85.4 \(\pm \) 1.07 | 87.7 \(\pm \) 0.91 | 84.43 \(\pm \) 0.84 |
|  | Adamic + Katz | 85.9 \(\pm \) 1.12 | 87.75 \(\pm \) 1.03 | 85.44 \(\pm \) 0.78 |
| 2 | Match | 75.42 \(\pm \) 1.66 | 73.42 \(\pm \) 2.66 | 72.8 \(\pm \) 0.47 |
|  | Cosine | 89.35 \(\pm \) 1.28 | 90.4 \(\pm \) 1.37 | 86.7 \(\pm \) 0.85 |
|  | Adamic + Katz + Match + Cosine | 91.63 \(\pm \) 1.23 | 90.69 \(\pm \) 1.23 | 86.3 \(\pm \) 0.12 |
| 3 | CrALC | 93.3 \(\pm \) 0.79 | 94.2 \(\pm \) 1.48 | 89.72 \(\pm \) 1.67 |
|  | Adamic + Katz + Match + Cosine + CrALC | 93.89 \(\pm \) 0.83 | 94.46 \(\pm \) 0.83 | 90.2 \(\pm \) 0.72 |
For all three Lattes datasets, the Katz feature yields the best accuracy when the two topological features are used in isolation (Table 3, stage 1). Katz has been shown to be among the most effective topological measures for the link prediction task [15]. Furthermore, when we combine the two features, accuracy improves on all three datasets.
In the second stage, we evaluate two features based on semantic similarity and their combination with topological features. Results on accuracy for these semantic features are depicted in Table 3 (stage 2). The cosine similarity feature performs better than the keyword matching feature and outperforms the two former topological features. When we combine all four features together, there is an improvement in accuracy on datasets Lattes I and Lattes II. On dataset Lattes III, combining all four features brought no improvement.
Finally, in the third stage, a probabilistic feature based on cr\(\mathcal{ALC }\) was introduced into the model. Results on accuracy for this feature are depicted in Table 3 (stage 3), showing it performs better than all other features. Moreover, there is significant improvement in accuracy on datasets Lattes I and Lattes II when all five features are combined.
It is worth noting that the probabilistic logic feature used in isolation outperforms all other features and improves the accuracy of the classification model for link prediction. It could be argued that such performance stems from the evidence used in probabilistic inference, but a similar analysis could be done for topological and semantic features: they use information that is missing in a probabilistic description logic setting. In conclusion, even though the features rely on different approaches, experimental results show that they can be successfully used together.
Nothing prevents us from defining ad hoc probabilistic networks to estimate link probabilities. However, by doing so we would need to define a large propositionalized network (a relational Bayesian network) [25] or estimate local probabilistic networks [31]. These approaches do not scale well, since computing probabilistic inference for large networks is expensive.
To overcome these performance and scalability issues, we resort to lifted inference in cr\(\mathcal{ALC }\), which is based on variational methods tuned by evidence defined according to the nodes' neighborhood. Thus, for 10,000 possible nodes, if evidence is given for 5 nodes (this is the neighborhood for a given link candidate), then there are only 6 slices which have messages interchanged. To instantiate the overall network, we use local evidence to perform inference for every link candidate, i.e., neighborhood evidence is instantiated accordingly.
Table 4 Average runtime for inference in cr\(\mathcal{ALC }\) considering the number of nodes in the network

| # Nodes | Runtime (ms) |
| --- | --- |
| 10,000 | 168 |
| 100,000 | 175 |
| 10,000,000 | 185 |
5 Conclusion
In this paper, we have introduced a link prediction method that combines graph-based and ontological information through the use of a probabilistic description logic. Given a collaborative network, we encode interests and graph features through a cr\(\mathcal{ALC }\) probabilistic ontology. To predict links, we resort to probabilistic inference; thus we combine and extend previous work on relational probabilistic models of link prediction and on ontology-based link prediction. To make the proposal scalable, we propose a novel strategy for approximating link probabilities: for each pair of nodes, we focus only on evidence collected along paths between them. Our proposal was evaluated on an academic domain, where links among researchers were predicted. Moreover, the approach compared favorably with graph-based and semantic-based features.
Compared to previous work, our approach employs a rich ontology (as opposed to simple is-a terminologies) that can encode substantial information about the domain. Hierarchical structure can be encoded together with knowledge about specific nodes in a network; we plan to explore richer ontologies in the future. Moreover, our proposal attains better scalability than previous proposals that have tried to explore probabilistic relational models for similar purposes.
This algorithm was first discussed in Ref. [25], and later refined, together with Algorithm 2, in Refs. [22] and [21]; the presentation is further refined here. Some experiments and results reported here appeared in those preliminary publications; in this paper we also describe novel experiments with significantly larger datasets.
The problem of class skewness, that is, imbalance in the class distribution, gives rise to poor performance of a supervised learning algorithm [18]. To cope with this issue, existing research suggests several different approaches, such as altering the training sample by upsampling or downsampling, i.e., balancing.
Declarations
Acknowledgments
The third author is partially supported by CNPq. The work reported here has received substantial support from FAPESP Grant 2008/039955 and FAPERJ Grant E26/111484/2010. Thanks to Jesus Pascual Mena Chalco for providing us with datasets and figures of the Lattes research areas.
References
1. Adamic L, Adar E (2001) Friends and neighbors on the web. Soc Netw 25:211–230
2. Aljandal W, Bahirwani V, Caragea D, Hsu H (2009) Ontology-aware classification and association rule mining for interest and link prediction in social networks. In: AAAI 2009 Spring symposium on social semantic web: where web 2.0 meets web 3.0. Stanford, CA
3. Baader F, Nutt W (2007) Basic description logics. In: Description logic handbook. Cambridge University Press, Cambridge, pp 47–100
4. Caragea D, Bahirwani V, Aljandal W, Hsu W (2009) Ontology-based link prediction in the livejournal social network. In: SARA’09, p 1
5. Cozman FG, Polastro RB (2009) Complexity analysis and variational inference for interpretation-based probabilistic description logics. In: Proceedings of the twenty-fifth annual conference on uncertainty in artificial intelligence (UAI-09). AUAI Press, Corvallis, Oregon, pp 117–125
6. Fagin R, Halpern JY, Megiddo N (1990) A logic for reasoning about probabilities. Inf Comput 87:78–128
7. Getoor L, Diehl CP (2005) Link mining: a survey. ACM SIGKDD Explor Newsl 7(2):3–12
8. Getoor L, Friedman N, Koller D, Taskar B (2002) Learning probabilistic models of link structure. J Mach Learn Res 3:679–707
9. Goldenberg A, Zheng AX, Fienberg SE, Airoldi EM (2010) A survey of statistical network models. Found Trends Mach Learn 2(2):129–233
10. Hasan MA, Chaoji V, Salem S, Zaki M (2006) Link prediction using supervised learning. In: Proceedings of SDM 06 workshop on link analysis, counterterrorism and security
11. Heinsohn J (1994) Probabilistic description logics. In: International conference on uncertainty in artificial intelligence, pp 311–318
12. Jaeger M (2002) Relational Bayesian networks: a survey. Linkoping Electr Artic Comput Inf Sci 6
13. Klinov P (2008) Pronto: a non-monotonic probabilistic description logic reasoner. In: The semantic web research and applications, pp 822–826
14. Kunegis J, Lommatzsch A (2009) Learning spectral graph transformations for link prediction. In: Proceedings of the ICML, pp 561–568
15. Liben-Nowell D, Kleinberg J (2007) The link prediction problem for social networks. J Am Soc Inf Sci Technol 7(58):1019–1031
16. Lu L, Zhou T (2011) Link prediction in complex networks: a survey. Physica A 390:1150–1170
17. Lukasiewicz T, Straccia U (2008) Managing uncertainty and vagueness in description logics for the semantic web. Semant Web J 6(4):291–308
18. Mohammad A, Mohammed J (2011) A survey of link prediction in social networks. In: Social network data analytics, pp 243–275
19. Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45:167–256
20. Ochoa-Luna J, Revoredo K, Cozman F (2011) Learning probabilistic description logics: a framework and algorithms. In: Proceedings of the MICAI, LNCS, vol 7094. Springer, Berlin, pp 28–39
21. Ochoa-Luna J, Revoredo K, Cozman F (2012) An experimental evaluation of a scalable probabilistic description logics approach for semantic link prediction. In: Bobillo F et al (eds) Proceedings of the 8th international workshop on uncertainty reasoning for the semantic web, vol 900. CEUR-WS.org, Shanghai, China, pp 63–74
22. Ochoa-Luna J, Revoredo K, Cozman F (2012) A scalable semantic link prediction approach through probabilistic description logics. In: Proceedings of 9th artificial intelligence national meeting (ENIA)
23. Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Francisco
24. Revoredo K, Ochoa-Luna J, Cozman F (2010) Learning terminologies in probabilistic description logics. In: da Rocha Costa A, Vicari R, Tonidandel F (eds) Advances in artificial intelligence, SBIA 2010. Lecture Notes in Computer Science, vol 6404. Springer, Berlin, pp 41–50
25. Revoredo K, Ochoa-Luna J, Cozman F (2011) Semantic link prediction through probabilistic description logics. In: Bobillo F et al (eds) Proceedings of the 7th international workshop on URSW, vol 778, pp 87–97
26. Sachan M, Ichise R (2011) Using semantic information to improve link prediction results in network datasets. Int J Comput Theory Eng 3:71–76
27. Schmidt-Schauss M, Smolka G (1991) Attributive concept descriptions with complements. Artif Intel 48:1–26
28. Sebastiani F (1994) A probabilistic terminological logic for modelling information retrieval. In: ACM conference on research and development in information retrieval (SIGIR), pp 122–130
29. Taskar B, Wong MF, Abbeel P, Koller D (2003) Link prediction in relational data. In: Proceedings of neural information processing systems
30. Thor A, Anderson P, Raschid L, Navlakha S, Saha B, Khuller S, Zhang XN (2011) Link prediction for annotation graphs using graph summarization. In: The semantic web, ISWC, pp 714–729
31. Wang C, Satuluri V, Parthasarathy S (2007) Local probabilistic models for link prediction. In: Proceedings of the 2007 seventh IEEE ICDM. IEEE Computer Society, Washington, DC, USA, pp 322–331. doi:10.1109/ICDM.2007.108
32. Wohlfarth T, Ichise R (2008) Semantic and event-based approach for link prediction. In: Proceedings of the 7th international conference on practical aspects of knowledge management