17. Reasoning over Knowledge Graphs
Knowledge graphs are networks that capture entities, their types, and the relationships between them. Nodes are entities, each node is labeled with its type, and edges between two nodes capture the relationship between those entities.
Bibliographic networks are one example of a knowledge graph. Node types include paper, title, author, conference, and year, and these nodes are connected by relation types such as published_Where, published_Year, has_Title, has_author, etc.
Social networks can also be represented as knowledge graphs, with node types such as account, song, post, food, and channel, and relation types such as friend, like, cook, watch, and listen.
Google's Knowledge Graph example
A sample application of knowledge graphs is serving information. Given a search query such as "latest films by the director of Titanic", instead of searching through all the unstructured data on the web, a knowledge graph of movies and directors lets us answer the query much faster.
Knowledge graphs are also useful in question answering systems and conversational agents.
Some publicly available knowledge graphs: Freebase, Wikidata, DBpedia, YAGO, NELL, etc.
Knowledge Graph Completion
Knowledge graphs are usually massive, with millions of nodes and edges, and most of them are incomplete, with many true edges missing. For such a massive knowledge graph, enumerating all possible true relationships and facts is intractable, so we need to be able to predict plausible but missing links.
Freebase Knowledge Graph
Freebase is a knowledge graph with ~50 million entities, ~38K relation types, and ~3 billion facts about relationships between entities. Even with this many relationship edges, 93% of the people in the KG have no place of birth and 78% have no nationality.
FB15k and FB15k-237 are refined subsets of the original KG built for research purposes, and they make a good starting point.
Dataset | Entities | Relations | Total Edges |
---|---|---|---|
FB15k | 14,951 | 1,345 | 592,213 |
FB15k-237 | 14,505 | 237 | 310,079 |
Given an enormous knowledge graph, our task is to complete the KG by predicting missing relations. This is hard because we have to predict both which pairs of entities are related and what kind of relationship holds between them.
KG Representation
Edges are represented as triples (h, r, t), where the head h has relation r with the tail t. The key idea is to model entities and relations in an embedding/vector space $\Re^d$.
Given a relationship triple (h, r, t) that is present in the KG, the goal is for the embedding of (h, r) to be close to the embedding of t. We need a way to embed (h, r) and a way to define closeness between (h, r) and t.
Not all relations have the same properties: some are symmetric, some are not, and so on.
- Symmetric relations, where $r(h, t) \Rightarrow r(t, h) \; \forall h, t$. Roommate is an example.
- Composition relations, where $r_1(x, y) \wedge r_2(y, z) \Rightarrow r_3(x, z) \; \forall x, y, z$, with $r_1, r_2, r_3$ being different relations. For example, one's mother's husband is one's father.
- 1-to-N and N-to-1 relations, where $r(h, t_1), r(h, t_2), \cdots, r(h, t_n)$ are all true, for example when r is the relation type "StudentsOf".
TransE
As we saw in previous sections, TransE uses the translation intuition for relations: for a triple (h, r, t), its embeddings $\bold{h}, \bold{r}, \bold{t} \in \Re^d$ should satisfy $\bold{h} + \bold{r} \approx \bold{t}$.
The score function is $f_r(h, t) = ||h + r - t||$.
The max-margin loss is given by:
$$\mathcal{L} = \sum_{(h, r, t) \in G,\, (h, r, t') \notin G} [\gamma + f_r(h, t) - f_r(h, t')]_+$$
where (h, r, t) is a valid triple, (h, r, t') is a corrupted (negative) triple, and $\gamma$ is the margin, i.e. the smallest gap the model tolerates between the score of a valid triple and that of a corrupted one.
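To make this concrete, here is a minimal PyTorch sketch of the TransE score and max-margin loss; the tensor shapes and batch size are illustrative, not taken from any reference implementation.

```python
# Minimal sketch of the TransE score f_r(h, t) = ||h + r - t|| and the
# max-margin loss, assuming batched embedding tensors (illustrative shapes).
import torch
import torch.nn.functional as F

def transe_score(h, r, t):
    # L2 norm of the translation error for each triple in the batch
    return torch.norm(h + r - t, p=2, dim=-1)

def margin_loss(h, r, t, t_neg, gamma=1.0):
    # [gamma + f_r(h, t) - f_r(h, t')]_+ summed over the batch
    pos = transe_score(h, r, t)
    neg = transe_score(h, r, t_neg)
    return F.relu(gamma + pos - neg).sum()

# Toy usage: a batch of 4 triples with embedding dimension d = 8
d = 8
h, r, t, t_neg = (torch.randn(4, d) for _ in range(4))
loss = margin_loss(h, r, t, t_neg, gamma=1.0)
```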
Composition in TransE
When we have relations that follow composition, i.e. $r_1(x, y) \wedge r_2(y, z) \Rightarrow r_3(x, z) \; \forall x, y, z$, TransE can model the composition as $r_3 = r_1 + r_2$ in the embedding space.
Symmetry in TransE
Suppose we want TransE to handle a symmetric relation r, i.e. for all h, t, r(h, t) holds whenever r(t, h) holds. This requires $||h + r - t|| = 0$ and $||t + r - h|| = 0$, which forces $r = 0$ and $h = t$. But h and t are different entities; representing them with the same embedding because of a single symmetric relation r is a problem, as the short derivation below shows.
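Written out as a short derivation (just the argument above in equation form):

```latex
\begin{align*}
  r(h, t) &\;\Rightarrow\; h + r = t \\
  r(t, h) &\;\Rightarrow\; t + r = h \\
          &\;\Rightarrow\; (h + r) + r = h
           \;\Rightarrow\; 2r = 0
           \;\Rightarrow\; r = 0
           \;\Rightarrow\; h = t
\end{align*}
```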
N-ary Relations
If $(h, r, t_1)$ and $(h, r, t_2)$ both exist in the knowledge graph, then with TransE $t_1$ and $t_2$ map to the same vector, although they are different entities.
Since $h + r = t_1 = t_2$ in TransE, TransE squashes all N entities into a single embedding whenever we have N-ary relations.
So TransE is not expressive enough for knowledge graphs that contain symmetric, N-to-1, or 1-to-N relations.
TransR
TransR models entities as vectors in the entity space $\Re^d$ and models each relation in a relation space $\Re^k$, with $M_r \in \Re^{k \times d}$ as the projection matrix. The embedding spaces for entities and relations are kept separate.
If (h, r, t) is a valid triple in the knowledge graph, then $h_\bot = M_r h$, $t_\bot = M_r t$, and $f_r(h, t) = ||h_\bot + r - t_\bot||$.
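A minimal sketch of the TransR score, assuming single (unbatched) embeddings; the dimensions d and k below are arbitrary illustrations.

```python
# Sketch of the TransR score: project h and t into the relation space with
# M_r before translating by r (illustrative, unbatched version).
import torch

def transr_score(h, r, t, M_r):
    # h, t in R^d, r in R^k, M_r in R^{k x d}
    h_proj = M_r @ h          # h_perp = M_r h
    t_proj = M_r @ t          # t_perp = M_r t
    return torch.norm(h_proj + r - t_proj, p=2)

d, k = 8, 4
h, t = torch.randn(d), torch.randn(d)
r = torch.randn(k)
M_r = torch.randn(k, d)
score = transr_score(h, r, t, M_r)
```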
Symmetry in TransR
For symmetric relation types r, where $r(h, t) \Rightarrow r(t, h) \; \forall h, t$, h and t have different embeddings in the entity space, but their projections into the relation space, $h_\bot$ and $t_\bot$, can be the same.
In TransR, $h_\bot = M_r h = M_r t = t_\bot$, so we can map h and t to the same location in the relation space.
N-ary Relations in TransR
In TransR we can learn $M_r$ so that $t_\bot = M_r t_1 = M_r t_2$ when the valid triples $(h, r, t_1)$ and $(h, r, t_2)$ are in the knowledge graph, while $t_1$ and $t_2$ keep their own embeddings in the entity space.
Composition Relations in TransR
TransR fails for relations that can be composed: each relation has its own projection space, so it is not naturally compositional across multiple relations.
Summary of TransE and TransR
Embedding | Entity | Relation | $f_r(h, t)$ |
---|---|---|---|
TransE | $h, t \in \Re^d$ | $r \in \Re^d$ | $\lVert h + r - t \rVert$ |
TransR | $h, t \in \Re^d$ | $r \in \Re^k, M_r \in \Re^{k \times d}$ | $\lVert M_r h + r - M_r t \rVert$ |
Embedding | Symmetry | Composition | One to Many |
---|---|---|---|
TransE | ❌ | ✅ | ❌ |
TransR | ✅ | ❌ | ✅ |
Queries
Our goal is to use knowledge graphs for multi-hop reasoning, i.e., answering complex queries efficiently over an incomplete, massive KG.
Example of different kinds of queries:
- One-hop Queries: Where did Hinton graduate?
- Path Queries: Where did Turing Award winners graduate?
- Conjunctive Queries: Where did Canadians with a Turing Award graduate?
- EPFO Queries: Where did Canadians with a Turing Award or a Nobel Prize graduate?
One-hop Query
This is the easiest kind of query; it is essentially a link prediction query on the knowledge graph.
In link prediction we ask whether the link (h, r, t) is true, whereas in a one-hop query we ask whether t is an answer to the query (h, r).
Path Queries
Path queries generalize one-hop queries by adding more relations along the path. A path query can be represented as $q = (v_a, r_1, \cdots, r_n)$, where $v_a$ is an anchor (constant) node.
We define the computation graph of q as the chain of relation traversals specified by the query.
In the question "Where did Turing Award winners graduate?", $v_a$ is "Turing Award" and $(r_1, r_2)$ is ("win", "graduate"). Given a knowledge graph and the computation graph representing the query below, how do we get the answer?
We can answer path queries by traversing the knowledge graph, as shown below.
We can traverse an incomplete KG and miss many answers, or we can first do link prediction and complete the KG with probabilistic edges, which makes the KG dense. The time complexity of traversing a dense KG with |V| entities to answer $(v_a, r_1, \cdots, r_n)$ of length n is $O(|V|^n)$.
Instead, we work with embedded queries, generalizing TransE to multi-hop reasoning: given a path query $q = (v_a, r_1, \cdots, r_n)$, we form $q = v_a + r_1 + \cdots + r_n$ in the embedding space. We then do a nearest-neighbor search over all entities using $f_q(v) = ||q - v||$, with time complexity $O(|V|)$.
Guu, Kelvin, John Miller, and Percy Liang. "Traversing Knowledge Graphs in Vector Space." EMNLP 2015.
Below you can see the computation graph of the path query and the embedding process for the question "Where did Turing Award winners graduate?"
We train the usual TransE model and simply reuse its embeddings to answer path queries.
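To make the embedding-based traversal concrete, here is a minimal sketch of answering a path query with TransE-style additive composition and an $O(|V|)$ nearest-neighbor search; the entity table, anchor, and relation names are made up for illustration.

```python
# Sketch of answering a path query q = (v_a, r_1, ..., r_n):
# embed q = v_a + r_1 + ... + r_n, then score every entity with f_q(v) = ||q - v||.
import torch

def embed_path_query(anchor_emb, relation_embs):
    # q = v_a + r_1 + ... + r_n in the embedding space
    return anchor_emb + torch.stack(relation_embs).sum(dim=0)

def answer_query(q, entity_matrix, k=5):
    # O(|V|) nearest-neighbour search over all entity embeddings
    dists = torch.norm(entity_matrix - q, p=2, dim=-1)
    return torch.topk(dists, k, largest=False).indices

d, num_entities = 8, 100
entity_matrix = torch.randn(num_entities, d)        # pretrained entity embeddings
v_a = entity_matrix[0]                              # anchor, e.g. "Turing Award"
r_win, r_graduate = torch.randn(d), torch.randn(d)  # relation embeddings
q = embed_path_query(v_a, [r_win, r_graduate])
top_answers = answer_query(q, entity_matrix)
```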
Conjunctive Queries
Conjunctive queries have multiple anchor nodes, with different relation paths that must be satisfied by the same end node. An example of a conjunctive query is "Where did Canadian citizens with a Turing Award graduate?". The computation graph of this conjunctive query looks like below.
If we had a complete KG, we could simply traverse it starting from each anchor node, obtaining the paths below, and then intersect the resulting entity sets to arrive at the answers.
As with path traversal, this is not tractable in the KG space, so we do it in the embedding space instead; the problem is then figuring out how to perform the intersection in the embedding space. You can see the computation graph and the embedding process of the above query in the image below.
We make use of a neural intersection operator to intersect several query embeddings in the embedding space.
For this we have several current query embeddings $q_1, \cdots, q_m$, and the output has to be the embedding of the intersection query q. The order of the queries should not matter, so the operation must be permutation invariant over the embedding vectors; that is, $\mathcal{J}$ below should be permutation invariant.
$\mathcal{J}(q_1, \cdots, q_m) = \mathcal{J}(q_{p(1)}, \cdots, q_{p(m)})$, where $[p(1), \cdots, p(m)]$ is any permutation of $[1, \cdots, m]$.
We use the DeepSets architecture to obtain the intersection embedding from the initial query embeddings.
In the above architecture, $\phi$ and $\beta$ are neural networks: $\phi$ generates features for each input query embedding, the features are aggregated with a permutation-invariant operator such as the mean, and the result is passed through $\beta$ to obtain the vector embedding of the intersection query; a sketch is given below.
We train this along with our embedding method on our KG.
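Here is a minimal sketch of a DeepSets-style neural intersection operator; the hidden size and the exact layers of $\phi$ and $\beta$ are illustrative choices, not the architecture from the lecture.

```python
# DeepSets-style neural intersection: phi featurizes each query embedding,
# a permutation-invariant mean pools them, beta maps back to a query embedding.
import torch
import torch.nn as nn

class NeuralIntersection(nn.Module):
    def __init__(self, d, hidden=32):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(d, hidden), nn.ReLU())
        self.beta = nn.Linear(hidden, d)

    def forward(self, queries):
        # queries: (m, d) -- the m query embeddings to intersect
        feats = self.phi(queries)        # per-query features, (m, hidden)
        pooled = feats.mean(dim=0)       # permutation-invariant aggregation
        return self.beta(pooled)         # embedding of the intersection query

d = 8
intersect = NeuralIntersection(d)
q1, q2 = torch.randn(d), torch.randn(d)
q_inter = intersect(torch.stack([q1, q2]))
```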
Training
Given an entity embedding v and a query embedding q, the distance is $f_q(v) = ||q - v||$. The trainable parameters are:
- Entity embeddings: d|V|
- Relation embeddings: d|R|
- Intersection operator: $\phi, \beta$
The number of parameters in the intersection operator does not depend on the graph size. We use the same training strategy as TransE:
We sample a query q, an answer v, and a negative sample v'; embed the query q; calculate the distances $f_q(v)$ and $f_q(v')$; and optimize the loss $\mathcal{L}$.
Given a test query q, we embed it, calculate $f_q(v)$ for every entity v in the KG, and return the best-scoring entity.
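Putting the pieces together, here is a sketch of the parameter set and one training step; the entity/relation counts, the sampled indices, and the margin are all made up, and the query shown is a simple one-hop query for brevity.

```python
# Sketch of the trainable parameters (d|V| entity + d|R| relation embeddings)
# and a single margin-loss optimization step on a sampled (query, answer,
# negative) example. Everything here is illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_entities, num_relations, d, gamma = 1000, 50, 8, 1.0
entity_emb = nn.Embedding(num_entities, d)      # d|V| parameters
relation_emb = nn.Embedding(num_relations, d)   # d|R| parameters
# a NeuralIntersection module (see earlier sketch) would be trained jointly

params = list(entity_emb.parameters()) + list(relation_emb.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

# One step: embed the query, score the answer and a negative, apply the margin loss.
q = entity_emb(torch.tensor(0)) + relation_emb(torch.tensor(3))   # one-hop query (v_a, r)
v, v_neg = entity_emb(torch.tensor(7)), entity_emb(torch.tensor(42))
loss = F.relu(gamma + torch.norm(q - v) - torch.norm(q - v_neg))
opt.zero_grad()
loss.backward()
opt.step()
```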
Limitations
We need a better way of computing intersections for conjunctive queries. When we traverse the KG to find answers, each step produces a set of reachable entities, but we do not model these sets; we only operate on the embeddings of the relations. Can we find a more expressive geometry to embed the queries?
Box Embeddings
Instead of operating on vector embeddings, we want to properly model the sets of entities we obtain after each step of the query. For this we use box embeddings: we embed queries as hyper-rectangles (boxes)
$$q = (\text{Center}(q), \text{Offset}(q))$$
Taking the intersection of query vectors is not an intuitive operation, but the intersection of boxes is well defined. Boxes are also a powerful abstraction for modeling the sets of entities we end up with after each query step: those entities can be thought of as being enclosed by a box with a projected center and offset.
Parameters
- Entity embeddings: d|V| parameters; entities can be seen as zero-volume boxes.
- Relation embeddings: 2d|R| parameters, since each relation is modeled as an actual box with a center and an offset. Here we augment the earlier relation embeddings with an offset.
- Intersection operator $\phi, \beta$: its number of parameters does not depend on the graph size, but the key difference is that its inputs and output are now boxes.
Geometric projection operator $\mathcal{P}: \text{Box} \times \text{Relation} \rightarrow \text{Box}$:
$$\text{Center}(q') = \text{Center}(q) + \text{Center}(r)$$
$$\text{Offset}(q') = \text{Offset}(q) + \text{Offset}(r)$$
The geometric projection operator increases the size of the box as more relations are applied: an entity with zero volume becomes a box, and a projection applied to a box makes it even bigger.
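A minimal sketch of the geometric projection operator, representing a box as a (center, offset) pair; the Box container and dimensions are illustrative.

```python
# Geometric projection P: Box x Relation -> Box.
# Cen(q') = Cen(q) + Cen(r); Off(q') = Off(q) + Off(r), so the box only grows.
import torch
from collections import namedtuple

Box = namedtuple("Box", ["center", "offset"])

def project(box, rel_box):
    return Box(box.center + rel_box.center, box.offset + rel_box.offset)

d = 8
entity_box = Box(center=torch.randn(d), offset=torch.zeros(d))  # zero-volume box
rel_win = Box(center=torch.randn(d), offset=torch.rand(d))      # relation box
q_box = project(entity_box, rel_win)   # projected query box, now with volume
```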
The computation graph and the embedding space for the query discussed above, using box embeddings, look like the following.
Instead of the neural intersection operator we use a geometric intersection operator $\mathcal{J}: \text{Box} \times \cdots \times \text{Box} \rightarrow \text{Box}$.
The new center of the intersected box is a weighted average of the input centers, and the new offset shrinks: the geometric intersection operator shrinks the size of the boxes.
$$\text{Center}(q_{\text{inter}}) = \sum_i w_i \odot \text{Center}(q_i)$$
$$\text{Offset}(q_{\text{inter}}) = \min(\text{Offset}(q_1), \cdots, \text{Offset}(q_n)) \odot \sigma(\text{DeepSets}(q_1, \cdots, q_n))$$
The min guarantees shrinking, and the sigmoid squashes its output into (0, 1). DeepSets is used to identify which boxes are more important; a sketch of the operator is given below.
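A minimal sketch of the geometric intersection operator under these formulas; the tiny linear networks standing in for the attention weights and the DeepSets gate are illustrative simplifications.

```python
# Geometric intersection of boxes: attention-weighted average of centers,
# elementwise-min offset shrunk by a sigmoid gate in (0, 1).
import torch
import torch.nn as nn

def intersect_boxes(centers, offsets, att_net, gate_net):
    # centers, offsets: (m, d) stacks of the m input boxes
    w = torch.softmax(att_net(centers), dim=0)           # weights w_i over boxes
    new_center = (w * centers).sum(dim=0)                # sum_i w_i * Cen(q_i)
    gate = torch.sigmoid(gate_net(offsets.mean(dim=0)))  # stand-in for sigma(DeepSets(...))
    new_offset = offsets.min(dim=0).values * gate        # min(...) * gate: guaranteed shrink
    return new_center, new_offset

d = 8
att_net, gate_net = nn.Linear(d, d), nn.Linear(d, d)
centers, offsets = torch.randn(3, d), torch.rand(3, d)
c_inter, o_inter = intersect_boxes(centers, offsets, att_net, gate_net)
```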
Following the computation graph on the complete query with box embeddings looks like the following in the knowledge graph.
Entity to box distance
Once we have the query box q and an entity vector v, we compute the distance between the two as follows:
$$d_{\text{box}}(q, v) = d_{\text{out}}(q, v) + \alpha \cdot d_{\text{in}}(q, v), \quad 0 < \alpha < 1$$
$d_{\text{out}}$ is the distance to the box boundary if the entity is outside the box, and $d_{\text{in}}$ is the distance from the center.
We defined it this way to penalize entities outside the box while training.
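A sketch of the entity-to-box distance with this inside/outside split; the use of the L1 norm and the value of alpha are assumptions for illustration.

```python
# d_box(q, v) = d_out(q, v) + alpha * d_in(q, v):
# d_out measures how far v lies outside the box (0 if inside),
# d_in measures the distance from the center, capped at the box boundary.
import torch

def box_distance(center, offset, v, alpha=0.2):
    q_max, q_min = center + offset, center - offset
    d_out = torch.norm(torch.relu(v - q_max) + torch.relu(q_min - v), p=1)
    d_in = torch.norm(center - torch.min(q_max, torch.max(q_min, v)), p=1)
    return d_out + alpha * d_in

d = 8
center, offset, v = torch.randn(d), torch.rand(d), torch.randn(d)
dist = box_distance(center, offset, v)
```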
Training
Given a set of queries and answers, we sample a query q, a valid answer v, and a wrong answer v'. The loss is:
$$\mathcal{L} = -\log \sigma(\gamma - d_{\text{box}}(q, v)) - \log \sigma(d_{\text{box}}(q, v') - \gamma)$$
Summary of Query algorithms
From the paper
Embedding | Symmetry | Composition | One-to-many |
---|---|---|---|
TransE | ❌ | ✅ | ❌ |
TransH | ✅ | ❌ | ✅ |
Query2Box | ✅ | ✅ | ✅ |
N-ary Relations in Query2Box
Symmetric Relations in Query2Box
Composition Relations in Query2Box
EPFO Queries
Conjunctive queries plus disjunction are called Existential Positive First-Order (EPFO) queries.
A sample EPFO query is "Where did Canadians with a Turing Award or a Nobel Prize graduate?"
For the design of a disjunction operator and how to embed EPFO queries in a low-dimensional vector space, you can read this paper.