
17. Reasoning over Knowledge Graphs


Knowledge graphs are networks that capture entities, their types, and the relationships between them. Nodes are entities, each node is labeled with its type, and edges between two nodes capture relationships between the corresponding entities.

Bibliographic networks are an example of a knowledge graph. Node types include paper, title, author, conference, and year, and these nodes are connected by relation types such as published_Where, published_Year, has_Title, has_author, etc.

Social networks can also be viewed this way, with node types such as account, song, post, food, and channel, and relation types such as friend, like, cook, watch, and listen. This is a broader representation of a social network as a knowledge graph.

Example from Google's Knowledge Graph:

A sample application of a knowledge graph is serving information. Given a search query such as "latest films by the director of Titanic", instead of searching through all the unstructured data on the web, a knowledge graph of movies and directors lets us answer the query much faster.

Knowledge graphs are also useful in question-answering systems and conversational agents.

Some publicly available knowledge graphs: Freebase, Wikidata, DBpedia, YAGO, NELL, etc.

Knowledge Graph Completion

Knowledge graphs are usually massive, with millions of nodes and edges, and most are incomplete, with many true edges missing. For such a massive knowledge graph, enumerating all possible true relationships and facts is intractable, so we need to be able to predict plausible but missing links.

Freebase Knowledge Graph

Freebase is a knowledge graph with ~50 million entities, ~38K relation types, and ~3 billion facts about relationships between entities. Even with this many relationship edges, 93% of the people in the KG have no place of birth and 78% have no nationality.

FB15k and FB15k-237 are refined subsets of the original KG built for research purposes, and they are a good starting point.

| Dataset | Entities | Relations | Total Edges |
|---|---|---|---|
| FB15k | 14,951 | 1,345 | 592,213 |
| FB15k-237 | 14,505 | 237 | 310,079 |

Given an enormous knowledge graph, our task is to complete the KG by predicting missing relations. This is harder than plain link prediction because we have to predict both which two entities are related and what kind of relationship holds between them.

KG Representation

Edges are represented as triples (h, r, t), where the head (h) has a relation (r) with the tail (t). The key idea is to model entities and relations in an embedding/vector space $\Re^d$.

Given a relationship triple (h, r, t) that is present in the KG, the goal is for the embedding of (h, r) to be close to the embedding of t. We need a way to embed (h, r) and a way to define closeness between (h, r) and t.

Not all relations have the same properties: some are symmetric, some compose, and so on. For example:

  • Symmetric relations, where $r(h, t) \Rightarrow r(t, h) \; \forall h, t$. Relations such as "roommate" fall into this category.
  • Composition relations, where $r_1(x, y) \wedge r_2(y, z) \Rightarrow r_3(x, z) \; \forall x, y, z$, with $r_1, r_2, r_3$ different relations. For example, my mother's husband is my father.
  • 1-to-N and N-to-1 relations, where $r(h, t_1), r(h, t_2), \cdots, r(h, t_n)$ are all true, for example when r is the relation type "StudentsOf".

TransE

When we learned about the TransE algorithm in previous sections, we used the translation intuition for relationships: for a triple (h, r, t), its embeddings $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \Re^d$ should satisfy $\mathbf{h} + \mathbf{r} = \mathbf{t}$.

And we used the score function $f_r(h, t) = ||\mathbf{h} + \mathbf{r} - \mathbf{t}||$.

And the max margin loss was given by:

$$\mathcal{L} = \sum_{(h, r, t) \in G,\, (h, r, t') \notin G} \big[\gamma + f_r(h, t) - f_r(h, t')\big]_+$$

where (h, r, t) is a valid triple, (h, r, t') is a corrupted (negative) triple, and $\gamma$ is the margin, i.e. the smallest difference in score tolerated by the model between a valid triple and a corrupted one.
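As a concrete illustration, here is a minimal PyTorch sketch of the TransE score and max-margin loss. The dimensions, entity/relation counts, and function names are illustrative, not from the original lecture.

```python
import torch

d = 32                                 # embedding dimension (illustrative)
num_entities, num_relations = 100, 10  # illustrative sizes

# entity and relation embeddings, h, r, t in R^d
ent = torch.nn.Embedding(num_entities, d)
rel = torch.nn.Embedding(num_relations, d)

def f_r(h_idx, r_idx, t_idx):
    """TransE score f_r(h, t) = ||h + r - t||; lower means more plausible."""
    h, r, t = ent(h_idx), rel(r_idx), ent(t_idx)
    return torch.norm(h + r - t, p=2, dim=-1)

def margin_loss(h_idx, r_idx, t_idx, t_corrupt_idx, gamma=1.0):
    """Max-margin loss over a valid triple and its corrupted counterpart."""
    pos = f_r(h_idx, r_idx, t_idx)           # score of (h, r, t)
    neg = f_r(h_idx, r_idx, t_corrupt_idx)   # score of (h, r, t')
    return torch.clamp(gamma + pos - neg, min=0).sum()

# e.g. margin_loss(torch.tensor([0]), torch.tensor([1]), torch.tensor([2]), torch.tensor([3]))
```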

Composition in TransE

When we have relations that compose, i.e. $r_1(x, y) \wedge r_2(y, z) \Rightarrow r_3(x, z) \; \forall x, y, z$, TransE can model them naturally with $\mathbf{r}_3 = \mathbf{r}_1 + \mathbf{r}_2$.

Symmetry in TransE

For TransE to handle a symmetric relation r, every pair h, t satisfying r(h, t) must also satisfy r(t, h). This implies $||\mathbf{h} + \mathbf{r} - \mathbf{t}|| = 0$ and $||\mathbf{t} + \mathbf{r} - \mathbf{h}|| = 0$; adding the two constraints gives $2\mathbf{r} = 0$, so $\mathbf{r} = 0$ and hence $\mathbf{h} = \mathbf{t}$. But h and t are different entities, and forcing them to share the same embedding just because of a single symmetric relation r is a problem.

N-ary Relations

If $(h, r, t_1)$ and $(h, r, t_2)$ both exist in the knowledge graph, then with TransE $t_1$ and $t_2$ both map to the same vector, although they are different entities.

In TransE, $\mathbf{h} + \mathbf{r} = \mathbf{t}_1 = \mathbf{t}_2$: TransE squashes all N tail entities into a single embedding when we have N-ary relations.

So TransE is not expressive enough to work where relations are symmetric or where we have N-to-1 or 1-to-N relations in our knowledge graph.

TransR

TransR models entities as vectors in an entity space $\Re^d$ and models each relation in a separate relation space $\Re^k$, with $M_r \in \Re^{k \times d}$ as a relation-specific projection matrix. The embedding spaces for entities and relations are kept separate.

If (h, r, t) is a valid triple in the knowledge graph, then $h_\bot = M_r h$, $t_\bot = M_r t$, and $f_r(h, t) = ||h_\bot + r - t_\bot||$.
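A minimal sketch of the TransR scoring step, with illustrative dimensions and randomly initialized parameters (in practice these would all be learned):

```python
import torch

d, k = 32, 16                          # entity space R^d, relation space R^k (illustrative)
M_r = torch.randn(k, d)                # relation-specific projection matrix
r = torch.randn(k)                     # relation embedding in the relation space
h, t = torch.randn(d), torch.randn(d)  # entity embeddings in the entity space

def transr_score(h, r, t, M_r):
    """f_r(h, t) = ||M_r h + r - M_r t||: project entities into the relation space, then translate."""
    h_proj, t_proj = M_r @ h, M_r @ t
    return torch.norm(h_proj + r - t_proj)

# lower score = more plausible triple
# transr_score(h, r, t, M_r)
```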

TransR

Symmetry in TransR

For symmetric relation types r, where $r(h, t) \Rightarrow r(t, h) \; \forall h, t$, h and t keep different embeddings in the entity space, but their projections into the relation space, $h_\bot$ and $t_\bot$, can be the same.

In TransR, $h_\bot = M_r h = M_r t = t_\bot$: we can map h and t to the same location in the relation space while they remain distinct entities.

symmetry in TransR

N-ary Relations in TransR

In TransR we can learn $M_r$ so that $t_\bot = M_r t_1 = M_r t_2$ when we have valid triples $(h, r, t_1)$ and $(h, r, t_2)$ in the knowledge graph, while $t_1$ and $t_2$ keep their own embeddings in the entity space.

N-ary relations in TransR

Composition Relations in TransR

TransR fails when relations compose: each relation has its own space, so it is not naturally compositional across multiple relations.

Summary of TransE and TransR

| Embedding | Entity | Relation | $f_r(h, t)$ |
|---|---|---|---|
| TransE | $h, t \in \Re^d$ | $r \in \Re^d$ | $\lVert h + r - t \rVert$ |
| TransR | $h, t \in \Re^d$ | $r \in \Re^k$, $M_r \in \Re^{k \times d}$ | $\lVert M_r h + r - M_r t \rVert$ |
| Embedding | Symmetry | Composition | One-to-Many |
|---|---|---|---|
| TransE | ✗ | ✓ | ✗ |
| TransR | ✓ | ✗ | ✓ |

Queries

Our goal is to use knowledge graphs for multi-hop reasoning, i.e., answering complex queries efficiently over a massive, incomplete KG.

Example of different kinds of queries:

  • One-hop queries: Where did Hinton graduate?
  • Path queries: Where did Turing Award winners graduate?
  • Conjunctive queries: Where did Canadians with a Turing Award graduate?
  • EPFO queries: Where did Canadians with a Turing Award or a Nobel graduate?

One-hop Query

The easiest possible query, which is basically a link-prediction query on the knowledge graph.

In link prediction we ask whether the link (h, r, t) is true, whereas in a one-hop query we ask whether t is an answer to the query (h, r).



Path Queries

Path queries generalize one-hop queries by adding more relations along the path. A path query can be represented as $q = (v_a, r_1, \cdots, r_n)$, where $v_a$ is an anchor (constant) node.

We define the computation graph of q as the chain of relation traversals in the path, starting from the anchor node.

computation graph

In the question "Where did Turing Award winners graduate?", $v_a$ is "Turing Award" and $(r_1, r_2)$ is ("win", "graduate"). Given a knowledge graph and the computation graph representing the query below, how do we get the answer?

cg of the query

We can answer path queries by traversing the knowledge graph, as shown below.

Traversing an incomplete KG misses many answers. Alternatively, we can first do link prediction and complete the KG with probabilistic edges, but that makes the KG dense, and the time complexity of traversing a dense KG with |V| entities to answer a query $(v_a, r_1, \cdots, r_n)$ of length n is $O(|V|^n)$.

Instead, we work with embedded queries and generalize TransE to multi-hop reasoning: given a path query $q = (v_a, r_1, \cdots, r_n)$, we form the query embedding $\mathbf{q} = \mathbf{v}_a + \mathbf{r}_1 + \cdots + \mathbf{r}_n$ in the embedding space, and then do a nearest-neighbor search over all entities based on $f_q(v) = ||\mathbf{q} - \mathbf{v}||$, with time complexity $O(|V|)$.

Guu, Kelvin, John Miller, and Percy Liang, "Traversing Knowledge Graphs in Vector Space" (EMNLP 2015).

Below you can see the computation graph of the path query and the embedding process for the question "Where did Turing Award winners graduate?"

vector space

We train the standard TransE model and simply reuse its embeddings to answer the path query.
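Below is a minimal sketch of this idea: embed the path query by summing the anchor and relation embeddings, then rank all entities by distance. Indices, sizes, and names are illustrative, and the embedding tables are assumed to come from a pretrained TransE model.

```python
import torch

d, num_entities, num_relations = 32, 100, 10  # illustrative sizes
ent = torch.nn.Embedding(num_entities, d)     # pretrained entity embeddings (assumed)
rel = torch.nn.Embedding(num_relations, d)    # pretrained relation embeddings (assumed)

def embed_path_query(anchor_idx, relation_idxs):
    """q = v_a + r_1 + ... + r_n in the embedding space."""
    q = ent(torch.tensor(anchor_idx))
    for r_idx in relation_idxs:
        q = q + rel(torch.tensor(r_idx))
    return q

def answer_path_query(anchor_idx, relation_idxs, k=5):
    """Nearest-neighbor search over all entities using f_q(v) = ||q - v||, O(|V|)."""
    q = embed_path_query(anchor_idx, relation_idxs)
    dist = torch.norm(ent.weight - q, dim=-1)   # distance to every entity
    return torch.topk(-dist, k).indices         # indices of the k closest entities

# e.g. anchor "Turing Award" with relations ("win", "graduate"):
# answer_path_query(anchor_idx=0, relation_idxs=[1, 2])
```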



Conjunctive Queries

Conjunctive queries have multiple anchor nodes, with different relation paths that must be satisfied by the same end node. An example of a conjunctive query is "Where did Canadian citizens with a Turing Award graduate?". The computation graph of this conjunctive query is shown below.

computation graph of conjunctive query

If we had a complete KG, we could simply traverse it starting from the anchor nodes, obtain the paths below, and then intersect the resulting entity sets to arrive at the answers.

path traversal

As with path traversal, this is not tractable in the KG space, so we do it in the embedding space; the problem is then how to perform the intersection in the embedding space. You can see the computation graph and the embedding process of the above query in the image below.

embedded space conjunctive query

We use a neural intersection operator to intersect several query vectors in the embedding space.

The inputs are several current query embeddings $q_1, \cdots, q_m$, and the output has to be the embedding of their intersection, q. The order of the queries should not matter, so the operation should be permutation invariant on the embedding vectors; below, $\mathcal{J}$ should be permutation invariant.

$\mathcal{J}(q_1, \cdots, q_m) = \mathcal{J}(q_{p(1)}, \cdots, q_{p(m)})$, where $[p(1), \cdots, p(m)]$ is any permutation of $[1, \cdots, m]$.

We use the DeepSets architecture to obtain the intersection embedding from the input query embeddings.

deep sets

In the above architecture, $\phi$ and $\beta$ are neural networks: $\phi$ generates features of each input query embedding, the features are aggregated with a permutation-invariant operator such as the mean, and the result is passed through $\beta$ to produce the vector embedding of the intersection query.
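A minimal PyTorch sketch of such a DeepSets-style intersection operator; the layer sizes and the choice of mean pooling are illustrative:

```python
import torch
import torch.nn as nn

class NeuralIntersection(nn.Module):
    """J(q_1, ..., q_m) = beta(mean_i phi(q_i)); mean pooling makes it permutation invariant."""
    def __init__(self, d=32, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.beta = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, d))

    def forward(self, queries):
        # queries: (m, d) stack of query embeddings
        features = self.phi(queries)     # (m, hidden) per-query features
        pooled = features.mean(dim=0)    # (hidden,) order-independent aggregation
        return self.beta(pooled)         # (d,) embedding of the intersected query

# q_inter = NeuralIntersection()(torch.stack([q1, q2]))  # order of q1, q2 does not matter
```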

We train this along with our embedding method on our KG.

computation graph and embedding space on deep sets

Training

Given an entity embedding v and a query embedding q, the distance is $f_q(v) = ||q - v||$. The trainable parameters are:

  • Entity embeddings: d|V|
  • Relation embeddings: d|R|
  • Intersection operator: $\phi, \beta$

The number of parameters in the intersection operator does not depend on the graph size. The training strategy is the same as for TransE:

We sample a query q, an answer v, and a negative sample v'; embed the query q; calculate the distances $f_q(v)$ and $f_q(v')$; and optimize the loss $\mathcal{L}$.

Given a test query q, we embed it, calculate $f_q(v)$ for every entity v in the KG, and return the closest entity.

Limitations

We need a better way of computing the intersection for conjunctive queries. When we traverse the KG to obtain answers, each step produces a set of reachable entities; with vector embeddings we are not modeling these sets, we are only operating on point embeddings. Can we find a more expressive geometry in which to embed queries?



Box Embeddings

Instead of operating on point (vector) embeddings, we want to properly model the set of entities we get after each step of the query. For this we use box embeddings: we embed queries as hyper-rectangles (boxes)

$q = (\text{Center}(q), \text{Offset}(q))$

Taking the intersection of query vectors is not an intuitive operation, but the intersection of boxes is well defined. Boxes are also a powerful abstraction for modeling the set of entities we end up with after each query step: that set can be thought of as the entities enclosed by a box with a given center and offset.

Parameters

  • Entity embeddings: d|V| parameters; entities can be seen as zero-volume boxes.
  • Relation embeddings: relations are modeled as actual boxes, so we have 2d|R| parameters, d for the center and d for the offset; we augment the earlier relation embeddings with an offset.
  • Intersection operator $\phi, \beta$: the number of parameters does not depend on the graph size, but the key difference is that the inputs and outputs of this intersection operator are now boxes.

Geometric projection operator $\mathcal{P}: \text{Box} \times \text{Relation} \rightarrow \text{Box}$:

$\text{Center}(q') = \text{Center}(q) + \text{Center}(r)$

$\text{Offset}(q') = \text{Offset}(q) + \text{Offset}(r)$

The geometric projection operator increases the size of the box as more relations are applied: an entity (a zero-volume box) becomes a box, and projecting a box makes it even bigger.
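A minimal sketch of this projection operator on (center, offset) pairs; the variable names are illustrative:

```python
import torch

def project(box_center, box_offset, rel_center, rel_offset):
    """Cen(q') = Cen(q) + Cen(r), Off(q') = Off(q) + Off(r): the box can only grow."""
    return box_center + rel_center, box_offset + rel_offset

# An entity is a zero-volume box, so projecting it by a relation yields that relation's box size:
# project(entity_emb, torch.zeros_like(entity_emb), r_center, r_offset)
```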

The computation graph and embedding space of the query discussed above, using box embeddings, look like this:

box embeddings

Instead of the neural intersection operator, we use a geometric intersection operator $\mathcal{J}: \text{Box} \times \cdots \times \text{Box} \rightarrow \text{Box}$.

The new center of the intersected box is a weighted average of the input centers, and the new offset shrinks: the geometric intersection operator shrinks the boxes.

$\text{Center}(q_{\text{inter}}) = \sum_i w_i \odot \text{Center}(q_i)$

$\text{Offset}(q_{\text{inter}}) = \min\big(\text{Offset}(q_1), \cdots, \text{Offset}(q_n)\big) \odot \sigma\big(\text{Deepsets}(q_1, \cdots, q_n)\big)$

The min guarantees shrinking, and the sigmoid squashes its output into (0, 1). DeepSets is used to identify which boxes are more important.
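A minimal sketch of this geometric intersection operator; the attention and DeepSets networks below are simplified stand-ins, with illustrative sizes:

```python
import torch
import torch.nn as nn

class GeometricIntersection(nn.Module):
    def __init__(self, d=32):
        super().__init__()
        self.att = nn.Linear(2 * d, 1)   # scores each input box to produce the weights w_i
        self.deepsets = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, centers, offsets):
        # centers, offsets: (n, d) for the n boxes being intersected
        boxes = torch.cat([centers, offsets], dim=-1)              # (n, 2d)
        w = torch.softmax(self.att(boxes), dim=0)                  # (n, 1), weights sum to 1
        new_center = (w * centers).sum(dim=0)                      # weighted average of centers
        shrink = torch.sigmoid(self.deepsets(boxes).mean(dim=0))   # elementwise factor in (0, 1)
        new_offset = offsets.min(dim=0).values * shrink            # min guarantees shrinking
        return new_center, new_offset
```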

Following the computation graph on the complete query using box embeddings it looks like below in the Knowledge Graph.

box embeddings query on KG

Entity to box distance

Once we have the query box q and the entity vector v, we compute the distance between them as follows:

$d_{\text{box}}(q, v) = d_{\text{out}}(q, v) + \alpha \cdot d_{\text{in}}(q, v)$, where $0 < \alpha < 1$

$d_{\text{out}}$ is the distance from the box boundary when the entity is outside the box, and $d_{\text{in}}$ is the distance from the center.

The distance is defined this way so that, during training, entities outside the box are penalized more heavily than entities inside it (since $\alpha < 1$ down-weights the inside part).
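A minimal sketch of this distance using an L1 norm; the exact clipping and norm choices here are a simplification, not necessarily the reference implementation:

```python
import torch

def box_distance(center, offset, v, alpha=0.2):
    """d_box(q, v) = d_out(q, v) + alpha * d_in(q, v), with 0 < alpha < 1."""
    q_min, q_max = center - offset, center + offset                # box corners
    d_out = (torch.relu(v - q_max) + torch.relu(q_min - v)).sum()  # zero when v is inside the box
    clipped = torch.max(q_min, torch.min(q_max, v))                # v clipped into the box
    d_in = (center - clipped).abs().sum()                          # distance toward the center
    return d_out + alpha * d_in
```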

Training

Given a training set of queries and answers, with a query q, a valid answer v, and a wrong answer v', the loss is:

$$\mathcal{L} = -\log \sigma\big(\gamma - d_{\text{box}}(q, v)\big) - \log \sigma\big(d_{\text{box}}(q, v') - \gamma\big)$$
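A minimal sketch of this loss, assuming the box distances to the valid answer and to the negative sample have already been computed (for example with the box_distance sketch above):

```python
import torch
import torch.nn.functional as F

def query2box_loss(d_pos, d_neg, gamma=1.0):
    # push valid answers inside/near the box (d_pos < gamma)
    # and wrong answers far away from it (d_neg > gamma)
    return -F.logsigmoid(gamma - d_pos) - F.logsigmoid(d_neg - gamma)

# e.g. query2box_loss(torch.tensor(0.3), torch.tensor(2.5))
```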

Summary of Query algorithms

From the paper

| Embedding | Symmetry | Composition | One-to-many |
|---|---|---|---|
| TransE | | | |
| TransH | | | |
| Query2Box | | | |

N-ary Relations in query2box

Symmetric Relations in query2box

Composition Relations in query2box



EPFO Queries

Conjunctive queries plus disjunction are called Existential Positive First-Order (EPFO) queries.

A sample EPFO query is "Where did Canadians with a Turing Award or a Nobel graduate?"

To design a disjunction operator and embed EPFO queries in a low-dimensional vector space, you can read this paper.