# HousE: Knowledge Graph Embedding with Householder Parameterization

Rui Li 1* Jianan Zhao 2 Chaozhuo Li 3 Di He 4 Yiqi Wang 5 Yuming Liu 6 Hao Sun 6 Senzhang Wang 7 Weiwei Deng 6 Yanming Shen 1 Xing Xie 3 Qi Zhang 6

Abstract

The effectiveness of knowledge graph embedding (KGE) largely depends on the ability to model intrinsic relation patterns and mapping properties. However, existing approaches can only capture some of them with insufficient modeling capacity. In this work, we propose a more powerful KGE framework named HousE, which involves a novel parameterization based on two kinds of Householder transformations: (1) Householder rotations to achieve superior capacity of modeling relation patterns; (2) Householder projections to handle sophisticated relation mapping properties. Theoretically, HousE is capable of modeling crucial relation patterns and mapping properties simultaneously. Besides, HousE is a generalization of existing rotation-based models while extending the rotations to high-dimensional spaces. Empirically, HousE achieves new state-of-the-art performance on five benchmark datasets. Our code is available at https://github.com/anrep/HousE.

1. Introduction

Knowledge graphs (KGs) store massive human knowledge as a collection of factual triples, where each triple (h, r, t) represents a relation r between head entity h and tail entity t. With a wealth of human knowledge, KGs have demonstrated their effectiveness in a myriad of downstream applications (Xiong et al., 2017). However, real-world KGs such as Freebase (Bollacker et al., 2008) and Yago (Suchanek et al., 2007) usually suffer from incompleteness. Knowledge Graph Embedding (KGE), which learns low-dimensional representations for entities and relations, excels as an effective tool for predicting missing links.

*Work done during internship at MSRA. 1Department of Computer Science and Technology, Dalian University of Technology, Dalian, China; 2University of Notre Dame, Indiana, USA; 3Microsoft Research Asia, Beijing, China; 4Peking University, Beijing, China; 5Michigan State University, Michigan, USA; 6Microsoft, Beijing, China; 7Central South University, Changsha, China. Correspondence to: Yanming Shen, Chaozhuo Li.

Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022. Copyright 2022 by the author(s).

[Figure 1. Illustrations of four relation patterns (a-d) (Sun et al., 2019) and two challenging RMPs (e-f) (Bordes et al., 2013): (a) Symmetry; (b) Antisymmetry; (c) Inversion; (d) Composition; (e) 1-to-N; (f) N-to-1.]

A crucial challenge of KGE lies in how to model relation patterns (e.g., symmetry, antisymmetry, inversion and composition) and relation mapping properties (RMPs, i.e., 1-to-1, 1-to-N, N-to-1 and N-to-N) (Bordes et al., 2013; Sun et al., 2019), as shown in Figure 1. Most works design specific vector spaces and operations to capture such patterns and RMPs. For example, TransE (Bordes et al., 2013) represents relations as translations, which fails in modeling symmetry and RMPs. Recently, RotatE (Sun et al., 2019) represents relations as rotations in the complex plane to model the four relation patterns, but it is incapable of handling RMPs due to the distance-preserving property of rotations.
Rotate3D (Gao et al., 2020) and QuatE (Zhang et al., 2019) introduce quaternions to extend rotations to 3-dimensional and 4-dimensional spaces, and achieve better performance with larger model capacity. However, as far as we know, none of the existing methods is capable of modeling all the relation patterns and RMPs, as shown in Table 1, leading to sub-optimal performance. Furthermore, some advanced approaches, such as (Sun et al., 2019; Gao et al., 2020; Zhang et al., 2019), are specifically designed for 2-, 3- or 4-dimensional spaces, which may be inadequate to capture the sophisticated structures of KGs (Zhang et al., 2019). Therefore, this brings us a question: is there a framework to handle all the above relation patterns and RMPs with more powerful modeling capacity?

Table 1. Recent models' capability of modeling relation patterns and relation mapping properties (RMPs). TransX represents TransE's many variants, such as TransH (Wang et al., 2014), TransR (Lin et al., 2015) and TransD (Ji et al., 2015).

| Model | Symmetry | Antisymmetry | Inversion | Composition | RMPs | Dim. of Rotation |
|---|---|---|---|---|---|---|
| TransE | ✗ | ✓ | ✓ | ✓ | ✗ | - |
| TransX | ✓ | ✓ | ✗ | ✗ | ✓ | - |
| DistMult | ✓ | ✗ | ✗ | ✗ | ✓ | - |
| ComplEx | ✓ | ✓ | ✓ | ✗ | ✓ | - |
| RotatE | ✓ | ✓ | ✓ | ✓ | ✗ | 2-D |
| Rotate3D | ✓ | ✓ | ✓ | ✓ | ✗ | 3-D |
| QuatE | ✓ | ✓ | ✓ | ✗ | ✓ | 4-D |
| DualE | ✓ | ✓ | ✓ | ✗ | ✓ | 3-D |
| HousE-r | ✓ | ✓ | ✓ | ✓ | ✗ | k-D |
| HousE | ✓ | ✓ | ✓ | ✓ | ✓ | k-D |

In this paper, we give an affirmative answer by proposing a more powerful and general framework named HousE based on Householder parameterization. We prove that the composition of $2\lfloor k/2 \rfloor$ Householder reflections (Householder, 1958) can represent any k-dimensional rotation. This unique property of Householder reflections provides us a natural way to model high-dimensional rotations with more degrees of freedom. We call this kind of rotations Householder rotations, based on which a simple model named HousE-r is proposed to achieve superior capacity of modeling relation patterns. Nevertheless, HousE-r is plagued by the sophisticated RMPs due to the distance-preserving nature of pure Householder rotations. To remedy this deficiency, we modify the vanilla Householder reflections into Householder projections, which can flexibly adjust the relative distances between points. The Householder projections are further integrated with HousE-r to establish the final HousE. By enjoying the merits of Householder rotations and Householder projections, HousE is theoretically capable of modeling all the relation patterns and RMPs shown in Table 1. Moreover, our proposal is a general framework and existing rotation-based models can be viewed as special cases of HousE.

Our contributions are summarized as follows:

- To the best of our knowledge, we are the first to utilize Householder parameterization to build a more powerful and general KGE framework named HousE.
- We present a simple way to represent relations as high-dimensional Householder rotations, which endows HousE with better modeling capacity.
- We further modify the vanilla Householder reflections into Householder projections. By combining the Householder projections and rotations, HousE is able to model all the relation patterns and RMPs in Table 1.
- We conduct extensive experiments over five benchmarks and our proposal consistently outperforms SOTA baselines over all the datasets.

2. Problem Setup

Given the entity set $\mathcal{E}$ and relation set $\mathcal{R}$, a knowledge graph can be formally defined as a collection of factual triples $\mathcal{D} = \{(h, r, t)\}$, in which head/tail entities $h, t \in \mathcal{E}$ and relation $r \in \mathcal{R}$.
To predict missing links, KGE maps entities and relations to distributed representations, and defines a score function to measure the plausibility of each triple. Following a series of previous works (Bordes et al., 2013; Sun et al., 2019; Gao et al., 2020; Song et al., 2021), we define the score function as a distance function $d_r(h, t)$. The distance of a positive triple $(h, r, t) \in \mathcal{D}$ is expected to be smaller than that of the corrupted negative triples $(h', r, t)$ or $(h, r, t')$, which can be generated by randomly replacing the entity h or t with other entities. In the training process, self-adversarial negative sampling (Sun et al., 2019) is used to optimize the models in a contrastive way. Given a positive triple and its negative samples, the loss function is defined as follows:

$$\mathcal{L} = -\log \sigma(\gamma - d_r(h, t)) - \sum_{i=1}^{l} p(h'_i, r, t'_i) \log \sigma(d_r(h'_i, t'_i) - \gamma), \quad (1)$$

where $\gamma$ is a pre-defined margin, $\sigma$ is the sigmoid function, $l$ denotes the number of negative samples, $(h'_i, r, t'_i)$ is a negative sample against $(h, r, t)$, $p(h'_i, r, t'_i)$ is the weight of negative sampling defined in (Sun et al., 2019), and $\lambda$ is the coefficient of an additional regularization term. For the sake of clarification, notations used in this paper are listed in Appendix A.

[Figure 2. (a) Householder reflection in 2-dimensional space; (b) HousE-r models relation r as a 2-dimensional Householder rotation composed of two Householder reflections; (c) Modified Householder reflection in 2-dimensional space with different values of τ; (d) To model (h, r, t1) and (h, r, t2), HousE first utilizes relational Householder projections Pro-H1 and Pro-H2 (blue lines) to change the relative distance between entities, such as increasing the distance between Sh and the negative samples (marked by triangles) and decreasing the distance between two positive tail entities St1 and St2; then HousE performs a relational Householder rotation Rot-H (red lines) from the projected head embedding Sh,r to the projected tail embedding. Note that we omit the row (element) indices [i] for simplicity.]

3. Methodology

3.1. HousE-r: Relational Householder Rotations

In the first step, we seek to develop a general framework to model relations as rotations in a space of any dimension k, going beyond (Sun et al., 2019; Gao et al., 2020; Zhang et al., 2019), for better modeling capacity. To parameterize a k-dimensional rotation matrix, a straightforward strategy is to randomly initialize a matrix and restrict it to a rotation matrix after every gradient descent update. However, such a naive approach may lead to a complicated optimization process and cannot fully cover the set of all $k \times k$ rotation matrices. In this paper, we theoretically prove that any k-dimensional rotation can be represented as $2\lfloor k/2 \rfloor$ Householder reflections (Householder, 1958). Inspired by this theorem, we design an elegant parameterization based on Householder reflections to model k-dimensional rotations without any special optimizing procedure.

As the basic mathematical operator in this work, the Householder matrix (Householder, 1958) represents the reflection (Householder reflection) about a hyperplane containing the origin. Given a unit vector $u \in \mathbb{R}^k$, the $k \times k$ Householder matrix $H$, taking $u$ as variable, is defined as $H(u)$:

$$H(u) = I - 2uu^\top, \quad (2)$$

where $\|u\|_2^2 = 1$ and $I$ is the $k \times k$ identity matrix.
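As a minimal illustration (ours, not the released code), the following NumPy snippet builds H(u) and checks the properties used throughout the paper: a Householder matrix is its own inverse, has determinant -1, and its action on a vector reduces to the cheap vector operation x - 2⟨x, u⟩u:

```python
import numpy as np

def householder_matrix(u):
    """H(u) = I - 2 u u^T for a unit vector u (Equation 2)."""
    u = u / np.linalg.norm(u)               # enforce ||u||_2 = 1
    return np.eye(len(u)) - 2.0 * np.outer(u, u)

def reflect(x, u):
    """Apply H(u) to x using vector operations only: x - 2 <x, u> u."""
    u = u / np.linalg.norm(u)
    return x - 2.0 * np.dot(x, u) * u

k = 5
rng = np.random.default_rng(0)
u = rng.standard_normal(k)
x = rng.standard_normal(k)

H = householder_matrix(u)
assert np.allclose(H @ H, np.eye(k))        # H is an involution (its own inverse)
assert np.isclose(np.linalg.det(H), -1.0)   # a reflection has determinant -1
assert np.allclose(H @ x, reflect(x, u))    # matrix form agrees with vector form
```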
Geometrically, as shown in Figure 2(a), the Householder matrix transforms $x$ to $\tilde{x}$ by a reflection about the hyperplane orthogonal to the normal vector $u$:

$$\tilde{x} = H(u)x = x - 2\langle x, u \rangle u, \quad (3)$$

where $\langle \cdot, \cdot \rangle$ denotes the dot product. Based on the Householder matrices, we can design a mapping to represent rotations. Specifically, given a series of unit vectors $U = \{u_c\}_{c=1}^{2n}$ where $u_c \in \mathbb{R}^k$ and $n$ is a positive integer, we define the mapping as follows:

$$\text{Rot-H}(U) = \prod_{c=1}^{2n} H(u_c). \quad (4)$$

The output of Rot-H is a $k \times k$ orthogonal matrix with determinant $+1$, i.e., a rotation matrix (Artin, 2016), since each Householder matrix $H(u_c)$ is orthogonal with determinant $-1$ and the product contains an even number of them. Moreover, we also prove that any rotation can be expressed as the composition of Householder reflections. Formally, we have the following theorem:

Theorem 3.1. When $n = \lfloor k/2 \rfloor$, the image of Rot-H is the set of all $k \times k$ rotation matrices, i.e., $\text{Image(Rot-H)} = SO(k)$, where $SO(k)$ is the k-dimensional special orthogonal group. (See proof in Appendix B)

Theorem 3.1 provides us a natural way to represent relations as high-dimensional rotations for better modeling capacity. We call such rotations composed of $2\lfloor k/2 \rfloor$ Householder reflections the Householder rotations.

Given a triple (h, r, t), we denote the embeddings of head entity h and tail entity t as $S_h \in \mathbb{R}^{d \times k}$ and $S_t \in \mathbb{R}^{d \times k}$, where d is the embedding size of entities and k is the dimension size of each row vector. Recall that in RotatE (Sun et al., 2019), Rotate3D (Gao et al., 2020) and QuatE (Zhang et al., 2019), each element (row) of the entity embeddings is represented as a 2-dimensional, 3-dimensional or 4-dimensional vector (i.e., k = 2, 3, 4). More generally, HousE-r represents each row of entity embeddings as a k-dimensional vector, i.e., $S_h[i], S_t[i] \in \mathbb{R}^k$, $i \in \{1, \ldots, d\}$. To model each relation as a row-wise k-dimensional rotation between head and tail entities, the embedding of relation r is denoted as $U_r \in \mathbb{R}^{d \times 2n \times k}$, where $n = \lfloor k/2 \rfloor$. Each row $U_r[i] \in \mathbb{R}^{2n \times k}$ is composed of $2n$ k-dimensional unit vectors, i.e., $U_r[i][j] \in \mathbb{R}^k$ and $\|U_r[i][j]\|_2^2 = 1$, $j \in \{1, \ldots, 2n\}$.

We propose to parameterize the relational Householder rotations by using the mapping Rot-H in Equation (4). Formally, for each triple (h, r, t), HousE-r applies r-specific Householder rotations to the i-th row of the head embedding h:

$$S'_h[i] = \text{Rot-H}(U_r[i])S_h[i] = \prod_{j=1}^{2n} H(U_r[i][j])S_h[i]. \quad (5)$$

Based on Theorem 3.1, any k-dimensional relational rotation can be represented by Equation (5). As illustrated in Figure 2(b), a 2-dimensional rotation can be viewed as the composition of 2 Householder reflections.

Distance function of HousE-r. The distance function measures the distance between the rotated head entity embedding $S'_h$ and the tail entity embedding $S_t$:

$$d_r(h, t) = \sum_{i=1}^{d} \|S'_h[i] - S_t[i]\|_2. \quad (6)$$

Modeling capability of HousE-r. Theoretically, HousE-r can model and infer the symmetry, antisymmetry, inversion and composition patterns. The definitions of these relation patterns are listed in Appendix C for clarity.

Claim 3.2. HousE-r can model the symmetry/antisymmetry pattern. (See proof in Appendix D.1)

Claim 3.3. HousE-r can model the inversion pattern. (See proof in Appendix D.2)

Claim 3.4. HousE-r can model the composition pattern. (See proof in Appendix D.3)
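As a concrete sanity check of the forward direction of Theorem 3.1, the short NumPy sketch below (our illustration, not the released code) composes $2\lfloor k/2 \rfloor$ reflections and verifies that the result lies in SO(k):

```python
import numpy as np

def rot_h(U):
    """Rot-H(U): compose 2n Householder reflections into one k x k matrix (Equation 4)."""
    k = U.shape[1]
    Q = np.eye(k)
    for u in U:
        u = u / np.linalg.norm(u)
        Q = (np.eye(k) - 2.0 * np.outer(u, u)) @ Q
    return Q

k, n = 6, 3                                 # n = floor(k / 2)
rng = np.random.default_rng(1)
U = rng.standard_normal((2 * n, k))         # 2n direction vectors (normalized inside rot_h)

Q = rot_h(U)
assert np.allclose(Q.T @ Q, np.eye(k))      # orthogonal
assert np.isclose(np.linalg.det(Q), 1.0)    # determinant +1, so Q is in SO(k)
```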
Efficient computation. The time complexity of Equation (5) is $O(2nk^2)$, in which the $2n$ matrix-vector multiplications incur high computational costs. However, it is worth noting that these matrix multiplications can be replaced by vector operations. Formally, based on Equation (3), the j-th matrix-vector multiplication can be expressed as:

$$S_h^j[i] = H(U_r[i][j])S_h^{j-1}[i] = S_h^{j-1}[i] - 2\langle S_h^{j-1}[i], U_r[i][j] \rangle U_r[i][j], \quad (7)$$

where $S_h^0[i] = S_h[i]$. Through such iterated vector operations, the time complexity can be reduced to $O(2nk)$.

Connections to RotatE, Rotate3D and QuatE. As shown in Table 1, the rotations of RotatE, Rotate3D and QuatE are modeled in 2-dimensional, 3-dimensional and 4-dimensional spaces, respectively. Geometrically, they can be viewed as special cases of HousE-r obtained by setting the rotation dimension k to 2, 3 and 4, respectively. For example, as shown in Figure 2(b), HousE-r in 2-dimensional space is equivalent to RotatE, since any rotation in a plane can be represented by two conjunctive Householder reflections. Moreover, unlike previous models that restrict rotations to a fixed and low-dimensional space, HousE-r can easily model high-dimensional rotations by enlarging the value of k.

Limitation of Householder rotations. On the other side of the coin, HousE-r is not a panacea, as pure Householder rotations suffer from the challenge of indistinguishable representations when modeling RMPs. Considering the ideal case of no-error embedding, we have the following deductions: For a 1-to-N relation r, when (h, r, t1) and (h, r, t2) hold, $S_{t_1}[i] = S_{t_2}[i]$. For an N-to-1 relation r', when (h1, r', t) and (h2, r', t) hold, $S_{h_1}[i] = S_{h_2}[i]$. One can see that the embeddings of different entities tend to be identical when facing the complex RMPs, leading to uninformative representations. Thus, it is meaningful to tackle this challenge in our proposal.

3.2. HousE: Improved HousE-r with Relational Householder Projections

To handle the sophisticated RMPs, some projection operations have been proposed and have shown their effectiveness (Wang et al., 2014; Lin et al., 2015; Ji et al., 2015). The relational projections enable KGE models to generate relation-specific representations for each entity (Wang et al., 2014). However, existing projections are irreversible transformations, leading to the failure in modeling inversion and composition patterns (Sun et al., 2019). Differently, we propose novel invertible projections named Householder projections by modifying the vanilla Householder matrices to tackle the limitation of HousE-r.

More concretely, given a unit vector $p \in \mathbb{R}^k$, i.e., $\|p\|_2^2 = 1$, and a real scalar $\tau$, the $k \times k$ modified Householder matrix $M(p, \tau)$ is defined as:

$$M(p, \tau) = I - \tau pp^\top. \quad (8)$$

Note that the modified Householder matrix $M(p, \tau)$ has $k - 1$ eigenvalues equal to 1 and one eigenvalue equal to $1 - \tau$. Thus, $M(p, \tau)$ is invertible when $\tau \neq 1$. Geometrically, the modified Householder matrix transforms $x$ to $\hat{x}$ by a projection along the axis $p$:

$$\hat{x} = M(p, \tau)x = x - \tau \langle x, p \rangle p, \quad (9)$$

where $\tau$ determines the position of $\hat{x}$ along the axis $p$. Figure 2(c) illustrates several projected results with different values of $\tau$ in two-dimensional space.

Moreover, based on the modified Householder matrices, given a series of real scalars $T = \{\tau_c\}_{c=1}^{m}$ and unit vectors $P = \{p_c\}_{c=1}^{m}$, where $m$ is a positive integer and $p_c \in \mathbb{R}^k$, we define the mapping:

$$\text{Pro-H}(P, T) = \prod_{c=1}^{m} M(p_c, \tau_c). \quad (10)$$

The output of Pro-H(P, T) is an invertible matrix since the product of invertible matrices is also invertible. We name such projections composed of m modified Householder reflections Householder projections.
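A small NumPy sketch of the modified operator (again our own illustration; `modified_householder` is a hypothetical helper name) verifies the two properties used below: det M(p, τ) = 1 - τ, so M is invertible exactly when τ ≠ 1, and, unlike a rotation, M changes relative distances between points:

```python
import numpy as np

def modified_householder(p, tau):
    """M(p, tau) = I - tau * p p^T (Equation 8)."""
    p = p / np.linalg.norm(p)
    return np.eye(len(p)) - tau * np.outer(p, p)

k, tau = 4, 0.5
rng = np.random.default_rng(2)
p = rng.standard_normal(k)
x, y = rng.standard_normal(k), rng.standard_normal(k)

M = modified_householder(p, tau)
assert np.isclose(np.linalg.det(M), 1.0 - tau)        # det M = 1 - tau: invertible iff tau != 1
assert np.allclose(np.linalg.inv(M) @ M, np.eye(k))   # the projection is reversible

# The relative distance between two points changes under M:
print(np.linalg.norm(x - y), np.linalg.norm(M @ x - M @ y))
```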
Different from the rigidly distance-preserving Householder rotations, the Householder projections can reversibly change the relative distance between two points, and thus provide a suitable solution for modeling RMPs without sacrificing the capability of modeling relation patterns.

Specifically, we incorporate the relational Householder rotations and relational Householder projections into a unified framework named HousE to enjoy the merits of both sides. The relational Householder projections enable relation-specific representations for each entity, and the relational Householder rotations enable high-dimensional rotations between projected entities. As shown in Figure 2(d), given the input triple (h, r, t), HousE first learns the relation(r)-specific representations $S_{h,r}$ and $S_{t,r}$ for head and tail entities via Householder projections, respectively. Then, $S_{h,r}$ is transformed by the high-dimensional Householder rotations to be close to $S_{t,r}$.

In the phase of relational Householder projections, we define two types of parameters for each relation r: the axes $P_r \in \mathbb{R}^{d \times m \times k}$ and the scalars $T_r \in \mathbb{R}^{d \times m}$, where m is a positive integer. Each row $P_r[i] \in \mathbb{R}^{m \times k}$ is composed of m k-dimensional unit vectors (projection axes), i.e., $P_r[i][j] \in \mathbb{R}^k$ and $\|P_r[i][j]\|_2^2 = 1$. Each row $T_r[i]$ is composed of m real values (projection scalars). We propose to parameterize the relational Householder projections by using the mapping Pro-H in Equation (10). Considering that the head and tail parts of a relation usually have different implicit types (Bordes et al., 2011), HousE utilizes two sets of independent projection parameters $\{P_{r,1}, T_{r,1}\}$ and $\{P_{r,2}, T_{r,2}\}$ for each relation r to project h and t, respectively. Formally, for each triple (h, r, t), HousE transforms each row of head entity h and tail entity t with r-specific Householder projections:

$$S_{h,r}[i] = \text{Pro-H}(P_{r,1}[i], T_{r,1}[i])S_h[i] = \prod_{j=1}^{m} M(P_{r,1}[i][j], T_{r,1}[i][j])S_h[i],$$
$$S_{t,r}[i] = \text{Pro-H}(P_{r,2}[i], T_{r,2}[i])S_t[i] = \prod_{j=1}^{m} M(P_{r,2}[i][j], T_{r,2}[i][j])S_t[i]. \quad (11)$$

Algorithm 1. Forward procedure of HousE
1: Input: An input triple (h, r, t), head (tail) entity embedding $S_h$ ($S_t$), projection parameters $\{T_{r,1}, T_{r,2}\}$ and $\{P_{r,1}, P_{r,2}\}$, rotation parameters $U_r$, embedding size d, rotation dimension k, number of modified Householder reflections m.
2: Output: Distance δ
3: δ ← 0
4: n ← ⌊k/2⌋
5: for i = 1 to d do
6:   /* Relational Householder projections */
     $S_{h,r}[i] \leftarrow \prod_{j=1}^{m} M(P_{r,1}[i][j], T_{r,1}[i][j])S_h[i]$
     $S_{t,r}[i] \leftarrow \prod_{j=1}^{m} M(P_{r,2}[i][j], T_{r,2}[i][j])S_t[i]$
7:   /* Relational Householder rotations */
     $S'_{h,r}[i] \leftarrow \prod_{j=1}^{2n} H(U_r[i][j])S_{h,r}[i]$
8:   δ ← δ + $\|S'_{h,r}[i] - S_{t,r}[i]\|_2$
9: end for
10: Return: δ

After that, HousE models the row-wise Householder rotations between the projected head point $S_{h,r}$ and the projected tail point $S_{t,r}$, in the same way as Equation (5). If (h, r, t) holds, we expect the rotated head point $S'_{h,r}[i] = \text{Rot-H}(U_r[i])S_{h,r}[i] \approx S_{t,r}[i]$, where $U_r[i]$ is composed of $2\lfloor k/2 \rfloor$ r-specific Householder reflections.

As shown in Algorithm 1, for each triple (h, r, t), HousE first utilizes the relational Householder projections to generate r-specific representations $S_{h,r}$ and $S_{t,r}$ for h and t, as in line 6. Then, HousE applies the relational Householder rotation to the projected head embedding $S_{h,r}$, as in line 7. The rotated result $S'_{h,r}$ is expected to be close to the projected tail embedding $S_{t,r}$. Note that we replace the matrix-vector multiplications in lines 6 and 7 with the vector operations in Equations (9) and (3) for efficient computation.
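For illustration, the forward procedure can be sketched in a few lines of NumPy using only the vector forms of Equations (9) and (7). This is a simplified re-implementation, not the official code (which is PyTorch and lives in the linked repository); it assumes the unit-norm constraints on the axes in P and U are enforced elsewhere:

```python
import numpy as np

def project(x, P, T):
    """Pro-H via vector ops (Equation 9): apply M(p_j, tau_j) for j = 1..m."""
    for p, tau in zip(P, T):
        x = x - tau * np.dot(x, p) * p
    return x

def rotate(x, U):
    """Rot-H via vector ops (Equation 7): apply H(u_j) for j = 1..2n."""
    for u in U:
        x = x - 2.0 * np.dot(x, u) * u
    return x

def house_distance(S_h, S_t, U_r, P_r1, T_r1, P_r2, T_r2):
    """Algorithm 1: row-wise project both entities, rotate the head, sum L2 distances."""
    delta = 0.0
    for i in range(S_h.shape[0]):                  # i = 1..d
        s_hr = project(S_h[i], P_r1[i], T_r1[i])   # line 6
        s_tr = project(S_t[i], P_r2[i], T_r2[i])
        s_hr = rotate(s_hr, U_r[i])                # line 7
        delta += np.linalg.norm(s_hr - s_tr)       # line 8
    return delta
```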
The learnable parameters of HousE include $\{S_e\}_{e \in \mathcal{E}}$ and $\{U_r, P_{r,1}, P_{r,2}, T_{r,1}, T_{r,2}\}_{r \in \mathcal{R}}$. Compared to previous models (Sun et al., 2019; Zhang et al., 2019), the extra cost is proportional to the number of relation types, which is usually much smaller than the number of entities. Therefore, the total number of parameters in HousE is about $O(dk|\mathcal{E}|)$.

Distance function of HousE. For each triple (h, r, t), the distance function of HousE is defined as:

$$d_r(h, t) = \sum_{i=1}^{d} \|S'_{h,r}[i] - S_{t,r}[i]\|_2. \quad (12)$$

Modeling capability of HousE. HousE can model and infer all the relation patterns and RMPs shown in Table 1 (we also discuss other relation patterns in Appendix E). Formally, we can make the following claims:

Claim 3.5. HousE can model the symmetry/antisymmetry pattern. (See proof in Appendix D.4)

Claim 3.6. HousE can model the inversion pattern. (See proof in Appendix D.5)

Claim 3.7. HousE can model the composition pattern. (See proof in Appendix D.6)

Claim 3.8. HousE can model the relation mapping properties. (See proof in Appendix D.7)

Table 2. Link prediction results on WN18 (left columns) and FB15k (right columns). Some baseline results are taken from (Nguyen et al., 2018) and (Kadlec et al., 2017); the rest are taken from the original papers.

| Model | WN18 MR | WN18 MRR | WN18 H@1 | WN18 H@3 | WN18 H@10 | FB15k MR | FB15k MRR | FB15k H@1 | FB15k H@3 | FB15k H@10 |
|---|---|---|---|---|---|---|---|---|---|---|
| TransE | - | .495 | .113 | .888 | .943 | - | .463 | .297 | .578 | .749 |
| DistMult | 655 | .797 | - | - | .946 | 42.2 | .798 | - | - | .893 |
| ComplEx | - | .941 | .936 | .945 | .947 | - | .692 | .599 | .759 | .84 |
| ConvE | 374 | .943 | .935 | .946 | .956 | 51 | .657 | .558 | .723 | .831 |
| RotatE | 309 | .949 | .944 | .952 | .959 | 40 | .797 | .746 | .830 | .884 |
| Rotate3D | 214 | .951 | .945 | .953 | .961 | 39 | .789 | .728 | .832 | .887 |
| QuatE | 388 | .949 | .941 | .954 | .960 | 41 | .770 | .700 | .821 | .878 |
| DualE | - | .951 | .945 | .956 | .961 | - | .790 | .734 | .829 | .881 |
| HousE-r | 155 | .953 | .947 | .956 | .964 | 39 | .807 | .758 | .839 | .893 |
| HousE | 137 | .954 | .948 | .957 | .964 | 38 | .811 | .759 | .847 | .898 |

Table 3. Link prediction results on WN18RR, FB15k-237 and YAGO3-10. Some baseline results are taken from (Nguyen et al., 2018) and (Dettmers et al., 2018); the rest are taken from the corresponding original papers.

| Model | WN18RR MR | WN18RR MRR | WN18RR H@1 | WN18RR H@3 | WN18RR H@10 | FB15k-237 MR | FB15k-237 MRR | FB15k-237 H@1 | FB15k-237 H@3 | FB15k-237 H@10 | YAGO3-10 MR | YAGO3-10 MRR | YAGO3-10 H@1 | YAGO3-10 H@3 | YAGO3-10 H@10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TransE | 3384 | .226 | - | - | .501 | 357 | .294 | - | - | .465 | - | - | - | - | - |
| DistMult | 5110 | .43 | .39 | .44 | .49 | 254 | .241 | .155 | .263 | .419 | 5926 | .34 | .24 | .38 | .54 |
| ComplEx | 5261 | .44 | .41 | .46 | .51 | 339 | .247 | .158 | .275 | .428 | 6351 | .36 | .26 | .4 | .55 |
| ConvE | 4187 | .43 | .40 | .44 | .52 | 224 | .325 | .237 | .356 | .501 | 1671 | .44 | .35 | .49 | .62 |
| RotatE | 3340 | .476 | .428 | .492 | .571 | 177 | .338 | .241 | .375 | .533 | 1767 | .495 | .402 | .55 | .67 |
| Rotate3D | 3328 | .489 | .442 | .505 | .579 | 165 | .347 | .250 | .385 | .543 | - | - | - | - | - |
| QuatE | 3472 | .481 | .436 | .500 | .564 | 176 | .311 | .221 | .342 | .495 | - | - | - | - | - |
| DualE | - | .482 | .440 | .500 | .561 | - | .330 | .237 | .363 | .518 | - | - | - | - | - |
| Rot-Pro | 2815 | .457 | .397 | .482 | .577 | 201 | .344 | .246 | .383 | .540 | 1797 | .542 | .443 | .596 | .669 |
| HousE-r | 1885 | .496 | .452 | .511 | .585 | 165 | .348 | .254 | .384 | .534 | 1449 | .565 | .487 | .616 | .703 |
| HousE | 1303 | .511 | .465 | .528 | .602 | 153 | .361 | .266 | .399 | .551 | 1415 | .571 | .491 | .620 | .714 |

Connections to TransH, TransR and TransD. Previous works such as TransH, TransR and TransD also focus on designing projection operations to ensure that the same entity has different representations under different relations. However, as shown in Table 1, these methods undermine the ability to infer inversion and composition patterns due to their irreversible projection operations.
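To make this irreversibility concrete, consider TransH's projection, written here in the notation of Equation (8): projecting an embedding $h$ onto the hyperplane with unit normal $w$ gives

$$h_\perp = h - \langle h, w \rangle w = (I - ww^\top)h = M(w, 1)\,h,$$

i.e., the modified Householder matrix with $\tau = 1$, whose determinant is $1 - \tau = 0$; the component of $h$ along $w$ is discarded and cannot be recovered.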
Note that the projection operation of TransH is exactly the special case of HousE with the scalar τ = 1, which is an irreversible transformation. Different from these works, HousE utilizes an invertible matrix, derived as a product of modified Householder matrices, to generate relation-specific entity representations. Such invertible projections enable our proposal to model relation mapping properties without sacrificing the capability of modeling relation patterns.

4. Experiment

4.1. Experimental Setup

Datasets. We evaluate our proposals on five widely-used benchmarks: WN18 (Bordes et al., 2013), FB15k (Bordes et al., 2013), WN18RR (Dettmers et al., 2018), FB15k-237 (Toutanova & Chen, 2015) and YAGO3-10 (Mahdisoltani et al., 2015). Refer to Appendix F for more details.

Baselines. We compare our models with a number of baselines. For non-rotation models, we report TransE (Bordes et al., 2013), DistMult (Yang et al., 2015), ComplEx (Trouillon et al., 2016) and ConvE (Dettmers et al., 2018). For rotation-based models, we report RotatE (Sun et al., 2019), Rotate3D (Gao et al., 2020), QuatE (Zhang et al., 2019), DualE (Cao et al., 2021) and Rot-Pro (Song et al., 2021).

Implementation details. To ensure fair comparisons, we set a smaller embedding size d for HousE-r and HousE, so that the total numbers of parameters are comparable to the baselines. More details can be found in Appendix G.

4.2. Main Results

The experimental results are summarized in Table 2 and Table 3. Compared to all the baselines, both HousE-r and HousE achieve SOTA performance, demonstrating the effectiveness of the Householder framework.

Table 4. MRR for the models tested on each relation of WN18RR.

| Relation Name | RotatE | QuatE | HousE-r | HousE |
|---|---|---|---|---|
| hypernym | 0.154 | 0.172 | 0.182 | 0.207 |
| instance hypernym | 0.324 | 0.362 | 0.395 | 0.440 |
| member meronym | 0.255 | 0.236 | 0.275 | 0.312 |
| synset domain topic of | 0.334 | 0.395 | 0.396 | 0.428 |
| has part | 0.205 | 0.210 | 0.217 | 0.232 |
| member of domain usage | 0.277 | 0.372 | 0.415 | 0.453 |
| member of domain region | 0.243 | 0.140 | 0.281 | 0.395 |
| derivationally related form | 0.957 | 0.952 | 0.958 | 0.958 |
| also see | 0.627 | 0.607 | 0.638 | 0.640 |
| verb group | 0.968 | 0.930 | 0.968 | 0.968 |
| similar to | 1.000 | 1.000 | 1.000 | 1.000 |

Table 2 shows the results on WN18 and FB15k, from which we observe that even with only Householder rotations, HousE-r already consistently outperforms the baselines on both datasets. Moreover, by combining Householder rotations and Householder projections, HousE further achieves new state-of-the-art results on both WN18 and FB15k. Considering that the main relation patterns in WN18 and FB15k are symmetry, antisymmetry and inversion, the superior performance of HousE-r and HousE reveals their effectiveness in modeling these patterns.

Table 3 summarizes the results on WN18RR, FB15k-237 and YAGO3-10. On these datasets, HousE-r surpasses most of the baselines. The only comparable exception is Rotate3D on FB15k-237, which models relations as 3-D rotations. However, HousE-r uses far fewer parameters than Rotate3D, as shown in Appendix G, and achieves similar performance, which also verifies the superior modeling capacity of Householder rotations. The improvements over existing rotation-based baselines (RotatE, Rotate3D, QuatE and DualE) demonstrate the superiority of high-dimensional rotations.
Moreover, HousE consistently outperforms HousE-r along with all the baselines by a large margin on the three datasets across all metrics, benefiting from the ability to model relation mapping properties.

4.3. Fine-grained Performance Analysis

To further verify the modeling capacity of our proposal from a fine-grained perspective, we report the performance on each relation of WN18RR, following (Zhang et al., 2019). As shown in Table 4, compared to the two rotation-based baselines RotatE and QuatE, we observe that: (1) HousE-r surpasses the baselines on all 11 relation types, confirming the superior modeling capacity of the Householder rotations. (2) By incorporating the Householder projections, HousE achieves more significant improvements on the challenging 1-to-N and N-to-1 relations. For example, HousE outperforms RotatE on the 1-to-N relation member of domain region and the N-to-1 relation instance hypernym with 62.55% and 35.80% relative gains, respectively.

Table 5. MRR for the models tested on RMPs in FB15k-237.

| Task | RMPs | RotatE | HousE |
|---|---|---|---|
| Predicting Head (MRR) | 1-to-1 | 0.498 | 0.514 |
| | 1-to-N | 0.475 | 0.479 |
| | N-to-1 | 0.088 | 0.114 |
| | N-to-N | 0.260 | 0.286 |
| Predicting Tail (MRR) | 1-to-1 | 0.490 | 0.502 |
| | 1-to-N | 0.071 | 0.086 |
| | N-to-1 | 0.747 | 0.778 |
| | N-to-N | 0.367 | 0.392 |

4.4. Capability of Modeling RMPs

In order to further demonstrate the effectiveness of HousE in modeling RMPs, we report the detailed results of our proposal on different RMPs² in FB15k-237. Table 5 exhibits the results on different types of RMPs. One can see that HousE outperforms RotatE across all RMP types. For example, on the challenging N-to-1 (predicting head) and 1-to-N (predicting tail) tasks, HousE achieves 29.55% and 21.13% relative improvements over RotatE. Such advanced performance of HousE owes to the powerful modeling capability of the Householder projections.

² Following (Sun et al., 2019), for each relation r, we compute the average number of heads per tail (hptr) and the average number of tails per head (tphr). If hptr < 1.5 and tphr < 1.5, r is treated as 1-to-1; if hptr ≥ 1.5 and tphr ≥ 1.5, r is treated as N-to-N; if hptr < 1.5 and tphr ≥ 1.5, r is treated as 1-to-N; if hptr ≥ 1.5 and tphr < 1.5, r is treated as N-to-1.

4.5. Hyperparameter Sensitivity Analysis

Dimension of rotations. To verify the expressiveness of high-dimensional rotations, we conduct experiments with our models under varying rotation dimension k. Figures 3(a) and 3(b) show the results on WN18RR and FB15k-237. As expected, on both datasets, HousE-r and HousE rotated in higher-dimensional spaces achieve better performance than the ones rotated in lower-dimensional spaces, since the high-dimensional rotations bring superior modeling capacity. Moreover, HousE consistently surpasses HousE-r by a large margin across all rotation dimensions, demonstrating the effectiveness of the integrated Householder projections. For example, on WN18RR, HousE with 4-dimensional rotations already outperforms HousE-r with 12-dimensional rotations.

[Figure 3 (plots omitted).] Figure 3.
(a) and (b) show the MRR results of HousE and HousE-r with varying rotation dimensions on WN18RR and FB15k-237; (c) and (d) show the MRR results of HousE with different numbers of modified Householder matrices on WN18RR and FB15k-237.

Number of modified Householder matrices. As shown in Equation (10), a Householder projection is composed of m modified Householder matrices. Here we investigate the impact of m on the performance (MRR) of HousE. Figures 3(c) and 3(d) show the results on WN18RR and FB15k-237. With the increase of m, the performance of HousE first improves and then drops on both datasets. This is because a larger m provides greater projection capability, but overcomplicated projections also aggravate the risk of overfitting. Moreover, the values of m for the best performance on the two datasets are different (m = 1 on WN18RR and m = 6 on FB15k-237) due to their distinct graph densities. Specifically, WN18RR is a sparse KG with an average degree of 2.19, while FB15k-237 is a much denser KG with an average degree of 18.71. Thus, a larger m is needed for modeling the richer graph information in FB15k-237.

4.6. Superiority of Householder Projections

To verify the effectiveness of the proposed Householder projections, we design two variants of HousE by replacing the Householder projections with the previous irreversible projections used in TransH (Wang et al., 2014) and TransR (Lin et al., 2015), dubbed HousH and HousR respectively. Table 6 shows the experimental results on WN18RR and FB15k-237. Compared to HousE-r without any projections, the performance of HousH and HousR is barely improved on FB15k-237, and even degraded on WN18RR. It reveals that the irreversible projections may hinder the modeling capability. Moreover, HousE significantly outperforms HousH and HousR on both datasets, demonstrating the superiority of the invertible Householder projections in HousE.

Table 6. Performance of different variants.

| Variants | WN18RR MRR | WN18RR H@10 | FB15k-237 MRR | FB15k-237 H@10 |
|---|---|---|---|---|
| HousH | .491 | .584 | .347 | .537 |
| HousR | .488 | .580 | .349 | .538 |
| HousE-r | .496 | .585 | .348 | .534 |
| HousE-r+ | .500 | .591 | .351 | .538 |
| HousE | .511 | .602 | .361 | .551 |
| HousE+ | .514 | .606 | .366 | .552 |

4.7. Additional Translations

To explore the potential of our proposal, we also incorporate translations (Bordes et al., 2013) into HousE-r and HousE, dubbed HousE-r+ and HousE+ respectively. The translations are directly applied after the Householder rotations. From Table 6, we see that these two variants both outperform their original versions. This is because translations provide a natural way to represent the hierarchical property of KGs (Bordes et al., 2013), which also endows our proposal with more comprehensive modeling capacity.

5. Related Work

Translation-based models. TransE (Bordes et al., 2013) is the first model to represent each relation as a translation between entities. This simple model is effective in modeling the antisymmetry, inversion and composition patterns, but fails in handling the symmetry pattern and RMPs. To tackle TransE's limitations, a set of variants (Wang et al., 2014; Lin et al., 2015; Ji et al., 2015; Xiao et al., 2015) were subsequently proposed. TransH (Wang et al., 2014) projects entities onto a relation-specific hyperplane and performs translation on this hyperplane. TransR (Lin et al., 2015) models entities and relations in distinct spaces and conducts relation-specific projections with general linear transformations. However, these models lose the ability to model inversion and composition patterns since irreversible linear transformations are performed on head and tail entities (Sun et al., 2019).
Rotation-based models. Following ComplEx (Trouillon et al., 2016), which extends DistMult (Yang et al., 2015) to the complex number system, RotatE (Sun et al., 2019) represents each relation as a 2-dimensional rotation in the complex plane to model the symmetry, antisymmetry, inversion and composition patterns. Rotate3D (Gao et al., 2020) and QuatE (Zhang et al., 2019) extend the rotations to 3-dimensional and 4-dimensional spaces by introducing the quaternion number system. Recently, DualE (Cao et al., 2021) utilizes dual quaternions to combine translations and rotations in 3-D space for modeling multiple relations.

Neural-network-based models. There are also models using neural networks for KGE. R-GCN (Schlichtkrull et al., 2018) introduces graph neural networks as graph encoders. ConvE (Dettmers et al., 2018) exploits convolution operations to facilitate the score calculation. However, such methods lack explicit geometric explanations of how they model relation patterns and RMPs.

6. Conclusion

In this paper, we propose HousE, a novel KGE framework based on Householder parameterization. HousE models relations as high-dimensional Householder rotations to capture crucial relation patterns. Moreover, with Householder projections, HousE generates relation-specific embeddings for each entity to model RMPs. Experimental results on five datasets demonstrate the superiority of our proposal.

Acknowledgements

This work is supported in part by the National Key Research and Development Program of China (no. 2021ZD0112400), and also in part by the National Natural Science Foundation of China (no. U1811463).

References

Artin, E. Geometric algebra. Courier Dover Publications, 2016.

Bergstra, J. and Bengio, Y. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13:281-305, 2012.

Bollacker, K., Evans, C., Paritosh, P., Sturge, T., and Taylor, J. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 1247-1250, 2008.

Bordes, A., Weston, J., Collobert, R., and Bengio, Y. Learning structured embeddings of knowledge bases. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011.

Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., and Yakhnenko, O. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pp. 2787-2795, 2013.

Cao, Z., Xu, Q., Yang, Z., Cao, X., and Huang, Q. Dual quaternion knowledge graph embeddings. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, pp. 6894-6902, 2021.

Dettmers, T., Minervini, P., Stenetorp, P., and Riedel, S. Convolutional 2D knowledge graph embeddings. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pp. 1811-1818, 2018.

Gao, C., Sun, C., Shan, L., Lin, L., and Wang, M. Rotate3D: Representing relations as rotations in three-dimensional space for knowledge graph embedding. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 385-394, 2020.

Householder, A. S. Unitary triangularization of a nonsymmetric matrix. Journal of the ACM (JACM), pp. 339-342, 1958.

Ji, G., He, S., Xu, L., Liu, K., and Zhao, J. Knowledge graph embedding via dynamic mapping matrix.
In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pp. 687-696, 2015.

Kadlec, R., Bajgar, O., and Kleindienst, J. Knowledge base completion: Baselines strike back. In Proceedings of the 2nd Workshop on Representation Learning for NLP, pp. 69-74, 2017.

Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, 2015.

Lin, Y., Liu, Z., Sun, M., Liu, Y., and Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 2181-2187, 2015.

Mahdisoltani, F., Biega, J., and Suchanek, F. Yago3: A knowledge base from multilingual wikipedias. In 7th Biennial Conference on Innovative Data Systems Research, 2015.

Miller, G. A. Wordnet: a lexical database for english. Communications of the ACM, pp. 39-41, 1995.

Nguyen, D. Q., Nguyen, T. D., Nguyen, D. Q., and Phung, D. A novel embedding model for knowledge base completion based on convolutional neural network. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 327-333, 2018.

Schlichtkrull, M., Kipf, T. N., Bloem, P., Van Den Berg, R., Titov, I., and Welling, M. Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pp. 593-607, 2018.

Song, T., Luo, J., and Huang, L. Rot-Pro: Modeling transitivity by projection in knowledge graph embedding. In Advances in Neural Information Processing Systems, 2021.

Suchanek, F. M., Kasneci, G., and Weikum, G. Yago: a core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web, pp. 697-706, 2007.

Sun, Z., Deng, Z.-H., Nie, J.-Y., and Tang, J. RotatE: Knowledge graph embedding by relational rotation in complex space. In 7th International Conference on Learning Representations, 2019.

Toutanova, K. and Chen, D. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and Their Compositionality, pp. 57-66, 2015.

Trouillon, T., Welbl, J., Riedel, S., Gaussier, E., and Bouchard, G. Complex embeddings for simple link prediction. In Proceedings of the 33rd International Conference on Machine Learning, pp. 2071-2080, 2016.

Wang, Z., Zhang, J., Feng, J., and Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, pp. 1112-1119, 2014.

Xiao, H., Huang, M., Hao, Y., and Zhu, X. TransA: An adaptive approach for knowledge graph embedding. CoRR, 2015.

Xiong, C., Power, R., and Callan, J. Explicit semantic ranking for academic search via knowledge graph embedding. In Proceedings of the 26th International Conference on World Wide Web, pp. 1271-1279, 2017.

Yang, B., Yih, W., He, X., Gao, J., and Deng, L. Embedding entities and relations for learning and inference in knowledge bases. In 3rd International Conference on Learning Representations, 2015.

Zhang, S., Tay, Y., Yao, L., and Liu, Q. Quaternion knowledge graph embeddings. In Advances in Neural Information Processing Systems, pp. 2731-2741, 2019.

A. Notations

Table 7. Notations used in this paper.
| Symbol | Shape | Description |
|---|---|---|
| $\mathcal{E}$ | - | Set of entities |
| $\mathcal{R}$ | - | Set of relations |
| $\mathcal{D}$ | - | Set of factual triples |
| $h, t$ | - | Head entity and tail entity |
| $r$ | - | Relation type |
| $d$ | $\mathbb{R}$ | Embedding size |
| $k$ | $\mathbb{R}$ | Rotation dimension |
| $m$ | $\mathbb{R}$ | Number of modified Householder matrices |
| $S_e$ | $\mathbb{R}^{d \times k}$ | Representation of entity $e \in \mathcal{E}$ |
| $S_{e,r}$ | $\mathbb{R}^{d \times k}$ | r-specific representation of entity $e \in \mathcal{E}$ |
| $U_r$ | $\mathbb{R}^{d \times 2\lfloor k/2 \rfloor \times k}$ | Parameters of $r \in \mathcal{R}$ for Householder rotations |
| $P_r$ | $\mathbb{R}^{d \times m \times k}$ | Parameters of $r \in \mathcal{R}$ for Householder projection axes |
| $T_r$ | $\mathbb{R}^{d \times m}$ | Parameters of $r \in \mathcal{R}$ for Householder projection scalars |

B. Proofs of Theorem 3.1

B.1. Proof of Lemma B.1

In order to prove Theorem 3.1, we first prove an auxiliary Lemma B.1.

Lemma B.1. Any $k \times k$ orthogonal matrix $Q$ can be decomposed into the product of $k - 1$ or $k$ Householder matrices.

Proof. From the Householder QR decomposition (Householder, 1958), we can upper-triangularize any full-rank matrix $W \in \mathbb{R}^{k \times k}$ by using $k - 1$ Householder reflections, i.e.,

$$H(u_{k-1})H(u_{k-2}) \cdots H(u_1)W = R,$$

where $R \in \mathbb{R}^{k \times k}$ is an upper triangular matrix whose first $k - 1$ diagonal elements are all positive. When the Householder QR decomposition is performed on an orthogonal matrix $Q$, we get:

$$H(u_{k-1})H(u_{k-2}) \cdots H(u_1)Q = R.$$

Note that $R$ here is both upper triangular and orthogonal (i.e., $RR^\top = I$) since it is a product of $k$ orthogonal matrices. It follows that $R$ is a diagonal matrix, where the first $k - 1$ diagonal entries are equal to $+1$ and the last diagonal entry is either $+1$ or $-1$.

If the last diagonal entry of $R$ is equal to $+1$, we have $H(u_{k-1})H(u_{k-2}) \cdots H(u_1)Q = I$. As each Householder matrix $H(u_i)$ is its own inverse, we obtain

$$Q = H(u_1) \cdots H(u_{k-1}). \quad (13)$$

If the last diagonal entry of $R$ is equal to $-1$, we can set $u_k = e_k = (0, \ldots, 0, 1)^\top \in \mathbb{R}^k$ and consequently get $H(u_k)R = H(u_k)H(u_{k-1}) \cdots H(u_1)Q = I$. Since each $H(u_i)$ is its own inverse, we also obtain

$$Q = H(u_1) \cdots H(u_{k-1})H(u_k). \quad (14)$$

From Equations (13) and (14), we can see that any $k \times k$ orthogonal matrix can be decomposed into the product of $k - 1$ or $k$ Householder matrices.

B.2. Proof of Theorem 3.1

Proof. We first prove that when $n = \lfloor k/2 \rfloor$, the image of Rot-H is a subset of $SO(k)$, i.e., $\text{Rot-H}(U) \in SO(k)$. Note that each Householder matrix is symmetric and orthogonal and its determinant is $-1$. Therefore, the product of $2n$ Householder matrices is an orthogonal matrix with determinant $+1$, i.e., a rotation matrix (Artin, 2016), which means $\text{Rot-H}(U) \in SO(k)$.

Then we prove that the converse is also valid, i.e., any $k \times k$ rotation matrix can be expressed as the product of $2\lfloor k/2 \rfloor$ Householder matrices $H(u_i)$. Note that a rotation matrix $\tilde{Q}$ is a special orthogonal matrix with determinant $+1$ (Artin, 2016), i.e., $\det(\tilde{Q}) = +1$, and thus $\tilde{Q}$ can be decomposed into the product of $k - 1$ or $k$ Householder matrices based on Lemma B.1. Moreover, since $\det(H(u_i)) = -1$ and the determinant of a product of matrices is the product of their determinants, the number of Householder factors must be even, and we can naturally derive that any $k \times k$ rotation matrix can be decomposed into the product of $2\lfloor k/2 \rfloor$ Householder matrices, i.e., $SO(k) \subseteq \text{Image(Rot-H)}$. All in all, we have $\text{Image(Rot-H)} = SO(k)$.
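The constructive argument of Lemma B.1 can be checked numerically. The sketch below (our own illustration, not the paper's code) runs a Householder QR pass on a random rotation Q, collects the reflection vectors, and verifies that an even number of reflections reconstructs Q exactly; the count can always be padded to exactly $2\lfloor k/2 \rfloor$ with a repeated pair, since H(u)H(u) = I:

```python
import numpy as np

def householder_factors(Q):
    """Decompose orthogonal Q into Householder vectors u_1..u_{k-1}
    (plus e_k if the last diagonal entry of R is -1), following Lemma B.1."""
    k = Q.shape[0]
    R = Q.copy()
    us = []
    for j in range(k - 1):
        v = R[j:, j].copy()                  # subcolumn to map onto +e_j
        e = np.zeros_like(v)
        e[0] = np.linalg.norm(v)
        u = v - e
        if np.linalg.norm(u) < 1e-12:        # column already aligned; nothing to do
            continue
        u = u / np.linalg.norm(u)
        u_full = np.zeros(k)
        u_full[j:] = u
        R = R - 2.0 * np.outer(u_full, u_full @ R)   # left-multiply by H(u_full)
        us.append(u_full)
    if R[-1, -1] < 0:                        # last diagonal entry equals -1
        e_k = np.zeros(k)
        e_k[-1] = 1.0
        us.append(e_k)
    return us

k = 5
rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
Q *= np.sign(np.linalg.det(Q))               # force det(Q) = +1, i.e. a rotation

us = householder_factors(Q)
recon = np.eye(k)
for u in us:                                 # Q = H(u_1) ... H(u_last)
    recon = recon @ (np.eye(k) - 2.0 * np.outer(u, u))

assert len(us) % 2 == 0                      # even reflection count, as Theorem 3.1 requires
assert np.allclose(recon, Q)
```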
C. Definitions

Definition C.1. A relation r is symmetric (antisymmetric) if $\forall x, y$: $r(x, y) \Rightarrow r(y, x)$ ($r(x, y) \Rightarrow \neg r(y, x)$). A clause with such form is a symmetry (antisymmetry) pattern.

Definition C.2. A relation r1 is inverse to relation r2 if $\forall x, y$: $r_2(x, y) \Rightarrow r_1(y, x)$. A clause with such form is an inversion pattern.

Definition C.3. A relation r1 is composed of relation r2 and relation r3 if $\forall x, y, z$: $r_2(x, y) \wedge r_3(y, z) \Rightarrow r_1(x, z)$. A clause with such form is a composition pattern.

Following (Bordes et al., 2013), there are four relation mapping properties:

Definition C.4. A relation r is 1-to-1 if a head can appear with at most one tail.

Definition C.5. A relation r is 1-to-N if a head can appear with many tails.

Definition C.6. A relation r is N-to-1 if many heads can appear with the same tail.

Definition C.7. A relation r is N-to-N if many heads can appear with many tails.

D. Proofs of Claims

We denote the r-specific Householder rotation matrix and Householder projection matrices as $\tilde{Q}_r$ and $\{W_{r,1}, W_{r,2}\}$ respectively:

$$\tilde{Q}_r = \text{Rot-H}(U_r[i]), \quad W_{r,1} = \text{Pro-H}(P_{r,1}[i], T_{r,1}[i]), \quad W_{r,2} = \text{Pro-H}(P_{r,2}[i], T_{r,2}[i]).$$

For simplicity, we also omit the row indices [i] of entity representations in the following proofs.

D.1. Proof of Claim 3.2

Proof. If r(x, y) and r(y, x) hold, we have

$$S_y = \tilde{Q}_r S_x \ \wedge \ S_x = \tilde{Q}_r S_y \ \Rightarrow \ \tilde{Q}_r \tilde{Q}_r = I.$$

Otherwise, if r(x, y) and $\neg$r(y, x) hold, we have

$$S_y = \tilde{Q}_r S_x \ \wedge \ S_x \neq \tilde{Q}_r S_y \ \Rightarrow \ \tilde{Q}_r \tilde{Q}_r \neq I.$$

D.2. Proof of Claim 3.3

Proof. If r1(x, y) and r2(y, x) hold, we have

$$S_y = \tilde{Q}_{r_1} S_x \ \wedge \ S_x = \tilde{Q}_{r_2} S_y \ \Rightarrow \ \tilde{Q}_{r_1} = \tilde{Q}_{r_2}^\top.$$

D.3. Proof of Claim 3.4

Proof. If r1(x, z), r2(x, y) and r3(y, z) hold, we have

$$S_z = \tilde{Q}_{r_1} S_x \ \wedge \ S_y = \tilde{Q}_{r_2} S_x \ \wedge \ S_z = \tilde{Q}_{r_3} S_y \ \Rightarrow \ \tilde{Q}_{r_1} = \tilde{Q}_{r_3} \tilde{Q}_{r_2}.$$

D.4. Proof of Claim 3.5

Proof. If r(x, y) and r(y, x) hold, we have

$$W_{r,2} S_y = \tilde{Q}_r W_{r,1} S_x \ \wedge \ W_{r,2} S_x = \tilde{Q}_r W_{r,1} S_y \ \Rightarrow \ (W_{r,2}^{-1} \tilde{Q}_r W_{r,1})(W_{r,2}^{-1} \tilde{Q}_r W_{r,1}) = I.$$

Otherwise, if r(x, y) and $\neg$r(y, x) hold, we have

$$W_{r,2} S_y = \tilde{Q}_r W_{r,1} S_x \ \wedge \ W_{r,2} S_x \neq \tilde{Q}_r W_{r,1} S_y \ \Rightarrow \ (W_{r,2}^{-1} \tilde{Q}_r W_{r,1})(W_{r,2}^{-1} \tilde{Q}_r W_{r,1}) \neq I.$$

D.5. Proof of Claim 3.6

Proof. If r1(x, y) and r2(y, x) hold, we have

$$W_{r_1,2} S_y = \tilde{Q}_{r_1} W_{r_1,1} S_x \ \wedge \ W_{r_2,2} S_x = \tilde{Q}_{r_2} W_{r_2,1} S_y \ \Rightarrow \ W_{r_1,2}^{-1} \tilde{Q}_{r_1} W_{r_1,1} = (W_{r_2,2}^{-1} \tilde{Q}_{r_2} W_{r_2,1})^{-1}.$$

D.6. Proof of Claim 3.7

Proof. If r1(x, z), r2(x, y) and r3(y, z) hold, we have

$$W_{r_1,2} S_z = \tilde{Q}_{r_1} W_{r_1,1} S_x \ \wedge \ W_{r_2,2} S_y = \tilde{Q}_{r_2} W_{r_2,1} S_x \ \wedge \ W_{r_3,2} S_z = \tilde{Q}_{r_3} W_{r_3,1} S_y$$
$$\Rightarrow \ W_{r_1,2}^{-1} \tilde{Q}_{r_1} W_{r_1,1} = (W_{r_3,2}^{-1} \tilde{Q}_{r_3} W_{r_3,1})(W_{r_2,2}^{-1} \tilde{Q}_{r_2} W_{r_2,1}).$$

D.7. Proof of Claim 3.8

In order to model sophisticated RMPs, we expect to tackle the challenge of indistinguishable representations with Householder projections, as mentioned in Section 3.1. For the N-to-1 relations, we take a 2-to-1 relation r as an example with two triples (h1, r, t) and (h2, r, t). Householder projections can adjust the relative distance between entities h1 and h2 according to relation r. Formally, the original distance between h1 and h2 is defined as:

$$s = \|S_{h_1} - S_{h_2}\|_2.$$

After applying a modified Householder matrix, the relative distance between the projected representations is:

$$\hat{s}^2 = \|S_{h_1,r} - S_{h_2,r}\|_2^2 = s^2 + (\tau_r^2 - 2\tau_r)s^2 \cos^2 \theta_{s,p_r}.$$

It is clear that the learnable $\tau_r$ determines the increase or decrease of the relative distance: (1) when $0 < \tau_r < 2$, $\hat{s} \leq s$; (2) when $\tau_r = 0$ or $2$, $\hat{s} = s$; (3) when $\tau_r < 0$ or $\tau_r > 2$, $\hat{s} \geq s$. Moreover, the term $\cos \theta_{s,p_r}$ is determined by the relative positions between the entities and the projection axis $p_r$. This reveals that the Householder projections can adaptively change the relative distance between entities based on their positions. With such projections, one can obtain similar r-specific representations $S_{h_1,r}$ and $S_{h_2,r}$ for h1 and h2, while the original representations $S_{h_1}$ and $S_{h_2}$ remain distinguishable. The same is also true for 1-to-N relations.
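The distance identity above is easy to verify numerically; a minimal sketch we add (any unit axis $p_r$ and scalar $\tau_r$ work):

```python
import numpy as np

rng = np.random.default_rng(4)
k, tau = 4, 0.8                       # 0 < tau < 2 should shrink relative distances
p = rng.standard_normal(k)
p /= np.linalg.norm(p)
h1, h2 = rng.standard_normal(k), rng.standard_normal(k)

M = np.eye(k) - tau * np.outer(p, p)  # modified Householder matrix M(p, tau)
v = h1 - h2
s = np.linalg.norm(v)
cos2 = (np.dot(v, p) / s) ** 2        # cos^2 of the angle between (h1 - h2) and axis p

s_hat = np.linalg.norm(M @ h1 - M @ h2)
assert np.isclose(s_hat ** 2, s ** 2 + (tau ** 2 - 2 * tau) * s ** 2 * cos2)
assert s_hat <= s                     # case (1) of the claim
```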
E. Discussion on Other Relation Patterns

E.1. Multiplicity

The multiplicity pattern has been investigated in DualE (Cao et al., 2021). Formally, it has the following definition:

Definition E.1. Relations $r_1, r_2, \ldots, r_N$ are multiple if $\forall i \in \{1, \ldots, N\}$, $(h, r_i, t)$ can hold in KGs simultaneously. A clause with such form is a multiplicity pattern.

DualE utilizes dual quaternions to represent each relation as a 3-dimensional rotation followed by a translation. It proves that the combination of rotations and translations can model multiple relations, since for any given rotation applied to the head entity h, there is always a corresponding translation transforming the rotated head entity to the tail entity t. In our proposed HousE, the relational Householder projections can be regarded as a special translation along the projection axes. Thus, HousE is similar to DualE in terms of multiplicity modeling capacity. What's more, as shown in Section 4.7, our proposal can also easily integrate translations to achieve better performance. Geometrically, DualE can be viewed as a special case of HousE+ with 3-dimensional rotations.

E.2. Transitivity

Rot-Pro (Song et al., 2021) focuses on modeling the transitivity pattern, which is formally defined as:

Definition E.2. A relation r is transitive if for any instances (e1, r, e2) and (e2, r, e3) of relation r, (e1, r, e3) is also an instance of r. A clause with such form is a transitivity pattern.

Rot-Pro theoretically shows that transitive relations can be modeled with special orthogonal projections, which are designed to project points onto the rotated axes. This kind of projection can be viewed as a 2-dimensional case of TransH's projections. HousE can be reduced to Rot-Pro if we set the rotation dimension to 2 and the projection scalars to 1. However, in our opinion, such projections may not be the optimal way to handle transitivity. As shown in (Song et al., 2021), Rot-Pro tends to project the entities under a transitive relation to the same point, and the phase of the relational rotation tends to be $2n\pi$ ($n = 0, 1, 2, \ldots$). We can see that this solution is a subset of the solution for modeling symmetric relations, which means that the modeled transitive relations must be symmetric, while antisymmetric transitive relations are ignored. Therefore, how to comprehensively model transitive relations is still a challenging problem, and we leave this as future work.

F. Datasets

Table 8 summarizes the detailed statistics of the five benchmark datasets.

Table 8. Statistics of five standard benchmarks.

| Dataset | #entity | #relation | #training | #validation | #test |
|---|---|---|---|---|---|
| WN18 | 40,943 | 18 | 141,442 | 5,000 | 5,000 |
| FB15k | 14,951 | 1,345 | 483,142 | 50,000 | 59,071 |
| WN18RR | 40,943 | 11 | 86,835 | 3,034 | 3,134 |
| FB15k-237 | 14,541 | 237 | 272,115 | 17,535 | 20,466 |
| YAGO3-10 | 123,182 | 37 | 1,079,040 | 5,000 | 5,000 |

WN18 (Bordes et al., 2013) is extracted from WordNet (Miller, 1995), a database featuring lexical relations between words. FB15k (Bordes et al., 2013) contains relation triples from Freebase (Bollacker et al., 2008), a large-scale knowledge graph containing general knowledge facts. The main relation patterns in WN18 and FB15k are symmetry, antisymmetry and inversion. The WN18RR (Dettmers et al., 2018) and FB15k-237 (Toutanova & Chen, 2015) datasets are subsets of WN18 and FB15k, respectively, with inverse relations removed. The key to link prediction on WN18RR and FB15k-237 boils down to modeling and inferring the symmetry, antisymmetry and composition patterns. YAGO3-10 is a subset of YAGO3 (Mahdisoltani et al., 2015), containing 123,182 entities and 37 relations.
Most of the triples in YAGO3-10 are descriptive attributes of people, such as citizenship, gender, profession and marital status.

G. Implementation Details

Table 9 shows the number of parameters used in our models and several recent competitive baselines: RotatE, Rotate3D, QuatE and DualE. To ensure fair comparisons, we set a smaller embedding size d to represent each entity and relation in HousE-r and HousE, so that the total number of parameters is similar to the other baselines. Specifically, we fix the number of parameters $d \times k$ used to represent a single entity to 1000, 1200, 800, 600 and 1000 on WN18, FB15k, WN18RR, FB15k-237 and YAGO3-10, respectively. Hyperparameter d denotes the embedding size and k is the rotation dimension; a larger rotation dimension k leads to a smaller embedding size d. From Table 9, one can see that our proposed models have numbers of parameters similar to the baselines. The only exception is QuatE on WN18RR and FB15k-237. We have tried to increase the number of parameters of QuatE by enlarging the embedding size d on these two datasets, while carefully tuning the other hyperparameters simultaneously. Unfortunately, the performance of QuatE drops with more free parameters. Thus, to ensure the fairness of the performance comparison, we report the parameter numbers of QuatE with the best link prediction results.

Table 9. Number of free parameters. The results of baselines are taken from the original papers.

| Dataset | RotatE | Rotate3D | QuatE | DualE | HousE-r | HousE |
|---|---|---|---|---|---|---|
| WN18 | 40.95M | 122.90M | 49.15M | 65.53M | 40.88M | 41.03M |
| FB15k | 31.25M | 50.23M | 26.08M | 26.08M | 24.40M | 27.63M |
| WN18RR | 40.95M | 61.44M | 16.38M | 32.76M | 32.57M | 32.84M |
| FB15k-237 | 29.32M | 44.57M | 5.82M | 11.64M | 12.13M | 13.36M |
| YAGO3-10 | 123.18M | - | - | - | 122.91M | 122.99M |

Table 10 shows the convergence time required for model training on the five datasets. RotatE is the simplest rotation-based model with linear time complexity, so it is selected as the baseline. Compared to RotatE, our proposed HousE-r and HousE cost comparable or even less training time on these datasets by using the efficient computation in Equation (7). Combined with the link prediction results in Tables 2 and 3, one can see that our proposal improves model effectiveness without sacrificing efficiency.

Table 10. Training time of RotatE and our proposal on five datasets.

| Model | WN18RR | FB15k-237 | WN18 | FB15k | YAGO3-10 |
|---|---|---|---|---|---|
| RotatE | 4h | 6h | 4h | 9h | 10h |
| HousE-r | 1.5h | 3h | 3h | 8h | 11h |
| HousE | 1.5h | 5h | 3h | 9h | 13h |

We use Adam (Kingma & Ba, 2015) as the optimizer and fine-tune the hyperparameters on the validation set. The hyperparameters are tuned by random search (Bergstra & Bengio, 2012), including batch size b, self-adversarial sampling temperature α, fixed margin γ, learning rate lr, rotation dimension k, number of modified Householder reflections m for Householder projections, and regularization coefficient λ. The hyperparameter search space is shown in Table 11.

Table 11. Hyperparameter search space.

| Hyperparameter | Search Space | Type |
|---|---|---|
| b | {500, 800, 1000, 1500, 2000} | Choice |
| α | [0.5, 2.0] | Range |
| γ | {5, 7, 9, 10, 11, 16, 20, 24, 28} | Choice |
| lr | [0.0001, 0.003] | Range |
| k | {2, 4, 8, 12, 16, 20, 25, 30} | Choice |
| m | {1, 2, 3, 4, 6, 8} | Choice |
| λ | [0, 0.3] | Range |