
TeX-Graph: Coupled tensor-matrix knowledge-graph embedding for COVID-19 drug repurposing

Charilaos I. Kanatsoulis (Electrical and Computer Engineering Department, University of Minnesota; kanat003@umn.edu)    Nicholas D. Sidiropoulos (Electrical and Computer Engineering Department, University of Virginia; nikos@virginia.edu)
Abstract

Knowledge graphs (KGs) are powerful tools that codify relational behavior between entities in knowledge bases. KGs can simultaneously model many different types of subject-predicate-object and higher-order relations. As such, they offer a flexible modeling framework that has been applied to many areas, including biology and pharmacology – most recently, in the fight against COVID-19. The flexibility of KG modeling is both a blessing and a challenge from the learning point of view. In this paper we propose a novel coupled tensor-matrix framework for KG embedding. We leverage tensor factorization tools to learn concise representations of entities and relations in knowledge bases and employ these representations to perform drug repurposing for COVID-19. Our proposed framework is principled, elegant, and achieves 100\% improvement over the best baseline in the COVID-19 drug repurposing task using a recently developed biological KG.

Keywords— knowledge graphs, tensor, drug repurposing, COVID-19, embedding, network

1 Introduction

How does COVID-19 relate to better-studied viral infections and biological mechanisms? Can we use existing drugs to effectively treat COVID-19 symptoms? Since the COVID-19 pandemic has disrupted our lives, there is a pressing need to answer such questions, and COVID-19 research has swiftly risen to the top of the scientific agenda worldwide. While these questions will ultimately be answered by medical experts, data-driven methods can help cut down the immense search space, thus helping to accelerate progress and optimize the allocation of precious research resources. In this paper, our goal is to derive such a method using network science and multi-view analysis tools.

Networks are powerful abstractions that model interactions between the entities of a system [3]. Networks and network science offer concise and informative modeling, analysis, and processing for a variety of systems, including biological, engineering, and social systems [11, 22]. Networks are usually represented by graphs, which are defined by a set of nodes and a set of edges connecting pairs of nodes. The entities of a system are usually represented by the nodes of the graph, and the interactions by the edges.

A knowledge graph (KG) models the relational behavior of various entities in knowledge bases. A KG is heterogeneous in the sense that it models interactions between entities of different types, e.g., drugs and diseases, and is also a multidimensional network (edge-labeled multi-graph) [4], since the edges (interactions) that connect the nodes (entities) can be multiple and of different types. Knowledge graphs have recently attracted significant attention due to their applicability to various science and engineering tasks. Popular knowledge graphs include YAGO [32], DBpedia [1], NELL [8], Freebase [5], and the Google KG [29]. A recent trend codifies knowledge bases of biomedical components and processes, such as genes, diseases, and drugs, into KGs, e.g., [14, 15, 17]. KGs can model any relations of the form subject-predicate-object, as well as higher-order generalizations. However, this broad modeling freedom can sometimes be a challenge, as the entities can be very diverse and the dimensions of the KG can become prohibitively large.

A common way to exploit KGs for data mining and machine learning applications is via knowledge graph embedding. KG embedding aims to extract meaningful low dimensional representations of the entities and the relations present in a KG. A plethora of methods have been proposed to perform KG embedding [20, 12, 25, 18, 26, 6, 21, 34, 23, 33, 30, 2]. The most popular among them adopt a single-layer perceptron or neural network approach e.g., [6, 21, 34, 33, 30]. Various tensor factorization models have also been proposed, e.g., [20, 12, 25, 23, 2]. Matrix factorization is also a tool that has been utilized for KG embedding, e.g., [18, 26].

In this paper we propose TeX-Graph, a novel coupled tensor-matrix framework for KG embedding. The proposed coupled tensor-matrix model extracts meaningful information from a set of diverse entities with multi-modal interactions in a principled and concise manner. TeX-Graph avoids modeling inefficiencies of previously proposed tensor models, and relative to neural network approaches it offers a principled and effective way to produce unique KG representations. The proposed framework is used for drug repurposing, a pivotal tool in the fight against COVID-19 and other diseases. By learning concise representations of drug compounds, diseases, and the relations between them, our approach enables link prediction between drug compounds and COVID-19 or other diseases. The impact is twofold. First, compound repurposing enables drug design that drastically reduces the design exploration cycle and the failure rate. Second, it markedly reduces drug development cost, as developing new therapeutic drugs is tremendously expensive.

The contributions of our work can be summarized as follows:

  • Novel KG modeling: We propose a principled coupled tensor-matrix model tailored to KG needs for efficient and parsimonious representations.

  • Analysis: The TeX-Graph embeddings are unique and permutation invariant, a property which is important for consistency and necessary for interpretability.

  • Algorithm: We design a scalable algorithmic framework with lightweight updates, that can effectively handle very large KGs.

  • Application: The proposed framework is developed to perform drug repurposing, a pivotal task in the fight against COVID-19.

  • Performance: TeX-Graph achieves 100\% performance improvement compared to the best available baseline for COVID-19 drug repurposing using a recently developed COVID-19 KG.

Reproducibility: The DRKG dataset used in the experiments is publicly available; we will also release our code upon publication of the paper.

Notation: Our notation is summarized in Table 1.

Table 1: Overview of notation.
x,y,z \triangleq scalars
(m,n),~(h,r,t) \triangleq ordered tuples
\bm{x},\bm{y},\bm{z} \triangleq vectors
\bm{A},~\bm{B},~\bm{C} \triangleq matrices
\underline{\bm{X}},\underline{\bm{Y}},\underline{\bm{Z}} \triangleq tensors
\mathcal{S},\mathcal{S}_{n}^{+},\mathcal{S}_{n}^{-} \triangleq sets
\bm{A}(:,f) \triangleq f-th column of matrix \bm{A}
\bm{A}(i,:) \triangleq i-th row of matrix \bm{A}
\bm{X}^{k} \triangleq k-th frontal slab of tensor \underline{\bm{X}}
\bm{A}^{T} \triangleq transpose of matrix \bm{A}
\lVert\bm{A}\rVert_{F} \triangleq Frobenius norm of matrix \bm{A}
\odot \triangleq Khatri-Rao (columnwise Kronecker) product
\circ \triangleq outer product
\ast \triangleq Hadamard product
\text{diag}(\bm{x}) \triangleq diagonal matrix of vector \bm{x}
\lfloor x\rfloor \triangleq largest integer less than or equal to x
nnz \triangleq number of non-zeros

2 Preliminaries

In order to facilitate the upcoming discussion, we now review some tensor algebra preliminaries. For more background on tensors the reader is referred to [28, 19].

A third-order tensor \underline{\bm{X}}\in\mathbb{R}^{I\times J\times K} is a three-way array indexed by i,j,k, with elements \underline{\bm{X}}(i,j,k). It has three mode types: columns \underline{\bm{X}}(:,j,k), rows \underline{\bm{X}}(i,:,k), and fibers \underline{\bm{X}}(i,j,:), where : denotes all relevant index values for that mode, i.e., from beginning to end – see Fig. 1. A third-order tensor can also be described by a set of matrices (slabs): horizontal slabs \underline{\bm{X}}(i,:,:), vertical slabs \underline{\bm{X}}(:,j,:), and frontal slabs \underline{\bm{X}}(:,:,k) – see Fig. 2. A rank-one third-order tensor \underline{\bm{Z}}\in\mathbb{R}^{I\times J\times K} is defined as the outer product of three vectors; recall that a rank-one matrix is the outer product of two vectors. Any third-order tensor can be decomposed into a sum of three-way outer products (rank-one tensors) as:

(2.1) 𝑿¯=f=1F𝑨(:,f)𝑩(:,f)𝑪(:,f),\underline{\bm{X}}=\sum_{f=1}^{F}\bm{A}(:,f)\circ\bm{B}(:,f)\circ\bm{C}(:,f),

where \bm{A}\in\mathbb{R}^{I\times F},~\bm{B}\in\mathbb{R}^{J\times F},~\bm{C}\in\mathbb{R}^{K\times F} are matrices collecting the respective mode factors, i.e., they hold in their columns the vectors involved in the F three-way outer products. The above expression is known as the polyadic decomposition (PD) of a third-order tensor. If F is the minimum number of outer products required to synthesize \underline{\bm{X}}, then F is the rank of tensor \underline{\bm{X}} and the decomposition is known as the canonical polyadic decomposition (CPD) or parallel factor analysis (PARAFAC) [13]. For the rest of the paper we use the notation \underline{\bm{X}}=\left\llbracket{\bm{A}},{\bm{B}},{\bm{C}}\right\rrbracket to denote the CPD of the tensor.
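As a minimal numerical sketch (with random placeholder factors, not data from the paper), the sum of rank-one tensors in (2.1) can be synthesized and checked as follows:

```python
import numpy as np

# Synthesize a rank-F tensor as the sum of F three-way outer products, as in
# (2.1); the factor matrices A, B, C here are random placeholders.
rng = np.random.default_rng(0)
I, J, K, F = 4, 5, 6, 3
A = rng.standard_normal((I, F))
B = rng.standard_normal((J, F))
C = rng.standard_normal((K, F))

# One-shot synthesis of sum_f A(:,f) outer B(:,f) outer C(:,f).
X = np.einsum('if,jf,kf->ijk', A, B, C)

# Term-by-term accumulation of the same sum of rank-one tensors.
X_check = np.zeros((I, J, K))
for f in range(F):
    X_check += np.einsum('i,j,k->ijk', A[:, f], B[:, f], C[:, f])
assert np.allclose(X, X_check)
```

The `einsum` form and the explicit rank-one accumulation agree element-wise, which is exactly the content of (2.1).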

Refer to caption
Figure 1: The columns, rows and fibers of a third-order tensor.
Refer to caption
Figure 2: The horizontal, vertical, and frontal slabs of a third-order tensor.

A striking property of the CPD is that it is essentially unique under mild conditions, even if the rank is higher than the dimensions – see [9] for a generic result.

A tensor can be represented in matrix form by employing the matricization operation. There are three common ways to matricize (or unfold) a third-order tensor: by stacking columns, rows, or fibers of the tensor to form a matrix. To be more precise, let:

(2.2) 𝑿¯(:,:,k)=𝑿kI×J,\bm{\underline{X}}(:,:,k)=\bm{X}^{k}\in\mathbb{R}^{I\times J},

where \bm{X}^{k} are the frontal slabs of tensor \underline{\bm{X}}; in the context of this paper they model adjacency matrices. Then the mode-1, mode-2, and mode-3 unfoldings of \underline{\bm{X}} are:

(2.3) 𝑿(𝟏)=[𝑿1,,𝑿K]T=(𝑪𝑩)𝑨TJK×I,{\bm{{X}^{(1)}}=\begin{bmatrix}\bm{X}^{1},\ldots,\bm{X}^{K}\end{bmatrix}^{T}=({\bm{C}}\odot{\bm{B}}){\bm{A}}^{T}\in\mathbb{R}^{JK\times I},}
(2.4) 𝑿(𝟐)=[𝑿1T,,𝑿KT]=(𝑪𝑨)𝑩TIK×J,{\bm{{X}^{(2)}}=\begin{bmatrix}\bm{X}^{1^{T}},\ldots,\bm{X}^{K^{T}}\end{bmatrix}=({\bm{C}}\odot{\bm{A}}){\bm{B}}^{T}\in\mathbb{R}^{IK\times J},}
(2.5) 𝑿(𝟑)=[𝑿1(:,1),,𝑿K(:,1)𝑿1(:,J),,𝑿K(:,J)]=(𝑩𝑨)𝑪TIJ×K,{\bm{{X}^{(3)}}=\begin{bmatrix}\bm{X}^{1}(:,1),\cdots,\bm{X}^{K}(:,1)\\ \vdots\\ \bm{X}^{1}(:,J),\cdots,\bm{X}^{K}(:,J)\end{bmatrix}=({\bm{B}}\odot{\bm{A}}){\bm{C}}^{T}\in\mathbb{R}^{IJ\times K},}
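The three unfolding identities can be verified numerically. The sketch below (random placeholder factors) defines the columnwise Kronecker product \odot inline and checks (2.3)-(2.5) under the index ordering implied by the Khatri-Rao product:

```python
import numpy as np

# Numerical check of the unfoldings (2.3)-(2.5); factors are random
# placeholders. khatri_rao is the columnwise Kronecker product (the ⊙ symbol).
def khatri_rao(P, Q):
    # column f of the result is kron(P[:, f], Q[:, f])
    return np.einsum('pf,qf->pqf', P, Q).reshape(-1, P.shape[1])

rng = np.random.default_rng(1)
I, J, K, F = 4, 5, 6, 3
A = rng.standard_normal((I, F))
B = rng.standard_normal((J, F))
C = rng.standard_normal((K, F))
X = np.einsum('if,jf,kf->ijk', A, B, C)   # X(i,j,k) = sum_f A(i,f)B(j,f)C(k,f)

X1 = X.transpose(2, 1, 0).reshape(K * J, I)   # mode-1 unfolding, JK x I
assert np.allclose(X1, khatri_rao(C, B) @ A.T)
X2 = X.transpose(2, 0, 1).reshape(K * I, J)   # mode-2 unfolding, IK x J
assert np.allclose(X2, khatri_rao(C, A) @ B.T)
X3 = X.transpose(1, 0, 2).reshape(J * I, K)   # mode-3 unfolding, IJ x K
assert np.allclose(X3, khatri_rao(B, A) @ C.T)
```

Each unfolding is just a transpose-and-reshape of the same array, matched to the row ordering of the corresponding Khatri-Rao product.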

Another important tensor model is the coupled CPD. In coupled CPD we are interested in decomposing an array of tensors that share at least one common latent factor. In particular, consider a collection of N tensors \underline{\bm{X}}_{n}\in\mathbb{R}^{I\times J_{n}\times K_{n}},~n\in\{1,\dots,N\}. The rank-F coupled CPD of \{\underline{\bm{X}}_{n}\} can be expressed as:

(2.6) \underline{\bm{X}}_{n}=\left\llbracket{\bm{A}},{\bm{B}}_{n},{\bm{C}}_{n}\right\rrbracket,~n\in\{1,\dots,N\},

where \bm{A}\in\mathbb{R}^{I\times F} is the common factor and \bm{B}_{n}\in\mathbb{R}^{J_{n}\times F},~\bm{C}_{n}\in\mathbb{R}^{K_{n}\times F} are unshared factors. The coupled CPD is also unique under certain conditions, even if the individual CPDs of \underline{\bm{X}}_{n} are not unique. In this work we will use the following uniqueness theorem for coupled CPD:

Theorem 1

[31, p. 510] Consider the coupled CPD as expressed in (2.6). The decomposition is essentially unique if:

  1. \bm{A} has full column rank,

  2. \bm{G} has full column rank,

where \bm{G} is defined as:

(2.7) 𝑮\displaystyle\bm{G} =[C2(𝑪1)C2(𝑩1)C2(𝑪N)C2(𝑩N)]\displaystyle=\begin{bmatrix}C_{2}\left(\bm{C}_{1}\right)\odot C_{2}\left(\bm{B}_{1}\right)\\ \vdots\\ C_{2}\left(\bm{C}_{N}\right)\odot C_{2}\left(\bm{B}_{N}\right)\end{bmatrix}
14n=1NJn(Jn1)Kn(Kn1)×12F(F1),\displaystyle\in\mathbb{R}^{\frac{1}{4}\sum_{n=1}^{N}J_{n}(J_{n}-1)K_{n}(K_{n}-1)\times\frac{1}{2}F(F-1)},

and C_{2}(\bm{C}_{n}) is the second compound matrix of \bm{C}_{n} – see [10, pp. 861-862]. In the context of coupled CPD, essential uniqueness corresponds to \bm{A} being unique and \{\bm{B}_{n},~\bm{C}_{n}\} being identifiable up to column scaling and counter-scaling.
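The sharing structure in (2.6) is easy to illustrate concretely: several tensors of different sizes are generated from one common mode-1 factor. The sketch below uses random placeholder factors and toy dimensions:

```python
import numpy as np

# Illustration of the coupled CPD structure in (2.6): N tensors of different
# sizes share the mode-1 factor A. All factors are random placeholders.
rng = np.random.default_rng(5)
I, F = 6, 3
A = rng.standard_normal((I, F))                       # shared factor
Bs = [rng.standard_normal((J, F)) for J in (4, 7)]    # unshared B_n
Cs = [rng.standard_normal((K, F)) for K in (5, 2)]    # unshared C_n

# Each X_n = [[A, B_n, C_n]] is built from the same A.
Xs = [np.einsum('if,jf,kf->ijk', A, B, C) for B, C in zip(Bs, Cs)]
assert [X.shape for X in Xs] == [(6, 4, 5), (6, 7, 2)]
```

Only the first dimension (the shared mode) is common across the collection; the other modes, and the number of relation types, may differ per tensor.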

3 Problem Statement

As mentioned in the introduction, knowledge graphs (KGs) have attracted significant attention over the past decade due to their tremendous modeling capabilities. In particular, KGs model triplets of subject-predicate-object, denoted as (head, relation, tail) or (h, r, t). Subjects (heads) and objects (tails) are entities represented as graph nodes, and predicates (relations) define the type of edge by which the subject is connected to the object. A schematic representation of a KG modeling relations between genes, compounds, and diseases is presented in Fig. 3.

In this paper, we focus our attention on a biological KG that models relational triplets between biological entities. For example, (compound 1, interacts with, compound 2), (compound 1, activates, gene 1), (gene 1, regulates, gene 2), (compound 1, prevents, disease 1), (gene 2, is linked with, disease 2) are common triplets in numerous recently developed knowledge bases [14, 15, 17]. Modeling these types of relations as a KG enables embedding entities and relations in a Euclidean space which can further facilitate any type of processing and analysis. For instance, obtaining a low dimensional representation of compounds, diseases and the ‘prevents’ relation allows measuring similarity, and thus predicting and testing hypotheses regarding (compound, prevents, disease) interactions. Drug repurposing can be performed by predicting candidate compounds for new and existing target diseases.

Refer to caption
Figure 3: Schematic representation of biological KG.

Note that the proposed framework to be introduced shortly is not limited to biological KGs – it can be applied to a wide variety of interesting KGs.

3.1 Prior Art

Several methods have been proposed to learn low dimensional representations of KGs. To properly describe them we need to define the score function f(\cdot) and the loss function \mathcal{L}(\cdot).

Let (h_{n},r_{n},t_{n}) be an available triplet and \bm{h}_{n}\in\mathbb{R}^{F},~\bm{t}_{n}\in\mathbb{R}^{F}, and \bm{r}_{n}\in\mathbb{R}^{d} be the low dimensional embeddings we aim to learn. Note that entity and relation embeddings need not be of the same dimension. The score function determines the relation model between the head (subject) and the tail (object). In simple words, high values of the score function f(\bm{h}_{n},\bm{r}_{n},\bm{t}_{n}) are desirable for existing triplets (h_{n},r_{n},t_{n}), and low values for non-existing ones.

In order to produce the entity and relation embeddings we define the following forward model for each triplet (h_{n},r_{n},t_{n}):

(3.8a) 𝒉n=γ(𝑾eT𝒐nh)F,\bm{h}_{n}=\gamma\left(\bm{W}_{e}^{T}\bm{o}^{h}_{n}\right)\in\mathbb{R}^{F},
(3.8b) 𝒕n=γ(𝑾eT𝒐nt)F,\bm{t}_{n}=\gamma\left(\bm{W}_{e}^{T}\bm{o}^{t}_{n}\right)\in\mathbb{R}^{F},
(3.8c) 𝒓n=δ(𝑾rT𝒐nr)d,\bm{r}_{n}=\delta\left(\bm{W}_{r}^{T}\bm{o}^{r}_{n}\right)\in\mathbb{R}^{d},

where \bm{o}^{h}_{n}\in\{0,1\}^{L_{e}},~\bm{o}^{t}_{n}\in\{0,1\}^{L_{e}},~\bm{o}^{r}_{n}\in\{0,1\}^{K_{r}} are one-hot input vectors corresponding to the head, tail, and relation index of the triplet (h_{n},r_{n},t_{n}), respectively, with L_{e},~K_{r} being the total number of entities (nodes) and types of relations; \gamma(\cdot) and \delta(\cdot) are element-wise functions and \bm{W}_{e}\in\mathbb{R}^{L_{e}\times F},~\bm{W}_{r}\in\mathbb{R}^{K_{r}\times d} are matrices that contain the model parameters to be learned.

Popular choices for \gamma(\cdot) and \delta(\cdot) are the identity function and the hyperbolic tangent. If \gamma(\cdot) or \delta(\cdot) is the identity function, then the rows of \bm{W}_{e} or \bm{W}_{r} are the learned embeddings for entities and relations, respectively. For TransE, DistMult, and RotatE, F=d, whereas for TransR and RESCAL d\neq F. In the TransR model, \bm{M}_{r}\in\mathbb{R}^{d\times F} is a projection matrix associated with relation r, and in RESCAL \bm{R}\in\mathbb{R}^{F\times F} is a relation matrix.
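With identity activations, the forward model (3.8) reduces to a row lookup: multiplying \bm{W}_{e}^{T} by a one-hot vector selects one row of \bm{W}_{e}. A minimal sketch with toy placeholder dimensions:

```python
import numpy as np

# The forward model (3.8) with gamma = delta = identity: multiplying W_e^T by
# a one-hot vector is a row lookup. Dimensions below are toy placeholders.
rng = np.random.default_rng(4)
Le, Kr, F = 7, 3, 4
We = rng.standard_normal((Le, F))          # entity embedding matrix W_e
Wr = rng.standard_normal((Kr, F))          # relation embedding matrix W_r

h_idx = 2                                  # head index of some triplet
o_h = np.eye(Le)[h_idx]                    # one-hot input vector o_n^h
h = We.T @ o_h                             # (3.8a) with gamma = identity
assert np.allclose(h, We[h_idx])           # embedding = h_idx-th row of W_e
```

In practice the matrix-vector product is of course never formed; implementations index the embedding row directly.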

Table 2: Knowledge Graph models
Model | score function f(\bm{h},\bm{r},\bm{t})
TransE [6] | 1-\lVert\bm{h}+\bm{r}-\bm{t}\rVert_{2} or 1-\lVert\bm{h}+\bm{r}-\bm{t}\rVert_{1}
TransR [21] | 1-\lVert\bm{M}_{r}\bm{h}+\bm{r}-\bm{M}_{r}\bm{t}\rVert_{2}
DistMult [34] | \bm{h}^{T}\text{diag}(\bm{r})\bm{t}
RESCAL [23] | \bm{h}^{T}\bm{R}\bm{t}
RotatE [33] | 1-\lVert\bm{h}\ast\bm{r}-\bm{t}\rVert
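As an illustration, several of these score functions are one-liners over the embedding vectors. The sketch below is illustrative only (not the original implementations), with h, r, t assumed to have the same dimension F = d:

```python
import numpy as np

# Minimal sketches of score functions from Table 2 (illustrative only);
# h, r, t are embedding vectors with F = d.
def score_transe(h, r, t):
    return 1 - np.linalg.norm(h + r - t)   # translation-based score

def score_distmult(h, r, t):
    return h @ np.diag(r) @ t              # equivalently np.sum(h * r * t)

def score_rescal(h, R, t):
    return h @ R @ t                       # R is an F x F relation matrix

h, r, t = np.ones(3), np.arange(3.0), np.ones(3)
assert np.isclose(score_distmult(h, r, t), 3.0)
assert np.isclose(score_rescal(h, np.eye(3), t), 3.0)
```

High scores indicate plausible triplets under each model; DistMult is the special case of RESCAL where the relation matrix is constrained to be diagonal.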

In order to learn the embeddings, state-of-the-art methods attempt to minimize the following risk:

(3.9) min𝑾e,𝑾r1Nn=1N(ynf(𝒉n,𝒓n,𝒕n))\displaystyle\operatorname*{min}_{\bm{W}_{e},\bm{W}_{r}}\frac{1}{N}\sum_{n=1}^{N}\mathcal{L}\left(y_{n}-f(\bm{h}_{n},\bm{r}_{n},\bm{t}_{n})\right)

where N is the number of data points (triplets or non-triplets), and y_{n}=1 if the triplet (h_{n},r_{n},t_{n}) exists, else y_{n}=0. Typical loss functions include the logistic loss, square loss, pairwise ranking loss, margin-based ranking loss, and variants thereof. The most popular approach for tackling the problem in (3.9) is stochastic gradient descent (SGD) or batch SGD [7].
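A hedged sketch of this training loop, under the assumption of a square loss and the DistMult score (one possible instantiation of (3.9), not the paper's method), with synthetic toy triplets:

```python
import numpy as np

# Sketch of minimizing (3.9): square loss with the DistMult score
# f(h,r,t) = sum(h*r*t), trained by per-sample gradient steps (SGD).
# The triplets and dimensions below are synthetic placeholders.
rng = np.random.default_rng(0)
Le, Kr, F = 10, 4, 5                       # entities, relation types, embed dim
We = 0.5 * rng.standard_normal((Le, F))    # entity embeddings (rows of W_e)
Wr = 0.5 * rng.standard_normal((Kr, F))    # relation embeddings (rows of W_r)
data = [(0, 1, 2, 1.0), (3, 0, 4, 1.0), (0, 1, 4, 0.0)]   # (h, r, t, y)

def risk():
    return np.mean([(y - np.sum(We[h] * Wr[r] * We[t])) ** 2
                    for h, r, t, y in data])

loss_init = risk()
lr = 0.05
for _ in range(500):                        # SGD epochs over the toy data
    for h, r, t, y in data:
        e = y - np.sum(We[h] * Wr[r] * We[t])   # residual in (3.9)
        We[h] += lr * e * Wr[r] * We[t]          # gradient steps on W_e, W_r
        We[t] += lr * e * Wr[r] * We[h]
        Wr[r] += lr * e * We[h] * We[t]
loss_final = risk()
assert loss_final < loss_init
```

The same loop structure applies to the other scores in Table 2; only the residual and gradient expressions change with the choice of f and \mathcal{L}.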

3.2 The 3-way model

Modeling a KG using a third-order tensor has been considered in [20, 12, 25, 23, 2]. In these works, the first and second modes of the tensor are indexed by the concatenation of all the available entities, regardless of their type, whereas the third mode represents the different types of relations – i.e., each frontal slab of the third-order tensor represents a certain interaction type between the entities of the KG. The methods in [25, 2] work with incomplete tensors, whereas [20, 12, 23] model each frontal slab as an adjacency matrix. To be more precise, let \underline{\bm{Z}}\in\{0,1\}^{L_{e}\times L_{e}\times K_{r}} be the third-order tensor in [20, 12, 23]. Then \underline{\bm{Z}}(i,j,k)=1 if entity i interacts with entity j through relation k, and \underline{\bm{Z}}(i,j,k)=0 if there is no interaction between entities i and j via relation k.

An important observation is that although the first and second modes of tensor \underline{\bm{Z}} represent the same entities, each frontal slab \bm{Z}^{k} is not necessarily symmetric. The reason is that subject-predicate-object does not necessarily imply object-predicate-subject. The works in [20, 12] compute the CPD of \underline{\bm{Z}} (or scaled versions of \underline{\bm{Z}}) and produce two embeddings for each entity, one as a subject and another as an object. Although this is not always a drawback, it can result in an overparametrized model, because in many applications entities usually act either as a subject or as an object, but not both. Furthermore, a single unified representation is usually preferable. In order to overcome this issue, RESCAL [23] proposed the following model for each frontal slab:

(3.10) 𝒁k=𝑨𝑹k𝑨T,k=1,,Kr,\bm{Z}^{k}=\bm{A}\bm{R}^{k}\bm{A}^{T},~{}~{}~{}k=1,\dots,K_{r},

where \bm{R}^{k}\in\mathbb{R}^{F\times F} is a square matrix holding the relation embeddings associated with relation k. Note that the RESCAL model differs from the traditional CPD (symmetric in modes 1 and 2) in that \bm{R}^{k} is not constrained to be diagonal. Relaxing the diagonal constraint allows matrix \bm{R}^{k} to absorb into the relation embedding the direction in which different entities interact. On the downside, this type of relaxation forfeits the parsimony and uniqueness properties of the CPD. This is an important point, since uniqueness is a prerequisite for model interpretability when we are interested in exploratory / explanatory analysis (and not simply in making ‘black box’ predictions).

Another important drawback of the three-way model is that it models unnecessary interactions. To see this, consider a KG that describes interactions between genes and diseases. Suppose that the observed interactions are of gene-gene and gene-disease type, but there are no available data for disease-disease interactions. The three-way model involves disease-disease interactions in the learning process (as non-edges), even though there are no data to justify it. As we will see in the upcoming section, our proposed coupled tensor-matrix modeling addresses all the aforementioned challenges.

4 The TeX-Graph model

Refer to caption
Figure 4: Schematic representation of TeX-Graph model.

In this paper we leverage coupled tensor-matrix factorization to extract low dimensional representations of entities (heads, tails) as well as representations of the interactions (relations). KGs can be naturally represented by a collection of tensors and matrices, as shown in Fig. 4. To see this, consider the previous example of gene, compound, and disease entities. Gene-compound interactions of a certain type can be represented by an adjacency matrix. Since there are multiple types of interactions, multiple adjacency matrices are necessary to model every interaction, resulting in a tensor \underline{\bm{X}}_{g,c}\in\{0,1\}^{L_{g}\times L_{c}\times K_{g,c}}, where L_{g},~L_{c} are the numbers of genes and compounds respectively, and K_{g,c} is the number of different interactions between genes and compounds. The same idea can be applied to any (entity, interaction, entity) triplet.

To facilitate the discussion, let \underline{\bm{X}}_{m,n}\in\{0,1\}^{L_{m}\times L_{n}\times K_{m,n}} be the tensor of interactions between entities of type-m and type-n, e.g., m codifies genes and n codifies compounds. Also let L_{T} be the total number of different entity types; then m,n\in\{1,\dots,L_{T}\}. We set \underline{\bm{X}}_{m,n}(i,j,k)=1 if the i-th entity of type-m interacts with the j-th entity of type-n via relation k, and \underline{\bm{X}}_{m,n}(i,j,k)=0 if there is no type-k interaction between the i-th entity of type-m and the j-th entity of type-n. The KG is represented by a collection of tensors as:

(4.11) \underline{\bm{X}}_{m,n}\in\{0,1\}^{L_{m}\times L_{n}\times K_{m,n}},~(m,n)\in\mathcal{S},
\mathcal{S}=\{(m,n):~m\leq n,~\exists~(h,r,t)~\text{with}~(h,t)~\text{of type}~(m,n)~\text{or}~(n,m)\},

where \sum_{n=1}^{L_{T}}L_{n}=L_{e} and \sum_{(m,n)\in\mathcal{S}}K_{m,n}=K_{r}. Note that tensors \underline{\bm{X}}_{m,n} and \underline{\bm{X}}_{n,m} contain the same information, since \bm{X}_{m,n}^{k}=\bm{X}_{n,m}^{k^{T}}. Therefore we only consider (m,n) tuples with m\leq n.
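Assembling the collection in (4.11) from a triplet list is mechanical. The sketch below uses a toy vocabulary of entity and relation types (placeholders, not the DRKG schema) and flips transposed triplets so that only the m \leq n tensors are stored:

```python
import numpy as np

# Sketch: build the binary tensors X_{m,n} of (4.11) from (head, relation,
# tail) triplets. Entities carry a (type, index) pair; all names are toy
# placeholders. Only one tensor per unordered type pair is stored.
triplets = [
    (("gene", 0), "g-g-reg", ("gene", 1)),
    (("gene", 1), "g-c-bind", ("compound", 0)),
    (("compound", 1), "g-c-bind", ("gene", 0)),   # transposed orientation
]
n_entities = {"gene": 2, "compound": 2}
relations = {("gene", "gene"): ["g-g-reg"], ("gene", "compound"): ["g-c-bind"]}

tensors = {}
for (m, n), rels in relations.items():
    tensors[(m, n)] = np.zeros((n_entities[m], n_entities[n], len(rels)))

for (tm, i), rel, (tn, j) in triplets:
    # Map the pair to its stored orientation, transposing indices if needed.
    key, flip = ((tm, tn), False) if (tm, tn) in tensors else ((tn, tm), True)
    if flip:
        i, j = j, i
    k = relations[key].index(rel)
    tensors[key][i, j, k] = 1.0       # X_{m,n}(i,j,k) = 1 for observed triplet

assert tensors[("gene", "compound")][1, 0, 0] == 1.0
assert tensors[("gene", "compound")][0, 1, 0] == 1.0
```

The flip step implements \bm{X}_{m,n}^{k}=\bm{X}_{n,m}^{k^{T}}: a triplet stated in the (n, m) orientation lands in the same stored tensor with its indices swapped.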

Each of the tensors in the array {𝑿¯m,n,(m,n)𝒮}\{\underline{\bm{X}}_{m,n},~{}(m,n)\in\mathcal{S}\} admits a CPD and the overall model is cast as:

(4.12) 𝑿¯m,n=𝑨m,𝑨n,𝑪m,n,(m,n)𝒮,\underline{\bm{X}}_{m,n}=\left\llbracket{\bm{A}_{m}},{\bm{A}_{n}},{\bm{C}_{m,n}}\right\rrbracket,~{}(m,n)\in\mathcal{S},

where \bm{A}_{n}\in\mathbb{R}^{L_{n}\times F},~\bm{C}_{m,n}\in\mathbb{R}^{K_{m,n}\times F}. The i-th row of \bm{A}_{n} represents the F-dimensional embedding of the i-th type-n entity, and the k-th row of \bm{C}_{m,n} represents the F-dimensional embedding of the k-th type of relation between type-m and type-n entities. Note that in the case where entities of type-m interact with entities of type-n via only one type of relation, \bm{X}_{m,n}\in\{0,1\}^{L_{m}\times L_{n}} is a matrix and can be factored as:

(4.13) 𝑿m,n=𝑨mdiag(𝒄m,n)𝑨nT{\bm{X}}_{m,n}={\bm{A}_{m}}\text{diag}(\bm{c}_{m,n}){\bm{A}_{n}}^{T}

The model in (4.12) is a coupled CPD, as the factors \bm{A}_{n} appear in multiple tensors. For instance, type-1-type-1 interactions (gene-gene), type-1-type-2 interactions (gene-compound), and type-1-type-3 interactions (gene-disease) result in the factor \bm{A}_{1} appearing in tensors \underline{\bm{X}}_{1,1}=\left\llbracket\bm{A}_{1},\bm{A}_{1},\bm{C}_{1,1}\right\rrbracket, \underline{\bm{X}}_{1,2}=\left\llbracket\bm{A}_{1},\bm{A}_{2},\bm{C}_{1,2}\right\rrbracket, and \underline{\bm{X}}_{1,3}=\left\llbracket\bm{A}_{1},\bm{A}_{3},\bm{C}_{1,3}\right\rrbracket.

The proposed TeX-Graph model exhibits several favorable properties. First, the produced embeddings are unique, provided that the corresponding entities appear in more than one adjacency matrix.

Proposition 1

(Uniqueness of the embeddings) Suppose the coupled tensor model in (4.12) is indeed low-rank F, i.e., there exist entity and relation embedding vectors in F-dimensional space that generate the given knowledge base. Then the F-dimensional TeX-Graph embeddings for type-n entities and type-(m,n) relations are unique and permutation invariant, provided that \sum_{m\in\mathcal{S}_{n}^{+}}K_{m,n}+\sum_{p\in\mathcal{S}_{n}^{-}}K_{n,p}>1 and K_{m,n}>1, respectively, where \mathcal{S}_{n}^{+},~\mathcal{S}_{n}^{-} are defined in (4.16).

The proof of Proposition 1 utilizes the uniqueness results of Theorem 1 and is relegated to the journal version due to space limitations. In the case where K_{m,n}=1 and type-m entities appear in multiple tensors but type-n entities only in one, the TeX-Graph model identifies \bm{A}_{m} and \bm{A}_{n}\text{diag}(\bm{c}_{m,n}), since there is rotational freedom between \bm{A}_{n} and \bm{c}_{m,n}.

Another important property of the proposed TeX-Graph model is that it avoids modeling spurious ‘cross-product’ relations that can never be observed. The coupled tensor-matrix model allows for a concise KG representation that eliminates such spurious relations from the start, contrary to the three-way model. To see this, consider the previous example of a gene-disease KG with observed relational triplets of gene-gene and gene-disease type, but not of disease-disease type. The proposed TeX-Graph does not model disease-disease interactions, whereas the three-way model treats them as non-edges.

It is worth noting that TeX-Graph makes the implicit assumption that the tensors \underline{\bm{X}}_{n,n} are symmetric in the first and second modes. This is not always the case, since interactions between entities of the same type might be directed. To overcome this issue we assume that (h, r, t) implies (t, r, h) for (h, t) of the same type. Although this assumption ignores the direction of such interactions, it results in a more parsimonious model for the entity embeddings.

4.1 Algorithmic framework

In order to learn the F-dimensional embeddings of all entities and relations we formulate the KG embedding problem as:

(4.14) min{𝑨m},{𝑪m,n}(m,n)𝒮𝑿¯m,n𝑨m,𝑨n,𝑪m,nF2,\displaystyle\operatorname*{min}_{\begin{subarray}{c}\{{\bm{A}}_{m}\},\{{\bm{C}}_{m,n}\}\end{subarray}}~{}\sum_{(m,n)\in\mathcal{S}}\left\|{\underline{\bm{X}}}_{m,n}-\llbracket{\bm{A}_{m}},{\bm{A}_{n}},{\bm{C}_{m,n}}\rrbracket\right\|_{F}^{2},

The problem in (4.14) is non-convex and NP-hard in general. In order to tackle it we propose to fix all variables but one and update the remaining variable. This procedure is repeated in an alternating fashion. The update for \bm{A}_{n} is the solution of a system of linear equations and takes the form:

(4.15) m𝒮n+(𝑪m,n𝑨m)T𝑿m,n(2)+p𝒮n(𝑪n,p𝑨p)T𝑿n,p(1)=\sum_{m\in\mathcal{S}_{n}^{+}}\left(\bm{C}_{m,n}\odot\bm{A}_{m}\right)^{T}\bm{X}_{m,n}^{(2)}+\sum_{p\in\mathcal{S}_{n}^{-}}\left(\bm{C}_{n,p}\odot\bm{A}_{p}\right)^{T}\bm{X}_{n,p}^{(1)}=
(m𝒮n+(𝑪m,nT𝑪m,n𝑨mT𝑨m)+p𝒮n(𝑪n,pT𝑪n,p𝑨pT𝑨p))𝑨nT,\left(\sum_{m\in\mathcal{S}_{n}^{+}}\left(\bm{C}_{m,n}^{T}\bm{C}_{m,n}\ast\bm{A}_{m}^{T}\bm{A}_{m}\right)+\sum_{p\in\mathcal{S}_{n}^{-}}\left(\bm{C}_{n,p}^{T}\bm{C}_{n,p}\ast\bm{A}_{p}^{T}\bm{A}_{p}\right)\right)\bm{A}_{n}^{T},

where

(4.16) 𝒮n+={m:(m,n)𝒮},𝒮n={p:(n,p)𝒮}\mathcal{S}_{n}^{+}=\{m:(m,n)\in\mathcal{S}\},~{}\mathcal{S}_{n}^{-}=\{p:(n,p)\in\mathcal{S}\}

The update for \bm{C}_{m,n} is the solution to the following system of linear equations:

(4.17) (𝑨n𝑨m)T𝑿m,n(3)=(𝑨nT𝑨n𝑨mT𝑨m)𝑪m,nT\left(\bm{A}_{n}\odot\bm{A}_{m}\right)^{T}\bm{X}_{m,n}^{(3)}=\left(\bm{A}_{n}^{T}\bm{A}_{n}\ast\bm{A}_{m}^{T}\bm{A}_{m}\right)\bm{C}_{m,n}^{T}

The derivations for these updates as well as implementation details are presented in Appendix A.
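As a numerical sanity check of the \bm{C}_{m,n} update (4.17) on a synthetic low-rank tensor (random placeholder factors, toy dimensions), the normal-equations solve recovers the true relation factor exactly when the other factors are held at their true values:

```python
import numpy as np

# Check of (4.17): with A_m, A_n fixed, C_{m,n} solves the linear system
# (A_n^T A_n * A_m^T A_m) C^T = (A_n ⊙ A_m)^T X^(3). Synthetic data only.
def khatri_rao(P, Q):                       # columnwise Kronecker product (⊙)
    return np.einsum('pf,qf->pqf', P, Q).reshape(-1, P.shape[1])

rng = np.random.default_rng(2)
Lm, Ln, Kmn, F = 6, 5, 4, 3
Am = rng.standard_normal((Lm, F))
An = rng.standard_normal((Ln, F))
C_true = rng.standard_normal((Kmn, F))
X = np.einsum('if,jf,kf->ijk', Am, An, C_true)     # exactly rank-F tensor

X3 = X.transpose(1, 0, 2).reshape(Ln * Lm, Kmn)    # mode-3 unfolding
gram = (An.T @ An) * (Am.T @ Am)                   # Hadamard of Grams, F x F
mttkrp = khatri_rao(An, Am).T @ X3                 # F x K_{m,n}
C_hat = np.linalg.solve(gram, mttkrp).T
assert np.allclose(C_hat, C_true)
```

Note that the right-hand-side Gram computation is only F x F, so the cost of each update is dominated by the MTTKRP term, consistent with the complexity analysis in Section 4.2.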

The proposed TeX-Graph method is presented in Algorithm 1. TeX-Graph is an iterative algorithm that tackles a problem which is non-convex and NP-hard in general. As a result, different initial points might produce different results. Although we have observed that random initialization is sufficient most of the time, we propose an alternative initialization procedure that yields consistent and reproducible results. To be more specific, we form a symmetric version of tensor \underline{\bm{Z}} as:

(4.18) 𝒀¯(i,j,k)=min{1,𝒁¯(i,j,k)+𝒁¯(j,i,k)}\underline{\bm{Y}}(i,j,k)=\min\{1,\underline{\bm{Z}}(i,j,k)+\underline{\bm{Z}}(j,i,k)\}

Then we compute the semi-symmetric CPD \underline{\bm{Y}}=\left\llbracket{\bm{A}},{\bm{A}},{\bm{C}}\right\rrbracket using sparse eigenvalue decomposition (EVD) [27]. The proposed initialization procedure is presented in Algorithm 2.
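The symmetrization step (4.18) itself is a one-liner; the sketch below illustrates it on a random toy binary tensor (the subsequent semi-symmetric CPD via sparse EVD is not reproduced here):

```python
import numpy as np

# The symmetrization (4.18): Y is the elementwise-clipped sum of Z and its
# slab-wise transpose, so every frontal slab of Y is symmetric. Toy data.
rng = np.random.default_rng(6)
Z = (rng.random((4, 4, 2)) < 0.3).astype(float)     # toy binary KG tensor
Y = np.minimum(1.0, Z + Z.transpose(1, 0, 2))       # clip entries at 1

assert np.allclose(Y, Y.transpose(1, 0, 2))          # frontal slabs symmetric
assert set(np.unique(Y)).issubset({0.0, 1.0})        # Y stays binary
```

Symmetrizing preserves the binary adjacency interpretation while making each frontal slab amenable to the EVD-based initialization.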

Algorithm 1 TeX-Graph
  Input: \{(h_{n},r_{n},t_{n})\}_{n=1}^{N}, initial \{\bm{A}_{m}\},~\{\bm{C}_{m,n}\}.
  Output: \{\bm{A}_{n}\}_{n=1}^{L_{T}},~\{\bm{C}_{m,n}\}_{(m,n)\in\mathcal{S}}.
  Create \{\underline{\bm{X}}_{m,n}\}_{(m,n)\in\mathcal{S}} from \{(h_{n},r_{n},t_{n})\}_{n=1}^{N}
  repeat
     for n\in\{1,\dots,L_{T}\} do
         \bm{A}_{n}\leftarrow solve (4.15)
     end for
     for (m,n)\in\mathcal{S} do
         \bm{C}_{m,n}\leftarrow solve (4.17)
     end for
  until criterion is met.
Algorithm 2 TeX-Graph-initialization
  Input: \{(h_{n},r_{n},t_{n})\}_{n=1}^{N}.
  Output: \{\bm{A}_{n}\}_{n=1}^{L_{T}},~\{\bm{C}_{m,n}\}_{(m,n)\in\mathcal{S}}.
  Create tensor \underline{\bm{Z}} from \{(h_{n},r_{n},t_{n})\}_{n=1}^{N} as explained in Section 3.2;
  Form \underline{\bm{Y}} as: \underline{\bm{Y}}(i,j,k)=\min\{1,\underline{\bm{Z}}(i,j,k)+\underline{\bm{Z}}(j,i,k)\};
  Solve \underline{\bm{Y}}=\left\llbracket{\bm{A}},{\bm{A}},{\bm{C}}\right\rrbracket via sparse EVD;
  Form \{\bm{A}_{n}\}_{n=1}^{L_{T}},~\{\bm{C}_{m,n}\}_{(m,n)\in\mathcal{S}} from \bm{A},~\bm{C}.

4.2 Computational complexity analysis

In terms of memory requirements and computational complexity, the main bottleneck of TeX-Graph lies in instantiating and computing the matricized tensor times Khatri-Rao product (MTTKRP) on the left-hand side (LHS) of (4.15) and (4.17). The number of flops needed to compute the LHS of (4.15) and (4.17) is \mathcal{O}\left(F\cdot\text{nnz}\left(\sum_{m\in\mathcal{S}_{n}^{+}}\underline{\bm{X}}_{m,n}+\sum_{p\in\mathcal{S}_{n}^{-}}\underline{\bm{X}}_{n,p}\right)\right) and \mathcal{O}\left(F\cdot\text{nnz}\left(\underline{\bm{X}}_{m,n}\right)\right), respectively. For small values of F, which is usually the case in practice, the complexity is linear in the number of triplets participating in each update. Furthermore, the Khatri-Rao products on the LHS of (4.15) and (4.17) are never explicitly instantiated, as shown in Appendix A.
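A sketch of the sparse MTTKRP that this complexity argument relies on (toy dimensions and nonzeros; one standard way to realize the computation, not necessarily the exact implementation in Appendix A): each nonzero contributes a single F-flop rank-one row update, and the Khatri-Rao matrix is never formed.

```python
import numpy as np

# Sparse mode-3 MTTKRP: M[k,:] = sum over nonzeros (i,j,k) of A[i,:] * B[j,:].
# Cost is O(F * nnz); the JK x F Khatri-Rao product is never instantiated.
def mttkrp_mode3_sparse(nonzeros, A, B, K):
    F = A.shape[1]
    M = np.zeros((K, F))
    for i, j, k in nonzeros:
        M[k] += A[i] * B[j]          # Hadamard of two rows: F flops
    return M

rng = np.random.default_rng(3)
I, J, K, F = 5, 6, 3, 2
A, B = rng.standard_normal((I, F)), rng.standard_normal((J, F))
nz = [(0, 0, 0), (1, 2, 0), (4, 5, 2)]
X = np.zeros((I, J, K))
for i, j, k in nz:
    X[i, j, k] = 1.0

# Dense reference: (B ⊙ A)^T times the mode-3 unfolding, as in (4.17).
kr = np.einsum('jf,if->jif', B, A).reshape(J * I, F)
M_dense = kr.T @ X.transpose(1, 0, 2).reshape(J * I, K)
assert np.allclose(mttkrp_mode3_sparse(nz, A, B, K), M_dense.T)
```

Production implementations vectorize the loop (e.g., via fancy indexing or sparse matrix products), but the flop count per nonzero is the same.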

5 Drug Repurposing for COVID-19

In this section we apply TeX-Graph to a recently developed KG [17] in order to perform drug repurposing for COVID-19. All algorithms were implemented in MATLAB or Python, and executed on a Linux server comprising 32 cores at 2 GHz and 128 GB RAM.

5.1 Data

The dataset used in the experiments is the Drug Repurposing Knowledge Graph (DRKG) [17] (github.com/gnn4dr/DRKG). It codifies triplets of biological interactions between 97,238 different entities of 13 types, namely genes, compounds, diseases, anatomy, tax, biological processes, cellular components, pathways, molecular functions, anatomical therapeutic chemical (Atc) classes, side effects, pharmacological classes, and symptoms. The total number of triplets is 5,874,258, and there are 107 different types of interactions. The KG is organized into 6 adjacency tensors and 11 adjacency matrices. A detailed description of the dataset and the modeling can be found in Table 3. Each row denotes a different adjacency tensor or matrix; # type-m entities, # type-n entities, and # relation types correspond to the dimensions of mode 1, mode 2, and mode 3, respectively. The last column (sparsity) reports the sparsity of each tensor, i.e., \frac{\text{nnz}\left(\underline{\bm{X}}_{m,n}\right)}{L_{m}L_{n}K_{m,n}}.

Table 3: Coupled tensor-matrix DRKG modeling.
entity type-m entity type-n # type-m entities # type-n entities # relation types tensor sparsity
Gene Gene 39,220 39,220 32 \underline{\bm{X}}_{1,1}=\left\llbracket\bm{A}_{1},\bm{A}_{1},\bm{C}_{1,1}\right\rrbracket 6.12\times 10^{-5}
Compound 39,220 24,313 34 \underline{\bm{X}}_{1,2}=\left\llbracket\bm{A}_{1},\bm{A}_{2},\bm{C}_{1,2}\right\rrbracket 6.50\times 10^{-6}
Disease 39,220 5,103 15 \underline{\bm{X}}_{1,3}=\left\llbracket\bm{A}_{1},\bm{A}_{3},\bm{C}_{1,3}\right\rrbracket 4.13\times 10^{-5}
Anatomy 39,220 400 3 \underline{\bm{X}}_{1,4}=\left\llbracket\bm{A}_{1},\bm{A}_{4},\bm{C}_{1,4}\right\rrbracket 0.0154
Tax 39,220 215 1 \bm{X}_{1,5}=\bm{A}_{1}\text{diag}(\bm{c}_{1,5})\bm{A}_{5}^{T} 0.0017
Biological Process 39,220 11,381 1 \bm{X}_{1,6}=\bm{A}_{1}\text{diag}(\bm{c}_{1,6})\bm{A}_{6}^{T} 0.0013
Cellular Component 39,220 1,391 1 \bm{X}_{1,7}=\bm{A}_{1}\text{diag}(\bm{c}_{1,7})\bm{A}_{7}^{T} 0.0013
Pathway 39,220 1,822 1 \bm{X}_{1,8}=\bm{A}_{1}\text{diag}(\bm{c}_{1,8})\bm{A}_{8}^{T} 0.0012
Molecular Function 39,220 2,884 1 \bm{X}_{1,9}=\bm{A}_{1}\text{diag}(\bm{c}_{1,9})\bm{A}_{9}^{T} 8.6\times 10^{-4}
Compound Compound 24,313 24,313 2 \underline{\bm{X}}_{2,2}=\left\llbracket\bm{A}_{2},\bm{A}_{2},\bm{C}_{2,2}\right\rrbracket 0.0023
Disease 24,313 5,103 10 \underline{\bm{X}}_{2,3}=\left\llbracket\bm{A}_{2},\bm{A}_{3},\bm{C}_{2,3}\right\rrbracket 6.76\times 10^{-5}
Atc 24,313 4,048 1 \bm{X}_{2,10}=\bm{A}_{2}\text{diag}(\bm{c}_{2,10})\bm{A}_{10}^{T} 1.6\times 10^{-4}
Side Effect 24,313 5,701 1 \bm{X}_{2,11}=\bm{A}_{2}\text{diag}(\bm{c}_{2,11})\bm{A}_{11}^{T} 0.0010
Pharmacological Class 24,313 345 1 \bm{X}_{2,12}=\bm{A}_{2}\text{diag}(\bm{c}_{2,12})\bm{A}_{12}^{T} 1.22\times 10^{-4}
Disease Disease 5,103 5,103 1 \bm{X}_{3,3}=\bm{A}_{3}\text{diag}(\bm{c}_{3,3})\bm{A}_{3}^{T} 4.17\times 10^{-5}
Anatomy 5,103 400 1 \bm{X}_{3,4}=\bm{A}_{3}\text{diag}(\bm{c}_{3,4})\bm{A}_{4}^{T} 0.0018
Symptom 5,103 415 1 \bm{X}_{3,13}=\bm{A}_{3}\text{diag}(\bm{c}_{3,13})\bm{A}_{13}^{T} 0.0016
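The sparsity column of Table 3 is simply nnz / (L_m L_n K_{m,n}); a one-line helper (illustrative name) makes the convention explicit:

```python
def tensor_sparsity(nnz, L_m, L_n, K):
    """Fraction of nonzero entries of an L_m x L_n x K adjacency tensor,
    as reported in the last column of Table 3."""
    return nnz / (L_m * L_n * K)

# E.g., a 10 x 10 x 10 tensor with 10 nonzeros has sparsity 0.01.
# For the Gene-Gene tensor (39,220 x 39,220 x 32), the reported sparsity
# of 6.12e-5 corresponds to roughly 3.0e6 nonzero triplets.
```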

5.2 Procedure

Drug repurposing refers to the task of discovering existing drugs that can effectively manage certain diseases; in our study, COVID-19. DRKG codifies relational triplets of the form (compound, treats, disease) and (compound, inhibits, disease). Therefore, drug repurposing in the context of DRKG boils down to predicting new ‘treats’ and ‘inhibits’ edges (links) between compounds and diseases of interest.

We follow the evaluation procedure proposed in [17]. In the training phase we learn low-dimensional representations for the entities and relations, using all the edges in DRKG. In the testing phase, we assign a score to (compound, treats, disease) and (compound, inhibits, disease) triplets according to the scoring function used for training. For the proposed TeX-Graph, the scores assigned to the triplets (hyper-edges) (compound i, treats, disease j) and (compound i, inhibits, disease j) are:

\text{score}_{i,j,2}=\bm{A}_{2}(i,:)\text{diag}\left(\bm{C}_{2,3}\left(2,:\right)\right)\bm{A}_{3}(j,:)^{T},
\text{score}_{i,j,9}=\bm{A}_{2}(i,:)\text{diag}\left(\bm{C}_{2,3}\left(9,:\right)\right)\bm{A}_{3}(j,:)^{T},

since the ‘treats’ and ‘inhibits’ relations correspond to the second and ninth frontal slabs of \underline{\bm{X}}_{2,3}, respectively. The testing set consists of 34 coronavirus-related diseases, including SARS, MERS, and SARS-CoV-2, and 8,103 FDA-approved drugs in DrugBank. Drugs with molecular weight less than 250 daltons are excluded from testing. Ribavirin was also excluded from the testing set, since there exists a ‘treats’ edge in the training set between Ribavirin and a target disease. To evaluate the performance of the proposed TeX-Graph and the alternatives, we retrieve the top-100 drugs that appear in the highest-scoring test (hyper-)edges. These are the proposed candidate drugs for COVID-19. We then assess how many of the 32 clinical trial drugs (www.covid19-trials.com; Ribavirin is excluded) appear among the proposed top-100 candidates.
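For concreteness, the scoring and top-k retrieval can be sketched as follows (illustrative NumPy; the aggregation of scores over the 34 target diseases and the two relations is an assumption here, taking each compound's best score, and may differ in detail from [17]):

```python
import numpy as np

def drug_scores(A2, A3, C23, disease_ids, rel_idx):
    """score(i, j) = A2[i] . diag(C23[rel_idx]) . A3[j]^T for every compound i
    against the target disease set; keep each compound's best score."""
    S = (A2 * C23[rel_idx]) @ A3[disease_ids].T   # (#compounds, #diseases)
    return S.max(axis=1)

def top_candidates(A2, A3, C23, disease_ids, k=100):
    # 'treats' and 'inhibits' are the 2nd and 9th slabs (0-based: 1 and 8);
    # combine the two relations by taking each compound's best score.
    s = np.maximum(drug_scores(A2, A3, C23, disease_ids, 1),
                   drug_scores(A2, A3, C23, disease_ids, 8))
    return np.argsort(-s)[:k]
```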

5.3 Methods

The methods used in the experiments are:

  • TeX-Graph. The proposed TeX-Graph algorithm, initialized with Algorithm 2. The embedding dimension is set to F=50 and the algorithm runs for 10 iterations.

  • TransE-DRKG [6, 17]. TransE learns low-dimensional KG embeddings using the score function shown in Table 2. For the task of drug repurposing we use the specifications proposed in [17]. The l_{2} norm is chosen in the score function, and training is performed using the deep graph library for knowledge graphs [35]. To evaluate the performance of TransE-DRKG on the drug repurposing task we used the 400-dimensional pretrained embeddings from [17], which produced better drug repurposing results than the stand-alone code without pretraining.

  • 3-way KG embeddings (3-way KGE). We add as a baseline the embeddings produced by computing the CPD of the tensor \underline{\bm{Y}} in (4.18). Recall that we use an algebraic CPD of \underline{\bm{Y}} to initialize TeX-Graph. In 3-way KGE we initialize with the same procedure and then run 10 alternating least-squares iterations to compute the CPD of \underline{\bm{Y}}. 3-way KGE is tested with F=50.

5.4 Results

Table 4 shows the clinical trial drugs that appear in the top-100 recommendations, along with their [rank-order]. The proposed approach retrieves 10 clinical trial drugs in the top-100 positions, and 7 in the top-50. Compared to TransE-DRKG, the first algorithm proposed for COVID-19 drug repurposing, TeX-Graph achieves 75\% and 100\% improvement in precision in the top-50 and top-100, respectively.

It is worth emphasizing that the proposed TeX-Graph retrieves approximately 1/3 of the COVID-19 clinical trial drugs in the top-100, out of a testing set of 8,103 drugs. This result is remarkable and can help cut down the immense search space of medical research. For instance, consider the case of Dexamethasone, which is retrieved by TeX-Graph in the top-ranked position (it achieved the highest score among all 8,103 drugs). At the onset of the pandemic, the initial guidance for Dexamethasone and other corticosteroids was indecisive. Guidelines from different sources issued either a weak recommendation to use Dexamethasone (with the caveat that further evidence was required) or a weak recommendation against corticosteroids and Dexamethasone [24]. However, recent results indicate that treatment with Dexamethasone reduces mortality in patients with COVID-19 [16]. The results of TeX-Graph align with the latest evidence and rank Dexamethasone as the top recommended drug. This suggests that our data-driven approach could have contributed to overturning the initial hesitancy to administer Dexamethasone as a first-line treatment.

Table 4: Proposed candidate drugs for COVID-19
TeX-Graph TransE-DRKG 3-way KGE
F=50 F=400 F=50
Dexamethasone [1] Dexamethasone [4] Oseltamivir [89]
Methylprednisolone [6] Colchicine [8]
Azithromycin [13] Methylprednisolone [16]
Thalidomide [18] Oseltamivir [49]
Losartan [41] Deferoxamine [87]
Hydroxychloroquine [47]
Colchicine [48]
Oseltamivir [60]
Chloroquine [68]
Deferoxamine [88]

6 Conclusion

In this paper we proposed a novel coupled tensor-matrix framework for knowledge graph embedding. The proposed model is principled and enjoys several favorable properties, including parsimony and uniqueness. The developed algorithmic framework admits lightweight updates and can handle very large graphs. Finally, the proposed TeX-Graph showed very promising results in a timely application to drug repurposing, a task of paramount importance in the fight against COVID-19.

7 Acknowledgements

The authors would like to acknowledge Ioanna Papadatou, M.D., Ph.D., for contributing to the medical assessment of the produced results.

A Appendix: TeX-Graph updates

TeX-Graph solves the following problem

(A.1) min{𝑨m},{𝑪m,n}(m,n)𝒮𝑿¯m,n𝑨m,𝑨n,𝑪m,nF2.\displaystyle\operatorname*{min}_{\begin{subarray}{c}\{{\bm{A}}_{m}\},\{{\bm{C}}_{m,n}\}\end{subarray}}~{}\sum_{(m,n)\in\mathcal{S}}\left\|{\underline{\bm{X}}}_{m,n}-\llbracket{\bm{A}_{m}},{\bm{A}_{n}},{\bm{C}_{m,n}}\rrbracket\right\|_{F}^{2}.

Then the update for 𝑨n\bm{A}_{n} is the solution of:

min𝑨n\displaystyle\operatorname*{min}_{\begin{subarray}{c}\bm{A}_{n}\end{subarray}}~{} m𝒮n+𝑿¯m,n𝑨m,𝑨n,𝑪m,nF2+\displaystyle\sum_{m\in\mathcal{S}_{n}^{+}}\left\|{\underline{\bm{X}}}_{m,n}-\llbracket{\bm{A}_{m}},{\bm{A}_{n}},{\bm{C}_{m,n}}\rrbracket\right\|_{F}^{2}+
(A.2) p𝒮n𝑿¯n,p𝑨n,𝑨p,𝑪n,pF2,\displaystyle\sum_{p\in\mathcal{S}_{n}^{-}}\left\|{\underline{\bm{X}}}_{n,p}-\llbracket{\bm{A}_{n}},{\bm{A}_{p}},{\bm{C}_{n,p}}\rrbracket\right\|_{F}^{2},

where \mathcal{S}_{n}^{+},\mathcal{S}_{n}^{-} are defined in (4.16). Problem (A.2) can be written as:

min𝑨n\displaystyle\operatorname*{min}_{\begin{subarray}{c}\bm{A}_{n}\end{subarray}}~{} m𝒮n+𝑿m,n(1)(𝑪m,n𝑨m)𝑨nTF2+\displaystyle\sum_{m\in\mathcal{S}_{n}^{+}}\left\|\bm{{X}}_{m,n}^{(1)}-\left({\bm{C}_{m,n}}\odot{\bm{A}_{m}}\right){\bm{A}_{n}}^{T}\right\|_{F}^{2}+
(A.3) p𝒮n𝑿n,p(2)(𝑪n,p𝑨p)𝑨nTF2.\displaystyle\sum_{p\in\mathcal{S}_{n}^{-}}\left\|\bm{{X}}_{n,p}^{(2)}-\left({\bm{C}_{n,p}}\odot{\bm{A}_{p}}\right){\bm{A}_{n}}^{T}\right\|_{F}^{2}.

Taking the gradient of (A.3) with respect to \bm{A}_{n} and setting it to zero yields the equation in (4.15). The main bottleneck of (4.15) in terms of memory requirements and computational complexity is instantiating the Khatri-Rao products \left(\bm{C}_{n,p}\odot\bm{A}_{p}\right),\left(\bm{C}_{m,n}\odot\bm{A}_{m}\right) and computing the MTTKRPs \left(\bm{C}_{n,p}\odot\bm{A}_{p}\right)^{T}\bm{X}_{n,p}^{(2)},~\left(\bm{C}_{m,n}\odot\bm{A}_{m}\right)^{T}\bm{X}_{m,n}^{(1)}. We focus on the computation of:

(A.4) \left(\bm{C}_{n,p}\odot\bm{A}_{p}\right)^{T}\bm{X}_{n,p}^{(2)}.

Equation (A.4) can be equivalently written as:

\begin{bmatrix}\bm{A}_{p}\text{diag}\left(\bm{C}_{n,p}(1,:)\right)\\ \vdots\\ \bm{A}_{p}\text{diag}\left(\bm{C}_{n,p}(K_{n,p},:)\right)\\ \end{bmatrix}^{T}\begin{bmatrix}\bm{X}^{1^{T}}_{n,p}\\ \vdots\\ \bm{X}^{K_{n,p}^{T}}_{n,p}\end{bmatrix}=
(A.5) \sum_{k=1}^{K_{n,p}}\text{diag}\left(\bm{C}_{n,p}(k,:)\right)\bm{A}_{p}^{T}\bm{X}_{n,p}^{k^{T}}.

It is clear from equation (A.5) that \left(\bm{C}_{n,p}\odot\bm{A}_{p}\right) need not be instantiated. Furthermore, the number of flops to compute (A.5) is \mathcal{O}(F\cdot\text{nnz}(\underline{\bm{X}}_{n,p})). Note that computing \left(\bm{C}_{m,n}\odot\bm{A}_{m}\right)^{T}\bm{X}_{m,n}^{(1)} differs only in that the frontal slabs are not transposed, and is thus omitted.

The update for 𝑪m,n{\bm{C}_{m,n}} is the solution of:

(A.6) \operatorname*{min}_{\bm{C}_{m,n}}~\left\|{\underline{\bm{X}}}_{m,n}-\llbracket{\bm{A}_{m}},{\bm{A}_{n}},{\bm{C}_{m,n}}\rrbracket\right\|_{F}^{2},

or equivalently:

(A.7) \operatorname*{min}_{\bm{C}_{m,n}}~\left\|{{\bm{X}}}_{m,n}^{(3)}-\left({\bm{A}_{m}}\odot{\bm{A}_{n}}\right){\bm{C}_{m,n}}^{T}\right\|_{F}^{2}.

Taking the gradient of (A.7) with respect to \bm{C}_{m,n} and setting it to zero yields the equation in (4.17). The main memory and computation bottleneck of equation (4.17) is computing the MTTKRP. The formula in (A.5) can be utilized if \bm{C}_{n,p} is replaced by \bm{A}_{n}, \bm{A}_{p} is replaced by \bm{A}_{m}, and the transposed frontal slabs \bm{X}_{m,n}^{k^{T}} are replaced by vertical slabs.
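The least-squares update (A.7) can also be checked numerically: in the noiseless case, solving the normal equations, whose Gram matrix collapses to the Hadamard product of the two factor Gram matrices, recovers the coupling factor exactly (toy dense sketch with illustrative names):

```python
import numpy as np

rng = np.random.default_rng(1)
L_m, L_n, K, F = 6, 5, 4, 3
A_m = rng.standard_normal((L_m, F))
A_n = rng.standard_normal((L_n, F))
C_true = rng.standard_normal((K, F))

# Khatri-Rao product A_m (.) A_n: row (i*L_n + j) equals A_m[i] * A_n[j].
KR = np.repeat(A_m, L_n, axis=0) * np.tile(A_n, (L_m, 1))
X3 = KR @ C_true.T                     # mode-3 unfolding of the model tensor

# Normal equations: the Gram matrix of a Khatri-Rao product is the
# Hadamard product of the factor Gram matrices, so only F x F solves arise.
Gram = (A_m.T @ A_m) * (A_n.T @ A_n)
C_hat = np.linalg.solve(Gram, KR.T @ X3).T

assert np.allclose(C_hat, C_true)
```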

References

  • [1] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. DBpedia: A nucleus for a web of open data. In The semantic web, pages 722–735. Springer, 2007.
  • [2] Ivana Balazevic, Carl Allen, and Timothy Hospedales. TuckER: Tensor factorization for knowledge graph completion. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5188–5197, 2019.
  • [3] Albert-László Barabási et al. Network science. Cambridge university press, 2016.
  • [4] Michele Berlingerio, Michele Coscia, Fosca Giannotti, Anna Monreale, and Dino Pedreschi. Foundations of multidimensional network analysis. In 2011 international conference on advances in social networks analysis and mining, pages 485–489. IEEE, 2011.
  • [5] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1247–1250, 2008.
  • [6] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems, pages 2787–2795, 2013.
  • [7] Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010, pages 177–186. Springer, 2010.
  • [8] Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R Hruschka, and Tom M Mitchell. Toward an architecture for never-ending language learning. In Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010.
  • [9] Luca Chiantini and Giorgio Ottaviani. On generic identifiability of 3-tensors of small rank. SIAM Journal on Matrix Analysis and Applications, 33(3):1018–1037, 2012.
  • [10] Ignat Domanov and Lieven De Lathauwer. On the uniqueness of the canonical polyadic decomposition of third-order tensors — part ii: Uniqueness of the overall decomposition. SIAM Journal on Matrix Analysis and Applications (SIMAX), 34(3):876–903, 2013.
  • [11] David Easley, Jon Kleinberg, et al. Networks, crowds, and markets, volume 8. Cambridge university press Cambridge, 2010.
  • [12] Thomas Franz, Antje Schultz, Sergej Sizov, and Steffen Staab. Triplerank: Ranking semantic web data by tensor decomposition. In International semantic web conference, pages 213–228. Springer, 2009.
  • [13] Richard A Harshman, Margaret E Lundy, et al. PARAFAC: Parallel factor analysis. Computational Statistics and Data Analysis, 18(1):39–72, 1994.
  • [14] Daniel S Himmelstein and Sergio E Baranzini. Heterogeneous network edge prediction: a data integration approach to prioritize disease-associated genes. PLoS computational biology, 11(7), 2015.
  • [15] Daniel Scott Himmelstein, Antoine Lizee, Christine Hessler, Leo Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green, Pouya Khankhanian, and Sergio E Baranzini. Systematic integration of biomedical knowledge prioritizes drugs for repurposing. Elife, 6:e26726, 2017.
  • [16] Peter Horby, Wei Shen Lim, Jonathan R Emberson, Marion Mafham, Jennifer L Bell, Louise Linsell, Natalie Staplin, Christopher Brightling, Andrew Ustianowski, Einas Elmahi, et al. Dexamethasone in hospitalized patients with covid-19-preliminary report. The New England journal of medicine, 2020.
  • [17] Vassilis N. Ioannidis, Xiang Song, Saurav Manchanda, Mufei Li, Xiaoqin Pan, Da Zheng, Xia Ning, Xiangxiang Zeng, and George Karypis. DRKG – Drug Repurposing Knowledge Graph for COVID-19. https://github.com/gnn4dr/DRKG/, 2020.
  • [18] Xueyan Jiang, Volker Tresp, Yi Huang, and Maximilian Nickel. Link prediction in multi-relational graphs using additive models. SeRSy, 919:1–12, 2012.
  • [19] Tamara G Kolda and Brett W Bader. Tensor decompositions and applications. SIAM review, 51(3):455–500, 2009.
  • [20] Tamara G Kolda, Brett W Bader, and Joseph P Kenny. Higher-order web link analysis using multilinear algebra. In Proceedings of Fifth IEEE International Conference on Data Mining, pages 8–pp. IEEE, 2005.
  • [21] Hailun Lin, Yong Liu, Weiping Wang, Yinliang Yue, and Zheng Lin. Learning entity and relation embeddings for knowledge resolution. Procedia Computer Science, 108:345–354, 2017.
  • [22] Mark Newman. Networks. Oxford university press, 2018.
  • [23] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. A three-way model for collective learning on multi-relational data. In ICML, volume 11, pages 809–816, 2011.
  • [24] Hallie C Prescott and Todd W Rice. Corticosteroids in covid-19 ards: evidence and hope during the pandemic. JAMA, 324(13):1292–1295, 2020.
  • [25] Steffen Rendle and Lars Schmidt-Thieme. Pairwise interaction tensor factorization for personalized tag recommendation. In Proceedings of the third ACM international conference on Web search and data mining, pages 81–90, 2010.
  • [26] Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M Marlin. Relation extraction with matrix factorization and universal schemas. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 74–84, 2013.
  • [27] Eugenio Sanchez and Bruce R Kowalski. Tensorial resolution: a direct trilinear decomposition. Journal of Chemometrics, 4(1):29–45, 1990.
  • [28] Nicholas D Sidiropoulos, Lieven De Lathauwer, Xiao Fu, Kejun Huang, Evangelos E Papalexakis, and Christos Faloutsos. Tensor decomposition for signal processing and machine learning. IEEE Transactions on Signal Processing, 65(13):3551–3582, 2017.
  • [29] Amit Singhal. Introducing the knowledge graph: things, not strings. Official google blog, 16, 2012.
  • [30] Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. Reasoning with neural tensor networks for knowledge base completion. In Advances in neural information processing systems, pages 926–934, 2013.
  • [31] Mikael Sørensen, Ignat Domanov, and Lieven De Lathauwer. Coupled canonical polyadic decompositions and (coupled) decompositions in multilinear rank-(L_{r,n},L_{r,n},1) terms—part ii: Algorithms. SIAM Journal on Matrix Analysis and Applications, 36:1015–1045, 2015.
  • [32] Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In Proceedings of the 16th international conference on World Wide Web, pages 697–706, 2007.
  • [33] Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. RotatE: Knowledge graph embedding by relational rotation in complex space. In International Conference on Learning Representations, 2019.
  • [34] Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575, 2014.
  • [35] Da Zheng, Xiang Song, Chao Ma, Zeyuan Tan, Zihao Ye, Jin Dong, Hao Xiong, Zheng Zhang, and George Karypis. DGL-KE: Training knowledge graph embeddings at scale. arXiv preprint arXiv:2004.08532, 2020.