
Promotion/Inhibition Effects in Networks:
A Model with Negative Probabilities

Anqi Dong,  Tryphon T. Georgiou, 
and Allen Tannenbaum
Anqi Dong and Tryphon T. Georgiou are with the Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA 92697; {anqid2,tryphon}@uci.edu. Allen Tannenbaum is with the Departments of Computer Science and Applied Mathematics & Statistics at the State University of New York, Stony Brook; arobertan@gmail.com
Abstract

Biological networks often encapsulate promotion/inhibition as signed edge-weights of a graph. Nodes may correspond to genes assigned expression levels (mass) of respective proteins. The promotion/inhibition nature of co-expression between nodes is encoded in the sign of the corresponding entry of a sign-indefinite adjacency matrix, though the strength of such co-expression (i.e., the precise value of edge weights) cannot typically be directly measured. Herein we address the inverse problem to determine network edge-weights based on a sign-indefinite adjacency and expression levels at the nodes. While our motivation originates in gene networks, the framework applies to networks where promotion/inhibition dictates a stationary mass distribution at the nodes. In order to identify suitable edge-weights we adopt a framework of “negative probabilities,” advocated by P. Dirac and R. Feynman, and we set up a likelihood formalism to obtain values for the sought edge-weights. The proposed optimization problem can be solved via a generalization of the well-known Sinkhorn algorithm; in our setting the Sinkhorn-type “diagonal scalings” are multiplicative or inverse-multiplicative, depending on the sign of the respective entries in the adjacency matrix, with value computed as the positive root of a quadratic polynomial.

Footnote: Research supported in part by the AFOSR under grant FA9550-23-1-0096 and ARO under W911NF-22-1-0292. The authorship is alphabetical; the conception of this work is due to the second author.

Index Terms:
Sinkhorn Algorithm, Negative Probabilities, Gene Regulatory Networks, Promotion/Inhibition Effects

I Introduction

Networks are often used to encode interactions between chemical or biological compounds, such as proteins and genes [1, 2, 3, 4, 5]. A distinguishing feature of such networks is that edge-weights quantify activation/suppression rates, equivalently, promotion/inhibition relations between nodes. These in turn impact expression levels of substances assigned to node sites. Measuring precisely the rates that regulate such interactions is often a difficult task, whereas measuring expression levels at the node sites is substantially easier. Thus, the subject of the present work is to develop a framework for solving the inverse problem to identify such network parameters (edge-weights) from information on expression levels and affinity between nodes. Specifically, we seek to determine sign-indefinite values for edge-weights of a network based on knowledge of nodal mass and the sign of the edge-weights. The sign of edge-weights is made available in the form of a sign-indefinite adjacency matrix. It is also possible that an estimate of (signed) edge-weights is already available and needs to be updated, so as to restore consistency with measured mass at the nodes. This situation is completely analogous to the case where the prior is only the signed adjacency matrix, and can be treated similarly.

We introduce a natural framework to identify sign-indefinite edge-weights from this type of information, by adopting a sign-indefinite transition-probability model to encapsulate “promotion vs. inhibition.” Incorporating “negative probabilities” in a model was advocated early on by Paul Dirac [6] and Richard Feynman [7], and in some scarce work that followed [8, 9]. Dirac and Feynman argued specifically that negative probabilities, when not directly and experimentally measurable, are perfectly acceptable as reflecting internal unobservable manifestations of an underlying mechanism. In our setting, however, the interpretation of negative transition probabilities as rates is physically meaningful. Activation/suppression rates, reflected in sign-indefinite edge-weights, dictate the stationary distribution at the nodes (e.g., protein levels, and so on) and are sought to explain the observed data.

Identifying edge-weights in a gene regulatory network is of vital importance in that it allows assessing the role of nodes and edges in the overall functionality and robustness of the network [2, 3]. Indeed, chemical deactivation of sites in a gene regulatory network that may contribute to selective cell-death has been the rationale behind many types of medical treatment of cancer, and the endeavour to properly quantify functionality and robustness of gene networks with suitable metrics, which point to vital nodes and links, is on-going [10]. In particular, various centrality measures as well as network curvature have been proposed to quantify the significance of nodes/edges as well as the resilience of the network to changes. Determining the value of such metrics presupposes knowledge of edge-weights, which is the modest goal of the inverse problem that we consider herein.

A contribution of the present work is to propose identifying edge weights by minimizing a suitable relative entropy functional between sign-indefinite measures. The problem can be solved by a Sinkhorn-like iteration to obtain the minimizer. This new algorithm reduces to the well-known Sinkhorn “diagonal-scaling iteration” when the network adjacency matrix is sign definite. In the generality of the present model, with sign-indefinite edge weights, the algorithm amounts to an iterative scaling scheme with the scaling of multiplicative or inverse-multiplicative nature, depending on the sign of a corresponding entry in the adjacency matrix. At each step of the iteration, the scaling factor is readily available as the positive root of a quadratic polynomial. The new algorithm, much like the Sinkhorn iteration, can be seen as effecting coordinate ascent to obtain the maximum of a concave functional, displaying as a consequence linear convergence rates.

Below, in Section II we present our problem formulation, followed by theoretical development and algorithmic considerations that are presented in Section III. Section IV provides illustrative examples.

II Problem formulation

The data for our mathematical problem consist of two entities: first, a symmetric matrix $A=A'\in\{-1,0,1\}^{n\times n}$, and then a (column) probability vector $p\in\mathbb{R}^{n}$, i.e., such that $p_i\geq 0$ for $i\in\{1,\ldots,n\}$ and $\sum_{i=1:n}p_i=1$.

As noted in the introduction, the matrix $A$ may represent the sign-indefinite adjacency of a given gene regulatory network. In such an example, the nodes $v\in\mathcal{V}=\{1,2,\ldots,n\}$ correspond to genes, and the vertex set cardinality $|\mathcal{V}|=n$ is typically very large, with $n$ of the order of $10^{3}$ to $10^{4}$. The adjacency matrix $A=[A_{ij}]_{i,j=1:n}$ encodes both a “connectivity” structure among genes and a constructive/indifferent/destructive (promoting/no effect/inhibiting) contribution of the corresponding pair of genes to the respective protein production levels. This information is encoded in the sign of the respective entry $A_{ij}\in\{-1,0,1\}$, with a positive value indicating constructive co-expression, zero indicating no perceived effect, and a negative value indicating an inhibiting effect. This (biological) information is assumed given to us.

The second piece of data for our problem is a probability vector $p$, with entries $p_i$ for $i\in\{1,\ldots,n\}$, that represents corresponding relative protein levels, and hence the respective potency of the gene sites. That is, $p_i\geq 0$ and $\sum_{i=1:n}p_i=1$, with high values representing strong expression of the respective proteins. Once again, this information is given to us from experimental data.

Biologists are typically interested in modeling the contributions of various genes to relative protein production levels in a quantitative manner. To this end, ad-hoc schemes have been utilized; see e.g., [3, 1] and the references therein. In earlier work, the rationale behind the proposed schemes was to employ the theory of Markov chains by recalibrating the constructive/inhibiting effects of sites on production levels to take only positive values. This was done by adding a suitable positive constant to all entries of $A$. Accordingly, $A$ becomes a positive matrix, to which the standard theory of Markov chains may be adapted, after scaling $A$ suitably, so that $p$ may be regarded as the stationary distribution of a corresponding Markov chain. Several pitfalls plague all such schemes, and are traceable to the difference between constructive and inhibiting effects, since adding a constant evidently creates a bias in one direction and not the other.

In the present work, we propose an approach that seems natural for the problem at hand and brings in the concept of negative probabilities. As indicated earlier, such concepts have had notable proponents, including Dirac and Feynman, and some scant following. In some detail, we hereby postulate and seek a sign-indefinite Markov transition model to explain the observed invariant distribution $p$, while acknowledging the promoting/inhibiting nature of the links between genes. Feynman’s dictum suggests that, as long as internal probabilities in a model are not experimentally observable (as is the case with the sought transition kernel that will be constructed to respect the signs of the $A_{ij}$’s), the model is acceptable provided that it explains observed and experimentally measurable (non-negative) probabilities and relative frequencies.

Thus, we seek to identify a sign-indefinite probability transition matrix $\Pi=[\Pi_{ij}]_{i,j=1:n}$ such that $\Pi_{ij}\gtreqqless 0$ in accordance with the sign of the corresponding $A_{ij}$. Alternatively, $\frac{\Pi_{ij}}{A_{ij}}\geq 0$ whenever $A_{ij}\neq 0$, while $A_{ij}=0\Rightarrow\Pi_{ij}=0$. To this end, we formulate the following problem seeking to minimize a suitable functional (herein, entropy).

Problem 1.

Determine $\Pi$ that satisfies the above conditions,

minimizes
\[
J(\Pi,A):=\sum_{i,j\,|\,A_{ij}\neq 0} p_i\,\frac{\Pi_{ij}}{A_{ij}}\log\Big(\frac{\Pi_{ij}}{A_{ij}}\Big), \tag{1a}
\]
and satisfies
\[
\sum_{i=1:n}p_i\Pi_{ij}=p_j,\quad\text{for all }j=1:n, \tag{1b}
\]
\[
\sum_{j=1:n}\Pi_{ij}=1,\quad\text{for all }i=1:n. \tag{1c}
\]

The use of relative entropy between $A$ and $\Pi$, as above, echoes a similar usage in the justification of Schrödinger’s bridges, rooted in large deviations theory (see [11]). It has also been central in other problems in network theory [12]. Negative transition probabilities in our context can be interpreted as promotion/inhibition rates.

III Theoretical development

We first reformulate Problem 1 in the following manner. We define

\[
\mathbf{A}:=|A|:=[|A_{ij}|]_{i,j=1:n}.
\]

This is seen to be a sign-definite adjacency of the network with entries $\mathbf{A}_{ij}\in\{0,1\}$. Likewise, for a sign-indefinite transition probability $\Pi$ as earlier, we write

\[
\mathbf{\Pi}=|\Pi|=[|\Pi_{ij}|]_{i,j=1:n}
\]

for the corresponding (typically) unnormalized transition probability matrix. That is, $\mathbf{\Pi}$ has non-negative entries, but without guaranteed row-sums equal to one. Thus, our problem can be re-cast as follows.

Problem 2.

Determine an entry-wise nonnegative matrix $\mathbf{\Pi}$ that minimizes

\[
\mathbf{J}(\mathbf{\Pi},\mathbf{A}):=\sum_{i,j\,|\,\mathbf{A}_{ij}\neq 0} p_i\,\frac{\mathbf{\Pi}_{ij}}{\mathbf{A}_{ij}}\log\Big(\frac{\mathbf{\Pi}_{ij}}{\mathbf{A}_{ij}}\Big) \tag{2a}
\]
and satisfies the linear constraints
\[
\sum_i p_iA_{ij}\mathbf{\Pi}_{ij}=p_j, \tag{2b}
\]
\[
\sum_j A_{ij}\mathbf{\Pi}_{ij}=1, \tag{2c}
\]
for $j,i\in\{1,\ldots,n\}$.

It is clear that Problems 1 and 2 are equivalent, and of course, that $\mathbf{J}(\mathbf{\Pi},\mathbf{A})=J(\Pi,A)$. Problem 2 is quite similar to the classical Schrödinger problem [11], but because $A$ is sign-indefinite, (2b)-(2c) are not the usual conditions. Yet, they are still linear, and the above re-formulation readily leads to the following conclusion.

Theorem 1.

If Problem 2 is feasible, the minimizer exists and is unique.

Proof.

The result follows by virtue of the fact that $\mathbf{J}(\mathbf{\Pi},\mathbf{A})$ is strictly convex in $\mathbf{\Pi}$, and the constraints are linear. ∎

Corollary 2.

If Problem 1 is feasible, the minimizer exists and is unique.

III-A A Sinkhorn-like algorithm

We now seek to develop a computational approach for our problem that is akin to the Sinkhorn iteration [13, 11]. To this end, we invoke duality theory so as to obtain the functional form of the minimizer. Thus, we introduce Lagrange multipliers and obtain the Lagrangian

\[
\mathcal{L}(\mathbf{\Pi},p,\lambda,\mu):=\sum_{i,j\,|\,\mathbf{A}_{ij}\neq 0} p_i\,\frac{\mathbf{\Pi}_{ij}}{\mathbf{A}_{ij}}\log\Big(\frac{\mathbf{\Pi}_{ij}}{\mathbf{A}_{ij}}\Big)
+\sum_j\mu_j\Big(\sum_i p_iA_{ij}\mathbf{\Pi}_{ij}-p_j\Big)
+\sum_i\lambda_i\Big(\sum_j A_{ij}\mathbf{\Pi}_{ij}-1\Big).
\]

The first-order optimality condition, $\partial\mathcal{L}/\partial\mathbf{\Pi}_{ij}=0$, gives that

\[
\frac{p_i}{\mathbf{A}_{ij}}\log\Big(\frac{\mathbf{\Pi}_{ij}}{\mathbf{A}_{ij}}\Big)+\frac{p_i}{\mathbf{A}_{ij}}+\mu_jp_iA_{ij}+\lambda_iA_{ij}=0,
\]

for $i,j$ such that $\mathbf{A}_{ij}=1$; otherwise $\mathbf{\Pi}_{ij}=0$. Since $\mathbf{A}_{ij}\in\{0,1\}$ and $\mathbf{A}_{ij}A_{ij}=A_{ij}$, the optimizer must have the following functional dependence on the Lagrange multipliers

\[
\mathbf{\Pi}^{*}_{ij}=\mathbf{A}_{ij}\times\exp\Big(-1-\mu_jA_{ij}-\frac{\lambda_i}{p_i}A_{ij}\Big). \tag{3}
\]

Define $A^{+}=\frac{1}{2}(\mathbf{A}+A)$ and $A^{-}=\frac{1}{2}(\mathbf{A}-A)$, so that $\mathbf{A}=|A|=A^{+}+A^{-}$ and $A=A^{+}-A^{-}$, and a scaled set of new parameters $\nu_i:=\lambda_i/p_i$, for $i\in\{1,\ldots,n\}$. Then, the optimal kernel may be written in the form

\[
\mathbf{\Pi}^{*}_{ij}=\begin{cases}\exp(-1-\mu_j-\nu_i),&\text{when }A_{ij}>0,\\ \exp(-1+\mu_j+\nu_i),&\text{when }A_{ij}<0,\\ 0,&\text{when }A_{ij}=0,\end{cases} \tag{4}
\]

and equivalently, in the form

\[
\mathbf{\Pi}^{*}_{ij}=\frac{1}{e}\left(A^{+}_{ij}e^{-(\mu_j+\nu_i)}+A^{-}_{ij}e^{+(\mu_j+\nu_i)}\right), \tag{5}
\]

with constraints $F_j(\mu_j)=p_j$ and $B_i(\nu_i)=1$, where

\[
F_j(\mu_j):=\sum_{i=1:n}\frac{1}{e}\left(A^{+}_{ij}e^{-(\mu_j+\nu_i)}-A^{-}_{ij}e^{+(\mu_j+\nu_i)}\right)p_i,
\]
\[
B_i(\nu_i):=\sum_{j=1:n}\frac{1}{e}\left(A^{+}_{ij}e^{-(\mu_j+\nu_i)}-A^{-}_{ij}e^{+(\mu_j+\nu_i)}\right).
\]

We re-write,

\[
F_j(\mu_j)=a^{F}_{j}(\nu)e^{-\mu_j}-b^{F}_{j}(\nu)e^{\mu_j}, \tag{6a}
\]
\[
B_i(\nu_i)=a^{B}_{i}(\mu)e^{-\nu_i}-b^{B}_{i}(\mu)e^{\nu_i}, \tag{6b}
\]

for

\[
a^{F}_{j}(\nu):=\frac{1}{e}\sum_{i=1:n}A^{+}_{ij}p_ie^{-\nu_i},\qquad b^{F}_{j}(\nu):=\frac{1}{e}\sum_{i=1:n}A^{-}_{ij}p_ie^{\nu_i},
\]
\[
a^{B}_{i}(\mu):=\frac{1}{e}\sum_{j=1:n}A^{+}_{ij}e^{-\mu_j},\qquad b^{B}_{i}(\mu):=\frac{1}{e}\sum_{j=1:n}A^{-}_{ij}e^{\mu_j},
\]

and we readily observe that $\mu_j$ can be computed explicitly from the constraint $F_j(\mu_j)=p_j$, when the vector $\nu$ is kept fixed, and similarly, $\nu_i$ can be computed from $B_i(\nu_i)=1$.

To see this, consider the function $f(x)=ae^{-x}-be^{x}$, with $a,b>0$, and observe that it is monotonic, with negative derivative on the whole real axis, having limits $f(-\infty)=\infty$ and $f(\infty)=-\infty$. Thus, for any given value $c$, a solution to $f(x)=c$ may be readily obtained as the logarithm of the positive root of a quadratic, giving

\[
x=\log\left(\frac{-c+\sqrt{c^{2}+4ab}}{2b}\right)=:g(a,b,c). \tag{7a}
\]
For our purposes, the case where $b=0$ is also of interest, and here
\[
x=\log\left(\frac{a}{c}\right)=:g(a,0,c). \tag{7b}
\]
This follows from $ae^{-x}=c$, and of course, it also coincides with the value $\lim_{b\to 0}g(a,b,c)$.
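As a quick numeric check of (7), the map $g$ can be implemented directly; the following sketch (in Python with NumPy, with our own naming) verifies that $x=g(a,b,c)$ indeed solves $ae^{-x}-be^{x}=c$:

```python
import numpy as np

def g(a, b, c):
    """Logarithm of the positive root of b*y^2 + c*y - a = 0, i.e. the
    solution x of a*exp(-x) - b*exp(x) = c; formula (7a), with (7b) for b = 0."""
    if b == 0:
        return np.log(a / c)
    return np.log((-c + np.sqrt(c**2 + 4 * a * b)) / (2 * b))

# check: f(x) = a e^{-x} - b e^{x} evaluated at x = g(a, b, c) recovers c
a, b, c = 2.0, 0.5, 0.3
x = g(a, b, c)
print(a * np.exp(-x) - b * np.exp(x))  # ≈ 0.3
```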

Bringing all of the above together, we arrive at the following iterative algorithm, where $\mu_j$ is computed as a function of $\nu$, using (7), to solve

\[
F_j(\mu_j)=a^{F}_{j}(\nu)e^{-\mu_j}-b^{F}_{j}(\nu)e^{\mu_j}=p_j,\quad\text{for }j=1:n, \tag{8a}
\]
and $\nu_i$ is computed as a function of $\mu$, using (7), to solve
\[
B_i(\nu_i)=a^{B}_{i}(\mu)e^{-\nu_i}-b^{B}_{i}(\mu)e^{\nu_i}=1,\quad\text{for }i=1:n. \tag{8b}
\]

The steps are summarized below:

Algorithm 1 Sinkhorn-like algorithm
1:  Initialize $\nu\in\mathbb{R}^{n}$, e.g., setting $\nu=0$.
2:  For $j=1:n$, determine $\mu_j=g(a^{F}_{j}(\nu),b^{F}_{j}(\nu),p_j)$.
3:  For $i=1:n$, determine $\nu_i=g(a^{B}_{i}(\mu),b^{B}_{i}(\mu),1)$.
4:  Repeat steps 2 and 3 until convergence.
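For concreteness, Algorithm 1 can be sketched in Python/NumPy as follows. This is a minimal illustration under our own naming (not the authors' released code); the helper `g` implements (7), and the signed kernel $\Pi=A\odot\mathbf{\Pi}^{*}$ is assembled from (5):

```python
import numpy as np

def g(a, b, c):
    """Solve a*exp(-x) - b*exp(x) = c via the positive root of a quadratic (7)."""
    if b == 0:
        return np.log(a / c)
    return np.log((-c + np.sqrt(c**2 + 4 * a * b)) / (2 * b))

def sinkhorn_signed(A, p, tol=1e-9, max_iter=10000):
    """Sinkhorn-like iteration (Algorithm 1) for a sign-indefinite adjacency A
    and stationary distribution p; returns the signed transition matrix Pi."""
    Ap, Am = np.maximum(A, 0.0), np.maximum(-A, 0.0)  # A^+ and A^-
    n = len(p)
    nu, mu = np.zeros(n), np.zeros(n)
    for _ in range(max_iter):
        # step 2: mu_j = g(a_j^F(nu), b_j^F(nu), p_j), solving F_j(mu_j) = p_j
        aF = Ap.T @ (p * np.exp(-nu)) / np.e
        bF = Am.T @ (p * np.exp(nu)) / np.e
        mu = np.array([g(aF[j], bF[j], p[j]) for j in range(n)])
        # step 3: nu_i = g(a_i^B(mu), b_i^B(mu), 1), solving B_i(nu_i) = 1
        aB = Ap @ np.exp(-mu) / np.e
        bB = Am @ np.exp(mu) / np.e
        nu = np.array([g(aB[i], bB[i], 1.0) for i in range(n)])
        # signed kernel from (5): Pi = A^+ e^{-1-(mu+nu)} - A^- e^{-1+(mu+nu)}
        s = mu[None, :] + nu[:, None]
        Pi = Ap * np.exp(-1.0 - s) - Am * np.exp(-1.0 + s)
        if (np.abs(Pi.T @ p - p).max() < tol
                and np.abs(Pi.sum(axis=1) - 1.0).max() < tol):
            break
    return Pi
```

On feasible data (e.g., Example 1 of Section IV) the returned $\Pi$ should satisfy (1b)-(1c) up to the chosen tolerance, with sign pattern matching that of $A$.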

When $A^{-}$ is the zero matrix, the above system of equations (8) reduces to the classical Schrödinger system [11]. Indeed, in this case, $b^{F},b^{B}$ vanish and the steps for finding $\mu_j$ and $\nu_i$, from $\nu$ and $\mu$, respectively, reduce to the standard diagonal scaling of the celebrated Sinkhorn algorithm, giving $e^{-\mu_j}=p_j/a^{F}_{j}(\nu)$, and similarly, $e^{-\nu_i}=1/a^{B}_{i}(\mu)$. Thus, the above algorithm represents a generalization of the Sinkhorn algorithm; the sign-indefiniteness of $A$ prevents simple diagonal scaling as an option to satisfy the boundary conditions iteratively, as in the original Sinkhorn iteration [13], yet the update can still be done quite easily using (7).

Algorithm 1 can also be seen as implementing coordinate ascent on the dual functional of the Lagrangian, with exact evaluation of the maximizer of a smooth functional along coordinate directions at each step. A detailed analysis together with extension of the general framework to address inverse problems in higher dimensions is the subject of forthcoming work [14].

III-B Gradient ascent method

Once again, using duality as earlier, the dual function corresponding to the primal problem (Problem 2) reads

\[
g(\mu,\lambda)=-\frac{1}{e}\sum_{i,j\,|\,\mathbf{A}_{ij}\neq 0}\mathbf{A}_{ij}\exp\Big(-\mu_jA_{ij}-\frac{\lambda_i}{p_i}A_{ij}\Big)\,p_i-\sum_j\mu_jp_j-\sum_i\lambda_i.
\]

Therefore, the dual problem reads

\[
\max_{\mu,\lambda}\ \ g(\mu,\lambda),
\]

and can also be solved by gradient ascent. Specifically, taking partials of $g(\mu,\lambda)$ with respect to $\mu$ and $\lambda$, we have

\[
\frac{\partial g}{\partial\mu_j}=\frac{1}{e}\sum_i p_iA_{ij}\exp\Big(-\mu_jA_{ij}-\frac{\lambda_i}{p_i}A_{ij}\Big)-p_j, \tag{9a}
\]
\[
\frac{\partial g}{\partial\lambda_i}=\frac{1}{e}\sum_j A_{ij}\exp\Big(-\mu_jA_{ij}-\frac{\lambda_i}{p_i}A_{ij}\Big)-1. \tag{9b}
\]

The optimizer in (3) is obtained by iteratively updating the dual variables in the direction of the gradient with a suitable step size; the procedure is sketched below.

Algorithm 2 Gradient ascent method
1:  Initialize $\lambda=0$, $\mu=0$, and select step size $\gamma$.
2:  For $j=1:n$, determine
$\mu_j^{\rm next}=\mu_j+\gamma\Big(\frac{1}{e}\sum_i p_iA_{ij}\exp\big(-\mu_jA_{ij}-\frac{\lambda_i}{p_i}A_{ij}\big)-p_j\Big)$.
3:  For $i=1:n$, determine
$\lambda_i^{\rm next}=\lambda_i+\gamma\Big(\frac{1}{e}\sum_j A_{ij}\exp\big(-\mu_jA_{ij}-\frac{\lambda_i}{p_i}A_{ij}\big)-1\Big)$.
4:  Update the values of $\mu,\lambda$ and repeat steps 2 and 3 until convergence.
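A corresponding Python/NumPy sketch of the dual update (again illustrative, with our own naming; the step size $\gamma$ must be chosen small enough for stability) reads:

```python
import numpy as np

def dual_ascent(A, p, gamma=0.1, tol=1e-9, max_iter=100000):
    """Gradient ascent on the dual g(mu, lambda) (Algorithm 2); returns Pi of (3)."""
    n = len(p)
    mu, lam = np.zeros(n), np.zeros(n)
    for _ in range(max_iter):
        # E_ij = exp(-mu_j A_ij - (lam_i/p_i) A_ij); equals 1 where A_ij = 0,
        # but those entries are masked by the factor A below
        E = np.exp(-(mu[None, :] + (lam / p)[:, None]) * A)
        grad_mu = (p[:, None] * A * E).sum(axis=0) / np.e - p   # (9a)
        grad_lam = (A * E).sum(axis=1) / np.e - 1.0             # (9b)
        if max(np.abs(grad_mu).max(), np.abs(grad_lam).max()) < tol:
            break
        mu += gamma * grad_mu
        lam += gamma * grad_lam
    # signed optimizer (3): Pi_ij = A_ij exp(-1 - mu_j A_ij - (lam_i/p_i) A_ij)
    E = np.exp(-(mu[None, :] + (lam / p)[:, None]) * A)
    return A * E / np.e
```

Note that at convergence the gradients (9a)-(9b) are exactly the residuals of the constraints (1b)-(1c), so the stopping test directly bounds the constraint violation.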

III-C Remarks

Although the functional 𝐉\mathbf{J} in Problem 2 is convex, existence of a minimizer hinges on whether the constraints are feasible. It is of interest to determine efficient ways to test feasibility, especially for very large matrices.

It is also important to note that the theory applies to the case where AA is not necessarily symmetric and/or does not have a symmetric sign structure. A non-symmetric adjacency represents a directed graph, while asymmetry of the sign structure brings in interesting feedback dynamics.

IV Illustrative examples

We now discuss representative examples to illustrate what can be accomplished with the proposed formulation. The code used for working out all of the examples may be found at https://github.com/dytroshut/negative-probability-forward-backward.

[Figure: three nodes, 1, 2, 3, with edges labeled by signs $+$/$-$]
Figure 1: Network topology
Example 1.

Consider the 3-node network shown in Figure 1. We take $A$ to be the following sign-indefinite adjacency matrix:

\[
A=\begin{bmatrix}1&-1&1\\-1&0&1\\1&1&0\end{bmatrix}.
\]

Note that the sign of the edge between the first two nodes is negative, indicating an inhibiting effect. We take $p=[0.3\ 0.3\ 0.4]$. Algorithms 1 and 2 converge to the same value (evidently), giving

\[
\Pi=\begin{bmatrix}0.7631&-0.0482&0.2581\\-0.0482&0&1.0482\\0.2138&0.7862&0\end{bmatrix},
\]

which can be verified to satisfy the constraints. As can be observed, $\Pi$ is not symmetric; symmetry was not required, and $\Pi$ only respects the sign structure of $A$.

Example 2.

Once again we consider the 3-node network shown in Figure 1. However, this time we take $A$ to have the following sign structure,

\[
A=\begin{bmatrix}1&1&1\\-1&0&1\\1&0&0\end{bmatrix},
\]

which is already not symmetric. We take the same probability vector $p=[0.3\ 0.3\ 0.4]$. The solution in this case can be computed by either algorithm to be

\[
\Pi=\begin{bmatrix}0.0009&0.9962&0.0029\\-0.3321&0&1.3321\\1&0&0\end{bmatrix},
\]

which can be verified to satisfy the constraints $\Pi'p=p$ and $\Pi\mathds{1}=\mathds{1}$. This example highlights the fact that the theory applies equally well to the case where $A$ is not symmetric and/or does not have a symmetric sign structure.
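This verification can be done numerically; a small check (Python/NumPy, with the entries rounded to four decimals as printed above) confirms both constraints up to the displayed precision:

```python
import numpy as np

# Pi as printed in Example 2 (entries rounded to four decimals)
Pi = np.array([[ 0.0009, 0.9962, 0.0029],
               [-0.3321, 0.0,    1.3321],
               [ 1.0,    0.0,    0.0   ]])
p = np.array([0.3, 0.3, 0.4])

print(Pi.T @ p)        # stationarity: approximately [0.3, 0.3, 0.4]
print(Pi.sum(axis=1))  # row sums: [1.0, 1.0, 1.0]
```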

Example 3.

We now work out an example with a substantially larger adjacency matrix. The sign-indefinite adjacency matrix $A$ chosen for this example is

\[
A=\begin{bmatrix}
1&0&0&0&-1&0&1&0&1&0\\
0&0&1&0&0&0&0&0&0&1\\
0&1&0&0&1&0&1&0&0&0\\
0&0&0&0&1&0&0&1&0&0\\
-1&0&1&1&1&-1&0&0&0&0\\
0&0&0&0&-1&0&1&0&1&0\\
1&0&1&0&0&1&1&0&0&0\\
0&0&0&1&0&0&0&0&1&-1\\
1&0&0&0&0&1&0&1&0&1\\
0&1&0&0&0&0&0&-1&1&0
\end{bmatrix}
\]

The probability vector $p$ is

\[
p=[0.1\ 0.05\ 0.05\ 0.15\ 0.2\ 0.05\ 0.03\ 0.07\ 0.25\ 0.05].
\]

The topology of the network is shown in Figure 2, with the negative edge-weights color-coded and displayed as dashed red curves. The resulting matrix $\Pi$ is given in (10).

Figure 2: The topology of a 10-node network
\[
\Pi=\begin{bmatrix}
0.4337&0&0&0&-0.3398&0&0.1205&0&0.7857&0\\
0&0&0.2945&0&0&0&0&0&0&0.7055\\
0&0.4595&0&0&0.4150&0&0.1255&0&0&0\\
0&0&0&0&0.6451&0&0&0.3549&0&0\\
-0.1930&0&0.1623&0.5895&0.6441&-0.2029&0&0&0&0\\
0&0&0&0&-0.2470&0&0.1658&0&1.0812&0\\
0.4064&0&0.0940&0&0&0.3866&0.1129&0&0&0\\
0&0&0&0.4586&0&0&0&0&0.9887&-0.4473\\
0.3321&0&0&0&0&0.3159&0&0.1678&0&0.1841\\
0&0.5405&0&0&0&0&0&-0.5038&0.9633&0
\end{bmatrix} \tag{10}
\]
Example 4.

We finally consider a random graph with 100 nodes and 1428 edges, shown in Figure 3. The graph has 1360 edges with positive signs and the remaining with negative signs. The edges with negative transition probability are randomly assigned and colored in red, and the ones with positive probability are shown in blue. Figure 4 displays the convergence of the Sinkhorn-like Algorithm 1 in terms of the objective function $\mathbf{J}(\mathbf{\Pi},\mathbf{A})$ and the marginal constraint violation $\log(\|\Pi'p-p\|)$, a measure commonly used to assess convergence of the standard Sinkhorn iteration. A linear convergence rate of the constraint/marginal violation is observed, and can be established by noting that Algorithm 1 can be seen as a coordinate ascent algorithm. The example can be replicated using the code and data on the project’s website.

Figure 3: The topology of the 100-node random network used. Nodes are enumerated and arranged sequentially on a circle. Edges assigned negative transition probability are shown in red, while edges corresponding to positive transition probability are shown in blue.
Figure 4: The top subplot shows $\mathbf{J}(\mathbf{\Pi},\mathbf{A})$ as a function of the iteration number. The second subplot shows the violation/error of the marginal constraint $\|\Pi'p-p\|$ in log-scale.

V Concluding remarks

The purpose of the present work has been to introduce a sign-indefinite probabilistic model to reflect promoting/inhibiting affinity between nodes, as in co-expression of genes in gene regulatory networks. The theory is of independent interest and leads to a nonstandard extension of the classical Sinkhorn algorithm. An important question that remains is how to efficiently test feasibility of Problems 1 and 2. In addition, in light of the fact that the Sinkhorn iteration addresses a static version of the more general Schrödinger bridge problem [11], it is of interest to explore a setting that allows modeling the evolution of nodal mass, allowing for promotion/inhibition effects, under suitable dynamics.

References

  • [1] A.-L. Barabasi and Z. N. Oltvai, “Network biology: understanding the cell’s functional organization,” Nature reviews genetics, vol. 5, no. 2, pp. 101–113, 2004.
  • [2] R. Sandhu, T. Georgiou, E. Reznik, L. Zhu, I. Kolesov, Y. Senbabaoglu, and A. Tannenbaum, “Graph curvature for differentiating cancer networks,” Scientific Reports, vol. 5, no. 1, p. 12323, 2015.
  • [3] R. Sandhu, S. Tannenbaum, T. Georgiou, and A. Tannenbaum, “Geometry of correlation networks for studying the biology of cancer,” in 2016 IEEE 55th Conference on Decision and Control (CDC).   IEEE, 2016, pp. 2501–2506.
  • [4] C. Liu, Y. Ma, J. Zhao, R. Nussinov, Y.-C. Zhang, F. Cheng, and Z.-K. Zhang, “Computational network biology: data, models, and applications,” Physics Reports, vol. 846, pp. 1–66, 2020.
  • [5] D. Seçilmiş, T. Hillerton, D. Morgan, A. Tjärnberg, S. Nelander, T. E. Nordling, and E. L. Sonnhammer, “Uncovering cancer gene regulation by accurate regulatory network inference from uninformative data,” NPJ systems biology and applications, vol. 6, no. 1, p. 37, 2020.
  • [6] P. A. M. Dirac, “Bakerian lecture-the physical interpretation of quantum mechanics,” Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, vol. 180, no. 980, pp. 1–40, 1942.
  • [7] R. P. Feynman, “Negative probability,” Quantum implications: essays in honour of David Bohm, pp. 235–248, 1987.
  • [8] M. S. Bartlett, “Negative probability,” in Mathematical Proceedings of the Cambridge Philosophical Society, vol. 41, no. 1.   Cambridge University Press, 1945, pp. 71–73.
  • [9] M. Burgin, “Interpretations of negative probabilities,” arXiv preprint arXiv:1008.1287, 2010.
  • [10] A. Baptista, B. D. MacArthur, and C. R. Banerji, “Charting cellular differentiation trajectories with Ricci flow,” bioRxiv, pp. 2023–07, 2023.
  • [11] Y. Chen, T. T. Georgiou, and M. Pavon, “Stochastic control liaisons: Richard Sinkhorn meets Gaspard Monge on a Schrödinger bridge,” SIAM Review, vol. 63, no. 2, pp. 249–313, 2021.
  • [12] H. Zhou, “Optimal transport on networks,” IEEE Control Systems Magazine, vol. 41, no. 4, pp. 70–81, 2021.
  • [13] T. T. Georgiou and M. Pavon, “Positive contraction mappings for classical and quantum Schrödinger systems,” Journal of Mathematical Physics, vol. 56, no. 3, 2015.
  • [14] A. Dong, T. T. Georgiou, and A. Tannenbaum, “Negative probabilities in high dimensions: Sign-indefinite priors for data assimilation,” in preparation, 2023.