Promotion/Inhibition Effects in Networks:
A Model with Negative Probabilities
Abstract
Biological networks often encapsulate promotion/inhibition as signed edge-weights of a graph. Nodes may correspond to genes assigned expression levels (mass) of respective proteins. The promotion/inhibition nature of co-expression between nodes is encoded in the sign of the corresponding entry of a sign-indefinite adjacency matrix, though the strength of such co-expression (i.e., the precise value of edge weights) cannot typically be directly measured. Herein we address the inverse problem to determine network edge-weights based on a sign-indefinite adjacency and expression levels at the nodes. While our motivation originates in gene networks, the framework applies to networks where promotion/inhibition dictates a stationary mass distribution at the nodes. In order to identify suitable edge-weights we adopt a framework of “negative probabilities,” advocated by P. Dirac and R. Feynman, and we set up a likelihood formalism to obtain values for the sought edge-weights. The proposed optimization problem can be solved via a generalization of the well-known Sinkhorn algorithm; in our setting the Sinkhorn-type “diagonal scalings” are multiplicative or inverse-multiplicative, depending on the sign of the respective entries in the adjacency matrix, with value computed as the positive root of a quadratic polynomial. 111Research supported in part by the AFOSR under grant FA9550-23-1-0096 and ARO under W911NF-22-1-0292. The authorship is alphabetical; the conception of this work is due to the second author.
Index Terms:
Sinkhorn Algorithm, Negative Probabilities, Gene Regulatory Networks, Promotion/Inhibition EffectsI Introduction
Networks are often used to encode interactions between chemical or biological compounds, such as proteins and genes [1, 2, 3, 4, 5]. A distinguishing feature of such networks is that edge-weights quantify activation/suppression rates, equivalently, promotion/inhibition relations between nodes. These in turn impact expression levels of substances assigned to node sites. Measuring precisely the rates that regulate such interactions is often a difficult task, whereas measuring expression levels at the node sites is substantially easier. Thus, the subject of the present work is to develop a framework for solving the inverse problem to identify such network parameters (edge-weights) from information on expression levels and affinity between nodes. Specifically, we seek to determine sign-indefinite values for edge-weights of a network based on knowledge of nodal mass and the sign of the edge-weights. The sign of edge-weights is made available in the form of a sign-indefinite adjacency matrix. It is also possible that an estimate of (signed) edge-weights is already available and needs to be updated, so as to restore consistency with measured mass at the nodes. This situation is completely analogous to the case where the prior is only the signed adjacency matrix, and can be treated similarly.
We introduce a natural framework to identify sign-indefinite edge-weights from this type information, by adopting a sign-indefinite transition-probability model to encapsulate “promotion vs. inhibition.” Incorporating “negative probabilities” in a model was advocated early on by Paul Dirac [6] and Richard Feynman [7], and in some scarce work that followed [8, 9]. Dirac and Feynman argued specifically that negative probabilities, when not directly and experimentally measurable, are perfectly acceptable as reflecting internal unobservable manifestations of an underlying mechanism. In our setting however, the interpretation of negative transition probabilities as rates is physically meaningful. Activation/suppression rates, reflected in sign-indefinite edge-weights, dictate the stationary distribution at the nodes (e.g., protein levels, and so on) and are sought to explain the observed data.
The significance of identifying edge-weights in a gene regulatory network is of vital importance in that it allows assessing the role of nodes and edges in the overall functionality and robustness of networks [2, 3]. Indeed, chemical deactivation of sites in a gene regulatory network that may contribute to selective cell-death has been the rationale behind many types of medical treatment of cancer, and the endeavour to properly quantify functionality and robustness of gene networks with suitable metrics, that point to vital nodes and links, is on-going [10]. In particular, various centrality measures as well as network curvature have been proposed to quantify significance of nodes/edges as well as resilience of the network to changes. Determining the value of such metrics presupposes knowledge of edge-weights, which is the modest goal of the inverse problem that we consider herein.
A contribution of the present work is to propose identifying edge weights by minimizing a suitable relative entropy functional between sign-indefinite measures. The problem can be solved by a Sinkhorn-like iteration to obtain the minimizer. This new algorithm reduces to the well-known Sinkhorn “diagonal-scaling iteration” when the network adjacency matrix is sign definite. In the generality of the present model, with sign-indefinite edge weights, the algorithm amounts to an iterative scaling scheme with the scaling of multiplicative or inverse-multiplicative nature, depending on the sign of a corresponding entry in the adjacency matrix. At each step of the iteration, the scaling factor is readily available as the positive root of a quadratic polynomial. The new algorithm, much like the Sinkhorn iteration, can be seen as effecting coordinate ascent to obtain the maximum of a concave functional, displaying as a consequence linear convergence rates.
II Problem formulation
The data for our mathematical problem consist of two entities: first a symmetric matrix , and then a (column) probability vector , i.e., such that for and .
As noted in the introduction, the matrix may represent the sign-indefinite adjacency of a given gene regulatory network. In such an example, the nodes correspond to genes and the vertex set cardinality is typically very large, with of the order of . The adjacency matrix encodes both, a “connectivity” structure among genes as well as a constructive/indifferent/destructive (promoting/no effect/inhibiting) contribution of the corresponding pair of genes to the respective protein production levels. This information is encoded in the sign of the respective entry , with positive value indicating a constructive co-expression, zero indicating no perceived effect, and negative indicating inhibiting effect. This (biological) information is assumed given to us.
The second piece of datum for our problem is a probability vector ( that represents corresponding relative protein levels, and hence respective potency of the genes sites. That is, and , with high values representing strong expression of respective proteins. Once again, this information is given to us from experimental data.
Biologists are typically interested in modeling the contributions of various genes on relative protein production levels in a quantitative manner. To this end, ad-hoc schemes have been utilized; see e.g., [3, 1] and the references therein. In earlier work, the rationale behind the proposed schemes was to employ the theory of Markov chains by calibrating the constructive/inhibiting effect of sites to production levels to only positive values. This was done by adding a suitable positive constant to all entries of . Accordingly, becomes a positive matrix, to which the standard theory of Markov chains may be adapted, after scaling suitably, so that may be regarded as the stationary distribution of a corresponding Markov chain. Several pitfalls plague all such schemes, and are traceable to the difference between constructive and inhibiting effects, since adding a constant, evidently creates a bias in one direction and not the other.
In the present work, we propose an approach that seems natural for the problem at hand, and brings in the concept of negative probabilities. As indicated earlier, such concepts have had notable proponents including Dirac and Feynman, and some scant following. In some detail, we hereby, postulate and seek a sign-indefinite Markov transition model to explain the observed invariant distribution in , while acknowledging the promoting/inhibiting nature of the links between genes. Feynman’s dictum suggests that, as long as internal probabilities in a model are not experimentally observable (as is the case with the sought transition kernel that will be constructed to respect the signs of ’s), it is acceptable provided that it explains observed and experimentally measurable (non-negative) probabilities and relative frequencies.
Thus, we seek to identify a sign-indefinite probability transition matrix such that in accordance with the similar property for the corresponding . Alternatively, , while . To this end, we formulate the following problem seeking to minimize a suitable functional (herein, entropy).
Problem 1.
Determine that satisfies the above conditions,
minimizes | ||||
(1a) | ||||
and satisfies | ||||
(1b) | ||||
(1c) |
The use of relative entropy between and , as above, echoes a similar usage in the justifications of Schrödinger’s bridges, rooted in large deviations theory (see [11]. It has also been central in other problems in network theory [12]. Negative transition probabilities in our context can be interpreted as promotion/inhibition rates.
III Theoretical development
We first reformulate Problem 1 in the following manner. We define
This is seen to be a sign-definite adjacency of the network with entries . Likewise, for a sign-indefinite transition probability as earlier, we write
for the corresponding (typically) unnormalized transition probability matrix. That is, has non-negative entries, but without guaranteed row-sums being equal to one. Thus, our problem can be re-cast as follows.
Problem 2.
Determine an entry-wise nonnegative matrix that minimizes
(2a) | ||||
and satisfies the linear constraints | ||||
(2b) | ||||
(2c) | ||||
for . |
It is clear that Problems 1 and 2 are equivalent, and of course, that . Problem 2 is quite similar to the classical Schrödinger problem [11], but because is sign-indefinite, (2b-2c) are not the usual conditions. Yet, they are still linear, and the above re-formulation readily leads to the following conclusion.
Theorem 1.
If Problem 2 is feasible, the minimizer exists and is unique.
Proof.
The result follows by virtue of the fact that is strictly convex in , and the constraints are linear. ∎
Corollary 2.
If Problem 1 is feasible, the minimizer exists and is unique.
III-A A Sinkhorn-like algorithm
We now seek to develop a computational approach for our problem that is akin to the Sinkhorn iteration [13, 11]. To this end, we invoke duality theory so as to obtain the functional form of the minimizer. Thus, we introduce Lagrange multipliers and obtain the Lagrangian
The first-order optimality condition, , gives that
for such that , otherwise . Since and , the optimizer must have the following functional dependence on the Lagrange multipliers
(3) |
Define and , so that and , and a scaled set of new parameters , for . Then, the optimal kernel may be written in the form
(4) |
and equivalently, in the form
(5) |
with constraints and , where
We re-write,
(6a) | ||||
(6b) |
for
and we readily observe that can be computed explicitly from the constraint , when the vector is kept fixed, and similarly, can be computed from .
To see this, consider the function , with , and observe that it is monotonic, with negative derivative on the whole real axis, having limits and . Thus, for any given value , a solution to may be readily obtained as the logarithm of the positive root of a quadratic, giving,
(7a) | |||
For our purposes, the case where is also of interest, and here | |||
(7b) | |||
This follows from , and of course, it also coincides with the value . |
Bringing all of the above together, we arrive at the following iterative algorithm, where is computed as a function of , using (7), to solve
(8a) | ||||
and is computed as a function of , using (7) to solve | ||||
(8b) | ||||
The steps are summarized below:
When is the zero matrix, the above system of equations (8) reduces to the classical Schrödinger system [11]. Indeed, in this case, vanish and the steps for finding and , from and , respectively, reduce to the standard diagonal scaling of the celebrated Sinkhorn algorithm, giving , and similarly, . Thus, the above algorithm represents a generalization of the Sinkhorn algorithm; the sign-indefiniteness of prevents simple diagonal scaling as an option to satisfy iteratively the boundary conditions, as in the original Sinkhorn iteration [13], yet the update can still be done quite easily using (7).
Algorithm 1 can also be seen as implementing coordinate ascent on the dual functional of the Lagrangian, with exact evaluation of the maximizer of a smooth functional along coordinate directions at each step. A detailed analysis together with extension of the general framework to address inverse problems in higher dimensions is the subject of forthcoming work [14].
III-B Gradient descent method
Once again, using duality as earlier, the dual function corresponding to the primal problem (Problem 2) reads
Therefore, the dual problem reads
and can be also solved by gradient ascent. Specifically, taking partials of with respect to and , we have
(9a) | ||||
(9b) |
The optimizer in (3) is obtained by iteratively updating the dual variables in the direction of the gradient with a suitable step size, and is sketched below.
III-C Remarks
Although the functional in Problem 2 is convex, existence of a minimizer hinges on whether the constraints are feasible. It is of interest to determine efficient ways to test feasibility, especially for very large matrices.
It is also important to note that the theory applies to the case where is not necessarily symmetric and/or does not have a symmetric sign structure. A non-symmetric adjacency represents a directed graph, while asymmetry of the sign structure brings in interesting feedback dynamics.
IV Illustrative examples
We now discuss representative examples to illustrate what can be accomplished with the proposed formulation. The code used for working out all of the examples may be found at https://github.com/dytroshut/negative-probability-forward-backward.
Example 1.
Consider the -node network shown in Figure 1. We take to be the following sign-indefinite adjacency matrix:
Note that the sign of the edge between the first two nodes is negative, indicating inhibiting effect. We take . Algorithms 1 and 2 converge to the same value (evidently), giving
which can be verified to satisfy the constraints. As can be observed, is not symmetric, as it was not required and only respects the signature structure of .
Example 2.
Once again we consider the -node network shown in Figure 1. However, this time, we take to have the following sign-structure,
which is already not symmetric. We take the same probability vector . The solution in this case can be computed by either algorithm to be
which can be verified to satisfy the constraints and . This example highlights the fact that the theory applies equally well to the case where is not symmetric and/or does not have symmetric sign structure.
Example 3.
We now work out an example with a substantially larger adjacency matrix . The sign indefinite adjacency matrix chosen for this example is
The probability vector is
The topology of the network is shown in Figure 2, with color-coded display of the negative values using dashed red curves. The resulting matrix is given in (10).
(10) |
Example 4.
We finally consider a random graph with nodes and edges, shown in Figure 3. The graph has edges with positive signs and the remaining with negative. The edges with negative transition probability are randomly assigned and colored in red, and the ones with positive probability are shown in blue. Figure 4 displays the convergence of the Sinkhorn-like algorithm 1 in terms of the objective function and marginal constraint violation222Measure used to assess convergence of the standard Sinkhorn iteration. . A linear convergence rate on the constraint/marginal violation is observed, and can be shown by pointing to the fact that Algorithm 1 can be seen as a coordinate ascent algorithm. The example can be replicated using code and data in the project’s website.
V Concluding remarks
The purpose of the present work has been to introduce a sign-indefinite probabilistic model to reflect promoting/inhibiting affinity between nodes, as in co-expression of genes in gene regulatory networks. The theory is of independent interest and leads to a nonstandard extension of the classical Sinkhorn algorithm. An important question that remains is on how to efficiently test feasibility of Problems 1 and 2. In addition, in light of the fact that the Sinkhorn iteration addresses a static version of the more general Schrödinger bridge problem [11], it is of interest to explore a setting that allows modeling evolution of nodal mass, allowing for promotion/inhibition effects, under suitable dynamics.
References
- [1] A.-L. Barabasi and Z. N. Oltvai, “Network biology: understanding the cell’s functional organization,” Nature reviews genetics, vol. 5, no. 2, pp. 101–113, 2004.
- [2] R. Sandhu, T. Georgiou, E. Reznik, L. Zhu, I. Kolesov, Y. Senbabaoglu, and A. Tannenbaum, “Graph curvature for differentiating cancer networks,” Scientific Reports, vol. 5, no. 1, p. 12323, 2015.
- [3] R. Sandhu, S. Tannenbaum, T. Georgiou, and A. Tannenbaum, “Geometry of correlation networks for studying the biology of cancer,” in 2016 IEEE 55th Conference on Decision and Control (CDC). IEEE, 2016, pp. 2501–2506.
- [4] C. Liu, Y. Ma, J. Zhao, R. Nussinov, Y.-C. Zhang, F. Cheng, and Z.-K. Zhang, “Computational network biology: data, models, and applications,” Physics Reports, vol. 846, pp. 1–66, 2020.
- [5] D. Seçilmiş, T. Hillerton, D. Morgan, A. Tjärnberg, S. Nelander, T. E. Nordling, and E. L. Sonnhammer, “Uncovering cancer gene regulation by accurate regulatory network inference from uninformative data,” NPJ systems biology and applications, vol. 6, no. 1, p. 37, 2020.
- [6] P. A. M. Dirac, “Bakerian lecture-the physical interpretation of quantum mechanics,” Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, vol. 180, no. 980, pp. 1–40, 1942.
- [7] R. P. Feynman, “Negative probability,” Quantum implications: essays in honour of David Bohm, pp. 235–248, 1987.
- [8] M. S. Bartlett, “Negative probability,” in Mathematical Proceedings of the Cambridge Philosophical Society, vol. 41, no. 1. Cambridge University Press, 1945, pp. 71–73.
- [9] M. Burgin, “Interpretations of negative probabilities,” arXiv preprint arXiv:1008.1287, 2010.
- [10] A. Baptista, B. D. MacArthur, and C. R. Banerji, “Charting cellular differentiation trajectories with Ricci flow,” bioRxiv, pp. 2023–07, 2023.
- [11] Y. Chen, T. T. Georgiou, and M. Pavon, “Stochastic control liaisons: Richard Sinkhorn meets Gaspard Monge on a Schrödinger bridge,” SIAM Review, vol. 63, no. 2, pp. 249–313, 2021.
- [12] H. Zhou, “Optimal transport on networks,” IEEE Control Systems Magazine, vol. 41, no. 4, pp. 70–81, 2021.
- [13] T. T. Georgiou and M. Pavon, “Positive contraction mappings for classical and quantum Schrödinger systems,” Journal of Mathematical Physics, vol. 56, no. 3, 2015.
- [14] A. Dong, T. T. Georgiou, and A. Tannenbaum, “Negative probabilities in high dimensions: Sign-indefinite priors for data assimilation,” in preparation, 2023.