
The Role of Normalization in the Belief Propagation Algorithm

This work was supported by the French National Research Agency (ANR) grant Travesti (ANR-08-SYSC-017).

Victorin Martin
INRIA Paris-Rocquencourt
   Jean-Marc Lasgouttes
INRIA Paris-Rocquencourt
   Cyril Furtlehner
INRIA Saclay
Abstract

A large class of problems in statistical physics and computer science can be expressed as the computation of marginal probabilities over a Markov random field. The belief propagation algorithm, which is an exact procedure to compute these marginals when the underlying graph is a tree, has gained popularity as an efficient way to approximate them in the more general case. In this paper, we focus on an aspect of the algorithm that has not received much attention in the literature: the effect of the normalization of the messages. We show in particular that, for a large class of normalization strategies, it is possible to focus only on belief convergence. Following this, we express the necessary and sufficient conditions for local stability of a fixed point in terms of the graph structure and the belief values at the fixed point. We also make explicit some connections between the normalization constants and the underlying Bethe free energy.

1 Introduction

We are interested in this article in a Markov random field on a finite graph with local interactions, on which we want to compute marginal probabilities. The structure of the underlying model is described by a set of discrete variables $\mathbf{x}=\{x_i, i\in\mathbb{V}\}\in\{1,\ldots,q\}^{\mathbb{V}}$, where the variables in the set $\mathbb{V}$ are linked together by so-called "factors", which are subsets $a\subset\mathbb{V}$ of variables. If $\mathbb{F}$ is this set of factors, we consider the set of probability measures of the form

p(\mathbf{x})=\prod_{i\in\mathbb{V}}\phi_i(x_i)\prod_{a\in\mathbb{F}}\psi_a(\mathbf{x}_a),   (1.1)

where $\mathbf{x}_a=\{x_i, i\in a\}$.

$\mathbb{F}$ together with $\mathbb{V}$ defines the factor graph $\mathcal{G}$ (Kschischang et al., 2001), that is, an undirected bipartite graph, which will be assumed to be connected. We will also assume that the functions $\psi_a$ are never equal to zero, which is to say that the Markov random field exhibits no deterministic behavior. The set $\mathbb{E}$ of edges contains all the couples $(a,i)\in\mathbb{F}\times\mathbb{V}$ such that $i\in a$. We denote by $d_a$ (resp. $d_i$) the degree of the factor node $a$ (resp. of the variable node $i$), and by $C$ the number of independent cycles of $\mathcal{G}$.

Exact procedures to compute the marginal probabilities of $p$ generally face an exponential complexity problem, and one has to resort to approximate procedures. The Bethe approximation, which is used in statistical physics, consists in minimizing an approximate version of the variational free energy associated with (1.1). In computer science, the belief propagation (BP) algorithm (Pearl, 1988) is a message passing procedure that makes it possible to compute exact marginal probabilities efficiently when the underlying graph is a tree. When the graph has cycles, it is still possible to apply the procedure, which converges with rather good accuracy on sufficiently sparse graphs. However, there may be several fixed points, either stable or unstable. It has been shown that these fixed points coincide with stationary points of the Bethe free energy (Yedidia et al., 2005). In addition (Heskes, 2003; Watanabe and Fukumizu, 2009), stable fixed points of BP are local minima of the Bethe free energy. We will come back to this variational point of view of the BP algorithm in Section 6.
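To make the exponential cost of exact marginalization concrete, here is a minimal brute-force computation of a marginal for a toy model of the form (1.1); the chain structure, the tables phi and psi, and all names are illustrative assumptions, not taken from the paper.

```python
import itertools
import numpy as np

# Brute-force marginal of a toy model of the form (1.1): a chain of four
# binary variables with three pairwise factors.  The point is only that
# the sums below run over q**n configurations, hence the exponential cost.
q, n = 2, 4
rng = np.random.default_rng(0)
phi = [rng.uniform(0.5, 1.5, q) for _ in range(n)]
psi = {(i, i + 1): rng.uniform(0.5, 1.5, (q, q)) for i in range(n - 1)}

def joint(x):
    """Unnormalized p(x) = prod_i phi_i(x_i) * prod_a psi_a(x_a)."""
    p = 1.0
    for i in range(n):
        p *= phi[i][x[i]]
    for (i, j), table in psi.items():
        p *= table[x[i], x[j]]
    return p

# partition function and marginal of x_0: q**n terms each
Z = sum(joint(x) for x in itertools.product(range(q), repeat=n))
marg0 = np.array([
    sum(joint(x) for x in itertools.product(range(q), repeat=n)
        if x[0] == k)
    for k in range(q)
]) / Z
print(marg0)
```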

We discuss in this paper an aspect of the algorithm that has not received much attention in the literature: the effect of the normalization of the messages on the behavior of the algorithm. Indeed, the justification for normalization is generally that it "improves convergence". Moreover, different authors use different schemes, without really explaining what the differences between these definitions are.

The paper is organized as follows: the BP algorithm and its various normalization strategies are defined in Section 2. Section 3 deals with the effect of different types of message normalization on the existence of fixed points. Section 4 is dedicated to the dynamics of the algorithm in terms of beliefs and to cases where convergence of messages is equivalent to convergence of beliefs; moreover, it is shown that normalization does not change the belief dynamics. In Section 5, we show that normalization is required for convergence of the messages, and provide some sufficient conditions. Finally, in Section 6, we tackle the issue of normalization in the variational problem associated with the Bethe approximation. New research directions are proposed in Section 7.

2 The belief propagation algorithm

The belief propagation algorithm (Pearl, 1988) is a message passing procedure whose output is a set of estimated marginal probabilities, the beliefs $b_a(\mathbf{x}_a)$ (including single-node beliefs $b_i(x_i)$). The idea is to factor the marginal probability at a given site as a product of contributions coming from neighboring factor nodes, which are the messages. With the definition (1.1) of the joint probability measure, the update rules read:

m_{a\to i}(x_i) \leftarrow \sum_{\mathbf{x}_{a\setminus i}} \psi_a(\mathbf{x}_a) \prod_{j\in a\setminus i} n_{j\to a}(x_j),   (2.1)

n_{i\to a}(x_i) \stackrel{\text{def}}{=} \phi_i(x_i) \prod_{a'\ni i,\, a'\neq a} m_{a'\to i}(x_i),   (2.2)

where the notation $\sum_{\mathbf{x}_s}$ should be understood as summing over all the variables $x_i$, $i\in s\subset\mathbb{V}$, from $1$ to $q$. At any point of the algorithm, one can compute the current beliefs as

b_i(x_i) \stackrel{\text{def}}{=} \frac{1}{Z_i(m)}\, \phi_i(x_i) \prod_{a\ni i} m_{a\to i}(x_i),   (2.3)

b_a(\mathbf{x}_a) \stackrel{\text{def}}{=} \frac{1}{Z_a(m)}\, \psi_a(\mathbf{x}_a) \prod_{i\in a} n_{i\to a}(x_i),   (2.4)

where $Z_i(m)$ and $Z_a(m)$ are the normalization constants that ensure that

\sum_{x_i} b_i(x_i) = 1, \qquad \sum_{\mathbf{x}_a} b_a(\mathbf{x}_a) = 1.   (2.5)

These constants reduce to $1$ when $\mathcal{G}$ is a tree.

In practice, the messages are often normalized so that

\sum_{x_i=1}^{q} m_{a\to i}(x_i) = 1.   (2.6)

However, the possibilities of normalization are not limited to this setting. Consider the mapping

\Theta_{ai,x_i}(m) \stackrel{\text{def}}{=} \sum_{\mathbf{x}_{a\setminus i}} \psi_a(\mathbf{x}_a) \prod_{j\in a\setminus i} \Bigl[\phi_j(x_j) \prod_{a'\ni j,\, a'\neq a} m_{a'\to j}(x_j)\Bigr].   (2.7)

A normalized version of BP is defined by the update rule

\tilde{m}_{a\to i}(x_i) \leftarrow \frac{\Theta_{ai,x_i}(\tilde{m})}{Z_{ai}(\tilde{m})},   (2.8)

where $Z_{ai}(\tilde{m})$ is a constant that depends on the messages and which, in the case of (2.6), reads

Z^{\mathrm{mess}}_{ai}(\tilde{m}) \stackrel{\text{def}}{=} \sum_{x=1}^{q} \Theta_{ai,x}(\tilde{m}).   (2.9)

In the remainder of this paper, (2.1)–(2.2) will be referred to as the "plain BP" algorithm, to differentiate it from the "normalized BP" of (2.8).

Following Wainwright (2002), it is worth noting that the plain message update scheme can be rewritten as

m_{a\to i}(x_i) \leftarrow \frac{Z_a(m)\, b_{i|a}(x_i)}{Z_i(m)\, b_i(x_i)}\, m_{a\to i}(x_i),   (2.10)

where we use the convenient shorthand notation

b_{i|a}(x_i) \stackrel{\text{def}}{=} \sum_{\mathbf{x}_{a\setminus i}} b_a(\mathbf{x}_a).

This suggests a different type of normalization, used in particular by Heskes (2003), namely

Z^{\mathrm{bel}}_{ai}(\tilde{m}) = \frac{Z_a(\tilde{m})}{Z_i(\tilde{m})},   (2.11)

which leads to the simple update rule

\tilde{m}_{a\to i}(x_i) \leftarrow \frac{b_{i|a}(x_i)}{b_i(x_i)}\, \tilde{m}_{a\to i}(x_i).   (2.12)

The following lemma recapitulates some properties shared by all normalization strategies at a fixed point.

Lemma 2.1.

Let $\tilde{m}$ be such that

\tilde{m}_{a\to i}(x_i) = \frac{\Theta_{ai,x_i}(\tilde{m})}{Z_{ai}(\tilde{m})}.

The associated normalization constants satisfy

Z_{ai}(\tilde{m}) = \frac{Z_a(\tilde{m})}{Z_i(\tilde{m})}, \qquad \forall ai\in\mathbb{E},   (2.13)

and the following compatibility condition holds:

\sum_{\mathbf{x}_{a\setminus i}} b_a(\mathbf{x}_a) = b_i(x_i).   (2.14)

In particular, when $Z_{ai}\equiv 1$ (no normalization), all the $Z_a$ and $Z_i$ are equal to some common constant $Z$.

Proof.

The normalized update rule (2.8), together with (2.3)–(2.4), implies

\sum_{\mathbf{x}_{a\setminus i}} b_a(\mathbf{x}_a) = \frac{Z_i Z_{ai}}{Z_a}\, b_i(x_i).

By definition of $Z_a$ and $Z_i$, $b_a$ and $b_i$ are normalized to $1$, so summing this relation w.r.t. $x_i$ gives (2.13), and the equation above then reduces to (2.14).
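Lemma 2.1 can be checked numerically (a sketch; the cycle of three pairwise factors and all names are illustrative assumptions): running BP with the normalization (2.9) until convergence, the last normalization constants satisfy (2.13) and the beliefs satisfy (2.14).

```python
import numpy as np

# BP with the normalization Z^mess of (2.9) on a single cycle of three
# pairwise factors, followed by a check of (2.13) and (2.14).
q = 2
rng = np.random.default_rng(2)
phi = [rng.uniform(0.5, 1.5, q) for _ in range(3)]
factors = [(0, 1), (1, 2), (0, 2)]
psi = {a: rng.uniform(0.5, 1.5, (q, q)) for a in factors}
m = {(a, i): np.ones(q) / q for a in factors for i in a}

def n_to_factor(i, a):
    """n_{i->a} of (2.2), built from the current messages."""
    out = phi[i].copy()
    for b in factors:
        if i in b and b != a:
            out *= m[(b, i)]
    return out

Zai = {}
for _ in range(3000):
    for a in factors:
        i, j = a
        for tgt, oth, mat in ((i, j, psi[a]), (j, i, psi[a].T)):
            theta = mat @ n_to_factor(oth, a)   # Theta_{a,tgt} of (2.7)
            Zai[(a, tgt)] = theta.sum()         # Z^mess of (2.9)
            m[(a, tgt)] = theta / Zai[(a, tgt)]

# normalization constants and beliefs of (2.3)-(2.4)
Zi, Za, bi, ba = {}, {}, {}, {}
for i in range(3):
    u = phi[i] * np.prod([m[(a, i)] for a in factors if i in a], axis=0)
    Zi[i], bi[i] = u.sum(), u / u.sum()
for a in factors:
    i, j = a
    u = psi[a] * np.outer(n_to_factor(i, a), n_to_factor(j, a))
    Za[a], ba[a] = u.sum(), u / u.sum()

# (2.13): Z_ai = Z_a / Z_i at the fixed point
print(max(abs(Zai[(a, i)] - Za[a] / Zi[i]) for a in factors for i in a))
```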

It is known (Yedidia et al., 2005) that the belief propagation algorithm is an iterative way of solving a variational problem: it minimizes over $b$ the Bethe free energy $F(b)$ associated with (1.1),

F(b) \stackrel{\text{def}}{=} \sum_{a,\mathbf{x}_a} b_a(\mathbf{x}_a) \log\frac{b_a(\mathbf{x}_a)}{\psi_a(\mathbf{x}_a)} + \sum_{i,x_i} b_i(x_i) \log\frac{b_i(x_i)^{1-d_i}}{\phi_i(x_i)}.   (2.15)

Writing the Lagrangian of the minimization of (2.15), with $b$ subject to the constraints (2.14) and (2.5), one obtains

\mathcal{L}(b,\lambda,\gamma) = F(b) + \sum_{i,\, a\ni i,\, x_i} \lambda_{ai}(x_i) \Bigl(b_i(x_i) - \sum_{\mathbf{x}_{a\setminus i}} b_a(\mathbf{x}_a)\Bigr) - \sum_i \gamma_i \Bigl(\sum_{x_i} b_i(x_i) - 1\Bigr).

The minima are stationary points of $\mathcal{L}(b,\lambda,\gamma)$, which correspond to

\begin{cases} b_a(\mathbf{x}_a) = \dfrac{\psi_a(\mathbf{x}_a)}{e} \displaystyle\prod_{j\in a} \prod_{b\ni j,\, b\neq a} m_{b\to j}(x_j), & \forall a\in\mathbb{F},\\[5pt] b_i(x_i) = \phi_i(x_i) \exp\Bigl(\dfrac{1}{d_i-1}-\gamma_i\Bigr) \displaystyle\prod_{a\ni i} m_{a\to i}(x_i), & \forall i\in\mathbb{V}, \end{cases}

with the (invertible) parametrization

\lambda_{ai}(x_i) = \log \prod_{b\ni i,\, b\neq a} m_{b\to i}(x_i).

Enforcing the constraints (2.14) then yields the BP fixed-point equations with normalization terms $\gamma_i$. We will return to this variational setting in Section 6.

3 Normalization and existence of fixed points

We discuss here an aspect of the algorithm that has not received much attention in the literature: the equivalence between the fixed points of the normalized and plain BP flavors.

It is not immediate to check that the normalized version of the algorithm does not introduce new fixed points, which would therefore not correspond to true stationary points of the Bethe free energy. We show in Theorem 3.2 that the sets of fixed points are equivalent, except possibly when the graph $\mathcal{G}$ has exactly one cycle.


As pointed out by Mooij and Kappen (2007), many different sets of messages can correspond to the same set of beliefs. The following lemma shows that the set of messages leading to the same beliefs is simply constructed through positive rescalings.

Lemma 3.1.

Two sets of messages $m$ and $m'$ lead to the same beliefs if, and only if, there is a set of strictly positive constants $c_{ai}$ such that

m'_{a\to i}(x_i) = c_{ai}\, m_{a\to i}(x_i).
Proof.

The direct part of the lemma is trivial. Concerning the converse, we have from (2.3) and (2.4)

\frac{b_a(\mathbf{x}_a)\, Z_a(m)}{\psi_a(\mathbf{x}_a) \prod_{j\in a}\phi_j(x_j)} = \prod_{j\in a} \prod_{b\ni j,\, b\neq a} m_{b\to j}(x_j),

\frac{b_i(x_i)\, Z_i(m)}{\phi_i(x_i)} = \prod_{a\ni i} m_{a\to i}(x_i).

Assume that the two vectors of messages $m$ and $m'$ lead to the same set of beliefs $b$, and write $m_{a\to i}(x_i) = c_{ai,x_i}\, m'_{a\to i}(x_i)$. Then, from the relation on $b_i$, the vector $\mathbf{c}$ satisfies

\prod_{a\ni i} c_{ai,x_i} = \prod_{a\ni i} \frac{m_{a\to i}(x_i)}{m'_{a\to i}(x_i)} = \frac{Z_i(m)}{Z_i(m')} \stackrel{\text{def}}{=} v_i.   (3.1)

Moreover, the beliefs $b_a$ must also be preserved. Using (3.1), we have

\prod_{j\in a} \frac{m_{a\to j}(x_j)}{m'_{a\to j}(x_j)} = \prod_{j\in a} c_{aj,x_j} = \frac{Z_a(m')}{Z_a(m)} \prod_{i\in a} v_i \stackrel{\text{def}}{=} v_a.   (3.2)

Since $v_i$ (resp. $v_a$) does not depend on the choice of $x_i$ (resp. $\mathbf{x}_a$), (3.2) implies that $c_{ai,x_i}$ does not depend on $x_i$. Indeed, if we compare two vectors $\mathbf{x}_a$ and $\mathbf{x}'_a$ such that $x'_i = x_i$ for all $i\in a\setminus j$, but $x'_j\neq x_j$, then $c_{aj,x_j} = c_{aj,x'_j}$, which concludes the proof.

3.1 From normalized BP to plain BP

We show that in most cases the fixed points of a normalized BP algorithm (whatever the normalization used) are associated with fixed points of the plain BP algorithm. Recall that $C$ is the number of independent cycles of $\mathcal{G}$.

Theorem 3.2.

A fixed point $\tilde{m}$ of the BP algorithm with normalized messages corresponds to a fixed point of the plain BP algorithm associated with the same beliefs if, and only if, one of the two following conditions is satisfied:

(i) the graph $\mathcal{G}$ has either no cycle or more than one ($C\neq 1$);

(ii) $C=1$, and the normalization constants of the associated beliefs are such that

\prod_{a\in\mathbb{F}} Z_a(\tilde{m}) \prod_{i\in\mathbb{V}} Z_i(\tilde{m})^{1-d_i} = 1.   (3.3)
Proof.

Let $\tilde{m}$ be a fixed point of (2.8). Let us find a set of constants $c_{ai}$ such that $m_{a\to i}(x_i) = c_{ai}\,\tilde{m}_{a\to i}(x_i)$ is a non-zero fixed point of (2.1)–(2.2). Using Lemma 3.1, we see that $m$ and $\tilde{m}$ correspond to the same beliefs. We have

\Theta_{ai,x_i}(m) = \Bigl[\prod_{j\in a\setminus i} \prod_{a'\ni j,\, a'\neq a} c_{a'j}\Bigr] \Theta_{ai,x_i}(\tilde{m}) = \Bigl[\prod_{j\in a\setminus i} \prod_{a'\ni j,\, a'\neq a} c_{a'j}\Bigr] Z_{ai}\,\tilde{m}_{a\to i}(x_i) = \frac{1}{c_{ai}} \Bigl[\prod_{j\in a\setminus i} \prod_{a'\ni j,\, a'\neq a} c_{a'j}\Bigr] Z_{ai}\, m_{a\to i}(x_i),

and therefore

\log c_{ai} - \sum_{j\in a\setminus i} \sum_{a'\ni j,\, a'\neq a} \log c_{a'j} = \log Z_{ai}.

This equation is precisely in the setting of Lemma A.2, given in the Appendix, with $x_{ai} = \log c_{ai}$ and $y_{ai} = \log Z_{ai} = \log Z_a - \log Z_i$. It always has a solution when $C\neq 1$; when $C=1$, the additional condition (A.5) is required, and (3.3) follows.

There is in general an infinite number of fixed points $m$ corresponding to each $\tilde{m}$. However, as noted at the beginning of the section, this is not a problem, since all these fixed points correspond to the same set of beliefs. In this sense, normalizing the messages can have the effect of collapsing equivalent fixed points.

When $C=1$, it is known (Weiss, 2000) that normalized BP always converges to a fixed point. However, the theorem above states that there may be no plain fixed point $m$ corresponding to a given $\tilde{m}$.

It is actually not difficult to see what happens in this case: consider a trivial network with two variables and two factors $a = b = \{1,2\}$, and assume for simplicity that $\phi_1 = \phi_2 = 1$. The equations for the BP fixed point boil down to relations like

m_{a\to 1}(x_1) = \sum_{x_2} \psi_a(x_1,x_2)\, m_{b\to 2}(x_2),

or, in matrix notation,

\mathbf{m}_{a\to 1} = \Psi_a \mathbf{m}_{b\to 2} = \Psi_a\Psi_b\, \mathbf{m}_{a\to 1}.

Therefore, the matrix $\Psi_a\Psi_b$ necessarily has $1$ as an eigenvalue. Since this is not true in general, there can be no fixed point for plain BP. In the normalized case, Weiss (2000) shows that BP always converges to the Perron vector of this matrix. We know from Lemma 3.1 that there is an uncountably infinite set of messages corresponding to the same beliefs.

It is thus possible for the algorithm to lead to convergence of the beliefs without convergence of the messages, as the case $C=1$ suggests. Indeed, the plain BP scheme is then a linear dynamical system, which can converge to a subspace, as described in Hartfiel (1997). We will describe this kind of behavior more precisely in Section 4.
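The two-variable example above is easy to simulate (a sketch; the random matrices standing for $\Psi_a$ and $\Psi_b$ are illustrative assumptions): plain messages follow the linear system and cannot settle unless $1$ is an eigenvalue of $\Psi_a\Psi_b$, while the normalized messages converge to the Perron vector, so the beliefs converge in both cases.

```python
import numpy as np

# Two variables, two factors a = b = {1, 2}: plain BP iterates the
# linear map m <- Psi_a Psi_b m, which has a fixed point only if 1 is
# an eigenvalue of Psi_a Psi_b.  Normalized BP is a power iteration
# and converges to the Perron vector.
rng = np.random.default_rng(3)
Pa = rng.uniform(0.5, 1.5, (2, 2))       # stands for Psi_a
Pb = rng.uniform(0.5, 1.5, (2, 2))       # stands for Psi_b
M = Pa @ Pb

m_plain = np.ones(2)
for _ in range(60):                      # plain update: the scale drifts
    m_plain = M @ m_plain

m_norm = np.ones(2) / 2
for _ in range(1000):                    # normalized update (2.6)
    m_norm = M @ m_norm
    m_norm /= m_norm.sum()

lam, vecs = np.linalg.eig(M)
perron = np.real(vecs[:, np.argmax(np.real(lam))])
perron /= perron.sum()
print(m_norm, perron)                    # normalized messages -> Perron
print(np.linalg.norm(M @ m_plain) / np.linalg.norm(m_plain))  # ~ rho(M)
```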

3.2 From plain BP to normalized BP

It turns out that there is no general result about whether a plain BP fixed point is mapped to a fixed point by normalization. In this section, we will thus first examine the case of a fairly general family of normalizations, and then look at two other examples.

Definition 3.3.

A normalization $Z_{ai}$ is said to be positive homogeneous when it is of the form $Z_{ai} = N_{ai}\circ\Theta_{ai}$, with $N_{ai}:\mathbb{R}^q \to \mathbb{R}$ positive homogeneous functions of order $1$ satisfying

N_{ai}(\lambda m_{a\to i}) = \lambda N_{ai}(m_{a\to i}), \qquad \forall\lambda\geq 0,   (3.4)

N_{ai}(m_{a\to i}) = 0 \iff m_{a\to i} = 0.   (3.5)

The "$\Leftarrow$" part of (3.5) is obviously implied by (3.4). A particular family of positive homogeneous normalizations is built from the norms $N_{ai}$ on $\mathbb{R}^q$. These contain in particular the normalization $Z^{\mathrm{mess}}_{ai}(m)$ of (2.9), as well as the maximum of the messages

Z^{\infty}_{ai}(m) \stackrel{\text{def}}{=} \max_x \Theta_{ai,x}(m).

It is actually not necessary to use a proper norm: Watanabe and Fukumizu (2009) use a scheme that amounts to

Z^{1}_{ai}(m) \stackrel{\text{def}}{=} \Theta_{ai,1}(m).
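These schemes are easy to compare numerically (a sketch; the random positive matrix standing in for one linear update $\Theta$ is an illustrative assumption): the sum, max and first-component normalizations yield messages that differ only by a positive constant, hence the same beliefs, anticipating Proposition 3.4.

```python
import numpy as np

# Comparing three positive homogeneous normalizations on the same
# update: Z^mess (sum), Z^infty (max) and Z^1 (first component).
rng = np.random.default_rng(4)
M = rng.uniform(0.5, 1.5, (3, 3))      # stands in for one update Theta

def iterate(norm, n_iter=200):
    m = np.ones(3)
    for _ in range(n_iter):
        theta = M @ m
        m = theta / norm(theta)        # normalized update (2.8)
    return m

m_sum = iterate(np.sum)                # Z^mess of (2.9)
m_max = iterate(np.max)                # Z^infty
m_first = iterate(lambda v: v[0])      # Z^1 (Watanabe and Fukumizu)

# the three fixed points differ by a constant only: same beliefs
print(m_sum / m_sum.sum(), m_max / m_max.sum(), m_first / m_first.sum())
```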

The following proposition describes the effect of the above family of normalizations.

Proposition 3.4.

All the fixed points of the plain BP algorithm leading to the same set of beliefs correspond to a unique fixed point of a positive homogeneous normalized scheme.

Proof.

Let $m$ be a fixed point of the plain BP scheme. Using Lemma 3.1, a fixed point $\tilde{m}$ of the normalized scheme associated with the same beliefs as $m$ is such that

\tilde{m}_{a\to i}(x_i) = c_{ai}\, m_{a\to i}(x_i).   (3.6)

Since $\Theta$ is multilinear,

\Theta_{ai,x_i}(\tilde{m}) = \Bigl(\prod_{j\in a\setminus i} \prod_{d\ni j,\, d\neq a} c_{dj}\Bigr) \Theta_{ai,x_i}(m),

and, using (3.4),

Z_{ai}(\tilde{m}) = \Bigl(\prod_{j\in a\setminus i} \prod_{d\ni j,\, d\neq a} c_{dj}\Bigr) Z_{ai}(m),

\tilde{m}_{a\to i}(x_i) = \frac{\Theta_{ai,x_i}(\tilde{m})}{Z_{ai}(\tilde{m})} = \frac{m_{a\to i}(x_i)}{Z_{ai}(m)}.

Therefore, $\tilde{m}$ is determined uniquely from $m$. Since $\tilde{m}$ is clearly invariant over the whole set of messages $m$ corresponding to the same beliefs (see Lemma 3.1), the proof is complete.

In order to emphasize the result of Proposition 3.4, it is interesting to describe what happens with the belief normalization $Z^{\mathrm{bel}}$ of (2.11). We know from Lemma 2.1 that, for any normalization, we have at any fixed point

Z_{ai}(m) = \frac{Z_a(m)}{Z_i(m)} \stackrel{\text{def}}{=} Z^{\mathrm{bel}}_{ai}(m).

Therefore, any fixed point of any normalized scheme (including the plain scheme) is a fixed point of the scheme with normalization $Z^{\mathrm{bel}}$. This is where this kind of normalization differs from a positive homogeneous one: while the latter collapses families of fixed points to one unique fixed point, $Z^{\mathrm{bel}}$ instead preserves all the fixed points of all possible schemes.

To conclude this section, we present an example of a "bad normalization" to illustrate a worst-case scenario. Consider the following normalization:

Z_{ai}(m) = \frac{\sum_x \Theta_{ai,x}(m)}{\sup_x m_{a\to i}(x)}.

This normalization, which is not homogeneous at all, defines a BP algorithm that does not admit any fixed point. Following the proof of Proposition 3.4, let $\tilde{m}$ be a fixed point of normalized BP associated with a plain fixed point $m$ through (3.6); then

\tilde{m}_{a\to i}(x_i) = \frac{\Theta_{ai,x_i}(\tilde{m})}{Z_{ai}(\tilde{m})} = \frac{\tilde{m}_{a\to i}(x_i)}{Z_{ai}(m)}.

Indeed, it is easy to check that

Z_{ai}(\tilde{m}) = \frac{\prod_{j\in a\setminus i} \prod_{b\ni j,\, b\neq a} c_{bj}}{c_{ai}}\, Z_{ai}(m).

Since for any fixed point $m$ of the plain update we have $Z_{ai}(m) > 1$, no message $\tilde{m}$ can be a fixed point of this normalized scheme. Using Theorem 3.2, we conclude that this scheme admits no fixed point at all.

4 Belief dynamic

We are interested here in the dynamics of the algorithm in terms of convergence of beliefs. At each step of the algorithm, using (2.3) and (2.4), we can compute the current beliefs $b_i^{(n)}$ and $b_a^{(n)}$ associated with the messages $m^{(n)}$. The sequence $m^{(n)}$ will be said to be "$b$-convergent" when the sequences $b_i^{(n)}$ and $b_a^{(n)}$ converge. The term "simple convergence" will be used to refer to convergence of the sequence $m^{(n)}$ itself. Simple convergence obviously implies $b$-convergence. We will first show that, for a positive homogeneous normalization, $b$-convergence and simple convergence are equivalent. We will then conclude by looking at $b$-convergence in a quotient space introduced in Mooij and Kappen (2007), and show the links between these two approaches.

Proposition 4.1.

For any positive homogeneous normalization $Z_{ai}$ with continuous $N_{ai}$, simple convergence and $b$-convergence are equivalent.

Proof.

Assume that the sequences of beliefs, indexed by the iteration $n$, are such that $b_a^{(n)} \to b_a$ and $b_i^{(n)} \to b_i$ as $n\to\infty$. The idea of the proof is first to express the normalized messages $\tilde{m}^{(n)}_{a\to i}$ at each step in terms of these beliefs, and then to conclude by a continuity argument. Starting from a rewrite of (2.3)–(2.4),

b_i^{(n)}(x_i) = \frac{\phi_i(x_i)}{Z_i(\tilde{m}^{(n)})} \prod_{a\ni i} \tilde{m}^{(n)}_{a\to i}(x_i),

b_a^{(n)}(\mathbf{x}_a) = \frac{\psi_a(\mathbf{x}_a)}{Z_a(\tilde{m}^{(n)})} \prod_{j\in a} \phi_j(x_j) \prod_{b\ni j,\, b\neq a} \tilde{m}^{(n)}_{b\to j}(x_j),

one obtains by recombination

\prod_{j\in a} \tilde{m}^{(n)}_{a\to j}(x_j) = \frac{\prod_{j\in a} Z_j(\tilde{m}^{(n)})}{Z_a(\tilde{m}^{(n)})}\, \psi_a(\mathbf{x}_a)\, \frac{\prod_{j\in a} b_j^{(n)}(x_j)}{b_a^{(n)}(\mathbf{x}_a)} \stackrel{\text{def}}{=} \frac{K^{(n)}_{ai}(\mathbf{x}_{a\setminus i}; x_i)}{\tilde{Z}_{ai}(\tilde{m}^{(n)})},

where an arbitrary variable $i\in a$ has been singled out and

\frac{1}{\tilde{Z}_{ai}(\tilde{m}^{(n)})} \stackrel{\text{def}}{=} \frac{\prod_{j\in a} Z_j(\tilde{m}^{(n)})}{Z_a(\tilde{m}^{(n)})}.

Assume now that $\mathbf{x}_{a\setminus i}$ is fixed, and consider $\mathbf{K}^{(n)}_{ai}(\mathbf{x}_{a\setminus i}) \stackrel{\text{def}}{=} K^{(n)}_{ai}(\mathbf{x}_{a\setminus i};\cdot)$ as a vector of $\mathbb{R}^q$. Normalizing each side of the equation with the positive homogeneous function $N_{ai}$ yields

\frac{\tilde{m}^{(n)}_{a\to i}(x_i)}{N_{ai}\bigl[\tilde{m}^{(n)}_{a\to i}\bigr]} = \frac{K^{(n)}_{ai}(\mathbf{x}_{a\setminus i}; x_i)}{N_{ai}\bigl[\mathbf{K}^{(n)}_{ai}(\mathbf{x}_{a\setminus i})\bigr]}.

Actually $N_{ai}\bigl[\tilde{m}^{(n)}_{a\to i}\bigr] = 1$, since $\tilde{m}^{(n)}_{a\to i}$ has been normalized by $N_{ai}$, and therefore

\tilde{m}^{(n)}_{a\to i}(x_i) = \frac{K^{(n)}_{ai}(\mathbf{x}_{a\setminus i}; x_i)}{N_{ai}\bigl[\mathbf{K}^{(n)}_{ai}(\mathbf{x}_{a\setminus i})\bigr]}.

This concludes the proof, since $\tilde{m}^{(n)}_{a\to i}$ has been expressed as a continuous function of $b_i^{(n)}$ and $b_a^{(n)}$, and therefore it converges whenever the beliefs converge.


We now follow an idea developed in Mooij and Kappen (2007) and study the behavior of the BP algorithm in a quotient space corresponding to the invariance of the beliefs. First, we introduce a natural parametrization for which the quotient space is just a vector space. It is then trivial to show that, in terms of $b$-convergence, normalization has no effect.

The idea of $b$-convergence is easier to express with the new parametrization

\mu_{ai}(x_i) \stackrel{\text{def}}{=} \log m_{a\to i}(x_i),

so that the plain update mapping (2.7) becomes

\Lambda_{ai,x_i}(\mu) = \log\Bigl[\sum_{\mathbf{x}_{a\setminus i}} \psi_a(\mathbf{x}_a) \exp\Bigl(\sum_{j\in a\setminus i} \sum_{b\ni j,\, b\neq a} \mu_{bj}(x_j)\Bigr)\Bigr].

We have $\mu\in\mathcal{N} \stackrel{\text{def}}{=} \mathbb{R}^{|\mathbb{E}|q}$, and we define the vector space $\mathcal{W}$ as the linear span of the vectors $\{e_{ai}\in\mathcal{N}\}_{(ai)\in\mathbb{E}}$ defined by

(e_{ai})_{cj,x_j} \stackrel{\text{def}}{=} \mathbb{1}_{\{ai=cj\}}.

It is trivial to see that the invariance set of the beliefs corresponding to $\mu$, described in Lemma 3.1, is simply the affine space $\mu+\mathcal{W}$. The $b$-convergence of a sequence $\mu^{(n)}$ is thus simply the convergence of $\mu^{(n)}$ in the quotient space $\mathcal{N}/\mathcal{W}$ (which is a vector space, see Halmos (1974)). Finally, we write $[x]$ for the canonical projection of $x$ onto $\mathcal{N}/\mathcal{W}$.

Suppose now that we apply some kind of normalization to $\mu$: it is easy to see that this normalization plays no role in the quotient space. Indeed, normalization turns $\mu$ into $\mu+w$ for some $w\in\mathcal{W}$, and we have

\Lambda_{ai,x_i}(\mu+w) = \sum_{j\in a\setminus i} \sum_{b\ni j,\, b\neq a} w_{bj} + \Lambda_{ai,x_i}(\mu) \stackrel{\text{def}}{=} l_{ai} + \Lambda_{ai,x_i}(\mu),

which can be summed up as

[\Lambda(\mu+\mathcal{W})] = [\Lambda(\mu)],   (4.1)

since $l\in\mathcal{W}$. We conclude with a proposition that is directly implied by (4.1).

Proposition 4.2.

The dynamics of the BP algorithm, i.e. the values of the normalized beliefs at each step, are exactly the same with or without normalization.
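Equation (4.1) can be checked numerically on a toy factor (a sketch; the single pairwise factor and all names are illustrative assumptions): shifting a log-message by a constant, i.e. by an element of $\mathcal{W}$, shifts $\Lambda$ by a constant as well, so the projection onto the quotient space is unchanged.

```python
import numpy as np

# Check of (4.1) on a single pairwise factor a = {1, 2}: shifting the
# incoming log-message mu_{b2} by an element of W (a constant w per
# directed edge) shifts Lambda by a constant l_{a1} as well, so
# [Lambda(mu + W)] = [Lambda(mu)].
rng = np.random.default_rng(5)
q = 3
psi = rng.uniform(0.5, 1.5, (q, q))    # psi_a(x_1, x_2)
mu_b2 = rng.normal(size=q)             # incoming log-message mu_{b2}

def Lambda(mu):
    """Lambda_{a1,x_1}(mu) = log sum_{x_2} psi_a(x_1,x_2) exp(mu(x_2))."""
    return np.log(psi @ np.exp(mu))

w = 0.7                                # element of W on the edge (b, 2)
shift = Lambda(mu_b2 + w) - Lambda(mu_b2)
print(shift)                           # constant vector equal to w
```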

We will come back to this quotient-space point of view in Section 5.3.

5 Local stability of BP fixed points

The question of the convergence of BP has been addressed in a series of works (Tatikonda and Jordan, 2002; Mooij and Kappen, 2007; Ihler et al., 2005), which establish conditions and bounds on the MRF coefficients ensuring global convergence. In this section, we change the viewpoint and, instead of looking for conditions ensuring a single fixed point, we examine the different fixed points for a given joint probability and their local properties.

In what follows, we are interested in the local stability of a message fixed point $m$ with associated beliefs $b$. It is known that a BP fixed point is locally attractive if the Jacobian of the relevant mapping ($\Theta$ or its normalized version) at this point has a spectral radius strictly smaller than $1$, and unstable when the spectral radius is strictly greater than $1$. The term "spectral radius" should be understood here as the modulus of the largest eigenvalue of the Jacobian matrix.

We will first show that BP with plain messages can in fact never converge when there is more than one cycle (Theorem 5.1), and then explain how normalization of messages improves the situation (Proposition 5.2, Theorem 5.3).

5.1 Unnormalized messages

The characterization of local stability relies on two ingredients. The first one is the oriented line graph $L(\mathcal{G})$ based on $\mathcal{G}$, whose vertices are the elements of $\mathbb{E}$, and whose oriented edges connect $ai$ to $a'j$ whenever $j\in a\cap a'$, $j\neq i$ and $a'\neq a$. The corresponding $0$–$1$ adjacency matrix $A$ is defined by the coefficients

A_{ai}^{a'j} \stackrel{\text{def}}{=} \mathbb{1}_{\{j\in a\cap a',\, j\neq i,\, a'\neq a\}}.   (5.1)

The second ingredient is the set of stochastic matrices $B^{(iaj)}$, attached to pairs of variables $(i,j)$ having a factor node $a$ in common, and whose coefficients are the conditional beliefs

b^{(iaj)}_{k\ell} \stackrel{\text{def}}{=} b_a(x_j=\ell \mid x_i=k) = \sum_{\mathbf{x}_{a\setminus\{i,j\}}} \frac{b_a(\mathbf{x}_a)}{b_i(x_i)} \bigg|_{x_i=k,\, x_j=\ell},

for all $(k,\ell)\in\{1,\ldots,q\}^2$.

Using the representation (2.10) of the BP algorithm, the Jacobian at this point reads:

Θai,xi(m)maj(xj)\displaystyle\frac{\partial\Theta_{ai,x_{i}}(m)}{\partial m_{a^{\prime}\to j}(x_{j})} =𝐱a{i,j}ba(𝐱a)bi(xi)mai(xi)maj(xj)11{jai}11{aj,aa}\displaystyle=\sum_{\mathbf{x}_{a\setminus\{i,j\}}}\frac{b_{a}(\mathbf{x}_{a})}{b_{i}(x_{i})}\frac{m_{a\to i}(x_{i})}{m_{a^{\prime}\to j}(x_{j})}\leavevmode\hbox{\rm\small 1\kern-3.23753pt\normalsize 1}_{\{j\in a\setminus i\}}\leavevmode\hbox{\rm\small 1\kern-3.23753pt\normalsize 1}_{\{a^{\prime}\ni j,a^{\prime}\neq a\}}
=bij|a(xi,xj)bi(xi)mai(xi)maj(xj)Aaiaj\displaystyle=\frac{b_{ij|a}(x_{i},x_{j})}{b_{i}(x_{i})}\frac{m_{a\to i}(x_{i})}{m_{a^{\prime}\to j}(x_{j})}A_{ai}^{a^{\prime}j}

Therefore, up to a trivial change of variables, the Jacobian of the plain BP algorithm is similar to the matrix J defined, for any pair (ai,k) and (a'j,ℓ) of 𝔼×{1,…,q}, by the elements

J_{ai,k}^{a'j,\ell} \stackrel{\text{def}}{=} b_{k\ell}^{(iaj)}\,A_{ai}^{a'j}.

This expression is analogous to the Jacobian encountered in Mooij and Kappen (2007). It is interesting to note that it depends only on the structure of the graph and on the beliefs at the fixed point.

Since 𝒢 is a connected graph, it is clear that A is an irreducible matrix. To simplify the discussion, we assume in the following that J is also irreducible; this will be true as long as the functions ψ are positive. It is easy to see that to any right eigenvector of A corresponds a right eigenvector of J associated with the same eigenvalue: if 𝐯=(v_{ai}, ai∈𝔼) is such that A𝐯=λ𝐯, then the vector 𝐯+, defined by the coordinates v+_{a'jℓ} = v_{a'j} for all a'j∈𝔼 and ℓ∈{1,…,q}, satisfies J𝐯+=λ𝐯+. We will say that 𝐯+ is an A-based right eigenvector of J. Similarly, if 𝐮 is a left eigenvector of A, one can define, with obvious notations, an A-based left eigenvector 𝐮+ of J by the coordinates u+_{aik} = u_{ai} b_i(k).
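This correspondence is easy to check numerically. In the sketch below (a toy factor graph with two cycles and random stochastic kernels, both illustrative assumptions of ours), the A-based vector built from the Perron vector of A is an exact eigenvector of J:

```python
import numpy as np

rng = np.random.default_rng(1)
q = 2
factors = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)]  # toy graph, C = 2
edges = [(a, i) for a, f in enumerate(factors) for i in f]
n = len(edges)

A = np.zeros((n, n))
for r, (a, i) in enumerate(edges):
    for c, (ap, j) in enumerate(edges):
        if j in factors[a] and j != i and ap != a:
            A[r, c] = 1.0

# One random row-stochastic kernel B^(iaj) per ordered pair inside a factor
B = {}
for a, (i, j) in enumerate(factors):
    for u, v in [(i, j), (j, i)]:
        m = rng.random((q, q))
        B[(a, u, v)] = m / m.sum(axis=1, keepdims=True)

# J has block (ai, a'j) equal to B^(iaj) wherever A_{ai}^{a'j} = 1
J = np.zeros((n * q, n * q))
for r, (a, i) in enumerate(edges):
    for c, (ap, j) in enumerate(edges):
        if A[r, c]:
            J[r*q:(r+1)*q, c*q:(c+1)*q] = B[(a, i, j)]

# Right Perron vector of A by power iteration
v = np.ones(n)
for _ in range(500):
    v = A @ v
    v /= v.sum()
lam1 = (A @ v)[0] / v[0]

v_plus = np.repeat(v, q)         # A-based vector: v+_{ai,l} = v_{ai}
print(np.allclose(J @ v_plus, lam1 * v_plus))   # True
```

The check is exact because the rows of each kernel sum to one, so the inner sum over ℓ collapses and J acts on 𝐯+ exactly as A acts on 𝐯.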

Using this correspondence between the two matrices, we can prove the following result.

Theorem 5.1.

If the graph 𝒢 has more than one cycle (C>1) and the matrix J is irreducible, then the plain BP update rules (2.1, 2.2) do not admit any stable fixed point.

Proof.

Let 𝝅 be the right Perron vector of A, which has positive entries since A is irreducible (Seneta, 2006, Theorem 1.5). The A-based vector 𝝅+ also has positive coordinates and is therefore the right Perron vector of J (Seneta, 2006, Theorem 1.6); the spectral radius of J is thus equal to that of A.

When C>1, Lemma A.1 implies that 1 is an eigenvalue of A associated with divergenceless vectors. However, such vectors cannot be non-negative, and therefore the Perron eigenvalue of A is strictly greater than 1. This concludes the proof of the theorem.
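A direct numerical illustration of the theorem (with a toy factor graph that is an assumption of ours, not an example from the paper):

```python
import numpy as np

# Toy factor graph with C = 2 > 1 independent cycles (illustrative assumption)
factors = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)]
edges = [(a, i) for a, f in enumerate(factors) for i in f]
n = len(edges)
A = np.zeros((n, n))
for r, (a, i) in enumerate(edges):
    for c, (ap, j) in enumerate(edges):
        if j in factors[a] and j != i and ap != a:
            A[r, c] = 1.0

rho = max(abs(np.linalg.eigvals(A)))   # Perron eigenvalue of A
print(rho > 1.0)   # True: the plain BP Jacobian has spectral radius above 1
```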

5.2 Positively homogeneous normalization

We have seen in Proposition 4.1 that all the continuous positively homogeneous normalizations make simple convergence equivalent to b-convergence. As a result, one expects that the local stability of fixed points will again depend only on the belief structure. Since all positively homogeneous normalizations share the same properties, we look at the particular case of Z^{mess}_{ai}(m), which is both simple and differentiable. We then obtain a Jacobian matrix with more interesting properties. In particular, this matrix depends not only on the beliefs at the fixed point, but also on the messages themselves: for the normalized BP algorithm (2.8 with Z^{mess}_{ai}), the coefficients of the Jacobian at a fixed point m with beliefs b read

m~aj()[Θai,k(m~)x=1qΘai,x(m~)]\displaystyle\frac{\partial}{\partial\tilde{m}_{a^{\prime}\to j}(\ell)}\biggl{[}\frac{\Theta_{ai,k}(\tilde{m})}{\sum_{x=1}^{q}\Theta_{ai,x}(\tilde{m})}\biggr{]}
=Jai,kaj,mai(k)maj()mai(k)x=1qJai,xaj,mai(x)maj(),\displaystyle=J_{ai,k}^{a^{\prime}j,\ell}\frac{m_{a\to i}(k)}{m_{a^{\prime}\to j}(\ell)}-m_{a\to i}(k)\sum_{x=1}^{q}J_{ai,x}^{a^{\prime}j,\ell}\frac{m_{a\to i}(x)}{m_{a^{\prime}\to j}(\ell)},

which is again similar to the matrix J~\widetilde{J} of general term

J~ai,kaj,=def[bk(iaj)x=1qmai(x)bx(iaj)]Aaiaj=Jai,kaj,x=1qmai(x)Jai,xaj,.\widetilde{J}_{ai,k}^{a^{\prime}j,\ell}\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}\biggl{[}b_{k\ell}^{(iaj)}-\sum_{x=1}^{q}m_{a\to i}(x)b_{x\ell}^{(iaj)}\biggr{]}A_{ai}^{a^{\prime}j}=J_{ai,k}^{a^{\prime}j,\ell}-\sum_{x=1}^{q}m_{a\to i}(x)J_{ai,x}^{a^{\prime}j,\ell}. (5.2)

It is actually possible to prove that the spectrum of J̃ does not depend on the messages themselves, but only on the beliefs at the fixed point.

Proposition 5.2.

The eigenvectors of J are associated with eigenvectors of J̃ sharing the same eigenvalues, except for the A-based eigenvectors of J (including its Perron vector), which belong to the kernel of J̃.

Proof.

The new Jacobian matrix can be expressed from the old one as J~=(𝕀M)J\widetilde{J}=(\mathbb{I}-M)J, where MM is the matrix whose coefficient at row (ai,k)(ai,k) and column (aj,)(a^{\prime}j,\ell) is 11{a=a,i=j}maj()\leavevmode\hbox{\rm\small 1\kern-3.23753pt\normalsize 1}_{\{a=a^{\prime},i=j\}}m_{a^{\prime}\to j}(\ell). Elementary computations yield the following properties of MM:

  • M2=MM^{2}=M: MM is a projector;

  • J~M=0\widetilde{J}M=0.

For any right eigenvector 𝐯\mathbf{v} of JJ associated to some eigenvalue λ\lambda,

J~(𝐯M𝐯)\displaystyle\widetilde{J}(\mathbf{v}-M\mathbf{v}) =J~𝐯=(𝕀M)J𝐯=λ(𝐯M𝐯)\displaystyle=\widetilde{J}\mathbf{v}=(\mathbb{I}-M)J\mathbf{v}=\lambda(\mathbf{v}-M\mathbf{v})

so that 𝐯M𝐯\mathbf{v}-M\mathbf{v} is a (right) eigenvector of J~\widetilde{J} associated to λ\lambda, unless 𝐯\mathbf{v} is an AA-based eigenvector, in which case 𝐯=M𝐯\mathbf{v}=M\mathbf{v} and 𝐯\mathbf{v} is in the kernel of J~\widetilde{J}.

Similarly, if 𝐮 is such that 𝐮^T J̃ = λ𝐮^T for λ≠0, then λ𝐮^T M = 𝐮^T J̃ M = 0 and therefore 𝐮^T J̃ = 𝐮^T(𝕀−M)J = 𝐮^T J = λ𝐮^T: any non-zero eigenvalue of J̃ is an eigenvalue of J. This proves the last part of the proposition.
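The proposition can be illustrated numerically. With random stochastic kernels and random normalized messages (all illustrative assumptions of ours, on a toy two-cycle factor graph), M is a projector, the A-based Perron vector falls in the kernel of J̃, and the spectral radius drops:

```python
import numpy as np

rng = np.random.default_rng(2)
q = 2
factors = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)]  # toy graph, C = 2
edges = [(a, i) for a, f in enumerate(factors) for i in f]
n = len(edges)

A = np.zeros((n, n))
for r, (a, i) in enumerate(edges):
    for c, (ap, j) in enumerate(edges):
        if j in factors[a] and j != i and ap != a:
            A[r, c] = 1.0

B = {}
for a, (i, j) in enumerate(factors):
    for u, v in [(i, j), (j, i)]:
        m = rng.random((q, q))
        B[(a, u, v)] = m / m.sum(axis=1, keepdims=True)

J = np.zeros((n * q, n * q))
for r, (a, i) in enumerate(edges):
    for c, (ap, j) in enumerate(edges):
        if A[r, c]:
            J[r*q:(r+1)*q, c*q:(c+1)*q] = B[(a, i, j)]

# M: inside each diagonal block "ai", every row is the message m_{a->i}
msg = rng.random((n, q))
msg /= msg.sum(axis=1, keepdims=True)
M = np.zeros((n * q, n * q))
for r in range(n):
    M[r*q:(r+1)*q, r*q:(r+1)*q] = np.tile(msg[r], (q, 1))

Jt = (np.eye(n * q) - M) @ J           # Jacobian of the normalized scheme
sr = lambda X: max(abs(np.linalg.eigvals(X)))

v = np.ones(n)                         # Perron vector of A (power iteration)
for _ in range(500):
    v = A @ v
    v /= v.sum()

print(np.allclose(M @ M, M))                    # True: M is a projector
print(np.allclose(Jt @ np.repeat(v, q), 0.0))   # True: A-based kernel
print(sr(Jt) < sr(J))                           # True: smaller radius
```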

As a consequence of this proposition, when J is an irreducible matrix, J̃ has a strictly smaller spectral radius: the net effect of normalization is to improve convergence (although it may not be enough to guarantee it). To quantify this improvement related to message normalization, we resort to classical arguments used for the convergence speed of Markov chains (see e.g. Brémaud (1999)).

The presence of the messages in the Jacobian matrix J̃ complicates the evaluation of this effect. However, it is known (see e.g. Furtlehner et al. (2010)) that it is possible to choose the functions ϕ̂ and ψ̂ as

ϕ^i(xi)=defb^i(xi),ψ^a(𝐱a)=defb^a(𝐱a)iab^i(xi),\hat{\phi}_{i}(x_{i})\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}\hat{b}_{i}(x_{i}),\qquad\hat{\psi}_{a}(\mathbf{x}_{a})\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}\frac{\hat{b}_{a}(\mathbf{x}_{a})}{\prod_{i\in a}\hat{b}_{i}(x_{i})}, (5.3)

in order to obtain a prescribed set of beliefs b̂ at a fixed point. Indeed, BP will admit a fixed point with b_a=b̂_a and b_i=b̂_i when m_{a→i}(x_i)≡1. Since only the beliefs matter here, we restrict ourselves, without loss of generality, to the functions (5.3) in the remainder of this section. Then, from (5.2), the definition of J̃ rewrites as

J~ai,kaj,=def[bk(iaj)1qx=1qbx(iaj)]Aaiaj=Jai,kaj,1qx=1qJai,xaj,.\widetilde{J}_{ai,k}^{a^{\prime}j,\ell}\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}\biggl{[}b_{k\ell}^{(iaj)}-\frac{1}{q}\sum_{x=1}^{q}b_{x\ell}^{(iaj)}\biggr{]}A_{ai}^{a^{\prime}j}=J_{ai,k}^{a^{\prime}j,\ell}-\frac{1}{q}\sum_{x=1}^{q}J_{ai,x}^{a^{\prime}j,\ell}.

For each connected pair (i,j)(i,j) of variable nodes, we associate to the stochastic kernel B(iaj)B^{(iaj)} a combined stochastic kernel K(iaj)=defB(iaj)B(jai)K^{(iaj)}\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}B^{(iaj)}B^{(jai)}, with coefficients

Kk(iaj)=defm=1qbkm(iaj)bm(jai).K_{k\ell}^{(iaj)}\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}\sum_{m=1}^{q}b_{km}^{(iaj)}b_{m\ell}^{(jai)}. (5.4)

Since b(i)B(iaj)=b(j)b^{(i)}B^{(iaj)}=b^{(j)}, b(i)b^{(i)} is the invariant measure associated to KK:

b(i)K(iaj)=b(i)B(iaj)B(jai)=b(j)B(jai)=b(i)b^{(i)}K^{(iaj)}=b^{(i)}B^{(iaj)}B^{(jai)}=b^{(j)}B^{(jai)}=b^{(i)}

and K(iaj)K^{(iaj)} is reversible, since

bk(i)Kk(iaj)\displaystyle b_{k}^{(i)}K_{k\ell}^{(iaj)} =m=1qbmk(jai)bm(j)bm(jai)\displaystyle=\sum_{m=1}^{q}b_{mk}^{(jai)}b_{m}^{(j)}b_{m\ell}^{(jai)}
=m=1qbmk(jai)bm(iaj)b(i)=b(i)Kk(iaj).\displaystyle=\sum_{m=1}^{q}b_{mk}^{(jai)}b_{\ell m}^{(iaj)}b_{\ell}^{(i)}=b_{\ell}^{(i)}K_{\ell k}^{(iaj)}.

Let μ2(iaj)\mu_{2}^{(iaj)} be the second largest eigenvalue of K(iaj)K^{(iaj)} and let

μ2=defmaxij|μ2(iaj)|12.\mu_{2}\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}\max_{ij}|\mu_{2}^{(iaj)}|^{\frac{1}{2}}.
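A quick numerical sketch (with a random pairwise belief, an illustrative assumption of ours) checks the invariance and reversibility of the combined kernel K and extracts μ2:

```python
import numpy as np

rng = np.random.default_rng(3)
q = 3
b_a = rng.random((q, q))
b_a /= b_a.sum()                      # pairwise belief on a = {i, j}
b_i, b_j = b_a.sum(axis=1), b_a.sum(axis=0)
B_iaj = b_a / b_i[:, None]            # b_a(x_j | x_i)
B_jai = b_a.T / b_j[:, None]          # b_a(x_i | x_j)
K = B_iaj @ B_jai                     # combined kernel, eq. (5.4)

print(np.allclose(b_i @ K, b_i))      # True: b^(i) is the invariant measure
R = b_i[:, None] * K
print(np.allclose(R, R.T))            # True: K is reversible w.r.t. b^(i)

eig = np.sort(np.linalg.eigvals(K).real)
mu2 = eig[-2]                         # second largest eigenvalue of K
print(0.0 <= mu2 < 1.0)               # True: spectral gap below the Perron 1
```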

The combined effect of the graph and of the local correlations on the stability of the reference fixed point is stated as follows.

Theorem 5.3.

Let λ1 be the Perron eigenvalue of the matrix A. Then:

  1. (i)

    if λ1μ2<1, the fixed point of the normalized BP scheme (2.8 with Z^{mess}_{ai}) associated with b is stable;

  2. (ii)

    condition (i) is necessary and sufficient if the system is homogeneous (B(iaj)=BB^{(iaj)}=B independent of ii, jj and aa), with μ2\mu_{2} representing the second largest eigenvalue of BB.

Proof.

See Appendix B  
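The homogeneous case (ii) can be checked directly. With a toy two-cycle factor graph and the kernel B = (1−ε)U + εI (U the uniform stochastic matrix; all of this an illustrative assumption of ours), the second eigenvalue of B is ε, and the spectral radius of the tensor-product J̃ comes out as exactly λ1·ε:

```python
import numpy as np

q, eps = 3, 0.3
factors = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)]  # toy graph, C = 2
edges = [(a, i) for a, f in enumerate(factors) for i in f]
n = len(edges)
A = np.zeros((n, n))
for r, (a, i) in enumerate(edges):
    for c, (ap, j) in enumerate(edges):
        if j in factors[a] and j != i and ap != a:
            A[r, c] = 1.0
lam1 = max(abs(np.linalg.eigvals(A)))

B = (1 - eps) * np.full((q, q), 1 / q) + eps * np.eye(q)   # mu2 = eps
Jt_block = B - B.mean(axis=0)      # b_kl - (1/q) sum_x b_xl, as in (5.2)
Jt = np.kron(A, Jt_block)          # homogeneous J~ is a tensor product
sr = max(abs(np.linalg.eigvals(Jt)))

print(np.isclose(sr, lam1 * eps))  # True: stability iff lam1 * mu2 < 1
```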

The quantity μ2 is representative of the level of mutual information between variables. It relates to the spectral gap (see e.g. Diaconis and Stroock (1991) for geometric bounds) of each elementary stochastic matrix B^{(iaj)}, while λ1 encodes the statistical properties of the graph connectivity. The bound λ1μ2<1 could be refined by dealing with the statistical average of the sum over paths in (B.1), which allows one to define μ2 as

μ2=limnmax(ai,aj){1|Γai,aj(n)|γΓai,aj(n)((x,y)γμ2(xy))12n}.\mu_{2}=\lim_{n\to\infty}\max_{(ai,a^{\prime}j)}\Bigl{\{}\frac{1}{|\Gamma_{ai,a^{\prime}j}^{\scriptscriptstyle(n)}|}\sum_{\gamma\in\Gamma_{ai,a^{\prime}j}^{\scriptscriptstyle(n)}}\Bigl{(}\prod_{(x,y)\in\gamma}\mu_{2}^{(xy)}\Bigr{)}^{\frac{1}{2n}}\Bigr{\}}.

5.3 Local convergence in quotient space 𝒩𝒲\mathcal{N}\setminus\mathcal{W}

The idea is to make the connection between the local stability of fixed points as described previously and the same notion of local stability in the quotient space 𝒩∖𝒲 described in Section 4. A straightforward computation based on the results of Section 5.1 gives the derivatives of Λ:

Λai,xi(μ)μbj(xj)=bij|a(xi,xj)bi(xi)Aaibj=Jai,xibj,xj.\frac{\partial\Lambda_{ai,x_{i}}(\mu)}{\partial\mu_{bj}(x_{j})}=\frac{b_{ij|a}(x_{i},x_{j})}{b_{i}(x_{i})}A_{ai}^{bj}=J_{ai,x_{i}}^{bj,x_{j}}.

In terms of convergence in 𝒩∖𝒲, the stability of a fixed point is governed by the projection of J on the quotient space 𝒩∖𝒲, and we have (Mooij and Kappen, 2007):

[J]=def[Λ]=[Λ][J]\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}[\nabla\Lambda]=\nabla[\Lambda]
Proposition 5.4.

The eigenvalues of [J] are the eigenvalues of J which are not associated with A-based eigenvectors. The A-based eigenvectors of J belong to the kernel of [J].

Proof.

Let v be an eigenvector of J for the eigenvalue λ; we have

[Jv]=[λv]=λ[v],[Jv]=[\lambda v]=\lambda[v],

so [v] is an eigenvector of [J] with the same eigenvalue λ iff [v]≠0. The A-based eigenvectors w of J (see Section 5.1) belong to 𝒲, so we have

[w]=0.[w]=0.

This means that these eigenvectors of J have no counterpart in [J] and play no role in the stability of belief fixed points.

We have seen that the normalization Z^{mess}_{ai} amounts to multiplying the Jacobian matrix J by the projection 𝕀−M (Proposition 5.2), with

ker(𝕀M)=𝒲.\ker(\mathbb{I}-M)=\mathcal{W}.

The projection 𝕀−M is in fact a quotient map from 𝒩 to 𝒩∖𝒲. The normalization Z^{mess}_{ai} is therefore strictly equivalent, as far as the messages m_{a→i}(x_i) are concerned, to working on the quotient space 𝒩∖𝒲. More generally, any differentiable positively homogeneous normalization yields the same result: the Jacobian of the corresponding normalized scheme is the projection of the Jacobian J on the quotient space 𝒩∖𝒲, through some quotient map.

6 Normalization in the variational problem

Since Proposition 4.2 shows that the choice of normalization has no real effect on the dynamics of BP, it will have no effect on b-convergence either. In this section, we turn to the effect of normalization on the underlying variational problem. It will be assumed here that the beliefs b_i and b_a are normalized (2.5) and compatible (2.14). If only (2.14) is satisfied, they will be denoted β_i and β_a. It is quite obvious that imposing only the compatibility constraints leads to a unique normalization constant Z

Z(β)=defxiβi(xi)=𝐱aβa(𝐱a),Z(\beta)\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}\sum_{x_{i}}\beta_{i}(x_{i})=\sum_{\mathbf{x}_{a}}\beta_{a}(\mathbf{x}_{a}),

which is not a priori related to the constants Za(m)Z_{a}(m) and Zi(m)Z_{i}(m) seen in the previous sections. The quantities βi(xi)/Z(β)\beta_{i}(x_{i})/Z(\beta) and βa(𝐱a)/Z(β)\beta_{a}(\mathbf{x}_{a})/Z(\beta) can be denoted as bi(xi)b_{i}(x_{i}) and ba(𝐱a)b_{a}(\mathbf{x}_{a}) since (2.5) holds for them.

The aim of this section is to make explicit the relationship between the minimizations of the Bethe free energy (2.15) with and without the normalization constraints (2.5). Generally speaking, we can express them as a minimization problem 𝒫(E) on some set E as

𝒫(E):argminβEF(β)\mathcal{P}(E)\quad:\quad\underset{\beta\in E}{\operatorname{argmin}}\,F(\beta) (6.1)

where EE is chosen as follows

  • plain case: E=E1 is the set of positive measures such that (2.14) holds,

  • normalized case: E=E2E1E=E_{2}\varsubsetneq E_{1} has the additional constraint (2.5).

It is possible to derive a BP algorithm for the plain problem following the same path as in Section 2. The resulting update equations will be identical, except for the γi\gamma_{i} terms.

The first step is to compare the solutions of (6.1) on E1 and E2. Let φ be the bijection between E2×ℝ*+ and E1,

φ:\displaystyle\varphi: E2×+E1\displaystyle\,E_{2}\times\mathbb{R}^{*}_{+}\longrightarrow E_{1}
(b,Z)bZ.\displaystyle(b,Z)\longrightarrow bZ.

The variational problem 𝒫(E1)\mathcal{P}(E_{1}) is equivalent to

(\hat{b},\hat{Z})=\underset{(b,Z)\in E_{2}\times\mathbb{R}^{*}_{+}}{\operatorname{argmin}}\,F(\varphi(b,Z)),

with φ(b^,Z^)=b^Z^=β^=defargminβE1F(β)\varphi(\hat{b},\hat{Z})=\hat{b}\hat{Z}=\hat{\beta}\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}\underset{\beta\in E_{1}}{\operatorname{argmin}}\,F(\beta).

The next step is to express the Bethe free energy F(β)F(\beta) of an unnormalized positive measure β\beta as a function of the Bethe free energy of the corresponding normalized measure bb.

Lemma 6.1.

As soon as the factor graph is connected, for any β=ZbE1\beta=Zb\in E_{1} we have

F(Zb)=Z(F(b)+(1C)logZ),F(Zb)=Z\bigl{(}F(b)+(1-C)\log Z\bigr{)}, (6.2)

with CC being the number of independent cycles of the graph.

Proof.
F(β)\displaystyle F(\beta) =F(Zb)\displaystyle=F(Zb)
=Z[a,𝐱aba(𝐱a)log(Zba(𝐱a)ψa(𝐱a))+i,xibi(xi)log((Zbi(xi))1diϕi(xi))]\displaystyle=Z\Bigl{[}\sum_{a,\mathbf{x}_{a}}b_{a}(\mathbf{x}_{a})\log\Bigl{(}\frac{Zb_{a}(\mathbf{x}_{a})}{\psi_{a}(\mathbf{x}_{a})}\Bigr{)}+\sum_{i,x_{i}}b_{i}(x_{i})\log\Bigl{(}\frac{\left(Zb_{i}(x_{i})\right)^{1-d_{i}}}{\phi_{i}(x_{i})}\Bigr{)}\Bigr{]}
=Z(F(b)+(|𝔽|+|𝕍||𝔼|)logZ)\displaystyle=Z\Bigl{(}F(b)+(|\mathbb{F}|+|\mathbb{V}|-|\mathbb{E}|)\log Z\Bigr{)}
=Z(F(b)+(1C)logZ),\displaystyle=Z\Bigl{(}F(b)+(1-C)\log Z\Bigr{)},

where the last equality comes from elementary graph theory (see e.g. Berge (1967)).  
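The identity of Lemma 6.1 can be verified numerically. The Bethe free energy below is written in the form used in the proof; the toy graph, random potentials, and uniform beliefs are illustrative assumptions of ours:

```python
import numpy as np

rng = np.random.default_rng(4)
q = 2
factors = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)]  # toy graph
n_var = 4
deg = [sum(i in f for f in factors) for i in range(n_var)]
C = sum(len(f) for f in factors) - len(factors) - n_var + 1   # C = 2

psi = [rng.random((q, q)) for _ in factors]   # random positive potentials
phi = [rng.random(q) for _ in range(n_var)]

def bethe_F(Z=1.0):
    """Bethe free energy of the scaled beliefs beta = Z * b (uniform b)."""
    b_fac, b_var = np.full((q, q), 1 / q**2), np.full(q, 1 / q)
    F = 0.0
    for pa in psi:
        F += (Z * b_fac * np.log(Z * b_fac / pa)).sum()
    for pi, d in zip(phi, deg):
        F += (Z * b_var * ((1 - d) * np.log(Z * b_var) - np.log(pi))).sum()
    return F

Z = 2.7
print(np.isclose(bethe_F(Z), Z * (bethe_F() + (1 - C) * np.log(Z))))  # True
```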

The quantity 1−C is negative in the nontrivial cases (at least 2 independent cycles). Since all the Zb are equivalent from our point of view, we look at the derivatives of F(Zb) as a function of Z to see what happens in the plain variational problem.

Theorem 6.2.

The normalized beliefs corresponding to the extrema of the plain variational problem 𝒫(E1)\mathcal{P}(E_{1}) are exactly the same as the ones of the normalized problem 𝒫(E2)\mathcal{P}(E_{2}) as soon as C1C\neq 1.

Proof.

Using Lemma 6.1 we obtain

F(β)Z=F(b)+(1C)(logZ+1),\frac{\partial F(\beta)}{\partial Z}=F(b)+(1-C)(\log Z+1),

so that the stationary points are

Z^=exp(F(b)C11).\hat{Z}=\exp\Big{(}\frac{F(b)}{C-1}-1\Big{)}. (6.3)

At these points we can compute the Bethe free energy

F(β^)=F(Z^b)=(C1)exp(F(b)C11)=G(F(b)).F(\hat{\beta})=F(\hat{Z}b)=(C-1)\exp\Big{(}\frac{F(b)}{C-1}-1\Big{)}=G(F(b)).

It is easy to check that, if C≠1, G is an increasing function, so the extrema of F(β) are reached at the same normalized beliefs. More precisely, if b1 and b2 are elements of E2 such that F(b1)≤F(b2), then F(β̂1)=F(Ẑ1b1) ≤ F(β̂2)=F(Ẑ2b2), which allows us to conclude.
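The stationarity of Ẑ in (6.3) can be checked by a finite difference on a small toy setting (graph, random potentials, and uniform beliefs are illustrative assumptions of ours):

```python
import numpy as np

rng = np.random.default_rng(4)
q = 2
factors = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)]  # toy graph
n_var = 4
deg = [sum(i in f for f in factors) for i in range(n_var)]
C = sum(len(f) for f in factors) - len(factors) - n_var + 1   # C = 2

psi = [rng.random((q, q)) for _ in factors]
phi = [rng.random(q) for _ in range(n_var)]

def bethe_F(Z=1.0):
    """Bethe free energy of the scaled beliefs beta = Z * b (uniform b)."""
    b_fac, b_var = np.full((q, q), 1 / q**2), np.full(q, 1 / q)
    F = 0.0
    for pa in psi:
        F += (Z * b_fac * np.log(Z * b_fac / pa)).sum()
    for pi, d in zip(phi, deg):
        F += (Z * b_var * ((1 - d) * np.log(Z * b_var) - np.log(pi))).sum()
    return F

Z_hat = np.exp(bethe_F() / (C - 1) - 1)             # eq. (6.3)
h = 1e-6 * Z_hat
deriv = (bethe_F(Z_hat + h) - bethe_F(Z_hat - h)) / (2 * h)
print(abs(deriv) < 1e-6)   # True: Z_hat is a stationary point of F(Zb)
```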

In other words, imposing the normalization in the variational problem and normalizing after a solution is reached are equivalent as long as C≠1. Moreover, in the unnormalized case, the Bethe free energy at a local extremum writes

F(b)=(C-1)(\log\hat{Z}+1). (6.4)

We can therefore compare the “quality” of different fixed points by comparing only the normalization constants obtained: the smaller Z is, the better the approximation, modulo the fact that we are not minimizing a true distance.

When C=1, it has been shown in Section 3 that the normalized scheme always converges, whereas the plain scheme can have no fixed point. In this case, (6.2) rewrites as

F(β)=F(Zb)=ZF(b).F(\beta)=F(Zb)=ZF(b).

The form of this relationship shows what happens: if the extremum of the normalized variational problem is strictly negative, F(β) is unbounded from below and Z diverges to +∞; conversely, if the extremum is strictly positive, Z goes to zero. In the (very) particular case where the minimum of the normalized problem is equal to zero, the problem is still well defined. In fact, this condition F(b)=0 is equivalent to the one of Theorem 3.2 when C=1.


To sum up, as soon as the plain variational problem is well defined, it is equivalent to the normalized one, and the normalization constant gives easy access to the Bethe free energy through (6.4). When this is no longer the case, the dynamics of both algorithms remain the same (Proposition 4.2), but the plain algorithm (which can still converge in terms of beliefs) will not converge in terms of the normalization constant Z, and no easy information on the fixed point free energy remains available.


As emphasized previously, the relationship between Z^\hat{Z}, Za(m)Z_{a}(m) and Zi(m)Z_{i}(m) is not trivial. In the case of the plain BP algorithm, for which Za(m)=Zi(m)Z_{a}(m)=Z_{i}(m), an elementary computation yields the following relation at any fixed point

F(b)=(C-1)\log Z_{a}(m),

which seemingly contradicts (6.4). In fact, the algorithm derived from the plain variational problem is not exactly the plain BP scheme. Usually, since one resorts to some kind of normalization, the multiplicative constants of the fixed point equations are discarded (see Yedidia et al. (2005) for more details). Keeping track of them yields

mai(xi)=exp(di2di1)Θai,xi(m),m_{a\to i}(x_{i})=\exp\left(\frac{d_{i}-2}{d_{i}-1}\right)\Theta_{ai,x_{i}}(m), (6.5)
βa(𝐱a)\displaystyle\beta_{a}(\mathbf{x}_{a}) =1eψa(𝐱a)janja(xj),\displaystyle=\frac{1}{e}\psi_{a}(\mathbf{x}_{a})\prod_{j\in a}n_{j\to a}(x_{j}),
βi(xi)\displaystyle\beta_{i}(x_{i}) =ϕi(xi)exp(1di1)bimbi(xi).\displaystyle=\phi_{i}(x_{i})\exp\left(\frac{1}{d_{i}-1}\right)\prod_{b\ni i}m_{b\to i}(x_{i}).

Actually, the plain update scheme (2.1, 2.2) corresponds to the constant normalization exp((d_i−2)/(d_i−1)). Without any normalization, i.e., using (6.5) as the update rule, one would obtain

Z^=Za(m)e=Zi(m)exp(1di1).\hat{Z}=\frac{Z_{a}(m)}{e}=Z_{i}(m)\exp\left(\frac{1}{d_{i}-1}\right).

7 Conclusion

The motivation of this paper was to fill a void in the literature about the effect of normalization on the BP algorithm. What we have learnt can be summarized in a few main points:

  • using a normalization in BP can in some rare cases kill or create new fixed points;

  • not all normalizations are created equal when it comes to message convergence, but there is a large class of positively homogeneous normalizations that all have the same effect;

  • the user is ultimately concerned with convergence of beliefs, and thankfully the dynamics of normalized beliefs is insensitive to the normalization.

Since the messages have no interest by themselves, it is worth remarking that, by combining the update rule (2.12), recalled below,

mai(xi)bi|a(xi)bi(xi)mai(xi),m_{a\to i}(x_{i})\leftarrow\frac{b_{i|a}(x_{i})}{b_{i}(x_{i})}m_{a\to i}(x_{i}),

with the definitions (2.3) and (2.4) of the beliefs, one can eliminate the messages and obtain

\displaystyle b_{i}(x_{i}) \leftarrow b_{i}(x_{i})\prod_{a\ni i}\frac{b_{i|a}(x_{i})}{b_{i}(x_{i})},
\displaystyle b_{a}(\mathbf{x}_{a}) \leftarrow b_{a}(\mathbf{x}_{a})\prod_{i\in a}\prod_{c\ni i,\,c\neq a}\frac{b_{i|c}(x_{i})}{b_{i}(x_{i})}.

One particularity of these update rules is that they do not depend on the functions ψ or ϕ, but only on the graph structure. The dependency on the joint law (1.1) occurs only through the initial conditions. This “product sum” algorithm therefore shares common properties for all models built on the same underlying graph, the details of the joint law being imposed by the initial conditions. To our knowledge this algorithm has never been studied, and we leave it for future work.
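As a small illustration, here is a minimal sketch of one sweep of these update rules for pairwise factors (the chain graph and helper names are assumptions of ours, not from the paper). The only property checked is that mutually consistent beliefs, where every b_{i|a} coincides with b_i, are left invariant:

```python
import numpy as np

factors = [(0, 1), (1, 2)]        # a 3-variable chain, pairwise factors
q = 2

def cond_marginal(b_fac, a, i):
    """b_{i|a}(x_i): marginal of the factor belief b_a over the other variable."""
    return b_fac[a].sum(axis=1 if factors[a][0] == i else 0)

def sweep(b_fac, b_var):
    """One synchronous application of the message-free update rules."""
    new_var = []
    for i, bi in enumerate(b_var):
        out = bi.copy()
        for a in range(len(factors)):
            if i in factors[a]:
                out = out * cond_marginal(b_fac, a, i) / bi
        new_var.append(out / out.sum())
    new_fac = []
    for a, (i, j) in enumerate(factors):
        corr = []
        for v in (i, j):
            f = np.ones(q)
            for c in range(len(factors)):
                if v in factors[c] and c != a:
                    f = f * cond_marginal(b_fac, c, v) / b_var[v]
            corr.append(f)
        ba = b_fac[a] * np.outer(corr[0], corr[1])
        new_fac.append(ba / ba.sum())
    return new_fac, new_var

# Consistent initial beliefs: independent product beliefs on the chain
m = np.array([0.3, 0.7])
b_var = [m.copy() for _ in range(3)]
b_fac = [np.outer(m, m) for _ in factors]
nf, nv = sweep(b_fac, b_var)
print(all(np.allclose(x, y) for x, y in zip(nv, b_var)))   # True
print(all(np.allclose(x, y) for x, y in zip(nf, b_fac)))   # True
```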

Appendix A Spectral properties of the factor graph

This appendix is devoted to some properties of the matrix AA defined in (5.1) that are used in Sections 3 and 5.

We consider two types of fields associated to 𝒢\mathcal{G}, namely scalar fields and vector fields. Scalar fields are quantities attached to the vertices of the graph, while vector fields are attached to its edges. A vector field 𝐰={wai,ai𝔼}\mathbf{w}=\{w_{ai},\ ai\in\mathbb{E}\} is divergenceless if

a𝔽,iawai=0andi𝕍,aiwai=0.\forall a\in\mathbb{F},\ \sum_{i\in a}w_{ai}=0\quad\text{and}\quad\forall i\in\mathbb{V},\ \sum_{a\ni i}w_{ai}=0.

A vector field 𝐮={uai,ai𝔼}\mathbf{u}=\{u_{ai},\ ai\in\mathbb{E}\} is a gradient if there exists a scalar field {ua,ui,a𝔽,i𝕍}\{u_{a},u_{i},\ a\in\mathbb{F},\ i\in\mathbb{V}\} such that

ai𝔼,uai=uaui.\forall ai\in\mathbb{E},\ u_{ai}=u_{a}-u_{i}.

There is an orthogonal decomposition of any vector field into a divergenceless and a gradient component. Indeed, the scalar product

𝐰T𝐮=ai𝔼waiuai=a𝔽uaiawaii𝕍uiaiwai,\mathbf{w}^{T}\mathbf{u}=\sum_{ai\in\mathbb{E}}w_{ai}u_{ai}=\sum_{a\in\mathbb{F}}u_{a}\sum_{i\in a}w_{ai}-\sum_{i\in\mathbb{V}}u_{i}\sum_{a\ni i}w_{ai},

is 0 for all gradient fields 𝐮\mathbf{u} iff 𝐰\mathbf{w} is divergenceless. Dimensional considerations show that any vector field 𝐯\mathbf{v} can be decomposed in this way.

In the following, it will be useful to define the Laplace operator Δ\Delta associated to 𝒢\mathcal{G}. For any scalar field 𝐮\mathbf{u}:

(Δ𝐮)a=defdauaiaui,a𝔽\displaystyle(\Delta\mathbf{u})_{a}\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}d_{a}u_{a}-\sum_{i\in a}u_{i},\qquad\forall a\in\mathbb{F} (A.1)
(Δ𝐮)i=defdiuiaiua,i𝕍.\displaystyle(\Delta\mathbf{u})_{i}\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}d_{i}u_{i}-\sum_{a\ni i}u_{a},\qquad\forall i\in\mathbb{V}. (A.2)

The following lemma describes the spectrum of AA in terms of a Laplace equation on the graph 𝒢\mathcal{G}.

Lemma A.1.

(i) Both the gradient and divergenceless vector spaces are A-invariant, and divergenceless vectors are eigenvectors of A with eigenvalue 1. (ii) Eigenvectors associated with eigenvalues λ≠1 are gradient vectors of a scalar field 𝐮 which satisfies

(Δ𝐮)a=(λ1)(da1)λuaand(Δ𝐮)i=(1λ)ui.\bigl{(}\Delta\mathbf{u}\bigr{)}_{a}=\frac{(\lambda-1)(d_{a}-1)}{\lambda}u_{a}\ \text{and}\ \bigl{(}\Delta\mathbf{u}\bigr{)}_{i}=(1-\lambda)u_{i}. (A.3)

and there exists a gradient eigenvector associated with the eigenvalue 1 iff 𝒢 has exactly one cycle (C=1).

Proof.

The action of AA on a given vector 𝐱\mathbf{x} reads

aj𝔼Aaiajxaj=ja(ajxajxaj)aixai+xai,\sum_{a^{\prime}j\in\mathbb{E}}A_{ai}^{a^{\prime}j}x_{a^{\prime}j}=\sum_{j\in a}\Bigl{(}\sum_{a^{\prime}\ni j}x_{a^{\prime}j}-x_{aj}\Bigr{)}-\sum_{a^{\prime}\ni i}x_{a^{\prime}i}+x_{ai},

The first two terms of the right-hand side vanish when 𝐱 is divergenceless. In addition, the first term in parentheses is independent of i while the second one is independent of a, so the first assertion follows. We then concentrate on solving the eigenvalue equation A𝐱−λ𝐱=0 for a gradient vector 𝐱, with x_{ai}=u_a−u_i. A𝐱−λ𝐱 is the gradient of a constant scalar K∈ℝ, and by identification we have

{(Δ𝐮)a+ja(Δ𝐮)j=(1λ)ua+K(Δ𝐮)i=(1λ)ui+K.\begin{cases}\displaystyle\bigl{(}\Delta\mathbf{u}\bigr{)}_{a}+\sum_{j\in a}\bigl{(}\Delta\mathbf{u}\bigr{)}_{j}=(1-\lambda)u_{a}+K\\[5.69046pt] \displaystyle\bigl{(}\Delta\mathbf{u}\bigr{)}_{i}=(1-\lambda)u_{i}+K.\end{cases}

The Laplacian of a constant scalar field is zero, so for λ≠1 the constant K may be reabsorbed in 𝐮 and, combining these two equations with the help of the identities (A.1, A.2), yields equation (A.3). For λ=1, we obtain

(Δ𝐮)a=(1da)Kand(Δ𝐮)i=K.\bigl{(}\Delta\mathbf{u}\bigr{)}_{a}=(1-d_{a})K\qquad\text{and}\qquad\bigl{(}\Delta\mathbf{u}\bigr{)}_{i}=K. (A.4)

Let D be the diagonal matrix associated with the graph 𝒢, whose diagonal entries are the degrees d_a and d_i of each node. M=𝕀−D^{-1}Δ is a stochastic irreducible matrix, whose unique right Perron vector (1,…,1) generates the kernel of Δ. As a result, for K=0, the solution of (A.4) is u_a=u_i=constant, so that x_{ai}=0.

For K≠0, there is a solution iff the right-hand side of (A.4) is orthogonal (Δ is a symmetric operator) to the kernel. The condition reads

0=a(1da)+i1=|𝔽||𝔼|+|𝕍|=1C,0=\sum_{a}(1-d_{a})+\sum_{i}1=|\mathbb{F}|-|\mathbb{E}|+|\mathbb{V}|=1-C,

where the last equality comes from elementary graph theory (see e.g. Berge (1967)).  
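Part (i) of the lemma is easy to verify numerically: a divergenceless field, obtained as a nullspace vector of the two families of divergence constraints, is an exact eigenvector of A for the eigenvalue 1 (the toy two-cycle factor graph is an illustrative assumption of ours):

```python
import numpy as np

factors = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)]  # toy graph, C = 2
n_var = 4
edges = [(a, i) for a, f in enumerate(factors) for i in f]
n = len(edges)
A = np.zeros((n, n))
for r, (a, i) in enumerate(edges):
    for c, (ap, j) in enumerate(edges):
        if j in factors[a] and j != i and ap != a:
            A[r, c] = 1.0

# Divergence constraints: one row per factor and one row per variable
D = np.zeros((len(factors) + n_var, n))
for r, (a, i) in enumerate(edges):
    D[a, r] = 1.0
    D[len(factors) + i, r] = 1.0

_, s, Vt = np.linalg.svd(D)
rank = int((s > 1e-10).sum())
null = Vt[rank:]                  # basis of divergenceless fields
w = null[0]
print(null.shape[0], np.allclose(A @ w, w))   # 2 True
```

The nullspace dimension |𝔼|−(|𝔽|+|𝕍|−1) equals C here, consistent with the cycle count of the toy graph.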

Since 1 is an eigenvalue of A, it is interesting to investigate linear equations involving 𝕀−A. Since divergenceless vectors are already known to be in the kernel of this matrix, we restrict ourselves to the case where the constant term is of gradient type.

Lemma A.2.

For a given gradient vector field 𝐲\mathbf{y}, the equation

(𝕀A)𝐱=𝐲,\bigl{(}\mathbb{I}-A\bigr{)}\mathbf{x}=\mathbf{y},

has a solution (unique up to a divergenceless vector) iff C≠1, or C=1 and

a𝔽ya+i𝕍(1di)yi=0.\sum_{a\in\mathbb{F}}y_{a}+\sum_{i\in\mathbb{V}}(1-d_{i})y_{i}=0. (A.5)
Proof.

We look here only for gradient-type solutions xai=uauix_{ai}=u_{a}-u_{i} and write yai=yayiy_{ai}=y_{a}-y_{i}. Owing to the same arguments as in Lemma A.1, there exists a constant KK such that

(Δ𝐮)a\displaystyle\bigl{(}\Delta\mathbf{u}\bigr{)}_{a} =K(da1)+yajayj\displaystyle=K(d_{a}-1)+y_{a}-\sum_{j\in a}y_{j}
(Δ𝐮)i\displaystyle\bigl{(}\Delta\mathbf{u}\bigr{)}_{i} =yiK.\displaystyle=y_{i}-K.

Stating as before the compatibility condition for this equation yields

a𝔽ya+i𝕍(1di)yi=K(C1).\sum_{a\in\mathbb{F}}y_{a}+\sum_{i\in\mathbb{V}}(1-d_{i})y_{i}=K(C-1).

It is always possible to find a suitable K as long as C≠1; when C=1, (A.5) has to hold.

Appendix B Proof of Theorem 5.3

Let us start with (ii): when the system is homogeneous, J~\widetilde{J} is a tensor product of AA with B~\widetilde{B}, and its spectrum is therefore the product of their respective spectra. In particular if 𝒢\mathcal{G} has uniform degrees dad_{a} and did_{i}, the condition reads

μ2(da1)(di1)<1.\mu_{2}(d_{a}-1)(d_{i}-1)<1.

In order to prove part (i) of the theorem, we will consider a local norm on q\mathbb{R}^{q} attached to each variable node ii,

xb(i)=def(k=1qxk2bk(i))12andxb(i)=defk=1qxkbk(i),\|x\|_{b^{(i)}}\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}\Bigl{(}\sum_{k=1}^{q}x_{k}^{2}b_{k}^{(i)}\Bigr{)}^{\frac{1}{2}}\ \text{and}\ \langle x\rangle_{b^{(i)}}\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}\sum_{k=1}^{q}x_{k}b_{k}^{(i)},

the local average of x∈ℝ^q w.r.t. b^{(i)}. For convenience, we will also consider the somewhat hybrid global norm on ℝ^{q×|𝔼|}

xπ,b=defaiπaixaib(i),\|x\|_{\pi,b}\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}\sum_{a\to i}\pi_{ai}\|x_{ai}\|_{b^{(i)}},

where 𝝅\boldsymbol{\pi} is again the right Perron vector of AA, associated to λ1\lambda_{1}.

We have the following useful inequality.

Lemma B.1.

For any (xi,xj)2q(x_{i},x_{j})\in\mathbb{R}^{2q}, such that xib(i)=0\langle x_{i}\rangle_{b^{(i)}}=0 and xj,b(j)=kxi,kbk(i)Bk(iaj)x_{j,\ell}b_{\ell}^{(j)}=\sum_{k}x_{i,k}b_{k}^{(i)}B_{k\ell}^{(iaj)},

xjb(j)=0andxjb(j)2μ2(iaj)xib(i)2.\langle x_{j}\rangle_{b^{(j)}}=0\qquad\text{and}\qquad\|x_{j}\|_{b^{(j)}}^{2}\leq\mu_{2}^{(iaj)}\|x_{i}\|_{b^{(i)}}^{2}.
Proof.

By definition (5.4), we have

x(j)b(j)2\displaystyle\|x^{(j)}\|_{b^{(j)}}^{2} =k=1q1bk(j)|=1qbk(iaj)b(i)x(i)|2\displaystyle=\sum_{k=1}^{q}\frac{1}{b_{k}^{(j)}}\Bigl{|}\sum_{\ell=1}^{q}b_{\ell k}^{(iaj)}b_{\ell}^{(i)}x_{\ell}^{(i)}\Bigr{|}^{2}
=,mx(i)xm(i)Km(iaj)b(i).\displaystyle=\sum_{\ell,m}x_{\ell}^{(i)}x_{m}^{(i)}K_{\ell m}^{(iaj)}b_{\ell}^{(i)}.

Since K(iaj)K^{(iaj)} is reversible we have from Rayleigh’s theorem

μ2(iaj)=defsupx{kxkxKk(iaj)bk(i)kxk2bk(i),xb(i)=0,x0},\mu_{2}^{(iaj)}\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}\sup_{x}\Bigl{\{}\frac{\sum_{k\ell}x_{k}x_{\ell}K_{k\ell}^{(iaj)}b_{k}^{(i)}}{\sum_{k}x_{k}^{2}b_{k}^{(i)}},\langle x\rangle_{b^{(i)}}=0,x\neq 0\Bigr{\}},

which concludes the proof.  

To deal with iterations of JJ, we express it as a sum over paths.

(Jn)ai,kaj,=(An)aiaj(Bai,aj(n))k,\bigl{(}J^{n}\bigr{)}_{ai,k}^{a^{\prime}j,\ell}=\bigr{(}A^{n}\bigr{)}_{ai}^{a^{\prime}j}\bigl{(}B_{ai,a^{\prime}j}^{\scriptscriptstyle(n)}\bigr{)}_{k\ell},

where Bai,aj(n)B_{ai,a^{\prime}j}^{\scriptscriptstyle(n)} is an average stochastic kernel,

Bai,aj(n)=def1|Γai,aj(n)|γΓai,aj(n)(x,y)γB(xy).B_{ai,a^{\prime}j}^{\scriptscriptstyle(n)}\stackrel{{\scriptstyle\mbox{\tiny def}}}{{=}}\frac{1}{|\Gamma_{ai,a^{\prime}j}^{\scriptscriptstyle(n)}|}\sum_{\gamma\in\Gamma_{ai,a^{\prime}j}^{\scriptscriptstyle(n)}}\prod_{(x,y)\in\gamma}B^{(xy)}. (B.1)

Γ^{(n)}_{ai,a'j} represents the set of directed paths of length n joining ai to a'j on L(𝒢), and its cardinality is precisely |Γ^{(n)}_{ai,a'j}| = (A^n)_{ai}^{a'j}.

Lemma B.2.

For any $(x_{ai},x_{a'j})\in\mathbb{R}^{2q}$ such that $\langle x_{ai}\rangle_{b^{(i)}}=0$ and

$$x_{a'j,\ell}\,b_\ell^{(j)}=\sum_k x_{ai,k}\,b_k^{(i)}\bigl(B_{ai,a'j}^{(n)}\bigr)_{k\ell},$$

the following inequality holds:

$$\|x_{a'j}\|_{b^{(j)}}\leq\mu_2^n\,\|x_{ai}\|_{b^{(i)}}.$$
Proof.

Let $x_{a'j}^{\gamma}$ be the contribution to $x_{a'j}$ corresponding to the path $\gamma\in\Gamma_{ai,a'j}^{(n)}$. Using Lemma B.1 recursively along each individual path yields

$$\|x_{a'j}^{\gamma}\|_{b^{(j)}}\leq\mu_2^n\,\|x_{ai}\|_{b^{(i)}},$$

and, owing to the triangle inequality,

$$\|x_{a'j}\|_{b^{(j)}}\leq\frac{1}{|\Gamma_{ai,a'j}^{(n)}|}\sum_{\gamma\in\Gamma_{ai,a'j}^{(n)}}\|x_{a'j}^{\gamma}\|_{b^{(j)}}\leq\mu_2^n\,\|x_{ai}\|_{b^{(i)}}.$$
 

It is now possible to conclude the proof of the theorem.

Proof of Theorem 5.3(i).

(i) Let $\mathbf{v}$ and $\mathbf{v}'$ be two vectors with $\mathbf{v}'=\mathbf{v}\tilde{J}^n=\mathbf{v}(\mathbb{I}-M)J^n$, where $M$ is the projector defined in Proposition 5.2; the second equality holds since $\tilde{J}M=0$. Recall that the effect of $(\mathbb{I}-M)$ is first to project onto vectors with zero local sums, $\sum_k\bigl(\mathbf{v}(\mathbb{I}-M)\bigr)_{ai,k}=0$ for all $i\in\mathbb{V}$, so we may assume directly that $\mathbf{v}$ is of the form

$$v_{ai,k}=x_{ai,k}\,b_k^{(i)},\qquad\text{with}\qquad\langle x_{ai}\rangle_{b^{(i)}}=0.$$

As a result, $\mathbf{v}'=\mathbf{v}J^n=\mathbf{v}'(\mathbb{I}-M)$ is of the same form. Let $x'_{a'j,\ell}\stackrel{\text{def}}{=}v'_{a'j,\ell}/b_\ell^{(j)}$. We have

$$\|x'\|_{\pi,b}\leq\sum_{a'\to j}\pi_{a'j}\sum_{a\to i}\bigl(A^n\bigr)_{ai}^{a'j}\|y_{a'j}\|_{b^{(j)}},$$

with $y_{a'j,\ell}\,b_\ell^{(j)}=\sum_k x_{ai,k}\,b_k^{(i)}\bigl(B_{ai,a'j}^{(n)}\bigr)_{k\ell}$. From Lemma B.2 applied to $y_{a'j}$,

$$\|x'\|_{\pi,b}\leq\sum_{a'\to j}\pi_{a'j}\sum_{a\to i}\bigl(A^n\bigr)_{ai}^{a'j}\mu_2^n\|x_{ai}\|_{b^{(i)}}=\lambda_1^n\mu_2^n\|x\|_{\pi,b},$$

since $\boldsymbol{\pi}$ is the right Perron vector of $A$.
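The norm estimate obtained here controls the spectral radius of $\tilde{J}$; the short derivation below is a sketch of this standard final step, assuming only Gelfand's spectral-radius formula and the fact that all norms on a finite-dimensional space are equivalent:

```latex
% From \|\mathbf{v}\tilde{J}^n\|_{\pi,b} \le (\lambda_1\mu_2)^n \|\mathbf{v}\|_{\pi,b}
% and Gelfand's formula applied to the induced operator norm:
\rho(\tilde{J}) \;=\; \lim_{n\to\infty} \bigl\|\tilde{J}^n\bigr\|_{\pi,b}^{1/n}
             \;\leq\; \lim_{n\to\infty} \bigl((\lambda_1\mu_2)^n\bigr)^{1/n}
             \;=\; \lambda_1\,\mu_2 .
```

In particular, $\lambda_1\mu_2<1$ forces $\mathbf{v}\tilde{J}^n\to 0$ for every $\mathbf{v}$, which is the local stability criterion in terms of the beliefs at the fixed point.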
