Statistical Uncertainty Principle in Stochastic Dynamics

Ying-Jen Yang ying-jen.yang@stonybrook.edu Laufer Center for Physical and Quantitative Biology, State University of New York, Stony Brook, New York 11794, USA    Hong Qian Department of Applied Mathematics, University of Washington, Seattle, Washington 98195, USA
(March 28, 2025)
Abstract

The maximum entropy principle identifies forces conjugated to observables and the thermodynamic relations between them, independent of their underlying mechanistic details. For data about state distributions or transition statistics, the principle can be derived from limit theorems of infinite data sampling. This derivation reveals its empirical origin and clarifies the meaning of applying it to large but finite data. We derive an uncertainty principle for the statistical variations of the observables and the inferred forces. We use a toy model of a molecular motor as an example.

Thermodynamics has been the guiding empirical principle for statistical physicists to understand heat engines and condensed matter [1, 2]. Importantly, it identifies entropic forces conjugated to the observables of interest and dictates force-observable relations such as the equation of state and the Maxwell relations. However, textbook thermodynamics was limited to describing state distributions in large, mechanical systems at equilibrium. Extensions were thus needed for its application to nonequilibrium [3], small [4, 5], and dynamical [6, 7] systems.

The complexity of biological systems poses an additional challenge for formulating such a theory of forces. The “constituting individuals” of complex biological systems, e.g. a cell in a tissue or an organism in an ecosystem, are themselves high dimensional. When describing these complicated biological systems, it is often not practical to model their constituting individuals from classical or quantum mechanics. We consider these systems far from mechanics, as the practical mathematical models that describe them are not Hamiltonian-based. To formulate a thermodynamic theory of forces for biological systems, we must formulate a theory that applies not just to nonequilibrium, small, dynamical systems, but also to systems that are not modeled by mechanics [8, 9].

The formulation of such a theory, known as the Maximum Entropy principle (MaxEnt), has been summarized nicely by E. T. Jaynes [10, 6] and put into practice [7]. This paper mainly adds two points. First, we revisit and advocate the empirical origin of MaxEnt based on limit theorems in the idealization of having infinite data. Second, closely following the logic of the first point, we derive and explain the uncertainty principle between the statistical variations of dynamical observables and the conjugated path entropic forces they infer.

We first argue that the two other mainstream derivations of MaxEnt both implicitly assume the idealized limit of infinite data when MaxEnt is applied to real data. Specifically, both Jaynes’ “maximum-ignorance” argument [10] and Shore and Johnson’s axiomatic derivation [11] are formulated in terms of the expected values of the observables. As true expected values are only available in the data infinitus limit, these formulations implicitly assume data ad infinitum. Whenever one measures a sample average from large but finite data and considers the sample average a good approximation to the true expected value, it is plugged into MaxEnt as if the data were infinitely big.

With this, we advocate the empirical derivation of MaxEnt based solely on mathematical limit theorems of the data infinitus limit. This derivation has at least two advantages. On the one hand, it explicitly states the data ad infinitum assumption and clarifies how MaxEnt is used with finite but large data: MaxEnt applies to Big Data as a leading-order approximation, just as textbook thermodynamics is applied to finite but large systems. On the other hand, this derivation shows three equivalent interpretations of the MaxEnt posterior with clear connections to Bayesian conditioning. Further, it gives the entropy function a statistical meaning instead of treating it as an auxiliary function for inference.

We then revisit the limit-theorem-based derivation of the dynamic extension of MaxEnt [12, 13, 14], now commonly known as the maximum caliber principle (MaxCal) [6, 7]. We use it to derive the uncertainty principle of the statistical variations of observables and forces, which we shall call the Statistical Uncertainty Principle (SUP) for stochastic dynamics. We will use a simple three-state toy model of a molecular motor and the data it produces as an example. The SUP is different from the recently celebrated thermodynamic uncertainty relation in stochastic thermodynamics [15, 16]. Our SUP is closer to the uncertainty principle in quantum mechanics, as both of them come from invertible mathematical transforms: Legendre for SUP and Fourier for quantum mechanics.

Maximum Entropy Principle for Markov Processes

The empirical derivation of MaxEnt for state distribution data of independent and identically distributed (i.i.d.) ensembles has been revisited by one of us [17, 18]. Here, we briefly revisit the data-driven derivation of its extension to correlated data about transitions [12, 13, 14].

Before we begin, let us first remark that applying MaxEnt to stochastic processes is conceptually straightforward from either Jaynes’ argument of least bias [10] or Shore and Johnson’s axioms [11, 7]: one simply replaces the state distribution with the path distribution. Jaynes called this the Maximum Caliber principle (MaxCal) [6]. In this generalization, the stochastic process need not be Markovian or have a steady state. Once we know the expected values of some path observables, they can be used in MaxCal to update our model of path probabilities. However, if one aims to get these expected values from data, one has to rely on the law of large numbers for convergence, i.e. the data about dynamics has to be either i.i.d. ensembles of paths or a large collection of Markov correlated transitions in a single long path. The former belongs to the formulation of i.i.d. ensembles and has been reviewed before. Here, we focus on the latter generalization, where the data are Markov correlated consecutive transitions.

Let us begin by considering (a vector of) transition-based observables $\boldsymbol{g}_{ij}$ in a discrete-time Markov chain (DTMC), where $i$ and $j$ are in the state space $\mathcal{X}$. The steady-state expected value of $\boldsymbol{g}_{ij}$ is

$$\left\langle\boldsymbol{g}\right\rangle=\sum_{i,j\in\mathcal{X}}\pi_{i}K_{j|i}\boldsymbol{g}_{ij} \tag{1}$$

where $\pi_{i}$ is the steady-state distribution and $K_{j|i}$ is the underlying transition probability matrix of the DTMC from $i$ to $j$. The ergodic theorem for Markov chains tells us that the long-term empirical average of $\boldsymbol{g}_{ij}$ converges to the steady-state mean value $\left\langle\boldsymbol{g}\right\rangle$:

$$\lim_{T\rightarrow\infty}\bar{\boldsymbol{g}}_{T}=\lim_{T\rightarrow\infty}\frac{1}{T}\sum_{t=1}^{T}\boldsymbol{g}_{x_{t-1},x_{t}}=\left\langle\boldsymbol{g}\right\rangle. \tag{2}$$

The underlying mechanism of this convergence is the convergence of the joint empirical frequency of a transition pair in a length-$T$ path $x_{0:T}$,

$$f_{ij}(T)\coloneqq\frac{\#\text{ of }i\mapsto j\text{ in }x_{0:T}}{\text{time length }T}, \tag{3}$$

to the steady-state pair probability $\pi_{i}K_{j|i}$ in the long-term limit $T\rightarrow\infty$. These laws of large numbers for Markov chains are the direct extension from i.i.d. samples of a distribution to correlated data produced by Markov processes.
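The convergence in Eqs. (2) and (3) can be illustrated with a minimal Python sketch. The transition matrix `K`, the path length, the random seed, and all function names are assumptions chosen for illustration; they do not come from the paper.

```python
import numpy as np

def simulate_chain(K, T, rng, x0=0):
    """Sample a path x_0, ..., x_T from the row-stochastic matrix K[i, j] = K_{j|i}."""
    cum = np.cumsum(K, axis=1)
    cum[:, -1] = 1.0                       # guard against round-off in the last column
    u = rng.random(T)
    path = np.empty(T + 1, dtype=int)
    path[0] = x0
    for t in range(T):
        path[t + 1] = np.searchsorted(cum[path[t]], u[t], side="right")
    return path

def pair_frequency(path, n):
    """Empirical pair frequency f_ij(T) of Eq. (3): counts of i -> j divided by T."""
    counts = np.zeros((n, n))
    np.add.at(counts, (path[:-1], path[1:]), 1.0)
    return counts / (len(path) - 1)

def stationary_distribution(K):
    """Left eigenvector of K for eigenvalue 1, normalized to sum to one."""
    w, v = np.linalg.eig(K.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    return pi / pi.sum()

# Hypothetical 3-state chain, used only as an example.
K = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])
rng = np.random.default_rng(0)
path = simulate_chain(K, 50_000, rng)
f = pair_frequency(path, 3)
pi = stationary_distribution(K)
max_gap = np.abs(f - pi[:, None] * K).max()   # distance between f_ij and pi_i K_{j|i}
```

With a longer path, `max_gap` is expected to shrink roughly like $1/\sqrt{T}$, which is the finite-data regime the rest of the paper is concerned with.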

Now, similar to the i.i.d. case [19, 18], a key to deriving MaxEnt is that the frequency $f_{ij}$ has an asymptotic distribution with an exponential form under a prior Markov chain model with probability $\mathbb{Q}$ [20]:

$$\mathbb{Q}\{f_{ij}\}=\exp\left[-T\sum_{i,j\in\mathcal{X}}f_{ij}\ln\frac{f_{j|i}}{R_{j|i}}+o(T)\right]. \tag{4}$$

The matrix $R_{j|i}$ is our prior transition matrix that defines our prior probability $\mathbb{Q}$, and the matrix $f_{j|i}$ is the empirical transition matrix calculated by $f_{ij}/\sum_{k\in\mathcal{X}}f_{ik}$. Then, we can consider three conceptually different posterior joint stationary probabilities [12], which are all mathematically equivalent in the long-term limit $T\rightarrow\infty$:

a) the asymptotic conditional probability:

$$P_{ij}^{*}=\lim_{T\rightarrow\infty}\mathbb{Q}\{X_{s(T)}=i,X_{s(T)+1}=j\,|\,\bar{\boldsymbol{g}}_{T}\} \tag{5}$$

where the time label $s$ is a function of $T$, chosen such that $X_{s}$ and $X_{s+1}$ are at the steady state of the process;

b) the asymptotic conditional expectation of the empirical pair frequency:

$$P_{ij}^{*}=\lim_{T\rightarrow\infty}\mathbb{E}[f_{ij}(T)\,|\,\bar{\boldsymbol{g}}_{T}] \tag{6}$$

where $\mathbb{E}[\cdot]$ is taken w.r.t. the prior model $\mathbb{Q}$;

c) the most probable empirical frequency:

$$\begin{aligned}P_{ij}^{*}&=\arg\min_{\left\{f\right\}}\Big\{\sum_{i,j\in\mathcal{X}}f_{ij}\ln\frac{f_{j|i}}{R_{j|i}}-\boldsymbol{\beta}\cdot\Big(\sum_{i,j\in\mathcal{X}}f_{ij}\boldsymbol{g}_{i,j}-\bar{\boldsymbol{g}}\Big)\\&\quad-\sum_{i\in\mathcal{X}}\sigma_{i}\sum_{j\in\mathcal{X}}\left(f_{ij}-f_{ji}\right)-\nu\Big(\sum_{i,j\in\mathcal{X}}f_{ij}-1\Big)\Big\}.\end{aligned} \tag{7}$$

The three constraints in Eq. (7) are the empirical averages of data ad infinitum, the stationarity constraint, and the normalization. The equivalence among the three posteriors is known as the Gibbs conditioning principle [21, 12].

We note that the “entropy” to be extremized in the Markov correlated data case here in Eq. (7) is not the Kullback-Leibler relative entropy of pair probabilities $\sum_{i,j\in\mathcal{X}}P_{ij}\ln\frac{P_{ij}}{\pi_{i}^{R}R_{j|i}}$, where $\pi_{i}^{R}$ is the stationary distribution of the prior $R_{j|i}$. The fundamental reason is that MaxCal is about the whole path $x(t),t>0$, not just one step. This can be seen from the alternative MaxCal derivation of Eq. (7) shown in the Supplemental Material.

Since Eq. (7) is a less-known MaxEnt calculation, we briefly summarize the recipe for computing the posterior joint probability $P_{ij}^{*}$ below. First, we construct a tilted matrix $M_{ij}(\boldsymbol{\beta})=R_{j|i}e^{\boldsymbol{\beta}\cdot\boldsymbol{g}_{i,j}}$. Second, we compute its largest eigenvalue $\lambda$ and the corresponding left and right eigenvectors, $l_{i}$ and $r_{i}$ (chosen such that $\sum_{i\in\mathcal{X}}l_{i}r_{i}=1$). The Perron-Frobenius theorem guarantees that $\lambda$ is real and that non-negative $l_{i}$ and $r_{i}$ can be found. Third, the posterior transition probability in terms of $\boldsymbol{\beta}$ is then given by

$$P_{j|i}^{*}=\frac{r_{j}(\boldsymbol{\beta})}{\lambda(\boldsymbol{\beta})r_{i}(\boldsymbol{\beta})}R_{j|i}e^{\boldsymbol{\beta}\cdot\boldsymbol{g}_{i,j}} \tag{8}$$

with the stationary distribution given by $\pi_{i}^{*}=l_{i}(\boldsymbol{\beta})r_{i}(\boldsymbol{\beta})$ and $P_{ij}^{*}=\pi_{i}^{*}P_{j|i}^{*}$. Finally, we solve for $\boldsymbol{\beta}(\left\langle\boldsymbol{g}\right\rangle)$ according to $\left\langle\boldsymbol{g}\right\rangle=\nabla\ln\lambda(\boldsymbol{\beta})$, which is a set of coupled nonlinear equations that can be solved systematically with the optimization procedures described later in Eq. (12) and Eq. (13).
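The first three steps of this recipe can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the prior `R`, the single observable `g` (net flux of $3\mapsto 1$), the force value `beta`, and the function name `maxcal_posterior` are all assumptions.

```python
import numpy as np

def maxcal_posterior(R, g, beta):
    """Posterior of Eq. (8) for a given force vector.

    R[i, j] = R_{j|i} is the prior, g[k, i, j] the k-th observable on i -> j,
    beta[k] the force conjugated to the k-th observable.
    """
    M = R * np.exp(np.tensordot(beta, g, axes=1))      # tilted matrix M_ij
    w, v_right = np.linalg.eig(M)
    k = np.argmax(np.real(w))
    lam = np.real(w[k])                                # Perron eigenvalue lambda
    right = np.abs(np.real(v_right[:, k]))             # right eigenvector r_i
    w_left, v_left = np.linalg.eig(M.T)
    left = np.abs(np.real(v_left[:, np.argmax(np.real(w_left))]))
    left = left / (left @ right)                       # enforce sum_i l_i r_i = 1
    P_cond = M * right[None, :] / (lam * right[:, None])   # P*_{j|i}
    pi = left * right                                  # stationary distribution pi*_i
    return lam, P_cond, pi, pi[:, None] * P_cond       # last entry is P*_{ij}

# Hypothetical prior and observable, for illustration only.
R = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])
g = np.zeros((1, 3, 3))
g[0, 2, 0], g[0, 0, 2] = 1.0, -1.0     # +1 for 3 -> 1, -1 for 1 -> 3 (0-based indices)
beta = np.array([0.3])
lam, P_cond, pi, P_joint = maxcal_posterior(R, g, beta)
```

The returned `P_cond` is row-stochastic and `pi` is its stationary distribution, which is a quick sanity check that the eigenvectors were normalized as required.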

Thermodynamic structures emerge from limit theorems.

Statistical thermodynamics can be derived generally by MaxEnt [7, 18]. Based on the data-driven empirical derivation of MaxEnt reviewed above, we can consider thermodynamics as emerging from the data infinitus limit, for both i.i.d. ensembles and Markov correlated transitions.

The origin of the thermodynamic structure is the convex duality between a pair of functions, known as entropy and free energy in classical thermodynamics. For i.i.d. data about the distribution of states, the “entropy” is the posterior relative entropy, $\varphi(\left\langle\boldsymbol{g}\right\rangle)\coloneqq\sum_{i\in\Omega}p_{i}^{*}\log\frac{p_{i}^{*}}{q_{i}}$, and the “free energy” is the generating function of the observable $\boldsymbol{g}$ under the prior $q_{i}$, $\psi(\boldsymbol{\beta})\coloneqq\log\sum_{i\in\Omega}q_{i}e^{\boldsymbol{\beta}\cdot\boldsymbol{g}_{i}}$. This is the textbook classical thermodynamics [17, 18]. For transition-based Markov correlated data, the “entropy” becomes the posterior path relative entropy,

$$\varphi(\left\langle\boldsymbol{g}\right\rangle)\coloneqq\sum_{i,j\in\mathcal{X}}P_{ij}^{*}\log\frac{P_{j|i}^{*}}{R_{j|i}}, \tag{9}$$

and the “free energy” is the scaled generating function for the empirical sum $\boldsymbol{G}_{T}\coloneqq\sum_{t=1}^{T}\boldsymbol{g}_{x_{t-1},x_{t}}$,

$$\psi\left(\boldsymbol{\beta}\right)\coloneqq\lim_{T\rightarrow\infty}\frac{1}{T}\log\mathbb{E}\left[e^{\boldsymbol{\beta}\cdot\boldsymbol{G}_{T}}\right]=\log\lambda(\boldsymbol{\beta}), \tag{10}$$

which becomes the logarithm of the largest eigenvalue $\lambda$ of the tilted matrix. In both cases, the “free energy” $\psi$ is a generating function, and the entropy $\varphi$ is the extremized value of an entropy function.

Convex duality between the entropy $\varphi$ and the free energy $\psi$ emerges from limit theorems. On the one hand, the free energy $\psi$ is always the Legendre-Fenchel transform of the entropy $\varphi$:

$$\psi(\boldsymbol{\beta})=\max_{\boldsymbol{x}}\left[\boldsymbol{\beta}\cdot\boldsymbol{x}-\varphi(\boldsymbol{x})\right]. \tag{11}$$

This is a direct consequence of computing $\psi$ from its definition with Laplace’s approximation in asymptotic analysis [22, 17]. On the other hand, the inverse of Eq. (11) requires the existence and differentiability of $\psi$. This is known as the Gärtner-Ellis theorem [23, 24, 22]:

$$\varphi(\left\langle\boldsymbol{g}\right\rangle)=\max_{\boldsymbol{\xi}}\left[\boldsymbol{\xi}\cdot\left\langle\boldsymbol{g}\right\rangle-\psi(\boldsymbol{\xi})\right]. \tag{12}$$

These Legendre-Fenchel-transform expressions of $\varphi$ and $\psi$ tell us that both of them are convex functions [22]. With differentiable $\psi$ and $\varphi$, the two Legendre-Fenchel transforms above reduce to a single Legendre transform, which encodes derivative relations between the dual coordinates $\left\langle\boldsymbol{g}\right\rangle$ and $\boldsymbol{\beta}$ of the system,

$$\boldsymbol{\beta}=\nabla\varphi(\left\langle\boldsymbol{g}\right\rangle)\text{ and }\left\langle\boldsymbol{g}\right\rangle=\nabla\psi(\boldsymbol{\beta}), \tag{13}$$

as well as the Maxwell relations associated with them. Importantly, Eq. (13) shows that the parameters $\boldsymbol{\beta}$ are the entropic forces conjugated to the observables $\left\langle\boldsymbol{g}\right\rangle$.
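As a numerical illustration of Eq. (13), the Python sketch below inverts $\left\langle g\right\rangle=\psi'(\beta)$ by bisection for a single scalar observable and then checks that the entropy of Eq. (9) agrees with the Legendre transform $\beta\left\langle g\right\rangle-\psi(\beta)$. The prior `R`, the observable `g`, the target value 0.2, the bracket $[-10,10]$, and the helper names are assumptions made for this example only.

```python
import numpy as np

def tilted_stats(R, g, beta):
    """Return psi = log(lambda), the posterior mean <g>, and phi of Eq. (9) for scalar beta."""
    M = R * np.exp(beta * g)
    w, v = np.linalg.eig(M)
    k = np.argmax(np.real(w))
    lam, r = np.real(w[k]), np.abs(np.real(v[:, k]))
    P_cond = M * r[None, :] / (lam * r[:, None])           # Eq. (8)
    wl, vl = np.linalg.eig(P_cond.T)
    pi = np.abs(np.real(vl[:, np.argmax(np.real(wl))]))
    pi = pi / pi.sum()                                     # stationary distribution
    P_joint = pi[:, None] * P_cond
    return np.log(lam), np.sum(P_joint * g), np.sum(P_joint * np.log(P_cond / R))

def solve_beta(R, g, target, lo=-10.0, hi=10.0, iters=60):
    """Bisection on the increasing map beta -> <g>(beta); assumes the target is bracketed."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted_stats(R, g, mid)[1] < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical prior and observable (net flux of 3 -> 1), for illustration only.
R = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])
g = np.zeros((3, 3))
g[2, 0], g[0, 2] = 1.0, -1.0
beta = solve_beta(R, g, target=0.2)
psi, g_mean, phi = tilted_stats(R, g, beta)
legendre_gap = phi - (beta * g_mean - psi)    # should vanish up to round-off
```

Bisection works here because $\psi$ is convex, so $\psi'(\beta)$ is monotone; for vector-valued observables a gradient-based optimizer on Eq. (12) would play the same role.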

Statistical Uncertainty Principle (SUP)

We are now ready to discuss the dynamic extension of the uncertainty principle between the statistical variations of observables (e.g. energy) and of the inferred conjugated entropic forces (e.g. 1/temperature) in thermodynamics [1, 25, 26, 27]. We shall call it the statistical uncertainty principle (SUP) since it is a leading-order statistical result for large but finite data. Our contributions here are twofold: a) we extend SUP from state observables [1, 25, 26] to transition observables, from distribution to dynamics; b) SUP shows the physical meaning of a well-known mathematical relation in large deviation theory.

To illustrate, let us consider the following scenario. Suppose Bob has transition-based data in the form of the empirical mean $\bar{\boldsymbol{g}}_{T}=\sum_{i,j\in\mathcal{X}}f_{T}(i,j)\boldsymbol{g}(i,j)$ for a large but finite data set of length $T$. Note that $\bar{\boldsymbol{g}}_{T}$ is itself random: if Bob repeats the experiment, he can get different values of $\bar{\boldsymbol{g}}_{T}$. Now, suppose Alice knows the true expected values $\left\langle\boldsymbol{g}\right\rangle$ of the transition observable $\boldsymbol{g}(i,j)$, either because she had data ad infinitum or due to other sources. She can then predict, to the leading order, the asymptotic variation of Bob’s $\bar{\boldsymbol{g}}_{T}$ as quantified by its covariance matrix,

$${\rm Co}\mathbb{V}[\bar{\boldsymbol{g}}_{T}]\sim\frac{1}{T}\nabla\nabla\psi\left(\boldsymbol{\beta}\right). \tag{14}$$

Bob can verify this by repeatedly measuring $\bar{\boldsymbol{g}}_{T}$ from i.i.d. copies of the length-$T$ process.
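Bob's check of Eq. (14) can be mimicked with a small Monte Carlo sketch in Python. To keep it short, it assumes a single scalar observable and assumes that the data are generated by the prior chain itself, so that the true force is $\beta=0$; the chain `K`, the observable `g`, the sample sizes, and the function names are illustrative assumptions.

```python
import numpy as np

def scgf(K, g, beta):
    """psi(beta): log of the largest eigenvalue of the tilted matrix K_{j|i} exp(beta g_ij)."""
    return np.log(np.max(np.real(np.linalg.eigvals(K * np.exp(beta * g)))))

def sample_means(K, g, T, copies, rng):
    """Empirical means of g over `copies` independent length-T paths, all started in state 0."""
    cum = np.cumsum(K, axis=1)
    cum[:, -1] = 1.0                                   # guard against round-off
    x = np.zeros(copies, dtype=int)
    total = np.zeros(copies)
    for _ in range(T):
        u = rng.random(copies)
        x_next = (u[:, None] >= cum[x]).sum(axis=1)    # inverse-CDF sampling per copy
        total += g[x, x_next]
        x = x_next
    return total / T

# Hypothetical chain and observable (net flux of 3 -> 1), for illustration only.
K = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])
g = np.zeros((3, 3))
g[2, 0], g[0, 2] = 1.0, -1.0
T, copies, h = 400, 4000, 1e-3
rng = np.random.default_rng(1)
means = sample_means(K, g, T, copies, rng)
scaled_variance = T * means.var(ddof=1)                # left-hand side of Eq. (14) times T
curvature = (scgf(K, g, h) - 2.0 * scgf(K, g, 0.0) + scgf(K, g, -h)) / h**2
```

The two numbers `scaled_variance` and `curvature` agree only up to sampling noise and $O(1/T)$ corrections, which is the sense of the asymptotic relation $\sim$ in Eq. (14).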

Each time Bob gets a $\bar{\boldsymbol{g}}_{T}$, he can use it to infer the level of entropic forces $\boldsymbol{\beta}$. He computes $\bar{\boldsymbol{\beta}}_{T}=\nabla\varphi(\bar{\boldsymbol{g}}_{T})$ by using the entropy function $\varphi$ given by Eq. (9) and plugging in the $\bar{\boldsymbol{g}}_{T}$ he measured. This inferred force $\bar{\boldsymbol{\beta}}_{T}$ fluctuates due to the stochasticity of $\bar{\boldsymbol{g}}_{T}$. Alice can also derive the leading-order fluctuation of Bob’s $\bar{\boldsymbol{\beta}}_{T}$, which becomes

$${\rm Co}\mathbb{V}[\bar{\boldsymbol{\beta}}_{T}]\sim\frac{1}{T}\nabla\nabla\varphi\left(\left\langle\boldsymbol{g}\right\rangle\right). \tag{15}$$

See the Supplemental Material for a derivation. Since $\varphi$ and $\psi$ have reciprocal curvatures due to the Legendre transform [22], we then have the SUP for the fluctuations of $\bar{\boldsymbol{g}}_{T}$ and of the $\bar{\boldsymbol{\beta}}_{T}$ they infer:

$$\underset{\sim T{\rm Co}\mathbb{V}[\bar{\boldsymbol{g}}_{T}]}{\underbrace{\nabla\nabla\psi\left(\boldsymbol{\beta}\right)}}\;\underset{\sim T{\rm Co}\mathbb{V}[\bar{\boldsymbol{\beta}}_{T}]}{\underbrace{\nabla\nabla\varphi\left(\left\langle\boldsymbol{g}\right\rangle\right)}}=\mathbf{I}. \tag{16}$$

While this mathematical relation is well known to the large deviation theory community [22], to our knowledge this is the first time its statistical meaning has been pointed out. It is worth noting that Schlögl has derived an inequality version of SUP without taking the data infinitus limit, which can be regarded as the mesoscopic origin of SUP [27].
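The reciprocal curvature behind Eq. (16) can be checked numerically in the scalar case with the Python sketch below. It evaluates $\psi$ from the tilted matrix and $\varphi$ directly from Eq. (9) at three nearby forces, then compares finite-difference second derivatives. The prior `R`, the observable `g`, the reference force 0.4, the step `h`, and the helper name are assumptions for illustration.

```python
import numpy as np

def tilted_stats(R, g, beta):
    """Return psi = log(lambda), the posterior mean <g>, and phi of Eq. (9) for scalar beta."""
    M = R * np.exp(beta * g)
    w, v = np.linalg.eig(M)
    k = np.argmax(np.real(w))
    lam, r = np.real(w[k]), np.abs(np.real(v[:, k]))
    P_cond = M * r[None, :] / (lam * r[:, None])
    wl, vl = np.linalg.eig(P_cond.T)
    pi = np.abs(np.real(vl[:, np.argmax(np.real(wl))]))
    pi = pi / pi.sum()
    P_joint = pi[:, None] * P_cond
    return np.log(lam), np.sum(P_joint * g), np.sum(P_joint * np.log(P_cond / R))

# Hypothetical prior and observable (net flux of 3 -> 1), for illustration only.
R = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])
g = np.zeros((3, 3))
g[2, 0], g[0, 2] = 1.0, -1.0
b0, h = 0.4, 1e-3
psi_m, g_m, phi_m = tilted_stats(R, g, b0 - h)
psi_0, g_0, phi_0 = tilted_stats(R, g, b0)
psi_p, g_p, phi_p = tilted_stats(R, g, b0 + h)

# Second derivative of psi with respect to beta (uniform grid in beta).
psi_dd = (psi_p - 2.0 * psi_0 + psi_m) / h**2
# Second derivative of phi with respect to <g> (the grid in <g> is slightly non-uniform).
phi_dd = 2.0 * ((phi_p - phi_0) / (g_p - g_0) - (phi_0 - phi_m) / (g_0 - g_m)) / (g_p - g_m)
product = psi_dd * phi_dd      # scalar version of Eq. (16)
```

Because `phi_dd` is computed from Eq. (9) rather than from `psi`, a `product` close to one is a genuine consistency check of the Legendre structure and not an identity built into the code.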

A simple toy model of a molecular motor as an example

To illustrate SUP, let us consider a simple three-state Markov chain as a toy model for a molecular motor monomer like myosin [28]. Our toy molecular motor is assumed to have three states with state space illustrated in Fig. 1.

Figure 1: The state space of a three-state toy model of molecular motor (myosin). Solid (dashed) arrows indicate transitions in a forward (backward) cycle.

The motor at state 1 is bound to the actin. Through coupling with an ATP, it detaches and becomes state 2. Then, hydrolysis of ATP leads to the mechanically deformed state 3. Through releasing ADP and Pi, the motor generates a power stroke (from - to +) and re-attaches to the actin, back to state 1. The dynamics of this motor (in discrete time) is described by the transition probabilities $P_{j|i}$ from $i$ to $j$, where $i,j\in\left\{1,2,3\right\}$.

With big data about a long trajectory of the motor’s state, we can use counting statistics to infer $P_{j|i}$. Let us consider the following six linearly independent counting frequencies: the occurrence frequencies of two of the three states, say $f_{2}$ and $f_{3}$; the symmetric flux (what Maes called traffic in [29]) over all edges, measured by

$$f_{ij}^{\text{sym}}=\frac{\left(\text{\# of }i\mapsto j\right)+\left(\text{\# of }j\mapsto i\right)}{\text{length of trajectory}} \tag{17}$$

where $ij\in\{12,23,13\}$; and the net (antisymmetric) flux of the power stroke step, from state $3$ to $1$, denoted by

$$f_{31}^{\text{anti}}=\frac{\left(\text{\# of }3\mapsto 1\right)-\left(\text{\# of }1\mapsto 3\right)}{\text{length of trajectory}}. \tag{18}$$

Note that this $f_{31}^{{\rm anti}}$ is the net empirical velocity of the motor over a length-$T$ trajectory.
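The six counting frequencies can be computed from a state trajectory as in the Python sketch below. Two choices here are assumptions: the occurrence frequency of a state is counted over the $T$ steps that start in that state (the convention suggested by the tilted matrix, where $\alpha_{n}$ multiplies every transition leaving state $n$), and the function name and the short example path are hypothetical.

```python
import numpy as np

def counting_observables(path):
    """Return (f2, f3, f12_sym, f23_sym, f13_sym, f31_anti) for a path over states 1, 2, 3."""
    x = np.asarray(path) - 1                  # shift labels to 0, 1, 2
    T = len(x) - 1                            # number of transitions
    f = np.zeros((3, 3))
    np.add.at(f, (x[:-1], x[1:]), 1.0 / T)    # f[i, j] = frequency of i -> j
    occupancy = f.sum(axis=1)                 # frequency of steps that start in each state
    sym = f + f.T                             # symmetric flux, Eq. (17)
    return np.array([occupancy[1], occupancy[2],
                     sym[0, 1], sym[1, 2], sym[0, 2],
                     f[2, 0] - f[0, 2]])      # net flux of 3 -> 1, Eq. (18)

# Short hypothetical trajectory with six transitions.
obs = counting_observables([1, 2, 3, 1, 3, 1, 2])
```

For this path the transitions are $1\mapsto 2$ twice, $2\mapsto 3$ once, $3\mapsto 1$ twice and $1\mapsto 3$ once, so the net flux entry equals $(2-1)/6$.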

Following the MaxEnt recipe mentioned above, we assume a uniform prior $R_{j|i}=1/3$ and construct the tilted matrix

$$M_{ij}=\frac{1}{3}\left(\begin{array}{ccc}1&e^{\beta_{12}}&e^{\beta_{13}-\gamma}\\ e^{\alpha_{2}+\beta_{12}}&e^{\alpha_{2}}&e^{\alpha_{2}+\beta_{23}}\\ e^{\alpha_{3}+\beta_{13}+\gamma}&e^{\alpha_{3}+\beta_{23}}&e^{\alpha_{3}}\end{array}\right) \tag{19}$$

with six parameters $(\alpha_{2},\alpha_{3},\beta_{12},\beta_{23},\beta_{13},\gamma)$ corresponding to the six observables $\boldsymbol{f}=\left(f_{2},f_{3},f_{12}^{{\rm sym}},f_{23}^{{\rm sym}},f_{13}^{{\rm sym}},f_{31}^{{\rm anti}}\right)$. Then by Eq. (8), the posterior transition probabilities take the form of

$$P_{j|i}^{*}=\frac{1}{3\lambda}\left(\begin{array}{ccc}1&e^{\beta_{12}}\frac{r_{2}}{r_{1}}&e^{\beta_{13}-\gamma}\frac{r_{3}}{r_{1}}\\ e^{\alpha_{2}+\beta_{12}}\frac{r_{1}}{r_{2}}&e^{\alpha_{2}}&e^{\alpha_{2}+\beta_{23}}\frac{r_{3}}{r_{2}}\\ e^{\alpha_{3}+\beta_{13}+\gamma}\frac{r_{1}}{r_{3}}&e^{\alpha_{3}+\beta_{23}}\frac{r_{2}}{r_{3}}&e^{\alpha_{3}}\end{array}\right) \tag{20}$$

where $\lambda$ is the largest eigenvalue of $M$ and $r$ is the corresponding right eigenvector.

The set of observables $\boldsymbol{f}$ we chose is holographic, i.e. it captures all degrees of freedom of the dynamics. When the trajectory becomes very long, the ergodic theorem for Markov chains guarantees that $\left(f_{2},f_{3}\right)\rightarrow\left(\pi_{2},\pi_{3}\right)$,

$$f_{ij}^{{\rm sym}}\rightarrow\tau_{ij}=P_{ij}+P_{ji}, \tag{21}$$

and

$$f_{31}^{{\rm anti}}\rightarrow J=P_{31}-P_{13}. \tag{22}$$

With these six averages $\left\langle\boldsymbol{f}\right\rangle=(\pi_{2},\pi_{3},\tau_{12},\tau_{23},\tau_{13},J)$, one can uniquely compute the true underlying $P_{j|i}$ as a function of these six averages. Furthermore, since our observables are non-degenerate, simple relations between the six parameters and the transition probabilities can be derived [30]:

$$\begin{aligned}\alpha_{n}&=\ln P_{n|n}-\ln P_{1|1}&\text{(23a)}\\ \beta_{ij}&=\frac{1}{2}\ln\frac{P_{j|i}P_{i|j}}{P_{i|i}P_{j|j}}&\text{(23b)}\\ \beta_{12}&=\frac{1}{2}\ln\frac{P_{2|1}P_{1|2}}{P_{1|1}P_{2|2}}&\text{(23c)}\\ \gamma&=\frac{1}{2}\ln\frac{P_{2|1}P_{3|2}P_{1|3}}{P_{3|1}P_{2|3}P_{1|2}}&\text{(23d)}\end{aligned}$$

where $n\in\{2,3\}$ and $ij\in\{12,23,13\}$. Notice that $\gamma$ is (half of) the cycle affinity, an important term in stochastic thermodynamics [31].
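The relations in Eqs. (23) can be checked numerically: starting from a transition matrix, compute the six parameters, rebuild the tilted matrix of Eq. (19), and confirm that the posterior of Eq. (8) returns the matrix one started from. The Python sketch below does this for a hypothetical matrix `P`; the matrix values and the function names are assumptions for illustration.

```python
import numpy as np

def forces_from_P(P):
    """Eqs. (23): (alpha2, alpha3, beta12, beta23, beta13, gamma) from P[i, j] = P_{j|i}."""
    def beta(i, j):
        return 0.5 * np.log(P[i, j] * P[j, i] / (P[i, i] * P[j, j]))
    alpha2 = np.log(P[1, 1] / P[0, 0])
    alpha3 = np.log(P[2, 2] / P[0, 0])
    gamma = 0.5 * np.log(P[0, 1] * P[1, 2] * P[2, 0] / (P[0, 2] * P[2, 1] * P[1, 0]))
    return alpha2, alpha3, beta(0, 1), beta(1, 2), beta(0, 2), gamma

def tilted_matrix(a2, a3, b12, b23, b13, gamma):
    """Tilted matrix of Eq. (19) with the uniform prior R_{j|i} = 1/3."""
    exponents = np.array([[0.0,              b12,      b13 - gamma],
                          [a2 + b12,         a2,       a2 + b23],
                          [a3 + b13 + gamma, a3 + b23, a3]])
    return np.exp(exponents) / 3.0

def posterior(M):
    """Largest eigenvalue of M and the posterior transition matrix of Eq. (8)."""
    w, v = np.linalg.eig(M)
    k = np.argmax(np.real(w))
    lam, r = np.real(w[k]), np.abs(np.real(v[:, k]))
    return lam, M * r[None, :] / (lam * r[:, None])

# Hypothetical positive transition matrix, for illustration only.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])
forces = forces_from_P(P)
lam, P_recovered = posterior(tilted_matrix(*forces))
```

In this parametrization the diagonal entry $M_{11}=1/3$ fixes the eigenvalue to $\lambda=1/(3P_{1|1})$, which gives a second quick check besides `P_recovered` matching `P`.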

Recall from Eq. (13) that the six parameters in Eqs. (23) are actually the entropic forces. In our example here, we plug in $R_{j|i}=1/3$ and

$$P_{j|i}^{*}=\frac{\tau_{ij}+J_{ij}}{2\pi_{i}} \tag{24}$$

into the entropy form $\varphi(\left\langle\boldsymbol{f}\right\rangle)$ in Eq. (9) by using normalization $\pi_{1}=1-\pi_{2}-\pi_{3}$ and stationarity $J_{12}=J_{23}=J_{31}=J$. One can easily check that

$$\alpha_{n}=\frac{\partial\varphi}{\partial\pi_{n}},\quad\beta_{ij}=\frac{\partial\varphi}{\partial\tau_{ij}},\quad\gamma=\frac{\partial\varphi}{\partial J} \tag{25}$$

for $n=2,3$ and $ij\in\{12,23,31\}$.
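Eq. (25) can be verified by finite differences, as in the Python sketch below. It builds the joint probabilities from the six averages using Eq. (24) together with normalization and stationarity, evaluates $\varphi$ of Eq. (9) with the uniform prior, and compares its numerical gradient with Eqs. (23). The numerical values of the six averages and the function names are assumptions chosen so that all probabilities stay positive.

```python
import numpy as np

def joint_from_averages(avg):
    """Stationary distribution and joint P_ij from (pi2, pi3, tau12, tau23, tau13, J)."""
    pi2, pi3, t12, t23, t13, J = avg
    pi = np.array([1.0 - pi2 - pi3, pi2, pi3])
    Pj = np.zeros((3, 3))
    Pj[0, 1], Pj[1, 0] = (t12 + J) / 2, (t12 - J) / 2   # cycle 1 -> 2 -> 3 -> 1 carries J
    Pj[1, 2], Pj[2, 1] = (t23 + J) / 2, (t23 - J) / 2
    Pj[2, 0], Pj[0, 2] = (t13 + J) / 2, (t13 - J) / 2
    Pj[np.diag_indices(3)] = pi - Pj.sum(axis=1)        # diagonal fixed by the row sums
    return pi, Pj

def entropy(avg):
    """phi of Eq. (9) with the uniform prior R_{j|i} = 1/3."""
    pi, Pj = joint_from_averages(avg)
    return np.sum(Pj * np.log(3.0 * Pj / pi[:, None]))

def forces_from_P(P):
    """Eqs. (23): (alpha2, alpha3, beta12, beta23, beta13, gamma) from P[i, j] = P_{j|i}."""
    def beta(i, j):
        return 0.5 * np.log(P[i, j] * P[j, i] / (P[i, i] * P[j, j]))
    gamma = 0.5 * np.log(P[0, 1] * P[1, 2] * P[2, 0] / (P[0, 2] * P[2, 1] * P[1, 0]))
    return np.array([np.log(P[1, 1] / P[0, 0]), np.log(P[2, 2] / P[0, 0]),
                     beta(0, 1), beta(1, 2), beta(0, 2), gamma])

# Hypothetical averages (pi2, pi3, tau12, tau23, tau13, J), for illustration only.
avg = np.array([0.3, 0.3, 0.2, 0.25, 0.3, 0.1])
h = 1e-6
gradient = np.array([(entropy(avg + h * e) - entropy(avg - h * e)) / (2 * h)
                     for e in np.eye(6)])
pi, Pj = joint_from_averages(avg)
forces = forces_from_P(Pj / pi[:, None])
```

The arrays `gradient` and `forces` should coincide up to finite-difference error, component by component in the order $(\alpha_{2},\alpha_{3},\beta_{12},\beta_{23},\beta_{13},\gamma)$.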

The SUP in Eq. (16) is an asymptotic relation between the covariance of the six frequency observables $\boldsymbol{f}$ collected from a long but finite trajectory and the covariance of the forces they infer through $\boldsymbol{F}=\nabla\varphi(\boldsymbol{f})$. By numerically producing a big ensemble of very long trajectories with length $T$, we can play Bob’s role and check the SUP for the empirical $6\times 6$ scaled covariance of frequencies $T{\rm Co}\mathbb{V}[\boldsymbol{f}]$ and that of the inferred forces $T{\rm Co}\mathbb{V}[\boldsymbol{F}]$. Their product is indeed very close to the identity matrix, with the vanishing difference shown in Fig. 2.

Figure 2: The image shows the difference between the product of the statistical variations of frequencies and forces, $T^{2}{\rm Co}\mathbb{V}[\boldsymbol{f}]{\rm Co}\mathbb{V}[\boldsymbol{F}]$, in simulated data and the identity matrix. The underlying transition probabilities are $P_{:|1}=\left(\frac{13}{40},\frac{9}{20},\frac{9}{40}\right)$, $P_{:|2}=\left(\frac{3}{20},\frac{1}{4},\frac{3}{5}\right)$, $P_{:|3}=\left(\frac{21}{40},\frac{3}{10},\frac{7}{40}\right)$. This is chosen arbitrarily so that $\left(\pi_{2},\pi_{3},\tau_{12},\tau_{23},\tau_{13},J\right)=\left(\frac{1}{3},\frac{1}{3},\frac{1}{5},\frac{3}{10},\frac{1}{4},\frac{1}{10}\right)$. We simulate $M=10^{5}$ i.i.d. copies of length $T=1000$ trajectories and compute the six empirical frequencies and the inferred forces from each trajectory. The differences would shrink to zero when $M\rightarrow\infty$ and $T\rightarrow\infty$.

Summary

In this paper, we revisit and advocate the empirical derivation of the Maximum Entropy principle of Markov correlated data [12, 13, 14], also known as the Maximum Caliber principle. We review how the principle can identify entropic forces, and lead to statistical thermodynamic conjugacy. From the empirical understanding and data-driven derivation that we revisited and advocated, we derived an uncertainty principle between the statistical variations of observables and the forces they infer from finite data. This theory is purely empirical and can thus be applied to trajectory data from small, nonequilibrium, dynamical biological systems that are far from mechanics.

In short, more is indeed different [32]. Maximum entropy principle and statistical thermodynamics emerge empirically and de-mechanically from the limit theorems of data ad infinitum, i.i.d. or Markov correlated. Akin to the uncertainty principle in quantum mechanics, there is an uncertainty principle about the statistical variations of dynamical observables and forces for Big Data.

Acknowledgements.
The authors thank Erin Angelini, Ken Dill, Charles Kocher, Zhiyue Lu, Jonny Patcher, Dalton Sakthivadivel, Dominic Skinner, David Sivak, Lowell Thompson, and Jin Wang for helpful feedback on our manuscript and for the stimulating discussions they had with the authors. H. Q. thanks the support from the Olga Jung Wan Endowed Professorship.

References

  • Landau and Lifshitz [1980] L. D. Landau and E. M. Lifshitz, Statistical Physics, Third Edition, Part 1: Volume 5, 3rd ed. (Butterworth-Heinemann, Amsterdam, 1980).
  • Huang [1987] K. Huang, Statistical Mechanics, 2nd Edition, 2nd ed. (Wiley, New York, 1987).
  • Nicolis and Prigogine [1977] G. Nicolis and I. Prigogine, Self-Organization in Nonequilibrium Systems: From Dissipative Structures to Order through Fluctuations, 1st ed. (Wiley, New York, 1977).
  • Hill [2013] T. L. Hill, Thermodynamics of Small Systems, Parts I & II (Dover Publications, Mineola, New York, 2013).
  • Bedeaux et al. [2020] D. Bedeaux, S. Kjelstrup, and S. Schnell, Nanothermodynamics: General Theory (2020).
  • Jaynes [1980] E. T. Jaynes, The Minimum Entropy Production Principle, Annual Review of Physical Chemistry 31, 579 (1980).
  • Pressé et al. [2013] S. Pressé, K. Ghosh, J. Lee, and K. A. Dill, Principles of maximum entropy and maximum caliber in statistical physics, Rev. Mod. Phys. 85, 1115 (2013).
  • Szilard [1929] L. Szilard, Über die Ausdehnung der phänomenologischen Thermodynamik auf die Schwankungserscheinungen, Z. Phys. 32, 840 (1929).
  • Mandelbrot [1964] B. Mandelbrot, On the Derivation of Statistical Thermodynamics from Purely Phenomenological Principles, J. Math. Phys. 5, 164 (1964).
  • Jaynes [1957] E. T. Jaynes, Information Theory and Statistical Mechanics, Phys. Rev. 106, 620 (1957).
  • Shore and Johnson [1980] J. Shore and R. Johnson, Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy, IEEE Trans. Inf. Theor. 26, 26 (1980).
  • Csiszar et al. [1987] I. Csiszar, T. Cover, and Byoung-Seon Choi, Conditional limit theorems under Markov conditioning, IEEE Trans. Inform. Theory 33, 788 (1987).
  • Chetrite and Touchette [2013] R. Chetrite and H. Touchette, Nonequilibrium Microcanonical and Canonical Ensembles and Their Equivalence, Phys. Rev. Lett. 111, 120601 (2013).
  • Chetrite and Touchette [2015] R. Chetrite and H. Touchette, Nonequilibrium Markov Processes Conditioned on Large Deviations, Ann. Henri Poincaré 16, 2005 (2015).
  • Barato and Seifert [2015] A. C. Barato and U. Seifert, Thermodynamic Uncertainty Relation for Biomolecular Processes, Phys. Rev. Lett. 114, 158101 (2015).
  • Horowitz and Gingrich [2020] J. M. Horowitz and T. R. Gingrich, Thermodynamic uncertainty relations constrain non-equilibrium fluctuations, Nat Phys 16, 15 (2020).
  • Qian and Cheng [2020] H. Qian and Y.-C. Cheng, Counting single cells and computing their heterogeneity: from phenotypic frequencies to mean value of a quantitative biomarker, Quant Biol 8, 172 (2020).
  • Lu and Qian [2022] Z. Lu and H. Qian, Emergence and Breaking of Duality Symmetry in Generalized Fundamental Thermodynamic Relations, Phys. Rev. Lett. 128, 150603 (2022).
  • Cheng et al. [2021] Y.-C. Cheng, H. Qian, and Y. Zhu, Asymptotic Behavior of a Sequence of Conditional Probability Distributions and the Canonical Ensemble, Ann. Henri Poincaré 22, 1561 (2021).
  • Barato and Chetrite [2015] A. C. Barato and R. Chetrite, A Formal View on Level 2.5 Large Deviations and Fluctuation Relations, J Stat Phys 160, 1154 (2015).
  • Dembo and Zeitouni [2009] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, 2nd ed. (Springer, Berlin Heidelberg, 2009).
  • Touchette [2009] H. Touchette, The large deviation approach to statistical mechanics, Physics Reports 478, 1 (2009).
  • Gärtner [1977] J. Gärtner, On Large Deviations from the Invariant Measure, Theory Probab. Appl. 22, 24 (1977).
  • Ellis [1984] R. S. Ellis, Large Deviations for a General Class of Random Vectors, The Annals of Probability 12, 1 (1984).
  • Mandelbrot [1989] B. B. Mandelbrot, Temperature Fluctuation: A Well-Defined and Unavoidable Notion, Physics Today 42, 71 (1989).
  • Uffink and van Lith [1999] J. Uffink and J. van Lith, Thermodynamic Uncertainty Relations, Foundations of Physics 29, 655 (1999).
  • Schlögl [1988] F. Schlögl, Thermodynamic uncertainty relation, Journal of Physics and Chemistry of Solids 49, 679 (1988).
  • Qian [1997] H. Qian, A simple theory of motor protein kinetics and energetics, Biophysical Chemistry 67, 263 (1997).
  • Maes [2020] C. Maes, Frenesy: Time-symmetric dynamical activity in nonequilibria, Physics Reports 850, 1 (2020).
  • Note [1] One of us is writing a paper regarding the general version of this fact.
  • Yang and Qian [2021] Y.-J. Yang and H. Qian, Bivectorial Nonequilibrium Thermodynamics: Cycle Affinity, Vorticity Potential, and Onsager’s Principle, J Stat Phys 182, 46 (2021).
  • Anderson [1972] P. W. Anderson, More Is Different, Science 177, 393 (1972).