
Autoparallelity of Quantum Statistical Manifolds in The Light of Quantum Estimation Theory

Hiroshi Nagaoka, The University of Electro-Communications, Chofu, Tokyo 182-8585, Japan. nagaoka@is.uec.ac.jp

Akio Fujiwara, Department of Mathematics, Osaka University, Toyonaka, Osaka 560-0043, Japan. fujiwara@math.sci.osaka-u.ac.jp
Abstract

In this paper we study the autoparallelity w.r.t. the e-connection for an information-geometric structure called the SLD structure, which consists of a Riemannian metric and mutually dual e- and m-connections induced on the manifold of strictly positive density operators. Unlike in classical information geometry, the e-connection has non-vanishing torsion, which brings various mathematical difficulties. The notion of e-autoparallel submanifolds is regarded as a quantum version of exponential families in classical statistics, which are known to be characterized as the statistical models having efficient estimators (unbiased estimators uniformly achieving equality in the Cramér-Rao inequality). As quantum extensions of this classical result, we present two different estimation-theoretical characterizations of the e-autoparallel submanifolds. We also give several results on e-autoparallelity, some of which remain valid for autoparallelity w.r.t. an affine connection in a more general geometrical setting.

Keywords: quantum estimation theory, information geometry, autoparallel submanifold, dual connection, torsion, SLD (symmetric logarithmic derivative)

1 Introduction

Autoparallelity is a multi-dimensional version (including the 1-dimensional case in particular) of the notion of a geodesic for a manifold equipped with an affine connection. In classical information geometry for manifolds of probability distributions, where the triple $(g,\nabla^{(\mathrm{e})},\nabla^{(\mathrm{m})})$ of the Fisher metric $g$, the e-connection $\nabla^{(\mathrm{e})}$ and the m-connection $\nabla^{(\mathrm{m})}$ plays the leading role, the autoparallelity w.r.t. (with respect to) the e-connection, called e-autoparallelity, is particularly important. This is because the e-autoparallel submanifolds of the space $\mathcal{P}$ consisting of all strictly positive probability distributions on a finite set are the exponential families, one of the key concepts in probability theory and statistics. For quantum statistical manifolds, which are submanifolds of the space $\mathcal{S}$ consisting of all strictly positive density operators on a finite-dimensional Hilbert space, we may introduce an analogous notion of exponential families as autoparallel submanifolds w.r.t. an affine connection on $\mathcal{S}$ analogous to the classical e-connection. However, quantum exponential families introduced in this way do not necessarily have statistical and/or physical significance. One of the main achievements of the present paper is to show that e-autoparallelity for the SLD structure, one of a family of information-geometric structures introduced on $\mathcal{S}$ in a unified way, admits estimation-theoretical characterizations.

In order to clarify our motivation, we begin with an overview of a result in classical estimation theory. Let $\mathcal{P}=\mathcal{P}(\Omega)$ be the totality of strictly positive probability distributions (probability mass functions) on a finite set $\Omega$, and let $\mathcal{M}=\{p_{\xi}\,|\,\xi=(\xi^{1},\dots,\xi^{n})\in\Xi\}\subset\mathcal{P}$ be a statistical model whose elements $p_{\xi}$ are smoothly and injectively parametrized by an $n$-dimensional parameter $\xi$ ranging over an open subset $\Xi$ of $\mathbb{R}^{n}$. As is well known, the variance matrix $V_{\xi}\in\mathbb{R}^{n\times n}$ of an arbitrary unbiased estimator for the parameter $\xi$ satisfies the Cramér-Rao inequality $V_{\xi}\geq G_{\xi}^{-1}$, where $G_{\xi}\in\mathbb{R}^{n\times n}$ denotes the Fisher information matrix. An unbiased estimator achieving the equality $V_{\xi}=G_{\xi}^{-1}$ for every $\xi\in\Xi$ is called an efficient estimator for the parameter $\xi$; its existence is known to impose strong restrictions on both the set $\mathcal{M}$ and the parametrization $\xi\mapsto p_{\xi}$. Namely, we have the following theorem (e.g. §5a.2 (p. 324) of [1], Eq. (7.14) in §7.2 of [2], Theorem 3.12 of [3]).

Theorem 1.1.

For a statistical model $\mathcal{M}=\{p_{\xi}\}$, the following conditions are equivalent.

  • (i) There exists an efficient estimator for the parameter $\xi$.

  • (ii) $\mathcal{M}$ is an exponential family, and $\xi$ is an expectation parameter.

Condition (ii) in the above theorem means that the elements of $\mathcal{M}$ are represented as

$$p_{\xi}(\omega)=\exp\Bigl[C(\omega)+\sum_{i=1}^{n}\theta_{i}(\xi)F^{i}(\omega)-\psi(\xi)\Bigr] \tag{1.1}$$

and that the parameter $\xi$ satisfies

$$\xi^{i}=E_{\xi}(F^{i}), \tag{1.2}$$

where $E_{\xi}$ denotes the expectation w.r.t. the distribution $p_{\xi}$. This condition is expressed in the language of geometry as follows.

  • (ii)′ $\mathcal{M}$ is autoparallel in $\mathcal{P}$ w.r.t. the e-connection of $\mathcal{P}$, and $\xi$ forms an affine coordinate system w.r.t. the m-connection of $\mathcal{M}$.
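As a small numerical illustration of Theorem 1.1, the sketch below takes a hypothetical one-parameter exponential family on the three-point set $\Omega=\{0,1,2\}$, of the form (1.1) with $C=0$ and $F(\omega)=\omega$, and checks that the estimator $T=F$, which is unbiased for the expectation parameter $\xi=E_\xi(F)$ by (1.2), has variance equal to $G_\xi^{-1}$. The Fisher information w.r.t. $\xi$ is obtained from the one w.r.t. the natural parameter $\theta$ by the chain rule, with $d\xi/d\theta$ approximated by a central difference. The family, the step size `h` and all function names are our own illustrative choices.

```python
import math

# Hypothetical one-parameter exponential family on OMEGA = {0, 1, 2} of the
# form (1.1) with C = 0, n = 1 and F(w) = w, written in the natural parameter
# theta: p_theta(w) = exp(theta * F(w) - psi(theta)).
OMEGA = [0, 1, 2]


def F(w):
    return float(w)


def dist(theta):
    """Probability vector (p_theta(w) for w in OMEGA)."""
    weights = [math.exp(theta * F(w)) for w in OMEGA]
    z = sum(weights)  # z = exp(psi(theta))
    return [x / z for x in weights]


def expect(p, fn):
    """Expectation of fn under the probability vector p."""
    return sum(pw * fn(w) for w, pw in zip(OMEGA, p))


def expectation_param(theta):
    """xi = E(F), the expectation parameter in (1.2)."""
    return expect(dist(theta), F)


def estimator_variance(theta):
    """Variance of the estimator T = F, which is unbiased for xi by (1.2)."""
    p = dist(theta)
    xi = expect(p, F)
    return expect(p, lambda w: (F(w) - xi) ** 2)


def fisher_info_xi(theta, h=1e-5):
    """Fisher information w.r.t. xi: G_xi = G_theta / (d xi / d theta)^2."""
    p = dist(theta)
    xi = expect(p, F)
    # The score w.r.t. theta is F(w) - xi, so G_theta = Var(F).
    g_theta = expect(p, lambda w: (F(w) - xi) ** 2)
    dxi_dtheta = (expectation_param(theta + h) - expectation_param(theta - h)) / (2 * h)
    return g_theta / dxi_dtheta ** 2


for theta in (-1.0, 0.0, 0.7):
    print(f"theta={theta:+.1f}  xi={expectation_param(theta):.6f}  "
          f"V={estimator_variance(theta):.8f}  1/G={1 / fisher_info_xi(theta):.8f}")
```

The two printed columns `V` and `1/G` coincide up to the finite-difference error, which is the equality $V_\xi=G_\xi^{-1}$ of an efficient estimator.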

Remark 1.2.

In the notation of [3], as well as of many references on information geometry, $\theta_{i}$, $F^{i}$ and $\xi^{i}$ in (1.1) and (1.2) are written as $\theta^{i}$, $F_{i}$ and $\eta_{i}$, respectively. The upper and lower indices are reversed here (and throughout the paper) because we first treat $\xi$ as an arbitrary coordinate system, which carries an upper index (superscript) as in the standard notation of differential geometry, and then consider the condition for $\xi$ to become an m-affine coordinate system such as the expectation coordinate system.

A quantum version of Theorem 1.1 is known, which we state below. Let $\mathcal{H}$ be a finite-dimensional Hilbert space, $\mathcal{S}=\mathcal{S}(\mathcal{H})$ be the totality of strictly positive density operators on $\mathcal{H}$, and $\mathcal{M}=\{\rho_{\xi}\;|\;\xi\in\Xi\}$ be an arbitrary quantum statistical model consisting of states $\rho_{\xi}$ in $\mathcal{S}$. It is well known that the variance matrix $V_{\xi}$ of an arbitrary unbiased estimator for the parameter $\xi$ satisfies the SLD Cramér-Rao inequality $V_{\xi}\geq G_{\xi}^{-1}$ [6, 7], where $G_{\xi}$ is the SLD Fisher information matrix (see Section 4 for details). When an unbiased estimator satisfies $V_{\xi}=G_{\xi}^{-1}$ for every $\xi\in\Xi$, we call it an efficient estimator for the parameter $\xi$, as in the classical case. Then we have the following theorem (Theorem 7.6 of [3]).

Theorem 1.3.

For a quantum statistical model $\mathcal{M}=\{\rho_{\xi}\}$, the following conditions are equivalent.

  • (i) There exists an efficient estimator for the parameter $\xi$.

  • (ii) There exist mutually commuting Hermitian operators $F^{1},\ldots,F^{n}$ and a strictly positive operator $P$ such that the elements of $\mathcal{M}$ are represented as

    $$\rho_{\xi}=\exp\Bigl[\frac{1}{2}\Bigl\{\sum_{i=1}^{n}\theta_{i}(\xi)F^{i}-\psi(\xi)\Bigr\}\Bigr]\,P\,\exp\Bigl[\frac{1}{2}\Bigl\{\sum_{i=1}^{n}\theta_{i}(\xi)F^{i}-\psi(\xi)\Bigr\}\Bigr] \tag{1.3}$$

    and that

    $$\forall\xi\in\Xi,\;\forall i,\quad\xi^{i}=\mathrm{Tr}\left(\rho_{\xi}\,F^{i}\right). \tag{1.4}$$

When a model $\mathcal{M}=\{\rho_{\xi}\}$ is represented as (1.3), we call it a quasi-classical exponential family. As pointed out in [3] and verified in Section 4 of this paper, a quasi-classical exponential family is e-autoparallel in $\mathcal{S}$ w.r.t. the SLD structure. Note, however, that it is merely a special case of an e-autoparallel submanifold. Namely, the existence of an efficient estimator is too strong a condition to characterize e-autoparallelity. Is it then possible to characterize e-autoparallelity by an estimation-theoretic condition weaker than the existence of an efficient estimator? We give an affirmative answer to this question in Section 5. We also give another characterization of e-autoparallelity in Section 7 by considering estimation of scalar-valued functions instead of vector-valued parameters.
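The structure of (1.3) and (1.4) can be checked numerically in the simplest case. The sketch below is an illustration of our own, not taken from [3]: it assumes $n=1$, a hypothetical diagonal $F$ on $\mathbb{C}^3$ and a hypothetical strictly positive $P$ that does not commute with $F$, parametrizes the family by $\theta$, fixes $\psi$ by normalization, and computes $\xi=\mathrm{Tr}(\rho F)$. It then verifies that $\rho$ is a density operator, that its SLD w.r.t. $\theta$ (obtained by solving $\frac{1}{2}(\rho L+L\rho)=\partial_\theta\rho$) equals $F-\xi I$, and that the variance of a measurement of $F$ equals the inverse SLD Fisher information w.r.t. $\xi$. Derivatives are central differences with step `h`.

```python
import numpy as np


def herm_exp(h):
    """exp(h) for a Hermitian matrix h, via its eigendecomposition."""
    w, u = np.linalg.eigh(h)
    return (u * np.exp(w)) @ u.conj().T


def sld(rho, d):
    """Solve (rho L + L rho)/2 = d for L in an eigenbasis of rho."""
    p, u = np.linalg.eigh(rho)
    d_eig = u.conj().T @ d @ u
    return u @ (2 * d_eig / (p[:, None] + p[None, :])) @ u.conj().T


# Hypothetical data for (1.3) with n = 1: F is diagonal, and P is strictly
# positive but does not commute with F.
F = np.diag([0.0, 1.0, 2.0])
P = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.5]])
I3 = np.eye(3)


def psi(theta):
    # Tr rho = exp(-psi) Tr(P exp(theta F)) = 1 fixes psi.
    return np.log(np.trace(P @ herm_exp(theta * F)))


def rho(theta):
    half = herm_exp(0.5 * (theta * F - psi(theta) * I3))
    return half @ P @ half


theta, h = 0.3, 1e-5
r = rho(theta)
xi = np.trace(r @ F)                          # expectation parameter, cf. (1.4)
d_rho = (rho(theta + h) - rho(theta - h)) / (2 * h)
L = sld(r, d_rho)                             # SLD w.r.t. theta
g_theta = np.trace(r @ L @ L)                 # SLD Fisher information w.r.t. theta
dxi_dtheta = (np.trace(rho(theta + h) @ F) - np.trace(rho(theta - h) @ F)) / (2 * h)
g_xi = g_theta / dxi_dtheta ** 2              # SLD Fisher information w.r.t. xi
var_F = np.trace(r @ F @ F) - xi ** 2         # variance of measuring F

print("Tr rho =", np.trace(r))
print("SLD equals F - xi I:", np.allclose(L, F - xi * I3, atol=1e-6))
print("V =", var_F, " 1/G_xi =", 1 / g_xi)
```

Although $\rho$ itself does not commute with $F$ here, the SLD is the commuting operator $F-\xi I$, which is what makes a measurement of $F$ efficient in this example.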

As mentioned above, e-autoparallelity in the SLD structure has estimation-theoretical significance and is therefore worth studying further. It should be noted here that the e-connection in the SLD structure is curvature-free but not torsion-free, so that it is not flat. This also means that $\mathcal{S}$ is not a dually flat space w.r.t. the SLD structure. For a flat connection, an autoparallel submanifold corresponds to an affine subspace in the coordinate space of an affine coordinate system, so that the existence condition for autoparallel submanifolds is obvious. For a non-flat connection, on the other hand, we cannot see the whole picture of autoparallel submanifolds, and we face the new problem of what kind of condition ensures the existence of autoparallel submanifolds. Therefore, it is also important to study autoparallelity from a purely geometrical point of view, apart from estimation problems. This is another concern of this paper, along with the estimation-theoretical consideration.

The paper is organized as follows. In Section 2, we explain basic issues about autoparallelity for an affine connection $\nabla$ on a general differentiable manifold, focusing in particular on the situation where the dual connection of $\nabla$ w.r.t. a Riemannian metric is flat. In Section 3, we introduce a family of information-geometric structures on the space $\mathcal{S}(\mathcal{H})$, and derive basic facts concerning e-autoparallelity by applying the results of Section 2. These two sections are preliminaries for later sections. Although the results shown there are mostly known, we present them together with their derivations so that the description is as self-contained as possible. Section 4 also consists basically of known results: Theorem 1.3 is revisited, and it is clarified that the existence of an efficient estimator only partially characterizes e-autoparallelity for the SLD structure. This observation motivates Section 5, where a sequence of estimators, called a filtration of estimators, is treated instead of a single estimator, and the existence of an efficient filtration is shown to characterize e-autoparallelity. In Section 6, it is shown that a quantum Gaussian shift model has an efficient filtration and that the model in fact exhibits a property analogous to e-autoparallelity in $\mathcal{S}(\mathcal{H})$, although the Hilbert space $\mathcal{H}$ is infinite-dimensional in this case, so that we cannot fully develop a differential-geometrical argument there. Section 7 treats an estimation problem for scalar-valued functions, where it is shown that a quantum statistical manifold is e-autoparallel in $\mathcal{S}$ if and only if the linear space formed by the functions having efficient estimators is of maximal dimension. In Section 8, we move away from estimation theory and consider the condition for the existence of e-autoparallel submanifolds from a purely geometrical point of view, where the involutivity of a parallel distribution of tangent spaces is studied in relation to the torsion tensor. In Section 9, we treat the case $\dim\mathcal{H}=2$ and study the SLD structure of the space $\mathcal{S}(\mathcal{H})$ of qubit states. It is shown there that $\mathcal{S}(\mathcal{H})$ in this case has the characteristic property that every e-parallel distribution is involutive. Section 10 is devoted to concluding remarks. Some proofs and additional results are placed in the Appendix for the sake of readability of the main text.

Remark 1.4.

We make some remarks on the nomenclature and the notation of the paper.

  1. Throughout the paper, when we refer to a manifold, say $M$, it means that $M$ is a manifold with a trivial global structure, so that we need not worry about the difference between global and local properties of $M$. For instance, $M$ is always supposed to have a global coordinate system, and every closed differential form on $M$ is considered to be exact.

  2. When we say in the subsequent sections that $\xi=(\xi^{1},\ldots,\xi^{n})$ is a coordinate system of a manifold $M$, it basically means that $\xi$ is a map $\xi:M\rightarrow\mathbb{R}^{n}$ (a global chart of $M$) which represents each point $p\in M$ by an $n$-dimensional vector $\xi(p)=(\xi^{1}(p),\ldots,\xi^{n}(p))\in\mathbb{R}^{n}$, although the same symbol $\xi$ has appeared above as a parameter specifying a point in the manifold. The two viewpoints are equivalent via $\xi(p)=\xi^{\prime}\Leftrightarrow p=p_{\xi^{\prime}}$. A parametrization is often more convenient than a coordinate system when dealing with concrete examples; in fact, we will use parametrizations in Section 6 for quantum Gaussian states and in Section 9 for qubit states.

  3. This paper contains both arguments on quantum statistical manifolds (manifolds consisting of density operators) and arguments on general manifolds. We denote quantum statistical manifolds by $\mathcal{S},\mathcal{M},\mathcal{N},\dots$, while general manifolds are denoted by $S,M,N,\dots$.

2 Basic issues about autoparallelity

In this section we summarize basic issues related to autoparallelity from the perspective of general differential geometry, which will be necessary for later discussions.

Let $S$ be an arbitrary manifold, and denote the totality of smooth functions and that of smooth vector fields on $S$ by $\mathcal{F}(S)$ and $\mathscr{X}(S)$, respectively. Suppose that $S$ is provided with an affine connection $\nabla$, which is a map $\mathscr{X}(S)\times\mathscr{X}(S)\rightarrow\mathscr{X}(S),\;(X,Y)\mapsto\nabla_{X}Y$. Given a submanifold $M$ of $S$, let $\mathscr{X}(S/M)$ denote the totality of smooth mappings which map each point $p\in M$ to a tangent vector in $T_{p}(S)$, i.e., the sections of the vector bundle $\bigsqcup_{p\in M}T_{p}(S)$. Then $\nabla$ naturally induces a map $\mathscr{X}(M)\times\mathscr{X}(S/M)\rightarrow\mathscr{X}(S/M)$, so that for any $X\in\mathscr{X}(M)$ and any $Y\in\mathscr{X}(S/M)$, $\nabla_{X}Y$ is defined as an element of $\mathscr{X}(S/M)$. Since $\mathscr{X}(M)=\mathscr{X}(M/M)\subset\mathscr{X}(S/M)$, $\nabla_{X}Y\in\mathscr{X}(S/M)$ is defined for any $X,Y\in\mathscr{X}(M)$, although it does not necessarily belong to $\mathscr{X}(M)$.

When $\nabla_{X}Y$ belongs to $\mathscr{X}(M)$ for every $X,Y\in\mathscr{X}(M)$, $M$ is said to be autoparallel w.r.t. $\nabla$, or $\nabla$-autoparallel, in $S$ (e.g., Sec. 8 in Chap. VII of [4]). In particular, $S$ itself is $\nabla$-autoparallel in $S$. An autoparallel curve is usually called a geodesic (or a pregeodesic when we wish to make clear that our interest lies only in the image of the curve), so that autoparallelity is a multi-dimensional extension of the notion of a geodesic. When $M$ is $\nabla$-autoparallel in $S$, $\nabla$ defines an affine connection on $M$. We denote this connection by $\nabla|_{M}$ when we wish to distinguish it from the original connection $\nabla$ on $S$. Autoparallelity is transitive in the sense that if $M$ is $\nabla$-autoparallel in $S$ and $N$ is $\nabla|_{M}$-autoparallel in $M$, then $N$ is $\nabla$-autoparallel in $S$.

Remark 2.1.

If $M$ is $\nabla$-autoparallel in $S$ and $N$ is a nonempty open subset of $M$, then $N$ is also $\nabla$-autoparallel in $S$, having the same dimension as $M$. To avoid this ambiguity, we restrict ourselves in this paper to maximal autoparallel submanifolds. In particular, the only autoparallel submanifold of $S$ having the same dimension as $S$ is considered to be $S$ itself. This restriction is merely for simplicity of description.

Remark 2.2.

A notion similar to, but different from, autoparallelity is total geodesicness. A submanifold $M$ is said to be totally geodesic w.r.t. $\nabla$, or $\nabla$-totally geodesic, in $S$ when for any point $p\in M$ and any tangent vector $X_{p}\in T_{p}(M)$ of $M$, the $\nabla$-geodesic passing through $p$ in direction $X_{p}$ lies in $M$. Autoparallelity obviously implies total geodesicness, but the converse does not hold in general unless $\nabla$ is torsion-free ([4], Theorem 8.4 in Chap. VII). We will revisit this topic in Remark 8.6.

A vector field $X\in\mathscr{X}(S)$ is said to be parallel w.r.t. $\nabla$, or $\nabla$-parallel, when $\forall Y\in\mathscr{X}(S),\,\nabla_{Y}X=0$. More generally, $X\in\mathscr{X}(S/M)$ (including the case $X\in\mathscr{X}(M)$) is said to be $\nabla$-parallel when $\forall Y\in\mathscr{X}(M),\,\nabla_{Y}X=0$.

When there exist on $S$ as many linearly independent $\nabla$-parallel vector fields as $\dim S$, we say that $(S,\nabla)$ is curvature-free. This condition is known to be equivalent to the vanishing of the curvature tensor of $\nabla$ on $S$. When $(S,\nabla)$ is curvature-free, the parallel transport $\Phi^{(\nabla)}_{p,q}:T_{p}(S)\rightarrow T_{q}(S)$ is defined for any two points $p,q\in S$, so that a vector field $X\in\mathscr{X}(S)$ is $\nabla$-parallel iff $\forall p,q\in S,\;\Phi^{(\nabla)}_{p,q}(X_{p})=X_{q}$.

The following two propositions are straightforward; in both, the curvature-freeness of $\nabla$ is essential.

Proposition 2.3.

For a submanifold $M$ of $S$, where $S$ is equipped with a curvature-free connection $\nabla$, and for $X\in\mathscr{X}(S/M)$ (including the case $X\in\mathscr{X}(M)$), the following conditions are equivalent.

  • (i) $X$ is $\nabla$-parallel.

  • (ii) There exists $\tilde{X}\in\mathscr{X}(S)$ such that $\tilde{X}$ is $\nabla$-parallel and $X=\tilde{X}|_{M}$.

Proposition 2.4.

For an $n$-dimensional submanifold $M$ of $S$, where $S$ is equipped with a curvature-free connection $\nabla$, the following conditions are equivalent.

  • (i) $M$ is $\nabla$-autoparallel in $S$.

  • (ii) $\forall p,q\in M,\;\Phi_{p,q}^{(\nabla)}(T_{p}(M))=T_{q}(M)$.

  • (iii) There exist $n$ linearly independent $\nabla$-parallel vector fields on $M$.

  • (iv) There exist $n$ linearly independent $\nabla$-parallel vector fields $\tilde{X}^{1},\ldots,\tilde{X}^{n}$ on $S$ such that $\forall i$, $\tilde{X}^{i}|_{M}\in\mathscr{X}(M)$.

When these conditions hold, a vector field $X$ on $M$ is $\nabla|_{M}$-parallel iff it is $\nabla$-parallel, and $\nabla|_{M}$ is curvature-free.

In the following, we consider the case when $(S,\nabla)$ is additionally provided with a Riemannian metric $g$ for which the dual connection $\nabla^{*}$ of $\nabla$ is flat. Namely, the triple $(g,\nabla,\nabla^{*})$ satisfies the duality [5], [3]:

$$\forall X,Y,Z\in\mathscr{X}(S),\;Xg(Y,Z)=g(\nabla_{X}Y,Z)+g(Y,\nabla_{X}^{*}Z), \tag{2.1}$$

and $\nabla^{*}$ is flat in the sense that it is curvature-free and torsion-free. Flatness is known to be equivalent to the existence of a coordinate system $\xi=(\xi^{i})$, called an affine coordinate system w.r.t. $\nabla^{*}$, such that $\partial_{i}=\frac{\partial}{\partial\xi^{i}}$, $i\in\{1,\ldots,\dim S\}$, are all $\nabla^{*}$-parallel. In this case, $\nabla$ turns out to be curvature-free, since curvature-freeness is preserved by the duality of connections (Theorem 3.3 of [3]), but it is not necessarily torsion-free, and hence $(S,g,\nabla,\nabla^{*})$ is not necessarily dually flat.

Proposition 2.5.

In the above situation, we have:

  • (1) For a vector field $X\in\mathscr{X}(S)$, $X$ is $\nabla$-parallel iff $g(X,Y)$ is constant on $S$ for every $\nabla^{*}$-parallel vector field $Y\in\mathscr{X}(S)$.

  • (2) For a vector field $X\in\mathscr{X}(S)$, $X$ is $\nabla^{*}$-parallel iff $g(X,Y)$ is constant on $S$ for every $\nabla$-parallel vector field $Y\in\mathscr{X}(S)$.

Proof  The proposition relies not on the torsion-freeness of $\nabla^{*}$ but only on the curvature-freeness of $\nabla$ and $\nabla^{*}$; since the duality (2.1) is symmetric in $\nabla$ and $\nabla^{*}$, it suffices to show (1). For $X\in\mathscr{X}(S)$, we have

$$\begin{aligned}
X\text{ is }\nabla\text{-parallel}
&\;\Leftrightarrow\;\forall Y,\forall Z,\;g(\nabla_{Z}X,Y)=0\\
&\stackrel{\text{a}}{\Leftrightarrow}\;\forall Y:\nabla^{*}\text{-parallel},\;\forall Z,\;g(\nabla_{Z}X,Y)=0\\
&\stackrel{\text{b}}{\Leftrightarrow}\;\forall Y:\nabla^{*}\text{-parallel},\;\forall Z,\;Zg(X,Y)=0\\
&\;\Leftrightarrow\;\forall Y:\nabla^{*}\text{-parallel},\;g(X,Y)\text{ is constant},
\end{aligned}\tag{2.2}$$

where $\Leftarrow$ in $\stackrel{\text{a}}{\Leftrightarrow}$ follows since, by the curvature-freeness of $\nabla^{*}$, there exist $\dim S$ linearly independent $\nabla^{*}$-parallel vector fields, which span the tangent space at every point, and $\stackrel{\text{b}}{\Leftrightarrow}$ follows since the duality (2.1) and the $\nabla^{*}$-parallelity of $Y$ imply

$$Zg(X,Y)=g(\nabla_{Z}X,Y)+g(X,\nabla^{*}_{Z}Y)=g(\nabla_{Z}X,Y). \tag{2.3}$$

□

Suppose that $M$ is $\nabla$-autoparallel in $S$. Then $M$ is a Riemannian submanifold of $(S,g)$ equipped with the affine connection $\nabla|_{M}$. Hence the dual connection of $\nabla|_{M}$ w.r.t. $g$ (more precisely, w.r.t. the induced metric $g|_{M}$ on $M$) is defined, which we denote by $\nabla^{*}_{M}:=(\nabla|_{M})^{*}$; i.e.,

$$\forall X,Y,Z\in\mathscr{X}(M),\;Xg(Y,Z)=g(\nabla_{X}Y,Z)+g(Y,(\nabla^{*}_{M})_{X}Z), \tag{2.4}$$

where we have used $(\nabla|_{M})_{X}Y=\nabla_{X}Y$. From (2.1) and (2.4), we have

$$\forall X,Y,Z\in\mathscr{X}(M),\;g(Y,(\nabla^{*}_{M})_{X}Z)=g(Y,\nabla^{*}_{X}Z), \tag{2.5}$$

which means that $(\nabla^{*}_{M})_{X}Z$ is the $g$-projection of $\nabla^{*}_{X}Z\in\mathscr{X}(S/M)$ onto $\mathscr{X}(M)$.

$$\begin{CD}
\nabla @>{\text{restriction}}>> \nabla|_{M}\\
@V{g\text{-dual}}VV @VV{g|_{M}\text{-dual}}V\\
\nabla^{*} @>{g\text{-projection}}>> \nabla_{M}^{*}
\end{CD}$$

Since $\nabla|_{M}$ is curvature-free (Prop. 2.4), and since curvature-freeness is preserved by the duality and torsion-freeness by the projection, $\nabla^{*}_{M}$ is flat, as is $\nabla^{*}$. This ensures the existence of a $\nabla^{*}_{M}$-affine coordinate system of $M$.

The following proposition will be of fundamental importance for later arguments.

Proposition 2.6.

For an $n$-dimensional submanifold $M$ of $S$ and a coordinate system $\xi$ of $M$, the following conditions are equivalent.

  • (i) $M$ is $\nabla$-autoparallel in $S$, and $\xi$ is a $\nabla^{*}_{M}$-affine coordinate system.

  • (ii) For every $i\in\{1,\ldots,n\}$, the vector field

    $$X^{i}:=\sum_{j}g^{ij}\partial_{j}\in\mathscr{X}(M) \tag{2.6}$$

    is $\nabla$-parallel, where $\partial_{i}:=\frac{\partial}{\partial\xi^{i}}$ and $[g^{ij}]:=[g_{ij}]^{-1}$ with $g_{ij}:=g(\partial_{i},\partial_{j})$.

Proof  We first show (i) $\Rightarrow$ (ii). Assume (i) and define $X^{i}\in\mathscr{X}(M)$ by (2.6). Then $g(X^{i},\partial_{j})=\delta^{i}_{j}$ for every $i,j$, which is constant on $M$. Noting that the $n\,(=\dim M)$ vector fields $\{\partial_{j}\}$ are $\nabla^{*}_{M}$-parallel and that Prop. 2.5 can be applied to $(M,g|_{M},\nabla|_{M},\nabla^{*}_{M})$ owing to the $\nabla$-autoparallelity of $M$, it follows from item (1) of Prop. 2.5 that $X^{i}$ is $\nabla|_{M}$-parallel, and hence it is $\nabla$-parallel.

We next show (ii) $\Rightarrow$ (i). Assume (ii). Then, according to Prop. 2.4, $M$ is $\nabla$-autoparallel in $S$. Noting that $g(X^{i},\partial_{j})$ is constant on $M$ and applying item (2) of Prop. 2.5 to $(M,g|_{M},\nabla|_{M},\nabla^{*}_{M})$, we see that the $\{\partial_{j}\}$ are $\nabla^{*}_{M}$-parallel, which means that $\xi$ is $\nabla^{*}_{M}$-affine. □
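Proposition 2.6 can be illustrated in the dually flat classical setting of Section 1, taking $S=\mathcal{P}$, $\nabla=\nabla^{(\mathrm{e})}$, $\nabla^{*}=\nabla^{(\mathrm{m})}$, $M$ an exponential family and $\xi$ its expectation parameter. The sketch below relies on a fact we recall from classical information geometry (e.g. [3]) rather than from this section: a vector field $X$ along $M$ is e-parallel iff $X\log p=F-E_p(F)$ for a fixed function $F$. For a hypothetical three-point family $p\propto\exp(\theta F)$ with $F(\omega)=\omega$, it works directly in the coordinate $\xi$ (inverting $\theta\mapsto\xi$ by bisection), computes $g_{11}$ and $\partial_\xi\log p_\xi$ by central differences, and checks that $X^{1}=g^{11}\partial_\xi$ from (2.6) satisfies $X^{1}\log p_\xi=F-\xi$, as condition (ii) predicts. The family, the bracket for the bisection and the step size are illustrative choices.

```python
import math

# Hypothetical one-parameter exponential family on {0, 1, 2}:
# p(w) proportional to exp(theta * F(w)) with F(w) = w.
F = [0.0, 1.0, 2.0]


def p_theta(theta):
    weights = [math.exp(theta * f) for f in F]
    z = sum(weights)
    return [x / z for x in weights]


def xi_of_theta(theta):
    return sum(pk * f for pk, f in zip(p_theta(theta), F))


def theta_of_xi(xi, lo=-50.0, hi=50.0, iters=200):
    """Invert the increasing map theta -> xi by bisection (requires 0 < xi < 2)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if xi_of_theta(mid) < xi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_xi(xi):
    """The model expressed in the expectation coordinate xi."""
    return p_theta(theta_of_xi(xi))


def score_xi(xi, h=1e-5):
    """d/dxi log p_xi(w) for each w, by central differences."""
    up, down = p_xi(xi + h), p_xi(xi - h)
    return [(math.log(a) - math.log(b)) / (2 * h) for a, b in zip(up, down)]


def g11(xi):
    """Fisher metric component g_11 = E[(d/dxi log p)^2] in the coordinate xi."""
    return sum(pk * s * s for pk, s in zip(p_xi(xi), score_xi(xi)))


def log_rep_X1(xi):
    """X^1 log p_xi with X^1 = g^{11} d/dxi, cf. (2.6)."""
    return [s / g11(xi) for s in score_xi(xi)]


for xi in (0.3, 0.8, 1.5):
    print(xi, [round(v, 6) for v in log_rep_X1(xi)], [f - xi for f in F])
```

The two printed lists agree for each $\xi$, i.e. $X^{1}\log p_\xi$ has the form $F-E(F)$ with $F$ independent of the point.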

3 Information geometric structures on quantum statistical manifolds

In this section, we introduce the information-geometric structures on quantum statistical manifolds and apply the results of the previous section to them. The geometric structure treated here is essentially the same as the one studied in §7.3 of [3].

Let $\mathcal{L}=\mathcal{L}(\mathcal{H})$, $\mathcal{L}_{\mathrm{h}}=\mathcal{L}_{\mathrm{h}}(\mathcal{H})$ and $\mathcal{S}=\mathcal{S}(\mathcal{H})$ be the totality of linear operators on $\mathcal{H}$, that of Hermitian operators on $\mathcal{H}$ and that of strictly positive density operators on $\mathcal{H}$, respectively. Then $\mathcal{S}$ is an open subset of the affine space $\mathcal{L}_{\mathrm{h},1}:=\{A\in\mathcal{L}_{\mathrm{h}}\,|\,\mathrm{Tr}A=1\}$, so that a flat affine connection is naturally introduced on $\mathcal{S}$, which we call the m-connection and denote by $\nabla^{(\mathrm{m})}$. In order to express $\nabla^{(\mathrm{m})}$ more explicitly, we introduce the embedding map $\iota:\mathcal{S}\rightarrow\mathcal{L}_{\mathrm{h},1}$, so that $\rho\in\mathcal{S}$ is denoted by $\iota(\rho)$ when treated as an element of $\mathcal{L}_{\mathrm{h},1}$. Since $\iota$ is a smooth map, it has a differential at every point $\rho\in\mathcal{S}$, which we denote by $\iota_{*}=(d\iota)_{\rho}:T_{\rho}(\mathcal{S})\rightarrow\mathcal{L}_{\mathrm{h},0}:=\{A\in\mathcal{L}_{\mathrm{h}}\,|\,\mathrm{Tr}A=0\}$. For a vector field $X\in\mathscr{X}(\mathcal{S})$, the map $\iota_{*}(X):\mathcal{S}\rightarrow\mathcal{L}_{\mathrm{h},0}$ is defined to be $\rho\mapsto\iota_{*}(X_{\rho})=(d\iota)_{\rho}(X_{\rho})$. Then the definition of the m-connection is represented as follows:

$$\forall X,Y\in\mathscr{X}(\mathcal{S}),\;\iota_{*}(\nabla^{(\mathrm{m})}_{X}Y)=X\iota_{*}(Y), \tag{3.1}$$

where $X\iota_{*}(Y):\mathcal{S}\rightarrow\mathcal{L}_{\mathrm{h},0}$ is the derivative of $\iota_{*}(Y)$ w.r.t. $X$. When a coordinate system $\xi=(\xi^{i})$ is arbitrarily given and the elements of $\mathcal{S}$ are parametrized by it as $\rho_{\xi}$, we have

$$\iota_{*}\left(\partial_{i}\right)=\partial_{i}\rho_{\xi}, \tag{3.2}$$

and

$$\iota_{*}\left(\nabla^{(\mathrm{m})}_{\partial_{i}}\partial_{j}\right)=\partial_{i}\,\iota_{*}\left(\partial_{j}\right)=\partial_{i}\partial_{j}\rho_{\xi}, \tag{3.3}$$

where $\partial_{i}:=\frac{\partial}{\partial\xi^{i}}$.
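By (3.3), a curve $t\mapsto\rho_t$ in $\mathcal{S}$ is an m-geodesic (with $t$ as its parameter) exactly when $\iota_*(\nabla^{(\mathrm{m})}_{\partial_t}\partial_t)=\partial_t^2\rho_t$ vanishes, i.e. when $\rho_t$ is affine in $t$. The sketch below checks this with a second-order finite difference for the mixture segment between two randomly generated states, and contrasts it with the normalized exponential interpolation $\rho_t\propto\exp((1-t)\log\rho_0+t\log\rho_1)$, which is generally not an m-geodesic. The random states, the seed, the evaluation point and the step size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)


def herm_fn(h, fn):
    """Apply a scalar function to a Hermitian matrix through its eigenvalues."""
    w, u = np.linalg.eigh(h)
    return (u * fn(w)) @ u.conj().T


def random_state(dim):
    """A random strictly positive density operator."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    r = z @ z.conj().T + 0.1 * np.eye(dim)
    return r / np.trace(r).real


rho0, rho1 = random_state(3), random_state(3)


def mixture(t):
    return (1 - t) * rho0 + t * rho1


def exp_curve(t):
    # Normalized exponential interpolation between rho0 and rho1.
    h = (1 - t) * herm_fn(rho0, np.log) + t * herm_fn(rho1, np.log)
    r = herm_fn(h, np.exp)
    return r / np.trace(r).real


def second_derivative(curve, t, h=1e-3):
    """Central second difference approximating d^2 rho_t / dt^2, cf. (3.3)."""
    return (curve(t + h) - 2 * curve(t) + curve(t - h)) / h ** 2


mix_dd = np.abs(second_derivative(mixture, 0.4)).max()
exp_dd = np.abs(second_derivative(exp_curve, 0.4)).max()
print("mixture curve:    ", mix_dd)   # zero up to rounding: an m-geodesic
print("exponential curve:", exp_dd)   # clearly nonzero
```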

Suppose that we are given a family of inner products $\{\langle\cdot,\cdot\rangle_{\rho}\,|\,\rho\in\mathcal{S}(\mathcal{H})\}$ on the $\mathbb{R}$-linear space $\mathcal{L}_{\mathrm{h}}$, where the correspondence $\rho\mapsto\langle\cdot,\cdot\rangle_{\rho}$ is smooth, and assume that

$$\forall\rho\in\mathcal{S},\forall A\in\mathcal{L}_{\mathrm{h}},\;\langle A,I\rangle_{\rho}=\langle A\rangle_{\rho}:=\mathrm{Tr}(\rho A). \tag{3.4}$$

The inner products are represented as

$$\langle A,B\rangle_{\rho}=\langle A,\Omega_{\rho}(B)\rangle_{\mathrm{HS}}=\mathrm{Tr}(A\,\Omega_{\rho}(B)) \tag{3.5}$$

by a family of super-operators $\{\Omega_{\rho}:\mathcal{L}_{\mathrm{h}}\rightarrow\mathcal{L}_{\mathrm{h}}\}_{\rho\in\mathcal{S}}$, where $\langle\cdot,\cdot\rangle_{\mathrm{HS}}$ denotes the Hilbert-Schmidt inner product. Note that the assumption (3.4) is equivalent to

$$\forall\rho\in\mathcal{S},\;\Omega_{\rho}(I)=\rho. \tag{3.6}$$
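To make (3.4) to (3.6) concrete, the sketch below anticipates the monotone-metric form (3.21) given later in this section and computes $\Omega_\rho(A)=f(\Delta_\rho)(A)\rho$ in an eigenbasis $\{|k\rangle\}$ of $\rho$ with eigenvalues $p_k$, where $\Delta_\rho$ acts on the matrix unit $|k\rangle\langle l|$ as multiplication by $p_k/p_l$. It checks (3.6), (3.4), and the symmetry and reality of (3.5) for the SLD choice $f(x)=(x+1)/2$ and, as an extra example not discussed here, $f(x)=(x-1)/\log x$ (the function usually associated with the Bogoliubov-Kubo-Mori metric). The random state and observables are arbitrary test data, and the function names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)


def f_sld(x):
    return (x + 1) / 2


def f_bkm(x):
    """(x - 1) / log x, extended by continuity with f(1) = 1."""
    x = np.asarray(x, dtype=float)
    out = np.ones_like(x)
    mask = np.abs(x - 1.0) > 1e-12
    out[mask] = (x[mask] - 1) / np.log(x[mask])
    return out


def omega(rho, a, f):
    """Omega_rho(A) = f(Delta_rho)(A) rho, computed in an eigenbasis of rho."""
    p, u = np.linalg.eigh(rho)
    a_eig = u.conj().T @ a @ u
    ratio = p[:, None] / p[None, :]          # Delta_rho |k><l| = (p_k/p_l) |k><l|
    out_eig = f(ratio) * a_eig * p[None, :]  # entries f(p_k/p_l) A_kl p_l
    return u @ out_eig @ u.conj().T


def inner(rho, a, b, f):
    """<A, B>_rho = Tr(A Omega_rho(B)) as in (3.5)."""
    return np.trace(a @ omega(rho, b, f))


def random_hermitian(dim):
    m = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (m + m.conj().T) / 2


z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = z @ z.conj().T + 0.1 * np.eye(3)
rho /= np.trace(rho).real
a, b = random_hermitian(3), random_hermitian(3)

for name, f in (("SLD", f_sld), ("BKM", f_bkm)):
    print(name,
          np.allclose(omega(rho, np.eye(3), f), rho),                  # (3.6)
          np.isclose(inner(rho, a, np.eye(3), f), np.trace(rho @ a)),  # (3.4)
          np.isclose(inner(rho, a, b, f), inner(rho, b, a, f)))        # symmetry
```

For the SLD choice, `omega` reduces to $\frac{1}{2}(\rho A+A\rho)$, the form used in (3.22).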

For an arbitrary tangent vector $X_{\rho}\in T_{\rho}(\mathcal{S})$, a Hermitian operator $L_{X_{\rho}}\in\mathcal{L}_{\mathrm{h}}$ is defined by the relation

$$\forall A\in\mathcal{L}_{\mathrm{h}},\;X_{\rho}\langle A\rangle=\langle L_{X_{\rho}},A\rangle_{\rho}, \tag{3.7}$$

where the LHS denotes the derivative w.r.t. $X_{\rho}$ of the function $\langle A\rangle:\mathcal{S}\rightarrow\mathbb{R}$, $\rho\mapsto\langle A\rangle_{\rho}$. Noting that the LHS and the RHS are represented as $\mathrm{Tr}(\iota_{*}(X_{\rho})A)$ and $\mathrm{Tr}(\Omega_{\rho}(L_{X_{\rho}})\,A)$, respectively, we can rewrite (3.7) as

$$\iota_{*}(X_{\rho})=\Omega_{\rho}(L_{X_{\rho}}). \tag{3.8}$$

From (3.4) and (3.7), we have

$$\langle L_{X_{\rho}}\rangle_{\rho}=\langle L_{X_{\rho}},I\rangle_{\rho}=X_{\rho}\langle I\rangle=X_{\rho}1=0. \tag{3.9}$$

Since $X_{\rho}\leftrightarrow\iota_{*}(X_{\rho})\leftrightarrow L_{X_{\rho}}$ are one-to-one linear correspondences, the set of all $L_{X_{\rho}}$ has dimension $\dim\mathcal{S}=\dim\mathcal{L}_{\mathrm{h}}-1$; combining this with (3.9), we obtain the following identity:

$$\{L_{X_{\rho}}\,|\,X_{\rho}\in T_{\rho}(\mathcal{S})\}=\{A\in\mathcal{L}_{\mathrm{h}}\,|\,\langle A\rangle_{\rho}=0\}. \tag{3.10}$$
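For the SLD choice $\Omega_\rho(A)=\frac{1}{2}(\rho A+A\rho)$, i.e. $f(x)=(x+1)/2$ in (3.21) below, equation (3.8) can be solved for $L_{X_\rho}$ entrywise in an eigenbasis of $\rho$: $(L_{X_\rho})_{kl}=2\,(\iota_*(X_\rho))_{kl}/(p_k+p_l)$. The sketch below does this for a randomly generated state and a random traceless Hermitian $\iota_*(X_\rho)$, and then checks (3.7) against a random observable $A$, using $X_\rho\langle A\rangle=\mathrm{Tr}(\iota_*(X_\rho)A)$ and the SLD inner product $\mathrm{Re}\,\mathrm{Tr}(\rho LA)$ from (3.22), as well as the centering property (3.9). The function names `sld_omega` and `e_rep` and the test data are our own.

```python
import numpy as np

rng = np.random.default_rng(2)


def random_hermitian(dim):
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (z + z.conj().T) / 2


def random_state(dim):
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    r = z @ z.conj().T + 0.1 * np.eye(dim)
    return r / np.trace(r).real


def sld_omega(rho, a):
    """SLD choice of the super-operator: Omega_rho(A) = (rho A + A rho)/2."""
    return (rho @ a + a @ rho) / 2


def e_rep(rho, d):
    """Solve Omega_rho(L) = d for L, i.e. (3.8), in an eigenbasis of rho."""
    p, u = np.linalg.eigh(rho)
    d_eig = u.conj().T @ d @ u
    return u @ (2 * d_eig / (p[:, None] + p[None, :])) @ u.conj().T


dim = 3
rho = random_state(dim)
d = random_hermitian(dim)
d = d - np.trace(d).real / dim * np.eye(dim)  # iota_*(X_rho) is traceless Hermitian
L = e_rep(rho, d)                             # L_{X_rho}

a = random_hermitian(dim)
lhs = np.trace(d @ a).real                    # X_rho <A> = Tr(iota_*(X_rho) A)
rhs = np.trace(rho @ L @ a).real              # <L, A>_rho = Re Tr(rho L A) for the SLD
print("(3.7):", lhs, rhs)
print("(3.8):", np.allclose(sld_omega(rho, L), d))
print("(3.9):", np.trace(rho @ L).real)       # vanishes up to rounding
```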

In the following, we often express (3.4) as

$$\forall A\in\mathcal{L}_{\mathrm{h}},\;\langle A,I\rangle=\langle A\rangle \tag{3.11}$$

as an identity for functions on $\mathcal{S}$. Similarly, (3.7) is expressed as

$$\forall A\in\mathcal{L}_{\mathrm{h}},\;X\langle A\rangle=\langle L_{X},A\rangle, \tag{3.12}$$

where $X$ is a vector field on $\mathcal{S}$, $L_{X}$ denotes the map $\mathcal{S}\rightarrow\mathcal{L}_{\mathrm{h}}$, $\rho\mapsto L_{X_{\rho}}$, and $\langle L_{X},A\rangle$ denotes the function $\rho\mapsto\langle L_{X_{\rho}},A\rangle_{\rho}$.

For a submanifold $\mathcal{M}$ of $\mathcal{S}$ and a vector field $X\in\mathscr{X}(\mathcal{M})$ on it, the map $L_{X}:\mathcal{M}\rightarrow\mathcal{L}_{\mathrm{h}},\;\rho\mapsto L_{X_{\rho}}$ is defined, for which (3.12) holds as an identity for functions on $\mathcal{M}$. In this case we may write $X\langle A\rangle$ as $X\langle A\rangle|_{\mathcal{M}}$. In particular, given a coordinate system $\xi=(\xi^{i})$ of $\mathcal{M}$, we have

$$\forall A\in\mathcal{L}_{\mathrm{h}},\;\partial_{i}\langle A\rangle=\partial_{i}\langle A\rangle|_{\mathcal{M}}=\langle L_{i},A\rangle, \tag{3.13}$$

where $\partial_{i}:=\frac{\partial}{\partial\xi^{i}}$ and $L_{i}:=L_{\partial_{i}}$.
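Identity (3.13) can be checked by finite differences on a concrete model. The sketch below assumes the Bloch-vector parametrization $\rho_\xi=\frac{1}{2}(I+\xi^{1}\sigma_x+\xi^{2}\sigma_y+\xi^{3}\sigma_z)$ of qubit states with $|\xi|<1$, for which $\partial_i\rho_\xi=\sigma_i/2$, and the SLD choice of $\Omega_\rho$, so that $\langle L,A\rangle_\rho=\mathrm{Re}\,\mathrm{Tr}(\rho LA)$ as in (3.22). It compares a central-difference approximation of $\partial_i\langle A\rangle$ with $\langle L_i,A\rangle$. The parametrization, the point `xi` and the observable `A` are illustrative choices and need not coincide with the parametrization used in Section 9.

```python
import numpy as np

# Pauli matrices sigma_x, sigma_y, sigma_z.
SIGMA = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]


def rho_of(xi):
    """Bloch-vector parametrization of a qubit state (assumes |xi| < 1)."""
    return (np.eye(2) + sum(x * s for x, s in zip(xi, SIGMA))) / 2


def e_rep(rho, d):
    """SLD e-representation: solve (rho L + L rho)/2 = d in an eigenbasis of rho."""
    p, u = np.linalg.eigh(rho)
    d_eig = u.conj().T @ d @ u
    return u @ (2 * d_eig / (p[:, None] + p[None, :])) @ u.conj().T


xi = np.array([0.3, -0.2, 0.5])
rho = rho_of(xi)
A = np.array([[0.7, 0.2 - 0.4j], [0.2 + 0.4j, -1.1]])  # an arbitrary Hermitian observable
h = 1e-6

results = []
for i in range(3):
    step = np.zeros(3)
    step[i] = h
    # Left-hand side of (3.13): derivative of <A> = Tr(rho A) along xi^i.
    lhs = (np.trace(rho_of(xi + step) @ A) - np.trace(rho_of(xi - step) @ A)).real / (2 * h)
    # Right-hand side: <L_i, A>_rho, with L_i the e-representation of d_i rho = sigma_i / 2.
    L_i = e_rep(rho, SIGMA[i] / 2)
    rhs = np.trace(rho @ L_i @ A).real
    results.append((lhs, rhs))
    print(f"i={i + 1}: finite difference {lhs:.8f}, <L_i, A> {rhs:.8f}")
```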

Remark 3.1.

In the terminology of [3], $\iota_{*}(X_{\rho})$ and $L_{X_{\rho}}$ are called the m-representation and the e-representation of $X_{\rho}$, and are denoted by $\iota_{*}(X_{\rho})=X_{\rho}^{(\mathrm{m})}$ and $L_{X_{\rho}}=X_{\rho}^{(\mathrm{e})}$, respectively. We do not use the symbols $X_{\rho}^{(\mathrm{m})}$ and $X_{\rho}^{(\mathrm{e})}$ in this paper, but may use the following notation when convenient:

$$T^{(\mathrm{m})}_{\rho}(\mathcal{M}):=\{\iota_{*}(X_{\rho})\,|\,X_{\rho}\in T_{\rho}(\mathcal{M})\}, \tag{3.14}$$

$$T^{(\mathrm{e})}_{\rho}(\mathcal{M}):=\{L_{X_{\rho}}\,|\,X_{\rho}\in T_{\rho}(\mathcal{M})\}. \tag{3.15}$$

Note that

$$T^{(\mathrm{m})}_{\rho}(\mathcal{S})=\mathcal{L}_{\mathrm{h},0}\quad\text{and}\quad T^{(\mathrm{e})}_{\rho}(\mathcal{S})=\{A\in\mathcal{L}_{\mathrm{h}}\,|\,\langle A\rangle_{\rho}=0\}. \tag{3.16}$$

Now, we define a Riemannian metric gg on 𝒮{\mathcal{S}} by

ρ𝒮,Xρ,YρTρ(𝒮),gρ(Xρ,Yρ):=LXρ,LYρρ=Tr(ι(Xρ)LYρ),\forall\rho\in{\mathcal{S}},\forall X_{\rho},Y_{\rho}\in T_{\rho}({\mathcal{S}}),\;g_{\rho}(X_{\rho},Y_{\rho}):=\left\langle L_{X_{\rho}},L_{Y_{\rho}}\right\rangle_{\rho}={\rm Tr}\left(\iota_{*}(X_{\rho})L_{Y_{\rho}}\right), (3.17)

which is equivalently written as

$$\forall X,Y\in\mathscr{X}(\mathcal{S}),\quad g(X,Y):=\langle L_X,L_Y\rangle={\rm Tr}\bigl(\iota_*(X)L_Y\bigr). \tag{3.18}$$

We also have the expression

$$\forall X,Y\in\mathscr{X}(\mathcal{S}),\quad g(X,Y)=-\langle XL_Y\rangle=-\langle YL_X\rangle, \tag{3.19}$$

where $XL_Y:\mathcal{S}\to\mathcal{L}_{\rm h}$, $\rho\mapsto X_\rho L_Y$, denotes the derivative of the map $L_Y:\mathcal{S}\to\mathcal{L}_{\rm h}$, $\rho\mapsto L_{Y_\rho}$, w.r.t. $X$, and $\langle XL_Y\rangle$ denotes the function $\rho\mapsto\langle X_\rho L_Y\rangle_\rho$. The first equality in (3.19) is derived as follows, and the second then follows from the symmetry of $g$:

$$\begin{aligned}0=X_\rho\langle L_Y\rangle=X_\rho\langle L_{Y_{\bm\cdot}}\rangle_{\bm\cdot}&=X_\rho\langle L_{Y_\rho}\rangle_{\bm\cdot}+X_\rho\langle L_{Y_{\bm\cdot}}\rangle_\rho\\&=\langle L_{X_\rho},L_{Y_\rho}\rangle_\rho+\langle X_\rho L_Y\rangle_\rho,\end{aligned}\tag{3.20}$$

where the first equality follows from (3.9) and the last from (3.7), and the dots ${\bm\cdot}$ indicate the positions of the variables of the maps. An important class of such Riemannian metrics is that of monotone metrics [9], for which $\Omega_\rho$ is represented as

$$\Omega_\rho(A)=f(\Delta_\rho)(A\rho)=f(\Delta_\rho)(A)\,\rho, \tag{3.21}$$

where $f:(0,\infty)\to(0,\infty)$ is an operator monotone function satisfying $xf(1/x)=f(x)$ for all $x>0$ and $f(1)=1$, and $\Delta_\rho:\mathcal{L}\to\mathcal{L}$ is the modular operator defined by $\Delta_\rho(A)=\rho A\rho^{-1}$. The class contains the SLD metric, which plays the main role in this paper. It is defined by $f(x)=(x+1)/2$, so that

$$\langle A,B\rangle_\rho={\rm Re}\,{\rm Tr}(\rho AB)={\rm Tr}\bigl(\rho(A\circ B)\bigr)={\rm Tr}\bigl((\rho\circ A)B\bigr),\qquad A,B\in\mathcal{L}_{\rm h}, \tag{3.22}$$

where $\circ$ denotes the symmetrized product $A\circ B=\frac12(AB+BA)$. In this case, $L_{X_\rho}$ (resp. $L_X$) is called the SLD (symmetric logarithmic derivative) of the tangent vector $X_\rho$ (resp. the vector field $X$). In particular, $L_i:=L_{\partial_i}$ satisfies

$$\partial_i\rho_\xi=\rho_\xi\circ L_{i,\xi}, \tag{3.23}$$

which is the commonly used defining equation of the SLD.
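Since $\rho>0$, equation (3.23) has a unique solution for every tangent vector, and it decouples entrywise in the eigenbasis of $\rho$: $L_{ij}=2(\partial\rho)_{ij}/(\lambda_i+\lambda_j)$. The NumPy sketch below solves it this way and checks (3.23), Hermiticity, and $\langle L\rangle_\rho=0$. The helper names `sld` and `jordan` and the random test data are our own illustrative assumptions, not part of the paper.

```python
import numpy as np

def sld(rho, drho):
    """Solve (rho @ L + L @ rho) / 2 = drho for L in the eigenbasis of rho."""
    lam, U = np.linalg.eigh(rho)
    d = U.conj().T @ drho @ U
    L = 2.0 * d / (lam[:, None] + lam[None, :])   # lam_i + lam_j > 0 because rho > 0
    return U @ L @ U.conj().T

def jordan(A, B):
    """Symmetrized product A o B = (AB + BA) / 2."""
    return (A @ B + B @ A) / 2

rng = np.random.default_rng(0)
dim = 3
M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
rho = M @ M.conj().T + 0.1 * np.eye(dim)
rho /= np.trace(rho).real                         # strictly positive density operator
H = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
drho = (H + H.conj().T) / 2
drho -= np.trace(drho) / dim * np.eye(dim)        # tangent vector: Hermitian and traceless

L = sld(rho, drho)
print(np.allclose(jordan(rho, L), drho))          # (3.23): d rho = rho o L
print(np.allclose(L, L.conj().T))                 # the SLD is Hermitian
print(abs(np.trace(rho @ L)))                     # <L>_rho = Tr(rho L), zero up to rounding
```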

Given a family of inner products $\{\langle\cdot,\cdot\rangle_\rho\mid\rho\in\mathcal{S}\}$ that determines a Riemannian metric $g$, let the e-connection $\nabla^{(\rm e)}$ be defined as the dual connection of $\nabla^{(\rm m)}$ w.r.t. $g$; i.e.,

$$\forall X,Y,Z\in\mathscr{X}(\mathcal{S}),\quad Xg(Y,Z)=g(\nabla^{(\rm e)}_XY,Z)+g(Y,\nabla^{(\rm m)}_XZ). \tag{3.24}$$

We have thus obtained $(\mathcal{S},g,\nabla^{(\rm e)},\nabla^{(\rm m)})$ as an example of the $(S,g,\nabla,\nabla^*)$ treated in the previous section, where $\nabla^{(\rm e)}$ and $\nabla^{(\rm m)}$ are dual w.r.t. $g$, $\nabla^{(\rm e)}$ is curvature-free and $\nabla^{(\rm m)}$ is flat. The triple $(g,\nabla^{(\rm e)},\nabla^{(\rm m)})$ on $\mathcal{S}$ is called the information-geometric structure induced from the family of inner products $\{\langle\cdot,\cdot\rangle_\rho\mid\rho\in\mathcal{S}\}$. In particular, the information-geometric structure induced from the symmetrized inner product (3.22) is called the SLD structure. The SLD structure will play the leading role in the subsequent sections in relation to estimation theory, but the rest of this section continues the discussion of general information-geometric structures.

Remark 3.2.

As shown in Theorem 7.1 of [3], among the information-geometric structures defined in the manner described above there is only one for which the e-connection is torsion-free (so that $(\mathcal{S},g,\nabla^{(\rm e)},\nabla^{(\rm m)})$ is dually flat): the structure induced from the BKM (Bogoliubov-Kubo-Mori) inner product

$$\langle A,B\rangle_\rho=\int_0^1{\rm Tr}\bigl(\rho^sA\rho^{1-s}B\bigr)\,ds,\qquad A,B\in\mathcal{L}_{\rm h}. \tag{3.25}$$

The induced Riemannian metric is the monotone metric corresponding to $f(x)=\frac{x-1}{\log x}=\int_0^1x^s\,ds$. In all other cases, the torsion $\mathcal{T}^{(\rm e)}(X,Y)=\nabla^{(\rm e)}_XY-\nabla^{(\rm e)}_YX-[X,Y]$ does not vanish, where $[\cdot,\cdot]$ denotes the Lie bracket of vector fields.
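In the eigenbasis of $\rho$, where $\Delta_\rho(|i\rangle\langle j|)=(\lambda_i/\lambda_j)|i\rangle\langle j|$, (3.21) gives $\Omega_\rho(A)_{ij}=\lambda_jf(\lambda_i/\lambda_j)A_{ij}$. The NumPy sketch below evaluates $\langle A,B\rangle_\rho={\rm Tr}(\Omega_\rho(A)B)$ for a given $f$ and checks two cases on random data: the SLD choice reproduces ${\rm Re}\,{\rm Tr}(\rho AB)$ of (3.22), and the BKM choice reproduces the integral (3.25), evaluated independently by Gauss-Legendre quadrature. The function names are hypothetical, and we assume $\langle A,B\rangle_\rho={\rm Tr}(\Omega_\rho(A)B)$ as the link between (3.21) and the inner product.

```python
import numpy as np

def monotone_inner(rho, A, B, f):
    """<A,B>_rho = Tr(Omega_rho(A) B) with Omega_rho(A)_ij = lam_j f(lam_i/lam_j) A_ij (eigenbasis)."""
    lam, U = np.linalg.eigh(rho)
    a = U.conj().T @ A @ U
    b = U.conj().T @ B @ U
    li, lj = lam[:, None], lam[None, :]
    omega = lj * f(li / lj) * a
    return np.sum(omega * b.T).real              # sum_ij Omega_ij B_ji = Tr(Omega B)

def f_sld(x):
    return (x + 1) / 2

def f_bkm(x):
    t = np.asarray(x, dtype=float) - 1.0
    safe = np.where(t == 0.0, 1.0, t)            # dummy value avoids 0/0 at x = 1
    return np.where(t == 0.0, 1.0, safe / np.log1p(safe))   # (x - 1)/log x, f(1) = 1

def bkm_by_quadrature(rho, A, B, nodes=40):
    """Direct evaluation of the integral (3.25) by Gauss-Legendre quadrature on [0, 1]."""
    lam, U = np.linalg.eigh(rho)
    s, w = np.polynomial.legendre.leggauss(nodes)
    s, w = (s + 1) / 2, w / 2                    # map nodes and weights from [-1, 1] to [0, 1]
    total = 0.0
    for sk, wk in zip(s, w):
        rho_s = U @ np.diag(lam ** sk) @ U.conj().T
        rho_1s = U @ np.diag(lam ** (1 - sk)) @ U.conj().T
        total += wk * np.trace(rho_s @ A @ rho_1s @ B)
    return total.real

rng = np.random.default_rng(0)
dim = 3
M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
rho = M @ M.conj().T + 0.1 * np.eye(dim)
rho /= np.trace(rho).real

def rand_herm():
    H = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (H + H.conj().T) / 2

A, B = rand_herm(), rand_herm()
print(monotone_inner(rho, A, B, f_sld), np.trace(rho @ A @ B).real)       # SLD case, (3.22)
print(monotone_inner(rho, A, B, f_bkm), bkm_by_quadrature(rho, A, B))     # BKM case, (3.25)
```

Using `log1p` keeps the logarithmic mean accurate when two eigenvalues of $\rho$ are nearly equal.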

Remark 3.3.

For the SLD structure, it is known ([3], Eq. (7.80)) that the torsion has the following representation: for each point $\rho\in\mathcal{S}$ and tangent vectors $X_\rho,Y_\rho\in T_\rho(\mathcal{S})$, we have

$$\iota_*\bigl(\mathcal{T}^{(\rm e)}(X_\rho,Y_\rho)\bigr)=\frac14\bigl[[L_{X_\rho},L_{Y_\rho}],\rho\bigr], \tag{3.26}$$

where $[\cdot,\cdot]$ on the right-hand side denotes the commutator of operators on $\mathcal{H}$. Since this representation will be used in Sections 8 and 9, we give its proof in Appendix A1 for the reader's convenience.
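Formula (3.26) can be evaluated directly once the two SLDs are known. The NumPy sketch below (helper names and random data are illustrative assumptions) computes the right-hand side and checks properties that follow from the formula: the result is Hermitian and traceless (so it lies in $T^{(\rm m)}_\rho(\mathcal{S})=\mathcal{L}_{{\rm h},0}$), it is antisymmetric in $X,Y$, and it vanishes for a "classical" configuration in which $\rho$ and both tangent vectors are diagonal in the same basis, so that the SLDs commute.

```python
import numpy as np

def sld(rho, drho):
    """Solve (rho @ L + L @ rho) / 2 = drho for the SLD L."""
    lam, U = np.linalg.eigh(rho)
    d = U.conj().T @ drho @ U
    return U @ (2.0 * d / (lam[:, None] + lam[None, :])) @ U.conj().T

def torsion_m_rep(rho, d1, d2):
    """Right-hand side of (3.26); d1, d2 are iota_*(X_rho), iota_*(Y_rho)."""
    L1, L2 = sld(rho, d1), sld(rho, d2)
    K = L1 @ L2 - L2 @ L1                        # [L_X, L_Y], an anti-Hermitian operator
    return (K @ rho - rho @ K) / 4

rng = np.random.default_rng(2)
dim = 3

def rand_traceless_herm():
    H = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    H = (H + H.conj().T) / 2
    return H - np.trace(H) / dim * np.eye(dim)

M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
rho = M @ M.conj().T + 0.1 * np.eye(dim)
rho /= np.trace(rho).real
d1, d2 = rand_traceless_herm(), rand_traceless_herm()
T = torsion_m_rep(rho, d1, d2)

# Commuting ("classical") configuration: everything diagonal in one basis.
rho_c = np.diag([0.5, 0.3, 0.2])
T_c = torsion_m_rep(rho_c, np.diag([1.0, -1.0, 0.0]), np.diag([0.0, 1.0, -1.0]))
print(np.allclose(T, T.conj().T), abs(np.trace(T)), np.abs(T_c).max())
```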

Henceforth, we use the prefixes e- and m- for notions concerning the e-connection and m-connection; e.g., e-parallel, e-autoparallel, m-affine, etc.

Proposition 3.4.

For any vector fields $X,Y,W\in\mathscr{X}(\mathcal{S})$, we have

$$W=\nabla^{(\rm e)}_XY\iff L_W=XL_Y-\langle XL_Y\rangle=XL_Y+g(X,Y). \tag{3.27}$$

Proof. Differentiating $g(Y,Z)={\rm Tr}(L_Y\,\iota_*(Z))$ (see (3.18)) by $X$, we have

$$\begin{aligned}Xg(Y,Z)&={\rm Tr}\bigl((XL_Y)\,\iota_*(Z)\bigr)+{\rm Tr}\bigl(L_Y\,(X\iota_*(Z))\bigr)\\&={\rm Tr}\bigl((XL_Y)\,\iota_*(Z)\bigr)+g(Y,\nabla^{(\rm m)}_XZ),\end{aligned}$$

where the second equality follows from (3.1). Let $W'\in\mathscr{X}(\mathcal{S})$ be defined by $L_{W'}=XL_Y-\langle XL_Y\rangle$, whose existence is ensured by (3.10). Since ${\rm Tr}\,\iota_*(Z)=0$, we have ${\rm Tr}((XL_Y)\,\iota_*(Z))={\rm Tr}(L_{W'}\,\iota_*(Z))=g(W',Z)$, so the above equation reads

$$Xg(Y,Z)=g(W',Z)+g(Y,\nabla^{(\rm m)}_XZ).$$

Since $Z$ is arbitrary, comparison with (3.24) shows that $W'=\nabla^{(\rm e)}_XY$, and the second equality in (3.27) is (3.19). This proves the proposition. □
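The identity $g(X,Y)=-\langle XL_Y\rangle$ of (3.19), which gives the second equality in (3.27), can be checked by finite differences. In the sketch below (NumPy; all names and data are illustrative assumptions), $X$ and $Y$ are m-parallel fields with constant m-representations $D$ and $B$, so that $L_{Y_\sigma}$ is the SLD of $B$ at $\sigma$. We differentiate $\sigma\mapsto L_{Y_\sigma}$ along the line $\rho+tD$ and compare $-{\rm Tr}(\rho\,XL_Y)$ with $g(X,Y)={\rm Tr}(D\,L_{Y_\rho})$.

```python
import numpy as np

def sld(rho, drho):
    """Solve (rho @ L + L @ rho) / 2 = drho for the SLD L."""
    lam, U = np.linalg.eigh(rho)
    d = U.conj().T @ drho @ U
    return U @ (2.0 * d / (lam[:, None] + lam[None, :])) @ U.conj().T

rng = np.random.default_rng(3)
dim = 3

def rand_traceless_herm():
    H = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    H = (H + H.conj().T) / 2
    return H - np.trace(H) / dim * np.eye(dim)

M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
rho = M @ M.conj().T + 2.0 * np.eye(dim)
rho /= np.trace(rho).real                     # well-conditioned interior point of S
D = 0.1 * rand_traceless_herm()               # iota_*(X), constant: X is m-parallel
B = 0.1 * rand_traceless_herm()               # iota_*(Y), constant: Y is m-parallel

h = 1e-5
# X L_Y at rho: central difference of sigma -> sld(sigma, B) along sigma = rho + t D.
XLY = (sld(rho + h * D, B) - sld(rho - h * D, B)) / (2 * h)
YLX = (sld(rho + h * B, D) - sld(rho - h * B, D)) / (2 * h)

g_XY = np.trace(D @ sld(rho, B)).real         # g(X, Y) = Tr(iota_*(X) L_Y), cf. (3.18)
print(g_XY, -np.trace(rho @ XLY).real, -np.trace(rho @ YLX).real)
```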

Proposition 3.5.

For a vector field $X\in\mathscr{X}(\mathcal{S})$, we have

$$\begin{aligned}X\text{ is e-parallel}&\iff\exists A\in\mathcal{L}_{\rm h},\ L_X=A-\langle A\rangle\\&\iff\exists A\in\mathcal{L}_{\rm h},\ \forall\rho\in\mathcal{S},\ L_{X_\rho}=A-\langle A\rangle_\rho.\end{aligned}\tag{3.28}$$

Proof. We could use Prop. 3.4 to prove this, but here we give an alternative proof. We first note that, according to (3.1), a vector field $Y\in\mathscr{X}(\mathcal{S})$ is m-parallel if and only if $\iota_*(Y):\mathcal{S}\to\mathcal{L}_{{\rm h},0}$ is a constant map, i.e., $\iota_*(Y)=B$ for some $B\in\mathcal{L}_{{\rm h},0}$. Invoking Prop. 2.5, we have

$$\begin{aligned}X\text{ is e-parallel}&\iff\forall Y\text{: m-parallel},\ g(X,Y)\text{ is constant on }\mathcal{S}\\&\iff\forall B\in\mathcal{L}_{{\rm h},0},\ {\rm Tr}(L_XB)\text{ is constant on }\mathcal{S}\\&\iff\exists A\in\mathcal{L}_{\rm h},\ L_X=A-\langle A\rangle.\end{aligned}$$

Here the second equivalence uses $g(X,Y)={\rm Tr}(L_X\,\iota_*(Y))={\rm Tr}(L_XB)$ (see (3.18)). The last one holds because ${\rm Tr}(L_XB)$ is constant for every $B\in\mathcal{L}_{{\rm h},0}$ iff $L_X=A+cI$ for some constant $A\in\mathcal{L}_{\rm h}$ and some function $c$ on $\mathcal{S}$, while $\langle L_X\rangle=0$ forces $c=-\langle A\rangle$. □

Recalling Prop. 2.3, we immediately obtain the following corollary.

Corollary 3.6.

For a submanifold $\mathcal{M}$ of $\mathcal{S}$ and for $X\in\mathscr{X}(\mathcal{S}/\mathcal{M})$ (including the case $X\in\mathscr{X}(\mathcal{M})$), we have

$$\begin{aligned}X\text{ is e-parallel}&\iff\exists A\in\mathcal{L}_{\rm h},\ L_X=A-\langle A\rangle|_{\mathcal{M}}\\&\iff\exists A\in\mathcal{L}_{\rm h},\ \forall\rho\in\mathcal{M},\ L_{X_\rho}=A-\langle A\rangle_\rho.\end{aligned}\tag{3.29}$$

The following corollary is also immediate from Prop. 3.5.

Corollary 3.7.

The e-parallel transport $\Phi^{(\rm e)}_{\rho,\sigma}:T_\rho(\mathcal{S})\to T_\sigma(\mathcal{S})$ between any two points $\rho,\sigma\in\mathcal{S}$ is represented as follows: for all $X_\rho\in T_\rho(\mathcal{S})$ and $X_\sigma\in T_\sigma(\mathcal{S})$,

$$X_\sigma=\Phi^{(\rm e)}_{\rho,\sigma}(X_\rho)\iff L_{X_\sigma}=L_{X_\rho}-\langle L_{X_\rho}\rangle_\sigma. \tag{3.30}$$
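In terms of e-representations, (3.30) is simply "subtract the mean at the new point". The NumPy sketch below implements it and checks that the transported SLD has zero mean at $\sigma$, that transporting back to $\rho$ recovers the original SLD, and that transport via an intermediate point $\sigma$ agrees with direct transport to $\tau$, as the explicit formula implies. Recovering the m-representation as $\sigma\circ L_{X_\sigma}$ is specific to the SLD structure; (3.30) itself holds for general structures. Function names and random data are hypothetical.

```python
import numpy as np

def e_transport(L_rho, sigma):
    """e-representation of Phi^(e)_{rho,sigma}(X_rho), given that of X_rho, per (3.30)."""
    dim = L_rho.shape[0]
    return L_rho - np.trace(sigma @ L_rho).real * np.eye(dim)

def jordan(A, B):
    return (A @ B + B @ A) / 2

rng = np.random.default_rng(4)
dim = 3

def rand_state():
    M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    r = M @ M.conj().T + 0.1 * np.eye(dim)
    return r / np.trace(r).real

def rand_herm():
    H = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (H + H.conj().T) / 2

rho, sigma, tau = rand_state(), rand_state(), rand_state()
A = rand_herm()
L_rho = A - np.trace(rho @ A).real * np.eye(dim)  # an element of T^(e)_rho(S), cf. (3.16)
L_sigma = e_transport(L_rho, sigma)
X_sigma = jordan(sigma, L_sigma)                  # m-representation at sigma (SLD case)
print(abs(np.trace(sigma @ L_sigma)), abs(np.trace(X_sigma)))
```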

In the following, a pair $(\mathcal{M},\xi)$ of a submanifold $\mathcal{M}$ of $\mathcal{S}$ and a coordinate system $\xi$ of $\mathcal{M}$ is called a model.

Proposition 3.8.

For an $n$-dimensional model $(\mathcal{M},\xi)$, the following conditions are equivalent.

  • (i)

    $\mathcal{M}$ is e-autoparallel in $\mathcal{S}$, and $\xi$ is an m-affine coordinate system.
    (Note: “m-affine” means “affine w.r.t. the m-connection $\nabla^{(\rm m)}_{\mathcal{M}}$ on $\mathcal{M}$”.)

  • (ii)

    There exist $F^1,\ldots,F^n\in\mathcal{L}_{\rm h}$ such that for every $i\in\{1,\ldots,n\}$

    $$\sum_jg^{ij}L_j=F^i-\langle F^i\rangle|_{\mathcal{M}}, \tag{3.31}$$

    where $\partial_i:=\frac{\partial}{\partial\xi^i}$, $L_j:=L_{\partial_j}$ and $[g^{ij}]:=[g_{ij}]^{-1}$ with $g_{ij}:=g(\partial_i,\partial_j)$.

  • (iii)

    There exist $F^1,\ldots,F^n\in\mathcal{L}_{\rm h}$ such that for every $i\in\{1,\ldots,n\}$

    $$\sum_jg^{ij}L_j=F^i-\xi^i. \tag{3.32}$$

    (Note: (3.32) implies $\xi^i=\langle F^i\rangle|_{\mathcal{M}}$.)

  • (iv)

    There exist $F^1,\ldots,F^n\in\mathcal{L}_{\rm h}$ such that for every $i\in\{1,\ldots,n\}$

    $$\forall\rho\in\mathcal{M},\quad L_{i,\rho}\in{\rm span}\{F^j\}_{j=1}^n\oplus\mathbb{R}\quad\text{and}\quad\xi^i(\rho)=\langle F^i\rangle_\rho, \tag{3.33}$$

    where $\mathbb{R}$ is identified with $\{cI\mid c\in\mathbb{R}\}$ (see Remark 3.9 below).

Proof. The equivalence (i) ⇔ (ii) is immediate from Prop. 2.6 and Cor. 3.6. The implication (iii) ⇒ (ii) follows by taking expectations in (3.32): since $\langle L_j\rangle=0$, we get $\xi^i=\langle F^i\rangle|_{\mathcal{M}}$.

To show (ii) ⇒ (iii), assume (ii). Taking the inner product of both sides of (3.31) with $L_k$ and using $\langle L_k\rangle=0$, we have

$$\sum_jg^{ij}\langle L_j,L_k\rangle=\langle F^i,L_k\rangle.$$

Here, the LHS is $\sum_jg^{ij}g_{jk}=\delta^i_k=\partial_k\xi^i$ and the RHS is $\partial_k\langle F^i\rangle$ due to (3.13). Hence there exists a constant vector $(c^i)\in\mathbb{R}^n$ such that $\langle F^i\rangle|_{\mathcal{M}}=\xi^i+c^i$. Redefining $F^i:=F^i-c^iI$, we can rewrite (3.31) as (3.32), which verifies (iii).

Since (3.32) implies that

$$L_{i,\rho}=\sum_jg_{ij}(\rho)F^j-\Bigl(\sum_jg_{ij}(\rho)\xi^j(\rho)\Bigr)\in{\rm span}\{F^j\}_{j=1}^n\oplus\mathbb{R},$$

we have (iii) ⇒ (iv). To show the converse, assume the existence of $\{F^1,\ldots,F^n\}$ in (iv). Then $\xi^i=\langle F^i\rangle|_{\mathcal{M}}$, and for each $\rho\in\mathcal{M}$ there exist $[a_{ij}]\in\mathbb{R}^{n\times n}$ and $[b_i]\in\mathbb{R}^n$ such that for every $i$

$$L_{i,\rho}=\sum_ja_{ij}F^j+b_i.$$

This implies that

$$\begin{aligned}g_{ik}(\rho)&=\langle L_{k,\rho},L_{i,\rho}\rangle_\rho=\sum_ja_{ij}\langle L_{k,\rho},F^j\rangle_\rho\\&=\sum_ja_{ij}(\partial_k\langle F^j\rangle)_\rho=\sum_ja_{ij}(\partial_k\xi^j)_\rho=a_{ik}.\end{aligned}$$

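Here the constant $b_i$ does not contribute because $\langle L_{k,\rho}\rangle_\rho=0$. Multiplying by $[g^{ij}(\rho)]=[a_{ij}]^{-1}$, we obtain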

$$\sum_jg^{ij}(\rho)L_{j,\rho}=F^i+\sum_jg^{ij}(\rho)b_j.$$

Here, the constant $\sum_jg^{ij}(\rho)b_j$ must be equal to $-\langle F^i\rangle_\rho=-\xi^i(\rho)$, because $\langle L_{j,\rho}\rangle_\rho=0$. Thus (iii) follows. □

Remark 3.9.

In (3.31) and (3.32), the operators $\{F^i-\langle F^i\rangle_\rho\}_{i=1}^n$, which equal $\{\sum_jg^{ij}(\rho)L_{j,\rho}\}_{i=1}^n$, are linearly independent for each $\rho\in\mathcal{M}$. This implies that $F^1,\ldots,F^n,I$ are linearly independent, or equivalently that $F^1,\ldots,F^n$ are linearly independent in the quotient space $\mathcal{L}_{\rm h}/\mathbb{R}$ with the identification $\mathbb{R}=\{cI\mid c\in\mathbb{R}\}$.

We close this section with a proposition stating that i.i.d. extensions of a model preserve e-autoparallelity. For this proposition, we assume the following condition on the family of inner products from which the information-geometric structure is defined:

$$\forall A,B\in\mathcal{L}_{\rm h}(\mathcal{H}),\ \forall s,t\in\{1,\ldots,k\},\quad\langle A_{[s]},B_{[t]}\rangle_{\rho^{\otimes k}}=\begin{cases}\langle A,B\rangle_\rho&(s=t),\\ \langle A\rangle_\rho\langle B\rangle_\rho&(s\neq t),\end{cases} \tag{3.34}$$

where $A_{[s]}:=I^{\otimes(s-1)}\otimes A\otimes I^{\otimes(k-s)}$, together with

$$\forall A\in\mathcal{L}_{\rm h}(\mathcal{H}),\ \forall s\in\{1,\ldots,k\},\quad\Omega_{\rho^{\otimes k}}(A_{[s]})=\rho^{\otimes(s-1)}\otimes\Omega_\rho(A)\otimes\rho^{\otimes(k-s)}. \tag{3.35}$$

These are the only product operators that occur in the proof below. The assumption is satisfied when the inner products are defined from a function $f$ by (3.5) and (3.21). Indeed, $\Delta_{\rho^{\otimes k}}$ acts on $\rho^{\otimes(s-1)}\otimes Y\otimes\rho^{\otimes(k-s)}$ as $\Delta_\rho$ on the $s$th factor and as the identity on the others, so $\Omega_{\rho^{\otimes k}}(A_{[s]})=f(\Delta_{\rho^{\otimes k}})(A_{[s]}\rho^{\otimes k})=\rho^{\otimes(s-1)}\otimes f(\Delta_\rho)(A\rho)\otimes\rho^{\otimes(k-s)}$. Then (3.34) follows from (3.35) and ${\rm Tr}\,\Omega_\rho(A)={\rm Tr}(\rho A)$. In particular, the proposition holds for the SLD structure. The corresponding identities for general product operators $\bigotimes_tA_t$ do not hold in this generality. For the SLD inner product, for instance, $\langle A_1\otimes A_2,B_1\otimes B_2\rangle_{\rho^{\otimes2}}={\rm Re}\bigl[{\rm Tr}(\rho A_1B_1)\,{\rm Tr}(\rho A_2B_2)\bigr]$, which differs from $\langle A_1,B_1\rangle_\rho\langle A_2,B_2\rangle_\rho$ in general.

Proposition 3.10.

Given a model $(\mathcal{M},\xi)$ in $\mathcal{S}(\mathcal{H})$ and a natural number $k\ge2$, define the model $(\tilde{\mathcal{M}},\tilde\xi)$ in $\mathcal{S}(\mathcal{H}^{\otimes k})$ by $\tilde{\mathcal{M}}:=\{\rho^{\otimes k}\mid\rho\in\mathcal{M}\}$ and $\tilde\xi^i(\rho^{\otimes k}):=\xi^i(\rho)$ for $\rho\in\mathcal{M}$. Under the assumption (3.34)-(3.35), the following conditions are equivalent.

  • (i)

    $\mathcal{M}$ is e-autoparallel in $\mathcal{S}(\mathcal{H})$, and $\xi$ is m-affine.

  • (ii)

    $\tilde{\mathcal{M}}$ is e-autoparallel in $\mathcal{S}(\mathcal{H}^{\otimes k})$, and $\tilde\xi$ is m-affine.

Proof. Let $\partial_i:=\frac{\partial}{\partial\xi^i}$, $\tilde\partial_i:=\frac{\partial}{\partial\tilde\xi^i}$, $L_i:=L_{\partial_i}$ and $\tilde L_i:=L_{\tilde\partial_i}$, which are determined by

$$\iota_*\bigl((\partial_i)_\rho\bigr)=\Omega_\rho(L_{i,\rho})$$

and

$$\tilde\iota_*\bigl((\tilde\partial_i)_{\rho^{\otimes k}}\bigr)=\Omega_{\rho^{\otimes k}}(\tilde L_{i,\rho^{\otimes k}}),$$

where $\tilde\iota$ denotes the natural embedding $\mathcal{S}(\mathcal{H}^{\otimes k})\to\mathcal{L}_{{\rm h},1}(\mathcal{H}^{\otimes k})$. With the aid of the parametric representation (3.2), we see that

$$\begin{aligned}\tilde\iota_*\bigl((\tilde\partial_i)_{\rho^{\otimes k}}\bigr)&=\tilde\partial_i(\rho^{\otimes k})_{\tilde\xi}=\partial_i\rho_\xi^{\otimes k}=\sum_{t=1}^k\rho_\xi^{\otimes(t-1)}\otimes\partial_i\rho_\xi\otimes\rho_\xi^{\otimes(k-t)}\\&=\sum_{t=1}^k\rho^{\otimes(t-1)}\otimes\iota_*\bigl((\partial_i)_\rho\bigr)\otimes\rho^{\otimes(k-t)}\\&=\sum_{t=1}^k\rho^{\otimes(t-1)}\otimes\Omega_\rho(L_{i,\rho})\otimes\rho^{\otimes(k-t)}\\&\stackrel{\star}{=}\sum_{t=1}^k\Omega_{\rho^{\otimes k}}\bigl(I^{\otimes(t-1)}\otimes L_{i,\rho}\otimes I^{\otimes(k-t)}\bigr)=\Omega_{\rho^{\otimes k}}\bigl(L_{i,\rho}^{(k)}\bigr),\end{aligned}$$

where $\stackrel{\star}{=}$ follows from (3.6) and (3.35), and we have used the notation

$$A^{(k)}:=\sum_{t=1}^kI^{\otimes(t-1)}\otimes A\otimes I^{\otimes(k-t)}\quad\text{for }A\in\mathcal{L}_{\rm h}(\mathcal{H}).$$

This implies that $\tilde L_{i,\rho^{\otimes k}}=(L_{i,\rho})^{(k)}$ and, by (3.34), that $\tilde g_{ij}=k\,g_{ij}$ for $g_{ij}(\rho)=\langle L_{i,\rho},L_{j,\rho}\rangle_\rho$ and $\tilde g_{ij}(\rho^{\otimes k})=\langle\tilde L_{i,\rho^{\otimes k}},\tilde L_{j,\rho^{\otimes k}}\rangle_{\rho^{\otimes k}}$, which leads to $\tilde g^{ij}=\frac1k\,g^{ij}$. Hence $L^i:=\sum_jg^{ij}L_j$ and $\tilde L^i:=\sum_j\tilde g^{ij}\tilde L_j$ are linked by $\tilde L^i_{\rho^{\otimes k}}=\frac1k\,(L^i_\rho)^{(k)}$. Now, according to Prop. 3.8, conditions (i) and (ii) of the present proposition are respectively expressed as

  • (i)’

    $\exists\{F^i\}\subset\mathcal{L}_{\rm h}(\mathcal{H})$, $\forall i$, $\forall\rho\in\mathcal{M}$: $L^i_\rho=F^i-\xi^i(\rho)$.

  • (ii)’

    $\exists\{\tilde F^i\}\subset\mathcal{L}_{\rm h}(\mathcal{H}^{\otimes k})$, $\forall i$, $\forall\rho\in\mathcal{M}$: $\frac1k\,(L^i_\rho)^{(k)}=\tilde F^i-\xi^i(\rho)$.

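These are equivalent, with $\tilde F^i=\frac1k(F^i)^{(k)}$. Indeed, $\frac1k(L^i_\rho)^{(k)}+\xi^i(\rho)I=\frac1k\bigl(L^i_\rho+\xi^i(\rho)I\bigr)^{(k)}$, and the linear map $A\mapsto A^{(k)}$ is injective. □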
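The key computation in this proof, $\tilde L_{i,\rho^{\otimes k}}=(L_{i,\rho})^{(k)}$ and $\tilde g_{ij}=k\,g_{ij}$, can be checked numerically for the SLD structure. The NumPy sketch below forms the derivative of $\rho^{\otimes k}$ by the Leibniz rule, as in the first line of the displayed computation, solves for the SLD on $\mathcal{H}^{\otimes k}$, and compares the result with the lifted operator $L^{(k)}$. The helper names, the dimensions and the random data are illustrative assumptions.

```python
import numpy as np
from functools import reduce

def sld(rho, drho):
    """Solve (rho @ L + L @ rho) / 2 = drho for the SLD L."""
    lam, U = np.linalg.eigh(rho)
    d = U.conj().T @ drho @ U
    return U @ (2.0 * d / (lam[:, None] + lam[None, :])) @ U.conj().T

def kron_all(ops):
    return reduce(np.kron, ops)

def lift(A, k):
    """A^(k) = sum_t I^{(t-1)} (x) A (x) I^{(k-t)}."""
    I = np.eye(A.shape[0])
    return sum(kron_all([A if s == t else I for s in range(k)]) for t in range(k))

def d_power(rho, drho, k):
    """Derivative of rho^{(x)k} along drho, by the Leibniz rule."""
    return sum(kron_all([drho if s == t else rho for s in range(k)]) for t in range(k))

rng = np.random.default_rng(8)
dim, k = 2, 3
M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
rho = M @ M.conj().T + 0.2 * np.eye(dim)
rho /= np.trace(rho).real
H = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
drho = (H + H.conj().T) / 2
drho -= np.trace(drho) / dim * np.eye(dim)

L = sld(rho, drho)                                  # SLD on H
rho_k = kron_all([rho] * k)
L_k = sld(rho_k, d_power(rho, drho, k))             # SLD on H^{(x)k}
g = np.trace(rho @ L @ L).real                      # SLD Fisher information on H
g_k = np.trace(rho_k @ L_k @ L_k).real              # SLD Fisher information on H^{(x)k}
print(np.allclose(L_k, lift(L, k)), g_k / g)
```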

4 Efficient estimators

From this section on, we investigate the relationship between estimation problems and geometric properties of quantum statistical models. Henceforth, we consider only the SLD structure as the information-geometric structure unless otherwise stated.

Given a model $(\mathcal{M},\xi)$ in $\mathcal{S}(\mathcal{H})$, an estimator for the coordinates $\xi$ is generally represented by a POVM $\Pi=\Pi(d\hat\xi)$ on $\mathcal{H}$ with outcomes in $\mathbb{R}^n$, where $\hat\xi$ is a variable representing an estimate. A typical case is when $\Pi$ is expressed as $\Pi(d\hat\xi)=\sum_{\omega\in\Omega}\pi_\omega\,\delta_{f(\omega)}(d\hat\xi)$ by a POVM $\pi=(\pi_\omega)_{\omega\in\Omega}$ on a finite set $\Omega$ and a function $f:\Omega\to\mathbb{R}^n$, where $\delta_{f(\omega)}$ denotes the $\delta$-measure concentrated at the point $f(\omega)\in\mathbb{R}^n$. This estimator, denoted by $\Pi=(\pi,f)$, represents the estimation procedure in which the estimate is determined as $\hat\xi=f(\omega)$ from the outcome $\omega$ of the measurement $\pi$.

The expectation $E_\rho(\Pi)\in\mathbb{R}^n$ and the mean squared error matrix (the variance in the unbiased case) $V_\rho(\Pi)\in\mathbb{R}^{n\times n}$ of $\Pi$ for a state $\rho$ are defined by

$$E_\rho(\Pi):=\int\hat\xi\,{\rm Tr}\bigl(\rho\,\Pi(d\hat\xi)\bigr), \tag{4.1}$$
$$V_\rho(\Pi):=\int(\hat\xi-\xi(\rho))(\hat\xi-\xi(\rho))^T\,{\rm Tr}\bigl(\rho\,\Pi(d\hat\xi)\bigr), \tag{4.2}$$

where $\mathbb{R}^n$ is regarded as the space $\mathbb{R}^{n\times1}$ of column vectors and ${}^T$ denotes the transpose. For $\Pi=(\pi,f)$, we have

$$E_\rho(\Pi)=\sum_\omega f(\omega)\,{\rm Tr}(\rho\pi_\omega), \tag{4.3}$$
$$V_\rho(\Pi)=\sum_\omega(f(\omega)-\xi(\rho))(f(\omega)-\xi(\rho))^T\,{\rm Tr}(\rho\pi_\omega). \tag{4.4}$$

An estimator $\Pi$ is called locally unbiased for a coordinate system $\xi$ at $\rho\in\mathcal{M}$ when the components $E^i_\rho(\Pi)$, $i\in\{1,\ldots,n\}$, of $E_\rho(\Pi)$ satisfy

$$E^i_\rho(\Pi)=\xi^i(\rho)\quad\text{and}\quad\partial_jE^i_\rho(\Pi)=\delta^i_j, \tag{4.5}$$

where $\partial_jE^i_\rho(\Pi)$ denotes the derivative of the function $E^i(\Pi):\sigma\mapsto E^i_\sigma(\Pi)$ by $\partial_j=\frac{\partial}{\partial\xi^j}$, evaluated at the point $\rho$. We denote by $\mathcal{U}(\rho,\xi)$ the set of all locally unbiased estimators for $\xi$ at $\rho$. Using the symmetrized inner product $\langle\cdot,\cdot\rangle$ of (3.22) and the SLDs $L_i=L_{\partial_i}$, $i\in\{1,\ldots,n\}$, we have

$$\Pi\in\mathcal{U}(\rho,\xi)\iff\forall i,j,\ \langle A^i\rangle_\rho=\xi^i(\rho)\ \text{and}\ \langle A^i,L_{j,\rho}\rangle_\rho=\delta^i_j, \tag{4.6}$$

where

$$A^i:=\int\hat\xi^i\,\Pi(d\hat\xi)\in\mathcal{L}_{\rm h}(\mathcal{H}). \tag{4.7}$$

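Indeed, $E^i_\sigma(\Pi)={\rm Tr}(\sigma A^i)=\langle A^i\rangle_\sigma$, so that $\partial_jE^i_\rho(\Pi)=\langle L_{j,\rho},A^i\rangle_\rho$ by (3.13). The well-known SLD Cramér-Rao inequality [6, 7] states that every $\Pi\in\mathcal{U}(\rho,\xi)$ satisfies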

$$V_\rho(\Pi)\ge G_\rho^{-1}, \tag{4.8}$$

where $G_\rho=[g_{ij}(\rho)]$ denotes the SLD Fisher information matrix defined by $g_{ij}=\langle L_i,L_j\rangle=g(\partial_i,\partial_j)$, $g$ being the SLD metric. Furthermore, the following proposition holds.

Proposition 4.1.

For every column vector $\boldsymbol{u}\in\mathbb{R}^n=\mathbb{R}^{n\times1}$, we have

$$\inf_{\Pi\in\mathcal{U}(\rho,\xi)}\boldsymbol{u}^TV_\rho(\Pi)\boldsymbol{u}=\boldsymbol{u}^TG_\rho^{-1}\boldsymbol{u}. \tag{4.9}$$

Note that $\inf$ in (4.9) cannot be replaced with $\min$ in general.

Let us introduce a class of randomized estimation procedures that will be useful in the proofs of both Prop. 4.1 above and Theorem 5.1 later. Suppose that a point $\rho\in\mathcal{M}$, a basis $\{\boldsymbol{u}^1,\ldots,\boldsymbol{u}^n\}$ of $\mathbb{R}^n$, a probability vector $(p_1,\ldots,p_n)$ with $p_i>0$ for all $i$ and $\sum_ip_i=1$, and $n^2$ real numbers $\{\gamma^i_k\}$ satisfying

$$\sum_kp_k\,\gamma^i_k=\xi^i(\rho)\qquad(i=1,\ldots,n) \tag{4.10}$$

are arbitrarily given. Let

$$X^k:=\sum_iu^k_i\,L^i_\rho\in\mathcal{L}_{\rm h}(\mathcal{H}), \tag{4.11}$$

where $\boldsymbol{u}^k=[u^k_i]$, $L^i_\rho:=\sum_jg^{ij}(\rho)L_{j,\rho}$ and $G_\rho^{-1}=[g^{ij}(\rho)]$, and consider the random measurement in which $k\in\{1,\ldots,n\}$ is randomly chosen according to the probability distribution $(p_1,\ldots,p_n)$ and then the observable $X^k$ is measured. This measurement is represented by the POVM $\pi=\{\pi_{k,r}\}=\{p_k\pi^k_r\}$, where $\{\pi^k_r\}_r$ are the projections in the spectral decomposition $X^k=\sum_rx^k_r\,\pi^k_r$. (Do not confuse $X^k$, $x^k_r$ and $\pi^k_r$ with the $k$th powers of $X$, $x_r$ and $\pi_r$.) When an eigenvalue $x^k_r$ is observed by measuring $X^k$, we estimate $\xi$ by $\hat\xi=f(k,x^k_r)$, where $f=(f^1,\ldots,f^n)^T$ is defined by

$$f^i(k,x):=\gamma^i_k+\frac{w^i_k}{p_k}\,x, \tag{4.12}$$

and $[w^i_k]=[u^k_i]^{-1}$, i.e., $\sum_ku^k_iw^j_k=\delta^j_i$ and $\sum_iu^k_iw^i_l=\delta^k_l$. This procedure defines the estimator $\Pi:=(\pi,f)$, which is characterized by the following property: for any polynomial function $\varphi:\mathbb{R}^n\to\mathbb{R}$,

$$\int\varphi(\hat\xi)\,\Pi(d\hat\xi)=\sum_kp_k\,\varphi\bigl(f(k,X^k)\bigr). \tag{4.13}$$

In this situation we have the following lemma, whose proof is given in Appendix A2.

Lemma 4.2.

The estimator $\Pi=(\pi,f)$ satisfies:

  1. (1)

    $\Pi\in\mathcal{U}(\rho,\xi)$.

  2. (2)

    $\forall k,\ (\boldsymbol{u}^k)^TV_\rho(\Pi)\boldsymbol{u}^k=\dfrac{1}{p_k}(\boldsymbol{u}^k)^TG_\rho^{-1}\boldsymbol{u}^k+\sum_lp_l(a^k_l)^2$,
    where $a^k_l:=\sum_iu^k_i(\gamma^i_l-\xi^i(\rho))$.
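The construction (4.10)-(4.12) only needs local data at $\rho$: the SLDs $L_{i,\rho}$, the matrix $G_\rho$ and the point $\xi(\rho)$. The NumPy sketch below builds the randomized estimator from hypothetical local data of a 2-parameter model (random tangent vectors $\partial_i\rho$ and an arbitrary $\xi(\rho)$) and checks both parts of Lemma 4.2 numerically, part (1) in the form (4.6). It is an illustrative check under these assumptions, not the proof given in Appendix A2.

```python
import numpy as np

def sld(rho, drho):
    lam, U = np.linalg.eigh(rho)
    d = U.conj().T @ drho @ U
    return U @ (2.0 * d / (lam[:, None] + lam[None, :])) @ U.conj().T

def rand_herm(rng, dim):
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (a + a.conj().T) / 2

rng = np.random.default_rng(5)
dim, n = 3, 2
M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
rho = M @ M.conj().T + 0.1 * np.eye(dim)
rho /= np.trace(rho).real

# Hypothetical local data of a 2-parameter model at rho: d_i rho and xi(rho).
drhos = []
for _ in range(n):
    h = rand_herm(rng, dim)
    drhos.append(h - np.trace(h) / dim * np.eye(dim))
xi = np.array([0.2, -0.1])

Ls = [sld(rho, d) for d in drhos]                                          # L_{i,rho}
G = np.array([[np.trace(rho @ Li @ Lj).real for Lj in Ls] for Li in Ls])  # Re Tr(rho L_i L_j)
Ginv = np.linalg.inv(G)
Lup = [sum(Ginv[i, j] * Ls[j] for j in range(n)) for i in range(n)]       # L^i_rho

U = rng.normal(size=(n, n))          # row k is u^k = [u^k_i]
W = np.linalg.inv(U)                 # W[i, k] = w^i_k
p = np.array([0.7, 0.3])             # positive probability vector
gamma = np.empty((n, n))             # gamma[k, i] = gamma^i_k
gamma[0] = rng.normal(size=n)
gamma[1] = (xi - p[0] * gamma[0]) / p[1]                                   # enforces (4.10)

ops, ests = [], []                   # POVM elements p_k pi^k_r and estimates f(k, x^k_r)
for k in range(n):
    Xk = sum(U[k, i] * Lup[i] for i in range(n))                           # (4.11)
    vals, vecs = np.linalg.eigh(Xk)
    for r in range(dim):
        ops.append(p[k] * np.outer(vecs[:, r], vecs[:, r].conj()))
        ests.append(gamma[k] + W[:, k] * vals[r] / p[k])                   # (4.12)
ests = np.array(ests)
probs = np.array([np.trace(rho @ P).real for P in ops])

A = [sum(e[i] * P for P, e in zip(ops, ests)) for i in range(n)]         # (4.7)
dev = ests - xi
V = (dev * probs[:, None]).T @ dev                                         # (4.4)

# Lemma 4.2 (1), in the form (4.6).
unbiased = all(np.isclose(np.trace(rho @ A[i]).real, xi[i]) for i in range(n)) and all(
    np.isclose(np.trace(rho @ A[i] @ Ls[j]).real, float(i == j))
    for i in range(n) for j in range(n))
# Lemma 4.2 (2).
lhs = [U[k] @ V @ U[k] for k in range(n)]
rhs = [U[k] @ Ginv @ U[k] / p[k] + sum(p[l] * (U[k] @ (gamma[l] - xi)) ** 2 for l in range(n))
       for k in range(n)]
print(unbiased, lhs, rhs)
```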

Proof of Prop. 4.1. Let $\gamma^i_k:=\xi^i(\rho)$, which satisfies (4.10), so that Lemma 4.2 is applicable. Since $a^k_l=0$ in this case, we have

$$(\boldsymbol{u}^k)^TV_\rho(\Pi)\boldsymbol{u}^k=\frac1{p_k}(\boldsymbol{u}^k)^TG_\rho^{-1}\boldsymbol{u}^k.$$

The proposition is obvious when $\boldsymbol{u}=0$, so we assume $\boldsymbol{u}\neq0$ and choose $p\in(0,1)$ arbitrarily (when $n=1$, simply take $p=p_1=1$). Taking $(\boldsymbol{u}^1,\ldots,\boldsymbol{u}^n)$ and $(p_1,\ldots,p_n)$ in the above construction such that $\boldsymbol{u}^1=\boldsymbol{u}$ and $p_1=p$, the resulting $\Pi$ satisfies $\boldsymbol{u}^TV_\rho(\Pi)\boldsymbol{u}=\frac1p\boldsymbol{u}^TG_\rho^{-1}\boldsymbol{u}$. Since $p$ can be arbitrarily close to $1$, the infimum in (4.9) is at most $\boldsymbol{u}^TG_\rho^{-1}\boldsymbol{u}$, and the reverse inequality follows from (4.8). □

A locally unbiased estimator $\Pi\in\mathcal{U}(\rho,\xi)$ is called locally efficient for $\xi$ at $\rho$ if $V_\rho(\Pi)\le V_\rho(\Pi')$ for all $\Pi'\in\mathcal{U}(\rho,\xi)$. Given a positive-semidefinite matrix $W\in\mathbb{R}^{n\times n}$ as a weight, an estimator $\Pi\in\mathcal{U}(\rho,\xi)$ is called locally $W$-efficient for $\xi$ at $\rho$ if ${\rm tr}(WV_\rho(\Pi))\le{\rm tr}(WV_\rho(\Pi'))$ for all $\Pi'\in\mathcal{U}(\rho,\xi)$, or equivalently if

$${\rm tr}(WV_\rho(\Pi))=\min_{\Pi'\in\mathcal{U}(\rho,\xi)}{\rm tr}(WV_\rho(\Pi')). \tag{4.14}$$

Here, the symbol ${\rm tr}$ denotes the trace of $n\times n$ matrices, to distinguish it from the trace ${\rm Tr}$ of operators on $\mathcal{H}$.

Proposition 4.3.

Given a model $(\mathcal{M},\xi)$, a point $\rho\in\mathcal{M}$ and an estimator $\Pi\in\mathcal{U}(\rho,\xi)$, the following conditions are equivalent.

  • (i)

    $\Pi$ is locally efficient for $\xi$ at $\rho$.

  • (ii)

    $V_\rho(\Pi)=G_\rho^{-1}$.

  • (iii)

    $\Pi$ is locally $\boldsymbol{u}\boldsymbol{u}^T$-efficient for $\xi$ at $\rho$ for every column vector $\boldsymbol{u}\in\mathbb{R}^n$.

  • (iv)

    $\Pi$ is locally $W$-efficient for $\xi$ at $\rho$ for every positive definite weight $W>0$.

Proof. The equivalence (i) ⇔ (iii) ⇔ (iv) is obvious since

$$\begin{aligned}V_\rho(\Pi)\le V_\rho(\Pi')&\iff\forall\boldsymbol{u}\in\mathbb{R}^n,\ \boldsymbol{u}^TV_\rho(\Pi)\boldsymbol{u}\le\boldsymbol{u}^TV_\rho(\Pi')\boldsymbol{u}\\&\iff\forall W>0,\ {\rm tr}(WV_\rho(\Pi))\le{\rm tr}(WV_\rho(\Pi')),\end{aligned}$$

and (ii) ⇔ (iii) follows from Prop. 4.1. □

Remark 4.4.

As is well known, a locally efficient estimator at $\rho$ exists iff the SLDs $L_{1,\rho},\ldots,L_{n,\rho}$ mutually commute (see, e.g., § 7.4 of [3]).

An estimator $\Pi$ is called efficient for a coordinate system $\xi$ if $\Pi$ is locally efficient for $\xi$ at every point $\rho\in\mathcal{M}$. Given a weight field $\mathcal{W}=\{W_\rho\mid\rho\in\mathcal{M}\}$, $\Pi$ is called $\mathcal{W}$-efficient for $\xi$ if $\Pi$ is locally $W_\rho$-efficient for $\xi$ at every $\rho\in\mathcal{M}$. When $W_\rho=W$ for all $\rho$, we simply call it $W$-efficient for $\xi$. According to Prop. 4.3, $\Pi$ is efficient ⇔ $\Pi$ is $\boldsymbol{u}\boldsymbol{u}^T$-efficient for every $\boldsymbol{u}\in\mathbb{R}^n$ ⇔ $\Pi$ is $W$-efficient for every $W>0$.

The condition for the existence of an efficient estimator was stated in the Introduction as Theorem 1.3. Suppose that $(\mathcal{M},\xi)$ is represented as in (1.3) and (1.4) of that theorem; namely, every $\rho\in\mathcal{M}$ is represented in the form

$$\rho=\exp\Bigl[\frac12\Bigl\{\sum_{i=1}^n\theta_i(\rho)F^i-\psi(\rho)\Bigr\}\Bigr]\,P\,\exp\Bigl[\frac12\Bigl\{\sum_{i=1}^n\theta_i(\rho)F^i-\psi(\rho)\Bigr\}\Bigr] \tag{4.15}$$

and satisfies $\xi^i(\rho)=\langle F^i\rangle_\rho$, where $F^1,\ldots,F^n$ are mutually commuting observables. Note that $P$ can be chosen to be an arbitrary element of $\mathcal{M}$ if we wish. The SLDs of $\mathcal{M}$ w.r.t. $\xi$ are represented as

$$L_i=\partial_i\Bigl(\sum_j\theta_jF^j-\psi\Bigr)=\sum_j(\partial_i\theta_j)(F^j-\partial^j\psi), \tag{4.16}$$

where $\partial_i:=\frac{\partial}{\partial\xi^i}$ and $\partial^i:=\frac{\partial}{\partial\theta_i}$. Noting that the positions of upper and lower indices (superscripts and subscripts) are reversed from the standard notation of information geometry as in [3] (see Remark 1.2), we have $\partial_i\theta_j=g_{ij}$, $\partial^i\psi=\xi^i$ and

$$L^i=\sum_jg^{ij}L_j=F^i-\xi^i. \tag{4.17}$$

According to Prop. 3.8, this means that $\mathcal{M}$ is e-autoparallel in $\mathcal{S}$ and that $\xi$ is an m-affine coordinate system w.r.t. the SLD structure. Furthermore, the induced e-connection $\nabla^{(\rm e)}|_{\mathcal{M}}$ on $\mathcal{M}$ turns out to be torsion-free and hence flat, with $\theta=(\theta_i)$ forming an affine coordinate system. We have thus seen that $\mathcal{M}$ is dually flat, just as classical exponential families are.

When $n=1$, (4.15) is written as

$$\rho=\exp\Bigl[\frac12\bigl\{\theta(\rho)F-\psi(\rho)\bigr\}\Bigr]\,P\,\exp\Bigl[\frac12\bigl\{\theta(\rho)F-\psi(\rho)\bigr\}\Bigr]. \tag{4.18}$$

This is a general form of e-geodesic in the sense that every e-geodesic is represented in this form for some $P$ and $F$. In the multi-dimensional case $n\ge2$, on the other hand, (4.15) provides only a special case of e-autoparallel submanifolds. To characterize e-autoparallelity by an estimation-theoretical notion, the existence of an efficient estimator is too strong a requirement, and we need a new variant of the notion of efficient estimators, which will be introduced in the next section.
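For $n=1$, differentiating (4.18) gives $\partial_\theta\rho=\rho\circ(F-\partial_\theta\psi)$, so the SLD w.r.t. $\theta$ is $F-\xi$ even when $F$ and $P$ do not commute. The NumPy sketch below checks this by finite differences for a hypothetical random $P$ and $F$, together with $d\xi/d\theta=g_{\theta\theta}$. Since $L_\xi=L_\theta/g_{\theta\theta}$ and $g_{\xi\xi}=1/g_{\theta\theta}$, this gives $g^{\xi\xi}L_\xi=F-\xi$, which is (3.32) with $n=1$. Normalizing by the trace is our way of fixing $\psi(\theta)$.

```python
import numpy as np

def sld(rho, drho):
    """Solve (rho @ L + L @ rho) / 2 = drho for the SLD L."""
    lam, U = np.linalg.eigh(rho)
    d = U.conj().T @ drho @ U
    return U @ (2.0 * d / (lam[:, None] + lam[None, :])) @ U.conj().T

def herm_exp(H, t):
    """exp(t H) for a Hermitian matrix H via its spectral decomposition."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(t * w)) @ V.conj().T

rng = np.random.default_rng(7)
dim = 3
M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
P = M @ M.conj().T + 2.0 * np.eye(dim)
P /= np.trace(P).real                            # hypothetical base point P
H = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
F = 0.5 * (H + H.conj().T) / 2                   # hypothetical observable F, need not commute with P

def rho_of(theta):
    """Point of the e-geodesic (4.18); dividing by the trace fixes psi(theta)."""
    E = herm_exp(F, theta / 2)
    unnorm = E @ P @ E
    return unnorm / np.trace(unnorm).real

theta, h = 0.4, 1e-5
rho = rho_of(theta)
L_theta = sld(rho, (rho_of(theta + h) - rho_of(theta - h)) / (2 * h))   # SLD w.r.t. theta
xi = np.trace(rho @ F).real                                            # xi = <F>_rho
dxi = (np.trace(rho_of(theta + h) @ F) - np.trace(rho_of(theta - h) @ F)).real / (2 * h)
g_theta = np.trace(rho @ L_theta @ L_theta).real

print(np.allclose(L_theta, F - xi * np.eye(dim), atol=1e-6))   # SLD w.r.t. theta is F - xi
print(dxi, g_theta)                                             # d xi / d theta = g_theta
```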

5 Efficient filtration of estimators

We now consider a one-parameter family of estimators $\vec\Pi=(\Pi_\varepsilon)_{\varepsilon\in(0,\varepsilon_1)}$ instead of a single estimator, and call it a filtration of estimators, or simply a filtration. The upper limit $\varepsilon_1$ can be an arbitrary positive number or $\infty$. Since our interest lies only in the limiting behavior as $\varepsilon\downarrow0$, the value of $\varepsilon_1$ plays no role, and we simply write $\vec\Pi=(\Pi_\varepsilon)_{\varepsilon>0}$. Given a positive-semidefinite matrix $W\in\mathbb{R}^{n\times n}$ as a weight, a filtration $\vec\Pi=(\Pi_\varepsilon)_{\varepsilon>0}$ is called locally $W$-efficient for $\xi$ at $\rho$ if $\Pi_\varepsilon\in\mathcal{U}(\rho,\xi)$ for every $\varepsilon>0$ and $\lim_{\varepsilon\downarrow0}{\rm tr}(WV_\rho(\Pi_\varepsilon))\le{\rm tr}(WV_\rho(\Pi'))$ for every $\Pi'\in\mathcal{U}(\rho,\xi)$, which is equivalent to

$$\lim_{\varepsilon\downarrow 0}\mathrm{tr}\,(WV_{\rho}(\Pi_{\varepsilon}))=\inf_{\Pi'\in\mathcal{U}(\rho,\xi)}\mathrm{tr}\,(WV_{\rho}(\Pi')). \tag{5.1}$$

In particular, when $W=\boldsymbol{u}\boldsymbol{u}^{T}$ with $\boldsymbol{u}\in\mathbb{R}^{n}$, this reads

$$\lim_{\varepsilon\downarrow 0}\boldsymbol{u}^{T}V_{\rho}(\Pi_{\varepsilon})\boldsymbol{u}=\boldsymbol{u}^{T}G_{\rho}^{-1}\boldsymbol{u} \tag{5.2}$$

by Prop. 4.1. Compare (5.1) with (4.14), and note that a locally $W$-efficient filtration at $\rho$ always exists, even when no locally $W$-efficient estimator exists.
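To make $G_{\rho}$, $L^{i}_{\rho}=\sum_{j}g^{ij}L_{j,\rho}$ and the bound $\boldsymbol{u}^{T}G_{\rho}^{-1}\boldsymbol{u}$ in (5.2) concrete, the following Python/NumPy sketch computes them for a toy two-parameter qubit model $\rho(x,y)=(I+x\sigma_{x}+y\sigma_{y}+0.3\sigma_{z})/2$; the model, the evaluation point, the weight vector and all helper names are our illustrative assumptions. SLDs are obtained by solving $(L\rho+\rho L)/2=\partial\rho$ in the eigenbasis of $\rho$, and the symmetrized inner product is computed as $\mathrm{Re}\,\mathrm{Tr}(\rho AB)$. The sketch also checks the orthogonal-projection (Pythagorean) argument used in the proof of Theorem 5.1 below: a Hermitian $A$ with $\langle A,L_{j,\rho}\rangle_{\rho}=\delta_{j}^{1}$ projects onto $L^{1}_{\rho}$, and $\|A\|_{\rho}^{2}=\|L^{1}_{\rho}\|_{\rho}^{2}+\|A-L^{1}_{\rho}\|_{\rho}^{2}$.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def sld(rho, drho):
    # Solve (L rho + rho L)/2 = drho in the eigenbasis of rho.
    lam, V = np.linalg.eigh(rho)
    Dp = V.conj().T @ drho @ V
    return V @ (2 * Dp / (lam[:, None] + lam[None, :])) @ V.conj().T

def inner(A, B, rho):
    # Symmetrized inner product <A, B>_rho = Re Tr(rho A B) for Hermitian A, B.
    return np.trace(rho @ A @ B).real

# Toy model rho(x, y) = (I + x SX + y SY + 0.3 SZ)/2 at one point (assumed example).
x, y = 0.2, -0.1
rho = (I2 + x * SX + y * SY + 0.3 * SZ) / 2
Ls = [sld(rho, SX / 2), sld(rho, SY / 2)]             # L_{1,rho}, L_{2,rho}
G = np.array([[inner(A, B, rho) for B in Ls] for A in Ls])
Ginv = np.linalg.inv(G)
L_up = [Ginv[i, 0] * Ls[0] + Ginv[i, 1] * Ls[1] for i in range(2)]   # L^i_rho

u = np.array([1.0, -2.0])
bound = u @ Ginv @ u                                  # u^T G^{-1} u in (5.2)
Lu = u[0] * L_up[0] + u[1] * L_up[1]                  # sum_i u_i L^i_rho

def project(A):
    # Orthogonal projection of A onto span{L_{j,rho}} w.r.t. <.,.>_rho.
    c = Ginv @ np.array([inner(A, Lj, rho) for Lj in Ls])
    return c[0] * Ls[0] + c[1] * Ls[1]

# Hermitian A with <A, L_j> = delta_j^1: a part orthogonal to the SLDs plus L^1.
rng = np.random.default_rng(1)
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
N = (M + M.conj().T) / 2
A = N - project(N) + L_up[0]
pythagoras_gap = inner(A, A, rho) - (inner(L_up[0], L_up[0], rho)
                                     + inner(A - L_up[0], A - L_up[0], rho))
print(bound, inner(Lu, Lu, rho), pythagoras_gap)
```

Because this toy model is affine in $(x,y)$, $\partial_{i}\rho$ is constant; for a curved model one would differentiate $\rho$ numerically instead.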

Given a weight field $\mathcal{W}=\{W_{\rho}\,|\,\rho\in\mathcal{M}\}$, a filtration $\vec{\Pi}=(\Pi_{\varepsilon})_{\varepsilon>0}$ is called $\mathcal{W}$-efficient for $\xi$ if it is locally $W_{\rho}$-efficient for $\xi$ at every $\rho\in\mathcal{M}$. When $W_{\rho}=W$ for all $\rho$, we simply call it $W$-efficient for $\xi$.

Now, we have the following theorem, which gives an estimation-theoretical characterization of the e-autoparallelity.

Theorem 5.1.

For a model (,ξ)({\mathcal{M}},\xi), the following conditions are equivalent.

  • (i)

    $\mathcal{M}$ is e-autoparallel in $\mathcal{S}$, and $\xi$ is an m-affine coordinate system.

  • (ii)

    For every $\boldsymbol{u}\in\mathbb{R}^{n}$, there exists a $\boldsymbol{u}\boldsymbol{u}^{T}$-efficient filtration for $\xi$.

Proof  According to Prop. 3.8, it suffices to show that (ii) is equivalent to the condition

$$\exists\{F^{i}\},\;\forall i,\;\;L^{i}=F^{i}-\xi^{i}, \tag{5.3}$$

where $L^{i}:=\sum_{j}g^{ij}L_{j}$.

We first show (ii) $\Rightarrow$ (5.3). Fix $i\in\{1,\ldots,n\}$ arbitrarily, and let $\vec{\Pi}=(\Pi_{\varepsilon})_{\varepsilon>0}$ be an $\boldsymbol{e}^{i}(\boldsymbol{e}^{i})^{T}$-efficient filtration for $\xi$, where $\boldsymbol{e}^{i}=(\delta_{j}^{i})$ denotes the $i$th vector of the natural basis of $\mathbb{R}^{n}$. For each $\rho\in\mathcal{M}$, we have

$$\begin{aligned}(\boldsymbol{e}^{i})^{T}V_{\rho}(\Pi_{\varepsilon})\boldsymbol{e}^{i}&=\int(\hat{\xi}^{i}-\xi^{i}(\rho))^{2}\,\mathrm{Tr}(\rho\,\Pi_{\varepsilon}(d\hat{\xi}))\\&\geq\mathrm{Tr}\bigl(\rho(F_{\varepsilon}^{i}-\xi^{i}(\rho))^{2}\bigr)=\|F_{\varepsilon}^{i}-\xi^{i}(\rho)\|_{\rho}^{2},\end{aligned}\tag{5.4}$$

where

$$F_{\varepsilon}^{i}:=\int\hat{\xi}^{i}\,\Pi_{\varepsilon}(d\hat{\xi})\in\mathcal{L}_{\mathrm{h}}(\mathcal{H}),$$

and $\|\cdot\|_{\rho}$ denotes the norm for the symmetrized inner product $\langle\cdot,\cdot\rangle_{\rho}$. (We also denote the norm for the metric $g_{\rho}$ by the same symbol.) From the local unbiasedness condition (4.6) applied to $\Pi_{\varepsilon}$, we have

$$\langle F_{\varepsilon}^{i}-\xi^{i}(\rho),L_{j,\rho}\rangle_{\rho}=\delta_{j}^{i}=\langle L_{\rho}^{i},L_{j,\rho}\rangle_{\rho}.$$

This means that $L_{\rho}^{i}$ is the orthogonal projection of $F_{\varepsilon}^{i}-\xi^{i}(\rho)$ onto $\mathrm{span}\{L_{j,\rho}\}_{j=1}^{n}$. Hence we have

$$\|F_{\varepsilon}^{i}-\xi^{i}(\rho)\|_{\rho}^{2}=\|L_{\rho}^{i}\|_{\rho}^{2}+\|F_{\varepsilon}^{i}-\xi^{i}(\rho)-L_{\rho}^{i}\|_{\rho}^{2}. \tag{5.5}$$

The $\boldsymbol{e}^{i}(\boldsymbol{e}^{i})^{T}$-efficiency of $\vec{\Pi}$ is expressed as

$$\lim_{\varepsilon\downarrow 0}(\boldsymbol{e}^{i})^{T}V_{\rho}(\Pi_{\varepsilon})\boldsymbol{e}^{i}=(\boldsymbol{e}^{i})^{T}G_{\rho}^{-1}\boldsymbol{e}^{i}=g^{ii}(\rho)=\|L_{\rho}^{i}\|_{\rho}^{2},$$

which, combined with (5.4) and (5.5), yields

$$\lim_{\varepsilon\downarrow 0}\|F_{\varepsilon}^{i}-\xi^{i}(\rho)-L_{\rho}^{i}\|_{\rho}^{2}=0.$$

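Since $\rho$ is strictly positive, $\|\cdot\|_{\rho}$ is a norm on the finite-dimensional space $\mathcal{L}_{\mathrm{h}}(\mathcal{H})$, so $F_{\varepsilon}^{i}$ converges as $\varepsilon\downarrow 0$. Its limit $F^{i}:=\lim_{\varepsilon\downarrow 0}F_{\varepsilon}^{i}$ is a Hermitian operator that does not depend on $\rho$ (because $F_{\varepsilon}^{i}$ does not), and it satisfies $L_{\rho}^{i}=F^{i}-\xi^{i}(\rho)$ for every $\rho\in\mathcal{M}$. This proves (5.3).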

We next show (5.3) $\Rightarrow$ (ii). Let $\boldsymbol{u}=(u_{i})\in\mathbb{R}^{n}$ be arbitrarily given; assuming the existence of $\{F^{i}\}$ satisfying (5.3), we construct a $\boldsymbol{u}\boldsymbol{u}^{T}$-efficient filtration. We may assume $\boldsymbol{u}\neq 0$. Take a basis $\{\boldsymbol{u}^{1},\ldots,\boldsymbol{u}^{n}\}$, $\boldsymbol{u}^{k}=(u_{i}^{k})$, of $\mathbb{R}^{n}$ such that $\boldsymbol{u}^{1}=\boldsymbol{u}$, and for each $k$ define

$$Y^{k}:=\sum_{i}u^{k}_{i}F^{i}. \tag{5.6}$$

Let $\varepsilon\in(0,1)$, and take a positive probability vector $(p_{1},\ldots,p_{n})$ such that $p_{1}=1-\varepsilon$. We define the estimator $\Pi_{\varepsilon}$ by the following procedure: randomly choose $k\in\{1,\ldots,n\}$ according to the probability distribution $(p_{1},\ldots,p_{n})$, measure the observable $Y^{k}$ to get an outcome $y$, and then estimate $\xi$ by $\hat{\xi}=g(k,y)$, using the function $g=(g^{1},\ldots,g^{n})^{T}$ defined by

$$g^{i}(k,y):=\frac{w_{k}^{i}}{p_{k}}\,y, \tag{5.7}$$

where $[w^{i}_{k}]=[u^{k}_{i}]^{-1}$. Invoking (5.3) at an arbitrary point $\rho\in\mathcal{M}$, we have

$$\begin{aligned}g^{i}(k,Y^{k})&=\frac{w_{k}^{i}}{p_{k}}\sum_{j}u^{k}_{j}F^{j}=\frac{w_{k}^{i}}{p_{k}}\sum_{j}u^{k}_{j}\bigl(L_{\rho}^{j}+\xi^{j}(\rho)\bigr)\\&=\gamma_{k}^{i}+\frac{w_{k}^{i}}{p_{k}}X^{k}=f^{i}(k,X^{k}),\end{aligned}$$

where $X^{k}$ and $f^{i}$ are defined by (4.11) and (4.12) with

$$\gamma_{k}^{i}:=\frac{w_{k}^{i}}{p_{k}}\sum_{j}u^{k}_{j}\xi^{j}(\rho). \tag{5.8}$$

Since this $\gamma_{k}^{i}$ satisfies (4.10) and $\rho\in\mathcal{M}$ was arbitrary, Lemma 4.2 shows that $\Pi_{\varepsilon}$ is locally unbiased at every $\rho\in\mathcal{M}$ and satisfies, for every $k$ and every $\rho\in\mathcal{M}$,

$$(\boldsymbol{u}^{k})^{T}V_{\rho}(\Pi_{\varepsilon})\boldsymbol{u}^{k}=\frac{1}{p_{k}}(\boldsymbol{u}^{k})^{T}G_{\rho}^{-1}\boldsymbol{u}^{k}+\sum_{l}p_{l}(a_{l}^{k})^{2}, \tag{5.9}$$

where $a_{l}^{k}:=\sum_{i}u^{k}_{i}(\gamma_{l}^{i}-\xi^{i}(\rho))$. From (5.8), we have

$$a_{l}^{k}=\frac{1}{p_{l}}\sum_{i,j}u_{i}^{k}w_{l}^{i}u_{j}^{l}\xi^{j}(\rho)-\sum_{i}u_{i}^{k}\xi^{i}(\rho)=\Bigl(\frac{\delta_{l}^{k}}{p_{k}}-1\Bigr)\sum_{i}u_{i}^{k}\xi^{i}(\rho)$$

and hence

$$\sum_{l}p_{l}(a_{l}^{k})^{2}=\sum_{l}p_{l}\Bigl(\frac{\delta_{l}^{k}}{p_{k}}-1\Bigr)^{2}\Bigl(\sum_{i}u_{i}^{k}\xi^{i}(\rho)\Bigr)^{2}=\frac{1-p_{k}}{p_{k}}\Bigl(\sum_{i}u_{i}^{k}\xi^{i}(\rho)\Bigr)^{2}.$$

Thus, setting $k=1$ in (5.9), we obtain

$$\boldsymbol{u}^{T}V_{\rho}(\Pi_{\varepsilon})\boldsymbol{u}=\frac{1}{1-\varepsilon}\boldsymbol{u}^{T}G_{\rho}^{-1}\boldsymbol{u}+\frac{\varepsilon}{1-\varepsilon}\Bigl(\sum_{i}u_{i}\xi^{i}(\rho)\Bigr)^{2}.$$

This implies that $\lim_{\varepsilon\downarrow 0}\boldsymbol{u}^{T}V_{\rho}(\Pi_{\varepsilon})\boldsymbol{u}=\boldsymbol{u}^{T}G_{\rho}^{-1}\boldsymbol{u}$ for every $\rho$, so that $\vec{\Pi}:=(\Pi_{\varepsilon})_{\varepsilon\in(0,1)}$ is a $\boldsymbol{u}\boldsymbol{u}^{T}$-efficient filtration. $\Box$
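The randomized procedure defining $\Pi_{\varepsilon}$ can be checked by exact enumeration of its outcomes. The Python/NumPy sketch below is our illustration, not code from the paper; the random state, the random Hermitian $F^{i}$, the random basis $\boldsymbol{u}^{k}$ (rows of `U`) and the choice $p_{k}=\varepsilon/(n-1)$ for $k\geq 2$ are all assumptions. Because $\sum_{i}u_{i}w_{k}^{i}=\delta_{k}^{1}$, the estimate in the direction $\boldsymbol{u}$ is $y/(1-\varepsilon)$ when $k=1$ and $0$ otherwise, which gives $\boldsymbol{u}^{T}V_{\rho}(\Pi_{\varepsilon})\boldsymbol{u}=\mathrm{Var}_{\rho}(Y^{1})/(1-\varepsilon)+\frac{\varepsilon}{1-\varepsilon}\langle Y^{1}\rangle_{\rho}^{2}$ for arbitrary Hermitian $F^{i}$, with mean $\langle F^{i}\rangle_{\rho}$. Under (5.3), $\mathrm{Var}_{\rho}(Y^{1})=\|\sum_{i}u_{i}L^{i}_{\rho}\|_{\rho}^{2}=\boldsymbol{u}^{T}G_{\rho}^{-1}\boldsymbol{u}$ and $\langle Y^{1}\rangle_{\rho}=\sum_{i}u_{i}\xi^{i}(\rho)$, which recovers the formula above.

```python
import numpy as np

def filtration_stats(rho, Fs, U, eps):
    # Exact mean vector and u-directional MSE of the randomized estimator Pi_eps:
    # choose k w.p. p_k, measure Y^k, output g^i(k, y) = w_k^i y / p_k.
    n = len(Fs)
    p = np.full(n, eps / (n - 1))      # assumption: remaining mass split evenly
    p[0] = 1.0 - eps
    W = np.linalg.inv(U)               # W[i, k] = w_k^i, where U[k, i] = u_i^k
    xi = np.array([np.trace(rho @ F).real for F in Fs])
    u = U[0]                           # u^1 = u
    mean, mse_u = np.zeros(n), 0.0
    for k in range(n):
        Yk = sum(U[k, i] * Fs[i] for i in range(n))       # Y^k of (5.6)
        lam, V = np.linalg.eigh(Yk)
        probs = np.real(np.diag(V.conj().T @ rho @ V))      # Born probabilities
        for a in range(len(lam)):
            est = W[:, k] * lam[a] / p[k]                   # g(k, y) of (5.7)
            weight = p[k] * probs[a]
            mean += weight * est
            mse_u += weight * (u @ (est - xi)) ** 2
    return mean, xi, mse_u

rng = np.random.default_rng(2)
d, n = 3, 3

def rand_herm(d):
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

Fs = [rand_herm(d) for _ in range(n)]
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = B @ B.conj().T + 0.1 * np.eye(d)
rho /= np.trace(rho).real
U = rng.normal(size=(n, n))            # rows u^1 = u, u^2, u^3 (almost surely a basis)
u = U[0]

Y1 = sum(u[i] * Fs[i] for i in range(n))
m = np.trace(rho @ Y1).real                       # <Y^1>_rho
var_Y1 = np.trace(rho @ Y1 @ Y1).real - m ** 2    # Var_rho(Y^1)
for eps in (0.5, 0.1, 0.01):
    mean, xi, mse_u = filtration_stats(rho, Fs, U, eps)
    predicted = var_Y1 / (1 - eps) + eps / (1 - eps) * m ** 2
    print(eps, mse_u, predicted)
```

Exact enumeration, rather than Monte Carlo sampling, is used so that the identities can be compared at machine precision; as $\varepsilon\downarrow 0$ the printed MSE approaches $\mathrm{Var}_{\rho}(Y^{1})$.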

The following proposition follows immediately from Theorem 5.1 and Prop. 3.10.

Proposition 5.2.

In the situation of Prop. 3.10, where $(\tilde{\mathcal{M}},\tilde{\xi})$ is the $k$th i.i.d. extension of $(\mathcal{M},\xi)$, $(\tilde{\mathcal{M}},\tilde{\xi})$ admits a $\boldsymbol{u}\boldsymbol{u}^{T}$-efficient filtration for every $\boldsymbol{u}\in\mathbb{R}^{n}$ if and only if $(\mathcal{M},\xi)$ does.

6 A sufficient condition for the e-autoparallelity and its relation to the Gaussian states

Proposition 6.1.

For a model $(\mathcal{M},\xi)$, the following condition is sufficient for (i) and (ii) of Theorem 5.1:

  • For every positive weight $W>0$, there exists a $W$-efficient estimator for $\xi$.

Proof  Given $\boldsymbol{u}\in\mathbb{R}^{n}$ and $\varepsilon>0$ arbitrarily, let $\Pi_{\varepsilon}$ be a $(\boldsymbol{u}\boldsymbol{u}^{T}+\varepsilon I)$-efficient estimator. Then, for an arbitrary $\rho\in\mathcal{M}$ and an arbitrary $\Pi'\in\mathcal{U}(\rho,\xi)$, we have

$$\begin{aligned}\boldsymbol{u}^{T}V_{\rho}(\Pi_{\varepsilon})\boldsymbol{u}&\leq\mathrm{tr}\,\bigl((\boldsymbol{u}\boldsymbol{u}^{T}+\varepsilon I)V_{\rho}(\Pi_{\varepsilon})\bigr)\\&\leq\mathrm{tr}\,\bigl((\boldsymbol{u}\boldsymbol{u}^{T}+\varepsilon I)V_{\rho}(\Pi')\bigr)=\boldsymbol{u}^{T}V_{\rho}(\Pi')\boldsymbol{u}+\varepsilon\,\mathrm{tr}\,(V_{\rho}(\Pi')).\end{aligned}$$

This implies that $\limsup_{\varepsilon\downarrow 0}\boldsymbol{u}^{T}V_{\rho}(\Pi_{\varepsilon})\boldsymbol{u}\leq\boldsymbol{u}^{T}V_{\rho}(\Pi')\boldsymbol{u}$ for every $\Pi'\in\mathcal{U}(\rho,\xi)$. Since $\Pi_{\varepsilon}$ itself belongs to $\mathcal{U}(\rho,\xi)$, we also have $\boldsymbol{u}^{T}V_{\rho}(\Pi_{\varepsilon})\boldsymbol{u}\geq\inf_{\Pi'\in\mathcal{U}(\rho,\xi)}\boldsymbol{u}^{T}V_{\rho}(\Pi')\boldsymbol{u}$; hence the limit exists and (5.1) holds with $W=\boldsymbol{u}\boldsymbol{u}^{T}$, so that the filtration $\vec{\Pi}=(\Pi_{\varepsilon})_{\varepsilon>0}$ is $\boldsymbol{u}\boldsymbol{u}^{T}$-efficient. $\Box$

An important example of a model satisfying the condition of Prop. 6.1 is a model consisting of quantum Gaussian states. Strictly speaking, this model lies outside the mathematical scope of this paper, since the underlying Hilbert space is infinite-dimensional. Nevertheless, it is worthwhile to consider the relationship between this important model and e-autoparallelity, even at the expense of some rigor.

We begin by briefly reviewing the definition of quantum Gaussian states, following Chapter 5 of [8]. Let $Z$ be an even-dimensional real linear space equipped with a symplectic form $\Delta(\cdot,\cdot)$, and let $U(\cdot)$ be an irreducible projective representation of $(Z,\Delta)$ on a separable Hilbert space $\mathcal{H}$; i.e., $\{U(z)\,|\,z\in Z\}$ is a family of unitary operators on $\mathcal{H}$ satisfying $U(z)U(z')=e^{\sqrt{-1}\Delta(z,z')/2}U(z+z')$ for all $z,z'\in Z$ and having no nontrivial invariant subspace. For each $z\in Z$, a self-adjoint operator $R(z)$ is defined by $U(tz)=e^{\sqrt{-1}\,tR(z)}$, $t\in\mathbb{R}$, and satisfies

$$\forall z,z'\in Z,\;[R(z),R(z')]=-\sqrt{-1}\,\Delta(z,z')I. \tag{6.1}$$

Given a symmetric bilinear form $\alpha(\cdot,\cdot)$ on $Z$ satisfying $\alpha(z,z)+\alpha(z',z')\geq\Delta(z,z')$ for all $z,z'\in Z$, and a linear functional $\mu(\cdot)$ on $Z$, there exists a unique density operator $\rho$ on $\mathcal{H}$ such that

$$\forall z\in Z,\;\mathrm{Tr}(\rho\,U(z))=e^{\sqrt{-1}\,\mu(z)-\frac{1}{2}\alpha(z,z)}. \tag{6.2}$$

This $\rho$ is called the Gaussian state determined by $(\mu,\alpha)$, and it satisfies

$$\mu(z)=\langle I,R(z)\rangle_{\rho}=\langle R(z)\rangle_{\rho}, \tag{6.3}$$
$$\alpha(z,z')=\langle R(z)-\mu(z),R(z')-\mu(z')\rangle_{\rho}. \tag{6.4}$$

Assume that $\alpha$ and linearly independent $\mu_{1},\ldots,\mu_{n}$ are arbitrarily given, and consider the model $\mathcal{M}=\{\rho_{\xi}\,|\,\xi=(\xi^{1},\ldots,\xi^{n})\in\mathbb{R}^{n}\}$, where $\rho_{\xi}$ is the Gaussian state determined by $(\mu_{\xi},\alpha)$ with $\mu_{\xi}:=\sum_{i}\xi^{i}\mu_{i}$. We call $\mathcal{M}=\{\rho_{\xi}\}$ a quantum Gaussian shift model. Holevo showed (§6.9 of [8]) that this model has a $W$-efficient estimator for every positive weight $W$; that is, the sufficient condition of Prop. 6.1 is fulfilled. Hence, if $\mathcal{H}$ were finite-dimensional, we could conclude that $\mathcal{M}$ is e-autoparallel in $\mathcal{S}$ and that $\xi$ is an m-affine coordinate system of $\mathcal{M}$. In reality, however, $\mathcal{H}$ is infinite-dimensional, and e-autoparallelity of a model in $\mathcal{S}$ has no mathematical definition within the framework of the present paper. Nevertheless, there is no essential difference from the finite-dimensional case. In fact, we can verify that the model $(\mathcal{M},\xi)$ satisfies conditions (ii)-(iv) of Prop. 3.8 as follows. According to [8], the $i$th SLD $L_{i,\xi}$ is given by

$$L_{i,\xi}=R(z_{i})-\mu_{\xi}(z_{i}), \tag{6.5}$$

where $z_{i}$ is the element of $Z$ determined by the condition $\mu_{i}(z)=\alpha(z_{i},z)$ for all $z\in Z$. The SLD Fisher information matrix $G=[g_{ij}]$ is given by

$$g_{ij}=\langle L_{i,\xi},L_{j,\xi}\rangle_{\xi}=\alpha(z_{i},z_{j}), \tag{6.6}$$

which does not depend on the parameter $\xi$. Letting $F^{i}:=\sum_{j}g^{ij}R(z_{j})$, where $G^{-1}=[g^{ij}]$, we have

$$\sum_{j}g^{ij}L_{j,\xi}=F^{i}-\sum_{j}g^{ij}\mu_{\xi}(z_{j})=F^{i}-\xi^{i}, \tag{6.7}$$

where the second equality follows from $\mu_{\xi}(z_{j})=\sum_{k}\xi^{k}\mu_{k}(z_{j})$ and $\mu_{k}(z_{j})=\alpha(z_{k},z_{j})=g_{kj}$. We have thus verified that $(\mathcal{M},\xi)$ satisfies condition (iii) of Prop. 3.8, which is evidently equivalent to (ii) and (iv) even in this case. Hence we may regard the model as also satisfying condition (i), at least in a naive sense. A mathematical justification of this view would require a rigorous treatment of $\mathcal{S}(\mathcal{H})$ as an infinite-dimensional manifold equipped with an information-geometric structure, which is beyond the scope of the present paper.
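The operators $R(z)$ cannot be represented by finite matrices, but the coefficient algebra behind (6.5)-(6.7) can be checked in finite dimensions. In the sketch below (an illustration with assumed random data, in our own encoding $Z=\mathbb{R}^{4}$, $\alpha(z,z')=z^{T}Az'$ and $\mu_{i}(z)=m_{i}^{T}z$), one has $z_{i}=A^{-1}m_{i}$ and $G=[m_{i}^{T}A^{-1}m_{j}]$, and the identity $\sum_{j}g^{ij}\mu_{\xi}(z_{j})=\xi^{i}$ is verified numerically. The uncertainty condition on $\alpha$ plays no role in this algebra and is not enforced.

```python
import numpy as np

rng = np.random.default_rng(3)
dimZ, n = 4, 2                          # Z = R^4 (two modes), n = 2 parameters
C = rng.normal(size=(dimZ, dimZ))
A = C @ C.T + np.eye(dimZ)              # matrix of alpha: alpha(z, z') = z^T A z'
Mu = rng.normal(size=(n, dimZ))         # row i encodes mu_i(z) = Mu[i] @ z

Zs = np.linalg.solve(A, Mu.T).T         # rows z_i with A z_i = mu_i, i.e. alpha(z_i, .) = mu_i
G = Zs @ A @ Zs.T                       # g_ij = alpha(z_i, z_j), cf. (6.6)
Ginv = np.linalg.inv(G)

xi = np.array([0.7, -1.3])
mu_xi = xi @ Mu                         # mu_xi = sum_i xi^i mu_i
shift = Ginv @ (Zs @ mu_xi)             # sum_j g^{ij} mu_xi(z_j), cf. (6.7)
print(shift, xi)
```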

The fact that $G=[g_{ij}]$ is constant on $\mathcal{M}$ means that $(\mathcal{M},g)$ is a Euclidean manifold and that the m-affine coordinate system $\xi$ is also affine w.r.t. the flat Levi-Civita connection of $g$. This implies that $\mathcal{M}$ is dually flat and that the e- and m-connections on $\mathcal{M}$ coincide with the Levi-Civita connection. Note also that, by (6.1) and (6.5), $[L_{i,\xi},L_{j,\xi}]=-\sqrt{-1}\,\Delta(z_{i},z_{j})I$, so the SLDs $\{L_{i,\xi}\}$ do not commute in general, in which case $\mathcal{M}$ has no efficient estimator.

Remark 6.2.

Let $\alpha$ be arbitrarily fixed, and let $\mathcal{N}:=\{\rho_{\mu}\,|\,\mu\in Z^{*}\}$, where $Z^{*}$ denotes the dual linear space of $Z$ and $\rho_{\mu}$ denotes the Gaussian state determined by $(\mu,\alpha)$. This is a special (maximal) case of the model $\mathcal{M}$ treated above, so $\mathcal{N}$ is “e-autoparallel in $\mathcal{S}$” in the same naive sense. The SLD structure of $\mathcal{N}$ is Euclidean, and the model $\mathcal{M}=\{\rho_{\xi}\,|\,\xi\in\mathbb{R}^{n}\}$ treated above, where $\mu_{\xi}=\sum_{i}\xi^{i}\mu_{i}$, forms an e,m-autoparallel submanifold of $\mathcal{N}$. In general, a submanifold $\mathcal{M}$ of $\mathcal{N}$ is e,m-autoparallel in $\mathcal{N}$ iff there exists an affine subspace $\mathcal{A}$ of $Z^{*}$ such that $\mathcal{M}=\{\rho_{\mu}\,|\,\mu\in\mathcal{A}\}$, which can be represented as $\mathcal{M}=\{\rho_{\xi}\,|\,\xi\in\mathbb{R}^{n}\}$ with $\mu_{\xi}=\mu_{0}+\sum_{i=1}^{n}\xi^{i}\mu_{i}$. Note that Holevo's construction of $W$-efficient estimators applies immediately to this extended model, so that it also satisfies the sufficient condition of Prop. 6.1.

7 Another estimation-theoretical characterization of e-autoparallelity

In this section we give another characterization of e-autoparallelity by considering a different type of estimation problem. Before getting into the main discussion, we collect some preliminaries in geometric language.

On a general Riemannian manifold $(M,g)$, a one-to-one correspondence between a tangent vector $X_{p}\in T_{p}(M)$ and a cotangent vector $\omega_{p}\in T^{*}_{p}(M)$ at a point $p\in M$ is naturally defined; denoting the correspondence by $\stackrel{g_{p}}{\longleftrightarrow}$, we have

$$X_{p}\stackrel{g_{p}}{\longleftrightarrow}\omega_{p}\;\Leftrightarrow\;\forall Y_{p}\in T_{p}(M),\;\omega_{p}(Y_{p})=g_{p}(X_{p},Y_{p}). \tag{7.1}$$

This is extended to the correspondence $\stackrel{g}{\longleftrightarrow}$ between a vector field $X\in\mathscr{X}(M)$ and a differential 1-form $\omega\in\mathcal{D}(M)$, where $\mathcal{D}(M)$ denotes the totality of 1-forms on $M$, such that

$$\begin{aligned}X\stackrel{g}{\longleftrightarrow}\omega\;&\Leftrightarrow\;\forall p\in M,\;X_{p}\stackrel{g_{p}}{\longleftrightarrow}\omega_{p}\\&\Leftrightarrow\;\forall Y\in\mathscr{X}(M),\;\omega(Y)=g(X,Y).\end{aligned}\tag{7.2}$$

When a coordinate system $\xi=(\xi^{i})$ is given on $M$, and $X\in\mathscr{X}(M)$ and $\omega\in\mathcal{D}(M)$ are represented as $X=\sum_{i}X^{i}\partial_{i}$ and $\omega=\sum_{j}\omega_{j}\,d\xi^{j}$ with functions $\{X^{i}\},\{\omega_{j}\}\subset\mathcal{F}(M)$, we have

$$X\stackrel{g}{\longleftrightarrow}\omega\;\Leftrightarrow\;\forall j,\;\omega_{j}=\sum_{i}X^{i}g_{ij}\;\Leftrightarrow\;\forall i,\;X^{i}=\sum_{j}\omega_{j}g^{ij}, \tag{7.3}$$

where $g_{ij}=g(\partial_{i},\partial_{j})$ and $g^{ij}=g(d\xi^{i},d\xi^{j})$; the matrices $[g_{ij}]$ and $[g^{ij}]$ are inverse to each other.

For a function $f\in\mathcal{F}(M)$, its gradient w.r.t. $g$ is defined as the vector field $X\in\mathscr{X}(M)$ such that $X\stackrel{g}{\longleftrightarrow}df$, which we denote by $X=\mathrm{grad}\,f$. It is represented as

$$\mathrm{grad}\,f=\sum_{i,j}(\partial_{i}f)g^{ij}\partial_{j}. \tag{7.4}$$

The correspondence $\stackrel{g_{p}}{\longleftrightarrow}$ induces an inner product and a norm on the cotangent space $T^{*}_{p}(M)$ such that $\stackrel{g_{p}}{\longleftrightarrow}$ is an isometry; i.e., $X_{p}\stackrel{g_{p}}{\longleftrightarrow}\omega_{p}$ $\Rightarrow$ $\|X_{p}\|_{p}=\|\omega_{p}\|_{p}$. In particular, we have

$$\|(\mathrm{grad}\,f)_{p}\|^{2}_{p}=\|(df)_{p}\|^{2}_{p}=\sum_{i,j}g^{ij}(p)\,\partial_{i}f(p)\,\partial_{j}f(p). \tag{7.5}$$
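As a toy check of (7.3)-(7.5) (our choice of example, not the paper's), take the open probability simplex on three points with coordinates $(p_{1},p_{2})$, $p_{3}=1-p_{1}-p_{2}$, and the Fisher metric $g_{ij}=\delta_{ij}/p_{i}+1/p_{3}$. For $f(p)=\sum_{\omega}p_{\omega}T(\omega)$ one has $\partial_{i}f=T(i)-T(3)$. The sketch computes $\mathrm{grad}\,f$ by (7.4), lowers the index again by (7.3), and checks (7.5); the common value coincides with the variance of $T$, a classical counterpart of Prop. 7.9. The particular $p$ and $T$ are assumptions.

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])          # point of the open simplex; coordinates (p_1, p_2)
T = np.array([1.0, 2.0, -1.0])         # random variable T; f(p) = E_p[T]

# Fisher metric in coordinates (p_1, p_2): g_ij = delta_ij / p_i + 1 / p_3
g = np.diag(1 / p[:2]) + 1 / p[2]
g_inv = np.linalg.inv(g)

df = T[:2] - T[2]                      # partial_i f = T_i - T_3
grad = g_inv @ df                      # components of grad f, cf. (7.4)
lowered = g @ grad                     # lowering the index recovers df, cf. (7.3)

norm_grad_sq = grad @ g @ grad         # ||grad f||^2
norm_df_sq = df @ g_inv @ df           # ||df||^2 = sum g^{ij} d_i f d_j f, cf. (7.5)
var_T = p @ T**2 - (p @ T) ** 2        # Var_p(T)
print(norm_grad_sq, norm_df_sq, var_T)
```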

Now we are ready to start the main discussion of this section. Let $\mathcal{M}$ be an $n$-dimensional submanifold of $\mathcal{S}=\mathcal{S}(\mathcal{H})$, and let $f\in\mathcal{F}(\mathcal{M})$ be a smooth function on it. We consider the problem of estimating the scalar value $f(\rho)$ for an unknown $\rho\in\mathcal{M}$. An estimator is generally represented by a POVM $\Lambda=\Lambda(d\hat{t}\,)$, where $\hat{t}$ is a scalar variable representing an estimate for $t=f(\rho)$. The expectation $E_{\rho}(\Lambda)$ and the mean squared error (the variance in the unbiased case) $V_{\rho}(\Lambda)$ of $\Lambda$ for a state $\rho$ are defined by

$$E_{\rho}(\Lambda):=\int\hat{t}\,\mathrm{Tr}\bigl(\rho\,\Lambda(d\hat{t}\,)\bigr), \tag{7.6}$$
$$V_{\rho}(\Lambda):=\int(\hat{t}-f(\rho))^{2}\,\mathrm{Tr}\bigl(\rho\,\Lambda(d\hat{t}\,)\bigr). \tag{7.7}$$

Localizing the unbiasedness condition $E(\Lambda)=f$, where the left-hand side denotes the function $\mathcal{M}\rightarrow\mathbb{R},\;\rho\mapsto E_{\rho}(\Lambda)$, we say that $\Lambda$ is locally unbiased for $f$ at $\rho\in\mathcal{M}$ when

$$E_{\rho}(\Lambda)=f(\rho)\quad\text{and}\quad(dE(\Lambda))_{\rho}=(df)_{\rho}. \tag{7.8}$$

When a coordinate system $\xi=(\xi^{i})$ is given on $\mathcal{M}$, the second condition in (7.8) is expressed as

$$\forall i\in\{1,\ldots,n\},\;\;\partial_{i}E_{\rho}(\Lambda)=\partial_{i}f(\rho), \tag{7.9}$$

where $\partial_{i}E_{\rho}(\Lambda)$ and $\partial_{i}f(\rho)$ denote the derivatives of the functions $E(\Lambda)$ and $f$ by $\partial_{i}=\frac{\partial}{\partial\xi^{i}}$, evaluated at $\rho$. We denote by $\mathcal{U}(\rho,f)$ the totality of locally unbiased estimators for $f$ at $\rho$.

Proposition 7.1.

For any $f\in\mathcal{F}(\mathcal{M})$ and any $\rho\in\mathcal{M}$, we have

$$\min_{\Lambda\in\mathcal{U}(\rho,f)}V_{\rho}(\Lambda)=\|(df)_{\rho}\|_{\rho}^{2}. \tag{7.10}$$

The minimum in (7.10) is achieved by the spectral measure of the observable

$$F_{\rho}:=f(\rho)+\sum_{i}\partial_{i}f(\rho)\,L^{i}_{\rho}, \tag{7.11}$$

where $L^{i}_{\rho}:=\sum_{j}g^{ij}(\rho)L_{j,\rho}$ and $L_{i}:=L_{\partial_{i}}$.

Proof  Given an estimator $\Lambda$, let

$$A:=\int\hat{t}\,\Lambda(d\hat{t})\in\mathcal{L}_{\mathrm{h}}.$$

Then the local unbiasedness of $\Lambda$ at $\rho$ is expressed as

$$\langle A\rangle_{\rho}=f(\rho)\quad\text{and}\quad\forall i,\;\partial_{i}\langle A\rangle_{\rho}=\partial_{i}f(\rho), \tag{7.12}$$

and we have

$$V_{\rho}(\Lambda)\geq\langle(A-f(\rho))^{2}\rangle_{\rho}=\|A-f(\rho)\|^{2}_{\rho}, \tag{7.13}$$

where $\|\cdot\|_{\rho}$ denotes the norm w.r.t. the symmetrized inner product $\langle\cdot,\cdot\rangle_{\rho}$ on $\mathcal{L}_{\mathrm{h}}$. Noting that the second condition of (7.12) is equivalent to

$$\forall i,\;\langle L_{i,\rho},A-f(\rho)\rangle_{\rho}=\Bigl\langle L_{i,\rho},\sum_{j}\partial_{j}f(\rho)L^{j}_{\rho}\Bigr\rangle_{\rho},$$

we see that $\sum_{j}\partial_{j}f(\rho)L^{j}_{\rho}$ is the orthogonal projection of $A-f(\rho)$ onto the space $\mathrm{span}\{L_{j,\rho}\}_{j=1}^{n}$. Hence we have

$$\begin{aligned}\|A-f(\rho)\|^{2}_{\rho}&=\Bigl\|\sum_{j}\partial_{j}f(\rho)L^{j}_{\rho}\Bigr\|_{\rho}^{2}+\Bigl\|A-f(\rho)-\sum_{j}\partial_{j}f(\rho)L^{j}_{\rho}\Bigr\|_{\rho}^{2}\\&\geq\Bigl\|\sum_{j}\partial_{j}f(\rho)L^{j}_{\rho}\Bigr\|_{\rho}^{2}\\&=\sum_{i,j}g^{ij}(\rho)\,\partial_{i}f(\rho)\,\partial_{j}f(\rho)=\|(df)_{\rho}\|_{\rho}^{2}.\end{aligned}\tag{7.14}$$

The inequality in (7.13) holds with equality when $\Lambda$ is the spectral measure of $A$, and the inequality in (7.14) holds with equality when $A=f(\rho)+\sum_{j}\partial_{j}f(\rho)L^{j}_{\rho}=F_{\rho}$. These observations prove the proposition. $\Box$
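A numerical sketch of Prop. 7.1 for a toy model; the qubit model $\rho(a,b)=(I+a\sigma_{x}+b\sigma_{y}+0.3\sigma_{z})/2$, the function $f(a,b)=a^{2}+\sin b$ and the point $(0.2,-0.1)$ are our illustrative assumptions. It builds $F_{\rho}$ of (7.11) and checks the two local unbiasedness conditions (7.8) together with $V_{\rho}(F_{\rho})=\|(df)_{\rho}\|_{\rho}^{2}$. Since the model is affine in $(a,b)$, the derivative $\partial_{i}\langle F_{\rho}\rangle$ with $F_{\rho}$ held fixed is simply $\mathrm{Tr}(\partial_{i}\rho\,F_{\rho})$.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def sld(rho, drho):
    # Solve (L rho + rho L)/2 = drho in the eigenbasis of rho.
    lam, V = np.linalg.eigh(rho)
    Dp = V.conj().T @ drho @ V
    return V @ (2 * Dp / (lam[:, None] + lam[None, :])) @ V.conj().T

def f(a, b):
    # Hypothetical scalar function to be estimated.
    return a ** 2 + np.sin(b)

a, b = 0.2, -0.1
rho = (I2 + a * SX + b * SY + 0.3 * SZ) / 2
drhos = [SX / 2, SY / 2]                      # partial_i rho (the model is affine)
Ls = [sld(rho, D) for D in drhos]
G = np.array([[np.trace(rho @ A @ B).real for B in Ls] for A in Ls])
Ginv = np.linalg.inv(G)
df = np.array([2 * a, np.cos(b)])             # (partial_1 f, partial_2 f)
L_up = [Ginv[i, 0] * Ls[0] + Ginv[i, 1] * Ls[1] for i in range(2)]
F_rho = f(a, b) * I2 + df[0] * L_up[0] + df[1] * L_up[1]     # observable (7.11)

mean = np.trace(rho @ F_rho).real                            # <F_rho>_rho
dmean = np.array([np.trace(D @ F_rho).real for D in drhos])  # partial_i <F_rho>, F_rho frozen
var = np.trace(rho @ F_rho @ F_rho).real - mean ** 2         # V_rho(F_rho)
bound = df @ Ginv @ df                                       # ||(df)_rho||_rho^2
print(mean, f(a, b), dmean, df, var, bound)
```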

Based on Proposition 7.1, we call an estimator $\Lambda$ locally efficient for $f$ at $\rho$ when $\Lambda\in\mathcal{U}(\rho,f)$ and $V_{\rho}(\Lambda)=\|(df)_{\rho}\|_{\rho}^{2}$, and call it efficient for $f$ when it is locally efficient for $f$ at every $\rho\in\mathcal{M}$. Note that, unlike the case of estimating multi-dimensional coordinates $(\xi^{i})$, where the infimum in (4.9) cannot be replaced with a minimum in general, a locally efficient estimator for a scalar function $f$ always exists at each $\rho$. Furthermore, since a locally efficient estimator is obtained as the spectral measure of an observable, as shown in the proof of Prop. 7.1, it suffices to consider only estimators represented by Hermitian operators. In particular, an estimator $F\in\mathcal{L}_{\mathrm{h}}$ is efficient for $f$ iff

$$\forall\rho\in\mathcal{M},\;\langle F\rangle_{\rho}=f(\rho)\quad\text{and}\quad V_{\rho}(F):=\bigl\langle(F-\langle F\rangle_{\rho})^{2}\bigr\rangle_{\rho}=\|(df)_{\rho}\|_{\rho}^{2}. \tag{7.15}$$

We define

$$\mathcal{E}(\mathcal{M}):=\{f\in\mathcal{F}(\mathcal{M})\,|\,\text{there exists an efficient estimator for }f\}. \tag{7.16}$$
Proposition 7.2.

For a function $f\in\mathcal{F}(\mathcal{M})$, the following conditions are equivalent.

  • (i)

    $f\in\mathcal{E}(\mathcal{M})$.

  • (ii)

    $\exists F\in\mathcal{L}_{\mathrm{h}},\;F-f=\sum_{i}(\partial_{i}f)\,L^{i}$.

  • (iii)

    $\exists F\in\mathcal{L}_{\mathrm{h}},\;F-\langle F\rangle|_{\mathcal{M}}=\sum_{i}(\partial_{i}f)\,L^{i}$.

  • (iv)

    $\mathrm{grad}\,f$ is e-parallel (i.e., parallel w.r.t. the e-connection on $\mathcal{S}$).

In (ii), the observable $F$ gives an efficient estimator for $f$.

Proof  From Prop. 7.1, it immediately follows that (i) $\Leftrightarrow$ (ii) and that $F$ in (ii) gives an efficient estimator for $f$.

It is obvious that (ii) $\Rightarrow$ (iii). To show the converse, assume that an operator $F\in\mathcal{L}_{\mathrm{h}}$ satisfies $F-\langle F\rangle|_{\mathcal{M}}=\sum_{i}(\partial_{i}f)\,L^{i}$. Then we have

$$\partial_{i}\langle F\rangle=\langle L_{i},F\rangle=\langle L_{i},F-\langle F\rangle|_{\mathcal{M}}\rangle=\Bigl\langle L_{i},\sum_{j}(\partial_{j}f)\,L^{j}\Bigr\rangle=\partial_{i}f,$$

which implies that $f=\langle F\rangle|_{\mathcal{M}}+c$ for some $c\in\mathbb{R}$. Redefining $F:=F+c$, we have $F-f=\sum_{i}(\partial_{i}f)L^{i}$. This proves (iii) $\Rightarrow$ (ii).

Let $X:=\mathrm{grad}\,f$. Then (7.4) yields

$$L_{X}=\sum_{i,j}(\partial_{i}f)g^{ij}L_{j}=\sum_{i}(\partial_{i}f)L^{i},$$

and Cor. 3.6 yields

$$X\text{ is e-parallel}\;\Leftrightarrow\;\exists F\in\mathcal{L}_{\mathrm{h}},\;L_{X}=F-\langle F\rangle|_{\mathcal{M}}.$$

Thus we obtain (iii) $\Leftrightarrow$ (iv). $\Box$

Corollary 7.3.

$\mathcal{E}(\mathcal{M})$ is an $\mathbb{R}$-linear space.

Proof  Obvious from Prop. 7.2. $\Box$

Proposition 7.4.

For a vector field $X\in\mathscr{X}(\mathcal{M})$, we have

$$X\text{ is e-parallel}\;\Leftrightarrow\;\exists f\in\mathcal{E}(\mathcal{M}),\;X=\mathrm{grad}\,f. \tag{7.17}$$

Proof  The implication $\Leftarrow$ follows from (i) $\Rightarrow$ (iv) in Prop. 7.2. To show the converse, assume that $X$ is e-parallel. Then, according to Cor. 3.6, there exists $F\in\mathcal{L}_{\mathrm{h}}$ such that $L_{X}=F-f$, where $f:=\langle F\rangle|_{\mathcal{M}}$. For any $Y\in\mathscr{X}(\mathcal{M})$ we have

$$g(X,Y)=\langle L_{Y},F-f\rangle=\langle L_{Y},F\rangle\stackrel{\star}{=}Y\langle F\rangle|_{\mathcal{M}}=Yf=df(Y), \tag{7.18}$$

where $\stackrel{\star}{=}$ follows from (3.7). This implies that $X=\mathrm{grad}\,f$. Since $X$ is e-parallel, it follows from (iv) $\Rightarrow$ (i) in Prop. 7.2 that $f\in\mathcal{E}(\mathcal{M})$. Thus $\Rightarrow$ in (7.17) has been verified. $\Box$

Define

$$d\mathcal{E}(\mathcal{M}):=\{df\,|\,f\in\mathcal{E}(\mathcal{M})\}\subset\mathcal{D}(\mathcal{M}). \tag{7.19}$$

Since $df=df'\Leftrightarrow f-f'=$ const., we have the natural identification $d\mathcal{E}(\mathcal{M})\simeq\mathcal{E}(\mathcal{M})/\mathbb{R}$. We also define

$$\mathscr{X}_{\text{e-par}}(\mathcal{M}):=\{X\in\mathscr{X}(\mathcal{M})\,|\,X\text{ is e-parallel}\}. \tag{7.20}$$

Then we have the following proposition.

Proposition 7.5.

The correspondence $\stackrel{g|_{\mathcal{M}}}{\longleftrightarrow}$ establishes a linear isomorphism between $\mathscr{X}_{\text{e-par}}(\mathcal{M})$ and $d\mathcal{E}(\mathcal{M})$. As a consequence, we have

$$\dim d\mathcal{E}(\mathcal{M})=\dim\mathcal{E}(\mathcal{M})-1=\dim\mathscr{X}_{\text{e-par}}(\mathcal{M})\leq\dim\mathcal{M}. \tag{7.21}$$

Proof  It suffices to show that, for an arbitrary pair $(X,\omega)\in\mathscr{X}(\mathcal{M})\times\mathcal{D}(\mathcal{M})$ satisfying $X\stackrel{g|_{\mathcal{M}}}{\longleftrightarrow}\omega$, the following equivalence holds:

$$X\in\mathscr{X}_{\text{e-par}}(\mathcal{M})\;\Leftrightarrow\;\exists f\in\mathcal{E}(\mathcal{M}),\;\omega=df. \tag{7.22}$$

Since $\omega=df\Leftrightarrow X=\mathrm{grad}\,f$, this is just Prop. 7.4. $\Box$

Now we present two theorems characterizing e-autoparallelity in terms of $\mathcal{E}(\mathcal{M})$.

Theorem 7.6.

For an $n$-dimensional submanifold $\mathcal{M}$ of $\mathcal{S}$, the following conditions are equivalent.

  • (i)

    $\mathcal{M}$ is e-autoparallel in $\mathcal{S}$.

  • (ii)

    $\dim\mathcal{E}(\mathcal{M})=n+1$.

Proof  We have (i) $\Leftrightarrow$ $\dim\mathscr{X}_{\text{e-par}}(\mathcal{M})=n$ by Prop. 2.4, and $\dim\mathscr{X}_{\text{e-par}}(\mathcal{M})=n$ $\Leftrightarrow$ (ii) by Prop. 7.5. $\Box$

Theorem 7.7.

For an $n$-dimensional model $(\mathcal{M},\xi)$, the following conditions are equivalent.

  • (i)

    $\mathcal{M}$ is e-autoparallel in $\mathcal{S}$, and $\xi$ is an m-affine coordinate system.

  • (ii)

    $\forall i\in\{1,\ldots,n\},\;\xi^{i}\in\mathcal{E}(\mathcal{M})$.

  • (iii)

    $\mathcal{E}(\mathcal{M})=\bigl\{c+\sum_{i=1}^{n}u_{i}\xi^{i}\,\big|\,(c,u_{1},\ldots,u_{n})\in\mathbb{R}^{n+1}\bigr\}$.

Proof  Let $X^{i}:=\mathrm{grad}\,\xi^{i}=\sum_{j}g^{ij}\partial_{j}$. Then we have

$$\text{(i)}\;\Leftrightarrow\;\forall i,\;X^{i}\in\mathscr{X}_{\text{e-par}}(\mathcal{M})\;\Leftrightarrow\;\text{(ii)},$$

where the first $\Leftrightarrow$ follows from Prop. 2.6 and the second $\Leftrightarrow$ follows from Prop. 7.2.

Next, noting that constant functions on $\mathcal{M}$ belong to $\mathcal{E}(\mathcal{M})$, we have

$$\begin{aligned}\text{(ii)}\;&\Leftrightarrow\;\bigl\{c+\sum_{i=1}^{n}u_{i}\xi^{i}\,\big|\,(c,u_{1},\ldots,u_{n})\in\mathbb{R}^{n+1}\bigr\}\subset\mathcal{E}(\mathcal{M})\\&\Leftrightarrow\;\text{(iii)},\end{aligned}$$

where the second $\Leftrightarrow$ follows since $\dim\mathcal{E}(\mathcal{M})\leq n+1$ by (7.21) and $1,\xi^{1},\ldots,\xi^{n}$ are linearly independent. $\Box$

Remark 7.8.

If we replace $\mathcal{S}(\mathcal{H})$ by $\mathcal{P}(\Omega)$ in Theorems 7.6 and 7.7, these theorems remain valid as stated in the classical case. When the coordinate functions $\xi^{1},\ldots,\xi^{n}$ satisfy condition (ii) in Theorem 7.7, they have efficient estimators $F^{1},\ldots,F^{n}$, which in this case are functions $\Omega\rightarrow\mathbb{R}$, and the map $(F^{1},\ldots,F^{n}):\Omega\rightarrow\mathbb{R}^{n}$ becomes an efficient estimator for $\xi=(\xi^{1},\ldots,\xi^{n})$. Thus we see that the equivalence (i) $\Leftrightarrow$ (ii) in the theorem is just Theorem 1.1.
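In the classical case the mechanism of this remark can be seen in its simplest instance, a one-parameter exponential family $p_{\theta}(\omega)=q(\omega)\exp(\theta T(\omega)-\psi(\theta))$ on a finite $\Omega$; the particular $q$, $T$ and $\theta$ below are illustrative assumptions. With the m-affine (expectation) coordinate $\eta=E_{\theta}[T]$, the statistic $T$ is unbiased, and its variance equals the inverse Fisher information w.r.t. $\eta$, i.e. $T$ is efficient for $\eta$. The sketch checks this using a finite-difference $d\eta/d\theta$.

```python
import numpy as np

q = np.array([0.1, 0.4, 0.3, 0.2])      # base distribution on Omega = {0, 1, 2, 3}
T = np.array([0.0, 1.0, 3.0, -2.0])     # statistic T: Omega -> R

def family(theta):
    # p_theta(w) = q(w) exp(theta T(w) - psi(theta)), psi fixed by normalization
    w = q * np.exp(theta * T)
    return w / w.sum()

def eta(theta):
    # m-affine (expectation) coordinate eta = E_theta[T]
    return family(theta) @ T

theta, h = 0.25, 1e-5
p = family(theta)
var_T = p @ (T - eta(theta)) ** 2                    # Var_theta(T)
deta = (eta(theta + h) - eta(theta - h)) / (2 * h)   # d eta / d theta
score_eta = (T - eta(theta)) / deta                  # d log p / d eta
fisher_eta = p @ score_eta ** 2                      # Fisher information w.r.t. eta
print(var_T, 1 / fisher_eta)                         # T attains the Cramer-Rao bound
```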

Finally, we present three propositions that help in understanding the above results in a purely geometric context; their proofs are given in A3 of the Appendix.

Proposition 7.9.

$$\forall F\in\mathcal{L}_{\mathrm{h}},\;\forall\rho\in\mathcal{S},\;V_{\rho}(F):=\bigl\langle(F-\langle F\rangle_{\rho})^{2}\bigr\rangle_{\rho}=\|(d\langle F\rangle)_{\rho}\|_{\rho}^{2}.$$
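Prop. 7.9 can be spot-checked on the full qubit state space $\mathcal{S}$ with Bloch coordinates $\rho=(I+r_{1}\sigma_{x}+r_{2}\sigma_{y}+r_{3}\sigma_{z})/2$, where $\partial_{i}\rho=\sigma_{i}/2$. The sketch below compares $V_{\rho}(F)$ with $\sum_{i,j}g^{ij}\,\partial_{i}\langle F\rangle\,\partial_{j}\langle F\rangle$ computed from the SLD metric; the random Hermitian $F$ and the point $r$ are our assumptions, and this is a numerical illustration, not a proof.

```python
import numpy as np

PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def sld(rho, drho):
    # Solve (L rho + rho L)/2 = drho in the eigenbasis of rho.
    lam, V = np.linalg.eigh(rho)
    Dp = V.conj().T @ drho @ V
    return V @ (2 * Dp / (lam[:, None] + lam[None, :])) @ V.conj().T

rng = np.random.default_rng(4)
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
F = (M + M.conj().T) / 2                            # arbitrary Hermitian observable
r = np.array([0.3, -0.2, 0.5])                      # Bloch vector with |r| < 1
rho = (np.eye(2) + sum(ri * s for ri, s in zip(r, PAULI))) / 2

drhos = [s / 2 for s in PAULI]                      # partial rho / partial r_i
Ls = [sld(rho, D) for D in drhos]
G = np.array([[np.trace(rho @ A @ B).real for B in Ls] for A in Ls])
dF = np.array([np.trace(D @ F).real for D in drhos])    # partial_i <F>
mean = np.trace(rho @ F).real
var_F = np.trace(rho @ F @ F).real - mean ** 2      # V_rho(F)
norm_sq = dF @ np.linalg.solve(G, dF)               # ||(d<F>)_rho||_rho^2
print(var_F, norm_sq)
```

The agreement reflects the fact that, on the full state space, $F-\langle F\rangle_{\rho}$ already lies in the span of the SLDs, so the projection argument of Prop. 7.1 loses nothing.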

Proposition 7.10.

We have

$$\begin{aligned}
\mathcal{E}(\mathcal{S}) &=\{\langle F\rangle\,|\,F\in\mathcal{L}_{\mathrm{h}}\}\\
&=\{f\in\mathcal{F}(\mathcal{S})\,|\,\text{$\mathrm{grad}\,f$ is e-parallel}\}\\
&=\{f\in\mathcal{F}(\mathcal{S})\,|\,\text{$df$ is m-parallel}\},
\end{aligned}\tag{7.23}$$

where a 1-form $\omega\in\mathcal{D}(\mathcal{S})$ is said to be m-parallel when

$$\forall X,Y\in\mathscr{X}(\mathcal{S}),\;(\nabla^{(\mathrm{m})}_{X}\omega)(Y):=X\omega(Y)-\omega\bigl(\nabla^{(\mathrm{m})}_{X}Y\bigr)=0.\tag{7.24}$$

Proposition 7.11.

For an arbitrary submanifold $\mathcal{M}$ of $\mathcal{S}$, we have

$$\mathcal{E}(\mathcal{M})=\Bigl\{f\in\mathcal{F}(\mathcal{M})\,\Big|\,\exists\tilde{f}\in\mathcal{E}(\mathcal{S}),\;f=\tilde{f}|_{\mathcal{M}}\;\text{and}\;\forall\rho\in\mathcal{M},\;\|(df)_{\rho}\|_{\rho}=\|(d\tilde{f})_{\rho}\|_{\rho}\Bigr\}.\tag{7.25}$$

As these propositions suggest, the discussion of $\mathcal{E}(\mathcal{M})$ in this section can be extended to a more general geometric setting. Let us recall the situation treated in Section 2, where a manifold $S$ is equipped with a Riemannian metric $g$ together with mutually dual affine connections $\nabla$ and $\nabla^{*}$ such that $\nabla$ is curvature-free and $\nabla^{*}$ is flat. We define

$$\begin{aligned}
\mathcal{E}(S):= &\;\{f\in\mathcal{F}(S)\,|\,\text{$\mathrm{grad}\,f$ is $\nabla$-parallel}\}\\
= &\;\{f\in\mathcal{F}(S)\,|\,\text{$df$ is $\nabla^{*}$-parallel}\},
\end{aligned}\tag{7.26}$$

where we have invoked the fact that the correspondence $\mathscr{X}(S)\ni X\stackrel{g}{\longleftrightarrow}\omega\in\mathcal{D}(S)$ implies (cf. (A.7))

$$\text{$X$ is $\nabla$-parallel}\;\Leftrightarrow\;\text{$\omega$ is $\nabla^{*}$-parallel}.\tag{7.27}$$

Given a submanifold $M$ of $S$, let

$$\mathcal{E}(M):=\Bigl\{f\in\mathcal{F}(M)\,\Big|\,\exists\tilde{f}\in\mathcal{E}(S),\;f=\tilde{f}|_{M}\;\text{and}\;\forall\rho\in M,\;\|(df)_{\rho}\|_{\rho}=\|(d\tilde{f})_{\rho}\|_{\rho}\Bigr\}.\tag{7.28}$$

Then it is not difficult to verify that Theorems 7.6 and 7.7, together with their proofs, extend to this general situation almost verbatim.

It should be noted that the flatness of $\nabla^{*}$ is essential, since it ensures $\dim\mathcal{E}(S)=\dim S+1$. To clarify the role of flatness, let us consider a more general situation in which we drop the assumption that $\nabla$ is curvature-free and $\nabla^{*}$ is flat, and assume only that they are dual w.r.t. $g$. We start from the following general identity: for any 1-form $\omega\in\mathcal{D}(S)$ and any vector fields $X,Y\in\mathscr{X}(S)$, we have

$$\begin{aligned}
(d\omega)(X,Y) &:=X\omega(Y)-Y\omega(X)-\omega([X,Y])\\
&=(\nabla^{*}_{X}\omega)(Y)+\omega(\nabla^{*}_{X}Y)-(\nabla^{*}_{Y}\omega)(X)-\omega(\nabla^{*}_{Y}X)-\omega([X,Y])\\
&=(\nabla^{*}_{X}\omega)(Y)-(\nabla^{*}_{Y}\omega)(X)+\omega\bigl(\mathcal{T}^{(\nabla^{*})}(X,Y)\bigr),
\end{aligned}\tag{7.29}$$

where $\mathcal{T}^{(\nabla^{*})}$ denotes the torsion of $\nabla^{*}$: $\mathcal{T}^{(\nabla^{*})}(X,Y)=\nabla^{*}_{X}Y-\nabla^{*}_{Y}X-[X,Y]$. When $\nabla^{*}$ is torsion-free, this implies that for any $\omega\in\mathcal{D}(S)$

$$\text{$\omega$ is $\nabla^{*}$-parallel}\;\Rightarrow\;d\omega=0\;\Leftrightarrow\;\exists f\in\mathcal{F}(S),\;\omega=df\tag{7.30}$$

(see Remark 1.4) and that for any $X\in\mathscr{X}(S)$

$$\text{$X$ is $\nabla$-parallel}\;\Rightarrow\;\exists f\in\mathcal{F}(S),\;X=\mathrm{grad}\,f.\tag{7.31}$$

This leads to

$$d\mathcal{E}(S):=\{df\,|\,f\in\mathcal{E}(S)\}=\{\omega\,|\,\text{$\omega$ is $\nabla^{*}$-parallel}\}\tag{7.32}$$

and hence

$$\dim\mathcal{E}(S)=\dim\{\omega\,|\,\text{$\omega$ is $\nabla^{*}$-parallel}\}+1=\dim\mathscr{X}_{\text{$\nabla$-par}}(S)+1,\tag{7.33}$$

where $\mathscr{X}_{\text{$\nabla$-par}}(S)$ denotes the totality of $\nabla$-parallel vector fields on $S$. If, in addition, $\nabla$ is curvature-free, which is equivalent to the flatness of $\nabla^{*}$, then $\dim\mathscr{X}_{\text{$\nabla$-par}}(S)=\dim S$, and we obtain $\dim\mathcal{E}(S)=\dim S+1$.

8 Integrability conditions

Consider the general situation where an affine connection $\nabla$ is given on a manifold $S$. For an arbitrary point $p\in S$ and an arbitrary 1-dimensional subspace $V$ of the tangent space $T_{p}(S)$, there always exists a $\nabla$-autoparallel curve, i.e. a $\nabla$-geodesic, that passes through $p$ in the direction $V$. For the existence of multi-dimensional autoparallel submanifolds, the situation differs greatly depending on whether or not $\nabla$ is flat. When $\nabla$ is flat (curvature-free and torsion-free), the autoparallel submanifolds are those determined by arbitrary affine constraints on $\nabla$-affine coordinates. This ensures that, for an arbitrary point $p\in S$ and an arbitrary linear subspace $V$ of $T_{p}(S)$, there exists a unique $\nabla$-autoparallel submanifold $M$ satisfying $p\in M$ and $T_{p}(M)=V$. This is the case for the e-connection on the space $\mathcal{P}$ of probability distributions, whose autoparallel submanifolds are the exponential families. When $\nabla$ is not flat, on the other hand, the existence of multi-dimensional autoparallel submanifolds is not guaranteed in general. In this section we investigate conditions for the existence of autoparallel submanifolds.

Let us consider the case where $\nabla$ is curvature-free, as is the e-connection on $\mathcal{S}(\mathcal{H})$. According to (i) $\Leftrightarrow$ (iv) of Prop. 2.4, an $n$-dimensional submanifold $M$ of $S$ is $\nabla$-autoparallel iff there exist $n$ linearly independent $\nabla$-parallel vector fields $X^{1},\ldots,X^{n}\in\mathscr{X}(S)$ whose restrictions $X^{1}|_{M},\ldots,X^{n}|_{M}$ belong to $\mathscr{X}(M)$. This means that $M$ is an integral manifold of $\{X^{1},\ldots,X^{n}\}$, or equivalently that $M$ is an integral manifold of the $n$-dimensional distribution

$$\mathcal{V}:S\ni p\mapsto V_{p}:=\mathrm{span}\{X^{1}_{p},\ldots,X^{n}_{p}\}\subset T_{p}(S),\tag{8.1}$$

which is $\nabla$-parallel in the sense that $\forall p,q\in S,\;\Phi^{(\nabla)}_{p,q}(V_{p})=V_{q}$, where $\Phi^{(\nabla)}_{p,q}$ denotes the parallel transport w.r.t. $\nabla$.

Proposition 8.1.

Suppose that we are given a manifold $S$ with a curvature-free connection $\nabla$ and an $n$-dimensional $\nabla$-parallel distribution $\mathcal{V}:S\ni p\mapsto V_{p}$. Define

$$\mathscr{X}(S\colon\mathcal{V}):=\{X\in\mathscr{X}(S)\,|\,\forall p\in S,\,X_{p}\in V_{p}\}\tag{8.2}$$

and

$$\mathscr{X}_{\text{$\nabla$-par}}(S\colon\mathcal{V}):=\{X\in\mathscr{X}(S\colon\mathcal{V})\,|\,\text{$X$ is $\nabla$-parallel}\}.\tag{8.3}$$

Then the following conditions are equivalent.

  • (i)

    For every $p\in S$, there exists a $\nabla$-autoparallel submanifold $M$ of $S$ satisfying $p\in M$ and $T_{p}(M)=V_{p}$.

  • (ii)

    The distribution $\mathcal{V}$ is involutive in the sense that $\forall X,Y\in\mathscr{X}(S\colon\mathcal{V}),\;[X,Y]\in\mathscr{X}(S\colon\mathcal{V})$.

  • (iii)

    $\forall X,Y\in\mathscr{X}_{\text{$\nabla$-par}}(S\colon\mathcal{V}),\;[X,Y]\in\mathscr{X}(S\colon\mathcal{V})$.

  • (iv)

    The torsion $\mathcal{T}^{(\nabla)}$ of $\nabla$ satisfies $\forall p\in S,\;\mathcal{T}^{(\nabla)}_{p}(V_{p}\times V_{p})\subset V_{p}$.

Proof  Condition (i) is equivalent to the condition that for any point $p\in S$ there exists an integral manifold of $\mathcal{V}$ containing $p$, which in turn is equivalent to (ii) by the Frobenius theorem.

(ii) $\Rightarrow$ (iii) is obvious, and (iii) $\Rightarrow$ (ii) follows since there exist $n$ linearly independent $\nabla$-parallel vector fields $\{X_{1},\ldots,X_{n}\}\subset\mathscr{X}_{\text{$\nabla$-par}}(S\colon\mathcal{V})$, whereby every element of $\mathscr{X}(S\colon\mathcal{V})$ is expressed as $\sum_{i}f_{i}X_{i}$ with some functions $\{f_{1},\ldots,f_{n}\}\subset\mathcal{F}(S)$; by the Leibniz rule for the bracket, $[X,Y]$ for $X,Y\in\mathscr{X}(S\colon\mathcal{V})$ is then an $\mathcal{F}(S)$-linear combination of the $X_{i}$ and the $[X_{i},X_{j}]$.

For any $\nabla$-parallel vector fields $X$ and $Y$, we have

$$\mathcal{T}^{(\nabla)}(X,Y):=\nabla_{X}Y-\nabla_{Y}X-[X,Y]=-[X,Y].\tag{8.4}$$

Hence (iii) is equivalent to

$$\forall X,Y\in\mathscr{X}_{\text{$\nabla$-par}}(S\colon\mathcal{V}),\;\;\mathcal{T}^{(\nabla)}(X,Y)\in\mathscr{X}(S\colon\mathcal{V}).\tag{8.5}$$

Since $\mathcal{T}^{(\nabla)}$ is a tensor field, $(\mathcal{T}^{(\nabla)}(X,Y))_{p}=\mathcal{T}^{(\nabla)}_{p}(X_{p},Y_{p})$ holds at each point $p$; moreover, the values $X_{p}$ of the fields $X\in\mathscr{X}_{\text{$\nabla$-par}}(S\colon\mathcal{V})$ exhaust $V_{p}$. Hence (8.5) is equivalent to (iv). $\Box$

Remark 8.2.

Condition (i) in Prop. 8.1 (and hence each of (ii)-(iv)) means that there exists a foliation $S=\bigsqcup_{\alpha}M_{\alpha}$ such that each leaf $M_{\alpha}$ is $\nabla$-autoparallel in $S$ and satisfies $T_{p}(M_{\alpha})=V_{p}$ for every $p\in M_{\alpha}$.

The following proposition is an immediate consequence of (i) $\Leftrightarrow$ (iv) in Prop. 8.1.

Proposition 8.3.

For a manifold $S$ with a curvature-free connection $\nabla$, the following conditions are equivalent.

  • (i)

    For every point $p\in S$ and every linear subspace $V$ of $T_{p}(S)$, there exists a $\nabla$-autoparallel submanifold $M$ satisfying $p\in M$ and $T_{p}(M)=V$.

  • (ii)

    $\forall p\in S,\;\forall X_{p},Y_{p}\in T_{p}(S),\;\mathcal{T}^{(\nabla)}_{p}(X_{p},Y_{p})\in\mathrm{span}\{X_{p},Y_{p}\}$, or equivalently, $\forall X,Y\in\mathscr{X}(S),\;\mathcal{T}^{(\nabla)}(X,Y)\in\mathrm{span}_{\mathcal{F}(S)}\{X,Y\}:=\{fX+gY\,|\,f,g\in\mathcal{F}(S)\}$.

Let us apply the above considerations to $\mathcal{S}=\mathcal{S}(\mathcal{H})$ with the SLD structure and its submanifolds. Let $\mathcal{A}$ be an arbitrary linear subspace of $\mathcal{L}_{\mathrm{h}}$, and define for each point $\rho\in\mathcal{S}$

$$\begin{aligned}
V_{\mathcal{A},\rho} &:=\{X_{\rho}\in T_{\rho}(\mathcal{S})\,|\,\exists A\in\mathcal{A},\;L_{X_{\rho}}=A-\langle A\rangle_{\rho}\}\\
&=\{X_{\rho}\in T_{\rho}(\mathcal{S})\,|\,L_{X_{\rho}}\in\mathcal{A}+\mathbb{R}\},
\end{aligned}\tag{8.6}$$

where $\mathbb{R}$ is identified with $\{cI\,|\,c\in\mathbb{R}\}$. Then $\mathcal{V}_{\mathcal{A}}:\mathcal{S}\ni\rho\mapsto V_{\mathcal{A},\rho}$ defines an e-parallel distribution on $\mathcal{S}$, whose dimension $\dim V_{\mathcal{A},\rho}$ equals $\dim\mathcal{A}$ when $I\notin\mathcal{A}$ and $\dim\mathcal{A}-1$ otherwise. Every e-parallel distribution on $\mathcal{S}$ is represented as $\mathcal{V}_{\mathcal{A}}$ for some $\mathcal{A}$, and $\mathcal{V}_{\mathcal{A}}=\mathcal{V}_{\mathcal{A}'}$ iff $\mathcal{A}+\mathbb{R}=\mathcal{A}'+\mathbb{R}$. This means that $\mathcal{A}\mapsto\mathcal{V}_{\mathcal{A}}$ establishes a one-to-one correspondence between linear subspaces of the quotient space $\mathcal{L}_{\mathrm{h}}/\mathbb{R}$ and e-parallel distributions on $\mathcal{S}$.

Theorem 8.4.

Given a subspace $\mathcal{A}\subset\mathcal{L}_{\mathrm{h}}$, the following conditions are equivalent.

  • (i)

    For every $\rho\in\mathcal{S}$, there exists an e-autoparallel submanifold $\mathcal{M}$ of $\mathcal{S}$ satisfying $\rho\in\mathcal{M}$ and $T_{\rho}(\mathcal{M})=V_{\mathcal{A},\rho}$.

  • (ii)

    For every $\rho\in\mathcal{S}$,

    $$\{[[A,B],\rho]\,|\,A,B\in\mathcal{A}\}\subset\{\rho\circ C\,|\,C\in\mathcal{A}+\mathbb{R}\}.\tag{8.7}$$

Proof  From (3.26) it follows that for any $X_{\rho},Y_{\rho},Z_{\rho}\in T_{\rho}(\mathcal{S})$

$$\mathcal{T}_{\rho}^{(\mathrm{e})}(X_{\rho},Y_{\rho})=Z_{\rho}\;\Leftrightarrow\;\frac{1}{4}[[L_{X_{\rho}},L_{Y_{\rho}}],\rho]=\iota_{*}(Z_{\rho})=\rho\circ L_{Z_{\rho}}.$$

Hence, noting that $[A-\langle A\rangle_{\rho},B-\langle B\rangle_{\rho}]=[A,B]$, we obtain (i) $\Leftrightarrow$ (ii) from (i) $\Leftrightarrow$ (iv) in Prop. 8.1. $\Box$

Let $F^{1},\ldots,F^{n}$ be Hermitian operators on $\mathcal{H}$ such that $\forall i,j,\;[F^{i},F^{j}]=0$ and $\{F^{1},\ldots,F^{n},I\}$ are linearly independent (cf. Remark 3.9), and let $\mathcal{A}:=\mathrm{span}\{F^{1},\ldots,F^{n}\}\;(\not\ni I)$. Then for any $\rho\in\mathcal{S}$ and any $A,B\in\mathcal{A}$ we have $[[A,B],\rho]=0$, so that (ii) in Theorem 8.4 trivially holds. Hence the distribution $\mathcal{V}_{\mathcal{A}}$ is integrable, and we obtain a foliation $\mathcal{S}=\bigsqcup_{\alpha}\mathcal{M}_{\alpha}$ whose leaves $\{\mathcal{M}_{\alpha}\}$ are $n$-dimensional quasi-exponential families of the form (4.15).
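Condition (8.7) can also be probed numerically at a given state. The Python/NumPy sketch below takes a basis of $\mathcal{A}$, forms $[[A,B],\rho]$ for each pair of basis elements (enough, because both sides of (8.7) are linear), and measures by real least squares how far it lies from $\{\rho\circ C\,|\,C\in\mathcal{A}+\mathbb{R}\}$. A single sampled $\rho$ can refute (ii) but cannot prove it, since (ii) quantifies over all states. The function names and the random state are hypothetical; the commuting diagonal $F^{1},F^{2}$ illustrate the case just discussed.

```python
import numpy as np

def jordan(X, Y):
    """Symmetrised product X o Y = (XY + YX)/2."""
    return (X @ Y + Y @ X) / 2

def comm(X, Y):
    return X @ Y - Y @ X

def as_real_vector(M):
    """Flatten a complex matrix into a real vector (real parts, then imaginary parts)."""
    v = np.asarray(M, dtype=complex).ravel()
    return np.concatenate([v.real, v.imag])

def condition_8_7_residual(A_basis, rho):
    """Worst least-squares residual of fitting [[A,B],rho] by rho o C with C in A + R.

    A (near-)zero value means (8.7) holds at this one rho only.
    """
    d = len(rho)
    C_basis = list(A_basis) + [np.eye(d)]
    M = np.column_stack([as_real_vector(jordan(rho, C)) for C in C_basis])
    worst = 0.0
    for A in A_basis:
        for B in A_basis:
            t = as_real_vector(comm(comm(A, B), rho))
            coef, *_ = np.linalg.lstsq(M, t, rcond=None)  # real coefficients of C
            worst = max(worst, np.linalg.norm(M @ coef - t))
    return worst

rng = np.random.default_rng(0)
d = 3
Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = Z @ Z.conj().T + 0.1 * np.eye(d)
rho /= np.trace(rho).real
# Commuting F^1, F^2 with {F^1, F^2, I} linearly independent
F1, F2 = np.diag([1.0, 0.0, 0.0]), np.diag([0.0, 1.0, 0.0])
print(condition_8_7_residual([F1, F2], rho))  # 0.0, since all commutators vanish
```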

Remark 8.5.

Recall the situation in which we defined the quantum Gaussian shift model $\mathcal{M}=\{\rho_{\xi}\,|\,\xi=(\xi^{1},\ldots,\xi^{n})\in\mathbb{R}^{n}\}$ in Section 6, and let $\mathcal{A}:=\mathrm{span}\{R(z_{1}),\ldots,R(z_{n})\}$. Then, for any $A,B\in\mathcal{A}$, we have $[A,B]=cI$ with a purely imaginary constant $c$, and hence $[[A,B],\rho]=0$. So (ii) in Theorem 8.4 holds at least formally, and the Gaussian model may be regarded as an integral manifold of $\mathcal{V}_{\mathcal{A}}$. Note, however, that Theorem 8.4 does not carry over as it stands to the infinite-dimensional case, where (ii) does not by itself imply (i): various mathematical problems arise that are absent in the finite-dimensional case, such as the fact that a positive operator need not have finite trace and hence need not be normalizable.

Remark 8.6.

Let us revisit the relationship between autoparallelity and total geodesicness described in Remark 2.2 in the context of Prop. 8.3. Suppose that $(S,\nabla)$ satisfies conditions (i)-(ii) of Prop. 8.3 and that a submanifold $M$ of $S$ is $\nabla$-totally geodesic. Given an arbitrary point $p\in M$, condition (i) yields a $\nabla$-autoparallel submanifold $N$ satisfying $p\in N$ and $T_{p}(N)=T_{p}(M)$. Since $N$ is also $\nabla$-totally geodesic, we have $M=N$, so that $M$ is $\nabla$-autoparallel. Namely, condition (ii) together with the curvature-freeness of $\nabla$ implies the equivalence between $\nabla$-autoparallelity and $\nabla$-total geodesicness. In fact, curvature-freeness is unnecessary: the equivalence follows from condition (ii) alone. See Appendix A4 for details.

At the end of this section, we give an example of an e-autoparallel submanifold that does not fall within the scope of Theorem 8.4. Let $\mathcal{B}=\{|1\rangle,\ldots,|d\rangle\}$, with $d=\dim\mathcal{H}$, be an arbitrary orthonormal basis of $\mathcal{H}$, and let $\mathcal{L}^{\mathcal{B}}:=\{\sum_{i,j}a_{ij}|i\rangle\langle j|\,|\,[a_{ij}]\in\mathbb{R}^{d\times d}\}$, $\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}:=\mathcal{L}_{\mathrm{h}}\cap\mathcal{L}^{\mathcal{B}}$ and $\mathcal{S}^{\mathcal{B}}:=\mathcal{S}\cap\mathcal{L}^{\mathcal{B}}$.

Proposition 8.7.

$\mathcal{S}^{\mathcal{B}}$ is e-autoparallel in $\mathcal{S}$.

Proof  It is easy to see that for each $\rho\in\mathcal{S}^{\mathcal{B}}$

$$T^{(\mathrm{m})}_{\rho}(\mathcal{S}^{\mathcal{B}}):=\{\iota_{*}(X_{\rho})\,|\,X_{\rho}\in T_{\rho}(\mathcal{S}^{\mathcal{B}})\}=\{A\in\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}\,|\,\mathrm{Tr}A=0\}\tag{8.8}$$

and that

$$\begin{aligned}
T^{(\mathrm{e})}_{\rho}(\mathcal{S}^{\mathcal{B}}) &:=\{L_{X_{\rho}}\,|\,X_{\rho}\in T_{\rho}(\mathcal{S}^{\mathcal{B}})\}\\
&=\{A\in\mathcal{L}_{\mathrm{h}}\,|\,\exists B\in T^{(\mathrm{m})}_{\rho}(\mathcal{S}^{\mathcal{B}}),\;B=\rho\circ A\}\\
&=\{A\in\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}\,|\,\langle A\rangle_{\rho}=0\}.
\end{aligned}\tag{8.9}$$

(See Remark 3.1 for the symbols $T^{(\mathrm{m})}$ and $T^{(\mathrm{e})}$.) It follows from (8.9) that, for $\rho,\sigma\in\mathcal{S}^{\mathcal{B}}$, $A\in T^{(\mathrm{e})}_{\rho}(\mathcal{S}^{\mathcal{B}})\;\Leftrightarrow\;A-\langle A\rangle_{\sigma}\in T^{(\mathrm{e})}_{\sigma}(\mathcal{S}^{\mathcal{B}})$, and hence we have from (3.30) that $X_{\rho}\in T_{\rho}(\mathcal{S}^{\mathcal{B}})\;\Leftrightarrow\;\Phi_{\rho,\sigma}^{(\mathrm{e})}(X_{\rho})\in T_{\sigma}(\mathcal{S}^{\mathcal{B}})$. This proves the proposition by Prop. 2.4. $\Box$
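The step from (8.8) to (8.9), namely that at a real $\rho\in\mathcal{S}^{\mathcal{B}}$ the SLD of a real symmetric traceless $\iota_{*}(X_{\rho})$ is again real symmetric with $\langle L\rangle_{\rho}=0$, can be checked numerically. The Python/NumPy sketch below assumes $\mathcal{B}$ is the standard basis, so that $\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}$ consists of the real symmetric matrices, and solves $\rho L+L\rho=2T$ by Kronecker-product vectorisation; the helper name `sld` and the random data are illustrative. The matrices are deliberately stored as complex so that realness of the solution is an outcome, not an artefact of the data type.

```python
import numpy as np

def sld(rho, T):
    """Solve rho o L = T (i.e. rho L + L rho = 2T) for the SLD L.

    Row-major vectorisation gives vec(rho L) = kron(rho, I) vec(L) and
    vec(L rho) = kron(I, rho^T) vec(L); the system is uniquely solvable for rho > 0.
    """
    d = len(rho)
    I = np.eye(d)
    K = np.kron(rho, I) + np.kron(I, rho.T)
    return np.linalg.solve(K, 2 * T.ravel()).reshape(d, d)

rng = np.random.default_rng(1)
d = 4
G = rng.normal(size=(d, d))
rho = (G @ G.T + 0.1 * np.eye(d)).astype(complex)
rho /= np.trace(rho).real                              # real symmetric, strictly positive
S = rng.normal(size=(d, d))
T = (S + S.T) / 2
T = (T - np.trace(T) / d * np.eye(d)).astype(complex)  # an element of (8.8)
L = sld(rho, T)
# imaginary part, asymmetry and <L>_rho should all vanish up to rounding
print(np.abs(L.imag).max(), np.abs(L - L.T).max(), abs(np.trace(rho @ L)))
```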

Let us examine whether the e-autoparallelity of $\mathcal{S}^{\mathcal{B}}$ can be understood as an instance of Theorem 8.4. Namely, the question is whether $\mathcal{S}^{\mathcal{B}}$ is an integral manifold of an e-parallel distribution $\mathcal{V}_{\mathcal{A}}$ for some $\mathcal{A}$ satisfying condition (ii) in Theorem 8.4. For each $\rho\in\mathcal{S}^{\mathcal{B}}$, we have $T^{(\mathrm{e})}_{\rho}(\mathcal{S}^{\mathcal{B}})=\{A-\langle A\rangle_{\rho}\,|\,A\in\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}\}$ (see (8.9)), which means that $T_{\rho}(\mathcal{S}^{\mathcal{B}})=V_{\mathcal{L}_{\mathrm{h}}^{\mathcal{B}},\rho}$ and that $\mathcal{S}^{\mathcal{B}}$ is an integral manifold of the distribution $\mathcal{V}_{\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}}$. Noting that $\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}+\mathbb{R}=\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}$, the question reduces to whether

$$\{[[A,B],\rho]\,|\,A,B\in\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}\}\subset\{\rho\circ C\,|\,C\in\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}\}\tag{8.10}$$

holds for every $\rho\in\mathcal{S}$. The answer is no, except when $\dim\mathcal{H}=2$.

Proposition 8.8.

When $\dim\mathcal{H}\geq 3$,

$$\exists\rho\in\mathcal{S},\;\{[[A,B],\rho]\,|\,A,B\in\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}\}\not\subset\{\rho\circ C\,|\,C\in\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}\}.\tag{8.11}$$

As a consequence, the distribution $\mathcal{V}_{\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}}$ is not involutive.

Proof  We represent operators on a $d$-dimensional Hilbert space by $d\times d$ matrices, and show that when $d\geq 3$ there exist a strictly positive density matrix $\rho$ and real symmetric matrices $A,B$ such that $[[A,B],\rho]$ cannot be represented as $\rho\circ C$ by any real symmetric $C$. Let

$$A_{1}:=\begin{pmatrix}1&0&0\\0&1&0\\0&0&0\end{pmatrix},\quad B_{1}:=\begin{pmatrix}0&1&1\\1&0&1\\1&1&0\end{pmatrix},\quad P_{1}:=\begin{pmatrix}1&i\varepsilon&i\varepsilon\\-i\varepsilon&1&i\varepsilon\\-i\varepsilon&-i\varepsilon&1\end{pmatrix},$$

where $i:=\sqrt{-1}$ and $\varepsilon$ is an arbitrary real number, and let $A,B$ and $\rho$ be the $d\times d$ matrices with the block representations

$$A:=\left(\begin{array}{c|c}A_{1}&0\\\hline 0&0\end{array}\right),\quad B:=\left(\begin{array}{c|c}B_{1}&0\\\hline 0&0\end{array}\right),\quad\rho:=\frac{1}{d}\left(\begin{array}{c|c}P_{1}&0\\\hline 0&I\end{array}\right).$$

Then $A,B$ are real symmetric, and $\rho$ is Hermitian with trace 1 and strictly positive when $|\varepsilon|$ is sufficiently small. A direct calculation shows that

$$[[A,B],\rho]=\frac{1}{d}\left(\begin{array}{c|c}Q_{1}&0\\\hline 0&0\end{array}\right)\quad\text{with}\quad Q_{1}:=[[A_{1},B_{1}],P_{1}]=\begin{pmatrix}0&0&-i\varepsilon\\0&0&i\varepsilon\\i\varepsilon&-i\varepsilon&0\end{pmatrix}.$$

Suppose that a $d\times d$ real symmetric matrix $C$ satisfies $[[A,B],\rho]=\rho\circ C$. Letting $C_{1}$ be the upper-left $3\times 3$ block of $C$, the upper-left $3\times 3$ block of $\rho\circ C$ equals $\frac{1}{d}P_{1}\circ C_{1}$. Hence we have $Q_{1}=P_{1}\circ C_{1}$, which is rewritten as

$$i\varepsilon\begin{pmatrix}0&0&-1\\0&0&1\\1&-1&0\end{pmatrix}=C_{1}+i\varepsilon\begin{pmatrix}0&1&1\\-1&0&1\\-1&-1&0\end{pmatrix}\circ C_{1}.$$

Since $C_{1}\in\mathbb{R}^{3\times 3}$ and $\varepsilon\in\mathbb{R}$, the second term on the right-hand side is purely imaginary; comparing real parts gives $C_{1}=0$, and then comparing imaginary parts gives $\varepsilon=0$. Therefore, if we take $\varepsilon\neq 0$, no real symmetric $C$ satisfies $[[A,B],\rho]=\rho\circ C$. $\Box$
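The counterexample can be reproduced numerically. The Python/NumPy sketch below builds $A$, $B$ and $\rho$ from the block representations in the proof, recomputes $Q_{1}$, and measures by real least squares the distance from $[[A,B],\rho]$ to $\{\rho\circ C\,|\,C\ \text{real symmetric}\}$. The function names and the choice $\varepsilon=0.1$ are illustrative assumptions; a strictly positive distance for $\varepsilon\neq 0$ is what the proposition predicts, while $\varepsilon=0$ gives $\rho=I/d$ and distance zero.

```python
import numpy as np

def jordan(X, Y):
    return (X @ Y + Y @ X) / 2

def comm(X, Y):
    return X @ Y - Y @ X

def counterexample(d, eps):
    """A, B, rho of the proof, with the 3 x 3 blocks embedded in d x d matrices (d >= 3)."""
    A1 = np.diag([1.0, 1.0, 0.0])
    B1 = np.ones((3, 3)) - np.eye(3)
    P1 = np.eye(3) + 1j * eps * np.array([[0, 1, 1], [-1, 0, 1], [-1, -1, 0]])
    A = np.zeros((d, d), dtype=complex); A[:3, :3] = A1
    B = np.zeros((d, d), dtype=complex); B[:3, :3] = B1
    rho = np.eye(d, dtype=complex); rho[:3, :3] = P1
    return A, B, rho / d

def distance_to_jordan_image(d, eps):
    """Least-squares distance from [[A,B],rho] to {rho o C : C real symmetric}."""
    A, B, rho = counterexample(d, eps)
    def vec(M):
        v = M.ravel()
        return np.concatenate([v.real, v.imag])
    cols = []
    for j in range(d):
        for k in range(j, d):
            C = np.zeros((d, d)); C[j, k] = C[k, j] = 1.0  # basis of real symmetric C
            cols.append(vec(jordan(rho, C)))
    M = np.column_stack(cols)
    t = vec(comm(comm(A, B), rho))
    coef, *_ = np.linalg.lstsq(M, t, rcond=None)
    return np.linalg.norm(M @ coef - t)

A, B, rho = counterexample(3, 0.1)
print(np.round(3 * comm(comm(A, B), rho), 12))   # compare with Q_1 at eps = 0.1
print(np.linalg.eigvalsh(rho).min() > 0)          # True: rho is strictly positive
print(distance_to_jordan_image(3, 0.1), distance_to_jordan_image(3, 0.0))
```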

The above result implies that the e-parallel distribution $\mathcal{V}_{\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}}$ does not induce a foliation with e-autoparallel leaves and that $\mathcal{S}^{\mathcal{B}}$ is an isolated integral manifold of $\mathcal{V}_{\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}}$ when $\dim\mathcal{H}\geq 3$. The exceptional case $\dim\mathcal{H}=2$ will be discussed in the next section.

Remark 8.9.

Prop. 8.7 holds for a wide class of information-geometric structures, not only for the SLD structure. In fact, the proof of Prop. 8.7 given above relies only on the fact that if $\rho\in\mathcal{S}^{\mathcal{B}}$ and $X_{\rho}\in T_{\rho}(\mathcal{S}^{\mathcal{B}})$, then $L_{X_{\rho}}\in\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}$. By (3.8), which states that $\iota_{*}(X_{\rho})=\Omega_{\rho}(L_{X_{\rho}})$, this fact is shared by the e-connection defined from an arbitrary family of inner products $\langle\cdot,\cdot\rangle_{\rho}=\langle\cdot,\Omega_{\rho}(\cdot)\rangle_{\mathrm{HS}}$, $\rho\in\mathcal{S}$, such that

$$\forall\rho\in\mathcal{S}^{\mathcal{B}},\;\Omega_{\rho}(\mathcal{L}_{\mathrm{h}}^{\mathcal{B}})=\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}.\tag{8.12}$$

This means that Prop. 8.7 holds under this mild condition on $\{\Omega_{\rho}\}_{\rho\in\mathcal{S}}$. In particular, if $\Omega_{\rho}$ is represented in the form (3.21) by a function $f:(0,\infty)\rightarrow(0,\infty)$ such that $\forall x>0,\,xf(1/x)=f(x)$ and $f(1)=1$, as in the case of monotone metrics, then condition (8.12) is satisfied. To verify this, we write (3.21) as $\Omega_{\rho}=f(\Delta_{\rho})\mathcal{R}_{\rho}$, where $\mathcal{R}_{\rho}:A\mapsto A\rho$, and regard $\Omega_{\rho}$ as a $\mathbb{C}$-linear map $\mathcal{L}\rightarrow\mathcal{L}$. Then it is easy to see that if $\rho\in\mathcal{S}^{\mathcal{B}}$, then $\mathcal{R}_{\rho}(\mathcal{L}^{\mathcal{B}})=\mathcal{L}^{\mathcal{B}}$ and $\Delta_{\rho}(\mathcal{L}^{\mathcal{B}})=\mathcal{L}^{\mathcal{B}}$, which yields $f(\Delta_{\rho})(\mathcal{L}^{\mathcal{B}})=\mathcal{L}^{\mathcal{B}}$, and hence $\Omega_{\rho}(\mathcal{L}^{\mathcal{B}})=\mathcal{L}^{\mathcal{B}}$. Combined with $\Omega_{\rho}(\mathcal{L}_{\mathrm{h}})=\mathcal{L}_{\mathrm{h}}$, this proves (8.12).

Remark 8.10.

Since (8.8) shows that $T^{(\mathrm{m})}_{\rho}(\mathcal{S}^{\mathcal{B}})=\iota_{*}(T_{\rho}(\mathcal{S}^{\mathcal{B}}))$ does not depend on $\rho$, $\mathcal{S}^{\mathcal{B}}$ is m-autoparallel in $\mathcal{S}$, so that $\mathcal{S}^{\mathcal{B}}$ is doubly autoparallel (e.g., [10]) w.r.t. the e- and m-connections. This example stands in remarkable contrast to the following fact in the classical case [11]: if a submanifold $\mathcal{M}$ of $\mathcal{P}(\Omega)$, where $\Omega$ is an arbitrary finite set, is doubly autoparallel in $\mathcal{P}(\Omega)$ w.r.t. the e- and m-connections, then $\mathcal{M}$ is statistically isomorphic to $\mathcal{P}(\Omega')$ for some finite set $\Omega'$.

9 Qubit manifolds

Throughout this section, we assume $\mathcal{H}$ to be 2-dimensional. To begin with, we make some preparations. Let $\{\sigma_{1},\sigma_{2},\sigma_{3}\}\subset\mathcal{L}_{\mathrm{h}}$ be a triple of Pauli operators such that

$$\mathrm{Tr}\,\sigma_{i}=0,\quad\sigma_{i}^{2}=I,\quad\text{and}\quad\sigma_{i}\sigma_{i+1}=\sqrt{-1}\,\sigma_{i+2}\quad(i:\mathrm{mod}\ 3).\tag{9.1}$$

Then $\{\sigma_{1},\sigma_{2},\sigma_{3}\}$ form a basis of $\mathcal{L}_{\mathrm{h},0}$. For any $\vec{a}=(a_{i})\in\mathbb{R}^{3}$, we write $\vec{a}\cdot\vec{\sigma}:=\sum_{i}a_{i}\sigma_{i}$, so that we have

$$\mathcal{L}_{\mathrm{h},0}=\{\vec{a}\cdot\vec{\sigma}\,|\,\vec{a}\in\mathbb{R}^{3}\}.\tag{9.2}$$

It follows that

$$\begin{aligned}
(\vec{a}\cdot\vec{\sigma})(\vec{b}\cdot\vec{\sigma}) &=(\vec{a}\cdot\vec{b})I+\sqrt{-1}\,(\vec{a}\times\vec{b})\cdot\vec{\sigma}, &&\text{(9.3)}\\
(\vec{a}\cdot\vec{\sigma})\circ(\vec{b}\cdot\vec{\sigma}) &=(\vec{a}\cdot\vec{b})I, &&\text{(9.4)}\\
[\vec{a}\cdot\vec{\sigma},\vec{b}\cdot\vec{\sigma}] &=2\sqrt{-1}\,(\vec{a}\times\vec{b})\cdot\vec{\sigma}, &&\text{(9.5)}
\end{aligned}$$

where $\vec{a}\cdot\vec{b}=\sum_{i}a_{i}b_{i}$, and $\vec{a}\times\vec{b}=\vec{c}\;\Leftrightarrow\;\forall i:\mathrm{mod}\ 3,\;a_{i}b_{i+1}-a_{i+1}b_{i}=c_{i+2}$. The manifold $\mathcal{S}=\mathcal{S}(\mathcal{H})$ is represented as

$$\mathcal{S}=\{\rho_{\vec{r}}\,|\,\vec{r}\in\mathcal{R}\},\tag{9.6}$$

where

$$\rho_{\vec{r}}:=\frac{1}{2}(I+\vec{r}\cdot\vec{\sigma}),\quad\mathcal{R}:=\{\vec{r}\in\mathbb{R}^{3}\,|\,\|\vec{r}\|:=\sqrt{\vec{r}\cdot\vec{r}}<1\}.\tag{9.7}$$

For $\rho=\rho_{\vec{r}}$ and $A=a_{0}I+\vec{a}\cdot\vec{\sigma}$, we have $\langle A\rangle_{\rho}=a_{0}+\vec{r}\cdot\vec{a}$.
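The algebraic identities (9.1)-(9.5) and the expectation formula above are easy to confirm numerically. The Python/NumPy sketch below assumes the standard matrix representation of the Pauli operators (any triple satisfying (9.1) would do); the helper names `dot_sigma` and `rho_of` are illustrative. NumPy's `np.cross` uses the same component convention as the definition of $\vec{a}\times\vec{b}$ given above.

```python
import numpy as np

# Standard Pauli matrices; they satisfy (9.1): sigma_i sigma_{i+1} = i sigma_{i+2}
SIGMA = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)
I2 = np.eye(2)

def dot_sigma(a):
    """a . sigma = sum_i a_i sigma_i."""
    return np.tensordot(a, SIGMA, axes=1)

def rho_of(r):
    """rho_r = (I + r . sigma)/2 as in (9.7); a state iff |r| < 1."""
    return (I2 + dot_sigma(r)) / 2

rng = np.random.default_rng(2)
a, b = rng.normal(size=3), rng.normal(size=3)
r = np.array([0.2, -0.5, 0.4])
Sa, Sb = dot_sigma(a), dot_sigma(b)
print(np.allclose(Sa @ Sb, (a @ b) * I2 + 1j * dot_sigma(np.cross(a, b))))  # (9.3): True
print(np.allclose((Sa @ Sb + Sb @ Sa) / 2, (a @ b) * I2))                 # (9.4): True
print(np.allclose(Sa @ Sb - Sb @ Sa, 2j * dot_sigma(np.cross(a, b))))     # (9.5): True
a0 = 0.7
print(np.isclose(np.trace(rho_of(r) @ (a0 * I2 + Sa)).real, a0 + r @ a))  # <A>_rho: True
```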

A tangent vector $X_{\rho}\in T_{\rho}(\mathcal{S})$ at $\rho=\rho_{\vec{r}}$ is represented by a 3-dimensional vector $\vec{x}\in\mathbb{R}^{3}$ such that

$$\iota_{*}(X_{\rho})=\frac{1}{2}\vec{x}\cdot\vec{\sigma}.\tag{9.8}$$

The SLD of $X_{\rho}$ is then represented as

$$L_{X_{\rho}}=\ell_{\vec{r}}(\vec{x})\cdot\vec{\sigma}-\lambda_{\vec{r}}(\vec{x})I,\tag{9.9}$$

where

$$\lambda_{\vec{r}}(\vec{x}):=\frac{\vec{x}\cdot\vec{r}}{1-\|\vec{r}\|^{2}}\quad\text{and}\quad\ell_{\vec{r}}(\vec{x}):=\vec{x}+\lambda_{\vec{r}}(\vec{x})\,\vec{r}.\tag{9.10}$$

Indeed, (9.9) is verified as follows: noting that (9.10) yields $\vec{r}\cdot\ell_{\vec{r}}(\vec{x})=\lambda_{\vec{r}}(\vec{x})$, we have

$$\begin{aligned}
\rho\circ L_{X_{\rho}} &=\frac{1}{2}(I+\vec{r}\cdot\vec{\sigma})\circ\bigl(\ell_{\vec{r}}(\vec{x})\cdot\vec{\sigma}-\lambda_{\vec{r}}(\vec{x})I\bigr)\\
&=\frac{1}{2}\bigl(\ell_{\vec{r}}(\vec{x})-\lambda_{\vec{r}}(\vec{x})\,\vec{r}\bigr)\cdot\vec{\sigma}+\frac{1}{2}\bigl(\vec{r}\cdot\ell_{\vec{r}}(\vec{x})-\lambda_{\vec{r}}(\vec{x})\bigr)I\\
&=\frac{1}{2}\vec{x}\cdot\vec{\sigma}=\iota_{*}(X_{\rho}).
\end{aligned}\tag{9.11}$$

Let us investigate the e-autoparallel submanifolds of $\mathcal{S}$. We first consider the 1-dimensional case, i.e., the e-geodesics. Recall that the general form of an e-geodesic is given by (4.18). Treating the coordinate $\theta$ as a parameter specifying states and choosing $P$ in (4.18) to be a state $\rho_{0}$, an arbitrary e-geodesic $\mathcal{M}$ is represented as the trajectory $\mathcal{M}=\{\rho_{\theta}\,|\,\theta\in\mathbb{R}\}$ of

$$\rho_{\theta}=\frac{1}{Z_{\theta}}\exp\Bigl(\frac{\theta}{2}F\Bigr)\,\rho_{0}\exp\Bigl(\frac{\theta}{2}F\Bigr),\quad Z_{\theta}:=\mathrm{Tr}(\rho_{0}\exp(\theta F)),\tag{9.12}$$

where $F$ is a Hermitian operator such that $\{F,I\}$ are linearly independent. Since the transformation $F\rightarrow aF+b$ by $a,b\in\mathbb{R}$, $a\neq 0$, together with $\theta\rightarrow\frac{1}{a}\theta$ and $\psi\rightarrow\psi+\frac{b}{a}\theta$, leaves $\mathcal{M}$ invariant, we may assume that $F$ is represented as $F=\vec{u}\cdot\vec{\sigma}$ with a unit vector $\vec{u}$.

Proposition 9.1.

Let $\rho_{0}=\rho_{\vec{r}_{0}}$ and $F=\vec{u}\cdot\vec{\sigma}$ with $\|\vec{u}\|=1$ in (9.12). Letting $\vec{v}$ be a unit vector such that $\vec{u}\cdot\vec{v}=0$ and $\vec{r}_{0}\in\mathrm{span}\{\vec{u},\vec{v}\}$, the e-geodesic $\mathcal{M}=\{\rho_{\theta}\,|\,\theta\in\mathbb{R}\}$ is represented as

$$\mathcal{M}=\{\rho_{\vec{r}}\,|\,\vec{r}\in\mathcal{Q}\}\quad\text{with}\quad\mathcal{Q}:=\{\vec{r}(\xi)\,|\,-1<\xi<1\},\tag{9.13}$$

where

$$\begin{aligned}
&\vec{r}(\xi):=\xi\,\vec{u}+c\sqrt{1-\xi^{2}}\,\vec{v}, &&\text{(9.14)}\\
&c:=\frac{b}{\sqrt{1-a^{2}}},\quad a:=\vec{r}_{0}\cdot\vec{u},\quad b:=\vec{r}_{0}\cdot\vec{v}. &&\text{(9.15)}
\end{aligned}$$

(Here $\xi^{2}$ denotes the square of $\xi$, while the same symbol will later denote the second component of $\xi=(\xi^{i})$.) The parameter $\xi$ is m-affine as a coordinate system of $\mathcal{M}$ and is in one-to-one correspondence with the e-affine parameter $\theta$ by

$$\xi=\frac{(1+a)e^{2\theta}-(1-a)}{(1+a)e^{2\theta}+(1-a)}\quad\text{and}\quad\theta=\frac{1}{2}\log\frac{(1-a)(1+\xi)}{(1+a)(1-\xi)}.\tag{9.16}$$

Proof  Noting that $F=\vec{u}\cdot\vec{\sigma}$ is represented as

$$F=\rho_{\vec{u}}-\rho_{-\vec{u}}=1\,\rho_{\vec{u}}+(-1)\,\rho_{-\vec{u}}$$

and that this is the spectral decomposition of $F$ with projectors $\{\rho_{\vec{u}},\rho_{-\vec{u}}\}$, we have

$$\exp\Bigl(\frac{\theta}{2}F\Bigr)=e^{\theta/2}\rho_{\vec{u}}+e^{-\theta/2}\rho_{-\vec{u}}=\cosh(\theta/2)I+\sinh(\theta/2)\,\vec{u}\cdot\vec{\sigma}.$$

Using this expression and writing $\vec{r}_{0}=a\vec{u}+b\vec{v}$ with $a:=\vec{r}_{0}\cdot\vec{u}$ and $b:=\vec{r}_{0}\cdot\vec{v}$, a direct calculation shows that

$$\exp\Bigl(\frac{\theta}{2}F\Bigr)\,\rho_{\vec{r}_{0}}\exp\Bigl(\frac{\theta}{2}F\Bigr)=\frac{Z_{\theta}}{2}\,I+\frac{1}{2}\bigl\{(a\cosh\theta+\sinh\theta)\,\vec{u}+b\,\vec{v}\bigr\}\cdot\vec{\sigma}$$

and $Z_{\theta}=\cosh\theta+a\sinh\theta$, which yields

$$\rho_{\theta}=\frac{1}{2}(I+\vec{s}(\theta)\cdot\vec{\sigma}),$$

where

$$\vec{s}(\theta):=\frac{a\cosh\theta+\sinh\theta}{\cosh\theta+a\sinh\theta}\,\vec{u}+\frac{b}{\cosh\theta+a\sinh\theta}\,\vec{v}.$$

If we define $\xi$ from $\theta$ by (9.16), we have

$$\frac{a\cosh\theta+\sinh\theta}{\cosh\theta+a\sinh\theta}=\xi\quad\text{and}\quad\frac{b}{\cosh\theta+a\sinh\theta}=c\sqrt{1-\xi^{2}},$$

so that $\vec{s}(\theta)=\vec{r}(\xi)$. It is easy to see that the range of $\xi$ is $(-1,1)$, and we obtain (9.13). In addition, since

$$\langle F\rangle_{\rho_{\vec{r}(\xi)}}=\vec{r}(\xi)\cdot\vec{u}=\xi,$$

the parameter $\xi$ is m-affine. $\Box$
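The computation in the proof can be cross-checked numerically. The following pure-Python sketch (helper names such as `e_geodesic_point` are ours, not the paper's) forms $\rho_\theta$ by exponentiating $\theta F/2$ with a truncated Taylor series instead of the closed form above, and compares its Bloch vector with $\xi\vec{u}+c\sqrt{1-\xi^{2}}\,\vec{v}$ for $\xi$ given by (9.16). Since the statement of the proposition lies outside this excerpt, we take $c=b/\sqrt{1-a^{2}}$; this is the value forced by the second identity above, because $1-\xi^{2}=(1-a^{2})/(\cosh\theta+a\sinh\theta)^{2}$. The vectors and parameters in the example are arbitrary choices with $\|\vec{r}_{0}\|<1$.

```python
import math

# Pauli matrices and identity as 2x2 nested lists (complex entries allowed).
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]
I2 = [[1, 0], [0, 1]]


def mul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def lin(*terms):
    """Linear combination of 2x2 matrices, given as (coefficient, matrix) pairs."""
    return [[sum(c * M[i][j] for c, M in terms) for j in range(2)]
            for i in range(2)]


def trace(M):
    return M[0][0] + M[1][1]


def dot_sigma(x):
    """The operator x . sigma for a real 3-vector x."""
    return lin((x[0], SX), (x[1], SY), (x[2], SZ))


def expm(A, terms=40):
    """Matrix exponential via a truncated Taylor series (adequate for small A)."""
    result, term = I2, I2
    for n in range(1, terms):
        term = lin((1.0 / n, mul(term, A)))
        result = lin((1, result), (1, term))
    return result


def bloch(rho):
    """Bloch vector of rho = (I + r . sigma)/2, using r_k = Tr(rho sigma_k)."""
    return [trace(mul(rho, S)).real for S in (SX, SY, SZ)]


def e_geodesic_point(u, v, a, b, theta):
    """rho_theta = exp(theta F/2) rho_{r0} exp(theta F/2) / Z_theta, F = u . sigma."""
    r0 = [a * ui + b * vi for ui, vi in zip(u, v)]
    rho0 = lin((0.5, I2), (0.5, dot_sigma(r0)))
    E = expm(lin((theta / 2, dot_sigma(u))))
    M = mul(mul(E, rho0), E)
    Z = trace(M).real
    return lin((1.0 / Z, M)), Z


def semi_ellipse_point(u, v, a, b, theta):
    """r(xi) = xi u + c sqrt(1 - xi^2) v, xi from (9.16), c = b / sqrt(1 - a^2)."""
    e2 = math.exp(2 * theta)
    xi = ((1 + a) * e2 - (1 - a)) / ((1 + a) * e2 + (1 - a))
    c = b / math.sqrt(1 - a * a)
    return [xi * ui + c * math.sqrt(1 - xi * xi) * vi for ui, vi in zip(u, v)], xi


# Example: an arbitrary orthonormal pair u, v and a point r0 = a u + b v.
u, v = [0.0, 0.6, 0.8], [0.0, 0.8, -0.6]
rho, Z = e_geodesic_point(u, v, a=0.3, b=-0.5, theta=0.8)
r, xi = semi_ellipse_point(u, v, a=0.3, b=-0.5, theta=0.8)
print(max(abs(s - t) for s, t in zip(bloch(rho), r)))  # rounding-error level
print(trace(mul(rho, dot_sigma(u))).real - xi)         # <F> - xi, also ~0
```

Using a series for the exponential, rather than the spectral formula from the proof, keeps the check independent of the step being verified.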

Note that $\mathcal{Q}$ in the above proposition forms a semi-ellipse in the open unit ball $\mathcal{R}$, obtained by cutting an ellipse in half along its major axis; see Fig. 1. In the special case $c=0$, the semi-ellipse degenerates into a straight line segment.

Figure 1: The semi-ellipse representing an e-geodesic

Next, let us proceed to the 2-dimensional case. In searching for 2-dimensional e-autoparallel submanifolds, the previously obtained knowledge of e-geodesics provides an important clue. If a 2-dimensional submanifold $\mathcal{M}=\{\rho_{\vec{r}}\,|\,\vec{r}\in\mathcal{Q}\}$ is e-autoparallel, it must be e-totally geodesic, and hence the surface $\mathcal{Q}$ should be a union of semi-ellipses. The following proposition claims that a 2-dimensional e-autoparallel submanifold is obtained as a semi-ellipsoid formed by rotating a semi-ellipse representing an e-geodesic around its minor axis.

Proposition 9.2.

Given an orthonormal basis $\{\vec{u}_{1},\vec{u}_{2},\vec{v}\}$ of $\mathbb{R}^{3}$ and a real constant $c$ satisfying $|c|<1$, let

$$\mathcal{Q}:=\{\vec{r}(\xi)\,|\,\xi=(\xi^{1},\xi^{2})\in\mathbb{R}^{2},\;(\xi^{1})^{2}+(\xi^{2})^{2}<1\}, \tag{9.17}$$

where

$$\vec{r}(\xi):=\xi^{1}\vec{u}_{1}+\xi^{2}\vec{u}_{2}+c\sqrt{1-(\xi^{1})^{2}-(\xi^{2})^{2}}\,\vec{v}. \tag{9.18}$$

Then $\mathcal{M}:=\{\rho_{\vec{r}}\,|\,\vec{r}\in\mathcal{Q}\}$ is e-autoparallel in $\mathcal{S}$, and the parameter $\xi=(\xi^{1},\xi^{2})$ is m-affine as a coordinate system of $\mathcal{M}$. More specifically, let $F^{i}:=\vec{u}_{i}\cdot\vec{\sigma}$ and $\mathcal{A}:=\mathrm{span}\{F^{1},F^{2}\}$, and let $\mathcal{V}_{\mathcal{A}}:\mathcal{S}\ni\rho\mapsto V_{\mathcal{A},\rho}$ be the e-parallel distribution defined from $\mathcal{A}$ by (8.6); then $\mathcal{M}$ is an integral manifold of $\mathcal{V}_{\mathcal{A}}$ and $\xi^{i}=\langle F^{i}\rangle_{\rho_{\vec{r}(\xi)}}$.

Proof  For $i\in\{1,2\}$, let

$$\vec{x}_{i}:=\partial_{i}\vec{r}(\xi)=\vec{u}_{i}-\frac{c\,\xi^{i}}{\alpha(\xi)}\,\vec{v},$$

where $\partial_{i}:=\frac{\partial}{\partial\xi^{i}}$ and $\alpha(\xi):=\sqrt{1-(\xi^{1})^{2}-(\xi^{2})^{2}}$. Noting that

$$\|\vec{r}(\xi)\|^{2}=1-(1-c^{2})\,\alpha(\xi)^{2}\quad\text{and}\quad\vec{x}_{i}\cdot\vec{r}(\xi)=(1-c^{2})\,\xi^{i},$$

we have

$$\begin{aligned}
\ell_{\vec{r}(\xi)}(\vec{x}_{i})&=\vec{x}_{i}+\frac{\xi^{i}}{\alpha(\xi)^{2}}\,\vec{r}(\xi)\\
&=\vec{u}_{i}+\frac{\xi^{i}\xi^{1}}{\alpha(\xi)^{2}}\,\vec{u}_{1}+\frac{\xi^{i}\xi^{2}}{\alpha(\xi)^{2}}\,\vec{u}_{2}\;\in\mathrm{span}\,\{\vec{u}_{1},\vec{u}_{2}\},
\end{aligned}$$

where the terms proportional to $\vec{v}$ in $\vec{x}_{i}$ and $\vec{r}(\xi)$ cancel, yielding the last line. Owing to (9.9), this implies that the SLDs satisfy $L_{i,\xi}\in\mathrm{span}\,\{F^{1},F^{2}\}\oplus\mathbb{R}$ for $i\in\{1,2\}$, which means that the first condition in (3.33) is satisfied. The second condition is also satisfied since $\langle F^{i}\rangle_{\rho_{\vec{r}(\xi)}}=\vec{u}_{i}\cdot\vec{r}(\xi)=\xi^{i}$. Thus the claim of the proposition follows from Prop. 3.8. $\Box$
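The cancellation of the $\vec{v}$-components can be confirmed numerically. This sketch assumes that, for $\rho_{\vec{r}}=\frac{1}{2}(I+\vec{r}\cdot\vec{\sigma})$, the map of (9.9) has the form $\ell_{\vec{r}}(\vec{x})=\vec{x}+\frac{\vec{x}\cdot\vec{r}}{1-\|\vec{r}\|^{2}}\,\vec{r}$. This is consistent with the first line of the display above, where $\vec{x}_{i}\cdot\vec{r}(\xi)/(1-\|\vec{r}(\xi)\|^{2})=\xi^{i}/\alpha(\xi)^{2}$, but (9.9) itself is not reproduced in this excerpt. The function names and the test basis are illustrative.

```python
import math


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def comb(*terms):
    """Linear combination of 3-vectors, given as (coefficient, vector) pairs."""
    return [sum(c * x[k] for c, x in terms) for k in range(3)]


def ell(r, x):
    """Assumed qubit SLD map for rho_r = (I + r.sigma)/2: x + (x.r)/(1-|r|^2) r."""
    return comb((1.0, x), (dot(x, r) / (1.0 - dot(r, r)), r))


def check_semi_ellipsoid(u1, u2, v, c, xi1, xi2):
    """For i = 1, 2 return (v-component of ell(x_i),
    deviation of ell(x_i) from the closed form in the proof, <F^i> - xi^i)."""
    alpha = math.sqrt(1.0 - xi1 ** 2 - xi2 ** 2)
    r = comb((xi1, u1), (xi2, u2), (c * alpha, v))          # r(xi) from (9.18)
    xi = (xi1, xi2)
    report = []
    for i, ui in enumerate((u1, u2)):
        x_i = comb((1.0, ui), (-c * xi[i] / alpha, v))        # x_i = d r / d xi^i
        l_i = ell(r, x_i)
        closed = comb((1.0, ui), (xi[i] * xi1 / alpha ** 2, u1),
                      (xi[i] * xi2 / alpha ** 2, u2))
        dev = max(abs(p - q) for p, q in zip(l_i, closed))
        report.append((dot(l_i, v), dev, dot(ui, r) - xi[i]))
    return report


s = 1 / math.sqrt(2)
for row in check_semi_ellipsoid([s, s, 0.0], [-s, s, 0.0], [0.0, 0.0, 1.0],
                                c=0.5, xi1=0.2, xi2=-0.3):
    print(row)  # all three entries should vanish up to rounding
```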

As naive geometric intuition suggests, for any point $\vec{r}$ in $\mathcal{R}$ and any plane $P=\vec{r}+V$ containing $\vec{r}$, where $V$ is a 2-dimensional linear subspace of $\mathbb{R}^{3}$, there always exist an orthonormal basis $\{\vec{u}_{1},\vec{u}_{2},\vec{v}\}$ and a constant $c\in(-1,1)$ such that the semi-ellipsoid $\mathcal{Q}$ defined from them by (9.17) and (9.18) contains $\vec{r}$ and has $P$ as its tangent plane at $\vec{r}$. In fact, such $\{\vec{u}_{1},\vec{u}_{2},\vec{v}\}$ and $c$ are obtained as follows: take an orthonormal basis $\{\vec{u}_{1},\vec{u}_{2},\vec{v}\}$ so that $\{\vec{u}_{1},\vec{u}_{2}\}\subset\ell_{\vec{r}}(V)=\{\ell_{\vec{r}}(\vec{x})\,|\,\vec{x}\in V\}$, and then let $\beta^{2}:=(\vec{r}\cdot\vec{u}_{1})^{2}+(\vec{r}\cdot\vec{u}_{2})^{2}$ (i.e. the squared norm of the orthogonal projection of $\vec{r}$ onto $\ell_{\vec{r}}(V)$), $\gamma:=\vec{r}\cdot\vec{v}$, and $c:=\gamma/\sqrt{1-\beta^{2}}$; note that $|c|<1$ because $\beta^{2}+\gamma^{2}=\|\vec{r}\|^{2}<1$. Since $\dim\mathcal{S}=3$, this fact means that $(\mathcal{S},\nabla^{(\mathrm{e})})$ satisfies condition (i) of Prop. 8.3, and hence necessarily satisfies condition (ii) as well. Invoking (3.26), condition (ii) is expressed as follows.

Proposition 9.3.

When $\dim\mathcal{H}=2$, for any $\rho\in\mathcal{S}$ and any $A,B\in\mathcal{L}_{\mathrm{h}}$ satisfying $\langle A\rangle_{\rho}=\langle B\rangle_{\rho}=0$ we have

$$[[A,B],\rho]\in\mathrm{span}\,\{\rho\circ A,\rho\circ B\}. \tag{9.19}$$

This proposition can also be proved directly by using the following lemma, whose proof is given in A5 of the Appendix.

Lemma 9.4.

When $\dim\mathcal{H}=2$, for any $\rho\in\mathcal{S}$ and any $A,B\in\mathcal{L}_{\mathrm{h}}$, we have

$$\begin{aligned}
\frac{1}{2}[[A,B],\rho]=\;&(\mathrm{Tr}A-2\langle A\rangle_{\rho})(\rho\circ B)-(\mathrm{Tr}B-2\langle B\rangle_{\rho})(\rho\circ A)\\
&+\bigl\{(\mathrm{Tr}B)\langle A\rangle_{\rho}-(\mathrm{Tr}A)\langle B\rangle_{\rho}\bigr\}\,\rho.
\end{aligned}\tag{9.20}$$

Letting $\langle A\rangle_{\rho}=\langle B\rangle_{\rho}=0$ in (9.20), we obtain

$$[[A,B],\rho]=2(\mathrm{Tr}A)\,(\rho\circ B)-2(\mathrm{Tr}B)\,(\rho\circ A), \tag{9.21}$$

which proves Prop. 9.3.
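Identity (9.20) is easy to test numerically. The sketch below draws random $2\times 2$ Hermitian $A,B$ (with nonzero traces) and a random qubit state $\rho$, and evaluates both sides. It assumes the conventions $\rho\circ A=\frac{1}{2}(\rho A+A\rho)$ and $\langle A\rangle_{\rho}=\mathrm{Tr}(\rho A)$, under which (A.11) and its Bloch-vector form in A5 agree; the helper names are ours.

```python
import random


def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def lin(*terms):
    """Linear combination of 2x2 matrices, given as (coefficient, matrix) pairs."""
    return [[sum(c * M[i][j] for c, M in terms) for j in range(2)]
            for i in range(2)]


def trace(M):
    return M[0][0] + M[1][1]


def comm(A, B):
    return lin((1, mul(A, B)), (-1, mul(B, A)))


def jordan(rho, A):
    """rho o A = (rho A + A rho) / 2 (assumed convention)."""
    return lin((0.5, mul(rho, A)), (0.5, mul(A, rho)))


def mean(rho, A):
    """<A>_rho = Tr(rho A)."""
    return trace(mul(rho, A)).real


def random_hermitian(rng):
    a, d = rng.uniform(-2, 2), rng.uniform(-2, 2)
    z = complex(rng.uniform(-2, 2), rng.uniform(-2, 2))
    return [[a, z], [z.conjugate(), d]]


def random_state(rng):
    """rho = (I + r.sigma)/2 with a random Bloch vector of norm < 1."""
    while True:
        x, y, z = (rng.uniform(-1, 1) for _ in range(3))
        if x * x + y * y + z * z < 0.95:
            return [[(1 + z) / 2, (x - 1j * y) / 2],
                    [(x + 1j * y) / 2, (1 - z) / 2]]


def lemma_9_4_residual(A, B, rho):
    """Largest entrywise difference between the two sides of (9.20)."""
    lhs = lin((0.5, comm(comm(A, B), rho)))
    tA, tB = trace(A).real, trace(B).real
    mA, mB = mean(rho, A), mean(rho, B)
    rhs = lin((tA - 2 * mA, jordan(rho, B)),
              (-(tB - 2 * mB), jordan(rho, A)),
              (tB * mA - tA * mB, rho))
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))


rng = random.Random(0)
print(max(lemma_9_4_residual(random_hermitian(rng), random_hermitian(rng),
                             random_state(rng)) for _ in range(100)))
```

Setting $A\mapsto A-\langle A\rangle_{\rho}I$ and $B\mapsto B-\langle B\rangle_{\rho}I$ in the same function exercises the special case (9.21).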

The following proposition follows immediately from Prop. 9.3 and stands in remarkable contrast to Prop. 8.8 for the case $\dim\mathcal{H}\geq 3$.

Proposition 9.5.

When $\dim\mathcal{H}=2$, for any orthonormal basis $\mathcal{B}$ of $\mathcal{H}$ it holds that

$$\forall\rho\in\mathcal{S},\;\{[[A,B],\rho]\,|\,A,B\in\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}\}\subset\{\rho\circ C\,|\,C\in\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}\}. \tag{9.22}$$

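Proof  Let $A,B\in\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}$ and put $A_{0}:=A-\langle A\rangle_{\rho}I$ and $B_{0}:=B-\langle B\rangle_{\rho}I$. Since $\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}$ is an $\mathbb{R}$-linear space with $I\in\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}$, we have $A_{0},B_{0}\in\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}$; moreover $\langle A_{0}\rangle_{\rho}=\langle B_{0}\rangle_{\rho}=0$ and $[[A,B],\rho]=[[A_{0},B_{0}],\rho]$. Hence Prop. 9.3 gives $[[A,B],\rho]=\rho\circ C$ for some real linear combination $C$ of $A_{0}$ and $B_{0}$, which belongs to $\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}$. $\Box$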

Thus the distribution $\mathcal{V}_{\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}}$ is involutive and induces a foliation $\mathcal{S}=\bigsqcup_{\alpha}\mathcal{M}_{\alpha}$ whose leaves $\{\mathcal{M}_{\alpha}\}$ are 2-dimensional e-autoparallel submanifolds that are integral manifolds of $\mathcal{V}_{\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}}$. Furthermore, we can see from the following lemma that every 2-dimensional e-autoparallel submanifold of $\mathcal{S}$ is an integral manifold of $\mathcal{V}_{\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}}$ for some $\mathcal{B}$.

Lemma 9.6.

When $\dim\mathcal{H}=2$, for any $A,B\in\mathcal{L}_{\mathrm{h}}$ there exists an orthonormal basis $\mathcal{B}$ such that $\{A,B\}\subset\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}$.

Proof  Let $\{|1\rangle,|2\rangle\}$ be an orthonormal basis that diagonalizes $A$, and choose $\beta\in\mathbb{C}$ so that $|\beta|=1$ and $\beta\langle 1|B|2\rangle\in\mathbb{R}$. Then $\mathcal{B}:=\{|1\rangle,\beta|2\rangle\}$ satisfies the desired condition. $\Box$
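The proof of Lemma 9.6 is constructive, and can be sketched in pure Python. We assume here that $\mathcal{L}_{\mathrm{h}}^{\mathcal{B}}$ denotes the Hermitian operators whose matrices in the basis $\mathcal{B}$ are real, which is how the lemma is used above; the function names are ours. The eigenvector of $A$ is obtained from the closed-form $2\times 2$ eigenvalue formula rather than a library routine.

```python
import math
import random


def braket(x, M, y):
    """<x|M|y> for 2-component complex vectors x, y and a 2x2 matrix M."""
    return sum(x[i].conjugate() * M[i][j] * y[j] for i in range(2) for j in range(2))


def real_basis(A, B, tol=1e-14):
    """Orthonormal basis {|1>, beta|2>} built as in the proof of Lemma 9.6."""
    a11, a12, a22 = A[0][0].real, complex(A[0][1]), A[1][1].real
    if abs(a12) < tol:                                   # A is already diagonal
        e1, e2 = [1 + 0j, 0j], [0j, 1 + 0j]
    else:
        lam = (a11 + a22) / 2 + math.sqrt(((a11 - a22) / 2) ** 2 + abs(a12) ** 2)
        w = [a12, complex(lam - a11)]                    # eigenvector of A for lam
        n = math.sqrt(abs(w[0]) ** 2 + abs(w[1]) ** 2)
        e1 = [w[0] / n, w[1] / n]
        e2 = [-e1[1].conjugate(), e1[0].conjugate()]     # the orthogonal eigenvector
    z = braket(e1, B, e2)
    beta = z.conjugate() / abs(z) if abs(z) > tol else 1
    return e1, [beta * t for t in e2]                    # <1|B|(beta|2>) = |z| is real


def matrix_in_basis(M, basis):
    return [[braket(x, M, y) for y in basis] for x in basis]


def random_hermitian(rng):
    a, d = rng.uniform(-2, 2), rng.uniform(-2, 2)
    z = complex(rng.uniform(-2, 2), rng.uniform(-2, 2))
    return [[a, z], [z.conjugate(), d]]


rng = random.Random(0)
A, B = random_hermitian(rng), random_hermitian(rng)
basis = real_basis(A, B)
print(matrix_in_basis(A, basis))  # diagonal, imaginary parts at rounding level
print(matrix_in_basis(B, basis))  # real symmetric up to rounding
```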

10 Concluding remarks

In this paper we studied the autoparallelity w.r.t. the e-connection for an information-geometric structure induced on $\mathcal{S}(\mathcal{H})$. In particular, we focused on the e-autoparallelity for the SLD structure, for which two different estimation-theoretical characterizations were given. We also investigated the existence conditions for e-autoparallel submanifolds by way of the involutivity of e-parallel distributions and its relation to the torsion tensor. As a result, a special feature of the qubit case was revealed.

Since the obtained estimation-theoretical characterizations of the e-autoparallelity are complete in themselves, we do not see at this time what kind of development lies ahead. Future developments in quantum estimation theory and related fields may reveal new directions. The classical exponential family has a variety of important properties besides the existence of efficient estimators, some of which may provide new material for characterizing certain geometric notions.

For the autoparallelity w.r.t. non-flat connections, our understanding is still very limited. For example, we do not yet have the whole picture of the e-autoparallel submanifolds of $\mathcal{S}(\mathcal{H})$ when $\dim\mathcal{H}\geq 3$. We look forward to further research on this topic in information geometry and/or general differential geometry.

It may also be a challenging problem to develop infinite-dimensional quantum information geometry so that Theorem 5.1 is extended to the case $\dim\mathcal{H}=\infty$ and the naive geometric consideration on the quantum Gaussian shift model presented in Section 6 is mathematically justified.

Geometry of quantum statistical manifolds in an asymptotic framework would also be an important subject to be addressed. For example, consider a sequence $\mathcal{M}^{(n)}=\{\rho_{\xi}^{\otimes n}\}$, $n=1,2,\dots$, of i.i.d. extensions of a quantum statistical model $\mathcal{M}=\{\rho_{\xi}\}$. Recent progress in asymptotic quantum statistics has revealed that the sequence exhibits a desirable property called quantum local asymptotic normality, which tells us that in a shrinking ($\sim 1/\sqrt{n}$) neighbourhood of a given point $\xi_{0}$, the sequence converges to a quantum Gaussian shift model [12, 13, 14, 15, 16]. As pointed out in Section 6, the limiting quantum Gaussian shift model has a characteristic feature in view of quantum information geometry. It would therefore be an interesting future project to extend the geometrical idea presented in this paper to an asymptotic framework so that the convergence of quantum statistical manifolds can be discussed under a suitably chosen topology.

Acknowledgments

This work was partly supported by JSPS KAKENHI Grant Numbers 23H05492 (HN), 23H01090 (AF) and 17H02861 (HN, AF).

References

  • [1] Rao CR. Linear Statistical Inference and Its Applications (2nd ed). John Wiley & Sons (1973).
  • [2] Kiefer JC. Introduction to Statistical Inference. Springer-Verlag (1987).
  • [3] Amari S, Nagaoka H. Methods of Information Geometry. American Mathematical Society, Oxford University Press (2000).
  • [4] Kobayashi S, Nomizu K. Foundations of Differential Geometry, II. New York: John Wiley & Sons (1969).
  • [5] Nagaoka H, Amari S. Differential geometry of smooth families of probability distributions. METR 82-7, Dept Math Eng and Instr Phys, Univ. of Tokyo (1982). (https://www.keisu.t.u-tokyo.ac.jp/research/techrep/y1982/) (https://bsi-ni.brain.riken.jp/database/item/104)
  • [6] Helstrom CW. Minimum mean-square error estimation in quantum statistics. Phys Lett (1967) 25A:101–102.
  • [7] Helstrom CW. Quantum Detection and Estimation Theory. New York: Academic Press (1976).
  • [8] Holevo AS. Probabilistic and Statistical Aspects of Quantum Theory. Pisa: Edizioni della Normale (2011). (Previous edition: Amsterdam: North-Holland (1982).)
  • [9] Petz D. Monotone metrics on matrix spaces. Linear Algebra Appl (1996) 244:81–96.
  • [10] Ohara A. Geodesics for dual connections and means on symmetric cones. Integr Equ Oper Theory (2004) 50:537–548.
  • [11] Nagaoka H. Information-geometrical characterization of statistical models which are statistically equivalent to probability simplexes. Proc. 2017 IEEE International Symposium on Information Theory (2017) 1346–1350.
  • [12] Guţă M, Kahn J. Local asymptotic normality for qubit states. Phys Rev A (2006) 73:052108.
  • [13] Kahn J, Guţă M. Local asymptotic normality for finite dimensional quantum systems. Comm Math Phys (2009) 289:597–652.
  • [14] Yamagata K, Fujiwara A, Gill RD. Quantum local asymptotic normality based on a new quantum likelihood ratio. Ann Statist (2013) 41:2197–2217.
  • [15] Fujiwara A, Yamagata K. Noncommutative Lebesgue decomposition and contiguity with application to quantum local asymptotic normality. Bernoulli (2020) 26:2105–2142.
  • [16] Fujiwara A, Yamagata K. Efficiency of estimators for locally asymptotically normal quantum statistical models. Ann Statist (to appear). (arXiv:2209.00832)

Appendix

A1 Proof of (3.26)

We first consider the general situation where the e-connection $\nabla^{(\mathrm{e})}$ is determined by a family of inner products $\langle A,B\rangle_{\rho}=\langle A,\Omega_{\rho}(B)\rangle_{\mathrm{HS}}$, and show that the torsion $\mathcal{T}^{(\mathrm{e})}$ of $\nabla^{(\mathrm{e})}$ is represented as follows: for any $X,Y\in\mathscr{X}(\mathcal{S})$,

$$\iota_{*}(\mathcal{T}^{(\mathrm{e})}(X,Y))=(Y\Omega)(L_{X})-(X\Omega)(L_{Y}), \tag{A.1}$$

where $Y\Omega:\rho\mapsto Y_{\rho}\Omega$ denotes the derivative of the super-operator-valued map $\Omega:\rho\mapsto\Omega_{\rho}$ w.r.t. $Y$, and $(Y\Omega)(L_{X})$ denotes the map $\rho\mapsto(Y_{\rho}\Omega)(L_{X_{\rho}})\in\mathcal{L}_{\mathrm{h}}$. In fact, invoking (3.8) and (3.27), we have for any $X,Y,Z\in\mathscr{X}(\mathcal{S})$,

$$\begin{aligned}
Z=\mathcal{T}^{(\mathrm{e})}(X,Y)\;&\Leftrightarrow\;Z=\nabla_{X}^{(\mathrm{e})}Y-\nabla_{Y}^{(\mathrm{e})}X-[X,Y]\\
&\Leftrightarrow\;L_{Z}=(XL_{Y}+g(X,Y))-(YL_{X}+g(Y,X))-L_{[X,Y]}\\
&\Leftrightarrow\;L_{Z}=XL_{Y}-YL_{X}-L_{[X,Y]}\\
&\Leftrightarrow\;\iota_{*}(Z)=\Omega(XL_{Y}-YL_{X}-L_{[X,Y]})\\
&\Leftrightarrow\;\iota_{*}(Z)=\Omega(XL_{Y})-\Omega(YL_{X})-\iota_{*}([X,Y]).
\end{aligned}$$

Noting that

$$\begin{aligned}
\iota_{*}([X,Y])&=X\iota_{*}(Y)-Y\iota_{*}(X)\\
&=X(\Omega(L_{Y}))-Y(\Omega(L_{X}))\\
&=(X\Omega)(L_{Y})+\Omega(XL_{Y})-(Y\Omega)(L_{X})-\Omega(YL_{X}),
\end{aligned}$$

we obtain (A.1). For the SLD structure, we have $\Omega_{\rho}(A)=\frac{1}{2}(\iota(\rho)\,A+A\,\iota(\rho))$, which yields that for any point $\rho\in\mathcal{S}$ and any tangent vectors $X_{\rho},Y_{\rho}\in T_{\rho}(\mathcal{S})$,

$$\begin{aligned}
(Y_{\rho}\Omega)(L_{X_{\rho}})&=\frac{1}{2}\bigl\{\iota_{*}(Y_{\rho})L_{X_{\rho}}+L_{X_{\rho}}\iota_{*}(Y_{\rho})\bigr\}\\
&=\frac{1}{4}\bigl(\rho L_{Y_{\rho}}L_{X_{\rho}}+L_{Y_{\rho}}\rho L_{X_{\rho}}+L_{X_{\rho}}\rho L_{Y_{\rho}}+L_{X_{\rho}}L_{Y_{\rho}}\rho\bigr).
\end{aligned}$$

Similarly, we have

$$(X_{\rho}\Omega)(L_{Y_{\rho}})=\frac{1}{4}\bigl(\rho L_{X_{\rho}}L_{Y_{\rho}}+L_{X_{\rho}}\rho L_{Y_{\rho}}+L_{Y_{\rho}}\rho L_{X_{\rho}}+L_{Y_{\rho}}L_{X_{\rho}}\rho\bigr).$$

Substituting these into (A.1), we obtain (3.26).
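For the reader who wants the substitution spelled out, the worked derivation below subtracts the two displays above (abbreviating $L_{X}:=L_{X_{\rho}}$, $L_{Y}:=L_{Y_{\rho}}$). The mixed terms $L_{Y}\rho L_{X}$ and $L_{X}\rho L_{Y}$ occur in both and cancel, leaving a double commutator. We believe the last line is the form recorded in (3.26), which lies outside this excerpt; it is at least consistent with the double commutators $[[A,B],\rho]$ appearing in Prop. 9.3.

```latex
\begin{aligned}
\iota_{*}\bigl(\mathcal{T}^{(\mathrm{e})}(X,Y)\bigr)_{\rho}
&=(Y_{\rho}\Omega)(L_{X})-(X_{\rho}\Omega)(L_{Y})\\
&=\tfrac{1}{4}\bigl(\rho L_{Y}L_{X}+L_{X}L_{Y}\rho-\rho L_{X}L_{Y}-L_{Y}L_{X}\rho\bigr)\\
&=\tfrac{1}{4}\bigl([L_{X},L_{Y}]\,\rho-\rho\,[L_{X},L_{Y}]\bigr)
 =\tfrac{1}{4}\,\bigl[[L_{X},L_{Y}],\rho\bigr].
\end{aligned}
```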

A2 Proof of Lemma 4.2

  • (1)

    Let $A^{i}:=\int\hat{\xi}^{i}\,\Pi(d\hat{\xi})$. Then we have

    $$\begin{aligned}
    A^{i}&=\sum_{k}p_{k}f^{i}(k,X^{k})=\sum_{k}p_{k}\Bigl(\gamma_{i}^{k}+\frac{w^{i}_{k}}{p_{k}}X^{k}\Bigr)\\
    &=\xi^{i}(\rho)+\sum_{k}w_{k}^{i}X^{k}=\xi^{i}(\rho)+\sum_{k}\sum_{j}w_{k}^{i}u^{k}_{j}\,L^{j}_{\rho}=\xi^{i}(\rho)+L_{\rho}^{i},
    \end{aligned}$$

    where we have invoked (4.13), (4.10), (4.11) and (4.12). Now $\Pi\in\mathcal{U}(\rho,\xi)$ follows from (4.6).

  • (2)

    Invoking (4.13), we have for each $k$

    $$\begin{aligned}
    B^{k}&:=\int\Bigl\{\sum_{i}u_{i}^{k}\bigl(\hat{\xi}^{i}-\xi^{i}(\rho)\bigr)\Bigr\}^{2}\,\Pi(d\hat{\xi})\\
    &=\sum_{l}p_{l}\Bigl\{\sum_{i}u_{i}^{k}\bigl(f^{i}(l,X^{l})-\xi^{i}(\rho)\bigr)\Bigr\}^{2}\\
    &=\sum_{l}p_{l}(C_{l}^{k}+a_{l}^{k})^{2},
    \end{aligned}\tag{A.2}$$

    where

    $$C_{l}^{k}:=\sum_{i}u_{i}^{k}\frac{w_{l}^{i}}{p_{l}}X^{l}=\frac{\delta_{l}^{k}}{p_{k}}X^{k}. \tag{A.3}$$

    This leads to

    $$\begin{aligned}
    (\boldsymbol{u}^{k})^{T}V_{\rho}(\Pi)\,\boldsymbol{u}^{k}&=\mathrm{Tr}(\rho B^{k})=\sum_{l}p_{l}\,\mathrm{Tr}\bigl(\rho(C_{l}^{k}+a_{l}^{k})^{2}\bigr)\\
    &=\sum_{l}p_{l}\,\mathrm{Tr}\bigl(\rho(C_{l}^{k})^{2}\bigr)+\sum_{l}p_{l}(a_{l}^{k})^{2},
    \end{aligned}\tag{A.4}$$

    where we invoked $\mathrm{Tr}(\rho\,C_{l}^{k})=0$, which holds because $C_{l}^{k}\in\mathrm{span}\{L_{i,\rho}\}_{i=1}^{n}$. Recalling (4.11) and (A.3), we have

    $$\begin{aligned}
    \sum_{l}p_{l}\,\mathrm{Tr}\bigl(\rho(C_{l}^{k})^{2}\bigr)&=\frac{1}{p_{k}}\,\mathrm{Tr}\bigl(\rho(X^{k})^{2}\bigr)\\
    &=\frac{1}{p_{k}}\sum_{i,j}u_{i}^{k}u_{j}^{k}\,\langle L^{i}_{\rho},L^{j}_{\rho}\rangle_{\rho}=\frac{1}{p_{k}}\,(\boldsymbol{u}^{k})^{T}G_{\rho}^{-1}\boldsymbol{u}^{k},
    \end{aligned}\tag{A.5}$$

    which, combined with (A.4), yields the desired identity.

A3 Proofs of Propositions 7.9, 7.10 and 7.11

Proof of Prop. 7.9  This proposition is essentially contained in Theorem 7.2 of [3]. Here we give a proof for the reader's convenience.

Given $F\in\mathcal{L}_{\mathrm{h}}$ and $\rho\in\mathcal{S}$, there exists a tangent vector $X_{\rho}\in T_{\rho}(\mathcal{S})$ satisfying $L_{X_{\rho}}=F-\langle F\rangle_{\rho}$ by (3.10). Applying (7.18) to the case $\mathcal{M}=\mathcal{S}$, we have $X_{\rho}=(\mathrm{grad}\,\langle F\rangle)_{\rho}$, and hence

$$\|(d\langle F\rangle)_{\rho}\|_{\rho}^{2}=\|X_{\rho}\|_{\rho}^{2}=\langle L_{X_{\rho}},L_{X_{\rho}}\rangle_{\rho}=\bigl\langle(F-\langle F\rangle_{\rho})^{2}\bigr\rangle_{\rho}=V_{\rho}(F).$$

$\Box$

Proof of Prop. 7.10  Recalling (7.15) and (7.16), we have

$$\mathcal{E}(\mathcal{S})=\{f\in\mathcal{F}(\mathcal{S})\,|\,\exists F\in\mathcal{L}_{\mathrm{h}},\;f=\langle F\rangle\;\;\text{and}\;\;\forall\rho\in\mathcal{S},\;V_{\rho}(F)=\|(df)_{\rho}\|_{\rho}^{2}\}. \tag{A.6}$$

Since the condition $V_{\rho}(F)=\|(df)_{\rho}\|_{\rho}^{2}$ is always satisfied by Prop. 7.9, we have the first equality in (7.10). The second equality follows from Prop. 7.2, and the third follows since under the relation $X\stackrel{g}{\longleftrightarrow}\omega$ we have

$$\begin{aligned}
X\text{ is e-parallel}\;&\Leftrightarrow\;\forall Y,Z\in\mathscr{X}(\mathcal{S}),\;g(\nabla^{(\mathrm{e})}_{Y}X,Z)=0\\
&\Leftrightarrow\;\forall Y,Z\in\mathscr{X}(\mathcal{S}),\;Yg(X,Z)=g(X,\nabla^{(\mathrm{m})}_{Y}Z)\\
&\Leftrightarrow\;\forall Y,Z\in\mathscr{X}(\mathcal{S}),\;Y\omega(Z)=\omega(\nabla^{(\mathrm{m})}_{Y}Z)\\
&\Leftrightarrow\;\omega\text{ is m-parallel}.
\end{aligned}\tag{A.7}$$

$\Box$

Proof of Prop. 7.11  By Propositions 7.9 and 7.10, the condition imposed on $f$ in (7.25) is equivalent to the existence of $F\in\mathcal{L}_{\mathrm{h}}$ satisfying (7.15). $\Box$

A4 A result on the relationship between autoparallelity and total geodesicness

In Remark 8.6 we noted that condition (ii) of Prop. 8.3 implies the equivalence between autoparallelity and total geodesicness. This is restated in the following proposition.

Proposition A.1.

Suppose that an affine connection $\nabla$ is given on a manifold $S$ whose torsion satisfies

$$\forall X,Y\in\mathscr{X}(S),\;\mathcal{T}^{(\nabla)}(X,Y)\in\mathrm{span}_{\mathcal{F}(S)}\{X,Y\}. \tag{A.8}$$

Then every $\nabla$-totally geodesic submanifold of $S$ is $\nabla$-autoparallel.

We present a proof below, which closely follows the proof of Theorem 8.4 in Chap. VII of [4], where that theorem is cited as a result due to E. Cartan.

Proof  Let $\dim S=n+r$, and let $M$ be a $\nabla$-totally geodesic submanifold with $\dim M=n$. We take a coordinate system $\tilde{\xi}=(\tilde{\xi}^{i})$ of $S$ such that $M$ is represented as

$$M=\{p\in S\,|\,\forall i\in\{n+1,\ldots,n+r\},\;\tilde{\xi}^{i}(p)=0\}$$

and that $(\xi^{1},\ldots,\xi^{n}):=(\tilde{\xi}^{1}|_{M},\ldots,\tilde{\xi}^{n}|_{M})$ forms a coordinate system of $M$. Let $\tilde{\partial}_{i}:=\frac{\partial}{\partial\tilde{\xi}^{i}}$ and $\partial_{i}:=\frac{\partial}{\partial\xi^{i}}$, and denote the connection coefficients of $\nabla$ w.r.t. $\tilde{\xi}$ by $\{\Gamma_{ij}^{k}\}$, so that $\nabla_{\tilde{\partial}_{i}}\tilde{\partial}_{j}=\sum_{k}\Gamma_{ij}^{k}\,\tilde{\partial}_{k}$ for $i,j\in\{1,\ldots,n+r\}$. For arbitrary $i,j$, it follows from assumption (A.8) that $\mathcal{T}^{(\nabla)}(\tilde{\partial}_{i},\tilde{\partial}_{j})=\sum_{k}(\Gamma_{ij}^{k}-\Gamma_{ji}^{k})\,\tilde{\partial}_{k}\in\mathrm{span}_{\mathcal{F}(S)}\{\tilde{\partial}_{i},\tilde{\partial}_{j}\}$, which implies that $\Gamma_{ij}^{k}-\Gamma_{ji}^{k}=0$ for any $k\notin\{i,j\}$. Hence we have

$$\forall i,j\in\{1,\ldots,n\},\;\forall k\in\{n+1,\ldots,n+r\},\;\Gamma_{ij}^{k}=\Gamma_{ji}^{k}. \tag{A.9}$$

Given a point $p\in M$ and a tangent vector $X_{p}=\sum_{i=1}^{n}x^{i}\,(\partial_{i})_{p}\in T_{p}(M)$ arbitrarily, let $\gamma:t\mapsto\gamma(t)$ be a $\nabla$-geodesic with an affine parameter $t$ satisfying $\gamma(0)=p$ and $\dot{\gamma}(0):=\frac{d}{dt}\gamma(t)|_{t=0}=X_{p}$. The geodesic satisfies the differential equation $\nabla_{\dot{\gamma}(t)}\dot{\gamma}(t)=0$, which is represented as

$$\forall k\in\{1,\ldots,n+r\},\;\frac{d^{2}}{dt^{2}}\,\tilde{\xi}^{k}(\gamma(t))+\sum_{i,j=1}^{n+r}\frac{d}{dt}\tilde{\xi}^{i}(\gamma(t))\,\frac{d}{dt}\tilde{\xi}^{j}(\gamma(t))\,\bigl(\Gamma_{ij}^{k}\bigr)_{\gamma(t)}=0.$$

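Since $M$ is assumed to be $\nabla$-totally geodesic, $\gamma(t)$ stays in $M$ for sufficiently small $|t|$, and hence $\tilde{\xi}^{k}(\gamma(t))=0$ for $k\in\{n+1,\ldots,n+r\}$. In particular, the first and second derivatives of $\tilde{\xi}^{k}(\gamma(t))$ vanish for these $k$, so the above equation yields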

$$\forall k\in\{n+1,\ldots,n+r\},\;\sum_{i,j=1}^{n}\frac{d}{dt}\tilde{\xi}^{i}(\gamma(t))\,\frac{d}{dt}\tilde{\xi}^{j}(\gamma(t))\,\bigl(\Gamma_{ij}^{k}\bigr)_{\gamma(t)}=0,$$

and letting $t=0$, we obtain

$$\forall k\in\{n+1,\ldots,n+r\},\;\sum_{i,j=1}^{n}x^{i}x^{j}\bigl(\Gamma_{ij}^{k}\bigr)_{p}=0.$$

Since $p\in M$ and $X_{p}=\sum_{i=1}^{n}x^{i}(\partial_{i})_{p}$ are arbitrary and $\Gamma_{ij}^{k}$ is symmetric w.r.t. $i\leftrightarrow j$ by (A.9), polarization of the quadratic form $(x^{1},\ldots,x^{n})\mapsto\sum_{i,j}x^{i}x^{j}(\Gamma_{ij}^{k})_{p}$ shows that

$$\forall i,j\in\{1,\ldots,n\},\;\forall k\in\{n+1,\ldots,n+r\},\;\Gamma_{ij}^{k}|_{M}=0. \tag{A.10}$$

Now, for arbitrary vector fields $X=\sum_{i=1}^{n}X^{i}\partial_{i}=\sum_{i=1}^{n}X^{i}\tilde{\partial}_{i}|_{M}$ and $Y=\sum_{j=1}^{n}Y^{j}\partial_{j}=\sum_{j=1}^{n}Y^{j}\tilde{\partial}_{j}|_{M}$ on $M$, where $\{X^{i}\},\{Y^{j}\}\subset\mathcal{F}(M)$, we have

$$\begin{aligned}
\nabla_{X}Y&=\sum_{i,j=1}^{n}\sum_{k=1}^{n+r}X^{i}Y^{j}\,\Gamma_{ij}^{k}|_{M}\,\tilde{\partial}_{k}|_{M}+\sum_{j=1}^{n}X(Y^{j})\,\partial_{j}\\
&=\sum_{i,j=1}^{n}\sum_{k=1}^{n}X^{i}Y^{j}\,\Gamma_{ij}^{k}|_{M}\,\partial_{k}+\sum_{j=1}^{n}X(Y^{j})\,\partial_{j}\;\in\mathscr{X}(M),
\end{aligned}$$

which shows that $M$ is $\nabla$-autoparallel in $S$. $\Box$

Note that the only difference from the proof in [4] lies in whether (A.9) is derived from $\mathcal{T}^{(\nabla)}=0$ or from the weaker assumption (A.8).

A5 Proof of Lemma 9.4

When $\mathrm{Tr}A=\mathrm{Tr}B=0$, (9.20) reduces to

$$\frac{1}{2}[[A,B],\rho]=-2\langle A\rangle_{\rho}(\rho\circ B)+2\langle B\rangle_{\rho}(\rho\circ A), \tag{A.11}$$

which we prove first. Letting $A=\vec{a}\cdot\vec{\sigma}$, $B=\vec{b}\cdot\vec{\sigma}$ and $\rho=\frac{1}{2}(I+\vec{r}\cdot\vec{\sigma})$, it immediately follows from (9.4) and (9.5) that

$$\frac{1}{2}[[A,B],\rho]=\bigl(\vec{r}\times(\vec{a}\times\vec{b})\bigr)\cdot\vec{\sigma}$$

and

$$-2\langle A\rangle_{\rho}(\rho\circ B)+2\langle B\rangle_{\rho}(\rho\circ A)=\bigl\{(\vec{b}\cdot\vec{r})\,\vec{a}-(\vec{a}\cdot\vec{r})\,\vec{b}\bigr\}\cdot\vec{\sigma}.$$

Hence the well-known vector triple product formula $\vec{r}\times(\vec{a}\times\vec{b})=(\vec{r}\cdot\vec{b})\,\vec{a}-(\vec{r}\cdot\vec{a})\,\vec{b}$ proves (A.11).

Now remove the assumption $\mathrm{Tr}A=\mathrm{Tr}B=0$, and let $A':=A-\frac{\mathrm{Tr}A}{2}I$ and $B':=B-\frac{\mathrm{Tr}B}{2}I$. Then we have

$$\begin{aligned}
\frac{1}{2}[[A,B],\rho]&=\frac{1}{2}[[A',B'],\rho]\\
&=-2\langle A'\rangle_{\rho}(\rho\circ B')+2\langle B'\rangle_{\rho}(\rho\circ A')\\
&=(\mathrm{Tr}A-2\langle A\rangle_{\rho})(\rho\circ B)-(\mathrm{Tr}B-2\langle B\rangle_{\rho})(\rho\circ A)\\
&\quad+\bigl\{(\mathrm{Tr}B)\langle A\rangle_{\rho}-(\mathrm{Tr}A)\langle B\rangle_{\rho}\bigr\}\,\rho,
\end{aligned}$$

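where the first equality holds because $[A',B']=[A,B]$, the second follows from (A.11), and the third is obtained by substituting $\langle A'\rangle_{\rho}=\langle A\rangle_{\rho}-\frac{1}{2}\mathrm{Tr}A$ and $\rho\circ A'=\rho\circ A-\frac{1}{2}(\mathrm{Tr}A)\,\rho$ (and likewise for $B'$). Thus we obtain (9.20).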