Autoparallelity of Quantum Statistical Manifolds in The Light of Quantum Estimation Theory

Hiroshi Nagaoka, The University of Electro-Communications, Chofu, Tokyo 182-8585, Japan. nagaoka@is.uec.ac.jp

Akio Fujiwara, Department of Mathematics, Osaka University, Toyonaka, Osaka 560-0043, Japan. fujiwara@math.sci.osaka-u.ac.jp

Abstract

In this paper we study the autoparallelity w.r.t. the e-connection for an information-geometric structure called the SLD structure, which consists of a Riemannian metric and mutually dual e- and m-connections, induced on the manifold of strictly positive density operators. Unlike in classical information geometry, the e-connection has non-vanishing torsion, which brings various mathematical difficulties. The notion of e-autoparallel submanifolds is regarded as a quantum version of exponential families in classical statistics, which are known to be characterized as statistical models having efficient estimators (unbiased estimators uniformly achieving the equality in the Cramér-Rao inequality). As quantum extensions of this classical result, we present two different forms of estimation-theoretical characterizations of the e-autoparallel submanifolds. We also give several results on the e-autoparallelity, some of which are valid for the autoparallelity w.r.t. an affine connection in a more general geometrical situation.

Keywords— quantum estimation theory, information geometry, autoparallel submanifold, dual connection, torsion, SLD (symmetric logarithmic derivative)

1 Introduction

The autoparallelity is a multi-dimensional version (including the 1-dimensional case in particular) of the notion of geodesic for a manifold equipped with an affine connection. In the classical information geometry for manifolds of probability distributions, where the triple $(g,\nabla^{({\rm e})},\nabla^{({\rm m})})$ of the Fisher metric $g$ , the e-connection $\nabla^{({\rm e})}$ and the m-connection $\nabla^{({\rm m})}$ plays the leading role, the autoparallelity w.r.t. (with respect to) the e-connection, which is called the e-autoparallelity, is particularly important. This is because the e-autoparallel submanifolds of the space ${\mathcal{P}}$ consisting of all strictly positive probability distributions on a finite set are the exponential families, which are among the key concepts in probability theory and statistics. For quantum statistical manifolds, which are submanifolds of the space ${\mathcal{S}}$ consisting of all strictly positive density operators on a finite-dimensional Hilbert space, we may introduce an analogous notion of exponential families as autoparallel submanifolds w.r.t. an affine connection on ${\mathcal{S}}$ analogous to the classical e-connection. However, the notion of quantum exponential families introduced in this way does not necessarily have statistical and/or physical importance. One of the main achievements of the present paper is to show that the e-autoparallelity for the SLD structure, which is among a family of information-geometric structures introduced on ${\mathcal{S}}$ in a unified way, possesses estimation-theoretical characterizations.

In order to clarify our motivation, we begin with an overview of a result in the classical estimation theory. Let ${\mathcal{P}}={\mathcal{P}}(\Omega)$ be the totality of strictly positive probability distributions (probability mass functions) on a finite set $\Omega$ , and let ${\mathcal{M}}=\{p_{\xi}\,|\,\xi=(\xi^{1},\dots,\xi^{n})\in\Xi\}\subset{\mathcal{P}}$ be a statistical model whose elements $p_{\xi}$ are smoothly and injectively parametrized by an $n$ -dimensional parameter $\xi$ ranging over an open subset $\Xi$ of $\mathbb{R}^{n}$ . As is well known, the variance matrix $V_{\xi}\in\mathbb{R}^{n\times n}$ of an arbitrary unbiased estimator for the parameter $\xi$ satisfies the Cramér-Rao inequality $V_{\xi}\geq G_{\xi}^{-1}$ , where $G_{\xi}\in\mathbb{R}^{n\times n}$ denotes the Fisher information matrix. An unbiased estimator achieving the equality $V_{\xi}=G_{\xi}^{-1}$ for every $\xi\in\Xi$ is called an efficient estimator for the parameter $\xi$ , whose existence is known to impose strong restrictions on both the set ${\mathcal{M}}$ and the parametrization $\xi\mapsto p_{\xi}$ . Namely, we have the following theorem (e.g. § 5a.2 (p.324) of [1], Eq. (7.14) in § 7.2 of [2], Theorem 3.12 of [3]).

Theorem 1.1.

For a statistical model ${\mathcal{M}}=\{p_{\xi}\}$ , the following conditions are equivalent.

(i) There exists an efficient estimator for the parameter $\xi$ .

(ii) ${\mathcal{M}}$ is an exponential family, and $\xi$ is an expectation parameter.

Condition (ii) in the above theorem means that the elements of ${\mathcal{M}}$ are represented as

p_{\xi}(\omega)=\exp[C(\omega)+\sum_{i=1}^{n}\theta_{i}(\xi)F^{i}(\omega)-\psi(\xi)]

(1.1)

and that the parameter $\xi$ satisfies

\xi^{i}=E_{\xi}(F^{i}),

(1.2)

where $E_{\xi}$ denotes the expectation w.r.t. the distribution $p_{\xi}$ . This condition is expressed in the language of geometry as follows.

(ii)’ ${\mathcal{M}}$ is autoparallel in ${\mathcal{P}}$ w.r.t. the e-connection of ${\mathcal{P}}$ , and $\xi$ forms an affine coordinate system w.r.t. the m-connection of ${\mathcal{M}}$ .
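As a small numerical illustration of Theorem 1.1, the following sketch checks the equality $V_{\xi}=G_{\xi}^{-1}$ for a one-parameter exponential family in its expectation parameter. The sample space $\Omega=\{0,1,2\}$, the choices $C=0$ and $F(\omega)=\omega$, the evaluation point and the finite-difference step are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Hypothetical illustrative choices: Omega = {0, 1, 2}, C = 0, F(omega) = omega.
F = np.array([0.0, 1.0, 2.0])

def p_theta(theta):
    """One-parameter exponential family of the form (1.1) with C = 0."""
    w = np.exp(theta * F)
    return w / w.sum()  # dividing by the sum plays the role of exp(-psi)

def xi_of_theta(theta):
    """Expectation parameter (1.2): xi = E[F]."""
    return float(p_theta(theta) @ F)

theta0, h = 0.3, 1e-5  # evaluation point and finite-difference step (assumed)
p = p_theta(theta0)
xi0 = xi_of_theta(theta0)

# Variance of F, which is an unbiased estimator for xi by (1.2).
var_F = float(p @ (F - xi0) ** 2)

# Score w.r.t. xi via the chain rule, using central differences in theta.
dlogp_dtheta = (np.log(p_theta(theta0 + h)) - np.log(p_theta(theta0 - h))) / (2 * h)
dxi_dtheta = (xi_of_theta(theta0 + h) - xi_of_theta(theta0 - h)) / (2 * h)
score_xi = dlogp_dtheta / dxi_dtheta

# Fisher information in the xi parametrization.
G_xi = float(p @ score_xi ** 2)

# Equality in the Cramer-Rao inequality means var_F * G_xi is 1
# up to finite-difference error.
print(var_F * G_xi)
```

Reparametrizing the same family by $\theta$ instead of $\xi$ would leave the set ${\mathcal{M}}$ unchanged but, in general, destroy the equality, which is why condition (ii) constrains the parametrization as well as the model.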

Remark 1.2.

In the notation of [3] as well as of many references on information geometry, $\theta_{i}$ , $F^{i}$ and $\xi^{i}$ in (1.1) and (1.2) are expressed as $\theta^{i}$ , $F_{i}$ and $\eta_{i}$ , respectively. The reason why the upper and lower indices are reversed here (and throughout the paper) is that we first treat $\xi$ as an arbitrary coordinate system, which has an upper index (superscript) as in the standard notation of differential geometry, and then consider the condition for $\xi$ to become an m-affine coordinate system such as the expectation coordinate system.

A quantum version of Theorem 1.1 is known, which we state below. Let ${\mathcal{H}}$ be a finite-dimensional Hilbert space, ${\mathcal{S}}={\mathcal{S}}({\mathcal{H}})$ be the totality of strictly positive density operators on ${\mathcal{H}}$ , and ${\mathcal{M}}=\{\rho_{\xi}\;|\;\xi\in\Xi\}$ be an arbitrary quantum statistical model consisting of states $\rho_{\xi}$ in ${\mathcal{S}}$ . It is well known that the variance matrix $V_{\xi}$ of an arbitrary unbiased estimator for the parameter $\xi$ satisfies the SLD Cramér-Rao inequality $V_{\xi}\geq G_{\xi}^{-1}$ [6, 7], where $G_{\xi}$ is the SLD Fisher information matrix (see Section 4 for details). When an unbiased estimator satisfies $V_{\xi}=G_{\xi}^{-1}$ for every $\xi\in\Xi$ , we call it an efficient estimator for the parameter $\xi$ as in the classical case. Then we have the following theorem (Theorem 7.6 of [3]).

Theorem 1.3.

For a quantum statistical model ${\mathcal{M}}=\{\rho_{\xi}\}$ , the following conditions are equivalent.

(i) There exists an efficient estimator for the parameter $\xi$ .

(ii) There exist mutually commuting Hermitian operators $F^{1},\ldots,F^{n}$ and a strictly positive operator $P$ such that the elements of ${\mathcal{M}}$ are represented as

\rho_{\xi}=\exp\Bigl{[}\,\frac{1}{2}\Bigl{\{}\sum_{i=1}^{n}\theta_{i}(\xi)F^{i}-\psi(\xi)\Bigr{\}}\Bigr{]}\,P\,\exp\Bigl{[}\,\frac{1}{2}\Bigl{\{}\sum_{i=1}^{n}\theta_{i}(\xi)F^{i}-\psi(\xi)\Bigr{\}}\Bigr{]}

(1.3)

and that

\forall\xi\in\Xi,\;\forall i,\quad\xi^{i}={\rm Tr}\left(\rho_{\xi}\,F^{i}\right).

(1.4)

When a model ${\mathcal{M}}=\{\rho_{\xi}\}$ is represented as (1.3), we call it a quasi-classical exponential family. As is pointed out in [3] and will be verified in Section 4 of this paper, a quasi-classical exponential family is e-autoparallel in ${\mathcal{S}}$ w.r.t. the SLD structure. Note, however, that this is merely a special case of e-autoparallel submanifolds. Namely, the existence of an efficient estimator is too strong as a characterization of the e-autoparallelity. Is it then possible to characterize the e-autoparallelity by an estimation-theoretic condition which is weaker than the existence of an efficient estimator? We give an affirmative answer to this question in Section 5. We also give another characterization of the e-autoparallelity in Section 7 by considering estimation for scalar-valued functions instead of estimation for vector-valued parameters.
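The form (1.3) can be explored numerically. The sketch below builds a one-parameter qubit model of that form and checks two things: that $F-\xi I$ satisfies the SLD equation $\partial_{\theta}\rho=\rho\circ L$ in the $\theta$ parametrization, and that the variance of $F$ equals the inverse SLD Fisher information in the $\xi$ parametrization. The choices $n=1$, $F=\sigma_{z}$, the particular matrix $P$, the normalization $\psi(\theta)=\log{\rm Tr}(Pe^{\theta F})$ and the helper name `exp_hermitian` are assumptions made for this illustration only.

```python
import numpy as np

def exp_hermitian(A):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return (U * np.exp(w)) @ U.conj().T

# Hypothetical choices: n = 1, F = sigma_z, and an arbitrary strictly positive P.
F = np.array([[1.0, 0.0], [0.0, -1.0]], dtype=complex)
P = np.array([[2.0, 0.5 - 0.3j], [0.5 + 0.3j, 1.0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def rho(theta):
    """State of the form (1.3); psi is chosen so that Tr(rho) = 1."""
    psi = np.log(np.trace(P @ exp_hermitian(theta * F)).real)
    E = exp_hermitian(0.5 * (theta * F - psi * I2))
    return E @ P @ E

def mean_F(theta):
    """Expectation Tr(rho F), i.e. the parameter xi in (1.4)."""
    return np.trace(rho(theta) @ F).real

theta0, h = 0.4, 1e-5  # evaluation point and finite-difference step (assumed)
r = rho(theta0)
xi = mean_F(theta0)
var_F = np.trace(r @ F @ F).real - xi ** 2

# Candidate SLD in the theta parametrization: L = F - xi I.
L = F - xi * I2
drho = (rho(theta0 + h) - rho(theta0 - h)) / (2 * h)
residual = np.linalg.norm(drho - 0.5 * (r @ L + L @ r))

# SLD Fisher information w.r.t. theta, then transformed to the xi parametrization.
G_theta = np.trace(r @ L @ L).real
dxi_dtheta = (mean_F(theta0 + h) - mean_F(theta0 - h)) / (2 * h)
G_xi = G_theta / dxi_dtheta ** 2

# residual should be near 0 and var_F * G_xi near 1, up to finite-difference error.
print(residual, var_F * G_xi)
```

Note that $P$ need not commute with $F$ here; only the operators $F^{i}$ are required to commute with one another, which is automatic for $n=1$.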

As mentioned above, the e-autoparallelity in the SLD structure has estimation-theoretical significance and is therefore a concept worth studying further. It should be noted here that the e-connection in the SLD structure is curvature-free but not torsion-free, so that the e-connection is not flat. This also means that ${\mathcal{S}}$ is not a dually flat space w.r.t. the SLD structure. In the case of a flat connection, an autoparallel submanifold corresponds to an affine subspace in the coordinate space of an affine coordinate system, so that the existence condition for autoparallel submanifolds is obvious. For a non-flat connection, on the other hand, we cannot see the whole picture of autoparallel submanifolds and are faced with the new problem of what kind of condition ensures the existence of autoparallel submanifolds. Therefore, it is also important to study the autoparallelity from a purely geometrical point of view, away from estimation problems. This is another concern in this paper, along with the estimation-theoretical consideration.

The paper is organized as follows. In Section 2, we explain the basic issues about the autoparallelity for an affine connection $\nabla$ on a general differential manifold, focusing in particular on the situation where the dual connection of $\nabla$ w.r.t. a Riemannian metric is flat. In Section 3, we introduce a family of information-geometric structures on the space ${\mathcal{S}}({\mathcal{H}})$ , and derive the basic issues concerning the e-autoparallelity by applying the results of Section 2. These two sections are preliminaries for later sections. Although the results shown there are mostly known, we present them together with their derivations so that the descriptions are as self-contained as possible. Section 4 also consists of basically known results, where Theorem 1.3 is revisited, and it is clarified that the existence of an efficient estimator only partially characterizes the e-autoparallelity for the SLD structure. This observation motivates Section 5, where a sequence of estimators, which is called a filtration of estimators, is treated instead of a single estimator, and the existence of an efficient filtration is shown to characterize the e-autoparallelity. In Section 6, it is shown that a quantum Gaussian shift model has an efficient filtration and that the model in fact exhibits an analogous property to the e-autoparallelity in ${\mathcal{S}}({\mathcal{H}})$ , although the Hilbert space ${\mathcal{H}}$ is infinite-dimensional in this case, so that we cannot fully develop a differential geometrical argument there. Section 7 treats an estimation problem for scalar-valued functions, where it is shown that a quantum statistical manifold is e-autoparallel in ${\mathcal{S}}$ if and only if the linear space formed by functions having efficient estimators is of maximal dimension. In Section 8, we move away from estimation theory and consider the condition for the existence of e-autoparallel submanifolds from a purely geometrical point of view, where the involutivity of a parallel distribution of tangent spaces is studied in relation to the torsion tensor. In Section 9, we treat the case when $\dim{\mathcal{H}}=2$ and study the SLD structure of the space ${\mathcal{S}}({\mathcal{H}})$ of qubit states. It is shown there that ${\mathcal{S}}({\mathcal{H}})$ in this case has a characteristic property that every e-parallel distribution is involutive. Section 10 is devoted to concluding remarks. Some proofs and additional results are included in the Appendix for the sake of readability of the main text.

Remark 1.4.

We make some remarks on the nomenclature and the notation of the paper.

1. Throughout the paper, when we refer to a manifold, say $M$ , it means that $M$ is a manifold with a trivial global structure, so that we need not worry about the difference between global properties and local properties of $M$ . For instance, $M$ is always supposed to have a global coordinate system, and every closed differential form on $M$ is considered to be exact.

2. When we say that $\xi=(\xi^{1},\ldots,\xi^{n})$ is a coordinate system of a manifold $M$ in the subsequent sections, it basically means that $\xi$ is a map $\xi:M\rightarrow\mathbb{R}^{n}$ (a global chart of $M$ ) which represents each point $p\in M$ by an $n$ -dimensional vector $\xi(p)=(\xi^{1}(p),\ldots,\xi^{n}(p))\in\mathbb{R}^{n}$ , although the same symbol $\xi$ has appeared above as a parameter to specify a point in the manifold. They are equivalent by $\xi(p)=\xi^{\prime}\Leftrightarrow p=p_{\xi^{\prime}}$ . A parametrization is often more convenient than a coordinate system when dealing with concrete examples. In fact, we will use parametrizations in Section 6 for quantum Gaussian states and in Section 9 for qubit states.

3. This paper contains both arguments on quantum statistical manifolds (manifolds consisting of density operators) and those on general manifolds. We denote quantum statistical manifolds by ${\mathcal{S}},{\mathcal{M}},{\mathcal{N}},\dots$ , while general manifolds are denoted by $S,M,N,\dots$ .

2 Basic issues about autoparallelity

In this section we summarize basic issues related to autoparallelity from the perspective of general differential geometry, which will be necessary for later discussions.

Let $S$ be an arbitrary manifold, and denote the totality of smooth functions and that of smooth vector fields on $S$ by ${\mathcal{F}}(S)$ and ${\mathscr{X}}(S)$ , respectively. Suppose that $S$ is provided with an affine connection $\nabla$ , which is a map ${\mathscr{X}}(S)\times{\mathscr{X}}(S)\rightarrow{\mathscr{X}}(S),\;(X,Y)\mapsto\nabla_{X}Y$ . Given a submanifold $M$ of $S$ , let ${\mathscr{X}}(S/M)$ denote the totality of smooth mappings which map each point $p\in M$ to a tangent vector in $T_{p}(S)$ , i.e., sections of the vector bundle $\bigsqcup_{p\in M}T_{p}(S)$ . Then $\nabla$ naturally induces a map ${\mathscr{X}}(M)\times{\mathscr{X}}(S/M)\rightarrow{\mathscr{X}}(S/M)$ so that for any $X\in{\mathscr{X}}(M)$ and any $Y\in{\mathscr{X}}(S/M)$ , $\nabla_{X}Y$ is defined as an element of ${\mathscr{X}}(S/M)$ . Since ${\mathscr{X}}(M)={\mathscr{X}}(M/M)\subset{\mathscr{X}}(S/M)$ , $\nabla_{X}Y\in{\mathscr{X}}(S/M)$ is defined for any $X,Y\in{\mathscr{X}}(M)$ , although it does not necessarily belong to ${\mathscr{X}}(M)$ .

When $\nabla_{X}Y$ belongs to ${\mathscr{X}}(M)$ for every $X,Y\in{\mathscr{X}}(M)$ , $M$ is said to be autoparallel w.r.t. $\nabla$ or $\nabla$ -autoparallel in $S$ (e.g., Sec. 8 in Chap. VII of [4]). In particular, $S$ itself is $\nabla$ -autoparallel in $S$ . An autoparallel curve is usually called a geodesic (or pregeodesic when we wish to clarify that our interest lies only in the image of the curve), so that the autoparallelity is a multi-dimensional extension of the notion of geodesic. When $M$ is $\nabla$ -autoparallel in $S$ , $\nabla$ defines an affine connection on $M$ . We denote this connection by $\nabla|_{M}$ when we wish to distinguish it from the original connection $\nabla$ on $S$ . The autoparallelity is transitive in the sense that if $M$ is $\nabla$ -autoparallel in $S$ and $N$ is $\nabla|_{M}$ -autoparallel in $M$ , then $N$ is $\nabla$ -autoparallel in $S$ .

Remark 2.1.

If $M$ is $\nabla$ -autoparallel in $S$ and $N$ is a nonempty open set of $M$ , then $N$ is also $\nabla$ -autoparallel in $S$ having the same dimension as $M$ . In this paper, we restrict ourselves to maximal autoparallel submanifolds to avoid this ambiguity. In particular, an autoparallel submanifold of $S$ having the same dimension as $S$ is considered to be only $S$ . This restriction is merely for simplicity of descriptions.

Remark 2.2.

A similar but different notion to autoparallelity is total geodesicness. A submanifold $M$ is said to be totally geodesic w.r.t. $\nabla$ or $\nabla$ -totally geodesic in $S$ when for any point $p\in M$ and any tangent vector $X_{p}\in T_{p}(M)$ of $M$ , the $\nabla$ -geodesic passing through $p$ in direction $X_{p}$ lies in $M$ . It is obvious that the autoparallelity implies the total geodesicness, but the converse is not true in general except when $\nabla$ is torsion-free ([4], Theorem 8.4 in Chap. VII). We will revisit this topic in Remark 8.6.

A vector field $X\in{\mathscr{X}}(S)$ is said to be parallel w.r.t. $\nabla$ or $\nabla$ -parallel when $\forall Y\in{\mathscr{X}}(S),\,\nabla_{Y}X=0$ . More generally, $X\in{\mathscr{X}}(S/M)$ (including the case $X\in{\mathscr{X}}(M)$ ) is said to be $\nabla$ -parallel when $\forall Y\in{\mathscr{X}}(M),\,\nabla_{Y}X=0$ .

When there exist on $S$ the same number of linearly independent $\nabla$ -parallel vector fields as $\dim S$ , we say that $(S,\nabla)$ is curvature-free. This condition is known to be equivalent to the curvature tensor of $\nabla$ vanishing on $S$ . When $(S,\nabla)$ is curvature-free, the parallel transport $\Phi^{(\nabla)}_{p,q}:T_{p}(S)\rightarrow T_{q}(S)$ is defined for arbitrary two points $p,q\in S$ so that a vector field $X\in{\mathscr{X}}(S)$ is $\nabla$ -parallel iff $\forall p,q\in S,\;\Phi^{(\nabla)}_{p,q}(X_{p})=X_{q}$ .

The following two propositions are straightforward, where the curvature-freeness is essential.

Proposition 2.3.

For a submanifold $M$ of $S$ on which a curvature-free connection $\nabla$ is given and for $X\in{\mathscr{X}}(S/M)$ (including the case when $X\in{\mathscr{X}}(M)$ ), the following conditions are equivalent.

(i) $X$ is $\nabla$ -parallel.

(ii) $\exists\tilde{X}\in{\mathscr{X}}(S)$ , $\tilde{X}$ is $\nabla$ -parallel and $X=\tilde{X}|_{M}$ .

Proposition 2.4.

For an $n$ -dimensional submanifold $M$ of $S$ on which a curvature-free connection $\nabla$ is given, the following conditions are equivalent.

(i) $M$ is $\nabla$ -autoparallel in $S$ .

(ii) $\forall p,q\in M,\;\Phi_{p,q}^{(\nabla)}(T_{p}(M))=T_{q}(M)$ .

(iii) There exist $n$ linearly independent $\nabla$ -parallel vector fields on $M$ .

(iv) There exist $n$ linearly independent $\nabla$ -parallel vector fields $\tilde{X}^{1},\ldots,\tilde{X}^{n}$ on $S$ such that $\forall i$ , $\tilde{X}^{i}|_{M}\in{\mathscr{X}}(M)$ .

When these conditions hold, a vector field $X$ on $M$ is $\nabla|_{M}$ -parallel iff it is $\nabla$ -parallel, and $\nabla|_{M}$ is curvature-free.

In the following, we consider the case when $(S,\nabla)$ is additionally provided with a Riemannian metric $g$ for which the dual connection $\nabla^{*}$ of $\nabla$ is flat. Namely, the triple $(g,\nabla,\nabla^{*})$ satisfies the duality [5] [3]:

\forall X,Y,Z\in{\mathscr{X}}(S),\;Xg(Y,Z)=g(\nabla_{X}Y,Z)+g(Y,\nabla_{X}^{*}Z),

(2.1)

and $\nabla^{*}$ is flat in the sense that it is curvature-free and torsion-free. The flatness is known to be equivalent to the existence of a coordinate system $\xi=(\xi^{i})$ , which is called an affine coordinate system w.r.t. $\nabla^{*}$ , such that $\partial_{i}=\frac{\partial}{\partial\xi^{i}},i\in\{1,\ldots,\dim S\}$ , are all $\nabla^{*}$ -parallel. In this case, $\nabla$ turns out to be curvature-free, since the curvature-freeness is preserved by the duality of connections (Theorem 3.3 of [3]), but is not necessarily torsion-free, and hence $(S,g,\nabla,\nabla^{*})$ is not necessarily dually flat.

Proposition 2.5.

In the above situation, we have:

(1) For a vector field $X\in{\mathscr{X}}(S)$ , $X$ is $\nabla$ -parallel iff $g(X,Y)$ is constant on $S$ for every $\nabla^{*}$ -parallel vector field $Y\in{\mathscr{X}}(S)$ .

(2) For a vector field $X\in{\mathscr{X}}(S)$ , $X$ is $\nabla^{*}$ -parallel iff $g(X,Y)$ is constant on $S$ for every $\nabla$ -parallel vector field $Y\in{\mathscr{X}}(S)$ .

Proof The proposition relies not on the torsion-freeness of $\nabla^{*}$ but only on the curvature-freeness of $\nabla$ and $\nabla^{*}$ , so that it suffices to show (1). For an $X\in{\mathscr{X}}(S)$ , we have

\begin{aligned}\text{$X$ is $\nabla$-parallel}\;&\Leftrightarrow\;\forall Y,\forall Z,\;g(\nabla_{Z}X,Y)=0\\ &\stackrel{{\scriptstyle\text{a}}}{{\Leftrightarrow}}\;\forall Y:\text{$\nabla^{*}$-parallel},\forall Z,\;g(\nabla_{Z}X,Y)=0\\ &\stackrel{{\scriptstyle\text{b}}}{{\Leftrightarrow}}\;\forall Y:\text{$\nabla^{*}$-parallel},\forall Z,\;Zg(X,Y)=0\\ &\Leftrightarrow\;\forall Y:\text{$\nabla^{*}$-parallel},\;\text{$g(X,Y)$ is constant},\end{aligned}

(2.2)

where $\Leftarrow$ in $\stackrel{{\scriptstyle\text{a}}}{{\Leftrightarrow}}$ follows since the set of $\nabla^{*}$ -parallel vector fields has the same dimension as $\dim S$ due to the curvature-freeness of $\nabla^{*}$ , and $\stackrel{{\scriptstyle\text{b}}}{{\Leftrightarrow}}$ follows since the duality (2.1) and the $\nabla^{*}$ -parallelity of $Y$ imply

Zg(X,Y)=g(\nabla_{Z}X,Y)+g(X,\nabla^{*}_{Z}Y)=g(\nabla_{Z}X,Y).

(2.3)

$\Box$

Suppose that $M$ is $\nabla$ -autoparallel in $S$ . Then $M$ is a Riemannian submanifold of $(S,g)$ equipped with the affine connection $\nabla|_{M}$ . Hence the dual connection of $\nabla|_{M}$ w.r.t. $g$ (more precisely, w.r.t. the induced metric $g|_{M}$ on $M$ ) is defined, which we denote by $\nabla^{*}_{M}:=(\nabla|_{M})^{*}$ ; i.e.,

\forall X,Y,Z\in{\mathscr{X}}(M),\;Xg(Y,Z)=g(\nabla_{X}Y,Z)+g(Y,(\nabla^{*}_{M})_{X}Z),

(2.4)

where we have applied $(\nabla|_{M})_{X}Y=\nabla_{X}Y$ . From (2.1) and (2.4), we have

\forall X,Y,Z\in{\mathscr{X}}(M),\;g(Y,(\nabla^{*}_{M})_{X}Z)=g(Y,\nabla^{*}_{X}Z),

(2.5)

which means that $(\nabla^{*}_{M})_{X}Z$ is the $g$ -projection of $\nabla^{*}_{X}Z\in{\mathscr{X}}(S/M)$ onto ${\mathscr{X}}(M)$ .

\begin{CD}\nabla @>{\text{restriction}}>{}>\nabla|_{M}\\ @V{\text{$g$-dual}}V{}V@V{}V{\text{$g|_{M}$-dual}}V\\ \nabla^{*}@>{\text{$g$-projection}}>{}>\nabla_{M}^{*}\end{CD}

Since the curvature-freeness and the torsion-freeness are respectively preserved by the duality and the projection, $\nabla^{*}_{M}$ is flat as in the case with $\nabla^{*}$ , which ensures the existence of a $\nabla^{*}_{M}$ -affine coordinate system of $M$ .

The following proposition will be of fundamental importance for later arguments.

Proposition 2.6.

For an $n$ -dimensional submanifold $M$ of $S$ and a coordinate system $\xi$ of $M$ , the following conditions are equivalent.

(i) $M$ is $\nabla$ -autoparallel in $S$ , and $\xi$ is a $\nabla^{*}_{M}$ -affine coordinate system.

(ii) For every $i\in\{1,\ldots,n\}$ , the vector field

X^{i}:=\sum_{j}g^{ij}\partial_{j}\in{\mathscr{X}}(M)

(2.6)

is $\nabla$ -parallel, where $\partial_{i}:=\frac{\partial}{\partial\xi^{i}}$ and $[g^{ij}]:=[g_{ij}:=g(\partial_{i},\partial_{j})]^{-1}$ .

Proof We first show (i) $\Rightarrow$ (ii). Assume (i) and define $X^{i}\in{\mathscr{X}}(M)$ by (2.6). Then we have $g(X^{i},\partial_{j})=\delta^{i}_{j}$ for every $i,j$ , which is constant on $M$ . Noting that $n(=\dim M)$ vector fields $\{\partial_{j}\}$ are $\nabla^{*}_{M}$ -parallel and that Prop. 2.5 can be applied to $(M,g|_{M},\nabla|_{M},\nabla^{*}_{M})$ due to the $\nabla$ -autoparallelity of $M$ , it follows from item (1) of Prop. 2.5 that $X^{i}$ is $\nabla|_{M}$ -parallel, and hence it is $\nabla$ -parallel.

We next show (ii) $\Rightarrow$ (i). Assume (ii). Then according to Prop. 2.4, $M$ is $\nabla$ -autoparallel in $S$ . Noting that $g(X^{i},\partial_{j})$ is constant on $M$ and applying item (2) of Prop. 2.5 to $(M,g|_{M},\nabla|_{M},\nabla^{*}_{M})$ , we have that $\{\partial_{j}\}$ are $\nabla^{*}_{M}$ -parallel, which means that $\xi$ is $\nabla^{*}_{M}$ -affine. $\Box$

3 Information geometric structures on quantum statistical manifolds

In this section, we introduce the information-geometric structures on quantum statistical manifolds and apply the results of the previous section to them. The geometric structure treated here is essentially the same as the one studied in § 7.3 of [3].

Let ${\mathcal{L}}={\mathcal{L}}({\mathcal{H}})$ , ${\mathcal{L}}_{{\rm h}}={\mathcal{L}}_{{\rm h}}({\mathcal{H}})$ and ${\mathcal{S}}={\mathcal{S}}({\mathcal{H}})$ be the totality of linear operators on ${\mathcal{H}}$ , that of Hermitian operators on ${\mathcal{H}}$ and that of strictly positive density operators on ${\mathcal{H}}$ , respectively. Then ${\mathcal{S}}$ is an open subset of the affine space ${{\mathcal{L}}_{{\rm h}}}_{,1}:=\{A\in{\mathcal{L}}_{{\rm h}}\,|\,{\rm Tr}A=1\}$ , so that a flat affine connection is naturally introduced on ${\mathcal{S}}$ , which we call the m-connection and denote by $\nabla^{({\rm m})}$ . In order to express $\nabla^{({\rm m})}$ more explicitly, we introduce the embedding map $\iota:{\mathcal{S}}\rightarrow{{\mathcal{L}}_{{\rm h}}}_{,1}$ so that $\rho\in{\mathcal{S}}$ is denoted by $\iota(\rho)$ when treating it as an element of ${{\mathcal{L}}_{{\rm h}}}_{,1}$ . Since $\iota$ is a smooth map, it has the differential at every point $\rho\in{\mathcal{S}}$ , which we denote by $\iota_{*}=(d\iota)_{\rho}:T_{\rho}({\mathcal{S}})\rightarrow{{\mathcal{L}}_{{\rm h}}}_{,0}:=\{A\in{\mathcal{L}}_{{\rm h}}\,|\,{\rm Tr}A=0\}$ . For a vector field $X\in{\mathscr{X}}({\mathcal{S}})$ , the map $\iota_{*}(X):{\mathcal{S}}\rightarrow{{\mathcal{L}}_{{\rm h}}}_{,0}$ is defined to be $\rho\mapsto\iota_{*}(X_{\rho})=(d\iota)_{\rho}(X_{\rho})$ . Then the definition of the m-connection is represented as follows:

\forall X,Y\in{\mathscr{X}}({\mathcal{S}}),\;\iota_{*}(\nabla^{({\rm m})}_{X}Y)=X\iota_{*}(Y),

(3.1)

where $X\iota_{*}(Y):{\mathcal{S}}\rightarrow{{\mathcal{L}}_{{\rm h}}}_{,0}$ is the derivative of $\iota_{*}(Y)$ w.r.t. $X$ . When a coordinate system $\xi=(\xi^{i})$ is arbitrarily given and the elements of ${\mathcal{S}}$ are parametrized by it as $\rho_{\xi}$ , we have

\iota_{*}\left(\partial_{i}\right)=\partial_{i}\rho_{\xi},

(3.2)

and

\iota_{*}\left(\nabla^{({\rm m})}_{\partial_{i}}\partial_{j}\right)=\partial_{i}\,\iota_{*}\left(\partial_{j}\right)=\partial_{i}\partial_{j}\rho_{\xi},

(3.3)

where $\partial_{i}:=\frac{\partial}{\partial\xi^{i}}$ .

Suppose that we are given a family of inner products $\{\left\langle\cdot,\cdot\right\rangle_{\rho}\,|\,\rho\in{\mathcal{S}}({\mathcal{H}})\}$ on the $\mathbb{R}$ -linear space ${\mathcal{L}}_{{\rm h}}$ , where the correspondence $\rho\mapsto\left\langle\cdot,\cdot\right\rangle_{\rho}$ is smooth, and assume that

\forall\rho\in{\mathcal{S}},\forall A\in{\mathcal{L}}_{{\rm h}},\;\left\langle A,I\right\rangle_{\rho}=\left\langle A\right\rangle_{\rho}:={\rm Tr}(\rho A).

(3.4)

The inner products are represented as

\left\langle A,B\right\rangle_{\rho}=\left\langle A,\Omega_{\rho}(B)\right\rangle_{\rm HS}={\rm Tr}(A\,\Omega_{\rho}(B))

(3.5)

by a family of super-operators $\{\Omega_{\rho}:{\mathcal{L}}_{{\rm h}}\rightarrow{\mathcal{L}}_{{\rm h}}\}_{\rho\in{\mathcal{S}}}$ , where $\left\langle\cdot,\cdot\right\rangle_{\rm HS}$ denotes the Hilbert-Schmidt inner product. Note that the assumption (3.4) is equivalent to

\forall\rho\in{\mathcal{S}},\;\Omega_{\rho}(I)=\rho.

(3.6)

For an arbitrary tangent vector $X_{\rho}\in T_{\rho}({\mathcal{S}})$ , a Hermitian operator $L_{X_{\rho}}\in{\mathcal{L}}_{{\rm h}}$ is defined by the relation

\forall A\in{\mathcal{L}}_{{\rm h}},\;X_{\rho}\left\langle A\right\rangle=\left\langle L_{X_{\rho}},A\right\rangle_{\rho},

(3.7)

where the LHS denotes the derivative of the function $\left\langle A\right\rangle:{\mathcal{S}}\rightarrow\mathbb{R}$ , $\rho\mapsto\left\langle A\right\rangle_{\rho}$ w.r.t. $X_{\rho}$ . Noting that the LHS and the RHS are represented as ${\rm Tr}(\iota_{*}(X_{\rho})A)$ and ${\rm Tr}(\Omega_{\rho}(L_{X_{\rho}})\,A)$ , respectively, we can rewrite (3.7) into

\iota_{*}(X_{\rho})=\Omega_{\rho}(L_{X_{\rho}}).

(3.8)

From (3.4) and (3.7), we have

\left\langle L_{X_{\rho}}\right\rangle_{\rho}=\left\langle L_{X_{\rho}},I\right\rangle_{\rho}=X_{\rho}\left\langle I\right\rangle=X_{\rho}1=0.

(3.9)

Since $X_{\rho}\leftrightarrow\iota_{*}(X_{\rho})\leftrightarrow L_{X_{\rho}}$ are one-to-one correspondences, we obtain the following identity:

\{L_{X_{\rho}}\,|\,X_{\rho}\in T_{\rho}({\mathcal{S}})\}=\{A\in{\mathcal{L}}_{{\rm h}}\,|\,\left\langle A\right\rangle_{\rho}=0\}.

(3.10)

In the following, we often express (3.4) as

\forall A\in{\mathcal{L}}_{{\rm h}},\;\left\langle A,I\right\rangle=\left\langle A\right\rangle

(3.11)

as an identity for functions on ${\mathcal{S}}$ . Similarly, (3.7) is expressed as

\forall A\in{\mathcal{L}}_{{\rm h}},\;X\left\langle A\right\rangle=\left\langle L_{X},A\right\rangle,

(3.12)

where $X$ is a vector field on ${\mathcal{S}}$ , $L_{X}$ denotes the map ${\mathcal{S}}\rightarrow{\mathcal{L}}_{{\rm h}}$ , $\rho\mapsto L_{X_{\rho}}$ , and $\left\langle L_{X},A\right\rangle$ denotes the function $\rho\mapsto\left\langle L_{X_{\rho}},A\right\rangle_{\rho}$ .

For a submanifold ${\mathcal{M}}$ of ${\mathcal{S}}$ and a vector field $X\in{\mathscr{X}}({\mathcal{M}})$ on it, the map $L_{X}:{\mathcal{M}}\rightarrow{\mathcal{L}}_{{\rm h}},\rho\mapsto L_{X_{\rho}}$ is defined, for which (3.12) holds as an identity for functions on ${\mathcal{M}}$ . We may write $X\left\langle A\right\rangle$ as $X\left\langle A\right\rangle\!|_{{\mathcal{M}}}$ in this case. In particular, given a coordinate system $\xi=(\xi^{i})$ of ${\mathcal{M}}$ , we have

\forall A\in{\mathcal{L}}_{{\rm h}},\;\partial_{i}\left\langle A\right\rangle=\partial_{i}\left\langle A\right\rangle\!|_{{\mathcal{M}}}=\left\langle L_{i},A\right\rangle,

(3.13)

where $\partial_{i}:=\frac{\partial}{\partial\xi^{i}}$ and $L_{i}:=L_{\partial_{i}}$ .

Remark 3.1.

In the terminology of [3], $\iota_{*}(X_{\rho})$ and $L_{X_{\rho}}$ are called the m-representation and e-representation, and are denoted by $\iota_{*}(X_{\rho})=X_{\rho}^{({\rm m})}$ and $L_{X_{\rho}}=X_{\rho}^{({\rm e})}$ , respectively. We do not use the symbols $X_{\rho}^{({\rm m})}$ and $X_{\rho}^{({\rm e})}$ in this paper, but may use the following notation when convenient:

T^{({\rm m})}_{\rho}({\mathcal{M}}):=\{\iota_{*}(X_{\rho})\,|\,X_{\rho}\in T_{\rho}({\mathcal{M}})\},

(3.14)

T^{({\rm e})}_{\rho}({\mathcal{M}}):=\{L_{X_{\rho}}\,|\,X_{\rho}\in T_{\rho}({\mathcal{M}})\}.

(3.15)

Note that

T^{({\rm m})}_{\rho}({\mathcal{S}})={{\mathcal{L}}_{{\rm h}}}_{,0}\quad\text{and}\quad T^{({\rm e})}_{\rho}({\mathcal{S}})=\{A\in{\mathcal{L}}_{{\rm h}}\,|\,\left\langle A\right\rangle_{\rho}=0\}.

(3.16)

Now, we define a Riemannian metric $g$ on ${\mathcal{S}}$ by

\forall\rho\in{\mathcal{S}},\forall X_{\rho},Y_{\rho}\in T_{\rho}({\mathcal{S}}),\;g_{\rho}(X_{\rho},Y_{\rho}):=\left\langle L_{X_{\rho}},L_{Y_{\rho}}\right\rangle_{\rho}={\rm Tr}\left(\iota_{*}(X_{\rho})L_{Y_{\rho}}\right),

(3.17)

which is equivalently written as

\forall X,Y\in{\mathscr{X}}({\mathcal{S}}),\;g(X,Y):=\left\langle L_{X},L_{Y}\right\rangle={\rm Tr}\left(\iota_{*}(X)L_{Y}\right).

(3.18)

We also have the expression

\forall X,Y\in{\mathscr{X}}({\mathcal{S}}),\;g(X,Y)=-\left\langle XL_{Y}\right\rangle=-\left\langle YL_{X}\right\rangle,

(3.19)

where $XL_{Y}:{\mathcal{S}}\rightarrow{\mathcal{L}}_{{\rm h}}$ , $\rho\mapsto X_{\rho}L_{Y}$ , denotes the derivative of the map $L_{Y}:{\mathcal{S}}\rightarrow{\mathcal{L}}_{{\rm h}}$ , $\rho\mapsto L_{Y_{\rho}}$ , w.r.t. $X$ , and $\left\langle XL_{Y}\right\rangle$ denotes the function $\rho\mapsto\left\langle X_{\rho}L_{Y}\right\rangle_{\rho}$ . This expression is derived as

\begin{aligned}0=X_{\rho}\left\langle L_{Y}\right\rangle=X_{\rho}\left\langle L_{Y_{\bm{\cdot}}}\right\rangle_{\bm{\cdot}}&=X_{\rho}\left\langle L_{Y_{\rho}}\right\rangle_{\bm{\cdot}}+X_{\rho}\left\langle L_{Y_{\bm{\cdot}}}\right\rangle_{\rho}\\ &=\left\langle L_{X_{\rho}},L_{Y_{\rho}}\right\rangle_{\rho}+\left\langle X_{\rho}L_{Y}\right\rangle_{\rho},\end{aligned}

(3.20)

where the first equality follows from (3.9) and the last from (3.7), with dots ${\bm{\cdot}}$ added to clarify the positions of variables of maps. An important class of such Riemannian metrics is that of monotone metrics [9] for which $\Omega_{\rho}$ is represented as

\Omega_{\rho}(A)=f(\Delta_{\rho})(A\rho)=f(\Delta_{\rho})(A)\rho,

(3.21)

where $f:(0,\infty)\rightarrow(0,\infty)$ is an operator monotone function satisfying $\forall x>0,\,xf(1/x)=f(x)$ and $f(1)=1$ , and $\Delta_{\rho}:{\mathcal{L}}\rightarrow{\mathcal{L}}$ is the modular operator defined by $\Delta_{\rho}(A)=\rho A\rho^{-1}$ . The class contains the SLD metric, which plays the main role in this paper, defined by $f(x)=(x+1)/2$ and

\left\langle A,B\right\rangle_{\rho}={\rm Re}\,{\rm Tr}(\rho AB)={\rm Tr}(\rho(A\circ B))={\rm Tr}((\rho\circ A)B),\;\;A,B\in{\mathcal{L}}_{{\rm h}},

(3.22)

where $\circ$ denotes the symmetrized product: $A\circ B=\frac{1}{2}(AB+BA)$ . In this case, $L_{X_{\rho}}$ ( $L_{X}$ , resp.) is called the SLD (symmetric logarithmic derivative) of the tangent vector $X_{\rho}$ (the vector field $X$ , resp.). In particular, $L_{i}:=L_{\partial_{i}}$ obeys the equation

\partial_{i}\rho_{\xi}=\rho_{\xi}\circ L_{i,\xi},

(3.23)

which is a popular expression for the SLD.
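Because $\rho_{\xi}$ is strictly positive, equation (3.23) determines the SLD uniquely. The following is a minimal numerical sketch, not taken from the paper: it solves $\partial\rho=\rho\circ L$ in an eigenbasis of $\rho$ and checks $\left\langle L_{X}\right\rangle_{\rho}=0$ together with the two expressions for $g$ in (3.17). The helper names `sld` and `sym_inner` and the qubit example are illustrative assumptions.

```python
import numpy as np

def sld(rho, drho):
    """Solve drho = (rho L + L rho)/2 for Hermitian L, assuming rho > 0."""
    lam, U = np.linalg.eigh(rho)              # rho = U diag(lam) U^dagger
    d = U.conj().T @ drho @ U                 # drho in the eigenbasis of rho
    L = 2.0 * d / (lam[:, None] + lam[None, :])
    return U @ L @ U.conj().T

def sym_inner(rho, A, B):
    """Symmetrized inner product (3.22): Re Tr(rho A B)."""
    return float(np.real(np.trace(rho @ A @ B)))

# Illustrative qubit state rho = (I + r.sigma)/2 with |r| < 1.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
rho = 0.5 * (I2 + 0.3 * sx + 0.2 * sy + 0.4 * sz)

# Traceless Hermitian operators play the role of iota_*(X), iota_*(Y).
dX, dY = 0.5 * sx, 0.5 * sz
LX, LY = sld(rho, dX), sld(rho, dY)

residual = np.linalg.norm(0.5 * (rho @ LX + LX @ rho) - dX)  # (3.23)
mean_LX = float(np.real(np.trace(rho @ LX)))                 # <L_X>_rho
g_XY = sym_inner(rho, LX, LY)                                # <L_X, L_Y>_rho
g_XY_alt = float(np.real(np.trace(dX @ LY)))                 # Tr(iota_*(X) L_Y)
```

The division by $\lambda_{a}+\lambda_{b}$ is where strict positivity of $\rho$ is used; on the boundary of the state space the SLD is no longer unique and this sketch does not apply.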

Given a family of inner products $\{\left\langle\cdot,\cdot\right\rangle_{\rho}\,|\,\rho\in{\mathcal{S}}\}$ which determines a Riemannian metric $g$ , let the e-connection $\nabla^{({\rm e})}$ be defined as the dual connection of $\nabla^{({\rm m})}$ w.r.t. $g$ ; i.e.,

\forall X,Y,Z\in{\mathscr{X}}({\mathcal{S}}),\;Xg(Y,Z)=g(\nabla^{({\rm e})}_{X}Y,Z)+g(Y,\nabla^{({\rm m})}_{X}Z).

(3.24)

We have thus obtained $({\mathcal{S}},g,\nabla^{({\rm e})},\nabla^{({\rm m})})$ as an example of $(S,g,\nabla,\nabla^{*})$ treated in the previous section, where $\nabla^{({\rm e})}$ and $\nabla^{({\rm m})}$ are dual w.r.t. $g$ , $\nabla^{({\rm e})}$ is curvature-free and $\nabla^{({\rm m})}$ is flat. The quadruple $({\mathcal{S}},g,\nabla^{({\rm e})},\nabla^{({\rm m})})$ is called the information-geometric structure on ${\mathcal{S}}$ induced from a family of inner products $\{\left\langle\cdot,\cdot\right\rangle_{\rho}\,|\,\rho\in{\mathcal{S}}\}$ . In particular, the information-geometric structure induced from the symmetrized inner product (3.22) is called the SLD structure. It is the SLD structure that will play a leading role in subsequent sections in relation to estimation theory, but this section will continue the discussion on general information-geometric structures.

Remark 3.2.

As is shown in Theorem 7.1 of [3], there is only one information-geometric structure defined in the manner described above for which the e-connection is torsion-free (so that $({\mathcal{S}},g,\nabla^{({\rm e})},\nabla^{({\rm m})})$ is dually flat). That is the structure induced from the BKM (Bogoliubov-Kubo-Mori) inner product

\left\langle A,B\right\rangle_{\rho}=\int_{0}^{1}{\rm Tr}\left(\rho^{s}A\rho^{1-s}B\right)ds,\;\;A,B\in{\mathcal{L}}_{{\rm h}}.

(3.25)

The induced Riemannian metric is a monotone metric corresponding to $f(x)=\frac{x-1}{\log x}=\int_{0}^{1}x^{s}ds$ . In the other cases, the torsion ${{\mathcal{T}}}^{({\rm e})}(X,Y)=\nabla^{({\rm e})}_{X}Y-\nabla^{({\rm e})}_{Y}X-[X,Y]$ does not vanish, where $[\cdot,\cdot]$ denotes the Lie bracket for vector fields.
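As a short consistency check (a worked computation added here, not part of Theorem 7.1 of [3]), the BKM function indeed satisfies the symmetry and normalization conditions imposed on $f$ in (3.21), and so does the SLD function $f(x)=(x+1)/2$:

```latex
\int_{0}^{1}x^{s}\,ds=\Bigl[\frac{x^{s}}{\log x}\Bigr]_{s=0}^{s=1}=\frac{x-1}{\log x}\quad(x\neq 1),
\qquad f(1)=\int_{0}^{1}1\,ds=1,
\qquad xf(1/x)=x\cdot\frac{1/x-1}{-\log x}=\frac{x-1}{\log x}=f(x),
\\
\text{SLD:}\quad f(x)=\frac{x+1}{2},\qquad f(1)=1,\qquad xf(1/x)=x\cdot\frac{1/x+1}{2}=\frac{1+x}{2}=f(x).
```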

Remark 3.3.

For the SLD structure, it is known ([3], Eq. (7.80)) that the torsion has the following representation: for each point $\rho\in{\mathcal{S}}$ and any tangent vectors $X_{\rho},Y_{\rho}\in T_{\rho}({\mathcal{S}})$ , we have

\iota_{*}({{\mathcal{T}}}^{({\rm e})}(X_{\rho},Y_{\rho}))=\frac{1}{4}[[L_{X_{\rho}},L_{Y_{\rho}}],\rho],

(3.26)

where $[\cdot,\cdot]$ in the RHS denotes the commutator for operators on ${\mathcal{H}}$ . Since this representation will be used in Sections 8 and 9, we give its proof in A1 of the Appendix for the reader’s convenience.

Henceforth, we use the prefixes e- and m- for notions concerning the e-connection and m-connection; e.g., e-parallel, e-autoparallel, m-affine, etc.

Proposition 3.4.

For any vector fields $X,Y,W\in{\mathscr{X}}({\mathcal{S}})$ , we have

W=\nabla^{({\rm e})}_{X}Y\;\Leftrightarrow\;L_{W}=XL_{Y}-\left\langle XL_{Y}\right\rangle=XL_{Y}+g(X,Y).

(3.27)

Proof Differentiating $g(Y,Z)={\rm Tr}(L_{Y}\,\iota_{*}(Z))$ (see (3.18)) by $X$ , we have

	$\displaystyle Xg(Y,Z)$	$\displaystyle={\rm Tr}\left((XL_{Y})\,\iota_{*}(Z)\right)+{\rm Tr}\left(L_{Y}\,(X\iota_{*}(Z))\right)$
		$\displaystyle={\rm Tr}\left((XL_{Y})\,\iota_{*}(Z)\right)+g(Y,\nabla^{({\rm m})}_{X}Z),$

where the second equality follows from (3.1). Letting $W^{\prime}\in{\mathscr{X}}({\mathcal{S}})$ be defined by $L_{W^{\prime}}=XL_{Y}-\left\langle XL_{Y}\right\rangle$ , whose existence is ensured by (3.10), the above equation is represented as

Xg(Y,Z)=g(W^{\prime},Z)+g(Y,\nabla^{({\rm m})}_{X}Z).

This means that $W^{\prime}=\nabla^{({\rm e})}_{X}Y$ , and proves the proposition. $\Box$

Proposition 3.5.

For a vector field $X\in{\mathscr{X}}({\mathcal{S}})$ , we have

	$X$ is e-parallel	$\displaystyle\Leftrightarrow\;\exists A\in{\mathcal{L}}_{{\rm h}},\;L_{X}=A-\left\langle A\right\rangle$
		$\displaystyle\Leftrightarrow\;\exists A\in{\mathcal{L}}_{{\rm h}},\forall\rho\in{\mathcal{S}},\;L_{X_{\rho}}=A-\left\langle A\right\rangle_{\rho}.$		(3.28)

Proof We may use Prop. 3.4 to prove this, but here we show an alternative proof. We first note that, according to (3.1), a vector field $Y\in{\mathscr{X}}({\mathcal{S}})$ is m-parallel if and only if $\iota_{*}(Y):{\mathcal{S}}\rightarrow{{\mathcal{L}}_{{\rm h}}}_{,0}$ is a constant map represented by an operator $B\in{{\mathcal{L}}_{{\rm h}}}_{,0}$ as $\iota_{*}(Y)=B$ . Invoking Prop. 2.5, we have

	$X$ is e-parallel	$\displaystyle\Leftrightarrow\;\forall Y:\text{m-parallel},\;\text{$g(X,Y)$ is constant on ${\mathcal{S}}$}$
		$\displaystyle\Leftrightarrow\;\forall B\in{{\mathcal{L}}_{{\rm h}}}_{,0},\;\text{$\left\langle L_{X},B\right\rangle$ is constant on ${\mathcal{S}}$}$
		$\displaystyle\Leftrightarrow\;\exists A\in{\mathcal{L}}_{{\rm h}},\;L_{X}=A-\left\langle A\right\rangle.$

$\Box$

Recalling Prop. 2.3, the following corollary is immediate.

Corollary 3.6.

For a submanifold ${\mathcal{M}}$ of ${\mathcal{S}}$ and for $X\in{\mathscr{X}}({\mathcal{S}}/{\mathcal{M}})$ (including the case when $X\in{\mathscr{X}}({\mathcal{M}})$ ), we have

	$X$ is e-parallel	$\displaystyle\Leftrightarrow\;\exists A\in{\mathcal{L}}_{{\rm h}},\;L_{X}=A-\left\langle A\right\rangle|_{{\mathcal{M}}}$
		$\displaystyle\Leftrightarrow\;\exists A\in{\mathcal{L}}_{{\rm h}},\forall\rho\in{\mathcal{M}},\;L_{X_{\rho}}=A-\left\langle A\right\rangle_{\rho}.$		(3.29)

The following corollary is also immediate from Prop. 3.5.

Corollary 3.7.

The e-parallel transport $\Phi^{({\rm e})}_{\rho,\sigma}:T_{\rho}({\mathcal{S}})\rightarrow T_{\sigma}({\mathcal{S}})$ between two arbitrary points $\rho,\sigma\in{\mathcal{S}}$ is represented as follows: $\forall X_{\rho}\in T_{\rho}({\mathcal{S}}),\forall X_{\sigma}\in T_{\sigma}({\mathcal{S}})$ ,

X_{\sigma}=\Phi^{({\rm e})}_{\rho,\sigma}(X_{\rho})\;\Leftrightarrow\;L_{X_{\sigma}}=L_{X_{\rho}}-\left\langle L_{X_{\rho}}\right\rangle_{\sigma}.

(3.30)
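The transport rule (3.30) can be tried out numerically. The sketch below is an illustration under stated assumptions rather than part of the paper: it specializes to the SLD structure, represents tangent vectors by their m-representations (traceless Hermitian matrices), and uses the hypothetical helper names `sld` and `e_transport`. It checks that the transported vector is again traceless and that transporting from $\rho$ to $\sigma$ and back recovers the original vector, as expected for a curvature-free connection.

```python
import numpy as np

def sld(rho, drho):
    """Solve drho = (rho L + L rho)/2 for Hermitian L, assuming rho > 0."""
    lam, U = np.linalg.eigh(rho)
    d = U.conj().T @ drho @ U
    return U @ (2.0 * d / (lam[:, None] + lam[None, :])) @ U.conj().T

def e_transport(rho, sigma, drho):
    """Apply (3.30): L_{X_sigma} = L_{X_rho} - <L_{X_rho}>_sigma (SLD case)."""
    L = sld(rho, drho)
    mean = np.real(np.trace(sigma @ L))
    L_sigma = L - mean * np.eye(rho.shape[0])
    # Return the m-representation sigma o L_{X_sigma} of the transported vector.
    return 0.5 * (sigma @ L_sigma + L_sigma @ sigma)

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

rho = 0.5 * (I2 + 0.3 * sx + 0.4 * sz)      # illustrative states
sigma = 0.5 * (I2 - 0.2 * sx + 0.5 * sy)
drho = 0.5 * sy                              # tangent vector at rho

moved = e_transport(rho, sigma, drho)
back = e_transport(sigma, rho, moved)

trace_moved = abs(np.trace(moved))
round_trip_error = np.linalg.norm(back - drho)
```

Since (3.30) involves only the end points $\rho$ and $\sigma$, no path needs to be specified in the sketch.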

In the following, a pair $({\mathcal{M}},\xi)$ of a submanifold ${\mathcal{M}}$ of ${\mathcal{S}}$ and a coordinate system $\xi$ of ${\mathcal{M}}$ is called a model.

Proposition 3.8.

For an $n$ -dimensional model $({\mathcal{M}},\xi)$ , the following conditions are equivalent.

(i)

${\mathcal{M}}$ is e-autoparallel in ${\mathcal{S}}$ , and $\xi$ is an m-affine coordinate system.
(Note: “m-affine” means “affine w.r.t. the m-connection $\nabla^{({\rm m})}_{{\mathcal{M}}}$ on ${\mathcal{M}}$ ”.)
(ii)

$\exists\{F^{1},\ldots,F^{n}\}\subset{\mathcal{L}}_{{\rm h}}$ such that for every $i\in\{1,\ldots,n\}$

$\sum_{j}g^{ij}L_{j}=F^{i}-\left\langle F^{i}\right\rangle|_{{\mathcal{M}}},$ (3.31)

where $\partial_{i}:=\frac{\partial}{\partial\xi^{i}}$ , $L_{j}:=L_{\partial_{j}}$ and $[g^{ij}]:=[g_{ij}:=g(\partial_{i},\partial_{j})]^{-1}$ .
(iii)

$\exists\{F^{1},\ldots,F^{n}\}\subset{\mathcal{L}}_{{\rm h}}$ such that for every $i\in\{1,\ldots,n\}$

$\sum_{j}g^{ij}L_{j}=F^{i}-\xi^{i}.$ (3.32)

(Note: (3.32) implies $\xi^{i}=\left\langle F^{i}\right\rangle|_{{\mathcal{M}}}$ .)

(iv)

$\exists\{F^{1},\ldots,F^{n}\}\subset{\mathcal{L}}_{{\rm h}}$ such that for every $i\in\{1,\ldots,n\}$

\forall\rho\in{\mathcal{M}},\;\;L_{i,\rho}\in{\rm span}\,\{F^{j}\}_{j=1}^{n}\oplus\mathbb{R}\quad\text{and}\quad\xi^{i}(\rho)=\left\langle F^{i}\right\rangle_{\rho},

(3.33)

where $\mathbb{R}$ is identified with $\{cI\,|\,c\in\mathbb{R}\}$ (see Remark 3.9 below).

Proof The equivalence (i) $\Leftrightarrow$ (ii) is immediate from Prop. 2.6 and Cor. 3.6, and (iii) $\Rightarrow$ (ii) is obvious since $\left\langle L_{i}\right\rangle=0$ .

To show (ii) $\Rightarrow$ (iii), assume (ii). Then we have

\sum_{j}g^{ij}\left\langle L_{j},L_{k}\right\rangle=\left\langle F^{i},L_{k}\right\rangle.

Here, the LHS is $\sum_{j}g^{ij}g_{jk}=\delta^{i}_{k}=\partial_{k}\xi^{i}$ and the RHS is $\partial_{k}\left\langle F^{i}\right\rangle$ due to (3.13). Hence, there exists a constant vector $(c^{i})\in\mathbb{R}^{n}$ such that $\left\langle F^{i}\right\rangle|_{{\mathcal{M}}}=\xi^{i}+c^{i}$ . Redefining $F^{i}:=F^{i}-c^{i}$ , (3.31) is rewritten as (3.32), and (iii) is verified.

Since (3.32) implies that

L_{i,\rho}=\sum_{j}g_{ij}(\rho)F^{j}-\Bigl{(}\sum_{j}g_{ij}(\rho)\xi^{j}(\rho)\Bigr{)}\in{\rm span}\,\{F^{j}\}_{j=1}^{n}\oplus\mathbb{R},

we have (iii) $\Rightarrow$ (iv). To show the converse, we assume the existence of $\{F^{1},\ldots,F^{n}\}$ in (iv). Then we have $\xi^{i}=\left\langle F^{i}\right\rangle|_{{\mathcal{M}}}$ , and for each $\rho\in{\mathcal{M}}$ there exist $[a_{ij}]\in\mathbb{R}^{n\times n}$ and $[b_{i}]\in\mathbb{R}^{n}$ such that for any $i$

L_{i,\rho}=\sum_{j}a_{ij}F^{j}+b_{i}.

This implies that

	$\displaystyle g_{ik}(\rho)$	$\displaystyle=\left\langle L_{k,\rho},L_{i,\rho}\right\rangle_{\rho}=\sum_{j}a_{ij}\left\langle L_{k,\rho},F^{j}\right\rangle_{\rho}$
		$\displaystyle=\sum_{j}a_{ij}(\partial_{k}\left\langle F^{j}\right\rangle)_{\rho}=\sum_{j}a_{ij}(\partial_{k}\xi^{j})_{\rho}=a_{ik}.$

Hence we have

\sum_{j}g^{ij}(\rho)L_{j,\rho}=F^{i}+\sum_{j}g^{ij}(\rho)b_{j}.

Here, the constant $\sum_{j}g^{ij}(\rho)b_{j}$ should be equal to $-\left\langle F^{i}\right\rangle_{\rho}=-\xi^{i}(\rho)$ due to $\left\langle L_{j,\rho}\right\rangle_{\rho}=0$ . Thus (iii) is concluded. $\Box$

Remark 3.9.

In (3.31) and (3.32), the operators $\{F^{i}-\left\langle F^{i}\right\rangle_{\rho}\}_{i=1}^{n}$ turn out to be linearly independent for each $\rho\in{\mathcal{M}}$ , which implies that $\{F^{1},\ldots,F^{n},I\}$ are linearly independent, or equivalently that $\{F^{1},\ldots,F^{n}\}$ are linearly independent in the quotient space ${\mathcal{L}}_{{\rm h}}/\mathbb{R}$ with identification $\mathbb{R}=\{cI\,|\,c\in\mathbb{R}\}$ .

At the end of this section, we present a proposition which claims that i.i.d. extensions of a model preserve the e-autoparallelity. For the proposition, we assume the following condition on the family of inner products from which the information-geometric structure is defined:

\forall\{A_{t}\}_{t=1}^{k},\,\forall\{B_{t}\}_{t=1}^{k}\subset{\mathcal{L}}_{{\rm h}}({\mathcal{H}}),\;\bigl{\langle}\,\operatorname*{\text{\raisebox{0.96873pt}{\scalebox{0.8}{$\bigotimes$}}}}_{t=1}^{k}A_{t},\operatorname*{\text{\raisebox{0.96873pt}{\scalebox{0.8}{$\bigotimes$}}}}_{t=1}^{k}B_{t}\,\bigr{\rangle}_{\rho^{\otimes k}}=\prod_{t=1}^{k}\left\langle A_{t},B_{t}\right\rangle_{\rho},

(3.34)

which is equivalent to

\forall\{A_{t}\}_{t=1}^{k}\subset{\mathcal{L}}_{{\rm h}}({\mathcal{H}}),\;\Omega_{\rho^{\otimes k}}\bigl{(}\operatorname*{\text{\raisebox{0.96873pt}{\scalebox{0.8}{$\bigotimes$}}}}_{t=1}^{k}A_{t}\bigr{)}=\operatorname*{\text{\raisebox{0.96873pt}{\scalebox{0.8}{$\bigotimes$}}}}_{t=1}^{k}\Omega_{\rho}(A_{t}).

(3.35)

The assumption is satisfied when the inner products are defined from a function $f$ by (3.5) and (3.21) for which $\Omega_{\rho}=f(\Delta_{\rho})$ and $\Omega_{\rho^{\otimes k}}=f\bigl{(}\Delta_{\rho^{\otimes k}}\bigr{)}$ hold. In particular, the proposition holds for the SLD structure.

Proposition 3.10.

Given a model $({\mathcal{M}},\xi)$ in ${\mathcal{S}}({\mathcal{H}})$ and a natural number $k\geq 2$ , define the model $(\tilde{{\mathcal{M}}},\tilde{\xi})$ in ${\mathcal{S}}({\mathcal{H}}^{\otimes k})$ by $\tilde{{\mathcal{M}}}:=\{\rho^{\otimes k}\,|\,\rho\in{\mathcal{M}}\}$ and $\tilde{\xi}^{i}(\rho^{\otimes k})=\xi^{i}(\rho)$ for $\rho\in{\mathcal{M}}$ . Under the assumption (3.34)-(3.35), the following conditions are equivalent.

(i)

${\mathcal{M}}$ is e-autoparallel in ${\mathcal{S}}$ , and $\xi$ is m-affine.
(ii)

$\tilde{{\mathcal{M}}}$ is e-autoparallel in ${\mathcal{S}}({\mathcal{H}}^{\otimes k})$ , and $\tilde{\xi}$ is m-affine.

Proof Let $\partial_{i}:=\frac{\partial}{\partial\xi^{i}}$ , $\tilde{\partial}_{i}:=\frac{\partial}{\partial\tilde{\xi}^{i}}$ , and $L_{i}=L_{\partial_{i}}$ , $\tilde{L}_{i}=L_{\tilde{\partial}_{i}}$ , which are determined by

\iota_{*}((\partial_{i})_{\rho})=\Omega_{\rho}(L_{i,\rho})

and

\tilde{\iota}_{*}((\tilde{\partial}_{i})_{\rho^{\otimes k}})=\Omega_{\rho^{\otimes k}}(\tilde{L}_{i,\rho^{\otimes k}}),

where $\tilde{\iota}$ denotes the natural embedding ${\mathcal{S}}({\mathcal{H}}^{\otimes k})\rightarrow{{\mathcal{L}}_{{\rm h}}}_{,1}({\mathcal{H}}^{\otimes k})$ . With the aid of the parametric representation (3.2), we see that

	$\displaystyle\tilde{\iota}_{*}((\tilde{\partial}_{i})_{\rho^{\otimes k}})$	$\displaystyle=\tilde{\partial}_{i}(\rho^{\otimes k})_{\tilde{\xi}}=\partial_{i}\rho_{\xi}^{\otimes k}=\sum_{t=1}^{k}\rho_{\xi}^{\otimes(t-1)}\otimes\partial_{i}\rho_{\xi}\otimes\rho_{\xi}^{\otimes(k-t)}$
		$\displaystyle=\sum_{t=1}^{k}\rho^{\otimes(t-1)}\otimes\iota_{*}((\partial_{i})_{\rho})\otimes\rho^{\otimes(k-t)}$
		$\displaystyle=\sum_{t=1}^{k}\rho^{\otimes(t-1)}\otimes\Omega_{\rho}(L_{i,\rho})\otimes\rho^{\otimes(k-t)}$
		$\displaystyle\stackrel{{\scriptstyle\star}}{{=}}\sum_{t=1}^{k}\Omega_{\rho^{\otimes k}}(I^{\otimes(t-1)}\otimes L_{i,\rho}\otimes I^{\otimes(k-t)})=\Omega_{\rho^{\otimes k}}(L_{i,\rho}^{(k)}),$

where $\stackrel{{\scriptstyle\star}}{{=}}$ follows from (3.6) and (3.35), and we have used the notation

A^{(k)}:=\sum_{t=1}^{k}I^{\otimes(t-1)}\otimes A\otimes I^{\otimes(k-t)}\quad\text{for}\;\;A\in{\mathcal{L}}_{{\rm h}}({\mathcal{H}}).

This implies that $\tilde{L}_{i,\rho^{\otimes k}}=(L_{i,\rho})^{(k)}$ and that $\tilde{g}_{ij}=k\,g_{ij}$ for $g_{ij}(\rho)=\left\langle L_{i,\rho},L_{j,\rho}\right\rangle_{\rho}$ and $\tilde{g}_{ij}(\rho^{\otimes k})=\bigl{\langle}\tilde{L}_{i,\rho^{\otimes k}},\tilde{L}_{j,\rho^{\otimes k}}\bigr{\rangle}_{\rho^{\otimes k}}$ , which leads to $\tilde{g}^{ij}=\frac{1}{k}\,g^{ij}$ . Hence, $L^{i}:=\sum_{j}g^{ij}L_{j}$ and $\tilde{L}^{i}:=\sum_{j}\tilde{g}^{ij}\tilde{L}_{j}$ are linked by $\tilde{L}^{i}_{\rho^{\otimes k}}=\frac{1}{k}\,(L^{i}_{\rho})^{(k)}.$ Now, according to Prop. 3.8, conditions (i), (ii) of the present proposition are respectively expressed as

(i)’

$\exists\{F^{i}\}\subset{\mathcal{L}}({\mathcal{H}}),\;\forall i,\forall\rho\in{\mathcal{M}},\;L^{i}_{\rho}=F^{i}-\xi^{i}(\rho)$ .
(ii)’

$\exists\{\tilde{F}^{i}\}\subset{\mathcal{L}}({\mathcal{H}}^{\otimes k}),\;\forall i,\forall\rho\in{\mathcal{M}},\;\frac{1}{k}\,(L^{i}_{\rho})^{(k)}=\tilde{F}^{i}-\xi^{i}(\rho)$ .

They are obviously equivalent via the relation $\tilde{F}^{i}=\frac{1}{k}(F^{i})^{(k)}$ . $\Box$
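The two identities used in the proof, $\tilde{L}_{i,\rho^{\otimes k}}=(L_{i,\rho})^{(k)}$ and $\tilde{g}_{ij}=k\,g_{ij}$ , can be checked numerically for the SLD structure. The following sketch does so for $k=2$ on a qubit; the helper `sld` and the chosen state and tangent vector are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sld(rho, drho):
    """Solve drho = (rho L + L rho)/2 for Hermitian L, assuming rho > 0."""
    lam, U = np.linalg.eigh(rho)
    d = U.conj().T @ drho @ U
    return U @ (2.0 * d / (lam[:, None] + lam[None, :])) @ U.conj().T

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

rho = 0.5 * (I2 + 0.3 * sx + 0.2 * sy + 0.4 * sz)
drho = 0.5 * sx                                   # one-parameter direction

# Single-copy SLD and Fisher information g = <L, L>_rho.
L = sld(rho, drho)
g = float(np.real(np.trace(rho @ L @ L)))

# Two-copy state and the derivative of rho (x) rho by the product rule.
rho2 = np.kron(rho, rho)
drho2 = np.kron(drho, rho) + np.kron(rho, drho)
L2 = sld(rho2, drho2)
g2 = float(np.real(np.trace(rho2 @ L2 @ L2)))

# Candidate from the proof: L^{(2)} = L (x) I + I (x) L.
L_sum = np.kron(L, I2) + np.kron(I2, L)
sld_gap = np.linalg.norm(L2 - L_sum)
```

The cross terms in $\tilde{g}$ vanish because $\left\langle L\right\rangle_{\rho}=0$ , which is why the Fisher information is exactly additive.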

4 Efficient estimators

From this section on, we investigate the relationship between estimation problems and geometric properties for quantum statistical models. Henceforth, we will consider only the SLD structure as an information-geometric structure unless otherwise stated.

Given a model $({\mathcal{M}},\xi)$ in ${\mathcal{S}}({\mathcal{H}})$ , an estimator for coordinates $\xi$ is generally represented by a POVM $\Pi=\Pi(d\hat{\xi})$ on ${\mathcal{H}}$ , where $\hat{\xi}$ is a variable representing an estimate. A representative case is when $\Pi$ is expressed as $\Pi(d\hat{\xi})=\sum_{\omega\in\Omega}\pi_{\omega}\,\delta_{f(\omega)}(d\hat{\xi})$ by a POVM $\pi=(\pi_{\omega})_{\omega\in\Omega}$ on a finite set $\Omega$ and a function $f:\Omega\rightarrow\mathbb{R}^{n}$ , where $\delta_{f(\omega)}$ denotes the $\delta$ -measure concentrated on the point $f(\omega)\in\mathbb{R}^{n}$ . This estimator, which is denoted by $\Pi=(\pi,f)$ , represents the estimation procedure in which the estimate is determined as $\hat{\xi}=f(\omega)$ from the outcome $\omega$ of the measurement $\pi$ .

The expectation $E_{\rho}(\Pi)\in\mathbb{R}^{n}$ and the mean squared error (the variance in the unbiased case) $V_{\rho}(\Pi)\in\mathbb{R}^{n\times n}$ of $\Pi$ for a state $\rho$ are defined by

	$\displaystyle E_{\rho}(\Pi)$	$\displaystyle:=\int\hat{\xi}\,{\rm Tr}\left(\rho\,\Pi(d\hat{\xi})\right),$		(4.1)
	$\displaystyle V_{\rho}(\Pi)$	$\displaystyle:=\int(\hat{\xi}-\xi(\rho)){(\hat{\xi}-\xi(\rho))}{}^{T}\,{\rm Tr}\left(\rho\,\Pi(d\hat{\xi})\right),$		(4.2)

where $\mathbb{R}^{n}$ is regarded as the space of column vectors $\mathbb{R}^{n\times 1}$ and ${}^{T}$ denotes the transpose. For $\Pi=(\pi,f)$ , we have

	$\displaystyle E_{\rho}(\Pi)$	$\displaystyle=\sum_{\omega}f(\omega)\,{\rm Tr}(\rho\pi_{\omega}),$		(4.3)
	$\displaystyle V_{\rho}(\Pi)$	$\displaystyle=\sum_{\omega}(f(\omega)-\xi(\rho)){(f(\omega)-\xi(\rho))}{}^{T}\,{\rm Tr}(\rho\pi_{\omega}).$		(4.4)
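Formulas (4.3) and (4.4) translate directly into a few lines of linear algebra. The sketch below is illustrative only: the function name `estimator_moments`, the array layout (one row of `f_vals` per outcome $\omega$ ), and the qubit example are assumptions made for this illustration.

```python
import numpy as np

def estimator_moments(rho, povm, f_vals, xi):
    """Return (E, V) of Pi = (pi, f) following (4.3) and (4.4).

    povm   : list of POVM elements pi_omega
    f_vals : array of shape (m, n), row omega holds f(omega)
    xi     : true coordinates xi(rho), shape (n,)
    """
    probs = np.array([np.real(np.trace(rho @ P)) for P in povm])
    f_vals = np.asarray(f_vals, dtype=float)
    E = probs @ f_vals                       # sum_omega f(omega) Tr(rho pi_omega)
    dev = f_vals - np.asarray(xi, dtype=float)
    V = (dev * probs[:, None]).T @ dev       # sum_omega p_omega dev dev^T
    return E, V

# Example: estimate xi = <sigma_z> of a qubit by measuring sigma_z.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
rho = 0.5 * (I2 + 0.3 * sx + 0.4 * sz)

povm = [0.5 * (I2 + sz), 0.5 * (I2 - sz)]    # projectors onto sigma_z = +1, -1
f_vals = [[1.0], [-1.0]]                     # estimate equals the outcome
xi = [0.4]

E, V = estimator_moments(rho, povm, f_vals, xi)
```

In this example the outcome probabilities are $0.7$ and $0.3$ , so the estimator has expectation $0.4=\xi$ and mean squared error $1-0.4^{2}=0.84$ .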

An estimator $\Pi$ is called locally unbiased for a coordinate system $\xi$ at $\rho\in{\mathcal{M}}$ when the elements $E^{i}_{\rho}(\Pi)$ , $i\in\{1,\ldots,n\}$ , of $E_{\rho}(\Pi)$ satisfy

E^{i}_{\rho}(\Pi)=\xi^{i}(\rho)\quad\text{and}\quad\partial_{j}E^{i}_{\rho}(\Pi)=\delta^{i}_{j},

(4.5)

where $\partial_{j}E^{i}_{\rho}(\Pi)$ denotes the derivative of the function $E^{i}(\Pi):\sigma\mapsto E^{i}_{\sigma}(\Pi)$ by $\partial_{j}=\frac{\partial}{\partial\xi^{j}}$ evaluated at the point $\rho$ . We denote by ${\mathcal{U}}(\rho,\xi)$ the totality of locally unbiased estimators for $\xi$ at $\rho$ . Using the symmetrized inner product $\left\langle\cdot,\cdot\right\rangle$ of (3.22) and the SLDs $L_{i}=L_{\partial_{i}}$ , $i\in\{1,\ldots,n\}$ , we have

\Pi\in{\mathcal{U}}(\rho,\xi)\;\Leftrightarrow\;\forall i,j,\;\left\langle A^{i}\right\rangle_{\rho}=\xi^{i}(\rho)\;\;\text{and}\;\;\left\langle A^{i},L_{j,\rho}\right\rangle_{\rho}=\delta_{j}^{i},

(4.6)

where

A^{i}:=\int\hat{\xi}^{i}\Pi(d\hat{\xi})\in{\mathcal{L}}_{{\rm h}}({\mathcal{H}}).

(4.7)

The well-known SLD Cramér-Rao inequality [6, 7] states that every $\Pi\in{\mathcal{U}}(\rho,\xi)$ obeys

V_{\rho}(\Pi)\geq G_{\rho}^{-1},

(4.8)

where $G_{\rho}=[g_{ij}(\rho)]$ denotes the SLD Fisher information matrix defined by $g_{ij}=\left\langle L_{i},L_{j}\right\rangle=g(\partial_{i},\partial_{j})$ , where $g$ is the SLD metric. Furthermore, the following proposition holds.

Proposition 4.1.

For every column vector ${\mbox{\boldmath$u$}}\in\mathbb{R}^{n}=\mathbb{R}^{n\times 1}$ , we have

\inf_{\Pi\in{\mathcal{U}}(\rho,\xi)}\,{{\mbox{\boldmath$u$}}}{}^{T}V_{\rho}(\Pi){\mbox{\boldmath$u$}}={{\mbox{\boldmath$u$}}}{}^{T}G_{\rho}^{-1}{\mbox{\boldmath$u$}}.

(4.9)

Note that $\inf$ in (4.9) cannot be replaced with $\min$ in general.

Let us introduce a class of randomized procedures for estimation that will be useful in the proofs of both Prop. 4.1 above and Theorem 5.1 later. Suppose that a point $\rho\in{\mathcal{M}}$ , a basis $\{{\mbox{\boldmath$u$}}^{1},\ldots,{\mbox{\boldmath$u$}}^{n}\}$ of $\mathbb{R}^{n}$ , a positive probability vector $(p_{1},\ldots,p_{n})$ s.t. $\forall i$ , $p_{i}>0,\sum_{i}p_{i}=1$ and $n^{2}$ real numbers $\{\gamma_{k}^{i}\}$ satisfying

\sum_{k}p_{k}\,\gamma_{k}^{i}=\xi^{i}(\rho)

(4.10)

are arbitrarily given. Let

X^{k}:=\sum_{i}u^{k}_{i}\,L^{i}_{\rho}\in{\mathcal{L}}_{{\rm h}}({\mathcal{H}}),

(4.11)

where ${\mbox{\boldmath$u$}}^{k}=[u^{k}_{i}]$ , $L^{i}_{\rho}:=\sum_{j}g^{ij}(\rho)L_{j,\rho}$ , $G_{\rho}^{-1}=[g^{ij}(\rho)]$ , and consider the random measurement such that $k\in\{1,\ldots,n\}$ is randomly chosen according to the probability distribution $(p_{1},\ldots,p_{n})$ and then the observable $X^{k}$ is measured. This measurement is represented by the POVM $\pi=\{\pi_{k,r}\}=\{p_{k}\pi^{k}_{r}\}$ , where $\{\pi^{k}_{r}\}$ are the projectors in the spectral decomposition $X^{k}=\sum_{r}x^{k}_{r}\,\pi^{k}_{r}$ . (Do not confuse $X^{k}$ , $x^{k}_{r}$ and $\pi^{k}_{r}$ with the $k$ th powers of $X$ , $x_{r}$ and $\pi_{r}$ .) When an eigenvalue $x^{k}_{r}$ is observed by measuring $X^{k}$ , we estimate $\xi$ by $\hat{\xi}=f(k,x^{k}_{r})$ , where $f={(f^{1},\ldots,f^{n})}{}^{T}$ is defined by

f^{i}(k,x):=\gamma_{k}^{i}+\frac{w^{i}_{k}}{p_{k}}x,

(4.12)

and $[w^{i}_{k}]=[u^{k}_{i}]^{-1}$ , i.e., $\sum_{k}u^{k}_{i}w^{j}_{k}=\delta_{i}^{j}$ and $\sum_{i}u^{k}_{i}w^{i}_{l}=\delta_{l}^{k}$ . This estimation procedure defines the estimator $\Pi:=(\pi,f)$ , which is characterized by the following property: for any polynomial function $\varphi:\mathbb{R}^{n}\rightarrow\mathbb{R}$ ,

\int\varphi(\hat{\xi})\Pi(d\hat{\xi})=\sum_{k}p_{k}\,\varphi(f(k,X^{k})).

(4.13)

In this situation we have the following lemma, whose proof is given in A2 of the Appendix.

Lemma 4.2.

The estimator $\Pi=(\pi,f)$ satisfies:

(1)

$\Pi\in{\mathcal{U}}(\rho,\xi)$ .
(2)

$\displaystyle\forall k,\;{({\mbox{\boldmath$u$}}^{k})}{}^{T}V_{\rho}(\Pi){\mbox{\boldmath$u$}}^{k}=\frac{1}{p_{k}}\,{({\mbox{\boldmath$u$}}^{k})}{}^{T}G_{\rho}^{-1}{\mbox{\boldmath$u$}}^{k}+\sum_{l}p_{l}(a_{l}^{k})^{2}$ ,
where $\displaystyle a_{l}^{k}:=\sum_{i}u^{k}_{i}(\gamma_{l}^{i}-\xi^{i}(\rho))$ .

Proof of Prop. 4.1 Let $\gamma_{k}^{i}:=\xi^{i}(\rho)$ , which satisfies (4.10) so that Lemma 4.2 is applicable. Since $a_{l}^{k}=0$ in this case, we have

{({\mbox{\boldmath$u$}}^{k})}{}^{T}V_{\rho}(\Pi){\mbox{\boldmath$u$}}^{k}=\frac{1}{p_{k}}{({\mbox{\boldmath$u$}}^{k})}{}^{T}G_{\rho}^{-1}{\mbox{\boldmath$u$}}^{k}.

The proposition is obvious when ${\mbox{\boldmath$u$}}=0$ , so we assume ${\mbox{\boldmath$u$}}\neq 0$ and choose $p\in(0,1)$ arbitrarily. Taking $({\mbox{\boldmath$u$}}^{1},\ldots,{\mbox{\boldmath$u$}}^{n})$ and $(p_{1},\ldots,p_{n})$ in the above construction such that ${\mbox{\boldmath$u$}}^{1}={\mbox{\boldmath$u$}}$ and $p_{1}=p$ , the resulting $\Pi$ satisfies ${{\mbox{\boldmath$u$}}}{}^{T}V_{\rho}(\Pi){\mbox{\boldmath$u$}}=\frac{1}{p}{{\mbox{\boldmath$u$}}}{}^{T}G_{\rho}^{-1}{\mbox{\boldmath$u$}}$ . Since $p$ can be arbitrarily close to $1$ , we have the proposition. $\Box$
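The randomized construction and the identity of Lemma 4.2 (2) can be checked numerically. The sketch below is an illustration under assumptions, not the proof in the Appendix: it uses a two-parameter qubit model with $\xi=(r_{x},r_{z})$ , hypothetical helper names `sld` and `randomized_estimator`, and arbitrarily chosen ${\mbox{\boldmath$u$}}^{k}$ , $p_{k}$ and $\gamma^{i}_{k}$ satisfying (4.10). Eigenvectors are used in place of spectral projectors, which gives the same outcome distribution.

```python
import numpy as np

def sld(rho, drho):
    """Solve drho = (rho L + L rho)/2 for Hermitian L, assuming rho > 0."""
    lam, U = np.linalg.eigh(rho)
    d = U.conj().T @ drho @ U
    return U @ (2.0 * d / (lam[:, None] + lam[None, :])) @ U.conj().T

def randomized_estimator(rho, L_up, xi, U, p, gamma):
    """Return (E, V) of the estimator (pi, f) defined via (4.11) and (4.12).

    U[k, i] = u^k_i, p[k] = p_k, gamma[k, i] = gamma_k^i, L_up[i] = L^i_rho.
    """
    n = len(xi)
    W = np.linalg.inv(U).T                    # W[k, i] = w^i_k
    E = np.zeros(n)
    V = np.zeros((n, n))
    for k in range(n):
        Xk = sum(U[k, i] * L_up[i] for i in range(n))      # observable X^k
        x, vecs = np.linalg.eigh(Xk)
        for r in range(len(x)):
            v = vecs[:, r]
            prob = p[k] * np.real(v.conj() @ rho @ v)      # Tr(rho p_k pi^k_r)
            est = gamma[k] + W[k] * x[r] / p[k]            # f(k, x^k_r)
            E += prob * est
            V += prob * np.outer(est - xi, est - xi)
    return E, V

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

xi = np.array([0.3, 0.4])                     # coordinates (r_x, r_z)
rho = 0.5 * (I2 + xi[0] * sx + xi[1] * sz)
L_low = [sld(rho, 0.5 * sx), sld(rho, 0.5 * sz)]           # L_1, L_2
G = np.array([[np.real(np.trace(rho @ A @ B)) for B in L_low] for A in L_low])
G_inv = np.linalg.inv(G)
L_up = [sum(G_inv[i, j] * L_low[j] for j in range(2)) for i in range(2)]

U = np.array([[1.0, 2.0], [0.0, 1.0]])        # rows are u^1, u^2
p = np.array([0.9, 0.1])
c = np.array([0.05, -0.02])
gamma = np.array([xi + c / p[0], xi - c / p[1]])           # satisfies (4.10)

E, V = randomized_estimator(rho, L_up, xi, U, p, gamma)
a = (gamma - xi) @ U.T                        # a[l, k] = a_l^k
lhs = np.array([U[k] @ V @ U[k] for k in range(2)])
rhs = np.array([U[k] @ G_inv @ U[k] / p[k] + p @ a[:, k] ** 2 for k in range(2)])
```

Setting `gamma` equal to `xi` in both rows and letting `p[0]` approach $1$ reproduces the limiting argument in the proof of Prop. 4.1.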

A locally unbiased estimator $\Pi\in{\mathcal{U}}(\rho,\xi)$ is called locally efficient for $\xi$ at $\rho$ if $V_{\rho}(\Pi)\leq V_{\rho}(\Pi^{\prime})$ for all $\Pi^{\prime}\in{\mathcal{U}}(\rho,\xi)$ . Given a positive-semidefinite matrix $W\in\mathbb{R}^{n\times n}$ as a weight, an estimator $\Pi\in{\mathcal{U}}(\rho,\xi)$ is called locally $W$ -efficient for $\xi$ at $\rho$ if ${\rm tr}\,(WV_{\rho}(\Pi))\leq{\rm tr}\,(WV_{\rho}(\Pi^{\prime}))$ for all $\Pi^{\prime}\in{\mathcal{U}}(\rho,\xi)$ , or equivalently if

{\rm tr}\,(WV_{\rho}(\Pi))=\min_{\Pi^{\prime}\in{\mathcal{U}}(\rho,\xi)}{\rm tr}\,(WV_{\rho}(\Pi^{\prime})).

(4.14)

Here, the symbol ${\rm tr}\,$ is used for the trace of $n\times n$ matrices to distinguish it from the trace ${\rm Tr}$ for operators on ${\mathcal{H}}$ .

Proposition 4.3.

Given a model $({\mathcal{M}},\xi)$ , a point $\rho\in{\mathcal{M}}$ and an estimator $\Pi\in{\mathcal{U}}(\rho,\xi)$ , the following conditions are equivalent.

(i)

$\Pi$ is locally efficient for $\xi$ at $\rho$ .
(ii)

$V_{\rho}(\Pi)=G_{\rho}^{-1}$ .
(iii)

$\Pi$ is locally ${\mbox{\boldmath$u$}}{{\mbox{\boldmath$u$}}}{}^{T}$ -efficient for $\xi$ at $\rho$ for every column vector ${\mbox{\boldmath$u$}}\in\mathbb{R}^{n}$ .
(iv)

$\Pi$ is locally $W$ -efficient for $\xi$ at $\rho$ for every positive weight $W>0$ .

Proof The equivalence (i) $\Leftrightarrow$ (iii) $\Leftrightarrow$ (iv) is obvious since

	$\displaystyle V_{\rho}(\Pi)\leq V_{\rho}(\Pi^{\prime})\;$	$\displaystyle\Leftrightarrow\;\forall{\mbox{\boldmath$u$}}\in\mathbb{R}^{n},\;{{\mbox{\boldmath$u$}}}{}^{T}V_{\rho}(\Pi){\mbox{\boldmath$u$}}\leq{{\mbox{\boldmath$u$}}}{}^{T}V_{\rho}(\Pi^{\prime}){\mbox{\boldmath$u$}}$
		$\displaystyle\Leftrightarrow\;\forall W>0,\;{\rm tr}\,(WV_{\rho}(\Pi))\leq{\rm tr}\,(WV_{\rho}(\Pi^{\prime})),$

and (ii) $\Leftrightarrow$ (iii) follows from Prop. 4.1. $\Box$

Remark 4.4.

As is well known, there exists a locally efficient estimator at $\rho$ iff the SLDs $L_{1,\rho},\ldots,L_{n,\rho}$ mutually commute (e.g. § 7.4 of [3]).

An estimator $\Pi$ is called efficient for a coordinate system $\xi$ if $\Pi$ is locally efficient for $\xi$ at every point $\rho\in{\mathcal{M}}$ . Given a weight field ${\mathcal{W}}=\{W_{\rho}\;|\;\rho\in{\mathcal{M}}\}$ , $\Pi$ is called ${\mathcal{W}}$ -efficient for $\xi$ if $\Pi$ is locally $W_{\rho}$ -efficient for $\xi$ at every $\rho\in{\mathcal{M}}$ . When $W_{\rho}=W$ for all $\rho$ , we simply call it $W$ -efficient for $\xi$ . According to Prop. 4.3, $\Pi$ is efficient $\Leftrightarrow$ $\Pi$ is ${\mbox{\boldmath$u$}}{{\mbox{\boldmath$u$}}}{}^{T}$ -efficient for every ${\mbox{\boldmath$u$}}\in\mathbb{R}^{n}$ $\Leftrightarrow$ $\Pi$ is $W$ -efficient for every $W>0$ .

The condition for the existence of an efficient estimator was mentioned in the introduction as Theorem 1.3. Suppose that $({\mathcal{M}},\xi)$ is represented as (1.3) and (1.4) in the theorem; namely, every $\rho\in{\mathcal{M}}$ is represented in the form

\rho=\exp\Bigl{[}\,\frac{1}{2}\Bigl{\{}\sum_{i=1}^{n}\theta_{i}(\rho)F^{i}-\psi(\rho)\Bigr{\}}\Bigr{]}\,P\exp\Bigl{[}\,\frac{1}{2}\Bigl{\{}\sum_{i=1}^{n}\theta_{i}(\rho)F^{i}-\psi(\rho)\Bigr{\}}\Bigr{]}

(4.15)

and satisfies $\xi^{i}(\rho)=\left\langle F^{i}\right\rangle_{\rho}$ , where $F^{1},\ldots,F^{n}$ are mutually commuting observables. Note that $P$ can be chosen to be an arbitrary element of ${\mathcal{M}}$ if we wish. The SLDs of ${\mathcal{M}}$ w.r.t. $\xi$ are represented as

L_{i}=\partial_{i}\Bigl{(}\sum_{j}\theta_{j}F^{j}-\psi\Bigr{)}=\sum_{j}(\partial_{i}\theta_{j})(F^{j}-\partial^{j}\psi),

(4.16)

where $\partial_{i}:=\frac{\partial}{\partial\xi^{i}}$ and $\partial^{i}:=\frac{\partial}{\partial\theta_{i}}$ . Noting that the positions of upper/lower indices (superscripts and subscripts) are reversed from the standard notation of information geometry as in [3] (see Remark 1.2), we have $\partial_{i}\theta_{j}=g_{ij}$ , $\partial^{i}\psi=\xi^{i}$ and

L^{i}=\sum_{j}g^{ij}L_{j}=F^{i}-\xi^{i}.

(4.17)

According to Prop. 3.8, this means that ${\mathcal{M}}$ is e-autoparallel in ${\mathcal{S}}$ and that $\xi$ is an m-affine coordinate system w.r.t. the SLD structure. Furthermore, the induced e-connection $\nabla^{({\rm e})}|_{{\mathcal{M}}}$ on ${\mathcal{M}}$ turns out to be torsion-free and hence flat, for which $\theta=(\theta_{i})$ forms an affine coordinate system. We have thus seen that ${\mathcal{M}}$ is dually flat just as classical exponential families are.

When $n=1$ , (4.15) is written as

\rho=\exp\left[\frac{1}{2}\bigl{\{}\theta(\rho)F-\psi(\rho)\bigr{\}}\right]\,P\exp\left[\frac{1}{2}\bigl{\{}\theta(\rho)F-\psi(\rho)\bigr{\}}\right].

(4.18)

This is a general form of e-geodesic in the sense that every e-geodesic is represented in this form by some $P$ and $F$ . In the multi-dimensional case $n\geq 2$ , on the other hand, (4.15) provides merely a special case of e-autoparallel submanifolds. In order to characterize the e-autoparallelity by an estimation-theoretical notion, the existence of an efficient estimator is too strong a condition, and we need a new variant of the notion of efficient estimators, which will be introduced in the next section.
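For the one-parameter form (4.18), differentiating by $\theta$ gives $\partial_{\theta}\rho=\rho\circ(F-\psi^{\prime}(\theta))$ , and the normalization ${\rm Tr}\,\rho=1$ forces $\psi^{\prime}(\theta)=\left\langle F\right\rangle_{\rho}$ , so the SLD w.r.t. $\theta$ is $F-\left\langle F\right\rangle_{\rho}$ . The sketch below checks this by a central finite difference; the names `expm_h` and `rho_theta`, the choice of $F$ and $P$ , and the step size are assumptions made for illustration only.

```python
import numpy as np

def expm_h(H):
    """Exponential of a Hermitian matrix via its eigendecomposition."""
    lam, U = np.linalg.eigh(H)
    return (U * np.exp(lam)) @ U.conj().T

def rho_theta(theta, F, P):
    """State of the form (4.18); psi(theta) is fixed by normalization."""
    E = expm_h(0.5 * theta * F)
    M = E @ P @ E
    return M / np.real(np.trace(M))

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

F = sz + 0.5 * sx                              # illustrative observable
P = 0.5 * (I2 + 0.3 * sx + 0.2 * sy)           # strictly positive reference state
theta, h = 0.7, 1e-5

rho = rho_theta(theta, F, P)
drho = (rho_theta(theta + h, F, P) - rho_theta(theta - h, F, P)) / (2 * h)

# Candidate SLD with respect to theta: F - <F>_rho.
L = F - np.real(np.trace(rho @ F)) * I2
residual = np.linalg.norm(0.5 * (rho @ L + L @ rho) - drho)
```

Note that $F$ and $P$ need not commute here; for $n=1$ the commutativity assumption on $F^{1},\ldots,F^{n}$ is vacuous.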

5 Efficient filtration of estimators

We now consider a one-parameter family of estimators $\vec{\Pi}=(\Pi_{\varepsilon})_{{\varepsilon}\in(0,{\varepsilon}_{1})}$ instead of a single estimator, and call it a filtration of estimators or simply a filtration. The upper limit ${\varepsilon}_{1}$ can be an arbitrary positive number or $\infty$ , but our interest lies only in the limiting property of ${\varepsilon}\downarrow 0$ and the value of ${\varepsilon}_{1}$ plays no role. So we simply write $\vec{\Pi}=(\Pi_{\varepsilon})_{{\varepsilon}>0}$ . Given a nonnegative real matrix $W\in\mathbb{R}^{n\times n}$ as a weight, a filtration $\vec{\Pi}=(\Pi_{\varepsilon})_{{\varepsilon}>0}$ is called locally $W$ -efficient for $\xi$ at $\rho$ if $\Pi_{\varepsilon}\in{\mathcal{U}}(\rho,\xi)$ for every ${\varepsilon}>0$ and $\lim_{{\varepsilon}\downarrow 0}{\rm tr}\,(WV_{\rho}(\Pi_{\varepsilon}))\leq{\rm tr}\,(WV_{\rho}(\Pi^{\prime}))$ for every $\Pi^{\prime}\in{\mathcal{U}}(\rho,\xi)$ , which is equivalent to

\lim_{{\varepsilon}\downarrow 0}{\rm tr}\,(WV_{\rho}(\Pi_{\varepsilon}))=\inf_{\Pi^{\prime}\in{\mathcal{U}}(\rho,\xi)}{\rm tr}\,(WV_{\rho}(\Pi^{\prime})).

(5.1)

When $W={\mbox{\boldmath$u$}}{{\mbox{\boldmath$u$}}}{}^{T}$ with ${\mbox{\boldmath$u$}}\in\mathbb{R}^{n}$ , in particular, this is represented as

\lim_{{\varepsilon}\downarrow 0}{{\mbox{\boldmath$u$}}}{}^{T}V_{\rho}(\Pi_{\varepsilon}){\mbox{\boldmath$u$}}={{\mbox{\boldmath$u$}}}{}^{T}G_{\rho}^{-1}{\mbox{\boldmath$u$}}

(5.2)

due to Prop. 4.1. Compare (5.1) with (4.14), and note that a locally $W$ -efficient filtration at $\rho$ always exists, even when a locally $W$ -efficient estimator does not exist.
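For rank-one weights $W={\mbox{\boldmath$u$}}{{\mbox{\boldmath$u$}}}{}^{T}$ with $n\geq 2$ , the existence claim can be made explicit with the randomized estimators of Section 4. As a sketch, assume the construction of Lemma 4.2 at a fixed $\rho$ with $\gamma_{k}^{i}=\xi^{i}(\rho)$ , ${\mbox{\boldmath$u$}}^{1}={\mbox{\boldmath$u$}}$ , $p_{1}=1-\varepsilon$ and the remaining $p_{k}>0$ arbitrary. Then $a_{l}^{k}=0$ and Lemma 4.2 (2) gives

```latex
\bm{u}^{T}V_{\rho}(\Pi_{\varepsilon})\,\bm{u}
=\frac{1}{1-\varepsilon}\,\bm{u}^{T}G_{\rho}^{-1}\bm{u}
\;\xrightarrow[\;\varepsilon\downarrow 0\;]{}\;
\bm{u}^{T}G_{\rho}^{-1}\bm{u},
```

which is (5.2) at this $\rho$ . This family depends on $\rho$ through $L^{i}_{\rho}$ and $\xi(\rho)$ , so it is efficient only locally at the chosen point.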

Given a weight field ${\mathcal{W}}=\{W_{\rho}\,|\,\rho\in{\mathcal{M}}\}$ , a filtration $\vec{\Pi}=(\Pi_{\varepsilon})_{{\varepsilon}>0}$ is called ${\mathcal{W}}$ -efficient for $\xi$ if it is locally $W_{\rho}$ -efficient for $\xi$ at every $\rho\in{\mathcal{M}}$ . When $W_{\rho}=W$ for all $\rho$ , we simply call it $W$ -efficient for $\xi$ .

Now, we have the following theorem, which gives an estimation-theoretical characterization of the e-autoparallelity.

Theorem 5.1.

For a model $({\mathcal{M}},\xi)$ , the following conditions are equivalent.

(i)

${\mathcal{M}}$ is e-autoparallel in ${\mathcal{S}}$ , and $\xi$ is an m-affine coordinate system.
(ii)

For every ${\mbox{\boldmath$u$}}\in\mathbb{R}^{n}$ , there exists a ${\mbox{\boldmath$u$}}{{\mbox{\boldmath$u$}}}{}^{T}$ -efficient filtration for $\xi$ .

Proof According to Prop. 3.8, it suffices to show the equivalence of (ii) and the condition that

\exists\{F^{i}\},\;\forall i,\;\;L^{i}=F^{i}-\xi^{i},

(5.3)

where $L^{i}:=\sum_{j}g^{ij}L_{j}$ .

We first show (ii) $\Rightarrow$ (5.3). Fix $i\in\{1,\ldots,n\}$ arbitrarily, and let $\vec{\Pi}=(\Pi_{\varepsilon})_{{\varepsilon}>0}$ be an ${{\mbox{\boldmath$e$}}}^{i}{({\mbox{\boldmath$e$}}^{i})}{}^{T}$ -efficient filtration for $\xi$ , where ${\mbox{\boldmath$e$}}^{i}=(\delta_{j}^{i})$ denotes the $i$ th vector of the natural basis of $\mathbb{R}^{n}$ . For each $\rho\in{\mathcal{M}}$ , we have

	$\displaystyle{({\mbox{\boldmath$e$}}^{i})}{}^{T}V_{\rho}(\Pi_{\varepsilon}){\mbox{\boldmath$e$}}^{i}$	$\displaystyle=\int(\hat{\xi}^{i}-\xi^{i}(\rho))^{2}{\rm Tr}(\rho\Pi_{{\varepsilon}}(d\hat{\xi}))$
		$\displaystyle\geq{\rm Tr}(\rho(F_{\varepsilon}^{i}-\xi^{i}(\rho))^{2})=\|F_{\varepsilon}^{i}-\xi^{i}(\rho)\|_{\rho}^{2},$		(5.4)

where

F_{\varepsilon}^{i}:=\int\hat{\xi}^{i}\Pi_{\varepsilon}(d\hat{\xi})\in{\mathcal{L}}_{{\rm h}}({\mathcal{H}}),

and $\|\cdot\|_{\rho}$ denotes the norm for the symmetrized inner product $\left\langle\cdot,\cdot\right\rangle_{\rho}$ . (We also denote the norm for the metric $g_{\rho}$ by the same symbol.) From the local unbiasedness condition (4.6) applied to $\Pi_{\varepsilon}$ , we have

\left\langle F_{\varepsilon}^{i}-\xi^{i}(\rho),L_{j,\rho}\right\rangle_{\rho}=\delta_{j}^{i}=\left\langle L_{\rho}^{i},L_{j,\rho}\right\rangle_{\rho}.

This means that $L_{\rho}^{i}$ is the orthogonal projection of $F_{\varepsilon}^{i}-\xi^{i}(\rho)$ onto ${\rm span}\{L_{j,\rho}\}_{j=1}^{n}$ . Hence we have

\|F_{\varepsilon}^{i}-\xi^{i}(\rho)\|_{\rho}^{2}=\|L_{\rho}^{i}\|_{\rho}^{2}+\|F_{\varepsilon}^{i}-\xi^{i}(\rho)-L_{\rho}^{i}\|_{\rho}^{2}.

(5.5)

The ${{\mbox{\boldmath$e$}}}^{i}{({\mbox{\boldmath$e$}}^{i})}{}^{T}$ -efficiency of $\vec{\Pi}$ is represented as

\lim_{{\varepsilon}\downarrow 0}{({\mbox{\boldmath$e$}}^{i})}{}^{T}V_{\rho}(\Pi_{\varepsilon}){\mbox{\boldmath$e$}}^{i}={({\mbox{\boldmath$e$}}^{i})}{}^{T}G_{\rho}^{-1}{\mbox{\boldmath$e$}}^{i}=g^{ii}(\rho)=\|L_{\rho}^{i}\|_{\rho}^{2},

which, combined with (5.4) and (5.5), yields

\lim_{{\varepsilon}\downarrow 0}\|F_{\varepsilon}^{i}-\xi^{i}(\rho)-L_{\rho}^{i}\|_{\rho}^{2}=0.

Since each $F_{\varepsilon}^{i}$ does not depend on $\rho$ , this implies that a $\rho$ -independent Hermitian operator $F^{i}:=\lim_{{\varepsilon}\downarrow 0}F_{\varepsilon}^{i}$ exists and satisfies $L_{\rho}^{i}=F^{i}-\xi^{i}(\rho)$ for every $\rho\in{\mathcal{M}}$ , which establishes (5.3).

We next show (5.3) $\Rightarrow$ (ii). Let ${\mbox{\boldmath$u$}}=(u_{i})\in\mathbb{R}^{n}$ be arbitrarily given, for which we will construct a ${\mbox{\boldmath$u$}}{{\mbox{\boldmath$u$}}}{}^{T}$ -efficient filtration by assuming the existence of $\{F^{i}\}$ of (5.3). We can assume ${\mbox{\boldmath$u$}}\neq 0$ , and take a basis $\{{\mbox{\boldmath$u$}}^{1},\ldots,{\mbox{\boldmath$u$}}^{n}\}$ , ${\mbox{\boldmath$u$}}^{k}=(u_{i}^{k})$ , of $\mathbb{R}^{n}$ such that ${\mbox{\boldmath$u$}}^{1}={\mbox{\boldmath$u$}}$ , whereby for each $k$ we define

Y^{k}:=\sum_{i}u^{k}_{i}F^{i}.

(5.6)

Let ${\varepsilon}\in(0,1)$ , and take a positive probability vector $(p_{1},\ldots,p_{n})$ such that $p_{1}=1-{\varepsilon}$ . We define the estimator $\Pi_{\varepsilon}$ by the following estimation procedure: randomly choose $k\in\{1,\ldots,n\}$ according to the probability distribution $(p_{1},\ldots,p_{n})$ , measure the observable $Y^{k}$ to get an outcome $y$ , and then estimate $\xi$ by $\hat{\xi}=g(k,y)$ using the function $g={(g^{1},\ldots,g^{n})}{}^{T}$ defined by

g^{i}(k,y):=\frac{w_{k}^{i}}{p_{k}}y,

(5.7)

where $[w^{i}_{k}]=[u^{k}_{i}]^{-1}$ . Invoking (5.3) evaluated at an arbitrary point $\rho\in{\mathcal{M}}$ , we have

	$\displaystyle g^{i}(k,Y^{k})$	$\displaystyle=\frac{w_{k}^{i}}{p_{k}}\sum_{j}u^{k}_{j}F^{j}=\frac{w_{k}^{i}}{p_{k}}\sum_{j}u^{k}_{j}(L_{\rho}^{j}+\xi^{j}(\rho))$
		$\displaystyle=\gamma_{k}^{i}+\frac{w_{k}^{i}}{p_{k}}X^{k}=f^{i}(k,X^{k}),$

where $X^{k}$ and $f^{i}$ are those defined by (4.11) and (4.12) with

\gamma_{k}^{i}:=\frac{w_{k}^{i}}{p_{k}}\sum_{j}u^{k}_{j}\xi^{j}(\rho).

(5.8)

Since this $\gamma_{k}^{i}$ satisfies (4.10), Lemma 4.2 applies to conclude that $\Pi_{\varepsilon}$ is locally unbiased at $\rho$ and satisfies, for every $k$ and every $\rho\in{\mathcal{M}}$ ,

{({\mbox{\boldmath$u$}}^{k})}{}^{T}V_{\rho}(\Pi_{\varepsilon}){\mbox{\boldmath$u$}}^{k}=\frac{1}{p_{k}}\,{({\mbox{\boldmath$u$}}^{k})}{}^{T}G_{\rho}^{-1}{\mbox{\boldmath$u$}}^{k}+\sum_{l}p_{l}(a_{l}^{k})^{2},

(5.9)

where $a_{l}^{k}:=\sum_{i}u^{k}_{i}(\gamma_{l}^{i}-\xi^{i}(\rho))$ . From (5.8), we have

a_{l}^{k}=\frac{1}{p_{l}}\sum_{i,j}u_{i}^{k}w_{l}^{i}u_{j}^{l}\xi^{j}(\rho)-\sum_{i}u_{i}^{k}\xi^{i}(\rho)=\Bigl{(}\frac{\delta_{l}^{k}}{p_{k}}-1\Bigr{)}\sum_{i}u_{i}^{k}\xi^{i}(\rho)

and hence

\sum_{l}p_{l}(a_{l}^{k})^{2}=\sum_{l}p_{l}\Bigl{(}\frac{\delta_{l}^{k}}{p_{k}}-1\Bigr{)}^{2}\Bigl{(}\sum_{i}u_{i}^{k}\xi^{i}(\rho)\Bigr{)}^{2}=\frac{1-p_{k}}{p_{k}}\Bigl{(}\sum_{i}u_{i}^{k}\xi^{i}(\rho)\Bigr{)}^{2}.

Thus, letting $k=1$ in (5.9), we obtain

{{\mbox{\boldmath$u$}}}{}^{T}V_{\rho}(\Pi_{\varepsilon}){\mbox{\boldmath$u$}}=\frac{1}{1-{\varepsilon}}{{\mbox{\boldmath$u$}}}{}^{T}G_{\rho}^{-1}{\mbox{\boldmath$u$}}+\frac{{\varepsilon}}{1-{\varepsilon}}\Bigl{(}\sum_{i}u_{i}\xi^{i}(\rho)\Bigr{)}^{2}.

This implies that $\lim_{{\varepsilon}\downarrow 0}{{\mbox{\boldmath$u$}}}{}^{T}V_{\rho}(\Pi_{\varepsilon}){\mbox{\boldmath$u$}}={{\mbox{\boldmath$u$}}}{}^{T}G_{\rho}^{-1}{\mbox{\boldmath$u$}}$ for every $\rho$ and that $\vec{\Pi}:=(\Pi_{\varepsilon})_{{\varepsilon}\in(0,1)}$ is a ${\mbox{\boldmath$u$}}{{\mbox{\boldmath$u$}}}{}^{T}$ -efficient filtration. $\Box$
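
The arithmetic in the last step of the proof can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names `excess_factor` and `variance_k1`, the dimension $n=3$, and the sample numbers standing in for ${\mbox{\boldmath$u$}}^{T}G_{\rho}^{-1}{\mbox{\boldmath$u$}}$ and $(\sum_{i}u_{i}\xi^{i}(\rho))^{2}$ are hypothetical and do not come from the paper. It evaluates $\sum_{l}p_{l}(\delta_{l}^{k}/p_{k}-1)^{2}$ directly, compares it with $(1-p_{k})/p_{k}$, and then shows the gap to the bound shrinking as ${\varepsilon}\downarrow 0$.

```python
import numpy as np

def excess_factor(p, k):
    """Directly evaluate sum_l p_l * (delta_{lk}/p_k - 1)^2 for a probability vector p."""
    delta = np.zeros_like(p)
    delta[k] = 1.0
    return float(np.sum(p * (delta / p[k] - 1.0) ** 2))

def variance_k1(eps, crb, mean_sq):
    """Right-hand side of the k = 1 identity: crb/(1-eps) + eps/(1-eps) * mean_sq.

    crb stands for u^T G^{-1} u and mean_sq for (sum_i u_i xi^i(rho))^2;
    both are plain numbers supplied by the caller (hypothetical inputs).
    """
    return crb / (1.0 - eps) + eps / (1.0 - eps) * mean_sq

# A positive probability vector with p_1 = 1 - eps, as in the proof (n = 3 here).
eps = 0.1
p = np.array([1.0 - eps, 0.06, 0.04])
for k in range(len(p)):
    closed_form = (1.0 - p[k]) / p[k]
    assert abs(excess_factor(p, k) - closed_form) < 1e-12

# The gap to the bound u^T G^{-1} u shrinks as eps decreases to 0.
crb, mean_sq = 2.0, 0.7
gaps = [variance_k1(e, crb, mean_sq) - crb for e in (1e-1, 1e-2, 1e-3)]
print(gaps)
```

The gap is of order ${\varepsilon}$, which is why the bound is attained only in the limit and a filtration, rather than a single estimator, is used.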

The following proposition follows immediately from Theorem 5.1 and Prop. 3.10.

Proposition 5.2.

In the situation of Prop. 3.10 where $(\tilde{{\mathcal{M}}},\tilde{\xi})$ is the $k$ th i.i.d. extension of $({\mathcal{M}},\xi)$ , $(\tilde{{\mathcal{M}}},\tilde{\xi})$ has an efficient filtration if and only if $({\mathcal{M}},\xi)$ has an efficient filtration.

6 A sufficient condition for the e-autoparallelity and its relation to the Gaussian states

Proposition 6.1.

For a model $({\mathcal{M}},\xi)$ , the following condition is sufficient for (i) and (ii) of Theorem 5.1:

•

For every positive weight $W>0$ , there exists a $W$ -efficient estimator for $\xi$ .

Proof Given arbitrary ${\mbox{\boldmath$u$}}\in\mathbb{R}^{n}$ and ${\varepsilon}>0$ , let $\Pi_{\varepsilon}$ be a $({\mbox{\boldmath$u$}}{{\mbox{\boldmath$u$}}}{}^{T}+{\varepsilon}I)$ -efficient estimator. Then, for an arbitrary $\rho\in{\mathcal{M}}$ and an arbitrary $\Pi^{\prime}\in{\mathcal{U}}(\rho,\xi)$ , we have

	$\displaystyle{{\mbox{\boldmath$u$}}}{}^{T}V_{\rho}(\Pi_{\varepsilon}){\mbox{\boldmath$u$}}$	$\displaystyle\leq{\rm tr}\,(({\mbox{\boldmath$u$}}{{\mbox{\boldmath$u$}}}{}^{T}+{\varepsilon}I)V_{\rho}(\Pi_{\varepsilon}))$
		$\displaystyle\leq{\rm tr}\,(({\mbox{\boldmath$u$}}{{\mbox{\boldmath$u$}}}{}^{T}+{\varepsilon}I)V_{\rho}(\Pi^{\prime}))={{\mbox{\boldmath$u$}}}{}^{T}V_{\rho}(\Pi^{\prime}){\mbox{\boldmath$u$}}+{\varepsilon}{\rm tr}\,(V_{\rho}(\Pi^{\prime})).$

This implies that $\lim_{{\varepsilon}\downarrow 0}{{\mbox{\boldmath$u$}}}{}^{T}V_{\rho}(\Pi_{\varepsilon}){\mbox{\boldmath$u$}}\leq{{\mbox{\boldmath$u$}}}{}^{T}V_{\rho}(\Pi^{\prime}){\mbox{\boldmath$u$}}$ for every $\Pi^{\prime}\in{\mathcal{U}}(\rho,\xi)$ , so that the filtration $\vec{\Pi}=(\Pi_{\varepsilon})_{{\varepsilon}>0}$ is ${\mbox{\boldmath$u$}}{{\mbox{\boldmath$u$}}}{}^{T}$ -efficient. $\Box$

An important example of a model satisfying the condition of Prop. 6.1 is given by a model consisting of quantum Gaussian states. Strictly speaking, the model is mathematically out of our scope in that the underlying Hilbert space is infinite-dimensional. Nevertheless, it is worthwhile to consider the relationship between this important model and the e-autoparallelity, even at the expense of some rigor.

We begin by quickly reviewing the definition of quantum Gaussian states based on the description in Chapter 5 of [8]. Let $Z$ be an even-dimensional real linear space on which a symplectic form $\Delta(\cdot,\cdot)$ is given, and $U(\cdot)$ be an irreducible projective representation of $(Z,\Delta)$ on a separable Hilbert space ${\mathcal{H}}$ ; i.e., $\{U(z)\,|\,z\in Z\}$ is a family of unitary operators on ${\mathcal{H}}$ satisfying $\forall z,z^{\prime}\in Z$ , $U(z)U(z^{\prime})=e^{\sqrt{-1}\Delta(z,z^{\prime})/2}U(z+z^{\prime})$ and having no nontrivial invariant subspace. For each $z\in Z$ , a self-adjoint operator $R(z)$ is defined by $U(tz)=e^{\sqrt{-1}\,tR(z)},t\in\mathbb{R}$ , and satisfies

\forall z,z^{\prime}\in Z,\;[R(z),R(z^{\prime})]=-\sqrt{-1}\,\Delta(z,z^{\prime})I.

(6.1)

Given a symmetric bilinear form $\alpha(\cdot,\cdot)$ on $Z$ satisfying $\forall z,z^{\prime}\in Z$ , $\alpha(z,z)+\alpha(z^{\prime},z^{\prime})\geq\Delta(z,z^{\prime})$ and a linear functional $\mu(\cdot)$ on $Z$ , there uniquely exists a density operator $\rho$ on ${\mathcal{H}}$ such that

\forall z\in Z,\;{\rm Tr}(\rho\,U(z))=e^{\sqrt{-1}\,\mu(z)-\frac{1}{2}\alpha(z,z)}.

(6.2)

This $\rho$ is called the Gaussian state determined by $(\mu,\alpha)$ , and satisfies

\mu(z)=\left\langle I,R(z)\right\rangle_{\rho}=\left\langle R(z)\right\rangle_{\rho},

(6.3)

\alpha(z,z^{\prime})=\left\langle R(z)-\mu(z),R(z^{\prime})-\mu(z^{\prime})\right\rangle_{\rho}.

(6.4)

Assuming that $\alpha$ and linearly independent $\mu_{1},\ldots,\mu_{n}$ are arbitrarily given, consider the model ${\mathcal{M}}=\{\rho_{\xi}\,|\,\xi=(\xi^{1},\ldots,\xi^{n})\in\mathbb{R}^{n}\}$ , where $\rho_{\xi}$ is the Gaussian state determined by $(\mu_{\xi},\alpha)$ with $\mu_{\xi}:=\sum_{i}\xi^{i}\mu_{i}$ . We call ${\mathcal{M}}=\{\rho_{\xi}\}$ a quantum Gaussian shift model. Holevo showed (§ 6.9 of [8]) that the model has a $W$ -efficient estimator for every positive weight $W$ . That is, the sufficient condition presented in Prop. 6.1 is fulfilled. Hence, if ${\mathcal{H}}$ were finite-dimensional, we could conclude that ${\mathcal{M}}$ is e-autoparallel in ${\mathcal{S}}$ and that $\xi$ is m-affine as a coordinate system of ${\mathcal{M}}$ . In reality, however, ${\mathcal{H}}$ is infinite-dimensional, and the e-autoparallelity for a model in ${\mathcal{S}}$ is not given a mathematical definition in the framework of the present paper. Nevertheless, there is no essential difference from the finite-dimensional case. In fact, we can verify that the model $({\mathcal{M}},\xi)$ satisfies conditions (ii)-(iv) of Prop. 3.8 as follows. According to [8], the $i$ th SLD $L_{i,\xi}$ is represented as

L_{i,\xi}=R(z_{i})-\mu_{\xi}(z_{i}),

(6.5)

where $z_{i}$ is the element of $Z$ determined by the condition $\forall z\in Z$ , $\mu_{i}(z)=\alpha(z_{i},z)$ . The SLD Fisher information matrix $G=[g_{ij}]$ is given by

g_{ij}=\left\langle L_{i,\xi},L_{j,\xi}\right\rangle_{\xi}=\alpha(z_{i},z_{j}),

(6.6)

which does not depend on the parameter $\xi$ . Letting $F^{i}:=\sum_{j}g^{ij}R(z_{j})$ , where $G^{-1}=[g^{ij}]$ , we have

\sum_{j}g^{ij}L_{j,\xi}=F^{i}-\sum_{j}g^{ij}\mu_{\xi}(z_{j})=F^{i}-\xi^{i},

(6.7)

where the second equality follows from $\mu_{\xi}(z_{j})=\sum_{k}\xi^{k}\mu_{k}(z_{j})$ and $\mu_{k}(z_{j})=\alpha(z_{k},z_{j})=g_{kj}$ . We thus have verified that $({\mathcal{M}},\xi)$ satisfies condition (iii) of Prop. 3.8, which is evidently equivalent to (ii) and (iv) even in this case. Hence, we may consider that the model also satisfies condition (i) at least in a naive sense. In order to mathematically justify this consideration, we need a rigorous treatment of ${\mathcal{S}}({\mathcal{H}})$ as an infinite-dimensional manifold equipped with an information geometric structure, which is beyond the scope of the present paper.

The fact that $G=[g_{ij}]$ is constant on ${\mathcal{M}}$ means that $({\mathcal{M}},g)$ is a Euclidean manifold and the m-affine coordinate system $\xi$ is also affine w.r.t. the flat Levi-Civita connection of $g$ . This implies that ${\mathcal{M}}$ is dually flat and that the e, m-connections on ${\mathcal{M}}$ coincide with the Levi-Civita connection. Note also that the SLDs $\{L_{i,\xi}\}$ do not commute and hence ${\mathcal{M}}$ has no efficient estimator.

Remark 6.2.

Let $\alpha$ be arbitrarily fixed, and let ${\mathcal{N}}:=\{\rho_{\mu}\,|\,\mu\in Z^{*}\}$ , where $Z^{*}$ denotes the dual linear space of $Z$ and $\rho_{\mu}$ denotes the Gaussian state determined by $(\mu,\alpha)$ . This is a special (maximal) case of ${\mathcal{M}}$ treated above, so that ${\mathcal{N}}$ is “e-autoparallel in ${\mathcal{S}}$ ” in the same naive sense. The SLD structure of ${\mathcal{N}}$ is Euclidean, and the model ${\mathcal{M}}=\{\rho_{\xi}\,|\,\xi\in\mathbb{R}^{n}\}$ treated above, where $\mu_{\xi}=\sum_{i}\xi^{i}\mu_{i}$ , forms an e,m-autoparallel submanifold of ${\mathcal{N}}$ . Generally, a submanifold ${\mathcal{M}}$ of ${\mathcal{N}}$ is e,m-autoparallel in ${\mathcal{N}}$ iff there exists an affine subspace ${\mathcal{A}}$ of $Z^{*}$ such that ${\mathcal{M}}=\{\rho_{\mu}\,|\,\mu\in{\mathcal{A}}\}$ , which is represented as ${\mathcal{M}}=\{\rho_{\xi}\,|\,\xi\in\mathbb{R}^{n}\}$ with $\mu_{\xi}=\mu_{0}+\sum_{i=1}^{n}\xi^{i}\mu_{i}$ . Note that Holevo's construction of a $W$ -efficient estimator applies immediately to this extended model, so that it satisfies the sufficient condition of Prop. 6.1.

7 Another estimation-theoretical characterization of e-autoparallelity

In this section we give another characterization of the e-autoparallelity by considering a different type of estimation problem. Before we get into the main discussion, some preliminaries on geometrical language are in order.

On a general Riemannian manifold $(M,g)$ , a one-to-one correspondence between a tangent vector $X_{p}\in T_{p}(M)$ and a cotangent vector $\omega_{p}\in T^{*}_{p}(M)$ at a point $p\in M$ is naturally defined; denoting the correspondence by $\stackrel{{\scriptstyle g_{p}}}{{\longleftrightarrow}}$ , we have

X_{p}\stackrel{{\scriptstyle g_{p}}}{{\longleftrightarrow}}\omega_{p}\;\Leftrightarrow\;\forall Y_{p}\in T_{p}(M),\;\omega_{p}(Y_{p})=g_{p}(X_{p},Y_{p}).

(7.1)

This is extended to the correspondence $\stackrel{{\scriptstyle g}}{{\longleftrightarrow}}$ between a vector field $X\in{\mathscr{X}}(M)$ and a differential 1-form $\omega\in{\mathcal{D}}(M)$ , where ${\mathcal{D}}(M)$ denotes the totality of 1-forms on $M$ , such that

	$\displaystyle X\stackrel{{\scriptstyle g}}{{\longleftrightarrow}}\omega\;$	$\displaystyle\Leftrightarrow\;\forall p\in M,\;X_{p}\stackrel{{\scriptstyle g_{p}}}{{\longleftrightarrow}}\omega_{p}$
		$\displaystyle\Leftrightarrow\;\forall Y\in{\mathscr{X}}(M),\;\omega(Y)=g(X,Y).$		(7.2)

When a coordinate system $\xi=(\xi^{i})$ is given on $M$ , and $X\in{\mathscr{X}}(M)$ and $\omega\in{\mathcal{D}}(M)$ are represented as $X=\sum_{i}X^{i}\partial_{i}$ and $\omega=\sum_{j}\omega_{j}\,d\xi^{j}$ by functions $\{X^{i}\},\{\omega_{j}\}\subset{\mathcal{F}}(M)$ , we have

X\stackrel{{\scriptstyle g}}{{\longleftrightarrow}}\omega\;\Leftrightarrow\;\forall j,\;\omega_{j}=\sum_{i}X^{i}g_{ij}\;\Leftrightarrow\;\forall i,\;X^{i}=\sum_{j}\omega_{j}g^{ij},

(7.3)

where $g_{ij}=g(\partial_{i},\partial_{j})$ and $g^{ij}=g(d\xi^{i},d\xi^{j})$ , so that the matrices $[g_{ij}]$ and $[g^{ij}]$ are inverse to each other.

For a function $f\in{\mathcal{F}}(M)$ , its gradient w.r.t. $g$ is defined as the vector field $X\in{\mathscr{X}}(M)$ such that $X\stackrel{{\scriptstyle g}}{{\longleftrightarrow}}df$ , which we denote by $X={\rm grad}\leavevmode\nobreak\ \!f$ . This is represented as

{\rm grad}\leavevmode\nobreak\ \!f=\sum_{i,j}(\partial_{i}f)g^{ij}\partial_{j}.

(7.4)

The correspondence $\stackrel{{\scriptstyle g_{p}}}{{\longleftrightarrow}}$ induces an inner product and a norm on the cotangent space $T^{*}_{p}(M)$ such that $\stackrel{{\scriptstyle g_{p}}}{{\longleftrightarrow}}$ is an isometry; i.e., $X_{p}\stackrel{{\scriptstyle g_{p}}}{{\longleftrightarrow}}\omega_{p}$ $\Rightarrow$ $\|X_{p}\|_{p}=\|\omega_{p}\|_{p}$ . In particular, we have

\|({\rm grad}\leavevmode\nobreak\ \!f)_{p}\|^{2}_{p}=\|(df)_{p}\|^{2}_{p}=\sum_{i,j}g^{ij}(p)\,\partial_{i}f(p)\,\partial_{j}f(p).

(7.5)
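
The coordinate expressions (7.3)-(7.5) amount to elementary linear algebra at each point. The following Python sketch illustrates them at a single point $p$; the metric matrix `G`, the derivative vector `df`, and the helper names are hypothetical sample data chosen for illustration, not quantities taken from the paper.

```python
import numpy as np

def grad_components(G, df):
    """Components X^j = sum_i (d_i f) g^{ij} of grad f, as in (7.4)."""
    return np.linalg.solve(G, df)  # G is symmetric, so this equals G^{-1} df

def cotangent_norm_sq(G, df):
    """||df||^2 = sum_{i,j} g^{ij} d_i f d_j f, as in (7.5)."""
    return float(df @ np.linalg.solve(G, df))

# Hypothetical data at a single point p: a positive-definite metric matrix
# [g_ij] and the partial derivatives (d_1 f, d_2 f, d_3 f).
G = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
df = np.array([1.0, -2.0, 0.5])

X = grad_components(G, df)
# (7.3): lowering the index of grad f recovers df.
assert np.allclose(G @ X, df)
# Isometry: g(grad f, grad f) equals the squared cotangent norm of df.
assert np.isclose(X @ G @ X, cotangent_norm_sq(G, df))
```

Solving the linear system instead of forming $[g^{ij}]$ explicitly is a numerical convenience only; both give the same components.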

Now, we are ready to start the main discussion of this section. Let ${\mathcal{M}}$ be an $n$ -dimensional submanifold of ${\mathcal{S}}={\mathcal{S}}({\mathcal{H}})$ , and $f\in{\mathcal{F}}({\mathcal{M}})$ be a smooth function on it. We consider the problem of estimating the scalar value $f(\rho)$ for unknown $\rho\in{\mathcal{M}}$ . An estimator is generally represented by a POVM $\Lambda=\Lambda(d\hat{t}\,)$ , where $\hat{t}$ is a scalar variable representing an estimate for $t=f(\rho)$ . The expectation $E_{\rho}(\Lambda)$ and the mean squared error (the variance in the unbiased case) $V_{\rho}(\Lambda)$ of $\Lambda$ for a state $\rho$ are defined by

	$\displaystyle E_{\rho}(\Lambda)$	$\displaystyle:=\int\hat{t}\,{\rm Tr}\left(\rho\,\Lambda(d\hat{t}\,)\right),$		(7.6)
	$\displaystyle V_{\rho}(\Lambda)$	$\displaystyle:=\int(\hat{t}-f(\rho))^{2}\,{\rm Tr}\left(\rho\,\Lambda(d\hat{t}\,)\right).$		(7.7)

Localizing the unbiasedness condition $E(\Lambda)=f$ , where the LHS denotes the function ${\mathcal{M}}\rightarrow\mathbb{R},\;\rho\mapsto E_{\rho}(\Lambda)$ , we say that $\Lambda$ is locally unbiased for $f$ at $\rho\in{\mathcal{M}}$ when

E_{\rho}(\Lambda)=f(\rho)\quad\text{and}\quad(dE(\Lambda))_{\rho}=(df)_{\rho}.

(7.8)

When a coordinate system $\xi=(\xi^{i})$ is arbitrarily given on ${\mathcal{M}}$ , the second condition in (7.8) is expressed as

\forall i\in\{1,\ldots,n\},\;\;\partial_{i}E_{\rho}(\Lambda)=\partial_{i}f(\rho),

(7.9)

where $\partial_{i}E_{\rho}(\Lambda)$ and $\partial_{i}f(\rho)$ denote the derivatives of the functions $E(\Lambda)$ and $f$ by $\partial_{i}=\frac{\partial}{\partial\xi^{i}}$ evaluated at $\rho$ . We denote by ${\mathcal{U}}(\rho,f)$ the totality of locally unbiased estimators for $f$ at $\rho$ .

Proposition 7.1.

For any $f\in{\mathcal{F}}({\mathcal{M}})$ and any $\rho\in{\mathcal{M}}$ , we have

\min_{\Lambda\in{\mathcal{U}}(\rho,f)}V_{\rho}(\Lambda)=\|(df)_{\rho}\|_{\rho}^{2}.

(7.10)

The minimum of (7.10) is achieved by the spectral measure of the observable

F_{\rho}:=f(\rho)+\sum_{i}\partial_{i}f(\rho)\,L^{i}_{\rho},

(7.11)

where $L^{i}_{\rho}:=\sum_{j}g^{ij}(\rho)L_{j,\rho}$ and $L_{i}:=L_{\partial_{i}}$ .

Proof Given an estimator $\Lambda$ , let

A:=\int\hat{t}\,\Lambda(d\hat{t})\in{\mathcal{L}}_{{\rm h}}.

Then the local unbiasedness of $\Lambda$ at $\rho$ is represented as

\left\langle A\right\rangle_{\rho}=f(\rho)\quad\text{and}\quad\forall i,\;\partial_{i}\left\langle A\right\rangle_{\rho}=\partial_{i}f(\rho),

(7.12)

and we have

V_{\rho}(\Lambda)\geq\left\langle(A-f(\rho))^{2}\right\rangle_{\rho}=\|A-f(\rho)\|^{2}_{\rho},

(7.13)

where $\|\cdot\|_{\rho}$ denotes the norm w.r.t. the symmetrized inner product $\left\langle\cdot,\cdot\right\rangle_{\rho}$ on ${\mathcal{L}}_{{\rm h}}$ . Noting that the second condition of (7.12) is equivalent to

\forall i,\;\left\langle L_{i,\rho},A-f(\rho)\right\rangle_{\rho}=\Bigl{\langle}L_{i,\rho},\sum_{j}\partial_{j}f(\rho)L^{j}_{\rho}\Bigr{\rangle}_{\rho}

we see that $\sum_{j}\partial_{j}f(\rho)L^{j}_{\rho}$ is the orthogonal projection of $A-f(\rho)$ onto the space ${\rm span}\leavevmode\nobreak\ \!\{L_{j,\rho}\}_{j=1}^{n}$ . Hence we have

$\displaystyle\|A-f(\rho)\|^{2}_{\rho}$	$\displaystyle=\|\sum_{j}\partial_{j}f(\rho)L^{j}_{\rho}\|_{\rho}^{2}+\|A-f(\rho)-\sum_{j}\partial_{j}f(\rho)L^{j}_{\rho}\|_{\rho}^{2}$
	$\displaystyle\geq\|\sum_{j}\partial_{j}f(\rho)L^{j}_{\rho}\|_{\rho}^{2}$
	$\displaystyle=\sum_{i,j}g^{ij}(\rho)\,\partial_{i}f(\rho)\,\partial_{j}f(\rho)=\|(df)_{\rho}\|_{\rho}^{2}.$	(7.14)

The inequality in (7.13) holds with equality when $\Lambda$ is the spectral measure of $A$ , and the inequality in (7.14) holds with equality when $A=f(\rho)+\sum_{j}\partial_{j}f(\rho)L^{j}_{\rho}=F_{\rho}$ . These observations prove the proposition. $\Box$
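
As a sanity check of Prop. 7.1, the following Python sketch builds the observable $F_{\rho}$ of (7.11) numerically for a qubit. It is an illustration under assumptions that are not part of the paper: ${\mathcal{M}}$ is taken to be the whole qubit state space with the Bloch vector $(x,y,z)$ as coordinate system, the test function is $f=x^{2}+yz$, and the helper names `bloch_state`, `sld` and `sym_inner` are hypothetical. The SLDs are obtained by solving $\partial_{i}\rho=\frac{1}{2}(\rho L_{i}+L_{i}\rho)$ in the eigenbasis of $\rho$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def bloch_state(r):
    """Density operator (I + r . sigma)/2 for a Bloch vector with |r| < 1."""
    return 0.5 * (I2 + sum(c * s for c, s in zip(r, PAULI)))

def sld(rho, drho):
    """Hermitian solution L of drho = (rho L + L rho)/2, via the eigenbasis of rho."""
    lam, U = np.linalg.eigh(rho)
    d = U.conj().T @ drho @ U
    return U @ (2.0 * d / (lam[:, None] + lam[None, :])) @ U.conj().T

def sym_inner(rho, A, B):
    """Symmetrized inner product <A, B>_rho = Re Tr(rho (AB + BA)/2)."""
    return float(np.real(np.trace(rho @ (A @ B + B @ A))) / 2.0)

r = np.array([0.3, -0.2, 0.5])
rho = bloch_state(r)
L = [sld(rho, 0.5 * s) for s in PAULI]           # d_i rho = sigma_i / 2
G = np.array([[sym_inner(rho, Li, Lj) for Lj in L] for Li in L])
Ginv = np.linalg.inv(G)
L_up = [sum(Ginv[i, j] * L[j] for j in range(3)) for i in range(3)]

f_val = r[0] ** 2 + r[1] * r[2]                   # f(x, y, z) = x^2 + y z
df = np.array([2 * r[0], r[2], r[1]])             # its partial derivatives

F = f_val * I2 + sum(df[i] * L_up[i] for i in range(3))    # observable (7.11)

mean = float(np.real(np.trace(rho @ F)))
slopes = np.array([sym_inner(rho, L[i], F) for i in range(3)])  # d_i <F> at rho
variance = sym_inner(rho, F - mean * I2, F - mean * I2)
bound = float(df @ Ginv @ df)                      # ||(df)_rho||^2

assert np.isclose(mean, f_val)        # first condition of (7.12)
assert np.allclose(slopes, df)        # second condition of (7.12)
assert np.isclose(variance, bound)    # equality in (7.10)
```

Note that $F_{\rho}$ constructed in this way depends on the point $\rho$ in general, which is why local efficiency at each point does not by itself give an efficient estimator.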

Based on Proposition 7.1, we call an estimator $\Lambda$ locally efficient for $f$ at $\rho$ when $\Lambda\in{\mathcal{U}}(\rho,f)$ and $V_{\rho}(\Lambda)=\|(df)_{\rho}\|_{\rho}^{2}$ , and call it efficient for $f$ when it is locally efficient for $f$ at every $\rho\in{\mathcal{M}}$ . Note that, unlike the case of estimation for multi-dimensional coordinates $(\xi^{i})$ where the infimum in (4.9) cannot be replaced with a minimum in general, there always exists a locally efficient estimator for a scalar function $f$ at each $\rho$ . Furthermore, since a locally efficient estimator is obtained as the spectral measure of an observable, as shown in the proof of Prop. 7.1, it suffices to treat only estimators represented by Hermitian operators. Note that an estimator $F\in{\mathcal{L}}_{{\rm h}}$ is efficient for $f$ iff

\forall\rho\in{\mathcal{M}},\;\left\langle F\right\rangle_{\rho}=f(\rho)\quad\text{and}\quad V_{\rho}(F):=\bigl{\langle}(F-\left\langle F\right\rangle_{\rho})^{2}\bigr{\rangle}_{\rho}=\|(df)_{\rho}\|_{\rho}^{2}.

(7.15)

We define

{\mathcal{E}}({\mathcal{M}}):=\{f\in{\mathcal{F}}({\mathcal{M}})\,|\,\text{there exists an efficient estimator for $f$}\}.

(7.16)

Proposition 7.2.

For a function $f\in{\mathcal{F}}({\mathcal{M}})$ , the following conditions are equivalent.

(i)

$f\in{\mathcal{E}}({\mathcal{M}})$ .
(ii)

$\exists F\in{\mathcal{L}}_{{\rm h}},\;F-f=\sum_{i}(\partial_{i}f)\,L^{i}$ .
(iii)

$\exists F\in{\mathcal{L}}_{{\rm h}},\;F-\left\langle F\right\rangle|_{{\mathcal{M}}}=\sum_{i}(\partial_{i}f)\,L^{i}$ .
(iv)

${\rm grad}\leavevmode\nobreak\ \!f$ is e-parallel (i.e. parallel w.r.t. the e-connection on ${\mathcal{S}}$ ).

In (ii), the observable $F$ gives an efficient estimator for $f$ .

Proof From Prop. 7.1, it immediately follows that (i) $\Leftrightarrow$ (ii) and that $F$ in (ii) gives an efficient estimator for $f$ .

It is obvious that (ii) $\Rightarrow$ (iii). To show the converse, assume that an operator $F\in{\mathcal{L}}_{{\rm h}}$ satisfies $F-\left\langle F\right\rangle|_{{\mathcal{M}}}=\sum_{i}(\partial_{i}f)\,L^{i}$ . Then we have

\partial_{i}\left\langle F\right\rangle=\left\langle L_{i},F\right\rangle=\left\langle L_{i},F-\left\langle F\right\rangle|_{{\mathcal{M}}}\right\rangle=\bigl{\langle}L_{i},\sum_{j}(\partial_{j}f)\,L^{j}\bigr{\rangle}=\partial_{i}f,

which implies that $\exists c\in\mathbb{R}$ , $f=\left\langle F\right\rangle|_{{\mathcal{M}}}+c$ . Redefining $F:=F+c$ , we have $F-f=\sum_{i}(\partial_{i}f)L^{i}$ . This proves (iii) $\Rightarrow$ (ii).

Let $X:={\rm grad}\leavevmode\nobreak\ \!f$ . Then (7.4) yields

L_{X}=\sum_{i,j}(\partial_{i}f)g^{ij}L_{j}=\sum_{i}(\partial_{i}f)L^{i},

and Cor. 3.6 yields

\text{$X$ is e-parallel}\;\Leftrightarrow\;\exists F\in{\mathcal{L}}_{{\rm h}},\;L_{X}=F-\left\langle F\right\rangle|_{{\mathcal{M}}}.

Thus we obtain (iii) $\Leftrightarrow$ (iv). $\Box$

Corollary 7.3.

${\mathcal{E}}({\mathcal{M}})$ is an $\mathbb{R}$ -linear space.

Proof Obvious from Prop. 7.2. $\Box$

Proposition 7.4.

For a vector field $X\in{\mathscr{X}}({\mathcal{M}})$ , we have

\text{$X$ is e-parallel}\;\Leftrightarrow\;\exists f\in{\mathcal{E}}({\mathcal{M}}),\;X={\rm grad}\,f.

(7.17)

Proof The implication $\Leftarrow$ follows from (i) $\Rightarrow$ (iv) in Prop. 7.2. To show the converse, assume that $X$ is e-parallel. Then, according to Cor. 3.6, there exists $F\in{\mathcal{L}}_{{\rm h}}$ such that $L_{X}=F-f$ , where $f:=\left\langle F\right\rangle|_{{\mathcal{M}}}$ . For any $Y\in{\mathscr{X}}({\mathcal{M}})$ we have

g(X,Y)=\left\langle L_{Y},F-f\right\rangle=\left\langle L_{Y},F\right\rangle\stackrel{{\scriptstyle\star}}{{=}}Y\left\langle F\right\rangle|_{{\mathcal{M}}}=Yf=df(Y),

(7.18)

where $\stackrel{{\scriptstyle\star}}{{=}}$ follows from (3.7). This implies that $X={\rm grad}\leavevmode\nobreak\ \!f$ . Since $X$ is e-parallel, it follows from (iv) $\Rightarrow$ (i) in Prop. 7.2 that $f\in{\mathcal{E}}({\mathcal{M}})$ . Thus, $\Rightarrow$ in (7.17) has been verified. $\Box$

Define

d{\mathcal{E}}({\mathcal{M}}):=\{df\,|\,f\in{\mathcal{E}}({\mathcal{M}})\}\;\subset{\mathcal{D}}({\mathcal{M}}).

(7.19)

Since $df=df^{\prime}\;\Leftrightarrow\;f-f^{\prime}=$ const., we have the natural identification $d{\mathcal{E}}({\mathcal{M}})\simeq{\mathcal{E}}({\mathcal{M}})/\mathbb{R}$ . We also define

{{\mathscr{X}}}_{\text{e-par}}({\mathcal{M}}):=\{X\in{\mathscr{X}}({\mathcal{M}})\,|\,\text{$X$ is e-parallel}\}.

(7.20)

Then we have the following proposition.

Proposition 7.5.

The correspondence $\stackrel{{\scriptstyle g|_{{\mathcal{M}}}}}{{\longleftrightarrow}}$ establishes a linear isomorphism between ${{\mathscr{X}}}_{\text{e-par}}({\mathcal{M}})$ and $d{\mathcal{E}}({\mathcal{M}})$ . As a consequence, we have

\dim d{\mathcal{E}}({\mathcal{M}})=\dim{\mathcal{E}}({\mathcal{M}})-1=\dim{{\mathscr{X}}}_{\text{e-par}}({\mathcal{M}})\leq\dim{\mathcal{M}}.

(7.21)

Proof It suffices to show that for an arbitrary pair $(X,\omega)\in{\mathscr{X}}({\mathcal{M}})\times{\mathcal{D}}({\mathcal{M}})$ satisfying $X\stackrel{{\scriptstyle g|_{{\mathcal{M}}}}}{{\longleftrightarrow}}\omega$ , the following equivalence holds:

X\in{{\mathscr{X}}}_{\text{e-par}}({\mathcal{M}})\;\Leftrightarrow\;\exists f\in{\mathcal{E}}({\mathcal{M}}),\;\omega=df.

(7.22)

Since $\omega=df\;\Leftrightarrow\;X={\rm grad}\leavevmode\nobreak\ \!f$ , this is just Prop. 7.4. $\Box$

Now, we present two theorems for characterization of the e-autoparallelity in terms of ${\mathcal{E}}({\mathcal{M}})$ .

Theorem 7.6.

For an $n$ -dimensional submanifold ${\mathcal{M}}$ of ${\mathcal{S}}$ , the following conditions are equivalent.

(i)

${\mathcal{M}}$ is e-autoparallel in ${\mathcal{S}}$ .
(ii)

$\dim{\mathcal{E}}({\mathcal{M}})=n+1$ .

Proof We have (i) $\Leftrightarrow$ $\dim{{\mathscr{X}}}_{\text{e-par}}({\mathcal{M}})=n$ by Prop. 2.4, and $\dim{{\mathscr{X}}}_{\text{e-par}}({\mathcal{M}})=n$ $\Leftrightarrow$ (ii) by Prop. 7.5. $\Box$

Theorem 7.7.

For an $n$ -dimensional model $({\mathcal{M}},\xi)$ , the following conditions are equivalent.

(i)

${\mathcal{M}}$ is e-autoparallel in ${\mathcal{S}}$ , and $\xi$ is an m-affine coordinate system.
(ii)

$\forall i\in\{1,\ldots,n\},\;\xi^{i}\in{\mathcal{E}}({\mathcal{M}})$ .
(iii)

$\displaystyle{\mathcal{E}}({\mathcal{M}})=\bigl{\{}c+\sum_{i=1}^{n}u_{i}\xi^{i}\,\big{|}\,(c,u_{1},\ldots,u_{n})\in\mathbb{R}^{n+1}\bigr{\}}$ .

Proof Let $X^{i}:={\rm grad}\leavevmode\nobreak\ \!\xi^{i}=\sum_{j}g^{ij}\partial_{j}$ . Then we have

\text{(i)}\;\Leftrightarrow\;\forall i,\;X^{i}\in{{\mathscr{X}}}_{\text{e-par}}({\mathcal{M}})\;\Leftrightarrow\;\text{(ii)},

where the first $\Leftrightarrow$ follows from Prop. 2.6 and the second $\Leftrightarrow$ follows from Prop. 7.2.

Next, noting that constant functions on ${\mathcal{M}}$ belong to ${\mathcal{E}}({\mathcal{M}})$ , we have

	(ii)	$\displaystyle\Leftrightarrow\;\bigl{\{}c+\sum_{i=1}^{n}u_{i}\xi^{i}\,\big{|}\,(c,u_{1},\ldots,u_{n})\in\mathbb{R}^{n+1}\bigr{\}}\subset{\mathcal{E}}({\mathcal{M}})$
		$\displaystyle\Leftrightarrow\;\text{(iii)},$

where the second $\Leftrightarrow$ follows since $\dim{\mathcal{E}}({\mathcal{M}})\leq n+1$ by (7.21) and $\{1,\xi^{1},\ldots,\xi^{n}\}$ are linearly independent. $\Box$
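
Condition (ii) of Theorem 7.7 can be tested numerically through Prop. 7.2 (ii): for a one-parameter model ($n=1$) with coordinate $\xi$, it requires that $L^{1}_{\rho}+\xi(\rho)$ be one and the same operator $F$ at every point. The following Python sketch compares two qubit models chosen purely for illustration (they are assumptions of this example, not models discussed in the paper): a segment of the $z$ -axis of the Bloch ball with $\xi=z$, and an arc of a circle of radius $s=0.6$ in the $xz$ -plane with $\xi=t$ the angle. The helper names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def sld(rho, drho):
    """Hermitian solution L of drho = (rho L + L rho)/2, via the eigenbasis of rho."""
    lam, U = np.linalg.eigh(rho)
    d = U.conj().T @ drho @ U
    return U @ (2.0 * d / (lam[:, None] + lam[None, :])) @ U.conj().T

def candidate_F(rho, drho, xi):
    """Return L^1_rho + xi(rho) I for a one-parameter model (n = 1)."""
    L = sld(rho, drho)
    g = float(np.real(np.trace(rho @ L @ L)))    # SLD Fisher information g_11
    return L / g + xi * I2

def model_a(z):
    """rho_z = (I + z sigma_z)/2 and its derivative in z."""
    return 0.5 * (I2 + z * SZ), 0.5 * SZ

def model_b(t, s=0.6):
    """rho_t = (I + s (cos t sigma_x + sin t sigma_z))/2 and its derivative in t."""
    rho = 0.5 * (I2 + s * (np.cos(t) * SX + np.sin(t) * SZ))
    drho = 0.5 * s * (-np.sin(t) * SX + np.cos(t) * SZ)
    return rho, drho

Fa = [candidate_F(*model_a(z), z) for z in (-0.5, 0.1, 0.7)]
Fb = [candidate_F(*model_b(t), t) for t in (0.0, 0.4, 1.1)]

a_constant = all(np.allclose(F, Fa[0]) for F in Fa)
b_constant = all(np.allclose(F, Fb[0]) for F in Fb)
print(a_constant, b_constant)
```

For the first model the operator is $\sigma_{z}$ at every sampled point, so $\xi=z$ has the efficient estimator $F=\sigma_{z}$; for the second model the operator changes with $t$, so the angle coordinate fails condition (ii).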

Remark 7.8.

If we replace ${\mathcal{S}}({\mathcal{H}})$ by ${\mathcal{P}}(\Omega)$ in Theorems 7.6 and 7.7, these theorems hold as they are in the classical case. When the coordinate functions $\xi^{1},\ldots,\xi^{n}$ satisfy condition (ii) in Theorem 7.7, they have their efficient estimators $F^{1},\ldots,F^{n}$ , which are functions $\Omega\rightarrow\mathbb{R}$ in this case, and the map $(F^{1},\ldots,F^{n}):\Omega\rightarrow\mathbb{R}^{n}$ becomes an efficient estimator for $\xi=(\xi^{1},\ldots,\xi^{n})$ . Thus, we see that the equivalence (i) $\Leftrightarrow$ (ii) in the theorem is just Theorem 1.1.

Finally, we present three propositions that will aid in understanding the above results in a purely geometric context; their proofs are given in A3 of the Appendix.

Proposition 7.9.

$\forall F\in{\mathcal{L}}_{{\rm h}},\forall\rho\in{\mathcal{S}},\;V_{\rho}(F):=\bigl{\langle}(F-\left\langle F\right\rangle_{\rho})^{2}\bigr{\rangle}_{\rho}=\|(d\left\langle F\right\rangle)_{\rho}\|_{\rho}^{2}$ .
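
Proposition 7.9 lends itself to a direct numerical check. The following Python sketch does so for a randomly generated full-rank state and a randomly generated observable on a three-dimensional Hilbert space. Everything specific here is an assumption of the example rather than part of the paper: the dimension $d=3$, the random seed, the affine coordinates $\rho=\rho_{0}+\sum_{i}\xi^{i}B_{i}$ built from a basis $\{B_{i}\}$ of traceless Hermitian matrices, and the hypothetical helper names.

```python
import numpy as np

def traceless_hermitian_basis(d):
    """A basis of the real vector space of traceless Hermitian d x d matrices."""
    basis = []
    for a in range(d):
        for b in range(a + 1, d):
            E = np.zeros((d, d), dtype=complex)
            E[a, b] = 1.0
            basis.append(E + E.conj().T)           # real symmetric element
            basis.append(1j * (E - E.conj().T))    # imaginary antisymmetric element
    for a in range(1, d):
        D = np.zeros((d, d), dtype=complex)
        D[0, 0], D[a, a] = 1.0, -1.0
        basis.append(D)
    return basis

def sld(rho, drho):
    """Hermitian solution L of drho = (rho L + L rho)/2, via the eigenbasis of rho."""
    lam, U = np.linalg.eigh(rho)
    d = U.conj().T @ drho @ U
    return U @ (2.0 * d / (lam[:, None] + lam[None, :])) @ U.conj().T

def sym_inner(rho, A, B):
    """Symmetrized inner product <A, B>_rho = Re Tr(rho (AB + BA)/2)."""
    return float(np.real(np.trace(rho @ (A @ B + B @ A))) / 2.0)

rng = np.random.default_rng(0)
d = 3
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T + 0.1 * np.eye(d)
rho = rho / np.real(np.trace(rho))                 # strictly positive, unit trace
H = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
F = (H + H.conj().T) / 2                            # arbitrary Hermitian observable

basis = traceless_hermitian_basis(d)               # d_i rho = basis[i]
L = [sld(rho, B) for B in basis]
G = np.array([[sym_inner(rho, Li, Lj) for Lj in L] for Li in L])
dF = np.array([np.real(np.trace(B @ F)) for B in basis])   # d_i <F> = Tr(B_i F)

mean = np.real(np.trace(rho @ F))
variance = float(np.real(np.trace(rho @ F @ F)) - mean ** 2)
norm_sq = float(dF @ np.linalg.solve(G, dF))       # ||(d<F>)_rho||^2

assert np.isclose(variance, norm_sq, rtol=1e-8)
```

The check works on the whole of ${\mathcal{S}}$ because there the SLDs $\{L_{i,\rho}\}$ span every Hermitian operator with vanishing expectation at $\rho$; on a proper submanifold the norm of the restricted differential can be strictly smaller than the variance.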

Proposition 7.10.

We have

$\displaystyle{\mathcal{E}}({\mathcal{S}})$	$\displaystyle=\{\left\langle F\right\rangle\,|\,F\in{\mathcal{L}}_{{\rm h}}\}$
	$\displaystyle=\{f\in{\mathcal{F}}({\mathcal{S}})\,|\,\text{${\rm grad}\leavevmode\nobreak\ \!f$ is e-parallel}\}$
	$\displaystyle=\{f\in{\mathcal{F}}({\mathcal{S}})\,|\,\text{$df$ is m-parallel}\},$	(7.23)

where a 1-form $\omega\in{\mathcal{D}}({\mathcal{S}})$ is said to be m-parallel when

\forall X,Y\in{\mathscr{X}}({\mathcal{S}}),\;(\nabla^{({\rm m})}_{X}\omega)(Y):=X\omega(Y)-\omega\bigl{(}\nabla^{({\rm m})}_{X}Y\bigr{)}=0.

(7.24)

Proposition 7.11.

For an arbitrary submanifold ${\mathcal{M}}$ of ${\mathcal{S}}$ , we have

	$\displaystyle{\mathcal{E}}({\mathcal{M}})=\bigl{\{}f\in{\mathcal{F}}({\mathcal{M}})\,\big{|}\,\exists\tilde{f}\in{\mathcal{E}}({\mathcal{S}}),$	$\displaystyle\;f=\tilde{f}|_{{\mathcal{M}}}\;\;\text{and}\;\;$
		$\displaystyle\forall\rho\in{\mathcal{M}},\;\|(df)_{\rho}\|_{\rho}=\|(d\tilde{f})_{\rho}\|_{\rho}\,\bigr{\}}.$		(7.25)

As these propositions suggest, the discussion for ${\mathcal{E}}({\mathcal{M}})$ given in this section can be extended to a more general geometrical setting. Let us recall the situation treated in Section 2 where a manifold $S$ is provided with a Riemannian metric $g$ together with mutually dual affine connections $\nabla$ and $\nabla^{*}$ such that $\nabla$ is curvature-free and $\nabla^{*}$ is flat. We define

	$\displaystyle{\mathcal{E}}(S):=$	$\displaystyle\{f\in{\mathcal{F}}(S)\,|\,\text{${\rm grad}\leavevmode\nobreak\ \!f$ is $\nabla$-parallel}\}$
	$\displaystyle=$	$\displaystyle\{f\in{\mathcal{F}}(S)\,|\,\text{$df$ is $\nabla^{*}$-parallel}\},$		(7.26)

where we have invoked the fact that the correspondence ${\mathscr{X}}(S)\ni X\stackrel{{\scriptstyle g}}{{\longleftrightarrow}}\omega\in{\mathcal{D}}(S)$ implies (cf. (A.7))

\text{$X$ is $\nabla$-parallel}\;\Leftrightarrow\;\text{$\omega$ is $\nabla^{*}$-parallel}.

(7.27)

Given a submanifold $M$ of $S$ , let

	$\displaystyle{\mathcal{E}}(M):=\bigl{\{}f\in{\mathcal{F}}(M)\,\big{|}\,\exists\tilde{f}\in{\mathcal{E}}(S),$	$\displaystyle\;f=\tilde{f}|_{M}\;\;\text{and}\;\;$
		$\displaystyle\forall\rho\in M,\;\|(df)_{\rho}\|_{\rho}=\|(d\tilde{f})_{\rho}\|_{\rho}\,\bigr{\}}.$		(7.28)

Then it is not difficult to verify that Theorems 7.6 and 7.7 as well as their proofs are extended to this general situation almost as they are.

It should be noted that the flatness of $\nabla^{*}$ is essential in that it ensures $\dim{\mathcal{E}}(S)=\dim S+1$ . To clarify the role of the flatness, let us consider a more general situation by removing the assumption that $\nabla$ is curvature-free and $\nabla^{*}$ is flat, assuming only that they are dual w.r.t. $g$ . We start from the following general identity: for any 1-form $\omega\in{\mathcal{D}}(S)$ and any vector fields $X,Y\in{\mathscr{X}}(S)$ , we have

$\displaystyle(d\omega)(X,Y)$	$\displaystyle:=X\omega(Y)-Y\omega(X)-\omega([X,Y])$
	$\displaystyle=(\nabla^{*}_{X}\omega)(Y)+\omega(\nabla^{*}_{X}Y)-(\nabla^{*}_{Y}\omega)(X)-\omega(\nabla^{*}_{Y}X)-\omega([X,Y])$
	$\displaystyle=(\nabla^{*}_{X}\omega)(Y)-(\nabla^{*}_{Y}\omega)(X)+\omega({\mathcal{T}}^{(\nabla^{*})}(X,Y)),$	(7.29)

where ${\mathcal{T}}^{(\nabla^{*})}$ denotes the torsion of $\nabla^{*}$ : ${\mathcal{T}}^{(\nabla^{*})}(X,Y)=\nabla^{*}_{X}Y-\nabla^{*}_{Y}X-[X,Y]$ . When $\nabla^{*}$ is torsion-free, this implies that for any $\omega\in{\mathcal{D}}(S)$

\text{$\omega$ is $\nabla^{*}$-parallel}\;\Rightarrow\;d\omega=0\;\Leftrightarrow\;\exists f\in{\mathcal{F}}(S),\;\omega=df

(7.30)

(see Remark 1.4) and that for any $X\in{\mathscr{X}}(S)$

\text{$X$ is $\nabla$-parallel}\;\Rightarrow\;\exists f\in{\mathcal{F}}(S),\;X={\rm grad}\leavevmode\nobreak\ \!f.

(7.31)

This leads to

d{\mathcal{E}}(S):=\{df\,|\,f\in{\mathcal{E}}(S)\}=\{\omega\,|\,\text{$\omega$ is $\nabla^{*}$-parallel}\}

(7.32)

and hence

\dim{\mathcal{E}}(S)=\dim\{\omega\,|\,\text{$\omega$ is $\nabla^{*}$-parallel}\}+1=\dim{\mathscr{X}}_{\text{$\nabla$-par}}(S)+1,

(7.33)

where ${\mathscr{X}}_{\text{$\nabla$-par}}(S)$ denotes the totality of $\nabla$ -parallel vector fields on $S$ . If, in addition, $\nabla$ is curvature-free, which is equivalent to the flatness of $\nabla^{*}$ , then $\dim{\mathscr{X}}_{\text{$\nabla$-par}}(S)=\dim S$ , and we obtain $\dim{\mathcal{E}}(S)=\dim S+1$ .

8 Integrability conditions

Consider the general situation where an affine connection $\nabla$ is given on a manifold $S$ . For an arbitrary point $p\in S$ and an arbitrary $1$ -dimensional subspace $V$ of the tangent space $T_{p}(S)$ , there always exists a $\nabla$ -autoparallel curve, i.e. a $\nabla$ -geodesic, that passes through $p$ in direction $V$ . For the existence of multi-dimensional autoparallel submanifolds, the situation differs greatly depending on whether $\nabla$ is flat or not. When $\nabla$ is flat (curvature-free and torsion-free), the autoparallel submanifolds are those determined by arbitrary affine constraints on $\nabla$ -affine coordinates. This ensures that, for an arbitrary point $p\in S$ and an arbitrary linear subspace $V$ of the tangent space $T_{p}(S)$ , there uniquely exists a $\nabla$ -autoparallel submanifold $M$ satisfying $p\in M$ and $T_{p}(M)=V$ . This is the case with the e-connection on the space ${\mathcal{P}}$ of probability distributions, for which the autoparallel submanifolds are the exponential families. When $\nabla$ is not flat, on the other hand, the existence of multi-dimensional autoparallel submanifolds is not ensured in general. In this section we investigate conditions for existence of autoparallel submanifolds.

Let us consider the case when $\nabla$ is curvature-free as in the e-connection on ${\mathcal{S}}({\mathcal{H}})$ . According to (i) $\Leftrightarrow$ (iv) of Prop. 2.4, an $n$ -dimensional submanifold $M$ of $S$ is $\nabla$ -autoparallel iff there exist $n$ linearly independent $\nabla$ -parallel vector fields $X^{1},\ldots,X^{n}\in{\mathscr{X}}(S)$ such that their restrictions $X^{1}|_{M},\ldots,X^{n}|_{M}$ belong to ${\mathscr{X}}(M)$ . This means that $M$ is an integral manifold of $\{X^{1},\ldots,X^{n}\}$ , or equivalently that $M$ is an integral manifold of the $n$ -dimensional distribution

{\mathcal{V}}:S\ni p\mapsto V_{p}:={\rm span}\{X^{1}_{p},\ldots,X^{n}_{p}\}\subset T_{p}(S),

(8.1)

which is $\nabla$ -parallel in the sense that $\forall p,q\in S,\;\Phi^{(\nabla)}_{p,q}(V_{p})=V_{q}$ , where $\Phi^{(\nabla)}_{p,q}$ denotes the parallel transport w.r.t. $\nabla$ .

Proposition 8.1.

Suppose that we are given a manifold $S$ with a curvature-free connection $\nabla$ and an $n$ -dimensional $\nabla$ -parallel distribution ${\mathcal{V}}:S\ni p\mapsto V_{p}$ . Define

{\mathscr{X}}(S\colon{\mathcal{V}}):=\{X\in{\mathscr{X}}(S)\,|\,\forall p\in S,\,X_{p}\in V_{p}\}

(8.2)

and

{{\mathscr{X}}}_{\text{$\nabla$-par}}(S\colon{\mathcal{V}}):=\{X\in{\mathscr{X}}(S\colon{\mathcal{V}})\,|\,\text{$X$ is $\nabla$-parallel}\}.

(8.3)

Then the following conditions are equivalent.

(i)

For every $p\in S$ , there exists a $\nabla$ -autoparallel submanifold $M$ of $S$ satisfying $p\in M$ and $T_{p}(M)=V_{p}$ .
(ii)

The distribution ${\mathcal{V}}$ is involutive in the sense that $\forall X,Y\in{\mathscr{X}}(S\colon{\mathcal{V}}),\;[X,Y]\in{\mathscr{X}}(S\colon{\mathcal{V}})$ .
(iii)

$\forall X,Y\in{{\mathscr{X}}}_{\text{$\nabla$-par}}(S\colon{\mathcal{V}}),\;[X,Y]\in{\mathscr{X}}(S\colon{\mathcal{V}})$ .
(iv)

The torsion ${\mathcal{T}}^{(\nabla)}$ of $\nabla$ satisfies $\forall p\in S,\;{\mathcal{T}}^{(\nabla)}_{p}(V_{p}\times V_{p})\subset V_{p}$ .

Proof (i) is equivalent to the condition that for any point $p\in S$ , there exists an integral manifold of ${\mathcal{V}}$ containing $p$ , and is equivalent to (ii) by the famous Frobenius theorem for integrability.

(ii) $\Rightarrow$ (iii) is obvious, and (iii) $\Rightarrow$ (ii) follows since there exist $n$ linearly independent $\nabla$ -parallel vector fields $\{X_{1},\ldots,X_{n}\}\subset{{\mathscr{X}}}_{\text{$\nabla$-par}}(S\colon{\mathcal{V}})$ , whereby every element of ${\mathscr{X}}(S\colon{\mathcal{V}})$ is expressed as $\sum_{i}f_{i}X_{i}$ by some functions $\{f_{1},\ldots,f_{n}\}\subset{\mathcal{F}}(S)$ , and the bracket of two such elements is an ${\mathcal{F}}(S)$ -linear combination of the fields $X_{i}$ and the brackets $[X_{i},X_{j}]$ .

For any $\nabla$ -parallel vector fields $X$ and $Y$ , we have

{\mathcal{T}}^{(\nabla)}(X,Y):=\nabla_{X}Y-\nabla_{Y}X-[X,Y]=-[X,Y].

(8.4)

Hence (iii) is equivalent to

\forall X,Y\in{{\mathscr{X}}}_{\text{$\nabla$-par}}(S\colon{\mathcal{V}}),\;\;{\mathcal{T}}^{(\nabla)}(X,Y)\in{\mathscr{X}}(S\colon{\mathcal{V}}).

(8.5)

Since ${\mathcal{T}}^{(\nabla)}$ is a tensor field, so that $({\mathcal{T}}^{(\nabla)}(X,Y))_{p}={\mathcal{T}}^{(\nabla)}_{p}(X_{p},Y_{p})$ holds at each point $p$ , (8.5) is equivalent to (iv). $\Box$

Remark 8.2.

Condition (i) in Prop. 8.1 (and hence (ii)-(iv) also) means that there exists a foliation $S=\bigsqcup_{\alpha}M_{\alpha}$ such that each leaf $M_{\alpha}$ is $\nabla$ -autoparallel in $S$ and satisfies $T_{p}(M_{\alpha})=V_{p}$ for every $p\in M_{\alpha}$ .

The following proposition is an immediate consequence of (i) $\Leftrightarrow$ (iv) in Prop. 8.1.

Proposition 8.3.

For a manifold $S$ with a curvature-free connection $\nabla$ , the following conditions are equivalent.

(i)

For every point $p\in S$ and every linear subspace $V$ of $T_{p}(S)$ , there exists a $\nabla$ -autoparallel submanifold $M$ satisfying $p\in M$ and $T_{p}(M)=V$ .
(ii)

$\forall p\in S,\;\forall X_{p},Y_{p}\in T_{p}(S),\;{\mathcal{T}}^{(\nabla)}_{p}(X_{p},Y_{p})\in{\rm span}\{X_{p},Y_{p}\}$ , or equivalently, $\forall X,Y\in{\mathscr{X}}(S),\;{\mathcal{T}}^{(\nabla)}(X,Y)\in{\rm span}_{{\mathcal{F}}(S)}\{X,Y\}:=\{fX+gY\,|\,f,g\in{\mathcal{F}}(S)\}$ .

Let us apply the above considerations to ${\mathcal{S}}={\mathcal{S}}({\mathcal{H}})$ with the SLD structure and its submanifolds. Let ${\mathcal{A}}$ be an arbitrary linear subspace of ${\mathcal{L}}_{{\rm h}}$ , and define for each point $\rho\in{\mathcal{S}}$

\begin{aligned}V_{{\mathcal{A}},\rho}&:=\{X_{\rho}\in T_{\rho}({\mathcal{S}})\,|\,\exists A\in{\mathcal{A}},\;L_{X_{\rho}}=A-\left\langle A\right\rangle_{\rho}\}\\ &=\{X_{\rho}\in T_{\rho}({\mathcal{S}})\,|\,L_{X_{\rho}}\in{\mathcal{A}}+\mathbb{R}\},\end{aligned}

(8.6)

where $\mathbb{R}$ is identified with $\{cI\,|\,c\in\mathbb{R}\}$ . Then ${\mathcal{V}}_{{\mathcal{A}}}:{\mathcal{S}}\ni\rho\mapsto V_{{\mathcal{A}},\rho}$ defines an e-parallel distribution on ${\mathcal{S}}$ , whose dimension $\dim V_{{\mathcal{A}},\rho}$ is equal to $\dim{\mathcal{A}}$ when $I\notin{\mathcal{A}}$ and $\dim{\mathcal{A}}-1$ otherwise. Every e-parallel distribution on ${\mathcal{S}}$ is represented as ${\mathcal{V}}_{{\mathcal{A}}}$ by some ${\mathcal{A}}$ , and ${\mathcal{V}}_{{\mathcal{A}}}={\mathcal{V}}_{{\mathcal{A}}^{\prime}}$ iff ${\mathcal{A}}+\mathbb{R}={\mathcal{A}}^{\prime}+\mathbb{R}$ . This means that ${\mathcal{A}}\mapsto{\mathcal{V}}_{{\mathcal{A}}}$ establishes a one-to-one correspondence between linear subspaces of the quotient space ${\mathcal{L}}_{{\rm h}}/\mathbb{R}$ and e-parallel distributions on ${\mathcal{S}}$ .

Theorem 8.4.

Given a subspace ${\mathcal{A}}\subset{\mathcal{L}}_{{\rm h}}$ , the following conditions are equivalent.

(i)

For every $\rho\in{\mathcal{S}}$ , there exists an e-autoparallel submanifold ${\mathcal{M}}$ of ${\mathcal{S}}$ satisfying $\rho\in{\mathcal{M}}$ and $T_{\rho}({\mathcal{M}})=V_{{\mathcal{A}},\rho}$ .
(ii)

For every $\rho\in{\mathcal{S}}$ ,

\{[[A,B],\rho]\,|\,A,B\in{\mathcal{A}}\}\subset\{\rho\circ C\,|\,C\in{\mathcal{A}}+\mathbb{R}\}.

(8.7)

Proof From (3.26) it follows that for any $X_{\rho},Y_{\rho},Z_{\rho}\in T_{\rho}({\mathcal{S}})$

{\mathcal{T}}_{\rho}^{({\rm e})}(X_{\rho},Y_{\rho})=Z_{\rho}\;\Leftrightarrow\;\frac{1}{4}[[L_{X_{\rho}},L_{Y_{\rho}}],\rho]=\iota_{*}(Z_{\rho})=\rho\circ L_{Z_{\rho}}.

Hence, noting that $[A-\left\langle A\right\rangle_{\rho},B-\left\langle B\right\rangle_{\rho}]=[A,B]$ , we obtain (i) $\Leftrightarrow$ (ii) from (i) $\Leftrightarrow$ (iv) in Prop. 8.1. $\Box$

Let $F^{1},\ldots,F^{n}$ be Hermitian operators on ${\mathcal{H}}$ such that $\forall i,j,\;[F^{i},F^{j}]=0$ and that $\{F^{1},\ldots,F^{n},I\}$ are linearly independent (cf. Remark 3.9), and let ${\mathcal{A}}:={\rm span}\,\{F^{1},\ldots,F^{n}\}\;(\not\ni I)$ . Then for any $\rho\in{\mathcal{S}}$ and any $A,B\in{\mathcal{A}}$ we have $[[A,B],\rho]=0$ , so that (ii) in Theorem 8.4 trivially holds. Hence the distribution ${\mathcal{V}}_{{\mathcal{A}}}$ is integrable, and we obtain a foliation ${\mathcal{S}}=\bigsqcup_{\alpha}{\mathcal{M}}_{\alpha}$ , whose leaves $\{{\mathcal{M}}_{\alpha}\}$ are $n$ -dimensional quasi-exponential families of the form (4.15).

Remark 8.5.

Recall the situation when we defined the quantum Gaussian shift model ${\mathcal{M}}=\{\rho_{\xi}\,|\,\xi=(\xi^{1},\ldots,\xi^{n})\in\mathbb{R}^{n}\}$ in Section 6, and let ${\mathcal{A}}:={\rm span}\,\{R(z_{1}),\ldots,R(z_{n})\}$ . Then, for any $A,B\in{\mathcal{A}}$ , we have $[A,B]=cI$ with a purely imaginary constant $c$ and hence $[[A,B],\rho]=0$ . So, (ii) in Theorem 8.4 holds at least formally, and the Gaussian model may be regarded as an integral manifold of ${\mathcal{V}}_{{\mathcal{A}}}$ . Note, however, that Theorem 8.4 is not valid in the infinite-dimensional case, so that (ii) does not imply (i): various mathematical problems arise that are absent in the finite-dimensional case, such as the fact that a positive operator does not always have finite trace and hence is not always normalizable.

Remark 8.6.

Let us revisit the relationship between autoparallelity and total geodesicness described in Remark 2.2 in the context of Prop. 8.3. Suppose that $(S,\nabla)$ satisfies conditions (i)-(ii) of Prop. 8.3 and that a submanifold $M$ of $S$ is $\nabla$ -totally geodesic. Given a point $p\in M$ arbitrarily, there exists a $\nabla$ -autoparallel submanifold $N$ which satisfies $p\in N$ and $T_{p}(N)=T_{p}(M)$ by condition (i). Since $N$ is also $\nabla$ -totally geodesic, we have $M=N$ , so that $M$ is $\nabla$ -autoparallel. Namely, condition (ii) together with the curvature-freeness of $\nabla$ implies the equivalence between $\nabla$ -autoparallelity and $\nabla$ -total geodesicness. In fact, the curvature-freeness is unnecessary, and their equivalence follows from condition (ii) alone. See A4 of Appendix for details.

At the end of this section, we give an example of an e-autoparallel submanifold that does not fall within the scope of Theorem 8.4. Let ${\mathcal{B}}=\{\left|1\right\rangle,\ldots,\left|d\right\rangle\}$ with $d=\dim{\mathcal{H}}$ be an arbitrary orthonormal basis of ${\mathcal{H}}$ , and let ${\mathcal{L}}^{{\mathcal{B}}}:=\{\sum_{i,j}a_{ij}\left|i\right\rangle\left\langle j\right|\,|\,[a_{ij}]\in\mathbb{R}^{d\times d}\}$ , ${\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}:={\mathcal{L}}_{{\rm h}}\cap{\mathcal{L}}^{{\mathcal{B}}}$ and ${\mathcal{S}}^{{\mathcal{B}}}:={\mathcal{S}}\cap{\mathcal{L}}^{{\mathcal{B}}}$ .

Proposition 8.7.

${\mathcal{S}}^{{\mathcal{B}}}$ is e-autoparallel in ${\mathcal{S}}$ .

Proof It is easy to see that for each $\rho\in{\mathcal{S}}^{{\mathcal{B}}}$

T^{({\rm m})}_{\rho}({\mathcal{S}}^{{\mathcal{B}}}):=\{\iota_{*}(X_{\rho})\,|\,X_{\rho}\in T_{\rho}({\mathcal{S}}^{{\mathcal{B}}})\}=\{A\in{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}\,|\,{\rm Tr}A=0\}

(8.8)

and that

\begin{aligned}T^{({\rm e})}_{\rho}({\mathcal{S}}^{{\mathcal{B}}})&:=\{L_{X_{\rho}}\,|\,X_{\rho}\in T_{\rho}({\mathcal{S}}^{{\mathcal{B}}})\}\\ &=\{A\in{\mathcal{L}}_{{\rm h}}\,|\,\exists B\in T^{({\rm m})}_{\rho}({\mathcal{S}}^{{\mathcal{B}}}),\;B=\rho\circ A\}\\ &=\{A\in{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}\,|\,\left\langle A\right\rangle_{\rho}=0\}.\end{aligned}

(8.9)

(See Remark 3.1 for the symbols $T^{({\rm m})}$ and $T^{({\rm e})}$ .) It follows from (8.9) that, for any $\rho,\sigma\in{\mathcal{S}}^{{\mathcal{B}}}$ , $A\in T^{({\rm e})}_{\rho}({\mathcal{S}}^{{\mathcal{B}}})\;\Leftrightarrow\;A-\left\langle A\right\rangle_{\sigma}\in T^{({\rm e})}_{\sigma}({\mathcal{S}}^{{\mathcal{B}}})$ , and hence we have from (3.30) that $X_{\rho}\in T_{\rho}({\mathcal{S}}^{{\mathcal{B}}})\;\Leftrightarrow\;\Phi_{\rho,\sigma}^{({\rm e})}(X_{\rho})\in T_{\sigma}({\mathcal{S}}^{{\mathcal{B}}})$ . This proves the proposition by Prop. 2.4. $\Box$

Let us examine whether the e-autoparallelity of ${\mathcal{S}}^{{\mathcal{B}}}$ can be understood as an example of Theorem 8.4. Namely, the problem is whether ${\mathcal{S}}^{{\mathcal{B}}}$ is an integral manifold of an e-parallel distribution ${\mathcal{V}}_{{\mathcal{A}}}$ for some ${\mathcal{A}}$ satisfying condition (ii) in Theorem 8.4. For each $\rho\in{\mathcal{S}}^{{\mathcal{B}}}$ , we have $T^{({\rm e})}_{\rho}({\mathcal{S}}^{{\mathcal{B}}})=\{A-\left\langle A\right\rangle_{\rho}\,|\,A\in{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}\}$ (see (8.9)), which means that $T_{\rho}({\mathcal{S}}^{{\mathcal{B}}})=V_{{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}},\rho}$ and that ${\mathcal{S}}^{{\mathcal{B}}}$ is an integral manifold of the distribution ${\mathcal{V}}_{{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}}$ . Noting that ${\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}+\mathbb{R}={\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}$ , the problem comes down to whether

\{[[A,B],\rho]\,|\,A,B\in{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}\}\subset\{\rho\circ C\,|\,C\in{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}\}

(8.10)

holds for every $\rho\in{\mathcal{S}}$ . The answer is no, except when $\dim{\mathcal{H}}=2$ .

Proposition 8.8.

When $\dim{\mathcal{H}}\geq 3$ ,

\exists\rho\in{\mathcal{S}},\;\{[[A,B],\rho]\,|\,A,B\in{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}\}\not\subset\{\rho\circ C\,|\,C\in{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}\}.

(8.11)

As a consequence, the distribution ${\mathcal{V}}_{{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}}$ is not involutive.

Proof We represent operators on a $d$ -dimensional Hilbert space by $d\times d$ matrices, and show that there exist a strictly positive density matrix $\rho$ and real symmetric matrices $A,B$ such that $[[A,B],\rho]$ cannot be represented as $\rho\circ C$ by any real symmetric $C$ when $d\geq 3$ . Let

A_{1}:=\left(\begin{matrix}1&0&0\\ 0&1&0\\ 0&0&0\end{matrix}\right),\;B_{1}:=\left(\begin{matrix}0&1&1\\ 1&0&1\\ 1&1&0\end{matrix}\right),\;P_{1}:=\left(\begin{matrix}1&i{\varepsilon}&i{\varepsilon}\\ -i{\varepsilon}&1&i{\varepsilon}\\ -i{\varepsilon}&-i{\varepsilon}&1\end{matrix}\right),

where $i:=\sqrt{-1}$ and ${\varepsilon}$ is an arbitrary real number, and let $A,B$ and $\rho$ be $d\times d$ matrices with the block representations

A:=\left(\begin{array}{c|c}A_{1}&0\\ \hline 0&0\end{array}\right),\;B:=\left(\begin{array}{c|c}B_{1}&0\\ \hline 0&0\end{array}\right),\;\rho:=\frac{1}{d}\left(\begin{array}{c|c}P_{1}&0\\ \hline 0&I\end{array}\right).

Then $A,B$ are real symmetric, and $\rho$ is Hermitian with trace 1 and strictly positive when $|{\varepsilon}|$ is sufficiently small. A direct calculation shows that

[[A,B],\rho]=\frac{1}{d}\left(\begin{array}{c|c}Q_{1}&0\\ \hline 0&0\end{array}\right)\quad\text{with}\quad Q_{1}:=[[A_{1},B_{1}],P_{1}]=\left(\begin{matrix}0&0&-i{\varepsilon}\\ 0&0&i{\varepsilon}\\ i{\varepsilon}&-i{\varepsilon}&0\end{matrix}\right).

Suppose that a $d\times d$ real symmetric matrix $C$ satisfies $[[A,B],\rho]=\rho\circ C$ . Letting $C_{1}$ be the upper-left $3\times 3$ block of $C$ , the upper-left $3\times 3$ block of $\rho\circ C$ equals $\frac{1}{d}P_{1}\circ C_{1}$ . Hence we have $Q_{1}=P_{1}\circ C_{1}$ , which is rewritten as

i{\varepsilon}\left(\begin{matrix}0&0&-1\\ 0&0&1\\ 1&-1&0\end{matrix}\right)=C_{1}+i{\varepsilon}\left(\begin{matrix}0&1&1\\ -1&0&1\\ -1&-1&0\end{matrix}\right)\circ C_{1}.

Since $C_{1}\in\mathbb{R}^{3\times 3}$ and ${\varepsilon}\in\mathbb{R}$ , comparing the real parts of both sides gives $C_{1}=0$ , and then comparing the imaginary parts gives ${\varepsilon}=0$ . Therefore, if we take ${\varepsilon}\neq 0$ , no real symmetric $C$ satisfies $[[A,B],\rho]=\rho\circ C$ . $\Box$
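The counterexample in the proof can be checked numerically. The following is a minimal sketch, assuming NumPy and the illustrative value ${\varepsilon}=0.1$ ; the helper names `comm` and `jordan` are ours and do not appear in the paper. It works on the $3\times 3$ block only, recomputes $Q_{1}$ , and uses a least-squares fit over real symmetric $C_{1}$ to confirm that $P_{1}\circ C_{1}=Q_{1}$ has no solution.

```python
import numpy as np

def comm(X, Y):
    """Commutator [X, Y] = XY - YX."""
    return X @ Y - Y @ X

def jordan(X, Y):
    """Symmetrized product X∘Y = (XY + YX)/2."""
    return (X @ Y + Y @ X) / 2

eps = 0.1  # illustrative nonzero value (assumption, any small nonzero eps works)
A1 = np.diag([1.0, 1.0, 0.0])
B1 = np.ones((3, 3)) - np.eye(3)
N = np.array([[0, 1, 1], [-1, 0, 1], [-1, -1, 0]], dtype=float)
P1 = np.eye(3) + 1j * eps * N

Q1 = comm(comm(A1, B1), P1)
Q1_expected = 1j * eps * np.array([[0, 0, -1], [0, 0, 1], [1, -1, 0]])

# Basis of the 6-dimensional space of real symmetric 3x3 matrices.
basis = []
for i in range(3):
    for j in range(i, 3):
        E = np.zeros((3, 3))
        E[i, j] = E[j, i] = 1.0
        basis.append(E)

# Real-linear system P1∘C1 = Q1: stack real and imaginary parts of each entry.
cols = [jordan(P1, E).ravel() for E in basis]
M = np.array([np.concatenate([col.real, col.imag]) for col in cols]).T
rhs = np.concatenate([Q1.ravel().real, Q1.ravel().imag])
coef, *_ = np.linalg.lstsq(M, rhs, rcond=None)

residual = np.linalg.norm(M @ coef - rhs)   # stays away from 0: no exact solution
min_eig = np.linalg.eigvalsh(P1).min()      # positive: P1 is strictly positive
```

A residual bounded away from zero for ${\varepsilon}\neq 0$ is the numerical counterpart of the conclusion $C_{1}=0$ , ${\varepsilon}=0$ in the proof.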

The above result implies that the e-parallel distribution ${\mathcal{V}}_{{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}}$ does not induce a foliation with e-autoparallel leaves and that ${\mathcal{S}}^{{\mathcal{B}}}$ is an isolated integral manifold of ${\mathcal{V}}_{{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}}$ when $\dim{\mathcal{H}}\geq 3$ . The exceptional case $\dim{\mathcal{H}}=2$ will be discussed in the next section.

Remark 8.9.

Prop. 8.7 holds for a wide class of information geometric structures, not limited to the SLD structure. In fact, the proof of Prop. 8.7 given above relies only upon the fact that if $\rho\in{\mathcal{S}}^{{\mathcal{B}}}$ and $X_{\rho}\in T_{\rho}({\mathcal{S}}^{{\mathcal{B}}})$ , then $L_{X_{\rho}}\in{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}$ . Due to (3.8) stating that $\iota_{*}(X_{\rho})=\Omega_{\rho}(L_{X_{\rho}})$ , this fact is shared by the e-connection defined from an arbitrary family of inner products $\left\langle\cdot,\cdot\right\rangle_{\rho}=\left\langle\cdot,\Omega_{\rho}(\cdot)\right\rangle_{\rm HS}$ , $\rho\in{\mathcal{S}}$ , such that

\forall\rho\in{\mathcal{S}}^{{\mathcal{B}}},\;\Omega_{\rho}({\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}})={\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}.

(8.12)

This means that Prop. 8.7 holds under this mild condition on $\{\Omega_{\rho}\}_{\rho\in{\mathcal{S}}}$ . In particular, if $\Omega_{\rho}$ is represented in the form (3.21) by a function $f:(0,\infty)\rightarrow(0,\infty)$ such that $\forall x>0,\,xf(1/x)=f(x)$ and $f(1)=1$ as in the case of monotone metrics, condition (8.12) is satisfied. To verify this, we represent (3.21) as $\Omega_{\rho}=f(\Delta_{\rho}){\mathcal{R}}_{\rho}$ , where ${\mathcal{R}}_{\rho}:A\mapsto A\rho$ , and consider $\Omega_{\rho}$ as a $\mathbb{C}$ -linear map ${\mathcal{L}}\rightarrow{\mathcal{L}}$ . Then it is easy to see that if $\rho\in{\mathcal{S}}^{{\mathcal{B}}}$ , then ${\mathcal{R}}_{\rho}({\mathcal{L}}^{{\mathcal{B}}})={\mathcal{L}}^{{\mathcal{B}}}$ and $\Delta_{\rho}({\mathcal{L}}^{{\mathcal{B}}})={\mathcal{L}}^{{\mathcal{B}}}$ , which yields $f(\Delta_{\rho})({\mathcal{L}}^{{\mathcal{B}}})={\mathcal{L}}^{{\mathcal{B}}}$ , and hence we have $\Omega_{\rho}({\mathcal{L}}^{{\mathcal{B}}})={\mathcal{L}}^{{\mathcal{B}}}$ . Combined with $\Omega_{\rho}({\mathcal{L}}_{{\rm h}})={\mathcal{L}}_{{\rm h}}$ , this proves (8.12).

Remark 8.10.

Since (8.8) shows that $T^{({\rm m})}_{\rho}({\mathcal{S}}^{{\mathcal{B}}})=\iota_{*}(T_{\rho}({\mathcal{S}}^{{\mathcal{B}}}))$ does not depend on $\rho$ , ${\mathcal{S}}^{{\mathcal{B}}}$ is m-autoparallel in ${\mathcal{S}}$ , so that ${\mathcal{S}}^{{\mathcal{B}}}$ is doubly autoparallel (e.g., [10]) w.r.t. the e, m-connections. This example exhibits a remarkable contrast to the following fact for the classical case [11]: if a submanifold ${\mathcal{M}}$ of ${\mathcal{P}}(\Omega)$ , where $\Omega$ is an arbitrary finite set, is doubly autoparallel in ${\mathcal{P}}(\Omega)$ w.r.t. the e, m-connections, then ${\mathcal{M}}$ is statistically isomorphic to ${\mathcal{P}}(\Omega^{\prime})$ for some finite set $\Omega^{\prime}$ .

9 Qubit manifolds

Throughout this section, we assume ${\mathcal{H}}$ to be 2-dimensional. To begin with, we make some preparations. Let $\{\sigma_{1},\sigma_{2},\sigma_{3}\}\subset{\mathcal{L}}_{{\rm h}}$ be a triple of Pauli operators such that

{\rm Tr}\,\sigma_{i}=0,\quad\sigma_{i}^{2}=I,\quad\text{and}\quad\sigma_{i}\sigma_{i+1}=\sqrt{-1}\,\sigma_{i+2}\quad(i:{\rm mod}\ 3).

(9.1)

Then $\{\sigma_{1},\sigma_{2},\sigma_{3}\}$ form a basis of ${{\mathcal{L}}_{{\rm h}}}_{,0}$ . For any $\vec{a}=(a_{i})\in\mathbb{R}^{3}$ , we write $\vec{a}\cdot\vec{\sigma}:=\sum_{i}a_{i}\sigma_{i}$ , so that we have

{{\mathcal{L}}_{{\rm h}}}_{,0}=\{\vec{a}\cdot\vec{\sigma}\,|\,\vec{a}\in\mathbb{R}^{3}\}.

(9.2)

It follows that

(\vec{a}\cdot\vec{\sigma})(\vec{b}\cdot\vec{\sigma})=(\vec{a}\cdot\vec{b})I+\sqrt{-1}\,(\vec{a}\times\vec{b})\cdot\vec{\sigma},

(9.3)

(\vec{a}\cdot\vec{\sigma})\circ(\vec{b}\cdot\vec{\sigma})=(\vec{a}\cdot\vec{b})I,

(9.4)

[\vec{a}\cdot\vec{\sigma},\vec{b}\cdot\vec{\sigma}]=2\sqrt{-1}\,(\vec{a}\times\vec{b})\cdot\vec{\sigma},

(9.5)

where $\vec{a}\cdot\vec{b}=\sum_{i}a_{i}b_{i}$ , and $\vec{a}\times\vec{b}=\vec{c}$ $\Leftrightarrow$ $\forall i:{\rm mod}\ 3$ , $a_{i}b_{i+1}-a_{i+1}b_{i}=c_{i+2}$ . The manifold ${\mathcal{S}}={\mathcal{S}}({\mathcal{H}})$ is represented as

{\mathcal{S}}=\{\rho_{\vec{r}}\,|\,\vec{r}\in{\mathcal{R}}\},

(9.6)

where

\rho_{\vec{r}}:=\frac{1}{2}(I+\vec{r}\cdot\vec{\sigma}),\quad{\mathcal{R}}:=\{\vec{r}\in\mathbb{R}^{3}\,|\,\|\vec{r}\|:=\sqrt{\vec{r}\cdot\vec{r}}<1\}.

(9.7)

For $\rho=\rho_{\vec{r}}$ and $A=a_{0}I+\vec{a}\cdot\vec{\sigma}$ , we have $\left\langle A\right\rangle_{\rho}=a_{0}+\vec{r}\cdot\vec{a}$ .
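The identities (9.3)-(9.5) and the expectation formula $\left\langle A\right\rangle_{\rho}=a_{0}+\vec{r}\cdot\vec{a}$ can be spot-checked numerically. The sketch below assumes NumPy and the standard matrix representation of the Pauli operators, which is one choice satisfying (9.1); the helper `dot_sigma` and the random test vectors are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
# Standard Pauli matrices (one representation satisfying (9.1)).
sigma = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)

def dot_sigma(a):
    """Return a·σ = sum_i a_i σ_i for a real 3-vector a."""
    return np.einsum("i,ijk->jk", np.asarray(a, dtype=float), sigma)

rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=3)
r = rng.normal(size=3)
r *= 0.7 / np.linalg.norm(r)   # a point inside the open unit ball
a0 = rng.normal()

Sa, Sb = dot_sigma(a), dot_sigma(b)
# (9.3): product, (9.4): symmetrized product, (9.5): commutator.
prod_ok = np.allclose(Sa @ Sb, np.dot(a, b) * I2 + 1j * dot_sigma(np.cross(a, b)))
jordan_ok = np.allclose((Sa @ Sb + Sb @ Sa) / 2, np.dot(a, b) * I2)
comm_ok = np.allclose(Sa @ Sb - Sb @ Sa, 2j * dot_sigma(np.cross(a, b)))

# Expectation of A = a0 I + a·σ in the state rho_r.
rho = (I2 + dot_sigma(r)) / 2
A = a0 * I2 + Sa
expect_ok = np.isclose(np.trace(rho @ A).real, a0 + np.dot(r, a))
```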

A tangent vector $X_{\rho}\in T_{\rho}({\mathcal{S}})$ at $\rho=\rho_{\vec{r}}$ is represented by a 3-dimensional vector $\vec{x}\in\mathbb{R}^{3}$ such that

\iota_{*}(X_{\rho})=\frac{1}{2}\vec{x}\cdot\vec{\sigma}.

(9.8)

The SLD of $X_{\rho}$ is then represented as

L_{X_{\rho}}=\ell_{\vec{r}}(\vec{x})\cdot\vec{\sigma}-\lambda_{\vec{r}}(\vec{x})I,

(9.9)

where

\lambda_{\vec{r}}(\vec{x}):=\frac{\vec{x}\cdot\vec{r}}{1-\|\vec{r}\|^{2}}\quad\text{and}\quad\ell_{\vec{r}}(\vec{x}):=\vec{x}+\lambda_{\vec{r}}(\vec{x})\,\vec{r}.

(9.10)

In fact, (9.9) is verified as follows: noting that (9.10) yields $\vec{r}\cdot\ell_{\vec{r}}(\vec{x})=\lambda_{\vec{r}}(\vec{x})$ , we have

\begin{aligned}\rho\circ L_{X_{\rho}}&=\frac{1}{2}(I+\vec{r}\cdot\vec{\sigma})\circ(\ell_{\vec{r}}(\vec{x})\cdot\vec{\sigma}-\lambda_{\vec{r}}(\vec{x})I)\\ &=\frac{1}{2}(\ell_{\vec{r}}(\vec{x})-\lambda_{\vec{r}}(\vec{x})\,\vec{r})\cdot\vec{\sigma}+\frac{1}{2}((\vec{r}\cdot\ell_{\vec{r}}(\vec{x}))-\lambda_{\vec{r}}(\vec{x}))I\\ &=\frac{1}{2}\vec{x}\cdot\vec{\sigma}=\iota_{*}(X_{\rho}).\end{aligned}

(9.11)
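As a sanity check of (9.9)-(9.11), the SLD can be built from the Bloch-vector formulas and substituted into $\rho\circ L_{X_{\rho}}$ . This is a minimal sketch assuming NumPy and the standard Pauli matrices; the function name `sld` and the sample values of $\vec{r}$ and $\vec{x}$ are illustrative choices, not taken from the paper.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sigma = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)

def dot_sigma(a):
    """Return a·σ for a real 3-vector a."""
    return np.einsum("i,ijk->jk", np.asarray(a, dtype=float), sigma)

def sld(r, x):
    """SLD of the tangent vector x at rho_r, following (9.9) and (9.10)."""
    lam = np.dot(x, r) / (1.0 - np.dot(r, r))
    ell = x + lam * r
    return dot_sigma(ell) - lam * I2

r = np.array([0.3, -0.2, 0.5])   # sample state, ||r|| < 1
x = np.array([0.7, 0.1, -0.4])   # sample tangent vector
rho = (I2 + dot_sigma(r)) / 2
L = sld(r, x)

lhs = (rho @ L + L @ rho) / 2    # rho ∘ L
rhs = dot_sigma(x) / 2           # iota_*(X_rho) from (9.8)
```

The same run also shows that `L` is Hermitian and has zero expectation under `rho`, as an SLD should.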

Let us investigate the e-autoparallel submanifolds of ${\mathcal{S}}$ . We first consider the 1-dimensional case, i.e., the e-geodesics. We recall that the general form of e-geodesic is given by (4.18). Treating the coordinate $\theta$ as a parameter to specify states and choosing $P$ in (4.18) to be a state $\rho_{0}$ , an arbitrary e-geodesic ${\mathcal{M}}$ is represented as the trajectory ${\mathcal{M}}=\{\rho_{\theta}\,|\,\theta\in\mathbb{R}\}$ of

\rho_{\theta}=\frac{1}{Z_{\theta}}\exp\Bigl{(}\frac{\theta}{2}F\Bigr{)}\,\rho_{0}\exp\Bigl{(}\frac{\theta}{2}F\Bigr{)},\quad Z_{\theta}:={\rm Tr}(\rho_{0}\exp(\theta F)),

(9.12)

where $F$ is a Hermitian operator such that $\{F,I\}$ are linearly independent. Since the transformation $F\rightarrow aF+b$ by $a,b\in\mathbb{R}$ , $a\neq 0$ , together with $\theta\rightarrow\frac{1}{a}\theta$ and $\psi\rightarrow\psi+\frac{b}{a}\theta$ , keeps ${\mathcal{M}}$ invariant, we can assume that $F$ is represented as $F=\vec{u}\cdot\vec{\sigma}$ by a unit vector $\vec{u}$ .

Proposition 9.1.

Let $\rho_{0}=\rho_{\vec{r}_{0}}$ and $F=\vec{u}\cdot\vec{\sigma}$ with $\|\vec{u}\|=1$ in (9.12). Letting $\vec{v}$ be a unit vector such that $\vec{u}\cdot\vec{v}=0$ and that $\vec{r}_{0}\in{\rm span}\{\vec{u},\vec{v}\}$ , the e-geodesic ${\mathcal{M}}=\{\rho_{\theta}\,|\,\theta\in\mathbb{R}\}$ is represented as

{\mathcal{M}}=\{\rho_{\vec{r}}\,|\,\vec{r}\in{\mathcal{Q}}\}\quad\text{with}\quad{\mathcal{Q}}:=\{\vec{r}(\xi)\,|\,-1<\xi<1\},

(9.13)

where

\vec{r}(\xi):=\xi\,\vec{u}+c\sqrt{1-\xi^{2}}\,\vec{v},

(9.14)

c:=\frac{b}{\sqrt{1-a^{2}}},\quad a:=\vec{r}_{0}\cdot\vec{u},\quad b:=\vec{r}_{0}\cdot\vec{v}.

(9.15)

(Here $\xi^{2}$ denotes the square of $\xi$ , while the same symbol will appear as the second component of $\xi=(\xi^{i})$ later.) The parameter $\xi$ is m-affine as a coordinate system of ${\mathcal{M}}$ and in one-to-one correspondence with the e-affine parameter $\theta$ by

\xi=\frac{(1+a)e^{2\theta}-(1-a)}{(1+a)e^{2\theta}+(1-a)}\quad\text{and}\quad\theta=\frac{1}{2}\log\frac{(1-a)(1+\xi)}{(1+a)(1-\xi)}.

(9.16)

Proof Noting that $F=\vec{u}\cdot\vec{\sigma}$ is represented as

F=\rho_{\vec{u}}-\rho_{-\vec{u}}=1\rho_{\vec{u}}+(-1)\rho_{-\vec{u}}

and that this is the spectral decomposition of $F$ with projectors $\{\rho_{\vec{u}},\rho_{-\vec{u}}\}$ , we have

\exp\Bigl{(}\frac{\theta}{2}F\Bigr{)}=e^{\theta/2}\rho_{\vec{u}}+e^{-\theta/2}\rho_{-\vec{u}}=\cosh(\theta/2)I+\sinh(\theta/2)\vec{u}\cdot\vec{\sigma}.

Using this expression and representing $\vec{r}_{0}$ as $\vec{r}_{0}=a\vec{u}+b\vec{v}$ by $a:=\vec{r}_{0}\cdot\vec{u}$ and $b:=\vec{r}_{0}\cdot\vec{v}$ , a direct calculation shows that

\displaystyle\exp\Bigl{(}\frac{\theta}{2}F\Bigr{)}\,\rho_{\vec{r}_{0}}\exp\Bigl{(}\frac{\theta}{2}F\Bigr{)}=\frac{Z_{\theta}}{2}\,I+\frac{1}{2}\bigl{\{}(a\cosh\theta+\sinh\theta)\,\vec{u}+b\,\vec{v}\bigr{\}}\cdot\vec{\sigma}

and $Z_{\theta}=\cosh\theta+a\sinh\theta,$ which yields

\rho_{\theta}=\frac{1}{2}(I+\vec{s}(\theta)\cdot\vec{\sigma}),

where

\vec{s}(\theta):=\frac{a\cosh\theta+\sinh\theta}{\cosh\theta+a\sinh\theta}\,\vec{u}+\frac{b}{\cosh\theta+a\sinh\theta}\,\vec{v}.

If we define $\xi$ from $\theta$ by (9.16), we have

\frac{a\cosh\theta+\sinh\theta}{\cosh\theta+a\sinh\theta}=\xi\quad\text{and}\quad\frac{b}{\cosh\theta+a\sinh\theta}=c\sqrt{1-\xi^{2}},

so that $\vec{s}(\theta)=\vec{r}(\xi)$ . It is easy to see that the range of $\xi$ is $(-1,1)$ , and we obtain (9.13). In addition, since

\left\langle F\right\rangle_{\rho_{\vec{r}(\xi)}}=\vec{r}(\xi)\cdot\vec{u}=\xi,

the parameter $\xi$ is m-affine. $\Box$
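The statement of Prop. 9.1 can be illustrated numerically by evaluating (9.12) with a matrix exponential and comparing the resulting Bloch vector with $\vec{r}(\xi)$ from (9.14)-(9.16). The sketch below assumes NumPy and the standard Pauli matrices; the particular choice of $\vec{u}$ , $\vec{v}$ , $a$ , $b$ and all function names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sigma = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)

def dot_sigma(a):
    return np.einsum("i,ijk->jk", np.asarray(a, dtype=float), sigma)

def expm_herm(H):
    """Exponential of a Hermitian matrix via its spectral decomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

u = np.array([0.0, 0.0, 1.0])
v = np.array([1.0, 0.0, 0.0])
a, b = 0.3, 0.4                      # r0 = a u + b v lies in the unit ball
c = b / np.sqrt(1 - a**2)            # (9.15)
F = dot_sigma(u)
rho0 = (I2 + dot_sigma(a * u + b * v)) / 2

def bloch_of_rho_theta(theta):
    """Bloch vector of rho_theta defined by (9.12)."""
    E = expm_herm(0.5 * theta * F)
    M = E @ rho0 @ E
    rho = M / np.trace(M).real
    return np.array([np.trace(rho @ s).real for s in sigma])

def xi_of_theta(theta):              # first formula in (9.16)
    e2 = np.exp(2 * theta)
    return ((1 + a) * e2 - (1 - a)) / ((1 + a) * e2 + (1 - a))

def theta_of_xi(xi):                 # second formula in (9.16)
    return 0.5 * np.log((1 - a) * (1 + xi) / ((1 + a) * (1 - xi)))

def r_of_xi(xi):                     # (9.14)
    return xi * u + c * np.sqrt(1 - xi**2) * v

thetas = np.linspace(-2.0, 2.0, 9)
max_err = max(np.linalg.norm(bloch_of_rho_theta(t) - r_of_xi(xi_of_theta(t)))
              for t in thetas)
roundtrip_err = max(abs(theta_of_xi(xi_of_theta(t)) - t) for t in thetas)
```

Both errors should be at the level of floating-point round-off, reflecting $\vec{s}(\theta)=\vec{r}(\xi)$ and the mutual inverse relation in (9.16).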

Note that ${\mathcal{Q}}$ in the above proposition forms a semi-ellipse in the open unit ball ${\mathcal{R}}$ obtained by cutting an ellipse in half along the major axis; see Fig. 1. In the special case of $c=0$ , the semi-ellipse degenerates into a straight line segment.

Figure 1: The semi-ellipse representing an e-geodesic

Next, let us proceed to considering the 2-dimensional case. In searching for 2-dimensional e-autoparallel submanifolds, the previously obtained knowledge of e-geodesics provides an important clue. If a 2-dimensional submanifold ${\mathcal{M}}=\{\rho_{\vec{r}}\,|\,\vec{r}\in{\mathcal{Q}}\}$ is e-autoparallel, it must be e-totally geodesic, and hence the surface ${\mathcal{Q}}$ should be a union of semi-ellipses. The following proposition claims that a 2-dimensional e-autoparallel submanifold is obtained as a semi-ellipsoid formed by rotating a semi-ellipse representing an e-geodesic around its minor axis.

Proposition 9.2.

Given an orthonormal basis $\{\vec{u}_{1},\vec{u}_{2},\vec{v}\}$ of $\mathbb{R}^{3}$ and a real constant $c$ satisfying $|c|<1$ , let

{\mathcal{Q}}:=\{\vec{r}(\xi)\,|\,\xi=(\xi^{1},\xi^{2})\in\mathbb{R}^{2},\;(\xi^{1})^{2}+(\xi^{2})^{2}<1\},

(9.17)

where

\vec{r}(\xi):=\xi^{1}\vec{u}_{1}+\xi^{2}\vec{u}_{2}+c\sqrt{1-(\xi^{1})^{2}-(\xi^{2})^{2}}\,\vec{v}.

(9.18)

Then ${\mathcal{M}}:=\{\rho_{\vec{r}}\,|\,\vec{r}\in{\mathcal{Q}}\}$ is e-autoparallel in ${\mathcal{S}}$ , and the parameter $\xi=(\xi^{1},\xi^{2})$ is m-affine as a coordinate system of ${\mathcal{M}}$ . More specifically, letting $F^{i}:=\vec{u}_{i}\cdot\vec{\sigma}$ , ${\mathcal{A}}:={\rm span}\{F^{1},F^{2}\}$ , and ${\mathcal{V}}_{{\mathcal{A}}}:{\mathcal{S}}\ni\rho\mapsto V_{{\mathcal{A}},\rho}$ be the e-parallel distribution defined from ${\mathcal{A}}$ by (8.6), ${\mathcal{M}}$ is an integral manifold of ${\mathcal{V}}_{{\mathcal{A}}}$ and $\xi^{i}=\left\langle F^{i}\right\rangle_{\rho_{\vec{r}(\xi)}}$ .

Proof For $i\in\{1,2\}$ , let

\vec{x}_{i}:=\partial_{i}\vec{r}(\xi)=\vec{u}_{i}-\frac{c\,\xi^{i}}{\alpha(\xi)}\,\vec{v},

where $\partial_{i}:=\frac{\partial}{\partial\xi^{i}}$ and $\alpha(\xi):=\sqrt{1-(\xi^{1})^{2}-(\xi^{2})^{2}}$ . Noting that

\|\vec{r}(\xi)\|^{2}=1-(1-c^{2})\alpha(\xi)^{2}\quad\text{and}\quad\vec{x}_{i}\cdot\vec{r}(\xi)=(1-c^{2})\xi^{i},

we have

\begin{aligned}\ell_{\vec{r}(\xi)}(\vec{x}_{i})&=\vec{x}_{i}+\frac{\xi^{i}}{\alpha(\xi)^{2}}\,\vec{r}(\xi)\\ &=\vec{u}_{i}+\frac{\xi^{i}\xi^{1}}{\alpha(\xi)^{2}}\,\vec{u}_{1}+\frac{\xi^{i}\xi^{2}}{\alpha(\xi)^{2}}\,\vec{u}_{2}\;\in{\rm span}\,\{\vec{u}_{1},\vec{u}_{2}\},\end{aligned}

where the terms proportional to $\vec{v}$ included in $\vec{x}_{i}$ and $\vec{r}(\xi)$ cancel, yielding the last line. Owing to (9.9) this implies that the SLDs satisfy $L_{i,\xi}\in{\rm span}\,\{F^{1},F^{2}\}\oplus\mathbb{R}$ for $i\in\{1,2\}$ , which means that the first condition in (3.33) is satisfied. The second condition is also satisfied since $\left\langle F^{i}\right\rangle_{\rho_{\vec{r}(\xi)}}=\vec{u}_{i}\cdot\vec{r}(\xi)=\xi^{i}$ . Thus the claim of the proposition follows from Prop. 3.8. $\Box$
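The cancellation of the $\vec{v}$ -components in the proof can be confirmed with a short numerical experiment. The sketch below assumes NumPy; the orthonormal basis is obtained from a QR factorization of an arbitrary invertible matrix, and the values of $c$ and $\xi$ are illustrative.

```python
import numpy as np

# An orthonormal basis {u1, u2, v} of R^3 from the QR factorization
# of an arbitrary invertible matrix (illustrative choice).
Q, _ = np.linalg.qr(np.array([[1.0, 2.0, 0.0],
                              [0.0, 1.0, 1.0],
                              [1.0, 0.0, 3.0]]))
u1, u2, v = Q[:, 0], Q[:, 1], Q[:, 2]
us = (u1, u2)
c = 0.6                                   # any constant with |c| < 1

def alpha(xi):
    return np.sqrt(1.0 - xi @ xi)

def r_of(xi):
    """Point of the semi-ellipsoid, eq. (9.18)."""
    return xi[0] * u1 + xi[1] * u2 + c * alpha(xi) * v

def ell(r, x):
    """The map ell_r(x) from (9.10)."""
    lam = (x @ r) / (1.0 - r @ r)
    return x + lam * r

xi = np.array([0.3, -0.5])
r = r_of(xi)

# Tangent vectors x_i = d r / d xi^i and their images under ell_r.
x = [us[i] - c * xi[i] / alpha(xi) * v for i in range(2)]
ells = [ell(r, x[i]) for i in range(2)]

# Closed form from the proof, lying in span{u1, u2}.
closed = [us[i]
          + xi[i] * xi[0] / alpha(xi) ** 2 * u1
          + xi[i] * xi[1] / alpha(xi) ** 2 * u2
          for i in range(2)]

v_parts = [float(l @ v) for l in ells]          # should vanish
m_affine = [float(us[i] @ r) for i in range(2)]  # should reproduce xi
```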

As can be seen from naive geometric intuition, for any point $\vec{r}$ in ${\mathcal{R}}$ and any plane $P=\vec{r}+V$ containing $\vec{r}$ , where $V$ is a 2-dimensional linear subspace of $\mathbb{R}^{3}$ , there always exist an orthonormal basis $\{\vec{u}_{1},\vec{u}_{2},\vec{v}\}$ and a constant $c\in(-1,1)$ such that the semi-ellipsoid ${\mathcal{Q}}$ defined from them by (9.17) and (9.18) contains $\vec{r}$ and has $P$ as the tangent plane at $\vec{r}$ . In fact, such $\{\vec{u}_{1},\vec{u}_{2},\vec{v}\}$ and $c$ are obtained as follows: take an orthonormal basis $\{\vec{u}_{1},\vec{u}_{2},\vec{v}\}$ so that $\{\vec{u}_{1},\vec{u}_{2}\}\subset\ell_{\vec{r}}(V)=\{\ell_{\vec{r}}(\vec{x})\,|\,\vec{x}\in V\}$ , and then let $\beta^{2}:=(\vec{r}\cdot\vec{u}_{1})^{2}+(\vec{r}\cdot\vec{u}_{2})^{2}$ (i.e. the squared norm of the orthogonal projection of $\vec{r}$ onto $\ell_{\vec{r}}(V)$ ), $\gamma:=\vec{r}\cdot\vec{v}$ , and $c:=\gamma/\sqrt{1-\beta^{2}}$ . Since $\dim{\mathcal{S}}=3$ , this fact means that $({\mathcal{S}},\nabla^{(e)})$ satisfies condition (i) of Prop. 8.3, and necessarily satisfies condition (ii) as well. Invoking (3.26), condition (ii) is expressed as follows.

Proposition 9.3.

When $\dim{\mathcal{H}}=2$ , for any $\rho\in{\mathcal{S}}$ and any $A,B\in{\mathcal{L}}_{{\rm h}}$ satisfying $\left\langle A\right\rangle_{\rho}=\left\langle B\right\rangle_{\rho}=0$ we have

[[A,B],\rho]\in{\rm span}\,\{\rho\circ A,\rho\circ B\}.

(9.19)

This proposition can also be proved directly by the use of the following lemma, whose proof is given in A5 of Appendix.

Lemma 9.4.

When $\dim{\mathcal{H}}=2$ , for any $\rho\in{\mathcal{S}}$ and any $A,B\in{\mathcal{L}}_{{\rm h}}$ , we have

\begin{aligned}\frac{1}{2}[[A,B],\rho]={}&({\rm Tr}A-2\left\langle A\right\rangle_{\rho})(\rho\circ B)-({\rm Tr}B-2\left\langle B\right\rangle_{\rho})(\rho\circ A)\\ &+\bigl\{({\rm Tr}B)\left\langle A\right\rangle_{\rho}-({\rm Tr}A)\left\langle B\right\rangle_{\rho}\bigr\}\,\rho.\end{aligned}

(9.20)

Letting $\left\langle A\right\rangle_{\rho}=\left\langle B\right\rangle_{\rho}=0$ in (9.20), we obtain

[[A,B],\rho]=2({\rm Tr}A)\,(\rho\circ B)-2({\rm Tr}B)\,(\rho\circ A),

(9.21)

which proves Prop. 9.3.
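Before turning to the proof in the Appendix, identity (9.20) and its special case (9.21) can be tested on random inputs. The following is a minimal sketch assuming NumPy; the random generators `random_hermitian` and `random_state` are illustrative helpers and not part of the paper.

```python
import numpy as np

def comm(X, Y):
    return X @ Y - Y @ X

def jordan(X, Y):
    return (X @ Y + Y @ X) / 2

rng = np.random.default_rng(1)

def random_hermitian():
    G = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    return (G + G.conj().T) / 2

def random_state():
    """A strictly positive 2x2 density matrix."""
    G = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    P = G @ G.conj().T + 0.1 * np.eye(2)
    return P / np.trace(P).real

def expect(X, rho):
    return np.trace(rho @ X).real

A, B, rho = random_hermitian(), random_hermitian(), random_state()
trA, trB = np.trace(A).real, np.trace(B).real
eA, eB = expect(A, rho), expect(B, rho)

# Both sides of (9.20).
lhs = 0.5 * comm(comm(A, B), rho)
rhs = ((trA - 2 * eA) * jordan(rho, B)
       - (trB - 2 * eB) * jordan(rho, A)
       + (trB * eA - trA * eB) * rho)

# (9.21): shift A and B so that their expectations vanish.
A0, B0 = A - eA * np.eye(2), B - eB * np.eye(2)
lhs0 = comm(comm(A0, B0), rho)
rhs0 = (2 * np.trace(A0).real * jordan(rho, B0)
        - 2 * np.trace(B0).real * jordan(rho, A0))
```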

The following proposition, which follows immediately from Prop. 9.3, presents a remarkable contrast to Prop. 8.8 for the case $\dim{\mathcal{H}}\geq 3$ .

Proposition 9.5.

When $\dim{\mathcal{H}}=2$ , for any orthonormal basis ${\mathcal{B}}$ of ${\mathcal{H}}$ it holds that

\forall\rho\in{\mathcal{S}},\;\{[[A,B],\rho]\,|\,A,B\in{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}\}\subset\{\rho\circ C\,|\,C\in{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}\}.

(9.22)

Proof Obvious from Prop. 9.3 since ${\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}$ is an $\mathbb{R}$ -linear space with $I\in{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}$ . $\Box$

Thus, the distribution ${\mathcal{V}}_{{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}}$ is involutive, and induces a foliation ${\mathcal{S}}=\bigsqcup_{\alpha}{\mathcal{M}}_{\alpha}$ whose leaves $\{{\mathcal{M}}_{\alpha}\}$ are 2-dimensional e-autoparallel submanifolds that are integral manifolds of ${\mathcal{V}}_{{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}}$ . Furthermore, we can see from the following lemma that every $2$ -dimensional e-autoparallel submanifold of ${\mathcal{S}}$ is an integral manifold of ${\mathcal{V}}_{{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}}$ for some ${\mathcal{B}}$ .

Lemma 9.6.

When $\dim{\mathcal{H}}=2$ , for any $A,B\in{\mathcal{L}}_{{\rm h}}$ there exists an orthonormal basis ${\mathcal{B}}$ such that $\{A,B\}\subset{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}$ .

Proof Let $\{\left|1\right\rangle,\left|2\right\rangle\}$ be an orthonormal basis that diagonalizes $A$ , and choose $\beta\in\mathbb{C}$ so that $|\beta|=1$ and $\beta\langle 1|B|2\rangle\in\mathbb{R}$ . Then ${\mathcal{B}}:=\{\left|1\right\rangle,\beta\left|2\right\rangle\}$ satisfies the desired condition. $\Box$
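The construction in this proof is easy to carry out explicitly. The sketch below, assuming NumPy, diagonalizes a random Hermitian $A$ , rotates the phase of the second eigenvector, and checks that both $A$ and $B$ become real symmetric in the resulting basis; the variable names and the random inputs are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_hermitian():
    G = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    return (G + G.conj().T) / 2

A, B = random_hermitian(), random_hermitian()

# Columns of V form an orthonormal eigenbasis {|1>, |2>} of A.
_, V = np.linalg.eigh(A)
b12 = V[:, 0].conj() @ B @ V[:, 1]          # matrix element <1|B|2>

# Phase beta with |beta| = 1 such that beta * <1|B|2> is real.
beta = 1.0 if abs(b12) < 1e-14 else np.conj(b12) / abs(b12)

# New basis {|1>, beta|2>}, stored as columns of U.
U = V.astype(complex)
U[:, 1] = beta * U[:, 1]

# Matrix representations of A and B w.r.t. the new basis.
A_new = U.conj().T @ A @ U
B_new = U.conj().T @ B @ U
```

In this basis `A_new` is diagonal and `B_new` has real off-diagonal entries, so both belong to ${\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}$ .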

10 Concluding remarks

In this paper we studied the autoparallelity w.r.t. the e-connection for an information-geometric structure induced on ${\mathcal{S}}({\mathcal{H}})$ . In particular, we focused on the e-autoparallelity for the SLD structure, for which two different estimation-theoretical characterizations were given. We also investigated the existence conditions for e-autoparallel submanifolds by way of the involutivity of e-parallel distributions and its relation to the torsion tensor. As a result, a specialty of the qubit case was revealed.

Since the obtained estimation-theoretical characterizations of the e-autoparallelity are complete in themselves, we do not see at this time what kind of development lies ahead. It is expected that the future development of quantum estimation theory and related fields may reveal new directions. The classical exponential family has a variety of important properties besides the existence of an efficient estimator, some of which may provide new material for characterizing certain geometric notions.

For the autoparallelity w.r.t. non-flat connections, our understanding is still very limited. For example, we do not yet have the whole picture about e-autoparallel submanifolds of ${\mathcal{S}}({\mathcal{H}})$ when $\dim{\mathcal{H}}\geq 3$ . We look forward to further research on this topic in information geometry and/or general differential geometry.

It may also be a challenging problem to develop the infinite-dimensional quantum information geometry so that Theorem 5.1 is extended to the case when $\dim{\mathcal{H}}=\infty$ and that the naive geometric consideration on the quantum Gaussian shift model presented in Section 6 is mathematically justified.

Geometry of quantum statistical manifolds in an asymptotic framework would also be an important subject to be addressed. For example, consider a sequence ${\mathcal{M}}^{(n)}=\{\rho_{\xi}^{\otimes n}\}$ , $n=1,2,\dots$ , of i.i.d. extensions of a quantum statistical model ${\mathcal{M}}=\{\rho_{\xi}\}$ . Recent progress in asymptotic quantum statistics has revealed that the sequence exhibits a desirable property called quantum local asymptotic normality, which tells us that in a shrinking ( $\sim 1/\sqrt{n}$ ) neighbourhood of a given point $\xi_{0}$ , the sequence converges to a quantum Gaussian shift model [12, 13, 14, 15, 16]. As pointed out in Section 6, the limiting quantum Gaussian shift model has a characteristic feature in view of quantum information geometry. It would, therefore, be an interesting future project to extend the geometrical idea presented in this paper to an asymptotic framework so that the convergence of quantum statistical manifolds can be discussed under a suitably chosen topology.

Acknowledgments

This work was partly supported by JSPS KAKENHI Grant Numbers 23H05492 (HN), 23H01090 (AF) and 17H02861 (HN, AF).

References

[1] Rao CR. Linear Statistical Inference and Its Applications (2nd ed). John Wiley & Sons (1973).
[2] Kiefer JC. Introduction to Statistical Inference. Springer-Verlag (1987).
[3] Amari S, Nagaoka H. Methods of Information Geometry. American Mathematical Society, Oxford University Press (2000).
[4] Kobayashi S, Nomizu K. Foundations of Differential Geometry, II. New York: John Wiley & Sons (1969).
[5] Nagaoka H, Amari S. Differential geometry of smooth families of probability distributions. METR 82-7, Dept Math Eng and Instr Phys, Univ. of Tokyo (1982). (https://www.keisu.t.u-tokyo.ac.jp/research/techrep/y1982/) (https://bsi-ni.brain.riken.jp/database/item/104)
[6] Helstrom CW. Minimum mean-square error estimation in quantum statistics. Phys Lett (1967) 25A:101–102.
[7] Helstrom CW. Quantum Detection and Estimation Theory. New York: Academic Press (1976).
[8] Holevo AS. Probabilistic and Statistical Aspects of Quantum Theory. Pisa: Edizioni della Normale (2011). (the previous edition. Amsterdam: North-Holland (1982).)
[9] Petz D. Monotone metrics on matrix spaces. Linear Algebra Appl (1996) 244:81–96.
[10] Ohara A. Geodesics for dual connections and means on symmetric cones. Integr equ oper theory (2004) 50:537–548.
[11] Nagaoka H. Information-geometrical characterization of statistical models which are statistically equivalent to probability simplexes. Proc. 2017 IEEE International Symposium on Information Theory (2017) 1346–1350.
[12] Guţă M, Kahn J. Local asymptotic normality for qubit states. Phys Rev A (2006) 73:052108.
[13] Kahn J, Guţă M. Local asymptotic normality for finite dimensional quantum systems. Comm Math Phys (2009) 289:597–652.
[14] Yamagata K, Fujiwara A, Gill RD. Quantum local asymptotic normality based on a new quantum likelihood ratio. Ann Statist (2013) 41:2197–2217.
[15] Fujiwara A, Yamagata K. Noncommutative Lebesgue decomposition and contiguity with application to quantum local asymptotic normality. Bernoulli (2020) 26:2105–2142.
[16] Fujiwara A, Yamagata K. Efficiency of estimators for locally asymptotically normal quantum statistical models. Ann Statist (to appear). (arXiv:2209.00832)

Appendix

A1 Proof of (3.26)

We first consider the general situation where the e-connection $\nabla^{({\rm e})}$ is determined by a family of inner products $\left\langle A,B\right\rangle_{\rho}=\left\langle A,\Omega_{\rho}(B)\right\rangle_{\rm HS}$ , and show that the torsion ${{\mathcal{T}}}^{({\rm e})}$ of $\nabla^{({\rm e})}$ is represented as follows: for any $X,Y\in{\mathscr{X}}({\mathcal{S}})$ ,

\iota_{*}({{\mathcal{T}}}^{({\rm e})}(X,Y))=(Y\Omega)(L_{X})-(X\Omega)(L_{Y}),

(A.1)

where $Y\Omega:\rho\mapsto Y_{\rho}\Omega$ denotes the derivative of the super-operator-valued map $\Omega:\rho\mapsto\Omega_{\rho}$ w.r.t. $Y$ , and $(Y\Omega)(L_{X})$ denotes the map $\rho\mapsto(Y_{\rho}\Omega)(L_{X_{\rho}})\in{\mathcal{L}}_{{\rm h}}$ . In fact, invoking (3.8) and (3.27), we have for any $X,Y,Z\in{\mathscr{X}}({\mathcal{S}})$ ,

	$\displaystyle Z={{\mathcal{T}}}^{({\rm e})}(X,Y)\;$	$\displaystyle\Leftrightarrow\;Z=\nabla_{X}^{({\rm e})}Y-\nabla_{Y}^{({\rm e})}X-[X,Y]$
		$\displaystyle\Leftrightarrow\;L_{Z}=(XL_{Y}+g(X,Y))-(YL_{X}+g(Y,X))-L_{[X,Y]}$
		$\displaystyle\Leftrightarrow\;L_{Z}=XL_{Y}-YL_{X}-L_{[X,Y]}$
		$\displaystyle\Leftrightarrow\;\iota_{*}(Z)=\Omega(XL_{Y}-YL_{X}-L_{[X,Y]})$
		$\displaystyle\Leftrightarrow\;\iota_{*}(Z)=\Omega(XL_{Y})-\Omega(YL_{X})-\iota_{*}([X,Y]).$

Noting that

	$\displaystyle\iota_{*}([X,Y])$	$\displaystyle=X\iota_{*}(Y)-Y\iota_{*}(X)$
		$\displaystyle=X(\Omega(L_{Y}))-Y(\Omega(L_{X}))$
		$\displaystyle=(X\Omega)(L_{Y})+\Omega(XL_{Y})-(Y\Omega)(L_{X})-\Omega(YL_{X}),$

we obtain (A.1). For the SLD structure, we have $\Omega_{\rho}(A)=\frac{1}{2}(\iota(\rho)\,A+A\,\iota(\rho))$ , which yields that for any point $\rho\in{\mathcal{S}}$ and any tangent vectors $X_{\rho},Y_{\rho}\in T_{\rho}({\mathcal{S}})$ ,

	$\displaystyle(Y_{\rho}\Omega)(L_{X_{\rho}})$	$\displaystyle=\frac{1}{2}\left\{\iota_{*}(Y_{\rho})L_{X_{\rho}}+L_{X_{\rho}}\iota_{*}(Y_{\rho})\right\}$
		$\displaystyle=\frac{1}{4}(\rho L_{Y_{\rho}}L_{X_{\rho}}+L_{Y_{\rho}}\rho L_{X_{\rho}}+L_{X_{\rho}}\rho L_{Y_{\rho}}+L_{X_{\rho}}L_{Y_{\rho}}\rho).$

Similarly, we have

(X_{\rho}\Omega)(L_{Y_{\rho}})=\frac{1}{4}(\rho L_{X_{\rho}}L_{Y_{\rho}}+L_{X_{\rho}}\rho L_{Y_{\rho}}+L_{Y_{\rho}}\rho L_{X_{\rho}}+L_{Y_{\rho}}L_{X_{\rho}}\rho).

Substituting these into (A.1), we obtain (3.26).
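The algebra of this substitution can be checked numerically. The sketch below assumes only the SLD expression $\Omega_{\rho}(A)=\frac{1}{2}(\rho A+A\rho)$ and $\iota_{*}(Y_{\rho})=\Omega_{\rho}(L_{Y_{\rho}})$ used above; the helper names (`omega`, `d_omega`, `comm`) are hypothetical. It confirms the four-term expansion and that the right-hand side of (A.1) collapses to the double commutator $\frac{1}{4}[[L_{X},L_{Y}],\rho]$ . Since (3.26) itself is not restated in this appendix, the sketch makes no claim about its exact printed form.

```python
import numpy as np

def random_hermitian(rng, d):
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

def random_density(rng, d):
    """Strictly positive density matrix (illustrative helper)."""
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = M @ M.conj().T + 0.1 * np.eye(d)
    return rho / np.trace(rho).real

def omega(rho, A):
    """SLD super-operator Omega_rho(A) = (rho A + A rho) / 2."""
    return (rho @ A + A @ rho) / 2

def d_omega(rho, L_dir, L_arg):
    """(Y_rho Omega)(L_X) with L_dir = L_Y and L_arg = L_X.

    Omega_rho is linear in rho, so its derivative along Y is Omega with
    rho replaced by iota_*(Y) = Omega_rho(L_Y).
    """
    return omega(omega(rho, L_dir), L_arg)

def comm(A, B):
    return A @ B - B @ A

rng = np.random.default_rng(1)
d = 3
rho = random_density(rng, d)
# SLDs of tangent vectors have zero expectation, so center them.
LX = random_hermitian(rng, d)
LX = LX - np.trace(rho @ LX).real * np.eye(d)
LY = random_hermitian(rng, d)
LY = LY - np.trace(rho @ LY).real * np.eye(d)

four_term = (rho @ LY @ LX + LY @ rho @ LX + LX @ rho @ LY + LX @ LY @ rho) / 4
rhs_A1 = d_omega(rho, LY, LX) - d_omega(rho, LX, LY)   # right-hand side of (A.1)
err_expand = np.abs(d_omega(rho, LY, LX) - four_term).max()
err_torsion = np.abs(rhs_A1 - comm(comm(LX, LY), rho) / 4).max()
print(err_expand, err_torsion)  # both should be at rounding-error level
```

The two middle terms $L_{Y}\rho L_{X}$ and $L_{X}\rho L_{Y}$ appear in both expansions and cancel in the difference, which is why only commutator terms survive.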

A2 Proof of Lemma 4.2

(1)

Let $A^{i}:=\int\hat{\xi}^{i}\Pi(d\hat{\xi})$ . Then we have

	$\displaystyle A^{i}$	$\displaystyle=\sum_{k}p_{k}f^{i}(k,X^{k})=\sum_{k}p_{k}(\gamma_{i}^{k}+\frac{w^{i}_{k}}{p_{k}}X^{k})$
		$\displaystyle=\xi^{i}(\rho)+\sum_{k}w_{k}^{i}X^{k}=\xi^{i}(\rho)+\sum_{k}\sum_{j}w_{k}^{i}u^{k}_{j}\,L^{j}_{\rho}=\xi^{i}(\rho)+L_{\rho}^{i},$

where we have invoked (4.13), (4.10), (4.11) and (4.12). Now $\Pi\in{\mathcal{U}}(\rho,\xi)$ follows from (4.6).

(2)

Invoking (4.13), we have for each $k$

$\displaystyle B^{k}$	$\displaystyle:=\int\Bigl{\{}\sum_{i}u_{i}^{k}(\hat{\xi}^{i}-\xi^{i}(\rho))\Bigr{\}}^{2}\Pi(d\hat{\xi})$
	$\displaystyle=\sum_{l}p_{l}\Bigl{\{}\sum_{i}u_{i}^{k}(f^{i}(l,X^{l})-\xi^{i}(\rho))\Bigr{\}}^{2}$
	$\displaystyle=\sum_{l}p_{l}(C_{l}^{k}+a_{l}^{k})^{2},$	(A.2)

where

C_{l}^{k}:=\sum_{i}u_{i}^{k}\frac{w_{l}^{i}}{p_{l}}X^{l}=\frac{\delta_{l}^{k}}{p_{k}}X^{k}.

(A.3)

This leads to

	$\displaystyle{({\mbox{\boldmath$u$}}^{k})}{}^{T}V_{\rho}(\Pi){\mbox{\boldmath$u$}}^{k}$	$\displaystyle={\rm Tr}(\rho B^{k})=\sum_{l}p_{l}{\rm Tr}(\rho(C_{l}^{k}+a_{l}^{k})^{2})$
		$\displaystyle=\sum_{l}p_{l}{\rm Tr}(\rho(C_{l}^{k})^{2})+\sum_{l}p_{l}(a_{l}^{k})^{2},$		(A.4)

where we invoked ${\rm Tr}(\rho\,C_{l}^{k})=0$ due to $C_{l}^{k}\in{\rm span}\{L_{i,\rho}\}_{i=1}^{n}$ . Recalling (4.11) and (A.3), we have

	$\displaystyle\sum_{l}p_{l}{\rm Tr}(\rho(C_{l}^{k})^{2})$	$\displaystyle=\frac{1}{p_{k}}\,{\rm Tr}(\rho(X^{k})^{2})$
		$\displaystyle=\frac{1}{p_{k}}\,\sum_{i,j}u_{i}^{k}u_{j}^{k}\left\langle L^{i}_{\rho},L^{j}_{\rho}\right\rangle_{\rho}=\frac{1}{p_{k}}\,{({\mbox{\boldmath$u$}}^{k})}{}^{T}G_{\rho}^{-1}{\mbox{\boldmath$u$}}^{k},$		(A.5)

which, combined with (A.4), yields the desired identity.

A3 Proofs of Propositions 7.9, 7.10 and 7.11

Proof of Prop. 7.9 This proposition is essentially contained in Theorem 7.2 of [3]. Here we give a proof for the reader’s convenience.

Given $F\in{\mathcal{L}}_{{\rm h}}$ and $\rho\in{\mathcal{S}}$ , there exists a tangent vector $X_{\rho}\in T_{\rho}({\mathcal{S}})$ satisfying $L_{X_{\rho}}=F-\left\langle F\right\rangle_{\rho}$ by (3.10). Applying (7.18) to the case ${\mathcal{M}}={\mathcal{S}}$ , we have $X_{\rho}=({\rm grad}\,\left\langle F\right\rangle)_{\rho}$ , and hence

\displaystyle\|(d\left\langle F\right\rangle)_{\rho}\|_{\rho}^{2}=\|X_{\rho}\|_{\rho}^{2}=\left\langle L_{X_{\rho}},L_{X_{\rho}}\right\rangle_{\rho}=\bigl{\langle}(F-\left\langle F\right\rangle_{\rho})^{2}\bigr{\rangle}_{\rho}=V_{\rho}(F).

$\Box$

Proof of Prop. 7.10 Recalling (7.15) and (7.16), we have

{\mathcal{E}}({\mathcal{S}})=\{f\in{\mathcal{F}}({\mathcal{S}})\,|\,\exists F\in{\mathcal{L}}_{{\rm h}},\;f=\left\langle F\right\rangle\;\;\text{and}\;\;\forall\rho\in{\mathcal{S}},\;V_{\rho}(F)=\|(df)_{\rho}\|_{\rho}^{2}\,\}.

(A.6)

Since the condition $V_{\rho}(F)=\|(df)_{\rho}\|_{\rho}^{2}$ is always satisfied by Prop. 7.9, we have the first equality in (7.23). The second equality follows from Prop. 7.2, and the third follows since under the relation $X\stackrel{{\scriptstyle g}}{{\longleftrightarrow}}\omega$ we have

$X$ is e-parallel	$\displaystyle\Leftrightarrow\;\forall Y,Z\in{\mathscr{X}}({\mathcal{S}}),\;g(\nabla^{({\rm e})}_{Y}X,Z)=0$
	$\displaystyle\Leftrightarrow\;\forall Y,Z\in{\mathscr{X}}({\mathcal{S}}),\;Yg(X,Z)=g(X,\nabla^{({\rm m})}_{Y}Z)$
	$\displaystyle\Leftrightarrow\;\forall Y,Z\in{\mathscr{X}}({\mathcal{S}}),\;Y\omega(Z)=\omega(\nabla^{({\rm m})}_{Y}Z)$
	$\displaystyle\Leftrightarrow\;\text{$\omega$ is m-parallel}.$	(A.7)

$\Box$

Proof of Prop. 7.11 By Propositions 7.9 and 7.10, the condition imposed on $f$ in (7.25) is equivalent to the existence of $F\in{\mathcal{L}}_{{\rm h}}$ satisfying (7.15). $\Box$

A4 A result on the relationship between autoparallelity and total geodesicness

In Remark 8.6 we noted that condition (ii) of Prop. 8.3 implies the equivalence between autoparallelity and total geodesicness. This is restated in the following proposition.

Proposition A.1.

Suppose that an affine connection $\nabla$ is given on a manifold $S$ whose torsion satisfies

\forall X,Y\in{\mathscr{X}}(S),\;{\mathcal{T}}^{(\nabla)}(X,Y)\in{\rm span}_{{\mathcal{F}}(S)}\{X,Y\}.

(A.8)

Then every $\nabla$ -totally geodesic submanifold of $S$ is $\nabla$ -autoparallel.

We present a proof below, which is almost parallel to the proof of Theorem 8.4 in Chap. VII of [4] cited as a result due to E. Cartan.

Proof Let $\dim S=n+r$ , and $M$ be a $\nabla$ -totally geodesic submanifold with $\dim M=n$ . We take a coordinate system $\tilde{\xi}=(\tilde{\xi}^{i})$ of $S$ such that $M$ is represented as

M=\{p\in S\,|\,\forall i\in\{n+1,\ldots,n+r\},\;\tilde{\xi}^{i}(p)=0\}

and that $(\xi^{1},\ldots,\xi^{n}):=(\tilde{\xi}^{1}|_{M},\ldots,\tilde{\xi}^{n}|_{M})$ forms a coordinate system of $M$ . Let $\tilde{\partial}_{i}:=\frac{\partial}{\partial\tilde{\xi}^{i}}$ , $\partial_{i}:=\frac{\partial}{\partial\xi^{i}}$ , and denote the connection coefficients of $\nabla$ w.r.t. $\tilde{\xi}$ by $\{\Gamma_{ij}^{k}\}$ : $\nabla_{\tilde{\partial}_{i}}\tilde{\partial}_{j}=\sum_{k}\Gamma_{ij}^{k}\,\tilde{\partial}_{k}$ for $i,j\in\{1,\ldots,n+r\}$ . For arbitrary $i,j$ , it follows from the assumption (A.8) that ${\mathcal{T}}^{(\nabla)}(\tilde{\partial}_{i},\tilde{\partial}_{j})=\sum_{k}(\Gamma_{ij}^{k}-\Gamma_{ji}^{k})\,\tilde{\partial}_{k}\in{\rm span}_{{\mathcal{F}}(S)}\{\tilde{\partial}_{i},\tilde{\partial}_{j}\}$ , which implies that $\Gamma_{ij}^{k}-\Gamma_{ji}^{k}=0$ for any $k\notin\{i,j\}$ . Hence we have

\forall i,j\in\{1,\ldots,n\},\;\forall k\in\{n+1,\ldots,n+r\},\;\Gamma_{ij}^{k}=\Gamma_{ji}^{k}.

(A.9)

Given a point $p\in M$ and a tangent vector $X_{p}=\sum_{i=1}^{n}x^{i}\,(\partial_{i})_{p}\in T_{p}(M)$ arbitrarily, let $\gamma:t\mapsto\gamma(t)$ be a $\nabla$ -geodesic with an affine parameter $t$ satisfying $\gamma(0)=p$ and $\dot{\gamma}(0):=\frac{d}{dt}\gamma(t)|_{t=0}=X_{p}$ . The geodesic should satisfy the differential equation $\nabla_{\dot{\gamma}(t)}\dot{\gamma}(t)=0$ , which is represented as

\forall k\in\{1,\ldots,n+r\},\;\frac{d^{2}}{dt^{2}}\,\tilde{\xi}^{k}(\gamma(t))+\sum_{i,j=1}^{n+r}\frac{d}{dt}\,\tilde{\xi}^{i}(\gamma(t))\,\frac{d}{dt}\,\tilde{\xi}^{j}(\gamma(t))\,\bigl{(}\Gamma_{ij}^{k}\bigr{)}_{\gamma(t)}=0.

Since $M$ is assumed to be $\nabla$ -totally geodesic, $\gamma(t)$ stays in $M$ and hence $\tilde{\xi}^{k}(\gamma(t))=0$ for $k\in\{n+1,\ldots,n+r\}$ . Therefore, the above equation yields

\forall k\in\{n+1,\ldots,n+r\},\;\sum_{i,j=1}^{n}\frac{d}{dt}\,\tilde{\xi}^{i}(\gamma(t))\,\frac{d}{dt}\,\tilde{\xi}^{j}(\gamma(t))\,\bigl{(}\Gamma_{ij}^{k}\bigr{)}_{\gamma(t)}=0,

and letting $t=0$ , we obtain

\forall k\in\{n+1,\ldots,n+r\},\;\sum_{i,j=1}^{n}x^{i}x^{j}\bigl{(}\Gamma_{ij}^{k}\bigr{)}_{p}=0.

Since $p\in M$ and $X_{p}=\sum_{i=1}^{n}x^{i}(\partial_{i})_{p}$ are arbitrary and $\Gamma_{ij}^{k}$ is symmetric w.r.t. $i\leftrightarrow j$ by (A.9), it follows that

\forall i,j\in\{1,\ldots,n\},\;\forall k\in\{n+1,\ldots,n+r\},\;\Gamma_{ij}^{k}|_{M}=0.

(A.10)
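The passage from the quadratic identity to (A.10) is a polarization argument. Written out as a sketch, with the shorthand $Q^{k}$ and the standard basis vectors $e_{i}\in\mathbb{R}^{n}$ introduced here only for illustration:

```latex
Q^{k}(x) := \sum_{i,j=1}^{n} x^{i}x^{j}\,\bigl(\Gamma_{ij}^{k}\bigr)_{p} = 0
\quad (\forall x\in\mathbb{R}^{n})
\;\;\Longrightarrow\;\;
0 = Q^{k}(e_{i}+e_{j}) - Q^{k}(e_{i}) - Q^{k}(e_{j})
  = \bigl(\Gamma_{ij}^{k}\bigr)_{p} + \bigl(\Gamma_{ji}^{k}\bigr)_{p}
  = 2\,\bigl(\Gamma_{ij}^{k}\bigr)_{p},
```

where the last equality is the symmetry (A.9). Without (A.9), the same computation would only show that the symmetric part of $\Gamma_{ij}^{k}$ vanishes on $M$ , which is exactly where the torsion condition (A.8) enters.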

Now, for arbitrary vector fields $X=\sum_{i=1}^{n}X^{i}\partial_{i}=\sum_{i=1}^{n}X^{i}\tilde{\partial_{i}}|_{M}$ and $Y=\sum_{j=1}^{n}Y^{j}\partial_{j}$ $=\sum_{j=1}^{n}Y^{j}\tilde{\partial_{j}}|_{M}$ on $M$ , where $\{X^{i}\},\{Y^{j}\}\subset{\mathcal{F}}(M)$ , we have

	$\displaystyle\nabla_{X}Y$	$\displaystyle=\sum_{i,j=1}^{n}\sum_{k=1}^{n+r}X^{i}Y^{j}\Gamma_{ij}^{k}|_{M}\,\tilde{\partial}_{k}|_{M}+\sum_{j=1}^{n}X(Y^{j})\partial_{j}$
		$\displaystyle=\sum_{i,j=1}^{n}\sum_{k=1}^{n}X^{i}Y^{j}\Gamma_{ij}^{k}|_{M}\,\partial_{k}+\sum_{j=1}^{n}X(Y^{j})\partial_{j}\;\in{\mathscr{X}}(M),$

which concludes that $M$ is $\nabla$ -autoparallel in $S$ . $\Box$

Note that the only difference from the proof in [4] is whether (A.9) is derived from ${\mathcal{T}}^{(\nabla)}=0$ or from the weaker assumption (A.8).

A5 Proof of Lemma 9.4

When ${\rm Tr}A={\rm Tr}B=0$ , (9.20) is reduced to

\frac{1}{2}[[A,B],\rho]=-2\left\langle A\right\rangle_{\rho}(\rho\circ B)+2\left\langle B\right\rangle_{\rho}(\rho\circ A),

(A.11)

which we prove first. Letting $A=\vec{a}\cdot\vec{\sigma}$ , $B=\vec{b}\cdot\vec{\sigma}$ and $\rho=\frac{1}{2}(I+\vec{r}\cdot\vec{\sigma})$ , it immediately follows from (9.4) and (9.5) that

\frac{1}{2}[[A,B],\rho]=(\vec{r}\times(\vec{a}\times\vec{b}))\cdot\vec{\sigma}

and

-2\left\langle A\right\rangle_{\rho}(\rho\circ B)+2\left\langle B\right\rangle_{\rho}(\rho\circ A)=\bigl{\{}(\vec{b}\cdot\vec{r})\vec{a}-(\vec{a}\cdot\vec{r})\vec{b}\bigr{\}}\cdot\vec{\sigma}.

Hence, the well-known formula for the vector triple product proves (A.11).

Next, we remove the assumption ${\rm Tr}A={\rm Tr}B=0$ , and let $A^{\prime}:=A-\frac{{\rm Tr}A}{2}I$ and $B^{\prime}:=B-\frac{{\rm Tr}B}{2}I$ . Then we have

	$\displaystyle\frac{1}{2}[[A,B],\rho]=$	$\displaystyle\frac{1}{2}[[A^{\prime},B^{\prime}],\rho]$
	$\displaystyle=$	$\displaystyle-2\left\langle A^{\prime}\right\rangle_{\rho}(\rho\circ B^{\prime})+2\left\langle B^{\prime}\right\rangle_{\rho}(\rho\circ A^{\prime})$
	$\displaystyle=$	$\displaystyle({\rm Tr}A-2\left\langle A\right\rangle_{\rho})(\rho\circ B)-({\rm Tr}B-2\left\langle B\right\rangle_{\rho})(\rho\circ A)$
		$\displaystyle+\bigl{\{}({\rm Tr}B)\left\langle A\right\rangle_{\rho}-({\rm Tr}A)\left\langle B\right\rangle_{\rho}\bigr{\}}\,\rho,$

where the second equality follows from (A.11). Thus we obtain (9.20).
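Both steps of this proof can be checked numerically for a qubit. The following is a minimal sketch, assuming that $\rho\circ A$ denotes the symmetrized product $\frac{1}{2}(\rho A+A\rho)$ and $\left\langle A\right\rangle_{\rho}={\rm Tr}(\rho A)$ ; the helper names are hypothetical. Since (9.20) is not restated in this appendix, the general case is compared against the last expression of the display above.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SIGMA = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

def pauli(v):
    """v . sigma for a real 3-vector v."""
    return np.tensordot(v, SIGMA, axes=1)

def comm(A, B):
    return A @ B - B @ A

def jordan(rho, A):
    """Symmetrized product rho o A = (rho A + A rho) / 2 (assumed)."""
    return (rho @ A + A @ rho) / 2

def expect(rho, A):
    return np.trace(rho @ A).real

rng = np.random.default_rng(2)
a, b, r = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)
r = 0.7 * r / np.linalg.norm(r)          # |r| < 1: strictly positive state
rho = (I2 + pauli(r)) / 2

# Traceless case (A.11), together with the vector triple product form.
A, B = pauli(a), pauli(b)
lhs = comm(comm(A, B), rho) / 2
rhs = -2 * expect(rho, A) * jordan(rho, B) + 2 * expect(rho, B) * jordan(rho, A)
err_traceless = np.abs(lhs - rhs).max()
err_vector = np.abs(lhs - pauli(np.cross(r, np.cross(a, b)))).max()

# General case: add multiples of the identity to A and B.
A2, B2 = A + 0.3 * I2, B - 1.1 * I2
trA, trB = np.trace(A2).real, np.trace(B2).real
eA, eB = expect(rho, A2), expect(rho, B2)
rhs_general = ((trA - 2 * eA) * jordan(rho, B2)
               - (trB - 2 * eB) * jordan(rho, A2)
               + (trB * eA - trA * eB) * rho)
err_general = np.abs(comm(comm(A2, B2), rho) / 2 - rhs_general).max()
print(err_traceless, err_vector, err_general)  # all at rounding-error level
```

The identity terms drop out of the double commutator, so the left-hand side is the same for $(A,B)$ and $(A^{\prime},B^{\prime})$ ; only the right-hand side has to be re-expanded.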

$\displaystyle\|A-f(\rho)\|^{2}_{\rho}$	$\displaystyle=\|\sum_{j}\partial_{j}f(\rho)L^{j}_{\rho}\|_{\rho}^{2}+\|A-f(\rho)-\sum_{j}\partial_{j}f(\rho)L^{j}_{\rho}\|_{\rho}^{2}$
	$\displaystyle\geq\|\sum_{j}\partial_{j}f(\rho)L^{j}_{\rho}\|_{\rho}^{2}$
	$\displaystyle=\sum_{i,j}g^{ij}(\rho)\,\partial_{i}f(\rho)\,\partial_{j}f(\rho)=\|(df)_{\rho}\|_{\rho}^{2}.$	(7.14)

$\displaystyle{\mathcal{E}}({\mathcal{S}})$	$\displaystyle=\{\left\langle F\right\rangle\,|\,F\in{\mathcal{L}}_{{\rm h}}\}$
	$\displaystyle=\{f\in{\mathcal{F}}({\mathcal{S}})\,|\,\text{${\rm grad}\,f$ is e-parallel}\}$
	$\displaystyle=\{f\in{\mathcal{F}}({\mathcal{S}})\,|\,\text{$df$ is m-parallel}\},$	(7.23)

	$\displaystyle{\mathcal{E}}({\mathcal{M}})=\bigl{\{}f\in{\mathcal{F}}({\mathcal{M}})\,\big{|}\,\exists\tilde{f}\in{\mathcal{E}}({\mathcal{S}}),$	$\displaystyle\;f=\tilde{f}|_{{\mathcal{M}}}\;\;\text{and}\;\;$
		$\displaystyle\forall\rho\in{\mathcal{M}},\;\|(df)_{\rho}\|_{\rho}=\|(d\tilde{f})_{\rho}\|_{\rho}\,\bigr{\}}.$		(7.25)

	$\displaystyle{\mathcal{E}}(M):=\bigl{\{}f\in{\mathcal{F}}(M)\,\big{|}\,\exists\tilde{f}\in{\mathcal{E}}(S),$	$\displaystyle\;f=\tilde{f}|_{M}\;\;\text{and}\;\;$
		$\displaystyle\forall\rho\in M,\;\|(df)_{\rho}\|_{\rho}=\|(d\tilde{f})_{\rho}\|_{\rho}\,\bigr{\}}.$		(7.28)

$\displaystyle T^{({\rm e})}_{\rho}({\mathcal{S}}^{{\mathcal{B}}})$	$\displaystyle:=\{L_{X_{\rho}}\,|\,X_{\rho}\in T_{\rho}({\mathcal{S}}^{{\mathcal{B}}})\}$
	$\displaystyle=\{A\in{\mathcal{L}}_{{\rm h}}\,|\,\exists B\in T^{({\rm m})}_{\rho}({\mathcal{S}}^{{\mathcal{B}}}),\;B=\rho\circ A\}$
	$\displaystyle=\{A\in{\mathcal{L}}_{{\rm h}}^{{\mathcal{B}}}\,|\,\left\langle A\right\rangle_{\rho}=0\}.$	(8.9)