
Information Geometry for Maximum Diversity Distributions

Shinto Eguchi
Abstract

In recent years, biodiversity measures have gained prominence as essential tools for ecological and environmental assessments, particularly in the context of increasingly complex and large-scale datasets. We provide a comprehensive review of diversity measures, including the Gini-Simpson index, Hill numbers, and Rao’s quadratic entropy, examining their roles in capturing various aspects of biodiversity. Among these, Rao’s quadratic entropy stands out for its ability to incorporate not only species abundance but also functional and genetic dissimilarities. The paper emphasizes the statistical and ecological significance of Rao’s quadratic entropy under the information geometry framework. We explore the distribution maximizing such a diversity measure under linear constraints that reflect ecological realities, such as resource competition or habitat suitability. Furthermore, we discuss a unified approach of the Leinster-Cobbold index combining Hill numbers and Rao’s entropy, allowing for an adaptable and similarity-sensitive measure of biodiversity. Finally, we discuss the information geometry associated with the maximum diversity distribution focusing on the cross diversity measures such as the cross-entropy.

1 Introduction

This paper examines the application of information geometry to biodiversity measurement. Various indices, including the Gini-Simpson index, Hill numbers, and Rao’s quadratic entropy, have been developed and utilized to assess the ecological states of biological communities. In particular, Rao’s quadratic entropy, which incorporates both species abundance and functional or genetic dissimilarity, has become a cornerstone of biodiversity research, see Rao (1982a, b). By combining species abundance with functional or genetic differences, Rao’s quadratic entropy offers a comprehensive tool for ecological and population genetics studies, enabling ecologists to measure and interpret diversity in ways that were not possible before. Applied across large datasets, it provides insights into ecosystem stability, resilience, and conservation needs. As computational power and data accessibility continue to grow, Rao’s quadratic entropy will likely remain at the forefront of biodiversity assessments, driving advances in ecological understanding and environmental stewardship. Rao’s groundbreaking contributions in statistical theory and information geometry have had a profound influence on the development of diversity measures, see Rao (1945). His introduction of the Fisher-Rao information metric laid the foundation for viewing statistical models as geometric entities, an approach that has become instrumental in machine learning, ecological modeling, and data science, see Rao (1961, 1987).

We present a comprehensive review of diversity indices through the lens of information geometry. This approach highlights the deep interconnections between Rao’s information-theoretic measures and ecological resilience, revealing how the geometric properties of diversity measures can enhance our understanding of community structure. Specifically, we discuss the maximum diversity distributions under linear constraints, examining how ecological factors, such as resource limitations or habitat suitability, can be represented as geometric constraints on species distributions. We reduce the problem of finding the maximizing distribution to a mathematical optimization characterized by the Karush-Kuhn-Tucker (KKT) conditions. We demonstrate that the maximum Hill-number distributions can be characterized using $q$-geodesics, which generalize the concept of geodesic paths in the simplex and provide insightful connections between statistical estimation and ecological interpretations. This approach extends Rao’s vision of information geometry, providing ecologists and conservationists with a robust method for identifying optimal diversity configurations in response to environmental constraints. In addition, we propose practical applications for this framework, emphasizing how Rao’s contributions to information geometry continue to shape and advance biodiversity research, guiding conservation strategies and promoting a deeper understanding of ecosystem dynamics. From this information-geometric viewpoint, we offer valuable insights into ecosystem stability and resilience. Our approach not only advances theoretical understanding but also has practical implications for biodiversity conservation and environmental management.

The paper is organized as follows: Section 2 builds a mathematical framework for diversity measures, in which we overview the standard indices of diversity such as the Gini-Simpson index, the Hill number, and Rao’s quadratic entropy. In Section 3 we present the idea of maximum diversity distributions under a linear constraint, focusing on the Hill numbers. The approach is extended to the case of multiple constraints. The ecological interpretation is explored from a practical point of view. Section 4 gives an information-geometric understanding of the maximum diversity distributions in a simplex, or a space of categorical distributions. Notably, we associate a foliation, a partitioning into submanifolds, of the simplex with the maximum diversity distribution under linear constraints. This geometric structure provides deeper insight into the behavior of diversity measures within the simplex of categorical distributions. In Section 5 we discuss the maximum diversity distribution from comprehensive perspectives.

2 Biodiversity measures

We provide an overview of common measures used to quantify the diversity of a community, cf. May (1975). Consider a community with $S$ species, where $p_i$ represents the relative abundance of species $i$. The set of all possible relative-abundance vectors is represented by the $(S-1)$-dimensional simplex:

\Delta_{S-1}=\Big\{\boldsymbol{p}=(p_1,\ldots,p_S):\sum_{i=1}^{S}p_i=1,\ p_i\geq 0\ (i=1,\ldots,S)\Big\}.

Henceforth, we will identify $\Delta_{S-1}$ with the set of all $S$-variate categorical distributions. The following measures of diversity, applied in various ecological contexts, offer different views of biodiversity, emphasizing different aspects of species evenness and dominance. The Gini-Simpson index is given by

D_{\text{GS}}(\boldsymbol{p})=1-\sum_{i=1}^{S}p_i^{2}

for $\boldsymbol{p}=(p_i)$ of $\Delta_{S-1}$, see Simpson (1949). The Gini-Simpson index represents the probability that two randomly selected individuals from a community belong to different species. This index is sensitive to species evenness and increases as species abundances become more evenly distributed. The Hill numbers provide a general framework for diversity that is sensitive to the order $q$, which adjusts the emphasis on species richness versus evenness. The Hill number of order $q$ is defined as:

{}^{q}\!D(\boldsymbol{p})=\left(\sum_{i=1}^{S}p_i^{q}\right)^{\frac{1}{1-q}}  (1)

where $q$ is a controlling parameter that determines the sensitivity of the index to species abundances, see Hill (1973). When $q=0$, ${}^{0}\!D(\boldsymbol{p})=S$, which is simply species richness (the count of species). As $q$ goes to $1$, the Hill number becomes the exponential of the Boltzmann-Shannon entropy, ${}^{1}\!D(\boldsymbol{p})=\exp\left(H(\boldsymbol{p})\right)$, where

H(\boldsymbol{p})=-\sum_{i=1}^{S}p_i\log(p_i).

This form uses the exponential of the Boltzmann-Shannon entropy $H(\boldsymbol{p})$ to remain within the Hill framework. The entropy quantifies the uncertainty in predicting the species of a randomly chosen individual; it is maximized when all species are equally abundant, indicating high diversity, and decreases as dominance by fewer species increases. When $q=2$, the measure reduces to the inverse Simpson index

{}^{2}\!D(\boldsymbol{p})=\left(\sum_{i=1}^{S}p_i^{2}\right)^{-1}

which gives greater weight to common species and is particularly useful for measuring dominance. Thus, the parameter $q$ adjusts the sensitivity of the Hill number to species abundances, with higher $q$-values emphasizing dominant species and lower values favoring rare species. The Berger-Parker index focuses on dominance by measuring the proportional abundance of the most abundant species

D_{\text{BP}}(\boldsymbol{p})=\max_i(p_i)

where $p_i$ is the relative abundance of each species. The Berger-Parker index is useful for identifying communities dominated by a single species or a few species, with higher values indicating lower diversity and higher dominance by certain species. Alternatively, the inverse Berger-Parker index $1/D_{\text{BP}}(\boldsymbol{p})$ can be used as a diversity measure, where higher values indicate greater diversity. When the order of the Hill number goes to infinity, it converges to the inverse Berger-Parker index. This is because

\lim_{q\rightarrow\infty}{}^{q}\!D(\boldsymbol{p})=\lim_{q\rightarrow\infty}\left[\max_i(p_i)\left\{\sum_{i=1}^{S}\left(\frac{p_i}{\max_j(p_j)}\right)^{q}\right\}^{\frac{1}{q}}\right]^{\frac{q}{1-q}}

which is nothing but $1/D_{\text{BP}}(\boldsymbol{p})$. Alternatively, as $q\rightarrow-\infty$, the Hill number converges to the inverse of the smallest relative abundance: $\lim_{q\rightarrow-\infty}{}^{q}\!D(\boldsymbol{p})=1/\min_i(p_i)$.
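As a quick illustration, the following Python sketch (the helper name is ours, not from the paper) evaluates the Hill number for several orders $q$, treating $q=1$ as the exponential of the Boltzmann-Shannon entropy and checking the Berger-Parker limit numerically.

```python
import numpy as np

def hill_number(p, q):
    """Hill number of order q for a relative-abundance vector p (entries sum to 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # species with zero abundance do not contribute
    if np.isclose(q, 1.0):            # q -> 1: exponential of Shannon entropy
        return np.exp(-np.sum(p * np.log(p)))
    return np.sum(p ** q) ** (1.0 / (1.0 - q))

p = np.array([0.5, 0.3, 0.1, 0.1])
for q in [0, 0.5, 1, 2, 50]:
    print(q, hill_number(p, q))       # q = 0 gives richness, q = 50 is close to the limit below
print("inverse Berger-Parker:", 1 / p.max())
```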

Rao’s quadratic entropy is a measure of diversity that accounts not only for species abundance but also for the dissimilarity between species, such as functional or genetic differences, see Rao (1982a, b). It is defined as

\mathrm{Q}(\boldsymbol{p})=\sum_{i=1}^{S}\sum_{j=1}^{S}p_i p_j W_{ij},

where $\boldsymbol{p}=(p_1,\ldots,p_S)$ represents the relative abundances, and $W_{ij}$ is a measure of dissimilarity between species $i$ and $j$ satisfying $W_{ii}=0$ and $W_{ij}\geq 0$. For example, the dissimilarity matrix is often derived from a distance matrix calculated using measures such as Gower’s distance, Manhattan distance, and others, see Botta-Dukát (2005); Southwood and Henderson (2009). Rao’s quadratic entropy increases with both the abundance of different species and their functional or genetic differences.

When $W_{ij}=1$ for all $i\neq j$, Rao’s quadratic entropy reduces to a form of the Gini-Simpson index. This measure is particularly useful for quantifying functional diversity in ecosystems where species differ in ecological roles or genetic traits. The Hill number ${}^{q}\!D(\boldsymbol{p})$ is a family of diversity indices parameterized by $q$, which controls the sensitivity of the measure to species abundances. Hill numbers are ‘effective’ numbers, meaning they give an intuitively meaningful measure of the effective number of species in a community by reflecting both species abundances and the emphasis on common versus rare species through the parameter $q$. Rao’s quadratic entropy, on the other hand, incorporates both species abundances and a dissimilarity matrix that quantifies pairwise dissimilarities between species, allowing it to account for the phylogenetic or functional differences between them. Mathematically, it is defined as the expected dissimilarity between two randomly chosen individuals from a community. In essence, Rao’s entropy provides a diversity measure that is sensitive not only to species abundances but also to their similarity, making it valuable for communities where species may vary significantly in terms of genetic, functional, or ecological traits. To integrate these two concepts, the Leinster-Cobbold index is defined as

{}^{q}\!D_{\boldsymbol{Z}}(\boldsymbol{p})=\left(\sum_{i=1}^{S}p_i(\boldsymbol{Z}\boldsymbol{p})_i^{q-1}\right)^{\frac{1}{1-q}}

where $\boldsymbol{Z}$ is the similarity matrix with entries $Z_{ij}$ representing the similarity between species $i$ and $j$, satisfying $Z_{ii}=1$ and $0\leq Z_{ij}\leq 1$, see Leinster and Cobbold (2012). The term $(\boldsymbol{Z}\boldsymbol{p})_i=\sum_{j=1}^{S}Z_{ij}p_j$ is the average similarity of species $i$ to all species in the community, weighted by their relative abundances. This retains the effective-number interpretation of Hill numbers while incorporating a similarity matrix as in Rao’s quadratic entropy. For a given community, Leinster and Cobbold define a generalized diversity measure that combines both relative abundances and similarities among species. The sensitivity parameter $q$ of the Hill number framework is retained, controlling the emphasis on rare versus common species, and a similarity matrix $\boldsymbol{Z}$ quantifies species resemblance. Leinster and Cobbold’s index ${}^{2}\!D_{\boldsymbol{Z}}(\boldsymbol{p})$ with $q=2$ simplifies to

{}^{2}\!D_{\boldsymbol{Z}}(\boldsymbol{p})=\frac{1}{\boldsymbol{p}^{\top}\boldsymbol{Z}\boldsymbol{p}},

which, under some conditions, aligns with the inverse of Rao’s quadratic entropy when $\boldsymbol{Z}={\bf I}-\boldsymbol{W}$ (i.e., the similarity matrix is derived from the dissimilarity matrix). This makes it a direct similarity-sensitive generalization of the Gini-Simpson index. By varying $q$, one obtains a diversity profile that reflects different levels of sensitivity to rare and common species, while the similarity matrix $\boldsymbol{Z}$ ensures that closely related species contribute less to the overall diversity measure. This connection allows both Hill numbers and Rao’s entropy to be seen as part of a single framework, with ${}^{q}\!D_{\boldsymbol{Z}}(\boldsymbol{p})$ converging to the traditional Hill numbers when species are maximally distinct (i.e., the similarity matrix $\boldsymbol{Z}$ is an identity matrix). In this unified measure, the flexibility of Hill numbers in emphasizing different aspects of species abundance combines with Rao’s consideration of species dissimilarity, providing a comprehensive and adaptable biodiversity measure that bridges the strengths of both approaches.
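To make the definitions concrete, the short Python sketch below (helper names are ours) evaluates the Gini-Simpson index, Rao's quadratic entropy, and the Leinster-Cobbold index; with $\boldsymbol{Z}$ equal to the identity matrix the latter reproduces the inverse Simpson index at $q=2$.

```python
import numpy as np

def gini_simpson(p):
    return 1.0 - np.sum(p ** 2)

def rao_quadratic_entropy(p, W):
    """Q(p) = p^T W p for a dissimilarity matrix W (W_ii = 0, W_ij >= 0)."""
    return p @ W @ p

def leinster_cobbold(p, Z, q):
    """Leinster-Cobbold index with similarity matrix Z (Z_ii = 1, 0 <= Z_ij <= 1)."""
    Zp = Z @ p
    if np.isclose(q, 1.0):
        return np.exp(-np.sum(p * np.log(Zp)))
    return np.sum(p * Zp ** (q - 1)) ** (1.0 / (1.0 - q))

p = np.array([0.4, 0.3, 0.2, 0.1])
W = np.array([[0, 1, 2, 3], [1, 0, 3, 2], [2, 3, 0, 2], [3, 2, 2, 0]], dtype=float)
print(gini_simpson(p), rao_quadratic_entropy(p, W))
print(leinster_cobbold(p, np.eye(4), 2.0), 1.0 / np.sum(p ** 2))   # identical values
```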

3 Maximum diversity distributions

This section explores the mathematical foundations of maximizing biodiversity measures, focusing on Hill numbers, Rao’s quadratic entropy, and the Leinster-Cobbold index. We present optimization results under realistic ecological constraints, such as linear resource limits, and examine their implications for understanding community dynamics. By integrating these mathematical insights with ecological interpretations, we aim to provide a robust framework for evaluating and managing biodiversity in both theoretical and applied contexts.

Hill numbers ${}^{q}\!D(\boldsymbol{p})$ are interpreted as effective species counts, emphasizing their bounded nature and ensuring that they do not exceed the actual species count $S$, see Hill (1973); Jost (2006). Here, we confirm the maximum of the Hill numbers within a mathematical framework.

Proposition 1.

Let $\boldsymbol{e}=(e_i)$ be the uniform distribution, that is, $e_i=\frac{1}{S}$ for $i=1,\ldots,S$. Then, the Hill number ${}^{q}\!D(\boldsymbol{p})$ attains its maximum value $S$ at $\boldsymbol{e}$.

Proof.

We aim to show that ${}^{q}\!D(\boldsymbol{p})\leq S$ for all $\boldsymbol{p}\in\Delta_{S-1}$, with equality if and only if $\boldsymbol{p}=\boldsymbol{e}$.
Case 1: $q>1$. The function $f(p)=p^{q}$ is convex on $[0,1]$ when $q>1$. By Jensen's inequality for convex functions:

\frac{1}{S}\sum_{i=1}^{S}p_i^{q}\geq\Big(\frac{1}{S}\sum_{i=1}^{S}p_i\Big)^{q}=S^{-q}.  (2)

Thus, ${}^{q}\!D(\boldsymbol{p})\leq S$. The equality holds if and only if $\boldsymbol{p}=\boldsymbol{e}$.
Case 2: $q<1$. The function $f(p)=p^{q}$ is concave on $[0,1]$ when $q<1$. The reverse of inequality (2) holds, and hence $\sum_{i=1}^{S}p_i^{q}\leq S^{1-q}$, so we get the same inequality ${}^{q}\!D(\boldsymbol{p})\leq S$ due to $1-q>0$. Similarly, the equality holds if and only if $\boldsymbol{p}=\boldsymbol{e}$. Thus, in both cases, the maximum Hill number is $S$ and is attained uniquely at the uniform distribution $\boldsymbol{e}$. ∎

Extending this result, over a $(K-1)$-dimensional simplex $\Delta_{K-1}$ (i.e., considering only $K$ out of $S$ species), the Hill number ${}^{q}\!D(\boldsymbol{p})$ attains its maximum value $K$ when $\boldsymbol{p}$ is the uniform distribution over those $K$ species, with $p_i=\frac{1}{K}$ for the $K$ species and $p_i=0$ for the remaining $S-K$ species. In this way, the Hill number satisfies $1\leq{}^{q}\!D(\boldsymbol{p})\leq S$. When ${}^{q}\!D(\boldsymbol{p})=1$, the community is entirely dominated by a single species, indicating the lowest possible diversity. Conversely, when ${}^{q}\!D(\boldsymbol{p})=S$, the community has maximum diversity with species occurring in equal abundances. Therefore, the Hill number ${}^{q}\!D(\boldsymbol{p})$ represents the effective number of species, providing a meaningful measure of diversity that accounts for both species richness and evenness. It is helpful to compare values of the Hill number across different ecosystems or treatments, considering both the index value and how close it is to its theoretical maximum for that number of species, see Chao (1987) for ecological perspectives.

However, in real ecosystems, perfect evenness is rare, and maximizing diversity may not always be ecologically desirable or feasible, see Colwell and Coddington (1994). Maximizing the Hill number might conflict with other ecological goals or constraints. To address this, we introduce a linear constraint on the species distribution to reflect real-world ecological limits like resource availability, climate tolerance, and habitat capacity. We fix a linear constraint

\sum_{i=1}^{S}a_i p_i=C,

where the $a_i$'s and $C$ are fixed constants. Here, without loss of generality, we assume each $a_i$ is positive. The constant $C$ reflects a fixed ecological limit, which lies within the range $a_{\min}\leq C\leq a_{\max}$ to ensure the existence of a feasible $\boldsymbol{p}=(p_i)$, where $a_{\min}=\min_i a_i$ and $a_{\max}=\max_i a_i$. In practice, the value of $C$ is often set as $C=\sum_{i=1}^{S}a_i\hat{\pi}_i$, where $\hat{\pi}_i$ represents an initial estimate of species proportions based on preliminary field data or other relevant ecological information. This approach anchors $C$ in observed or estimated ecological conditions, providing a realistic basis for the constraint on the species distribution. The coefficients $a_i$ can represent ecological factors such as resource preference, habitat suitability, or species-specific traits. For instance, in a desert ecosystem, $a_i$ might represent drought tolerance in plants, favoring species with high $a_i$ values as they are better adapted to dry conditions. The distribution can reflect species' access to shared resources under competition. Species with high $a_i$ values may outcompete others in resource acquisition, resulting in larger $p_i$ values.

In this context, the linear constraint model is particularly relevant for habitats where specific resources are scarce and heavily contested. Such constraints shape community structures by influencing which species thrive, based on factors such as resource efficiency or adaptability to environmental conditions. For a general value of $q$ in the Hill number framework, the maximum Hill-number distribution under a linear constraint is characterized by a power-law form in $q$ as follows.

Proposition 2.

Assume a linear constraint $\sum_{i=1}^{S}a_i p_i=C$, where the $a_i$'s and $C$ are constants.

  1. For $q<1$, the Hill number ${}^{q}\!D(\boldsymbol{p})$ is maximized at

    \hat{p}_{\theta i}=\frac{(1+(q-1)\theta a_i)^{\frac{1}{q-1}}}{\sum_{j=1}^{S}(1+(q-1)\theta a_j)^{\frac{1}{q-1}}}\quad(i=1,\ldots,S),  (3)

    if $C$ satisfies $C_{\boldsymbol{a}}\leq C\leq a_{\max}$, where $\theta$ is determined by the constraint. Here

    C_{\boldsymbol{a}}=\frac{\sum_{i=1}^{S}a_i^{\frac{q}{q-1}}}{\sum_{i=1}^{S}a_i^{\frac{1}{q-1}}}.
  2. For $q>1$, the Hill number ${}^{q}\!D(\boldsymbol{p})$ is maximized at

    \hat{p}_{\theta i}=\frac{[1+(q-1)\theta a_i]_{+}^{\frac{1}{q-1}}}{\sum_{j=1}^{S}[1+(q-1)\theta a_j]_{+}^{\frac{1}{q-1}}}\quad(i=1,\ldots,S),  (4)

    if $C$ satisfies $a_{\min}\leq C\leq C_{\boldsymbol{a}}$, where $[x]_{+}=\max\{x,0\}$ and $\theta$ is determined by the constraint.

Proof.

Our goal is to maximize ${}^{q}\!D(\boldsymbol{p})$ under the constraints $\sum_{i=1}^{S}p_i=1$ and $\sum_{i=1}^{S}a_i p_i=C$.
Case 1: $q<1$. Since ${}^{q}\!D(\boldsymbol{p})$ is an increasing function of $\sum_{i=1}^{S}p_i^{q}$ when $q<1$, maximizing ${}^{q}\!D(\boldsymbol{p})$ is equivalent to maximizing $\sum_{i=1}^{S}p_i^{q}$. We construct the Lagrangian:

\mathcal{L}=\frac{1}{q}\sum_{i=1}^{S}p_i^{q}-\lambda\Big(\sum_{i=1}^{S}a_i p_i-C\Big)-\mu\Big(\sum_{i=1}^{S}p_i-1\Big)-\sum_{i=1}^{S}\nu_i p_i,

where $\mu$, $\lambda$, and $\nu_i\geq 0$ are Lagrange multipliers. Taking the derivative of $\mathcal{L}$ with respect to $p_i$ and setting it to zero gives:

\frac{\partial\mathcal{L}}{\partial p_i}=p_i^{q-1}-\lambda a_i-\mu-\nu_i=0.

If $p_i>0$, then $\nu_i=0$, and thus $p_i^{q-1}=\mu+\lambda a_i$. If $p_i=0$, then $p_i^{q-1}$ diverges because $q<1$, so all $p_i$ must be positive. Setting $\theta=\lambda/\{\mu(q-1)\}$, we obtain the global maximizer $\hat{p}_{\theta i}$ in (3) since $(\hat{p}_{\theta i})_i$ satisfies the KKT conditions.

Next, we address the existence of $\theta$ with $(q-1)\theta\in(-1/a_{\max},\infty)$ such that $(\hat{p}_{\theta i})$ satisfies the linear constraint. Let $C_{\theta}=\sum_{i=1}^{S}a_i\hat{p}_{\theta i}$. Then,

\nabla_{\theta}\log C_{\theta}=\frac{S_1S_4-S_2S_3}{S_2S_4},

where $S_1=\sum_i a_i^2(1+(q-1)\theta a_i)^{\frac{2-q}{q-1}}$, $S_2=\sum_i a_i(1+(q-1)\theta a_i)^{\frac{1}{q-1}}$, $S_3=\sum_i a_i(1+(q-1)\theta a_i)^{\frac{2-q}{q-1}}$, and $S_4=\sum_i(1+(q-1)\theta a_i)^{\frac{1}{q-1}}$. Hence,

S_1S_4-S_2S_3=\sum_{i,j}(1+(q-1)\theta a_i)^{\frac{2-q}{q-1}}(1+(q-1)\theta a_j)^{\frac{2-q}{q-1}}(a_i^2-a_ia_j).

If $\tilde{p}_i=(1+(q-1)\theta a_i)^{\frac{2-q}{q-1}}/\sum_j(1+(q-1)\theta a_j)^{\frac{2-q}{q-1}}$, then

S_1S_4-S_2S_3\propto\sum_i a_i^2\tilde{p}_i-\Big(\sum_i a_i\tilde{p}_i\Big)^{2}.

This implies $S_1S_4-S_2S_3\geq 0$, or equivalently $\nabla_{\theta}\log C_{\theta}\geq 0$, and hence $C_{\theta}$ is monotone increasing in $\theta$. Therefore, $C_{\boldsymbol{a}}\leq C_{\theta}\leq a_{\max}$, noting $\lim_{\theta\rightarrow-\infty}C_{\theta}=C_{\boldsymbol{a}}$. This ensures that there exists $\theta$ with $(q-1)\theta\in(-1/a_{\max},\infty)$ such that $(\hat{p}_{\theta i})$ satisfies the linear constraint.

Case 2: $q>1$. Similarly, by the monotonicity argument, the Lagrangian is the same as $\mathcal{L}$, and a similar argument applies for the KKT conditions. If $p_i>0$, then $\nu_i=0$, and thus $p_i^{q-1}=\mu+\lambda a_i$. If $p_i=0$, then the stationarity condition yields $\nu_i=-(\mu+\lambda a_i)\geq 0$. This yields the equilibrium distribution $\hat{p}_{\theta i}$ in (4), which satisfies the KKT conditions. Since $C_{\theta}$ is monotone increasing, $C_{\theta}$ converges to $C_{\boldsymbol{a}}$ as $\theta$ goes to $\infty$. Therefore, $a_{\min}\leq C_{\theta}\leq C_{\boldsymbol{a}}$, which ensures the existence of $\theta$ with $(q-1)\theta\in(-1/a_{\min},\infty)$ such that $(\hat{p}_{\theta i})$ satisfies the linear constraint. The proof is complete. ∎

We note that, if $q=1$, then the equilibrium is the well-known maximum entropy distribution:

\hat{p}_{\theta i}=\frac{\exp(\theta a_i)}{\sum_{j=1}^{S}\exp(\theta a_j)}\quad(i=1,\ldots,S),

in which the equilibrium distribution can be defined for any $C$ of the fully feasible interval $(a_{\min},a_{\max})$. This is referred to as a softmax function, which is widely utilized in the field of machine learning. It is also closely related to the MaxEnt model, which is important as a species distribution model, see Phillips et al. (2004).

To illustrate the application of Proposition 2, we consider a community with $101$ species, where the trait $a_i$ for species $i$ is given by

a_i=3\exp\{-(i-20)^2/100\}+\exp\{-(i-80)^2/100\}\quad(i=1,\ldots,101).

This trait distribution creates two peaks, representing species with higher ecological advantages. We compute the maximum ${}^{q}\!D$ distributions under the linear constraint $\sum_{i=1}^{101}a_i p_i=C$ with $C=0.2$ and $C=0.5$ for $q=2$ and $q=0.7$, respectively. The results are shown in Fig. 1. The left panel of Fig. 1 displays the trait values $a_i$ against species $i$. The right panel shows the optimized species distribution for $q=2$ (squares) and $q=0.7$ (circles). For $q=2$, the distribution assigns zero probability to species with indices between $i=11$ and $i=30$, indicating that these species are excluded from the optimal distribution due to the emphasis on common species. For $q=0.7$, the distribution includes all species with non-zero probabilities, promoting evenness across the species. Indeed, for $q=0.7$, the Hill number is more sensitive to rare species, resulting in a more uniform distribution of probabilities among the species. We note that this numerical study can be conducted by solving only a one-parameter equation for $\theta$ to satisfy the linear constraint, thanks to the result of Proposition 2. Thus, the optimization problem with 101 variables is drastically reduced to a one-dimensional one.

Figure 1: The Max Hill number distributions
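The one-parameter reduction above is straightforward to implement. The sketch below (a minimal illustration with a hypothetical three-species trait vector rather than the 101-species example) evaluates the form of Proposition 2 for $q=2$ and solves the single equation $C_\theta=C$ by bracketing; the bracket is chosen by hand for this example.

```python
import numpy as np
from scipy.optimize import brentq

def p_theta(theta, a, q):
    """Maximum Hill-number form of Proposition 2: p_i proportional to [1+(q-1)*theta*a_i]_+^{1/(q-1)}."""
    w = np.clip(1.0 + (q - 1.0) * theta * a, 0.0, None) ** (1.0 / (q - 1.0))
    return w / w.sum()

def solve_theta(a, C, q, bracket):
    """Find theta such that the linear constraint a.p = C holds (C_theta is monotone in theta)."""
    return brentq(lambda t: a @ p_theta(t, a, q) - C, *bracket)

a = np.array([1.0, 2.0, 3.0])            # hypothetical trait values
theta = solve_theta(a, C=1.5, q=2.0, bracket=(-0.9, 1.0))
print(theta, p_theta(theta, a, 2.0))     # theta = -0.3, p = (0.583..., 0.333..., 0.083...)
```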

We discuss the ecological meaning of $C_{\boldsymbol{a}}$, which represents a critical value in the linear constraint. If $C$ satisfies $C<C_{\boldsymbol{a}}$ in the case $q<1$, then the optimal distribution becomes degenerate, meaning that it assigns zero probability to some species. This occurs because the resource constraint $C$ is too restrictive to allow for a distribution where all species have positive abundances. Ecologically, this reflects situations where only species with certain trait values $a_i$ can survive under the given constraints, leading to competitive exclusion and reduced diversity. Thus, the ecosystem cannot sustain all species at non-zero abundances without violating resource constraints. Fig. 2 gives a 3-D plot of the Hill number ${}^{q}\!D(\boldsymbol{p})$ against $\boldsymbol{p}$ over a $2$-dimensional simplex with $q=1$. On the surface, the curve $\{\hat{\boldsymbol{p}}_{\theta}\}_{\theta}$ passes through the center point, which attains the global maximum $3$ of ${}^{q}\!D(\boldsymbol{p})$.

Figure 2: Plots of the Max Hill number

Fig. 3 gives a contour plot of the Hill number ${}^{q}\!D(\boldsymbol{p})$ against $\boldsymbol{p}$ over a $2$-dimensional simplex with $q=1$. In the simplex, the curve representing the model $\{\boldsymbol{p}_{\theta}\}_{\theta}$ intersects the line defined by the linear constraint at the point of the conditional maximum; the center point attains the global maximum $3$ of ${}^{q}\!D(\boldsymbol{p})$.

Figure 3: Plots of the Max Hill number

On the other hand, Rao’s quadratic entropy stands out for its ability to incorporate not only species abundance but also functional and genetic dissimilarities. Let us look into the distribution maximizing Rao’s quadratic entropy.

Proposition 3.

Assume that $\boldsymbol{W}$ is invertible and that $\boldsymbol{W}^{-1}\mathbf{1}>\mathbf{0}$ in Rao's quadratic entropy $\mathrm{Q}(\boldsymbol{p})=\boldsymbol{p}^{\top}\boldsymbol{W}\boldsymbol{p}$, where $\mathbf{1}$ denotes the vector of ones. Then, $\mathrm{Q}(\boldsymbol{p})$ is maximized at

\hat{\boldsymbol{p}}_{0}^{(\mathrm{Q})}=\frac{\boldsymbol{W}^{-1}\mathbf{1}}{\mathbf{1}^{\top}\boldsymbol{W}^{-1}\mathbf{1}}.  (5)
Proof.

We introduce the Lagrangian function with the multipliers $\mu$ and $\boldsymbol{\nu}$ for $\boldsymbol{p}$ to be in $\Delta_{S-1}$:

\mathcal{L}=\frac{1}{2}\boldsymbol{p}^{\top}\boldsymbol{W}\boldsymbol{p}-\mu\left(\mathbf{1}^{\top}\boldsymbol{p}-1\right)-\boldsymbol{\nu}^{\top}\boldsymbol{p}.

The stationarity condition yields $\boldsymbol{W}\boldsymbol{p}-\mu\mathbf{1}-\boldsymbol{\nu}=\mathbf{0}$. If $p_i>0$, then $\nu_i=0$, and we have $(\boldsymbol{W}\boldsymbol{p})_i=\mu$. Therefore, $\boldsymbol{W}\boldsymbol{p}=\mu\mathbf{1}$. Solving for $\boldsymbol{p}$, we get $\boldsymbol{p}=\boldsymbol{W}^{-1}\mathbf{1}/(\mathbf{1}^{\top}\boldsymbol{W}^{-1}\mathbf{1})$. The positivity of $\boldsymbol{W}^{-1}\mathbf{1}$ ensures that $\boldsymbol{p}\geq\mathbf{0}$ and the KKT conditions are satisfied. ∎

We discuss a case where the entries of $\boldsymbol{W}^{-1}\mathbf{1}$ have both positive and negative signs. The optimal distribution then lies on the boundary of $\Delta_{S-1}$; however, no closed form is derived here. For example, consider the following dissimilarity matrices:

\boldsymbol{W}_1=\begin{bmatrix}0&1&2&3\\ 1&0&3&2\\ 2&3&0&2\\ 3&2&2&0\end{bmatrix},\quad \boldsymbol{W}_2=\begin{bmatrix}0&1&2&3\\ 1&0&2&1\\ 2&2&0&1\\ 3&1&1&0\end{bmatrix}.

Then, we find $\boldsymbol{W}_1^{-1}\mathbf{1}=(\frac{3}{23},\frac{3}{23},\frac{4}{23},\frac{4}{23})^{\top}$, which satisfies the assumption of Proposition 3. The maximum of Rao's entropy defined by $\boldsymbol{W}_1$ is $\frac{23}{14}$, attained at the optimal distribution $(\frac{3}{14},\frac{3}{14},\frac{2}{7},\frac{2}{7})^{\top}$. On the other hand, $\boldsymbol{W}_2^{-1}\mathbf{1}=(-2,4,3,-3)^{\top}$, which violates the assumption of Proposition 3. Indeed, the optimal distribution is given by $(\frac{1}{2},0,0,\frac{1}{2})^{\top}$ on the boundary, at which Rao's entropy attains a maximum of $\frac{3}{2}$. The species with zero probabilities are effectively excluded from the ecosystem's diversity under the entropy maximization principle, meaning they “cannot survive” within the model's framework.
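These two cases are easy to verify numerically; the short script below (our own check, not from the paper) computes $\boldsymbol{W}^{-1}\mathbf{1}$ for each matrix and, when all entries are positive, the interior maximizer of Proposition 3.

```python
import numpy as np

one = np.ones(4)
W1 = np.array([[0, 1, 2, 3], [1, 0, 3, 2], [2, 3, 0, 2], [3, 2, 2, 0]], dtype=float)
W2 = np.array([[0, 1, 2, 3], [1, 0, 2, 1], [2, 2, 0, 1], [3, 1, 1, 0]], dtype=float)

x1 = np.linalg.solve(W1, one)          # W1^{-1} 1 = (3, 3, 4, 4) / 23, all positive
p_hat = x1 / x1.sum()                  # interior maximizer of Proposition 3
print(x1, p_hat, p_hat @ W1 @ p_hat)   # attained value 23/14

print(np.linalg.solve(W2, one))        # (-2, 4, 3, -3): mixed signs, formula (5) does not apply
```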

If all species are equally dissimilar, then the global maximum distribution $\hat{\boldsymbol{p}}_0^{(\mathrm{Q})}$ equals the uniform distribution $\boldsymbol{e}$. Indeed, consider the case where $\boldsymbol{W}=\mathbf{1}\mathbf{1}^{\top}-{\bf I}$, where ${\bf I}$ denotes the identity matrix. Then, $\boldsymbol{W}^{-1}=\frac{1}{S-1}\mathbf{1}\mathbf{1}^{\top}-{\bf I}$, and hence $\hat{\boldsymbol{p}}_0^{(\mathrm{Q})}$ equals the uniform distribution $\boldsymbol{e}$. This is the same as the maximizer of ${}^{q}\!D(\boldsymbol{p})$, noting that Rao's quadratic entropy is reduced to the Gini-Simpson index. However, in most realistic scenarios, species differ in their functional or genetic traits, leading to variations in dissimilarities. Therefore, $\hat{\boldsymbol{p}}_0^{(\mathrm{Q})}$ typically differs from $\boldsymbol{e}$, giving higher probabilities to species more dissimilar to others. Next, consider another simple case where $\boldsymbol{W}_{\boldsymbol{\pi}}=\boldsymbol{\pi}\boldsymbol{\pi}^{\top}-{\rm diag}(\boldsymbol{\pi}^{2})$ for a fixed $\boldsymbol{\pi}$ of $\Delta_{S-1}$, where ${\rm diag}$ denotes a diagonal matrix. Thus,

\boldsymbol{W}_{\boldsymbol{\pi}}^{-1}=\frac{1}{S-1}\boldsymbol{\pi}^{-1}(\boldsymbol{\pi}^{-1})^{\top}-{\rm diag}(\boldsymbol{\pi}^{-2}),

and hence $(\boldsymbol{W}_{\boldsymbol{\pi}}^{-1}\mathbf{1})_i=\frac{\mathbf{1}^{\top}\boldsymbol{\pi}^{-1}}{S-1}\pi_i^{-1}-\pi_i^{-2}$. This yields that, if $\pi_i<\frac{S-1}{\mathbf{1}^{\top}\boldsymbol{\pi}^{-1}}$, then $(\boldsymbol{W}_{\boldsymbol{\pi}}^{-1}\mathbf{1})_i<0$, and the probability of species $i$ is zero in the maximum distribution. Further, assume $\boldsymbol{\pi}=(\alpha\tilde{\mathbf{1}},\beta\tilde{\mathbf{1}})/\{m(\alpha+\beta)\}$, where $S=2m$ and $\tilde{\mathbf{1}}$ is the $m$-dimensional vector of ones. Then, a straightforward calculation shows that, if

\Big|\frac{\alpha}{\alpha+\beta}-\frac{1}{2}\Big|\leq\frac{1}{2(2m-1)},

then $\hat{\boldsymbol{p}}_0^{(\mathrm{Q})}$ defined in (5) is in $\Delta_{S-1}$. This situation is quite close to the case $\alpha=\beta$, that is, the equally dissimilar case for $\boldsymbol{W}$. In general, it is difficult to obtain a closed form of the maximum distribution without the assumption $\boldsymbol{W}^{-1}\mathbf{1}>\mathbf{0}$. However, we can use efficient algorithms to find the maximizer numerically, since the optimization problem is reduced to a standard quadratic programming problem. Fast solvers like interior-point methods, active-set methods, or the alternating direction method of multipliers can handle this formulation. Tools such as quadprog in R and Python or specialized libraries like Gurobi or CVXPY provide robust implementations for high-dimensional problems. See Dostál (2009) for quadratic programming algorithms.
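As a minimal numerical sketch (using a general-purpose local solver from SciPy rather than the dedicated QP tools named above), one can search for the maximizer of $\mathrm{Q}(\boldsymbol{p})$ over the simplex as follows; for a certified solution on larger problems the quadratic-programming solvers mentioned in the text are preferable.

```python
import numpy as np
from scipy.optimize import minimize

def max_rao_entropy(W, p0=None):
    """Numerically maximize Q(p) = p^T W p over the simplex (local solver, sketch only)."""
    S = W.shape[0]
    p0 = np.full(S, 1.0 / S) if p0 is None else p0
    res = minimize(lambda p: -p @ W @ p,               # maximize Q by minimizing -Q
                   p0,
                   jac=lambda p: -(W + W.T) @ p,
                   bounds=[(0.0, 1.0)] * S,
                   constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1.0}],
                   method="SLSQP")
    return res.x

W2 = np.array([[0, 1, 2, 3], [1, 0, 2, 1], [2, 2, 0, 1], [3, 1, 1, 0]], dtype=float)
p_hat = max_rao_entropy(W2)
print(p_hat, p_hat @ W2 @ p_hat)   # compare with the boundary solution (1/2, 0, 0, 1/2) above
```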

Next, consider a linear constraint $\boldsymbol{a}^{\top}\boldsymbol{p}=C$, where $\boldsymbol{a}\geq\mathbf{0}$ and $C$ are fixed constants. Then, under the linear constraint, we similarly introduce the Lagrangian function:

\frac{1}{2}\mathrm{Q}(\boldsymbol{p})+\mu\left(\mathbf{1}^{\top}\boldsymbol{p}-1\right)+\lambda\left(\boldsymbol{a}^{\top}\boldsymbol{p}-C\right).

The stationarity equation is given by $\boldsymbol{W}\boldsymbol{p}+\lambda\boldsymbol{a}+\mu\mathbf{1}=\mathbf{0}$, which yields the solution form:

\hat{\boldsymbol{p}}_{\theta}^{(\mathrm{Q})}=\frac{\boldsymbol{W}^{-1}(\mathbf{1}+\theta\boldsymbol{a})}{\mathbf{1}^{\top}\boldsymbol{W}^{-1}(\mathbf{1}+\theta\boldsymbol{a})}.  (6)

Here $\theta$ is determined by the linear constraint, so that

\theta=\frac{(C\mathbf{1}-\boldsymbol{a})^{\top}\boldsymbol{W}^{-1}\mathbf{1}}{(\boldsymbol{a}-C\mathbf{1})^{\top}\boldsymbol{W}^{-1}\boldsymbol{a}}.

If we assume $\boldsymbol{W}^{-1}(\mathbf{1}+\theta\boldsymbol{a})>\mathbf{0}$, then $\hat{\boldsymbol{p}}_{\theta}^{(\mathrm{Q})}$ is indeed the distribution maximizing Rao's entropy under the constraint, by a discussion similar to the proof of Proposition 3.
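A small sketch of the constrained maximizer (6) follows (the trait vector $\boldsymbol{a}$ and the level $C$ are hypothetical); it assumes $\boldsymbol{W}$ is symmetric and that the resulting vector stays nonnegative, as required by the interior KKT argument.

```python
import numpy as np

def constrained_rao_max(W, a, C):
    """Constrained maximizer (6): p proportional to W^{-1}(1 + theta*a), theta fixed by a^T p = C."""
    one = np.ones(W.shape[0])
    Wi1, Wia = np.linalg.solve(W, one), np.linalg.solve(W, a)
    theta = (C * one - a) @ Wi1 / ((a - C * one) @ Wia)
    p = Wi1 + theta * Wia
    return p / p.sum(), theta

W1 = np.array([[0, 1, 2, 3], [1, 0, 3, 2], [2, 3, 0, 2], [3, 2, 2, 0]], dtype=float)
a = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical trait values
p, theta = constrained_rao_max(W1, a, C=2.6)
print(p, a @ p)                       # the constraint a^T p = 2.6 is reproduced
```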

Let us examine the maximum Leinster-Cobbold index distribution. This index integrates the Hill number with Rao's quadratic entropy, providing a unified framework for measuring biodiversity that accounts for both species abundances and similarities. We can find the maximizing distribution by combining the arguments of Propositions 2 and 3.

Proposition 4.

Assume that $\boldsymbol{Z}$ is invertible and that $\boldsymbol{Z}^{-1}\mathbf{1}>\mathbf{0}$ in the Leinster-Cobbold index:

{}^{q}\!D_{\boldsymbol{Z}}(\boldsymbol{p})=\left(\boldsymbol{p}^{\top}(\boldsymbol{Z}\boldsymbol{p})^{q-1}\right)^{\frac{1}{1-q}},

where $\boldsymbol{y}^{x}$ is defined as the vector $(y_i^{x})$ for a vector $\boldsymbol{y}=(y_i)$. Then, the distribution maximizing ${}^{q}\!D_{\boldsymbol{Z}}(\boldsymbol{p})$ in $\boldsymbol{p}$ is given by

\hat{\boldsymbol{p}}_{\rm LC}^{(q)}=\boldsymbol{Z}^{-1}\mathbf{1}/(\mathbf{1}^{\top}\boldsymbol{Z}^{-1}\mathbf{1}).  (7)
Proof.

The Lagrangian function with the multipliers $\mu$ and $\boldsymbol{\nu}$ is given by

\boldsymbol{p}^{\top}(\boldsymbol{Z}\boldsymbol{p})^{q-1}-\mu\left(\mathbf{1}^{\top}\boldsymbol{p}-1\right)+\boldsymbol{\nu}^{\top}\boldsymbol{p},

which leads to the stationarity equation,

(\boldsymbol{Z}\boldsymbol{p})^{q-1}+(q-1)\boldsymbol{Z}\big(\boldsymbol{p}\odot(\boldsymbol{Z}\boldsymbol{p})^{q-2}\big)-\mu\mathbf{1}+\boldsymbol{\nu}=\mathbf{0},

where $\odot$ denotes the Hadamard product. If $\boldsymbol{p}>\mathbf{0}$, then $\boldsymbol{\nu}=\mathbf{0}$ by complementary slackness. Let $\boldsymbol{p}=\sigma\boldsymbol{Z}^{-1}\mathbf{1}$, where $\sigma>0$. Then, the stationarity equation becomes

\sigma^{q-1}\mathbf{1}+(q-1)\sigma^{q-1}\mathbf{1}-\mu\mathbf{1}=\mathbf{0},

which implies that $\boldsymbol{p}\propto\boldsymbol{Z}^{-1}\mathbf{1}$ is a candidate solution. Therefore, the maximum distribution is given by (7). ∎

It is worthwhile to note that $\hat{\boldsymbol{p}}_{\rm LC}^{(q)}$ is independent of $q$, and has the same form as the maximum Rao's entropy distribution $\hat{\boldsymbol{p}}_0^{(\mathrm{Q})}$ defined in (5). The distribution maximizing the index under a linear constraint $\boldsymbol{a}^{\top}\boldsymbol{p}=C$ is given by

\hat{\boldsymbol{p}}_{\theta}^{({\rm LC})}=\frac{\boldsymbol{Z}^{-1}\big(\mathbf{1}+(q-1)\theta\boldsymbol{a}\big)^{\frac{1}{q-1}}}{\mathbf{1}^{\top}\boldsymbol{Z}^{-1}\big(\mathbf{1}+(q-1)\theta\boldsymbol{a}\big)^{\frac{1}{q-1}}},  (8)

where $\theta$ is determined by $C$. We confirm that, if $q=2$, then $\hat{\boldsymbol{p}}_{\theta}^{({\rm LC})}$ reduces to $\hat{\boldsymbol{p}}_{\theta}^{(\mathrm{Q})}$ defined in (6) with a re-parameterized $\theta$; if $\boldsymbol{Z}={\bf I}$, then it reduces to $\hat{\boldsymbol{p}}_{\theta}$ in (3). Thus, this effectively integrates both maximum distributions. Indeed, the stationarity condition for the constrained problem is written as

q(\boldsymbol{Z}\boldsymbol{p})^{q-1}+\lambda\boldsymbol{a}+\mu\mathbf{1}=\mathbf{0}.

This gives $(\boldsymbol{Z}\boldsymbol{p})^{q-1}\propto\mathbf{1}+(q-1)\theta\boldsymbol{a}$, which leads to the solution form (8).
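The constrained maximizer (8) can be computed in the same one-parameter fashion as before; the sketch below uses a hypothetical similarity matrix and trait vector and assumes the resulting vector stays nonnegative on the chosen bracket.

```python
import numpy as np
from scipy.optimize import brentq

def lc_constrained_max(Z, a, C, q, bracket):
    """Form (8): p proportional to Z^{-1}(1 + (q-1)*theta*a)^{1/(q-1)}, theta fixed by a^T p = C."""
    one = np.ones(Z.shape[0])

    def p_theta(theta):
        v = np.linalg.solve(Z, (one + (q - 1.0) * theta * a) ** (1.0 / (q - 1.0)))
        return v / v.sum()

    theta = brentq(lambda t: a @ p_theta(t) - C, *bracket)
    return p_theta(theta)

Z = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.4], [0.2, 0.4, 1.0]])   # hypothetical similarity matrix
a = np.array([1.0, 2.0, 3.0])                                       # hypothetical trait values
p = lc_constrained_max(Z, a, C=2.0, q=2.0, bracket=(-0.2, 1.0))
print(p, a @ p)
```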

The Leinster-Cobbold index serves as a unifying framework that encompasses both the Hill numbers and Rao's quadratic entropy. The inverse similarity matrix $\boldsymbol{Z}^{-1}$ adjusts species abundances based on their functional or phylogenetic relationships. High $q$ values focus on dominant species or those with higher contributions to similarity; low $q$ values emphasize rare species or those with unique traits.

The maximum diversity distribution derived under the linear constraint $\boldsymbol{a}^{\top}\boldsymbol{p}=C$ has ecological interpretations that bridge mathematical optimization and real-world ecological dynamics. The coefficients $a_i$ represent species-specific traits or environmental tolerances that directly influence each species' ability to thrive under certain ecological conditions. For instance, in ecosystems where a specific resource is limited, $a_i$ could represent each species' efficiency in utilizing that resource. Species with higher $a_i$ values are more efficient and thus have a competitive advantage, leading to higher abundances $p_i$. Maximizing the Leinster-Cobbold index under this constraint does not lead to perfect evenness, which is rarely observed in natural ecosystems. Instead, it yields a distribution that balances diversity with ecological realism, reflecting the natural dominance of certain species due to their advantageous traits. This mathematical optimization highlights the trade-offs between achieving maximum diversity and adhering to ecological constraints, demonstrating that the most diverse community under a given constraint proportionally represents species according to their ecological roles and advantages. This approach offers a quantitative framework to model and analyze community structures under various ecological scenarios. By adjusting $a_i$ and $C$, ecologists can simulate different environmental conditions and assess their impact on diversity and species distributions.

4 Information geometry on a simplex

In this section, we explore the general properties of a simplex within the framework of information geometry. Information geometry provides a powerful way to study statistical models by viewing them as Riemannian manifolds endowed with the Fisher-Rao metric, see Rao (1945). The Fisher-Rao metric provides the Cramér-Rao lower bound for unbiased estimators. This perspective allows us to examine geometric structures such as geodesics, divergence measures, and metric tensors, which have profound implications in statistical inference and diversity measurement.

We give a concise review of the information-geometric structure of a simplex. Let $\Delta_{S-1}$ be the simplex of dimension $S-1$, or

\Delta_{S-1}=\big\{\boldsymbol{p}\in\mathbb{R}^{S}:\mathbf{1}^{\top}\boldsymbol{p}=1,\ \boldsymbol{p}\geq\mathbf{0}\big\}.

Henceforth, we identify $\Delta_{S-1}$ with the space of all $S$-variate categorical distributions. The Fisher-Rao metric $g$ on $\Delta_{S-1}$ is given by the information matrix:

\boldsymbol{G}_{\boldsymbol{p}}={\rm diag}(\boldsymbol{p}^{-1})+\frac{1}{p_S}\mathbf{1}\mathbf{1}^{\top}  (9)

where $p_S=1-\sum_{i=1}^{S-1}p_i$. The Fisher-Rao metric captures the intrinsic geometry of the probability simplex, reflecting the sensitivity of probability distributions to parameter changes. The metric $g$ induces geodesics that can be expressed using inverse trigonometric functions. The Fisher-Rao distance between two points $\boldsymbol{\pi}$ and $\boldsymbol{p}$ on the simplex $\Delta_{S-1}$ is given by:

d(\boldsymbol{\pi},\boldsymbol{p})=2\arccos\left(\sqrt{\boldsymbol{\pi}}^{\top}\sqrt{\boldsymbol{p}}\right).

The Riemannian geodesic connecting $\boldsymbol{\pi}$ and $\boldsymbol{p}$ under the Fisher-Rao metric can be expressed parametrically using trigonometric functions:

\boldsymbol{p}(t)=\left(\frac{\sin\left((1-t)\phi\right)}{\sin\phi}\sqrt{\boldsymbol{\pi}}+\frac{\sin\left(t\phi\right)}{\sin\phi}\sqrt{\boldsymbol{p}}\right)^{2}

for $t$, $0\leq t\leq 1$, where $\phi$ represents the angle between $\sqrt{\boldsymbol{\pi}}$ and $\sqrt{\boldsymbol{p}}$ on the sphere. This geodesic provides the shortest path between two probability distributions under the Fisher-Rao metric and is characterized by trigonometric functions arising from the geometry of the unit sphere.
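The distance and the geodesic can be evaluated directly through the square-root embedding onto the sphere, as in the following sketch (function names are ours).

```python
import numpy as np

def fisher_rao_distance(pi, p):
    """Fisher-Rao distance on the simplex via the square-root (spherical) embedding."""
    c = np.clip(np.sqrt(pi) @ np.sqrt(p), -1.0, 1.0)
    return 2.0 * np.arccos(c)

def fisher_rao_geodesic(pi, p, t):
    """Point at parameter t in [0, 1] on the Fisher-Rao geodesic from pi to p."""
    u, v = np.sqrt(pi), np.sqrt(p)
    phi = np.arccos(np.clip(u @ v, -1.0, 1.0))
    g = (np.sin((1 - t) * phi) * u + np.sin(t * phi) * v) / np.sin(phi)
    return g ** 2

pi = np.array([0.7, 0.2, 0.1])
p = np.array([0.2, 0.3, 0.5])
mid = fisher_rao_geodesic(pi, p, 0.5)
print(fisher_rao_distance(pi, p), mid, mid.sum())   # the midpoint again sums to 1
```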

In information geometry, two types of affine connections, the mixture and exponential connections, lead to dualistic interpretations between statistical models and inference, see Amari (1982), Amari and Nagaoka (2000), and Nielsen (2021). These connections give rise to m-geodesics and e-geodesics. For any distinct points $\boldsymbol{\pi}$ and $\boldsymbol{p}$ of $\Delta_{S-1}$, the m-geodesic connecting $\boldsymbol{\pi}$ and $\boldsymbol{p}$ is given by

\boldsymbol{p}^{\rm(m)}(t)=(1-t)\boldsymbol{\pi}+t\boldsymbol{p},\quad t\in[0,1];

the e-geodesic connecting $\boldsymbol{\pi}$ and $\boldsymbol{p}$ is given by

\boldsymbol{p}^{\rm(e)}(t)=\frac{\exp((1-t)\log\boldsymbol{\pi}+t\log\boldsymbol{p})}{\mathbf{1}^{\top}\exp((1-t)\log\boldsymbol{\pi}+t\log\boldsymbol{p})},\quad t\in[0,1],

where $\log\boldsymbol{p}$ denotes the vector of logarithms of the $p_i$. The e-geodesic corresponds to linear interpolation in the natural parameter space of the exponential family. It is noted that the range of $t$ for $\boldsymbol{p}^{\rm(m)}(t)$ can be extended to a closed interval including $[0,1]$, and the range of $t$ for $\boldsymbol{p}^{\rm(e)}(t)$ can be extended to $(-\infty,\infty)$; see the Appendix for a detailed discussion of the enlarged intervals. We observe an important example of the mixture geodesic in Rao's quadratic entropy. In effect, the maximum quadratic entropy distribution in (6) under the linear constraint is written as

\hat{\boldsymbol{p}}_{t}^{(\mathrm{Q})}=(1-t)\hat{\boldsymbol{p}}_{0}^{(\mathrm{Q})}+t\boldsymbol{p}_{\boldsymbol{a}}.

This is a mixture geodesic connecting $\hat{\boldsymbol{p}}_{0}^{(\mathrm{Q})}$ and $\boldsymbol{p}_{\boldsymbol{a}}$, where $\boldsymbol{p}_{\boldsymbol{a}}=\boldsymbol{W}^{-1}\boldsymbol{a}/(\mathbf{1}^{\top}\boldsymbol{W}^{-1}\boldsymbol{a})$. Thus, $\hat{\boldsymbol{p}}_{t}^{(\mathrm{Q})}$ attains the global maximum at $t=0$; it becomes a constrained maximum at the $t$ determined by the linear constraint. If we take $K$ distinct points $\boldsymbol{p}_{(1)},\ldots,\boldsymbol{p}_{(K)}$, then the e-geodesic family is given by

\boldsymbol{p}^{\rm(e)}_{\boldsymbol{t}}=Z_{\boldsymbol{t}}^{\rm(e)}\exp\big(t_1\log\boldsymbol{p}_{(1)}+\cdots+t_K\log\boldsymbol{p}_{(K)}\big),

where $\boldsymbol{t}=(t_i)$ is in a simplex $\Delta_{K-1}$ and $Z_{\boldsymbol{t}}^{\rm(e)}$ is the normalizing constant. In general, such a family $\mathcal{F}^{\rm(e)}=\{\boldsymbol{p}^{\rm(e)}_{\boldsymbol{t}}:\boldsymbol{t}\in\Delta_{K-1}\}$ is referred to as an exponential family. By definition, the e-geodesic connecting any points $\boldsymbol{\pi}$ and $\boldsymbol{p}$ of $\mathcal{F}^{\rm(e)}$ is included in $\mathcal{F}^{\rm(e)}$. Under the assumption of an exponential family, the Cramér-Rao inequality provides insight into the efficiency of estimators, making it clear when estimators are at or near this lower bound. It is shown that maximum likelihood estimators within the exponential family have optimal properties, often achieving the Cramér-Rao lower bound under regularity conditions, see Rao (1961). Further, in a curved model in $\mathcal{F}^{\rm(e)}$, Rao (1961) gives an insightful proof that the maximum likelihood estimator is second-order efficient, see also Efron (1975); Eguchi (1983).

As a generalized geodesic, we consider the $q$-geodesic connecting $\boldsymbol{\pi}$ with $\boldsymbol{p}$ defined by

\boldsymbol{p}^{(q)}_{t}=Z_{t}^{(q)}\{(1-t)\boldsymbol{\pi}^{q-1}+t\boldsymbol{p}^{q-1}\}^{\frac{1}{q-1}},\quad t\in[0,1],

where $q$ is a fixed real number and $Z_{t}^{(q)}$ is the normalizing constant, cf. Eguchi et al. (2011). We will observe a close relation to the maximum Hill-number distributions discussed in Section 3. By definition, the family of the $q$-geodesics includes the mixture geodesic and the exponential geodesic as

\boldsymbol{p}^{(2)}_{t}=\boldsymbol{p}^{\rm(m)}_{t},\quad\lim_{q\rightarrow 1}\boldsymbol{p}^{(q)}_{t}=\boldsymbol{p}^{\rm(e)}_{t}.

We note that the feasible range for $t$ can be extended from $[0,1]$ to a closed interval; see the Appendix for the exact form.
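A short numerical check of the $q$-geodesic and its limiting cases is given below (our own sketch): at $q=2$ it coincides with the mixture geodesic, and as $q\rightarrow 1$ it approaches the e-geodesic.

```python
import numpy as np

def q_geodesic(pi, p, t, q):
    """Point at parameter t on the q-geodesic between pi and p (q != 1)."""
    v = ((1 - t) * pi ** (q - 1) + t * p ** (q - 1)) ** (1.0 / (q - 1))
    return v / v.sum()

def e_geodesic(pi, p, t):
    """Exponential (e-) geodesic, the q -> 1 limit."""
    v = np.exp((1 - t) * np.log(pi) + t * np.log(p))
    return v / v.sum()

pi = np.array([0.7, 0.2, 0.1])
p = np.array([0.2, 0.3, 0.5])
t = 0.4
print(q_geodesic(pi, p, t, q=2.0), (1 - t) * pi + t * p)       # mixture geodesic at q = 2
print(q_geodesic(pi, p, t, q=1 + 1e-6), e_geodesic(pi, p, t))  # e-geodesic as q -> 1
```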

Similarly, for $K$ distinct points $\boldsymbol{p}_{(1)},\ldots,\boldsymbol{p}_{(K)}$, the $q$-geodesic family is given by $\mathcal{F}^{(q)}=\{\boldsymbol{p}^{(q)}_{\boldsymbol{t}}:\boldsymbol{t}\in\Delta_{K-1}\}$, where

\boldsymbol{p}^{(q)}_{\boldsymbol{t}}=Z_{\boldsymbol{t}}^{(q)}\big\{t_1\boldsymbol{p}_{(1)}^{\,q-1}+\cdots+t_K\boldsymbol{p}_{(K)}^{\,q-1}\big\}^{\frac{1}{q-1}}.

The $q$-geodesic connecting any two points $\boldsymbol{\pi}$ and $\boldsymbol{p}$ of $\mathcal{F}^{(q)}$ is included in the $q$-geodesic family.

Consider a linear constraint $\boldsymbol{a}^{\top}\boldsymbol{p}=C$ in $\Delta_{S-1}$. Then, the maximum Hill-number distribution

\hat{\boldsymbol{p}}_{\theta}=\frac{(\mathbf{1}+(q-1)\theta\boldsymbol{a})^{\frac{1}{q-1}}}{\mathbf{1}^{\top}(\mathbf{1}+(q-1)\theta\boldsymbol{a})^{\frac{1}{q-1}}}

with $q<1$ under the linear constraint lies along the $q$-geodesic connecting $\boldsymbol{e}$ and $\boldsymbol{p}_{\boldsymbol{a}}$, where $\boldsymbol{e}=\mathbf{1}/S$ and $\boldsymbol{p}_{\boldsymbol{a}}=\boldsymbol{a}^{\frac{1}{q-1}}/\mathbf{1}^{\top}\boldsymbol{a}^{\frac{1}{q-1}}$. Indeed, by the definition of the $q$-geodesic,

\boldsymbol{p}^{(q)}_{t}=\frac{\big((1-t)\boldsymbol{e}^{q-1}+t\boldsymbol{p}_{\boldsymbol{a}}^{q-1}\big)^{\frac{1}{q-1}}}{\mathbf{1}^{\top}\big((1-t)\boldsymbol{e}^{q-1}+t\boldsymbol{p}_{\boldsymbol{a}}^{q-1}\big)^{\frac{1}{q-1}}}

which can be written as

\boldsymbol{p}^{(q)}_{t}=\frac{\big((1-t)T\mathbf{1}+t\boldsymbol{a}\big)^{\frac{1}{q-1}}}{\mathbf{1}^{\top}\big((1-t)T\mathbf{1}+t\boldsymbol{a}\big)^{\frac{1}{q-1}}},

where $T=(\mathbf{1}^{\top}\boldsymbol{a}^{\frac{1}{q-1}})^{q-1}/S^{q-1}$. Therefore, $\boldsymbol{p}^{(q)}_{t}=\hat{\boldsymbol{p}}_{\theta}$ if $\theta$ is defined by $t/\big((1-t)T(q-1)\big)$. The $q$-geodesic provides a path of distributions that transition smoothly from maximum evenness $\boldsymbol{e}$ to a distribution that fully incorporates the constraint, $\boldsymbol{p}_{\boldsymbol{a}}$.

Consider a one-parameter family of probability distributions defined as:

{\mathcal{P}}_{\boldsymbol{a}}^{(q)}=\{\boldsymbol{p}_{\theta}^{(q)}=Z_{\theta}^{(q)}(\mathbf{1}+(q-1)\theta\boldsymbol{a})^{\frac{1}{q-1}}:\theta\in\Theta\}  (10)

and a subset of $\Delta_{S-1}$ constrained by a linear condition:

{\mathcal{H}}_{\boldsymbol{a}}(C)=\big\{\boldsymbol{p}\in\Delta_{S-1}:\boldsymbol{a}^{\top}\boldsymbol{p}=C\big\}.

Geometrically, this implies that the simplex $\Delta_{S-1}$ can be foliated into a family of hyperplanes $\{{\mathcal{H}}_{\boldsymbol{a}}(C)\}$, where each hyperplane intersects the one-parameter family ${\mathcal{P}}_{\boldsymbol{a}}^{(q)}$ at a single point:

{\mathcal{H}}_{\boldsymbol{a}}(C)\cap{\mathcal{P}}_{\boldsymbol{a}}^{(q)}=\{\boldsymbol{p}_{\theta^{*}}^{(q)}\},

where $\theta^{*}$ satisfies the condition $\boldsymbol{a}^{\top}\boldsymbol{p}_{\theta^{*}}^{(q)}=C$. This foliation provides a geometric perspective on the simplex: each hyperplane ${\mathcal{H}}_{\boldsymbol{a}}(C)$ forms a “leaf,” and the distribution of these leaves varies smoothly over $\Delta_{S-1}$. At every point within a leaf, the tangent space of the leaf aligns with a smoothly varying distribution over the simplex. Fig. 4 gives a 3-D plot of the foliation in a $3$-dimensional simplex $\Delta_3$. We observe that the curve representing the ray ${\mathcal{P}}_{\boldsymbol{a}}^{(q)}$ intersects the leaves ${\mathcal{H}}_{\boldsymbol{a}}(C)$ at the conditional maximum points, and the leaves are parallel to each other.

Figure 4: Plots of the foliation in $\Delta_3$.

The concepts of divergence have been well discussed in statistics, see Burbea and Rao (1982); Basu et al. (1998); Eguchi and Komori (2022). The $\gamma$-power cross-entropy between community distributions $\boldsymbol{\pi}$ and $\boldsymbol{p}$ can be given by

H_{\gamma}(\boldsymbol{\pi},\boldsymbol{p})=-\frac{1}{\gamma}\frac{\boldsymbol{\pi}^{\top}\boldsymbol{p}^{\gamma}}{\big(\mathbf{1}^{\top}\boldsymbol{p}^{\gamma+1}\big)^{\frac{\gamma}{\gamma+1}}}

with a power parameter $\gamma\in\mathbb{R}$, and the (diagonal) entropy is given by

H_{\gamma}(\boldsymbol{p})=-\frac{1}{\gamma}\big(\mathbf{1}^{\top}\boldsymbol{p}^{\gamma+1}\big)^{\frac{1}{\gamma+1}}

equating $\boldsymbol{\pi}$ with $\boldsymbol{p}$, see Fujisawa and Eguchi (2008); Eguchi (2024). When $\gamma=0$, $H_{\gamma}(\boldsymbol{p})$ is the Boltzmann-Shannon entropy; when $\gamma=-1$, $H_{\gamma}(\boldsymbol{p})$ is the geometric mean, $\prod_{i=1}^{S}p_i^{\frac{1}{S}}$; when $\gamma=-2$, $H_{\gamma}(\boldsymbol{p})$ is the harmonic mean, or $\big(\sum_{i=1}^{S}p_i^{-1}\big)^{-1}$. In essence, the $\gamma$-power entropy is equivalent to the Hill number through the relation ${}^{q}\!D(\boldsymbol{p})=\{-\gamma H_{\gamma}(\boldsymbol{p})\}^{-\frac{\gamma+1}{\gamma}}$ with $q=\gamma+1$. A fundamental inequality holds: $H_{\gamma}(\boldsymbol{\pi},\boldsymbol{p})\geq H_{\gamma}(\boldsymbol{\pi})$, where the difference is referred to as the $\gamma$-power divergence

{\mathcal{D}}_{\gamma}(\boldsymbol{\pi},\boldsymbol{p})=H_{\gamma}(\boldsymbol{\pi},\boldsymbol{p})-H_{\gamma}(\boldsymbol{\pi}).

In general, any statistical divergence is associated with a Riemannian metric and a pair of affine connections, see Eguchi (1992). The power cross-entropy has an empirical form for regression and classification models, for which the minimization problem gives robust estimation in the context of machine learning. When $\gamma$ equals $0$, $H_{\gamma}(\boldsymbol{p})$, $H_{\gamma}(\boldsymbol{\pi},\boldsymbol{p})$, and ${\mathcal{D}}_{\gamma}(\boldsymbol{\pi},\boldsymbol{p})$ reduce to the Boltzmann-Shannon entropy, the cross-entropy, and the Kullback-Leibler divergence, respectively. The empirical form of the cross-entropy under a parametric model is nothing but the negative log-likelihood function, see Eguchi and Copas (2006) for a detailed discussion. We explore statistical properties of estimation methods for the maximum Hill-number model ${\mathcal{P}}_{\boldsymbol{a}}^{(q)}$ defined in (10).
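For reference, the $\gamma$-power cross-entropy, entropy, and divergence in the displayed forms can be evaluated as follows (a minimal sketch with $\gamma=1$, i.e., $q=2$); the divergence is nonnegative and vanishes only when the two distributions coincide.

```python
import numpy as np

def gamma_cross_entropy(pi, p, gamma):
    """Gamma-power cross-entropy H_gamma(pi, p) as displayed above."""
    return -(pi @ p ** gamma) / (gamma * np.sum(p ** (gamma + 1)) ** (gamma / (gamma + 1)))

def gamma_entropy(p, gamma):
    """Diagonal gamma-power entropy H_gamma(p) = H_gamma(p, p)."""
    return -np.sum(p ** (gamma + 1)) ** (1.0 / (gamma + 1)) / gamma

def gamma_divergence(pi, p, gamma):
    return gamma_cross_entropy(pi, p, gamma) - gamma_entropy(pi, gamma)

pi = np.array([0.6, 0.3, 0.1])
p = np.array([0.3, 0.4, 0.3])
print(gamma_divergence(pi, p, 1.0))    # strictly positive since pi != p
print(gamma_divergence(pi, pi, 1.0))   # zero (up to rounding)
```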

Let 𝝅^\hat{\mbox{\boldmath$\pi$}} be an observed frequency vector in ΔS1\Delta_{S-1} derived from ecological research or auxiliary information, such as prior field surveys, species inventories, or environmental modeling outputs. Then, the linear constraint is fixed by 𝒂𝒑=C^\mbox{\boldmath$a$}^{\top}\mbox{\boldmath$p$}=\hat{C}, where C^=𝒂𝝅^\hat{C}=\mbox{\boldmath$a$}^{\top}\hat{\mbox{\boldmath$\pi$}}. In general, the maximum likelihood method is recognized as a universal method for estimating a general parametric model. Consider the standard maximum entropy model

𝒑θ(1)=exp(θ𝒂)/𝟏exp(θ𝒂).\displaystyle\mbox{\boldmath$p$}_{\theta}^{(1)}={\exp(\theta\mbox{\boldmath$a$})}/{{\bf 1}^{\top}\exp(\theta\mbox{\boldmath$a$})}.

Then, the log-likelihood function L(θ)=𝝅^log(𝒑θ(1))L(\theta)=\hat{\mbox{\boldmath$\pi$}}^{\top}\log(\mbox{\boldmath$p$}_{\theta}^{(1)}) leads to the likelihood equation: 𝒂𝒑θ(1)=C^.\mbox{\boldmath$a$}^{\top}\mbox{\boldmath$p$}_{\theta}^{(1)}=\hat{C}. In effect, the maximum likelihood estimation is equivalent to the maximum Hill number method. However, other estimation methods are not associated with such equivalence. In general, we would like to consider the minimum γ\gamma-power method by fixing the power parameter as γ=q1\gamma=q-1 for the order qq of the Hill number. For an application to ecological studies, we consider the cross Hill number of qq-order as

{}^{q}\!D(\boldsymbol{\pi},\boldsymbol{p}) = \big(\mathbf{1}^{\top}\boldsymbol{p}^{q}\big)\big(\boldsymbol{\pi}^{\top}\boldsymbol{p}^{q-1}\big)^{\frac{q}{1-q}}. \qquad (11)

If $\boldsymbol{\pi}=\boldsymbol{p}$, then ${}^{q}\!D(\boldsymbol{\pi},\boldsymbol{p})=(\mathbf{1}^{\top}\boldsymbol{\pi}^{q})^{\frac{1}{1-q}}$; that is, the cross Hill number reduces to the Hill number ${}^{q}\!D(\boldsymbol{\pi})$. Since $H_{\gamma}(\boldsymbol{\pi},\boldsymbol{p})\geq H_{\gamma}(\boldsymbol{\pi})$, we have ${}^{q}\!D(\boldsymbol{\pi},\boldsymbol{p})\geq{}^{q}\!D(\boldsymbol{\pi})$, and hence we define the Hill divergence

{}^{q}\!\boldsymbol{\Delta}(\boldsymbol{\pi},\boldsymbol{p}) = {}^{q}\!D(\boldsymbol{\pi},\boldsymbol{p}) - {}^{q}\!D(\boldsymbol{\pi}).
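The following short Python sketch evaluates the Hill number, the cross Hill number (11), and the Hill divergence for toy frequency vectors (chosen only for illustration), and checks numerically that the cross Hill number reduces to the Hill number on the diagonal and that the divergence is nonnegative in these cases.

```python
import numpy as np

def hill_number(p, q):
    """Hill number of order q (q != 1): (sum_i p_i^q)^{1/(1-q)}."""
    return (p ** q).sum() ** (1.0 / (1.0 - q))

def cross_hill_number(pi, p, q):
    """Cross Hill number (11): (1^T p^q) * (pi^T p^{q-1})^{q/(1-q)}."""
    return (p ** q).sum() * (pi @ p ** (q - 1)) ** (q / (1.0 - q))

def hill_divergence(pi, p, q):
    """Hill divergence: cross Hill number minus the Hill number of pi."""
    return cross_hill_number(pi, p, q) - hill_number(pi, q)

pi = np.array([0.4, 0.3, 0.2, 0.1])     # toy observed frequencies
p  = np.array([0.25, 0.25, 0.25, 0.25])
for q in (0.5, 2.0):
    print(q,
          np.isclose(cross_hill_number(pi, pi, q), hill_number(pi, q)),  # diagonal reduction
          hill_divergence(pi, p, q) >= 0)                                # nonnegativity
```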

Assume $\boldsymbol{a}^{\top}\boldsymbol{p}_{\hat{\theta}}^{(q)}=\hat{C}$ and $\boldsymbol{a}^{\top}\boldsymbol{\pi}=\hat{C}$. Then, we obtain

{}^{q}\!D(\boldsymbol{p}_{\hat{\theta}}^{(q)}) \geq {}^{q}\!D(\boldsymbol{\pi}),

that is, $\boldsymbol{p}_{\hat{\theta}}^{(q)}$ is the maximum Hill-number distribution. This is because

{}^{q}\!D(\boldsymbol{p}_{\hat{\theta}}^{(q)}) - {}^{q}\!D(\boldsymbol{\pi}) = {}^{q}\!\boldsymbol{\Delta}(\boldsymbol{\pi},\boldsymbol{p}_{\hat{\theta}}^{(q)}),

due to $\boldsymbol{\pi}^{\top}(\boldsymbol{p}_{\hat{\theta}}^{(q)})^{q-1}=\boldsymbol{p}_{\hat{\theta}}^{(q)\top}(\boldsymbol{p}_{\hat{\theta}}^{(q)})^{q-1}$, which follows from $\boldsymbol{a}^{\top}\boldsymbol{\pi}=\boldsymbol{a}^{\top}\boldsymbol{p}_{\hat{\theta}}^{(q)}=\hat{C}$. In this way, Proposition 2 follows from ${}^{q}\!\boldsymbol{\Delta}(\boldsymbol{\pi},\boldsymbol{p}_{\hat{\theta}}^{(q)})\geq 0$.
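Spelled out, the chain of relations behind this step is

{}^{q}\!D(\boldsymbol{p}_{\hat{\theta}}^{(q)}) = {}^{q}\!D(\boldsymbol{\pi},\boldsymbol{p}_{\hat{\theta}}^{(q)}) = {}^{q}\!D(\boldsymbol{\pi}) + {}^{q}\!\boldsymbol{\Delta}(\boldsymbol{\pi},\boldsymbol{p}_{\hat{\theta}}^{(q)}) \geq {}^{q}\!D(\boldsymbol{\pi}),

where the first equality substitutes $\boldsymbol{\pi}^{\top}(\boldsymbol{p}_{\hat{\theta}}^{(q)})^{q-1}=\boldsymbol{p}_{\hat{\theta}}^{(q)\top}(\boldsymbol{p}_{\hat{\theta}}^{(q)})^{q-1}$ into (11) and the second uses the definition of the Hill divergence.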

We propose the maximum $q$-order estimator defined by

\hat{\theta}^{(q)} = \mathop{\rm argmin}_{\theta}\ {}^{q}\!D(\hat{\boldsymbol{\pi}},\boldsymbol{p}_{\theta}^{(q)}). \qquad (12)

We will observe that the maximum $q$-order estimation is equivalent to the method of the maximum Hill-number distribution.

Proposition 5.

Let $\hat{\theta}^{(q)}$ be the maximum $q$-order estimator defined in (12). Then, the estimator $\theta=\hat{\theta}^{(q)}$ satisfies the linear constraint $\boldsymbol{a}^{\top}\boldsymbol{p}_{\theta}^{(q)}=\hat{C}$, where $\hat{C}=\boldsymbol{a}^{\top}\hat{\boldsymbol{\pi}}$. Furthermore,

\min_{\theta}\ {}^{q}\!D(\hat{\boldsymbol{\pi}},\boldsymbol{p}_{\theta}^{(q)}) = \max_{\boldsymbol{p}\,:\,\boldsymbol{a}^{\top}\boldsymbol{p}=\hat{C}}\ {}^{q}\!D(\boldsymbol{p}).
Proof.

The $q$-order cross Hill number (11) is simplified at $(\hat{\boldsymbol{\pi}},\boldsymbol{p}_{\theta}^{(q)})$, noting the definition of $\hat{C}$:

{}^{q}\!D(\hat{\boldsymbol{\pi}},\boldsymbol{p}_{\theta}^{(q)}) = \big(1+(q-1)\theta\hat{C}\big)^{\frac{q}{1-q}}\Big(\mathbf{1}^{\top}\big(\mathbf{1}+(q-1)\theta\boldsymbol{a}\big)^{\frac{q}{q-1}}\Big).

Hence, the gradient is given by

\nabla_{\theta}\,{}^{q}\!D(\hat{\boldsymbol{\pi}},\boldsymbol{p}_{\theta}^{(q)}) = q\Big\{(1+\tilde{\theta}\hat{C})^{\frac{q}{1-q}}\big(\boldsymbol{a}^{\top}(\mathbf{1}+\tilde{\theta}\boldsymbol{a})^{\frac{1}{q-1}}\big) - \hat{C}(1+\tilde{\theta}\hat{C})^{\frac{2q-1}{1-q}}\big(\mathbf{1}^{\top}(\mathbf{1}+\tilde{\theta}\boldsymbol{a})^{\frac{q}{q-1}}\big)\Big\}

which is written as

-q\,(1+\tilde{\theta}\hat{C})^{\frac{2q-1}{1-q}}\big(\mathbf{1}^{\top}(\mathbf{1}+\tilde{\theta}\boldsymbol{a})^{\frac{1}{q-1}}\big)\big(\hat{C}-\boldsymbol{a}^{\top}\boldsymbol{p}_{\theta}^{(q)}\big)

due to the definition of $\boldsymbol{p}_{\theta}^{(q)}$, where $\tilde{\theta}=(q-1)\theta$. Equating the gradient to $0$ is equivalent to $\boldsymbol{a}^{\top}\boldsymbol{p}_{\theta}^{(q)}=\hat{C}$. The linear constraint is exactly satisfied at $\theta=\hat{\theta}^{(q)}$ since, by definition, $\hat{\theta}^{(q)}$ is a solution of the gradient equation. Further, if $\theta=\hat{\theta}^{(q)}$, then $\hat{\boldsymbol{\pi}}^{\top}(\mathbf{1}+\tilde{\theta}\boldsymbol{a}) = 1+\tilde{\theta}\hat{C} = \boldsymbol{p}_{\theta}^{(q)\top}(\mathbf{1}+\tilde{\theta}\boldsymbol{a})$, so that

{}^{q}\!D(\hat{\boldsymbol{\pi}},\boldsymbol{p}_{\theta}^{(q)}) = \big(\mathbf{1}^{\top}(\boldsymbol{p}_{\theta}^{(q)})^{q}\big)\big(\boldsymbol{p}_{\theta}^{(q)\top}(\boldsymbol{p}_{\theta}^{(q)})^{q-1}\big)^{\frac{q}{1-q}}.

This is nothing but the Hill number ${}^{q}\!D(\boldsymbol{p}_{\theta}^{(q)})$, which attains the maximum under the linear constraint at $\theta=\hat{\theta}^{(q)}$. This completes the proof. ∎

In the proof, we observe a surprising property: $\nabla_{\theta}\,{}^{q}\!D(\hat{\boldsymbol{\pi}},\boldsymbol{p}_{\theta}^{(q)})\propto\hat{C}-\boldsymbol{a}^{\top}\boldsymbol{p}_{\theta}^{(q)}$. This directly shows the equivalence between the maximum $q$-order estimation and the maximum Hill-number distribution of order $q$. Thus, the maximum $q$-order estimation naturally leads to the maximum Hill number of order $q$.
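As a minimal computational sketch, the snippet below fits the maximum $q$-order estimator for a single linear constraint by solving $\boldsymbol{a}^{\top}\boldsymbol{p}_{\theta}^{(q)}=\hat{C}$, assuming the single-constraint analogue of the solution form $\boldsymbol{p}_{\boldsymbol{\theta}}^{(q)}\propto[\mathbf{1}+(q-1)\boldsymbol{A}^{\top}\boldsymbol{\theta}]_{+}^{1/(q-1)}$ given below for multiple constraints; the vectors a and pi_hat are hypothetical toy values, and the root-finding brackets are chosen so that the bracketed term stays positive.

```python
import numpy as np
from scipy.optimize import brentq

def p_q(theta, a, q):
    """Assumed single-constraint model: p proportional to [1 + (q-1)*theta*a]_+^{1/(q-1)}."""
    w = np.clip(1.0 + (q - 1.0) * theta * a, 0.0, None)
    u = w ** (1.0 / (q - 1.0))
    return u / u.sum()

def fit_q_order(a, C_hat, q, bracket):
    """Maximum q-order estimate: theta such that a^T p_theta^(q) = C_hat."""
    g = lambda th: a @ p_q(th, a, q) - C_hat
    return brentq(g, *bracket)

a = np.array([0.2, 0.5, 1.0, 1.7])        # hypothetical trait scores
pi_hat = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical observed frequencies
C_hat = a @ pi_hat
for q, bracket in ((0.5, (-10.0, 1.1)), (2.0, (-4.5, 5.0))):
    th = fit_q_order(a, C_hat, q, bracket)
    p = p_q(th, a, q)
    print(q, th, np.isclose(a @ p, C_hat))   # the constraint is met at the estimate
```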

We now consider a more general setting that allows multiple linear constraints. Let $\boldsymbol{A}$ be a matrix of size $K\times S$ with full rank and

\mathcal{H}_{\boldsymbol{A}}(\boldsymbol{C}) = \{\boldsymbol{p} : \boldsymbol{A}\boldsymbol{p} = \boldsymbol{C}\}, \qquad (13)

where $\boldsymbol{C}$ is a constant vector of dimension $K$. This means $K$ constraints $\boldsymbol{a}_{j}^{\top}\boldsymbol{p}=C_{j}\ (j=1,\ldots,K)$, where $\boldsymbol{a}_{j}$ is the $j$th row vector of $\boldsymbol{A}$. Consider the maximum Hill-number distribution under such linear constraints. The argument parallels that for the one-dimensional constraint discussed above. Thus, we introduce Lagrange multipliers $\boldsymbol{\lambda}$, $\mu$ and $\boldsymbol{\nu}$ to incorporate the constraints, and form the Lagrangian function:

\mathcal{L} = \sum_{i=1}^{S} p_{i}^{q} + \boldsymbol{\lambda}^{\top}\big(\boldsymbol{A}\boldsymbol{p}-\boldsymbol{C}\big) + \mu\big(\mathbf{1}^{\top}\boldsymbol{p}-1\big) + \boldsymbol{\nu}^{\top}\boldsymbol{p}.

The stationarity condition gives the solution form:

\boldsymbol{p}^{(q)}_{\boldsymbol{\theta}} = \frac{\big[\mathbf{1}+(q-1)\boldsymbol{A}^{\top}\boldsymbol{\theta}\big]_{+}^{\frac{1}{q-1}}}{\mathbf{1}^{\top}\big[\mathbf{1}+(q-1)\boldsymbol{A}^{\top}\boldsymbol{\theta}\big]_{+}^{\frac{1}{q-1}}},

combining both cases $q<1$ and $q>1$. It is necessary to define a region of feasible $\boldsymbol{C}$ values to ensure that $\boldsymbol{\theta}$ satisfies the linear constraints; however, we omit this discussion as it closely parallels the proof of Proposition 2. The gradient of the $q$-order risk function for the model $\{\boldsymbol{p}_{\boldsymbol{\theta}}^{(q)}\}_{\boldsymbol{\theta}}$ satisfies

\nabla_{\boldsymbol{\theta}}\,{}^{q}\!D(\hat{\boldsymbol{\pi}},\boldsymbol{p}^{(q)}_{\boldsymbol{\theta}}) \propto \hat{\boldsymbol{C}} - \boldsymbol{A}\boldsymbol{p}_{\boldsymbol{\theta}}^{(q)}.

Hence, the maximum $q$-order estimator $\hat{\boldsymbol{\theta}}^{(q)}$ satisfies the linear constraints $\boldsymbol{A}\boldsymbol{p}_{\hat{\boldsymbol{\theta}}^{(q)}}^{(q)}=\hat{\boldsymbol{C}}$. Let us consider the asymptotic distribution of $\hat{\boldsymbol{\theta}}^{(q)}$ under the assumption that the observed frequency vector $\hat{\boldsymbol{\pi}}$ is obtained from a random sample of size $N$ drawn from a categorical distribution with probability vector $\boldsymbol{\pi}$. Then, $\sqrt{N}(\hat{\boldsymbol{\pi}}-\boldsymbol{\pi})$ converges in distribution to a normal distribution with mean $\mathbf{0}$ and covariance $\boldsymbol{G}_{\boldsymbol{\pi}}^{-1}$, where $\boldsymbol{G}_{\boldsymbol{\pi}}$ is the Fisher-Rao information matrix defined in (9). Hence, $\sqrt{N}(\hat{\boldsymbol{\theta}}^{(q)}-\boldsymbol{\theta})$ converges in distribution to a normal distribution with mean $\mathbf{0}$ and covariance

(\boldsymbol{A}\boldsymbol{J}_{\boldsymbol{\theta}})^{-1}\boldsymbol{A}\boldsymbol{G}_{\boldsymbol{\pi}}^{-1}\boldsymbol{A}^{\top}(\boldsymbol{J}_{\boldsymbol{\theta}}^{\top}\boldsymbol{A}^{\top})^{-1},

where $\boldsymbol{J}_{\boldsymbol{\theta}}$ is the Jacobian matrix of the model $\boldsymbol{p}_{\boldsymbol{\theta}}^{(q)}$, that is,

\boldsymbol{J}_{\boldsymbol{\theta}} := \nabla_{\boldsymbol{\theta}}\,\boldsymbol{p}_{\boldsymbol{\theta}}^{(q)} = \Big(\mathrm{diag}\big(\boldsymbol{p}_{\boldsymbol{\theta}}^{(q)}\odot\boldsymbol{w}_{\boldsymbol{\theta}}^{-1}\big) - \boldsymbol{p}_{\boldsymbol{\theta}}^{(q)}\big(\boldsymbol{p}_{\boldsymbol{\theta}}^{(q)}\odot\boldsymbol{w}_{\boldsymbol{\theta}}^{-1}\big)^{\top}\Big)\boldsymbol{A}^{\top}, \qquad \boldsymbol{w}_{\boldsymbol{\theta}} = \mathbf{1}+(q-1)\boldsymbol{A}^{\top}\boldsymbol{\theta}.

Finally, we note that, if $q=1$, the Hill number of order $q$ reduces to the exponential of the Boltzmann-Shannon entropy, and the maximum $q$-order estimator reduces to the maximum likelihood estimator.
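To illustrate the multi-constraint case, the sketch below solves $\boldsymbol{A}\boldsymbol{p}_{\boldsymbol{\theta}}^{(q)}=\hat{\boldsymbol{C}}$ for $\boldsymbol{\theta}$ with the solution form above and then evaluates the sandwich-type asymptotic covariance; the Jacobian $\boldsymbol{J}_{\boldsymbol{\theta}}$ is approximated by finite differences, the multinomial covariance $\mathrm{diag}(\boldsymbol{\pi})-\boldsymbol{\pi}\boldsymbol{\pi}^{\top}$ is taken as the asymptotic covariance of $\hat{\boldsymbol{\pi}}$ (denoted $\boldsymbol{G}_{\boldsymbol{\pi}}^{-1}$ in the text), and the matrix A and the frequencies are hypothetical toy values.

```python
import numpy as np
from scipy.optimize import root

def p_q_multi(theta, A, q):
    """Model p_theta^(q) proportional to [1 + (q-1) A^T theta]_+^{1/(q-1)}."""
    w = np.clip(1.0 + (q - 1.0) * (A.T @ theta), 0.0, None)
    u = w ** (1.0 / (q - 1.0))
    return u / u.sum()

def fit_multi(A, C_hat, q):
    """Solve the K estimating equations A p_theta^(q) = C_hat for theta."""
    g = lambda th: A @ p_q_multi(th, A, q) - C_hat
    return root(g, x0=np.zeros(A.shape[0])).x

def jacobian_fd(theta, A, q, eps=1e-6):
    """Finite-difference approximation of J_theta = d p_theta^(q) / d theta (S x K)."""
    cols = [(p_q_multi(theta + eps * e, A, q) - p_q_multi(theta - eps * e, A, q)) / (2 * eps)
            for e in np.eye(len(theta))]
    return np.column_stack(cols)

# hypothetical example: S = 5 species, K = 2 linear constraints (two trait averages)
A = np.array([[0.2, 0.5, 1.0, 1.3, 1.7],
              [1.0, 0.4, 0.6, 0.2, 0.9]])
pi_hat = np.array([0.24, 0.22, 0.20, 0.18, 0.16])
C_hat = A @ pi_hat
q = 2.0
theta_hat = fit_multi(A, C_hat, q)
p_hat = p_q_multi(theta_hat, A, q)
print(np.allclose(A @ p_hat, C_hat))       # constraints are satisfied at the estimate

J = jacobian_fd(theta_hat, A, q)
G_inv = np.diag(pi_hat) - np.outer(pi_hat, pi_hat)   # multinomial covariance of pi_hat
M = np.linalg.inv(A @ J)
cov = M @ A @ G_inv @ A.T @ M.T                      # (A J)^{-1} A G^{-1} A^T (J^T A^T)^{-1}
print(cov)
```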

This section demonstrates how the framework of information geometry, particularly through the concept of $q$-geodesics, enriches our understanding of diversity measures on the simplex $\Delta_{S-1}$. By connecting statistical divergences, geodesic paths, and maximum diversity distributions, we establish a geometric interpretation of the Hill numbers and related diversity indices. The foliation of the simplex into hyperplanes corresponding to linear constraints further illustrates the interplay between geometric structures and ecological considerations. These insights not only deepen the theoretical foundations but also provide practical tools for analyzing and optimizing biodiversity under various constraints.

5 Discussion

The maximum diversity distribution under a linear constraint embodies the interplay between species-specific ecological traits and the overall diversity of the community. It provides valuable insights into how ecological factors shape community composition and offers practical implications for biodiversity conservation and ecosystem management. By considering both the mathematical optimum and ecological principles, ecologists can better interpret diversity patterns, predict ecological dynamics, and set informed conservation goals that align with the inherent constraints of natural ecosystems. By selecting different values of $q$, ecologists can focus on different aspects of biodiversity (dominance or rarity), making this a powerful tool for assessing community dynamics, resource competition, and conservation strategies in response to environmental constraints. For conservation applications, comparing the observed abundance distribution to the theoretical maximum Hill-number distribution offers a valuable diagnostic tool. Alignment with the maximal distribution suggests that the community may have reached an optimal diversity configuration for its specific ecological context, whereas deviations indicate potential imbalances or stresses, such as dominance by invasive species or suppression of keystone species. This comparison helps identify conservation needs and highlights areas for intervention, guiding efforts to maintain a balanced, resilient ecosystem. Monitoring these deviations over time can also track how community structure responds to environmental pressures, providing insights into resilience and adaptation.

Species Distribution Modeling (SDM) is a statistical approach used to predict the geographic distribution of species based on their known occurrences and environmental conditions. SDMs relate species presence-absence or presence-only data to environmental variables (e.g., temperature, precipitation, habitat features) to estimate the probability of species occurrence or abundance across spatial landscapes. These models are widely applied in ecology for biodiversity assessments, habitat suitability mapping, and predicting the impacts of environmental changes, such as climate change, on species distributions, see Southwood and Henderson (2009); Elith and Leathwick (2009); Komori and Eguchi (2019); Saigusa et al. (2024). We discuss a potential framework, and its implications, for incorporating SDMs into the approach of maximum diversity distributions. SDMs naturally account for environmental covariates (temperature, precipitation, habitat types), providing realistic constraints for diversity maximization. Using the predicted relative abundances from SDMs, the observed diversity can be directly compared to the maximum diversity, offering insights into the extent to which ecological constraints influence community assembly, see Kusumoto et al. (2023). In areas with significant deviation from maximum diversity distributions, SDMs can provide insights into which species are missing or disproportionately abundant, suggesting specific restoration or management interventions.

Maximum diversity distributions offer a powerful framework that can extend beyond ecology to other domains like economics and engineering, providing insights into resource allocation, system robustness, and optimization under constraints. In portfolio planning, maximum diversity principles can balance risk and return by distributing investments across assets. In control problems, diversity-based strategies enhance robustness by distributing control efforts or resources across system components. Applications include energy management in networks, feedback control in complex systems, and adaptive control for dynamic environments (e.g., autonomous systems, smart grids). By adapting maximum diversity distributions to these fields, we can address challenges in resource optimization, system stability, and risk management.

References

  • Amari (1982) Amari, S. I. Differential geometry of curved exponential families: curvatures and information loss. The Annals of Statistics, 10(2):357–385, 1982.
  • Amari and Nagaoka (2000) Amari, S. I., & Nagaoka, H. Methods of information geometry (Vol. 191). American Mathematical Soc., 2000.
  • Basu et al. (1998) Basu, A., Harris, I. R., Hjort, N. L. and Jones, M. C. Robust and efficient estimation by minimising a density power divergence. Biometrika, 85(3):549–559, 1998.
  • Botta-Dukát (2005) Botta-Dukát, Z. Rao’s quadratic entropy as a measure of functional diversity based on multiple traits. Journal of vegetation science, 16(5), 533-540, 2005.
  • Burbea and Rao (1982) Burbea, J. & Rao, C. R. On the convexity of some divergence measures based on entropy functions. IEEE Transactions on Information Theory, 28(3):489–495, 1982.
  • Chao (1987) Chao, A. Estimating the population size for capture-recapture data with unequal catchability. Biometrics, 43, 783–791, 1987.
  • Colwell and Coddington (1994) Colwell, R. K. & Coddington, J. A. Estimating terrestrial biodiversity through extrapolation. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 345, 101-118, 1994.
  • Dostál (2009) Dostál, Z. Optimal quadratic programming algorithms: with applications to variational inequalities, Vol. 23. Springer Science & Business Media, 2009.
  • Efron (1975) Efron, B. Defining the curvature of a statistical problem (with applications to second order efficiency). The Annals of Statistics, pages 1189–1242, 1975.
  • Eguchi (1983) Eguchi, S. Second order efficiency of minimum contrast estimators in a curved exponential family. The Annals of Statistics, 793-803, 1983.
  • Eguchi (1992) Eguchi, S. Geometry of minimum contrast. Hiroshima Mathematical Journal, 22(3):631–647, 1992.
  • Eguchi and Copas (2006) Eguchi, S. and Copas, J. Interpreting Kullback–Leibler divergence with the Neyman–Pearson lemma. Journal of Multivariate Analysis, 97(9):2034–2040, 2006.
  • Eguchi (2024) Eguchi, S. Minimum Gamma Divergence for Regression and Classification Problems. arXiv preprint arXiv:2408.01893, 2024.
  • Eguchi and Komori (2022) Eguchi, S. and Komori, O. Minimum divergence methods in statistical machine learning: From an Information Geometric Viewpoint. Springer Japan KK, 2022.
  • Eguchi et al. (2011) Eguchi, S., Komori, O. and Kato, S. Projective power entropy and maximum Tsallis entropy distributions. Entropy, 13(10):1746–1764, 2011.
  • Elith and Leathwick (2009) Elith, J. and Leathwick, J. R. Species distribution models: ecological explanation and prediction across space and time. Annual review of ecology, evolution, and systematics, 40(1):677–697, 2009.
  • Fujisawa and Eguchi (2008) Fujisawa, H. and Eguchi, S. Robust parameter estimation with a small bias against heavy contamination. Journal of Multivariate Analysis, 99:2053–2081, 2008.
  • Hill (1973) Hill, M. O. Diversity and evenness: a unifying notation and its consequences. Ecology 54, 427-432, 1973.
  • Jost (2006) Jost, L. Entropy and diversity. Oikos, 113(2), 363-375, 2006.
  • Komori and Eguchi (2019) Komori, O. and Eguchi, S. Statistical Methods for Imbalanced Data in Ecological and Biological Studies. Springer, Tokyo, 2019.
  • Kusumoto et al. (2023) Kusumoto, B., Chao, A., Eiserhardt, W. L., Svenning, J. C., Shiono, T., & Kubota, Y. Occurrence-based diversity estimation reveals macroecological and conservation knowledge gaps for global woody plants. Science Advances, 9(40), eadh9719, 2023.
  • May (1975) May, R. M. Patterns of species abundance and diversity. In Ecology and Evolution of Communities (eds. M. L. Cody and J. M. Diamond). Cambridge, Mass.: Harvard University Press, 1975.
  • Leinster and Cobbold (2012) Leinster, T., & Cobbold, C. A. Measuring diversity: the importance of species similarity. Ecology, 93(3), 477-489, 2012.
  • Nielsen (2021) Nielsen, F. On geodesic triangles with right angles in a dually flat space. In Progress in Information Geometry: Theory and Applications, pages 153–190. Springer, 2021.
  • Phillips et al. (2004) Phillips, S. J., Dudík, M. and Schapire, R. E. A maximum entropy approach to species distribution modeling. In Proceedings of the twenty-first international conference on Machine learning, page 83, 2004.
  • Rao (1945) Rao, C. R. Information and accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37(3), 81-91, 1945.
  • Rao (1961) Rao, C. R. Asymptotic efficiency and limiting information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, 4, 531-546, 1961. University of California Press.
  • Rao (1982a) Rao, C. R. Diversity and dissimilarity coefficients: a unified approach. Theoretical population biology, 21(1), 24-43, 1982.
  • Rao (1982b) Rao, C. R. Diversity: Its measurement, decomposition, apportionment and analysis. Sankhyā: The Indian Journal of Statistics, Series A, 1-22, 1982.
  • Rao (1987) Rao, C. R. Differential metrics in probability spaces. Differential geometry in statistical inference, 10:217–240, 1987.
  • Saigusa et al. (2024) Saigusa, Y., Eguchi, S. and Komori, O. Robust minimum divergence estimation in a spatial Poisson point process. Ecological Informatics, 81:102569, 2024.
  • Simpson (1949) Simpson, E. H. Measurement of diversity. Nature 163, 688, 1949.
  • Southwood and Henderson (2009) Southwood, T. R. E., & Henderson, P. A. Ecological methods. John Wiley & Sons, 2009.

Appendix

We explore the ranges of $t$ for defining a mixture geodesic and a $q$-geodesic in $\Delta_{S-1}$.

Proposition 6.

Consider a mixture geodesic $\boldsymbol{p}^{\rm(m)}_{t}=(1-t)\boldsymbol{\pi}+t\boldsymbol{p}$ connecting $\boldsymbol{\pi}$ and $\boldsymbol{p}$ in the interior of $\Delta_{S-1}$, where $0\leq t\leq 1$. The range of $t$ can be enlarged to the closed interval

\left[-\min\Big(\frac{1}{R_{1}-1},\frac{1}{Q_{1}-1}\Big),\ 1+\min\Big(\frac{1}{R_{2}-1},\frac{1}{Q_{2}-1}\Big)\right],

where

R_{1}=\max_{i}\frac{p_{i}}{\pi_{i}},\quad R_{2}=\max_{i}\frac{\pi_{i}}{p_{i}},\quad Q_{1}=\max_{i}\frac{1-p_{i}}{1-\pi_{i}},\quad Q_{2}=\max_{i}\frac{1-\pi_{i}}{1-p_{i}}.
Proof.

By definition,

-\pi_{i} < (p_{i}-\pi_{i})t < 1-\pi_{i}\quad(i=1,\ldots,S). \qquad (14)

If $p_{i}>\pi_{i}$, then

-\frac{1}{\frac{p_{i}}{\pi_{i}}-1} < t < 1+\frac{1}{\frac{1-\pi_{i}}{1-p_{i}}-1}\quad\text{for such } i.

This implies

-\frac{1}{\max_{i}\frac{p_{i}}{\pi_{i}}-1} < t < 1+\frac{1}{\max_{i}\frac{1-\pi_{i}}{1-p_{i}}-1}. \qquad (15)

Similarly, if $p_{i}<\pi_{i}$, then

-\frac{1}{\max_{i}\frac{1-p_{i}}{1-\pi_{i}}-1} < t < 1+\frac{1}{\max_{i}\frac{\pi_{i}}{p_{i}}-1}. \qquad (16)

Hence, combining inequalities (15) and (16) yields the closed interval. ∎
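A small numerical check of Proposition 6 (with toy probability vectors, chosen only for illustration): the snippet below computes the enlarged interval and verifies that the mixture stays inside the closed simplex at both endpoints.

```python
import numpy as np

def mixture_range(pi, p):
    """Enlarged t-range of the mixture geodesic (Proposition 6); assumes pi and p differ."""
    R1, R2 = np.max(p / pi), np.max(pi / p)
    Q1, Q2 = np.max((1 - p) / (1 - pi)), np.max((1 - pi) / (1 - p))
    t_min = -min(1.0 / (R1 - 1.0), 1.0 / (Q1 - 1.0))
    t_max = 1.0 + min(1.0 / (R2 - 1.0), 1.0 / (Q2 - 1.0))
    return t_min, t_max

pi = np.array([0.4, 0.3, 0.2, 0.1])
p  = np.array([0.1, 0.2, 0.3, 0.4])
for t in mixture_range(pi, p):
    pt = (1 - t) * pi + t * p                 # mixture geodesic at an interval endpoint
    print(t, pt, np.isclose(pt.sum(), 1.0),
          (pt >= -1e-12).all() and (pt <= 1 + 1e-12).all())
```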

We next focus on the case of a $q$-geodesic.

Proposition 7.

Consider a $q$-geodesic

\boldsymbol{p}^{(q)}_{t} = Z_{t}\big\{(1-t)\boldsymbol{\pi}^{q-1}+t\boldsymbol{p}^{q-1}\big\}^{\frac{1}{q-1}}

connecting $\boldsymbol{\pi}$ and $\boldsymbol{p}$ in the interior of $\Delta_{S-1}$, where $0\leq t\leq 1$ and $Z_{t}$ is the normalizing constant. The range of $t$ can be enlarged to the closed interval

\left[-\min\Big(\frac{1}{R_{1}^{(q)}-1},\frac{1}{Q_{1}^{(q)}-1}\Big),\ 1+\min\Big(\frac{1}{R_{2}^{(q)}-1},\frac{1}{Q_{2}^{(q)}-1}\Big)\right], \qquad (17)

where

R_{1}^{(q)}=\max_{i}\frac{p_{i}^{q-1}}{\pi_{i}^{q-1}},\quad R_{2}^{(q)}=\max_{i}\frac{\pi_{i}^{q-1}}{p_{i}^{q-1}},\quad Q_{1}^{(q)}=\max_{i}\frac{1-(Z_{t}p_{i})^{q-1}}{1-(Z_{t}\pi_{i})^{q-1}},\quad Q_{2}^{(q)}=\max_{i}\frac{1-(Z_{t}\pi_{i})^{q-1}}{1-(Z_{t}p_{i})^{q-1}}.
Proof.

By definition,

-(Z_{t}\pi_{i})^{q-1} < \big\{(Z_{t}p_{i})^{q-1}-(Z_{t}\pi_{i})^{q-1}\big\}t < 1-(Z_{t}\pi_{i})^{q-1}\quad(i=1,\ldots,S),

which can be written as

-\Pi_{i} < (P_{i}-\Pi_{i})t < 1-\Pi_{i}\quad(i=1,\ldots,S),

where $\Pi_{i}=(Z_{t}\pi_{i})^{q-1}$ and $P_{i}=(Z_{t}p_{i})^{q-1}$. This is essentially the same as (14). Therefore, we obtain the closed interval (17) by the same argument as in the proof of Proposition 6. ∎