
Note on Follow-the-Perturbed-Leader in
Combinatorial Semi-Bandit Problems

Botao Chen (Kyoto University; chen.botao.63r@st.kyoto-u.ac.jp)    Junya Honda (Kyoto University and RIKEN AIP; honda@i.kyoto-u.ac.jp)
Abstract

This paper studies the optimality and complexity of the Follow-the-Perturbed-Leader (FTPL) policy in size-invariant combinatorial semi-bandit problems. Recently, Honda et al. (2023) and Lee et al. (2024) showed that FTPL achieves Best-of-Both-Worlds (BOBW) optimality in standard multi-armed bandit problems with Fréchet-type distributions. However, the optimality of FTPL in combinatorial semi-bandit problems remains unclear. In this paper, we consider the regret bound of FTPL with geometric resampling (GR) in the size-invariant semi-bandit setting, showing that FTPL achieves $O(\sqrt{m^2 d^{1/\alpha} T}+\sqrt{mdT})$ regret with Fréchet distributions and the best possible regret bound of $O(\sqrt{mdT})$ with Pareto distributions in the adversarial setting. Furthermore, we extend conditional geometric resampling (CGR) to the size-invariant semi-bandit setting, which reduces the computational complexity from $O(d^2)$ of the original GR to $O(md(\log(d/m)+1))$ without sacrificing the regret performance of FTPL.

1 Introduction

The combinatorial semi-bandit is a sequential decision-making problem under uncertainty that generalizes the classical multi-armed bandit problem. It is instrumental in many practical applications, such as recommender systems (Wang et al., 2017), online advertising (Nuara et al., 2022), crowdsourcing (ul Hassan and Curry, 2016), adaptive routing (Gai et al., 2012) and network optimization (Kveton et al., 2014). In this problem, the learner chooses an action $a_t$ from an action set $\mathcal{A}\subset\{0,1\}^d$, where $d\in\mathbb{N}$ is the dimension of the action set. In each round $t\in[T]=\{1,2,\dots,T\}$, the loss vector $\ell_t=(\ell_{t,1},\ell_{t,2},\dots,\ell_{t,d})$ is determined by the environment, and the learner incurs a loss $\langle\ell_t,a_t\rangle$ and can only observe the loss $\ell_{t,i}$ for all $i\in[d]$ such that $a_{t,i}=1$. The goal of the learner is to minimize the cumulative loss over all rounds. The performance of the learner is often measured by the pseudo-regret $\mathcal{R}(T)=\mathbb{E}[\sum_{t=1}^T\langle\ell_t,a_t\rangle]-\min_{a\in\mathcal{A}}\mathbb{E}[\sum_{t=1}^T\langle\ell_t,a\rangle]$, which describes the gap between the expected cumulative loss of the learner and that of the optimal action fixed in hindsight.
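To make the interaction protocol concrete, the following sketch spells out one round of the semi-bandit feedback model; the `policy`/`environment` interfaces are hypothetical placeholders, used only to illustrate where the partial observation enters.

```python
import numpy as np

# Minimal semi-bandit round (illustrative interfaces, not from the paper):
# the learner observes loss_t[i] exactly for the coordinates with a_t[i] = 1.
def play_round(policy, environment, t):
    loss_t = environment.loss(t)        # loss vector fixed by the environment
    a_t = policy.choose(t)              # a_t in {0,1}^d with ||a_t||_1 = m
    feedback = {i: loss_t[i] for i in np.flatnonzero(a_t)}  # semi-bandit feedback
    policy.update(t, feedback)
    return float(loss_t @ a_t)          # incurred loss <loss_t, a_t>
```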

Since its introduction by Chen et al. (2013), the combinatorial semi-bandit problem has been widely studied, mainly under two settings for how the environment determines the loss vector, namely the stochastic setting and the adversarial setting. In the stochastic setting, the sequence of loss vectors $(\ell_t)_{t=1}^T$ is assumed to be independent and identically distributed (i.i.d.) from an unknown but fixed distribution $\mathcal{D}$ over $[0,1]^d$ with mean $\mu=\mathbb{E}_{\ell\sim\mathcal{D}}[\ell]$. The fixed single optimal action is defined as $a^*=\arg\min_{a\in\mathcal{A}}\mathbb{E}[\sum_{t=1}^T\langle\ell_t,a\rangle]$, and we denote the minimum suboptimality gap by $\Delta=\min_{a\in\mathcal{A}\setminus\{a^*\}}\{\mu^\top(a-a^*)\}$. CombUCB (Kveton et al., 2015) and Combinatorial Thompson Sampling (Wang and Chen, 2018) can achieve gap-dependent regret bounds of $O(dm\log T/\Delta)$ for general action sets and $O((d-m)\log T/\Delta)$ for matroid semi-bandits, where $m=\max_{a\in\mathcal{A}}\lVert a\rVert_1$ denotes the maximum size of any action in the set $\mathcal{A}$.

In the adversarial setting, the loss vectors $\ell_t$ are determined from $[0,1]^d$ by an adversary in an arbitrary manner and are not assumed to follow any specific distribution (Kveton et al., 2015; Neu, 2015; Wang and Chen, 2018). For this setting, the regret bound of $O(\sqrt{mdT})$ can be achieved by several policies, such as OSMD (Audibert et al., 2014) and FTRL with a hybrid regularizer (Zimmert et al., 2019), which matches the lower bound of $\Omega(\sqrt{mdT})$ (Audibert et al., 2014).

In practical scenarios, the environment determining the loss vectors is often unknown. Therefore, policies that can adaptively address both stochastic and adversarial settings have been widely studied, particularly in the context of standard multi-armed bandit problems. The Tsallis-INF policy (Zimmert and Seldin, 2021), which is based on Follow-the-Regularized-Leader (FTRL), has been shown to achieve optimality in both settings. For combinatorial semi-bandit problems, there also exists some work on this topic (Wei and Luo, 2018; Zimmert et al., 2019; Ito, 2021; Tsuchiya et al., 2023).

However, some BOBW policies, such as FTRL, require an explicit computation of the arm-selection probability by solving an optimization problem. This leads to computational inefficiency, and the complexity increases substantially for combinatorial semi-bandits. In light of this limitation, the Follow-the-Perturbed-Leader (FTPL) policy has gained significant attention due to its optimization-free nature. Recently, Honda et al. (2023) and Lee et al. (2024) demonstrated that FTPL achieves the Best-of-Both-Worlds (BOBW) optimality in standard multi-armed bandit problems with Fréchet-type perturbations, which inspires researchers to explore the optimality of FTPL in combinatorial semi-bandit problems. A preliminary effort by Zhan et al. (2025) aimed to tackle this setting, though their analysis contains a technical flaw. In fact, the analysis becomes substantially more complex in the combinatorial semi-bandit setting and requires further investigation.

Contributions of This Paper

Firstly, we investigate the optimality of FTPL with geometric resampling under Fréchet or Pareto perturbations in adversarial size-invariant semi-bandit problems. We show that FTPL achieves $O(\sqrt{m^2 d^{1/\alpha} T}+\sqrt{mdT})$ regret with Fréchet distributions and the best possible regret bound of $O(\sqrt{mdT})$ with Pareto distributions in this setting. To the best of our knowledge, this is the first work that provides a correct proof of the regret bound for FTPL with Fréchet-type distributions in adversarial combinatorial semi-bandit problems. Furthermore, we extend the technique called Conditional Geometric Resampling (CGR) (Chen et al., 2025) to the size-invariant semi-bandit setting, which reduces the computational complexity from $O(d^2)$ of the original GR to $O(md(\log(d/m)+1))$ without sacrificing the regret guarantee of the one with the original GR.

1.1 Related Work

1.1.1 Technical Issues in Zhan et al. (2025)

The most closely related work is by Zhan et al. (2025). In their paper, they consider the FTPL policy with the Fréchet distribution with shape $2$ in size-invariant semi-bandits, which is a special case of combinatorial semi-bandit problems. They provide a proof claiming that FTPL with the Fréchet distribution with shape $2$ achieves $O(\sqrt{md\log(d)T})$ regret in the adversarial setting and a logarithmic regret bound in the stochastic setting. However, their proof includes a serious issue that renders the main result incorrect, which is explained below.

A function is analyzed in Lemma 4.1 of Zhan et al. (2025), which is later used in their evaluation of a component of the regret called the stability term. In this lemma, they evaluate the function for two cases, and the second case is just mentioned as “can be shown by the same argument” without a proof. However, upon closer inspection, this claim cannot be justified by an analogy. In Section 9, we highlight the detailed step where the analogy fails, and further support this observation with numerical verification, which demonstrates that the claimed result does not hold.

The main difficulty of the optimal regret analysis of FTPL lies in the analysis of the stability term (Honda et al., 2023; Lee et al., 2024), which is also the problem we mainly address. Unfortunately, this main difficulty lies behind these skipped or incorrect arguments, and thus we need an essentially new technique to complete the regret analysis for this problem. In this paper, we evaluate the stability term in a totally different way in Lemmas 3–6, which demonstrates that the stability term can be bounded by the maximum of simple quantities, each of which is associated with a subset of base-arms.

2 Problem Setup

In this section, we formulate the problem and introduce the framework of FTPL with geometric resampling. We consider an action set $\mathcal{A}\subset\{0,1\}^d$, where each element $a\in\mathcal{A}$ is called an action. For each base-arm $i\in[d]$, we assume that there exists at least one action $a\in\mathcal{A}$ such that $a_i=1$. In this paper, we consider a special case of action sets in combinatorial semi-bandits, referred to as the size-invariant semi-bandit. In this setting, we define the action set $\mathcal{A}=\{a\in\{0,1\}^d:\lVert a\rVert_1=m\}$, where $m$ is the number of selected base-arms at each round. At each round $t\in[T]=\{1,2,\dots,T\}$, the environment determines a loss vector $\ell_t=(\ell_{t,1},\ell_{t,2},\dots,\ell_{t,d})^\top\in[0,1]^d$, and the learner takes an action $a_t\in\mathcal{A}$ and incurs a loss $\langle\ell_t,a_t\rangle$. In the semi-bandit setting, the learner only observes the loss $\ell_{t,i}$ for all $i\in[d]$ such that $a_{t,i}=1$, whereas $\ell_{t,i}$ corresponding to $a_{t,i}=0$ is not observed.

In this paper, we only consider the setting where the loss vector is determined in an adversarial way. In this setting, the loss vectors $(\ell_t)_{t=1}^T$ are not assumed to follow any specific distribution, and they are determined in an arbitrary manner, which may depend on the past history of the actions and losses $\{(\ell_s,a_s)\}_{s=1}^{t-1}$.

The performance of the learner is evaluated in terms of the pseudo-regret, which is defined as

\mathcal{R}(T)=\mathbb{E}\left[\sum_{t=1}^{T}\langle\ell_t,a_t-a^*\rangle\right],\quad a^*\in\arg\min_{a\in\mathcal{A}}\mathbb{E}\left[\sum_{t=1}^{T}\langle\ell_t,a\rangle\right].

2.1 Follow-the-Perturbed-Leader

Table 1: Notation
Symbol | Meaning
$\mathcal{A}\subset\{0,1\}^d$ | Action set
$d\in\mathbb{N}$ | Dimensionality of the action set
$m\leq d$ | $m=\lVert a\rVert_1$ for any $a\in\mathcal{A}$
$\eta$ | Learning rate
$\ell_t\in[0,1]^d$ | Loss vector
$\hat{\ell}_t\in[0,\infty]^d$ | Estimated loss vector
$\hat{L}_t\in[0,\infty]^d$ | Cumulative estimated loss vector
$\mathrm{Rank}(i,\bm{u};\mathcal{B})$ | Rank of the $i$-th element of $\bm{u}$ in $\{u_j:j\in\mathcal{B}\}$ in descending order; $\mathcal{B}$ is omitted when $\mathcal{B}=[d]$
$\sigma_i$ | Number of arms (including $i$ itself) whose cumulative losses do not exceed $\hat{L}_{t,i}$, i.e., $\mathrm{Rank}(i,-\hat{L}_t)$
$\nu$ | Left endpoint of the support of the perturbation distribution
$r_t\in[\nu,\infty]^d$ | $d$-dimensional perturbation
$f(x)$ | Probability density function of the perturbation
$F(x)$ | Cumulative distribution function of the perturbation
$\mathcal{F}_\alpha$ | Fréchet distribution with shape $\alpha$
$\mathcal{P}_\alpha$ | Pareto distribution with shape $\alpha$
Input: Action set $\mathcal{A}\subseteq\{0,1\}^d$, learning rate $\eta\in\mathbb{R}^+$;
Initialization: $\hat{L}_1\coloneqq\mathbf{0}\in\mathbb{R}^d$;
for $t=1,\dots,T$ do
   Sample $r_t=(r_{t,1},r_{t,2},\dots,r_{t,d})$ i.i.d. from $\mathcal{D}$;
   Choose action $a_t=\arg\min_{a\in\mathcal{A}}\{a^\top(\eta\hat{L}_t-r_t)\}$ and observe $\{\ell_{t,i}:a_{t,i}=1\}$;
   Compute an estimator $\widehat{w_{t,i}^{-1}}$ of $w_{t,i}^{-1}$ by geometric resampling for all $i$ such that $a_{t,i}=1$;
   Set $\hat{\ell}_t\coloneqq\sum_{i:a_{t,i}=1}\ell_{t,i}\widehat{w_{t,i}^{-1}}e_i$ and $\hat{L}_{t+1}\coloneqq\hat{L}_t+\hat{\ell}_t$;
end for
Algorithm 1 Follow-the-Perturbed-Leader

We consider the Follow-the-Perturbed-Leader (FTPL) policy, whose entire procedure is given in Algorithm 1. In combinatorial semi-bandit problems, the FTPL policy maintains a cumulative estimated loss $\hat{L}_t$ and plays an action

a_t=\arg\min_{a\in\mathcal{A}}\left\{a^\top(\eta\hat{L}_t-r_t)\right\},

where $\eta\in\mathbb{R}^+$ is the learning rate, and $r_t=(r_{t,1},r_{t,2},\dots,r_{t,d})$ denotes the random perturbation whose components are i.i.d. from a common distribution $\mathcal{D}$ with distribution function $F$. In this paper, we consider two types of perturbation distributions. The first is the Fréchet distribution $\mathcal{F}_\alpha$, with the probability density function $f(x)$ and the cumulative distribution function $F(x)$ given by

f(x)=\alpha x^{-(\alpha+1)}e^{-1/x^{\alpha}},\quad F(x)=e^{-1/x^{\alpha}},\quad x\geq 0,\ \alpha>1.

The second is the Pareto distribution $\mathcal{P}_\alpha$, whose density and cumulative distribution functions are defined as

f(x)=\alpha x^{-(\alpha+1)},\quad F(x)=1-x^{-\alpha},\quad x\geq 1,\ \alpha>1.
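Both distributions admit closed-form inverse CDFs, so the perturbation step of Algorithm 1 is cheap to implement. The following is a minimal NumPy sketch (the function names are ours); it also uses the fact that, for the size-invariant action set, the arg-min in Algorithm 1 is attained by the $m$ smallest coordinates of $\eta\hat{L}_t-r_t$.

```python
import numpy as np

def sample_perturbation(d, alpha, dist, rng):
    """Inverse-transform sampling of d i.i.d. perturbations.
    Frechet: F(x) = exp(-x^{-alpha})  =>  F^{-1}(u) = (-log u)^{-1/alpha}
    Pareto:  F(x) = 1 - x^{-alpha}    =>  F^{-1}(u) = (1 - u)^{-1/alpha}"""
    u = rng.random(d)
    if dist == "frechet":
        return (-np.log(u)) ** (-1.0 / alpha)
    return (1.0 - u) ** (-1.0 / alpha)

def ftpl_action(L_hat, r, eta, m):
    """For A = {a in {0,1}^d : ||a||_1 = m}, the minimizer of
    a^T(eta*L_hat - r) selects the m smallest perturbed losses."""
    chosen = np.argpartition(eta * L_hat - r, m - 1)[:m]
    a = np.zeros(L_hat.shape[0], dtype=int)
    a[chosen] = 1
    return a

rng = np.random.default_rng(0)
r = sample_perturbation(6, alpha=2.0, dist="frechet", rng=rng)
a = ftpl_action(np.zeros(6), r, eta=0.1, m=2)  # two base-arms selected
```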

Denote the rank of the $i$-th element of $\bm{u}$ in descending order by $\mathrm{Rank}(i,\bm{u})$. The probability that base-arm $i$ is selected with $\mathrm{Rank}(i,r-\lambda)=\theta\in[m]$ given $\hat{L}_t$ is written as $\phi_{i,\theta}(\eta\hat{L}_t;\mathcal{D})$. Then, for $\lambda\in[0,\infty)^d$, letting $\widetilde{\lambda}_1,\widetilde{\lambda}_2,\dots,\widetilde{\lambda}_d$ be the sorted elements of $\lambda$ such that $\widetilde{\lambda}_1\leq\widetilde{\lambda}_2\leq\cdots\leq\widetilde{\lambda}_d$, $\phi_{i,\theta}(\lambda;\mathcal{D})$ can be expressed as

\phi_{i,\theta}(\lambda;\mathcal{D})\coloneqq\mathbb{P}_{r=(r_1,\dots,r_d)\sim\mathcal{D}}\left[\mathrm{Rank}(i,r-\lambda)=\theta\right]
=\int_{\nu-\widetilde{\lambda}_{\theta}}^{\infty}\sum_{\bm{v}\in\mathcal{S}_{i,\theta}}\left(\prod_{j:v_j=1}\left(1-F(z+\lambda_j)\right)\prod_{j:v_j=0,\,j\neq i}F(z+\lambda_j)\right)\mathrm{d}F(z+\lambda_i), \quad (1)

where 𝒮i,θ\mathcal{S}_{i,\theta} is defined as 𝒮i,θ={𝒗{0,1}d:𝒗1=θ1,vi=0}\mathcal{S}_{i,\theta}=\quantity{\bm{v}\in\quantity{0,1}^{d}:\left\lVert\bm{v}\right\rVert_{1}=\theta-1,v_{i}=0}. Here, ν\nu denotes the left endpoint of the support of FF.

Then, we write the probability of selecting base-arm $i$ as $w_{t,i}=\phi_i(\eta\hat{L}_t;\mathcal{D})$, where for $\lambda\in[0,\infty)^d$

\phi_i(\lambda;\mathcal{D})\coloneqq\sum_{\theta=1}^{m}\mathbb{P}_{r=(r_1,\dots,r_d)\sim\mathcal{D}}\left[\mathrm{Rank}(i,r-\lambda)=\theta\right]=\sum_{\theta=1}^{m}\phi_{i,\theta}(\lambda;\mathcal{D})
=\sum_{\theta=1}^{m}\int_{\nu-\widetilde{\lambda}_{\theta}}^{\infty}\sum_{\bm{v}\in\mathcal{S}_{i,\theta}}\left(\prod_{j:v_j=1}\left(1-F(z+\lambda_j)\right)\prod_{j:v_j=0,\,j\neq i}F(z+\lambda_j)\right)\mathrm{d}F(z+\lambda_i). \quad (2)

Table 1 summarizes the notation used in this paper.
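Though (2) looks involved, $\phi_i(\lambda;\mathcal{D})$ is simply the probability that coordinate $i$ of $r-\lambda$ ranks within the top $m$, so it can be sanity-checked by Monte Carlo. A small sketch (the function name is ours):

```python
import numpy as np

def phi_mc(lam, i, m, alpha, dist="frechet", n=200_000, seed=0):
    """Monte Carlo estimate of phi_i(lambda; D): the probability that
    Rank(i, r - lambda) <= m, i.e. that base-arm i is selected."""
    rng = np.random.default_rng(seed)
    u = rng.random((n, len(lam)))
    r = (-np.log(u)) ** (-1 / alpha) if dist == "frechet" else (1 - u) ** (-1 / alpha)
    z = r - np.asarray(lam)
    rank_i = (z > z[:, [i]]).sum(axis=1) + 1   # descending rank of coordinate i
    return float((rank_i <= m).mean())

# e.g. phi_mc([0.0, 0.5, 1.0, 2.0], i=1, m=2, alpha=2.0)
```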

2.2 Geometric Resampling

Input: Chosen action $a_t$, action set $\mathcal{A}$, cumulative loss $\hat{L}_t$, learning rate $\eta$
Set $K\coloneqq\bm{0}\in\mathbb{R}^d$; $s\coloneqq a_t$;
repeat
   $K\coloneqq K+s$;
   Sample $r_t'=(r_{t,1}',r_{t,2}',\dots,r_{t,d}')$ i.i.d. from $\mathcal{D}$;
   $a_t'\coloneqq\arg\min_{a\in\mathcal{A}}\{a^\top(\eta\hat{L}_t-r_t')\}$;
   $s\coloneqq s\circ(\bm{1}_d-a_t')$;   // $\bm{1}_d$ denotes the $d$-dimensional all-ones vector
until $s=\bm{0}$;
Set $\widehat{w_{t,i}^{-1}}\coloneqq K_i$ for all $i$ such that $a_{t,i}=1$;
Algorithm 2 Geometric Resampling

Since the loss in every round is only partially observable under semi-bandit feedback, many policies use an estimator $\hat{\ell}_t$ of the loss vector $\ell_t$ to construct an unbiased loss estimate. Then, the cumulative estimated loss vector $\hat{L}_t$ is obtained as $\hat{L}_t=\sum_{s=1}^{t-1}\hat{\ell}_s$. In standard multi-armed bandit problems, many policies such as FTRL often employ the importance-weighted (IW) estimator $\hat{\ell}_t=(\ell_{t,I_t}/w_{t,I_t})e_{I_t}$, where $I_t$ is the chosen arm at round $t$ and the arm-selection probability $w_{t,I_t}$ is explicitly computed. However, in the combinatorial semi-bandit setting, the individual selection probability of each base-arm $i$ is not available in closed form, which complicates the construction of an unbiased estimator. To address this issue, Neu and Bartók (2016) proposed a technique called Geometric Resampling (GR). With this technique, for each selected base-arm $i$, the FTPL policy can efficiently compute an unbiased estimator $\widehat{w_{t,i}^{-1}}$ of $w_{t,i}^{-1}$. The procedure of GR is shown in Algorithm 2, where $a\circ b$ denotes the element-wise product of two vectors $a$ and $b$, i.e., $(a\circ b)_i=a_ib_i$ for all $i$.
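As a concrete sketch of Algorithm 2 (assuming the helpers `sample_perturbation` and `ftpl_action` from the sketch in Section 2.1; the iteration cap is our addition to keep the loop finite):

```python
import numpy as np

def geometric_resampling(L_hat, eta, m, a_t, alpha, dist, rng, max_iter=1_000_000):
    """Redraw perturbations until every selected base-arm has been
    re-selected once; K_i counts the draws needed for arm i, and
    K_i is an unbiased estimator of 1 / w_{t,i}."""
    d = L_hat.shape[0]
    K = np.zeros(d)
    s = a_t.astype(float)                   # arms still waiting to be hit
    for _ in range(max_iter):               # cap in place of the open repeat-loop
        K += s
        r = sample_perturbation(d, alpha, dist, rng)
        a_prime = ftpl_action(L_hat, r, eta, m)
        s *= 1.0 - a_prime                  # s_i -> 0 once arm i is re-selected
        if not s.any():
            break
    return K                                # K_i = estimate of w_{t,i}^{-1} where a_{t,i} = 1
```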

Now we consider the computational complexity of GR. Let $M_{t,i}$ denote the number of resampling steps taken by geometric resampling at round $t$ until $s_i$ switches from $1$ to $0$; here, $M_{t,i}$ equals $\widehat{w_{t,i}^{-1}}$ in GR. Then, the expected total number of resampling steps $M_t=\max_{i:a_{t,i}=1}M_{t,i}$ given $\hat{L}_t$ can be bounded as

\mathbb{E}\left[M_t\,\middle|\,\hat{L}_t\right]=\mathbb{E}\left[\max_{i:a_{t,i}=1}M_{t,i}\,\middle|\,\hat{L}_t\right]
\leq\mathbb{E}\left[\sum_{i=1}^{d}a_{t,i}M_{t,i}\,\middle|\,\hat{L}_t\right]
=\sum_{i=1}^{d}\mathbb{E}\left[a_{t,i}\,\middle|\,\hat{L}_t\right]\mathbb{E}\left[M_{t,i}\,\middle|\,\hat{L}_t,a_{t,i}\right]
=\sum_{i=1}^{d}w_{t,i}\cdot\frac{1}{w_{t,i}}=d.

For each resampling step, generating the perturbation requires $O(d)$ time, and thus the total complexity of GR at each round is $O(d^2)$, which is independent of $w_t$. Compared with many other policies for combinatorial semi-bandit problems, FTPL with GR is computationally more efficient. However, even in standard $K$-armed bandit problems, the computational complexity of FTPL with GR remains $O(K^2)$. Though FTPL remains efficient for moderate $K$ (Honda et al., 2023), primarily thanks to its optimization-free nature, the running time increases substantially as $K$ grows large. To overcome this limitation, Chen et al. (2025) proposed an improved technique called Conditional Geometric Resampling (CGR), which reduces the complexity to $O(K\log K)$ and demonstrates superior runtime performance. Inspired by this, we extend CGR to the size-invariant semi-bandit setting in this paper, which is presented in Section 5.

3 Regret Bounds

In this section, we summarize the regret bound of FTPL in the adversarial size-invariant semi-bandit setting.

3.1 Main Results

Combining Lemma 2 from this section and Lemma 6 given in Section 4, we obtain the regret bound of FTPL in the adversarial setting in the following theorem.

Theorem 1.

In the adversarial setting, when the perturbation follows the Fréchet distribution with shape $\alpha>1$, FTPL with learning rate

\eta=\sqrt{\frac{\left(\frac{\alpha}{\alpha-1}m^{1-\frac{1}{\alpha}}+\Gamma\left(1-\frac{1}{\alpha}\right)\right)(d+1)^{\frac{1}{\alpha}}+m}{2(\alpha+1)\left(m+\frac{1}{\alpha}\right)^{\frac{1}{\alpha}}\left(m+\frac{\alpha}{\alpha-1}(d-m+1)^{1-1/\alpha}\right)T}}

satisfies

\mathcal{R}(T)\leq 2\left(2(\alpha+1)\left(m+\frac{1}{\alpha}\right)^{\frac{1}{\alpha}}\left(m+\frac{\alpha}{\alpha-1}(d-m+1)^{1-1/\alpha}\right)\left(\left(\frac{\alpha}{\alpha-1}m^{1-\frac{1}{\alpha}}+\Gamma\left(1-\frac{1}{\alpha}\right)\right)(d+1)^{\frac{1}{\alpha}}+m\right)T\right)^{\frac{1}{2}},

whose order is $O(\sqrt{m^2 d^{1/\alpha}T}+\sqrt{mdT})$, where $\Gamma(\cdot)$ is the gamma function. When the perturbation follows the Pareto distribution with shape $\alpha>1$, FTPL with learning rate

\eta=\sqrt{\frac{\left(\alpha m^{1-\frac{1}{\alpha}}+(\alpha-1)\Gamma\left(1-\frac{1}{\alpha}\right)\right)(d+1)^{\frac{1}{\alpha}}}{4\alpha^{2}\left(m+\frac{1}{\alpha}\right)^{\frac{1}{\alpha}}d^{1-1/\alpha}T}}

satisfies

\mathcal{R}(T)\leq\frac{4\alpha}{(\alpha-1)^{\frac{1}{2}}}\left(\left(m+\frac{1}{\alpha}\right)^{\frac{1}{\alpha}}\left(\frac{\alpha}{\alpha-1}m^{1-\frac{1}{\alpha}}+\Gamma\left(1-\frac{1}{\alpha}\right)\right)d^{1-1/\alpha}(d+1)^{\frac{1}{\alpha}}T\right)^{\frac{1}{2}},

whose order is $O(\sqrt{mdT})$.

3.2 Regret Decomposition

To evaluate the regret of FTPL, we first decompose the regret, which is expressed as

\mathcal{R}(T)=\mathbb{E}\left[\sum_{t=1}^{T}\langle\ell_t,a_t-a^*\rangle\right]=\sum_{t=1}^{T}\mathbb{E}\left[\langle\ell_t,a_t-a^*\rangle\right]=\sum_{t=1}^{T}\mathbb{E}\left[\langle\hat{\ell}_t,a_t-a^*\rangle\right],
where the last equality follows from the unbiasedness of the estimator $\hat{\ell}_t$.

This can be decomposed in the following way, whose proof is given in Section 6.

Lemma 2.

For any $\alpha>1$ and $\mathcal{D}_\alpha\in\{\mathcal{F}_\alpha,\mathcal{P}_\alpha\}$,

\mathcal{R}(T)\leq\begin{cases}\sum_{t=1}^{T}\mathbb{E}\left[\langle\hat{\ell}_t,w_t-w_{t+1}\rangle\right]+\frac{\left(\frac{\alpha}{\alpha-1}m^{1-\frac{1}{\alpha}}+\Gamma\left(1-\frac{1}{\alpha}\right)\right)(d+1)^{\frac{1}{\alpha}}+m}{\eta}&\text{if }\mathcal{D}_\alpha=\mathcal{F}_\alpha,\\ \sum_{t=1}^{T}\mathbb{E}\left[\langle\hat{\ell}_t,w_t-w_{t+1}\rangle\right]+\frac{\left(\frac{\alpha}{\alpha-1}m^{1-\frac{1}{\alpha}}+\Gamma\left(1-\frac{1}{\alpha}\right)\right)(d+1)^{\frac{1}{\alpha}}}{\eta}&\text{if }\mathcal{D}_\alpha=\mathcal{P}_\alpha.\end{cases} \quad (3)

We refer to the first and second terms of (3) as the stability term and the penalty term, respectively.

4 Stability of Arm-selection Probability

In the standard multi-armed bandit problem, the core and most challenging part of analyzing the regret of FTPL lies in the analysis of the arm-selection probability function (Abernethy et al., 2015; Honda et al., 2023; Lee et al., 2024). This challenge is further amplified in the combinatorial semi-bandit setting, where the base-arm selection probability $\phi_i(\lambda;\mathcal{D})$ given in (2) exhibits significantly greater complexity. To this end, this section first introduces some tools used in the analysis and then derives properties of $\phi_i(\lambda;\mathcal{D})$, which constitute the main difficulty of the analysis of FTPL.

4.1 General Tools for Analysis

Since the probability of some events over different base-arm sets $\mathcal{B}\subset[d]$ will be considered in the subsequent analysis, we introduce the parameter $\mathcal{B}$ into $\phi_{i,\theta}(\cdot)$. Denote the rank of the $i$-th element of $\bm{u}$ in $\{u_j:j\in\mathcal{B}\}$ in descending order by $\mathrm{Rank}(i,\bm{u};\mathcal{B})$. We define

\phi_{i,\theta}(\lambda;\mathcal{D}_\alpha,\mathcal{B})=\mathbb{P}_{r=(r_1,\dots,r_d)\sim\mathcal{D}}\left[\mathrm{Rank}(i,r-\lambda;\mathcal{B})=\theta\right]
=\int_{\nu-\widetilde{\lambda}_{\theta}}^{\infty}\sum_{\bm{v}\in\mathcal{S}_{i,\theta}^{\mathcal{B}}}\left(\prod_{j:v_j=1}\left(1-F(z+\lambda_j)\right)\prod_{j:v_j=0,\,j\in\mathcal{B}\setminus\{i\}}F(z+\lambda_j)\right)\mathrm{d}F(z+\lambda_i), \quad (4)

where $\mathcal{S}_{i,\theta}^{\mathcal{B}}=\{\bm{v}\in\{0,1\}^d:\lVert\bm{v}\rVert_1=\theta-1,\,v_i=0,\text{ and }v_j=0\text{ for all }j\notin\mathcal{B}\}$.

Under this definition, $\phi_{i,\theta}(\lambda;\mathcal{D}_\alpha,\mathcal{B})$ is the probability that base-arm $i$ ranks $\theta$-th among the base-arm set $\mathcal{B}$. Based on (4), we define

\phi_i(\lambda;\mathcal{D}_\alpha,\widetilde{m},\mathcal{B})=\sum_{\theta=1}^{\widetilde{m}}\mathbb{P}_{r=(r_1,\dots,r_d)\sim\mathcal{D}}\left[\mathrm{Rank}(i,r-\lambda;\mathcal{B})=\theta\right]=\sum_{\theta=1}^{\widetilde{m}}\phi_{i,\theta}(\lambda;\mathcal{D}_\alpha,\mathcal{B}),

which represents the probability of selecting base-arm $i$ when the base-arm set and the number of selected base-arms are set to $\mathcal{B}$ and $\widetilde{m}$, respectively, in the size-invariant semi-bandit setting. Since the definition above is an extension of (1) and (2), we have

\phi_{i,\theta}(\lambda;\mathcal{D}_\alpha)=\phi_{i,\theta}(\lambda;\mathcal{D}_\alpha,[d]),\quad\text{and}\quad\phi_i(\lambda;\mathcal{D}_\alpha)=\phi_i(\lambda;\mathcal{D}_\alpha,m,[d]).

For the analysis of the derivative, based on (4) we define

J_{i,\theta}(\lambda;\mathcal{D}_\alpha,\mathcal{B})=\int_{\nu-\widetilde{\lambda}_{\theta}}^{\infty}\frac{1}{z+\lambda_i}\sum_{\bm{v}\in\mathcal{S}_{i,\theta}^{\mathcal{B}}}\left(\prod_{j:v_j=1}\left(1-F(z+\lambda_j)\right)\prod_{j:v_j=0,\,j\in\mathcal{B}\setminus\{i\}}F(z+\lambda_j)\right)\mathrm{d}F(z+\lambda_i)

and

J_i(\lambda;\mathcal{D}_\alpha,\widetilde{m},\mathcal{B})=\sum_{\theta=1}^{\widetilde{m}}J_{i,\theta}(\lambda;\mathcal{D}_\alpha,\mathcal{B}).

When $\widetilde{m}=m$ and $\mathcal{B}=[d]$, we simply write

J_{i,\theta}(\lambda;\mathcal{D}_\alpha)=J_{i,\theta}(\lambda;\mathcal{D}_\alpha,[d]),\quad\text{and}\quad J_i(\lambda;\mathcal{D}_\alpha)=J_i(\lambda;\mathcal{D}_\alpha,m,[d]).

In the following, we write $\sigma_i$ to denote the number of arms (including $i$ itself) whose cumulative losses do not exceed $\hat{L}_{t,i}$, i.e., $\mathrm{Rank}(i,-\hat{L}_t)=\sigma_i$. Without loss of generality, in the subsequent analysis we always assume $\lambda_1\leq\lambda_2\leq\cdots\leq\lambda_d$ (ties are broken arbitrarily) so that $\sigma_i=i$ for notational simplicity. To derive an upper bound, we employ the tools introduced above to provide lemmas on the relation between the base-arm selection probability and its derivatives.

4.2 Important Lemmas

Lemma 3.

It holds that

\frac{J_i(\lambda;\mathcal{D}_\alpha)}{\phi_i(\lambda;\mathcal{D}_\alpha)}\leq\max_{\substack{w\in\{0\}\cup[(m\land i)-1]\\ \theta\in[(m\land i)-w]}}\left\{\frac{J_{i,\theta}(\lambda^*;\mathcal{D}_\alpha,\mathcal{B}_{i,w})}{\phi_{i,\theta}(\lambda^*;\mathcal{D}_\alpha,\mathcal{B}_{i,w})}\right\},

where

\mathcal{B}_{i,w}=\begin{cases}[i],&\text{if }w=0,\\ [i]\setminus[w],&\text{if }w\in[i],\end{cases}\qquad\text{and}\qquad\lambda_k^*=\begin{cases}\lambda_i,&\text{if }k\leq i,\\ \lambda_k,&\text{if }k>i.\end{cases}

Based on this result, the following lemma holds.

Lemma 4.

If $\mathrm{Rank}(i,-\lambda)=\sigma_i$, that is, $\lambda_i$ is the $\sigma_i$-th smallest among $\lambda_1,\dots,\lambda_d$ (ties are broken arbitrarily), then

\frac{J_i(\lambda;\mathcal{F}_\alpha)}{\phi_i(\lambda;\mathcal{F}_\alpha)}\leq\left(\frac{(\sigma_i\land m)+\frac{1}{\alpha}}{(\sigma_i-m+1)\lor 1}\right)^{\frac{1}{\alpha}}\quad\text{and}\quad\frac{J_i(\lambda;\mathcal{P}_\alpha)}{\phi_i(\lambda;\mathcal{P}_\alpha)}\leq\frac{2\alpha}{\alpha+1}\left(\frac{(\sigma_i\land m)+\frac{1}{\alpha}}{\sigma_i}\right)^{\frac{1}{\alpha}}.

Next, following the steps in Honda et al. (2023) and Lee et al. (2024), we extend the analysis to the combinatorial semi-bandit setting and obtain the following lemma.

Lemma 5.

For any $i\in[d]$, $\alpha>1$, $\eta\hat{L}_t$ and $\mathcal{D}_\alpha\in\{\mathcal{F}_\alpha,\mathcal{P}_\alpha\}$, it holds that

\mathbb{E}\left[\hat{\ell}_{t,i}\left(\phi_i(\eta\hat{L}_t;\mathcal{D}_\alpha)-\phi_i(\eta(\hat{L}_t+\hat{\ell}_t);\mathcal{D}_\alpha)\right)\,\middle|\,\hat{L}_t\right]\leq\begin{cases}2(\alpha+1)\eta\left(\frac{(\sigma_i\land m)+\frac{1}{\alpha}}{(\sigma_i-m+1)\lor 1}\right)^{\frac{1}{\alpha}},&\text{if }\mathcal{D}_\alpha=\mathcal{F}_\alpha,\\ 4\alpha\eta\left(\frac{(\sigma_i\land m)+\frac{1}{\alpha}}{\sigma_i}\right)^{\frac{1}{\alpha}},&\text{if }\mathcal{D}_\alpha=\mathcal{P}_\alpha.\end{cases}

By using the above lemma, we can bound the stability term as follows.

Lemma 6.

For any $\eta\hat{L}_t$, $\alpha>1$ and $\mathcal{D}_\alpha\in\{\mathcal{F}_\alpha,\mathcal{P}_\alpha\}$, it holds that

\mathbb{E}\left[\hat{\ell}_t^\top\left(\phi(\eta\hat{L}_t;\mathcal{D}_\alpha)-\phi(\eta(\hat{L}_t+\hat{\ell}_t);\mathcal{D}_\alpha)\right)\,\middle|\,\hat{L}_t\right]\leq\begin{cases}2(\alpha+1)\eta\left(m+\frac{1}{\alpha}\right)^{\frac{1}{\alpha}}\left(m+\frac{\alpha}{\alpha-1}(d-m+1)^{1-1/\alpha}\right),&\text{if }\mathcal{D}_\alpha=\mathcal{F}_\alpha,\\ \frac{4\alpha^{2}}{\alpha-1}\eta\left(m+\frac{1}{\alpha}\right)^{\frac{1}{\alpha}}d^{1-1/\alpha},&\text{if }\mathcal{D}_\alpha=\mathcal{P}_\alpha.\end{cases}

5 Conditional Geometric Resampling for Size-Invariant Semi-Bandit

Building on the idea proposed by Chen et al. (2025), this section introduces an extension of Conditional Geometric Resampling (CGR) to the size-invariant semi-bandit setting. This algorithm is designed to provide the unbiased estimators $\{\widehat{w_{t,i}^{-1}}:a_{t,i}=1\}$ in a more efficient way, based on the following lemma.

Lemma 7.

Let $\mathcal{E}_{t,i}$ be an arbitrary necessary condition for

\left[\arg\min_{a\in\mathcal{A}}\left\{a^\top(\eta\hat{L}_t-r_t'')\right\}\right]_i=1. \quad (5)

Consider resampling of $r_t''$ from $\mathcal{D}$ conditioned on $\mathcal{E}_{t,i}$ until (5) is satisfied. Then, the number $M_{t,i}$ of resampling steps for base-arm $i$ satisfies

\mathbb{E}[M_{t,i}|\hat{L}_t,a_{t,i}]=\frac{\mathbb{P}[\mathcal{E}_{t,i}|\hat{L}_t,a_{t,i}]}{w_{t,i}}.

From this lemma, we can use

\widehat{w_{t,i}^{-1}}=\frac{M_{t,i}}{\mathbb{P}[\mathcal{E}_{t,i}|\hat{L}_t,a_{t,i}]}

as an unbiased estimator of $w_{t,i}^{-1}$ for $r_t''$ sampled from $\mathcal{D}$ conditioned on $\mathcal{E}_{t,i}$.

Proof.

Define

\chi_{t,i}(r_t'')=\begin{cases}1,&\text{if }\left[\arg\min_{a\in\mathcal{A}}\left\{a^\top(\eta\hat{L}_t-r_t'')\right\}\right]_i=1,\\ 0,&\text{otherwise}.\end{cases}

Consider wt,iw_{t,i}, the probability that base-arm ii is selected, with the condition t,i\mathcal{E}_{t,i}. wt,iw_{t,i} can be expressed as

w_{t,i}=\mathbb{P}[\chi_{t,i}(r_t'')=1|\hat{L}_t,a_{t,i}]
=\mathbb{P}[\chi_{t,i}(r_t'')=1|\mathcal{E}_{t,i},\hat{L}_t,a_{t,i}]\,\mathbb{P}[\mathcal{E}_{t,i}|\hat{L}_t,a_{t,i}]+\mathbb{P}[\chi_{t,i}(r_t'')=1|\mathcal{E}_{t,i}^c,\hat{L}_t,a_{t,i}]\,\mathbb{P}[\mathcal{E}_{t,i}^c|\hat{L}_t,a_{t,i}]. \quad (6)

Note that $\mathcal{E}_{t,i}$ is an arbitrary necessary condition for $\chi_{t,i}(r_t'')=1$, which implies that

\mathbb{P}[\chi_{t,i}(r_t'')=1|\mathcal{E}_{t,i}^c,\hat{L}_t,a_{t,i}]=0.

Therefore, from (6) we immediately obtain

w_{t,i}=\mathbb{P}[\chi_{t,i}(r_t'')=1|\mathcal{E}_{t,i},\hat{L}_t,a_{t,i}]\,\mathbb{P}[\mathcal{E}_{t,i}|\hat{L}_t,a_{t,i}]. \quad (7)

Now we consider the expected number of resampling steps $M_{t,i}$ for base-arm $i$. Recall that $r_t''$ is sampled from $\mathcal{D}$ conditioned on $\mathcal{E}_{t,i}$ until (5) is satisfied, that is, until $\chi_{t,i}(r_t'')=1$. Then $M_{t,i}$ follows a geometric distribution with probability mass function

\mathbb{P}[M_{t,i}=m'|\hat{L}_t,a_{t,i}]=\left(1-\mathbb{P}[\chi_{t,i}(r_t'')=1|\mathcal{E}_{t,i},\hat{L}_t,a_{t,i}]\right)^{m'-1}\mathbb{P}[\chi_{t,i}(r_t'')=1|\mathcal{E}_{t,i},\hat{L}_t,a_{t,i}].

Therefore, the expected number of resampling steps given $\hat{L}_t$ and $a_{t,i}$ is expressed as

\mathbb{E}_{r_t''\sim\mathcal{D}|\mathcal{E}_{t,i}}[M_{t,i}|\hat{L}_t,a_{t,i}]=\mathbb{P}[\chi_{t,i}(r_t'')=1|\mathcal{E}_{t,i},\hat{L}_t,a_{t,i}]\sum_{n=1}^{\infty}n\left(1-\mathbb{P}[\chi_{t,i}(r_t'')=1|\mathcal{E}_{t,i},\hat{L}_t,a_{t,i}]\right)^{n-1}
=\mathbb{P}[\chi_{t,i}(r_t'')=1|\mathcal{E}_{t,i},\hat{L}_t,a_{t,i}]\Big/\left(\mathbb{P}[\chi_{t,i}(r_t'')=1|\mathcal{E}_{t,i},\hat{L}_t,a_{t,i}]\right)^{2}
=1\Big/\mathbb{P}[\chi_{t,i}(r_t'')=1|\mathcal{E}_{t,i},\hat{L}_t,a_{t,i}]. \quad (8)

Combining (7) and (8), we obtain

\mathbb{E}_{r_t''\sim\mathcal{D}|\mathcal{E}_{t,i}}[M_{t,i}|\hat{L}_t,a_{t,i}]=\frac{\mathbb{P}[\mathcal{E}_{t,i}|\hat{L}_t,a_{t,i}]}{w_{t,i}}.

For a base-arm $i$ such that $\sigma_i>m$, we now consider resampling $r_t''$ from the perturbation distribution $\mathcal{D}$ conditioned on

\mathcal{E}_{t,i}=\left\{\left|\left\{j:r_{t,j}''\geq r_{t,i}'',\ \sigma_j\leq\sigma_i\right\}\right|\leq m\right\},

that is, the event that $r_{t,i}''$ lies among the top-$m$ largest of the base-arms $j$ whose cumulative estimated losses are no worse than that of $i$. By the symmetric nature of the i.i.d. perturbations, we can sample $r_t''$ from this conditional distribution with a simple value-swapping operation, which corresponds to the inner for-loop in Algorithm 3. For each base-arm $i$ such that $\sigma_i\leq m$, the resampling procedure in our proposed algorithm is the same as the original GR. By Lemma 7, we can derive the properties of CGR in the size-invariant semi-bandit setting as follows.

Input: Chosen action $a_t$, action set $\mathcal{A}$, cumulative loss $\hat{L}_t$, learning rate $\eta$;
Set $K\coloneqq\bm{0}\in\mathbb{R}^d$; $s\coloneqq a_t$; $U\coloneqq\varnothing$; $C\coloneqq\mathbf{1}_d\in\mathbb{R}^d$;
for $i=1,\dots,d$ do
   if $a_{t,i}=1$ and $\sigma_i>m$ then
      $U\coloneqq U\cup\{i\}$; $C_i\coloneqq\sigma_i/m$;
   end if
end for
repeat
   $K\coloneqq K+s$;
   Sample $r_t'=(r_{t,1}',r_{t,2}',\dots,r_{t,d}')$ i.i.d. from $\mathcal{D}$;
   $a_t'\coloneqq\arg\min_{a\in\mathcal{A}}\{a^\top(\eta\hat{L}_t-r_t')\}$;
   Sample $\theta$ from $[m]$ uniformly at random;
   for $i\in U$ do
      Find $i'$ such that $r_{t,i'}'$ is the $\theta$-th largest in $\{r_{t,j}':\sigma_j\leq\sigma_i\}$;
      Set $r_t''\coloneqq r_t'$;
      Swap $r_{t,i'}''$ and $r_{t,i}''$;
      $a_{t,i}'\coloneqq\left[\arg\min_{a\in\mathcal{A}}\{a^\top(\eta\hat{L}_t-r_t'')\}\right]_i$;
      if $a_{t,i}'=1$ then $U\coloneqq U\setminus\{i\}$;
   end for
   $s\coloneqq s\circ(\mathbf{1}_d-a_t')$;
until $s=\bm{0}$;
Set $\widehat{w_{t,i}^{-1}}\coloneqq C_iK_i$ for all $i$ such that $a_{t,i}=1$;
Algorithm 3 Conditional Geometric Resampling
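To make the value-swapping step concrete, the following sketch (ours, not part of the formal algorithm) performs one conditional draw for a single base-arm $i$ with $\sigma_i>m$, assuming arms are indexed in the sorted order of Section 4 so that $\sigma_i=i+1$ for a 0-based index $i$; the resulting vector follows $\mathcal{D}$ conditioned on the event $\mathcal{E}_{t,i}$ appearing in Lemma 8 below.

```python
import numpy as np

def conditional_draw(r, i, m, rng):
    """One conditional draw for base-arm i (0-based, sigma_i = i+1 > m),
    assuming arms are sorted by cumulative estimated loss: draw theta
    uniformly from {1,...,m} and swap r_i with the theta-th largest
    perturbation among coordinates 0..i."""
    theta = int(rng.integers(1, m + 1))
    j = int(np.argsort(r[: i + 1])[-theta])  # theta-th largest in the prefix
    r2 = r.copy()
    r2[i], r2[j] = r2[j], r2[i]              # value swap
    return r2

rng = np.random.default_rng(0)
r = rng.pareto(2.0, size=8) + 1.0            # Pareto(2) perturbations, r_i >= 1
r_cond = conditional_draw(r, i=6, m=2, rng=rng)
```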
Lemma 8.

The sample $r_t''$ obtained by Algorithm 3 for each base-arm $i$ such that $\sigma_i>m$ follows the conditional distribution of $\mathcal{D}$ given

\mathcal{E}_{t,i}=\left\{\left|\left\{j:r_{t,j}''\geq r_{t,i}'',\ \sigma_j\leq\sigma_i\right\}\right|\leq m\right\}.

In addition, for any $i\in[d]$,

\widehat{w_{t,i}^{-1}}=\left(\frac{\sigma_i}{m}\lor 1\right)M_{t,i}

given by Algorithm 3 serves as an unbiased estimator of $w_{t,i}^{-1}$, and the number of resampling steps $M_t$ satisfies

\mathbb{E}_{r_t'\sim\mathcal{D},\,r_t''\sim\mathcal{D}|\mathcal{E}_{t,i}}\left[M_t\,\middle|\,\hat{L}_t\right]\leq m+m\log(d/m). \quad (9)
Proof.

Let $\mathbb{P}^*[\cdot]$ denote the probability distribution of $r_t''$ after the value-swapping operation, and let $\mathrm{Rank}_{i,j}$ denote the rank of $r_{t,j}''$ among $\{r_{t,k}'':\sigma_k\leq\sigma_i\}$. Then, we have $\mathcal{E}_{t,i}=\{\mathrm{Rank}_{i,i}\in[m]\}$. Given $\hat{L}_t$, $a_{t,i}$ and $\theta$, for any realization $\theta_0\in[m]$ of $\theta$ we have

\mathbb{P}^*\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\}\,\middle|\,\hat{L}_t,a_{t,i},\theta=\theta_0\right]
=\sum_{j:\sigma_j\leq\sigma_i}\mathbb{P}\left[\bigcap\nolimits_{k:\sigma_k\leq\sigma_i,\,k\notin\{j,i\}}\{r_{t,k}''\leq x_k\},\ r_{t,j}''\leq x_i,\ r_{t,i}''\leq x_j,\ \mathrm{Rank}_{i,j}=\theta_0\,\middle|\,\hat{L}_t,a_{t,i}\right]
=\sum_{j:\sigma_j\leq\sigma_i}\mathbb{P}\left[\bigcap\nolimits_{k:\sigma_k\leq\sigma_i,\,k\notin\{j,i\}}\{r_{t,k}''\leq x_k\},\ r_{t,j}''\leq x_i,\ r_{t,i}''\leq x_j\,\middle|\,\mathrm{Rank}_{i,j}=\theta_0,\hat{L}_t,a_{t,i}\right]\mathbb{P}\left[\mathrm{Rank}_{i,j}=\theta_0\,\middle|\,\hat{L}_t,a_{t,i}\right]. \quad (10)

By symmetry of $r_t''\in[\nu,\infty)^d$, we have

\mathbb{P}\left[\mathrm{Rank}_{i,j}=\theta_0\,\middle|\,\hat{L}_t,a_{t,i}\right]=\mathbb{P}\left[\mathrm{Rank}_{i,i}=\theta_0\,\middle|\,\hat{L}_t,a_{t,i}\right] \quad (11)

for any $j$ such that $\sigma_j\leq\sigma_i$. Then we have

1=\mathbb{P}\left[\bigcup_{j:\sigma_j\leq\sigma_i}\{\mathrm{Rank}_{i,j}=\theta_0\}\,\middle|\,\hat{L}_t,a_{t,i}\right]
=\sum_{j:\sigma_j\leq\sigma_i}\mathbb{P}\left[\mathrm{Rank}_{i,j}=\theta_0\,\middle|\,\hat{L}_t,a_{t,i}\right]
=\sigma_i\,\mathbb{P}\left[\mathrm{Rank}_{i,i}=\theta_0\,\middle|\,\hat{L}_t,a_{t,i}\right],

which means that the probability in (11) is equal to $1/\sigma_i$. Therefore, from (10) we have

\mathbb{P}^*\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\}\,\middle|\,\hat{L}_t,a_{t,i},\theta=\theta_0\right]=\frac{1}{\sigma_i}\sum_{j:\sigma_j\leq\sigma_i}\mathbb{P}\left[\bigcap\nolimits_{k:\sigma_k\leq\sigma_i,\,k\notin\{j,i\}}\{r_{t,k}''\leq x_k\},\ r_{t,j}''\leq x_i,\ r_{t,i}''\leq x_j\,\middle|\,\mathrm{Rank}_{i,j}=\theta_0,\hat{L}_t,a_{t,i}\right]. \quad (12)

By symmetry, each probability term on the RHS of (12) is equal. Therefore, we have

\mathbb{P}^*\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\}\,\middle|\,\hat{L}_t,a_{t,i},\theta=\theta_0\right]=\mathbb{P}\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\}\,\middle|\,\mathrm{Rank}_{i,i}=\theta_0,\hat{L}_t,a_{t,i}\right],

with which we immediately obtain

\frac{1}{m}\sum_{\theta_0\in[m]}\mathbb{P}^*\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\}\,\middle|\,\hat{L}_t,a_{t,i},\theta=\theta_0\right]=\frac{1}{m}\sum_{\theta_0\in[m]}\mathbb{P}\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\}\,\middle|\,\mathrm{Rank}_{i,i}=\theta_0,\hat{L}_t,a_{t,i}\right]. \quad (13)

For the LHS of (13), since $\mathbb{P}[\theta=\theta_0]=1/m$ for any $\theta_0\in[m]$, we have

\frac{1}{m}\sum_{\theta_0\in[m]}\mathbb{P}^*\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\}\,\middle|\,\hat{L}_t,a_{t,i},\theta=\theta_0\right]
=\sum_{\theta_0\in[m]}\mathbb{P}^*\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\}\,\middle|\,\hat{L}_t,a_{t,i},\theta=\theta_0\right]\mathbb{P}\left[\theta=\theta_0\right]
=\sum_{\theta_0\in[m]}\mathbb{P}^*\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\}\,\middle|\,\hat{L}_t,a_{t,i},\theta=\theta_0\right]\mathbb{P}\left[\theta=\theta_0\,\middle|\,\hat{L}_t,a_{t,i}\right]
=\sum_{\theta_0\in[m]}\mathbb{P}^*\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\},\ \theta=\theta_0\,\middle|\,\hat{L}_t,a_{t,i}\right]
=\mathbb{P}^*\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\}\,\middle|\,\hat{L}_t,a_{t,i}\right]. \quad (14)

On the other hand, for any $\theta_0\in[m]$ and $\sigma_i\geq m$ we have

\mathbb{P}(\mathcal{E}_{t,i})=\mathbb{P}\left[\mathrm{Rank}_{i,i}\in[m]\,\middle|\,\hat{L}_t,a_{t,i}\right]=\sum_{\theta_0\in[m]}\mathbb{P}\left[\mathrm{Rank}_{i,i}=\theta_0\,\middle|\,\hat{L}_t,a_{t,i}\right]=m/\sigma_i, \quad (15)

and

\mathbb{P}\left[\mathrm{Rank}_{i,i}=\theta_0\,\middle|\,\mathrm{Rank}_{i,i}\in[m],\hat{L}_t,a_{t,i}\right]=\frac{\mathbb{P}\left[\mathrm{Rank}_{i,i}=\theta_0\,\middle|\,\hat{L}_t,a_{t,i}\right]}{\mathbb{P}\left[\mathrm{Rank}_{i,i}\in[m]\,\middle|\,\hat{L}_t,a_{t,i}\right]}=\frac{1/\sigma_i}{m/\sigma_i}=\frac{1}{m}.

Then, for the RHS of (13) we have

\frac{1}{m}\sum_{\theta_0\in[m]}\mathbb{P}\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\}\,\middle|\,\mathrm{Rank}_{i,i}=\theta_0,\hat{L}_t,a_{t,i}\right]
=\sum_{\theta_0\in[m]}\mathbb{P}\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\}\,\middle|\,\mathrm{Rank}_{i,i}=\theta_0,\hat{L}_t,a_{t,i}\right]\mathbb{P}\left[\mathrm{Rank}_{i,i}=\theta_0\,\middle|\,\mathrm{Rank}_{i,i}\in[m],\hat{L}_t,a_{t,i}\right]
=\sum_{\theta_0\in[m]}\mathbb{P}\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\},\ \mathrm{Rank}_{i,i}=\theta_0\,\middle|\,\mathrm{Rank}_{i,i}\in[m],\hat{L}_t,a_{t,i}\right]
=\mathbb{P}\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\}\,\middle|\,\mathrm{Rank}_{i,i}\in[m],\hat{L}_t,a_{t,i}\right]. \quad (16)

Combining (13), (14) and (16), we have

\mathbb{P}^*\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\}\,\middle|\,\hat{L}_t,a_{t,i}\right]=\mathbb{P}\left[\bigcap\nolimits_{j:\sigma_j\leq\sigma_i}\{r_{t,j}''\leq x_j\}\,\middle|\,\mathrm{Rank}_{i,i}\in[m],\hat{L}_t,a_{t,i}\right],

which means that CGR samples $r_t''$ from the distribution of $\mathcal{D}$ conditioned on $\{\mathrm{Rank}_{i,i}\in[m]\}$. Combining this fact and (15) with Lemma 7, for $\sigma_i\geq m$ we have

\mathbb{E}_{r_t''\sim\mathcal{D}|\mathcal{E}_{t,i}}[M_{t,i}|\hat{L}_t,a_{t,i}]=\frac{\mathbb{P}\left[\mathrm{Rank}_{i,i}\in[m]\,\middle|\,\hat{L}_t,a_{t,i}\right]}{w_{t,i}}=\frac{m}{\sigma_i w_{t,i}}.

Note that for $i$ satisfying $\sigma_i\leq m$, the resampling method is the same as the original GR. For such $i$ we have

\mathbb{E}_{r_t'\sim\mathcal{D}}[M_{t,i}|\hat{L}_t,a_{t,i}]=\frac{1}{w_{t,i}}.

Therefore, for any $i\in[d]$,

\widehat{w_{t,i}^{-1}}=\left(\frac{\sigma_i}{m}\lor 1\right)M_{t,i}

serves as an unbiased estimator of $w_{t,i}^{-1}$. Then, the expected number of resampling steps $M_t$ given $\hat{L}_t$ in CGR is bounded as

\mathbb{E}_{r_t'\sim\mathcal{D},\,r_t''\sim\mathcal{D}|\mathcal{E}_{t,i}}[M_t|\hat{L}_t]=\mathbb{E}\left[\max_{i:a_{t,i}=1,\sigma_i\leq m}M_{t,i}+\sum_{i:a_{t,i}=1,\sigma_i>m}M_{t,i}\,\middle|\,\hat{L}_t,a_t\right]
\leq\mathbb{E}\left[\sum_{i=1}^{d}a_{t,i}M_{t,i}\,\middle|\,\hat{L}_t,a_t\right]
=\sum_{i=1}^{d}\mathbb{P}[a_{t,i}=1|\hat{L}_t]\,\mathbb{E}[M_{t,i}|\hat{L}_t,a_{t,i}=1]
=\sum_{i=1}^{m}w_{t,i}\cdot\frac{1}{w_{t,i}}+\sum_{i=m+1}^{d}w_{t,i}\cdot\frac{m}{\sigma_i w_{t,i}}
\leq m+m\int_{m}^{d}\frac{1}{x}\,\mathrm{d}x
=m+m\log\left(\frac{d}{m}\right).
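To see what this bound buys in practice, it is worth comparing the expected resampling counts of GR ($d$) and CGR ($m+m\log(d/m)$); the per-draw cost is $O(d)$ in both cases, so the counts translate directly into the $O(d^2)$ versus $O(md(\log(d/m)+1))$ complexities analyzed below. A quick check with illustrative numbers:

```python
from math import log

d, m = 10_000, 20
gr_draws  = d                          # expected resamplings of the original GR
cgr_draws = m + m * log(d / m)         # bound (9) for CGR
print(gr_draws, round(cgr_draws, 1))   # 10000 vs. ~144.3
```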

Average Complexity

Now we analyze the average complexity of CGR, which can be expressed as

C_{\text{CGR}}=C_{\text{filter}}+\mathbb{E}_{r_t'\sim\mathcal{D},\,r_t''\sim\mathcal{D}|\mathcal{E}_{t,i}}\left[M_t\,\middle|\,\hat{L}_t\right]\cdot C_{\text{resampling}}, \quad (17)

where $C_{\text{filter}}$ is the cost of scanning the base-arms and determining whether to include them in the set $U$ (the filtering loop at the beginning of Algorithm 3), and $C_{\text{resampling}}$ is the cost of each resampling step. For the former, the condition $\sigma_i>m$, whose evaluation requires $O(d)$, is only checked when $a_{t,i}=1$. Then, we have

C_{\text{filter}}=d\cdot O(1)+\lVert a_t\rVert_1\cdot O(d)=d\cdot O(1)+m\cdot O(d)=O(md). \quad (18)

For the resampling process, as shown in Algorithm 3, base-arms in $U$ and those not in $U$ are resampled differently, since the former involve the additional value-swapping operation. However, this operation does not change the order of the resampling cost, which remains $C_{\text{resampling}}=O(d)$ in both cases. Combining (9), (17) and (18), we have

C_{\text{CGR}}=O(md)+\left(m+m\log(d/m)\right)\cdot O(d)=O\left(md\left(\log(d/m)+1\right)\right).
Remark 1.

In this paper, though we only analyze the regret bound of FTPL with the original GR, the analysis of FTPL with CGR is similar, as we only need to replace the expression for the expectation of the squared estimator $\widehat{w_{t,i}^{-1}}^2$ in (59). In fact, FTPL with CGR attains a slightly better regret bound than the one with the original GR. This is because the variance of $\widehat{w_{t,i}^{-1}}$ becomes

\mathrm{Var}\left[\widehat{w_{t,i}^{-1}}\,\middle|\,\hat{L}_t,a_{t,i}\right]=\begin{cases}\frac{1}{w_{t,i}^{2}}-\frac{1}{w_{t,i}}&\text{(original GR)},\\ \frac{1}{w_{t,i}^{2}}-\frac{1}{\mathbb{P}(\mathcal{E}_{t,i})w_{t,i}}&\text{(CGR)},\end{cases}

where the latter is no larger than the former.

6 Proofs for Regret Decomposition

In this section, we provide the proof of Lemma 2. First, similarly to Lemma 3 of Honda et al. (2023), we prove a general framework of the regret decomposition that can be applied to general distributions.

Lemma 9.
\mathcal{R}(T)\leq\sum_{t=1}^{T}\mathbb{E}\left[\langle\hat{\ell}_t,w_t-w_{t+1}\rangle\right]+\frac{\mathbb{E}_{r_1\sim\mathcal{D}_\alpha}\left[a_1^\top r_1\right]}{\eta}.
Proof.

Let us consider a random variable $r\in[0,\infty)^d$ whose components independently follow the Fréchet distribution $\mathcal{F}_\alpha$ or the Pareto distribution $\mathcal{P}_\alpha$, and which is independent of the randomness $\{\ell_t,r_t\}_{t=1}^T$ of the environment and the policy. Define $u_t=\arg\min_{w\in\Delta_d}\langle\eta\hat{L}_t-r,w\rangle$, where $\Delta_d=\{p\in[0,1]^d:\sum_{i\in[d]}p_i=m\}$ is the convex hull of $\mathcal{A}$. Then, since $r_t$ and $r$ are identically distributed given $\hat{L}_t$, we have

\mathbb{E}[u_t|\hat{L}_t]=w_t,\qquad\mathbb{E}[\langle r,u_t\rangle|\hat{L}_t]=\mathbb{E}[a_t^\top r|\hat{L}_t]. \quad (19)

Denote the optimal action by $a^*$. Recalling $\hat{L}_{T+1}=\sum_{s=1}^{T}\hat{\ell}_s$, we have

\sum_{t=1}^{T}\langle\hat{\ell}_t,a^*\rangle=\langle\hat{L}_{T+1},a^*\rangle
=\left\langle\hat{L}_{T+1}-\frac{1}{\eta}r,a^*\right\rangle+\frac{1}{\eta}\langle r,a^*\rangle
\geq\left\langle\hat{L}_{T+1}-\frac{1}{\eta}r,u_{T+1}\right\rangle+\frac{1}{\eta}\langle r,a^*\rangle
=\left\langle\hat{L}_{T}-\frac{1}{\eta}r,u_{T+1}\right\rangle+\langle\hat{\ell}_T,u_{T+1}\rangle+\frac{1}{\eta}\langle r,a^*\rangle
\geq\left\langle\hat{L}_{T}-\frac{1}{\eta}r,u_{T}\right\rangle+\langle\hat{\ell}_T,u_{T+1}\rangle+\frac{1}{\eta}\langle r,a^*\rangle

and recursively applying this relation, we obtain

\sum_{t=1}^{T}\langle\hat{\ell}_t,a^*\rangle\geq\left\langle-\frac{1}{\eta}r,u_1\right\rangle+\sum_{t=1}^{T}\langle\hat{\ell}_t,u_{t+1}\rangle+\frac{1}{\eta}\langle r,a^*\rangle

and therefore

\sum_{t=1}^{T}\langle\hat{\ell}_t,u_t-a^*\rangle\leq\frac{1}{\eta}\langle r,u_1-a^*\rangle+\sum_{t=1}^{T}\langle\hat{\ell}_t,u_t-u_{t+1}\rangle.

By using (19) and taking the expectation with respect to $r$, we obtain

\sum_{t=1}^{T}\langle\hat{\ell}_t,w_t-a^*\rangle\leq\frac{1}{\eta}\mathbb{E}_{r\sim\mathcal{D}_\alpha}\left[\langle r,u_1-a^*\rangle\right]+\sum_{t=1}^{T}\langle\hat{\ell}_t,w_t-w_{t+1}\rangle
\leq\frac{1}{\eta}\mathbb{E}_{r_1\sim\mathcal{D}_\alpha}\left[a_1^\top r_1\right]+\sum_{t=1}^{T}\langle\hat{\ell}_t,w_t-w_{t+1}\rangle.

For the Fréchet and Pareto distributions, we bound $\mathbb{E}_{r_1\sim\mathcal{D}_\alpha}[a_1^\top r_1]$ in the following lemma.

Lemma 10.

For $\mathcal{D}_\alpha\in\{\mathcal{P}_\alpha,\mathcal{F}_\alpha\}$ and $\alpha>1$, we have

\mathbb{E}_{r_1\sim\mathcal{D}_\alpha}\left[a_1^\top r_1\right]\leq\begin{cases}\left(\frac{\alpha}{\alpha-1}m^{1-\frac{1}{\alpha}}+\Gamma\left(1-\frac{1}{\alpha}\right)\right)(d+1)^{\frac{1}{\alpha}}&\text{if }\mathcal{D}_\alpha=\mathcal{P}_\alpha,\\ \left(\frac{\alpha}{\alpha-1}m^{1-\frac{1}{\alpha}}+\Gamma\left(1-\frac{1}{\alpha}\right)\right)(d+1)^{\frac{1}{\alpha}}+m&\text{if }\mathcal{D}_\alpha=\mathcal{F}_\alpha.\end{cases}
Proof.

Let $r_k^*$ be the $k$-th largest perturbation among $r_{1,1},r_{1,2},\dots,r_{1,d}$ sampled i.i.d. from $\mathcal{D}_\alpha$, for $k\in[d]$. Then, we have

\mathbb{E}_{r\sim\mathcal{D}_\alpha}\left[a_1^\top r_1\right]\leq\mathbb{E}_{r\sim\mathcal{D}_\alpha}\left[\sum_{k=1}^{m}r_k^*\right]\leq\sum_{k=1}^{m}\mathbb{E}_{r\sim\mathcal{D}_\alpha}\left[r_k^*\right]. \quad (20)
Pareto Distribution

If $\mathcal{D}_\alpha=\mathcal{P}_\alpha$, then by Lemma 17 we obtain

\sum_{k=1}^{m}\mathbb{E}_{r\sim\mathcal{P}_\alpha}\left[r_k^*\right]\leq\sum_{k=1}^{m}\frac{\Gamma(d+1)\,\Gamma\left(d-k-\frac{1}{\alpha}+1\right)}{\Gamma(d-k+1)\,\Gamma\left(d-\frac{1}{\alpha}+1\right)}. \quad (21)

For k=m=dk=m=d, we have

Γ(d+1)Γ(dk1α+1)Γ(dk+1)Γ(d1α+1)\displaystyle\frac{\Gamma\quantity(d+1)\Gamma\quantity(d-k-\frac{1}{\alpha}+1)}{\Gamma\quantity(d-k+1)\Gamma\quantity(d-\frac{1}{\alpha}+1)} =Γ(d+1)Γ(11α)Γ(d1α+1)\displaystyle=\frac{\Gamma\quantity(d+1)\Gamma\quantity(1-\frac{1}{\alpha})}{\Gamma\quantity(d-\frac{1}{\alpha}+1)}
Γ(11α)(d+1)1α,\displaystyle\leq\Gamma\quantity(1-\frac{1}{\alpha})\quantity(d+1)^{\frac{1}{\alpha}}, (22)

where the last inequality follows from Gautschi’s inequality in Lemma 16. Similarly, for k[m]k\in[m] and k<dk<d, by Gautschi’s inequality, we have

Γ(d+1)Γ(dk1α+1)Γ(dk+1)Γ(d1α+1)(d+1dk)1α.\frac{\Gamma\quantity(d+1)\Gamma\quantity(d-k-\frac{1}{\alpha}+1)}{\Gamma\quantity(d-k+1)\Gamma\quantity(d-\frac{1}{\alpha}+1)}\leq\quantity(\frac{d+1}{d-k})^{\frac{1}{\alpha}}. (23)

By combining (20), (21), (22) and (23), we have

𝔼r𝒫α[a1r1]\displaystyle\mathbb{E}_{r\sim\mathcal{P}_{\alpha}}\quantity[a_{1}^{\top}r_{1}] k=1m𝔼r𝒫α[rk]\displaystyle\leq\sum_{k=1}^{m}\mathbb{E}_{r\sim\mathcal{P}_{\alpha}}\quantity[r_{k}^{*}]
Γ(11α)(d+1)1α+k=1m(d1)(d+1dk)1α\displaystyle\leq\Gamma\quantity(1-\frac{1}{\alpha})\quantity(d+1)^{\frac{1}{\alpha}}+\sum_{k=1}^{m\land(d-1)}\quantity(\frac{d+1}{d-k})^{\frac{1}{\alpha}}
<Γ(11α)(d+1)1α+k=1m(d+1k)1α\displaystyle<\Gamma\quantity(1-\frac{1}{\alpha})\quantity(d+1)^{\frac{1}{\alpha}}+\sum_{k=1}^{m}\quantity(\frac{d+1}{k})^{\frac{1}{\alpha}}
(Γ(11α)+1+1mx1αdx)(d+1)1α\displaystyle\leq\quantity(\Gamma\quantity(1-\frac{1}{\alpha})+1+\int_{1}^{m}x^{-\frac{1}{\alpha}}\differential x)\quantity(d+1)^{\frac{1}{\alpha}}
=(Γ(11α)+1+αα1x11α|1m)(d+1)1α\displaystyle=\quantity(\Gamma\quantity(1-\frac{1}{\alpha})+1+\frac{\alpha}{\alpha-1}x^{1-\frac{1}{\alpha}}\bigg{|}_{1}^{m})\quantity(d+1)^{\frac{1}{\alpha}}
=(αα1m11α+Γ(11α)1α1)(d+1)1α\displaystyle=\quantity(\frac{\alpha}{\alpha-1}m^{1-\frac{1}{\alpha}}+\Gamma\quantity(1-\frac{1}{\alpha})-\frac{1}{\alpha-1})\quantity(d+1)^{\frac{1}{\alpha}}
<(αα1m11α+Γ(11α))(d+1)1α.\displaystyle<\quantity(\frac{\alpha}{\alpha-1}m^{1-\frac{1}{\alpha}}+\Gamma\quantity(1-\frac{1}{\alpha}))\quantity(d+1)^{\frac{1}{\alpha}}. (24)
Fréchet Distribution

If 𝒟α=α\mathcal{D}_{\alpha}=\mathcal{F}_{\alpha}, by combining (20), Lemma 19 and (24) we have

𝔼rα[a1r1]\displaystyle\mathbb{E}_{r\sim\mathcal{F}_{\alpha}}\quantity[a_{1}^{\top}r_{1}] k=1m𝔼rα[rk]\displaystyle\leq\sum_{k=1}^{m}\mathbb{E}_{r\sim\mathcal{F}_{\alpha}}\quantity[r_{k}^{*}]
\displaystyle\leq\sum_{k=1}^{m}\quantity(\mathbb{E}_{r\sim\mathcal{P}_{\alpha}}\quantity[r_{k}^{*}]+1)=\sum_{k=1}^{m}\mathbb{E}_{r\sim\mathcal{P}_{\alpha}}\quantity[r_{k}^{*}]+m
<(αα1m11α+Γ(11α))(d+1)1α+m.\displaystyle<\quantity(\frac{\alpha}{\alpha-1}m^{1-\frac{1}{\alpha}}+\Gamma\quantity(1-\frac{1}{\alpha}))\quantity(d+1)^{\frac{1}{\alpha}}+m.
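The bounds of Lemma 10 can be checked by Monte Carlo simulation. The sketch below uses arbitrary values of $\alpha$, $d$, and $m$; the Fréchet sampler uses the inverse CDF $x=(-\log U)^{-1/\alpha}$, and the classical Pareto sampler adds $1$ to NumPy's Lomax draws.

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(1)
alpha, d, m, n = 2.0, 20, 5, 200_000
bound = (alpha / (alpha - 1) * m ** (1 - 1 / alpha)
         + gamma(1 - 1 / alpha)) * (d + 1) ** (1 / alpha)

pareto = rng.pareto(alpha, size=(n, d)) + 1.0            # Pareto(alpha), min 1
frechet = (-np.log(rng.random(size=(n, d)))) ** (-1 / alpha)  # Frechet(alpha)

def mean_top_m(x):
    """Monte Carlo estimate of E[sum of the m largest coordinates]."""
    return np.sort(x, axis=1)[:, -m:].sum(axis=1).mean()

print(f"Pareto : {mean_top_m(pareto):.3f} <= {bound:.3f}")
print(f"Frechet: {mean_top_m(frechet):.3f} <= {bound + m:.3f}")
```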

7 Analysis on Stability Term

7.1 Proof for Monotonicity

Lemma 11.

Let ψ(x):[νλi,)\psi(x):[\nu-\lambda_{i},\infty)\to\mathbb{R} denote a non-negative function that is independent of λj\lambda_{j}. If jij\neq i and F(x)F(x) is the cumulative distribution function of Fréchet or Pareto distributions, then

ν(λiλj)ψ(z)F(z+λj)/(z+λi)dzν(λiλj)ψ(z)F(z+λj)dz\frac{\int_{\nu-(\lambda_{i}\land\lambda_{j})}^{\infty}\psi(z)F(z+\lambda_{j})/(z+\lambda_{i})\differential z}{\int_{\nu-(\lambda_{i}\land\lambda_{j})}^{\infty}\psi(z)F(z+\lambda_{j})\differential z}

is monotonically increasing in λj\lambda_{j}.

Proof.

Let

N(λj)=ν(λiλj)1z+λiψ(z)F(z+λj)dz,D(λj)=ν(λiλj)ψ(z)F(z+λj)dz.N(\lambda_{j})=\int_{\nu-(\lambda_{i}\land\lambda_{j})}^{\infty}\frac{1}{z+\lambda_{i}}\psi(z)F(z+\lambda_{j})\differential z,\quad D(\lambda_{j})=\int_{\nu-(\lambda_{i}\land\lambda_{j})}^{\infty}\psi(z)F(z+\lambda_{j})\differential z.

The derivative of N(λj)/D(λj)N(\lambda_{j})/D(\lambda_{j}) with respect to λj\lambda_{j} is expressed as

ddλjN(λj)/D(λj)=N(λj)D(λj)N(λj)D(λj)(D(λj))2.\frac{d}{d\lambda_{j}}N(\lambda_{j})/D(\lambda_{j})=\frac{N^{\prime}(\lambda_{j})D(\lambda_{j})-N(\lambda_{j})D^{\prime}(\lambda_{j})}{(D(\lambda_{j}))^{2}}.

If λi>λj\lambda_{i}>\lambda_{j}, we have

N(λj)\displaystyle N^{\prime}(\lambda_{j}) =λjνλj1z+λiψ(z)F(z+λj)dz\displaystyle=\frac{\partial}{\partial\lambda_{j}}\int_{\nu-\lambda_{j}}^{\infty}\frac{1}{z+\lambda_{i}}\psi(z)F(z+\lambda_{j})\differential z
=νλj1z+λiψ(z)f(z+λj)dz+1(νλj)+λiψ(νλj)F((νλj)+λj)\displaystyle=\int_{\nu-\lambda_{j}}^{\infty}\frac{1}{z+\lambda_{i}}\psi(z)f(z+\lambda_{j})\differential z+\frac{1}{(\nu-\lambda_{j})+\lambda_{i}}\psi(\nu-\lambda_{j})F((\nu-\lambda_{j})+\lambda_{j})
=νλj1z+λiψ(z)f(z+λj)dz,\displaystyle=\int_{\nu-\lambda_{j}}^{\infty}\frac{1}{z+\lambda_{i}}\psi(z)f(z+\lambda_{j})\differential z,

where the last equality holds since F(ν)=0F(\nu)=0. On the other hand, if λiλj\lambda_{i}\leq\lambda_{j}, we have

N(λj)=λjνλi1z+λiψ(z)F(z+λj)dz=νλi1z+λiψ(z)f(z+λj)dz.N^{\prime}(\lambda_{j})=\frac{\partial}{\partial\lambda_{j}}\int_{\nu-\lambda_{i}}^{\infty}\frac{1}{z+\lambda_{i}}\psi(z)F(z+\lambda_{j})\differential z=\int_{\nu-\lambda_{i}}^{\infty}\frac{1}{z+\lambda_{i}}\psi(z)f(z+\lambda_{j})\differential z.

In both cases, we have

N(λj)=ν(λiλj)1z+λiψ(z)f(z+λj)dz.N^{\prime}(\lambda_{j})=\int_{\nu-(\lambda_{i}\land\lambda_{j})}^{\infty}\frac{1}{z+\lambda_{i}}\psi(z)f(z+\lambda_{j})\differential z.

Similarly, we have

D^{\prime}(\lambda_{j})=\int_{\nu-(\lambda_{i}\land\lambda_{j})}^{\infty}\psi(z)f(z+\lambda_{j})\differential z.

Next, we divide the proof into two cases.

Fréchet Distribution

When F(x)F(x) is the cumulative distribution function of Fréchet distribution, we define ψ~(x)=ψ(x)e1/(x+λj)α\widetilde{\psi}(x)=\psi(x)e^{-1/(x+\lambda_{j})^{\alpha}}. Under this definition, we have

N(λj)D(λj)\displaystyle N^{\prime}(\lambda_{j})D(\lambda_{j}) =z,w(λiλj)ψ(z)ψ(w)f(z+λj)F(w+λj)(z+λi)dzdw\displaystyle=\iint_{z,w\geq-(\lambda_{i}\land\lambda_{j})}\frac{\psi(z)\psi(w)f(z+\lambda_{j})F(w+\lambda_{j})}{(z+\lambda_{i})}\differential z\differential w
=αz,w(λiλj)ψ~(z)ψ~(w)(z+λi)(z+λj)α+1dzdw\displaystyle=\alpha\iint_{z,w\geq-(\lambda_{i}\land\lambda_{j})}\frac{\widetilde{\psi}(z)\widetilde{\psi}(w)}{(z+\lambda_{i})(z+\lambda_{j})^{\alpha+1}}\differential z\differential w
\displaystyle=\frac{\alpha}{2}\iint_{z,w\geq-(\lambda_{i}\land\lambda_{j})}\widetilde{\psi}(z)\widetilde{\psi}(w)\quantity(\frac{1}{(z+\lambda_{i})(z+\lambda_{j})^{\alpha+1}}+\frac{1}{(w+\lambda_{i})(w+\lambda_{j})^{\alpha+1}})\differential z\differential w,

and

N(λj)D(λj)\displaystyle N(\lambda_{j})D^{\prime}(\lambda_{j}) =z,w(λiλj)ψ(z)ψ(w)F(z+λj)f(w+λj)(z+λi)dzdw\displaystyle=\iint_{z,w\geq-(\lambda_{i}\land\lambda_{j})}\frac{\psi(z)\psi(w)F(z+\lambda_{j})f(w+\lambda_{j})}{(z+\lambda_{i})}\differential z\differential w
=αz,w(λiλj)ψ~(z)ψ~(w)(z+λi)(w+λj)α+1dzdw\displaystyle=\alpha\iint_{z,w\geq-(\lambda_{i}\land\lambda_{j})}\frac{\widetilde{\psi}(z)\widetilde{\psi}(w)}{(z+\lambda_{i})(w+\lambda_{j})^{\alpha+1}}\differential z\differential w
=α2z,w(λiλj)ψ~(z)ψ~(w)(1(z+λi)(w+λj)α+1+1(w+λi)(z+λj)α+1)dzdw.\displaystyle=\frac{\alpha}{2}\iint_{z,w\geq-(\lambda_{i}\land\lambda_{j})}\widetilde{\psi}(z)\widetilde{\psi}(w)\quantity(\frac{1}{(z+\lambda_{i})(w+\lambda_{j})^{\alpha+1}}+\frac{1}{(w+\lambda_{i})(z+\lambda_{j})^{\alpha+1}})\differential z\differential w.

Here, by an elementary calculation we can see

1(z+λi)(z+λj)α+1+1(w+λi)(w+λj)α+11(z+λi)(w+λj)α+11(w+λi)(z+λj)α+1\displaystyle\frac{1}{(z+\lambda_{i})(z+\lambda_{j})^{\alpha+1}}+\frac{1}{(w+\lambda_{i})(w+\lambda_{j})^{\alpha+1}}-\frac{1}{(z+\lambda_{i})(w+\lambda_{j})^{\alpha+1}}-\frac{1}{(w+\lambda_{i})(z+\lambda_{j})^{\alpha+1}}
=wz(z+λi)(w+λi)(1(z+λj)α+11(w+λj)α+1)\displaystyle=\frac{w-z}{(z+\lambda_{i})(w+\lambda_{i})}\quantity(\frac{1}{(z+\lambda_{j})^{\alpha+1}}-\frac{1}{(w+\lambda_{j})^{\alpha+1}})
=(wz)((w+λj)α+1(z+λj)α+1)(z+λi)(w+λi)(z+λj)α+1(w+λj)α+10,\displaystyle=\frac{(w-z)\quantity(\quantity(w+\lambda_{j})^{\alpha+1}-\quantity(z+\lambda_{j})^{\alpha+1})}{(z+\lambda_{i})(w+\lambda_{i})(z+\lambda_{j})^{\alpha+1}(w+\lambda_{j})^{\alpha+1}}\geq 0,

where the last inequality holds since h(x)=xα+1h(x)=x^{\alpha+1} is monotonically increasing in [0,+)[0,+\infty) for α>0\alpha>0. Therefore, when F(x)F(x) is the cumulative distribution function of Fréchet distribution, we have ddλjN(λj)/D(λj)0\frac{d}{d\lambda_{j}}N(\lambda_{j})/D(\lambda_{j})\geq 0, which implies that N(λj)/D(λj)N(\lambda_{j})/D(\lambda_{j}) is monotonically increasing in λj\lambda_{j}.

Pareto Distribution

When F(x)F(x) is the cumulative distribution function of Pareto distribution, we have

N(λj)D(λj)\displaystyle N^{\prime}(\lambda_{j})D(\lambda_{j}) =z,w1(λiλj)ψ(z)ψ(w)f(z+λj)F(w+λj)(z+λi)dzdw\displaystyle=\iint_{z,w\geq 1-(\lambda_{i}\land\lambda_{j})}\frac{\psi(z)\psi(w)f(z+\lambda_{j})F(w+\lambda_{j})}{(z+\lambda_{i})}\differential z\differential w
=αz,w1(λiλj)ψ(z)ψ(w)(1(w+λj)α)(z+λi)(z+λj)α+1dzdw\displaystyle=\alpha\iint_{z,w\geq 1-(\lambda_{i}\land\lambda_{j})}\frac{\psi(z)\psi(w)(1-(w+\lambda_{j})^{-\alpha})}{(z+\lambda_{i})(z+\lambda_{j})^{\alpha+1}}\differential z\differential w
=α2z,w1(λiλj)ψ(z)ψ(w)((1(w+λj)α)(z+λi)(z+λj)α+1+(1(z+λj)α)(w+λi)(w+λj)α+1)dzdw,\displaystyle=\frac{\alpha}{2}\iint_{z,w\geq 1-(\lambda_{i}\land\lambda_{j})}\psi(z)\psi(w)\quantity(\frac{(1-(w+\lambda_{j})^{-\alpha})}{(z+\lambda_{i})(z+\lambda_{j})^{\alpha+1}}+\frac{(1-(z+\lambda_{j})^{-\alpha})}{(w+\lambda_{i})(w+\lambda_{j})^{\alpha+1}})\differential z\differential w,

and

\displaystyle N(\lambda_{j})D^{\prime}(\lambda_{j}) =\iint_{z,w\geq 1-(\lambda_{i}\land\lambda_{j})}\frac{\psi(z)\psi(w)F(z+\lambda_{j})f(w+\lambda_{j})}{(z+\lambda_{i})}\differential z\differential w
=αz,w1(λiλj)ψ(z)ψ(w)(1(z+λj)α)(z+λi)(w+λj)α+1dzdw\displaystyle=\alpha\iint_{z,w\geq 1-(\lambda_{i}\land\lambda_{j})}\frac{\psi(z)\psi(w)(1-(z+\lambda_{j})^{-\alpha})}{(z+\lambda_{i})(w+\lambda_{j})^{\alpha+1}}\differential z\differential w
=α2z,w1(λiλj)ψ(z)ψ(w)((1(z+λj)α)(z+λi)(w+λj)α+1+(1(w+λj)α)(w+λi)(z+λj)α+1)dzdw.\displaystyle=\frac{\alpha}{2}\iint_{z,w\geq 1-(\lambda_{i}\land\lambda_{j})}\psi(z)\psi(w)\quantity(\frac{(1-(z+\lambda_{j})^{-\alpha})}{(z+\lambda_{i})(w+\lambda_{j})^{\alpha+1}}+\frac{(1-(w+\lambda_{j})^{-\alpha})}{(w+\lambda_{i})(z+\lambda_{j})^{\alpha+1}})\differential z\differential w.

Here, by an elementary calculation we can see

(1(w+λj)α)(z+λi)(z+λj)α+1+(1(z+λj)α)(w+λi)(w+λj)α+1(1(z+λj)α)(z+λi)(w+λj)α+1(1(w+λj)α)(w+λi)(z+λj)α+1\displaystyle\frac{(1-(w+\lambda_{j})^{-\alpha})}{(z+\lambda_{i})(z+\lambda_{j})^{\alpha+1}}+\frac{(1-(z+\lambda_{j})^{-\alpha})}{(w+\lambda_{i})(w+\lambda_{j})^{\alpha+1}}-\frac{(1-(z+\lambda_{j})^{-\alpha})}{(z+\lambda_{i})(w+\lambda_{j})^{\alpha+1}}-\frac{(1-(w+\lambda_{j})^{-\alpha})}{(w+\lambda_{i})(z+\lambda_{j})^{\alpha+1}}
=wz(z+λi)(w+λi)(1(w+λj)α(z+λj)α+11(z+λj)α(w+λj)α+1)\displaystyle=\frac{w-z}{(z+\lambda_{i})(w+\lambda_{i})}\quantity(\frac{1-(w+\lambda_{j})^{-\alpha}}{(z+\lambda_{j})^{\alpha+1}}-\frac{1-(z+\lambda_{j})^{-\alpha}}{(w+\lambda_{j})^{\alpha+1}})
=wz(z+λi)(w+λi)(z+λj)α+1(w+λj)α+1(((w+λj)α+1(w+λj))((z+λj)α+1(z+λj)))0,\displaystyle=\frac{w-z}{(z+\lambda_{i})(w+\lambda_{i})(z+\lambda_{j})^{\alpha+1}(w+\lambda_{j})^{\alpha+1}}\quantity(\quantity((w+\lambda_{j})^{\alpha+1}-(w+\lambda_{j}))-\quantity((z+\lambda_{j})^{\alpha+1}-(z+\lambda_{j})))\geq 0,

where the last inequality holds because h(x)=xα+1xh(x)=x^{\alpha+1}-x is monotonically increasing in [1,+)[1,+\infty) for α>0\alpha>0. Therefore, we have ddλjN(λj)/D(λj)0\frac{d}{d\lambda_{j}}N(\lambda_{j})/D(\lambda_{j})\geq 0, which concludes the proof. ∎
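The monotonicity established in Lemma 11 can also be spot-checked numerically. The sketch below is a rough check under stated assumptions: it takes $\psi(z)=f(z+\lambda_i)$ (nonnegative and independent of $\lambda_j$, the choice arising in the later applications), approximates $N(\lambda_j)/D(\lambda_j)$ for the Fréchet case ($\nu=0$) on a truncated grid, and verifies that the ratio is nondecreasing in $\lambda_j$; the constants are arbitrary.

```python
import numpy as np

alpha, lam_i = 1.5, 2.0          # arbitrary constants; nu = 0 for Frechet

def F(x):                        # Frechet CDF (vanishes on the nonpositive reals)
    x = np.clip(x, 1e-12, None)
    return np.exp(-x ** (-alpha))

def f(x):                        # Frechet density, written via t = x^{-alpha}
    x = np.clip(x, 1e-12, None)
    t = np.minimum(x ** (-alpha), 700.0)
    return alpha * t / x * np.exp(-t)

def ratio(lam_j):                # N(lam_j)/D(lam_j) with psi(z) = f(z + lam_i)
    lo = -min(lam_i, lam_j)      # nu - (lam_i ^ lam_j)
    z = np.linspace(lo + 1e-9, lo + 80.0, 400_001)   # truncated domain
    psi, Fz = f(z + lam_i), F(z + lam_j)
    return (psi / (z + lam_i) * Fz).sum() / (psi * Fz).sum()  # dz cancels

vals = [ratio(l) for l in np.linspace(0.5, 6.0, 12)]
assert all(a <= b + 1e-4 for a, b in zip(vals, vals[1:]))
print("N/D is nondecreasing in lambda_j:", np.round(vals, 4))
```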

Lemma 12.

Let a,b>0a,b>0, f(x),g(x)>0f(x),g(x)>0, where xx\in\mathbb{R}. If both f(x)f(x) and g(x)/f(x)g(x)/f(x) are monotonically increasing in xx, then for any x1<x2x_{1}<x_{2}, we have

b+g(x1)a+f(x1)bab+g(x2)a+f(x2).\frac{b+g(x_{1})}{a+f(x_{1})}\leq\frac{b}{a}\lor\frac{b+g(x_{2})}{a+f(x_{2})}.

Provided that limx(b+g(x))/(a+f(x))\lim_{x\to\infty}\quantity(b+g(x))/\quantity(a+f(x)) exists, for any x0x_{0}\in\mathbb{R} we have

b+g(x0)a+f(x0)balimxb+g(x)a+f(x).\frac{b+g(x_{0})}{a+f(x_{0})}\leq\frac{b}{a}\lor\lim_{x\to\infty}\frac{b+g(x)}{a+f(x)}.
Proof.

According to the assumption, we have

f(x1)f(x2) and g(x1)f(x1)g(x2)f(x2).f(x_{1})\leq f(x_{2})\text{ and }\frac{g(x_{1})}{f(x_{1})}\leq\frac{g(x_{2})}{f(x_{2})}.

If b/a>g(x2)/f(x2)b/a>g(x_{2})/f(x_{2}), then we have

b+g(x1)a+f(x1)bag(x1)f(x1)bag(x2)f(x2)babab+g(x2)a+f(x2).\frac{b+g(x_{1})}{a+f(x_{1})}\leq\frac{b}{a}\lor\frac{g(x_{1})}{f(x_{1})}\leq\frac{b}{a}\lor\frac{g(x_{2})}{f(x_{2})}\leq\frac{b}{a}\leq\frac{b}{a}\lor\frac{b+g(x_{2})}{a+f(x_{2})}.

On the other hand, if b/ag(x2)/f(x2)b/a\leq g(x_{2})/f(x_{2}), then we have

b+g(x1)a+f(x1)=b+f(x1)g(x1)f(x1)a+f(x1)b+f(x1)g(x2)f(x2)a+f(x1).\frac{b+g(x_{1})}{a+f(x_{1})}=\frac{b+f(x_{1})\frac{g(x_{1})}{f(x_{1})}}{a+f(x_{1})}\leq\frac{b+f(x_{1})\frac{g(x_{2})}{f(x_{2})}}{a+f(x_{1})}.

Let h(z)=(b+g(x2)f(x2)z)/(a+z)h(z)=\quantity(b+\frac{g(x_{2})}{f(x_{2})}z)/\quantity(a+z), where z[f(x1),f(x2)]z\in[f(x_{1}),f(x_{2})]. Then, we have

h(z)\displaystyle h^{\prime}(z) =g(x2)f(x2)(a+z)(b+g(x2)f(x2)z)(a+z)2\displaystyle=\frac{\frac{g(x_{2})}{f(x_{2})}\quantity(a+z)-\quantity(b+\frac{g(x_{2})}{f(x_{2})}z)}{\quantity(a+z)^{2}}
=g(x2)f(x2)ab(a+z)2=a(g(x2)f(x2)ba)(a+z)20,\displaystyle=\frac{\frac{g(x_{2})}{f(x_{2})}a-b}{\quantity(a+z)^{2}}=\frac{a\quantity(\frac{g(x_{2})}{f(x_{2})}-\frac{b}{a})}{\quantity(a+z)^{2}}\geq 0,

which means that h(z)h(z) is monotonically increasing in [f(x1),f(x2)][f(x_{1}),f(x_{2})]. Therefore, we have

b+f(x1)g(x2)f(x2)a+f(x1)b+f(x2)g(x2)f(x2)a+f(x2)=b+g(x2)a+f(x2)bab+g(x2)a+f(x2).\frac{b+f(x_{1})\frac{g(x_{2})}{f(x_{2})}}{a+f(x_{1})}\leq\frac{b+f(x_{2})\frac{g(x_{2})}{f(x_{2})}}{a+f(x_{2})}=\frac{b+g(x_{2})}{a+f(x_{2})}\leq\frac{b}{a}\lor\frac{b+g(x_{2})}{a+f(x_{2})}.

Combining both cases, we have

b+g(x1)a+f(x1)bab+g(x2)a+f(x2)\frac{b+g(x_{1})}{a+f(x_{1})}\leq\frac{b}{a}\lor\frac{b+g(x_{2})}{a+f(x_{2})}

for any x1<x2x_{1}<x_{2}. If limx(b+g(x))/(a+f(x))\lim_{x\to\infty}\quantity(b+g(x))/\quantity(a+f(x)) exists, the result for the infinite case follows directly by taking the limit of x2x_{2}\to\infty. ∎
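A randomized check of Lemma 12 with one concrete (and hypothetical) choice of increasing $f$ and increasing $g/f$:

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: 1.0 + x ** 2                   # positive and increasing on [0, 10]
g = lambda x: f(x) * (0.5 + np.log1p(x))     # g/f = 0.5 + log(1 + x): increasing
for _ in range(10_000):
    a, b = rng.uniform(0.1, 5.0, size=2)
    x1, x2 = np.sort(rng.uniform(0.0, 10.0, size=2))
    lhs = (b + g(x1)) / (a + f(x1))
    rhs = max(b / a, (b + g(x2)) / (a + f(x2)))
    assert lhs <= rhs + 1e-12
print("Lemma 12 inequality held on all trials")
```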

Lemma 13.

For m~>1\widetilde{m}>1 and j{i}j\in\mathcal{B}\setminus\quantity{i}, let λd\lambda^{\prime}\in\mathbb{R}^{d} be such that λjλj\lambda^{\prime}_{j}\geq\lambda_{j} and λk=λk\lambda_{k}^{\prime}=\lambda_{k} for all kjk\neq j. Then, we have

Ji(λ;𝒟α,m~,)ϕi(λ;𝒟α,m~,)Ji(λ;𝒟α,m~1,{j})ϕi(λ;𝒟α,m~1,{j})Ji(λ;𝒟α,m~,)ϕi(λ;𝒟α,m~,).\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})}\leq\frac{J_{i}(\lambda^{\prime};\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})}{\phi_{i}(\lambda^{\prime};\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})}\lor\frac{J_{i}(\lambda^{\prime};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})}{\phi_{i}(\lambda^{\prime};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})}.
Proof.

For θm~\theta\leq\widetilde{m}, we define

i,θ,j={Rank(i,rλ;)=θRank(j,rλ;)},\mathcal{E}_{i,\theta,j}^{\mathcal{B}}=\quantity{\text{{Rank}}\quantity(i,r-\lambda;\mathcal{B})=\theta\leq\text{{Rank}}\quantity(j,r-\lambda;\mathcal{B})},

whose probability can be expressed as

ϕi,θ,j(λ;𝒟α,)\displaystyle\phi_{i,\theta,j}(\lambda;\mathcal{D}_{\alpha},\mathcal{B}) r=(r1,,rd)𝒟{Rank(i,rλ;)=θRank(j,rλ;)}\displaystyle\coloneqq\mathbb{P}_{r=\quantity(r_{1},\dots,r_{d})\sim\mathcal{D}}\quantity{\text{{Rank}}\quantity(i,r-\lambda;\mathcal{B})=\theta\leq\text{{Rank}}\quantity(j,r-\lambda;\mathcal{B})}
=ν(λiλj)f(z+λi)F(z+λj)𝒗𝒮i,θ,j(k:vk=1(1F(z+λk))k:vk=0,k{i}F(z+λk))dz.\displaystyle=\int_{\nu-(\lambda_{i}\land\lambda_{j})}^{\infty}f(z+\lambda_{i})F(z+\lambda_{j})\sum_{\bm{v}\in\mathcal{S}_{i,\theta,j}^{\mathcal{B}}}\quantity(\prod_{k:v_{k}=1}\quantity(1-F(z+\lambda_{k}))\prod_{k:v_{k}=0,k\in\mathcal{B}\setminus\quantity{i}}F(z+\lambda_{k}))\differential z. (25)

Here, 𝒮i,θ,j={𝒗{0,1}d:𝒗1=θ1,vi=vj=0, and vk=0 for all k}\mathcal{S}_{i,\theta,j}^{\mathcal{B}}=\quantity{\bm{v}\in\quantity{0,1}^{d}:\left\lVert\bm{v}\right\rVert_{1}=\theta-1,v_{i}=v_{j}=0,\text{ and }v_{k}=0\text{ for all }k\notin\mathcal{B}}. Corresponding to this, we define

Ji,θ,j(λ;𝒟α,)ν(λiλj)f(z+λi)z+λi𝒗𝒮i,θ,j(k:vk=1(1F(z+λk))k:vk=0,k{i}F(z+λk))dz.J_{i,\theta,j}(\lambda;\mathcal{D}_{\alpha},\mathcal{B})\coloneqq\\ \int_{\nu-(\lambda_{i}\land\lambda_{j})}^{\infty}\frac{f(z+\lambda_{i})}{z+\lambda_{i}}\sum_{\bm{v}\in\mathcal{S}_{i,\theta,j}^{\mathcal{B}}}\quantity(\prod_{k:v_{k}=1}\quantity(1-F(z+\lambda_{k}))\prod_{k:v_{k}=0,k\in\mathcal{B}\setminus\quantity{i}}F(z+\lambda_{k}))\differential z. (26)

Considering the event

\widetilde{\mathcal{E}}_{i,j}=\quantity{\text{{Rank}}\quantity(i,r-\lambda;\mathcal{B}\setminus\quantity{j})\leq\widetilde{m}-1},

we have

r=(r1,,rd)𝒟(~i,j)=ϕi(λ;𝒟α,m~1,{j}).\mathbb{P}_{r=\quantity(r_{1},\dots,r_{d})\sim\mathcal{D}}\quantity(\widetilde{\mathcal{E}}_{i,j})=\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j}).

By definition, we can see that

\quantity{\text{{Rank}}\quantity(i,r-\lambda;\mathcal{B})\leq\widetilde{m}}=\widetilde{\mathcal{E}}_{i,j}\cup\mathcal{E}_{i,\widetilde{m},j}^{\mathcal{B}},\quad\widetilde{\mathcal{E}}_{i,j}\cap\mathcal{E}_{i,\widetilde{m},j}^{\mathcal{B}}=\varnothing.

Therefore, we can decompose the probability as

ϕi(λ;𝒟α,m~,)=ϕi(λ;𝒟α,m~1,{j})+ϕi,m~,j(λ;𝒟α,).\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})=\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})+\phi_{i,\widetilde{m},j}(\lambda;\mathcal{D}_{\alpha},\mathcal{B}). (27)

Similarly, by definition we have

Ji(λ;𝒟α,m~,)=Ji(λ;𝒟α,m~1,{j})+Ji,m~,j(λ;𝒟α,).J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})=J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})+J_{i,\widetilde{m},j}(\lambda;\mathcal{D}_{\alpha},\mathcal{B}). (28)

Consider the expressions of $\phi_{i,\theta,j}(\lambda;\mathcal{D}_{\alpha},\mathcal{B})$ and $J_{i,\theta,j}(\lambda;\mathcal{D}_{\alpha},\mathcal{B})$ given respectively by (25) and (26). By Lemma 11, it follows that

Ji,m~,j(λ;𝒟α,)ϕi,m~,j(λ;𝒟α,)Ji,m~,j(λ;𝒟α,)ϕi,m~,j(λ;𝒟α,)\frac{J_{i,\widetilde{m},j}(\lambda;\mathcal{D}_{\alpha},\mathcal{B})}{\phi_{i,\widetilde{m},j}(\lambda;\mathcal{D}_{\alpha},\mathcal{B})}\leq\frac{J_{i,\widetilde{m},j}(\lambda^{\prime};\mathcal{D}_{\alpha},\mathcal{B})}{\phi_{i,\widetilde{m},j}(\lambda^{\prime};\mathcal{D}_{\alpha},\mathcal{B})}

because of the monotonic increase in λj\lambda_{j}. Then, since both ϕi(λ;𝒟α,m~1,{j})\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j}) and Ji(λ;𝒟α,m~1,{j})J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j}) do not depend on λj\lambda_{j}, we have

Ji(λ;𝒟α,m~,)ϕi(λ;𝒟α,m~,)\displaystyle\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})} =Ji(λ;𝒟α,m~1,{j})+Ji,m~,j(λ;𝒟α,)ϕi(λ;𝒟α,m~1,{j})+ϕi,m~,j(λ;𝒟α,)\displaystyle=\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})+J_{i,\widetilde{m},j}(\lambda;\mathcal{D}_{\alpha},\mathcal{B})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})+\phi_{i,\widetilde{m},j}(\lambda;\mathcal{D}_{\alpha},\mathcal{B})}
Ji(λ;𝒟α,m~1,{j})ϕi(λ;𝒟α,m~1,{j})Ji(λ;𝒟α,m~,)ϕi(λ;𝒟α,m~,)\displaystyle\leq\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})}\lor\frac{J_{i}(\lambda^{\prime};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})}{\phi_{i}(\lambda^{\prime};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})} (29)
=Ji(λ;𝒟α,m~1,{j})ϕi(λ;𝒟α,m~1,{j})Ji(λ;𝒟α,m~,)ϕi(λ;𝒟α,m~,),\displaystyle=\frac{J_{i}(\lambda^{\prime};\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})}{\phi_{i}(\lambda^{\prime};\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})}\lor\frac{J_{i}(\lambda^{\prime};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})}{\phi_{i}(\lambda^{\prime};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})},

where inequality (29) follows from Lemma 12, recalling that decompositions analogous to (27) and (28) also hold for $\lambda^{\prime}$. ∎
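The event decomposition (27) underlying this proof can be verified by Monte Carlo simulation. The sketch below assumes that $\text{Rank}(i,x;\mathcal{B})$ denotes the descending-order position of $x_i$ among $(x_k)_{k\in\mathcal{B}}$ (the convention consistent with the events above); the instance is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, d, m_t, i, j = 2.0, 8, 3, 2, 5     # arbitrary instance; m_t plays m~
lam = rng.uniform(0.0, 2.0, size=d)
x = rng.pareto(alpha, size=(500_000, d)) + 1.0 - lam   # samples of r - lambda

def rank(idx, members):
    """Descending-order position of coordinate idx within the index set."""
    return (x[:, members] >= x[:, [idx]]).sum(axis=1)

B = list(range(d))
B_no_j = [k for k in B if k != j]
lhs = (rank(i, B) <= m_t).mean()                        # phi_i(lam; m~, B)
first = (rank(i, B_no_j) <= m_t - 1).mean()             # phi_i(lam; m~-1, B\{j})
rk_i = rank(i, B)
second = ((rk_i == m_t) & (rk_i <= rank(j, B))).mean()  # phi_{i, m~, j}
print(f"{lhs:.4f} vs {first + second:.4f}")             # agree up to MC noise
```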

The following lemma presents a special case of Lemma 13, where we take λj\lambda^{\prime}_{j}\to\infty.

Lemma 14.

Let m~2\widetilde{m}\geq 2. If m~<||\widetilde{m}<\left\lvert\mathcal{B}\right\rvert, we have

Ji(λ;𝒟α,m~,)ϕi(λ;𝒟α,m~,)Ji(λ;𝒟α,m~1,{j})ϕi(λ;𝒟α,m~1,{j})Ji(λ;𝒟α,m~,{j})ϕi(λ;𝒟α,m~,{j}).\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})}\leq\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})}\lor\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}\setminus\quantity{j})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}\setminus\quantity{j})}. (30)

If m~=||\widetilde{m}=\left\lvert\mathcal{B}\right\rvert, we have

Ji(λ;𝒟α,m~,)ϕi(λ;𝒟α,m~,)Ji(λ;𝒟α,m~1,{j})ϕi(λ;𝒟α,m~1,{j}).\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})}\leq\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})}. (31)
Proof.
Inequality (30)

Recall that

\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})=\\ \sum_{\theta=1}^{\widetilde{m}}\int_{\nu-\widetilde{\lambda}_{\theta}}^{\infty}f(z+\lambda_{i})\sum_{\bm{v}\in\mathcal{S}_{i,\theta}^{\mathcal{B}}}\quantity(\prod_{k:v_{k}=1}\quantity(1-F(z+\lambda_{k}))\prod_{k:v_{k}=0,k\in\mathcal{B}\setminus\quantity{i}}F(z+\lambda_{k}))\differential z,

where 𝒮i,θ={𝒗{0,1}d:𝒗1=θ1,vi=0, and vk=0 for all k}\mathcal{S}_{i,\theta}^{\mathcal{B}}=\quantity{\bm{v}\in\quantity{0,1}^{d}:\left\lVert\bm{v}\right\rVert_{1}=\theta-1,v_{i}=0,\text{ and }v_{k}=0\text{ for all }k\notin\mathcal{B}}. Note that

𝒮i,θ𝒮i,θ,j={𝒗{0,1}d:𝒗1=θ1,vi=0,vj=1, and vk=0 for all k},\mathcal{S}_{i,\theta}^{\mathcal{B}}\setminus\mathcal{S}_{i,\theta,j}^{\mathcal{B}}=\quantity{\bm{v}\in\quantity{0,1}^{d}:\left\lVert\bm{v}\right\rVert_{1}=\theta-1,v_{i}=0,v_{j}=1,\text{ and }v_{k}=0\text{ for all }k\notin\mathcal{B}},

where 𝒮i,θ,j={𝒗{0,1}d:𝒗1=θ1,vi=vj=0, and vk=0 for all k}\mathcal{S}_{i,\theta,j}^{\mathcal{B}}=\quantity{\bm{v}\in\quantity{0,1}^{d}:\left\lVert\bm{v}\right\rVert_{1}=\theta-1,v_{i}=v_{j}=0,\text{ and }v_{k}=0\text{ for all }k\notin\mathcal{B}}. Then, ϕi(λ;𝒟α,m~,)\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}) can be rewritten as

ϕi(λ;𝒟α,m~,)=θ=1m~νλ~θf(z+λi)(1F(z+λj))𝒗𝒮i,θ𝒮i,θ,j(k:vk=1,kj(1F(z+λk))k:vk=0,k{i}F(z+λk))dz+θ=1m~νλ~θf(z+λi)F(z+λj)𝒗𝒮i,θ,j(k:vk=1(1F(z+λk))k:vk=0,k{i,j}F(z+λk))dz.\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})=\\ \sum_{\theta=1}^{\widetilde{m}}\int_{\nu-\widetilde{\lambda}_{\theta}}^{\infty}f(z+\lambda_{i})(1-F(z+\lambda_{j}))\sum_{\bm{v}\in\mathcal{S}_{i,\theta}^{\mathcal{B}}\setminus\mathcal{S}_{i,\theta,j}^{\mathcal{B}}}\quantity(\prod_{k:v_{k}=1,k\neq j}\quantity(1-F(z+\lambda_{k}))\prod_{k:v_{k}=0,k\in\mathcal{B}\setminus\quantity{i}}F(z+\lambda_{k}))\differential z\\ +\sum_{\theta=1}^{\widetilde{m}}\int_{\nu-\widetilde{\lambda}_{\theta}}^{\infty}f(z+\lambda_{i})F(z+\lambda_{j})\sum_{\bm{v}\in\mathcal{S}_{i,\theta,j}^{\mathcal{B}}}\quantity(\prod_{k:v_{k}=1}\quantity(1-F(z+\lambda_{k}))\prod_{k:v_{k}=0,k\in\mathcal{B}\setminus\quantity{i,j}}F(z+\lambda_{k}))\differential z. (32)

Taking the limit of λj\lambda_{j}\to\infty on both sides of (32), since limλjF(z+λj)=1\lim_{\lambda_{j}\to\infty}F(z+\lambda_{j})=1, the first term of the RHS of (32) vanishes, and the second term becomes independent of λj\lambda_{j}. Then, limλjϕi(λ;𝒟α,m~,)\lim_{\lambda_{j}\to\infty}\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}) can be expressed as

limλjϕi(λ;𝒟α,m~,)\displaystyle\lim_{\lambda_{j}\to\infty}\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})
=\displaystyle= θ=1m~νλ~θf(z+λi)𝒗𝒮i,θ,j(k:vk=1(1F(z+λk))k:vk=0,k{i,j}F(z+λk))dz\displaystyle\sum_{\theta=1}^{\widetilde{m}}\int_{\nu-\widetilde{\lambda}_{\theta}}^{\infty}f(z+\lambda_{i})\sum_{\bm{v}\in\mathcal{S}_{i,\theta,j}^{\mathcal{B}}}\quantity(\prod_{k:v_{k}=1}\quantity(1-F(z+\lambda_{k}))\prod_{k:v_{k}=0,k\in\mathcal{B}\setminus\quantity{i,j}}F(z+\lambda_{k}))\differential z
=\displaystyle= θ=1m~νλ~θf(z+λi)𝒗𝒮i,θ{j}(k:vk=1(1F(z+λk))k:vk=0,k{i,j}F(z+λk))dz,\displaystyle\sum_{\theta=1}^{\widetilde{m}}\int_{\nu-\widetilde{\lambda}_{\theta}}^{\infty}f(z+\lambda_{i})\sum_{\bm{v}\in\mathcal{S}_{i,\theta}^{\mathcal{B}\setminus\quantity{j}}}\quantity(\prod_{k:v_{k}=1}\quantity(1-F(z+\lambda_{k}))\prod_{k:v_{k}=0,k\in\mathcal{B}\setminus\quantity{i,j}}F(z+\lambda_{k}))\differential z,
=\displaystyle= ϕi(λ;𝒟α,m~,{j}).\displaystyle\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}\setminus\quantity{j}). (33)

By the same argument, we also have

limλjJi(λ;𝒟α,m~,)=Ji(λ;𝒟α,m~,{j}).\lim_{\lambda_{j}\to\infty}J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})=J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}\setminus\quantity{j}). (34)

By Lemma 13, (33) and (34), we have

Ji(λ;𝒟α,m~,)ϕi(λ;𝒟α,m~,)\displaystyle\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})} limλjJi(λ;𝒟α,m~1,{j})ϕi(λ;𝒟α,m~1,{j})limλjJi(λ;𝒟α,m~,)ϕi(λ;𝒟α,m~,)\displaystyle\leq\lim_{\lambda_{j}\to\infty}\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})}\lor\lim_{\lambda_{j}\to\infty}\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})}
=Ji(λ;𝒟α,m~1,{j})ϕi(λ;𝒟α,m~1,{j})limλjJi(λ;𝒟α,m~,)limλjϕi(λ;𝒟α,m~,)\displaystyle=\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})}\lor\frac{\lim_{\lambda_{j}\to\infty}J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})}{\lim_{\lambda_{j}\to\infty}\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})} (35)
=Ji(λ;𝒟α,m~1,{j})ϕi(λ;𝒟α,m~1,{j})Ji(λ;𝒟α,m~,{j})ϕi(λ;𝒟α,m~,{j}).\displaystyle=\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})}\lor\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}\setminus\quantity{j})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}\setminus\quantity{j})}.

where (35) holds since both $\lim_{\lambda_{j}\to\infty}J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})$ and $\lim_{\lambda_{j}\to\infty}\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})$ exist and are nonzero.

Inequality (31)

This result follows as a special case of (30), where in (33) we have 𝒮i,m~{j}=\mathcal{S}_{i,\widetilde{m}}^{\mathcal{B}\setminus\quantity{j}}=\varnothing, since there exists no 𝒗{0,1}d\bm{v}\in\quantity{0,1}^{d} satisfying 𝒗1=m~1=||1\left\lVert\bm{v}\right\rVert_{1}=\widetilde{m}-1=\left\lvert\mathcal{B}\right\rvert-1, vi=0v_{i}=0 and vk=0v_{k}=0 for all k{j}k\notin\mathcal{B}\setminus\quantity{j} simultaneously. Therefore, we have

limλjϕi(λ;𝒟α,m~,)=ϕi(λ;𝒟α,m~1,{j})\lim_{\lambda_{j}\to\infty}\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})=\phi_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j})

and

limλjJi(λ;𝒟α,m~,)=Ji(λ;𝒟α,m~1,{j}),\lim_{\lambda_{j}\to\infty}J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B})=J_{i}(\lambda;\mathcal{D}_{\alpha},\widetilde{m}-1,\mathcal{B}\setminus\quantity{j}),

which concludes the proof. ∎

7.1.1 Proof of Lemma 3

Lemma 3 (Restated) It holds that

Ji(λ;𝒟α)ϕi(λ;𝒟α)maxw{0}[(mi)1]θ[(mi)w]{Ji,θ(λ;𝒟α,i,w)ϕi,θ(λ;𝒟α,i,w)},\frac{J_{i}(\lambda;\mathcal{D}_{\alpha})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha})}\leq\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \theta\in[(m\land i)-w]\end{subarray}}\quantity{\frac{J_{i,\theta}(\lambda^{*};\mathcal{D}_{\alpha},\mathcal{B}_{i,w})}{\phi_{i,\theta}(\lambda^{*};\mathcal{D}_{\alpha},\mathcal{B}_{i,w})}},

where

i,0=[i],i,w=[i][w], and λk={λi,if ki,λk,if k>i.\mathcal{B}_{i,0}=[i],\mathcal{B}_{i,w}=[i]\setminus[w],\text{ and }\lambda_{k}^{*}=\begin{cases}\lambda_{i},&\text{if }k\leq i,\\ \lambda_{k},&\text{if }k>i.\end{cases}
Proof.

In this proof, we locally use $\lambda^{(0)},\lambda^{(1)},\lambda^{(2)},\cdots,\lambda^{(i-1)}$ to denote a sequence of $d$-dimensional vectors defined as follows. Define $\lambda^{(0)}=\lambda$ and, for $j\in[i-1]$, let

λk(j)={λi,if k=j,λk(j1),otherwise, which implies λk(j)={λi,if k[j]{i},λk,otherwise.\lambda^{(j)}_{k}=\begin{cases}\lambda_{i},&\text{if }k=j,\\ \lambda^{(j-1)}_{k},&\text{otherwise,}\end{cases}\text{ which implies }\lambda^{(j)}_{k}=\begin{cases}\lambda_{i},&\text{if }k\in[j]\cup\quantity{i},\\ \lambda_{k},&\text{otherwise.}\end{cases}

Consequently, we have λ(i1)=λ.\lambda^{(i-1)}=\lambda^{*}.

We are now ready to derive the result. First, we address all $\lambda_{j}\leq\lambda_{i}$. For $i=1$, we only need to show that

Ji(λ;𝒟α)ϕi(λ;𝒟α)maxw{0}[k(m1)]{Ji(λ(k);𝒟α,mw,d,w)ϕi(λ(k);𝒟α,mw,d,w)}.\frac{J_{i}(\lambda;\mathcal{D}_{\alpha})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha})}\leq\max_{w\in\quantity{0}\cup[k\land(m-1)]}\quantity{\frac{J_{i}(\lambda^{(k)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}{\phi_{i}(\lambda^{(k)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}}. (36)

holds for $k=0$, which is trivial. For $i\geq 2$, we prove that (36) holds for all $k\in[i-1]$ by mathematical induction. We have

Ji(λ;𝒟α)ϕi(λ;𝒟α)=Ji(λ;𝒟α,m,d,0)ϕi(λ;𝒟α,m,d,0).\frac{J_{i}(\lambda;\mathcal{D}_{\alpha})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha})}=\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},m,\mathcal{B}_{d,0})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},m,\mathcal{B}_{d,0})}.

We begin by verifying the base case of the induction. When k=1k=1, the statement becomes

Ji(λ;𝒟α,m,d,0)ϕi(λ;𝒟α,m,d,0)maxw{0}[1(m1)]{Ji(λ(1);𝒟α,mw,d,w)ϕi(λ(1);𝒟α,mw,d,w)}.\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},m,\mathcal{B}_{d,0})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},m,\mathcal{B}_{d,0})}\leq\max_{w\in\quantity{0}\cup[1\land(m-1)]}\quantity{\frac{J_{i}(\lambda^{(1)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}{\phi_{i}(\lambda^{(1)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}}.

If m=1m=1, the statement is immediate, since Ji(λ;𝒟α)/ϕi(λ;𝒟α)J_{i}(\lambda;\mathcal{D}_{\alpha})/\phi_{i}(\lambda;\mathcal{D}_{\alpha}) is expressed as

Ji(λ;𝒟α)ϕi(λ;𝒟α)=νminj[d]λj1z+λif(z+λi)jiF(z+λj)dzνminj[d]λjf(z+λi)jiF(z+λj)dz,\frac{J_{i}(\lambda;\mathcal{D}_{\alpha})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha})}=\frac{\int_{\nu-\min_{j\in[d]}\lambda_{j}}^{\infty}\frac{1}{z+\lambda_{i}}f(z+\lambda_{i})\prod_{j\neq i}F(z+\lambda_{j})\differential z}{\int_{\nu-\min_{j\in[d]}\lambda_{j}}^{\infty}f(z+\lambda_{i})\prod_{j\neq i}F(z+\lambda_{j})\differential z},

which is monotonically increasing in $\lambda_{j}$ for every $j\neq i$ by Lemma 11. Otherwise, by applying Lemma 13, we have

Ji(λ;𝒟α,m,d,0)ϕi(λ;𝒟α,m,d,0)\displaystyle\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},m,\mathcal{B}_{d,0})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},m,\mathcal{B}_{d,0})} Ji(λ(1);𝒟α,m1,d,1)ϕi(λ(1);𝒟α,m1,d,1)Ji(λ(1);𝒟α,m,d,0)ϕi(λ(1);𝒟α,m,d,0)\displaystyle\leq\frac{J_{i}(\lambda^{(1)};\mathcal{D}_{\alpha},m-1,\mathcal{B}_{d,1})}{\phi_{i}(\lambda^{(1)};\mathcal{D}_{\alpha},m-1,\mathcal{B}_{d,1})}\lor\frac{J_{i}(\lambda^{(1)};\mathcal{D}_{\alpha},m,\mathcal{B}_{d,0})}{\phi_{i}(\lambda^{(1)};\mathcal{D}_{\alpha},m,\mathcal{B}_{d,0})}
=maxw{0,1}{Ji(λ(1);𝒟α,mw,d,w)ϕi(λ(1);𝒟α,mw,d,w)}.\displaystyle=\max_{w\in\quantity{0,1}}\quantity{\frac{J_{i}(\lambda^{(1)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}{\phi_{i}(\lambda^{(1)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}}.

Therefore, the statement holds for k=1k=1.

We assume, as the inductive hypothesis, that the statement holds for k=u<i1k=u<i-1, i.e.,

Ji(λ;𝒟α,m,d,0)ϕi(λ;𝒟α,m,d,0)maxw{0}[u(m1)]{Ji(λ(u);𝒟α,mw,d,w)ϕi(λ(u);𝒟α,mw,d,w)}.\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},m,\mathcal{B}_{d,0})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},m,\mathcal{B}_{d,0})}\leq\max_{w\in\quantity{0}\cup[u\land(m-1)]}\quantity{\frac{J_{i}(\lambda^{(u)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}{\phi_{i}(\lambda^{(u)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}}. (37)

If we can prove the statement holds for k=u+1k=u+1, then by induction, the statement holds for all ki1k\leq i-1, thereby establishing the desired result in (36). Now we aim to prove it for k=u+1i1k=u+1\leq i-1, i.e., we want to show that

Ji(λ;𝒟α,m,d,0)ϕi(λ;𝒟α,m,d,0)maxw{0}[(u+1)(m1)]{Ji(λ(u+1);𝒟α,mw,d,w)ϕi(λ(u+1);𝒟α,mw,d,w)}.\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},m,\mathcal{B}_{d,0})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},m,\mathcal{B}_{d,0})}\leq\max_{w\in\quantity{0}\cup[(u+1)\land(m-1)]}\quantity{\frac{J_{i}(\lambda^{(u+1)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}{\phi_{i}(\lambda^{(u+1)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}}.

To prove this, it suffices to show the following inequality holds:

maxw{0}[u(m1)]{Ji(λ(u);𝒟α,mw,d,w)ϕi(λ(u);𝒟α,mw,d,w)}maxw{0}[(u+1)(m1)]{Ji(λ(u+1);𝒟α,mw,d,w)ϕi(λ(u+1);𝒟α,mw,d,w)},\max_{w\in\quantity{0}\cup[u\land(m-1)]}\quantity{\frac{J_{i}(\lambda^{(u)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}{\phi_{i}(\lambda^{(u)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}}\leq\\ \max_{w\in\quantity{0}\cup[(u+1)\land(m-1)]}\quantity{\frac{J_{i}(\lambda^{(u+1)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}{\phi_{i}(\lambda^{(u+1)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}}, (38)

since this, together with the induction hypothesis (37), implies that

Ji(λ;𝒟α,m,d,0)ϕi(λ;𝒟α,m,d,0)\displaystyle\frac{J_{i}(\lambda;\mathcal{D}_{\alpha},m,\mathcal{B}_{d,0})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha},m,\mathcal{B}_{d,0})} maxw{0}[u(m1)]{Ji(λ(u);𝒟α,mw,d,w)ϕi(λ(u);𝒟α,mw,d,w)}\displaystyle\leq\max_{w\in\quantity{0}\cup[u\land(m-1)]}\quantity{\frac{J_{i}(\lambda^{(u)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}{\phi_{i}(\lambda^{(u)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}}
maxw{0}[(u+1)(m1)]{Ji(λ(u+1);𝒟α,mw,d,w)ϕi(λ(u+1);𝒟α,mw,d,w)}.\displaystyle\leq\max_{w\in\quantity{0}\cup[(u+1)\land(m-1)]}\quantity{\frac{J_{i}(\lambda^{(u+1)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}{\phi_{i}(\lambda^{(u+1)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}}.

Now we prove (38) holds for u<i1u<i-1. For each term in the LHS of (38) given by

Ji(λ(u);𝒟α,mw0,d,w0)/ϕi(λ(u);𝒟α,mw0,d,w0), where w0{0}[u(m1)],J_{i}(\lambda^{(u)};\mathcal{D}_{\alpha},m-w_{0},\mathcal{B}_{d,w_{0}})/\phi_{i}(\lambda^{(u)};\mathcal{D}_{\alpha},m-w_{0},\mathcal{B}_{d,w_{0}}),\text{ where }w_{0}\in\quantity{0}\cup[u\land(m-1)],

we consider the following two cases.

  • Case 1: w0=m1w_{0}=m-1.
    In this case, we have mw0=1m-w_{0}=1. Similarly to the analysis on the base case, by Lemma 11, for any jd,w0{i}j\in\mathcal{B}_{d,w_{0}}\setminus\quantity{i},

    Ji(λ(u);𝒟α,1,d,w0)/ϕi(λ(u);𝒟α,1,d,w0)J_{i}(\lambda^{(u)};\mathcal{D}_{\alpha},1,\mathcal{B}_{d,w_{0}})/\phi_{i}(\lambda^{(u)};\mathcal{D}_{\alpha},1,\mathcal{B}_{d,w_{0}})

    is monotonically increasing in $\lambda_{j}$. Applying this monotonicity to the $(u+1)$-st coordinate $\lambda^{(u)}_{u+1}$, whose increase to $\lambda_{i}$ turns $\lambda^{(u)}$ into $\lambda^{(u+1)}$, we have

    Ji(λ(u);𝒟α,1,d,w0)ϕi(λ(u);𝒟α,1,d,w0)\displaystyle\frac{J_{i}(\lambda^{(u)};\mathcal{D}_{\alpha},1,\mathcal{B}_{d,w_{0}})}{\phi_{i}(\lambda^{(u)};\mathcal{D}_{\alpha},1,\mathcal{B}_{d,w_{0}})} Ji(λ(u+1);𝒟α,1,d,w0)ϕi(λ(u+1);𝒟α,1,d,w0)\displaystyle\leq\frac{J_{i}(\lambda^{(u+1)};\mathcal{D}_{\alpha},1,\mathcal{B}_{d,w_{0}})}{\phi_{i}(\lambda^{(u+1)};\mathcal{D}_{\alpha},1,\mathcal{B}_{d,w_{0}})}
    maxw{0}[(u+1)(m1)]{Ji(λ(u+1);𝒟α,mw,d,w)ϕi(λ(u+1);𝒟α,mw,d,w)},\displaystyle\leq\max_{w\in\quantity{0}\cup[(u+1)\land(m-1)]}\quantity{\frac{J_{i}(\lambda^{(u+1)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}{\phi_{i}(\lambda^{(u+1)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}}, (39)

    where the last inequality holds since w0{0}[u(m1)]w_{0}\in\quantity{0}\cup[u\land(m-1)] and mw0=1m-w_{0}=1.

  • Case 2: w0m2w_{0}\leq m-2.
    In this case, since w0m2w_{0}\leq m-2, we have w0<w0+1m1w_{0}<w_{0}+1\leq m-1. On the other hand, since w0{0}[u(m1)]w_{0}\in\quantity{0}\cup[u\land(m-1)], we have w0uw_{0}\leq u and thus w0<w0+1u+1w_{0}<w_{0}+1\leq u+1. Combining these, we have {w0,w0+1}{0}[(u+1)(m1)]\quantity{w_{0},w_{0}+1}\subset\quantity{0}\cup[(u+1)\land(m-1)]. Since mw02m-w_{0}\geq 2, by applying Lemma 13 again, we have

    Ji(λ(u);𝒟α,mw0,d,w0)ϕi(λ(u);𝒟α,mw0,d,w0)\displaystyle\frac{J_{i}(\lambda^{(u)};\mathcal{D}_{\alpha},m-w_{0},\mathcal{B}_{d,w_{0}})}{\phi_{i}(\lambda^{(u)};\mathcal{D}_{\alpha},m-w_{0},\mathcal{B}_{d,w_{0}})}
    \displaystyle\leq Ji(λ(u+1);𝒟α,m(w0+1),d,w0+1)ϕi(λ(u+1);𝒟α,m(w0+1),d,w0+1)Ji(λ(u+1);𝒟α,mw0,d,w0)ϕi(λ(u+1);𝒟α,mw0,d,w0)\displaystyle\frac{J_{i}(\lambda^{(u+1)};\mathcal{D}_{\alpha},m-(w_{0}+1),\mathcal{B}_{d,w_{0}+1})}{\phi_{i}(\lambda^{(u+1)};\mathcal{D}_{\alpha},m-(w_{0}+1),\mathcal{B}_{d,w_{0}+1})}\lor\frac{J_{i}(\lambda^{(u+1)};\mathcal{D}_{\alpha},m-w_{0},\mathcal{B}_{d,w_{0}})}{\phi_{i}(\lambda^{(u+1)};\mathcal{D}_{\alpha},m-w_{0},\mathcal{B}_{d,w_{0}})}
    \displaystyle\leq maxw{0}[(u+1)(m1)]{Ji(λ(u+1);𝒟α,mw,d,w)ϕi(λ(u+1);𝒟α,mw,d,w)},\displaystyle\max_{w\in\quantity{0}\cup[(u+1)\land(m-1)]}\quantity{\frac{J_{i}(\lambda^{(u+1)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}{\phi_{i}(\lambda^{(u+1)};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}}, (40)

    where the last inequality holds since {w0,w0+1}{0}[(u+1)(m1)]\quantity{w_{0},w_{0}+1}\subset\quantity{0}\cup[(u+1)\land(m-1)].

Combining (39) and (40), we see that (38) holds and thus the statement holds for k=u+1i1k=u+1\leq i-1, completing the inductive step. By the principle of mathematical induction, the statement (36) holds for k[i1]k\in[i-1]. By letting k=i1k=i-1, we have

Ji(λ;𝒟α)ϕi(λ;𝒟α)maxw{0}[(mi)1]{Ji(λ;𝒟α,mw,d,w)ϕi(λ;𝒟α,mw,d,w)}.\frac{J_{i}(\lambda;\mathcal{D}_{\alpha})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha})}\leq\max_{w\in\quantity{0}\cup[(m\land i)-1]}\quantity{\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d,w})}}. (41)

Now we address all the base-arms in d,wi,w=[d][i]\mathcal{B}_{d,w}\setminus\mathcal{B}_{i,w}=[d]\setminus[i], i.e., all λj\lambda_{j} satisfying λj>λi\lambda_{j}>\lambda_{i}. We prove

Ji(λ;𝒟α)ϕi(λ;𝒟α)maxw{0}[(mi)1]m~[(m(dk))w]{Ji(λ;𝒟α,m~,dk,w)ϕi(λ;𝒟α,m~,dk,w)}\frac{J_{i}(\lambda;\mathcal{D}_{\alpha})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha})}\leq\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \widetilde{m}\in[(m\land(d-k))-w]\end{subarray}}\quantity{\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-k,w})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-k,w})}} (42)

by mathematical induction on k[di]k\in[d-i]. We begin by verifying the base case of the induction. When k=1k=1, the statement becomes

Ji(λ;𝒟α)ϕi(λ;𝒟α)maxw{0}[(mi)1]m~[(m(d1))w]{Ji(λ;𝒟α,m~,d1,w)ϕi(λ;𝒟α,m~,d1,w)}.\frac{J_{i}(\lambda;\mathcal{D}_{\alpha})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha})}\leq\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \widetilde{m}\in[(m\land(d-1))-w]\end{subarray}}\quantity{\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-1,w})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-1,w})}}.

If m<dm<d, we consider each term in the RHS of (41) in two cases. For w=w0{0}[(mi)1]w=w_{0}\in\quantity{0}\cup[(m\land i)-1] such that mw0=1m-w_{0}=1, it follows that

Ji(λ;𝒟α,mw0,d,w0)ϕi(λ;𝒟α,mw0,d,w0)\displaystyle\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-w_{0},\mathcal{B}_{d,w_{0}})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-w_{0},\mathcal{B}_{d,w_{0}})} =Ji(λ;𝒟α,1,d,w0)ϕi(λ;𝒟α,1,d,w0)\displaystyle=\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},1,\mathcal{B}_{d,w_{0}})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},1,\mathcal{B}_{d,w_{0}})}
=νminjd,w0λj1z+λif(z+λi)jd,w0{i}F(z+λj)dzνminjd,w0λjf(z+λi)jd,w0{i}F(z+λj)dz,\displaystyle=\frac{\int_{\nu-\min_{j\in\mathcal{B}_{d,w_{0}}}\lambda^{*}_{j}}^{\infty}\frac{1}{z+\lambda^{*}_{i}}f(z+\lambda^{*}_{i})\prod_{j\in\mathcal{B}_{d,w_{0}}\setminus\quantity{i}}F(z+\lambda^{*}_{j})\differential z}{\int_{\nu-\min_{j\in\mathcal{B}_{d,w_{0}}}\lambda^{*}_{j}}^{\infty}f(z+\lambda^{*}_{i})\prod_{j\in\mathcal{B}_{d,w_{0}}\setminus\quantity{i}}F(z+\lambda^{*}_{j})\differential z},

which is monotonically increasing in λj\lambda_{j} for all jd,w0{i}j\in\mathcal{B}_{d,w_{0}}\setminus\quantity{i} by Lemma 11. Taking the limit λd\lambda^{*}_{d}\to\infty, we have limλdF(z+λd)=1\lim_{\lambda^{*}_{d}\to\infty}F(z+\lambda^{*}_{d})=1 and thus

Ji(λ;𝒟α,mw0,d,w0)ϕi(λ;𝒟α,mw0,d,w0)\displaystyle\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-w_{0},\mathcal{B}_{d,w_{0}})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-w_{0},\mathcal{B}_{d,w_{0}})} νminjd,w0λj1z+λif(z+λi)jd,w0{i,d}F(z+λj)dzνminjd,w0λjf(z+λi)jd,w0{i,d}F(z+λj)dz\displaystyle\leq\frac{\int_{\nu-\min_{j\in\mathcal{B}_{d,w_{0}}}\lambda^{*}_{j}}^{\infty}\frac{1}{z+\lambda^{*}_{i}}f(z+\lambda^{*}_{i})\prod_{j\in\mathcal{B}_{d,w_{0}}\setminus\quantity{i,d}}F(z+\lambda^{*}_{j})\differential z}{\int_{\nu-\min_{j\in\mathcal{B}_{d,w_{0}}}\lambda^{*}_{j}}^{\infty}f(z+\lambda^{*}_{i})\prod_{j\in\mathcal{B}_{d,w_{0}}\setminus\quantity{i,d}}F(z+\lambda^{*}_{j})\differential z}
=Ji(λ;𝒟α,mw0,d1,w0)ϕi(λ;𝒟α,mw0,d1,w0).\displaystyle=\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-w_{0},\mathcal{B}_{d-1,w_{0}})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-w_{0},\mathcal{B}_{d-1,w_{0}})}.

For $w=w_{0}\in\quantity{0}\cup[(m\land i)-1]$ such that $m-w_{0}\geq 2$, by applying Lemma 14 to base-arm $d$, we obtain

Ji(λ;𝒟α,mw0,d,w0)ϕi(λ;𝒟α,mw0,d,w0)Ji(λ;𝒟α,mw0,d1,w0)ϕi(λ;𝒟α,mw0,d1,w0)Ji(λ;𝒟α,m1w0,d1,w0)ϕi(λ;𝒟α,m1w0,d1,w0).\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-w_{0},\mathcal{B}_{d,w_{0}})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-w_{0},\mathcal{B}_{d,w_{0}})}\leq\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-w_{0},\mathcal{B}_{d-1,w_{0}})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-w_{0},\mathcal{B}_{d-1,w_{0}})}\lor\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-1-w_{0},\mathcal{B}_{d-1,w_{0}})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-1-w_{0},\mathcal{B}_{d-1,w_{0}})}.

Combining the above two cases, by (41) we have

Ji(λ;𝒟α)ϕi(λ;𝒟α)\displaystyle\frac{J_{i}(\lambda;\mathcal{D}_{\alpha})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha})} maxw{0}[(mi)1]{Ji(λ;𝒟α,mw,d1,w)ϕi(λ;𝒟α,mw,d1,w),Ji(λ;𝒟α,(m1w)1,d1,w)ϕi(λ;𝒟α,(m1w)1,d1,w)}\displaystyle\leq\max_{w\in\quantity{0}\cup[(m\land i)-1]}\quantity{\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d-1,w})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-w,\mathcal{B}_{d-1,w})},\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\quantity(m-1-w)\lor 1,\mathcal{B}_{d-1,w})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\quantity(m-1-w)\lor 1,\mathcal{B}_{d-1,w})}}
maxw{0}[(mi)1]m~[(m(d1))w]{Ji(λ;𝒟α,m~,d1,w)ϕi(λ;𝒟α,m~,d1,w)},\displaystyle\leq\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \widetilde{m}\in[(m\land(d-1))-w]\end{subarray}}\quantity{\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-1,w})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-1,w})}},

where the last inequality holds since m<dm<d and {mw,(m1w)1}[(m(d1))w]\quantity{m-w,\quantity(m-1-w)\lor 1}\subset[(m\land(d-1))-w] for any w{0}[(mi)1]w\in\quantity{0}\cup[(m\land i)-1]. On the other hand, if m=dm=d, by Lemma 14 and (41), we obtain

Ji(λ;𝒟α)ϕi(λ;𝒟α)\displaystyle\frac{J_{i}(\lambda;\mathcal{D}_{\alpha})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha})} maxw{0}[(mi)1]{Ji(λ;𝒟α,m1w,d1,w)ϕi(λ;𝒟α,m1w,d1,w)}\displaystyle\leq\max_{w\in\quantity{0}\cup[(m\land i)-1]}\quantity{\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-1-w,\mathcal{B}_{d-1,w})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},m-1-w,\mathcal{B}_{d-1,w})}}
maxw{0}[(mi)1]m~[(m(d1))w]{Ji(λ;𝒟α,m~,d1,w)ϕi(λ;𝒟α,m~,d1,w)},\displaystyle\leq\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \widetilde{m}\in[(m\land(d-1))-w]\end{subarray}}\quantity{\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-1,w})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-1,w})}},

where the last inequality holds since m1w[(m(d1))w]m-1-w\in[(m\land(d-1))-w] for any w{0}[(mi)1]w\in\quantity{0}\cup[(m\land i)-1]. Therefore, the statement holds for base case k=1k=1.

Now we assume, as the inductive hypothesis, that the statement holds for k=udi1k=u\leq d-i-1, i.e.,

Ji(λ;𝒟α)ϕi(λ;𝒟α)maxw{0}[(mi)1]m~[(m(du))w]{Ji(λ;𝒟α,m~,du,w)ϕi(λ;𝒟α,m~,du,w)}.\frac{J_{i}(\lambda;\mathcal{D}_{\alpha})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha})}\leq\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \widetilde{m}\in[(m\land(d-u))-w]\end{subarray}}\quantity{\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-u,w})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-u,w})}}. (43)

If we can prove the statement holds for k=u+1k=u+1, then by induction, the statement holds for all kdik\leq d-i, thereby establishing the desired result in (42). Now we aim to prove it for k=u+1dik=u+1\leq d-i, i.e., we want to show that

Ji(λ;𝒟α)ϕi(λ;𝒟α)maxw{0}[(mi)1]m~[(m(du1))w]{Ji(λ;𝒟α,m~,du1,w)ϕi(λ;𝒟α,m~,du1,w)}.\frac{J_{i}(\lambda;\mathcal{D}_{\alpha})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha})}\leq\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \widetilde{m}\in[(m\land(d-u-1))-w]\end{subarray}}\quantity{\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-u-1,w})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-u-1,w})}}.

To prove this, by induction hypothesis (43), we only need to show that the following inequality holds:

maxw{0}[(mi)1]m~[(m(du))w]{Ji(λ;𝒟α,m~,du,w)ϕi(λ;𝒟α,m~,du,w)}maxw{0}[(mi)1]m~[(m(du1))w]{Ji(λ;𝒟α,m~,du1,w)ϕi(λ;𝒟α,m~,du1,w)}.\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \widetilde{m}\in[(m\land(d-u))-w]\end{subarray}}\quantity{\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-u,w})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-u,w})}}\leq\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \widetilde{m}\in[(m\land(d-u-1))-w]\end{subarray}}\quantity{\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-u-1,w})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-u-1,w})}}. (44)

We consider each term in the LHS of the above inequality with w=w0{0}[(mi)1]w=w_{0}\in\quantity{0}\cup[(m\land i)-1] and m~=m~0[(m(du))w0]\widetilde{m}=\widetilde{m}_{0}\in[(m\land(d-u))-w_{0}] in the following three cases.

  • Case 1: m~0=1\widetilde{m}_{0}=1.
    In this case, since udi1u\leq d-i-1 and w0(mi)1w_{0}\leq(m\land i)-1, we have

    (du1)w0d(di1)1((mi)1)1=m~0.(d-u-1)-w_{0}\geq d-(d-i-1)-1-\quantity((m\land i)-1)\geq 1=\widetilde{m}_{0}.

    On the other hand, since m~0[(m(du))w0]\widetilde{m}_{0}\in[(m\land(d-u))-w_{0}], we have m~0mw0\widetilde{m}_{0}\leq m-w_{0}. Combining these, it follows that m~0[(m(du1))w0]\widetilde{m}_{0}\in[(m\land(d-u-1))-w_{0}]. Similarly to the analysis on the base case, by Lemma 11, for any jdu,w0{i}j\in\mathcal{B}_{d-u,w_{0}}\setminus\quantity{i},

    Ji(λ;𝒟α,m~0,du,w0)/ϕi(λ;𝒟α,m~0,du,w0)J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0},\mathcal{B}_{d-u,w_{0}})/\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0},\mathcal{B}_{d-u,w_{0}})

    is monotonically increasing in λj\lambda_{j}. Taking the limit λdu\lambda^{*}_{d-u}\to\infty, it follows that

    Ji(λ;𝒟α,m~0,du,w0)ϕi(λ;𝒟α,m~0,du,w0)\displaystyle\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0},\mathcal{B}_{d-u,w_{0}})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0},\mathcal{B}_{d-u,w_{0}})} Ji(λ;𝒟α,m~0,du1,w0)ϕi(λ;𝒟α,m~0,du1,w0)\displaystyle\leq\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0},\mathcal{B}_{d-u-1,w_{0}})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0},\mathcal{B}_{d-u-1,w_{0}})}
    maxw{0}[(mi)1]m~[(m(du1))w]{Ji(λ;𝒟α,m~,du1,w)ϕi(λ;𝒟α,m~,du1,w)},\displaystyle\leq\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \widetilde{m}\in[(m\land(d-u-1))-w]\end{subarray}}\quantity{\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-u-1,w})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-u-1,w})}},

    where the last inequality holds since w0{0}[(mi)1]w_{0}\in\quantity{0}\cup[(m\land i)-1] and m~0[(m(du1))w0]\widetilde{m}_{0}\in[(m\land(d-u-1))-w_{0}].

  • Case 2: m~02\widetilde{m}_{0}\geq 2 and m~0(du)w01\widetilde{m}_{0}\leq(d-u)-w_{0}-1.
    In this case, since m~0[(m(du))w0]\widetilde{m}_{0}\in[(m\land(d-u))-w_{0}] we have m~0mw0\widetilde{m}_{0}\leq m-w_{0}. Combining it with m~0(du)w01\widetilde{m}_{0}\leq(d-u)-w_{0}-1, it follows that m~0[(m(du1))w0]\widetilde{m}_{0}\in[(m\land(d-u-1))-w_{0}]. Since m~02\widetilde{m}_{0}\geq 2, we have {m~01,m~0}[(m(du1))w0]\quantity{\widetilde{m}_{0}-1,\widetilde{m}_{0}}\subset[(m\land(d-u-1))-w_{0}]. By applying Lemma 14 to base-arm dud-u, we have

    Ji(λ;𝒟α,m~0,du,w0)ϕi(λ;𝒟α,m~0,du,w0)\displaystyle\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0},\mathcal{B}_{d-u,w_{0}})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0},\mathcal{B}_{d-u,w_{0}})} Ji(λ;𝒟α,m~01,du1,w0)ϕi(λ;𝒟α,m~01,du1,w0)Ji(λ;𝒟α,m~0,du1,w0)ϕi(λ;𝒟α,m~0,du1,w0)\displaystyle\leq\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0}-1,\mathcal{B}_{d-u-1,w_{0}})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0}-1,\mathcal{B}_{d-u-1,w_{0}})}\lor\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0},\mathcal{B}_{d-u-1,w_{0}})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0},\mathcal{B}_{d-u-1,w_{0}})}
    maxw{0}[(mi)1]m~[(m(du1))w]{Ji(λ;𝒟α,m~,du1,w)ϕi(λ;𝒟α,m~,du1,w)},\displaystyle\leq\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \widetilde{m}\in[(m\land(d-u-1))-w]\end{subarray}}\quantity{\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-u-1,w})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-u-1,w})}},

    where the last inequality holds since w0{0}[(mi)1]w_{0}\in\quantity{0}\cup[(m\land i)-1] and {m~01,m~0}[(m(du1))w0]\quantity{\widetilde{m}_{0}-1,\widetilde{m}_{0}}\subset[(m\land(d-u-1))-w_{0}].

  • Case 3: m~02\widetilde{m}_{0}\geq 2 and m~0=(du)w0\widetilde{m}_{0}=(d-u)-w_{0}.
    In this case, since m~02\widetilde{m}_{0}\geq 2 and m~0=(du)w0\widetilde{m}_{0}=(d-u)-w_{0}, we have m~01=(du1)w01\widetilde{m}_{0}-1=(d-u-1)-w_{0}\geq 1. On the other hand, since m~0[(m(du))w0]\widetilde{m}_{0}\in[(m\land(d-u))-w_{0}] we have m~01mw0\widetilde{m}_{0}-1\leq m-w_{0}. Combining these, it follows that m~01[(m(du1))w0]\widetilde{m}_{0}-1\in[(m\land(d-u-1))-w_{0}]. By applying Lemma 14 to base-arm dud-u, we have

    Ji(λ;𝒟α,m~0,du,w0)ϕi(λ;𝒟α,m~0,du,w0)\displaystyle\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0},\mathcal{B}_{d-u,w_{0}})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0},\mathcal{B}_{d-u,w_{0}})} Ji(λ;𝒟α,m~01,du1,w0)ϕi(λ;𝒟α,m~01,du1,w0)\displaystyle\leq\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0}-1,\mathcal{B}_{d-u-1,w_{0}})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0}-1,\mathcal{B}_{d-u-1,w_{0}})}
    maxw{0}[(mi)1]m~[(m(du1))w]{Ji(λ;𝒟α,m~,du1,w)ϕi(λ;𝒟α,m~,du1,w)},\displaystyle\leq\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \widetilde{m}\in[(m\land(d-u-1))-w]\end{subarray}}\quantity{\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-u-1,w})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{d-u-1,w})}},

    where the last inequality holds since w0{0}[(mi)1]w_{0}\in\quantity{0}\cup[(m\land i)-1] and m~01[(m(du1))w0]\widetilde{m}_{0}-1\in[(m\land(d-u-1))-w_{0}].

Combining the three cases, we see that (44) holds, and thus the statement holds for k=u+1dik=u+1\leq d-i, completing the inductive step. By induction, the statement (42) holds for k[di]k\in[d-i]. By letting k=dik=d-i, we immediately obtain

Ji(λ;𝒟α)ϕi(λ;𝒟α)maxw{0}[(mi)1]m~[(mi)w]{Ji(λ;𝒟α,m~,i,w)ϕi(λ;𝒟α,m~,i,w)}.\frac{J_{i}(\lambda;\mathcal{D}_{\alpha})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha})}\leq\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \widetilde{m}\in[(m\land i)-w]\end{subarray}}\quantity{\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{i,w})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m},\mathcal{B}_{i,w})}}. (45)

Note that we have

ϕi(λ;𝒟α,m~0,i,w0)=θ=1m~0ϕi,θ(λ;𝒟α,i,w0)\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0},\mathcal{B}_{i,w_{0}})=\sum_{\theta=1}^{\widetilde{m}_{0}}\phi_{i,\theta}(\lambda^{*};\mathcal{D}_{\alpha},\mathcal{B}_{i,w_{0}}) (46)

and

Ji(λ;𝒟α,m~0,i,w0)=θ=1m~0Ji,θ(λ;𝒟α,i,w0).J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0},\mathcal{B}_{i,w_{0}})=\sum_{\theta=1}^{\widetilde{m}_{0}}J_{i,\theta}(\lambda^{*};\mathcal{D}_{\alpha},\mathcal{B}_{i,w_{0}}). (47)

Since each term in the RHS of both (46) and (47) is positive, the mediant inequality yields

Ji(λ;𝒟α,m~0,i,w0)ϕi(λ;𝒟α,m~0,i,w0)\displaystyle\frac{J_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0},\mathcal{B}_{i,w_{0}})}{\phi_{i}(\lambda^{*};\mathcal{D}_{\alpha},\widetilde{m}_{0},\mathcal{B}_{i,w_{0}})} =θ=1m~0Ji,θ(λ;𝒟α,i,w0)θ=1m~0ϕi,θ(λ;𝒟α,i,w0)\displaystyle=\frac{\sum_{\theta=1}^{\widetilde{m}_{0}}J_{i,\theta}(\lambda^{*};\mathcal{D}_{\alpha},\mathcal{B}_{i,w_{0}})}{\sum_{\theta=1}^{\widetilde{m}_{0}}\phi_{i,\theta}(\lambda^{*};\mathcal{D}_{\alpha},\mathcal{B}_{i,w_{0}})}
maxθ[m~0]{Ji,θ(λ;𝒟α,i,w0)ϕi,θ(λ;𝒟α,i,w0)}.\displaystyle\leq\max_{\theta\in[\widetilde{m}_{0}]}\quantity{\frac{J_{i,\theta}(\lambda^{*};\mathcal{D}_{\alpha},\mathcal{B}_{i,w_{0}})}{\phi_{i,\theta}(\lambda^{*};\mathcal{D}_{\alpha},\mathcal{B}_{i,w_{0}})}}. (48)

Combining (45) and (48), we have

Ji(λ;𝒟α)ϕi(λ;𝒟α)\displaystyle\frac{J_{i}(\lambda;\mathcal{D}_{\alpha})}{\phi_{i}(\lambda;\mathcal{D}_{\alpha})} maxw{0}[(mi)1]m~[(mi)w]{maxθ[m~]{Ji,θ(λ;𝒟α,i,w)ϕi,θ(λ;𝒟α,i,w)}}\displaystyle\leq\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \widetilde{m}\in[(m\land i)-w]\end{subarray}}\quantity{\max_{\theta\in[\widetilde{m}]}\quantity{\frac{J_{i,\theta}(\lambda^{*};\mathcal{D}_{\alpha},\mathcal{B}_{i,w})}{\phi_{i,\theta}(\lambda^{*};\mathcal{D}_{\alpha},\mathcal{B}_{i,w})}}}
=maxw{0}[(mi)1]θ[(mi)w]{Ji,θ(λ;𝒟α,i,w)ϕi,θ(λ;𝒟α,i,w)},\displaystyle=\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \theta\in[(m\land i)-w]\end{subarray}}\quantity{\frac{J_{i,\theta}(\lambda^{*};\mathcal{D}_{\alpha},\mathcal{B}_{i,w})}{\phi_{i,\theta}(\lambda^{*};\mathcal{D}_{\alpha},\mathcal{B}_{i,w})}},

which concludes the proof. ∎

7.2 Proof of Lemma 4

7.2.1 Pareto Distribution

Let us consider the case 𝒟α=𝒫α\mathcal{D}_{\alpha}=\mathcal{P}_{\alpha}. Recall that the probability density function and cumulative distribution function of Pareto distribution are given by

f(x)=αxα+1,F(x)=1xα,x1.f(x)=\frac{\alpha}{x^{\alpha+1}},\quad F(x)=1-x^{-\alpha},\quad x\geq 1.

Then, for w{0}[(mi)1],θ[(mi)w]w\in\quantity{0}\cup[(m\land i)-1],\theta\in[(m\land i)-w], we have

\displaystyle J_{i,\theta}(\lambda^{*};\mathcal{P}_{\alpha},\mathcal{B}_{i,w}) =\binom{i-w-1}{\theta-1}\int_{1-\lambda_{i}}^{\infty}\frac{f(z+\lambda_{i})}{z+\lambda_{i}}\quantity(1-F(z+\lambda_{i}))^{\theta-1}F^{i-w-\theta}(z+\lambda_{i})\differential z
\displaystyle=\binom{i-w-1}{\theta-1}\int_{1}^{\infty}\frac{f(z)}{z}\quantity(1-F(z))^{\theta-1}F^{i-w-\theta}(z)\differential z

and

\displaystyle\phi_{i,\theta}(\lambda^{*};\mathcal{P}_{\alpha},\mathcal{B}_{i,w}) =\binom{i-w-1}{\theta-1}\int_{1-\lambda_{i}}^{\infty}f(z+\lambda_{i})\quantity(1-F(z+\lambda_{i}))^{\theta-1}F^{i-w-\theta}(z+\lambda_{i})\differential z
\displaystyle=\binom{i-w-1}{\theta-1}\int_{1}^{\infty}f(z)\quantity(1-F(z))^{\theta-1}F^{i-w-\theta}(z)\differential z.

Then, it holds that

J_{i,\theta}(\lambda^{*};\mathcal{P}_{\alpha},\mathcal{B}_{i,w}) =\binom{i-w-1}{\theta-1}\int_{1}^{\infty}\frac{\alpha}{z^{\alpha\theta+2}}\quantity(1-z^{-\alpha})^{i-w-\theta}\differential z
=\binom{i-w-1}{\theta-1}\int_{0}^{1}u^{\theta-1+\frac{1}{\alpha}}(1-u)^{i-w-\theta}\differential u
=\binom{i-w-1}{\theta-1}B\quantity(\theta+\frac{1}{\alpha},i+1-w-\theta),

where the second equality follows from the substitution $u=z^{-\alpha}$ and $B\quantity(a,b)=\int_{0}^{1}t^{a-1}(1-t)^{b-1}\differential t$ denotes the Beta function. Similarly, we have

ϕi,θ(λ;𝒫α,i,w)\displaystyle\phi_{i,\theta}(\lambda^{*};\mathcal{P}_{\alpha},\mathcal{B}_{i,w}) =(iw1θ1)1αzαθ+1(1zα)iwθdz\displaystyle=\binom{i-w-1}{\theta-1}\int_{1}^{\infty}\frac{\alpha}{z^{\alpha\theta+1}}\quantity(1-z^{-\alpha})^{i-w-\theta}\differential z
=\binom{i-w-1}{\theta-1}\int_{0}^{1}u^{\theta-1}(1-u)^{i-w-\theta}\differential u
=(iw1θ1)B(θ,i+1wθ).\displaystyle=\binom{i-w-1}{\theta-1}B\quantity(\theta,i+1-w-\theta).

Therefore, we have

Ji,θ(λ;𝒫α,i,w)ϕi,θ(λ;𝒫α,i,w)=B(θ+1α,i+1wθ)B(θ,i+1wθ).\frac{J_{i,\theta}(\lambda^{*};\mathcal{P}_{\alpha},\mathcal{B}_{i,w})}{\phi_{i,\theta}(\lambda^{*};\mathcal{P}_{\alpha},\mathcal{B}_{i,w})}=\frac{B\quantity(\theta+\frac{1}{\alpha},i+1-w-\theta)}{B\quantity(\theta,i+1-w-\theta)}. (49)
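As a sanity check of these closed forms, the integrals defining $J_{i,\theta}$ and $\phi_{i,\theta}$ can be compared with the Beta-function expressions by direct numerical quadrature. The following minimal Python sketch does this under arbitrary illustrative parameter values (not taken from the analysis):

import math
from scipy.integrate import quad
from scipy.special import beta

# Arbitrary illustrative parameters (not taken from the analysis).
alpha, i, w, theta = 2.0, 6, 1, 2
c = math.comb(i - w - 1, theta - 1)

f = lambda z: alpha / z ** (alpha + 1)   # Pareto density
F = lambda z: 1.0 - z ** (-alpha)        # Pareto CDF

# Direct quadrature of the integrands of J_{i,theta} and phi_{i,theta}.
J_num = c * quad(lambda z: f(z) / z * (1 - F(z)) ** (theta - 1) * F(z) ** (i - w - theta), 1, math.inf)[0]
phi_num = c * quad(lambda z: f(z) * (1 - F(z)) ** (theta - 1) * F(z) ** (i - w - theta), 1, math.inf)[0]

# Closed forms via the Beta function; both pairs should agree up to quadrature error.
print(J_num, c * beta(theta + 1 / alpha, i + 1 - w - theta))
print(phi_num, c * beta(theta, i + 1 - w - theta))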

Similarly to the proof of Lee et al. (2024), we bound (49) as follows. For α>1\alpha>1, we have

B(θ+1α,i+1wθ)B(θ,i+1wθ)\displaystyle\frac{B(\theta+\frac{1}{\alpha},i+1-w-\theta)}{B(\theta,i+1-w-\theta)}
=\displaystyle= Γ(θ+1α)Γ(i+1wθ)Γ(i+1+1αw)Γ(i+1w)Γ(θ)Γ(i+1wθ)\displaystyle\frac{\Gamma(\theta+\frac{1}{\alpha})\Gamma(i+1-w-\theta)}{\Gamma(i+1+\frac{1}{\alpha}-w)}\frac{\Gamma(i+1-w)}{\Gamma(\theta)\Gamma(i+1-w-\theta)} (by B(a,b)=Γ(a)Γ(b)Γ(a+b)B(a,b)=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)})
=\displaystyle= Γ(θ+1α)Γ(i+1+1αw)Γ(i+1w)Γ(θ)\displaystyle\frac{\Gamma(\theta+\frac{1}{\alpha})}{\Gamma(i+1+\frac{1}{\alpha}-w)}\frac{\Gamma(i+1-w)}{\Gamma(\theta)}
=\displaystyle= 1i+1αwΓ(θ+1α)Γ(θ)Γ(i+1w)Γ(i+1αw)\displaystyle\frac{1}{i+\frac{1}{\alpha}-w}\frac{\Gamma(\theta+\frac{1}{\alpha})}{\Gamma(\theta)}\frac{\Gamma(i+1-w)}{\Gamma(i+\frac{1}{\alpha}-w)} (by Γ(n)=(n1)Γ(n1)\Gamma(n)=(n-1)\Gamma(n-1))
\displaystyle\leq 1i+1αw(θ+1α)1α(i+1w)11α\displaystyle\frac{1}{i+\frac{1}{\alpha}-w}\left(\theta+\frac{1}{\alpha}\right)^{\frac{1}{\alpha}}\left(i+1-w\right)^{1-\frac{1}{\alpha}} (by Gautschi’s inequality)
=\displaystyle= i+1wi+1αw(θ+1αi+1w)1α\displaystyle\frac{i+1-w}{i+\frac{1}{\alpha}-w}\left(\frac{\theta+\frac{1}{\alpha}}{i+1-w}\right)^{\frac{1}{\alpha}}
\displaystyle\leq 2αα+1(θ+1αi+1w)1α.\displaystyle\frac{2\alpha}{\alpha+1}\left(\frac{\theta+\frac{1}{\alpha}}{i+1-w}\right)^{\frac{1}{\alpha}}. (equality holds when w=i1w=i-1)

Since w{0}[(mi)1],θ[(mw)(iw)]w\in\quantity{0}\cup[(m\land i)-1],\theta\in[(m-w)\land(i-w)], we have

maxw{0}[(mi)1]θ[(mw)(iw)]θ+1αi+1w=maxw{0}[(mi)1](mi)+1αwi+1w=(mi)+1αi+1.\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \theta\in[(m-w)\land(i-w)]\end{subarray}}\frac{\theta+\frac{1}{\alpha}}{i+1-w}=\max_{w\in\quantity{0}\cup[(m\land i)-1]}\frac{(m\land i)+\frac{1}{\alpha}-w}{i+1-w}=\frac{(m\land i)+\frac{1}{\alpha}}{i+1}.

Therefore, we have

maxw{0}[(mi)1]θ[(mw)(iw)]B(θ+1α,i+1wθ)B(θ,i+1wθ)2αα+1((im)+1αi)1α.\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \theta\in[(m-w)\land(i-w)]\end{subarray}}\frac{B(\theta+\frac{1}{\alpha},i+1-w-\theta)}{B(\theta,i+1-w-\theta)}\leq\frac{2\alpha}{\alpha+1}\quantity(\frac{(i\land m)+\frac{1}{\alpha}}{i})^{\frac{1}{\alpha}}.

Recall that σi=i\sigma_{i}=i as previously noted for notational simplicity. By Lemma 3, for any i[d]i\in[d], we have

Ji(λ;𝒫α)ϕi(λ;𝒫α)maxw{0}[(mi)1]θ[(mw)(iw)]Ji,θ(λ;𝒫α,i,w)ϕi,θ(λ;𝒫α,i,w)2αα+1((σim)+1ασi)1α.\frac{J_{i}(\lambda;\mathcal{P}_{\alpha})}{\phi_{i}(\lambda;\mathcal{P}_{\alpha})}\leq\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \theta\in[(m-w)\land(i-w)]\end{subarray}}\frac{J_{i,\theta}(\lambda^{*};\mathcal{P}_{\alpha},\mathcal{B}_{i,w})}{\phi_{i,\theta}(\lambda^{*};\mathcal{P}_{\alpha},\mathcal{B}_{i,w})}\leq\frac{2\alpha}{\alpha+1}\quantity(\frac{(\sigma_{i}\land m)+\frac{1}{\alpha}}{\sigma_{i}})^{\frac{1}{\alpha}}.

7.2.2 Fréchet Distribution

Before proving the statement for the case $\mathcal{D}_{\alpha}=\mathcal{F}_{\alpha}$, we first give the following lemma.

Lemma 15.

Let $F(x)$ denote the cumulative distribution function of the Fréchet distribution with shape $\alpha$. For $\lambda_{i}\geq 0$, we have

0f(z+λi)z+λi(1F(z+λi))pFq(z+λi)dz0f(z+λi)(1F(z+λi))pFq(z+λi)dz0f(z+λi)(z+λi)αp+1Fq(z+λi)dz0f(z+λi)(z+λi)αpFq(z+λi)dz.\frac{\int_{0}^{\infty}\frac{f(z+\lambda_{i})}{z+\lambda_{i}}\quantity(1-F(z+\lambda_{i}))^{p}F^{q}(z+\lambda_{i})\differential z}{\int_{0}^{\infty}f(z+\lambda_{i})\quantity(1-F(z+\lambda_{i}))^{p}F^{q}(z+\lambda_{i})\differential z}\leq\frac{\int_{0}^{\infty}\frac{f(z+\lambda_{i})}{(z+\lambda_{i})^{\alpha p+1}}F^{q}(z+\lambda_{i})\differential z}{\int_{0}^{\infty}\frac{f(z+\lambda_{i})}{(z+\lambda_{i})^{\alpha p}}F^{q}(z+\lambda_{i})\differential z}. (50)
Proof.

By letting $h(z)=f(z+\lambda_{i})F^{q}(z+\lambda_{i})$, the inequality (50) can be rewritten as

01z+λi(1F(z+λi))ph(z)dz0(1F(z+λi))ph(z)dz01(z+λi)αp+1h(z)dz01(z+λi)αph(z)dz.\frac{\int_{0}^{\infty}\frac{1}{z+\lambda_{i}}\quantity(1-F(z+\lambda_{i}))^{p}h(z)\differential z}{\int_{0}^{\infty}\quantity(1-F(z+\lambda_{i}))^{p}h(z)\differential z}\leq\frac{\int_{0}^{\infty}\frac{1}{(z+\lambda_{i})^{\alpha p+1}}h(z)\differential z}{\int_{0}^{\infty}\frac{1}{(z+\lambda_{i})^{\alpha p}}h(z)\differential z}.

Equivalently, multiplying both sides by

0(1F(z+λi))ph(z)dz01(z+λi)αph(z)dz,\int_{0}^{\infty}\quantity(1-F(z+\lambda_{i}))^{p}h(z)\differential z\cdot\int_{0}^{\infty}\frac{1}{(z+\lambda_{i})^{\alpha p}}h(z)\differential z,

we arrive at

0h(z)(z+λi)(1F(z+λi))pdz0h(z)(z+λi)αpdz0h(z)(z+λi)αp+1dz0h(z)(1F(z+λi))pdz,\int_{0}^{\infty}\frac{h(z)}{(z+\lambda_{i})}\quantity(1-F(z+\lambda_{i}))^{p}\differential z\int_{0}^{\infty}\frac{h(z)}{(z+\lambda_{i})^{\alpha p}}\differential z\leq\\ \int_{0}^{\infty}\frac{h(z)}{(z+\lambda_{i})^{\alpha p+1}}\differential z\int_{0}^{\infty}h(z)\quantity(1-F(z+\lambda_{i}))^{p}\differential z, (51)

and thus it suffices to prove (51). The LHS of (51) can be expressed as

0h(z)(z+λi)(1F(z+λi))pdz0h(z)(z+λi)αpdz\displaystyle\int_{0}^{\infty}\frac{h(z)}{(z+\lambda_{i})}\quantity(1-F(z+\lambda_{i}))^{p}\differential z\int_{0}^{\infty}\frac{h(z)}{(z+\lambda_{i})^{\alpha p}}\differential z
=\displaystyle= z,w0h(z)h(w)(1exp(1/(z+λi)α))p(z+λi)(w+λi)αpdzdw\displaystyle\iint_{z,w\geq 0}\frac{h(z)h(w)\quantity(1-\exp\quantity(-1/(z+\lambda_{i})^{\alpha}))^{p}}{(z+\lambda_{i})(w+\lambda_{i})^{\alpha p}}\differential z\differential w
=\displaystyle= 12z,w0h(z)h(w)((1exp(1/(z+λi)α))p(z+λi)(w+λi)αp+(1exp(1/(w+λi)α))p(w+λi)(z+λi)αp)dzdw\displaystyle\frac{1}{2}\iint_{z,w\geq 0}h(z)h(w)\quantity(\frac{\quantity(1-\exp\quantity(-1/(z+\lambda_{i})^{\alpha}))^{p}}{(z+\lambda_{i})(w+\lambda_{i})^{\alpha p}}+\frac{\quantity(1-\exp\quantity(-1/(w+\lambda_{i})^{\alpha}))^{p}}{(w+\lambda_{i})(z+\lambda_{i})^{\alpha p}})\differential z\differential w

and the RHS of (51) can be expressed as

0h(z)(z+λi)αp+1dz0h(z)(1F(z+λi))pdz\displaystyle\int_{0}^{\infty}\frac{h(z)}{(z+\lambda_{i})^{\alpha p+1}}\differential z\int_{0}^{\infty}h(z)\quantity(1-F(z+\lambda_{i}))^{p}\differential z
=\iint_{z,w\geq 0}\frac{h(z)h(w)\quantity(1-\exp\quantity(-1/(w+\lambda_{i})^{\alpha}))^{p}}{(z+\lambda_{i})^{\alpha p+1}}\differential z\differential w
=\frac{1}{2}\iint_{z,w\geq 0}h(z)h(w)\quantity(\frac{\quantity(1-\exp\quantity(-1/(w+\lambda_{i})^{\alpha}))^{p}}{(z+\lambda_{i})^{\alpha p+1}}+\frac{\quantity(1-\exp\quantity(-1/(z+\lambda_{i})^{\alpha}))^{p}}{(w+\lambda_{i})^{\alpha p+1}})\differential z\differential w.

By an elementary calculation we can see

(1exp(1/(z+λi)α))p(z+λi)(w+λi)αp+(1exp(1/(w+λi)α))p(w+λi)(z+λi)αp\displaystyle\frac{\quantity(1-\exp\quantity(-1/(z+\lambda_{i})^{\alpha}))^{p}}{(z+\lambda_{i})(w+\lambda_{i})^{\alpha p}}+\frac{\quantity(1-\exp\quantity(-1/(w+\lambda_{i})^{\alpha}))^{p}}{(w+\lambda_{i})(z+\lambda_{i})^{\alpha p}}
\displaystyle- (1exp(1/(w+λi)α))p(z+λi)αp+1(1exp(1/(z+λi)α))p(w+λi)αp+1\displaystyle\frac{\quantity(1-\exp\quantity(-1/(w+\lambda_{i})^{\alpha}))^{p}}{(z+\lambda_{i})^{\alpha p+1}}-\frac{\quantity(1-\exp\quantity(-1/(z+\lambda_{i})^{\alpha}))^{p}}{(w+\lambda_{i})^{\alpha p+1}}
=\displaystyle= (z+λi)αp(1exp(1/(z+λi)α))p(w+λi)αp(1exp(1/(w+λi)α))p(z+λi)αp+1(w+λi)αp\displaystyle\frac{(z+\lambda_{i})^{\alpha p}\quantity(1-\exp\quantity(-1/(z+\lambda_{i})^{\alpha}))^{p}-(w+\lambda_{i})^{\alpha p}\quantity(1-\exp\quantity(-1/(w+\lambda_{i})^{\alpha}))^{p}}{(z+\lambda_{i})^{\alpha p+1}(w+\lambda_{i})^{\alpha p}}
\displaystyle- (w+λi)αp(1exp(1/(w+λi)α))p(z+λi)αp(1exp(1/(z+λi)α))p(w+λi)αp+1(z+λi)αp\displaystyle\frac{(w+\lambda_{i})^{\alpha p}\quantity(1-\exp\quantity(-1/(w+\lambda_{i})^{\alpha}))^{p}-(z+\lambda_{i})^{\alpha p}\quantity(1-\exp\quantity(-1/(z+\lambda_{i})^{\alpha}))^{p}}{(w+\lambda_{i})^{\alpha p+1}(z+\lambda_{i})^{\alpha p}}
=\displaystyle= wz(z+λi)αp+1(w+λi)αp+1((z+λi)αp(1exp(1/(z+λi)α))p(w+λi)αp(1exp(1/(w+λi)α))p).\displaystyle\frac{w-z}{(z+\lambda_{i})^{\alpha p+1}(w+\lambda_{i})^{\alpha p+1}}\quantity((z+\lambda_{i})^{\alpha p}\quantity(1-\exp\quantity(-1/(z+\lambda_{i})^{\alpha}))^{p}-(w+\lambda_{i})^{\alpha p}\quantity(1-\exp\quantity(-1/(w+\lambda_{i})^{\alpha}))^{p}). (52)

By letting $t=t(x)=(x+\lambda_{i})^{\alpha}$, we have

(x+\lambda_{i})^{\alpha p}\quantity(1-\exp\quantity(-1/(x+\lambda_{i})^{\alpha}))^{p}=\quantity(t\quantity(1-e^{-1/t}))^{p}. (53)

Since t(x)t(x) is monotonically increasing in xx, and t(1e1/t)>0t\quantity(1-e^{-1/t})>0, the LHS of (53) is monotonic in the same direction as t(1e1/t)t\quantity(1-e^{-1/t}), whose derivative is expressed as

1e1/t1te1/t1e1/t(e1/t1)e1/t=0,1-e^{-1/t}-\frac{1}{t}e^{-1/t}\geq 1-e^{-1/t}-(e^{1/t}-1)e^{-1/t}=0,

where the inequality holds since $\frac{1}{t}\leq e^{1/t}-1$. Therefore, the LHS of (53) is monotonically increasing in $x$. This implies that the expression in (52) is non-positive, which concludes the proof. ∎
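The inequality (50) itself can also be checked by numerical quadrature; the following is a minimal Python sketch under arbitrary illustrative values of $\alpha$, $p$, $q$, and $\lambda_{i}$:

import math
import numpy as np
from scipy.integrate import quad

# Arbitrary illustrative parameters (not taken from the analysis).
alpha, p, q, lam = 2.0, 3, 4, 0.7

f = lambda x: alpha / x ** (alpha + 1) * math.exp(-1.0 / x ** alpha)  # Frechet density
F = lambda x: math.exp(-1.0 / x ** alpha)                             # Frechet CDF

def ratio(num, den):
    return quad(num, 0, np.inf)[0] / quad(den, 0, np.inf)[0]

# LHS and RHS of (50).
lhs = ratio(lambda z: f(z + lam) / (z + lam) * (1 - F(z + lam)) ** p * F(z + lam) ** q,
            lambda z: f(z + lam) * (1 - F(z + lam)) ** p * F(z + lam) ** q)
rhs = ratio(lambda z: f(z + lam) / (z + lam) ** (alpha * p + 1) * F(z + lam) ** q,
            lambda z: f(z + lam) / (z + lam) ** (alpha * p) * F(z + lam) ** q)
print(lhs <= rhs, lhs, rhs)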

We now prove Lemma 4 in the case $\mathcal{D}_{\alpha}=\mathcal{F}_{\alpha}$ by applying Lemma 15. Recall that the probability density function and cumulative distribution function of the Fréchet distribution are given by

f(x)=αxα+1e1/xα,F(x)=e1/xα,x0.f(x)=\frac{\alpha}{x^{\alpha+1}}e^{-1/x^{\alpha}},\quad F(x)=e^{-1/x^{\alpha}},\quad x\geq 0.

Then, for w{0}[(mi)1],θ[(mi)w]w\in\quantity{0}\cup[(m\land i)-1],\theta\in[(m\land i)-w], we have

Ji,θ(λ;α,i,w)\displaystyle J_{i,\theta}(\lambda^{*};\mathcal{F}_{\alpha},\mathcal{B}_{i,w}) =(iw1θ1)λθf(z+λi)z+λi(1F(z+λi))θ1Fiwθ(z+λi)dz\displaystyle=\binom{i-w-1}{\theta-1}\int_{-\lambda_{\theta}}^{\infty}\frac{f(z+\lambda_{i})}{z+\lambda_{i}}\quantity(1-F(z+\lambda_{i}))^{\theta-1}F^{i-w-\theta}(z+\lambda_{i})\differential z
=(iw1θ1)0f(z)z(1F(z))θ1Fiwθ(z)dz\displaystyle=\binom{i-w-1}{\theta-1}\int_{0}^{\infty}\frac{f(z)}{z}\quantity(1-F(z))^{\theta-1}F^{i-w-\theta}(z)\differential z

and

ϕi,θ(λ;α,i,w)\displaystyle\phi_{i,\theta}(\lambda^{*};\mathcal{F}_{\alpha},\mathcal{B}_{i,w}) =(iw1θ1)λθf(z+λi)(1F(z+λi))θ1Fiwθ(z+λi)dz\displaystyle=\binom{i-w-1}{\theta-1}\int_{-\lambda_{\theta}}^{\infty}f(z+\lambda_{i})\quantity(1-F(z+\lambda_{i}))^{\theta-1}F^{i-w-\theta}(z+\lambda_{i})\differential z
=(iw1θ1)0f(z)(1F(z))θ1Fiwθ(z)dz.\displaystyle=\binom{i-w-1}{\theta-1}\int_{0}^{\infty}f(z)\quantity(1-F(z))^{\theta-1}F^{i-w-\theta}(z)\differential z.

Define

I_{i,n}\quantity(q;\mathcal{F}_{\alpha})=\int_{0}^{\infty}\frac{1}{z^{n}}e^{-q/z^{\alpha}}\differential z.

Then, by Lemma 15, we immediately obtain

Ji,θ(λ;α,i,w)ϕi,θ(λ;α,i,w)Ii,αθ+2(i+1wθ;α)Ii,αθ+1(i+1wθ;α).\frac{J_{i,\theta}(\lambda^{*};\mathcal{F}_{\alpha},\mathcal{B}_{i,w})}{\phi_{i,\theta}(\lambda^{*};\mathcal{F}_{\alpha},\mathcal{B}_{i,w})}\leq\frac{I_{i,\alpha\theta+2}\quantity(i+1-w-\theta;\mathcal{F}_{\alpha})}{I_{i,\alpha\theta+1}\quantity(i+1-w-\theta;\mathcal{F}_{\alpha})}. (54)

Similarly to the proofs of Honda et al. (2023) and Lee et al. (2024), we bound the RHS of (54) as follows. By letting $u=q/z^{\alpha}$, both $I_{i,\alpha\theta+2}(q;\mathcal{F}_{\alpha})$ and $I_{i,\alpha\theta+1}(q;\mathcal{F}_{\alpha})$ can be expressed via the Gamma function $\Gamma(k)=\int_{0}^{\infty}e^{-t}t^{k-1}\differential t$ as

Ii,αθ+2(q;α)\displaystyle I_{i,\alpha\theta+2}\quantity(q;\mathcal{F}_{\alpha}) =01zαθ+2exp(q/zα)dz\displaystyle=\int_{0}^{\infty}\frac{1}{z^{\alpha\theta+2}}\exp\quantity(-q/z^{\alpha})\differential z
=1αq0(uq)θ+1α1eudu\displaystyle=\frac{1}{\alpha q}\int_{0}^{\infty}\quantity(\frac{u}{q})^{\theta+\frac{1}{\alpha}-1}e^{-u}\differential u
=1αqθ+1α0uθ+1α1eudu\displaystyle=\frac{1}{\alpha q^{\theta+\frac{1}{\alpha}}}\int_{0}^{\infty}u^{\theta+\frac{1}{\alpha}-1}e^{-u}\differential u
=1αqθ+1αΓ(θ+1α),\displaystyle=\frac{1}{\alpha q^{\theta+\frac{1}{\alpha}}}\Gamma\quantity(\theta+\frac{1}{\alpha}),

and

Ii,αθ+1(q;α)\displaystyle I_{i,\alpha\theta+1}\quantity(q;\mathcal{F}_{\alpha}) =01zαθ+1exp(q/zα)dz\displaystyle=\int_{0}^{\infty}\frac{1}{z^{\alpha\theta+1}}\exp\quantity(-q/z^{\alpha})\differential z
=1αq0(uq)θ1eudu\displaystyle=\frac{1}{\alpha q}\int_{0}^{\infty}\quantity(\frac{u}{q})^{\theta-1}e^{-u}\differential u
=1αqθ0uθ1eudu\displaystyle=\frac{1}{\alpha q^{\theta}}\int_{0}^{\infty}u^{\theta-1}e^{-u}\differential u
=1αqθΓ(θ).\displaystyle=\frac{1}{\alpha q^{\theta}}\Gamma\quantity(\theta).

Replacing qq with i+1wθi+1-w-\theta, we have

Ii,αθ+2(i+1wθ;α)Ii,αθ+1(i+1wθ;α)=1i+1wθαΓ(θ+1α)Γ(θ).\frac{I_{i,\alpha\theta+2}\quantity(i+1-w-\theta;\mathcal{F}_{\alpha})}{I_{i,\alpha\theta+1}\quantity(i+1-w-\theta;\mathcal{F}_{\alpha})}=\frac{1}{\sqrt[\alpha]{i+1-w-\theta}}\frac{\Gamma\quantity(\theta+\frac{1}{\alpha})}{\Gamma\quantity(\theta)}. (55)
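The identity (55) can be spot-checked numerically as well; a minimal Python sketch (the values of $\alpha$, $\theta$, and $q$ are arbitrary illustrations):

import math
from scipy.integrate import quad

# Arbitrary illustrative values.
alpha, theta, q = 2.0, 3, 5.0

def integrand(z, n):
    e = math.exp(-q / z ** alpha)
    return 0.0 if e == 0.0 else e / z ** n   # guard against 0 * inf near z = 0

I = lambda n: quad(lambda z: integrand(z, n), 0, math.inf)[0]
lhs = I(alpha * theta + 2) / I(alpha * theta + 1)
rhs = q ** (-1 / alpha) * math.gamma(theta + 1 / alpha) / math.gamma(theta)
print(lhs, rhs)   # should agree up to quadrature error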

By Lemma 16 (Gautschi's inequality), we have

Γ(θ+1α)Γ(θ)(θ+1α)1α.\frac{\Gamma\quantity(\theta+\frac{1}{\alpha})}{\Gamma\quantity(\theta)}\leq\quantity(\theta+\frac{1}{\alpha})^{\frac{1}{\alpha}}.

Combining this result with (55), we obtain

Ii,αθ+2(i+1wθ;α)Ii,αθ+1(i+1wθ;α)(θ+1αi+1wθ)1α.\frac{I_{i,\alpha\theta+2}\quantity(i+1-w-\theta;\mathcal{F}_{\alpha})}{I_{i,\alpha\theta+1}\quantity(i+1-w-\theta;\mathcal{F}_{\alpha})}\leq\quantity(\frac{\theta+\frac{1}{\alpha}}{i+1-w-\theta})^{\frac{1}{\alpha}}.

Since w{0}[(mi)1],θ[(mw)(iw)]w\in\quantity{0}\cup[(m\land i)-1],\theta\in[(m-w)\land(i-w)], we have

maxw{0}[(mi)1]θ[(mw)(iw)]θ+1αi+1wθ\displaystyle\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \theta\in[(m-w)\land(i-w)]\end{subarray}}\frac{\theta+\frac{1}{\alpha}}{i+1-w-\theta} =maxw{0}[(mi)1](mi)+1αwi+1(mi)\displaystyle=\max_{w\in\quantity{0}\cup[(m\land i)-1]}\frac{(m\land i)+\frac{1}{\alpha}-w}{i+1-(m\land i)}
=(mi)+1αi+1(mi)\displaystyle=\frac{(m\land i)+\frac{1}{\alpha}}{i+1-(m\land i)}
=(mi)+1α(im+1)1.\displaystyle=\frac{(m\land i)+\frac{1}{\alpha}}{(i-m+1)\lor 1}.

Recall that σi=i\sigma_{i}=i as previously noted for notational simplicity. By Lemma 3 and (54), for any i[d]i\in[d], we have

\frac{J_{i}(\lambda;\mathcal{F}_{\alpha})}{\phi_{i}(\lambda;\mathcal{F}_{\alpha})} \leq\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \theta\in[(m-w)\land(i-w)]\end{subarray}}\frac{J_{i,\theta}(\lambda^{*};\mathcal{F}_{\alpha},\mathcal{B}_{i,w})}{\phi_{i,\theta}(\lambda^{*};\mathcal{F}_{\alpha},\mathcal{B}_{i,w})}
\leq\max_{\begin{subarray}{c}w\in\quantity{0}\cup[(m\land i)-1]\\ \theta\in[(m-w)\land(i-w)]\end{subarray}}\frac{I_{i,\alpha\theta+2}(i+1-w-\theta;\mathcal{F}_{\alpha})}{I_{i,\alpha\theta+1}(i+1-w-\theta;\mathcal{F}_{\alpha})}
((mσi)+1α(σim+1)1)1α.\displaystyle\leq\quantity(\frac{(m\land\sigma_{i})+\frac{1}{\alpha}}{(\sigma_{i}-m+1)\lor 1})^{\frac{1}{\alpha}}.

7.3 Proof of Lemma 5

Following the proofs of Honda et al. (2023) and Lee et al. (2024), we extend the statement to the combinatorial semi-bandit setting.

Proof.

Define

Ω¯={r:[argmina𝒜{a(η(L^t+(t,iwt,i1^)ei)r)}]i=1}\underline{\Omega}=\quantity{r:\quantity[\mathop{\arg\min}\limits_{a\in\mathcal{A}}\left\{a^{\top}(\eta(\hat{L}_{t}+(\ell_{t,i}\widehat{w_{t,i}^{-1}})e_{i})-r)\right\}]_{i}=1}

and

Ω¯={r:[argmina𝒜{a(η(L^t+^t)r)}]i=1}.\overline{\Omega}=\quantity{r:\quantity[\mathop{\arg\min}\limits_{a\in\mathcal{A}}\left\{a^{\top}(\eta(\hat{L}_{t}+\hat{\ell}_{t})-r)\right\}]_{i}=1}.

Then, we have

ϕi(η(L^t+(t,iwt,i1^)ei);𝒟α)=r𝒟α(Ω¯),ϕi(η(L^t+^t);𝒟α)=r𝒟α(Ω¯).\phi_{i}\quantity(\eta\quantity(\hat{L}_{t}+\quantity(\ell_{t,i}\widehat{w_{t,i}^{-1}})e_{i});\mathcal{D}_{\alpha})=\mathbb{P}_{r\sim\mathcal{D}_{\alpha}}\quantity(\underline{\Omega}),\quad\phi_{i}\quantity(\eta\quantity(\hat{L}_{t}+\hat{\ell}_{t});\mathcal{D}_{\alpha})=\mathbb{P}_{r\sim\mathcal{D}_{\alpha}}\quantity(\overline{\Omega}).

Since Ω¯Ω¯\underline{\Omega}\subset\overline{\Omega}, we immediately have

ϕi(η(L^t+(t,iwt,i1^)ei);𝒟α)ϕi(η(L^t+^t);𝒟α),\phi_{i}\quantity(\eta\quantity(\hat{L}_{t}+\quantity(\ell_{t,i}\widehat{w_{t,i}^{-1}})e_{i});\mathcal{D}_{\alpha})\leq\phi_{i}\quantity(\eta\quantity(\hat{L}_{t}+\hat{\ell}_{t});\mathcal{D}_{\alpha}),

with which we have

ϕi(ηL^t;𝒟α)ϕi(η(L^t+^t);𝒟α)\displaystyle\phi_{i}\quantity(\eta\hat{L}_{t};\mathcal{D}_{\alpha})-\phi_{i}\quantity(\eta\quantity(\hat{L}_{t}+\hat{\ell}_{t});\mathcal{D}_{\alpha}) ϕi(ηL^t;𝒟α)ϕi(η(L^t+(t,iwt,i1^)ei);𝒟α)\displaystyle\leq\phi_{i}\quantity(\eta\hat{L}_{t};\mathcal{D}_{\alpha})-\phi_{i}\quantity(\eta\quantity(\hat{L}_{t}+\quantity(\ell_{t,i}\widehat{w_{t,i}^{-1}})e_{i});\mathcal{D}_{\alpha})
=0ηt,iwt,i1^ϕi(ηL^t+xei;𝒟α)dx.\displaystyle=\int_{0}^{\eta\ell_{t,i}\widehat{w_{t,i}^{-1}}}-\phi_{i}^{\prime}\quantity(\eta\hat{L}_{t}+xe_{i};\mathcal{D}_{\alpha})\differential x. (56)

Recalling that ϕi(λ;𝒟α)\phi_{i}(\lambda;\mathcal{D}_{\alpha}) is expressed as

ϕi(λ;𝒟α)=θ=1mνλ~θf(z+λi)𝒗𝒮i,θ(j:vj=1(1F(z+λj))j:vj=0,jiF(z+λj))dz,\phi_{i}(\lambda;\mathcal{D}_{\alpha})=\sum_{\theta=1}^{m}\int_{\nu-\widetilde{\lambda}_{\theta}}^{\infty}f(z+\lambda_{i})\sum_{\bm{v}\in\mathcal{S}_{i,\theta}}\quantity(\prod_{j:v_{j}=1}\quantity(1-F(z+\lambda_{j}))\prod_{j:v_{j}=0,j\neq i}F(z+\lambda_{j}))\differential z,

we see that ϕi(λ;𝒟α)=ϕiλi(λ;𝒟α)\phi^{\prime}_{i}(\lambda;\mathcal{D}_{\alpha})=\frac{\partial\phi_{i}}{\partial\lambda_{i}}(\lambda;\mathcal{D}_{\alpha}) is expressed as

ϕi(λ;𝒟α)=θ=1mνλ~θf(z+λi)𝒗𝒮i,θ(j:vj=1(1F(z+λj))j:vj=0,jiF(z+λj))dz.\phi^{\prime}_{i}(\lambda;\mathcal{D}_{\alpha})=\\ \sum_{\theta=1}^{m}\int_{\nu-\widetilde{\lambda}_{\theta}}^{\infty}f^{\prime}(z+\lambda_{i})\sum_{\bm{v}\in\mathcal{S}_{i,\theta}}\quantity(\prod_{j:v_{j}=1}\quantity(1-F(z+\lambda_{j}))\prod_{j:v_{j}=0,j\neq i}F(z+\lambda_{j}))\differential z.

Now we divide the proof into two cases.

Fréchet distribution

When 𝒟α=α\mathcal{D}_{\alpha}=\mathcal{F}_{\alpha}, since for x>0x>0,

f(x)=(α(α+1)xα+2+α2x2(α+1))e1/xα,f^{\prime}(x)=\quantity(-\frac{\alpha(\alpha+1)}{x^{\alpha+2}}+\frac{\alpha^{2}}{x^{2(\alpha+1)}})e^{-1/x^{\alpha}},

we have

f(x)=(α(α+1)xα+2α2x2(α+1))e1/xαα(α+1)xα+2e1/xα=α+1xf(x).-f^{\prime}(x)=\quantity(\frac{\alpha(\alpha+1)}{x^{\alpha+2}}-\frac{\alpha^{2}}{x^{2(\alpha+1)}})e^{-1/x^{\alpha}}\leq\frac{\alpha(\alpha+1)}{x^{\alpha+2}}e^{-1/x^{\alpha}}=\frac{\alpha+1}{x}f(x).

Therefore, by (56) we have

ϕi(ηL^t;α)ϕi(η(L^t+^t);α)\displaystyle\phi_{i}\quantity(\eta\hat{L}_{t};\mathcal{F}_{\alpha})-\phi_{i}\quantity(\eta\quantity(\hat{L}_{t}+\hat{\ell}_{t});\mathcal{F}_{\alpha}) (α+1)0ηt,iwt,i1^Ji(ηL^t+xei;α)dx\displaystyle\leq(\alpha+1)\int_{0}^{\eta\ell_{t,i}\widehat{w_{t,i}^{-1}}}J_{i}\quantity(\eta\hat{L}_{t}+xe_{i};\mathcal{F}_{\alpha})\differential x
(α+1)0ηt,iwt,i1^Ji(ηL^t;α)dx\displaystyle\leq(\alpha+1)\int_{0}^{\eta\ell_{t,i}\widehat{w_{t,i}^{-1}}}J_{i}\quantity(\eta\hat{L}_{t};\mathcal{F}_{\alpha})\differential x (57)
=(α+1)ηt,iJi(ηL^t;α)wt,i1^,\displaystyle=(\alpha+1)\eta\ell_{t,i}J_{i}\quantity(\eta\hat{L}_{t};\mathcal{F}_{\alpha})\widehat{w_{t,i}^{-1}},

where (57) follows from the monotonicity of $J_{i}(\lambda;\mathcal{F}_{\alpha})$ with respect to $\lambda_{i}$.

Pareto distribution

When 𝒟α=𝒫α\mathcal{D}_{\alpha}=\mathcal{P}_{\alpha}, since for x>1x>1,

f(x)=α(α+1)x(α+2),f^{\prime}(x)=-\alpha(\alpha+1)x^{-(\alpha+2)},

by (56) we have

ϕi(ηL^t;𝒫α)ϕi(η(L^t+^t);𝒫α)\displaystyle\phi_{i}\quantity(\eta\hat{L}_{t};\mathcal{P}_{\alpha})-\phi_{i}\quantity(\eta\quantity(\hat{L}_{t}+\hat{\ell}_{t});\mathcal{P}_{\alpha}) =(α+1)0ηt,iwt,i1^Ji(ηL^t+xei;𝒫α)dx\displaystyle=(\alpha+1)\int_{0}^{\eta\ell_{t,i}\widehat{w_{t,i}^{-1}}}J_{i}\quantity(\eta\hat{L}_{t}+xe_{i};\mathcal{P}_{\alpha})\differential x
(α+1)0ηt,iwt,i1^Ji(ηL^t;𝒫α)dx\displaystyle\leq(\alpha+1)\int_{0}^{\eta\ell_{t,i}\widehat{w_{t,i}^{-1}}}J_{i}\quantity(\eta\hat{L}_{t};\mathcal{P}_{\alpha})\differential x (58)
=(α+1)ηt,iJi(ηL^t;𝒫α)wt,i1^,\displaystyle=(\alpha+1)\eta\ell_{t,i}J_{i}\quantity(\eta\hat{L}_{t};\mathcal{P}_{\alpha})\widehat{w_{t,i}^{-1}},

where (58) follows from the monotonicity of $J_{i}(\lambda;\mathcal{P}_{\alpha})$ with respect to $\lambda_{i}$.

Here, note that given $\hat{L}_{t}$ and $a_{t,i}$, the estimator $\widehat{w_{t,i}^{-1}}$ follows the geometric distribution with expectation $1/w_{t,i}$, which satisfies

𝔼[wt,i1^2|L^t,at,i]=2wt,i21wt,i2wt,i2.\mathbb{E}\quantity[\widehat{{w_{t,i}^{-1}}}^{2}\middle|\hat{L}_{t},a_{t,i}]=\frac{2}{w_{t,i}^{2}}-\frac{1}{w_{t,i}}\leq\frac{2}{w_{t,i}^{2}}. (59)

Since $\hat{\ell}_{t,i}=\ell_{t,i}\widehat{w_{t,i}^{-1}}$ when $a_{t,i}=1$ and $\hat{\ell}_{t,i}=0$ otherwise, for $\mathcal{D}_{\alpha}\in\quantity{\mathcal{F}_{\alpha},\mathcal{P}_{\alpha}}$ we obtain

𝔼[^t,i(ϕi(ηL^t;𝒟α)ϕi(η(L^t+^t);𝒟α))|L^t]\displaystyle\mathbb{E}\quantity[\hat{\ell}_{t,i}\quantity(\phi_{i}\quantity(\eta\hat{L}_{t};\mathcal{D}_{\alpha})-\phi_{i}\quantity(\eta\quantity(\hat{L}_{t}+\hat{\ell}_{t});\mathcal{D}_{\alpha}))\middle|\hat{L}_{t}]
\displaystyle\leq 𝔼[𝟙[at,i=1]t,iwt,i1^(α+1)ηt,iJi(ηL^t;𝒟α)wt,i1^|L^t]\displaystyle\mathbb{E}\quantity[\mathbbm{1}\quantity[a_{t,i}=1]\ell_{t,i}\widehat{{w_{t,i}^{-1}}}\cdot(\alpha+1)\eta\ell_{t,i}J_{i}\quantity(\eta\hat{L}_{t};\mathcal{D}_{\alpha})\widehat{w_{t,i}^{-1}}\middle|\hat{L}_{t}]
\displaystyle\leq 2(α+1)η𝔼[wt,it,i2Ji(ηL^t;𝒟α)wt,i2|L^t]\displaystyle 2(\alpha+1)\eta\mathbb{E}\quantity[w_{t,i}\frac{\ell^{2}_{t,i}J_{i}\quantity(\eta\hat{L}_{t};\mathcal{D}_{\alpha})}{w^{2}_{t,i}}\middle|\hat{L}_{t}]
\displaystyle\leq 2(α+1)η𝔼[Ji(ηL^t;𝒟α)ϕi(ηL^t;𝒟α)|L^t]\displaystyle 2(\alpha+1)\eta\mathbb{E}\quantity[\frac{J_{i}\quantity(\eta\hat{L}_{t};\mathcal{D}_{\alpha})}{\phi_{i}\quantity(\eta\hat{L}_{t};\mathcal{D}_{\alpha})}\middle|\hat{L}_{t}]
\leq\begin{cases}2(\alpha+1)\eta\quantity(\frac{(\sigma_{i}\land m)+\frac{1}{\alpha}}{(\sigma_{i}-m+1)\lor 1})^{\frac{1}{\alpha}},&\text{if }\mathcal{D}_{\alpha}=\mathcal{F}_{\alpha},\\ 4\alpha\eta\quantity(\frac{(\sigma_{i}\land m)+\frac{1}{\alpha}}{\sigma_{i}})^{\frac{1}{\alpha}},&\text{if }\mathcal{D}_{\alpha}=\mathcal{P}_{\alpha}.\end{cases}\quad\text{(by Lemma 4)}

∎
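The bound (59) is the standard second-moment identity for a geometric random variable on $\{1,2,\dots\}$; a minimal Monte Carlo sketch in Python (the success probability is an arbitrary illustration):

import numpy as np

# Monte Carlo check of (59): for X geometric on {1, 2, ...} with success
# probability w, E[X^2] = 2/w^2 - 1/w <= 2/w^2.  The value w = 0.3 is arbitrary.
rng = np.random.default_rng(0)
w = 0.3
x = rng.geometric(w, size=10**6).astype(float)
print(np.mean(x**2), 2 / w**2 - 1 / w, 2 / w**2)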

7.4 Proof of Lemma 6

By Lemma 5, we have the following result.

7.4.1 Fréchet Distribution

When 𝒟α=α\mathcal{D}_{\alpha}=\mathcal{F}_{\alpha}, we have

𝔼[^t(ϕ(ηL^t;α)ϕ(η(L^t+^t);α))|L^t]\displaystyle\mathbb{E}\quantity[\hat{\ell}_{t}\quantity(\phi\quantity(\eta\hat{L}_{t};\mathcal{F}_{\alpha})-\phi\quantity(\eta\quantity(\hat{L}_{t}+\hat{\ell}_{t});\mathcal{F}_{\alpha}))\middle|\hat{L}_{t}]
\leq\sum_{i\in[d]}2(\alpha+1)\eta\quantity(\frac{(\sigma_{i}\land m)+\frac{1}{\alpha}}{(\sigma_{i}-m+1)\lor 1})^{\frac{1}{\alpha}}\quad\text{(by Lemma 5)}
\displaystyle\leq 2(α+1)η(m+1α)1α(m+i=1dm+11iα)\displaystyle 2(\alpha+1)\eta\quantity(m+\frac{1}{\alpha})^{\frac{1}{\alpha}}\quantity(m+\sum_{i=1}^{d-m+1}\frac{1}{\sqrt[\alpha]{i}})
\displaystyle\leq 2(α+1)η(m+1α)1α(m+1+1dm+1x1/αdx)\displaystyle 2(\alpha+1)\eta\quantity(m+\frac{1}{\alpha})^{\frac{1}{\alpha}}\quantity(m+1+\int_{1}^{d-m+1}x^{-1/\alpha}\differential x)
=\displaystyle= 2(α+1)η(m+1α)1α(m+α(dm+1)11/α1α1)\displaystyle 2(\alpha+1)\eta\quantity(m+\frac{1}{\alpha})^{\frac{1}{\alpha}}\quantity(m+\frac{\alpha(d-m+1)^{1-1/\alpha}-1}{\alpha-1})
\displaystyle\leq 2(α+1)η(m+1α)1α(m+αα1(dm+1)11/α).\displaystyle 2(\alpha+1)\eta\quantity(m+\frac{1}{\alpha})^{\frac{1}{\alpha}}\quantity(m+\frac{\alpha}{\alpha-1}(d-m+1)^{1-1/\alpha}).

7.4.2 Pareto Distribution

When 𝒟α=𝒫α\mathcal{D}_{\alpha}=\mathcal{P}_{\alpha}, we have

\mathbb{E}\quantity[\hat{\ell}_{t}\quantity(\phi\quantity(\eta\hat{L}_{t};\mathcal{P}_{\alpha})-\phi\quantity(\eta\quantity(\hat{L}_{t}+\hat{\ell}_{t});\mathcal{P}_{\alpha}))\middle|\hat{L}_{t}] \leq\sum_{i\in[d]}4\alpha\eta\quantity(\frac{(\sigma_{i}\land m)+\frac{1}{\alpha}}{\sigma_{i}})^{\frac{1}{\alpha}}\quad\text{(by Lemma 5)}
\leq\sum_{i\in[d]}4\alpha\eta\quantity(\frac{m+\frac{1}{\alpha}}{\sigma_{i}})^{\frac{1}{\alpha}}
4αη(m+1α)1α(1+1dx1/αdx)\displaystyle\leq 4\alpha\eta\quantity(m+\frac{1}{\alpha})^{\frac{1}{\alpha}}\quantity(1+\int_{1}^{d}x^{-1/\alpha}\differential x)
=4αη(m+1α)1ααd11/α1α1\displaystyle=4\alpha\eta\quantity(m+\frac{1}{\alpha})^{\frac{1}{\alpha}}\frac{\alpha d^{1-1/\alpha}-1}{\alpha-1}
4α2α1η(m+1α)1αd11/α.\displaystyle\leq\frac{4\alpha^{2}}{\alpha-1}\eta\quantity(m+\frac{1}{\alpha})^{\frac{1}{\alpha}}d^{1-1/\alpha}.
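Both computations above use the elementary bound $\sum_{i=1}^{n}i^{-1/\alpha}\leq 1+\int_{1}^{n}x^{-1/\alpha}\differential x=\frac{\alpha n^{1-1/\alpha}-1}{\alpha-1}$, which can be spot-checked in Python (the values of $\alpha$ and $n$ are arbitrary):

# Arbitrary illustrative values.
alpha, n = 2.0, 1000
s = sum(i ** (-1 / alpha) for i in range(1, n + 1))
bound = (alpha * n ** (1 - 1 / alpha) - 1) / (alpha - 1)
print(s, bound, s <= bound)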

8 Technical Lemmas

Lemma 16 (Gautschi’s inequality).

For x>0x>0 and s(0,1)s\in(0,1),

x1s<Γ(x+1)Γ(x+s)<(x+1)1s.x^{1-s}<\frac{\Gamma(x+1)}{\Gamma(x+s)}<(x+1)^{1-s}.
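A quick grid check of Lemma 16 in Python (the grid is arbitrary):

import math

# Spot check of Lemma 16 on an arbitrary grid of x and s.
for x in [0.5, 1.0, 3.7, 10.0]:
    for s in [0.25, 0.5, 0.9]:
        r = math.gamma(x + 1) / math.gamma(x + s)
        assert x ** (1 - s) < r < (x + 1) ** (1 - s)
print("Gautschi's inequality holds on the grid.")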
Lemma 17.

(Malik, 1966, Eq. (3.7)) Let $X_{k,n}$ be the $k$-th order statistic of $n$ i.i.d. RVs from $\mathcal{P}_{\alpha}$ for $k\in[n]$, where $\alpha>1$. Then, we have

𝔼[Xk,n]=Γ(n+1)Γ(nk1α+1)Γ(nk+1)Γ(n1α+1).\mathbb{E}[X_{k,n}]=\frac{\Gamma\quantity(n+1)\Gamma\quantity(n-k-\frac{1}{\alpha}+1)}{\Gamma\quantity(n-k+1)\Gamma\quantity(n-\frac{1}{\alpha}+1)}.
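Under the convention that $X_{k,n}$ denotes the $k$-th smallest of the $n$ samples, this formula can be verified by Monte Carlo simulation with inverse-CDF sampling; a minimal Python sketch (parameters are arbitrary illustrations):

import math
import numpy as np

# Arbitrary illustrative parameters.
alpha, n, k = 2.0, 5, 3
rng = np.random.default_rng(0)
samples = rng.random((10**6, n)) ** (-1.0 / alpha)   # U^(-1/alpha) follows Pareto(alpha)
emp = np.sort(samples, axis=1)[:, k - 1].mean()      # k-th smallest of each row
g = math.gamma
exact = g(n + 1) * g(n - k - 1 / alpha + 1) / (g(n - k + 1) * g(n - 1 / alpha + 1))
print(emp, exact)   # should agree up to Monte Carlo error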
Lemma 18.

Let $F(x)$ and $G(x)$ be CDFs of some random variables such that $G(x)\geq F(x)$ for all $x\in\mathbb{R}$. Let $(X_{1},X_{2},\dots,X_{n})$ (resp. $(Y_{1},Y_{2},\dots,Y_{n})$) be RVs i.i.d. from $F$ (resp. $G$), and let $X_{k,n}$ (resp. $Y_{k,n}$) be the $k$-th order statistic for any $k\in[n]$. Then, $\mathbb{E}[Y_{k,n}]\leq\mathbb{E}[X_{k,n}]$ holds.

Proof.

Let $U$ be a uniform random variable over $[0,1]$ and let $X=F^{-1}(U)$ and $Y=G^{-1}(U)$, where $F^{-1}$ and $G^{-1}$ are the left-continuous inverses of $F$ and $G$, respectively. Then, $Y\leq X$ holds almost surely and the marginal distributions satisfy $X\sim F$ and $Y\sim G$. Therefore, taking $(X_{1},Y_{1}),\dots,(X_{n},Y_{n})$ as i.i.d. copies of this $(X,Y)$, we see that $Y_{k,n}\leq X_{k,n}$ holds almost surely, which proves the lemma. ∎

Lemma 19.

Let $X_{k,n}$ (resp. $Y_{k,n}$) be the $k$-th order statistic of $n$ i.i.d. RVs from $\mathcal{P}_{\alpha}$ (resp. $\mathcal{F}_{\alpha}$) for $k\in[n]$. Then, $\mathbb{E}[Y_{k,n}]\leq\mathbb{E}[X_{k,n}]+1$ holds.

Proof.

Letting $F(x)$ and $G(x)$ be the CDFs of $\mathcal{P}_{\alpha}$ and $\mathcal{F}_{\alpha}$, respectively, note that $F(x-1)$ is the CDF of $X+1$ for $X\sim\mathcal{P}_{\alpha}$. For $x<2$ we have $F(x-1)=0\leq G(x)$, while for $x\geq 2$ we have

G(x)=e^{-1/x^{\alpha}}\geq 1-\frac{1}{x^{\alpha}}\geq 1-\frac{1}{(x-1)^{\alpha}}=F(x-1),

where the first inequality uses $e^{-t}\geq 1-t$ and the second uses $x^{\alpha}\geq(x-1)^{\alpha}$. Hence $G(x)\geq F(x-1)$ for all $x$, and it holds from Lemma 18 that $\mathbb{E}[Y_{k,n}]\leq\mathbb{E}[X_{k,n}+1]=\mathbb{E}[X_{k,n}]+1$. ∎
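The domination $G(x)\geq F(x-1)$ can also be spot-checked numerically; a minimal Python sketch for the arbitrary choice $\alpha=2$:

import math

# Arbitrary illustrative shape; grid check of G(x) >= F(x - 1).
alpha = 2.0
G = lambda x: math.exp(-1.0 / x**alpha) if x > 0 else 0.0        # Frechet CDF
F_shift = lambda x: 1 - (x - 1) ** (-alpha) if x >= 2 else 0.0   # CDF of X + 1, X ~ Pareto(alpha)
print(all(G(0.1 * j) >= F_shift(0.1 * j) for j in range(1, 300)))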

9 Issues in Proof of Claimed Extension to the Monotone Decreasing Case

In this section, we use the same notation as Zhan et al. (2025) and reconstruct the missing arguments in their Lemma 4.1. Their claim is as follows.

Claim 1 (Lemma 4.1 in Zhan et al. (2025)).

For any {1,2,,d}\mathcal{I}\subseteq\quantity{1,2,\cdots,d}, ii\notin\mathcal{I}, λd\lambda\in\mathbb{R}^{d} such that λi0\lambda_{i}\geq 0 and any N3N\geq 3, let

Ji,N,(λ)01(x+λi)Nq(1F(x+λq))qF(x+λq)dx.J_{i,N,\mathcal{I}}(\lambda)\coloneqq\int_{0}^{\infty}\frac{1}{(x+\lambda_{i})^{N}}\prod_{q\in\mathcal{I}}\quantity(1-F(x+\lambda_{q}))\prod_{q\notin\mathcal{I}}F(x+\lambda_{q})\differential x.

Then, for all k>0k>0, Ji,N+k,(λ)Ji,N,(λ)\frac{J_{i,N+k,\mathcal{I}}(\lambda)}{J_{i,N,\mathcal{I}}(\lambda)} is increasing in λq0\lambda_{q}\geq 0 for qq\notin\mathcal{I} and qiq\neq i, while decreasing in λq0\lambda_{q}\geq 0 for qq\in\mathcal{I}.

In their paper, they consider the case where $F(x)$ is the cumulative distribution function of the Fréchet distribution with shape $2$, which is expressed as

F(x)=e^{-1/x^{2}},\quad x\geq 0,

and thus we also adopt this setting in this section. For this claim, they only gave a proof of the monotonic increase in the former case, writing that the monotonic decrease in the latter case “can be shown by the same argument”. Nevertheless, the monotonic decrease does not follow from the same argument, and it is highly likely that the statement itself does not hold, as we demonstrate below.

Now we follow the line of the proof of the monotonically increasing case in Zhan et al. (2025), attempting to prove that for all $k>0$, $\frac{J_{i,N+k,\mathcal{I}}(\lambda)}{J_{i,N,\mathcal{I}}(\lambda)}$ is monotonically decreasing in $\lambda_{q}\geq 0$ for $q\in\mathcal{I}$. For $q_{0}\in\mathcal{I}$, denote

Ji,N,q0(λ)\displaystyle J^{q_{0}}_{i,N,\mathcal{I}}(\lambda) =12λq0Ji,N,(λ)\displaystyle=-\frac{1}{2}\frac{\partial}{\partial\lambda_{q_{0}}}J_{i,N,\mathcal{I}}(\lambda)
=0e1/(x+λq0)2(x+λi)N(x+λq0)3q{q0}(1F(x+λq))qF(x+λq)dx.\displaystyle=\int_{0}^{\infty}\frac{e^{-1/(x+\lambda_{q_{0}})^{2}}}{(x+\lambda_{i})^{N}(x+\lambda_{q_{0}})^{3}}\prod_{q\in\mathcal{I}\setminus\quantity{q_{0}}}\quantity(1-F(x+\lambda_{q}))\prod_{q\notin\mathcal{I}}F(x+\lambda_{q})\differential x.

Hence,

λq0Ji,N+k,(λ)Ji,N,(λ)=2Ji,N+k,q0(λ)Ji,N,(λ)Ji,N+k,(λ)Ji,N,q0(λ)Ji,N,(λ)2.\frac{\partial}{\partial\lambda_{q_{0}}}\frac{J_{i,N+k,\mathcal{I}}(\lambda)}{J_{i,N,\mathcal{I}}(\lambda)}=-2\cdot\frac{J^{q_{0}}_{i,N+k,\mathcal{I}}(\lambda)J_{i,N,\mathcal{I}}(\lambda)-J_{i,N+k,\mathcal{I}}(\lambda)J^{q_{0}}_{i,N,\mathcal{I}}(\lambda)}{J_{i,N,\mathcal{I}}(\lambda)^{2}}. (60)

Letting Q(x)=1(x+λi)Nq{q0}(1F(x+λq))qF(x+λq)Q(x)=\frac{1}{(x+\lambda_{i})^{N}}\prod_{q\in\mathcal{I}\setminus\quantity{q_{0}}}\quantity(1-F(x+\lambda_{q}))\prod_{q\notin\mathcal{I}}F(x+\lambda_{q}), we have

Ji,N+k,q0(λ)Ji,N,(λ)\displaystyle J^{q_{0}}_{i,N+k,\mathcal{I}}(\lambda)J_{i,N,\mathcal{I}}(\lambda)
=\displaystyle= x,y0e1/(x+λq0)2(x+λi)k(x+λq0)3(1e1/(y+λq0)2)Q(x)Q(y)dxdy\displaystyle\iint_{x,y\geq 0}\frac{e^{-1/(x+\lambda_{q_{0}})^{2}}}{(x+\lambda_{i})^{k}(x+\lambda_{q_{0}})^{3}}\quantity(1-e^{-1/(y+\lambda_{q_{0}})^{2}})Q(x)Q(y)\differential x\differential y
=\displaystyle= 12x,y0Q(x)Q(y)[e1/(x+λq0)2(x+λi)k(x+λq0)3(1e1/(y+λq0)2)+e1/(y+λq0)2(y+λi)k(y+λq0)3(1e1/(x+λq0)2)]dxdy.\displaystyle\frac{1}{2}\iint_{x,y\geq 0}Q(x)Q(y)\quantity[\frac{e^{-1/(x+\lambda_{q_{0}})^{2}}}{(x+\lambda_{i})^{k}(x+\lambda_{q_{0}})^{3}}\quantity(1-e^{-1/(y+\lambda_{q_{0}})^{2}})+\frac{e^{-1/(y+\lambda_{q_{0}})^{2}}}{(y+\lambda_{i})^{k}(y+\lambda_{q_{0}})^{3}}\quantity(1-e^{-1/(x+\lambda_{q_{0}})^{2}})]\differential x\differential y.

Here, it is worth noting that $\frac{e^{-1/(x+\lambda_{q_{0}})^{2}}}{(x+\lambda_{i})^{k}(x+\lambda_{q_{0}})^{3}}$ and $\quantity(1-e^{-1/(y+\lambda_{q_{0}})^{2}})$ share no common factor, and thus no factor can be absorbed into $Q(x)$.

Similarly, we have

Ji,N+k,(λ)Ji,N,q0(λ)\displaystyle J_{i,N+k,\mathcal{I}}(\lambda)J^{q_{0}}_{i,N,\mathcal{I}}(\lambda)
=\displaystyle= x,y01e1/(x+λq0)2(x+λi)ke1/(y+λq0)2(y+λq0)3Q(x)Q(y)dxdy\displaystyle\iint_{x,y\geq 0}\frac{1-e^{-1/(x+\lambda_{q_{0}})^{2}}}{(x+\lambda_{i})^{k}}\frac{e^{-1/(y+\lambda_{q_{0}})^{2}}}{(y+\lambda_{q_{0}})^{3}}Q(x)Q(y)\differential x\differential y
=\displaystyle= 12x,y0Q(x)Q(y)[1e1/(x+λq0)2(x+λi)ke1/(y+λq0)2(y+λq0)3+1e1/(y+λq0)2(y+λi)ke1/(x+λq0)2(x+λq0)3]dxdy.\displaystyle\frac{1}{2}\iint_{x,y\geq 0}Q(x)Q(y)\quantity[\frac{1-e^{-1/(x+\lambda_{q_{0}})^{2}}}{(x+\lambda_{i})^{k}}\frac{e^{-1/(y+\lambda_{q_{0}})^{2}}}{(y+\lambda_{q_{0}})^{3}}+\frac{1-e^{-1/(y+\lambda_{q_{0}})^{2}}}{(y+\lambda_{i})^{k}}\frac{e^{-1/(x+\lambda_{q_{0}})^{2}}}{(x+\lambda_{q_{0}})^{3}}]\differential x\differential y.

Now we substitute the above two expressions into (60).

By an elementary calculation, we have

e1/(x+λq0)2(x+λi)k(x+λq0)3(1e1/(y+λq0)2)+e1/(y+λq0)2(y+λi)k(y+λq0)3(1e1/(x+λq0)2)\displaystyle\frac{e^{-1/(x+\lambda_{q_{0}})^{2}}}{(x+\lambda_{i})^{k}(x+\lambda_{q_{0}})^{3}}\quantity(1-e^{-1/(y+\lambda_{q_{0}})^{2}})+\frac{e^{-1/(y+\lambda_{q_{0}})^{2}}}{(y+\lambda_{i})^{k}(y+\lambda_{q_{0}})^{3}}\quantity(1-e^{-1/(x+\lambda_{q_{0}})^{2}})
\displaystyle- 1e1/(x+λq0)2(x+λi)ke1/(y+λq0)2(y+λq0)31e1/(y+λq0)2(y+λi)ke1/(x+λq0)2(x+λq0)3\displaystyle\frac{1-e^{-1/(x+\lambda_{q_{0}})^{2}}}{(x+\lambda_{i})^{k}}\frac{e^{-1/(y+\lambda_{q_{0}})^{2}}}{(y+\lambda_{q_{0}})^{3}}-\frac{1-e^{-1/(y+\lambda_{q_{0}})^{2}}}{(y+\lambda_{i})^{k}}\frac{e^{-1/(x+\lambda_{q_{0}})^{2}}}{(x+\lambda_{q_{0}})^{3}}
=\displaystyle= (y+λi)k(x+λi)k(x+λi)k(y+λi)k(e1/(x+λq0)2(1e1/(y+λq0)2)(x+λq0)3e1/(y+λq0)2(1e1/(x+λq0)2)(y+λq0)3)\displaystyle\frac{(y+\lambda_{i})^{k}-(x+\lambda_{i})^{k}}{(x+\lambda_{i})^{k}(y+\lambda_{i})^{k}}\quantity(\frac{e^{-1/(x+\lambda_{q_{0}})^{2}}\quantity(1-e^{-1/(y+\lambda_{q_{0}})^{2}})}{(x+\lambda_{q_{0}})^{3}}-\frac{e^{-1/(y+\lambda_{q_{0}})^{2}}\quantity(1-e^{-1/(x+\lambda_{q_{0}})^{2}})}{(y+\lambda_{q_{0}})^{3}})
=\displaystyle= ((y+λi)k(x+λi)k)(1e1/(y+λq0)2)(1e1/(x+λq0)2)(x+λi)k(y+λi)k\displaystyle\frac{\quantity((y+\lambda_{i})^{k}-(x+\lambda_{i})^{k})\quantity(1-e^{-1/(y+\lambda_{q_{0}})^{2}})\quantity(1-e^{-1/(x+\lambda_{q_{0}})^{2}})}{(x+\lambda_{i})^{k}(y+\lambda_{i})^{k}}
\displaystyle\cdot (e1/(x+λq0)2(x+λq0)3(1e1/(x+λq0)2)e1/(y+λq0)2(y+λq0)3(1e1/(y+λq0)2)).\displaystyle\quantity(\frac{e^{-1/(x+\lambda_{q_{0}})^{2}}}{(x+\lambda_{q_{0}})^{3}\quantity(1-e^{-1/(x+\lambda_{q_{0}})^{2}})}-\frac{e^{-1/(y+\lambda_{q_{0}})^{2}}}{(y+\lambda_{q_{0}})^{3}\quantity(1-e^{-1/(y+\lambda_{q_{0}})^{2}})}). (61)

Here, for (61), one can see that the first factor is negative when $x\geq y$ and positive when $x\leq y$. Define

h(x)=e1/x2x3(1e1/x2).h(x)=\frac{e^{-1/x^{2}}}{x^{3}\quantity(1-e^{-1/x^{2}})}.

If $h(x)$ could be shown to be monotonically decreasing, then (60) would be negative, that is, the claim would hold. However, the derivative of $h(x)$ is given as

h(x)=e1/x2(3x2(e1/x21)+2)x6(1e1/x2)2,h^{\prime}(x)=\frac{e^{-1/x^{2}}\quantity(3x^{2}\quantity(e^{-1/x^{2}}-1)+2)}{x^{6}\quantity(1-e^{-1/x^{2}})^{2}},

where one can see that h(x)h^{\prime}(x) is not always negative, and thus h(x)h(x) is not monotonic in x[0,)x\in[0,\infty).
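This non-monotonicity is easy to confirm numerically; for instance, the following Python snippet shows that $h$ first increases and then decreases on $(0,\infty)$:

import math

h = lambda x: math.exp(-1 / x**2) / (x**3 * (1 - math.exp(-1 / x**2)))
for x in [0.3, 0.6, 0.9, 1.2, 2.0, 4.0]:
    print(x, h(x))   # h rises to a maximum near x = 1 and then decays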

Figure 1: The ratio $J_{i,4,\mathcal{I}}(\lambda)/J_{i,3,\mathcal{I}}(\lambda)$ as a function of $\lambda_{q_{0}}$ for $q_{0}\in\mathcal{I}\setminus\quantity{i}$.

By the analysis above, we can see that the monotonically decreasing case cannot be proved by the same argument as the increasing case.

Numerical Simulation

We now turn to a numerical simulation implemented in Python to illustrate this failure. As a counterexample, it suffices to exhibit one special case: $\quantity|\mathcal{I}|=5$, $q_{0}\in\mathcal{I}$, and $\lambda_{q}=\lambda_{i}=0.5$ for all $q\neq q_{0}$. Then, we consider the expression

Ji,4,(λq0,λi)Ji,3,(λq0,λi)=01(x+λi)4(1F(x+λq0))(1F(x+λi))4F(x+λi)2dx01(x+λi)3(1F(x+λq0))(1F(x+λi))4F(x+λi)2dx.\frac{J_{i,4,\mathcal{I}}(\lambda_{q_{0}},\lambda_{i})}{J_{i,3,\mathcal{I}}(\lambda_{q_{0}},\lambda_{i})}=\frac{\int_{0}^{\infty}\frac{1}{(x+\lambda_{i})^{4}}\quantity(1-F(x+\lambda_{q_{0}}))\quantity(1-F(x+\lambda_{i}))^{4}F(x+\lambda_{i})^{2}\differential x}{\int_{0}^{\infty}\frac{1}{(x+\lambda_{i})^{3}}\quantity(1-F(x+\lambda_{q_{0}}))\quantity(1-F(x+\lambda_{i}))^{4}F(x+\lambda_{i})^{2}\differential x}.

Here, we set $N=3$ and $k=1$, which are the parameters used in the subsequent analysis in Zhan et al. (2025). Treating this expression as a function of $\lambda_{q_{0}}$, we can observe from Figure 1 that the function is not monotonic in $\lambda_{q_{0}}$; in particular, $\frac{J_{i,4,\mathcal{I}}(\lambda_{q_{0}},\lambda_{i})}{J_{i,3,\mathcal{I}}(\lambda_{q_{0}},\lambda_{i})}\leq\frac{J_{i,4,\mathcal{I}}(\lambda_{i},\lambda_{i})}{J_{i,3,\mathcal{I}}(\lambda_{i},\lambda_{i})}$ fails for some $\lambda_{q_{0}}>\lambda_{i}$.
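A minimal Python sketch of this computation is given below (it uses standard SciPy quadrature and is a reconstruction for illustration, not the exact script behind Figure 1):

import math
import numpy as np
from scipy.integrate import quad

F = lambda x: math.exp(-1.0 / x**2) if x > 0 else 0.0   # Frechet CDF with shape 2
lam_i = 0.5

def J(N, lam_q0):
    # Integrand of J_{i,N,I} in the special case above: one factor for q0,
    # four factors for the remaining elements of I, two factors outside I.
    integrand = lambda x: ((x + lam_i) ** (-N)
                           * (1 - F(x + lam_q0))
                           * (1 - F(x + lam_i)) ** 4
                           * F(x + lam_i) ** 2)
    return quad(integrand, 0, np.inf)[0]

for lam_q0 in np.linspace(0.0, 5.0, 11):
    print(f"{lam_q0:4.1f}  {J(4, lam_q0) / J(3, lam_q0):.6f}")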

From these arguments, it is highly likely that Lemma 4.1 in Zhan et al. (2025) does not hold, and at the very least the current proof is incomplete. On the other hand, our analysis is constructed so that it does not require the monotonicity of $\frac{J_{i,N+k,\mathcal{I}}(\lambda)}{J_{i,N,\mathcal{I}}(\lambda)}$, which is the key to extending the results for the MAB to the combinatorial semi-bandit setting.

Acknowledgements

We thank Dr. Jongyeong Lee for pointing out the error in the proof of Lemma 10 in the previous version.

References

  • Abernethy et al. (2015) Jacob D Abernethy, Chansoo Lee, and Ambuj Tewari. Fighting bandits with a new kind of smoothness. Advances in Neural Information Processing Systems, 28, 2015.
  • Audibert et al. (2014) Jean-Yves Audibert, Sébastien Bubeck, and Gábor Lugosi. Regret in online combinatorial optimization. Mathematics of Operations Research, 39(1):31–45, 2014.
  • Chen et al. (2025) Botao Chen, Jongyeong Lee, and Junya Honda. Geometric resampling in nearly linear time for follow-the-perturbed-leader with best-of-both-worlds guarantee in bandit problems. In International Conference on Machine Learning. PMLR, 2025. In press.
  • Chen et al. (2013) Wei Chen, Yajun Wang, and Yang Yuan. Combinatorial multi-armed bandit: General framework and applications. In International Conference on Machine Learning, pages 151–159. PMLR, 2013.
  • Gai et al. (2012) Yi Gai, Bhaskar Krishnamachari, and Rahul Jain. Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations. IEEE/ACM Transactions on Networking, 20(5):1466–1478, 2012.
  • Honda et al. (2023) Junya Honda, Shinji Ito, and Taira Tsuchiya. Follow-the-Perturbed-Leader Achieves Best-of-Both-Worlds for Bandit Problems. In Proceedings of The 34th International Conference on Algorithmic Learning Theory, volume 201 of PMLR, pages 726–754. PMLR, 20 Feb–23 Feb 2023.
  • Ito (2021) Shinji Ito. Hybrid regret bounds for combinatorial semi-bandits and adversarial linear bandits. Advances in Neural Information Processing Systems, 34:2654–2667, 2021.
  • Kveton et al. (2014) Branislav Kveton, Zheng Wen, Azin Ashkan, Hoda Eydgahi, and Brian Eriksson. Matroid bandits: fast combinatorial optimization with learning. In Uncertainty in Artificial Intelligence, pages 420–429, 2014.
  • Kveton et al. (2015) Branislav Kveton, Zheng Wen, Azin Ashkan, and Csaba Szepesvari. Tight regret bounds for stochastic combinatorial semi-bandits. In Artificial Intelligence and Statistics, pages 535–543. PMLR, 2015.
  • Lee et al. (2024) Jongyeong Lee, Junya Honda, Shinji Ito, and Min-hwan Oh. Follow-the-perturbed-leader with Fréchet-type tail distributions: Optimality in adversarial bandits and best-of-both-worlds. In Conference on Learning Theory, pages 3375–3430. PMLR, 2024.
  • Malik (1966) Henrick John Malik. Exact moments of order statistics from the Pareto distribution. Scandinavian Actuarial Journal, 1966(3-4):144–157, 1966.
  • Neu (2015) Gergely Neu. First-order regret bounds for combinatorial semi-bandits. In Conference on Learning Theory, pages 1360–1375. PMLR, 2015.
  • Neu and Bartók (2016) Gergely Neu and Gábor Bartók. Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits. Journal of Machine Learning Research, 17(154):1–21, 2016.
  • Nuara et al. (2022) Alessandro Nuara, Francesco Trovò, Nicola Gatti, and Marcello Restelli. Online joint bid/daily budget optimization of internet advertising campaigns. Artificial Intelligence, 305:103663, 2022.
  • Tsuchiya et al. (2023) Taira Tsuchiya, Shinji Ito, and Junya Honda. Further adaptive best-of-both-worlds algorithm for combinatorial semi-bandits. In International Conference on Artificial Intelligence and Statistics, pages 8117–8144. PMLR, 2023.
  • ul Hassan and Curry (2016) Umair ul Hassan and Edward Curry. Efficient task assignment for spatial crowdsourcing: A combinatorial fractional optimization approach with semi-bandit learning. Expert Systems with Applications, 58:36–56, 2016.
  • Wang and Chen (2018) Siwei Wang and Wei Chen. Thompson sampling for combinatorial semi-bandits. In International Conference on Machine Learning, pages 5114–5122. PMLR, 2018.
  • Wang et al. (2017) Yingfei Wang, Hua Ouyang, Chu Wang, Jianhui Chen, Tsvetan Asamov, and Yi Chang. Efficient ordered combinatorial semi-bandits for whole-page recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
  • Wei and Luo (2018) Chen-Yu Wei and Haipeng Luo. More adaptive algorithms for adversarial bandits. In Conference On Learning Theory, pages 1263–1291. PMLR, 2018.
  • Zhan et al. (2025) Jingxin Zhan, Yuchen Xin, and Zhihua Zhang. Follow-the-perturbed-leader approaches best-of-both-worlds for the m-set semi-bandit problems. arXiv preprint arXiv:2504.07307v2, 2025.
  • Zimmert and Seldin (2021) Julian Zimmert and Yevgeny Seldin. Tsallis-inf: An optimal algorithm for stochastic and adversarial bandits. Journal of Machine Learning Research, 22(28):1–49, 2021.
  • Zimmert et al. (2019) Julian Zimmert, Haipeng Luo, and Chen-Yu Wei. Beating stochastic and adversarial semi-bandits optimally and simultaneously. In International Conference on Machine Learning, pages 7683–7692. PMLR, 2019.