Note on Follow-the-Perturbed-Leader in
Combinatorial Semi-Bandit Problems
Abstract
This paper studies the optimality and complexity of the Follow-the-Perturbed-Leader (FTPL) policy in size-invariant combinatorial semi-bandit problems. Recently, Honda et al. (2023) and Lee et al. (2024) showed that FTPL achieves Best-of-Both-Worlds (BOBW) optimality in standard multi-armed bandit problems with Fréchet-type distributions. However, the optimality of FTPL in combinatorial semi-bandit problems remains unclear. In this paper, we consider the regret bound of FTPL with geometric resampling (GR) in the size-invariant semi-bandit setting, showing that FTPL achieves regret with Fréchet distributions and the best possible regret bound of with Pareto distributions in the adversarial setting. Furthermore, we extend conditional geometric resampling (CGR) to the size-invariant semi-bandit setting, which reduces the computational complexity from of the original GR to without sacrificing the regret performance of FTPL.
1 Introduction
The combinatorial semi-bandit is a sequential decision-making problem under uncertainty, which generalizes the classical multi-armed bandit problem. It is instrumental in many practical applications, such as recommender systems (Wang et al., 2017), online advertising (Nuara et al., 2022), crowdsourcing (ul Hassan and Curry, 2016), adaptive routing (Gai et al., 2012) and network optimization (Kveton et al., 2014). In this problem, the learner chooses an action from an action set , where is the dimension of the action set. In each round , the loss vector is determined by the environment, and the learner incurs a loss and can only observe the loss for all such that . The goal of the learner is to minimize the cumulative loss over all the rounds. The performance of the learner is often measured in terms of the pseudo-regret defined as , which describes the gap between the expected cumulative loss of the learner and that of the optimal action fixed in hindsight.
Since its introduction by Chen et al. (2013), the combinatorial semi-bandit problem has been widely studied, mainly under two formulations of the environment that determines the loss vector: the stochastic setting and the adversarial setting. In the stochastic setting, the sequence of loss vectors is assumed to be independent and identically distributed (i.i.d.) from an unknown but fixed distribution over with mean . The single fixed optimal action is defined as , and we denote the minimum suboptimality gap by . CombUCB (Kveton et al., 2015) and Combinatorial Thompson Sampling (Wang and Chen, 2018) achieve gap-dependent regret bounds of for general action sets and for matroid semi-bandits, where denotes the maximum size of any action in the set .
In the adversarial setting, the loss vectors are determined by an adversary in an arbitrary manner and are not assumed to follow any specific distribution (Kveton et al., 2015; Neu, 2015; Wang and Chen, 2018). In this setting, a regret bound of can be achieved by several policies, such as OSMD (Audibert et al., 2014) and FTRL with a hybrid regularizer (Zimmert et al., 2019), which matches the lower bound of (Audibert et al., 2014).
In practical scenarios, the environment that determines the loss vectors is often unknown. Therefore, policies that can adaptively address both stochastic and adversarial settings have been widely studied, particularly in the context of standard multi-armed bandit problems. The Tsallis-INF policy (Zimmert and Seldin, 2021), which is based on Follow-the-Regularized-Leader (FTRL), achieves optimality in both settings. For combinatorial semi-bandit problems, there also exists work on this topic (Wei and Luo, 2018; Zimmert et al., 2019; Ito, 2021; Tsuchiya et al., 2023).
However, some BOBW policies, such as FTRL, require an explicit computation of the arm-selection probability by solving an optimization problem. This leads to computational inefficiencies, and the complexity increases substantially for combinatorial semi-bandits. In light of this limitation, the Follow-the-Perturbed-Leader (FTPL) policy has gained significant attention due to its optimization-free nature. Recently, Honda et al. (2023) and Lee et al. (2024) demonstrated that FTPL achieves the Best-of-Both-Worlds (BOBW) optimality in standard multi-armed bandit problems with Fréchet-type perturbations, which inspires researchers to explore the optimality of FTPL in combinatorial semi-bandit problems. A preliminary effort by Zhan et al. (2025) aimed to tackle this setting, though their analysis contains a technical flaw. In fact, the analysis becomes substantially more complex in the combinatorial semi-bandit setting, which requires further investigation.
Contributions of This Paper
Firstly, we investigate the optimality of FTPL with geometric resampling under Fréchet or Pareto perturbations in adversarial size-invariant semi-bandit problems. We show that FTPL achieves regret with Fréchet distributions and the best possible regret bound of with Pareto distributions in this setting. To the best of our knowledge, this is the first work that provides a correct proof of the regret bound for FTPL with Fréchet-type distributions in adversarial combinatorial semi-bandit problems. Furthermore, we extend the technique called Conditional Geometric Resampling (CGR) (Chen et al., 2025) to the size-invariant semi-bandit setting, which reduces the computational complexity from of the original GR to without sacrificing the regret guarantee.
1.1 Related Work
1.1.1 Technical Issues in Zhan et al. (2025)
The most closely related work is by Zhan et al. (2025). In their paper, they consider the FTPL policy with the Fréchet distribution with shape in size-invariant semi-bandits, which is a special case of combinatorial semi-bandit problems. They provide a proof claiming that FTPL with the Fréchet distribution with shape achieves regret in the adversarial setting and a logarithmic regret bound in the stochastic setting. However, their proof contains a serious issue that renders the main result incorrect, which is explained below.
A function is analyzed in Lemma 4.1 of Zhan et al. (2025), which is later used in their evaluation of a component of the regret called the stability term. In this lemma, they evaluate the function for two cases, and the second case is only mentioned as something that “can be shown by the same argument” without a proof. However, upon closer inspection, this claim cannot be justified by such an analogy. In Section 9, we highlight the detailed step where the analogy fails and further support this observation with numerical verification, which demonstrates that the claimed result does not hold.
The main difficulty in the optimal regret analysis of FTPL lies in the analysis of the stability term (Honda et al., 2023; Lee et al., 2024), which is also the problem we mainly address. Unfortunately, this main difficulty lies behind these skipped or incorrect arguments, and thus we need an essentially new technique to complete the regret analysis for this problem. In this paper, we evaluate the stability term in a completely different way in Lemmas 3–6, which shows that the stability term can be bounded by the maximum of simple quantities, each of which is associated with a subset of base-arms.
2 Problem Setup
In this section, we formulate the problem and introduce the framework of FTPL with geometric resampling. We consider an action set , where each element is called an action. For each base-arm , we assume that there exists at least one action such that . In this paper, we consider a special case of action sets in combinatorial semi-bandits, referred to as the size-invariant semi-bandit. In this setting, we define the action set , where is the number of selected base-arms at each round. At each round , the environment determines a loss vector , and the learner takes an action and incurs a loss . In the semi-bandit setting, the learner only observes the loss for all such that , whereas the loss corresponding to is not observed.
In this paper, we only consider the setting in which the loss vector is determined in an adversarial way. In this setting, the loss vectors are not assumed to follow any specific distribution and may be determined in an arbitrary manner, possibly depending on the past history of actions and losses .
The performance of the learner is evaluated in terms of the pseudo-regret, which is defined as
2.1 Follow-the-Perturbed-Leader
Table 1: Notation.

| Symbol | Meaning |
|---|---|
| | Action set |
| | Dimensionality of action set |
| | for any |
| | Learning rate |
| | Loss vector |
| | Estimated loss vector |
| | Cumulative estimated loss vector |
| | Rank of -th element of in in descending order, omitted when |
| | The number of arms (including itself) whose cumulative losses do not exceed , i.e., |
| | Left-end point of perturbation |
| | -dimensional perturbation |
| | Probability density function of perturbation |
| | Cumulative distribution function of perturbation |
| | Fréchet distribution with shape |
| | Pareto distribution with shape |
We consider the Follow-the-Perturbed-Leader (FTPL) policy, whose entire procedure is given in Algorithm 1. In combinatorial semi-bandit problems, FTPL policy maintains a cumulative estimated loss and plays an action
where is the learning rate, and denotes the random perturbation whose components are i.i.d. from a common distribution with distribution function . In this paper, we consider two types of perturbation distributions. The first is the Fréchet distribution , whose probability density function and cumulative distribution function are given by
The second is the Pareto distribution , whose density and cumulative distribution functions are defined as
Denote the rank of -th element in in descending order as . The probability of selecting base-arm and given is written as . Then, for , letting be the sorted elements of such that , can be expressed as
(1) |
where is defined as . Here, denotes the left endpoint of the support of .
Then, we write the probability of selecting the base-arm as , where for
(2) |
Table 1 summarizes the notation used in this paper.
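To make the selection rule concrete, the following Python sketch draws the perturbation by inverse-CDF sampling and selects an action. The helper names, the convention of playing the m base-arms that minimize the perturbed cumulative estimated loss, and the shape parameter value are illustrative assumptions of ours rather than a transcription of Algorithm 1.

```python
import numpy as np

def sample_perturbation(d, alpha, dist="frechet", rng=None):
    """Draw d i.i.d. perturbations by inverse-CDF sampling.

    Frechet(alpha): F(x) = exp(-x^(-alpha)), x > 0.
    Pareto(alpha):  F(x) = 1 - x^(-alpha),   x >= 1.
    """
    rng = rng or np.random.default_rng()
    u = rng.random(d)
    if dist == "frechet":
        return (-np.log(u)) ** (-1.0 / alpha)
    return (1.0 - u) ** (-1.0 / alpha)

def ftpl_action(cum_est_loss, eta, m, alpha, dist="frechet", rng=None):
    """One FTPL decision: play the m base-arms minimizing eta * lambda_hat_i - z_i."""
    z = sample_perturbation(len(cum_est_loss), alpha, dist, rng)
    return np.argsort(eta * cum_est_loss - z)[:m]
```

For instance, `ftpl_action(np.zeros(5), eta=1.0, m=2, alpha=2.0)` returns two base-arms chosen purely by the perturbation, since all cumulative estimated losses are equal.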
2.2 Geometric Resampling
Since the loss vector is only partially observable under semi-bandit feedback, policies generally rely on an estimator of the loss vector to obtain an unbiased loss estimate. Then, the cumulative estimated loss vector is obtained as . In standard multi-armed bandit problems, many policies such as FTRL often employ an importance-weighted (IW) estimator , where denotes the arm chosen at round , and the arm-selection probability is explicitly computed. However, in the combinatorial semi-bandit setting, the selection probability of each base-arm is not explicitly available, which complicates the construction of an unbiased estimator. To address this issue, Neu and Bartók (2016) proposed a technique called Geometric Resampling (GR). With this technique, the FTPL policy can efficiently compute an unbiased estimator for for each selected base-arm . The procedure of GR is shown in Algorithm 2, where the notation denotes the element-wise product of two vectors and , i.e., for all .
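As an illustration, the sketch below shows one common realization of GR under semi-bandit feedback, reusing the hypothetical ftpl_action helper from the sketch in Section 2.1; the shared resampling sequence and the iteration cap are implementation choices of ours, not a transcription of Algorithm 2.

```python
import numpy as np

def geometric_resampling(action, losses, cum_est_loss, eta, m, alpha,
                         dist="frechet", rng=None, max_iter=100_000):
    """Estimate the loss vector from semi-bandit feedback via GR.

    For each selected base-arm i, K_i is the index of the first fresh FTPL
    re-draw that contains i; K_i is geometrically distributed with mean
    1 / (selection probability of i), so K_i * losses[i] is an unbiased
    estimate of losses[i] / p_i.  Uses ftpl_action from the sketch above.
    """
    rng = rng or np.random.default_rng()
    est = np.zeros(len(cum_est_loss))
    waiting = set(int(i) for i in action)   # selected base-arms without a counter yet
    for k in range(1, max_iter + 1):
        resampled = set(int(i) for i in ftpl_action(cum_est_loss, eta, m, alpha, dist, rng))
        for i in waiting & resampled:       # first hit of base-arm i: record K_i = k
            est[i] = k * losses[i]
        waiting -= resampled
        if not waiting:
            break
    return est
```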
Now we consider the computational complexity of GR. Let denote the number of resampling steps taken by geometric resampling at round until switches from to . Here, is equal to in GR. Then, the expected total number of resampling steps given can be bounded as
For each resampling step, the generation of the perturbation requires a complexity of , and thus the total complexity of GR at each round is , which is independent of . Compared with many other policies for combinatorial semi-bandit problems, FTPL with GR is computationally more efficient. However, in standard -armed bandit problems, the computational complexity of FTPL with GR still remains . Though FTPL remains efficient for a moderate number of arms (Honda et al., 2023), primarily thanks to its optimization-free nature, the running time increases substantially as grows large. To overcome this limitation, Chen et al. (2025) proposed an improved technique called Conditional Geometric Resampling (CGR), which reduces the complexity to and demonstrates superior runtime performance. Inspired by this, we extend CGR to the size-invariant semi-bandit setting in this paper, as presented in Section 5.
3 Regret Bounds
In this section, we summarize the regret bound of FTPL in the adversarial size-invariant semi-bandit setting.
3.1 Main Results
Combining Lemma 2 from this section and Lemma 6 given in Section 4, we obtain the regret bound of FTPL in the adversarial setting in the following theorem.
Theorem 1.
In the adversarial setting, when the perturbation follows the Fréchet distribution with shape , FTPL with learning rate
satisfies
whose order is , where is the gamma function. When the perturbation follows the Pareto distribution with shape , FTPL with learning rate
satisfies
whose order is .
3.2 Regret Decomposition
To evaluate the regret of FTPL, we first decompose the regret, which is expressed as
This can be decomposed in the following way, whose proof is given in Section 6.
Lemma 2.
For any and ,
(3) |
We refer to the first and second terms of (3) as stability term and penalty term, respectively.
4 Stability of Arm-selection Probability
In the standard multi-armed bandit problem, the core and most challenging part in analyzing the regret of FTPL lies in the analysis of the arm-selection probability function (Abernethy et al., 2015; Honda et al., 2023; Lee et al., 2024). This challenge is further amplified in the combinatorial semi-bandit setting, where the base-arm selection probability given in (2) exhibits significantly greater complexity. To this end, this section first introduces some tools used in the analysis and then derives properties of , which constitute the main difficulty of the analysis of FTPL.
4.1 General Tools for Analysis
Since the probabilities of some events over different base-arm sets will be considered in the subsequent analysis, we introduce the parameter into . Denote the rank of the -th element of in in descending order as . We define
(4) |
where .
Under this definition, means the probability that the base-arm ranks -th among the base-arm set . Based on (4), we define
which represents the probability of selecting base-arm when the base-arm set and the number of selected base-arms are set as and , respectively, in the size-invariant semi-bandit setting. Since the definition above is an extension of (1) and (2), we have
For analysis on the derivative, based on (4) we define
and
When and , we simply write
In the following, we write to denote the number of arms (including itself) whose cumulative losses do not exceed , i.e., . Without loss of generality, in the subsequent analysis we always assume (ties are broken arbitrarily) so that for notational simplicity. To derive an upper bound, we employ the tools introduced above to provide lemmas relating the base-arm selection probability to its derivatives.
4.2 Important Lemmas
Lemma 3.
It holds that
where
Based on this result, the following lemma holds.
Lemma 4.
If , that is, is the -th smallest among (ties are broken arbitrarily), then
Next, following the steps in Honda et al. (2023) and Lee et al. (2024), we extend the analysis to the combinatorial semi-bandit setting and obtain the following lemma.
Lemma 5.
For any and , it holds that
By using the above lemma, we can express the stability term as follows.
Lemma 6.
For any , and , it holds that
5 Conditional Geometric Resampling for Size-Invariant Semi-Bandit
Building on the idea proposed by Chen et al. (2025), this section introduces an extension of Conditional Geometric Resampling (CGR) to the size-invariant semi-bandit setting. This algorithm is designed to provide multiple unbiased estimators in a more efficient way, which is based on the following lemma.
Lemma 7.
Let be an arbitrary necessary condition for
(5) |
Consider resampling of from conditioned on until (5) is satisfied. Then, the number of resampling steps for base-arm satisfies
From this lemma, we can use
as an unbiased estimator of for sampled from conditioned on .
Proof.
Define
Consider , the probability that base-arm is selected, under the condition . This probability can be expressed as
(6) |
Note that is an arbitrary necessary condition for , which implies that
Therefore, from (6) we immediately obtain
(7) |
Now we consider the expected number of resampling steps for base-arm . Recall that is sampled from conditioned on until (5) is satisfied, that is, . Then follows a geometric distribution with probability mass function
Therefore, the expected number of resampling steps given and is expressed as
(8) |
Combining (7) and (8), we obtain
∎
For base arm such that , we now consider resampling from the perturbation distribution conditioned on
that is, the event that lies among the top- largest of the base-arms whose cumulative estimated losses are no worse than . By the symmetry of the i.i.d. perturbations, we can sample from this conditional distribution with a simple operation, which corresponds to Lines 3–3 in Algorithm 3. For each base-arm such that , the resampling procedure in our proposed algorithm is the same as in the original GR. By Lemma 7, we can derive the properties of CGR in the size-invariant semi-bandit setting as follows.
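As an illustration of this symmetry argument, the following sketch realizes the conditional draw by a uniform value swap. The conditioning set (the base-arms whose cumulative estimated losses do not exceed that of the selected base-arm) and the swap rule reflect our reading of the event described above, not a transcription of Algorithm 3, and sample_perturbation is the hypothetical helper from the sketch in Section 2.1.

```python
import numpy as np

def conditional_resample(i, cum_est_loss, m, alpha, dist="frechet", rng=None):
    """Draw z conditioned on z_i being among the top-m perturbations within
    S_i = {j : lambda_hat_j <= lambda_hat_i}, a necessary condition for
    base-arm i to be selected, using exchangeability of the i.i.d. z_j."""
    rng = rng or np.random.default_rng()
    z = sample_perturbation(len(cum_est_loss), alpha, dist, rng)
    S = np.flatnonzero(cum_est_loss <= cum_est_loss[i])   # contains i itself
    top_m = S[np.argsort(z[S])[::-1][:m]]                 # base-arms holding the m largest z_j in S
    if i not in top_m:
        j = int(rng.choice(top_m))        # swap z_i with a uniformly chosen top-m value
        z[i], z[j] = z[j], z[i]
    return z
```

Swapping with a uniformly chosen top-m coordinate preserves the joint law of the exchangeable perturbation values, so the output follows the desired conditional distribution, and the subsequent resampling only needs to wait for the (much more likely) selection event itself.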
Lemma 8.
Proof.
Let denote the probability distribution of after the value-swapping operation, and denote the rank of among . Then, we have . Given and , for any realization in of we have
(10) |
By symmetry of , we have
(11) |
for any such that . Then we have
which means that (11) is equal to . Therefore, from (10) we have
(12) |
By symmetry, each probability term in the RHS of (12) is equal. Therefore, we have
with which we immediately obtain
(13) |
For the LHS of (13), since for any we have , we have
(14) |
On the other hand, for any and we have
(15) |
and
Then, for the RHS of (13) we have
(16) |
Combining (13), (14) and (16), we have
which means that CGR samples from the conditional distribution of conditioned on . Combining this fact and (15) with Lemma 7, for we have
Note that for satisfying , the resampling method is the same as the original GR. For such we have
Therefore, for any ,
serves as an unbiased estimator for . Then, the expected number of resampling given in CGR is bounded by
∎
Average Complexity
Now we analyze the average complexity of CGR, which can be expressed as
(17) |
where is the cost of scanning the base-arms and determining whether to include them in the set (Lines 3–3), and is the cost of each resampling. For the former, the condition that requires is only evaluated when . Then, we have
(18) |
For the resampling process, as shown in Algorithm 3, base-arms in and those not in are resampled differently, since the former involves an additional value-swapping operation (Lines 3–3). However, this operation does not change the order of the resampling cost, which remains in both cases. Combining (9), (17) and (18), we have
Remark 1.
In this paper, though we only analyze the regret bound of FTPL with the original GR, the analysis of FTPL with CGR is similar, as we only need to replace the expression of the expectation of the estimator in (59). In fact, FTPL with CGR attains a slightly better regret bound than FTPL with the original GR. This is because the variance of becomes
where the latter is no larger than the former.
6 Proofs for regret decomposition
In this section, we provide the proof of Lemma 2. First, similarly to Lemma 3 in Honda et al. (2023), we prove a general framework of the regret decomposition that can be applied to general perturbation distributions.
Lemma 9.
Proof.
Let us consider a random variable that independently follows the Fréchet distribution or the Pareto distribution , and is independent of the randomness of the environment and the policy. Define , where . Then, since and are identically distributed given , we have
(19) |
Denote the optimal action as . Recalling , we have
and recursively applying this relation, we obtain
and therefore
For Fréchet and Pareto distributions, we bound in the following lemma.
Lemma 10.
For and , we have
Proof.
Let be the -th largest perturbation among i.i.d. sampled from for . Then, we have
(20) |
Pareto Distribution
Fréchet Distribution
7 Analysis on Stability Term
7.1 Proof for Monotonicity
Lemma 11.
Let denote a non-negative function that is independent of . If and is the cumulative distribution function of Fréchet or Pareto distributions, then
is monotonically increasing in .
Proof.
Let
The derivative of with respect to is expressed as
If , we have
where the last equality holds since . On the other hand, if , we have
In both cases, we have
Similarly, we have
Next, we divide the proof into two cases.
Fréchet Distribution
When is the cumulative distribution function of Fréchet distribution, we define . Under this definition, we have
and
Here, by an elementary calculation we can see
where the last inequality holds since is monotonically increasing in for . Therefore, when is the cumulative distribution function of Fréchet distribution, we have , which implies that is monotonically increasing in .
Pareto Distribution
When is the cumulative distribution function of Pareto distribution, we have
and
Here, by an elementary calculation we can see
where the last inequality holds because is monotonically increasing in for . Therefore, we have , which concludes the proof. ∎
Lemma 12.
Let , , where . If both and are monotonically increasing in , then for any , we have
Provided that exists, for any we have
Proof.
According to the assumption, we have
If , then we have
On the other hand, if , then we have
Let , where . Then, we have
which means that is monotonically increasing in . Therefore, we have
Combining both cases, we have
for any . If exists, the result for the infinite case follows directly by taking the limit of . ∎
Lemma 13.
For and , let be such that and for all . Then, we have
Proof.
For , we define
where the probability can be expressed as
(25) |
Here, . Corresponding to this, we define
(26) |
Considering the event
we have
By definition, we can see that
Therefore, we can decompose the expression of the probability as
(27) |
Similarly, by definition we have
(28) |
Consider the expression of and respectively given by (25) and (26). By Lemma 11, it follows that
because of the monotonic increase in . Then, since both and do not depend on , we have
(29) | ||||
where the inequality (29) holds by recalling that the same relation as (27) and (28) holds for in Lemma 12. ∎
The following lemma presents a special case of Lemma 13, where we take .
Lemma 14.
Let . If , we have
(30) |
If , we have
(31) |
Proof.
Inequality (30)
Recall that
where . Note that
where . Then, can be rewritten as
(32) |
Taking the limit of on both sides of (32), since , the first term of the RHS of (32) vanishes, and the second term becomes independent of . Then, can be expressed as
(33) |
By the same argument, we also have
(34) |
By Lemma 13, (33) and (34), we have
(35) | ||||
where (35) holds since both and exist and are nonzero.
Inequality (31)
7.1.1 Proof of Lemma 3
Proof.
In this proof, we locally use to denote a sequence of -dimensional vectors defined as follows. Define ; for , we have
Consequently, we have
We are now ready to derive the result. Firstly, to address all , for , we only need to show that
(36) |
holds for , which is trivial. For , we prove (36) holds for all by mathematical induction. We have
We begin by verifying the base case of the induction. When , the statement becomes
If , the statement is immediate, since is expressed as
which is monotonically increasing in all by Lemma 11. Otherwise, by applying Lemma 13, we have
Therefore, the statement holds for .
We assume, as the inductive hypothesis, that the statement holds for , i.e.,
(37) |
If we can prove the statement holds for , then by induction, the statement holds for all , thereby establishing the desired result in (36). Now we aim to prove it for , i.e., we want to show that
To prove this, it suffices to show the following inequality holds:
(38) |
since this, together with the induction hypothesis (37), implies that
Now we prove (38) holds for . For each term in the LHS of (38) given by
we consider the following two cases.
-
•
Case 1: .
In this case, we have . Similarly to the analysis on the base case, by Lemma 11, for any , is monotonically increasing in . Applying this to , we have
(39)
where the last inequality holds since and .
-
•
Case 2: .
In this case, since , we have . On the other hand, since , we have and thus . Combining these, we have . Since , by applying Lemma 13 again, we have
(40)
where the last inequality holds since .
Combining (39) and (40), we see that (38) holds and thus the statement holds for , completing the inductive step. By the principle of mathematical induction, the statement (36) holds for . By letting , we have
(41) |
Now we address all the base-arms in , i.e., all satisfying . We prove
(42) |
by mathematical induction on . We begin by verifying the base case of the induction. When , the statement becomes
If , we consider each term in the RHS of (41) in two cases. For such that , it follows that
which is monotonically increasing in for all by Lemma 11. Taking the limit , we have and thus
For such that , by applying Lemma 14 to base-arm , we obtain
Combining the above two cases, by (41) we have
where the last inequality holds since and for any . On the other hand, if , by Lemma 14 and (41), we obtain
where the last inequality holds since for any . Therefore, the statement holds for the base case .
Now we assume, as the inductive hypothesis, that the statement holds for , i.e.,
(43) |
If we can prove the statement holds for , then by induction, the statement holds for all , thereby establishing the desired result in (42). Now we aim to prove it for , i.e., we want to show that
To prove this, by induction hypothesis (43), we only need to show that the following inequality holds:
(44) |
We consider each term in the LHS of the above inequality with and in the following three cases.
-
•
Case 1: .
In this case, since and , we have
On the other hand, since , we have . Combining these, it follows that . Similarly to the analysis on the base case, by Lemma 11, for any ,
is monotonically increasing in . Taking the limit , it follows that
where the last inequality holds since and .
-
•
Case 2: and .
In this case, since , we have . Combining it with , it follows that . Since , we have . By applying Lemma 14 to base-arm , we have
where the last inequality holds since and .
-
•
Case 3: and .
In this case, since and , we have . On the other hand, since , we have . Combining these, it follows that . By applying Lemma 14 to base-arm , we have
where the last inequality holds since and .
Combining the three cases, we see that (44) holds, and thus the statement holds for , completing the inductive step. By induction, the statement (42) holds for . By letting , we immediately obtain
(45) |
Note that we have
(46) |
and
(47) |
Since each term in the RHS of both (46) and (47) is positive, we have
(48) |
Combining (45) and (48), we have
which concludes the proof. ∎
7.2 Proof of Lemma 4
7.2.1 Pareto Distribution
Let us consider the case . Recall that the probability density function and cumulative distribution function of Pareto distribution are given by
Then, for , we have
and
Then, it holds that
where denotes the Beta function. Similarly, we have
Therefore, we have
(49) |
Similarly to the proof of Lee et al. (2024), we bound (49) as follows. For , we have
(by ) | ||||
(by ) | ||||
(by Gautschi’s inequality) | ||||
(equality holds when ) |
Since , we have
Therefore, we have
Recall that as previously noted for notational simplicity. By Lemma 3, for any , we have
7.2.2 Fréchet Distribution
Before proving the statement in the case , we need to give the following lemma.
Lemma 15.
Let denote the cumulative distribution function of Fréchet distribution with shape . For , we have
(50) |
Proof.
By letting , the inequality (50) can be rewritten as
Equivalently, multiplying both sides by
we arrive at
(51) |
and thus we only need to prove it to conclude the proof. The LHS of (51) can be expressed as
and the RHS of (51) can be expressed as
By an elementary calculation we can see
(52) |
By letting , we have
(53) |
Since is monotonically increasing in , and , the LHS of (53) is monotonic in the same direction as , whose derivative is expressed as
where the inequality holds since . Therefore, the LHS of (53) is monotonically increasing in . This implies that the expression in (52) is non-positive, which concludes the proof. ∎
We now prove Lemma 4 in the case by applying Lemma 15. Recall that the probability density function and cumulative distribution function of the Fréchet distribution are given by
Then, for , we have
and
Define
Then, by Lemma 15, we immediately obtain
(54) |
Similarly to the proofs of Honda et al. (2023) and Lee et al. (2024), we bound the RHS of (54) as follows. By letting , both and can be expressed by the Gamma function as
and
Replacing with , we have
(55) |
By Lemma 16 (Gautschi’s inequality), we have
Combining this result with (55), we obtain
Since , we have
Recall that as previously noted for notational simplicity. By Lemma 3 and (54), for any , we have
7.3 Proof of Lemma 5
Following the proofs of Honda et al. (2023) and Lee et al. (2024), we extend the statement to the combinatorial semi-bandit setting.
Proof.
Define
and
Then, we have
Since , we immediately have
with which we have
(56) |
Recalling that is expressed as
we see that is expressed as
Now we divide the proof into two cases.
Fréchet distribution
Pareto distribution
Here note that follows the geometric distribution with expectation , given and , which satisfies
(59) |
Since when , for we obtain
(by Lemma 4) |
∎
7.4 Proof of Lemma 6
By Lemma 5, we have the following result.
7.4.1 Fréchet Distribution
When , we have
7.4.2 Pareto Distribution
When , we have
8 Technical Lemmas
Lemma 16 (Gautschi’s inequality).
For and ,
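As a quick numerical sanity check, the snippet below verifies the standard form of Gautschi's inequality, namely x^{1-s} ≤ Γ(x+1)/Γ(x+s) ≤ (x+1)^{1-s} for x > 0 and s ∈ (0, 1), which is the form used in our bounds; the grid of test points is arbitrary and the snippet is only illustrative.

```python
import numpy as np
from scipy.special import gamma

# Spot-check of Gautschi's inequality on a small grid of (x, s) pairs.
for x in (0.5, 1.0, 3.0, 10.0):
    for s in (0.1, 0.5, 0.9):
        ratio = gamma(x + 1.0) / gamma(x + s)
        assert x ** (1.0 - s) <= ratio <= (x + 1.0) ** (1.0 - s), (x, s)
```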
Lemma 17.
(Malik, 1966, Eq. (3.7)) Let be the -th order statistics of i.i.d. RVs from for , where . Then, we have
Lemma 18.
Let and be CDFs of some random variables such that for all . Let (resp. ) be RVs i.i.d. from (resp. ), and (resp. ) be its -th order statistics for any . Then, holds.
Proof.
Let be a uniform random variable over and let and , where and are the left-continuous inverses of and , respectively. Then, holds almost surely and the marginal distributions satisfy and . Therefore, taking as i.i.d. copies of this , we see that holds almost surely, which proves the lemma. ∎
Lemma 19.
Let (resp. ) be the -th order statistics of i.i.d. RVs from and for . Then, holds.
Proof.
Letting and be the CDFs of and , we have
where is the CDF of for . Then, it holds from Lemma 18 that . ∎
9 Issues in Proof of Claimed Extension to the Monotone Decreasing Case
In this section, we use the same notation as Zhan et al. (2025) and reconstruct the missing arguments in their Lemma 4.1. Their claim is as follows.
Claim 1 (Lemma 4.1 in Zhan et al. (2025)).
For any , , such that and any , let
Then, for all , is increasing in for and , while decreasing in for .
In their paper, they consider the case that is the cumulative distribution function of Fréchet distribution with shape , which is expressed as
and thus we also adopt the same setting in this section. For this claim, they only give a proof of the monotonic increase in the former case, while the monotonic decrease in the latter case is only stated to “be shown by the same argument”. Nevertheless, the monotonic decrease cannot be proved by the same argument, and it is highly likely that the statement itself does not hold, as we demonstrate below.
Now we follow the line of the proof of the monotone increasing case in Zhan et al. (2025), attempting to prove that for all , is monotonically decreasing on for . For , denote
Hence,
(60) |
Letting , we have
Here, it is worth noting that and share no common factors, and thus no factor can be hidden in .
By an elementary calculation, we have
(61) |
Here, for (61), one can see that when , the first term becomes negative. On the other hand, when , the first term becomes positive. Define
If can be shown to be monotonically decreasing, then (60) is negative, that is, the claim holds. However, the derivative of is given as
where one can see that is not always negative, and thus is not monotonic in .
[Figure 1: numerical evaluation of the expression considered below, which is not monotonic.]
From the analysis above, we can see that the monotone decreasing case cannot be proved by the same argument as in the increasing case.
Numerical Simulation
We now turn to a numerical simulation implemented in Python to illustrate this failure. As a counterexample, it is sufficient to show one special case where , and for all . Then, we consider the expression
Here, we set and , which are the parameters used in the subsequent analysis in Zhan et al. (2025). Treating this expression as a function of , we can observe from Figure 1 that the function is not monotonic in , and it does not hold that for some .
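The check can be reproduced with a simple grid evaluation. The helper below is a generic monotonicity test; the actual expression and parameter values from Zhan et al. (2025) are the ones stated above and are not reproduced here, and the placeholder functions in the usage lines are illustrative only.

```python
import numpy as np

def is_monotone_decreasing(f, grid, tol=1e-12):
    """Return False as soon as f increases between consecutive grid points."""
    vals = np.array([f(x) for x in grid])
    return bool(np.all(np.diff(vals) <= tol))

# Usage: evaluate the expression under study on a grid of the variable in which
# monotonicity is claimed, then inspect the result (or plot the values as in Figure 1).
grid = np.linspace(0.01, 10.0, 2000)
print(is_monotone_decreasing(np.exp, grid))                 # False: increasing function
print(is_monotone_decreasing(lambda x: np.exp(-x), grid))   # True: decreasing function
```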
From these arguments, it is highly likely that Lemma 4.1 in Zhan et al. (2025) does not hold, and at least the current proof is incomplete. On the other hand, our analysis is constructed in a way that does not require the monotonicity of , which is the key to extending the results for the MAB to the combinatorial semi-bandit setting.
Acknowledgements
We thank Dr. Jongyeong Lee for pointing out the error in the proof of Lemma 10 in the previous version.
References
- Abernethy et al. (2015) Jacob D Abernethy, Chansoo Lee, and Ambuj Tewari. Fighting bandits with a new kind of smoothness. Advances in Neural Information Processing Systems, 28, 2015.
- Audibert et al. (2014) Jean-Yves Audibert, Sébastien Bubeck, and Gábor Lugosi. Regret in online combinatorial optimization. Mathematics of Operations Research, 39(1):31–45, 2014.
- Chen et al. (2025) Botao Chen, Jongyeong Lee, and Junya Honda. Geometric resampling in nearly linear time for follow-the-perturbed-leader with best-of-both-worlds guarantee in bandit problems. In International Conference on Machine Learning. PMLR, 2025. In press.
- Chen et al. (2013) Wei Chen, Yajun Wang, and Yang Yuan. Combinatorial multi-armed bandit: General framework and applications. In International Conference on Machine Learning, pages 151–159. PMLR, 2013.
- Gai et al. (2012) Yi Gai, Bhaskar Krishnamachari, and Rahul Jain. Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations. IEEE/ACM Transactions on Networking, 20(5):1466–1478, 2012.
- Honda et al. (2023) Junya Honda, Shinji Ito, and Taira Tsuchiya. Follow-the-Perturbed-Leader Achieves Best-of-Both-Worlds for Bandit Problems. In Proceedings of The 34th International Conference on Algorithmic Learning Theory, volume 201 of PMLR, pages 726–754. PMLR, 20 Feb–23 Feb 2023.
- Ito (2021) Shinji Ito. Hybrid regret bounds for combinatorial semi-bandits and adversarial linear bandits. Advances in Neural Information Processing Systems, 34:2654–2667, 2021.
- Kveton et al. (2014) Branislav Kveton, Zheng Wen, Azin Ashkan, Hoda Eydgahi, and Brian Eriksson. Matroid bandits: fast combinatorial optimization with learning. In Uncertainty in Artificial Intelligence, pages 420–429, 2014.
- Kveton et al. (2015) Branislav Kveton, Zheng Wen, Azin Ashkan, and Csaba Szepesvari. Tight regret bounds for stochastic combinatorial semi-bandits. In Artificial Intelligence and Statistics, pages 535–543. PMLR, 2015.
- Lee et al. (2024) Jongyeong Lee, Junya Honda, Shinji Ito, and Min-hwan Oh. Follow-the-perturbed-leader with fréchet-type tail distributions: Optimality in adversarial bandits and best-of-both-worlds. In Conference on Learning Theory, pages 3375–3430. PMLR, 2024.
- Malik (1966) Henrick John Malik. Exact moments of order statistics from the pareto distribution. Scandinavian Actuarial Journal, 1966(3-4):144–157, 1966.
- Neu (2015) Gergely Neu. First-order regret bounds for combinatorial semi-bandits. In Conference on Learning Theory, pages 1360–1375. PMLR, 2015.
- Neu and Bartók (2016) Gergely Neu and Gábor Bartók. Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits. Journal of Machine Learning Research, 17(154):1–21, 2016.
- Nuara et al. (2022) Alessandro Nuara, Francesco Trovò, Nicola Gatti, and Marcello Restelli. Online joint bid/daily budget optimization of internet advertising campaigns. Artificial Intelligence, 305:103663, 2022.
- Tsuchiya et al. (2023) Taira Tsuchiya, Shinji Ito, and Junya Honda. Further adaptive best-of-both-worlds algorithm for combinatorial semi-bandits. In International Conference on Artificial Intelligence and Statistics, pages 8117–8144. PMLR, 2023.
- ul Hassan and Curry (2016) Umair ul Hassan and Edward Curry. Efficient task assignment for spatial crowdsourcing: A combinatorial fractional optimization approach with semi-bandit learning. Expert Systems with Applications, 58:36–56, 2016.
- Wang and Chen (2018) Siwei Wang and Wei Chen. Thompson sampling for combinatorial semi-bandits. In International Conference on Machine Learning, pages 5114–5122. PMLR, 2018.
- Wang et al. (2017) Yingfei Wang, Hua Ouyang, Chu Wang, Jianhui Chen, Tsvetan Asamov, and Yi Chang. Efficient ordered combinatorial semi-bandits for whole-page recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
- Wei and Luo (2018) Chen-Yu Wei and Haipeng Luo. More adaptive algorithms for adversarial bandits. In Conference On Learning Theory, pages 1263–1291. PMLR, 2018.
- Zhan et al. (2025) Jingxin Zhan, Yuchen Xin, and Zhihua Zhang. Follow-the-perturbed-leader approaches best-of-both-worlds for the m-set semi-bandit problems. arXiv preprint arXiv:2504.07307v2, 2025.
- Zimmert and Seldin (2021) Julian Zimmert and Yevgeny Seldin. Tsallis-inf: An optimal algorithm for stochastic and adversarial bandits. Journal of Machine Learning Research, 22(28):1–49, 2021.
- Zimmert et al. (2019) Julian Zimmert, Haipeng Luo, and Chen-Yu Wei. Beating stochastic and adversarial semi-bandits optimally and simultaneously. In International Conference on Machine Learning, pages 7683–7692. PMLR, 2019.