
Multi-point Feedback of Bandit Convex Optimization with Hard Constraints

Yasunari Hikima
Artificial Intelligence Laboratory, Fujitsu Limited, Japan
hikima.yasunari@fujitsu.com
Abstract

This paper studies bandit convex optimization with constraints, where the learner aims to generate a sequence of decisions under partial information of the loss functions such that the cumulative loss and the cumulative constraint violation are simultaneously kept small. We adopt the cumulative hard constraint violation as the metric of constraint violation, which is defined by $\sum_{t=1}^{T}\max\{g_{t}(\bm{x}_{t}),0\}$. Owing to the maximum operator, a strictly feasible solution cannot cancel out the effect of violated constraints, in contrast to the conventional metric known as long-term constraint violation. We present a penalty-based proximal gradient descent method that attains a sub-linear growth of both regret and cumulative hard constraint violation, in which the gradient is estimated with a two-point function evaluation. Specifically, our algorithm attains an $O(d^{2}T^{\max\{c,1-c\}})$ regret bound and an $O(d^{2}T^{1-\frac{c}{2}})$ cumulative hard constraint violation bound for convex loss functions and time-varying constraints, where $d$ is the dimensionality of the feasible region and $c\in[\frac{1}{2},1)$ is a user-determined parameter. We also extend the result to the case where the loss functions are strongly convex and show that both the regret and constraint violation bounds can be further reduced.

1 Introduction

Bandit Convex Optimization (BCO) is a fundamental framework for sequential decision-making under uncertain environments with limited feedback, which can be regarded as a structured repeated game between a learner and an environment (Hazan et al. 2016, Lattimore and Szepesvári 2020). In this framework, a learner is given a convex feasible region $\mathcal{X}\subseteq\mathbb{R}^{d}$ and the total number $T$ of rounds. At each round $t=1,2,\dots,T$, the learner makes a decision $\bm{x}_{t}\in\mathcal{X}$, and then a convex loss function $f_{t}:\mathcal{X}\to\mathbb{R}$ is revealed. The learner cannot access the loss function $f_{t}$ itself; only bandit feedback is available, that is, the learner observes only the value of the loss at the point she committed to, namely $f_{t}(\bm{x}_{t})$. The objective of the learner is to generate a sequence of decisions $\{\bm{x}_{t}\}_{t=1}^{T}\subseteq\mathcal{X}$ that minimizes the cumulative loss $\sum_{t=1}^{T}f_{t}(\bm{x}_{t})$ under bandit feedback. The performance of the learner is evaluated in terms of regret, which is defined by

R_{T}\coloneqq\sum_{t=1}^{T}f_{t}(\bm{x}_{t})-\min_{\bm{x}\in\mathcal{X}}\sum_{t=1}^{T}f_{t}(\bm{x}).

This regret measures the difference between the cumulative loss of the learner's strategy and the minimum cumulative loss achievable had the sequence of loss functions $\{f_{t}(\bm{x})\}_{t=1}^{T}$ been known in advance, allowing the learner to choose the best fixed decision in hindsight.

In many real-world scenarios, decisions are subject to constraints such as budgets or resources. In the context of Online Convex Optimization (OCO), where the learner has access to complete information about the loss functions, a projection operator is typically applied in each round so that the decisions remain feasible (Zinkevich 2003, Hazan et al. 2016). However, such a projection step is typically a computational bottleneck when the feasible region is complex.

To address the issue of the projection step, Mahdavi et al. (2012) considers online convex optimization with long-term constraints, where the learner aims to generate a sequence of decisions that satisfies the constraints in the long run, instead of requiring them to be satisfied in every round. They introduce the cumulative soft constraint violation metric defined by $V^{\text{soft}}_{T}\coloneqq\sum_{t=1}^{T}g_{t}(\bm{x}_{t})$, where $g_{t}(\bm{x})\leq 0$ is the functional constraint to be satisfied. Later, Yuan and Lamperski (2018) considers a stricter notion of constraint violation referred to as cumulative hard constraint violation, which is defined by $V^{\text{hard}}_{T}\coloneqq\sum_{t=1}^{T}\max\{g_{t}(\bm{x}_{t}),0\}$. This metric overcomes the drawback of the cumulative soft constraint violation and is suitable for safety-critical systems, in which constraint violations may result in catastrophic consequences.

To see that the notion of cumulative hard constraint violation is a stronger metric, consider the example discussed in Guo et al. (2023). Given a sequence of decisions with $T=1000$ whose constraint values are $g_{t}(\bm{x}_{t})=-1$ if $t$ is odd and $g_{t}(\bm{x}_{t})=1$ otherwise, we have $\sum_{t=1}^{\tau}g_{t}(\bm{x}_{t})\leq 0$ for any $\tau\in\{1,2,\dots,T\}$; however, the constraint $g_{t}(\bm{x})\leq 0$ is violated in half of the rounds. The notion of hard constraint violation captures these violations, since $V^{\text{hard}}_{T}=500$. Thus, the conventional cumulative soft constraint violation $V_{T}^{\text{soft}}$ cannot accurately measure the constraint violation, whereas the cumulative hard constraint violation $V^{\text{hard}}_{T}$ can. A small numerical illustration is given below.
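The following snippet reproduces the bookkeeping of this example; it is a minimal sanity check of the two metrics, not part of the proposed algorithm.

```python
# Soft vs. hard cumulative violation for the alternating example above:
# g_t(x_t) = -1 on odd rounds and +1 on even rounds, with T = 1000.
T = 1000
g_values = [-1 if t % 2 == 1 else 1 for t in range(1, T + 1)]

v_soft = sum(g_values)                     # feasible rounds cancel violations
v_hard = sum(max(g, 0) for g in g_values)  # every violated round is counted

print(v_soft)  # 0
print(v_hard)  # 500
```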

Like algorithms for OCO with constraints, many existing algorithms for BCO with constraints involve projection operators (Agarwal et al. 2010, Zhao et al. 2021) and are generally limited to simple convex sets. Chen et al. (2019) and Garber and Kretzu (2020) consider projection-free algorithms for BCO, but no constraint violation bound has been reported. Some studies have extended algorithms for OCO with soft constraints to the bandit setting (Mahdavi et al. 2012, Cao and Liu 2018); however, these algorithms cannot be directly extended to BCO with hard constraints. In other words, no algorithm has been available that simultaneously achieves sub-linear bounds on both regret and cumulative hard constraint violation.

The present study focuses on the particular case of multi-point feedback BCO with constraints, in which the loss functions are convex or strongly convex and constraint violation is evaluated in terms of hard constraints. This kind of problem widely appears in real-world scenarios such as portfolio management, in which the manager has concrete constraints to be satisfied but only has access to the loss function $f_{t}(\cdot)$ at several points close to the decision $\bm{x}_{t}$. We present a penalty-based proximal gradient descent method that attains both an $O(d^{2}T^{\max\{c,1-c\}})$ regret bound and an $O(d^{2}T^{1-\frac{c}{2}})$ cumulative hard constraint violation bound, where $d$ is the dimensionality of the feasible region and $c\in[\frac{1}{2},1)$ is a user-determined parameter. Our proposed algorithm is inspired by a gradient estimation technique from the BCO literature (Flaxman et al. 2005, Agarwal et al. 2010) and an algorithm for OCO with hard constraints (Guo et al. 2022).

1.1 Related work

For OCO with constraints, a projection operator is generally applied to the updated variables to enforce feasibility at each round (Zinkevich 2003, Duchi et al. 2010). However, such a projection is often computationally expensive, especially when the feasible region $\mathcal{X}$ is complex (e.g., characterized by multiple inequalities), and efficient projection computation is limited to simple sets such as the $\ell_{1}$-ball or the probability simplex (Duchi et al. 2008).

Instead of requiring that the decisions belong to the feasible region in all rounds, Mahdavi et al. (2012) first considers relaxing the notion of constraints by allowing them to be violated at some rounds while requiring them to be satisfied in the long run. This type of OCO is referred to as online convex optimization with long-term constraints, and the performance metric for constraint violation is the cumulative violation of the decisions over all rounds, i.e., $V^{\text{soft}}_{T}\coloneqq\sum_{t=1}^{T}g_{t}(\bm{x}_{t})$, referred to as soft constraints. Mahdavi et al. (2012) proposes a primal-dual gradient-based algorithm that attains an $O(\sqrt{T})$ regret bound and $O(T^{\frac{3}{4}})$ constraint violation, and subsequent studies have improved both bounds. Jenatton et al. (2016) extends the algorithm to achieve an $O(T^{\max\{c,1-c\}})$ regret bound and $O(T^{1-\frac{c}{2}})$ constraint violation, where $c\in(0,1)$ is a user-determined parameter. Yu and Neely (2020) proposes a drift-plus-penalty based algorithm developed for stochastic optimization in dynamic queue networks (Neely 2022) and proves that it attains an $O(\sqrt{T})$ regret bound and an $O(1)$ constraint violation bound.

Yuan and Lamperski (2018) proposes a stricter notion of constraint violation, defined by $V^{\text{hard}}_{T}\coloneqq\sum_{t=1}^{T}\max\{g_{t}(\bm{x}_{t}),0\}$, so that strictly feasible solutions cannot cancel out the effect of violated constraints. This paradigm is later referred to as online convex optimization with hard constraints (Guo et al. 2022). In Yuan and Lamperski (2018), an algorithm that attains an $O(T^{\max\{c,1-c\}})$ regret bound and an $O(T^{1-\frac{c}{2}})$ constraint violation bound is proposed. Yi et al. (2021) improves on this with an algorithm that attains an $O(T^{\max\{c,1-c\}})$ regret bound and an $O(T^{\frac{1-c}{2}})$ constraint violation bound, and also considers the general dynamic regret bound. Guo et al. (2022) proposes an algorithm that rectifies the updated variables and penalty variables, and proves that it attains an $O(\sqrt{T})$ regret bound and $O(T^{\frac{3}{4}})$ constraint violation for convex loss functions.

In the partial information setting, a learner has only limited access to the loss functions and thus cannot build an algorithm directly on their gradients. Flaxman et al. (2005) considers a one-point feedback model, where only a single function value is available per round, and constructs an unbiased estimator of the gradient of a smoothed loss function. Employing this gradient estimator in the online gradient descent algorithm (Zinkevich 2003), they prove an $O(d^{\frac{2}{3}}T^{\frac{2}{3}})$ regret bound. Another variant of the feedback model is multi-point feedback, where the learner is allowed to query the function at multiple points in each round. Agarwal et al. (2010) and Nesterov and Spokoiny (2017) consider the two-point feedback model and establish an $O(d^{2}\sqrt{T})$ regret bound for convex loss functions.

Table 1: Regret and cumulative constraint violation bounds for bandit convex optimization with constraints. The column "Metric" stands for the metric of constraint violation.

| Reference | Bandit | Metric | Loss | Regret | Violation |
| --- | --- | --- | --- | --- | --- |
| Flaxman et al. (2005) | $\checkmark$ | | convex | $O(d^{\frac{2}{3}}T^{\frac{2}{3}})$ | |
| Agarwal et al. (2010) | $\checkmark$ | | convex | $O(d^{2}\sqrt{T})$ | |
| Agarwal et al. (2010) | $\checkmark$ | | str.-convex | $O(d^{2}\log T)$ | |
| Mahdavi et al. (2012) | $\checkmark$ | soft | convex | $O(\sqrt{T})$ | $O(T^{\frac{3}{4}})$ |
| Guo et al. (2022) | | hard | convex | $O(\sqrt{T})$ | $O(T^{\frac{3}{4}})$ |
| Guo et al. (2022) | | hard | str.-convex | $O(\log T)$ | $O(\sqrt{T(1+\log T)})$ |
| This work | $\checkmark$ | hard | convex | $O(d^{2}T^{\max\{c,1-c\}})$ | $O(d^{2}T^{1-\frac{c}{2}})$ |
| This work | $\checkmark$ | hard | str.-convex | $O(d^{2}\log T)$ | $O(d^{2}\sqrt{T(1+\log T)})$ |

1.2 Contribution

This paper focuses on multi-point feedback BCO with constraints, in which the constraint violation is evaluated in terms of cumulative hard constraint violation. We propose an algorithm (Algorithm 1) for this problem and show that it attains an $O(d^{2}T^{\max\{c,1-c\}})$ regret bound and an $O(d^{2}T^{1-\frac{c}{2}})$ cumulative hard constraint violation bound, where $c\in[\frac{1}{2},1)$ is a user-defined parameter (Theorem 1 and Theorem 2). By setting $c=\frac{1}{2}$, the algorithm attains an $O(d^{2}\sqrt{T})$ regret bound and an $O(d^{2}T^{\frac{3}{4}})$ constraint violation bound, which is compatible with prior work on constrained online convex optimization with full information (Yi et al. 2022, Guo et al. 2022). We also show that the regret and constraint violation bounds reduce to $O(d^{2}\log T)$ and $O(d^{2}\sqrt{T(1+\log T)})$, respectively, when the loss functions are strongly convex (Theorem 3 and Theorem 4). The comparison of this study with prior works is summarized in Table 1.

1.3 Organization

The rest of this paper is organized as follows. In Section 2, we introduce the necessary preliminaries of BCO with constraints. Section 3 presents the proposed algorithm for BCO with constraints under two-point bandit feedback. In Section 4, we provide a theoretical analysis of the regret and hard constraint violation bounds for both convex and strongly convex loss functions. Finally, Section 5 concludes the present paper and discusses future work.

2 Preliminaries

2.1 Notation

For a vector $\bm{x}=(x_{1},x_{2},\dots,x_{d})^{\top}\in\mathbb{R}^{d}$, let $\norm{\bm{x}}_{2}$ be the $\ell_{2}$-norm of $\bm{x}$, i.e., $\norm{\bm{x}}_{2}=\sqrt{\bm{x}^{\top}\bm{x}}=\sqrt{\sum_{i=1}^{d}x_{i}^{2}}$. Let $\left<\bm{x},\bm{y}\right>$ be the inner product of two vectors $\bm{x}$ and $\bm{y}$. Let $\mathbb{B}^{d}$ and $\mathbb{S}^{d}$ denote the $d$-dimensional Euclidean unit ball and unit sphere, and let $\bm{v}\in\mathbb{B}^{d}$ and $\bm{u}\in\mathbb{S}^{d}$ denote random variables sampled uniformly from $\mathbb{B}^{d}$ and $\mathbb{S}^{d}$, respectively. For a scalar $z\in\mathbb{R}$, we denote $[z]_{+}\coloneqq\max\{z,0\}$. For a Lipschitz continuous function $f:\mathbb{R}^{d}\to\mathbb{R}$, let $\operatorname{lip}(f)>0$ be the Lipschitz constant of $f$. We use $[T]$ as a shorthand for the set of positive integers $\{1,2,\dots,T\}$. Finally, we use the notation $\mathbb{E}_{t}$ for the expectation conditioned on all randomness in the first $t-1$ rounds.

2.2 Assumptions

Following prior works on constrained OCO (Mahdavi et al. 2012, Guo et al. 2022), we make the following standard assumptions on the feasible region, loss functions, and constraint functions.

Assumption 1 (Bounded domain).

The feasible region $\mathcal{X}\subseteq\mathbb{R}^{d}$ is a non-empty bounded closed convex set such that $\norm{\bm{x}-\bm{y}}_{2}\leq D$ holds for any $\bm{x},\bm{y}\in\mathcal{X}$.

Assumption 2 (Convexity and Lipschitz continuity of loss functions).

The loss function $f_{t}:\mathcal{X}\to\mathbb{R}$ is convex and Lipschitz continuous with Lipschitz constant $F_{t}>0$ on $\mathcal{X}$; that is, we have

\absolutevalue{f_{t}(\bm{x})-f_{t}(\bm{y})}\leq F_{t}\norm{\bm{x}-\bm{y}}_{2},

for any $\bm{x},\bm{y}\in\mathcal{X}$ and for any $t\in[T]$. For simplicity, we define $F\coloneqq\max_{t\in[T]}F_{t}$.

Assumption 3 (Convexity and Lipschitz continuity of constraint functions).

The constraint function $g_{t}:\mathcal{X}\to\mathbb{R}$ is convex and Lipschitz continuous with Lipschitz constant $G_{t}>0$ on $\mathcal{X}$; that is, we have

\absolutevalue{g_{t}(\bm{x})-g_{t}(\bm{y})}\leq G_{t}\norm{\bm{x}-\bm{y}}_{2},

for any $\bm{x},\bm{y}\in\mathcal{X}$ and for any $t\in[T]$. For simplicity, we define $G\coloneqq\max_{t\in[T]}G_{t}$.

2.3 Offline constrained OCO

With full knowledge of the loss functions $\{f_{t}(\bm{x})\}_{t=1}^{T}$ and constraint functions $\{g_{t}(\bm{x})\}_{t=1}^{T}$ over all rounds, the offline constrained OCO is formulated as the following convex optimization problem:

\min_{\bm{x}\in\mathcal{X}}\quad\sum_{t=1}^{T}f_{t}(\bm{x}) \quad (1a)
\text{subject to}\quad g_{t}(\bm{x})\leq 0\qquad\forall t\in[T], \quad (1b)

where $\mathcal{X}$ is assumed to be a simple convex set (e.g., a Euclidean ball or the probability simplex) for which the projection onto $\mathcal{X}$ is efficiently computable.

For simplicity of the theoretical analysis, the present paper considers the case of a single constraint function. The study is easily extended to the case of multiple constraint functions $g_{t}^{(i)}(\bm{x})\leq 0$ $(i\in[m])$ by defining $g_{t}(\bm{x})\coloneqq\max_{i\in[m]}g^{(i)}_{t}(\bm{x})$, because the maximum of finitely many convex functions is also convex. A sketch of the offline benchmark computation is given below.
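As an illustration, the following sketch solves the offline benchmark (1) on a toy instance. The quadratic losses, the linear constraints, the Euclidean-ball domain, and the use of scipy's SLSQP solver are all assumptions made for the example, not part of the paper's setting; the constraints are passed individually, which is equivalent to the single max-constraint but keeps each function smooth for the solver.

```python
# A sketch of the offline benchmark (1): minimize the cumulative loss over
# X = {x : ||x||_2 <= R} subject to g_t(x) <= 0 for all t. The losses
# f_t(x) = ||x - a_t||^2 / 2 and constraints g_t(x) = <b_t, x> - 0.5 are
# illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
d, T, R = 5, 20, 1.0
A = rng.normal(size=(T, d))
B = rng.normal(size=(T, d))

total_loss = lambda x: 0.5 * np.sum((x[None, :] - A) ** 2)

# SLSQP uses the convention fun(x) >= 0 for inequality constraints.
cons = [{"type": "ineq", "fun": lambda x, b=b: 0.5 - b @ x} for b in B]
cons.append({"type": "ineq", "fun": lambda x: R**2 - x @ x})  # ball domain

res = minimize(total_loss, 0.1 * np.ones(d), method="SLSQP", constraints=cons)
x_star = res.x  # comparator used in the regret definition (2)
```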

2.4 Performance metrics

Let $\{\bm{x}_{t}\}_{t=1}^{T}\subseteq\mathcal{X}$ be a sequence of decisions generated by some OCO algorithm (e.g., the Online Gradient Descent method (Zinkevich 2003)). Relative to the setting in which all loss functions $\{f_{t}(\bm{x})\}_{t=1}^{T}$ and constraint functions $\{g_{t}(\bm{x})\}_{t=1}^{T}$ are known in advance, the regret and cumulative hard constraint violation are defined as follows:

R_{T}\coloneqq\sum_{t=1}^{T}f_{t}(\bm{x}_{t})-\sum_{t=1}^{T}f_{t}(\bm{x}^{\star}), \quad (2)
V_{T}\coloneqq\sum_{t=1}^{T}\quantity[g_{t}(\bm{x}_{t})]_{+}=\sum_{t=1}^{T}\max\quantity{g_{t}(\bm{x}_{t}),0}, \quad (3)

where $\bm{x}^{\star}\in\mathcal{X}$ is an optimal solution to the offline constrained OCO formulated as Eq. (1). The objective of the learner is to generate a sequence of decisions that attains a sub-linear growth of both regret and cumulative constraint violation, that is, $\limsup_{T\to\infty}\frac{R_{T}}{T}\leq 0$ and $\limsup_{T\to\infty}\frac{V_{T}}{T}\leq 0$.
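For concreteness, the two metrics (2) and (3) can be evaluated from a decision history as in the short sketch below; the names `fs`, `gs`, `xs`, and `x_star` are hypothetical placeholders for the per-round loss and constraint callables, the decisions, and the offline comparator (e.g., obtained as in Section 2.3).

```python
# Evaluate regret (2) and cumulative hard violation (3) for a history.
def regret(fs, xs, x_star):
    return sum(f(x) for f, x in zip(fs, xs)) - sum(f(x_star) for f in fs)

def hard_violation(gs, xs):
    return sum(max(g(x), 0.0) for g, x in zip(gs, xs))
```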

2.5 Gradient estimator

In the partial information setting where only limited feedback is available to the learner, we follow the prior works (Flaxman et al. 2005, Agarwal et al. 2010, Zhao et al. 2021) and estimate gradients from function values. The following result guarantees that the one-point gradient estimator is an unbiased estimator of the gradient of a smoothed loss function.

Lemma 1.

(Zhao et al. 2021: Lemma 1) For any convex function $f:\mathcal{X}\to\mathbb{R}$, define its smoothed version $\widehat{f}(\bm{x})=\mathbb{E}_{\bm{v}\in\mathbb{B}^{d}}[f(\bm{x}+\delta\bm{v})]$, where the expectation is taken over the random vector $\bm{v}$ sampled uniformly from the unit ball $\mathbb{B}^{d}\coloneqq\quantity{\bm{x}\in\mathbb{R}^{d}\mid\norm{\bm{x}}_{2}\leq 1}$. Then, for any $\delta>0$, we have

\mathbb{E}_{\bm{u}\in\mathbb{S}^{d}}\quantity[\frac{d}{\delta}f(\bm{x}+\delta\bm{u})\bm{u}]=\nabla\widehat{f}(\bm{x}),

where the expectation is taken over the random vector $\bm{u}$ sampled uniformly from the unit sphere centered at the origin, $\mathbb{S}^{d}\coloneqq\quantity{\bm{x}\in\mathbb{R}^{d}\mid\norm{\bm{x}}_{2}=1}$.

Proof.

See Flaxman et al. (2005: Lemma 2.1). ∎

Moreover, as shown in Shamir (2017: Lemma 8), for any convex function $f:\mathcal{X}\to\mathbb{R}$ and its smoothed version $\widehat{f}$, we have

\sup_{\bm{x}\in\mathcal{X}}\absolutevalue{\widehat{f}(\bm{x})-f(\bm{x})}\leq\delta\operatorname{lip}(f). \quad (4)
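As a quick illustration of Lemma 1, the following Monte Carlo sketch checks the unbiasedness of the one-point estimator for the quadratic $f(\bm{x})=\frac{1}{2}\norm{\bm{x}}_{2}^{2}$, for which smoothing only adds a constant and hence $\nabla\widehat{f}(\bm{x})=\bm{x}$ exactly. The quadratic choice and the sample sizes are assumptions made for this check.

```python
# Monte Carlo check of Lemma 1 for f(x) = ||x||^2 / 2. Smoothing a
# quadratic only shifts it by a constant, so grad f_hat(x) = x, and the
# averaged one-point estimator (d / delta) f(x + delta u) u should match x.
import numpy as np

rng = np.random.default_rng(1)
d, delta, n = 4, 0.5, 500_000
x = rng.normal(size=d)

u = rng.normal(size=(n, d))
u /= np.linalg.norm(u, axis=1, keepdims=True)  # uniform on the unit sphere

f_vals = 0.5 * np.sum((x + delta * u) ** 2, axis=1)
est = (d / delta) * np.mean(f_vals[:, None] * u, axis=0)

print(np.max(np.abs(est - x)))  # small, up to Monte Carlo error
```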

The present study considers a two-point feedback model in which the learner is allowed to query two points in each round. Specifically, at round $t\in[T]$, the learner queries the two points $\bm{x}_{t}+\delta\bm{u}_{t}$ and $\bm{x}_{t}-\delta\bm{u}_{t}$ around the decision $\bm{x}_{t}$, where $\delta>0$ is a perturbation parameter and $\bm{u}_{t}$ is a random unit vector sampled uniformly from the unit sphere $\mathbb{S}^{d}$. With these two points, the gradient estimator of the function $f_{t}$ at $\bm{x}_{t}$ is given by

\widetilde{\nabla}f_{t}\coloneqq\frac{d}{2\delta}\quantity[f_{t}(\bm{x}_{t}+\delta\bm{u}_{t})-f_{t}(\bm{x}_{t}-\delta\bm{u}_{t})]\bm{u}_{t}, \quad (5)

where $d$ is the dimensionality of the domain $\mathcal{X}\subseteq\mathbb{R}^{d}$. As shown in Agarwal et al. (2010), $\widetilde{\nabla}f_{t}$ is norm bounded: we have $\|\widetilde{\nabla}f_{t}\|_{2}\leq\frac{d}{2\delta}\operatorname{lip}(f_{t})\norm{2\delta\bm{u}_{t}}_{2}\leq\operatorname{lip}(f_{t})d$, where the first inequality holds by the Lipschitz continuity of $f_{t}$.

Lemma 1 implies that the gradient estimator $\widetilde{\nabla}f_{t}$ is an unbiased estimator of $\nabla\widehat{f}_{t}(\bm{x}_{t})$, i.e., $\mathbb{E}_{\bm{u}\in\mathbb{S}^{d}}[\widetilde{\nabla}f_{t}]=\nabla\widehat{f}_{t}(\bm{x}_{t})$, where $\widehat{f}_{t}(\bm{x}_{t})=\mathbb{E}_{\bm{v}\in\mathbb{B}^{d}}[f_{t}(\bm{x}_{t}+\delta\bm{v})]$ is the smoothed version of the original function $f_{t}$. This property holds because the distribution of the perturbation $\bm{u}_{t}$ in Eq. (5) is symmetric.
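A minimal implementation of the estimator in Eq. (5) is sketched below; the helper name and the use of a normalized Gaussian to sample uniformly from the sphere are incidental choices.

```python
# Two-point gradient estimator of Eq. (5). By Lipschitz continuity of f,
# |f(x + delta*u) - f(x - delta*u)| <= 2 * delta * lip(f), so the estimate
# has norm at most d * lip(f) regardless of delta.
import numpy as np

def two_point_gradient(f, x, delta, rng):
    d = x.shape[0]
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)  # uniform direction on the unit sphere S^d
    return (d / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u
```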

3 Proposed Algorithm

This section presents the proposed algorithm for solving constrained BCO with two-point feedback. The procedure is shown in Algorithm 1; the algorithm is motivated by the work of Guo et al. (2022), and its design is related to the penalty-based proximal gradient descent method (Cheung and Lou 2017). At round $t\in[T]$, Algorithm 1 finds the decision vector $\bm{x}_{t+1}$ by solving the following strongly convex optimization problem:

\bm{x}_{t+1}=\arg\min_{\bm{x}\in(1-\xi)\mathcal{X}}\quantity{f_{t}(\bm{x}_{t})+\widetilde{\nabla}f_{t}^{\top}(\bm{x}-\bm{x}_{t})+\lambda_{t}\widehat{g}^{+}_{t}(\bm{x})+\frac{\alpha_{t}}{2}\norm{\bm{x}-\bm{x}_{t}}^{2}_{2}}, \quad (6)

where $\lambda_{t}$ is the penalty variable controlling the quality of the decision, $\widehat{g}^{+}_{t}(\bm{x})\coloneqq\gamma_{t}[g_{t}(\bm{x})]_{+}$, $\xi>0$ is the shrinkage constant, and $\alpha_{t}>0,\gamma_{t}>0$ are predetermined learning rates. Note that the optimization problem on the right-hand side (r.h.s.) of Eq. (6) is strongly convex due to the $\ell_{2}$ regularization term, and hence the optimal solution $\bm{x}_{t+1}$ exists and is unique. As in Mahdavi et al. (2012), we optimize the r.h.s. of Eq. (6) over the domain $(1-\xi)\mathcal{X}$ to ensure that the two randomized points around $\bm{x}_{t}$ lie inside the feasible region $\mathcal{X}$: as shown in Flaxman et al. (2005), for any $\bm{x}\in(1-\xi)\mathcal{X}$ and any unit vector $\bm{u}\in\mathbb{S}^{d}$, we have $\bm{x}\pm\delta\bm{u}\in\mathcal{X}$.

At round $t$, when we find the decision $\bm{x}_{t+1}\in\mathcal{X}$, we do not have prior knowledge of the loss function $f_{t+1}(\bm{x})$ to be minimized, so we estimate the loss by the first-order approximation at the previous decision $\bm{x}_{t}$, namely $\widetilde{f}_{t+1}(\bm{x})=f_{t}(\bm{x}_{t})+\left<\nabla f_{t}(\bm{x}_{t}),\bm{x}-\bm{x}_{t}\right>$. At the same time, we have no full information about the loss function $f_{t}(\bm{x})$ and hence cannot access its gradient $\nabla f_{t}(\bm{x})$, so we estimate the gradient by $\widetilde{\nabla}f_{t}$ using two points (line 5). To prevent the constraint from being severely violated, we also introduce the rectified Lagrange multiplier $\lambda_{t}$ associated with the functional constraint $g_{t}(\bm{x})\leq 0$ and add the penalty term $\lambda_{t}\widehat{g}_{t}^{+}(\bm{x})$ to the objective function of Eq. (6), which approximates the original penalty term $\theta_{t}g_{t}(\bm{x})$, where $\theta_{t}$ is the Lagrangian multiplier associated with the constraint $g_{t}(\bm{x})\leq 0$. We also add the $\ell_{2}$ regularization term $\frac{\alpha_{t}}{2}\norm{\bm{x}-\bm{x}_{t}}^{2}_{2}$ to stabilize the optimization.

We now describe in more detail the role of the penalty parameter $\lambda_{t}$ and its update rule. The penalty parameter $\lambda_{t}$ is related to the Lagrangian multiplier $\theta_{t}$ associated with the functional constraint $g_{t}(\bm{x})\leq 0$, but is slightly different because we have no prior knowledge of the constraint functions when making a decision. Instead, we replace the original Lagrangian multiplier with $\lambda_{t}$ such that $\lambda_{t}\widehat{g}^{+}_{t}(\bm{x})$ approximates $\theta_{t}g_{t}(\bm{x})$. We update the penalty parameter (line 9) as $\lambda_{t+1}=\max\{\lambda_{t}+\gamma_{t+1}[g_{t+1}(\bm{x}_{t})]_{+},\eta_{t+1}\}$, where the first argument of the maximum operator is the sum of the old $\lambda_{t}$ and the rectified constraint value $\gamma_{t+1}[g_{t+1}(\bm{x}_{t})]_{+}$, and the second argument is a user-determined constant $\eta_{t+1}$ that imposes a minimum penalty. This update rule prevents the decision obtained by solving problem (6) from being overly aggressive, which would lead to large constraint violations.

Algorithm 1 A Rectified Bandit Convex Optimization with Hard Constraints under Two-Point Bandit Feedback
0: Input: total number of rounds $T$, learning rates $\{\alpha_{t}\}_{t=1}^{T},\{\gamma_{t}\}_{t=1}^{T},\{\eta_{t}\}_{t=1}^{T}\subseteq\mathbb{R}_{>0}$, shrinkage parameter $\xi>0$, and perturbation parameter $\delta>0$.
1: Initialization: $\bm{x}_{1}\in\mathcal{X}$, $\lambda_{1}=0$, and set $\widehat{g}^{+}_{1}(\bm{x})=\gamma_{1}[g_{1}(\bm{x})]_{+}$.
2: for $t=1,2,\dots,T$ do
3:   Draw a unit vector $\bm{u}_{t}$ from $\mathbb{S}^{d}$ uniformly at random.
4:   Query $f_{t}(\bm{x})$ at the two points $\bm{x}_{t}+\delta\bm{u}_{t}$ and $\bm{x}_{t}-\delta\bm{u}_{t}$.
5:   Compute $\widetilde{\nabla}f_{t}=\frac{d}{2\delta}\quantity[f_{t}(\bm{x}_{t}+\delta\bm{u}_{t})-f_{t}(\bm{x}_{t}-\delta\bm{u}_{t})]\bm{u}_{t}$.
6:   Find the optimal solution $\bm{x}_{t+1}$ of the optimization problem (6).
7:   Submit $\bm{x}_{t+1}$, incur the loss $f_{t+1}(\bm{x}_{t+1})$, and observe the constraint function $g_{t+1}(\bm{x})$.
8:   Set $\widehat{g}^{+}_{t+1}(\bm{x})=\gamma_{t+1}[g_{t+1}(\bm{x})]_{+}$.
9:   Update the penalty variable as $\lambda_{t+1}=\max\{\lambda_{t}+\gamma_{t+1}[g_{t+1}(\bm{x}_{t})]_{+},\eta_{t+1}\}$.
10: end for
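To make the per-round update concrete, the following sketch runs Algorithm 1 on a toy instance. The linear losses and constraints, the Euclidean-ball domain, the parameter values, and the inner projected-subgradient solver used to approximate the subproblem (6) are all illustrative assumptions; in practice, any solver for the strongly convex problem (6) can be substituted.

```python
import numpy as np

rng = np.random.default_rng(2)
d, T, R, xi = 5, 500, 1.0, 0.05      # dimension, rounds, ball radius, shrinkage
c, eps = 0.5, 0.1                    # user parameters as in Theorem 1
delta = 1.0 / T                      # perturbation parameter

a = rng.normal(size=(T + 2, d))      # assumed losses f_t(x) = <a_t, x>
b = rng.normal(size=(T + 2, d))      # assumed constraints g_t(x) = <b_t, x> - 0.2
f = lambda t, x: a[t] @ x
g = lambda t, x: b[t] @ x - 0.2

def project(x):
    """Euclidean projection onto the shrunk ball (1 - xi) * X."""
    r = (1 - xi) * R
    n = np.linalg.norm(x)
    return x if n <= r else x * (r / n)

def solve_subproblem(grad_est, x_t, lam, gamma, alpha, t, iters=50):
    """Approximate the minimizer of (6) by projected subgradient descent."""
    x = x_t.copy()
    for k in range(1, iters + 1):
        sub = grad_est + alpha * (x - x_t)
        if g(t, x) > 0:              # subgradient of lam * gamma * [g_t]_+
            sub = sub + lam * gamma * b[t]
        x = project(x - sub / (alpha * k))  # 1/(alpha*k) step for strong convexity
    return x

x, lam = project(rng.normal(size=d)), 0.0
for t in range(1, T + 1):
    alpha, gamma = t ** c, t ** (c + eps)        # lines of Theorem 1
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                       # line 3: direction on S^d
    grad_est = (d / (2 * delta)) * (f(t, x + delta * u) - f(t, x - delta * u)) * u
    x_next = solve_subproblem(grad_est, x, lam, gamma, alpha, t)
    eta_next, gamma_next = (t + 1) ** c, (t + 1) ** (c + eps)
    lam = max(lam + gamma_next * max(g(t + 1, x), 0.0), eta_next)  # line 9
    x = x_next
```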

4 Theoretical Analysis

This section provides the theoretical analysis of Algorithm 1. To facilitate the analysis, let $h_{t}:\mathcal{X}\to\mathbb{R}$ be the function defined by

h_{t}(\bm{x})\coloneqq\widehat{f}_{t}(\bm{x})+\left<\widetilde{\nabla}f_{t}-\nabla\widehat{f}_{t}(\bm{x}_{t}),\bm{x}\right>, \quad (7)

where $\widehat{f}_{t}(\bm{x})=\mathbb{E}_{\bm{v}\in\mathbb{B}^{d}}[f_{t}(\bm{x}+\delta\bm{v})]$ and $\widetilde{\nabla}f_{t}$ is defined as in Eq. (5). It is easily seen that $\nabla h_{t}(\bm{x}_{t})=\widetilde{\nabla}f_{t}$ holds, and hence $\norm{\nabla h_{t}(\bm{x}_{t})}_{2}=\|\widetilde{\nabla}f_{t}\|_{2}\leq d\operatorname{lip}(f_{t})$. Moreover, the function $h_{t}$ defined by Eq. (7) is convex and Lipschitz continuous with Lipschitz constant $\operatorname{lip}(h_{t})=3d\operatorname{lip}(f_{t})$ on $\mathcal{X}$, because for any $\bm{x},\bm{y}\in\mathcal{X}$, we have

\absolutevalue{h_{t}(\bm{x})-h_{t}(\bm{y})}\leq\absolutevalue{\widehat{f}_{t}(\bm{x})-\widehat{f}_{t}(\bm{y})}+\absolutevalue{\left<\widetilde{\nabla}f_{t}-\nabla\widehat{f}_{t}(\bm{x}_{t}),\bm{x}-\bm{y}\right>}
\leq\operatorname{lip}(\widehat{f}_{t})\norm{\bm{x}-\bm{y}}_{2}+\quantity(\|\widetilde{\nabla}f_{t}\|_{2}+\|\nabla\widehat{f}_{t}(\bm{x}_{t})\|_{2})\norm{\bm{x}-\bm{y}}_{2}
\leq\operatorname{lip}(\widehat{f}_{t})\norm{\bm{x}-\bm{y}}_{2}+\quantity(d\operatorname{lip}(f_{t})+\operatorname{lip}(\widehat{f}_{t}))\norm{\bm{x}-\bm{y}}_{2}\leq 3d\operatorname{lip}(f_{t})\norm{\bm{x}-\bm{y}}_{2},

where the first inequality follows from the triangle inequality, the second from the Cauchy-Schwarz inequality, the third from the norm bound $\|\widetilde{\nabla}f_{t}\|_{2}\leq d\operatorname{lip}(f_{t})$ together with $\norm{\nabla f(\bm{x})}_{2}\leq\operatorname{lip}(f)$ for any Lipschitz continuous function $f$ and any $\bm{x}\in\mathcal{X}$, and the last from $\operatorname{lip}(\widehat{f}_{t})=\operatorname{lip}(f_{t})$.

To prove that Algorithm 1 attains sub-linear bounds for both regret and cumulative hard constraint violation, we first state the following well-known property of strongly convex functions.

Lemma 2.

(Nesterov et al. 2018: Theorem 2.1.8) Let $\mathcal{X}\subseteq\mathbb{R}^{d}$ be a convex set, let $f:\mathcal{X}\to\mathbb{R}$ be a strongly convex function with modulus $\sigma$ on $\mathcal{X}$, and let $\bm{x}^{\star}\in\mathcal{X}$ be an optimal solution of $f$, that is, $\bm{x}^{\star}=\arg\min_{\bm{x}\in\mathcal{X}}f(\bm{x})$. Then, $f(\bm{x})\geq f(\bm{x}^{\star})+\frac{\sigma}{2}\norm{\bm{x}-\bm{x}^{\star}}^{2}_{2}$ holds for any $\bm{x}\in\mathcal{X}$.

Proof.

By the definition of strong convexity of $f$, for any $\bm{x},\bm{y}\in\mathcal{X}$, we have

f(\bm{x})\geq f(\bm{y})+\left<\nabla f(\bm{y}),\bm{x}-\bm{y}\right>+\frac{\sigma}{2}\norm{\bm{x}-\bm{y}}^{2}_{2}. \quad (8)

Plugging the optimal solution $\bm{x}^{\star}\in\mathcal{X}$ into $\bm{y}$ in inequality (8), we have

f(\bm{x})\geq f(\bm{x}^{\star})+\left<\nabla f(\bm{x}^{\star}),\bm{x}-\bm{x}^{\star}\right>+\frac{\sigma}{2}\norm{\bm{x}-\bm{x}^{\star}}^{2}_{2}\geq f(\bm{x}^{\star})+\frac{\sigma}{2}\norm{\bm{x}-\bm{x}^{\star}}^{2}_{2},

where the last inequality holds by the first-order optimality condition $\left<\nabla f(\bm{x}^{\star}),\bm{x}-\bm{x}^{\star}\right>\geq 0$. ∎

The following two lemmas play an important role in proving the main theorems (Theorem 1 and Theorem 2). The first (Lemma 3) is an inequality involving the update rule of Algorithm 1, and the second (Lemma 4) characterizes the relationship between the current solution $\bm{x}_{t}$ in Algorithm 1 and an optimal solution of the offline optimization problem (1).

Lemma 3.

(Guo et al. 2022: Lemma 5) Let $\varphi_{t}:\mathcal{X}\to\mathbb{R}$ be the function defined by

\varphi_{t}(\bm{x})\coloneqq f_{t}(\bm{x}_{t})+\left<\nabla f_{t}(\bm{x}_{t}),\bm{x}-\bm{x}_{t}\right>+\lambda_{t}\widehat{g}^{+}_{t}(\bm{x})+\frac{\alpha_{t}}{2}\norm{\bm{x}-\bm{x}_{t}}^{2}_{2}, \quad (9)

where $\widehat{g}^{+}_{t}(\bm{x})\coloneqq\gamma_{t}[g_{t}(\bm{x})]_{+}$ and $\alpha_{t}>0,\gamma_{t}>0$ are predetermined learning rates. Let $\bm{x}_{t+1}$ be the solution returned by Algorithm 1 when the gradient $\nabla f_{t}(\bm{x})$ is accessible, that is, $\bm{x}_{t+1}=\arg\min_{\bm{x}\in\mathcal{X}}\varphi_{t}(\bm{x})$. Then, for any $\bm{x}\in\mathcal{X}$, we have

f_{t}(\bm{x}_{t})+\left<\nabla f_{t}(\bm{x}_{t}),\bm{x}_{t+1}-\bm{x}_{t}\right>+\lambda_{t}\widehat{g}^{+}_{t}(\bm{x}_{t+1})+\frac{\alpha_{t}}{2}\norm{\bm{x}_{t+1}-\bm{x}_{t}}^{2}_{2}\leq f_{t}(\bm{x}_{t})+\left<\nabla f_{t}(\bm{x}_{t}),\bm{x}-\bm{x}_{t}\right>+\lambda_{t}\widehat{g}^{+}_{t}(\bm{x})+\frac{\alpha_{t}}{2}\norm{\bm{x}-\bm{x}_{t}}^{2}_{2}-\frac{\alpha_{t}}{2}\norm{\bm{x}-\bm{x}_{t+1}}^{2}_{2}. \quad (10)
Proof.

Since $\varphi_{t}$ is a strongly convex function with modulus $\alpha_{t}$, we can apply Lemma 2 to $\varphi_{t}$. Thus, we have $\varphi_{t}(\bm{x}_{t+1})\leq\varphi_{t}(\bm{x})-\frac{\alpha_{t}}{2}\norm{\bm{x}-\bm{x}_{t+1}}^{2}_{2}$ for any $\bm{x}\in\mathcal{X}$, which completes the proof. ∎

Lemma 4 (Self-bounding Property).

(Guo et al. 2022: Lemma 1) Let $f_{t}:\mathcal{X}\to\mathbb{R}$ be a convex function satisfying Assumption 2. Let $\bm{x}^{\star}\in\mathcal{X}$ be any optimal solution to the offline constrained OCO of Eq. (1) and let $\bm{x}_{t}\in\mathcal{X}$ be the solution maintained by Algorithm 1. Then, we have

f_{t}(\bm{x}_{t})-f_{t}(\bm{x}^{\star})+\lambda_{t}\widehat{g}^{+}_{t}(\bm{x}_{t+1})\leq\frac{F_{t}^{2}}{4\alpha_{t}}+\frac{\alpha_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}-\frac{\alpha_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t+1}}^{2}_{2}, \quad (11)

where $\widehat{g}^{+}_{t}(\bm{x})\coloneqq\gamma_{t}[g_{t}(\bm{x})]_{+}$ and $\alpha_{t}>0,\gamma_{t}>0$ are predetermined learning rates.

Proof.

See Guo et al. (2022: Lemma 1). ∎

We are now ready to prove the main results, which state that Algorithm 1 achieves sub-linear bounds for both the regret (2) and the cumulative hard constraint violation (3). We first consider the case where the loss functions are convex and the constraint functions are fixed throughout all rounds.

4.1 Convex loss function case

Theorem 1.

Let $\{\bm{x}_{t}\}_{t=1}^{T}$ be a sequence of decisions generated by Algorithm 1 and let $\bm{x}^{\star}\in\mathcal{X}$ be an optimal solution to the offline OCO of Eq. (1). Assume that the constraint functions are fixed, that is, $g_{t}(\bm{x})=g(\bm{x})$ for all $t\in[T]$. Define $\alpha_{t}\coloneqq t^{c}$, $\gamma_{t}\coloneqq t^{c+\varepsilon}$, $\eta_{t}\coloneqq t^{c}$, and $\delta\coloneqq\frac{1}{T}$, where $c\in[\frac{1}{2},1)$ and $\varepsilon>0$. Under Assumptions 1, 2 and 3, we have

\sum_{t=1}^{T}\quantity[f_{t}(\bm{x}_{t})-f_{t}(\bm{x}^{\star})]\leq\quantity(\frac{9F^{2}d^{2}}{4(1-c)}+\frac{D^{2}}{2}+2F)T^{\max\{c,1-c\}}=O(d^{2}T^{\max\{c,1-c\}}), \quad (12)
\sum_{t=1}^{T}[g_{t}(\bm{x}_{t})]_{+}\leq\frac{27F^{2}d^{2}}{4}+\frac{3FdD(1+\varepsilon)}{\varepsilon}+D^{2}=O(d^{2}). \quad (13)
Proof.

Similar to the argument in Flaxman et al. (2005) and Agarwal et al. (2010), let $\bm{\xi}_{t}\coloneqq\widetilde{\nabla}f_{t}-\nabla\widehat{f}_{t}(\bm{x}_{t})$. Then we have $\mathbb{E}_{t}[\bm{\xi}_{t}]=\bm{0}$ from Lemma 1, and thus $\mathbb{E}_{t}[\bm{\xi}_{t}^{\top}\bm{x}]=0$ for any fixed $\bm{x}\in\mathcal{X}$. Therefore, for any fixed $\bm{x}\in\mathcal{X}$, we have

\mathbb{E}_{t}[h_{t}(\bm{x})]=\mathbb{E}_{t}\quantity[\widehat{f}_{t}(\bm{x})]+\mathbb{E}_{t}\quantity[\bm{\xi}_{t}^{\top}\bm{x}]=\widehat{f}_{t}(\bm{x}).

Part (i): Proof of Eq. (12)

Recall that the function $h_{t}$ is Lipschitz continuous with Lipschitz constant $\operatorname{lip}(h_{t})=3F_{t}d$. Applying Lemma 4 to the convex function $h_{t}$ defined by Eq. (7), for an optimal solution $\bm{x}^{\star}$ to the offline optimization problem (1), we have (with $\alpha_{0}\coloneqq 0$)

\sum_{t=1}^{T}\quantity[h_{t}(\bm{x}_{t})-h_{t}(\bm{x}^{\star})]\leq\sum_{t=1}^{T}\frac{\operatorname{lip}(h_{t})^{2}}{4\alpha_{t}}+\sum_{t=1}^{T}\quantity(\frac{\alpha_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}-\frac{\alpha_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t+1}}^{2}_{2})
\leq\frac{9F^{2}d^{2}}{4}\sum_{t=1}^{T}\frac{1}{\alpha_{t}}+\sum_{t=1}^{T}\quantity(\frac{\alpha_{t}}{2}-\frac{\alpha_{t-1}}{2})\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}-\frac{\alpha_{T}}{2}\norm{\bm{x}^{\star}-\bm{x}_{T+1}}^{2}_{2}
\leq\frac{9F^{2}d^{2}}{4}\sum_{t=1}^{T}\frac{1}{\alpha_{t}}+D^{2}\sum_{t=1}^{T}\quantity(\frac{\alpha_{t}}{2}-\frac{\alpha_{t-1}}{2}),

where the last inequality follows from Assumption 1. Plugging in $\alpha_{t}=t^{c}$, we have

\sum_{t=1}^{T}\quantity[h_{t}(\bm{x}_{t})-h_{t}(\bm{x}^{\star})]\leq\frac{9F^{2}d^{2}}{4}\cdot\frac{T^{1-c}}{1-c}+\frac{D^{2}}{2}T^{c}\leq\quantity(\frac{9F^{2}d^{2}}{4(1-c)}+\frac{D^{2}}{2})T^{\max\{c,1-c\}}.

Since we have $\mathbb{E}_{t}\quantity[h_{t}(\bm{x})]=\widehat{f}_{t}(\bm{x})$, by taking the expectation, we have

\sum_{t=1}^{T}\quantity[\widehat{f}_{t}(\bm{x}_{t})-\widehat{f}_{t}(\bm{x}^{\star})]\leq\quantity(\frac{9F^{2}d^{2}}{4(1-c)}+\frac{D^{2}}{2})T^{\max\{c,1-c\}}.

From inequality (4), for any optimal solution $\bm{x}^{\star}\in\mathcal{X}$ to the offline OCO (1), we have

f_{t}(\bm{x}_{t})-f_{t}(\bm{x}^{\star})\leq\widehat{f}_{t}(\bm{x}_{t})-\widehat{f}_{t}(\bm{x}^{\star})+2\delta F_{t},

for any $t\in[T]$. Therefore, we have

\sum_{t=1}^{T}\quantity[f_{t}(\bm{x}_{t})-f_{t}(\bm{x}^{\star})]\leq\sum_{t=1}^{T}\quantity[\widehat{f}_{t}(\bm{x}_{t})-\widehat{f}_{t}(\bm{x}^{\star})]+\sum_{t=1}^{T}2\delta F_{t}
\leq\quantity(\frac{9F^{2}d^{2}}{4(1-c)}+\frac{D^{2}}{2})T^{\max\{c,1-c\}}+2F
\leq\quantity(\frac{9F^{2}d^{2}}{4(1-c)}+\frac{D^{2}}{2}+2F)T^{\max\{c,1-c\}},

where the second inequality follows by plugging in $\delta=\frac{1}{T}$.

Part (ii): Proof of Eq. (13)

From Lemma 4 applied to the function $h_{t}$, for any optimal solution $\bm{x}^{\star}\in\mathcal{X}$ to the offline constrained OCO (1), we have

\lambda_{t}\widehat{g}^{+}_{t}(\bm{x}_{t+1})\leq\frac{\operatorname{lip}(h_{t})^{2}}{4\alpha_{t}}+\absolutevalue{h_{t}(\bm{x}_{t})-h_{t}(\bm{x}^{\star})}+\frac{\alpha_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}-\frac{\alpha_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t+1}}^{2}_{2}.

By the definition of $\widehat{g}^{+}_{t}$, i.e., $\widehat{g}^{+}_{t}(\bm{x})=\gamma_{t}[g_{t}(\bm{x})]_{+}$, dividing both sides by $\lambda_{t}\gamma_{t}$ yields

[g_{t}(\bm{x}_{t+1})]_{+}\leq\frac{9F_{t}^{2}d^{2}}{4\lambda_{t}\alpha_{t}\gamma_{t}}+\frac{\absolutevalue{h_{t}(\bm{x}_{t})-h_{t}(\bm{x}^{\star})}}{\lambda_{t}\gamma_{t}}+\frac{\alpha_{t}}{2\lambda_{t}\gamma_{t}}\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}-\frac{\alpha_{t}}{2\lambda_{t}\gamma_{t}}\norm{\bm{x}^{\star}-\bm{x}_{t+1}}^{2}_{2}
\leq\frac{9F_{t}^{2}d^{2}}{4t^{3c+\varepsilon}}+\frac{\absolutevalue{h_{t}(\bm{x}_{t})-h_{t}(\bm{x}^{\star})}}{t^{2c+\varepsilon}}+\frac{1}{t^{c+\varepsilon}}\quantity(\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}-\norm{\bm{x}^{\star}-\bm{x}_{t+1}}^{2}_{2}),

where the second inequality follows from $\lambda_{t}\geq\eta_{t}$ after plugging in $\alpha_{t}=\eta_{t}=t^{c}$ and $\gamma_{t}=t^{c+\varepsilon}$. Taking the summation over $t=1,2,\dots,T$, we have

\sum_{t=1}^{T}[g_{t}(\bm{x}_{t+1})]_{+}\leq\sum_{t=1}^{T}\frac{9F_{t}^{2}d^{2}}{4t^{3c+\varepsilon}}+\sum_{t=1}^{T}\frac{\absolutevalue{h_{t}(\bm{x}_{t})-h_{t}(\bm{x}^{\star})}}{t^{2c+\varepsilon}}+\sum_{t=1}^{T}\frac{1}{t^{c+\varepsilon}}\quantity(\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}-\norm{\bm{x}^{\star}-\bm{x}_{t+1}}^{2}_{2})
\leq\frac{27F^{2}d^{2}}{4}+\frac{3FdD(1+\varepsilon)}{\varepsilon}+D^{2},

where the second inequality holds by Lemma 5 in Appendix A, which completes the proof. ∎

Remark 1.

By setting the constant $c=\frac{1}{2}$, Algorithm 1 attains an $O(d^{2}\sqrt{T})$ regret bound. This bound matches prior work on unconstrained bandit convex optimization (Agarwal et al. 2010) and is compatible with the result for the full-information setting (Guo et al. 2022).

For the case where the constraint functions are time-varying, we can show the following result.

Theorem 2.

Let $\{\bm{x}_{t}\}_{t=1}^{T}$ be a sequence of decisions generated by Algorithm 1. Assume that the constraint functions $g_{t}(\bm{x})$ are time-varying. Define $\alpha_{t}\coloneqq t^{c}$, $\gamma_{t}\coloneqq t^{c+\varepsilon}$, $\eta_{t}\coloneqq t^{c}$, and $\delta\coloneqq\frac{1}{T}$, where $c\in[\frac{1}{2},1)$ and $\varepsilon>0$. Under Assumptions 1, 2 and 3, we have

\sum_{t=1}^{T}[g_{t}(\bm{x}_{t})]_{+}\leq\quantity(\frac{27F^{2}d^{2}+G^{2}}{4}+3FdD\quantity(8+\frac{1}{\varepsilon})+2D^{2})T^{1-\frac{c}{2}}=O(d^{2}T^{1-\frac{c}{2}}). \quad (14)
Proof.

By the convexity of $[g_{t}(\bm{x})]_{+}$ and Assumption 3, we have $[g_{t}(\bm{x}_{t})]_{+}-[g_{t}(\bm{x}_{t+1})]_{+}\leq\frac{G^{2}}{4\beta}+\beta\norm{\bm{x}_{t}-\bm{x}_{t+1}}^{2}_{2}$ for any $\beta>0$ (Guo et al. 2022: Lemma 2). Applying Lemma 4 to the function $h_{t}$ defined by Eq. (7), for any $\bm{x}^{\star}\in\mathcal{X}$, we have

\norm{\bm{x}_{t}-\bm{x}_{t+1}}^{2}_{2}\leq\frac{2}{\alpha_{t}}\quantity(h_{t}(\bm{x}^{\star})-h_{t}(\bm{x}_{t}))+\frac{2}{\alpha_{t}}\left<\nabla h_{t}(\bm{x}_{t}),\bm{x}_{t}-\bm{x}_{t+1}\right>+\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}-\norm{\bm{x}^{\star}-\bm{x}_{t+1}}^{2}_{2}.

Taking the summation over $t=1,2,\dots,T$, we have

\sum_{t=1}^{T}\norm{\bm{x}_{t}-\bm{x}_{t+1}}^{2}_{2}\leq\sum_{t=1}^{T}\frac{h_{t}(\bm{x}^{\star})-h_{t}(\bm{x}_{t})}{\frac{1}{2}\alpha_{t}}+\sum_{t=1}^{T}\frac{\left<\nabla h_{t}(\bm{x}_{t}),\bm{x}_{t}-\bm{x}_{t+1}\right>}{\frac{1}{2}\alpha_{t}}+\sum_{t=1}^{T}\quantity(\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}-\norm{\bm{x}^{\star}-\bm{x}_{t+1}}^{2}_{2})
\leq\sum_{t=1}^{T}\frac{2\operatorname{lip}(h_{t})D}{\frac{1}{2}\alpha_{t}}+\norm{\bm{x}^{\star}-\bm{x}_{1}}^{2}_{2}\leq\frac{12FdD}{1-c}T^{1-c}+D^{2},

where the last inequality holds by plugging in $\alpha_{t}=t^{c}$. Therefore, we have

\sum_{t=1}^{T}[g_{t}(\bm{x}_{t})]_{+}\leq\sum_{t=1}^{T}[g_{t}(\bm{x}_{t+1})]_{+}+\frac{G^{2}T}{4\beta}+\beta\sum_{t=1}^{T}\norm{\bm{x}_{t}-\bm{x}_{t+1}}^{2}_{2}
\leq\frac{27F^{2}d^{2}}{4}+\frac{3FdD(1+\varepsilon)}{\varepsilon}+D^{2}+\frac{G^{2}T}{4\beta}+\beta\quantity(\frac{12FdD}{1-c}T^{1-c}+D^{2})
\leq\frac{27F^{2}d^{2}}{4}+\frac{3FdD(1+\varepsilon)}{\varepsilon}+D^{2}+\quantity(\frac{G^{2}}{4}+24FdD+D^{2})T^{1-\frac{c}{2}},

where the second inequality follows from Eq. (13) in Theorem 1 and the last inequality holds by plugging in $\beta=T^{\frac{c}{2}}$, which completes the proof. ∎

Remark 2.

By setting the constant $c=\frac{1}{2}$, we obtain an $O(d^{2}T^{\frac{3}{4}})$ constraint violation bound, which is compatible with the result for the full-information case (Guo et al. 2022).

4.2 Strongly convex loss function case

We extend the results of the previous subsection to the case where the loss functions are strongly convex. We omit the proofs here since the techniques are similar to those of Theorem 1 and Theorem 2; the proofs are given in Appendix B and Appendix C. To discuss the strongly convex case, we make the following assumption on the loss functions.

Assumption 4 (Strong convexity of loss functions).

The loss function $f_{t}:\mathcal{X}\to\mathbb{R}$ is Lipschitz continuous with Lipschitz constant $F_{t}$ and strongly convex on $\mathcal{X}$ with modulus $\sigma_{t}>0$, i.e., we have

f_{t}(\bm{y})\geq f_{t}(\bm{x})+\left<\nabla f_{t}(\bm{x}),\bm{y}-\bm{x}\right>+\frac{\sigma_{t}}{2}\norm{\bm{y}-\bm{x}}^{2}_{2}, \quad (15)

for any $\bm{x},\bm{y}\in\mathcal{X}$ and any $t\in[T]$. For simplicity, we define $\sigma\coloneqq\min_{t\in[T]}\sigma_{t}$, so that every $f_{t}$ is strongly convex with modulus $\sigma$.

Under Assumption 4, the function $h_{t}:\mathcal{X}\to\mathbb{R}$ defined by Eq. (7) is also strongly convex with modulus $\sigma_{t}$, namely, $h_{t}(\bm{y})\geq h_{t}(\bm{x})+\left<\nabla h_{t}(\bm{x}),\bm{y}-\bm{x}\right>+\frac{\sigma_{t}}{2}\norm{\bm{y}-\bm{x}}^{2}_{2}$ for any $\bm{x},\bm{y}\in\mathcal{X}$. Then, we can show the following results.

Theorem 3.

Let $\{\bm{x}_{t}\}_{t=1}^{T}$ be a sequence of decisions generated by Algorithm 1 and let $\bm{x}^{\star}\in\mathcal{X}$ be an optimal solution to the offline OCO of Eq. (1). Assume that the constraint functions are fixed, that is, $g_{t}(\bm{x})=g(\bm{x})$ for all $t\in[T]$. Define $\alpha_{t}\coloneqq\sigma t$, $\gamma_{t}\coloneqq t^{c+\varepsilon}$, $\eta_{t}\coloneqq t^{c}$, and $\delta\coloneqq\frac{1}{T}$, where $c\in[\frac{1}{2},1)$ and $\varepsilon>0$. Under Assumptions 1, 3 and 4, we have

\sum_{t=1}^{T}\quantity[f_{t}(\bm{x}_{t})-f_{t}(\bm{x}^{\star})]\leq\quantity(\frac{9F^{2}d^{2}}{4\sigma}+2F)\quantity(1+\log T)=O(d^{2}\log T),
\sum_{t=1}^{T}[g_{t}(\bm{x}_{t})]_{+}\leq\frac{27F^{2}d^{2}}{4\sigma}+\frac{3FdD(1+\varepsilon)}{\varepsilon}=O(d^{2}).
Theorem 4.

Let $\{\bm{x}_{t}\}_{t=1}^{T}$ be a sequence of decisions generated by Algorithm 1. Assume that the constraint functions $g_{t}(\bm{x})$ are time-varying. Define $\alpha_{t}\coloneqq\sigma t$, $\gamma_{t}\coloneqq t^{c+\varepsilon}$, and $\eta_{t}\coloneqq t^{c}$, where $c\in[\frac{1}{2},1)$ and $\varepsilon>0$. Under Assumptions 1, 3 and 4, we have

\sum_{t=1}^{T}[g_{t}(\bm{x}_{t})]_{+}\leq\quantity(\frac{27F^{2}d^{2}}{4\sigma}+\frac{G^{2}}{4}+3FdD\quantity(1+\frac{1}{\varepsilon}+\frac{4}{\sigma})+D^{2})\sqrt{T(1+\log T)}.

5 Conclusion and Future Directions

This paper studies the two-point feedback setting of bandit convex optimization with constraints, in which the loss functions are convex or strongly convex, the constraint functions are fixed or time-varying, and the constraint violation is evaluated in terms of cumulative hard constraint violation (Yuan and Lamperski 2018). We present a penalty-based proximal gradient descent algorithm with an unbiased gradient estimator and show that the algorithm attains a sub-linear growth of both regret and cumulative hard constraint violation. It would be of interest to extend this work to the setting where both the loss functions and the constraint functions are revealed only through bandit feedback, as discussed in Cao and Liu (2018), and to the case where only one-point bandit feedback is available to the learner. Furthermore, a theoretical analysis of dynamic regret, where the comparator sequence can be chosen arbitrarily from the feasible set, is an important direction for future work.

Acknowledgments and Disclosure of Funding

The author would like to thank Dr. Sho Takemori for his many valuable suggestions and advice.

References

  • Hazan et al. (2016) Elad Hazan et al. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4):157–325, 2016.
  • Lattimore and Szepesvári (2020) Tor Lattimore and Csaba Szepesvári. Bandit algorithms. Cambridge University Press, 2020.
  • Zinkevich (2003) Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 928–936, 2003.
  • Mahdavi et al. (2012) Mehrdad Mahdavi, Rong Jin, and Tianbao Yang. Trading regret for efficiency: online convex optimization with long term constraints. The Journal of Machine Learning Research, 13(1):2503–2528, 2012.
  • Yuan and Lamperski (2018) Jianjun Yuan and Andrew Lamperski. Online convex optimization for cumulative constraints. Advances in Neural Information Processing Systems, 31, 2018.
  • Guo et al. (2023) Hengquan Guo, Zhu Qi, and Xin Liu. Rectified pessimistic-optimistic learning for stochastic continuum-armed bandit with constraints. In Learning for Dynamics and Control Conference, pages 1333–1344. PMLR, 2023.
  • Agarwal et al. (2010) Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In COLT, pages 28–40. Citeseer, 2010.
  • Zhao et al. (2021) Peng Zhao, Guanghui Wang, Lijun Zhang, and Zhi-Hua Zhou. Bandit convex optimization in non-stationary environments. The Journal of Machine Learning Research, 22(1):5562–5606, 2021.
  • Chen et al. (2019) Lin Chen, Mingrui Zhang, and Amin Karbasi. Projection-free bandit convex optimization. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2047–2056. PMLR, 2019.
  • Garber and Kretzu (2020) Dan Garber and Ben Kretzu. Improved regret bounds for projection-free bandit convex optimization. In International Conference on Artificial Intelligence and Statistics, pages 2196–2206. PMLR, 2020.
  • Cao and Liu (2018) Xuanyu Cao and K. J. Ray Liu. Online convex optimization with time-varying constraints and bandit feedback. IEEE Transactions on Automatic Control, 64(7):2665–2680, 2018.
  • Flaxman et al. (2005) Abraham D. Flaxman, Adam Tauman Kalai, and H. Brendan McMahan. Online convex optimization in the bandit setting: Gradient descent without a gradient. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’05, page 385–394, USA, 2005. Society for Industrial and Applied Mathematics. ISBN 0898715857.
  • Guo et al. (2022) Hengquan Guo, Xin Liu, Honghao Wei, and Lei Ying. Online convex optimization with hard constraints: Towards the best of two worlds and beyond. Advances in Neural Information Processing Systems, 35:36426–36439, 2022.
  • Duchi et al. (2010) John C Duchi, Shai Shalev-Shwartz, Yoram Singer, and Ambuj Tewari. Composite objective mirror descent. In COLT, volume 10, pages 14–26. Citeseer, 2010.
  • Duchi et al. (2008) John Duchi, Shai Shalev-Shwartz, Yoram Singer, and Tushar Chandra. Efficient projections onto the l 1-ball for learning in high dimensions. In Proceedings of the 25th international conference on Machine learning, pages 272–279, 2008.
  • Jenatton et al. (2016) Rodolphe Jenatton, Jim Huang, and Cédric Archambeau. Adaptive algorithms for online convex optimization with long-term constraints. In International Conference on Machine Learning, pages 402–411. PMLR, 2016.
  • Yu and Neely (2020) Hao Yu and Michael J. Neely. A low complexity algorithm with $O(\sqrt{T})$ regret and $O(1)$ constraint violations for online convex optimization with long term constraints. Journal of Machine Learning Research, 21(1):1–24, 2020.
  • Neely (2022) Michael Neely. Stochastic network optimization with application to communication and queueing systems. Springer Nature, 2022.
  • Yi et al. (2021) Xinlei Yi, Xiuxian Li, Tao Yang, Lihua Xie, Tianyou Chai, and Karl Johansson. Regret and cumulative constraint violation analysis for online convex optimization with long term constraints. In International Conference on Machine Learning, pages 11998–12008. PMLR, 2021.
  • Nesterov and Spokoiny (2017) Yurii Nesterov and Vladimir Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17:527–566, 2017.
  • Yi et al. (2022) Xinlei Yi, Xiuxian Li, Tao Yang, Lihua Xie, Tianyou Chai, and Karl H. Johansson. Regret and cumulative constraint violation analysis for distributed online constrained convex optimization. IEEE Transactions on Automatic Control, 2022.
  • Shamir (2017) Ohad Shamir. An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. The Journal of Machine Learning Research, 18(1):1703–1713, 2017.
  • Cheung and Lou (2017) Yiu-ming Cheung and Jian Lou. Proximal average approximated incremental gradient descent for composite penalty regularized empirical risk minimization. Machine Learning, 106:595–622, 2017.
  • Nesterov et al. (2018) Yurii Nesterov et al. Lectures on convex optimization, volume 137. Springer, 2018.

Appendix A Proof of useful inequalities

To prove Theorem 1, we present the following results, which follow an argument similar to Guo et al. (2022: Lemma 6).

Lemma 5.

Let $\bm{x}^{\star}\in\mathcal{X}$ be an optimal solution to the offline constrained OCO defined as Eq. (1). Under Assumptions 1 and 2, for any feasible solution $\bm{x}\in\mathcal{X}$, $c\in[\frac{1}{2},1)$, and $\varepsilon>0$, we have

\sum_{t=1}^{T}\frac{1}{t^{3c+\varepsilon}}\leq\frac{3c+\varepsilon}{3c+\varepsilon-1}\leq 3,
\sum_{t=1}^{T}\frac{\absolutevalue{f_{t}(\bm{x}_{t})-f_{t}(\bm{x}^{\star})}}{t^{2c+\varepsilon}}\leq\frac{FD(2c+\varepsilon)}{2c+\varepsilon-1},
\sum_{t=1}^{T}\frac{\norm{\bm{x}_{t}-\bm{x}^{\star}}^{2}_{2}-\norm{\bm{x}_{t+1}-\bm{x}^{\star}}^{2}_{2}}{t^{c+\varepsilon}}\leq D^{2}.
Proof.

The first claim is shown as follows:

\sum_{t=1}^{T}\frac{1}{t^{3c+\varepsilon}}\leq 1+\int_{1}^{T}\frac{1}{t^{3c+\varepsilon}}\,\differential{t}=1+\frac{1-T^{1-3c-\varepsilon}}{3c+\varepsilon-1}\leq\frac{3c+\varepsilon}{3c+\varepsilon-1}\leq 3,

where the last inequality holds from the condition $\frac{1}{2}\leq c<1$.

The second claim is shown as follows:

\sum_{t=1}^{T}\frac{\absolutevalue{f_{t}(\bm{x}_{t})-f_{t}(\bm{x}^{\star})}}{t^{2c+\varepsilon}}\leq\sum_{t=1}^{T}\frac{F_{t}\norm{\bm{x}_{t}-\bm{x}^{\star}}_{2}}{t^{2c+\varepsilon}}\leq\sum_{t=1}^{T}\frac{FD}{t^{2c+\varepsilon}}\leq\frac{FD(2c+\varepsilon)}{2c+\varepsilon-1},

where the first inequality follows from Assumption 2, the second from Assumption 1, and the third follows by the same integral argument as for the bound $\sum_{t=1}^{T}\frac{1}{t^{3c+\varepsilon}}\leq\frac{3c+\varepsilon}{3c+\varepsilon-1}$.

The last claim is shown as follows:

\sum_{t=1}^{T}\frac{\norm{\bm{x}_{t}-\bm{x}^{\star}}^{2}_{2}-\norm{\bm{x}_{t+1}-\bm{x}^{\star}}^{2}_{2}}{t^{c+\varepsilon}}
=\norm{\bm{x}_{1}-\bm{x}^{\star}}^{2}_{2}+\sum_{t=2}^{T}\quantity(\frac{1}{t^{c+\varepsilon}}-\frac{1}{(t-1)^{c+\varepsilon}})\norm{\bm{x}_{t}-\bm{x}^{\star}}^{2}_{2}-\frac{\norm{\bm{x}_{T+1}-\bm{x}^{\star}}^{2}_{2}}{T^{c+\varepsilon}}
\leq D^{2}+D^{2}\sum_{t=2}^{T}\quantity(\frac{1}{t^{c+\varepsilon}}-\frac{1}{(t-1)^{c+\varepsilon}})
=D^{2}+D^{2}\quantity(\frac{1}{T^{c+\varepsilon}}-1)\leq D^{2},

where the first inequality follows from Assumption 1. ∎
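As a quick numerical illustration of the first bound in Lemma 5, the following sketch checks that the partial sums of $\frac{1}{t^{3c+\varepsilon}}$ stay below $\frac{3c+\varepsilon}{3c+\varepsilon-1}\leq 3$ on a grid of parameter values; the grid and the truncation at $T=10^{6}$ are assumptions made for the check.

```python
# Numerical check of the first bound in Lemma 5 on an illustrative grid
# of c in [1/2, 1) and eps > 0, truncating the sum at T = 10**6.
import numpy as np

T = 10**6
t = np.arange(1, T + 1, dtype=float)
for c in (0.5, 0.75, 0.99):
    for eps in (0.01, 0.1, 1.0):
        s = np.sum(t ** -(3 * c + eps))
        bound = (3 * c + eps) / (3 * c + eps - 1)
        assert s <= bound <= 3.0, (c, eps, s, bound)
```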

Appendix B Proof of Theorem 3

Proof.

Similar to the arguments of Lemmas 3 and 4, for any strongly convex function $f_{t}$ with modulus $\sigma_{t}>0$ and any optimal solution $\bm{x}^{\star}\in\mathcal{X}$ to the offline constrained OCO (1), we have

f_{t}(\bm{x}_{t})-f_{t}(\bm{x}^{\star})+\lambda_{t}\widehat{g}^{+}_{t}(\bm{x}_{t+1})\leq\frac{F_{t}^{2}}{4\alpha_{t}}-\frac{\sigma_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}+\frac{\alpha_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}-\frac{\alpha_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t+1}}^{2}_{2}. \quad (16)

Applying the above inequality (16) to the function $h_{t}$ defined by Eq. (7), we have

h_{t}(\bm{x}_{t})-h_{t}(\bm{x}^{\star})+\lambda_{t}\widehat{g}^{+}_{t}(\bm{x}_{t+1})\leq\frac{9F_{t}^{2}d^{2}}{4\alpha_{t}}-\frac{\sigma_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}+\frac{\alpha_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}-\frac{\alpha_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t+1}}^{2}_{2}. \quad (17)

Note that the function $h_{t}$ is also strongly convex with modulus $\sigma_{t}$ under Assumption 4. Since $\lambda_{t}\widehat{g}^{+}_{t}(\bm{x}_{t+1})$ is nonnegative, Eq. (17) yields

h_{t}(\bm{x}_{t})-h_{t}(\bm{x}^{\star})\leq\frac{9F_{t}^{2}d^{2}}{4\alpha_{t}}-\frac{\sigma_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}+\frac{\alpha_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}-\frac{\alpha_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t+1}}^{2}_{2}.

Taking the summation over $t=1,2,\dots,T$ (with $\alpha_{0}\coloneqq 0$), we have

\sum_{t=1}^{T}[h_{t}(\bm{x}_{t})-h_{t}(\bm{x}^{\star})]\leq\sum_{t=1}^{T}\frac{9F_{t}^{2}d^{2}}{4\alpha_{t}}+\sum_{t=1}^{T}\quantity(\frac{\alpha_{t}}{2}-\frac{\alpha_{t-1}}{2}-\frac{\sigma_{t}}{2})\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}
\leq\frac{9F^{2}d^{2}}{4}\sum_{t=1}^{T}\frac{1}{\alpha_{t}}+D^{2}\sum_{t=1}^{T}\quantity(\frac{\alpha_{t}}{2}-\frac{\alpha_{t-1}}{2}-\frac{\sigma}{2}),

where the second inequality holds from Assumption 1 and $\sigma\leq\sigma_{t}$. Plugging in $\alpha_{t}=\sigma t$, the second sum vanishes and we have

\sum_{t=1}^{T}[h_{t}(\bm{x}_{t})-h_{t}(\bm{x}^{\star})]\leq\frac{9F^{2}d^{2}}{4}\sum_{t=1}^{T}\frac{1}{\sigma t}\leq\frac{9F^{2}d^{2}}{4\sigma}\quantity(1+\int_{1}^{T}\frac{1}{t}\differential{t})=\frac{9F^{2}d^{2}}{4\sigma}\quantity(1+\log T).

Similar to the proof of the convex case, since $\mathbb{E}_{t}\quantity[h_{t}(\bm{x})]=\widehat{f}_{t}(\bm{x})$ for any $\bm{x}\in\mathcal{X}$ and from inequality (4), we have

\sum_{t=1}^{T}[f_{t}(\bm{x}_{t})-f_{t}(\bm{x}^{\star})]\leq\sum_{t=1}^{T}[\widehat{f}_{t}(\bm{x}_{t})-\widehat{f}_{t}(\bm{x}^{\star})]+\sum_{t=1}^{T}2\delta F_{t}
\leq\frac{9F^{2}d^{2}}{4\sigma}\quantity(1+\log T)+2F
\leq\quantity(\frac{9F^{2}d^{2}}{4\sigma}+2F)\quantity(1+\log T),

where the second inequality follows from the bound on $\sum_{t=1}^{T}[h_{t}(\bm{x}_{t})-h_{t}(\bm{x}^{\star})]$ derived above, and the third inequality follows by letting $\delta=\frac{1}{T}$.

Next, we show the cumulative hard constraint violation bound for fixed constraints. From Eq. (17), we have

\lambda_{t}\widehat{g}^{+}_{t}(\bm{x}_{t+1})\leq\frac{9F^{2}_{t}d^{2}}{4\alpha_{t}}+\absolutevalue{h_{t}(\bm{x}_{t})-h_{t}(\bm{x}^{\star})}-\frac{\sigma_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}+\frac{\alpha_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}-\frac{\alpha_{t}}{2}\norm{\bm{x}^{\star}-\bm{x}_{t+1}}^{2}_{2}.

By the definition $\widehat{g}^{+}_{t}(\bm{x})=\gamma_{t}[g_{t}(\bm{x})]_{+}$, dividing both sides by $\lambda_{t}\gamma_{t}$ yields

[g_{t}(\bm{x}_{t+1})]_{+}\leq\frac{9F^{2}_{t}d^{2}}{4\alpha_{t}\lambda_{t}\gamma_{t}}+\frac{\absolutevalue{h_{t}(\bm{x}_{t})-h_{t}(\bm{x}^{\star})}}{\lambda_{t}\gamma_{t}}+\frac{\alpha_{t}-\sigma_{t}}{2\lambda_{t}\gamma_{t}}\norm{\bm{x}^{\star}-\bm{x}_{t}}^{2}_{2}-\frac{\alpha_{t}}{2\lambda_{t}\gamma_{t}}\norm{\bm{x}^{\star}-\bm{x}_{t+1}}^{2}_{2}.

Taking the summation over $t=1,2,\dots,T$, plugging $\alpha_{t}=\sigma t$, $\gamma_{t}=t^{c+\varepsilon}$, and $\lambda_{t}\geq\eta_{t}=t^{c}$ into the above inequality, and dropping the non-positive telescoping terms, we have the following result:

\sum_{t=1}^{T}[g_{t}(\bm{x}_{t+1})]_{+}\leq\frac{9F^{2}d^{2}}{4\sigma}\sum_{t=1}^{T}\frac{1}{t^{3c+\varepsilon}}+\sum_{t=1}^{T}\frac{\absolutevalue{h_{t}(\bm{x}_{t})-h_{t}(\bm{x}^{\star})}}{t^{2c+\varepsilon}}
\leq\frac{27F^{2}d^{2}}{4\sigma}+\frac{3FdD(1+\varepsilon)}{\varepsilon}=O(d^{2}),

where the second inequality follows from Lemma 5. ∎

Appendix C Proof of Theorem 4

Proof.

Similar to the proof of Theorem 2, we can bound $\sum_{t=1}^{T}\norm{\bm{x}_{t}-\bm{x}_{t+1}}^{2}_{2}$ as

\sum_{t=1}^{T}\norm{\bm{x}_{t}-\bm{x}_{t+1}}^{2}_{2}\leq\sum_{t=1}^{T}\frac{2\operatorname{lip}(h_{t})D}{\frac{1}{2}\alpha_{t}}+D^{2}\leq\frac{12FdD}{\sigma}\quantity(1+\log T)+D^{2},

where the second inequality holds by plugging in $\alpha_{t}=\sigma t$. Since $[g_{t}(\bm{x}_{t})]_{+}-[g_{t}(\bm{x}_{t+1})]_{+}\leq\frac{G^{2}}{4\beta}+\beta\norm{\bm{x}_{t}-\bm{x}_{t+1}}^{2}_{2}$ for any $\beta>0$, letting $\beta=\sqrt{\frac{T}{1+\log T}}$ gives

\sum_{t=1}^{T}\quantity([g_{t}(\bm{x}_{t})]_{+}-[g_{t}(\bm{x}_{t+1})]_{+})\leq\frac{G^{2}T}{4\beta}+\beta\sum_{t=1}^{T}\norm{\bm{x}_{t}-\bm{x}_{t+1}}^{2}_{2}
\leq\frac{G^{2}T}{4\beta}+\beta\quantity(\frac{12FdD}{\sigma}\quantity(1+\log T)+D^{2})
\leq\quantity(\frac{G^{2}}{4}+\frac{12FdD}{\sigma})\sqrt{T(1+\log T)}+D^{2}\sqrt{\frac{T}{1+\log T}}.

Finally, combining this with the result of Theorem 3, we obtain

\sum_{t=1}^{T}[g_{t}(\bm{x}_{t})]_{+}\leq\sum_{t=1}^{T}[g_{t}(\bm{x}_{t+1})]_{+}+\quantity(\frac{G^{2}}{4}+\frac{12FdD}{\sigma})\sqrt{T(1+\log T)}+D^{2}\sqrt{\frac{T}{1+\log T}}
\leq\frac{27F^{2}d^{2}}{4\sigma}+\frac{3FdD(1+\varepsilon)}{\varepsilon}+\quantity(\frac{G^{2}}{4}+\frac{12FdD}{\sigma})\sqrt{T(1+\log T)}+D^{2}\sqrt{\frac{T}{1+\log T}}
\leq\quantity(\frac{27F^{2}d^{2}}{4\sigma}+\frac{G^{2}}{4}+3FdD\quantity(1+\frac{1}{\varepsilon}+\frac{4}{\sigma})+D^{2})\sqrt{T(1+\log T)}. ∎