
Exploiting the polyhedral geometry of stochastic linear bilevel programming

A short version of this article was published in the proceedings of IPCO 2023 [30]. This extended version contains more detailed discussions, examples, results, and proofs.

Gonzalo Muñoz Instituto de Ciencias de la Ingeniería, Universidad de O’Higgins, Rancagua, Chile. gonzalo.munoz@uoh.cl    David Salas Instituto de Ciencias de la Ingeniería, Universidad de O’Higgins, Rancagua, Chile. david.salas@uoh.cl    Anton Svensson Instituto de Ciencias de la Ingeniería, Universidad de O’Higgins, Rancagua, Chile. anton.svensson@uoh.cl
Abstract

We study linear bilevel programming problems whose lower-level objective is given by a random cost vector with known distribution. We consider the case where this distribution is nonatomic, which allows us to reformulate the leader's problem using the Bayesian approach in the sense of Salas and Svensson (2023), with a decision-dependent distribution that concentrates on the vertices of the feasible set of the follower's problem. We call this a vertex-supported belief. We prove that this formulation is piecewise affine over the so-called chamber complex of the feasible set of the high-point relaxation. We propose two algorithmic approaches to solve general problems enjoying this last property. The first one is based on enumerating the vertices of the chamber complex. This approach is not scalable, but we present it as a computational baseline and for its theoretical interest. The second one is a Monte-Carlo approximation scheme based on the fact that randomly drawn points of the domain lie, with probability 1, in the interior of full-dimensional chambers, where the problem (restricted to this chamber) can be reduced to a linear program. Finally, we evaluate these methods through computational experiments showing both approaches' advantages and challenges.

Keywords: Bilevel programming; Bayesian approach; Chamber complex; Enumeration algorithm; Monte-Carlo algorithm.

1 Introduction

Stackelberg games, also referred to as bilevel programming problems, were first introduced by H. von Stackelberg in [35]. In this seminal work, an economic equilibrium problem between two firms was studied, under the particularity that one of them, the leader, is able to anticipate the decisions of the other one, the follower. Bilevel programming is an active field of research, and we refer the reader to the monographs [13, 14] for comprehensive introductions, and to [15] for recent developments. In the last decade, researchers have started to consider uncertainty in Stackelberg games. A recent survey by Beck, Ljubić and Schmidt [3] provides an overview of new questions and contributions on this topic.

One model that considers uncertainty in Stackelberg games is the Bayesian approach [28, 32]. Its starting point is that, for any given leader's decision $x$, the leader only knows that the reaction $y$ of the follower is selected from a set $S(x)$, making $y$ a decision-dependent and uncertain variable. The leader endows the set $S(x)$ with a probability distribution $\beta_{x}$ which models how the leader believes the possible responses of the follower are distributed.

Uncertainty in the data of the lower level has been considered by Claus for linear bilevel programming problems from a variational perspective involving risk measures (see the survey [9] and the references therein, and the recent works [10, 11]). In [23], Ivanov considered the cost function of the follower as a bilinear form $\langle Ax+\xi(\omega),y\rangle$, but with a fixed feasible set for the follower. The method proposed to solve the problem was a sample average approximation scheme, which builds a deterministic bilevel program whose size depends on the number of samples. Recently, in [5], Buchheim, Henke and Irmai considered a bilevel version of the continuous knapsack problem with uncertainty on the follower's objective.

In this work, we consider a linear bilevel programming problem where the lower-level objective is uncertain for the leader but follows a known prior distribution (as in the particular case studied in [5]). We study the problem from a Bayesian perspective [32], and by means of the so-called chamber complex of a polytope (see Section 3), which subdivides the space of the leader's decisions in a meaningful way. The idea of using the chamber complex to understand geometric properties of optimization problems under uncertainty is not new, but it is recent. To the best of our knowledge, the first work doing so is [17] (see also [16]), in which multistage stochastic linear optimization is studied. However, the techniques there cannot be extended to Stackelberg games, since the latter carry an intrinsic nonconvexity.

1.1 Problem formulation

Our study focuses on the setting of linear bilevel programming, which is the case where the data (objective functions and constraints) of the problem is linear and all variables are continuous. More precisely, we aim to study the problem where the leader decides a vector $x\in\mathbb{R}^{n_{x}}$ that solves

\left\{\begin{array}{cl}\displaystyle\min_{x\in\mathbb{R}^{n_{x}}}&\langle d_{1},x\rangle+\mathbb{E}[\langle d_{2},y(x,\omega)\rangle]\\ \text{s.t.}&A_{l}x\leq b_{l}\\ &y(x,\omega)\text{ solves }\left\{\begin{array}{cl}\displaystyle\min_{y\in\mathbb{R}^{n_{y}}}&\langle c(\omega),y\rangle\\ \text{s.t.}&A_{f}x+B_{f}y\leq b_{f},\end{array}\right.\quad\text{a.s. }\omega\in\Omega,\end{array}\right.   (1)

where $A_{l}\in\mathbb{R}^{m_{l}\times n_{x}}$, $A_{f}\in\mathbb{R}^{m_{f}\times n_{x}}$, $B_{f}\in\mathbb{R}^{m_{f}\times n_{y}}$, $b_{l}\in\mathbb{R}^{m_{l}}$, $b_{f}\in\mathbb{R}^{m_{f}}$, $d_{1}\in\mathbb{R}^{n_{x}}$, $d_{2}\in\mathbb{R}^{n_{y}}$, and $c:\Omega\to\mathbb{S}_{n_{y}}$ is a random vector over a probability space $(\Omega,\Sigma,\mathbb{P})$ with values in the unit sphere of $\mathbb{R}^{n_{y}}$. The subindexes $l$ and $f$ stand for leader and follower, respectively. The notation carries the usual ambiguity of bilevel problems, which appears whenever the lower-level optimal response $y(x,\omega)$ is not uniquely determined for some $x$. However, we focus our attention here on costs whose distributions are nonatomic (see Section 2.2), which implies that, with probability 1, $y(x,\omega)$ is unique for all feasible $x$.

Remark 1.1.

In full generality, the upper-level constraint of a deterministic bilevel problem is of the form $A_{l}x+B_{l}y\leq b_{l}$. These constraints are known as coupling constraints, and they introduce another level of difficulty to bilevel programming, for example, by producing disconnected feasible regions (see, e.g., [29, 2]). However, it was recently shown that for any linear bilevel problem with coupling constraints, there is a problem without coupling constraints with the same optimal solutions [21]. For the stochastic setting we are considering, one way to incorporate coupling constraints in Problem (1) could take the form of

A_{l}x+B_{l}y(x,\omega)\leq b_{l},\quad\text{ a.s. }\omega\in\Omega,   (2)

where coupling constraints must be satisfied for almost every follower’s response. It is unclear if a similar result to [21] holds in the stochastic setting. In any case, in this first study, we prefer to keep constraints of the form (2) out of the scope of this work.

To fix ideas, let us start with a warm-up example.

Example 1.2.

Let $n_{x}=2$ and $n_{y}=1$. For the follower, consider a random cost $c$ such that $\mathbb{P}(c=1)=\mathbb{P}(c=-1)=\frac{1}{2}$ and the constraints defined by the inequalities $x_{1}-y\leq 0$, $-x_{1}-y\leq 0$, $x_{2}+y\leq 1$, and $-x_{2}+y\leq 1$. For the leader, let the costs be $d_{1}=(2,1)$ and $d_{2}=4$, and consider no constraints. Then, Problem (1) is given by

\left\{\begin{array}{cl}\displaystyle\min_{x\in\mathbb{R}^{2}}&2x_{1}+x_{2}+\mathbb{E}[4y(x,\omega)]\\ \text{s.t.}&y(x,\omega)\text{ solves }\left\{\begin{array}{cl}\displaystyle\min_{y\in\mathbb{R}}&c(\omega)y\\ \text{s.t.}&\begin{array}{rl}x_{1}-y&\leq 0,\\ -x_{1}-y&\leq 0,\\ x_{2}+y&\leq 1,\\ -x_{2}+y&\leq 1.\end{array}\end{array}\right.\quad\text{a.s. }\omega\in\Omega,\end{array}\right.   (3)

Observe that, given $x\in\mathbb{R}^{2}$, the feasible set of the follower can be written as

S(x)=\{y\in\mathbb{R}\ :\ |x_{1}|\leq y\leq 1-|x_{2}|\},   (4)

and thus the feasible set of the leader is given by all points for which the lower level admits a feasible point, that is,

X:=\{x\in\mathbb{R}^{2}\ :\ S(x)\neq\emptyset\}=\{x\in\mathbb{R}^{2}\ :\ |x_{1}|+|x_{2}|\leq 1\}.   (5)

Finally, for any $x\in X$, the solution $y(x,\omega)$ of the lower level is either equal to $|x_{1}|$ with probability $\frac{1}{2}$ or equal to $1-|x_{2}|$ with probability $\frac{1}{2}$. So problem (1) can be rewritten as

\left\{\begin{array}{cl}\displaystyle\min_{x\in\mathbb{R}^{2}}&2x_{1}+x_{2}+4\left(\frac{1}{2}|x_{1}|+\frac{1}{2}(1-|x_{2}|)\right)\\ \text{s.t.}&|x_{1}|+|x_{2}|\leq 1.\end{array}\right.   (6)

This problem is nonconvex and piecewise linear, and its optimal solution is $x^{*}=(0,-1)$, with optimal value $-1$. $\Diamond$
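As a quick numerical sanity check of the reduction above, the following sketch (Python, illustrative only) evaluates the reduced objective of (6) at the five candidate points where the piecewise linear objective can attain its minimum (the chamber vertices of $X$, cf. Remark 2.6 and Section 3).

```python
import numpy as np

def theta(x):
    # Reduced leader objective from (6): 2*x1 + x2 + 4*(|x1|/2 + (1 - |x2|)/2)
    x1, x2 = x
    return 2 * x1 + x2 + 4 * (0.5 * abs(x1) + 0.5 * (1 - abs(x2)))

# Candidate points: the chamber vertices of X = {|x1| + |x2| <= 1} (cf. Remark 2.6)
vertices = [(1, 0), (-1, 0), (0, 1), (0, -1), (0, 0)]
values = {v: theta(v) for v in vertices}
print(values)                          # {(1, 0): 6.0, (-1, 0): 2.0, (0, 1): 1.0, (0, -1): -1.0, (0, 0): 2.0}
print(min(values, key=values.get))     # (0, -1)
```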

Problem (1) is computationally challenging. Its deterministic version is well-known to be strongly NP-Hard [20], and even checking local optimality is NP-Hard [34]. Recently, it was proven that linear bilevel optimization (both in its optimistic and pessimistic versions) belongs to NP [6].

For the stochastic version considered here, we are only aware of the hardness results in [5], which consider the special case of a continuous knapsack problem (see Section 4.2); there, the authors prove that if the knapsack item values are independently and uniformly distributed, the resulting problem is #P-hard. To the best of our knowledge, there is no stronger hardness result for the general case of problem (1). On the other hand, the robust case (i.e., when the leader takes a worst-case perspective on the follower's cost vector) has even stronger complexity results [8, 7].

1.2 Our contribution

The contributions of this work can be summarized as follows. First, we reformulate problem (1) following the Bayesian approach [32] using a belief $\beta$ induced by the random cost $c(\omega)$, which we call the vertex-supported belief induced by $c$. This is done in Section 2, along with some preliminaries.

Secondly, we show that the objective function of problem (1), as well as its discretization using sample average approximation methods (see, e.g., [22]), has the property of being piecewise linear over the chamber complex of an appropriate polytope $D\subseteq\mathbb{R}^{n_{x}}\times\mathbb{R}^{n_{y}}$. That is, we show that for every polytope in the chamber complex of $D$, the objectives of the aforementioned problems are affine within that polytope. This is done in Section 3.

In Section 4 we provide some illustrative examples that highlight difficulties of working with the chamber complex of a polytope, which are hidden in its implicit definition.

Sections 5 and 6 are devoted to proposing two methods to solve piecewise linear problems over the chamber complex of a polytope. The first proposal, in Section 5, is a deterministic algorithm based on a straightforward observation: the optimal solution of such a problem must be attained at a vertex of the chamber complex. Thus, we provide a MIP formulation to enumerate all the vertices of the chamber complex. The proposed algorithm is not scalable, since the MIPs to be solved are of the size of the Face Lattice of $D$ and the number of vertex chambers can be exponential (see, e.g., Example 4.2). However, we include it due to its theoretical interest in understanding the combinatorial nature of Problem (1). It also serves as an exact computational baseline for inexact algorithms. The second proposal, in Section 6, is a Monte-Carlo algorithm based on another very simple observation: if a point $x$ is drawn randomly, it belongs to the interior of a full-dimensional chamber with probability 1. It is worth mentioning that both methods rely on samples of the cost vector $c(\omega)$ instead of evaluating the leader's objective function exactly. As a consequence, the sample size used to estimate the objective only affects the complexity of the proposed methods by a linear factor.

We finish our work by testing both solution methods over numerical experiments. This is done in Section 7.

2 Preliminaries

For an integer $n\in\mathbb{N}$, we write $[n]:=\{1,\ldots,n\}$. Throughout this work we consider Euclidean spaces $\mathbb{R}^{n}$ endowed with the Euclidean norm $\|\cdot\|$ and the inner product $\langle\cdot,\cdot\rangle$. We denote by $\mathbb{S}_{n}$ the unit sphere in $\mathbb{R}^{n}$. For a set $A\subseteq\mathbb{R}^{n}$, we write $\mathrm{int}(A)$, $\overline{A}$, and $\mathrm{aff}(A)$ to denote its interior, closure, and affine hull, respectively. We denote by $\mathrm{ri}(A)$ its relative interior, which is its interior with respect to the affine space $\mathrm{aff}(A)$. We denote the affine dimension of $A$ by $\dim(A)$, which corresponds to the dimension of $\mathrm{aff}(A)$. It is well known that if $A$ is nonempty and convex, then $\mathrm{ri}(A)$ is nonempty and $\overline{\mathrm{ri}(A)}=\overline{A}$. For a convex set $A$ and a point $x\in A$, we write $N_{A}(x)$ to denote the normal cone of $A$ at $x$, i.e.,

N_{A}(x):=\{d\in\mathbb{R}^{n}\ :\ \langle d,y-x\rangle\leq 0\quad\forall y\in A\}.   (7)

We write $\mathds{1}_{A}$ to denote the indicator function of a set $A\subseteq\mathbb{R}^{n}$, having value $1$ on $A$ and $0$ elsewhere.

For a function $f:A\subseteq\mathbb{R}^{n}\to\mathbb{R}$ and $\alpha\in\mathbb{R}$, we write $[f\leq\alpha]$, $[f\geq\alpha]$ and $[f=\alpha]$ to denote the $\alpha$-sublevel set, the $\alpha$-superlevel set, and the $\alpha$-level set, respectively.

A face $F$ of a polyhedron $P$ is the argmax of a linear function over $P$. The collection of all faces of $P$, which we denote by $\mathscr{F}(P)$, is known as the Face Lattice of $P$. For any $k\in\{0,\ldots,n\}$ we will write

\mathscr{F}_{k}(P):=\{F\in\mathscr{F}(P)\ :\ \dim(F)=k\},   (8)
\mathscr{F}_{\leq k}(P):=\bigcup_{j=0}^{k}\mathscr{F}_{j}(P)=\{F\in\mathscr{F}(P)\ :\ \dim(F)\leq k\}.   (9)

The zero-dimensional faces are the vertices, and we write $\mathrm{ext}(P):=\mathscr{F}_{0}(P)$.

In what follows, concerning problem (1), we use $n_{x}$ and $n_{y}$ to denote the number of variables of the leader and the follower, respectively. Similarly, we use $m$ for the number of constraints in the follower's problem. Finally, given two nonempty sets $X\subseteq\mathbb{R}^{n_{x}}$ and $Y\subseteq\mathbb{R}^{n_{y}}$, we write $M:X\rightrightarrows Y$ to denote a set-valued map, i.e., a function assigning to each $x\in X$ a (possibly empty) set $M(x)$ in $Y$. The graph of $M$ is the set $\mathrm{gph}\,M=\{(x,y)\,:\,y\in M(x)\}$, and the domain of $M$ is the set $\mathrm{dom}\,M:=\{x\,:\,M(x)\neq\emptyset\}$.

2.1 Specific notation

Motivated by the structure of problem (1), we define $m=m_{l}+m_{f}$ and

A=\begin{bmatrix}A_{l}\\ A_{f}\end{bmatrix}\in\mathbb{R}^{m\times n_{x}},\quad B=\begin{bmatrix}0\\ B_{f}\end{bmatrix}\in\mathbb{R}^{m\times n_{y}},\quad\text{ and }\quad b=\begin{bmatrix}b_{l}\\ b_{f}\end{bmatrix}\in\mathbb{R}^{m}.

With this, we define the polyhedron $D$ as

D:=\left\{(x,y)\in\mathbb{R}^{n_{x}}\times\mathbb{R}^{n_{y}}\ :\ Ax+By\leq b\right\},   (10)

which is the feasible set of the high-point relaxation of Problem (1) (see, e.g., [25]). We then consider the feasible set $X$ of the leader as all points $x\in\mathbb{R}^{n_{x}}$ verifying the leader's constraints and admitting at least one feasible point for the follower. That is,

X:=\big\{x\in\mathbb{R}^{n_{x}}\ :\ \exists y\in\mathbb{R}^{n_{y}}\text{ such that }Ax+By\leq b\big\}.   (11)

It will be assumed throughout the paper that $D$ is full-dimensional. We do not lose generality, since it is always possible to embed $D$ into $\mathbb{R}^{\dim(D)}$. We will also assume that $D$ is compact, i.e., that it is a polytope. Similarly, we define the ambient space for the follower's decision vector as

Y:=\big\{y\in\mathbb{R}^{n_{y}}\ :\ \exists x\in\mathbb{R}^{n_{x}}\text{ such that }Ax+By\leq b\big\}.   (12)

Both $X$ and $Y$ are polytopes (bounded polyhedra), since $D$ is assumed to be one. We write $\pi:\mathbb{R}^{n_{x}}\times\mathbb{R}^{n_{y}}\to\mathbb{R}^{n_{x}}$ to denote the parallel projection given by $\pi(x,y)=x$. In particular, equation (11) can be written as $X=\pi(D)$. We consider the set-valued map $S:X\rightrightarrows\mathbb{R}^{n_{y}}$ defined as

S(x):=\{y\in\mathbb{R}^{n_{y}}\ :\ (x,y)\in D\}.   (13)

We call $S$ the fiber map of $D$ (through the projection $\pi$). Clearly, $S$ has nonempty, convex, and compact values.

For each $i\in[m]$ we define the function $g_{i}:\mathbb{R}^{n_{x}}\times\mathbb{R}^{n_{y}}\to\mathbb{R}$ given by

g_{i}(x,y):=A_{i}x+B_{i}y-b_{i},   (14)

where $A_{i}$ and $B_{i}$ denote the $i$th rows of $A$ and $B$, respectively. Thus, we can write $D=\bigcap_{i\in[m]}[g_{i}\leq 0]$. Since every function $g_{i}$ is affine, we will simply write $\nabla g_{i}$ to denote its gradient at any point. For any subset $J\subseteq[m]$, we define the function $g_{J}:\mathbb{R}^{n_{x}}\times\mathbb{R}^{n_{y}}\to\mathbb{R}$ given by

g_{J}(x,y):=\sum_{j\in J}g_{j}(x,y).   (15)

For simplicity we write $\mathscr{F}:=\mathscr{F}(D)$ and, for each $n\in\{0,1,\ldots,n_{x}+n_{y}\}$, $\mathscr{F}_{n}:=\mathscr{F}_{n}(D)$ and $\mathscr{F}_{\leq n}:=\mathscr{F}_{\leq n}(D)$. Note that for each face $F\in\mathscr{F}_{n}$ we can select a set $J\subseteq[m]$ of cardinality $|J|=(n_{x}+n_{y})-n$ such that

F=D\cap[g_{J}=0].   (16)

Thus, for any face $F\in\mathscr{F}(D)$, we define $g_{F}:=g_{J}$, where $J$ is the (selected) subset of $[m]$ verifying (16). Observe that, since $|J|$ coincides with the codimension of $F$, the set $\{\nabla g_{j}\ :\ j\in J\}$ must be linearly independent.

2.2 Vertex-supported beliefs and Bayesian formulation

In what follows, we write $\mathcal{B}(Y)$ to denote the Borel $\sigma$-algebra of $Y$ and $\mathscr{P}(Y)$ to denote the family of all (Borel) probability measures on $Y$.

Recall from [32] that for a set-valued map $M:X\rightrightarrows Y$ with closed and nonempty values, a map $\beta:X\to\mathscr{P}(Y)$ is said to be a belief over $M$ if, for every $x\in X$, the measure $\beta(x)=\beta_{x}$ concentrates over $M(x)$, i.e., $\beta_{x}(M(x))=1$.

Let $(\Omega,\Sigma,\mathbb{P})$ be a probability space. To model the cost function of the lower-level problem in (1), we will consider only random vectors $c:\Omega\to\mathbb{S}_{n_{y}}$ with nonatomic distributions, in the sense that

\forall O\in\mathcal{B}(\mathbb{S}_{n_{y}}):\quad\mathcal{H}^{n_{y}-1}(O)=0\implies\mathbb{P}[c(\omega)\in O]=0,   (17)

where $\mathcal{H}^{n_{y}-1}$ denotes the $(n_{y}-1)$-Hausdorff measure over $(\mathbb{S}_{n_{y}},\mathcal{B}(\mathbb{S}_{n_{y}}))$. In other words, the probability measure $\mathbb{P}\circ c^{-1}$ is absolutely continuous with respect to $\mathcal{H}^{n_{y}-1}$. Note that, with this definition, any random vector $c:\Omega\to\mathbb{R}^{n_{y}}$ having an absolutely continuous distribution with respect to the Lebesgue measure $\mathcal{L}^{n_{y}}$ induces an equivalent random vector $\bar{c}:\Omega\to\mathbb{S}_{n_{y}}$ given by $\bar{c}(\omega):=\frac{c(\omega)}{\|c(\omega)\|}$. This new random vector is well-defined almost surely in $\Omega$, except for the negligible set $N=c^{-1}(0)$, and using $c(\cdot)$ or $\bar{c}(\cdot)$ in problem (1) is equivalent.
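For instance, a standard way to obtain a cost with a nonatomic distribution on the sphere is to normalize an absolutely continuous random vector, exactly as in the previous paragraph; the short sketch below (Python, for illustration only) does this with Gaussian samples.

```python
import numpy as np

def sample_sphere_costs(n_y, n_samples, seed=0):
    """Draw i.i.d. cost vectors on the unit sphere S_{n_y} by normalizing
    standard Gaussian vectors (an absolutely continuous distribution, so the
    induced distribution on the sphere is nonatomic in the sense of (17))."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_samples, n_y))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

print(np.linalg.norm(sample_sphere_costs(3, 4), axis=1))   # all entries equal to 1
```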

Now, to understand the distribution of the optimal response $y(x,\omega)$ induced by the random vector $c:\Omega\to\mathbb{S}_{n_{y}}$, we observe that its optimality with respect to the lower-level problem is characterized by $-c(\omega)\in N_{S(x)}(y(x,\omega))$. Thus, we consider the belief $\beta:X\to\mathscr{P}(Y)$ over the fiber map $S:X\rightrightarrows Y$ given by the density

d\beta_{x}(y):=p_{c}(x,y):=\mathbb{P}[-c(\omega)\in N_{S(x)}(y)].   (18)

Note that $\mathbb{P}[-c(\omega)\in N_{S(x)}(y)]=\mathbb{P}[-c(\omega)\in\mathrm{int}(N_{S(x)}(y))\cap\mathbb{S}_{n_{y}}]$ for any $(x,y)\in D$, and that $\mathrm{int}(N_{S(x)}(y))$ is nonempty if and only if $y$ is an extreme point of $S(x)$. Therefore, one can easily deduce that for each $x\in X$, the function $p_{c}(x,\cdot)$ is a discrete density with support contained in $\mathrm{ext}(S(x))$, and the belief $\beta$ can be written as

\beta_{x}(O)=\sum_{y\in\mathrm{ext}(S(x))}p_{c}(x,y)\mathds{1}_{O}(y),\quad\forall O\in\mathcal{B}(Y).   (19)

We call $\beta$ the vertex-supported belief induced by $c$. With this construction, we can rewrite problem (1) as

\min_{x\in X}\quad\theta(x):=\langle d_{1},x\rangle+\mathbb{E}_{\beta_{x}}[\langle d_{2},\cdot\rangle],   (20)

where $\mathbb{E}_{\beta_{x}}[\langle d_{2},\cdot\rangle]=\sum_{y\in\mathrm{ext}(S(x))}\langle d_{2},y\rangle p_{c}(x,y)$. Our goal in this work is to study problem (1) by profiting from the Bayesian formulation (20), in the sense of [32].
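In practice, the expectation $\mathbb{E}_{\beta_{x}}[\langle d_{2},\cdot\rangle]$ can be estimated by sampling costs and solving the lower-level LP for each sampled cost, since (for nonatomic distributions) each sample almost surely selects a unique vertex of $S(x)$. The following sketch uses `scipy.optimize.linprog` and hypothetical data names (`A_f`, `B_f`, `b_f`); it is meant only as an illustration of the formula above, not as the method used later in the paper.

```python
import numpy as np
from scipy.optimize import linprog

def estimate_follower_term(x, d2, A_f, B_f, b_f, costs):
    """Monte-Carlo estimate of E_{beta_x}[<d2, .>]: for each sampled cost c,
    solve the follower LP  min <c, y>  s.t.  B_f y <= b_f - A_f x,
    then average <d2, y(x, c)> over the samples."""
    rhs = b_f - A_f @ x
    n_y = B_f.shape[1]
    vals = []
    for c in costs:
        res = linprog(c, A_ub=B_f, b_ub=rhs, bounds=[(None, None)] * n_y)
        if not res.success:
            raise ValueError("follower problem infeasible or unbounded at this x")
        vals.append(d2 @ res.x)
    return np.mean(vals)
```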

Remark 2.1.

Note that, by defining the centroid of $S(x)$ with respect to $\beta_{x}$ as

\mathfrak{b}(x):=\mathbb{E}_{\beta_{x}}[y]=\sum_{y\in\mathrm{ext}(S(x))}p_{c}(x,y)\,y,   (21)

we get that $\theta(x)=\langle d_{1},x\rangle+\langle d_{2},\mathfrak{b}(x)\rangle$. Thus, the Bayesian formulation (20) can be rewritten in terms of the centroid, which is the convex combination of the vertices of $S(x)$ weighted proportionally to their normal cones. Of course, this is valid since the expected value $\mathbb{E}_{\beta_{x}}[\langle d_{2},y\rangle]$ has a linear integrand.

2.3 The chamber complex

We start by recalling the definitions of polyhedral complex and chamber complex, two well-known concepts in computational geometry (see, e.g., [12]).

Definition 2.2 (Polyhedral complex).

A set of polyhedra $\mathscr{P}$ in $\mathbb{R}^{n_{x}}$ is a polyhedral complex if

  1. For each $P,Q\in\mathscr{P}$, $P\cap Q$ is a face of $Q$ and of $P$.

  2. For each $P\in\mathscr{P}$ and each face $F$ of $P$, $F\in\mathscr{P}$.

Given $P\in\mathscr{P}$, each vertex of $P$ (as a face of $P$) belongs to $\mathscr{P}$; moreover, these vertices are the minimal elements of $\mathscr{P}$. If the maximal elements of $\mathscr{P}$ all have the same dimension, we say that $\mathscr{P}$ is a pure polyhedral complex. We are interested in a class of polyhedral complexes that are pure, the so-called chamber complexes, which we recall next.

Definition 2.3 (Chamber complex).

Let $D\subseteq\mathbb{R}^{n_{x}}\times\mathbb{R}^{n_{y}}$ be a polyhedron as described in (10). For each $x\in X=\pi(D)$, we define the chamber of $x$ as

\sigma(x):=\bigcap\big\{\pi(F)\,:\,F\in\mathscr{F}(D),\,x\in\pi(F)\big\}.   (22)

The chamber complex is the (finite) collection of chambers, i.e.,

\mathscr{C}(D):=\{\sigma(x)\,:\,x\in X\}.   (23)

The chamber complex is a polyhedral complex, and so $\{\mathrm{ri}(C):C\in\mathscr{C}(D)\}$ is a partition of $X$. The next proposition provides additional facts about the chamber of a point.

Proposition 2.4.

For any $x\in X$, one has that

  1. $\sigma(x)=\bigcap\{\pi(F)\ :\ F\in\mathscr{F}_{\leq n_{x}},\,x\in\pi(F)\}$.

  2. For any $C\in\mathscr{C}(D)$, $x\in\mathrm{ri}(C)$ if and only if $C=\sigma(x)$.

Proof.

1. Let us consider $F\in\mathscr{F}$ with $x\in\pi(F)$ such that $\dim(F)=n>n_{x}$. Then, there exists $J\subseteq[m]$ with $|J|=(n_{x}+n_{y})-n<n_{y}$ such that $F=D\cap[g_{J}=0]$. Now, choose $y\in S(x)$ such that the point $(x,y)\in F$ is a vertex of $\{x\}\times S(x)$. Let $\hat{J}\subseteq[m]$ be the set of all active constraints at $(x,y)$. Then, $J\subseteq\hat{J}$ and $\{(x,y)\}=[g_{\hat{J}}(x,\cdot)=0]$. This implies that

\{\nabla g_{j}\ :\ j\in\hat{J}\}\cup\{(e_{k},0)\in\mathbb{R}^{n_{x}}\times\mathbb{R}^{n_{y}}\ :\ k\in[n_{x}]\}

has $n_{x}+n_{y}$ linearly independent vectors. In particular, $\{\nabla g_{j}\ :\ j\in\hat{J}\}$ has at least $n_{y}$ linearly independent vectors, which yields that $\hat{F}=D\cap[g_{\hat{J}}=0]$ satisfies $\dim(\hat{F})\leq n_{x}$. Moreover, $(x,y)\in\hat{F}$, so that $x\in\pi(\hat{F})$. Then, since $J\subseteq\hat{J}$, we have that $\hat{F}\subseteq F$, which yields that $\pi(\hat{F})\subseteq\pi(F)$. The arbitrariness of $F$ allows us to write

\sigma(x)=\bigcap\{\pi(F)\ :\ F\in\mathscr{F},\,x\in\pi(F)\}\supseteq\bigcap\{\pi(F)\ :\ F\in\mathscr{F}_{\leq n_{x}},\,x\in\pi(F)\},

finishing the proof of the first assertion, since the reverse inclusion is direct.

2. Assume first that $C=\sigma(x)$. Since $\{\mathrm{ri}(C^{\prime})\ :\ C^{\prime}\in\mathscr{C}(D)\}$ is a partition of $X$, we know there exists $z\in X$ such that $x\in\mathrm{ri}(\sigma(z))$. Assume by contradiction that $x\notin\mathrm{ri}(C)$, which implies that $z\neq x$. Then clearly $\sigma(x)\subseteq\sigma(z)$, and so $\sigma(x)$ is a proper face of $\sigma(z)$, so that $\sigma(x)\cap\mathrm{ri}(\sigma(z))=\emptyset$. But this implies that $x\notin\sigma(x)$, which is evidently a contradiction.

Conversely, assume that $x\in\mathrm{ri}(C)$. Since $x\in\mathrm{ri}(\sigma(x))$, we get that $\mathrm{ri}(\sigma(x))\cap\mathrm{ri}(C)\neq\emptyset$, and recalling again that $\{\mathrm{ri}(C^{\prime})\ :\ C^{\prime}\in\mathscr{C}(D)\}$ is a partition of $X$, we conclude that $C=\sigma(x)$. ∎

In what follows, it will be useful to distinguish the maximal and the minimal chambers with respect to the inclusion order. Minimal chambers are the zero-dimensional chambers, while maximal chambers coincide with those having nonempty interior, since $D$ has nonempty interior. The following definition introduces this distinction.

Definition 2.5.

Let $D\subseteq\mathbb{R}^{n_{x}}\times\mathbb{R}^{n_{y}}$ be a polyhedron as described in (10). We define the families

\mathscr{K}(D):=\left\{K\in\mathscr{C}(D)\ :\ \mathrm{int}(K)\neq\emptyset\right\},   (24)
\mathscr{V}(D):=\left\{v\in X\ :\ \{v\}\in\mathscr{C}(D)\right\}.   (25)

We call $\mathscr{K}(D)$ the full-dimensional chambers of $\mathscr{C}(D)$, and $\mathscr{V}(D)$ the chamber vertices (or zero-dimensional chambers) of $\mathscr{C}(D)$.

It is worth mentioning that $\mathscr{K}(D)$ is a covering of $X$, i.e., $X=\bigcup_{K\in\mathscr{K}(D)}K$. This follows from the fact that each chamber is contained in a maximal (full-dimensional) chamber.

Remark 2.6.

Consider $D\subseteq\mathbb{R}^{2}\times\mathbb{R}$ the polytope of all tuples $(x_{1},x_{2},y)$ such that $x_{1}-y\leq 0$, $-x_{1}-y\leq 0$, $x_{2}+y\leq 1$, and $-x_{2}+y\leq 1$; these are the same constraints as in the problem of Example 1.2.

Figure 1: The polytope $D$ of Remark 2.6 and its chamber complex.

Clearly the vertices of $D$ are $v_{1}=(1,0,1)$, $v_{2}=(0,1,0)$, $v_{3}=(-1,0,1)$ and $v_{4}=(0,-1,0)$, and their projections $(1,0)$, $(-1,0)$, $(0,1)$ and $(0,-1)$ are chamber vertices. Additionally, $(0,0)$ is a vertex of the chamber complex $\mathscr{C}(D)$, since it is the intersection of the projections of the edge between $v_{1}$ and $v_{3}$ and the edge between $v_{2}$ and $v_{4}$.

In Figure 1, the 5 chamber vertices are depicted using dots, while the 4 full-dimensional chambers are labeled 1 to 4. $\Diamond$
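To make the last observation concrete, the short sketch below (Python, specific to this example) recovers the fifth chamber vertex $(0,0)$ by intersecting the projections of the edges $[v_{1},v_{3}]$ and $[v_{2},v_{4}]$; the remaining four chamber vertices are simply the projections of $v_{1},\ldots,v_{4}$.

```python
import numpy as np

# Vertices of D (Remark 2.6); the projection is pi(x1, x2, y) = (x1, x2).
v1, v3 = np.array([1.0, 0.0, 1.0]), np.array([-1.0, 0.0, 1.0])   # endpoints of the top edge of D
v2, v4 = np.array([0.0, 1.0, 0.0]), np.array([0.0, -1.0, 0.0])   # endpoints of the bottom edge of D
proj = lambda v: v[:2]

def segment_intersection(p, q, r, s):
    """Intersection point of the planar segments [p, q] and [r, s] (assumed to cross)."""
    # Solve p + t (q - p) = r + u (s - r) for the parameters (t, u).
    t, u = np.linalg.solve(np.column_stack([q - p, r - s]), r - p)
    assert 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0
    return p + t * (q - p)

chamber_vertices = [proj(v) for v in (v1, v2, v3, v4)]
chamber_vertices.append(segment_intersection(proj(v1), proj(v3), proj(v2), proj(v4)))
print(np.array(chamber_vertices))   # (1,0), (0,1), (-1,0), (0,-1) and (0,0)
```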

We close this preliminary section with the following direct proposition that relates minimal and maximal chambers.

Proposition 2.7.

Given $D$ as in (10) with nonempty interior, $\mathscr{V}(D)$ coincides with $\bigcup_{K\in\mathscr{K}(D)}\mathrm{ext}(K)$.

Proof.

Since $\mathscr{C}(D)$ is a polyhedral complex (see, e.g., [12]), a vertex of a given chamber is itself a chamber, and hence an element of $\mathscr{V}(D)$. This proves the inclusion $\mathscr{V}(D)\supseteq\bigcup_{K\in\mathscr{K}(D)}\mathrm{ext}(K)$.

For the reverse inclusion, take $v\in\mathscr{V}(D)$. Since $X=\bigcup_{K\in\mathscr{K}(D)}K$, the chamber $\{v\}$ is included in some maximal chamber $K\in\mathscr{K}(D)$. But then $\{v\}$ is a vertex of $K$, again by the properties of polyhedral complexes. This finishes the proof. ∎

3 Geometrical structure of vertex-supported belief

From the properties of the chamber complex, it is easy to deduce that continuous piecewise linear functions whose pieces are the chambers attain their extreme values on $\mathscr{V}(D)$. Definition 3.1 and Corollary 3.2 formalize this intuition.

Definition 3.1.

A function $f:X\to\mathbb{R}$ is said to be piecewise linear over the chamber complex $\mathscr{C}(D)$ if there exists a family of pairs $\{(d_{C},a_{C})\ :\ C\in\mathscr{C}(D)\}\subseteq\mathbb{R}^{n_{x}}\times\mathbb{R}$ such that

f(x)=\sum_{C\in\mathscr{C}(D)}(\langle d_{C},x\rangle+a_{C})\mathds{1}_{\mathrm{ri}(C)}(x),\qquad\forall x\in X.   (26)
Corollary 3.2.

If a function $f:X\to\mathbb{R}$ is continuous and piecewise linear over the chamber complex $\mathscr{C}(D)$, then it has at least one minimizer in $\mathscr{V}(D)$.

Proof.

Since $f$ is continuous and $X$ is nonempty and compact, the Weierstrass theorem entails that $f$ attains its minimum at some point $x^{*}\in X$. Then, there exists a chamber $C\in\mathscr{C}(D)$ such that $x^{*}\in C$. Since $f$ is piecewise linear over the chamber complex and continuous, the restriction $f\big|_{C}$ is affine. Thus, there exists $v^{*}\in\mathrm{ext}(C)$ such that $f(v^{*})=f\big|_{C}(v^{*})\leq f\big|_{C}(x^{*})=f(x^{*})$. Clearly $\mathrm{ext}(C)\subseteq\mathrm{ext}(K)$ for some $K\in\mathscr{K}(D)$, and hence Proposition 2.7 yields that $v^{*}\in\mathscr{V}(D)$, which finishes the proof. ∎

3.1 Piecewise affinity of the fiber map

We show in this section that the fiber map $S$ defined in equation (13) is piecewise affine, in the sense of Definition 3.3 below. Some properties in this subsection seem to be well known but are hard to find in the literature in the present form. Here, we provide a presentation (and proofs) based exclusively on mathematical programming tools. We also present one of our main contributions, which is Theorem 3.6.

Definition 3.3.

A set-valued map $M:X\subseteq\mathbb{R}^{n_{x}}\rightrightarrows\mathbb{R}^{n_{y}}$ is affine on a convex set $K\subseteq\mathrm{dom}(M)$ if

M(\eta x_{1}+(1-\eta)x_{0})=\eta M(x_{1})+(1-\eta)M(x_{0}),

for all $x_{0},x_{1}\in K$ and $\eta\in(0,1)$, where the addition of sets is understood in the sense of Minkowski. We say that $M$ is piecewise affine if the domain of $M$ can be written as a finite union of convex subsets on each of which the set-valued map is affine.

Note that if a set-valued map $M$ is affine over $K$ and there exists $\bar{x}\in\mathrm{ri}(K)$ such that $|M(\bar{x})|=1$, then $M$ must be single-valued over $K$. Indeed, consider $x\in K$ and arbitrary points $y^{\prime},y^{\prime\prime}\in M(x)$. Since $\bar{x}\in\mathrm{ri}(K)$, we can find $\tilde{x}\in K$ and $\eta\in(0,1)$ such that $\bar{x}=\eta\tilde{x}+(1-\eta)x$. We can take $\tilde{y}\in M(\tilde{x})$. But then both $\bar{y}^{\prime}=\eta\tilde{y}+(1-\eta)y^{\prime}$ and $\bar{y}^{\prime\prime}=\eta\tilde{y}+(1-\eta)y^{\prime\prime}$ belong to $M(\bar{x})$, which implies that $\bar{y}^{\prime}=\bar{y}^{\prime\prime}$, and this yields that $y^{\prime}=y^{\prime\prime}$. So $M(x)$ contains a single point and, from the arbitrariness of $x$, we conclude that $M$ is single-valued on $K$. With this observation, we can establish the following lemma.

Lemma 3.4.

Let $M:X\subseteq\mathbb{R}^{n_{x}}\rightrightarrows\mathbb{R}^{n_{y}}$ be affine on a convex set $K\subseteq\mathrm{dom}(M)$, and consider a linear functional $\psi:\mathbb{R}^{n_{y}}\to\mathbb{R}$ such that, for some $x\in K$, $\sup\psi(M(x))$ is finite and attained. Then:

  1. The function $\varphi(x):=\sup\psi(M(x))$ is affine on $K$.

  2. The set-valued map $M_{\psi}(x):=\mathop{\rm argmax}\nolimits_{y\in M(x)}\psi(y)$ is affine on $K$. Additionally, if there exists a point $\bar{x}\in\mathrm{ri}(K)$ with $|M_{\psi}(\bar{x})|=1$, then $M_{\psi}$ is single-valued on $K$.

Proof.

Fix $x_{0},x_{1}\in K$ and $\eta\in(0,1)$, and let $x_{\eta}=\eta x_{1}+(1-\eta)x_{0}$. The first part of the lemma follows from

\sup\psi(M(x_{\eta}))=\sup\psi(\eta M(x_{1})+(1-\eta)M(x_{0}))=\sup\{\eta\psi(M(x_{1}))+(1-\eta)\psi(M(x_{0}))\}=\sup\{\eta\psi(M(x_{1}))\}+\sup\{(1-\eta)\psi(M(x_{0}))\}=\eta\varphi(x_{1})+(1-\eta)\varphi(x_{0}).

Let us now prove the second part. If $y\in M_{\psi}(x_{\eta})$, we have that $y\in M(x_{\eta})$ and, since $M$ is affine, $y=\eta y_{1}+(1-\eta)y_{0}$ for some $y_{0}\in M(x_{0})$ and $y_{1}\in M(x_{1})$. Since $\varphi$ is affine we have that

\varphi(x_{\eta})=\eta\varphi(x_{1})+(1-\eta)\varphi(x_{0})\geq\eta\psi(y_{1})+(1-\eta)\psi(y_{0})=\psi(y)=\varphi(x_{\eta}).

But then $\eta\varphi(x_{1})+(1-\eta)\varphi(x_{0})=\eta\psi(y_{1})+(1-\eta)\psi(y_{0})$, and this can only occur if $\psi(y_{1})=\varphi(x_{1})$ and $\psi(y_{0})=\varphi(x_{0})$. Thus $y_{1}\in M_{\psi}(x_{1})$ and $y_{0}\in M_{\psi}(x_{0})$. We have hence proved the inclusion

M_{\psi}(x_{\eta})\subseteq\eta M_{\psi}(x_{1})+(1-\eta)M_{\psi}(x_{0}).

The reverse inclusion corresponds to the convexity of the graph of $M_{\psi}$, which follows directly from the fact that $\mathrm{gph}\,M_{\psi}$ can be written as the intersection of $\mathrm{gph}\,M$ and $\{(x,y):\psi(y)-\varphi(x)\geq 0\}$. The fact that $M_{\psi}$ is single-valued if there exists a point $\bar{x}\in\mathrm{ri}(K)$ such that $|M_{\psi}(\bar{x})|=1$ follows from the discussion preceding the lemma statement. ∎

Lemma 3.5.

The fiber map $S$ defined in equation (13) is piecewise affine over the chamber complex $\mathscr{C}(D)$. More precisely, for any chamber $K\in\mathscr{C}(D)$, $S$ is affine on $K$.

Proof.

Let $K\in\mathscr{C}(D)$ and let $v_{1},\ldots,v_{k}$ be the vertices of $K$. If $x_{0},x_{1}\in K$, then

x_{0}=\sum_{i=1}^{k}\lambda_{0,i}v_{i},\qquad x_{1}=\sum_{i=1}^{k}\lambda_{1,i}v_{i},

for some $\lambda_{0,i},\lambda_{1,i}\geq 0$, $i\in[k]$, such that $\sum_{i=1}^{k}\lambda_{0,i}=\sum_{i=1}^{k}\lambda_{1,i}=1$. Consider now a point $x_{\eta}:=\eta x_{1}+(1-\eta)x_{0}$ for some $\eta\in(0,1)$. We clearly have that $x_{\eta}=\sum_{i=1}^{k}\lambda_{\eta,i}v_{i}$ with $\lambda_{\eta,i}:=\eta\lambda_{1,i}+(1-\eta)\lambda_{0,i}$, $i\in[k]$.

Applying the representation of [31, Proposition 2.4], we know that for each convex combination $x=\sum_{i=1}^{k}\lambda_{i}v_{i}\in K$, we have that $S(x)=\sum_{i=1}^{k}\lambda_{i}S(v_{i})$. Thus, we can write

S(x_{\eta})=\sum_{i=1}^{k}\lambda_{\eta,i}S(v_{i})=\eta\sum_{i=1}^{k}\lambda_{1,i}S(v_{i})+(1-\eta)\sum_{i=1}^{k}\lambda_{0,i}S(v_{i})=\eta S(x_{1})+(1-\eta)S(x_{0}).

This concludes the proof. ∎

Recall from [32] that a belief $\beta:X\to\mathscr{P}(Y)$ is said to be weak continuous if for every sequence $(x_{k})\subseteq X$ converging to $x\in X$, the measure $\beta_{x_{k}}$ weak converges to $\beta_{x}$ (in the sense of [26, Chapter 13]), that is,

\int_{Y}f(y)\,d\beta_{x_{k}}(y)\xrightarrow{k\to\infty}\int_{Y}f(y)\,d\beta_{x}(y),\quad\forall f:Y\to\mathbb{R}\text{ continuous.}   (27)
Theorem 3.6.

Consider a random cost $c:\Omega\to\mathbb{S}_{n_{y}}$ with nonatomic distribution and the vertex-supported belief $\beta:X\to\mathscr{P}(Y)$ over $S$ induced by $c$, as defined in (19). Then,

  1. $\beta$ is weak continuous, and thus for any lower semicontinuous function $f:X\times Y\to\mathbb{R}$, the problem $\min_{x\in X}\,\mathbb{E}_{\beta_{x}}[f(x,\cdot)]$ has a solution.

  2. The function $\theta:X\to\mathbb{R}$ given by $\theta(x):=\langle d_{1},x\rangle+\mathbb{E}_{\beta_{x}}[\langle d_{2},\cdot\rangle]$ is continuous and piecewise linear over $\mathscr{C}(D)$.

In particular, problem (1) has at least one solution in $\mathscr{V}(D)$.

Proof.

Since $\mathscr{K}(D)$ is a covering of $X$, it is enough to prove the assertions within a given chamber $K\in\mathscr{K}(D)$. Let $\bar{x}\in\mathrm{int}(K)$ and consider the vertices $\bar{y}_{1},\ldots,\bar{y}_{k}$ of $S(\bar{x})$. Since $S$ is affine on $K$ with compact polyhedral images, we first claim that there exist affine functions $y_{1},y_{2},\ldots,y_{k}:K\to Y$ such that for all $x\in K$ one has $\{y_{i}(x)\}_{i\in[k]}=\mathrm{ext}(S(x))$ or, equivalently,

\{y_{i}(x)\}_{i\in[k]}\subseteq\mathrm{ext}(S(x))\quad\text{ and }\quad S(x)=\mathrm{conv}\{y_{i}(x)\ :\ i\in[k]\}.

Indeed, for each $i\in[k]$ there exists a linear functional $\psi_{i}$ that exposes $\bar{y}_{i}$, i.e., such that $\{\bar{y}_{i}\}=\{y\in S(\bar{x}):\psi_{i}(y)\geq\psi_{i}(\bar{y}_{i})\}$. Then, from Lemma 3.4, the maximum of $\psi_{i}$ over $S(x)$ defines the affine function

x\mapsto y_{i}(x)=\mathop{\rm argmax}\nolimits\{\psi_{i}(y):y\in S(x)\},

for each $i\in[k]$. This proves that $\{y_{1}(x),y_{2}(x),\ldots,y_{k}(x)\}\subseteq\mathrm{ext}(S(x))$ for each $x\in K$. Now, it is direct that $S(x)\supseteq\mathrm{conv}(y_{1}(x),y_{2}(x),\ldots,y_{k}(x))$ for each $x\in K$, and so we only need to prove the remaining inclusion. Suppose that this is not the case and choose $x\in K$ and $y\in S(x)\setminus M$ with $M=\mathrm{conv}(y_{1}(x),y_{2}(x),\ldots,y_{k}(x))$. Since $M$ is compact, there exists a linear functional $\psi$ that strictly separates $y$ from $M$, and so $\sup\psi(S(x))>\sup\psi(M)$. Now, since $\bar{x}\in\mathrm{int}(K)$, there exist $x^{\prime}\in K$ and $\eta\in(0,1)$ such that $\bar{x}=\eta x+(1-\eta)x^{\prime}$. Let $M^{\prime}=\mathrm{conv}(y_{1}(x^{\prime}),y_{2}(x^{\prime}),\ldots,y_{k}(x^{\prime}))$. Then, by affinity of the functions $y_{1},\ldots,y_{k}$, one has that $S(\bar{x})=\eta M+(1-\eta)M^{\prime}$. Then, by Lemmas 3.4 and 3.5 we can write

\sup\psi(S(\bar{x}))=\sup\psi(\eta M+(1-\eta)M^{\prime})=\eta\sup\psi(M)+(1-\eta)\sup\psi(M^{\prime})<\eta\sup\psi(S(x))+(1-\eta)\sup\psi(S(x^{\prime}))=\sup\psi(S(\bar{x})),

which is a contradiction. This proves our first claim. Our second claim is that, for each $i\in[k]$,

N_{S(x)}(y_{i}(x))=\bigcup_{j\in[k]:y_{j}(x)=y_{i}(x)}N_{S(\bar{x})}(\bar{y}_{j}),\quad\forall x\in K.   (28)

Indeed, for the direct inclusion, fix $x\in K$ and choose $\psi\in\mathrm{int}(N_{S(x)}(y_{i}(x)))$, which is nonempty since $y_{i}(x)\in\mathrm{ext}(S(x))$. This means that $\psi$ attains its maximum on $S(x)$ only at $y_{i}(x)$. Now, $\psi$ attains its maximum over $S(\bar{x})$ at some $\bar{y}_{j}$, and so $\psi\in N_{S(\bar{x})}(\bar{y}_{j})$. Since $\bar{x}\in\mathrm{int}(K)$, there exist $x^{\prime}\in K$ and $\eta\in(0,1)$ such that $\bar{x}=\eta x+(1-\eta)x^{\prime}$. Then, by construction, $\bar{y}_{j}=\eta y_{j}(x)+(1-\eta)y_{j}(x^{\prime})$, which yields that $\psi$ attains its maximum on $S(x)$ at $y_{j}(x)$. This implies that $y_{i}(x)=y_{j}(x)$, and so $\psi\in\bigcup_{j\in[k]:y_{j}(x)=y_{i}(x)}N_{S(\bar{x})}(\bar{y}_{j})$. The inclusion is proven by taking the closure of $\mathrm{int}(N_{S(x)}(y_{i}(x)))$. Now, for the reverse inclusion, choose $j\in[k]$ such that $y_{i}(x)=y_{j}(x)$ and $\psi\in N_{S(\bar{x})}(\bar{y}_{j})$. Then, $\psi$ attains its maximum on $S(\bar{x})$ at $\bar{y}_{j}$ and so, replicating the argument above, $\psi$ attains its maximum on $S(x)$ at $y_{j}(x)$. Since $y_{i}(x)=y_{j}(x)$, the conclusion follows. This proves the second claim.

Now, (28) entails that $x\mapsto N_{S(x)}(y_{i}(x))$ is constant on $\mathrm{int}(K)$, since for every $x\in\mathrm{int}(K)$ one has that $y_{i}(x)\neq y_{j}(x)$ if and only if $i\neq j$. Then, for each $i\in[k]$, it is clear that

p_{c}(x,y_{i}(x))=\mathbb{P}[-c(\omega)\in N_{S(x)}(y_{i}(x))]

is constant on $\mathrm{int}(K)$, equal to

c_{i}:=p_{c}(\bar{x},y_{i}(\bar{x})),   (29)

and for points $x$ in the boundary of $K$ it holds that $p_{c}(x,y_{i}(x))=\sum\{c_{j}:j\in[k],\,y_{i}(x)=y_{j}(x)\}$. Thus, it follows that for any continuous function $h:Y\to\mathbb{R}$ we have

\mathbb{E}_{\beta_{x}}[h]=\sum_{i=1}^{k}h(y_{i}(x))c_{i},\qquad\forall x\in K.   (30)

This last expression is continuous on $K$. Then, since $X$ is covered by the finitely many (closed) full-dimensional chambers, $x\mapsto\mathbb{E}_{\beta_{x}}[h]$ must be continuous on $X$ and, since $h$ is arbitrary, $\beta$ is a weak continuous belief. Then, for every lower semicontinuous function $f:X\times Y\to\mathbb{R}$, $\mathbb{E}_{\beta_{x}}[f(x,\cdot)]$ is lower semicontinuous as well (see [32]). Finally, from (30) it is clear that, taking $f(x,y)=\langle d_{1},x\rangle+\langle d_{2},y\rangle$, the function $\theta(x)=\mathbb{E}_{\beta_{x}}[f(x,\cdot)]$ is affine over $K$. ∎

3.2 Sample average formulation

Problem (1) has an intrinsic difficulty, which consists in how to evaluate the objective function $\theta(x)=\langle d_{1},x\rangle+\mathbb{E}_{\beta_{x}}[\langle d_{2},\cdot\rangle]$. An exact evaluation of $\theta$ at a point $x\in X$ would require computing all the vertices $y_{1},\ldots,y_{k}$ of $S(x)$ and the values $c_{1},\ldots,c_{k}$ defined as the "sizes" of the respective normal cones at each vertex $y_{i}$, as in (29). This is not always possible.

To deal with this issue, we consider the well-known sample average approximation (SAA) method for stochastic optimization (see, e.g., [33, 22]). We take an independent sample $\{\hat{c}_{1},\ldots,\hat{c}_{N_{0}}\}$ of the random lower-level cost $c(\cdot)$ and try to solve the (now deterministic) problem

\left\{\begin{array}{cl}\displaystyle\min_{x}&\langle d_{1},x\rangle+\frac{1}{N_{0}}\sum_{i=1}^{N_{0}}\langle d_{2},y_{i}(x)\rangle\\ \text{s.t.}&A_{l}x\leq b_{l}\\ &\forall i\in[N_{0}],\,y_{i}(x)\text{ solves }\left\{\begin{array}{cl}\displaystyle\min_{y}&\langle\hat{c}_{i},y\rangle\\ \text{s.t.}&A_{f}x+B_{f}y\leq b_{f}.\end{array}\right.\end{array}\right.   (31)
Proposition 3.7.

Assume that $c(\cdot)$ has a nonatomic distribution over $\mathbb{S}_{n_{y}}$, in the sense of (17). Then, with probability 1, we have that

\mathop{\rm argmin}\nolimits_{y}\left\{\langle\hat{c}_{i},y\rangle\ :\ A_{f}x+B_{f}y\leq b_{f}\right\}\text{ is a singleton}\quad\forall i\in[N_{0}],\,\forall x\in X.
Proof.

We begin by noting that this result is well known for a fixed $x\in X$. Here we extend it to handle the universal quantifier over $x$. Consider the set

\mathcal{N}:=\big\{\omega\in\Omega\,:\,\exists x\in X,\,|\mathop{\rm argmin}\nolimits_{y}\left\{\langle\hat{c}_{i}(\omega),y\rangle:A_{f}x+B_{f}y\leq b_{f}\right\}|>1\big\}.

Let $\omega\in\mathcal{N}$, and choose $x\in X$ as in the definition of $\mathcal{N}$. The dual of the LP $\min_{y}\left\{\langle\hat{c}_{i}(\omega),y\rangle:B_{f}y\leq b_{f}-A_{f}x\right\}$ is

\max_{\rho}\left\{\rho^{T}(b_{f}-A_{f}x)\,:\,B_{f}^{T}\rho=\hat{c}_{i}(\omega),\,\rho\leq 0\right\}.

Take $\rho^{*}$ an optimal basic solution to the dual. This solution must have at most $n_{y}$ non-zeros; moreover, after potentially permuting columns of $B_{f}^{T}$, we can assume $\rho^{*}=((\tilde{B}_{f}^{T})^{-1}\hat{c}_{i}(\omega),\mathbf{0}_{m_{f}-n_{y}})$, where $\tilde{B}_{f}$ is an invertible $n_{y}\times n_{y}$ submatrix of $B_{f}$ and $\mathbf{0}_{m_{f}-n_{y}}$ is the vector of $m_{f}-n_{y}$ zeros.

Let $\tilde{b}_{f}-\tilde{A}_{f}x$ be the sub-vector of $b_{f}-A_{f}x$ corresponding to the rows of $\tilde{B}_{f}$. From linear programming theory, we know there is an optimal solution $y^{*}$ to the primal for which $\tilde{B}_{f}y^{*}=\tilde{b}_{f}-\tilde{A}_{f}x$. Since we are assuming there are at least two solutions to the primal, there must exist another optimal solution $\hat{y}$ such that one of the inequalities in $\tilde{B}_{f}\hat{y}\leq\tilde{b}_{f}-\tilde{A}_{f}x$ has positive slack. This implies that one of the components of $(\tilde{B}_{f}^{T})^{-1}\hat{c}_{i}(\omega)$ must be zero, as the primal-dual pair $(\hat{y},\rho^{*})$ must satisfy complementary slackness. We conclude that $\rho^{*}$ has at most $n_{y}-1$ non-zeros.

Dual feasibility thus implies that $\hat{c}_{i}(\omega)$ lies in the span of $n_{y}-1$ columns of $B_{f}^{\top}$, and such a subspace (intersected with the unit sphere $\mathbb{S}_{n_{y}}$) has null measure with respect to the Hausdorff measure $\mathcal{H}^{n_{y}-1}$. We conclude by noting that there are only finitely many such subspaces; thus, the set $\{\hat{c}_{i}(\omega)\,:\,\omega\in\mathcal{N}\}$ has null measure. ∎

Based on the above proposition, for the sample $\{\hat{c}_{1},\ldots,\hat{c}_{N_{0}}\}$ we can, for each $i\in[N_{0}]$, define the mapping $x\mapsto y_{i}(x)$, where $y_{i}(x)$ is the unique element in $\mathop{\rm argmin}\nolimits_{y}\left\{\langle\hat{c}_{i},y\rangle\ :\ A_{f}x+B_{f}y\leq b_{f}\right\}$. The following direct theorem shows that the objective of the sample average approximation also has the property of being (almost surely) piecewise linear over the chamber complex $\mathscr{C}(D)$.
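With these mappings in hand, evaluating the objective of (31) at a given $x$ amounts to solving $N_{0}$ independent lower-level LPs. A minimal sketch (reusing the LP-based helpers `estimate_follower_term` and `sample_sphere_costs` from Section 2.2, and hypothetical data names) could read:

```python
def saa_objective(x, d1, d2, A_f, B_f, b_f, sampled_costs):
    """SAA objective of (31): <d1, x> + (1/N0) * sum_i <d2, y_i(x)>, where each
    y_i(x) solves the follower LP for the fixed sampled cost c_i (unique w.p. 1
    by Proposition 3.7)."""
    return d1 @ x + estimate_follower_term(x, d2, A_f, B_f, b_f, sampled_costs)

# Usage: the sample is drawn once and kept fixed, so the resulting function of x
# is deterministic and (by Theorem 3.8) piecewise linear over the chamber complex.
# sampled_costs = sample_sphere_costs(n_y=B_f.shape[1], n_samples=200)
# value = saa_objective(x, d1, d2, A_f, B_f, b_f, sampled_costs)
```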

Theorem 3.8.

If $c$ has a nonatomic distribution over $\mathbb{S}_{n_{y}}$, then, with probability 1, the function $\theta:X\to\mathbb{R}$ given by $\theta(x):=\langle d_{1},x\rangle+\frac{1}{N_{0}}\sum_{i=1}^{N_{0}}\langle d_{2},y_{i}(x)\rangle$ is well-defined and piecewise linear over the chamber complex $\mathscr{C}(D)$.

We close this section by showing that problem (31) is a consistent approximation of the original problem (20).

Proposition 3.9.

Let $x^{*}$ be an optimal solution of problem (31) for a given sample of size $N_{0}$, and let $\nu^{*}$ be the associated optimal value. Let $\mathcal{S}$ be the set of solutions of problem (1) and let $\bar{\nu}$ be its optimal value. Then,

  1. $d(x^{*},\mathcal{S}):=\inf_{x\in\mathcal{S}}\|x^{*}-x\|\xrightarrow{N_{0}\to\infty}0$, w.p.1.

  2. $\nu^{*}\xrightarrow{N_{0}\to\infty}\bar{\nu}$, w.p.1.

Proof.

Define the function $f_{N_{0}}(x):=\frac{1}{N_{0}}\sum_{i=1}^{N_{0}}\langle d_{2},y_{i}(x)\rangle$. First, note that, with probability 1, $f_{N_{0}}(x)$ converges pointwise to the function $f(x):=\mathbb{E}_{\beta_{x}}[\langle d_{2},\cdot\rangle]$ as $N_{0}\to\infty$. Since $\mathscr{V}(D)$ is finite, we have that, with probability 1, $f_{N_{0}}$ converges to $f$ uniformly on $\mathscr{V}(D)$. That is, for almost every infinite sample $\{\hat{c}_{i}(\omega)\}_{i\in\mathbb{N}}$, one has that $\sup_{x\in\mathscr{V}(D)}|f_{N_{0}}(x)-f(x)|\to 0$ as $N_{0}\to\infty$.

Consider a full-dimensional chamber $K\in\mathscr{K}(D)$. Then the affinity of $f_{N_{0}}$ and $f$ on $K$ yields that

\sup_{x\in K}|f_{N_{0}}(x)-f(x)|=\sup_{x\in\mathrm{ext}(K)}|f_{N_{0}}(x)-f(x)|\leq\sup_{x\in\mathscr{V}(D)}|f_{N_{0}}(x)-f(x)|,

where the last inequality follows from the inclusion $\mathrm{ext}(K)\subseteq\mathscr{V}(D)$ (see Proposition 2.7). Thus, with probability 1, the convergence is uniform on $K$. Since there are finitely many full-dimensional chambers and they cover $X$, we conclude that the convergence is, with probability 1, uniform on $X$ as well. Thus, from [33, Theorem 5.3] we conclude the desired properties. ∎

4 Some illustrative examples

Here we provide some academic examples illustrating key ideas and difficulties related to the chamber complex. We also review in detail the bilevel continuous knapsack problem with uncertain costs, which was recently studied in [5].

4.1 Academic examples

Example 4.1 ($\mathscr{V}(D)$ can be arbitrarily large for fixed dimensions ($2\times 1$)).

Let $n\in\mathbb{N}$ and fix $\varepsilon\in(0,1)$. For each $i=0,\ldots,n$ define $\alpha_{i}=\frac{\varepsilon i}{2n^{2}}(2n-i+1)$ and $\beta_{i}=1-\frac{i}{n}$. Now consider the polytope $D_{n}$ of all points $(x,y)\in\mathbb{R}^{2}\times\mathbb{R}$ satisfying $\alpha_{i}+\beta_{i}|x_{1}|\leq y\leq 1-\alpha_{i}-\beta_{i}|x_{2}|$ for all $i\in\{0,\ldots,n\}$.

Figure 2: (a) The polytope $D_{2}$ from Example 4.1. (b) The chamber complex of $D_{2}$, which has several vertices that are not projections of vertices of the polytope.

Note that since $\alpha_{0}=0$ and $\beta_{0}=1$, the polytope $D_{n}$ is a subset of the polytope $D$ in Figure 1, obtained from it by additional cuts. The chamber complex of $D_{n}$ has $(2n+1)^{2}$ vertices that are not projections of vertices of the polytope itself. We depict $D_{2}$ and its chamber complex in Figure 2. $\Diamond$
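For completeness, a small sketch (Python, purely illustrative) generating the inequality description of $D_{n}$, where each absolute-value condition is split into two linear inequalities in $(x_{1},x_{2},y)$:

```python
import numpy as np

def build_Dn(n, eps):
    """Rows (a1, a2, b, rhs) meaning a1*x1 + a2*x2 + b*y <= rhs, describing D_n."""
    rows = []
    for i in range(n + 1):
        alpha = eps * i * (2 * n - i + 1) / (2 * n ** 2)
        beta = 1 - i / n
        # alpha_i + beta_i*|x1| <= y   <=>   +-beta_i*x1 - y <= -alpha_i
        rows += [(beta, 0.0, -1.0, -alpha), (-beta, 0.0, -1.0, -alpha)]
        # y <= 1 - alpha_i - beta_i*|x2|   <=>   +-beta_i*x2 + y <= 1 - alpha_i
        rows += [(0.0, beta, 1.0, 1 - alpha), (0.0, -beta, 1.0, 1 - alpha)]
    return np.array(rows)

print(build_Dn(2, 0.5).shape)   # (12, 4): 4*(n+1) linear inequalities for n = 2
```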

Example 4.2 ($\mathscr{V}(D)$ can be exponentially larger than $\mathscr{K}(D)$).

Consider a polytope $\tilde{D}_{n}$ in $\mathbb{R}^{n}\times\mathbb{R}$, where $n_{x}=n$ and $n_{y}=1$, defined as the convex hull of the bottom hypercube $B=\{x\in\mathbb{R}^{n}:\|x\|_{\infty}\leq 1\}\times\{0\}$ and the top hypercube $T=\{x\in\mathbb{R}^{n}:\|x\|_{\infty}\leq 1/2\}\times\{1\}$. The representation of $\tilde{D}_{n}$ by linear inequalities can be written as

\tilde{D}_{n}=\left\{(x,y)\ :\ \begin{array}{l}0\leq y\leq 1,\\ \frac{1}{2}y-1\leq x_{i}\leq 1-\frac{1}{2}y\quad\forall i\in[n]\end{array}\right\}.

We call $\tilde{D}_{n}$ the $(n+1)$-truncated pyramid. Figure 3 illustrates the truncated pyramid $\tilde{D}_{2}$ and Figure 4 provides its chamber complex in $\mathbb{R}^{2}$. While it is no longer possible to draw $\tilde{D}_{3}$ as a 4-dimensional object, Figure 5 shows the chamber complex in this case.

Figure 3: Truncated pyramid $\tilde{D}_{2}$.
Figure 4: Chamber complex of $\tilde{D}_{2}$.
Figure 5: Chamber complex of $\tilde{D}_{3}$.

The $(n+1)$-truncated pyramid $\tilde{D}_{n}$ is an example where the number of vertices of the chamber complex, $|\mathscr{V}(\tilde{D}_{n})|=2^{n+1}$, is exponentially larger than the number of full-dimensional chambers, $|\mathscr{K}(\tilde{D}_{n})|=2n+1$. $\Diamond$

4.2 The bilevel continuous knapsack problem

Let a=(a1,,an)+na=(a_{1},\ldots,a_{n})\in\mathbb{R}_{+}^{n} be a weight vector and let A:=i=1naiA:=\sum_{i=1}^{n}a_{i}. The bilevel continuous knapsack problem associated to the weight vector aa is given by

BK(a):={maxxδx+𝔼[dy(x,ω)]s.t.x[L,U]y(x,ω) solves {minyc(ω),ys.t.ayx,y[0,1]nya.s. ωΩ,BK(a):=\left\{\begin{array}[]{cl}\displaystyle\max_{x}&-\delta x+\mathbb{E}[d^{\top}y(x,\omega)]\\ \\ s.t.&x\in[L,U]\\ &y(x,\omega)\text{ solves }\left\{\begin{array}[]{cl}\displaystyle\min_{y}&\langle c(\omega),y\rangle\\ s.t.&\begin{array}[]{l}a^{\top}y\leq x,\\ y\in[0,1]^{{n_{y}}}\end{array}\end{array}\right.\quad\text{a.s. }\omega\in\Omega,\end{array}\right. (32)

where δ+\delta\in\mathbb{R}_{+} is the cost of the knapsack capacity (decided by the leader), and dnd\in\mathbb{R}^{n} is the valorization of the leader over the objects 11 to nn. Once the knapsack size xx is decided, the follower must choose how much of each (divisible) object to carry with, respecting the capacity constraint ayxa^{\top}y\leq x, using as criteria the uncertain cost c(ω)c(\omega). A detailed study of this problem has been recently done in [5]. It is shown that the set of chamber vertices is

𝒱(D)={iIai:I[n]}[L,U],\mathscr{V}(D)=\left\{{\textstyle\sum_{i\in I}a_{i}\ :\ I\subseteq[n]}\right\}\cap[L,U], (33)

under the convention ai=0\sum_{\emptyset}a_{i}=0. Choosing a+na\in\mathbb{R}^{n}_{+} adequately, it is possible to obtain different values for every partial sum in (33) (for instance, with ai=2ia_{i}=2^{i} for all i[n]i\in[n]). This leads to |𝒱(D)|=2n|\mathscr{V}(D)|=2^{n}, which is exponentially large with respect to the 2n+32n+3 constraints needed to define DD. Furthermore, 𝒦(D)\mathscr{K}(D) is given by the intervals between consecutive vertices, and so |𝒦(D)|=2n1|\mathscr{K}(D)|=2^{n}-1. Thus, this is an example where 𝒦(D)\mathscr{K}(D) is almost as large as 𝒱(D)\mathscr{V}(D), and both are exponentially large with respect to the data of the problem.

When a\in\mathbb{N}^{n} (i.e., the weights (a_{1},\ldots,a_{n}) are integers), one obtains the overestimate \mathscr{V}(D)\subseteq\{0,1,2,\ldots,A\}\cap[L,U], where A=\sum_{i=1}^{n}a_{i}. Thus, assuming that we have an oracle to evaluate the function \theta(x)=\mathbb{E}[d^{\top}y(x,\omega)]-\delta x for any x\in[0,A] (or that we are solving a sample approximation), we can solve BK(a) in pseudo-polynomial time by computing the values \{\theta(0),\theta(1),\ldots,\theta(A)\} (only pseudo-polynomial, since the number A requires just \log(A) bits to be represented).
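As a small illustration of (33), the following plain Julia fragment (the function name and arguments are ours) enumerates the candidate chamber vertices by listing all subset sums of the weights and keeping those in [L,U]; for the instance of Example 4.3 below it returns 0, 1, 2 and 3.

# Enumerate the chamber vertices (33) of the bilevel continuous knapsack
# problem: all subset sums of the weights a, intersected with [L, U].
function knapsack_chamber_vertices(a::Vector{Float64}, L::Float64, U::Float64)
    sums = Set{Float64}([0.0])              # convention: the empty sum equals 0
    for ai in a
        sums = union(sums, Set(s + ai for s in sums))
    end
    return sort([s for s in sums if L <= s <= U])
end

# Example 4.3: a = (1, 2) and [L, U] = [0, 3] give the vertices 0, 1, 2, 3.
println(knapsack_chamber_vertices([1.0, 2.0], 0.0, 3.0))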

Example 4.3.

Let us consider a very simple instance of problem (32), where n=2, a_{1}=1, a_{2}=2, [L,U]=[0,3], and c follows a uniform distribution over \mathbb{S}_{2}. The polytope D corresponding to the feasible region of the high-point relaxation of the problem is depicted in Figure 6.

In this case, the chamber vertices are exactly 𝒱(D)={0,1,2,3}\mathscr{V}(D)=\{0,1,2,3\}, and so, the full-dimensional chambers are given by 𝒦(D)={[0,1],[1,2],[2,3]}\mathscr{K}(D)=\{[0,1],[1,2],[2,3]\}. The fiber map SS in each full-dimensional chamber, together with the normal cones of the extreme points, is depicted in Figure 7.

Figure 6: Polytope DD. In yellow, S(0.5)S(0.5), S(1.5)S(1.5) and S(2.5)S(2.5).
Figure 7: Slices S(0.5), S(1.5) and S(2.5), and the normal cones at their extreme points, with \alpha=\arccos(1/\sqrt{5}). The cone angles are \tfrac{\pi}{2}+\alpha, \pi-\alpha and \tfrac{\pi}{2} in panel (a); \tfrac{\pi}{2}+\alpha, \tfrac{\pi}{2}-\alpha, \tfrac{\pi}{2} and \tfrac{\pi}{2} in panel (b); and \alpha, \tfrac{\pi}{2}-\alpha, \tfrac{\pi}{2}, \tfrac{\pi}{2} and \tfrac{\pi}{2} in panel (c).

Setting α=arccos(1/5)\alpha=\arccos(1/\sqrt{5}), it is possible to show that the centroid map 𝔟\mathfrak{b} of the fiber map SS is given by

𝔟(x)={(14+α2π)[0x/2]+(14+π/2α2π)[x0] if x[0,1],(14+α2π)[0x/2]+14[10]+π/2α2π[1(x1)/2] if x[1,2],14[01]+α2π[x21]+14[10]+π/2α2π[1(x1)/2] if x[2,3].\mathfrak{b}(x)=\begin{cases}{\small\left(\frac{1}{4}+\frac{\alpha}{2\pi}\right)\begin{bmatrix}0\\ x/2\end{bmatrix}+\left(\frac{1}{4}+\frac{\pi/2-\alpha}{2\pi}\right)\begin{bmatrix}x\\ 0\end{bmatrix}}\quad&\text{ if }x\in[0,1],\\ {\small\left(\frac{1}{4}+\frac{\alpha}{2\pi}\right)\begin{bmatrix}0\\ x/2\end{bmatrix}+\frac{1}{4}\begin{bmatrix}1\\ 0\end{bmatrix}+\frac{\pi/2-\alpha}{2\pi}\begin{bmatrix}1\\ (x-1)/2\end{bmatrix}}&\text{ if }x\in[1,2],\\ {\small\frac{1}{4}\begin{bmatrix}0\\ 1\end{bmatrix}+\frac{\alpha}{2\pi}\begin{bmatrix}x-2\\ 1\end{bmatrix}+\frac{1}{4}\begin{bmatrix}1\\ 0\end{bmatrix}+\frac{\pi/2-\alpha}{2\pi}\begin{bmatrix}1\\ (x-1)/2\end{bmatrix}}&\text{ if }x\in[2,3].\end{cases}

This computation allows us to determine \theta(x) explicitly, simply by noting that \theta(x)=d^{\top}\mathfrak{b}(x)-\delta x. For appropriate values of \delta and d, the minimum of \theta can be made to be attained at any prescribed chamber vertex. \Diamond
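For completeness, the following Julia fragment (ours) simply transcribes the piecewise expression above and evaluates \theta for given d and \delta; it can be used, for instance, to check numerically at which chamber vertex the minimum is attained.

# Centroid map of Example 4.3 (a1 = 1, a2 = 2, c uniform over S_2),
# transcribed from the piecewise expression above, and the induced value θ.
const α = acos(1 / sqrt(5))

function centroid(x::Real)
    0 <= x <= 3 || error("x must lie in [0, 3]")
    w1 = 1/4 + α / (2π)             # weight of the vertex [0, x/2]
    w2 = (π/2 - α) / (2π)
    if x <= 1
        return w1 .* [0.0, x/2] .+ (1/4 + w2) .* [x, 0.0]
    elseif x <= 2
        return w1 .* [0.0, x/2] .+ (1/4) .* [1.0, 0.0] .+ w2 .* [1.0, (x - 1)/2]
    else
        return (1/4) .* [0.0, 1.0] .+ (α / (2π)) .* [x - 2, 1.0] .+
               (1/4) .* [1.0, 0.0] .+ w2 .* [1.0, (x - 1)/2]
    end
end

# Leader's value θ(x) = dᵀ𝔟(x) − δx.
theta(x, d, δ) = d' * centroid(x) - δ * x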

5 Enumeration algorithm

The rest of this work focuses on algorithms to solve problem (20) when we can evaluate the objective function x\mapsto\langle d_{1},x\rangle+\mathbb{E}_{\beta_{x}}[\langle d_{2},\cdot\rangle], and problem (31) otherwise. The key observation is that, thanks to Theorems 3.6 and 3.8, both problems have the form

minxXθ(x),\min_{x\in X}\theta(x), (34)

where θ:X\theta:X\to\mathbb{R} is a continuous function, piecewise linear over the chamber complex 𝒞(D)\mathscr{C}(D). Thus, we will provide algorithms to solve this generic problem.

In this setting, Corollary 3.2 gives us a natural strategy to solve problem (34): compute the chamber vertices \mathscr{V}(D) and evaluate the objective function \theta at each of them. In this section, we provide an enumeration algorithm to compute \mathscr{V}(D) by sequentially solving mixed-integer programming problems formulated using \mathscr{F}_{\leq n_{x}}.

Remark 5.1.

Computing 𝒱(D)\mathscr{V}(D) is at least as hard as computing all vertices of a polytope. Indeed, given an arbitrary (full-dimensional) polytope PnP\subseteq\mathbb{R}^{n}, one can consider

D:={(x,y)n+1:xP, 0y1}=P×[0,1].D:=\{(x,y)\in\mathbb{R}^{n+1}\,:\,x\in P,\,0\leq y\leq 1\}=P\times[0,1].

We observe that \mathscr{V}(D) corresponds exactly to the set of extreme points of P. This follows since all faces of D are either parallel or orthogonal to P; in both cases, the projections of faces of D are also faces of P (see Figure 8).

Figure 8: Illustration of D (the volume) as a direct lifting of P (in gray).

To the best of our knowledge, the complexity of computing all vertices of a polytope P is currently unknown. However, for a (not necessarily bounded) polyhedron P, it is known that it is NP-complete to decide, given a subset of vertices of P, whether there is a new vertex of P to add to the subset [24].

5.1 Mixed-integer formulation

In order to compute chambers, Definition 2.3 tells us that we need access to the Face Lattice \mathscr{F}. However, Proposition 2.4 relaxes this requirement from \mathscr{F} to \mathscr{F}_{\leq n_{x}} only. Recalling the representation (16), each face F\in\mathscr{F}_{n} is represented by a set of constraints J\subseteq[m] with |J|=\mathop{\rm codim}\nolimits(F)=(n_{x}+n_{y})-n. Thus,

|nx|=n=0nx|n|n=nynx+ny(mn)=O(mnx+ny).|\mathscr{F}_{\leq{n_{x}}}|=\sum_{n=0}^{{n_{x}}}|\mathscr{F}_{n}|\leq\sum_{n={n_{y}}}^{{n_{x}}+{n_{y}}}{m\choose n}=O(m^{{n_{x}}+{n_{y}}}). (35)

Given a chamber C𝒞(D)C\in\mathscr{C}(D) we define its label as

(C):={Fnx:Cπ(F)}\ell(C):=\left\{F\in\mathscr{F}_{\leq{n_{x}}}\ :\ C\subseteq\pi(F)\right\} (36)

and similarly for a point xXx\in X we define (x):=(σ(x))\ell(x):=\ell(\sigma(x)). Labels are a one-to-one representation of chambers, by noting that for every chamber C𝒞(D)C\in\mathscr{C}(D), we can write C=F(C)π(F)C=\bigcap_{F\in\ell(C)}\pi(F).

Lemma 5.2.

For every C1,C2𝒞(D)C_{1},C_{2}\in\mathscr{C}(D) one has that

C1C2(C1)(C2).C_{1}\subseteq C_{2}\iff\ell(C_{1})\supseteq\ell(C_{2}). (37)

Thus, the set {(v):v𝒱(D)}\{\ell(v)\ :\ v\in\mathscr{V}(D)\} corresponds to the maximal elements of {(C):C𝒞(D)}\{\ell(C)\ :\ C\in\mathscr{C}(D)\}, inclusion-wise.

Proof.

The implication \Rightarrow follows easily from the definition. For the implication \Leftarrow, if \ell(C_{1})\supseteq\ell(C_{2}), then

C1=F(C1)π(F)F(C2)π(F)=C2.C_{1}=\bigcap_{F\in\ell(C_{1})}\pi(F)\subseteq\bigcap_{F\in\ell(C_{2})}\pi(F)=C_{2}.

The final part follows directly from (37).∎∎

The next proposition shows how to compute the label of a point x\in X without computing the projections of faces, using a single LP formulation.

Proposition 5.3.

Let xXx\in X and consider an optimal solution (yF:Fnx)(y_{F}^{*}\ :\ F\in\mathscr{F}_{\leq{n_{x}}}) of the following LP problem

{max(yF)FnxgF(x,yF)s.t.ByFbAx,Fnx.\left\{\begin{array}[]{cl}\displaystyle\max_{(y_{F})}&\sum_{F\in\mathscr{F}_{\leq{n_{x}}}}g_{F}(x,y_{F})\\ s.t.&By_{F}\leq b-Ax,\quad\forall F\in\mathscr{F}_{\leq{n_{x}}}.\end{array}\right. (38)

Then

(x)={Fnx:gF(x,yF)=0}.\ell(x)=\left\{F\in\mathscr{F}_{\leq{n_{x}}}\ :\ g_{F}(x,y_{F}^{*})=0\right\}. (39)
Proof.

By Proposition 2.4, xri(σ(x))x\in\mathop{\rm ri}\nolimits(\sigma(x)) and then for each FnxF\in\mathscr{F}_{\leq{n_{x}}} we have

σ(x)π(F)\displaystyle\sigma(x)\subseteq\pi(F) xπ(F)\displaystyle\iff x\in\pi(F)
yFS(x) such that gF(x,yF)=0\displaystyle\iff\exists y^{*}_{F}\in S(x)\text{ such that }g_{F}(x,y_{F}^{*})=0

where the last equivalence comes from (16). Since x is fixed, there are no coupling constraints among the variables y_{F}, F\in\mathscr{F}_{\leq n_{x}}; it follows that any optimal solution (y_{F}^{*} : F\in\mathscr{F}_{\leq n_{x}}) of (38) satisfies g_{F}(x,y_{F}^{*})=0 exactly for the faces in \ell(x).∎∎
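As an illustration, once x is fixed problem (38) decouples over the faces, so the label can be computed by solving one small LP per face. A possible Julia/JuMP sketch follows (the names and the solver are ours; we read (16) as saying that g_{F}(x,y)=\sum_{i\in J_{F}}(A_{i}x+B_{i}y-b_{i}), the sum of the residuals of the constraints defining F, which is nonpositive on D and vanishes exactly when F is active; this should be adapted to the actual definition of g_{F}).

using JuMP, HiGHS

# Compute the label ℓ(x) of a point x as in Proposition 5.3, one LP per face.
# `faces` is a vector of index sets J_F ⊆ [m]; g_F is read as the sum of the
# residuals A_i x + B_i y − b_i over i ∈ J_F (an assumption on (16)).
function label(x::Vector{Float64}, faces::Vector{Vector{Int}},
               A::Matrix{Float64}, B::Matrix{Float64}, b::Vector{Float64}; tol = 1e-8)
    ℓ = Vector{Int}[]
    for J in faces
        model = Model(HiGHS.Optimizer)
        set_silent(model)
        @variable(model, y[1:size(B, 2)])
        @constraint(model, B * y .<= b .- A * x)
        @objective(model, Max, sum(A[i, :]' * x + B[i, :]' * y - b[i] for i in J))
        optimize!(model)
        if termination_status(model) == MOI.OPTIMAL && objective_value(model) >= -tol
            push!(ℓ, J)      # the face indexed by J is active at x, i.e. x ∈ π(F)
        end
    end
    return ℓ
end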

Using these results, we can generate an element of 𝒱(D)\mathscr{V}(D) through a mixed-integer linear formulation that aims at finding a maximal element of {(C):C𝒞(D)}\{\ell(C)\ :\ C\in\mathscr{C}(D)\}. The following formulation achieves this:

maxz,x,y\displaystyle\max_{z,x,y}\quad FnxzF\displaystyle\sum_{F\in\mathscr{F}_{\leq{n_{x}}}}z_{F} (40a)
s.t. Ax+ByFb\displaystyle Ax+By_{F}\leq b Fnx\displaystyle\forall F\in\mathscr{F}_{\leq{n_{x}}} (40b)
AFx+BFyFbFMF(1zF)\displaystyle A_{F}x+B_{F}y_{F}\geq b_{F}-M_{F}(1-z_{F}) Fnx\displaystyle\forall F\in\mathscr{F}_{\leq{n_{x}}} (40c)
zF{0,1}\displaystyle z_{F}\in\{0,1\} Fnx\displaystyle\forall F\in\mathscr{F}_{\leq{n_{x}}} (40d)

Here, y and z stand for the vectors (y_{F} : F\in\mathscr{F}_{\leq n_{x}}) and (z_{F} : F\in\mathscr{F}_{\leq n_{x}}), respectively. For each F\in\mathscr{F}_{\leq n_{x}}, A_{F} and B_{F} are submatrices of A and B, and b_{F} is a subvector of b, such that F=\{(x,y)\in D : A_{F}x+B_{F}y=b_{F}\}; M is a vector of m positive values such that A_{i}x+B_{i}y-b_{i}\geq-M_{i} for all (x,y)\in D, and M_{F} is the corresponding subvector, matching the indices of b_{F}. The vector M is well-defined when D is a polytope and can be easily computed by solving m linear programs. Formulation (40) tries to “activate” (with z_{F}=1) as many faces as possible while keeping the intersection of their projections nonempty.

Let (z^{*},x^{*},y^{*}) be an optimal solution of (40). Clearly, x^{*} is an element of \mathscr{V}(D); we can thus collect it and focus on generating a new element of \mathscr{V}(D). Noting that \ell(x^{*})=\{F\in\mathscr{F}_{\leq n_{x}} : z^{*}_{F}=1\}, such a new element can be obtained by adding the following inequality constraint to (40):

Fnx:zF=0zF1.\sum_{F\in\mathscr{F}_{\leq{n_{x}}}\,:\,z_{F}^{*}=0}z_{F}\geq 1. (41)

Since (z^{*},x^{*},y^{*}) induces a maximal element of \{\ell(C) : C\in\mathscr{C}(D)\}, constraint (41) excludes only x^{*} among the elements of \mathscr{V}(D).

This procedure can be iterated until the optimization problem becomes infeasible; however, in order to avoid detecting infeasibility in our computational implementation, we add a new binary variable ss that can relax (41) when needed. Under these considerations, we present the precise model we use. Suppose we have generated a set V𝒱(D)V\subseteq\mathscr{V}(D). We may generate an element of 𝒱(D)V\mathscr{V}(D)\setminus V or determine that V=𝒱(D)V=\mathscr{V}(D) using the following optimization problem:

maxz,s,x,y\displaystyle\max_{z,s,x,y}\quad FnxzF\displaystyle\sum_{F\in\mathscr{F}_{\leq{n_{x}}}}z_{F} (42a)
s.t. Ax+ByFb\displaystyle Ax+By_{F}\leq b Fnx\displaystyle\forall F\in\mathscr{F}_{\leq{n_{x}}} (42b)
AFx+BFyFbFMF(1zF)\displaystyle A_{F}x+B_{F}y_{F}\geq b_{F}-M_{F}(1-z_{F}) Fnx\displaystyle\forall F\in\mathscr{F}_{\leq{n_{x}}} (42c)
s+F(v)zF1\displaystyle s+\sum_{F\not\in\ell(v)}z_{F}\geq 1 vV\displaystyle\forall v\in V (42d)
FnxzF|nx|(1s)\displaystyle\sum_{F\in\mathscr{F}_{\leq{n_{x}}}}z_{F}\leq|\mathscr{F}_{\leq{n_{x}}}|(1-s) (42e)
zF{0,1}\displaystyle z_{F}\in\{0,1\} Fnx\displaystyle\forall F\in\mathscr{F}_{\leq{n_{x}}} (42f)
s{0,1}\displaystyle s\in\{0,1\} (42g)
Lemma 5.4.

Problem (42) is feasible provided that D\neq\emptyset. Moreover, for an optimal solution (z^{*},s^{*},x^{*},y^{*}), exactly one of the following holds:

  • s=0s^{*}=0 and x𝒱(D)Vx^{*}\in\mathscr{V}(D)\setminus V.

  • s=1s^{*}=1 and 𝒱(D)=V\mathscr{V}(D)=V.

Proof.

Let us first show that (42) is feasible. Since DD is nonempty, there exists a point (x¯,y¯)D(\bar{x},\bar{y})\in D. Then, we can consider s^=1\hat{s}=1, x^=x¯\hat{x}=\bar{x}, and, for each FnxF\in\mathscr{F}_{\leq{n_{x}}}, z^F=0\hat{z}_{F}=0 and y^F=y¯\hat{y}_{F}=\bar{y}. It is easy to verify that (z^,s^,x^,y^)(\hat{z},\hat{s},\hat{x},\hat{y}) is feasible.

For the second part of the statement, we begin by tackling the case of s=0s^{*}=0. By constraint (42d), we know that xVx^{*}\not\in V since, for each vVv\in V, zF=1z_{F}^{*}=1 for some F(v)F\not\in\ell(v) (recall that (v)\ell(v) is the set of all faces whose intersection of projections is {v}\{v\}). The fact that x𝒱(D)x^{*}\in\mathscr{V}(D) follows from the optimality of the solution and Lemma 5.2, as constraint (42d) only removes elements of VV.

Now assume s^{*}=1. Constraint (42e) implies that z^{*}_{F}=0 for all F\in\mathscr{F}_{\leq n_{x}}. If there were an element \tilde{x}\in\mathscr{V}(D)\setminus V, we could set \tilde{z}_{F}=1 for all F\in\ell(\tilde{x}) (and \tilde{z}_{F}=0 otherwise) and find \tilde{y}_{F} according to (38). The vector (\tilde{z},0,\tilde{x},\tilde{y}) would be feasible and, since \ell(\tilde{x}) must be nonempty,

FnxzF=0<Fnxz~F\sum_{F\in\mathscr{F}_{\leq{n_{x}}}}z^{*}_{F}=0<\sum_{F\in\mathscr{F}_{\leq{n_{x}}}}\tilde{z}_{F}

This contradicts the optimality of (z,s,x,y)(z^{*},s^{*},x^{*},y^{*}), thus 𝒱(D)=V\mathscr{V}(D)=V.∎∎
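For concreteness, one solve of (42) can be sketched with JuMP as follows (the function and argument names are ours, HiGHS is shown in place of the solver used in our experiments, faces are passed as index sets J_{F}, labels of previously found vertices as sets of positions into that list, and the big-M vector is assumed precomputed as described after (40)).

using JuMP, HiGHS

# Sketch of one solve of problem (42).
# `faces`  : vector of index sets J_F ⊆ [m], one per face in F_{≤ n_x}
# `labels` : labels ℓ(v) of already-found chamber vertices (positions into `faces`)
# `M`      : big-M vector with M_i ≥ max over D of (b_i − A_i x − B_i y)
function solve_42(A, B, b, faces, labels, M)
    nx, ny, nF = size(A, 2), size(B, 2), length(faces)
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, x[1:nx])
    @variable(model, y[1:ny, 1:nF])     # one copy y_F of the follower variables per face
    @variable(model, z[1:nF], Bin)
    @variable(model, s, Bin)
    for (k, J) in enumerate(faces)
        @constraint(model, A * x .+ B * y[:, k] .<= b)                               # (42b)
        @constraint(model, [i in J],
                    A[i, :]' * x + B[i, :]' * y[:, k] >= b[i] - M[i] * (1 - z[k]))   # (42c)
    end
    for ℓv in labels                                                                 # (42d)
        @constraint(model, s + sum(z) - sum(z[k] for k in ℓv) >= 1)
    end
    @constraint(model, sum(z) <= nF * (1 - s))                                       # (42e)
    @objective(model, Max, sum(z))
    optimize!(model)
    return value(s), value.(x), value.(z)
end

Iterating such solves, while recording the label \{F : z^{*}_{F}=1\} of each vertex found, is precisely what the enumeration procedure below does.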

In Algorithm 1 we formalize our enumeration procedure. The correctness of the algorithm is given by Lemma 5.4. To solve problem (34), it is enough to run Algorithm 1, and evaluate θ\theta over the set V=𝒱(D)V=\mathscr{V}(D).

Input: A, B, b defining a polytope D=\{(x,y)\in\mathbb{R}^{n_{x}}\times\mathbb{R}^{n_{y}} : Ax+By\leq b\};
Set V=\emptyset, s^{*}=0;
Compute \mathscr{F}_{\leq n_{x}}(D);
while true do
      Solve problem (42) and obtain an optimal solution (z^{*},s^{*},x^{*},y^{*});
      if s^{*}=0 then
            V\leftarrow V\cup\{x^{*}\};
            Store \ell(x^{*})=\{F\in\mathscr{F}_{\leq n_{x}}(D) : z^{*}_{F}=1\};
      else
            break;
      end if
end while
Result: The set V=\mathscr{V}(D)
Algorithm 1: Chamber vertex enumeration algorithm
Remark 5.5.

It is worth noting that Algorithm 1 is doubly exponential in the worst case, since problem (42) is a MIP whose number of variables is linear in the size of the Face Lattice, which in turn can be exponential in the size of the original problem. This makes Algorithm 1 far from scalable. It is presented, however, as a computational baseline (useful for really small problems), since, to the best of our knowledge, there is no other general algorithm in the literature computing global solutions for problems of the form (34) (and in particular (1)).

5.2 Practical computational considerations

Algorithm 1 is basically made up of two (expensive) computations: obtaining the (partial) Face Lattice \mathscr{F}_{\leq n_{x}} and repeatedly solving problem (42). For Face Lattice computations, we rely on Polymake [18], which is a highly optimized package for these purposes. For solving problem (42) we use Gurobi [19], which we enhance using the following simple local-search heuristic.

A local-search heuristic:

When solving problem (42) in Algorithm 1, it is not strictly necessary to obtain a true optimal solution: a feasible solution (\hat{z},\hat{s},\hat{x},\hat{y}) such that \hat{z} induces a maximal element of \{\ell(C) : C\in\mathscr{C}(D)\} suffices. We exploit this as follows: whenever a feasible solution (\hat{z},\hat{s},\hat{x},\hat{y}) is found by Gurobi, we flip a component \hat{z}_{F}=0 to 1 and check whether the x components can be modified to obtain a new feasible solution to (42); this check is a linear program. We repeat the process for each component \hat{z}_{F}=0 and pass the resulting solution to Gurobi via a callback.
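The feasibility check at the core of this heuristic can be sketched as a stand-alone LP as follows (a simplified version with our names; in our implementation the check is embedded in a solver callback, which we omit here). It verifies whether one more face can be activated on top of those already activated while keeping the intersection of the projections nonempty.

using JuMP, HiGHS

# Can the face indexed by J be activated together with the faces in `active`?
# Activation is encoded by imposing the defining constraints with equality
# (our reading of g_F = 0).
function can_activate(J, active, A, B, b)
    all_faces = vcat(active, [J])
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, x[1:size(A, 2)])
    @variable(model, y[1:size(B, 2), 1:length(all_faces)])
    for (k, Jk) in enumerate(all_faces)
        @constraint(model, A * x .+ B * y[:, k] .<= b)
        @constraint(model, [i in Jk], A[i, :]' * x + B[i, :]' * y[:, k] == b[i])
    end
    optimize!(model)                  # pure feasibility problem
    return termination_status(model) == MOI.OPTIMAL
end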

6 Monte-Carlo approximation scheme

The enumeration algorithm of Section 5 has several drawbacks. First, it requires computing (in practice) the whole Face Lattice of D, whose size might grow exponentially with the dimension n_{x}+n_{y}. Even with the Face Lattice at hand, computing all chamber vertices in \mathscr{V}(D) can be hard, and as we have seen in Section 4, \mathscr{V}(D) might be exponentially large.

Another approach, which we explore in this section, is to compute the collection of full-dimensional chambers \mathscr{K}(D). If one had access to this family, one could solve problem (34) as follows:

  1. For each K\in\mathscr{K}(D), compute d_{K}\in\mathbb{R}^{n_{x}} such that \theta\big{|}_{K}=\langle d_{K},\cdot\rangle+a_{K} (the constant a_{K} can be disregarded), and solve the linear problem \min_{x\in K}\langle d_{K},x\rangle.

  2. Noting that X=\bigcup_{K\in\mathscr{K}(D)}K, the solution x^{*} with minimal value \theta(x^{*}) among those found in the first step must be an optimal solution of problem (34).

Listing the whole family \mathscr{K}(D) could take exponentially long, as the knapsack problem of Section 4 illustrates, but we have an advantage: if we draw a point x\in X uniformly at random, \sigma(x) will be a full-dimensional chamber almost surely. Indeed, this follows from Proposition 2.4 and the fact that there are finitely many chambers in \mathscr{C}(D), of which only those in \mathscr{K}(D) are non-negligible.

6.1 Label representation for full-dimensional chambers

To simplify the exposition, from now on we assume that X\subseteq[0,1]^{n_{x}} and we write X^{c}:=[0,1]^{n_{x}}\setminus X. Since D is a polytope, this requirement can be easily attained by a mild change of variables. To be able to consider samples in [0,1]^{n_{x}} (which are easy to generate and independent of the specific structure of D), we identify \ell(X^{c}) with \emptyset, which corresponds to the (empty) set of faces activated by a point x\in X^{c}.

Note that whenever F\in\ell(K) for a full-dimensional chamber K\in\mathscr{K}(D), then \dim(F)\geq n_{x}. We thus obtain the following direct lemma, which improves Proposition 2.4.

Lemma 6.1.

Let K\in\mathscr{K}(D). Then, for any x\in\mathop{\rm int}\nolimits(K), we have that

K={π(F):Fnx,xπ(F)}and(K)nx.K=\bigcap\left\{\pi(F)\ :\ F\in\mathscr{F}_{{n_{x}}},\,x\in\pi(F)\right\}\quad\text{and}\quad\ell(K)\subseteq\mathscr{F}_{{n_{x}}}. (43)

This is already a substantial improvement since computing nx\mathscr{F}_{{n_{x}}} is less demanding than computing nx\mathscr{F}_{\leq{n_{x}}}. Moreover, the size of nx\mathscr{F}_{{n_{x}}} can be controlled by fixing only the follower’s dimension ny{n_{y}} since

|nx||{J[m]:|J|=ny}|=(mny)=O(mny).|\mathscr{F}_{{n_{x}}}|\leq\left|\{J\subseteq[m]\ :\ |J|={n_{y}}\}\right|={m\choose{n_{y}}}=O(m^{{n_{y}}}). (44)

Motivated by this observation, for every x[0,1]nxx\in[0,1]^{{n_{x}}} we define

^(x)={Fnx:xπ(F)}.\hat{\ell}(x)=\{F\in\mathscr{F}_{{n_{x}}}\ :\ x\in\pi(F)\}. (45)

Then, for almost every x[0,1]nxx\in[0,1]^{{n_{x}}}, one has that ^(x)=(x)\hat{\ell}(x)=\ell(x). The next lemma, which is also direct, allows us to compute the linear component of θ\theta in a full-dimensional chamber KK at any point xint(K)x\in\mathop{\rm int}\nolimits(K).

Lemma 6.2.

Let xXx\in X and let =^(x)\ell=\hat{\ell}(x). If K:=σ(x)K:=\sigma(x) is a full-dimensional chamber, then for each j[nx]j\in[{n_{x}}], the following problem

{maxt,(yF)Fts.t.(x+tej,yF)D,F,gF(x+tej,yF)=0,F,\left\{\begin{array}[]{cl}\displaystyle\max_{t,\,(y_{F})_{F\in\ell}}&t\\ s.t.&\begin{array}[]{ll}(x+te_{j},y_{F})\in D,&\forall F\in\ell,\\ g_{F}(x+te_{j},y_{F})=0,&\forall F\in\ell,\end{array}\end{array}\right. (46)

has a solution tj>0t_{j}^{*}>0. Moreover, for every function θ:X\theta:X\to\mathbb{R} continuous and piecewise linear over the chamber complex 𝒞(D)\mathscr{C}(D), the vector dnxd_{\ell}\in\mathbb{R}^{{n_{x}}} such that θ|K=d,+a\theta\big{|}_{K}=\langle d_{\ell},\cdot\rangle+a_{\ell} (for some aa_{\ell}\in\mathbb{R}) can be taken as

d=(θ(x+t1e1)θ(x)t1,,θ(x+tnxenx)θ(x)tnx).d_{\ell}=\begin{pmatrix}\frac{\theta(x+t_{1}^{*}e_{1})-\theta(x)}{t_{1}^{*}},&\ldots,&\frac{\theta(x+t_{{n_{x}}}^{*}e_{{n_{x}}})-\theta(x)}{t_{{n_{x}}}^{*}}\end{pmatrix}^{\top}.
Proof.

Let K=σ(x)K=\sigma(x). Since xri(K)=int(K)x\in\mathop{\rm ri}\nolimits(K)=\mathop{\rm int}\nolimits(K), it is clear that for every j[nx]j\in[{n_{x}}], the problem

maxt{t:x+tejK},\max_{t}\left\{t\ :\ x+te_{j}\in K\right\},

has a solution tj>0t_{j}^{*}>0. The conclusion then follows by noting that the above optimization problem is equivalent to problem (46). The rest is direct.∎∎
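A possible Julia/JuMP sketch of this computation is the following (names and solver are ours; as before, g_{F}(\cdot)=0 is encoded by imposing the constraints indexed by J_{F} with equality).

using JuMP, HiGHS

# Estimate the linear piece d_ℓ of θ on the chamber σ(x), following Lemma 6.2.
# `active` holds the index sets J_F of the faces in ℓ = ℓ̂(x); θ is any function
# as in the lemma (e.g. the sample-average objective of (31)).
function chamber_gradient(θ, x, active, A, B, b)
    nx, ny = length(x), size(B, 2)
    d = zeros(nx)
    for j in 1:nx
        ej = [i == j ? 1.0 : 0.0 for i in 1:nx]       # canonical direction e_j
        model = Model(HiGHS.Optimizer)
        set_silent(model)
        @variable(model, t)
        @variable(model, y[1:ny, 1:length(active)])
        for (k, J) in enumerate(active)
            @constraint(model, A * (x .+ t .* ej) .+ B * y[:, k] .<= b)
            @constraint(model, [i in J],
                        A[i, :]' * (x .+ t .* ej) + B[i, :]' * y[:, k] == b[i])
        end
        @objective(model, Max, t)
        optimize!(model)
        tj = value(t)                                 # t_j^* > 0 by Lemma 6.2
        d[j] = (θ(x .+ tj .* ej) - θ(x)) / tj         # finite difference along e_j
    end
    return d
end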

Based on both lemmas, we propose the Monte-Carlo method presented in Algorithm 2 to approximate the solution of problem (34) by randomly drawing points from [0,1]^{n_{x}}. The algorithm returns two lists from which one can recover the chambers visited during the execution and the number of visits per chamber. This information is used to estimate error measures, as described in the next subsection.

Input: A, B, b defining a polytope D=\{(x,y)\in\mathbb{R}^{n_{x}}\times\mathbb{R}^{n_{y}} : Ax+By\leq b\}, and a function \theta continuous, computable, and piecewise linear over \mathscr{C}(D);
Generate a (uniformly iid) training sample S of size N over [0,1]^{n_{x}};
Set List1=\emptyset, List2=\emptyset, \hat{x}=NaN, \hat{\theta}=\infty;
foreach \xi\in S do
      Compute \ell=\hat{\ell}(\xi):=\{F\in\mathscr{F}_{n_{x}} : \xi\in\pi(F)\};
      List1\leftarrow List1\cup\{(\xi,\ell)\};
      if \ell\in List2 or \ell=\emptyset then
            continue;
      else
            List2\leftarrow List2\cup\{\ell\};
            Compute d_{\ell} as in Lemma 6.2;
            Solve the problem
            \left\{\begin{array}{cl}\displaystyle\min_{x,(y_{F})_{F\in\ell}}&\langle d_{\ell},x\rangle\\ s.t.&\begin{array}{ll}Ax+By_{F}\leq b&\forall F\in\ell\\ g_{F}(x,y_{F})=0&\forall F\in\ell\end{array}\end{array}\right.
            finding a solution \hat{x}_{\ell}, and set \hat{\theta}_{\ell}=\theta(\hat{x}_{\ell});
            if \hat{\theta}_{\ell}<\hat{\theta} then
                  \hat{x}\leftarrow\hat{x}_{\ell}, \hat{\theta}\leftarrow\hat{\theta}_{\ell};
            end if
      end if
end foreach
Result: The pair solution-value (\hat{x},\hat{\theta}), and the lists List1, List2.
Algorithm 2: Monte-Carlo algorithm

6.2 Volume-error estimators

The main drawback of Algorithm 2 is that we do not know whether the returned point is an optimal solution of problem (34). Therefore, we would like to provide a suitable measure of error for the estimate (\hat{x},\hat{\theta}) produced by Algorithm 2. Error bounds based on the Lipschitz continuity of \theta are not an option due to the curse of dimensionality: the number of samples required to be (metrically) close to an optimal solution grows exponentially with n_{x} (see, e.g., [27]). Instead, we propose to consider the volume of the chambers left “unseen” during the execution of the Monte-Carlo algorithm.

To do so, define =𝒫(nx)\mathscr{L}=\mathcal{P}(\mathscr{F}_{{n_{x}}}), where 𝒫()\mathcal{P}(\cdot) denotes the power set. Recalling that (Xc)=\ell(X^{c})=\emptyset, we get that {(K):K𝒦(D){Xc}}\{\ell(K)\ :\ K\in\mathscr{K}(D)\cup\{X^{c}\}\} is a subset of \mathscr{L}. For each \ell\in\mathscr{L}, let us define 𝟙:=𝟙int(K)\mathds{1}_{\ell}:=\mathds{1}_{\mathop{\rm int}\nolimits(K_{\ell})}, where

K={K𝒦(D) if =(K)Xc if =. otherwise.\displaystyle K_{\ell}=\left\{\begin{array}[]{cl}K\in\mathscr{K}(D)&\text{ if }\ell=\ell(K)\\ X^{c}&\text{ if }\ell=\emptyset.\\ \emptyset&\text{ otherwise.}\end{array}\right. (47)

Let 𝒩=[0,1]nx{int(K):K𝒦(D){Xc}}\mathcal{N}=[0,1]^{{n_{x}}}\setminus\bigcup\{\mathop{\rm int}\nolimits(K)\ :\ K\in\mathscr{K}(D)\cup\{X^{c}\}\}. Note that nx(𝒩)=0\mathcal{L}^{{n_{x}}}(\mathcal{N})=0, and that for each x[0,1]nx𝒩x\in[0,1]^{{n_{x}}}\setminus\mathcal{N} there is one and only one \ell\in\mathscr{L} such that 𝟙(x)=1\mathds{1}_{\ell}(x)=1, which is given by =^(x)\ell=\hat{\ell}(x), as in equation (45).

In what follows, for a set KnxK\subseteq\mathbb{R}^{{n_{x}}} we write Vol(K)\mathrm{Vol}(K) as its nx{n_{x}}-dimensional volume, i.e., Vol(K)=nx(K)\mathrm{Vol}(K)=\mathcal{L}^{{n_{x}}}(K), where nx\mathcal{L}^{{n_{x}}} stands for the Lebesgue measure over nx\mathbb{R}^{{n_{x}}}.

Proposition 6.3.

Let \ell_{1},\ell_{2},\ldots,\ell_{p} be pairwise distinct elements of \mathscr{L}, and let Z be a uniformly distributed random vector over [0,1]^{n_{x}}. Then,

  1. \sum_{i=1}^{p}\mathds{1}_{\ell_{i}} coincides with the indicator function of \bigcup_{i=1}^{p}\mathop{\rm int}\nolimits(K_{\ell_{i}}).

  2. \mathbb{E}\left[\sum_{i=1}^{p}\mathds{1}_{\ell_{i}}(Z)\right]=\mathrm{Vol}\left(\bigcup_{i=1}^{p}K_{\ell_{i}}\right)=\sum_{i=1}^{p}\mathrm{Vol}\left(K_{\ell_{i}}\right).

  3. \mathrm{Var}\left[\sum_{i=1}^{p}\mathds{1}_{\ell_{i}}(Z)\right]\leq 1.

Proof.

Set U=i=1pint(Ki)U=\bigcup_{i=1}^{p}\mathop{\rm int}\nolimits(K_{\ell_{i}}). Since {ri(C):C𝒞(D)}{Xc}\{\mathop{\rm ri}\nolimits(C)\ :\ C\in\mathscr{C}(D)\}\cup\{X^{c}\} is a partition of [0,1]nx[0,1]^{{n_{x}}} and since the labels {1,2,,p}\{\ell_{1},\ell_{2},\ldots,\ell_{p}\} are all different, we get that the union defining UU is pairwise disjoint. Then,

𝟙U(x)=1\displaystyle\mathds{1}_{U}(x)=1 !i[p],xint(Ki)i=1p𝟙i(x)=1.\displaystyle\iff\exists!i\in[p],x\in\mathop{\rm int}\nolimits(K_{\ell_{i}})\iff\sum_{i=1}^{p}\mathds{1}_{\ell_{i}}(x)=1.

This proves statement 1. Statement 2 follows directly from 1, by noting that the uniform distribution of ZZ entails that 𝔼[𝟙O(Z)]=Vol(O)\mathbb{E}[\mathds{1}_{O}(Z)]=\mathrm{Vol}(O) for each O[0,1]nxO\subseteq[0,1]^{{n_{x}}}. Let us prove statement 3. By definition, we have Vol(U)1\mathrm{Vol}(U)\leq 1 and thus max{Vol(U),1Vol(U)}1\max\{\mathrm{Vol}(U),1-~{}\mathrm{Vol}(U)\}\leq 1. We can write

Var[i=1p𝟙i(Z)]\displaystyle\mathrm{Var}\left[\sum_{i=1}^{p}\mathds{1}_{\ell_{i}}(Z)\right] =[0,1]nx(𝟙U(z)Vol(U))2𝑑z\displaystyle=\int_{[0,1]^{{n_{x}}}}(\mathds{1}_{U}(z)-\mathrm{Vol}(U))^{2}dz
[0,1]nxmax{Vol(U),1Vol(U)}2dz[0,1]nx1dz=1.\displaystyle\leq\int_{[0,1]^{{n_{x}}}}\max\{\mathrm{Vol}(U),1-\mathrm{Vol}(U)\}^{2}dz\leq\int_{[0,1]^{{n_{x}}}}1dz=1.

This concludes the proof.∎∎

Now, let us consider a sample S={ξ1,,ξN}S=\{\xi_{1},\ldots,\xi_{N}\} of independent uniformly distributed random variables over [0,1]nx[0,1]^{{n_{x}}}. We define the random set K(S)K(S) as

K(S):=\bigcup_{i=1}^{N}K_{\ell(\xi_{i})}.

Then, the complement of K(S)K(S), namely K(S)c=[0,1]nxK(S)K(S)^{c}=[0,1]^{{n_{x}}}\setminus K(S), can be represented with its indicator function, and can be written (almost surely) as

𝟙K(S)c(x)=[i=1N(1𝟙(ξi))]𝟙(x),x[0,1]nx𝒩.\mathds{1}_{K(S)^{c}}(x)=\sum_{\ell\in\mathscr{L}}\left[\prod_{i=1}^{N}(1-\mathds{1}_{\ell}(\xi_{i}))\right]\mathds{1}_{\ell}(x),\quad\forall x\in[0,1]^{{n_{x}}}\setminus\mathcal{N}.
Definition 6.4.

For a sample SS of [0,1]nx[0,1]^{{n_{x}}}, we define the volume-error estimator as

ρ(S)=Vol(K(S)c)=1Vol(K(S)).\rho(S)=\mathrm{Vol}(K(S)^{c})=1-\mathrm{Vol}(K(S)). (48)

The volume-error estimator can be interpreted as follows: for a given sample S and an optimal solution x^{*} of problem (34), \rho(S) is the probability of the event x^{*}\in K(S)^{c} if x^{*} were a random vector uniformly distributed in [0,1]^{n_{x}}. Another interpretation is that \rho(S) is the proportion of points that could still attain a value lower than \hat{\theta}, since, by construction, \hat{\theta}\leq\theta(x) for all x\in K(S).

To estimate \rho(S), we divide the sample S into two subsamples: a first sample S_{1}=\{\xi_{1},\ldots,\xi_{N_{1}}\}, called the training sample, and a second one S_{2}=\{\zeta_{1},\ldots,\zeta_{N_{2}}\}, called the testing sample. It is important to note that this division must be decided before any realization of S=(S_{1},S_{2}) is obtained, in order to preserve independence. With this division, we can define the estimator

U(S1,S2)=1N2j=1N2𝟙K(S1)c(ζj)=1N2j=1N2[i=1N1(1𝟙(ξi))]𝟙(ζj),U(S_{1},S_{2})=\frac{1}{N_{2}}\sum_{j=1}^{N_{2}}\mathds{1}_{K(S_{1})^{c}}(\zeta_{j})=\frac{1}{N_{2}}\sum_{j=1}^{N_{2}}\sum_{\ell\in\mathscr{L}}\left[\prod_{i=1}^{N_{1}}(1-\mathds{1}_{\ell}(\xi_{i}))\right]\mathds{1}_{\ell}(\zeta_{j}), (49)

which selects a random set K(S_{1}) given by all the chambers observed by the training sample S_{1}, and then estimates the volume of its complement using the testing sample S_{2}. Observe that, from the list List1 produced by Algorithm 2 (together with the labels \hat{\ell}(\zeta_{j}) of the testing points), we can easily compute U(S_{1},S_{2}) as

U(S1,S2)=1N2j=1N2[i=1N1(1𝟙^(ζj)(ξi))]=1N2j=1N2[i=1N1(^(ζj)^(ξi))],U(S_{1},S_{2})=\frac{1}{N_{2}}\sum_{j=1}^{N_{2}}\left[\prod_{i=1}^{N_{1}}(1-\mathds{1}_{\hat{\ell}(\zeta_{j})}(\xi_{i}))\right]=\frac{1}{N_{2}}\sum_{j=1}^{N_{2}}\left[\prod_{i=1}^{N_{1}}\big{(}\hat{\ell}(\zeta_{j})\neq\hat{\ell}(\xi_{i})\big{)}\right],

where (^(ζj)^(ξi))\big{(}\hat{\ell}(\zeta_{j})\neq\hat{\ell}(\xi_{i})\big{)} is interpreted as its associated boolean value (11 if the inequality holds, and 0 otherwise).
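In words, the product over the training sample equals 1 exactly when the label of the testing point was never observed among the training labels, so U(S_{1},S_{2}) reduces to a membership test. A minimal Julia sketch (the function and argument names are ours) is the following.

# Volume-error estimate U(S1, S2) from the labels of the training and testing
# samples, following (49); the labels can be extracted, e.g., from List1.
function volume_error(train_labels, test_labels)
    seen = Set(train_labels)
    unseen = count(ℓ -> !(ℓ in seen), test_labels)
    return unseen / length(test_labels)
end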

Proposition 6.5.

Let (S1,S2)(S_{1},S_{2}) be a fixed training-testing division for the sample SS. The following assertions hold:

  1. The conditional expectation \mathbb{E}[U(S_{1},S_{2})|S_{1}] is well-defined, S_{1}-measurable, and satisfies

     \mathbb{E}[U(S_{1},S_{2})|S_{1}]=\mathrm{Vol}(K(S_{1})^{c})=1-\mathrm{Vol}(K(S_{1})).

  2. The expected error is bounded by N_{2}^{-1/2}, i.e.,

     \mathbb{E}\Big{[}|e(S_{1},S_{2})|\Big{]}\leq\frac{1}{\sqrt{N_{2}}},

     where e(S_{1},S_{2})=\mathrm{Vol}(K(S_{1})^{c})-U(S_{1},S_{2}).

Proof.

Set Ω1=([0,1]nx)N1\Omega_{1}=\big{(}[0,1]^{n_{x}}\big{)}^{N_{1}} and Ω2=([0,1]nx)N2\Omega_{2}=\big{(}[0,1]^{n_{x}}\big{)}^{N_{2}}. Let 1\mathbb{P}_{1} be the Lebesgue probability measure over Ω1\Omega_{1} and 2\mathbb{P}_{2} be the Lebesgue probability measure over Ω2\Omega_{2}. By its construction in (49), it is clear that UU is (1×2)(\mathbb{P}_{1}\times\mathbb{P}_{2})-integrable in the probability product space (Ω1,(Ω1),1)×(Ω2,(Ω2),2)(\Omega_{1},\mathcal{B}(\Omega_{1}),\mathbb{P}_{1})\times(\Omega_{2},\mathcal{B}(\Omega_{2}),\mathbb{P}_{2}), and thus, Fubini’s theorem (see, e.g., [26, Theorem 14.19]) entails that

𝔼[U(S1,S2)|S1]=U(S1,S2)𝑑2(S2)\mathbb{E}[U(S_{1},S_{2})|S_{1}]=\int U(S_{1},S_{2})d\mathbb{P}_{2}(S_{2})

is well-defined and \mathbb{P}_{1}-integrable. Now, for a fixed value of S_{1}, the set K(S_{1})^{c} is a fixed measurable set and U(S_{1},S_{2})=\frac{1}{N_{2}}\sum_{j=1}^{N_{2}}\mathds{1}_{K(S_{1})^{c}}(\zeta_{j}). Thus,

U(S1,S2)𝑑(S2)=1N2j=1N2𝔼ζj[𝟙K(S1)c(ζj)]=Vol(K(S1)c).\int U(S_{1},S_{2})d\mathbb{P}(S_{2})=\frac{1}{N_{2}}\sum_{j=1}^{N_{2}}\mathbb{E}_{\zeta_{j}}[\mathds{1}_{K(S_{1})^{c}}(\zeta_{j})]=\mathrm{Vol}(K(S_{1})^{c}).

This finishes the first part of the proof. Now, for the second part, we first observe that

e(S1,S2)2=(𝔼[U(S1,S2)|S1]U(S1,S2))2,e(S_{1},S_{2})^{2}=(\mathbb{E}[U(S_{1},S_{2})|S_{1}]-U(S_{1},S_{2}))^{2}, (50)

which is (1×2)(\mathbb{P}_{1}\times\mathbb{P}_{2})-integrable. Again, applying Fubini’s theorem, we get that

𝔼[e(S1,S2)2]=𝔼[𝔼[e(S1,S2)2|S1]]=𝔼[Var[U(S1,S2)|S1]].\mathbb{E}[e(S_{1},S_{2})^{2}]=\mathbb{E}\Big{[}\mathbb{E}[e(S_{1},S_{2})^{2}|S_{1}]\Big{]}=\mathbb{E}\Big{[}\mathrm{Var}[U(S_{1},S_{2})|S_{1}]\Big{]}.

Finally, the conditional variance Var[U(S1,S2)|S1]\mathrm{Var}[U(S_{1},S_{2})|S_{1}] is just given by the variance of the mean estimator of 𝟙K(S1)c\mathds{1}_{K(S_{1})^{c}} for the sample S2S_{2}, i.e.,

\mathrm{Var}[U(S_{1},S_{2})|S_{1}]=\mathbb{E}[U(S_{1},S_{2})^{2}|S_{1}]-\mathbb{E}[U(S_{1},S_{2})|S_{1}]^{2}=\mathbb{E}\left[\left(\frac{1}{N_{2}}\sum_{j=1}^{N_{2}}\mathds{1}_{K(S_{1})^{c}}(\zeta_{j})\right)^{2}\,\middle|\,S_{1}\right]-\mathrm{Vol}(K(S_{1})^{c})^{2}.

For K(S_{1})^{c} fixed, the above expression is bounded by 1/N_{2} (see, e.g., [27, Chapter 1]), since \mathrm{Var}(\mathds{1}_{K(S_{1})^{c}}(\zeta))\leq 1 thanks to Proposition 6.3. Thus, by a mild application of Jensen’s inequality, we get that

\mathbb{E}[|e(S_{1},S_{2})|]\leq\sqrt{\mathbb{E}[e(S_{1},S_{2})^{2}]}=\left(\int\mathrm{Var}[U(S_{1},S_{2})|S_{1}]\,d\mathbb{P}_{1}(S_{1})\right)^{1/2}\leq\frac{1}{\sqrt{N_{2}}}.

This finishes the proof. ∎∎

For a large enough training sample S_{1}, we expect \mathrm{Vol}(K(S_{1})^{c}) to be small with high probability. However, we do not know how small \mathrm{Vol}(K(S_{1})^{c}) actually is; its size is estimated with S_{2}.

Theorem 6.6.

For a fixed training-testing division (S_{1},S_{2}) of the sample S and for \varepsilon>0, the volume-error estimator \rho(S) is at most U(S_{1},S_{2})+\frac{\varepsilon^{-1}}{\sqrt{N_{2}}} with probability at least 1-\varepsilon, i.e.,

[ρ(S)U(S1,S2)+ε1N2]1ε.\mathbb{P}\left[\rho(S)\leq U(S_{1},S_{2})+\frac{\varepsilon^{-1}}{\sqrt{N_{2}}}\right]\geq 1-\varepsilon. (51)
Proof.

We know that ρ(S)=Vol(K(S1S2)c)Vol(K(S1)c)\rho(S)=\mathrm{Vol}(K(S_{1}\cup S_{2})^{c})\leq\mathrm{Vol}(K(S_{1})^{c}). Thus, as a mild application of Markov’s inequality, we can write

[ρ(S)U(S1,S2)+ε1N2]\displaystyle\mathbb{P}\left[\rho(S)\leq U(S_{1},S_{2})+\frac{\varepsilon^{-1}}{\sqrt{N_{2}}}\right] =[Vol(K(S1)c)U(S1,S2)+ε1N2]\displaystyle=\mathbb{P}\left[\mathrm{Vol}(K(S_{1})^{c})\leq U(S_{1},S_{2})+\frac{\varepsilon^{-1}}{\sqrt{N_{2}}}\right]
[|Vol(K(S1)c)U(S1,S2)|ε1N2]\displaystyle\geq\mathbb{P}\left[\big{|}\mathrm{Vol}(K(S_{1})^{c})-U(S_{1},S_{2})\big{|}\leq\frac{\varepsilon^{-1}}{\sqrt{N_{2}}}\right]
=1[|e(S1,S2)|>ε1N2]\displaystyle=1-\mathbb{P}\left[\big{|}e(S_{1},S_{2})\big{|}>\frac{\varepsilon^{-1}}{\sqrt{N_{2}}}\right]
1εN2𝔼[|e(S1,S2)|].\displaystyle\geq 1-\varepsilon\sqrt{N_{2}}\mathbb{E}[|e(S_{1},S_{2})|].

Since 𝔼[|e(S1,S2)|]1/N2\mathbb{E}[|e(S_{1},S_{2})|]\leq 1/\sqrt{N_{2}} by Proposition 6.5, we deduce (51), finishing the proof.∎∎
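As a concrete reading of (51): with a testing sample of size N_{2}=10\,000 and \varepsilon=0.1, Theorem 6.6 guarantees that \rho(S)\leq U(S_{1},S_{2})+0.1 with probability at least 0.9.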

7 Numerical experiments and conclusion

The goal of this section is to illustrate how both Algorithms 1 and 2 perform in some instances of bilevel optimization with uncertain cost. To do so, we adapt some deterministic bilevel problems available in the literature.

7.1 Set-up and instances

We implemented both Algorithms 1 and 2 in Julia 1.8.2 [4], using Polymake [18] to compute the faces of a polytope and Gurobi 9.5.2 [19] to solve (42) and any auxiliary LP. Our code is publicly available at https://github.com/g-munoz/bilevelbayesian. All experiments were run single-threaded on a Linux machine with an Intel Xeon Silver 4210 2.2G CPU and 128 GB RAM. The main objectives of these experiments were (1) to determine how Algorithm 1 scales and (2) to assess how well the Monte-Carlo algorithm performs in comparison with the exact method. A global time limit of 15 minutes was set for Algorithm 1; if this limit is reached, only the chamber vertices found so far are used.

We focus our attention on sample average formulations, as in (31), where the lower-level cost is assumed to have a uniform distribution over the unit sphere. We use instances from two publicly available libraries: BOLib [36] and the bilevel instances in [1], which we call CoralLib. Since our approach relies on computing a (possibly exponentially) large number of faces, we can only consider low-dimensional instances at the moment: we restrict to nx+ny10n_{x}+n_{y}\leq 10.

Additionally, we consider randomly generated instances of the stochastic bilevel continuous knapsack problem [5], as presented in Section 4. In our experiments, we take a to be a random non-negative vector, \delta=1/4, and d a vector of ones. We call Knapsack_i an instance generated for n_{y}=i. While a more efficient, specialized algorithm exists for these instances (see [5]), they are helpful in showing how well our general-purpose Monte-Carlo algorithm performs.

In all experiments, we used a sample of size N0=100N_{0}=100 for the follower’s cost vector. The same sample is used in both algorithms to better compare their performance. Additionally, in Algorithm 2 we used a training sample of size N1=100N_{1}=100 and a testing sample of size N2=100N_{2}=100 as well.

7.2 Results

In Table 1, we compare the performance of both methods.

The “Obj gap” column shows how far the value of the Monte-Carlo algorithm is from that of the exact method: if \text{val}_{i} is the value obtained by Algorithm i, then \mbox{Gap}=|\text{val}_{1}|^{-1}(\text{val}_{2}-\text{val}_{1}). Since we ran Algorithm 1 with a time limit, it may happen that \mbox{Gap}<0, which indicates that the Monte-Carlo algorithm performed better than the exact method. The “Error” column shows the error estimate computed as in (49). All execution times are in seconds.

The results in Table 1 clearly show an advantage of the Monte-Carlo approach over the exact method. The Monte-Carlo approach was able to meet or surpass the value of the exact method in almost all cases. In the largest examples, the Monte-Carlo method had a much better performance: in CoralLib/linderoth, Knapsack_6 and Knapsack_7, it found the optimal solution much faster than the exact method. Moreover, in Knapsack_8 and Knapsack_9, the Monte-Carlo algorithm found much better solutions than the exact method in shorter running times.

Note that this qualitative analysis of the Monte-Carlo algorithm is largely possible because we have the exact solution to compare with in most cases. In practice, if the enumeration algorithm cannot be executed, we would need to rely on the volume-error estimates. In this regard, the results are less positive. While most instances of BOLib and CoralLib have a volume-error of 0%, some instances have a large error, reaching 74% in the worst case, which happens to be the instance with the best running time (CoralLib/linderoth). These instances likely have many small full-dimensional chambers, so the volume-error estimator does not reflect how well the algorithm is actually performing (it largely overestimates the error).

We also note that, in theory, the knapsack instances should have a large volume-error (since they have nearly as many full-dimensional chambers as chamber vertices, as illustrated in Section 4). Nevertheless, the 0% error estimation we obtained tells us that most of these chambers are, in fact, small. This could be exploited by a more ad-hoc sampling technique (instead of plain Monte-Carlo sampling).

Moving forward, the main (and clear) challenge for this work is scalability. These results show short running times since all instances have small dimensions. The main bottleneck is currently the enumeration of the faces of a polytope. In the case of Algorithm 1, there does not seem to be much hope of improving this substantially: note in Table 1 that in all but the bottom two entries (which correspond to cases where \mathscr{F}_{\leq n_{x}} was not fully computed due to the time limit), Algorithm 1 used all available faces. This is because the algorithm heavily relies on maximal labels, which our procedure needs in order not to repeat chambers during the enumeration. Nonetheless, we still believe Algorithm 1 can be useful as a baseline with optimality guarantees. As an immediate next step, we propose studying the computational complexity of the problems presented in this work.

Algorithm 2 could potentially be improved significantly. First of all, recall that this approach only uses the faces of dimension n_{x} (i.e., \mathscr{F}_{n_{x}}), which can be considerably smaller than \mathscr{F}_{\leq n_{x}} (see columns 3 and 4 of Table 1). Therefore, a more intricate enumeration that exploits this could be devised. Additionally, and perhaps more importantly, note that in the instances where |\mathscr{F}_{n_{x}}| is not too small (say, more than 40), Algorithm 2 only uses a fraction of \mathscr{F}_{n_{x}} during its execution. This indicates that one could heavily restrict the faces considered initially and generate more on the fly, much like in a column generation approach. Another potential improvement is exploiting more structure of a particular family of instances (as in the knapsack case), which may indicate which faces are truly needed.

Table 1: Summary of results for Algorithms 1 and 2 for selected BOLib instances [36], CoralLib instances [1] and Knapsack instances [5]. The “Size” of the instance is (nx+ny,m)({n_{x}}+{n_{y}},m). The “Obj gap” column shows the gap between the values found for both algorithms; a negative gap indicates the stochastic method performed better. The “Error” column shows the upper estimation of the volume of unseen chambers during the sampling process as per (49). The columns labeled “Computation Times” contain the running times (in seconds) for the computation of all the faces, the execution of Algorithm 1 and of Algorithm 2. The columns labeled “Used faces” contain the number of faces that were explicitly used during the execution of each algorithm.

Instance | Size | #F≤nx | #Fnx | Obj gap | Error | Time faces (s) | Time Alg. 1 (s) | Time Alg. 2 (s) | Used faces Alg. 1 | Used faces Alg. 2
BOLib/AnandalinghamWhite1990 | (2,7) | 12 | 6 | 0% | 0% | 1.4 | 3.8 | 7.2 | 12 | 5
BOLib/Bard1984a | (2,6) | 10 | 5 | 0% | 0% | 3.1 | 6.8 | 7.9 | 10 | 5
BOLib/Bard1984b | (2,6) | 10 | 5 | 0% | 0% | 1.4 | 4.1 | 7.5 | 10 | 5
BOLib/Bard1991Ex2 | (3,6) | 14 | 9 | 0% | 0% | 1.4 | 4.2 | 8.2 | 14 | 6
BOLib/BardFalk1982Ex2 | (4,7) | 45 | 17 | 0% | 45% | 1.7 | 4.3 | 7.6 | 45 | 5
BOLib/BenAyedBlair1990a | (3,6) | 20 | 12 | 0% | 0% | 1.4 | 4.2 | 8.5 | 20 | 4
BOLib/BenAyedBlair1990b | (2,5) | 6 | 3 | 0% | 0% | 1.5 | 4.3 | 8.0 | 6 | 3
BOLib/BialasKarwan1984a | (3,8) | 20 | 12 | 0% | 0% | 1.6 | 4.4 | 8.4 | 20 | 10
BOLib/BialasKarwan1984b | (2,7) | 12 | 6 | 0% | 0% | 1.5 | 4.3 | 8.0 | 12 | 5
BOLib/CandlerTownsley1982 | (5,8) | 111 | 48 | 1% | 37% | 1.8 | 7.9 | 18.5 | 111 | 16
BOLib/ClarkWesterberg1988 | (2,3) | 6 | 3 | 0% | 0% | 1.5 | 4.3 | 8.0 | 6 | 3
BOLib/ClarkWesterberg1990b | (3,7) | 15 | 9 | 0% | 0% | 1.5 | 4.4 | 8.3 | 15 | 9
BOLib/GlackinEtal2009 | (3,6) | 20 | 5 | 0% | 48% | 1.6 | 4.8 | 8.0 | 20 | 3
BOLib/HaurieSavardWhite1990 | (2,4) | 8 | 4 | 0% | 0% | 1.6 | 4.5 | 8.3 | 8 | 4
BOLib/HuHuangZhang2009 | (3,6) | 20 | 12 | 0% | 0% | 1.6 | 4.4 | 8.6 | 20 | 7
BOLib/LanWenShihLee2007 | (2,8) | 14 | 7 | 0% | 1% | 1.6 | 4.5 | 8.2 | 14 | 6
BOLib/LiuHart1994 | (2,5) | 10 | 5 | 0% | 0% | 1.6 | 4.4 | 8.6 | 10 | 4
BOLib/MershaDempe2006Ex1 | (2,6) | 8 | 4 | 0% | 0% | 1.7 | 4.8 | 8.6 | 8 | 4
BOLib/MershaDempe2006Ex2 | (2,7) | 10 | 5 | 0% | 64% | 1.5 | 4.3 | 6.4 | 10 | 3
BOLib/TuyEtal1993 | (4,7) | 45 | 17 | 0% | 54% | 1.7 | 4.5 | 7.6 | 45 | 5
BOLib/TuyEtal1994 | (4,8) | 72 | 24 | 0% | 46% | 1.6 | 4.5 | 7.5 | 72 | 6
BOLib/VisweswaranEtal1996 | (2,6) | 8 | 4 | 0% | 0% | 1.4 | 4.3 | 7.9 | 8 | 4
BOLib/WangJiaoLi2005 | (3,7) | 23 | 14 | 0% | 0% | 1.5 | 4.4 | 8.6 | 23 | 5
CoralLib/linderoth | (6,15) | 545 | 51 | 0% | 74% | 1.4 | 148.0 | 7.4 | 545 | 7
CoralLib/moore90_2 | (2,7) | 12 | 6 | 0% | 0% | 1.4 | 4.0 | 7.6 | 12 | 5
CoralLib/moore90 | (2,8) | 8 | 4 | 0% | 0% | 1.6 | 4.4 | 8.1 | 8 | 4
Knapsack_6 | (7,15) | 574 | 447 | 0% | 0% | 3.2 | 117.1 | 19.2 | 574 | 255
Knapsack_7 | (8,17) | 1278 | 1023 | 0% | 0% | 2.5 | 626.9 | 38.5 | 1278 | 575
Knapsack_8 | (9,19) | 2814 | 2303 | -14% | 0% | 4.9 | 914.7 | 82.4 | 1990 | 1279
Knapsack_9 | (10,21) | 6142 | 5119 | -380% | 0% | 15.4 | 984.2 | 181.4 | 3346 | 2815

Statements and Declaration.

Funding. G. Muñoz was supported by FONDECYT Iniciación 11190515 (ANID-Chile). D. Salas was supported by the Center of Mathematical Modeling (CMM) FB210005 BASAL funds for centers of excellence (ANID-Chile), and the grant FONDECYT Iniciación 11220586 (ANID-Chile). A. Svensson was supported by the grant FONDECYT postdoctorado 3210735 (ANID-Chile).

References

  • [1] Coral bilevel optimization problem library. https://coral.ise.lehigh.edu/data-sets/bilevel-instances/. Accessed: 2022-11-3.
  • [2] Charles Audet, Jean Haddad, and Gilles Savard. A note on the definition of a linear bilevel programming solution. Appl. Math. Comput., 181(1):351–355, 2006.
  • [3] Yasmine Beck, Ivana Ljubić, and Martin Schmidt. A survey on bilevel optimization under uncertainty. European Journal of Operational Research, 2023.
  • [4] Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B Shah. Julia: A fresh approach to numerical computing. SIAM review, 59(1):65–98, 2017.
  • [5] C. Buchheim, D. Henke, and J. Irmai. The stochastic bilevel continuous knapsack problem with uncertain follower’s objective. J Optim Theory Appl, 194:521–542, 2022.
  • [6] Christoph Buchheim. Bilevel linear optimization belongs to NP and admits polynomial-size KKT-based reformulations. Operations Research Letters, 51(6):618–622, 2023.
  • [7] Christoph Buchheim and Dorothee Henke. The robust bilevel continuous knapsack problem with uncertain coefficients in the follower’s objective. Journal of Global Optimization, 83(4):803–824, 2022.
  • [8] Christoph Buchheim, Dorothee Henke, and Felix Hommelsheim. On the complexity of robust bilevel optimization with uncertain follower’s objective. Operations Research Letters, 49(5):703–707, 2021.
  • [9] Johanna Burtscheidt and Matthias Claus. Bilevel linear optimization under uncertainty. In Bilevel optimization—advances and next challenges, volume 161 of Springer Optim. Appl., pages 485–511. Springer, Cham, [2020] ©2020.
  • [10] Matthias Claus. On continuity in risk-averse bilevel stochastic linear programming with random lower level objective function. Oper. Res. Lett., 49(3):412–417, 2021.
  • [11] Matthias Claus. Existence of solutions for a class of bilevel stochastic linear programs. European J. Oper. Res., 299(2):542–549, 2022.
  • [12] Jesús De Loera, Jörg Rambau, and Francisco Santos. Triangulations: structures for algorithms and applications, volume 25. Springer Science & Business Media, 2010.
  • [13] S. Dempe. Foundations of bilevel programming. Springer Science & Business Media, 2002.
  • [14] Stephan Dempe, Vyacheslav Kalashnikov, Gerardo A. Pérez-Valdés, and Nataliya Kalashnykova. Bilevel programming problems. Energy Systems. Springer, Heidelberg, 2015. Theory, algorithms and applications to energy networks.
  • [15] Stephan Dempe and Alain Zemkoho, editors. Bilevel optimization—advances and next challenges, volume 161 of Springer Optimization and Its Applications. Springer, Cham, [2020] ©2020.
  • [16] M. Forcier. Multistage stochastic optimization and polyhedral geometry. PhD thesis, École de Ponts - ParisTech, 2022.
  • [17] Maël Forcier, Stephane Gaubert, and Vincent Leclère. Exact quantization of multistage stochastic linear problems. SIAM J. Optim., 34(1):533–562, 2024.
  • [18] Ewgenij Gawrilow and Michael Joswig. polymake: a framework for analyzing convex polytopes. In Polytopes—combinatorics and computation (Oberwolfach, 1997), volume 29 of DMV Sem., pages 43–73. Birkhäuser, Basel, 2000.
  • [19] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2022.
  • [20] Pierre Hansen, Brigitte Jaumard, and Gilles Savard. New branch-and-bound rules for linear bilevel programming. SIAM Journal on scientific and Statistical Computing, 13(5):1194–1217, 1992.
  • [21] Dorothee Henke, Henri Lefebvre, Martin Schmidt, and Johannes Thürauf. On coupling constraints in linear bilevel optimization, 2024. Preprint - arXiv:2402.12191.
  • [22] Tito Homem-de Mello and Güzin Bayraksan. Monte Carlo sampling-based methods for stochastic optimization. Surv. Oper. Res. Manag. Sci., 19(1):56–85, 2014.
  • [23] S. V. Ivanov. A bilevel programming problem with random parameters in the follower’s objective function. Diskretn. Anal. Issled. Oper., 25(4):27–45, 2018.
  • [24] Leonid Khachiyan, Endre Boros, Konrad Borys, Vladimir Gurvich, and Khaled Elbassioni. Generating all vertices of a polyhedron is hard. In Twentieth Anniversary Volume:, pages 1–17. Springer, 2009.
  • [25] Thomas Kleinert, Martine Labbé, Ivana Ljubić, and Martin Schmidt. A survey on mixed-integer programming techniques in bilevel optimization. EURO Journal on Computational Optimization, 9:100007, 2021.
  • [26] Achim Klenke. Probability Theory: A Comprehensive Course. Springer, 2014.
  • [27] Gunther Leobacher and Friedrich Pillichshammer. Introduction to quasi-Monte Carlo integration and applications. Compact Textbooks in Mathematics. Birkhäuser/Springer, Cham, 2014.
  • [28] L. Mallozzi and J. Morgan. Hierarchical Systems with Weighted Reaction Set, pages 271–282. Springer US, Boston, MA, 1996.
  • [29] Ayalew Getachew Mersha and Stephan Dempe. Linear bilevel programming with upper level constraints depending on the lower level solution. Appl. Math. Comput., 180(1):247–254, 2006.
  • [30] Gonzalo Muñoz, David Salas, and Anton Svensson. Exploiting the polyhedral geometry of stochastic linear bilevel programming. In International Conference on Integer Programming and Combinatorial Optimization, pages 363–377. Springer International Publishing Cham, 2023.
  • [31] Jörg Rambau and Günter M Ziegler. Projections of polytopes and the generalized baues conjecture. Discrete & Computational Geometry, 16(3):215–237, 1996.
  • [32] David Salas and Anton Svensson. Existence of solutions for deterministic bilevel games under a general Bayesian approach. SIAM J. Optim., 33(3):2311–2340, 2023.
  • [33] Alexander Shapiro, Darinka Dentcheva, and Andrzej Ruszczyński. Lectures on stochastic programming—modeling and theory, volume 28 of MOS-SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA; Mathematical Optimization Society, Philadelphia, PA, [2021] ©2021. Third edition [of 2562798].
  • [34] Luis Vicente, Gilles Savard, and Joaquim Júdice. Descent approaches for quadratic bilevel programming. Journal of optimization theory and applications, 81(2):379–399, 1994.
  • [35] H. Von Stackelberg. Marktform und Gleichgewicht. Springer, 1934.
  • [36] Shenglong Zhou, Alain B Zemkoho, and Andrey Tin. Bolib: Bilevel optimization library of test problems. In Bilevel Optimization, pages 563–580. Springer, 2020.