
Applications and Issues in Abstract Convexity

R. Díaz Millán1, Nadezda Sukhorukova2,∗, Julien Ugon1
(Date: January 2022)

1Faculty of Science, Engineering and Built Environment, Deakin University, Australia
2Department of Mathematics, Swinburne University of Technology, Australia


Abstract. The theory of abstract convexity, also known as convexity without linearity, is an extension of classical convex analysis. There are a number of remarkable results, mostly concerning duality, as well as some numerical methods; however, this area has not yet found many practical applications. In this paper we study the application of abstract convexity to function approximation. Another important research direction addressed in this paper is the connection with so-called axiomatic convexity.

Keywords. Abstract convexity, axiomatic convexity, Chebyshev approximation, Carathéodory number, quasiaffine functions.

2010 Mathematics Subject Classification. 90C26, 90C90, 90C47, 65D15.

∗Corresponding author. E-mail addresses: r.diazmillan@deakin.edu.au (R. Díaz Millán), nsukhorukova@swin.edu.au (N. Sukhorukova), julien.ugon@deakin.edu.au (J. Ugon). Received ; Accepted .

1. Introduction

Abstract convexity appears as a natural extension of the classical convex analysis when linear and affine functions (elementary functions) are “replaced” by other types of functions. This is why abstract convexity is also known as “convexity without linearity”.

It has been demonstrated that many results from classical convex analysis (conjugation, duality, subdifferential-related issues) can be extended to “non-linear” settings [25, 31]. In [25] the authors provide a very detailed historical review of the theory; it turns out that the origins of this field go back to the early 1970s, while some specific ideas were already in place in the 1950s.

Despite very productive work on the development of abstract convexity, there are still several directions for improvement. From the onset of abstract convexity theory, much effort has been devoted to the study of duality, leading to very elegant generalisations of classical convexity results. On the other hand, the geometric aspects of convexity have not received as much attention. Additionally, the theory remains “abstract” and needs practical applications. One of the goals of this paper is to provide some applications and to develop a framework describing the properties that the elementary functions have to satisfy to be efficient in applications.

There are a number of difficulties in developing efficient computational methods. The first step is to choose the set of elementary functions. This set should be simple enough to allow the construction of fast and efficient methods. On the other hand, it has to generate accurate approximations to the functions arising in applications. In some cases, the choice of elementary functions is reasonably straightforward (see section 3), but this is rather exceptional, and in most applications the choice of a suitable class of elementary functions is a difficult task.

A deep and detailed study of abstract convexity and related issues can be found in [31] and also in [29]. A detailed survey of methods of abstract convex programming can be found in [2]. Many other contributions on duality and methods are not mentioned in this paper; the most recent state of developments in abstract convexity and a comprehensive literature review can be found in [6].

The paper is organised as follows. In section 2 we provide the essential background of abstract convexity and axiomatic convexity. In the same section, we underline connections and possible cross-feeding between these two types of convexity. Then, in section 3 we illustrate how abstract convexity and axiomatic convexity can be applied to function approximation. Section 4 illustrates the results of numerical experiments. Finally, section 5 provides conclusions and future research directions.

2. Abstract Convexity

2.1. Definitions and preliminaries

Consider the set of all real-valued functions on the domain $X$, denoted by $\mathscr{F}$ and defined by $\mathscr{F} := \{f : X \to \mathbb{R}\}$. Any subset of functions $L \subseteq \mathscr{F}$ is called a set of abstract linear functions. In what follows, we introduce some known definitions leading to the concept of abstract convexity. Suppose that a set of abstract linear functions $L \subseteq \mathscr{F}$ has been fixed.

Definition 2.1.

The vertical closure of $L$, $H := \{l + c : l \in L,\ c \in \mathbb{R}\}$, is the set of abstract affine functions.

Definition 2.2 (Abstract Convexity [29]).

A function $f$ is said to be $L$-convex if there exists a set $U \subset H$ such that for any $x \in X$, $f(x) = \sup_{u \in U} u(x)$. Such functions are also called abstract convex with respect to $L$.

In this paper we denote the set of all $L$-convex functions by $\mathscr{F}_L$.

Definition 2.3 (Support Set [29]).

The $L$-support set of a function $f \in \mathscr{F}$ is defined as:

$$\operatorname{supp}_L f := \{l \in L : l \leq f\},$$

where $l \leq f$ means that $l(x) \leq f(x)$ for all $x \in X$. The support sets of $L$-convex functions are called $L$-convex sets.

Therefore, informally, convexity without linearity simply means that the “role” of linear functions from classical convex analysis is played by functions from a certain class. The first (and for some applications the most important) question is how to choose the set of elementary functions.

The next definition, which will be detailed and used in the next subsection, helps in choosing a “better” set of abstract linear functions.

Definition 2.4 (Supremal generator [14, 29]).

Let $\mathscr{F}$ be a set of functions defined on a set $X$. A set $L \subset \mathscr{F}$ is called a supremal generator of $\mathscr{F}$ if each $f \in \mathscr{F}$ is abstract convex with respect to $L$.

2.2. Quasiconvexity

The notion of quasiconvexity plays an essential role in abstract convexity. We are now going to explain why.

The notion of quasiconvexity was originally introduced in [11], where the author studied the behaviour of functions with convex sublevel sets, but the term quasiconvexity was introduced much later.

Definition 2.5.

Let $D$ be a convex subset of $\mathbb{R}^n$. A function $f : D \to \mathbb{R}$ is quasiconvex if and only if its sublevel set

$$S_\alpha = \{x \in D \mid f(x) \leq \alpha\}$$

is convex for any $\alpha \in \mathbb{R}$.
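For example, $f(x) = \sqrt{|x|}$ is quasiconvex on $\mathbb{R}$ although it is not convex: $S_\alpha = [-\alpha^2, \alpha^2]$ for $\alpha \geq 0$ and $S_\alpha = \emptyset$ for $\alpha < 0$, all of which are convex sets.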

There are several equivalent definitions of quasiconvex functions, but the one we use is convenient for our study.

Definition 2.6.

A function $f : D \to \mathbb{R}$, where $D$ is a convex subset of $\mathbb{R}^n$, is called quasiconcave if $-f$ is quasiconvex.

Definition 2.7.

Functions that are both quasiconvex and quasiconcave are called quasiaffine (quasilinear).

Note the following important observations.

  • Quasiconvex functions do not need to be continuous.

  • In the case of univariate functions, quasiaffine functions are monotone functions.

If a function is quasiaffine on $\mathbb{R}^n$ (unconstrained problems), then it follows from the definition that its sublevel sets must be half-spaces, and it is clear that the hyperplanes defining these half-spaces have to be parallel. In the presence of constraints, this observation is no longer valid, and we provide an example in section 4.

There are many studies dedicated to quasiconvex functions and quasiconvex optimisation [5, 9, 10, 14, 29, 30]. In these studies, the notion of quasiconvexity appears as one of the possible generalisations of convexity.

2.3. Quasiconvex functions and supremal generators

Theorem 7.13 from [14] states the following.

Theorem 2.8.

The set of all lower semicontinuous quasiaffine functions forms a supremal generator of the set of all lower semicontinuous quasiconvex functions.

Essentially, this theorem states that in the case of quasiconvex minimisation problems, quasiaffine functions “replace” the role of linear functions in classical convex settings. Therefore, the choice of elementary functions is clear (all quasiaffine functions).

One main advantage is that the sublevel sets of the “new” elementary functions are half-spaces (similar to classical convex analysis). In most practical problems, the quasiaffine approximations have some specific characteristics (for example, the class of rational or generalised rational approximations, or approximations that are compositions of certain monotone or non-decreasing functions with certain affine or quasiaffine functions). This additional information helps when dealing with specific applications.
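For example (a standard observation, consistent with the classes listed above), a ratio of affine functions

$$r(x) = \frac{a^\top x + b}{c^\top x + d}, \qquad c^\top x + d > 0,$$

is quasiaffine on its domain: for any $\alpha \in \mathbb{R}$, the sublevel set $\{x : a^\top x + b \leq \alpha(c^\top x + d),\ c^\top x + d > 0\}$, and similarly every superlevel set, is the intersection of a half-space with the domain.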

In the rest of this section, we clarify the “roles” of quasiaffine functions in the new abstract convex settings. Then section 4 contains the results of computational experiments and illustrations.

2.4. Relation to Axiomatic Convexity

Axiomatic Convexity aims at generalising the notion of convexity using only set-theoretic definitions. Good reviews can be found in [32, 33]. Given a set $X$, a family $\mathcal{C}$ of subsets of $X$ is called a convexity if it satisfies the following axioms:

  1. $\emptyset$ and $X$ belong to $\mathcal{C}$;

  2. any intersection of sets from $\mathcal{C}$ is in $\mathcal{C}$: if $D \subset \mathcal{C}$, then $\bigcap_{A \in D} A \in \mathcal{C}$;

  3. the nested union of sets from $\mathcal{C}$ is in $\mathcal{C}$: given a family of sets $D \subset \mathcal{C}$ that is totally ordered by inclusion, $\bigcup_{A \in D} A \in \mathcal{C}$.

The pair $(X, \mathcal{C})$ is called a convex structure (or (abstract) convex space).

These axioms (in particular axioms 1 and 2) ensure the existence of the convex hull of any set $S \subset X$, namely the smallest convex set containing $S$:

$$\mathrm{co}_{\mathcal{C}} S = \bigcap \{A \in \mathcal{C} : S \subset A\}.$$

If $F$ is finite, then $\mathrm{co}_{\mathcal{C}} F$ is called a polyhedron.

Axiomatic Convexity, also called abstract convexity by van de Vel [33], has found applications in combinatorial geometry [13, 18], among others. A family $\mathcal{C}$ that satisfies the first two axioms above is called a closure space, or $\cap$-stable [15].

There are clear connections between abstract convexity as defined in Definitions 2.2 and 2.3 and axiomatic convexity. Indeed, the notion of abstract convex function (Definition 2.2) first appeared in [15] in the context of closure spaces. The families of sets considered in [15] were the closure of the set of sublevel sets of a given family of functions.

Abstract convex functions (Definition 2.2) and abstract convex sets (Definition 2.3) were then formally introduced by Kutateladze and Rubinov [17], who showed the equivalence between these sets and the ones considered by Fan [15]. Below we formalise some results on the correspondence between abstract convex sets and closure spaces. Let us start by noting that the support set of any (not necessarily $L$-convex) function is $L$-convex. Indeed, the support set of $f$ is also the support set of its $L$-convex envelope, which is $L$-convex.

Proposition 2.9.

Consider a family of functions $L$. Then the set of $L$-convex sets forms a closure space. Furthermore, every closure space is isomorphic to such a family of sets.

Proof.

Consider a family of functions $L$ and let $\mathcal{C}$ be the set of $L$-convex sets. We first show that $\mathcal{C}$ is closed under arbitrary intersections. Indeed, let $D \subset \mathcal{C}$ be an arbitrary collection of $L$-convex sets. For each $A \in D$, let $f_A(x) = \sup_{l \in A} l(x)$ and consider the function $f(x) = \inf_{A \in D} f_A(x)$. We will show that $\bigcap_{A \in D} A = \operatorname{supp} f$, which implies that $\bigcap_{A \in D} A$ is $L$-convex (note that $\operatorname{supp} f$ is the support set of the lower $L$-convex envelope of $f$).

$$\operatorname{supp} f = \{l \in L : l(x) \leq \inf_{A \in D} f_A(x),\ \forall x \in X\} = \{l \in L : l(x) \leq f_A(x),\ \forall A \in D,\ \forall x \in X\} = \bigcap_{A \in D} \{l \in L : l(x) \leq f_A(x),\ \forall x \in X\} = \bigcap_{A \in D} A.$$

The first axiom is clearly verified by $\mathcal{C}$, from the fact that $\emptyset$ is the support set of $\sup_{l \in \emptyset} l$ and $L$ is the support set of $\sup_{l \in L} l$.

The second part of the statement is obtained by considering the set $L$ of indicator functions of the sets in the closure space. More precisely, let $\mathcal{C}$ be a closure space, and consider the set $L = \{i_A : A \in \mathcal{C}\}$, where

$$i_A(x) = \begin{cases} 0 & \text{if } x \in A, \\ +\infty & \text{otherwise.} \end{cases}$$

Then $L$ is also the set of $L$-convex functions. Indeed, let $D \subset \mathcal{C}$. Define $U = \{i_A : A \in D\} \subset L$ and $f = \sup_{l \in U} l = \sup_{A \in D} i_A$. Then $f(x) = 0$ if and only if $x \in A$ for all $A \in D$; that is, $f = i_{\bigcap_{A \in D} A}$. The isomorphism between the set of support sets of functions from $L$ and $\mathcal{C}$ is evident. ∎

Note that in practice it is not necessary to consider the entire set of indicator functions, but only the set of indicator functions of a basis of $\mathcal{C}$ (i.e., a family of sets whose closure is $\mathcal{C}$), which corresponds to a supremal generator of the corresponding abstract convex functions. It can also easily be seen that the set of domains of the $L$-convex functions defined as in the above proof, as well as the set of 0-sublevel sets, is precisely $\mathcal{C}$.

On the other hand, not every family of abstract convex sets forms a convexity structure.

Example 2.10.

Let $L$ be the set of linear functions on $\mathbb{R}$, and consider the functions $f_\alpha$ defined by $f_\alpha(x) = \alpha |x|$, for $\alpha > 0$. These functions are $L$-convex, and induce a nested family of $L$-convex sets $S_\alpha = \operatorname{supp}_L f_\alpha = \{x \mapsto \beta x : \beta \in [-\alpha, \alpha]\}$.

However,

$$U = \bigcup_{\alpha < 1} S_\alpha = \{x \mapsto \beta x : \beta \in (-1, 1)\}$$

is not an $L$-convex set. Indeed, $f_1 = \sup_{l \in U} l$ is the smallest $L$-convex function whose support set contains $U$, but the function $x \mapsto x$ belongs to $\operatorname{supp} f_1 \setminus U$, which shows that $U$ cannot be the support set of $f_1$, and therefore of any $L$-convex function. This shows that the set of $L$-convex sets is not a convexity structure.

Let us introduce the notion of strict support set, which will enable us to address this issue.

The $L$-strict support set of a function $f$ is defined as:

$$\underline{\operatorname{supp}} f = \{l \in L : l(x) < f(x),\ \forall x \in \operatorname{dom}(f)\}.$$

We define the family of sets $\mathcal{C}_L$ as follows:

Definition 2.11 (convexity extension).

$$\mathcal{C}_L^f := \{A \subseteq L : \underline{\operatorname{supp}} f \subseteq A \subseteq \operatorname{supp} f\}, \qquad \mathcal{C}_L := \bigcup_{f \in \mathscr{F}_L} \mathcal{C}_L^f.$$

We call the set $\mathcal{C}_L$ the convexity extension of the set of support sets of $L$-convex functions.

Proposition 2.12.

For any family of functions $L$, the convexity extension $\mathcal{C}_L$ forms a convexity structure.

Proof.

Since $\mathcal{C}_L$ contains all $L$-convex sets, it contains a closure space, and therefore by Proposition 2.9 it contains the sets $\emptyset$ and $L$.

To see why the second axiom is satisfied, consider an arbitrary family of sets $D \subset \mathcal{C}_L$. For each $A \in D$, there exists an $L$-convex function $f_A$ such that $\underline{\operatorname{supp}} f_A \subseteq A \subseteq \operatorname{supp} f_A$. Define $f = \mathrm{co}_L \inf_{A \in D} f_A$ (the $L$-convex envelope of the infimum). Since $A \subseteq \operatorname{supp} f_A$ for any $A$, we have that

$$\bigcap_{A \in D} A \subseteq \bigcap_{A \in D} \operatorname{supp} f_A = \operatorname{supp} f,$$

where the last equality was shown in the proof of Proposition 2.9. Additionally, we have:

$$\underline{\operatorname{supp}} f = \{l \in L : l(x) < \inf_{A \in D} f_A(x),\ \forall x \in X\} \subseteq \{l \in L : l(x) < f_A(x),\ \forall A \in D,\ \forall x \in X\} = \bigcap_{A \in D} \{l \in L : l(x) < f_A(x),\ \forall x \in X\} \subseteq \bigcap_{A \in D} A.$$

Therefore $\underline{\operatorname{supp}} f \subseteq \bigcap_{A \in D} A \subseteq \operatorname{supp} f$, and we conclude that $\bigcap_{A \in D} A$ is in $\mathcal{C}_L$.

Finally, to see that the third axiom is also satisfied, let $D$ be a family of sets from $\mathcal{C}_L$ that is totally ordered by inclusion. For each $A \in D$, we define $f_A$ as above. The nested nature of the sets in $D$ implies that for $A, A'$ in $D$, if $A \subseteq A'$, then

$$f_A \leq f_{A'}. \qquad (2.1)$$

Let $S = \bigcup_{A \in D} A$ and $f = \sup_{A \in D} f_A$.

It is clear that $S \subseteq \operatorname{supp} f$, since for any $u \in S$ there exists $A \in D$ such that $u \in \operatorname{supp} f_A$; then $u \leq f_A \leq f$. Now, consider $u \notin S$. Then $u \notin \bigcup_{A \in D} \underline{\operatorname{supp}} f_A$, and so there exists $x \in X$ such that $u(x) \geq f_A(x)$ for any $A \in D$, and therefore $u(x) \geq \sup_{A \in D} f_A(x) = f(x)$. Therefore $u \notin \underline{\operatorname{supp}} f$. This implies that $\underline{\operatorname{supp}} f \subseteq S \subseteq \operatorname{supp} f$, and therefore $S$ is in $\mathcal{C}_L$, and $\mathcal{C}_L$ is a convexity structure. ∎

Remark 2.13.

Propositions 2.9 and 2.12 induce an equivalence relation between sets of abstract linear functions, according to the convexity structure (axioms 1-3) or closure space (axioms 1 and 2) to which the induced abstract convex sets are isomorphic.

Axiomatic convexity has been applied to obtain generalisations of well known geometric results in convexity theory. Of particular interest to this paper are generalisations of Carathéodory’s theorems. Much research has been devoted to investigating this topic. We refer to [33] for a review of classical results in the area.

Definition 2.14 ([33, Definition 1.5]).

In a given convexity structure $\mathcal{C}$, a set $F$ is Carathéodory dependent if

$$\mathrm{co}_{\mathcal{C}} F \subseteq \bigcup_{a \in F} \mathrm{co}_{\mathcal{C}}(F \setminus \{a\}),$$

and Carathéodory independent otherwise.

Then we can define the Carathéodory number $c(\mathcal{C})$ of a convexity structure $\mathcal{C}$ as the largest cardinality of a Carathéodory independent set (i.e., any set $F$ with cardinality greater than $c(\mathcal{C})$ is Carathéodory dependent). The Carathéodory numbers of several important classes of convexity structures are known and discussed in the literature [33].

Generalisations of Helly’s and Radon’s theorems were obtained similarly. The Carathéodory number enables the following generalisation of Carathéodory’s classical result:

Proposition 2.15 ([33, Theorem 1.7]).

Consider a convexity structure $(X, \mathcal{C})$ with Carathéodory number $c(\mathcal{C})$, and let $A \in \mathcal{C}$ be a convex set. Then, for any $x \in A$ there exists a set $F \subseteq A$ such that $|F| \leq c(\mathcal{C})$ and $x \in \mathrm{co}_{\mathcal{C}} F$. This is best possible.
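For example, for the standard convexity on $X = \mathbb{R}^n$ (where $\mathcal{C}$ is the family of all convex sets), the Carathéodory number is $n + 1$, and Proposition 2.15 recovers the classical Carathéodory theorem: every $x \in \mathrm{co}\, S$ can be written as $x = \sum_{i=1}^{n+1} \lambda_i x_i$ with $x_i \in S$, $\lambda_i \geq 0$ and $\sum_{i=1}^{n+1} \lambda_i = 1$.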

3. Application to approximation

3.1. Problem formulation

We are working with uniform (Chebyshev) approximation:

$$\min_A \sup_{\mathbf{x} \in X} |f(\mathbf{x}) - g(A, \mathbf{x})|, \qquad (3.1)$$

where $A$ are the decision variables and $X$ is a compact set. In many computer-based applications, this set is a finite grid defined on a convex compact set. If we define $L = \{A \mapsto g(A, \mathbf{x}) : \mathbf{x} \in X\}$, then it is clear from formulation (3.1) that the objective function of this problem is $H_L$-convex.
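When $g(A, \mathbf{x})$ is linear in the parameters $A$ (for instance, a polynomial whose coefficients form $A$, the case discussed next) and $X$ is a finite grid, problem (3.1) reduces to a linear program. The following is a minimal sketch of this reduction, assuming SciPy is available; it is an illustration, not the code used for the experiments in this paper.

    import numpy as np
    from scipy.optimize import linprog

    def chebyshev_poly_fit(f_vals, grid, degree):
        """Minimise z subject to |f(x_i) - sum_j a_j x_i^j| <= z on a grid."""
        V = np.vander(grid, degree + 1, increasing=True)  # V[i, j] = x_i**j
        m, k = V.shape
        c = np.zeros(k + 1)
        c[-1] = 1.0  # decision variables (a_0, ..., a_n, z); minimise z
        # V a - z <= f and -V a - z <= -f together encode |f - V a| <= z.
        A_ub = np.block([[V, -np.ones((m, 1))], [-V, -np.ones((m, 1))]])
        b_ub = np.concatenate([f_vals, -f_vals])
        bounds = [(None, None)] * k + [(0, None)]  # coefficients free, z >= 0
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        return res.x[:k], res.x[-1]  # optimal coefficients, maximal deviation

    # Example: best uniform quartic approximation of |x| on a grid.
    grid = np.linspace(-1.0, 1.0, 201)
    coeffs, max_dev = chebyshev_poly_fit(np.abs(grid), grid, degree=4)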

In the case when the approximation function $g(A, t)$ is a polynomial (and its coefficients are subject to optimisation), the optimality conditions are based on maximal deviation points (Chebyshev theorem, see [7]). In the case of univariate function approximation, the conditions are based on the notion of an alternating sequence.

Theorem 3.1 (Chebyshev).

A polynomial of degree at most $n$ is an optimal approximation in the uniform (Chebyshev) norm if and only if there are $n + 2$ alternating points.

Essentially, the approximations here are presented as linear combinations of basis functions. In the case of polynomials the basis functions are monomials, but other classes of basis functions can be used. The corresponding optimisation problems are convex, as suprema of linear forms. Theorem 3.1 can be proved using convex analysis, with the number of alternating points ($n + 2$) obtained through Carathéodory's theorem. In other words, the following conjecture is true in the case when $L$ is a set of linear functions:

Conjecture 3.2.

For a function from $L$ to be an optimal approximation in uniform norm, it is enough that it is optimal at $c(\mathcal{C}_{H_L}) + 1$ extreme points.

We also conjecture that this is best possible, in the sense that the statement is generally not true if we replace $c(\mathcal{C}_{H_L}) + 1$ with $c(\mathcal{C}_{H_L})$.

If true, this conjecture could form the basis of algorithms for best approximation, such as a generalisation of de la Vallée Poussin's procedure [12]. Later in this section, we discuss an example of a family $L$ in the context of this conjecture.

One possible generalisation of polynomial approximation is rational approximation, that is, approximation by the ratio of two polynomials whose coefficients are subject to optimisation. Note that in some cases, the degree of the denominator and numerator can be reduced without compromising the accuracy:

$$\frac{\sum_{j=0}^{n} a_j t_i^j}{\sum_{k=0}^{m} b_k t_i^k} = \frac{\sum_{j=0}^{n-\nu} a_j t_i^j}{\sum_{k=0}^{m-\mu} b_k t_i^k},$$

where $d = \min\{\nu, \mu\}$ is called the defect. Then the necessary and sufficient optimality conditions are as follows [1].

Theorem 3.3.

A rational function in $\mathcal{R}_{n,m}$ with defect $d$ is the best uniform approximation of a function $f \in C^0(I)$ (the space of continuous functions over $I$) if and only if there exists a sequence of at least $n + m + 2 - d$ points of maximal deviation where the sign of the deviation at these points alternates.

Therefore, similar to polynomial approximation, these conditions are based on the number of alternating points.

In connection with Conjecture 3.2, $c(\mathcal{C}_{H_L})$ is $n + m + 2$. If $n$ and $m$ are the degrees of the corresponding polynomials, then the total number of parameters is $n + 1 + m + 1 = n + m + 2$. However, one of the parameters can be fixed (otherwise one can divide the numerator and denominator by the same number). Therefore, there are $n + m + 1$ “free” parameters, and we add one more for $c(\mathcal{C}_{H_L})$ (similar to Carathéodory's theorem from classical convex analysis). We will provide a formal proof in a future paper.

Rational approximation was a very popular research area in the 1950s-70s [1, 4, 20, 27, 28] (just to name a few).

If the basis functions are not restricted to monomials, the approximations are called generalised rational approximations. The term is due to Cheney and Loeb [8].

One way to approach rational approximation is through constructing “near optimal” solutions [24]. This approach is very efficient and therefore very popular. The extension of this approach to non-monomial basis functions and multivariate approximation remains open (an extension to complex domains can be found in [23]).

An alternative way is based on modern optimisation techniques. This approach is preferable when the basis functions are not restricted to univariate monomials and when a deeper theoretical study is required. The corresponding optimisation problems are quasiconvex and can be solved using general quasiconvex optimisation methods.

Rational and generalised rational functions are quasiaffine as ratios of linear forms [5, 19, 26]. The corresponding approximation problems can be solved efficiently by applying a simple but robust approach, called the bisection method for quasiconvex functions [5].

In [26] the authors use a well-known bisection method for quasiconvex functions (see [5]) to solve these problems. In [22], the authors use a projection-type algorithm for solving these problems.

3.2. Bisection method

The bisection method for quasiconvex functions [5] can be used efficiently for rational and generalised rational approximation, including in multivariate settings [21, 26]. Can we use abstract convexity to extend this method to a wider class of approximations?

The problem can be formulated as follows:

$$\text{minimise } \tilde{z} \qquad (3.2)$$

subject to

$$f(\mathbf{x}) - \frac{\mathbf{A}^T \mathbf{G}(\mathbf{x})}{\mathbf{B}^T \mathbf{H}(\mathbf{x})} \leq \tilde{z}, \quad \mathbf{x} \in X, \qquad (3.3)$$
$$\frac{\mathbf{A}^T \mathbf{G}(\mathbf{x})}{\mathbf{B}^T \mathbf{H}(\mathbf{x})} - f(\mathbf{x}) \leq \tilde{z}, \quad \mathbf{x} \in X, \qquad (3.4)$$
$$\mathbf{B}^T \mathbf{H}(\mathbf{x}) \geq \delta, \quad \mathbf{x} \in X, \qquad (3.5)$$

where $\tilde{z}$ is the maximal deviation and $\delta$ is a small positive number. Problem (3.2)-(3.5) is not linear.

The idea of this method is based on the fact that all the sublevel sets of quasiconvex functions are convex, and therefore the sublevel sets of quasiaffine functions are half-spaces. Essentially, this means that the constraint set (3.3)-(3.5) for a fixed $\tilde{z}$ is an intersection of a finite number of half-spaces ($X$ is a finite grid) and therefore it is a polytope.

The algorithm starts with an upper and a lower bound ($u$ and $l$) for the maximal deviation. The sublevel set of the maximal deviation at the level $z = \frac{u+l}{2}$ is a convex set, and the algorithm checks whether this set is empty. If it is empty, then the lower bound is updated to $z$; otherwise, the upper bound is updated to $z$. The algorithm terminates when the upper and lower bounds are within the specified precision.
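The following is a sketch of this loop, with the feasibility check abstracted into an oracle is_feasible(z) that reports whether the sublevel set (3.3)-(3.5) at level z is non-empty; for generalised rational approximation this oracle is the linear program (3.6)-(3.9) discussed below.

    def bisection(is_feasible, lower, upper, tol=1e-8):
        """Bracket the optimal maximal deviation to within tol."""
        while upper - lower > tol:
            z = 0.5 * (lower + upper)
            if is_feasible(z):
                upper = z  # an approximation with deviation at most z exists
            else:
                lower = z  # the sublevel set is empty: the optimum exceeds z
        return upper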

In general, checking if the convex set (sublevel set of the maximal deviation) is empty or not may be a difficult task (convex feasibility problems). There are a number of efficient methods ([3, 34, 35, 36, 37] just to name a few), but there are still several open problems here. The discussion of these problems is out of scope of the current paper.

In the case of multivariate generalised rational approximation, however, this problem (convex feasibility) can be reduced to solving a linear programming problem [21]. The denominator of the approximation does not change sign; assume, for simplicity, that it is positive. Then checking feasibility is equivalent to solving the following problem:

$$\text{minimise } \tilde{u} \qquad (3.6)$$

subject to

$$f(\mathbf{x})\, \mathbf{B}^T \mathbf{H}(\mathbf{x}) - \mathbf{A}^T \mathbf{G}(\mathbf{x}) \leq z\, \mathbf{B}^T \mathbf{H}(\mathbf{x}) + \tilde{u}, \quad \mathbf{x} \in X, \qquad (3.7)$$
$$\mathbf{A}^T \mathbf{G}(\mathbf{x}) - f(\mathbf{x})\, \mathbf{B}^T \mathbf{H}(\mathbf{x}) \leq z\, \mathbf{B}^T \mathbf{H}(\mathbf{x}) + \tilde{u}, \quad \mathbf{x} \in X, \qquad (3.8)$$
$$\mathbf{B}^T \mathbf{H}(\mathbf{x}) \geq \delta, \quad \mathbf{x} \in X, \qquad (3.9)$$

where $z = \frac{1}{2}(u + l)$ is the current bisection point (bisecting the possible values of the maximal deviation).

If the optimal value $\tilde{u} \leq 0$, the corresponding sublevel set of the maximal deviation function is non-empty (update the upper bound); otherwise the set is empty (update the lower bound). If $X$ is a finite grid, then (3.6)-(3.9) is a linear programming problem and can be solved efficiently at each step of the bisection method.
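A sketch of this feasibility check follows, assuming SciPy; the basis maps G and H (returning the numerator and denominator basis vectors at a grid point) are illustrative names, not notation from the cited papers.

    import numpy as np
    from scipy.optimize import linprog

    def feasibility_lp(z, grid, f, G, H, delta=1e-6):
        """Solve (3.6)-(3.9); True if the sublevel set at level z is non-empty."""
        p, q = len(G(grid[0])), len(H(grid[0]))
        rows, rhs = [], []
        for x in grid:
            g, h, fx = np.asarray(G(x)), np.asarray(H(x)), f(x)
            # (3.7): f(x) B^T H(x) - A^T G(x) - z B^T H(x) - u <= 0
            rows.append(np.concatenate([-g, (fx - z) * h, [-1.0]]))
            rhs.append(0.0)
            # (3.8): A^T G(x) - f(x) B^T H(x) - z B^T H(x) - u <= 0
            rows.append(np.concatenate([g, -(fx + z) * h, [-1.0]]))
            rhs.append(0.0)
            # (3.9): B^T H(x) >= delta
            rows.append(np.concatenate([np.zeros(p), -h, [0.0]]))
            rhs.append(-delta)
        c = np.zeros(p + q + 1)
        c[-1] = 1.0  # minimise u
        # Bounding u below keeps the LP bounded; only the sign of u matters.
        bounds = [(None, None)] * (p + q) + [(-1.0, None)]
        res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                      bounds=bounds, method="highs")
        return res.status == 0 and res.fun <= 0.0

Combined with the bisection sketch above, bisection(lambda z: feasibility_lp(z, grid, f, G, H), 0.0, u0) brackets the optimal maximal deviation, where u0 is any valid upper bound.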

To summarise, an efficient implementation of the bisection method for quasiconvex functions can be extended to approximation by any type of quasiaffine functions, since at each step one needs to check whether a polytope (an intersection of a finite number of half-spaces) is non-empty. The main problem is how to find this polytope. In the case of rational and generalised rational approximation, this problem is simple, but it is not for some other types of quasiaffine approximations. In section 4 we give more examples where the construction of this polytope is straightforward (compositions of monotone functions with affine functions or ratios of affine functions).

One possibility for constructing sublevel sets for smooth quasiaffine approximations is to use their gradients (provided they do not vanish). In the rest of this section, we give an example demonstrating that this approach may be complicated in the presence of constraints (even simple linear constraints). In particular, this approach will not work even in the case of rational approximation, due to the requirement for the denominator to be strictly positive.

3.3. Approximation by a quasiaffine function

Suppose that instead of a generalised rational approximation one needs to approximate by a function from a different class, but the approximations are quasiaffine functions with respect to the decision variables. Then the problem is as follows:

$$\text{minimise } \tilde{z} \qquad (3.10)$$

subject to

$$f(\mathbf{x}) - g(A, \mathbf{x}) \leq \tilde{z}, \quad \mathbf{x} \in X, \qquad (3.11)$$
$$g(A, \mathbf{x}) - f(\mathbf{x}) \leq \tilde{z}, \quad \mathbf{x} \in X, \qquad (3.12)$$

where $g(A, \mathbf{x})$ is a quasiaffine function of $A$. Since the sublevel and superlevel sets of quasiaffine functions are half-spaces and $X$ is a finite grid, the constraint set (3.11)-(3.12) is a polytope.

Remark 3.4.

A finite number of linear constraints may be added to (3.10)-(3.12), while the constraint set remains a polytope.

Therefore, the constraint set (for any fixed $\tilde{z}$), with or without additional linear constraints, is a polytope, and we only need to check whether this polytope is non-empty. However, we still need an efficient approach for finding this polytope. In the case of smooth functions, one can use the gradient as a possible normal vector to the hyperplanes, provided that it is not a zero vector. In section 4 we study examples where the quasiaffine approximations can be decomposed as a composition of strictly monotone and affine (quasiaffine) functions. Other situations may be harder.
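To make the monotone-composition case concrete, suppose $g(A, \mathbf{x}) = \sigma(h(A, \mathbf{x}))$, where $h$ is affine in $A$ and $\sigma$ is strictly increasing and invertible on the relevant range (a sketch of the reduction used in section 4). Then, for a fixed $\tilde{z}$, the constraints (3.11)-(3.12) are equivalent to

$$\sigma^{-1}\big(f(\mathbf{x}) - \tilde{z}\big) \leq h(A, \mathbf{x}) \leq \sigma^{-1}\big(f(\mathbf{x}) + \tilde{z}\big), \qquad \mathbf{x} \in X,$$

which are linear in $A$, so each bisection step is again a linear feasibility problem.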

Note that if a function is quasiaffine on the whole space, then its sublevel sets are half-spaces whose boundary hyperplanes are parallel (otherwise, two such hyperplanes would intersect). If there are additional (even linear) constraints, this observation may not be valid anymore. The following example illustrates that this may happen even in a very simple case of two variables.

Example 3.5.

Consider $f(x, y) = \frac{x}{y}$, where $y > 0$. The sublevel sets are

$$S_\alpha = \left\{(x, y) : \frac{x}{y} \leq \alpha,\ y > 0\right\},$$

where $\alpha$ is a given real number. Then $S_\alpha$ can be described as

$$\{(x, y) : x - \alpha y \leq 0,\ y > 0\}.$$

Each sublevel set is still a half-space (within the domain), but the corresponding hyperplanes (in this example they correspond to level sets) are not parallel. These hyperplanes intersect at $(0, 0)$, but this point is excluded from the domain due to the requirement for the denominator $y$ to be strictly positive.

This example demonstrates that the computation of sublevel sets for quasiaffine functions may be challenging. In the next section we give some examples of the classes of approximations that can be handled efficiently. We also provide the results of the numerical experiments.

4. Numerical experiments

4.1. Strictly monotone function in composition with an affine function.

In this example we approximate the function

$$f(x, y) = (-x + y^3 + x^4)^4 \qquad (4.1)$$

by a quasiaffine function in the form

$$g(A, x, y) = (a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 y^2 + a_6 x y)^3, \qquad (4.2)$$

where $A = (a_1, a_2, a_3, a_4, a_5, a_6)$ are the decision variables, and $x$ and $y$ form a grid on $[-1, 1]$ with step-size 0.1. The optimal coefficients are (rounded to two decimal places):

$$A = (-1.88, -0.75, 0.31, 3.29, 0.98, -0.74),$$

the maximal absolute deviation is 8.01 (to two decimal places).

Figure 1 represents the function $f(x, y)$, figure 2 depicts the best found approximation and figure 3 contains the deviation function $f(x, y) - g(A, x, y)$, which is the error of approximation.

Function $f(x, y)$ is almost flat, with an abrupt increase around the point $(-1, 1)$. Visually, the approximation resembles the shape of the original function, and the magnitude of the deviation confirms that the approximation is reasonably good.
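For this class of approximations, the feasibility oracle used in the bisection method reduces to a linear program, since the cube function is strictly increasing. The following is a sketch under the assumptions of this subsection; it is an illustration, not the code used to produce the reported results.

    import numpy as np
    from scipy.optimize import linprog

    def is_feasible(z, pts, f_vals):
        """Check whether some A satisfies |f - (Phi @ A)**3| <= z on the grid."""
        x, y = pts[:, 0], pts[:, 1]
        # Basis of the inner function, affine in A: 1, x, y, x^2, y^2, xy.
        Phi = np.column_stack([np.ones_like(x), x, y, x**2, y**2, x * y])
        # Cubing is strictly increasing, so the constraints are linear in A:
        # cbrt(f - z) <= Phi @ A <= cbrt(f + z).
        lo, hi = np.cbrt(f_vals - z), np.cbrt(f_vals + z)
        res = linprog(np.zeros(Phi.shape[1]),
                      A_ub=np.vstack([Phi, -Phi]),
                      b_ub=np.concatenate([hi, -lo]),
                      bounds=[(None, None)] * Phi.shape[1], method="highs")
        return res.status == 0  # status 0 means a feasible A was found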

Figure 1. Function $f(x, y) = (-x + y^3 + x^4)^4$.
Figure 2. Approximation $g(A, x, y) = (a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 y^2 + a_6 x y)^3$.
Figure 3. Deviation (error) function $f(x, y) - g(A, x, y) = (-x + y^3 + x^4)^4 - (a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 y^2 + a_6 x y)^3$.

4.2. Strictly monotone function in composition with a rational function.

In this example we approximate the same function

$$f(x, y) = (-x + y^3 + x^4)^4 \qquad (4.3)$$

by a quasiaffine function in the form

$$g_1(A, x, y) = \left(\frac{a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 y^2}{1 + a_6 x y}\right)^3, \qquad (4.4)$$

where, as in the previous example, $A = (a_1, a_2, a_3, a_4, a_5, a_6)$ are the decision variables (the number of decision variables is the same), and $x$ and $y$ form a grid on $[-1, 1]$ with step-size 0.1. The purpose of this approximation is to use lower degree polynomials in the composition. The optimal coefficients are (rounded to two decimal places):

$$A = (-1.66, -0.93, 1.05, 2.87, 0.98, 1),$$

the maximal absolute deviation is 11.48 (to two decimal places).

Figure 4 contains the best found approximation and figure 5 depicts the deviation function $f(x, y) - g_1(A, x, y)$, which is the error of approximation.

The new approximation still resembles the shape of the original function, but the maximal deviation is higher.
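The feasibility check for this example combines the two reductions above. Assuming the denominator $1 + a_6 x y$ is kept strictly positive on the grid (cf. (3.5)), taking real cube roots in $f - z \leq g_1 \leq f + z$ and multiplying through by the denominator gives, for each grid point $(x, y)$,

$$\sqrt[3]{f(x, y) - z}\,(1 + a_6 x y) \leq a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 y^2 \leq \sqrt[3]{f(x, y) + z}\,(1 + a_6 x y),$$

which is linear in $A$ for a fixed level $z$, so each bisection step is again a linear feasibility problem.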

Figure 4. Approximation $g_1(A, x, y) = \left(\frac{a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 y^2}{1 + a_6 x y}\right)^3$.
Figure 5. Deviation (error) function $f(x, y) - g_1(A, x, y) = (-x + y^3 + x^4)^4 - \left(\frac{a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 y^2}{1 + a_6 x y}\right)^3$.

4.2.1. Deep learning applications.

At first glance, the class of compositions of a monotone univariate function with an affine or rational function is restrictive. However, it has many practical applications, including deep learning, where compositions of univariate activation functions and affine transformations are the main components of the models [16]. The most common choices for activation functions are sigmoid functions, ReLU and Leaky ReLU (the last two are simply non-decreasing piecewise linear functions with two linear pieces).

5. Conclusions

The goal of this study is to demonstrate that abstract convexity (in the sense of “convexity without linearity”) has several applications in different branches of mathematics, including approximation theory. This application appears naturally in the new settings.

The results of the numerical experiments demonstrate that our approach is computationally efficient. It also leads to practical applications; one potential area is data science and deep learning.

We also touched on the connections between “abstract convexity” and “axiomatic convexity”. These areas have many overlaps, which can be seen as a new way of looking at certain problems, in particular function approximation.

Acknowledgements

This research was supported by the Australian Research Council (ARC), Solving hard Chebyshev approximation problems through nonsmooth analysis (Discovery Project DP180100602).

References

  • Achieser [1956] N. I. Achieser. Theory of Approximation. Frederick Ungar, New York, 1956.
  • Andramonov [2002] Mikhail Andramonov. A survey of methods of abstract convex programming. Journal of Statistics and Management Systems, 5(1-3):21–37, 2002. doi: 10.1080/09720510.2002.10701049. URL https://doi.org/10.1080/09720510.2002.10701049.
  • Bauschke and Lewis [2000] Heinz H. Bauschke and Adrian S. Lewis. Dykstra's algorithm with Bregman projections: A convergence proof. Optimization, 48(4):409–427, 2000.
  • Boehm [1964] Boehm. Functions whose best rational Chebyshev approximations are polynomials. Numer. Math., pages 235–242, 1964.
  • Boyd and Vandenberghe [2009] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, New York, USA, seventh edition, 2009.
  • Bui et al. [2021] Hoa T. Bui, Regina S. Burachik, Alexander Y. Kruger, and David T. Yost. Zero duality gap conditions via abstract convexity. Optimization, 0(0):1–37, 2021. doi: 10.1080/02331934.2021.1910694. URL https://doi.org/10.1080/02331934.2021.1910694.
  • Chebyshev [1955] P. L. Chebyshev. The theory of mechanisms known as parallelograms. In Selected Works, pages 611–648. Publishing House of the USSR Academy of Sciences, Moscow, 1955. (In Russian).
  • Cheney and Loeb [1964] E.W. Cheney and H.L. Loeb. Generalized rational approximation. Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis, 1(1):11–25, 1964.
  • Crouzeix [1980] J. P. Crouzeix. Conditions for convexity of quasiconvex functions. Mathematics of Operations Research, 5(1):120–125, 1980.
  • Daniilidis et al. [2002] A. Daniilidis, N. Hadjisavvas, and J.-E. Martinez-Legaz. An appropriate subdifferential for quasiconvex functions. SIAM Journal on Optimization, 12(2):407–420, 2002.
  • de Finetti [1949] B. de Finetti. Sulle stratificazioni convesse. Ann. Mat. Pura Appl., pages 173–183, 1949.
  • [12] C J de la Vallée Poussin. Sur la méthode de l’approximation minimum. Ann. Soc. Sci. Bruxelles, 35:1–16.
  • Duchet [1987] Pierre Duchet. Convexity in combinatorial structures. In Proceedings of the 14th Winter School on Abstract Analysis, pages [261]–293. Circolo Matematico di Palermo, 1987. URL http://eudml.org/doc/220736.
  • Dutta and Rubinov [2005] J. Dutta and A. M. Rubinov. Abstract convexity. Handbook of generalized convexity and generalized monotonicity, 76:293–333, 2005.
  • Fan [1963] Ky Fan. On the Krein-Milman theorem. In Convexity: Proceedings of the Seventh Symposium in Pure Mathematics of the American Mathematical Society, pages 211–219. American Mathematical Society, 1963. doi: 10.1090/pspum/007/0154097. URL https://doi.org/10.1090%2Fpspum%2F007%2F0154097.
  • Goodfellow et al. [2016] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
  • Kutateladze and Rubinov [1972] S S Kutateladze and A M Rubinov. Minkowski duality and its applications. Russ. Math. Surv., 27(3):137–191, jun 1972. doi: 10.1070/rm1972v027n03abeh001380. URL https://doi.org/10.1070%2Frm1972v027n03abeh001380.
  • Levi [1951] F. W. Levi. On Helly's theorem and the axioms of convexity. Journal of the Indian Mathematical Society, 15:65–76, 1951.
  • Loeb [1960] Henry L. Loeb. Algorithms for Chebyshev approximations using the ratio of linear forms. Journal of the Society for Industrial and Applied Mathematics, 8(3):458–465, 1960.
  • Meinardus [1967] G. Meinardus. Approximation of Functions: Theory and Numerical Methods. Springer-Verlag, Berlin, 1967.
  • Millán et al. [a] R. Díaz Millán, V. Peiris, N. Sukhorukova, and J. Ugon. Multivariate approximation by polynomial and generalised rational functions. Accepted in Optimization, a. URL https://arxiv.org/abs/2101.11786v1.
  • Millán et al. [b] R. Díaz Millán, N. Sukhorukova, and J. Ugon. An algorithm for best generalised rational approximation of continuous functions. In PRESS, accepted in Set-Valued and Variational Analysis, b. URL https://arxiv.org/abs/2011.02721.
  • Nakatsukasa and Trefethen [2020] Yuji Nakatsukasa and Lloyd N. Trefethen. An algorithm for real and complex rational minimax approximation. SIAM Journal on Scientific Computing, 42(5):A3157–A3179, 2020. doi: 10.1137/19M1281897.
  • Nakatsukasa et al. [2018] Yuji Nakatsukasa, Olivier Sète, and Lloyd N. Trefethen. The AAA algorithm for rational approximation. SIAM Journal on Scientific Computing, 40(3):A1494–A1522, 2018.
  • Pallaschke and Rolewicz [1997] Diethard Pallaschke and Stefan Rolewicz. Foundations of Mathematical Optimization: Convex Analysis without Linearity, volume 388. Springer Science & Business Media, 1997.
  • Peiris et al. [2021] V. Peiris, N. Sharon, N. Sukhorukova, and J. Ugon. Generalised rational approximation and its application to improve deep learning classifiers. Applied Mathematics and Computation, 389:125560, jan 2021. doi: 10.1016/j.amc.2020.125560. URL https://doi.org/10.1016%2Fj.amc.2020.125560.
  • Ralston [1965] Anthony Ralston. Rational Chebyshev approximation by Remes' algorithms. Numer. Math., 7(4):322–330, August 1965. ISSN 0029-599X.
  • Rivlin [1962] T. J. Rivlin. Polynomials of best uniform approximation to certain rational functions. Numerische Mathematik, 4(1):345–349, 1962.
  • Rubinov [2013] A. M. Rubinov. Abstract convexity and global optimization, volume 44. Springer Science & Business Media, 2013.
  • Rubinov and Simsek [1995] A. M. Rubinov and B. Simsek. Conjugate quasiconvex nonnegative functions. Optimization, 35(1):1–22, 1995. doi: 10.1080/02331939508844124.
  • Singer [1997] Ivan Singer. Abstract Convex Analysis. Canadian Mathematical Society Series of Monographs and Advanced Texts. Wiley, New York, 1997. ISBN 0471160156.
  • Soltan [1984] V. V. Soltan. Introduction to the Axiomatic Theory of Convexity. Shtiintsa, Kishinev, 1984. (In Russian).
  • van de Vel [1993] M.L.J. van de Vel. Theory of convex structures. North-Holland, Amsterdam New York, 1993. ISBN 9780080933108.
  • Matsushita and Xu [2016] Shin-ya Matsushita and Li Xu. On the finite termination of the Douglas-Rachford method for the convex feasibility problem. Optimization, 65(11):2037–2047, 2016.
  • Yang and Yang [2013] Yuning Yang and Qingzhi Yang. Some modified relaxed alternating projection methods for solving the two-sets convex feasibility problem. Optimization, 62(4):509–525, 2013.
  • Zaslavski [2013] A. J. Zaslavski. Subgradient projection algorithms and approximate solutions of convex feasibility problems. Journal of Optimization Theory and Applications, 157:803–819, 2013.
  • Zhao and Köbis [2018] Xiaopeng Zhao and Markus Arthur Köbis. On the convergence of general projection methods for solving convex feasibility problems with applications to the inverse problem of image recovery. Optimization, 67(9):1409–1427, 2018.