
Applications and Issues in Abstract Convexity

R. Díaz Millán1, Nadezda Sukhorukova2,∗, Julien Ugon1
(Date: January 2022)

1Faculty of Science, Engineering and Built Environment, Deakin University, Australia
2Department of Mathematics, Swinburne University of Technology, Australia


Abstract. The theory of abstract convexity, also known as convexity without linearity, is an extension of classical convex analysis. There are a number of remarkable results, mostly concerning duality, as well as some numerical methods; however, this area has not yet found many practical applications. In this paper we study the application of abstract convexity to function approximation. Another important research direction addressed in this paper is the connection with so-called axiomatic convexity.

Keywords. Abstract convexity, axiomatic convexity, Chebyshev approximation, Carathéodory number, quasiaffine functions.

2010 Mathematics Subject Classification. 90C26, 90C90, 90C47, 65D15.

∗Corresponding author. E-mail addresses: r.diazmillan@deakin.edu.au (R. Díaz Millán), nsukhorukova@swin.edu.au (N. Sukhorukova), julien.ugon@deakin.edu.au (J. Ugon). Received ; Accepted .

1. Introduction

Abstract convexity appears as a natural extension of the classical convex analysis when linear and affine functions (elementary functions) are “replaced” by other types of functions. This is why abstract convexity is also known as “convexity without linearity”.

It has been demonstrated that many results from classical convex analysis (conjugation, duality, subdifferential-related issues) can be extended to “non-linear” settings [25, 31]. In [25] the authors provide a very detailed historical review of the theory; it turns out that the origins of this field go back to the early 1970s, while some specific ideas were already in place in the 1950s.

Despite very productive work on the development of abstract convexity, there are still several directions for improvement. From the onset of abstract convexity theory, much effort has been devoted to the study of duality, leading to very elegant generalisations of classical convexity results. On the other hand, the geometric aspects of convexity have not received as much attention. Additionally, the theory remains “abstract” and needs practical applications. One of the goals of this paper is to provide some applications and to develop a framework describing the properties that the elementary functions have to satisfy to be efficient in applications.

There are a number of difficulties in developing efficient computational methods. The first step is to choose the set of elementary functions. This set should be simple enough to allow the construction of fast and efficient methods. On the other hand, it has to generate accurate approximations to the functions arising in applications. In some cases, the choice of elementary functions is reasonably straightforward (see section 3), but this is rather exceptional, and in most applications the choice of a suitable class of elementary functions is a difficult task.

A deep and detailed study of abstract convexity and related issues can be found in [31] and also in [29]. A detailed survey of methods of abstract convex programming can be found in [2]. Many other contributions on duality and methods are not mentioned in this paper; the most recent state of developments in abstract convexity and a comprehensive literature review can be found in [6].

The paper is organised as follows. In section 2 we provide the essential background of abstract convexity and axiomatic convexity. In the same section, we underline connections and possible cross-feeding between these two types of convexity. Then, in section 3 we illustrate how abstract convexity and axiomatic convexity can be applied to function approximation. Section 4 illustrates the results of numerical experiments. Finally, section 5 provides conclusions and future research directions.

2. Abstract Convexity

2.1. Definitions and preliminaries

Consider the set of all real-valued functions on the domain $X$, denoted by $\mathscr{F}$ and defined by $\mathscr{F} := \{f : X \to \mathbb{R}\}$. Any subset of functions $L \subseteq \mathscr{F}$ is called a set of abstract linear functions. In what follows, we introduce some known definitions leading to the concept of abstract convexity. Suppose that a set of abstract linear functions $L \subseteq \mathscr{F}$ has been fixed.

Definition 2.1.

The vertical closure of $L$, $H := \{l + c : l \in L,\ c \in \mathbb{R}\}$, is the set of abstract affine functions.

Definition 2.2 (Abstract Convexity [29]).

A function $f$ is said to be $L$-convex if there exists a set $U \subset H$ such that for any $x \in X$, $f(x) = \sup_{u \in U} u(x)$. Such functions are also called abstract convex with respect to $L$.

In this paper we denote the set of all $L$-convex functions by $\mathscr{F}_L$.

Definition 2.3 (Support Set [29]).

The $L$-support set of a function $f \in \mathscr{F}$ is defined as:

$$\operatorname{supp}_L f := \{l \in L : l \leq f\},$$

where $l \leq f$ means that $l(x) \leq f(x)$ for all $x \in X$. The support sets of $L$-convex functions are called $L$-convex sets.

Therefore, informally, convexity without linearity simply means that the “role” of linear functions from classical convex analysis is played by functions from a certain class. The first (and for some applications the most important) question is how to choose the set of elementary functions.

The next definition, which will be detailed and used in the next subsection, helps in choosing a “better” set of abstract linear functions.

Definition 2.4 (Supremal generator [14, 29]).

Let $\mathscr{F}$ be a set of functions defined on a set $X$. A set $L \subset \mathscr{F}$ is called a supremal generator of $\mathscr{F}$ if each $f \in \mathscr{F}$ is abstract convex with respect to $L$.

2.2. Quasiconvexity

The notion of quasiconvexity plays an essential role in abstract convexity. We are now going to explain why.

The notion of quasiconvexity was originally introduced in [11], where the author studied the behaviour of functions with convex sublevel sets, but the term quasiconvexity was introduced much later.

Definition 2.5.

Let $D$ be a convex subset of $\mathbb{R}^n$. A function $f : D \to \mathbb{R}$ is quasiconvex if and only if its sublevel set

$$S_\alpha = \{x \in D \mid f(x) \leq \alpha\}$$

is convex for any $\alpha \in \mathbb{R}$.
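For example, $f(x) = \sqrt{|x|}$ is quasiconvex on $\mathbb{R}$ although it is not convex: $S_\alpha = [-\alpha^2, \alpha^2]$ for $\alpha \geq 0$ and $S_\alpha = \emptyset$ for $\alpha < 0$, all of which are convex sets.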

There are several equivalent definitions of quasiconvex functions, but the one we use is convenient for our study.

Definition 2.6.

A function $f : D \to \mathbb{R}$, where $D$ is a convex subset of $\mathbb{R}^n$, is called quasiconcave if $-f$ is quasiconvex.

Definition 2.7.

Functions that are both quasiconvex and quasiconcave are called quasiaffine (quasilinear).

Note the following important observations.

  • Quasiconvex functions do not need to be continuous.

  • In the case of univariate functions, quasiaffine functions are monotone functions.

If a function is quasiaffine on $\mathbb{R}^n$ (unconstrained problems), then it follows from the definition that its sublevel sets must be half-spaces, and it is clear that the hyperplanes defining these half-spaces have to be parallel. In the presence of constraints, this observation is no longer valid, and we provide an example in section 4.

There are many studies dedicated to quasiconvex functions and quasiconvex optimisation [5, 9, 10, 14, 29, 30]. In these studies, the notion of quasiconvexity appears as one of the possible generalisations of convexity.

2.3. Quasiconvex functions and supremal generators

Theorem 7.13 from [14] states the following.

Theorem 2.8.

The set of all lower semicontinuous quasiaffine functions forms a supremal generator of the set of all lower semicontinuous quasiconvex functions.

Essentially, this theorem states that in the case of quasiconvex minimisation problems, quasiaffine functions “replace” the role of linear functions in classical convex settings. Therefore, the choice of elementary functions is clear (all quasiaffine functions).

One main advantage is that the sublevel sets of the “new” elementary functions are half-spaces (similar to classical convex analysis). In most practical problems, the quasiaffine approximations have some specific characteristics (for example, the class of rational or generalised rational approximations, or approximations that are compositions of certain monotone or non-decreasing functions with certain affine or quasiaffine functions). This additional information helps when dealing with specific applications.
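For example (a standard observation, consistent with the classes listed above), a ratio of affine functions

$$r(x) = \frac{a^\top x + b}{c^\top x + d}, \qquad c^\top x + d > 0,$$

is quasiaffine on its domain: for any $\alpha \in \mathbb{R}$, the sublevel set $\{x : a^\top x + b \leq \alpha(c^\top x + d),\ c^\top x + d > 0\}$, and similarly every superlevel set, is the intersection of a half-space with the domain.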

In the rest of this section, we clarify the “roles” of quasiaffine functions in the new abstract convex settings. Then section 4 contains the results of computational experiments and illustrations.

2.4. Relation to Axiomatic Convexity

Axiomatic Convexity aims at generalising the notion of convexity using only set-theoretic definitions. Good reviews can be found in [32, 33]. Given a set $X$, a family $\mathcal{C}$ of subsets of $X$ is called a convexity if it satisfies the following axioms:

  1. $\emptyset$ and $X$ belong to $\mathcal{C}$;

  2. any intersection of sets from $\mathcal{C}$ is in $\mathcal{C}$: if $D \subset \mathcal{C}$, then $\bigcap_{A \in D} A \in \mathcal{C}$;

  3. the nested union of sets from $\mathcal{C}$ is in $\mathcal{C}$: given a family of sets $D \subset \mathcal{C}$ that is totally ordered by inclusion, $\bigcup_{A \in D} A \in \mathcal{C}$.

The pair $(X, \mathcal{C})$ is called a convex structure (or (abstract) convex space).

These axioms (in particular axioms 1 and 2) ensure the existence of the convex hull of any set $S \subset X$, namely the smallest convex set containing $S$:

$$\mathrm{co}_{\mathcal{C}} S = \bigcap \{A \in \mathcal{C} : S \subset A\}.$$

If $F$ is finite, then $\mathrm{co}_{\mathcal{C}} F$ is called a polyhedron.

Axiomatic Convexity, also called abstract convexity by van de Vel [33], has found applications in combinatorial geometry [13, 18], among others. A family $\mathcal{C}$ that satisfies the first two axioms above is called a closure space, or $\cap$-stable [15].

There are clear connections between abstract convexity as defined in Definitions 2.2 and 2.3 and axiomatic convexity. Indeed, the notion of abstract convex function (Definition 2.2) first appeared in [15] in the context of closure spaces. The families of sets considered in [15] were the closure of the set of sublevel sets of a given family of functions.

Abstract convex functions (Definition 2.2) and abstract convex sets (Definition 2.3) were then formally introduced by Kutateladze and Rubinov [17], who showed the equivalence between these sets and the ones considered by Fan [15]. Below we formalise some results on the correspondence between abstract convex sets and closure spaces. Let us start by noting that the support set of any (not necessarily $L$-convex) function is $L$-convex. Indeed, the support set of $f$ is also the support set of its $L$-convex envelope, which is $L$-convex.

Proposition 2.9.

Consider a family of functions $L$. Then the set of $L$-convex sets forms a closure space. Furthermore, every closure space is isomorphic to such a family of sets.

Proof.

Consider a family of functions $L$ and let $\mathcal{C}$ be the set of $L$-convex sets. We first show that $\mathcal{C}$ is closed under arbitrary intersections. Indeed, let $D \subset \mathcal{C}$ be an arbitrary collection of $L$-convex sets. For each $A \in D$, let $f_A(x) = \sup_{l \in A} l(x)$ and consider the function $f(x) = \inf_{A \in D} f_A(x)$. We will show that $\bigcap_{A \in D} A = \operatorname{supp} f$, which implies that $\bigcap_{A \in D} A$ is $L$-convex (note that $\operatorname{supp} f$ is the support set of the lower $L$-convex envelope of $f$).

$$\operatorname{supp} f = \{l \in L : l(x) \leq \inf_{A \in D} f_A(x),\ \forall x \in X\} = \{l \in L : l(x) \leq f_A(x),\ \forall A \in D,\ \forall x \in X\} = \bigcap_{A \in D} \{l \in L : l(x) \leq f_A(x),\ \forall x \in X\} = \bigcap_{A \in D} A.$$

The first axiom is clearly verified by $\mathcal{C}$, from the fact that $\emptyset$ is the support set of $\sup_{l \in \emptyset} l$ and $L$ is the support set of $\sup_{l \in L} l$.

The second part of the statement is obtained by considering the set $L$ of indicator functions of the sets in the closure space. More precisely, let $\mathcal{C}$ be a closure space, and consider the set $L = \{i_A : A \in \mathcal{C}\}$, where

$$i_A(x) = \begin{cases} 0 & \text{if } x \in A, \\ +\infty & \text{otherwise.} \end{cases}$$

Then $L$ is also the set of $L$-convex functions. Indeed, let $D \subset \mathcal{C}$. Define $U = \{i_A : A \in D\} \subset L$ and $f = \sup_{l \in U} l = \sup_{A \in D} i_A$. Then $f(x) = 0$ if and only if $x \in A$ for all $A \in D$; that is, $f = i_{\bigcap_{A \in D} A}$. The isomorphism between the set of support sets of functions from $L$ and $\mathcal{C}$ is evident. ∎

Note that in practice it is not necessary to consider the entire set of indicator functions, but only the set of indicator functions of a basis of $\mathcal{C}$ (i.e., a family of sets whose closure is $\mathcal{C}$), which corresponds to a supremal generator of the corresponding abstract convex functions. It can also easily be seen that the set of domains of the $L$-convex functions defined as in the above proof, as well as the set of 0-sublevel sets, is precisely $\mathcal{C}$.

On the other hand, not every family of abstract convex sets forms a convexity structure.

Example 2.10.

Let $L$ be the set of linear functions on $\mathbb{R}$, and consider the functions $f_\alpha$ defined by $f_\alpha(x) = \alpha |x|$, for $\alpha > 0$. These functions are $L$-convex, and induce a nested family of $L$-convex sets $S_\alpha = \operatorname{supp}_L f_\alpha = \{x \mapsto \beta x : \beta \in [-\alpha, \alpha]\}$.

However,

$$U = \bigcup_{\alpha < 1} S_\alpha = \{x \mapsto \beta x : \beta \in (-1, 1)\}$$

is not an $L$-convex set. Indeed, $f_1 = \sup_{l \in U} l$ is the smallest $L$-convex function whose support set contains $U$, but the function $x \mapsto x$ belongs to $\operatorname{supp} f_1 \setminus U$, which shows that $U$ cannot be the support set of $f_1$, and therefore of any $L$-convex function. This shows that the set of $L$-convex sets is not a convexity structure.

Let us introduce the notion of strict support set, which will enable us to address this issue.

The $L$-strict support set of a function $f$ is defined as:

$$\underline{\operatorname{supp}} f = \{l \in L : l(x) < f(x),\ \forall x \in \operatorname{dom}(f)\}.$$

We define the family of sets $\mathcal{C}_L$ as follows:

Definition 2.11 (convexity extension).

$$\mathcal{C}_L^f := \{A \subseteq L : \underline{\operatorname{supp}} f \subseteq A \subseteq \operatorname{supp} f\}, \qquad \mathcal{C}_L := \bigcup_{f \in \mathscr{F}_L} \mathcal{C}_L^f.$$

We call the set $\mathcal{C}_L$ the convexity extension of the set of support sets of $L$-convex functions.

Proposition 2.12.

For any family of functions $L$, the convexity extension $\mathcal{C}_L$ forms a convexity structure.

Proof.

Since $\mathcal{C}_L$ contains all $L$-convex sets, it contains a closure space, and therefore by Proposition 2.9 it contains the sets $\emptyset$ and $L$.

To see why the second axiom is satisfied, consider an arbitrary family of sets $D \subset \mathcal{C}_L$. For each $A \in D$, there exists an $L$-convex function $f_A$ such that $\underline{\operatorname{supp}} f_A \subseteq A \subseteq \operatorname{supp} f_A$. Define $f = \mathrm{co}_L \inf_{A \in D} f_A$ (the $L$-convex envelope of the infimum). Since $A \subseteq \operatorname{supp} f_A$ for any $A$, we have that

$$\bigcap_{A \in D} A \subseteq \bigcap_{A \in D} \operatorname{supp} f_A = \operatorname{supp} f,$$

where the last equality was shown in the proof of Proposition 2.9. Additionally, we have:

$$\underline{\operatorname{supp}} f = \{l \in L : l(x) < \inf_{A \in D} f_A(x),\ \forall x \in X\} \subseteq \{l \in L : l(x) < f_A(x),\ \forall A \in D,\ \forall x \in X\} = \bigcap_{A \in D} \{l \in L : l(x) < f_A(x),\ \forall x \in X\} \subseteq \bigcap_{A \in D} A.$$

Therefore $\underline{\operatorname{supp}} f \subseteq \bigcap_{A \in D} A \subseteq \operatorname{supp} f$, and we conclude that $\bigcap_{A \in D} A$ is in $\mathcal{C}_L$.

Finally, to see that the third axiom is also satisfied, let $D$ be a family of sets from $\mathcal{C}_L$ that is totally ordered by inclusion. For each $A \in D$, we define $f_A$ as above. The nested nature of the sets in $D$ implies that for $A, A'$ in $D$, if $A \subseteq A'$, then

$$f_A \leq f_{A'}. \qquad (2.1)$$

Let $S = \bigcup_{A \in D} A$ and $f = \sup_{A \in D} f_A$.

It is clear that $S \subseteq \operatorname{supp} f$, since for any $u \in S$ there exists $A \in D$ such that $u \in \operatorname{supp} f_A$; then $u \leq f_A \leq f$. Now, consider $u \notin S$. Then $u \notin \bigcup_{A \in D} \underline{\operatorname{supp}} f_A$, and so there exists $x \in X$ such that $u(x) \geq f_A(x)$ for any $A \in D$, and therefore $u(x) \geq \sup_{A \in D} f_A(x) = f(x)$. Therefore $u \notin \underline{\operatorname{supp}} f$. This implies that $\underline{\operatorname{supp}} f \subseteq S \subseteq \operatorname{supp} f$, and therefore $S$ is in $\mathcal{C}_L$, and $\mathcal{C}_L$ is a convexity structure. ∎

Remark 2.13.

Propositions 2.9 and 2.12 induce an equivalence relation between sets of abstract linear functions, according to the convexity structure (axioms 1-3) or closure space (axioms 1 and 2) to which the induced abstract convex sets are isomorphic.

Axiomatic convexity has been applied to obtain generalisations of well known geometric results in convexity theory. Of particular interest to this paper are generalisations of Carathéodory’s theorems. Much research has been devoted to investigating this topic. We refer to [33] for a review of classical results in the area.

Definition 2.14 ([33, Definition 1.5]).

In a given convexity structure $\mathcal{C}$, a set $F$ is Carathéodory dependent if

$$\mathrm{co}_{\mathcal{C}} F \subseteq \bigcup_{a \in F} \mathrm{co}_{\mathcal{C}}(F \setminus \{a\}),$$

and Carathéodory independent otherwise.

Then we can define the Carathéodory number $c(\mathcal{C})$ of a convexity structure $\mathcal{C}$ as the largest cardinality of a Carathéodory independent set (i.e., any set $F$ with cardinality greater than $c(\mathcal{C})$ is Carathéodory dependent). The Carathéodory numbers of several important classes of convexity structures are known and discussed in the literature [33].

Generalisations of Helly’s and Radon’s theorems were obtained similarly. The Carathéodory number enables the following generalisation of Carathéodory’s classical result:

Proposition 2.15 ([33, Theorem 1.7]).

Consider a convexity structure $(X, \mathcal{C})$ with Carathéodory number $c(\mathcal{C})$, and let $A \in \mathcal{C}$ be a convex set. Then, for any $x \in A$ there exists a set $F \subseteq A$ such that $|F| \leq c(\mathcal{C})$ and $x \in \mathrm{co}_{\mathcal{C}} F$. This is best possible.
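For example, for the standard convexity on $X = \mathbb{R}^n$ (where $\mathcal{C}$ is the family of all convex sets), the Carathéodory number is $n + 1$, and Proposition 2.15 recovers the classical Carathéodory theorem: every $x \in \mathrm{co}\, S$ can be written as $x = \sum_{i=1}^{n+1} \lambda_i x_i$ with $x_i \in S$, $\lambda_i \geq 0$ and $\sum_{i=1}^{n+1} \lambda_i = 1$.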

3. Application to approximation

3.1. Problem formulation

We are working with uniform (Chebyshev) approximation:

$$\min_A \sup_{\mathbf{x} \in X} |f(\mathbf{x}) - g(A, \mathbf{x})|, \qquad (3.1)$$

where $A$ are the decision variables and $X$ is a compact set. In many computer-based applications, this set is a finite grid defined on a convex compact set. If we define $L = \{A \mapsto g(A, \mathbf{x}) : \mathbf{x} \in X\}$, then it is clear from formulation (3.1) that the objective function of this problem is $H_L$-convex.
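When $g(A, \mathbf{x})$ is linear in the parameters $A$ (for instance, a polynomial whose coefficients form $A$, the case discussed next) and $X$ is a finite grid, problem (3.1) reduces to a linear program. The following is a minimal sketch of this reduction, assuming SciPy is available; it is an illustration, not the code used for the experiments in this paper.

    import numpy as np
    from scipy.optimize import linprog

    def chebyshev_poly_fit(f_vals, grid, degree):
        """Minimise z subject to |f(x_i) - sum_j a_j x_i^j| <= z on a grid."""
        V = np.vander(grid, degree + 1, increasing=True)  # V[i, j] = x_i**j
        m, k = V.shape
        c = np.zeros(k + 1)
        c[-1] = 1.0  # decision variables (a_0, ..., a_n, z); minimise z
        # V a - z <= f and -V a - z <= -f together encode |f - V a| <= z.
        A_ub = np.block([[V, -np.ones((m, 1))], [-V, -np.ones((m, 1))]])
        b_ub = np.concatenate([f_vals, -f_vals])
        bounds = [(None, None)] * k + [(0, None)]  # coefficients free, z >= 0
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        return res.x[:k], res.x[-1]  # optimal coefficients, maximal deviation

    # Example: best uniform quartic approximation of |x| on a grid.
    grid = np.linspace(-1.0, 1.0, 201)
    coeffs, max_dev = chebyshev_poly_fit(np.abs(grid), grid, degree=4)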

In the case when the approximation function $g(A, t)$ is a polynomial (and its coefficients are subject to optimisation), the optimality conditions are based on maximal deviation points (Chebyshev theorem, see [7]). In the case of univariate function approximation, the conditions are based on the notion of an alternating sequence.

Theorem 3.1 (Chebyshev).

A polynomial of degree at most $n$ is an optimal approximation in the uniform (Chebyshev) norm if and only if there are $n + 2$ alternating points.

Essentially, the approximations here are presented as linear combinations of basis functions. In the case of polynomials the basis functions are monomials, but other classes of basis functions can be used. The corresponding optimisation problems are convex, as suprema of linear forms. Theorem 3.1 can be proved using convex analysis, with the number of alternating points ($n + 2$) obtained through Carathéodory's theorem. In other words, the following conjecture is true in the case when $L$ is a set of linear functions:

Conjecture 3.2.

For a function from $L$ to be an optimal approximation in uniform norm, it is enough that it is optimal at $c(\mathcal{C}_{H_L}) + 1$ extreme points.

We also conjecture that this is best possible, in the sense that the statement is generally not true if we replace $c(\mathcal{C}_{H_L}) + 1$ with $c(\mathcal{C}_{H_L})$.

If true, this conjecture could form the basis of algorithms for best approximation, such as a generalisation of de la Vallée Poussin's procedure [12]. Later in this section, we discuss an example of a family $L$ in the context of this conjecture.

One possible generalisation of polynomial approximation is rational approximation, that is, approximation by the ratio of two polynomials whose coefficients are subject to optimisation. Note that in some cases, the degree of the denominator and numerator can be reduced without compromising the accuracy:

$$\frac{\sum_{j=0}^{n} a_j t_i^j}{\sum_{k=0}^{m} b_k t_i^k} = \frac{\sum_{j=0}^{n-\nu} a_j t_i^j}{\sum_{k=0}^{m-\mu} b_k t_i^k},$$

where $d = \min\{\nu, \mu\}$ is called the defect. Then the necessary and sufficient optimality conditions are as follows [1].

Theorem 3.3.

A rational function in $\mathcal{R}_{n,m}$ with defect $d$ is the best uniform approximation of a function $f \in C^0(I)$ (the space of continuous functions over $I$) if and only if there exists a sequence of at least $n + m + 2 - d$ points of maximal deviation where the sign of the deviation at these points alternates.

Therefore, similar to polynomial approximation, these conditions are based on the number of alternating points.

In connection with Conjecture 3.2, $c(\mathcal{C}_{H_L})$ is $n + m + 2$. If $n$ and $m$ are the degrees of the corresponding polynomials, then the total number of parameters is $n + 1 + m + 1 = n + m + 2$. However, one of the parameters can be fixed (otherwise one can divide the numerator and denominator by the same number). Therefore, there are $n + m + 1$ “free” parameters, and we add one more for $c(\mathcal{C}_{H_L})$ (similar to Carathéodory's theorem from classical convex analysis). We will provide a formal proof in a future paper.

Rational approximation was a very popular research area in the 1950s-70s [1, 4, 20, 27, 28] (just to name a few).

If the basis functions are not restricted to monomials, the approximations are called generalised rational approximations. The term is due to Cheney and Loeb [8].

One way to approach rational approximation is through constructing “near optimal” solutions [24]. This approach is very efficient and therefore very popular. The extension of this approach to non-monomial basis functions and multivariate approximation remains open (an extension to complex domains can be found in [23]).

An alternative way is based on modern optimisation techniques. This approach is preferable when the basis functions are not restricted to univariate monomials and when a deeper theoretical study is required. The corresponding optimisation problems are quasiconvex and can be solved using general quasiconvex optimisation methods.

Rational and generalised rational functions are quasiaffine as ratios of linear forms [5, 19, 26]. The corresponding approximation problems can be solved efficiently by applying a simple but robust approach, called the bisection method for quasiconvex functions [5].

In [26] the authors use a well-known bisection method for quasiconvex functions (see [5]) to solve these problems. In [22], the authors use a projection-type algorithm for solving these problems.

3.2. Bisection method

The bisection method for quasiconvex functions [5] can be used efficiently for rational and generalised rational approximation, including in multivariate settings [21, 26]. Can we use abstract convexity to extend this method to a wider class of approximations?

The problem can be formulated as follows:

$$\text{minimise } \tilde{z} \qquad (3.2)$$

subject to

$$f(\mathbf{x}) - \frac{\mathbf{A}^T \mathbf{G}(\mathbf{x})}{\mathbf{B}^T \mathbf{H}(\mathbf{x})} \leq \tilde{z}, \quad \mathbf{x} \in X, \qquad (3.3)$$
$$\frac{\mathbf{A}^T \mathbf{G}(\mathbf{x})}{\mathbf{B}^T \mathbf{H}(\mathbf{x})} - f(\mathbf{x}) \leq \tilde{z}, \quad \mathbf{x} \in X, \qquad (3.4)$$
$$\mathbf{B}^T \mathbf{H}(\mathbf{x}) \geq \delta, \quad \mathbf{x} \in X, \qquad (3.5)$$

where $\tilde{z}$ is the maximal deviation and $\delta$ is a small positive number. Problem (3.2)-(3.5) is not linear.

The idea of this method is based on the fact that all the sublevel sets of quasiconvex functions are convex, and therefore the sublevel sets of quasiaffine functions are half-spaces. Essentially, this means that the constraint set (3.3)-(3.5) for a fixed $\tilde{z}$ is an intersection of a finite number of half-spaces ($X$ is a finite grid) and therefore it is a polytope.

The algorithm starts with an upper and a lower bound ($u$ and $l$) for the maximal deviation. The sublevel set of the maximal deviation at the level $z = \frac{u+l}{2}$ is a convex set, and the algorithm checks whether this set is empty. If it is empty, then the lower bound is updated to $z$; otherwise, the upper bound is updated to $z$. The algorithm terminates when the upper and lower bounds are within the specified precision.
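The following is a sketch of this loop, with the feasibility check abstracted into an oracle is_feasible(z) that reports whether the sublevel set (3.3)-(3.5) at level z is non-empty; for generalised rational approximation this oracle is the linear program (3.6)-(3.9) discussed below.

    def bisection(is_feasible, lower, upper, tol=1e-8):
        """Bracket the optimal maximal deviation to within tol."""
        while upper - lower > tol:
            z = 0.5 * (lower + upper)
            if is_feasible(z):
                upper = z  # an approximation with deviation at most z exists
            else:
                lower = z  # the sublevel set is empty: the optimum exceeds z
        return upper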

In general, checking if the convex set (sublevel set of the maximal deviation) is empty or not may be a difficult task (convex feasibility problems). There are a number of efficient methods ([3, 34, 35, 36, 37] just to name a few), but there are still several open problems here. The discussion of these problems is out of scope of the current paper.

In the case of multivariate generalised rational approximation, however, this problem (convex feasibility) can be reduced to solving a linear programming problem [21]. The denominator of the approximation does not change sign; assume, for simplicity, that it is positive. Then checking feasibility is equivalent to solving the following problem:

$$\text{minimise } \tilde{u} \qquad (3.6)$$

subject to

$$f(\mathbf{x})\, \mathbf{B}^T \mathbf{H}(\mathbf{x}) - \mathbf{A}^T \mathbf{G}(\mathbf{x}) \leq z\, \mathbf{B}^T \mathbf{H}(\mathbf{x}) + \tilde{u}, \quad \mathbf{x} \in X, \qquad (3.7)$$
$$\mathbf{A}^T \mathbf{G}(\mathbf{x}) - f(\mathbf{x})\, \mathbf{B}^T \mathbf{H}(\mathbf{x}) \leq z\, \mathbf{B}^T \mathbf{H}(\mathbf{x}) + \tilde{u}, \quad \mathbf{x} \in X, \qquad (3.8)$$
$$\mathbf{B}^T \mathbf{H}(\mathbf{x}) \geq \delta, \quad \mathbf{x} \in X, \qquad (3.9)$$

where $z = \frac{1}{2}(u + l)$ is the current bisection point (bisecting the possible values of the maximal deviation).

If the optimal value $\tilde{u} \leq 0$, the corresponding sublevel set of the maximal deviation function is non-empty (update the upper bound); otherwise the set is empty (update the lower bound). If $X$ is a finite grid, then (3.6)-(3.9) is a linear programming problem and can be solved efficiently at each step of the bisection method.
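A sketch of this feasibility check follows, assuming SciPy; the basis maps G and H (returning the numerator and denominator basis vectors at a grid point) are illustrative names, not notation from the cited papers.

    import numpy as np
    from scipy.optimize import linprog

    def feasibility_lp(z, grid, f, G, H, delta=1e-6):
        """Solve (3.6)-(3.9); True if the sublevel set at level z is non-empty."""
        p, q = len(G(grid[0])), len(H(grid[0]))
        rows, rhs = [], []
        for x in grid:
            g, h, fx = np.asarray(G(x)), np.asarray(H(x)), f(x)
            # (3.7): f(x) B^T H(x) - A^T G(x) - z B^T H(x) - u <= 0
            rows.append(np.concatenate([-g, (fx - z) * h, [-1.0]]))
            rhs.append(0.0)
            # (3.8): A^T G(x) - f(x) B^T H(x) - z B^T H(x) - u <= 0
            rows.append(np.concatenate([g, -(fx + z) * h, [-1.0]]))
            rhs.append(0.0)
            # (3.9): B^T H(x) >= delta
            rows.append(np.concatenate([np.zeros(p), -h, [0.0]]))
            rhs.append(-delta)
        c = np.zeros(p + q + 1)
        c[-1] = 1.0  # minimise u
        # Bounding u below keeps the LP bounded; only the sign of u matters.
        bounds = [(None, None)] * (p + q) + [(-1.0, None)]
        res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                      bounds=bounds, method="highs")
        return res.status == 0 and res.fun <= 0.0

Combined with the bisection sketch above, bisection(lambda z: feasibility_lp(z, grid, f, G, H), 0.0, u0) brackets the optimal maximal deviation, where u0 is any valid upper bound.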

To summarise, an efficient implementation of the bisection method for quasiconvex functions can be extended to approximation by any type of quasiaffine functions, since at each step one needs to check whether a polytope (an intersection of a finite number of half-spaces) is non-empty. The main problem is how to find this polytope. In the case of rational and generalised rational approximation, this problem is simple, but it is not for some other types of quasiaffine approximations. In section 4 we give more examples where the construction of this polytope is straightforward (compositions of monotone functions with affine functions or ratios of affine functions).

One possibility for constructing sublevel sets for smooth quasiaffine approximations is to use their gradients (provided they do not vanish). In the rest of this section, we give an example demonstrating that this approach may be complicated in the presence of constraints (even simple linear constraints). In particular, this approach will not work even in the case of rational approximation, due to the requirement for the denominator to be strictly positive.

3.3. Approximation by a quasiaffine function

Suppose that instead of a generalised rational approximation one needs to approximate by a function from a different class, but the approximations are quasiaffine functions with respect to the decision variables. Then the problem is as follows:

$$\text{minimise } \tilde{z} \qquad (3.10)$$

subject to

$$f(\mathbf{x}) - g(A, \mathbf{x}) \leq \tilde{z}, \quad \mathbf{x} \in X, \qquad (3.11)$$
$$g(A, \mathbf{x}) - f(\mathbf{x}) \leq \tilde{z}, \quad \mathbf{x} \in X, \qquad (3.12)$$

where $g(A, \mathbf{x})$ is a quasiaffine function of $A$. Since the sublevel and superlevel sets of quasiaffine functions are half-spaces and $X$ is a finite grid, the constraint set (3.11)-(3.12) is a polytope.

Remark 3.4.

A finite number of linear constraints may be added to (3.10)-(3.12), while the constraint set remains a polytope.

Therefore, the constraint set (for any fixed $\tilde{z}$), with or without additional linear constraints, is a polytope, and we only need to check whether this polytope is non-empty. However, we still need an efficient approach for finding this polytope. In the case of smooth functions, one can use the gradient as a possible normal vector to the hyperplanes, provided that it is not a zero vector. In section 4 we study examples where the quasiaffine approximations can be decomposed as a composition of strictly monotone and affine (quasiaffine) functions. Other situations may be harder.
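To make the monotone-composition case concrete, suppose $g(A, \mathbf{x}) = \sigma(h(A, \mathbf{x}))$, where $h$ is affine in $A$ and $\sigma$ is strictly increasing and invertible on the relevant range (a sketch of the reduction used in section 4). Then, for a fixed $\tilde{z}$, the constraints (3.11)-(3.12) are equivalent to

$$\sigma^{-1}\big(f(\mathbf{x}) - \tilde{z}\big) \leq h(A, \mathbf{x}) \leq \sigma^{-1}\big(f(\mathbf{x}) + \tilde{z}\big), \qquad \mathbf{x} \in X,$$

which are linear in $A$, so each bisection step is again a linear feasibility problem.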

Note that if a function is quasiaffine on the whole space, then its sublevel sets are half-spaces whose boundary hyperplanes are parallel (otherwise, two such hyperplanes would intersect). If there are additional (even linear) constraints, this observation may not be valid anymore. The following example illustrates that this may happen even in a very simple case of two variables.

Example 3.5.

Consider $f(x, y) = \frac{x}{y}$, where $y > 0$. The sublevel sets are

$$S_\alpha = \left\{(x, y) : \frac{x}{y} \leq \alpha,\ y > 0\right\},$$

where $\alpha$ is a given real number. Then $S_\alpha$ can be described as

$$\{(x, y) : x - \alpha y \leq 0,\ y > 0\}.$$

Each sublevel set is still a half-space (within the domain), but the corresponding hyperplanes (in this example they correspond to level sets) are not parallel. These hyperplanes intersect at $(0, 0)$, but this point is excluded from the domain due to the requirement for the denominator $y$ to be strictly positive.

This example demonstrates that the computation of sublevel sets for quasiaffine functions may be challenging. In the next section we give some examples of the classes of approximations that can be handled efficiently. We also provide the results of the numerical experiments.

4. Numerical experiments

4.1. Strictly monotone function in composition with an affine function.

In this example we approximate the function

$$f(x, y) = (-x + y^3 + x^4)^4 \qquad (4.1)$$

by a quasiaffine function in the form

$$g(A, x, y) = (a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 y^2 + a_6 x y)^3, \qquad (4.2)$$

where $A = (a_1, a_2, a_3, a_4, a_5, a_6)$ are the decision variables, and $x$ and $y$ form a grid on $[-1, 1]$ with step-size 0.1. The optimal coefficients are (rounded to two decimal places):

$$A = (-1.88, -0.75, 0.31, 3.29, 0.98, -0.74),$$

the maximal absolute deviation is 8.01 (to two decimal places).

Figure 1 represents the function $f(x, y)$, figure 2 depicts the best found approximation and figure 3 contains the deviation function $f(x, y) - g(A, x, y)$, which is the error of approximation.

Function $f(x, y)$ is almost flat, with an abrupt increase around the point $(-1, 1)$. Visually, the approximation resembles the shape of the original function, and the magnitude of the deviation confirms that the approximation is reasonably good.
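For this class of approximations, the feasibility oracle used in the bisection method reduces to a linear program, since the cube function is strictly increasing. The following is a sketch under the assumptions of this subsection; it is an illustration, not the code used to produce the reported results.

    import numpy as np
    from scipy.optimize import linprog

    def is_feasible(z, pts, f_vals):
        """Check whether some A satisfies |f - (Phi @ A)**3| <= z on the grid."""
        x, y = pts[:, 0], pts[:, 1]
        # Basis of the inner function, affine in A: 1, x, y, x^2, y^2, xy.
        Phi = np.column_stack([np.ones_like(x), x, y, x**2, y**2, x * y])
        # Cubing is strictly increasing, so the constraints are linear in A:
        # cbrt(f - z) <= Phi @ A <= cbrt(f + z).
        lo, hi = np.cbrt(f_vals - z), np.cbrt(f_vals + z)
        res = linprog(np.zeros(Phi.shape[1]),
                      A_ub=np.vstack([Phi, -Phi]),
                      b_ub=np.concatenate([hi, -lo]),
                      bounds=[(None, None)] * Phi.shape[1], method="highs")
        return res.status == 0  # status 0 means a feasible A was found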

Figure 1. Function $f(x, y) = (-x + y^3 + x^4)^4$.
Figure 2. Approximation $g(A, x, y) = (a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 y^2 + a_6 x y)^3$.
Figure 3. Deviation (error) function $f(x, y) - g(A, x, y) = (-x + y^3 + x^4)^4 - (a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 y^2 + a_6 x y)^3$.

4.2. Strictly monotone function in composition with a rational function.

In this example we approximate the same function

$$f(x, y) = (-x + y^3 + x^4)^4 \qquad (4.3)$$

by a quasiaffine function in the form

$$g_1(A, x, y) = \left(\frac{a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 y^2}{1 + a_6 x y}\right)^3, \qquad (4.4)$$

where, as in the previous example, $A = (a_1, a_2, a_3, a_4, a_5, a_6)$ are the decision variables (the number of decision variables is the same), and $x$ and $y$ form a grid on $[-1, 1]$ with step-size 0.1. The purpose of this approximation is to use lower degree polynomials in the composition. The optimal coefficients are (rounded to two decimal places):

$$A = (-1.66, -0.93, 1.05, 2.87, 0.98, 1),$$

the maximal absolute deviation is 11.48 (to two decimal places).

Figure 4 contains the best found approximation and figure 5 depicts the deviation function $f(x, y) - g_1(A, x, y)$, which is the error of approximation.

The new approximation still resembles the shape of the original function, but the maximal deviation is higher.
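The feasibility check for this example combines the two reductions above. Assuming the denominator $1 + a_6 x y$ is kept strictly positive on the grid (cf. (3.5)), taking real cube roots in $f - z \leq g_1 \leq f + z$ and multiplying through by the denominator gives, for each grid point $(x, y)$,

$$\sqrt[3]{f(x, y) - z}\,(1 + a_6 x y) \leq a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 y^2 \leq \sqrt[3]{f(x, y) + z}\,(1 + a_6 x y),$$

which is linear in $A$ for a fixed level $z$, so each bisection step is again a linear feasibility problem.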

Figure 4. Approximation $g_1(A, x, y) = \left(\frac{a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 y^2}{1 + a_6 x y}\right)^3$.
Figure 5. Deviation (error) function $f(x, y) - g_1(A, x, y) = (-x + y^3 + x^4)^4 - \left(\frac{a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 y^2}{1 + a_6 x y}\right)^3$.

4.2.1. Deep learning applications.

At first glance, the class of compositions of a monotone univariate function with an affine or rational function is restrictive. However, it has many practical applications, including deep learning, where compositions of univariate activation functions and affine transformations are the main components of the models [16]. The most common choices for activation functions are sigmoid functions, ReLU and Leaky ReLU (the last two are simply non-decreasing piecewise linear functions with two linear pieces).

5. Conclusions

The goal of this study is to demonstrate that abstract convexity (in the sense of “convexity without linearity”) has several applications in different branches of mathematics, including approximation theory. This application appears naturally in the new settings.

The results of the numerical experiments demonstrate that our approach is computationally efficient. It also leads to practical applications; one potential area is data science and deep learning.

We also touched on the connections between “abstract convexity” and “axiomatic convexity”. These areas have many overlaps, which can be seen as a new way of looking at certain problems, in particular function approximation.

Acknowledgements

This research was supported by the Australian Research Council (ARC), Solving hard Chebyshev approximation problems through nonsmooth analysis (Discovery Project DP180100602).

References

  • Achieser [1956] N. I. Achieser. Theory of Approximation. Frederick Ungar, New York, 1956.
  • Andramonov [2002] Mikhail Andramonov. A survey of methods of abstract convex programming. Journal of Statistics and Management Systems, 5(1-3):21–37, 2002. doi: 10.1080/09720510.2002.10701049. URL https://doi.org/10.1080/09720510.2002.10701049.
  • Bauschke and Lewis [2000] Heinz H. Bauschke and Adrian S. Lewis. Dykstra's algorithm with Bregman projections: A convergence proof. Optimization, 48(4):409–427, 2000.
  • Boehm [1964] Boehm. Functions whose best rational Chebyshev approximations are polynomials. Numer. Math., pages 235–242, 1964.
  • Boyd and Vandenberghe [2009] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, New York, USA, seventh edition, 2009.
  • Bui et al. [2021] Hoa T. Bui, Regina S. Burachik, Alexander Y. Kruger, and David T. Yost. Zero duality gap conditions via abstract convexity. Optimization, 0(0):1–37, 2021. doi: 10.1080/02331934.2021.1910694. URL https://doi.org/10.1080/02331934.2021.1910694.
  • Chebyshev [1955] P. L. Chebyshev. The theory of mechanisms known as parallelograms. In Selected Works, pages 611–648. Publishing House of the USSR Academy of Sciences, Moscow, 1955. (In Russian).
  • Cheney and Loeb [1964] E.W. Cheney and H.L. Loeb. Generalized rational approximation. Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis, 1(1):11–25, 1964.
  • Crouzeix [1980] J. P. Crouzeix. Conditions for convexity of quasiconvex functions. Mathematics of Operations Research, 5(1):120–125, 1980.
  • Daniilidis et al. [2002] A. Daniilidis, N. Hadjisavvas, and J.-E. Martinez-Legaz. An appropriate subdifferential for quasiconvex functions. SIAM Journal on Optimization, 12(2):407–420, 2002.
  • de Finetti [1949] B. de Finetti. Sulle stratificazioni convesse. Ann. Mat. Pura Appl., pages 173–183, 1949.
  • [12] C J de la Vallée Poussin. Sur la méthode de l’approximation minimum. Ann. Soc. Sci. Bruxelles, 35:1–16.
  • Duchet [1987] Pierre Duchet. Convexity in combinatorial structures. In Proceedings of the 14th Winter School on Abstract Analysis, pages [261]–293. Circolo Matematico di Palermo, 1987. URL http://eudml.org/doc/220736.
  • Dutta and Rubinov [2005] J. Dutta and A. M. Rubinov. Abstract convexity. Handbook of generalized convexity and generalized monotonicity, 76:293–333, 2005.
  • Fan [1963] Ky Fan. On the Krein-Milman theorem. In Convexity: Proceedings of the Seventh Symposium in Pure Mathematics of the American Mathematical Society, pages 211–219. American Mathematical Society, 1963. doi: 10.1090/pspum/007/0154097. URL https://doi.org/10.1090%2Fpspum%2F007%2F0154097.
  • Goodfellow et al. [2016] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
  • Kutateladze and Rubinov [1972] S S Kutateladze and A M Rubinov. Minkowski duality and its applications. Russ. Math. Surv., 27(3):137–191, jun 1972. doi: 10.1070/rm1972v027n03abeh001380. URL https://doi.org/10.1070%2Frm1972v027n03abeh001380.
  • Levi [1951] F. W. Levi. On Helly's theorem and the axioms of convexity. Journal of the Indian Mathematical Society, 15:65–76, 1951.
  • Loeb [1960] Henry L. Loeb. Algorithms for Chebyshev approximations using the ratio of linear forms. Journal of the Society for Industrial and Applied Mathematics, 8(3):458–465, 1960.
  • Meinardus [1967] G. Meinardus. Approximation of Functions: Theory and Numerical Methods. Springer-Verlag, Berlin, 1967.
  • Millán et al. [a] R. Díaz Millán, V. Peiris, N. Sukhorukova, and J. Ugon. Multivariate approximation by polynomial and generalised rational functions. Accepted in Optimization, a. URL https://arxiv.org/abs/2101.11786v1.
  • Millán et al. [b] R. Díaz Millán, N. Sukhorukova, and J. Ugon. An algorithm for best generalised rational approximation of continuous functions. In PRESS, accepted in Set-Valued and Variational Analysis, b. URL https://arxiv.org/abs/2011.02721.
  • Nakatsukasa and Trefethen [2020] Yuji Nakatsukasa and Lloyd N. Trefethen. An algorithm for real and complex rational minimax approximation. SIAM Journal on Scientific Computing, 42(5):A3157–A3179, 2020. doi: 10.1137/19M1281897.
  • Nakatsukasa et al. [2018] Yuji Nakatsukasa, Olivier Sète, and Lloyd N. Trefethen. The AAA algorithm for rational approximation. SIAM Journal on Scientific Computing, 40(3):A1494–A1522, 2018.
  • Pallaschke and Rolewicz [1997] Diethard Pallaschke and Stefan Rolewicz. Foundations of Mathematical Optimization: Convex Analysis without Linearity, volume 388. Springer Science & Business Media, 1997.
  • Peiris et al. [2021] V. Peiris, N. Sharon, N. Sukhorukova, and J. Ugon. Generalised rational approximation and its application to improve deep learning classifiers. Applied Mathematics and Computation, 389:125560, jan 2021. doi: 10.1016/j.amc.2020.125560. URL https://doi.org/10.1016%2Fj.amc.2020.125560.
  • Ralston [1965] Anthony Ralston. Rational Chebyshev approximation by Remes' algorithms. Numer. Math., 7(4):322–330, August 1965. ISSN 0029-599X.
  • Rivlin [1962] T. J. Rivlin. Polynomials of best uniform approximation to certain rational functions. Numerische Mathematik, 4(1):345–349, 1962.
  • Rubinov [2013] A. M. Rubinov. Abstract convexity and global optimization, volume 44. Springer Science & Business Media, 2013.
  • Rubinov and Simsek [1995] A. M. Rubinov and B. Simsek. Conjugate quasiconvex nonnegative functions. Optimization, 35(1):1–22, 1995. doi: 10.1080/02331939508844124.
  • Singer [1997] Ivan Singer. Abstract Convex Analysis. Canadian Mathematical Society Series of Monographs and Advanced Texts. Wiley, New York, 1997. ISBN 0471160156.
  • Soltan [1984] V. V. Soltan. Introduction to the Axiomatic Theory of Convexity. Shtiintsa, Kishinev, 1984. (In Russian).
  • van de Vel [1993] M.L.J. van de Vel. Theory of convex structures. North-Holland, Amsterdam New York, 1993. ISBN 9780080933108.
  • Matsushita and Xu [2016] Shin-ya Matsushita and Li Xu. On the finite termination of the Douglas-Rachford method for the convex feasibility problem. Optimization, 65(11):2037–2047, 2016.
  • Yang and Yang [2013] Yuning Yang and Qingzhi Yang. Some modified relaxed alternating projection methods for solving the two-sets convex feasibility problem. Optimization, 62(4):509–525, 2013.
  • Zaslavski [2013] A. J. Zaslavski. Subgradient projection algorithms and approximate solutions of convex feasibility problems. Journal of Optimization Theory and Applications, 157:803–819, 2013.
  • Zhao and Köbis [2018] Xiaopeng Zhao and Markus Arthur Köbis. On the convergence of general projection methods for solving convex feasibility problems with applications to the inverse problem of image recovery. Optimization, 67(9):1409–1427, 2018.