
Mixed-integer linear programming for computing optimal experimental designs

Abstract

The problem of computing an exact experimental design that is optimal for the least-squares estimation of the parameters of a regression model is considered. We show that this problem can be solved via mixed-integer linear programming (MILP) for a wide class of optimality criteria, including the criteria of A-, I-, G- and MV-optimality. This approach improves upon the current state-of-the-art mathematical programming formulation, which uses mixed-integer second-order cone programming. The key idea underlying the MILP formulation is McCormick relaxation, which critically depends on finite interval bounds for the elements of the covariance matrix of the least-squares estimator corresponding to an optimal exact design. We provide both analytic and algorithmic methods for constructing these bounds. We also demonstrate the unique advantages of the MILP approach, such as the possibility of incorporating multiple design constraints into the optimization problem, including constraints on the variances and covariances of the least-squares estimator.

keywords:
Optimal design, Exact design, Mixed-integer linear programming, A-optimality, G-optimality
journal: Journal of Statistical Planning and Inference
Affiliation: Comenius University Bratislava, Bratislava, Slovakia

1 Introduction

The optimal design of experiments is an important part of theoretical and applied statistics (e.g., Fedorov (1972), Pázman (1986), Pukelsheim (1993), Atkinson et al. (2007), Goos and Jones (2011)) that overlaps with various areas of general optimization (Vandenberghe and Boyd (1999), Ghosh et al. (2008), Todd (2016), and others). From the statistical perspective, the typical problem in optimal experimental design is the selection of trials to maximize the information obtained about the parameters of an underlying model.

In this research, we focus on optimal exact designs, which are technically solutions to a special class of discrete optimization problems; in contrast, approximate designs are continuous relaxations of exact designs. Precise formulations of both design classes are provided in Section 1.2. In the remainder of this paper, we use the term “design” specifically to refer to an exact design.

Analytical forms of optimal designs are difficult to derive; they have only been found for some special cases (e.g., Gaffke (1986), Neubauer et al. (2000), Bailey (2007)). Therefore, much attention is given to numerical computations. There are two general classes of algorithms for solving optimal design problems: heuristics and enumeration methods.

Heuristics aim to rapidly generate a reasonably efficient design but often fail to converge to an optimal one. Some of the most popular heuristics include various exchange methods (see, e.g., Chapter 12 in Atkinson et al. (2007) for a review of classical algorithms or Huang et al. (2019) for more recent developments) and methods inspired by physics or the natural world (Haines (1987), Lin et al. (2015), Chen et al. (2022) and others). Another approach is based on a quadratic approximation of the criterion (Harman and Filová (2014), Filová and Harman (2020)).

A separate approach consists of first computing optimal approximate designs and subsequently rounding them. The advantage of this approach is that the optimal approximate design problem is continuous and typically convex. From the perspective of mathematical programming, this means that this problem can be handled through various powerful methods of convex optimization, such as general convex programming (e.g., Wong and Zhou (2019), Wong and Zhou (2023)), semidefinite programming (e.g., Vandenberghe and Boyd (1999)), second-order cone programming (Sagnol (2011), Sagnol and Harman (2015)) and linear programming (e.g., Harman and Jurík (2008)). The disadvantage is that for practical applications, the resulting optimal approximate designs need to be converted into exact designs via a rounding procedure (e.g., Pukelsheim and Rieder (1992)), which often leads to a suboptimal exact design. That is, the two-step approach—first computing optimal approximate designs and then rounding them to exact designs—can also be considered a heuristic.

Compared to heuristics, enumeration methods are typically slower, but they ultimately find optimal designs with a proof of optimality. Enumeration methods include algorithms for solving integer programs because optimal design problems are, trivially, special instances of a general integer program. As such, very small optimal design problems can be solved through the complete enumeration of all permissible designs (e.g., Haines and Clark (2014)), but a more economical integer or mixed-integer programming method is usually needed, such as branch and bound (e.g., Welch (1982), Duarte et al. (2020), Ahipasaoglu (2021)). For problems that are shown to have a special structure, specialized and, thus, typically better-performing solvers can be employed. In particular, a mixed-integer second-order cone programming (MISOCP) formulation of the optimal design problem for the most popular design criteria presented by Sagnol and Harman (2015) permits the use of efficient branch and cut algorithms.

In this paper, we show that a large class of optimal design problems can be solved by means of an even more specialized mixed-integer linear programming (MILP) approach. More precisely, we show that the classical exact optimal design problem can be expressed as an MILP problem for a rich class of minimax optimality criteria, including the fundamental criteria of $A$-, $I$-, $MV$- and $G$-optimality.

This result is theoretically important, but it also has practical advantages. Most obviously, in contrast to MISOCP solvers, MILP solvers are more readily available. Moreover, our MILP formulation allows one to introduce additional constraints on the designs corresponding to practical requirements: these could be not only linear constraints on the experimental resources, but also limits on the variances and covariances of the least-squares estimator (more details are provided in Section 4). In contrast, heuristics cannot handle such additional design constraints (with some exceptions; see, e.g., Harman et al. (2016)), and even the existing enumeration methods (such as those of Ahipasaoglu (2021) and Sagnol and Harman (2015)) typically do not directly allow such a wide range of constraints (and it would be nontrivial to modify them to do so). Another advantage of the MILP formulation is that it covers criteria such as $MV$-optimality, for which an MISOCP formulation is currently unavailable.

Integer linear programming (ILP) and MILP have already been used in the field of experimental design for various purposes, albeit in a substantially different way than the one proposed in this paper. For instance, ILP has been used for the classification of orthogonal arrays and for the generation of minimum-size orthogonal fractional factorial designs (e.g., Bulutoglu and Margot (2008), Fontana (2013)), and MILP has been used for optimal arrangements of orthogonal designs (e.g., Sartono et al. (2015a), Sartono et al. (2015b), Vo-Thanh et al. (2018)). However, the above applications utilize specific (linear) characteristics of the studied models or of the studied classes of designs, whereas we provide an MILP formulation for arbitrary models, i.e., for problems that are not inherently linear (although we utilize certain indirect linear aspects of the considered optimality criteria, as detailed in Section 1.2).

The key idea underlying our formulation is McCormick relaxation (McCormick (1976)), which critically depends on finite interval bounds for all elements of a certain matrix. In the context of the optimal design of experiments, this matrix is the covariance matrix of the least-squares estimator of the model parameters under the optimal design; for brevity, we call such constraints covariance bounds. It would be trivial to construct covariance bounds for the MILP problem provided that we knew the optimal design. However, the optimal design is the final aim of the MILP computation itself; therefore, it is not known beforehand. The situation is also complicated by the fact that the covariance matrix depends on the experimental design in a nonlinear way. Therefore, the construction of covariance bounds is a major challenge in the application of the MILP approach, and we provide both analytical and numerical constructions of such bounds.

MILP formulations based on McCormick relaxation were recently proposed by Bhela et al. (2019) in the network topology context and, independently, by Okasaki et al. (2023) in the context of $c$-optimal Bayesian sampling designs. Bhela et al. (2019) sought an optimal placement of edges in a graph so as to minimize a given function of the eigenvalues of the (weighted) Laplacian. This problem corresponds to $A$-optimality in a specific statistical model (for more details on the connection between optimal designs and optimizing graph Laplacians, see Cheng (1981) and Bailey and Cameron (2009)). Thus, the problems addressed by Bhela et al. (2019) and Okasaki et al. (2023) can mathematically be viewed as special instances of the optimal design problem. In this paper, we extend their approaches to general problems of experimental design and to a broad class of minimax criteria. Importantly, to this end, we provide the covariance bounds for such general problems.

In the remainder of this section, we introduce the notation used in this paper and discuss the optimal design problem in more detail. In Section 2, we derive the MILP formulation, and in Section 3, we provide covariance bounds that can be used for this formulation. Various extensions are discussed in Section 4. The proposed formulation is applied in multiple experimental settings and compared with the MISOCP approach in Section 5. Finally, Section 6 presents a short discussion.

1.1 Notation

Throughout the paper, we use the following standard mathematical notation: $\mathbf{I}_n$ is the $n\times n$ identity matrix; $\mathbf{0}_{m\times n}$ is the $m\times n$ matrix of zeros; $\mathbf{1}_n$ is the $n\times 1$ all-ones vector; and $\mathbf{0}_n$ is the $n\times 1$ vector of zeros. Indices are suppressed where they are evident. The symbol $\mathbf{e}_i$ denotes the $i$-th elementary unit vector (i.e., its $i$-th element is one, and all other elements are zero); $\mathbb{R}^n$ is the set of all real $n\times 1$ vectors; and $\mathbb{S}^m$, $\mathbb{S}^m_+$ and $\mathbb{S}^m_{++}$ are the sets of all $m\times m$ symmetric, (symmetric) nonnegative definite and (symmetric) positive definite matrices, respectively. For $\mathbf{M}\in\mathbb{S}^m_+$, $\lambda_{\max}(\mathbf{M})$ denotes the maximum eigenvalue, $\mathrm{tr}(\mathbf{M})$ is the trace, $\mathbf{M}^{1/2}$ is the square-root matrix, and if $\mathbf{M}$ is nonsingular, $\mathbf{M}^{-1/2}$ denotes the inverse of $\mathbf{M}^{1/2}$. For any $m\times n$ matrix $\mathbf{A}$, we denote its transpose by $\mathbf{A}'$ and its Moore–Penrose pseudoinverse by $\mathbf{A}^+$; $\mathcal{C}(\mathbf{A})$ is the column space (the range) of $\mathbf{A}$. The notation $(\mathbf{A})_{jk}$ represents the element of $\mathbf{A}$ with coordinates $(j,k)$. The symbol $\|\cdot\|$ denotes the Euclidean norm of a vector or the Frobenius norm of a matrix. The symbol $\mathbb{N}_0$ represents the set of all nonnegative integers, and $\{1{:}n\}$ is an abbreviated form of the set $\{1,\ldots,n\}$.

1.2 Optimal designs

For clarity, we explain the main message of this paper in the context of the standard optimal experimental design on a finite design space $\{1{:}n\}$. That is, suppose that under the $i$-th experimental conditions (“at design point $i$”), the real-valued observation satisfies the regression model $y_i=\mathbf{f}_i'\boldsymbol{\beta}+\varepsilon_i$, where $\boldsymbol{\beta}\in\mathbb{R}^m$ is the vector of unknown parameters and the $\mathbf{f}_i\in\mathbb{R}^m$, $i\in\{1{:}n\}$, are known vectors, sometimes called regressors. An actual experiment then consists of $N$ trials performed at $N$ design points selected from $\{1{:}n\}$. We assume that $m\geq 2$, $E(\varepsilon_i)=0$, $D(\varepsilon_i)=1$ and that the observations from distinct trials are uncorrelated. The assumption $D(\varepsilon_i)=1$ can be made without loss of generality and is therefore commonly adopted in the optimal design of experiments. In Section 4, we show that our results can be easily extended to more general settings, such as nonlinear models, continuous design spaces and diverse design constraints. We note, however, that they do not extend to the case of correlated observations because for such problems, the information matrix does not have a desirable (additive) form.

We also initially restrict ourselves to binary (i.e., replication-free) exact designs, as the corresponding optimal design problems naturally lend themselves to our MILP reformulation. Optimal binary designs are of interest by themselves (see, e.g., Rasch et al. (1997)); however, we show in Section 4 that our results also extend to exact designs with replications. A binary design of size $N\leq n$ on $\{1{:}n\}$ is a selection of $N$ distinct elements of $\{1{:}n\}$. This can be formalized by representing a binary design as a vector $\mathbf{d}=(d_1,\ldots,d_n)'\in\{0,1\}^n$ such that $\mathbf{1}_n'\mathbf{d}=N$, where $d_i$ denotes the number of trials at design point $i$. The set of all such designs is denoted by $\mathcal{D}^{(n)}_N$. In contrast, a general design $\mathbf{d}$ satisfies $\mathbf{d}\in\{0,1,\ldots,N\}^n$ such that $\mathbf{1}_n'\mathbf{d}=N$. As we focus on binary designs in our presentation, by a “design” we mean a binary design.

The amount of information provided by a design $\mathbf{d}$ is measured by its information matrix $\mathbf{M}(\mathbf{d})=\sum_{i=1}^n d_i\mathbf{f}_i\mathbf{f}_i'$, which, if nonsingular, is the inverse of the covariance matrix $\boldsymbol{\Sigma}$ of the least-squares estimator $\widehat{\boldsymbol{\beta}}$ of $\boldsymbol{\beta}$. Moreover, if the errors are normal, $\mathbf{M}(\mathbf{d})$ is exactly the Fisher information matrix for $\boldsymbol{\beta}$. We also suppose that $m\leq N$ and that the linear span of $\{\mathbf{f}_i: i\in\{1{:}n\}\}$ is the full $\mathbb{R}^m$, which guarantees that there exists at least one $\mathbf{d}\in\mathcal{D}^{(n)}_N$ with a nonsingular information matrix.

In some of our results, approximate designs will also play a minor role. Instead of integer numbers of trials, such designs represent the proportions of trials at individual design points for $N\to\infty$. An approximate design for a model $y_i=\mathbf{f}_i'\boldsymbol{\beta}+\varepsilon_i$, $i\in\{1{:}n\}$, can be formally represented by a vector $\mathbf{w}\in\mathbb{R}^n$ such that $w_i\geq 0$ for all $i$ and $\sum_i w_i=1$. The information matrix of $\mathbf{w}$ is then $\mathbf{M}(\mathbf{w})=\sum_i w_i\mathbf{f}_i\mathbf{f}_i'$. We will always use the qualifier “approximate” when referring to an approximate design.

The “loss” associated with an information matrix is measured by a so-called optimality criterion $\Phi$, selected in accordance with the aim of the experiment; in our case, the optimality criterion is formally a function mapping from $\mathbb{S}^m_{++}$ to $\mathbb{R}$. A $\Phi$-optimal design $\mathbf{d}^*$ then minimizes $\Phi(\mathbf{M}(\mathbf{d}))$ over all designs $\mathbf{d}$ of size $N$ with a nonsingular information matrix $\mathbf{M}(\mathbf{d})$. Examples include $D$-optimality, for which $\Phi_D(\mathbf{M})=\ln(\det(\mathbf{M}^{-1}))$; $A$-optimality, for which $\Phi_A(\mathbf{M})=\mathrm{tr}(\mathbf{M}^{-1})$; $I$-optimality (sometimes also called $V$- or $IV$-optimality because it can be interpreted as an integrated prediction variance), for which $\Phi_I(\mathbf{M})=\sum_{i=1}^n\mathbf{f}_i'\mathbf{M}^{-1}\mathbf{f}_i$; $G$-optimality, for which $\Phi_G(\mathbf{M})=\max_{i=1,\ldots,n}\mathbf{f}_i'\mathbf{M}^{-1}\mathbf{f}_i$; $MV$-optimality, for which $\Phi_{MV}(\mathbf{M})=\max_{i=1,\ldots,m}\mathbf{e}_i'\mathbf{M}^{-1}\mathbf{e}_i$; $E$-optimality, for which $\Phi_E(\mathbf{M})=\lambda_{\max}(\mathbf{M}^{-1})$; and $c$-optimality, for which $\Phi_c(\mathbf{M})=\mathbf{c}'\mathbf{M}^{-1}\mathbf{c}$ for a selected $\mathbf{c}\in\mathbb{R}^m$. For statistical interpretations of these criteria, see, e.g., Chapter 10 in Atkinson et al. (2007) and Chapter 5 in Pronzato and Pázman (2013).

In this paper, we focus on a broad class of criteria, which we now define. For $\ell\in\{1{:}K\}$, let $\mathbf{B}_\ell$ be a given $m\times s_\ell$ matrix. We use $\mathfrak{B}$ to denote the sequence $\mathbf{B}_1,\ldots,\mathbf{B}_K$ and $\mathbf{B}$ to denote the $m\times\sum_\ell s_\ell$ matrix $(\mathbf{B}_1,\ldots,\mathbf{B}_K)$. We assume that $\mathcal{C}(\mathbf{B})=\mathbb{R}^m$ and that no column of $\mathbf{B}$ is equal to $\mathbf{0}_m$. We consider the following criterion:

$$\Phi_{\mathfrak{B}}(\mathbf{M})=\max_{\ell=1,\ldots,K}\mathrm{tr}(\mathbf{B}_\ell'\mathbf{M}^{-1}\mathbf{B}_\ell),$$

which belongs to the family of minimax criteria discussed by Wong (1992). The expression $\mathbf{B}_\ell'\mathbf{M}^{-1}\mathbf{B}_\ell$ represents the variance matrix of the least-squares estimator of $\mathbf{B}_\ell'\boldsymbol{\beta}$, which implies that $\Phi_{\mathfrak{B}}$ seeks to minimize the maximum of the $A$-optimality criterion values over all linear parameter systems $\mathbf{B}_\ell'\boldsymbol{\beta}$, $\ell\in\{1{:}K\}$. Clearly, all the $\mathbf{B}_\ell'\boldsymbol{\beta}$s are estimable under a design $\mathbf{d}$ if and only if $\mathbf{M}(\mathbf{d})$ is nonsingular. As special cases of $\Phi_{\mathfrak{B}}$, we can obtain the criteria of $A$-optimality ($K=1$, $\mathbf{B}_1=\mathbf{I}_m$), $I$-optimality ($K=1$, $\mathbf{B}_1=(\mathbf{f}_1,\ldots,\mathbf{f}_n)$), $MV$-optimality ($K=m$, $\mathbf{B}_\ell=\mathbf{e}_\ell$ for $\ell\in\{1{:}m\}$), and $G$-optimality ($K=n$, $\mathbf{B}_\ell=\mathbf{f}_\ell$ for $\ell\in\{1{:}n\}$). (The $I$-optimal design can also be found by computing the $A$-optimal design for a transformed model; see, e.g., Appendix A.3 in Sagnol and Harman (2015). In particular, the $A$-optimal design for a model with regressors $\tilde{\mathbf{f}}_i=\mathbf{G}^{-1/2}\mathbf{f}_i$, $i\in\{1{:}n\}$, where $\mathbf{G}=\sum_{i=1}^n\mathbf{f}_i\mathbf{f}_i'$, is $I$-optimal for the original model. However, in this paper, we consider $I$-optimality as a standalone criterion. Note that so-called weighted $I$-optimality ($\Phi(\mathbf{M})=\sum_{i=1}^n a_i\mathbf{f}_i'\mathbf{M}^{-1}\mathbf{f}_i$ for given positive weights $a_i$) also belongs to the $\Phi_{\mathfrak{B}}$ class and can be addressed in a similar manner.)
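For concreteness, the criterion class $\Phi_{\mathfrak{B}}$ is easy to evaluate once the matrices $\mathbf{B}_\ell$ are chosen. The following small R sketch (our own illustration; the names Phi_B, Blist, M and F are ours) computes $\Phi_{\mathfrak{B}}(\mathbf{M})$ and shows how the special cases arise:

```r
# Phi_B(M) = max_l tr(B_l' M^{-1} B_l) for a list Blist of matrices B_l
Phi_B <- function(M, Blist) {
  Minv <- solve(M)
  max(sapply(Blist, function(B) sum(diag(t(B) %*% Minv %*% B))))
}
# Special cases (m parameters, F = (f_1, ..., f_n) an m x n matrix):
#   A-optimality:  Phi_B(M, list(diag(m)))
#   I-optimality:  Phi_B(M, list(F))
#   MV-optimality: Phi_B(M, lapply(1:m, function(l) diag(m)[, l, drop = FALSE]))
#   G-optimality:  Phi_B(M, lapply(1:n, function(l) F[, l, drop = FALSE]))
```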

$\Phi_{\mathfrak{B}}$ criteria are crucial for our MILP formulation, as they involve terms that are linear in the covariance matrix $\boldsymbol{\Sigma}=\mathbf{M}^{-1}(\mathbf{d})$ of the least-squares estimator $\widehat{\boldsymbol{\beta}}$. We explicitly reformulate these criteria in terms of the covariance matrix as $\Psi_{\mathfrak{B}}(\boldsymbol{\Sigma})=\Phi_{\mathfrak{B}}(\boldsymbol{\Sigma}^{-1})=\max_{\ell=1,\ldots,K}\mathrm{tr}(\mathbf{B}_\ell'\boldsymbol{\Sigma}\mathbf{B}_\ell)$ for $\boldsymbol{\Sigma}\in\mathbb{S}^m_{++}$, which can be extended to the entire linear space $\mathbb{S}^m$ by simply setting $\Psi_{\mathfrak{B}}(\boldsymbol{\Sigma})=\max_{\ell=1,\ldots,K}\mathrm{tr}(\mathbf{B}_\ell'\boldsymbol{\Sigma}\mathbf{B}_\ell)$ for any $\boldsymbol{\Sigma}\in\mathbb{S}^m$. This reformulation allows for an optimal design problem formulation that is linear in the covariance matrix $\boldsymbol{\Sigma}$. However, this is not sufficient to make the problem linear: it merely moves the nonlinearity from the objective function to the constraints of the optimization problem because of the nonlinear relationship $\boldsymbol{\Sigma}=\mathbf{M}^{-1}(\mathbf{d})$. Nevertheless, by applying McCormick relaxation, we can address the newly introduced nonlinearity in the constraints, thus ultimately arriving at a linear program, as detailed in Section 2. This also explains why our approach cannot be extended to $D$-optimality: the problem of $D$-optimal design apparently cannot be recast to involve only the aforementioned specific type of nonlinearity.

In the remainder of this paper, we denote the elements of a covariance matrix $\boldsymbol{\Sigma}$ by $c_{jk}$ and the elements of a $\Phi_{\mathfrak{B}}$-optimal covariance matrix $\boldsymbol{\Sigma}^*$ by $c^*_{jk}$.

2 MILP formulation

The problem of $\Phi_{\mathfrak{B}}$-optimal design described in the previous section can be formulated as

$$\begin{aligned}
\min_{\mathbf{M},\mathbf{d}}\quad & \max_{\ell=1,\ldots,K}\mathrm{tr}(\mathbf{B}_\ell'\mathbf{M}^{-1}\mathbf{B}_\ell) \\
\text{s.t.}\quad & \mathbf{M}=\sum_{i=1}^n d_i\mathbf{f}_i\mathbf{f}_i', \\
& \mathbf{M}\in\mathbb{S}^m_{++},\ \mathbf{d}\in\mathcal{D}_N^{(n)}.
\end{aligned}$$

Note that we assume that the set of feasible solutions is nonempty. Clearly, this problem can be rewritten in the following form:

$$\begin{aligned}
\min_{\mathbf{M},\boldsymbol{\Sigma},\mathbf{d}}\quad & \max_{\ell=1,\ldots,K}\mathrm{tr}(\mathbf{B}_\ell'\boldsymbol{\Sigma}\mathbf{B}_\ell) \\
\text{s.t.}\quad & \mathbf{M}=\sum_{i=1}^n d_i\mathbf{f}_i\mathbf{f}_i',\ \mathbf{M}\boldsymbol{\Sigma}=\mathbf{I}_m, \\
& \mathbf{M},\boldsymbol{\Sigma}\in\mathbb{S}^m,\ \mathbf{d}\in\mathcal{D}_N^{(n)}.
\end{aligned} \qquad (1)$$

Since the objective function in (1) is the maximum of linear functions of $\boldsymbol{\Sigma}$, the only nonlinear part of problem (1) is $\mathbf{M}\boldsymbol{\Sigma}=\mathbf{I}_m$. However, this condition can be reformulated using the relaxation method proposed by McCormick (1976), analogously to Bhela et al. (2019). The trick, and the key difficulty, lies in finding a priori bounds $L_{jk}$ and $U_{jk}$ for each element $c^*_{jk}$ of any optimal $\boldsymbol{\Sigma}^*=\mathbf{M}_*^{-1}$:

$$L_{jk}\leq c_{jk}^*\leq U_{jk},\quad j,k\in\{1{:}m\}, \qquad (2)$$

and then constructing new variables

$$z_{ijk}=d_i c_{jk},\quad i\in\{1{:}n\};\ j,k\in\{1{:}m\}. \qquad (3)$$

The condition $\mathbf{M}\boldsymbol{\Sigma}=\mathbf{I}_m$ can then be expressed as a set of linear equations for the $z_{ijk}$s by applying the substitution $\mathbf{M}=\sum_i d_i\mathbf{f}_i\mathbf{f}_i'$. This condition is linear in the $nm^2$-dimensional real vector $\mathbf{z}$ composed of all variables $z_{ijk}$, and we can express it as $\mathbf{A}\mathbf{z}=\mathbf{b}$ for an appropriately chosen $\mathbf{A}$ and $\mathbf{b}$. (There does not seem to be a simple formula for $\mathbf{A}$ and $\mathbf{b}$ as a function of the elements of $\mathbf{f}_1,\ldots,\mathbf{f}_n$; see Appendix A for the detailed formulation of $\mathbf{A}\mathbf{z}=\mathbf{b}$.)

However, in this process, the nonlinear conditions $z_{ijk}=d_i c_{jk}$ are introduced. Note that without altering the set of optimal solutions, we can add the following inequalities:

$$\begin{aligned}
d_i(c_{jk}-L_{jk}) &\geq 0, \\
(d_i-1)(c_{jk}-U_{jk}) &\geq 0, \\
d_i(c_{jk}-U_{jk}) &\leq 0, \\
(d_i-1)(c_{jk}-L_{jk}) &\leq 0,
\end{aligned}$$

which can be expressed using the variables $z_{ijk}$ as

$$\begin{aligned}
z_{ijk}-d_i L_{jk} &\geq 0, & (4) \\
z_{ijk}-d_i U_{jk}-c_{jk}+U_{jk} &\geq 0, & (5) \\
z_{ijk}-d_i U_{jk} &\leq 0, & (6) \\
z_{ijk}-d_i L_{jk}-c_{jk}+L_{jk} &\leq 0 & (7)
\end{aligned}$$

for all $i\in\{1{:}n\}$ and $j,k\in\{1{:}m\}$. These inequalities hold for any optimal $\boldsymbol{\Sigma}^*$ and $\mathbf{d}^*$ because $0\leq d_i^*\leq 1$ and $L_{jk}\leq c_{jk}^*\leq U_{jk}$; thus, they can be added to (1). Interestingly, by adding the linear conditions expressed in (4)-(7), one can actually replace the nonlinear conditions (3) in (1) (as observed by Bhela et al. (2019) in the network topology context): because of the binary nature of $\mathbf{d}$, conditions (4)-(7) imply (3). In particular, if $d_i=0$, then (4) and (6) imply that $z_{ijk}=0$, and if $d_i=1$, then (5) and (7) imply that $z_{ijk}=c_{jk}$. In each case, $z_{ijk}=d_i c_{jk}$, and thus, (3) is unnecessary.
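The case analysis in the preceding paragraph can also be checked numerically. The following R fragment (a small sketch of ours) tests whether a point $(z,d,c)$ satisfies (4)-(7) for a single triple $(i,j,k)$:

```r
# Feasibility of the McCormick system (4)-(7) for one (i, j, k),
# with bounds L <= c <= U and binary d
mccormick_ok <- function(z, d, c, L, U) {
  (z - d*L >= 0) & (z - d*U - c + U >= 0) &   # (4), (5)
  (z - d*U <= 0) & (z - d*L - c + L <= 0)     # (6), (7)
}
c <- 0.3; L <- -1; U <- 1
mccormick_ok(z = 0,   d = 0, c, L, U)  # TRUE:  d = 0 forces z = 0
mccormick_ok(z = 0.2, d = 0, c, L, U)  # FALSE: z != d*c is cut off
mccormick_ok(z = c,   d = 1, c, L, U)  # TRUE:  d = 1 forces z = c
```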

Therefore, we obtain a formulation equivalent to (1), in the sense of having the same set of optimal solutions $\mathbf{d}^*$:

$$\begin{aligned}
\min_{\mathbf{z},\boldsymbol{\Sigma},\mathbf{d}}\quad & \max_{\ell=1,\ldots,K}\mathrm{tr}(\mathbf{B}_\ell'\boldsymbol{\Sigma}\mathbf{B}_\ell) \\
\text{s.t.}\quad & \mathbf{A}\mathbf{z}=\mathbf{b},\ \text{(4)-(7)}, \\
& \mathbf{z}\in\mathbb{R}^{nm^2},\ \boldsymbol{\Sigma}\in\mathbb{S}^m,\ \mathbf{d}\in\mathcal{D}_N^{(n)},
\end{aligned} \qquad (8)$$

where all the constraints are linear in the variables $(\mathbf{z},\boldsymbol{\Sigma},\mathbf{d})$. This is then equivalent to

$$\begin{aligned}
\min_{\varphi,\mathbf{z},\boldsymbol{\Sigma},\mathbf{d}}\quad & \varphi \\
\text{s.t.}\quad & \varphi\geq\mathrm{tr}(\mathbf{B}_\ell'\boldsymbol{\Sigma}\mathbf{B}_\ell),\ \ell\in\{1{:}K\}, \\
& \mathbf{A}\mathbf{z}=\mathbf{b},\ \text{(4)-(7)}, \\
& \varphi\in\mathbb{R},\ \mathbf{z}\in\mathbb{R}^{nm^2},\ \boldsymbol{\Sigma}\in\mathbb{S}^m,\ \mathbf{d}\in\mathcal{D}_N^{(n)},
\end{aligned} \qquad (9)$$

where even the objective function is linear. Since all the constraints in (9) are linear or binary (recall that $\mathcal{D}_N^{(n)}$ is a set of binary vectors $\mathbf{d}$ subject to the linear constraint $\mathbf{1}_n'\mathbf{d}=N$), problem (9) is a mixed-integer linear program. Therefore, we have expressed the $\Phi_{\mathfrak{B}}$-optimal binary design problem in the form of an MILP problem. As special cases, we can obtain the MILP formulations for $A$-, $I$-, $MV$- and $G$-optimal binary design by choosing the corresponding optimality criteria $\Phi_{\mathfrak{B}}$. In Appendix A, we provide a precise formulation of (9) in vector form, which can therefore be used as input for MILP solvers. Nonetheless, to use the reformulation (9), we still need to determine the coefficients $L_{jk}$ and $U_{jk}$, which appear in constraints (4)-(7).
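To make the assembly of (9) concrete, we give a minimal R sketch below. This is our own illustration rather than the authors' published code: the variable ordering, the index helpers and the dense row-by-row construction of the constraint matrix are simplifying choices that are practical only for small $n$ and $m$. It assumes the R interface of the Gurobi solver; F is the $m\times n$ matrix of regressors, Blist is the list of matrices $\mathbf{B}_\ell$, and L, U are $m\times m$ matrices of covariance bounds.

```r
milp_design <- function(F, N, L, U, Blist) {
  m <- nrow(F); n <- ncol(F); K <- length(Blist)
  nv <- 1 + m^2 + n*m^2 + n                      # variables: phi, Sigma, z, d
  ic <- function(j, k) 1 + (k - 1)*m + j                         # index of c_jk
  iz <- function(i, j, k) 1 + m^2 + (i - 1)*m^2 + (k - 1)*m + j  # index of z_ijk
  id <- function(i) 1 + m^2 + n*m^2 + i                          # index of d_i
  A <- NULL; rhs <- c(); sense <- c()
  row <- function(v, s, r) { A <<- rbind(A, v); sense <<- c(sense, s); rhs <<- c(rhs, r) }
  # (a) M Sigma = I in the z variables: sum_{i,j} F[p,i] F[j,i] z_{ijq} = delta_pq
  for (p in 1:m) for (q in 1:m) {
    v <- numeric(nv)
    for (i in 1:n) for (j in 1:m) v[iz(i, j, q)] <- v[iz(i, j, q)] + F[p, i]*F[j, i]
    row(v, '=', as.numeric(p == q))
  }
  # (b) symmetry of Sigma
  for (j in 1:(m - 1)) for (k in (j + 1):m) {
    v <- numeric(nv); v[ic(j, k)] <- 1; v[ic(k, j)] <- -1; row(v, '=', 0)
  }
  # (c) McCormick inequalities (4)-(7)
  for (i in 1:n) for (j in 1:m) for (k in 1:m) {
    v <- numeric(nv); v[iz(i, j, k)] <- 1; v[id(i)] <- -L[j, k]; row(v, '>', 0)
    v <- numeric(nv); v[iz(i, j, k)] <- 1; v[id(i)] <- -U[j, k]; v[ic(j, k)] <- -1
    row(v, '>', -U[j, k])
    v <- numeric(nv); v[iz(i, j, k)] <- 1; v[id(i)] <- -U[j, k]; row(v, '<', 0)
    v <- numeric(nv); v[iz(i, j, k)] <- 1; v[id(i)] <- -L[j, k]; v[ic(j, k)] <- -1
    row(v, '<', -L[j, k])
  }
  # (d) phi >= tr(B_l' Sigma B_l) = sum_{j,k} (B_l B_l')_{jk} c_{jk}
  for (l in 1:K) {
    H <- Blist[[l]] %*% t(Blist[[l]]); v <- numeric(nv); v[1] <- 1
    for (j in 1:m) for (k in 1:m) v[ic(j, k)] <- -H[j, k]
    row(v, '>', 0)
  }
  # (e) exactly N trials
  v <- numeric(nv); v[id(1:n)] <- 1; row(v, '=', N)
  model <- list(A = A, rhs = rhs, sense = sense, modelsense = 'min',
                obj = c(1, numeric(nv - 1)),
                lb = c(rep(-Inf, 1 + m^2 + n*m^2), rep(0, n)),
                vtype = c(rep('C', 1 + m^2 + n*m^2), rep('B', n)))
  res <- gurobi::gurobi(model)
  round(res$x[id(1:n)])   # the optimal binary design d*
}
```

A sparse-matrix construction (e.g., via the Matrix package) would be needed for problems of realistic size, but the logic is the same.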

3 Covariance bounds

The MILP formulation (9) requires interval bounds (2) on the elements of $\boldsymbol{\Sigma}^*$, which we call covariance bounds. The construction of such bounds is a rich problem, interesting not only for the computation of optimal designs but also in and of itself as a potentially useful characteristic of an experimental design problem. In this section, we describe selected strategies for constructing such interval bounds.

3.1 General $\Phi_{\mathfrak{B}}$-optimality

Our construction of the covariance bounds relies on knowledge of a design that has a nonsingular information matrix. Therefore, let $\mathbf{d}_0\in\mathcal{D}_N^{(n)}$ have a nonsingular information matrix $\mathbf{M}_0:=\mathbf{M}(\mathbf{d}_0)\in\mathbb{S}^m_{++}$, and let $\boldsymbol{\Sigma}_0=\mathbf{M}_0^{-1}$. The more $\Phi_{\mathfrak{B}}$-efficient $\mathbf{d}_0$ is, the stronger are the bounds we obtain; thus, in applications, we recommend computing $\mathbf{d}_0$ via a $\Phi_{\mathfrak{B}}$-optimization heuristic (e.g., some exchange algorithm; see Chapter 12 in Atkinson et al. (2007)).

Let $\alpha:=\Phi_{\mathfrak{B}}(\mathbf{M}_0)$. Clearly, $\Psi_{\mathfrak{B}}(\boldsymbol{\Sigma}^*)\leq\Psi_{\mathfrak{B}}(\boldsymbol{\Sigma}_0)=\Phi_{\mathfrak{B}}(\mathbf{M}_0)$, that is,

$$\mathrm{tr}(\mathbf{B}_\ell'\boldsymbol{\Sigma}^*\mathbf{B}_\ell)\leq\alpha,\quad\ell\in\{1{:}K\}. \qquad (10)$$

We will show that the constraints (10) are sufficient to provide finite bounds on all the elements of $\boldsymbol{\Sigma}^*$.

3.1.1 Computational construction

Recall the notation $c^*_{jk}:=(\boldsymbol{\Sigma}^*)_{jk}$ for $j,k\in\{1{:}m\}$. A direct approach to finding the covariance bounds is a computational one: apply mathematical programming to find the smallest/largest value of $(\boldsymbol{\Sigma})_{jk}$ over all matrices $\boldsymbol{\Sigma}\in\mathbb{S}^m_+$ that satisfy (10). Then, $c^*_{jk}$ must be bounded by these values. Formally, for any $j,k\in\{1{:}m\}$,

$$c^*_{jk} \;\leq\; \max_{\boldsymbol{\Sigma}}\ \mathbf{e}_j'\boldsymbol{\Sigma}\mathbf{e}_k \quad \text{s.t.}\ \mathrm{tr}(\mathbf{B}_\ell'\boldsymbol{\Sigma}\mathbf{B}_\ell)\leq\alpha,\ \ell\in\{1{:}K\},\ \boldsymbol{\Sigma}\in\mathbb{S}^m_+; \qquad (11)$$

$$c^*_{jk} \;\geq\; \min_{\boldsymbol{\Sigma}}\ \mathbf{e}_j'\boldsymbol{\Sigma}\mathbf{e}_k \quad \text{s.t.}\ \mathrm{tr}(\mathbf{B}_\ell'\boldsymbol{\Sigma}\mathbf{B}_\ell)\leq\alpha,\ \ell\in\{1{:}K\},\ \boldsymbol{\Sigma}\in\mathbb{S}^m_+. \qquad (12)$$

Optimization problems (11) and (12) are (continuous, not discrete) semidefinite programming (SDP) problems; that is, they are easy to solve using readily available and efficient SDP solvers. Moreover, the above bounds are, by definition, the strongest ones that can be constructed using only the inequalities (10).
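For illustration, one such bound can be obtained with a few lines of CVXR code (a sketch of ours, assuming the matrices $\mathbf{B}_\ell$ are stored in a list Blist and $\alpha$ has been computed from some design $\mathbf{d}_0$):

```r
library(CVXR)
# Upper bound (11) (upper = TRUE) or lower bound (12) (upper = FALSE) on c*_jk
sdp_bound <- function(j, k, m, Blist, alpha, upper = TRUE) {
  S <- Variable(m, m, PSD = TRUE)              # Sigma ranges over S^m_+
  cons <- lapply(Blist, function(B)
    matrix_trace(t(B) %*% S %*% B) <= alpha)   # the constraints (10)
  obj <- if (upper) Maximize(S[j, k]) else Minimize(S[j, k])
  solve(Problem(obj, cons))$value
}
# e.g., U[j, k] <- sdp_bound(j, k, m, Blist, alpha, upper = TRUE)
```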

In the case of the lower diagonal bounds, problem (12) can be analytically solved to arrive at the simple bound $c_{jj}^*\geq 0$. This is because $\mathbf{e}_j'\boldsymbol{\Sigma}\mathbf{e}_j\geq 0$ for any $\boldsymbol{\Sigma}\in\mathbb{S}^m_+$, and the objective function value $\mathbf{e}_j'\boldsymbol{\Sigma}\mathbf{e}_j=0$ is attained for the feasible solution $\boldsymbol{\Sigma}=\mathbf{0}_{m\times m}$. On the other hand, the upper bounds on the diagonal elements of $\boldsymbol{\Sigma}^*$ as well as the lower and upper bounds on the nondiagonal elements of $\boldsymbol{\Sigma}^*$ generally depend on the optimality criterion and the value of $\alpha$.

To find bounds for the entire $\boldsymbol{\Sigma}^*$, a distinct pair of problems (11) and (12) must be solved for each $c_{jk}^*$. This therefore requires solving $m(m+1)$ semidefinite programs (more precisely, $m(m+1)-m=m^2$ programs, because the $m$ lower diagonal bounds have analytical solutions), which in some cases may be inconvenient. In addition, the finiteness of the optimal values of (11) and (12) is not immediately apparent; however, these values must be finite to be of any utility. We therefore also provide analytical bounds on the $c^*_{jk}$s that guarantee the finiteness of (11) and (12) (see Theorem 3) and do not require the numerical solution of auxiliary optimization problems.

3.1.2 Analytical construction

First, we provide two simple constructions of the bounds on the nondiagonal covariances $c^*_{jk}$ based on the bounds on the variances $c_{jj}^*$. Because $\boldsymbol{\Sigma}^*$ is positive semidefinite, Sylvester's criterion for positive semidefinite matrices implies that $c_{jj}^*\geq 0$ and $c_{jj}^*c_{kk}^*-(c_{jk}^*)^2\geq 0$, i.e.,

$$|c_{jk}^*|\leq\sqrt{c_{jj}^*c_{kk}^*},\quad j,k\in\{1{:}m\}. \qquad (13)$$

Furthermore, the inequality of the geometric and arithmetic means provides weaker but linear bounds, which can also be useful:

$$|c_{jk}^*|\leq\frac{1}{2}(c_{jj}^*+c_{kk}^*),\quad j,k\in\{1{:}m\}. \qquad (14)$$

Therefore, to find both lower and upper bounds on the $c^*_{jk}$s for all $j,k\in\{1{:}m\}$, we need only to construct finite upper bounds on the variances $c^*_{jj}$ for all $j\in\{1{:}m\}$ or on the sums $c_{jj}^*+c_{kk}^*$ for all $j,k\in\{1{:}m\}$.

The key mathematical result for the analytical bounds is the following lemma. The proof of Lemma 1 and all other nontrivial proofs are deferred to Appendix B.

Lemma 1

Let $\boldsymbol{\Sigma}$ be any $m\times m$ nonnegative definite matrix such that $\mathrm{tr}(\mathbf{B}_\ell'\boldsymbol{\Sigma}\mathbf{B}_\ell)\leq\alpha$ for all $\ell\in\{1{:}K\}$. Let $\mathbf{w}=(w_1,\ldots,w_K)'\in\mathbb{R}^K$ be a vector with nonnegative components summing to $1$, and let $\mathbf{N}(\mathbf{w})=\sum_{\ell=1}^K w_\ell\mathbf{B}_\ell\mathbf{B}_\ell'$. Assume that $\mathbf{X}$ is an $m\times r$ matrix such that $\mathcal{C}(\mathbf{X})\subseteq\mathcal{C}(\mathbf{N}(\mathbf{w}))$. Then,

$$\mathrm{tr}(\mathbf{X}'\boldsymbol{\Sigma}\mathbf{X})\leq\alpha\,\lambda_{\max}(\mathbf{X}'\mathbf{N}^+(\mathbf{w})\mathbf{X}). \qquad (15)$$

Because $\mathrm{tr}(\mathbf{B}_\ell'\boldsymbol{\Sigma}^*\mathbf{B}_\ell)\leq\alpha$ for all $\ell\in\{1{:}K\}$, we can use Lemma 1 with various choices for $\mathbf{X}$ and $\mathbf{w}$ to obtain bounds on the elements of $\boldsymbol{\Sigma}^*$. Let $j,k\in\{1{:}m\}$, $j\neq k$. Among the large variety of possibilities, we will use only the matrices $\mathbf{X}=\mathbf{e}_j$, which directly provide bounds on $c_{jj}^*$, and $\mathbf{X}=\mathbf{E}_{jk}:=(\mathbf{e}_j,\mathbf{e}_k)$, which provide bounds on $c_{jk}^*$ due to inequality (14). For such $\mathbf{X}$, any $\mathbf{w}\in\mathbb{R}^K$ that satisfies $\sum_\ell w_\ell=1$, $\mathbf{w}\geq\mathbf{0}$, and $\mathcal{C}(\mathbf{X})\subseteq\mathcal{C}(\mathbf{N}(\mathbf{w}))$ can be used.

It is then of interest to examine choices for $\mathbf{w}$. Interestingly, such a vector $\mathbf{w}$ can be viewed as an approximate design for the artificial model $\mathbf{y}_\ell=\mathbf{B}_\ell'\boldsymbol{\theta}+\boldsymbol{\varepsilon}_\ell$, with $s_\ell$-dimensional observations $\mathbf{y}_\ell$, design space $\{1{:}K\}$ and elementary information matrices $\mathbf{B}_1\mathbf{B}_1',\ldots,\mathbf{B}_K\mathbf{B}_K'$. The corresponding information matrix of $\mathbf{w}$ is then $\sum_\ell w_\ell\mathbf{B}_\ell\mathbf{B}_\ell'$, which is exactly $\mathbf{N}(\mathbf{w})$. Therefore, we refer to such vectors $\mathbf{w}$ as (approximate) designs. In particular, we consider the following choices:

1. The uniform design $\mathbf{w}^{(u)}=\mathbf{1}_K/K$; note that $\mathbf{N}(\mathbf{w}^{(u)})=\frac{1}{K}\mathbf{B}\mathbf{B}'$.

2. The design $\mathbf{w}^{(j+)}$ with components $w_\ell^{(j+)}=\|\mathbf{h}^{(j+)}_\ell\|/S_j$ for all $\ell\in\{1{:}K\}$, where $\mathbf{h}^{(j+)}_1,\ldots,\mathbf{h}^{(j+)}_K$ are vectors of dimensions $s_1,\ldots,s_K$ such that $((\mathbf{h}^{(j+)}_1)',\ldots,(\mathbf{h}^{(j+)}_K)')'=\mathbf{B}^+\mathbf{e}_j$ and $S_j=\sum_\ell\|\mathbf{h}^{(j+)}_\ell\|$. We call $\mathbf{w}^{(j+)}$ the Moore–Penrose approximate design for $\mathbf{e}_j$ for the artificial model.

3. The design $\mathbf{w}^{(j*)}$ that minimizes $\mathbf{e}_j'\mathbf{N}^+(\mathbf{w})\mathbf{e}_j$ in the class of all approximate designs $\mathbf{w}$ (i.e., $\mathbf{w}\geq\mathbf{0}_K$, $\mathbf{1}_K'\mathbf{w}=1$) satisfying $\mathbf{e}_j\in\mathcal{C}(\mathbf{N}(\mathbf{w}))$. We call $\mathbf{w}^{(j*)}$ the $\mathbf{e}_j$-optimal approximate design for the artificial model.

4. The combined designs $\mathbf{w}^{(jk+)}:=\frac{1}{2}(\mathbf{w}^{(j+)}+\mathbf{w}^{(k+)})$ and $\mathbf{w}^{(jk*)}:=\frac{1}{2}(\mathbf{w}^{(j*)}+\mathbf{w}^{(k*)})$.

The uniform design is the most straightforward choice. A more refined choice is the Moore–Penrose approximate design, which accounts for the desirability of having a “small” matrix $\mathbf{N}^+(\mathbf{w})$. The $\mathbf{e}_j$-optimal approximate design actually optimizes the right-hand side of (15) for $\mathbf{X}=\mathbf{e}_j$ to make it as small as possible; such designs are theoretically interesting but not necessarily practically desirable. Finally, the designs $\mathbf{w}^{(jk+)}$ and $\mathbf{w}^{(jk*)}$ are constructed such that $\mathbf{X}=\mathbf{E}_{jk}$ satisfies $\mathcal{C}(\mathbf{X})\subseteq\mathcal{C}(\mathbf{N}(\mathbf{w}))$. Note that although we use approximate designs as an auxiliary tool for computing covariance bounds, optimal approximate designs are needed only for the computation of $\mathbf{w}^{(j*)}$ and $\mathbf{w}^{(jk*)}$.
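The designs $\mathbf{w}^{(u)}$ and $\mathbf{w}^{(j+)}$ require only elementary matrix computations; a short R sketch (ours, using MASS::ginv for the Moore–Penrose pseudoinverse) is:

```r
w_uniform <- function(K) rep(1/K, K)                 # w^(u)
w_mp <- function(j, Blist) {                         # w^(j+)
  B <- do.call(cbind, Blist)                         # B = (B_1, ..., B_K)
  h <- MASS::ginv(B) %*% diag(nrow(B))[, j]          # B^+ e_j
  s <- cumsum(c(0, sapply(Blist, ncol)))             # block boundaries
  norms <- sapply(seq_along(Blist), function(l)
    sqrt(sum(h[(s[l] + 1):s[l + 1]]^2)))             # ||h_l^(j+)||
  norms / sum(norms)
}
# N(w) = sum_l w_l B_l B_l'
N_of_w <- function(w, Blist)
  Reduce(`+`, Map(function(wl, B) wl * B %*% t(B), w, Blist))
```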

Lemma 2

Let $j\in\{1{:}m\}$. The column space of each of the matrices $\mathbf{N}(\mathbf{w}^{(u)})$, $\mathbf{N}(\mathbf{w}^{(j+)})$, and $\mathbf{N}(\mathbf{w}^{(j*)})$ contains $\mathbf{e}_j$. Let $j,k\in\{1{:}m\}$, $j\neq k$. The column space of each of the matrices $\mathbf{N}(\mathbf{w}^{(u)})$, $\mathbf{N}(\mathbf{w}^{(jk+)})$, and $\mathbf{N}(\mathbf{w}^{(jk*)})$ contains the column space of the matrix $\mathbf{E}_{jk}$.

According to Lemma 2, for $t\in\{j,k\}$, we can use Lemma 1 with $\mathbf{w}=\mathbf{w}^{(u)}$, $\mathbf{w}=\mathbf{w}^{(t+)}$, and $\mathbf{w}=\mathbf{w}^{(t*)}$ and with $\mathbf{X}=\mathbf{e}_t$ and apply bound (13) to obtain the following theorem:

Theorem 1 (Type I covariance bounds)

For any $j,k\in\{1{:}m\}$, we have

$$|c_{jk}^*| \leq \alpha K[(\mathbf{B}\mathbf{B}')^{-1}_{jj}(\mathbf{B}\mathbf{B}')^{-1}_{kk}]^{1/2}, \qquad (16)$$
$$|c_{jk}^*| \leq \alpha[\mathbf{N}^+_{jj}(\mathbf{w}^{(j+)})\mathbf{N}^+_{kk}(\mathbf{w}^{(k+)})]^{1/2}, \qquad (17)$$
$$|c_{jk}^*| \leq \alpha[\mathbf{N}^+_{jj}(\mathbf{w}^{(j*)})\mathbf{N}^+_{kk}(\mathbf{w}^{(k*)})]^{1/2}, \qquad (18)$$

where $(\mathbf{B}\mathbf{B}')^{-1}_{ij}$ and $\mathbf{N}^+_{ij}(\mathbf{w})$ represent the elements of the matrices $(\mathbf{B}\mathbf{B}')^{-1}$ and $\mathbf{N}^+(\mathbf{w})$ with coordinates $(i,j)$, respectively.

Similarly, for $j,k$, $j\neq k$, we can use Lemma 1 with $\mathbf{w}=\mathbf{w}^{(u)}$, $\mathbf{w}=\mathbf{w}^{(jk+)}$, $\mathbf{w}=\mathbf{w}^{(jk*)}$, and $\mathbf{X}=\mathbf{E}_{jk}$ and apply bound (14). We thus obtain the following theorem.

Theorem 2 (Type II covariance bounds)

For any $j,k\in\{1{:}m\}$, $j\neq k$, we have

$$|c_{jk}^*| \leq \frac{\alpha}{2}K\lambda_{\max}(\mathbf{E}_{jk}'(\mathbf{B}\mathbf{B}')^{-1}\mathbf{E}_{jk}), \qquad (19)$$
$$|c_{jk}^*| \leq \frac{\alpha}{2}\lambda_{\max}(\mathbf{E}_{jk}'\mathbf{N}^+(\mathbf{w}^{(jk+)})\mathbf{E}_{jk}), \qquad (20)$$
$$|c_{jk}^*| \leq \frac{\alpha}{2}\lambda_{\max}(\mathbf{E}_{jk}'\mathbf{N}^+(\mathbf{w}^{(jk*)})\mathbf{E}_{jk}). \qquad (21)$$

The bounds provided by the above theorems can be computed analytically or by means of standard numerical linear algebra and relatively simple optimization. They also guarantee the finiteness of the SDP-based bounds (e.g., using (16) and (19)):
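As an example, the uniform-design bounds (16) and (19) can be coded directly (a sketch of ours; B is the matrix $(\mathbf{B}_1,\ldots,\mathbf{B}_K)$ and alpha is $\Phi_{\mathfrak{B}}(\mathbf{M}_0)$):

```r
type_I_uniform <- function(j, k, B, K, alpha) {      # bound (16)
  W <- solve(B %*% t(B))                             # (BB')^{-1}
  alpha * K * sqrt(W[j, j] * W[k, k])
}
type_II_uniform <- function(j, k, B, K, alpha) {     # bound (19), j != k
  W <- solve(B %*% t(B))
  S <- W[c(j, k), c(j, k)]                           # E_jk' (BB')^{-1} E_jk
  alpha * K * max(eigen(S, symmetric = TRUE)$values) / 2
}
```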

Theorem 3 (Finiteness of the SDP-based bounds)

Let $\mathbf{d}_0\in\mathcal{D}_N^{(n)}$ have a nonsingular information matrix $\mathbf{M}_0:=\mathbf{M}(\mathbf{d}_0)\in\mathbb{S}^m_{++}$, and let $\alpha=\Phi_{\mathfrak{B}}(\mathbf{M}_0)$. Then, the optimal values of the optimization problems (11) and (12) are finite.

We emphasize that the analytical bounds from Theorems 1 and 2 are never stronger than the computational bounds (11) and (12). However, they are often very simple, and for some criteria, they even coincide with the SDP-based bounds, as demonstrated below.

3.2 $A$-, $I$-, $MV$-, and $G$-optimality

As shown in Section 3.1.1, we have $c_{jj}^*\geq 0$ for $j\in\{1{:}m\}$. We therefore further examine only the remaining bounds. Table 1 provides an overview of the results for $A$-, $I$-, $MV$-, and $G$-optimality, and the following subsections give detailed explanations of these bounds.

$\Phi$ | diagonal: $c_{jj}^*\leq$ (computational / analytical) | nondiagonal: $|c_{jk}^*|\leq$ (computational / analytical)
$A$ | $\alpha$ / $\alpha$ | $\alpha/2$ / $\alpha/2$
$I$ | $\star$ / $\alpha(\mathbf{F}\mathbf{F}')^{-1}_{jj}$ | $\star$ / $\alpha\min\{[(\mathbf{F}\mathbf{F}')^{-1}_{jj}(\mathbf{F}\mathbf{F}')^{-1}_{kk}]^{1/2},\ \lambda_{\max}(\mathbf{E}_{jk}'(\mathbf{F}\mathbf{F}')^{-1}\mathbf{E}_{jk})/2\}$
$MV$ | $\alpha$ / $\alpha$ | $\alpha$ / $\alpha$
$G$ | $\star$ / Section 3.2.4 | $\star$ / Section 3.2.4

Table 1: Overview of the covariance bounds for the cases of $A$-, $I$-, $MV$-, and $G$-optimality, as obtained from solutions to (11) and (12) (computational) and from Theorems 1 and 2 (analytical). A star $\star$ denotes that we are not aware of an explicit solution to (11) and (12). In the case of $G$-optimality, the reader is referred to Section 3.2.4 for the analytical bounds, as they do not lend themselves to a succinct description.

3.2.1 $A$-optimality

For $A$-optimality ($K=1$, $\mathbf{B}_1=\mathbf{B}=\mathbf{I}$), $\alpha=\mathrm{tr}(\mathbf{M}_0^{-1})$, and $j,k\in\{1{:}m\}$, bounds (16)-(18) and (19)-(21) reduce to

$$c^*_{jj} \leq \alpha, \qquad (22)$$
$$|c^*_{jk}| \leq \frac{\alpha}{2},\quad j\neq k. \qquad (23)$$

By solving the SDP formulations (11) and (12), we again obtain only (22) and (23); for the diagonal elements, we have already proven that $c_{jj}^*\leq\alpha$, so it is sufficient to find a $\boldsymbol{\Sigma}\in\mathbb{S}^m_+$ with $\mathrm{tr}(\boldsymbol{\Sigma})\leq\alpha$ and $c_{jj}=\alpha$. These conditions are satisfied for $\boldsymbol{\Sigma}=\alpha\mathbf{e}_j\mathbf{e}_j'$. For the nondiagonal elements, the maximum is attained for $\boldsymbol{\Sigma}=\alpha(\mathbf{e}_j+\mathbf{e}_k)(\mathbf{e}_j+\mathbf{e}_k)'/2$, and the minimum is attained for $\boldsymbol{\Sigma}=\alpha(\mathbf{e}_j-\mathbf{e}_k)(\mathbf{e}_j-\mathbf{e}_k)'/2$.

3.2.2 $I$-optimality

For $I$-optimality ($K=1$, $\mathbf{B}_1=\mathbf{B}=\mathbf{F}:=(\mathbf{f}_1,\ldots,\mathbf{f}_n)$), $\alpha=\mathrm{tr}(\mathbf{F}'\mathbf{M}_0^{-1}\mathbf{F})$, and $j,k\in\{1{:}m\}$, we obtain the following bounds from (16)-(18) and (19)-(21):

$$|c^*_{jk}| \leq \alpha[(\mathbf{F}\mathbf{F}')^{-1}_{jj}(\mathbf{F}\mathbf{F}')^{-1}_{kk}]^{1/2}, \qquad (24)$$
$$|c^*_{jk}| \leq \frac{\alpha}{2}\lambda_{\max}(\mathbf{E}_{jk}'(\mathbf{F}\mathbf{F}')^{-1}\mathbf{E}_{jk}),\quad j\neq k. \qquad (25)$$

Neither of these bounds dominates the other. For instance, let $(\mathbf{F}\mathbf{F}')^{-1}=\mathbf{I}_2$, $j=1$ and $k=2$. Then, (24) becomes $|c^*_{12}|\leq\alpha$, and (25) becomes $|c^*_{12}|\leq\alpha/2$. In contrast, if $(\mathbf{F}\mathbf{F}')^{-1}$ is a $2\times 2$ diagonal matrix with 100 and 1 on the diagonal, then (24) is $|c^*_{12}|\leq 10\alpha$ and (25) is $|c^*_{12}|\leq 50\alpha$.

3.2.3 $MV$-optimality

For $MV$-optimality ($K=m$, $\mathbf{B}_\ell=\mathbf{e}_\ell$, $\ell\in\{1{:}m\}$, $\mathbf{B}=\mathbf{I}$) and $\alpha=\max_\ell\mathbf{e}_\ell'\mathbf{M}_0^{-1}\mathbf{e}_\ell$, let $j,k\in\{1{:}m\}$ and note that $\mathbf{w}^{(j+)}=\mathbf{w}^{(j*)}=\mathbf{e}_j$. Therefore, bounds (17), (18), (20) and (21) all lead to the same conclusion:

$$|c^*_{jk}|\leq\alpha. \qquad (26)$$

Bounds (16) and (19) lead to weaker constraints of $|c^*_{jk}|\leq m\alpha$ and $|c^*_{jk}|\leq m\alpha/2$, respectively.

Again, the SDP formulations (11) and (12) cannot improve upon (26). As in the case of $A$-optimality, this can be easily proven: both the diagonal and nondiagonal maxima for $MV$-optimality are attained at $\boldsymbol{\Sigma}=\alpha\mathbf{1}_m\mathbf{1}_m'$, and the nondiagonal minimum is attained at $\boldsymbol{\Sigma}=\alpha(\mathbf{e}_j-\mathbf{e}_k)(\mathbf{e}_j-\mathbf{e}_k)'$.

3.2.4 $G$-optimality

For $G$-optimality ($K=n$, $\mathbf{B}_\ell=\mathbf{f}_\ell$, $\ell\in\{1{:}n\}$, $\mathbf{B}=\mathbf{F}=(\mathbf{f}_1,\ldots,\mathbf{f}_n)$), the situation becomes more complex and more interesting. Let $\alpha=\max_\ell\mathbf{f}_\ell'\mathbf{M}_0^{-1}\mathbf{f}_\ell$. (It may be efficient to compute $\mathbf{M}_0=\mathbf{M}(\mathbf{d}_0)$ via a heuristic for $D$-optimality, as these algorithms are typically faster and more readily available than those for $G$-optimality. We note that $D$- and $G$-optimal designs tend to be close because they coincide in approximate design theory.) Note that for $G$-optimality, the primary (original) and the artificial models coincide; thus, the auxiliary approximate design $\mathbf{w}$ on $\{1{:}K\}$ is in fact an approximate design on the original design space $\{1{:}n\}$, and $\mathbf{N}(\mathbf{w})$ is exactly the standard information matrix of an approximate design, $\mathbf{N}(\mathbf{w})=\sum_{i=1}^n w_i\mathbf{f}_i\mathbf{f}_i'$. Then,

1. Bounds (16) and (19) become $|c^*_{jk}|\leq n\alpha[(\mathbf{F}\mathbf{F}')^{-1}_{jj}(\mathbf{F}\mathbf{F}')^{-1}_{kk}]^{1/2}$ for any $j,k$ and $|c_{jk}^*|\leq n\alpha\lambda_{\max}(\mathbf{E}_{jk}'(\mathbf{F}\mathbf{F}')^{-1}\mathbf{E}_{jk})/2$ for $j\neq k$.

2. Bounds (17) and (20) require $\mathbf{w}^{(t+)}$ for each $t\in\{1{:}n\}$, which is straightforward to calculate using $\mathbf{h}^{(t+)}=\mathbf{F}^+\mathbf{e}_t$ and $\mathbf{w}^{(t+)}=(|h_1^{(t+)}|,\ldots,|h_n^{(t+)}|)'/\sum_{i=1}^n|h_i^{(t+)}|$.

3. Bounds (18) and (21) require computing $\mathbf{w}^{(t*)}$, which in this case is the standard $\mathbf{e}_t$-optimal approximate design; it is known that this design can be efficiently computed via continuous linear programming (see Harman and Jurík (2008)).

3.3 Further improvements

Let $j,k\in\{1{:}m\}$. A general strategy for the construction of interval bounds $[L_{jk},U_{jk}]$ for the optimal $c_{jk}^*$ is as follows. First, find appropriate $m\times m$ matrices $\mathbf{H}_t$ and numbers $\gamma_t$ for $t\in T$, where $T$ is an index set, such that we can be sure that $\boldsymbol{\Sigma}^*$ satisfies the constraints

$$\mathrm{tr}(\mathbf{H}_t\boldsymbol{\Sigma}^*)\leq\gamma_t,\quad t\in T. \qquad (27)$$

Second, construct $L_{jk}$ and $U_{jk}$ such that $[L_{jk},U_{jk}]$ contains the values $(\boldsymbol{\Sigma})_{jk}=\mathbf{e}_j'\boldsymbol{\Sigma}\mathbf{e}_k$ for any $\boldsymbol{\Sigma}\in\mathbb{S}^m_{++}$ satisfying (27). In Section 3.1, we applied this strategy with $T=\{1{:}K\}$, $\mathbf{H}_t=\mathbf{B}_t\mathbf{B}_t'$, and $\gamma_t\equiv\alpha$. However, constraints of the type (27) can be added to those from Section 3.1 to obtain better theoretical covariance bounds. They can also be added to the SDP formulations (11) and (12) to algorithmically provide better bounds.

For instance, a stronger lower bound for $c_{jj}^*$, $j\in\{1{:}m\}$, can be found by computing the $\mathbf{e}_j$-optimal approximate design for the primary model, i.e., by minimizing $\mathbf{e}_j'\mathbf{M}^+(\mathbf{w})\mathbf{e}_j$ under the constraint $\mathbf{e}_j\in\mathcal{C}(\mathbf{M}(\mathbf{w}))$, where $\mathbf{w}\in\mathbb{R}^n$ ($w_i\geq 0$ for all $i$ and $\sum_i w_i=1$) is an approximate design for the original model. Let $t_j^*$ be the minimal value of $\mathbf{e}_j'\mathbf{M}^+(\mathbf{w})\mathbf{e}_j$. Clearly, for any $\mathbf{d}$ with a nonsingular information matrix, we then have $\mathbf{e}_j'\mathbf{M}^{-1}(\mathbf{d})\mathbf{e}_j=N^{-1}\mathbf{e}_j'\mathbf{M}^{-1}(\mathbf{d}/N)\mathbf{e}_j\geq t_j^*/N$. It follows that $c_{jj}^*\geq t_j^*/N$, which is always stronger than our standard bound $c_{jj}^*\geq 0$. However, unlike $c^*_{jj}\geq 0$, obtaining these $m$ bounds requires solving $m$ $\mathbf{e}_j$-optimal approximate design problems by means of, e.g., linear programming (Harman and Jurík (2008)). A similar approach can be applied to the nondiagonal elements. Nevertheless, we have decided not to explore these possibilities in this paper and instead leave them for further research.

4 Extensions

In this section, we provide guidelines for more complex optimal design problems that can be solved in a straightforward way via the proposed MILP approach.

4.1 Designs with replications

If we have a $\Phi_{\mathfrak{B}}$-optimal design problem in which we allow for up to $N_i$ replicated observations at the design point $i\in\{1{:}n\}$, an MILP formulation can be obtained by replicating the regressor $\mathbf{f}_i$ exactly $N_i$ times and then using (9). Setting $N_i=N$ for each $i\in\{1{:}n\}$ allows us to compute optimal designs for the classical problem, i.e., with a single constraint representing the total number $N$ of replications at all design points combined, which is demonstrated in Section 5.1.
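In code, this device amounts to duplicating columns of the regressor matrix; a small sketch (ours, reusing the names from the sketches above):

```r
# Expand the model so that point i appears Ni[i] times; solving the binary
# problem on the expanded F and summing the selected copies per original
# point yields a design with replications.
expand_model <- function(F, Ni) {
  point <- rep(1:ncol(F), times = Ni)   # original point behind each copy
  list(F = F[, point, drop = FALSE], point = point)
}
# after solving:  d_rep <- tapply(d_binary, point, sum)
```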

4.2 Multiple design constraints

In some applications, we need to restrict the set of permissible designs to a set smaller than $\mathcal{D}^{(n)}_N$. Such additional constraints on $\mathbf{d}$ can correspond to safety, logistical or resource restrictions. Many such constraints are linear (see, e.g., Harman (2014), Sagnol and Harman (2015) and Harman et al. (2016) for examples), and they can be directly included in (9). Another class of linear constraints corresponds to the problem of optimal design augmentation, which entails finding an optimal design when specified trials must be performed or have already been performed (see Chapter 19 in Atkinson et al. (2007)). Constraints of this kind can also be incorporated into the MILP formulation in a straightforward way ($d_i\geq N_i$, where $N_i\in\{0,1,\ldots\}$ is the number of trials already performed at point $i$), but a design $\mathbf{d}_0$ satisfying the required constraints must be used to construct the covariance bounds. However, the practical nature of the constraints usually allows for simple selection of such an initial $\mathbf{d}_0$. MILP with additional design constraints is used in Section 5.1 to obtain a “space-filling” optimal design and in Section 5.2 to find a cost-constrained optimal design (see Sections 5.1 and 5.2 for the formal definitions of such designs).

4.3 Optimal designs under covariance constraints

A unique feature of the proposed approach is that we can add to (9) any linear constraints on the elements of the variance matrix $\boldsymbol{\Sigma}$, and the resulting problem will still be a mixed-integer linear program, whereas the available heuristics for computing optimal designs do not allow for such variance–covariance constraints. For instance, we can specify that the absolute values of all covariances should be less than a selected threshold ($|c_{jk}|\leq c$ for $j\neq k$), with the aim of finding a design that is optimal among those that are “close” to orthogonal (or even exactly orthogonal). Similarly, we can add the condition that the sum of all variances of the least-squares estimators should be less than some number ($\sum_j c_{jj}\leq c$), with the aim of, e.g., finding the $MV$-optimal design among all designs that attain at most a given value of $A$-optimality. More generally, we can compute a $\Phi_{\mathfrak{B}}$-optimal design that attains at most a selected value of a different criterion from the same class.
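In the notation of the MILP sketch in Section 2, each such requirement is just an extra linear row; for instance (our illustration, assuming the model list and the index helper ic(j, k) from that sketch are exposed rather than kept local):

```r
# Add |c_jk| <= cmax as the two rows  c_jk <= cmax  and  -c_jk <= cmax
add_cov_box <- function(model, ic, j, k, cmax) {
  nv <- length(model$obj)
  v1 <- numeric(nv); v1[ic(j, k)] <- 1
  v2 <- numeric(nv); v2[ic(j, k)] <- -1
  model$A <- rbind(model$A, v1, v2)
  model$rhs <- c(model$rhs, cmax, cmax)
  model$sense <- c(model$sense, '<', '<')
  model
}
```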

As in Section 4.2, a design $\mathbf{d}_0$ that satisfies the required constraints must be used to construct the covariance bounds, and such a design may not be easy to find for constraints such as $|c_{jk}|\leq c$. Fortunately, one class of constraints has such designs $\mathbf{d}_0$ readily available: those that require the design to attain a certain value of a different optimality criterion from the $\Phi_{\mathfrak{B}}$ class. For instance, we can find the $MV$-optimal design among all the designs that satisfy $\sum_j c_{jj}\leq c$ (i.e., $\Phi_A(\mathbf{M}(\mathbf{d}))\leq c$: a limit on the value of $A$-optimality). Here, $\mathbf{d}_0$ can be chosen as either the $A$-optimal design or a design obtained via a heuristic for $A$-optimality. Such a $\mathbf{d}_0$ then allows for any $c$ such that $c\geq\Phi_A(\mathbf{M}(\mathbf{d}_0))$, which captures all reasonable choices for $c$. (A lower $c$ would mean that we seek a design that is better with respect to $A$-optimality than either the $A$-optimal design itself or some best known design.) The applicability of this approach is also demonstrated in Section 5.1.

4.4 Infinite design spaces

We focus on finite design spaces because integer programming approaches are unsuitable for continuous design spaces (as there would be an infinite number of variables in the latter case). Finiteness of the design set is common in both experimental design and related disciplines (e.g., Mitchell (1974), Atkinson and Donev (1996), Ghosh et al. (2008), Bailey and Cameron (2009), Todd (2016)). Moreover, even if the theoretical design space is infinite, it can be discretized, which is a typical technique in design algorithms (e.g., Fedorov (1989), Meyer and Nachtsheim (1995)). This often results in only a negligible loss in the efficiency of the achievable design. Although some heuristics can handle continuous design spaces directly, they are still only heuristics and, as such, risk providing a design that is worse than one obtained by applying a discrete optimizer to a discretized design space (cf. Harman et al. (2021)).

Nevertheless, for the case of continuous design spaces, the MILP solution can be very useful for finding the best design on a small support predetermined by, for instance, some approximate optimal design algorithm, such as that of Harman et al. (2021). This is demonstrated in Section 5.2.

4.5 Nonlinear models

Because our results rely on the additive form of the information matrix, they can be straightforwardly extended to the usual generalizations by simply adapting the formula for the information matrix. For a nonlinear regression $y(\mathbf{x})=\mathbf{h}(\mathbf{x},\boldsymbol{\beta})+\varepsilon_{\mathbf{x}}$ with Gaussian errors, the usual local optimality method can be used (see, e.g., Pronzato and Pázman (2013), Chapter 5), resulting in the information matrix

\mathbf{M}(\mathbf{d})=\sum_{i=1}^{n}d_{i}\left.\frac{\partial\mathbf{h}(\mathbf{x}_{i},\boldsymbol{\beta})}{\partial\boldsymbol{\beta}}\left(\frac{\partial\mathbf{h}(\mathbf{x}_{i},\boldsymbol{\beta})}{\partial\boldsymbol{\beta}}\right)^{\prime}\right|_{\boldsymbol{\beta}=\boldsymbol{\beta}_{0}},

where $\boldsymbol{\beta}_{0}$ is a nominal parameter that is assumed to be close to the true value of $\boldsymbol{\beta}$. This approach is demonstrated in Section 5.2. Generalized linear models can be dealt with in a similar manner; see Khuri et al. (2006) and Atkinson and Woods (2015). In other words, under the locally optimal design approach for nonlinear models, the computational methods are the same as in standard optimal design for linear models.
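As an illustration of this step, the following R sketch (function and variable names are ours; the numDeriv package is assumed available) assembles the local information matrix for a generic mean function h evaluated at a nominal parameter:

```r
# Sketch: M(d) = sum_i d_i g_i g_i', with g_i the gradient of h(x_i, beta)
# with respect to beta at beta0 (computed numerically here)
library(numDeriv)

local_info_matrix <- function(h, X, d, beta0) {
  M <- matrix(0, length(beta0), length(beta0))
  for (i in seq_len(nrow(X))) {
    g <- grad(function(b) h(X[i, ], b), beta0)  # gradient at the nominal beta0
    M <- M + d[i] * tcrossprod(g)               # add d_i * g g'
  }
  M
}
```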

5 Numerical study

All computations reported in this section were performed on a personal computer with the 64-bit Windows 10 operating system and an Intel i5-6400 processor with 8 GB of RAM. Our algorithm was implemented in the statistical software R, the mixed-integer linear programs were solved using the R implementation of the Gurobi software (Gurobi Optimization, LLC (2023)), and the MISOCP computations of the optimal designs were performed using the OptimalDesign package (Harman and Filová (2019)), ultimately also calling the Gurobi solver. The SDP-based bounds (11) and (12) were computed via convex programming using the CVXR package (Fu et al. (2020)). The R codes for applying our MILP approach are available at http://www.iam.fmph.uniba.sk/ospm/Harman/design/, and the authors also plan to implement the MILP algorithm in the OptimalDesign package to make it available in a more user-friendly manner.

Naturally, all the reported running times of the MILP approach include the time it took to compute the relevant bounds via the SDP formulations (11) and (12) or via the analytical formulas in Section 3.1.2. We also note that in all the following applications of our MILP formulation, we used the analytical bounds (22), (23) and (26) for $A$- and $MV$-optimality because they are equivalent to the SDP bounds (11) and (12) but less time-consuming to compute. For $I$- and $G$-optimality, we considered both approaches, and we selected the one that tended to result in faster computation of the entire MILP algorithm: the SDP bounds for $I$-optimality and the analytical bounds for $G$-optimality. Note that although the SDP bounds for $G$-optimality tend to result in shorter total computation times for larger models, the analytical version enabled overall faster MILP computations for the models considered in this paper. The initial designs $\mathbf{d}_{0}$ in the case of $G$-optimality were computed using the KL exchange algorithm (see Chapter 12 of Atkinson et al. (2007)) for $D$-optimality; for all other criteria, the KL exchange algorithm for $A$-optimality was used.

5.1 Quadratic regression

Let us demonstrate the MILP algorithm on the quadratic regression model

y_{i}=\beta_{1}+\beta_{2}x_{i}+\beta_{3}x_{i}^{2}+\varepsilon_{i}, (28)

where the values $x_{i}$ lie in the domain $[-1,1]$ equidistantly discretized to $n$ points, i.e., $x_{i}\in\{-1,-1+2/(n-1),\ldots,1-2/(n-1),1\}$. The exact optimal designs for model (28) have been extensively studied in the past (e.g., Gaffke and Krafft (1982), Chang and Yeh (1998), Imhof (2000), Imhof (2014)), but for some criteria, sample sizes and design spaces, there are still no known analytical solutions.
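In R, this discretization and the corresponding candidate regressors can be set up in a few lines (a minimal sketch; the matrix Fmat, whose rows are $\mathbf{f}^{\prime}(x_{i})=(1,x_{i},x_{i}^{2})$, is our notation):

```r
n <- 31
x <- seq(-1, 1, length.out = n)      # equidistant discretization of [-1, 1]
Fmat <- cbind(1, x, x^2)             # row i is f'(x_i) for the quadratic model (28)
```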

To illustrate the MILP computational approach, we select $n=31$ design points and $N=5$ observations, and we compute the $A$-, $I$-, $MV$- and $G$-optimal designs. The running times of these computations are given in the first row of Table 2, and the resulting designs are illustrated in Figure 1 (left). The running times show that the optimal design calculations using MILP depend strongly on the selected optimality criterion.

Criterion    A             I               MV            G
Binary       3.99 (0.26)   11.75 (1.18)    2.59 (0.33)   36.62 (0.90)
Replicated   74.98 (2.67)  214.57 (13.66)  64.44 (4.00)  487.56 (61.59)

Table 2: Running times of the MILP computations of optimal designs for the quadratic regression model (28) ($N=5$, $n=31$, $m=3$) for binary designs (first row) and for designs with allowed replications (second row). Each algorithm was run 10 times, and the means and standard deviations of these runs are reported in the form “mean (standard deviation)”. All times are in seconds.
Figure 1: Optimal designs for the quadratic regression model (28). The dots represent the values of $x_{i}\in[-1,1]$ for which $d_{i}=1$. The designs in the left column are unconstrained, whereas the designs in the right column are constrained by (29), with the constraint limits indicated by blue brackets. The design “G-opt. 1” is for the original (discretized) model, and “G-opt. 2” is the design for the model with added points $x=\pm\gamma$; the optimal support points $\gamma$ reported by Imhof (2014) are indicated by red lines.

The above results can be used to support a conjecture presented by Imhof (2014), who stated, without proof, that a $G$-optimal design for quadratic regression on the (continuous, not discretized) domain $[-1,1]$ should satisfy Theorem 1 of Imhof (2014) even for $N=5$. That is, such a design should specify one trial to be performed at each of the points $x\in\{-1,-\gamma,0,\gamma,1\}$, where $\gamma\in(0,1)$ is the solution to $-\gamma^{4}-7\gamma^{2}+4=0$. By plotting the corresponding value $\gamma\approx 0.73$ (Figure 1), we see that the computed $G$-optimal design indeed satisfies Theorem 1 of Imhof (2014) (up to the considered discretization). If we add the points $\pm\gamma$ to the set of permissible values of $x$, then the $G$-optimal design on the discrete set selects these points and thus exactly corresponds to the $G$-optimal design hypothesized by Imhof (2014); see the design labeled “G-opt. 2” in Figure 1 (left).
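For reference, $\gamma$ has a closed form: substituting $t=\gamma^{2}$ turns the quartic into a quadratic. A quick check in R:

```r
# positive root of -gamma^4 - 7*gamma^2 + 4 = 0 via the substitution t = gamma^2
gamma <- sqrt((-7 + sqrt(65)) / 2)
gamma   # approximately 0.7287, matching the value indicated in Figure 1
```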

Design constraints

As mentioned in Sections 4.2 and 4.3, a beneficial property of the MILP formulation is that it can easily accommodate any constraint that is linear in the design values and in the elements of the covariance matrix $\mathbf{M}^{-1}(\mathbf{d})$. For instance, the designs in Figure 1 (left) are focused on the regions around $0$, $1$ and $-1$, but one can naturally require a design that is more space-filling. In the present model, we formulate this by introducing the additional constraints

\sum_{i=6}^{11}d_{i}\geq 1,\quad\sum_{i=21}^{26}d_{i}\geq 1. (29)

These constraints enforce that at least one trial should be performed for $x$ between $-2/3$ and $-1/3$ and also for $x$ between $1/3$ and $2/3$. Such constraints can be included in the linear program, and all the computations can proceed as usual. The one exception is that a design $\mathbf{d}_{0}$ that satisfies the constraints (29) must be used to construct the relevant covariance bounds. For this purpose, we used a design $\mathbf{d}_{0}$ that assigns one trial to each of the design points $x\in\{-1,-7/15,0,8/15,1\}$, which satisfies (29). The resulting constrained optimal designs obtained via MILP are depicted in Figure 1 (right).
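In our MILP formulation, the constraints (29) are two extra rows acting on the $\mathbf{d}$ block of the variable vector $\mathbf{x}=(\mathbf{z}^{\prime},\mathbf{d}^{\prime},\mathbf{c}^{\prime})^{\prime}$; a minimal R sketch (names ours):

```r
# Sketch: the space-filling constraints (29) as rows over x = (z', d', c')'
n <- 31; m <- 3
row1 <- replace(rep(0, n), 6:11, 1)    # d_6 + ... + d_11  >= 1
row2 <- replace(rep(0, n), 21:26, 1)   # d_21 + ... + d_26 >= 1
pad  <- function(r) c(rep(0, n * m^2), r, rep(0, m^2))  # zeros for z and c
extra_mat <- rbind(pad(row1), pad(row2))
extra_dir <- c(">=", ">=")
extra_rhs <- c(1, 1)
```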

Covariance constraints

The $A$- and $G$-optimal designs for (28) differ quite significantly, so we might be interested in finding a “compromise” design. This can be achieved by, e.g., finding the $A$-optimal design among all the designs that attain at most a specified value of the $G$-optimality criterion. In particular, we have $\Phi_{G}(\mathbf{M}(\mathbf{d}_{A}^{*}))\approx 1.00$ and $\Phi_{G}(\mathbf{M}(\mathbf{d}_{G}^{*}))\approx 0.75$, where $\mathbf{d}_{A}^{*}$ and $\mathbf{d}_{G}^{*}$ are the $A$- and $G$-optimal designs, respectively, and we wish to construct the design $\mathbf{d}_{A_{G}}^{*}$ that is $A$-optimal among all designs $\mathbf{d}$ that have $\Phi_{G}(\mathbf{M}(\mathbf{d}))\leq 0.9$ (i.e., such designs have a maximum variance of $\mathbf{f}^{\prime}(x_{i})\widehat{\boldsymbol{\beta}}$, $i=1,\ldots,n$, of at most $0.9$). The constraints $\Phi_{G}(\mathbf{M}(\mathbf{d}))\leq 0.9$ are linear in $\boldsymbol{\Sigma}=\mathbf{M}^{-1}(\mathbf{d})$, which means that they can be incorporated into the MILP formulation. To construct the covariance bounds, a design $\mathbf{d}_{0}$ that satisfies $\Phi_{G}(\mathbf{M}(\mathbf{d}_{0}))\leq 0.9$ must be chosen. As discussed in Section 4.3, a natural choice is the $G$-optimal design, i.e., $\mathbf{d}_{0}=\mathbf{d}_{G}^{*}$. The optimal constrained design $\mathbf{d}_{A_{G}}^{*}$ obtained via MILP is depicted in Figure 2 (left), together with $\mathbf{d}_{A}^{*}$ and $\mathbf{d}_{G}^{*}$. This figure illustrates that $\mathbf{d}_{A_{G}}^{*}$ is indeed a compromise between $\mathbf{d}_{A}^{*}$ and $\mathbf{d}_{G}^{*}$.
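Concretely, $\Phi_{G}(\mathbf{M}(\mathbf{d}))\leq 0.9$ expands into one row per design point, since $\mathbf{f}^{\prime}(x_{i})\boldsymbol{\Sigma}\mathbf{f}(x_{i})=\mathrm{vec}(\mathbf{f}_{i}\mathbf{f}_{i}^{\prime})^{\prime}\mathrm{vec}(\boldsymbol{\Sigma})$; a sketch in R (with Fmat holding the rows $\mathbf{f}^{\prime}(x_{i})$, as in the sketch above):

```r
# Sketch: Phi_G(M(d)) <= 0.9 as n rows that are linear in c = vec(Sigma)
g_rows <- t(apply(Fmat, 1, function(f) c(tcrossprod(f))))  # rows are vec(f f')'
G_mat  <- cbind(matrix(0, nrow(Fmat), n * m^2),  # zeros for the z block
                matrix(0, nrow(Fmat), n),        # zeros for the d block
                g_rows)                          # coefficients of vec(Sigma)
G_dir  <- rep("<=", nrow(Fmat))
G_rhs  <- rep(0.9, nrow(Fmat))
```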

Figure 2: Left: $A$- and $G$-optimal (binary) designs for the quadratic regression model (28) and the design that is $A$-optimal among all the designs $\mathbf{d}$ that satisfy $\Phi_{G}(\mathbf{M}(\mathbf{d}))\leq 0.9$. Right: Optimal designs with replications. Smaller dots correspond to $d_{i}=1$ (one replication), and larger dots correspond to $d_{i}=3$ (three replications).
Replicated observations

All the designs discussed above are binary, i.e., without replications. To allow for any number of replications, we can simply repeat each regressor $N$ times and apply the MILP algorithm, as outlined in Section 4.1. The resulting designs for (28) with $N=5$ are illustrated in Figure 2 (right), and the computation times are given in Table 2. In the $A$- and $MV$-optimal designs, the design point $x_{i}=0$ is replicated three times, whereas the $I$- and $G$-optimal designs remain binary even when replications are allowed. The observed computation times demonstrate that allowing for replications can result in a significantly increased running time.
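In code, the construction of Section 4.1 amounts to duplicating the rows of the candidate matrix; a short sketch (our names):

```r
# Sketch: allow up to N replications by repeating each candidate regressor N times
N <- 5
idx   <- rep(seq_len(nrow(Fmat)), each = N)
F_rep <- Fmat[idx, ]   # a binary design on F_rep gives d_i in {0, ..., N} on Fmat
# after solving, aggregate the binary solution d_rep back to replication counts:
# d <- tapply(d_rep, idx, sum)
```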

5.2 Nonlinear regression

Nonlinear models, continuous design spaces

In this section, we demonstrate the usefulness of the MILP approach for nonlinear models and for infinite design spaces. Consider the following model given by Pronzato and Zhigljavsky (2014) (Problem 10 therein):

y_{i}=\beta_{1}+\beta_{2}e^{-\beta_{3}x_{i,1}}+\frac{\beta_{4}}{\beta_{4}-\beta_{5}}(e^{-\beta_{5}x_{i,2}}-e^{-\beta_{4}x_{i,2}})+\varepsilon_{i}, (30)

localized at $\boldsymbol{\beta}_{0}=(1,1,2,0.7,0.2)^{\prime}$, where $\mathbf{x}_{i}=(x_{i,1},x_{i,2})^{\prime}\in[0,2]\times[0,10]$ and $\varepsilon_{i}$ follows the Gaussian distribution $\mathcal{N}(0,1)$. Let us seek a $G$-optimal design for $N=6$ trials. Since the design space is continuous, we cannot apply MILP directly. Instead, we first apply the approximate design algorithm GEX proposed by Harman et al. (2021) to find the $G$-optimal approximate design (because $G$- and $D$-optimality coincide in the approximate design case, we use the GEX algorithm for $D$-optimality), and we consider the points that support the $G$-optimal approximate design (i.e., the $\mathbf{x}_{i}$ with $w_{i}>0$); see Figure 3 (left). We then find the exact design that is $G$-optimal on these nine “candidate” support points using our MILP approach; see Figure 3 (middle). This design is highly efficient among the designs on the entire design space $[0,2]\times[0,10]$; for instance, it coincides with the design obtained via the KL exchange algorithm for $D$-optimality applied to a densely discretized version of $[0,2]\times[0,10]$.
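For model (30), the required gradients of the mean function at $\boldsymbol{\beta}_{0}$ can be generated symbolically in R via deriv(); a sketch (function and variable names are ours):

```r
# Sketch: gradient of the mean function of model (30) with respect to beta,
# evaluated at beta0 = (1, 1, 2, 0.7, 0.2)'
h30 <- deriv(~ b1 + b2 * exp(-b3 * x1) +
               b4 / (b4 - b5) * (exp(-b5 * x2) - exp(-b4 * x2)),
             c("b1", "b2", "b3", "b4", "b5"),
             function(x1, x2, b1, b2, b3, b4, b5) {})
beta0 <- c(1, 1, 2, 0.7, 0.2)
grad_at <- function(x)  # x = (x_1, x_2), one candidate point
  attr(h30(x[1], x[2], beta0[1], beta0[2], beta0[3], beta0[4], beta0[5]),
       "gradient")
# applying grad_at to the nine candidate points yields the 9 x 5 regressor matrix
```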

Figure 3: Support points (gray) of the $G$-optimal approximate design for model (30) (left), the design (black) that is $G$-optimal among all exact designs supported on these candidate points (middle), and the design (black) that is $G$-optimal among exact designs on these points that satisfy the cost constraint (31) with $p_{1}=2$, $p_{2}=1$ and $B=20$ (right).

Whereas the GEX algorithm took a few seconds and the subsequent MILP computations required less than one second, the KL exchange algorithm required 10 minutes to arrive at the same design. Consequently, for a model with multiple continuous factors, the direct application of an exchange heuristic would be infeasible; the current relatively simple model was chosen only so that the efficiency of the proposed approach could be easily demonstrated. Note also that the GEX algorithm works by using a dense discretization of the design space (in particular, we considered a grid of more than 20 million points), but any approximate design algorithm for continuous spaces can be used instead, such as particle swarm optimization (cf. Chen et al. (2022)). In addition, one may employ any nonlinear programming approach for a continuous domain to fine-tune the positions of the support points of the exact design determined by our approach.

Cost constraints

Because the MILP formulation allows for the inclusion of extra constraints, we can consider a cost constraint of the form

p_{1}\sum_{i=1}^{9}d_{i}x_{i,1}+p_{2}\sum_{i=1}^{9}d_{i}x_{i,2}\leq B, (31)

where $p_{1}$ and $p_{2}$ represent the costs of factors 1 and 2, respectively; $B$ is the total budget; and $\mathbf{x}_{1},\ldots,\mathbf{x}_{9}$ are the nine candidate points, with $\mathbf{x}_{i}=(x_{i,1},x_{i,2})^{\prime}$. We then use the MILP approach to find the design that is $G$-optimal among all the designs supported on the candidate points that satisfy (31) with $p_{1}=2$, $p_{2}=1$ and $B=20$. To construct the covariance bounds for the MILP formulation, we select the design $\mathbf{d}_{0}$ in which all 6 trials are assigned to the 6 candidate points that satisfy $x_{i,2}<2$. The optimal cost-constrained design given by the MILP algorithm is depicted in Figure 3 (right).
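As with the earlier constraints, (31) is a single extra row on the $\mathbf{d}$ block of the variable vector; a sketch in R (X9, a 9 x 2 matrix of the candidate points, is a hypothetical name):

```r
# Sketch: the cost constraint (31); here m = 5 parameters and 9 candidate points
p1 <- 2; p2 <- 1; B <- 20
cost_d   <- p1 * X9[, 1] + p2 * X9[, 2]           # coefficient of each d_i
cost_row <- c(rep(0, 9 * 5^2), cost_d, rep(0, 5^2))
# append to the MILP constraints: cost_row %*% x <= B
```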

5.3 Time complexity

To estimate the typical running time of the MILP computation, we consider an (artificial) “Gaussian” model

y_{i}=\mathbf{f}_{i}^{\prime}\boldsymbol{\beta}+\varepsilon_{i};\quad\mathbf{f}_{i}\in\{\mathbf{f}_{1},\ldots,\mathbf{f}_{n}\}, (32)

where the vectors $\mathbf{f}_{1},\ldots,\mathbf{f}_{n}$ are independently generated from the $m$-dimensional standardized Gaussian distribution $\mathcal{N}_{m}(\mathbf{0}_{m},\mathbf{I}_{m})$. Figure 4 shows the mean running times based on 10 runs of MILP computations of the $A$-optimal designs for various values of $N$, $n$ and $m$. To compare the MILP approach with its most natural competitor, the mean running times of ten runs of the MISOCP approach proposed by Sagnol and Harman (2015) are also included in this plot. Overall, the MILP algorithm is often faster than the MISOCP algorithm for smaller models, but it tends to fall behind in performance as $n$ and $m$ increase. Note that the two algorithms always returned designs with the same value of the optimality criterion, so we report only their respective computation times.
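Such candidate sets are trivial to generate; e.g., in R (a sketch with example sizes):

```r
# candidate regressors for the "Gaussian" model (32)
set.seed(1)                                        # for reproducibility
n <- 100; m <- 4                                   # example sizes
Fmat <- matrix(rnorm(n * m), nrow = n, ncol = m)   # row i is f_i' ~ N_m(0_m, I_m)
```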

Figure 4: Mean running times of 10 runs of the MILP (red circles) and MISOCP (blue triangles) algorithms for various models of the form (32). In each graph, one value ($N$, $n$ or $m$) is varied, while the other two remain fixed; the values of the fixed parameters are specified at the bottom of each graph. All times are in seconds.

We also consider three more realistic models: a basic linear regression model

y_{i}=\beta_{1}+\beta_{2}x_{i}+\varepsilon_{i},\quad x_{i}\in\{-1,-1+\frac{2}{n-1},\ldots,1-\frac{2}{n-1},1\}, (33)

with $n=31$ design points and $m=2$ parameters; the quadratic regression model (28) with $n=21$ and $m=3$; and a factorial main-effects model

y_{i}=\beta_{1}+\beta_{2}x_{i,1}+\ldots+\beta_{m}x_{i,m-1}+\varepsilon_{i};\quad x_{i,j}\in\{-1,1\} (34)

with $n=16$ and $m=5$. We compare the MILP and MISOCP computations of the $A$- and $I$-optimal designs for these models for selected numbers of observations $N$; the results are reported in Table 3.

Model                   A-opt.                        I-opt.
                        t_MILP         t_MISOCP       t_MILP         t_MISOCP
(33), m=2, n=21, N=6    0.20 (0.02)    0.26 (0.02)    1.05 (0.06)    0.25 (0.02)
(28), m=3, n=31, N=9    24.31 (2.12)   0.32 (0.07)    137.69 (7.17)  1.07 (0.06)
(34), m=5, n=16, N=7    2.67 (0.30)    20.49 (0.94)   2.84 (0.12)    20.46 (1.17)

Table 3: Running times of the MILP and MISOCP computations of $A$- and $I$-optimal designs for the given models. Each algorithm was run 10 times, and the means and standard deviations of these runs are reported in the form “mean (standard deviation)”. All times are in seconds.

The above examples suggest that the MILP approach is often slower than the MISOCP approach for larger $n$ and $m$, but not always. Moreover, even in situations where MILP is currently slower than MISOCP, this may change in the future, e.g., with new developments in MILP solvers (see the Discussion). It should also be noted that the comparisons were performed only for problems with an existing MISOCP formulation. For certain problems that can be solved via our MILP-based approach, such as $MV$-optimality problems and problems with covariance or design constraints, we are unaware of available MISOCP implementations; hence, the corresponding comparisons were not feasible.

6 Discussion

We have shown that a large class of optimal exact design problems can be formulated as mixed-integer linear programs. Although our implementation of the MILP computation of exact designs is generally comparable to the existing MISOCP approach, there are several important advantages of the MILP approach:

1. The MILP formulation is important from a theoretical perspective: it shows that many optimal design problems can be solved using solvers that are even more specialized than MISOCP solvers, continuing the “increasing specialization” trend of mixed-integer programming $\to$ mixed-integer second-order cone programming $\to$ mixed-integer linear programming.

2. We provide an MILP formulation for criteria and design problems for which, to our knowledge, no MISOCP formulation has been published; for example, $MV$-optimality and optimal designs under constraints on the (co)variances of the least-squares estimator.

3. MILP solvers are more readily available than MISOCP solvers. For example, there are freely available solvers for MILP in R (e.g., the package lpSolve), whereas the MISOCP solvers in R all seem to depend on commercial software (such as gurobi), although currently with free licenses for academics.

4. It is possible that the MILP algorithm may become a faster means of computing optimal designs than the MISOCP algorithm, with either further developments in MILP solvers, the use of different solvers, the application of particular tricks such as “symmetry breaking”, or perhaps a different MILP formulation. Moreover, in this study, we used only the default settings, as our aim was not to fine-tune the solver for particular problems but rather to examine its expected behavior; as such, it may be possible to increase the speed of the MILP computations by using customized solver settings.

It is therefore important to know that the considered optimal design problems can be expressed in MILP form at all so that, starting from this foundation, the formulation and implementation can be further improved.

It is also worth noting that methods with a guarantee of optimality (such as MILP and MISOCP) are generally limited to either small or medium-sized problems due to their time complexity, unlike various exchange-type heuristics. However, in classical statistical applications, the problem size is often not too large (e.g., the number $m$ of variables is usually less than $10$). In practice, therefore, we would recommend first attempting to use an “exact” method such as MILP or MISOCP for a given problem and applying a heuristic method only if the exact method takes too long to find a solution or if the problem is clearly too large. Moreover, the MILP and MISOCP formulations can accommodate additional constraints that are beyond the capabilities of the available heuristic methods.

We also emphasize that the MILP approach, similarly to other algorithms for discrete design spaces, can also be used for solving relevant subproblems even for continuous design spaces, as demonstrated in Section 5.2.

Finally, we note that the provided MILP formulation opens up interesting research avenues, e.g., the analytical and computational construction of better covariance bounds and the construction of designs $\mathbf{d}_{0}$ for arbitrary design or covariance constraints.

Declarations

Declarations of interest: none.

Acknowledgments

This work was supported by the Slovak Scientific Grant Agency [grant VEGA 1/0362/22].

Appendix A MILP formulation

Here, we give the exact mathematical form of the MILP formulation (9) that can be used as an input to MILP solvers. Note that we use the $\mathrm{vec}$ operator (the vectorization of a matrix obtained by stacking its columns); due to symmetry, the more efficient formulation with $\mathrm{vech}$ (the vector-half operator for symmetric matrices) for both $\boldsymbol{\Sigma}$ and $\mathbf{z}$ could be used instead, but the $\mathrm{vech}$ version seems too complicated to describe concisely here.

For simplicity, we first consider $A$-optimality, i.e., $\Phi_{\mathfrak{B}}(\mathbf{M})=\mathrm{tr}(\mathbf{M}^{-1})$. For $A$-optimality, formulation (8) is still linear and simpler than (9), so we actually describe (8) here. Let $\mathbf{c}=\mathrm{vec}(\boldsymbol{\Sigma})\in\mathbb{R}^{m^{2}}$, $\mathbf{M}_{i}=\mathbf{f}_{i}\mathbf{f}_{i}^{\prime}$, $\mathbf{Z}_{i}=(z_{ijk})_{j,k=1}^{m}\in\mathbb{R}^{m\times m}$ and $\mathbf{z}_{i}=\mathrm{vec}(\mathbf{Z}_{i})\in\mathbb{R}^{m^{2}}$ for $i\in\{1{:}n\}$, and let $\mathbf{z}=(\mathbf{z}_{1}^{\prime},\ldots,\mathbf{z}_{n}^{\prime})^{\prime}$. Overall, we have the vector of variables $\mathbf{x}=(\mathbf{z}^{\prime},\mathbf{d}^{\prime},\mathbf{c}^{\prime})^{\prime}$, where $\mathbf{z}\in\mathbb{R}^{nm^{2}}$, $\mathbf{d}\in\{0,1\}^{n}$ and $\mathbf{c}\in\mathbb{R}^{m^{2}}$. Below, we express the constraints and the objective function in (8) in terms of $\mathbf{x}$.

1. $\mathbf{M}\boldsymbol{\Sigma}=\mathbf{I}_{m}$: The left-hand side can be expressed as

\mathbf{M}\boldsymbol{\Sigma}=\sum_{i=1}^{n}d_{i}\mathbf{M}_{i}\boldsymbol{\Sigma}=\sum_{i=1}^{n}\mathbf{M}_{i}\mathbf{Z}_{i}.

By applying the $\mathrm{vec}$ operator, we obtain $\sum_{i}(\mathbf{I}_{m}\otimes\mathbf{M}_{i})\mathrm{vec}(\mathbf{Z}_{i})=\mathrm{vec}(\mathbf{I}_{m})$ because $\mathrm{vec}(\mathbf{A}\mathbf{B})=(\mathbf{I}\otimes\mathbf{A})\mathrm{vec}(\mathbf{B})$. It follows that $\mathbf{M}\boldsymbol{\Sigma}=\mathbf{I}_{m}$ can be expressed as

\begin{pmatrix}\mathbf{I}_{m}\otimes\mathbf{M}_{1}&\ldots&\mathbf{I}_{m}\otimes\mathbf{M}_{n}&\mathbf{0}_{m^{2}\times n}&\mathbf{0}_{m^{2}\times m^{2}}\end{pmatrix}\mathbf{x}=\mathrm{vec}(\mathbf{I}_{m})

(the matrices of zeros correspond to the $\mathbf{d}$ and $\mathbf{c}$ parts of $\mathbf{x}$).
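The vec identity used above is easy to verify numerically; e.g., in R (recall that c() applied to a matrix returns its vectorization):

```r
# check of vec(AB) = (I_m kron A) vec(B)
m <- 3
A <- matrix(rnorm(m^2), m)
B <- matrix(rnorm(m^2), m)
all.equal(c(A %*% B), c(kronecker(diag(m), A) %*% c(B)))   # TRUE
```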

2. $z_{ijk}-d_{i}L_{jk}\geq 0$: For each $i$, this can be expressed as $\mathrm{vec}(\mathbf{Z}_{i})-d_{i}\mathrm{vec}(\mathbf{L})\geq\mathbf{0}_{m^{2}}$, where $\mathbf{L}=(L_{jk})_{j,k=1}^{m}\in\mathbb{R}^{m\times m}$. It follows that $z_{ijk}-d_{i}L_{jk}\geq 0$ for all $i\in\{1{:}n\}$ is equivalent to

\begin{pmatrix}\mathbf{I}_{nm^{2}}&-\mathbf{I}_{n}\otimes\mathrm{vec}(\mathbf{L})&\mathbf{0}_{nm^{2}\times m^{2}}\end{pmatrix}\mathbf{x}\geq\mathbf{0}_{nm^{2}}.

3. The other constraints, (5)-(7), can be similarly expressed as

\begin{pmatrix}\mathbf{I}_{nm^{2}}&-\mathbf{I}_{n}\otimes\mathrm{vec}(\mathbf{U})&-\mathbf{1}_{n}\otimes\mathbf{I}_{m^{2}}\end{pmatrix}\mathbf{x}\geq-\mathbf{1}_{n}\otimes\mathrm{vec}(\mathbf{U}),
\begin{pmatrix}\mathbf{I}_{nm^{2}}&-\mathbf{I}_{n}\otimes\mathrm{vec}(\mathbf{U})&\mathbf{0}_{nm^{2}\times m^{2}}\end{pmatrix}\mathbf{x}\leq\mathbf{0}_{nm^{2}},
\begin{pmatrix}\mathbf{I}_{nm^{2}}&-\mathbf{I}_{n}\otimes\mathrm{vec}(\mathbf{L})&-\mathbf{1}_{n}\otimes\mathbf{I}_{m^{2}}\end{pmatrix}\mathbf{x}\leq-\mathbf{1}_{n}\otimes\mathrm{vec}(\mathbf{L}).

4. $\mathbf{1}_{n}^{\prime}\mathbf{d}=N$: This is clearly equivalent to $(\mathbf{0}_{nm^{2}}^{\prime},\mathbf{1}_{n}^{\prime},\mathbf{0}_{m^{2}}^{\prime})\mathbf{x}=N$.

5. The objective function for $A$-optimality is $\mathrm{tr}(\boldsymbol{\Sigma})=\mathbf{a}^{\prime}\mathbf{x}$, where $\mathbf{a}^{\prime}=(\mathbf{0}_{nm^{2}}^{\prime},\mathbf{0}_{n}^{\prime},\mathrm{vec}(\mathbf{I}_{m})^{\prime})$.

Constraints 1-4 and objective function 5 above form the MILP formulation for $A$-optimality. For a general $\Phi_{\mathfrak{B}}$ criterion, we have $\varphi\geq\mathrm{tr}(\mathbf{B}_{\ell}^{\prime}\boldsymbol{\Sigma}\mathbf{B}_{\ell})$ for $\ell\in\{1{:}K\}$ in the more general formulation (9). The vector $\mathbf{x}$ is then redefined as $\mathbf{x}=(\mathbf{z}^{\prime},\mathbf{d}^{\prime},\mathbf{c}^{\prime},\varphi)^{\prime}$, the objective function in item 5 above becomes $\mathbf{a}^{\prime}\mathbf{x}$ with $\mathbf{a}^{\prime}=(\mathbf{0}_{nm^{2}}^{\prime},\mathbf{0}_{n}^{\prime},\mathbf{0}_{m^{2}}^{\prime},1)$, and a column of zeros representing $\varphi$ is added to each left-hand-side matrix in constraints 1-4.

6. An additional set of constraints, representing $\varphi\geq\mathrm{tr}(\mathbf{B}_{\ell}^{\prime}\boldsymbol{\Sigma}\mathbf{B}_{\ell})$, also needs to be included. These constraints can be expressed by setting $\mathbf{G}_{\ell}=\mathbf{B}_{\ell}\mathbf{B}_{\ell}^{\prime}$ ($\ell\in\{1{:}K\}$):

\varphi\geq\mathrm{tr}(\mathbf{B}_{\ell}^{\prime}\boldsymbol{\Sigma}\mathbf{B}_{\ell})=\mathrm{tr}(\boldsymbol{\Sigma}\mathbf{G}_{\ell})=\mathrm{vec}(\mathbf{G}_{\ell})^{\prime}\mathrm{vec}(\boldsymbol{\Sigma}),

which means that the constraints are

\begin{pmatrix}\mathbf{0}_{nm^{2}}^{\prime}&\mathbf{0}_{n}^{\prime}&-\mathrm{vec}(\mathbf{G}_{1})^{\prime}&1\\ \vdots&\vdots&\vdots&\vdots\\ \mathbf{0}_{nm^{2}}^{\prime}&\mathbf{0}_{n}^{\prime}&-\mathrm{vec}(\mathbf{G}_{K})^{\prime}&1\end{pmatrix}\mathbf{x}\geq\mathbf{0}_{K}.
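To indicate how the pieces fit together computationally, the following R sketch assembles constraints 1-4 and objective 5 for $A$-optimality and passes them to a freely available MILP solver. We use the Rglpk package here purely for illustration (it is not the solver used in our experiments); the elementwise bounds $\mathbf{L}$ and $\mathbf{U}$ on $\boldsymbol{\Sigma}$ are assumed to have been precomputed, and only small instances are practical with this unoptimized construction:

```r
# Sketch of the A-optimality MILP (8); not the paper's Gurobi implementation
library(Rglpk)

a_optimal_milp <- function(Fmat, N, L, U) {
  n <- nrow(Fmat); m <- ncol(Fmat); m2 <- m^2
  nv <- n * m2 + n + m2                     # x = (z', d', c')'
  Im <- diag(m)
  # constraint 1: sum_i (I_m kron M_i) z_i = vec(I_m)
  A1 <- cbind(do.call(cbind, lapply(seq_len(n), function(i)
                kronecker(Im, tcrossprod(Fmat[i, ])))),
              matrix(0, m2, n), matrix(0, m2, m2))
  # constraints 2-3: McCormick inequalities linking z_i, d_i and c = vec(Sigma)
  In2 <- diag(n * m2)
  Zr  <- matrix(0, n * m2, m2)
  Cc  <- kronecker(rep(1, n), diag(m2))     # 1_n kron I_{m^2}
  A2 <- cbind(In2, -kronecker(diag(n), c(L)), Zr)   # z_i >= d_i vec(L)
  A3 <- cbind(In2, -kronecker(diag(n), c(U)), -Cc)  # z_i - d_i vec(U) - c >= -vec(U)
  A4 <- cbind(In2, -kronecker(diag(n), c(U)), Zr)   # z_i <= d_i vec(U)
  A5 <- cbind(In2, -kronecker(diag(n), c(L)), -Cc)  # z_i - d_i vec(L) - c <= -vec(L)
  # constraint 4: sum_i d_i = N
  A6 <- c(rep(0, n * m2), rep(1, n), rep(0, m2))
  mat <- rbind(A1, A2, A3, A4, A5, A6)
  dir <- c(rep("==", m2), rep(">=", 2 * n * m2), rep("<=", 2 * n * m2), "==")
  rhs <- c(c(Im), rep(0, n * m2), rep(-c(U), n),
           rep(0, n * m2), rep(-c(L), n), N)
  obj   <- c(rep(0, n * m2 + n), c(Im))     # objective 5: tr(Sigma)
  types <- c(rep("C", n * m2), rep("B", n), rep("C", m2))
  # z and c may be negative, so relax the default lower bound of 0 on them
  free   <- c(seq_len(n * m2), (n * m2 + n + 1):nv)
  bounds <- list(lower = list(ind = free, val = rep(-Inf, length(free))))
  res <- Rglpk_solve_LP(obj, mat, dir, rhs, bounds = bounds,
                        types = types, max = FALSE)
  list(d = res$solution[(n * m2 + 1):(n * m2 + n)], value = res$optimum)
}
```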

Appendix B Proofs

Here, we provide the nontrivial proofs of the lemmas and theorems presented in the paper.

Proof 1 (Lemma 1)

Let $\boldsymbol{\Sigma}\in\mathbb{S}^{m}_{+}$ be such that $\mathrm{tr}(\mathbf{B}_{\ell}^{\prime}\boldsymbol{\Sigma}\mathbf{B}_{\ell})\leq\alpha$ for all $\ell\in\{1{:}K\}$. Let $\mathbf{w}=(w_{1},\ldots,w_{K})^{\prime}\in\mathbb{R}^{K}$ be a vector with nonnegative components summing to $1$, and let $\mathbf{N}:=\mathbf{N}(\mathbf{w})=\sum_{\ell=1}^{K}w_{\ell}\mathbf{B}_{\ell}\mathbf{B}^{\prime}_{\ell}$. First, since $\max_{\ell}\mathrm{tr}(\mathbf{B}^{\prime}_{\ell}\boldsymbol{\Sigma}\mathbf{B}_{\ell})\leq\alpha$, we have

\mathrm{tr}(\mathbf{N}^{1/2}\boldsymbol{\Sigma}\mathbf{N}^{1/2})=\mathrm{tr}(\mathbf{N}\boldsymbol{\Sigma})=\sum_{\ell}w_{\ell}\mathrm{tr}(\mathbf{B}^{\prime}_{\ell}\boldsymbol{\Sigma}\mathbf{B}_{\ell})\leq\alpha. (35)

Consider an $m\times r$ matrix $\mathbf{X}$ such that $\mathcal{C}(\mathbf{X})\subseteq\mathcal{C}(\mathbf{N})$. We have

\mathrm{tr}(\mathbf{X}^{\prime}\boldsymbol{\Sigma}\mathbf{X}) (36)
=\mathrm{tr}(\mathbf{X}^{\prime}(\mathbf{N}^{1/2})^{+}\mathbf{N}^{1/2}\boldsymbol{\Sigma}\mathbf{N}^{1/2}(\mathbf{N}^{1/2})^{+}\mathbf{X}) (37)
=\mathrm{tr}([\mathbf{N}^{1/2}\boldsymbol{\Sigma}\mathbf{N}^{1/2}][(\mathbf{N}^{1/2})^{+}\mathbf{X}\mathbf{X}^{\prime}(\mathbf{N}^{1/2})^{+}]) (38)
\leq\mathrm{tr}(\mathbf{N}^{1/2}\boldsymbol{\Sigma}\mathbf{N}^{1/2})\lambda_{\max}((\mathbf{N}^{1/2})^{+}\mathbf{X}\mathbf{X}^{\prime}(\mathbf{N}^{1/2})^{+}) (39)
\leq\alpha\lambda_{\max}(\mathbf{X}^{\prime}\mathbf{N}^{+}\mathbf{X}), (40)

where (37) follows from $\mathbf{X}=\mathbf{N}^{1/2}(\mathbf{N}^{1/2})^{+}\mathbf{X}$, (38) is a basic property of the trace, (39) follows from $\mathrm{tr}(\mathbf{Q}\mathbf{H})\leq\mathrm{tr}(\mathbf{Q})\lambda_{\max}(\mathbf{H})$ for any $\mathbf{Q},\mathbf{H}\in\mathbb{S}^{m}_{+}$, and (40) is valid because of (35), $\lambda_{\max}(\mathbf{Y}\mathbf{Y}^{\prime})=\lambda_{\max}(\mathbf{Y}^{\prime}\mathbf{Y})$ for any matrix $\mathbf{Y}$, and $(\mathbf{N}^{1/2})^{+}(\mathbf{N}^{1/2})^{+}=\mathbf{N}^{+}$ (e.g., Example 7.54 (c) in Seber (2008)).

Proof 2 (Lemma 2)

We need only to prove that $\mathbf{e}_{j}\in\mathcal{C}(\mathbf{N}(\mathbf{w}^{(j+)}))$ for any $j\in\{1{:}m\}$; all other claims of the lemma are then trivial. We fix $j\in\{1{:}m\}$ and adopt the notations $\mathbf{w}:=(w_{1},\ldots,w_{K})^{\prime}:=\mathbf{w}^{(j+)}$ and $\mathbf{h}:=(\mathbf{h}^{\prime}_{1},\ldots,\mathbf{h}^{\prime}_{K})^{\prime}:=\mathbf{B}^{+}\mathbf{e}_{j}$, where $\mathbf{h}_{1}\in\mathbb{R}^{s_{1}},\ldots,\mathbf{h}_{K}\in\mathbb{R}^{s_{K}}$, and we set $\tilde{\mathbf{B}}=(\sqrt{w_{1}}\mathbf{B}_{1},\ldots,\sqrt{w_{K}}\mathbf{B}_{K})$. Because $\mathbf{e}_{j}\in\mathcal{C}(\mathbf{B}\mathbf{B}^{\prime})=\mathcal{C}(\mathbf{B})$, we have

\mathbf{e}_{j}=\mathbf{B}\mathbf{B}^{+}\mathbf{e}_{j}=\mathbf{B}\mathbf{h}=\sum_{\ell}\mathbf{B}_{\ell}\mathbf{h}_{\ell}=\sum_{\ell;\,\mathbf{h}_{\ell}\neq\mathbf{0}_{s_{\ell}}}\mathbf{B}_{\ell}\mathbf{h}_{\ell}=\sum_{\ell;\,w_{\ell}>0}\sqrt{w_{\ell}}\mathbf{B}_{\ell}(\mathbf{h}_{\ell}/\sqrt{w_{\ell}})=\tilde{\mathbf{B}}\tilde{\mathbf{h}},

where $\tilde{\mathbf{h}}=(\mathbf{h}^{\prime}_{1}/\sqrt{w_{1}},\ldots,\mathbf{h}^{\prime}_{K}/\sqrt{w_{K}})^{\prime}$ and $\mathbf{0}/0:=\mathbf{0}$. That is, $\mathbf{e}_{j}\in\mathcal{C}(\tilde{\mathbf{B}})$. However, $\mathcal{C}(\tilde{\mathbf{B}})=\mathcal{C}(\tilde{\mathbf{B}}\tilde{\mathbf{B}}^{\prime})=\mathcal{C}(\mathbf{N}(\mathbf{w}))$.

References

  • Ahipasaoglu, S.D., 2021. A branch-and-bound algorithm for the exact optimal experimental design problem. Statistics and Computing 31, 1–11.
  • Atkinson, A.C., Donev, A., Tobias, R., 2007. Optimum experimental designs, with SAS. Oxford University Press, New York.
  • Atkinson, A.C., Donev, A.N., 1996. Experimental design optimally balanced for trend. Technometrics 38, 333–341.
  • Atkinson, A.C., Woods, D.C., 2015. Designs for generalized linear models, in: Dean, A., Morris, M., Stufken, J., Bingham, D. (Eds.), Handbook of design and analysis of experiments. Chapman and Hall/CRC, New York, pp. 471–514.
  • Bailey, R.A., 2007. Designs for two-colour microarray experiments. Journal of the Royal Statistical Society: Series C 56, 365–394.
  • Bailey, R.A., Cameron, P.J., 2009. Combinatorics of optimal designs. Surveys in Combinatorics 365, 19–73.
  • Bhela, S., Deka, D., Nagarajan, H., Kekatos, V., 2019. Designing power grid topologies for minimizing network disturbances: An exact MILP formulation, in: 2019 American Control Conference, IEEE. pp. 1949–1956.
  • Bulutoglu, D.A., Margot, F., 2008. Classification of orthogonal arrays by integer programming. Journal of Statistical Planning and Inference 138, 654–666.
  • Chang, F.C., Yeh, Y.R., 1998. Exact A-optimal designs for quadratic regression. Statistica Sinica 8, 527–533.
  • Chen, P.Y., Chen, R.B., Wong, W.K., 2022. Particle swarm optimization for searching efficient experimental designs: A review. Wiley Interdisciplinary Reviews: Computational Statistics 14, e1578.
  • Cheng, C.S., 1981. Maximizing the total number of spanning trees in a graph: Two related problems in graph theory and optimum design theory. Journal of Combinatorial Theory, Series B 31, 240–248.
  • Duarte, B.P., Granjo, J.F., Wong, W.K., 2020. Optimal exact designs of experiments via mixed integer nonlinear programming. Statistics and Computing 30, 93–112.
  • Fedorov, V.V., 1972. Theory of optimal experiments. Academic Press, New York.
  • Fedorov, V.V., 1989. Optimal design with bounded density: optimization algorithms of the exchange type. Journal of Statistical Planning and Inference 22, 1–13.
  • Filová, L., Harman, R., 2020. Ascent with quadratic assistance for the construction of exact experimental designs. Computational Statistics 35, 775–801.
  • Fontana, R., 2013. Algebraic generation of minimum size orthogonal fractional factorial designs: an approach based on integer linear programming. Computational Statistics 28, 241–253.
  • Fu, A., Narasimhan, B., Boyd, S., 2020. CVXR: An R package for disciplined convex optimization. Journal of Statistical Software 94, 1–34.
  • Gaffke, N., 1986. On D-optimality of exact linear regression designs with minimum support. Journal of Statistical Planning and Inference 15, 189–204.
  • Gaffke, N., Krafft, O., 1982. Exact D-optimum designs for quadratic regression. Journal of the Royal Statistical Society: Series B 44, 394–397.
  • Ghosh, A., Boyd, S., Saberi, A., 2008. Minimizing effective resistance of a graph. SIAM Review 50, 37–66.
  • Goos, P., Jones, B., 2011. Optimal design of experiments: a case study approach. Wiley, Chichester.
  • Gurobi Optimization, LLC, 2023. Gurobi Optimizer Reference Manual. URL: https://www.gurobi.com.
  • Haines, L.M., 1987. The application of the annealing algorithm to the construction of exact optimal designs for linear-regression models. Technometrics 29, 439–447.
  • Haines, L.M., Clark, A.E., 2014. The construction of optimal designs for dose-escalation studies. Statistics and Computing 24, 101–109.
  • Harman, R., 2014. Multiplicative methods for computing D-optimal stratified designs of experiments. Journal of Statistical Planning and Inference 146, 82–94.
  • Harman, R., Bachratá, A., Filová, L., 2016. Construction of efficient experimental designs under multiple resource constraints. Applied Stochastic Models in Business and Industry 32, 3–17.
  • Harman, R., Filová, L., 2014. Computing efficient exact designs of experiments using integer quadratic programming. Computational Statistics and Data Analysis 71, 1159–1167.
  • Harman, R., Filová, L., 2019. OptimalDesign: A Toolbox for Computing Efficient Designs of Experiments. R package version 1.0.1.
  • Harman, R., Filová, L., Rosa, S., 2021. Optimal design of multifactor experiments via grid exploration. Statistics and Computing 31, 70.
  • Harman, R., Jurík, T., 2008. Computing c-optimal experimental designs using the simplex method of linear programming. Computational Statistics and Data Analysis 53, 247–254.
  • Huang, Y., Gilmour, S.G., Mylona, K., Goos, P., 2019. Optimal design of experiments for non-linear response surface models. Journal of the Royal Statistical Society: Series C 68, 623–640.
  • Imhof, L., 2000. Exact designs minimising the integrated variance in quadratic regression. Statistics 34, 103–115.
  • Imhof, L.A., 2014. G-optimal exact designs for quadratic regression. Journal of Statistical Planning and Inference 154, 133–140.
  • Khuri, A.I., Mukherjee, B., Sinha, B.K., Ghosh, M., 2006. Design issues for generalized linear models: A review. Statistical Science 21, 376–399.
  • Lin, C.D., Anderson-Cook, C.M., Hamada, M.S., Moore, L.M., Sitter, R.R., 2015. Using genetic algorithms to design experiments: A review. Quality and Reliability Engineering International 31, 155–167.
  • McCormick, G.P., 1976. Computability of global solutions to factorable nonconvex programs: Part I – convex underestimating problems. Mathematical Programming 10, 147–175.
  • Meyer, R.K., Nachtsheim, C.J., 1995. The coordinate-exchange algorithm for constructing exact optimal experimental designs. Technometrics 37, 60–69.
  • Mitchell, T.J., 1974. An algorithm for the construction of D-optimal experimental designs. Technometrics 16, 203–210.
  • Neubauer, M., Watkins, W., Zeitlin, J., 2000. D-optimal weighing designs for six objects. Metrika 52, 185–211.
  • Okasaki, C., Tóth, S.F., Berdahl, A.M., 2023. Optimal sampling design under logistical constraints with mixed integer programming. arXiv:2302.05553 [stat.ME].
  • Pázman, A., 1986. Foundation of Optimum Experimental Design. Reidel Publ., Dordrecht.
  • Pronzato, L., Pázman, A., 2013. Design of Experiments in Nonlinear Models. Springer, New York.
  • Pronzato, L., Zhigljavsky, A.A., 2014. Algorithmic construction of optimal designs on compact sets for concave and differentiable criteria. Journal of Statistical Planning and Inference 154, 141–155.
  • Pukelsheim, F., 1993. Optimal design of experiments. Wiley, New York.
  • Pukelsheim, F., Rieder, S., 1992. Efficient rounding of approximate designs. Biometrika 79, 763–770.
  • Rasch, D.A.M.K., Hendrix, E.M.T., Boer, E.P.J., 1997. Replication-free optimal designs in regression analysis. Computational Statistics 12, 19–52.
  • Sagnol, G., 2011. Computing optimal designs of multiresponse experiments reduces to second-order cone programming. Journal of Statistical Planning and Inference 141, 1684–1708.
  • Sagnol, G., Harman, R., 2015. Computing exact D-optimal designs by mixed integer second-order cone programming. The Annals of Statistics 43, 2198–2224.
  • Sartono, B., Goos, P., Schoen, E., 2015a. Constructing general orthogonal fractional factorial split-plot designs. Technometrics 57, 488–502.
  • Sartono, B., Schoen, E., Goos, P., 2015b. Blocking orthogonal designs with mixed integer linear programming. Technometrics 57, 428–439.
  • Seber, G.A., 2008. A Matrix Handbook for Statisticians. John Wiley & Sons, New Jersey.
  • Todd, M.J., 2016. Minimum-Volume Ellipsoids: Theory and Algorithms. SIAM, Philadelphia.
  • Vandenberghe, L., Boyd, S., 1999. Applications of semidefinite programming. Applied Numerical Mathematics 29, 283–299.
  • Vo-Thanh, N., Jans, R., Schoen, E.D., Goos, P., 2018. Symmetry breaking in mixed integer linear programming formulations for blocking two-level orthogonal experimental designs. Computers & Operations Research 97, 96–110.
  • Welch, W.J., 1982. Branch-and-bound search for experimental designs based on D-optimality and other criteria. Technometrics 24, 41–48.
  • Wong, W.K., 1992. A unified approach to the construction of minimax designs. Biometrika 79, 611–619.
  • Wong, W.K., Zhou, J., 2019. CVX-based algorithms for constructing various optimal regression designs. Canadian Journal of Statistics 47, 374–391.
  • Wong, W.K., Zhou, J., 2023. Using CVX to construct optimal designs for biomedical studies with multiple objectives. Journal of Computational and Graphical Statistics 32, 744–753.