
A general framework of rotational sparse approximation in uncertainty quantification

Mengqi Hu Yifei Lou Xiu Yang Corresponding author: xiy518@lehigh.edu Department of Industrial and Systems Engineering, Lehigh University, PA 18015, USA
Abstract

This paper proposes a general framework to estimate coefficients of generalized polynomial chaos (gPC) used in uncertainty quantification via rotational sparse approximation. In particular, we aim to identify a rotation matrix such that the gPC expansion of a set of random variables after the rotation has a sparser representation. However, the rotation alters the underlying linear system to be solved, which makes finding the sparse coefficients more difficult than in the case without rotation. To address this issue, we examine several popular nonconvex regularizations in compressive sensing (CS) that empirically outperform the classic \ell_{1} approach. All these regularizations can be minimized by the alternating direction method of multipliers (ADMM). Numerical examples show the superior performance of the proposed combination of rotation and nonconvex sparsity-promoting regularizations over approaches without rotation and over the rotational approach with the convex \ell_{1} regularization.

Keywords: Generalized polynomial chaos, uncertainty quantification, iterative rotations, compressive sensing, alternating direction method of multipliers, nonconvex regularization.

1 Introduction

A surrogate model (also known as a “response surface”) plays an important role in uncertainty quantification (UQ), as it can efficiently evaluate the quantity of interest (QoI) of a system given a set of inputs. Specifically, in parametric uncertainty studies, the input usually refers to a set of parameters in the system, while the QoI can be observables such as mass, density, pressure, velocity, or even a trajectory of a dynamical system. The uncertainty in the system’s parameters typically originates from a lack of physical knowledge, inaccurate measurements, etc. Therefore, it is common to treat these parameters as random variables, and statistics of the QoI with respect to such random parameters, e.g., its mean, variance, and probability density function (PDF), are crucial in understanding the behavior of the system.

The generalized polynomial chaos (gPC) expansion [23, 58] is a widely used surrogate model in applied mathematics and engineering, which uses orthogonal polynomials associated with the measures of the aforementioned random variables. Under some conditions, the gPC expansion converges to the QoI in a Hilbert space as the number of polynomials increases [9, 21, 42, 58]. Both intrusive methods (e.g., stochastic Galerkin) and non-intrusive methods (e.g., the probabilistic collocation method) [4, 23, 52, 57, 58] have been developed to compute the gPC coefficients. The latter are particularly desirable for studying complex systems, as they do not require modifying the computational models or simulation codes. For example, the gPC coefficients can be calculated from input samples and the corresponding outputs using least-squares fitting, the probabilistic collocation method, etc.

However, in many practical problems, it is prohibitive to obtain the large number of output samples required by non-intrusive methods, because it is costly to measure the QoI in experiments or to run simulations of a complicated model. Consequently, one must consider an under-determined linear system with matrix \mathbf{\Psi} of size M\times N, where M<N (or even M\ll N), M is the number of available output samples, and N is the number of basis functions used in the gPC expansion. When the solution to the under-determined system is sparse, compressive sensing (CS) techniques [8, 10, 11, 19] are effective. Recent studies have shown some success in applying CS to UQ problems [2, 20, 33, 34, 44, 49, 60, 61, 64]. For example, sampling strategies [3, 27, 30, 46] can improve the properties of \mathbf{\Psi} to guarantee sparse recovery via \ell_{1} minimization. Computationally, the weighted \ell_{1} minimization [1, 13, 43, 47, 64] assigns larger weights to smaller components (in magnitude) of the solution, and hence minimizing the weighted \ell_{1} norm leads to a sparser solution than the vanilla \ell_{1} minimization does. Besides, adaptive basis selection [3, 6, 16, 28, 29] as well as dimension reduction techniques [55, 67] can be adopted to reduce the number of unknown variables and thus improve computational efficiency.

In this paper, we focus on a sparsity-enhancing approach, referred to as iterative rotation [34, 65, 66, 68], that intrinsically changes the structure of a surrogate model to make the gPC coefficients sparser. However, this method tends to deteriorate properties of \mathbf{\Psi} that are favored by CS algorithms, e.g., low coherence, which may counteract the benefit of the enhanced sparsity. Since the polynomials in the gPC expansion may no longer be orthogonal after the rotation of the random variables, the coherence of \mathbf{\Psi} may not converge to zero asymptotically, leading to an amplified coherence after the rotation. To remedy this drawback, we combine the iterative rotation technique with a class of nonconvex regularizations to improve the efficiency of CS-based UQ methods. Specifically, our new approach uses rotations to increase the sparsity while taking advantage of the nonconvex formalism to handle a highly coherent matrix \mathbf{\Psi}. In this way, we leverage the advantages of both methods to exploit information from limited samples of the QoI more efficiently and to construct gPC expansions more accurately. The main contributions of this work are two-fold. On one hand, we propose a unified and flexible framework that combines iterative rotation and sparse recovery together with an efficient algorithm. On the other hand, we empirically validate a rule of thumb in CS that nonconvex regularizations often perform better than the convex approach.

The rest of the paper is organized as follows. We briefly review gPC, CS, and rotational CS in Section 2. We describe the combination of sparse signal recovery and rotation matrix estimation in Section 3. Section 4 is devoted to numerical examples, showing that the proposed approach significantly outperforms the state-of-the-art. Finally, conclusions are given in Section 5.

2 Prior works

In this section, we briefly review gPC expansions, useful concepts in CS, and our previous work [65, 68] on the rotational CS for gPC methods.

2.1 Generalized polynomial chaos expansions

We consider a QoI u that depends on location \bm{x}, time t, and a set of random variables \bm{\xi} with the following gPC expansion:

u(𝒙,t;𝝃)=n=1Ncn(𝒙,t)ψn(𝝃)+ε(𝒙,t;𝝃),u(\bm{x},t;\bm{\xi})=\sum_{n=1}^{N}c_{n}(\bm{x},t)\psi_{n}(\bm{\xi})+\varepsilon(\bm{x},t;\bm{\xi}), (1)

where cn(𝒙,t)=𝔼{u(𝒙,t;𝝃)ψn(𝝃)}c_{n}(\bm{x},t)=\mathbb{E}\{u(\bm{x},t;\bm{\xi})\psi_{n}(\bm{\xi})\} and ε\varepsilon denotes the truncation error. Here, {ψn}n=1N\{\psi_{n}\}_{n=1}^{N} are orthonormal with respect to the measure of 𝝃\bm{\xi}, i.e.,

dψi(𝒙)ψj(𝒙)ρ𝝃(𝒙)d𝒙=δij,\int_{\mathbb{R}^{d}}\psi_{i}(\bm{x})\psi_{j}(\bm{x})\rho_{\bm{\xi}}(\bm{x})\mathrm{d}\bm{x}=\delta_{ij}, (2)

where ρ𝝃(𝒙)\rho_{\bm{\xi}}(\bm{x}) is the PDF of 𝝃\bm{\xi}, δij\delta_{ij} is the Kronecker delta function, and we usually set ψ1(𝒙)1\psi_{1}(\bm{x})\equiv 1. We study systems relying on dd-dimensional i.i.d. random variables 𝝃=(ξ1,,ξd)\bm{\xi}=(\xi_{1},\dotsc,\xi_{d}) and the gPC basis functions are constructed by tensor products of univariate orthonormal polynomials associated with ξi\xi_{i}. Specifically for a multi-index 𝜶=(α1,,αd)\bm{\alpha}=(\alpha_{1},\dotsc,\alpha_{d}) with each αi{0}\alpha_{i}\in\mathbb{N}\cup\{0\}, we set

ψ𝜶(𝝃)=ψα1(ξ1)ψα2(ξ2)ψαd(ξd),\psi_{\bm{\alpha}}(\bm{\xi})=\psi_{\alpha_{1}}(\xi_{1})\psi_{\alpha_{2}}(\xi_{2})\cdots\psi_{\alpha_{d}}(\xi_{d}), (3)

where \psi_{\alpha_{i}} is the univariate orthonormal polynomial of degree \alpha_{i}. For two different multi-indices \bm{\alpha}=(\alpha_{1},\cdots,\alpha_{d}) and \bm{\beta}=(\beta_{1},\cdots,\beta_{d}), we have

dψ𝜶(𝒙)ψ𝜷(𝒙)ρ𝝃(𝒙)d𝒙=δ𝜶𝜷=δα1β1δα2β2δαdβd,\int_{\mathbb{R}^{d}}\psi_{\bm{\alpha}}(\bm{x})\psi_{\bm{\beta}}(\bm{x})\rho_{\bm{\xi}}(\bm{x})\mathrm{d}\bm{x}=\delta_{\bm{\alpha}\bm{\beta}}=\delta_{\alpha_{1}\beta_{1}}\delta_{\alpha_{2}\beta_{2}}\cdots\delta_{\alpha_{d}\beta_{d}}, (4)

with \rho_{\bm{\xi}}(\bm{x})=\rho_{\xi_{1}}(x_{1})\rho_{\xi_{2}}(x_{2})\cdots\rho_{\xi_{d}}(x_{d}). Typically, a p-th order gPC expansion involves all polynomials \psi_{\bm{\alpha}} satisfying |\bm{\alpha}|\leq p with |\bm{\alpha}|=\sum_{i=1}^{d}\alpha_{i}, which means that a total of N=\binom{p+d}{d} polynomials are used in the expansion. For simplicity, we reorder \bm{\alpha} so that \psi_{\bm{\alpha}} is indexed by \psi_{n}, consistent with Eq. (1).
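For concreteness, the following short Python sketch (illustrative only, not part of the original implementation) enumerates the total-degree multi-index set and checks that its cardinality equals N=\binom{p+d}{d}.

```python
from itertools import product
from math import comb

def total_degree_indices(d, p):
    """All d-dimensional multi-indices alpha with |alpha| = sum(alpha) <= p."""
    # Brute-force enumeration; fine for illustration, but exponential in d.
    return [alpha for alpha in product(range(p + 1), repeat=d) if sum(alpha) <= p]

d, p = 3, 4
indices = total_degree_indices(d, p)
assert len(indices) == comb(p + d, d)  # N = binom(p + d, d) = 35 here
print(len(indices))
```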

In this paper, we focus on time independent problems, in which the gPC expansion at a fixed location 𝒙\bm{x} is given by

u(𝝃q)=n=1Ncnψn(𝝃q)+ε(𝝃q),q=1,2,,M,u(\bm{\xi}^{q})=\sum_{n=1}^{N}c_{n}\psi_{n}(\bm{\xi}^{q})+\varepsilon(\bm{\xi}^{q}),\quad q=1,2,\cdots,M, (5)

where each \bm{\xi}^{q} is a sample of \bm{\xi}, e.g., \bm{\xi}^{1}=(\xi_{1}^{1},\cdots,\xi_{d}^{1}), and M is the number of samples. We rewrite the above expansion in matrix-vector notation, i.e.,

𝚿𝒄=𝒖𝜺,\mathbf{\Psi}\bm{c}=\bm{u}-\bm{\varepsilon}, (6)

where \bm{u}=(u^{1},\dotsc,u^{M})^{\mathsf{T}} is a vector of output samples, \bm{c}=(c_{1},\dotsc,c_{N})^{\mathsf{T}} is the vector of gPC coefficients, \mathbf{\Psi} is an M\times N matrix with \Psi_{ij}=\psi_{j}(\bm{\xi}^{i}), and \bm{\varepsilon}=(\varepsilon^{1},\dotsc,\varepsilon^{M})^{\mathsf{T}} is a vector of errors with \varepsilon^{q}=\varepsilon(\bm{\xi}^{q}). We are interested in identifying sparse coefficients \bm{c} among the solutions of the under-determined system (6) with M<N, which is the focus of compressive sensing [12, 17, 22].

2.2 Compressive sensing

We first review the concept of sparsity, which plays an important role in error estimation for solving the under-determined system (6). The number of nonzero entries of a vector \bm{c}=(c_{1},\dotsc,c_{N}) is denoted by \|\bm{c}\|_{0}. Note that \|\cdot\|_{0} is called the “\ell_{0} norm” in [17], although it is neither a norm nor a semi-norm. The vector \bm{c} is called s-sparse if \|\bm{c}\|_{0}\leq s, and it is considered sparse if s\ll N. Few practical systems have truly sparse gPC coefficients; instead, the coefficients are often compressible, i.e., only a few entries contribute significantly to the \ell_{1} norm. To this end, the best s-sparse approximation \bm{c}_{s} is obtained by setting all but the s largest (in magnitude) entries of \bm{c} to zero, and \bm{c} is regarded as sparse or compressible if \|\bm{c}-\bm{c}_{s}\|_{1} is small for s\ll N.

In order to find a sparse vector 𝒄\bm{c} from (6), one formulates the following problem,

𝒄^0=argmin𝒄12𝚿𝒄𝒖22+λ𝒄0,\hat{\bm{c}}_{0}=\arg\min_{\bm{c}}\frac{1}{2}\|\mathbf{\Psi}\bm{c}-\bm{u}\|_{2}^{2}+\lambda\|\bm{c}\|_{0}, (7)

where \lambda is a positive parameter tuned such that \|\mathbf{\Psi}\hat{\bm{c}}_{0}-\bm{u}\|_{2}\leq\epsilon. As the \ell_{0} minimization (7) is NP-hard [41], one often replaces \ell_{0} with the convex \ell_{1} norm, i.e.,

𝒄^1=argmin𝒄12𝚿𝒄𝒖22+λ𝒄1.\hat{\bm{c}}_{1}=\arg\min_{\bm{c}}\frac{1}{2}\|\mathbf{\Psi}\bm{c}-\bm{u}\|_{2}^{2}+\lambda\|\bm{c}\|_{1}. (8)

A sufficient condition for the \ell_{1} minimization to exactly recover a sparse signal was established based on the restricted isometry property (RIP) [11]. Unfortunately, the RIP is numerically unverifiable for a given matrix [5, 53]. A computable alternative for characterizing exact recovery by \ell_{1} is the coherence, defined as

μ(𝚿)=maxij|𝝍i,𝝍j|𝝍i𝝍j,with𝚿=[𝝍1,,𝝍N].\mu(\mathbf{\Psi})=\max_{i\neq j}\dfrac{|\langle\bm{\psi}_{i},\bm{\psi}_{j}\rangle|}{\|\bm{\psi}_{i}\|\|\bm{\psi}_{j}\|},\quad\mbox{with}\ \mathbf{\Psi}=[\bm{\psi}_{1},\cdots,\bm{\psi}_{N}]. (9)

Donoho-Elad [18] and Gribonval [24] proved independently that if

𝒄^10<12(1+1μ(𝚿)),\|\hat{\bm{c}}_{1}\|_{0}<\frac{1}{2}\left(1+\frac{1}{\mu(\mathbf{\Psi})}\right), (10)

then \hat{\bm{c}}_{1} is indeed the sparsest solution to (8). Although the inequality condition in (10) is not sharp, the coherence of a matrix \mathbf{\Psi} is often used as an indicator of how difficult it is to find a sparse vector from a linear system governed by \mathbf{\Psi}, in the sense that the larger the coherence is, the more challenging it is to recover the sparse vector \bm{c} [37, 70]. If the samples \bm{\xi}^{q} are drawn independently according to the distribution of \bm{\xi}, then \mu(\mathbf{\Psi}) for the gPC expansion converges to zero as M\rightarrow\infty [20]. However, the rotation technique tends to increase the coherence of the matrix, leading to unsatisfactory performance of the subsequent \ell_{1} minimization, as discussed in [68].
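As an illustration, the coherence (9) can be computed directly from the normalized columns of \mathbf{\Psi}. The sketch below (with an assumed univariate Legendre basis from numpy, not the paper's code) shows the computation; \mu(\mathbf{\Psi}) decreases toward zero as more independent samples are used.

```python
import numpy as np
from numpy.polynomial import legendre

def coherence(Psi):
    """Largest absolute cosine between distinct columns of Psi, Eq. (9)."""
    cols = Psi / np.linalg.norm(Psi, axis=0)   # normalize each column
    G = np.abs(cols.T @ cols)                  # Gram matrix of column cosines
    np.fill_diagonal(G, 0.0)                   # ignore the i = j terms
    return G.max()

# Toy example: univariate Legendre polynomials evaluated at M random points.
rng = np.random.default_rng(0)
M, N = 40, 20
xi = rng.uniform(-1.0, 1.0, size=M)
Psi = np.column_stack([legendre.Legendre.basis(j)(xi) for j in range(N)])
print(coherence(Psi))   # shrinks toward 0 as M grows
```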

In this work, we promote the use of nonconvex regularizations to find a sparse vector when the coherence of \mathbf{\Psi} is relatively large, referred to as a coherent linear system. There are many nonconvex alternatives to the \ell_{0} norm that empirically give superior results to the \ell_{1} norm, such as \ell_{1/2} [14, 59, 32], capped \ell_{1} [73, 50, 38], transformed \ell_{1} [39, 71, 72, 25], \ell_{1}-\ell_{2} [69, 37, 36], \ell_{1}/\ell_{2} [45, 56], and the error function (ERF) [26]. We formulate a general framework that works for any regularization whose proximal operator can be evaluated efficiently. Recall that the proximal operator \mathbf{prox}_{J}(\cdot) of a functional J(\cdot) is defined by

𝐩𝐫𝐨𝐱J(𝒄;μ)argminy(μJ(𝒚)+12𝒚𝒄22),\mathbf{prox}_{J}(\bm{c};\mu)\in\operatorname*{arg\,min}_{y}\Big{(}\mu J(\bm{y})+\frac{1}{2}\|\bm{y}-\bm{c}\|_{2}^{2}\Big{)}, (11)

where \mu is a positive parameter. We provide the formulas of the aforementioned regularizations together with their proximal operators as follows; a short code sketch of two of these operators is given after the list.

  • The 1\ell_{1} norm of 𝐲\mathbf{y} is 𝒚1\|\bm{y}\|_{1} with its proximal operator given by

    𝐩𝐫𝐨𝐱1(𝒚;μ)=sign(𝒚)max(|𝒚|μ,0),\displaystyle\mathbf{prox}_{\ell_{1}}(\bm{y};\mu)=\mbox{sign}(\bm{y})\circ\max(|\bm{y}|-\mu,0), (12)

    where \circ denotes the Hadamard (componentwise) product.

  • The \ell_{1/2} regularization is defined as \|\bm{y}\|_{1/2}^{1/2}=\sum_{n}\sqrt{|y_{n}|}, whose proximal operator [59] is expressed as

    \mathbf{prox}_{\ell_{1/2}}(\bm{y};\mu)=\frac{3\bm{y}}{4}\circ\left[\cos\left(\frac{\pi}{3}-\frac{\phi(\bm{y})}{3}\right)\right]^{2}\circ\max(\bm{y}-\frac{3}{4}\mu^{2/3},0), (13)

    where \phi(\bm{y})=\arccos\left(\frac{\mu}{8}\big(\frac{3}{|\bm{y}|}\big)^{3/2}\right) and the square is also computed componentwise.

  • Transformed \ell_{1} (TL1) is defined as \sum_{n}\frac{(\gamma+1)|y_{n}|}{\gamma+|y_{n}|} for a positive parameter \gamma, and its proximal operator [71] is given (componentwise) by

    \displaystyle\mathbf{prox}_{\mbox{TL1}}(\bm{y};\mu)=\begin{cases}\left[\displaystyle\frac{2}{3}(\gamma+|\bm{y}|)\cos\frac{\phi(\bm{y})}{3}-\frac{2}{3}\gamma+\frac{|\bm{y}|}{3}\right]&\text{if }|\bm{y}|>\theta\\ 0&\text{if }|\bm{y}|\leq\theta,\end{cases} (14)

    with

    \displaystyle\phi(\bm{y})=\arccos{\left(1-\frac{27\mu\gamma(\gamma+1)}{2(\gamma+|\bm{y}|)^{3}}\right)}\quad\mbox{and}\quad\theta=\begin{cases}\mu\frac{\gamma+1}{\gamma},&\text{if }\mu\leq\frac{\gamma^{2}}{2(\gamma+1)}\\ \sqrt{2\mu(\gamma+1)}-\frac{\gamma}{2},&\text{if }\mu>\frac{\gamma^{2}}{2(\gamma+1)}.\end{cases}
  • The \ell_{1}-\ell_{2} regularization is defined by \|\bm{y}\|_{1}-\|\bm{y}\|_{2}, and its proximal operator [36] is given by the following cases:

    • If \|\bm{y}\|_{\infty}>\mu, one has \mathbf{prox}_{\ell_{1-2}}(\bm{y};\mu)=\frac{\bm{z}(\|\bm{z}\|_{2}+\mu)}{\|\bm{z}\|_{2}}, where \bm{z}=\mathbf{prox}_{\ell_{1}}(\bm{y};\mu).

    • If \|\bm{y}\|_{\infty}\leq\mu, then \bm{c}^{*}:=\mathbf{prox}_{\ell_{1-2}}(\bm{y};\mu) is an optimal solution if and only if c^{*}_{i}=0 for |y_{i}|<\|\bm{y}\|_{\infty}, \|\bm{c}^{*}\|_{2}=\|\bm{y}\|_{\infty}, and c^{*}_{i}y_{i}\geq 0 for all i. This optimality condition admits infinitely many solutions \bm{c}^{*}, among which we choose c^{*}_{i}=\mbox{sign}(y_{i})\|\bm{y}\|_{\infty} for the smallest i satisfying |y_{i}|=\|\bm{y}\|_{\infty}, with the remaining coefficients set to zero.

  • The error function (ERF) [26] is defined by

    JσERF(𝒚):=j=1n0|yj|eτ2/σ2𝑑τ,J^{\mathrm{ERF}}_{\sigma}(\bm{y}):=\sum_{j=1}^{n}\int_{0}^{|y_{j}|}e^{-\tau^{2}/\sigma^{2}}d\tau, (15)

    for \sigma>0. Although there is no closed-form expression for its proximal operator, the solution can be found via Newton's method. In particular, the optimality condition of (11) for ERF reads

    \mathbf{v}\in\mu\partial J_{\sigma}^{\mathrm{ERF}}(\bm{y})+\bm{y}=\mu\exp\left(-\frac{\bm{y}^{2}}{\sigma^{2}}\right)\odot\partial|\bm{y}|+\bm{y}.

    When |v_{i}|\leq\mu, we have y_{i}=0. Otherwise, the optimality condition becomes

    v_{i}=\mu\exp\left(-\frac{y_{i}^{2}}{\sigma^{2}}\right)\mbox{sign}(v_{i})+y_{i},

    which can be solved for y_{i} by Newton's method.
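As referenced above, the following minimal Python sketch (our illustration, not the authors' implementation) shows two of these proximal operators: soft thresholding for \ell_{1}, Eq. (12), and the \ell_{1}-\ell_{2} operator following the two cases stated in the text.

```python
import numpy as np

def prox_l1(y, mu):
    """Componentwise soft thresholding, Eq. (12)."""
    return np.sign(y) * np.maximum(np.abs(y) - mu, 0.0)

def prox_l1_minus_l2(y, mu):
    """One optimal solution of the proximal problem for ||.||_1 - ||.||_2."""
    y_inf = np.linalg.norm(y, np.inf)
    if y_inf > mu:
        z = prox_l1(y, mu)                       # nonzero since ||y||_inf > mu
        return z * (np.linalg.norm(z) + mu) / np.linalg.norm(z)
    # ||y||_inf <= mu: put all the mass on the first largest-magnitude entry.
    c = np.zeros_like(y)
    if y_inf > 0.0:
        i = int(np.argmax(np.abs(y)))            # smallest index attaining the max
        c[i] = np.sign(y[i]) * y_inf
    return c

y = np.array([0.3, -1.5, 0.05, 2.0])
print(prox_l1(y, 0.5))              # [ 0.  -1.   0.   1.5]
print(prox_l1_minus_l2(y, 0.5))     # shrinks less than plain soft thresholding
```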

2.3 Rotational compressive sensing

To further enhance the sparsity, we aim to find a linear map 𝐀:dd\mathbf{A}:\mathbb{R}^{d}\mapsto\mathbb{R}^{d} such that a new set of random variables 𝜼,\bm{\eta}, given by

𝜼=𝐀𝝃,𝜼=(η1,η2,,ηd)𝖳,\bm{\eta}=\mathbf{A}\mathbf{\bm{\xi}},\quad\bm{\eta}=(\eta_{1},\eta_{2},\cdots,\eta_{d})^{\mathsf{T}}, (16)

leads to a sparser polynomial expansion than 𝝃\bm{\xi} does. We consider 𝐀\mathbf{A} as an orthogonal matrix, i.e., 𝐀𝐀𝖳=𝐈\mathbf{A}\mathbf{A}^{\mathsf{T}}=\mathbf{I} with the identity matrix 𝐈\mathbf{I}, such that the linear map from 𝝃\bm{\xi} to 𝜼\bm{\eta} can be regarded as a rotation in d\mathbb{R}^{d}. Therefore, the new polynomial expansion for uu is expressed as

u(𝝃)ug(𝝃)=n=1Nc~nψn(𝐀𝝃)=n=1Nc~nψn(𝜼)=vg(𝜼).u(\bm{\xi})\approx u_{g}(\bm{\xi})=\sum_{n=1}^{N}\tilde{c}_{n}\psi_{n}(\mathbf{A}\bm{\xi})=\sum_{n=1}^{N}\tilde{c}_{n}\psi_{n}(\bm{\eta})=v_{g}(\bm{\eta}). (17)

Here u_{g}(\bm{\xi}) can be understood as a polynomial u_{g}(\bm{x}) evaluated at the random variables \bm{\xi}, and similarly for v_{g}. Ideally, \tilde{\bm{c}} is sparser than \bm{c}. In previous works [34, 65], it is assumed that \bm{\xi}\sim\mathcal{N}(\bm{0},\mathbf{I}), so \bm{\eta}\sim\mathcal{N}(\bm{0},\mathbf{I}). In general cases where \{\xi_{i}\}_{i=1}^{d} are not i.i.d. Gaussian, \{\eta_{i}\}_{i=1}^{d} are not necessarily independent. Moreover, \{\psi_{n}\}_{n=1}^{N} are not necessarily orthogonal to each other with respect to \rho_{\bm{\eta}}. Therefore, v_{g}(\bm{\eta}) may not be a standard gPC expansion of v(\bm{\eta}), but rather a polynomial equivalent to u_{g}(\bm{\xi}) with potentially sparser coefficients [68].

We can identify 𝐀\mathbf{A} using the gradient information of uu based on the framework of active subspace [15, 48]. In particular, we define

𝐖=1M[u(𝝃1),u(𝝃2),,u(𝝃M)].\mathbf{W}=\dfrac{1}{\sqrt{M}}[\nabla u(\bm{\xi}^{1}),\nabla u(\bm{\xi}^{2}),\cdots,\nabla u(\bm{\xi}^{M})]. (18)

Note that 𝐖\mathbf{W} is a d×Md\times M matrix, and we consider MdM\geq d in this work. The singular value decomposition (SVD) of 𝐖\mathbf{W} yields

𝐖=𝐔𝚺𝐕𝖳,\mathbf{W}=\mathbf{U}\mathbf{\Sigma}\mathbf{V}^{\mathsf{T}}, (19)

where \mathbf{U} is a d\times d orthogonal matrix, \mathbf{\Sigma} is a d\times M matrix whose diagonal consists of the singular values \sigma_{1}\geq\cdots\geq\sigma_{d}\geq 0, and \mathbf{V} is an M\times M orthogonal matrix. We set the rotation matrix as \mathbf{A}=\mathbf{U}^{\mathsf{T}}. As such, the rotation projects \bm{\xi} onto the directions of the principal components of \nabla u. Of note, we do not use the information in the orthogonal matrix \mathbf{V}.

Unfortunately, u is unknown, and thus samples of \nabla u are not available. Instead, we approximate Eq. (18) using a computed solution u_{g} obtained by the \ell_{1} minimization [65]. In other words, we have

𝐖𝐖g=1M[ug(𝝃1),ug(𝝃2),,ug(𝝃M)],\mathbf{W}\approx\mathbf{W}_{g}=\dfrac{1}{\sqrt{M}}[\nabla u_{g}(\bm{\xi}^{1}),\nabla u_{g}(\bm{\xi}^{2}),\cdots,\nabla u_{g}(\bm{\xi}^{M})], (20)

and the rotation matrix is constructed based on the SVD of 𝐖g\mathbf{W}_{g}:

𝐖g=𝐔g𝚺g𝐕g𝖳,𝐀=𝐔g𝖳.\mathbf{W}_{g}=\mathbf{U}_{g}\mathbf{\Sigma}_{g}\mathbf{V}^{\mathsf{T}}_{g},\quad\mathbf{A}=\mathbf{U}_{g}^{\mathsf{T}}. (21)

Defining \bm{\eta}=\mathbf{A}\bm{\xi}, we compute the corresponding input samples as \bm{\eta}^{q}=\mathbf{A}\bm{\xi}^{q} and construct a new measurement matrix \mathbf{\Psi}(\bm{\eta}) as (\mathbf{\Psi}(\bm{\eta}))_{ij}=\psi_{j}(\bm{\eta}^{i}). We then solve the minimization problem in Eq. (8) to obtain \tilde{\bm{c}}. If some singular values of \mathbf{W}_{g} are much larger than the others, we can expect a sparser representation of u with respect to \bm{\eta}, which is dominated by the eigenspace associated with these larger singular values. On the other hand, if all the singular values \sigma_{i} are of the same order, the rotation does not enhance the sparsity. In practice, this method can be cast as an iterative algorithm in which \mathbf{A} and \tilde{\bm{c}} are updated alternately; a brief sketch of constructing \mathbf{A} from gradient samples is given below.
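The following minimal sketch (with synthetic gradient samples; the function name rotation_from_gradients is our illustrative choice, not from the paper) implements Eqs. (20)-(21): stack the gradient samples, take the SVD, and use the transposed left singular vectors as the rotation.

```python
import numpy as np

def rotation_from_gradients(grad_samples):
    """grad_samples: (d, M) array whose q-th column is grad u_g(xi^q)."""
    d, M = grad_samples.shape
    W = grad_samples / np.sqrt(M)                  # Eq. (20)
    U, S, Vt = np.linalg.svd(W, full_matrices=True)
    return U.T, S                                  # rotation A = U^T, singular values

# Toy check: for a ridge function u(xi) = g(w . xi), all gradients are
# multiples of w, so the first row of A should align with w / ||w||.
rng = np.random.default_rng(1)
d, M = 6, 50
w = rng.normal(size=d)
xi = rng.normal(size=(d, M))
grads = np.outer(w, np.cos(w @ xi))                # gradients of u(xi) = sin(w . xi)
A, S = rotation_from_gradients(grads)
print(np.abs(A[0] @ w) / np.linalg.norm(w))        # close to 1
print(S[:3])                                       # one dominant singular value
```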

It is worth noting that the idea of using a linear map to identify a possible low-dimensional structure is also used in sliced inverse regression (SIR) [35], active subspaces [15, 48], basis adaptation [54], etc., although the matrix is computed in different ways. In contrast to these methods, the iterative rotation approach does not truncate the dimension, in the sense that \mathbf{A} is a square matrix. Since an initial guess may not be sufficiently accurate, reducing the dimension before the iterations terminate may lead to suboptimal results. Dimension reduction was integrated with an iterative method in [62], while another iterative rotation method preceded by SIR-based dimension reduction was proposed in [67]. We refer interested readers to the respective literature.

3 The proposed approach

When applying the rotational CS technique, the measurement matrix \mathbf{\Psi}(\bm{\eta}) may become more coherent than \mathbf{\Psi}. This is because popular polynomials \psi_{i} used in gPC methods, e.g., Legendre and Laguerre polynomials, are not orthogonal with respect to the measure of \bm{\eta}, so \mu(\mathbf{\Psi}(\bm{\eta})) converges to a positive number instead of zero as \mu(\mathbf{\Psi}) does. In such a coherent regime, we advocate minimizing a nonconvex regularization to identify the sparse coefficients.

To start, we generate input samples \{\bm{\xi}^{q}\}_{q=1}^{M} from the distribution of \bm{\xi} and select the gPC basis functions \{\psi_{j}\}_{j=1}^{N} associated with \bm{\xi} to generate the measurement matrix \mathbf{\Psi} by setting \Psi_{ij}=\psi_{j}(\bm{\xi}^{i}) as in Eq. (6), while initializing \mathbf{A}^{(0)}=\mathbf{I} and \bm{\eta}^{(0)}=\bm{\xi}. We then propose an alternating direction method (ADM) that combines the nonconvex minimization and the rotation matrix estimation. Specifically, given \{\bm{\xi}^{q}\}_{q=1}^{M}, \{\psi_{j}\}_{j=1}^{N}, and \bm{u}:=\{u^{q}=u(\bm{\xi}^{q})\}_{q=1}^{M}, we formulate the following minimization problem,

argmin𝒄,𝐀λJ(𝒄)+12𝚿𝒄𝒖22,𝐀𝐀𝖳=𝐈 and 𝚿ij=ψj(𝐀𝝃i),\operatorname*{arg\,min}_{\bm{c},~{}\mathbf{A}}\lambda J(\bm{c})+\frac{1}{2}\|\mathbf{\Psi}\bm{c}-\bm{u}\|_{2}^{2},\quad\mathbf{A}\mathbf{A}^{\mathsf{T}}=\mathbf{I}\mbox{ and }\mathbf{\Psi}_{ij}=\psi_{j}(\mathbf{A}\bm{\xi}^{i}), (22)

where J(\cdot) denotes a regularization functional and \lambda>0 is a weighting parameter. For example, our early work [65] considered J(\bm{c})=\|\bm{c}\|_{1}, i.e., the \ell_{1} approach. We minimize Eq. (22) with respect to \bm{c} and \mathbf{A} in an alternating manner. When \mathbf{A} is fixed, we minimize the sparsity-promoting functional J(\cdot) to identify the sparse coefficients \bm{c}, as detailed in Section 3.1. When \bm{c} is fixed, optimizing \mathbf{A} directly is computationally expensive, unless a dimension reduction technique is used, e.g., as in [55]. Instead, we use the rotation estimation introduced in Section 2.3 to construct \mathbf{A}. Admittedly, this way of estimating \mathbf{A} may not be optimal, but it can promote the sparsity of \bm{c} and thus improve the accuracy of the sparse approximation of the gPC expansion.

3.1 Finding sparse coefficient via ADMM

We focus on the 𝒄\bm{c}-subproblem in (22), whose objective function is defined as

𝒄~:=argmin𝒄λJ(𝒄)+12𝚿𝒄𝒖22.\displaystyle\tilde{\bm{c}}:=\operatorname*{arg\,min}_{\bm{c}}\lambda J(\bm{c})+\frac{1}{2}\|\mathbf{\Psi}\bm{c}-\bm{u}\|_{2}^{2}. (23)

We adopt the alternating direction method of multipliers (ADMM) [7] to minimize (23). In particular, we introduce an auxiliary variable 𝐲\mathbf{y} and rewrite (23) as an equivalent problem,

min𝒄,𝒚λJ(𝒄)+12𝚿𝒚𝒖22s.t.𝒄=𝒚.\min_{\bm{c},\bm{y}}\lambda J(\bm{c})+\frac{1}{2}\|\mathbf{\Psi}\bm{y}-\bm{u}\|_{2}^{2}\quad\mbox{s.t.}\quad\bm{c}=\bm{y}. (24)

This new formulation (24) makes the objective function separable with respect to two variables 𝒄\bm{c} and 𝒚\bm{y} to enable efficient computation. Specifically, the augmented Lagrangian corresponding to (24) can be expressed as

Lρ(𝒄,𝒚;𝒘)=λJ(𝒄)+12𝚿𝒚𝒖22+𝒘,𝒄𝒚+ρ2𝒄𝒚22,L_{\rho}(\bm{c},\bm{y};\bm{w})=\lambda J(\bm{c})+\frac{1}{2}\|\mathbf{\Psi}\bm{y}-\bm{u}\|_{2}^{2}+\langle\bm{w},\bm{c}-\bm{y}\rangle+\frac{\rho}{2}\|\bm{c}-\bm{y}\|_{2}^{2}, (25)

where \bm{w} is a Lagrange multiplier and \rho is a positive parameter. Then the ADMM iterations, indexed by k, consist of three steps,

𝒄k+1\displaystyle\bm{c}_{k+1} =\displaystyle= argmin𝒄λJ(𝒄)+ρ2𝒄𝒚k+𝒘kρ22\displaystyle\operatorname*{arg\,min}_{\bm{c}}\lambda J(\bm{c})+\frac{\rho}{2}\|\bm{c}-\bm{y}_{k}+\frac{\bm{w}_{k}}{\rho}\|_{2}^{2} (26)
𝒚k+1\displaystyle\bm{y}_{k+1} =\displaystyle= argmin𝒚12𝚿𝒚𝒖22+ρ2𝒄k+1𝒚+𝒘kρ22\displaystyle\operatorname*{arg\,min}_{\bm{y}}\frac{1}{2}\|\mathbf{\Psi}\bm{y}-\bm{u}\|_{2}^{2}+\frac{\rho}{2}\|\bm{c}_{k+1}-\bm{y}+\frac{\bm{w}_{k}}{\rho}\|_{2}^{2} (27)
𝒘k+1\displaystyle\bm{w}_{k+1} =\displaystyle= 𝒘k+ρ(𝒄k+1𝒚k+1).\displaystyle\bm{w}_{k}+\rho(\bm{c}_{k+1}-\bm{y}_{k+1}). (28)

Depending on the choice of J(),J(\cdot), the 𝒄\bm{c}-update (26) can be given by its corresponding proximal operator, i.e.,

𝒄k+1=𝐩𝐫𝐨𝐱J(𝒚k𝒘kρ;λρ).\bm{c}_{k+1}=\mathbf{prox}_{J}\left(\bm{y}_{k}-\frac{\bm{w}_{k}}{\rho};\frac{\lambda}{\rho}\right). (29)

Please refer to Section 2.2 for the detailed formulas of the proximal operators.

As for the \bm{y}-update, we set the gradient of (27) with respect to \bm{y} to zero, which leads to

𝚿T(𝚿𝒚𝒖)+ρ(𝒚𝒄k+1𝒘kρ)=𝟎.\mathbf{\Psi}^{T}(\mathbf{\Psi}\bm{y}-\bm{u})+\rho(\bm{y}-\bm{c}_{k+1}-\frac{\bm{w}_{k}}{\rho})=\bm{0}. (30)

Therefore, the update for 𝐲\mathbf{y} is given by

𝒚k+1=(𝚿T𝚿+ρ𝐈)1(𝚿𝖳𝒖+ρ𝒄k+1+𝒘k).\displaystyle\bm{y}_{k+1}=(\mathbf{\Psi}^{T}\mathbf{\Psi}+\rho\mathbf{I})^{-1}\left(\mathbf{\Psi}^{\mathsf{T}}\bm{u}+\rho\bm{c}_{k+1}+\bm{w}_{k}\right). (31)

Note that \mathbf{\Psi}^{\mathsf{T}}\mathbf{\Psi}+\rho\mathbf{I} is a positive definite matrix, and there are many efficient numerical algorithms for the matrix inversion. Since \mathbf{\Psi} has more columns than rows in our case, we further use the Woodbury formula to speed up the computation,

\rho(\mathbf{\Psi}^{\mathsf{T}}\mathbf{\Psi}+\rho\mathbf{I})^{-1}=\mathbf{I}-\mathbf{\Psi}^{\mathsf{T}}(\mathbf{\Psi}\mathbf{\Psi}^{\mathsf{T}}+\rho\mathbf{I})^{-1}\mathbf{\Psi}, (32)

since \mathbf{\Psi}\mathbf{\Psi}^{\mathsf{T}} is of smaller dimension than \mathbf{\Psi}^{\mathsf{T}}\mathbf{\Psi} and hence cheaper to invert. In summary, the overall minimization algorithm based on ADMM is described in Algorithm 1; a small code sketch of the \bm{y}-update follows.
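The \bm{y}-update (31) combined with the Woodbury identity (32) can be coded in a few lines; the sketch below (illustrative, not the authors' code) verifies it against the direct N\times N solve.

```python
import numpy as np

def admm_y_update(Psi, u, c, w, rho):
    """Solve (Psi^T Psi + rho I) y = Psi^T u + rho c + w, Eq. (31)."""
    M, N = Psi.shape
    rhs = Psi.T @ u + rho * c + w
    # Woodbury, Eq. (32): rho (Psi^T Psi + rho I)^{-1} = I - Psi^T (Psi Psi^T + rho I)^{-1} Psi
    small = Psi @ Psi.T + rho * np.eye(M)          # only an M x M system, M << N
    return (rhs - Psi.T @ np.linalg.solve(small, Psi @ rhs)) / rho

# Consistency check against the direct N x N solve.
rng = np.random.default_rng(2)
M, N, rho = 30, 200, 0.7
Psi = rng.normal(size=(M, N))
u, c, w = rng.normal(size=M), rng.normal(size=N), rng.normal(size=N)
y_fast = admm_y_update(Psi, u, c, w, rho)
y_ref = np.linalg.solve(Psi.T @ Psi + rho * np.eye(N), Psi.T @ u + rho * c + w)
print(np.allclose(y_fast, y_ref))   # True
```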

Algorithm 1 The ADMM framework for solving a general sparse coding problem (23).
1:  Input: measurement matrix 𝚿\mathbf{\Psi} and observed data 𝒖\bm{u}.
2:  Parameters: λ\lambda, ρ\rho, ϵ+\epsilon\in\mathbb{R^{+}} and kmax+\text{k}_{\text{max}}\in\mathbb{Z}^{+}.
3:  Initialize: 𝒄,𝒚,𝒗,𝒘\bm{c},\bm{y},\bm{v},\bm{w} and k=0k=0.
4:  while kkmaxk\leq\text{k}_{\text{max}} or 𝒄k𝒄k1>ϵ\|\bm{c}_{k}-\bm{c}_{k-1}\|>\epsilon do
5:     𝒄k+1=𝐩𝐫𝐨𝐱J(𝒚k𝒘kρ;λρ)\bm{c}_{k+1}=\mathbf{prox}_{J}\left(\bm{y}_{k}-\frac{\bm{w}_{k}}{\rho};\frac{\lambda}{\rho}\right)
6:     𝒚k+1=(𝚿𝖳𝚿+ρ𝐈)1(𝚿𝖳𝒖+ρ𝒄k+1+𝒘k)\bm{y}_{k+1}=(\mathbf{\Psi}^{\mathsf{T}}\mathbf{\Psi}+\rho\mathbf{I})^{-1}(\mathbf{\Psi}^{\mathsf{T}}\bm{u}+\rho\bm{c}_{k+1}+\bm{w}_{k})
7:     𝒘k+1=𝒘k+ρ(𝒄k+1𝒚k+1)\bm{w}_{k+1}=\bm{w}_{k}+\rho(\bm{c}_{k+1}-\bm{y}_{k+1})
8:     k=k+1k=k+1
9:  end while
10:  Return 𝒄~=𝒄k\tilde{\bm{c}}=\bm{c}_{k}

3.2 Rotation matrix update

Suppose we obtain the gPC coefficients 𝒄~(l)\tilde{\bm{c}}^{(l)} at the ll-th iteration via Algorithm 1 with l1l\geq 1. Given vg(l)(𝜼)v_{g}^{(l)}(\bm{\eta}) with input samples {(𝜼(l))q}q=1M\{(\bm{\eta}^{(l)})^{q}\}_{q=1}^{M} for (𝜼(l))q=𝐀(l1)(𝜼(l1))q(\bm{\eta}^{(l)})^{q}=\mathbf{A}^{(l-1)}(\bm{\eta}^{(l-1)})^{q}, we collect the gradient of vg(l)v_{g}^{(l)}, denoted by

𝐖g(l)=1M[𝝃vg(l)((𝜼(l))1),,𝝃vg(l)((𝜼(l))M)],\mathbf{W}_{g}^{(l)}=\dfrac{1}{\sqrt{M}}\left[\nabla_{\bm{\xi}}v_{g}^{(l)}\left((\bm{\eta}^{(l)})^{1}\right),\cdots,\nabla_{\bm{\xi}}v_{g}^{(l)}\left((\bm{\eta}^{(l)})^{M}\right)\right], (33)

where \nabla_{\bm{\xi}}\cdot=(\partial\cdot/\partial\xi_{1},\partial\cdot/\partial\xi_{2},\cdots,\partial\cdot/\partial\xi_{d})^{\mathsf{T}}. It is straightforward to evaluate \nabla\psi_{n} at (\bm{\eta}^{(l)})^{q}, as we construct \psi_{n} using the tensor product of univariate polynomials (3), and the derivatives of widely used orthogonal polynomials, e.g., Hermite, Laguerre, Legendre, and Chebyshev, are well studied in the UQ literature. Here, we compute the gradient of v_{g}^{(l)} with respect to \bm{\xi} analytically using the chain rule,

\begin{split}&\nabla_{{}_{\bm{\xi}}}v_{g}^{(l)}\left((\bm{\eta}^{(l)})^{q}\right)=\nabla_{{}_{\bm{\xi}}}v_{g}^{(l)}\left(\mathbf{A}^{(l)}\bm{\xi}^{q}\right)=(\mathbf{A}^{(l)})^{\mathsf{T}}\nabla v_{g}^{(l)}(\bm{x})\bigg{|}_{\bm{x}=\mathbf{A}^{(l)}\bm{\xi}^{q}}\\ =&(\mathbf{A}^{(l)})^{\mathsf{T}}\nabla\sum_{n=1}^{N}\tilde{c}_{n}^{(l)}\psi_{n}(\bm{x})\bigg{|}_{\bm{x}=\mathbf{A}^{(l)}\bm{\xi}^{q}}=(\mathbf{A}^{(l)})^{\mathsf{T}}\sum_{n=1}^{N}\tilde{c}_{n}^{(l)}\nabla\psi_{n}(\bm{x})\bigg{|}_{\bm{x}=\mathbf{A}^{(l)}\bm{\xi}^{q}}.\end{split} (34)

Then, we update \mathbf{A}^{(l+1)}=\left(\mathbf{U}_{g}^{(l)}\right)^{\mathsf{T}}, where \mathbf{U}_{g}^{(l)} is obtained from the SVD of

𝐖g(l)=𝐔g(l)𝚺g(l)(𝐕g(l))𝖳.\mathbf{W}_{g}^{(l)}=\mathbf{U}_{g}^{(l)}\mathbf{\Sigma}_{g}^{(l)}\left(\mathbf{V}_{g}^{(l)}\right)^{\mathsf{T}}. (35)

Now we can define a new set of random variables \bm{\eta}^{(l+1)}=\mathbf{A}^{(l+1)}\bm{\xi} and compute their samples accordingly: (\bm{\eta}^{(l+1)})^{q}=\mathbf{A}^{(l+1)}\bm{\xi}^{q}. These samples are then used to construct a new measurement matrix \mathbf{\Psi}^{(l+1)} with \Psi^{(l+1)}_{ij}=\psi_{j}((\bm{\eta}^{(l+1)})^{i}), which is fed into Algorithm 1 to obtain \bm{c}^{(l+1)}. A small sketch of the gradient assembly (33)-(34) is given below.
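The gradient assembly (33)-(34) referenced above can be sketched as follows; the basis-gradient callable grad_psi is a hypothetical placeholder standing in for the analytical derivatives of the tensor-product basis.

```python
import numpy as np

def gradient_samples(A, xi_samples, c_tilde, grad_psi):
    """
    A          : (d, d) current rotation, eta = A xi
    xi_samples : (d, M) columns are the samples xi^q
    c_tilde    : (N,) current gPC coefficients
    grad_psi   : callable eta -> (N, d) gradients of all basis functions at eta
    Returns the (d, M) matrix W_g of Eq. (33) via the chain rule (34).
    """
    d, M = xi_samples.shape
    W = np.zeros((d, M))
    for q in range(M):
        eta_q = A @ xi_samples[:, q]
        g = grad_psi(eta_q).T @ c_tilde     # sum_n c_n grad psi_n(eta^q), shape (d,)
        W[:, q] = A.T @ g                   # chain rule, Eq. (34)
    return W / np.sqrt(M)                   # scaling in Eq. (33)

# Tiny 2D example with a hypothetical monomial basis {1, x1, x2, x1*x2}.
grad_psi = lambda x: np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [x[1], x[0]]])
A = np.eye(2)
xi = np.random.default_rng(3).normal(size=(2, 5))
c = np.array([0.0, 1.0, -2.0, 0.5])
print(gradient_samples(A, xi, c, grad_psi).shape)   # (2, 5)
```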

We summarize the entire procedure in Algorithm 2.

Algorithm 2 Alternating direction method of minimizing (22)
1:  Generate input samples {𝝃q}q=1M\{\bm{\xi}^{q}\}_{q=1}^{M} based on the distribution of 𝝃\bm{\xi}.
2:  Generate corresponding output samples 𝒖:={uq=u(𝝃q)}q=1M\bm{u}:=\{u^{q}=u(\bm{\xi}^{q})\}_{q=1}^{M} by solving the complete model, e.g., running simulations, solvers, etc.
3:  Select gPC basis functions \{\psi_{n}\}_{n=1}^{N} associated with \bm{\xi} and set the counter l=0. Set \mathbf{A}^{(0)}=\mathbf{I}, \tilde{\bm{c}}^{(0)}=\bm{0}, e=1 and \bm{\eta}^{(0)}=\bm{\xi}.
4:  Generate the measurement matrix 𝚿(l)\mathbf{\Psi}^{(l)} by setting Ψij(l)=ψj(𝝃i)\Psi_{ij}^{(l)}=\psi_{j}(\bm{\xi}^{i}).
5:  while l<lmaxl<l_{\max} and e103e\geq 10^{-3} do
6:     if l>0l>0 then
7:        Construct \mathbf{W}_{g}^{(l)} in (33) with v_{g}^{(l)}=\sum_{n=1}^{N}\tilde{c}_{n}^{(l)}\psi_{n}(\bm{\eta}^{(l)}).
8:        Compute SVD of 𝐖g(l)\mathbf{W}_{g}^{(l)}: 𝐖g(l)=𝐔g(l)𝚺g(l)(𝐕g(l))𝖳.\mathbf{W}_{g}^{(l)}=\mathbf{U}^{(l)}_{g}\mathbf{\Sigma}^{(l)}_{g}\left(\mathbf{V}^{(l)}_{g}\right)^{\mathsf{T}}.
9:        Set 𝐀(l+1)=(𝐔g(l))𝖳\mathbf{A}^{(l+1)}=\left(\mathbf{U}^{(l)}_{g}\right)^{\mathsf{T}} and 𝜼(l+1)=𝐀(l+1)𝝃\bm{\eta}^{(l+1)}=\mathbf{A}^{(l+1)}\bm{\xi}.
10:        Construct the new measurement matrix 𝚿(l+1)\mathbf{\Psi}^{(l+1)} with Ψij(l+1)=ψj((𝜼(l+1))i)\Psi^{(l+1)}_{ij}=\psi_{j}\left((\bm{\eta}^{(l+1)})^{i}\right).
11:     end if
12:     Solve the minimization problem via ADMM (Algorithm 1):
𝒄~(l+1)=argmin𝐜λJ(𝐜)+12𝚿(l+1)𝐜𝒖22.\tilde{\bm{c}}^{(l+1)}=\operatorname*{arg\,min}_{\mathbf{c}}\lambda J(\mathbf{c})+\frac{1}{2}\|\mathbf{\Psi}^{(l+1)}\mathbf{c}-\bm{u}\|_{2}^{2}.
13:     Calculate root square mean error of coefficients between two consecutive iterations
e=𝒄~(l+1)𝒄~(l)2𝒄~(l+1)2.e=\frac{\|\tilde{\bm{c}}^{(l+1)}-\tilde{\bm{c}}^{(l)}\|_{2}}{\|\tilde{\bm{c}}^{(l+1)}\|_{2}}.
14:     l=l+1l=l+1.
15:  end while
16:  Construct gPC expansion as u(𝝃)ug(𝝃)=vg(lmax)(𝜼(lmax))=n=1Nc~n(lmax)ψn(𝐀(lmax)𝝃)u(\bm{\xi})\approx u_{g}(\bm{\xi})=v_{g}^{(l_{\max})}(\bm{\eta}^{(l_{\max})})=\sum\limits_{n=1}^{N}\tilde{c}^{(l_{\max})}_{n}\psi_{n}(\mathbf{A}^{(l_{\max})}\bm{\xi}).

We stop the iterations when l reaches l_{\max} or when the relative difference of coefficients between two consecutive iterations drops below 10^{-3}. As the sparsity structure is problem-dependent (see the examples in Section 4), more iterations do not yield significant improvements for many practical problems.

4 Numerical Examples

In this section, we present four numerical examples to demonstrate the performance of the proposed method. Specifically, in Section 4.1, we examine a function with a low-dimensional structure, which has a truly sparse representation. Section 4.2 discusses an elliptic equation widely used in the UQ literature whose solution is not exactly sparse. The example discussed in Section 4.3 has two dominant directions, even though its solution is not exactly sparse either. Lastly, we present a high-dimensional example in Section 4.4. These examples revisit some numerical tests in previous works [65, 68], and hence provide a direct comparison between the newly proposed approaches and the original one.

We compare the proposed framework with the rotational \ell_{1} approach (by setting J(\bm{c})=\|\bm{c}\|_{1} in (22)) and with the approach without rotation (by setting l_{\max}=1 in Algorithm 2). The performance is evaluated in terms of the relative error (RE), defined as \frac{\|u-u_{g}\|_{2}}{\|u\|_{2}}, where u is the exact solution, u_{g} is the reconstructed gPC approximation of u, and the integral in the norm \|\cdot\|_{2} is approximated with a high-level sparse grid method based on one-dimensional Gaussian quadrature and the Smolyak structure [51].

We set l_{\max}=9. To tune the other two parameters (\lambda,\rho), we generate 10 independent random trials and start with a coarse grid of 10^{-4}, 10^{-3}, \cdots, 10^{1}. For each combination of \lambda and \rho in this coarse grid, we apply Algorithm 2 with every regularization functional mentioned in Section 2.2 and record all the relative errors. After finding the exponential order that achieves the smallest averaged RE over these random trials, we refine by multiplying with 0.5, 1, \cdots, 9.5 to find the parameter values, denoted by (\bar{\lambda},\bar{\rho}). We specify the parameter pair (\bar{\lambda},\bar{\rho}) for each test case in the corresponding section. After (\bar{\lambda},\bar{\rho}) are identified, we conduct another 100 independent random trials (not including the trials used for parameter tuning) and report the average RE. In practice, these parameters can be determined by k-fold cross-validation following [20]. Empirically, we did not observe a significant difference between these two parameter tuning procedures in our numerical experiments, so we used a fixed set of (\lambda,\rho) for all 100 trials in each example.

In the first example, we do not terminate the iterations early based on the criterion e<10^{-3} in Algorithm 2, but run all nine iterations to compare the performance of different approaches when a truly sparse representation is available after a proper rotation. In the remaining examples, we follow the termination criterion (i.e., line 5 in Algorithm 2).

4.1 Ridge function

Consider the following ridge function:

u(ξ)=i=1dξi+0.25(i=1dξi)2+0.025(i=1dξi)3.\displaystyle u(\mathbf{\xi})=\sum_{i=1}^{d}\xi_{i}+0.25\left(\sum_{i=1}^{d}\xi_{i}\right)^{2}+0.025\left(\sum_{i=1}^{d}\xi_{i}\right)^{3}. (36)

As \{\xi_{i}\}_{i=1}^{d} are equally important in this example, adaptive methods [40, 63, 74] that build surrogate models hierarchically based on the importance of \xi_{i} may not be effective. We consider a rotation matrix of the form

\mathbf{A}=\left(\begin{array}[]{cccc}d^{-\frac{1}{2}}&d^{-\frac{1}{2}}&\cdots&d^{-\frac{1}{2}}\\ \\ &\tilde{\mathbf{A}}&\\ &&\end{array}\right), (41)

where \tilde{\mathbf{A}} is a (d-1)\times d block designed to guarantee the orthogonality of \mathbf{A}, e.g., \mathbf{A} can be obtained by the Gram–Schmidt process. With this choice of \mathbf{A}, we have \eta_{1}=d^{-\frac{1}{2}}\sum_{i=1}^{d}\xi_{i}, and u can be represented as

u(𝝃)=v(𝜼)=d12η1+0.25dη12+0.025d32η13.\displaystyle u(\bm{\xi})=v(\bm{\eta})=d^{\frac{1}{2}}\eta_{1}+0.25d\eta_{1}^{2}+0.025d^{\frac{3}{2}}\eta_{1}^{3}. (42)

In the expression u(\bm{\xi})=\sum_{n=1}^{N}\tilde{c}_{n}\psi_{n}(\mathbf{A}\bm{\xi})=\sum_{n=1}^{N}\tilde{c}_{n}\psi_{n}(\bm{\eta}), all polynomials unrelated to \eta_{1} contribute nothing to the expansion, which guarantees the sparsity of \tilde{\bm{c}}=(\tilde{c}_{1},\dotsc,\tilde{c}_{N}). A sketch of one way to construct such an \mathbf{A} is given below.
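One possible construction (an assumption on our part; the paper only requires orthogonality of \mathbf{A}) is to place d^{-1/2}(1,\dots,1) as the first row and complete the matrix via a QR-based Gram–Schmidt step, as sketched below.

```python
import numpy as np

def rotation_with_first_row(v):
    """Orthogonal matrix whose first row is v / ||v||."""
    d = v.size
    B = np.eye(d)
    B[:, 0] = v / np.linalg.norm(v)          # put the desired direction first
    Q, _ = np.linalg.qr(B)                   # orthonormalize the columns
    if np.dot(Q[:, 0], B[:, 0]) < 0:         # QR may flip the sign of the first column
        Q[:, 0] = -Q[:, 0]
    return Q.T                               # rows of A are the orthonormal vectors

d = 12
A = rotation_with_first_row(np.ones(d))
print(np.allclose(A @ A.T, np.eye(d)))       # True: A is orthogonal
print(A[0])                                  # all entries equal d**-0.5
```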

Setting d=12 (hence the number of gPC basis functions is N=455 for p=3), we compare the accuracy of the gPC expansions computed by minimizing different regularization functionals with and without rotations. We consider gPC expansions using Legendre polynomials (assuming \xi_{i} are i.i.d. uniform random variables), Hermite polynomials (assuming \xi_{i} are i.i.d. Gaussian random variables), and Laguerre polynomials (assuming \xi_{i} are i.i.d. exponential random variables), respectively. The number of samples M ranges from 100 to 180, and each case is repeated 100 times to compute the average RE. The parameters (\bar{\lambda},\bar{\rho}) for all the regularization models are listed in Table 1.

Table 1: Parameters (\bar{\lambda},\bar{\rho}) for different regularization models in the case of the ridge function. Entries are (\bar{\lambda},\bar{\rho}).
\ell_{1}: Legendre (6\times 10^{-4}, 6\times 10^{-1}); Hermite (1\times 10^{-2}, 1\times 10^{-1}); Laguerre (5\times 10^{-2}, 5\times 10^{0})
\ell_{1/2}: Legendre (5\times 10^{-3}, 1.6\times 10^{1}); Hermite (1.8\times 10^{-2}, 6\times 10^{-1}); Laguerre (6\times 10^{-2}, 1.2\times 10^{0})
TL1: Legendre (1\times 10^{-8}, 1\times 10^{-7}); Hermite (1\times 10^{-8}, 1\times 10^{-7}); Laguerre (1\times 10^{-8}, 1\times 10^{-7})
ERF: Legendre (1.5\times 10^{-1}, 1.6\times 10^{3}); Hermite (1\times 10^{0}, 1.2\times 10^{1}); Laguerre (1.1\times 10^{0}, 1.5\times 10^{3})
\ell_{1}-\ell_{2}: Legendre (3\times 10^{-4}, 5\times 10^{-1}); Hermite (1\times 10^{-4}, 1\times 10^{-2}); Laguerre (5\times 10^{-1}, 1\times 10^{0})
Figure 1 (panels: (a) Legendre, (b) Hermite, (c) Laguerre): Relative errors of the Legendre, Hermite, and Laguerre polynomial expansions for the ridge function against the ratio M/N. Solid lines correspond to experiments with rotation; the dashed line is the reference \ell_{1} result without rotation. Markers: \circ for \ell_{1}, * for \ell_{1/2}, \square for TL1, \star for ERF, \Diamond for \ell_{1}-\ell_{2}.

Figure 1 plots the relative errors against the ratio M/N, showing that the nonconvex regularizations (except for \ell_{1/2}) outperform the convex \ell_{1} approach. In particular, the best regularization differs among polynomial types, but \ell_{1}-\ell_{2} and ERF generally perform very well (within the top two). TL1 works well for the Legendre and Hermite polynomials, while \ell_{1/2} is not numerically stable enough to guarantee satisfactory results. Please refer to Section 4.5 for an in-depth discussion of the performance of these regularizations with respect to the matrix coherence.

Figure 1 also indicates that the standard \ell_{1} minimization without rotation is not effective, as its relative error is close to 50% even when M/N approaches 0.4. Our iterative rotation methods with any of the regularizations yield much higher accuracy, especially for larger M. The reason can be partially explained by Figure 2. Specifically, panels (a), (b), and (c) of Figure 2 show the absolute values of the exact coefficients |c_{n}| for the Legendre, Hermite, and Laguerre polynomials, while panels (d), (e), and (f) show the corresponding coefficients |\tilde{c}_{n}| after 9 iterations of the \ell_{1}-\ell_{2} minimization using 120 samples, randomly chosen from the 100 independent experiments; we exclude \tilde{c}_{n} whose absolute value is smaller than 10^{-3}, since they are sufficiently small (more than two orders of magnitude smaller than the dominating ones). As demonstrated in Figure 2, the iterative update of the rotation matrix significantly sparsifies the representation of u, and as a result the efficiency of CS methods is substantially enhanced. Moreover, \tilde{c}_{n} for the Laguerre polynomials are not as sparse as those for the Legendre and Hermite polynomials. Consequently, the Laguerre polynomials show less accurate results in Figure 1(c) compared with the other two polynomial types in Figures 1(a) and (b). This phenomenon can be partially explained by the coherence of \mathbf{\Psi} for the different polynomials, i.e., the coherence in the Laguerre case is much larger (>0.94 before rotation and >0.99 after rotation) than in the other two cases, making any sparse regression algorithm less effective. For a more detailed discussion on coherence, please refer to Table 5 and Section 4.5.

Figure 2 (panels: (a) Legendre |c_{n}|, (b) Hermite |c_{n}|, (c) Laguerre |c_{n}|, (d) Legendre |\tilde{c}_{n}|, (e) Hermite |\tilde{c}_{n}|, (f) Laguerre |\tilde{c}_{n}|): (Ridge function) Absolute values of the exact coefficients c_{n} (first row) and of the coefficients \tilde{c}_{n} after 9 rotations with the \ell_{1}-\ell_{2} minimization (second row), using 120 samples.

4.2 Elliptic Equation

Next we consider a one-dimensional elliptic differential equation with a random coefficient [20, 65]:

ddx(a(x;𝝃)du(x;𝝃)dx)=1,\displaystyle-\frac{d}{dx}\left(a(x;\bm{\xi})\frac{du(x;\bm{\xi})}{dx}\right)=1, x(0,1)\displaystyle\quad x\in(0,1) (43)
u(0)=u(1)=0,\displaystyle u(0)=u(1)=0,

where a(x;𝝃)a(x;\bm{\xi}) is a log-normal random field based on Karhunen-Loève (KL) expansion:

a(x;𝝃)=a0(x)+exp(σi=1dλiϕi(x)ξi),a(x;\bm{\xi})=a_{0}(x)+\exp\left(\sigma\sum_{i=1}^{d}\sqrt{\lambda_{i}}\phi_{i}(x)\xi_{i}\right), (44)

{ξi}\{\xi_{i}\} are i.i.d. random variables, and {λi,ϕi(t)}i=1d\{\lambda_{i},\phi_{i}(t)\}_{i=1}^{d} are eigenvalues/eigenfunctions (in a descending order in λi\lambda_{i}) of an exponential covariance kernel:

C(x,x^{\prime})=\exp\left(-\dfrac{|x-x^{\prime}|}{l_{c}}\right). (45)

The values of \lambda_{i} and the analytical expressions for \phi_{i} are given in [31]. We set a_{0}(x)\equiv 0.1, \sigma=0.5, l_{c}=0.2, and d=15 such that \sum_{i=1}^{d}\lambda_{i}>0.93\sum_{i=1}^{\infty}\lambda_{i}. For each input sample \bm{\xi}^{q}, the solution of the deterministic elliptic equation can be obtained as [64]:

u(x)=u(0)+0xa(0)u(0)ya(y)dy.u(x)=u(0)+\int_{0}^{x}\dfrac{a(0)u(0)^{\prime}-y}{a(y)}\mathrm{d}y. (46)

By imposing the boundary condition u(0)=u(1)=0u(0)=u(1)=0, we can compute a(0)u(0)a(0)u(0)^{\prime} as

a(0)u(0)=(01ya(y)dy)/(011a(y)dy).a(0)u(0)^{\prime}=\left(\int_{0}^{1}\dfrac{y}{a(y)}\mathrm{d}y\right)\Big{/}\left(\int_{0}^{1}\dfrac{1}{a(y)}\mathrm{d}y\right). (47)
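A minimal sketch of evaluating (46)-(47) for a single realization is given below; the coefficient a(x) used here is a smooth placeholder standing in for a KL realization of Eq. (44).

```python
import numpy as np
from scipy.integrate import quad

def solve_elliptic(a, x_qoi=0.35):
    """u(x) = int_0^x (a(0)u'(0) - y)/a(y) dy with u(0) = u(1) = 0."""
    num, _ = quad(lambda y: y / a(y), 0.0, 1.0)
    den, _ = quad(lambda y: 1.0 / a(y), 0.0, 1.0)
    au0 = num / den                                       # a(0) u'(0), Eq. (47)
    u, _ = quad(lambda y: (au0 - y) / a(y), 0.0, x_qoi)   # Eq. (46), u(0) = 0
    return u

# Placeholder coefficient: smooth and positive, loosely mimicking a_0 + exp(...).
a = lambda x: 0.1 + np.exp(0.5 * np.sin(2 * np.pi * x))
print(solve_elliptic(a))
```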

The integrals in Eqs. (46) and (47) are computed by highly accurate numerical integration. For this example, we choose the QoI to be u(x;\bm{\xi}) at x=0.35. We aim to build a third-order Legendre (or Hermite) polynomial expansion, which includes N=816 basis functions. The relative error is approximated by a level-6 sparse grid method. The parameters (\bar{\lambda},\bar{\rho}) are given in Table 2.

Table 2: Parameters (\bar{\lambda},\bar{\rho}) for different regularization models in the case of the elliptic equation. Entries are (\bar{\lambda},\bar{\rho}).
\ell_{1}: Legendre (6\times 10^{-4}, 1\times 10^{-1}); Hermite (1.3\times 10^{-3}, 3\times 10^{0})
\ell_{1/2}: Legendre (8\times 10^{-5}, 1\times 10^{2}); Hermite (3\times 10^{-4}, 1.2\times 10^{2})
TL1: Legendre (1\times 10^{-8}, 1\times 10^{-7}); Hermite (1\times 10^{-8}, 1\times 10^{-7})
ERF: Legendre (2\times 10^{-2}, 2\times 10^{4}); Hermite (9\times 10^{-3}, 1.3\times 10^{3})
\ell_{1}-\ell_{2}: Legendre (1\times 10^{-3}, 6\times 10^{-2}); Hermite (1.9\times 10^{-3}, 2.3\times 10^{-2})

Relative errors of the Legendre and Hermite polynomial expansions are presented in Figure 3. All the methods with rotation perform almost the same, except that \ell_{1/2} yields unstable results, especially when the sample size is small. In this case, we do not observe a significant improvement of the nonconvex regularizations over the convex \ell_{1} model, as opposed to Figure 1, although \ell_{1}-\ell_{2} and ERF still perform the best for both Legendre and Hermite polynomials. Here the solution does not have an underlying low-dimensional structure under rotation as in the previous example, and the truncation error is nonzero, which is common in practical problems. This is why the improvement from the rotational method is minor.

Figure 3 (panels: (a) Legendre, (b) Hermite): Relative errors of the Legendre and Hermite polynomial expansions for the elliptic equation against the ratio M/N. Solid lines correspond to experiments with rotation; the dashed line is the reference \ell_{1} result without rotation. Markers: \circ for \ell_{1}, * for \ell_{1/2}, \square for TL1, \star for ERF, \Diamond for \ell_{1}-\ell_{2}.

4.3 Korteweg-de Vries equation

As an example of a more complicated and nonlinear differential equation, we consider the Korteweg-de Vries (KdV) equation with time-dependent additive noise,

\begin{array}[]{l}u_{t}(x,t;\bm{\xi})-6u(x,t;\bm{\xi})u_{x}(x,t;\bm{\xi})+u_{xxx}(x,t;\bm{\xi})=f(t;\bm{\xi}),\quad x\in(-\infty,\infty)\\ u(x,0;\bm{\xi})=-2\operatorname{sech}^{2}(x).\end{array} (48)

Here f(t;ξ)f(t;\mathbf{\xi}) is modeled as a random field represented by the Karhunen-Loève expansion:

f(t;𝝃)=σi=1dλiϕi(t)ξi,\displaystyle f(t;\bm{\xi})=\sigma\sum_{i=1}^{d}\sqrt{\lambda_{i}}\phi_{i}(t)\xi_{i}, (49)

where \sigma is a constant and \{\lambda_{i},\phi_{i}(t)\}_{i=1}^{d} are the eigenvalues/eigenfunctions (in descending order) of the exponential covariance kernel in Eq. (45). We set l_{c}=0.25 and d=10 in (45) such that \sum_{i=1}^{d}\lambda_{i}>0.96\sum_{i=1}^{\infty}\lambda_{i}. Under this setting, the analytical solution is given by

u(x,t;𝝃)=σi=1dλiξi0tϕi(y)dy2sech2(x4t+6σi=1dλiξi0t0zϕi(y)dydz).\displaystyle u(x,t;\bm{\xi})=\sigma\sum_{i=1}^{d}\sqrt{\lambda_{i}}\xi_{i}\int_{0}^{t}\phi_{i}(y)\mathrm{d}y-2\operatorname{sech}^{2}\left(x-4t+6\sigma\sum_{i=1}^{d}\sqrt{\lambda_{i}}\xi_{i}\int_{0}^{t}\int_{0}^{z}\phi_{i}(y)\mathrm{d}y\mathrm{d}z\right). (50)

We choose the QoI to be u(x,t;\bm{\xi}) at x=6, t=1, with \sigma=0.4. Thanks to the analytical expressions of \phi_{i}(x), we can compute the integrals in (50) with high accuracy. Denoting

Ai\displaystyle A_{i} =λi01ϕi(y)dyandBi=λi010zϕi(y)dydz,i=1,2,,d,\displaystyle=\sqrt{\lambda_{i}}\int_{0}^{1}\phi_{i}(y)\mathrm{d}y\quad\mbox{and}\quad B_{i}=\sqrt{\lambda_{i}}\int_{0}^{1}\int_{0}^{z}\phi_{i}(y)\mathrm{d}y\mathrm{d}z,\quad i=1,2,\cdots,d, (51)

the analytical solution can be written as

u(x,t;𝝃)|x=6,t=1=σi=1dAiξi2sech2(2+6σi=1dBiξi).\displaystyle\left.u(x,t;\bm{\xi})\right|_{x=6,t=1}=\sigma\sum_{i=1}^{d}A_{i}\xi_{i}-2\operatorname{sech}^{2}\left(2+6\sigma\sum_{i=1}^{d}B_{i}\xi_{i}\right). (52)

We use a fourth-order gPC expansion to approximate the solution, i.e., p=4, and the number of gPC basis functions is N=1001. The experiment is repeated 50 times to compute the average relative errors for each gPC expansion. The parameters chosen for the different regularizations are given in Table 3. Relative errors of the Legendre and Hermite polynomial expansions are presented in Figure 4, which illustrates that the combined method of iterative rotation and nonconvex minimization outperforms the simple \ell_{1} approaches. The coherence \mu of \mathbf{\Psi} for the Legendre polynomials is around 0.6/0.85 before/after the rotation, while it exceeds 0.92 for the Hermite polynomials. In such a highly coherent regime, \ell_{1/2} does not work very well, and the other nonconvex regularizations, i.e., ERF, TL1, and \ell_{1}-\ell_{2}, perform better than \ell_{1} in the Hermite case, especially ERF.

Table 3: Parameters (\bar{\lambda},\bar{\rho}) for different regularization models in the case of the KdV equation. Entries are (\bar{\lambda},\bar{\rho}).
\ell_{1}: Legendre (1\times 10^{-4}, 1\times 10^{-2}); Hermite (1\times 10^{-2}, 4\times 10^{-4})
\ell_{1/2}: Legendre (5\times 10^{-3}, 1.6\times 10^{1}); Hermite (1.8\times 10^{-2}, 6\times 10^{-1})
TL1: Legendre (1\times 10^{-8}, 1\times 10^{-7}); Hermite (1\times 10^{-8}, 1\times 10^{-7})
ERF: Legendre (1.5\times 10^{-1}, 1.6\times 10^{3}); Hermite (1\times 10^{0}, 1.2\times 10^{1})
\ell_{1}-\ell_{2}: Legendre (1\times 10^{-4}, 1\times 10^{-2}); Hermite (1\times 10^{-4}, 1\times 10^{-2})
Figure 4 (panels: (a) Legendre, (b) Hermite): (KdV equation) Relative errors of the Legendre and Hermite polynomial expansions for the KdV equation against the ratio M/N. Solid lines correspond to experiments with rotation; the dashed line is the reference \ell_{1} result without rotation. Markers: \circ for \ell_{1}, * for \ell_{1/2}, \square for TL1, \star for ERF, \Diamond for \ell_{1}-\ell_{2}.

4.4 High-dimensional function

We illustrate the potential of the proposed approach for dealing with higher-dimensional problems, referred to as the HD function. Specifically, we select a function similar to the one in Section 4.1 but with a much higher dimension,

u(𝝃)=i=1dξi+0.25(i=1dξi/i)2,d=100.\displaystyle u(\bm{\xi})=\sum_{i=1}^{d}\xi_{i}+0.25\left(\sum_{i=1}^{d}\xi_{i}/\sqrt{i}\right)^{2},\quad d=100. (53)

The total number of basis functions for this example is N=5151. The experiment is repeated 20 times to compute the average relative errors for each polynomial type. The parameters for this set of experiments are given in Table 4. The results are presented in Figure 5, showing that all nonconvex methods with rotation outperform the \ell_{1} approach, except for ERF with the Hermite basis when the sample size is small. Different from the previous examples (ridge, elliptic, and KdV), \ell_{1/2} achieves the best result here, as the corresponding coherence is relatively small, around 0.2/0.3 before/after rotation for the Legendre polynomials and 0.3 for the Hermite polynomials. In addition, Figure 5(b) suggests that the ERF method is stable with respect to the sampling ratio for the Hermite basis.

Table 4: Parameters (\bar{\lambda},\bar{\rho}) for different regularization models in the case of the HD function. Entries are (\bar{\lambda},\bar{\rho}).
\ell_{1}: Legendre (1.2\times 10^{0}, 5\times 10^{-2}); Hermite (1\times 10^{-2}, 1\times 10^{-3})
\ell_{1/2}: Legendre (1\times 10^{-1}, 1\times 10^{2}); Hermite (5\times 10^{-2}, 1\times 10^{2})
TL1: Legendre (1\times 10^{-6}, 1\times 10^{-7}); Hermite (4\times 10^{-6}, 1\times 10^{-7})
ERF: Legendre (1.5\times 10^{-1}, 1.6\times 10^{3}); Hermite (5\times 10^{-2}, 1\times 10^{2})
\ell_{1}-\ell_{2}: Legendre (5\times 10^{-2}, 1\times 10^{2}); Hermite (1\times 10^{-3}, 9\times 10^{-2})
Figure 5 (panels: (a) Legendre, (b) Hermite): (High-dimensional function) Relative errors of the Legendre and Hermite polynomial expansions for the high-dimensional function against the sampling ratio M/N. Solid lines correspond to experiments with rotation; the dashed line is the reference \ell_{1} result without rotation. Markers: \circ for \ell_{1}, * for \ell_{1/2}, \square for TL1, \star for ERF, \Diamond for \ell_{1}-\ell_{2}. All results with rotations are plotted with a threshold of 10^{-3}.

4.5 Discussion

We now discuss the effects of the coherence and of the number of rotations on the performance of the \ell_{1} and nonconvex approaches. As reported in [26, 37, 70], the \ell_{1}-\ell_{2} and ERF methods perform particularly well for coherent matrices (i.e., large \mu), while \ell_{p} performs well for incoherent matrices (i.e., small \mu), which motivates us to compute the coherence values reported in Tables 5 and 6, obtained with the \ell_{1}-\ell_{2} method for the ridge function and for the elliptic/KdV/HD examples, respectively. Here, we use \mu at each iteration of \ell_{1}-\ell_{2} as an example; its values for the other regularizations (including \ell_{1}) are similar. Both tables confirm that applying the rotation increases the coherence level of the sensing matrix \mathbf{\Psi}, except for the Hermite basis. As shown in the numerical examples, when the coherence is large (e.g., around 0.9 or even larger, as for the Hermite polynomials in the KdV example), \ell_{1}-\ell_{2} and ERF perform better than the convex \ell_{1} method. When the coherence is small (e.g., \lesssim 0.3 in the high-dimensional case), \ell_{1/2} gives the best results among all the competing methods. On the other hand, \ell_{1/2} may lead to unstable and unsatisfactory results, sometimes even worse than the convex \ell_{1} method, when the coherence of the sensing matrix is large. In the extreme case when the coherence is close to 1 (e.g., Laguerre polynomials for the ridge function), the best result is only slightly better than \ell_{1}, a regime in which it seems difficult for any sparse recovery algorithm to succeed. This series of observations coincides with the empirical performance in CS, i.e., \ell_{1/2} works best for incoherent matrices, while \ell_{1}-\ell_{2} and ERF work better for coherent ones.

Table 5: Average coherence of the matrix \mathbf{\Psi} (size 160×455) for the ridge function.

Rotations   Legendre   Hermite   Laguerre
0           0.4692     0.7622    0.9448
3           0.6833     0.7527    0.9910
6           0.6822     0.7519    0.9911
9           0.6762     0.7657    0.9911
Table 6: Average coherence of the matrix \mathbf{\Psi} for the elliptic equation (size 160×816), the KdV equation (size 160×1001), and the HD function (size 1000×5151).

            Elliptic equation      KdV equation           HD function
Rotations   Legendre   Hermite    Legendre   Hermite     Legendre   Hermite
0           0.5014     0.7770     0.6079     0.9121      0.2117     0.2852
1           0.6958     0.7830     0.8825     0.9223      0.2580     0.3075
2           0.6943     0.7719     0.8487     0.9167      0.2664     0.2956
3           0.6896     0.7710     0.8677     0.9214      0.2522     0.3003
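For concreteness, the sketch below shows one standard way to compute the mutual coherence values reported in Tables 5 and 6 from a sensing matrix \mathbf{\Psi}: normalize the columns and take the largest off-diagonal entry of the Gram matrix in absolute value. The function name and the random test matrix are illustrative and not part of the original implementation.

```python
import numpy as np

def mutual_coherence(Psi):
    """Largest absolute inner product between distinct, l2-normalized columns."""
    cols = Psi / np.linalg.norm(Psi, axis=0, keepdims=True)
    G = np.abs(cols.T @ cols)   # Gram matrix of the normalized columns
    np.fill_diagonal(G, 0.0)    # ignore the trivial self-correlations
    return G.max()

# Illustration on a random matrix of the same shape as the ridge example (160 x 455).
rng = np.random.default_rng(0)
print(mutual_coherence(rng.standard_normal((160, 455))))
```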

We then examine the effect of the number of rotations in Figure 6. Here we use the ridge function and the KdV equation as examples and show the results obtained by \ell_{1}-\ell_{2}. For the ridge function, more rotations lead to better accuracy, while for the KdV equation the accuracy stagnates after 3-5 rotations. This is because the ridge function has a very favorable low-dimensional structure, i.e., its dimension can be reduced to one by a linear transformation \bm{\eta}=\mathbf{A}\bm{\xi}, whereas the KdV equation does not have this property. Moreover, there is no truncation error \varepsilon(\bm{\xi}) when representing the ridge function by a third-order gPC expansion, while \varepsilon(\bm{\xi}) is nonzero for the KdV equation. In most practical problems, the truncation error is present and a linear transformation may not yield an optimal low-dimensional structure with sparse gPC coefficients. Therefore, we empirically set a maximum number of rotations l_{\max} to terminate the iterations in the algorithm.
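The sketch below is a minimal, simplified stand-in for the rotation step: it estimates an orthogonal matrix from gradient samples of the QoI (or of its current gPC surrogate) via an SVD, in the spirit of the construction of \mathbf{W} in (33). All names and the toy QoI are chosen for illustration and do not reproduce the exact algorithm.

```python
import numpy as np

def estimate_rotation(grads):
    """Sketch: orthogonal rotation from gradient samples.

    grads : (M, d) array whose rows approximate the gradient of the QoI at M
            points in the parameter space. The right singular vectors give
            orthonormal directions ordered by how strongly the QoI varies
            along them; the new variables are eta = A @ xi.
    """
    _, _, Vt = np.linalg.svd(grads, full_matrices=True)  # Vt is d x d, orthogonal
    return Vt

# Toy illustration: a QoI that varies only along xi_1 + xi_2.
rng = np.random.default_rng(1)
xi = rng.standard_normal((200, 5))
grads = np.outer(2.0 * (xi[:, 0] + xi[:, 1]), [1.0, 1.0, 0.0, 0.0, 0.0])
A = estimate_rotation(grads)
print(np.round(A[0], 2))  # first row is (1, 1, 0, 0, 0)/sqrt(2) up to sign
```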

[Figure 6: four panels — (a) Ridge, Legendre; (b) Ridge, Hermite; (c) KdV, Legendre; (d) KdV, Hermite]
Figure 6: Relative error versus the number of rotations. The size of \mathbf{\Psi} is 160×455 for the ridge function and 160×1001 for the KdV equation.

Finally, we report the computation time in Table 7. All experiments are performed on an AMD Ryzen 5 3600 machine with 16 GB RAM, running Windows 10 (versions 1904 and 2004) and MATLAB 2018b. The major computation consists of two components: the \ell_{1}-\ell_{2} minimization and the construction of the rotation matrix A. The computational complexity of each iteration of the \ell_{1}-\ell_{2} algorithm is O(M^{3}+M^{2}N), which reduces to O(M^{2}N) since we assume M\ll N. In practice, we choose the maximum numbers of outer and inner iterations in Algorithm 1 as n_{\max}=10 and k_{\max}=2N, respectively, so the complexity of the \ell_{1}-\ell_{2} algorithm is O(M^{2}N^{2}). To find the rotation matrix A, one constructs a matrix W using (33) at a cost of O(M^{3}N), followed by an SVD with complexity O(M^{3}N+M^{2}N^{2}). Therefore, the total complexity of our approach is O(M^{2}N^{2}) per rotation. Dividing the Legendre timings reported in Table 7 by l_{\max}(MN)^{2} yields 1.39×10^{-10}, 1.95×10^{-10}, and 0.25×10^{-10}; since these ratios are of the same order, the empirical results are consistent with the complexity analysis (a short numerical check follows Table 7).

Table 7: Computation time in seconds per random realization, averaged over 20 trials (N/A: the case is not available).

Case                Matrix size   Rotations   Legendre   Hermite   Laguerre
Ridge function      160×455       9           6.53       4.31      16.19
Elliptic equation   220×816       3           5.26       11.96     N/A
KdV equation        160×1001      3           15.03      14.33     N/A
HD function         1000×5151     3           2041.82    2102.04   N/A
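As a quick check of the complexity estimate, the snippet below divides the Legendre timings in Table 7 by l_{\max}(MN)^{2}; associating the three ratios quoted in the text with the ridge, KdV, and HD cases is our reading of the table.

```python
# Order-of-magnitude check: time / (l_max * (M*N)^2) for the Legendre timings in Table 7.
cases = {
    "ridge": {"M": 160,  "N": 455,  "l_max": 9, "time": 6.53},
    "KdV":   {"M": 160,  "N": 1001, "l_max": 3, "time": 15.03},
    "HD":    {"M": 1000, "N": 5151, "l_max": 3, "time": 2041.82},
}
for name, c in cases.items():
    ratio = c["time"] / (c["l_max"] * (c["M"] * c["N"]) ** 2)
    print(f"{name}: {ratio:.2e}")  # roughly 1.4e-10, 2.0e-10, 2.6e-11
```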

5 Conclusions

In this work, we proposed an alternating direction method that iteratively identifies a rotation matrix in order to enhance the sparsity of the gPC expansion, followed by nonconvex minimization schemes that efficiently identify the sparse coefficients. The framework is general: any regularization whose proximal operator can be evaluated efficiently (including \ell_{1}) can be incorporated into the rotational method. As a result, it improves the accuracy of compressive sensing methods that construct gPC expansions from a small amount of data. In particular, the rotation is determined by seeking the directions of maximum variation of the QoI through an SVD of its gradients at different points in the parameter space. The linear system after rotation becomes more ill-conditioned, specifically more coherent, which motivated us to choose nonconvex methods instead of the convex \ell_{1} approach for sparse recovery. We conducted extensive simulations under various scenarios, including a ridge function, an elliptic equation, the KdV equation, and a high-dimensional function, with Legendre, Hermite, and Laguerre polynomials, all of which are widely used in practice. Our experimental results demonstrated that the proposed combination of rotation estimation and nonconvex methods significantly outperforms the standard \ell_{1} minimization without rotation. Depending on the coherence regime, different nonconvex regularizations (combined with rotations) outperform the rotational CS approach with \ell_{1}: \ell_{1}-\ell_{2} and ERF work well for coherent systems, while \ell_{1/2} excels in incoherent ones, in line with observations in the CS literature.
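As an illustration of the proximal-operator requirement, the sketch below implements the classic soft-thresholding operator for \ell_{1}; the nonconvex penalties considered here (\ell_{1/2}, TL1, ERF, \ell_{1}-\ell_{2}) likewise admit efficient proximal or thresholding steps in the cited works. The function name is ours.

```python
import numpy as np

def prox_l1(v, lam):
    """Soft-thresholding: argmin_x 0.5*||x - v||^2 + lam*||x||_1, componentwise."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

print(prox_l1(np.array([1.5, -0.2, 0.7, -3.0]), 0.5))  # [ 1.  -0.   0.2 -2.5]
```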

Acknowledgement

This work was partially supported by NSF CAREER 1846690. Xiu Yang was supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research (ASCR) as part of Multifaceted Mathematics for Rare, Extreme Events in Complex Energy and Environment Systems (MACSER).

References

  • [1] B. Adcock, Infinite-dimensional compressed sensing and function interpolation, Found. of Comput. Math., (2017), pp. 1–41.
  • [2] B. Adcock, S. Brugiapaglia, and C. G. Webster, Compressed sensing approaches for polynomial approximation of high-dimensional functions, in Compressed Sensing and its Applications, Springer, 2017, pp. 93–124.
  • [3] N. Alemazkoor and H. Meidani, Divide and conquer: An incremental sparsity promoting compressive sampling approach for polynomial chaos expansions, Comput. Methods Appl. Mech. Eng., 318 (2017), pp. 937–956.
  • [4] I. Babuška, F. Nobile, and R. Tempone, A stochastic collocation method for elliptic partial differential equations with random input data, SIAM Rev., 52 (2010), pp. 317–355.
  • [5] A. S. Bandeira, E. Dobriban, D. G. Mixon, and W. F. Sawin, Certifying the restricted isometry property is hard, IEEE Trans. Inf. Theory, 59 (2013), pp. 3448–3450.
  • [6] G. Blatman and B. Sudret, Adaptive sparse polynomial chaos expansion based on least angle regression, J. Comput. Phys., 230 (2011), pp. 2345–2367.
  • [7] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn., 3 (2011), pp. 1–122.
  • [8] A. M. Bruckstein, D. L. Donoho, and M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Rev., 51 (2009), pp. 34–81.
  • [9] R. H. Cameron and W. T. Martin, The orthogonal development of non-linear functionals in series of Fourier-Hermite functionals, Ann. Math., (1947), pp. 385–392.
  • [10] E. J. Candès, The restricted isometry property and its implications for compressed sensing, C.R. Math., 346 (2008), pp. 589–592.
  • [11] E. J. Candès, J. K. Romberg, and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Comm. Pure Appl. Math, 59 (2006), pp. 1207–1223.
  • [12] E. J. Candès and M. B. Wakin, An introduction to compressive sampling, IEEE Signal Process. Mag., 25 (2008), pp. 21–30.
  • [13] E. J. Candès, M. B. Wakin, and S. P. Boyd, Enhancing sparsity by reweighted l1 minimization, J. Fourier Anal. Appl., 14 (2008), pp. 877–905.
  • [14] R. Chartrand, Exact reconstruction of sparse signals via nonconvex minimization, IEEE Signal Process Lett., 14 (2007), pp. 707–710.
  • [15] P. G. Constantine, E. Dow, and Q. Wang, Active subspace methods in theory and practice: Applications to kriging surfaces, SIAM J. Sci. Comput., 36 (2014), pp. A1500–A1524.
  • [16] W. Dai and O. Milenkovic, Subspace pursuit for compressive sensing: Closing the gap between performance and complexity, tech. rep., University of Illinois at Urbana-Champaign, 2008.
  • [17] D. L. Donoho, Compressed sensing, IEEE Trans. Inf. Theory, 52 (2006), pp. 1289–1306.
  • [18] D. L. Donoho and M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization, Proc. Nat. Acad. Scien. USA, 100 (2003), pp. 2197–2202.
  • [19] D. L. Donoho, M. Elad, and V. N. Temlyakov, Stable recovery of sparse overcomplete representations in the presence of noise, IEEE Trans. Inf. Theory, 52 (2006), pp. 6–18.
  • [20] A. Doostan and H. Owhadi, A non-adapted sparse approximation of PDEs with stochastic inputs, J. Comput. Phys., 230 (2011), pp. 3015–3034.
  • [21] O. G. Ernst, A. Mugler, H.-J. Starkloff, and E. Ullmann, On the convergence of generalized polynomial chaos expansions, ESAIM: Math. Model. Numer. Anal., 46 (2012), pp. 317–339.
  • [22] S. Foucart and H. Rauhut, A mathematical introduction to compressive sensing, Birkhäuser Basel, 2013.
  • [23] R. G. Ghanem and P. D. Spanos, Stochastic finite elements: a spectral approach, Springer-Verlag, New York, 1991.
  • [24] R. Gribonval and M. Nielsen, Sparse representations in unions of bases, IEEE Trans. Inf. Theory, 49 (2003), pp. 3320–3325.
  • [25] L. Guo, J. Li, and Y. Liu, Stochastic collocation methods via minimisation of the transformed l1l_{1}-penalty, East Asian Journal on Applied Mathematics, 8 (2018), pp. 566–585.
  • [26] W. Guo, Y. Lou, J. Qin, and M. Yan, A novel regularization based on the error function for sparse recovery, arXiv preprint arXiv:2007.02784, (2020).
  • [27] J. Hampton and A. Doostan, Compressive sampling of polynomial chaos expansions: Convergence analysis and sampling strategies, J. Comput. Phys., 280 (2015), pp. 363–386.
  • [28]  , Basis adaptive sample efficient polynomial chaos (base-pc), J. Comput. Phys., 371 (2018), pp. 20–49.
  • [29] J. D. Jakeman, M. S. Eldred, and K. Sargsyan, Enhancing 1\ell_{1}-minimization estimates of polynomial chaos expansions using basis selection, J. Comput. Phys., 289 (2015), pp. 18–34.
  • [30] J. D. Jakeman, A. Narayan, and T. Zhou, A generalized sampling and preconditioning scheme for sparse approximation of polynomial chaos expansions, SIAM J. Sci. Comput., 39 (2017), pp. A1114–A1144.
  • [31] M. Jardak, C.-H. Su, and G. E. Karniadakis, Spectral polynomial chaos solutions of the stochastic advection equation, J. Sci. Comput., 17 (2002), pp. 319–338.
  • [32] M.-J. Lai, Y. Xu, and W. Yin, Improved iteratively reweighted least squares for unconstrained smoothed q\ell_{q} minimization, SIAM J. Numer. Anal., 51 (2013), pp. 927–957.
  • [33] H. Lei, X. Yang, Z. Li, and G. E. Karniadakis, Systematic parameter inference in stochastic mesoscopic modeling, J. Comput. Phys., 330 (2017), pp. 571–593.
  • [34] H. Lei, X. Yang, B. Zheng, G. Lin, and N. A. Baker, Constructing surrogate models of complex systems with enhanced sparsity: quantifying the influence of conformational uncertainty in biomolecular solvation, SIAM Multiscale Model. Simul., 13 (2015), pp. 1327–1353.
  • [35] K.-C. Li, Sliced inverse regression for dimension reduction, J. Am. Stat. Assoc., 86 (1991), pp. 316–327.
  • [36] Y. Lou and M. Yan, Fast l1-l2 minimization via a proximal operator, J. Sci. Comput., 74 (2018), pp. 767–785.
  • [37] Y. Lou, P. Yin, Q. He, and J. Xin, Computing sparse representation in a highly coherent dictionary based on difference of L1{L_{1}} and L2{L_{2}}, J. Sci. Comput., 64 (2015), pp. 178–196.
  • [38] Y. Lou, P. Yin, and J. Xin, Point source super-resolution via non-convex l1 based methods, J. Sci. Comput., 68 (2016), pp. 1082–1100.
  • [39] J. Lv, Y. Fan, et al., A unified approach to model selection and sparse recovery using regularized least squares, Annals of Stat., 37 (2009), pp. 3498–3528.
  • [40] X. Ma and N. Zabaras, An adaptive high-dimensional stochastic model representation technique for the solution of stochastic partial differential equations, J. Comput. Phys., 229 (2010), pp. 3884–3915.
  • [41] B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM J. Comput., 24 (1995), pp. 227–234.
  • [42] H. Ogura, Orthogonal functionals of the Poisson process, IEEE Trans. Inf. Theory, 18 (1972), pp. 473–481.
  • [43] J. Peng, J. Hampton, and A. Doostan, A weighted 1\ell_{1}-minimization approach for sparse polynomial chaos expansions, J. Comput. Phys., 267 (2014), pp. 92–111.
  • [44]  , On polynomial chaos expansion via gradient-enhanced 1\ell_{1}-minimization, J. Comput. Phys., 310 (2016), pp. 440–458.
  • [45] Y. Rahimi, C. Wang, H. Dong, and Y. Lou, A scale invariant approach for sparse signal recovery, SIAM J. Sci. Comput., 41 (2019), pp. A3649–A3672.
  • [46] H. Rauhut and R. Ward, Sparse Legendre expansions via \ell_{1}-minimization, J. Approx. Theory, 164 (2012), pp. 517–533.
  • [47] H. Rauhut and R. Ward, Interpolation via weighted 1\ell_{1} minimization, Appl. Comput. Harmon. Anal., 40 (2016), pp. 321 – 351.
  • [48] T. M. Russi, Uncertainty quantification with experimental data and complex system models, PhD thesis, UC Berkeley, 2010.
  • [49] K. Sargsyan, C. Safta, H. N. Najm, B. J. Debusschere, D. Ricciuto, and P. Thornton, Dimensionality reduction for complex models via bayesian compressive sensing, Int. J. Uncertain. Quantif., 4 (2014).
  • [50] X. Shen, W. Pan, and Y. Zhu, Likelihood-based selection and sharp parameter estimation, J. Am. Stat. Assoc., 107 (2012), pp. 223–232.
  • [51] S. Smolyak, Quadrature and interpolation formulas for tensor products of certain classes of functions, Sov. Math. Dokl., 4 (1963), pp. 240–243.
  • [52] M. A. Tatang, W. Pan, R. G. Prinn, and G. J. McRae, An efficient method for parametric uncertainty analysis of numerical geophysical models, J. Geophys. Res-Atmos. (1984–2012), 102 (1997), pp. 21925–21932.
  • [53] A. M. Tillmann and M. E. Pfetsch, The computational complexity of the restricted isometry property, the nullspace property, and related concepts in compressed sensing, IEEE Trans. Inf. Theory, 60 (2014), pp. 1248–1259.
  • [54] R. Tipireddy and R. Ghanem, Basis adaptation in homogeneous chaos spaces, J. Comput. Phys., 259 (2014), pp. 304–317.
  • [55] P. Tsilifis, X. Huan, C. Safta, K. Sargsyan, G. Lacaze, J. C. Oefelein, H. N. Najm, and R. G. Ghanem, Compressive sensing adaptation for polynomial chaos expansions, J. Comput. Phys., 380 (2019), pp. 29–47.
  • [56] C. Wang, M. Yan, Y. Rahimi, and Y. Lou, Accelerated schemes for the L1/L2{L}_{1}/{L}_{2} minimization, IEEE Trans. Signal Process., 68 (2020), pp. 2660–2669.
  • [57] D. Xiu and J. S. Hesthaven, High-order collocation methods for differential equations with random inputs, SIAM J. Sci. Comput., 27 (2005), pp. 1118–1139.
  • [58] D. Xiu and G. E. Karniadakis, The Wiener-Askey polynomial chaos for stochastic differential equations, SIAM J. Sci. Comput., 24 (2002), pp. 619–644.
  • [59] Z. Xu, X. Chang, F. Xu, and H. Zhang, L1/2{L_{1/2}} regularization: A thresholding representation theory and a fast solver, IEEE Trans. Neural Netw. Learn. Syst., 23 (2012), pp. 1013–1027.
  • [60] Z. Xu and T. Zhou, On sparse interpolation and the design of deterministic interpolation points, SIAM J. Sci. Comput., 36 (2014), pp. A1752–A1769.
  • [61] L. Yan, L. Guo, and D. Xiu, Stochastic collocation algorithms using l1l_{1}-minimization, Int. J. Uncertain. Quan., 2 (2012), pp. 279–293.
  • [62] X. Yang, D. A. Barajas-Solano, W. S. Rosenthal, and A. M. Tartakovsky, PDF estimation for power grid systems via sparse regression, arXiv preprint arXiv:1708.08378, (2017).
  • [63] X. Yang, M. Choi, G. Lin, and G. E. Karniadakis, Adaptive ANOVA decomposition of stochastic incompressible and compressible flows, J. Comput. Phys., 231 (2012), pp. 1587–1614.
  • [64] X. Yang and G. E. Karniadakis, Reweighted 1\ell_{1} minimization method for stochastic elliptic differential equations, J. Comput. Phys., 248 (2013), pp. 87–108.
  • [65] X. Yang, H. Lei, N. Baker, and G. Lin, Enhancing sparsity of Hermite polynomial expansions by iterative rotations, J. Comput. Phys., 307 (2016), pp. 94–109.
  • [66] X. Yang, H. Lei, P. Gao, D. G. Thomas, D. L. Mobley, and N. A. Baker, Atomic radius and charge parameter uncertainty in biomolecular solvation energy calculations, J. Chem. Theory Comput., 14 (2018), pp. 759–767.
  • [67] X. Yang, W. Li, and A. Tartakovsky, Sliced-inverse-regression–aided rotated compressive sensing method for uncertainty quantification, SIAM/ASA J. Uncertainty, 6 (2018), pp. 1532–1554.
  • [68] X. Yang, X. Wan, L. Lin, and H. Lei, A general framework for enhancing sparsity of generalized polynomial chaos expansions, Int. J. Uncertain. Quantif., 9 (2019).
  • [69] P. Yin, E. Esser, and J. Xin, Ratio and difference of l1l_{1} and l2l_{2} norms and sparse representation with coherent dictionaries, Comm. Inf. Syst., 14 (2014), pp. 87–109.
  • [70] P. Yin, Y. Lou, Q. He, and J. Xin, Minimization of 12\ell_{1-2} for compressed sensing, SIAM J. Sci. Comput., 37 (2015), pp. A536–A563.
  • [71] S. Zhang and J. Xin, Minimization of transformed L1{L_{1}} penalty: Closed form representation and iterative thresholding algorithms, Comm. Math. Sci., 15 (2017), pp. 511–537.
  • [72]  , Minimization of transformed L1{L_{1}} penalty: theory, difference of convex function algorithm, and robust application in compressed sensing, Math. Program., 169 (2018), pp. 307–336.
  • [73] T. Zhang, Multi-stage convex relaxation for learning with sparse regularization, in Adv. Neural Inf. Proces. Syst. (NIPS), 2009, pp. 1929–1936.
  • [74] Z. Zhang, X. Yang, I. V. Oseledets, G. E. Karniadakis, and L. Daniel, Enabling high-dimensional hierarchical uncertainty quantification by ANOVA and tensor-train decomposition, IEEE Trans. Comput.-Aided Des. Integr. Circuits and Syst., 34 (2015), pp. 63–76.