
\headers{Goal-Oriented Tensor Decompositions}{D. Dunlavy, E. Phipps, H. Kolla, J. Shadid, and E. Phillips}

Goal-Oriented Low-Rank Tensor Decompositions for Numerical Simulation Data

Daniel M. Dunlavy, Eric T. Phipps, Hemanth Kolla, John N. Shadid, and Edward Phillips
Sandia National Laboratories ({dmdunla, etphipp, hnkolla, jnshadi}@sandia.gov; egphill86@gmail.com)
Abstract

We introduce a new low-dimensional model of high-dimensional numerical simulation data based on low-rank tensor decompositions. Our model aims to minimize differences not only between the model data and the simulation data but also between functions of the model data and functions of the simulation data. This novel approach to dimensionality reduction of simulation data directly incorporates quantities of interest and invariants arising from conservation principles into the low-dimensional model, thus enabling more accurate analysis of the simulation without requiring access to the full set of high-dimensional data. Computational results of applying this approach to two standard low-rank tensor decompositions of data arising from simulations of combustion and plasma physics are presented.

keywords: tensor, low-rank, Tucker decomposition, canonical polyadic decomposition, goal-oriented

AMS subject classifications: 15A69, 65F55

\levelstay

Introduction

High-dimensional numerical simulation data is ubiquitous across scientific computing disciplines, including plasma physics, fluids, earth systems, and mechanics. Often such data is generated on high-performance computing (HPC) resources and is too large to be stored, analyzed, or moved as a whole. To facilitate efficient post-simulation analysis, scientists often work with samples and/or low-dimensional models of the high-dimensional data. Here we present a new dimension reduction method for numerical simulation data that models both the high-dimensional data and functions of the high-dimensional data.

Numerical simulation data is naturally represented as a tensor, or multi-dimensional array, consisting of the value of each simulation variable at each point in space and time. Low-rank tensor decompositions—akin to low-rank matrix decompositions of two-dimensional data—provide low-dimensional models of tensor data and have been used effectively in a variety of data analysis tasks, including data compression, surrogate modeling, pattern identification, and anomaly detection. However, existing tensor decomposition methods target simple element-wise error metrics between the high-dimensional data and the low-dimensional model of the data, often failing to faithfully represent important physics quantities of interest (QoIs) or invariants arising from conservation principles. In this work, we introduce goal-oriented low-rank tensor decompositions, which incorporate QoIs directly into the optimization problems used in fitting the low-rank models to high-dimensional tensor data. Specifically, we derive two such goal-oriented decompositions—extending the common Canonical Polyadic (CP) and Tucker models—and explore how these new low-rank models better capture several standard QoIs typically used in analyzing data produced by combustion and plasma physics simulations.

The purpose of this paper is to introduce these new goal-oriented low-rank tensor decompositions and to demonstrate that they lead to improved low-dimensional models of high-dimensional numerical simulation data. As such, the target audience for this work includes both tensor decomposition experts and computational scientists.

\levelstay

Background

In this section, we introduce tensors and two of the standard low-rank tensor decompositions that form the foundation of our new goal-oriented low-rank decompositions. We also discuss the connections of our approach to related work on constrained low-rank tensor decompositions.

Tensors, or multi-dimensional arrays, are a powerful means of representing relationships in multiway data [49]. For example, in the context of scientific simulations presented in this work, tensors represent the spatio-temporal solution of coupled systems of differential equations. Analysis of high-dimensional tensor data is usually facilitated through low-rank tensor decompositions, which approximate a given tensor in terms of simpler, structured tensors defined with fewer parameters than the number of elements in the high-dimensional data. In this work, we focus on two common tensor decompositions, the Canonical Polyadic (CP, also known as CANDECOMP/PARAFAC) and Tucker decompositions, summarized below.

The order of a tensor is the number of dimensions, and each tensor dimension is called a mode. Following standard notation [49], lowercase letters denote scalars (tensors of order zero), e.g., $x$; bold lowercase letters denote vectors (tensors of order one), e.g., $\mathbf{v}$; and bold uppercase letters denote matrices (tensors of order two), e.g., $\mathbf{A}$. Tensors of order three and higher are denoted with bold capital script letters, e.g., $\bm{\mathscr{X}}$. We use multi-index notation to indicate tensor elements, i.e., $x_{\mathbf{i}} \equiv x_{i_1 \dots i_d}$ denotes the entry $\mathbf{i} = (i_1,\dots,i_d) \in \mathcal{I} \equiv \{1,\dots,I_1\} \otimes \dots \otimes \{1,\dots,I_d\}$ of a $d$-way tensor $\bm{\mathscr{X}} \in \mathbb{R}^{I_1 \times \cdots \times I_d}$. Finally, given $\bm{\mathscr{X}} \in \mathbb{R}^{I_1 \times \cdots \times I_d}$, we define $\mathbf{X}_{(n)}$ for $n=1,\dots,d$ to be the mode-$n$ matricization of $\bm{\mathscr{X}}$, which is a matrix of size $I_n \times (I_1 \cdots I_{n-1} I_{n+1} \cdots I_d)$ whose rows are given by a linearization of the mode-$n$ slices.
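For concreteness, the following is a minimal NumPy sketch of mode-$n$ matricization. It is illustrative only (the paper's implementation uses the Tensor Toolbox for MATLAB), and the particular column ordering of the linearization is a convention we choose here; any consistent ordering works for the rank computations that follow.

```python
import numpy as np

def unfold(X, n):
    """Mode-n matricization: move mode n to the front and linearize
    the remaining modes (C order here; other orderings are equally valid)."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

# Example: a 3 x 4 x 5 tensor unfolds along mode 1 into a 4 x 15 matrix.
X = np.arange(60.0).reshape(3, 4, 5)
print(unfold(X, 1).shape)  # (4, 15)
```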

\leveldown

Canonical Polyadic (CP) Decompositions

Figure 1: (a) Schematic of the canonical polyadic (CP) decomposition for a general, non-symmetric, third-order tensor; the input tensor is approximated as a sum of vector outer products. (b) Schematic of the Tucker decomposition for a third-order tensor; mathematically, the decomposition can be written as $\bm{\mathscr{X}} \approx \bm{\mathscr{M}} = \bm{\mathscr{G}} \times_1 \mathbf{A}^{(1)} \times_2 \mathbf{A}^{(2)} \times_3 \mathbf{A}^{(3)}$.

For a given $d$-way tensor $\bm{\mathscr{X}} \in \mathbb{R}^{I_1 \times \cdots \times I_d}$, the CP decomposition [43, 44, 19, 20, 18, 41] attempts to find a good approximating low-rank model tensor $\bm{\mathscr{M}}$ of the form

(1) $\bm{\mathscr{X}} \approx \bm{\mathscr{M}} = \sum_{j=1}^{R} \mathbf{a}^{(1)}_j \circ \mathbf{a}^{(2)}_j \circ \dots \circ \mathbf{a}^{(d)}_j,$

where $\mathbf{a}^{(k)}_j$ is a column vector of size $I_k$, $\circ$ denotes the vector outer product, and $R$ is the approximate rank. See Fig. 1(a) for a graphical depiction. The column vectors for each mode $k$ are often collected into a matrix $\mathbf{A}^{(k)} = [\mathbf{a}^{(k)}_1 \cdots \mathbf{a}^{(k)}_R]$ of size $I_k \times R$ called a factor matrix. Given factor matrices $\{\mathbf{A}^{(1)},\dots,\mathbf{A}^{(d)}\}$, we use the notation $\bm{\mathscr{M}} = \llbracket \mathbf{A}^{(1)},\dots,\mathbf{A}^{(d)} \rrbracket$ [7]. For standard CP decompositions, $\bm{\mathscr{M}}$ is computed by solving the following optimization problem,

(2) $\min_{\bm{\mathscr{M}}} f(\bm{\mathscr{X}}, \bm{\mathscr{M}}) = \min_{\bm{\mathscr{M}}} \|\bm{\mathscr{X}} - \bm{\mathscr{M}}\|_F^2 \quad \text{s.t.} \quad \bm{\mathscr{M}} = \llbracket \mathbf{A}^{(1)},\dots,\mathbf{A}^{(d)} \rrbracket,$

where $\|\bm{\mathscr{X}} - \bm{\mathscr{M}}\|_F^2 = \sum_{\mathbf{i}\in\mathcal{I}} (x_{\mathbf{i}} - m_{\mathbf{i}})^2$ denotes the tensor Frobenius (sum-of-squares) norm and $\mathcal{I}$, defined as above, denotes the set of all multi-indices of the tensor elements. Many approaches have been developed for efficiently solving Eq. 2 that scale to large, sparse or dense tensors, including alternating least-squares (ALS) [18, 41], gradient-based optimization [2], nonlinear least-squares [55], and damped Gauss-Newton [56, 64] approaches, among others.
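As an illustration of the ALS approach, the sketch below implements one alternating least-squares sweep for Eq. 2 in NumPy. This is a simplified sketch of the generic algorithm, not the authors' implementation; the `unfold` and `khatri_rao` helpers are local conventions chosen to be mutually consistent.

```python
import numpy as np
from functools import reduce

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def khatri_rao(mats):
    """Column-wise Khatri-Rao product; the later matrix's index varies
    fastest, matching the C-order unfolding above."""
    R = mats[0].shape[1]
    return reduce(lambda A, B: (A[:, None, :] * B[None, :, :]).reshape(-1, R),
                  mats)

def cp_als_sweep(X, factors):
    """One ALS sweep for Eq. 2: each factor update is a linear
    least-squares solve with the other factors held fixed."""
    for n in range(X.ndim):
        others = [factors[k] for k in range(X.ndim) if k != n]
        G = reduce(lambda A, B: A * B, [A.T @ A for A in others])  # Gram matrix
        mttkrp = unfold(X, n) @ khatri_rao(others)
        factors[n] = np.linalg.solve(G, mttkrp.T).T
    return factors

# Usage: random initialization, a few sweeps.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 11, 12))
factors = [rng.standard_normal((s, 5)) for s in X.shape]
for _ in range(10):
    factors = cp_als_sweep(X, factors)
```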

\levelstay

Tucker Decompositions

The Tucker decomposition [65] attempts to approximate a given tensor $\bm{\mathscr{X}} \in \mathbb{R}^{I_1\times\cdots\times I_d}$ in the form

(3) $\bm{\mathscr{X}} \approx \bm{\mathscr{M}} = \bm{\mathscr{G}} \times_1 \mathbf{A}^{(1)} \times_2 \dots \times_d \mathbf{A}^{(d)},$

where $\bm{\mathscr{G}} \in \mathbb{R}^{R_1\times\dots\times R_d}$ is called the core tensor and the columns of each factor matrix $\mathbf{A}^{(n)} \in \mathbb{R}^{I_n\times R_n}$, $n=1,\dots,d$, are usually orthonormal. See Fig. 1(b) for an example in three dimensions. Here the notation $\bm{\mathscr{Y}} = \bm{\mathscr{X}} \times_n \mathbf{A}$ is called the $n$-mode product and is defined in terms of matricized tensors as $\mathbf{Y}_{(n)} = \mathbf{A}\mathbf{X}_{(n)}$. As in the CP decomposition, we use the shorthand notation $\bm{\mathscr{M}} = \llbracket \bm{\mathscr{G}}; \mathbf{A}^{(1)},\dots,\mathbf{A}^{(d)} \rrbracket$. If $R_n \ll I_n$, $n=1,\dots,d$, then the memory required to store $\bm{\mathscr{M}}$ is significantly smaller than for $\bm{\mathscr{X}}$, and hence Tucker decompositions have been used effectively for data compression [9, 5]. Each factor matrix $\mathbf{A}^{(n)}$ is an approximate basis for the column space of the mode-$n$ matricization of $\bm{\mathscr{X}}$; thus, the Tucker decomposition is often considered a generalization of the matrix SVD or PCA to higher dimensions [29].
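The $n$-mode product definition $\mathbf{Y}_{(n)} = \mathbf{A}\mathbf{X}_{(n)}$ translates directly into code. The following NumPy sketch (our own illustrative helpers, not a library API) reconstructs a Tucker model $\bm{\mathscr{M}} = \llbracket \bm{\mathscr{G}}; \mathbf{A}^{(1)},\dots,\mathbf{A}^{(d)} \rrbracket$ by applying the $n$-mode product in each mode.

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def fold(M, n, shape):
    """Inverse of unfold: reshape back and move mode n into place."""
    full = [shape[n]] + [s for k, s in enumerate(shape) if k != n]
    return np.moveaxis(M.reshape(full), 0, n)

def nmode_product(X, A, n):
    """Y = X x_n A, defined by Y_(n) = A X_(n)."""
    shape = list(X.shape)
    shape[n] = A.shape[0]
    return fold(A @ unfold(X, n), n, shape)

def tucker_reconstruct(G, factors):
    """M = G x_1 A^(1) x_2 A^(2) ... x_d A^(d), as in Eq. 3."""
    M = G
    for n, A in enumerate(factors):
        M = nmode_product(M, A, n)
    return M
```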

Similar to the problem for CP decompositions in Eq. 2, for standard Tucker decompositions $\bm{\mathscr{M}}$ is computed by solving the following optimization problem

(4) $\min_{\bm{\mathscr{M}}} f(\bm{\mathscr{X}}, \bm{\mathscr{M}}) = \min_{\bm{\mathscr{M}}} \|\bm{\mathscr{X}} - \bm{\mathscr{M}}\|_F^2 \quad \text{s.t.} \quad \bm{\mathscr{M}} = \llbracket \bm{\mathscr{G}}; \mathbf{A}^{(1)},\dots,\mathbf{A}^{(d)} \rrbracket.$

The only difference between the CP and Tucker optimization problems is the specific low-rank form of $\bm{\mathscr{M}}$. Many approaches exist for computing Tucker decompositions, including the higher-order SVD (HOSVD) [29], sequentially truncated HOSVD (ST-HOSVD) [67], and higher-order orthogonal iteration (HOOI) [30]. For a given set of Tucker ranks, both the HOSVD and ST-HOSVD algorithms compute quasi-optimal solutions, with reconstruction error within a factor of $\sqrt{d}$ of the optimal approximation as measured by the sum-of-squares (Frobenius) error [40]. For a given approximation error tolerance, both algorithms can select Tucker ranks that guarantee the tolerance is satisfied with respect to the Frobenius norm, providing a continuous trade-off between data reduction and approximation error.
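To make the rank-selection mechanism concrete, here is a minimal NumPy sketch of an ST-HOSVD-style algorithm that, under the common convention of splitting the squared-error budget evenly across modes, selects the smallest per-mode ranks satisfying a relative Frobenius-norm tolerance `tol`. Details such as the budget split and Gram-based SVDs vary across implementations.

```python
import numpy as np

def st_hosvd(X, tol):
    """Sequentially truncated HOSVD: the squared-error budget
    tol^2 * ||X||_F^2 is split evenly over the d modes, and each mode
    keeps just enough leading left singular vectors of the current
    unfolding to stay within its share."""
    budget = (tol ** 2) * np.linalg.norm(X) ** 2 / X.ndim
    G, factors = X, []
    for n in range(X.ndim):
        Xn = np.moveaxis(G, n, 0).reshape(G.shape[n], -1)
        U, s, _ = np.linalg.svd(Xn, full_matrices=False)
        tail = np.cumsum((s ** 2)[::-1])[::-1]  # tail[r] = sum_{j>=r} s_j^2
        Rn = next(r for r in range(1, len(s) + 1)
                  if r == len(s) or tail[r] <= budget)
        factors.append(U[:, :Rn])
        G = np.moveaxis((U[:, :Rn].T @ Xn).reshape(
            (Rn,) + tuple(np.delete(G.shape, n))), 0, n)
    return G, factors
```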

\levelstay

Constrained Tensor Decompositions

In general, when computing low-rank tensor decompositions via numerical optimization, some loss function involving the tensor data $\bm{\mathscr{X}}$ and low-rank model $\bm{\mathscr{M}}$ is minimized (e.g., $f$ in Eq. 2 or Eq. 4) subject to the specific low-rank structure of the model (e.g., the CP or Tucker structure described above). In addition to the low-rank model constraint, more general constraints can be included in the optimization problem to reflect additional information known about the data.

Much prior work exists on incorporating constraints into CP and Tucker decompositions [26], including non-negativity (and more general bound) constraints [8, 25, 27], linear constraints [24, 37], constraints on the rank of the decomposition [58], and constraints presented in the form of coupled matrix-tensor factorizations [3, 31]. However, such constraints amount to either bound or linear constraints, and the methods used to solve the associated constrained optimization problems are often not applicable to general nonlinear constraints. Most related to the work presented here is recent work that includes constraints on low-rank matrix and tensor decomposition factors via regularization penalties. For example, Huang et al. introduced a general method for incorporating constraints through penalties on the factors and illustrated its use for non-negativity constraints, nuclear-norm regularization (for tensor completion), and sparsity-inducing regularization (for dictionary learning) [46]. More recently, that work was extended to support any proximable regularizer, broadening the types of constraints that can be placed on the factors [62]. As we show below, the approach presented here also incorporates constraints as penalties in the decomposition optimization problems in Eq. 2 or Eq. 4, but we illustrate its use with more general nonlinear constraints than this previous work.

\levelstay

Related Work

As mentioned above, low-rank approximation through tensor decomposition has applications to many different computational and analysis tasks, including data compression [9, 28], reduced-order modeling [38], surrogate modeling [48, 47], and anomaly detection [34, 52]; in many of these applications, incorporating QoIs into the low-rank approximation could potentially improve the utility of the decomposition. Typically, only the error in the primary state variables is minimized while computing the decomposition, and it has been observed that the errors in QoIs, which are derived from the primary variables, can be orders of magnitude higher [50]. In this work, we do not focus on any particular application of tensor decomposition, but instead on the algorithmic, computational, and numerical consequences of computing the goal-oriented decomposition itself. Furthermore, we recognize that tensor methods are not the only approach to these problems; numerous other methods for the above applications have been developed, some also incorporating goal-oriented techniques. In particular, the literature on goal-oriented techniques for reduced-order modeling of scientific simulations is vast, e.g., [17, 35, 15, 36, 23]. Furthermore, neural networks (NNs) and neural operators (NOs) have recently emerged as data-driven surrogate models of scientific systems, and "physics-informed" variants that attempt to preserve physics in addition to minimizing state prediction error have been proposed, such as physics-informed neural networks (PINNs) [60] and physics-informed neural operators (PINOs) [53]. Finally, several data compression approaches that can preserve quantities of interest have recently been investigated, including MGARD [4], which can preserve linear quantities of interest, as well as [51], which proposes a hybrid approach based on tensor decompositions, autoencoders, and MGARD for achieving a prescribed error guarantee, with QoI preservation as a post-processing step after reconstruction. Our approach differs in that it produces a reduced/compressed model that attempts to directly preserve the relevant QoIs (including nonlinear QoIs) without post-processing.

\levelup

Goal-Oriented Tensor Decompositions

\leveldown

Problem Formulation

As described above, traditional tensor decompositions can often be formulated as the solution to an optimization problem such as

(5) $\min_{\bm{\mathscr{M}}} f(\bm{\mathscr{X}},\bm{\mathscr{M}}) \quad \text{s.t.} \quad \bm{\mathscr{M}}$ is of CP or Tucker form,

where $f$ is chosen based on the statistical model of the data tensor $\bm{\mathscr{X}}$. In this work, we use $f(\bm{\mathscr{X}},\bm{\mathscr{M}}) = \|\bm{\mathscr{X}}-\bm{\mathscr{M}}\|_F^2$, i.e., sum-of-squares loss, which is appropriate when the error tensor $\bm{\mathscr{X}}-\bm{\mathscr{M}}$ is assumed to be normally distributed (many other choices are possible, however). We further assume the data tensor $\bm{\mathscr{X}}$ represents the spatio-temporal evolution of some multi-physics system on a fixed Cartesian grid, consisting of one or more spatial modes (our examples below contain either two or three spatial dimensions; this includes systems with more complex geometries and unstructured meshes, by collapsing all spatial dimensions into a single mode given by all of the vertices in the mesh), a variable mode representing the different physics variables present in the system (e.g., density, temperature, momentum), and a temporal mode. We model quantities of interest (QoIs), whether invariants satisfied by the data or merely quantities derived from the data, as scalar-valued functions $g_q(\bm{\mathscr{X}}_t)$ for a given set of time points $t \in \mathbb{T}_q$, where $\bm{\mathscr{X}}_t$ denotes a temporal slice of $\bm{\mathscr{X}}$ at time $t$ (i.e., $\bm{\mathscr{X}}_t = \bm{\mathscr{X}}(:,:,:,:,t)$ for a tensor with three spatial modes) and $\mathbb{T}_q \subseteq \{1,\dots,\tau\}$ is the set of indices in the temporal mode of the tensor used when computing the $q$th QoI. To preserve these QoIs in the tensor decomposition, one would like to constrain the decomposition to enforce $g_q(\bm{\mathscr{M}}_t) = g_q(\bm{\mathscr{X}}_t)$. However, unless the data $\bm{\mathscr{X}}$ is exactly low rank, it is unlikely these constraints can be satisfied exactly while still obtaining reasonable levels of reduction and accuracy in the data reconstruction $\bm{\mathscr{M}} \approx \bm{\mathscr{X}}$. We therefore attempt to preserve them through the following penalty formulation,

(6) $\min_{\bm{\mathscr{M}}} \alpha_0 f(\bm{\mathscr{X}},\bm{\mathscr{M}}) + \sum_{q=1}^{Q} \alpha_q \sum_{t\in\mathbb{T}_q} \left( g_q(\bm{\mathscr{X}}_t) - g_q(\bm{\mathscr{M}}_t) \right)^2,$

where $Q$ is the number of QoIs and $\alpha_0,\dots,\alpha_Q$ are user-chosen weighting coefficients. Note that the quadratic penalty in Eq. 6 could be replaced with other penalty functions, e.g., an augmented Lagrangian penalty [54]; the use of alternative penalty functions to incorporate the QoIs into the low-rank decompositions is left to future work. We next discuss multiple derivative-based optimization approaches for solving Eq. 6, followed by several practical considerations that must be addressed for any optimization-based approach.
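The penalty objective of Eq. 6 is straightforward to evaluate given the QoI functions. The following NumPy sketch (illustrative; the names and calling conventions are ours) assumes time is the last tensor mode and that the model reconstruction $\bm{\mathscr{M}}$ is available as a dense array.

```python
import numpy as np

def goal_objective(X, M, goals, alphas):
    """Evaluate the penalized objective of Eq. 6.

    X, M   : data and model reconstructions, with time as the last mode.
    goals  : list of (g, T) pairs, where g maps a temporal slice to a
             scalar QoI and T is the set of time indices for that QoI.
    alphas : weights (alpha_0, ..., alpha_Q).
    """
    val = alphas[0] * np.sum((X - M) ** 2)  # Frobenius loss term
    for q, (g, T) in enumerate(goals, start=1):
        val += alphas[q] * sum(
            (g(X[..., t]) - g(M[..., t])) ** 2 for t in T)
    return val
```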

\levelstay

Optimization Approaches

In this section, we describe two derivative-based optimization methods proposed to solve the optimization problems defining the goal-oriented CP and Tucker decompositions above. The first is the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) quasi-Newton algorithm commonly used for tensor decomposition methods; the second is a trust-region Newton method employing truncated conjugate gradient inner solves and a Gauss-Newton Hessian approximation.

\leveldown

Quasi-Newton

The L-BFGS algorithm has been widely used for tensor decomposition, including CP [2], generalized CP [45], and Tucker [42] decompositions, as it provides superlinear convergence rates while only requiring specification of the gradient of the objective function. The algorithm works by constructing secant approximations of the Hessian matrix over multiple steps of the optimization algorithm, which can be implemented by low-rank updates to the Hessian and its inverse (see [54] for more details). Our experiments specifically employ the implementation provided by the L-BFGS-B code [68] available in the Tensor Toolbox for MATLAB. Formulas for computing the needed gradients for the goal-oriented Tucker and CP decompositions are given in the appendix.

\levelstay

Trust-Region Newton

As will be demonstrated below (see Fig. 8), our numerical experiments indicated that the above L-BFGS method had difficulty achieving significant reduction in the goal terms compared to a traditional tensor decomposition, particularly for the Tucker method. However, Newton-type methods such as trust-region Newton, Gauss-Newton, and Levenberg-Marquardt have proven effective for tensor decomposition in other contexts, e.g., [57, 56, 61, 66], so we also explored applying such methods to the goal-oriented formulation, focusing in particular on trust-region Newton methods. The trust-region Newton method approximates the objective function at each step by a local quadratic model constructed from the objective function gradient and Hessian; the quadratic model is minimized within the trust region via Newton's method. The implementation used here is provided by the Manopt [16] MATLAB package for optimization on Riemannian manifolds (which would allow inclusion of constraints such as optimization over Grassmann manifolds for Tucker decompositions, as discussed in the section on non-uniqueness below, but those capabilities are not explored here). In Euclidean space, the Manopt trust-region method [1] is equivalent to the method described in [54], using a truncated conjugate gradient (tCG) linear solve for the inexact Newton step. This provides much faster (quadratic) convergence than the L-BFGS method described above, but requires implementing (approximate) Hessian-vector products for use in CG. Since the optimization problems defining the goal-oriented CP and Tucker decompositions are a form of nonlinear least-squares, we found the Gauss-Newton Hessian approximation to be effective; it only requires implementation of the first derivative of the goal functions. Formulas for computing Gauss-Newton Hessian-vector products are derived in the appendix for both CP and Tucker models. The truncated CG solves are also improved through the use of a preconditioner. In this work, we employed simple block-diagonal preconditioning in the CP case, considering only the Frobenius norm term in the objective function, where the diagonal blocks are explicitly constructed and inverted through Cholesky decompositions. In the Tucker case, even forming these blocks was too memory intensive, so we resorted to diagonal preconditioning. Note that while it can be shown that each goal-oriented term introduces a rank-$|\mathbb{T}_q|$ correction to the diagonal blocks arising from the Frobenius norm term, effective incorporation of these terms in the preconditioner is left to future work. Construction of the diagonal blocks and their inverses is standard in the CP and Tucker literature (e.g., [57, 10]) and is summarized in the appendix for reference.
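To illustrate why the Gauss-Newton approximation is cheap for scalar goals, note that each goal term at a single time point has a scalar residual $r = g_q(\bm{\mathscr{X}}_t) - g_q(\bm{\mathscr{M}}_t)$, so its Gauss-Newton Hessian is the rank-one matrix $2\alpha_q \nabla g\, \nabla g^T$ (summing to a rank-$|\mathbb{T}_q|$ contribution over the time points, consistent with the preconditioner remark above). A minimal sketch, with the goal gradient supplied as a flattened vector over the model parameters:

```python
import numpy as np

def gn_hessvec_goal(grad_g, v, alpha):
    """Gauss-Newton Hessian-vector product for one scalar goal term
    alpha * (g(X_t) - g(M_t))^2 at a single time point.

    grad_g : gradient of g(M_t) with respect to the model parameters,
             flattened into a vector (the Jacobian of a scalar goal).
    v      : direction vector, same length as grad_g.

    For a scalar residual, the Gauss-Newton Hessian is the rank-one
    matrix 2*alpha * grad_g grad_g^T, so the product needs only a dot
    product and a scaling.
    """
    return 2.0 * alpha * np.dot(grad_g, v) * grad_g
```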

\levelup

Solution Procedure

Many multiphysics problems involve solution variables that exhibit a wide range of scales, which can cause numerical difficulties when solving Eq. 6 if not handled appropriately. To obtain more accurate low-rank models of such data, we first center and scale each variable slice in $\bm{\mathscr{X}}$ to produce a scaled tensor $\tilde{\bm{\mathscr{X}}}$ where, e.g., for a 5-way data tensor $\bm{\mathscr{X}} \in \mathbb{R}^{I_1\times I_2\times I_3\times I_4\times\tau}$,

(7) $\tilde{\bm{\mathscr{X}}}(i_1,i_2,i_3,v,t) = \dfrac{\bm{\mathscr{X}}(i_1,i_2,i_3,v,t) - \bm{\mu}(v)}{\bm{\sigma}(v)}.$

The computation of $\bm{\mu}$ and $\bm{\sigma}$ is problem dependent, with common choices being the per-variable mean for $\bm{\mu}$ and standard deviation for $\bm{\sigma}$. For traditional tensor decompositions, one can compute a corresponding low-rank model $\tilde{\bm{\mathscr{M}}}$ from $\tilde{\bm{\mathscr{X}}}$ and then transform back to unscaled variables by unscaling the corresponding variable factor matrix, e.g., for CP:

(8) $\bm{\mathscr{M}} = S(\tilde{\bm{\mathscr{M}}}) = S(\llbracket \mathbf{A},\mathbf{B},\mathbf{C},\mathbf{D},\mathbf{E} \rrbracket) \equiv \llbracket \mathbf{A},\mathbf{B},\mathbf{C},\operatorname{diag}(\bm{\sigma})\mathbf{D},\mathbf{E} \rrbracket + \llbracket \mathbf{1}_{I_1},\mathbf{1}_{I_2},\mathbf{1}_{I_3},\bm{\mu},\mathbf{1}_{\tau} \rrbracket,$

where $\mathbf{1}_n$ is a length-$n$ vector of ones and $S$ is the function that maps the scaled data and model to the unscaled data and model. However, QoI evaluations often only make sense on unscaled tensors, and thus we modify Eq. 6 to explicitly unscale the data before QoI evaluation:

(9) $\min_{\tilde{\bm{\mathscr{M}}}} f_{go}(\tilde{\bm{\mathscr{X}}},\tilde{\bm{\mathscr{M}}}) \quad\text{where}\quad f_{go}(\tilde{\bm{\mathscr{X}}},\tilde{\bm{\mathscr{M}}}) \equiv \alpha_0 f(\tilde{\bm{\mathscr{X}}},\tilde{\bm{\mathscr{M}}}) + \sum_{q=1}^{Q} \alpha_q \sum_{t\in\mathbb{T}_q} \left( g_q(S(\tilde{\bm{\mathscr{X}}}_t)) - g_q(S(\tilde{\bm{\mathscr{M}}}_t)) \right)^2,$

which requires modifying the derivative formulas in the appendix to incorporate Eq. 8 when computing the goal derivatives.
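A minimal NumPy sketch of the centering/scaling of Eq. 7 and the unscaling map $S$ applied to full reconstructions (the factor-matrix form of Eq. 8 is the efficient alternative for CP models); the function names and the variable-axis argument are our own conventions:

```python
import numpy as np

def center_and_scale(X, var_axis):
    """Per-variable centering and scaling as in Eq. 7: mu and sigma are
    the mean and standard deviation of each slice of the variables mode.
    Returns the scaled tensor plus (mu, sigma) so a model can later be
    mapped back to unscaled variables as in Eq. 8."""
    var_axis = var_axis % X.ndim
    axes = tuple(k for k in range(X.ndim) if k != var_axis)
    mu = X.mean(axis=axes, keepdims=True)
    sigma = X.std(axis=axes, keepdims=True)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard constant variables
    return (X - mu) / sigma, mu, sigma

def unscale(Xs, mu, sigma):
    """The map S applied to a full (reconstructed) tensor."""
    return Xs * sigma + mu
```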

Once the scaling vectors have been determined, the next steps in finding an approximate solution to Eq. 9 are obtaining a good initial guess and choosing the weighting coefficients. For the former, we first obtain an approximate solution to

(10) $\min_{\tilde{\bm{\mathscr{M}}}} f(\tilde{\bm{\mathscr{X}}},\tilde{\bm{\mathscr{M}}})$

using a traditional approach (CP-ALS when $\tilde{\bm{\mathscr{M}}}$ is of CP form, ST-HOSVD when it is of Tucker form) to provide the initial guess $\tilde{\bm{\mathscr{M}}}^0$. We then fix the CP/Tucker rank of $\tilde{\bm{\mathscr{M}}}$ to the corresponding rank of $\tilde{\bm{\mathscr{M}}}^0$ and choose the weighting coefficients as

(11) $\alpha_0 = \dfrac{1}{Q+1}\,\dfrac{1}{f(\tilde{\bm{\mathscr{X}}},\tilde{\bm{\mathscr{M}}}^0)}, \qquad \alpha_q = \dfrac{1}{Q+1}\,\dfrac{1}{\sum_{t\in\mathbb{T}_q} \left( g_q(S(\tilde{\bm{\mathscr{X}}}_t)) - g_q(S(\tilde{\bm{\mathscr{M}}}^0_t)) \right)^2}, \quad q = 1,\dots,Q,$

which ensures the tensor term and each goal term contribute equally to the objective function at the initial guess.
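In code, the weight computation of Eq. 11 is a direct transcription; the sketch below (illustrative conventions, not the authors' MATLAB code) evaluates each term's initial error once and inverts it:

```python
import numpy as np

def goal_weights(Xs, M0s, goals, S):
    """Weighting coefficients of Eq. 11: each of the Q+1 objective terms
    contributes exactly 1/(Q+1) at the initial guess, so f_go starts at 1.

    Xs, M0s : scaled data tensor and initial (reconstructed) model,
              with time as the last mode.
    goals   : list of (g, T) pairs, g a scalar QoI on a temporal slice.
    S       : unscaling map of Eq. 8 (identity for data that is already
              nondimensionalized, as for HCCI below)."""
    Q = len(goals)
    alphas = [1.0 / ((Q + 1) * np.sum((Xs - M0s) ** 2))]
    for g, T in goals:
        err = sum((g(S(Xs[..., t])) - g(S(M0s[..., t]))) ** 2 for t in T)
        alphas.append(1.0 / ((Q + 1) * err))
    return alphas
```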

When presenting results of the various low-rank decompositions below, we report relative tensor reconstruction error using the unscaled data and model, $\|\bm{\mathscr{X}}-\bm{\mathscr{M}}\|_F / \|\bm{\mathscr{X}}\|_F$, the scaled data and model, $\|\tilde{\bm{\mathscr{X}}}-\tilde{\bm{\mathscr{M}}}\|_F / \|\tilde{\bm{\mathscr{X}}}\|_F$, or both.

\levelstay

Non-uniqueness and Stopping Criteria

It is well known that CP and Tucker models obtained from the solution of optimization problems such as Eq. 5 are not locally unique and instead lie along a manifold of equivalent solutions [10]. For CP, this non-uniqueness arises from the factor scaling ambiguity: any factor matrix column may be scaled by a nonzero value without changing the resulting tensor reconstruction, as long as the same column of the factor matrices for the other dimensions is scaled so that the product of the scales is unity. For Tucker, the situation is even worse, since any factor matrix may be multiplied by a nonsingular matrix without changing the tensor reconstruction, as long as the core is multiplied by the inverse of that matrix in the corresponding dimension. Since the QoI functions in Eq. 6 or Eq. 9 rely on tensor reconstructions, these same non-uniqueness properties are inherited by the goal-oriented formulation. This lack of uniqueness can cause problems when monitoring the norm of the gradient of the objective function as a stopping criterion, because it often does not converge monotonically to zero and instead exhibits a saw-tooth pattern. (While the objective function is constant along the manifold of equivalent solutions, and the gradient is therefore orthogonal to the tangent space of the manifold at each point, steps with a nonzero component along the tangent space can cause the norm of the gradient to increase since the manifold is curved.) Determining appropriate stopping criteria is therefore somewhat challenging. However, in this work we are not aiming to find the CP/Tucker model that best minimizes the QoI error, but rather to reduce this error relative to what is obtained by a traditional, non-goal-oriented low-rank decomposition. Our approach is thus to run the optimization method for a fixed number of iterations. In particular, because of the quick convergence of the trust-region Newton optimization method, we found 5 optimization iterations to be sufficient for achieving significant goal reduction in most of our experiments (although most of the experiments below used 20 iterations to ensure the results were near a local minimum). Furthermore, with the choice of weighting coefficients described above, the objective function $f_{go}$ evaluated at the initial guess satisfies $f_{go}(\tilde{\bm{\mathscr{X}}},\tilde{\bm{\mathscr{M}}}^0) = 1$, and if a solution $\tilde{\bm{\mathscr{M}}}^\ast$ is found that exactly preserves the QoIs, i.e., $g_q(S(\tilde{\bm{\mathscr{X}}}_t)) = g_q(S(\tilde{\bm{\mathscr{M}}}^\ast_t))$, while not changing the tensor loss (i.e., $f(\tilde{\bm{\mathscr{X}}},\tilde{\bm{\mathscr{M}}}^\ast) = f(\tilde{\bm{\mathscr{X}}},\tilde{\bm{\mathscr{M}}}^0)$), then $f_{go}(\tilde{\bm{\mathscr{X}}},\tilde{\bm{\mathscr{M}}}^\ast) = 1/(Q+1)$. These provide approximate (though not strict) upper and lower bounds on the objective function, and by comparing $f_{go}$ to $1/(Q+1)$ throughout the optimization process, one can gauge how well the solution to Eq. 6 has reduced the QoI error.

We note that the non-uniqueness challenges of these optimization formulations can be rectified by incorporating additional constraints into the problem formulation. For CP this is relatively straightforward: constrain each factor matrix column to unit norm and introduce $R$ additional weight variables (where $R$ is the CP rank) into the optimization problem. For Tucker, however, formulating explicit algebraic constraints that eliminate the above ambiguity is more challenging. Instead, one can formulate both CP and Tucker on Riemannian manifolds and leverage existing work on optimization methods over Riemannian manifolds (e.g., [1]). In the Tucker case, this is done by constraining the columns of the factor matrices to be orthonormal and recognizing that the resulting core is not a free variable, but rather the projection of the data onto the spaces spanned by the factor matrix columns in each dimension [33]. By eliminating the core, one can then pose and solve the Tucker minimization over a product of Grassmann manifolds [33, 32]. However, extending such a method to the goal-oriented Tucker formulation is non-trivial, since the constraint between the core and factor matrices is no longer satisfied and the problem therefore cannot be formulated over a product Riemannian manifold. We leave this manifold optimization approach to future work.

\levelstay

Software Implementation

For the numerical experiments below, we implemented the goal-oriented formulation in Eq. 9 in MATLAB, leveraging the Tensor Toolbox for MATLAB [6, 7] dense tensor, CP, and Tucker data structures as well as the matricized tensor times Khatri-Rao product (MTTKRP) and tensor times matrix (TTM) operations needed for the gradient and Hessian calculations described in the appendix. As described above, we used the Manopt trust-region Newton method as the optimization solver. While effective for investigating the feasibility of this approach, this serial, MATLAB-based implementation was found to be quite limiting in the size of the simulation data sets that could be studied. In particular, the approach for MTTKRP in the Tensor Toolbox for MATLAB for dense tensors involves explicitly forming the Khatri-Rao product, which for large data sets and large CP ranks is memory prohibitive.

\levelstay

Computational Cost

The goal-oriented formulation in Eq. 9 adds $Q$ terms to the objective function defining the CP/Tucker model, so the computational cost of the goal-oriented approach is necessarily greater than that of a traditional approach, even one based on similar optimization methods such as trust-region Newton/Gauss-Newton. Comparing Eq. 26 with Eq. 35 and Eq. 32 with Eq. 39 (in the appendix), we see the cost of computing the gradient of the additional QoI terms is similar to the gradient cost for the Frobenius loss term, once each QoI derivative tensor $\bm{\mathscr{Z}}$ has been computed. Note that these tensors can be combined across multiple QoIs so that the goal-oriented gradient applies Eq. 26/Eq. 32 once instead of $Q$ times, and so the gradient cost of the goal-oriented approach is about twice that of a traditional, non-goal-oriented approach (not including the cost of calculating the QoI derivative tensors). Furthermore, comparing Eq. 32 and Eq. 34 with Eq. 39 and Eq. 40, we see the cost of Hessian-vector products is again similar between the QoI terms and the Frobenius loss term for the Tucker method. However, comparing Eq. 26 and Eq. 29 with Eq. 36, we see the special structure of the Frobenius loss does allow for more efficient Hessian-vector products for the traditional CP model than can be computed for the goal-oriented approach, since the latter requires $d$ CP model reconstructions. Moreover, the cost of computing the derivative tensor $\bm{\mathscr{Z}}$ for each QoI can be large. Each derivative tensor is the same size and shape as the original data $\bm{\mathscr{X}}$, and in most applications the cost of computing it should be proportional to the total number of entries in $\bm{\mathscr{X}}$. However, the finite element integrations defining the QoI derivatives for the plasma physics examples shown below are challenging to vectorize within our MATLAB-based implementation, making the goal-oriented approach substantially more expensive than a traditional CP or Tucker decomposition method. We would expect the cost to be more competitive with a (Newton-based) traditional optimization method in an HPC environment where such integrals can be implemented more efficiently.

\levelup

Application Case Study I: Combustion

\leveldown

Background

We use a dataset from direct numerical simulation (DNS) of turbulent combustion under conditions representing a homogeneous charge compression ignition (HCCI) engine. The simulation [11] considers autoignition of a turbulent ethanol-air mixture in a 2D spatial domain, performed with the DNS code S3D [22], which solves the system of PDEs governing conservation of mass, momentum, energy, and chemical species mass fractions described in Table 1. S3D uses an explicit eighth-order finite difference scheme for spatial derivatives and an explicit fourth-order Runge-Kutta scheme for time integration. A rectangular Cartesian domain with a structured grid, uniform grid spacing, and uniform time step are typical choices (although not strictly necessary).

Mass: $\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho\mathbf{u}) = 0$
Momentum: $\frac{\partial\mathbf{m}}{\partial t} + \nabla\cdot\left[\mathbf{m}\otimes\mathbf{u} + (P\mathbf{I} + \bm{\Pi})\right] = \mathbf{0}$, with $\mathbf{m} \equiv \rho\mathbf{u}$
Total energy: $\frac{\partial(\rho e_t)}{\partial t} + \nabla\cdot\left[\rho\mathbf{u}e_t + \mathbf{q}\right] - \bm{\Pi}:\nabla\mathbf{u} = 0$, with $e_t \equiv e + \frac{\mathbf{u}\cdot\mathbf{u}}{2}$
Species mass fractions: $\frac{\partial(\rho\mathbf{s})}{\partial t} + \nabla\cdot\left[\mathbf{m}\otimes\mathbf{s} + \mathbf{d}\right] - \dot{\bm{\omega}} = \mathbf{0}$

Table 1: System of PDEs for low-Mach-number turbulent combustion solved by S3D. $\rho$ is density, $\mathbf{u}$ the velocity vector, $\mathbf{m}$ the momentum vector, $P$ pressure, $\bm{\Pi}$ the viscous shear stress tensor, $e_t$ and $e$ the total and internal energies, respectively, $\mathbf{q}$ the heat flux vector, $\mathbf{s}$ the vector of chemical species mass fractions, $\mathbf{d}$ the species diffusion flux, and $\dot{\bm{\omega}}$ the vector of species chemical production rates. In addition to the PDEs, a set of constitutive laws completes the description of the computational model: thermodynamic relationships compute temperature $T$ as a function of internal energy $e$; pressure is prescribed by an equation of state, $P = P(\rho\mathbf{s}, T)$; molecular transport laws relate $\bm{\Pi}$ to $\nabla\mathbf{u}$, $\mathbf{q}$ to $\nabla T$, and $\mathbf{d}$ to $\nabla\mathbf{s}$, respectively; and finite-rate chemical kinetics with the Arrhenius equation provide the species production rates $\dot{\bm{\omega}} = \dot{\bm{\omega}}(\rho\mathbf{s}, P, T)$.

The HCCI simulation is performed in a doubly periodic $x$-$y$ spatial domain, discretized with an equispaced $672\times 672$ grid. The data contain, at each grid point, the primary solution variables: velocities ($u_x$, $u_y$), temperature, pressure, and 28 chemical species ($\rho\mathbf{s}$) that capture the finite-rate chemical kinetics of ethanol combustion. A total of 626 snapshots in time are considered. Accordingly, the HCCI data tensor is denoted $\bm{\mathscr{X}} \in \mathbb{R}^{I_1\times I_2\times I_3\times\tau}$, where

  • $I_1$ represents the spatial discretization in $x$ ($I_1 = 672$);

  • $I_2$ represents the spatial discretization in $y$ ($I_2 = 672$);

  • $I_3$ represents the variables, including chemical species ($I_3 = 32$); and

  • $\tau$ represents the temporal snapshots ($\tau = 626$).

Following [50], the HCCI data are provided in dimensionless units by dividing each variable slice by the maximum magnitude of that variable across all spatial grid points and time points; therefore, the explicit scaling/unscaling operations described in the solution procedure above are not required.

\levelstay

Quantities of Interest

The tensor entries for the combustion data represent solution values at the discrete mesh points, which are uniformly spaced; this simplifies the expressions for the QoIs and the associated operations on the tensors. Any integral over the spatial domain $\Omega$ is equivalent to a summation over the first two spatial modes (or three, for 3D cases) times a constant factor that denotes an elemental volume:

$\int_{\Omega} (\cdot)\, d\Omega = \sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} (\cdot)\, \Delta x\, \Delta y.$

Likewise, any integral over time is equivalent to a summation over the last tensor mode. QoIs that are functions of the solution variables imply an operation over the variables mode at each point in space/time.

\leveldown

Conservation of Mass

Since the HCCI simulation is performed over a doubly periodic spatial domain, it is a closed system that does not exchange mass with its surroundings. Hence the mass, computed as the volume integral of density, is conserved in time. (Strictly speaking, a finite difference discretization is not conservative, but we expect the discretization errors to be much smaller than the errors due to the low-rank tensor representation.) Note that density itself is not stored in the data files; rather, it is implied by the species mass fractions vector via the relationship

$\sum_{i_3=1}^{28} s_{i_3} = 1 \implies \rho = \sum_{i_3=1}^{28} \rho s_{i_3}.$

Accordingly, the point-wise density of the HCCI data tensor is

(12) $\bm{\mathscr{D}}_{\bm{\mathscr{X}}}(i_1,i_2,t) = \sum_{i_3=1}^{28} \bm{\mathscr{X}}(i_1,i_2,i_3,t).$

We denote the full density tensor as $\bm{\mathscr{D}}_{\bm{\mathscr{X}}} \in \mathbb{R}^{I_1\times I_2\times\tau}$ and define our QoI functional for mass conservation at each time step as the volume integral of density,

(13) $\int_{\Omega} \rho\, d\Omega \equiv g_1(\bm{\mathscr{X}}_t) = \sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \bm{\mathscr{D}}_{\bm{\mathscr{X}}}(i_1,i_2,t), \quad t = 1,2,\dots,\tau,$

where the constant factor of $\Delta x\,\Delta y$ has been dropped for simplicity.
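A direct NumPy transcription of Eqs. 12 and 13 (illustrative only; the species occupy the first 28 slices of the variables mode per the data layout above, and zero-based indexing shifts the indices):

```python
import numpy as np

def mass_qoi(X):
    """g_1 of Eq. 13: pointwise density (Eq. 12) is the sum over the
    28 species slices of the variables mode, and the mass at each time
    step is the sum over both spatial modes (the constant dx*dy factor
    is dropped).  X has shape (I1, I2, 32, tau)."""
    D = X[:, :, :28, :].sum(axis=2)   # density, shape (I1, I2, tau)
    return D.sum(axis=(0, 1))         # mass per time step, shape (tau,)
```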

\levelstay

Conservation of Kinetic Energy

The point-wise velocities in the $x$- and $y$-directions are the last two quantities in the point-wise variables vector, and hence slices of the HCCI data tensor,

(14) $\bm{\mathscr{U}}^{(x)}_{\bm{\mathscr{X}}}(i_1,i_2,t) = \bm{\mathscr{X}}(i_1,i_2,31,t)$

and

(15) $\bm{\mathscr{U}}^{(y)}_{\bm{\mathscr{X}}}(i_1,i_2,t) = \bm{\mathscr{X}}(i_1,i_2,32,t),$

respectively. Kinetic energy is an important quantity in turbulent flows, and the point-wise kinetic energy is given by

(16) $\bm{\mathscr{K}}_{\bm{\mathscr{X}}}(i_1,i_2,t) = \bm{\mathscr{D}}_{\bm{\mathscr{X}}}(i_1,i_2,t) \left[ \left( \bm{\mathscr{U}}^{(x)}_{\bm{\mathscr{X}}}(i_1,i_2,t) \right)^2 + \left( \bm{\mathscr{U}}^{(y)}_{\bm{\mathscr{X}}}(i_1,i_2,t) \right)^2 \right].$

We denote the full kinetic energy tensor as $\bm{\mathscr{K}}_{\bm{\mathscr{X}}} \in \mathbb{R}^{I_1\times I_2\times\tau}$, which can be written as

(17) $\bm{\mathscr{K}}_{\bm{\mathscr{X}}} = \bm{\mathscr{D}}_{\bm{\mathscr{X}}} \ast \left[ \bm{\mathscr{U}}^{(x)}_{\bm{\mathscr{X}}} \ast \bm{\mathscr{U}}^{(x)}_{\bm{\mathscr{X}}} + \bm{\mathscr{U}}^{(y)}_{\bm{\mathscr{X}}} \ast \bm{\mathscr{U}}^{(y)}_{\bm{\mathscr{X}}} \right],$

where $\bm{\mathscr{A}} \ast \bm{\mathscr{B}}$ is the Hadamard (element-wise) product of tensors $\bm{\mathscr{A}}$ and $\bm{\mathscr{B}}$ [59]. We now define our QoI functional for kinetic energy as

(18) $g_2(\bm{\mathscr{X}}_t) = \sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \bm{\mathscr{K}}_{\bm{\mathscr{X}}}(i_1,i_2,t), \quad t = 1,2,\dots,\tau,$

representing conservation at each time step.
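Similarly, Eqs. 14-18 reduce to a few array operations; in the zero-based sketch below, the velocity slices 31 and 32 become indices 30 and 31:

```python
import numpy as np

def kinetic_energy_qoi(X):
    """g_2 of Eq. 18, via the Hadamard-product form of Eq. 17.
    X has shape (I1, I2, 32, tau); velocities are the last two
    variable slices (Eqs. 14-15)."""
    D = X[:, :, :28, :].sum(axis=2)          # density (Eq. 12)
    Ux, Uy = X[:, :, 30, :], X[:, :, 31, :]  # Eqs. 14-15
    K = D * (Ux**2 + Uy**2)                  # Eq. 17
    return K.sum(axis=(0, 1))                # Eq. 18, per time step
```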

\levelup

Results

In this section we present results of computing low-rank tensor decompositions on a subset of the HCCI data tensor. We focus on a subset of time steps of the simulation data where interesting dynamics exist, specifically time steps 360 through 409 during an ignition phase. Thus, the HCCI data tensor is $\bm{\mathscr{X}} \in \mathbb{R}^{672\times 672\times 32\times 50}$. As noted above, explicit scaling/unscaling of each variable slice is not necessary for this data. Otherwise, we use the solution strategy described in the Solution Procedure section, computing a traditional CP/Tucker decomposition via CP-ALS/ST-HOSVD using a supplied CP rank and ST-HOSVD error tolerance, respectively, which serves as the initial guess for the goal-oriented decomposition. For CP-ALS, we set a tight stopping tolerance of $10^{-9}$ on the change in the tensor fit and a maximum of 100 iterations to ensure the relative tensor reconstruction error between the unscaled tensor data and the CP model is dominated by rank-truncation error rather than solver error. We then choose the weighting coefficients as described above and run 20 iterations of the trust-region Newton optimization method.

The density and kinetic energy QoIs described above are displayed in Fig. 2(a) for the HCCI data tensor and for reconstructions obtained from a rank-70 CP-ALS decomposition and an ST-HOSVD decomposition with truncation tolerance of $0.1$. For the density QoI, there are small but clearly visible differences between the data and the traditional CP-ALS and ST-HOSVD decompositions. For the kinetic energy QoI, however, the differences are much more pronounced, especially for the CP-ALS decomposition. Such results are what motivated the development of the goal-oriented decompositions: when computing decompositions that simply aim to match the reconstructed low-rank model to the data, as in traditional tensor decompositions, it is often the case that QoIs associated with the data are not well modeled. In comparison, computing the goal-oriented decompositions (both GO-CP and GO-Tucker) leads to much more accurate modeling of the density and kinetic energy QoIs, as shown in Fig. 2(b). Furthermore, this increase in QoI modeling accuracy comes at a minimal increase in the relative tensor reconstruction error: about $0.04\%$ for GO-CP and $0.09\%$ for GO-Tucker.

Figure 2: Traditional (CP-ALS/ST-HOSVD) and goal-oriented (GO-CP/GO-Tucker) decompositions of the HCCI data for a CP rank of 70 and an initial ST-HOSVD truncation tolerance of $0.1$ (corresponding to a core tensor of size $46\times 40\times 7\times 3$), both resulting in compression ratios of about 7,000x. (a) Initial density and kinetic energy QoIs; (b) final density and kinetic energy QoIs.

We next consider the performance of the goal-oriented approach compared to the traditional CP-ALS and ST-HOSVD approaches over a range of CP ranks and initial ST-HOSVD truncation tolerances. Figure 3 displays the relative tensor reconstruction error as well as the relative QoI error for the density and kinetic energy QoIs as functions of compression ratio of the CP/Tucker reduced models, for both traditional and goal-oriented methods. We make several observations:

  • The Tucker relative tensor reconstruction error is (slightly) less than the CP relative tensor reconstruction error for the same compression ratio, as expected.

  • The relative tensor reconstruction errors are essentially identical between the traditional and goal-oriented approaches, for both CP and Tucker.

  • For all compression ratios tested in these experiments, the relative QoI errors were improved using the goal-oriented approach. For the density QoI, we see around 2–4 orders of magnitude improvement for both the CP and Tucker decompositions, while for the kinetic energy QoI, we see around 1–3 orders of magnitude improvement.

Figure 3: Relative tensor reconstruction and QoI errors for traditional (CP-ALS/ST-HOSVD) and goal-oriented (GO-CP/GO-Tucker) decompositions for the HCCI problem. Significant improvements in the relative QoI errors using the goal-oriented decompositions are observed (solid versus dashed lines in the center and right plots), with little observed change in the relative tensor reconstruction errors (left plot).

To illustrate the impact of the compression gained from computing the low-rank tensor decompositions of the HCCI data, Fig. 4(a) presents a slice of the data for the temperature variable over all spatial grid points at time step 401 in the simulation. Fig. 4(b) and Fig. 4(c) show the corresponding reconstructions using the CP-ALS and GO-CP decompositions, respectively; there are no clearly visible differences, despite the relative QoI errors of GO-CP being dramatic improvements over those of CP-ALS. Similarly, Fig. 4(d) and Fig. 4(e) show the corresponding reconstructions using the ST-HOSVD and GO-Tucker decompositions, respectively; here also there are no clearly visible differences, despite the dramatic improvements in the relative QoI errors using GO-Tucker.

Figure 4: Visual comparison of (a) the HCCI data and reconstructions from decompositions computed using (b) CP-ALS, (c) GO-CP, (d) ST-HOSVD, and (e) GO-Tucker for the temperature variable over all spatial grid points at time step 401 of 626. The decompositions use either a CP rank of 70 or an initial ST-HOSVD truncation tolerance of $0.1$ (corresponding to a core tensor of size $46\times 40\times 7\times 3$), resulting in compression ratios of about 7,000x for both the CP and Tucker decompositions.
\levelup

Application Case Study II: Plasma Physics

\leveldown

Background

The computational modeling of complex plasma physics systems is of critical importance in science and advanced technology. A heavily used base-level model is resistive magnetohydrodynamics (MHD) [39, 13]. This model describes the macroscopic dynamics of conducting fluids in the presence of electromagnetic fields and is often employed to study aspects of astrophysical phenomena (e.g., stellar interiors, solar flares), important science and technology applications (e.g., tokamak and stellarator devices, alternative pulsed fusion concepts), and basic plasma physics phenomena (e.g., magnetic reconnection, hydromagnetic instabilities) [39, 13]. The mathematical basis for the continuum modeling of these systems is the solution of the governing partial differential equations (PDEs) describing conservation of mass, momentum, and energy, augmented by an evolution equation for the magnetic field. The magnetic field leads to Lorentz forces in the momentum equation, $\mathbf{j}\times\mathbf{B} = (\nabla\times\mathbf{B}/\mu_0)\times\mathbf{B}$, and a Joule heating energy dissipation term, $\eta\|\mathbf{j}\|^2$, in the energy equation. An outline of these equations is provided in Table 2.

Mass: $\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho\mathbf{u}) = 0$
Momentum: $\frac{\partial\mathbf{m}}{\partial t} + \nabla\cdot\left[\mathbf{m}\otimes\mathbf{u} + (P\mathbf{I} + \bm{\Pi})\right] - \mathbf{j}\times\mathbf{B} = \mathbf{0}$, with $\mathbf{m} \equiv \rho\mathbf{u}$
Energy: $\frac{\partial(\rho e)}{\partial t} + \nabla\cdot\left[\rho\mathbf{u}e + \mathbf{q}\right] - \bm{\Pi}:\nabla\mathbf{u} - \eta\|\mathbf{j}\|^2 = 0$
Magnetic field: $\frac{\partial\mathbf{B}}{\partial t} + \nabla\cdot\left[\mathbf{u}\otimes\mathbf{B} - \mathbf{B}\otimes\mathbf{u} - \frac{\eta}{\mu_0}\left(\nabla\mathbf{B} - (\nabla\mathbf{B})^T\right) + \psi\mathbf{I}\right] = \mathbf{0}$, with $\nabla\cdot\mathbf{B} = 0$
Plasma QoIs: $IE = \int_{\Omega} (\rho e)\, d\Omega$; $KE = \int_{\Omega} \frac{\|\mathbf{m}\|^2}{2\rho}\, d\Omega$; $ME = \int_{\Omega} \frac{\|\mathbf{B}\|^2}{2\mu_0}\, d\Omega$; $M = \int_{\Omega} \|\mathbf{m}\|^2\, d\Omega$

Table 2: Low-Mach-number compressible resistive MHD plasma model and QoIs.

In the computational MHD simulations used in the results that follow, a fully implicit variational multiscale (VMS) unstructured finite element (FE) discretization of the governing MHD system is employed. The fully implicit time integration, based on Runge-Kutta methods, allows efficient solution of longer time-scale dynamics [63, 14], and the unstructured FE spatial discretization allows for complex geometries (e.g., ITER [14]) and local grid refinement. Large transient 3D simulations can generate solutions with $O(10\mathrm{M})$–$O(1\mathrm{B})$ elements and $O(10^3)$–$O(10^4)$ time steps, resulting in FE databases of $O(100\,\mathrm{GB})$–$O(10\,\mathrm{TB})$ in size.

\levelstay

Quantities of Interest

In the pursuit of fundamental scientific understanding of complex plasma physics systems, accurate representations of the state (the solution variables $(\rho, \mathbf{m}, (\rho e), \mathbf{B})$) as well as a number of scientific QoIs are important. As an example, magnetic reconnection is a fundamental process whereby a magnetic field is structurally altered via some dissipation mechanism, resulting in a rapid conversion of magnetic field energy into plasma energy and significant plasma transport/flow. Magnetic reconnection dominates the dynamics of many space and laboratory plasmas and is at the root of phenomena such as solar flares, coronal mass ejections, and major disruptions in magnetic confinement fusion energy (MFE) experiments [39, 12]. In this context, a number of important QoIs include the internal plasma energy (IE), magnetic energy (ME), kinetic energy (KE), the total energy $TE = IE + KE + ME$, and the total momentum $\sqrt{M}$ (see definitions in Table 2). In the evolution of the state variables and QoIs considered here, this transformation of energy components (ME, KE) in magnetic reconnection will be evident. (It should be noted that, due to energy transfer through the boundary, there is a small decrease in the total energy in the example problems presented.) Commonly in the evaluation of reconnection phenomena, an understanding of the growth rate of these transformations is of interest; for example, the slope of $\sqrt{M(t)}$ provides such a metric. Alternatively, using $\sqrt{ME(t)}$, one could compute a decay rate for the transformation of magnetic energy.

Figure 5: Tearing mode evolution of an unstable 1D current sheet to a 2D magnetic island. Colored contours and isolines of the current $J_z$ are shown at times $t = 0, 50, 75, 100, 125, 150$.

In this work, we attempt to preserve IE, KE, and ME as quantities of interest, which then implicitly preserves TE. Together with the momentum QoI, this results in four QoI functions, which are approximated using the finite element discretization scheme described above. Calculation of these QoIs is straightforward but substantially more involved than for the combustion QoIs described above, so descriptions of their evaluation are provided in Algorithm 1.

\levelstay

Results

\leveldown

2D Compressible Tearing Mode

The magnetic reconnection problem is a 2D low flow-Mach-number compressible tearing mode simulation that follows the unstable evolution of a thin current sheet formed by a sheared magnetic field (Harris sheet) within an initially stationary velocity field. The domain is a rectangle $[0,4]\times[0,1]$ discretized on a uniform grid with 501 and 201 grid points in the $x$ and $y$ directions, respectively. The computation is for a Lundquist number of $10^3$ and consists of 500 time steps. Details of the full problem setup can be found in [21, 14]. The thin current sheet becomes unstable and forms a magnetic island structure, as presented in Fig. 5 with the $x$-axis oriented in the vertical direction. The rate of decay of the $L_2$ norm of the magnetic energy and the rate of growth of the $L_2$ norm of the momentum provide information on the time scale of the linear phase of the instability [21]. These rates correspond to exponential fits of the slopes of these QoIs in Fig. 6.

We now consider goal-oriented CP and Tucker decompositions of the 2D tearing mode plasma physics data stored as a 4-way tensor $\bm{\mathscr{X}} \in \mathbb{R}^{401\times 201\times 12\times 501}$ consisting of each of the 12 simulation variables $(\rho, T, \mathbf{m}, \mathbf{u}, (\rho e), \mathbf{B})$ at each grid and time point. We use the solution strategy described in the Solution Procedure section, where we first center each variable slice by its mean and scale it by its standard deviation and then compute a traditional CP/Tucker decomposition via CP-ALS/ST-HOSVD using a supplied CP rank and ST-HOSVD error tolerance, respectively, which serves as the initial guess for the goal-oriented decomposition. As for the HCCI problem, we set a CP-ALS stopping tolerance of $10^{-9}$. Since the momentum values are near zero for the first few time steps, we only include time steps starting from the fifth in the momentum QoI, i.e., $\mathbb{T}_q = \{5,\dots,\tau\}$ for the momentum QoI. We then choose the weighting coefficients as described above and run 20 iterations of the trust-region Newton optimization method.

The momentum and energy QoIs described above are displayed in Fig. 6(a) for the data tensor and for reconstructions obtained from a rank-7 CP-ALS decomposition and an ST-HOSVD decomposition with truncation tolerance of $0.1$, where small but clearly visible differences are observed in these quantities, even though the relative tensor reconstruction error between the unscaled data and model is approximately $6\cdot 10^{-4}$ for CP-ALS and $1\cdot 10^{-4}$ for ST-HOSVD. The same QoIs are displayed in Fig. 6(b) after the goal-oriented decomposition computed with the trust-region Newton optimization method, where these visual differences between the QoIs for the data and the low-rank models have largely disappeared. Furthermore, the relative tensor reconstruction error is again about $6\cdot 10^{-4}$ and $1\cdot 10^{-4}$, respectively, indicating substantial reduction in relative QoI error with no increase in relative tensor reconstruction error.

(a) Initial momentum and energy QoIs
(b) Final momentum and energy QoIs
Figure 6: Traditional and goal-oriented CP and Tucker decompositions of the tearing sheet data for a CP rank of 7 and an initial ST-HOSVD truncation tolerance of $0.1$ (corresponding to a core tensor of size $8\times 9\times 5\times 3$), both resulting in compression ratios of about 60,000x.
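As a check on the quoted compression figure, the ratio of data entries to model parameters can be computed directly from the model sizes; a short MATLAB sketch under the rank and core size stated in the caption:

% Compression ratio = (# data entries) / (# model parameters).
sz = [401 201 12 501];                     % tearing mode tensor dimensions
R  = 7;  cp_params = R * sum(sz);          % unweighted CP factor matrices
core = [8 9 5 3];                          % ST-HOSVD core for tolerance 0.1
tk_params = prod(core) + sum(core .* sz);  % core plus factor matrices
ratio_cp = prod(sz) / cp_params;           % approx. 62,000x
ratio_tk = prod(sz) / tk_params;           % approx. 63,000x

Both ratios come out near the 60,000x figure quoted in the caption.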

We now consider the performance of the goal-oriented approach compared to the traditional CP/Tucker approach over a range of CP ranks and initial ST-HOSVD truncation tolerances. Figure 7 displays the relative tensor reconstruction error (for both the scaled and unscaled data and models) as well as the relative QoI errors for the momentum and energy QoIs as functions of the compression ratio of the CP/Tucker reduced models, for both the traditional and goal-oriented methods. We make several observations:

  • The Tucker relative tensor reconstruction error is typically less than the CP relative tensor reconstruction error for the same compression ratio, as expected.

  • The relative QoI errors for Tucker are also typically less than those for CP.

  • The relative tensor reconstruction errors for the traditional and goal-oriented approaches are essentially identical, for both CP and Tucker. This is true for both the scaled and unscaled data and models, indicating that the QoI improvements do not come at the cost of increased relative tensor reconstruction error of small-magnitude variables relative to larger ones.

  • Around 2–3 orders of magnitude improvement in relative QoI error for all QoIs is typically observed for the goal-oriented CP method, whereas goal-oriented Tucker results in about 1–2 orders of magnitude improvement.

Figure 7: Relative tensor reconstruction and QoI errors for traditional and goal-oriented CP and Tucker decompositions, including both scaled and unscaled data and models, for the 2D tearing mode problem. Significant improvement in the relative QoI errors is observed with little observed change in the relative tensor reconstruction errors.

The above calculations all relied on the trust-region Newton optimization method with Gauss-Newton Hessian approximation and approximate block-diagonal preconditioner. We found this approach to be much more reliable than the L-BFGS method, often finding more accurate solutions (in terms of QoI error) in significantly fewer iterations. For example, Fig. 8 displays the convergence history for the trust-region Newton and L-BFGS methods for the above goal-oriented Tucker experiment with an ST-HOSVD truncation tolerance of $0.1$, where we see lower achieved QoI error for the trust-region Newton method, which yields a good solution in just a few optimization iterations.

Figure 8: Convergence history for the trust-region Newton and L-BFGS optimization methods for goal-oriented Tucker for the 2D tearing mode problem.
\levelstay

3D Island Coalescence

The island coalescence problem follows the unstable evolution of two 3D current flux tubes (which in the cross plane become islands) embedded in a sheared magnetic field (Harris sheet) and is described in detail in [63]. The computational simulations presented here are for a higher Lundquist number and exhibit more significant dynamics in the evolution and reconnection [21, 63]. This example enables a demonstration of compression and evaluation of the evolution of appropriate QoIs; the more complex dynamics also facilitate a qualitative discussion of the reconstructions using low-rank tensor decompositions at the various compression ratios shown in Fig. 9 and Fig. 10. Visualization of an iso-surface (in this case of density) is a challenging qualitative metric for comparing reconstructions, as small changes in the density distribution in space will modify the structure of the iso-surface at a given value. Fig. 9 compares the initial condition and three discrete times in the evolution of the reconnecting flux tubes for the original FE solution and the ST-HOSVD decomposition at approximately 10x compression. The quality of the ST-HOSVD reconstruction at this 10x reduction is clearly evident.

Figure 9: Evolution of unstable 3D current flux tubes with magnetic reconnection. Top row: FE solution/data; bottom row: ST-HOSVD tensor reconstruction at 10x reduction. Density iso-surface at 1.5 colored by $\bm{u}_{x}$, and slice plane colored by density at $t=0,4.530,8.630,9.930$.

In Fig. 10, the ST-HOSVD reconstruction is visualized for up to three orders of magnitude of reduction. It is evident that at the significant reduction of 100x, the reconstruction remains very representative of the original data for visualizing the structure of the iso-surface. Additionally, even up to a 1,000x reduction, the major structural features of the iso-surface resemble the original FE data reasonably well, with the loop structures on the three main island iso-surfaces still present at time $t=9.930$.

Figure 10: Comparison of the FE solution/data and ST-HOSVD reconstructions at various compressions at $t=8.630,9.930$.
(a) FE Solution/Data.
(b) ST-HOSVD full-database.
(c) ST-HOSVD coarse-database.
Figure 11: Degradation of reconstruction quality due to coarsening the 3D island coalescence database to every third time step, shown for the density iso-surface at 1.5 colored by $\bm{u}_{x}$, and slice plane colored by density at $t=9.930$. This coarse database is used for the study of the goal-oriented methods.

Next, we compare the reconstructions of the four plasma physics QoIs, along with the quality of the iso-surface reconstruction, for traditional and goal-oriented decompositions at varying levels of reduction. The setup of the goal-oriented approach is identical to the tearing sheet problem described above, except that the default stopping tolerance of $10^{-4}$ for CP-ALS is used for the initial guess for goal-oriented CP. In this study, the original island coalescence data set produced by the simulation code was coarsened in time by selecting every third time step to overcome memory and runtime limitations with our current serial, Matlab-based software implementation (these limitations are expected to be removed in a future software implementation). It should be noted that, as expected, the temporal coarsening somewhat adversely affects the general quality of the reconstructions relative to what would have been obtained with the original data set at the same level of reduction, for both the traditional and goal-oriented approaches. This degradation is illustrated in Fig. 11, where the reference FE data and the full-database and coarse-database ST-HOSVD reconstructions at a reduction of 1,000x are presented.
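The temporal coarsening itself is a simple strided selection; a one-line MATLAB sketch, assuming the 3D island coalescence data is stored as a 5-way array X with time as the last mode:

% Keep every third time step of the island coalescence data.
Xc = X(:, :, :, :, 1:3:end);   % modes: (x, y, z, variable, time)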

Figure 12 and Fig. 13 present the improvement in reproducing the QoI evolution with the goal-oriented approaches, where the ranks/tolerances were chosen to yield compression ratios of approximately 10x, 100x, 1,000x, and 10,000x. Here we see significant error in the QoIs at the 1,000x and 10,000x compressions for the traditional approach, which is substantially reduced by the goal-oriented approach. As with the tearing mode data, essentially no change in the relative tensor reconstruction errors is observed.

(a) Relative tensor reconstruction errors of $0.00087$/$0.0083$ for unscaled/scaled ST-HOSVD.
(b) Relative tensor reconstruction errors of $0.00085$/$0.0084$ for unscaled/scaled GO-Tucker.
(c) Relative tensor reconstruction errors of $0.0062$/$0.062$ for unscaled/scaled ST-HOSVD.
(d) Relative tensor reconstruction errors of $0.0063$/$0.063$ for unscaled/scaled GO-Tucker.
Figure 12: Traditional (left column) and goal-oriented (right column) Tucker decompositions of the island coalescence data with varying scaled initial ST-HOSVD tolerances ($0.01$, $0.07$), yielding corresponding compression ratios of approximately 10x and 100x, respectively (CP decompositions for 10x and 100x compressions require too much memory due to the serial MATLAB implementation and thus are not provided). Even with the relatively high accuracy of these decompositions, visual differences in the QoIs are observed for the ST-HOSVD decomposition which are eliminated with the goal-oriented approach, while the unscaled and scaled relative tensor reconstruction errors are nearly the same between the approaches.
(a) Relative tensor reconstruction errors of $0.011$/$0.053$ for unscaled/scaled CP-ALS and $0.012$/$0.13$ for unscaled/scaled ST-HOSVD.
(b) Relative tensor reconstruction errors of $0.011$/$0.053$ for unscaled/scaled GO-CP and $0.012$/$0.13$ for unscaled/scaled GO-Tucker.
(c) Relative tensor reconstruction errors of $0.024$/$0.18$ for unscaled/scaled CP-ALS and $0.027$/$0.29$ for unscaled/scaled ST-HOSVD.
(d) Relative tensor reconstruction errors of $0.024$/$0.18$ for unscaled/scaled GO-CP and $0.025$/$0.29$ for unscaled/scaled GO-Tucker.
Figure 13: Traditional (left column) and goal-oriented (right column) CP and Tucker decompositions of the island coalescence data with varying CP ranks (2294, 229) and scaled initial ST-HOSVD tolerances ($0.15$, $0.33$), yielding corresponding compression ratios of approximately 1,000x and 10,000x, respectively. Substantial errors in the QoIs are observed for both the original CP and Tucker decompositions due to the high compression, yet these errors are effectively removed by the goal-oriented approach. As before, the unscaled and scaled relative tensor reconstruction errors are nearly the same between the two approaches.

As described above, the goal-oriented methods strongly improve the QoI reconstruction. In general these methods also appear to improve the iso-surface reconstructions in most cases, though not uniformly. To illustrate this effect, Fig. 14 and Fig. 15 present results at times $t=8.730$ and $t=9.930$, respectively, for reduction levels of 10x, 100x, 1,000x, and 10,000x. Even on this coarse database, the results look very good at the 10x and 100x compression ratios for the goal-oriented approaches. For higher compression ratios, it is evident for the $t=8.730$ data that the goal-oriented methods slightly improve the iso-surface reconstructions. This can most clearly be seen in the goal-oriented methods reconstructing some of the slender iso-surfaces at 1,000x and, for Tucker at 10,000x, reforming the continuous surface of the central iso-surface that has two islands on the intersection with the $(x,y)$ plane, whereas the ST-HOSVD and CP-ALS-based decompositions show only a single island outline.

For the later time of $t=9.930$, the results are mixed. In Fig. 15, the 10x reductions look excellent, and at 100x a slight improvement of the surface appears, with the loop structure reappearing in the goal-oriented Tucker reconstruction. At 1,000x, a slight degradation in the loop structure is evident in that the openings become closed surfaces compared to the standard ST-HOSVD. However, with a further increase in the compression ratio to 10,000x, the goal-oriented approach does isolate the three major iso-surfaces, an improvement over the standard ST-HOSVD, which merges these surfaces.

(Panels show reconstructions at 10x, 100x, 1,000x, and 10,000x compression.)
Figure 14: Comparison of the FE method to the standard and goal-oriented tensor decompositions at various compressions for reconstructing the density iso-surface at 1.5 colored by $\bm{u}_{x}$, and slice plane colored by density at $t=8.730$.
(Panels show reconstructions at 10x, 100x, 1,000x, and 10,000x compression.)
Figure 15: Comparison of the FE method to the standard and goal-oriented tensor decompositions at various compressions for reconstructing the density iso-surface at 1.5 colored by $\bm{u}_{x}$, and slice plane colored by density at $t=9.930$.
\levelmultiup

Conclusions and Future Work

In this paper, we introduced a new approach for low-rank modeling of high-dimensional numerical simulation data based on tensor decompositions, which augments the traditional optimization problems used to compute these decompositions with additional functionals in the objective function that attempt to preserve application-specific quantities of interest and invariants. The method was introduced specifically for CP and Tucker decompositions, but the approach is general and could likely be applied to any tensor format that can be computed through an optimization-based method. Examples of applying the technique to large data sets arising from the numerical simulation of turbulent combustion and of plasma physics relevant to fusion energy were presented, demonstrating several orders of magnitude reduction in the error of these QoIs while maintaining the same level of accuracy in the overall tensor reconstruction. When incorporating multiple QoIs in the problem formulation, we found trust-region Newton optimization methods based on Gauss-Newton Hessian approximations effective in solving the resulting optimization problems where other methods such as L-BFGS struggled.

While our results compared the effectiveness of CP and Tucker decompositions for reduction of scientific data, our aim was not to claim that one of these decompositions is superior to the other. Generally, we found the level of error for a given compression ratio to be comparable between the two methods, with comparable levels of reduction in QoI error for the goal-oriented approach. However, we often found the Tucker method easier to work with in practical settings, particularly for high-accuracy decompositions. With Tucker, setting a tight ST-HOSVD solver tolerance is straightforward and not terribly expensive computationally. Obtaining high-accuracy CP decompositions, in contrast, is challenging numerically due to the very high ranks (i.e., $O(10{,}000)$) required, which existing CP decomposition algorithms (e.g., CP-ALS) and the computational kernels supporting them (e.g., MTTKRP) are often not optimized to handle effectively. Conversely, we found the optimization problem defining goal-oriented CP decompositions typically easier to solve numerically. In fact, the challenges in solving the optimization problem for Tucker decompositions motivated the trust-region Newton optimization approach studied here.

While the approach was demonstrated to be successful in achieving our goal of reducing QoI error in reconstructions arising from the low-rank models, several challenges remain that could be studied in future work. In particular, we found the goal-oriented approach to be substantially more expensive than the traditional solution methods the models are based on (i.e., CP-ALS and ST-HOSVD). Part of this is due to our serial Matlab-based implementation, where vectorizing the evaluation of the QoI functions and their derivatives is challenging. We expect the approach to be much more competitive in terms of computational cost with a high-performance, parallel implementation. Other potential approaches for reducing cost include randomized algorithms, incremental update/streaming algorithms, and improved preconditioning that incorporates the QoI terms in some fashion. Effective stopping criteria beyond a fixed number of iterations were also not addressed in this work; extending the formulation to one on Riemannian manifolds, thereby eliminating the degeneracies arising from non-uniqueness so that traditional stopping criteria based on gradient norms could be used, would likely be fruitful. Finally, a theoretical study of how the goal-oriented approach can decrease QoI error without significantly increasing tensor reconstruction error, illustrating similarities and differences of the traditional and goal-oriented approaches, is needed.

Acknowledgments

This work was supported by the Laboratory Directed Research and Development program at Sandia National Laboratories, a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International Inc. for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

References

  • [1] P.-A. Absil, C. G. Baker, and K. A. Gallivan, Trust-region methods on Riemannian manifolds, Foundations of Computational Mathematics, 7 (2007), pp. 303–330, https://doi.org/10.1007/s10208-005-0179-9.
  • [2] E. Acar, D. M. Dunlavy, and T. G. Kolda, A scalable optimization approach for fitting canonical tensor decompositions, Journal of Chemometrics, 25 (2011), pp. 67–86, https://doi.org/10.1002/cem.1335.
  • [3] E. Acar, T. G. Kolda, and D. M. Dunlavy, All-at-once optimization for coupled matrix and tensor factorizations. arXiv:1105.3422, 2011, https://arxiv.org/abs/1105.3422.
  • [4] M. Ainsworth, O. Tugluk, B. Whitney, and S. Klasky, Multilevel techniques for compression and reduction of scientific data-quantitative control of accuracy in derived quantities, SIAM Journal on Scientific Computing, 41 (2019), pp. A2146–A2171, https://doi.org/10.1137/18M1208885.
  • [5] W. Austin, G. Ballard, and T. G. Kolda, Parallel tensor compression for large-scale scientific data, in IPDPS’16: Proceedings of the 30th IEEE International Parallel and Distributed Processing Symposium, May 2016, pp. 912–922, https://doi.org/10.1109/IPDPS.2016.67.
  • [6] B. W. Bader and T. G. Kolda, Algorithm 862: Matlab tensor classes for fast algorithm prototyping, ACM Trans. Math. Softw., 32 (2006), p. 635–653, https://doi.org/10.1145/1186785.1186794.
  • [7] B. W. Bader and T. G. Kolda, Efficient MATLAB computations with sparse and factored tensors, SIAM Journal on Scientific Computing, 30 (2007), pp. 205–231, https://doi.org/10.1137/060676489.
  • [8] G. Ballard, K. Hayashi, and R. Kannan, Parallel nonnegative CP decomposition of dense tensors, in 25th IEEE International Conference on High Performance Computing (HiPC), Dec 2018, pp. 22–31, https://ieeexplore.ieee.org/document/8638076.
  • [9] G. Ballard, A. Klinvex, and T. G. Kolda, TuckerMPI: A parallel C++/MPI software package for large-scale data compression via the Tucker tensor decomposition, ACM Transactions on Mathematical Software, 46 (2020), 13 (31 pages), https://doi.org/10.1145/3378445.
  • [10] G. Ballard and T. G. Kolda, Tensor Decompositions for Data Science, Cambridge University Press, 2025, https://www.mathsci.ai/post/tensor-textbook/.
  • [11] A. Bhagatwala, J. H. Chen, and T. Lu, Direct numerical simulations of HCCI/SACI with ethanol, Combustion and Flame, 161 (2014), pp. 1826–1841, https://doi.org/10.1016/j.combustflame.2013.12.027.
  • [12] D. Biskamp, Magnetic reconnection, Physics Reports, 237 (1994), pp. 179–247.
  • [13] J. A. Bittencourt, Fundamentals of plasma physics, Springer Science & Business Media, 2013.
  • [14] J. Bonilla, J. N. Shadid, X. Tang, M. Crockatt, P. Ohm, E. G. Phillips, R. Pawlowski, and S. Conde, On a fully-implicit VMS-stabilized FE formulation for low Mach number compressible resistive MHD with application to MCF, Computer Methods in Applied Mechanics and Engineering, 417 (2023), p. 116359.
  • [15] J. Borggaard, Z. Wang, and L. Zietsman, A goal-oriented reduced-order modeling approach for nonlinear systems, Computers & Mathematics with Applications, 71 (2016), pp. 2155–2169.
  • [16] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre, Manopt, a Matlab toolbox for optimization on manifolds, Journal of Machine Learning Research, 15 (2014), pp. 1455–1459, https://www.manopt.org.
  • [17] T. Bui-Thanh, K. Willcox, O. Ghattas, and B. van Bloemen Waanders, Goal-oriented, model-constrained optimization for reduction of large-scale systems, Journal of Computational Physics, 224 (2007), pp. 880–896.
  • [18] J. D. Carroll and J. J. Chang, Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart-Young” decomposition, Psychometrika, 35 (1970), pp. 283–319, https://doi.org/10.1007/BF02310791.
  • [19] R. B. Cattell, Parallel proportional profiles and other principles for determining the choice of factors by rotation, Psychometrika, 9 (1944), pp. 267–283, https://doi.org/10.1007/BF02288739.
  • [20] R. B. Cattell, The three basic factor-analytic research designs — their interrelations and derivatives, Psychological Bulletin, 49 (1952), pp. 499–520, https://doi.org/10.1037/h0054245.
  • [21] L. Chacón, An optimal, parallel, fully implicit newton-krylov solver for three-dimensional visco-resistive magnetohydrodynamics, Phys. Plasmas, 15 (2008), p. 056103.
  • [22] J. H. Chen, A. Choudhary, B. de Supinski, M. DeVries, E. R. Hawkes, S. Klasky, W. K. Liao, K. L. Ma, J. Mellor-Crummey, N. Podhorszki, R. Sankaran, S. Shende, and C. S. Yoo, Terascale direct numerical simulations of turbulent combustion using S3D, Computational Science & Discovery, 2 (2009), p. 015001, https://doi.org/10.1088/1749-4699/2/1/015001.
  • [23] L. Cheng, S. Mattei, P. W. Fick, and S. J. Hulshoff, A semi-continuous formulation for goal-oriented reduced-order models: 1d problems, International Journal for Numerical Methods in Engineering, 105 (2016), pp. 464–480.
  • [24] D. Choi and L. Sael, SNeCT: Scalable network constrained Tucker decomposition for multi-platform data profiling, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 17 (2020), pp. 1785–1796, https://doi.org/10.1109/TCBB.2019.2906205.
  • [25] J. Cohen, R. C. Farias, and P. Comon, Fast decomposition of large nonnegative tensors, IEEE Signal Processing Letters, 22 (2015), pp. 862–866, https://doi.org/10.1109/LSP.2014.2374838.
  • [26] J. Cohen, K. Usevich, and P. Comon, A tour of constrained tensor canonical polyadic decomposition, May 2016, https://hal.archives-ouvertes.fr/hal-01311795/. Accessed 05/24/2021.
  • [27] J. E. Cohen and V. Leplat, Efficient algorithms for regularized nonnegative scale-invariant low-rank approximation models, SIAM Journal on Mathematics of Data Science, 7 (2025), pp. 468–494, https://doi.org/10.1137/24M1657948.
  • [28] S. De, E. Corona, P. Jayakumar, and S. Veerapaneni, Tensor-train compression of discrete element method simulation data, Journal of Terramechanics, 113-114 (2024), p. 100967, https://doi.org/10.1016/j.jterra.2024.100967.
  • [29] L. De Lathauwer, B. De Moor, and J. Vandewalle, A multilinear singular value decomposition, SIAM Journal on Matrix Analysis and Applications, 21 (2000), pp. 1253–1278, https://doi.org/10.1137/S0895479896305696.
  • [30] L. De Lathauwer, B. De Moor, and J. Vandewalle, On the best rank-1 and rank-$(R_{1},R_{2},\dots,R_{N})$ approximation of higher-order tensors, SIAM Journal on Matrix Analysis and Applications, 21 (2000), pp. 1324–1342, https://doi.org/10.1137/S0895479898346995.
  • [31] L. De Lathauwer and E. Kofidis, Coupled matrix-tensor factorizations—the case of partially shared factors, in 2017 51st Asilomar Conference on Signals, Systems, and Computers, 2017, pp. 711–715, https://doi.org/10.1109/ACSSC.2017.8335436.
  • [32] H. De Sterck and A. Howse, Nonlinearly preconditioned optimization on grassmann manifolds for computing approximate tucker tensor decompositions, SIAM Journal on Scientific Computing, 38 (2016), pp. A997–A1018, https://doi.org/10.1137/15M1037288.
  • [33] L. Eldén and B. Savas, A Newton–Grassmann method for computing the best multilinear rank-$(r_{1},r_{2},r_{3})$ approximation of a tensor, SIAM Journal on Matrix Analysis and Applications, 31 (2009), pp. 248–271, https://doi.org/10.1137/070688316.
  • [34] H. Fanaee-T and J. Gama, Tensor-based anomaly detection: An interdisciplinary survey, Knowledge-Based Systems, 98 (2016), pp. 130–147, https://doi.org/10.1016/j.knosys.2016.01.027.
  • [35] F. Fang, C. Pain, I. M. Navon, and D. Xiao, An efficient goal-based reduced order model approach for targeted adaptive observations, International Journal for Numerical Methods in Fluids, 83 (2017), pp. 263–275.
  • [36] O. T. Farrando, A goal-oriented, reduced-order modeling framework of wall-bounded shear flows, Master's thesis, Delft University of Technology, 2022.
  • [37] G. Favier and A. de Almeida, Overview of constrained parafac models, EURASIP Journal on Advances in Signal Processing, 2014 (2014), p. 142, https://doi.org/10.1186/1687-6180-2014-142.
  • [38] B. Ghahremani and H. Babaee, Cross interpolation for solving high-dimensional dynamical systems on low-rank Tucker and tensor train manifolds, Computer Methods in Applied Mechanics and Engineering, 432 (2024), p. 117385, https://doi.org/10.1016/j.cma.2024.117385.
  • [39] H. Goedbloed and S. Poedts, Principles of Magnetohydrodynamics with Applications to Laboratory and Astrophysical Plasmas, Cambridge Univ. Press, 2004.
  • [40] W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, vol. 42 of Springer Series in Computational Mathematics, Springer-Verlag Berlin Heidelberg, 2012, https://doi.org/10.1007/978-3-642-28027-6.
  • [41] R. A. Harshman, Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multi-modal factor analysis, UCLA working papers in phonetics, 16 (1970), pp. 1–84. Available at http://www.psychology.uwo.ca/faculty/harshman/wpppfac0.pdf.
  • [42] Q. Heng, E. C. Chi, and Y. Liu, Robust low-rank tensor decomposition with the L2 criterion, Technometrics, 65 (2023), pp. 537–552, https://doi.org/10.1080/00401706.2023.2200541.
  • [43] F. L. Hitchcock, The expression of a tensor or a polyadic as a sum of products, Journal of Mathematics and Physics, 6 (1927), pp. 164–189, https://doi.org/10.1002/sapm192761164.
  • [44] F. L. Hitchcock, Multiple invariants and generalized rank of a p-way matrix or tensor, Journal of Mathematics and Physics, 7 (1927), pp. 39–79, https://doi.org/10.1002/sapm19287139.
  • [45] D. Hong, T. G. Kolda, and J. A. Duersch, Generalized canonical polyadic tensor decomposition, SIAM Review, 62 (2020), pp. 133–163, https://doi.org/10.1137/18M1203626.
  • [46] K. Huang, N. D. Sidiropoulos, and A. P. Liavas, A flexible and efficient algorithmic framework for constrained matrix and tensor factorization, IEEE Transactions on Signal Processing, 64 (2016), pp. 5052–5065, https://doi.org/10.1109/TSP.2016.2576427.
  • [47] I. G. Ion, Low-rank tensor decompositions for surrogate modeling in forward and inverse problems, PhD thesis, Technische Universität Darmstadt, Darmstadt, March 2024, https://doi.org/10.26083/tuprints-00026678.
  • [48] U. Kizhakkinan, P. L. T. Duong, R. Laskowski, G. Vastola, D. W. Rosen, and N. Raghavan, Development of a surrogate model for high-fidelity laser powder-bed fusion using tensor train and Gaussian process regression, Journal of Intelligent Manufacturing, 34 (2023), pp. 369–385, https://doi.org/10.1007/s10845-022-02038-4.
  • [49] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Review, 51 (2009), pp. 455–500, https://doi.org/10.1137/07070111X.
  • [50] H. Kolla, K. Aditya, and J. H. Chen, Higher order tensors for DNS data analysis and compression, in Data Analysis for Direct Numerical Simulations of Turbulent Combustion: From Equation-Based Analysis to Machine Learning, H. Pitsch and A. Attili, eds., Springer International Publishing, Cham, 2020, pp. 109–134, https://doi.org/10.1007/978-3-030-44718-2_6.
  • [51] J. Lee, Q. Gong, J. Choi, T. Banerjee, S. Klasky, S. Ranka, and A. Rangarajan, Error-bounded learned scientific data compression with preservation of derived quantities, Applied Sciences, 12 (2022), https://doi.org/10.3390/app12136718.
  • [52] X. Li and Y. Yuan, A tensor-based hyperspectral anomaly detection method under prior physical constraints, IEEE Transactions on Geoscience and Remote Sensing, 61 (2023), pp. 1–12, https://doi.org/10.1109/TGRS.2023.3324147.
  • [53] Z. Li, H. Zheng, N. Kovachki, D. Jin, H. Chen, B. Liu, K. Azizzadenesheli, and A. Anandkumar, Physics-informed neural operator for learning partial differential equations, ACM / IMS J. Data Sci., 1 (2024), https://doi.org/10.1145/3648506.
  • [54] J. Nocedal and S. J. Wright, Numerical Optimization, Springer Series in Operations Research, Springer-Verlag, New York, 2006.
  • [55] P. Paatero, A weighted non-negative least squares algorithm for three-way “PARAFAC” factor analysis, Chemometrics and Intelligent Laboratory Systems, 38 (1997), pp. 223–242, https://doi.org/10.1016/S0169-7439(97)00031-2.
  • [56] A.-H. Phan, P. Tichavský, and A. Cichocki, Low complexity damped Gauss–Newton algorithms for CANDECOMP/PARAFAC, SIAM Journal on Matrix Analysis and Applications, 34 (2013), pp. 126–147, https://doi.org/10.1137/100808034.
  • [57] A. H. Phan, P. Tichavský, and A. Cichocki, Damped Gauss-Newton algorithm for nonnegative Tucker decomposition, in 2011 IEEE Statistical Signal Processing Workshop (SSP), 2011, pp. 665–668, https://doi.org/10.1109/SSP.2011.5967789.
  • [58] A.-H. Phan, P. Tichavský, K. Sobolev, K. Sozykin, D. Ermilov, and A. Cichocki, Canonical polyadic tensor decomposition with low-rank factor matrices, in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 4690–4694, https://doi.org/10.1109/ICASSP39728.2021.9414606.
  • [59] L. Qi and Z. Luo, Tensor Analysis: Spectral Theory and Special Tensors, Society for Industrial and Applied Mathematics, Philadelphia, PA, 2017, https://doi.org/10.1137/1.9781611974751.
  • [60] M. Raissi, P. Perdikaris, and G. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics, 378 (2019), pp. 686–707, https://doi.org/10.1016/j.jcp.2018.10.045.
  • [61] T. M. Ranadive and M. M. Baskaran, Large-scale sparse tensor decomposition using a damped Gauss-Newton method, in 2020 IEEE High Performance Extreme Computing Conference (HPEC), 2020, pp. 1–8, https://doi.org/10.1109/HPEC43674.2020.9286202.
  • [62] M. Roald, C. Schenker, V. D. Calhoun, T. Adali, R. Bro, J. E. Cohen, and E. Acar, An ao-admm approach to constraining parafac2 on all modes, SIAM Journal on Mathematics of Data Science, 4 (2022), pp. 1191–1222, https://doi.org/10.1137/21M1450033.
  • [63] J. N. Shadid, R. P. Pawlowski, E. C. Cyr, R. S. Tuminaro, L. Chacon, and P. D. Weber, Scalable Implicit Incompressible Resistive MHD with Stabilized FE and Fully-coupled Newton-Krylov-AMG, Comput. Methods Appl. Mech. Engrg., 304 (2016), pp. 1–25.
  • [64] N. Singh, L. Ma, H. Yang, and E. Solomonik, Comparison of accuracy and scalability of Gauss-Newton and alternating least squares for CP decomposition, 2020, https://arxiv.org/abs/1910.12331.
  • [65] L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika, 31 (1966), pp. 279–311, https://doi.org/10.1007/BF02289464.
  • [66] M. Vandecappelle, N. Vervliet, and L. D. Lathauwer, Inexact generalized Gauss–Newton for scaling the canonical polyadic decomposition with non-least-squares cost functions, IEEE Journal of Selected Topics in Signal Processing, 15 (2021), pp. 491–505, https://doi.org/10.1109/JSTSP.2020.3045911.
  • [67] N. Vannieuwenhoven, R. Vandebril, and K. Meerbergen, A new truncation strategy for the higher-order singular value decomposition, SIAM Journal on Scientific Computing, 34 (2012), pp. A1027–A1052, https://doi.org/10.1137/110836067.
  • [68] C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal, Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization, ACM Trans. Math. Softw., 23 (1997), p. 550–560, https://doi.org/10.1145/279232.279236.
\levelstay

Goal-oriented Gradient and Hessian

In this section, we derive formulas for the efficient evaluation of the gradient and Hessian of the objective function $f_{go}$ needed for the optimization methods described above. Calculation of these quantities for the tensor term (i.e., $f(\bm{\mathscr{X}},\bm{\mathscr{M}})$) is standard within the literature, so we focus solely on the goal-oriented terms (formulas for the tensor term are summarized below for reference). For simplicity, we consider only a single goal term, since the extension to multiple goals is straightforward, and therefore drop the $q$ subscript. We also ignore variable scaling, as updating the goal derivatives to include it is straightforward, and assume $\mathbb{T}=\{1,\dots,\tau\}$. Define $G=\sum_{t=1}^{\tau}\left(g(\bm{\mathscr{X}}_{t})-g(\bm{\mathscr{M}}_{t})\right)^{2}$, which has the form of a nonlinear least-squares objective. Given a CP or Tucker model $\bm{\mathscr{M}}$, let $\bm{\mathbf{v}}$ be the vector containing the factor matrix coefficients (and core coefficients for Tucker). Define the residual function

(19) \bm{\mathbf{f}}(\bm{\mathbf{v}})=\begin{bmatrix}g(\bm{\mathscr{X}}_{1})-g(\bm{\mathscr{M}}_{1})\\ \vdots\\ g(\bm{\mathscr{X}}_{\tau})-g(\bm{\mathscr{M}}_{\tau})\end{bmatrix}

so that $G=\bm{\mathbf{f}}^{T}\bm{\mathbf{f}}$. Then $\nabla_{\bm{\mathbf{v}}}G=2\bm{\mathbf{J}}^{T}\bm{\mathbf{f}}$, where $\bm{\mathbf{J}}$ is the Jacobian matrix $\bm{\mathbf{D}}_{\bm{\mathbf{v}}}\bm{\mathbf{f}}$. Furthermore, the Gauss-Newton Hessian approximation $\bm{\mathbf{H}}\approx\nabla_{\bm{\mathbf{v}}}^{2}G$ is given by $\bm{\mathbf{H}}=2\bm{\mathbf{J}}^{T}\bm{\mathbf{J}}$. For simplicity in what follows, we assume $\bm{\mathscr{X}}\in\mathbb{R}^{I_{1}\times I_{2}\times\tau}$ contains only a single spatial mode and therefore is a 3-way tensor; the generalization to higher-dimensional $\bm{\mathscr{X}}$ is straightforward. Given a corresponding CP/Tucker model $\bm{\mathscr{M}}_{t}$ for each $t$, define $\bm{\mathscr{Z}}\in\mathbb{R}^{I_{1}\times I_{2}\times\tau}$ with $z_{i_{1}i_{2}t}=\partial g/\partial m_{i_{1}i_{2}t}$, the tensor of partial derivatives of $g$ with respect to the reconstructed tensor model entries.
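The following toy MATLAB sketch illustrates these Gauss-Newton quantities for a small synthetic residual (not one of the QoIs considered in this paper):

% Residual f(v), gradient 2*J'*f, and Gauss-Newton Hessian 2*J'*J
% for a tiny synthetic nonlinear least-squares problem.
v = [1.0; 2.0];
f = [v(1)^2 - 3; v(1)*v(2) - 2; v(2) - 1];   % residual vector
J = [2*v(1), 0;                              % Jacobian D_v f
     v(2),   v(1);
     0,      1];
G    = f' * f;         % goal term value
grad = 2 * (J' * f);   % gradient of G
H    = 2 * (J' * J);   % Gauss-Newton Hessian approximation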

\leveldown

Goal-oriented CP

Let $\bm{\mathscr{M}}=\llbracket\bm{\mathbf{A}},\bm{\mathbf{B}},\bm{\mathbf{C}}\rrbracket$ be a given 3-way, rank-$R$ (unweighted) CP model. Then $\bm{\mathbf{v}}=[\operatorname{vec}(\bm{\mathbf{A}})^{T}\;\operatorname{vec}(\bm{\mathbf{B}})^{T}\;\operatorname{vec}(\bm{\mathbf{C}})^{T}]^{T}$. Note that $\bm{\mathscr{M}}_{t}=\llbracket\bm{\mathbf{A}},\bm{\mathbf{B}},\bm{\mathbf{c}}_{t}\rrbracket$ where $\bm{\mathbf{c}}_{t}=\bm{\mathbf{C}}(t,:)$. Let $\bm{\mathbf{v}}_{t}=\operatorname{vec}(\bm{\mathscr{M}}_{t})$. Then we have

(20) \begin{aligned}\bm{\mathbf{v}}_{t}&=\operatorname{vec}(\bm{\mathscr{M}}_{t})\\ &=\bm{\mathbf{P}}_{1}^{T}\operatorname{vec}((\bm{\mathscr{M}}_{t})_{(1)})\\ &=\bm{\mathbf{P}}_{1}^{T}\operatorname{vec}(\bm{\mathbf{A}}(\bm{\mathbf{c}}_{t}\odot\bm{\mathbf{B}})^{T})\\ &=\bm{\mathbf{P}}_{1}^{T}((\bm{\mathbf{c}}_{t}\odot\bm{\mathbf{B}})\otimes\bm{\mathbf{I}}_{I_{1}})\operatorname{vec}(\bm{\mathbf{A}})\end{aligned}

where $\bm{\mathbf{P}}_{k}$ is the mode-$k$ perfect shuffle matrix such that $\bm{\mathbf{P}}_{k}\operatorname{vec}(\bm{\mathscr{X}})=\operatorname{vec}(\bm{\mathbf{X}}_{(k)})$ and $\bm{\mathbf{I}}_{I_{1}}$ is the $I_{1}\times I_{1}$ identity matrix. Thus

(21) \frac{\partial\bm{\mathbf{v}}_{t}}{\partial\operatorname{vec}(\bm{\mathbf{A}})}=\bm{\mathbf{P}}_{1}^{T}((\bm{\mathbf{c}}_{t}\odot\bm{\mathbf{B}})\otimes\bm{\mathbf{I}}_{I_{1}})

and through similar manipulations we obtain

(22) \begin{aligned}\frac{\partial\bm{\mathbf{v}}_{t}}{\partial\operatorname{vec}(\bm{\mathbf{B}})}&=\bm{\mathbf{P}}_{2}^{T}((\bm{\mathbf{c}}_{t}\odot\bm{\mathbf{A}})\otimes\bm{\mathbf{I}}_{I_{2}}),\\ \frac{\partial\bm{\mathbf{v}}_{t}}{\partial\operatorname{vec}(\bm{\mathbf{c}}_{t})}&=\bm{\mathbf{P}}_{3}^{T}((\bm{\mathbf{B}}\odot\bm{\mathbf{A}})\otimes\bm{\mathbf{I}}_{\tau}).\end{aligned}

The Jacobian matrix $\bm{\mathbf{J}}=\bm{\mathbf{D}}_{\bm{\mathbf{v}}}\bm{\mathbf{f}}$ has the block structure

(23) \bm{\mathbf{J}}=\begin{bmatrix}\bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{1}&\bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{1}&\bm{\mathbf{J}}_{\bm{\mathbf{c}}_{1}}^{1}&0&\ldots&0\\ \bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{2}&\bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{2}&0&\bm{\mathbf{J}}_{\bm{\mathbf{c}}_{2}}^{2}&\ldots&0\\ \vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\ \bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{\tau}&\bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{\tau}&0&0&\ldots&\bm{\mathbf{J}}_{\bm{\mathbf{c}}_{\tau}}^{\tau}\end{bmatrix}

where

(24) \begin{aligned}\bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{t}&=\frac{\partial f_{t}}{\partial\bm{\mathbf{v}}_{t}}\frac{\partial\bm{\mathbf{v}}_{t}}{\partial\operatorname{vec}(\bm{\mathbf{A}})}\\ &=-\operatorname{vec}(\bm{\mathscr{Z}}^{t})^{T}\frac{\partial\bm{\mathbf{v}}_{t}}{\partial\operatorname{vec}(\bm{\mathbf{A}})}\\ &=-\operatorname{vec}(\bm{\mathbf{Z}}_{(1)}^{t})^{T}\bm{\mathbf{P}}_{1}\bm{\mathbf{P}}_{1}^{T}((\bm{\mathbf{c}}_{t}\odot\bm{\mathbf{B}})\otimes\bm{\mathbf{I}}_{I_{1}})\\ &=-\left(((\bm{\mathbf{c}}_{t}\odot\bm{\mathbf{B}})^{T}\otimes\bm{\mathbf{I}}_{I_{1}})\operatorname{vec}(\bm{\mathbf{Z}}_{(1)}^{t})\right)^{T}\\ &=-\operatorname{vec}(\bm{\mathbf{Z}}_{(1)}^{t}(\bm{\mathbf{c}}_{t}\odot\bm{\mathbf{B}}))^{T}\end{aligned}

for $t\in\mathbb{T}$. Similarly,

(25) \begin{aligned}\bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{t}&=-\operatorname{vec}(\bm{\mathbf{Z}}_{(2)}^{t}(\bm{\mathbf{c}}_{t}\odot\bm{\mathbf{A}}))^{T},\\ \bm{\mathbf{J}}_{\bm{\mathbf{c}}_{t}}^{t}&=-\operatorname{vec}(\bm{\mathbf{Z}}_{(3)}^{t}(\bm{\mathbf{B}}\odot\bm{\mathbf{A}}))^{T}.\end{aligned}

Thus the Jacobian blocks are given by the well-known Matricized Tensor Times Khatri-Rao Product (MTTKRP) operation applied to the gradient tensors $\bm{\mathscr{Z}}^{t}$. Furthermore, one can efficiently compute Jacobian-vector and Jacobian-transpose-vector products using these formulas. Given $\bm{\mathbf{u}}\in\mathbb{R}^{\tau}$, one can easily show

(26) \bm{\mathbf{J}}^{T}\bm{\mathbf{u}}=\begin{bmatrix}\sum_{t=1}^{\tau}(\bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{t})^{T}u_{t}\\ \sum_{t=1}^{\tau}(\bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{t})^{T}u_{t}\\ (\bm{\mathbf{J}}_{\bm{\mathbf{c}}_{1}}^{1})^{T}u_{1}\\ \vdots\\ (\bm{\mathbf{J}}_{\bm{\mathbf{c}}_{\tau}}^{\tau})^{T}u_{\tau}\end{bmatrix}=\begin{bmatrix}\operatorname{vec}(\tilde{\bm{\mathbf{Z}}}_{(1)}(\bm{\mathbf{C}}\odot\bm{\mathbf{B}}))\\ \operatorname{vec}(\tilde{\bm{\mathbf{Z}}}_{(2)}(\bm{\mathbf{C}}\odot\bm{\mathbf{A}}))\\ \operatorname{vec}(\tilde{\bm{\mathbf{Z}}}_{(3)}(\bm{\mathbf{B}}\odot\bm{\mathbf{A}}))\end{bmatrix}

where $\tilde{\bm{\mathscr{Z}}}(:,:,t)=-u_{t}\bm{\mathscr{Z}}(:,:,t)$, which allows for efficient calculation of the gradient. Moreover, given $\bm{\mathbf{w}}=[\bm{\mathbf{w}}_{\bm{\mathbf{A}}}^{T}\;\bm{\mathbf{w}}_{\bm{\mathbf{B}}}^{T}\;\bm{\mathbf{w}}_{\bm{\mathbf{C}}}^{T}]^{T}=[\operatorname{vec}(\bm{\mathbf{W}}_{\bm{\mathbf{A}}})^{T}\;\operatorname{vec}(\bm{\mathbf{W}}_{\bm{\mathbf{B}}})^{T}\;\operatorname{vec}(\bm{\mathbf{W}}_{\bm{\mathbf{C}}})^{T}]^{T}\in\mathbb{R}^{R(I_{1}+I_{2}+\tau)}$,

(27) \bm{\mathbf{J}}\bm{\mathbf{w}}=\begin{bmatrix}\bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{1}\bm{\mathbf{w}}_{\bm{\mathbf{A}}}+\bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{1}\bm{\mathbf{w}}_{\bm{\mathbf{B}}}+\bm{\mathbf{J}}_{\bm{\mathbf{c}}_{1}}^{1}\bm{\mathbf{w}}_{\bm{\mathbf{c}}_{1}}\\ \vdots\\ \bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{\tau}\bm{\mathbf{w}}_{\bm{\mathbf{A}}}+\bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{\tau}\bm{\mathbf{w}}_{\bm{\mathbf{B}}}+\bm{\mathbf{J}}_{\bm{\mathbf{c}}_{\tau}}^{\tau}\bm{\mathbf{w}}_{\bm{\mathbf{c}}_{\tau}}\end{bmatrix}

where, e.g.,

(28) \begin{aligned}\bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{t}\bm{\mathbf{w}}_{\bm{\mathbf{A}}}&=-\operatorname{vec}(\bm{\mathbf{Z}}_{(1)}^{t})^{T}\bm{\mathbf{P}}_{1}\bm{\mathbf{P}}_{1}^{T}((\bm{\mathbf{c}}_{t}\odot\bm{\mathbf{B}})\otimes\bm{\mathbf{I}}_{I_{1}})\operatorname{vec}(\bm{\mathbf{W}}_{\bm{\mathbf{A}}})\\ &=-\operatorname{vec}(\bm{\mathbf{Z}}_{(1)}^{t})^{T}\bm{\mathbf{P}}_{1}\bm{\mathbf{P}}_{1}^{T}\operatorname{vec}(\bm{\mathbf{W}}_{\bm{\mathbf{A}}}(\bm{\mathbf{c}}_{t}\odot\bm{\mathbf{B}})^{T})\\ &=-\operatorname{vec}(\bm{\mathbf{Z}}_{(1)}^{t})^{T}\bm{\mathbf{P}}_{1}\bm{\mathbf{P}}_{1}^{T}\operatorname{vec}(\llbracket\bm{\mathbf{W}}_{\bm{\mathbf{A}}},\bm{\mathbf{B}},\bm{\mathbf{c}}_{t}\rrbracket_{(1)})\\ &=-\operatorname{vec}(\bm{\mathscr{Z}}^{t})^{T}\operatorname{vec}(\llbracket\bm{\mathbf{W}}_{\bm{\mathbf{A}}},\bm{\mathbf{B}},\bm{\mathbf{c}}_{t}\rrbracket).\end{aligned}

Thus

(29) \bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{t}\bm{\mathbf{w}}_{\bm{\mathbf{A}}}+\bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{t}\bm{\mathbf{w}}_{\bm{\mathbf{B}}}+\bm{\mathbf{J}}_{\bm{\mathbf{c}}_{t}}^{t}\bm{\mathbf{w}}_{\bm{\mathbf{c}}_{t}}=-\operatorname{vec}(\bm{\mathscr{Z}}^{t})^{T}\operatorname{vec}\left(\llbracket\bm{\mathbf{W}}_{\bm{\mathbf{A}}},\bm{\mathbf{B}},\bm{\mathbf{c}}_{t}\rrbracket+\llbracket\bm{\mathbf{A}},\bm{\mathbf{W}}_{\bm{\mathbf{B}}},\bm{\mathbf{c}}_{t}\rrbracket+\llbracket\bm{\mathbf{A}},\bm{\mathbf{B}},\bm{\mathbf{w}}_{\bm{\mathbf{c}}_{t}}\rrbracket\right).

Combining this with Eq. 26, one can efficiently compute Gauss-Newton Hessian-vector products.
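A minimal MATLAB sketch of the gradient computation in Eq. 26, assuming the Tensor Toolbox (tensor, mttkrp) and illustrative dimensions:

% J'*u of Eq. (26) via MTTKRP with the scaled derivative tensor Z~,
% where Z(:,:,t) holds the QoI partial derivatives at time step t.
I1 = 10; I2 = 8; tau = 5; R = 3;
A = randn(I1,R); B = randn(I2,R); C = randn(tau,R);
Z = randn(I1,I2,tau); u = randn(tau,1);
Zt = tensor(-reshape(u,1,1,tau) .* Z);          % slice t scaled by -u(t)
JTu = [reshape(mttkrp(Zt, {A,B,C}, 1), [], 1);  % vec(Z~_(1) (C kr B))
       reshape(mttkrp(Zt, {A,B,C}, 2), [], 1);  % vec(Z~_(2) (C kr A))
       reshape(mttkrp(Zt, {A,B,C}, 3), [], 1)]; % vec(Z~_(3) (B kr A))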

\levelstay

Goal-oriented Tucker

Let $\bm{\mathscr{M}}=\llbracket\bm{\mathscr{G}};\bm{\mathbf{A}},\bm{\mathbf{B}},\bm{\mathbf{C}}\rrbracket$ be a given 3-way, rank-$(R_{1},R_{2},R_{3})$ Tucker model. Then $\bm{\mathbf{v}}=[\operatorname{vec}(\bm{\mathscr{G}})^{T}\;\operatorname{vec}(\bm{\mathbf{A}})^{T}\;\operatorname{vec}(\bm{\mathbf{B}})^{T}\;\operatorname{vec}(\bm{\mathbf{C}})^{T}]^{T}$ and $\bm{\mathscr{M}}_{t}=\llbracket\bm{\mathscr{G}};\bm{\mathbf{A}},\bm{\mathbf{B}},\bm{\mathbf{c}}_{t}\rrbracket$ where $\bm{\mathbf{c}}_{t}=\bm{\mathbf{C}}(t,:)$. With $\bm{\mathbf{v}}_{t}=\operatorname{vec}(\bm{\mathscr{M}}_{t})$ and using similar techniques as above, one can show the Jacobian matrix has a similar block structure

(30) \bm{\mathbf{J}}=\begin{bmatrix}\bm{\mathbf{J}}_{\bm{\mathscr{G}}}^{1}&\bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{1}&\bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{1}&\bm{\mathbf{J}}_{\bm{\mathbf{c}}_{1}}^{1}&0&\ldots&0\\ \bm{\mathbf{J}}_{\bm{\mathscr{G}}}^{2}&\bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{2}&\bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{2}&0&\bm{\mathbf{J}}_{\bm{\mathbf{c}}_{2}}^{2}&\ldots&0\\ \vdots&\vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\ \bm{\mathbf{J}}_{\bm{\mathscr{G}}}^{\tau}&\bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{\tau}&\bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{\tau}&0&0&\ldots&\bm{\mathbf{J}}_{\bm{\mathbf{c}}_{\tau}}^{\tau}\end{bmatrix}

where

(31) \begin{aligned}\bm{\mathbf{J}}_{\bm{\mathscr{G}}}^{t}&=-\operatorname{vec}(\bm{\mathscr{Z}}^{t}\times_{1}\bm{\mathbf{A}}^{T}\times_{2}\bm{\mathbf{B}}^{T}\times_{3}\bm{\mathbf{c}}_{t}^{T})^{T},\\ \bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{t}&=-\operatorname{vec}((\bm{\mathscr{Z}}^{t}\times_{2}\bm{\mathbf{B}}^{T}\times_{3}\bm{\mathbf{c}}_{t}^{T})_{(1)}\bm{\mathbf{G}}_{(1)}^{T})^{T},\\ \bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{t}&=-\operatorname{vec}((\bm{\mathscr{Z}}^{t}\times_{1}\bm{\mathbf{A}}^{T}\times_{3}\bm{\mathbf{c}}_{t}^{T})_{(2)}\bm{\mathbf{G}}_{(2)}^{T})^{T},\\ \bm{\mathbf{J}}_{\bm{\mathbf{c}}_{t}}^{t}&=-\operatorname{vec}((\bm{\mathscr{Z}}^{t}\times_{1}\bm{\mathbf{A}}^{T}\times_{2}\bm{\mathbf{B}}^{T})_{(3)}\bm{\mathbf{G}}_{(3)}^{T})^{T}.\end{aligned}

Thus, as before,

(32) \bm{\mathbf{J}}^{T}\bm{\mathbf{u}}=\begin{bmatrix}\operatorname{vec}(\tilde{\bm{\mathscr{Z}}}\times_{1}\bm{\mathbf{A}}^{T}\times_{2}\bm{\mathbf{B}}^{T}\times_{3}\bm{\mathbf{C}}^{T})\\ \operatorname{vec}((\tilde{\bm{\mathscr{Z}}}\times_{2}\bm{\mathbf{B}}^{T}\times_{3}\bm{\mathbf{C}}^{T})_{(1)}\bm{\mathbf{G}}_{(1)}^{T})\\ \operatorname{vec}((\tilde{\bm{\mathscr{Z}}}\times_{1}\bm{\mathbf{A}}^{T}\times_{3}\bm{\mathbf{C}}^{T})_{(2)}\bm{\mathbf{G}}_{(2)}^{T})\\ \operatorname{vec}((\tilde{\bm{\mathscr{Z}}}\times_{1}\bm{\mathbf{A}}^{T}\times_{2}\bm{\mathbf{B}}^{T})_{(3)}\bm{\mathbf{G}}_{(3)}^{T})\end{bmatrix}

with $\tilde{\bm{\mathscr{Z}}}$ defined as above. Similarly,

(33) \bm{\mathbf{J}}\bm{\mathbf{w}}=\begin{bmatrix}\bm{\mathbf{J}}_{\bm{\mathscr{G}}}^{1}\bm{\mathbf{w}}_{\bm{\mathscr{G}}}+\bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{1}\bm{\mathbf{w}}_{\bm{\mathbf{A}}}+\bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{1}\bm{\mathbf{w}}_{\bm{\mathbf{B}}}+\bm{\mathbf{J}}_{\bm{\mathbf{c}}_{1}}^{1}\bm{\mathbf{w}}_{\bm{\mathbf{c}}_{1}}\\ \vdots\\ \bm{\mathbf{J}}_{\bm{\mathscr{G}}}^{\tau}\bm{\mathbf{w}}_{\bm{\mathscr{G}}}+\bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{\tau}\bm{\mathbf{w}}_{\bm{\mathbf{A}}}+\bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{\tau}\bm{\mathbf{w}}_{\bm{\mathbf{B}}}+\bm{\mathbf{J}}_{\bm{\mathbf{c}}_{\tau}}^{\tau}\bm{\mathbf{w}}_{\bm{\mathbf{c}}_{\tau}}\end{bmatrix}

with

(34) \bm{\mathbf{J}}_{\bm{\mathscr{G}}}^{t}\bm{\mathbf{w}}_{\bm{\mathscr{G}}}+\bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{t}\bm{\mathbf{w}}_{\bm{\mathbf{A}}}+\bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{t}\bm{\mathbf{w}}_{\bm{\mathbf{B}}}+\bm{\mathbf{J}}_{\bm{\mathbf{c}}_{t}}^{t}\bm{\mathbf{w}}_{\bm{\mathbf{c}}_{t}}=-\operatorname{vec}(\bm{\mathscr{Z}}^{t})^{T}\operatorname{vec}(\bm{\mathscr{W}}_{\bm{\mathscr{G}}}\times_{1}\bm{\mathbf{A}}\times_{2}\bm{\mathbf{B}}\times_{3}\bm{\mathbf{c}}_{t}+\bm{\mathscr{G}}\times_{1}\bm{\mathbf{W}}_{\bm{\mathbf{A}}}\times_{2}\bm{\mathbf{B}}\times_{3}\bm{\mathbf{c}}_{t}+\bm{\mathscr{G}}\times_{1}\bm{\mathbf{A}}\times_{2}\bm{\mathbf{W}}_{\bm{\mathbf{B}}}\times_{3}\bm{\mathbf{c}}_{t}+\bm{\mathscr{G}}\times_{1}\bm{\mathbf{A}}\times_{2}\bm{\mathbf{B}}\times_{3}\bm{\mathbf{w}}_{\bm{\mathbf{c}}_{t}}).
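A minimal MATLAB sketch of the Tucker gradient blocks in Eq. 32, assuming the Tensor Toolbox (tensor, ttm, tenmat) and illustrative dimensions:

% First two blocks of J'*u in Eq. (32) via mode products with the
% scaled derivative tensor Z~; the B and C blocks follow the same pattern.
I1 = 10; I2 = 8; tau = 5; R1 = 3; R2 = 3; R3 = 2;
A = randn(I1,R1); B = randn(I2,R2); C = randn(tau,R3);
G = tensor(randn(R1,R2,R3));
Z = randn(I1,I2,tau); u = randn(tau,1);
Zt = tensor(-reshape(u,1,1,tau) .* Z);      % slice t scaled by -u(t)
g_core = ttm(Zt, {A,B,C}, 't');             % Z~ x1 A' x2 B' x3 C'
TA  = ttm(Zt, {B,C}, [2 3], 't');           % products in modes 2 and 3 only
g_A = double(tenmat(TA,1)) * double(tenmat(G,1))';  % (...)_(1) * G_(1)'
JTu_core = reshape(double(g_core), [], 1);  % vec of the core block
JTu_A    = g_A(:);                          % vec of the A block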
\levelstay

Frobenius Gradient, Hessian, and Preconditioner

While the goal-oriented formulation in Eq. 6 is general and can be applied with any statistical loss function $f(\bm{\mathscr{X}},\bm{\mathscr{M}})$, our examples above focus specifically on the Frobenius (sum-of-squares) loss $f(\bm{\mathscr{X}},\bm{\mathscr{M}})=\frac{1}{2}\|\bm{\mathscr{X}}-\bm{\mathscr{M}}\|_{F}^{2}$. While the construction of the gradient, Gauss-Newton Hessian, and diagonal blocks for preconditioning is standard within the literature, we summarize these formulas here for completeness. As above, we restrict to the 3-way case for simplicity of exposition, as these formulas naturally generalize to higher-order tensors. First, in the CP case we have

(35) \begin{aligned}\frac{\partial f}{\partial\bm{\mathbf{A}}}&=\bm{\mathbf{A}}(\bm{\mathbf{C}}^{T}\bm{\mathbf{C}}\ast\bm{\mathbf{B}}^{T}\bm{\mathbf{B}})-\bm{\mathbf{X}}_{(1)}(\bm{\mathbf{C}}\odot\bm{\mathbf{B}}),\\ \frac{\partial f}{\partial\bm{\mathbf{B}}}&=\bm{\mathbf{B}}(\bm{\mathbf{C}}^{T}\bm{\mathbf{C}}\ast\bm{\mathbf{A}}^{T}\bm{\mathbf{A}})-\bm{\mathbf{X}}_{(2)}(\bm{\mathbf{C}}\odot\bm{\mathbf{A}}),\\ \frac{\partial f}{\partial\bm{\mathbf{C}}}&=\bm{\mathbf{C}}(\bm{\mathbf{B}}^{T}\bm{\mathbf{B}}\ast\bm{\mathbf{A}}^{T}\bm{\mathbf{A}})-\bm{\mathbf{X}}_{(3)}(\bm{\mathbf{B}}\odot\bm{\mathbf{A}}),\end{aligned}

where $\ast$ is the matrix Hadamard (element-wise) product. Given $\bm{\mathbf{w}}$ as defined above, the corresponding Gauss-Newton Hessian-vector product is

(36) \bm{\mathbf{J}}^{T}\bm{\mathbf{J}}\bm{\mathbf{w}}=\begin{bmatrix}\operatorname{vec}(\bm{\mathbf{W}}_{\bm{\mathbf{A}}}(\bm{\mathbf{B}}^{T}\bm{\mathbf{B}}\ast\bm{\mathbf{C}}^{T}\bm{\mathbf{C}})+\bm{\mathbf{A}}(\bm{\mathbf{W}}_{\bm{\mathbf{B}}}^{T}\bm{\mathbf{B}}\ast\bm{\mathbf{C}}^{T}\bm{\mathbf{C}})+\bm{\mathbf{A}}(\bm{\mathbf{B}}^{T}\bm{\mathbf{B}}\ast\bm{\mathbf{W}}_{\bm{\mathbf{C}}}^{T}\bm{\mathbf{C}}))\\ \operatorname{vec}(\bm{\mathbf{B}}(\bm{\mathbf{W}}_{\bm{\mathbf{A}}}^{T}\bm{\mathbf{A}}\ast\bm{\mathbf{C}}^{T}\bm{\mathbf{C}})+\bm{\mathbf{W}}_{\bm{\mathbf{B}}}(\bm{\mathbf{A}}^{T}\bm{\mathbf{A}}\ast\bm{\mathbf{C}}^{T}\bm{\mathbf{C}})+\bm{\mathbf{B}}(\bm{\mathbf{A}}^{T}\bm{\mathbf{A}}\ast\bm{\mathbf{W}}_{\bm{\mathbf{C}}}^{T}\bm{\mathbf{C}}))\\ \operatorname{vec}(\bm{\mathbf{C}}(\bm{\mathbf{W}}_{\bm{\mathbf{A}}}^{T}\bm{\mathbf{A}}\ast\bm{\mathbf{B}}^{T}\bm{\mathbf{B}})+\bm{\mathbf{C}}(\bm{\mathbf{A}}^{T}\bm{\mathbf{A}}\ast\bm{\mathbf{W}}_{\bm{\mathbf{B}}}^{T}\bm{\mathbf{B}})+\bm{\mathbf{W}}_{\bm{\mathbf{C}}}(\bm{\mathbf{A}}^{T}\bm{\mathbf{A}}\ast\bm{\mathbf{B}}^{T}\bm{\mathbf{B}}))\end{bmatrix}.
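In plain MATLAB, Eq. 36 can be evaluated using only small $R\times R$ Gram and cross-Gram matrices; a minimal sketch with illustrative dimensions:

% Gauss-Newton product J'*J*w of Eq. (36) for a 3-way CP model.
I1 = 10; I2 = 8; tau = 5; R = 3;
A  = randn(I1,R);  B  = randn(I2,R);  C  = randn(tau,R);
WA = randn(I1,R);  WB = randn(I2,R);  WC = randn(tau,R);
AtA = A'*A; BtB = B'*B; CtC = C'*C;
HA = WA*(BtB.*CtC) + A*((WB'*B).*CtC) + A*(BtB.*(WC'*C));
HB = B*((WA'*A).*CtC) + WB*(AtA.*CtC) + B*(AtA.*(WC'*C));
HC = C*((WA'*A).*BtB) + C*(AtA.*(WB'*B)) + WC*(AtA.*BtB);
JtJw = [HA(:); HB(:); HC(:)];   % stacked vec of each block, as in Eq. (36)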

Finally, the diagonal blocks of the Gauss-Newton Hessian are

(37) \begin{aligned}\bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{T}\bm{\mathbf{J}}_{\bm{\mathbf{A}}}&=(\bm{\mathbf{C}}^{T}\bm{\mathbf{C}}\ast\bm{\mathbf{B}}^{T}\bm{\mathbf{B}})\otimes\bm{\mathbf{I}}_{I_{1}},\\ \bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{T}\bm{\mathbf{J}}_{\bm{\mathbf{B}}}&=(\bm{\mathbf{C}}^{T}\bm{\mathbf{C}}\ast\bm{\mathbf{A}}^{T}\bm{\mathbf{A}})\otimes\bm{\mathbf{I}}_{I_{2}},\\ \bm{\mathbf{J}}_{\bm{\mathbf{C}}}^{T}\bm{\mathbf{J}}_{\bm{\mathbf{C}}}&=(\bm{\mathbf{B}}^{T}\bm{\mathbf{B}}\ast\bm{\mathbf{A}}^{T}\bm{\mathbf{A}})\otimes\bm{\mathbf{I}}_{\tau},\end{aligned}

whose inverses can be efficiently applied through Cholesky decompositions of the above Hadamard product matrices.
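For example, applying the inverse of the first diagonal block in Eq. 37 reduces to a Cholesky solve with the small $R\times R$ Hadamard-product matrix; a minimal MATLAB sketch:

% Apply inv((C'*C .* B'*B) kron I) to a vector r via Cholesky.
I1 = 10; I2 = 8; tau = 5; R = 3;
B = randn(I2,R); C = randn(tau,R); r = randn(I1*R,1);
Gram = (C'*C) .* (B'*B);       % R x R, symmetric positive definite
L  = chol(Gram, 'lower');
Rm = reshape(r, I1, R);        % (M kron I)^{-1} vec(X) = vec(X M^{-1})
Y  = (L' \ (L \ Rm'))';        % solves Y * Gram = Rm
y  = Y(:);                     % preconditioned vector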

For Tucker, the Gauss-Newton Jacobian is $\bm{\mathbf{J}}=[\bm{\mathbf{J}}_{\bm{\mathscr{G}}}\;\bm{\mathbf{J}}_{\bm{\mathbf{A}}}\;\bm{\mathbf{J}}_{\bm{\mathbf{B}}}\;\bm{\mathbf{J}}_{\bm{\mathbf{C}}}]$ with

(38) \begin{aligned}\bm{\mathbf{J}}_{\bm{\mathscr{G}}}&=-\bm{\mathbf{C}}\otimes\bm{\mathbf{B}}\otimes\bm{\mathbf{A}},\\ \bm{\mathbf{J}}_{\bm{\mathbf{A}}}&=-\bm{\mathbf{P}}_{1}^{T}((\bm{\mathbf{C}}\otimes\bm{\mathbf{B}})\bm{\mathbf{G}}_{(1)}^{T}\otimes\bm{\mathbf{I}}_{I_{1}}),\\ \bm{\mathbf{J}}_{\bm{\mathbf{B}}}&=-\bm{\mathbf{P}}_{2}^{T}((\bm{\mathbf{C}}\otimes\bm{\mathbf{A}})\bm{\mathbf{G}}_{(2)}^{T}\otimes\bm{\mathbf{I}}_{I_{2}}),\\ \bm{\mathbf{J}}_{\bm{\mathbf{C}}}&=-\bm{\mathbf{P}}_{3}^{T}((\bm{\mathbf{B}}\otimes\bm{\mathbf{A}})\bm{\mathbf{G}}_{(3)}^{T}\otimes\bm{\mathbf{I}}_{\tau}).\end{aligned}

Thus one can compute Jacobian-transpose products as

(39) \bm{\mathbf{J}}^{T}\bm{\mathbf{u}}=-\begin{bmatrix}\operatorname{vec}(\bm{\mathscr{U}}\times_{1}\bm{\mathbf{A}}^{T}\times_{2}\bm{\mathbf{B}}^{T}\times_{3}\bm{\mathbf{C}}^{T})\\ \operatorname{vec}((\bm{\mathscr{U}}\times_{2}\bm{\mathbf{B}}^{T}\times_{3}\bm{\mathbf{C}}^{T})_{(1)}\bm{\mathbf{G}}_{(1)}^{T})\\ \operatorname{vec}((\bm{\mathscr{U}}\times_{1}\bm{\mathbf{A}}^{T}\times_{3}\bm{\mathbf{C}}^{T})_{(2)}\bm{\mathbf{G}}_{(2)}^{T})\\ \operatorname{vec}((\bm{\mathscr{U}}\times_{1}\bm{\mathbf{A}}^{T}\times_{2}\bm{\mathbf{B}}^{T})_{(3)}\bm{\mathbf{G}}_{(3)}^{T})\end{bmatrix}

where $\bm{\mathbf{u}}=\operatorname{vec}(\bm{\mathscr{U}})$. In particular, this allows calculation of the gradient, where $\bm{\mathscr{U}}=\bm{\mathscr{X}}-\bm{\mathscr{M}}$. Furthermore, Jacobian-vector products can be computed as

(40) \bm{\mathbf{J}}_{\bm{\mathscr{G}}}\bm{\mathbf{w}}_{\bm{\mathscr{G}}}+\bm{\mathbf{J}}_{\bm{\mathbf{A}}}\bm{\mathbf{w}}_{\bm{\mathbf{A}}}+\bm{\mathbf{J}}_{\bm{\mathbf{B}}}\bm{\mathbf{w}}_{\bm{\mathbf{B}}}+\bm{\mathbf{J}}_{\bm{\mathbf{C}}}\bm{\mathbf{w}}_{\bm{\mathbf{C}}}=-\operatorname{vec}(\bm{\mathscr{W}}_{\bm{\mathscr{G}}}\times_{1}\bm{\mathbf{A}}\times_{2}\bm{\mathbf{B}}\times_{3}\bm{\mathbf{C}}+\bm{\mathscr{G}}\times_{1}\bm{\mathbf{W}}_{\bm{\mathbf{A}}}\times_{2}\bm{\mathbf{B}}\times_{3}\bm{\mathbf{C}}+\bm{\mathscr{G}}\times_{1}\bm{\mathbf{A}}\times_{2}\bm{\mathbf{W}}_{\bm{\mathbf{B}}}\times_{3}\bm{\mathbf{C}}+\bm{\mathscr{G}}\times_{1}\bm{\mathbf{A}}\times_{2}\bm{\mathbf{B}}\times_{3}\bm{\mathbf{W}}_{\bm{\mathbf{C}}}).

Finally, the Gauss-Newton Hessian diagonal blocks for preconditioning are

(41) \begin{aligned}\bm{\mathbf{J}}_{\bm{\mathscr{G}}}^{T}\bm{\mathbf{J}}_{\bm{\mathscr{G}}}&=\bm{\mathbf{C}}^{T}\bm{\mathbf{C}}\otimes\bm{\mathbf{B}}^{T}\bm{\mathbf{B}}\otimes\bm{\mathbf{A}}^{T}\bm{\mathbf{A}},\\ \bm{\mathbf{J}}_{\bm{\mathbf{A}}}^{T}\bm{\mathbf{J}}_{\bm{\mathbf{A}}}&=\bm{\mathbf{G}}_{(1)}(\bm{\mathbf{C}}^{T}\bm{\mathbf{C}}\otimes\bm{\mathbf{B}}^{T}\bm{\mathbf{B}})\bm{\mathbf{G}}_{(1)}^{T}\otimes\bm{\mathbf{I}}_{I_{1}},\\ \bm{\mathbf{J}}_{\bm{\mathbf{B}}}^{T}\bm{\mathbf{J}}_{\bm{\mathbf{B}}}&=\bm{\mathbf{G}}_{(2)}(\bm{\mathbf{C}}^{T}\bm{\mathbf{C}}\otimes\bm{\mathbf{A}}^{T}\bm{\mathbf{A}})\bm{\mathbf{G}}_{(2)}^{T}\otimes\bm{\mathbf{I}}_{I_{2}},\\ \bm{\mathbf{J}}_{\bm{\mathbf{C}}}^{T}\bm{\mathbf{J}}_{\bm{\mathbf{C}}}&=\bm{\mathbf{G}}_{(3)}(\bm{\mathbf{B}}^{T}\bm{\mathbf{B}}\otimes\bm{\mathbf{A}}^{T}\bm{\mathbf{A}})\bm{\mathbf{G}}_{(3)}^{T}\otimes\bm{\mathbf{I}}_{\tau}.\end{aligned}

Since these blocks can be quite large, our goal-oriented Tucker experiments only explicitly construct and invert the diagonal of these blocks.
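For the core block, for instance, the diagonal can be assembled without ever forming the full Kronecker product, since the diagonal of a Kronecker product is the Kronecker product of the diagonals; a small MATLAB sketch:

% Diagonal of J_G'*J_G in Eq. (41) without forming the Kronecker product.
I1 = 10; I2 = 8; tau = 5; R1 = 3; R2 = 3; R3 = 2;
A = randn(I1,R1); B = randn(I2,R2); C = randn(tau,R3);
d_core = kron(diag(C'*C), kron(diag(B'*B), diag(A'*A)));  % length R1*R2*R3
r_core = randn(R1*R2*R3, 1);
y_core = r_core ./ d_core;     % diagonal preconditioner application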

\levelup

Plasma Physics QoI Evaluations

The calculation of the QoI integrals for the plasma physics data is more complicated than the simple sums used for the combustion data, owing to the finite element discretization approach taken in these simulations. Here the values stored in the tensor do not necessarily correspond to values of the solution variables at points in the mesh, but rather are coefficients for the projection of the solution onto a finite element basis. The QoI integrals are then given by integrating the discrete solution represented by these basis functions over the computational domain. This is summarized in Algorithm 1 for evaluation over a 3-dimensional mesh $\mathcal{N}(\Omega)$ of the finite element approximation to the integral

(42) g(t)=\int_{\Omega}f(\bm{u}(\bm{x},t))\,d\bm{x},

assuming nodal finite element basis functions. The algorithm takes as input a data tensor $\bm{\mathscr{{X}}}$, a set of variable indices $\bm{\mathbf{{v}}}$ indicating which variables are used in the QoI, a set of time indices $\bm{\mathbf{{t}}}$ indicating which time steps are used for the QoI, the coordinates $\bm{\mathbf{{x}}}$, $\bm{\mathbf{{y}}}$, and $\bm{\mathbf{{z}}}$ of the nodes in the mesh, a matrix $\bm{\mathbf{{A}}}\in\mathbb{R}^{N_{qp}\times N_{n}}$ that interpolates each variable from the nodal values to the Gauss points in the cell, and a vector $\bm{\mathbf{{w}}}\in\mathbb{R}^{N_{qp}}$ of quadrature weights. For trilinear finite element basis functions using first-order Gaussian quadrature, $N_{n}=N_{qp}=8$, $\bm{\mathbf{{w}}}$ is a vector of all ones, and $\bm{\mathbf{{A}}}=\bm{\mathbf{{A}}}_{1}\otimes\bm{\mathbf{{A}}}_{1}\otimes\bm{\mathbf{{A}}}_{1}$, where

(43) \bm{\mathbf{{A}}}_{1}=\frac{1}{2}\begin{bmatrix}1+\frac{1}{\sqrt{3}}&1-\frac{1}{\sqrt{3}}\\ 1-\frac{1}{\sqrt{3}}&1+\frac{1}{\sqrt{3}}\end{bmatrix}.
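As a quick illustrative check (not taken from the paper), the $8\times 8$ operator $\bm{\mathbf{{A}}}$ assembled from (43) maps the eight nodal values of a hexahedral cell to its eight Gauss points, and each of its rows sums to one because the trilinear basis functions form a partition of unity:

```python
# Build the trilinear interpolation operator from (43) and verify the
# partition-of-unity property of its rows.
import numpy as np

s = 1.0 / np.sqrt(3.0)
A1 = 0.5 * np.array([[1 + s, 1 - s],
                     [1 - s, 1 + s]])
A = np.kron(A1, np.kron(A1, A1))   # 8 x 8: N_n = N_qp = 8
w = np.ones(8)                     # quadrature weights for this rule

assert np.allclose(A.sum(axis=1), 1.0)  # rows sum to one at every Gauss point
```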

The algorithm relies on several mappings/functions that must be initialized when reading the discrete mesh, including node_ind, which provides a list of indices of mesh nodes for a given element; tensor_ind, which provides the $(x,y,z)$ spatial indices of those nodes in the tensor $\bm{\mathscr{{X}}}$; and det_jac, which computes the determinant of the Jacobian of the transformation between the reference cell and a cell determined by its nodal coordinates. In addition to approximating the above QoI integral, Algorithm 1 also computes the QoI derivative needed for the Gauss-Newton gradient/Hessian computations described previously by approximating

(44) \bm{\mathbf{{Z}}}(t)=\int_{\Omega}\frac{\partial f}{\partial\bm{u}}(\bm{u}(\bm{x},t))\,d\bm{x}.
Algorithm 1 Finite-element QoI and derivative evaluation for plasma physics data.
Input: $\bm{\mathscr{{X}}}$, $\bm{\mathbf{{v}}}$, $\bm{\mathbf{{t}}}$, $\bm{\mathbf{{x}}}$, $\bm{\mathbf{{y}}}$, $\bm{\mathbf{{z}}}$, $\bm{\mathbf{{A}}}$, $\bm{\mathbf{{w}}}$
Output: $\bm{\mathbf{{g}}}$, $\bm{\mathscr{{Z}}}$
$\bm{\mathbf{{g}}}=0$
$\bm{\mathscr{{Z}}}=0$
for $e\in\mathcal{N}(\Omega)$ do
  $\bm{\mathbf{{n}}}=\mbox{{node\_ind}}(e)$ \triangleright indices for nodes in mesh cell $e$ (length $N_{n}$)
  $\bm{\mathbf{{b}}}=\mbox{{det\_jac}}(\bm{\mathbf{{x}}}(\bm{\mathbf{{n}}}),\bm{\mathbf{{y}}}(\bm{\mathbf{{n}}}),\bm{\mathbf{{z}}}(\bm{\mathbf{{n}}}))$ \triangleright determinant of cell transformation Jacobian (length $N_{qp}$)
  $\bm{\mathbf{{I}}}=\mbox{{tensor\_ind}}(\bm{\mathbf{{n}}})$ \triangleright tensor indices for each node index ($N_{n}\times 3$)
  for $k=1,\dots,N_{n}$ do
    $\bm{\mathscr{{U}}}(k,:,:)=\bm{\mathscr{{X}}}(i(k,1),i(k,2),i(k,3),\bm{\mathbf{{v}}},\bm{\mathbf{{t}}})$ \triangleright extract values corresponding to given nodes, variables, and time steps
  end for
  $\bm{\mathbf{{V}}}_{(1)}=\bm{\mathbf{{A}}}\bm{\mathbf{{U}}}_{(1)}$ \triangleright interpolate variables to quadrature points
  $\bm{\mathbf{{F}}}=f(\bm{\mathscr{{V}}})$ \triangleright evaluate QoI integrand at each quadrature point ($N_{qp}\times N_{t}$)
  $\bm{\mathscr{{D}}}=\frac{\partial f}{\partial u}(\bm{\mathscr{{V}}})$ \triangleright evaluate derivative of QoI integrand at each quadrature point ($N_{qp}\times N_{v}\times N_{t}$)
  $\bm{\mathbf{{g}}}\mathrel{+}=\sum_{k=1}^{N_{qp}}w(k)b(k)\bm{\mathbf{{F}}}(k,:)$ \triangleright cell's contribution to integral
  for $k=1,\dots,N_{n}$ do
    for $l=1,\dots,N_{qp}$ do
      $\bm{\mathscr{{Z}}}(i(k,1),i(k,2),i(k,3),\bm{\mathbf{{v}}},\bm{\mathbf{{t}}})\mathrel{+}=w(l)b(l)a(l,k)\bm{\mathscr{{D}}}(l,:,:)$ \triangleright cell's contribution to derivative integral
    end for
  end for
end for
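A minimal NumPy sketch of Algorithm 1 is given below for the special case of a structured, axis-aligned hexahedral mesh, where node_ind/tensor_ind reduce to index arithmetic and det_jac is constant within each cell. The uniform spacings hx, hy, hz, the callables f and df, and the (x, y, z, variable, time) tensor layout are assumptions made for illustration, not prescriptions from the paper.

```python
# Sketch of Algorithm 1 specialized to a structured, axis-aligned hex mesh.
# X has shape (Nx, Ny, Nz, Nv, Nt); A and w are the trilinear interpolation
# operator and quadrature weights described above.
import itertools
import numpy as np

def fem_qoi_and_derivative(X, v, t, hx, hy, hz, A, w, f, df):
    Nx, Ny, Nz = X.shape[:3]
    Nqp, Nn = A.shape
    g = np.zeros(len(t))
    Z = np.zeros_like(X)
    # det_jac: for an axis-aligned hex mapped from the [-1, 1]^3 reference
    # cell, the Jacobian determinant is the same constant at every Gauss point.
    b = np.full(Nqp, hx * hy * hz / 8.0)
    for ex, ey, ez in itertools.product(range(Nx - 1), range(Ny - 1), range(Nz - 1)):
        # node_ind/tensor_ind: the 8 nodes of cell (ex, ey, ez), ordered to
        # match the Kronecker construction of A (x slowest, z fastest).
        nodes = [(ex + dx, ey + dy, ez + dz)
                 for dx, dy, dz in itertools.product((0, 1), repeat=3)]
        U = np.stack([X[ix, iy, iz][np.ix_(v, t)] for ix, iy, iz in nodes])
        V = np.einsum('qk,kvt->qvt', A, U)   # interpolate to quadrature points
        F = f(V)                             # integrand values, Nqp x Nt
        D = df(V)                            # integrand derivative, Nqp x Nv x Nt
        g += (w * b) @ F                     # cell's contribution to the integral
        for k, (ix, iy, iz) in enumerate(nodes):
            Z[ix, iy, iz][np.ix_(v, t)] += np.einsum('q,qvt->vt', w * b * A[:, k], D)
    return g, Z
```

For a single-variable quadratic QoI $f(u)=u^{2}$, for instance, one would pass f = lambda V: V[:, 0, :]**2 and df = lambda V: 2 * V.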