
A Two-Level Direct Solver for the Hierarchical Poincaré-Steklov Method

Joseph Kump, Anna Yesypenko, Per-Gunnar Martinsson
Abstract

The Hierarchical Poincaré-Steklov method (HPS) is a solver for linear elliptic PDEs that combines a multidomain spectral collocation discretization scheme with a highly efficient direct solver. It is particularly well-suited for solving problems with highly oscillatory solutions. However, its hierarchical structure requires a heterogeneous and recursive approach that limits the potential parallelism. Our work introduces a new two-level solver for HPS which divides the method into dense linear algebra operations that can be implemented on GPUs, and a sparse system encoding the hierarchical aspects of the method which interfaces with highly optimized multilevel sparse solvers. This methodology improves the numerical stability of HPS as well as its computational performance. We present a description and justification of our method, as well as an array of tests on three-dimensional problems to showcase its accuracy and performance.

keywords:
HPS, sparse direct solvers, elliptic PDEs, spectral methods, domain decomposition, GPUs
journal: arXiv
Oden Institute, University of Texas at Austin, 201 E 24th St, Austin, TX 78712, USA

1 Introduction

Consider an elliptic boundary value problem of the form

[Au](\mathbf{x}) = f(\mathbf{x}), \quad \mathbf{x} \in \Omega \subset \mathbb{R}^{d},
u(\mathbf{x}) = g(\mathbf{x}), \quad \mathbf{x} \in \partial\Omega.   (1)

Here $d = 2$ or $3$, $A$ is an elliptic partial differential operator (PDO), $\Omega$ is a bounded domain in $\mathbb{R}^{d}$, $g : \partial\Omega \rightarrow \mathbb{R}$ is a known Dirichlet boundary condition, and $f : \Omega \rightarrow \mathbb{R}$ is a known body load. Our goal is to find $u : \Omega \rightarrow \mathbb{R}$, the continuous function that satisfies (1). An example PDO is the 3D variable-coefficient Helmholtz operator with wavenumber $\kappa$ and smooth nonnegative function $b$,

Au = -\left(\frac{\partial^{2}}{\partial x_{1}^{2}} + \frac{\partial^{2}}{\partial x_{2}^{2}} + \frac{\partial^{2}}{\partial x_{3}^{2}}\right)u - \kappa^{2} b(\mathbf{x})\,u.   (2)

Numerically solving an oscillatory problem such as the Helmholtz equation is difficult because of a common phenomenon called the pollution effect [1, 2]. A domain-based numerical method for the Helmholtz equation suffers from the pollution effect if, as $\kappa \rightarrow \infty$, the total number of degrees of freedom needed to maintain accuracy grows faster than $\kappa^{d}$ [3]. This effect makes many numerical methods infeasible for highly oscillatory problems. One group of methods that generally avoids the pollution effect, however, is global spectral methods [4]. These discretize a bounded, continuous domain using a set of basis functions restricted to the domain; a common choice is polynomials interpolated at Chebyshev nodes, which are more effective than other classical polynomial bases (such as those built on Gaussian nodes) at accurately approximating continuous functions [5]. For a global spectral method using Chebyshev nodes, the degrees of freedom correspond to the order of the polynomials used to approximate the solution function. High-order polynomials can be highly oscillatory while remaining smooth, so they are well suited to approximating oscillatory functions.

However, global spectral methods introduce their own problems. The matrices they produce are typically dense, making their use computationally expensive, and they are often poorly conditioned [6]. In addition, Chebyshev nodes are denser near the domain boundary than in the interior, which can lead to uneven accuracy in the approximation.

One strategy to leverage the strengths of spectral methods while mitigating their drawbacks is to integrate them into multidomain approaches, which partition the domain into subdomains on which the spectral method is applied, then use some technique to “glue” these subdomains together. Examples include spectral collocation methods [7, 8] and spectral element methods [9, 10, 11]. With these approaches the total number of degrees of freedom is generally the product of the number of subdomains and the polynomial order per subdomain. Since the spectral method is localized to smaller subdomains, its polynomial order can remain relatively low, which improves conditioning. This approach also allows the operators assembled in the method’s implementation to be sparse, since subdomains typically interact only with nearby subdomains in the discretization.

The Hierarchical Poincaré-Steklov method, or HPS method, is a multidomain spectral collocation scheme. As its name suggests, HPS uses a hierarchical divide-and-conquer approach in which the domain $\Omega$ is recursively split in half into a series of subdomains [12]. The PDE is enforced locally on each lowest-level subdomain (“leaf”) through a spectral method. Then, in an upward pass through the tree of hierarchical subdomains, solution operators are formed for leaf boundaries by enforcing continuity of the Neumann derivative. The HPS scheme is typically solved directly.

In addition to the advantages of other multidomain spectral methods, the HPS scheme’s use of a direct solver makes it well-suited for problems where effective preconditioners are difficult to obtain (including the aforementioned oscillatory problems), and the operators constructed as part of the method can be reused for problems with the same PDO but different $f$ and $g$ [13]. This also makes HPS well-suited for solving parabolic PDEs, since data from the previous time step enters through the body load [14, 15]. However, the hierarchical structure of HPS causes some issues of its own. In particular, the solution operators generated at each level tend to be ill-conditioned, since they are derived from spectral differentiation matrices and Dirichlet-to-Neumann (DtN) maps which are themselves ill-conditioned. Moreover, the recursive nature of hierarchical methods does not lend itself to easy parallelization: the levels of solution operators have to be constructed in serial, since each one depends on the previous level [16].

We propose a modified solver for the HPS scheme on 3D domains that forgoes the hierarchical approach for a two-level framework, analogous to [17]. Instead of recursively constructing a chain of solution operators in an upward pass, we use the DtN maps of each leaf subdomain to formulate a sparse system that encodes all leaf boundaries concurrently. This system can easily interface with effective multifrontal sparse solvers such as MUMPS, which can apply state-of-the-art pivoting techniques to improve the method’s stability. In addition, flattening the hierarchical component of the method enables easier parallelization and the use of GPU accelerators for parts of the method. This paper provides a summary of the traditional HPS method, then presents our improved method and an array of numerical experiments to showcase its effectiveness.

2 Preliminaries

Recall the boundary value problem we intend to solve, shown in (1). In this section we will describe the use of spectral methods and merge operations for discretizing and solving this type of problem.

2.1 Leaf Computation

In this subsection we describe a global spectral method to approximate an elliptic PDE on a box domain. We will later apply it to individual subdomains within a larger boundary value problem. Suppose we have a PDE such as (1), where $\Omega$ is a rectangular domain such as $[0,1]^{d}$, $d = 2$ or $3$. We discretize $\Omega$ into a product grid of $p^{d}$ Chebyshev nodes $\{x_{k}\}_{k=1}^{p^{d}}$, where $p$ is the chosen polynomial order (other works have used polynomial coefficients as degrees of freedom instead [18]). Let $\mathbf{u} \in \mathbb{R}^{p^{d}}$ be the numerical approximation of $u$ on this discretization, such that $\mathbf{u}_{k} \approx u(x_{k})$. There exist spectral differentiation matrices $\mathbf{D}^{(1)}$, $\mathbf{D}^{(2)}$, and (in 3D) $\mathbf{D}^{(3)}$ such that $[\mathbf{D}^{(i)}\mathbf{u}]_{k} \approx [\partial_{i}u](x_{k})$, and more generally $\mathbf{D}^{(i)}\mathbf{u}$ is a discretization of $\partial_{i}u$ [19]. Thus we can approximate the partial differential operator with a spectral differentiation matrix. For example, for the 3D Helmholtz equation shown in (2), we can discretize the partial differential operator with

\mathbf{A}\mathbf{u} = -\left(\left(\mathbf{D}^{(1)}\right)^{2} + \left(\mathbf{D}^{(2)}\right)^{2} + \left(\mathbf{D}^{(3)}\right)^{2} + \kappa^{2} b(\mathbf{x})\mathbf{I}\right)\mathbf{u},   (3)

where $\mathbf{I}$ is the $p^{3} \times p^{3}$ identity matrix. Since Chebyshev nodes include the endpoints of an interval, there will always be points on $\partial\Omega$: $4p$ points on the edges for $d = 2$, and $6p^{2}$ points on the faces for $d = 3$. Let $B$ denote the indices of $\mathbf{u}$ that lie on $\partial\Omega$, and $I$ denote the indices of $\mathbf{u}$ on the interior of $\Omega$. We can then show

\mathbf{A}\mathbf{u} = \mathbf{f}
\implies \mathbf{A}_{I,:}\mathbf{u} = \mathbf{f}_{I}
\implies \mathbf{A}_{I,I}\mathbf{u}_{I} + \mathbf{A}_{I,B}\mathbf{u}_{B} = \mathbf{f}_{I}
\implies \mathbf{A}_{I,I}\mathbf{u}_{I} = -\mathbf{A}_{I,B}\mathbf{u}_{B} + \mathbf{f}_{I}
\implies \mathbf{u}_{I} = \left(-\left(\mathbf{A}_{I,I}\right)^{-1}\mathbf{A}_{I,B}\right)\mathbf{u}_{B} + \left(\mathbf{A}_{I,I}\right)^{-1}\mathbf{f}_{I}
\implies \mathbf{u}_{I} = \mathbf{S}\mathbf{u}_{B} + \left(\mathbf{A}_{I,I}\right)^{-1}\mathbf{f}_{I}.   (4)

$\mathbf{S}$ is sometimes called the solution operator [20], and is of size $\#\text{ of interior points} \times \#\text{ of boundary points} = (p-2)^{d} \times 2dp^{d-1}$. Thus, given a boundary value problem such as (1) with partial differential operator $A$ and a discretization $\mathbf{A}$ based on Chebyshev nodes $\{x_{k}\}_{k=1}^{p^{d}}$, we can derive $\mathbf{S}$, which maps the problem’s Dirichlet data to its solution on the interior of $\Omega$ (with the addition of a body load term, if the problem is inhomogeneous). This is a global spectral method, and it excels at approximating highly oscillatory problems such as the Helmholtz equation. However, it is poorly conditioned for larger $p$, and it can struggle to capture function details near the center of $\Omega$, since there are fewer Chebyshev nodes in that region. In addition, both $\mathbf{A}$ and $\mathbf{S}$ are dense matrices, making this method computationally expensive for large $p$.
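
For concreteness, the following is a minimal NumPy sketch of this leaf computation: it builds the spectral differentiation matrices by Kronecker products, forms $\mathbf{A}$ for the constant-coefficient Helmholtz operator on the reference box $[-1,1]^{3}$, and extracts the solution operator $\mathbf{S} = -\mathbf{A}_{I,I}^{-1}\mathbf{A}_{I,B}$ of (4). The routine names (cheb, leaf_operators) and the reference box are our own illustrative choices, not the paper's code.

import numpy as np

def cheb(p):
    # Chebyshev-Lobatto nodes on [-1, 1] and the p x p spectral differentiation matrix
    # (standard construction, e.g. Trefethen, "Spectral Methods in MATLAB").
    x = np.cos(np.pi * np.arange(p) / (p - 1))
    c = np.hstack([2.0, np.ones(p - 2), 2.0]) * (-1.0) ** np.arange(p)
    dX = x[:, None] - x[None, :]
    D = np.outer(c, 1.0 / c) / (dX + np.eye(p))
    D -= np.diag(D.sum(axis=1))
    return D, x

def leaf_operators(p, kappa=0.0, b=lambda pts: np.ones(len(pts))):
    # Discretize A = -Laplacian - kappa^2 b(x) on the reference box [-1, 1]^3 with a
    # p^3 tensor-product Chebyshev grid, then form S = -A_II^{-1} A_IB as in (4).
    D, x = cheb(p)
    I = np.eye(p)
    D1 = np.kron(np.kron(D, I), I)   # d/dx1
    D2 = np.kron(np.kron(I, D), I)   # d/dx2
    D3 = np.kron(np.kron(I, I), D)   # d/dx3
    X1, X2, X3 = np.meshgrid(x, x, x, indexing="ij")
    pts = np.column_stack([X1.ravel(), X2.ravel(), X3.ravel()])
    A = -(D1 @ D1 + D2 @ D2 + D3 @ D3) - kappa**2 * np.diag(b(pts))
    on_bdry = np.any(np.isclose(np.abs(pts), 1.0), axis=1)
    B, Iint = np.where(on_bdry)[0], np.where(~on_bdry)[0]
    S = -np.linalg.solve(A[np.ix_(Iint, Iint)], A[np.ix_(Iint, B)])
    return S, A, B, Iint, pts, (D1, D2, D3)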

2.2 Merge Operation for Two Subdomains

Figure 1: A rectangular 2D domain $\Omega$ divided into subdomains $\Omega^{(1)}, \Omega^{(2)}$. $\mathbf{u}_{1}$ and $\mathbf{u}_{2}$ refer to function values on nodes in the interiors of $\Omega^{(1)}$ and $\Omega^{(2)}$, while $\mathbf{u}_{3}$ and $\mathbf{u}_{4}$ are on $\partial\Omega$ restricted to $\Omega^{(1)}$ and $\Omega^{(2)}$, respectively (the Dirichlet boundary condition). $\mathbf{u}_{5}$ is the shared boundary between $\Omega^{(1)}$ and $\Omega^{(2)}$; its points are not on $\partial\Omega$.

To mitigate the shortcomings of a global spectral method in solving an elliptic PDE such as (1), we partition $\Omega$ into two rectangular non-overlapping subdomains $\Omega^{(1)}$ and $\Omega^{(2)}$ (we will alternatively call these subdomains “leaves” or “boxes”). We discretize each subdomain using $p^{d}$ Chebyshev nodes, as we did for $\Omega$ in Section 2.1. This leads to a total of $2p^{d} - p^{d-1}$ points, since there is one shared face of points between $\Omega^{(1)}$ and $\Omega^{(2)}$. A diagram of this discretization in 2D is presented in Figure 1, with $\mathbf{u}_{5}$ denoting the shared face. We already know the data on the other faces ($\mathbf{u}_{3}, \mathbf{u}_{4}$ in the figure) of both boxes as part of the Dirichlet boundary condition. If we can solve for the points on the shared face first, then we can solve for the interior of each subdomain separately using a spectral method as in Section 2.1, enforcing the PDE locally. To help solve for the shared face, we add an additional condition to our problem: the Neumann derivatives of $u$ on $\partial\Omega^{(1)}$ and $\partial\Omega^{(2)}$ must be equal on the shared face. In other words, we enforce continuity of the Neumann derivative.

To enforce this continuity, we need a numerical approximation to the Neumann derivative. This can be done using spectral differentiation matrices. In particular, we form a Neumann operator matrix $\mathbf{N}$ on each subdomain such that

u_{\text{Neumann}}\big|_{\partial\Omega^{(i)}} \approx \mathbf{N}\mathbf{u}\big|_{\Omega^{(i)}} = \begin{bmatrix}\mathbf{D}_{K_{l},:}^{(1)}\\ \mathbf{D}_{K_{r},:}^{(1)}\\ \mathbf{D}_{K_{d},:}^{(2)}\\ \mathbf{D}_{K_{u},:}^{(2)}\\ \mathbf{D}_{K_{b},:}^{(3)}\\ \mathbf{D}_{K_{f},:}^{(3)}\end{bmatrix}\mathbf{u}\big|_{\Omega^{(i)}} \quad \text{for } i = 1, 2.

Here $l, r, d, u, b, f$ are indices for each face of the subdomain boundary, with $l, r$ pointing in the $x_{1}$ direction, $d, u$ in the $x_{2}$ direction, and $b, f$ in the $x_{3}$ direction (respectively “left”, “right”, “down”, “up”, “back”, and “front”; in 2D, $b$ and $f$ are omitted). $\mathbf{N}$ obtains the Neumann derivative on each face, taking both the boundary points $\mathbf{u}_{B}$ and the interior points $\mathbf{u}_{I}$ as inputs. To obtain the Neumann derivatives using only $\mathbf{u}_{B}$, we combine $\mathbf{N}$ with the solution operator $\mathbf{S}$ to get

\mathbf{N}\mathbf{u} = \mathbf{N}_{:,[B,I]}\begin{bmatrix}\mathbf{u}_{B}\\ \mathbf{u}_{I}\end{bmatrix} \quad \text{(reindexing entries / columns)}
= \mathbf{N}_{:,[B,I]}\begin{bmatrix}\mathbf{u}_{B}\\ \mathbf{S}\mathbf{u}_{B} + \mathbf{A}_{I,I}^{-1}\mathbf{f}_{I}\end{bmatrix} \quad \text{from (4)}
= \mathbf{N}_{:,[B,I]}\begin{bmatrix}\mathbf{Id}\\ \mathbf{S}\end{bmatrix}\mathbf{u}_{B} + \mathbf{N}_{:,[B,I]}\begin{bmatrix}0\\ \mathbf{A}_{I,I}^{-1}\mathbf{f}_{I}\end{bmatrix}
= \mathbf{T}\mathbf{u}_{B} + \mathbf{N}_{:,I}\mathbf{A}_{I,I}^{-1}\mathbf{f}_{I}.   (5)

$\mathbf{Id}$ is the $\text{size}(\mathbf{u}_{B}) \times \text{size}(\mathbf{u}_{B})$ identity matrix, and $\mathbf{T}$ denotes the Dirichlet-to-Neumann (DtN) map for $\Omega^{(i)}$ (also called the Poincaré-Steklov operator). For homogeneous PDEs we approximate the Neumann derivative entirely with $\mathbf{T}\mathbf{u}_{B}$, whereas the inhomogeneous case requires a contribution from the body load, as seen in (5). Since $\mathbf{T}$ maps from the box boundary to itself, it is a square matrix of size $2dp^{d-1} \times 2dp^{d-1}$.
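
A minimal sketch of forming the DtN map $\mathbf{T}$ of (5) is given below, reusing the hypothetical leaf_operators routine from the earlier sketch. The Neumann operator $\mathbf{N}$ is assembled by stacking rows of the spectral differentiation matrices at the six faces; the signs here follow a fixed coordinate direction per axis, whereas an outward-normal convention would flip the sign on three of the faces.

import numpy as np

def dtn_map(p, kappa=0.0):
    S, A, B, Iint, pts, (D1, D2, D3) = leaf_operators(p, kappa)
    faces = [(0, D1, -1.0), (0, D1, +1.0),    # left / right  (x1 = -1, +1)
             (1, D2, -1.0), (1, D2, +1.0),    # down / up     (x2 = -1, +1)
             (2, D3, -1.0), (2, D3, +1.0)]    # back / front  (x3 = -1, +1)
    # Rows of N: the appropriate differentiation matrix restricted to each face.
    N = np.vstack([Dk[np.isclose(pts[:, ax], side)] for ax, Dk, side in faces])
    # T = N[:, [B, I]] [Id; S] as in (5); rows are ordered face by face,
    # columns follow the grid ordering of the boundary indices B.
    T = N[:, B] + N[:, Iint] @ S
    return T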

Borrowing terminology from Figure 1, let $\mathbf{u}_{1}, \mathbf{u}_{2}, \mathbf{u}_{3}, \mathbf{u}_{4}, \mathbf{u}_{5}$ denote the various subsets of $\mathbf{u}$ on the discretized $\Omega$. Let $\mathbf{v}_{3}, \mathbf{v}_{4}, \mathbf{v}_{5}$ denote the Neumann derivatives on the same points as $\mathbf{u}_{3}, \mathbf{u}_{4}, \mathbf{u}_{5}$, and let $\mathbf{T}^{(1)}, \mathbf{T}^{(2)}$ denote the DtNs on $\Omega^{(1)}, \Omega^{(2)}$. Then, with a reindexing of some rows and columns of the DtNs,

\begin{bmatrix}\mathbf{v}_{3}\\ \mathbf{v}_{5}\end{bmatrix} = \mathbf{T}^{(1)}\begin{bmatrix}\mathbf{u}_{3}\\ \mathbf{u}_{5}\end{bmatrix}, \qquad \begin{bmatrix}\mathbf{v}_{4}\\ \mathbf{v}_{5}\end{bmatrix} = \mathbf{T}^{(2)}\begin{bmatrix}\mathbf{u}_{4}\\ \mathbf{u}_{5}\end{bmatrix}.   (6)

By requiring that the two expressions for $\mathbf{v}_{5}$ (the Neumann derivative on the shared face) agree, we get

\mathbf{v}_{5} = \mathbf{v}_{5}
\implies \mathbf{T}_{5,:}^{(1)}\begin{bmatrix}\mathbf{u}_{3}\\ \mathbf{u}_{5}\end{bmatrix} = \mathbf{T}_{5,:}^{(2)}\begin{bmatrix}\mathbf{u}_{4}\\ \mathbf{u}_{5}\end{bmatrix}
\implies \mathbf{T}_{5,3}^{(1)}\mathbf{u}_{3} + \mathbf{T}_{5,5}^{(1)}\mathbf{u}_{5} = \mathbf{T}_{5,4}^{(2)}\mathbf{u}_{4} + \mathbf{T}_{5,5}^{(2)}\mathbf{u}_{5}
\implies \left(\mathbf{T}_{5,5}^{(1)} - \mathbf{T}_{5,5}^{(2)}\right)\mathbf{u}_{5} = -\mathbf{T}_{5,3}^{(1)}\mathbf{u}_{3} + \mathbf{T}_{5,4}^{(2)}\mathbf{u}_{4}
\implies \mathbf{u}_{5} = \left(\mathbf{T}_{5,5}^{(1)} - \mathbf{T}_{5,5}^{(2)}\right)^{-1}\begin{bmatrix}-\mathbf{T}_{5,3}^{(1)} & \mathbf{T}_{5,4}^{(2)}\end{bmatrix}\begin{bmatrix}\mathbf{u}_{3}\\ \mathbf{u}_{4}\end{bmatrix} = \mathbf{S}^{(3)}\begin{bmatrix}\mathbf{u}_{3}\\ \mathbf{u}_{4}\end{bmatrix}.   (7)

Thus we can compute the shared face $\mathbf{u}_{5}$ from $\mathbf{u}_{3}$ and $\mathbf{u}_{4}$, which are given by the Dirichlet boundary condition, and a new solution operator $\mathbf{S}^{(3)}$ computed entirely from submatrices of the DtNs. This assumes a homogeneous PDE; if the problem is inhomogeneous, additional terms involving the body load appear, as in (5). Once $\mathbf{u}_{5}$ is solved, we can treat it as Dirichlet data for $\Omega^{(1)}$ and $\Omega^{(2)}$, along with $\mathbf{u}_{3}$ and $\mathbf{u}_{4}$, to get $\mathbf{u}_{1}$ and $\mathbf{u}_{2}$.
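
The two-box merge of (7) then amounts to a few submatrix extractions and one dense solve. A minimal sketch for the homogeneous case, where the index arrays selecting the relevant faces inside each DtN map are assumed to be available and are our own names:

import numpy as np

def merge_two_boxes(T1, T2, idx3, idx5_in_T1, idx4, idx5_in_T2, u3, u4):
    # Submatrices of the two DtN maps appearing in (7).
    T55_1 = T1[np.ix_(idx5_in_T1, idx5_in_T1)]
    T55_2 = T2[np.ix_(idx5_in_T2, idx5_in_T2)]
    T53_1 = T1[np.ix_(idx5_in_T1, idx3)]
    T54_2 = T2[np.ix_(idx5_in_T2, idx4)]
    rhs = -T53_1 @ u3 + T54_2 @ u4              # right-hand side of (7)
    u5 = np.linalg.solve(T55_1 - T55_2, rhs)    # applies S^(3) implicitly
    return u5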

This divide-and-conquer approach has a few advantages over a global spectral method. The solution operators and DtN maps for each subdomain can be computed separately, providing an easy source of parallelism. While the merged solution operator $\mathbf{S}^{(3)}$ is still somewhat ill-conditioned, it is better conditioned than a global spectral matrix with the same domain size and degrees of freedom. In addition, nodes are more evenly spaced across the entire domain than in a global spectral method.

2.2.1 Merging with Corner and Edge Points

Since Chebyshev nodes include the ends of their interval, using them to discretize boxes produces nodes on the corners and edges. Corner nodes present a challenge in this method, since their Neumann derivative is ill-defined (in three dimensions this challenge extends to edge points as well, but for consistency we will also refer to these nodes as corner nodes). If our partial differential operator $A$ does not include mixed second-order derivatives (such as in the Poisson equation or constant-coefficient Helmholtz), then the corner nodes do not impact the values of interior domain points through the solution operator $\mathbf{S}$, due to the structure of spectral differentiation matrices, so we may simply drop them from our discretization: when merging boxes, we only consider the $(p-2)^{d-1}$ points on the interior of the shared face.

However, if $A$ does include mixed second-order terms, then the corner points impact interior points through $\mathbf{S}$, so excluding them would significantly reduce accuracy. To make their role in the Neumann derivative unambiguous, we instead interpolate the box face points to Gaussian nodes (which do not include the ends of their interval). We then define the DtN map $\mathbf{T}_{\text{interp}} = \mathbf{L}_{CtG}\mathbf{T}\mathbf{L}_{GtC}$, where $\mathbf{L}_{CtG}$ is the Chebyshev-to-Gaussian interpolation operator and $\mathbf{L}_{GtC}$ is the Gaussian-to-Chebyshev interpolation operator. Once the shared face is solved, we interpolate it back to Chebyshev nodes before applying the local solution operators. A projection operator is composed onto the Gaussian-to-Chebyshev interpolation matrix to preserve continuity at corner points.

We call $q$ the degree of the modified box exteriors. In the case of dropped corners, $q = p - 2$. We can choose $q$ when we use Gaussian interpolation, but $q = p - 2$ or $p - 1$ is generally most effective. This means that in practice the DtN maps $\mathbf{T}$ are of size $2dq^{d-1} \times 2dq^{d-1}$.
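
The paper does not spell out the interpolation formula, but one standard way to build the face interpolation operators $\mathbf{L}_{CtG}$ and $\mathbf{L}_{GtC}$ is from 1D polynomial Vandermonde matrices, combined with a Kronecker product for a 2D face. The following sketch is written under that assumption, using Gauss-Legendre points for the “Gaussian” nodes:

import numpy as np
from numpy.polynomial import chebyshev as C, legendre as L

def interp_matrix(x_from, x_to):
    # Map function values at nodes x_from to values of the interpolating polynomial at x_to.
    n = len(x_from)
    V_from = C.chebvander(x_from, n - 1)
    V_to = C.chebvander(x_to, n - 1)
    return V_to @ np.linalg.inv(V_from)

p, q = 10, 8
x_cheb = np.cos(np.pi * np.arange(p) / (p - 1))   # Chebyshev-Lobatto nodes (include endpoints)
x_gauss = L.leggauss(q)[0]                        # Gauss-Legendre nodes (interior only)
L_CtG_1d = interp_matrix(x_cheb, x_gauss)         # p values -> q values
L_GtC_1d = interp_matrix(x_gauss, x_cheb)         # q values -> p values
L_CtG_face = np.kron(L_CtG_1d, L_CtG_1d)          # acts on a p x p face, returns q x q data
L_GtC_face = np.kron(L_GtC_1d, L_GtC_1d)
# Per face, T_interp = L_CtG_face @ T_face @ L_GtC_face, as described in Section 2.2.1.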

2.3 Analogy to Sparse $LU$ Factorization

One interpretation of this merge operation is as an analogy to $LU$ factorization [21]. Consider Figure 1, simplified by treating the domain Dirichlet boundary terms $\mathbf{u}_{3}, \mathbf{u}_{4}$ as parts of $\mathbf{u}_{1}$ and $\mathbf{u}_{2}$, respectively. Then we can represent our numerical partial differential operator (PDO) $\mathbf{A}$ with

\mathbf{A}\mathbf{u} = \mathbf{f} \equiv \begin{bmatrix}\mathbf{A}_{11} & & \mathbf{A}_{15}\\ & \mathbf{A}_{22} & \mathbf{A}_{25}\\ \mathbf{A}_{51} & \mathbf{A}_{52} & \mathbf{A}_{55}\end{bmatrix}\begin{bmatrix}\mathbf{u}_{1}\\ \mathbf{u}_{2}\\ \mathbf{u}_{5}\end{bmatrix} = \begin{bmatrix}\mathbf{f}_{1}\\ \mathbf{f}_{2}\\ \mathbf{f}_{5}\end{bmatrix}.   (8)

Here $\mathbf{u}$ is the solution and $\mathbf{f}$ is the body load on the interior of $\Omega$, split into the two subdomains and the shared boundary. Because an elliptic PDO is a local operator, there is no interaction through $\mathbf{A}$ between $\mathbf{f}_{1}$ and $\mathbf{u}_{2}$, or between $\mathbf{f}_{2}$ and $\mathbf{u}_{1}$, so the corresponding submatrices of $\mathbf{A}$ are zero. However, both $\mathbf{u}_{1}$ and $\mathbf{u}_{2}$ are local to the shared boundary where $\mathbf{u}_{5}$ lies, so they both interact with the body load there, $\mathbf{f}_{5}$, through $\mathbf{A}$. To interpret the first two block rows, note that

\mathbf{A}_{11}\mathbf{u}_{1} + \mathbf{A}_{15}\mathbf{u}_{5} = \mathbf{f}_{1} \iff \mathbf{u}_{1} = -\mathbf{A}_{11}^{-1}\mathbf{A}_{15}\mathbf{u}_{5} + \mathbf{A}_{11}^{-1}\mathbf{f}_{1}.   (9)

This is similar to our solution operator formulation shown in (4), with the part of the subdomain boundary on $\partial\Omega$ now folded into $\mathbf{u}_{1}$. Next, let $\mathbf{\tilde{u}}_{1} = \mathbf{A}_{11}^{-1}\mathbf{f}_{1}$ and $\mathbf{\tilde{u}}_{2} = \mathbf{A}_{22}^{-1}\mathbf{f}_{2}$. If we supply these vectors in place of $\mathbf{u}_{1}, \mathbf{u}_{2}$ in (8) and set $\mathbf{u}_{5} = \mathbf{0}$, then the equation still holds with the same body loads $\mathbf{f}_{1}$ and $\mathbf{f}_{2}$. Thus $\mathbf{\tilde{u}}_{1}$ and $\mathbf{\tilde{u}}_{2}$ are the particular solutions to our PDE on the interiors of $\Omega^{(1)}$ and $\Omega^{(2)}$ with $\mathbf{u}_{5} = \mathbf{0}$. Now consider an upper triangular matrix $\mathbf{U}$ that satisfies

\underbracket{\begin{bmatrix}\mathbf{I} & & \mathbf{A}_{11}^{-1}\mathbf{A}_{15}\\ & \mathbf{I} & \mathbf{A}_{22}^{-1}\mathbf{A}_{25}\\ & & \mathbf{I}\end{bmatrix}}_{\mathbf{U}}\begin{bmatrix}\mathbf{u}_{1}\\ \mathbf{u}_{2}\\ \mathbf{u}_{5}\end{bmatrix} = \begin{bmatrix}\mathbf{\tilde{u}}_{1}\\ \mathbf{\tilde{u}}_{2}\\ \mathbf{u}_{5}\end{bmatrix}.   (10)

This linear system can be confirmed with (9). $\mathbf{U}$ decouples our solutions $\mathbf{u}_{1}, \mathbf{u}_{2}$ into the particular solutions $\mathbf{\tilde{u}}_{1}, \mathbf{\tilde{u}}_{2}$, which are derived from our global Dirichlet BC without the subdomain-only (shared face) Dirichlet BC, and $\mathbf{u}_{5}$, which is that subdomain-only boundary condition. Thus $\mathbf{U}^{-1}$ collects both components of our solutions: in effect it represents the end of solving the merged system, where we use $\mathbf{u}_{3}$ and $\mathbf{u}_{5}$ to get $\mathbf{u}_{1}$, and $\mathbf{u}_{4}$ and $\mathbf{u}_{5}$ to get $\mathbf{u}_{2}$. Next, let $\mathbf{L}$ be a lower triangular matrix defined such that

\underbracket{\begin{bmatrix}\mathbf{I} & & \\ & \mathbf{I} & \\ \mathbf{A}_{51}\mathbf{A}_{11}^{-1} & \mathbf{A}_{52}\mathbf{A}_{22}^{-1} & \mathbf{I}\end{bmatrix}}_{\mathbf{L}}\begin{bmatrix}\mathbf{f}_{1}\\ \mathbf{f}_{2}\\ \underbracket{\mathbf{f}_{5} - \mathbf{A}_{51}\mathbf{A}_{11}^{-1}\mathbf{f}_{1} - \mathbf{A}_{52}\mathbf{A}_{22}^{-1}\mathbf{f}_{2}}_{\mathbf{\tilde{f}}_{5}}\end{bmatrix} = \begin{bmatrix}\mathbf{f}_{1}\\ \mathbf{f}_{2}\\ \mathbf{f}_{5}\end{bmatrix}.   (11)

To interpret this, note that

\mathbf{\tilde{f}}_{5} = \mathbf{f}_{5} - \mathbf{A}_{51}\mathbf{\tilde{u}}_{1} - \mathbf{A}_{52}\mathbf{\tilde{u}}_{2} \iff \mathbf{f}_{5} = \begin{bmatrix}\mathbf{A}_{51} & \mathbf{A}_{52}\end{bmatrix}\begin{bmatrix}\mathbf{\tilde{u}}_{1}\\ \mathbf{\tilde{u}}_{2}\end{bmatrix} + \mathbf{\tilde{f}}_{5}.   (12)

This bears some resemblance to (7), noting that $\mathbf{\tilde{u}}_{1}$ encodes the impact of the Dirichlet data $\mathbf{u}_{3}$ without $\mathbf{u}_{5}$ (and likewise $\mathbf{\tilde{u}}_{2}$ encodes $\mathbf{u}_{4}$ without $\mathbf{u}_{5}$). If an additional operator resembling a Neumann derivative approximation (following an inverse of the PDO) were composed onto $[\mathbf{A}_{51}\,|\,\mathbf{A}_{52}]$, then it would closely resemble $[-\mathbf{T}_{53}^{(1)}\,|\,\mathbf{T}_{54}^{(2)}]$ from (7). Another way to view (12) is that it removes the loads induced by the decoupled $\mathbf{\tilde{u}}_{1}, \mathbf{\tilde{u}}_{2}$ on the two subdomains from $\mathbf{f}_{5}$, leaving another decoupled term $\mathbf{\tilde{f}}_{5}$. By producing $\mathbf{\tilde{f}}_{5}$ from $\mathbf{f}$, $\mathbf{L}^{-1}$ lets us formulate a reduced problem that solves for $\mathbf{u}_{5}$ given $\mathbf{\tilde{f}}_{5}$; it performs the first part of the merge operation. To define this reduced problem, consider a block diagonal matrix $\mathbf{D}$ where

\underbracket{\begin{bmatrix}\mathbf{A}_{11} & & \\ & \mathbf{A}_{22} & \\ & & \mathbf{T}_{55}\end{bmatrix}}_{\mathbf{D}}\begin{bmatrix}\mathbf{\tilde{u}}_{1}\\ \mathbf{\tilde{u}}_{2}\\ \mathbf{u}_{5}\end{bmatrix} = \begin{bmatrix}\mathbf{f}_{1}\\ \mathbf{f}_{2}\\ \mathbf{\tilde{f}}_{5}\end{bmatrix}.   (13)

This now represents a fully decoupled system for the particular solutions $\mathbf{\tilde{u}}_{1}, \mathbf{\tilde{u}}_{2}$ and the shared boundary $\mathbf{u}_{5}$. However, we need to find a submatrix $\mathbf{T}_{55}$ that satisfies this system. Such a $\mathbf{T}_{55}$ can be found as

\mathbf{\tilde{f}}_{5} = \mathbf{f}_{5} - \mathbf{A}_{51}\mathbf{\tilde{u}}_{1} - \mathbf{A}_{52}\mathbf{\tilde{u}}_{2}
= \mathbf{f}_{5} - \mathbf{A}_{51}(\mathbf{u}_{1} + \mathbf{A}_{11}^{-1}\mathbf{A}_{15}\mathbf{u}_{5}) - \mathbf{A}_{52}(\mathbf{u}_{2} + \mathbf{A}_{22}^{-1}\mathbf{A}_{25}\mathbf{u}_{5})
= (\mathbf{f}_{5} - \mathbf{A}_{51}\mathbf{u}_{1} - \mathbf{A}_{52}\mathbf{u}_{2}) - \mathbf{A}_{51}\mathbf{A}_{11}^{-1}\mathbf{A}_{15}\mathbf{u}_{5} - \mathbf{A}_{52}\mathbf{A}_{22}^{-1}\mathbf{A}_{25}\mathbf{u}_{5}
= \mathbf{A}_{55}\mathbf{u}_{5} - \mathbf{A}_{51}\mathbf{A}_{11}^{-1}\mathbf{A}_{15}\mathbf{u}_{5} - \mathbf{A}_{52}\mathbf{A}_{22}^{-1}\mathbf{A}_{25}\mathbf{u}_{5}
= (\mathbf{A}_{55} - \mathbf{A}_{51}\mathbf{A}_{11}^{-1}\mathbf{A}_{15} - \mathbf{A}_{52}\mathbf{A}_{22}^{-1}\mathbf{A}_{25})\mathbf{u}_{5}
= \mathbf{T}_{55}\mathbf{u}_{5}.   (14)

This formulation for $\mathbf{T}_{55}$ includes the effect of $\mathbf{u}_{5}$ on its body load ($\mathbf{A}_{55}$), as well as terms that resemble the solution operators in (4) acting on $\mathbf{u}_{5}$ only ($\mathbf{A}_{11}^{-1}\mathbf{A}_{15}$, $\mathbf{A}_{22}^{-1}\mathbf{A}_{25}$). If we combine it with (12), then we see

\mathbf{u}_{5} = -\mathbf{T}_{55}^{-1}\begin{bmatrix}\mathbf{A}_{51} & \mathbf{A}_{52}\end{bmatrix}\begin{bmatrix}\mathbf{\tilde{u}}_{1}\\ \mathbf{\tilde{u}}_{2}\end{bmatrix} + \mathbf{T}_{55}^{-1}\mathbf{f}_{5}.   (15)

This more closely resembles (7). Lastly, by combining the matrices together,

\mathbf{L}\mathbf{D}\mathbf{U}\mathbf{u} = \mathbf{L}\mathbf{D}\begin{bmatrix}\mathbf{\tilde{u}}_{1}\\ \mathbf{\tilde{u}}_{2}\\ \mathbf{u}_{5}\end{bmatrix} = \mathbf{L}\begin{bmatrix}\mathbf{f}_{1}\\ \mathbf{f}_{2}\\ \mathbf{\tilde{f}}_{5}\end{bmatrix} = \mathbf{f}.   (16)

Thus $\mathbf{L}\mathbf{D}\mathbf{U}$ is an LDU factorization of $\mathbf{A}$, with $(\mathbf{L}\mathbf{D})^{-1}$ resembling the merge operations we derive via DtN maps, and $\mathbf{U}^{-1}$ resembling the interior solve for $\mathbf{u}_{1}, \mathbf{u}_{2}$ once we obtain the shared boundary data. This conceptualization will prove useful in justifying our algorithm for HPS.
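
The block identities (8)-(16) are easy to check numerically. The following sketch builds the block matrix of (8) with random blocks of arbitrary sizes, forms $\mathbf{T}_{55}$ as the Schur complement of (14), and verifies that $\mathbf{L}\mathbf{D}\mathbf{U}$ reproduces $\mathbf{A}$:

import numpy as np

rng = np.random.default_rng(0)
n1, n2, n5 = 6, 6, 4                       # arbitrary block sizes for the check
A11, A22, A55 = rng.standard_normal((n1, n1)), rng.standard_normal((n2, n2)), rng.standard_normal((n5, n5))
A15, A25 = rng.standard_normal((n1, n5)), rng.standard_normal((n2, n5))
A51, A52 = rng.standard_normal((n5, n1)), rng.standard_normal((n5, n2))

Z = np.zeros
A = np.block([[A11, Z((n1, n2)), A15],
              [Z((n2, n1)), A22, A25],
              [A51, A52, A55]])
# Schur complement of equation (14).
T55 = A55 - A51 @ np.linalg.solve(A11, A15) - A52 @ np.linalg.solve(A22, A25)
L = np.block([[np.eye(n1), Z((n1, n2)), Z((n1, n5))],
              [Z((n2, n1)), np.eye(n2), Z((n2, n5))],
              [A51 @ np.linalg.inv(A11), A52 @ np.linalg.inv(A22), np.eye(n5)]])
U = np.block([[np.eye(n1), Z((n1, n2)), np.linalg.solve(A11, A15)],
              [Z((n2, n1)), np.eye(n2), np.linalg.solve(A22, A25)],
              [Z((n5, n1)), Z((n5, n2)), np.eye(n5)]])
D = np.block([[A11, Z((n1, n2)), Z((n1, n5))],
              [Z((n2, n1)), A22, Z((n2, n5))],
              [Z((n5, n1)), Z((n5, n2)), T55]])
print(np.allclose(L @ D @ U, A))           # True: LDU reproduces the block matrix in (8)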

3 Algorithm Description

In the previous section we discussed how to merge two subdomains of a boundary value problem together using DtN maps before solving each subdomain with a localized spectral method. Here we briefly explain how the original Hierarchical Poincaré-Steklov method extends this concept to more than two subdomains, before introducing the contributions of this work.

3.1 Hierarchical Poincaré-Steklov: Traditional Solver

Figure 2: A hierarchical breakdown of subdomains. Panels (a)-(d) show levels $\ell = 1, 2, 3, 4$.

Given an elliptic PDE such as (1) with a rectangular domain $\Omega$, we can hierarchically partition $\Omega$ into a series of rectangular “box” subdomains, illustrated in Figure 2. Note that the boxes in each level are children of boxes in the previous level, with each box in level $\ell$ having 2 children in level $\ell + 1$, until we reach the final level of “leaf” boxes.

The HPS algorithm is split into two stages: the “build” stage and the “solve” stage. In the build stage, we first construct solution and DtN maps for every leaf box in the final level of our discretization ($\ell = 4$ in Fig. 2). We then proceed in an upward pass, constructing solution maps for each box in the next level up using the leaf DtNs, following (7). These solution maps solve for the boundaries between paired leaf boxes (boxes 8 and 9, 10 and 11, 12 and 13, and 14 and 15 in Fig. 2). We then recursively construct solution and DtN maps for each level of the discretization, until we get a solution operator for $\ell = 1$, which solves for one large face in the middle of the domain (between boxes 2 and 3 in Fig. 2). Once all solution operators are constructed, we proceed with the solve stage, which follows a downward pass: we use the Dirichlet boundary condition and the $\ell = 1$ solution operator to solve for the middle boundary, then proceed down, using each level’s solution operators to solve for subsequent box faces, until every leaf box has all faces solved. Then we solve for the leaf interiors using their local solution operators.

The method’s upward and downward passes recursively extend the formulation and use of the solution operators we derived in Section 2.2. Similarly, they also resemble the LDU factorization described in Section 2.3. We can imagine the upward pass as similar to $\mathbf{L}^{-1}$ described there, with both encoding merge operations that reduce the body loads of the problem onto shared boundaries. $\mathbf{D}^{-1}$ then resembles formulating the final components of the merged solution operators and solving for the shared face points; it combines elements of the upward and downward passes. Lastly, $\mathbf{U}^{-1}$ is the end of the downward pass, obtaining the total solution. Given the similarities, it is not surprising that the classical algorithm for obtaining the DtN and solution operators of traditional HPS resembles multifrontal methods for sparse LU factorization [22, 23].

The original HPS method has a few significant drawbacks. The solution and DtN maps can be formed in parallel within each level, and prior work has used shared-memory models or MPI to accelerate this component [24, 25]. However, across levels these operators must be constructed recursively. This forces a large serial component onto the algorithm. In addition, the method on a discretization with $L$ levels requires the storage of $2^{L} - 1$ dense solution operators, which can be large, especially for 3D problems. One can choose to recompute certain levels during the downward pass to reduce storage cost, but this greatly increases the computation time as a trade-off. In addition, higher-level DtN maps tend to become increasingly ill-conditioned in this method, which can cause numerical challenges for the method’s stability (formulations using other merge operators such as impedance maps may be better conditioned [26]). Since each solution operator is applied separately in the downward pass, techniques such as pivoting are limited to improving the conditioning of one solution operator at a time; this is one way HPS differs significantly from sparse LU algorithms, despite the other similarities. Our contributions mitigate these problems with the original HPS method.

3.2 Two-Level HPS

Figure 3: The build stage of HPS resembles methods for static condensation, reducing the problem to one on subdomain boundaries.

In this section we discuss the contribution of our work: a direct solver for elliptic PDEs in 2D or 3D that utilizes the HPS method in a two-level framework. Instead of constructing higher-level solution and DtN operators, we use the leaf DtN maps to formulate a system that solves for all leaf boundaries concurrently. We use this system to acquire the leaf boundaries, then apply the leaf solution operators in parallel to solve for the leaf interiors. With this approach we use the HPS discretization to reduce the problem in a way analogous to static condensation, visualized in Figure 3. Similar to traditional HPS, our method naturally divides into a build stage and a solve stage.

3.2.1 Build Stage

Figure 4: Diagram of the discretization points for a problem with $2^{3}$ subdomains, with $p = 5$. Points on subdomain interiors are blue and are solved for using spectral differentiation. Points on exteriors are red, and are solved for by enforcing continuity of the Neumann derivative.

Figure 5: Diagram of the box faces to solve for a problem discretized with $2 \times 2 \times 2$ boxes (a: box boundary points, with the faces to solve highlighted in various colors), and the resulting sparsity pattern (b). A total of 12 box faces do not lie on the domain boundary $\partial\Omega$, and must be solved in a sparse system which has 12 blocks along the main diagonal and 4 additional blocks in each block row (corresponding to adjacent faces that do not lie on $\partial\Omega$).

As in previous sections, suppose we have an elliptic PDE with a Dirichlet boundary condition on a rectangular domain $\Omega$, in this case partitioned into a number of non-overlapping rectangular subdomains $\Omega^{(i)}$, each discretized with Chebyshev nodes. An example with 8 such boxes can be seen in Figure 4, similar to level $\ell = 4$ in Figure 2. We know the value of the numerical solution $\mathbf{u}$ at every node on $\partial\Omega$ due to the Dirichlet boundary condition; we call this Dirichlet data $\mathbf{u}_{D}$. However, there are a number of box faces that do not lie on $\partial\Omega$ (see Figure 5(a), where there are 12 such faces). These must be solved for before applying the leaf solution operators on each box to obtain $\mathbf{u}$ on the box interiors.

To solve for the shared faces we enforce continuity of the Neumann derivative using Dirichlet-to-Neumann maps as described in Section 2.2. However, each shared face depends on other shared faces, since DtN maps require the entire box boundary data as input (unlike traditional HPS, where some box faces are solved earlier by different-level solution operators). So we must encode all shared faces in one coupled system. We construct this system $\mathbf{T}_{\rm sp}$ using the DtN maps $\mathbf{T}^{(i)}$ and the relation between two boxes that share a boundary. Recall that each $\mathbf{T}^{(i)}$ is of size $2dq^{d-1} \times 2dq^{d-1}$, where $q \approx p - 2$ is either the size of a box face with corner points dropped, or the number of Gaussian nodes to which we interpolate (see Section 2.2.1). Thus the contribution of one box face to another face’s Neumann derivative via the DtN is a submatrix of $\mathbf{T}^{(i)}$ of size $q^{d-1} \times q^{d-1}$. We index our discretization points by box face so we can represent $\mathbf{T}_{\rm sp}$ as a block matrix consisting of these submatrices. Referring back to (7) and Figure 1 in Section 2.2, each block row of $\mathbf{T}_{\rm sp}$ (encoding the shared face between leaf boxes $\Omega^{(1)}$ and $\Omega^{(2)}$) looks like

\left(\mathbf{T}_{5,5}^{(1)} - \mathbf{T}_{5,5}^{(2)}\right)\mathbf{u}_{5} + \mathbf{T}_{5,3}^{(1)}\mathbf{u}_{3} - \mathbf{T}_{5,4}^{(2)}\mathbf{u}_{4} = \mathbf{0}_{q^{d-1}}
\iff \begin{bmatrix}\dots & \mathbf{0} & \mathbf{T}_{5,3}^{(1)} & \left(\mathbf{T}_{5,5}^{(1)} - \mathbf{T}_{5,5}^{(2)}\right) & -\mathbf{T}_{5,4}^{(2)} & \mathbf{0} & \dots\end{bmatrix}\begin{bmatrix}(\text{rest of }\mathbf{u})\\ \vdots\\ \mathbf{u}_{3}\\ \mathbf{u}_{5}\\ \mathbf{u}_{4}\\ \vdots\\ (\text{rest of }\mathbf{u})\end{bmatrix} = \mathbf{0}_{q^{d-1}}.   (17)

Recall that $\mathbf{u}_{5}$ is $\mathbf{u}$ on the shared face between $\Omega^{(1)}$ and $\Omega^{(2)}$, $\mathbf{u}_{3}$ is $\mathbf{u}$ on the other $2d - 1$ faces of $\Omega^{(1)}$, and $\mathbf{u}_{4}$ is $\mathbf{u}$ on the other $2d - 1$ faces of $\Omega^{(2)}$. Thus the block on the main diagonal, $\mathbf{T}_{5,5}^{(1)} - \mathbf{T}_{5,5}^{(2)}$, is of size $q^{d-1} \times q^{d-1}$, and the off-diagonal blocks from $\mathbf{T}_{5,3}^{(1)}$ and $-\mathbf{T}_{5,4}^{(2)}$ are each of size $q^{d-1} \times (2d-1)q^{d-1}$. This means each block row of $\mathbf{T}_{\rm sp}$ has at most $q^{d-1} \times (4d-1)q^{d-1}$ nonzero entries arranged into $q^{d-1} \times q^{d-1}$ blocks (for some rows it is fewer, since portions of $\mathbf{u}_{3}, \mathbf{u}_{4}$ can lie on $\partial\Omega$ and be encoded in $\mathbf{u}_{D}$ instead), so $\mathbf{T}_{\rm sp}$ has a distinctive block-sparse structure. This sparse structure can be seen for a problem with $2^{3}$ subdomains in Figure 5(b), and for a larger problem with $5^{3}$ subdomains in Figure 6. Note that the number of nonzero blocks in each block row stays bounded regardless of the total number of subdomains.

Once $\mathbf{T}_{\rm sp}$ is obtained, we may factorize it using a multilevel sparse solver such as MUMPS [27, 28]. An outline of the construction and factorization of $\mathbf{T}_{\rm sp}$ is shown in Algorithm 1. First we construct the DtN maps $\mathbf{T}^{(i)}$, potentially on GPUs. Then we use submatrices of these DtN maps to formulate $\mathbf{T}_{\rm sp}$, which we then factorize.

Algorithm 1 Build Stage
1: domain $\Omega$, discretized box partition $\{\Omega^{(i)}\}_{i=1}^{n}$, polynomial order $p$.
2: $\{\mathbf{T}^{(i)}\} \leftarrow$ DtN maps from Alg. 3.
3: $\{E^{(j)}\} \leftarrow$ set of box faces not on $\partial\Omega$.
4: $N_{\text{face}} \leftarrow p^{2} \times \#(\{E^{(j)}\})$ ($\#$ of disc. points on faces).
5: $\mathbf{T}_{\rm sp} \leftarrow$ sparse $\mathbf{0}^{N_{\text{face}} \times N_{\text{face}}}$.
6: for all $E^{(j)}$ in $\{E^{(j)}\}$ do
7:     $\Omega^{(i_{1})}, \Omega^{(i_{2})} \leftarrow$ the two boxes that share $E^{(j)}$.
8:     for $k = 1, 2$ do
9:         $F \leftarrow$ set of faces on $\partial\Omega^{(i_{k})} \setminus E^{(j)}$.
10:         $\mathsf{E} \leftarrow$ indices for $E^{(j)}$ local to their matrix. $\triangleright$ differ for $\mathbf{T}_{\rm sp}, \mathbf{T}^{(i_{k})}$
11:         $\mathsf{F} \leftarrow$ indices for $F$ local to their matrix.
12:         $[\mathbf{T}_{\rm sp}]_{\mathsf{E},\mathsf{E}}$ += $\mathbf{T}^{(i_{k})}_{\mathsf{E},\mathsf{E}}$. $\triangleright$ block main diagonal
13:         $[\mathbf{T}_{\rm sp}]_{\mathsf{E},\mathsf{F}}$ += $\mathbf{T}^{(i_{k})}_{\mathsf{E},\mathsf{F}}$. $\triangleright$ block off-diagonal
14:     end for
15: end for
16: $\mathbf{LU} \leftarrow$ factorization of $\mathbf{T}_{\rm sp}$ for use as a solver.
17: return $\mathbf{LU}$.
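
As a rough illustration of Algorithm 1, the sketch below assembles $\mathbf{T}_{\rm sp}$ in SciPy and factorizes it with a sparse LU (the paper uses MUMPS). The data structures shared_faces, T, face_rows, and faces_of_box are hypothetical stand-ins for the bookkeeping described in the text, and the explicit signs follow the convention of (7) rather than the outward-normal convention implied by the "+=" updates in Algorithm 1.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def assemble_Tsp(shared_faces, T, face_rows, faces_of_box, n_face_pts):
    # shared_faces: list of (face_id, box1, box2); T: dict box -> DtN map;
    # face_rows: (box, face_id) -> local row/col indices of that face inside T[box];
    # faces_of_box: box -> list of its face ids; n_face_pts: points per face (q^{d-1}).
    glob = {fid: k * n_face_pts for k, (fid, _, _) in enumerate(shared_faces)}
    n = len(shared_faces) * n_face_pts
    rows, cols, vals = [], [], []
    def add_block(r0, c0, block):
        rr, cc = np.meshgrid(np.arange(block.shape[0]), np.arange(block.shape[1]), indexing="ij")
        rows.append((r0 + rr).ravel()); cols.append((c0 + cc).ravel()); vals.append(block.ravel())
    for fid, b1, b2 in shared_faces:
        r0 = glob[fid]
        for box, sign in ((b1, +1.0), (b2, -1.0)):           # sign convention of eq. (7)
            E = face_rows[(box, fid)]
            add_block(r0, r0, sign * T[box][np.ix_(E, E)])   # block main diagonal
            for gid in faces_of_box[box]:
                if gid != fid and gid in glob:               # faces on the domain boundary go to the RHS
                    F = face_rows[(box, gid)]
                    add_block(r0, glob[gid], sign * T[box][np.ix_(E, F)])
    Tsp = sp.coo_matrix((np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
                        shape=(n, n)).tocsc()
    return splu(Tsp)                                         # sparse LU factorization of T_sp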

3.2.2 Solve Stage

Figure 6: The sparsity pattern for a problem discretized with $5 \times 5 \times 5$ boxes. Unlike the $2^{3}$ case shown in Figure 5, here some faces have up to 10 faces on the adjacent boxes that are not on the domain boundary $\partial\Omega$.

Once the factorized sparse system is obtained, we may use it as a direct solver for multiple variations of the Dirichlet data $\mathbf{u}_{D}$ and body load $\mathbf{f}$ in the same boundary value problem. This is because the DtN maps $\mathbf{T}^{(i)}$, and thus $\mathbf{T}_{\rm sp}$, depend only on the problem’s partial differential operator. We find $\mathbf{u}$ on the shared faces of our discretization by solving the sparse system

\mathbf{T}_{\rm sp}\mathbf{u}_{I_{x}} = \mathbf{b}\left(\mathbf{u}_{D}, \mathbf{f}\right) \quad \text{where } \mathbf{u}_{I_{x}} \approx u\big|_{\bigcup_{i}\partial\Omega^{(i)} \setminus \partial\Omega}.   (18)

Here $\mathbf{b}$ is a right-hand side specified by the Dirichlet data and body load. To solve for the box interior points (blue in Figure 4), we use the solution operators $\mathbf{S}^{(i)}$ shown in (4). These operators enforce the PDE on the box interiors given the now-complete box boundary data. Each matrix $\mathbf{S}^{(i)}$ is dense, but they can be formed and applied to box boundary data independently. Thus they are well-suited to batched linear algebra routines on GPUs, such as those offered in PyTorch [29] (the solution operators are also needed to construct the DtN operators $\mathbf{T}^{(i)}$, but in practice it is often preferable to recompute them and avoid transferring memory between CPU and GPU). The solve stage is detailed in Algorithm 2.

Algorithm 2 Solve Stage
1: domain $\Omega$, discretized box partition $\{\Omega^{(i)}\}_{i=1}^{n}$, factorized boundary solve system $\mathbf{LU}$, Dirichlet data $\mathbf{u}_{D}$, body load $\mathbf{f}$, number of discretization points $N$.
2: $\mathbf{u} \leftarrow \mathbf{0}^{N}$.
3: $\mathbf{b} \leftarrow$ right-hand side, derived from $\mathbf{u}_{D}$ and $\mathbf{f}$.
4: $\mathbf{u}[\text{indices on }\partial\Omega] = \mathbf{u}_{D}$.
5: $\mathbf{u}[\text{indices on box boundaries not on }\partial\Omega] = \mathbf{LU}(\mathbf{b})$.
6: $\{\mathbf{S}^{(i)}\} \leftarrow$ from Alg. 3.
7: for $i = 1, \dots, n$ do
8:     $\mathsf{c}_{i} \leftarrow$ indices for points in the interior of $\Omega^{(i)}$.
9:     $\mathsf{x}_{i} \leftarrow$ indices for points on $\partial\Omega^{(i)}$.
10:     $\mathbf{u}[\mathsf{c}_{i}] = \mathbf{S}^{(i)}\mathbf{u}[\mathsf{x}_{i}]$.
11: end for
12: return $\mathbf{u}$.
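
A compact sketch of Algorithm 2 for the homogeneous case follows (the body-load contribution $\mathbf{A}_{I,I}^{-1}\mathbf{f}_{I}$ of (4) is omitted for brevity). The factorization object lu, the index maps, and the per-box solution operators S[i] are assumed to come from the build stage, and all names are our own.

import numpy as np

def solve_stage(lu, b, u_dirichlet, bdry_idx, shared_idx, S, box_int, box_bdry, N):
    u = np.zeros(N)
    u[bdry_idx] = u_dirichlet        # known Dirichlet data on the domain boundary
    u[shared_idx] = lu.solve(b)      # interior box faces from the sparse system (18)
    for i in range(len(S)):          # leaf interiors; a batched GPU version applies all S[i] at once
        u[box_int[i]] = S[i] @ u[box_bdry[i]]
    return u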

3.2.3 Advantages of the Two-Level Framework

In our solver we still discretize $\Omega$ into a series of non-overlapping subdomains, but instead of constructing a hierarchical series of solution operators in an upward pass, we only form solution operators at the leaf level and use the leaf DtN maps to form a larger system that solves for all of the subdomain boundaries concurrently, unlike the original HPS methodology, which encodes different subdomain boundaries at each level. The resulting system is sparse, so we can factorize it using efficient sparse direct solvers such as those within the PETSc library [30].

Highly optimized multifrontal solvers such as MUMPS can use various pivoting techniques, such as partial threshold pivoting, relaxed pivoting [31], and tournament pivoting [32], to maximize the numerical stability of a matrix factorization like sparse LU [33]. This enables accurate direct solves with potentially ill-conditioned systems. In theory, such pivoting techniques could be used on the higher-level solution operators in traditional HPS to improve their stability (although these operators are dense, so sparse solvers could not be used). However, encoding the solutions for all shared faces into one sparse matrix operator $\mathbf{T}_{\rm sp}$ allows such pivoting techniques to be applied holistically to a much larger portion of the problem at once. This means the best possible pivoting can be applied, maximizing the factorization’s stability while preserving an optimal sparsity pattern. In effect we outsource the multilevel component of HPS to an optimized solver better equipped to handle its conditioning. Previous work has found other effective means of representing the HPS discretization as one large system, but required iterative techniques to solve it [34]. Our approach preserves the original strength of HPS as a direct method.

In addition, the non-hierarchical formulation of $\mathbf{T}_{\rm sp}$, which relies only on leaf DtN maps and not on higher-level DtN or solution operators, increases the method’s uniformity and parallelism. All DtN maps can be constructed in parallel; they are dense matrices, but that makes them well-suited to formation on GPUs, if available, via batched linear algebra routines [35]. Similarly, the computation of the leaf solution operators in the solve stage may also be done in batch on GPUs. This allows us to use modern hardware to greatly improve the computational performance of the HPS method.

Algorithm 3 Form DtN Maps
1: domain $\Omega$, discretized box partition $\{\Omega^{(i)}\}_{i=1}^{n}$, box local interior indices $I_{loc}$, box local boundary indices $B_{loc}$, Neumann operator $\mathbf{N}$.
2: for $i = 1, \dots, n$ do
3:     $\mathbf{A}^{(i)} \leftarrow$ spectral differentiation matrix on $\Omega^{(i)}$.
4:     $\mathbf{S}^{(i)} \leftarrow -\left(\mathbf{A}^{(i)}_{I_{loc},I_{loc}}\right)^{-1}\mathbf{A}^{(i)}_{I_{loc},B_{loc}}$. $\triangleright$ solution maps
5:     $\mathbf{T}^{(i)} \leftarrow \mathbf{N}\begin{bmatrix}\mathbf{I}\\ \mathbf{S}^{(i)}\end{bmatrix}$. $\triangleright$ DtN maps
6: end for
7: return $\{\mathbf{S}^{(i)}\}, \{\mathbf{T}^{(i)}\}$.
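
The loop in Algorithm 3 maps naturally onto batched GPU routines. Below is a minimal PyTorch sketch of that idea; the paper reports using batched linear algebra of this kind, but the exact implementation and these tensor layouts are our own assumptions.

import torch

def batched_dtn(A_batch, N, B, I):
    # A_batch: (nboxes, p^3, p^3) stacked leaf discretizations; N: (nface_pts, p^3) Neumann operator;
    # B, I: 1-D LongTensors of boundary / interior indices shared by all leaves.
    AII = A_batch[:, I][:, :, I]              # (nboxes, nI, nI)
    AIB = A_batch[:, I][:, :, B]              # (nboxes, nI, nB)
    S = -torch.linalg.solve(AII, AIB)         # batched solution operators
    T = N[:, B].unsqueeze(0) + N[:, I] @ S    # batched DtN maps, as in Algorithm 3
    return S, T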

3.3 Asymptotic Cost

The build stage has an asymptotic cost of

O\left(p^{6}N\right) \text{ (forming DtNs)} + O\left(N^{2}\right) \text{ (sparse factorization).}

We formulate the DtNs with embarrassingly parallel dense matrix operations that are well-suited to GPUs using batched linear algebra routines. As $p$ increases, its effect on the DtN size and run time is substantial; this is because of the need to invert a large block of the spectral differentiation matrix $\mathbf{A}^{(i)}$, as shown in Algorithm 3. Nevertheless, we can still easily formulate DtNs in batch for $p \leq 15$. Even for larger $p$, where batched operations are impractical, formulating the DtNs in serial is relatively fast, as shown in our numerical results. In practice the steepest cost in the build stage comes from the factorization of the sparse system, which for a 3D problem using a multilevel solver is $O(N^{2})$. However, the sparse factorization is unaffected by the choice of $p$, and this cost is only incurred once per differential operator and domain discretization. Since the sparse system is relatively small (its dimension is less than $\frac{6N}{p} \times \frac{6N}{p}$, with $\approx 11p^{2}$ nonzero entries per row), it can be stored in memory and reused for multiple solves. For the solve stage, our asymptotic cost is

O\left(N^{4/3}\right) \text{ (sparse solve)} + O\left(p^{6}N\right) \text{ (interior box solves).}

If we store the solution operators in memory, then the interior box solves cost only $O(p^{2}N)$, but in practice it is typically preferable to reformulate the solution operators and avoid having to store and potentially transfer additional memory.
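
As a back-of-the-envelope illustration of these estimates (our own arithmetic, using the bounds quoted above), the following computes concrete sizes for a hypothetical configuration with $p = 10$ and $8^{3}$ boxes:

p, boxes_per_side = 10, 8
n_boxes = boxes_per_side ** 3
N = n_boxes * p ** 3                                        # total discretization points (ignoring shared faces)
dtn_entries = (6 * p ** 2) ** 2                             # dense entries in one DtN map
sparse_dim_bound = 6 * N // p                               # upper bound on the dimension of T_sp
nnz_estimate = sparse_dim_bound * 11 * p ** 2               # ~11 p^2 nonzeros per row
q = p - 2
n_shared_faces = 3 * boxes_per_side ** 2 * (boxes_per_side - 1)
exact_dim = n_shared_faces * q ** 2                         # unknowns actually solved in T_sp
print(N, dtn_entries, sparse_dim_bound, nnz_estimate, exact_dim)
# e.g. N = 512000, one DtN has 360000 entries, and T_sp has 86016 unknowns here.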

4 Numerical Experiments

Figure 7: (left) Accuracy of our solver for the Poisson equation. (right) Accuracy for the Helmholtz equation fixed to 10 points per wavelength. Both cases feature $h$-refinement, with each line marking a fixed $p$.

4.1 Accuracy results - Poisson and Helmholtz Equations

We investigate the accuracy of our HPS solver on the homogeneous Poisson equation,

\Delta u = 0 \quad \text{on } \Omega = [-1.1, 0.1] \times [1, 2] \times [1.2, 2.2],
u(\mathbf{x}) = \frac{1}{4\pi\|\mathbf{x}\|_{2}} \quad \text{on } \partial\Omega.   (19)

Here the Dirichlet boundary condition is chosen so that the exact solution is the Green’s function of the Laplace operator. We compute numerical solutions to this equation across a range of box sizes $h$ and polynomial orders $p$, comparing their relative $\ell^{2}$-norm error against the Green’s function. The accuracy results are shown in Figure 7. We observe convergence rates of $\approx h^{7}$ for $p = 6$ and $h^{12}$ for $p = 8$ until the relative error reaches $\approx 10^{-13}$, at which point it flattens (for larger $p$ the error reaches $10^{-13}$ before a convergence rate can be fairly estimated). Subsequent refinement does not generally improve accuracy, but it does not lead to an increase in error either. This is likely due to the HPS discretization being fairly well-conditioned, especially compared to global spectral methods.

To assess our method’s accuracy on oscillatory problems, we also evaluate it on the homogeneous Helmholtz equation with wavenumber $\kappa$, using its Green’s function as a manufactured solution,

\Delta u + \kappa^{2} u = 0 \quad \text{on } \Omega = [-1.1, 0.1] \times [1, 2] \times [1.2, 2.2],
u(\mathbf{x}) = \frac{\cos(\kappa|\mathbf{x}|)}{4\pi|\mathbf{x}|} \quad \text{on } \partial\Omega.   (20)

In Figure 8 we show relative $\ell^{2}$-norm errors for our solver on Equation (20) with fixed wavenumbers $\kappa = 16$ and $\kappa = 30$, which correspond to roughly 2.5 and 4.5 wavelengths across the domain, respectively. We see convergence comparable to the Poisson equation, though the lower bound on the relative error is somewhat higher for each problem ($\approx 10^{-11}$ for $\kappa = 16$ and $\approx 10^{-10}$ for $\kappa = 30$). However, increasing the resolution further does not cause the error to diverge. Along with the results for Equation (19), this suggests our method remains stable when problems are over-resolved. Figure 7 shows the relative error for the Helmholtz equation with a varying $\kappa$, set to provide $\approx 10$ discretization points per wavelength. We see that higher $p$ corresponds to better numerical accuracy, while refining $h$ does not improve the error (since the wavenumber is increased correspondingly), but the errors remain relatively stable. Since our method maintains accuracy while increasing degrees of freedom linearly with $\kappa$, it appears not to suffer from the pollution effect.

Figure 8: Accuracy of our solver for the Helmholtz equation with wavenumbers $\kappa = 16$ ($\approx 2.5$ wavelengths across the domain) and $\kappa = 30$ ($\approx 4.5$ wavelengths).

We then consider the gravity Helmholtz equation,

\Delta u + \kappa^{2}(1 - x_{3})u = -1 \quad \text{on } \Omega = [1.1, 2.1] \times [-1, 0] \times [-1.2, -0.2],
u(\mathbf{x}) = 0 \quad \text{on } \partial\Omega.   (21)

This equation adds a spatially varying component to the wavenumber based on $x_{3}$, analogous to the effect of a gravitational field, hence the name [36]. Since $1 < 1 - x_{3} < 2.2$, the variation across $\Omega$ is greater than in the constant-coefficient Helmholtz equation with the same $\kappa$. Unlike the previous Poisson and Helmholtz equations, equation (21) does not have a manufactured solution. It also produces relatively sharp gradients near the domain boundaries, since there is a zero Dirichlet boundary condition, and it has a coefficient that varies in the spatial coordinates (specifically $x_{3}$). We investigate our solver’s accuracy by computing the relative $\ell^{2}$-norm errors of our results against an over-resolved solution at select points. Figure 9 shows the solution of this problem for select $p$ and $h$ values, while Figure 10 shows these errors under $p$-refinement for a range of values of $h$. We see that our solver converges for each $h$.

Figure 9: Gravity Helmholtz equation as described in (21), computed with (a) $\approx 64000$ and (b) $\approx 185000$ discretization points. A cross-section has been taken through the $x$-axis. We see consistent results across different $p$ and $h$.

Figure 10: Convergence of our solver on the gravity Helmholtz equation with $\kappa = 12$, relative to a highly resolved solution. Here $h$ is fixed and the resolution is increased through $p$-refinement.

4.2 Speed and scaling

To evaluate the computational complexity of our HPS solver, we measure the execution times of its three components: the construction of the DtN operators used in the build stage, the factorization time for the sparse matrix constructed from the DtNs, and the solve time. The asymptotic cost of constructing the DtN operators is $O(p^{6}N)$, as shown in Section 3.3. Figure 11 illustrates the run times for the build stage when solving the Poisson and Helmholtz equations detailed in Section 4.1, using batched linear algebra routines on an Nvidia V100 GPU. As expected, we observe a linear increase in runtime with $h$-refinement (i.e. a higher $N$ with fixed $p$). The constant for the linear scaling in $h$, though, increases sharply with higher values of $p$.

A challenge in utilizing GPUs for DtN formation is memory storage. In 3D each DtN map is of size $\approx 6p^{2} \times 6p^{2}$. Computing them requires the inversion of the spectral differentiation operator on interior points, which is of size $\approx (p-2)^{3} \times (p-2)^{3}$. These matrices grow quickly as $p$ increases (at $p = 20$ they have $\approx 5.7$ million and $34$ million entries, respectively). For larger problem sizes it is impossible to store all DtN maps on a GPU concurrently. Interior differentiation submatrices do not have to be stored beyond forming their box’s DtN map, but they are much larger for high $p$ and present considerable overhead. To handle this memory concern we have implemented an aggressive two-level scheduler that stores a limited number of DtNs on the GPU at once, and forms a smaller number of DtNs in batch at a time. For $p \geq 16$ on a V100 GPU the best results occur when only one DtN is formed at a time. However, we may still store multiple DtNs on the GPU before transferring them collectively to the CPU.
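
A minimal sketch of this kind of chunked, buffered DtN formation is shown below; the paper's two-level scheduler is more elaborate, and the chunk and buffer sizes here are illustrative only. It reuses the hypothetical batched_dtn routine from the earlier PyTorch sketch.

import torch

def form_dtns_chunked(A_all, N, B, I, batch_size=4, buffer_size=32, device="cuda"):
    N, B, I = N.to(device), B.to(device), I.to(device)
    out, buffer, done = [], [], 0
    n_boxes = A_all.shape[0]
    while done < n_boxes:
        chunk = A_all[done:done + batch_size].to(device)     # form a few leaf DtNs at a time
        _, T = batched_dtn(chunk, N, B, I)
        buffer.append(T)
        done += chunk.shape[0]
        if sum(t.shape[0] for t in buffer) >= buffer_size or done == n_boxes:
            out.append(torch.cat(buffer).cpu())              # ship a full buffer back to the host
            buffer = []
    return torch.cat(out)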

Figure 11: Time to formulate the DtN maps and construct the sparse system.

In Figure 12 we show the time to factorize the resulting sparse matrix for the build stage using MUMPS. The expected asymptotic cost is $O(N^{2})$. However, our results more closely resemble a lower cost of $O(N^{3/2})$ for $N$ up to $\approx 8$ million. Overall, the factorization time dominates the build stage of our method, given its more costly scaling in $N$ and its reliance on CPUs through MUMPS. Lastly, in Figure 13 we show the runtime for the leaf interior solves. These follow a linear scaling similar to the batched DtN construction.

Figure 12: Time to factorize the sparse system using MUMPS.

Figure 13: Time to solve for interior values of the leaf nodes in parallel.

4.3 Curved and non-rectangular domains

(a) On a curved domain with a wavy edge, ≈ 2.6 × 10^5 discretization points.
(b) On an annulus, with ≈ 1.04 × 10^5 discretization points.
Figure 14: Helmholtz equation on two non-rectangular domains, both with a Dirichlet boundary condition of 1. A cross-section has been taken along the z-axis.
Figure 15: Accuracy of our solver for the Helmholtz equation on the curved domain shown in Fig. 14(a), with wavenumbers κ = 16 (≈ 2.5 wavelengths across the domain) and κ = 30 (≈ 4.5 wavelengths).

The HPS method has been extended to other problems such as surface PDEs [37], inverse scattering problems [38], and more generally to non-rectangular domains. We can apply our HPS solver to non-rectangular geometries through an analytic parameterization between the domain we wish to model, such as the domain with a sinusoidal edge along the y-axis shown in Figure 14(a), and a rectangular reference domain. These parameter maps extend the versatility of the solver, but they present several implementation challenges: they require variable coefficients in the differential operators; they may introduce mixed second-order terms, i.e. \frac{\partial^{2}u}{\partial x_{i}\partial x_{j}} with i ≠ j; and, depending on the domain, they may produce near-singular regions, such as the neighborhood of the sharp point (x, y) ≈ (2, 0) in Figure 14(a). We investigate the accuracy of our solver on this sinusoidal domain for the Helmholtz equation in Eq. (20), but now posed on the domain

\Psi=\left\{\left(x_{1},\frac{x_{2}}{\psi(x_{1})},x_{3}\right)\text{ for }(x_{1},x_{2},x_{3})\in\Omega\right\}\text{ where }\psi(z)=1-\frac{1}{4}\sin(6z). (22)

We map (20) on Ψ to the rectangular reference domain \Omega=[1.1,2.1]\times[-1,0]\times[-1.2,0.2]; under this parameter map the equation becomes

\displaystyle-\frac{\partial^{2}u}{\partial x_{1}^{2}}-\left(\left(\frac{\psi^{\prime}(x_{1})x_{2}}{\psi(x_{1})}\right)^{2}+\psi(x_{1})^{2}\right)\frac{\partial^{2}u}{\partial x_{2}^{2}}-\frac{\partial^{2}u}{\partial x_{3}^{2}}-2\frac{\psi^{\prime}(x_{1})x_{2}}{\psi(x_{1})}\frac{\partial^{2}u}{\partial x_{1}\partial x_{2}}
\displaystyle-\frac{\psi^{\prime\prime}(x_{1})x_{2}}{\psi(x_{1})}\frac{\partial u}{\partial x_{2}}-\kappa^{2}u=0,\text{ }(x_{1},x_{2},x_{3})\in\Omega. (23)

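A minimal sketch of the coordinate map in (22) is given below, with the forward map from the reference box to Ψ and its inverse; the function names are illustrative and not part of our code.

```python
import numpy as np

def psi(z):
    # "Wavy" scaling factor along x1, from Eq. (22).
    return 1.0 - 0.25 * np.sin(6.0 * z)

def reference_to_physical(x1, x2, x3):
    # Map a point of the reference box Omega to the curved domain Psi.
    return x1, x2 / psi(x1), x3

def physical_to_reference(y1, y2, y3):
    # Inverse map: x1 and x3 are unchanged, so only the second
    # coordinate needs to be rescaled.
    return y1, y2 * psi(y1), y3
```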
We test this problem with a manufactured solution given by the Green's function of the Helmholtz equation, with corresponding Dirichlet data, matching our test in Section 4.1. A plot of this problem and the computed solution for wavenumber κ = 26 is shown in Figure 14(a), while convergence studies with κ = 16 and κ = 30 are shown in Figure 15. We observe a convergence pattern similar to that of the Helmholtz test on a cubic domain, although convergence is somewhat slower.

We also investigate our solver's accuracy on the Helmholtz equation when the domain is a three-dimensional annulus, shown in Figure 14(b). We use an analytic parametrization to map this problem to a rectangular reference domain with higher resolution along the y-axis and a periodic boundary condition. Relative errors in p and N are comparable to those for the problem on a curved domain.
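As an illustration only, an annular domain can be parameterized over a rectangular reference box with one periodic coordinate, as in the sketch below; the radii and axis conventions are assumptions for the example, not the values used in our tests.

```python
import numpy as np

# Map reference coordinates (s, t, z) in [0, 1]^3 to a 3D annulus.
# R_INNER and R_OUTER are placeholder radii; t is the periodic coordinate.
R_INNER, R_OUTER = 0.5, 1.0

def annulus_map(s, t, z):
    r = R_INNER + s * (R_OUTER - R_INNER)   # radial coordinate
    theta = 2.0 * np.pi * t                 # periodic angular coordinate
    return r * np.cos(theta), r * np.sin(theta), z
```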

5 Conclusion and Future Work

We have designed and implemented a two-level framework for the Hierarchical Poincaré-Steklov method that effectively decouples the hierarchical and dense portions of the method, using state-of-the-art multifrontal solvers to improve the stability of the hierarchical component and GPUs to improve the performance of the dense component. Our code supports three-dimensional problems. Several directions remain for future work.

Although multifrontal solvers like MUMPS provide excellent stability, their use is also the main performance bottleneck in our implementation. GPU-based sparse solvers such as those in the NVIDIA cuDSS library may offer better performance. Alternatively, the reduced problem given to the sparse solver may itself be compressed into a two-level framework using SlabLU [39]. There are also potential performance gains by exploiting rank structure in the leaf operators, especially in the 3D discretization where these operators can be fairly large.

This implementation of HPS may be extended to other formulations of the method, such as those using impedance maps rather than DtN maps, which may further improve conditioning. It may also be extended to adaptive discretizations [40, 41]. In addition to handling corner nodes, interpolation is useful for implementing merge operators across subdomains with varying p. Our framework could accommodate this adaptive case as well, although variable p would somewhat alter the block-sparse structure of \mathbf{T}_{\rm sp}.

References

• [1] I. Babuška, F. Ihlenburg, E. T. Paik, S. A. Sauter, A generalized finite element method for solving the Helmholtz equation in two dimensions with minimal pollution, Computer Methods in Applied Mechanics and Engineering 128 (3-4) (1995) 325–359.
• [2] I. M. Babuška, S. A. Sauter, Is the pollution effect of the FEM avoidable for the Helmholtz equation considering high wave numbers?, SIAM Journal on Numerical Analysis 34 (6) (1997) 2392–2423.
• [3] J. Galkowski, E. A. Spence, Does the Helmholtz boundary element method suffer from the pollution effect?, SIAM Review 65 (3) (2023) 806–828.
• [4] A. Townsend, S. Olver, The automatic solution of partial differential equations using a global spectral method, Journal of Computational Physics 299 (2015) 106–123.
• [5] J. P. Boyd, Chebyshev and Fourier spectral methods, Courier Corporation, 2001.
• [6] S. Olver, A. Townsend, A fast and well-conditioned spectral method, SIAM Review 55 (3) (2013) 462–489.
• [7] T. Babb, A. Gillman, S. Hao, P.-G. Martinsson, An accelerated Poisson solver based on multidomain spectral discretization, BIT Numerical Mathematics 58 (2018) 851–879.
• [8] A. Gillman, P.-G. Martinsson, A direct solver with O(N) complexity for variable coefficient elliptic PDEs discretized via a high-order composite spectral collocation method, SIAM Journal on Scientific Computing 36 (4) (2014) A2023–A2046.
• [9] A. T. Patera, A spectral element method for fluid dynamics: laminar flow in a channel expansion, Journal of Computational Physics 54 (3) (1984) 468–488.
• [10] Y. Maday, R. Munoz, Spectral element multigrid. II. Theoretical justification, Journal of Scientific Computing 3 (1988) 323–353.
• [11] A. Yeiser, A. A. Townsend, A spectral element method for meshes with skinny elements, arXiv preprint arXiv:1803.10353 (2018).
• [12] P. Martinsson, The Hierarchical Poincaré-Steklov (HPS) solver for elliptic PDEs: a tutorial, arXiv preprint arXiv:1506.01308 (2015).
• [13] P.-G. Martinsson, Fast direct solvers for elliptic PDEs, SIAM, 2019.
• [14] T. Babb, P.-G. Martinsson, D. Appelö, HPS accelerated spectral solvers for time dependent problems: Part I, algorithms, in: Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018: Selected Papers from the ICOSAHOM Conference, London, UK, July 9-13, 2018, Springer International Publishing, 2020, pp. 131–141.
• [15] K. Chen, D. Appelö, T. Babb, P.-G. Martinsson, Fast and high-order approximation of parabolic equations using hierarchical direct solvers and implicit Runge-Kutta methods, Communications on Applied Mathematics and Computation (2024) 1–21.
• [16] S. Hao, P.-G. Martinsson, A direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré–Steklov operators, Journal of Computational and Applied Mathematics 308 (2016) 419–434.
• [17] A. Yesypenko, P.-G. Martinsson, GPU optimizations for the Hierarchical Poincaré-Steklov scheme, in: International Conference on Domain Decomposition Methods, Springer, 2022, pp. 519–528.
• [18] D. Fortunato, N. Hale, A. Townsend, The ultraspherical spectral element method, Journal of Computational Physics 436 (2021) 110087.
• [19] L. N. Trefethen, Finite difference and spectral methods for ordinary and partial differential equations, Cornell University-Department of Computer Science and Center for Applied …, 1996, Ch. 8.
• [20] A. Gillman, P.-G. Martinsson, An O(N) algorithm for constructing the solution operator to 2D elliptic boundary value problems in the absence of body loads, Advances in Computational Mathematics 40 (2014) 773–796.
• [21] T. A. Davis, Direct methods for sparse linear systems, SIAM, 2006.
• [22] A. George, Nested dissection of a regular finite element mesh, SIAM Journal on Numerical Analysis 10 (2) (1973) 345–363.
• [23] P. R. Amestoy, T. A. Davis, I. S. Duff, An approximate minimum degree ordering algorithm, SIAM Journal on Matrix Analysis and Applications 17 (4) (1996) 886–905.
• [24] N. N. Beams, A. Gillman, R. J. Hewett, A parallel shared-memory implementation of a high-order accurate solution technique for variable coefficient Helmholtz problems, Computers & Mathematics with Applications 79 (4) (2020) 996–1011.
• [25] D. Chipman, EllipticForest: a direct solver library for elliptic partial differential equations on adaptive meshes, Journal of Open Source Software 9 (96) (2024) 6339.
• [26] A. Gillman, A. H. Barnett, P.-G. Martinsson, A spectrally accurate direct solution technique for frequency-domain scattering problems with variable media, BIT Numerical Mathematics 55 (2015) 141–170.
• [27] P. R. Amestoy, I. S. Duff, J.-Y. L’Excellent, J. Koster, MUMPS: a general purpose distributed memory sparse solver, in: International Workshop on Applied Parallel Computing, Springer, 2000, pp. 121–130.
• [28] J. Pan, L. Xiao, M. Tian, T. Liu, L. Wang, Heterogeneous multi-core optimization of MUMPS solver and its application, in: Proceedings of the 2021 ACM International Conference on Intelligent Computing and Its Emerging Applications, 2021, pp. 122–127.
• [29] S. Imambi, K. B. Prakash, G. Kanagachidambaresan, PyTorch, Programming with TensorFlow: Solution for Edge Computing Applications (2021) 87–104.
• [30] S. Balay, S. Abhyankar, M. Adams, J. Brown, P. Brune, K. Buschelman, E. Constantinescu, A. Dener, J. Faibussowitsch, W. Gropp, et al., PETSc/TAO users manual revision 3.22, Tech. rep., Argonne National Laboratory (ANL), Argonne, IL (United States) (2024).
• [31] I. S. Duff, S. Pralet, Towards stable mixed pivoting strategies for the sequential and parallel solution of sparse symmetric indefinite systems, SIAM Journal on Matrix Analysis and Applications 29 (3) (2007) 1007–1024.
• [32] L. Grigori, J. W. Demmel, H. Xiang, Communication avoiding Gaussian elimination, in: SC’08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, IEEE, 2008, pp. 1–12.
• [33] P. Amestoy, J.-Y. L’Excellent, T. Mary, C. Puglisi, MUMPS: multifrontal massively parallel solver for the direct solution of sparse linear equations, in: EMS/ECMI Lanczos Prize, ECM Conference, Sevilla, 2024.
• [34] J. P. Lucero Lorca, N. Beams, D. Beecroft, A. Gillman, An iterative solver for the HPS discretization applied to three dimensional Helmholtz problems, SIAM Journal on Scientific Computing 46 (1) (2024) A80–A104.
• [35] J. Dongarra, S. Hammarling, N. J. Higham, S. D. Relton, M. Zounon, Optimized batched linear algebra for modern architectures, in: European Conference on Parallel Processing, Springer, 2017, pp. 511–522.
• [36] A. H. Barnett, B. J. Nelson, J. M. Mahoney, High-order boundary integral equation solution of high frequency wave scattering from obstacles in an unbounded linearly stratified medium, Journal of Computational Physics 297 (2015) 407–426.
• [37] D. Fortunato, A high-order fast direct solver for surface PDEs, SIAM Journal on Scientific Computing 46 (4) (2024) A2582–A2606.
• [38] C. Borges, A. Gillman, L. Greengard, High resolution inverse scattering in two dimensions using recursive linearization, SIAM Journal on Imaging Sciences 10 (2) (2017) 641–664.
• [39] A. Yesypenko, P.-G. Martinsson, SlabLU: a two-level sparse direct solver for elliptic PDEs, Advances in Computational Mathematics 50 (4) (2024) 90.
• [40] P. Geldermans, A. Gillman, An adaptive high order direct solution technique for elliptic boundary value problems, SIAM Journal on Scientific Computing 41 (1) (2019) A292–A315.
• [41] D. Chipman, D. Calhoun, C. Burstedde, A fast direct solver for elliptic PDEs on a hierarchy of adaptively refined quadtrees, arXiv preprint arXiv:2402.14936 (2024).