
¹ Department of Mathematics, Florida State University, Tallahassee, Florida
² Department of Scientific Computing, Florida State University, Tallahassee, Florida
³ Computer Science and Mathematics Division, Oak Ridge National Laboratory
⁴ Physical Sciences Laboratory, NOAA/Earth System Research Laboratories, Boulder, Colorado

Nonlinear ensemble filtering with diffusion models:
Application to the surface quasi-geostrophic dynamics

Feng Bao¹, Hristo G. Chipilski², Siming Liang¹, Guannan Zhang³, Jeffrey S. Whitaker⁴

Corresponding author: Dr. Hristo G. Chipilski, hchipilski@fsu.edu.

This Work has been submitted to Monthly Weather Review. Copyright in this Work may be transferred without further notice.
Abstract

The intersection between classical data assimilation methods and novel machine learning techniques has attracted significant interest in recent years. Here we explore another promising solution in which diffusion models are used to formulate a robust nonlinear ensemble filter for sequential data assimilation. Unlike standard machine learning methods, the proposed Ensemble Score Filter (EnSF) is completely training-free and can efficiently generate a set of analysis ensemble members. In this study, we apply the EnSF to a surface quasi-geostrophic model and compare its performance against the popular Local Ensemble Transform Kalman Filter (LETKF), which makes Gaussian assumptions on the posterior distribution. Numerical tests demonstrate that EnSF maintains stable performance in the absence of localization and for a variety of experimental settings. We find that EnSF achieves competitive performance relative to LETKF in the case of linear observations, but leads to significant advantages when the state is nonlinearly observed and the numerical model is subject to unexpected shocks. A spectral decomposition of the analysis results shows that the largest improvements over LETKF occur at large scales (small wavenumbers) where LETKF lacks sufficient ensemble spread. Overall, this initial application of EnSF to a geophysical model of intermediate complexity is very encouraging, and motivates further developments of the algorithm for more realistic problems.

Keywords:
diffusion models, ensemble data assimilation, surface quasi-geostrophic turbulence

1 Introduction

Since its introduction by Evensen (1994), the ensemble Kalman filter (EnKF) has been utilized extensively for the initialization of geophysical models and has inspired the rapidly developing subfield of ensemble data assimilation (DA). In numerical weather prediction (NWP), the ensemble approach has been highly successful either on its own or in conjunction with variational methods such as 4D-Var (e.g., Isaksen et al. 2010). This success largely stems from the ability of a forecast ensemble to capture the “errors of the day”, replacing the assumptions of static forecast covariances used in earlier DA methods (e.g., optimal interpolation) and purely variational approaches.

In meteorological applications, EnKF algorithms have been particularly beneficial at the convective scales due to their ability to handle complex numerical models and observing systems (e.g., Aksoy et al. 2009, 2010; Jones et al. 2010; Chipilski et al. 2020, 2022; Hu et al. 2023). One of the practical advantages of ensemble DA methods is their ability to incorporate highly nonlinear models and observations without the need to explicitly compute any tangent linear and adjoint operators. This is an extra overhead for any variational DA method in operational contexts as one needs to continuously recompute these operators once a new version of the model is released. Nevertheless, standard EnKFs still approximate the Kalman filter’s analysis equations, which are derived from restrictive Gaussian assumptions. These Gaussian assumptions manifest themselves in the linear nature of the EnKF’s update, which limits the ability to represent more complex analysis (posterior) distributions (Spantini et al. 2022). Recent studies have shown that the analysis biases introduced by methods leveraging the Kalman filter equations, such as the EnKF, can have a detrimental impact on the ability to forecast high-impact weather events like hurricanes (Poterjoy 2022).

Particle filters (PFs; Gordon et al. 1993; van Leeuwen 2009; van Leeuwen et al. 2019) represent a natural replacement for standard EnKFs due to their provable convergence to the correct posterior distribution (Crisan and Doucet 2002). Although they were introduced around the same time as EnKFs, they have taken much longer to reach operational maturity. The underlying reasons are practical: to avoid the weight degeneracy associated with the curse of dimensionality, PFs require a prohibitively large number of particles. There has nevertheless been visible progress in this direction, with several successful PF implementations in large systems (Tödter et al. 2016; Poterjoy et al. 2017; Rojahn et al. 2023). PFs have also stimulated the development of many complementary non-Gaussian DA approaches, including lognormal and bi-Gaussian extensions of standard DA methods (Fletcher 2010; Fletcher et al. 2023; Chan et al. 2020) and the recently developed two-step quantile-conserving ensemble filtering framework (QCEFF) of Anderson (2022, 2023).

It is clear that research into nonlinear/non-Gaussian ensemble DA methods will continue to attract more interest in view of the increasing complexity of numerical models and observing systems. One promising way to address these challenges is to exploit the ongoing revolution in generative artificial intelligence (GenAI; see Luo and Hu 2021; Song et al. 2021; Baranchuk et al. 2022). Up to now, two GenAI approaches have been adopted in the ensemble DA context. Chipilski (2023) showed how the appealing mathematical properties of invertible neural networks (normalizing flows) can help generalize the Kalman filter to arbitrarily non-Gaussian distributions. The resulting Conjugate Transform Filter (CTF) is amenable to ensemble approximations which can take advantage of existing EnKF solvers. The second GenAI framework is associated with the score-based filter of Bao et al. (2023b), which harnesses the expressive power of diffusion models to approximate complex posterior distributions. The use of a pseudo time to gradually transform samples to the desired distribution makes this approach somewhat similar to the Particle Flow Filter (PFF; Pulido and van Leeuwen 2019; Hu and van Leeuwen 2021), but its ability to incorporate non-Gaussian priors is an important distinction. Following its original formulation, a more efficient ensemble version of the score-based filter, henceforth referred to as the Ensemble Score Filter (EnSF), has been developed. Owing to its training-free nature, EnSF has been shown to estimate the states of a high-dimensional Lorenz-96 system with up to $10^6$ variables (Bao et al. 2023a). More recently, the EnSF framework has also been extended to conduct joint state-parameter estimation (Bao et al. 2023c).

In this study, we focus on the further development of the EnSF algorithm and aim to introduce its capabilities to the geophysical community. Motivated by its scalable performance in high-dimensional and nonlinear settings, we implement EnSF in a surface quasi-geostrophic (SQG) model which provides a realistic description of geophysical turbulence (Smith et al. 2023). Our main objective is to demonstrate that (i) EnSF maintains stable performance in high dimensions while (ii) offering advantages compared to standard EnKF methods. This is achieved by performing a hierarchy of SQG experiments in which we gradually increase the complexity of the DA task toward more nonlinear settings.

The rest of this paper is organized as follows. In Section 2, we introduce basic theory of diffusion models and discuss how the associated backward (reverse-time) stochastic differential equations (SDEs) can be used to sample from the Bayesian posterior. After describing important experimental design choices in Section 3, we present our DA results with the SQG model in Section 4. In Section 5, we conclude with a summary of all main findings and outline several interesting research directions.

2 Methodology

We first introduce the diffusion model as a general framework to sample from a user-specified target distribution. Then, we explain how score-based diffusion models fit the standard forecast-analysis procedure of sequential DA methods. Finally, we discuss numerical schemes for solving the DA problem under the score-based filtering framework, and provide details on how to build ensemble approximations.

2.1 Introduction to diffusion models

In a diffusion model, there is a forward $\mathbb{R}^{N_x}$-valued stochastic differential equation (SDE) defined as

$$d\boldsymbol{Z}_t = b(t)\,\boldsymbol{Z}_t\,dt + \sigma(t)\,d\boldsymbol{W}_t \quad \text{(forward SDE)}, \qquad (1)$$

where $\boldsymbol{W}_t$ is a standard $\mathbb{R}^{N_x}$-valued Brownian motion (Wiener process) corresponding to an Itô-type stochastic integral $\int \cdot\, d\boldsymbol{W}_t$, while $b$ and $\sigma$ are two explicitly given functions referred to as the drift and diffusion coefficients, respectively. The initial condition $\boldsymbol{Z}_0$ of the SDE in Eq. (1) follows some target distribution with its probability density function (pdf) denoted by $Q(\mathbf{z}_0)$. One can show that with properly chosen $b$ and $\sigma$, the diffusion process $\{\boldsymbol{Z}_t\}_{0 \le t \le T}$ can transform any target pdf $Q(\mathbf{z}_0)$ to a standard Gaussian, i.e. $\boldsymbol{Z}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, where $[0, T]$ is often referred to as the pseudo-time interval.
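To make the forward transformation concrete, the sketch below runs an Euler-Maruyama discretization of a simple constant-coefficient SDE, $b(t) = -1/2$ and $\sigma(t) = 1$ (an Ornstein-Uhlenbeck process, not the schedule chosen later in this paper), and shows how a strongly bimodal target collapses to a standard Gaussian. All numerical settings here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Euler-Maruyama simulation of a simple forward SDE,
#   dZ_t = -0.5 * Z_t dt + dW_t  (Ornstein-Uhlenbeck process),
# whose stationary distribution is N(0, 1).  The drift/diffusion pair
# used in the paper differs (see Eq. 4); this choice is illustrative.
def forward_sde(z0, T=8.0, n_steps=800):
    dt = T / n_steps
    z = z0.copy()
    for _ in range(n_steps):
        z += -0.5 * z * dt + np.sqrt(dt) * rng.standard_normal(z.shape)
    return z

# Start from a strongly non-Gaussian (bimodal) target distribution
z0 = np.concatenate([rng.normal(-4, 0.3, 5000), rng.normal(4, 0.3, 5000)])
zT = forward_sde(z0)
print(zT.mean(), zT.var())  # approximately 0 and 1
```

Despite the highly non-Gaussian initial ensemble, the sample mean and variance at the final pseudo-time are close to those of $\mathcal{N}(0, 1)$, which is exactly the property the forward SDE is designed to deliver.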

Generating samples $\{\boldsymbol{Z}_0^i\}_{i=1}^N$ of the target random variable $\boldsymbol{Z}_0$ pertains to simulating the following reverse-time SDE:

$$d\boldsymbol{Z}_t = \left[b(t)\,\boldsymbol{Z}_t - \sigma^2(t)\,\mathbf{s}(\boldsymbol{Z}_t, t)\right]dt + \sigma(t)\,d\overleftarrow{\boldsymbol{W}}_t \quad \text{(reverse-time SDE)}, \qquad (2)$$

where $\int \cdot\, d\overleftarrow{\boldsymbol{W}}_t$ is a backward Itô stochastic integral (Kloeden and Platen 1992; Bao et al. 2016), and $\mathbf{s}(\cdot, t)$ is the so-called score function given by

$$\mathbf{s}(\mathbf{z}_t, t) \coloneqq \nabla \log Q(\mathbf{z}_t), \qquad (3)$$

where $Q(\mathbf{z}_t)$ denotes the pdf of $\boldsymbol{Z}_t$. Note that the reverse-time SDE in Eq. (2) is also a diffusion process, except that the propagation direction is backwards in time from $T$ to $0$ with initial condition $\boldsymbol{Z}_T$ given at time $T$. An important result from the literature is that the solution $\boldsymbol{Z}_0$ of the reverse-time SDE follows the target distribution (Bao et al. 2023b). The practical implication is that we can generate samples from a standard Gaussian distribution (which can be done efficiently) and use the reverse-time SDE to transform them into samples of the target distribution. The score function $\mathbf{s}(\cdot, t)$ plays an important role in this mapping, as it stores information about the distribution of the samples, which in turn guides their transformation over the pseudo-time interval as $t \rightarrow 0$. In particular, having the score function associated with the target pdf $Q(\mathbf{z}_0)$ and the predefined forward SDE allows us to generate an unlimited number of target samples by running the reverse-time SDE in Eq. (2).

In both the forward SDE and the reverse-time SDE, one needs to carefully choose the drift and diffusion coefficients $b$ and $\sigma$. While there are multiple options for $b$ and $\sigma$, in this work we let

$$b(t) = \frac{d \log \alpha_t}{dt}, \qquad \sigma^2(t) = \frac{d \beta_t^2}{dt} - 2\,\frac{d \log \alpha_t}{dt}\,\beta_t^2, \qquad (4)$$

with $\alpha_t = 1 - t$ and $\beta_t = \sqrt{t}$ for $t \in [0, 1]$, which is consistent with the choice made in Song et al. (2021). Nevertheless, in future work we plan to explore the sensitivity of EnSF to different options for the drift and diffusion coefficients.
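Substituting $\alpha_t = 1 - t$ and $\beta_t^2 = t$ into Eq. (4) gives the simple closed forms $b(t) = -1/(1-t)$ and $\sigma^2(t) = 1 + 2t/(1-t)$, which the following sketch evaluates (the evaluation point is illustrative):

```python
import numpy as np

# Drift and diffusion coefficients from Eq. (4) with the schedules
# alpha_t = 1 - t and beta_t = sqrt(t).  Note that both b(t) and
# sigma^2(t) blow up as t -> 1, so numerical schemes for the SDEs
# typically stop slightly short of t = 1.
def b(t):
    # d/dt log(1 - t)
    return -1.0 / (1.0 - t)

def sigma2(t):
    # d(beta_t^2)/dt - 2 * (d/dt log alpha_t) * beta_t^2
    return 1.0 + 2.0 * t / (1.0 - t)

t = 0.5
print(b(t), sigma2(t))  # -2.0, 3.0
```

At $t = 0$ the diffusion coefficient reduces to $\sigma^2(0) = 1$, and both coefficients grow without bound as $t \rightarrow 1$, a feature that matters for the discretization discussed in Section 2.3.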

The traditional use of diffusion models in the machine learning (ML) literature is to generate highly realistic images and videos (Song et al. 2021). Later in our exposition, we will discuss how this powerful approach can be also used in the context of data assimilation (DA).

2.2 The general filtering solution

The formulation of every DA method requires two main ingredients: (stochastic) dynamic and observation models. Let $k = 0, 1, 2, \ldots, K$ be a time index. A general representation of the discretized system evolution is given by

$$\boldsymbol{X}_k = \mathbf{f}_{k-1}(\boldsymbol{X}_{k-1}, \boldsymbol{E}^m_{k-1}), \qquad (5)$$

where $\boldsymbol{X}_k \in \mathbb{R}^{N_x}$ stands for the (true) state vector we wish to estimate and $\mathbf{f}_k$ represents the error-prone numerical model. The model errors $\boldsymbol{E}^m_k$ arise due to (i) discretization of the governing equations and (ii) imperfect knowledge about the simulated process. In this study, we assume the model errors are known, so we only need to infer the unknown true state from the noisy observations

$$\boldsymbol{Y}_k = \mathbf{h}_k(\boldsymbol{X}_k) + \boldsymbol{E}^o_k \quad \text{with } \boldsymbol{E}^o_k \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_k), \qquad (6)$$

where $\mathbf{h}_k$ is an observation operator and $\boldsymbol{E}^o_k$ are the associated observation errors. Note that while the formulation of Eq. (6) follows the standard additive-Gaussian observation error model, it is possible to introduce more general statistical assumptions (Chipilski 2023).

Given the model and observation equations, a rather general formulation of the DA problem is to seek the filtering (posterior) pdf $P(\mathbf{x}_k | \mathbf{y}_{1:k})$, where the notation $1{:}k$ is shorthand for the set of integers from $1$ to $k$. This pdf can be computed recursively by iterating through a prediction and an update step:

Prediction:

Given the filtering pdf $P(\mathbf{x}_{k-1} | \mathbf{y}_{1:k-1})$ at the previous time level $k-1$, we can first compute the prior pdf $P(\mathbf{x}_k | \mathbf{y}_{1:k-1})$ using the Chapman-Kolmogorov equation

$$P(\mathbf{x}_k | \mathbf{y}_{1:k-1}) = \int P(\mathbf{x}_{k-1} | \mathbf{y}_{1:k-1})\, P(\mathbf{x}_k | \mathbf{x}_{k-1})\, d\mathbf{x}_{k-1}, \qquad (7)$$

where $P(\mathbf{x}_k | \mathbf{x}_{k-1})$ is the transition pdf determined from the dynamic model in Eq. (5).

Update:

After receiving the new set of observations $\boldsymbol{Y}_k = \mathbf{y}_k$, we can adjust the forecast state towards the observations by applying Bayes' theorem,

$$P(\mathbf{x}_k | \mathbf{y}_{1:k}) \propto P(\mathbf{x}_k | \mathbf{y}_{1:k-1})\, P(\mathbf{y}_k | \mathbf{x}_k), \qquad (8)$$

where the likelihood function $P(\mathbf{y}_k | \mathbf{x}_k)$ is determined from the observation model in Eq. (6). In view of our Gaussian assumptions, the likelihood can be written as

$$P(\mathbf{y}_k | \mathbf{x}_k) \propto \exp\Big(-\frac{1}{2}\left[\mathbf{y}_k - \mathbf{h}_k(\mathbf{x}_k)\right]^\top \mathbf{R}_k^{-1} \left[\mathbf{y}_k - \mathbf{h}_k(\mathbf{x}_k)\right]\Big). \qquad (9)$$

Through the recursive application of the prediction and update steps in Eqs. (7) and (8), we can find the filtering pdf for any time index $k$.
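The prediction-update recursion can be illustrated with a brute-force grid (histogram) filter for a scalar state. The dynamic model, noise levels, and observed value below are illustrative choices, not from the paper:

```python
import numpy as np

# Grid-based illustration of the prediction step, Eq. (7), and the
# update step, Eq. (8), for a scalar state on a uniform grid.
x = np.linspace(-8, 8, 1601)
dx = x[1] - x[0]

# Initial filtering pdf: N(0, 1)
p = np.exp(-0.5 * x**2)
p /= p.sum() * dx

# Prediction: Chapman-Kolmogorov with X_k = 0.9 X_{k-1} + N(0, 0.5^2)
trans = np.exp(-0.5 * ((x[:, None] - 0.9 * x[None, :]) / 0.5) ** 2)
p_prior = trans @ p * dx
p_prior /= p_prior.sum() * dx

# Update: Bayes' theorem with y = x + N(0, 1) and observed y = 2.0
lik = np.exp(-0.5 * (2.0 - x) ** 2)
p_post = p_prior * lik
p_post /= p_post.sum() * dx

print(x[np.argmax(p_post)])  # posterior mode pulled toward the observation
```

Because the toy problem is linear-Gaussian, the grid result can be checked against the conjugate formulas: the predicted variance is $0.9^2 \cdot 1 + 0.5^2 = 1.06$, so the posterior mode lands near $2.0 \times 1.06 / (1.06 + 1) \approx 1.03$.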

2.3 Score-based filtering for data assimilation

The central idea in score-based filtering is to create score models which represent the filtering pdfs at different times and then generate analysis ensemble members according to the reverse-time SDE in Eq. (2). To this end, we introduce the score functions $\mathbf{s}_{k|k-1}$ and $\mathbf{s}_{k|k}$ corresponding to the prior pdf $P(\mathbf{x}_k | \mathbf{y}_{1:k-1})$ and the posterior pdf $P(\mathbf{x}_k | \mathbf{y}_{1:k})$:

  • Prior filtering score $\mathbf{s}_{k|k-1}$ such that $\boldsymbol{Z}_0 \equiv \boldsymbol{X}_k | \boldsymbol{Y}_{1:k-1}$; i.e., $Q(\mathbf{z}_0) \equiv P(\mathbf{x}_k | \mathbf{y}_{1:k-1})$.

  • Posterior filtering score $\mathbf{s}_{k|k}$ such that $\boldsymbol{Z}_0 \equiv \boldsymbol{X}_k | \boldsymbol{Y}_{1:k}$; i.e., $Q(\mathbf{z}_0) \equiv P(\mathbf{x}_k | \mathbf{y}_{1:k})$.

Next, we describe a recursive procedure to implement the score-based filter for DA. To proceed, it is assumed that the posterior score $\mathbf{s}_{k-1|k-1}$ associated with the posterior (filtering) pdf at time level $k-1$ is given. This is analogous to how the initial state distribution is assumed to be known in sequential DA. An Euler scheme can be used to discretize the reverse-time SDE in Eq. (2) (Kloeden and Platen 1992) as follows:

$$\boldsymbol{Z}^m_{t_n} = \boldsymbol{Z}^m_{t_{n+1}} - \big[b(t_{n+1})\,\boldsymbol{Z}^m_{t_{n+1}} - \sigma^2(t_{n+1})\,\mathbf{s}_{k-1|k-1}(\boldsymbol{Z}^m_{t_{n+1}}, t_{n+1})\big]\Delta t_n - \sigma(t_{n+1})\,\Delta\boldsymbol{W}_{t_n}, \qquad (10)$$

for $n = 0, 1, \ldots, N-1$ and $m = 1, 2, \ldots, M$, where $\{\boldsymbol{Z}^m_{t_n}\}_{m=1}^M$ is a set of $M$ iid samples, $\Delta t_n \coloneqq t_{n+1} - t_n$, $\Delta\boldsymbol{W}_{t_n} \coloneqq \boldsymbol{W}_{t_{n+1}} - \boldsymbol{W}_{t_n}$, and the scheme uses the time discretization

$$\{t_n : 0 = t_0 < t_1 < \cdots < t_n < \cdots < t_N = T\}$$

over the diffusion model's pseudo-time interval $[0, T]$. The Euler scheme is one of the two most popular schemes for solving SDEs numerically. The second is based on the Milstein method (Milstein 1975), but for the standard SDE formulation used here (in which the diffusion coefficient $\sigma$ is state independent), the two schemes give identical results.

Following the standard workflow of diffusion models, the reverse-time SDE is initialized with samples drawn from a standard Gaussian distribution; that is, $\{\boldsymbol{Z}^m_{t_N}\}_{m=1}^M \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. Leveraging the Euler scheme in Eq. (10), these initial samples are mapped to the analysis ensemble $\{\boldsymbol{X}^m_{k-1|k-1}\}_{m=1}^M$, which follows the filtering pdf $P(\mathbf{x}_{k-1} | \mathbf{y}_{1:k-1})$ by construction.
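A self-contained check of the Euler scheme in Eq. (10) is possible when the target is Gaussian, because the score of $Q(\mathbf{z}_t)$ is then available in closed form. The sketch below targets $\mathcal{N}(\mu, 1)$ with the schedules $\alpha_t = 1 - t$, $\beta_t = \sqrt{t}$ from Eq. (4); in the EnSF the score would instead be estimated from the forecast ensemble, and $\mu$, the step count, and the cutoff $\varepsilon$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Reverse-time Euler-Maruyama sweep of Eq. (10).  The exact score of
# Q(z_t) for a N(mu, 1) target under alpha_t = 1 - t, beta_t^2 = t is
# s(z, t) = -(z - alpha_t * mu) / (alpha_t^2 + beta_t^2).
mu = 3.0

def b(t):
    return -1.0 / (1.0 - t)

def sigma2(t):
    return 1.0 + 2.0 * t / (1.0 - t)

def score(z, t):
    a = 1.0 - t
    return -(z - a * mu) / (a * a + t)

M, n_steps, eps = 20000, 2000, 1e-3   # stop short of t = 1 where b blows up
ts = np.linspace(eps, 1.0 - eps, n_steps + 1)
z = rng.standard_normal(M)            # N(0, I) samples at pseudo-time T

for n in range(n_steps, 0, -1):       # sweep backwards from t_N to t_0
    t1, dt = ts[n], ts[n] - ts[n - 1]
    dw = np.sqrt(dt) * rng.standard_normal(M)
    z = z - (b(t1) * z - sigma2(t1) * score(z, t1)) * dt - np.sqrt(sigma2(t1)) * dw

print(z.mean(), z.var())  # approximately mu = 3 and variance 1
```

The transported samples recover the target mean and variance, confirming that the reverse-time sweep undoes the forward diffusion when the score is known.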

The generative nature of the diffusion process used to obtain the analysis ensemble $\{\boldsymbol{X}^m_{k-1|k-1}\}_{m=1}^M$ is worth elaborating on. With an appropriately estimated score function $\hat{\mathbf{s}}_{k-1|k-1}$, we can generate an unlimited number of samples from the posterior pdf $P(\mathbf{x}_{k-1} | \mathbf{y}_{1:k-1})$, which can be seen as a non-Gaussian extension of the class of resampling EnKFs discussed by Anderson (2001). In addition, the sample size $M$ can be arbitrary, depending on the specific application and available computing resources.

Prediction step:

The prediction step in the score filter is analogous to that of all ensemble DA methods. In particular, the dynamic model in Eq. (5) is used to advance each analysis ensemble member $\boldsymbol{X}^m_{k-1|k-1}$ to the next observation time level $k$. In doing so, we obtain the forecast ensemble $\{\boldsymbol{X}^m_{k|k-1}\}_{m=1}^M$, which approximates the prior pdf $P(\mathbf{x}_k | \mathbf{y}_{1:k-1})$ and will be used to estimate the prior score function $\hat{\mathbf{s}}_{k|k-1}$.

Update step:

The posterior score function $\mathbf{s}_{k|k}$ implicitly encodes the posterior pdf $P(\mathbf{x}_k | \mathbf{y}_{1:k})$, which is why its estimation is an essential part of the update step in the score filter. Since $P(\mathbf{x}_k | \mathbf{y}_{1:k})$ combines information from both the prior and the likelihood, the derivation of $\mathbf{s}_{k|k}$ focuses on how to effectively adjust the prior score based on the assimilated observations. Indeed, this is a general challenge for all DA methods. Taking the gradient of the logarithm of Eq. (8), we see that

$$\nabla_{\mathbf{x}} \log P(\mathbf{x}_k | \mathbf{y}_{1:k}) = \nabla_{\mathbf{x}} \log P(\mathbf{x}_k | \mathbf{y}_{1:k-1}) + \nabla_{\mathbf{x}} \log P(\mathbf{y}_k | \mathbf{x}_k), \qquad (11)$$

where the gradient $\nabla_{\mathbf{x}}$ is taken with respect to the state variable $\mathbf{x}_k$.

The posterior pdf $P(\mathbf{x}_k | \mathbf{y}_{1:k})$ is the target distribution for the update step of the score filter, implying that $\nabla_{\mathbf{x}} \log P(\mathbf{x}_k | \mathbf{y}_{1:k})$ is the desired posterior score function $\mathbf{s}_{k|k}$ at pseudo-time $t = 0$ (recall the score function definition in Eq. 3). Further notice that $P(\mathbf{x}_k | \mathbf{y}_{1:k-1})$ is the prior pdf, hence $\nabla_{\mathbf{x}} \log P(\mathbf{x}_k | \mathbf{y}_{1:k-1})$ is equivalent to the prior score function $\mathbf{s}_{k|k-1}$ at pseudo-time $t = 0$. This suggests that $\nabla_{\mathbf{x}} \log P(\mathbf{y}_k | \mathbf{x}_k)$ should be the likelihood portion of the posterior filtering score at pseudo-time $t = 0$. Taking advantage of this link, we propose the following posterior score function:

$$\mathbf{s}_{k|k}(\mathbf{z}, t) := \mathbf{s}_{k|k-1}(\mathbf{z}, t) + h(t)\,\nabla_{\mathbf{x}} \log P(\mathbf{y}_k | \mathbf{z}). \qquad (12)$$

The prior score $\hat{\mathbf{s}}_{k|k-1}$ is estimated from the forecast ensemble, and we refer to $\nabla_{\mathbf{x}} \log P(\mathbf{y}_k | \cdot)$ as the likelihood score associated with the observational data $\mathbf{y}_k$. The coefficient $h(t)$ multiplying the likelihood score is a damping function that is monotonically decreasing on $[0, T]$ with $h(0) = 1$ and $h(T) = 0$. We use $h(t) = T - t$ for the SQG experiments reported in this paper. However, it should be emphasized that there are multiple ways to define $h$, and the question of which choice is mathematically optimal remains open.

Having the posterior score $\hat{\mathbf{s}}_{k|k}$, we can now use the discretized scheme in Eq. (10) to transport a set of samples from $\mathcal{N}(\mathbf{0}, \mathbf{I})$ to the desired analysis ensemble $\{\boldsymbol{X}^m_{k|k}\}_{m=1}^M$ from $P(\mathbf{x}_k | \mathbf{y}_{1:k})$.
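The damped construction of Eq. (12) can be sketched for a linearly observed scalar state, $y = x + \mathcal{N}(0, R)$. The prior score below is a stand-in (the analytic score of a standard Gaussian prior); in the EnSF it would be the ensemble estimate, and the values of $R$ and $y$ are illustrative:

```python
import numpy as np

# Damped posterior score of Eq. (12) with h(t) = T - t, as in the SQG
# experiments.  At pseudo-time t = 0 the likelihood is fully weighted
# (h = 1); at t = T it is switched off and only the prior score remains.
T, R, y = 1.0, 0.5, 1.2

def h(t):
    return T - t

def prior_score(z, t):
    return -z  # placeholder: score of a standard Gaussian prior

def likelihood_score(z):
    # gradient of log N(y; z, R) with respect to z
    return (y - z) / R

def posterior_score(z, t):
    return prior_score(z, t) + h(t) * likelihood_score(z)

assert posterior_score(0.0, T) == prior_score(0.0, T)  # h(T) = 0
print(posterior_score(0.0, 0.0))  # = (y - 0) / R = 2.4
```

The damping means the observation exerts its full pull only near $t = 0$, i.e. near the end of the reverse-time sweep, where the samples already resemble draws from the prior.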

2.4 The Ensemble Score Filter (EnSF)

The estimation of score functions is a central topic in the score-based filtering approach discussed so far. The standard technique to estimate scores is via deep learning (Song et al. 2021; Bao et al. 2023b). Here, we introduce an ensemble approximation referred to as the Ensemble Score Filter (EnSF) which does not require the training of neural networks.

Since the likelihood score appearing in Eq. (12) can be computed explicitly in the Gaussian case considered here, the major computational effort in EnSF lies in evaluating the prior score $\hat{\mathbf{s}}_{k|k-1}$. The latter can be achieved by setting $\boldsymbol{Z}_0$ to the forecast ensemble $\{\boldsymbol{X}^m_{k|k-1}\}_{m=1}^M$. From the choice of $b$ and $\sigma$ in Eq. (4), the conditional density $Q(\mathbf{z}_t | \mathbf{z}_0)$ induced by the forward SDE is Gaussian:

$$Q(\mathbf{z}_t | \mathbf{z}_0) \propto \exp\Big(-\frac{1}{2\beta_t^2}\,(\mathbf{z}_t - \alpha_t \mathbf{z}_0)^\top (\mathbf{z}_t - \alpha_t \mathbf{z}_0)\Big). \qquad (13)$$

With this, marginalizing over $\mathbf{z}_0$ yields the following score function:

$$\begin{aligned} \mathbf{s}(\mathbf{z}_t, t) &= \nabla_{\mathbf{z}} \log Q(\mathbf{z}_t) = \nabla_{\mathbf{z}} \log\left(\int Q(\mathbf{z}_t | \mathbf{z}_0)\, Q(\mathbf{z}_0)\, d\mathbf{z}_0\right) \\ &= \frac{1}{\int Q(\mathbf{z}_t | \mathbf{z}'_0)\, Q(\mathbf{z}'_0)\, d\mathbf{z}'_0} \int -\frac{\mathbf{z}_t - \alpha_t \mathbf{z}_0}{\beta_t^2}\, Q(\mathbf{z}_t | \mathbf{z}_0)\, Q(\mathbf{z}_0)\, d\mathbf{z}_0 \\ &= -\int \frac{\mathbf{z}_t - \alpha_t \mathbf{z}_0}{\beta_t^2}\, \mathbf{w}_t(\mathbf{z}_t, \mathbf{z}_0)\, Q(\mathbf{z}_0)\, d\mathbf{z}_0, \end{aligned} \qquad (14)$$

where the weight function $\mathbf{w}_t(\mathbf{z}_t, \mathbf{z}_0)$ is defined by

$$\mathbf{w}_t(\mathbf{z}_t, \mathbf{z}_0) := \frac{Q(\mathbf{z}_t | \mathbf{z}_0)}{\int Q(\mathbf{z}_t | \mathbf{z}'_0)\, Q(\mathbf{z}'_0)\, d\mathbf{z}'_0}, \qquad (15)$$

and satisfies the normalization condition $\int \mathbf{w}_t(\mathbf{z}_t, \mathbf{z}_0)\, Q(\mathbf{z}_0)\, d\mathbf{z}_0 = 1$.

We can then apply a Monte Carlo approximation of $\mathbf{s}_{k|k-1}$ based on Eq. (14) at a given $\mathbf{z}$ and $t \in [0, 1]$:

$$\mathbf{s}_{k|k-1}(\mathbf{z}, t) \approx \hat{\mathbf{s}}_{k|k-1}(\mathbf{z}, t) := -\sum_{j=1}^J \frac{\mathbf{z} - \alpha_t \mathbf{x}^{m_j}_{k|k-1}}{\beta_t^2}\, \hat{\mathbf{w}}_t\left(\mathbf{z}, \mathbf{x}^{m_j}_{k|k-1}\right), \qquad (16)$$

where $\{\boldsymbol{X}^{m_j}_{k|k-1}\}_{j=1}^J$ is a mini-batch of samples from the forecast ensemble $\{\boldsymbol{X}^m_{k|k-1}\}_{m=1}^M$, and $\hat{\mathbf{w}}_t$ is a Monte Carlo approximation of Eq. (15) such that

$$\hat{\mathbf{w}}_t\left(\mathbf{z}, \mathbf{x}^{m_j}_{k|k-1}\right) := \frac{Q\left(\mathbf{z} \,\middle|\, \mathbf{x}^{m_j}_{k|k-1}\right)}{\sum_{j'=1}^J Q\left(\mathbf{z} \,\middle|\, \mathbf{x}^{m_{j'}}_{k|k-1}\right)}. \qquad (17)$$
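The estimator in Eqs. (16)-(17) can be checked numerically: for an ensemble drawn from $\mathcal{N}(\mathbf{0}, \mathbf{I})$ the marginal $Q(\mathbf{z}_t)$ is approximately $\mathcal{N}(\mathbf{0}, (\alpha_t^2 + \beta_t^2)\mathbf{I})$, whose score is known in closed form. The ensemble size, evaluation point, and pseudo-time below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte Carlo prior score, Eqs. (16)-(17): a weighted sum over ensemble
# members, with weights given by the normalized conditional densities
# Q(z_t | z_0).  Schedules alpha_t = 1 - t, beta_t^2 = t as in Eq. (4).
def prior_score_estimate(z, t, x_batch):
    """z: (Nx,) evaluation point; x_batch: (J, Nx) forecast members."""
    alpha, beta2 = 1.0 - t, t
    diff = z[None, :] - alpha * x_batch            # (J, Nx)
    # log Q(z | x_j) up to a constant, for numerical stability
    logq = -0.5 * np.sum(diff**2, axis=1) / beta2  # (J,)
    w = np.exp(logq - logq.max())
    w /= w.sum()                                   # Eq. (17)
    return -(w[:, None] * diff).sum(axis=0) / beta2  # Eq. (16)

x = rng.standard_normal((4000, 2))                 # toy forecast ensemble
z, t = np.array([0.5, -0.3]), 0.5
est = prior_score_estimate(z, t, x)
exact = -z / ((1 - t) ** 2 + t)                    # analytic Gaussian score
print(est, exact)
```

The log-sum-exp shift (`logq - logq.max()`) matters in practice: at small pseudo-times $\beta_t^2$ is tiny and the unnormalized weights underflow otherwise.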

Having the estimated prior score $\hat{\mathbf{s}}_{k|k-1}$, we can use Eq. (12) to write the approximate posterior score as

$$\hat{\mathbf{s}}_{k|k}(\mathbf{z}, t) := \hat{\mathbf{s}}_{k|k-1}(\mathbf{z}, t) + h(t)\,\nabla_{\mathbf{x}} \log P(\mathbf{y}_k | \mathbf{z}). \qquad (18)$$

With these approximations in mind, we are finally in a position to summarize the EnSF algorithm:

 

Algorithm: Ensemble Score Filter (EnSF)
----------------------------------------
1:  Input: the forecast model $\mathbf{f}$ and the initial state pdf $P(\mathbf{x}_0)$
2:  Generate $M$ samples $\{\boldsymbol{X}^m_{0|0}\}_{m=1}^M$ from the initial pdf $P(\mathbf{x}_0)$
3:  for $k = 1, 2, \ldots, K$:   % physical time loop
4:    Run the forecast model $\mathbf{f}$ to get the forecast ensemble $\{\boldsymbol{X}^m_{k|k-1}\}_{m=1}^M$
5:    for $n = N-1, \ldots, 0$:   % pseudo-time loop for the reverse-time SDE
6:      Compute the weights $\hat{\mathbf{w}}_{t_{n+1}}(\mathbf{z}^m_{t_{n+1}}, \mathbf{x}^m_{k|k-1})$ using Eq. (17)
7:      Compute $\{\hat{\mathbf{s}}_{k|k-1}(\mathbf{z}^m_{t_{n+1}}, t_{n+1})\}_{m=1}^M$ using Eq. (16)
8:      Compute and store $\{\hat{\mathbf{s}}_{k|k}(\mathbf{z}^m_{t_{n+1}}, t_{n+1})\}_{m=1}^M$ using Eq. (18)
9:    end for
10:   Compute $\{\boldsymbol{Z}^m_{t_0}\}_{m=1}^M$ using Eq. (10) and set $\{\boldsymbol{X}^m_{k|k}\}_{m=1}^M \leftarrow \{\boldsymbol{Z}^m_{t_0}\}_{m=1}^M$
11: end for
----------------------------------------
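A single analysis cycle of the algorithm can be sketched end-to-end on a scalar toy problem. The snippet below transports $\mathcal{N}(0, 1)$ samples through the discretized reverse-time SDE, Eq. (10), driven by the approximate posterior score of Eq. (18), using the full ensemble as the mini-batch ($J = M$). The scalar prior and observation are illustrative, and since the damped likelihood score of Eq. (12) is an approximation, the recovered moments are close to, but not exactly, the conjugate-Gaussian posterior values:

```python
import numpy as np

rng = np.random.default_rng(3)

# One EnSF update step for a scalar state with a N(0, 1) forecast
# ensemble and a linear-Gaussian observation y = x + N(0, R).
M, R, y = 500, 0.5, 1.2
x_prior = rng.standard_normal(M)        # forecast ensemble ~ N(0, 1)

def b(t):
    return -1.0 / (1.0 - t)

def sigma2(t):
    return 1.0 + 2.0 * t / (1.0 - t)

def prior_score(z, t):
    # Eqs. (16)-(17) with the full ensemble as the mini-batch (J = M)
    alpha, beta2 = 1.0 - t, t
    diff = z[:, None] - alpha * x_prior[None, :]      # (M, M)
    logq = -0.5 * diff**2 / beta2
    w = np.exp(logq - logq.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return -(w * diff).sum(axis=1) / beta2

def posterior_score(z, t):
    # Eq. (18) with h(t) = 1 - t and a linear-Gaussian likelihood score
    return prior_score(z, t) + (1.0 - t) * (y - z) / R

# Reverse-time Euler sweep, Eq. (10), stopping short of the endpoints
n_steps, eps = 500, 1e-2
ts = np.linspace(eps, 1.0 - eps, n_steps + 1)
z = rng.standard_normal(M)              # N(0, I) samples at pseudo-time T
for n in range(n_steps, 0, -1):
    t1, dt = ts[n], ts[n] - ts[n - 1]
    drift = b(t1) * z - sigma2(t1) * posterior_score(z, t1)
    z = z - drift * dt - np.sqrt(sigma2(t1) * dt) * rng.standard_normal(M)

# Analysis ensemble: mean pulled toward y, spread reduced vs. the prior
print(z.mean(), z.var())
```

For comparison, the exact conjugate posterior in this toy setup is $\mathcal{N}(0.8, 1/3)$; the EnSF analysis ensemble lands in that neighborhood while remaining entirely training-free.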

3 Experimental design

Before presenting our results, we outline some details on the numerical model and reference EnKF method used in this study.

3.1 Surface quasi-geostrophic (SQG) model

The SQG model belongs to a special class of quasi-geostrophic models in which a fluid of constant potential vorticity (PV) is bounded between two flat, rigid surfaces (Tulloch and Smith 2009b). Despite its idealized nature, the system is capable of simulating turbulent motions similar to those occurring in real geophysical flows (Smith et al. 2023). It is also suitable for DA studies because the SQG flow is inherently chaotic and sensitive to initial condition errors (Rotunno and Snyder 2008; Durran and Gingrich 2014).

In this study, we adopt the SQG formulation proposed by Tulloch and Smith (2009a), in which the dynamics reduce to the nonlinear Eady model with an f-plane approximation as well as uniform stratification and shear. For this case, the governing equations simplify to the advection of potential temperature on the bounding surfaces. Those equations are solved numerically by first applying a fast Fourier transform (FFT) to map model variables to spectral space. They are integrated forward with a 4th-order Runge-Kutta scheme that uses a 2/3 dealiasing rule and implicit treatment of hyperdiffusion. Further numerical details and open-source code can be found on the GitHub page shared at the end of the paper.

3.2 Observing system simulation experiments

The EnSF performance is assessed with 20 ensemble members using a standard observing system simulation experiment (OSSE) framework in which synthetic observations are generated by corrupting the true (nature) run with random noise. The construction of the nature run mostly follows Wang et al. (2021). Two notable exceptions are that we perform additional experiments with a higher-resolution version of the model (96 × 96 grid points) and further carry out imperfect-model experiments in which the nature run is contaminated with unpredictable model errors.

More significant differences appear in the generation of synthetic observations. First, we utilize a 12-hour assimilation window. Compared to the 3-hour window of Wang et al. (2021), this constitutes a more challenging scenario as it creates larger differences with the true state and likely causes more significant departures from Gaussianity in the forecast ensemble. Moreover, we test a wider range of observing networks, starting with the simpler case of a fully observed state with linear observations and finishing with a partially and nonlinearly observed state (50% coverage) that is additionally subject to unexpected model errors. More specific details about the different observing networks are deferred to Section 4.

3.3 Reference LETKF method

While there is a vast array of available EnKF methods we could compare EnSF's performance against, here we opt for the Local Ensemble Transform Kalman Filter (LETKF; Hunt et al. 2007). LETKF is a popular EnKF variant that is currently implemented at several major operational centers (e.g., Schraff et al. 2016; Frolov et al. 2024). One of its most appealing features is efficiency: the model state is decomposed into overlapping subdomains which are updated separately (and in parallel) using the analysis equations of the Ensemble Transform Kalman Filter (ETKF; Bishop et al. 2001). Our numerical implementation considers local regions surrounding single grid points defined by a cutoff radius. The cutoff radius is determined by standard Gaspari-Cohn (GC) taper functions (Gaspari and Cohn 1999), and we also apply the original $R$-localization strategy of Hunt et al. (2007), in which the local observation error variances are multiplied by the inverse GC taper function, to smoothly decrease the impact of observations away from each model grid point. Analogous to Wang et al. (2021), the vertical and horizontal localization scales are coupled dynamically through the Rossby radius of deformation. We also apply the relaxation-to-prior-spread (RTPS) inflation of Whitaker and Hamill (2012) in order to prevent underdispersive analysis ensembles resulting from the finite ensemble size. In most experiments, the inflation factor is optimally tuned for LETKF, while EnSF currently sets the RTPS parameter to 1.
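The localization machinery above can be sketched with the standard fifth-order Gaspari-Cohn taper together with the $R$-localization idea of Hunt et al. (2007). The half-width and evaluation distances below are illustrative parameters, not the values used in our experiments:

```python
import numpy as np

# Fifth-order piecewise-rational Gaspari-Cohn taper (compact support 2c)
# and R-localization: the observation error variance is multiplied by
# the inverse taper value, so distant observations receive inflated
# error variances and hence less analysis weight.
def gaspari_cohn(d, c):
    """Taper value for distances d >= 0 and localization half-width c."""
    r = np.abs(d) / c
    taper = np.zeros_like(r, dtype=float)
    m1 = r <= 1.0
    m2 = (r > 1.0) & (r <= 2.0)
    taper[m1] = (-0.25 * r[m1]**5 + 0.5 * r[m1]**4 + 0.625 * r[m1]**3
                 - 5.0 / 3.0 * r[m1]**2 + 1.0)
    taper[m2] = (r[m2]**5 / 12.0 - 0.5 * r[m2]**4 + 0.625 * r[m2]**3
                 + 5.0 / 3.0 * r[m2]**2 - 5.0 * r[m2] + 4.0
                 - 2.0 / (3.0 * r[m2]))
    return taper

d = np.array([0.0, 0.5, 1.0, 1.9, 2.5])
taper = gaspari_cohn(d, c=1.0)
# R-localization: inflate the obs error variance by the inverse taper
R_loc = np.where(taper > 0, 1.0 / np.maximum(taper, 1e-12), np.inf)
print(taper)  # 1.0 at zero distance, decaying to 0 beyond 2c
```

Observations beyond twice the half-width receive an infinite error variance and therefore no weight, which is what makes the local analyses independent and parallelizable.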

While the simultaneous use of localization and inflation can greatly improve LETKF's performance, achieving optimal analysis results necessitates careful parameter tuning, which is computationally demanding in operational contexts. Moreover, sensitivity tests are required each time changes are made to the NWP system (model resolution, type of assimilated observations, etc.). Our initial results in Section 4 emphasize how LETKF's skill is highly sensitive to suboptimal choices of localization and inflation even in the conceivably simpler case of a fully and linearly observed state. This will be contrasted with EnSF, which remains stable across all experiments without any further ensemble regularization strategies (although additional experiments not presented in the paper suggest that optimal tuning of RTPS can further improve EnSF's performance).
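For reference, the RTPS inflation of Whitaker and Hamill (2012) mentioned above amounts to rescaling the analysis perturbations so that their spread relaxes back toward the prior spread. A minimal sketch follows; the array layout and function name are illustrative, not the implementation used in our experiments.

```python
import numpy as np

def rtps_inflate(prior_ens, post_ens, alpha):
    """Relaxation-to-prior-spread inflation (Whitaker and Hamill 2012).

    Ensembles are (n_ens, n_state) arrays; alpha in [0, 1] relaxes the
    posterior spread toward the prior spread (alpha = 1 restores it fully,
    which is the setting currently used by EnSF).
    """
    prior_spread = prior_ens.std(axis=0, ddof=1)
    post_mean = post_ens.mean(axis=0)
    post_pert = post_ens - post_mean
    post_spread = post_pert.std(axis=0, ddof=1)
    # Gridpoint-wise multiplicative factor applied to analysis perturbations.
    factor = 1.0 + alpha * (prior_spread - post_spread) / post_spread
    return post_mean + factor * post_pert
```

With `alpha = 1` the inflated analysis spread matches the prior spread exactly, while the analysis mean is left untouched.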

4 Results

The analysis skill of EnSF and LETKF is examined in the context of four experimental settings of increasing complexity. Table 1 separates them according to the type of observations (linear vs. nonlinear), as this accounts for the largest filtering differences.

Linear observations                                             Nonlinear observations
EXP_L1: fully observed state                                    EXP_NL1: fully observed state
EXP_L2: fully observed state subject to unknown model errors    EXP_NL2: partially observed state subject to unknown model errors
Table 1: A summary of the four main experimental settings used to compare EnSF and LETKF in Section 4. Note that EXP_L1 contains more than one numerical test (e.g., using the SQG model at different resolutions).

4.1 Linear observations

In EXP_L1, we compare the performance of EnSF and LETKF in the case where the state of the SQG model is fully and linearly observed. Measurements are corrupted by additive Gaussian errors such that

\boldsymbol{Y}_{k}=\boldsymbol{X}_{k}+\boldsymbol{E}_{k}^{o}\quad\text{with }\boldsymbol{E}_{k}^{o}\sim\mathcal{N}(\mathbf{0},\mathbf{R}_{k}), \qquad (19)

with $\mathbf{R}_{k}=\mathbf{I}$ (i.e., an observation error standard deviation of 1 K). While this represents a fairly straightforward test for most conventional DA methods, the combination of a small ensemble size (20) and a large number of independent observations will likely cause weight collapse in the traditional (bootstrap) particle filter. This compels us to first explore the stability properties of the new EnSF method.
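The anticipated weight collapse is easy to demonstrate numerically. The sketch below uses a hypothetical 20-member ensemble with O(1) forecast errors and observations drawn from Eq. (19); the effective sample size of the bootstrap particle filter then degenerates to a single particle.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ens, n_obs = 20, 8192          # ensemble size and observation count

# Hypothetical truth and forecast ensemble, built only to expose the mechanism.
truth = rng.standard_normal(n_obs)
ens = truth + rng.standard_normal((n_ens, n_obs))   # O(1) forecast errors
y = truth + rng.standard_normal(n_obs)              # y = x + e, R = I

# Bootstrap-particle-filter log-likelihood weights.
logw = -0.5 * ((y - ens) ** 2).sum(axis=1)
w = np.exp(logw - logw.max())                       # stabilized exponentiation
w /= w.sum()

# Effective sample size: values near 1 indicate total weight collapse.
ess = 1.0 / (w ** 2).sum()
```

Because the log-weights differ across members by hundreds of nats, a single member absorbs essentially all the weight, which is exactly the degeneracy EnSF is designed to avoid.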

It was already discussed in Section 3.3 that one of the major obstacles with EnKFs is the need for careful parameter tuning. The impact of suboptimal choices for these parameters is illustrated in Figure 1, where LETKF uses the same localization and inflation parameters as Fig. 11a of Wang et al. (2021) [horizontal localization scale of $L_h = 2500$ km and RTPS $= 0.6$]. Despite the physical realism of these parameter choices, we see that the LETKF experiment quickly diverges due to changes in the model resolution ($96 \times 96$ vs. $64 \times 64$ grid points in the horizontal), assimilation period (12 h vs. 3 h) and number of observations (fully vs. partially observed state). By contrast, EnSF demonstrates stable performance throughout all 300 assimilation cycles without requiring any additional parameter tuning.

Refer to caption
Figure 1: Comparison of the analysis root mean squared errors (RMSEs) between EnSF and LETKF with a high-resolution configuration of the SQG model that uses $96 \times 96$ grid points in the horizontal. In this simulation, the SQG state is fully observed through linear observations with Gaussian errors (EXP_L1). The LETKF experiment uses the same localization and inflation parameters as Figure 11a of Wang et al. (2021) [$L_h = 2500$ km and RTPS $= 0.6$], while EnSF does not implement any localization and uses RTPS $= 1$.

To achieve a fair comparison between the two DA methods in this linear observation regime, our next step is to optimize LETKF's localization and inflation parameters. We revert back to the coarser model resolution of $64 \times 64$ horizontal grid points and see from Figure 2a that the best LETKF performance is achieved with $L_h = 2000$ km and RTPS $= 0.3$. Evidently, LETKF performs slightly better in this case. However, it is worth noting that EnSF continues to maintain stable performance despite the change in model resolution and lack of additional parameter tuning. EnSF's sensitivity to different parameter settings will be the subject of future work and is expected to further reduce the gap between the two filtering techniques in this linear regime (EXP_L1).

Refer to caption
(a)
Refer to caption
(b)
Figure 2: Same experimental setting as in Figure 1 (EXP_L1), but with the difference that the SQG model has a coarser grid made of $64 \times 64$ grid points in the horizontal. Panel (a) shows the time-averaged analysis RMSEs of LETKF as a function of localization scale ($L_h$) and inflation (RTPS), with the best tuned experiment marked with a black cross. In addition to the analysis RMSE time series of EnSF and LETKF, the black curve in panel (b) shows the RMSE values of a reference experiment (SQG) in which no observations are assimilated.
Refer to caption
(a) Truth
Refer to caption
(b) 20% noise
Refer to caption
(c) 30% noise
Refer to caption
(d) 40% noise
Refer to caption
(e) 50% noise
Figure 3: Visualizing the impacts from corrupting the true SQG state with model errors of different amplitude (relevant to EXP_L2 and EXP_NL2). Please refer to the main text for more details on how these model errors are incorporated in our DA experiments.

In the next experimental setting EXP_L2, we address the more challenging but realistic scenario involving an imperfect model. While still assuming direct measurements of all state variables, we add an a priori unknown stochastic noise $\boldsymbol{E}^{u}$ to the SQG state evolution. Assuming the noise is Gaussian, the modified stochastic-dynamic model can be written as

\boldsymbol{X}_{k}={\mathbf{f}}_{k-1}(\boldsymbol{X}_{k-1},\boldsymbol{E}^{m}_{k-1})+\boldsymbol{E}^{u}_{k}\quad\text{with }\boldsymbol{E}^{u}_{k}\sim\mathcal{N}(\mathbf{0},\mathbf{\Sigma}_{k}). \qquad (20)

The covariance matrix $\mathbf{\Sigma}_{k}$ is taken to be diagonal as we want to impose spatially uncorrelated model noise. We also assume that the unknown model errors are white in time (i.e., no temporal correlations). More specifically, $\boldsymbol{E}^{u}_{k}$ in EXP_L2 is defined as the composition of four distinct state-dependent error processes: (1) 20% chance of occurrence (in time) with 20% model noise, (2) 15% chance of occurrence with 30% model noise, (3) 10% chance of occurrence with 40% model noise, and (4) 5% chance of occurrence with 50% model noise. The purpose of introducing this variety of model errors is to simulate the effects of flow-dependent model uncertainties. For example, the rare occurrence of high-amplitude model errors in the fourth process might be associated with the rapid development of dynamical instabilities in the $\theta$ field. The study of Held et al. (1995) shows examples of such instabilities and discusses how they tend to form along temperature filaments (see their Fig. 2).
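One possible realization of this composite error process is sketched below. Two details are our assumptions, made for illustration only: that the four processes fire independently at each cycle, and that "X% model noise" means Gaussian noise with a standard deviation of X% of the local state amplitude.

```python
import numpy as np

rng = np.random.default_rng(0)

# (chance of occurrence in time, relative noise amplitude) for each process.
ERROR_PROCESSES = [(0.20, 0.20), (0.15, 0.30), (0.10, 0.40), (0.05, 0.50)]

def add_model_shocks(x, processes=ERROR_PROCESSES):
    """Illustrative composite model error E^u_k for EXP_L2.

    Each process fires independently (white in time); when active, it adds
    spatially uncorrelated Gaussian noise scaled by the state amplitude.
    """
    x_shocked = x.copy()
    for p_occ, amp in processes:
        if rng.random() < p_occ:                 # occurrence in time
            sigma = amp * np.abs(x)              # state-dependent amplitude
            x_shocked += sigma * rng.standard_normal(x.shape)
    return x_shocked
```

Because the noise amplitude scales with the state itself, larger shocks concentrate where the field is most active, mimicking the flow-dependent uncertainties described above.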

It is crucial to highlight that although we specified a particular structure for the time-dependent error process $\boldsymbol{E}^{u}_{k}$, obtaining an explicit expression for it is impractical in reality. Consequently, adjusting the LETKF inflation and localization parameters to account for the unknown model errors is not feasible. In Figure 3, we visually illustrate the impact of adding model errors of various amplitudes to the true state at the final time.

Refer to caption
Figure 4: Same as Figure 2, except that the presented comparison refers to the imperfect model setting EXP_L2. Note that the LETKF experiment (red curve) uses the optimal localization and inflation values from Figure 2a.

A comparison between the analysis RMSEs of EnSF and LETKF in EXP_L2 is presented in Figure 4. Unlike our previous experiment, LETKF diverges from the true state as model errors accumulate in time, with analysis RMSEs now comparable to the free forecast run where no observations are assimilated. This finding offers additional evidence for the enhanced sensitivity of LETKF to model imperfections: without the implementation of more advanced regularization techniques (e.g., adaptive inflation), LETKF will be susceptible to analysis errors even when nearly optimal parameters have been used and the system is fully and linearly observed. By contrast, EnSF achieves stable performance throughout all analysis cycles despite the lack of any further tuning. A snapshot of these differences at the last analysis time is shown in Figure 5. The top three panels offer a visual confirmation that EnSF's analysis mean (panel b) is much closer to the truth (panel a). While LETKF (panel c) does well at representing large-scale patterns, it struggles to capture some of the small-scale features and extreme values in the potential temperature field. The larger error magnitude in LETKF is also confirmed by the two error plots in the second row.

Refer to caption
(a) Truth
Refer to caption
(b) EnSF analysis
Refer to caption
(c) LETKF analysis
Refer to caption
(d) EnSF error
Refer to caption
(e) LETKF error
Figure 5: The top row shows the analysis ensemble means from EnSF and LETKF under scenario EXP_L2 and with respect to the true potential temperature field at the final observation time. The analysis mean errors of the two experiments are displayed on the bottom row.

4.2 Nonlinear observations

In the second round of experiments, the synthetically generated observations are nonlinear functions of the SQG state. Specifically, we consider an arctangent nonlinearity and additive Gaussian observation errors. The observation model relevant to EXP_NL1 from Table 1 can be written as

\boldsymbol{Y}_{k}=\arctan(\boldsymbol{X}_{k})+\tilde{\boldsymbol{E}}^{o}_{k}\quad\text{with }\tilde{\boldsymbol{E}}^{o}_{k}\sim\mathcal{N}(\mathbf{0},\tilde{\mathbf{R}}_{k}).

Notice that the observation error variance is scaled relative to the linear observation experiments in order to reflect the narrower range of the arctangent function. In this case, we choose $\tilde{\mathbf{R}}_{k} = 0.01 \times \mathbf{R}_{k} = 0.01 \times \mathbf{I}$. Moreover, EXP_NL1 tests are carried out in the absence of unpredictable model shocks.
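For concreteness, the EXP_NL1 observation model can be simulated as follows (the function name and random seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def observe_arctan(x_true, obs_err_std=0.1):
    """Nonlinear observation model of EXP_NL1:
    y_k = arctan(x_k) + e_k, with e_k ~ N(0, 0.01 I), i.e., error std 0.1."""
    return np.arctan(x_true) + obs_err_std * rng.standard_normal(x_true.shape)
```

The arctangent squashes the state into the interval $(-\pi/2, \pi/2)$ regardless of its true amplitude, which is why the observation error variance must be reduced by two orders of magnitude relative to the linear case.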

Analogous to the linear observation setting, stabilizing LETKF's performance requires extensive parameter tuning. Figure 6a indicates the existence of a very narrow range of optimal localization and inflation parameters for which LETKF does not diverge – a manifestation of the strong deviations from Gaussianity in the posterior distribution. The red curve in Figure 6b confirms that this best tuned LETKF maintains stability during the entire experiment. On the other hand, a naive substitution of the optimal LETKF parameters from the linear observation setting results in a rapidly diverging filter (magenta curve in Figure 6b). While this comparison admittedly constitutes an extreme modification of the observing system, it reaffirms our earlier point that any changes in operational NWP systems require costly calibrations of the underlying EnKF algorithm. By contrast, EnSF systematically outperforms the best LETKF configuration without further application of ensemble regularization strategies.

Refer to caption
(a)
Refer to caption
(b)
Figure 6: Filtering performance in the case of a fully observed state with an arctangent nonlinearity (EXP_NL1). The format follows Figure 2, with the difference that the magenta curve shows a reference LETKF experiment whose localization and inflation parameters are optimally tuned for a linear observing system.

Our last experimental setting EXP_NL2 considers the most complex DA scenario – only half of the SQG state is observed nonlinearly such that

\boldsymbol{Y}_{k}={\mathbf{H}}_{k}\arctan(\boldsymbol{X}_{k})+\tilde{\boldsymbol{E}}^{o}_{k}\quad\text{with }\tilde{\boldsymbol{E}}^{o}_{k}\sim\mathcal{N}(\mathbf{0},\tilde{\mathbf{R}}_{k}),

where $\mathbf{H}_{k}$ is a selection matrix determining which state variables are observed at a given time level $k$. Similar to Wang et al. (2021), we implement a procedure which randomly selects the observed model grid points for each $k$. An example realization of the nonlinear observations is presented in Figure 7b. Comparison with the true field in Figure 7a serves as a good illustration of the squashing effect of the arctangent nonlinearity, and the general difficulty of extracting useful information from observations with a severely limited numerical range.
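The random selection of observed grid points can be sketched as below. The exact sampling procedure used in our experiments may differ in detail (e.g., seeding and whether the subset is redrawn every cycle), so this is illustrative only; acting on a flattened state vector, the index array plays the role of the rows of $\mathbf{H}_k$.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_selection(n_state, coverage=0.5):
    """Draw the observed grid points for one cycle (EXP_NL2): a random
    subset of the flattened state, encoded as row indices of H_k."""
    n_obs = int(coverage * n_state)
    return np.sort(rng.choice(n_state, size=n_obs, replace=False))

def apply_obs_operator(x, idx, obs_err_std=0.1):
    """y_k = H_k arctan(x_k) + e_k for the selected indices."""
    return np.arctan(x[idx]) + obs_err_std * rng.standard_normal(idx.size)
```

Storing only the indices avoids building the sparse $\mathbf{H}_k$ matrix explicitly, which matters once the state dimension grows to thousands of components.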

We also work under the imperfect model assumption described by Eq. (20), but choose a simpler definition for the unknown model errors 𝑬ku\boldsymbol{E}^{u}_{k}. In particular, a single stochastic noise is added 10% of the time with a magnitude that equals 30% of the nature run’s values.

Refer to caption
(a)
Refer to caption
(b)
Figure 7: (a) The original SQG state prior to the addition of model errors. (b) A snapshot from a partial observing network with a 50% coverage showing the loss of information (cf. limited range of values) due to the application of an arctangent nonlinearity.

Under the challenging DA setting of EXP_NL2, we find that EnSF’s analysis RMSEs remain nearly intact (compare blue curves in Figures 6b and 8b), while LETKF (Figure 8a) exhibits a visible skill deterioration (compare red curves in Figures 6b and 8b) despite our efforts to fine tune its performance for this case.

Refer to caption
(a)
Refer to caption
(b)
Figure 8: Same as Figure 6 but the analysis RMSEs refer to EXP_NL2, which contains a partially observed state that is additionally subject to imperfect model assumptions.

Given the multiscale characteristics of the SQG model, we conclude this section by presenting a spectral comparison of the EnSF and LETKF results in Figure 9. The plotted diagnostics still refer to the EXP_NL2 setting (partially and nonlinearly observed state corrupted by unpredictable model errors) but are now averaged in time. Examination of the solid red and blue curves in panel (a) indicates that EnSF has consistently lower analysis errors for all wavenumbers, but the most significant improvements occur at large scales (also refer to panel b). Focusing our attention on the LETKF results (red curves), we notice a systematic underdispersion of the analysis ensemble, which is most pronounced at large scales (panel c). This is perhaps one of the important reasons why a large number of LETKF experiments diverge in EXP_NL2. A different situation emerges for EnSF (blue curves), which tends to be overdispersive on average (panel c). This behavior is connected to the specific inflation settings used in EnSF (RTPS $= 1$) but is likely also influenced by our choice of SDE and score parameters such as $b(t)$, $\sigma(t)$ and $h(t)$. We hypothesize that an optimal selection of these parameters will further improve EnSF's performance and lead to a better spread-error consistency. Nevertheless, it is clear that even this “vanilla” implementation of EnSF offers significant performance benefits over LETKF.
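The diagnostics of Figure 9 can be approximated with a simple FFT-based sketch. The paper's KE density carries additional physical scaling from the $\theta$ field, so the raw-power normalization below is an assumption; the binning by total wavenumber and the perturbation-to-error ratio of panel (c), however, follow the description in the figure caption.

```python
import numpy as np

def radial_spectrum(field):
    """Isotropic power spectrum of a doubly periodic 2-D field,
    binned by (rounded) integer total wavenumber."""
    ny, nx = field.shape
    power = np.abs(np.fft.fft2(field)) ** 2 / (nx * ny) ** 2
    kx = np.fft.fftfreq(nx) * nx
    ky = np.fft.fftfreq(ny) * ny
    k = np.hypot(*np.meshgrid(kx, ky)).round().astype(int)
    return np.bincount(k.ravel(), weights=power.ravel())

def consistency_ratio(truth, analysis_ens):
    """Ratio of analysis-perturbation to analysis-mean-error spectra,
    as in Figure 9c; values near 1 indicate spread-error balance."""
    mean = analysis_ens.mean(axis=0)
    err_spec = radial_spectrum(mean - truth)
    pert_spec = np.mean([radial_spectrum(m - mean) for m in analysis_ens],
                        axis=0)
    return pert_spec / np.maximum(err_spec, 1e-30)
```

With this normalization the spectrum satisfies Parseval's relation, i.e., it sums to the mean-square amplitude of the field, which makes the error and perturbation curves directly comparable across wavenumbers.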

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Figure 9: Spectral diagnostics associated with EXP_NL2. (a) Time-averaged kinetic energy (KE) density of the analysis mean errors (solid curves) and analysis perturbations (dashed curves) for EnSF (blue) and LETKF (red). (b) Percent improvement of EnSF's analysis mean errors as a function of wavenumber. (c) Analysis consistency ratios for the two ensemble filters. Note that the values are obtained by dividing the dashed curves by the solid curves in panel (a).

5 Conclusions

In this study, we introduced a stable and highly efficient implementation of the Ensemble Score Filter (EnSF) for sequential data assimilation with geophysical systems. The theoretical basis for EnSF comes from diffusion models – an extremely popular class of generative AI (GenAI) methods which have the ability to produce highly realistic images and videos. Like other diffusion-based techniques, EnSF uses score functions (the gradient of the log probability) to store complex information about the underlying filtering (posterior) distribution. However, sampling from the posterior utilizes a training-free, Monte Carlo procedure which only requires access to the forecast ensemble members.

The resulting algorithm is applied to the surface quasi-geostrophic (SQG) model and compared against a benchmark Local Ensemble Transform Kalman Filter (LETKF). The analysis performance of both methods is explored in a hierarchy of experiments with increasing complexity. While they demonstrate comparable skill in the case of a fully and linearly observed state, EnSF performs significantly better when observations are partial and nonlinear, as well as when the model is subject to unexpected errors. We report stable EnSF performance in all experiments despite the lack of localization and additional parameter tuning. Even though we only use 20 ensemble members in our experiments, the results suggest that EnSF can reliably capture the non-Gaussian characteristics of a high-dimensional state with 8,192 components. Although LETKF is competitive or even slightly more skillful in the linear observation case, the results are highly sensitive to the choice of inflation and localization parameters; this effect is even more pronounced with nonlinear observations when slight changes in the LETKF settings often lead to rapid filter divergence.

The EnSF findings outlined in this study are very encouraging and suggest a few possible avenues for further development. An immediate extension of this work will be to explore the sensitivity of EnSF to different types of observing networks. Moreover, we believe the algorithm will benefit from additional theoretical refinements. One ongoing line of research from the authors is to derive an optimal damping function $h(t)$. Given the important role of $h(t)$ in determining how observations are incorporated in the diffusion-based filtering procedure (cf. Eq. 12), we expect to see further improvements in EnSF's analysis performance. Although localization was not required to prevent filter divergence in the SQG context, more work is still needed to understand how sampling errors affect EnSF in higher dimensions. Together with a suitable choice for $h(t)$, an optimally designed localization scheme is expected to make EnSF comparable to a fine-tuned LETKF in linear-Gaussian regimes.

5.0.1 Acknowledgements.

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under the contract ERKJ387 at the Oak Ridge National Laboratory, which is operated by UT-Battelle, LLC, for the U.S. Department of Energy under Contract DE-AC05-00OR22725. The first author (FB) would also like to acknowledge support from the U.S. National Science Foundation through project DMS-2142672 and the support from the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Grant DE-SC0022297. The computing for this project was performed on the high performance computing (HPC) cluster at the Florida State University Research Computing Center.

5.0.2 Data availability statement.

All code written as part of this study will be made available on GitHub upon completing the peer-review process for this article. An open-source version of the SQG model used in our experiments can be found at https://github.com/jswhit/sqgturb.

References

  • Aksoy et al. (2009) Aksoy, A., D. Dowell, and C. Snyder, 2009: A multicase comparative assessment of the ensemble Kalman filter for assimilation of radar observations. Part I: Storm-scale analyses. Mon. Wea. Rev., 137, 1805–1824, doi:10.1175/2008MWR2691.1.
  • Aksoy et al. (2010) Aksoy, A., D. Dowell, and C. Snyder, 2010: A multicase comparative assessment of the ensemble Kalman filter for assimilation of radar observations. Part II: Short-range ensemble forecasts. Mon. Wea. Rev., 138, 1273–1292, doi:10.1175/2009MWR3086.1.
  • Anderson (2001) Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903, doi:10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2.
  • Anderson (2022) Anderson, J. L., 2022: A quantile-conserving ensemble filter framework. Part I: Updating an observed variable. Mon. Wea. Rev., 150, 1061–1074, doi:10.1175/MWR-D-21-0229.1.
  • Anderson (2023) Anderson, J. L., 2023: A quantile-conserving ensemble filter framework. Part II: Regression of observation increments in a probit and probability integral transformed space. Mon. Wea. Rev., 151, 2759–2777, doi:10.1175/MWR-D-23-0065.1.
  • Bao et al. (2016) Bao, F., Y. Cao, A. Meir, and W. Zhao, 2016: A first order scheme for backward doubly stochastic differential equations. SIAM/ASA J. Uncertain. Quantif., 4 (1), 413–445.
  • Bao et al. (2023a) Bao, F., Z. Zhang, and G. Zhang, 2023a: An ensemble score filter for tracking high-dimensional nonlinear dynamical systems. arXiv, 1–17, doi:arXiv:2309.00983.
  • Bao et al. (2023b) Bao, F., Z. Zhang, and G. Zhang, 2023b: A score-based nonlinear filter for data assimilation. arXiv, 1–20, doi:arXiv:2306.09282.
  • Bao et al. (2023c) Bao, F., Z. Zhang, and G. Zhang, 2023c: A unified filter method for jointly estimating state and parameters of stochastic dynamical systems via the ensemble score filter. arXiv, 1–24, doi:arXiv:2312.10503.
  • Baranchuk et al. (2022) Baranchuk, D., A. Voynov, I. Rubachev, V. Khrulkov, and A. Babenko, 2022: Label-efficient semantic segmentation with diffusion models. International Conference on Learning Representations, URL https://openreview.net/forum?id=SlxSY2UZQT.
  • Bishop et al. (2001) Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436, doi:10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2.
  • Chan et al. (2020) Chan, M.-Y., J. Anderson, and X. Chen, 2020: An efficient bi-Gaussian ensemble Kalman filter for satellite infrared radiance data assimilation. Mon. Wea. Rev., 148, 5087–5104, doi:10.1175/MWR-D-20-0142.1.
  • Chipilski (2023) Chipilski, H. G., 2023: Exact nonlinear state estimation. arXiv, 1–31, doi:10.48550/arXiv.2310.10976.
  • Chipilski et al. (2020) Chipilski, H. G., X. Wang, and D. B. Parsons, 2020: Impact of assimilating PECAN profilers on the prediction of bore-driven nocturnal convection: A multiscale forecast evaluation for the 6 July 2015 case study. Mon. Wea. Rev., 148, 1147–1175, doi:10.1175/MWR-D-19-0171.1.
  • Chipilski et al. (2022) Chipilski, H. G., X. Wang, D. B. Parsons, A. Johnson, and S. K. Degelia, 2022: The value of assimilating different ground-based profiling networks on the forecasts of bore-generating nocturnal convection. Mon. Wea. Rev., 150, 1273–1292, doi:10.1175/MWR-D-21-0193.1.
  • Crisan and Doucet (2002) Crisan, D., and A. Doucet, 2002: A survey of convergence results on particle filtering methods for practitioners. IEEE Transactions on Signal Processing, 50, 736–746, doi:10.1109/78.984773.
  • Durran and Gingrich (2014) Durran, D. R., and M. Gingrich, 2014: Atmospheric predictability: why butterflies are not of practical importance. Journal of the Atmospheric Sciences, 71, 2476–2488, doi:10.1175/JAS-D-14-0007.1.
  • Evensen (1994) Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, 10 143–10 162, doi:10.1029/94JC00572.
  • Fletcher (2010) Fletcher, S. J., 2010: Mixed Gaussian-lognormal four-dimensional data assimilation. Tellus A: Dynamic Meteorology and Oceanography, 62, 266–287, doi:10.1111/j.1600-0870.2009.00439.x.
  • Fletcher et al. (2023) Fletcher, S. J., and Coauthors, 2023: Lognormal and mixed Gaussian-lognormal Kalman filters. Mon. Wea. Rev., 151, 761–774, doi:10.1175/MWR-D-22-0072.1.
  • Frolov et al. (2024) Frolov, S., and Coauthors, 2024: Local volume solvers for Earth system data assimilation: Implementation in the framework for Joint Effort for Data Assimilation Integration. Journal of Advances in Modeling Earth Systems, 16, e2023MS003692, doi:10.1029/2023MS003692.
  • Gaspari and Cohn (1999) Gaspari, G., and S. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757, doi:10.1002/qj.49712555417.
  • Gordon et al. (1993) Gordon, N. J., D. J. Salmond, and A. F. M. Smith, 1993: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. Proc. Inst. Elect. Eng. F, 140, 107–113.
  • Held et al. (1995) Held, I. M., R. T. Pierrehumbert, S. T. Garner, and K. L. Swanson, 1995: Surface quasi-geostrophic dynamics. J. Fluid Mech., 282, 1–20, doi:10.1017/S0022112095000012.
  • Hu and van Leeuwen (2021) Hu, C.-C., and P. J. van Leeuwen, 2021: A particle flow filter for high-dimensional system applications. Quart. J. Roy. Meteor. Soc., 147, 2352–2374, doi:10.1002/qj.4028.
  • Hu et al. (2023) Hu, G., S. L. Dance, R. N. Bannister, H. G. Chipilski, O. Guillet, B. Macpherson, M. Weissmann, and N. Yussouf, 2023: Progress, challenges, and future steps in data assimilation for convection-permitting numerical weather prediction: Report on the virtual meeting held on 10 and 12 November 2021. Atmos. Sci. Let., 24, e1130, doi:10.1002/asl.1130.
  • Hunt et al. (2007) Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230, 112–126, doi:10.1016/j.physd.2006.11.008.
  • Isaksen et al. (2010) Isaksen, L., M. Bonavita, R. Buizza, M. Fisher, J. Haseler, M. Leutbecher, and L. Raynaud, 2010: Ensemble of data assimilations at ECMWF. ECMWF Technical Memoranda, 636, 1–41.
  • Jones et al. (2010) Jones, T. A., K. Knopfmeier, D. Wheatley, G. Creager, P. Minnis, and R. Palikonda, 2010: Storm-scale data assimilation and ensemble forecasting with the NSSL experimental Warn-on-Forecast system. Part II: Combined radar and satellite data experiments. Wea. Forecasting, 30, 1795–1817, doi:10.1175/WAF-D-15-0043.1.
  • Kloeden and Platen (1992) Kloeden, P. E., and E. Platen, 1992: Numerical solution of stochastic differential equations, Applications of Mathematics (New York), Vol. 23. Springer-Verlag, Berlin, xxxvi+632 pp.
  • Luo and Hu (2021) Luo, S., and W. Hu, 2021: Score-based point cloud denoising. 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, IEEE, 4563–4572, doi:10.1109/ICCV48922.2021.00454, URL https://doi.org/10.1109/ICCV48922.2021.00454.
  • Milstein (1975) Milstein, G. N., 1975: Approximate integration of stochastic differential equations. Theory of probability & its applications, 19, 557–562, doi:10.1137/1119062.
  • Poterjoy (2022) Poterjoy, J., 2022: Implications of multivariate non-Gaussian data assimilation for multi-scale weather prediction. Mon. Wea. Rev., 150, 1475–1493, doi:10.1175/MWR-D-21-0228.1.
  • Poterjoy et al. (2017) Poterjoy, J., R. A. Sobash, and J. L. Anderson, 2017: Convective-scale data assimilation for the Weather Research and Forecasting model using the local particle filter. Mon. Wea. Rev., 145, 1897–1918, doi:10.1175/MWR-D-16-0298.1.
  • Pulido and van Leeuwen (2019) Pulido, M., and P. J. van Leeuwen, 2019: Sequential Monte Carlo with kernel embedded mappings: The mapping particle filter. J. Comp. Phys., 396, 400–415, doi:10.1016/j.jcp.2019.06.060.
  • Rojahn et al. (2023) Rojahn, A., N. Schenk, P. J. van Leeuwen, and R. Potthast, 2023: Particle filtering and Gaussian mixtures - on a localized mixture coefficients particle filter (LMCPF) for global NWP. J. Meteor. Soc. Japan, 101, 233–253, doi:10.2151/jmsj.2023-015.
  • Rotunno and Snyder (2008) Rotunno, R., and C. Snyder, 2008: A generalization of Lorenz’s model for the predictability of flows with many scales of motion. J. Atmos. Sci., 65, 1063–1076, doi:10.1175/2007JAS2449.1.
  • Schraff et al. (2016) Schraff, C., H. Reich, A. Rhodin, A. Schomburg, K. Stephan, A. Periáñez, and R. Potthast, 2016: Kilometre-scale ensemble data assimilation for the COSMO model (KENDA). Quart. J. Roy. Meteor. Soc., 142, 1453–1472, doi:10.1002/qj.2748.
  • Smith et al. (2023) Smith, T. A., S. G. Penny, J. A. Platt, and T.-C. Chen, 2023: Temporal subsampling diminishes small spatial scales in recurrent neural network emulators of geophysical turbulence. Journal of Advances in Modeling Earth Systems, 15, e2023MS003792, doi:10.1029/2023MS003792.
  • Song et al. (2021) Song, Y., J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, 2021: Score-based generative modeling through stochastic differential equations. International Conference on Learning Representations, URL https://openreview.net/forum?id=PxTIG12RRHS.
  • Spantini et al. (2022) Spantini, A., R. Baptista, and Y. Marzouk, 2022: Coupling techniques for nonlinear ensemble filtering. SIAM Review, 144, 921–953, doi:10.1137/20M1312204.
  • Tödter et al. (2016) Tödter, J., P. Kirchgessner, L. Nerger, and B. Ahrens, 2016: Assessment of a nonlinear ensemble transform filter for high-dimensional data assimilation. Mon. Wea. Rev., 144, 409–427, doi:10.1175/MWR-D-15-0073.1.
  • Tulloch and Smith (2009a) Tulloch, R., and K. S. Smith, 2009a: A note on the numerical presentation of surface dynamics in quasigeostrophic turbulence. J. Atmos. Sci., 66, 1063–1068, doi:10.1175/2008JAS2921.1.
  • Tulloch and Smith (2009b) Tulloch, R., and K. S. Smith, 2009b: Quasigeostrophic turbulence with explicit surface dynamics: Application to the atmospheric energy spectrum. J. Atmos. Sci., 66, 450–467, doi:10.1175/2008JAS2653.1.
  • van Leeuwen (2009) van Leeuwen, P. J., 2009: Particle filtering in geophysical systems. Mon. Wea. Rev., 137, 4089–4114, doi:10.1175/2009MWR2835.1.
  • van Leeuwen et al. (2019) van Leeuwen, P. J., H. R. Künsch, L. Nerger, R. Potthast, and S. Reich, 2019: Particle filters for high-dimensional geoscience applications: a review. Quart. J. Roy. Meteor. Soc., 145, 2335–2365, doi:10.1002/qj.3551.
  • Wang et al. (2021) Wang, X., H. G. Chipilski, C. H. Bishop, E. Satterfield, N. Baker, and J. S. Whitaker, 2021: A multiscale local gain form ensemble transform kalman filter (MLGETKF). Mon. Wea. Rev., 149, 605–622, doi:10.1175/MWR-D-20-0290.1.
  • Whitaker and Hamill (2012) Whitaker, J. S., and T. Hamill, 2012: Evaluating methods to account for system errors in ensemble data assimilation. Mon. Wea. Rev., 140, 3078–3089, doi:10.1175/MWR-D-11-00276.1.