Robust Bayesian methods using amortized simulation-based inference

Wang Yuyan (Department of Statistics and Data Science, National University of Singapore), Michael Evans (Department of Statistics, University of Toronto, Toronto, ON M5S 3G3, Canada), and David J. Nott (Department of Statistics and Data Science, National University of Singapore, and Institute of Operations Research and Analytics, National University of Singapore; corresponding author: standj@nus.edu.sg)
Abstract

Bayesian simulation-based inference (SBI) methods are used in statistical models where simulation is feasible but the likelihood is intractable. Standard SBI methods can perform poorly in cases of model misspecification, and there has been much recent work on modified SBI approaches which are robust to misspecified likelihoods. However, less attention has been given to the issue of inappropriate prior specification, which is the focus of this work. In conventional Bayesian modelling, there will often be a wide range of prior distributions consistent with limited prior knowledge expressed by an expert. Choosing a single prior can lead to an inappropriate choice, possibly conflicting with the likelihood information. Robust Bayesian methods, where a class of priors is considered instead of a single prior, can address this issue. For each density in the prior class, a posterior can be computed, and the range of the resulting inferences is informative about posterior sensitivity to the prior imprecision. We consider density ratio classes for the prior and implement robust Bayesian SBI using amortized neural methods developed recently in the literature. We also discuss methods for checking for conflict between a density ratio class of priors and the likelihood, and sequential updating methods for examining conflict between different groups of summary statistics. The methods are illustrated for several simulated and real examples.

Keywords: Amortized inference; Density ratio class; Misspecification; Robust Bayes.

1 Introduction

There are many interesting statistical models where the likelihood is intractable. If simulation of synthetic data from the model is feasible, it may still be possible to perform Bayesian inference. The field of simulation-based inference (SBI) deals with such models. This paper develops robust Bayesian methods for SBI building on recent amortized neural SBI methods in the literature. Robust Bayesian approaches consider Bayesian updating for every prior in some class rather than a single prior. The range of the resulting posterior inferences is informative about posterior sensitivity to the prior ambiguity. Computation of robust Bayesian inferences is challenging, but one tractable approach uses so-called density ratio classes. Our work considers robust Bayesian methods implemented with amortized SBI methods for density ratio classes. We also develop methods for checking for conflict between a density ratio prior class and the likelihood, and methods for checking for conflict between subsets of data summary statistics.

There are many different methods for simulation-based inference. One well-established approach is approximate Bayesian computation (ABC) (Tavaré et al., 1997; Sisson et al., 2018), which in its simplest form repeatedly simulates from the joint Bayesian model for parameters and data, and accumulates parameter samples for which the synthetic data is close enough to the observed data for some distance and tolerance. To make computation easier, the distance used in ABC is usually defined from low-dimensional summary statistics. Another common SBI method is synthetic likelihood (Wood, 2010; Price et al., 2018; Frazier et al., 2023). This approach approximates the likelihood by assuming that the distribution of the data summary statistics is Gaussian, and estimates the summary statistic means and covariances using model simulation. The problem of estimating the posterior density from simulated data can also be treated as one of flexible conditional density estimation, with neural posterior estimation (NPE) (Papamakarios and Murray, 2016; Lueckmann et al., 2017; Greenberg et al., 2019; Radev et al., 2022) being the most popular example of this approach. Related neural likelihood estimation (NLE) methods approximate the likelihood instead of the posterior (Papamakarios et al., 2021; Lueckmann et al., 2019), and approximations of likelihood ratios can also be developed using flexible classifiers (Hermans et al., 2020; Thomas et al., 2022). Some recent approaches approximate the posterior and likelihood simultaneously (Wiqvist et al., 2021; Glöckler et al., 2022; Radev et al., 2023). A recent discussion of theoretical aspects of both NPE and NLE methods is given by Frazier et al. (2024), but the existing theory covers only the case of a correctly specified model.

Much recent SBI research has focused on the effects of misspecification on standard SBI methods. However, the existing work mostly considers situations in which the likelihood is misspecified. A brief discussion of different approaches is given in Appendix A, and Kelly et al. (2025) give a comprehensive recent review. Here we concentrate on neural conditional density estimation approaches and the issue of avoiding inappropriate choices of the prior, and understanding the sensitivity of posterior inferences to ambiguity when it is difficult to specify a single prior. To the best of our knowledge, there is no existing literature on robust Bayesian methods in this sense for models with intractable likelihood. However, there is some work on checking for prior-data conflict in the conventional Bayesian setting of SBI with a single prior (Chakraborty et al., 2023), and other authors have recognized the distinction between prior-data conflicts and misspecification of the likelihood (Schmitt et al., 2024). There is existing work on prior-data conflict and imprecise probability (e.g., Walter and Coolen, 2016, among others), but to the best of our knowledge not in the setting of density ratio classes specifically or for intractable likelihood. Further discussion is given in Section 5. Our work makes three main contributions. The first is to implement robust Bayes methods based on density ratio prior classes using amortized Bayesian inference methods for SBI. The second is to give methods for checking for conflict between the likelihood and a density ratio prior class. Related to these checks, our third contribution considers sequential updating for density ratio classes to check for conflicts between subsets of summary statistics.

In the next section we give some necessary background on robust Bayesian methods using density ratio classes. Section 3 discusses amortized neural methods for SBI. Section 4 discusses how to use the methods in Section 3 to implement robust Bayesian methods in models with intractable likelihood, and Section 5 discusses conflict checking for density ratio classes and checking for conflicts between subsets of summary statistics. Section 6 discusses some real and simulated examples and Section 7 gives some concluding discussion.

2 Robust Bayes methods using density ratio classes

When a single prior is specified in a Bayesian analysis, some of its characteristics will be chosen in an arbitrary way, since in practice the prior information does not determine the prior uniquely. Expert knowledge can also be flawed, resulting in conflict between prior and likelihood information. This can compromise sound Bayesian inference, as conflicting information should not be combined thoughtlessly. Prior-data conflict is a form of model misspecification that is distinct from any problem with the specification of the likelihood; see Evans and Moshonov (2006) and Nott et al. (2020) for further discussion.

One way to avoid specification of a single prior when that is difficult is to take a robust Bayesian approach (Walley, 1991; Berger, 1994). In robust Bayes a single prior is replaced by a set of priors. For each prior in the chosen set, we can compute a posterior distribution. An important question is how the class of priors should be defined, so that it is expressive of elicitation uncertainty but computationally tractable. Here we will use the so-called density ratio class (also called intervals of measures in the original paper discussing this class by DeRobertis and Hartigan (1981)). Useful overviews of this approach with comparisons to other prior classes are given in Berger (1990) and Rinderknecht et al. (2014), and elicitation is discussed in Rinderknecht et al. (2011). Bayesian updating and marginalization of a density ratio class lead to another density ratio class. Wasserman (1992) proved that closure under Bayesian updating and marginalization characterizes the density ratio class. Rinderknecht et al. (2014) discuss predictive inference for both deterministic and stochastic models. The rest of this section defines density ratio classes and gives a summary of their properties.

2.1 Definition of the density ratio class

To make our discussion easier we define some notation. For a model with parameter $\theta\in\Theta$ for data $y$, let $0\leq l(\theta)\leq u(\theta)$ be two functions, which we call lower and upper bound functions, and assume that

$$\int l(\theta)\,d\theta>0 \quad\text{and}\quad \int u(\theta)\,d\theta<\infty.$$

Writing $\pi(\theta)$ for a possibly unnormalized density (i.e. a density that does not integrate to one), we follow Rinderknecht et al. (2014) and write $\hat{\pi}(\theta)$ for the normalized version of $\pi(\theta)$ when this exists. The density ratio class with lower bound $l(\theta)$ and upper bound $u(\theta)$ is

$$\psi_{l,u}:=\left\{\hat{\pi}(\theta)=\frac{\pi(\theta)}{\int\pi(\theta)\,d\theta};\ l(\theta)\leq\pi(\theta)\leq u(\theta)\right\}. \quad (1)$$

The functions $l(\theta)$ and $u(\theta)$ are bounds on the shape of a density in $\psi_{l,u}$. If an unnormalized density can fit between the bounds, its normalized version is in $\psi_{l,u}$. In the case where $l(\theta)>0$ for all $\theta\in\Theta$, an equivalent definition of $\psi_{l,u}$ is

$$\psi_{l,u}=\left\{\hat{\pi}(\theta):\frac{l(\theta)}{u(\theta')}\leq\frac{\pi(\theta)}{\pi(\theta')}\leq\frac{u(\theta)}{l(\theta')}\text{ for all }\theta,\theta'\in\Theta\right\}. \quad (2)$$

In (2) the left-most inequality is equivalent to the right-most one by inverting ratios. However, including the redundancy makes the implications of the definition clearer. The definition (2) explains the name “density ratio class” first used in Berger (1990). The equivalence of (1) and (2) is demonstrated in Appendix B.
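To make the definition concrete, the following minimal Python sketch (an illustration we add here, not an implementation from the paper) checks numerically whether a candidate density lies in $\psi_{l,u}$: by (1), a normalized density $\hat{\pi}$ is in the class exactly when some scaling $c>0$ gives $l(\theta)\leq c\,\hat{\pi}(\theta)\leq u(\theta)$ for all $\theta$, i.e. when $\max_\theta l/\hat{\pi}\leq\min_\theta u/\hat{\pi}$.

```python
# Illustrative sketch: check membership of a candidate density in psi_{l,u}
# on a grid. By (1), a normalized pi_hat is in the class exactly when some
# c > 0 gives l <= c * pi_hat <= u, i.e. max(l/pi_hat) <= min(u/pi_hat).
# The lognormal bounds are those of the Poisson example in Section 6.2.
import numpy as np
from scipy import stats

theta = np.linspace(0.05, 10.0, 2000)
u = stats.lognorm.pdf(theta, s=0.5, scale=np.exp(0.5))          # u = u_hat
l = stats.lognorm.pdf(theta, s=0.25, scale=np.exp(0.3125)) / 3  # l = (1/3) l_hat

def in_class(pi_hat):
    c_min = np.max(l / pi_hat)   # smallest scaling clearing the lower bound
    c_max = np.min(u / pi_hat)   # largest scaling staying under the upper bound
    return c_min <= c_max

candidate = stats.lognorm.pdf(theta, s=0.4, scale=np.exp(0.4))
print(in_class(candidate))       # True: this shape fits between the bounds
```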

From (1), it is immediate that $\psi_{l,u}=\psi_{kl,ku}$ for any constant $k>0$. Hence we could take either the lower or upper bound function to be a normalized density. If we take $u(\theta)=\hat{u}(\theta)$, and write

$$r=\frac{\int u(\theta)\,d\theta}{\int l(\theta)\,d\theta}, \quad (3)$$

then

$$\psi_{l,u}=\psi_{r^{-1}\hat{l},\hat{u}}. \quad (4)$$

In visualizing density ratio classes later, we will normalize the upper bound function.

2.2 Lower and upper probabilities

A density ratio class implies a range of probabilities for any event. Suppose we are interested in the event $E\subseteq\Theta$, and for some density ratio class $\psi_{l,u}$ we want the lower and upper probabilities $\underline{P}(E)$, $\overline{P}(E)$ for $E$, defined by

$$(\underline{P}(E),\overline{P}(E)):=\left(\inf_{\hat{\pi}(\theta)\in\psi_{l,u}}\int_{E}\hat{\pi}(\theta)\,d\theta,\ \sup_{\hat{\pi}(\theta)\in\psi_{l,u}}\int_{E}\hat{\pi}(\theta)\,d\theta\right).$$

It is easily shown (e.g. Rinderknecht et al., 2014, Section 2.1) that

$$(\underline{P}(E),\overline{P}(E))=\left(\frac{\int_{E}l(\theta)\,d\theta}{\int_{E}l(\theta)\,d\theta+\int_{E^{c}}u(\theta)\,d\theta},\ \frac{\int_{E}u(\theta)\,d\theta}{\int_{E}u(\theta)\,d\theta+\int_{E^{c}}l(\theta)\,d\theta}\right),$$

where $E^{c}$ denotes the complement of $E$. In terms of the normalized densities $\hat{l}(\theta)$ and $\hat{u}(\theta)$ and with $r$ defined in (3), we can write

$$(\underline{P}(E),\overline{P}(E))=\left(\frac{\int_{E}\hat{l}(\theta)\,d\theta}{\int_{E}\hat{l}(\theta)\,d\theta+r\int_{E^{c}}\hat{u}(\theta)\,d\theta},\ \frac{\int_{E}\hat{u}(\theta)\,d\theta}{\int_{E}\hat{u}(\theta)\,d\theta+r^{-1}\int_{E^{c}}\hat{l}(\theta)\,d\theta}\right). \quad (5)$$
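As an illustration of (5), the following sketch (ours, with illustrative lognormal bound shapes and $r=3$ as in the Poisson example of Section 6.2) computes the lower and upper probabilities of a one-dimensional event by quadrature.

```python
# Numerical sketch of (5): lower and upper probabilities of an event E by
# quadrature. The lognormal bound shapes and r = 3 are illustrative (they are
# the bounds used in the Poisson example of Section 6.2).
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

theta = np.linspace(1e-3, 15.0, 4000)
l_hat = stats.lognorm.pdf(theta, s=0.25, scale=np.exp(0.3125))  # normalized lower shape
u_hat = stats.lognorm.pdf(theta, s=0.5, scale=np.exp(0.5))      # normalized upper shape
r = 3.0                                                         # area ratio, equation (3)

E = theta > 2.0                                                 # event of interest
int_l_E, int_l_Ec = trapezoid(l_hat[E], theta[E]), trapezoid(l_hat[~E], theta[~E])
int_u_E, int_u_Ec = trapezoid(u_hat[E], theta[E]), trapezoid(u_hat[~E], theta[~E])

P_lower = int_l_E / (int_l_E + r * int_u_Ec)    # first component of (5)
P_upper = int_u_E / (int_u_E + int_l_Ec / r)    # second component of (5)
print(P_lower, P_upper)
```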

2.3 Closure under Bayesian updating and marginalization

Two important properties of density ratio classes are closure under Bayesian updating and marginalization. Wasserman (1992) demonstrated that the density ratio class is the only class of densities possessing these properties. Closure under Bayesian updating means the following. If a set of prior densities is a density ratio class, $\psi_{l,u}$ say, and if we update each prior in $\psi_{l,u}$ to its posterior density using a likelihood function $p(y|\theta)$, then the set of posterior densities is also a density ratio class, which we write as $\psi_{l(\theta;y),u(\theta;y)}$, where $l(\theta;y)=l(\theta)p(y|\theta)$ and $u(\theta;y)=u(\theta)p(y|\theta)$. Later we will make use of the ratio of areas under the upper and lower bound functions of the posterior density ratio class as a discrepancy in a Bayesian predictive check, and it will be useful to have some notation for this. We write

$$r(y):=\frac{\int u(\theta;y)\,d\theta}{\int l(\theta;y)\,d\theta}=r\,\frac{\int\hat{u}(\theta)p(y|\theta)\,d\theta}{\int\hat{l}(\theta)p(y|\theta)\,d\theta}, \quad (6)$$

with the last equality following from (4). The quantity $r(y)$ is a measure of how large the class of posterior densities is. Multiplying the lower and upper bound functions by an arbitrary positive constant does not change $r(y)$.

Next we describe closure of a density ratio class under marginalization. Suppose we partition $\theta$ into two subvectors $\theta=(\theta_{A}^{\top},\theta_{B}^{\top})^{\top}$ and, for $\hat{\pi}(\theta)\in\psi_{l,u}$, write $\hat{\pi}(\theta_{A})$ for its $\theta_{A}$ marginal:

$$\hat{\pi}(\theta_{A})=\int\hat{\pi}(\theta)\,d\theta_{B}.$$

The set of all densities $\hat{\pi}(\theta_{A})$ for $\hat{\pi}(\theta)\in\psi_{l,u}$ is a density ratio class, $\psi_{l(\theta_{A}),u(\theta_{A})}$, where

$$l(\theta_{A})=\int l(\theta)\,d\theta_{B},\qquad u(\theta_{A})=\int u(\theta)\,d\theta_{B}.$$

2.4 Prediction

Rinderknecht et al. (2014) considered how a density ratio class defined on the parameters propagates in Bayesian predictive inference for data $y'$, for both deterministic and stochastic prediction. For the case where predictions are deterministic given $\theta$, the predictive densities propagated from a density ratio class on the parameter space are a density ratio class on the predictive space. This follows from the closure under marginalization property discussed above.

Next consider stochastic prediction, where the data to be predicted has density $p(y'|\theta)$ given $\theta$. Suppose we have a density ratio class $\psi_{l,u}$ of densities defined on the parameter space $\Theta$. For $\hat{\pi}(\theta)\in\psi_{l,u}$, consider the prior predictive density for $y'$ defined as

$$p(y';\hat{\pi})=\int p(y'|\theta)\hat{\pi}(\theta)\,d\theta. \quad (7)$$

Later we will also consider extensions to settings where there is previously observed data $y$, say, and $y'$ is not conditionally independent of it given $\theta$, and we define

$$p(y';y,\hat{\pi}):=\frac{p(y',y;\hat{\pi})}{p(y;\hat{\pi})}.$$

The set of $p(y';\hat{\pi})$ for all $\hat{\pi}(\theta)\in\psi_{l,u}$ is not a density ratio class, but it is contained in the density ratio class

$$\psi_{l(y'),u(y')}, \quad (8)$$

where

$$l(y')=\int l(\theta)p(y'|\theta)\,d\theta,\qquad u(y')=\int u(\theta)p(y'|\theta)\,d\theta.$$

There can be predictive densities in (8) that are not obtained from (7) for any $\hat{\pi}\in\psi_{l,u}$; Rinderknecht et al. (2014, Section 2.4.1) give an example. We can use the density ratio class (8) to give conservative upper and lower predictive probabilities. It is immediate that

$$\psi_{l(y'),u(y')}=\psi_{r^{-1}\hat{l}(y'),\hat{u}(y')}.$$

Later we will consider density ratio classes of prior predictive densities based on the observed $y$, as given by $\psi_{l(y),u(y)}$. We will also consider prior predictive densities of data summary statistics, $S=S(y)$, and in this case we write $\psi_{l(S),u(S)}$.

3 Amortized inference for SBI

Next we discuss amortized inference for SBI in the conventional Bayesian framework. The methods we describe here are used in the next section to perform robust Bayesian SBI computations for density ratio classes.

Suppose we have data $y_{\text{obs}}$ and a model for it with density $p(y|\theta)$, where $\theta\in\Theta$ is an unknown parameter. The likelihood $p(y_{\text{obs}}|\theta)$ is intractable, and we consider neural SBI methods for inference about $\theta$. There are many neural SBI methods in the literature, as discussed in the introduction; here we focus on amortized methods, which after training are able to produce posterior approximations for arbitrary data (not just $y_{\text{obs}}$) at minimal additional computational cost. In our work, we have used the JANA package (Radev et al., 2023), which builds on the BayesFlow approach of Radev et al. (2022), with the latter performing only posterior estimation, while the former approximates both posterior and likelihood. Most neural SBI methods involve the sequential learning of a proposal distribution over many rounds to focus the simulation effort on parts of the parameter space likely to produce synthetic data similar to the observed data. Although these methods are not amortized, they can often be thought of as amortized methods if a single round of training is performed with a proposal given by the prior. Amortized SBI is a fast-moving field (Gloeckler et al., 2024; Zammit-Mangion et al., 2025; Chang et al., 2025) and our work makes no new contribution to it. Our focus instead is on using amortized methods to implement robust Bayesian inference with density ratio classes, and on calibrating checks for prior-data conflict. The latter task requires computing posterior quantities many times for different simulated data, which amortized methods can do efficiently.

The methods described in Radev et al. (2023) are well-suited to robust Bayesian computations with density ratio classes, since they provide methods for approximating both marginal likelihoods and posterior densities, and this can be exploited for computing quantities such as $r(y)$ defined at (6). Rinderknecht et al. (2014) describe a variety of approaches to computation with density ratio classes, some of which do not require marginal likelihood evaluations, or only require estimation of the posterior density for a single prior density, but we found these alternatives to be more difficult numerically than those described in the next section. Another strength of the methodology in Radev et al. (2023) is the development of simulation-based calibration (SBC) (Talts et al., 2018) methods adapted to the setting of amortized inference where both posterior and likelihood approximations are learnt (so-called joint simulation-based calibration, JSBC).

We briefly describe here the JANA methodology in the simplified case where summary statistics are not learnt from the data. Learnt summary statistics are useful in many settings, but can be hard to interpret, and later our summary statistic checks are more insightful when the summaries are specified by the user so that they are interpretable. For data $y\in\mathcal{Y}$, we write $S:\mathcal{Y}\rightarrow\mathcal{S}$ for a summary statistic mapping. In our examples, $\mathcal{Y}\subseteq\mathbb{R}^{n}$ and $\mathcal{S}\subseteq\mathbb{R}^{d}$, where $n$ is the sample size and $d$ is the summary statistic dimension, and generally $d\ll n$. The observed summary statistic value is written $S_{\text{obs}}=S(y_{\text{obs}})$. Parametrized approximations are used for both the posterior and likelihood. For some parameter $\varphi$, we write $q_{\varphi}(\theta|S)$ for a parametric approximation to the summary statistic posterior $p(\theta|S)$ valid for all $S$, and for some parameter $\gamma$ we write $q_{\gamma}(S|\theta)$ for a parametric approximation to the sampling density $p(S|\theta)$ of the summary statistic, valid for all $\theta$. How these families of posterior and likelihood approximations are parametrized in a flexible way is discussed further below. For a given parametrization, the final approximations for the likelihood and posterior are $q_{\gamma^{*}}(S|\theta)$ and $q_{\varphi^{*}}(\theta|S)$, where

$$(\varphi^{*},\gamma^{*})=\arg\min_{\varphi,\gamma}E_{p(\theta,S)}\left\{-\log q_{\varphi}(\theta|S)-\log q_{\gamma}(S|\theta)\right\}. \quad (9)$$

In practice, the expectation needs to be approximated by an average over simulated samples, and a penalty can also be added to encourage the prior predictive density of summaries to be close to standard normal when summary statistics are learnt. This can be useful in model checking (Schmitt et al., 2024) for the detection of unusual summary statistic values with respect to the prior. Optimizing a simulation approximation of (9) without any penalty learns parametric approximations to the posterior and likelihood simultaneously using maximum likelihood for the simulated data.
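A minimal sketch of this simulated-data approximation to (9) follows; the two log-density functions are simple normal placeholders standing in for the flows $q_{\varphi}(\theta|S)$ and $q_{\gamma}(S|\theta)$, and the simulator is a toy normal model, so this shows only the structure of the objective.

```python
# Sketch of a simulated-data approximation to the objective (9). The two
# log-densities are normal placeholders for the flows q_phi(theta|S) and
# q_gamma(S|theta), and the simulator is a toy normal model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
theta = rng.normal(size=1000)                  # theta_i ~ prior (placeholder)
S = rng.normal(loc=theta, scale=1.0)           # S_i ~ p(S | theta_i) (placeholder)

def log_q_post(theta, S):                      # placeholder for log q_phi(theta | S)
    return stats.norm.logpdf(theta, loc=S / 2.0, scale=1.0)

def log_q_lik(S, theta):                       # placeholder for log q_gamma(S | theta)
    return stats.norm.logpdf(S, loc=theta, scale=1.0)

# Monte Carlo estimate of the expectation in (9); training would minimize this
# average over the parameters of the two approximations.
loss = np.mean(-log_q_post(theta, S) - log_q_lik(S, theta))
print(loss)
```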

We choose $q_{\varphi}(\theta|S)$ and $q_{\gamma}(S|\theta)$ to be conditional normalizing flows (see, for example, Papamakarios et al. (2021) for a review of normalizing flows). For an $\mathbb{R}^{p}$-valued random variable $\theta$ having a distribution depending on $S$, suppose that $T_{\varphi,S}:\mathbb{R}^{p}\rightarrow\mathbb{R}^{p}$ is a smooth invertible mapping and that $T_{\varphi,S}(\theta)\sim N(0,I_{p})$ for every $S$, where $I_{p}$ denotes the $p\times p$ identity matrix. By the invertibility of $T_{\varphi,S}$, and using a change of variables, the implied density of $\theta$ conditional on $S$ is

$$q_{\varphi}(\theta|S)=\phi_{p}(T_{\varphi,S}(\theta))\left|\det\frac{\partial T_{\varphi,S}(\theta)}{\partial\theta}\right|,$$

where $\phi_{p}(\cdot)$ denotes the $p$-dimensional standard normal density. This gives a parametrized form for the posterior density given $S$, for a suitable family of transformations $T_{\varphi,S}$ with the tuning parameter $\varphi$. Similarly, for an $\mathbb{R}^{d}$-valued random variable $S$ with a distribution depending on $\theta$, suppose that $H_{\gamma,\theta}:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ is a smooth invertible mapping such that $H_{\gamma,\theta}(S)\sim N(0,I_{d})$ for every $\theta$. By the invertibility of $H_{\gamma,\theta}$ and using a change of variables, the density of $S$ conditional on $\theta$ is

$$q_{\gamma}(S|\theta)=\phi_{d}(H_{\gamma,\theta}(S))\left|\det\frac{\partial H_{\gamma,\theta}(S)}{\partial S}\right|.$$

This gives a parametrized form for the sampling density of $S$ given $\theta$. The invertible mappings $T_{\varphi,S}$ and $H_{\gamma,\theta}$ are constructed as compositions of simpler transformations (“flows”), for which there are many standard choices, such as real NVP (Dinh et al., 2016) and neural spline flows (Durkan et al., 2019), among many others. Often the parameters in the flows are weights of neural networks. In the flow construction it is important to ensure invertibility as well as easy computation of the Jacobian determinants in the expressions for $q_{\varphi}(\theta|S)$ and $q_{\gamma}(S|\theta)$. It is straightforward to obtain independent draws from a distribution defined through a normalizing flow, from its definition in terms of a transformation of a simple density.
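The following sketch illustrates the change-of-variables computation for $q_{\varphi}(\theta|S)$ with a single affine conditional “flow” whose shift and log-scale depend linearly on $S$; this is a deliberately simplified stand-in for the learned flows above (the conditioner and its parameters are our illustrative assumptions, not the JANA architecture).

```python
# Illustrative sketch: a single affine conditional "flow" and its implied
# density via the change-of-variables formula. The linear conditioner and its
# parameters are toy assumptions, not the architecture used in the paper.
import numpy as np

def conditioner(S, phi):
    """Hypothetical conditioner mapping the summary S to a shift and log-scale."""
    W_mu, b_mu, W_ls, b_ls = phi
    return S @ W_mu + b_mu, S @ W_ls + b_ls     # mu(S), log sigma(S)

def log_q(theta, S, phi):
    """log q_phi(theta | S) for T_{phi,S}(theta) = (theta - mu(S)) / sigma(S)."""
    mu, log_sigma = conditioner(S, phi)
    z = (theta - mu) * np.exp(-log_sigma)       # z = T_{phi,S}(theta) ~ N(0, I_p)
    log_std_normal = -0.5 * np.sum(z**2 + np.log(2 * np.pi), axis=-1)
    log_det = -np.sum(log_sigma, axis=-1)       # log |det dT/dtheta| = -sum log sigma
    return log_std_normal + log_det

rng = np.random.default_rng(0)
d, p = 3, 2                                     # summary and parameter dimensions
phi = (rng.normal(size=(d, p)), np.zeros(p), 0.1 * rng.normal(size=(d, p)), np.zeros(p))
S, theta = rng.normal(size=(5, d)), rng.normal(size=(5, p))
print(log_q(theta, S, phi))                     # one log-density per (theta, S) pair
```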

4 Implementing robust Bayes for SBI

A common method of summarizing the posterior density in a conventional Bayesian analysis is to plot the marginal densities. Similarly, in robust Bayes with density ratio classes, we can plot the upper and lower bound functions of the density ratio classes for the marginal posteriors. Let $\psi_{l,u}$ be a prior density ratio class. Write $\hat{\pi}(\theta|y)$ for the posterior density obtained by updating $\hat{\pi}(\theta)$ using the likelihood $p(y|\theta)$ and Bayes' rule, and write $\hat{\pi}(\theta_{j}|y)$ for its $\theta_{j}$ marginal density. By the closure under Bayesian updating and marginalization properties, the set of densities

$$\{\hat{\pi}(\theta_{j}|y):\hat{\pi}(\theta)\in\psi_{l,u}\}$$

is a density ratio class, with lower and upper bound functions

$$r(y)^{-1}\hat{l}(\theta_{j}|y)\quad\text{and}\quad\hat{u}(\theta_{j}|y), \quad (10)$$

where $r(y)$ is defined in equation (6), $\hat{l}(\theta_{j}|y)$ and $\hat{u}(\theta_{j}|y)$ are the $\theta_{j}$ marginal posterior densities for priors $\hat{l}(\theta)$ and $\hat{u}(\theta)$ respectively, and we have chosen to define the upper bound as a normalized density function.

To compute the lower and upper bound functions in (10), we need $\hat{l}(\theta_{j}|y)$, $\hat{u}(\theta_{j}|y)$ and $r(y)$. In the likelihood-free setting with summary statistics $S$, we approximate $\hat{l}(\theta_{j}|y)$ by first simulating data $Z_{i}^{l}=(\theta_{i}^{l},S_{i}^{l})$, $i=1,\dots,n$, where the $Z_{i}^{l}$ are drawn independently from the density $\hat{l}(\theta)p(S|\theta)$. Using the methods discussed in Section 3, we can obtain from this training data approximations $\widetilde{l}(\theta|S)$ and $\widetilde{l}(S|\theta)$ to the posterior density of $\theta$ given $S$ and the sampling density of $S$ given $\theta$ respectively when the prior is $\hat{l}(\theta)$. For the prior $\hat{u}(\theta)$ derived from the upper bound, we similarly obtain approximations $\widetilde{u}(\theta|S)$ and $\widetilde{u}(S|\theta)$. So by simulation of samples from $\widetilde{l}(\theta|S)$ and $\widetilde{u}(\theta|S)$ we can construct approximations $\widetilde{l}(\theta_{j}|S)$ and $\widetilde{u}(\theta_{j}|S)$ to the marginal posterior densities $\hat{l}(\theta_{j}|y)$ and $\hat{u}(\theta_{j}|y)$ by using kernel density estimates for the $\theta_{j}$ samples.
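A sketch of this construction of the marginal bounds (10) follows; the posterior draws are normal placeholders standing in for samples from the trained flows $\widetilde{l}(\theta|S_{\text{obs}})$ and $\widetilde{u}(\theta|S_{\text{obs}})$, and the value of $r(y)$ is assumed given (it is approximated via (15) below).

```python
# Sketch of the marginal bounds (10). theta_l and theta_u are placeholders for
# draws from the trained flow posteriors given S_obs under l_hat and u_hat;
# r_y stands for r(y), approximated via (15) below.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
theta_l = rng.normal([0.0, 0.0], [1.0, 1.0], size=(5000, 2))   # draws from l~(theta | S_obs)
theta_u = rng.normal([0.2, 0.1], [1.3, 1.2], size=(5000, 2))   # draws from u~(theta | S_obs)
r_y = 2.5                                                      # assumed value of r(y)

j = 0                                                          # marginal of interest
grid = np.linspace(-5.0, 5.0, 400)
lower_j = gaussian_kde(theta_l[:, j])(grid) / r_y              # r(y)^{-1} l_hat(theta_j | y)
upper_j = gaussian_kde(theta_u[:, j])(grid)                    # u_hat(theta_j | y), normalized
```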

We also need to approximate the ratio

$$r(y)=\frac{\int u(\theta;y)\,d\theta}{\int l(\theta;y)\,d\theta}=r\,\frac{\int\hat{u}(\theta)p(y|\theta)\,d\theta}{\int\hat{l}(\theta)p(y|\theta)\,d\theta}=r\,\frac{p(y;\hat{u})}{p(y;\hat{l})}, \quad (11)$$

where $r$ is defined in (3) and is known from the elicitation of $\psi_{l,u}$. Recalling the definition in equation (7), $p(y;\hat{l})$ and $p(y;\hat{u})$ are the values of the prior predictive densities at $y$ for priors $\hat{l}(\theta)$ and $\hat{u}(\theta)$ respectively. In the likelihood-free case, where we use summary statistics $S$, we will extend our previous notation and write

$$r(S)=r\,\frac{p(S;\hat{u})}{p(S;\hat{l})}, \quad (12)$$

where a prior predictive density for $S$, based on the distribution $\hat{\pi}(\theta)$ on the parameter space, is written as $p(S;\hat{\pi})$.

To approximate $p(S;\hat{l})$ and $p(S;\hat{u})$ in (12), we know from Bayes' rule that

$$\hat{l}(\theta|S)=\frac{\hat{l}(\theta)p(S|\theta)}{p(S;\hat{l})}\qquad\text{and}\qquad\hat{u}(\theta|S)=\frac{\hat{u}(\theta)p(S|\theta)}{p(S;\hat{u})}.$$

Rearranging these expressions,

$$p(S;\hat{l})=\frac{\hat{l}(\theta)p(S|\theta)}{\hat{l}(\theta|S)}\qquad\text{and}\qquad p(S;\hat{u})=\frac{\hat{u}(\theta)p(S|\theta)}{\hat{u}(\theta|S)}. \quad (13)$$

These expressions hold for any choice of $\theta$. Using the same $\theta$ in both expressions at (13) to compute (12),

$$r(S)=r\,\frac{\hat{u}(\theta)\hat{l}(\theta|S)}{\hat{l}(\theta)\hat{u}(\theta|S)}, \quad (14)$$

where the likelihood term cancels out when taking the ratio. Replacing $\hat{l}(\theta|S)$ and $\hat{u}(\theta|S)$ by the normalizing flow approximations $\widetilde{l}(\theta|S)$ and $\widetilde{u}(\theta|S)$, we obtain the approximation

$$\widetilde{r}(S):=r\,\frac{\hat{u}(\theta)\widetilde{l}(\theta|S)}{\hat{l}(\theta)\widetilde{u}(\theta|S)}. \quad (15)$$

In our later examples the value of $\theta$ chosen in computing (15) is the posterior mean of $\widetilde{u}(\theta|S)$. To evaluate (15) it is not necessary to approximate the intractable likelihood, provided we can approximate the normalized posterior density. However, if different values of $\theta$ were used in the two expressions at (13) to evaluate (12), then alternative expressions to (15) are obtained which can make use of approximations of the likelihood if available.
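The computation of (15) then only involves prior and approximate posterior density evaluations, as in the following sketch; the posterior log-densities are placeholders for the trained flows.

```python
# Sketch of the computation of (15). The prior densities are known; the two
# posterior log-densities are normal placeholders for the trained flows
# l~(theta|S) and u~(theta|S).
import numpy as np
from scipy import stats

def log_l_prior(theta): return stats.norm.logpdf(theta, 0.0, 1.0)
def log_u_prior(theta): return stats.norm.logpdf(theta, 0.0, 2.0)
def log_l_post(theta, S): return stats.norm.logpdf(theta, S, 0.5)   # placeholder
def log_u_post(theta, S): return stats.norm.logpdf(theta, S, 0.6)   # placeholder

def r_tilde(S, r, theta_eval):
    """Approximate r(S) = r * u_hat(theta) l~(theta|S) / (l_hat(theta) u~(theta|S))."""
    log_r = (np.log(r) + log_u_prior(theta_eval) + log_l_post(theta_eval, S)
             - log_l_prior(theta_eval) - log_u_post(theta_eval, S))
    return np.exp(log_r)

S_obs = 1.2
print(r_tilde(S_obs, r=3.0, theta_eval=S_obs))   # theta_eval: (placeholder) posterior mean
```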

5 Prior-data conflict checking

Our next goal is to devise checks for whether a density ratio class of priors is in conflict with the likelihood. The checks we develop are extensions of the checks used in conventional Bayesian analysis with a single prior, and we explain these first.

5.1 Conventional Bayesian predictive checks

We use similar notation to previous sections, with $\theta$ a parameter in a statistical model for data $y$. The density of $y$ is $p(y|\theta)$, and we consider in this subsection conventional Bayesian inference with a prior density $p(\theta)$ for $\theta$. The observed value of $y$ is $y_{\text{obs}}$. The posterior density is $p(\theta|y_{\text{obs}})\propto p(\theta)p(y_{\text{obs}}|\theta)$. Bayesian model checking is usually performed through Bayesian predictive checks, which require the choice of a statistic $D(y)$ and a reference density $m(y)$ for the data. The check examines whether the observed discrepancy $D(y_{\text{obs}})$ is surprising or not under the assumed model. To measure surprise, the observed discrepancy is calibrated by computing a tail probability, sometimes referred to as a Bayesian predictive $p$-value,

$$p=P(D(y)\geq D(y_{\text{obs}})), \quad (16)$$

for $y\sim m(y)$, where it is assumed that $D(y)$ has been defined in such a way that a larger value is considered more surprising.

The test statistic and reference distribution used should depend on what aspect of the model we wish to check. When the goal is to check the likelihood component of the model, it is popular to choose $m(y)$ as the posterior predictive density of a replicate observation (Guttman, 1967; Gelman et al., 1996):

$$p(y|y_{\text{obs}})=\int p(\theta|y_{\text{obs}})p(y|\theta)\,d\theta.$$

Although easy to use, posterior predictive $p$-values are not necessarily uniform, nor do they otherwise have a known distribution, when the model is correct, and this lack of calibration means it can be hard to identify when such a check has produced a surprising result. Lack of calibration is often symptomatic of an insufficiently thoughtful choice of $D(y)$, but alternatives to the posterior predictive $p$-value with better calibration properties have been explored (Bayarri and Berger, 2000; Moran et al., 2023).

Most relevant to the present work is checking for prior-data conflict, where we consider checking the prior $p(\theta)$ rather than the likelihood $p(y|\theta)$. By “checking the prior” here, we mean considering a Bayesian predictive check which will alert us if the information in the prior and the likelihood are contradictory. This happens if $p(\theta)$ puts all its mass in the tails of the likelihood. These conflicts are important to detect, because they can result in prior sensitivity for inferences of interest (Al Labadi and Evans, 2017), and may alert us to an inadequate understanding of the model and its parametrization. For prior-data conflict checking, an appropriate reference density $m(y)$ is the prior predictive density (Box, 1980),

$$p(y)=\int p(\theta)p(y|\theta)\,d\theta,$$

since we wish to see if the observed likelihood is unusual compared to the likelihood for data generated using the prior density for the parameters. Box (1980) suggested prior predictive checking for criticism of both likelihood and prior, but Evans and Moshonov (2006) argued that prior predictive checks based on a minimal sufficient statistic are appropriate for checking for prior-data conflict, while alternative methods are more appropriate for criticizing the likelihood. Various choices of $D(y)$ can be considered for summarizing the likelihood in a prior predictive check, such as prior predictive density values for exactly or approximately sufficient statistics, or prior-to-posterior divergences (Evans and Moshonov, 2006; Nott et al., 2020). Extensions to hierarchical conflict checking methods are also discussed by these and other authors (e.g. Marshall and Spiegelhalter, 2007; Bayarri and Castellanos, 2007; Steinbakk and Storvik, 2009; Scheel et al., 2011), where an attempt is made to perform model checking informative about different levels of a hierarchical prior. A discussion of checking for conflicting sources of information in a Bayesian model in a broad sense is given in Presanis et al. (2013).

There has been much discussion of the connection between prior-data conflict and imprecise probability models. For example, Walter and Coolen (2016) consider the phenomenon of prior-data conflict insensitivity for exponential family models and a precise conjugate prior, and suggest imprecise probability methods where the range of posterior inferences is reflective of prior-data conflict or prior-data agreement. For earlier related work see, for example, Pericchi and Walley (1991), Coolen (1994), Walter and Augustin (2009) and Walter and Augustin (2010). This work is insightful about how various imprecise probability formulations deal with conflicts in simple settings where computations are tractable, such as for exponential families. However, to the best of our knowledge, existing approaches have not been extended to prior-data conflict checking for density ratio classes, or to models with intractable likelihood. We consider this now.

5.2 Conventional Bayesian prior-data conflict checks for the bounds

A first simple approach to conflict checking for density ratio classes is to apply conventional prior-data conflict checks to the priors specified by the bounds, $\hat{l}(\theta)$ and $\hat{u}(\theta)$. If the prior class represents ambiguity in our prior knowledge, it may be felt that this prior information should be consistent with the information in the likelihood, and that none of the priors in the prior class should conflict with the likelihood. If we take that view, it can be interesting to conduct a conventional Bayesian check on the bounds. If there is a conflict, this suggests there is a problem with the elicitation of the bounds. In the next subsection we will consider a different approach, where the goal is to determine whether every prior in the prior class is in conflict with the likelihood.

In implementing a conventional prior-data conflict check for the bounds, we will consider the approach of Evans and Moshonov (2006). Consider a conventional Bayesian analysis with prior $\hat{\eta}(\theta)$. Evans and Moshonov (2006) suggest a prior-data conflict check using the discrepancy

$$D(T)=-\log p(T;\hat{\eta}), \quad (17)$$

where $T$ is a minimal sufficient statistic. They calibrate the observed value $D(T_{\text{obs}})$ of $D(T)$ using a prior predictive tail probability

$$P(D(T)\geq D(T_{\text{obs}})),\qquad T\sim p(T;\hat{\eta}).$$

The discrepancy used by Evans and Moshonov (2006) is a function of a minimal sufficient statistic, which is a desirable feature for a prior-data conflict check. If a proposed discrepancy depends on aspects of the data not captured by a minimal sufficient statistic, then these aspects are irrelevant to the likelihood, and can have nothing to do with whether the likelihood and a prior conflict.

In SBI, choosing the summary statistic $S$ as a minimal sufficient statistic is an ideal that is often not attainable, but it is desirable to choose an $S$ which is “near minimal sufficient”. Near sufficiency reduces information loss, and having a statistic of minimal dimension aids computation. Hence if a good summary statistic choice has been made for an SBI analysis, it is natural to use $T=S$ in implementing the prior-data conflict check of Evans and Moshonov (2006). Computation of discrepancies for the bound checks requires computation of $p(S;\hat{l})$ and $p(S;\hat{u})$. These calculations can be done approximately using (13), and the calibration tail probability can be estimated by Monte Carlo. Considering the check for $\hat{l}(\theta)$, and substituting the normalizing flow approximations $\widetilde{l}(\theta|S)$ and $\widetilde{l}(S|\theta)$ for $\hat{l}(\theta|S)$ and $\hat{l}(S|\theta)$ in (13), we obtain an approximate discrepancy $\widetilde{D}(S)$. A Monte Carlo approximation of a Bayesian predictive $p$-value is obtained as

$$\frac{1}{V}\sum_{v=1}^{V}I\left\{\widetilde{D}(S_{v})\geq\widetilde{D}(S_{\text{obs}})\right\},\qquad S_{v}\sim p(S;\hat{l}),\ v=1,\dots,V.$$

The check for $\hat{u}(\theta)$ is similar.
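A sketch of this Monte Carlo calibration for the lower bound check follows; the prior predictive log-density (in practice obtained from (13) via the flows) and the draws $S_{v}$ are normal placeholders.

```python
# Sketch of the Monte Carlo calibration for the lower bound check. The prior
# predictive log-density (in practice obtained from (13) via the flows) and
# the draws S_v are normal placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def log_pred_l(S):
    return stats.norm.logpdf(S, 0.0, np.sqrt(2.0))   # placeholder for log p(S; l_hat)

S_v = rng.normal(0.0, np.sqrt(2.0), size=10000)      # S_v ~ p(S; l_hat)
S_obs = 3.5
D_obs = -log_pred_l(S_obs)                           # observed discrepancy D~(S_obs)
p_value = np.mean(-log_pred_l(S_v) >= D_obs)         # Monte Carlo tail probability
print(p_value)
```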

A possible problem with the check of Evans and Moshonov (2006) is that it is not invariant to the particular form used for the minimal sufficient statistic $T$. So, for example, if we make an invertible transformation of $T$ to $T'$ say, the result of the check may differ. Although there are ways to address this issue (Evans and Jang, 2011), the checks for density ratio classes in the next subsection do possess a property of invariance to invertible transformations of summary statistics.

5.3 Prior-data conflict for density ratio classes

Next we consider checks for whether all the priors in a prior density ratio class are in conflict with the likelihood. If the “size” of the prior density ratio class is very large, then the checks we discuss here are unlikely to produce evidence of a conflict. This is natural, because when the degree of prior ambiguity is large, there are many different priors compatible with the prior information. Hence the checks discussed here are useful mostly when the elicited density ratio class is highly informative.

Here we will consider $r(S)$ defined at (12) as the discrepancy for a check,

$$r(S)=r\,\frac{p(S;\hat{u})}{p(S;\hat{l})}.$$

Intuitively, $r(S)$ will be very large if the prior predictive density value for $\hat{u}$ is much larger than for $\hat{l}$. This often happens in the case of a prior-data conflict, where prior predictive density values are very sensitive to the prior, and the usually more diffuse $\hat{u}$ places mass closer to parameter values receiving support from the likelihood. We will compare $r(S_{\text{obs}})$ to what is expected under the model to check for conflict. It is possible to have density ratio classes where $\hat{l}(\theta)=\hat{u}(\theta)$, and in this case $r(S)$ is not a suitable discrepancy to use for conflict checking, since $r(S)$ is then a constant not depending on the data. Alternative discrepancies that could be used in such cases are discussed below.

We calibrate our check using the upper probability of the event $\{r(S)\geq r(S_{\text{obs}})\}$ for the prior predictive density ratio class $\psi_{l(S),u(S)}$,

$$\overline{P}(r(S)\geq r(S_{\text{obs}})). \quad (18)$$

As discussed in Section 2.4, the density ratio class $\psi_{l(S),u(S)}$ is larger than the set of predictive densities $\{p(S;\hat{\pi}):\hat{\pi}\in\psi_{l,u}\}$, leading to conservative Bayesian predictive $p$-values. The upper probability $\overline{P}(r(S)\geq r(S_{\text{obs}}))$ is

$$\overline{P}(r(S)\geq r(S_{\text{obs}}))=\frac{\int_{\{r(S)\geq r(S_{\text{obs}})\}}p(S;\hat{u})\,dS}{\int_{\{r(S)\geq r(S_{\text{obs}})\}}p(S;\hat{u})\,dS+r^{-1}\int_{\{r(S)<r(S_{\text{obs}})\}}p(S;\hat{l})\,dS}. \quad (19)$$

We use $\widetilde{r}(S)$ at (15) to approximate $r(S)$, and we can approximate the probabilities given by the integrals in (19) from samples $S_{v}^{u}\sim p(S;\hat{u})$, $v=1,\dots,V$, and $S_{v}^{l}\sim p(S;\hat{l})$, $v=1,\dots,V$.
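A sketch of this sample-based approximation of (19) follows; the discrepancy function is a monotone placeholder for $\widetilde{r}(S)$ at (15), and the two sets of draws stand for prior predictive samples under $\hat{u}$ and $\hat{l}$.

```python
# Sketch of the sample-based approximation of (19). The discrepancy r_tilde is
# a monotone placeholder for (15); S_u and S_l stand for prior predictive
# samples of the summary under u_hat and l_hat respectively.
import numpy as np

rng = np.random.default_rng(3)
r = 3.0
def r_tilde(S):
    return r * np.exp(0.5 * S)               # placeholder for the ratio (15)

S_u = rng.normal(0.5, 1.2, size=10000)       # S ~ p(S; u_hat)
S_l = rng.normal(0.0, 1.0, size=10000)       # S ~ p(S; l_hat)
S_obs = 2.8

P_u = np.mean(r_tilde(S_u) >= r_tilde(S_obs))   # numerator integral under u_hat
Q_l = np.mean(r_tilde(S_l) <  r_tilde(S_obs))   # complementary integral under l_hat
upper_p = P_u / (P_u + Q_l / r)                 # estimate of (19)
print(upper_p)
```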

The conflict checks considered here are related to the checks of Evans and Moshonov (2006) with $T=S$, since both are based on prior predictive densities for $S$. The statistic $r(S)$ looks at a ratio of prior predictive density values at $S$ for $\hat{u}(\theta)$ and $\hat{l}(\theta)$. For our robust Bayes conflict check, if we make an invertible transformation of $S$ to another statistic $S'$, it is easy to see that $r(S)=r(S')$, because $r(S')$ is a ratio of densities where the Jacobian term for the transformation cancels out. As mentioned above, if we choose $\hat{u}(\theta)=\hat{l}(\theta)$ in defining the density ratio class, $r(S)$ cannot be used as a discrepancy for checking for conflict. In that case, we could use the statistic (17) of Evans and Moshonov (2006) with $T=S$ and $\hat{\eta}=\hat{u}$, or $\hat{\eta}=\hat{l}$. Then we can calibrate the check of the density ratio class with $\overline{P}(D(S)\geq D(S_{\text{obs}}))$. As $r\rightarrow 1$, and the density ratio class shrinks towards a single precise prior, we would recover the check of Evans and Moshonov (2006) for this prior.

5.4 Checking for conflicts between summary statistics

Mao et al. (2021) consider a way of checking for conflict between different components of a summary statistic vector in a conventional SBI analysis with summary statistics. They consider dividing $S$ into subvectors $S=(S_{A}^{\top},S_{B}^{\top})^{\top}$, with observed value $S_{\text{obs}}=(S_{\text{obs},A}^{\top},S_{\text{obs},B}^{\top})^{\top}$, and writing

$$p(\theta|S_{\text{obs}})\propto p(\theta|S_{\text{obs},A})\,p(S_{\text{obs},B}|S_{\text{obs},A},\theta),$$

we can consider $p(\theta|S_{\text{obs},A})$ as the prior after observing $S_{A}=S_{\text{obs},A}$, to be updated by the likelihood term $p(S_{\text{obs},B}|S_{\text{obs},A},\theta)$. If we have checked the adequacy of the model $p(S_{A}|\theta)$ for $S_{A}$, and if there is no prior-data conflict between the prior $p(\theta)$ and $p(S_{A}|\theta)$, then conflict between $p(\theta|S_{\text{obs},A})$ and $p(S_{\text{obs},B}|S_{\text{obs},A},\theta)$ indicates conflict between the different subvectors of the summary statistics. To get some intuition, a conflict here would suggest that the values of $\theta$ that give a good fit to $S_{\text{obs},A}$ are not values that give a good fit to $S_{\text{obs},B}$, which may indicate model misspecification. The definition of the check of Mao et al. (2021) is not invariant to swapping $S_{A}$ and $S_{B}$; the order matters. If $S_{A}$ were a sufficient statistic, this check would not be useful, since in that case $p(S_{B}|S_{\text{obs},A},\theta)$ does not depend on $\theta$, and hence this likelihood term cannot conflict with $p(\theta|S_{\text{obs},A})$ no matter what $S_{B}$ is observed.

It is possible to construct a similar check for conflict between summary statistics for density ratio classes. Following the discussion of Section 5.3, the prior-data conflict discrepancy for the case where $S_{A}$ is observed but before updating by $S_{B}$ is

$$r(S_{B}|S_{\text{obs},A})=r(S_{\text{obs},A})\,\frac{p(S_{B}|S_{\text{obs},A};\hat{u})}{p(S_{B}|S_{\text{obs},A};\hat{l})}, \quad (20)$$

where

$$p(S_{B}|S_{\text{obs},A};\hat{u}):=\frac{p(S_{\text{obs},A},S_{B};\hat{u})}{p(S_{\text{obs},A};\hat{u})}, \quad (21)$$
$$p(S_{B}|S_{\text{obs},A};\hat{l}):=\frac{p(S_{\text{obs},A},S_{B};\hat{l})}{p(S_{\text{obs},A};\hat{l})}, \quad (22)$$

and

$$r(S_{\text{obs},A})=r\,\frac{p(S_{\text{obs},A};\hat{u})}{p(S_{\text{obs},A};\hat{l})}. \quad (23)$$

Write $A$ for the event $A=\{r(S_{B}|S_{\text{obs},A})\geq r(S_{\text{obs},B}|S_{\text{obs},A})\}$. A calibration tail probability for the discrepancy (20) is

$$\overline{P}(A)=\frac{\int_{A}p(S_{B}|S_{\text{obs},A};\hat{u})\,dS_{B}}{\int_{A}p(S_{B}|S_{\text{obs},A};\hat{u})\,dS_{B}+r(S_{\text{obs},A})^{-1}\int_{A^{c}}p(S_{B}|S_{\text{obs},A};\hat{l})\,dS_{B}}. \quad (24)$$

Further details about approximation of the discrepancy and computation are given in Appendix C.

This check may not be very helpful in the case where the prior density ratio class is large, leading to very conservative results. For checking conflicts between summary statistics, we find it more useful to conduct the corresponding checks based on the bounds, and we discuss this now. For checking for conflict between summaries, which is a check of the likelihood, it is not necessary to consider prior ambiguity, since the prior has nothing to do with correct specification of the likelihood.

5.5 Checking for conflicts between summary statistics based on the bounds

In the check for conflict between summary statistics based on the bounds, we can use the check of Evans and Moshonov (2006). If it is observed that $S_{A}=S_{\text{obs},A}$, then the discrepancy for the check based on, for example, the lower bound is

$$D(S_{B}|S_{\text{obs},A})=-\log p(S_{B}|S_{\text{obs},A};\hat{l})=-\log\frac{p(S_{\text{obs},A},S_{B};\hat{l})}{p(S_{\text{obs},A};\hat{l})}.$$

Estimating the prior predictive density values in the expression on the right can be done using the methods of Section 4 to obtain an approximate discrepancy $\widetilde{D}(S_{B}|S_{\text{obs},A})$. A Monte Carlo approximation of a Bayesian predictive $p$-value is obtained as

$$\frac{1}{V}\sum_{v=1}^{V}I\left\{\widetilde{D}(S_{v,B}|S_{\text{obs},A})\geq\widetilde{D}(S_{\text{obs},B}|S_{\text{obs},A})\right\},\qquad S_{v,B}\sim p(S_{B}|S_{\text{obs},A};\hat{l}).$$

The check for the upper bound of the posterior density ratio class given $S_{A}=S_{\text{obs},A}$ is similar.

6 Examples

We consider three examples. The first example is a one-dimensional normal example where all calculations are done analytically. The second example is a Poisson example in one dimension with a non-conjugate prior where calculations cannot be done analytically, and it is helpful for understanding amortized SBI computations for density ratio classes and their application to conflict checking in a simple setting. The third example is more complex. It considers an agent-based model for movement of a species of toads. The likelihood is intractable, and we fit the model using a 24-dimensional summary statistic. A further time series example is discussed in Appendix D.

6.1 A normal example

We discuss a simple normal example first, in which calculations can be done analytically. The example comes from the literature on Bayesian modular inference (Liu et al., 2009). There are two data sources, $z=(z_{1},\dots,z_{n_{1}})^{\top}$ and $w=(w_{1},\dots,w_{n_{2}})^{\top}$, and scalar parameters $\varphi$ and $\eta$. Given $\varphi$, the elements of $z$ are conditionally independent, $z_{i}|\varphi\sim N(\varphi,1)$, $i=1,\dots,n_{1}$. Given $\varphi$ and $\eta$, the elements of $w$ are conditionally independent, $w_{i}|\varphi,\eta\sim N(\varphi+\eta,1)$, $i=1,\dots,n_{2}$. The parameter of interest is $\varphi$, and $\eta$ is a bias parameter which affects the data $w$. Liu et al. (2009) consider the setting where $n_{1}$ is small and $n_{2}$ is much larger, so $\varphi$ can be estimated using the data $z$ only. However, one might think that since $n_{1}$ is small, using the biased data $w$ might improve the inference about $\varphi$ substantially. Liu et al. (2009) consider a single prior where $\varphi$ and $\eta$ are independent a priori with $\varphi\sim N(0,\delta_{1}^{-1})$, $\eta\sim N(0,\delta_{2}^{-1})$, where $\delta_{1},\delta_{2}>0$ are known precision parameters. A flat prior on $\varphi$ can be obtained if $\delta_{1}\rightarrow 0$, and if $\delta_{2}$ is large, this would express confidence that the bias is small. Liu et al. (2009) show that using the biased data $w$ in addition to $z$ is not helpful for inference about $\varphi$, with limited improvement for small bias, and very poor inference for large bias.

We consider the checks developed in Sections 5.4 and 5.5 for conflict between summary statistics in this problem. A minimal sufficient statistic for $\theta=(\varphi,\eta)$ is $S=(\bar{z},\bar{w})$, where $\bar{z}$ and $\bar{w}$ are the sample means of $z$ and $w$ respectively. In the notation of Section 5.4, we use $S_{A}=\bar{z}$ and $S_{B}=\bar{w}$. After observing $\bar{z}$, we can think of the posterior given $\bar{z}$ as a prior used for updating by $S_{B}=\bar{w}$, and check its consistency with the likelihood term for $\bar{w}$. A conflict could indicate inconsistency between the prior and likelihood, or a problem with the likelihood; here we will consider the situation where the problem is a prior-data conflict, with a large bias $\eta$ that conflicts with prior information expressing belief in a small bias.

The lower and upper bound functions $l(\theta)$ and $u(\theta)$ are defined as follows. Write $\phi(x;\mu,\sigma^{2})$ for the normal density in $x$ with mean $\mu$ and variance $\sigma^{2}$. Without loss of generality we take the upper bound function to be normalized,

$$u(\theta)=\hat{u}(\theta)=\phi(\varphi;0,\delta_{1}^{-1})\times\phi(\eta;0,\delta_{2}^{-1}),$$

and $l(\theta)=r^{-1}\hat{l}(\theta)$, where

$$\hat{l}(\theta)=\phi(\varphi;0,\delta_{1}^{-1})\times\phi(\eta;0,k\delta_{2}^{-1}).$$

We set $k=0.9$ and then multiply $\hat{l}(\theta)$ by $r^{-1}=0.9$, which ensures that $l(\theta)$ is a lower bound. The prior $\hat{u}(\theta)$ is of similar form to the one considered in Liu et al. (2009), and $\hat{l}(\theta)$ is similar but with a smaller prior variance for the bias parameter $\eta$.

Write $\bar{z}_{\text{obs}}$ for the observed value of $\bar{z}$. Again using the notation of Section 5.4, routine manipulations show that

$$p(S_{B}|S_{\text{obs},A};\hat{u})=p(\bar{w}|\bar{z}_{\text{obs}};\hat{u})=\phi\left(\bar{w};\frac{n_{1}}{n_{1}+\delta_{1}}\bar{z}_{\text{obs}},\ \frac{1}{n_{1}+\delta_{1}}+\frac{1}{\delta_{2}}+\frac{1}{n_{2}}\right), \quad (25)$$

and

$$p(S_{B}|S_{\text{obs},A};\hat{l})=p(\bar{w}|\bar{z}_{\text{obs}};\hat{l})=\phi\left(\bar{w};\frac{n_{1}}{n_{1}+\delta_{1}}\bar{z}_{\text{obs}},\ \frac{1}{n_{1}+\delta_{1}}+\frac{k}{\delta_{2}}+\frac{1}{n_{2}}\right). \quad (26)$$

We first consider the checks of Section 5.5 for the bounds using the approach of Evans and Moshonov (2006). The discrepancy for the check is $-\log p(\bar{w}|\bar{z}_{\text{obs}};\hat{l})$ for the lower bound and $-\log p(\bar{w}|\bar{z}_{\text{obs}};\hat{u})$ for the upper bound. If we take logs in (25) and (26) and ignore irrelevant constants we get equivalent discrepancies, which are the same in both cases and equal to

$$\left(\bar{w}-\frac{n_{1}}{n_{1}+\delta_{1}}\bar{z}_{\text{obs}}\right)^{2}. \quad (27)$$

Let

$$A=\left\{\left(\bar{w}-\frac{n_{1}}{n_{1}+\delta_{1}}\bar{z}_{\text{obs}}\right)^{2}\geq\left(\bar{w}_{\text{obs}}-\frac{n_{1}}{n_{1}+\delta_{1}}\bar{z}_{\text{obs}}\right)^{2}\right\}.$$

The probabilities of $A$ under (25) and (26) respectively are

$$P_{A,u}=P\left(W\geq\frac{\left(\bar{w}_{\text{obs}}-\frac{n_{1}}{n_{1}+\delta_{1}}\bar{z}_{\text{obs}}\right)^{2}}{\frac{1}{n_{1}+\delta_{1}}+\frac{1}{\delta_{2}}+\frac{1}{n_{2}}}\right),\qquad P_{A,l}=P\left(W\geq\frac{\left(\bar{w}_{\text{obs}}-\frac{n_{1}}{n_{1}+\delta_{1}}\bar{z}_{\text{obs}}\right)^{2}}{\frac{1}{n_{1}+\delta_{1}}+\frac{k}{\delta_{2}}+\frac{1}{n_{2}}}\right),$$

where $W\sim\chi^{2}_{1}$. $P_{A,u}$ and $P_{A,l}$ are the exact calibration tail probabilities discussed in Section 5.5 for the priors $\hat{u}$ and $\hat{l}$ respectively.

We can also consider a check of the whole density ratio class (Section 5.4 and Appendix C). Since the distribution of $z$ depends only on $\varphi$, and since the $\varphi$ marginal prior is the same for $\hat{l}(\theta)$ and $\hat{u}(\theta)$, the quantity $r(S_{\text{obs},A})$ given by (23) equals $r$, and the summary statistic for our conflict check (20) is

$$r(S_{\text{obs},A})\,\frac{p(S_{B}|S_{\text{obs},A};\hat{u})}{p(S_{B}|S_{\text{obs},A};\hat{l})}=r\,\frac{\phi\left(\bar{w};\frac{n_{1}}{n_{1}+\delta_{1}}\bar{z}_{\text{obs}},\ \frac{1}{n_{1}+\delta_{1}}+\frac{1}{\delta_{2}}+\frac{1}{n_{2}}\right)}{\phi\left(\bar{w};\frac{n_{1}}{n_{1}+\delta_{1}}\bar{z}_{\text{obs}},\ \frac{1}{n_{1}+\delta_{1}}+\frac{k}{\delta_{2}}+\frac{1}{n_{2}}\right)}.$$

Again an equivalent statistic is obtained by taking logs and ignoring irrelevant constants; provided that $k\neq 1$, so that $\hat{l}(\theta)\neq\hat{u}(\theta)$, we obtain once more the statistic (27). The calibration (upper) probability (24) is

$$\frac{P_{A,u}}{P_{A,u}+r^{-1}(1-P_{A,l})}. \quad (28)$$

Consider $n_{1}=100$, $n_{2}=1000$, $\delta_{1}=1$, $\delta_{2}=100$, $k=0.9$, $r^{-1}=0.9$, and $\bar{z}_{\text{obs}}=0$. Figure 1 (left) shows the calibration probability (28) and the probabilities $P_{A,u}$ and $P_{A,l}$ as a function of $\bar{w}_{\text{obs}}$. The probabilities $P_{A,u}$ and $P_{A,l}$ are indistinguishable in the figure, but if $k$ were chosen to be smaller, so that the shapes of $\hat{l}(\theta)$ and $\hat{u}(\theta)$ were very different, then they would be distinct. As the difference between $\bar{z}_{\text{obs}}$ and $\bar{w}_{\text{obs}}$ increases, the calibration tail probability tends to zero, giving evidence of conflict. Figure 1 (right) shows a similar situation but with $r=100$. As expected, it is much less likely that a conflict will be encountered in the check for the density ratio class, because the class of priors is much wider.
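The quantities plotted in Figure 1 (left) can be computed directly from the closed-form expressions above, as in the following sketch.

```python
# Reproducing the quantities plotted in Figure 1 (left) from the closed-form
# expressions: P_{A,u}, P_{A,l}, and the upper probability (28).
import numpy as np
from scipy.stats import chi2

n1, n2, delta1, delta2, k, r = 100, 1000, 1.0, 100.0, 0.9, 1 / 0.9
z_obs = 0.0
w_obs = np.linspace(-0.5, 0.5, 201)          # grid of observed values of w-bar

center = n1 / (n1 + delta1) * z_obs
var_u = 1 / (n1 + delta1) + 1 / delta2 + 1 / n2
var_l = 1 / (n1 + delta1) + k / delta2 + 1 / n2

P_Au = chi2.sf((w_obs - center) ** 2 / var_u, df=1)   # tail probability under u_hat
P_Al = chi2.sf((w_obs - center) ** 2 / var_l, df=1)   # tail probability under l_hat
upper_p = P_Au / (P_Au + (1 - P_Al) / r)              # equation (28)
```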


Figure 1: Calibration tail probabilities for the density ratio class and for the check of Evans and Moshonov for the bounds in the toy normal example. The left graph is for $r=1/0.9$ and the right for $r=100$. The curves for the checks of Evans and Moshonov are indistinguishable.

6.2 Poisson example

The following example was first discussed in Sisson et al., (2018), where it was used to demonstrate some difficulties with naïve approaches to summary statistic choice in likelihood-free inference. We will use it to illustrate amortized SBI computations for conflict checking with density ratio classes in a simple one-dimensional case.

In this example we have 5 observations, written as $y=(y_{1},\dots,y_{5})^{\top}$, and the observed value is $y_{\text{obs}}=(0,0,0,0,5)^{\top}$. The assumed model is $y_{i}\mid\theta\stackrel{iid}{\sim}\text{Poisson}(\theta)$, $i=1,\dots,5$. We consider a density ratio class of prior densities for $\theta$ with $l(\theta)=(1/3)\hat{l}(\theta)$, where $\hat{l}(\theta)$ is a lognormal density with parameters $\mu=0.25+1/16$, $\sigma=1/4$, and $u(\theta)=\hat{u}(\theta)$ a lognormal density with $\mu=0.25+1/4$, $\sigma=1/2$. The ratio of the integrated upper bound to the integrated lower bound is $r=3$. Plots of $l(\theta)$ and $u(\theta)$ are shown in Appendix E of the supplementary material. Write $S_{1}(y)=(1/5)\sum_{i=1}^{5}y_{i}$ for the sample mean $\bar{y}$, and $S_{2}(y)=(1/4)\sum_{i=1}^{5}(y_{i}-\bar{y})^{2}$ for the sample variance $s^{2}$. Since in the Poisson model the mean and variance are both equal to $\theta$, $S_{1}(y)$ and $S_{2}(y)$ are both sample estimates of $\theta$. The observed values of these summary statistics are $S_{1}(y_{\text{obs}})=1$ and $S_{2}(y_{\text{obs}})=5$; they are very different, suggesting that their values conflict and that a model that is over-dispersed relative to the Poisson, such as a negative binomial, could be better. However, in a likelihood-free analysis using only the summary statistic $S_{1}(y)$, or only $S_{2}(y)$, it is possible to match the observed summary statistic with the Poisson model. The summary statistics are discrete here, but we use continuous approximations for the sample mean and variance summaries.
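
The simulation step used to generate training data for the amortized approximations under the bound priors is simple; a minimal sketch follows (Python; the simulation budget here is illustrative):

import numpy as np

rng = np.random.default_rng(0)

def summaries(theta, n=5):
    """Simulate y_i ~ Poisson(theta), i = 1,...,n, and return
    (S_1, S_2) = (sample mean, sample variance)."""
    y = rng.poisson(theta, size=n)
    return y.mean(), y.var(ddof=1)

# Prior draws from the normalized bounds hat-l (lognormal(0.25 + 1/16, 1/4))
# and hat-u (lognormal(0.25 + 1/4, 1/2)).
theta_l = rng.lognormal(mean=0.25 + 1 / 16, sigma=0.25, size=50_000)
theta_u = rng.lognormal(mean=0.25 + 1 / 4, sigma=0.5, size=50_000)

# Training pairs (theta, S) for the posterior approximations under each bound.
train_l = [(t, summaries(t)) for t in theta_l]
train_u = [(t, summaries(t)) for t in theta_u]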

We perform three robust Bayesian analyses where the summary statistics consist of 1) $S_{1}(y)$ only, 2) $S_{1}(y)$ and $S_{2}(y)$, and 3) $S_{2}(y)$ only. Since $S_{1}(y)=\bar{y}$ is a sufficient statistic for the Poisson model, $\hat{u}(\theta|S_{\text{obs}})$ and $\hat{l}(\theta|S_{\text{obs}})$ are the same for summary statistic choices 1) and 2), but it is interesting to see whether our computational methods give the same answer in these two cases. For summary statistic choice 2), as mentioned above, the observed summary statistic components are in conflict, and we use this case to demonstrate our conflict checks.

From data simulated under $\hat{l}(\theta)$ and $\hat{u}(\theta)$ we approximate lower and upper bound functions for the posterior density for the three different summary statistic choices using the methods of Section 4. The upper bound is normalized to be a density function. Figure 2 shows the results. For case (a) of the figure, where the summary statistic is the sample mean, the likelihood information is not in conflict with the prior information, and the lower and upper bound functions are close. For case (b), where the summary statistics are the mean and variance, the lower and upper bound functions are also close, and the result is nearly identical to case (a), as expected from the sufficiency of the sample mean. On the other hand, case (c) represents a situation where the likelihood is less consistent with the prior information, and the prior ambiguity leads to greater posterior ambiguity, because of the greater posterior sensitivity to the prior when the prior and likelihood conflict. Calibration plots for the three cases, checking the accuracy of the amortized inference computations, are shown in Appendix E of the supplement. We used the same training settings in each case, but the calibration plots suggest that for the upper bound prior $\hat{u}(\theta)$ and summary $S_{2}(y)$ the posterior estimation is not reliable.

Figure 2: Estimated upper bound and lower bound for posterior densities via robust Bayes with different summary statistic choices: (a) $\bar{y}$; (b) $(\bar{y},s^{2})$; (c) $s^{2}$.

Checking for prior-data conflict between the density ratio class and the likelihood using the discrepancy in Section 5.2 and the calibration (19), we obtain upper tail probabilities for the three summary statistic choices of 0.956 for case 1, 0.917 for case 2, and 0.084 for case 3. This indicates some conflict between the likelihood and the prior density ratio class in case 3, where the data information consists of just the sample variance summary, which is inconsistent with the prior information. We also check for conflict between the two summary statistic components as described in Sections 5.4 and 5.5, with $S_{A}=s^{2}$ and $S_{B}=\bar{y}$. As discussed in Section 5.4, we should not choose $S_{A}$ to be the sufficient statistic $\bar{y}$. It is also not appropriate to perform our conflict check between summaries if the prior density ratio class or its bounds are in conflict with the information in $S_{A}$, so we consider new upper and lower bound functions that are not in conflict with $s^{2}=5$ before performing the check. We let $l(\theta)=(1/2)\hat{l}(\theta)$, where $\hat{l}(\theta)$ is a lognormal density with parameters $\mu=1.09+1/9$, $\sigma=1/3$, and $u(\theta)=\hat{u}(\theta)$ a lognormal density with $\mu=1.09+1/16$, $\sigma=1/4$. With these choices $\hat{l}(\theta)$ and $\hat{u}(\theta)$ are roughly peaked at $3$, and there is no conflict between $S_{A}$ and the prior. The calibration tail probability for the conflict check between the summaries for the prior density ratio class (Section 5.4 and Appendix C) is $0.806$, indicating a lack of conflict, due to the conservatism caused by the prior ambiguity. In the check for conflict between summary statistics based on the bounds using the method of Evans and Moshonov, (2006) (Section 5.5), the calibration tail probability is 0.003 for the lower bound and 0.006 for the upper bound, which indicates conflict and reveals the problem with the specification of the model here.

6.3 Toad example

Our next example considers an agent-based model for the movement of Fowler’s toads (Anaxyrus fowleri). We consider data originally discussed in Marchand et al., (2017) and also analyzed in Frazier and Drovandi, (2021). The data are modelled using ‘model 2’ in Marchand et al., (2017), which has parameter $\theta=(\alpha,\delta,p_{0})^{\top}$, where $\alpha$ and $\delta$ are stability and scale parameters in an alpha-stable distribution for an overnight displacement in toad movements, and $p_{0}$ is the probability of a toad returning to a previously used refuge.

The raw data consist of GPS locations for 63 days and 66 toads. The summary statistic vector $S$ that we consider has dimension $24$, with $S=(S_{A}^{\top},S_{B}^{\top})^{\top}$, where $S_{A}$ and $S_{B}$ are 12-dimensional summary statistic vectors. The vector $S_{A}$ summarizes the displacements of all toads at a one-day lag; the vector $S_{B}$ summarizes the displacements of all toads at a two-day lag. The analysis in Frazier and Drovandi, (2021) suggests conflict between the summary statistics at different lags (i.e., between $S_{A}$ and $S_{B}$). They also consider summary statistics at lags of 4 and 8 days, but we do not do so here.
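
As a rough illustration of how such summaries can be computed from the raw locations, the sketch below (Python) uses quantiles of the absolute lag-$\ell$ displacements as a placeholder; the actual 12 summaries follow Marchand et al., (2017):

import numpy as np

def lag_displacement_summaries(X, lag, n_summ=12):
    """Placeholder lag-based summaries for an (n_days x n_toads) array X
    of toad positions (63 x 66 here): quantiles of the absolute
    displacements at the given lag, standing in for the 12 summaries
    used by Marchand et al. (2017)."""
    disp = np.abs(X[lag:, :] - X[:-lag, :]).ravel()
    return np.quantile(disp, np.linspace(0.05, 0.95, n_summ))

# S_A and S_B are the one-day-lag and two-day-lag summary vectors:
# S = np.concatenate([lag_displacement_summaries(X, 1),
#                     lag_displacement_summaries(X, 2)])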

We will consider three different prior density ratio classes. In all three cases, the upper bound $u(\theta)$ is a uniform density for $\theta$ on the range $[1,2]\times[30,50]\times[0,0.9]$. The lower bound functions $l(\theta)$ differ across the three cases. The first case (hereafter case 1) takes $l(\theta)=(1/3)\hat{l}(\theta)$ with $\hat{l}(\theta)$ a uniform density on $[1.2,1.8]\times[30,50]\times[0,0.9]$. The second case (hereafter case 2) takes $l(\theta)=(1/3)\hat{l}(\theta)$ with $\hat{l}(\theta)$ a uniform density on $[1,2]\times[35,45]\times[0,0.9]$. The third case (hereafter case 3) takes $l(\theta)=(1/3)\hat{l}(\theta)$ with $\hat{l}(\theta)$ a uniform density on $[1,2]\times[30,50]\times[0.1,0.8]$. In case 1, $\hat{l}(\theta)$ and $\hat{u}(\theta)$ are densities under which $\theta_{1},\theta_{2}$ and $\theta_{3}$ are independent, with the marginal densities for $\theta_{2}$ and $\theta_{3}$ the same for $\hat{l}(\theta)$ and $\hat{u}(\theta)$, whereas the densities for $\theta_{1}$ differ. Cases 2 and 3 are similar to case 1, but with the differing densities for $\theta_{2}$ and $\theta_{3}$ respectively. The summary statistic data come from the BSL R package. Normalizing the upper bound to be a density function, Figure 3 shows the estimated upper and lower bound functions $\widetilde{r}(S_{\text{obs}})^{-1}\widetilde{l}(\theta_{j}|S_{\text{obs}})$ and $\widetilde{u}(\theta_{j}|S_{\text{obs}})$ for the marginal posterior density ratio classes for the three parameters, $j=1,2,3$, in each of the three cases. Diagnostic plots examining the reliability of the amortized inference computations are shown in Appendix E.

Figure 3: Estimated upper bound and lower bound functions for marginal posterior density ratio classes for case 1 (first row), case 2 (second row) and case 3 (third row) for the Toad example.

We also perform the conflict checking between summary statistics based on the bounds, as described in Section 5.5. The calibration tail probabilities for conflict between the summaries based on the bounds are $<0.005$ for the upper bound in all three cases, while for the lower bound they are less than $0.1$ for cases 2 and 3. This suggests conflict between the summary statistics at the different time lags, which was also the conclusion of Frazier and Drovandi, (2021).

7 Discussion

Much recent work on neural SBI methods has focused on achieving robustness to misspecification of the likelihood component of the model. Complementing this work, we consider robustness to the choice of prior, implementing robust Bayes methods which avoid the choice of a single prior. We demonstrate that recently developed amortized neural SBI methods can be adapted to compute robust Bayesian inferences based on density ratio classes. Methods for checking for conflict between a density ratio class and the likelihood, and for conflict between subsets of summary statistics, are also developed. Conflict checks can be based on whether all priors in a density ratio class conflict with the likelihood information, or on whether one of the bounds does, in which case conventional Bayesian conflict checking methods can be employed. In the latter case, the prior class may contain both reasonable and unreasonable priors, in the sense of conflicting with the likelihood information. The methods discussed in this work could be combined with any of the model expansion approaches previously suggested in the literature for robustifying the likelihood, to obtain robustness to both likelihood misspecification and prior ambiguity or prior-data conflict. A difficulty with all robust Bayes methods is the challenge of eliciting the prior class used. While Rinderknecht et al., (2011) have done some pioneering work in this direction for density ratio classes, this remains a difficult task in models where the parameter is high-dimensional.

Acknowledgements

David Nott’s research was supported by the Ministry of Education, Singapore, under the Academic Research Fund Tier 2 (MOE-T2EP20123-0009). Evans’ research was supported by a grant from the Natural Sciences and Engineering Research Council of Canada. Wang Yuyan thanks the developers of the JANA package and Stefan Radev for patiently answering her questions.

Appendix A: Previous work on SBI with misspecified likelihood

One of the earliest works on model misspecification and SBI is Ratmann et al., (2009), who consider model expansions for ABC, regarding tolerances as model parameters. Wilkinson, (2013) demonstrates that the posterior approximation in ABC is an exact posterior by interpreting the kernel and tolerance in ABC procedures as specifying a model error. Frazier et al., (2020) examine theoretically the behaviour of standard and regression-adjustment ABC methods under misspecification, and suggest diagnostics. Lewis et al., (2021) give a general discussion of the value of using insufficient summary statistics to robustify Bayesian modelling, and discuss implementing exact conditioning for certain types of summary statistics in linear model settings. Frazier and Drovandi, (2021) suggest model expansions for the Bayesian synthetic likelihood (BSL) approach, where a parameter is introduced for each summary statistic. The added parameters can enter through a mean or variance adjustment, and they discount a sparse set of incompatible summary statistics which cannot be matched under the assumed model. The approach is useful for understanding the nature of misspecification through the summary statistics, which can guide model improvement. Frazier et al., (2025) demonstrate that the standard synthetic likelihood posterior can exhibit non-standard behaviour under misspecification. Numerous authors have warned about the difficulties arising with neural SBI methods under misspecification (Cannon et al.,, 2022; Ward et al.,, 2022; Huang et al.,, 2023; Kelly et al.,, 2024), with suggested remedies including model expansions similar to those considered by Frazier and Drovandi, (2021) in the BSL context, methods for choosing summary statistics where incompatibility is penalized, and the addition of noise to summary statistics in training. Another relevant approach to dealing with misspecification is generalized Bayesian inference (Bissiri et al.,, 2016), where the likelihood is replaced by a pseudo-likelihood derived from a loss function. For SBI methods, the loss function is often based on a scoring rule (Giummolè et al.,, 2019). Recent work on generalized Bayesian methods for SBI includes Cherief-Abdellatif and Alquier, (2020), Schmon et al., (2021), Matsubara et al., (2022), Dellaporta et al., (2022), Gao et al., (2023), Pacchiardi et al., (2024) and Weerasinghe et al., (2025).

Appendix B: Equivalence of definitions of density ratio classes

Here we consider the equivalence of the two definitions of a density ratio class given in the main text. Write the lower bound function as $l(\theta)$ and the upper bound function as $u(\theta)$, $0\leq l(\theta)\leq u(\theta)$, with

\[
\int l(\theta)\,d\theta>0\qquad\text{and}\qquad\int u(\theta)\,d\theta<\infty.
\]

The first definition of a density ratio class is

\[
\psi_{l,u}:=\left\{\hat{\pi}(\theta)=\frac{\pi(\theta)}{\int\pi(\theta)\,d\theta}\,;\;l(\theta)\leq\pi(\theta)\leq u(\theta)\right\}. \tag{S1}
\]

In the case where $l(\theta)>0$ for all $\theta\in\Theta$, an equivalent definition of $\psi_{l,u}$ is

\[
\psi_{l,u}=\left\{\hat{\pi}(\theta):\frac{l(\theta)}{u(\theta^{\prime})}\leq\frac{\pi(\theta)}{\pi(\theta^{\prime})}\leq\frac{u(\theta)}{l(\theta^{\prime})}\text{ for all }\theta,\theta^{\prime}\in\Theta\right\}. \tag{S2}
\]

To demonstrate the equivalence, start with definition (S1). If $\hat{\pi}(\theta)$ belongs to $\psi_{l,u}$ in (S1), then for any given $\theta,\theta^{\prime}$, and assuming $l(\theta)>0$ for all $\theta$, we can write

\[
l(\theta)\leq\pi(\theta)\leq u(\theta),\qquad 1/u(\theta^{\prime})\leq 1/\pi(\theta^{\prime})\leq 1/l(\theta^{\prime}).
\]

Multiplying terms in the inequalities gives

\[
l(\theta)/u(\theta^{\prime})\leq\pi(\theta)/\pi(\theta^{\prime})\leq u(\theta)/l(\theta^{\prime}),
\]

and $\hat{\pi}(\theta)$ belongs to $\psi_{l,u}$ in (S2).

In the other direction, assume that $\hat{\pi}(\theta)$ belongs to $\psi_{l,u}$ given in (S2), with $l(\theta)>0$ for all $\theta\in\Theta$. Then for all $\theta,\theta^{\prime}\in\Theta$,

\[
\frac{\pi(\theta)}{\pi(\theta^{\prime})}\leq\frac{u(\theta)}{l(\theta^{\prime})},
\]

and setting $\theta=\theta^{\prime}$ we can deduce that $l(\theta)\leq u(\theta)$ for all $\theta\in\Theta$, and hence $u(\theta)>0$ for all $\theta\in\Theta$. Rearranging the above inequality,

\[
\frac{\pi(\theta)}{u(\theta)}\leq\frac{\pi(\theta^{\prime})}{l(\theta^{\prime})}, \tag{S3}
\]

and

\[
c=\sup_{\theta\in\Theta}\frac{\pi(\theta)}{u(\theta)}<\infty,
\]

which implies that $\pi(\theta)\leq cu(\theta)$ for all $\theta$; the supremum is finite because the right-hand side of (S3) is finite for any fixed $\theta^{\prime}$. Now suppose that there exists $\theta^{\prime}$ such that $\pi(\theta^{\prime})<cl(\theta^{\prime})$, which implies that

\[
\frac{\pi(\theta^{\prime})}{l(\theta^{\prime})}<c. \tag{S4}
\]

Then from (S3) and (S4),

\[
\frac{\pi(\theta)}{u(\theta)}\leq\frac{\pi(\theta^{\prime})}{l(\theta^{\prime})}<c,
\]

for all $\theta$, which implies

\[
c=\sup_{\theta}\frac{\pi(\theta)}{u(\theta)}<c,
\]

a contradiction. So we must have $cl(\theta)\leq\pi(\theta)\leq cu(\theta)$ for all $\theta$, so that $\pi(\theta)/c$ lies between $l(\theta)$ and $u(\theta)$ and normalizes to $\hat{\pi}(\theta)$, and hence $\hat{\pi}(\theta)$ belongs to $\psi_{l,u}$ defined by (S1).
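
The equivalence can also be verified numerically on a grid. In the sketch below (Python), the particular bound and candidate densities are arbitrary illustrative choices; both membership criteria reduce to $\sup_{\theta}l(\theta)/\pi(\theta)\leq\inf_{\theta}u(\theta)/\pi(\theta)$:

import numpy as np
from scipy.stats import norm

theta = np.linspace(-5, 5, 801)
u = norm.pdf(theta, scale=1.5)            # upper bound function
l = 0.3 * norm.pdf(theta, scale=1.0)      # lower bound function, l <= u here
pi = norm.pdf(theta, loc=0.2, scale=1.2)  # candidate (unnormalized) density

# (S1): some c > 0 rescales pi to lie between l and u; such a c exists
# if and only if sup l/pi <= inf u/pi.
in_S1 = (l / pi).max() <= (u / pi).min()

# (S2): l(t)/u(t') <= pi(t)/pi(t') <= u(t)/l(t') for all pairs (t, t').
ratio = np.outer(pi, 1 / pi)
in_S2 = (np.outer(l, 1 / u) <= ratio).all() and (ratio <= np.outer(u, 1 / l)).all()

print(in_S1, in_S2)  # the two criteria agree on the grid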

Appendix C: Checking for conflict between summary statistics using density ratio classes

Consider a check similar to that of Section 5.3, where now $p(\theta|S_{\text{obs},A})$ plays the role of the prior and $p(S_{\text{obs},B}|S_{\text{obs},A},\theta)$ that of the likelihood. Our prior-data conflict discrepancy, for the case where $S_{A}$ has been observed but before updating by $S_{B}$, is

\[
r(S_{B}|S_{\text{obs},A})=r(S_{\text{obs},A})\frac{p(S_{B}|S_{\text{obs},A},\hat{u})}{p(S_{B}|S_{\text{obs},A},\hat{l})}, \tag{S5}
\]

where

\[
p(S_{B}|S_{\text{obs},A},\hat{u}):=\frac{p(S_{\text{obs},A},S_{B};\hat{u})}{p(S_{\text{obs},A};\hat{u})} \tag{S6}
\]

and

\[
p(S_{B}|S_{\text{obs},A},\hat{l}):=\frac{p(S_{\text{obs},A},S_{B};\hat{l})}{p(S_{\text{obs},A};\hat{l})}. \tag{S7}
\]

Using (S6) and equation (13) in the main text we obtain

\[
p(S_{B}|S_{\text{obs},A},\hat{u})=\frac{\hat{u}(\theta)p(S_{\text{obs},A},S_{B}|\theta)}{\hat{u}(\theta|S_{\text{obs},A},S_{B})}\times\frac{\hat{u}(\theta|S_{\text{obs},A})}{\hat{u}(\theta)p(S_{\text{obs},A}|\theta)}, \tag{S8}
\]

and similarly from (S7) and equation (13) in the main text

\[
p(S_{B}|S_{\text{obs},A},\hat{l})=\frac{\hat{l}(\theta)p(S_{\text{obs},A},S_{B}|\theta)}{\hat{l}(\theta|S_{\text{obs},A},S_{B})}\times\frac{\hat{l}(\theta|S_{\text{obs},A})}{\hat{l}(\theta)p(S_{\text{obs},A}|\theta)}. \tag{S9}
\]

Also,

\[
r(S_{\text{obs},A})=r\frac{p(S_{\text{obs},A};\hat{u})}{p(S_{\text{obs},A};\hat{l})}=r\frac{\hat{u}(\theta)\hat{l}(\theta|S_{\text{obs},A})}{\hat{l}(\theta)\hat{u}(\theta|S_{\text{obs},A})}, \tag{S10}
\]

for any value of $\theta$, where the last equality above comes from equation (14) in the main text. Using the same value of $\theta$ in the expressions (S8), (S9) and (S10), we obtain for (S5)

\[
r(S_{B}|S_{\text{obs},A})=r\frac{\hat{u}(\theta)\hat{l}(\theta|S_{\text{obs},A},S_{B})}{\hat{l}(\theta)\hat{u}(\theta|S_{\text{obs},A},S_{B})}. \tag{S11}
\]

Using normalizing flow approximations $\widetilde{u}(\theta|S_{\text{obs},A},S_{B})$ and $\widetilde{l}(\theta|S_{\text{obs},A},S_{B})$ for $\hat{u}(\theta|S_{\text{obs},A},S_{B})$ and $\hat{l}(\theta|S_{\text{obs},A},S_{B})$ respectively gives an approximation of (S11), which we denote by $\widetilde{r}(S_{B}|S_{\text{obs},A})$.
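
In implementation, this quantity can be evaluated directly from the fitted flows. The sketch below (Python) assumes two conditional density estimators exposing a log_prob(theta, condition) method, together with user-supplied log densities of the normalized prior bounds; these names describe an assumed interface, not a particular library:

import numpy as np

def log_r_tilde(theta, S_obsA, S_B, flow_l, flow_u, log_lhat, log_uhat, log_r):
    """Log of the approximation to (S11), evaluated at a single theta.
    flow_l and flow_u are the flows trained under the priors hat-l and
    hat-u; log_r is log r for the prior density ratio class."""
    cond = np.concatenate([S_obsA, S_B])
    return (log_r + log_uhat(theta) - log_lhat(theta)
            + flow_l.log_prob(theta, cond) - flow_u.log_prob(theta, cond))

Since (S11) holds for any $\theta$, averaging log_r_tilde over a few draws of $\theta$ can help smooth out the flow approximation error.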

The density ratio class for the posterior density given $S_{A}=S_{\text{obs},A}$ has lower and upper bound functions

\[
r(S_{\text{obs},A})^{-1}\hat{l}(\theta|S_{\text{obs},A})\qquad\text{and}\qquad\hat{u}(\theta|S_{\text{obs},A}),
\]

where $\hat{l}(\theta|S_{\text{obs},A})$ and $\hat{u}(\theta|S_{\text{obs},A})$ are the posterior densities for $\theta$ given $S_{A}=S_{\text{obs},A}$ under the priors $\hat{l}(\theta)$ and $\hat{u}(\theta)$ respectively. This density ratio class leads to a density ratio class of conditional predictive densities of $S_{B}$ given $S_{A}=S_{\text{obs},A}$, $\psi_{l(S_{B}|S_{\text{obs},A}),u(S_{B}|S_{\text{obs},A})}$, where

\[
l(S_{B}|S_{\text{obs},A})=\int p(S_{B}|S_{\text{obs},A},\theta)\,r(S_{\text{obs},A})^{-1}\hat{l}(\theta|S_{\text{obs},A})\,d\theta
\]

and

\[
u(S_{B}|S_{\text{obs},A})=\int p(S_{B}|S_{\text{obs},A},\theta)\,\hat{u}(\theta|S_{\text{obs},A})\,d\theta.
\]

This density ratio class is larger than the set of predictive distributions $\{p(S_{B}|S_{\text{obs},A};\hat{\pi}):\hat{\pi}(\theta)\in\psi_{l,u}\}$, leading to conservative Bayesian predictive $p$-values when calibrating the discrepancy for our conflict check.

Write $A$ for the event $A=\{\widetilde{r}(S_{B}|S_{\text{obs},A})\geq\widetilde{r}(S_{\text{obs},B}|S_{\text{obs},A})\}$. An approximate calibration tail probability for the discrepancy (S5) is

\[
\overline{P}(A)=\frac{\int_{A}p(S_{B}|S_{\text{obs},A};\hat{u})\,dS_{B}}{\int_{A}p(S_{B}|S_{\text{obs},A};\hat{u})\,dS_{B}+r(S_{\text{obs},A})^{-1}\int_{A^{c}}p(S_{B}|S_{\text{obs},A};\hat{l})\,dS_{B}}. \tag{S12}
\]

The integrals in the above expression can be approximated by

\[
\int_{A}p(S_{B}|S_{\text{obs},A};\hat{u})\,dS_{B}\approx\frac{1}{V}\sum_{v=1}^{V}I\left\{\widetilde{r}(S_{v}^{u,B|A}|S_{\text{obs},A})\geq\widetilde{r}(S_{\text{obs},B}|S_{\text{obs},A})\right\},
\]

for $S_{v}^{u,B|A}\sim p(S_{B}|S_{\text{obs},A},\hat{u})$, $v=1,\dots,V$, and

\[
\int_{A^{c}}p(S_{B}|S_{\text{obs},A};\hat{l})\,dS_{B}\approx\frac{1}{V}\sum_{v=1}^{V}I\left\{\widetilde{r}(S_{v}^{l,B|A}|S_{\text{obs},A})<\widetilde{r}(S_{\text{obs},B}|S_{\text{obs},A})\right\},
\]

for $S_{v}^{l,B|A}\sim p(S_{B}|S_{\text{obs},A},\hat{l})$, $v=1,\dots,V$. Approximate simulation of replicates from $p(S_{B}|S_{\text{obs},A},\hat{u})$ could be done in many ways, but a simple approach is based on an ABC approximation. We simulate a large number $L$ of replicates $(\theta^{l},S_{A}^{l},S_{B}^{l})\sim\hat{u}(\theta)p(S|\theta)$, and then select from these the $V$ replicates for which $S_{A}^{l}$ is closest to $S_{\text{obs},A}$ in some distance, such as Euclidean distance. Simulation of replicates from $p(S_{B}|S_{\text{obs},A},\hat{l})$ is done similarly.
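
A minimal sketch of this ABC step, and of the resulting Monte Carlo version of (S12), is as follows (Python; prior_sampler, simulator and r_tilde are placeholders for the user's model and for the statistic approximated above):

import numpy as np

def conditional_replicates(prior_sampler, simulator, S_obsA, L, V, rng):
    """ABC draws from p(S_B | S_obs,A) under a given prior: simulate L
    replicates from the joint model and keep the V replicates whose
    S_A component is nearest to S_obs,A in Euclidean distance."""
    SA, SB = zip(*(simulator(prior_sampler(rng), rng) for _ in range(L)))
    SA, SB = np.asarray(SA), np.asarray(SB)
    nearest = np.argsort(np.linalg.norm(SA - S_obsA, axis=1))[:V]
    return SB[nearest]

def calibration_tail_prob(r_tilde, r_obs, SB_u, SB_l, r_SobsA):
    """Monte Carlo version of (S12), where r_tilde maps an S_B replicate
    to r~(S_B | S_obs,A) and r_obs = r~(S_obs,B | S_obs,A)."""
    p_u = np.mean([r_tilde(s) >= r_obs for s in SB_u])  # integral over A
    p_l = np.mean([r_tilde(s) < r_obs for s in SB_l])   # integral over A^c
    return p_u / (p_u + p_l / r_SobsA)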

Appendix D: Ricker model

We consider an additional time series application involving the Ricker model (Ricker,, 1954), a simple model for population sizes in ecology. Let $N_{t}$, $t=1,\dots,T$, be population sizes, with observations $d_{t}\sim\text{Poisson}(\phi N_{t})$, where $\phi$ is a sampling parameter. The series $N_{t}$ has some initial value $N_{0}$, and the one-step conditional distributions are defined by

\[
N_{t+1}=RN_{t}\exp(-N_{t}+e_{t+1}), \tag{S13}
\]

where $R$ is a growth parameter and $e_{t}\sim N(0,\sigma^{2})$ is an independent environmental noise series. We write $\theta=(\theta_{1},\theta_{2},\theta_{3})^{\top}=(\log\phi,\log R,\log\sigma)^{\top}$ for the parameters. For our “observed” data in this example we consider a simulated time series of length $T=100$ with parameter $\theta=(12,0.01,-0.75)^{\top}$. We use the summary statistics of Wood, (2010), consisting of a combination of marginal distribution summaries, autocorrelation values, parameter estimates from an auxiliary autoregressive model, and the number of zeros observed.
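
A direct simulator for this model is straightforward; the sketch below (Python) implements (S13) with the Poisson observation process, taking $N_{0}=1$ as an arbitrary choice since the initial value is not specified further here:

import numpy as np

def simulate_ricker(theta, T=100, N0=1.0, rng=None):
    """Simulate the Ricker model (S13) with observations
    d_t ~ Poisson(phi * N_t), where theta = (log phi, log R, log sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    phi, R, sigma = np.exp(np.asarray(theta))
    N, d = N0, np.empty(T, dtype=np.int64)
    for t in range(T):
        N = R * N * np.exp(-N + sigma * rng.standard_normal())
        d[t] = rng.poisson(phi * N)
    return d

# A series simulated at the parameter value used for the "observed" data:
d_sim = simulate_ricker([12.0, 0.01, -0.75], rng=np.random.default_rng(1))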

We will consider three different prior density ratio classes. In all three cases, the upper bound $u(\theta)$ is a uniform density for $\theta$ on the range $[11,13]\times[-0.02,0.04]\times[-2,-0.5]$. The lower bound functions $l(\theta)$ differ across the three cases. The first case (hereafter case 1) takes $3l(\theta)=\hat{l}(\theta)$ to be a uniform density on $[11.2,12.8]\times[-0.02,0.04]\times[-2,-0.5]$. The second case (hereafter case 2) takes $3l(\theta)=\hat{l}(\theta)$ to be a uniform density on $[11,13]\times[-0.01,0.03]\times[-2,-0.5]$. The third case (hereafter case 3) takes $3l(\theta)=\hat{l}(\theta)$ to be a uniform density on $[11,13]\times[-0.02,0.04]\times[-1.8,-0.7]$. In case 1, $\hat{l}(\theta)$ and $\hat{u}(\theta)$ are densities under which $\theta_{1},\theta_{2}$ and $\theta_{3}$ are independent, with the marginal densities for $\theta_{2}$ and $\theta_{3}$ the same for $\hat{l}(\theta)$ and $\hat{u}(\theta)$, whereas the densities for $\theta_{1}$ differ. So case 1 accommodates densities where the marginal prior for $\theta_{1}$ can approach zero near the edges of its support. Case 2 is similar, but allows a wider range of shapes for the marginal prior density of $\theta_{2}$, whereas case 3 is more flexible in the shape of the marginal prior density of $\theta_{3}$.

We use the offline training method in Radev et al., (2023) with $100{,}000$ simulated samples for each of $\hat{l}(\theta)$ and $\hat{u}(\theta)$ to obtain posterior approximations $\widetilde{l}(\theta|S)$ and $\widetilde{u}(\theta|S)$. Setting $S=S_{\text{obs}}$ and computing $\widetilde{r}(S_{\text{obs}})$ using equation (15) in the main text gives estimated lower and upper bounds for the posterior density ratio class. Figure S1 shows estimated lower and upper bound functions $\widetilde{r}(S_{\text{obs}})^{-1}\widetilde{l}(\theta_{j}|S_{\text{obs}})$ and $\widetilde{u}(\theta_{j}|S_{\text{obs}})$ for the marginal posterior density ratio classes for the three parameters, $j=1,2,3$. We see that the additional flexibility in the shape of the prior within the prior density ratio class for $\log\phi$, $\log R$ and $\log\sigma$ in cases 1, 2 and 3 respectively is reflected in increased posterior ambiguity for these parameters in the corresponding cases. Appendix E shows diagnostics of the accuracy of the amortized posterior approximations.
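
Given posterior draws from the two fitted flows and the estimate $\widetilde{r}(S_{\text{obs}})$, the marginal bound curves plotted in Figure S1 can be formed as in the following sketch (Python; estimating the flow marginals by a kernel density estimate is our illustrative choice):

import numpy as np
from scipy.stats import gaussian_kde

def marginal_bounds(draws_l, draws_u, r_tilde_Sobs, j, grid):
    """Lower and upper bound curves for parameter j of the marginal
    posterior density ratio class: r~(S_obs)^{-1} l~(theta_j | S_obs)
    and u~(theta_j | S_obs), with the marginals of the flow posteriors
    estimated by KDEs of posterior draws."""
    lower = gaussian_kde(draws_l[:, j])(grid) / r_tilde_Sobs
    upper = gaussian_kde(draws_u[:, j])(grid)
    return lower, upper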

Figure S1: Estimated upper bound and lower bound functions for marginal posterior density ratio classes for case 1 (first row), case 2 (second row) and case 3 (third row) for the Ricker example.

Appendix E: Additional plots for the examples

For the Poisson model of Section 6.2, the lower and upper bound functions of the prior density ratio class are plotted in Figure S2.

Figure S2: Lower and upper bound functions for the prior density ratio class for the Poisson example.

The calibration plots below provide checks of the adequacy of the amortized inference approximations based on simulation-based calibration (Talts et al.,, 2018) for the examples in Section 6 and the Ricker model example in Appendix D. The idea of simulation-based calibration is to draw parameters and data from the joint Bayesian model, obtain approximate posterior samples given each synthetic dataset, and compute the rank of each prior draw within the corresponding marginal posterior samples. The plots below show the difference between the empirical distribution of the probability integral transform (PIT) values and the uniform distribution, together with simultaneous confidence bands. If the line falls outside the bands, this indicates inadequacy of the computational approximation (Säilynoja et al.,, 2022).
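
The following sketch (Python) shows the rank/PIT computation underlying these plots for a scalar parameter; posterior_sampler stands in for the amortized posterior approximation, and its interface is assumed:

import numpy as np

def sbc_pit(prior_sampler, simulator, posterior_sampler,
            M=1000, n_post=250, rng=None):
    """Simulation-based calibration: for each of M replications, draw a
    parameter from the prior, simulate data, draw approximate posterior
    samples, and record the PIT value of the prior draw."""
    rng = np.random.default_rng() if rng is None else rng
    pit = np.empty(M)
    for m in range(M):
        theta0 = prior_sampler(rng)
        S = simulator(theta0, rng)
        post = posterior_sampler(S, n_post)
        pit[m] = np.mean(post < theta0)
    return pit

Under an accurate approximation the PIT values are approximately uniform, and the plots compare their empirical distribution function with the uniform one, with simultaneous confidence bands.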

Figure S3: Calibration plots for diagnosing accuracy of amortized posterior computation for different summary statistic choices for the Poisson example: (a) $\bar{y}$; (b) $(\bar{y},s^{2})$; (c) $s^{2}$. The first row is for the lower bound and the second row is for the upper bound.

Figure S4: Calibration plots for diagnosing accuracy of amortized posterior computation for case 1 for lower and upper bounds for the Ricker example. The first row is for the lower bound and the second row is for the upper bound.

Figure S5: Calibration plots for diagnosing accuracy of amortized posterior computation for case 2 for lower and upper bounds for the Ricker example. The first row is for the lower bound and the second row is for the upper bound.

Figure S6: Calibration plots for diagnosing accuracy of amortized posterior computation for case 3 for lower and upper bounds for the Ricker example. The first row is for the lower bound and the second row is for the upper bound.

Figure S7: Calibration plots for diagnosing accuracy of amortized posterior computation for case 1 for lower and upper bounds for the Fowler’s toad example. The first row is for the lower bound and the second row is for the upper bound.

Figure S8: Calibration plots for diagnosing accuracy of amortized posterior computation for case 2 for lower and upper bounds for the Fowler’s toad example. The first row is for the lower bound and the second row is for the upper bound.

Figure S9: Calibration plots for diagnosing accuracy of amortized posterior computation for case 3 for lower and upper bounds for the Fowler’s toad example. The first row is for the lower bound and the second row is for the upper bound.

References

  • Al Labadi and Evans, (2017) Al Labadi, L. and Evans, M. (2017). Optimal robustness results for relative belief inferences and the relationship to prior-data conflict. Bayesian Analysis, 12(3):705–728.
  • Bayarri and Berger, (2000) Bayarri, M. J. and Berger, J. O. (2000). P values for composite null models (with discussion). Journal of the American Statistical Association, 95(452):1127–1142.
  • Bayarri and Castellanos, (2007) Bayarri, M. J. and Castellanos, M. E. (2007). Bayesian checking of the second levels of hierarchical models. Statistical Science, 22(3):322–343.
  • Berger, (1990) Berger, J. O. (1990). Robust Bayesian analysis: sensitivity to the prior. Journal of Statistical Planning and Inference, 25(3):303–328.
  • Berger, (1994) Berger, J. O. (1994). An overview of robust Bayesian analysis. Test, 3(1):5–124.
  • Bissiri et al., (2016) Bissiri, P. G., Holmes, C. C., and Walker, S. G. (2016). A general framework for updating belief distributions. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 78(5):1103–1130.
  • Box, (1980) Box, G. E. P. (1980). Sampling and Bayes’ inference in scientific modelling and robustness (with discussion). Journal of the Royal Statistical Society, Series A, 143(4):383–430.
  • Cannon et al., (2022) Cannon, P., Ward, D., and Schmon, S. M. (2022). Investigating the impact of model misspecification in neural simulation-based inference. arXiv preprint arXiv:2209.01845.
  • Chakraborty et al., (2023) Chakraborty, A., Nott, D. J., and Evans, M. (2023). Weakly informative priors and prior-data conflict checking for likelihood-free inference. Statistics and Its Interface, 16(3):445–457.
  • Chang et al., (2025) Chang, P. E., Loka, N., Huang, D., Remes, U., Kaski, S., and Acerbi, L. (2025). Amortized probabilistic conditioning for optimization, simulation and inference. arXiv preprint arXiv:2410.15320.
  • Cherief-Abdellatif and Alquier, (2020) Cherief-Abdellatif, B.-E. and Alquier, P. (2020). MMD-Bayes: Robust Bayesian estimation via maximum mean discrepancy. In Zhang, C., Ruiz, F., Bui, T., Dieng, A. B., and Liang, D., editors, Proceedings of The 2nd Symposium on Advances in Approximate Bayesian Inference, volume 118 of Proceedings of Machine Learning Research, pages 1–21. PMLR.
  • Coolen, (1994) Coolen, F. P. A. (1994). On Bernoulli experiments with imprecise prior probabilities. Journal of the Royal Statistical Society. Series D (The Statistician), 43(1):155–167.
  • Dellaporta et al., (2022) Dellaporta, C., Knoblauch, J., Damoulas, T., and Briol, F.-X. (2022). Robust Bayesian inference for simulator-based models via the MMD posterior bootstrap. In Camps-Valls, G., Ruiz, F. J. R., and Valera, I., editors, Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pages 943–970. PMLR.
  • DeRobertis and Hartigan, (1981) DeRobertis, L. and Hartigan, J. A. (1981). Bayesian inference using intervals of measures. The Annals of Statistics, 9(2):235–244.
  • Dinh et al., (2016) Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2016). Density estimation using real NVP. arXiv preprint arXiv:1605.08803.
  • Durkan et al., (2019) Durkan, C., Bekasov, A., Murray, I., and Papamakarios, G. (2019). Neural spline flows. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc.
  • Evans and Jang, (2011) Evans, M. and Jang, G. H. (2011). A limit result for the prior predictive applied to checking for prior-data conflict. Statistics and Probability Letters, 81(8):1034 – 1038.
  • Evans and Moshonov, (2006) Evans, M. and Moshonov, H. (2006). Checking for prior-data conflict. Bayesian Analysis, 1(4):893–914.
  • Frazier and Drovandi, (2021) Frazier, D. T. and Drovandi, C. (2021). Robust approximate Bayesian inference with synthetic likelihood. Journal of Computational and Graphical Statistics, 30(4):958–976.
  • Frazier et al., (2024) Frazier, D. T., Kelly, R., Drovandi, C., and Warne, D. J. (2024). The statistical accuracy of neural posterior and likelihood estimation. arXiv preprint arXiv:2411.12068.
  • Frazier et al., (2025) Frazier, D. T., Nott, D. J., and Drovandi, C. (2025). Synthetic likelihood in misspecified models. Journal of the American Statistical Association, (To appear).
  • Frazier et al., (2023) Frazier, D. T., Nott, D. J., Drovandi, C., and Kohn, R. (2023). Bayesian inference using synthetic likelihood: Asymptotics and adjustments. Journal of the American Statistical Association, 118(544):2821–2832.
  • Frazier et al., (2020) Frazier, D. T., Robert, C. P., and Rousseau, J. (2020). Model misspecification in approximate Bayesian computation: consequences and diagnostics. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(2):421–444.
  • Gao et al., (2023) Gao, R., Deistler, M., and Macke, J. H. (2023). Generalized Bayesian inference for scientific simulators via amortized cost estimation. In Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S., editors, Advances in Neural Information Processing Systems, volume 36, pages 80191–80219. Curran Associates, Inc.
  • Gelman et al., (1996) Gelman, A., Meng, X.-L., and Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4):733–807.
  • Giummolè et al., (2019) Giummolè, F., Mameli, V., Ruli, E., and Ventura, L. (2019). Objective Bayesian inference with proper scoring rules. TEST, 28(3):728–755.
  • Glöckler et al., (2022) Glöckler, M., Deistler, M., and Macke, J. H. (2022). Variational methods for simulation-based inference. arXiv preprint arXiv:2203.04176.
  • Gloeckler et al., (2024) Gloeckler, M., Deistler, M., Weilbach, C., Wood, F., and Macke, J. H. (2024). All-in-one simulation-based inference. arXiv preprint arXiv:2404.09636.
  • Greenberg et al., (2019) Greenberg, D. S., Nonnenmacher, M., and Macke, J. H. (2019). Automatic posterior transformation for likelihood-free inference. In Chaudhuri, K. and Salakhutdinov, R., editors, Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 2404–2414. PMLR.
  • Guttman, (1967) Guttman, I. (1967). The use of the concept of a future observation in goodness-of-fit problems. Journal of the Royal Statistical Society, Series B, 29(1):83–100.
  • Hermans et al., (2020) Hermans, J., Begy, V., and Louppe, G. (2020). Likelihood-free MCMC with amortized approximate ratio estimators. In III, H. D. and Singh, A., editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 4239–4248. PMLR.
  • Huang et al., (2023) Huang, D., Bharti, A., Souza, A., Acerbi, L., and Kaski, S. (2023). Learning robust statistics for simulation-based inference under model misspecification. In Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S., editors, Advances in Neural Information Processing Systems, volume 36, pages 7289–7310. Curran Associates, Inc.
  • Kelly et al., (2024) Kelly, R., Nott, D. J., Frazier, D. T., Warne, D., and Drovandi, C. (2024). Misspecification-robust sequential neural likelihood for simulation-based inference. Transactions on Machine Learning Research, (Article number: 2347).
  • Kelly et al., (2025) Kelly, R. P., Warne, D. J., Frazier, D. T., Nott, D. J., Gutmann, M. U., and Drovandi, C. (2025). Simulation-based Bayesian inference under model misspecification. arXiv preprint arXiv:2503.12315.
  • Lewis et al., (2021) Lewis, J. R., MacEachern, S. N., and Lee, Y. (2021). Bayesian restricted likelihood methods: Conditioning on insufficient statistics in Bayesian regression (with discussion). Bayesian Analysis, 16(4):1393 – 2854.
  • Liu et al., (2009) Liu, F., Bayarri, M. J., and Berger, J. O. (2009). Modularization in Bayesian analysis, with emphasis on analysis of computer models. Bayesian Analysis, 4(1):119–150.
  • Lueckmann et al., (2019) Lueckmann, J.-M., Bassetto, G., Karaletsos, T., and Macke, J. H. (2019). Likelihood-free inference with emulator networks. In Ruiz, F., Zhang, C., Liang, D., and Bui, T., editors, Proceedings of The 1st Symposium on Advances in Approximate Bayesian Inference, volume 96 of Proceedings of Machine Learning Research, pages 32–53. PMLR.
  • Lueckmann et al., (2017) Lueckmann, J.-M., Goncalves, P. J., Bassetto, G., Öcal, K., Nonnenmacher, M., and Macke, J. H. (2017). Flexible statistical inference for mechanistic models of neural dynamics. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.
  • Mao et al., (2021) Mao, Y., Wang, X., Nott, D. J., and Evans, M. (2021). Detecting conflicting summary statistics in likelihood-free inference. Statistics and Computing, 31, Article 78.
  • Marchand et al., (2017) Marchand, P., Boenke, M., and Green, D. M. (2017). A stochastic movement model reproduces patterns of site fidelity and long-distance dispersal in a population of Fowler’s toads (Anaxyrus fowleri). Ecological Modelling, 360:63–69.
  • Marshall and Spiegelhalter, (2007) Marshall, E. C. and Spiegelhalter, D. J. (2007). Identifying outliers in Bayesian hierarchical models: a simulation-based approach. Bayesian Analysis, 2(2):409–444.
  • Matsubara et al., (2022) Matsubara, T., Knoblauch, J., Briol, F.-X., and Oates, C. J. (2022). Robust generalised Bayesian inference for intractable likelihoods. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(3):997–1022.
  • Moran et al., (2023) Moran, G. E., Blei, D. M., and Ranganath, R. (2023). Holdout predictive checks for Bayesian model criticism. Journal of the Royal Statistical Society Series B: Statistical Methodology, 86(1):194–214.
  • Nott et al., (2020) Nott, D. J., Wang, X., Evans, M., and Englert, B.-G. (2020). Checking for prior-data conflict using prior-to-posterior divergences. Statistical Science, 35(2):234–253.
  • Pacchiardi et al., (2024) Pacchiardi, L., Khoo, S., and Dutta, R. (2024). Generalized Bayesian likelihood-free inference. Electronic Journal of Statistics, 18(2):3628 – 3686.
  • Papamakarios and Murray, (2016) Papamakarios, G. and Murray, I. (2016). Fast ϵ\epsilon-free inference of simulation models with Bayesian conditional density estimation. In Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc.
  • Papamakarios et al., (2021) Papamakarios, G., Nalisnick, E., Rezende, D. J., Mohamed, S., and Lakshminarayanan, B. (2021). Normalizing flows for probabilistic modeling and inference. Journal of Machine Learning Research, 22(57):1–64.
  • Pericchi and Walley, (1991) Pericchi, L. R. and Walley, P. (1991). Robust Bayesian credible intervals and prior ignorance. International Statistical Review, 59(1):1–23.
  • Presanis et al., (2013) Presanis, A. M., Ohlssen, D., Spiegelhalter, D. J., and Angelis, D. D. (2013). Conflict diagnostics in directed acyclic graphs, with applications in Bayesian evidence synthesis. Statistical Science, 28(3):376–397.
  • Price et al., (2018) Price, L. F., Drovandi, C. C., Lee, A. C., and Nott, D. J. (2018). Bayesian synthetic likelihood. Journal of Computational and Graphical Statistics, 27(1):1–11.
  • Radev et al., (2022) Radev, S. T., Mertens, U. K., Voss, A., Ardizzone, L., and Köthe, U. (2022). Bayesflow: Learning complex stochastic models with invertible neural networks. IEEE Transactions on Neural Networks and Learning Systems, 33(4):1452–1466.
  • Radev et al., (2023) Radev, S. T., Schmitt, M., Pratz, V., Picchini, U., Köthe, U., and Bürkner, P.-C. (2023). JANA: Jointly amortized neural approximation of complex Bayesian models. In Evans, R. J. and Shpitser, I., editors, Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, volume 216 of Proceedings of Machine Learning Research, pages 1695–1706. PMLR.
  • Ratmann et al., (2009) Ratmann, O., Andrieu, C., Wiuf, C., and Richardson, S. (2009). Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences, 106(26):10576–10581.
  • Ricker, (1954) Ricker, W. (1954). Stock and recruitment. Journal of the Fisheries Research Board of Canada, 11(5):559–623.
  • Rinderknecht et al., (2014) Rinderknecht, S. L., Albert, C., Borsuk, M. E., Schuwirth, N., Künsch, H. R., and Reichert, P. (2014). The effect of ambiguous prior knowledge on Bayesian model parameter inference and prediction. Environmental Modelling and Software, 62:300–315.
  • Rinderknecht et al., (2011) Rinderknecht, S. L., Borsuk, M. E., and Reichert, P. (2011). Eliciting density ratio classes. International Journal of Approximate Reasoning, 52(6):792–804.
  • Säilynoja et al., (2022) Säilynoja, T., Bürkner, P.-C., and Vehtari, A. (2022). Graphical test for discrete uniformity and its applications in goodness-of-fit evaluation and multiple sample comparison. Statistics and Computing, 32(2):32.
  • Scheel et al., (2011) Scheel, I., Green, P. J., and Rougier, J. C. (2011). A graphical diagnostic for identifying influential model choices in Bayesian hierarchical models. Scandinavian Journal of Statistics, 38(3):529–550.
  • Schmitt et al., (2024) Schmitt, M., Bürkner, P.-C., Köthe, U., and Radev, S. T. (2024). Detecting model misspecification in amortized Bayesian inference with neural networks. In Köthe, U. and Rother, C., editors, Pattern Recognition, pages 541–557, Cham. Springer Nature Switzerland.
  • Schmon et al., (2021) Schmon, S. M., Cannon, P. W., and Knoblauch, J. (2021). Generalized posteriors in approximate Bayesian computation. In Third Symposium on Advances in Approximate Bayesian Inference.
  • Sisson et al., (2018) Sisson, S., Fan, Y., and Beaumont, M. (2018). Overview of Approximate Bayesian Computation. In Sisson, S., Fan, Y., and Beaumont, M., editors, Handbook of Approximate Bayesian Computation, Chapman & Hall/CRC Handbooks of Modern Statistical Methods. CRC Press, Taylor & Francis Group, Boca Raton, Florida.
  • Steinbakk and Storvik, (2009) Steinbakk, G. H. and Storvik, G. O. (2009). Posterior predictive p-values in Bayesian hierarchical models. Scandinavian Journal of Statistics, 36(2):320–336.
  • Talts et al., (2018) Talts, S., Betancourt, M., Simpson, D., Vehtari, A., and Gelman, A. (2018). Validating Bayesian inference algorithms with simulation-based calibration. arXiv preprint arXiv:1804.06788.
  • Tavaré et al., (1997) Tavaré, S., Balding, D. J., Griffiths, R. C., and Donnelly, P. (1997). Inferring coalescence times from DNA sequence data. Genetics, 145(2):505–518.
  • Thomas et al., (2022) Thomas, O., Dutta, R., Corander, J., Kaski, S., and Gutmann, M. U. (2022). Likelihood-Free Inference by Ratio Estimation. Bayesian Analysis, 17(1):1 – 31.
  • Walley, (1991) Walley, P. (1991). Statistical Reasoning with Imprecise Probabilities. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis.
  • Walter and Augustin, (2009) Walter, G. and Augustin, T. (2009). Imprecision and prior-data conflict in generalized Bayesian inference. Journal of Statistical Theory and Practice, 3:255–271.
  • Walter and Augustin, (2010) Walter, G. and Augustin, T. (2010). Bayesian linear regression — different conjugate models and their (in)sensitivity to prior-data conflict. In Kneib, T. and Tutz, G., editors, Statistical Modelling and Regression Structures: Festschrift in Honour of Ludwig Fahrmeir, pages 59–78, Heidelberg. Physica-Verlag HD.
  • Walter and Coolen, (2016) Walter, G. and Coolen, F. P. A. (2016). Sets of priors reflecting prior-data conflict and agreement. In Carvalho, J. P., Lesot, M.-J., Kaymak, U., Vieira, S., Bouchon-Meunier, B., and Yager, R. R., editors, Information Processing and Management of Uncertainty in Knowledge-Based Systems, pages 153–164, Cham. Springer International Publishing.
  • Ward et al., (2022) Ward, D., Cannon, P., Beaumont, M., Fasiolo, M., and Schmon, S. (2022). Robust neural posterior estimation and statistical model criticism. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A., editors, Advances in Neural Information Processing Systems, volume 35, pages 33845–33859. Curran Associates, Inc.
  • Wasserman, (1992) Wasserman, L. (1992). Invariance properties of density ratio priors. The Annals of Statistics, 20(4):2177 – 2182.
  • Weerasinghe et al., (2025) Weerasinghe, C., Loaiza-Maya, R., Martin, G. M., and Frazier, D. T. (2025). ABC-based forecasting in misspecified state space models. International Journal of Forecasting, 41(1):270–289.
  • Wilkinson, (2013) Wilkinson, R. D. (2013). Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Statistical Applications in Genetics and Molecular Biology, 12(2):129 – 141.
  • Wiqvist et al., (2021) Wiqvist, S., Frellsen, J., and Picchini, U. (2021). Sequential neural posterior and likelihood approximation. arXiv preprint arXiv:2102.06522.
  • Wood, (2010) Wood, S. N. (2010). Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466(7310):1102–1104.
  • Zammit-Mangion et al., (2025) Zammit-Mangion, A., Sainsbury-Dale, M., and Huser, R. (2025). Neural methods for amortized inference. Annual Review of Statistics and Its Application, 12:311–335.