
Towards Saner Deep Image Registration

Bin Duan    Ming Zhong    Yan Yan
bduan2@hawk.iit.edu  {mzhong3, yyan34}@iit.edu
Illinois Tech, Chicago, IL 60616, USA
https://github.com/tuffr5/Saner-deep-registration
Corresponding author
Abstract

With recent advances in computing hardware and a surge of deep-learning architectures, learning-based deep image registration methods have surpassed their traditional counterparts in terms of metric performance and inference time. However, these methods focus on improving performance measurements such as Dice, resulting in less attention given to model behaviors that are equally desirable for registration, especially in medical imaging. This paper investigates these behaviors for popular learning-based deep registrations under a sanity-checking microscope. We find that most existing registrations suffer from low inverse consistency and nondiscrimination of identical pairs due to overly optimized image similarities. To rectify these behaviors, we propose a novel regularization-based sanity-enforcer method that imposes two sanity checks on the deep model to reduce its inverse consistency errors and increase its discriminative power simultaneously. Moreover, we derive a set of theoretical guarantees for our sanity-checked image registration method, with experimental results supporting our theoretical findings and their effectiveness in increasing the sanity of models without sacrificing any performance.

1 Introduction

Learning maps between images or spaces, i.e., registration, is an important task and has been widely studied in various fields, such as computer vision [15, 33], medical imaging [20, 52], and brain mapping [35, 47]. With recent advances in modern computing hardware and deep-learning techniques, learning-based deep image registration methods have surpassed their traditional counterparts, both in terms of metric performance and inference time. Different from the traditional style of optimizing on a single image pair [12, 32, 7, 11, 2, 28, 19] using diffeomorphic formulations, such as elastic [4, 38], fluid mechanics [7, 18, 43], or B-spline [36], existing deep registrations [42, 6, 13, 29, 23, 10, 41, 25, 9] focus on maximizing image similarities between transformed moving images and fixed images. Despite the effectiveness of this approach, it inevitably leads to over-optimization of image similarities and thus introduces non-smooth mappings [6, 50, 16, 9], whereas smooth transformation maps are typically desirable, especially in the medical imaging domain.

Figure 1: FV-SDice-Dice comparisons of deep registrations on IXI Brain dataset. The vertical axis is FV (% of folded voxels), the horizontal axis is SDice (Self-Dice), and the circle size is Dice. Both sanity-checked models (VM-ESC and TMBS-ESC) achieve better diffeomorphism, competitive registration performance, and significantly improved self-sanity, compared to other models, including models with inverse consistency (ICNet [50], ICON [16]).

To tackle this over-optimization issue, popular remedies [4, 36, 44, 37, 49, 45, 39, 40] utilize add-ons such as Large Deformations Diffeomorphic Metric Mapping (LDDMM) [7], vector Stationary Velocity Field (vSVF) [39], B-spline [36], Elastic [4], or Demons [43] to enforce diffeomorphism, requiring costly and iterative numerical optimizations [8, 49, 39]. Other methods [13, 17] seek a probabilistic formulation for the registration but can lead to inferior performance [9]. Nonetheless, these methods operate only on one mapping in the direction from moving to fixed images, disregarding the relationship between the different mappings from both directions, as shown in Fig. A1 in the Appendix (available at https://arxiv.org/pdf/2307.09696.pdf).

Recent studies [50, 16, 31] have made great progress in modeling the relationship, i.e., inverse consistency, between different mappings. [50, 16] explicitly enforce the relationship in a strict form for ideal inverse consistency. However, for applications such as brain tumor registration [3], where there are regions with no valid correspondences, it is impractical to apply the constraint in such a strict manner. To select valid correspondences, [31] utilize mean errors of similar anatomical locations as thresholds. However, it is always tricky to determine similar anatomical locations, especially for unsupervised registrations in medical imaging [6, 13, 41]. Different from previous methods [50, 16, 31], we introduce two straightforward yet effective constraints, namely, a self-sanity check to reduce error when registering identical pairs and a cross-sanity check to ensure inverse consistency. Using our sanity checks, we test a wide range of registration models and find that despite better performance such as Dice, most models suffer from over-optimization of image similarities, leading to folded transformations and low sanity awareness, as shown in Fig. 1. Moreover, our sanity checks can be applied to different registration models; extensive experiments certify that our method improves their sanity awareness without sacrificing any performance.

Our contributions are five-fold: (1) We find that despite better performance such as Dice, most models produce non-smooth transformations and are less aware of sanity errors due to over-optimization of image similarities; (2) We propose two novel sanity checks for registration model training and derive the corresponding theoretical foundations; (3) Our sanity checks not only help reduce the sanity errors of existing models but also assist them in producing more regular maps without diffeomorphic add-ons; (4) Our proposed sanity checks are model-agnostic: they can be deployed to various models and are only needed during training, so they generate no side effects at inference; (5) Experiments on the IXI [1], OASIS [26], and BraTSReg [3] datasets verify our findings and show on-par or better performance.

2 Background and Related Work

Background. Consider a set of images defined on the domain $\Omega\subseteq\mathbb{R}^{n}$ ($n$-D spatial domain) of some Hilbert space $H$. Let $f$ be the fixed image and $m$ be the moving image, both defined on $\Omega$. Let $u:\Omega\rightarrow\mathbb{R}^{n}$ be a displacement vector field that maps from $H\times H\rightarrow H^{\star}$ (dual space of $H$) such that $\varphi=u+p$, where $p$ is the image grid of coordinates. $\varphi$ denotes the transformation from $m\rightarrow f$. The goal of registration is to compute such a transformation $\varphi$, whose quality can be measured by some similarity function of the form $\operatorname{Sim}(f,m\circ\varphi)$, where $m\circ\varphi$ denotes $m$ warped by $\varphi$. Without loss of generality, $\operatorname{Sim}(\cdot)$ can be replaced by any differentiable similarity function, such as Normalized Cross-Correlation (NCC) or Mutual Information (MI). Alternatively, we can use a distance function such as the sum of squared differences (SSD) by replacing $-\operatorname{Sim}(\cdot)$ with $\operatorname{Dist}(\cdot)$. Hereafter, we stick to the similarity notation for later derivations.
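For concreteness, the warping operation $m\circ\varphi$ can be implemented with a differentiable grid sampler. Below is a minimal PyTorch sketch under our own conventions (displacement fields of shape (B, 3, D, H, W) in voxel units); it illustrates the operator and is not the paper's released implementation:

```python
import torch
import torch.nn.functional as F

def warp(image, u):
    """Compute image ∘ φ with φ = u + p: resample `image` (B, C, D, H, W)
    at coordinates p + u(p) via trilinear interpolation."""
    B, _, D, H, W = image.shape
    zs, ys, xs = torch.meshgrid(
        torch.arange(D), torch.arange(H), torch.arange(W), indexing="ij")
    p = torch.stack((zs, ys, xs)).float().to(image.device)  # identity grid p
    phi = p.unsqueeze(0) + u                                # φ = u + p
    # grid_sample expects coordinates normalized to [-1, 1], ordered (x, y, z)
    norm = [2.0 * phi[:, i] / (s - 1) - 1.0 for i, s in enumerate((D, H, W))]
    grid = torch.stack(norm[::-1], dim=-1)                  # (B, D, H, W, 3)
    return F.grid_sample(image, grid, mode="bilinear", align_corners=True)
```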

Let $g$ be the mapping function, parameterized by the model, where $u=g^{m\rightarrow f}\coloneqq g(m,f)$ stands for the displacement map from $m\rightarrow f$. In terms of such a $g$ learned by the model, we assume that the similarity operator is concave, making $-\operatorname{Sim}$ convex when it is not a distance operator for an $(m,f)$ pair. Therefore, the optimization in standard learning-based registrations can be formulated as

$\min\,-\operatorname{Sim}(f,m\circ(g^{m\rightarrow f}+p))+\lambda_{r}\|\operatorname{Reg}(g^{m\rightarrow f})\|^{2}_{2},$ (1)

trying to find such a $g$ for an $(m,f)$ pair in the mapping search space. Here, the $\operatorname{Reg}(\cdot)$ term is added to smooth the transformation, penalizing sudden changes in the map. Most commonly, $\operatorname{Reg}(\cdot)$ takes the form of $\nabla(u)$ (first-order spatial derivative) or $\nabla^{2}(u)$ (second-order spatial derivative), where the $L^{2}$ norm of the image gradient is generally adopted in medical registration, resulting in an $H^{1}$ regularization.
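The objective in Eq. 1 can then be sketched as below, again in our own conventions (a global NCC is shown for brevity; VoxelMorph-style implementations typically use a local windowed NCC):

```python
def smoothness_loss(u):
    """||Reg(g)||² with Reg = ∇: squared forward differences of the
    displacement field u (B, 3, D, H, W) along each spatial axis."""
    dz = u[:, :, 1:] - u[:, :, :-1]
    dy = u[:, :, :, 1:] - u[:, :, :, :-1]
    dx = u[:, :, :, :, 1:] - u[:, :, :, :, :-1]
    return (dz ** 2).mean() + (dy ** 2).mean() + (dx ** 2).mean()

def ncc_loss(f, m_warped, eps=1e-5):
    """-Sim(f, m ∘ φ) with Sim = global normalized cross-correlation."""
    f_c = f - f.mean(dim=(2, 3, 4), keepdim=True)
    m_c = m_warped - m_warped.mean(dim=(2, 3, 4), keepdim=True)
    num = (f_c * m_c).sum(dim=(2, 3, 4))
    den = ((f_c ** 2).sum(dim=(2, 3, 4)) * (m_c ** 2).sum(dim=(2, 3, 4))).sqrt()
    return -(num / (den + eps)).mean()

# Eq. (1): ncc_loss(f, warp(m, g_mf)) + lambda_r * smoothness_loss(g_mf)
```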

Method | Relaxation/Approximation | Error Bound
ICNet [50] | $\|g^{m\rightarrow f}+\tilde{g}^{f\rightarrow m}\|_{F}^{2}$ | -
ICON [16] | $\|\varphi^{m\rightarrow f}\circ\varphi^{f\rightarrow m}-id\|_{2}^{2}+\epsilon^{2}\|d\varphi^{m\rightarrow f}\sqrt{\operatorname{Jac}(\varphi^{m\rightarrow f})}\|_{2}^{2}$ | -
DIRAC [31] | $|g^{m\rightarrow f}+\tilde{g}^{f\rightarrow m}|_{2}<\frac{1}{N}\Sigma_{N}|g^{m\rightarrow f}+\tilde{g}^{f\rightarrow m}|_{2}+\beta$ | -
Ours | $|g^{m\rightarrow f}+\tilde{g}^{f\rightarrow m}|^{2}<\alpha(|g^{m\rightarrow f}|^{2}+|\tilde{g}^{f\rightarrow m}|^{2})+\beta$ | $\lambda_{c}(1-\alpha)\beta N$
Table 1: Comparison between different inverse-consistent registration methods. Due to the space limit, the formulations are shown only for $m\rightarrow f$; the $f\rightarrow m$ direction follows analogously.
Definition 1 (Ideal/Strict inverse consistency).

Given two different mappings $g^{m\rightarrow f}$ and $g^{f\rightarrow m}$, if $\varphi^{m\rightarrow f}\circ\varphi^{f\rightarrow m}=id$, where $id$ denotes the identity transformation, we call these two mappings strictly inverse consistent. The strict inverse consistency is equivalently formulated as $g^{m\rightarrow f}+\tilde{g}^{f\rightarrow m}=0$, where $\tilde{g}^{f\rightarrow m}$ is back-projected from $g^{f\rightarrow m}$.

Relation to other inverse consistent methods. We show the relationship between methods in Tab. 1. The formulation of ICNet [50] follows the strict inverse consistency in Def. 1 with the Frobenius norm. Besides strict inverse consistency, ICON [16] adds regularization using the determinant of the Jacobian matrix of the transformation. Instead of explicitly enforcing strict consistency as in [50, 16], DIRAC [31] and our approach both relax the inverse-consistency constraint. This inequality formulation allows us to tell whether a voxel has a valid correspondence, which is practically useful when registering image pairs with topological changes, e.g., brain tumor registration [31, 30]. Different from [31], where means of inverse errors are utilized, we closely associate our formulation with the data, allowing us to show that the errors are upper bounded without resorting to extra information to determine similar anatomical structures, which has not been studied in other related works. Experimentally, our relaxation shows better performance across various metrics. Besides, to the best of our knowledge, our work is the first to consider the self-sanity error directly on displacements, rather than on image similarities as in [23], for medical image registration.

3 Methodology

Figure 2: Self-sanity error map comparison. Left: without the self-sanity check; Right: with the self-sanity check. We unify the error maps' scale bars for a fair comparison.

3.1 Self-sanity and Cross-sanity Checks

To increase the model's discriminative power for identical input pairs, we propose our self-sanity check as

$g^{a\rightarrow a}=0,\quad\forall a\in\{m,f\},$ (2)

where the mapping function $g$ learned by any model is restricted to output zero displacements for identical pairs. Such identical pairs could be filtered out using similarity measurements; however, users are unlikely to perform such filtering, especially when they do not know that trained models would produce terrible predictions for identical pairs. Hence, our self-sanity check is a natural remedy.

Next, we enforce the inverse consistency on the different mappings of $g$ learned by the model. We search for a correspondence in the fixed image for every point in the moving image such that the transformations between the two images are consistent. For example, suppose that a point in the space of image $a$ is registered to the space of image $b$. If we register this point from image $b$ back to image $a$, the point should arrive at its original location in image $a$ (Def. 1). We first define the backward displacement map, as in optical flow studies [22, 27, 24], back-projected from $g^{b\rightarrow a}$:

$\tilde{g}^{b\rightarrow a}(p)=g^{b\rightarrow a}(p+g^{a\rightarrow b}(p)),$ (3)

making it convenient for calculation. We then introduce our cross-sanity check in the form of

$|g^{a\rightarrow b}+\tilde{g}^{b\rightarrow a}|^{2}<\alpha(|g^{a\rightarrow b}|^{2}+|\tilde{g}^{b\rightarrow a}|^{2})+\beta,$ (4)

$\forall(a,b)\in\{(m,f),(f,m)\}$. Here, we allow the estimation errors to grow linearly with the displacement magnitude, with slope $\alpha$ and intercept $\beta$. Instead of imposing zero tolerance between the forward and back-projected backward displacements [16, 31], we relax the inverse consistency with an error tolerance, defined by $\alpha$ and $\beta$, to allow occlusions, which is more practical. That is, this sanity check states that every point $p$ in the moving image $a$ should map back from the fixed image $b$ to its original place in image $a$ within a certain error tolerance. We then prove that this error tolerance is upper bounded.
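Using the warping sketch from Sec. 2, the back-projection of Eq. 3 is a one-liner (our naming; shapes as before):

```python
def back_project(g_ba, g_ab):
    """Eq. (3): g̃^{b→a}(p) = g^{b→a}(p + g^{a→b}(p)), i.e. resample the
    backward displacement field at the forward-mapped coordinates."""
    return warp(g_ba, g_ab)

# Under strict inverse consistency (Def. 1), the residual vanishes:
#   g_ab + back_project(g_ba, g_ab) ≈ 0 everywhere;
# the cross-sanity check (Eq. 4) instead only bounds it per voxel.
```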

Theorem 1 (Relaxed registration via cross-sanity check).

An ideal symmetric registration meets $\varphi^{a\rightarrow b}\circ\varphi^{b\rightarrow a}=id$, as defined in Def. 1. Then, a cross-sanity-checked registration is a relaxed solution to this ideal registration, satisfying

$\|g^{a\rightarrow b}+\tilde{g}^{b\rightarrow a}\|^{2}_{2}<\frac{\beta(2-\alpha)N}{1-\alpha}.$ (5)

Here, $0<\alpha<1$ and $\beta>0$, and $N$ is a constant representing the total number of pixels/voxels. Theoretically, our proposed cross-sanity check can be viewed as a relaxed version of the commonly used strict symmetric constraint. We also derive the upper bound for satisfying our forward/backward consistency check, as shown in Eq. 5; the derivation details are given in Appx. A.2. To sum up, our cross-sanity check allows a series of solutions to the forward/backward consistency in an enclosed set with a radius of $\sqrt{\frac{\beta(2-\alpha)N}{1-\alpha}}$.
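For intuition, plugging in the IXI-style setting used later ($\alpha=0.1$, $\beta=12$) gives a worked instance of Eq. 5 (illustrative numbers only):

```latex
\|g^{a\to b}+\tilde{g}^{b\to a}\|_2^2
   < \frac{\beta(2-\alpha)N}{1-\alpha}
   = \frac{12\times(2-0.1)}{1-0.1}\,N \approx 25.3\,N,
```

i.e., the squared inverse-consistency error, averaged over the $N$ voxels, is at most about $25.3$.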

Figure 3: Comparison between strict inverse-consistency-trained results (Top) and cross-sanity-checked results (Bottom). Our relaxed, sanity-checked result maintains a similar level of inverse consistency, as $\varphi^{m\rightarrow f}\circ\varphi^{f\rightarrow m}$ is close to the $id$ transformation (second column). We can also observe that ours produces a more regular map, compared to the folded map from the model trained with strict inverse consistency. Best viewed zoomed in.

3.2 Unique Minimizer for Single Image Pair

Next, we show that there exists a unique solution to our sanity-checked optimization for a single image pair. We start by writing the standard optimization with our proposed sanity checks in norm form as

$\min\,-\operatorname{Sim}(f,m\circ(g^{m\rightarrow f}+p))+\lambda_{r}\|\operatorname{Reg}(g^{m\rightarrow f})\|^{2}_{2},$ (6)
$\text{s.t.}\quad\|g^{a\rightarrow b}+\tilde{g}^{b\rightarrow a}\|^{2}_{2}<\alpha(\|g^{a\rightarrow b}\|^{2}_{2}+\|\tilde{g}^{b\rightarrow a}\|^{2}_{2})+\beta N,$
$\qquad\|g^{a\rightarrow a}\|^{2}_{2}=0,\quad\forall(a,b)\in\{(m,f),(f,m)\}.$
Theorem 2 (Existence of the unique minimizer for our relaxed optimization).

Let $m$ and $f$ be two images defined on the same spatial domain $\Omega$, which is connected, closed, and bounded in $\mathbb{R}^{n}$ with a Lipschitz boundary $\partial\Omega$. Let $g:H\times H\rightarrow H^{\star}$ be a displacement mapping from the Hilbert space $H$ to its dual space $H^{\star}$. Then, there exists a unique minimizer $g^{\star}$ to the relaxed minimization problem.

The detailed proof of Thm. 2 is given in Appx. A.3. In summary, for a moving-fixed pair, since the relaxation by sanity checks and the data term (e.g., SSD) are convex, together with the convex search space for $g$, our problem has a unique minimizer. Thus, the optimization problem is well conditioned by the properties of our regularity constraints, such that $g$ will not change dramatically.

3.3 Loyal Sanity-Checked Minimizer

In this section, we further prove that the distance between the minimizer of the sanity-check-constrained optimization and the optimal minimizer for a single moving-fixed image pair can be represented and controlled by our proposed sanity checks. We first rewrite Eq. 6 as

$\min\,-\operatorname{Sim}(b,a\circ(g^{a\rightarrow b}+p))+\lambda_{r}\|\operatorname{Reg}(g^{a\rightarrow b})\|^{2}_{2},$ (7)
$\text{s.t.}\quad\|g^{a\rightarrow b}+\tilde{g}^{b\rightarrow a}\|^{2}_{2}<\alpha(\|g^{a\rightarrow b}\|^{2}_{2}+\|\tilde{g}^{b\rightarrow a}\|^{2}_{2})+\beta N,$
$\qquad\|g^{a\rightarrow a}\|^{2}_{2}=0,\quad\forall(a,b)\in\{(m,f),(f,m)\}.$
Remark 2.1.

We denote this formulation as the bidirectional optimization since it operates in both directions. It is straightforward to show that this bidirectional optimization still satisfies Thm. 2 so that there exists a unique minimizer.

Then, with a slight abuse of notation, we define

${\mathbf{g}}=[\underbrace{g^{m\rightarrow f},g^{f\rightarrow m}}_{{\mathbf{g}}_{c}},\underbrace{g^{m\rightarrow m},g^{f\rightarrow f}}_{{\mathbf{g}}_{s}}]^{\top}.$ (8)

We can rewrite the minimization problem as

$\min\,-\operatorname{Sim}({\mathbf{g}})+\lambda_{r}\|\operatorname{Reg}({\mathbf{g}})\|^{2}_{2},$ (9)
$\text{s.t.}\quad\|{\mathbf{g}}_{s}\|^{2}_{2}=0,$
$\qquad\|{\mathbf{g}}_{c}+\tilde{{\mathbf{g}}_{c}}\|^{2}_{2}<\alpha(\|{\mathbf{g}}_{c}\|^{2}_{2}+\|\tilde{{\mathbf{g}}_{c}}\|^{2}_{2})+2\beta N.$

Here, $\operatorname{Sim}({\mathbf{g}}):=\operatorname{Sim}(m,f\circ(g^{f\rightarrow m}+p))+\operatorname{Sim}(f,m\circ(g^{m\rightarrow f}+p))$, $\|\operatorname{Reg}({\mathbf{g}})\|^{2}_{2}:=\|\operatorname{Reg}(g^{m\rightarrow f})\|^{2}_{2}+\|\operatorname{Reg}(g^{f\rightarrow m})\|^{2}_{2}$, $\|{\mathbf{g}}_{s}\|^{2}_{2}:=\|g^{m\rightarrow m}\|^{2}_{2}+\|g^{f\rightarrow f}\|^{2}_{2}$, $\|{\mathbf{g}}_{c}+\tilde{{\mathbf{g}}_{c}}\|^{2}_{2}:=\|g^{m\rightarrow f}+\tilde{g}^{f\rightarrow m}\|^{2}_{2}+\|g^{f\rightarrow m}+\tilde{g}^{m\rightarrow f}\|^{2}_{2}$, and $\|{\mathbf{g}}_{c}\|^{2}_{2}+\|\tilde{{\mathbf{g}}_{c}}\|^{2}_{2}:=\|g^{m\rightarrow f}\|^{2}_{2}+\|\tilde{g}^{m\rightarrow f}\|^{2}_{2}+\|g^{f\rightarrow m}\|^{2}_{2}+\|\tilde{g}^{f\rightarrow m}\|^{2}_{2}$. Then, the optimal minimizer ${\mathbf{g}}_{\ast}$ can be written in matrix form as

${\mathbf{g}}_{\ast}=\operatorname*{arg\,min}_{{\mathbf{g}}\in H}-\operatorname{Sim}({\mathbf{g}}),\quad\text{s.t.}\quad{\mathbf{A}}{\mathbf{g}}_{\ast}={\mathbf{y}},$ (10)
${\mathbf{A}}=\begin{bmatrix}\operatorname{Reg}(\cdot)&0&0\\ \text{CS}(\cdot)&0&0\\ 0&0&\text{SS}(\cdot)\end{bmatrix},\quad{\mathbf{y}}=\begin{bmatrix}0\\ 0^{\dagger}\\ 0\end{bmatrix}.$

Here, $\operatorname{Reg}(\cdot):=\|\operatorname{Reg}({\mathbf{g}})\|^{2}_{2}$, $\text{CS}(\cdot):=\|{\mathbf{g}}_{c}+\tilde{{\mathbf{g}}_{c}}\|^{2}_{2}-\alpha(\|{\mathbf{g}}_{c}\|^{2}_{2}+\|\tilde{{\mathbf{g}}_{c}}\|^{2}_{2})-2\beta N$, and $\text{SS}(\cdot):=\|{\mathbf{g}}_{s}\|^{2}_{2}$, as defined previously. The ${\dagger}$ indicates that we can have $<0$ solutions for the CS check as in Eq. 9, which, as we show in the next subsection, will not interfere with the theoretical foundations thanks to masking. In this way, we obtain our unique minimizer as

${\mathbf{g}}_{\rm sanity}=\operatorname*{arg\,min}_{{\mathbf{g}}\in H}-\operatorname{Sim}({\mathbf{g}})+\frac{\lambda}{2}\|{\mathbf{A}}{\mathbf{g}}-{\mathbf{y}}\|^{2}_{2},$ (11)

where $\lambda:=[2\lambda_{r},2\lambda_{c},2\lambda_{s}]$ is the vector of parameters.

Theorem 3 (Loyalty of the sanity-checked minimizer).

Let ${\mathbf{g}}_{\ast}$ be the optimal minimizer to the bidirectional registration problem defined in Eq. 10, and ${\mathbf{g}}_{\rm sanity}$ our sanity-checked minimizer defined in Eq. 11. The distance between these two minimizers can be upper bounded as

$\operatorname{Sim}({\mathbf{g}}_{\rm sanity})-\operatorname{Sim}({\mathbf{g}}_{\ast})\leq\frac{\lambda}{2}\|{\mathbf{A}}({\mathbf{g}}_{\rm sanity}-{\mathbf{g}}_{\ast})\|^{2}_{2}.$ (12)

The proof is presented in Appx. A.4. We then expand the right-hand side of Eq. 12 as

$\operatorname{Sim}({\mathbf{g}}_{\rm sanity})-\operatorname{Sim}({\mathbf{g}}_{\ast})\leq\lambda_{r}\|\operatorname{Reg}({\mathbf{g}}_{\rm sanity})\|^{2}_{2}+\lambda_{s}\|\text{SS}({\mathbf{g}}_{\rm sanity})\|^{2}_{2}+\lambda_{c}\|\text{CS}({\mathbf{g}}_{\rm sanity})\|^{2}_{2},$ (13)
$\text{where}\quad\operatorname{Reg}({\mathbf{g}}_{\ast})=0,\ \text{SS}({\mathbf{g}}_{\ast})=0,\ \text{CS}({\mathbf{g}}_{\ast})=0.$

Empirically, the first two terms contribute relatively less (10× smaller) than the cross-sanity error (see Experiments). Thus, we focus on the CS term here. Following Thm. 1, $\|\text{CS}({\mathbf{g}}_{\rm sanity})\|^{2}_{2}$ is upper bounded in the form of

$\|\text{CS}({\mathbf{g}}_{\rm sanity})\|^{2}_{2}<2(1-\alpha)\beta N.$ (14)

Here, $0<\alpha<1$ and $\beta>0$ ensure the direction of the inequality, and the factor of two accounts for cross-sanity errors from the two directions, i.e., from the moving image to the fixed image and from the fixed image to the moving image. We put the full derivation of the CS error bound in Appx. A.5. In a nutshell, we prove that satisfying the cross-sanity check yields an upper bound on the CS error, constraining the relationship between the two displacements from different directions via the parameters $\alpha$ and $\beta$.

Lemma 4 (Upper-bound of distance between optimal minimizer and sanity-checked minimizer).

Let ${\mathbf{g}}_{\ast}$ and ${\mathbf{g}}_{\rm sanity}$ respectively be the optimal minimizer and the constrained minimizer with the cross-sanity check; then we have an upper bound for the distance between these two minimizers as

$\operatorname{Sim}({\mathbf{g}}_{\rm sanity})-\operatorname{Sim}({\mathbf{g}}_{\ast})<2\lambda_{c}(1-\alpha)\beta N.$ (15)

Lemma 4 follows from Thm. 3 and Eq. 14 (the upper bound of the CS error). That is, the loyalty of the constrained minimizer, in terms of this distance, is controlled by the combination of $\alpha$, $\beta$, and the weight parameter $\lambda_{c}$.

Interpretation of the derived upper bound. By Lemma 4, we prove that, averaging over the total number of voxels and the two directions, the per-pixel/voxel similarity distance between the optimal minimizer and our sanity-checked minimizer is upper bounded by $\lambda_{c}(1-\alpha)\beta$. That being said, to satisfy our sanity checks and thus maintain loyalty to the optimal minimizer, $\lambda_{c}(1-\alpha)\beta$ should be small. Numerically, for example, if $\alpha=0.1$ and $\beta=10$, we have $(1-\alpha)\beta=9$. A distance of $9$ is extremely large for this delicate image registration task, rendering the constrained minimizer untrustworthy. Therefore, we need to adopt a relatively small loss weight $\lambda_{c}$, e.g., $\lambda_{c}=0.001$, to bound the distance between the two minimizers tightly. This observation from our proven theoretical upper bound also coincides with our sanity loss weight ablation study.
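Concretely, the per-voxel, per-direction bound $\lambda_{c}(1-\alpha)\beta$ can be tabulated when choosing $\lambda_{c}$; a small arithmetic sketch with the values above:

```python
alpha, beta = 0.1, 10.0  # settings in the style of Sec. 4
for lam_c in (1e-1, 1e-2, 1e-3):
    print(f"lambda_c={lam_c:g}: per-voxel bound = {lam_c * (1 - alpha) * beta:g}")
# lambda_c=0.1:   per-voxel bound = 0.9   (far too loose)
# lambda_c=0.01:  per-voxel bound = 0.09
# lambda_c=0.001: per-voxel bound = 0.009 (tight; our default)
```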

Figure 4: Training a sanity-checked model. Ⓦ denotes spatial warping; e.g., warped $m$ means we warp the moving image $m$ using the transformation map computed from $g^{m\rightarrow f}$.

3.4 Sanity-checked Registration Training

We show our sanity-checked training pipeline in Fig. 4 and introduce each loss in this subsection. The self-sanity check can be formulated as a loss function:

$\mathcal{L}_{\rm self}=\frac{1}{2}(\|g^{m\rightarrow m}\|^{2}_{2}+\|g^{f\rightarrow f}\|^{2}_{2}).$ (16)

The self-sanity loss thus penalizes the squared differences between the predicted displacement maps and the ideal (zero) ones. Next, we use the $m\rightarrow f$ direction as an example (the $f\rightarrow m$ direction follows the same principle) to formulate the proposed cross-sanity check loss. We evaluate the check at every voxel and define a binary mask $\mathcal{M}^{m\rightarrow f}$ of the form

$\mathcal{M}^{m\rightarrow f}=\begin{cases}0&\text{if the cross-sanity check is satisfied},\\ 1&\text{otherwise}.\end{cases}$ (17)

An interpretation of this binary mask $\mathcal{M}^{m\rightarrow f}$ is that it records violations of the cross-sanity check (Eq. 4) at each individual point. In this way, the $<0$ solutions in Eq. 10 will not challenge the theoretical formulation, since these points are masked out. Thus, we can formulate the proposed cross-sanity check as a loss function:

$\mathcal{L}^{m\rightarrow f}_{\rm cross}=\|\mathcal{M}^{m\rightarrow f}\odot(g^{m\rightarrow f}+\tilde{g}^{f\rightarrow m})\|^{2}_{2}-\beta\|\mathcal{M}^{m\rightarrow f}\|^{2}_{2}-\alpha(\|\mathcal{M}^{m\rightarrow f}\odot g^{m\rightarrow f}\|^{2}_{2}+\|\mathcal{M}^{m\rightarrow f}\odot\tilde{g}^{f\rightarrow m}\|^{2}_{2}).$ (18)

The final cross-sanity loss is $\mathcal{L}_{\rm cross}=\frac{1}{2}(\mathcal{L}^{m\rightarrow f}_{\rm cross}+\mathcal{L}^{f\rightarrow m}_{\rm cross})$. Here, $\odot$ denotes element-wise multiplication, so we only retain points violating the cross-sanity check; the loss is thus only computed for those violators, visualized as occlusion masks in Fig. 5. Finally, the total loss is

$\mathcal{L}_{\rm total}=\mathcal{L}_{\rm sim}+\lambda_{r}\mathcal{L}_{\rm reg}+\lambda_{s}\mathcal{L}_{\rm self}+\lambda_{c}\mathcal{L}_{\rm cross}.$ (19)

Here, $\mathcal{L}_{\rm sim}$ is the NCC loss, $\mathcal{L}_{\rm reg}$ is $\|\nabla(u)\|^{2}_{2}$, and $\lambda_{r}=1$, following standard deep registrations [6, 9]. If not specified otherwise, $\lambda_{s}=0.1$ and $\lambda_{c}=0.001$. While training, the model optimizes the total loss $\mathcal{L}_{\rm total}$ over different moving-fixed image pairs $(m,f)$ in the training set $\mathcal{D}$ as

$\min_{(m,f)\in\mathcal{D}}\min_{g\in H}\mathcal{L}_{\rm total}.$ (20)

In this way, we fulfill our novel regularization-based sanity-enforcer formulation for training a registration model.
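Combining the sketches above, Eqs. 16-20 may be implemented as follows (a sketch assuming a `model(moving, fixed)` interface that returns a displacement field; reduction choices are ours, not the released code):

```python
def self_sanity_loss(g_mm, g_ff):
    """Eq. (16): penalize nonzero displacements for identical pairs."""
    return 0.5 * ((g_mm ** 2).sum(dim=1).mean() + (g_ff ** 2).sum(dim=1).mean())

def cross_sanity_loss(g_ab, g_ba_tilde, alpha=0.1, beta=10.0):
    """Eqs. (17)-(18) for one direction: since the mask is binary,
    mask * (lhs - rhs) expands to exactly the terms of Eq. (18)."""
    lhs = ((g_ab + g_ba_tilde) ** 2).sum(dim=1)
    rhs = alpha * ((g_ab ** 2).sum(dim=1) + (g_ba_tilde ** 2).sum(dim=1)) + beta
    mask = (lhs >= rhs).float()        # Eq. (17): 1 marks a violator of Eq. (4)
    return (mask * (lhs - rhs)).mean()

def training_step(model, m, f, lam_r=1.0, lam_s=0.1, lam_c=0.001):
    """Eqs. (19)-(20): bidirectional similarity and regularization plus
    the two sanity losses; four forward passes per image pair."""
    g_mf, g_fm = model(m, f), model(f, m)
    g_mm, g_ff = model(m, m), model(f, f)
    sim = ncc_loss(f, warp(m, g_mf)) + ncc_loss(m, warp(f, g_fm))
    reg = smoothness_loss(g_mf) + smoothness_loss(g_fm)
    cross = 0.5 * (cross_sanity_loss(g_mf, back_project(g_fm, g_mf))
                   + cross_sanity_loss(g_fm, back_project(g_mf, g_fm)))
    return sim + lam_r * reg + lam_s * self_sanity_loss(g_mm, g_ff) + lam_c * cross
```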

Figure 5: Mask evolution during training. Overall, as training proceeds, the number of cross-sanity check violators decreases.

4 Experiments

Evaluation metrics. We study model behaviors with a wide range of metrics. For the main metric, we use Dice to measure how well the transformed segmentation fits its ground truth, following previous studies. To study the model taking an identical pair as inputs, we also report Self-Dice (SDice), i.e., the Dice score when registering the moving image to itself. For pre-operative and post-recurrence registration, we measure the mean target registration error (TRE) of the paired landmarks with Euclidean distance in millimeters, and also the self mean registration error (STRE) to study the self-sanity of the models. Besides, we report the 95% Hausdorff Distance (HD95) as in [9], which measures the 95th percentile of the distances between boundary points of the transformed subject and the actual subject. We follow [3, 31] to report robustness (ROB) for a pair of scans as the relative number of successfully registered landmarks.

For diffeomorphism measurements, we report the percentage of folded voxels (FV) whose Jacobian determinant is $<0$, the Absolute Value of Negative Jacobians (AJ), where we sum up all negative Jacobian determinants, and the Standard Deviation of the logarithm of the Jacobian determinant (SDlogJ). All three Jacobian-determinant-related metrics reflect how regular the transformation maps are.
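These metrics derive from the Jacobian of $\varphi=u+p$; a NumPy sketch under our own conventions (one subject, displacement `u` of shape (3, D, H, W); finite differences and the clipping threshold are implementation choices):

```python
import numpy as np

def jacobian_determinant(u):
    """det(∂φ/∂p) with φ = u + p, via central finite differences."""
    grads = [np.gradient(u[i], axis=(0, 1, 2)) for i in range(3)]  # ∂u_i/∂x_j
    J = np.stack([np.stack(g) for g in grads])                     # (3, 3, D, H, W)
    J = J + np.eye(3)[:, :, None, None, None]                      # add ∂p/∂p = I
    return np.linalg.det(np.moveaxis(J, (0, 1), (-2, -1)))         # (D, H, W)

# given a displacement field u of shape (3, D, H, W):
det = jacobian_determinant(u)
fv = 100.0 * (det < 0).mean()                # FV: % of folded voxels
aj = np.abs(det[det < 0]).sum()              # AJ: summed |negative determinants|
sd_log_j = np.log(det.clip(min=1e-9)).std()  # SDlogJ (clip nonpositive dets)
```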

For quantifying sanity errors, we present the mean self-sanity error (SSE) and the mean cross-sanity error (CSE), defined in Eq. 16 and Eq. 18, respectively. These two metrics are designed to study model behaviors at the level of each single image pair and are essential for our sanity analysis of different learning-based deep models.
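Both can be evaluated directly with the loss sketches from Sec. 3.4 (our naming; the paper reports per-pair means):

```python
@torch.no_grad()
def sanity_errors(model, m, f, alpha, beta):
    """Mean self-sanity (SSE, Eq. 16) and cross-sanity (CSE, Eq. 18)
    errors for a single moving-fixed pair."""
    sse = self_sanity_loss(model(m, m), model(f, f))
    g_mf, g_fm = model(m, f), model(f, m)
    cse = 0.5 * (cross_sanity_loss(g_mf, back_project(g_fm, g_mf), alpha, beta)
                 + cross_sanity_loss(g_fm, back_project(g_mf, g_fm), alpha, beta))
    return sse.item(), cse.item()
```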

Figure 6: Comparisons between different models on IXI dataset.

Implemented models. We denote the bidirectional optimization as Enclosed (E) image registration, which maximizes the similarity score and minimizes the spatial gradient regularization in both directions. We also have Self-sanity-checked (S) and Cross-sanity-checked (C) image registrations. Moreover, as a proof of concept that our sanity checks can be applied to different learning-based models, we implement our proposed techniques on models such as VoxelMorph [6] (VM), TransMorph-Large [9] (TM), TransMorph-B-spline [9] (TMBS), and DIRAC [31].

Figure 7: Qualitative comparisons on IXI, where we mark the maximum value of each error map at its top left. Best viewed zoomed in.
Figure 8: Comparisons on OASIS validation set. Our method produces more regular maps for all input image pairs.

4.1 Results

Method Dice↑ FV↓
Affine 0.386±0.195 -
SyN [2] 0.645±0.152 <0.01
NiftyReg [28] 0.645±0.167 0.020±0.046
LDDMM [7] 0.680±0.135 <0.01
deedsBCV [19] 0.733±0.126 0.147±0.050
MIDIR [34] 0.742±0.128 <0.01
PVT [46] 0.727±0.128 1.858±0.314
CoTr [48] 0.735±0.135 1.298±0.342
VM [6] 0.732±0.123 1.522±0.336
VM-diff [13] 0.580±0.165 <0.01
ICNet [50] 0.742±0.223 0.854±0.316
CM [23] 0.737±0.123 1.719±0.382
ViT-V-Net [10] 0.734±0.124 1.609±0.319
nnFormer [51] 0.747±0.135 1.595±0.358
ICON [16] 0.752±0.155 0.695±0.248
TM [9] 0.754±0.124 1.579±0.328
TM-diff [9] 0.594±0.163 <0.01
TMBS [9] 0.761±0.122 <0.01
VM-ESC 0.743±0.025 0.478±0.101
TMBS-ESC 0.762±0.023 <0.01
Table 2: IXI dataset results.
Method Dice↑ HD95↓ SDlogJ↓
VTN [25] 0.827±0.013 1.722±0.318 0.121±0.015
ConAdam [41] 0.846±0.016 1.500±0.304 0.067±0.005
VM [6] 0.847±0.014 1.546±0.306 0.133±0.021
ClapIRN [29] 0.861±0.015 1.514±0.337 0.072±0.007
TM [9] 0.862±0.014 1.431±0.282 0.128±0.021
TM-SC 0.862±0.014 1.449±0.310 0.084±0.005
(a) OASIS validation set results.
Method Dice↑ HD95↓ SDlogJ↓
Initial 0.56 3.86 1.50
VTN [25] 0.80 1.77 0.08
ConAdam [41] 0.81 1.63 0.07
ClapIRN [29] 0.82 1.67 0.07
TM [9] 0.82 1.66 0.12
TM-SC 0.820 1.666 0.085
(b) OASIS test set results.
Table 3: OASIS dataset results, obtained from the learn2reg challenge organizers [20]. Since the listed top-ranking methods are already enclosed, we mainly study the impact of adding sanity checks, which proves negligible on these main metrics.
Method TRE↓ STRE↓ ROB↑ FV↓ AJ×10²
Initial 6.864±0.996 - - - -
DIRAC∗ [31] 2.760±0.247 0.274±0.027 0.776±0.055 0.025±0.009 4.242±2.954
DIRAC-SC† 2.719±0.259 0.218±0.046 0.795±0.034 0.022±0.005 3.001±1.314
Table 4: BraTSReg dataset results, reported as means±deviations over five-fold validation. ∗: We use their code and train the models until convergence. †: We implement the cross-sanity check to replace the inverse-consistent error part of DIRAC, and add our self-sanity check.

Atlas-to-subject registration. We split a total of 576 T1-weighted brain MRI volumes from the Information eXtraction from Images (IXI) [1] database into 403, 58, and 115 volumes for the training, validation, and test sets. The moving image is an atlas brain MRI obtained from [23]. FreeSurfer [14] was used to pre-process the MRI volumes. The pre-processed image volumes are all cropped to the size of 160×192×224. Label maps, including 30 anatomical structures, are obtained using FreeSurfer for evaluating registration. Besides CNN-based models [6, 23, 10], we also include other transformer-based deep methods [10, 51, 48, 46] as baselines, following [9]. The comparison results are presented in Tab. 2, and qualitative comparisons in Fig. 6. For the cross-sanity check, we set $\alpha=0.1$ and $\beta=12$. Overall, we observe that models with sanity checks achieve better diffeomorphisms without impairing any Dice performance, and even improve the performance of the VM model. We show further qualitative comparisons in Fig. 7. Compared to the naive counterparts, our sanity-checked model produces more regular maps in both directions and reduces large sanity errors, as shown by the error heatmaps for both self-sanity and cross-sanity errors.

Subject-to-subject registration. The OASIS dataset [26, 21] contains a total of 451 T1-weighted brain MRI images, with 394/19/38 images used for training/validation/testing, respectively. The pre-processed image volumes are all cropped to the size of 160×192×224. Label maps for 35 anatomical structures are provided using FreeSurfer [14] for evaluation. For the cross-sanity check, we set $\alpha=0.1$ and $\beta=10$. The results are shown in Tab. 3. In terms of the main metrics, we achieve on-par performance with all the state-of-the-art methods, further certifying that our sanity checks do not contaminate model performance. We also show our qualitative comparison in Fig. 8, indicating that our sanity-enforced image registration yields less self-sanity error in the self-sanity error map comparison and more regular maps in the deformed grid. All these comparisons prove the efficacy of our regularization-based sanity-enforcer in preventing both sanity errors.

Figure 9: Example axial T1-weighted MR slices comparisons on BraTSReg dataset. The error maps are normalized to incorporate displacements in unit space predicted by DIRAC.

Pre-operative and post-recurrence scan registration. The BraTSReg [3] dataset contains 160 pairs of pre-operative and follow-up brain MR scans of glioma patients taken at different time points. Each time point contains native T1, contrast-enhanced T1-weighted, T2-weighted, and FLAIR MRI. Within the dataset, 140 pairs of scans are associated with 6 to 50 manual landmarks in both scans; the other 20 pairs only have landmarks in the follow-up scan. Following [31], we create a five-fold schema where each fold has 122/10/28 pairs for training/validation/testing, respectively. We set $\alpha=0.01$ and $\beta=0.03$ in the cross-sanity check since DIRAC predicts normalized displacements (range 0 to 1). We average over the five folds to report our results in Tab. 4. All baselines and our models use the same training setup. We specifically focus on the model training part and directly train all models on data downloaded from the official website [5], without the pre-affine and post-processing steps mentioned in their paper [30]. The best model is selected on the validation set as the one with the lowest TRE. Despite the high performance of the baseline, we observe that substituting in the cross-sanity check and adding the self-sanity check improves all metrics, achieving state-of-the-art results. We also show qualitative comparisons in Fig. 9, where the comparison between SSE error maps demonstrates the effectiveness of our self-sanity check.

4.2 Sanity Analysis

We conduct a broad sanity analysis on a wide range of models on the IXI dataset, including popular models (VM [6] and TM [9]), a model with cycle consistency (CM [23]), models with inverse consistency (ICNet [50] and ICON [16]), probabilistically formulated models (VM-diff [13] and TM-diff [9]), and models with diffeomorphic add-ons (MIDIR [34] and TMBS [9]). The results are shown in Tab. 5. We find that models with probabilistic formulation or cycle consistency have higher self-sanity, whereas the other models are insensitive to this scenario and have an inferior SDice. However, these probabilistically formulated models suffer from a relatively low Dice score, limiting their practical usage. For CM, albeit with a low self-sanity error, the correspondence error between image pairs is not negligible. We also find it interesting that models with diffeomorphic add-ons still suffer from large sanity errors, despite their ability to produce a diffeomorphic/regular map; this is caused by not modeling the different mappings from both directions. Overall, our behavior studies show that most models suffer from inferior Dice/SDice performance, lower diffeomorphism, or significant sanity errors. Our proposed sanity checks improve performance from every aspect, producing a more regular map and preventing sanity errors.

Method Main Metric↑ Diffeomorphism↓ Sanity Error↓
Dice SDice FV AJ×10⁴ SSE×10⁻¹ CSE
PVT [46] 0.727 0.695 1.858 3.64 52.95 19.43
CoTr [48] 0.735 0.846 1.298 2.10 6.28 21.37
MIDIR [34] 0.742 0.850 <0.1 <0.1 6.75 15.75
VM [6] 0.732 0.890 1.522 3.00 5.16 16.55
VM-diff [13] 0.580 1.000 <0.1 <0.1 0.13 <0.1
ICNet [50] 0.742 0.849 0.854 1.90 13.23 7.48
CM [23] 0.737 1.000 1.719 3.44 1.18 19.47
ViT-V-Net [10] 0.734 0.908 1.609 2.97 14.52 12.98
nnFormer [51] 0.747 0.799 1.595 2.90 13.31 20.68
ICON [16] 0.752 0.899 0.695 0.90 7.31 5.68
TM [9] 0.754 0.873 1.579 2.75 85.63 19.62
TM-diff [9] 0.594 1.000 <0.1 <0.1 0.16 <0.1
TMBS [9] 0.761 0.903 <0.1 0.02 23.84 14.08
VM-ESC 0.743 1.000 0.478 0.43 <0.1 2.55
TMBS-ESC 0.762 1.000 <0.1 <0.1 <0.1 2.99
Table 5: Sanity analysis of various models on the IXI dataset.

4.3 Ablation Study

Cross-sanity error under strict inverse consistency.

[46] [48] [6] [23] [10] [51] [50] [16] [9] [6] TMBS-ESC VM-ESC
19.24 10.02 5.63 7.58 11.27 10.44 3.46 2.78 22.61 7.31 1.96 0.92
Table 6: Cross-sanity error under strict inverse consistency on the IXI dataset, i.e., $\tilde{g}^{b\rightarrow a}+g^{a\rightarrow b}=0$. We choose VM for [6] and TMBS for [9] as our baselines and show our results in the last two columns. We still achieve state-of-the-art performance even when calculating the cross-sanity error under strict inverse consistency.

We evaluate the performance under strict inverse consistency as in Def. 1 (the same as CICE in [11]), shown in Tab. 6. The strict inverse-consistency errors are lower than the relaxed ones because many small errors are taken into account when calculating the average. Our sanity-checked methods still show the best performance compared to all models, including the model trained with explicitly strict inverse consistency, e.g., [16].

Sanity check loss weight study. We derive an upper bound for our saner registration in Lemma 4 and show in this study that it can guide the setting of an appropriate loss weight $\lambda_{c}$ for the cross-sanity check. For fast verification, we randomly sample 50 subjects from the training set of the IXI dataset and validate/test on the full validation/test sets. We also include the self-sanity check loss weight $\lambda_{s}$ for comparison. The experimental results are shown in Tab. 7. Compared to $\lambda_{c}$, $\lambda_{s}$ is insensitive to weight change. As discussed earlier, setting a higher loss weight for the cross-sanity check, e.g., 0.01 vs. 0.001, yields approximate upper bounds of 0.1 vs. 0.01, respectively, which makes a huge difference when training our models: the higher error tolerance results in significantly degraded Dice performance. Our derived upper bound thus gives useful guidance for setting the loss weights. More discussion can be found in the Appendix.

$\lambda_{c}$ $\lambda_{s}$ Dice↑ SDice↑ FV↓ AJ×10⁴ SSE×10⁻¹ CSE↓
5e-4 1e-3 0.720 0.957 0.913 1.24 0.658 6.96
5e-4 1e-2 0.720 0.983 0.920 1.27 0.299 6.77
5e-4 1e-1 0.719 1.000 0.896 1.22 <0.1 6.73
1e-3 1e-3 0.721 0.967 0.422 0.38 0.572 3.43
1e-3 1e-2 0.720 0.989 0.426 0.40 0.301 3.46
1e-3 1e-1 0.721 1.000 0.443 0.40 <0.1 3.43
1e-2 1e-3 0.473 0.912 <0.1 <0.1 1.966 <0.1
1e-2 1e-2 0.420 0.980 <0.1 <0.1 1.210 <0.1
1e-2 1e-1 0.406 1.000 <0.1 <0.1 0.209 <0.1
Table 7: Sanity loss weight study of VM on the sub-IXI dataset.
Method Dice↑ SDice↑ FV↓ AJ×10⁴ SSE×10⁻¹ CSE↓
VM 0.732 0.890 1.522 3.00 5.156 16.55
VM-E 0.742 0.919 1.574 3.04 2.647 26.97
VM-ES 0.740 1.000 1.442 2.72 0.050 25.32
VM-EC 0.743 0.950 0.447 0.39 1.178 2.61
VM-ESC 0.743 1.000 0.478 0.43 <0.1 2.55
TM 0.862 0.925 0.752 1.52 3.069 10.72
TM-S 0.861 1.000 0.777 1.66 0.018 11.43
TM-C 0.862 0.948 0.246 0.26 0.991 2.93
TM-SC 0.862 1.000 0.307 0.35 <0.1 3.17
Table 8: Model ablations on the IXI dataset (top rows) and OASIS validation dataset (bottom rows). Since TM is already enclosed (E), we focus on studying the impacts of adding our sanity checks.

Model ablation study. We study our proposed sanity checks on the IXI and OASIS validation datasets and report the results in Tab. 8. We show that each sanity check reduces the corresponding errors without compromising performance on other metrics compared to the naive counterparts. We also find that the bidirectional registration and the cross-sanity check can mitigate self-sanity errors to a certain level, but cannot eliminate such errors completely, proving that our self-sanity check is necessary to regulate the models' behavior when mapping identical image pairs. More experiments, such as the numerical study of the parameters $\alpha$ and $\beta$, the sanity preservation study, the statistical significance of results, and ablative qualitative results, can be found in the Appendix.

5 Conclusion

This paper focuses on correcting learning-based deep models' behaviors on single moving-fixed image pairs. In our model sanity analysis, we find that most existing models suffer from significant sanity errors, with no exception for models equipped with diffeomorphic add-ons. We show that our sanity-checked model can prevent such sanity errors without contaminating any registration performance. Besides the experimental results certifying the effectiveness of our proposed sanity checks, our sanity checks are supported by a set of theoretical guarantees derived in this paper. We first show that there is an error upper bound between our sanity-checked formulation and the optimal condition. Then, we show that this upper bound gives significant guidance for training a sanity-checked registration model, which we believe is beneficial for preventing over-optimization of image similarities when training deep registration models.

Acknowledgements

The authors thank the anonymous reviewers for their constructive comments and Yuzhang Shang for proofreading the rebuttal. The authors also want to thank Junyu Chen and Tony C. W. Mok for their generous help with their papers and code, and the learn2reg challenge organizers for evaluating our models. This work was supported by NSF SCH-2123521. This article solely reflects the opinions and conclusions of its authors and not the funding agency.

References

  • [1] Ixi brain, 2022. https://brain-development.org/ixi-dataset.
  • [2] Brian B Avants, Charles L Epstein, Murray Grossman, and James C Gee. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. MIA, 12(1):26–41, 2008.
  • [3] Bhakti Baheti, Diana Waldmannstetter, Satrajit Chakrabarty, Hamed Akbari, Michel Bilello, Benedikt Wiestler, Julian Schwarting, Evan Calabrese, Jeffrey Rudie, Syed Abidi, et al. The brain tumor sequence registration challenge: Establishing correspondence between pre-operative and follow-up mri scans of diffuse glioma patients. arXiv:2112.06979, 2021.
  • [4] Ruzena Bajcsy and Stane Kovačič. Multiresolution elastic matching. CVGIP, 46(1):1–21, 1989.
  • [5] Spyridon Bakas. Brain tumor sequence registration challenge, 2022. https://www.med.upenn.edu/cbica/brats-reg-challenge.
  • [6] Guha Balakrishnan, Amy Zhao, Mert R Sabuncu, John Guttag, and Adrian V Dalca. Voxelmorph: a learning framework for deformable medical image registration. TMI, 38(8):1788–1800, 2019.
  • [7] M Faisal Beg, Michael I Miller, Alain Trouvé, and Laurent Younes. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. IJCV, 61(2):139–157, 2005.
  • [8] Xiaohuan Cao, Jianhua Yang, Jun Zhang, Qian Wang, Pew-Thian Yap, and Dinggang Shen. Deformable image registration using a cue-aware deep regression network. TBE, 65(9):1900–1911, 2018.
  • [9] Junyu Chen, Eric C Frey, Yufan He, William P Segars, Ye Li, and Yong Du. Transmorph: Transformer for unsupervised medical image registration. MIA, page 102615, 2022.
  • [10] Junyu Chen, Yufan He, Eric C Frey, Ye Li, and Yong Du. Vit-v-net: Vision transformer for unsupervised volumetric medical image registration. arXiv:2104.06468, 2021.
  • [11] Gary E Christensen, Xiujuan Geng, Jon G Kuhl, Joel Bruss, Thomas J Grabowski, Imran A Pirwani, Michael W Vannier, John S Allen, and Hanna Damasio. Introduction to the non-rigid image registration evaluation project (nirep). In WBIR, 2006.
  • [12] Gary E Christensen, Sarang C Joshi, and Michael I Miller. Volumetric transformation of brain anatomy. TMI, 16(6):864–877, 1997.
  • [13] Adrian V Dalca, Guha Balakrishnan, John Guttag, and Mert R Sabuncu. Unsupervised learning of probabilistic diffeomorphic registration for images and surfaces. MIA, 57:226–236, 2019.
  • [14] Bruce Fischl. Freesurfer. Neuroimage, 62(2):774–781, 2012.
  • [15] Kexue Fu, Shaolei Liu, Xiaoyuan Luo, and Manning Wang. Robust point cloud registration framework based on deep graph matching. In CVPR, 2021.
  • [16] Hastings Greer, Roland Kwitt, Francois-Xavier Vialard, and Marc Niethammer. Icon: Learning regular maps through inverse consistency. In ICCV, 2021.
  • [17] Daniel Grzech, Mohammad Farid Azampour, Ben Glocker, Julia Schnabel, Nassir Navab, Bernhard Kainz, and Loïc Le Folgoc. A variational bayesian method for similarity learning in non-rigid image registration. In CVPR, 2022.
  • [18] Gabriel L Hart, Christopher Zach, and Marc Niethammer. An optimal control approach for deformable registration. In CVPR, 2009.
  • [19] Mattias P Heinrich, Oskar Maier, and Heinz Handels. Multi-modal multi-atlas segmentation using discrete optimisation and self-similarities. VISCERAL Challenge@ ISBI, 1390:27, 2015.
  • [20] Alessa Hering, Lasse Hansen, Tony C. W. Mok, Albert CS Chung, Hanna Siebert, Stephanie Häger, Annkristin Lange, Sven Kuckertz, Stefan Heldmann, Wei Shao, et al. Learn2reg: comprehensive multi-task medical image registration challenge, dataset and evaluation in the era of deep learning. TMI, 2022.
  • [21] Andrew Hoopes, Malte Hoffmann, Douglas N Greve, Bruce Fischl, John Guttag, and Adrian V Dalca. Learning the effect of registration hyperparameters with hypermorph. MELBA, 3:1–30, 2022.
  • [22] Junhwa Hur and Stefan Roth. Mirrorflow: Exploiting symmetries in joint optical flow and occlusion estimation. In ICCV, 2017.
  • [23] Boah Kim, Dong Hwan Kim, Seong Ho Park, Jieun Kim, June-Goo Lee, and Jong Chul Ye. Cyclemorph: cycle consistent unsupervised deformable image registration. MIA, 71:102036, 2021.
  • [24] Pengpeng Liu, Michael Lyu, Irwin King, and Jia Xu. Selflow: Self-supervised learning of optical flow. In CVPR, 2019.
  • [25] Jinxin Lv, Zhiwei Wang, Hongkuan Shi, Haobo Zhang, Sheng Wang, Yilang Wang, and Qiang Li. Joint progressive and coarse-to-fine registration of brain mri via deformation field integration and non-rigid feature fusion. TMI, 41(10):2788–2802, 2022.
  • [26] Daniel S Marcus, Tracy H Wang, Jamie Parker, John G Csernansky, John C Morris, and Randy L Buckner. Open access series of imaging studies (oasis): cross-sectional mri data in young, middle aged, nondemented, and demented older adults. Journal of cognitive neuroscience, 19(9):1498–1507, 2007.
  • [27] Simon Meister, Junhwa Hur, and Stefan Roth. Unflow: Unsupervised learning of optical flow with a bidirectional census loss. In AAAI, 2018.
  • [28] Marc Modat, Gerard R Ridgway, Zeike A Taylor, Manja Lehmann, Josephine Barnes, David J Hawkes, Nick C Fox, and Sébastien Ourselin. Fast free-form deformation using graphics processing units. Computer methods and programs in biomedicine, 98(3):278–284, 2010.
  • [29] Tony C. W. Mok and Albert Chung. Conditional deformable image registration with convolutional neural network. In MICCAI, 2021.
  • [30] Tony C. W. Mok and Albert Chung. Robust image registration with absent correspondences in pre-operative and follow-up brain mri scans of diffuse glioma patients. arXiv:2210.11045, 2022.
  • [31] Tony C. W. Mok and Albert Chung. Unsupervised deformable image registration with absent correspondences in pre-operative and post-recurrence brain tumor mri scans. In MICCAI, 2022.
  • [32] Xavier Pennec, Pascal Cachier, and Nicholas Ayache. Understanding the “demon’s algorithm”: 3d non-rigid registration by gradient descent. In MICCAI, 1999.
  • [33] Zheng Qin, Hao Yu, Changjian Wang, Yulan Guo, Yuxing Peng, and Kai Xu. Geometric transformer for fast and robust point cloud registration. In CVPR, 2022.
  • [34] Huaqi Qiu, Chen Qin, Andreas Schuh, Kerstin Hammernik, and Daniel Rueckert. Learning diffeomorphic and modality-invariant registration using b-splines. In MIDL, 2021.
  • [35] Lei Qu, Yuanyuan Li, Peng Xie, Lijuan Liu, Yimin Wang, Jun Wu, Yu Liu, Tao Wang, Longfei Li, Kaixuan Guo, et al. Cross-modal coherent registration of whole mouse brains. Nature Methods, 19(1):111–118, 2022.
  • [36] Daniel Rueckert, Luke I Sonoda, Carmel Hayes, Derek LG Hill, Martin O Leach, and David J Hawkes. Nonrigid registration using free-form deformations: application to breast mr images. TMI, 18(8):712–721, 1999.
  • [37] Tanya Schmah, Laurent Risser, and François-Xavier Vialard. Left-invariant metrics for diffeomorphic image registration with spatially-varying regularisation. In MICCAI, 2013.
  • [38] Dinggang Shen and Christos Davatzikos. Hammer: hierarchical attribute matching mechanism for elastic registration. TMI, 21(11):1421–1439, 2002.
  • [39] Zhengyang Shen, Xu Han, Zhenlin Xu, and Marc Niethammer. Networks for joint affine and non-parametric image registration. In CVPR, 2019.
  • [40] Zhengyang Shen, François-Xavier Vialard, and Marc Niethammer. Region-specific diffeomorphic metric mapping. NeurIPS, 2019.
  • [41] Hanna Siebert, Lasse Hansen, and Mattias P Heinrich. Fast 3d registration with accurate optimisation and little learning for learn2reg 2021. In MICCAI, 2021.
  • [42] Hessam Sokooti, Bob De Vos, Floris Berendsen, Boudewijn PF Lelieveldt, Ivana Išgum, and Marius Staring. Nonrigid image registration using multi-scale 3d convolutional neural networks. In MICCAI, 2017.
  • [43] Tom Vercauteren, Xavier Pennec, Aymeric Perchant, and Nicholas Ayache. Diffeomorphic demons: Efficient non-parametric image registration. NeuroImage, 45(1):S61–S72, 2009.
  • [44] François-Xavier Vialard, Laurent Risser, Daniel Rueckert, and Colin J Cotter. Diffeomorphic 3d image registration via geodesic shooting using an efficient adjoint calculation. IJCV, 97:229–241, 2012.
  • [45] Jian Wang and Miaomiao Zhang. Deepflash: An efficient network for learning-based medical image registration. In CVPR, 2020.
  • [46] Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In ICCV, 2021.
  • [47] Xuechun Wang, Weilin Zeng, Xiaodan Yang, Yongsheng Zhang, Chunyu Fang, Shaoqun Zeng, Yunyun Han, and Peng Fei. Bi-channel image registration and deep-learning segmentation (birds) for efficient, versatile 3d mapping of mouse brain. Elife, 10:63455, 2021.
  • [48] Yutong Xie, Jianpeng Zhang, Chunhua Shen, and Yong Xia. Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation. In MICCAI, 2021.
  • [49] Xiao Yang, Roland Kwitt, Martin Styner, and Marc Niethammer. Quicksilver: Fast predictive image registration–a deep learning approach. NeuroImage, 158:378–396, 2017.
  • [50] Jun Zhang. Inverse-consistent deep networks for unsupervised deformable image registration. arXiv:1809.03443, 2018.
  • [51] Hong-Yu Zhou, Jiansen Guo, Yinghao Zhang, Lequan Yu, Liansheng Wang, and Yizhou Yu. nnformer: Interleaved transformer for volumetric segmentation. arXiv:2109.03201, 2021.
  • [52] S Kevin Zhou, Hoang Ngan Le, Khoa Luu, Hien V Nguyen, and Nicholas Ayache. Deep reinforcement learning in medical imaging: A literature review. MIA, 73:102193, 2021.

Appendix A Appendix

A.1 Comparison Between Diffeomorphic Methods and Inverse Consistent Methods

Figure A1: Illustration of the two different types of methods. Technically, diffeomorphic methods [4, 36, 44, 37, 49, 45, 39, 40] are also inverse-consistent, but following the convention in [50, 16, 31], we restrict inverse consistency to the sense of operating on two different mappings predicted by the same model. Diffeomorphic methods only operate on one mapping, where the other mapping is calculated from that mapping rather than predicted; thus, they are called diffeomorphic methods instead.

A.2 Proof of Thm. 1

Theorem 1 (Relaxed ideal symmetric registration via cross-sanity check).

An ideal symmetric registration meets

$\varphi^{a\rightarrow b}\circ\varphi^{b\rightarrow a}=id,$

in which $id$ denotes the identity transformation. Then, a cross-sanity-checked registration is a relaxed solution to the ideal registration, satisfying

$\|g^{a\rightarrow b}+\tilde{g}^{b\rightarrow a}\|^{2}_{2}<\frac{\beta(2-\alpha)N}{1-\alpha}.$
Proof 1.

A straightforward explanation of the ideal symmetric registration $\varphi^{a\rightarrow b}\circ\varphi^{b\rightarrow a}=id$ is that the coordinates of a pixel $p$ of image $a$ stay the same after the two transformations: the forward transformation (from $a$ to $b$) and the backward transformation (from $b$ back to $a$). Therefore, we have the following chain of coordinate changes:

$p \xRightarrow{a\rightarrow b} g^{a\rightarrow b}(p)+p \xRightarrow{b\rightarrow a} g^{b\rightarrow a}(g^{a\rightarrow b}(p)+p)+g^{a\rightarrow b}(p)+p = p.$ (A1)

By simplification, we have the ideal inverse consistency:

$\underbrace{g^{b\rightarrow a}(g^{a\rightarrow b}(p)+p)}_{\tilde{g}^{b\rightarrow a}}+g^{a\rightarrow b}(p)+p=p \;\Longrightarrow\; \tilde{g}^{b\rightarrow a}+g^{a\rightarrow b}=0.$ (A2)

Different from the ideal inverse consistency, recall that our cross-sanity check in norm form is

$\|g^{a\rightarrow b}+\tilde{g}^{b\rightarrow a}\|^{2}_{2}<\alpha(\|g^{a\rightarrow b}\|^{2}_{2}+\|\tilde{g}^{b\rightarrow a}\|^{2}_{2})+\beta N.$ (A3)

Suppose $0<\alpha<1$ and $\beta>0$. By expanding the cross-sanity check, we have

${g^{a\rightarrow b}}^{\top}\tilde{g}^{b\rightarrow a}<\frac{\beta N-(1-\alpha)(\|g^{a\rightarrow b}\|^{2}_{2}+\|\tilde{g}^{b\rightarrow a}\|^{2}_{2})}{2}\leq\frac{\beta N}{2}.$ (A4)

So, we have

$0\leq{g^{a\rightarrow b}}^{\top}\tilde{g}^{b\rightarrow a}<\frac{\beta N}{2},$ (A5)

and

$0<\|g^{a\rightarrow b}\|^{2}_{2}+\|\tilde{g}^{b\rightarrow a}\|^{2}_{2}<\frac{\beta N}{1-\alpha}.$ (A6)

Thus, we have

$\|g^{a\rightarrow b}+\tilde{g}^{b\rightarrow a}\|^{2}_{2}=\|g^{a\rightarrow b}\|^{2}_{2}+\|\tilde{g}^{b\rightarrow a}\|^{2}_{2}+2{g^{a\rightarrow b}}^{\top}\tilde{g}^{b\rightarrow a}<\frac{\beta(2-\alpha)N}{1-\alpha}.$ (A7)

Finally, we derive the lower/upper bound as

gab+g~ba22<β(2α)N1α.||{g^{a\rightarrow b}+\tilde{g}^{b\rightarrow a}}||^{2}_{2}<\frac{\beta(2-\alpha)N}{1-\alpha}. (A8)

It is clear that our cross-sanity check formulation (Eq. A8) is a relaxed version of the strict symmetry in Eq. A2. This proves that, under our cross-sanity check, the symmetry between $\tilde{g}^{b\rightarrow a}$ and $g^{a\rightarrow b}$ is bounded by $\alpha$ and $\beta$. When strict symmetry is satisfied, our cross-sanity check is necessarily satisfied; the converse, however, does not hold. We show in the experiments that our relaxed version of symmetry improves the overall results, both quantitatively and qualitatively.
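To make the role of $\tilde{g}^{b\rightarrow a}$ concrete: it is the backward displacement field resampled at the forward-warped coordinates $g^{a\rightarrow b}(p)+p$. Below is a minimal 2D sketch of one way to compute it with bilinear resampling; the function name and tensor conventions are ours for illustration, not the paper's implementation (the 3D case is analogous).

```python
import torch
import torch.nn.functional as F

def warp_displacement(g_ba, g_ab):
    """Resample the backward field g^{b->a} at the forward-warped coordinates
    p + g^{a->b}(p), yielding \tilde{g}^{b->a} (2D sketch). Both fields are
    (B, 2, H, W) with channel 0 = x- and channel 1 = y-displacement, in pixels."""
    B, _, H, W = g_ab.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=g_ab.dtype),
                            torch.arange(W, dtype=g_ab.dtype), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0)  # identity coords (1, 2, H, W)
    warped = grid + g_ab                              # p + g^{a->b}(p), (B, 2, H, W)
    # normalize coordinates to [-1, 1] as required by grid_sample (x by W, y by H)
    norm_x = 2 * warped[:, 0] / (W - 1) - 1
    norm_y = 2 * warped[:, 1] / (H - 1) - 1
    sample_grid = torch.stack((norm_x, norm_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(g_ba, sample_grid, align_corners=True)
```

With such a helper, the check in Eq. A3 can be evaluated directly on the pair (g_ab, warp_displacement(g_ba, g_ab)).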

A.3 Proof of Thm. 2

Theorem 2 (Existence of the unique minimizer for our relaxed optimization).

Let $m$ and $f$ be two images defined on the same spatial domain $\Omega$, which is connected, closed, and bounded in $\mathbb{R}^{n}$ with a Lipschitz boundary $\partial\Omega$. Let $g:H\times H\rightarrow H^{\star}$ be a displacement mapping from the Hilbert space $H$ to its dual space $H^{\star}$. Then, there exists a unique minimizer $g^{\star}$ to the relaxed minimization problem.

Proof 2.

In reality, a meaningful deformation field cannot be unbounded. We first restrict $g$ to a closed subset of $L^{2}(H^{\star})$:

\mathcal{B} \triangleq \{g\in L^{2}(H^{\star}): ||g||_{L^{2}(H^{\star})}^{2}\leq B,\ B\in\mathbb{R}_{+}\text{ only depends on }\Omega\}. (A9)

We then seek solutions $g^{\star}$ to the minimization problem in the space $H^{1}(\Omega)\cap\mathcal{B}$ that also satisfy our proposed checks. For brevity, we denote the minimization problem as

\min_{g\in H^{1}\cap\mathcal{B}} E(g), \quad \text{s.t.} \quad E(g) = -\operatorname{Sim}(f, m\circ(g^{m\rightarrow f}+id)) + \lambda_{r}||\operatorname{Reg}(g^{m\rightarrow f})||^{2}_{2}, (A10)

where $\lambda_{r}$ is a positive constant. For $g\in H^{1}\cap\mathcal{B}$, $E(g)$ is bounded below, and there exists a minimizing sequence $\{g_{k}\}_{k=1}^{\infty}$ satisfying

E(g_{k+1}) \leq E(g_{k}) \leq \cdots \leq E(g_{1}), \quad \lim_{k\rightarrow\infty} E(g_{k}) = \inf_{g\in H^{1}\cap\mathcal{B}} E(g). (A11)

For identical inputs, we have $||g||^{2}_{2}=0<B\Rightarrow g\in H^{1}\cap\mathcal{B}$; thus, the self-sanity check is satisfied. For different image pairs, we have

||g+\tilde{g}||^{2}_{2} < \alpha(||g||^{2}_{2}+||\tilde{g}||^{2}_{2}) + \beta N \leq 2\alpha B + \beta N. (A12)

By definition, $g$ and its reversed displacement $\tilde{g}$ need to follow the constraint $g(x)\tilde{g}(x)\leq 0,\ \forall x\in\Omega$, so we have

\max(||g+\tilde{g}||^{2}_{2}) < ||g||^{2}_{2} \leq B \Rightarrow 2\alpha B + \beta N < B. (A13)

Here, we require $2\alpha B+\beta N<B$, since our cross-sanity check is considered a tighter bound than $B$. Thus, by choosing appropriate $\alpha$ and $\beta$ (e.g., any $0<\alpha<\frac{1}{2}$ together with $0<\beta<\frac{(1-2\alpha)B}{N}$ satisfies this), we can ensure that the cross-sanity check is also satisfied, such that $g\in H^{1}\cap\mathcal{B}$. Since $H^{1}$ is precompact in the $L^{2}$ space, there exists a convergent subsequence, which we still denote as $\{g_{k}\}_{k=1}^{\infty}$, and $g^{\star}\in H^{1}$, such that $g_{k}\rightarrow g^{\star}$ strongly in $L^{2}$ and a.e. in $\Omega$. Note that both similarity functions (e.g., NCC) and distance functions (e.g., SSD) are naturally bounded in our image registration scenario, so we always have

-\operatorname{Sim}(f, m\circ(g^{\star}+p)) \leq \lim_{k\rightarrow\infty} -\operatorname{Sim}(f, m\circ(g_{k}+p)). (A14)

Besides, for the $H^{1}$ regularization, $\{g_{k}\}_{k=1}^{\infty}$ is a bounded convergent sequence in $H^{1}\cap\mathcal{B}$ with $g_{k}\rightarrow g^{\star}$ a.e. in $\Omega$. By the dominated convergence theorem, we have

\lim_{k\rightarrow\infty} \operatorname{Reg}(g_{k}) = \operatorname{Reg}(g^{\star}). (A15)

Combining Eq. A14 with Eq. A15, we obtain

E(g^{\star}) \leq \lim_{k\rightarrow\infty} E(g_{k}) = \inf_{g\in H^{1}\cap\mathcal{B}} E(g). (A16)

Thus, $g^{\star}$ is indeed a solution to the minimization problem, which proves that a minimizer $g^{\star}$ of the modified optimization problem exists. We can then prove that $g^{\star}$ is unique. Note that we assume the similarity operator is concave (e.g., NCC, so that negative NCC is convex) when it is not a distance operator (e.g., SSD, which is convex) on the transformation for a given pair $(m,f)$; i.e., $g^{\star}=g|m,f$, the minimizer learned by the model and conditioned on this specific $(m,f)$ pair to satisfy the proposed sanity checks. Therefore, since the data term (e.g., SSD) and the $H^{1}$ regularization are convex, together with the convex search space for $g$, the uniqueness of the minimizer $g^{\star}$ is proved.

A.4 Proof of Thm. 3

Theorem 3 (Loyalty of the sanity-checked minimizer).

Let $\mathbf{g}_{\ast}$ be the optimal minimizer to the bidirectional registration problem defined in Eq. 10, and $\mathbf{g}_{\rm sanity}$ our sanity-checked minimizer defined in Eq. 11. The distance between these two minimizers can be upper bounded as

\operatorname{Sim}(\mathbf{g}_{\rm sanity}) - \operatorname{Sim}(\mathbf{g}_{\ast}) \leq \frac{\lambda}{2}||\mathbf{A}(\mathbf{g}_{\rm sanity}-\mathbf{g}_{\ast})||^{2}_{2}.
Proof 3.

Since $\mathbf{g}_{\ast}$ is optimal, we have

-\operatorname{Sim}(\mathbf{g}_{\ast}) + \frac{\lambda}{2}||\mathbf{A}\mathbf{g}_{\ast}-\mathbf{y}||^{2}_{2} \leq -\operatorname{Sim}(\mathbf{g}_{\rm sanity}) + \frac{\lambda}{2}||\mathbf{A}\mathbf{g}_{\rm sanity}-\mathbf{y}||^{2}_{2}.

Since $||\mathbf{A}\mathbf{g}_{\ast}-\mathbf{y}||^{2}_{2}=0$, elimination gives

-\operatorname{Sim}(\mathbf{g}_{\ast}) \leq -\operatorname{Sim}(\mathbf{g}_{\rm sanity}) + \frac{\lambda}{2}||\mathbf{A}\mathbf{g}_{\rm sanity}-\mathbf{y}||^{2}_{2}.

By substituting $\mathbf{y}$ with $\mathbf{A}\mathbf{g}_{\ast}$ and combining like terms, we obtain Eq. 12. Therefore, we prove the loyalty of the unique minimizer $\mathbf{g}_{\rm sanity}$ to the optimal minimizer $\mathbf{g}_{\ast}$, controlled by the weight $\lambda$.

A.5 Proof of CS Error Upper Bound

Recall that we want to prove that the CS error is upper bounded as

||\text{CS}(\mathbf{g}_{\rm sanity})||^{2}_{2} < 2(1-\alpha)\beta N.
Proof 4.

Recall the CS error in Eq. 9; by Thm. 1, we have

||\text{CS}(\mathbf{g}_{\rm sanity})||^{2}_{2}
  = ||\mathbf{g}_{\rm sanity}+\tilde{\mathbf{g}}_{\rm sanity}||^{2}_{2} - \alpha(||\mathbf{g}_{\rm sanity}||^{2}_{2}+||\tilde{\mathbf{g}}_{\rm sanity}||^{2}_{2}) - 2\beta N
  = (1-\alpha)\Big(||\mathbf{g}_{\rm sanity}||^{2}_{2}+||\tilde{\mathbf{g}}_{\rm sanity}||^{2}_{2} + \frac{2}{1-\alpha}{\mathbf{g}_{\rm sanity}}^{\top}\tilde{\mathbf{g}}_{\rm sanity}\Big) - 2\beta N
  \leq (1-\alpha)\Big(||\mathbf{g}_{\rm sanity}||^{2}_{2}+||\tilde{\mathbf{g}}_{\rm sanity}||^{2}_{2} + 2{\mathbf{g}_{\rm sanity}}^{\top}\tilde{\mathbf{g}}_{\rm sanity}\Big) - 2\beta N
  = (1-\alpha)||\mathbf{g}_{\rm sanity}+\tilde{\mathbf{g}}_{\rm sanity}||^{2}_{2} - 2\beta N
  < (1-\alpha)\frac{2\beta(2-\alpha)}{1-\alpha}N - 2\beta N
  = 2(1-\alpha)\beta N, \quad \text{where } 0<\alpha<1 \text{ and } \beta>0. (A17)

The first inequality holds since ${\mathbf{g}_{\rm sanity}}^{\top}\tilde{\mathbf{g}}_{\rm sanity}\leq 0$ for valid inverse-consistent displacements. The second inequality holds for the two directions ($m{\rightarrow}f$ and $f{\rightarrow}m$) by Thm. 1. Thus, the proof of the CS error upper bound is complete.
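As a quick plausibility check of the two bounds above (not an experiment from the paper), the toy NumPy snippet below constructs near-inverse-consistent displacement pairs, verifies that they pass the cross-sanity check (Eq. A3), and confirms the bound of Eq. A8 and the CS error bound; the sampling scheme and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, N = 0.1, 5.0, 1000

for _ in range(100):
    g = rng.normal(scale=2.0, size=N)              # forward displacement
    g_tilde = -g + rng.normal(scale=0.3, size=N)   # near-inverse backward field
    lhs = np.sum((g + g_tilde) ** 2)
    rhs = alpha * (np.sum(g**2) + np.sum(g_tilde**2)) + beta * N
    assert lhs < rhs                                     # cross-sanity check (Eq. A3)
    assert lhs < beta * (2 - alpha) * N / (1 - alpha)    # Thm. 1 bound (Eq. A8)
    cs = lhs - rhs - beta * N   # = ||g+g~||^2 - alpha(...) - 2*beta*N, cf. Eq. A17
    assert cs < 2 * (1 - alpha) * beta * N               # CS error upper bound
```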

| α | β | Dice | SDice | FV | AJ×10⁴ | SSE×10⁻¹ | CSE |
|---|---|------|-------|----|--------|----------|-----|
| 0.1∗ | 5 | 0.7263 | 0.8850 | 1.5220 | 3.00 | 5.156 | 10.72 |
| 0.1† | 5 | 0.7223 | 0.8745 | 1.6130 | 3.30 | 4.337 | 11.05 |
| 0 | 0 | 0.7178 | 0.9486 | 1.4041 | 2.74 | 0.693 | 7.58→1.81 (↓76.12%) |
| 0.1 | 0.1 | 0.7199 | 0.9533 | 1.2460 | 2.30 | 0.721 | 6.39→2.21 (↓65.41%) |
| 0.1 | 1 | 0.7223 | 0.9654 | 1.2070 | 2.02 | 0.643 | 6.81→4.60 (↓32.45%) |
| 0.1 | 3 | 0.7233 | 0.9685 | 1.0160 | 1.51 | 0.596 | 8.70→5.19 (↓40.34%) |
| 0.1 | 5 | 0.7247 | 0.9588 | 0.8216 | 1.09 | 0.723 | 10.72→5.03 (↓53.07%) |
| 0.1 | 7 | 0.7226 | 0.9689 | 0.6632 | 0.77 | 0.642 | 12.53→4.52 (↓63.92%) |
| 0.1 | 8 | 0.7232 | 0.9600 | 0.6152 | 0.68 | 0.668 | 13.39→4.27 (↓68.11%) |
| 0.1 | 9 | 0.7226 | 0.9699 | 0.5534 | 0.58 | 0.572 | 14.25→4.03 (↓71.72%) |
| 0.1 | 11 | 0.7206 | 0.9665 | 0.4472 | 0.41 | 0.564 | 15.83→3.53 (↓77.70%) |
| 0.1 | 12 | 0.7211 | 0.9668 | 0.4219 | 0.38 | 0.572 | 16.55→3.43 (↓79.27%) |
| 0.1 | 13 | 0.7186 | 0.9742 | 0.4278 | 0.39 | 0.543 | 17.22→3.39 (↓80.31%) |
| 0.1 | 14 | 0.7189 | 0.9678 | 0.4182 | 0.37 | 0.543 | 17.80→3.24 (↓81.80%) |
| 0.1 | 16 | 0.7179 | 0.9681 | 0.4345 | 0.40 | 0.561 | 18.79→2.89 (↓84.62%) |
| 0.1 | 20 | 0.7124 | 0.9693 | 0.4668 | 0.44 | 0.577 | 20.36→2.82 (↓86.15%) |
| 0.01 | 5 | 0.7229 | 0.9677 | 0.9842 | 1.46 | 0.623 | 11.30→5.43 (↓51.94%) |
| 0.10 | 5 | 0.7247 | 0.9588 | 0.8216 | 1.09 | 0.723 | 10.72→5.03 (↓53.07%) |
| 0.15 | 5 | 0.7230 | 0.9624 | 0.7891 | 0.99 | 0.591 | 10.42→4.77 (↓54.22%) |
| 0.20 | 5 | 0.7222 | 0.9656 | 0.7479 | 0.90 | 0.586 | 10.15→4.53 (↓55.37%) |
| 0.50 | 5 | 0.7218 | 0.9570 | 0.7276 | 0.86 | 0.642 | 8.64→3.47 (↓59.83%) |
| 1 | 5 | 0.7165 | 0.9353 | 0.9622 | 1.45 | 1.277 | 7.41→1.93 (↓73.95%) |

Table A1: Ablation study for α and β, where the loss weights for both sanity-check losses are set to 0.001. In the CSE column, arrows show the error before and after sanity-checked training, with the relative reduction in parentheses. ∗ denotes the checkpoint model used in this ablation (yielding the initial CSE errors on the left side of the arrows), and † denotes the model finetuned, following the standard protocol, for the same number of epochs as the rest of the ablative experiments. For these two rows, α and β are only used to calculate the CSE; along with the SSE, they are not part of the training loss.
Figure A2: Qualitative comparisons of different model variants on: (Top) the IXI dataset; (Bottom) the OASIS validation dataset.

A.6 Cross-sanity Check Numerical Study

We present our numerical study of the α and β parameters in Tab. A1. This ablation is conducted on the same subset of the IXI training dataset described in the ablation study section, but validated/tested on the entire validation/test set. Our intuition is that, compared to enforcing strict inverse consistency over two different displacements, the relaxed version is mathematically easier to solve. Practically speaking, it is difficult to find a one-to-one correspondence for every point in a moving-fixed image pair, which is what strict inverse consistency demands; relaxing the constraint, i.e., setting an error threshold, is effective from this perspective. Our error-bound formulation implicitly provides guidance for training such sanity-checked models.

Settings of α and β. A uniform estimate of α and β is possible; however, such a bound is not sharp and would lead to over-estimation of $\lambda_{c}$ (the regularization parameter) for different applications. Hence, we prefer to derive the bounds on α and β for particular sets of applications, where such bounds can easily be found from the formulations. As stated in Thm. 1, α and β control the relaxation. We can also directly derive an upper bound from the check that constrains the ratio of the two displacements: since $g^{a\rightarrow b}\tilde{g}^{b\rightarrow a}<0$, we have $\frac{g^{a\rightarrow b}}{\tilde{g}^{b\rightarrow a}}+\frac{\tilde{g}^{b\rightarrow a}}{g^{a\rightarrow b}}<\frac{2}{1-\alpha}$, where β is neglected for simplicity. These two bounds estimate the ranges of α and β for our relaxation. E.g., we use existing models (e.g., VM) to predict ten random samples and set β to 0.15× the maximum displacement; we set α to 0.1 for models outputting absolute displacements (e.g., VM and TM), or to 0.01 for models outputting relative displacements (e.g., DIRAC). Note that this only needs to be done once, while the previous experiment shows that the registrations are fairly robust across a range of α and β. Thus, it is safe to choose α and β within the range.
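As a concrete illustration of this one-off procedure, a minimal sketch is given below; model(m, f) returning a dense displacement field is an assumed interface, and the function name and defaults are ours.

```python
import torch

@torch.no_grad()
def estimate_beta(model, sample_pairs, ratio=0.15):
    # Sketch of the heuristic above: predict displacement fields for a few
    # random moving/fixed pairs and set beta from the maximum magnitude.
    max_disp = 0.0
    for m, f in sample_pairs:
        g = model(m, f)  # assumed: dense displacement field, e.g. (B, 3, D, H, W)
        max_disp = max(max_disp, g.abs().max().item())
    return ratio * max_disp

# alpha is then picked per model family: 0.1 for absolute-displacement models
# (e.g., VM, TM), 0.01 for relative-displacement models (e.g., DIRAC).
```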

| Train/Test | Model | Dice | SDice | HD95 | SDlogJ | FV | AJ×10⁴ | SSE×10⁻¹ | CSE |
|------------|-------|------|-------|------|--------|----|--------|----------|-----|
| IXI/OASIS | VM | 0.714 | 0.848 | 3.039 | 0.118 | 0.831 | 1.52 | 6.24 | 7.49 |
| | VM-ESC | 0.759 | 1.000 | 2.563 | 0.080 | 0.173 | 0.16 | 0.00 | 2.86 |
| | TMBS | 0.775 | 0.901 | 2.201 | 0.072 | 0.00 | 0.00 | 24.4 | 15.16 |
| | TMBS-ESC | 0.783 | 1.000 | 2.144 | 0.061 | 0.00 | 0.00 | 0.00 | 3.73 |
| OASIS/IXI | TM | 0.705 | 0.918 | 4.083 | 0.132 | 1.390 | 2.67 | 3.18 | 13.78 |
| | TM-SC | 0.718 | 1.000 | 3.903 | 0.097 | 0.674 | 0.84 | 0.01 | 5.58 |

Table A2: Generalization study of sanity-checked registers on cross-dataset registration performance. Here, we set α=0.1 and β=12 to calculate the cross-sanity errors in both settings.
Figure A3: Qualitative comparisons between ICNet [50], ICON [16], and our method on the IXI dataset.

A.7 Ablative Qualitative Comparisons

Qualitative comparisons between our ablative model variants are shown in Fig. A2.

A.8 Sanity-awareness Preservation Study on Cross-Dataset Scenario

We test whether sanity awareness is preserved in cross-dataset scenarios. We train our sanity-checked register on one dataset and test it on a different dataset, so there is no overlap between the training and testing datasets. The results are shown in Tab. A2. Compared to methods without sanity checks, our sanity-checked models improve on every metric, certifying that our sanity checks do not harm model training. Besides, the sanity-checked registers still preserve good sanity, keeping the corresponding sanity errors low.

A.9 Experimental Results Statistical Significance

We specifically study whether our results are statistically significant compared to the other strong baselines, e.g., ICON [16] and DIRAC [31]. We calculate p-values using the scipy package, considering all metrics shown in the corresponding tables. Compared to ICON, our VM-ESC obtains a p-value of 0.0174 and our TMBS-ESC a p-value of 0.0280, while compared to DIRAC, our DIRAC-SC obtains a p-value of 0.0204. All p-values are <0.05, indicating that our results are statistically significant.
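For reference, a p-value of this kind can be computed as sketched below; the exact statistical test is not specified here, so the paired t-test and the metric vectors are illustrative assumptions only.

```python
from scipy import stats

# illustrative paired per-metric scores for two models (not the paper's data)
ours = [0.759, 1.000, 2.563, 0.080, 0.173, 0.16]
baseline = [0.714, 0.848, 3.039, 0.118, 0.831, 1.52]

t_stat, p_value = stats.ttest_rel(ours, baseline)  # paired t-test, an assumed choice
print(f"p-value: {p_value:.4f}")
```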

A.10 Error Map Comparisons between Inverse Consistent Methods

Qualitative comparisons between inverse consistent methods on IXI dataset are shown in Fig. A3.

| Method | TRE↓ | STRE↓ | ROB↑ | FV↓ | AJ×10² |
|--------|------|-------|------|-----|--------|
| DIRAC | 2.760±0.247 | 0.274±0.027 | 0.776±0.055 | 0.025±0.009 | 4.242±2.954 |
| DIRAC-C | 2.721±0.262 | 0.268±0.039 | 0.791±0.044 | 0.022±0.008 | 3.012±1.442 |
| DIRAC-SC | 2.719±0.259 | 0.218±0.046 | 0.795±0.034 | 0.022±0.005 | 3.001±1.314 |

Table A3: Performance of replacing DIRAC's inverse consistency error.

A.11 Performance of Replacing DIRAC’s Inverse Error

We denote the model with the replaced inverse consistency error as DIRAC-C, and report the results in Tab. A3.

A.12 Role of Image Similarity Loss

The image similarity loss still plays a very important role during training. The reason is that $\mathcal{L}_{\rm self}$ and $\mathcal{L}_{\rm cross}$ are defined on displacements; to calculate these losses, we need to ensure that the displacements are meaningful, which is guaranteed via $\mathcal{L}_{\rm sim}$. Compared to the value of NCC (<1), the cross-sanity error is relatively large (Tab. 4), so using a small $\lambda_{c}$ will not interfere with the optimization.
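A minimal sketch of how these terms could be combined is given below; the function is illustrative rather than the paper's exact implementation, the 1e-3 sanity-loss weights follow Tab. A1's caption, and lambda_r is an assumed placeholder.

```python
def total_loss(sim, reg, self_err, cross_err,
               lambda_r=1.0, lambda_s=1e-3, lambda_c=1e-3):
    # sim: similarity to maximize (e.g., NCC); reg: smoothness regularizer;
    # self_err / cross_err: self- and cross-sanity check penalties.
    # The 1e-3 sanity weights follow Tab. A1's caption; lambda_r is assumed.
    return -sim + lambda_r * reg + lambda_s * self_err + lambda_c * cross_err
```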