
$^{1}$Department of Electrical and Computer Engineering, Vanderbilt University
$^{2}$Department of Computer Science, Vanderbilt University
Email: dewei.hu@vanderbilt.edu

VesselMorph: Domain-Generalized Retinal Vessel Segmentation via Shape-Aware Representation

Dewei Hu$^{1}$, Hao Li$^{1}$, Han Liu$^{2}$, Xing Yao$^{2}$, Jiacheng Wang$^{2}$, Ipek Oguz$^{1,2}$
Abstract

Due to the absence of a single standardized imaging protocol, domain shift between data acquired at different sites is an inherent property of medical images and has become a major obstacle for the large-scale deployment of learning-based algorithms. For retinal vessel images, domain shift usually presents as variations in intensity, contrast and resolution, while the basic tubular shape of vessels remains unaffected. Thus, taking advantage of such domain-invariant morphological features can greatly improve the generalizability of deep models. In this study, we propose VesselMorph, a method that generalizes the 2D retinal vessel segmentation task by synthesizing a shape-aware representation. Inspired by the traditional Frangi filter and the diffusion tensor imaging literature, we introduce a Hessian-based bipolar tensor field to describe the morphology of the vessels, so that the shape information is taken into account. We map the intensity image and the tensor field to a latent space for feature extraction, then fuse the two latent representations via a weight-balancing trick and feed the result to a segmentation network. We evaluate on six public datasets of fundus and OCT angiography images from diverse patient populations. VesselMorph achieves superior generalization performance compared with competing methods under various domain shift scenarios.

Keywords: domain generalization · vessel segmentation · tensor field · shape representation · retina

1 Introduction

Medical images suffer from distribution shifts caused by discrepancies in image acquisition protocols. Images can differ in contrast, resolution and intensity range, even within the same modality; a set of examples is shown in Fig. 1. This obstacle severely impedes learning-based algorithms from reaching clinical adoption. Therefore, much effort has been devoted to the domain generalization (DG) problem, so that deep models can work robustly on out-of-distribution (OOD) data. There are three major families of solutions: data augmentation [23, 18], meta-learning [6, 14] and domain alignment [24]. The first two strategies aim to improve the model's generalizability by either augmenting the source domain with additional data or replicating the exposure to OOD data during training. In contrast, domain alignment strives to align the distributions of the domains in either image [8] or feature space [1, 15].

Figure 1: Domain shift among retinal vessel images. Panels (1-3) are the green channel of fundus images; panels (4-5) are OCTA images. All have the same size ($300\times300$ pixels). Suppose (1) represents the source domain. Distribution shifts in the test domain can be caused by (2) pathology, (3) resolution change, and (4, 5) a different modality.

We propose a novel method, VesselMorph, to improve DG performance by providing an explicit description of domain-agnostic shape features as auxiliary training material. Even though traditional algorithms are outperformed by their learning-based counterparts in many respects, they typically generalize better to any dataset, regardless of distribution shifts. Specifically for vessel segmentation, Frangi et al. [7] proposed a Hessian-based model to express the tubular shape of vessels, which can be regarded as a domain-invariant feature. Merging the Hessian-based shape description [12] with the principles of diffusion tensor imaging (DTI) [13], we introduce a bipolar tensor field (BTF) that explicitly represents the vessel shape by a tensor at each pixel. To effectively merge the features of the intensity image and the shape descriptor BTF, we employ a full-resolution feature extraction network to obtain an interpretable representation in the latent space from both inputs. This technique is broadly used in unsupervised segmentation [17, 11] and representation disentanglement [3, 21].

As shown in Fig. 2, let $\mathbf{x}$ be the input image and $\Psi(\mathbf{x})$ the corresponding BTF. $D(E^{I}(\cdot))$ and $D(E^{S}(\cdot))$ are two feature extraction networks with a shared decoder $D$. We empirically observe that the intensity representation $\mathbf{z}^{I}$ can precisely delineate thinner vessels, while the structure representation $\mathbf{z}^{S}$ works better on thick ones. We combine the strengths of the two pathways for robust DG performance. The two latent images are fused by a weight-balancing trick $\Gamma(\mathbf{z}^{I},\mathbf{z}^{S})$ to avoid any potential bias induced by the selection of source domains. Finally, we train a segmentation network $D^{T}$ on the fused latent images. We compare the performance of VesselMorph to other DG approaches on four public target-domain datasets that represent various distribution shift conditions, and show that VesselMorph has superior performance in most OOD domains. Our contributions are:

  • A Hessian-based bipolar tensor field (BTF) that provides an explicit description of the vessel morphology (Sec. 2.1).

  • A full-resolution feature extraction network that generates vessel representation from both the intensity image and the BTF (Sec. 2.2).

  • A training pipeline that generates stable latent images for both pathways and a weight-balancing method to fuse the two representations (Sec. 2.3).

  • A comprehensive evaluation on public datasets which shows superior cross-resolution and cross-modality generalization performance (Sec. 3.2).

Figure 2: The overall structure of VesselMorph. The shaded layers include transformer blocks. The dashed line indicates that $D$ is discarded in testing.

2 Methods

2.1 Bipolar Tensor Field

Unlike ML models, our visual interpretation of vessels is rarely affected by data distribution shifts. Mimicking human vessel recognition can thus help address the DG problem. In addition to intensity values, human perception of vessels also depends on the local contrast and the correlation within a neighborhood, which is often well described by the local Hessian. Inspired by the use of DTI to depict white matter tracts, we create a Hessian-based bipolar tensor field to represent the morphology of vessels. Given a 2D input image $\mathbf{x}\in\mathbb{R}^{h\times w}$ and scale $\sigma$, the classical Frangi vesselness $\mathcal{V}(\sigma)$ [7] is defined as:

$$\mathcal{V}(\sigma)=\begin{cases}0 & \text{if }\lambda_{2}>0,\\ \exp\left(-\frac{\mathcal{R}_{B}^{2}}{2\beta^{2}}\right)\left[1-\exp\left(-\frac{S^{2}}{2c^{2}}\right)\right] & \text{otherwise.}\end{cases} \qquad (1)$$

Here, $\lambda_{1},\lambda_{2}$ are the sorted eigenvalues of the Hessian $\mathcal{H}$, $\mathcal{R}_{B}=\lambda_{1}/\lambda_{2}$, $S=\|\mathcal{H}\|_{F}$ is the Frobenius norm of the Hessian, and $\beta=c=0.5$. Note that we assume vessels are brighter than the background; fundus images are negated to comply. To represent vessels of different sizes, we leverage the multiscale vesselness filter that uses the optimal scale $\sigma^{*}$ for the Hessian $\mathcal{H}(\mathbf{x}_{ij},\sigma)$ at each pixel $(i,j)$. This is achieved by a grid search over the range $[\sigma_{min},\sigma_{max}]$ to maximize the vesselness $\mathcal{V}(\sigma)$, i.e., $\sigma^{*}=\operatorname{argmax}_{\sigma_{min}\leq\sigma\leq\sigma_{max}}\mathcal{V}(\sigma)$. The optimized Hessian is then represented by a $2\times2$ matrix:

$$\mathcal{H}(\mathbf{x}_{ij},\sigma^{*})=(\sigma^{*})^{2}\,\mathbf{x}_{ij}\ast\nabla^{2}G(\mathbf{x}_{ij},\sigma^{*}) \qquad (2)$$

where $G(\mathbf{x}_{ij},\sigma^{*})$ is a 2D Gaussian kernel with standard deviation $\sigma^{*}$. We then apply an eigendecomposition to obtain the eigenvalues $\lambda_{1},\lambda_{2}$ ($|\lambda_{1}|\leq|\lambda_{2}|$) and the corresponding eigenvectors $\mathbf{v}_{1},\mathbf{v}_{2}$ at the optimal scale $\sigma^{*}$.
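To make the scale search concrete, below is a minimal NumPy sketch of Eqs. (1-2), assuming a float image normalized to $[0,1]$ with bright vessels; the helper name `hessian_eigs` and the scale grid are illustrative choices, not part of the original implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_eigs(x, sigma):
    """Scale-normalized Hessian eigenvalues/eigenvectors at every pixel (Eq. 2)."""
    Hxx = sigma**2 * gaussian_filter(x, sigma, order=(0, 2))  # d2/dx2
    Hyy = sigma**2 * gaussian_filter(x, sigma, order=(2, 0))  # d2/dy2
    Hxy = sigma**2 * gaussian_filter(x, sigma, order=(1, 1))  # d2/dxdy
    H = np.stack([np.stack([Hyy, Hxy], -1),
                  np.stack([Hxy, Hxx], -1)], -2)              # (h, w, 2, 2)
    evals, evecs = np.linalg.eigh(H)
    order = np.argsort(np.abs(evals), axis=-1)                # enforce |l1| <= |l2|
    evals = np.take_along_axis(evals, order, axis=-1)
    evecs = np.take_along_axis(evecs, order[..., None, :], axis=-1)
    return evals, evecs

def frangi_vesselness(x, sigmas=np.linspace(1, 4, 7), beta=0.5, c=0.5):
    """Eq. (1), maximized over scales; bright vessels on a dark background."""
    best = np.zeros_like(x)
    for sigma in sigmas:
        evals, _ = hessian_eigs(x, sigma)
        l1, l2 = evals[..., 0], evals[..., 1]
        rb2 = (l1 / (l2 + 1e-10))**2                          # R_B^2
        s2 = l1**2 + l2**2                                    # S^2 = ||H||_F^2
        v = np.exp(-rb2 / (2 * beta**2)) * (1 - np.exp(-s2 / (2 * c**2)))
        v[l2 > 0] = 0                                         # first case of Eq. (1)
        best = np.maximum(best, v)                            # keep the optimal scale
    return best
```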

Figure 3: Left: A simplified illustration of BTF. The red arrows indicate the orientation of $\mathbf{v}_{1}$ while the blue arrows correspond to $\mathbf{v}_{2}$. The ellipses represent the tensors at $p_{1}$ (in the vessel) and $p_{2}$ (in the background). Right: BTF applied to an OCTA image.

Instead of solely analyzing the signs and magnitudes of the Hessian eigenvalues as in the traditional Frangi filter, we propose to leverage the eigenvectors along with custom-designed magnitudes to create our tensor field, as shown in Fig. 3 (Left). The core idea of the Frangi filter is to enhance tubular structures by matching the vessel diameter with the distance between the two zero crossings of the second-order derivative of the Gaussian ($2\sqrt{2}\sigma^{*}$). However, the solution is not guaranteed to land in the range $[\sigma_{min},\sigma_{max}]$, especially for small vessels. Consequently, we observe that an inaccurate estimation of $\sigma^{*}$ results in a blurring effect at the vessel boundary, which is problematic for segmentation. As an example in Fig. 3 (Left), the direction of $\mathbf{v}_{1}$ at $p_{2}$ aligns with that at $p_{1}$, even though $p_{1}$ is inside the vessel while $p_{2}$ is in the background, close to the boundary. This makes it difficult for the vector orientations alone to differentiate points inside and outside the vessel. To tackle this, we introduce the idea of a bipolar tensor: for points in the background, we assign a large magnitude to the orthogonal eigenvector $\mathbf{v}_{2}$, as shown in the blue dashed ellipse. Specifically, we define the magnitudes $\alpha_{1}$ and $\alpha_{2}$ associated with the eigenvectors $\mathbf{v}_{1}$ and $\mathbf{v}_{2}$ as:

$$\alpha_{1}=\underbrace{P(\mathbf{x}\leq\mathbf{x}_{ij})}_{\text{bright}}\,\underbrace{\exp\left(-\epsilon\frac{\lambda_{1}^{2}}{\|\mathcal{H}\|^{2}_{F}}\right)}_{\text{vessel-like}},\qquad \alpha_{2}=\underbrace{P(\mathbf{x}>\mathbf{x}_{ij})}_{\text{dark}}\,\underbrace{\exp\left(-\epsilon\frac{\lambda_{2}^{2}}{\|\mathcal{H}\|^{2}_{F}}\right)}_{\text{vessel-like}} \qquad (3)$$

where $P(\mathbf{x}>\mathbf{x}_{ij})$ is the probability that the intensity of a random pixel $x$ in the image is greater than $\mathbf{x}_{ij}$. This is equivalent to normalizing the histogram by the factor $hw$ and computing the cumulative distribution function at $\mathbf{x}_{ij}$. This term thus provides a normalized brightness function in the range $[0,1]$. The exponential term represents how vessel-like the pixel is by using a normalized eigenvalue, and is in the $[0,1]$ range as well. $\epsilon$ is a constant that controls the sensitivity, empirically set to $0.5$. With the custom magnitudes $\alpha_{1}$ and $\alpha_{2}$, the two poles can better differentiate vessels from the background. Fig. 3 (Right) shows an example of BTF on an OCTA image. In practice, we stack the two vectors as the input to the structural encoding network, i.e., $\Psi(\mathbf{x}_{ij})=\left[\alpha_{1}\mathbf{v}_{1}^{\top},\alpha_{2}\mathbf{v}_{2}^{\top}\right]^{\top}\in\mathbb{R}^{4\times1}$.
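A hedged sketch of Eq. (3) follows, reusing the `hessian_eigs` helper from the previous sketch. For brevity it evaluates a single scale $\sigma$ per call, whereas the paper selects $\sigma^{*}$ per pixel via the Frangi search; the empirical CDF implements the $P(\mathbf{x}\leq\mathbf{x}_{ij})$ term.

```python
import numpy as np  # reuses hessian_eigs from the sketch above

def bipolar_tensor_field(x, sigma, eps=0.5):
    """Psi(x) of shape (4, h, w): [alpha1*v1; alpha2*v2] per pixel (Eq. 3)."""
    evals, evecs = hessian_eigs(x, sigma)
    l1, l2 = evals[..., 0], evals[..., 1]
    v1, v2 = evecs[..., :, 0], evecs[..., :, 1]      # eigenvectors, (h, w, 2) each
    frob2 = l1**2 + l2**2 + 1e-10                    # ||H||_F^2

    # Empirical CDF: P(x <= x_ij), the normalized-histogram brightness term.
    ranks = np.argsort(np.argsort(x.ravel())).reshape(x.shape)
    cdf = (ranks + 1) / x.size
    a1 = cdf * np.exp(-eps * l1**2 / frob2)          # bright * vessel-like
    a2 = (1 - cdf) * np.exp(-eps * l2**2 / frob2)    # dark * vessel-like

    btf = np.concatenate([a1[..., None] * v1,
                          a2[..., None] * v2], axis=-1)   # (h, w, 4)
    return btf.transpose(2, 0, 1)                    # (4, h, w) network input
```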

2.2 Latent Vessel Representation

Preserving the spatial resolution in the bottleneck of models with a U-Net backbone is a common strategy to emphasize structural features in unsupervised segmentation [17, 11] and representation disentanglement [3, 21]. We employ a network with a full-resolution ($h\times w$ pixels) latent space as the feature extraction model, and extract the vessel structure from both the intensity image $\mathbf{x}\in\mathbb{R}^{h\times w}$ and its corresponding BTF, $\Psi(\mathbf{x})\in\mathbb{R}^{4\times h\times w}$. Therefore, in Fig. 2, the intensity $D(E^{I}(\cdot))$ and structure $D(E^{S}(\cdot))$ encoding pathways share the decoder $D$ and produce latent images $\mathbf{z}^{I},\mathbf{z}^{S}\in\mathbb{R}^{h\times w}$. To place more of the workload on the encoder, $D$ has a shallower architecture; it is discarded at test time. For the intensity encoding, the model is optimized by minimizing a segmentation loss defined as the combination of cross-entropy and Dice loss:

$$\mathcal{L}_{seg}=-\frac{1}{N}\sum_{n=1}^{N}\mathbf{y}_{n}\log\hat{\mathbf{y}}^{I}_{n}+\left(1-\frac{2\sum_{n=1}^{N}\mathbf{y}_{n}\hat{\mathbf{y}}^{I}_{n}}{\sum_{n=1}^{N}\mathbf{y}_{n}^{2}+(\hat{\mathbf{y}}^{I}_{n})^{2}}\right) \qquad (4)$$

where $N=h\times w$ is the total number of pixels in the image, $\mathbf{y}$ is the ground truth and $\hat{\mathbf{y}}^{I}$ is the prediction from the training-only decoder $D$. Although there is no explicit constraint on the latent image $E^{I}(\mathbf{x})=\mathbf{z}^{I}$, we note that the segmentation-based supervision encourages it to include the vessels while most other irrelevant features are filtered out. Hence, we can view the latent feature as a vessel representation.
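For clarity, a PyTorch sketch of Eq. (4) is given below. Eq. (4) writes the cross-entropy compactly over class labels; here we assume a single foreground-probability channel and expand the binary form, which is our reading rather than the paper's code.

```python
import torch

def seg_loss(y_hat: torch.Tensor, y: torch.Tensor, eps: float = 1e-7):
    """Cross-entropy + Dice (Eq. 4) for a foreground-probability map y_hat
    and a binary ground-truth mask y of the same shape."""
    ce = -(y * torch.log(y_hat + eps)
           + (1 - y) * torch.log(1 - y_hat + eps)).mean()
    dice = 1 - 2 * (y * y_hat).sum() / ((y**2 + y_hat**2).sum() + eps)
    return ce + dice
```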

Our approach is slightly different for the structure encoding, as we observe that it is hard for the feature extraction network to generate a stable, artifact-free latent image when the number of input channels is greater than one. Thus, it is necessary to use $E^{I}$ as a teacher model that provides direct supervision on the vessel representation. In other words, we first train the intensity encoding path to get $E^{I}$ and $D$, then train $E^{S}$ by leveraging both the segmentation loss in Eq. 4 and a similarity loss defined as:

$$\mathcal{L}_{sim}(\mathbf{z}^{S},\mathbf{z}^{I})=\sum_{n=1}^{N}\|\mathbf{z}^{S}_{n}-\mathbf{z}^{I}_{n}\|_{1}+\mathrm{SSIM}(\mathbf{z}^{S},\mathbf{z}^{I}) \qquad (5)$$

which is a weighted sum of the $L_{1}$ norm and the structural similarity loss SSIM [10]. SSIM is defined as $\mathrm{SSIM}(A,B)=2\frac{(2\mu_{A}\mu_{B}+c_{1})(2\sigma_{AB}+c_{2})}{(\mu_{A}^{2}+\mu_{B}^{2}+c_{1})(\sigma_{A}^{2}+\sigma_{B}^{2}+c_{2})}$, where $\mu$ and $\sigma$ represent the mean and standard deviation of the image, and we set $c_{1}=0.01$ and $c_{2}=0.03$. The overall loss function for the structural encoding is thus $\mathcal{L}(\Psi(\mathbf{x}),\mathbf{y})=\omega_{1}\mathcal{L}_{seg}(\hat{\mathbf{y}}^{S},\mathbf{y})+\omega_{2}\mathcal{L}_{sim}(\mathbf{z}^{S},\mathbf{z}^{I})$, with empirically determined weights $\omega_{1}=1$ and $\omega_{2}=5$. Experimentally, we find that $\mathbf{z}^{I}$ is good at preserving small vessels, while $\mathbf{z}^{S}$ works better on larger ones.
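A sketch of the student objective under our reading of Eq. (5): since a similarity loss should decrease as $\mathbf{z}^{S}$ approaches $\mathbf{z}^{I}$, we implement the SSIM term as $1-\mathrm{SSIM}$, which is an assumption on our part; the SSIM call comes from torchmetrics rather than the paper's code, and it expects $(N,C,H,W)$ tensors.

```python
import torch
from torchmetrics.functional import structural_similarity_index_measure as ssim

def sim_loss(z_s: torch.Tensor, z_i: torch.Tensor):
    """L1 distance plus an SSIM term; 1 - SSIM is our assumed loss form."""
    l1 = torch.abs(z_s - z_i).sum()
    return l1 + (1 - ssim(z_s, z_i))

def structure_loss(y_hat_s, y, z_s, z_i, w1=1.0, w2=5.0):
    """Overall student objective with the paper's weights w1 = 1, w2 = 5."""
    return w1 * seg_loss(y_hat_s, y) + w2 * sim_loss(z_s, z_i)
```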

2.3 Fusion of Vessel Representations

Given the two synthesized vessel representations $\mathbf{z}^{I}$ and $\mathbf{z}^{S}$, we need a fusion method that takes advantage of both intensity and structure features. Naively stacking the two as input channels to the segmentation network is prone to inducing bias: if $\mathbf{z}^{I}$ is consistently better for images from the source domain, then the downstream task model $D^{T}$ would learn to downplay the contribution of $\mathbf{z}^{S}$ due to this biased training data. As a result, despite its potential to improve performance, $\mathbf{z}^{S}$ would be hindered from making a significant contribution on the target domain during testing. To circumvent this issue, we propose a simple weight-balancing trick. As illustrated in Fig. 2, we randomly swap some patches between the two latent images so that $D^{T}$ cannot exclusively rely on the features from a single channel, even for biased training data. This trick is feasible because $\mathbf{z}^{S}$ and $\mathbf{z}^{I}$ are in the same intensity range, due to the similarity constraint in Eq. 5. The input to $D^{T}$ is thus $\tilde{\mathbf{x}}=\Gamma(\mathbf{z}^{I},\mathbf{z}^{S})$, where $\tilde{\mathbf{x}}\in\mathbb{R}^{2\times h\times w}$. The loss function for $D^{T}$ is the same as Eq. 4.
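A minimal sketch of the patch-swapping operator $\Gamma$ follows; the patch size and number of swaps are illustrative hyperparameters not specified in the paper.

```python
import torch

def gamma_fuse(z_i: torch.Tensor, z_s: torch.Tensor,
               n_swaps: int = 4, patch: int = 32) -> torch.Tensor:
    """Stack z_i and z_s into a 2-channel input, swapping random square
    patches so D^T cannot rely on a single channel. Inputs: (N, 1, h, w)."""
    a, b = z_i.clone(), z_s.clone()
    h, w = a.shape[-2], a.shape[-1]
    for _ in range(n_swaps):
        i = torch.randint(0, h - patch + 1, (1,)).item()
        j = torch.randint(0, w - patch + 1, (1,)).item()
        tmp = a[..., i:i + patch, j:j + patch].clone()
        a[..., i:i + patch, j:j + patch] = b[..., i:i + patch, j:j + patch]
        b[..., i:i + patch, j:j + patch] = tmp
    return torch.cat([a, b], dim=1)  # (N, 2, h, w) input to D^T
```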

Algorithm 1: Training of VesselMorph

Input: source domains $\mathcal{S}=\{(\mathbf{x}_{i},\mathbf{y}_{i})\}_{i=1}^{K}$; hyperparameters $\epsilon$, $\sigma_{min}$, $\sigma_{max}$, $\eta_{E^{I}}$, $\eta_{E^{S}}$, $\eta_{D^{T}}$, $\omega_{1}$, $\omega_{2}$
Output: model parameters $\theta_{I}^{*}$, $\theta_{S}^{*}$, $\varphi_{T}^{*}$

// Train the intensity encoder $E^{I}$ as a teacher model
repeat
    for $i=1:K$ do
        $\theta_{I}' \leftarrow \theta_{I}-\eta_{E^{I}}(i)\,\nabla\mathcal{L}_{seg}(D(E^{I}(\mathbf{x}_{i})),\mathbf{y}_{i})$
until convergence

Generate the tensor field $\Psi(\mathbf{x})$

// Train the structure encoder $E^{S}$ as a student model
repeat
    for $i=1:K$ do
        $\hat{\mathbf{y}}_{i} \leftarrow D(E^{S}(\Psi(\mathbf{x}_{i})))$
        $\mathcal{L}(\Psi(\mathbf{x}_{i}),\mathbf{y}_{i}) \leftarrow \omega_{1}\mathcal{L}_{seg}(\hat{\mathbf{y}}_{i},\mathbf{y}_{i})+\omega_{2}\mathcal{L}_{sim}(E^{S}(\Psi(\mathbf{x}_{i})),E^{I}(\mathbf{x}_{i}))$
        $\theta_{S}' \leftarrow \theta_{S}-\eta_{E^{S}}(i)\,\nabla\mathcal{L}(\Psi(\mathbf{x}_{i}),\mathbf{y}_{i})$
until convergence

// Train the segmentation network $D^{T}$
repeat
    for $i=1:K$ do
        $\tilde{\mathbf{x}}_{i} \leftarrow \Gamma(E^{I}(\mathbf{x}_{i}),E^{S}(\Psi(\mathbf{x}_{i})))$
        $\varphi_{T}' \leftarrow \varphi_{T}-\eta_{D^{T}}(i)\,\nabla\mathcal{L}_{seg}(D^{T}(\tilde{\mathbf{x}}_{i}),\mathbf{y}_{i})$
until convergence

The complete training procedure for VesselMorph is shown in Algorithm 1. Briefly, we first train the intensity encoder $E^{I}$, as it is easier to generate a stable vessel representation $\mathbf{z}^{I}$. Then a structure encoder $E^{S}$ is trained under the supervision of both the ground truth and the teacher model $E^{I}$, so that an auxiliary representation $\mathbf{z}^{S}$ is extracted from the structural descriptor BTF. The last step is to train a segmentation network $D^{T}$ on the fusion of the two vessel maps, $\Gamma(\mathbf{z}^{I},\mathbf{z}^{S})$. During testing, the patch swapping is no longer needed, so we simply concatenate $E^{I}(\mathbf{x})$ and $E^{S}(\Psi(\mathbf{x}))$ as the input to $D^{T}$.
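The test-time path can be summarized in a few lines; the model names follow Fig. 2, and the $(N,1,h,w)$ tensor shapes are our assumption.

```python
import torch

def predict(x, btf, E_I, E_S, D_T):
    """Test-time inference: no patch swap, plain channel concatenation."""
    with torch.no_grad():
        z_i = E_I(x)        # intensity representation z^I
        z_s = E_S(btf)      # structure representation z^S
        return D_T(torch.cat([z_i, z_s], dim=1))  # 2-channel input
```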

Dataset | Modality | Resolution | # samples | Domain
DRIVE [22] | fundus | $565\times584$ | 20 | $\mathcal{S}$
STARE [9] | fundus | $700\times605$ | 20 | $\mathcal{S}$
ARIA [5] | fundus | $768\times576$ | 61/59/23 | $\mathcal{S}$/$\mathcal{T}$/$\mathcal{T}$
HRF [2] | fundus | $3504\times2336$ | 15/15/15 | $\mathcal{T}$
OCTA-500 (6M) [16] | OCTA | $400\times400$ | 300 | $\mathcal{T}$
ROSE [19] | OCTA | $304\times304$ | 30 | $\mathcal{T}$

Table 1: Publicly available datasets used in our experiments. For ARIA and HRF, we list the number of samples per class. ARIA classes: healthy, diabetic and AMD (age-related macular degeneration). HRF classes: healthy, diabetic and glaucoma. Row shading groups datasets with similar distributions.

3 Experiments

3.1 Experimental Settings

Datasets.

The six publicly available datasets used in this study are listed in Table 1. Since more labeled fundus data are available, we set up a source domain $\mathcal{S}$ that includes three fundus datasets: DRIVE, STARE and the control subjects in ARIA. In the target domain $\mathcal{T}$, we test the performance of the model under three different conditions: pathology (diabetic/AMD subjects in ARIA), resolution change (HRF) and cross-modality shift (OCTA-500 and ROSE).

Compared methods. We pick one representative algorithm from each of the three major categories of DG approaches (Sec. 1) as competing methods. For data augmentation, we implement BigAug [23]. For meta-learning, we use the MASF [4] model. For domain alignment, we use the domain regularization network [1]. In addition, we include VFT [12], which proposed the idea of shape description for DG. The baseline model is a vanilla residual U-Net trained on $\mathcal{S}$, and the oracle model is the same network trained directly on each target domain to represent the upper-bound performance. Note that for a fair comparison, we give the baseline model slightly more parameters than $D(E^{I}(\cdot))$ ($7.4\times10^{5}$ vs. $6.7\times10^{5}$).

Figure 4: Qualitative ablation. The shown patches are $1000\times1000$ pixels for the HRF diabetic image and $200\times200$ pixels for ROSE. Top row: raw image $\mathbf{x}$, $\mathbf{z}^{I}$ and $\mathbf{z}^{S}$. Bottom row: the VesselMorph segmentation and the prediction from each pathway, i.e., $D^{T}(\Gamma(\mathbf{z}^{I},\mathbf{z}^{S}))$, $D(\mathbf{z}^{I})$ and $D(\mathbf{z}^{S})$. Red and green indicate false negatives (FN) and false positives (FP), respectively. $\mathbf{z}^{I}$ may miss large vessels, while $\mathbf{z}^{S}$ may miss thin ones. The fusion provides robust performance, as can also be seen quantitatively in Supp. Fig. 1.

Implementation Details. We use the residual U-Net structure for $E^{I}$, $D$ and $D^{T}$. To take advantage of the tensor field, the structure encoder $E^{S}$ is equipped with parallel transformer blocks with different window sizes, as proposed in [12]. All networks are trained and tested on an NVIDIA RTX 2080 Ti 11GB GPU. We use a batch size of 5 and train for 100 epochs. We use the Adam optimizer with initial learning rates $\eta_{E^{I}}=\eta_{E^{S}}=5\times10^{-4}$ and $\eta_{D^{T}}=1\times10^{-3}$, decayed by 0.5 every 3 epochs. For fundus images, we use the green channel as the network input $\mathbf{x}$. The intensity values are normalized to $[0,1]$.
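The stated optimization setup corresponds to a standard PyTorch loop along the following lines, where `model` and `loader` are placeholders for any of the three training stages in Algorithm 1, and `StepLR` is one way to realize the stated decay schedule.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

optimizer = Adam(model.parameters(), lr=5e-4)          # 1e-3 for D^T
scheduler = StepLR(optimizer, step_size=3, gamma=0.5)  # halve lr every 3 epochs

for epoch in range(100):                               # 100 epochs per the paper
    for x, y in loader:                                # batch size 5
        optimizer.zero_grad()
        loss = seg_loss(model(x), y)                   # Eq. (4)
        loss.backward()
        optimizer.step()
    scheduler.step()
```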

3.2 Results

Figure 5: t-SNE on raw data $\mathbf{x}$ (left), $\mathbf{z}^{I}$ (center) and $\mathbf{z}^{S}$ (right). $\mathcal{S}$ is coded in shades of green, while fundus and OCTA datasets in $\mathcal{T}$ are coded in shades of red and blue, respectively. Both the intensity and structure representations reduce the domain gaps between datasets.
Method | ARIA (AMD) | ARIA (diabetic) | HRF (healthy) | HRF (diabetic) | HRF (glaucoma) | OCTA-500 | ROSE
baseline | 0.6382 | 0.6519 | 0.6406 | 0.5267 | 0.5566 | 0.7316 | 0.6741
+Regular | 0.6489 | 0.6697 | 0.6403 | 0.5216 | 0.5625 | 0.7354 | 0.6836
+BigAug | \underline{0.6555} | 0.6727 | 0.6613 | 0.5389 | 0.5735 | 0.7688 | 0.6932
+MASF | 0.6533 | \underline{0.6775} | 0.6131 | 0.5358 | 0.5629 | \underline{0.7765} | 0.6725
VFT | 0.6181 | 0.6405 | \underline{0.7058} | \underline{0.5732} | \underline{0.6410} | \textbf{0.7791} | \underline{0.7281}
VesselMorph | \textbf{0.6619}$^{\sim}$ | \textbf{0.6787}$^{\sim}$ | \textbf{0.7420}$^{\dagger}$ | \textbf{0.6145}$^{\dagger}$ | \textbf{0.6756}$^{\dagger}$ | 0.7714$^{\dagger}$ | \textbf{0.7308}$^{\dagger}$
oracle | 0.7334 | 0.7065 | 0.8358 | 0.7524 | 0.7732 | 0.8657 | 0.7603

Table 2: Dice scores for testing on the target domains. Boldface: best result. Underline: second-best result. $^{\sim}$: p-value $\geq0.05$; $^{\dagger}$: p-value $\ll0.05$ in a paired t-test against the baseline output. Rows are color-coded as in Table 1.

Fig. 4 shows a qualitative ablation study: the intensity representation $\mathbf{z}^{I}$ may miss large vessels in the very high-resolution HRF images, while $\mathbf{z}^{S}$ remains robust. Conversely, $\mathbf{z}^{I}$ provides a sharper delineation of very thin vessels in ROSE. The fusion of both pathways outperforms either pathway alone in most scenarios. These observations are further supported by the quantitative ablation study in Fig. 6. We note that $\mathbf{z}^{S}$ and $\mathbf{z}^{I}$ can be used as synthetic angiograms that provide both enhanced vessel visualization and model interpretability.

Figure 6: Quantitative ablation results. Dice scores of the vanilla residual U-Net, the intensity encoding $D(\mathbf{z}^{I})$, the structural encoding $D(\mathbf{z}^{S})$ and the final output $D^{T}(\Gamma(\mathbf{z}^{I},\mathbf{z}^{S}))$. The background is color-coded as in Table 1. We note that $\mathbf{z}^{S}$ is especially useful for capturing the thick vessels in HRF, whereas $\mathbf{z}^{I}$ provides additional precision for the thin vessels in the OCTA datasets. The proposed model combines these advantages and is robust across the board.

Fig. 5 shows the t-SNE plots [20] of the datasets. The distribution gaps between datasets are greatly reduced for the two latent vessel representations.

Table 2 compares all methods on the target domain $\mathcal{T}$. For the diseased ARIA data, all methods show comparable performance and are not significantly different from the baseline. VesselMorph has the best OOD outcome in both the cross-modality (dark gray) and cross-resolution (light gray) scenarios, except on the OCTA-500 dataset, where VFT, MASF and VesselMorph perform similarly. The results of VFT and VesselMorph underscore the value of the shape information.

4 Conclusion

In this work, we propose to solve the DG problem by explicitly modeling the domain-agnostic tubular shape of vessels with a bipolar tensor field, which connects traditional algorithms with deep learning. We extract vessel representations from both the intensity image and the BTF, then fuse the information from the two pathways so that the segmentation network can better exploit both types of description. Our VesselMorph model provides a significant quantitative improvement in Dice score across a variety of domain shift conditions, and its latent images offer enhanced vessel visualization and interpretability.

Acknowledgements. This work is supported by the NIH grant R01EY033969 and the Vanderbilt University Discovery Grant Program.

References

  • [1] Aslani, S., Murino, V., Dayan, M., Tam, R., Sona, D., Hamarneh, G.: Scanner invariant multiple sclerosis lesion segmentation from MRI. In: IEEE ISBI. pp. 781–785 (2020)
  • [2] Budai, A., Bock, R., Maier, A., Hornegger, J., Michelson, G.: Robust vessel segmentation in fundus images. International Journal of Biomedical Imaging (2013)
  • [3] Dewey, B.E., Zuo, L., Carass, A., He, Y., Liu, Y., Mowry, E.M., Newsome, S., Oh, J., Calabresi, P.A., Prince, J.L.: A disentangled latent space for cross-site MRI harmonization. In: MICCAI. pp. 720–729. Springer (2020)
  • [4] Dou, Q., Coelho de Castro, D., Kamnitsas, K., Glocker, B.: Domain generalization via model-agnostic learning of semantic features. NeurIPS 32 (2019)
  • [5] Farnell, D., Hatfield, F., Knox, P., Reakes, M., Spencer, S., Parry, D., Harding, S.P.: Enhancement of blood vessels in digital fundus photographs via the application of multiscale line operators. Journal of the Franklin Institute 345(7), 748–765 (2008)
  • [6] Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML. pp. 1126–1135. PMLR (2017)
  • [7] Frangi, A.F., Niessen, W.J., Vincken, K.L., Viergever, M.A.: Multiscale vessel enhancement filtering. In: MICCAI. pp. 130–137. Springer (1998)
  • [8] He, Y., Carass, A., Zuo, L., Dewey, B.E., Prince, J.L.: Autoencoder based self-supervised test-time adaptation for medical image analysis. MedIA (2021)
  • [9] Hoover, A., Kouznetsova, V., Goldbaum, M.: Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE TMI 19(3), 203–210 (2000)
  • [10] Hore, A., Ziou, D.: Image quality metrics: PSNR vs. SSIM. In: 2010 20th International Conference on Pattern Recognition. pp. 2366–2369. IEEE (2010)
  • [11] Hu, D., Cui, C., Li, H., Larson, K.E., Tao, Y.K., Oguz, I.: LIFE: a generalizable autodidactic pipeline for 3D OCT-A vessel segmentation. In: MICCAI (2021)
  • [12] Hu, D., Li, H., Liu, H., Oguz, I.: Domain generalization for retinal vessel segmentation with vector field transformer. In: MIDL (2021)
  • [13] Le Bihan, D., Mangin, J.F., Poupon, C., Clark, C.A., Pappata, S., Molko, N., Chabriat, H.: Diffusion tensor imaging: concepts and applications. Journal of Magnetic Resonance Imaging 13(4), 534–546 (2001)
  • [14] Li, D., Yang, Y., Song, Y.Z., Hospedales, T.: Learning to generalize: Meta-learning for domain generalization. In: AAAI. vol. 32 (2018)
  • [15] Li, H., Wang, Y., Wan, R., Wang, S., Li, T., Kot, A.: Domain generalization for medical imaging classification with linear-dependency regularization. NeurIPS (2020)
  • [16] Li, M., Chen, Y., Ji, Z., Xie, K., Yuan, S., Chen, Q., Li, S.: Image projection network: 3D to 2D image segmentation in OCTA images. IEEE TMI (2020)
  • [17] Liu, Y., Zuo, L., Carass, A., He, Y., Filippatou, A., Solomon, S., Saidha, S., Calabresi, P., Prince, J.: Variational intensity cross channel encoder for unsupervised vessel segmentation on OCT angiography. In: SPIE Medical Imaging (2020)
  • [18] Lyu, J., Zhang, Y., Huang, Y., Lin, L., Cheng, P., Tang, X.: AADG: Automatic Augmentation for Domain Generalization on Retinal Image Segmentation. IEEE TMI (2022)
  • [19] Ma, Y., Hao, H., Xie, J., Fu, H., Zhang, J., Yang, J., Wang, Z., Liu, J., Zheng, Y., Zhao, Y.: ROSE: a retinal OCT-angiography vessel segmentation dataset and new model. IEEE TMI 40(3), 928–939 (2020)
  • [20] Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. Journal of Machine Learning Research 9(11) (2008)
  • [21] Ouyang, J., Adeli, E., Pohl, K.M., Zhao, Q., Zaharchuk, G.: Representation disentanglement for multi-modal brain MRI analysis. In: IPMI. pp. 321–333 (2021)
  • [22] Staal, J., Abràmoff, M.D., Niemeijer, M., Viergever, M.A., Van Ginneken, B.: Ridge-based vessel segmentation in color images of the retina. IEEE TMI (2004)
  • [23] Zhang, L., Wang, X., Yang, D., Sanford, T., Harmon, S., Turkbey, B., Wood, B.J., Roth, H., Myronenko, A., et al.: Generalizing deep learning for medical image segmentation to unseen domains via deep stacked transformation. IEEE TMI (2020)
  • [24] Zhou, K., Liu, Z., Qiao, Y., Xiang, T., Loy, C.C.: Domain generalization: A survey. IEEE PAMI (2022)