
$^{1}$Department of Electrical and Computer Engineering, Vanderbilt University
$^{2}$Department of Computer Science, Vanderbilt University
Email: dewei.hu@vanderbilt.edu

VesselMorph: Domain-Generalized Retinal Vessel Segmentation via Shape-Aware Representation

Dewei Hu$^{1}$, Hao Li$^{1}$, Han Liu$^{2}$, Xing Yao$^{2}$, Jiacheng Wang$^{2}$, Ipek Oguz$^{1,2}$
Abstract

Due to the absence of a single standardized imaging protocol, domain shift between data acquired at different sites is an inherent property of medical images and has become a major obstacle for the large-scale deployment of learning-based algorithms. For retinal vessel images, domain shift usually presents as variations in intensity, contrast and resolution, while the basic tubular shape of vessels remains unaffected. Thus, taking advantage of such domain-invariant morphological features can greatly improve the generalizability of deep models. In this study, we propose VesselMorph, a method that generalizes the 2D retinal vessel segmentation task by synthesizing a shape-aware representation. Inspired by the traditional Frangi filter and the diffusion tensor imaging literature, we introduce a Hessian-based bipolar tensor field to describe the morphology of the vessels, so that the shape information is taken into account. We map the intensity image and the tensor field to a latent space for feature extraction, then fuse the two latent representations via a weight-balancing trick and feed the result to a segmentation network. We evaluate on six public datasets of fundus and OCT angiography images from diverse patient populations. VesselMorph achieves superior generalization performance compared with competing methods under various domain shift scenarios.

Keywords: domain generalization · vessel segmentation · tensor field · shape representation · retina

1 Introduction

Medical images suffer from distribution shifts caused by discrepancies in image acquisition protocols. Images can differ in contrast, resolution and intensity range, even within the same modality; a set of examples is shown in Fig. 1. This obstacle severely impedes learning-based algorithms from reaching clinical adoption. Therefore, much effort has been devoted to the domain generalization (DG) problem, so that deep models can work robustly on out-of-distribution (OOD) data. There are three major families of solutions: data augmentation [23, 18], meta-learning [6, 14] and domain alignment [24]. The first two strategies aim to improve the model's generalizability by either augmenting the source domain with additional data or replicating the exposure to OOD data during training. In contrast, domain alignment strives to align the distributions of the domains in either image [8] or feature space [1, 15].

Figure 1: Domain shift among retinal vessel images. Panels (1-3) are the green channel of fundus images; panels (4-5) are OCTA images. All have the same size ($300\times300$ pixels). Suppose (1) represents the source domain. Distribution shifts in the test domain can be caused by (2) pathology, (3) resolution change, and (4, 5) a different modality.

We propose a novel method, VesselMorph, to improve DG performance by providing an explicit description of domain-agnostic shape features as auxiliary training material. Even though traditional algorithms are outperformed by their learning-based counterparts in many respects, they typically generalize better to any dataset, regardless of distribution shifts. Specifically for vessel segmentation, Frangi et al. [7] proposed a Hessian-based model to express the tubular shape of vessels, which can be regarded as a domain-invariant feature. Merging the Hessian-based shape description [12] with the principles of diffusion tensor imaging (DTI) [13], we introduce a bipolar tensor field (BTF) that explicitly represents the vessel shape by a tensor at each pixel. To effectively merge the features of the intensity image and the shape descriptor BTF, we employ a full-resolution feature extraction network to obtain an interpretable representation in the latent space from both inputs. This technique is broadly used in unsupervised segmentation [17, 11] and representation disentanglement [3, 21].

As shown in Fig. 2, let $\mathbf{x}$ be the input image and $\Psi(\mathbf{x})$ the corresponding BTF. $D(E^{I}(\cdot))$ and $D(E^{S}(\cdot))$ are two feature extraction networks with a shared decoder $D$. We empirically observe that the intensity representation $\mathbf{z}^{I}$ can precisely delineate thinner vessels, while the structure representation $\mathbf{z}^{S}$ works better on thick ones. We combine the strengths of the two pathways for robust DG performance. The two latent images are fused by a weight-balancing trick $\Gamma(\mathbf{z}^{I},\mathbf{z}^{S})$ to avoid any potential bias induced by the selection of source domains. Finally, we train a segmentation network $D^{T}$ on the fused latent images. We compare the performance of VesselMorph to other DG approaches on four public target-domain datasets that represent various distribution shift conditions, and show that VesselMorph has superior performance in most OOD domains. Our contributions are:

  • A Hessian-based bipolar tensor field (BTF) that provides an explicit description of the vessel morphology (Sec. 2.1).

  • A full-resolution feature extraction network that generates vessel representation from both the intensity image and the BTF (Sec. 2.2).

  • A training pipeline that generates stable latent images for both pathways and a weight-balancing method to fuse the two representations (Sec. 2.3).

  • A comprehensive evaluation on public datasets which shows superior cross-resolution and cross-modality generalization performance (Sec. 3.2).

Figure 2: The overall structure of VesselMorph. The shaded layers include transformer blocks. The dashed line indicates that $D$ is discarded in testing.

2 Methods

2.1 Bipolar Tensor Field

Unlike ML models, our visual interpretation of vessels is rarely affected by data distribution shifts. Mimicking human vessel recognition can thus help address the DG problem. In addition to intensity values, human perception of vessels also depends on the local contrast and the correlation within a neighborhood, which is often well described by the local Hessian. Inspired by the use of DTI to depict white matter tracts, we create a Hessian-based bipolar tensor field to represent the morphology of vessels. Given a 2D input image $\mathbf{x}\in\mathbb{R}^{h\times w}$ and scale $\sigma$, the classical Frangi vesselness $\mathcal{V}(\sigma)$ [7] is defined as:

$$\mathcal{V}(\sigma)=\begin{cases}0 & \text{if }\lambda_{2}>0,\\ \exp\left(-\frac{\mathcal{R}_{B}^{2}}{2\beta^{2}}\right)\left[1-\exp\left(-\frac{S^{2}}{2c^{2}}\right)\right] & \text{otherwise.}\end{cases} \qquad (1)$$

Here, $\lambda_{1},\lambda_{2}$ are the sorted eigenvalues of the Hessian $\mathcal{H}$, $\mathcal{R}_{B}=\lambda_{1}/\lambda_{2}$, $S=\|\mathcal{H}\|_{F}$ is the Frobenius norm of the Hessian, and $\beta=c=0.5$. Note that we assume vessels are brighter than the background; fundus images are negated to comply. To represent vessels of different sizes, we leverage the multiscale vesselness filter that uses the optimal scale $\sigma^{*}$ for the Hessian $\mathcal{H}(\mathbf{x}_{ij},\sigma)$ at each pixel $(i,j)$. This is achieved by a grid search over the range $[\sigma_{min},\sigma_{max}]$ to maximize the vesselness $\mathcal{V}(\sigma)$, i.e., $\sigma^{*}=\operatorname{argmax}_{\sigma_{min}\leq\sigma\leq\sigma_{max}}\mathcal{V}(\sigma)$. The optimized Hessian is then represented by a $2\times2$ matrix:

$$\mathcal{H}(\mathbf{x}_{ij},\sigma^{*})=(\sigma^{*})^{2}\,\mathbf{x}_{ij}\ast\nabla^{2}G(\mathbf{x}_{ij},\sigma^{*}) \qquad (2)$$

where $G(\mathbf{x}_{ij},\sigma^{*})$ is a 2D Gaussian kernel with standard deviation $\sigma^{*}$. We then apply an eigendecomposition to obtain the eigenvalues $\lambda_{1},\lambda_{2}$ ($|\lambda_{1}|\leq|\lambda_{2}|$) and the corresponding eigenvectors $\mathbf{v}_{1},\mathbf{v}_{2}$ at the optimal scale $\sigma^{*}$.
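To make the scale search concrete, below is a minimal NumPy sketch of Eqs. (1-2), assuming a float image normalized to $[0,1]$ with bright vessels; the helper name `hessian_eigs` and the scale grid are illustrative choices, not part of the original implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_eigs(x, sigma):
    """Scale-normalized Hessian eigenvalues/eigenvectors at every pixel (Eq. 2)."""
    Hxx = sigma**2 * gaussian_filter(x, sigma, order=(0, 2))  # d2/dx2
    Hyy = sigma**2 * gaussian_filter(x, sigma, order=(2, 0))  # d2/dy2
    Hxy = sigma**2 * gaussian_filter(x, sigma, order=(1, 1))  # d2/dxdy
    H = np.stack([np.stack([Hyy, Hxy], -1),
                  np.stack([Hxy, Hxx], -1)], -2)              # (h, w, 2, 2)
    evals, evecs = np.linalg.eigh(H)
    order = np.argsort(np.abs(evals), axis=-1)                # enforce |l1| <= |l2|
    evals = np.take_along_axis(evals, order, axis=-1)
    evecs = np.take_along_axis(evecs, order[..., None, :], axis=-1)
    return evals, evecs

def frangi_vesselness(x, sigmas=np.linspace(1, 4, 7), beta=0.5, c=0.5):
    """Eq. (1), maximized over scales; bright vessels on a dark background."""
    best = np.zeros_like(x)
    for sigma in sigmas:
        evals, _ = hessian_eigs(x, sigma)
        l1, l2 = evals[..., 0], evals[..., 1]
        rb2 = (l1 / (l2 + 1e-10))**2                          # R_B^2
        s2 = l1**2 + l2**2                                    # S^2 = ||H||_F^2
        v = np.exp(-rb2 / (2 * beta**2)) * (1 - np.exp(-s2 / (2 * c**2)))
        v[l2 > 0] = 0                                         # first case of Eq. (1)
        best = np.maximum(best, v)                            # keep the optimal scale
    return best
```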

Figure 3: Left: A simplified illustration of BTF. The red arrows indicate the orientation of $\mathbf{v}_{1}$ while the blue arrows correspond to $\mathbf{v}_{2}$. The ellipses represent the tensors at $p_{1}$ (in the vessel) and $p_{2}$ (in the background). Right: BTF applied to an OCTA image.

Instead of solely analyzing the signs and magnitudes of the Hessian eigenvalues as in the traditional Frangi filter, we propose to leverage the eigenvectors along with custom-designed magnitudes to create our tensor field, as shown in Fig. 3 (Left). The core idea of the Frangi filter is to enhance tubular structures by matching the vessel diameter with the distance between the two zero crossings of the second-order derivative of the Gaussian ($2\sqrt{2}\sigma^{*}$). However, the solution is not guaranteed to land in the range $[\sigma_{min},\sigma_{max}]$, especially for small vessels. Consequently, we observe that an inaccurate estimation of $\sigma^{*}$ results in a blurring effect at the vessel boundary, which is problematic for segmentation. As an example in Fig. 3 (Left), the direction of $\mathbf{v}_{1}$ at $p_{2}$ aligns with that at $p_{1}$, even though $p_{1}$ is inside the vessel while $p_{2}$ is in the background, close to the boundary. This makes it difficult for the vector orientations alone to differentiate points inside and outside the vessel. To tackle this, we introduce the idea of a bipolar tensor: for points in the background, we assign a large magnitude to the orthogonal eigenvector $\mathbf{v}_{2}$, as shown in the blue dashed ellipse. Specifically, we define the magnitudes $\alpha_{1}$ and $\alpha_{2}$ associated with the eigenvectors $\mathbf{v}_{1}$ and $\mathbf{v}_{2}$ as:

$$\alpha_{1}=\underbrace{P(\mathbf{x}\leq\mathbf{x}_{ij})}_{\text{bright}}\,\underbrace{\exp\left(-\epsilon\frac{\lambda_{1}^{2}}{\|\mathcal{H}\|^{2}_{F}}\right)}_{\text{vessel-like}},\qquad \alpha_{2}=\underbrace{P(\mathbf{x}>\mathbf{x}_{ij})}_{\text{dark}}\,\underbrace{\exp\left(-\epsilon\frac{\lambda_{2}^{2}}{\|\mathcal{H}\|^{2}_{F}}\right)}_{\text{vessel-like}} \qquad (3)$$

where $P(\mathbf{x}>\mathbf{x}_{ij})$ is the probability that the intensity of a random pixel $x$ in the image is greater than $\mathbf{x}_{ij}$. This is equivalent to normalizing the histogram by the factor $hw$ and computing the cumulative distribution function at $\mathbf{x}_{ij}$. This term thus provides a normalized brightness function in the range $[0,1]$. The exponential term represents how vessel-like the pixel is by using a normalized eigenvalue, and is in the $[0,1]$ range as well. $\epsilon$ is a constant that controls the sensitivity, empirically set to $0.5$. With the custom magnitudes $\alpha_{1}$ and $\alpha_{2}$, the two poles can better differentiate vessels from the background. Fig. 3 (Right) shows an example of BTF on an OCTA image. In practice, we stack the two vectors as the input to the structural encoding network, i.e., $\Psi(\mathbf{x}_{ij})=\left[\alpha_{1}\mathbf{v}_{1}^{\top},\alpha_{2}\mathbf{v}_{2}^{\top}\right]^{\top}\in\mathbb{R}^{4\times1}$.
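A hedged sketch of Eq. (3) follows, reusing the `hessian_eigs` helper from the previous sketch. For brevity it evaluates a single scale $\sigma$ per call, whereas the paper selects $\sigma^{*}$ per pixel via the Frangi search; the empirical CDF implements the $P(\mathbf{x}\leq\mathbf{x}_{ij})$ term.

```python
import numpy as np  # reuses hessian_eigs from the sketch above

def bipolar_tensor_field(x, sigma, eps=0.5):
    """Psi(x) of shape (4, h, w): [alpha1*v1; alpha2*v2] per pixel (Eq. 3)."""
    evals, evecs = hessian_eigs(x, sigma)
    l1, l2 = evals[..., 0], evals[..., 1]
    v1, v2 = evecs[..., :, 0], evecs[..., :, 1]      # eigenvectors, (h, w, 2) each
    frob2 = l1**2 + l2**2 + 1e-10                    # ||H||_F^2

    # Empirical CDF: P(x <= x_ij), the normalized-histogram brightness term.
    ranks = np.argsort(np.argsort(x.ravel())).reshape(x.shape)
    cdf = (ranks + 1) / x.size
    a1 = cdf * np.exp(-eps * l1**2 / frob2)          # bright * vessel-like
    a2 = (1 - cdf) * np.exp(-eps * l2**2 / frob2)    # dark * vessel-like

    btf = np.concatenate([a1[..., None] * v1,
                          a2[..., None] * v2], axis=-1)   # (h, w, 4)
    return btf.transpose(2, 0, 1)                    # (4, h, w) network input
```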

2.2 Latent Vessel Representation

Preserving the spatial resolution in the bottleneck of models with a U-Net backbone is a common strategy to emphasize structural features in unsupervised segmentation [17, 11] and representation disentanglement [3, 21]. We employ a network with a full-resolution ($h\times w$ pixels) latent space as the feature extraction model, and extract the vessel structure from both the intensity image $\mathbf{x}\in\mathbb{R}^{h\times w}$ and its corresponding BTF, $\Psi(\mathbf{x})\in\mathbb{R}^{4\times h\times w}$. Therefore, in Fig. 2, the intensity $D(E^{I}(\cdot))$ and structure $D(E^{S}(\cdot))$ encoding pathways share the decoder $D$ and produce latent images $\mathbf{z}^{I},\mathbf{z}^{S}\in\mathbb{R}^{h\times w}$. To place more of the workload on the encoder, $D$ has a shallower architecture; it is discarded at test time. For the intensity encoding, the model is optimized by minimizing a segmentation loss defined as the combination of cross-entropy and Dice loss:

$$\mathcal{L}_{seg}=-\frac{1}{N}\sum_{n=1}^{N}\mathbf{y}_{n}\log\hat{\mathbf{y}}^{I}_{n}+\left(1-\frac{2\sum_{n=1}^{N}\mathbf{y}_{n}\hat{\mathbf{y}}^{I}_{n}}{\sum_{n=1}^{N}\mathbf{y}_{n}^{2}+(\hat{\mathbf{y}}^{I}_{n})^{2}}\right) \qquad (4)$$

where $N=h\times w$ is the total number of pixels in the image, $\mathbf{y}$ is the ground truth and $\hat{\mathbf{y}}^{I}$ is the prediction from the training-only decoder $D$. Although there is no explicit constraint on the latent image $E^{I}(\mathbf{x})=\mathbf{z}^{I}$, we note that the segmentation-based supervision encourages it to include the vessels while most other irrelevant features are filtered out. Hence, we can view the latent feature as a vessel representation.
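For clarity, a PyTorch sketch of Eq. (4) is given below. Eq. (4) writes the cross-entropy compactly over class labels; here we assume a single foreground-probability channel and expand the binary form, which is our reading rather than the paper's code.

```python
import torch

def seg_loss(y_hat: torch.Tensor, y: torch.Tensor, eps: float = 1e-7):
    """Cross-entropy + Dice (Eq. 4) for a foreground-probability map y_hat
    and a binary ground-truth mask y of the same shape."""
    ce = -(y * torch.log(y_hat + eps)
           + (1 - y) * torch.log(1 - y_hat + eps)).mean()
    dice = 1 - 2 * (y * y_hat).sum() / ((y**2 + y_hat**2).sum() + eps)
    return ce + dice
```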

Our approach is slightly different for the structure encoding, as we observe that it is hard for the feature extraction network to generate a stable, artifact-free latent image when the number of input channels is greater than one. Thus, it is necessary to use $E^{I}$ as a teacher model that provides direct supervision on the vessel representation. In other words, we first train the intensity encoding path to get $E^{I}$ and $D$, then train $E^{S}$ by leveraging both the segmentation loss in Eq. 4 and a similarity loss defined as:

$$\mathcal{L}_{sim}(\mathbf{z}^{S},\mathbf{z}^{I})=\sum_{n=1}^{N}\|\mathbf{z}^{S}_{n}-\mathbf{z}^{I}_{n}\|_{1}+\mathrm{SSIM}(\mathbf{z}^{S},\mathbf{z}^{I}) \qquad (5)$$

which is a weighted sum of the $L_{1}$ norm and the structural similarity loss SSIM [10]. SSIM is defined as $\mathrm{SSIM}(A,B)=2\frac{(2\mu_{A}\mu_{B}+c_{1})(2\sigma_{AB}+c_{2})}{(\mu_{A}^{2}+\mu_{B}^{2}+c_{1})(\sigma_{A}^{2}+\sigma_{B}^{2}+c_{2})}$, where $\mu$ and $\sigma$ represent the mean and standard deviation of the image, and we set $c_{1}=0.01$ and $c_{2}=0.03$. The overall loss function for the structural encoding is thus $\mathcal{L}(\Psi(\mathbf{x}),\mathbf{y})=\omega_{1}\mathcal{L}_{seg}(\hat{\mathbf{y}}^{S},\mathbf{y})+\omega_{2}\mathcal{L}_{sim}(\mathbf{z}^{S},\mathbf{z}^{I})$, with empirically determined weights $\omega_{1}=1$ and $\omega_{2}=5$. Experimentally, we find that $\mathbf{z}^{I}$ is good at preserving small vessels, while $\mathbf{z}^{S}$ works better on larger ones.
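A sketch of the student objective under our reading of Eq. (5): since a similarity loss should decrease as $\mathbf{z}^{S}$ approaches $\mathbf{z}^{I}$, we implement the SSIM term as $1-\mathrm{SSIM}$, which is an assumption on our part; the SSIM call comes from torchmetrics rather than the paper's code, and it expects $(N,C,H,W)$ tensors.

```python
import torch
from torchmetrics.functional import structural_similarity_index_measure as ssim

def sim_loss(z_s: torch.Tensor, z_i: torch.Tensor):
    """L1 distance plus an SSIM term; 1 - SSIM is our assumed loss form."""
    l1 = torch.abs(z_s - z_i).sum()
    return l1 + (1 - ssim(z_s, z_i))

def structure_loss(y_hat_s, y, z_s, z_i, w1=1.0, w2=5.0):
    """Overall student objective with the paper's weights w1 = 1, w2 = 5."""
    return w1 * seg_loss(y_hat_s, y) + w2 * sim_loss(z_s, z_i)
```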

2.3 Fusion of Vessel Representations

Given the two synthesized vessel representations $\mathbf{z}^{I}$ and $\mathbf{z}^{S}$, we need a fusion method that takes advantage of both intensity and structure features. Naively stacking the two as input channels to the segmentation network is prone to inducing bias: if $\mathbf{z}^{I}$ is consistently better for images from the source domain, then the downstream task model $D^{T}$ would learn to downplay the contribution of $\mathbf{z}^{S}$ due to this biased training data. As a result, despite its potential to improve performance, $\mathbf{z}^{S}$ would be hindered from making a significant contribution on the target domain during testing. To circumvent this issue, we propose a simple weight-balancing trick. As illustrated in Fig. 2, we randomly swap some patches between the two latent images so that $D^{T}$ cannot exclusively rely on the features from a single channel, even for biased training data. This trick is feasible because $\mathbf{z}^{S}$ and $\mathbf{z}^{I}$ are in the same intensity range, due to the similarity constraint in Eq. 5. The input to $D^{T}$ is thus $\tilde{\mathbf{x}}=\Gamma(\mathbf{z}^{I},\mathbf{z}^{S})$, where $\tilde{\mathbf{x}}\in\mathbb{R}^{2\times h\times w}$. The loss function for $D^{T}$ is the same as Eq. 4.
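A minimal sketch of the patch-swapping operator $\Gamma$ follows; the patch size and number of swaps are illustrative hyperparameters not specified in the paper.

```python
import torch

def gamma_fuse(z_i: torch.Tensor, z_s: torch.Tensor,
               n_swaps: int = 4, patch: int = 32) -> torch.Tensor:
    """Stack z_i and z_s into a 2-channel input, swapping random square
    patches so D^T cannot rely on a single channel. Inputs: (N, 1, h, w)."""
    a, b = z_i.clone(), z_s.clone()
    h, w = a.shape[-2], a.shape[-1]
    for _ in range(n_swaps):
        i = torch.randint(0, h - patch + 1, (1,)).item()
        j = torch.randint(0, w - patch + 1, (1,)).item()
        tmp = a[..., i:i + patch, j:j + patch].clone()
        a[..., i:i + patch, j:j + patch] = b[..., i:i + patch, j:j + patch]
        b[..., i:i + patch, j:j + patch] = tmp
    return torch.cat([a, b], dim=1)  # (N, 2, h, w) input to D^T
```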

Algorithm 1: Training of VesselMorph

Input: source domains $\mathcal{S}=\{(\mathbf{x}_{i},\mathbf{y}_{i})\}_{i=1}^{K}$; hyperparameters $\epsilon$, $\sigma_{min}$, $\sigma_{max}$, $\eta_{E^{I}}$, $\eta_{E^{S}}$, $\eta_{D^{T}}$, $\omega_{1}$, $\omega_{2}$
Output: model parameters $\theta_{I}^{*}$, $\theta_{S}^{*}$, $\varphi_{T}^{*}$

// Train the intensity encoder $E^{I}$ as a teacher model
repeat
    for $i=1:K$ do
        $\theta_{I}' \leftarrow \theta_{I}-\eta_{E^{I}}(i)\,\nabla\mathcal{L}_{seg}(D(E^{I}(\mathbf{x}_{i})),\mathbf{y}_{i})$
until convergence

Generate the tensor field $\Psi(\mathbf{x})$

// Train the structure encoder $E^{S}$ as a student model
repeat
    for $i=1:K$ do
        $\hat{\mathbf{y}}_{i} \leftarrow D(E^{S}(\Psi(\mathbf{x}_{i})))$
        $\mathcal{L}(\Psi(\mathbf{x}_{i}),\mathbf{y}_{i}) \leftarrow \omega_{1}\mathcal{L}_{seg}(\hat{\mathbf{y}}_{i},\mathbf{y}_{i})+\omega_{2}\mathcal{L}_{sim}(E^{S}(\Psi(\mathbf{x}_{i})),E^{I}(\mathbf{x}_{i}))$
        $\theta_{S}' \leftarrow \theta_{S}-\eta_{E^{S}}(i)\,\nabla\mathcal{L}(\Psi(\mathbf{x}_{i}),\mathbf{y}_{i})$
until convergence

// Train the segmentation network $D^{T}$
repeat
    for $i=1:K$ do
        $\tilde{\mathbf{x}}_{i} \leftarrow \Gamma(E^{I}(\mathbf{x}_{i}),E^{S}(\Psi(\mathbf{x}_{i})))$
        $\varphi_{T}' \leftarrow \varphi_{T}-\eta_{D^{T}}(i)\,\nabla\mathcal{L}_{seg}(D^{T}(\tilde{\mathbf{x}}_{i}),\mathbf{y}_{i})$
until convergence

The complete training procedure for VesselMorph is shown in Algorithm 1. Briefly, we first train the intensity encoder $E^{I}$, as it is easier to generate a stable vessel representation $\mathbf{z}^{I}$. Then a structure encoder $E^{S}$ is trained under the supervision of both the ground truth and the teacher model $E^{I}$, so that an auxiliary representation $\mathbf{z}^{S}$ is extracted from the structural descriptor BTF. The last step is to train a segmentation network $D^{T}$ on the fusion of the two vessel maps, $\Gamma(\mathbf{z}^{I},\mathbf{z}^{S})$. During testing, the patch swapping is no longer needed, so we simply concatenate $E^{I}(\mathbf{x})$ and $E^{S}(\Psi(\mathbf{x}))$ as the input to $D^{T}$.
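The test-time path can be summarized in a few lines; the model names follow Fig. 2, and the $(N,1,h,w)$ tensor shapes are our assumption.

```python
import torch

def predict(x, btf, E_I, E_S, D_T):
    """Test-time inference: no patch swap, plain channel concatenation."""
    with torch.no_grad():
        z_i = E_I(x)        # intensity representation z^I
        z_s = E_S(btf)      # structure representation z^S
        return D_T(torch.cat([z_i, z_s], dim=1))  # 2-channel input
```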

Dataset | Modality | Resolution | # samples | Domain
DRIVE [22] | fundus | $565\times584$ | 20 | $\mathcal{S}$
STARE [9] | fundus | $700\times605$ | 20 | $\mathcal{S}$
ARIA [5] | fundus | $768\times576$ | 61/59/23 | $\mathcal{S}$/$\mathcal{T}$/$\mathcal{T}$
HRF [2] | fundus | $3504\times2336$ | 15/15/15 | $\mathcal{T}$
OCTA-500 (6M) [16] | OCTA | $400\times400$ | 300 | $\mathcal{T}$
ROSE [19] | OCTA | $304\times304$ | 30 | $\mathcal{T}$

Table 1: Publicly available datasets used in our experiments. For ARIA and HRF, we list the number of samples per class. ARIA classes: healthy, diabetic and AMD (age-related macular degeneration). HRF classes: healthy, diabetic and glaucoma. Row shading groups datasets with similar distributions.

3 Experiments

3.1 Experimental Settings

Datasets.

The six publicly available datasets used in this study are listed in Table 1. Since more labeled fundus data are available, we set up a source domain $\mathcal{S}$ that includes three fundus datasets: DRIVE, STARE and the control subjects in ARIA. In the target domain $\mathcal{T}$, we test the performance of the model under three different conditions: pathology (diabetic/AMD subjects in ARIA), resolution change (HRF) and cross-modality shift (OCTA-500 and ROSE).

Compared methods. We pick one representative algorithm from each of the three major categories of DG approaches (Sec. 1) as competing methods. For data augmentation, we implement BigAug [23]. For meta-learning, we use the MASF [4] model. For domain alignment, we use the domain regularization network [1]. In addition, we include VFT [12], which proposed the idea of shape description for DG. The baseline model is a vanilla residual U-Net trained on $\mathcal{S}$, and the oracle model is the same network trained directly on each target domain to represent the upper-bound performance. Note that for a fair comparison, we give the baseline model slightly more parameters than $D(E^{I}(\cdot))$ ($7.4\times10^{5}$ vs. $6.7\times10^{5}$).

Figure 4: Qualitative ablation. The shown patches are $1000\times1000$ pixels for the HRF diabetic image and $200\times200$ pixels for ROSE. Top row: raw image $\mathbf{x}$, $\mathbf{z}^{I}$ and $\mathbf{z}^{S}$. Bottom row: the VesselMorph segmentation and the prediction from each pathway, i.e., $D^{T}(\Gamma(\mathbf{z}^{I},\mathbf{z}^{S}))$, $D(\mathbf{z}^{I})$ and $D(\mathbf{z}^{S})$. Red and green indicate false negatives (FN) and false positives (FP), respectively. $\mathbf{z}^{I}$ may miss large vessels, while $\mathbf{z}^{S}$ may miss thin ones. The fusion provides robust performance, as can also be seen quantitatively in Supp. Fig. 1.

Implementation Details. We use the residual U-Net structure for $E^{I}$, $D$ and $D^{T}$. To take advantage of the tensor field, the structure encoder $E^{S}$ is equipped with parallel transformer blocks with different window sizes, as proposed in [12]. All networks are trained and tested on an NVIDIA RTX 2080 Ti 11GB GPU. We use a batch size of 5 and train for 100 epochs. We use the Adam optimizer with initial learning rates $\eta_{E^{I}}=\eta_{E^{S}}=5\times10^{-4}$ and $\eta_{D^{T}}=1\times10^{-3}$, decayed by 0.5 every 3 epochs. For fundus images, we use the green channel as the network input $\mathbf{x}$. The intensity values are normalized to $[0,1]$.
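The stated optimization setup corresponds to a standard PyTorch loop along the following lines, where `model` and `loader` are placeholders for any of the three training stages in Algorithm 1, and `StepLR` is one way to realize the stated decay schedule.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

optimizer = Adam(model.parameters(), lr=5e-4)          # 1e-3 for D^T
scheduler = StepLR(optimizer, step_size=3, gamma=0.5)  # halve lr every 3 epochs

for epoch in range(100):                               # 100 epochs per the paper
    for x, y in loader:                                # batch size 5
        optimizer.zero_grad()
        loss = seg_loss(model(x), y)                   # Eq. (4)
        loss.backward()
        optimizer.step()
    scheduler.step()
```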

3.2 Results

Figure 5: t-SNE on raw data $\mathbf{x}$ (left), $\mathbf{z}^{I}$ (center) and $\mathbf{z}^{S}$ (right). $\mathcal{S}$ is coded in shades of green, while fundus and OCTA datasets in $\mathcal{T}$ are coded in shades of red and blue, respectively. Both the intensity and structure representations reduce the domain gaps between datasets.
Method | ARIA (AMD) | ARIA (diabetic) | HRF (healthy) | HRF (diabetic) | HRF (glaucoma) | OCTA-500 | ROSE
baseline | 0.6382 | 0.6519 | 0.6406 | 0.5267 | 0.5566 | 0.7316 | 0.6741
+Regular | 0.6489 | 0.6697 | 0.6403 | 0.5216 | 0.5625 | 0.7354 | 0.6836
+BigAug | \underline{0.6555} | 0.6727 | 0.6613 | 0.5389 | 0.5735 | 0.7688 | 0.6932
+MASF | 0.6533 | \underline{0.6775} | 0.6131 | 0.5358 | 0.5629 | \underline{0.7765} | 0.6725
VFT | 0.6181 | 0.6405 | \underline{0.7058} | \underline{0.5732} | \underline{0.6410} | \textbf{0.7791} | \underline{0.7281}
VesselMorph | \textbf{0.6619}$^{\sim}$ | \textbf{0.6787}$^{\sim}$ | \textbf{0.7420}$^{\dagger}$ | \textbf{0.6145}$^{\dagger}$ | \textbf{0.6756}$^{\dagger}$ | 0.7714$^{\dagger}$ | \textbf{0.7308}$^{\dagger}$
oracle | 0.7334 | 0.7065 | 0.8358 | 0.7524 | 0.7732 | 0.8657 | 0.7603

Table 2: Dice scores for testing on the target domains. Boldface: best result. Underline: second-best result. $^{\sim}$: p-value $\geq0.05$; $^{\dagger}$: p-value $\ll0.05$ in a paired t-test against the baseline output. Rows are color-coded as in Table 1.

Fig. 4 shows a qualitative ablation study: the intensity representation $\mathbf{z}^{I}$ may miss large vessels in the very high-resolution HRF images, while $\mathbf{z}^{S}$ remains robust. Conversely, $\mathbf{z}^{I}$ provides a sharper delineation of very thin vessels in ROSE. The fusion of both pathways outperforms either pathway alone in most scenarios. These observations are further supported by the quantitative ablation study in Fig. 6. We note that $\mathbf{z}^{S}$ and $\mathbf{z}^{I}$ can be used as synthetic angiograms that provide both enhanced vessel visualization and model interpretability.

Figure 6: Quantitative ablation results. Dice scores of the vanilla residual U-Net, the intensity encoding $D(\mathbf{z}^{I})$, the structural encoding $D(\mathbf{z}^{S})$ and the final output $D^{T}(\Gamma(\mathbf{z}^{I},\mathbf{z}^{S}))$. The background is color-coded as in Table 1. We note that $\mathbf{z}^{S}$ is especially useful for capturing the thick vessels in HRF, whereas $\mathbf{z}^{I}$ provides additional precision for the thin vessels in the OCTA datasets. The proposed model combines these advantages and is robust across the board.

Fig. 5 shows the t-SNE plots [20] of the datasets. The distribution gaps between datasets are greatly reduced for the two latent vessel representations.

Table 2 compares all methods on the target domain $\mathcal{T}$. For the diseased ARIA data, all methods show comparable performance and are not significantly different from the baseline. VesselMorph has the best OOD outcome in both the cross-modality (dark gray) and cross-resolution (light gray) scenarios, except on the OCTA-500 dataset, where VFT, MASF and VesselMorph perform similarly. The results of VFT and VesselMorph underscore the value of the shape information.

4 Conclusion

In this work, we propose to solve the DG problem by explicitly modeling the domain-agnostic tubular shape of vessels with a bipolar tensor field, which connects traditional algorithms with deep learning. We extract vessel representations from both the intensity image and the BTF, then fuse the information from the two pathways so that the segmentation network can better exploit both types of description. Our VesselMorph model provides a significant quantitative improvement in Dice score across a variety of domain shift conditions, and its latent images offer enhanced vessel visualization and interpretability.

Acknowledgements. This work is supported by the NIH grant R01EY033969 and the Vanderbilt University Discovery Grant Program.

References

  • [1] Aslani, S., Murino, V., Dayan, M., Tam, R., Sona, D., Hamarneh, G.: Scanner invariant multiple sclerosis lesion segmentation from MRI. In: IEEE ISBI. pp. 781–785 (2020)
  • [2] Budai, A., Bock, R., Maier, A., Hornegger, J., Michelson, G.: Robust vessel segmentation in fundus images. International Journal of Biomedical Imaging (2013)
  • [3] Dewey, B.E., Zuo, L., Carass, A., He, Y., Liu, Y., Mowry, E.M., Newsome, S., Oh, J., Calabresi, P.A., Prince, J.L.: A disentangled latent space for cross-site MRI harmonization. In: MICCAI. pp. 720–729. Springer (2020)
  • [4] Dou, Q., Coelho de Castro, D., Kamnitsas, K., Glocker, B.: Domain generalization via model-agnostic learning of semantic features. NeurIPS 32 (2019)
  • [5] Farnell, D., Hatfield, F., Knox, P., Reakes, M., Spencer, S., Parry, D., Harding, S.P.: Enhancement of blood vessels in digital fundus photographs via the application of multiscale line operators. Journal of the Franklin Institute 345(7), 748–765 (2008)
  • [6] Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML. pp. 1126–1135. PMLR (2017)
  • [7] Frangi, A.F., Niessen, W.J., Vincken, K.L., Viergever, M.A.: Multiscale vessel enhancement filtering. In: MICCAI. pp. 130–137. Springer (1998)
  • [8] He, Y., Carass, A., Zuo, L., Dewey, B.E., Prince, J.L.: Autoencoder based self-supervised test-time adaptation for medical image analysis. MedIA (2021)
  • [9] Hoover, A., Kouznetsova, V., Goldbaum, M.: Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE TMI 19(3), 203–210 (2000)
  • [10] Hore, A., Ziou, D.: Image quality metrics: PSNR vs. SSIM. In: 2010 20th International Conference on Pattern Recognition. pp. 2366–2369. IEEE (2010)
  • [11] Hu, D., Cui, C., Li, H., Larson, K.E., Tao, Y.K., Oguz, I.: LIFE: a generalizable autodidactic pipeline for 3D OCT-A vessel segmentation. In: MICCAI (2021)
  • [12] Hu, D., Li, H., Liu, H., Oguz, I.: Domain generalization for retinal vessel segmentation with vector field transformer. In: MIDL (2021)
  • [13] Le Bihan, D., Mangin, J.F., Poupon, C., Clark, C.A., Pappata, S., Molko, N., Chabriat, H.: Diffusion tensor imaging: concepts and applications. Journal of Magnetic Resonance Imaging 13(4), 534–546 (2001)
  • [14] Li, D., Yang, Y., Song, Y.Z., Hospedales, T.: Learning to generalize: Meta-learning for domain generalization. In: AAAI. vol. 32 (2018)
  • [15] Li, H., Wang, Y., Wan, R., Wang, S., Li, T., Kot, A.: Domain generalization for medical imaging classification with linear-dependency regularization. NeurIPS (2020)
  • [16] Li, M., Chen, Y., Ji, Z., Xie, K., Yuan, S., Chen, Q., Li, S.: Image projection network: 3D to 2D image segmentation in OCTA images. IEEE TMI (2020)
  • [17] Liu, Y., Zuo, L., Carass, A., He, Y., Filippatou, A., Solomon, S., Saidha, S., Calabresi, P., Prince, J.: Variational intensity cross channel encoder for unsupervised vessel segmentation on OCT angiography. In: SPIE Medical Imaging (2020)
  • [18] Lyu, J., Zhang, Y., Huang, Y., Lin, L., Cheng, P., Tang, X.: AADG: Automatic Augmentation for Domain Generalization on Retinal Image Segmentation. IEEE TMI (2022)
  • [19] Ma, Y., Hao, H., Xie, J., Fu, H., Zhang, J., Yang, J., Wang, Z., Liu, J., Zheng, Y., Zhao, Y.: ROSE: a retinal OCT-angiography vessel segmentation dataset and new model. IEEE TMI 40(3), 928–939 (2020)
  • [20] Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. Journal of Machine Learning Research 9(11) (2008)
  • [21] Ouyang, J., Adeli, E., Pohl, K.M., Zhao, Q., Zaharchuk, G.: Representation disentanglement for multi-modal brain MRI analysis. In: IPMI. pp. 321–333 (2021)
  • [22] Staal, J., Abràmoff, M.D., Niemeijer, M., Viergever, M.A., Van Ginneken, B.: Ridge-based vessel segmentation in color images of the retina. IEEE TMI (2004)
  • [23] Zhang, L., Wang, X., Yang, D., Sanford, T., Harmon, S., Turkbey, B., Wood, B.J., Roth, H., Myronenko, A., et al.: Generalizing deep learning for medical image segmentation to unseen domains via deep stacked transformation. IEEE TMI (2020)
  • [24] Zhou, K., Liu, Z., Qiao, Y., Xiang, T., Loy, C.C.: Domain generalization: A survey. IEEE PAMI (2022)