
School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, China
Email: lwj1217@mail.ustc.edu.cn

Solving Low-Dose CT Reconstruction via GAN with Local Coherence

Wenjie Liu (ORCID: 0000-0002-4524-8507)
Abstract

Computed Tomography (CT) for the diagnosis of lesions in human internal organs is one of the most fundamental topics in medical imaging. Low-dose CT, which offers reduced radiation exposure, is preferred over standard-dose CT, and therefore its reconstruction approaches have been extensively studied. However, current low-dose CT reconstruction techniques mainly rely on model-based methods or deep-learning-based techniques, which often ignore the coherence and smoothness of sequential CT slices. To address this issue, we propose a novel approach using generative adversarial networks (GANs) with enhanced local coherence. The proposed method captures the local coherence of adjacent images via optical flow, which yields significant improvements in the precision and stability of the reconstructed images. We evaluate our proposed method on real datasets, and the experimental results suggest that it significantly outperforms existing state-of-the-art reconstruction approaches.

Keywords:
CT reconstruction · Low-dose · Generative adversarial networks · Local coherence · Optical flow

1 Introduction

Computed Tomography (CT) is one of the most widely used technologies in medical imaging, assisting doctors in diagnosing lesions in human internal organs. Due to the harmful radiation exposure of standard-dose CT, low-dose CT is preferable in clinical applications [5, 7, 35]. However, when the dose is low, together with issues like sparse views or limited angles, it becomes quite challenging to reconstruct high-quality CT images. High-quality CT images are important for improving diagnostic performance in the clinic [28]. Mathematically, we model CT imaging as the following procedure:

\mathbf{y} = \mathcal{T}(\mathbf{x^r}) + \delta, \qquad (1)

where $\mathbf{x^r} \in \mathbb{R}^d$ denotes the unknown ground-truth image, $\mathbf{y} \in \mathbb{R}^m$ denotes the received measurement, and $\delta$ is the noise. The function $\mathcal{T}$ represents the forward operator, analogous to the Radon transform, which is widely used in medical imaging [24, 29]. The problem of CT reconstruction is to recover $\mathbf{x^r}$ from the received $\mathbf{y}$.
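To make the forward model concrete, the following minimal sketch simulates equation (1) with a parallel-beam Radon transform from scikit-image; the phantom image, the number of views, and the noise level are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon

# Illustrative forward model y = T(x^r) + delta (Eq. (1)).
x_true = shepp_logan_phantom()                        # stand-in for the unknown image x^r
theta = np.linspace(0.0, 180.0, 200, endpoint=False)  # 200 uniform views (assumption)
sinogram = radon(x_true, theta=theta)                 # T(x^r): parallel-beam Radon transform
noise = np.random.normal(scale=2.0, size=sinogram.shape)  # delta: Gaussian noise
y = sinogram + noise                                  # received measurement y
```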

Solving the inverse problem of (1) is often very challenging without any additional information. If the forward operator $\mathcal{T}$ is well-posed and $\delta$ is negligible, an approximate $\mathbf{x^r}$ can be easily obtained by directly computing $\mathcal{T}^{-1}(\mathbf{y})$. However, $\mathcal{T}$ is often ill-posed, which means the inverse function $\mathcal{T}^{-1}$ does not exist and the inverse problem of (1) may have multiple solutions. Moreover, when the CT imaging is low-dose, filtered back projection (FBP) [12] can produce severe artifacts. Therefore, most existing approaches incorporate some prior knowledge during the reconstruction [15, 18, 27]. For example, a commonly used method is based on regularization:

\mathbf{x} = \mathop{\arg\min}_{\mathbf{x}} \left\| \mathcal{T}(\mathbf{x}) - \mathbf{y} \right\|_p + \lambda \mathcal{R}(\mathbf{x}), \qquad (2)

where $\|\cdot\|_p$ denotes the $p$-norm and $\mathcal{R}(\mathbf{x})$ denotes the penalty term derived from some prior knowledge.
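As one concrete instance of equation (2) with $p = 2$ and a smoothed total-variation regularizer $\mathcal{R}$, the gradient-descent sketch below is one of many possible solvers; the step size, iteration count, boundary handling, and the use of unfiltered back projection as an approximate adjoint of $\mathcal{T}$ are all assumptions made for illustration.

```python
import numpy as np
from skimage.transform import radon, iradon

def tv_grad(x, eps=1e-8):
    # Gradient of a smoothed isotropic TV penalty sum(sqrt(|grad x|^2 + eps)).
    dw = np.diff(x, axis=1, append=x[:, -1:])   # forward differences, Neumann boundary
    dh = np.diff(x, axis=0, append=x[-1:, :])
    norm = np.sqrt(dw ** 2 + dh ** 2 + eps)
    pw, ph = dw / norm, dh / norm
    # Negative divergence of the normalized gradient field (adjoint of forward diff).
    div = (np.diff(pw, axis=1, prepend=np.zeros((x.shape[0], 1)))
           + np.diff(ph, axis=0, prepend=np.zeros((1, x.shape[1]))))
    return -div

def reconstruct_tv(y, theta, shape, n_iter=200, step=1e-4, lam=0.1):
    """Gradient descent on 0.5*||T(x) - y||_2^2 + lam * TV_smooth(x)."""
    x = np.zeros(shape)
    for _ in range(n_iter):
        residual = radon(x, theta=theta) - y    # T(x) - y
        # Unfiltered back projection as a rough stand-in for the adjoint T^T.
        data_grad = iradon(residual, theta=theta, filter_name=None,
                           output_size=shape[0])
        x -= step * (data_grad + lam * tv_grad(x))
    return x
```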

In the past years, a number of methods have been proposed for designing the regularizer $\mathcal{R}$. Traditional model-based algorithms, e.g., those using total variation [4, 27], usually apply sparse-gradient assumptions and run an iterative algorithm to learn the regularizers [13, 19, 25, 30]. Another popular line for learning regularizers comes from deep learning [14, 18]; the advantage of deep learning methods is that they can achieve an end-to-end recovery of the true image $\mathbf{x^r}$ from the measurement $\mathbf{y}$ [1, 22]. Recent research reveals that convolutional neural networks (CNNs) are quite effective for image denoising; e.g., CNN-based algorithms [11, 35] can directly learn the reconstruction mapping from initial measurement reconstructions (e.g., FBP) to the ground-truth images. Dual-domain networks that combine the sinograms with reconstructed low-dose CT images have also been proposed to enhance generalizability [16, 31].

A major drawback of the aforementioned reconstruction methods is that they process the input 2D CT slices independently (note that the goal of CT reconstruction is to build the 3D model of the organ). Namely, the neighborhood correlations among the 2D slices are often ignored, which may affect the reconstruction performance in practice. In the field of computer vision, "optical flow" is a common technique for tracking the motion of objects between consecutive frames, which has been applied to many different tasks such as video generation [36], next-frame prediction [23] and super-resolution synthesis [6, 32]. To estimate the optical flow field, existing approaches include traditional brightness-gradient methods [3] and deep networks [8]. The idea of optical flow has also been used for tracking organ movement in medical imaging [17, 21, 34]. However, to the best of our knowledge, no prior work has considered GANs that use optical flow to capture the coherence of neighboring slices for low-dose 3D CT reconstruction.

In this paper, we propose a novel optical-flow-based generative adversarial network for 3D CT reconstruction. Our intuition is as follows. When a patient is scanned by a CT machine, a set of consecutive cross-sectional images is generated. If the vertical axial spacing of the transverse planes is small, the corresponding CT slices should be highly similar. We therefore apply optical flow (though several technical issues must be resolved in the design and implementation) to capture the local coherence of adjacent CT images and reduce the artifacts in low-dose CT reconstruction. Our contributions are summarized below:

  1. We introduce the "local coherence" by characterizing the correlation of consecutive CT images, which plays a key role in suppressing artifacts.

  2. Together with the local coherence, our proposed generative adversarial network (GAN) yields significant improvements in the texture quality and stability of the reconstructed images.

  3. To illustrate the efficiency of our proposed approach, we conduct rigorous experiments on several real clinical datasets; the experimental results reveal the advantages of our approach over several state-of-the-art CT reconstruction methods.

2 Preliminaries

In this section, we briefly review the framework of the ordinary generative adversarial network, and also introduce the local coherence of CT slices.

2.0.1 Generative adversarial network.

The traditional generative adversarial network [9] consists of two main modules: a generator and a discriminator. The generator $\mathcal{G}$ is a mapping from a latent-space Gaussian distribution $\mathbb{P}_Z$ to the synthetic sample distribution $\mathbb{P}_{X_G}$, which is expected to be close to the real sample distribution $\mathbb{P}_X$. On the other hand, the discriminator $\mathcal{D}$ aims to maximize the distance between the distributions $\mathbb{P}_{X_G}$ and $\mathbb{P}_X$. The game between the generator and discriminator is an adversarial process, where the overall optimization objective follows a min-max principle:

\min_{\mathcal{G}} \max_{\mathcal{D}} \; \mathbb{E}_{\mathbf{x^r} \sim \mathbb{P}_X, \, \mathbf{z} \sim \mathbb{P}_Z} \left( \log(\mathcal{D}(\mathbf{x^r})) + \log(1 - \mathcal{D}(\mathcal{G}(\mathbf{z}))) \right). \qquad (3)
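To make the min-max objective in (3) concrete, below is a minimal alternating training step written in PyTorch for illustration (the paper's implementation uses MindSpore; the networks G and D are placeholders, with D assumed to output a probability in (0, 1)).

```python
import torch

def gan_step(G, D, opt_G, opt_D, x_real, z):
    """One alternating optimization step of the min-max objective in Eq. (3)."""
    # Discriminator ascent: maximize log D(x^r) + log(1 - D(G(z))).
    x_fake = G(z).detach()
    loss_D = -(torch.log(D(x_real)).mean() + torch.log(1.0 - D(x_fake)).mean())
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator descent: minimize log(1 - D(G(z))).
    # (The non-saturating variant -log D(G(z)) is common in practice.)
    loss_G = torch.log(1.0 - D(G(z))).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```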

2.0.2 Local coherence.

As mentioned in Section 1, optical flow can capture the temporal coherence of object movements, which plays a crucial role in many video-related tasks. More specifically, optical flow refers to the instantaneous velocity of the pixels of moving objects across consecutive frames over a short period of time [3]. The main idea relies on the practical assumptions that the brightness of an object is likely to remain stable across consecutive frames, and that the brightness of the pixels in a local region usually changes consistently [10]. Based on these assumptions, the optical flow can be described by the following brightness-constancy equation:

\nabla I_w \cdot v_w + \nabla I_h \cdot v_h + \nabla I_t = 0, \qquad (4)

where $v = (v_w, v_h)$ represents the optical flow at position $(w, h)$ in the image, $\nabla I = (\nabla I_w, \nabla I_h)$ denotes the spatial gradient of the image brightness, and $\nabla I_t$ denotes the temporal partial derivative of the corresponding region.

Following equation (4), we consider whether the optical flow idea can be applied to 3D CT reconstruction. In practice, the brightness of adjacent CT images often differs only slightly, due to the inherent continuity and structural integrity of the human body. Therefore, we introduce the "local coherence", which indicates the correlation between adjacent images of a tissue. Namely, adjacent CT images often exhibit significant similarities within a certain local range along the vertical axis of the human body. Due to the local coherence, the noticeable variations observed in CT slices within the local range often occur at the edges of organs. We can substitute the vertical axial partial derivative $\nabla I_z$ for the temporal partial derivative $\nabla I_t$ in equation (4), where "$z$" indicates the index of the vertical axis. As illustrated in Figure 1, the local coherence can be captured by the optical flow between adjacent CT slices (see the least-squares sketch after Figure 1).

Figure 1: The optical flow between two adjacent CT slices. The scanning window of the X-ray slides from the position of the left image to the position of the right image. The directions and lengths of the red arrows represent the optical flow field. The left and right images share the local coherence, and thus the optical flows are small.
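As one concrete way to estimate a flow field satisfying (4) with $\nabla I_t$ replaced by the inter-slice derivative $\nabla I_z$, the following Lucas-Kanade-style least-squares sketch may be used (the paper itself adopts a FlowNet-based estimator, described in Section 3; the window size and derivative approximations here are assumptions).

```python
import numpy as np

def flow_between_slices(s_prev, s_next, win=7):
    """Least-squares solution of Eq. (4) in local windows (Lucas-Kanade style),
    with the inter-slice difference playing the role of the temporal derivative."""
    Iw = np.gradient(s_prev, axis=1)   # horizontal brightness gradient
    Ih = np.gradient(s_prev, axis=0)   # vertical brightness gradient
    Iz = s_next - s_prev               # vertical-axis derivative (replaces grad I_t)
    H, W = s_prev.shape
    flow = np.zeros((H, W, 2))
    r = win // 2
    for y in range(r, H - r):
        for x in range(r, W - r):
            # Stack the constraint Iw*v_w + Ih*v_h = -Iz over the local window.
            A = np.stack([Iw[y - r:y + r + 1, x - r:x + r + 1].ravel(),
                          Ih[y - r:y + r + 1, x - r:x + r + 1].ravel()], axis=1)
            b = -Iz[y - r:y + r + 1, x - r:x + r + 1].ravel()
            flow[y, x], *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow  # flow[..., 0] = v_w, flow[..., 1] = v_h
```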

3 GANs with local coherence

In this section, we introduce our low-dose CT image generation framework with local coherence in detail.

3.0.1 The framework of our network.

The proposed framework comprises three components: a generator $\mathcal{G}$, a discriminator $\mathcal{D}$ and an optical flow estimator $\mathcal{F}$. The generator is the core component, and the flow estimator provides auxiliary warped images for the generation process.

Suppose we have a sequence of measurements $\mathbf{y_1}, \mathbf{y_2}, \cdots, \mathbf{y_n}$; for each $\mathbf{y_i}$, $1 \leq i \leq n$, we want to reconstruct its ground-truth image $\mathbf{x_i^r}$ as in equation (1). Before performing the reconstruction in the generator $\mathcal{G}$, we apply some prior knowledge from physics and run filtered back projection on the measurement $\mathbf{y_i}$ in equation (1) to obtain an initial recovery $\mathbf{s_i}$. Usually $\mathbf{s_i}$ contains significant noise compared with the ground truth $\mathbf{x_i^r}$. The network then has two input components, i.e., the initial back-projected image $\mathbf{s_i}$, which serves as an approximation of the ground truth $\mathbf{x_i^r}$, and a set of neighboring CT slices $\mathcal{N}(\mathbf{s_i}) = \{\mathbf{s_{i-1}}, \mathbf{s_{i+1}}\}$ (if $i = 1$, $\mathcal{N}(\mathbf{s_i}) = \{\mathbf{s_2}\}$; if $i = n$, $\mathcal{N}(\mathbf{s_i}) = \{\mathbf{s_{n-1}}\}$) for preserving the local coherence. The overall structure of our framework is shown in Figure 2. Below, we introduce the three key parts of our framework separately.

Figure 2: The framework of our generative adversarial network with local coherence for CT reconstruction.

3.0.2 Optical flow estimator.

The optical flow $\mathcal{F}(\mathcal{N}(\mathbf{s_i}), \mathbf{s_i})$ denotes the brightness changes of pixels from $\mathcal{N}(\mathbf{s_i})$ to $\mathbf{s_i}$, which captures their local coherence. The estimator is derived from the network architecture of FlowNet [8]. FlowNet is an autoencoder architecture that extracts features from the two input frames to learn the corresponding flow; it consists of 6 (de)convolutional layers for both the encoder and the decoder.

3.0.3 Discriminator.

The discriminator $\mathcal{D}$ assigns the label "1" to real standard-dose CT images and "0" to generated images. The goal of $\mathcal{D}$ is to maximize the separation between the distributions of real images and generated images:

\mathcal{L}_{\mathcal{D}} = -\sum_{i=1}^{n} \left( \log(\mathcal{D}(\mathbf{x_i^r})) + \log(1 - \mathcal{D}(\mathbf{x_i^g})) \right), \qquad (5)

where $\mathbf{x_i^g}$ is the image generated by $\mathcal{G}$ (the formal definition of $\mathbf{x_i^g}$ is given below). The discriminator includes 3 residual blocks, with 4 convolutional layers in each residual block.
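For concreteness, a PyTorch-style sketch of such a discriminator follows; the structure (3 residual blocks with 4 convolutional layers each) is from the text, while the channel widths, activations, and pooling head are assumptions (the paper's implementation uses MindSpore).

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block with 4 convolutional layers, as stated in the text."""
    def __init__(self, ch):
        super().__init__()
        layers = []
        for _ in range(4):
            layers += [nn.Conv2d(ch, ch, kernel_size=3, padding=1), nn.LeakyReLU(0.2)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x + self.body(x)

class Discriminator(nn.Module):
    """3 residual blocks followed by a scalar real/fake probability."""
    def __init__(self, in_ch=1, ch=64):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, ch, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(ResBlock(ch), ResBlock(ch), ResBlock(ch))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(ch, 1), nn.Sigmoid())

    def forward(self, x):
        return self.head(self.blocks(self.stem(x)))
```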

3.0.4 Generator.

We use the generator $\mathcal{G}$ to reconstruct the high-quality CT image for the ground truth $\mathbf{x_i^r}$ from the low-dose image $\mathbf{s_i}$. The generated image is obtained by

\mathbf{x_i^g} = \mathcal{G}\big(\mathbf{s_i}, \mathcal{W}(\mathcal{N}(\mathbf{x_i^g}))\big); \qquad (6)
\mathcal{N}(\mathbf{x_i^g}) = \mathcal{G}(\mathcal{N}(\mathbf{s_i})),

where $\mathcal{W}(\cdot)$ is the warping operator. Before generating $\mathbf{x_i^g}$, $\mathcal{N}(\mathbf{x_i^g})$ is reconstructed from $\mathcal{N}(\mathbf{s_i})$ by the generator without considering local coherence. Subsequently, according to the optical flow $\mathcal{F}(\mathcal{N}(\mathbf{s_i}), \mathbf{s_i})$, we warp the reconstructed images $\mathcal{N}(\mathbf{x_i^g})$ to align with the current slice by adjusting the brightness values. The warping operator $\mathcal{W}$ utilizes bilinear interpolation to obtain $\mathcal{W}(\mathcal{N}(\mathbf{x_i^g}))$, which enables the model to capture subtle variations in the tissue from the generated $\mathcal{N}(\mathbf{x_i^g})$; the warping operator also reduces the influence of artifacts on the reconstruction. Finally, $\mathbf{x_i^g}$ is generated by combining $\mathbf{s_i}$ and $\mathcal{W}(\mathcal{N}(\mathbf{x_i^g}))$. Since $\mathbf{x_i^r}$ is our reconstruction target in the $i$-th batch, we consider the difference between $\mathbf{x_i^g}$ and $\mathbf{x_i^r}$ in the loss. Our generator is mainly based on the network architecture of U-Net [26]. A sketch of the warping step is given below.
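A minimal sketch of the warping operator $\mathcal{W}(\cdot)$ with bilinear interpolation, written in PyTorch for illustration (the paper's implementation uses MindSpore; the pixel-unit flow convention below is an assumption):

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Warp img by the optical flow via bilinear interpolation.
    img: (B, C, H, W); flow: (B, 2, H, W) giving per-pixel (dw, dh) displacements."""
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=img.dtype, device=img.device),
                            torch.arange(W, dtype=img.dtype, device=img.device),
                            indexing="ij")
    base = torch.stack((xs, ys), dim=0).unsqueeze(0)  # (1, 2, H, W) pixel coordinates
    coords = base + flow                              # displaced sampling positions
    # Normalize coordinates to [-1, 1], as expected by grid_sample.
    gx = 2.0 * coords[:, 0] / (W - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)              # (B, H, W, 2)
    return F.grid_sample(img, grid, mode="bilinear", align_corners=True)
```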

Partly inspired by the loss in [2], the optimization objective of the generator $\mathcal{G}$ comprises three terms with coefficients $\lambda_{\mathtt{pix}}, \lambda_{\mathtt{adv}}, \lambda_{\mathtt{per}} \in (0, 1]$:

\mathcal{L}_{\mathcal{G}} = \lambda_{\mathtt{pix}} \mathcal{L}_{\mathtt{pixel}} + \lambda_{\mathtt{adv}} \mathcal{L}_{\mathtt{adv}} + \lambda_{\mathtt{per}} \mathcal{L}_{\mathtt{percept}}. \qquad (7)

In (7), $\mathcal{L}_{\mathtt{pixel}}$ is the loss measuring the pixel-wise mean squared error of the generated image $\mathbf{x_i^g}$ with respect to the ground truth $\mathbf{x_i^r}$. $\mathcal{L}_{\mathtt{adv}}$ represents the adversarial loss of the discriminator $\mathcal{D}$, which is designed to minimize the distance between the generated standard-dose CT image distribution $\mathbb{P}_{X_G}$ and the real standard-dose CT image distribution $\mathbb{P}_X$. $\mathcal{L}_{\mathtt{percept}}$ denotes the perceptual loss, which quantifies the dissimilarity between the feature maps of $\mathbf{x_i^r}$ and $\mathbf{x_i^g}$; the feature maps are the feature representations extracted from the hidden layers of the discriminator $\mathcal{D}$ (suppose there are $t$ hidden layers):

\mathcal{L}_{\mathtt{percept}} = \sum_{i=1}^{n} \sum_{j=1}^{t} \left\| \mathcal{D}_j(\mathbf{x_i^r}) - \mathcal{D}_j(\mathbf{x_i^g}) \right\|_1, \qquad (8)

where $\mathcal{D}_j(\cdot)$ refers to the feature extraction performed by the $j$-th hidden layer. By capturing the high-frequency differences in CT images, $\mathcal{L}_{\mathtt{percept}}$ can enhance the sharpness of edges and increase the contrast of the reconstructed images. $\mathcal{L}_{\mathtt{pixel}}$ and $\mathcal{L}_{\mathtt{adv}}$ are designed to recover the global structure, and $\mathcal{L}_{\mathtt{percept}}$ is utilized to incorporate additional texture details into the reconstruction.
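A sketch of the combined objective (7) in PyTorch, assuming the discriminator exposes its hidden-layer feature maps (the helper `D.features` below is hypothetical, and the non-saturating form of the adversarial term is a common substitution rather than the paper's stated formula; the default coefficients match the experimental settings in Section 4):

```python
import torch
import torch.nn.functional as F

def generator_loss(D, x_gen, x_real, lam_pix=1.0, lam_adv=0.01, lam_per=1.0):
    """L_G = lam_pix * L_pixel + lam_adv * L_adv + lam_per * L_percept (Eq. (7))."""
    l_pixel = F.mse_loss(x_gen, x_real)              # pixel-wise MSE
    l_adv = -torch.log(D(x_gen)).mean()              # non-saturating adversarial loss
    # Perceptual loss (Eq. (8)): L1 distance between hidden feature maps of D.
    # D.features(x) is assumed to return a list of per-layer feature maps.
    l_per = sum(F.l1_loss(fr, fg, reduction="sum")
                for fr, fg in zip(D.features(x_real), D.features(x_gen)))
    return lam_pix * l_pixel + lam_adv * l_adv + lam_per * l_per
```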

4 Experiment

4.0.1 Datasets.

First, our proposed approach is evaluated on the "Mayo-Clinic low-dose CT Grand Challenge" (Mayo-Clinic) dataset of lung CT images [20]. The dataset contains 2250 two-dimensional slices from 9 patients for training, and the remaining 128 slices from 1 patient are reserved for testing. The low-dose measurements are simulated by parallel-beam X-ray projection with 200 (or 150) uniform views, i.e., $N_v = 200$ (or $N_v = 150$), and 400 (or 300) detectors, i.e., $N_d = 400$ (or $N_d = 300$). To further verify the denoising ability of our approach, we add Gaussian noise with standard deviation $\sigma = 2.0$ to the sinograms after X-ray projection in 50% of the experiments. To evaluate the generalization of our model, we also use another dataset, RIDER, consisting of repeat CT scans of patients with non-small cell lung cancer [37], for testing. We randomly select 4 patients with 1827 slices from this dataset. The simulation process is identical to that of Mayo-Clinic. The proposed networks were implemented in the MindSpore framework and trained on an NVIDIA RTX 3090 GPU for 100 epochs.

4.0.2 Baselines and evaluation metrics.

We compare with several popular existing algorithms. (1) FBP [12]: the classical filtered back projection applied to low-dose sinograms. (2) FBPConvNet [11]: a direct inversion network that applies a CNN after an initial FBP reconstruction. (3) LPD [1]: a deep learning method based on proximal primal-dual optimization. (4) UAR [22]: an end-to-end reconstruction method based on learning unrolled reconstruction operators and adversarial regularizers. Our proposed method is denoted by GAN-LC. We set $\lambda_{\mathtt{pix}} = 1.0$, $\lambda_{\mathtt{adv}} = 0.01$ and $\lambda_{\mathtt{per}} = 1.0$ for the optimization objective in equation (7) during training. Following most previous articles on 3D CT reconstruction, we evaluate the experimental performance with two metrics: the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [33]. PSNR measures the pixel-wise difference between two images and is negatively correlated with the mean squared error. SSIM measures the structural similarity between two images and is related to the variances of the input images. For both measures, higher is better.
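Both metrics can be computed with scikit-image as below (a minimal sketch; the data range is an assumption that depends on how the images are normalized):

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(x_gen, x_true, data_range=1.0):
    """Return (PSNR, SSIM); higher is better for both."""
    psnr = peak_signal_noise_ratio(x_true, x_gen, data_range=data_range)
    ssim = structural_similarity(x_true, x_gen, data_range=data_range)
    return psnr, ssim
```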

4.0.3 Results.

Table 1 presents the results on the Mayo-Clinic dataset, where the first row gives the different parameter settings (i.e., the number of uniform views $N_v$, the number of detectors $N_d$ and the standard deviation $\sigma$ of the Gaussian noise) used for simulating the low-dose sinograms. Our proposed approach GAN-LC consistently outperforms the baselines under almost all low-dose parameter settings. The methods FBP and UAR are very sensitive to noise; the performance of LPD is relatively stable but its reconstruction accuracy is low. FBPConvNet shows a trend similar to our approach across the different settings but with worse reconstruction quality. To evaluate the stability and generalization of our model and the baselines trained on the Mayo-Clinic dataset, we also test them on the RIDER dataset. The results are shown in Table 2. Due to the bias between datasets collected from different facilities, the performance of all models declines to some extent, but our proposed approach still outperforms the other models in most testing cases.

To illustrate the reconstruction performance more clearly, we also show the reconstruction results for testing images in Figure 3. We can see that our network reconstructs the CT images with higher quality. Due to space limitations, the experimental results for different numbers of views $N_v$ and more visualized results are placed in the supplementary material.

Table 1: Experimental results on the Mayo-Clinic dataset. The values in the first row give $N_v$, $N_d$ and $\sigma$ used for simulating the low-dose sinograms.

Sinograms    |  200,400,0.0  |  200,400,2.0  |  150,300,0.0  |  150,300,2.0
             |  PSNR   SSIM  |  PSNR   SSIM  |  PSNR   SSIM  |  PSNR   SSIM
FBP          | 26.449  0.721 | 13.517  0.191 | 21.460  0.616 | 12.593  0.168
FBPConvNet   | 38.213  0.918 | 30.148  0.743 | 35.263  0.869 | 29.095  0.723
LPD          | 28.050  0.844 | 28.357  0.794 | 28.376  0.826 | 27.409  0.801
UAR          | 33.248  0.902 | 22.048  0.272 | 29.829  0.848 | 21.227  0.238
GAN-LC       | 39.548  0.950 | 32.437  0.819 | 36.542  0.899 | 31.586  0.725
Table 2: Experimental results on the RIDER dataset. The values in the first row give $N_v$, $N_d$ and $\sigma$ used for simulating the low-dose sinograms.

Sinograms    |  200,400,0.0  |  200,400,2.0  |  150,300,0.0  |  150,300,2.0
             |  PSNR   SSIM  |  PSNR   SSIM  |  PSNR   SSIM  |  PSNR   SSIM
FBP          | 21.398  0.647 | 15.609  0.233 | 19.490  0.597 | 14.845  0.203
FBPConvNet   | 27.256  0.671 | 19.520  0.444 | 27.504  0.650 | 18.517  0.431
LPD          | 22.341  0.615 | 12.196  0.466 | 22.172  0.556 | 12.215  0.455
UAR          | 24.915  0.667 | 20.943  0.207 | 21.136  0.557 | 19.873  0.176
GAN-LC       | 28.861  0.721 | 22.624  0.517 | 29.171  0.705 | 19.607  0.470
Figure 3: Reconstruction results on the Mayo-Clinic dataset. The sparse-view setting of the sinograms is $N_v = 200$, $N_d = 400$ and $\sigma = 2.0$. "Ground Truth" is the standard-dose CT image.

5 Conclusion

In this paper, we propose a novel approach for low-dose CT reconstruction using generative adversarial networks with local coherence. By exploiting the inherent continuity of the human body, local coherence can be captured through optical flow, which describes the small deformations and structural differences between consecutive CT slices. The experimental results on real datasets demonstrate the advantages of our proposed network over several popular approaches. In the future, we will evaluate our network on real-world CT images from local hospitals and use the reconstructed images to support doctors in the diagnosis and recognition of lung nodules.

Acknowledgements. The research of this work was supported in part by the National Key R&D Program of China through grant 2021YFA1000900, the NSFC through grant 62272432, the Provincial NSF of Anhui through grant 2208085MF163, and a Huawei-USTC Joint Innovation Project on Fundamental System Software, and was sponsored by the CAAI-Huawei MindSpore Open Fund.

References

  • [1] Adler, J., Öktem, O.: Learned primal-dual reconstruction. IEEE transactions on medical imaging 37(6), 1322–1332 (2018)
  • [2] Armanious, K., Jiang, C., Fischer, M., Küstner, T., Hepp, T., Nikolaou, K., Gatidis, S., Yang, B.: Medgan: Medical image translation using gans. Computerized medical imaging and graphics 79, 101684 (2020)
  • [3] Beauchemin, S.S., Barron, J.L.: The computation of optical flow. ACM computing surveys (CSUR) 27(3), 433–466 (1995)
  • [4] Chambolle, A.: An algorithm for total variation minimization and applications. Journal of Mathematical imaging and vision 20(1), 89–97 (2004)
  • [5] Chen, H., Zhang, Y., Zhang, W., Liao, P., Li, K., Zhou, J., Wang, G.: Low-dose ct via convolutional neural network. Biomedical optics express 8(2), 679–694 (2017)
  • [6] Chu, M., Xie, Y., Mayer, J., Leal-Taixé, L., Thuerey, N.: Learning temporal coherence via self-supervision for gan-based video generation. ACM Transactions on Graphics (TOG) 39(4), 75–1 (2020)
  • [7] Ding, Q., Nan, Y., Gao, H., Ji, H.: Deep learning with adaptive hyper-parameters for low-dose ct image reconstruction. IEEE Transactions on Computational Imaging 7, 648–660 (2021)
  • [8] Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., Van Der Smagt, P., Cremers, D., Brox, T.: Flownet: Learning optical flow with convolutional networks. In: Proceedings of the IEEE international conference on computer vision. pp. 2758–2766 (2015)
  • [9] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.C., Bengio, Y.: Generative adversarial networks. CoRR abs/1406.2661 (2014), http://arxiv.org/abs/1406.2661
  • [10] Horn, B.K., Schunck, B.G.: Determining optical flow. Artificial intelligence 17(1-3), 185–203 (1981)
  • [11] Jin, K.H., McCann, M.T., Froustey, E., Unser, M.: Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing 26(9), 4509–4522 (2017)
  • [12] Kak, A.C., Slaney, M.: Principles of computerized tomographic imaging. SIAM (2001)
  • [13] Knoll, F., Bredies, K., Pock, T., Stollberger, R.: Second order total generalized variation (tgv) for mri. Magnetic resonance in medicine 65(2), 480–491 (2011)
  • [14] Kobler, E., Effland, A., Kunisch, K., Pock, T.: Total deep variation for linear inverse problems. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 7549–7558 (2020)
  • [15] Li, H., Schwab, J., Antholzer, S., Haltmeier, M.: Nett: Solving inverse problems with deep neural networks. Inverse Problems 36(6), 065005 (2020)
  • [16] Lin, W.A., Liao, H., Peng, C., Sun, X., Zhang, J., Luo, J., Chellappa, R., Zhou, S.K.: Dudonet: Dual domain network for ct metal artifact reduction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10512–10521 (2019)
  • [17] Liu, H., Lin, Y., Ibragimov, B., Zhang, C.: Low dose 4d-ct super-resolution reconstruction via inter-plane motion estimation based on optical flow. Biomedical Signal Processing and Control 62, 102085 (2020)
  • [18] Lunz, S., Öktem, O., Schönlieb, C.B.: Adversarial regularizers in inverse problems. Advances in neural information processing systems 31 (2018)
  • [19] McCann, M.T., Nilchian, M., Stampanoni, M., Unser, M.: Fast 3d reconstruction method for differential phase contrast x-ray ct. Optics express 24(13), 14564–14581 (2016)
  • [20] McCollough, C.: Tu-fg-207a-04: overview of the low dose ct grand challenge. Medical physics 43(6Part35), 3759–3760 (2016)
  • [21] Mira, C., Moya-Albor, E., Escalante-Ramírez, B., Olveres, J., Brieva, J., Vallejo, E.: 3d hermite transform optical flow estimation in left ventricle ct sequences. Sensors 20(3),  595 (2020)
  • [22] Mukherjee, S., Carioni, M., Öktem, O., Schönlieb, C.B.: End-to-end reconstruction meets data-driven regularization for inverse problems. Advances in Neural Information Processing Systems 34, 21413–21425 (2021)
  • [23] Patraucean, V., Handa, A., Cipolla, R.: Spatio-temporal video autoencoder with differentiable memory. arXiv preprint arXiv:1511.06309 (2015)
  • [24] Ramm, A.G., Katsevich, A.I.: The Radon transform and local tomography. CRC press (2020)
  • [25] Romano, Y., Elad, M., Milanfar, P.: The little engine that could: Regularization by denoising (red). SIAM Journal on Imaging Sciences 10(4), 1804–1844 (2017)
  • [26] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. pp. 234–241. Springer (2015)
  • [27] Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena 60(1-4), 259–268 (1992)
  • [28] Sori, W.J., Feng, J., Godana, A.W., Liu, S., Gelmecha, D.J.: Dfd-net: lung cancer detection from denoised ct scan image using deep learning. Frontiers of Computer Science 15, 1–13 (2021)
  • [29] Toft, P.: The radon transform. Theory and Implementation (Ph. D. Dissertation)(Copenhagen: Technical University of Denmark) (1996)
  • [30] Venkatakrishnan, S.V., Bouman, C.A., Wohlberg, B.: Plug-and-play priors for model based reconstruction. In: 2013 IEEE Global Conference on Signal and Information Processing. pp. 945–948. IEEE (2013)
  • [31] Wang, C., Shang, K., Zhang, H., Li, Q., Zhou, S.K.: Dudotrans: Dual-domain transformer for sparse-view ct reconstruction. In: Machine Learning for Medical Image Reconstruction: 5th International Workshop, MLMIR 2022, Held in Conjunction with MICCAI 2022, Singapore, September 22, 2022, Proceedings. pp. 84–94. Springer (2022)
  • [32] Wang, T.C., Liu, M.Y., Zhu, J.Y., Liu, G., Tao, A., Kautz, J., Catanzaro, B.: Video-to-video synthesis. arXiv preprint arXiv:1808.06601 (2018)
  • [33] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13(4), 600–612 (2004)
  • [34] Weng, N., Yang, Y.H., Pierson, R.: Three-dimensional surface reconstruction using optical flow for medical imaging. IEEE transactions on medical imaging 16(5), 630–641 (1997)
  • [35] Wolterink, J.M., Leiner, T., Viergever, M.A., Išgum, I.: Generative adversarial networks for noise reduction in low-dose ct. IEEE transactions on medical imaging 36(12), 2536–2545 (2017)
  • [36] Xue, T., Wu, J., Bouman, K., Freeman, B.: Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks. Advances in neural information processing systems 29 (2016)
  • [37] Zhao, B., James, L.P., Moskowitz, C.S., Guo, P., Ginsberg, M.S., Lefkowitz, R.A., Qin, Y., Riely, G.J., Kris, M.G., Schwartz, L.H.: Evaluating variability in tumor measurements from same-day repeat ct scans of patients with non–small cell lung cancer. Radiology 252(1), 263–272 (2009)

Appendix 0.A Supplementary figures

Figure 4: Reconstruction results on the Mayo-Clinic dataset with $N_v$ varying from 50 to 300. The sparse-view setting of the sinograms is $N_d = 400$ and $\sigma = 0$. Our proposed approach GAN-LC outperforms the baselines under almost all low-dose parameter settings.
Figure 5: Reconstruction results on the Mayo-Clinic dataset. The sparse-view setting of the sinograms is $N_v = 200$, $N_d = 400$ and $\sigma = 2.0$.
Figure 6: Reconstruction results on the Mayo-Clinic dataset. The sparse-view setting of the sinograms is $N_v = 200$, $N_d = 400$ and $\sigma = 0$.
Figure 7: Reconstruction results on the RIDER dataset. The sparse-view setting of the sinograms is $N_v = 200$, $N_d = 400$ and $\sigma = 2.0$.