Email: srivathsa@subtlemedical.com
Affiliations: University of Massachusetts Lowell, Lowell MA 01854, USA; Stanford University, Stanford CA 94035, USA
Simulation of Arbitrary Level Contrast Dose in MRI Using an Iterative Global Transformer Model
Abstract
Note: Paper accepted at the 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023), held in Vancouver, Canada, October 8-12, 2023.

Deep learning (DL) based contrast dose reduction and elimination in MRI is gaining traction, given the detrimental effects of Gadolinium-based Contrast Agents (GBCAs). These DL algorithms, however, are limited by the availability of high quality low-dose datasets. Additionally, different types of GBCAs and pathologies require different dose levels for the DL algorithms to work reliably. In this work, we formulate a novel transformer (Gformer) based iterative modelling approach for the synthesis of images with arbitrary contrast enhancement that corresponds to different dose levels. The proposed Gformer incorporates a sub-sampling based attention mechanism and a rotational shift module that captures the various contrast related features. Quantitative evaluation indicates that the proposed model performs better than other state-of-the-art methods. We further perform quantitative evaluation on downstream tasks such as dose reduction and tumor segmentation to demonstrate the clinical utility.
Keywords:
Contrast-enhanced MRI · Iterative Model · Vision Transformers

1 Introduction
Gadolinium-Based Contrast Agents (GBCAs) are widely used in MRI scans owing to their capability of improving the border delineation and internal morphology of different pathologies, and they have extensive clinical applications [1]. However, GBCAs have several disadvantages, such as contraindications in patients with reduced renal function [2], patient inconvenience, high operation costs, and environmental side effects [3]. Therefore, there is an increased emphasis on the paradigm of "as low as reasonably achievable" (ALARA) [4]. To tackle these concerns, several dose reduction [5, 6] and elimination [7] approaches have been proposed. However, these deep learning (DL)-based dose reduction approaches require high quality low-dose contrast-enhanced (CE) images paired with pre-contrast and full-dose CE images. Acquiring such a dataset requires modification of the standard imaging protocol and involves additional training of the MR technicians. Therefore, it is important to simulate the process of T1w low-dose image acquisition using images from the standard protocol. Moreover, it is crucial for these dose reduction approaches to establish the minimum dose level required for different pathologies, as this is dependent on the scanning protocol and the GBCA compound injected. Therefore, the simulation tool should also be able to synthesize images with multiple contrast enhancement levels that correspond to multiple arbitrary dose levels.
Currently, MRI dose simulation is done using physics-based models [8]. However, these methods depend on the protocol parameters, the type of GBCA, and its relaxation parameters. Deep learning (DL) models have been widely used in medical imaging applications due to their high capacity, generalizability, and transferability [9, 10], but their performance depends heavily on the availability of high quality data. There is a dearth of data-driven approaches to MRI dose simulation, given the lack of diverse ground truth data for the different dose levels. To this effect, we introduce a vision transformer based DL model (a part of this work was presented as a poster at the International Society for Magnetic Resonance in Medicine (ISMRM) 2023 conference, held in Toronto) that can synthesize brain MRI images corresponding to arbitrary dose levels (refer to Supplementary Material Figure 1(b) for examples on spine MRI), by training on a highly imbalanced dataset with only T1w pre-contrast, T1w 10% low-dose, and T1w CE standard-dose images. The model backbone consists of a novel Global transformer (Gformer) with subsampling attention that can learn long-range dependencies of contrast uptake features. The proposed method also includes a rotational shift operation that further captures the shape irregularity of the contrast uptake regions. We performed extensive quantitative evaluation in comparison with other state-of-the-art methods. Additionally, we show the clinical utility of the simulated T1w low-dose images on downstream tasks. To the best of our knowledge, this is the first DL-based MRI dose simulation approach.
2 Methods
Iterative learning design: DL-based models tend to perform poorly when the training data is highly imbalanced [11]. Furthermore, the problem of arbitrary dose simulation requires the interpolation of intermediate dose levels from a minimum number of data points. Iterative models [12, 13] are suitable for such applications as they operate on the terminal images to generate step-wise intermediate solutions. We utilize this design paradigm for the dose simulation task and train an end-to-end model on a highly imbalanced dataset in which only T1w pre-contrast, T1w low-dose, and T1w post-contrast images are available.

As shown in Fig. 1(a), the proposed iterative model learns a transformation from the post-contrast to the low-dose image in $N$ iterations, where $f_\theta$ represents the base model. At each iteration $k$, the higher-enhancement image from the previous step and the pre-contrast image are fed into $f_\theta$ to predict the image with lower enhancement. The iterative model can be formulated as follows,

$$\hat{x}_k = f_\theta\left(\hat{x}_{k-1}, x_{\mathrm{pre}}\right), \qquad \hat{x}_0 = x_{\mathrm{post}}, \quad k = 1, \ldots, N \tag{1}$$

where $x_{\mathrm{pre}}$, $x_{\mathrm{post}}$, and $\hat{x}_N$ denote the pre-contrast, post-contrast, and predicted low-dose images, respectively, and $\hat{x}_{k-1}$ denotes the image with a higher enhancement than $\hat{x}_k$. This way, the intermediate outputs, having different enhancement levels, correspond to images with different contrast dose levels at a uniform interval. This iterative model essentially learns a gradual dose reduction process, in which each iteration step removes a certain amount of contrast enhancement from the full-dose image.
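As a concrete illustration, the inference loop of Eqn. 1 can be sketched in PyTorch as below. This is a minimal sketch; the channel-wise concatenation of the two inputs and all function names are our assumptions, not the exact released implementation.

```python
import torch

@torch.no_grad()
def iterative_dose_simulation(f_theta, x_pre, x_post, n_iters):
    """Eqn. 1: start from the post-contrast image and repeatedly predict
    the next lower-enhancement image; each intermediate output corresponds
    to one simulated dose level."""
    x_k = x_post                                   # \hat{x}_0 = x_post
    intermediates = []
    for _ in range(n_iters):
        # The base model is conditioned on the current (higher-enhancement)
        # image together with the pre-contrast image.
        x_k = f_theta(torch.cat([x_k, x_pre], dim=1))
        intermediates.append(x_k)
    return intermediates                           # intermediates[-1]: low-dose
```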
Loss functions and model convergence: The proposed iterative model aims to learn a mapping from the post-contrast and pre-contrast images to the synthesized low-dose images and is trained with the true 10% low-dose image as the ground truth. We used the $L_1$ and structural similarity index measure (SSIM) losses. To tackle the problem of gradient explosion or vanishing, "soft labels" are generated using linear scaling. These "soft labels" serve as references for the intermediate outputs during the iterative training process and also aid model convergence, without which the model has to learn the post-contrast to low-dose mapping directly. Given $N$ iterations, the soft label $s_k$ for iteration $k$ is calculated as follows:
$$s_k = x_{\mathrm{post}} - \frac{k\,(1-d)}{N}\, c, \qquad c = \left(x^{ss}_{\mathrm{post}} - x^{ss}_{\mathrm{pre}}\right) \odot \mathbb{1}\!\left[x^{ss}_{\mathrm{post}} - x^{ss}_{\mathrm{pre}} > \tau\right] \tag{2}$$

where $x^{ss}_{\mathrm{post}}$ and $x^{ss}_{\mathrm{pre}}$ denote the skull-stripped post-contrast and pre-contrast images, $d$ represents the dose level of the final prediction, and $\tau$ denotes the threshold used to extract the estimated contrast uptake $c$. Finally, the total losses are calculated as
$$\mathcal{L}_{\mathrm{total}} = \lambda_{gt}\,\mathcal{L}\!\left(\hat{x}_N, x_{\mathrm{low}}\right) + \lambda_{soft} \sum_{k=1}^{N-1} \mathcal{L}\!\left(\hat{x}_k, s_k\right), \qquad \mathcal{L} = \mathcal{L}_{1} + \mathcal{L}_{\mathrm{SSIM}} \tag{3}$$

where $x_{\mathrm{low}}$ denotes the true low-dose image and $\lambda_{soft} \ll \lambda_{gt}$. The "soft labels" are assigned a small loss weight so that they do not overshadow the contribution of the real low-dose image. Additionally, in order to recover the high-frequency texture information and to improve the overall perceptual quality, adversarial [14] and perceptual [15] losses are applied on $\hat{x}_N$ with a small weight.
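The soft-label generation and the composite loss might look as follows. This is a sketch under stated assumptions: the threshold `tau`, the loss weight `w_soft`, and the use of `pytorch_msssim` for SSIM are illustrative choices, not values from the paper.

```python
import torch
from pytorch_msssim import ssim  # one differentiable SSIM implementation

def soft_labels(x_post_ss, x_pre_ss, x_post, n_iters, dose=0.1, tau=0.1):
    """Linearly scaled "soft labels" per Eqn. 2 (tau is an assumed value)."""
    uptake = x_post_ss - x_pre_ss
    c = uptake * (uptake > tau)                    # thresholded contrast uptake
    return [x_post - (k * (1.0 - dose) / n_iters) * c
            for k in range(1, n_iters)]            # references for steps 1..N-1

def total_loss(preds, x_low, refs, w_soft=0.1):
    """Eqn. 3: L1 + SSIM loss on the final prediction plus a lightly
    weighted loss on the intermediate outputs (w_soft is assumed)."""
    def l1_ssim(a, b):
        return (a - b).abs().mean() + (1.0 - ssim(a, b, data_range=1.0))
    loss = l1_ssim(preds[-1], x_low)
    loss += w_soft * sum(l1_ssim(p, s) for p, s in zip(preds[:-1], refs))
    return loss
```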
Global transformer (Gformer):
Transformer models have risen to prominence in a wide range of computer vision applications [10, 16]. Traditional Swin transformers compute attention on non-overlapping local window patches. To further exploit the global contrast information, we propose a hybrid global transformer (Gformer) as a backbone for the dose simulation task. As illustrated in Fig. 1(b), the proposed model design includes six sequential Gformer blocks with shortcuts as the backbone module. As shown in Fig. 1(c), each Gformer block contains a convolution block, a rotational shift module, a sub-sampling process, and a typical transformer module. The convolution layer extracts granular local information of the contrast uptake, while the self-attention focuses on the coarse global context, thereby attending to the overall contrast uptake structure.
Subsampling attention: The sub-sampling is a key element of the Gformer block; it generates a number of sub-images from the whole image to serve as attention windows, as shown in Fig. 2. Gformer performs self-attention on the sub-sampled images, which encompasses global contextual information with minimal self-attention overhead on small feature maps. Given the entire feature map $X \in \mathbb{R}^{B \times C \times H \times W}$, where $B$, $C$, $H$, and $W$ are the batch size, channel dimension, height, and width, respectively, the subsampling process aggregates the strided positions into the sub-feature maps as follows,

$$X^{(i,j)}[b, c, h, w] = X[b, c, sh + i, sw + j], \qquad 0 \le i, j < s \tag{4}$$

where $s$ denotes the subsampling stride (a position is sampled every $s$ pixels), and $X^{(i,j)} \in \mathbb{R}^{B \times C \times \frac{H}{s} \times \frac{W}{s}}$ is the subsampled feature map. We generate all $s^2$ sub-feature maps to avoid any information loss during subsampling. These sub-feature maps are stacked onto the batch dimension as the attention windows for the transformer block shown in Fig. 1(c).
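Since all $s^2$ phase offsets are kept, the operation is a lossless space-to-batch rearrangement. A sketch of Eqn. 4 follows (our helper names, not the released code):

```python
import torch

def subsample_to_batch(x, s):
    """Eqn. 4: every s-th pixel along H and W forms one sub-feature map;
    all s*s phase offsets are kept, so no information is lost.
    (B, C, H, W) -> (B*s*s, C, H//s, W//s)."""
    B, C, H, W = x.shape
    x = x.view(B, C, H // s, s, W // s, s)          # split H, W into (coarse, phase)
    x = x.permute(0, 3, 5, 1, 2, 4)                 # (B, s, s, C, H//s, W//s)
    return x.reshape(B * s * s, C, H // s, W // s)  # phases stacked on batch dim

def gather_from_batch(x_sub, s):
    """Inverse rearrangement: (B*s*s, C, H//s, W//s) -> (B, C, H, W)."""
    Bs, C, Hs, Ws = x_sub.shape
    B = Bs // (s * s)
    x = x_sub.view(B, s, s, C, Hs, Ws).permute(0, 3, 4, 1, 5, 2)
    return x.reshape(B, C, Hs * s, Ws * s)
```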

Rotational shift: Image rotation has been widely used as a data augmentation technique in preprocessing and model training. Here, to further capture the heterogeneous nature of the contrast uptake areas, we employ rotational shift as a module to facilitate the representation power of the Gformer. To prevent information loss at the edges due to rotation, only small angles (e.g., $10°$, $20°$) are used and residual shortcuts are also applied. Specifically, given the feature map $X$, the rotational shift is performed around the central axis of the height/width plane. The rotated feature map is obtained by the following equations:

$$h' = (h - c_h)\cos\theta - (w - c_w)\sin\theta + c_h \tag{5}$$

$$w' = (h - c_h)\sin\theta + (w - c_w)\cos\theta + c_w \tag{6}$$

where $\theta$ is the rotation angle, $(c_h, c_w)$ is the center of the feature map, and $(h, w)$ and $(h', w')$ denote the pixel index in the feature map tensor before and after the rotational shift, respectively.
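A sketch of the module is given below, using `torchvision`'s tensor rotation; placing the residual shortcut directly around the rotation is our reading of the text, not a confirmed detail.

```python
import torch
from torchvision.transforms.functional import rotate

def rotational_shift(x, angle):
    """Rotate a (B, C, H, W) feature map by `angle` degrees in the H-W
    plane (Eqns. 5-6), with a residual shortcut so that edge content
    pushed out of the frame by the rotation is not lost."""
    if angle == 0:
        return x
    return x + rotate(x, angle)   # small angles only, e.g. 10 or 20 degrees
```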
3 Experiments and Results
Dataset: With IRB approval and informed consent, we retrospectively used 126 clinical cases (113 training, 13 testing) from an internal private dataset acquired with the Gadoterate meglumine contrast agent (Site A). For the downstream task assessment, we used 159 patient studies from another site (Site B) using Gadobenate dimeglumine. (The datasets used in this study were obtained from Sites A and B and are not available under a data sharing license.) The detailed cohort description is given in Table 1. The clinical indications for both sites included suspected tumor, post-op tumor follow-up, and routine brain. For each patient, 3D T1w MPRAGE scans were acquired for the pre-contrast, low-dose, and post-contrast images. These paired images were then mean normalized and affine co-registered (with the pre-contrast as the fixed image) using SimpleElastix [17]. The images were also skull-stripped using the HD-BET brain extraction tool [18], to account for differences in fat suppression, for generating the "soft labels".
Table 1: Cohort description of Sites A and B.

| Site | No. of cases | Gender | Age (years) | Scanner | TE (ms) | TR (ms) | Flip angle |
|---|---|---|---|---|---|---|---|
| Site A | 126 | – | 48 ± 16 | 3T | 2.97–3.11 | 6.41–6.70 | 8° |
| Site B | 159 | – | 52 ± 17 | 3T | 2.99–5.17 | 7.73–12.25 | 8°–20° |
Implementation details: All experiments were conducted with a single Tesla V100-SXM2 32GB GPU on an Intel(R) Xeon(R) CPU E5-2698 v4. The subsampling strides for the six Gformer blocks were {4, 8, 16, 16, 8, 4} and the rotational shift angles were {0°, 10°, 20°, 20°, 10°, 0°}. The model was optimized using the Adam optimizer with an initial learning rate of 1e-5 and a batch size of 4.
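Putting the pieces together, one Gformer block and the backbone schedule could be sketched as follows, reusing the subsampling and rotation helpers above. The channel width, head count, and layer ordering are assumptions, not the released implementation.

```python
import torch
import torch.nn as nn
from torchvision.transforms.functional import rotate

class GformerBlock(nn.Module):
    """Minimal sketch of Fig. 1(c): conv -> rotational shift ->
    subsampling attention -> inverse gather."""
    def __init__(self, dim, stride, angle, heads=4):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, 3, padding=1)
        self.stride, self.angle = stride, angle
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        x = x + self.conv(x)                          # granular local features
        if self.angle:
            x = x + rotate(x, self.angle)             # rotational shift
        sub = subsample_to_batch(x, self.stride)      # helper sketched earlier
        B2, C, Hs, Ws = sub.shape
        tokens = self.norm(sub.flatten(2).transpose(1, 2))   # (B2, Hs*Ws, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)      # global attention
        sub = (tokens + attn_out).transpose(1, 2).reshape(B2, C, Hs, Ws)
        return gather_from_batch(sub, self.stride)

STRIDES = [4, 8, 16, 16, 8, 4]      # per-block subsampling strides (paper)
ANGLES  = [0, 10, 20, 20, 10, 0]    # per-block rotation angles (paper)
backbone = nn.Sequential(*[GformerBlock(64, s, a)    # dim=64 is assumed
                           for s, a in zip(STRIDES, ANGLES)])
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-5)  # batch size 4
```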

Evaluation settings:
We quantitatively evaluated the proposed model using PSNR, SSIM, RMSE, and the LPIPS perceptual metric [19] between the synthesized and true low-dose images. To compare the efficacy of different backbones, we replaced the Gformer backbone with other state-of-the-art networks. In particular, the following backbone networks were studied: a simple linear scaling ("Scaling") approach, Rednet [20], Mapnn [13], Restormer [21], and SwinIR [22]. Unet [23] and Swin-Unet [24] were not assessed due to their tendency to synthesize blurry artifacts in the iterative modelling. A throughput metric (number of images generated per second) was also calculated to assess inference efficiency.
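The metric computation can be reproduced with standard libraries, as sketched below; the [0, 1] normalization and the AlexNet LPIPS backbone are our assumptions.

```python
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net='alex')   # perceptual metric of [19]

def evaluate_pair(pred, gt):
    """PSNR/SSIM/RMSE/LPIPS between synthesized and true low-dose slices,
    given float arrays normalized to [0, 1]."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, data_range=1.0)
    rmse = float(np.sqrt(np.mean((gt - pred) ** 2)))
    to_t = lambda a: torch.from_numpy(a).float()[None, None].repeat(1, 3, 1, 1) * 2 - 1
    lp = float(lpips_fn(to_t(pred), to_t(gt)))   # LPIPS expects (1, 3, H, W) in [-1, 1]
    return psnr, ssim, rmse, lp
```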
Evaluation results:
Fig. 4(a) shows that the proposed model is able to generate images that correspond to different dose levels. As shown in the zoomed inset, the hyperintensity of the contrast uptake in these images gradually reduces at each iteration. Fig. 4(b) shows that the pathological structure in the synthesized low-dose image is similar to that of the ground truth. Fig. 4(c) also shows that the model is robust to hyperintensities that are not related to contrast uptake. Fig. 3 and Table 2 show that the proposed model can synthesize enhancement patterns that closely match the true low-dose image and that it performs better than the other competing methods with a reasonable inference throughput.
Table 2: Quantitative comparison of backbone networks (mean ± std over the test set).

| Method | Throughput | PSNR (dB) | SSIM | RMSE | LPIPS |
|---|---|---|---|---|---|
| Post | – | 33.93 ± 2.88 | 0.93 ± 0.03 | 0.34 ± 0.13 | 0.055 ± 0.016 |
| Scaling | 0.79 Im/s | 38.41 ± 2.22 | 0.94 ± 0.19 | 0.20 ± 0.05 | 0.027 ± 0.015 |
| Rednet | 0.71 Im/s | 40.07 ± 2.72 | 0.97 ± 0.01 | 0.17 ± 0.05 | 0.029 ± 0.009 |
| Mapnn | 0.71 Im/s | 40.56 ± 1.64 | 0.96 ± 0.01 | 0.16 ± 0.05 | 0.023 ± 0.012 |
| Restormer | 0.65 Im/s | 40.04 ± 2.27 | 0.95 ± 0.01 | 0.16 ± 0.16 | 0.038 ± 0.016 |
| SwinIR | 0.58 Im/s | 40.93 ± 2.25 | 0.96 ± 0.01 | 0.15 ± 0.06 | 0.028 ± 0.015 |
| Gformer* (Cyc) | 0.69 Im/s | 41.46 ± 2.14 | 0.97 ± 0.02 | 0.14 ± 0.04 | 0.021 ± 0.007 |
| Gformer* (Rot) | 0.65 Im/s | 42.29 ± 0.02 | 0.98 ± 0.01 | 0.13 ± 0.03 | 0.017 ± 0.005 |

Quantitative assessment of contrast uptake:
The above pixel-based metrics do not specifically focus on the contrast uptake region. In order to assess the contrast uptake patterns of the intermediate images, we used the following metrics as described in [25]: contrast-to-noise ratio (CNR), contrast-to-background ratio (CBR), and contrast enhancement percentage (CEP). The ROI for the contrast uptake was computed as the binary mask of the corresponding "soft labels" in Eqn. 2. As shown in Fig. 5, the values of the contrast-specific metrics increase in a non-linear fashion as the iteration step increases.
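The contrast-specific metrics can be sketched as below; the formulas follow common usage (cf. [25]), and the exact definitions used in the paper may differ.

```python
import numpy as np

def contrast_metrics(img, pre, roi, bg):
    """CNR, CBR, and CEP for one slice.
    img: image at the current iteration; pre: pre-contrast image;
    roi: boolean contrast-uptake mask (binarized "soft label" of Eqn. 2);
    bg:  boolean background (non-enhancing tissue) mask."""
    cnr = (img[roi].mean() - img[bg].mean()) / img[bg].std()   # contrast-to-noise
    cbr = img[roi].mean() / img[bg].mean()                     # contrast-to-background
    cep = 100.0 * (img[roi].mean() - pre[roi].mean()) / pre[roi].mean()  # enhancement %
    return cnr, cbr, cep
```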
Downstream tasks:
In order to demonstrate the clinical utility of the synthesized low-dose images, we performed two downstream tasks:
1) Low-dose to full-dose synthesis: Using the DL-based algorithm described in [5], which predicts a full-dose image from the pre-contrast and low-dose images, we synthesized T1CE volumes using the true low-dose (T1CE-real-ldose) and the Gformer (Rot) synthesized low-dose (T1CE-synth-ldose) images. We computed the PSNR and SSIM metrics of T1CE vs. T1CE-real-ldose and of T1CE vs. T1CE-synth-ldose, which show that the synthesized low-dose images perform similarly (p < 0.0001, Wilcoxon signed rank test) to the true low-dose images in the dose reduction task. For this analysis we used the data from Site B.


2) Tumor segmentation: Using the T1CE volumes synthesized in the above step, we performed tumor segmentation using the winning solution of the BraTS 2018 challenge [26]. Let $M_{\mathrm{full}}$, $M_{\mathrm{real}}$, and $M_{\mathrm{synth}}$ be the whole tumor (WT) masks generated using T1CE, T1CE-real-ldose, and T1CE-synth-ldose (together with the T1, T2, and FLAIR images), respectively. The mean Dice scores of $M_{\mathrm{real}}$ and $M_{\mathrm{synth}}$ against $M_{\mathrm{full}}$ on the test set were comparable. Fig. 6 shows visual examples of the tumor segmentation performance. This shows that the clinical utility provided by the synthesized low-dose is similar (p < 0.0001, Wilcoxon signed rank test) to that of the actual low-dose image.
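For reference, the Dice comparison can be computed as below (a straightforward sketch; the per-case pairing with a Wilcoxon test mirrors the evaluation described above):

```python
import numpy as np
from scipy.stats import wilcoxon

def dice(mask_a, mask_b, eps=1e-8):
    """Dice overlap between two binary whole-tumor masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * inter / (mask_a.sum() + mask_b.sum() + eps)

# Per test case: dice(M_real, M_full) vs. dice(M_synth, M_full),
# then compare the paired score lists with wilcoxon(scores_real, scores_synth).
```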
4 Discussions and Conclusion
We have proposed a Gformer-based iterative model to simulate low-dose CE images. Extensive experiments and downstream task performance have verified the efficacy and clinical utility of the proposed model compared to other state-of-the-art methods. In the future, further reader studies are required to assess the diagnostic equivalence of the simulated low-dose images. The model can also be guided using physics-based models [27] that estimate the contrast enhancement level from signal intensity. This simulation technique can easily be extended to other anatomies and contrast agents.
References
- [1] L. E. Minton, R. Pandit, and K. K. Porter, “Contrast-enhanced mri: History and current recommendations,” Appl Radiol, pp. 15–9, 2021.
- [2] T. Grobner, “Gadolinium–a specific trigger for the development of nephrogenic fibrosing dermopathy and nephrogenic systemic fibrosis?,” Nephrology Dialysis Transplantation, vol. 21, no. 4, pp. 1104–1108, 2006.
- [3] R. Brünjes and T. Hofmann, “Anthropogenic gadolinium in freshwater and drinking water systems,” Water research, vol. 182, p. 115966, 2020.
- [4] M. Uffmann and C. Schaefer-Prokop, “Digital radiography: the balance between image quality and required radiation dose,” European journal of radiology, vol. 72, no. 2, pp. 202–208, 2009.
- [5] S. Pasumarthi, J. I. Tamir, S. Christensen, G. Zaharchuk, T. Zhang, and E. Gong, “A generic deep learning model for reduced gadolinium dose in contrast-enhanced brain mri,” Magnetic Resonance in Medicine, vol. 86, no. 3, pp. 1687–1700, 2021.
- [6] E. Gong, J. M. Pauly, M. Wintermark, and G. Zaharchuk, “Deep learning enables reduced gadolinium dose for contrast-enhanced brain mri,” Journal of magnetic resonance imaging, vol. 48, no. 2, pp. 330–340, 2018.
- [7] J. Kleesiek, J. N. Morshuis, F. Isensee, K. Deike-Hofmann, D. Paech, P. Kickingereder, U. Köthe, C. Rother, M. Forsting, W. Wick, et al., “Can virtual contrast enhancement in brain mri replace gadolinium?: a feasibility study,” Investigative radiology, vol. 54, no. 10, pp. 653–660, 2019.
- [8] S. Sourbron and D. L. Buckley, “Tracer kinetic modelling in mri: estimating perfusion and capillary permeability,” Physics in Medicine & Biology, vol. 57, no. 2, p. R1, 2011.
- [9] D. Wang, Z. Wu, and H. Yu, “Ted-net: Convolution-free t2t vision transformer-based encoder-decoder dilation network for low-dose ct denoising,” in Machine Learning in Medical Imaging: 12th International Workshop, MLMI 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, September 27, 2021, Proceedings 12, pp. 416–425, Springer, 2021.
- [10] J. Liu, S. Pasumarthi, B. Duffy, E. Gong, G. Zaharchuk, and K. Datta, “One model to synthesize them all: Multi-contrast multi-scale transformer for missing data imputation,” arXiv preprint arXiv:2204.13738, 2022.
- [11] G. Antipov, M. Baccouche, and J.-L. Dugelay, “Face aging with conditional generative adversarial networks,” in 2017 IEEE international conference on image processing (ICIP), pp. 2089–2093, IEEE, 2017.
- [12] Y. Liu, Q. Liu, M. Zhang, Q. Yang, S. Wang, and D. Liang, “Ifr-net: Iterative feature refinement network for compressed sensing mri,” IEEE Transactions on Computational Imaging, vol. 6, pp. 434–446, 2019.
- [13] H. Shan, A. Padole, F. Homayounieh, U. Kruger, R. D. Khera, C. Nitiwarangkul, M. K. Kalra, and G. Wang, “Competitive performance of a modularized deep neural network compared to commercial algorithms for low-dose ct image reconstruction,” Nature Machine Intelligence, vol. 1, no. 6, pp. 269–276, 2019.
- [14] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Communications of the ACM, vol. 63, no. 11, pp. 139–144, 2020.
- [15] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14, pp. 694–711, Springer, 2016.
- [16] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 10012–10022, 2021.
- [17] K. Marstal, F. Berendsen, M. Staring, and S. Klein, “Simpleelastix: A user-friendly, multi-lingual library for medical image registration,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 134–142, 2016.
- [18] M. Schell, I. Tursunova, I. Fabian, D. Bonekamp, U. Neuberger, W. Wick, M. Bendszus, K. Maier-Hein, P. Kickingereder, et al., “Automated brain extraction of multi-sequence mri using artificial neural networks,” European Congress of Radiology-ECR 2019, 2019.
- [19] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 586–595, 2018.
- [20] H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang, “Low-dose ct with a residual encoder-decoder convolutional neural network,” IEEE transactions on medical imaging, vol. 36, no. 12, pp. 2524–2535, 2017.
- [21] S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, and M.-H. Yang, “Restormer: Efficient transformer for high-resolution image restoration,” in CVPR, 2022.
- [22] J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, “Swinir: Image restoration using swin transformer,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 1833–1844, 2021.
- [23] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pp. 234–241, Springer, 2015.
- [24] H. Cao, Y. Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, and M. Wang, “Swin-unet: Unet-like pure transformer for medical image segmentation,” in Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part III, pp. 205–218, Springer, 2023.
- [25] A. Bône, S. Ammari, Y. Menu, C. Balleyguier, E. Moulton, É. Chouzenoux, A. Volk, G. C. Garcia, F. Nicolas, P. Robert, et al., “From dose reduction to contrast maximization: Can deep learning amplify the impact of contrast media on brain magnetic resonance image quality? a reader study,” Investigative Radiology, vol. 57, no. 8, pp. 527–535, 2022.
- [26] A. Myronenko, “3d mri brain tumor segmentation using autoencoder regularization,” in Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers, Part II 4, pp. 311–320, Springer, 2019.
- [27] J. Mørkenborg, M. Pedersen, F. Jensen, H. Stødkilde-Jørgensen, J. Djurhuus, and J. Frøkiær, “Quantitative assessment of gd-dtpa contrast agent from signal enhancement: an in-vitro study,” Magnetic resonance imaging, vol. 21, no. 6, pp. 637–643, 2003.