
An algorithm for constrained one-step inversion of spectral CT data

Rina Foygel Barber, Emil Y. Sidky, Taly Gilat Schmidt, and Xiaochuan Pan Rina Foygel Barber is with the Department of Statistics, The University of Chicago, 5734 S. University Ave., Chicago, IL 60637, USA. rina@uchicago.eduEmil Y. Sidky and Xiaochuan Pan are with the Department of Radiology, The University of Chicago, 5841 S. Maryland Ave., Chicago, IL 60637, USA. e-mail: {sidky,xpan}@uchicago.eduTaly Gilat Schmidt is with the Department of Biomedical Engineering, Marquette University, Milwaukee, WI 53233, USA. taly.gilat-schmidt@marquette.edu
Abstract

We develop a primal-dual algorithm that allows for one-step inversion of spectral CT transmission photon counts data to a basis map decomposition. The algorithm allows for image constraints to be enforced on the basis maps during the inversion. The derivation of the algorithm makes use of a local upper bounding quadratic approximation to generate descent steps for non-convex spectral CT data discrepancy terms, combined with a new convex-concave optimization algorithm. Convergence of the algorithm is demonstrated on simulated spectral CT data. Simulations with noise and anthropomorphic phantoms show examples of how to employ the constrained one-step algorithm for spectral CT data.

I Introduction

The recent research activity in photon-counting detectors has motivated a resurgence in the investigation of spectral computed tomography (CT). Photon-counting detectors detect individual X-ray quanta and the electronic pulse signal generated by these quanta has a peak amplitude proportional to the photon energy [1]. Thresholding these amplitudes allows for coarse energy resolution of the X-ray photons, and the transmitted flux of X-ray photons can be measured simultaneously in a number of energy windows. Theoretically, the energy-windowed transmission measurements can be exploited to reconstruct quantitatively the X-ray attenuation map of the subject being scanned [2]. The potential benefits are reduction of beam-hardening artifacts, and improved contrast-to-noise ratio (CNR), signal-to-noise ratio (SNR), and quantitative imaging [1, 3, 4, 5]. For photon-counting detectors where the number of energy windows can be three or greater, the new advantage with respect to quantitative imaging is the ability to image contrast agents that possess a K-edge in the diagnostic X-ray energy range [6, 7, 8, 9, 10, 11].

Use of energy information in X-ray CT has been proposed almost since the conception of CT itself [12]. Dual-energy CT acquires transmission intensity at either two energy windows or for two different X-ray source spectra. Despite the extremely coarse energy-resolution, the technique is effective because for many materials only two physical processes, photo-electric effect and Compton scattering, dominate X-ray attenuation in the diagnostic energy range [2]. Within the context of dual-energy, the processing methods of energy-windowed intensity data have been classified in two broad categories: pre-reconstruction and post-reconstruction [13]. The majority of processing methods for multi-window data also fall into these categories.

In pre-reconstruction processing of the multi-energy data, the X-ray attenuation map is expressed as a sum of terms based on physical process or basis materials [2]. The multi-energy data are converted to sinograms of the basis maps, then any image reconstruction technique can be employed to invert these sinograms. The basis maps can subsequently be combined to obtain images of other desired quantities: estimated X-ray attenuation maps at a single energy, material maps, atomic number, or electron density maps. The main advantage of pre-reconstruction processing is that beam-hardening artifacts can be avoided, because consistent sinograms of the basis maps are estimated prior to image reconstruction. Two major challenges for pre-reconstruction methods are the need to calibrate the spectral transmission model and to acquire registered projections. Photon-counting detectors ease the implementation of projection registration, because multiple energy-thresholding circuits can operate on the same detection element signal. Accounting for detection physics and spectral calibration by data pre-processing or incorporation directly in the image reconstruction algorithm remains a challenge for photon-counting detectors [1].

For post-reconstruction processing, the energy-windowed transmission data are processed by the standard negative logarithm to obtain approximate sinograms of a weighted energy-averaged attenuation map followed by standard image reconstruction. The resulting images can be combined to obtain approximate estimates of images of the same physical quantities as mentioned for the pre-reconstruction processing [14]. The advantage of post-reconstruction processing is that it is relatively simple, because it is only a small modification on how standard CT data are processed and there is no requirement of projection registration. The down-side, however, is that the images corresponding to each energy-window are susceptible to beam-hardening artifacts because the negative logarithm processed data will, in general, not be consistent with the projection of any object.

A third option for the processing of spectral CT data, however, does exist, which due to difficulties arising from the nonlinearity of the attenuation of polychromatic X-rays when passing through an object, is much less common than either pre- or post-reconstruction methods: direct estimation of basis maps from energy-windowed transmission data. This approach, labeled the one-step approach in the remainder of the article, has the advantages that the spectral transmission model is treated exactly, there is no need for registered projections, and constraints on the basis maps can be incorporated together with the fitting of the spectral CT data. The main difficulty of the one-step approach is that it necessitates an iterative algorithm because the corresponding transmission data model is too complex for analytic solution, at present. Iterative image reconstruction (IIR) has been applied to spectral CT in order to address the added complexity of the data model [15, 16, 17, 18, 19, 20, 21, 22].

In this work, we develop a framework that addresses one-step image reconstruction in spectral CT allowing for non-smooth convex constraints to be applied to the basis maps. We demonstrate the algorithm with the use of total variation (TV) constraints, but the framework allows for other constraints such as non-negativity, upper bounds, and sum bounds applied to either the basis maps or to a composite image such as an estimated mono-chromatic attenuation map.

We draw upon recent developments in large-scale first-order algorithms and adapt them to incorporate the non-linear model for spectral CT to optimize the data-fidelity of the estimated image by minimizing the discrepancy between the observed and estimated data. We present an algorithm framework for constrained optimization, deriving algorithms for minimizing the data discrepancy based on least-squares fitting and on a transmission Poisson likelihood model. As previously mentioned, the framework admits many convex constraints that can be exploited to stabilize image reconstruction from spectral CT data. Section II presents the constrained optimization for one-step spectral CT image reconstruction; Sec. III presents a convex-concave primal-dual algorithm that addresses the non-convex data discrepancy term arising from the non-linear spectral CT data model; and Sec. IV demonstrates the proposed algorithm with simulated spectral CT transmission data.

II One-step image reconstruction for spectral CT

II-A Spectral CT data model

For the present work, we employ a basic spectral model in which the energy-windowed transmitted X-ray intensity in energy window $w$ along a ray $\ell$ is given by

I_{w\ell}=\int_{E}S_{w\ell}(E)\exp\left[-\int_{t\in\ell}\mu(E,\vec{r}(t))\,\mathsf{d}t\right]\,\mathsf{d}E.

Here $\int_{t\in\ell}$ denotes that we are integrating along the ray $\ell$ while $\int_{E}$ integrates over the range of energy; $S_{w\ell}(E)$ is the product of the X-ray beam spectrum intensity and detector sensitivity for the energy window $w$ and transmission ray $\ell$ at energy $E$; and $\mu(E,\vec{r})$ is the linear X-ray attenuation coefficient for energy $E$ at the spatial location $\vec{r}$. Let $I^{(0)}_{w\ell}$ be the transmitted intensity in the setting where no object is present between the X-ray beam and the detector (i.e. attenuation is set to zero), given by

I^{(0)}_{w\ell}=\int_{E}S_{w\ell}(E)\,\mathsf{d}E;\;\;s_{w\ell}(E)=S_{w\ell}(E)/I^{(0)}_{w\ell}.

Then we can write

I_{w\ell}=I^{(0)}_{w\ell}\int_{E}s_{w\ell}(E)\exp\left[-\int_{t\in\ell}\mu(E,\vec{r}(t))\,\mathsf{d}t\right]\,\mathsf{d}E, (1)

where $s_{w\ell}(E)$ represents the normalized energy distribution of X-ray intensity and detector sensitivity. Image reconstruction for spectral CT aims to recover the complete energy-dependent linear attenuation map $\mu(E,\vec{r})$ from intensity measurements $I_{w\ell}$ in all windows $w$ and rays $\ell$ comprising the X-ray projection data set.

Throughout the article we use the convention that $N_{x}$ is the dimension of the discrete index $x$. For example, the spectral CT data set consists of $N_{w}$ energy windows and $N_{\ell}$ transmission rays.

This inverse problem is simplified by exploiting the fact that the energy-dependence of the X-ray attenuation coefficient can be represented efficiently by a low-dimensional expansion. For the present work, we employ the basis material expansion

\mu(E,\vec{r})=\sum_{m}\mu_{m}(E)f_{m}(\vec{r})=\sum_{m}\left(\frac{\mu_{m}(E)}{\rho_{m}}\right)\rho_{m}f_{m}(\vec{r}), (2)

where $\rho_{m}$ is the density of material $m$; the X-ray mass attenuation coefficients $\mu_{m}(E)/\rho_{m}$ are available from the National Institute of Standards and Technology (NIST) report by Hubbell and Seltzer [23]; and $f_{m}(\vec{r})$ is the fractional density map of material $m$ at location $\vec{r}$. For the present spectral CT image reconstruction problem, we aim to recover $f_{m}(\vec{r})$, which we refer to as the material maps.

Proceeding with the spectral CT model, we discretize the material maps $f_{m}(\vec{r})$ by use of an expansion set

f_{m}(\vec{r})=\sum_{k}^{N_{k}}f_{km}\phi_{k}^{\text{(map)}}(\vec{r}),

where $\phi_{k}^{\text{(map)}}(\vec{r})$ are the representation functions for the material maps. For the 2D/3D image representation, standard pixels/voxels are employed, that is, $k$ indexes the pixels/voxels. With the spatial expansion set, the line integration over the material maps is represented by a matrix $X$ with entry $X_{\ell k}$ measuring the length of the intersection between ray $\ell$ and pixel $k$:

\int_{t\in\ell}\mu(E,\vec{r}(t))\,\mathsf{d}t=\sum_{mk}\mu_{m}(E)X_{\ell k}f_{km},

where formally we can calculate

X_{\ell k}=\int_{t\in\ell}\phi_{k}^{\text{(map)}}(\vec{r})\,\mathsf{d}t.

This integration results in the standard line-intersection method for the pixel/voxel basis.

The discretization of the integration over energy $E$ in Eq. (1) is performed by use of a Riemann sum approximation:

I_{w\ell}=I^{(0)}_{w\ell}\int_{E}s_{w\ell}(E)\exp\left[-\int_{t\in\ell}\mu(E,\vec{r})\,\mathsf{d}t\right]\,\mathsf{d}E\approx I^{(0)}_{w\ell}\sum_{i}\Delta E_{i}s_{w\ell}(E_{i})\exp\left[-\int_{t\in\ell}\mu(E_{i},\vec{r})\,\mathsf{d}t\right]=I^{(0)}_{w\ell}\sum_{i}s_{w\ell i}\exp\left[-\sum_{mk}\mu_{mi}X_{\ell k}f_{km}\right], (3)

where $i$ indexes the discretized energy $E$ and

s_{w\ell i}=\Delta E_{i}s_{w\ell}(E_{i})\text{ and }\mu_{mi}=\mu_{m}(E_{i}).

With the Riemann sum approximation we normalize the discrete window spectra,

\sum_{i}s_{w\ell i}=1.

Modeling photon-counting detection, we express X-ray incident and transmitted spectral fluence in terms of numbers of photons per ray $\ell$ (as before, the ray $\ell$ identifies the source and detector-bin combination) and energy window $w$:

\hat{c}_{w\ell}=N_{w\ell}\sum_{i}s_{w\ell i}\exp\left[-\sum_{mk}\mu_{mi}X_{\ell k}f_{km}\right], (4)

where $N_{w\ell}$ is the incident spectral fluence and $\hat{c}_{w\ell}$ is interpreted as a mean transmitted fluence. Note that in general the right hand side of Eq. (4) evaluates to a non-integer value and as a result the left hand side variable cannot be assigned to an integer as would be implied by reporting transmitted fluence in terms of numbers of photons. This inconsistency is rectified by interpreting the left hand side variable, $\hat{c}_{w\ell}$, as an expected value.
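For readers who want to experiment with the model, the discrete forward model in Eq. (4) reduces to a few matrix operations. The following is a minimal NumPy sketch under assumed array shapes; the function name and storage conventions are ours, not part of the paper:

```python
import numpy as np

def mean_counts(f, X, mu, s, N0):
    """Evaluate Eq. (4): mean transmitted counts for every window w and ray l.

    Assumed shapes (an illustrative convention, not the paper's code):
      f  : (Nk, Nm)     material maps, pixel k by material m
      X  : (Nl, Nk)     line-intersection projection matrix
      mu : (Nm, Ni)     linear attenuation of material m at energy i
      s  : (Nw, Nl, Ni) normalized window spectra, sum_i s[w, l, i] = 1
      N0 : (Nw, Nl)     incident fluence per window and ray
    """
    Zf = X @ f @ mu                      # monochromatic sinograms (Zf)_{l i}
    attenuation = np.exp(-Zf)            # transmitted fraction per ray and energy
    return N0 * np.einsum('wli,li->wl', s, attenuation)
```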

II-B Constrained optimization for one-step basis decomposition

For the purpose of developing one-step image reconstruction of the basis material maps from transmission counts data, we formulate a constrained optimization involving minimization of a non-convex data-discrepancy objective function subject to convex constraints. The optimization problem of interest takes the following form

f^{*}=\operatorname*{arg\,min}_{f}\left\{\sum_{w\ell}D(c_{w\ell},\hat{c}_{w\ell}(f))+\sum_{i}\delta(P_{i})\right\}, (5)

where $c_{w\ell}$ are the measured counts in energy window $w$ and ray $\ell$; $D(\cdot,\cdot)$ is a generic data discrepancy objective function; and the indicator functions $\delta(P_{i})$ enforce the convex constraints $f\in P_{i}$, where the $P_{i}$ are convex sets corresponding to the desired constraints (for instance, nonnegativity of the material maps). The indicator function is defined

\delta(P)=\begin{cases}0&f\in P\\ \infty&f\notin P\end{cases}. (6)

Use of constrained optimization with TV constraints is demonstrated in Sec. IV.

Data discrepancy functions

For the present work, we consider two data discrepancy functions: transmission Poisson likelihood (TPL) and least-squares (LSQ)

D_{\text{TPL}}(c,\hat{c}(f))=\sum_{w\ell}\left[\hat{c}_{w\ell}(f)-c_{w\ell}-c_{w\ell}\log\left(\hat{c}_{w\ell}(f)/c_{w\ell}\right)\right] (7)
D_{\text{LSQ}}(c,\hat{c}(f))=\frac{1}{2}\sum_{w\ell}\left[\log(c_{w\ell})-\log\left(\hat{c}_{w\ell}(f)\right)\right]^{2}. (8)

The TPL data discrepancy function is derived from the negative log likelihood of a stochastic model of the counts data

c_{w\ell}\sim\mathsf{Poisson}\left(\hat{c}_{w\ell}(f)\right),

that is, minimizing $D_{\text{TPL}}$ is equivalent to maximizing the likelihood. Note that in defining $D_{\text{TPL}}$ we have subtracted a term independent of $f$ from the negative log likelihood so that $D_{\text{TPL}}$ is zero when $c=\hat{c}(f)$, and positive otherwise. From a physics perspective, the important difference between these two data discrepancy functions is how they each weight the individual measurements; the LSQ function treats all measurements equally while the TPL function gives greater weight to higher count measurements. We point out this property to emphasize that the TPL data discrepancy can be useful even when there are data inconsistencies due to other physical factors besides the stochastic nature of the counts measurement. This alternate weighting is also achieved without introducing additional parameters as would be the case for a weighted quadratic data discrepancy. From a mathematics perspective, both data functions are convex functions of $\hat{c}_{w\ell}(f)$, but they are non-convex functions of $f$. It is the non-convexity with respect to $f$ that drives the main theoretical and algorithmic development of this work. Although we consider only these two data fidelities, the same methods can be applied to other functions.
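Both data discrepancy functions are straightforward to evaluate from the measured counts $c$ and the model counts $\hat{c}(f)$. A minimal sketch of Eqs. (7) and (8), assuming strictly positive counts arrays of matching shape, is:

```python
import numpy as np

def d_tpl(c, c_hat):
    # Transmission Poisson likelihood discrepancy, Eq. (7);
    # zero when c == c_hat and positive otherwise.
    return np.sum(c_hat - c - c * np.log(c_hat / c))

def d_lsq(c, c_hat):
    # Least-squares discrepancy on the log counts, Eq. (8).
    return 0.5 * np.sum((np.log(c) - np.log(c_hat)) ** 2)
```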

Convex constraints

The present algorithm framework allows for convex constraints that may improve reconstruction of the basis material maps. In Eq. (5) the constraints are coded with indicator functions, but here we express the constraints by the inequalities that define the convex set to which the material maps are restricted. When the basis materials are identical to the materials actually present in the subject, the basis maps can be highly constrained. Physically, the fractional densities represented by each material map must take on a value between zero and one, and the corresponding constraint is

0\leq f_{mk}\leq 1. (9)

Similarly, the sum of the fractional densities cannot be greater than one, leading to a constraint on the sum of material maps

\sum_{m}f_{mk}\leq 1. (10)

Care must be taken, however, in using these bound and sum constraints when the basis materials used for computation are not the same as the materials actually present in the scanned object. In that case the bounds on the material maps and their sum would likely need to be loosened, and therefore they may not be as effective.

In medical imaging, where multiple soft tissues comprise the subject, it is standard to employ a spectral CT materials basis which does not include many of the tissue/density combinations present. The reason for this is that soft tissues such as muscle, fat, brain, blood, etc., all have attenuation curves similar to water, and recovering each of these soft tissues individually becomes an extremely ill-posed inverse problem. For spectral CT, it is common to employ a two-material expansion set, such as bone and water, and possibly a third material for representing contrast agent that has a K-edge in the diagnostic X-ray energy range. The displayed image can then be the basis material maps or the estimated X-ray attenuation map for a single energy $E$, also known as a monochromatic image

f^{\text{(mono)}}_{k}(E)=\sum_{m}\left(\frac{\mu_{m}(E)}{\rho_{m}}\right)\rho_{m}f_{mk}. (11)

A non-negativity constraint can be applied to the monochromatic image

f^{\text{(mono)}}_{k}(E)\geq 0

at one or more energies. This constraint makes physical sense even when the basis materials are not the same as the materials in the subject.
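To make these constraints concrete, the sketch below (our own illustration; the `mu_over_rho_E` and `rho` arrays would come from the NIST tables cited above) forms the monochromatic image of Eq. (11) and applies the Euclidean projection onto the box constraint of Eq. (9), which is how such constraints typically enter a proximal or primal-dual iteration:

```python
import numpy as np

def mono_image(f, mu_over_rho_E, rho):
    """Monochromatic attenuation image at a single energy E, Eq. (11).

    f             : (Nk, Nm) fractional density maps
    mu_over_rho_E : (Nm,)    mass attenuation coefficients at energy E
    rho           : (Nm,)    material densities
    """
    return f @ (mu_over_rho_E * rho)

def project_box(f, lo=0.0, hi=1.0):
    # Euclidean projection onto the box constraint of Eq. (9).
    return np.clip(f, lo, hi)
```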

Finally, we formulate $\ell_{1}$-norm constraints on the gradient magnitude images, also known as the total variation, in order to encourage gradient magnitude sparsity in either the basis material maps or the monochromatic image. In applying TV constraints to the basis material maps, we allow for different constraint values $\gamma_{m}$ for each material

\|f_{m}\|_{\text{TV}}\equiv\left\|\left(|\nabla f_{m}|\right)\right\|_{1}\leq\gamma_{m},

where $\nabla$ represents the finite-differencing approximation to the gradient, and we use $|\cdot|$ to represent a spatial magnitude operator so that $|\nabla f_{m}|$ is the gradient magnitude image (GMI) of material map $m$. Similarly, a TV constraint can be formulated so that it applies to the monochromatic image at energy $E$

\|f^{\text{(mono)}}(E)\|_{\text{TV}}\equiv\left\|\left(|\nabla f^{\text{(mono)}}(E)|\right)\right\|_{1}\leq\gamma_{\text{mono}}(E),

where the constraint can be applied at one or more values of $E$.
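For reference, the TV used in these constraints is the $\ell_{1}$ norm of the gradient-magnitude image; a minimal sketch with forward differences (one of several possible discretizations, and an assumption on our part rather than the paper's exact stencil) is:

```python
import numpy as np

def total_variation(img):
    """l1 norm of the gradient-magnitude image, ||f||_TV, using forward differences."""
    dx = np.diff(img, axis=1, append=img[:, -1:])   # zero gradient past the right edge
    dy = np.diff(img, axis=0, append=img[-1:, :])   # zero gradient past the bottom edge
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))
```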

The constraints involving the TV of the material maps and the monochromatic image are demonstrated specifically in Sec. IV. Many other convex constraints can be incorporated into the presented framework, such as constraints on a generalized TV computed from multiple monochromatic images [24].

III A first-order algorithm for spectral CT constrained optimization

The proposed algorithm derives from the primal-dual algorithm of Chambolle and Pock (CP) [25, 26, 27]. Considering the general constrained optimization form in Eq. (5), the second term coding the convex constraints can be treated in the same way as shown in Refs. [28, 29]. The main algorithmic development, presented here, is the generalization and adaptation of CP's primal-dual algorithm to the minimization of the data discrepancy term, the first term of Eq. (5). We derive the data fidelity steps specifically for $D_{\text{TPL}}$ and $D_{\text{LSQ}}$.

Optimizing the spectral CT data fidelity

We first sketch the main developments of the algorithm for minimizing the non-convex data discrepancy terms, and then explain each step in detail. The overall design of the algorithm comprises two nested iteration loops. The outer iteration loop involves derivation of a convex quadratic upper bound to the local quadratic Taylor expansion about the current estimate for the material maps. The inner iteration loop takes descent steps for the current quadratic upper bound. Although the algorithm construction formally involves two nested iteration loops, in practice the number of inner loop iterations is set to one. Thus, effectively the algorithm consists only of a single iteration loop where a re-expansion of the data discrepancy term is performed at every iteration.

The local convex quadratic upper bound, used to generate descent steps for the non-convex data discrepancy terms, does not fit directly with the generic primal-dual optimization form used by CP. A convex-concave generalization of the CP primal-dual algorithm is needed. The resulting algorithm, called the mirrored convex-concave (MOCCA) algorithm, is presented in detail in Ref. [30]. For the one-step spectral CT image reconstruction algorithm we present: the local convex quadratic upper bound, a short description of MOCCA and its application in the present context, preconditioning, and convergence checks for the spectral CT image reconstruction algorithm.

III-A A local convex quadratic upper bound to the spectral CT data discrepancy terms

III-A1 Quadratic expansion

We carry out the derivations for $D_{\text{LSQ}}$ and $D_{\text{TPL}}$ in parallel. The local quadratic expansion for each of these data discrepancy terms about the material maps $f=f_{0}$ is

L(f)\approx L\left(f_{0}\right)+\left(f-f_{0}\right)^{\top}\nabla_{f}L\left(f_{0}\right)+\frac{1}{2}\left(f-f_{0}\right)^{\top}\nabla^{2}_{f}L\left(f_{0}\right)\left(f-f_{0}\right), (12)
where
L(f)=D(c,\hat{c}(f)).

To obtain the desired expansions, we need expressions for the gradient and Hessian of each data discrepancy. The gradient of $L_{\text{TPL}}(f)$ is derived explicitly in Appendix A; we do not show the details for the other derivations. The data discrepancy gradients are:

\nabla_{f}L_{\text{TPL}}(f)=Z^{\top}A(f)^{\top}r(f), (13)
\nabla_{f}L_{\text{LSQ}}(f)=Z^{\top}A(f)^{\top}r^{\text{(log)}}(f), (14)

where $r$ and $r^{\text{(log)}}$ denote the residuals in terms of counts or log counts:

r_{w\ell}(f)=c_{w\ell}-\hat{c}_{w\ell}(f), (15)
r^{\text{(log)}}_{w\ell}(f)=\log(c_{w\ell})-\log\left(\hat{c}_{w\ell}(f)\right); (16)

$Z$ represents the combined linear transform that accepts material maps, performs projection, and then combines the resulting sinograms to form monochromatic sinograms at energy $E_{i}$:

Z_{\ell i,mk}=\mu_{mi}X_{\ell k}; (17)

and $A(f)$ is a term that results from the gradient of the logarithm of the estimated counts $\log\hat{c}(f)$:

A_{w\ell,\ell^{\prime}i}(f)=\frac{s_{w\ell i}\exp\left[-(Zf)_{\ell i}\right]}{\sum_{i^{\prime}}s_{w\ell i^{\prime}}\exp\left[-(Zf)_{\ell i^{\prime}}\right]}\mathbf{I}_{\ell\ell^{\prime}}, (18)
\mathbf{I}_{\ell\ell^{\prime}}=\begin{cases}0&\ell\neq\ell^{\prime}\\ 1&\ell=\ell^{\prime}\end{cases}.

Using the same variable and linear transform definitions, the expressions for the two Hessians are

\nabla^{2}_{f}L_{\text{TPL}}(f)=-Z^{\top}\operatorname{diag}\left(A(f)^{\top}r(f)\right)Z+Z^{\top}A(f)^{\top}\operatorname{diag}\left(\hat{c}(f)+r(f)\right)A(f)Z, (19)
\nabla^{2}_{f}L_{\text{LSQ}}(f)=-Z^{\top}\operatorname{diag}\left(A(f)^{\top}r^{\text{(log)}}(f)\right)Z+Z^{\top}A(f)^{\top}\operatorname{diag}\left(1+r^{\text{(log)}}(f)\right)A(f)Z. (20)

Substituting either Eq. (19) or (20) for the Hessian and either Eq. (13) or (14) for the gradient into the Taylor expansion in Eq. (12) yields the quadratic approximation to the data discrepancy terms of interest. This quadratic is in general non-convex because both Hessian expressions can be indefinite.
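To illustrate how these expressions are evaluated in practice, the sketch below (reusing the hypothetical array conventions from the earlier `mean_counts` example, not the paper's implementation) computes the spectral weighting of Eq. (18) as an array and applies $Z^{\top}A(f)^{\top}$ to the count residual to obtain the TPL gradient of Eq. (13):

```python
import numpy as np

def tpl_gradient(f, X, mu, s, N0, c):
    """Gradient of the TPL data discrepancy, Eq. (13): Z^T A(f)^T r(f)."""
    Zf = X @ f @ mu                                       # (Nl, Ni) monochromatic sinograms
    weighted = s * np.exp(-Zf)[None, :, :]                # (Nw, Nl, Ni)
    c_hat = N0 * weighted.sum(axis=2)                     # mean counts, Eq. (4)
    A = weighted / weighted.sum(axis=2, keepdims=True)    # Eq. (18), per window and ray
    r = c - c_hat                                         # count residual, Eq. (15)
    v = np.einsum('wli,wl->li', A, r)                     # A(f)^T r, indexed by ray and energy
    return X.T @ v @ mu.T                                 # Z^T applied, shape (Nk, Nm)
```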

III-A2 A local convex upper bound to $L(f)$

The key to deriving a local convex upper bound to the quadratic expansion of $L(f)$ is to split the Hessian expressions into positive and negative components. Setting the negative components to zero and substituting this thresholded Hessian into the Taylor expansion yields a quadratic term with non-negative curvature. (As an aside, a tighter convex local quadratic upper bound would be attained by diagonalizing the Hessian and forming a positive semi-definite Hessian by keeping only the eigenvectors corresponding to non-negative eigenvalues in the eigenvalue decomposition, but for realistically sized tomography configurations such an eigenvalue decomposition is impractical.) The Hessian is split into positive and negative components in the form

\nabla^{2}_{f}L(f)=\nabla^{2}_{+}L(f)-\nabla^{2}_{-}L(f),

where $\nabla^{2}_{+}L(f)$ and $\nabla^{2}_{-}L(f)$ are both positive semidefinite (see Appendix B for the algebraic details of the splitting). The resulting split expressions are:

\nabla^{2}_{+}L_{\text{TPL}}(f)=Z^{\top}\operatorname{diag}\left(A(f)^{\top}r_{-}(f)\right)Z+Z^{\top}A(f)^{\top}\operatorname{diag}\left(\hat{c}(f)-r_{-}(f)\right)A(f)Z, (21)
\nabla^{2}_{-}L_{\text{TPL}}(f)=Z^{\top}\operatorname{diag}\left(A(f)^{\top}r_{+}(f)\right)Z-Z^{\top}A(f)^{\top}\operatorname{diag}\left(r_{+}(f)\right)A(f)Z, (22)

and

\nabla^{2}_{+}L_{\text{LSQ}}(f)=Z^{\top}\operatorname{diag}\left(A(f)^{\top}r_{-}^{\text{(log)}}(f)\right)Z+Z^{\top}A(f)^{\top}\operatorname{diag}\left(1-r_{-}^{\text{(log)}}(f)\right)A(f)Z, (23)
\nabla^{2}_{-}L_{\text{LSQ}}(f)=Z^{\top}\operatorname{diag}\left(A(f)^{\top}r_{+}^{\text{(log)}}(f)\right)Z-Z^{\top}A(f)^{\top}\operatorname{diag}\left(r_{+}^{\text{(log)}}(f)\right)A(f)Z, (24)

where

r(f)=r_{+}(f)-r_{-}(f),\;\;r_{+}(f)=\max\left[r(f),0\right]\text{ and }r_{-}(f)=\max\left[-r(f),0\right],

and similarly

r^{\text{(log)}}(f)=r^{\text{(log)}}_{+}(f)-r^{\text{(log)}}_{-}(f),\;\;r_{+}^{\text{(log)}}(f)=\max\left[r^{\text{(log)}}(f),0\right]\text{ and }r^{\text{(log)}}_{-}(f)=\max\left[-r^{\text{(log)}}(f),0\right].

To summarize, the expression for the convex local upper bound to the quadratic approximation in Eq. (12) is

Q\left(c,f_{0};f\right)=L\left(f_{0}\right)+\left(f-f_{0}\right)^{\top}\nabla_{f}L\left(f_{0}\right)+\frac{1}{2}\left(f-f_{0}\right)^{\top}\nabla^{2}_{+}L\left(f_{0}\right)\left(f-f_{0}\right), (25)

where $\nabla^{2}_{+}$ is used instead of $\nabla^{2}_{f}$ in the quadratic term. Here $Q$ depends parametrically on the counts data $c$, through the function $L$ (see Eq. (12)), and the expansion center $f_{0}$. The gradients of $L$ at $f_{0}$ are obtained from Eqs. (13) and (14), and the Hessian upper bounds are available from Eqs. (21) and (23). Note that the quadratic expression in Eq. (25) is not necessarily an upper bound of the data discrepancy functions, even locally, because we bound only the quadratic expansion. We employ the convex function $Q\left(c,f_{0};f\right)$ combined with convex constraints to generate descent steps for the generic non-convex optimization problem specified in Eq. (5).

III-B The motivation and application of MOCCA

III-B1 Summary of the Chambolle-Pock (CP) primal-dual framework

The generic convex optimization addressed in Ref. [25] is

x^{\star}=\operatorname*{arg\,min}_{x}\left\{F(Kx)+G(x)\right\}, (26)

where $F$ and $G$ are convex, possibly non-smooth, functions and $K$ is a matrix multiplying the vector $x$. The ability to handle non-smooth convex functions is key for addressing the convex constraints of Eq. (5). In the primal-dual picture this minimization is embedded in a larger saddle point problem

\min_{x}\max_{y}\left\{y^{\top}Kx-F^{*}(y)+G(x)\right\}, (27)

using the Legendre transform or convex conjugation

F^{*}(y)=\max_{x}\left\{x^{\top}y-F(x)\right\}, (28)

and the fact that

F(x)=F^{**}(x)=\max_{y}\left\{y^{\top}x-F^{*}(y)\right\} (29)

if $F$ is a convex function. The CP primal-dual algorithm of interest solves Eq. (27) by iterating on the following steps

y^{(n+1)}=\operatorname*{arg\,min}_{y^{\prime}}\left\{F^{*}(y^{\prime})+\frac{1}{2\sigma}\|y^{(n)}+\sigma K\bar{x}^{(n)}-y^{\prime}\|^{2}_{2}\right\} (30)
x^{(n+1)}=\operatorname*{arg\,min}_{x^{\prime}}\left\{G(x^{\prime})+\frac{1}{2\tau}\|x^{(n)}-\tau K^{\top}y^{(n+1)}-x^{\prime}\|^{2}_{2}\right\} (31)
\bar{x}^{(n+1)}=2x^{(n+1)}-x^{(n)}, (32)

where $n$ is the iteration index; $\sigma>0$ and $\tau>0$ are the primal and dual step sizes, respectively, and these step sizes must satisfy the inequality

\sigma\tau<\frac{1}{\|K\|^{2}_{2}},

where $\|K\|_{2}$ is the largest singular value of $K$. Because this algorithm solves the saddle point problem, Eq. (27), one obtains the solution to the primal problem, Eq. (26), along with its Fenchel dual

y^{\star}=\operatorname*{arg\,max}_{y}\left\{-F^{*}(y)-G^{*}(-K^{\top}y)\right\}. (33)

The fact that both Eqs. (26) and (33) are solved simultaneously provides a convergence check: the primal-dual gap, the difference between the objective functions of Eqs. (26) and (33), tends to zero as the iteration number increases.
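As a deliberately simple illustration of the update steps in Eqs. (30)-(32), the sketch below applies them to non-negative least squares, i.e. $F(z)=\tfrac{1}{2}\|z-b\|^{2}$ and $G$ the indicator of $x\geq 0$; this toy instance is our own choice for illustration and is not the spectral CT problem itself:

```python
import numpy as np

def cp_nonneg_lsq(K, b, n_iters=500):
    """Chambolle-Pock iterations, Eqs. (30)-(32), for min_x 0.5*||Kx - b||^2 s.t. x >= 0."""
    sigma = tau = 0.95 / np.linalg.norm(K, 2)      # step sizes with sigma*tau < 1/||K||^2
    x = np.zeros(K.shape[1])
    x_bar = x.copy()
    y = np.zeros(K.shape[0])
    for _ in range(n_iters):
        # Dual step, Eq. (30); here F*(y) = 0.5*||y||^2 + b^T y for F(z) = 0.5*||z - b||^2.
        y = (y + sigma * (K @ x_bar - b)) / (1.0 + sigma)
        # Primal step, Eq. (31); the proximal map of the non-negativity indicator is a clip.
        x_new = np.maximum(x - tau * (K.T @ y), 0.0)
        # Prediction step, Eq. (32).
        x_bar = 2.0 * x_new - x
        x = x_new
    return x
```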

In some settings, the requirement $\sigma\tau<\frac{1}{\|K\|^{2}_{2}}$ may be impractical or too conservative, and the CP algorithm can instead be implemented with diagonal matrices $\Sigma$ and $T$ in place of $\sigma$ and $\tau$ [26], with the condition $\|\Sigma^{1/2}KT^{1/2}\|<1$ and the revised steps

y^{(n+1)}=\operatorname*{arg\,min}_{y^{\prime}}\left\{F^{*}(y^{\prime})+\frac{1}{2}\|y^{(n)}+\Sigma K\bar{x}^{(n)}-y^{\prime}\|^{2}_{\Sigma^{-1}}\right\} (34)
x^{(n+1)}=\operatorname*{arg\,min}_{x^{\prime}}\left\{G(x^{\prime})+\frac{1}{2}\|x^{(n)}-TK^{\top}y^{(n+1)}-x^{\prime}\|^{2}_{T^{-1}}\right\} (35)
\bar{x}^{(n+1)}=2x^{(n+1)}-x^{(n)}, (36)

where for a positive semidefinite matrix $A$ the norm $\|z\|_{A}$ is defined as $\sqrt{z^{\top}Az}$.
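When estimating $\|K\|_{2}$ is impractical, diagonal step sizes can be built from row and column sums of $|K|$. The sketch below uses the standard choice from Ref. [26], $\Sigma_{i}=1/\sum_{j}|K_{ij}|$ and $T_{j}=1/\sum_{i}|K_{ij}|$, which satisfies the step-size condition; we state it here as the usual convention rather than as the paper's exact formula:

```python
import numpy as np

def diagonal_step_sizes(K, eps=1e-12):
    """Diagonal preconditioners for Eqs. (34)-(36): Sigma from row sums, T from column sums of |K|."""
    absK = np.abs(K)
    Sigma = 1.0 / np.maximum(absK.sum(axis=1), eps)   # one entry per dual variable (row of K)
    T = 1.0 / np.maximum(absK.sum(axis=0), eps)       # one entry per primal variable (column of K)
    return Sigma, T
```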

III-B2 The need to generalize the CP primal-dual framework

To apply the CP primal-dual algorithm to $Q$ for fixed $f_{0}$, we need to write Eq. (25) in the form of the objective function in Eq. (26). Manipulating the expression for $Q$ and dropping all terms that are constant with respect to $f$, we obtain

Q\left(c,f_{0};f\right)=\frac{1}{2}f^{\top}K^{\top}(D-E)Kf-f^{\top}K^{\top}b, (37)
K=\left(\begin{array}{c}K_{1}\\ K_{2}\end{array}\right)=\left(\begin{array}{c}A\left(f_{0}\right)Z\\ Z\end{array}\right), (42)
D=\left(\begin{array}{cc}D_{1}&0\\ 0&D_{2}\end{array}\right),\;\;E=\left(\begin{array}{cc}E_{1}&0\\ 0&0\end{array}\right),\;\;b=\left(\begin{array}{c}b_{1}\\ b_{2}\end{array}\right) (49)
D_{1}=\begin{cases}\operatorname{diag}\left[\hat{c}\left(f_{0}\right)\right]&\text{if }L=L_{\text{TPL}}\\ \mathbf{I}&\text{if }L=L_{\text{LSQ}}\end{cases},
D_{2}=\begin{cases}\operatorname{diag}\left[A\left(f_{0}\right)^{\top}r_{-}\left(f_{0}\right)\right]&\text{if }L=L_{\text{TPL}}\\ \operatorname{diag}\left[A\left(f_{0}\right)^{\top}r^{\text{(log)}}_{-}\left(f_{0}\right)\right]&\text{if }L=L_{\text{LSQ}}\end{cases},
E_{1}=\begin{cases}\operatorname{diag}\left[r_{-}\left(f_{0}\right)\right]&\text{if }L=L_{\text{TPL}}\\ \operatorname{diag}\left[r^{\text{(log)}}_{-}\left(f_{0}\right)\right]&\text{if }L=L_{\text{LSQ}}\end{cases},
b_{1}=\begin{cases}(D_{1}-E_{1})K_{1}f_{0}-r\left(f_{0}\right)&\text{if }L=L_{\text{TPL}}\\ (D_{1}-E_{1})K_{1}f_{0}-r^{\text{(log)}}\left(f_{0}\right)&\text{if }L=L_{\text{LSQ}}\end{cases},
b_{2}=D_{2}K_{2}f_{0}.

The matrices $D$ and $E$ are nonnegative and depend on $c$ and $f_{0}$; $b$ is a vector which also depends on $c$ and $f_{0}$; and $K$ is a matrix that depends only on $f_{0}$. Both terms of $Q$ are functions of $Kf$ and accordingly $Q$ is identified with the function $F$ in the objective function of Eq. (26):

Q\left(c,f_{0};f\right)=F_{Q}(Kf),
F_{Q}(z)=\frac{1}{2}z^{\top}(D-E)z-z^{\top}b. (50)

Because $Q$ is a convex function of $f$, $F_{Q}(Kf)$ is convex as a function of $f$. The function $F_{Q}$, however, is not a convex function of $z$. Because $D$ and $E$ are non-negative matrices, $F_{Q}$ is a difference of convex functions of $z$,

F_{Q}(z)=F_{Q+}(z)-F_{Q-}(z),
F_{Q+}(z)=\frac{1}{2}z^{\top}Dz-z^{\top}b,
F_{Q-}(z)=\frac{1}{2}z^{\top}Ez,

where $F_{Q+}(z)$ and $F_{Q-}(z)$ are convex functions of $z$. That $F_{Q}(z)$ is not convex implies that $F_{Q}$ cannot be written as the convex conjugate of $F_{Q}^{*}(y)$, and performing the maximization over $y$ in Eq. (27) no longer yields Eq. (26).

III-B3 Heuristic derivation of MOCCA

To generalize the CP algorithm to allow for the case of interest, we consider the function $F$ to be a convex-concave function

F(z)=F_{+}(z)-F_{-}(z),

where $F_{+}$ and $F_{-}$ are both convex. The heuristic strategy for MOCCA is to employ a convex approximation to $F(z)$ in the neighborhood of a point $z=z_{0}$

F_{\text{convex}}(z_{0};z)=F_{+}(z)-z^{\top}\nabla F_{-}(z_{0}); (51)

(again we drop terms that are constant with respect to $z$). We then execute an iteration of the CP algorithm on the convex function $F_{\text{convex}}(z_{0};z)$, modify the point of convex approximation $z_{0}$, and repeat the iteration. The question then is how to choose $z_{0}$, the center for the convex approximation, in light of the fact that the optimization of $F$ in the CP algorithm happens in the dual space with $F^{*}$, see Eq. (30).

A corresponding primal point to a point in the dual space can be determined by selecting the maximizer of the objective function in the definition of the Legendre transform. Taking the gradient of the objective function in Eq. (29) and setting it to zero yields

x=\nabla_{y}F^{*}(y). (52)

We use this relation to find the expansion point for the primal objective function that mirrors the current value of the dual variables.

Incorporating the convex approximation $F_{\text{convex}}(z_{0};z)$ about the mirrored expansion point $z_{0}$ into the CP algorithm yields the iteration steps for MOCCA

z^{(n+1)}_{0}=\nabla_{y}F_{\text{convex}}^{*}\left(z^{(n)}_{0};y^{(n)}\right)=\Sigma^{-1}(y^{(n-1)}-y^{(n)}+\Sigma K\bar{f}^{(n-1)}) (53)
y^{(n+1)}=\operatorname*{arg\,min}_{y^{\prime}}\left\{F_{\text{convex}}^{*}\left(z^{(n+1)}_{0};y^{\prime}\right)+\frac{1}{2}\|y^{(n)}+\Sigma K\bar{f}^{(n)}-y^{\prime}\|^{2}_{\Sigma^{-1}}\right\} (54)
f^{(n+1)}=\operatorname*{arg\,min}_{f^{\prime}}\left\{G(f^{\prime})+\frac{1}{2}\|f^{(n)}-TK^{\top}y^{(n+1)}-f^{\prime}\|^{2}_{T^{-1}}\right\} (55)
\bar{f}^{(n+1)}=2f^{(n+1)}-f^{(n)}, (56)

where $F_{\text{convex}}^{*}(z_{0};y)$ is the convex conjugate of $F_{\text{convex}}(z_{0};z)$ with respect to the second argument; the first line obtains the mirror expansion point using Eq. (52), where the right hand side expression is found by setting to zero the gradient of the objective function in Eq. (54); the second line makes use of the convex approximation $F_{\text{convex}}$ in the form of its convex conjugate; and the remaining two lines are the same as those of the CP algorithm. For the simulations in this article, all variables are initialized to zero. Convergence of MOCCA, the algorithm specified by Eqs. (53)-(56), is investigated in an accompanying paper [30], which also develops the algorithm for a more general setting.

III-B4 Application of MOCCA to optimization of the spectral CT data fidelity

The MOCCA algorithm handles a fixed convex-concave function $F$, convex function $G$, and linear transform $K$. In order to apply it to the spectral CT data fidelity, we propose to apply MOCCA to the local quadratic expansion in Eq. (37), re-expand the spectral CT data discrepancy at the current estimate of the material maps, and iterate this procedure until convergence. We refer to iterations of the core MOCCA algorithm as "inner" iterations, and the process of iteratively re-expanding the data discrepancy and applying MOCCA constitutes the "outer" iterations. Because MOCCA allows for non-smooth terms, the convex constraints described in Sec. II-B can be incorporated and the inner iterations aim at solving the intermediate problem

f^{*}=\operatorname*{arg\,min}_{f}\left\{\sum_{w\ell}Q\left(c_{w\ell},f_{0};f\right)+\sum_{i}\delta(P_{i})\right\}. (57)

For the remainder of this section, for brevity, we drop the constraints and write the update steps only for the spectral CT data fidelity. The full algorithm with the convex constraints discussed in Sec. II-B can be derived using the methods described in [27], and an algorithm instance with TV constraints on the material maps is covered in Appendix C.

In applying MOCCA to $Q\left(c_{w\ell},f_{0};f\right)$, we use the convex and concave components from $F_{Q}$ in Eq. (50) to form the local convex quadratic expansion needed in MOCCA, see Eq. (51),

F_{Q,\text{convex}}(z)=\frac{1}{2}z^{\top}Dz-(z-z_{0})^{\top}(b+Ez_{0}). (58)

The corresponding dual function

F_{Q,\text{convex}}^{*}(y)=\frac{1}{2}\|y+b+Ez_{0}\|_{D^{-1}}^{2}+z^{\top}_{0}Ez_{0}, (59)

is needed to derive the MOCCA dual update step in Eq. (54). We note that the material maps $f$ enter $Q\left(c_{w\ell},f_{0};f\right)$ only through the linear transformation $Kf$; comparing with the generic optimization problem in Eq. (26), we therefore have $G(f)=0$ for the present case, where only minimization of the data discrepancy is considered.

In using an inner/outer iteration, a basic question is how accurately the inner problem needs to be solved. It turns out that it is sufficient to employ a single inner iteration, so that effectively the proposed algorithm no longer consists of nested iteration loops. Instead, the proposed algorithm performs re-expansion at every iteration:

f_{0}=\bar{f}^{(n)} (60)
\Sigma^{(n)}=|K_{1}(f_{0})|\mathbf{1}/\lambda;\;\;T^{(n)}=\lambda|K_{1}(f_{0})|^{\top}\mathbf{1} (61)
z^{(n+1)}_{0}=(\Sigma^{(n)})^{-1}\left(y^{(n-1)}-y^{(n)}+\Sigma^{(n)}K_{1}(f_{0})\bar{f}^{(n-1)}\right) (62)
y^{(n+1)}=(D_{1}(f_{0})+\Sigma^{(n)})^{-1}\left[D_{1}(f_{0})\left(y^{(n)}+\Sigma^{(n)}K_{1}(f_{0})\bar{f}^{(n)}\right)-\Sigma^{(n)}\left(b_{1}(f_{0})+E_{1}(f_{0})z^{(n+1)}_{0}\right)\right] (63)
f^{(n+1)}=\bar{f}^{(n)}-T^{(n)}K_{1}(f_{0})^{\top}y^{(n+1)} (64)
\bar{f}^{(n+1)}=2f^{(n+1)}-f^{(n)}, (65)

where $f^{(0)}$, $f^{(-1)}$, $\bar{f}^{(0)}$, $y^{(0)}$, and $y^{(-1)}$ are initialized to zero vectors.

Before explaining each line of the one-step spectral CT algorithm specified by Eqs. (60)-(65), we point out important features of the use of re-expansion at every iteration: (1) There are no nested loops. (2) The size of the system of equations is significantly reduced; note that only the first matrix block of $K$, $D$, $E$, and $b$ (see Eq. (37) for their definition) appears in the steps of the algorithm. By re-expanding at every iteration the set of update steps for the second matrix block becomes trivial. (3) Re-expanding at every step is not guaranteed to converge, and an algorithm control parameter $\lambda$ is introduced that balances algorithm convergence rate against possible unstable iteration, see Sec. IV for a demonstration of how $\lambda$ impacts convergence. A similar strategy was used together with the CP algorithm in the use of non-convex image regularity norms, see [28].

The first line of the algorithm, Eq. (60), explicitly assigns the current material maps estimate to the new expansion point. In this way it is clear in the following steps whether $\bar{f}^{(n)}$ enters the equations through the re-expansion center or through the steps of MOCCA. For the spectral CT algorithm it is convenient to use the vector step-sizes $\Sigma^{(n)}$ and $T^{(n)}$, defined in Eq. (61), from the pre-conditioned form of the CP algorithm [26], because the linear transform $K_{1}(f_{0})$ is changing at each iteration as the expansion center changes. Computation of the vector step-sizes only involves single matrix-vector products of $|K_{1}(f_{0})|$ and $|K_{1}(f_{0})|^{\top}$ with a vector of ones, $\mathbf{1}$, as opposed to performing the power method on $K_{1}(f_{0})$ to find the scalar step-sizes $\sigma$ and $\tau$, which would render re-expansion at every iteration impractical. In Eq. (61), the parameter $\lambda$ enters in such a way that the product $\|\Sigma^{1/2}KT^{1/2}\|$ remains constant. For the preconditioned CP algorithm, $\lambda$ defined in this way will not violate the step-size condition. The dual and primal steps in Eqs. (63) and (64), respectively, are obtained by analytic computation of the minimizations in Eqs. (54) and (55) using Eq. (58) and $G(f)=0$, respectively. The primal step at Eq. (64) and the primal variable prediction step at Eq. (65) are identical to the corresponding CP algorithm steps at Eqs. (31) and (32), respectively. The presented algorithm accounts only for the spectral CT data fidelity optimization. For the full algorithm incorporating TV constraints used in the results section, see the pseudocode in Appendix C.
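For orientation, a schematic NumPy sketch of the unconstrained TPL update loop in Eqs. (60)-(65) is given below. It forms $K_{1}$ as a dense matrix, so it is only suitable for tiny toy problems; it omits the convex constraints and $\mu$-preconditioning covered by the pseudocode in Appendix C; and it adopts the standard diagonal-preconditioning convention for the step sizes (an assumption on our part). It should therefore be read as a structural illustration rather than the paper's implementation.

```python
import numpy as np

def one_step_tpl_sketch(f_init, X, mu, s, N0, c, lam=1.0, n_iters=200):
    """Schematic single-loop update, Eqs. (60)-(65), TPL case, without constraints."""
    Nw, Nl = c.shape
    Nk, Nm = f_init.shape
    f = f_init.reshape(-1).astype(float)           # primal variable, flattened (k, m)
    fbar = f.copy()
    fbar_prev = f.copy()
    y = np.zeros(Nw * Nl)                          # dual variable
    y_prev = y.copy()
    for _ in range(n_iters):
        f0 = fbar                                  # Eq. (60): re-expansion center
        Zf0 = X @ f0.reshape(Nk, Nm) @ mu          # (Nl, Ni) monochromatic sinograms
        weighted = s * np.exp(-Zf0)[None, :, :]
        c_hat = N0 * weighted.sum(axis=2)          # Eq. (4)
        A = weighted / weighted.sum(axis=2, keepdims=True)          # Eq. (18)
        # Dense K1 = A(f0) Z, indexed ((w, l), (k, m)).
        K1 = np.einsum('wli,mi,lk->wlkm', A, mu, X).reshape(Nw * Nl, Nk * Nm)
        r = (c - c_hat).reshape(-1)                # Eq. (15)
        D1 = c_hat.reshape(-1)                     # diagonal entries, TPL case
        E1 = np.maximum(-r, 0.0)                   # r_minus
        b1 = (D1 - E1) * (K1 @ f0) - r
        # Eq. (61): diagonal step sizes from |K1| (standard convention, assumed).
        absK1 = np.abs(K1)
        Sigma = lam / np.maximum(absK1.sum(axis=1), 1e-12)
        T = 1.0 / np.maximum(lam * absK1.sum(axis=0), 1e-12)
        # Eq. (62): mirrored expansion point.
        z0 = (y_prev - y) / Sigma + K1 @ fbar_prev
        # Eq. (63): dual update (all matrices diagonal, so elementwise).
        y_new = (D1 * (y + Sigma * (K1 @ fbar)) - Sigma * (b1 + E1 * z0)) / (D1 + Sigma)
        # Eqs. (64)-(65): primal update and prediction step.
        f_new = fbar - T * (K1.T @ y_new)
        fbar_prev = fbar
        fbar = 2.0 * f_new - f
        f = f_new
        y_prev, y = y, y_new
    return f.reshape(Nk, Nm)
```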

III-C One-step algorithm $\mu$-preconditioning

One of the main challenges of spectral CT image reconstruction is the similar dependence of the linear X-ray attenuation curves on energy for different tissues/materials. This causes rows of the attenuation matrix $\mu_{mi}$ to be nearly linearly dependent, or equivalently its condition number is large. There are two effects of the poor conditioning of $\mu_{mi}$: (1) the ability to separate the material maps is highly sensitive to inconsistency in the spectral CT transmission data, and (2) the poor conditioning of $\mu_{mi}$ contributes to the overall poor conditioning of spectral CT image reconstruction, negatively impacting algorithm efficiency. To address the latter issue, we introduce a simple preconditioning step that orthogonalizes the attenuation curves. We call this step "$\mu$"-preconditioning to differentiate it from the preconditioning of the CP algorithm.

To perform $\mu$-preconditioning, we form the matrix

M_{mm^{\prime}}=\sum_{i}\mu_{mi}\mu^{\top}_{im^{\prime}}, (66)

and perform the eigenvalue decomposition

M=U\operatorname{diag}(s)U^{T},

where the eigenvalues are ordered $s_{1}\geq s_{2}\geq\dots\geq s_{N_{m}}$. The singular values of $\mu$ are given by the $\sqrt{s_{i}}$'s and its condition number is $\sqrt{s_{1}/s_{N_{m}}}$. The preconditioning matrix for $\mu$ is given by

P=\operatorname{diag}(\sqrt{s})U^{T}, (67)
P^{-1}=U\operatorname{diag}(1/\sqrt{s}).

Implementation of $\mu$-preconditioning consists of the following steps:

  • Transformation of material maps and attenuation matrix - The appropriate transformation is arrived at through inserting the identity matrix in the form of $P^{-1}P$ into the exponent of the intensity counts data model in Eq. (4):

    \sum_{mk}X_{\ell k}\mu_{mi}f_{km}=\sum_{m\,m^{\prime}m^{\prime\prime}k}X_{\ell k}\mu_{m^{\prime\prime}i}(P^{-1})_{m^{\prime\prime}m}P_{m\,m^{\prime}}f_{km^{\prime}}=\sum_{mk}X_{\ell k}\mu^{\prime}_{mi}f^{\prime}_{km}, (68)

    where

    f^{\prime}_{km}=\sum_{m^{\prime}}P_{mm^{\prime}}f_{km^{\prime}}, (69)
    \mu^{\prime}_{mi}=\sum_{m^{\prime}}\mu_{m^{\prime}i}(P^{-1})_{m^{\prime}m}. (70)
  • Substitution into the one-step algorithm - Substitution of the transformed material maps and attenuation matrix into the one-step algorithm given by Eqs. (60)-(65) is fairly straightforward. All occurrences of $f$ are replaced by $f^{\prime}$, and the linear transform $K_{1}$ is replaced by

    K_{1}^{\prime}=A^{\prime}(f_{0}^{\prime})Z^{\prime},

    where, using Eqs. (17) and (18),

    A^{\prime}_{w\ell,\ell^{\prime}i}(f^{\prime})=\frac{s_{w\ell i}\exp\left[-(Z^{\prime}f^{\prime})_{\ell i}\right]}{\sum_{i^{\prime}}s_{w\ell i^{\prime}}\exp\left[-(Z^{\prime}f^{\prime})_{\ell i^{\prime}}\right]},

    and

    Z^{\prime}=\mu^{\prime}X.

    Using $\mu$-preconditioning, care must be taken in computing the vector stepsizes $\Sigma^{\prime}$ and $T^{\prime}$ in Eq. (61). Without $\mu$-preconditioning, the absolute value symbols are superfluous, because $K_{1}$ has non-negative matrix elements. With $\mu$-preconditioning, the absolute value operation is necessary, because $K^{\prime}_{1}$ may have negative entries through its dependence on $Z^{\prime}$ and in turn $\mu^{\prime}$.

  • Formulation of constraints - The previously discussed constraints are functions of the untransformed material maps. As a result, in using $\mu$-preconditioning where we solve for the transformed material maps, the constraints should be formulated in terms of

    f=P^{-1}f^{\prime}. (71)

    The explicit pseudocode for constrained data-discrepancy minimization using $\mu$-preconditioning is given in Appendix C.

After applying the $\mu$-preconditioned one-step algorithm, the final material maps are obtained through Eq. (71).
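The $\mu$-preconditioning transform only requires a small $N_{m}\times N_{m}$ eigenvalue decomposition. A minimal NumPy sketch of Eqs. (66)-(70), assuming $\mu$ is stored as an (Nm, Ni) array as in the earlier examples, is:

```python
import numpy as np

def mu_preconditioner(mu):
    """Build P and P^{-1} from Eqs. (66)-(67) for the attenuation matrix mu (Nm x Ni)."""
    M = mu @ mu.T                               # Eq. (66), an Nm x Nm Gram matrix
    s, U = np.linalg.eigh(M)                    # eigenvalues returned in ascending order
    s, U = s[::-1], U[:, ::-1]                  # reorder so that s_1 >= ... >= s_Nm
    P = np.diag(np.sqrt(s)) @ U.T               # Eq. (67)
    P_inv = U @ np.diag(1.0 / np.sqrt(s))
    return P, P_inv

# Transformed maps and attenuation matrix, Eqs. (69)-(70), with f stored as (Nk, Nm):
#   f_prime  = f @ P.T          # f'_{km} = sum_{m'} P_{m m'} f_{k m'}
#   mu_prime = P_inv.T @ mu     # mu'_{mi} = sum_{m'} mu_{m' i} (P^{-1})_{m' m}
```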

III-D Convergence checks

Within the present primal-dual framework we employ the primal-dual gap for checking convergence. The primal-dual gap that we seek is the difference between the convex quadratic approximation using the first matrix block in Eq. (58), which is the objective function in the primal minimization

f^{\star}=\operatorname*{arg\,min}_{f}\left\{\frac{1}{2}(K_{1}f)^{\top}D_{1}K_{1}f-(K_{1}f-z_{0})^{\top}(b_{1}+E_{1}z_{0})\right\}, (72)

and the objective function in the Fenchel dual maximization problem

y^{\star}=\operatorname*{arg\,max}_{y}\left\{-\frac{1}{2}\|y+b_{1}+E_{1}z_{0}\|_{D_{1}^{-1}}^{2}-z^{\top}_{0}E_{1}z_{0}\right\}\text{ such that }K_{1}^{\top}y=0. (73)

These problems are derived from the general forms in Eqs. (26) and (33), and the constraint in the dual maximization comes from the fact that $G(f)=0$ in the primal problem, see Sec. 3.1 in [27]. For a convergence check we inspect the difference between these two objective functions. Note that the constant term $z^{\top}_{0}E_{1}z_{0}$ cancels in this subtraction and plays no role in the optimization algorithm, and could thus be left out. If the material maps $f^{(n)}$ attain a stable value, the constraint $K^{\top}_{1}y=0$ is necessarily satisfied from inspection of Eq. (64). When other constraints are included, the estimates of the material maps should be checked against these constraints and the primal-dual gap is modified.
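For completeness, the sketch below (matching the dense-toy-problem conventions of the earlier one-step sketch, taking dual feasibility $K_{1}^{\top}y=0$ for granted, and ignoring the modifications needed when additional constraints are active) evaluates the difference between the objectives of Eqs. (72) and (73) as a convergence check:

```python
import numpy as np

def conditional_pd_gap(f, y, K1, D1, E1, b1, z0):
    """Gap between the primal objective of Eq. (72) and the dual objective of Eq. (73).

    K1 is assumed dense; D1 and E1 are diagonal entries stored as vectors.
    """
    K1f = K1 @ f
    primal = 0.5 * K1f @ (D1 * K1f) - (K1f - z0) @ (b1 + E1 * z0)
    dual = -0.5 * np.sum((y + b1 + E1 * z0) ** 2 / D1) - z0 @ (E1 * z0)
    return primal - dual
```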

IV Results


Figure 1: Normalized spectrum of a typical X-ray source for CT operating at a potential of 120 kV.

We demonstrate use of the one-step spectral CT algorithm on simulated transmission data modeling an ideal photon-counting detector. The X-ray spectrum, shown in Fig. 1, is assumed known. In modeling the ideal detector, the spectral response of an energy-windowed photon count measurement is taken to be the same as that of Fig. 1 between the bounding threshold energies of the window and zero outside. We conduct two studies. The first is focused on demonstrating convergence and application of the one-step algorithm with recovery of material maps for a two-material head phantom using the following minimization problems

\text{TPL-TV:}\;\;\;\operatorname*{arg\,min}_{f}D_{\text{TPL}}(c,\hat{c}(f))\text{ such that }\|f_{m}\|_{\text{TV}}\leq\gamma_{m}\;\forall\,m,

and

\text{LSQ-TV:}\;\;\;\operatorname*{arg\,min}_{f}D_{\text{LSQ}}(c,\hat{c}(f))\text{ such that }\|f_{m}\|_{\text{TV}}\leq\gamma_{m}\;\forall\,m.

The pseudo-code for TPL-TV and LSQ-TV is given explicitly in Appendix C. The second, more realistic study demonstrates application to an anthropomorphic chest phantom containing multiple tissues/materials at multiple densities. For this study we demonstrate one-step image reconstruction of a mono-energetic image at energy $E$ using

\text{TPL-monoTV:}\;\;\;\operatorname*{arg\,min}_{f}D_{\text{TPL}}(c,\hat{c}(f))\text{ such that }\|f^{\text{(mono)}}(E)\|_{\text{TV}}\leq\gamma_{\text{mono}}.

Note that for monoenergetic image reconstruction, the TV constraint is placed on the monoenergetic image, but the optimization is performed over the individual material maps $f_{m}$ and the monoenergetic image is formed after the optimization using Eq. (11).

Aside from the system specification parameters, such as number of views, detector bins, and image dimensions, the algorithm parameters are the TV constraints $\gamma_{m}$ for TPL-TV and LSQ-TV or $\gamma_{\text{mono}}$ for TPL-monoTV and the primal-dual step size ratio $\lambda$. The TV constraints $\gamma_{m}$ or $\gamma_{\text{mono}}$ affect the image regularization, but $\lambda$ is a tuning parameter which does not alter the solution of TPL-TV, LSQ-TV, or TPL-monoTV. It is used to optimize convergence speed of the one-step image reconstruction algorithm.

IV-A Head phantom studies with material map TV-constraints


Figure 2: Bone and brain maps derived from the FORBILD head phantom. Both images are shown in the gray scale window [0.9, 1.1].

For the present studies, we employ a two-material phantom derived from the FORBILD head phantom shown in Fig. 2. The spectral CT transmission counts are computed by use of the discrete-to-discrete model in Eq. (4). The true material maps $f_{k,\text{bone}}$ and $f_{k,\text{brain}}$ are the 256$\times$256 pixel arrays shown in Fig. 2 and the corresponding linear X-ray coefficients $\mu_{\text{bone},i}$ and $\mu_{\text{brain},i}$ are obtained from the NIST tables available in Ref. [23] for energies ranging from 20 to 120 keV in increments of 1 keV. By employing the same data model as that used in the image reconstruction algorithm, we can investigate the convergence properties of the one-step algorithm.


Figure 3: Convergence metrics for LSQ-TV and TPL-TV and for different values of $\lambda$ with ideal, noiseless data. First, second, third, and fourth rows show the conditional primal-dual (cPD) gap, data discrepancy objective function, difference between the TV of the estimated bone map and that of the phantom bone map, and the same for the brain map TV. Note that the expressions for the gap and data discrepancy are different for TPL and LSQ; thus those quantities are not directly comparable.

For the head phantom simulations, the scanning configuration is 2D fan-beam CT with a source to iso-center distance of 50 cm and source to detector distance of 100 cm. The physical size of the phantom pixel array is $20\times 20$ cm$^{2}$. The number of projection views over a full $2\pi$ scan is 128 and the number of detector bins along a linear detector array is 512. This configuration is undersampled by a factor of 4 [31]. Two X-ray energy windows are simulated with a spectral response for each window given by the spectrum shown in Fig. 1 in the energy ranges [20 keV, 70 keV] and [70 keV, 120 keV] for the first and second energy windows, respectively.


Figure 4: Convergence of the material map estimates to the phantom material maps for LSQ-TV and TPL-TV and for different values of $\lambda$ with ideal, noiseless data.
Ideal data study

For ideal, noiseless data several image metrics are plotted in Fig. 3 for different values of $\lambda$, and it is observed that the conditional primal-dual (cPD) gap and data discrepancy tend to zero while the material map TVs converge to the designed values. For this problem the convergence metrics are the cPD gap and material map TVs; the data discrepancy only tends to zero here due to the use of ideal data, and in general when data inconsistency is present the minimum data discrepancy will be greater than zero. The convergence metrics demonstrate convergence of the one-step algorithm for the particular problem under study. It is important, however, to inspect these metrics for each application of the one-step algorithm, because there is no theoretical guarantee of convergence due to the re-expansion step in Eq. (60). From the present results it is clear that progress towards convergence depends on $\lambda$; thus it is important to perform a search over $\lambda$.


(Left: TPL-TV; right: LSQ-TV)

Figure 5: Difference between estimated brain and bone maps after 5,000 iterations and the corresponding phantom map shown in a 1% gray scale window [-0.01, 0.01] for TPL-TV and a 0.1% window [-0.001, 0.001] for LSQ-TV and different values of $\lambda$ with ideal, noiseless data. The difference images are displayed in a region of interest around the sinus bones.

To demonstrate convergence of the material map estimates to the corresponding phantom, we plot the image root-mean-square-error (RMSE) in Fig. 4 as a function of iteration number and show the map differences at the last iteration performed in Fig. 5. The material map estimates are seen to converge to the corresponding phantom maps despite the projection view-angle under-sampling. Thus we note that the material map TV constraints are effective at combating these under-sampling artifacts just as they are for standard CT [32, 33]. The image RMSE curves only give a summary metric for material map convergence, and it is clear from the difference images displayed in a narrow gray scale window that convergence can be spatially non-uniform. For these idealized examples the pre-conditioned one-step algorithm appears to be more effective for LSQ-TV than TPL-TV, as the image RMSE attained for the former is significantly lower than that of the latter. In Fig. 4, the image RMSE curves for LSQ-TV at $\lambda=1\times 10^{2}$ plateau at $\sim 10^{-5}$ due to the fact that the solution of LSQ-TV is achieved to the single precision accuracy of the computation.


Figure 6: Reconstructed bone map by use of TPL-TV from simulated noisy projection spectral CT transmission data. The material map TV constraints are varied according to fractions of the corresponding phantom material map TV.


Figure 7: Reconstructed brain map by use of TPL-TV from simulated noisy projection spectral CT transmission data. The material map TV constraints are varied according to fractions of the corresponding phantom material map TV.
Noisy data study

The noisy simulation parameters are identical to those of the previous noiseless study except that the spectral CT data are generated from the transmission Poisson model. The mean of the transmission measurements is computed by assuming $4\times 10^{6}$ total photons are incident on each detector pixel over the complete X-ray spectrum. As the simulated scan acquires only 128 views, the total X-ray exposure is equivalent to acquiring 512 views at $1\times 10^{6}$ photons per detector pixel.
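As a concrete illustration of this data model, the following minimal Python sketch simulates energy-windowed transmission counts from the spectral forward model written out in Appendix A and then draws Poisson realizations. The array sizes, window spectra `s`, total counts `N`, attenuation coefficients `mu`, and system matrix `X` are random placeholders, not the phantom, spectra, or scan geometry used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes (not the paper's values): n_E energy samples,
# n_win energy windows, n_rays transmission rays, n_mat basis materials,
# n_pix image pixels.
n_E, n_win, n_rays, n_mat, n_pix = 40, 2, 64, 2, 100

# Stand-ins for the spectral model quantities of Appendix A:
# s[w, l, i]  normalized spectral response of window w at ray l and energy i,
# N[w, l]     total incident counts for window w and ray l,
# mu[m, i]    linear attenuation of basis material m at energy i,
# X[l, k]     system (projection) matrix, f[k, m] basis material maps.
s = rng.random((n_win, n_rays, n_E))
s /= s.sum(axis=2, keepdims=True)
N = np.full((n_win, n_rays), 4e6 / n_win)
mu = rng.random((n_mat, n_E))
X = rng.random((n_rays, n_pix)) / n_pix
f = rng.random((n_pix, n_mat))

# Mean counts: c_hat[w, l] = N[w, l] * sum_i s[w, l, i] exp(-(Z f)_{l i}),
# with (Z f)_{l i} = sum_{m, k} mu[m, i] X[l, k] f[k, m].
Zf = np.einsum('mi,lk,km->li', mu, X, f)
c_hat = N * np.einsum('wli,li->wl', s, np.exp(-Zf))

# Noisy data follow the transmission Poisson model.
c = rng.poisson(c_hat)
print(c.shape, float(c_hat.mean()))
```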


Figure 8: Reconstructed bone map by use of LSQ-TV from simulated noisy projection spectral CT transmission data. The material map TV constraints are varied according to fractions of the corresponding phantom material map TV.


Figure 9: Reconstructed brain map by use of LSQ-TV from simulated noisy projection spectral CT transmission data. The material map TV constraints are varied according to fractions of the corresponding phantom material map TV.

We obtain multiple material map reconstructions by varying the TV constraints among values greater than or equal to the actual TVs of the known bone and brain maps. The results for TPL-TV are shown in Figs. 6 and 7, and those for LSQ-TV are shown in Figs. 8 and 9. In all images the bone and brain maps recover the main features of the true phantom maps, and the main difference among the images is the structure of the noise. The noise texture of the recovered brain maps appears patchy for the lower TV constraints and grainy for the larger TV constraints. Also, in comparing the brain maps for TPL-TV in Fig. 7 and LSQ-TV in Fig. 9, streak artifacts are more apparent in the latter, particularly for the larger TV constraint values.
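Because the constraint values are specified relative to the TV of the known phantom maps, a small sketch of how one might compute a discrete TV value for a 2-D map is given below. The paper's exact discrete TV definition appears in an earlier section and is not reproduced here, so the isotropic forward-difference form in this sketch is an assumption.

```python
import numpy as np

def image_tv(img):
    # Isotropic discrete TV: sum over pixels of the gradient magnitude,
    # using forward differences with a zero gradient at the far edges.
    # (An assumed standard form; the paper defines its own discrete TV.)
    dx = np.diff(img, axis=0, append=img[-1:, :])
    dy = np.diff(img, axis=1, append=img[:, -1:])
    return float(np.sqrt(dx**2 + dy**2).sum())

# Hypothetical example: set the bone-map constraint to 1.1x the phantom TV.
phantom_bone = np.zeros((64, 64))
phantom_bone[20:40, 20:40] = 1.0
gamma_bone = 1.1 * image_tv(phantom_bone)
print(gamma_bone)
```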


Figure 10: Same as Fig. 3 except that only one value of $\lambda$ is shown, the results are for noisy data, and the TV constraints for the bone and brain maps are set to $1.1\times\text{TV}_{\text{bone}}$ and $1.1\times\text{TV}_{\text{brain}}$, respectively. These TV constraint settings correspond to the center images in Figs. 6-9.

It is instructive to examine the convergence metrics in Fig. 10 and the image convergence in Fig. 11 for this noisy simulation. The presentation parallels the noiseless results in Figs. 3 and 4, respectively. The differences are that results are shown for a single $\lambda$ and that the image RMSE is shown for two different TV-constraint settings in the present noisy simulations. The cPD gap and TV plots all indicate convergence to the solution for TPL-TV and LSQ-TV. Note, however, that the value of the data discrepancy objective function settles at a positive value, as expected for inconsistent data. The data discrepancy, however, does not provide a check on convergence: if the data discrepancy changes with iteration we do not have convergence, but the converse is not necessarily true. It is also reassuring to observe that the convergence rates for the set values of $\lambda$ are similar between the noiseless and noisy results. This similarity is not affected by the fact that the TV constraints are set to different values in each of these simulations.


Figure 11: Convergence of the material map estimates to the phantom material maps for LSQ-TV and TPL-TV and for noisy data with two different settings of the TV constraints. The TV factor applies to both the bone and brain maps, so that TV factors of 1.1 and 1.2 correspond to the center-left and bottom-left images of Figs. 6-9.

The RMSE comparison of the recovered material maps with the true phantom maps shown in Fig. 11 indicates an average error of less than 1% for the bone map and just under 2% for the brain map (100$\times$ the RMSE values can be interpreted as a percent error because the material maps have a value of 0 or 1). The main purpose of showing these plots is to quantify the difference between the TPL and LSQ data discrepancy terms. We would expect lower values of image RMSE for TPL-TV, because the simulated noise is generated by a transmission Poisson model. Indeed the image RMSE is lower for TPL-TV, and the gap between TPL-TV and LSQ-TV is larger for the looser TV constraints. We point out, however, that lower image RMSE may not translate into better image quality, because image quality depends on the imaging task for which the images are used. Task-based assessment would take into account features of the observed signal, noise texture, and possibly background texture and observer perception [34].

One of the benefits of using TV constraints instead of TV penalties is that the material maps reconstructed using the TPL and LSQ data discrepancy terms can be compared meaningfully. The TV constraint parameters yield material maps with exactly the chosen TVs, whereas with the penalization approach the penalty parameters must be searched to achieve equivalent TVs. Generating simulation results also becomes more efficient, because we can directly make use of the known phantom TV values.

IV-B Chest phantom studies with a mono-energetic image TV constraint


Figure 12: (Left) Chest phantom displayed at 70 keV in a gray scale window of [0, 0.5] cm$^{-1}$. (Right) Reconstruction by use of unregularized TPL. The estimated material maps are combined to form the shown monochromatic image estimate at 70 keV (gray scale is also [0, 1.0] cm$^{-1}$). For reference, the TV values of the phantom and unconstrained reconstructed image are 2,587 and 7,686, respectively.

For the final set of results we employ an anthropomorphic chest phantom created from segmentation of an actual CT chest image. Different tissue types and densities are labeled in the image, totaling 24 material/density combinations, including various soft tissues, calcified/bony regions, and Gadolinium contrast agent. To demonstrate the one-step algorithm on this more realistic phantom model, we select the TPL-monoTV optimization problem in Eq. (IV) for the material map reconstruction. The material basis is selected to be water, bone, and Gadolinium contrast agent. Using TPL-monoTV is simpler than TPL-TV in that only the energy for the mono-energetic image and a single TV constraint parameter are needed, instead of three parameters (the TV for each of the material maps). There are potential advantages to constraining the TV of the material maps individually, but the purpose here is to demonstrate use of the one-step algorithm, and accordingly we select the simpler optimization problem.


Figure 13: Estimated monochromatic images by use of TPL-monoTV. The left column shows the complete image in a gray scale window of [0, 0.5] cm$^{-1}$. The right column magnifies a region of interest (ROI) in the right lung, and the gray scale is narrowed to [0, 0.1] cm$^{-1}$ in order to see the soft tissue detail. The top set of images corresponds to the phantom. The location of the ROI is indicated in the left phantom image inset by use of the narrow [0, 0.1] cm$^{-1}$ gray scale. The second, third, and fourth rows correspond to images obtained with different TV constraints on the monoenergetic image at 70 keV.

For the chest phantom simulations, the scanning configuration is again 2D fan-beam CT, with a source-to-isocenter distance of 80 cm and a source-to-detector distance of 160 cm. The physical size of the phantom pixel array is $29\times 29$ cm$^2$. The number of projection views is 128 over a $2\pi$ scan. Five X-ray energy windows are simulated in the energy ranges [20, 50], [50, 60], [60, 80], [80, 100], and [100, 120] keV. The lowest energy window is selected to be wider than the other four to avoid photon starvation. Noise is added in the same way as in the previous simulation: the transmitted counts data follow a Poisson model with a total of $4\times 10^{6}$ photons per detector pixel. The monoenergetic image of the phantom at 70 keV, along with the unregularized image reconstruction by TPL, is shown in Fig. 12. The TPL mono-energetic image reconstruction demonstrates the impact of the simulated noise on the reconstructed image.

In Fig. 13 we show the resulting monoenergetic images from TPL-monoTV at three values of the TV constraint. The reconstructed images are shown globally in a wide gray scale window and in an ROI focused on the right lung in a narrow gray scale window. The values of the TV constraint are selected based on visualization of the fine structures in the lung; for viewing these features, relatively low values of TV are selected. We note that in the global images the same TV values show the high-contrast structures with few artifacts. We point out that the one-step algorithm yields three basis material maps (not shown) and that the mono-energetic images are formed by use of Eq. (11).
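For readers who want to reproduce this step, a minimal sketch of combining basis material maps into a monochromatic attenuation image is given below. It assumes the linear-combination form of Eq. (11), which is not reproduced here, and the map arrays and attenuation values are illustrative placeholders only.

```python
import numpy as np

def mono_image(maps, mu_E0):
    # maps: (n_mat, ny, nx) basis material maps; mu_E0: length-n_mat linear
    # attenuation coefficients of the basis materials at the display energy.
    # Assumes the monochromatic image is the linear combination
    # sum_m mu_m(E0) * f_m(x), taken here as the form of Eq. (11).
    return np.tensordot(mu_E0, maps, axes=1)

# Hypothetical water/bone/Gd maps and illustrative 70 keV coefficients.
maps = np.random.rand(3, 128, 128)
mu_70keV = np.array([0.19, 0.49, 1.5])   # placeholder values (cm^-1), not calibrated
img70 = mono_image(maps, mu_70keV)
print(img70.shape)
```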

The selected optimization problems and simulation parameters are chosen to demonstrate possible applications of the one-step algorithm for spectral CT. Comparison of the TPL and LSQ data discrepancy terms in Figs. 7 and 9 does show fewer artifacts for TPL-TV, where the simulated noise model matches the TPL likelihood. In practice, we may not see the same relative performance on real data: the simulations ignore some important physical factors of spectral CT, and image quality evaluation depends on the task for which the images are used.

V Conclusion

We have developed a constrained one-step algorithm for inverting spectral CT transmission data directly to basis material maps. The algorithm addresses the associated non-convex data discrepancy terms by employing a local convex quadratic upper bound to derive the descent step. While we have derived the algorithm for TPL and LSQ data discrepancy terms, the same strategy can be applied to derive the one-step algorithm for other data fidelities. The one-step algorithm derives from the convex-concave optimization algorithm, MOCCA, which we have developed for addressing an intermediate problem arising from use of the local convex quadratic approximation. The simulations demonstrate the one-step algorithm for TV-constrained data discrepancy minimization, where the TV constraints can be applied to the individual basis maps or to an estimated monochromatic X-ray attenuation map.

Future work will investigate the robustness of the one-step algorithm to data inconsistency due to spectral miscalibration error, X-ray scatter, and various physical processes involved in photon-counting detection. The one-step algorithm's ability to incorporate basis map constraints in the inversion process should provide a means to control artifacts due to such inconsistencies. We are also pursuing a generalization of the present algorithm to allow for auto-calibration of the spectral response of the CT system.

Acknowledgements

This work was supported by NIH Grants R21EB015094, CA158446, CA182264, and EB018102. The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health.

Appendix A Gradient of $L_{\text{TPL}}$

We derive the gradient in Eq. (13), motivating the definition of the linear transform $A$. Recall Eqs. (4) and (17):

\begin{aligned}
Z_{\ell i,mk} &= \mu_{mi}X_{\ell k}\\
\hat{c}_{w\ell}(f_{km}) &= N_{w\ell}\sum_{i}s_{w\ell i}\exp\left[-\sum_{mk}\mu_{mi}X_{\ell k}f_{km}\right]\\
&= N_{w\ell}\sum_{i}s_{w\ell i}\exp\left[-\sum_{mk}Z_{\ell i,mk}f_{km}\right]\\
&= N_{w\ell}\sum_{i}s_{w\ell i}\exp\left[-(Zf)_{\ell i}\right].
\end{aligned}

The gradient of $L_{\text{TPL}}$ is

\begin{aligned}
\nabla_{f}L_{\text{TPL}}(f) &:= \frac{\partial}{\partial f_{km}}\sum_{w\ell}\left[\hat{c}_{w\ell}(f)-c_{w\ell}-c_{w\ell}\log\left(\hat{c}_{w\ell}(f)/c_{w\ell}\right)\right]\\
&= \frac{\partial}{\partial f_{km}}\sum_{w\ell}\left[\hat{c}_{w\ell}(f_{km})-c_{w\ell}\log\hat{c}_{w\ell}(f_{km})\right]\\
&= \sum_{w\ell}\left[N_{w\ell}\sum_{i}s_{w\ell i}\,(-Z_{\ell i,mk})\exp[-(Zf)_{\ell i}]-c_{w\ell}\frac{\sum_{i}s_{w\ell i}\,(-Z_{\ell i,mk})\exp[-(Zf)_{\ell i}]}{\sum_{i^{\prime}}s_{w\ell i^{\prime}}\exp[-(Zf)_{\ell i^{\prime}}]}\right]\\
&= \sum_{\ell i}Z_{\ell i,mk}\sum_{w}\left[c_{w\ell}\frac{s_{w\ell i}\exp[-(Zf)_{\ell i}]}{\sum_{i^{\prime}}s_{w\ell i^{\prime}}\exp[-(Zf)_{\ell i^{\prime}}]}-N_{w\ell}s_{w\ell i}\exp[-(Zf)_{\ell i}]\right]\\
&= \sum_{\ell i}Z_{\ell i,mk}\sum_{w}\left(c_{w\ell}-\hat{c}_{w\ell}(f)\right)\frac{s_{w\ell i}\exp[-(Zf)_{\ell i}]}{\sum_{i^{\prime}}s_{w\ell i^{\prime}}\exp[-(Zf)_{\ell i^{\prime}}]}.
\end{aligned}

Continuing the algebraic manipulation, we insert the identity $\mathbf{I}_{\ell\ell^{\prime}}$:

\begin{aligned}
\nabla_{f}L_{\text{TPL}}(f) &= \sum_{\ell i}Z_{\ell i,mk}\sum_{w\ell^{\prime}}\mathbf{I}_{\ell\ell^{\prime}}\frac{s_{w\ell^{\prime}i}\exp[-(Zf)_{\ell^{\prime}i}]}{\sum_{i^{\prime}}s_{w\ell^{\prime}i^{\prime}}\exp[-(Zf)_{\ell^{\prime}i^{\prime}}]}\left(c_{w\ell^{\prime}}-\hat{c}_{w\ell^{\prime}}(f)\right)\\
&= \sum_{\ell i}Z_{\ell i,mk}\sum_{w\ell^{\prime}}A_{w\ell^{\prime},\ell i}(f)\,r_{w\ell^{\prime}}(f)=Z^{\top}A(f)^{\top}r(f).
\end{aligned}

The other necessary gradient and Hessian computations follow from similar manipulations.
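As a sanity check on the expression $\nabla_{f}L_{\text{TPL}}(f)=Z^{\top}A(f)^{\top}r(f)$, the short Python sketch below evaluates the final line of the derivation directly and compares one component against a central finite difference of $L_{\text{TPL}}$. All arrays are small random stand-ins for the spectral model quantities, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_E, n_win, n_rays, n_mat, n_pix = 30, 2, 16, 2, 25

# Random stand-ins using the index conventions of Appendix A.
s = rng.random((n_win, n_rays, n_E))
s /= s.sum(axis=2, keepdims=True)
N = np.full((n_win, n_rays), 1e5)
mu = rng.random((n_mat, n_E))
X = rng.random((n_rays, n_pix)) / n_pix
c = rng.random((n_win, n_rays)) * 1e5 + 1.0      # stand-in "measured" counts

def c_hat(f):
    Zf = np.einsum('mi,lk,km->li', mu, X, f)     # (Zf)_{l i}
    return N * np.einsum('wli,li->wl', s, np.exp(-Zf)), Zf

def L_TPL(f):
    ch, _ = c_hat(f)
    return float(np.sum(ch - c - c * np.log(ch / c)))

def grad_L_TPL(f):
    ch, Zf = c_hat(f)
    w_att = s * np.exp(-Zf)[None, :, :]          # s_{w l i} exp[-(Zf)_{l i}]
    A = w_att / w_att.sum(axis=2, keepdims=True) # normalized weights of A(f)
    r = c - ch                                   # residual, as in the final line above
    inner = np.einsum('wl,wli->li', r, A)        # sum over windows w
    return np.einsum('mi,lk,li->km', mu, X, inner)   # Z^T A(f)^T r(f), stored as f[k, m]

f0 = rng.random((n_pix, n_mat))
g = grad_L_TPL(f0)
eps, k, m = 1e-6, 3, 1
fp, fm = f0.copy(), f0.copy()
fp[k, m] += eps
fm[k, m] -= eps
print(g[k, m], (L_TPL(fp) - L_TPL(fm)) / (2 * eps))  # the two numbers should agree closely
```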

Appendix B Positive semidefiniteness of $\nabla^{2}_{+}L(f)$ and $\nabla^{2}_{-}L(f)$

Recall Eq. (18),

A_{w\ell,\ell^{\prime}i}(f)=\frac{s_{w\ell i}\exp\left[-(Zf)_{\ell i}\right]}{\sum_{i^{\prime}}s_{w\ell i^{\prime}}\exp\left[-(Zf)_{\ell i^{\prime}}\right]}\,\mathbf{I}_{\ell^{\prime}\ell}.

For ease of presentation, we collapse the double indices of $A$:

A_{s,t}(f)=A_{w\ell,\ell^{\prime}i}(f)\quad\text{where }s=w\cdot N_{\ell}+\ell,\;\;t=i\cdot N_{\ell}+\ell^{\prime}.

We show that

\operatorname{diag}\left(A(f)^{\top}b\right)-A(f)^{\top}\operatorname{diag}(b)A(f)\geq 0\quad\text{when }b_{s}\geq 0\;\forall\,s. \qquad (74)

This inequality can be used to prove that the Hessians in Eqs. (21), (22), (23), and (24) are positive semidefinite by setting $b$ equal to $r_{-}(f)$, $r_{+}(f)$, $r_{-}^{\text{(log)}}(f)$, and $r_{+}^{\text{(log)}}(f)$, respectively.

To prove Eq. (74), we expand $b$ in unit vectors

b=\sum_{s}b_{s}\hat{e}_{s}\quad\text{where}\quad\hat{e}_{s,s^{\prime}}=\begin{cases}1&s^{\prime}=s\\ 0&s^{\prime}\neq s,\end{cases}

and show for any vector $u$ that

u^{\top}\left[\operatorname{diag}\left(A(f)^{\top}\hat{e}_{s}\right)-A(f)^{\top}\operatorname{diag}(\hat{e}_{s})A(f)\right]u\geq 0. \qquad (75)

Fixing $s$, we define the vector $v$ with components $v_{t}=A(f)_{s,t}$, or, using the unit vector $\hat{e}_{s}$,

v=A^{\top}(f)\,\hat{e}_{s}.

From the definition of $A$, we note that $v_{t}\geq 0$ for all $t\in\{1,\dots,N_{t}\}$ and $\sum_{t}v_{t}=1$. By the definition of $v$, the left-hand side of Eq. (75), lhs, is

\begin{aligned}
\text{lhs} &= u^{\top}\left[\operatorname{diag}(v)-vv^{\top}\right]u\\
&= \sum_{t}v_{t}u_{t}^{2}-\left(\sum_{t}v_{t}u_{t}\right)^{2}\\
&= \sum_{t}v_{t}u_{t}^{2}-\left(\sum_{t}\sqrt{v_{t}}\cdot\sqrt{v_{t}u_{t}^{2}}\right)^{2},
\end{aligned}

and by the Cauchy-Schwarz inequality,

\begin{aligned}
\text{lhs} &\geq \sum_{t}v_{t}u_{t}^{2}-\left(\sum_{t}v_{t}\right)\cdot\left(\sum_{t}v_{t}u_{t}^{2}\right)\\
&= \sum_{t}v_{t}u_{t}^{2}-(1)\cdot\left(\sum_{t}v_{t}u_{t}^{2}\right)\\
&= 0.
\end{aligned}

This proves the inequality in Eq. (75).

Using Eq. (75), we prove the inequality in Eq. (74)

\operatorname{diag}\left(A(f)^{\top}b\right)-A(f)^{\top}\operatorname{diag}(b)A(f)=\sum_{s}b_{s}\left[\operatorname{diag}\left(A(f)^{\top}\hat{e}_{s}\right)-A(f)^{\top}\operatorname{diag}(\hat{e}_{s})A(f)\right]\geq 0, \qquad (76)

where the inequality follows by noting that $b_{s}\geq 0$ by assumption, so the sum is a linear combination of positive semidefinite matrices with non-negative coefficients.
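The inequality can also be checked numerically. The short sketch below builds a random matrix with the two properties of $A(f)$ used in the proof (non-negative entries and rows summing to one) and verifies that the smallest eigenvalue of the matrix in Eq. (74) is non-negative; the matrix sizes and entries are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)
n_s, n_t = 6, 8

# Random stand-in for A(f): non-negative entries with each row summing to one,
# the only properties of A(f) used in the proof above.
A = rng.random((n_s, n_t))
A /= A.sum(axis=1, keepdims=True)
b = rng.random(n_s)                       # b_s >= 0

M = np.diag(A.T @ b) - A.T @ np.diag(b) @ A
print(np.linalg.eigvalsh(M).min())        # expected: >= 0 up to round-off
```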

Appendix C Derivation of the one-step algorithm for TPL-TV and LSQ-TV with $\mu$-preconditioning

TV-constrained optimization

To derive the one-step algorithm used in the Results section, we write down the intermediate convex optimization problem that involves the first block of the local quadratic upper bound to $D_{\text{TPL}}$ or $D_{\text{LSQ}}$:

f^{\star}=\operatorname*{arg\,min}_{f}\left\{\frac{1}{2}(K_{1}f)^{\top}D_{1}K_{1}f-(K_{1}f-z_{0})^{\top}(b_{1}+E_{1}z_{0})\right\}\quad\text{such that}\quad\|f_{m}\|_{\text{TV}}\leq\gamma_{m}\;\forall\,m. \qquad (77)

That only the first block of the full quadratic expression is needed is explained in Sec. III-B4, and the forms of $D_{1}$, $E_{1}$, and $b_{1}$ given in Sec. III-B2 determine whether we are addressing TPL-TV or LSQ-TV. The data discrepancy term of this optimization problem is the same as the objective function of Eq. (58), but it differs from Eq. (58) in that we have added the convex constraints on the material map TV values. We write Eq. (77) using indicator functions (see Eq. (6)) to encode the TV constraints, and we introduce the $\mu$-preconditioning transformation described in Sec. III-C:

f^{\star}=\operatorname*{arg\,min}_{f^{\prime}}\left\{\frac{1}{2}(K^{\prime}_{1}f^{\prime})^{\top}D_{1}K^{\prime}_{1}f^{\prime}-(K^{\prime}_{1}f^{\prime}-z_{0})^{\top}(b_{1}+E_{1}z_{0})+\sum_{m}\delta\left(\left\|\left(P^{-1}f^{\prime}\right)_{m}\right\|_{\text{TV}}\leq\gamma_{m}\right)\right\}, \qquad (78)

where $f^{\prime}=Pf$ are the transformed ($\mu$-preconditioned) material maps from Sec. III-C. Note that the TV constraints apply to the untransformed material maps $f=P^{-1}f^{\prime}$.

Writing constrained TV optimization in the general form $F(Kx)+G(x)$

To derive the CP primal-dual algorithm, we write Eq. (78) in the form of Eq. (26). We note that all of the terms involve a linear transform of $f$, and accordingly we make the following assignments:

\begin{aligned}
F_{\text{convex}}(z_{0};z) &= F_{1}(z_{0};z_{\text{sino}})+F_{2}(z_{\text{grad}})\\
F_{1}(z_{\text{sino}}) &= \frac{1}{2}z^{\top}_{\text{sino}}D_{1}z_{\text{sino}}-(z_{\text{sino}}-z_{0})^{\top}(b_{1}+E_{1}z_{0})\\
F_{2}(z_{\text{grad}}) &= \sum_{m}\delta\left(\left\|\left(|z_{\text{grad},m}|\right)\right\|_{1}\leq\gamma_{m}\right)\\
G(f^{\prime}) &= 0,
\end{aligned}

where

z=\left(\begin{array}{c}z_{\text{sino}}\\ z_{\text{grad}}\end{array}\right)=\left(\begin{array}{c}K^{\prime}_{1}\\ \nabla P^{-1}\end{array}\right)f^{\prime}.

Note that we use the shorthand that the gradient operator, $\nabla$, applies to each of the material maps in the composite material map vector, $P^{-1}f^{\prime}$. The Legendre transform in Eq. (28) provides the necessary dual functions $F^{*}_{1}$, $F^{*}_{2}$, and $G^{*}$. By direct computation,

F^{*}_{1}(y_{\text{sino}})=\frac{1}{2}D_{1}^{-1}\|y_{\text{sino}}+b_{1}+E_{1}z_{0}\|^{2}_{2}+z_{0}^{\top}E_{1}z_{0}. \qquad (79)

From Sec. 3.1 of Ref. [27]

G^{*}(f)=\delta(f=0). \qquad (80)


Figure 14: Schematic illustrating the solution of $\max_{g_{m}^{\prime}}\left\{g_{m}^{\top}g_{m}^{\prime}-\delta\left(\|g^{\prime}_{m}\|_{1}\leq\gamma_{m}\right)\right\}$. The input vector $g_{m}$ and the maximizing vector $g^{\prime}_{m}$ are indicated on a 2D schematic, but the argument applies for the full $N_{k}$-D space of $g_{m}$. Because $g^{\prime}_{m}$ is a vector of magnitudes, each component is non-negative, $g^{\prime}_{m,k}\geq 0$. The indicator function confines $g^{\prime}_{m}$ below the line (hyper-plane) $\sum_{k}g^{\prime}_{m,k}=\gamma_{m}$. The combination of these constraints confines $g^{\prime}_{m}$ to the shaded triangle in the schematic. The maximizer $g^{\prime}_{m}$ is the vector that maximizes the dot product $g_{m}^{\top}g_{m}^{\prime}$ (or, equivalently, the projection of $g^{\prime}_{m}$ onto $g_{m}$, as indicated by the dashed line from the head of $g^{\prime}_{m}$ to the arrow indicating $g_{m}$). Maximization of this dot product is achieved by choosing $g^{\prime}_{m}=\gamma_{m}\hat{e}_{k\text{-max}}$, i.e., aligned along the unit vector corresponding to the largest component of $g_{m}$. The largest component of $g_{m}$ is also known as the "infinity-norm", $\|g_{m}\|_{\infty}$. Thus we have $(\gamma_{m}\hat{e}_{k\text{-max}})^{\top}g_{m}=\gamma_{m}\|g_{m}\|_{\infty}$.
Convex conjugate of $F_{2}$

We sketch the derivation of $F^{*}_{2}(y_{\text{grad}})$; for this derivation we drop the "grad" subscript.

\begin{aligned}
F^{*}_{2}(y) &= \max_{y^{\prime}}\left\{y^{\top}y^{\prime}-\sum_{m}\delta\left(\left\|\left(|y^{\prime}_{m}|\right)\right\|_{1}\leq\gamma_{m}\right)\right\}\\
&= \max_{g^{\prime}}\left\{g^{\top}g^{\prime}-\sum_{m}\delta\left(\left\|g^{\prime}_{m}\right\|_{1}\leq\gamma_{m}\right)\right\}\quad\text{where }g=|y|\text{ and }g^{\prime}=|y^{\prime}|,\\
&= \max_{g^{\prime}}\sum_{m}\left\{g_{m}^{\top}g_{m}^{\prime}-\delta\left(\left\|g^{\prime}_{m}\right\|_{1}\leq\gamma_{m}\right)\right\}\\
&= \sum_{m}\max_{g_{m}^{\prime}}\left\{g_{m}^{\top}g_{m}^{\prime}-\delta\left(\left\|g^{\prime}_{m}\right\|_{1}\leq\gamma_{m}\right)\right\}.
\end{aligned}

The Legendre transform maximization over the variable $y^{\prime}$, dual to the material map gradients, reduces to a maximization over the spatial magnitude $g^{\prime}=|y^{\prime}|$, because the indicator function is independent of the spatial direction of $y^{\prime}$ and the term $y^{\top}y^{\prime}$ is maximized when the spatial direction of $y^{\prime}$ lines up with that of $y$; hence the term $y^{\top}y^{\prime}$ is replaced by $g^{\top}g^{\prime}$, which we explicitly write as a sum over the material index $m$. The maximization and summation order can be switched, because each of the terms in the summation is independent of the others. Evaluation of the maximization over $g^{\prime}_{m}$ can be seen in the diagram shown in Fig. 14. Accordingly, we find

F^{*}_{2}(y_{\text{grad}})=\sum_{m}\gamma_{m}\left\|\left(|y_{\text{grad},m}|\right)\right\|_{\infty}. \qquad (81)
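The closed form in Eq. (81) can also be verified numerically for a single material map. The sketch below maximizes $g^{\top}g^{\prime}$ over the set pictured in Fig. 14 with an off-the-shelf linear-programming solver and compares the result with $\gamma\|g\|_{\infty}$; the vector $g$ and the value of $\gamma$ are arbitrary test inputs.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
g = rng.random(5)        # a vector of magnitudes, g >= 0
gamma = 2.0

# Maximize g^T g' subject to g' >= 0 and sum(g') <= gamma (the shaded region
# of Fig. 14), posed as a linear program by minimizing -g^T g'.
res = linprog(-g, A_ub=np.ones((1, g.size)), b_ub=[gamma], bounds=(0, None))

print(-res.fun, gamma * np.abs(g).max())  # the two values should match
```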
Dual maximization of Eq. (78)

Using Eqs. (33), (79), (80), and (81), we obtain the maximization dual to Eq. (77)

y^{\star}=\operatorname*{arg\,max}_{y}\left\{-\frac{1}{2}D^{-1}_{1}\|y_{\text{sino}}+b_{1}+E_{1}z_{0}\|_{2}^{2}-z^{\top}_{0}E_{1}z_{0}-\sum_{m}\gamma_{m}\left\|\left(|y_{\text{grad},m}|\right)\right\|_{\infty}\right\}\quad\text{such that}\quad\left(\begin{array}{c}K^{\prime}_{1}\\ \nabla P^{-1}\end{array}\right)^{\top}y=0. \qquad (82)

The objective functions of the primal and dual problems, in Eqs. (77) and (82) respectively, are needed to generate the conditional primal-dual gap plots in Fig. 3.

The material map TV proximity step

In order to derive the TPL-TV and LSQ-TV one-step algorithms, we need to solve the proximity minimization in Eq. (54):

y^{(n+1)}=\operatorname*{arg\,min}_{y^{\prime}}\left\{F_{\text{convex}}^{*}\left(z^{(n+1)}_{0};y^{\prime}\right)+\frac{1}{2}\left\|\Sigma_{\text{grad}}^{-1/2}\left(y^{(n)}+(\Sigma_{\text{sino}}K^{\prime}_{1}+\Sigma_{\text{grad}}\nabla P^{-1})\bar{x}^{(n)}-y^{\prime}\right)\right\|^{2}_{2}\right\}.

The proximity problem splits into "sino" and "grad" sub-problems, and the "sino" sub-problem results in Eq. (63). We solve here the "grad" proximity optimization to obtain the pseudocode for TPL-TV and LSQ-TV:

y_{\text{grad}}^{(n+1)}=\operatorname*{arg\,min}_{y_{\text{grad}}^{\prime}}\left\{F_{2}^{*}\left(y_{\text{grad}}^{\prime}\right)+\frac{1}{2}\left\|\Sigma_{\text{grad}}^{-1/2}\left(y^{+}_{\text{grad}}-y_{\text{grad}}^{\prime}\right)\right\|^{2}_{2}\right\}\quad\text{where}\quad y^{+}_{\text{grad}}=y_{\text{grad}}^{(n)}+\Sigma_{\text{grad}}\nabla P^{-1}\bar{x}^{(n)}.

Dropping the "grad" subscript on $y^{\prime}$, we employ the Moreau identity, which relates the proximity optimizations of a function and its dual:

\operatorname*{arg\,min}_{y^{\prime}}\left\{F_{2}^{*}\left(y^{\prime}\right)+\frac{1}{2}\left\|\Sigma_{\text{grad}}^{-1/2}\left(y^{+}_{\text{grad}}-y^{\prime}\right)\right\|^{2}_{2}\right\}=y^{+}_{\text{grad}}-\Sigma_{\text{grad}}^{1/2}\operatorname*{arg\,min}_{y^{\prime}}\left\{F_{2}\left(y^{\prime}\Sigma_{\text{grad}}^{-1/2}\right)+\frac{1}{2}\left\|y^{+}_{\text{grad}}\Sigma_{\text{grad}}^{-1/2}-y^{\prime}\right\|^{2}_{2}\right\}. \qquad (83)

The dual "grad" update separates into the individual material map $m$ components:

y_{\text{grad},m}^{(n+1)}=y^{+}_{\text{grad},m}-\Sigma_{\text{grad},m}^{1/2}\operatorname*{arg\,min}_{y^{\prime}_{m}}\left\{\delta\left(\left\|\left(\left|y^{\prime}_{m}\Sigma_{\text{grad},m}^{-1/2}\right|\right)\right\|_{1}\leq\gamma_{m}\right)+\frac{1}{2}\left\|y^{+}_{\text{grad},m}\Sigma_{\text{grad},m}^{-1/2}-y^{\prime}_{m}\right\|^{2}_{2}\right\}.

To simplify the proximity minimization we set

\begin{aligned}
g^{+}_{m} &= \left|y^{+}_{\text{grad},m}\Sigma_{\text{grad},m}^{-1/2}\right|,\qquad\hat{g}^{+}_{m}=\left(y^{+}_{\text{grad},m}\Sigma_{\text{grad},m}^{-1/2}\right)/g^{+}_{m},\qquad g^{\prime}_{m}=\left|y^{\prime}_{m}\right|,\\
y_{\text{grad},m}^{(n+1)} &= y^{+}_{\text{grad},m}-\Sigma_{\text{grad},m}^{1/2}\,\hat{g}^{+}_{m}\operatorname*{arg\,min}_{g^{\prime}_{m}}\left\{\delta\left(\left\|g^{\prime}_{m}\Sigma_{\text{grad},m}^{-1/2}\right\|_{1}\leq\gamma_{m}\right)+\frac{1}{2}\left\|g^{+}_{m}-g^{\prime}_{m}\right\|^{2}_{2}\right\}.
\end{aligned}

The proximity minimization is a projection of $g^{+}_{m}$ onto a weighted $\ell_{1}$-ball:

\begin{aligned}
y_{\text{grad},m}^{(n+1)} &= y^{+}_{\text{grad},m}-w\,\hat{g}^{+}_{m}\,\text{Proj}\left(g^{+}_{m};\left\{g,\;\|g/w\|_{1}\leq\gamma_{m}\right\}\right)\\
&\quad\text{where }w=\Sigma_{\text{grad},m}^{1/2}.
\end{aligned}

If $g$ is inside the weighted $\ell_{1}$-ball, i.e., $\|g/w\|_{1}\leq\gamma$, the function $\text{Proj}\left(g;\left\{g,\|g/w\|_{1}\leq\gamma\right\}\right)$ returns $g$. If $g$ is outside the weighted $\ell_{1}$-ball, i.e., $\|g/w\|_{1}>\gamma$, there exists an $\alpha_{0}$ such that

\begin{aligned}
\text{Proj}\left(g;\left\{g,\;\|g/w\|_{1}\leq\gamma_{m}\right\}\right) &= \operatorname*{arg\,min}_{g^{\prime}}\left\{\delta\left(\left\|g^{\prime}/w\right\|_{1}\leq\gamma\right)+\frac{1}{2}\left\|g-g^{\prime}\right\|^{2}_{2}\right\}\\
&= \operatorname*{arg\,min}_{g^{\prime}}\left\{\alpha_{0}\left\|g^{\prime}/w\right\|_{1}+\frac{1}{2}\left\|g-g^{\prime}\right\|^{2}_{2}\right\}\\
&= \max\left(g-\alpha_{0}w,0\right).
\end{aligned}

The parameter $\alpha_{0}$ is defined implicitly by

\|\max\left(g-\alpha_{0}w,0\right)\|_{1}=\gamma,

and it can be determined by any standard root finding technique applied to

f(\alpha)=0\quad\text{where}\quad f(\alpha)=\|\max\left(g-\alpha w,0\right)\|_{1}-\gamma,

where the search interval is $\alpha\in[0,\|g/w\|_{\infty}]$.
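A minimal sketch of this root-finding projection is given below, specialized to unit weights ($w=1$), where the formulas above reduce to the standard projection of a non-negative vector onto the $\ell_{1}$-ball; the weighted case follows the same bisection on $\alpha$ with the weighted shrinkage and norm. The tolerance and test vector are arbitrary.

```python
import numpy as np

def project_l1_ball_nonneg(g, gamma, tol=1e-12):
    # Project a non-negative vector g onto {g' >= 0 : ||g'||_1 <= gamma} by
    # bisection on the shrinkage parameter alpha (the equations above with
    # unit weights, w = 1).
    g = np.maximum(g, 0.0)
    if g.sum() <= gamma:
        return g                                  # already inside the ball
    def f_alpha(alpha):                           # ||max(g - alpha, 0)||_1 - gamma
        return np.maximum(g - alpha, 0.0).sum() - gamma
    lo, hi = 0.0, float(g.max())                  # search interval from the text
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_alpha(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return np.maximum(g - 0.5 * (lo + hi), 0.0)

# Arbitrary test vector; after projection the l1 norm equals gamma.
g_test = np.array([0.9, 0.1, 0.5, 0.3])
p = project_l1_ball_nonneg(g_test, gamma=1.0)
print(p, p.sum())
```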

The pseudocode for TPL-TV and LSQ-TV

Having derived the TV constraint proximity step, we are in a position to write the complete pseudocode for the one-step algorithm including the TV constraints. We do employ the $\mu$-preconditioning that orthogonalizes the linear attenuation coefficients, but we drop the prime notation on $f$ and $K$.

\begin{aligned}
f_{0} &= \bar{f}^{(n)}\\
\Sigma^{(n)} &= \left(\begin{array}{c}|K_{1}(f_{0})|\\ |\nabla P^{-1}|\end{array}\right)\mathbf{1}/\lambda;\qquad T^{(n)}=\lambda\left(\begin{array}{c}|K_{1}(f_{0})|\\ |\nabla P^{-1}|\end{array}\right)^{\top}\mathbf{1} \qquad (88)\\
w_{m} &= \left(\Sigma_{\text{grad},m}^{(n)}\right)^{1/2}\;\forall\,m\\
z^{(n+1)}_{0} &= \left(\Sigma_{\text{sino}}^{(n)}\right)^{-1}\left(y_{\text{sino}}^{(n-1)}-y_{\text{sino}}^{(n)}+\Sigma_{\text{sino}}^{(n)}K_{1}(f_{0})\bar{f}^{(n-1)}\right)\\
y_{\text{sino}}^{(n+1)} &= \left(D_{1}(f_{0})+\Sigma_{\text{sino}}^{(n)}\right)^{-1}\left[D_{1}(f_{0})\left(y_{\text{sino}}^{(n)}+\Sigma_{\text{sino}}^{(n)}K_{1}(f_{0})\bar{f}^{(n)}\right)-\Sigma_{\text{sino}}^{(n)}\left(b_{1}(f_{0})+E_{1}(f_{0})z^{(n+1)}_{0}\right)\right]\\
y^{+}_{\text{grad},m} &= y_{\text{grad},m}^{(n)}+\Sigma_{\text{grad},m}^{(n)}\nabla(P^{-1}\bar{f}^{(n)})_{m}\;\forall\,m\\
g^{+}_{m} &= \left|y^{+}_{\text{grad},m}/w_{m}\right|;\qquad\hat{g}^{+}_{m}=\left(y^{+}_{\text{grad},m}/w_{m}\right)/g^{+}_{m}\;\forall\,m\\
y_{\text{grad},m}^{(n+1)} &= y^{+}_{\text{grad},m}-w_{m}\,\hat{g}^{+}_{m}\,\text{Proj}\left(g^{+}_{m};\left\{g,\;\|g/w_{m}\|_{1}\leq\gamma_{m}\right\}\right)\;\forall\,m\\
f^{(n+1)} &= \bar{f}^{(n)}-T^{(n)}\left(\begin{array}{c}K_{1}(f_{0})\\ \nabla P^{-1}\end{array}\right)^{\top}\left(\begin{array}{c}y_{\text{sino}}^{(n+1)}\\ y_{\text{grad}}^{(n+1)}\end{array}\right) \qquad (93)\\
\bar{f}^{(n+1)} &= 2f^{(n+1)}-f^{(n)}
\end{aligned}

The final material maps after $N$ iterations are obtained by applying the inverse preconditioner:

\text{return } P^{-1}f^{(N)}.

For all the results presented in the article, all variables are initialized to zero.

References

  • [1] K. Taguchi and J. S. Iwanczyk, “Vision 20/20: Single photon counting x-ray detectors in medical imaging,” Med. Phys., vol. 40, p. 100901, 2013.
  • [2] R. E. Alvarez and A. Macovski, “Energy-selective reconstructions in X-ray computerised tomography,” Phys. Med. Biol., vol. 21, pp. 733–744, 1976.
  • [3] P. M. Shikhaliev, “Energy-resolved computed tomography: first experimental results,” Phys. Med. Biol., vol. 53, pp. 5595–5613, 2008.
  • [4] T. G. Schmidt, “Optimal “image-based” weighting for energy-resolved CT,” Med. Phys., vol. 36, pp. 3018–3027, 2009.
  • [5] A. M. Alessio and L. R. MacDonald, “Quantitative material characterization from multi-energy photon counting CT,” Med. Phys., vol. 40, p. 031108, 2013.
  • [6] E. Roessl and R. Proksa, “K-edge imaging in x-ray computed tomography using multi-bin photon counting detectors,” Phys. Med. Biol., vol. 52, pp. 4679–4696, 2007.
  • [7] J.-P. Schlomka, E. Roessl, R. Dorscheid, S. Dill, G. Martens, T. Istel, C. Bäumer, C. Herrmann, R. Steadman, G. Zeitler, A. Livne, and R. Proksa, “Experimental feasibility of multi-energy photon-counting K-edge imaging in pre-clinical computed tomography,” Phys. Med. Biol., vol. 53, pp. 4031–4047, 2008.
  • [8] E. Roessl, D. Cormode, B. Brendel, K.-J. Engel, G. Martens, A. Thran, Z. Fayad, and R. Proksa, “Preclinical spectral computed tomography of gold nano-particles,” Nucl. Inst. Meth. A, vol. 648, pp. S259–S264, 2011.
  • [9] D. P. Cormode, E. Roessl, A. Thran, T. Skajaa, R. E. Gordon, J.-P. Schlomka, V. Fuster, E. A. Fisher, W. J. M. Mulder, R. Proksa, and Z. A. Fayad, “Atherosclerotic plaque composition: Analysis with multicolor CT and targeted Gold nanoparticles,” Radiology, vol. 256, pp. 774–782, 2010.
  • [10] E. Roessl, B. Brendel, K.-J. Engel, J.-P. Schlomka, A. Thran, and R. Proksa, “Sensitivity of photon-counting based K-edge imaging in X-ray computed tomography.” IEEE Trans. Med. Imaging, vol. 30, pp. 1678–1690, 2011.
  • [11] C. O. Schirra, E. Roessl, T. Koehler, B. Brendal, A. Thran, D. Pan, M. A. Anastasio, and R. Proksa, “Statistical reconstruction of material decomposed data in spectral CT,” IEEE Trans. Med. Imaging, vol. 32, pp. 1249–1257, 2013.
  • [12] G. N. Hounsfield, “Computerized transverse axial scanning (tomography): Part 1. Description of system,” Brit. J. Radiol., vol. 46, pp. 1016–1022, 1973.
  • [13] C. Maaß, M. Baer, and M. Kachelrieß, “Image-based dual energy CT using optimized precorrection functions: A practical new approach of material decomposition in image domain,” Med. Phys., vol. 36, pp. 3818–3829, 2009.
  • [14] R. A. Brooks, “A quantitative theory of the Hounsfield unit and its application to dual energy scanning.” J. Comp. Assist. Tomography, vol. 1, pp. 487–493, 1977.
  • [15] J. A. Fessler, I. A. Elbakri, P. Sukovic, and N. H. Clinthorne, “Maximum-likelihood dual-energy tomographic image reconstruction,” in Proc. SPIE, vol. 4684, 2002, pp. 38–49.
  • [16] I. A. Elbakri and J. A. Fessler, “Statistical image reconstruction for polyenergetic X-ray computed tomography,” IEEE Trans. Med. Imaging, vol. 21, pp. 89–99, 2002.
  • [17] J. Chung, J. G. Nagy, and I. Sechopoulos, “Numerical algorithms for polyenergetic digital breast tomosynthesis reconstruction,” SIAM J. Imaging Sci., vol. 3, no. 1, pp. 133–152, 2010.
  • [18] C. Cai, T. Rodet, S. Legoupil, and A. Mohammad-Djafari, “A full-spectral Bayesian reconstruction approach based on the material decomposition model applied in dual-energy computed tomography,” Med. Phys., vol. 40, p. 111916, 2013.
  • [19] Z. Ruoqiao, J.-B. Thibault, C.-A. Bouman, K. D. Sauer., and J. Hsieh, “Model-based iterative reconstruction for dual-energy X-ray CT using a joint quadratic likelihood model,” IEEE Trans. Med. Imaging, vol. 33, pp. 117–134, 2014.
  • [20] Y. Long and J. A. Fessler, “Multi-material decomposition using statistical image reconstruction for spectral CT,” IEEE Trans. Med. Imaging, vol. 33, pp. 1614–1626, 2014.
  • [21] A. Sawatsky, Q. Xu, C. O. Schirra, and M. A. Anastasio, “Proximal admm for multi-channel image reconstruction in spectral X-ray CT,” IEEE Trans. Med. Imaging, vol. 33, pp. 1657–1668, 2014.
  • [22] K. Nakada, K. Taguchi, G. S. K. Fung, and K. Armaya, “Joint estimation of tissue types and linear attenuation coefficients for photon counting CT,” Med. Phys., vol. 42, pp. 5329–5341, 2015.
  • [23] J. H. Hubbell and S. M. Seltzer, “Tables of x-ray mass attenuation coefficients and mass energy-absorption coefficients 1 keV to 20 MeV for elements Z= 1 to 92 and 48 additional substances of dosimetric interest,” National Inst. of Standards and Technology-PL, Gaithersburg, MD (United States). Ionizing Radiation Div., Tech. Rep., 1995.
  • [24] D. S. Rigie and P. J. L. Rivière, “Joint reconstruction of multi-channel, spectral CT data via constrained total nuclear variation minimization,” Phys. Med. Biol., vol. 60, pp. 1741–1762, 2015.
  • [25] A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” J. Math. Imag. Vis., vol. 40, pp. 120–145, 2011.
  • [26] T. Pock and A. Chambolle, “Diagonal preconditioning for first order primal-dual algorithms in convex optimization,” in International Conference on Computer Vision (ICCV 2011). Barcelona, Spain: IEEE, 2011, pp. 1762–1769.
  • [27] E. Y. Sidky, J. H. Jørgensen, and X. Pan, “Convex optimization problem prototyping for image reconstruction in computed tomography with the Chambolle-Pock algorithm,” Phys. Med. Biol., vol. 57, pp. 3065–3091, 2012.
  • [28] E. Y. Sidky, R. Chartrand, J. M. Boone, and X. Pan, “Constrained TpV minimization for enhanced exploitation of gradient sparsity: Application to CT image reconstruction,” J. Trans. Engineer. Health Med., vol. 2, p. 1800418, 2014.
  • [29] J. S. Jørgensen and E. Y. Sidky, “How little data is enough? Phase-diagram analysis of sparsity-regularized X-ray CT,” Phil. Trans. Royal Soc. A, vol. 373, p. 20140387, 2015.
  • [30] R. F. Barber and E. Y. Sidky, “MOCCA: mirrored convex/concave optimization for nonconvex composite functions,” 2015, http://arxiv.org/abs/1510.08842.
  • [31] J. S. Jørgensen, E. Y. Sidky, and X. Pan, “Quantifying admissible undersampling for sparsity-exploiting iterative image reconstruction in X-ray CT,” IEEE Trans. Med. Imaging, vol. 32, pp. 460–473, 2013.
  • [32] E. Y. Sidky, C.-M. Kao, and X. Pan, “Accurate image reconstruction from few-views and limited-angle data in divergent-beam CT,” J. X-ray Sci. Tech., vol. 14, pp. 119–139, 2006.
  • [33] E. Y. Sidky and X. Pan, “Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization,” Phys. Med. Biol., vol. 53, pp. 4777–4807, 2008.
  • [34] H. H. Barrett and K. J. Myers, Foundations of Image Science. Hoboken, NJ: John Wiley & Sons, 2004.