
An algorithm for constrained one-step inversion of spectral CT data

Rina Foygel Barber, Emil Y. Sidky, Taly Gilat Schmidt, and Xiaochuan Pan Rina Foygel Barber is with the Department of Statistics, The University of Chicago, 5734 S. University Ave., Chicago, IL 60637, USA. rina@uchicago.eduEmil Y. Sidky and Xiaochuan Pan are with the Department of Radiology, The University of Chicago, 5841 S. Maryland Ave., Chicago, IL 60637, USA. e-mail: {sidky,xpan}@uchicago.eduTaly Gilat Schmidt is with the Department of Biomedical Engineering, Marquette University, Milwaukee, WI 53233, USA. taly.gilat-schmidt@marquette.edu
Abstract

We develop a primal-dual algorithm that allows for one-step inversion of spectral CT transmission photon counts data to a basis map decomposition. The algorithm allows for image constraints to be enforced on the basis maps during the inversion. The derivation of the algorithm makes use of a local upper bounding quadratic approximation to generate descent steps for non-convex spectral CT data discrepancy terms, combined with a new convex-concave optimization algorithm. Convergence of the algorithm is demonstrated on simulated spectral CT data. Simulations with noise and anthropomorphic phantoms show examples of how to employ the constrained one-step algorithm for spectral CT data.

I Introduction

The recent research activity in photon-counting detectors has motivated a resurgence in the investigation of spectral computed tomography (CT). Photon-counting detectors detect individual X-ray quanta and the electronic pulse signal generated by these quanta has a peak amplitude proportional to the photon energy [1]. Thresholding these amplitudes allows for coarse energy resolution of the X-ray photons, and the transmitted flux of X-ray photons can be measured simultaneously in a number of energy windows. Theoretically, the energy-windowed transmission measurements can be exploited to reconstruct quantitatively the X-ray attenuation map of the subject being scanned [2]. The potential benefits are reduction of beam-hardening artifacts, and improved contrast-to-noise ratio (CNR), signal-to-noise ratio (SNR), and quantitative imaging [1, 3, 4, 5]. For photon-counting detectors where the number of energy windows can be three or greater, the new advantage with respect to quantitative imaging is the ability to image contrast agents that possess a K-edge in the diagnostic X-ray energy range [6, 7, 8, 9, 10, 11].

Use of energy information in X-ray CT has been proposed almost since the conception of CT itself [12]. Dual-energy CT acquires transmission intensity at either two energy windows or for two different X-ray source spectra. Despite the extremely coarse energy-resolution, the technique is effective because for many materials only two physical processes, photo-electric effect and Compton scattering, dominate X-ray attenuation in the diagnostic energy range [2]. Within the context of dual-energy, the processing methods of energy-windowed intensity data have been classified in two broad categories: pre-reconstruction and post-reconstruction [13]. The majority of processing methods for multi-window data also fall into these categories.

In pre-reconstruction processing of the multi-energy data, the X-ray attenuation map is expressed as a sum of terms based on physical process or basis materials [2]. The multi-energy data are converted to sinograms of the basis maps, then any image reconstruction technique can be employed to invert these sinograms. The basis maps can subsequently be combined to obtain images of other desired quantities: estimated X-ray attenuation maps at a single energy, material maps, atomic number, or electron density maps. The main advantage of pre-reconstruction processing is that beam-hardening artifacts can be avoided, because consistent sinograms of the basis maps are estimated prior to image reconstruction. Two major challenges for pre-reconstruction methods are the need to calibrate the spectral transmission model and to acquire registered projections. Photon-counting detectors ease the implementation of projection registration, because multiple energy-thresholding circuits can operate on the same detection element signal. Accounting for detection physics and spectral calibration by data pre-processing or incorporation directly in the image reconstruction algorithm remains a challenge for photon-counting detectors [1].

For post-reconstruction processing, the energy-windowed transmission data are processed by the standard negative logarithm to obtain approximate sinograms of a weighted energy-averaged attenuation map followed by standard image reconstruction. The resulting images can be combined to obtain approximate estimates of images of the same physical quantities as mentioned for the pre-reconstruction processing [14]. The advantage of post-reconstruction processing is that it is relatively simple, because it is only a small modification on how standard CT data are processed and there is no requirement of projection registration. The down-side, however, is that the images corresponding to each energy-window are susceptible to beam-hardening artifacts because the negative logarithm processed data will, in general, not be consistent with the projection of any object.

A third option for the processing of spectral CT data, however, does exist, which due to difficulties arising from the nonlinearity of the attenuation of polychromatic X-rays when passing through an object, is much less common than either pre- or post-reconstruction methods: direct estimation of basis maps from energy-windowed transmission data. This approach, labeled the one-step approach in the remainder of the article, has the advantages that the spectral transmission model is treated exactly, there is no need for registered projections, and constraints on the basis maps can be incorporated together with the fitting of the spectral CT data. The main difficulty of the one-step approach is that it necessitates an iterative algorithm because the corresponding transmission data model is too complex for analytic solution, at present. Iterative image reconstruction (IIR) has been applied to spectral CT in order to address the added complexity of the data model [15, 16, 17, 18, 19, 20, 21, 22].

In this work, we develop a framework that addresses one-step image reconstruction in spectral CT allowing for non-smooth convex constraints to be applied to the basis maps. We demonstrate the algorithm with the use of total variation (TV) constraints, but the framework allows for other constraints such as non-negativity, upper bounds, and sum bounds applied to either the basis maps or to a composite image such as an estimated mono-chromatic attenuation map.

We draw upon recent developments in large-scale first-order algorithms and adapt them to incorporate the non-linear model for spectral CT to optimize the data-fidelity of the estimated image by minimizing the discrepancy between the observed and estimated data. We present an algorithm framework for constrained optimization, deriving algorithms for minimizing the data discrepancy based on least-squares fitting and on a transmission Poisson likelihood model. As previously mentioned, the framework admits many convex constraints that can be exploited to stabilize image reconstruction from spectral CT data. Section II presents the constrained optimization for one-step spectral CT image reconstruction; Sec. III presents a convex-concave primal-dual algorithm that addresses the non-convex data discrepancy term arising from the non-linear spectral CT data model; and Sec. IV demonstrates the proposed algorithm with simulated spectral CT transmission data.

II One-step image reconstruction for spectral CT

II-A Spectral CT data model

For the present work, we employ a basic spectral model in which the energy-windowed transmitted X-ray intensity in energy window $w$ along a ray $\ell$ is given by

I_{w\ell}=\int_{E}S_{w\ell}(E)\exp\left[-\int_{t\in\ell}\mu(E,\vec{r}(t))\,\mathsf{d}t\right]\,\mathsf{d}E.

Here $\int_{t\in\ell}$ denotes that we are integrating along the ray $\ell$ while $\int_{E}$ integrates over the range of energy; $S_{w\ell}(E)$ is the product of the X-ray beam spectrum intensity and detector sensitivity for the energy window $w$ and transmission ray $\ell$ at energy $E$; and $\mu(E,\vec{r})$ is the linear X-ray attenuation coefficient for energy $E$ at the spatial location $\vec{r}$. Let $I^{(0)}_{w\ell}$ be the transmitted intensity in the setting where no object is present between the X-ray beam and the detector (i.e. attenuation is set to zero), given by

I^{(0)}_{w\ell}=\int_{E}S_{w\ell}(E)\,\mathsf{d}E;\;\;s_{w\ell}(E)=S_{w\ell}(E)/I^{(0)}_{w\ell}.

Then we can write

I_{w\ell}=I^{(0)}_{w\ell}\int_{E}s_{w\ell}(E)\exp\left[-\int_{t\in\ell}\mu(E,\vec{r}(t))\,\mathsf{d}t\right]\,\mathsf{d}E, (1)

where $s_{w\ell}(E)$ represents the normalized energy distribution of X-ray intensity and detector sensitivity. Image reconstruction for spectral CT aims to recover the complete energy-dependent linear attenuation map $\mu(E,\vec{r})$ from intensity measurements $I_{w\ell}$ in all windows $w$ and rays $\ell$ comprising the X-ray projection data set.

Throughout the article we use the convention that $N_{x}$ is the dimension of the discrete index $x$. For example, the spectral CT data set consists of $N_{w}$ energy windows and $N_{\ell}$ transmission rays.

This inverse problem is simplified by exploiting the fact that the energy-dependence of the X-ray attenuation coefficient can be represented efficiently by a low-dimensional expansion. For the present work, we employ the basis material expansion

\mu(E,\vec{r})=\sum_{m}\mu_{m}(E)f_{m}(\vec{r})=\sum_{m}\left(\frac{\mu_{m}(E)}{\rho_{m}}\right)\rho_{m}f_{m}(\vec{r}), (2)

where $\rho_{m}$ is the density of material $m$; the X-ray mass attenuation coefficients $\mu_{m}(E)/\rho_{m}$ are available from the National Institute of Standards and Technology (NIST) report by Hubbell and Seltzer [23]; and $f_{m}(\vec{r})$ is the fractional density map of material $m$ at location $\vec{r}$. For the present spectral CT image reconstruction problem, we aim to recover $f_{m}(\vec{r})$, which we refer to as the material maps.

Proceeding with the spectral CT model, we discretize the material maps $f_{m}(\vec{r})$ by use of an expansion set

f_{m}(\vec{r})=\sum_{k}^{N_{k}}f_{km}\phi_{k}^{\text{(map)}}(\vec{r}),

where $\phi_{k}^{\text{(map)}}(\vec{r})$ are the representation functions for the material maps. For the 2D/3D image representation, standard pixels/voxels are employed, that is, $k$ indexes the pixels/voxels. With the spatial expansion set, the line integration over the material maps is represented by a matrix $X$ with entry $X_{\ell k}$ measuring the length of the intersection between ray $\ell$ and pixel $k$:

\int_{t\in\ell}\mu(E,\vec{r}(t))\,\mathsf{d}t=\sum_{mk}\mu_{m}(E)X_{\ell k}f_{km},

where formally we can calculate

X_{\ell k}=\int_{t\in\ell}\phi_{k}^{\text{(map)}}(\vec{r})\,\mathsf{d}t.

This integration results in the standard line-intersection method for the pixel/voxel basis.

The discretization of the integration over energy $E$ in Eq. (1) is performed by use of a Riemann sum approximation:

I_{w\ell}=I^{(0)}_{w\ell}\int_{E}s_{w\ell}(E)\exp\left[-\int_{t\in\ell}\mu(E,\vec{r})\,\mathsf{d}t\right]\,\mathsf{d}E\approx I^{(0)}_{w\ell}\sum_{i}\Delta E_{i}s_{w\ell}(E_{i})\exp\left[-\int_{t\in\ell}\mu(E_{i},\vec{r})\,\mathsf{d}t\right]=I^{(0)}_{w\ell}\sum_{i}s_{w\ell i}\exp\left[-\sum_{mk}\mu_{mi}X_{\ell k}f_{km}\right], (3)

where $i$ indexes the discretized energy $E$ and

s_{w\ell i}=\Delta E_{i}s_{w\ell}(E_{i})\text{ and }\mu_{mi}=\mu_{m}(E_{i}).

With the Riemann sum approximation we normalize the discrete window spectra,

\sum_{i}s_{w\ell i}=1.

Modeling photon-counting detection, we express X-ray incident and transmitted spectral fluence in terms of numbers of photons per ray $\ell$ (as before, the ray $\ell$ identifies the source and detector-bin combination) and energy window $w$:

\hat{c}_{w\ell}=N_{w\ell}\sum_{i}s_{w\ell i}\exp\left[-\sum_{mk}\mu_{mi}X_{\ell k}f_{km}\right], (4)

where $N_{w\ell}$ is the incident spectral fluence and $\hat{c}_{w\ell}$ is interpreted as a mean transmitted fluence. Note that in general the right hand side of Eq. (4) evaluates to a non-integer value and as a result the left hand side variable cannot be assigned to an integer as would be implied by reporting transmitted fluence in terms of numbers of photons. This inconsistency is rectified by interpreting the left hand side variable, $\hat{c}_{w\ell}$, as an expected value.
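For readers who want to experiment with the model, the discrete forward model in Eq. (4) reduces to a few matrix operations. The following is a minimal NumPy sketch under assumed array shapes; the function name and storage conventions are ours, not part of the paper:

```python
import numpy as np

def mean_counts(f, X, mu, s, N0):
    """Evaluate Eq. (4): mean transmitted counts for every window w and ray l.

    Assumed shapes (an illustrative convention, not the paper's code):
      f  : (Nk, Nm)     material maps, pixel k by material m
      X  : (Nl, Nk)     line-intersection projection matrix
      mu : (Nm, Ni)     linear attenuation of material m at energy i
      s  : (Nw, Nl, Ni) normalized window spectra, sum_i s[w, l, i] = 1
      N0 : (Nw, Nl)     incident fluence per window and ray
    """
    Zf = X @ f @ mu                      # monochromatic sinograms (Zf)_{l i}
    attenuation = np.exp(-Zf)            # transmitted fraction per ray and energy
    return N0 * np.einsum('wli,li->wl', s, attenuation)
```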

II-B Constrained optimization for one-step basis decomposition

For the purpose of developing one-step image reconstruction of the basis material maps from transmission counts data, we formulate a constrained optimization involving minimization of a non-convex data-discrepancy objective function subject to convex constraints. The optimization problem of interest takes the following form

f^{*}=\operatorname*{arg\,min}_{f}\left\{\sum_{w\ell}D(c_{w\ell},\hat{c}_{w\ell}(f))+\sum_{i}\delta(P_{i})\right\}, (5)

where $c_{w\ell}$ are the measured counts in energy window $w$ and ray $\ell$; $D(\cdot,\cdot)$ is a generic data discrepancy objective function; and the indicator functions $\delta(P_{i})$ enforce the convex constraints $f\in P_{i}$, where the $P_{i}$ are convex sets corresponding to the desired constraints (for instance, nonnegativity of the material maps). The indicator function is defined

\delta(P)=\begin{cases}0&f\in P\\ \infty&f\notin P\end{cases}. (6)

Use of constrained optimization with TV constraints is demonstrated in Sec. IV.

Data discrepancy functions

For the present work, we consider two data discrepancy functions: transmission Poisson likelihood (TPL) and least-squares (LSQ)

D_{\text{TPL}}(c,\hat{c}(f))=\sum_{w\ell}\left[\hat{c}_{w\ell}(f)-c_{w\ell}-c_{w\ell}\log\left(\hat{c}_{w\ell}(f)/c_{w\ell}\right)\right] (7)
D_{\text{LSQ}}(c,\hat{c}(f))=\frac{1}{2}\sum_{w\ell}\left[\log(c_{w\ell})-\log\left(\hat{c}_{w\ell}(f)\right)\right]^{2}. (8)

The TPL data discrepancy function is derived from the negative log likelihood of a stochastic model of the counts data

c_{w\ell}\sim\mathsf{Poisson}\left(\hat{c}_{w\ell}(f)\right),

that is, minimizing $D_{\text{TPL}}$ is equivalent to maximizing the likelihood. Note that in defining $D_{\text{TPL}}$ we have subtracted a term independent of $f$ from the negative log likelihood so that $D_{\text{TPL}}$ is zero when $c=\hat{c}(f)$, and positive otherwise. From a physics perspective, the important difference between these two data discrepancy functions is how they each weight the individual measurements; the LSQ function treats all measurements equally while the TPL function gives greater weight to higher count measurements. We point out this property to emphasize that the TPL data discrepancy can be useful even when there are data inconsistencies due to other physical factors besides the stochastic nature of the counts measurement. This alternate weighting is also achieved without introducing additional parameters as would be the case for a weighted quadratic data discrepancy. From a mathematics perspective, both data functions are convex functions of $\hat{c}_{w\ell}(f)$, but they are non-convex functions of $f$. It is the non-convexity with respect to $f$ that drives the main theoretical and algorithmic development of this work. Although we consider only these two data fidelities, the same methods can be applied to other functions.
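Both data discrepancy functions are straightforward to evaluate from the measured counts $c$ and the model counts $\hat{c}(f)$. A minimal sketch of Eqs. (7) and (8), assuming strictly positive counts arrays of matching shape, is:

```python
import numpy as np

def d_tpl(c, c_hat):
    # Transmission Poisson likelihood discrepancy, Eq. (7);
    # zero when c == c_hat and positive otherwise.
    return np.sum(c_hat - c - c * np.log(c_hat / c))

def d_lsq(c, c_hat):
    # Least-squares discrepancy on the log counts, Eq. (8).
    return 0.5 * np.sum((np.log(c) - np.log(c_hat)) ** 2)
```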

Convex constraints

The present algorithm framework allows for convex constraints that may improve reconstruction of the basis material maps. In Eq. (5) the constraints are coded with indicator functions, but here we express the constraints by the inequalities that define the convex set to which the material maps are restricted. When the basis materials are identical to the materials actually present in the subject, the basis maps can be highly constrained. Physically, the fractional densities represented by each material map must take on a value between zero and one, and the corresponding constraint is

0\leq f_{mk}\leq 1. (9)

Similarly, the sum of the fractional densities cannot be greater than one, leading to a constraint on the sum of material maps

\sum_{m}f_{mk}\leq 1. (10)

Care must be taken, however, in using these bound and sum constraints when the basis materials used for computation are not the same as the materials actually present in the scanned object. In that case the bounds on the material maps and their sum would likely need to be loosened, and therefore they may not be as effective.

In medical imaging, where multiple soft tissues comprise the subject, it is standard to employ a spectral CT materials basis which does not include many of the tissue/density combinations present. The reason for this is that soft tissues such as muscle, fat, brain, blood, etc., all have attenuation curves similar to water, and recovering each of these soft tissues individually becomes an extremely ill-posed inverse problem. For spectral CT, it is common to employ a two-material expansion set, such as bone and water, and possibly a third material for representing contrast agent that has a K-edge in the diagnostic X-ray energy range. The displayed image can then be the basis material maps or the estimated X-ray attenuation map for a single energy $E$, also known as a monochromatic image

f^{\text{(mono)}}_{k}(E)=\sum_{m}\left(\frac{\mu_{m}(E)}{\rho_{m}}\right)\rho_{m}f_{mk}. (11)

A non-negativity constraint can be applied to the monochromatic image

f^{\text{(mono)}}_{k}(E)\geq 0

at one or more energies. This constraint makes physical sense even when the basis materials are not the same as the materials in the subject.
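To make these constraints concrete, the sketch below (our own illustration; the `mu_over_rho_E` and `rho` arrays would come from the NIST tables cited above) forms the monochromatic image of Eq. (11) and applies the Euclidean projection onto the box constraint of Eq. (9), which is how such constraints typically enter a proximal or primal-dual iteration:

```python
import numpy as np

def mono_image(f, mu_over_rho_E, rho):
    """Monochromatic attenuation image at a single energy E, Eq. (11).

    f             : (Nk, Nm) fractional density maps
    mu_over_rho_E : (Nm,)    mass attenuation coefficients at energy E
    rho           : (Nm,)    material densities
    """
    return f @ (mu_over_rho_E * rho)

def project_box(f, lo=0.0, hi=1.0):
    # Euclidean projection onto the box constraint of Eq. (9).
    return np.clip(f, lo, hi)
```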

Finally, we formulate $\ell_{1}$-norm constraints on the gradient magnitude images, also known as the total variation, in order to encourage gradient magnitude sparsity in either the basis material maps or the monochromatic image. In applying TV constraints to the basis material maps, we allow for different constraint values $\gamma_{m}$ for each material

\|f_{m}\|_{\text{TV}}\equiv\left\|\left(|\nabla f_{m}|\right)\right\|_{1}\leq\gamma_{m},

where $\nabla$ represents the finite-differencing approximation to the gradient, and we use $|\cdot|$ to represent a spatial magnitude operator so that $|\nabla f_{m}|$ is the gradient magnitude image (GMI) of material map $m$. Similarly, a TV constraint can be formulated so that it applies to the monochromatic image at energy $E$

\|f^{\text{(mono)}}(E)\|_{\text{TV}}\equiv\left\|\left(|\nabla f^{\text{(mono)}}(E)|\right)\right\|_{1}\leq\gamma_{\text{mono}}(E),

where the constraint can be applied at one or more values of $E$.
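For reference, the TV used in these constraints is the $\ell_{1}$ norm of the gradient-magnitude image; a minimal sketch with forward differences (one of several possible discretizations, and an assumption on our part rather than the paper's exact stencil) is:

```python
import numpy as np

def total_variation(img):
    """l1 norm of the gradient-magnitude image, ||f||_TV, using forward differences."""
    dx = np.diff(img, axis=1, append=img[:, -1:])   # zero gradient past the right edge
    dy = np.diff(img, axis=0, append=img[-1:, :])   # zero gradient past the bottom edge
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))
```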

The constraints involving the TV of the material maps and the monochromatic image are demonstrated specifically in Sec. IV. Many other convex constraints can be incorporated into the presented framework, such as constraints on a generalized TV computed from multiple monochromatic images [24].

III A first-order algorithm for spectral CT constrained optimization

The proposed algorithm derives from the primal-dual algorithm of Chambolle and Pock (CP) [25, 26, 27]. Considering the general constrained optimization form in Eq. (5), the second term coding the convex constraints can be treated in the same way as shown in Refs. [28, 29]. The main algorithmic development, presented here, is the generalization and adaptation of CP's primal-dual algorithm to the minimization of the data discrepancy term, the first term of Eq. (5). We derive the data fidelity steps specifically for $D_{\text{TPL}}$ and $D_{\text{LSQ}}$.

Optimizing the spectral CT data fidelity

We first sketch the main developments of the algorithm for minimizing the non-convex data discrepancy terms, and then explain each step in detail. The overall design of the algorithm comprises two nested iteration loops. The outer iteration loop involves derivation of a convex quadratic upper bound to the local quadratic Taylor expansion about the current estimate for the material maps. The inner iteration loop takes descent steps for the current quadratic upper bound. Although the algorithm construction formally involves two nested iteration loops, in practice the number of inner loop iterations is set to one. Thus, effectively the algorithm consists only of a single iteration loop where a re-expansion of the data discrepancy term is performed at every iteration.

The local convex quadratic upper bound, used to generate descent steps for the non-convex data discrepancy terms, does not fit directly with the generic primal-dual optimization form used by CP. A convex-concave generalization of the CP primal-dual algorithm is needed. The resulting algorithm, called the mirrored convex-concave (MOCCA) algorithm, is presented in detail in Ref. [30]. For the one-step spectral CT image reconstruction algorithm we present: the local convex quadratic upper bound, a short description of MOCCA and its application in the present context, preconditioning, and convergence checks for the spectral CT image reconstruction algorithm.

III-A A local convex quadratic upper bound to the spectral CT data discrepancy terms

III-A1 Quadratic expansion

We carry out the derivations for $D_{\text{LSQ}}$ and $D_{\text{TPL}}$ in parallel. The local quadratic expansion for each of these data discrepancy terms about the material maps $f=f_{0}$ is

L(f)\approx L\left(f_{0}\right)+\left(f-f_{0}\right)^{\top}\nabla_{f}L\left(f_{0}\right)+\frac{1}{2}\left(f-f_{0}\right)^{\top}\nabla^{2}_{f}L\left(f_{0}\right)\left(f-f_{0}\right), (12)
where
L(f)=D(c,\hat{c}(f)).

To obtain the desired expansions, we need expressions for the gradient and Hessian of each data discrepancy. The gradient of $L_{\text{TPL}}(f)$ is derived explicitly in Appendix A; we do not show the details for the other derivations. The data discrepancy gradients are:

\nabla_{f}L_{\text{TPL}}(f)=Z^{\top}A(f)^{\top}r(f), (13)
\nabla_{f}L_{\text{LSQ}}(f)=Z^{\top}A(f)^{\top}r^{\text{(log)}}(f), (14)

where $r$ and $r^{\text{(log)}}$ denote the residuals in terms of counts or log counts:

r_{w\ell}(f)=c_{w\ell}-\hat{c}_{w\ell}(f), (15)
r^{\text{(log)}}_{w\ell}(f)=\log(c_{w\ell})-\log\left(\hat{c}_{w\ell}(f)\right); (16)

$Z$ represents the combined linear transform that accepts material maps, performs projection, and then combines the resulting sinograms to form monochromatic sinograms at energy $E_{i}$:

Z_{\ell i,mk}=\mu_{mi}X_{\ell k}; (17)

and $A(f)$ is a term that results from the gradient of the logarithm of the estimated counts $\log\hat{c}(f)$:

A_{w\ell,\ell^{\prime}i}(f)=\frac{s_{w\ell i}\exp\left[-(Zf)_{\ell i}\right]}{\sum_{i^{\prime}}s_{w\ell i^{\prime}}\exp\left[-(Zf)_{\ell i^{\prime}}\right]}\mathbf{I}_{\ell\ell^{\prime}}, (18)
\mathbf{I}_{\ell\ell^{\prime}}=\begin{cases}0&\ell\neq\ell^{\prime}\\ 1&\ell=\ell^{\prime}\end{cases}.

Using the same variable and linear transform definitions, the expressions for the two Hessians are

\nabla^{2}_{f}L_{\text{TPL}}(f)=-Z^{\top}\operatorname{diag}\left(A(f)^{\top}r(f)\right)Z+Z^{\top}A(f)^{\top}\operatorname{diag}\left(\hat{c}(f)+r(f)\right)A(f)Z, (19)
\nabla^{2}_{f}L_{\text{LSQ}}(f)=-Z^{\top}\operatorname{diag}\left(A(f)^{\top}r^{\text{(log)}}(f)\right)Z+Z^{\top}A(f)^{\top}\operatorname{diag}\left(1+r^{\text{(log)}}(f)\right)A(f)Z. (20)

Substituting either Eq. (19) or (20) for the Hessian and either Eq. (13) or (14) for the gradient into the Taylor expansion in Eq. (12) yields the quadratic approximation to the data discrepancy terms of interest. This quadratic is in general non-convex because both Hessian expressions can be indefinite.
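To illustrate how these expressions are evaluated in practice, the sketch below (reusing the hypothetical array conventions from the earlier `mean_counts` example, not the paper's implementation) computes the spectral weighting of Eq. (18) as an array and applies $Z^{\top}A(f)^{\top}$ to the count residual to obtain the TPL gradient of Eq. (13):

```python
import numpy as np

def tpl_gradient(f, X, mu, s, N0, c):
    """Gradient of the TPL data discrepancy, Eq. (13): Z^T A(f)^T r(f)."""
    Zf = X @ f @ mu                                       # (Nl, Ni) monochromatic sinograms
    weighted = s * np.exp(-Zf)[None, :, :]                # (Nw, Nl, Ni)
    c_hat = N0 * weighted.sum(axis=2)                     # mean counts, Eq. (4)
    A = weighted / weighted.sum(axis=2, keepdims=True)    # Eq. (18), per window and ray
    r = c - c_hat                                         # count residual, Eq. (15)
    v = np.einsum('wli,wl->li', A, r)                     # A(f)^T r, indexed by ray and energy
    return X.T @ v @ mu.T                                 # Z^T applied, shape (Nk, Nm)
```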

III-A2 A local convex upper bound to $L(f)$

The key to deriving a local convex upper bound to the quadratic expansion of $L(f)$ is to split the Hessian expressions into positive and negative components. Setting the negative components to zero and substituting this thresholded Hessian into the Taylor expansion yields a quadratic term with non-negative curvature. (As an aside, a tighter convex local quadratic upper bound would be attained by diagonalizing the Hessian and forming a positive semi-definite Hessian by keeping only the eigenvectors corresponding to non-negative eigenvalues in the eigenvalue decomposition, but for realistically sized tomography configurations such an eigenvalue decomposition is impractical.) The Hessian is split into positive and negative components in the form

\nabla^{2}_{f}L(f)=\nabla^{2}_{+}L(f)-\nabla^{2}_{-}L(f),

where $\nabla^{2}_{+}L(f)$ and $\nabla^{2}_{-}L(f)$ are both positive semidefinite (see Appendix B for the algebraic details of the splitting). The resulting split expressions are:

\nabla^{2}_{+}L_{\text{TPL}}(f)=Z^{\top}\operatorname{diag}\left(A(f)^{\top}r_{-}(f)\right)Z+Z^{\top}A(f)^{\top}\operatorname{diag}\left(\hat{c}(f)-r_{-}(f)\right)A(f)Z, (21)
\nabla^{2}_{-}L_{\text{TPL}}(f)=Z^{\top}\operatorname{diag}\left(A(f)^{\top}r_{+}(f)\right)Z-Z^{\top}A(f)^{\top}\operatorname{diag}\left(r_{+}(f)\right)A(f)Z, (22)

and

\nabla^{2}_{+}L_{\text{LSQ}}(f)=Z^{\top}\operatorname{diag}\left(A(f)^{\top}r_{-}^{\text{(log)}}(f)\right)Z+Z^{\top}A(f)^{\top}\operatorname{diag}\left(1-r_{-}^{\text{(log)}}(f)\right)A(f)Z, (23)
\nabla^{2}_{-}L_{\text{LSQ}}(f)=Z^{\top}\operatorname{diag}\left(A(f)^{\top}r_{+}^{\text{(log)}}(f)\right)Z-Z^{\top}A(f)^{\top}\operatorname{diag}\left(r_{+}^{\text{(log)}}(f)\right)A(f)Z, (24)

where

r(f)=r_{+}(f)-r_{-}(f),\;\;r_{+}(f)=\max\left[r(f),0\right]\text{ and }r_{-}(f)=\max\left[-r(f),0\right],

and similarly

r^{\text{(log)}}(f)=r^{\text{(log)}}_{+}(f)-r^{\text{(log)}}_{-}(f),\;\;r_{+}^{\text{(log)}}(f)=\max\left[r^{\text{(log)}}(f),0\right]\text{ and }r^{\text{(log)}}_{-}(f)=\max\left[-r^{\text{(log)}}(f),0\right].

To summarize, the expression for the convex local upper bound to the quadratic approximation in Eq. (12) is

Q\left(c,f_{0};f\right)=L\left(f_{0}\right)+\left(f-f_{0}\right)^{\top}\nabla_{f}L\left(f_{0}\right)+\frac{1}{2}\left(f-f_{0}\right)^{\top}\nabla^{2}_{+}L\left(f_{0}\right)\left(f-f_{0}\right), (25)

where $\nabla^{2}_{+}$ is used instead of $\nabla^{2}_{f}$ in the quadratic term. Here $Q$ depends parametrically on the counts data $c$, through the function $L$ (see Eq. (12)), and the expansion center $f_{0}$. The gradients of $L$ at $f_{0}$ are obtained from Eqs. (13) and (14), and the Hessian upper bounds are available from Eqs. (21) and (23). Note that the quadratic expression in Eq. (25) is not necessarily an upper bound of the data discrepancy functions, even locally, because we bound only the quadratic expansion. We employ the convex function $Q\left(c,f_{0};f\right)$ combined with convex constraints to generate descent steps for the generic non-convex optimization problem specified in Eq. (5).

III-B The motivation and application of MOCCA

III-B1 Summary of the Chambolle-Pock (CP) primal-dual framework

The generic convex optimization addressed in Ref. [25] is

x^{\star}=\operatorname*{arg\,min}_{x}\left\{F(Kx)+G(x)\right\}, (26)

where $F$ and $G$ are convex, possibly non-smooth, functions and $K$ is a matrix multiplying the vector $x$. The ability to handle non-smooth convex functions is key for addressing the convex constraints of Eq. (5). In the primal-dual picture this minimization is embedded in a larger saddle point problem

\min_{x}\max_{y}\left\{y^{\top}Kx-F^{*}(y)+G(x)\right\}, (27)

using the Legendre transform or convex conjugation

F^{*}(y)=\max_{x}\left\{x^{\top}y-F(x)\right\}, (28)

and the fact that

F(x)=F^{**}(x)=\max_{y}\left\{y^{\top}x-F^{*}(y)\right\} (29)

if $F$ is a convex function. The CP primal-dual algorithm of interest solves Eq. (27) by iterating on the following steps

y^{(n+1)}=\operatorname*{arg\,min}_{y^{\prime}}\left\{F^{*}(y^{\prime})+\frac{1}{2\sigma}\|y^{(n)}+\sigma K\bar{x}^{(n)}-y^{\prime}\|^{2}_{2}\right\} (30)
x^{(n+1)}=\operatorname*{arg\,min}_{x^{\prime}}\left\{G(x^{\prime})+\frac{1}{2\tau}\|x^{(n)}-\tau K^{\top}y^{(n+1)}-x^{\prime}\|^{2}_{2}\right\} (31)
\bar{x}^{(n+1)}=2x^{(n+1)}-x^{(n)}, (32)

where $n$ is the iteration index; $\sigma>0$ and $\tau>0$ are the primal and dual step sizes, respectively, and these step sizes must satisfy the inequality

\sigma\tau<\frac{1}{\|K\|^{2}_{2}},

where $\|K\|_{2}$ is the largest singular value of $K$. Because this algorithm solves the saddle point problem, Eq. (27), one obtains the solution to the primal problem, Eq. (26), along with its Fenchel dual

y^{\star}=\operatorname*{arg\,max}_{y}\left\{-F^{*}(y)-G^{*}(-K^{\top}y)\right\}. (33)

The fact that both Eqs. (26) and (33) are solved simultaneously provides a convergence check: the primal-dual gap, the difference between the objective functions of Eqs. (26) and (33), tends to zero as the iteration number increases.
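As a deliberately simple illustration of the update steps in Eqs. (30)-(32), the sketch below applies them to non-negative least squares, i.e. $F(z)=\tfrac{1}{2}\|z-b\|^{2}$ and $G$ the indicator of $x\geq 0$; this toy instance is our own choice for illustration and is not the spectral CT problem itself:

```python
import numpy as np

def cp_nonneg_lsq(K, b, n_iters=500):
    """Chambolle-Pock iterations, Eqs. (30)-(32), for min_x 0.5*||Kx - b||^2 s.t. x >= 0."""
    sigma = tau = 0.95 / np.linalg.norm(K, 2)      # step sizes with sigma*tau < 1/||K||^2
    x = np.zeros(K.shape[1])
    x_bar = x.copy()
    y = np.zeros(K.shape[0])
    for _ in range(n_iters):
        # Dual step, Eq. (30); here F*(y) = 0.5*||y||^2 + b^T y for F(z) = 0.5*||z - b||^2.
        y = (y + sigma * (K @ x_bar - b)) / (1.0 + sigma)
        # Primal step, Eq. (31); the proximal map of the non-negativity indicator is a clip.
        x_new = np.maximum(x - tau * (K.T @ y), 0.0)
        # Prediction step, Eq. (32).
        x_bar = 2.0 * x_new - x
        x = x_new
    return x
```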

In some settings, the requirement $\sigma\tau<\frac{1}{\|K\|^{2}_{2}}$ may be impractical or too conservative, and the CP algorithm can instead be implemented with diagonal matrices $\Sigma$ and $T$ in place of $\sigma$ and $\tau$ [26], with the condition $\|\Sigma^{1/2}KT^{1/2}\|<1$ and the revised steps

y^{(n+1)}=\operatorname*{arg\,min}_{y^{\prime}}\left\{F^{*}(y^{\prime})+\frac{1}{2}\|y^{(n)}+\Sigma K\bar{x}^{(n)}-y^{\prime}\|^{2}_{\Sigma^{-1}}\right\} (34)
x^{(n+1)}=\operatorname*{arg\,min}_{x^{\prime}}\left\{G(x^{\prime})+\frac{1}{2}\|x^{(n)}-TK^{\top}y^{(n+1)}-x^{\prime}\|^{2}_{T^{-1}}\right\} (35)
\bar{x}^{(n+1)}=2x^{(n+1)}-x^{(n)}, (36)

where for a positive semidefinite matrix $A$ the norm $\|z\|_{A}$ is defined as $\sqrt{z^{\top}Az}$.
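When estimating $\|K\|_{2}$ is impractical, diagonal step sizes can be built from row and column sums of $|K|$. The sketch below uses the standard choice from Ref. [26], $\Sigma_{i}=1/\sum_{j}|K_{ij}|$ and $T_{j}=1/\sum_{i}|K_{ij}|$, which satisfies the step-size condition; we state it here as the usual convention rather than as the paper's exact formula:

```python
import numpy as np

def diagonal_step_sizes(K, eps=1e-12):
    """Diagonal preconditioners for Eqs. (34)-(36): Sigma from row sums, T from column sums of |K|."""
    absK = np.abs(K)
    Sigma = 1.0 / np.maximum(absK.sum(axis=1), eps)   # one entry per dual variable (row of K)
    T = 1.0 / np.maximum(absK.sum(axis=0), eps)       # one entry per primal variable (column of K)
    return Sigma, T
```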

III-B2 The need to generalize the CP primal-dual framework

To apply the CP primal-dual algorithm to $Q$ for fixed $f_{0}$, we need to write Eq. (25) in the form of the objective function in Eq. (26). Manipulating the expression for $Q$ and dropping all terms that are constant with respect to $f$, we obtain

Q\left(c,f_{0};f\right)=\frac{1}{2}f^{\top}K^{\top}(D-E)Kf-f^{\top}K^{\top}b, (37)
K=\left(\begin{array}{c}K_{1}\\ K_{2}\end{array}\right)=\left(\begin{array}{c}A\left(f_{0}\right)Z\\ Z\end{array}\right), (42)
D=\left(\begin{array}{cc}D_{1}&0\\ 0&D_{2}\end{array}\right),\;\;E=\left(\begin{array}{cc}E_{1}&0\\ 0&0\end{array}\right),\;\;b=\left(\begin{array}{c}b_{1}\\ b_{2}\end{array}\right) (49)
D_{1}=\begin{cases}\operatorname{diag}\left[\hat{c}\left(f_{0}\right)\right]&\text{if }L=L_{\text{TPL}}\\ \mathbf{I}&\text{if }L=L_{\text{LSQ}}\end{cases},
D_{2}=\begin{cases}\operatorname{diag}\left[A\left(f_{0}\right)^{\top}r_{-}\left(f_{0}\right)\right]&\text{if }L=L_{\text{TPL}}\\ \operatorname{diag}\left[A\left(f_{0}\right)^{\top}r^{\text{(log)}}_{-}\left(f_{0}\right)\right]&\text{if }L=L_{\text{LSQ}}\end{cases},
E_{1}=\begin{cases}\operatorname{diag}\left[r_{-}\left(f_{0}\right)\right]&\text{if }L=L_{\text{TPL}}\\ \operatorname{diag}\left[r^{\text{(log)}}_{-}\left(f_{0}\right)\right]&\text{if }L=L_{\text{LSQ}}\end{cases},
b_{1}=\begin{cases}(D_{1}-E_{1})K_{1}f_{0}-r\left(f_{0}\right)&\text{if }L=L_{\text{TPL}}\\ (D_{1}-E_{1})K_{1}f_{0}-r^{\text{(log)}}\left(f_{0}\right)&\text{if }L=L_{\text{LSQ}}\end{cases},
b_{2}=D_{2}K_{2}f_{0}.

The matrices $D$ and $E$ are nonnegative and depend on $c$ and $f_{0}$; $b$ is a vector which also depends on $c$ and $f_{0}$; and $K$ is a matrix that depends only on $f_{0}$. Both terms of $Q$ are functions of $Kf$ and accordingly $Q$ is identified with the function $F$ in the objective function of Eq. (26):

Q\left(c,f_{0};f\right)=F_{Q}(Kf),
F_{Q}(z)=\frac{1}{2}z^{\top}(D-E)z-z^{\top}b. (50)

Because $Q$ is a convex function of $f$, $F_{Q}(Kf)$ is convex as a function of $f$. The function $F_{Q}$, however, is not a convex function of $z$. Because $D$ and $E$ are non-negative matrices, $F_{Q}$ is a difference of convex functions of $z$,

F_{Q}(z)=F_{Q+}(z)-F_{Q-}(z),
F_{Q+}(z)=\frac{1}{2}z^{\top}Dz-z^{\top}b,
F_{Q-}(z)=\frac{1}{2}z^{\top}Ez,

where $F_{Q+}(z)$ and $F_{Q-}(z)$ are convex functions of $z$. That $F_{Q}(z)$ is not convex implies that $F_{Q}$ cannot be written as the convex conjugate of $F_{Q}^{*}(y)$, and performing the maximization over $y$ in Eq. (27) no longer yields Eq. (26).

III-B3 Heuristic derivation of MOCCA

To generalize the CP algorithm to allow for the case of interest, we consider the function $F$ to be a convex-concave function

F(z)=F_{+}(z)-F_{-}(z),

where $F_{+}$ and $F_{-}$ are both convex. The heuristic strategy for MOCCA is to employ a convex approximation to $F(z)$ in the neighborhood of a point $z=z_{0}$

F_{\text{convex}}(z_{0};z)=F_{+}(z)-z^{\top}\nabla F_{-}(z_{0}); (51)

(again we drop terms that are constant with respect to $z$). We then execute an iteration of the CP algorithm on the convex function $F_{\text{convex}}(z_{0};z)$, modify the point of convex approximation $z_{0}$, and repeat the iteration. The question then is how to choose $z_{0}$, the center for the convex approximation, in light of the fact that the optimization of $F$ in the CP algorithm happens in the dual space with $F^{*}$, see Eq. (30).

A corresponding primal point to a point in the dual space can be determined by selecting the maximizer of the objective function in the definition of the Legendre transform. Taking the gradient of the objective function in Eq. (29) and setting it to zero yields

x=\nabla_{y}F^{*}(y). (52)

We use this relation to find the expansion point for the primal objective function that mirrors the current value of the dual variables.

Incorporating the convex approximation $F_{\text{convex}}(z_{0};z)$ about the mirrored expansion point $z_{0}$ into the CP algorithm yields the iteration steps for MOCCA

z^{(n+1)}_{0}=\nabla_{y}F_{\text{convex}}^{*}\left(z^{(n)}_{0};y^{(n)}\right)=\Sigma^{-1}(y^{(n-1)}-y^{(n)}+\Sigma K\bar{f}^{(n-1)}) (53)
y^{(n+1)}=\operatorname*{arg\,min}_{y^{\prime}}\left\{F_{\text{convex}}^{*}\left(z^{(n+1)}_{0};y^{\prime}\right)+\frac{1}{2}\|y^{(n)}+\Sigma K\bar{f}^{(n)}-y^{\prime}\|^{2}_{\Sigma^{-1}}\right\} (54)
f^{(n+1)}=\operatorname*{arg\,min}_{f^{\prime}}\left\{G(f^{\prime})+\frac{1}{2}\|f^{(n)}-TK^{\top}y^{(n+1)}-f^{\prime}\|^{2}_{T^{-1}}\right\} (55)
\bar{f}^{(n+1)}=2f^{(n+1)}-f^{(n)}, (56)

where $F_{\text{convex}}^{*}(z_{0};y)$ is the convex conjugate of $F_{\text{convex}}(z_{0};z)$ with respect to the second argument; the first line obtains the mirror expansion point using Eq. (52), where the right hand side expression is found by setting to zero the gradient of the objective function in Eq. (54); the second line makes use of the convex approximation $F_{\text{convex}}$ in the form of its convex conjugate; and the remaining two lines are the same as those of the CP algorithm. For the simulations in this article, all variables are initialized to zero. Convergence of MOCCA, the algorithm specified by Eqs. (53)-(56), is investigated in an accompanying paper [30], which also develops the algorithm for a more general setting.

III-B4 Application of MOCCA to optimization of the spectral CT data fidelity

The MOCCA algorithm handles a fixed convex-concave function $F$, convex function $G$, and linear transform $K$. In order to apply it to the spectral CT data fidelity, we propose to apply MOCCA to the local quadratic expansion in Eq. (37), re-expand the spectral CT data discrepancy at the current estimate of the material maps, and iterate this procedure until convergence. We refer to iterations of the core MOCCA algorithm as "inner" iterations, and the process of iteratively re-expanding the data discrepancy and applying MOCCA constitutes the "outer" iterations. Because MOCCA allows for non-smooth terms, the convex constraints described in Sec. II-B can be incorporated and the inner iterations aim at solving the intermediate problem

f^{*}=\operatorname*{arg\,min}_{f}\left\{\sum_{w\ell}Q\left(c_{w\ell},f_{0};f\right)+\sum_{i}\delta(P_{i})\right\}. (57)

For the remainder of this section, for brevity, we drop the constraints and write the update steps only for the spectral CT data fidelity. The full algorithm with the convex constraints discussed in Sec. II-B can be derived using the methods described in [27], and an algorithm instance with TV constraints on the material maps is covered in Appendix C.

In applying MOCCA to $Q\left(c_{w\ell},f_{0};f\right)$, we use the convex and concave components from $F_{Q}$ in Eq. (50) to form the local convex quadratic expansion needed in MOCCA, see Eq. (51),

F_{Q,\text{convex}}(z)=\frac{1}{2}z^{\top}Dz-(z-z_{0})^{\top}(b+Ez_{0}). (58)

The corresponding dual function

F_{Q,\text{convex}}^{*}(y)=\frac{1}{2}\|y+b+Ez_{0}\|_{D^{-1}}^{2}+z^{\top}_{0}Ez_{0}, (59)

is needed to derive the MOCCA dual update step in Eq. (54). We note that the material maps $f$ enter $Q\left(c_{w\ell},f_{0};f\right)$ only through the linear transformation $Kf$; comparing with the generic optimization problem in Eq. (26), we therefore have $G(f)=0$ for the present case, where only minimization of the data discrepancy is considered.

In using an inner/outer iteration, a basic question is how accurately the inner problem needs to be solved. It turns out that it is sufficient to employ a single inner iteration, so that effectively the proposed algorithm no longer consists of nested iteration loops. Instead, the proposed algorithm performs re-expansion at every iteration:

f_{0}=\bar{f}^{(n)} (60)
\Sigma^{(n)}=|K_{1}(f_{0})|\mathbf{1}/\lambda;\;\;T^{(n)}=\lambda|K_{1}(f_{0})|^{\top}\mathbf{1} (61)
z^{(n+1)}_{0}=(\Sigma^{(n)})^{-1}\left(y^{(n-1)}-y^{(n)}+\Sigma^{(n)}K_{1}(f_{0})\bar{f}^{(n-1)}\right) (62)
y^{(n+1)}=(D_{1}(f_{0})+\Sigma^{(n)})^{-1}\left[D_{1}(f_{0})\left(y^{(n)}+\Sigma^{(n)}K_{1}(f_{0})\bar{f}^{(n)}\right)-\Sigma^{(n)}\left(b_{1}(f_{0})+E_{1}(f_{0})z^{(n+1)}_{0}\right)\right] (63)
f^{(n+1)}=\bar{f}^{(n)}-T^{(n)}K_{1}(f_{0})^{\top}y^{(n+1)} (64)
\bar{f}^{(n+1)}=2f^{(n+1)}-f^{(n)}, (65)

where $f^{(0)}$, $f^{(-1)}$, $\bar{f}^{(0)}$, $y^{(0)}$, and $y^{(-1)}$ are initialized to zero vectors.

Before explaining each line of the one-step spectral CT algorithm specified by Eqs. (60)-(65), we point out important features of the use of re-expansion at every iteration: (1) There are no nested loops. (2) The size of the system of equations is significantly reduced; note that only the first matrix block of $K$, $D$, $E$, and $b$ (see Eq. (37) for their definition) appears in the steps of the algorithm. By re-expanding at every iteration the set of update steps for the second matrix block becomes trivial. (3) Re-expanding at every step is not guaranteed to converge, and an algorithm control parameter $\lambda$ is introduced that balances algorithm convergence rate against possible unstable iteration, see Sec. IV for a demonstration of how $\lambda$ impacts convergence. A similar strategy was used together with the CP algorithm in the use of non-convex image regularity norms, see [28].

The first line of the algorithm, Eq. (60), explicitly assigns the current material maps estimate to the new expansion point. In this way it is clear in the following steps whether $\bar{f}^{(n)}$ enters the equations through the re-expansion center or through the steps of MOCCA. For the spectral CT algorithm it is convenient to use the vector step-sizes $\Sigma^{(n)}$ and $T^{(n)}$, defined in Eq. (61), from the pre-conditioned form of the CP algorithm [26], because the linear transform $K_{1}(f_{0})$ is changing at each iteration as the expansion center changes. Computation of the vector step-sizes only involves single matrix-vector products of $|K_{1}(f_{0})|$ and $|K_{1}(f_{0})|^{\top}$ with a vector of ones, $\mathbf{1}$, as opposed to performing the power method on $K_{1}(f_{0})$ to find the scalar step-sizes $\sigma$ and $\tau$, which would render re-expansion at every iteration impractical. In Eq. (61), the parameter $\lambda$ enters in such a way that the product $\|\Sigma^{1/2}KT^{1/2}\|$ remains constant. For the preconditioned CP algorithm, $\lambda$ defined in this way will not violate the step-size condition. The dual and primal steps in Eqs. (63) and (64), respectively, are obtained by analytic computation of the minimizations in Eqs. (54) and (55) using Eq. (58) and $G(f)=0$, respectively. The primal step at Eq. (64) and the primal variable prediction step at Eq. (65) are identical to the corresponding CP algorithm steps at Eqs. (31) and (32), respectively. The presented algorithm accounts only for the spectral CT data fidelity optimization. For the full algorithm incorporating TV constraints used in the results section, see the pseudocode in Appendix C.
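For orientation, a schematic NumPy sketch of the unconstrained TPL update loop in Eqs. (60)-(65) is given below. It forms $K_{1}$ as a dense matrix, so it is only suitable for tiny toy problems; it omits the convex constraints and $\mu$-preconditioning covered by the pseudocode in Appendix C; and it adopts the standard diagonal-preconditioning convention for the step sizes (an assumption on our part). It should therefore be read as a structural illustration rather than the paper's implementation.

```python
import numpy as np

def one_step_tpl_sketch(f_init, X, mu, s, N0, c, lam=1.0, n_iters=200):
    """Schematic single-loop update, Eqs. (60)-(65), TPL case, without constraints."""
    Nw, Nl = c.shape
    Nk, Nm = f_init.shape
    f = f_init.reshape(-1).astype(float)           # primal variable, flattened (k, m)
    fbar = f.copy()
    fbar_prev = f.copy()
    y = np.zeros(Nw * Nl)                          # dual variable
    y_prev = y.copy()
    for _ in range(n_iters):
        f0 = fbar                                  # Eq. (60): re-expansion center
        Zf0 = X @ f0.reshape(Nk, Nm) @ mu          # (Nl, Ni) monochromatic sinograms
        weighted = s * np.exp(-Zf0)[None, :, :]
        c_hat = N0 * weighted.sum(axis=2)          # Eq. (4)
        A = weighted / weighted.sum(axis=2, keepdims=True)          # Eq. (18)
        # Dense K1 = A(f0) Z, indexed ((w, l), (k, m)).
        K1 = np.einsum('wli,mi,lk->wlkm', A, mu, X).reshape(Nw * Nl, Nk * Nm)
        r = (c - c_hat).reshape(-1)                # Eq. (15)
        D1 = c_hat.reshape(-1)                     # diagonal entries, TPL case
        E1 = np.maximum(-r, 0.0)                   # r_minus
        b1 = (D1 - E1) * (K1 @ f0) - r
        # Eq. (61): diagonal step sizes from |K1| (standard convention, assumed).
        absK1 = np.abs(K1)
        Sigma = lam / np.maximum(absK1.sum(axis=1), 1e-12)
        T = 1.0 / np.maximum(lam * absK1.sum(axis=0), 1e-12)
        # Eq. (62): mirrored expansion point.
        z0 = (y_prev - y) / Sigma + K1 @ fbar_prev
        # Eq. (63): dual update (all matrices diagonal, so elementwise).
        y_new = (D1 * (y + Sigma * (K1 @ fbar)) - Sigma * (b1 + E1 * z0)) / (D1 + Sigma)
        # Eqs. (64)-(65): primal update and prediction step.
        f_new = fbar - T * (K1.T @ y_new)
        fbar_prev = fbar
        fbar = 2.0 * f_new - f
        f = f_new
        y_prev, y = y, y_new
    return f.reshape(Nk, Nm)
```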

III-C One-step algorithm $\mu$-preconditioning

One of the main challenges of spectral CT image reconstruction is the similar dependence of the linear X-ray attenuation curves on energy for different tissues/materials. This causes rows of the attenuation matrix $\mu_{mi}$ to be nearly linearly dependent, or equivalently its condition number is large. There are two effects of the poor conditioning of $\mu_{mi}$: (1) the ability to separate the material maps is highly sensitive to inconsistency in the spectral CT transmission data, and (2) the poor conditioning of $\mu_{mi}$ contributes to the overall poor conditioning of spectral CT image reconstruction, negatively impacting algorithm efficiency. To address the latter issue, we introduce a simple preconditioning step that orthogonalizes the attenuation curves. We call this step "$\mu$"-preconditioning to differentiate it from the preconditioning of the CP algorithm.

To perform $\mu$-preconditioning, we form the matrix

M_{mm^{\prime}}=\sum_{i}\mu_{mi}\mu^{\top}_{im^{\prime}}, (66)

and perform the eigenvalue decomposition

M=U\operatorname{diag}(s)U^{T},

where the eigenvalues are ordered $s_{1}\geq s_{2}\geq\dots\geq s_{N_{m}}$. The singular values of $\mu$ are given by the $\sqrt{s_{i}}$'s and its condition number is $\sqrt{s_{1}/s_{N_{m}}}$. The preconditioning matrix for $\mu$ is given by

P=\operatorname{diag}(\sqrt{s})U^{T}, (67)
P^{-1}=U\operatorname{diag}(1/\sqrt{s}).

Implementation of $\mu$-preconditioning consists of the following steps:

  • Transformation of material maps and attenuation matrix - The appropriate transformation is arrived at through inserting the identity matrix in the form of $P^{-1}P$ into the exponent of the intensity counts data model in Eq. (4):

    \sum_{mk}X_{\ell k}\mu_{mi}f_{km}=\sum_{m\,m^{\prime}m^{\prime\prime}k}X_{\ell k}\mu_{m^{\prime\prime}i}(P^{-1})_{m^{\prime\prime}m}P_{m\,m^{\prime}}f_{km^{\prime}}=\sum_{mk}X_{\ell k}\mu^{\prime}_{mi}f^{\prime}_{km}, (68)

    where

    f^{\prime}_{km}=\sum_{m^{\prime}}P_{mm^{\prime}}f_{km^{\prime}}, (69)
    \mu^{\prime}_{mi}=\sum_{m^{\prime}}\mu_{m^{\prime}i}(P^{-1})_{m^{\prime}m}. (70)
  • Substitution into the one-step algorithm - Substitution of the transformed material maps and attenuation matrix into the one-step algorithm given by Eqs. (60)-(65) is fairly straightforward. All occurrences of $f$ are replaced by $f^{\prime}$, and the linear transform $K_{1}$ is replaced by

    K_{1}^{\prime}=A^{\prime}(f_{0}^{\prime})Z^{\prime},

    where, using Eqs. (17) and (18),

    A^{\prime}_{w\ell,\ell^{\prime}i}(f^{\prime})=\frac{s_{w\ell i}\exp\left[-(Z^{\prime}f^{\prime})_{\ell i}\right]}{\sum_{i^{\prime}}s_{w\ell i^{\prime}}\exp\left[-(Z^{\prime}f^{\prime})_{\ell i^{\prime}}\right]},

    and

    Z^{\prime}=\mu^{\prime}X.

    Using $\mu$-preconditioning, care must be taken in computing the vector stepsizes $\Sigma^{\prime}$ and $T^{\prime}$ in Eq. (61). Without $\mu$-preconditioning, the absolute value symbols are superfluous, because $K_{1}$ has non-negative matrix elements. With $\mu$-preconditioning, the absolute value operation is necessary, because $K^{\prime}_{1}$ may have negative entries through its dependence on $Z^{\prime}$ and in turn $\mu^{\prime}$.

  • Formulation of constraints - The previously discussed constraints are functions of the untransformed material maps. As a result, in using $\mu$-preconditioning where we solve for the transformed material maps, the constraints should be formulated in terms of

    f=P^{-1}f^{\prime}. (71)

    The explicit pseudocode for constrained data-discrepancy minimization using $\mu$-preconditioning is given in Appendix C.

After applying the $\mu$-preconditioned one-step algorithm, the final material maps are obtained through Eq. (71).
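The $\mu$-preconditioning transform only requires a small $N_{m}\times N_{m}$ eigenvalue decomposition. A minimal NumPy sketch of Eqs. (66)-(70), assuming $\mu$ is stored as an (Nm, Ni) array as in the earlier examples, is:

```python
import numpy as np

def mu_preconditioner(mu):
    """Build P and P^{-1} from Eqs. (66)-(67) for the attenuation matrix mu (Nm x Ni)."""
    M = mu @ mu.T                               # Eq. (66), an Nm x Nm Gram matrix
    s, U = np.linalg.eigh(M)                    # eigenvalues returned in ascending order
    s, U = s[::-1], U[:, ::-1]                  # reorder so that s_1 >= ... >= s_Nm
    P = np.diag(np.sqrt(s)) @ U.T               # Eq. (67)
    P_inv = U @ np.diag(1.0 / np.sqrt(s))
    return P, P_inv

# Transformed maps and attenuation matrix, Eqs. (69)-(70), with f stored as (Nk, Nm):
#   f_prime  = f @ P.T          # f'_{km} = sum_{m'} P_{m m'} f_{k m'}
#   mu_prime = P_inv.T @ mu     # mu'_{mi} = sum_{m'} mu_{m' i} (P^{-1})_{m' m}
```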

III-D Convergence checks

Within the present primal-dual framework we employ the primal-dual gap for checking convergence. The primal-dual gap that we seek is the difference between the convex quadratic approximation using the first matrix block in Eq. (58), which is the objective function in the primal minimization

f^{\star}=\operatorname*{arg\,min}_{f}\left\{\frac{1}{2}(K_{1}f)^{\top}D_{1}K_{1}f-(K_{1}f-z_{0})^{\top}(b_{1}+E_{1}z_{0})\right\}, (72)

and the objective function in the Fenchel dual maximization problem

y^{\star}=\operatorname*{arg\,max}_{y}\left\{-\frac{1}{2}\|y+b_{1}+E_{1}z_{0}\|_{D_{1}^{-1}}^{2}-z^{\top}_{0}E_{1}z_{0}\right\}\text{ such that }K_{1}^{\top}y=0. (73)

These problems are derived from the general forms in Eqs. (26) and (33), and the constraint in the dual maximization comes from the fact that $G(f)=0$ in the primal problem, see Sec. 3.1 in [27]. For a convergence check we inspect the difference between these two objective functions. Note that the constant term $z^{\top}_{0}E_{1}z_{0}$ cancels in this subtraction and plays no role in the optimization algorithm, and could thus be left out. If the material maps $f^{(n)}$ attain a stable value, the constraint $K^{\top}_{1}y=0$ is necessarily satisfied from inspection of Eq. (64). When other constraints are included, the estimates of the material maps should be checked against these constraints and the primal-dual gap is modified.
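For completeness, the sketch below (matching the dense-toy-problem conventions of the earlier one-step sketch, taking dual feasibility $K_{1}^{\top}y=0$ for granted, and ignoring the modifications needed when additional constraints are active) evaluates the difference between the objectives of Eqs. (72) and (73) as a convergence check:

```python
import numpy as np

def conditional_pd_gap(f, y, K1, D1, E1, b1, z0):
    """Gap between the primal objective of Eq. (72) and the dual objective of Eq. (73).

    K1 is assumed dense; D1 and E1 are diagonal entries stored as vectors.
    """
    K1f = K1 @ f
    primal = 0.5 * K1f @ (D1 * K1f) - (K1f - z0) @ (b1 + E1 * z0)
    dual = -0.5 * np.sum((y + b1 + E1 * z0) ** 2 / D1) - z0 @ (E1 * z0)
    return primal - dual
```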

IV Results


Figure 1: Normalized spectrum of a typical X-ray source for CT operating at a potential of 120 kV.

We demonstrate use of the one-step spectral CT algorithm on simulated transmission data modeling an ideal photon-counting detector. The X-ray spectrum, shown in Fig. 1, is assumed known. In modeling the ideal detector, the spectral response of an energy-windowed photon count measurement is taken to be the same as that of Fig. 1 between the bounding threshold energies of the window and zero outside. We conduct two studies. The first is focused on demonstrating convergence and application of the one-step algorithm with recovery of material maps for a two-material head phantom using the following minimization problems

\text{TPL-TV:}\;\;\;\operatorname*{arg\,min}_{f}D_{\text{TPL}}(c,\hat{c}(f))\text{ such that }\|f_{m}\|_{\text{TV}}\leq\gamma_{m}\;\forall\,m,

and

\text{LSQ-TV:}\;\;\;\operatorname*{arg\,min}_{f}D_{\text{LSQ}}(c,\hat{c}(f))\text{ such that }\|f_{m}\|_{\text{TV}}\leq\gamma_{m}\;\forall\,m.

The pseudo-code for TPL-TV and LSQ-TV is given explicitly in Appendix C. The second, more realistic study demonstrates application to an anthropomorphic chest phantom containing multiple tissues/materials at multiple densities. For this study we demonstrate one-step image reconstruction of a mono-energetic image at energy $E$ using

\text{TPL-monoTV:}\;\;\;\operatorname*{arg\,min}_{f}D_{\text{TPL}}(c,\hat{c}(f))\text{ such that }\|f^{\text{(mono)}}(E)\|_{\text{TV}}\leq\gamma_{\text{mono}}.

Note that for monoenergetic image reconstruction, the TV constraint is placed on the monoenergetic image, but the optimization is performed over the individual material maps $f_{m}$ and the monoenergetic image is formed after the optimization using Eq. (11).

Aside from the system specification parameters, such as number of views, detector bins, and image dimensions, the algorithm parameters are the TV constraints $\gamma_{m}$ for TPL-TV and LSQ-TV or $\gamma_{\text{mono}}$ for TPL-monoTV and the primal-dual step size ratio $\lambda$. The TV constraints $\gamma_{m}$ or $\gamma_{\text{mono}}$ affect the image regularization, but $\lambda$ is a tuning parameter which does not alter the solution of TPL-TV, LSQ-TV, or TPL-monoTV. It is used to optimize convergence speed of the one-step image reconstruction algorithm.

IV-A Head phantom studies with material map TV-constraints


Figure 2: Bone and brain maps derived from the FORBILD head phantom. Both images are shown in the gray scale window [0.9, 1.1].

For the present studies, we employ a two-material phantom derived from the FORBILD head phantom shown in Fig. 2. The spectral CT transmission counts are computed by use of the discrete-to-discrete model in Eq. (4). The true material maps $f_{k,\text{bone}}$ and $f_{k,\text{brain}}$ are the 256$\times$256 pixel arrays shown in Fig. 2 and the corresponding linear X-ray coefficients $\mu_{\text{bone},i}$ and $\mu_{\text{brain},i}$ are obtained from the NIST tables available in Ref. [23] for energies ranging from 20 to 120 keV in increments of 1 keV. By employing the same data model as that used in the image reconstruction algorithm, we can investigate the convergence properties of the one-step algorithm.


Figure 3: Convergence metrics for LSQ-TV and TPL-TV and for different values of $\lambda$ with ideal, noiseless data. First, second, third, and fourth rows show the conditional primal-dual (cPD) gap, data discrepancy objective function, difference between the TV of the estimated bone map and that of the phantom bone map, and the same for the brain map TV. Note that the expressions for the gap and data discrepancy are different for TPL and LSQ; thus those quantities are not directly comparable.

For the head phantom simulations, the scanning configuration is 2D fan-beam CT with a source to iso-center distance of 50 cm and source to detector distance of 100 cm. The physical size of the phantom pixel array is $20\times 20$ cm$^{2}$. The number of projection views over a full $2\pi$ scan is 128 and the number of detector bins along a linear detector array is 512. This configuration is undersampled by a factor of 4 [31]. Two X-ray energy windows are simulated with a spectral response for each window given by the spectrum shown in Fig. 1 in the energy ranges [20 keV, 70 keV] and [70 keV, 120 keV] for the first and second energy windows, respectively.


Figure 4: Convergence of the material map estimates to the phantom material maps for LSQ-TV and TPL-TV and for different values of $\lambda$ with ideal, noiseless data.
Ideal data study

For ideal, noiseless data several image metrics are plotted in Fig. 3 for different values of $\lambda$, and it is observed that the conditional primal-dual (cPD) gap and data discrepancy tend to zero while the material map TVs converge to the designed values. For this problem the convergence metrics are the cPD gap and material map TVs; the data discrepancy only tends to zero here due to the use of ideal data, and in general when data inconsistency is present the minimum data discrepancy will be greater than zero. The convergence metrics demonstrate convergence of the one-step algorithm for the particular problem under study. It is important, however, to inspect these metrics for each application of the one-step algorithm, because there is no theoretical guarantee of convergence due to the re-expansion step in Eq. (60). From the present results it is clear that progress towards convergence depends on $\lambda$; thus it is important to perform a search over $\lambda$.


(Left: TPL-TV; right: LSQ-TV)

Figure 5: Difference between estimated brain and bone maps after 5,000 iterations and the corresponding phantom map shown in a 1% gray scale window [-0.01, 0.01] for TPL-TV and a 0.1% window [-0.001, 0.001] for LSQ-TV and different values of $\lambda$ with ideal, noiseless data. The difference images are displayed in a region of interest around the sinus bones.

To demonstrate convergence of the material map estimates to the corresponding phantom, we plot the image root-mean-square-error (RMSE) in Fig. 4 as a function of iteration number and show the map differences at the last iteration performed in Fig. 5. The material map estimates are seen to converge to the corresponding phantom maps despite the projection view-angle under-sampling. Thus we note that the material map TV constraints are effective at combating these under-sampling artifacts just as they are for standard CT [32, 33]. The image RMSE curves only give a summary metric for material map convergence, and it is clear from the difference images displayed in a narrow gray scale window that convergence can be spatially non-uniform. For these idealized examples the pre-conditioned one-step algorithm appears to be more effective for LSQ-TV than TPL-TV, as the image RMSE attained for the former is significantly lower than that of the latter. In Fig. 4, the image RMSE curves for LSQ-TV at $\lambda=1\times 10^{2}$ plateau at $\sim 10^{-5}$ due to the fact that the solution of LSQ-TV is achieved to the single precision accuracy of the computation.


Figure 6: Reconstructed bone map by use of TPL-TV from simulated noisy projection spectral CT transmission data. The material map TV constraints are varied according to fractions of the corresponding phantom material map TV.


Figure 7: Reconstructed brain map by use of TPL-TV from simulated noisy projection spectral CT transmission data. The material map TV constraints are varied according to fractions of the corresponding phantom material map TV.
Noisy data study

The noisy simulation parameters are identical to those of the previous noiseless study except that the spectral CT data are generated from the transmission Poisson model. The mean of the transmission measurements is computed by assuming $4\times 10^{6}$ total photons are incident on each detector pixel over the complete X-ray spectrum. As the simulated scan acquires only 128 views, the total X-ray exposure is equivalent to acquiring 512 views at $1\times 10^{6}$ photons per detector pixel.
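As a concrete illustration of this data model, the following minimal Python sketch simulates energy-windowed transmission counts from the spectral forward model written out in Appendix A and then draws Poisson realizations. The array sizes, window spectra `s`, total counts `N`, attenuation coefficients `mu`, and system matrix `X` are random placeholders, not the phantom, spectra, or scan geometry used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes (not the paper's values): n_E energy samples,
# n_win energy windows, n_rays transmission rays, n_mat basis materials,
# n_pix image pixels.
n_E, n_win, n_rays, n_mat, n_pix = 40, 2, 64, 2, 100

# Stand-ins for the spectral model quantities of Appendix A:
# s[w, l, i]  normalized spectral response of window w at ray l and energy i,
# N[w, l]     total incident counts for window w and ray l,
# mu[m, i]    linear attenuation of basis material m at energy i,
# X[l, k]     system (projection) matrix, f[k, m] basis material maps.
s = rng.random((n_win, n_rays, n_E))
s /= s.sum(axis=2, keepdims=True)
N = np.full((n_win, n_rays), 4e6 / n_win)
mu = rng.random((n_mat, n_E))
X = rng.random((n_rays, n_pix)) / n_pix
f = rng.random((n_pix, n_mat))

# Mean counts: c_hat[w, l] = N[w, l] * sum_i s[w, l, i] exp(-(Z f)_{l i}),
# with (Z f)_{l i} = sum_{m, k} mu[m, i] X[l, k] f[k, m].
Zf = np.einsum('mi,lk,km->li', mu, X, f)
c_hat = N * np.einsum('wli,li->wl', s, np.exp(-Zf))

# Noisy data follow the transmission Poisson model.
c = rng.poisson(c_hat)
print(c.shape, float(c_hat.mean()))
```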


Figure 8: Reconstructed bone map by use of LSQ-TV from simulated noisy projection spectral CT transmission data. The material map TV constraints are varied according to fractions of the corresponding phantom material map TV.


Figure 9: Reconstructed brain map by use of LSQ-TV from simulated noisy projection spectral CT transmission data. The material map TV constraints are varied according to fractions of the corresponding phantom material map TV.

We obtain multiple material map reconstructions by varying the TV constraints among values greater than or equal to the actual TVs of the known bone and brain maps. The results for TPL-TV are shown in Figs. 6 and 7, and those for LSQ-TV are shown in Figs. 8 and 9. In all images the bone and brain maps recover the main features of the true phantom maps, and the main difference among the images is the structure of the noise. The noise texture of the recovered brain maps appears patchy for the lower TV constraints and grainy for the larger TV constraints. Also, in comparing the brain maps for TPL-TV in Fig. 7 and LSQ-TV in Fig. 9, streak artifacts are more apparent in the latter, particularly for the larger TV constraint values.
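Because the constraint values are specified relative to the TV of the known phantom maps, a small sketch of how one might compute a discrete TV value for a 2-D map is given below. The paper's exact discrete TV definition appears in an earlier section and is not reproduced here, so the isotropic forward-difference form in this sketch is an assumption.

```python
import numpy as np

def image_tv(img):
    # Isotropic discrete TV: sum over pixels of the gradient magnitude,
    # using forward differences with a zero gradient at the far edges.
    # (An assumed standard form; the paper defines its own discrete TV.)
    dx = np.diff(img, axis=0, append=img[-1:, :])
    dy = np.diff(img, axis=1, append=img[:, -1:])
    return float(np.sqrt(dx**2 + dy**2).sum())

# Hypothetical example: set the bone-map constraint to 1.1x the phantom TV.
phantom_bone = np.zeros((64, 64))
phantom_bone[20:40, 20:40] = 1.0
gamma_bone = 1.1 * image_tv(phantom_bone)
print(gamma_bone)
```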


Figure 10: Same as Fig. 3 except that only one value of $\lambda$ is shown, the results are for noisy data, and the TV constraints for the bone and brain maps are set to $1.1\times\text{TV}_{\text{bone}}$ and $1.1\times\text{TV}_{\text{brain}}$, respectively. These TV constraint settings correspond to the center images in Figs. 6-9.

It is instructive to examine the convergence metrics in Fig. 10 and the image convergence in Fig. 11 for this noisy simulation. The presentation parallels the noiseless results in Figs. 3 and 4, respectively. The differences are that results are shown for a single $\lambda$ and that the image RMSE is shown for two different TV-constraint settings in the present noisy simulations. The cPD gap and TV plots all indicate convergence to the solution for TPL-TV and LSQ-TV. Note, however, that the value of the data discrepancy objective function settles at a positive value, as expected for inconsistent data. The data discrepancy, however, does not provide a check on convergence: if the data discrepancy changes with iteration we do not have convergence, but the converse is not necessarily true. It is also reassuring to observe that the convergence rates for the set values of $\lambda$ are similar between the noiseless and noisy results. This similarity is not affected by the fact that the TV constraints are set to different values in each of these simulations.


Figure 11: Convergence of the material map estimates to the phantom material maps for LSQ-TV and TPL-TV and for noisy data with two different settings of the TV constraints. The TV factor applies to both the bone and brain maps, so that TV factors of 1.1 and 1.2 correspond to the center-left and bottom-left images of Figs. 6-9.

The RMSE comparison of the recovered material maps with the true phantom maps shown in Fig. 11 indicates an average error of less than 1% for the bone map and just under 2% for the brain map (100$\times$ the RMSE values can be interpreted as a percent error because the material maps have a value of 0 or 1). The main purpose of showing these plots is to quantify the difference between the TPL and LSQ data discrepancy terms. We would expect lower values of image RMSE for TPL-TV, because the simulated noise is generated by a transmission Poisson model. Indeed the image RMSE is lower for TPL-TV, and the gap between TPL-TV and LSQ-TV is larger for the looser TV constraints. We point out, however, that lower image RMSE may not translate into better image quality, because image quality depends on the imaging task for which the images are used. Task-based assessment would take into account features of the observed signal, noise texture, and possibly background texture and observer perception [34].

One of the benefits of using TV constraints instead of TV penalties is that the material maps reconstructed using the TPL and LSQ data discrepancy terms can be compared meaningfully. The TV constraint parameters yield material maps with exactly the chosen TVs, whereas with the penalization approach the penalty parameters must be searched to achieve equivalent TVs. Generating simulation results also becomes more efficient, because we can directly make use of the known phantom TV values.

IV-B Chest phantom studies with a mono-energetic image TV constraint


Figure 12: (Left) Chest phantom displayed at 70 keV in a gray scale window of [0, 0.5] cm$^{-1}$. (Right) Reconstruction by use of unregularized TPL. The estimated material maps are combined to form the shown monochromatic image estimate at 70 keV (gray scale is also [0, 1.0] cm$^{-1}$). For reference, the TV values of the phantom and unconstrained reconstructed image are 2,587 and 7,686, respectively.

For the final set of results we employ an anthropomorphic chest phantom created from segmentation of an actual CT chest image. Different tissue types and densities are labeled in the image, totaling 24 material/density combinations, including various soft tissues, calcified/bony regions, and Gadolinium contrast agent. To demonstrate the one-step algorithm on this more realistic phantom model, we select the TPL-monoTV optimization problem in Eq. (IV) for the material map reconstruction. The material basis is selected to be water, bone, and Gadolinium contrast agent. Using TPL-monoTV is simpler than TPL-TV in that only the energy for the mono-energetic image and a single TV constraint parameter are needed, instead of three parameters (the TV for each of the material maps). There are potential advantages to constraining the TV of the material maps individually, but the purpose here is to demonstrate use of the one-step algorithm, and accordingly we select the simpler optimization problem.


Figure 13: Estimated monochromatic images by use of TPL-monoTV. The left column shows the complete image in a gray scale window of [0, 0.5] cm$^{-1}$. The right column magnifies a region of interest (ROI) in the right lung, and the gray scale is narrowed to [0, 0.1] cm$^{-1}$ in order to see the soft tissue detail. The top set of images corresponds to the phantom. The location of the ROI is indicated in the left phantom image inset by use of the narrow [0, 0.1] cm$^{-1}$ gray scale. The second, third, and fourth rows correspond to images obtained with different TV constraints on the monoenergetic image at 70 keV.

For the chest phantom simulations, the scanning configuration is again 2D fan-beam CT, with a source-to-isocenter distance of 80 cm and a source-to-detector distance of 160 cm. The physical size of the phantom pixel array is $29\times 29$ cm$^2$. The number of projection views is 128 over a $2\pi$ scan. Five X-ray energy windows are simulated in the energy ranges [20, 50], [50, 60], [60, 80], [80, 100], and [100, 120] keV. The lowest energy window is selected to be wider than the other four to avoid photon starvation. Noise is added in the same way as in the previous simulation: the transmitted counts data follow a Poisson model with a total of $4\times 10^{6}$ photons per detector pixel. The monoenergetic image of the phantom at 70 keV, along with the unregularized image reconstruction by TPL, is shown in Fig. 12. The TPL mono-energetic image reconstruction demonstrates the impact of the simulated noise on the reconstructed image.

In Fig. 13 we show the resulting monoenergetic images from TPL-monoTV at three values of the TV constraint. The reconstructed images are shown globally in a wide gray scale window and in an ROI focused on the right lung in a narrow gray scale window. The values of the TV constraint are selected based on visualization of the fine structures in the lung; for viewing these features, relatively low values of TV are selected. We note that in the global images the same TV values show the high-contrast structures with few artifacts. We point out that the one-step algorithm yields three basis material maps (not shown) and that the mono-energetic images are formed by use of Eq. (11).
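For readers who want to reproduce this step, a minimal sketch of combining basis material maps into a monochromatic attenuation image is given below. It assumes the linear-combination form of Eq. (11), which is not reproduced here, and the map arrays and attenuation values are illustrative placeholders only.

```python
import numpy as np

def mono_image(maps, mu_E0):
    # maps: (n_mat, ny, nx) basis material maps; mu_E0: length-n_mat linear
    # attenuation coefficients of the basis materials at the display energy.
    # Assumes the monochromatic image is the linear combination
    # sum_m mu_m(E0) * f_m(x), taken here as the form of Eq. (11).
    return np.tensordot(mu_E0, maps, axes=1)

# Hypothetical water/bone/Gd maps and illustrative 70 keV coefficients.
maps = np.random.rand(3, 128, 128)
mu_70keV = np.array([0.19, 0.49, 1.5])   # placeholder values (cm^-1), not calibrated
img70 = mono_image(maps, mu_70keV)
print(img70.shape)
```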

The selected optimization problems and simulation parameters are chosen to demonstrate possible applications of the one-step algorithm for spectral CT. Comparison of the TPL and LSQ data discrepancy terms in Figs. 7 and 9 does show fewer artifacts for TPL-TV, where the simulated noise model matches the TPL likelihood. In practice, we may not see the same relative performance on real data: the simulations ignore some important physical factors of spectral CT, and image quality evaluation depends on the task for which the images are used.

V Conclusion

We have developed a constrained one-step algorithm for inverting spectral CT transmission data directly to basis material maps. The algorithm addresses the associated non-convex data discrepancy terms by employing a local convex quadratic upper bound to derive the descent step. While we have derived the algorithm for TPL and LSQ data discrepancy terms, the same strategy can be applied to derive the one-step algorithm for other data fidelities. The one-step algorithm derives from the convex-concave optimization algorithm, MOCCA, which we have developed for addressing an intermediate problem arising from use of the local convex quadratic approximation. The simulations demonstrate the one-step algorithm for TV-constrained data discrepancy minimization, where the TV constraints can be applied to the individual basis maps or to an estimated monochromatic X-ray attenuation map.

Future work will investigate the robustness of the one-step algorithm to data inconsistency due to spectral miscalibration error, X-ray scatter, and various physical processes involved in photon-counting detection. The one-step algorithm's ability to incorporate basis map constraints in the inversion process should provide a means to control artifacts due to such inconsistencies. We are also pursuing a generalization of the present algorithm to allow for auto-calibration of the spectral response of the CT system.

Acknowledgements

This work was supported by NIH Grants R21EB015094, CA158446, CA182264, and EB018102. The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health.

Appendix A Gradient of $L_{\text{TPL}}$

We derive the gradient in Eq. (13), motivating the definition of the linear transform $A$. Recall Eqs. (4) and (17):

\begin{aligned}
Z_{\ell i,mk} &= \mu_{mi}X_{\ell k}\\
\hat{c}_{w\ell}(f_{km}) &= N_{w\ell}\sum_{i}s_{w\ell i}\exp\left[-\sum_{mk}\mu_{mi}X_{\ell k}f_{km}\right]\\
&= N_{w\ell}\sum_{i}s_{w\ell i}\exp\left[-\sum_{mk}Z_{\ell i,mk}f_{km}\right]\\
&= N_{w\ell}\sum_{i}s_{w\ell i}\exp\left[-(Zf)_{\ell i}\right].
\end{aligned}

The gradient of $L_{\text{TPL}}$ is

\begin{aligned}
\nabla_{f}L_{\text{TPL}}(f) &:= \frac{\partial}{\partial f_{km}}\sum_{w\ell}\left[\hat{c}_{w\ell}(f)-c_{w\ell}-c_{w\ell}\log\left(\hat{c}_{w\ell}(f)/c_{w\ell}\right)\right]\\
&= \frac{\partial}{\partial f_{km}}\sum_{w\ell}\left[\hat{c}_{w\ell}(f_{km})-c_{w\ell}\log\hat{c}_{w\ell}(f_{km})\right]\\
&= \sum_{w\ell}\left[N_{w\ell}\sum_{i}s_{w\ell i}\,(-Z_{\ell i,mk})\exp[-(Zf)_{\ell i}]-c_{w\ell}\frac{\sum_{i}s_{w\ell i}\,(-Z_{\ell i,mk})\exp[-(Zf)_{\ell i}]}{\sum_{i^{\prime}}s_{w\ell i^{\prime}}\exp[-(Zf)_{\ell i^{\prime}}]}\right]\\
&= \sum_{\ell i}Z_{\ell i,mk}\sum_{w}\left[c_{w\ell}\frac{s_{w\ell i}\exp[-(Zf)_{\ell i}]}{\sum_{i^{\prime}}s_{w\ell i^{\prime}}\exp[-(Zf)_{\ell i^{\prime}}]}-N_{w\ell}s_{w\ell i}\exp[-(Zf)_{\ell i}]\right]\\
&= \sum_{\ell i}Z_{\ell i,mk}\sum_{w}\left(c_{w\ell}-\hat{c}_{w\ell}(f)\right)\frac{s_{w\ell i}\exp[-(Zf)_{\ell i}]}{\sum_{i^{\prime}}s_{w\ell i^{\prime}}\exp[-(Zf)_{\ell i^{\prime}}]}.
\end{aligned}

Continuing the algebraic manipulation, we insert the identity $\mathbf{I}_{\ell\ell^{\prime}}$:

\begin{aligned}
\nabla_{f}L_{\text{TPL}}(f) &= \sum_{\ell i}Z_{\ell i,mk}\sum_{w\ell^{\prime}}\mathbf{I}_{\ell\ell^{\prime}}\frac{s_{w\ell^{\prime}i}\exp[-(Zf)_{\ell^{\prime}i}]}{\sum_{i^{\prime}}s_{w\ell^{\prime}i^{\prime}}\exp[-(Zf)_{\ell^{\prime}i^{\prime}}]}\left(c_{w\ell^{\prime}}-\hat{c}_{w\ell^{\prime}}(f)\right)\\
&= \sum_{\ell i}Z_{\ell i,mk}\sum_{w\ell^{\prime}}A_{w\ell^{\prime},\ell i}(f)\,r_{w\ell^{\prime}}(f)=Z^{\top}A(f)^{\top}r(f).
\end{aligned}

The other necessary gradient and Hessian computations follow from similar manipulations.
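As a sanity check on the expression $\nabla_{f}L_{\text{TPL}}(f)=Z^{\top}A(f)^{\top}r(f)$, the short Python sketch below evaluates the final line of the derivation directly and compares one component against a central finite difference of $L_{\text{TPL}}$. All arrays are small random stand-ins for the spectral model quantities, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_E, n_win, n_rays, n_mat, n_pix = 30, 2, 16, 2, 25

# Random stand-ins using the index conventions of Appendix A.
s = rng.random((n_win, n_rays, n_E))
s /= s.sum(axis=2, keepdims=True)
N = np.full((n_win, n_rays), 1e5)
mu = rng.random((n_mat, n_E))
X = rng.random((n_rays, n_pix)) / n_pix
c = rng.random((n_win, n_rays)) * 1e5 + 1.0      # stand-in "measured" counts

def c_hat(f):
    Zf = np.einsum('mi,lk,km->li', mu, X, f)     # (Zf)_{l i}
    return N * np.einsum('wli,li->wl', s, np.exp(-Zf)), Zf

def L_TPL(f):
    ch, _ = c_hat(f)
    return float(np.sum(ch - c - c * np.log(ch / c)))

def grad_L_TPL(f):
    ch, Zf = c_hat(f)
    w_att = s * np.exp(-Zf)[None, :, :]          # s_{w l i} exp[-(Zf)_{l i}]
    A = w_att / w_att.sum(axis=2, keepdims=True) # normalized weights of A(f)
    r = c - ch                                   # residual, as in the final line above
    inner = np.einsum('wl,wli->li', r, A)        # sum over windows w
    return np.einsum('mi,lk,li->km', mu, X, inner)   # Z^T A(f)^T r(f), stored as f[k, m]

f0 = rng.random((n_pix, n_mat))
g = grad_L_TPL(f0)
eps, k, m = 1e-6, 3, 1
fp, fm = f0.copy(), f0.copy()
fp[k, m] += eps
fm[k, m] -= eps
print(g[k, m], (L_TPL(fp) - L_TPL(fm)) / (2 * eps))  # the two numbers should agree closely
```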

Appendix B Positive semidefiniteness of $\nabla^{2}_{+}L(f)$ and $\nabla^{2}_{-}L(f)$

Recall Eq. (18),

A_{w\ell,\ell^{\prime}i}(f)=\frac{s_{w\ell i}\exp\left[-(Zf)_{\ell i}\right]}{\sum_{i^{\prime}}s_{w\ell i^{\prime}}\exp\left[-(Zf)_{\ell i^{\prime}}\right]}\,\mathbf{I}_{\ell^{\prime}\ell}.

For ease of presentation, we collapse the double indices of $A$:

A_{s,t}(f)=A_{w\ell,\ell^{\prime}i}(f)\quad\text{where }s=w\cdot N_{\ell}+\ell,\;\;t=i\cdot N_{\ell}+\ell^{\prime}.

We show that

\operatorname{diag}\left(A(f)^{\top}b\right)-A(f)^{\top}\operatorname{diag}(b)A(f)\geq 0\quad\text{when }b_{s}\geq 0\;\forall\,s. \qquad (74)

This inequality can be used to prove that the Hessians in Eqs. (21), (22), (23), and (24) are positive semidefinite by setting $b$ equal to $r_{-}(f)$, $r_{+}(f)$, $r_{-}^{\text{(log)}}(f)$, and $r_{+}^{\text{(log)}}(f)$, respectively.

To prove Eq. (74), we expand $b$ in unit vectors

b=\sum_{s}b_{s}\hat{e}_{s}\quad\text{where}\quad\hat{e}_{s,s^{\prime}}=\begin{cases}1&s^{\prime}=s\\ 0&s^{\prime}\neq s,\end{cases}

and show for any vector $u$ that

u^{\top}\left[\operatorname{diag}\left(A(f)^{\top}\hat{e}_{s}\right)-A(f)^{\top}\operatorname{diag}(\hat{e}_{s})A(f)\right]u\geq 0. \qquad (75)

Fixing $s$, we define the vector $v$ with components $v_{t}=A(f)_{s,t}$, or, using the unit vector $\hat{e}_{s}$,

v=A^{\top}(f)\,\hat{e}_{s}.

From the definition of $A$, we note that $v_{t}\geq 0$ for all $t\in\{1,\dots,N_{t}\}$ and $\sum_{t}v_{t}=1$. By the definition of $v$, the left-hand side of Eq. (75), lhs, is

\begin{aligned}
\text{lhs} &= u^{\top}\left[\operatorname{diag}(v)-vv^{\top}\right]u\\
&= \sum_{t}v_{t}u_{t}^{2}-\left(\sum_{t}v_{t}u_{t}\right)^{2}\\
&= \sum_{t}v_{t}u_{t}^{2}-\left(\sum_{t}\sqrt{v_{t}}\cdot\sqrt{v_{t}u_{t}^{2}}\right)^{2},
\end{aligned}

and by the Cauchy-Schwarz inequality,

\begin{aligned}
\text{lhs} &\geq \sum_{t}v_{t}u_{t}^{2}-\left(\sum_{t}v_{t}\right)\cdot\left(\sum_{t}v_{t}u_{t}^{2}\right)\\
&= \sum_{t}v_{t}u_{t}^{2}-(1)\cdot\left(\sum_{t}v_{t}u_{t}^{2}\right)\\
&= 0.
\end{aligned}

This proves the inequality in Eq. (75).

Using Eq. (75), we prove the inequality in Eq. (74)

\operatorname{diag}\left(A(f)^{\top}b\right)-A(f)^{\top}\operatorname{diag}(b)A(f)=\sum_{s}b_{s}\left[\operatorname{diag}\left(A(f)^{\top}\hat{e}_{s}\right)-A(f)^{\top}\operatorname{diag}(\hat{e}_{s})A(f)\right]\geq 0, \qquad (76)

where the inequality follows by noting that $b_{s}\geq 0$ by assumption, so the sum is a linear combination of positive semidefinite matrices with non-negative coefficients.
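The inequality can also be checked numerically. The short sketch below builds a random matrix with the two properties of $A(f)$ used in the proof (non-negative entries and rows summing to one) and verifies that the smallest eigenvalue of the matrix in Eq. (74) is non-negative; the matrix sizes and entries are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)
n_s, n_t = 6, 8

# Random stand-in for A(f): non-negative entries with each row summing to one,
# the only properties of A(f) used in the proof above.
A = rng.random((n_s, n_t))
A /= A.sum(axis=1, keepdims=True)
b = rng.random(n_s)                       # b_s >= 0

M = np.diag(A.T @ b) - A.T @ np.diag(b) @ A
print(np.linalg.eigvalsh(M).min())        # expected: >= 0 up to round-off
```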

Appendix C Derivation of the one-step algorithm for TPL-TV and LSQ-TV with $\mu$-preconditioning

TV-constrained optimization

To derive the one-step algorithm used in the Results section, we write down the intermediate convex optimization problem that involves the first block of the local quadratic upper bound to $D_{\text{TPL}}$ or $D_{\text{LSQ}}$:

f^{\star}=\operatorname*{arg\,min}_{f}\left\{\frac{1}{2}(K_{1}f)^{\top}D_{1}K_{1}f-(K_{1}f-z_{0})^{\top}(b_{1}+E_{1}z_{0})\right\}\quad\text{such that}\quad\|f_{m}\|_{\text{TV}}\leq\gamma_{m}\;\forall\,m. \qquad (77)

That only the first block of the full quadratic expression is needed is explained in Sec. III-B4, and the forms of $D_{1}$, $E_{1}$, and $b_{1}$ given in Sec. III-B2 determine whether we are addressing TPL-TV or LSQ-TV. The data discrepancy term of this optimization problem is the same as the objective function of Eq. (58), but it differs from Eq. (58) in that we have added the convex constraints on the material map TV values. We write Eq. (77) using indicator functions (see Eq. (6)) to encode the TV constraints, and we introduce the $\mu$-preconditioning transformation described in Sec. III-C:

f^{\star}=\operatorname*{arg\,min}_{f^{\prime}}\left\{\frac{1}{2}(K^{\prime}_{1}f^{\prime})^{\top}D_{1}K^{\prime}_{1}f^{\prime}-(K^{\prime}_{1}f^{\prime}-z_{0})^{\top}(b_{1}+E_{1}z_{0})+\sum_{m}\delta\left(\left\|\left(P^{-1}f^{\prime}\right)_{m}\right\|_{\text{TV}}\leq\gamma_{m}\right)\right\}, \qquad (78)

where $f^{\prime}=Pf$ are the transformed ($\mu$-preconditioned) material maps from Sec. III-C. Note that the TV constraints apply to the untransformed material maps $f=P^{-1}f^{\prime}$.

Writing constrained TV optimization in the general form $F(Kx)+G(x)$

To derive the CP primal-dual algorithm, we write Eq. (78) in the form of Eq. (26). We note that all of the terms involve a linear transform of $f$, and accordingly we make the following assignments:

\begin{aligned}
F_{\text{convex}}(z_{0};z) &= F_{1}(z_{0};z_{\text{sino}})+F_{2}(z_{\text{grad}})\\
F_{1}(z_{\text{sino}}) &= \frac{1}{2}z^{\top}_{\text{sino}}D_{1}z_{\text{sino}}-(z_{\text{sino}}-z_{0})^{\top}(b_{1}+E_{1}z_{0})\\
F_{2}(z_{\text{grad}}) &= \sum_{m}\delta\left(\left\|\left(|z_{\text{grad},m}|\right)\right\|_{1}\leq\gamma_{m}\right)\\
G(f^{\prime}) &= 0,
\end{aligned}

where

z=\left(\begin{array}{c}z_{\text{sino}}\\ z_{\text{grad}}\end{array}\right)=\left(\begin{array}{c}K^{\prime}_{1}\\ \nabla P^{-1}\end{array}\right)f^{\prime}.

Note that we use the shorthand that the gradient operator, $\nabla$, applies to each of the material maps in the composite material map vector, $P^{-1}f^{\prime}$. The Legendre transform in Eq. (28) provides the necessary dual functions $F^{*}_{1}$, $F^{*}_{2}$, and $G^{*}$. By direct computation,

F^{*}_{1}(y_{\text{sino}})=\frac{1}{2}D_{1}^{-1}\|y_{\text{sino}}+b_{1}+E_{1}z_{0}\|^{2}_{2}+z_{0}^{\top}E_{1}z_{0}. \qquad (79)

From Sec. 3.1 of Ref. [27]

G^{*}(f)=\delta(f=0). \qquad (80)


Figure 14: Schematic illustrating the solution of $\max_{g_{m}^{\prime}}\left\{g_{m}^{\top}g_{m}^{\prime}-\delta\left(\|g^{\prime}_{m}\|_{1}\leq\gamma_{m}\right)\right\}$. The input vector $g_{m}$ and the maximizing vector $g^{\prime}_{m}$ are indicated on a 2D schematic, but the argument applies for the full $N_{k}$-D space of $g_{m}$. Because $g^{\prime}_{m}$ is a vector of magnitudes, each component is non-negative, $g^{\prime}_{m,k}\geq 0$. The indicator function confines $g^{\prime}_{m}$ below the line (hyper-plane) $\sum_{k}g^{\prime}_{m,k}=\gamma_{m}$. The combination of these constraints confines $g^{\prime}_{m}$ to the shaded triangle in the schematic. The maximizer $g^{\prime}_{m}$ is the vector that maximizes the dot product $g_{m}^{\top}g_{m}^{\prime}$ (or, equivalently, the projection of $g^{\prime}_{m}$ onto $g_{m}$, as indicated by the dashed line from the head of $g^{\prime}_{m}$ to the arrow indicating $g_{m}$). Maximization of this dot product is achieved by choosing $g^{\prime}_{m}=\gamma_{m}\hat{e}_{k\text{-max}}$, i.e., aligned along the unit vector corresponding to the largest component of $g_{m}$. The largest component of $g_{m}$ is also known as the "infinity-norm", $\|g_{m}\|_{\infty}$. Thus we have $(\gamma_{m}\hat{e}_{k\text{-max}})^{\top}g_{m}=\gamma_{m}\|g_{m}\|_{\infty}$.
Convex conjugate of $F_{2}$

We sketch the derivation of $F^{*}_{2}(y_{\text{grad}})$; for this derivation we drop the "grad" subscript.

\begin{aligned}
F^{*}_{2}(y) &= \max_{y^{\prime}}\left\{y^{\top}y^{\prime}-\sum_{m}\delta\left(\left\|\left(|y^{\prime}_{m}|\right)\right\|_{1}\leq\gamma_{m}\right)\right\}\\
&= \max_{g^{\prime}}\left\{g^{\top}g^{\prime}-\sum_{m}\delta\left(\left\|g^{\prime}_{m}\right\|_{1}\leq\gamma_{m}\right)\right\}\quad\text{where }g=|y|\text{ and }g^{\prime}=|y^{\prime}|,\\
&= \max_{g^{\prime}}\sum_{m}\left\{g_{m}^{\top}g_{m}^{\prime}-\delta\left(\left\|g^{\prime}_{m}\right\|_{1}\leq\gamma_{m}\right)\right\}\\
&= \sum_{m}\max_{g_{m}^{\prime}}\left\{g_{m}^{\top}g_{m}^{\prime}-\delta\left(\left\|g^{\prime}_{m}\right\|_{1}\leq\gamma_{m}\right)\right\}.
\end{aligned}

The Legendre transform maximization over the variable $y^{\prime}$, dual to the material map gradients, reduces to a maximization over the spatial magnitude $g^{\prime}=|y^{\prime}|$, because the indicator function is independent of the spatial direction of $y^{\prime}$ and the term $y^{\top}y^{\prime}$ is maximized when the spatial direction of $y^{\prime}$ lines up with that of $y$; hence the term $y^{\top}y^{\prime}$ is replaced by $g^{\top}g^{\prime}$, which we explicitly write as a sum over the material index $m$. The maximization and summation order can be switched, because each of the terms in the summation is independent of the others. Evaluation of the maximization over $g^{\prime}_{m}$ can be seen in the diagram shown in Fig. 14. Accordingly, we find

F^{*}_{2}(y_{\text{grad}})=\sum_{m}\gamma_{m}\left\|\left(|y_{\text{grad},m}|\right)\right\|_{\infty}. \qquad (81)
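The closed form in Eq. (81) can also be verified numerically for a single material map. The sketch below maximizes $g^{\top}g^{\prime}$ over the set pictured in Fig. 14 with an off-the-shelf linear-programming solver and compares the result with $\gamma\|g\|_{\infty}$; the vector $g$ and the value of $\gamma$ are arbitrary test inputs.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
g = rng.random(5)        # a vector of magnitudes, g >= 0
gamma = 2.0

# Maximize g^T g' subject to g' >= 0 and sum(g') <= gamma (the shaded region
# of Fig. 14), posed as a linear program by minimizing -g^T g'.
res = linprog(-g, A_ub=np.ones((1, g.size)), b_ub=[gamma], bounds=(0, None))

print(-res.fun, gamma * np.abs(g).max())  # the two values should match
```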
Dual maximization of Eq. (78)

Using Eqs. (33), (79), (80), and (81), we obtain the maximization dual to Eq. (77)

y^{\star}=\operatorname*{arg\,max}_{y}\left\{-\frac{1}{2}D^{-1}_{1}\|y_{\text{sino}}+b_{1}+E_{1}z_{0}\|_{2}^{2}-z^{\top}_{0}E_{1}z_{0}-\sum_{m}\gamma_{m}\left\|\left(|y_{\text{grad},m}|\right)\right\|_{\infty}\right\}\quad\text{such that}\quad\left(\begin{array}{c}K^{\prime}_{1}\\ \nabla P^{-1}\end{array}\right)^{\top}y=0. \qquad (82)

The objective functions of the primal and dual problems, in Eqs. (77) and (82) respectively, are needed to generate the conditional primal-dual gap plots in Fig. 3.

The material map TV proximity step

In order to derive the TPL-TV and LSQ-TV one-step algorithms, we need to solve the proximity minimization in Eq. (54):

y^{(n+1)}=\operatorname*{arg\,min}_{y^{\prime}}\left\{F_{\text{convex}}^{*}\left(z^{(n+1)}_{0};y^{\prime}\right)+\frac{1}{2}\left\|\Sigma_{\text{grad}}^{-1/2}\left(y^{(n)}+(\Sigma_{\text{sino}}K^{\prime}_{1}+\Sigma_{\text{grad}}\nabla P^{-1})\bar{x}^{(n)}-y^{\prime}\right)\right\|^{2}_{2}\right\}.

The proximity problem splits into "sino" and "grad" sub-problems, and the "sino" sub-problem results in Eq. (63). We solve here the "grad" proximity optimization to obtain the pseudocode for TPL-TV and LSQ-TV:

y_{\text{grad}}^{(n+1)}=\operatorname*{arg\,min}_{y_{\text{grad}}^{\prime}}\left\{F_{2}^{*}\left(y_{\text{grad}}^{\prime}\right)+\frac{1}{2}\left\|\Sigma_{\text{grad}}^{-1/2}\left(y^{+}_{\text{grad}}-y_{\text{grad}}^{\prime}\right)\right\|^{2}_{2}\right\}\quad\text{where}\quad y^{+}_{\text{grad}}=y_{\text{grad}}^{(n)}+\Sigma_{\text{grad}}\nabla P^{-1}\bar{x}^{(n)}.

Dropping the "grad" subscript on $y^{\prime}$, we employ the Moreau identity, which relates the proximity optimizations of a function and its dual:

\operatorname*{arg\,min}_{y^{\prime}}\left\{F_{2}^{*}\left(y^{\prime}\right)+\frac{1}{2}\left\|\Sigma_{\text{grad}}^{-1/2}\left(y^{+}_{\text{grad}}-y^{\prime}\right)\right\|^{2}_{2}\right\}=y^{+}_{\text{grad}}-\Sigma_{\text{grad}}^{1/2}\operatorname*{arg\,min}_{y^{\prime}}\left\{F_{2}\left(y^{\prime}\Sigma_{\text{grad}}^{-1/2}\right)+\frac{1}{2}\left\|y^{+}_{\text{grad}}\Sigma_{\text{grad}}^{-1/2}-y^{\prime}\right\|^{2}_{2}\right\}. \qquad (83)

The dual "grad" update separates into the individual material map $m$ components:

y_{\text{grad},m}^{(n+1)}=y^{+}_{\text{grad},m}-\Sigma_{\text{grad},m}^{1/2}\operatorname*{arg\,min}_{y^{\prime}_{m}}\left\{\delta\left(\left\|\left(\left|y^{\prime}_{m}\Sigma_{\text{grad},m}^{-1/2}\right|\right)\right\|_{1}\leq\gamma_{m}\right)+\frac{1}{2}\left\|y^{+}_{\text{grad},m}\Sigma_{\text{grad},m}^{-1/2}-y^{\prime}_{m}\right\|^{2}_{2}\right\}.

To simplify the proximity minimization we set

\begin{aligned}
g^{+}_{m} &= \left|y^{+}_{\text{grad},m}\Sigma_{\text{grad},m}^{-1/2}\right|,\qquad\hat{g}^{+}_{m}=\left(y^{+}_{\text{grad},m}\Sigma_{\text{grad},m}^{-1/2}\right)/g^{+}_{m},\qquad g^{\prime}_{m}=\left|y^{\prime}_{m}\right|,\\
y_{\text{grad},m}^{(n+1)} &= y^{+}_{\text{grad},m}-\Sigma_{\text{grad},m}^{1/2}\,\hat{g}^{+}_{m}\operatorname*{arg\,min}_{g^{\prime}_{m}}\left\{\delta\left(\left\|g^{\prime}_{m}\Sigma_{\text{grad},m}^{-1/2}\right\|_{1}\leq\gamma_{m}\right)+\frac{1}{2}\left\|g^{+}_{m}-g^{\prime}_{m}\right\|^{2}_{2}\right\}.
\end{aligned}

The proximity minimization is a projection of $g^{+}_{m}$ onto a weighted $\ell_{1}$-ball:

\begin{aligned}
y_{\text{grad},m}^{(n+1)} &= y^{+}_{\text{grad},m}-w\,\hat{g}^{+}_{m}\,\text{Proj}\left(g^{+}_{m};\left\{g,\;\|g/w\|_{1}\leq\gamma_{m}\right\}\right)\\
&\quad\text{where }w=\Sigma_{\text{grad},m}^{1/2}.
\end{aligned}

If $g$ is inside the weighted $\ell_{1}$-ball, i.e., $\|g/w\|_{1}\leq\gamma$, the function $\text{Proj}\left(g;\left\{g,\|g/w\|_{1}\leq\gamma\right\}\right)$ returns $g$. If $g$ is outside the weighted $\ell_{1}$-ball, i.e., $\|g/w\|_{1}>\gamma$, there exists an $\alpha_{0}$ such that

\begin{aligned}
\text{Proj}\left(g;\left\{g,\;\|g/w\|_{1}\leq\gamma_{m}\right\}\right) &= \operatorname*{arg\,min}_{g^{\prime}}\left\{\delta\left(\left\|g^{\prime}/w\right\|_{1}\leq\gamma\right)+\frac{1}{2}\left\|g-g^{\prime}\right\|^{2}_{2}\right\}\\
&= \operatorname*{arg\,min}_{g^{\prime}}\left\{\alpha_{0}\left\|g^{\prime}/w\right\|_{1}+\frac{1}{2}\left\|g-g^{\prime}\right\|^{2}_{2}\right\}\\
&= \max\left(g-\alpha_{0}w,0\right).
\end{aligned}

The parameter $\alpha_{0}$ is defined implicitly by

\|\max\left(g-\alpha_{0}w,0\right)\|_{1}=\gamma,

and it can be determined by any standard root finding technique applied to

f(\alpha)=0\quad\text{where}\quad f(\alpha)=\|\max\left(g-\alpha w,0\right)\|_{1}-\gamma,

where the search interval is $\alpha\in[0,\|g/w\|_{\infty}]$.
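A minimal sketch of this root-finding projection is given below, specialized to unit weights ($w=1$), where the formulas above reduce to the standard projection of a non-negative vector onto the $\ell_{1}$-ball; the weighted case follows the same bisection on $\alpha$ with the weighted shrinkage and norm. The tolerance and test vector are arbitrary.

```python
import numpy as np

def project_l1_ball_nonneg(g, gamma, tol=1e-12):
    # Project a non-negative vector g onto {g' >= 0 : ||g'||_1 <= gamma} by
    # bisection on the shrinkage parameter alpha (the equations above with
    # unit weights, w = 1).
    g = np.maximum(g, 0.0)
    if g.sum() <= gamma:
        return g                                  # already inside the ball
    def f_alpha(alpha):                           # ||max(g - alpha, 0)||_1 - gamma
        return np.maximum(g - alpha, 0.0).sum() - gamma
    lo, hi = 0.0, float(g.max())                  # search interval from the text
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_alpha(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return np.maximum(g - 0.5 * (lo + hi), 0.0)

# Arbitrary test vector; after projection the l1 norm equals gamma.
g_test = np.array([0.9, 0.1, 0.5, 0.3])
p = project_l1_ball_nonneg(g_test, gamma=1.0)
print(p, p.sum())
```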

The pseudocode for TPL-TV and LSQ-TV

Having derived the TV constraint proximity step, we are in a position to write the complete pseudocode for the one-step algorithm including the TV constraints. We do employ the $\mu$-preconditioning that orthogonalizes the linear attenuation coefficients, but we drop the prime notation on $f$ and $K$.

\begin{aligned}
f_{0} &= \bar{f}^{(n)}\\
\Sigma^{(n)} &= \left(\begin{array}{c}|K_{1}(f_{0})|\\ |\nabla P^{-1}|\end{array}\right)\mathbf{1}/\lambda;\qquad T^{(n)}=\lambda\left(\begin{array}{c}|K_{1}(f_{0})|\\ |\nabla P^{-1}|\end{array}\right)^{\top}\mathbf{1} \qquad (88)\\
w_{m} &= \left(\Sigma_{\text{grad},m}^{(n)}\right)^{1/2}\;\forall\,m\\
z^{(n+1)}_{0} &= \left(\Sigma_{\text{sino}}^{(n)}\right)^{-1}\left(y_{\text{sino}}^{(n-1)}-y_{\text{sino}}^{(n)}+\Sigma_{\text{sino}}^{(n)}K_{1}(f_{0})\bar{f}^{(n-1)}\right)\\
y_{\text{sino}}^{(n+1)} &= \left(D_{1}(f_{0})+\Sigma_{\text{sino}}^{(n)}\right)^{-1}\left[D_{1}(f_{0})\left(y_{\text{sino}}^{(n)}+\Sigma_{\text{sino}}^{(n)}K_{1}(f_{0})\bar{f}^{(n)}\right)-\Sigma_{\text{sino}}^{(n)}\left(b_{1}(f_{0})+E_{1}(f_{0})z^{(n+1)}_{0}\right)\right]\\
y^{+}_{\text{grad},m} &= y_{\text{grad},m}^{(n)}+\Sigma_{\text{grad},m}^{(n)}\nabla(P^{-1}\bar{f}^{(n)})_{m}\;\forall\,m\\
g^{+}_{m} &= \left|y^{+}_{\text{grad},m}/w_{m}\right|;\qquad\hat{g}^{+}_{m}=\left(y^{+}_{\text{grad},m}/w_{m}\right)/g^{+}_{m}\;\forall\,m\\
y_{\text{grad},m}^{(n+1)} &= y^{+}_{\text{grad},m}-w_{m}\,\hat{g}^{+}_{m}\,\text{Proj}\left(g^{+}_{m};\left\{g,\;\|g/w_{m}\|_{1}\leq\gamma_{m}\right\}\right)\;\forall\,m\\
f^{(n+1)} &= \bar{f}^{(n)}-T^{(n)}\left(\begin{array}{c}K_{1}(f_{0})\\ \nabla P^{-1}\end{array}\right)^{\top}\left(\begin{array}{c}y_{\text{sino}}^{(n+1)}\\ y_{\text{grad}}^{(n+1)}\end{array}\right) \qquad (93)\\
\bar{f}^{(n+1)} &= 2f^{(n+1)}-f^{(n)}
\end{aligned}

The final material maps after $N$ iterations are obtained by applying the inverse preconditioner:

\text{return } P^{-1}f^{(N)}.

For all the results presented in the article, all variables are initialized to zero.

References

  • [1] K. Taguchi and J. S. Iwanczyk, “Vision 20/20: Single photon counting x-ray detectors in medical imaging,” Med. Phys., vol. 40, p. 100901, 2013.
  • [2] R. E. Alvarez and A. Macovski, “Energy-selective reconstructions in X-ray computerised tomography,” Phys. Med. Biol., vol. 21, pp. 733–744, 1976.
  • [3] P. M. Shikhaliev, “Energy-resolved computed tomography: first experimental results,” Phys. Med. Biol., vol. 53, pp. 5595–5613, 2008.
  • [4] T. G. Schmidt, “Optimal “image-based” weighting for energy-resolved CT,” Med. Phys., vol. 36, pp. 3018–3027, 2009.
  • [5] A. M. Alessio and L. R. MacDonald, “Quantitative material characterization from multi-energy photon counting CT,” Med. Phys., vol. 40, p. 031108, 2013.
  • [6] E. Roessl and R. Proksa, “K-edge imaging in x-ray computed tomography using multi-bin photon counting detectors,” Phys. Med. Biol., vol. 52, pp. 4679–4696, 2007.
  • [7] J.-P. Schlomka, E. Roessl, R. Dorscheid, S. Dill, G. Martens, T. Istel, C. Bäumer, C. Herrmann, R. Steadman, G. Zeitler, A. Livne, and R. Proksa, “Experimental feasibility of multi-energy photon-counting K-edge imaging in pre-clinical computed tomography,” Phys. Med. Biol., vol. 53, pp. 4031–4047, 2008.
  • [8] E. Roessl, D. Cormode, B. Brendel, K.-J. Engel, G. Martens, A. Thran, Z. Fayad, and R. Proksa, “Preclinical spectral computed tomography of gold nano-particles,” Nucl. Inst. Meth. A, vol. 648, pp. S259–S264, 2011.
  • [9] D. P. Cormode, E. Roessl, A. Thran, T. Skajaa, R. E. Gordon, J.-P. Schlomka, V. Fuster, E. A. Fisher, W. J. M. Mulder, R. Proksa, and Z. A. Fayad, “Atherosclerotic plaque composition: Analysis with multicolor CT and targeted Gold nanoparticles,” Radiology, vol. 256, pp. 774–782, 2010.
  • [10] E. Roessl, B. Brendel, K.-J. Engel, J.-P. Schlomka, A. Thran, and R. Proksa, “Sensitivity of photon-counting based K-edge imaging in X-ray computed tomography.” IEEE Trans. Med. Imaging, vol. 30, pp. 1678–1690, 2011.
  • [11] C. O. Schirra, E. Roessl, T. Koehler, B. Brendal, A. Thran, D. Pan, M. A. Anastasio, and R. Proksa, “Statistical reconstruction of material decomposed data in spectral CT,” IEEE Trans. Med. Imaging, vol. 32, pp. 1249–1257, 2013.
  • [12] G. N. Hounsfield, “Computerized transverse axial scanning (tomography): Part 1. Description of system,” Brit. J. Radiol., vol. 46, pp. 1016–1022, 1973.
  • [13] C. Maaß, M. Baer, and M. Kachelrieß, “Image-based dual energy CT using optimized precorrection functions: A practical new approach of material decomposition in image domain,” Med. Phys., vol. 36, pp. 3818–3829, 2009.
  • [14] R. A. Brooks, “A quantitative theory of the Hounsfield unit and its application to dual energy scanning.” J. Comp. Assist. Tomography, vol. 1, pp. 487–493, 1977.
  • [15] J. A. Fessler, I. A. Elbakri, P. Sukovic, and N. H. Clinthorne, “Maximum-likelihood dual-energy tomographic image reconstruction,” in Proc. SPIE, vol. 4684, 2002, pp. 38–49.
  • [16] I. A. Elbakri and J. A. Fessler, “Statistical image reconstruction for polyenergetic X-ray computed tomography,” IEEE Trans. Med. Imaging, vol. 21, pp. 89–99, 2002.
  • [17] J. Chung, J. G. Nagy, and I. Sechopoulos, “Numerical algorithms for polyenergetic digital breast tomosynthesis reconstruction,” SIAM J. Imaging Sci., vol. 3, no. 1, pp. 133–152, 2010.
  • [18] C. Cai, T. Rodet, S. Legoupil, and A. Mohammad-Djafari, “A full-spectral Bayesian reconstruction approach based on the material decomposition model applied in dual-energy computed tomography,” Med. Phys., vol. 40, p. 111916, 2013.
  • [19] Z. Ruoqiao, J.-B. Thibault, C.-A. Bouman, K. D. Sauer., and J. Hsieh, “Model-based iterative reconstruction for dual-energy X-ray CT using a joint quadratic likelihood model,” IEEE Trans. Med. Imaging, vol. 33, pp. 117–134, 2014.
  • [20] Y. Long and J. A. Fessler, “Multi-material decomposition using statistical image reconstruction for spectral CT,” IEEE Trans. Med. Imaging, vol. 33, pp. 1614–1626, 2014.
  • [21] A. Sawatsky, Q. Xu, C. O. Schirra, and M. A. Anastasio, “Proximal admm for multi-channel image reconstruction in spectral X-ray CT,” IEEE Trans. Med. Imaging, vol. 33, pp. 1657–1668, 2014.
  • [22] K. Nakada, K. Taguchi, G. S. K. Fung, and K. Armaya, “Joint estimation of tissue types and linear attenuation coefficients for photon counting CT,” Med. Phys., vol. 42, pp. 5329–5341, 2015.
  • [23] J. H. Hubbell and S. M. Seltzer, “Tables of x-ray mass attenuation coefficients and mass energy-absorption coefficients 1 keV to 20 MeV for elements Z= 1 to 92 and 48 additional substances of dosimetric interest,” National Inst. of Standards and Technology-PL, Gaithersburg, MD (United States). Ionizing Radiation Div., Tech. Rep., 1995.
  • [24] D. S. Rigie and P. J. L. Rivière, “Joint reconstruction of multi-channel, spectral CT data via constrained total nuclear variation minimization,” Phys. Med. Biol., vol. 60, pp. 1741–1762, 2015.
  • [25] A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” J. Math. Imag. Vis., vol. 40, pp. 120–145, 2011.
  • [26] T. Pock and A. Chambolle, “Diagonal preconditioning for first order primal-dual algorithms in convex optimization,” in International Conference on Computer Vision (ICCV 2011). Barcelona, Spain: IEEE, 2011, pp. 1762–1769.
  • [27] E. Y. Sidky, J. H. Jørgensen, and X. Pan, “Convex optimization problem prototyping for image reconstruction in computed tomography with the Chambolle-Pock algorithm,” Phys. Med. Biol., vol. 57, pp. 3065–3091, 2012.
  • [28] E. Y. Sidky, R. Chartrand, J. M. Boone, and X. Pan, “Constrained TpV minimization for enhanced exploitation of gradient sparsity: Application to CT image reconstruction,” J. Trans. Engineer. Health Med., vol. 2, p. 1800418, 2014.
  • [29] J. S. Jørgensen and E. Y. Sidky, “How little data is enough? Phase-diagram analysis of sparsity-regularized X-ray CT,” Phil. Trans. Royal Soc. A, vol. 373, p. 20140387, 2015.
  • [30] R. F. Barber and E. Y. Sidky, “MOCCA: mirrored convex/concave optimization for nonconvex composite functions,” 2015, http://arxiv.org/abs/1510.08842.
  • [31] J. S. Jørgensen, E. Y. Sidky, and X. Pan, “Quantifying admissible undersampling for sparsity-exploiting iterative image reconstruction in X-ray CT,” IEEE Trans. Med. Imaging, vol. 32, pp. 460–473, 2013.
  • [32] E. Y. Sidky, C.-M. Kao, and X. Pan, “Accurate image reconstruction from few-views and limited-angle data in divergent-beam CT,” J. X-ray Sci. Tech., vol. 14, pp. 119–139, 2006.
  • [33] E. Y. Sidky and X. Pan, “Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization,” Phys. Med. Biol., vol. 53, pp. 4777–4807, 2008.
  • [34] H. H. Barrett and K. J. Myers, Foundations of Image Science. Hoboken, NJ: John Wiley & Sons, 2004.