
A short tutorial on Wirtinger Calculus with applications in quantum information

Kelvin Koor (cqtkjkk@nus.edu.sg), Centre for Quantum Technologies, National University of Singapore
Yixian Qiu (yixian_qiu@u.nus.edu), Centre for Quantum Technologies, National University of Singapore
Leong Chuan Kwek (kwekleongchuan@nus.edu.sg), Centre for Quantum Technologies, National University of Singapore; MajuLab, CNRS-UNS-NUS-NTU International Joint Research Unit, UMI 3654; Quantum Science and Engineering Centre, Nanyang Technological University
Patrick Rebentrost (cqtfpr@nus.edu.sg), Centre for Quantum Technologies, National University of Singapore
Abstract

The optimization of system parameters is a ubiquitous problem in science and engineering. The traditional approach involves setting to zero the partial derivatives of the objective function with respect to each parameter, in order to extract the optimal solution. However, the system parameters often take the form of complex matrices. In such situations, conventional methods become unwieldy. The ‘Wirtinger Calculus’ provides a relatively simple methodology for such optimization problems. In this tutorial, we provide a pedagogical introduction to Wirtinger Calculus. To illustrate the utility of this framework in quantum information theory, we also discuss a few example applications.

1 Introduction

The optimization of system parameters is a ubiquitous problem in science and engineering. For an analytical solution, the partial derivatives of the objective function with respect to the parameters are set to zero. Alternatively, gradient descent can be applied iteratively, yielding increasingly accurate approximations to the optimal solution. Either way, this involves taking the (partial) derivatives of the objective function. Often, however, the problem is formulated in terms of complex-valued parameters. While the complex derivatives of complex functions play an important role in complex analysis and its numerous applications to science and engineering, they are much less useful in the optimization of real-valued functions of complex parameters. (Objective functions are real- and not complex-valued because there is no ordering, $<$ and $>$, on the field of complex numbers, and thus no notion of minimization/maximization.) Furthermore, these parameters often come in the form of matrices, whose structure we would like to preserve for a deeper understanding of the problem.

In short, we are dealing with real-valued functions of complex matrices. How do we optimize such functions? In principle we could convert everything into real numbers: $M_{N}(\mathbb{C})\cong\mathbb{R}^{2N^{2}}$, so we could view $f$ as a function of $2N^{2}$ real parameters and implement conventional optimization methods. However, this conversion is generally tedious, and the resulting expression for $f$ cumbersome. The 'Wirtinger Calculus' provides a relatively simple methodology for the optimization of such functions, through the use of 'Wirtinger derivatives'. This calculus is best viewed as a bookkeeping device. It justifies, and enables us to, 'differentiate as usual' (almost) with respect to the complex matrix parameter as a whole, as opposed to $2N^{2}$ distinct real parameters. Any dependencies among the elements of the input matrix (e.g. when the matrix is Hermitian) can also be accounted for via appropriate modifications.

The Wirtinger Calculus was developed principally by Austrian mathematician Wilhelm Wirtinger [Wir27] in his paper on functions of several complex variables, although according to [Rem91] the concept goes back to Poincaré [Poi98]. It was subsequently rediscovered and extended by the electrical engineering community [Bra83, VDB94] for the purposes of optimizing real-valued functions with complex inputs. A compendium on this topic can be found in Hjørungnes’ text [Hjø11], alongside a concise summary [HG07]. Other good resources include [KD09], Chapter 1 (Part 4) of [Rem91] and [GK06]. Our exposition in this article is elementary, and only requires an understanding of the basics of multivariable calculus, linear algebra and complex analysis.

1.1 Motivation

The objects of interest in this discussion are real-valued functions with (generally complex) matrix inputs. We begin with a simple example that is not entirely trivial. Consider the function

$$f:M_{2}(\mathbb{R})\longrightarrow\mathbb{R},\qquad\mathbf{X}\mapsto\operatorname{Tr}(\mathbf{X}^{2}). \tag{1.1}$$

How do we optimize this function? The obvious way would be to observe that for the purposes of optimization, the matrix structure of $\mathbf{X}$ can be disregarded. What matters is that $\mathbf{X}$ essentially comprises a set of parameters, and $f$ is a multivariable function. If we write $\mathbf{X}=\begin{bmatrix}a&b\\ c&d\end{bmatrix}$, then after identifying $\mathbf{X}$ with the vector $(a,b,c,d)$ we can express $f$ as (strictly speaking, the functions in Equations 1.1 and 1.2 are not the same, simply because they have different domains, $M_{2}(\mathbb{R})$ versus $\mathbb{R}^{4}$)

$$f(a,b,c,d)=a^{2}+d^{2}+2bc. \tag{1.2}$$

Conventional optimization methods can then be implemented accordingly. (Note that constraints may have to be further imposed on $\mathbf{X}$ to ensure the existence of solutions. For example, in Equation 1.1 above, simply by rescaling $\mathbf{X}$ we can send $f(\mathbf{X})$ to $\pm\infty$, so no optimal solution exists. To remedy this we could impose, say, $\operatorname{Tr}\mathbf{X}=1$.)
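The identity behind Equation 1.2 is easy to check numerically. The following small sketch (our illustration, not part of the paper) verifies $\operatorname{Tr}(\mathbf{X}^{2})=a^{2}+d^{2}+2bc$ for a randomly drawn $2\times 2$ real matrix:

```python
import numpy as np

# Verify Tr(X^2) = a^2 + d^2 + 2bc for a generic 2x2 real matrix (Eq. 1.2).
rng = np.random.default_rng(0)
a, b, c, d = rng.standard_normal(4)
X = np.array([[a, b], [c, d]])

lhs = np.trace(X @ X)        # matrix form, Eq. 1.1
rhs = a**2 + d**2 + 2*b*c    # flattened form, Eq. 1.2
assert np.isclose(lhs, rhs)
```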

One quickly observes the impracticality of this approach except in the simplest of cases. Even if the form of $f$ is relatively simple, like the one above, a few issues persist. For instance, if $\mathbf{X}\in M_{n}(\mathbb{R})$ for general $n$, then as $n$ increases the corresponding analog of $f$ in Equation 1.2 becomes considerably messier. The same issues arise, only worse, had the field been $\mathbb{C}$ instead of $\mathbb{R}$, or if there were constraints among the elements of $\mathbf{X}$ (if $\mathbf{X}\in M_{n}(\mathbb{R})$ is symmetric, say, then the number of Lagrange multipliers required grows as $\Theta(n^{2})$). But perhaps the most salient deficiency lies beyond the mathematics: that $\mathbf{X}$ is a matrix (and not just a collection of numbers) often holds significant physical meaning, and one would like to preserve this structure to gain further insight into the problem. This is perhaps best illustrated with an example. Suppose the optimal solution to some problem (similar in flavour to Problem 1.1 above) is given by $\mathbf{X}^{\star}=\begin{bmatrix}a^{\star}&b^{\star}\\ c^{\star}&d^{\star}\end{bmatrix}$ with

$$\begin{aligned}
a^{\star}&=p^{3}-p^{2}+2pqr+qrs-qr\\
b^{\star}&=p^{2}q+q^{2}r+pqs+qs^{2}-pq-qs\\
c^{\star}&=p^{2}r+qr^{2}+prs+rs^{2}-pr-ps\\
d^{\star}&=s^{3}-s^{2}+2qrs+pqr-qr
\end{aligned}$$

where $\mathbf{Y}=\begin{bmatrix}p&q\\ r&s\end{bmatrix}$ is a matrix involved in the specification of the problem. One suspects it might be possible to express $\mathbf{X}^{\star}$ in terms of $\mathbf{Y}$, but it is not immediately obvious how. (Here $\mathbf{X}^{\star}=\mathbf{Y}^{3}-\mathbf{Y}^{2}$; the solution form is admittedly rather artificial. For a 'realistic' example see Example 4.3 below.) By using the 'obvious' method above, we have obscured potentially important information in the form of an explicit connection between $\mathbf{X}^{\star}$ and $\mathbf{Y}$.

Wouldn't it be nice if we could somehow differentiate $f(\mathbf{X})$ with respect to $\mathbf{X}$ as a whole? That is, we obtain something like

$$\frac{df(\mathbf{X})}{d\mathbf{X}}=f^{\prime}(\mathbf{X})$$

for some matrix function $f^{\prime}$, which we can call the matrix derivative of $f$. Then by setting $f^{\prime}(\mathbf{X})=0$ we could hopefully extract the optimal $\mathbf{X}$, in a similar flavour to the procedure taught in elementary calculus. Showing that this can be done for a large class of interesting and relevant functions, and how to do it, is the focus of this article.

1.2 Preliminaries and notation

We define the following notation. Let $\mathbb{N}=\{1,2,\dots\}$ be the set of positive natural numbers. For $n\in\mathbb{N}$, $[n]=\{1,2,\dots,n\}$. If $z\in\mathbb{C}$, $z^{*}\in\mathbb{C}$ denotes its complex conjugate. $M_{n}(\mathbb{F})$ or $\mathbb{F}^{n\times n}$ denotes the set of $n\times n$ matrices over the field $\mathbb{F}$, where $\mathbb{F}=\mathbb{R}$ or $\mathbb{C}$. We denote a Hilbert space by $\mathcal{H}$ ($\mathcal{H}_{N}$ if its dimension is to be explicitly specified), the set of linear operators on $\mathcal{H}$ by $\mathcal{L}(\mathcal{H})$, and the set of density operators on $\mathcal{H}$ by $\mathcal{D}(\mathcal{H})$. $\|\cdot\|_{F}$ is the Frobenius norm. The symbol $\odot$ denotes the component-wise product: for vectors $(v\odot w)_{i}=v_{i}w_{i}$, for matrices $(A\odot B)_{ij}=A_{ij}B_{ij}$. Throughout most of this text, we adopt the convention whereby scalar quantities are denoted by lowercase symbols ($x$), vector quantities by lowercase boldface ($\mathbf{x}$) and matrix quantities by capital boldface ($\mathbf{X}$).

We will not be pedantic with domain/codomain issues. While we generally write $f:\mathbb{C}\longrightarrow\mathbb{C}$ or $f:\mathbb{R}^{n}\longrightarrow\mathbb{R}^{m}$, points where $f$ is not well-defined are implicitly excluded from the domain; e.g. if $f(z)=1/z$ then the domain of $f$ is understood to be $\mathbb{C}\setminus\{0\}$. Similarly, a function is complex-differentiable at a point if its complex derivative exists at that point, and it is holomorphic on an open set if it is complex-differentiable at every point of that set. Holomorphy is strictly speaking a stronger condition than complex-differentiability, but we shall simply use the two terms interchangeably.

1.3 Real vs. Complex Differentiability

As will be shown below in Proposition 2.6, non-constant real-valued functions of complex numbers cannot be complex-differentiable, in the sense that the quantity

$$\frac{df(z)}{dz}=\lim_{h\rightarrow 0}\frac{f(z+h)-f(z)}{h}$$

is not well-defined. However, the optimization of $f$ entails taking the derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$, which we assume a priori are well-defined. That $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ exist, but not $\frac{df}{dz}$, might cause some confusion, since $z=(x,y)$. In this section, we clarify the difference between real- and complex-differentiability of a function $f:\mathbb{C}\longrightarrow\mathbb{C}$. The material presented here is standard fare in real and complex analysis; see for example [Rud53, Rem91, GK06].

In single-variable calculus, the derivative of $f:\mathbb{R}\longrightarrow\mathbb{R}$ at $x$ is

$$f^{\prime}(x)=\lim_{h\rightarrow 0}\frac{f(x+h)-f(x)}{h}, \tag{1.3}$$

with the usual interpretation being the 'slope of the tangent of $f$ at $x$'. For multivariable functions the generalization of Equation 1.3 is as follows. We say $f:\mathbb{R}^{n}\longrightarrow\mathbb{R}^{m}$ is real-differentiable at $x$ if there exists $T\in\mathcal{L}(\mathbb{R}^{n},\mathbb{R}^{m})$ such that

$$\lim_{h\rightarrow 0}\frac{\|f(x+h)-f(x)-T(h)\|}{\|h\|}=0, \tag{1.4}$$

where $\|\cdot\|$ denotes the Euclidean norm. We write $f^{\prime}(x)=T$ and call it the derivative/differential of $f$ at $x$. Thus, this generalized derivative is a linear operator, not just a number (when $n=m=1$, a linear operator is just a scaling, so it can be identified with the scale factor, a real number). The matrix representation of $f^{\prime}(x)$ with respect to the standard Euclidean bases is the Jacobian matrix $\left[\frac{\partial f_{i}}{\partial x_{j}}(x)\right]_{1\leq i\leq m,\,1\leq j\leq n}$. While the geometric picture of a slope no longer holds, the interpretation of $f^{\prime}(x)$ as the best linear approximation of $f$ at $x$ still applies. The differential also subsumes as special cases the directional derivative (when $n=1$) and the gradient vector (when $m=1$). It is a standard result that for an open set $O\subseteq\mathbb{R}^{n}$, $f$ is continuously differentiable on $O$ if and only if all its partial derivatives $\frac{\partial f_{i}}{\partial x_{j}}$, $1\leq i\leq m$, $1\leq j\leq n$, exist and are continuous on $O$ (Theorem 9.21, [Rud53]); in particular, continuity of the partial derivatives implies real-differentiability. Thus we can assume the functions of our interest are real-differentiable, since we almost always assume the $\frac{\partial f_{i}}{\partial x_{j}}$ are smooth.

The complex derivative is obtained by directly extending the infinitesimal difference quotient of calculus (Eq. 1.3) to the complex domain. A function $f:\mathbb{C}\longrightarrow\mathbb{C}$ is complex-differentiable at $z$ if the derivative $f^{\prime}(z)=\lim_{h\rightarrow 0}\frac{f(z+h)-f(z)}{h}$ exists, or equivalently,

$$\lim_{h\rightarrow 0}\frac{f(z+h)-f(z)-f^{\prime}(z)h}{h}=0. \tag{1.5}$$

Since $f:\mathbb{C}\longrightarrow\mathbb{C}$ is also $f:\mathbb{R}^{2}\longrightarrow\mathbb{R}^{2}$, the notion of real-differentiability also applies to $f$. How do Equations 1.4 and 1.5 relate to each other? First, recall that the Cauchy-Riemann equations

$$\frac{\partial u}{\partial x}=\frac{\partial v}{\partial y},\qquad\frac{\partial u}{\partial y}=-\frac{\partial v}{\partial x} \tag{1.6}$$

holding on an open set $\mathcal{O}$ form, for real-differentiable $f$, a necessary and sufficient condition for $f$ to be complex-differentiable on $\mathcal{O}$. In this case, $f^{\prime}(z)=u_{x}+iv_{x}=v_{y}-iu_{y}$. We next recast Equation 1.5 into a form more similar to Equation 1.4. The multiplication of two complex numbers can be viewed as a matrix acting on a vector in $\mathbb{R}^{2}$: for $z,w\in\mathbb{C}$,

$$z\cdot w=(\operatorname{Re}z\operatorname{Re}w-\operatorname{Im}z\operatorname{Im}w)+i(\operatorname{Re}z\operatorname{Im}w+\operatorname{Im}z\operatorname{Re}w)=\begin{bmatrix}\operatorname{Re}z&-\operatorname{Im}z\\ \operatorname{Im}z&\operatorname{Re}z\end{bmatrix}\begin{bmatrix}\operatorname{Re}w\\ \operatorname{Im}w\end{bmatrix}.$$

Thus we have that if $f^{\prime}(z)$ exists, then

$$\begin{aligned}
&\lim_{h\rightarrow 0}\frac{f(z+h)-f(z)-f^{\prime}(z)h}{h}=0\\
\implies&\lim_{h\rightarrow 0}\frac{f(z+h)-f(z)-\begin{bmatrix}u_{x}&-v_{x}\\ v_{x}&u_{x}\end{bmatrix}h}{h}=0\\
\implies&\lim_{h\rightarrow 0}\frac{f(z+h)-f(z)-\begin{bmatrix}u_{x}&u_{y}\\ v_{x}&v_{y}\end{bmatrix}h}{h}=0\qquad\text{by Cauchy-Riemann}\\
\implies&\lim_{h\rightarrow 0}\frac{f((x,y)+h)-f(x,y)-f^{\prime}(x,y)h}{h}=0\qquad\text{since }f^{\prime}(x,y)=\begin{bmatrix}u_{x}&u_{y}\\ v_{x}&v_{y}\end{bmatrix}\\
\iff&\lim_{h\rightarrow 0}\frac{\|f((x,y)+h)-f(x,y)-f^{\prime}(x,y)h\|}{\|h\|}=0,
\end{aligned}$$

i.e. $f$ is real-differentiable, and the complex derivative $f^{\prime}(z)$ (a complex number) is identified with the differential/real derivative $f^{\prime}(x,y)$ (a linear operator). Furthermore, note that

$$f^{\prime}(z)=f^{\prime}(x,y)=\begin{bmatrix}u_{x}&u_{y}\\ v_{x}&v_{y}\end{bmatrix}=\begin{bmatrix}u_{x}&-v_{x}\\ v_{x}&u_{x}\end{bmatrix}=rA,$$

where $r=|f^{\prime}(z)|\geq 0$ and $A\in SO_{2}(\mathbb{R})$. Wherever $f^{\prime}(z)\neq 0$, this operator is a composition of a scaling and an orientation-preserving rotation (the scaling factor and rotation angle generally differ from point to point). Such maps are widely called 'conformal' in the literature. In short, we have the following fact.

Fact 1.1.

If $f:\mathbb{C}\longrightarrow\mathbb{C}$ is complex-differentiable/holomorphic, then it is real-differentiable. In this case, its differential $f^{\prime}(x,y)$ is a conformal operator and can be identified with $f^{\prime}(z)$.

The converse does not hold: a real-differentiable map need not be complex-differentiable, the classic example being complex conjugation, $f(z)=z^{*}$, or equivalently $f(x,y)=(x,-y)$. The conformality of the differential makes complex-differentiability a much more rigid property than real-differentiability. This rigidity, however, underlies many remarkable properties of holomorphic functions and thus the power of complex analysis, broadly defined as the study of these functions.
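The failure of complex-differentiability for $f(z)=z^{*}$ can be seen directly from difference quotients. The following sketch (ours, for illustration) shows that the quotient $(f(z+h)-f(z))/h$ approaches different values along the real and imaginary directions, so the limit in Equation 1.5 does not exist:

```python
import numpy as np

# f(z) = z* is real-differentiable but not complex-differentiable:
# the difference quotient depends on the direction from which h -> 0.
f = np.conj
z = 1.0 + 2.0j
h = 1e-8

along_real = (f(z + h) - f(z)) / h          # h real: quotient -> 1
along_imag = (f(z + 1j*h) - f(z)) / (1j*h)  # h imaginary: quotient -> -1

assert np.isclose(along_real, 1.0)
assert np.isclose(along_imag, -1.0)
```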

2 Wirtinger Calculus

2.1 The basic idea

We begin by discussing general complex-valued functions $f:\mathbb{C}\longrightarrow\mathbb{C}$. Since $\mathbb{C}$ is just $\mathbb{R}^{2}$ endowed with the multiplication operation $(a,b)\times(c,d)\mapsto(ac-bd,ad+bc)$, we can view $f$ as

$$f:\mathbb{R}^{2}\longrightarrow\mathbb{R}^{2},\qquad(x,y)\mapsto(u(x,y),v(x,y)).$$

Here we assume $f$ is real-differentiable, but $f$ need not be complex-differentiable (refer to Section 1.3 above for the difference). Often, however, $f$ is expressed in terms of $z$ and $z^{*}$. More precisely, viewing $z,z^{*}$ as functions from $\mathbb{R}\times\mathbb{R}$ to $\mathbb{C}$, one has a function $\tilde{f}:\mathbb{C}\times\mathbb{C}\longrightarrow\mathbb{C}$ such that

$$f(x,y)=:\tilde{f}\circ(z,z^{*})(x,y)=\tilde{f}(z(x,y),z^{*}(x,y))=\tilde{f}(x+iy,x-iy). \tag{2.1}$$

Conversely, we also have

$$\tilde{f}(z,z^{*})=f\circ(x,y)(z,z^{*})=f(x(z,z^{*}),y(z,z^{*}))=f\left(\frac{z+z^{*}}{2},\frac{z-z^{*}}{2i}\right). \tag{2.2}$$

Partially differentiating $f$ and applying the chain rule gives

$$\begin{aligned}
\frac{\partial f}{\partial x}(x,y)&=\frac{\partial\tilde{f}}{\partial z}\frac{\partial z}{\partial x}+\frac{\partial\tilde{f}}{\partial z^{*}}\frac{\partial z^{*}}{\partial x}=\frac{\partial\tilde{f}}{\partial z}(z(x,y),z^{*}(x,y))+\frac{\partial\tilde{f}}{\partial z^{*}}(z(x,y),z^{*}(x,y))\\
\frac{\partial f}{\partial y}(x,y)&=\frac{\partial\tilde{f}}{\partial z}\frac{\partial z}{\partial y}+\frac{\partial\tilde{f}}{\partial z^{*}}\frac{\partial z^{*}}{\partial y}=i\,\frac{\partial\tilde{f}}{\partial z}(z(x,y),z^{*}(x,y))-i\,\frac{\partial\tilde{f}}{\partial z^{*}}(z(x,y),z^{*}(x,y)).
\end{aligned}$$

After rearranging terms,

$$\begin{aligned}
\frac{\partial\tilde{f}}{\partial z}(z(x,y),z^{*}(x,y))&=\frac{1}{2}\left(\frac{\partial f}{\partial x}-i\frac{\partial f}{\partial y}\right)(x,y)\\
\frac{\partial\tilde{f}}{\partial z^{*}}(z(x,y),z^{*}(x,y))&=\frac{1}{2}\left(\frac{\partial f}{\partial x}+i\frac{\partial f}{\partial y}\right)(x,y).
\end{aligned}$$

The notations $z,z^{*}$ may raise questions about independence. This is irrelevant: one may simply write $z_{1},z_{2}$ if one wishes. We emphasize that the fundamental input variables are the two real numbers $x$ and $y$.

The purpose of the discussion above is as follows. $f$ and $\tilde{f}$ are two different expressions of the complex function $f:\mathbb{C}\longrightarrow\mathbb{C}$: $f$ is expressed in terms of $x$ and $y$, while $\tilde{f}$ is expressed in terms of $z$ and $z^{*}$. Often, it is the 'tilde-form' of $f$ that is given. In such cases, certain procedures become a hassle, for example verifying the Cauchy-Riemann conditions, or finding the optimal points of $f$ (when $f$ is real-valued). The common feature of these operations is that they involve taking the partial derivatives $\frac{\partial}{\partial x},\frac{\partial}{\partial y}$ of $f$, which requires $f$ to be expressed in terms of $x$ and $y$ beforehand. This conversion (from $\tilde{f}(z,z^{*})$ to $f(x,y)$, using Eq. 2.1) is generally tedious, and the resulting form of $f(x,y)$ cumbersome. For instance, consider $\tilde{f}(z,z^{*})=z^{m}z^{*n}$ for large $m,n\in\mathbb{Z}$. As we shall see below, the operators $\frac{\partial}{\partial z},\frac{\partial}{\partial z^{*}}$ allow us to circumvent this by retaining the form $\tilde{f}(z,z^{*})$ and partially differentiating with respect to $z$ and $z^{*}$ instead. In a nutshell, we can view $\tilde{f}(z,z^{*})$ as a black box encapsulating the complexity of $f(x,y)$, and $\frac{\partial}{\partial z},\frac{\partial}{\partial z^{*}}$ as 'higher-level' operators acting on the black box.

Definition 1 (Wirtinger Derivatives).

Let $f(x,y)=(u(x,y),v(x,y))$ be a complex function, where $u(x,y)$ and $v(x,y)$ are differentiable with respect to $x,y$ (we often just assume $u$ and $v$ are smooth). Write $f(x,y)=\tilde{f}(z(x,y),z^{*}(x,y))$, where $\tilde{f}:\mathbb{C}\times\mathbb{C}\longrightarrow\mathbb{C}$. The Wirtinger derivatives of $f$ are the partial derivatives of $\tilde{f}$ with respect to $z$ and $z^{*}$:

$$\frac{\partial\tilde{f}}{\partial z}\qquad\text{and}\qquad\frac{\partial\tilde{f}}{\partial z^{*}}. \tag{2.3}$$
Remark 2.1.

As discussed above, the Wirtinger derivatives satisfy

$$\begin{aligned}
\frac{\partial\tilde{f}}{\partial z}(z(x,y),z^{*}(x,y))&=\frac{1}{2}\left(\frac{\partial f}{\partial x}-i\frac{\partial f}{\partial y}\right)(x,y)\\
\frac{\partial\tilde{f}}{\partial z^{*}}(z(x,y),z^{*}(x,y))&=\frac{1}{2}\left(\frac{\partial f}{\partial x}+i\frac{\partial f}{\partial y}\right)(x,y).
\end{aligned} \tag{2.4}$$

These two equations tell us what is happening 'under the hood': partially differentiating with respect to $z/z^{*}$ and then writing $z,z^{*}$ in terms of $x,y$ is equivalent to first converting $\tilde{f}(z,z^{*})$ into $f(x,y)$ and then applying the operators $\frac{1}{2}\left(\frac{\partial}{\partial x}\mp i\frac{\partial}{\partial y}\right)$. Often, we abuse notation and disregard the distinction between $\tilde{f}$ and $f$ (i.e. write both $f(x,y)$ and $f(z,z^{*})$). Doing so, we can simply write

$$\frac{\partial}{\partial z}=\frac{1}{2}\left(\frac{\partial}{\partial x}-i\frac{\partial}{\partial y}\right),\qquad\frac{\partial}{\partial z^{*}}=\frac{1}{2}\left(\frac{\partial}{\partial x}+i\frac{\partial}{\partial y}\right). \tag{2.5}$$

Henceforth we shall abuse notation and do away with $\tilde{f}$, writing both $f(x,y)$ and $f(z,z^{*})$ (simply bear in mind that $f$ really is a function of two real variables, not two complex variables). Thus, $\frac{\partial}{\partial z}f(z,z^{*})$ really means $\frac{\partial}{\partial z}\tilde{f}(z,z^{*})$, and $\frac{\partial}{\partial z}f(x,y)$ really means $\frac{1}{2}\left(\frac{\partial}{\partial x}-i\frac{\partial}{\partial y}\right)f(x,y)$ (likewise for $\frac{\partial}{\partial z^{*}}$).

Example 2.2.

Let $f(z,z^{*})=z$. Then $\frac{\partial z}{\partial z}=1=\frac{1}{2}\left(\frac{\partial}{\partial x}-i\frac{\partial}{\partial y}\right)(x+iy)$ and $\frac{\partial z}{\partial z^{*}}=0=\frac{1}{2}\left(\frac{\partial}{\partial x}+i\frac{\partial}{\partial y}\right)(x+iy)$. Likewise, $\frac{\partial z^{*}}{\partial z}=0=\frac{1}{2}\left(\frac{\partial}{\partial x}-i\frac{\partial}{\partial y}\right)(x-iy)$ and $\frac{\partial z^{*}}{\partial z^{*}}=1=\frac{1}{2}\left(\frac{\partial}{\partial x}+i\frac{\partial}{\partial y}\right)(x-iy)$.

Example 2.3.

Consider the function $f(z,z^{*})=z^{2}z^{*}$. Here $f(x,y)=(x+iy)^{2}(x-iy)=u(x,y)+iv(x,y)$, where $u(x,y)=x^{3}+xy^{2}$ and $v(x,y)=x^{2}y+y^{3}$. We verify Equations 2.4:

$$\begin{aligned}
\frac{\partial f}{\partial z}&=2zz^{*}=2(x^{2}+y^{2})&&=\frac{1}{2}\left(\frac{\partial}{\partial x}-i\frac{\partial}{\partial y}\right)(u+iv)\\
\frac{\partial f}{\partial z^{*}}&=z^{2}=x^{2}-y^{2}+2ixy&&=\frac{1}{2}\left(\frac{\partial}{\partial x}+i\frac{\partial}{\partial y}\right)(u+iv).
\end{aligned}$$
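Equations 2.4 can also be checked numerically. The following sketch (ours, not from the text) estimates $\partial f/\partial x$ and $\partial f/\partial y$ for $f(z,z^{*})=z^{2}z^{*}$ by central differences and recovers the symbolic Wirtinger derivatives $2zz^{*}$ and $z^{2}$:

```python
import numpy as np

# Numerically verify Eq. 2.4 for f(z, z*) = z^2 z*.
def f(x, y):
    z = x + 1j*y
    return z**2 * np.conj(z)

x, y, h = 0.7, -1.3, 1e-6
z = x + 1j*y

fx = (f(x + h, y) - f(x - h, y)) / (2*h)   # central difference for df/dx
fy = (f(x, y + h) - f(x, y - h)) / (2*h)   # central difference for df/dy

d_dz  = 0.5 * (fx - 1j*fy)   # Eq. 2.4, first line
d_dzs = 0.5 * (fx + 1j*fy)   # Eq. 2.4, second line

assert np.isclose(d_dz, 2 * z * np.conj(z))   # df/dz  = 2 z z*
assert np.isclose(d_dzs, z**2)                # df/dz* = z^2
```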

We state without proof the following:

Fact 2.4.

The Wirtinger derivatives satisfy the following properties:

  i. Linearity: $\frac{\partial}{\partial z/z^{*}}(af+bg)=a\frac{\partial f}{\partial z/z^{*}}+b\frac{\partial g}{\partial z/z^{*}}$ for $a,b\in\mathbb{C}$.

  ii. Product: $\frac{\partial}{\partial z/z^{*}}(f\cdot g)=\frac{\partial f}{\partial z/z^{*}}\cdot g+f\cdot\frac{\partial g}{\partial z/z^{*}}$.

  iii. Chain: $\frac{\partial}{\partial z/z^{*}}(g\circ f)=\frac{\partial g}{\partial f}\frac{\partial f}{\partial z/z^{*}}+\frac{\partial g}{\partial f^{*}}\frac{\partial f^{*}}{\partial z/z^{*}}$.

  iv. Conjugate: $\frac{\partial f}{\partial z^{*}}=\left(\frac{\partial f^{*}}{\partial z}\right)^{*}$. In particular, if $f$ is real-valued then $\frac{\partial f}{\partial z^{*}}=\left(\frac{\partial f}{\partial z}\right)^{*}$.

We emphasize that $f$ need not be complex-differentiable: so long as $f(x,y)$ is real-differentiable (for which it suffices that the partial derivatives $u_{x},u_{y},v_{x},v_{y}$ exist and are continuous; see Section 1.3), the Wirtinger derivatives of $f$ exist. If $f$ is complex-differentiable, however, one Wirtinger derivative vanishes and the other reduces to the ordinary complex derivative.

Proposition 2.5.

Let $\mathcal{O}\subseteq\mathbb{C}$ be an open set. $f$ is holomorphic on $\mathcal{O}$ if and only if the following hold on $\mathcal{O}$:

$$\begin{aligned}
\frac{\partial f}{\partial z^{*}}&=0&&(\text{Cauchy-Riemann condition})\\
\frac{\partial f}{\partial z}&=\frac{df}{dz}&&(\text{reduction to the usual complex derivative}).
\end{aligned}$$

Proof.

On an open set, $f$ is holomorphic if and only if $u_{x}=v_{y}$, $u_{y}=-v_{x}$ (the Cauchy-Riemann conditions) hold; see Section 1.3. Therefore

$$\frac{\partial f}{\partial z^{*}}=\frac{1}{2}(f_{x}+if_{y})=\frac{1}{2}(u_{x}+iv_{x}+iu_{y}-v_{y})=0$$

and

$$\frac{\partial f}{\partial z}=\frac{1}{2}(f_{x}-if_{y})=\frac{1}{2}(u_{x}+iv_{x}-iu_{y}+v_{y})=u_{x}+iv_{x}=f^{\prime}(z).\qquad\blacksquare$$

The concise equation $\frac{\partial f}{\partial z^{*}}=0$ thus merges the two Cauchy-Riemann equations into one. This also agrees with the conventional wisdom that complex-differentiable functions 'have no $z^{*}$ terms in their expressions'. The next result enables us to optimize real-valued $f$ using Wirtinger derivatives.
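The condition $\partial f/\partial z^{*}=0$ lends itself to a quick numerical holomorphy test. The following sketch (our illustration, with functions chosen by us) estimates $\partial f/\partial z^{*}$ via Equation 2.5 and finite differences, for the holomorphic $z^{2}$ and the non-holomorphic $|z|^{2}=zz^{*}$:

```python
import numpy as np

# Estimate df/dz* = (1/2)(df/dx + i df/dy) by central differences (Eq. 2.5).
def wirtinger_conj(f, x, y, h=1e-6):
    fx = (f(x + h, y) - f(x - h, y)) / (2*h)
    fy = (f(x, y + h) - f(x, y - h)) / (2*h)
    return 0.5 * (fx + 1j*fy)

square = lambda x, y: (x + 1j*y)**2        # z^2, holomorphic
modsq  = lambda x, y: (x**2 + y**2) + 0j   # |z|^2 = z z*, not holomorphic

x, y = 0.4, 0.9
assert np.isclose(wirtinger_conj(square, x, y), 0.0)       # CR condition holds
assert np.isclose(wirtinger_conj(modsq, x, y), x + 1j*y)   # df/dz* = z != 0
```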

Proposition 2.6.

Let $f:\mathbb{C}\longrightarrow\mathbb{R}$ be a real-valued function. Then

  i. If $f$ is nonconstant, then it cannot be holomorphic, i.e. $\frac{df}{dz}$ does not exist.

  ii. $f$ has a stationary point at $z=(x,y)$ if and only if

$$\frac{\partial f}{\partial z}(z)=0\quad\left(\text{or equivalently}\;\;\frac{\partial f}{\partial z^{*}}(z)=0\right).$$

Proof.

If $f(x,y)=u(x,y)+iv(x,y)$ is real-valued, $v(x,y)$ is necessarily the zero function. Then

$$\begin{aligned}
\frac{\partial f}{\partial z/z^{*}}(z)=0&\iff\frac{1}{2}\left(\frac{\partial f}{\partial x}(x,y)\mp i\frac{\partial f}{\partial y}(x,y)\right)=0\\
&\iff\frac{\partial f}{\partial x}(x,y)=0\;\;\text{and}\;\;\frac{\partial f}{\partial y}(x,y)=0.
\end{aligned}$$

Furthermore, assume $f$ is non-constant. If $f$ were holomorphic, the Cauchy-Riemann equations $u_{x}=v_{y}=0$ and $u_{y}=-v_{x}=0$ would imply that $u(x,y)$ is constant, contradicting our assumption on $f$. ∎

As in Proposition 2.5, we see the role of the Wirtinger derivatives as bookkeeping devices. Here $\partial f/\partial z=0$ effectively merges the optimality conditions $\partial f/\partial x=0$ and $\partial f/\partial y=0$ into a single equation. The nature of the stationary point (i.e. whether it is a minimum, maximum or saddle point) has to be determined by inspecting higher-order derivatives or via additional considerations.

Example 2.7.

Let $f(z,z^{*})=|z|^{4}-|z|^{2}$. We would like to find the stationary points of $f$. One could first convert it into $f(x,y)=(x^{2}+y^{2})^{2}-(x^{2}+y^{2})$. Setting the gradient of $f$ to zero gives

$$\begin{aligned}
2x(2(x^{2}+y^{2})-1)&=0\\
2y(2(x^{2}+y^{2})-1)&=0,
\end{aligned}$$

which after some algebra gives $(x,y)=(0,0)$ and $\{(x,y):x^{2}+y^{2}=1/2\}$ as the stationary points.

Using the Wirtinger derivatives, one avoids the hassle of converting $f(z,z^{*})$ to $f(x,y)$ and then carrying out the differentiation twice. We simply obtain

$$\frac{\partial f}{\partial z}(z)=0\implies z^{*}(2zz^{*}-1)=0\implies z=0\text{ or }|z|=1/\sqrt{2},$$

clearly equivalent to what was previously obtained.
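Beyond solving $\partial f/\partial z=0$ analytically, the Wirtinger derivative also drives iterative optimization. A common convention in the signal-processing literature is to take steepest-descent steps along $-\partial f/\partial z^{*}$ (the conjugate of $\partial f/\partial z$ for real-valued $f$). The sketch below (ours, with an arbitrary starting point and step size) applies this to the example above and converges to the circle of minima $|z|=1/\sqrt{2}$:

```python
import numpy as np

# Gradient descent for f(z) = |z|^4 - |z|^2 using the Wirtinger derivative:
# df/dz* = z(2|z|^2 - 1), and the update is z <- z - eta * df/dz*.
def grad_conj(z):
    return z * (2 * abs(z)**2 - 1)

z = 1.5 + 0.5j   # arbitrary starting point
eta = 0.05
for _ in range(2000):
    z = z - eta * grad_conj(z)

# Converges to the circle of stationary points |z| = 1/sqrt(2).
assert np.isclose(abs(z), 1 / np.sqrt(2))
```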

2.2 Extension to multivariable complex functions

The extension of this formalism to functions f:nf:\mathbb{C}^{n}\longrightarrow\mathbb{C} is straightforward. We have

f:\displaystyle f:\; 2n2\displaystyle\mathbb{R}^{2n}\longrightarrow\mathbb{R}^{2}
(x1,y1,,xn,yn)=(𝐱,𝐲)(u(𝐱,𝐲),v(𝐱,𝐲)).\displaystyle(x_{1},y_{1},\dots,x_{n},y_{n})=(\mathbf{x},\mathbf{y})\mapsto(u(\mathbf{x},\mathbf{y}),v(\mathbf{x},\mathbf{y})).

Now for i=1,,ni=1,\dots,n regard zi,ziz_{i},z_{i}^{*} as functions from n×n\mathbb{R}^{n}\times\mathbb{R}^{n} to \mathbb{C}, where zi(𝐱,𝐲)=xi+iyiz_{i}(\mathbf{x},\mathbf{y})=x_{i}+iy_{i} and zi(𝐱,𝐲)=xiiyiz_{i}^{*}(\mathbf{x},\mathbf{y})=x_{i}-iy_{i}. Then we have a function f~:n×n\tilde{f}:\mathbb{C}^{n}\times\mathbb{C}^{n}\longrightarrow\mathbb{C} such that

f(𝐱,𝐲)=:f~(𝐳,𝐳)(𝐱,𝐲)=f~(𝐳(𝐱,𝐲),𝐳(𝐱,𝐲))=f~(𝐱+𝐢𝐲,𝐱𝐢𝐲).\displaystyle f(\mathbf{x},\mathbf{y})=:\tilde{f}\circ(\mathbf{z},\mathbf{z^{*}})(\mathbf{x},\mathbf{y})=\tilde{f}(\mathbf{z}(\mathbf{x},\mathbf{y}),\mathbf{z^{*}}(\mathbf{x},\mathbf{y}))=\tilde{f}(\mathbf{x+iy},\mathbf{x-iy}). (2.6)

Partial differentiating ff with respect to each xix_{i} and yiy_{i} gives

fxi(𝐱,𝐲)\displaystyle\frac{\partial f}{\partial x_{i}}(\mathbf{x},\mathbf{y}) =k=1nf~zkzkxi+k=1nf~zkzkxi=f~zi(𝐳(𝐱,𝐲),𝐳(𝐱,𝐲))+f~zi(𝐳(𝐱,𝐲),𝐳(𝐱,𝐲))\displaystyle=\sum_{k=1}^{n}\frac{\partial\tilde{f}}{\partial z_{k}}\frac{\partial z_{k}}{\partial x_{i}}+\sum_{k=1}^{n}\frac{\partial\tilde{f}}{\partial z_{k}^{*}}\frac{\partial z_{k}^{*}}{\partial x_{i}}=\frac{\partial\tilde{f}}{\partial z_{i}}(\mathbf{z}(\mathbf{x},\mathbf{y}),\mathbf{z^{*}}(\mathbf{x},\mathbf{y}))+\frac{\partial\tilde{f}}{\partial z_{i}^{*}}(\mathbf{z}(\mathbf{x},\mathbf{y}),\mathbf{z^{*}}(\mathbf{x},\mathbf{y}))
fyi(𝐱,𝐲)\displaystyle\frac{\partial f}{\partial y_{i}}(\mathbf{x},\mathbf{y}) =k=1nf~zkzkyi+k=1nf~zkzkyi=if~zi(𝐳(𝐱,𝐲),𝐳(𝐱,𝐲))if~zi(𝐳(𝐱,𝐲),𝐳(𝐱,𝐲))\displaystyle=\sum_{k=1}^{n}\frac{\partial\tilde{f}}{\partial z_{k}}\frac{\partial z_{k}}{\partial y_{i}}+\sum_{k=1}^{n}\frac{\partial\tilde{f}}{\partial z_{k}^{*}}\frac{\partial z_{k}^{*}}{\partial y_{i}}=i\cdot\frac{\partial\tilde{f}}{\partial z_{i}}(\mathbf{z}(\mathbf{x},\mathbf{y}),\mathbf{z^{*}}(\mathbf{x},\mathbf{y}))-i\cdot\frac{\partial\tilde{f}}{\partial z_{i}^{*}}(\mathbf{z}(\mathbf{x},\mathbf{y}),\mathbf{z^{*}}(\mathbf{x},\mathbf{y}))

so after rearranging terms we obtain the Wirtinger derivatives

f~𝐳(𝐳(𝐱,𝐲),𝐳(𝐱,𝐲))\displaystyle\frac{\partial\tilde{f}}{\partial\mathbf{z}}(\mathbf{z}(\mathbf{x},\mathbf{y}),\mathbf{z^{*}}(\mathbf{x},\mathbf{y})) =12(f𝐱if𝐲)(𝐱,𝐲)\displaystyle=\frac{1}{2}\left(\frac{\partial f}{\partial\mathbf{x}}-i\frac{\partial f}{\partial\mathbf{y}}\right)(\mathbf{x},\mathbf{y})
f~𝐳(𝐳(𝐱,𝐲),𝐳(𝐱,𝐲))\displaystyle\frac{\partial\tilde{f}}{\partial\mathbf{z^{*}}}(\mathbf{z}(\mathbf{x},\mathbf{y}),\mathbf{z^{*}}(\mathbf{x},\mathbf{y})) =12(f𝐱+if𝐲)(𝐱,𝐲).\displaystyle=\frac{1}{2}\left(\frac{\partial f}{\partial\mathbf{x}}+i\frac{\partial f}{\partial\mathbf{y}}\right)(\mathbf{x},\mathbf{y}).

Here 𝐳=(z1,,zn)\frac{\partial}{\partial\mathbf{z}}=\left(\frac{\partial}{\partial z_{1}},\dots,\frac{\partial}{\partial z_{n}}\right) and similarly for 𝐳\frac{\partial}{\partial\mathbf{z^{*}}}, 𝐱\frac{\partial}{\partial\mathbf{x}} and 𝐲\frac{\partial}{\partial\mathbf{y}}. As before, we abuse notation and write both f(𝐱,𝐲)f(\mathbf{x},\mathbf{y}) and f(𝐳,𝐳)f(\mathbf{z},\mathbf{z^{*}}), so we can write

𝐳=12(𝐱i𝐲),𝐳=12(𝐱+i𝐲).\displaystyle\frac{\partial}{\partial\mathbf{z}}=\frac{1}{2}\left(\frac{\partial}{\partial\mathbf{x}}-i\frac{\partial}{\partial\mathbf{y}}\right),\qquad\frac{\partial}{\partial\mathbf{z^{*}}}=\frac{1}{2}\left(\frac{\partial}{\partial\mathbf{x}}+i\frac{\partial}{\partial\mathbf{y}}\right). (2.7)

The extension of Proposition 2.6 also holds: if f:nf:\mathbb{C}^{n}\longrightarrow\mathbb{R} is real-valued, then ff has a stationary point at 𝐳=𝐱+i𝐲\mathbf{z}=\mathbf{x}+i\mathbf{y} if and only if f𝐳(𝐳)=𝟎\frac{\partial f}{\partial\mathbf{z}}(\mathbf{z})=\mathbf{0} (or equivalently f𝐳(𝐳)=𝟎\frac{\partial f}{\partial\mathbf{z^{*}}}(\mathbf{z})=\mathbf{0}).
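For real-valued objectives, the conjugate Wirtinger derivative \frac{\partial f}{\partial\mathbf{z^{*}}} is (up to a factor of 2 from the real parametrization) the direction of steepest ascent, so gradient descent takes the form 𝐳 ← 𝐳 − η ∂f/∂𝐳^{*}. A minimal sketch, with an illustrative least-squares cost f(𝐳)=‖A𝐳−𝐛‖² of our own choosing (its conjugate Wirtinger gradient is A^{\dagger}(A𝐳−𝐛)):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# well-conditioned complex matrix (illustrative choice)
A = np.eye(n) + 0.05 * (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
b = rng.normal(size=n) + 1j * rng.normal(size=n)

# real-valued cost f(z) = ||Az - b||^2; its conjugate Wirtinger gradient
# is df/dz* = A^dagger (A z - b), the (scaled) steepest-ascent direction
z = np.zeros(n, dtype=complex)
eta = 0.5 / np.linalg.norm(A, 2) ** 2  # conservative step size
for _ in range(1000):
    z = z - eta * (A.conj().T @ (A @ z - b))

z_exact = np.linalg.solve(A, b)  # exact minimizer, since A is invertible
assert np.linalg.norm(z - z_exact) < 1e-6
```

The iteration converges to the stationary point where \frac{\partial f}{\partial\mathbf{z^{*}}}=\mathbf{0}, in agreement with the proposition.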

3 Matrix Wirtinger Calculus

Having discussed Wirtinger derivatives for multivariable functions, we are now ready to examine the commonly encountered scenario where the parameters are structured as matrices.

3.1 Matrix Wirtinger derivatives

Now we consider functions of the form f:n×nf:\mathbb{C}^{n\times n}\longrightarrow\mathbb{C}. (We could more generally let the domain be m×n\mathbb{C}^{m\times n}; indeed, many of the results developed below hold for such matrices as well. For simplicity, and because our matrices of interest in applications are square, we set m=nm=n here.) The following development, which we include for pedagogical purposes, is virtually the same as that in Section 2.2, just with different indexing. We have

f:\displaystyle f:\; 2(n×n)2\displaystyle\mathbb{R}^{2(n\times n)}\longrightarrow\mathbb{R}^{2}
(xij,yij)i,j[n]=(𝐗,𝐘)(u(𝐗,𝐘),v(𝐗,𝐘)).\displaystyle(x_{ij},y_{ij})_{i,j\in[n]}=(\mathbf{X},\mathbf{Y})\mapsto(u(\mathbf{X},\mathbf{Y}),v(\mathbf{X},\mathbf{Y})).

For i,j=1,,ni,j=1,\dots,n regard zij,zijz_{ij},z_{ij}^{*} as functions from n×n×n×n\mathbb{R}^{n\times n}\times\mathbb{R}^{n\times n} to \mathbb{C}, where zij(𝐗,𝐘)=xij+iyijz_{ij}(\mathbf{X},\mathbf{Y})=x_{ij}+iy_{ij} and zij(𝐗,𝐘)=xijiyijz_{ij}^{*}(\mathbf{X},\mathbf{Y})=x_{ij}-iy_{ij}. Then we have a function f~:n×n×n×n\tilde{f}:\mathbb{C}^{n\times n}\times\mathbb{C}^{n\times n}\longrightarrow\mathbb{C} such that

f(𝐗,𝐘)=:f~(𝐙,𝐙)(𝐗,𝐘)=f~(𝐙(𝐗,𝐘),𝐙(𝐗,𝐘))=f~(𝐗+𝐢𝐘,𝐗𝐢𝐘).\displaystyle f(\mathbf{X},\mathbf{Y})=:\tilde{f}\circ(\mathbf{Z},\mathbf{Z^{*}})(\mathbf{X},\mathbf{Y})=\tilde{f}(\mathbf{Z}(\mathbf{X},\mathbf{Y}),\mathbf{Z^{*}}(\mathbf{X},\mathbf{Y}))=\tilde{f}(\mathbf{X+iY},\mathbf{X-iY}). (3.1)

Again, taking partial derivatives of ff with respect to each xijx_{ij} and yijy_{ij} gives

fxij(𝐗,𝐘)\displaystyle\frac{\partial f}{\partial x_{ij}}(\mathbf{X},\mathbf{Y}) =k,l=1nf~zklzklxij+k,l=1nf~zklzklxij\displaystyle=\sum_{k,l=1}^{n}\frac{\partial\tilde{f}}{\partial z_{kl}}\frac{\partial z_{kl}}{\partial x_{ij}}+\sum_{k,l=1}^{n}\frac{\partial\tilde{f}}{\partial z_{kl}^{*}}\frac{\partial z_{kl}^{*}}{\partial x_{ij}}
=k,l=1nf~zklδkiδlj+k,l=1nf~zklδkiδlj\displaystyle=\sum_{k,l=1}^{n}\frac{\partial\tilde{f}}{\partial z_{kl}}\delta_{ki}\delta_{lj}+\sum_{k,l=1}^{n}\frac{\partial\tilde{f}}{\partial z_{kl}^{*}}\delta_{ki}\delta_{lj}
=f~zij(𝐙(𝐗,𝐘),𝐙(𝐗,𝐘))+f~zij(𝐙(𝐗,𝐘),𝐙(𝐗,𝐘))\displaystyle=\frac{\partial\tilde{f}}{\partial z_{ij}}(\mathbf{Z}(\mathbf{X},\mathbf{Y}),\mathbf{Z^{*}}(\mathbf{X},\mathbf{Y}))+\frac{\partial\tilde{f}}{\partial z_{ij}^{*}}(\mathbf{Z}(\mathbf{X},\mathbf{Y}),\mathbf{Z^{*}}(\mathbf{X},\mathbf{Y}))

and

fyij(𝐗,𝐘)\displaystyle\frac{\partial f}{\partial y_{ij}}(\mathbf{X},\mathbf{Y}) =k,l=1nf~zklzklyij+k,l=1nf~zklzklyij\displaystyle=\sum_{k,l=1}^{n}\frac{\partial\tilde{f}}{\partial z_{kl}}\frac{\partial z_{kl}}{\partial y_{ij}}+\sum_{k,l=1}^{n}\frac{\partial\tilde{f}}{\partial z_{kl}^{*}}\frac{\partial z_{kl}^{*}}{\partial y_{ij}}
=if~zij(𝐙(𝐗,𝐘),𝐙(𝐗,𝐘))if~zij(𝐙(𝐗,𝐘),𝐙(𝐗,𝐘)).\displaystyle=i\cdot\frac{\partial\tilde{f}}{\partial z_{ij}}(\mathbf{Z}(\mathbf{X},\mathbf{Y}),\mathbf{Z^{*}}(\mathbf{X},\mathbf{Y}))-i\cdot\frac{\partial\tilde{f}}{\partial z_{ij}^{*}}(\mathbf{Z}(\mathbf{X},\mathbf{Y}),\mathbf{Z^{*}}(\mathbf{X},\mathbf{Y})).

Rearranging terms as before gives us for 1i,jn1\leq i,j\leq n

f~zij(𝐙(𝐗,𝐘),𝐙(𝐗,𝐘))\displaystyle\frac{\partial\tilde{f}}{\partial z_{ij}}(\mathbf{Z}(\mathbf{X},\mathbf{Y}),\mathbf{Z^{*}}(\mathbf{X},\mathbf{Y})) =12(fxijifyij)(𝐗,𝐘)\displaystyle=\frac{1}{2}\left(\frac{\partial f}{\partial x_{ij}}-i\frac{\partial f}{\partial y_{ij}}\right)(\mathbf{X},\mathbf{Y}) (3.2)
f~zij(𝐙(𝐗,𝐘),𝐙(𝐗,𝐘))\displaystyle\frac{\partial\tilde{f}}{\partial z_{ij}^{*}}(\mathbf{Z}(\mathbf{X},\mathbf{Y}),\mathbf{Z^{*}}(\mathbf{X},\mathbf{Y})) =12(fxij+ifyij)(𝐗,𝐘).\displaystyle=\frac{1}{2}\left(\frac{\partial f}{\partial x_{ij}}+i\frac{\partial f}{\partial y_{ij}}\right)(\mathbf{X},\mathbf{Y}).

To preserve the matrix structure of the parameters zijz_{ij} and zijz_{ij}^{*} we use the standard notation

𝐙:=[z11z1nzn1znn]𝐙:=[z11z1nzn1znn]\displaystyle\frac{\partial}{\partial\mathbf{Z}}:=\begin{bmatrix}\frac{\partial}{\partial z_{11}}&\dots&\frac{\partial}{\partial z_{1n}}\\ \vdots&\ddots&\vdots\\ \frac{\partial}{\partial z_{n1}}&\dots&\frac{\partial}{\partial z_{nn}}\end{bmatrix}\qquad\frac{\partial}{\partial\mathbf{Z^{*}}}:=\begin{bmatrix}\frac{\partial}{\partial z_{11}^{*}}&\dots&\frac{\partial}{\partial z_{1n}^{*}}\\ \vdots&\ddots&\vdots\\ \frac{\partial}{\partial z_{n1}^{*}}&\dots&\frac{\partial}{\partial z_{nn}^{*}}\end{bmatrix} (3.3)

and similarly for 𝐗\frac{\partial}{\partial\mathbf{X}} and 𝐘\frac{\partial}{\partial\mathbf{Y}}. Then Equation 3.2 is concisely stated as

f~𝐙(𝐙(𝐗,𝐘),𝐙(𝐗,𝐘))\displaystyle\frac{\partial\tilde{f}}{\partial\mathbf{Z}}(\mathbf{Z}(\mathbf{X},\mathbf{Y}),\mathbf{Z^{*}}(\mathbf{X},\mathbf{Y})) =12(f𝐗if𝐘)(𝐗,𝐘)\displaystyle=\frac{1}{2}\left(\frac{\partial f}{\partial\mathbf{X}}-i\frac{\partial f}{\partial\mathbf{Y}}\right)(\mathbf{X},\mathbf{Y}) (3.4)
f~𝐙(𝐙(𝐗,𝐘),𝐙(𝐗,𝐘))\displaystyle\frac{\partial\tilde{f}}{\partial\mathbf{Z^{*}}}(\mathbf{Z}(\mathbf{X},\mathbf{Y}),\mathbf{Z^{*}}(\mathbf{X},\mathbf{Y})) =12(f𝐗+if𝐘)(𝐗,𝐘).\displaystyle=\frac{1}{2}\left(\frac{\partial f}{\partial\mathbf{X}}+i\frac{\partial f}{\partial\mathbf{Y}}\right)(\mathbf{X},\mathbf{Y}).

𝐙\frac{\partial}{\partial\mathbf{Z}} and 𝐙\frac{\partial}{\partial\mathbf{Z^{*}}} are the matrix Wirtinger derivatives of ff and they are the protagonists of this article. As before, we abuse notation and write both f(𝐗,𝐘)f(\mathbf{X},\mathbf{Y}) and f(𝐙,𝐙)f(\mathbf{Z},\mathbf{Z^{*}}), so we can write

𝐙=12(𝐗i𝐘),𝐙=12(𝐗+i𝐘).\displaystyle\frac{\partial}{\partial\mathbf{Z}}=\frac{1}{2}\left(\frac{\partial}{\partial\mathbf{X}}-i\frac{\partial}{\partial\mathbf{Y}}\right),\qquad\frac{\partial}{\partial\mathbf{Z^{*}}}=\frac{1}{2}\left(\frac{\partial}{\partial\mathbf{X}}+i\frac{\partial}{\partial\mathbf{Y}}\right). (3.5)
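Equation 3.5 lends itself to a direct numerical check: apply \frac{1}{2}(\frac{\partial}{\partial x_{ij}}-i\frac{\partial}{\partial y_{ij}}) entrywise via finite differences and compare with an analytically known derivative, e.g. ∂Tr(𝐀𝐙)/∂𝐙 = 𝐀^{T}. A minimal sketch with randomly chosen matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

def f(Z):
    return np.trace(A @ Z)

def wirtinger_dZ(g, Z, h=1e-6):
    # apply (d/dx_ij - i d/dy_ij)/2 entrywise via central finite differences
    D = np.zeros(Z.shape, dtype=complex)
    for i in range(Z.shape[0]):
        for j in range(Z.shape[1]):
            E = np.zeros(Z.shape)
            E[i, j] = 1.0
            dgdx = (g(Z + h * E) - g(Z - h * E)) / (2 * h)
            dgdy = (g(Z + 1j * h * E) - g(Z - 1j * h * E)) / (2 * h)
            D[i, j] = 0.5 * (dgdx - 1j * dgdy)
    return D

# the known derivative of Tr(AZ) with respect to Z is A^T
assert np.allclose(wirtinger_dZ(f, Z), A.T, atol=1e-5)
```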

The following matrix version of Proposition 2.6 will be invoked frequently in optimization problems. We omit the proof, as it is a straightforward generalization of the proof of Proposition 2.6.

Proposition 3.1.

Let f:n×nf:\mathbb{C}^{n\times n}\longrightarrow\mathbb{R} be a real-valued function of complex matrices. Then ff has a stationary point at 𝐙=[zij]i,j[n]\mathbf{Z}=[z_{ij}]_{i,j\in[n]} if and only if

f𝐙(𝐙)=0(or equivalentlyf𝐙(𝐙)=0).\displaystyle\frac{\partial f}{\partial\mathbf{Z}}(\mathbf{Z})=0\quad\left(\text{or equivalently}\;\;\frac{\partial f}{\partial\mathbf{Z^{*}}}(\mathbf{Z})=0\right).

3.2 Functions of special interest

In this section we give several concrete examples of f(𝐙,𝐙)f(\mathbf{Z},\mathbf{Z^{*}}). Two such functions come to mind immediately: the trace and the determinant of a matrix. We shall pay special attention to the trace, as it appears most frequently in quantum information theory; for instance, the trace norm, the Hilbert-Schmidt norm, the Uhlmann fidelity, the von Neumann entropy, etc. can all be expressed in terms of traces [Wil13]. The determinant plays a less prominent role, so we treat it only briefly. A list of common f(𝐙,𝐙)f(\mathbf{Z},\mathbf{Z^{*}}) and their Wirtinger derivatives is given in Table 1.

      f(𝐙,𝐙)f(\mathbf{Z,Z^{*}})       f𝐙\frac{\partial f}{\partial\mathbf{Z}}       f𝐙\frac{\partial f}{\partial\mathbf{Z^{*}}}
      𝐚𝐓𝐙𝐛\mathbf{a^{T}Zb}       𝐚𝐛T\mathbf{ab}^{T}       𝟎\mathbf{0}
      𝐚𝐙𝐛\mathbf{a^{\dagger}Zb}       𝐚𝐛T\mathbf{a^{*}b}^{T}       𝟎\mathbf{0}
      Tr𝐙\operatorname{Tr}\mathbf{Z}       𝐈\mathbf{I}       𝟎\mathbf{0}
      Tr𝐙\operatorname{Tr}\mathbf{Z^{*}}       𝟎\mathbf{0}       𝐈\mathbf{I}
      Tr(𝐀𝐙)\operatorname{Tr}(\mathbf{AZ})       𝐀T\mathbf{A}^{T}       𝟎\mathbf{0}
      Tr(𝐀𝐙T)\operatorname{Tr}(\mathbf{A}\mathbf{Z}^{T})       𝐀\mathbf{A}       𝟎\mathbf{0}
      Tr(𝐀𝐙)\operatorname{Tr}(\mathbf{AZ^{*}})       𝟎\mathbf{0}       𝐀T\mathbf{A}^{T}
      Tr(𝐀𝐙)\operatorname{Tr}(\mathbf{AZ^{\dagger}})       𝟎\mathbf{0}       𝐀\mathbf{A}
      Tr(𝐙𝐀𝐙𝐁)\operatorname{Tr}(\mathbf{ZAZB})       (𝐀𝐙𝐁+𝐁𝐙𝐀)T(\mathbf{AZB+BZA})^{T}       𝟎\mathbf{0}
      Tr(𝐙𝐀𝐙T𝐁)\operatorname{Tr}(\mathbf{ZAZ}^{T}\mathbf{B})       𝐁T𝐙𝐀T+𝐁𝐙𝐀\mathbf{B}^{T}\mathbf{ZA}^{T}+\mathbf{BZA}       𝟎\mathbf{0}
      Tr(𝐙𝐀𝐙𝐁)\operatorname{Tr}(\mathbf{ZAZ^{*}B})       (𝐀𝐙𝐁)T(\mathbf{AZ^{*}B})^{T}       (𝐁𝐙𝐀)T(\mathbf{BZA})^{T}
      Tr(𝐙𝐀𝐙𝐁)\operatorname{Tr}(\mathbf{ZAZ^{\dagger}B})       𝐁T𝐙𝐀T\mathbf{B}^{T}\mathbf{Z^{*}A}^{T}       𝐁𝐙𝐀\mathbf{BZA}
      Tr(𝐙k)\operatorname{Tr}(\mathbf{Z}^{k})       k(𝐙T)k1k(\mathbf{Z}^{T})^{k-1}       𝟎\mathbf{0}
      Tr(𝐀𝐙k)\operatorname{Tr}(\mathbf{AZ}^{k})       r=0k1(𝐙r𝐀𝐙k1r)T\sum_{r=0}^{k-1}(\mathbf{Z}^{r}\mathbf{AZ}^{k-1-r})^{T}       𝟎\mathbf{0}
      Tr(F(𝐙))\operatorname{Tr}(F(\mathbf{Z}))       F(𝐙)TF^{\prime}(\mathbf{Z})^{T}       𝟎\mathbf{0}
      𝐙F2\|\mathbf{Z}\|_{F}^{2}       𝐙\mathbf{Z^{*}}       𝐙\mathbf{Z}
      det𝐙\det\mathbf{Z}       det(𝐙)(𝐙T)1\det(\mathbf{Z})(\mathbf{Z}^{T})^{-1}       𝟎\mathbf{0}
      det𝐙\det\mathbf{Z^{*}}       𝟎\mathbf{0}       det(𝐙)(𝐙)1\det(\mathbf{Z^{*}})(\mathbf{Z}^{\dagger})^{-1}
      det(𝐙𝐀𝐙)\det(\mathbf{Z^{\dagger}AZ})       det(𝐙𝐀𝐙)(𝐙T)1\det(\mathbf{Z^{\dagger}AZ})(\mathbf{Z}^{T})^{-1}       det(𝐙𝐀𝐙)(𝐙)1\det(\mathbf{Z^{\dagger}AZ})(\mathbf{Z}^{\dagger})^{-1}
      det(𝐙k)\det(\mathbf{Z}^{k})       kdetk(𝐙)(𝐙T)1k\det^{k}(\mathbf{Z})(\mathbf{Z}^{T})^{-1}       𝟎\mathbf{0}
Table 1: A list of scalar functions f(𝐙,𝐙)f(\mathbf{Z,Z^{*}}) and their Wirtinger derivatives, assuming the input matrices 𝐙,𝐙\mathbf{Z,Z^{*}} are unstructured (see next section). The items here are selected from Table 4.3, [Hjø11] and Sections 2 and 4, [PP+08].
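The entries of Table 1 can be spot-checked numerically by applying Equation 3.4 entrywise with finite differences. The sketch below (with randomly chosen matrices) checks the row f=Tr(𝐙𝐀𝐙^{\dagger}𝐁), for which both Wirtinger derivatives are nonzero:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

def f(Z):
    return np.trace(Z @ A @ Z.conj().T @ B)  # f = Tr(Z A Z^dagger B)

def wirtinger(g, Z, conj=False, h=1e-6):
    # d/dZ = (d/dX - i d/dY)/2 and d/dZ* = (d/dX + i d/dY)/2, entrywise
    s = 1j if conj else -1j
    D = np.zeros(Z.shape, dtype=complex)
    for i in range(Z.shape[0]):
        for j in range(Z.shape[1]):
            E = np.zeros(Z.shape)
            E[i, j] = 1.0
            dgdx = (g(Z + h * E) - g(Z - h * E)) / (2 * h)
            dgdy = (g(Z + 1j * h * E) - g(Z - 1j * h * E)) / (2 * h)
            D[i, j] = 0.5 * (dgdx + s * dgdy)
    return D

# Table 1 predicts df/dZ = B^T Z^* A^T and df/dZ* = B Z A
assert np.allclose(wirtinger(f, Z), B.T @ Z.conj() @ A.T, atol=1e-5)
assert np.allclose(wirtinger(f, Z, conj=True), B @ Z @ A, atol=1e-5)
```

Note that perturbing Z_{ij} by a real or imaginary step perturbs both 𝐙 and 𝐙^{\dagger} in the evaluation of ff, exactly as in the (𝐗,𝐘) parametrization underlying Equation 3.4.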

From Section 2.5, [PP+08], we have that for real matrices 𝐗\mathbf{X} and a smooth function F(x)=n=0cnxnF(x)=\sum_{n=0}^{\infty}c_{n}x^{n}, the derivative of Tr(F(𝐗))\operatorname{Tr}(F(\mathbf{X})) is given by

Tr(F(𝐗))𝐗\displaystyle\frac{\partial\operatorname{Tr}(F(\mathbf{X}))}{\partial\mathbf{X}} =F(𝐗)T.\displaystyle=F^{\prime}(\mathbf{X})^{T}.

This is straightforwardly generalized to complex matrices 𝐙\mathbf{Z} and analytic functions F(z)=n=0cnznF(z)=\sum_{n=0}^{\infty}c_{n}z^{n}. The following proposition will be heavily used in Section 4 when we discuss applications in quantum information.

Proposition 3.2.

Let 𝐙\mathbf{Z} be a complex matrix and F(z)=n=0cnznF(z)=\sum_{n=0}^{\infty}c_{n}z^{n} be analytic. Define the scalar function f(𝐙,𝐙):=Tr(F(𝐙))f(\mathbf{Z,Z^{*}}):=\operatorname{Tr}(F(\mathbf{Z})). Then

Tr(F(𝐙))𝐙=F(𝐙)T\displaystyle\frac{\partial\operatorname{Tr}(F(\mathbf{Z}))}{\partial\mathbf{Z}}=F^{\prime}(\mathbf{Z})^{T}

where F()F^{\prime}(\cdot) is the complex derivative of F()F(\cdot).

Proof.

The operations of summing, tracing and differentiating commute, so we have

Tr(F(𝐙))Zij\displaystyle\frac{\partial\operatorname{Tr}(F(\mathbf{Z}))}{\partial Z_{ij}} =n=0cnTr𝐙nZij\displaystyle=\sum_{n=0}^{\infty}c_{n}\operatorname{Tr}\frac{\partial\mathbf{Z}^{n}}{\partial Z_{ij}}
=n=1cnlr=0n1(𝐙rΔij𝐙n1r)ll\displaystyle=\sum_{n=1}^{\infty}c_{n}\sum_{l}\sum_{r=0}^{n-1}\left(\mathbf{Z}^{r}\Delta_{ij}\mathbf{Z}^{n-1-r}\right)_{ll}\quad (see lemma below)\displaystyle(\text{see lemma below})
=n=1cnr=0n1l(Δij𝐙n1)ll\displaystyle=\sum_{n=1}^{\infty}c_{n}\sum_{r=0}^{n-1}\sum_{l}\left(\Delta_{ij}\mathbf{Z}^{n-1}\right)_{ll}\quad (cyclic permutation within trace)\displaystyle(\text{cyclic permutation within trace})
=n=1cnr=0n1lpΔij,lp(𝐙n1)pl\displaystyle=\sum_{n=1}^{\infty}c_{n}\sum_{r=0}^{n-1}\sum_{l}\sum_{p}\Delta_{ij,lp}(\mathbf{Z}^{n-1})_{pl}
=n=1cnr=0n1lpδilδjp(𝐙n1)pl\displaystyle=\sum_{n=1}^{\infty}c_{n}\sum_{r=0}^{n-1}\sum_{l}\sum_{p}\delta_{il}\delta_{jp}(\mathbf{Z}^{n-1})_{pl}\quad (definition of Δij)\displaystyle(\text{definition of $\Delta_{ij}$})
=n=1cnr=0n1(𝐙n1)ji\displaystyle=\sum_{n=1}^{\infty}c_{n}\sum_{r=0}^{n-1}(\mathbf{Z}^{n-1})_{ji}
=n=1cnn(𝐙n1)ji\displaystyle=\sum_{n=1}^{\infty}c_{n}n(\mathbf{Z}^{n-1})_{ji}
=F(𝐙)ji.\displaystyle=F^{\prime}(\mathbf{Z})_{ji}.

Lemma 3.3 (pg11, [PP+08]).

For n1n\geq 1,

(𝐙n)lmZij=r=0n1(𝐙rΔij𝐙n1r)lm\displaystyle\frac{\partial(\mathbf{Z}^{n})_{lm}}{\partial Z_{ij}}=\sum_{r=0}^{n-1}\left(\mathbf{Z}^{r}\Delta_{ij}\mathbf{Z}^{n-1-r}\right)_{lm}

where Δij\Delta_{ij} is the matrix with entry (i,j)(i,j) equaling one and other entries zero, i.e. Δij,lp:=δilδjp\Delta_{ij,lp}:=\delta_{il}\delta_{jp}.

Proof.

Expanding the matrix product gives

(𝐙n)lmZij\displaystyle\frac{\partial(\mathbf{Z}^{n})_{lm}}{\partial Z_{ij}} =Zij(k1,,kn1Zlk1Zk1k2Zkn2kn1Zkn1m)\displaystyle=\frac{\partial}{\partial Z_{ij}}\left(\sum_{k_{1},...,k_{n-1}}Z_{lk_{1}}Z_{k_{1}k_{2}}...Z_{k_{n-2}k_{n-1}}Z_{k_{n-1}m}\right)
=k1,,kn1r=0n1Zlk1Zk1k2Zkr1krδikrδjkr+1Zkr+1kr+2Zkn1m(product rule)\displaystyle=\sum_{k_{1},...,k_{n-1}}\sum_{r=0}^{n-1}Z_{lk_{1}}Z_{k_{1}k_{2}}...Z_{k_{r-1}k_{r}}\delta_{ik_{r}}\delta_{jk_{r+1}}Z_{k_{r+1}k_{r+2}}...Z_{k_{n-1}m}\quad(\text{product rule})
=r=0n1k1,,kn1Zlk1Zk1k2Zkr1krΔij,krkr+1Zkr+1kr+2Zkn1m\displaystyle=\sum_{r=0}^{n-1}\sum_{k_{1},...,k_{n-1}}Z_{lk_{1}}Z_{k_{1}k_{2}}...Z_{k_{r-1}k_{r}}\Delta_{ij,k_{r}k_{r+1}}Z_{k_{r+1}k_{r+2}}...Z_{k_{n-1}m}
=r=0n1(𝐙rΔij𝐙n1r)lm\displaystyle=\sum_{r=0}^{n-1}\left(\mathbf{Z}^{r}\Delta_{ij}\mathbf{Z}^{n-1-r}\right)_{lm}

where k0=lk_{0}=l and kn=mk_{n}=m in the second line. ∎
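Proposition 3.2 is likewise easy to test numerically. The sketch below uses the polynomial F(z)=z^{3}+2z (an illustrative choice), for which F'(𝐙)=3𝐙^{2}+2I, and compares the finite-difference Wirtinger derivative of Tr(F(𝐙)) against F'(𝐙)^{T}:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

# F(z) = z^3 + 2z, so F'(z) = 3z^2 + 2; Proposition 3.2 then predicts
# dTr(F(Z))/dZ = F'(Z)^T = (3 Z^2 + 2 I)^T
def f(Z):
    return np.trace(np.linalg.matrix_power(Z, 3) + 2 * Z)

h = 1e-6
D = np.zeros((n, n), dtype=complex)
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        dfdx = (f(Z + h * E) - f(Z - h * E)) / (2 * h)
        dfdy = (f(Z + 1j * h * E) - f(Z - 1j * h * E)) / (2 * h)
        D[i, j] = 0.5 * (dfdx - 1j * dfdy)

analytic = (3 * np.linalg.matrix_power(Z, 2) + 2 * np.eye(n)).T
assert np.allclose(D, analytic, atol=1e-4)
```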

3.3 Structured matrices and the chain rule

So far, by writing f:n×nf:\mathbb{C}^{n\times n}\longrightarrow\mathbb{C} we have implicitly assumed that the input matrices have independent components. This condition often does not hold, e.g. when our matrices of interest are symmetric, Hermitian, etc. Such matrices are termed ‘patterned’ in [Hjø11] and ‘structured’ in [PP+08] (we shall adopt the latter terminology).

In this situation, the results in Table 1 no longer hold. To obtain the correct Wirtinger derivatives with respect to structured matrices, we resort to the chain rule. For pedagogical purposes, we first demonstrate how to use the chain rule to find the matrix derivative of a function with respect to real symmetric matrices. Then we do likewise to obtain the Wirtinger derivatives of a function of complex, Hermitian matrices. In Table 2 we give the Wirtinger derivatives of f(𝐙,𝐙)f(\mathbf{Z,Z^{*}}) when 𝐙\mathbf{Z} is diagonal, symmetric, anti-symmetric, Hermitian and anti-Hermitian. All can be derived from the chain rule.

Let f(𝐗)f(\mathbf{X}) be a function of real matrices. The matrix derivative of f(𝐗)f(\mathbf{X}) with respect to 𝐗\mathbf{X} has components f/Xij\partial f/\partial X_{ij}. Using the chain rule, we write

fXij=k,l=1nfX~klX~klXij.\displaystyle\frac{\partial f}{\partial X_{ij}}=\sum_{k,l=1}^{n}\frac{\partial f}{\partial\tilde{X}_{kl}}\frac{\partial\tilde{X}_{kl}}{\partial X_{ij}}. (3.6)

Here, the tilde above 𝐗~\mathbf{\tilde{X}} indicates that it is an unstructured matrix. The term X~kl/Xij\partial\tilde{X}_{kl}/\partial X_{ij} takes into account the dependencies of the components of 𝐗\mathbf{X}. Following the terminology in [PP+08], we call it the structure matrix of 𝐗\mathbf{X}. If 𝐗\mathbf{X} is unstructured, then its structure matrix is simply

X~klXij=δkiδlj.\displaystyle\frac{\partial\tilde{X}_{kl}}{\partial X_{ij}}=\delta_{ki}\delta_{lj}.

Now let 𝐗\mathbf{X} be symmetric. For diagonal matrix elements (k=lk=l) we have X~kl/Xij=δkiδlj\partial\tilde{X}_{kl}/\partial X_{ij}=\delta_{ki}\delta_{lj}, while for off-diagonal matrix elements (klk\neq l) we have X~kl/Xij=δkiδlj+δliδkj\partial\tilde{X}_{kl}/\partial X_{ij}=\delta_{ki}\delta_{lj}+\delta_{li}\delta_{kj}. We can write these cases succinctly as

X~klXij\displaystyle\frac{\partial\tilde{X}_{kl}}{\partial X_{ij}} =δkiδlj+(1δkl)δliδkj\displaystyle=\delta_{ki}\delta_{lj}+(1-\delta_{kl})\delta_{li}\delta_{kj}
=δkiδlj+δliδkjδklδliδkj.\displaystyle=\delta_{ki}\delta_{lj}+\delta_{li}\delta_{kj}-\delta_{kl}\delta_{li}\delta_{kj}.

Plugging this into Eq. 3.6, we obtain

fXij\displaystyle\frac{\partial f}{\partial X_{ij}} =k,l=1nfX~klX~klXij\displaystyle=\sum_{k,l=1}^{n}\frac{\partial f}{\partial\tilde{X}_{kl}}\frac{\partial\tilde{X}_{kl}}{\partial X_{ij}}
=k,l=1nfX~kl(δkiδlj+δliδkjδklδliδkj)\displaystyle=\sum_{k,l=1}^{n}\frac{\partial f}{\partial\tilde{X}_{kl}}(\delta_{ki}\delta_{lj}+\delta_{li}\delta_{kj}-\delta_{kl}\delta_{li}\delta_{kj})
=fX~ij+fX~jiδijfX~ij\displaystyle=\frac{\partial f}{\partial\tilde{X}_{ij}}+\frac{\partial f}{\partial\tilde{X}_{ji}}-\delta_{ij}\frac{\partial f}{\partial\tilde{X}_{ij}}

for 1i,jn1\leq i,j\leq n. This gives

Proposition 3.4 (Matrix derivative with respect to symmetric matrices).

Let f(𝐗)f(\mathbf{X}) be a function of real symmetric matrices. Then the matrix derivative of ff with respect to 𝐗\mathbf{X} is

f𝐗=[f𝐗~+(f𝐗~)T𝐈f𝐗~]𝐗~=𝐗.\displaystyle\frac{\partial f}{\partial\mathbf{X}}=\left[\frac{\partial f}{\partial\mathbf{\tilde{X}}}+\left(\frac{\partial f}{\partial\mathbf{\tilde{X}}}\right)^{T}-\mathbf{I}\odot\frac{\partial f}{\partial\mathbf{\tilde{X}}}\right]_{\mathbf{\tilde{X}}=\mathbf{X}}.

Note that fXij=fXji\frac{\partial f}{\partial X_{ij}}=\frac{\partial f}{\partial X_{ji}}, as expected. Thus, to derive the matrix derivative with respect to symmetric matrices, we first obtain the matrix derivative of ff, assuming the inputs are unstructured, then form the linear combination given in Proposition 3.4, before reinstating the structured matrix 𝐗\mathbf{X} as the argument. For example,

Tr(𝐀𝐗)𝐗=𝐀+𝐀T𝐈𝐀\displaystyle\frac{\partial\operatorname{Tr}(\mathbf{AX})}{\partial\mathbf{X}}=\mathbf{A}+\mathbf{A}^{T}-\mathbf{I\odot A}

and

Tr(𝐗2)𝐗=2𝐗+2𝐗T2𝐈𝐗𝐓.\displaystyle\frac{\partial\operatorname{Tr}(\mathbf{X}^{2})}{\partial\mathbf{X}}=2\mathbf{X}+2\mathbf{X}^{T}-2\mathbf{I\odot\mathbf{X}^{T}}.
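Proposition 3.4 can be checked by differentiating with respect to the independent entries of a symmetric matrix directly: a perturbation of X_{ij} with i\neq j necessarily perturbs X_{ji} as well. A minimal sketch for f(𝐗)=Tr(𝐀𝐗), compared against 𝐀+𝐀^{T}−𝐈⊙𝐀:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
A = rng.normal(size=(n, n))
S = rng.normal(size=(n, n))
X = S + S.T  # a symmetric point

def f(X):
    return np.trace(A @ X)

# differentiate with respect to the independent entries of symmetric X:
# a perturbation of X_ij (i != j) necessarily perturbs X_ji as well
h = 1e-6
D = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        E[j, i] = 1.0  # same entry when i == j
        D[i, j] = (f(X + h * E) - f(X - h * E)) / (2 * h)

analytic = A + A.T - np.eye(n) * A  # I (Hadamard) A is the diagonal part of A
assert np.allclose(D, analytic, atol=1e-6)
```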
Structure of 𝐙\mathbf{Z} f𝐙(𝐙,𝐙)\frac{\partial f}{\partial\mathbf{Z}}(\mathbf{Z,Z^{*}}) f𝐙(𝐙,𝐙)\frac{\partial f}{\partial\mathbf{Z}^{*}}(\mathbf{Z,Z^{*}})
Unstructured: 𝐙n×n\mathbf{Z}\in\mathbb{C}^{n\times n} [f𝐙~]𝐙~=𝐙\left[\frac{\partial f}{\partial\mathbf{\tilde{Z}}}\right]_{\mathbf{\tilde{Z}}=\mathbf{Z}} [f𝐙~]𝐙~=𝐙\left[\frac{\partial f}{\partial\mathbf{\tilde{Z}^{*}}}\right]_{\mathbf{\tilde{Z}}=\mathbf{Z}}
Diagonal: 𝐙=𝐈𝐙\mathbf{Z}=\mathbf{I}\odot\mathbf{Z} [𝐈f𝐙~]𝐙~=𝐙\left[\mathbf{I}\odot\frac{\partial f}{\partial\mathbf{\tilde{Z}}}\right]_{\mathbf{\tilde{Z}}=\mathbf{Z}} [𝐈f𝐙~]𝐙~=𝐙\left[\mathbf{I}\odot\frac{\partial f}{\partial\mathbf{\tilde{Z}^{*}}}\right]_{\mathbf{\tilde{Z}}=\mathbf{Z}}
Symmetric: 𝐙=𝐙T\mathbf{Z}=\mathbf{Z}^{T} [f𝐙~+(f𝐙~)T𝐈f𝐙~]𝐙~=𝐙\left[\frac{\partial f}{\partial\mathbf{\tilde{Z}}}+\left(\frac{\partial f}{\partial\mathbf{\tilde{Z}}}\right)^{T}-\mathbf{I}\odot\frac{\partial f}{\partial\mathbf{\tilde{Z}}}\right]_{\mathbf{\tilde{Z}}=\mathbf{Z}} [f𝐙~+(f𝐙~)T𝐈f𝐙~]𝐙~=𝐙\left[\frac{\partial f}{\partial\mathbf{\tilde{Z}^{*}}}+\left(\frac{\partial f}{\partial\mathbf{\tilde{Z}^{*}}}\right)^{T}-\mathbf{I}\odot\frac{\partial f}{\partial\mathbf{\tilde{Z}^{*}}}\right]_{\mathbf{\tilde{Z}}=\mathbf{Z}}
Anti-symmetric: 𝐙=𝐙T\mathbf{Z}=-\mathbf{Z}^{T} [f𝐙~(f𝐙~)T]𝐙~=𝐙\left[\frac{\partial f}{\partial\mathbf{\tilde{Z}}}-\left(\frac{\partial f}{\partial\mathbf{\tilde{Z}}}\right)^{T}\right]_{\mathbf{\tilde{Z}}=\mathbf{Z}} [f𝐙~(f𝐙~)T]𝐙~=𝐙\left[\frac{\partial f}{\partial\mathbf{\tilde{Z}^{*}}}-\left(\frac{\partial f}{\partial\mathbf{\tilde{Z}^{*}}}\right)^{T}\right]_{\mathbf{\tilde{Z}}=\mathbf{Z}}
Hermitian: 𝐙=𝐙\mathbf{Z}=\mathbf{Z}^{\dagger} [f𝐙~+(f𝐙~)T]𝐙~=𝐙\left[\frac{\partial f}{\partial\mathbf{\tilde{Z}}}+\left(\frac{\partial f}{\partial\mathbf{\tilde{Z}^{*}}}\right)^{T}\right]_{\mathbf{\tilde{Z}}=\mathbf{Z}} [f𝐙~+(f𝐙~)T]𝐙~=𝐙\left[\frac{\partial f}{\partial\mathbf{\tilde{Z}^{*}}}+\left(\frac{\partial f}{\partial\mathbf{\tilde{Z}}}\right)^{T}\right]_{\mathbf{\tilde{Z}}=\mathbf{Z}}
Anti-Hermitian: 𝐙=𝐙\mathbf{Z}=-\mathbf{Z}^{\dagger} [f𝐙~(f𝐙~)T]𝐙~=𝐙\left[\frac{\partial f}{\partial\mathbf{\tilde{Z}}}-\left(\frac{\partial f}{\partial\mathbf{\tilde{Z}^{*}}}\right)^{T}\right]_{\mathbf{\tilde{Z}}=\mathbf{Z}} [f𝐙~(f𝐙~)T]𝐙~=𝐙\left[\frac{\partial f}{\partial\mathbf{\tilde{Z}^{*}}}-\left(\frac{\partial f}{\partial\mathbf{\tilde{Z}}}\right)^{T}\right]_{\mathbf{\tilde{Z}}=\mathbf{Z}}
Table 2: The Wirtinger derivatives of f(𝐙,𝐙)f(\mathbf{Z,Z^{*}}) where 𝐙\mathbf{Z} is a structured matrix. The tildes above 𝐙~,𝐙~\mathbf{\tilde{Z},\tilde{Z}^{*}} indicate that they are unstructured matrices. To derive the Wirtinger derivatives with respect to structured matrices, first obtain the Wirtinger derivative of ff, assuming the inputs are unstructured. After forming the correct expressions given above, reinstate the structured matrices 𝐙,𝐙\mathbf{Z,Z^{*}} as the arguments. Here we consider five structure classes, namely the diagonal, symmetric, anti-symmetric, Hermitian and anti-Hermitian matrices. This table is replicated from Table 6.2, [Hjø11].

Having demonstrated how to find matrix derivatives with respect to real structured matrices, we now apply the same methodology to obtain the Wirtinger derivatives of f(𝐙,𝐙)f(\mathbf{Z,Z^{*}}) when 𝐙\mathbf{Z} is structured. Let f(𝐙,𝐙)f(\mathbf{Z,Z^{*}}) be a function of complex matrices. Applying the chain rule, the Wirtinger derivatives are given by

fZij\displaystyle\frac{\partial f}{\partial Z_{ij}} =k,l=1nfZ~klZ~klZij+k,l=1nfZ~klZ~klZij\displaystyle=\sum_{k,l=1}^{n}\frac{\partial f}{\partial\tilde{Z}_{kl}}\frac{\partial\tilde{Z}_{kl}}{\partial Z_{ij}}+\sum_{k,l=1}^{n}\frac{\partial f}{\partial\tilde{Z}_{kl}^{*}}\frac{\partial\tilde{Z}_{kl}^{*}}{\partial Z_{ij}} (3.7)
fZij\displaystyle\frac{\partial f}{\partial Z_{ij}^{*}} =k,l=1nfZ~klZ~klZij+k,l=1nfZ~klZ~klZij.\displaystyle=\sum_{k,l=1}^{n}\frac{\partial f}{\partial\tilde{Z}_{kl}}\frac{\partial\tilde{Z}_{kl}}{\partial Z_{ij}^{*}}+\sum_{k,l=1}^{n}\frac{\partial f}{\partial\tilde{Z}_{kl}^{*}}\frac{\partial\tilde{Z}_{kl}^{*}}{\partial Z_{ij}^{*}}.

As before, the tildes above 𝐙~,𝐙~\mathbf{\tilde{Z},\tilde{Z}^{*}} indicate that they are unstructured matrices, so f𝐙~\frac{\partial f}{\partial\mathbf{\tilde{Z}}}, f𝐙~\frac{\partial f}{\partial\mathbf{\tilde{Z}^{*}}} can be obtained using the results in Section 3.2. Note that there are now four structure matrices. If 𝐙\mathbf{Z} is unstructured, then they are simply

Z~klZij=δkiδljZ~klZij=0Z~klZij=0Z~klZij=δkiδlj.\displaystyle\frac{\partial\tilde{Z}_{kl}}{\partial Z_{ij}}=\delta_{ki}\delta_{lj}\qquad\frac{\partial\tilde{Z}_{kl}}{\partial Z_{ij}^{*}}=0\qquad\frac{\partial\tilde{Z}_{kl}^{*}}{\partial Z_{ij}}=0\qquad\frac{\partial\tilde{Z}_{kl}^{*}}{\partial Z_{ij}^{*}}=\delta_{ki}\delta_{lj}.

If 𝐙\mathbf{Z} is Hermitian, we have

Proposition 3.5 (Wirtinger derivatives with respect to Hermitian matrices).

Let f(𝐙,𝐙)f(\mathbf{Z,Z^{*}}) be a function of complex Hermitian matrices. Then the Wirtinger derivatives of ff with respect to 𝐙,𝐙\mathbf{Z,Z^{*}} are given by

f𝐙=[f𝐙~+(f𝐙~)T]𝐙~=𝐙andf𝐙=[f𝐙~+(f𝐙~)T]𝐙~=𝐙.\displaystyle\frac{\partial f}{\partial\mathbf{Z}}=\left[\frac{\partial f}{\partial\mathbf{\tilde{Z}}}+\left(\frac{\partial f}{\partial\mathbf{\tilde{Z}^{*}}}\right)^{T}\right]_{\mathbf{\tilde{Z}}=\mathbf{Z}}\qquad\text{and}\qquad\frac{\partial f}{\partial\mathbf{Z^{*}}}=\left[\frac{\partial f}{\partial\mathbf{\tilde{Z}^{*}}}+\left(\frac{\partial f}{\partial\mathbf{\tilde{Z}}}\right)^{T}\right]_{\mathbf{\tilde{Z}}=\mathbf{Z}}.
Proof.

Let us first consider the Wirtinger derivative f𝐙\frac{\partial f}{\partial\mathbf{Z}}. We have the structure matrices

Z~klZij=δkiδljandZ~klZij=δliδkj.\displaystyle\frac{\partial\tilde{Z}_{kl}}{\partial Z_{ij}}=\delta_{ki}\delta_{lj}\qquad\text{and}\qquad\frac{\partial\tilde{Z}_{kl}^{*}}{\partial Z_{ij}}=\delta_{li}\delta_{kj}.

Thus

fZij\displaystyle\frac{\partial f}{\partial Z_{ij}} =k,l=1nfZ~klZ~klZij+k,l=1nfZ~klZ~klZij\displaystyle=\sum_{k,l=1}^{n}\frac{\partial f}{\partial\tilde{Z}_{kl}}\frac{\partial\tilde{Z}_{kl}}{\partial Z_{ij}}+\sum_{k,l=1}^{n}\frac{\partial f}{\partial\tilde{Z}_{kl}^{*}}\frac{\partial\tilde{Z}_{kl}^{*}}{\partial Z_{ij}}
=k,l=1nfZ~klδkiδlj+k,l=1nfZ~klδliδkj\displaystyle=\sum_{k,l=1}^{n}\frac{\partial f}{\partial\tilde{Z}_{kl}}\delta_{ki}\delta_{lj}+\sum_{k,l=1}^{n}\frac{\partial f}{\partial\tilde{Z}_{kl}^{*}}\delta_{li}\delta_{kj}
=fZ~ij+fZ~ji.\displaystyle=\frac{\partial f}{\partial\tilde{Z}_{ij}}+\frac{\partial f}{\partial\tilde{Z}_{ji}^{*}}.

f𝐙\frac{\partial f}{\partial\mathbf{Z^{*}}} can be similarly obtained. Alternatively, because 𝐙=𝐙T\mathbf{Z^{*}}=\mathbf{Z}^{T} it follows that

f𝐙=f𝐙T=(f𝐙)T=[f𝐙~+(f𝐙~)T]𝐙~=𝐙T=[f𝐙~+(f𝐙~)T]𝐙~=𝐙.\displaystyle\frac{\partial f}{\partial\mathbf{Z^{*}}}=\frac{\partial f}{\partial\mathbf{Z}^{T}}=\left(\frac{\partial f}{\partial\mathbf{Z}}\right)^{T}=\left[\frac{\partial f}{\partial\mathbf{\tilde{Z}}}+\left(\frac{\partial f}{\partial\mathbf{\tilde{Z}^{*}}}\right)^{T}\right]_{\mathbf{\tilde{Z}}=\mathbf{Z}}^{T}=\left[\frac{\partial f}{\partial\mathbf{\tilde{Z}^{*}}}+\left(\frac{\partial f}{\partial\mathbf{\tilde{Z}}}\right)^{T}\right]_{\mathbf{\tilde{Z}}=\mathbf{Z}}.

Note that fZij=fZji\frac{\partial f}{\partial Z_{ij}}=\frac{\partial f}{\partial Z_{ji}^{*}}, as expected. ∎

4 Applications in Quantum Information

To illustrate the utility of this calculus, we discuss a few examples of relevance in quantum information.

4.1 Examples

Example 4.1.

Consider the purity function on density operators, f:𝒟(N)f:\mathcal{D}(\mathcal{H}_{N})\longrightarrow\mathbb{R}, ρTr(ρ2)\rho\mapsto\operatorname{Tr}(\rho^{2}). The purity quantifies the noisiness of a state: it equals 11 if and only if the state is pure [Wil13]. Which state ρ\rho has the lowest purity? Intuitively, it is the maximally mixed state. Let us verify this via two approaches: the ‘classical’ method and Wirtinger’s method.

Since this is our first example, we shall be more thorough with the details.

i.

    (Classical) For any ρ𝒟(N)\rho\in\mathcal{D}(\mathcal{H}_{N}) we can write ρii=aii\rho_{ii}=a_{ii} and ρij=aij+ibij=ρji\rho_{ij}=a_{ij}+ib_{ij}=\rho^{*}_{ji} for iji\neq j, where aij,bija_{ij},b_{ij}\in\mathbb{R}. For example if N=2N=2 then

    ρ=[ρ11ρ12ρ21ρ22]=[a11a12+ib12a12ib12a22].\displaystyle\rho=\begin{bmatrix}\rho_{11}&\rho_{12}\\ \rho_{21}&\rho_{22}\end{bmatrix}=\begin{bmatrix}a_{11}&a_{12}+ib_{12}\\ a_{12}-ib_{12}&a_{22}\end{bmatrix}.

    Thus

    Tr(ρ2)=i,j=1Ni|ρ|jj|ρ|i\displaystyle\operatorname{Tr}(\rho^{2})=\sum_{i,j=1}^{N}\braket{i}{\rho}{j}\braket{j}{\rho}{i} =i,j=1N|i|ρ|j|2\displaystyle=\sum_{i,j=1}^{N}|\braket{i}{\rho}{j}|^{2}
    =i=1N|ρii|2+ij|ρij|2\displaystyle=\sum_{i=1}^{N}|\rho_{ii}|^{2}+\sum_{i\neq j}|\rho_{ij}|^{2}
    =i=1Naii2+2i<j(aij2+bij2).\displaystyle=\sum_{i=1}^{N}a_{ii}^{2}+2\sum_{i<j}(a_{ij}^{2}+b_{ij}^{2}).

    The purity function is viewed as a function of N2N^{2} real inputs:

    f({aii}i[N];{aij,bij}i<j)=i=1Naii2+2i<j(aij2+bij2).\displaystyle f(\{a_{ii}\}_{i\in[N]};\{a_{ij},b_{ij}\}_{i<j})=\sum_{i=1}^{N}a_{ii}^{2}+2\sum_{i<j}(a_{ij}^{2}+b_{ij}^{2}).

    We formulate the Lagrangian function

    =iaii2+2i<j(aij2+bij2)λ(iaii1),\displaystyle\mathcal{L}=\sum_{i}a_{ii}^{2}+2\sum_{i<j}(a_{ij}^{2}+b_{ij}^{2})-\lambda\left(\sum_{i}a_{ii}-1\right),

    where λ\lambda is the Lagrange multiplier taking into account the constraint Trρ=1\operatorname{Tr}\rho=1. The first-order conditions give

    aii=0\displaystyle\frac{\partial\mathcal{L}}{\partial a_{ii}}=0 aii=λ2\displaystyle\implies a_{ii}=\frac{\lambda}{2}
aij=0\displaystyle\frac{\partial\mathcal{L}}{\partial a_{ij}}=0 aij=0\displaystyle\implies a_{ij}=0
bij=0\displaystyle\frac{\partial\mathcal{L}}{\partial b_{ij}}=0 bij=0\displaystyle\implies b_{ij}=0
    λ=0\displaystyle\frac{\partial\mathcal{L}}{\partial\lambda}=0 λ=2N,\displaystyle\implies\lambda=\frac{2}{N},

    therefore the optimal state is ρ=IN\rho=\frac{I}{N}, the maximally mixed state.

ii.

    (Wirtinger) Per Wirtinger’s approach, we retain the structure of ρ\rho. Here f(ρ,ρ)=Tr(ρ2)f(\rho,\rho^{*})=\operatorname{Tr}(\rho^{2}) where ρ\rho is Hermitian and thus structured. We formulate the Lagrangian:

    =Tr(ρ2)λ(Trρ1).\displaystyle\mathcal{L}=\operatorname{Tr}(\rho^{2})-\lambda(\operatorname{Tr}\rho-1).

    To find the optimal ρ\rho, we invoke Proposition 3.1. The conditions required are

    ρ\displaystyle\frac{\partial\mathcal{L}}{\partial\rho} =Tr(ρ2)ρλ(Trρ1)ρ=𝟎\displaystyle=\frac{\partial\operatorname{Tr}(\rho^{2})}{\partial\rho}-\lambda\frac{\partial(\operatorname{Tr}\rho-1)}{\partial\rho}=\mathbf{0}
    λ\displaystyle\frac{\partial\mathcal{L}}{\partial\lambda} =0.\displaystyle=0.

To obtain Tr(ρ2)ρ\frac{\partial\operatorname{Tr}(\rho^{2})}{\partial\rho} and Trρρ\frac{\partial\operatorname{Tr}\rho}{\partial\rho} we use Proposition 3.2 (or simply consult Table 1) together with the chain rule for Hermitian matrices, Proposition 3.5. For example

    Tr(ρ2)ρ\displaystyle\frac{\partial\operatorname{Tr}(\rho^{2})}{\partial\rho} =[Tr(ρ~2)ρ~+(Tr(ρ~2)ρ~)T]ρ~=ρ\displaystyle=\left[\frac{\partial\operatorname{Tr}(\tilde{\rho}^{2})}{\partial\tilde{\rho}}+\cancel{\left(\frac{\partial\operatorname{Tr}(\tilde{\rho}^{2})}{\partial\tilde{\rho}^{*}}\right)^{T}}\right]_{\tilde{\rho}=\rho}
    =2ρT,\displaystyle=2\rho^{T},

    and similarly Trρρ=I\frac{\partial\operatorname{Tr}\rho}{\partial\rho}=I. Therefore ρ=𝟎2ρTλI=𝟎ρ=λ2I\frac{\partial\mathcal{L}}{\partial\rho}=\mathbf{0}\implies 2\rho^{T}-\lambda I=\mathbf{0}\implies\rho=\frac{\lambda}{2}I. The constraint Trρ=1\operatorname{Tr}\rho=1 then gives ρ=IN\rho=\frac{I}{N}.
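The Wirtinger gradient also lends itself to an iterative solution of this example. For Hermitian ρ, Proposition 3.5 gives the conjugate Wirtinger gradient ∂Tr(ρ²)/∂ρ^{*} = 2ρ; subtracting the trace component of the gradient (our own, simple way of handling the constraint Trρ=1) and descending drives any full-rank starting state to the maximally mixed state. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 4
# a random full-rank density matrix rho = G G^dagger / Tr(G G^dagger)
G = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
rho = G @ G.conj().T
rho = rho / np.trace(rho).real

# conjugate Wirtinger gradient of Tr(rho^2) for Hermitian rho is 2 rho;
# subtracting its trace component keeps Tr(rho) = 1 along the descent
eta = 0.2
for _ in range(200):
    grad = 2 * (rho - np.eye(N) / N)
    rho = rho - eta * grad

assert np.allclose(rho, np.eye(N) / N, atol=1e-8)
assert abs(np.trace(rho @ rho).real - 1 / N) < 1e-8
```

The iterates converge to ρ=I/N, whose purity 1/N is the minimum.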

Example 4.2.

Which density operator ρ\rho maximizes the von Neumann entropy H(ρ)=Tr(ρlogρ)H(\rho)=-\operatorname{Tr}(\rho\log\rho)? The usual textbook solution makes use of the quantum data-processing inequality, which states that D(σρ)D(𝒩(σ)𝒩(ρ))D(\sigma\|\rho)\geq D(\mathcal{N}(\sigma)\|\mathcal{N}(\rho)) for quantum states σ,ρ\sigma,\rho and any quantum channel 𝒩\mathcal{N} (here D(σρ)D(\sigma\|\rho) is the quantum relative entropy). Defining the quantum channel :𝒟(d)𝒟(d)\mathcal{E}:\mathcal{D}(\mathcal{H}_{d})\longrightarrow\mathcal{D}(\mathcal{H}_{d}), ρId\rho\mapsto\frac{I}{d}, we have H(ρ)+logd=D(ρId)D((ρ)(Id))=D(IdId)=0.-H(\rho)+\log d=D(\rho\|\frac{I}{d})\geq D(\mathcal{E}(\rho)\|\mathcal{E}(\frac{I}{d}))=D(\frac{I}{d}\|\frac{I}{d})=0. Thus H(ρ)logdH(\rho)\leq\log d, which is attained precisely when ρ=Id\rho=\frac{I}{d}.

Using Wirtinger calculus, we first prepare the Lagrangian $\mathcal{L}=-\operatorname{Tr}(\rho\log\rho)-\lambda(\operatorname{Tr}(\rho)-1)$. Then $\frac{\partial\mathcal{L}}{\partial\rho}=\mathbf{0}\implies(1+\lambda)I+\log\rho^{T}=\mathbf{0}\implies\rho=e^{-1-\lambda}I\implies\rho=\frac{I}{d}$, where the last step follows from the constraint $\operatorname{Tr}\rho=1$.
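The bound $H(\rho)\leq\log d$ with equality at $\rho=I/d$ is easy to check numerically. Below is a minimal sketch using NumPy (not from the original text), computing the von Neumann entropy from the eigenvalues of $\rho$:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3

def von_neumann_entropy(rho):
    # H(rho) = -sum_i p_i log p_i over the eigenvalues p_i of rho.
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]  # 0 log 0 = 0 by convention
    return float(-np.sum(p * np.log(p)))

# The maximally mixed state attains log d.
print(np.isclose(von_neumann_entropy(np.eye(d) / d), np.log(d)))  # True

# A generic random state has strictly smaller entropy.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real
print(von_neumann_entropy(rho) < np.log(d))  # True
```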

Example 4.3.

We extend the example above by imposing an additional constraint on $\rho$, namely that $\operatorname{Tr}(\rho H)=E$ for the Hamiltonian $H$ governing the quantum system. So we want to solve

\begin{align}
\text{maximize}_{\rho\in\mathcal{D}(\mathcal{H})} &\quad H(\rho) \\
\text{s.t.} &\quad \operatorname{Tr}(\rho H)=E.
\end{align}

Note that because $\rho,H$ are Hermitian, $\operatorname{Tr}(\rho H)$ is real. We require $h_{\min}<E<h_{\max}$ (where $h$ denotes a generic eigenvalue of $H$), otherwise the constraint $\operatorname{Tr}(\rho H)=E$ cannot be satisfied.

Set up the Lagrangian

\begin{align}
\mathcal{L}=-\operatorname{Tr}(\rho\log\rho)-\beta(\operatorname{Tr}(\rho H)-E)-\eta(\operatorname{Tr}\rho-1),
\end{align}

where $\beta$ and $\eta$ are the Lagrange multipliers. Making use of Proposition 3.2 for trace functions and the chain rule, Proposition 3.5, setting the Wirtinger derivative of $\mathcal{L}$ to zero gives

\begin{align}
\frac{\partial\mathcal{L}}{\partial\rho}=\mathbf{0} &\implies -(\log\rho)^{T}-I-\beta H^{T}-\eta I=\mathbf{0} \\
&\implies \rho=e^{-(\eta+1)}e^{-\beta H} \\
&\implies \rho=\frac{e^{-\beta H}}{\operatorname{Tr}(e^{-\beta H})}\qquad\text{after normalization},
\end{align}

where $\beta$ is to be determined from the constraint $\operatorname{Tr}(\rho H)=E$. We see that the solution takes the form of a Gibbs state with respect to the Hamiltonian $H$ and inverse temperature $\beta$ [PB21, Kar07].
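In practice $\beta$ must be found numerically. Since $\operatorname{Tr}(\rho_{\beta}H)$ decreases monotonically in $\beta$, bisection suffices. The sketch below (our own illustration, with a hypothetical two-level Hamiltonian) builds the Gibbs state in the eigenbasis of $H$ and solves for the $\beta$ matching a target energy $E$:

```python
import numpy as np

def gibbs_state(H, beta):
    # rho = e^{-beta H} / Tr e^{-beta H}, computed in the eigenbasis of H.
    h, U = np.linalg.eigh(H)
    w = np.exp(-beta * (h - h.min()))  # shift eigenvalues for numerical stability
    w /= w.sum()
    return (U * w) @ U.conj().T

def energy(H, beta):
    return float(np.trace(gibbs_state(H, beta) @ H).real)

# Hypothetical Hamiltonian; the target E must lie strictly between h_min and h_max.
H = np.diag([0.0, 1.0])
E = 0.25

# Tr(rho_beta H) is monotonically decreasing in beta, so bisect for the root.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if energy(H, mid) > E:
        lo = mid
    else:
        hi = mid
beta = 0.5 * (lo + hi)
print(abs(energy(H, beta) - E) < 1e-8)  # True
```

For this two-level example the constraint can be solved by hand, $e^{-\beta}/(1+e^{-\beta})=1/4$ giving $\beta=\ln 3$, which the bisection reproduces.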

Example 4.4.

In this example, we consider the Frobenius/Hilbert-Schmidt norm $\|\cdot\|_{F}$ as the figure of merit. Given a linear operator $L\in\mathcal{L}(\mathcal{H})$, we want to find the operator $T$ such that the Frobenius metric $\|T-L\|_{F}$ is minimized, while keeping the Frobenius norm and trace of $T$ fixed. That is, we want to solve

\begin{align}
\text{minimize}_{T\in\mathcal{L}(\mathcal{H})} &\quad \|T-L\|_{F} \\
\text{s.t.} &\quad \|T\|_{F}=C \\
&\quad \operatorname{Tr}T=D.
\end{align}

The Frobenius norm can be expressed as a trace: $\|T\|_{F}:=\sqrt{\operatorname{Tr}(T^{\dagger}T)}$. Since the map $x\mapsto x^{2}$ is increasing on the positive numbers, we could equally well minimize $\|T-L\|_{F}^{2}$, which removes the square root. The Lagrangian is then

\begin{align}
\mathcal{L} &= \|T-L\|_{F}^{2}+\lambda(\|T\|_{F}^{2}-C^{2})+\eta(\operatorname{Tr}T-D) \\
&= \operatorname{Tr}\left((T-L)^{\dagger}(T-L)\right)+\lambda(\operatorname{Tr}(T^{\dagger}T)-C^{2})+\eta(\operatorname{Tr}T-D).
\end{align}

Then, we obtain

\begin{align}
\frac{\partial\mathcal{L}}{\partial T}=\mathbf{0} &\implies T^{*}-L^{*}+\lambda T^{*}+\eta I=\mathbf{0}\qquad\text{(cf.\ Table 1)} \\
&\implies T=\frac{L-\eta I}{1+\lambda}.
\end{align}

The Lagrange multipliers $\lambda$ and $\eta$ can then be obtained from the constraint equations. We note that if there were no trace constraint then $T=\frac{C}{\|L\|_{F}}L$, while if there were no Frobenius norm constraint then $T=L-\frac{\operatorname{Tr}L-D}{\operatorname{Tr}I}I$.
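The two single-constraint solutions quoted above can be verified numerically. The following NumPy sketch (our own check, not part of the original text) confirms that each candidate satisfies its constraint, and that traceless perturbations of the trace-constrained solution never decrease the distance to $L$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
L = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

# No trace constraint: the closest operator of Frobenius norm C is a rescaled L.
C = 2.0
T_norm = (C / np.linalg.norm(L, 'fro')) * L
print(np.isclose(np.linalg.norm(T_norm, 'fro'), C))  # True

# No norm constraint: shift L by a multiple of I so that Tr T = D.
D = 1.0 + 0.5j
T_trace = L - ((np.trace(L) - D) / n) * np.eye(n)
print(np.isclose(np.trace(T_trace), D))  # True

# Sanity check: perturbing T_trace within the constraint surface (traceless
# directions) never reduces the distance to L, so T_trace is optimal.
P = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
P -= (np.trace(P) / n) * np.eye(n)  # make the perturbation traceless
base = np.linalg.norm(T_trace - L, 'fro')
print(all(np.linalg.norm(T_trace + eps * P - L, 'fro') >= base
          for eps in (0.1, -0.1, 0.01)))  # True
```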

4.2 Discussion

Above we gave a few simple applications of Wirtinger Calculus in quantum information. For more involved applications of this framework, we direct the reader to recent papers on quantum tomography [UARTND19] and quantum estimation theory [MPND22].

The WC first found widespread practical use in electrical engineering. The interested reader can consult [Hjø11] and [KD09] for the myriad applications of WC in topics such as digital signal processing, wireless communications, control theory and adaptive filtering. More recently, WC has also been widely utilized in machine learning (broadly defined); see for example [BR17, BT10, LA10, PXC+21, Vir19, ZLC+19]. We hope that this short article will be of use to researchers in these communities as well.

Acknowledgments

This work is supported by the National Research Foundation, Singapore, and A*STAR under its CQT Bridging Grant and its Quantum Engineering Programme under grant NRF2021-QEP2-02-P05.

References

  • [BR17] Sohail Bahmani and Justin Romberg. Phase retrieval meets statistical learning theory: A flexible convex relaxation. In Artificial Intelligence and Statistics, pages 252–260. PMLR, 2017.
  • [Bra83] DH Brandwood. A complex gradient operator and its application in adaptive array theory. In IEE Proceedings H (Microwaves, Optics and Antennas), volume 130, pages 11–16. IET Digital Library, 1983.
  • [BT10] Pantelis Bouboulis and Sergios Theodoridis. Extension of Wirtinger’s calculus to reproducing kernel Hilbert spaces and the complex kernel LMS. IEEE Transactions on Signal Processing, 59(3):964–978, 2010.
  • [GK06] Robert Everist Greene and Steven George Krantz. Function theory of one complex variable. American Mathematical Soc., 2006.
  • [HG07] Are Hjorungnes and David Gesbert. Complex-valued matrix differentiation: Techniques and key results. IEEE Transactions on Signal Processing, 55(6):2740–2746, 2007.
  • [Hjø11] Are Hjørungnes. Complex-valued matrix derivatives: with applications in signal processing and communications. Cambridge University Press, 2011.
  • [Kar07] Mehran Kardar. Statistical physics of particles. Cambridge University Press, 2007.
  • [KD09] Ken Kreutz-Delgado. The complex gradient operator and the CR-calculus. arXiv preprint arXiv:0906.4835, 2009.
  • [LA10] Hualiang Li and Tülay Adali. Algorithms for complex ML ICA and their stability analysis using Wirtinger calculus. IEEE Transactions on Signal Processing, 58(12):6156–6167, 2010.
  • [MPND22] M Muñoz, L Pereira, S Niklitschek, and A Delgado. Complex field formulation of the quantum estimation theory. arXiv preprint arXiv:2203.03064, 2022.
  • [PB21] R.K. Pathria and P.D. Beale. Statistical Mechanics. Elsevier, 2021.
  • [Poi98] Henri Poincaré. Sur les propriétés du potentiel et sur les fonctions abéliennes. 1898.
  • [PP+08] Kaare Brandt Petersen, Michael Syskind Pedersen, et al. The matrix cookbook. Technical University of Denmark, 7(15):510, 2008.
  • [PXC+21] Yi-Fei Pu, Xuetao Xie, Jinde Cao, Hua Chen, Kai Zhang, and Jian Wang. An input weights dependent complex-valued learning algorithm based on Wirtinger calculus. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 52(5):2920–2932, 2021.
  • [Rem91] Reinhold Remmert. Theory of complex functions, volume 122. Springer Science & Business Media, 1991.
  • [Rud53] Walter Rudin. Principles of mathematical analysis. McGraw-Hill, Inc., 1953.
  • [UARTND19] A Utreras-Alarcón, M Rivera-Tapia, S Niklitschek, and A Delgado. Stochastic optimization on complex variables and pure-state quantum tomography. Scientific Reports, 9(1):16143, 2019.
  • [VDB94] A Van Den Bos. Complex gradient and Hessian. IEE Proceedings-Vision, Image and Signal Processing, 141(6):380–382, 1994.
  • [Vir19] Patrick M Virtue. Complex-valued deep learning with applications to magnetic resonance image synthesis. University of California, Berkeley, 2019.
  • [Wil13] Mark M Wilde. Quantum information theory. Cambridge University Press, 2013.
  • [Wir27] Wilhelm Wirtinger. Zur formalen Theorie der Funktionen von mehr komplexen Veränderlichen. Mathematische Annalen, 97(1):357–375, 1927.
  • [ZLC+19] Bingjie Zhang, Yusong Liu, Jinde Cao, Shujun Wu, and Jian Wang. Fully complex conjugate gradient-based neural networks using Wirtinger calculus framework: Deterministic convergence and its application. Neural Networks, 115:50–64, 2019.