
Capacity of a Nonlinear Optical Channel
with Finite Memory

Erik Agrell, Alex Alvarado, Giuseppe Durisi, and Magnus Karlsson

Research supported by the Swedish Research Council (VR) under grant no. 2012-5280, the Swedish Foundation for Strategic Research (SSF) under grant no. RE07-0026, and the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 271986. This paper was presented in part at the 2013 European Conference on Optical Communication.

E. Agrell and G. Durisi are with the Department of Signals and Systems, Chalmers University of Technology, SE-41296 Gothenburg, Sweden (email: {agrell,durisi}@chalmers.se). M. Karlsson is with the Department of Microtechnology and Nanoscience, Chalmers University of Technology, SE-41296 Gothenburg, Sweden (email: magnus.karlsson@chalmers.se). A. Alvarado is with the Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, United Kingdom (email: alex.alvarado@ieee.org).
Abstract

The channel capacity of a nonlinear, dispersive fiber-optic link is revisited. To this end, the popular Gaussian noise (GN) model is extended with a parameter that accounts for the finite memory of realistic fiber channels. This finite-memory model is harder to analyze mathematically but, in contrast to previous models, it remains valid for nonstationary or heavy-tailed input signals. For uncoded transmission and standard modulation formats, the new model gives the same results as the regular GN model when the memory of the channel is about 10 symbols or more, confirming previous findings that the GN model is accurate for uncoded transmission. When coding is considered, however, the results obtained using the finite-memory model differ markedly from those obtained by previous models, even when the channel memory is large. In particular, the peaky behavior of the channel capacity, which has been reported for numerous nonlinear channel models, appears to be an artifact of applying models derived for independent input in a coded (i.e., dependent) scenario.

Index Terms:
Channel capacity, channel model, fiber-optic communications, Gaussian noise model, nonlinear distortion.

I Introduction

The introduction of coherent optical receivers has brought significant advantages to fiber-optic communications, e.g., enabling efficient polarization demultiplexing, higher-order modulation formats, increased sensitivity, and electrical mitigation of transmission impairments [1, 2]. Even if the linear transmission impairments (such as chromatic and polarization-mode dispersion) can be dealt with electronically, the Kerr nonlinearity in the fiber remains a significant obstacle. Since the nonlinearity distorts the signal at high signaling powers, arbitrarily high signal-to-noise ratios are inaccessible, which limits transmission over long distances and at high spectral efficiencies. This is sometimes referred to as the "nonlinear Shannon limit" [3, 4].

For systems with large accumulated dispersion and weak nonlinearity, the joint effect of chromatic dispersion and the Kerr effect is similar to that of additive Gaussian noise, as pointed out already by Splett [5] and Tang [6]. This Gaussian noise is particularly prominent in links without inline dispersion compensation, such as today's coherent links, where the dispersion compensation takes place electronically in the receiver signal processing. The Gaussian noise approximation has recently been rediscovered and applied to today's coherent links in a series of papers by Poggiolini et al. [7, 8, 9, 10] and other groups [11, 12, 13]. The resulting so-called Gaussian noise model, or GN model for short, is valid for multi-channel (wavelength- and polarization-division multiplexed) signals. It has also been shown to work for single-channel and single-polarization transmission if the dispersive decorrelation is large enough [11, 14].

A crucial assumption in the derivation of the GN model is that of independent, identically distributed (i.i.d.) inputs: the transmitted symbols are independent of each other, are drawn from the same constellation, and have the same constellation scaling (the same average transmit power). Under these assumptions, the model has been experimentally verified to be very accurate [15, 9] for the most common modulation formats, such as quadrature amplitude modulation (QAM) or phase-shift keying.

In this paper, the assumption of i.i.d. inputs is, perhaps for the first time in optical channel modeling, relaxed. This is done by introducing a modified GN model, which we call the finite-memory GN model. This new model includes the memory of the channel as a parameter and differs from previous channel models in that it is valid also when the channel input statistics are time-varying, or when “heavy-tailed” constellations are used.

The performance predicted by the regular GN model (both in terms of uncoded error probability and channel capacity) is compared with that predicted by the finite-memory GN model. The uncoded performance is characterized in terms of symbol error rate (SER) and bit error rate (BER), assuming i.i.d. data. Exact analytical expressions are obtained for 16-ary QAM (16-QAM), which show that the GN model is accurate for uncoded transmission and standard modulation formats, confirming previous results.

The main contributions of the paper are in terms of coded performance. Shannon, the father of information theory, proved that for a given channel, it is possible to achieve an arbitrarily small error probability, if the transmission rate in bits per symbol is small enough. A rate for which virtually error-free transmission is possible is called an achievable rate, and the supremum over all achievable rates for a given channel, represented as a statistical relation between its input $X$ and output $Y$, is defined as the channel capacity [16], [17, p. 195]. A capacity-approaching transmission scheme operates in general by grouping the data to be transmitted into blocks, encoding each block into a sequence of coded symbols, modulating and transmitting this sequence over the channel, and decoding the block in the receiver. This coding process introduces, by definition, dependencies among the transmitted symbols, which is the reason why channel models derived for i.i.d. inputs may be questionable for the purpose of capacity analysis.

More fundamentally, the regular GN model is not well-suited to capacity analysis, because in this model each output sample depends on the statistics of the previously transmitted input symbols (through their average power) rather than on their actual values. This yields artifacts in capacity analysis. One such artifact is the peaky behavior of the capacity of the GN model as a function of the transmit power. Indeed, through a capacity lower bound, it is shown in this paper that this peaky behavior does not occur for the finite-memory GN model, even when the memory is taken to be arbitrarily large.

The analysis of channel capacity for fiber-optical transmission dates back to 1993 [5], when Splett et al. quantified the impact of nonlinear four-wave mixing on the channel capacity. By applying Shannon's formula for the additive white Gaussian noise (AWGN) channel capacity to a channel with power-dependent noise, Splett et al. found that there exists an "optimal" finite signal-to-noise ratio that maximizes capacity; beyond this value, capacity starts decreasing. No justification was, however, given in [5] for the assumption that the noise is Gaussian. Using a different model for four-wave mixing, Stark [18] showed that capacity saturates, but does not decrease, at high power. In the same paper, the capacity loss due to the quantum nature of light was quantified. In 2001, Mitra and Stark [19] considered the capacity in links where cross-phase modulation dominates, proved that the capacity is lower-bounded by the capacity of a linear, Gaussian channel with the same input–output covariance matrix, and evaluated this bound via Shannon's AWGN formula. The obtained bound vanishes at high input power. It was claimed, without motivation, that the true capacity would have the same qualitative nonmonotonic behavior.

Since 2001, the interest in optical channel capacity has virtually exploded. The zero-dispersion channel was considered by Turitsyn et al. [20]. The joint effect of nonlinearity and dispersion was modeled by Djordjevic et al. [21] as a finite-state machine, which allowed the capacity to be estimated using the Bahl–Cocke–Jelinek–Raviv (BCJR) algorithm. Taghavi et al. [22] considered a fiber-optical multiuser system as a multiple-access channel and characterized its capacity region. In a very detailed tutorial paper, Essiambre et al. [23] applied a channel model based on extensive lookup tables and obtained capacity lower bounds for a variety of scenarios. Secondini et al. [24] obtained lower bounds using the theory of mismatched decoding. Recently, Dar et al. [25] modeled the nonlinear phase noise as being blockwise constant for a certain number of symbols, which is a channel with finite memory, obtaining improved capacity bounds.

Detailed literature reviews are provided in [26] for the early results, and in [23] for more recent results. Other capacity estimates, or lower bounds thereon, were reported for various nonlinear transmission scenarios in, e.g., [27, 28, 29, 30, 31, 32, 8, 4]. Most of these estimates or bounds decrease to zero as the power increases.

This paper is organized as follows. In Sec. II, the GN model is reviewed and the finite-memory GN model is introduced. In Sec. III, the uncoded error performance of the new finite-memory model is studied. The channel capacity is studied in Sec. IV and conclusions are drawn in Sec. V. The mathematical proofs are relegated to appendices.

Notation: Throughout this paper, vectors are denoted by boldface letters $\boldsymbol{x}$ and sets by calligraphic letters $\mathcal{X}$. Random variables are denoted by uppercase letters $X$ and their (deterministic) outcomes by the same letter in lowercase $x$. Probability density functions (PDFs) and conditional PDFs are denoted by $f_{Y}(y)$ and $f_{Y|X}(y|x)$, respectively. Analogously, probability mass functions (PMFs) are denoted by $P_{X}(x)$ and $P_{X|Y}(x|y)$. Expectations are denoted by $\mathbb{E}[\cdot]$ and random sequences by $\{Z_{k}\}$.

II Channel Modeling: Finite and Infinite Memory

In this section, we begin with a high-level description of the nonlinear interference in optical dual-polarization wavelength-division multiplexing (WDM) systems, highlighting the role of the channel memory, and thereafter, in Secs. II-B–II-D, describe in detail the channel models considered in this paper.

II-A Nonlinear Interference in Optical Channels

A coherent optical communication link converts a discrete, complex-valued electric data signal $x_{k}$ to a modulated, continuous optical signal, which is transmitted through an optical fiber, received coherently, and then converted back to a discrete output sequence $Y_{k}$. The coherent link is particularly simple theoretically, in that the transmitter and receiver directly map the electric data to the optical field, which is a linear operation (in contrast with, e.g., direct-detection receivers), and can ideally be performed without distortions. The channel is then well described by the propagation of the (continuous) optical field in the fiber link. It should be emphasized that this assumes the coherent receiver to be ideal, with perfect synchronization and negligible phase noise. Experiments have shown [2] that commercial coherent receivers can indeed perform well enough for the fiber propagation effects to be the main limitations. Two main linear propagation effects in the fiber need to be addressed: dispersion and attenuation. The attenuation effects can be overcome by periodic optical amplification, at the expense of additive Gaussian noise from the inline amplifiers. The dispersion effects are usually equalized electronically by a filter in the coherent receiver. Such a linear optical link is well described by an AWGN channel, the capacity of which grows unboundedly with the signal power.

However, the fiber Kerr nonlinearity introduces signal distortions and greatly complicates the transmission modeling. The nonlinear signal propagation in the fiber is described by a nonlinear partial differential equation, the nonlinear Schrödinger equation (NLSE), which includes dispersion, attenuation, and nonlinearity. At high power levels, the three effects can no longer be conveniently separated. However, in contemporary coherent links (distance at least 500 km and symbol rate at least 28 Gbaud), the nonlinearity is significantly weaker than the other two effects, and a perturbation approach can be successfully applied to the NLSE [5, 10, 11, 12]. This leads to the GN model, which will be described in Sec. II-C.

II-B Finite Memory

Even today's highly dispersive optical links have a finite memory. For example, a signal with dispersive length $L_{\text{D}}=1/(\Delta\omega^{2}|\beta_{2}|)$, where $\beta_{2}$ is the group velocity dispersion and $\Delta\omega$ the optical bandwidth, broadens (temporally) by a factor $L/L_{\text{D}}$ over a fiber of length $L$. With typical dispersion lengths of 5–50 km, this broadening factor can correspond to hundreds to thousands of adjacent symbols, a large but finite number. The same holds for interaction among WDM channels; if one interprets $\Delta\omega$ as the channel separation, $L/L_{\text{D}}$ gives an approximation of the number of symbols by which two WDM channels separate due to walk-off (and hence interact with nonlinearly during transmission). The channel memory will thus be even larger in the WDM case and increase with channel separation, but the nonlinear interaction will decrease due to the shorter $L_{\text{D}}$. Thus, the principle of a finite channel memory holds also for WDM signals. To keep notation as simple as possible, we consider a single, scalar, wavelength channel in this paper. Extensions to dual polarizations and WDM are possible, but would involve obscuring complications such as four-dimensional constellation spaces [33] in the former case and behavioral models [34] in the latter. We can thus say that in an optical link a certain signal may sense the interference from $N\approx L/L_{\text{D}}$ neighboring symbols, which is the physical reason for introducing a finite-memory model.
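As a rough numerical illustration, the following Python sketch evaluates $L_{\text{D}}$ and the broadening factor $L/L_{\text{D}}$ for the single-channel parameters given later in Table I; the identification $\Delta\omega=2\pi R_{\text{s}}$ for a single channel is our assumption, not a value taken from the paper.

```python
import numpy as np

beta2 = 21.7e-27          # |group velocity dispersion| [s^2/m] (Table I)
L = 700e3                 # link length [m] (Table I)
Rs = 32e9                 # symbol rate [baud] (Table I)

delta_omega = 2 * np.pi * Rs            # assumed optical bandwidth of one channel
L_D = 1 / (delta_omega**2 * beta2)      # dispersive length L_D = 1/(dw^2 |beta2|)
print(f"L_D ~ {L_D / 1e3:.1f} km, broadening factor L/L_D ~ {L / L_D:.0f}")
```

With these values, $L/L_{\text{D}}$ is on the order of several hundred symbols, consistent with the range quoted above.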

If we let the range $N$ of the interfering symbols go to infinity, an even simpler type of model is obtained. The interference is now averaged over infinitely many transmitted symbols. Assuming that an i.i.d. sequence is transmitted, this time average converges to a statistical average, which greatly simplifies the analysis. All models suggested for dispersive optical channels so far belong to this category [5, 23, 35, 10, 4, 11, 12, 14], of which the GN model described in Sec. II-C is the most common.

For a given transmitted complex symbol $x_{k}$, the (complex) single-channel output at each discrete time $k\in\mathbb{Z}$ is modeled as

$Y_{k}=x_{k}+Z_{k},$ (1)

where $\{Z_{k}\}$ is a circularly symmetric, complex, white, Gaussian random sequence. In (1), $Z_{k}$ is assumed to be independent of the actual transmitted sequence $\{x_{k}\}$; however, the variance of $Z_{k}$ depends on the transmit power, as detailed in Secs. II-C and II-D.

II-C The Regular GN Model

For coherent long-haul fiber-optical links without dispersion compensation, Splett et al. [5], Poggiolini et al. [7], and Beygi et al. [11] have all derived models where the nonlinear interference (NLI) appears as Gaussian noise, whose statistics depend on the transmitted signal power via a cubic relationship. The models assume that the transmitted symbols $x_{k}$ in time slots $k\in\mathbb{Z}$ are i.i.d. realizations of the same complex random variable $X$. In this model, the additive noise in (1) is given by

$Z_{k}=\tilde{Z}_{k}\sqrt{P_{\text{ASE}}+\eta P^{3}},$ (2)

where $\{\tilde{Z}_{k}\}$ are i.i.d. zero-mean, unit-variance, circularly symmetric complex Gaussian random variables, $P_{\text{ASE}}$ and $\eta$ are real, nonnegative constants, and $P=\mathbb{E}[|X|^{2}]$ is the average transmit power. Therefore, the noise is distributed as $Z_{k}\sim\mathcal{CN}(0,P_{\text{ASE}}+\eta P^{3})$, where $\mathcal{CN}(0,\sigma^{2})$ denotes a circularly symmetric complex Gaussian random variable with mean $0$ and variance $\sigma^{2}$. The parameter $P$, which is a property of the transmitter, governs the behavior of the channel model. It can be intuitively understood as a long-term average of the input power. Mathematically,

$P=\lim_{N\rightarrow\infty}\frac{1}{2N+1}\sum_{i=k-N}^{k+N}|x_{i}|^{2}$ (3)

for any given $k$, still assuming i.i.d. symbols $x_{k}$. For this reason, we will refer to models that depend on infinitely many past and/or future symbols, via $P$ in (3) or in some other way, as infinite-memory models.

The cubic relation in (2) between the transmit power and the additive noise variance $P_{\text{ASE}}+\eta P^{3}$ is a consequence of the Kerr nonlinearity, and it holds for both lumped and distributed amplification schemes. The constant $P_{\text{ASE}}$ represents the total amplified spontaneous emission (ASE) noise of the optical amplifiers for the channel under study, while $\eta$ quantifies the NLI. Several related expressions for this coefficient have been proposed. For example, for distributed amplification and WDM signaling over the length $L$,

$\eta=\frac{4\gamma^{2}L}{\pi|\beta_{2}|B^{2}}\log_{e}\left(2\pi e|\beta_{2}|LB^{2}\right),$ (4)

$\eta=\frac{16\gamma^{2}L}{27\pi|\beta_{2}|R_{\text{s}}^{2}}\log_{e}\left(\frac{2}{3}\pi^{2}|\beta_{2}|LB^{2}\right),$ (5)

were proposed in [5] and [36], respectively, where $\gamma$ is the fiber nonlinear coefficient, $B$ is the total WDM bandwidth, and $R_{\text{s}}$ is the symbol rate. Obviously, the expressions in (4) and (5) are qualitatively similar. For dual-polarization, single-channel transmission over $M$ lumped amplifier spans, the expression

$\eta=\frac{3\gamma^{2}}{\alpha^{2}}M^{1+\epsilon}\tanh\left(\frac{\alpha}{4|\beta_{2}|R_{\text{s}}^{2}}\right)$ (6)

was proposed in [14], and a qualitatively similar formula can be obtained from the results in [10]. Here, $\alpha$ is the attenuation coefficient of the fiber and the coefficient $\epsilon$ lies between $0$ and $1$ (see [10, 14]) depending on how well the nonlinear interference decorrelates between amplifier spans. For single-polarization transmission, the coefficient $3$ in (6) should be replaced by $2$ [11].
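As a sanity check on the numbers used later in the paper, the following Python sketch evaluates (6) with the parameters of Table I and $\epsilon=0$; since a single, scalar channel is treated here, the single-polarization prefactor $2$ mentioned above is used, which is our reading of how the Table I value was obtained.

```python
import numpy as np

alpha = 0.2 * np.log(10) / 10 / 1e3   # 0.2 dB/km converted to [1/m]
beta2 = 21.7e-27                      # |beta_2| [s^2/m]
gamma = 1.27e-3                       # nonlinear coefficient [1/(W m)]
M, eps, Rs = 10, 0, 32e9              # spans, decorrelation exponent, symbol rate

# eq. (6) with the single-polarization prefactor 2 instead of 3
eta = 2 * gamma**2 / alpha**2 * M**(1 + eps) * np.tanh(alpha / (4 * beta2 * Rs**2))
print(f"eta = {eta:.0f} W^-2")        # ~7244 W^-2, matching Table I
```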

The benefits of the GN model are that it is very accurate for uncoded transmission with traditional modulation formats,^1 as demonstrated in experiments and simulations [38, 15, 9], and that it is very simple to analyze. It is, however, not intended for nonstationary input sequences, i.e., sequences whose statistics vary with time, because the transmit power $P$ in (2) is defined as the (constant) power of a random variable that generates the i.i.d. symbols $x_{k}$. In order to capture the behavior of a wider class of transmission schemes, the GN model can be modified to depend on a time-varying transmit power, which is the topic of the next section.

^1 The model is not valid for exotic modulation formats such as satellite constellations [37].

II-D The Finite-Memory GN Model

As mentioned in Secs. I and II-C, a finite-memory model is essential in order to model the channel output corresponding to time-varying input distributions. Therefore, we refine the GN model of Sec. II-C to make it explicitly dependent on the channel memory $N$, in such a way that the model "converges" to the regular GN model as $N\rightarrow\infty$. Many such models can be formulated. In this paper, we aim for simplicity rather than accuracy.

The proposed model assumes that the input–output relation is still given by (1), but the average transmit power $P$ in (2) is replaced by an empirical power, i.e., by the arithmetic average of the squared magnitudes of the symbol $x_{k}$ and of the $2N$ symbols around it. Mathematically, (2) is replaced by

$Z_{k}=\tilde{Z}_{k}\sqrt{P_{\text{ASE}}+\eta\left(\frac{1}{2N+1}\sum_{i=k-N}^{k+N}|x_{i}|^{2}\right)^{3}}$ (7)

for any $k\in\mathbb{Z}$, where $N$ is the (one-sided) channel memory. We refer to (1) and (7) as the finite-memory GN model. Since (second-order) group velocity dispersion causes symmetric broadening with respect to the transit time of the signal, inter-symbol interference from dispersion acts both backwards and forwards in terms of the symbol index. This is why both past and future inputs contribute to the noise power in (7). A somewhat related model for the additive noise in the context of data transmission in electronic circuits has recently been proposed in [39], where the memory is single-sided and the noise scales linearly with the signal power, not cubically as in (7).
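To make the model concrete, the following Python sketch implements (1) and (7) for an arbitrary complex symbol sequence; the wrap-around handling of the sequence edges is our implementation choice, not part of the model.

```python
import numpy as np

def finite_memory_gn_channel(x, N, P_ASE=4.1e-6, eta=7244.0, rng=None):
    """Finite-memory GN model (1), (7): noise variance P_ASE + eta * Pemp_k**3,
    where Pemp_k is the empirical power over symbol k and its 2N neighbors."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x, dtype=complex)
    # sliding-window empirical power (edges wrap around, an implementation choice)
    p = np.pad(np.abs(x)**2, N, mode="wrap")
    p_emp = np.convolve(p, np.ones(2 * N + 1) / (2 * N + 1), mode="valid")
    sigma2 = P_ASE + eta * p_emp**3
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(x.size)
                                   + 1j * rng.standard_normal(x.size))
    return x + noise
```

For an i.i.d. input with power $P$ and large $N$, the empirical power inside this function converges to $P$, recovering the regular GN model, in line with the discussion below.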

Having introduced the finite-memory GN model, we now discuss some particular cases. First, the memoryless AWGN channel model can be obtained from both the GN and finite-memory GN models by setting $\eta=0$. In this case, the noise variance is $\mathbb{E}[|Z_{k}|^{2}]=P_{\text{ASE}}$ for all $k$. Second, let us consider the scenario where the transmitted symbols form a random process $\{X_{i}\}$. Then the empirical power $(1/(2N+1))\sum_{i=k-N}^{k+N}|X_{i}|^{2}$ at any discrete time $k$ is a random variable that depends on the magnitudes of the $k$th symbol and the $2N$ symbols around it. In the limit $N\rightarrow\infty$, this empirical power converges to the "statistical" power $P$ in (3), for any i.i.d. process with power $P$, as mentioned in Sec. II-C. This observation shows that the proposed finite-memory model in (7) "converges" to the GN model in (2), provided that the channel memory $N$ is sufficiently large and that the process consists of i.i.d. symbols with zero mean and variance $P$.

The purpose of the finite-memory model is to be able to predict the output of the channel when the transmitted symbols are not i.i.d. This is the case, for example, when the transmitted symbols form a nonstationary process (as will be exemplified in Sec. II-E) and also for coded sequences (which we discuss in Sec. IV). An advantage of the finite-memory model, from a theoretical viewpoint, is that the input–output relation of the channel is modeled as a fixed conditional probability of the output given the input and its history, which is the common notion of a channel model in communication and information theory ever since the work of Shannon [16], [40, p. 74]. This is in contrast to the regular GN model and other channel models, whose conditional distributions change depending on which transmitter the channel is connected to. Specifically, the GN model is represented by a family of such conditional distributions, one for each value of the transmitter parameter $P$.

[Figure: pulse amplitude versus symbol slot and propagation distance in km.]
Figure 1: Amplitude for a linearly propagating 15.6 ps raised-cosine pulse (compatible with 32 Gbaud) over 700 km of fiber with $\beta_{2}=-21.7~\text{ps}^{2}/\text{km}$. The lossy NLSE over 10 amplifier spans was simulated, with ASE noise switched off for clarity, and the peak power used was 0.1 mW.

A drawback of the proposed finite-memory model is that it is more complex than the GN model. Also, our model is not accurate for small values of $N$, since the GN assumption relies on the central limit theorem [7, 11, 12]. Furthermore, we assumed that all the $2N$ symbols around the symbol $x_{k}$ affect the noise variance equally. In practice, this is not the case. We nevertheless use the proposed model in this paper because it is relatively easy to analyze (see Secs. III and IV) and because even this simple finite-memory model captures the quantitative effects caused by non-i.i.d. symbols, which is essential for the capacity analysis in Sec. IV.

II-E Numerical Comparison

Before analyzing the finite-memory GN model, we first quantify the chromatic dispersion of the optical fiber. To this end, we simulated the transmission of a single symbol pulse over a single-channel, single-polarization fiber link without dispersion compensation. Ten amplifier spans over a total distance of 700 km were simulated using the lossy NLSE model. We used a raised-cosine pulse with peak power 0.1 mW and a duration of 15.6 ps at half the maximum amplitude, which corresponds to half the symbol slot in a 32 Gbaud transmission system. The result is illustrated in Fig. 1. At this low power, the nonlinear effects are almost negligible. For clarity of illustration, the ASE noise was neglected by setting $P_{\text{ASE}}=0$. The remaining system parameters are given in Table I and will be used throughout the paper, except when other values are explicitly stated. As we can see, the pulse broadens as it propagates along the fiber, having a width corresponding to about 100 data symbols after 700 km of transmission, or a half-width of $N=50$ symbols. This is in good agreement with the relation for symbol memory used in [41, p. 2037], which gives $2N\approx 2\pi|\beta_{2}|LR_{\text{s}}^{2}=97$.
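The memory relation quoted from [41] is easy to verify numerically; the following sketch uses the Table I values.

```python
import numpy as np

beta2, L, Rs = 21.7e-27, 700e3, 32e9      # Table I values in SI units
two_N = 2 * np.pi * beta2 * L * Rs**2     # symbol-memory relation from [41, p. 2037]
print(f"2N ~ {two_N:.0f} symbols")        # ~97, i.e., N ~ 50, matching Fig. 1
```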

TABLE I: System parameters used in the paper.
Symbol | Value | Meaning
$\alpha$ | 0.2 dB/km | Fiber attenuation
$\beta_{2}$ | $-21.7~\text{ps}^{2}/\text{km}$ | Group velocity dispersion
$\gamma$ | $1.27~(\text{W km})^{-1}$ | Fiber nonlinear coefficient
$M$ | 10 | Number of amplifier spans
$L$ | 700 km | System length
$R_{\text{s}}$ | 32 Gbaud | Symbol rate
$P_{\text{ASE}}$ | $4.1\cdot 10^{-6}$ W | Total ASE noise
$\eta$ | $7244~\text{W}^{-2}$ | NLI coefficient

Next, to validate the behavior of the finite-memory model with nonstationary input symbol sequences, we simulated the transmission of independent quadrature phase-shift keying (QPSK) data symbols with a time-varying magnitude over the same 700 km fiber link, at $R_{\text{s}}=32$ Gbaud. The transmitted sequence consists of 128 symbols with 4 mW of average signal power, 128 symbols at 0 mW power, 128 symbols at 4 mW, and so on. The statistical power is then 2 mW. The chosen pulse shape is a raised-cosine return-to-zero pulse. In Fig. 2, we show the amplitude of the transmitted symbols $|x_{k}|$ (red) and received symbols $|Y_{k}|$ (blue) for three different models: the NLSE, the finite-memory GN model with $N=50$, and the regular GN model. In the middle and lower plots of Fig. 2, we used the NLI coefficient $\eta=7244~\text{W}^{-2}$, which was calculated from (6), using $\epsilon=0$ for simplicity. Also in Fig. 2, we used $P_{\text{ASE}}=0$ to better illustrate the properties of the nonlinear models.

As can be seen, the agreement between the NLSE simulations and the finite-memory model is quite reasonable, whereas the GN model cannot capture the nonstationary dynamics. The results in Fig. 2 also show that the noise variance in the NLSE simulation is low around the symbols with low input power and high around the symbols with high input power. This behavior is captured by the finite-memory GN model but not by the regular GN model, for which the variance of the noise is the same at every time instant. This illustrates that the GN model (2) should be avoided for nonstationary symbol sequences such as the ones used in Fig. 2. This is not surprising, as the model was derived under an i.i.d. assumption. In Sec. IV, we will return to this observation when analyzing coded transmission. We believe that the finite-memory GN model proposed here, albeit idealized, is the first model that is able to deal with nonstationary symbol sequences.
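The middle panel of Fig. 2 can be reproduced, up to noise realizations, with a few lines of Python; the sketch below reuses the finite_memory_gn_channel function from the Sec. II-D sketch and the same settings ($N=50$, $P_{\text{ASE}}=0$, $\eta=7244~\text{W}^{-2}$), but is an illustration rather than the exact simulation setup.

```python
import numpy as np

rng = np.random.default_rng(0)
# alternating 128-symbol QPSK blocks at 4 mW and 0 mW (statistical power 2 mW)
x = np.concatenate([np.sqrt(p) * np.exp(1j * (np.pi / 4 + np.pi / 2
                    * rng.integers(0, 4, 128))) for p in (4e-3, 0.0) * 3])

# finite_memory_gn_channel as sketched in Sec. II-D, with P_ASE = 0 as in Fig. 2
y = finite_memory_gn_channel(x, N=50, P_ASE=0.0, eta=7244.0, rng=rng)
# |y| shows the noise variance ramping up and down across the block boundaries
```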

Figure 2: Amplitude of the transmitted QPSK symbols $|x_{k}|$ (red squares) and received symbols $|Y_{k}|$ (blue circles) for transmission over a 700 km fiber link. The received symbols are obtained using (top) the NLSE, (middle) the finite-memory GN model (7) with $N=50$, and (bottom) the regular GN model (2).

III Uncoded Error Probability

We assume that the transmitted symbols $\{X_{k}\}$ are independently drawn from a discrete constellation $\mathcal{S}=\{s_{1},\ldots,s_{M}\}$, where $M=2^{m}$. The symbols are assumed to be selected with equal probability, and thus, the average transmit (statistical) power is given by

$P=\mathbb{E}[|X|^{2}]=\frac{1}{M}\sum_{i=1}^{M}|s_{i}|^{2}.$ (8)

For each time instant $k$, we denote the sequence of the $2N$ symbols transmitted around $x_{k}$ by

$\boldsymbol{X}^{\text{mem}}_{k}\triangleq[X_{k-N},\ldots,X_{k-1},X_{k+1},\ldots,X_{k+N}],$ (9)

where the notation emphasizes that $\boldsymbol{X}^{\text{mem}}_{k}$ is a random vector describing the channel memory at time instant $k$. For future use, we define the function

$\rho(a)\triangleq P_{\text{ASE}}+\eta\left(\frac{a}{2N+1}\right)^{3}.$ (10)

For a given sequence of $2N$ symbols $\boldsymbol{x}^{\text{mem}}_{k}$ and a given transmitted symbol $X_{k}=s_{i}$, the conditional variance of the additive noise in (7) can be expressed as an explicit function of $\boldsymbol{x}^{\text{mem}}_{k}$ using (10), i.e.,

$\rho(|s_{i}|^{2}+\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2})=P_{\text{ASE}}+\eta\left(\frac{|s_{i}|^{2}+\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2}}{2N+1}\right)^{3},$ (11)

where $\|\boldsymbol{x}\|$ denotes the Euclidean norm of $\boldsymbol{x}$. For a given transmitted symbol $X_{k}=s_{i}$ and a given sequence $\boldsymbol{x}^{\text{mem}}_{k}$, the channel law for the finite-memory model is

$f_{Y_{k}|X_{k},\boldsymbol{X}^{\text{mem}}_{k}}(y|s_{i},\boldsymbol{x}^{\text{mem}}_{k})\triangleq\frac{1}{\pi\rho(|s_{i}|^{2}+\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2})}\exp\left(-\frac{|y-s_{i}|^{2}}{\rho(|s_{i}|^{2}+\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2})}\right).$ (12)

III-A Error Probability Analysis

We consider the equally spaced 16-QAM constellation shown in Fig. 3. In this case, $\mathcal{S}=\{a+b\sqrt{-1}:a,b\in\{\pm\Delta,\pm 3\Delta\}\}$, the minimum Euclidean distance (MED) of the constellation is $2\Delta$, and the statistical power is $P=10\Delta^{2}$. The binary labeling is the binary reflected Gray code (BRGC) [42], where the first two bits determine the in-phase (real) component of the symbols and the last two bits determine the quadrature (imaginary) component. This is shown with colors in Fig. 3.

Figure 3: The 16-QAM constellation $\mathcal{S}$ and its binary labeling. The binary labeling of the constellation is based on the Cartesian product of the BRGC for 4-ary pulse amplitude modulation in phase (red) and quadrature (blue). The Voronoi regions of the symbols and the MED of the constellation are also shown. The Voronoi region $\mathcal{V}_{6}$ is highlighted in gray.

The maximum-likelihood (ML) symbol-by-symbol detection rule for a given sequence $\boldsymbol{x}^{\text{mem}}_{k}$ chooses the symbol $s_{i}\in\mathcal{S}$ that maximizes $f_{Y_{k}|X_{k},\boldsymbol{X}^{\text{mem}}_{k}}(y|s_{i},\boldsymbol{x}^{\text{mem}}_{k})$ in (12). The decision made by this detector can be expressed as

$\hat{X}_{k}^{\text{ML}}=\mathop{\mathrm{argmin}}_{s_{i}\in\mathcal{S}}\left\{\log\rho(|s_{i}|^{2}+\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2})+\frac{|y-s_{i}|^{2}}{\rho(|s_{i}|^{2}+\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2})}\right\},$ (13)

which shows that, due to the dependency of $\log\rho(|s_{i}|^{2}+\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2})$ on $s_{i}$, this detector is not an MED detector. For simplicity, however, we disregard this term and study the MED detector, which chooses the symbol $s_{i}$ closest, in Euclidean distance, to the channel output $y$. Thus,

$\hat{X}_{k}=\mathop{\mathrm{argmin}}_{s_{i}\in\mathcal{S}}|y-s_{i}|^{2}=s_{i},\quad\text{if}~Y_{k}\in\mathcal{V}_{i},$ (14)

where $\mathcal{V}_{i}$ denotes the decision region, or Voronoi region, of $s_{i}$.

Remark 1

As we will later see, for large memory $N$, the MED detector in (14) is in fact equivalent to the detector in (13). Intuitively, this holds because the approximation $\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2}+|s_{i}|^{2}\approx\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2}$ becomes tight when $N$ is large.

Remark 2

The ML symbol-by-symbol detector in (13) is suboptimal, i.e., better detectors can be devised. For example, one could design a detector that uses not only the current received symbol, but also the next $N$ received symbols. Since the current transmitted symbol affects the noise of the next $N$ symbols, this information could be taken into account to make a better decision on the current symbol. In this paper, however, we focus on the MED detector in (14) because of its simplicity.
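The two detectors are straightforward to implement; the following Python sketch evaluates the metrics in (13) and (14) for the 16-QAM constellation of Fig. 3, with the memory term $\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2}$ passed in as a known quantity (the parameter values are illustrative).

```python
import numpy as np

P_ASE, eta, N = 4.1e-6, 7244.0, 10
P = 1e-3                                        # statistical power, P = 10*Delta^2
delta = np.sqrt(P / 10)
pam = np.array([-3, -1, 1, 3]) * delta
S = (pam[:, None] + 1j * pam[None, :]).ravel()  # 16-QAM constellation of Fig. 3

def rho(a):                                     # noise-variance function (10)
    return P_ASE + eta * (a / (2 * N + 1))**3

def detect(y, mem_norm2, ml=True):
    """ML rule (13) if ml=True, MED rule (14) otherwise; mem_norm2 = ||x_mem||^2."""
    if ml:
        var = rho(np.abs(S)**2 + mem_norm2)
        metric = np.log(var) + np.abs(y - S)**2 / var
    else:
        metric = np.abs(y - S)**2
    return S[np.argmin(metric)]
```

For large $N$, the term $|s_{i}|^{2}$ becomes negligible compared to $\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2}$, and the two rules return the same decisions, in line with Remark 1.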

The following two theorems give closed-form expressions for the BER and SER for the constellation in Fig. 3 when used over the finite-memory GN model.

Theorem 1

For the finite-memory GN model with arbitrary memory $N<\infty$, the BER of the MED detector for the 16-QAM constellation in Fig. 3 is given by

$\mathrm{BER}=\frac{2^{-3}}{2^{4N}}\sum_{l=0}^{4N}\binom{4N}{l}\sum_{\substack{r\in\{1,3,5\}\\ t\in\{1,5,9\}}}B_{r,t}\,Q\left(\sqrt{\frac{r^{2}P}{5\gamma_{l,t,N}}}\right),$ (15)

where

$B_{1,1}=2,\quad B_{3,1}=1,\quad B_{5,1}=0,$ (16)
$B_{1,5}=3,\quad B_{3,5}=2,\quad B_{5,5}=-1,$ (17)
$B_{1,9}=1,\quad B_{3,9}=1,\quad B_{5,9}=-1,$ (18)

and where

$\gamma_{l,t,N}\triangleq P_{\text{ASE}}+\frac{\eta}{(2N+1)^{3}}\left(\frac{P(2N+4l+t)}{5}\right)^{3}.$ (19)
Proof:

See Appendix A. ∎
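Theorem 1 can be evaluated directly; the following Python sketch implements (15)–(19) using the Table I constants and shows how the result approaches the $N\to\infty$ limit of Corollary 1 below.

```python
import numpy as np
from scipy.special import comb, erfc

P_ASE, eta = 4.1e-6, 7244.0
Q = lambda z: 0.5 * erfc(z / np.sqrt(2))
B = {(1, 1): 2, (3, 1): 1, (5, 1): 0,       # the coefficients (16)-(18)
     (1, 5): 3, (3, 5): 2, (5, 5): -1,
     (1, 9): 1, (3, 9): 1, (5, 9): -1}

def ber(P, N):
    """16-QAM BER over the finite-memory GN model, eqs. (15) and (19)."""
    total = 0.0
    for l in range(4 * N + 1):
        w = comb(4 * N, l, exact=True)
        for (r, t), Brt in B.items():
            g = P_ASE + eta / (2 * N + 1)**3 * (P * (2 * N + 4 * l + t) / 5)**3
            total += w * Brt * Q(np.sqrt(r**2 * P / (5 * g)))
    return total / (8.0 * 2**(4 * N))

for N in (1, 5, 50):
    print(N, ber(1e-3, N))   # approaches the GN-model BER of Corollary 1 as N grows
```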

Theorem 2

For the finite-memory GN model with arbitrary memory $N<\infty$, the SER of the MED detector for the 16-QAM constellation in Fig. 3 is given by

$\mathrm{SER}=\frac{4^{-1}}{4^{2N}}\sum_{l=0}^{4N}\binom{4N}{l}\sum_{\substack{e\in\{1,2\}\\ t\in\{1,5,9\}}}S_{e,t}\,Q\left(\sqrt{\frac{P}{5\gamma_{l,t,N}}}\right)^{e},$ (20)

where

$S_{1,1}=4,\quad S_{2,1}=-4,$ (21)
$S_{1,5}=6,\quad S_{2,5}=-4,$ (22)
$S_{1,9}=2,\quad S_{2,9}=-1,$ (23)

and where $\gamma_{l,t,N}$ is given by (19).

Proof:

See Appendix B. ∎

The BER and SER in the limit $N\to\infty$ can be inferred from Theorems 1 and 2, as shown in the next corollary.

Corollary 1

The BER and SER for the finite-memory GN model in the limit $N\to\infty$ are

$\mathrm{BER}=\frac{3}{4}Q\left(\sqrt{\frac{P/5}{P_{\text{ASE}}+\eta P^{3}}}\right)+\frac{1}{2}Q\left(\sqrt{\frac{9P/5}{P_{\text{ASE}}+\eta P^{3}}}\right)-\frac{1}{4}Q\left(\sqrt{\frac{5P}{P_{\text{ASE}}+\eta P^{3}}}\right),$ (24)

$\mathrm{SER}=3Q\left(\sqrt{\frac{P/5}{P_{\text{ASE}}+\eta P^{3}}}\right)-\frac{9}{4}Q\left(\sqrt{\frac{P/5}{P_{\text{ASE}}+\eta P^{3}}}\right)^{2}.$ (25)
Proof:

See Appendix C. ∎

The other extreme case to consider is the memoryless AWGN channel. The BER and SER expressions in this case are given in the following corollary.

Corollary 2

The BER and SER for the memoryless AWGN channel are given by

$\mathrm{BER}=\frac{3}{4}Q\left(\sqrt{\frac{P}{5P_{\text{ASE}}}}\right)+\frac{1}{2}Q\left(\sqrt{\frac{9P}{5P_{\text{ASE}}}}\right)-\frac{1}{4}Q\left(\sqrt{\frac{5P}{P_{\text{ASE}}}}\right),$ (26)

$\mathrm{SER}=3Q\left(\sqrt{\frac{P}{5P_{\text{ASE}}}}\right)-\frac{9}{4}Q\left(\sqrt{\frac{P}{5P_{\text{ASE}}}}\right)^{2}.$ (27)
Proof:

Set $\eta=0$ in (24) and (25). ∎

The results in Corollaries 1 and 2 correspond to the well-known expressions for the BER and SER for the AWGN channel. In particular, (26) can be found in [43, eq. (10)], [44, eq. (10.36a)] and (27) in [44, eq. (10.32)]. Also, the results in Corollary 2, with $P_{\text{ASE}}$ replaced by the GN-model noise variance $P_{\text{ASE}}+\eta P^{3}$ from (2), coincide with Corollary 1, showing that the BER and SER for the finite-memory GN model converge, as $N\rightarrow\infty$, to those of the regular GN model.

Figure 4: Analytical BER (top) and SER (bottom) of 16-QAM transmission with the finite-memory GN model, for different values of $N$ (solid lines). Markers show simulation results with the ML detector in (13) (squares) and the MED detector in (14) (circles). The results for the memoryless AWGN channel and the regular GN model are included for comparison.

III-B Numerical Results

We consider the same scenario as in Sec. II-E, with parameters according to Table I. The BER and SER for the 16-QAM constellation in Fig. 3, given by Theorems 1 and 2, are shown in Fig. 4 for different values of $N$. Fig. 4 also shows the limiting cases $N\rightarrow\infty$ and the memoryless AWGN channel, given by Corollaries 1 and 2, respectively. Furthermore, results obtained via computer simulations of (1) and (7) are included, using the ML detector in (13), marked with squares, and the MED detector in (14), marked with circles. As expected, the MED detector yields a perfect match with the analytical expressions, whereas the ML detector deviates slightly for small channel memories.

The results in Fig. 4 show that in the low-input-power regime, the memory in the channel plays no role for the BER and SER, and all the curves closely follow the BER and SER of a memoryless AWGN channel. However, as $P$ increases, the memory starts to matter, causing the BER and SER for finite $N$ to have a minimum and then to increase with $P$. Physically, this can be explained as follows: in the low-power regime, the BER is limited by the ASE noise, which is independent of the memory depth. In the high-power regime, the Kerr-induced noise dominates, resulting in a BER that increases with power. Similar behavior has been reported in most experiments and simulations on nonlinearly limited links, e.g., [45, 46, 9, 11], [47, Ch. 9]. The reason why the performance improves slightly with the memory depth $N$ is the nonlinear scaling of the Kerr-induced noise. For $N=1$, sequences of two or more high-amplitude symbols will receive high noise power and dominate the average BER. For higher $N$, longer (and less probable) sequences of high-amplitude symbols are required to receive the same, high, noise power. Thus, on average, the performance improves with $N$, up to the limit given by the GN model.

The results in Fig. 4 also show how the finite-memory model approaches the GN model in the high-input-power regime. For $N=50$, the two models yield very similar BER and SER curves.

IV Channel Capacity

In this section, some fundamentals of information theory are first reviewed. Then a lower bound on the capacity of the finite-memory GN model is derived and evaluated numerically.

IV-A Preliminaries

Fig. 5 shows a generic coded communication system where a message $j$ is mapped to a codeword $\boldsymbol{x}=[x_{1},\ldots,x_{n}]$. This codeword is then used to modulate a continuous-time waveform, which is transmitted through the physical channel. At the receiver's side, the continuous-time waveform is processed (filtered, equalized, synchronized, matched filtered, sampled, etc.), resulting in a discrete-time observation $\boldsymbol{Y}=[Y_{1},\ldots,Y_{n}]$, which is a noisy version of the transmitted codeword $\boldsymbol{x}$. The decoder uses $\boldsymbol{Y}$ to estimate the transmitted message $j$.

Figure 5: Encoder and decoder pair. The encoder maps a message $j$ to a codeword $\boldsymbol{x}=[x_{1},\ldots,x_{n}]$. The decoder uses the noisy observation $\boldsymbol{Y}=[Y_{1},\ldots,Y_{n}]$ to provide an estimate $\hat{\jmath}$ of the message $j$.

When designing a coded communication system, the first step is to choose the set of codewords (i.e., the codebook) that will be transmitted through the channel. Once the codebook has been chosen, the mapping rule between messages and codewords should be selected, which fully determines the encoding procedure. At the receiver side, the decoder block uses the mapping rule employed at the transmitter (as well as the channel characteristics) to give an estimate $\hat{\jmath}$ of the message $j$. The triplet of codebook, encoder, and decoder forms a so-called coding scheme. Practical coding schemes are designed so as to minimize the probability that $\hat{\jmath}$ differs from $j$, while at the same time keeping the complexity of both encoder and decoder low.

Channel capacity is the largest transmission rate at which reliable communication can occur. More formally, let $(n,M,\epsilon)$ be a coding scheme consisting of:

  • An encoder that maps a message $j\in\{1,\ldots,M\}$ into a block of $n$ transmitted symbols $\boldsymbol{x}=[x_{1},\ldots,x_{n}]$ satisfying the per-codeword power constraint

    $\frac{1}{n}\sum_{l=1}^{n}|x_{l}|^{2}=P.$ (28)

  • A decoder that maps the corresponding block of received symbols $\boldsymbol{Y}=[Y_{1},\ldots,Y_{n}]$ into a message $\hat{\jmath}\in\{1,\ldots,M\}$ so that the average error probability, i.e., the probability that $\hat{\jmath}$ differs from $j$, does not exceed $\epsilon$.

Observe that $P$ here is defined differently from in the previous sections. It still represents the average transmit power, but while this quantity was interpreted in Secs. II and III in a statistical sense, as the mean of an i.i.d. random variable, it is in this section the exact power of every codeword.

The maximum coding rate $R^{*}(n,\epsilon)$ (measured in bit/symbol) for a given block length $n$ and error probability $\epsilon$ is defined as the largest ratio $(\log_{2}M)/n$ for which an $(n,M,\epsilon)$ coding scheme exists. The channel capacity $C$ is the largest coding rate for which a coding scheme with vanishing error probability exists, in the limit of large block length,

$C\triangleq\lim_{\epsilon\to 0}\lim_{n\to\infty}R^{*}(n,\epsilon).$ (29)

IV-B Memoryless Channels

By Shannon's channel coding theorem, the channel capacity of a discrete-time memoryless channel, in bit/symbol, can be calculated as [16], [17, Ch. 7]

$C=\sup I(X;Y),$ (30)

where $I(X;Y)$ is the mutual information (MI)

$I(X;Y)=\iint f_{X,Y}(x,y)\log_{2}\frac{f_{X,Y}(x,y)}{f_{X}(x)f_{Y}(y)}\,\mathrm{d}x\,\mathrm{d}y$ (31)

and the maximization in (30) is over all probability distributions $f_{X}$ that satisfy $\mathbb{E}[|X|^{2}]=P$, for a given channel $f_{Y|X}$.

Roughly speaking, a transmission scheme that operates at an arbitrary rate $R<C$ can be designed by creating a codebook of $M=2^{nR}$ codewords of length $n$, whose elements are i.i.d. random samples from the distribution $f_{X}$ that maximizes the mutual information in (30). This codebook is stored in both the encoder and the decoder. During transmission, the encoder maps each message $j$ into a unique codeword $\boldsymbol{x}$, and the decoder identifies the codeword that is most similar, in some sense, to the received vector $\boldsymbol{Y}$. An arbitrarily small error probability $\epsilon$ can be achieved by choosing $n$ large enough. This random coding paradigm was proposed already by Shannon [16]. In practice, however, randomly constructed codebooks are usually avoided for complexity reasons.

Since the additive noise in (2) is statistically independent of $X_{k}$, the channel capacity of the GN model (2) can be calculated exactly as [5, 8]

$C=\log_{2}\left(1+\frac{P}{P_{\text{ASE}}+\eta P^{3}}\right)$ (32)

using Shannon's well-known capacity expression [16, Sec. 24], [17, Ch. 9]. The capacity in (32) can be achieved by choosing the codewords $\boldsymbol{x}$ to be drawn independently from a Gaussian distribution $\mathcal{CN}(0,P)$.

Considered as a function of the transmitted signal power $P$, the capacity in (32) has the peculiar behavior of reaching a peak and eventually decreasing to zero at high enough power, since the denominator of (32) increases faster than the numerator. This phenomenon, sometimes called the "nonlinear Shannon limit" in the optical communications community, conveys the message that reliable communication over nonlinear optical channels becomes impossible at high powers. In the following sections, we shall question this pessimistic conclusion.
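For later reference, the location of this peak follows from elementary calculus; the numerical values below assume the Table I constants.

$\frac{\mathrm{d}}{\mathrm{d}P}\,\frac{P}{P_{\text{ASE}}+\eta P^{3}}=\frac{P_{\text{ASE}}-2\eta P^{3}}{\bigl(P_{\text{ASE}}+\eta P^{3}\bigr)^{2}}=0\quad\Longrightarrow\quad P^{\star}=\left(\frac{P_{\text{ASE}}}{2\eta}\right)^{1/3}\approx 0.66~\text{mW}\;(-1.8~\text{dBm}),$

at which point $\eta(P^{\star})^{3}=P_{\text{ASE}}/2$, so the peak value of (32) is $\log_{2}\bigl(1+P^{\star}/(1.5P_{\text{ASE}})\bigr)\approx 6.8$ bit/symbol.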

IV-C Channels with Memory

The capacity of channels with memory is, under certain assumptions on information stability [48, Sec. I],

$C=\lim_{n\rightarrow\infty}\sup\frac{1}{n}I(\boldsymbol{X}_{1}^{n};\boldsymbol{Y}_{1}^{n}),$ (33)

where $\boldsymbol{X}_{i}^{j}=(X_{i},X_{i+1},\ldots,X_{j})$, $I(\boldsymbol{X}_{i}^{j};\boldsymbol{Y}_{i}^{j})$ is defined as a multidimensional integral analogous to (31), and the maximization is over all joint distributions of $X_{1},\ldots,X_{n}$ satisfying $\mathbb{E}[\|\boldsymbol{X}_{1}^{n}\|^{2}]=nP$. In this context, it is worth emphasizing that the maximization in (33) includes sequences $X_{1},\ldots,X_{n}$ that are not i.i.d. Hence, in order to calculate the channel capacity of a transmission link, it is essential that the employed channel model allows non-i.i.d. inputs.

An exact expression for the channel capacity of the finite-memory GN model (7) is not available. Shannon's formula, which leads to (32), does not apply here, because the sequences $\{X_{k}\}$ and $\{Z_{k}\}$, where $Z_{k}$ was defined in (7), are dependent. A capacity estimation via (33) is numerically infeasible, since it involves integration and maximization over high-dimensional spaces. We therefore turn our attention to bounds on the capacity of the finite-memory model. Every joint distribution of $X_{1},\ldots,X_{n}$ satisfying $\mathbb{E}[\|\boldsymbol{X}_{1}^{n}\|^{2}]=nP$ gives us a lower bound on capacity. Thus,

$C\geq\lim_{n\rightarrow\infty}\frac{1}{n}I(\boldsymbol{X}_{1}^{n};\boldsymbol{Y}_{1}^{n}),$ (34)

for any random process $\{X_{k}\}$ such that the limit exists.

Figure 6: Six samples of the random input process $\{X_{k}\}$ used to generate the lower bound in Theorem 3. The channel memory is here $N=1$, meaning that $2N+1=3$ input symbols $X_{k}$ influence each output symbol. The distributions are illustrated as scatter plots of 1000 realizations for each sample.

IV-D Lower Bound

In this section, a lower bound on (33) is derived by applying (34) to the following random input process. In every block of $2N+1$ consecutive symbols, we let the first $N$ symbols and the last $N$ symbols have a constant amplitude, whereas the amplitude of the symbol in the middle of the block follows an arbitrary distribution. The phase of each symbol in the block is assumed uniform. With this random input process, illustrated in Fig. 6, the memory in (7) depends only on a single variable-amplitude symbol. This enables us to derive an analytical expression for the resulting capacity lower bound in (34).

Theorem 3

For every $r_{1}\geq 0$ and every probability distribution $f_{R}$ over $\mathbb{R}^{+}$ such that

$\frac{2Nr_{1}^{2}+\mathbb{E}[R^{2}]}{2N+1}=P,$ (35)

where $R\sim f_{R}$, the channel capacity of (7) is lower-bounded as

$C\geq-\frac{\mathbb{E}[\log_{2}f_{\boldsymbol{U}}(\boldsymbol{U})]}{2N+1}-\int_{0}^{\infty}f_{R}(r)\log_{2}\bigl(e\rho(2Nr_{1}^{2}+r^{2})\bigr)\,\mathrm{d}r.$ (36)

Here, $\boldsymbol{U}\triangleq[U_{-N},U_{-N+1},\ldots,U_{N}]$ is a random vector whose probability density function $f_{\boldsymbol{U}}$ is

$f_{\boldsymbol{U}}(\boldsymbol{u})=\int_{0}^{\infty}f_{R}(r)\,\frac{\exp\left(-\frac{\sum_{k=-N}^{N}u_{k}+2Nr_{1}^{2}+r^{2}}{\rho(2Nr_{1}^{2}+r^{2})}\right)}{\bigl(\rho(2Nr_{1}^{2}+r^{2})\bigr)^{2N+1}}\,I_{0}\left(\frac{2r\sqrt{u_{0}}}{\rho(2Nr_{1}^{2}+r^{2})}\right)\prod_{\substack{k=-N\\ k\neq 0}}^{N}I_{0}\left(\frac{2r_{1}\sqrt{u_{k}}}{\rho(2Nr_{1}^{2}+r^{2})}\right)\mathrm{d}r,$ (37)

where the function $\rho(\cdot)$ is defined in (10), and $I_{0}(\cdot)$ is the modified Bessel function of the first kind.

Proof:

See Appendix D. ∎

The bound will be numerically computed in the next section.
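For concreteness, a minimal Monte Carlo sketch of this computation for $N=1$ is given below; the noise constants are those of Table I, while $r_{1}$, $\nu$, and $s$ are illustrative choices of ours (not the optimized values behind Fig. 7), with the t-distribution of Sec. IV-E used for $f_{R}$.

```python
import numpy as np
from scipy.special import i0e, logsumexp

rng = np.random.default_rng(1)
P_ASE, eta, N = 4.1e-6, 7244.0, 1
r1 = np.sqrt(1e-3)            # ring amplitude of the 2N outer symbols (illustrative)
nu, s = 3.0, 1e-3 / 6         # t-distribution (38) with E[R^2] = 2*nu*s/(nu-2) = 1 mW
P = (2 * N * r1**2 + 2 * nu * s / (nu - 2)) / (2 * N + 1)   # constraint (35)

def rho(a):                   # noise-variance function (10)
    return P_ASE + eta * (a / (2 * N + 1))**3

def sample_R(size):           # R = |X| with X bivariate t-distributed as in (38)
    g = rng.normal(scale=np.sqrt(s), size=(size, 2))
    return np.linalg.norm(g, axis=1) * np.sqrt(nu / rng.chisquare(nu, size))

def sample_U(n):              # U_k = |Y_k|^2 for one block of 2N+1 output symbols
    r = sample_R(n)
    sig2 = rho(2 * N * r1**2 + r**2)
    amp = np.hstack([np.full((n, N), r1), r[:, None], np.full((n, N), r1)])
    z = rng.standard_normal((n, 2 * N + 1)) + 1j * rng.standard_normal((n, 2 * N + 1))
    return np.abs(amp + np.sqrt(sig2 / 2)[:, None] * z)**2

r_in = sample_R(20000)        # inner samples for the mixture integral in (37)
sig2_in = rho(2 * N * r1**2 + r_in**2)

def log_fU(u):                # natural log of f_U(u) in (37), via log-sum-exp
    z0 = 2 * r_in * np.sqrt(u[N]) / sig2_in
    zk = 2 * r1 * np.sqrt(np.delete(u, N))[None, :] / sig2_in[:, None]
    g = (-(u.sum() + 2 * N * r1**2 + r_in**2) / sig2_in
         - (2 * N + 1) * np.log(sig2_in)
         + np.log(i0e(z0)) + z0              # I0(z) = i0e(z) e^z avoids overflow
         + (np.log(i0e(zk)) + zk).sum(axis=1))
    return logsumexp(g) - np.log(r_in.size)

h = -np.mean([log_fU(u) for u in sample_U(2000)]) / np.log(2)  # -E[log2 f_U(U)]
term2 = np.mean(np.log2(np.e * sig2_in))                       # integral term of (36)
print(f"P = {P*1e3:.1f} mW, C >= {h / (2 * N + 1) - term2:.2f} bit/symbol")
```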

IV-E Numerical Results

Theorem 3 yields a lower bound on capacity for every constant $r_{1}$ and every probability distribution $f_{R}$ satisfying (35). Instead of optimizing the bound over all distributions $f_{R}$, which is of limited interest since the theorem itself provides only a lower bound on capacity, we study a heuristically chosen family of distributions and optimize its parameters along with the constant amplitude $r_{1}$.

Figure 7: Lower bounds on capacity from Theorem 3 as a function of $\nu$, for various parameters $P$ and $r_{1}^{2}/s$. The memory is $N=1$.

An attractive choice in this context is to let the variable-amplitude symbols follow a circularly symmetric bivariate t-distribution [49, p. 86], [50, p. 1],

$f_{X}(x)=\frac{1}{2\pi s}\left(1+\frac{|x|^{2}}{\nu s}\right)^{-(1+\nu/2)},$ (38)

where $X$ (with magnitude $R=|X|$) denotes one such variable-amplitude symbol, $\nu$ is a shape parameter, and $s$ scales the variance, which equals [50, p. 11] $\mathbb{E}[|X|^{2}]=\mathbb{E}[R^{2}]=2\nu s/(\nu-2)$ if $\nu>2$ and is otherwise undefined. The shape of this distribution is similar to a Gaussian, but the heaviness of the tail can be controlled via the shape parameter $\nu$: the closer $\nu$ is to $2$, the heavier the tail. This is, as we shall see later, what makes it an interesting choice for nonlinear optical channels.

Again, we consider the same scenario as in Sec. II-E, with the system parameters given in Table I. The distribution of $R=|X|$ is given by $f_{R}(r)=2\pi rf_{X}(r)$, with $f_{X}$ given by (38). The power constraint (35), which reduces to

$P=\frac{1}{2N+1}\left(2Nr_{1}^{2}+\frac{2\nu s}{\nu-2}\right),$

leaves two degrees of freedom to optimize for each $P$, which we can take to be the shape parameter $\nu$ and the ratio $r_{1}^{2}/s$.
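In other words, for a given grid point $(\nu, r_{1}^{2}/s)$ and a target power $P$, the scale $s$ follows directly from the reduced constraint above; a small Python sketch of this bookkeeping, with arbitrary example numbers, is:

```python
import numpy as np

def t_params(P, N, nu, ratio):
    """Solve the reduced power constraint above for s, given ratio = r1^2/s."""
    s = P * (2 * N + 1) / (2 * N * ratio + 2 * nu / (nu - 2))
    return s, np.sqrt(ratio * s)              # (s, r1)

s, r1 = t_params(P=1e-3, N=1, nu=2.5, ratio=4.0)               # example grid point
assert np.isclose((2 * r1**2 + 2 * 2.5 * s / 0.5) / 3, 1e-3)   # recovers P
```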

[Figure 8: curves of $C$ [bit/symbol] versus $P$ [dBm] for $N=0$, $N=1$, and $N=2,5,10,20,50$, together with the AWGN capacity and the GN-model capacity.]
Figure 8: Lower bounds from Theorem 3 on the capacity of the finite-memory model for different values of $N$. The exact capacities of the AWGN channel and of the GN model in (32) are included for comparison. Observe that the capacity of the finite-memory model does not converge to the capacity of the GN model as the memory $N$ increases. Dashed lines indicate improved lower bounds via the law of monotonic channel capacity.

The lower bound on the capacity of the finite-memory model given by Theorem 3 is shown in Fig. 7 as a function of $P$, $\nu$, and $r_{1}^{2}/s$, for the special case $N=1$. The expectation in (36) was estimated by Monte Carlo integration. It can be seen that as the transmit power $P$ increases, the optimum shape parameter $\nu$ gets closer and closer to $2$. In other words, the tail gets heavier, so that at high power the tail carries almost all of the power, while the probability of transmitting a high amplitude $R$ remains small. In this sense, a t-distribution with a shape parameter near $2$ is similar to a satellite constellation [37].
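As a rough indication of how such a Monte Carlo estimate can be organized, the sketch below combines the `f_U` routine from Sec. IV-D with the entropy decomposition of Appendix D; `sample_R` is an assumed callable drawing amplitudes from $f_{R}$, and the estimate is illustrative rather than the implementation behind Figs. 7 and 8.

```python
def capacity_lower_bound(f_R, sample_R, rho, r1, N, trials, rng):
    """Monte Carlo estimate of the bound (36), written as
    C >= log2(pi) + h(U)/(2N+1) - E[log2(pi*e*rho(2N r1^2 + R^2))]."""
    h_U = 0.0      # accumulates -log2 f_U(U), i.e., h(U)
    h_cond = 0.0   # accumulates log2(pi*e*rho(...))
    for _ in range(trials):
        r = sample_R(rng)
        v = rho(2 * N * r1**2 + r**2)
        a = np.full(2 * N + 1, r1)
        a[N] = r
        noise = (rng.normal(size=2 * N + 1)
                 + 1j * rng.normal(size=2 * N + 1)) * np.sqrt(v / 2)
        u = np.abs(a + noise)**2              # one realization of U
        h_U += -np.log2(f_U(u, f_R, rho, r1, N)) / trials
        h_cond += np.log2(np.pi * np.e * v) / trials
    return np.log2(np.pi) + h_U / (2 * N + 1) - h_cond
```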

Selecting the optimum parameters $\nu$ and $r_{1}^{2}/s$ for every power $P$, the capacity bound is plotted in Fig. 8 as a function of the transmit power $P$, for selected values of the channel memory $N$. The figure also shows the AWGN channel capacity and the exact capacity of the GN model given by (32). In the linear regime, the capacity bound is close to the AWGN capacity if $N=0$, because the t-distribution is, at high values of $\nu$, approximately equal to the capacity-achieving Gaussian distribution. As $N$ increases, the capacity bound tends, still in the linear regime, to the mutual information of constant-amplitude transmission [51, 52].

Interestingly, we can see that as $N$ increases, the curves approach an asymptotic bound (the curves for $N=10$, $20$, and $50$ almost overlap). It follows that reliable communication in the high-input-power regime is indeed possible for every finite $N$. This result should be compared with the regular GN model, whose capacity (32) decreases to zero at high average transmit power [8]. It may seem contradictory that the GN model, which can be characterized as a limiting case of the finite-memory model (cf. (7) and (2)–(3)), nevertheless exhibits a fundamentally different channel capacity. This can be intuitively understood as follows. In every block of $2N+1$ symbols, we transmit $2N$ constant-amplitude symbols with low power and only one symbol with variable (potentially very large) power. Although the amplitude of this variable-power symbol is chosen so that the average power constraint is satisfied according to (35) (which requires averaging across many blocks of length $2N+1$), the convergence to the average power illustrated in (3) does not occur within a block, even when $N$ is taken very large.

It can be observed that the lower bounds in Fig. 8 all exhibit a peak before they converge to their asymptotic values at high $P$. Such bounds can always be improved using the law of monotonic channel capacity [53]. Cast in the framework of this paper, this law states that the channel capacity never decreases with power for any finite-memory channel. The law does not give a capacity lower bound per se, but it provides an instrument by which a lower bound at a certain power $P$ can be propagated to any power greater than $P$. Hence, the part of each curve in Fig. 8 to the right of its peak can be lifted up to the level of the peak, which yields marginally tighter lower bounds (dashed lines in Fig. 8).
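In code, this lifting is simply a running maximum over increasing power; a minimal sketch (array names hypothetical):

```python
import numpy as np

def lift_bound(P_dBm, C_lower):
    """Tighten a capacity lower bound via the law of monotonic channel
    capacity [53]: a bound valid at power P also holds at every power
    above P, so the lifted curve is the running maximum over power."""
    order = np.argsort(P_dBm)
    return (np.asarray(P_dBm)[order],
            np.maximum.accumulate(np.asarray(C_lower)[order]))
```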

V Discussion and Conclusions

We extended the popular GN model for nonlinear fiber channels with a parameter to account for the channel memory. The extended channel model, which is given by (7), is able to model the time-varying output of an optical fiber whose input is a nonstationary process. If the input varies on a time scale comparable to or longer than the memory of the channel, then this model gives more realistic results than the regular GN model, as we showed in Fig. 2.

The validity of the GN model remains undisputed in the case of i.i.d. input symbols, such as in an uncoded scenario with a fixed, not too heavy-tailed modulation format (examples of “heavy-tailed” modulation formats are t-distributions (Sec. IV-E) and satellite constellations [37]) and a fixed transmit power. These are the conditions under which the GN model was derived and validated. The uncoded bit and symbol error rates computed in Sec. III confirm that the finite-memory model behaves similarly to the GN model as the channel memory $N$ increases.

The scene changes completely if we instead study capacity, as in Fig. 8. In this case, the finite-memory GN model does not, even at high $N$, behave as the regular GN model. This is because the channel capacity by definition involves a maximization over all possible transmission schemes, including nonstationary input, heavy-tailed modulation formats, etc. In the nonlinear regime, it turns out to be beneficial to transmit using a heavy-tailed input sequence, whose output the GN model cannot reliably predict. Hence, the GN model and other infinite-memory models (in the sense defined in Sec. II-C) should be used with caution in capacity analysis. It is still possible (and often easy) to calculate the capacity of such channel models, but this capacity should not be interpreted as the capacity of some underlying physical phenomenon with finite memory. As a rule of thumb, if the model depends on the average transmit power, we recommend avoiding it in capacity analysis.

A challenging area for future work is to derive more realistic finite-memory models than (7), i.e., discrete-time channel models that give the channel output as a function of a finite number of input symbols, ideally including not only a time-varying sequence of symbols but also symbols in other wavelengths, polarizations, modes, and/or cores, and to analyze these models from an information-theoretic perspective. This may lead to innovative transmission techniques that could increase the capacity significantly over known results in the nonlinear regime. The so-called nonlinear Shannon limit, which has only been derived for infinite-memory channel models, does not preclude the existence of such techniques.

Appendix A Proof of Theorem 1

Let $\{B_{q}\}$, $q=1,\dots,4$, be the four bits associated with the 16-QAM constellation point chosen as the $k$th transmitted symbol $X_{k}$. The BER for the 16-QAM constellation in Fig. 3 is given by

$$\begin{aligned}
\mathrm{BER}&\triangleq\frac{1}{4}\sum_{q=1}^{4}\Pr\{\hat{B}_{q}\neq B_{q}\} &&(39)\\
&=\frac{1}{64}\sum_{q=1}^{4}\sum_{i=1}^{16}P_{\hat{B}_{q}|X_{k}}\bigl(\overline{B}_{q}\,|\,s_{i}\bigr), &&(40)
\end{aligned}$$

where $\hat{B}_{q}$ is the estimated bit obtained by the MED detector in (III-A) and $\overline{B}$ denotes bit negation. Using the law of total probability, we can then express (40) as

$$\mathrm{BER}=\frac{1}{64}\sum_{q=1}^{4}\sum_{i=1}^{16}\sum_{\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2}}P_{\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}}\bigl(\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2}\bigr)\,P_{\hat{B}_{q}|X_{k},\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}}\bigl(\overline{B}_{q}\,|\,s_{i},\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2}\bigr).\qquad(41)$$

We now compute the PMF $P_{\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}}$. As $\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}$ is a sum of $2N$ i.i.d. random variables, its PMF is the $2N$-fold self-convolution of the PMF of one such random variable. This convolution can be readily computed using probability generating functions [54, Sec. 5.1]. Let

$$\hat{P}_{|X_{k}|^{2}}(z)=\frac{1}{4}\bigl(z^{2\Delta^{2}}+2z^{10\Delta^{2}}+z^{18\Delta^{2}}\bigr)=\frac{1}{4}\bigl(z^{\Delta^{2}}+z^{9\Delta^{2}}\bigr)^{2}\qquad(42)$$

denote the probability generating function of $|X_{k}|^{2}$. The probability generating function of $\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}$ is given by

$$\begin{aligned}
\hat{P}_{\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}}(z)&=\bigl(\hat{P}_{|X_{k}|^{2}}(z)\bigr)^{2N} &&(43)\\
&=\frac{1}{4^{2N}}\bigl(z^{\Delta^{2}}+z^{9\Delta^{2}}\bigr)^{4N} &&(44)\\
&=\sum_{l=0}^{4N}\frac{1}{4^{2N}}\binom{4N}{l}z^{(4N+8l)\Delta^{2}}. &&(45)
\end{aligned}$$

We see from (45) that the possible outcomes of $\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}$ are

$$\delta_{l}\triangleq(4N+8l)\Delta^{2},\quad l=0,1,\ldots,4N,\qquad(46)$$

and $\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}=\delta_{l}$ occurs with probability $\binom{4N}{l}4^{-2N}$. Using this in (41) yields

$$\begin{aligned}
\mathrm{BER}&=\frac{4^{-3}}{4^{2N}}\sum_{l=0}^{4N}\binom{4N}{l}\sum_{q=1}^{4}\sum_{i=1}^{16}P_{\hat{B}_{q}|X_{k},\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}}\bigl(\overline{B}_{q}\,|\,s_{i},\delta_{l}\bigr)\\
&=\frac{4^{-3}}{4^{2N}}\sum_{l=0}^{4N}\binom{4N}{l}\sum_{q=1}^{4}\sum_{i=1}^{16}\sum_{\substack{j=1\\c_{j,q}\neq c_{i,q}}}^{16}\int_{\mathcal{V}_{j}}\frac{\pi^{-1}}{\rho(|s_{i}|^{2}+\delta_{l})}\exp\left(-\frac{|y-s_{i}|^{2}}{\rho(|s_{i}|^{2}+\delta_{l})}\right)\mathrm{d}y, &&(47)
\end{aligned}$$

where (47) follows from (III) and $c_{j,q}$ denotes the $q$th bit of the label of the symbol $s_{j}$ for $j=1,\ldots,16$.

The density in the integral in (47) corresponds to a Gaussian random variable with total variance $\rho(|s_{i}|^{2}+\delta_{l})$, so we now focus on this function. First, we partition the constellation point indices as $\{1,2,\ldots,16\}=\mathcal{I}_{1}\cup\mathcal{I}_{5}\cup\mathcal{I}_{9}$, where $\mathcal{I}_{1}\triangleq\{6,7,10,11\}$, $\mathcal{I}_{5}\triangleq\{2,3,5,8,9,12,14,15\}$, and $\mathcal{I}_{9}\triangleq\{1,4,13,16\}$. From Fig. 3, we see that $|s_{i}|^{2}=2\Delta^{2}$ if $i\in\mathcal{I}_{1}$, $|s_{i}|^{2}=10\Delta^{2}$ if $i\in\mathcal{I}_{5}$, and $|s_{i}|^{2}=18\Delta^{2}$ if $i\in\mathcal{I}_{9}$. Using the definition of $\rho(|s_{i}|^{2}+\delta_{l})$ in (11) together with (46) and $P=10\Delta^{2}$, we obtain

$$\rho(|s_{i}|^{2}+\delta_{l})=
\begin{cases}
P_{\text{ASE}}+\frac{\eta}{(2N+1)^{3}}\left(\frac{P(2N+4l+1)}{5}\right)^{3}, & \text{if } i\in\mathcal{I}_{1}\\[4pt]
P_{\text{ASE}}+\frac{\eta}{(2N+1)^{3}}\left(\frac{P(2N+4l+5)}{5}\right)^{3}, & \text{if } i\in\mathcal{I}_{5}\\[4pt]
P_{\text{ASE}}+\frac{\eta}{(2N+1)^{3}}\left(\frac{P(2N+4l+9)}{5}\right)^{3}, & \text{if } i\in\mathcal{I}_{9}.
\end{cases}\qquad(48)$$

We recognize the three values of $\rho(|s_{i}|^{2}+\delta_{l})$ in (48) as $\gamma_{l,1,N}$, $\gamma_{l,5,N}$, and $\gamma_{l,9,N}$, respectively. Combining this with $P=10\Delta^{2}$ and inspecting the constellation and labeling in Fig. 3 yields (15).
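A quick numerical sanity check of (45)–(46), under the normalization $\Delta=1$ (hypothetical, chosen only for this demo): drawing $2N$ i.i.d. 16-QAM symbol powers from $\{2,10,18\}$ with probabilities $\{1/4,1/2,1/4\}$ reproduces the binomial weights $\binom{4N}{l}4^{-2N}$ on the grid $4N+8l$.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
N, trials = 3, 10**6
powers = rng.choice([2, 10, 18], p=[0.25, 0.5, 0.25], size=(trials, 2 * N))
sums = powers.sum(axis=1)                 # realizations of ||X_k^mem||^2
for l in (0, N, 2 * N, 4 * N):
    emp = np.mean(sums == 4 * N + 8 * l)  # empirical frequency of delta_l
    ana = comb(4 * N, l) / 4**(2 * N)     # binomial weight from (45)
    print(l, emp, ana)
```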

Appendix B Proof of Theorem 2

The SER for the 16-QAM constellation in Fig. 3 is

$$\begin{aligned}
\mathrm{SER}&\triangleq\Pr\{\hat{X}_{k}\neq X_{k}\} &&(49)\\
&=\frac{1}{16}\sum_{i=1}^{16}\Pr\{\hat{X}_{k}\neq s_{i}\,|\,X_{k}=s_{i}\} &&(50)\\
&=\frac{1}{16}\sum_{i=1}^{16}\sum_{\substack{j=1\\j\neq i}}^{16}\Pr\{Y_{k}\in\mathcal{V}_{j}\,|\,X_{k}=s_{i}\}. &&(51)
\end{aligned}$$

By conditioning on the possible values of $\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}$, we obtain

$$\begin{aligned}
\mathrm{SER}&=\frac{4^{-2}}{4^{2N}}\sum_{l=0}^{4N}\binom{4N}{l}\sum_{i=1}^{16}\sum_{\substack{j=1\\j\neq i}}^{16}\Pr\{Y_{k}\in\mathcal{V}_{j}\,|\,X_{k}=s_{i},\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}=\delta_{l}\} &&(52)\\
&=\frac{4^{-2}}{4^{2N}}\sum_{l=0}^{4N}\binom{4N}{l}\sum_{i=1}^{16}\sum_{\substack{j=1\\j\neq i}}^{16}\int_{\mathcal{V}_{j}}\frac{\pi^{-1}}{\rho(|s_{i}|^{2}+\delta_{l})}\exp\left(-\frac{|y-s_{i}|^{2}}{\rho(|s_{i}|^{2}+\delta_{l})}\right)\mathrm{d}y, &&(53)
\end{aligned}$$

where $\delta_{l}$ is given by (46).

The expression in (20) is obtained by recognizing the density in the integral in (53) as that of a Gaussian random variable with total variance given by (48), and by integrating, for each $i$ in the sets $\mathcal{I}_{1}$, $\mathcal{I}_{5}$, and $\mathcal{I}_{9}$, over $\mathcal{V}_{j}$ with $j\neq i$. This completes the proof of Theorem 2.
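The conditional-Gaussian structure in (52)–(53) also suggests a direct Monte Carlo cross-check: draw the interference index $l$ from the binomial law behind (46), add Gaussian noise of the corresponding total variance, and detect by minimum Euclidean distance. The sketch below does this with hypothetical values of $P_{\text{ASE}}$, $\eta$, and $P$, and with $\rho(\cdot)$ written out as in (48).

```python
import numpy as np

P_ASE, ETA, N, P = 1e-4, 1e-3, 5, 1.0   # hypothetical system parameters
Delta = np.sqrt(P / 10)                 # from P = 10 * Delta^2
pts = np.array([a + 1j * b for a in (-3, -1, 1, 3)
                for b in (-3, -1, 1, 3)]) * Delta   # 16-QAM constellation

def rho(sigma2):                        # cf. (11) and (48)
    return P_ASE + ETA * (sigma2 / (2 * N + 1))**3

rng = np.random.default_rng(0)
trials = 10**5
x = rng.choice(pts, trials)             # transmitted symbols
l = rng.binomial(4 * N, 0.5, trials)    # interference index, cf. (46)
v = rho(np.abs(x)**2 + (4 * N + 8 * l) * Delta**2)
y = x + (rng.normal(size=trials)
         + 1j * rng.normal(size=trials)) * np.sqrt(v / 2)
xhat = pts[np.argmin(np.abs(y[:, None] - pts[None, :])**2, axis=1)]
print("SER ~", np.mean(xhat != x))      # to be compared with (20)
```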

Appendix C Proof of Corollary 1

The SER in (20) can be expressed as

$$\mathrm{SER}=\frac{1}{4}\sum_{\substack{e\in\{1,2\}\\t\in\{1,5,9\}}}S_{e,t}\sum_{l=0}^{4N}\frac{1}{4^{2N}}\binom{4N}{l}u_{e}\left(\frac{4l+t-1}{4(2N+1)}\right),\qquad(54)$$

where

$$u_{e}(x)\triangleq Q\left(\sqrt{\frac{P/5}{P_{\text{ASE}}+(1+4x)^{3}\eta(P/5)^{3}}}\right)^{\!e}\qquad(55)$$

is a continuous and bounded function on $[0,2]$ for every $e\in\{1,2\}$ and $t\in\{1,5,9\}$. We can interpret the innermost sum in (54) in probabilistic terms as

$$\sum_{l=0}^{4N}\frac{1}{4^{2N}}\binom{4N}{l}u_{e}\left(\frac{4l+t-1}{4(2N+1)}\right)=\mathbb{E}\left[u_{e}\left(\frac{4S_{4N}+t-1}{4(2N+1)}\right)\right],\qquad(56)$$

where $S_{4N}$ is a binomial random variable with parameters $(4N,1/2)$, i.e., $S_{4N}$ is the sum of $4N$ i.i.d. Bernoulli random variables that take the values $0$ and $1$ with equal probability. We use the notation $S_{4N}$ to emphasize the dependency on $N$. To establish (25), we first calculate

$$\begin{aligned}
\lim_{N\to\infty}\mathbb{E}\left[u_{e}\left(\frac{4S_{4N}+t-1}{4(2N+1)}\right)\right]
&=\mathbb{E}\left[\lim_{N\to\infty}u_{e}\left(\frac{4S_{4N}+t-1}{4(2N+1)}\right)\right] &&(57)\\
&=\mathbb{E}\left[u_{e}\left(\lim_{N\to\infty}\frac{4S_{4N}+t-1}{4(2N+1)}\right)\right] &&(58)\\
&=\mathbb{E}[u_{e}(1)] &&(59)\\
&=u_{e}(1). &&(60)
\end{aligned}$$

Here, (57) follows from the dominated convergence theorem [54, Sec. 5.6.(12).(b)], whose application is possible because $u_{e}(x)$ is a bounded function, (58) holds because $u_{e}(x)$ is continuous, and (59) follows from the law of large numbers (see, e.g., [54, Sec. 7.4.(3)]). The proof of (25) is completed by using

$$u_{e}(1)=Q\left(\sqrt{\frac{P/5}{P_{\text{ASE}}+\eta P^{3}}}\right)^{\!e}\qquad(61)$$

and (59) in (54) together with (21)–(23).

The proof of the BER expression in (24) follows steps similar to the ones we presented above.
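The convergence in (57)–(60) is easy to visualize numerically: for growing $N$, the binomial average of $u_{e}$ collapses onto $u_{e}(1)$. A sketch with hypothetical values of $P_{\text{ASE}}$, $\eta$, and $P$, chosen so that $u_{e}$ varies visibly over $[0,2]$:

```python
import numpy as np
from scipy.stats import norm

P_ASE, ETA, P, e, t = 1e-4, 1e-3, 10.0, 1, 5   # demo parameters

def u(x):                                # u_e(x) from (55); Q(.) = norm.sf(.)
    snr = (P / 5) / (P_ASE + (1 + 4 * x)**3 * ETA * (P / 5)**3)
    return norm.sf(np.sqrt(snr))**e

rng = np.random.default_rng(0)
for N in (1, 10, 100, 1000):
    S = rng.binomial(4 * N, 0.5, size=10**5)    # samples of S_{4N}
    avg = np.mean(u((4 * S + t - 1) / (4 * (2 * N + 1))))
    print(N, avg, u(1.0))                # the average approaches u_e(1)
```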

Appendix D Proof of Theorem 3

Consider a sequence of independent symbols $X_{k}=R_{k}e^{\jmath\Phi_{k}}$, $k\in\mathbb{Z}$, where for each $k$ the magnitude $R_{k}$ is independent of the phase $\Phi_{k}$, which is uniform in $[0,2\pi)$. The magnitude $R_{k}$ is distributed according to $f_{R}$ if $k\equiv 0\pmod{2N+1}$ and is otherwise equal to the constant $r_{1}$. Furthermore, $f_{R}$ and $r_{1}$ are chosen so that (35) holds, which guarantees that the average power constraint is satisfied. We will next show that the right-hand side of (36) is the mutual information (in bits per channel use) obtainable with this input distribution. Hence, it is a lower bound on capacity.

We define blocks of length $2N+1$ of transmitted and received symbols as

$$\boldsymbol{Y}_{l}\triangleq\boldsymbol{Y}_{l(2N+1)-N}^{l(2N+1)+N},\qquad
\boldsymbol{X}_{l}\triangleq\boldsymbol{X}_{l(2N+1)-N}^{l(2N+1)+N}$$

for $l\in\mathbb{Z}$. Let us focus for a moment on the received block $\boldsymbol{Y}_{0}$, and let $Y_{k}$ be its $k$th element ($k=-N,\dots,N$). It follows from (7) that the additive noise contribution to $Y_{k}$ depends on the input vector $\boldsymbol{X}_{k-N}^{k+N}$, which may span more than one input block, only through its norm $\|\boldsymbol{X}_{k-N}^{k+N}\|$. By construction, however, all elements of $\boldsymbol{X}_{k-N}^{k+N}$ with the exception of $X_{0}$ have constant magnitude equal to $r_{1}$. Hence,

$$\|\boldsymbol{X}_{k-N}^{k+N}\|^{2}=|X_{0}|^{2}+2Nr_{1}^{2}.\qquad(62)$$

This implies that

$$f_{Y_{k}|\boldsymbol{X}_{k-N}^{k+N}}(y_{k}|\boldsymbol{x}_{k-N}^{k+N})=\frac{1}{\pi\rho(2Nr_{1}^{2}+|x_{0}|^{2})}\exp\left(-\frac{|y_{k}-x_{k}|^{2}}{\rho(2Nr_{1}^{2}+|x_{0}|^{2})}\right).\qquad(63)$$

We see from (63) that each output sample $Y_{k}$ in $\boldsymbol{Y}_{0}$ depends on the input symbols only through $X_{k}$ and $X_{0}$. We then conclude that $\boldsymbol{Y}_{0}$ depends on the whole input sequence only through $\boldsymbol{X}_{0}$. This, together with the assumption of independent input symbols, implies that the output blocks $\{\boldsymbol{Y}_{l}\}$ are independent. Hence, from (34),

$$C\geq\frac{1}{2N+1}I(\boldsymbol{X}_{l};\boldsymbol{Y}_{l})\qquad(64)$$

for an arbitrary $l\in\mathbb{Z}$, say, $l=0$.

Next, we calculate $I(\boldsymbol{X}_{0};\boldsymbol{Y}_{0})$. The mutual information can be decomposed into differential entropies as

$$I(\boldsymbol{X}_{0};\boldsymbol{Y}_{0})=h(\boldsymbol{Y}_{0})-h(\boldsymbol{Y}_{0}|\boldsymbol{X}_{0}),\qquad(65)$$

where

$$\begin{aligned}
h(\boldsymbol{Y}_{0})&=-\mathbb{E}[\log_{2}f_{\boldsymbol{Y}_{0}}(\boldsymbol{Y}_{0})], &&(66)\\
h(\boldsymbol{Y}_{0}|\boldsymbol{X}_{0})&=-\mathbb{E}[\log_{2}f_{\boldsymbol{Y}_{0}|\boldsymbol{X}_{0}}(\boldsymbol{Y}_{0}|\boldsymbol{X}_{0})]. &&(67)
\end{aligned}$$

We start by evaluating (67). Because of (63), the conditional distribution of $\boldsymbol{Y}_{0}$ given $\boldsymbol{X}_{0}$ is the multivariate Gaussian density

$$f_{\boldsymbol{Y}_{0}|\boldsymbol{X}_{0}}(\boldsymbol{y}_{0}|\boldsymbol{x}_{0})=\frac{1}{\bigl(\pi\rho(2Nr_{1}^{2}+|x_{0}|^{2})\bigr)^{2N+1}}\exp\left(-\frac{\|\boldsymbol{y}_{0}-\boldsymbol{x}_{0}\|^{2}}{\rho(2Nr_{1}^{2}+|x_{0}|^{2})}\right).\qquad(68)$$

Using [17, Theorem 8.4.1], we conclude that

$$h(\boldsymbol{Y}_{0}|\boldsymbol{X}_{0})=(2N+1)\,\mathbb{E}\bigl[\log_{2}\bigl(\pi e\,\rho(2Nr_{1}^{2}+|X_{0}|^{2})\bigr)\bigr],\qquad(69)$$

where the expectation is with respect to the random variable $|X_{0}|$, which is distributed according to $f_{R}$.

To evaluate (66), we start by noting that all elements of $\boldsymbol{Y}_{0}$ have uniform phase, because the transmitted symbols and the additive noise samples have uniform phase by assumption. We use this property to simplify (66). Specifically, let $U_{k}=|Y_{k}|^{2}$ and

$$\boldsymbol{U}\triangleq[U_{-N},U_{-N+1},\ldots,U_{N}].\qquad(70)$$

By [55, eq. (320)]

$$h(\boldsymbol{Y}_{0})=(2N+1)\log_{2}(\pi)+h(\boldsymbol{U}).\qquad(71)$$

To evaluate $h(\boldsymbol{U})=-\mathbb{E}[\log_{2}f_{\boldsymbol{U}}(\boldsymbol{U})]$, we first derive the conditional distribution $f_{\boldsymbol{U}|\,|X_{0}|}$ of $\boldsymbol{U}$ given $|X_{0}|$. Note that $U_{k}$ has the same distribution as $\bigl||X_{k}|+\sqrt{\rho(2Nr_{1}^{2}+|X_{0}|^{2})}\,\tilde{Z}_{k}\bigr|^{2}$ (see (1) and (7)). Hence, given $|X_{0}|=r$, the random variables $\{2U_{k}/\rho(2Nr_{1}^{2}+r^{2})\}$ follow a noncentral chi-square distribution with two degrees of freedom and noncentrality parameters $\{2|X_{k}|^{2}/\rho(2Nr_{1}^{2}+r^{2})\}$, where $|X_{k}|=r_{1}$ if $k\neq 0$ and $|X_{k}|=r$ otherwise. Furthermore, these random variables are conditionally independent given $|X_{0}|$. Using the change-of-variable theorem for transformations of random variables, we finally obtain, after algebraic manipulations,

$$\begin{aligned}
f_{\boldsymbol{U}|\,|X_{0}|}(\boldsymbol{u}|r)=
&\frac{\exp\left(-\frac{\sum_{k=-N}^{N}u_{k}+2Nr_{1}^{2}+r^{2}}{\rho(2Nr_{1}^{2}+r^{2})}\right)}{\bigl(\rho(2Nr_{1}^{2}+r^{2})\bigr)^{2N+1}}
\cdot I_{0}\!\left(\frac{2r\sqrt{u_{0}}}{\rho(2Nr_{1}^{2}+r^{2})}\right)\\
&\cdot\prod_{\substack{k=-N\\k\neq 0}}^{N}I_{0}\!\left(\frac{2r_{1}\sqrt{u_{k}}}{\rho(2Nr_{1}^{2}+r^{2})}\right).\qquad(72)
\end{aligned}$$

The probability distribution $f_{\boldsymbol{U}}$, which is given in (37), is obtained from (72) by taking the expectation with respect to $f_{R}$, the probability distribution of $|X_{0}|$. Finally, we obtain the capacity lower bound (36) by substituting (37) into (66) and (69) into (67), by computing the difference between the two resulting differential entropies according to (65), and by dividing by $2N+1$.
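The distributional claim leading to (72) can be verified empirically: with $|X_{0}|=r$ fixed, $2U_{0}/\rho$ should follow a noncentral chi-square law with two degrees of freedom and noncentrality $2r^{2}/\rho$. A sketch with hypothetical values of $r$ and of the variance $\rho$:

```python
import numpy as np
from scipy.stats import ncx2

r, v = 1.2, 0.3                       # v stands in for rho(2N r1^2 + r^2)
rng = np.random.default_rng(0)
z = (rng.normal(size=10**6) + 1j * rng.normal(size=10**6)) * np.sqrt(v / 2)
u0 = np.abs(r + z)**2                 # the k = 0 component of U
# Compare empirical quantiles of 2*U_0/v with the noncentral chi-square law:
qs = [0.1, 0.5, 0.9]
print(np.quantile(2 * u0 / v, qs))
print(ncx2.ppf(qs, df=2, nc=2 * r**2 / v))
```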

References

  • [1] H. Sun, K.-T. Wu, and K. Roberts, “Real-time measurements of a 40 Gb/s coherent system,” Opt. Express, vol. 16, no. 2, pp. 873–879, 2008.
  • [2] K. Roberts, M. O’Sullivan, K.-T. Wu, H. Sun, A. Awadalla, D. J. Krause, and C. Laperle, “Performance of dual-polarization QPSK for optical transport systems,” J. Lightw. Technol., vol. 27, no. 16, pp. 3546–3559, Aug. 2009.
  • [3] A. D. Ellis, J. Zhao, and D. Cotter, “Approaching the non-linear Shannon limit,” J. Lightw. Technol., vol. 28, no. 4, pp. 423–433, Feb. 2010.
  • [4] A. Mecozzi and R.-J. Essiambre, “Nonlinear Shannon limit in pseudolinear coherent systems,” J. Lightw. Technol., vol. 30, no. 12, pp. 2011–2024, June 2012.
  • [5] A. Splett, C. Kurtzke, and K. Petermann, “Ultimate transmission capacity of amplified optical fiber communication systems taking into account fiber nonlinearities,” in Proc. European Conference on Optical Communication (ECOC), Montreux, Switzerland, Sept. 1993.
  • [6] J. Tang, “The channel capacity of a multispan DWDM system employing dispersive nonlinear optical fibers and an ideal coherent optical receiver,” J. Lightw. Technol., vol. 20, no. 7, pp. 1095–1101, July 2002.
  • [7] P. Poggiolini, A. Carena, V. Curri, G. Bosco, and F. Forghieri, “Analytical modeling of nonlinear propagation in uncompensated optical transmission links,” IEEE Photon. Technol. Lett., vol. 23, no. 11, pp. 742–744, June 2011.
  • [8] G. Bosco, P. Poggiolini, A. Carena, V. Curri, and F. Forghieri, “Analytical results on channel capacity in uncompensated optical links with coherent detection,” Opt. Express, vol. 19, no. 26, pp. B440–B449, Dec. 2011.
  • [9] A. Carena, V. Curri, G. Bosco, P. Poggiolini, and F. Forghieri, “Modeling of the impact of nonlinear propagation effects in uncompensated optical coherent transmission links,” J. Lightw. Technol., vol. 30, no. 10, pp. 1524–1539, May 2012.
  • [10] P. Poggiolini, “The GN model of non-linear propagation in uncompensated coherent optical systems,” J. Lightw. Technol., vol. 30, no. 24, pp. 3857–3879, Dec. 2012.
  • [11] L. Beygi, E. Agrell, P. Johannisson, M. Karlsson, and H. Wymeersch, “A discrete-time model for uncompensated single-channel fiber-optical links,” IEEE Trans. Commun., vol. 60, no. 11, pp. 3440–3450, Nov. 2012.
  • [12] P. Johannisson and M. Karlsson, “Perturbation analysis of nonlinear propagation in a strongly dispersive optical communication system,” J. Lightw. Technol., vol. 31, no. 8, pp. 1273–1282, Apr. 2013.
  • [13] E. Grellier and A. Bononi, “Quality parameter for coherent transmissions with Gaussian-distributed nonlinear noise,” Opt. Express, vol. 19, no. 13, pp. 12 781–12 788, June 2011.
  • [14] L. Beygi, N. V. Irukulapati, E. Agrell, P. Johannisson, M. Karlsson, H. Wymeersch, P. Serena, and A. Bononi, “On nonlinearly-induced noise in single-channel optical links with digital backpropagation,” Optics Express, vol. 21, no. 22, pp. 26 376–26 386, Nov. 2013.
  • [15] F. V. O. Rival, C. Simonneau, E. Grellier, A. Bononi, L. Lorcy, J.-C. Antona, and S. Bigo, “On nonlinear distortions of highly dispersive optical coherent systems,” Opt. Express, vol. 20, no. 2, pp. 1022–1032, Jan. 2012.
  • [16] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, July, Oct. 1948.
  • [17] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley, 2006.
  • [18] J. B. Stark, “Fundamental limits of information capacity for optical communications channels,” in Proc. European Conference on Optical Communication (ECOC), Nice, France, Sep. 1999.
  • [19] P. P. Mitra and J. B. Stark, “Nonlinear limits to the information capacity of optical fibre communications,” Nature, vol. 411, pp. 1027–1030, June 2001.
  • [20] K. S. Turitsyn, S. A. Derevyanko, I. V. Yurkevich, and S. K. Turitsyn, “Information capacity of optical fiber channels with zero average dispersion,” Physical Review Letters, vol. 91, no. 20, pp. 203 901 1–4, Nov. 2003.
  • [21] I. B. Djordjevic and B. Vasic, “Achievable information rates for high-speed long-haul optical transmission,” J. Lightw. Technol., vol. 23, no. 11, pp. 3755–3763, Nov. 2005.
  • [22] M. H. Taghavi, G. C. Papen, and P. H. Siegel, “On the multiuser capacity of WDM in a nonlinear optical fiber: Coherent communication,” IEEE Trans. Inf. Theory, vol. 52, no. 11, pp. 5008–5022, Nov. 2006.
  • [23] R.-J. Essiambre, G. Kramer, P. J. Winzer, G. J. Foschini, and B. Goebel, “Capacity limits of optical fiber networks,” J. Lightw. Technol., vol. 28, no. 4, pp. 662–701, Feb. 2010.
  • [24] M. Secondini, E. Forestieri, and G. Prati, “Achievable information rate in nonlinear WDM fiber-optic systems with arbitrary modulation formats and dispersion maps,” J. Lightw. Technol., vol. 31, no. 23, pp. 3839–3852, Dec. 2013.
  • [25] R. Dar, M. Shtaif, and M. Feder, “New bounds on the capacity of the nonlinear fiber-optic channel,” Optics Letters, vol. 39, no. 2, pp. 398–401, Jan. 2014.
  • [26] J. M. Kahn and K.-P. Ho, “Spectral efficiency limits and modulation/detection techniques for DWDM systems,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 10, no. 2, pp. 259–272, Mar./Apr 2004.
  • [27] E. Narimanov and P. Mitra, “The channel capacity of a fiber optics communication system: perturbation theory,” J. Lightw. Technol., vol. 20, no. 3, pp. 530–537, Mar. 2002.
  • [28] L. G. L. Wegener, M. L. Povinelli, A. G. Green, P. P. Mitra, J. B. Stark, and P. B. Littlewood, “The effect of propagation nonlinearities on the information capacity of WDM optical fiber systems: Cross-phase modulation and four-wave mixing,” Physica D: Nonlinear Phenomena, vol. 189, no. 1-2, pp. 81–99, Feb. 2004.
  • [29] R.-J. Essiambre, G. J. Foschini, G. Kramer, and P. J. Winzer, “Capacity limits of information transport in fiber-optic networks,” Physical Review Letters, vol. 101, no. 16, pp. 163 901 1–4, Oct. 2008.
  • [30] T. Freckmann, R.-J. Essiambre, P. J. Winzer, G. J. Foschini, and G. Kramer, “Fiber capacity limits with optimized ring constellations,” IEEE Photon. Technol. Lett., vol. 21, no. 20, pp. 1496–1498, Oct. 2009.
  • [31] I. B. Djordjevic, H. G. Batshon, L. Xu, and T. Wang, “Coded polarization-multiplexed iterative polar modulation (PM-IPM) for beyond 400 Gb/s serial optical transmission,” in Proc. Optical Fiber Communication Conference (OFC), San Diego, CA, Mar. 2010.
  • [32] R. I. Killey and C. Behrens, “Shannon’s theory in nonlinear systems,” Journal of Modern Optics, vol. 58, no. 1, pp. 1–10, Jan. 2011.
  • [33] E. Agrell and M. Karlsson, “Power-efficient modulation formats in coherent transmission systems,” J. Lightw. Technol., vol. 27, no. 22, pp. 5115–5126, Nov. 2009.
  • [34] ——, “WDM channel capacity and its dependence on multichannel adaptation models,” in Proc. Optical Fiber Communication Conference (OFC), Anaheim, CA, Mar. 2013.
  • [35] B. Goebel, R.-J. Essiambre, G. Kramer, P. J. Winzer, and N. Hanik, “Calculation of mutual information for partially coherent Gaussian channels with applications to fiber optics,” IEEE Trans. Inf. Theory, vol. 57, no. 9, pp. 5720–5736, Sep. 2011.
  • [36] G. Bosco, P. Poggiolini, A. Carena, V. Curri, and F. Forghieri, “Analytical results on channel capacity in uncompensated optical links with coherent detection: Erratum,” Opt. Express, vol. 20, no. 17, pp. 19 610–19 611, Aug. 2012.
  • [37] E. Agrell and M. Karlsson, “Satellite constellations: Towards the nonlinear channel capacity,” in Proc. IEEE Photon. Conf. (IPC), Burlingame, CA, Sept. 2012.
  • [38] A. Carena, G. Bosco, V. Curri, P. Poggiolini, M. T. Taiba, and F. Forghieri, “Statistical characterization of PM-QPSK signals after propagation in uncompensated fiber links,” in Proc. European Conference on Optical Communication (ECOC), London, U.K., Sept. 2010.
  • [39] T. Koch, A. Lapidoth, and P. Sotiriadis, “Channels that heat up,” IEEE Trans. Inf. Theory, vol. 55, no. 8, pp. 3594 –3612, Aug. 2009.
  • [40] R. G. Gallager, Information Theory and Reliable Communication. New York, NY: Wiley, 1968.
  • [41] E. Ip and J. M. Kahn, “Digital equalization of chromatic dispersion and polarization mode dispersion,” J. Lightw. Technol., vol. 25, no. 8, pp. 2033–2043, Aug. 2007.
  • [42] E. Agrell, J. Lassing, E. G. Ström, and T. Ottosson, “On the optimality of the binary reflected Gray code,” IEEE Trans. Inf. Theory, vol. 50, no. 12, pp. 3170–3182, Dec. 2004.
  • [43] M. P. Fitz and J. P. Seymour, “On the bit error probability of QAM modulation,” International Journal of Wireless Information Networks, vol. 1, no. 2, pp. 131–139, Apr. 1994.
  • [44] M. K. Simon, S. M. Hinedi, and W. C. Lindsey, Digital Communication Techniques: Signal Design and Detection. Englewood Cliffs, NJ: Prentice-Hall, 1995.
  • [45] A. Mecozzi, “Limits to long-haul coherent transmission set by the Kerr nonlinearity and noise of the in-line amplifiers,” J. Lightw. Technol., vol. 12, no. 11, pp. 1993–2000, Nov. 1994.
  • [46] A. Demir, “Nonlinear phase noise in optical-fiber-communication systems,” J. Lightw. Technol., vol. 25, no. 8, pp. 2002–2032, Aug. 2007.
  • [47] G. P. Agrawal, Fiber-optic communication systems, 4th ed. Wiley, 2010.
  • [48] S. Verdú and T. S. Han, “A general formula for channel capacity,” IEEE Trans. Inf. Theory, vol. 40, no. 4, pp. 1147–1157, July 1994.
  • [49] K.-T. Fang, S. Kotz, and K. W. Ng, Symmetric Multivariate and Related Distributions. Springer, 1990.
  • [50] S. Kotz and S. Nadarajah, Multivariate t Distributions and Their Applications. Cambridge University Press, 2004.
  • [51] N. M. Blachman, “A comparison of the informational capacities of amplitude- and phase-modulation communication systems,” Proceedings of the I.R.E., vol. 41, no. 6, pp. 748–759, June 1953.
  • [52] K.-P. Ho and J. M. Kahn, “Channel capacity of WDM systems using constant-intensity modulation formats,” in Proc. Optical Fiber Communication Conference (OFC), Anaheim, CA, Mar. 2002.
  • [53] E. Agrell, “On monotonic capacity–cost functions,” 2012, preprint. [Online]. Available: http://arxiv.org/abs/1209.2820
  • [54] G. R. Grimmett and D. R. Stirzaker, Probability and Random Processes, 3rd ed. Oxford University Press, 2001.
  • [55] A. Lapidoth and S. M. Moser, “Capacity bounds via duality with applications to multiple-antenna systems on flat-fading channels,” IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2426–2467, Oct. 2003.