
Capacity of a Nonlinear Optical Channel
with Finite Memory

Erik Agrell, Alex Alvarado, Giuseppe Durisi, and Magnus Karlsson

Research supported by the Swedish Research Council (VR) under grant no. 2012-5280, the Swedish Foundation for Strategic Research (SSF) under grant no. RE07-0026, and the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 271986. This paper was presented in part at the 2013 European Conference on Optical Communication.

E. Agrell and G. Durisi are with the Department of Signals and Systems, Chalmers University of Technology, SE-41296 Gothenburg, Sweden (email: {agrell,durisi}@chalmers.se). M. Karlsson is with the Department of Microtechnology and Nanoscience, Chalmers University of Technology, SE-41296 Gothenburg, Sweden (email: magnus.karlsson@chalmers.se). A. Alvarado is with the Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, United Kingdom (email: alex.alvarado@ieee.org).
Abstract

The channel capacity of a nonlinear, dispersive fiber-optic link is revisited. To this end, the popular Gaussian noise (GN) model is extended with a parameter that accounts for the finite memory of realistic fiber channels. This finite-memory model is harder to analyze mathematically but, in contrast to previous models, it remains valid for nonstationary or heavy-tailed input signals. For uncoded transmission and standard modulation formats, the new model gives the same results as the regular GN model when the memory of the channel is about 10 symbols or more, confirming previous findings that the GN model is accurate for uncoded transmission. When coding is considered, however, the results obtained using the finite-memory model differ markedly from those obtained by previous models, even when the channel memory is large. In particular, the peaky behavior of the channel capacity, which has been reported for numerous nonlinear channel models, appears to be an artifact of applying models derived for independent input in a coded (i.e., dependent) scenario.

Index Terms:
Channel capacity, channel model, fiber-optic communications, Gaussian noise model, nonlinear distortion.

I Introduction

The introduction of coherent optical receivers has brought significant advantages to fiber-optic communications, e.g., enabling efficient polarization demultiplexing, higher-order modulation formats, increased sensitivity, and electrical mitigation of transmission impairments [1, 2]. Even if the linear transmission impairments (such as chromatic and polarization-mode dispersion) can be dealt with electronically, the Kerr nonlinearity in the fiber remains a significant obstacle. Since the nonlinearity distorts the signal at high signaling powers, arbitrarily high signal-to-noise ratios are inaccessible, which limits transmission over long distances and at high spectral efficiencies. This is sometimes referred to as the "nonlinear Shannon limit" [3, 4].

For systems with large accumulated dispersion and weak nonlinearity, the joint effect of chromatic dispersion and the Kerr effect is similar to that of additive Gaussian noise, as pointed out already by Splett [5] and Tang [6]. This Gaussian noise is particularly prominent in links without inline dispersion compensation, such as today's coherent links, where the dispersion compensation takes place electronically in the receiver signal processing. The Gaussian noise approximation has recently been rediscovered and applied to today's coherent links in a series of papers by Poggiolini et al. [7, 8, 9, 10] and other groups [11, 12, 13]. The resulting so-called Gaussian noise model, or GN model for short, is valid for multi-channel (wavelength- and polarization-division multiplexed) signals. It has also been shown to work for single-channel and single-polarization transmission if the dispersive decorrelation is large enough [11, 14].

A crucial assumption in the derivation of the GN model is that of independent, identically distributed (i.i.d.) inputs: the transmitted symbols are independent of each other, are drawn from the same constellation, and have the same constellation scaling (the same average transmit power). Under these assumptions, the model has been experimentally verified to be very accurate [15, 9] for the most common modulation formats, such as quadrature amplitude modulation (QAM) or phase-shift keying.

In this paper, the assumption of i.i.d. inputs is, perhaps for the first time in optical channel modeling, relaxed. This is done by introducing a modified GN model, which we call the finite-memory GN model. This new model includes the memory of the channel as a parameter and differs from previous channel models in that it is valid also when the channel input statistics are time-varying, or when “heavy-tailed” constellations are used.

The performance predicted by the regular GN model (both in terms of uncoded error probability and channel capacity) is compared with that predicted by the finite-memory GN model. The uncoded performance is characterized in terms of symbol error rate (SER) and bit error rate (BER), assuming i.i.d. data. Exact analytical expressions are obtained for 16-ary QAM (16-QAM), which show that the GN model is accurate for uncoded transmission and standard modulation formats, confirming previous results.

The main contributions of the paper are in terms of coded performance. Shannon, the father of information theory, proved that for a given channel, it is possible to achieve an arbitrarily small error probability, if the transmission rate in bits per symbol is small enough. A rate for which virtually error-free transmission is possible is called an achievable rate, and the supremum over all achievable rates for a given channel, represented as a statistical relation between its input $X$ and output $Y$, is defined as the channel capacity [16], [17, p. 195]. A capacity-approaching transmission scheme operates in general by grouping the data to be transmitted into blocks, encoding each block into a sequence of coded symbols, modulating and transmitting this sequence over the channel, and decoding the block in the receiver. This coding process introduces, by definition, dependencies among the transmitted symbols, which is the reason why channel models derived for i.i.d. inputs may be questionable for the purpose of capacity analysis.

More fundamentally, the regular GN model is not well-suited to capacity analysis, because in this model each output sample depends on the statistics of the previously transmitted input symbols (through their average power) rather than on their actual values. This yields artifacts in capacity analysis. One such artifact is the peaky behavior of the capacity of the GN model as a function of the transmit power. Indeed, through a capacity lower bound, it is shown in this paper that this peaky behavior does not occur for the finite-memory GN model, even when the memory is taken to be arbitrarily large.

The analysis of channel capacity for fiber-optical transmission dates back to 1993 [5], when Splett et al. quantified the impact of nonlinear four-wave mixing on the channel capacity. By applying Shannon's formula for the additive white Gaussian noise (AWGN) channel capacity to a channel with power-dependent noise, Splett et al. found that there exists an "optimal" finite signal-to-noise ratio that maximizes capacity; beyond this value, capacity starts decreasing. No justification was, however, given in [5] for the assumption that the noise is Gaussian. Using a different model for four-wave mixing, Stark [18] showed that capacity saturates, but does not decrease, at high power. In the same paper, the capacity loss due to the quantum nature of light was quantified. In 2001, Mitra and Stark [19] considered the capacity in links where cross-phase modulation dominates, proved that the capacity is lower-bounded by the capacity of a linear, Gaussian channel with the same input–output covariance matrix, and evaluated this bound via Shannon's AWGN formula. The obtained bound vanishes at high input power. It was claimed, without motivation, that the true capacity would have the same qualitative nonmonotonic behavior.

Since 2001, the interest in optical channel capacity has virtually exploded. The zero-dispersion channel was considered by Turitsyn et al. [20]. The joint effect of nonlinearity and dispersion was modeled by Djordjevic et al. [21] as a finite-state machine, which allowed the capacity to be estimated using the Bahl–Cocke–Jelinek–Raviv (BCJR) algorithm. Taghavi et al. [22] considered a fiber-optical multiuser system as a multiple-access channel and characterized its capacity region. In a very detailed tutorial paper, Essiambre et al. [23] applied a channel model based on extensive lookup tables and obtained capacity lower bounds for a variety of scenarios. Secondini et al. [24] obtained lower bounds using the theory of mismatched decoding. Recently, Dar et al. [25] modeled the nonlinear phase noise as being blockwise constant for a certain number of symbols, which is a channel with finite memory, obtaining improved capacity bounds.

Detailed literature reviews are provided in [26] for the early results, and in [23] for more recent results. Other capacity estimates, or lower bounds thereon, were reported for various nonlinear transmission scenarios in, e.g., [27, 28, 29, 30, 31, 32, 8, 4]. Most of these estimates or bounds decrease to zero as the power increases.

This paper is organized as follows. In Sec. II, the GN model is reviewed and the finite-memory GN model is introduced. In Sec. III, the uncoded error performance of the new finite-memory model is studied. The channel capacity is studied in Sec. IV and conclusions are drawn in Sec. V. The mathematical proofs are relegated to appendices.

Notation: Throughout this paper, vectors are denoted by boldface letters $\boldsymbol{x}$ and sets by calligraphic letters $\mathcal{X}$. Random variables are denoted by uppercase letters $X$ and their (deterministic) outcomes by the same letter in lowercase $x$. Probability density functions (PDFs) and conditional PDFs are denoted by $f_{Y}(y)$ and $f_{Y|X}(y|x)$, respectively. Analogously, probability mass functions (PMFs) are denoted by $P_{X}(x)$ and $P_{X|Y}(x|y)$. Expectations are denoted by $\mathbb{E}[\cdot]$ and random sequences by $\{Z_{k}\}$.

II Channel Modeling: Finite and Infinite Memory

In this section, we begin with a high-level description of the nonlinear interference in optical dual-polarization wavelength-division multiplexing (WDM) systems, highlighting the role of the channel memory, and thereafter, in Secs. II-B–II-D, describe in detail the channel models considered in this paper.

II-A Nonlinear Interference in Optical Channels

A coherent optical communication link converts a discrete, complex-valued electric data signal $x_{k}$ to a modulated, continuous optical signal, which is transmitted through an optical fiber, received coherently, and then converted back to a discrete output sequence $Y_{k}$. The coherent link is particularly simple theoretically, in that the transmitter and receiver directly map the electric data to the optical field, which is a linear operation (in contrast with, e.g., direct-detection receivers), and can ideally be performed without distortions. The channel is then well described by the propagation of the (continuous) optical field in the fiber link. It should be emphasized that this assumes the coherent receiver to be ideal, with perfect synchronization and negligible phase noise. Experiments have shown [2] that commercial coherent receivers can indeed perform well enough for the fiber propagation effects to be the main limitations. Two main linear propagation effects in the fiber need to be addressed: dispersion and attenuation. The attenuation effects can be overcome by periodic optical amplification, at the expense of additive Gaussian noise from the inline amplifiers. The dispersion effects are usually equalized electronically by a filter in the coherent receiver. Such a linear optical link is well described by an AWGN channel, the capacity of which grows unboundedly with the signal power.

However, the fiber Kerr nonlinearity introduces signal distortions and greatly complicates the transmission modeling. The nonlinear signal propagation in the fiber is described by a nonlinear partial differential equation, the nonlinear Schrödinger equation (NLSE), which includes dispersion, attenuation, and nonlinearity. At high power levels, the three effects can no longer be conveniently separated. However, in contemporary coherent links (distance at least 500 km and symbol rate at least 28 Gbaud), the nonlinearity is significantly weaker than the other two effects, and a perturbation approach can be successfully applied to the NLSE [5, 10, 11, 12]. This leads to the GN model, which will be described in Sec. II-C.

II-B Finite Memory

Even today's highly dispersive optical links have a finite memory. For example, a signal with dispersive length $L_{\text{D}}=1/(\Delta\omega^{2}|\beta_{2}|)$, where $\beta_{2}$ is the group velocity dispersion and $\Delta\omega$ the optical bandwidth, broadens (temporally) by a factor $L/L_{\text{D}}$ over a fiber of length $L$. With typical dispersion lengths of 5–50 km, this broadening factor can correspond to hundreds to thousands of adjacent symbols, a large but finite number. The same holds for interaction among WDM channels; if one interprets $\Delta\omega$ as the channel separation, $L/L_{\text{D}}$ gives an approximation of the number of symbols by which two WDM channels separate due to walk-off (and hence interact with nonlinearly during transmission). The channel memory will thus be even larger in the WDM case and increase with channel separation, but the nonlinear interaction will decrease due to the shorter $L_{\text{D}}$. Thus, the principle of a finite channel memory holds also for WDM signals. To keep notation as simple as possible, we consider a single, scalar, wavelength channel in this paper. Extensions to dual polarizations and WDM are possible, but would involve obscuring complications such as four-dimensional constellation spaces [33] in the former case and behavioral models [34] in the latter. We can thus say that in an optical link a certain signal may sense the interference from $N\approx L/L_{\text{D}}$ neighboring symbols, which is the physical reason for introducing a finite-memory model.
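As a rough numerical illustration, the following Python sketch evaluates $L_{\text{D}}$ and the broadening factor $L/L_{\text{D}}$ for the single-channel parameters given later in Table I; the identification $\Delta\omega=2\pi R_{\text{s}}$ for a single channel is our assumption, not a value taken from the paper.

```python
import numpy as np

beta2 = 21.7e-27          # |group velocity dispersion| [s^2/m] (Table I)
L = 700e3                 # link length [m] (Table I)
Rs = 32e9                 # symbol rate [baud] (Table I)

delta_omega = 2 * np.pi * Rs            # assumed optical bandwidth of one channel
L_D = 1 / (delta_omega**2 * beta2)      # dispersive length L_D = 1/(dw^2 |beta2|)
print(f"L_D ~ {L_D / 1e3:.1f} km, broadening factor L/L_D ~ {L / L_D:.0f}")
```

With these values, $L/L_{\text{D}}$ is on the order of several hundred symbols, consistent with the range quoted above.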

If we let the range $N$ of the interfering symbols go to infinity, an even simpler type of model is obtained. The interference is now averaged over infinitely many transmitted symbols. Assuming that an i.i.d. sequence is transmitted, this time average converges to a statistical average, which greatly simplifies the analysis. All models suggested for dispersive optical channels so far belong to this category [5, 23, 35, 10, 4, 11, 12, 14], of which the GN model described in Sec. II-C is the most common.

For a given transmitted complex symbol $x_{k}$, the (complex) single-channel output at each discrete time $k\in\mathbb{Z}$ is modeled as

$Y_{k}=x_{k}+Z_{k},$ (1)

where $\{Z_{k}\}$ is a circularly symmetric, complex, white, Gaussian random sequence. In (1), $Z_{k}$ is assumed to be independent of the actual transmitted sequence $\{x_{k}\}$; however, the variance of $Z_{k}$ depends on the transmit power, as detailed in Secs. II-C and II-D.

II-C The Regular GN Model

For coherent long-haul fiber-optical links without dispersion compensation, Splett et al. [5], Poggiolini et al. [7], and Beygi et al. [11] have all derived models where the nonlinear interference (NLI) appears as Gaussian noise, whose statistics depend on the transmitted signal power via a cubic relationship. The models assume that the transmitted symbols $x_{k}$ in time slots $k\in\mathbb{Z}$ are i.i.d. realizations of the same complex random variable $X$. In this model, the additive noise in (1) is given by

$Z_{k}=\tilde{Z}_{k}\sqrt{P_{\text{ASE}}+\eta P^{3}},$ (2)

where $\{\tilde{Z}_{k}\}$ are i.i.d. zero-mean, unit-variance, circularly symmetric complex Gaussian random variables, $P_{\text{ASE}}$ and $\eta$ are real, nonnegative constants, and $P=\mathbb{E}[|X|^{2}]$ is the average transmit power. Therefore, the noise is distributed as $Z_{k}\sim\mathcal{CN}(0,P_{\text{ASE}}+\eta P^{3})$, where $\mathcal{CN}(0,\sigma^{2})$ denotes a circularly symmetric complex Gaussian random variable with mean $0$ and variance $\sigma^{2}$. The parameter $P$, which is a property of the transmitter, governs the behavior of the channel model. It can be intuitively understood as a long-term average of the input power. Mathematically,

$P=\lim_{N\rightarrow\infty}\frac{1}{2N+1}\sum_{i=k-N}^{k+N}|x_{i}|^{2}$ (3)

for any given $k$, still assuming i.i.d. symbols $x_{k}$. For this reason, we will refer to models that depend on infinitely many past and/or future symbols, via $P$ in (3) or in some other way, as infinite-memory models.

The cubic relation in (2) between the transmit power and the additive noise variance $P_{\text{ASE}}+\eta P^{3}$ is a consequence of the Kerr nonlinearity, and it holds for both lumped and distributed amplification schemes. The constant $P_{\text{ASE}}$ represents the total amplified spontaneous emission (ASE) noise of the optical amplifiers for the channel under study, while $\eta$ quantifies the NLI. Several related expressions for this coefficient have been proposed. For example, for distributed amplification and WDM signaling over the length $L$,

$\eta=\frac{4\gamma^{2}L}{\pi|\beta_{2}|B^{2}}\log_{e}\left(2\pi e|\beta_{2}|LB^{2}\right),$ (4)

$\eta=\frac{16\gamma^{2}L}{27\pi|\beta_{2}|R_{\text{s}}^{2}}\log_{e}\left(\frac{2}{3}\pi^{2}|\beta_{2}|LB^{2}\right),$ (5)

were proposed in [5] and [36], respectively, where $\gamma$ is the fiber nonlinear coefficient, $B$ is the total WDM bandwidth, and $R_{\text{s}}$ is the symbol rate. Obviously, the expressions in (4) and (5) are qualitatively similar. For dual-polarization, single-channel transmission over $M$ lumped amplifier spans, the expression

$\eta=\frac{3\gamma^{2}}{\alpha^{2}}M^{1+\epsilon}\tanh\left(\frac{\alpha}{4|\beta_{2}|R_{\text{s}}^{2}}\right)$ (6)

was proposed in [14], and a qualitatively similar formula can be obtained from the results in [10]. Here, $\alpha$ is the attenuation coefficient of the fiber and the coefficient $\epsilon$ lies between $0$ and $1$ (see [10, 14]) depending on how well the nonlinear interference decorrelates between amplifier spans. For single-polarization transmission, the coefficient $3$ in (6) should be replaced by $2$ [11].
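As a sanity check on the numbers used later in the paper, the following Python sketch evaluates (6) with the parameters of Table I and $\epsilon=0$; since a single, scalar channel is treated here, the single-polarization prefactor $2$ mentioned above is used, which is our reading of how the Table I value was obtained.

```python
import numpy as np

alpha = 0.2 * np.log(10) / 10 / 1e3   # 0.2 dB/km converted to [1/m]
beta2 = 21.7e-27                      # |beta_2| [s^2/m]
gamma = 1.27e-3                       # nonlinear coefficient [1/(W m)]
M, eps, Rs = 10, 0, 32e9              # spans, decorrelation exponent, symbol rate

# eq. (6) with the single-polarization prefactor 2 instead of 3
eta = 2 * gamma**2 / alpha**2 * M**(1 + eps) * np.tanh(alpha / (4 * beta2 * Rs**2))
print(f"eta = {eta:.0f} W^-2")        # ~7244 W^-2, matching Table I
```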

The benefits of the GN model are that it is very accurate for uncoded transmission with traditional modulation formats,^1 as demonstrated in experiments and simulations [38, 15, 9], and that it is very simple to analyze. It is, however, not intended for nonstationary input sequences, i.e., sequences whose statistics vary with time, because the transmit power $P$ in (2) is defined as the (constant) power of a random variable that generates the i.i.d. symbols $x_{k}$. In order to capture the behavior of a wider class of transmission schemes, the GN model can be modified to depend on a time-varying transmit power, which is the topic of the next section.

^1 The model is not valid for exotic modulation formats such as satellite constellations [37].

II-D The Finite-Memory GN Model

As mentioned in Secs. I and II-C, a finite-memory model is essential in order to model the channel output corresponding to time-varying input distributions. Therefore, we refine the GN model of Sec. II-C to make it explicitly dependent on the channel memory $N$, in such a way that the model "converges" to the regular GN model as $N\rightarrow\infty$. Many such models can be formulated. In this paper, we aim for simplicity rather than accuracy.

The proposed model assumes that the input–output relation is still given by (1), but the average transmit power $P$ in (2) is replaced by an empirical power, i.e., by the arithmetic average of the squared magnitudes of the symbol $x_{k}$ and of the $2N$ symbols around it. Mathematically, (2) is replaced by

$Z_{k}=\tilde{Z}_{k}\sqrt{P_{\text{ASE}}+\eta\left(\frac{1}{2N+1}\sum_{i=k-N}^{k+N}|x_{i}|^{2}\right)^{3}}$ (7)

for any $k\in\mathbb{Z}$, where $N$ is the (one-sided) channel memory. We refer to (1) and (7) as the finite-memory GN model. Since (second-order) group velocity dispersion causes symmetric broadening with respect to the transit time of the signal, inter-symbol interference from dispersion acts both backwards and forwards in terms of the symbol index. This is why both past and future inputs contribute to the noise power in (7). A somewhat related model for the additive noise in the context of data transmission in electronic circuits has recently been proposed in [39], where the memory is single-sided and the noise scales linearly with the signal power, not cubically as in (7).
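To make the model concrete, the following Python sketch implements (1) and (7) for an arbitrary complex symbol sequence; the wrap-around handling of the sequence edges is our implementation choice, not part of the model.

```python
import numpy as np

def finite_memory_gn_channel(x, N, P_ASE=4.1e-6, eta=7244.0, rng=None):
    """Finite-memory GN model (1), (7): noise variance P_ASE + eta * Pemp_k**3,
    where Pemp_k is the empirical power over symbol k and its 2N neighbors."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x, dtype=complex)
    # sliding-window empirical power (edges wrap around, an implementation choice)
    p = np.pad(np.abs(x)**2, N, mode="wrap")
    p_emp = np.convolve(p, np.ones(2 * N + 1) / (2 * N + 1), mode="valid")
    sigma2 = P_ASE + eta * p_emp**3
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(x.size)
                                   + 1j * rng.standard_normal(x.size))
    return x + noise
```

For an i.i.d. input with power $P$ and large $N$, the empirical power inside this function converges to $P$, recovering the regular GN model, in line with the discussion below.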

Having introduced the finite-memory GN model, we now discuss some particular cases. First, the memoryless AWGN channel model can be obtained from both the GN and finite-memory GN models by setting $\eta=0$. In this case, the noise variance is $\mathbb{E}[|Z_{k}|^{2}]=P_{\text{ASE}}$ for all $k$. Second, let us consider the scenario where the transmitted symbols form a random process $\{X_{i}\}$. Then the empirical power $(1/(2N+1))\sum_{i=k-N}^{k+N}|X_{i}|^{2}$ at any discrete time $k$ is a random variable that depends on the magnitudes of the $k$th symbol and the $2N$ symbols around it. In the limit $N\rightarrow\infty$, this empirical power converges to the "statistical" power $P$ in (3), for any i.i.d. process with power $P$, as mentioned in Sec. II-C. This observation shows that the proposed finite-memory model in (7) "converges" to the GN model in (2), provided that the channel memory $N$ is sufficiently large and that the process consists of i.i.d. symbols with zero mean and variance $P$.

The purpose of the finite-memory model is to be able to predict the output of the channel when the transmitted symbols are not i.i.d. This is the case, for example, when the transmitted symbols form a nonstationary process (as will be exemplified in Sec. II-E) and also for coded sequences (which we discuss in Sec. IV). An advantage of the finite-memory model, from a theoretical viewpoint, is that the input–output relation of the channel is modeled as a fixed conditional probability of the output given the input and its history, which is the common notion of a channel model in communication and information theory ever since the work of Shannon [16], [40, p. 74]. This is in contrast to the regular GN model and other channel models, whose conditional distributions change depending on which transmitter the channel is connected to. Specifically, the GN model is represented by a family of such conditional distributions, one for each value of the transmitter parameter $P$.

[Figure: pulse amplitude versus symbol slot and propagation distance in km.]
Figure 1: Amplitude for a linearly propagating 15.6 ps raised-cosine pulse (compatible with 32 Gbaud) over 700 km of fiber with $\beta_{2}=-21.7~\text{ps}^{2}/\text{km}$. The lossy NLSE over 10 amplifier spans was simulated, with ASE noise switched off for clarity, and the peak power used was 0.1 mW.

A drawback of the proposed finite-memory model is that it is more complex than the GN model. Also, our model is not accurate for small values of $N$, since the GN assumption relies on the central limit theorem [7, 11, 12]. Furthermore, we assumed that all the $2N$ symbols around the symbol $x_{k}$ affect the noise variance equally. In practice, this is not the case. We nevertheless use the proposed model in this paper because it is relatively easy to analyze (see Secs. III and IV) and because even this simple finite-memory model captures the quantitative effects caused by non-i.i.d. symbols, which is essential for the capacity analysis in Sec. IV.

II-E Numerical Comparison

Before analyzing the finite-memory GN model, we first quantify the chromatic dispersion of the optical fiber. To this end, we simulated the transmission of a single symbol pulse over a single-channel, single-polarization fiber link without dispersion compensation. Ten amplifier spans over a total distance of 700 km were simulated using the lossy NLSE model. We used a raised-cosine pulse with peak power 0.1 mW and a duration of 15.6 ps at half the maximum amplitude, which corresponds to half the symbol slot in a 32 Gbaud transmission system. The result is illustrated in Fig. 1. At this low power, the nonlinear effects are almost negligible. For clarity of illustration, the ASE noise was neglected by setting $P_{\text{ASE}}=0$. The remaining system parameters are given in Table I and will be used throughout the paper, except when other values are explicitly stated. As we can see, the pulse broadens as it propagates along the fiber, having a width corresponding to about 100 data symbols after 700 km of transmission, or a half-width of $N=50$ symbols. This is in good agreement with the relation for symbol memory used in [41, p. 2037], which gives $2N\approx 2\pi|\beta_{2}|LR_{\text{s}}^{2}=97$.
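The memory relation quoted from [41] is easy to verify numerically; the following sketch uses the Table I values.

```python
import numpy as np

beta2, L, Rs = 21.7e-27, 700e3, 32e9      # Table I values in SI units
two_N = 2 * np.pi * beta2 * L * Rs**2     # symbol-memory relation from [41, p. 2037]
print(f"2N ~ {two_N:.0f} symbols")        # ~97, i.e., N ~ 50, matching Fig. 1
```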

TABLE I: System parameters used in the paper.
Symbol | Value | Meaning
$\alpha$ | 0.2 dB/km | Fiber attenuation
$\beta_{2}$ | $-21.7~\text{ps}^{2}/\text{km}$ | Group velocity dispersion
$\gamma$ | $1.27~(\text{W km})^{-1}$ | Fiber nonlinear coefficient
$M$ | 10 | Number of amplifier spans
$L$ | 700 km | System length
$R_{\text{s}}$ | 32 Gbaud | Symbol rate
$P_{\text{ASE}}$ | $4.1\cdot 10^{-6}$ W | Total ASE noise
$\eta$ | $7244~\text{W}^{-2}$ | NLI coefficient

Next, to validate the behavior of the finite-memory model with nonstationary input symbol sequences, we simulated the transmission of independent quadrature phase-shift keying (QPSK) data symbols with a time-varying magnitude over the same 700 km fiber link, at $R_{\text{s}}=32$ Gbaud. The transmitted sequence consists of 128 symbols with 4 mW of average signal power, 128 symbols at 0 mW power, 128 symbols at 4 mW, and so on. The statistical power is then 2 mW. The chosen pulse shape is a raised-cosine return-to-zero pulse. In Fig. 2, we show the amplitude of the transmitted symbols $|x_{k}|$ (red) and received symbols $|Y_{k}|$ (blue) for three different models: the NLSE, the finite-memory GN model with $N=50$, and the regular GN model. In the middle and lower plots of Fig. 2, we used the NLI coefficient $\eta=7244~\text{W}^{-2}$, which was calculated from (6), using $\epsilon=0$ for simplicity. Also in Fig. 2, we used $P_{\text{ASE}}=0$ to better illustrate the properties of the nonlinear models.

As can be seen, the agreement between the NLSE simulations and the finite-memory model is quite reasonable, whereas the GN model cannot capture the nonstationary dynamics. The results in Fig. 2 also show that the noise variance in the NLSE simulation is low around the symbols with low input power and high around the symbols with high input power. This behavior is captured by the finite-memory GN model but not by the regular GN model, for which the variance of the noise is the same at every time instant. This illustrates that the GN model (2) should be avoided for nonstationary symbol sequences such as the ones used in Fig. 2. This is not surprising, as the model was derived under an i.i.d. assumption. In Sec. IV, we will return to this observation when analyzing coded transmission. We believe that the finite-memory GN model proposed here, albeit idealized, is the first model that is able to deal with nonstationary symbol sequences.
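The middle panel of Fig. 2 can be reproduced, up to noise realizations, with a few lines of Python; the sketch below reuses the finite_memory_gn_channel function from the Sec. II-D sketch and the same settings ($N=50$, $P_{\text{ASE}}=0$, $\eta=7244~\text{W}^{-2}$), but is an illustration rather than the exact simulation setup.

```python
import numpy as np

rng = np.random.default_rng(0)
# alternating 128-symbol QPSK blocks at 4 mW and 0 mW (statistical power 2 mW)
x = np.concatenate([np.sqrt(p) * np.exp(1j * (np.pi / 4 + np.pi / 2
                    * rng.integers(0, 4, 128))) for p in (4e-3, 0.0) * 3])

# finite_memory_gn_channel as sketched in Sec. II-D, with P_ASE = 0 as in Fig. 2
y = finite_memory_gn_channel(x, N=50, P_ASE=0.0, eta=7244.0, rng=rng)
# |y| shows the noise variance ramping up and down across the block boundaries
```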

Figure 2: Amplitude of the transmitted QPSK symbols $|x_{k}|$ (red squares) and received symbols $|Y_{k}|$ (blue circles) for transmission over a 700 km fiber link. The received symbols are obtained using (top) the NLSE, (middle) the finite-memory GN model (7) with $N=50$, and (bottom) the regular GN model (2).

III Uncoded Error Probability

We assume that the transmitted symbols $\{X_{k}\}$ are independently drawn from a discrete constellation $\mathcal{S}=\{s_{1},\ldots,s_{M}\}$, where $M=2^{m}$. The symbols are assumed to be selected with equal probability, and thus, the average transmit (statistical) power is given by

$P=\mathbb{E}[|X|^{2}]=\frac{1}{M}\sum_{i=1}^{M}|s_{i}|^{2}.$ (8)

For each time instant $k$, we denote the sequence of the $2N$ symbols transmitted around $x_{k}$ by

$\boldsymbol{X}^{\text{mem}}_{k}\triangleq[X_{k-N},\ldots,X_{k-1},X_{k+1},\ldots,X_{k+N}],$ (9)

where the notation emphasizes that $\boldsymbol{X}^{\text{mem}}_{k}$ is a random vector describing the channel memory at time instant $k$. For future use, we define the function

$\rho(a)\triangleq P_{\text{ASE}}+\eta\left(\frac{a}{2N+1}\right)^{3}.$ (10)

For a given sequence of $2N$ symbols $\boldsymbol{x}^{\text{mem}}_{k}$ and a given transmitted symbol $X_{k}=s_{i}$, the conditional variance of the additive noise in (7) can be expressed as an explicit function of $\boldsymbol{x}^{\text{mem}}_{k}$ using (10), i.e.,

$\rho(|s_{i}|^{2}+\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2})=P_{\text{ASE}}+\eta\left(\frac{|s_{i}|^{2}+\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2}}{2N+1}\right)^{3},$ (11)

where $\|\boldsymbol{x}\|$ denotes the Euclidean norm of $\boldsymbol{x}$. For a given transmitted symbol $X_{k}=s_{i}$ and a given sequence $\boldsymbol{x}^{\text{mem}}_{k}$, the channel law for the finite-memory model is

$f_{Y_{k}|X_{k},\boldsymbol{X}^{\text{mem}}_{k}}(y|s_{i},\boldsymbol{x}^{\text{mem}}_{k})\triangleq\frac{1}{\pi\rho(|s_{i}|^{2}+\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2})}\exp\left(-\frac{|y-s_{i}|^{2}}{\rho(|s_{i}|^{2}+\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2})}\right).$ (12)

III-A Error Probability Analysis

We consider the equally spaced 16-QAM constellation shown in Fig. 3. In this case, $\mathcal{S}=\{a+b\sqrt{-1}:a,b\in\{\pm\Delta,\pm 3\Delta\}\}$, the minimum Euclidean distance (MED) of the constellation is $2\Delta$, and the statistical power is $P=10\Delta^{2}$. The binary labeling is the binary reflected Gray code (BRGC) [42], where the first two bits determine the in-phase (real) component of the symbols and the last two bits determine the quadrature (imaginary) component. This is shown with colors in Fig. 3.

Figure 3: The 16-QAM constellation $\mathcal{S}$ and its binary labeling. The binary labeling of the constellation is based on the Cartesian product of the BRGC for 4-ary pulse amplitude modulation in phase (red) and quadrature (blue). The Voronoi regions of the symbols and the MED of the constellation are also shown. The Voronoi region $\mathcal{V}_{6}$ is highlighted in gray.

The maximum-likelihood (ML) symbol-by-symbol detection rule for a given sequence $\boldsymbol{x}^{\text{mem}}_{k}$ chooses the symbol $s_{i}\in\mathcal{S}$ that maximizes $f_{Y_{k}|X_{k},\boldsymbol{X}^{\text{mem}}_{k}}(y|s_{i},\boldsymbol{x}^{\text{mem}}_{k})$ in (12). The decision made by this detector can be expressed as

$\hat{X}_{k}^{\text{ML}}=\mathop{\mathrm{argmin}}_{s_{i}\in\mathcal{S}}\left\{\log\rho(|s_{i}|^{2}+\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2})+\frac{|y-s_{i}|^{2}}{\rho(|s_{i}|^{2}+\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2})}\right\},$ (13)

which shows that, due to the dependency of $\log\rho(|s_{i}|^{2}+\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2})$ on $s_{i}$, this detector is not an MED detector. For simplicity, however, we disregard this term and study the MED detector, which chooses the symbol $s_{i}$ closest, in Euclidean distance, to the channel output $y$. Thus,

$\hat{X}_{k}=\mathop{\mathrm{argmin}}_{s_{i}\in\mathcal{S}}|y-s_{i}|^{2}=s_{i},\quad\text{if}~Y_{k}\in\mathcal{V}_{i},$ (14)

where $\mathcal{V}_{i}$ denotes the decision region, or Voronoi region, of $s_{i}$.

Remark 1

As we will later see, for large memory $N$, the MED detector in (14) is in fact equivalent to the detector in (13). Intuitively, this holds because the approximation $\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2}+|s_{i}|^{2}\approx\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2}$ becomes tight when $N$ is large.

Remark 2

The ML symbol-by-symbol detector in (13) is suboptimal, i.e., better detectors can be devised. For example, one could design a detector that uses not only the current received symbol, but also the next $N$ received symbols. Since the current transmitted symbol affects the noise of the next $N$ symbols, this information could be taken into account to make a better decision on the current symbol. In this paper, however, we focus on the MED detector in (14) because of its simplicity.
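The two detectors are straightforward to implement; the following Python sketch evaluates the metrics in (13) and (14) for the 16-QAM constellation of Fig. 3, with the memory term $\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2}$ passed in as a known quantity (the parameter values are illustrative).

```python
import numpy as np

P_ASE, eta, N = 4.1e-6, 7244.0, 10
P = 1e-3                                        # statistical power, P = 10*Delta^2
delta = np.sqrt(P / 10)
pam = np.array([-3, -1, 1, 3]) * delta
S = (pam[:, None] + 1j * pam[None, :]).ravel()  # 16-QAM constellation of Fig. 3

def rho(a):                                     # noise-variance function (10)
    return P_ASE + eta * (a / (2 * N + 1))**3

def detect(y, mem_norm2, ml=True):
    """ML rule (13) if ml=True, MED rule (14) otherwise; mem_norm2 = ||x_mem||^2."""
    if ml:
        var = rho(np.abs(S)**2 + mem_norm2)
        metric = np.log(var) + np.abs(y - S)**2 / var
    else:
        metric = np.abs(y - S)**2
    return S[np.argmin(metric)]
```

For large $N$, the term $|s_{i}|^{2}$ becomes negligible compared to $\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2}$, and the two rules return the same decisions, in line with Remark 1.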

The following two theorems give closed-form expressions for the BER and SER for the constellation in Fig. 3 when used over the finite-memory GN model.

Theorem 1

For the finite-memory GN model with arbitrary memory $N<\infty$, the BER of the MED detector for the 16-QAM constellation in Fig. 3 is given by

$\mathrm{BER}=\frac{2^{-3}}{2^{4N}}\sum_{l=0}^{4N}\binom{4N}{l}\sum_{\substack{r\in\{1,3,5\}\\ t\in\{1,5,9\}}}B_{r,t}\,Q\left(\sqrt{\frac{r^{2}P}{5\gamma_{l,t,N}}}\right),$ (15)

where

$B_{1,1}=2,\quad B_{3,1}=1,\quad B_{5,1}=0,$ (16)
$B_{1,5}=3,\quad B_{3,5}=2,\quad B_{5,5}=-1,$ (17)
$B_{1,9}=1,\quad B_{3,9}=1,\quad B_{5,9}=-1,$ (18)

and where

$\gamma_{l,t,N}\triangleq P_{\text{ASE}}+\frac{\eta}{(2N+1)^{3}}\left(\frac{P(2N+4l+t)}{5}\right)^{3}.$ (19)
Proof:

See Appendix A. ∎
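Theorem 1 can be evaluated directly; the following Python sketch implements (15)–(19) using the Table I constants and shows how the result approaches the $N\to\infty$ limit of Corollary 1 below.

```python
import numpy as np
from scipy.special import comb, erfc

P_ASE, eta = 4.1e-6, 7244.0
Q = lambda z: 0.5 * erfc(z / np.sqrt(2))
B = {(1, 1): 2, (3, 1): 1, (5, 1): 0,       # the coefficients (16)-(18)
     (1, 5): 3, (3, 5): 2, (5, 5): -1,
     (1, 9): 1, (3, 9): 1, (5, 9): -1}

def ber(P, N):
    """16-QAM BER over the finite-memory GN model, eqs. (15) and (19)."""
    total = 0.0
    for l in range(4 * N + 1):
        w = comb(4 * N, l, exact=True)
        for (r, t), Brt in B.items():
            g = P_ASE + eta / (2 * N + 1)**3 * (P * (2 * N + 4 * l + t) / 5)**3
            total += w * Brt * Q(np.sqrt(r**2 * P / (5 * g)))
    return total / (8.0 * 2**(4 * N))

for N in (1, 5, 50):
    print(N, ber(1e-3, N))   # approaches the GN-model BER of Corollary 1 as N grows
```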

Theorem 2

For the finite-memory GN model with arbitrary memory $N<\infty$, the SER of the MED detector for the 16-QAM constellation in Fig. 3 is given by

$\mathrm{SER}=\frac{4^{-1}}{4^{2N}}\sum_{l=0}^{4N}\binom{4N}{l}\sum_{\substack{e\in\{1,2\}\\ t\in\{1,5,9\}}}S_{e,t}\,Q\left(\sqrt{\frac{P}{5\gamma_{l,t,N}}}\right)^{e},$ (20)

where

$S_{1,1}=4,\quad S_{2,1}=-4,$ (21)
$S_{1,5}=6,\quad S_{2,5}=-4,$ (22)
$S_{1,9}=2,\quad S_{2,9}=-1,$ (23)

and where $\gamma_{l,t,N}$ is given by (19).

Proof:

See Appendix B. ∎

The BER and SER in the limit $N\to\infty$ can be inferred from Theorems 1 and 2, as shown in the next corollary.

Corollary 1

The BER and SER for the finite-memory GN model in the limit $N\to\infty$ are

$\mathrm{BER}=\frac{3}{4}Q\left(\sqrt{\frac{P/5}{P_{\text{ASE}}+\eta P^{3}}}\right)+\frac{1}{2}Q\left(\sqrt{\frac{9P/5}{P_{\text{ASE}}+\eta P^{3}}}\right)-\frac{1}{4}Q\left(\sqrt{\frac{5P}{P_{\text{ASE}}+\eta P^{3}}}\right),$ (24)

$\mathrm{SER}=3Q\left(\sqrt{\frac{P/5}{P_{\text{ASE}}+\eta P^{3}}}\right)-\frac{9}{4}Q\left(\sqrt{\frac{P/5}{P_{\text{ASE}}+\eta P^{3}}}\right)^{2}.$ (25)
Proof:

See Appendix C. ∎

The other extreme case to consider is the memoryless AWGN channel. The BER and SER expressions in this case are given in the following corollary.

Corollary 2

The BER and SER for the memoryless AWGN channel are given by

$\mathrm{BER}=\frac{3}{4}Q\left(\sqrt{\frac{P}{5P_{\text{ASE}}}}\right)+\frac{1}{2}Q\left(\sqrt{\frac{9P}{5P_{\text{ASE}}}}\right)-\frac{1}{4}Q\left(\sqrt{\frac{5P}{P_{\text{ASE}}}}\right),$ (26)

$\mathrm{SER}=3Q\left(\sqrt{\frac{P}{5P_{\text{ASE}}}}\right)-\frac{9}{4}Q\left(\sqrt{\frac{P}{5P_{\text{ASE}}}}\right)^{2}.$ (27)
Proof:

Set $\eta=0$ in (24) and (25). ∎

The results in Corollaries 1 and 2 correspond to the well-known expressions for the BER and SER for the AWGN channel. In particular, (26) can be found in [43, eq. (10)], [44, eq. (10.36a)] and (27) in [44, eq. (10.32)]. Also, the results in Corollary 2, with $P_{\text{ASE}}$ replaced by the GN-model noise variance $P_{\text{ASE}}+\eta P^{3}$ from (2), coincide with Corollary 1, showing that the BER and SER for the finite-memory GN model converge, as $N\rightarrow\infty$, to those of the regular GN model.

Figure 4: Analytical BER (top) and SER (bottom) of 16-QAM transmission with the finite-memory GN model, for different values of $N$ (solid lines). Markers show simulation results with the ML detector in (13) (squares) and the MED detector in (14) (circles). The results for the memoryless AWGN channel and the regular GN model are included for comparison.

III-B Numerical Results

We consider the same scenario as in Sec. II-E, with parameters according to Table I. The BER and SER for the 16-QAM constellation in Fig. 3, given by Theorems 1 and 2, are shown in Fig. 4 for different values of $N$. Fig. 4 also shows the limiting cases $N\rightarrow\infty$ and the memoryless AWGN channel, given by Corollaries 1 and 2, respectively. Furthermore, results obtained via computer simulations of (1) and (7) are included, using the ML detector in (13), marked with squares, and the MED detector in (14), marked with circles. As expected, the MED detector yields a perfect match with the analytical expressions, whereas the ML detector deviates slightly for small channel memories.

The results in Fig. 4 show that in the low-input-power regime, the memory in the channel plays no role for the BER and SER, and all the curves closely follow the BER and SER of a memoryless AWGN channel. However, as $P$ increases, the memory starts to matter, causing the BER and SER for finite $N$ to have a minimum and then to increase with $P$. Physically, this can be explained as follows: in the low-power regime, the BER is limited by the ASE noise, which is independent of the memory depth. In the high-power regime, the Kerr-induced noise dominates, resulting in a BER that increases with power. Similar behavior has been reported in most experiments and simulations on nonlinearly limited links, e.g., [45, 46, 9, 11], [47, Ch. 9]. The reason why the performance improves slightly with the memory depth $N$ is the nonlinear scaling of the Kerr-induced noise. For $N=1$, sequences of two or more high-amplitude symbols will receive high noise power and dominate the average BER. For higher $N$, longer (and less probable) sequences of high-amplitude symbols are required to receive the same, high, noise power. Thus, on average, the performance improves with $N$, up to the limit given by the GN model.

The results in Fig. 4 also show how the finite-memory model approaches the GN model in the high-input-power regime. For $N=50$, the two models yield very similar BER and SER curves.

IV Channel Capacity

In this section, some fundamentals of information theory are first reviewed. Then a lower bound on the capacity of the finite-memory GN model is derived and evaluated numerically.

IV-A Preliminaries

Fig. 5 shows a generic coded communication system where a message $j$ is mapped to a codeword $\boldsymbol{x}=[x_{1},\ldots,x_{n}]$. This codeword is then used to modulate a continuous-time waveform, which is transmitted through the physical channel. At the receiver's side, the continuous-time waveform is processed (filtered, equalized, synchronized, matched filtered, sampled, etc.), resulting in a discrete-time observation $\boldsymbol{Y}=[Y_{1},\ldots,Y_{n}]$, which is a noisy version of the transmitted codeword $\boldsymbol{x}$. The decoder uses $\boldsymbol{Y}$ to estimate the transmitted message $j$.

Figure 5: Encoder and decoder pair. The encoder maps a message $j$ to a codeword $\boldsymbol{x}=[x_{1},\ldots,x_{n}]$. The decoder uses the noisy observation $\boldsymbol{Y}=[Y_{1},\ldots,Y_{n}]$ to provide an estimate $\hat{\jmath}$ of the message $j$.

When designing a coded communication system, the first step is to choose the set of codewords (i.e., the codebook) that will be transmitted through the channel. Once the codebook has been chosen, the mapping rule between messages and codewords should be selected, which fully determines the encoding procedure. At the receiver side, the decoder block uses the mapping rule employed at the transmitter (as well as the channel characteristics) to give an estimate $\hat{\jmath}$ of the message $j$. The triplet of codebook, encoder, and decoder forms a so-called coding scheme. Practical coding schemes are designed so as to minimize the probability that $\hat{\jmath}$ differs from $j$, while at the same time keeping the complexity of both encoder and decoder low.

Channel capacity is the largest transmission rate at which reliable communication can occur. More formally, let $(n,M,\epsilon)$ be a coding scheme consisting of:

  • An encoder that maps a message $j\in\{1,\ldots,M\}$ into a block of $n$ transmitted symbols $\boldsymbol{x}=[x_{1},\ldots,x_{n}]$ satisfying the per-codeword power constraint

    $\frac{1}{n}\sum_{l=1}^{n}|x_{l}|^{2}=P.$ (28)

  • A decoder that maps the corresponding block of received symbols $\boldsymbol{Y}=[Y_{1},\ldots,Y_{n}]$ into a message $\hat{\jmath}\in\{1,\ldots,M\}$ so that the average error probability, i.e., the probability that $\hat{\jmath}$ differs from $j$, does not exceed $\epsilon$.

Observe that $P$ here is defined differently from in the previous sections. It still represents the average transmit power, but while this quantity was interpreted in Secs. II and III in a statistical sense, as the mean of an i.i.d. random variable, it is in this section the exact power of every codeword.

The maximum coding rate $R^{*}(n,\epsilon)$ (measured in bit/symbol) for a given block length $n$ and error probability $\epsilon$ is defined as the largest ratio $(\log_{2}M)/n$ for which an $(n,M,\epsilon)$ coding scheme exists. The channel capacity $C$ is the largest coding rate for which a coding scheme with vanishing error probability exists, in the limit of large block length,

$C\triangleq\lim_{\epsilon\to 0}\lim_{n\to\infty}R^{*}(n,\epsilon).$ (29)

IV-B Memoryless Channels

By Shannon's channel coding theorem, the channel capacity of a discrete-time memoryless channel, in bit/symbol, can be calculated as [16], [17, Ch. 7]

$C=\sup I(X;Y),$ (30)

where $I(X;Y)$ is the mutual information (MI)

$I(X;Y)=\iint f_{X,Y}(x,y)\log_{2}\frac{f_{X,Y}(x,y)}{f_{X}(x)f_{Y}(y)}\,\mathrm{d}x\,\mathrm{d}y$ (31)

and the maximization in (30) is over all probability distributions $f_{X}$ that satisfy $\mathbb{E}[|X|^{2}]=P$, for a given channel $f_{Y|X}$.

Roughly speaking, a transmission scheme that operates at an arbitrary rate $R<C$ can be designed by creating a codebook of $M=2^{nR}$ codewords of length $n$, whose elements are i.i.d. random samples from the distribution $f_{X}$ that maximizes the mutual information in (30). This codebook is stored in both the encoder and the decoder. During transmission, the encoder maps each message $j$ into a unique codeword $\boldsymbol{x}$, and the decoder identifies the codeword that is most similar, in some sense, to the received vector $\boldsymbol{Y}$. An arbitrarily small error probability $\epsilon$ can be achieved by choosing $n$ large enough. This random coding paradigm was proposed already by Shannon [16]. In practice, however, randomly constructed codebooks are usually avoided for complexity reasons.

Since the additive noise in (2) is statistically independent of $X_{k}$, the channel capacity of the GN model (2) can be calculated exactly as [5, 8]

$C=\log_{2}\left(1+\frac{P}{P_{\text{ASE}}+\eta P^{3}}\right)$ (32)

using Shannon's well-known capacity expression [16, Sec. 24], [17, Ch. 9]. The capacity in (32) can be achieved by choosing the codewords $\boldsymbol{x}$ to be drawn independently from a Gaussian distribution $\mathcal{CN}(0,P)$.

Considered as a function of the transmitted signal power $P$, the capacity in (32) has the peculiar behavior of reaching a peak and eventually decreasing to zero at high enough power, since the denominator of (32) increases faster than the numerator. This phenomenon, sometimes called the "nonlinear Shannon limit" in the optical communications community, conveys the message that reliable communication over nonlinear optical channels becomes impossible at high powers. In the following sections, we shall question this pessimistic conclusion.
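For later reference, the location of this peak follows from elementary calculus; the numerical values below assume the Table I constants.

$\frac{\mathrm{d}}{\mathrm{d}P}\,\frac{P}{P_{\text{ASE}}+\eta P^{3}}=\frac{P_{\text{ASE}}-2\eta P^{3}}{\bigl(P_{\text{ASE}}+\eta P^{3}\bigr)^{2}}=0\quad\Longrightarrow\quad P^{\star}=\left(\frac{P_{\text{ASE}}}{2\eta}\right)^{1/3}\approx 0.66~\text{mW}\;(-1.8~\text{dBm}),$

at which point $\eta(P^{\star})^{3}=P_{\text{ASE}}/2$, so the peak value of (32) is $\log_{2}\bigl(1+P^{\star}/(1.5P_{\text{ASE}})\bigr)\approx 6.8$ bit/symbol.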

IV-C Channels with Memory

The capacity of channels with memory is, under certain assumptions on information stability [48, Sec. I],

$C=\lim_{n\rightarrow\infty}\sup\frac{1}{n}I(\boldsymbol{X}_{1}^{n};\boldsymbol{Y}_{1}^{n}),$ (33)

where $\boldsymbol{X}_{i}^{j}=(X_{i},X_{i+1},\ldots,X_{j})$, $I(\boldsymbol{X}_{i}^{j};\boldsymbol{Y}_{i}^{j})$ is defined as a multidimensional integral analogous to (31), and the maximization is over all joint distributions of $X_{1},\ldots,X_{n}$ satisfying $\mathbb{E}[\|\boldsymbol{X}_{1}^{n}\|^{2}]=nP$. In this context, it is worth emphasizing that the maximization in (33) includes sequences $X_{1},\ldots,X_{n}$ that are not i.i.d. Hence, in order to calculate the channel capacity of a transmission link, it is essential that the employed channel model allows non-i.i.d. inputs.

An exact expression for the channel capacity of the finite-memory GN model (7) is not available. Shannon's formula, which leads to (32), does not apply here, because the sequences $\{X_{k}\}$ and $\{Z_{k}\}$, where $Z_{k}$ was defined in (7), are dependent. A capacity estimation via (33) is numerically infeasible, since it involves integration and maximization over high-dimensional spaces. We therefore turn our attention to bounds on the capacity of the finite-memory model. Every joint distribution of $X_{1},\ldots,X_{n}$ satisfying $\mathbb{E}[\|\boldsymbol{X}_{1}^{n}\|^{2}]=nP$ gives us a lower bound on capacity. Thus,

$C\geq\lim_{n\rightarrow\infty}\frac{1}{n}I(\boldsymbol{X}_{1}^{n};\boldsymbol{Y}_{1}^{n}),$ (34)

for any random process $\{X_{k}\}$ such that the limit exists.

Figure 6: Six samples of the random input process $\{X_{k}\}$ used to generate the lower bound in Theorem 3. The channel memory is here $N=1$, meaning that $2N+1=3$ input symbols $X_{k}$ influence each output symbol. The distributions are illustrated as scatter plots of 1000 realizations for each sample.

IV-D Lower Bound

In this section, a lower bound on (33) is derived by applying (34) to the following random input process. In every block of $2N+1$ consecutive symbols, we let the first $N$ symbols and the last $N$ symbols have a constant amplitude, whereas the amplitude of the symbol in the middle of the block follows an arbitrary distribution. The phase of each symbol in the block is assumed uniform. With this random input process, illustrated in Fig. 6, the memory in (7) depends only on a single variable-amplitude symbol. This enables us to derive an analytical expression for the resulting capacity lower bound in (34).

Theorem 3

For every $r_{1}\geq 0$ and every probability distribution $f_{R}$ over $\mathbb{R}^{+}$ such that

$\frac{2Nr_{1}^{2}+\mathbb{E}[R^{2}]}{2N+1}=P,$ (35)

where $R\sim f_{R}$, the channel capacity of (7) is lower-bounded as

$C\geq-\frac{\mathbb{E}[\log_{2}f_{\boldsymbol{U}}(\boldsymbol{U})]}{2N+1}-\int_{0}^{\infty}f_{R}(r)\log_{2}\bigl(e\rho(2Nr_{1}^{2}+r^{2})\bigr)\,\mathrm{d}r.$ (36)

Here, $\boldsymbol{U}\triangleq[U_{-N},U_{-N+1},\ldots,U_{N}]$ is a random vector whose probability density function $f_{\boldsymbol{U}}$ is

$f_{\boldsymbol{U}}(\boldsymbol{u})=\int_{0}^{\infty}f_{R}(r)\,\frac{\exp\left(-\frac{\sum_{k=-N}^{N}u_{k}+2Nr_{1}^{2}+r^{2}}{\rho(2Nr_{1}^{2}+r^{2})}\right)}{\bigl(\rho(2Nr_{1}^{2}+r^{2})\bigr)^{2N+1}}\,I_{0}\left(\frac{2r\sqrt{u_{0}}}{\rho(2Nr_{1}^{2}+r^{2})}\right)\prod_{\substack{k=-N\\ k\neq 0}}^{N}I_{0}\left(\frac{2r_{1}\sqrt{u_{k}}}{\rho(2Nr_{1}^{2}+r^{2})}\right)\mathrm{d}r,$ (37)

where the function $\rho(\cdot)$ is defined in (10), and $I_{0}(\cdot)$ is the modified Bessel function of the first kind.

Proof:

See Appendix D. ∎

The bound will be numerically computed in the next section.
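For concreteness, a minimal Monte Carlo sketch of this computation for $N=1$ is given below; the noise constants are those of Table I, while $r_{1}$, $\nu$, and $s$ are illustrative choices of ours (not the optimized values behind Fig. 7), with the t-distribution of Sec. IV-E used for $f_{R}$.

```python
import numpy as np
from scipy.special import i0e, logsumexp

rng = np.random.default_rng(1)
P_ASE, eta, N = 4.1e-6, 7244.0, 1
r1 = np.sqrt(1e-3)            # ring amplitude of the 2N outer symbols (illustrative)
nu, s = 3.0, 1e-3 / 6         # t-distribution (38) with E[R^2] = 2*nu*s/(nu-2) = 1 mW
P = (2 * N * r1**2 + 2 * nu * s / (nu - 2)) / (2 * N + 1)   # constraint (35)

def rho(a):                   # noise-variance function (10)
    return P_ASE + eta * (a / (2 * N + 1))**3

def sample_R(size):           # R = |X| with X bivariate t-distributed as in (38)
    g = rng.normal(scale=np.sqrt(s), size=(size, 2))
    return np.linalg.norm(g, axis=1) * np.sqrt(nu / rng.chisquare(nu, size))

def sample_U(n):              # U_k = |Y_k|^2 for one block of 2N+1 output symbols
    r = sample_R(n)
    sig2 = rho(2 * N * r1**2 + r**2)
    amp = np.hstack([np.full((n, N), r1), r[:, None], np.full((n, N), r1)])
    z = rng.standard_normal((n, 2 * N + 1)) + 1j * rng.standard_normal((n, 2 * N + 1))
    return np.abs(amp + np.sqrt(sig2 / 2)[:, None] * z)**2

r_in = sample_R(20000)        # inner samples for the mixture integral in (37)
sig2_in = rho(2 * N * r1**2 + r_in**2)

def log_fU(u):                # natural log of f_U(u) in (37), via log-sum-exp
    z0 = 2 * r_in * np.sqrt(u[N]) / sig2_in
    zk = 2 * r1 * np.sqrt(np.delete(u, N))[None, :] / sig2_in[:, None]
    g = (-(u.sum() + 2 * N * r1**2 + r_in**2) / sig2_in
         - (2 * N + 1) * np.log(sig2_in)
         + np.log(i0e(z0)) + z0              # I0(z) = i0e(z) e^z avoids overflow
         + (np.log(i0e(zk)) + zk).sum(axis=1))
    return logsumexp(g) - np.log(r_in.size)

h = -np.mean([log_fU(u) for u in sample_U(2000)]) / np.log(2)  # -E[log2 f_U(U)]
term2 = np.mean(np.log2(np.e * sig2_in))                       # integral term of (36)
print(f"P = {P*1e3:.1f} mW, C >= {h / (2 * N + 1) - term2:.2f} bit/symbol")
```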

IV-E Numerical Results

Theorem 3 yields a lower bound on capacity for every constant $r_{1}$ and every probability distribution $f_{R}$ satisfying (35). Instead of optimizing the bound over all distributions $f_{R}$, which is of limited interest since the theorem itself provides only a lower bound on capacity, we study a heuristically chosen family of distributions and optimize its parameters along with the constant amplitude $r_{1}$.

Figure 7: Lower bounds on capacity from Theorem 3 as a function of $\nu$, for various parameters $P$ and $r_{1}^{2}/s$. The memory is $N=1$.

An attractive choice in this context is to let the variable-amplitude symbols follow a circularly symmetric bivariate t-distribution [49, p. 86], [50, p. 1],

$f_{X}(x)=\frac{1}{2\pi s}\left(1+\frac{|x|^{2}}{\nu s}\right)^{-(1+\nu/2)},$ (38)

where $X$ (with magnitude $R=|X|$) denotes one such variable-amplitude symbol, $\nu$ is a shape parameter, and $s$ scales the variance, which equals [50, p. 11] $\mathbb{E}[|X|^{2}]=\mathbb{E}[R^{2}]=2\nu s/(\nu-2)$ if $\nu>2$ and is otherwise undefined. The shape of this distribution is similar to a Gaussian, but the heaviness of the tail can be controlled via the shape parameter $\nu$: the closer $\nu$ is to $2$, the heavier the tail. This is, as we shall see later, what makes it an interesting choice for nonlinear optical channels.

Again, we consider the same scenario as in Sec. II-E, with the system parameters given in Table I. The distribution of $R=|X|$ is given by $f_{R}(r)=2\pi rf_{X}(r)$, with $f_{X}$ given by (38). The power constraint (35), which reduces to

$P=\frac{1}{2N+1}\left(2Nr_{1}^{2}+\frac{2\nu s}{\nu-2}\right),$

leaves two degrees of freedom to optimize for each $P$, which we can take to be the shape parameter $\nu$ and the ratio $r_{1}^{2}/s$.
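In other words, for a given grid point $(\nu, r_{1}^{2}/s)$ and a target power $P$, the scale $s$ follows directly from the reduced constraint above; a small Python sketch of this bookkeeping, with arbitrary example numbers, is:

```python
import numpy as np

def t_params(P, N, nu, ratio):
    """Solve the reduced power constraint above for s, given ratio = r1^2/s."""
    s = P * (2 * N + 1) / (2 * N * ratio + 2 * nu / (nu - 2))
    return s, np.sqrt(ratio * s)              # (s, r1)

s, r1 = t_params(P=1e-3, N=1, nu=2.5, ratio=4.0)               # example grid point
assert np.isclose((2 * r1**2 + 2 * 2.5 * s / 0.5) / 3, 1e-3)   # recovers P
```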

[Figure 8: curves of $C$ [bit/symbol] versus $P$ [dBm] for $N=0$, $N=1$, and $N=2,5,10,20,50$, together with the AWGN capacity and the GN-model capacity.]
Figure 8: Lower bounds from Theorem 3 on the capacity of the finite-memory model for different values of $N$. The exact capacities of the AWGN channel and of the GN model in (32) are included for comparison. Observe that the capacity of the finite-memory model does not converge to the capacity of the GN model as the memory $N$ increases. Dashed lines indicate improved lower bounds via the law of monotonic channel capacity.

The lower bound on the capacity of the finite-memory model given by Theorem 3 is shown in Fig. 7 as a function of $P$, $\nu$, and $r_{1}^{2}/s$, for the special case $N=1$. The expectation in (36) was estimated by Monte Carlo integration. It can be seen that as the transmit power $P$ increases, the optimum shape parameter $\nu$ gets closer and closer to $2$. In other words, the tail gets heavier, so that at high power the tail carries almost all of the power, while the probability of transmitting a high amplitude $R$ remains small. In this sense, a t-distribution with a shape parameter near $2$ is similar to a satellite constellation [37].
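As a rough indication of how such a Monte Carlo estimate can be organized, the sketch below combines the `f_U` routine from Sec. IV-D with the entropy decomposition of Appendix D; `sample_R` is an assumed callable drawing amplitudes from $f_{R}$, and the estimate is illustrative rather than the implementation behind Figs. 7 and 8.

```python
def capacity_lower_bound(f_R, sample_R, rho, r1, N, trials, rng):
    """Monte Carlo estimate of the bound (36), written as
    C >= log2(pi) + h(U)/(2N+1) - E[log2(pi*e*rho(2N r1^2 + R^2))]."""
    h_U = 0.0      # accumulates -log2 f_U(U), i.e., h(U)
    h_cond = 0.0   # accumulates log2(pi*e*rho(...))
    for _ in range(trials):
        r = sample_R(rng)
        v = rho(2 * N * r1**2 + r**2)
        a = np.full(2 * N + 1, r1)
        a[N] = r
        noise = (rng.normal(size=2 * N + 1)
                 + 1j * rng.normal(size=2 * N + 1)) * np.sqrt(v / 2)
        u = np.abs(a + noise)**2              # one realization of U
        h_U += -np.log2(f_U(u, f_R, rho, r1, N)) / trials
        h_cond += np.log2(np.pi * np.e * v) / trials
    return np.log2(np.pi) + h_U / (2 * N + 1) - h_cond
```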

Selecting the optimum parameters $\nu$ and $r_{1}^{2}/s$ for every power $P$, the capacity bound is plotted in Fig. 8 as a function of the transmit power $P$, for selected values of the channel memory $N$. The figure also shows the AWGN channel capacity and the exact capacity of the GN model given by (32). In the linear regime, the capacity bound is close to the AWGN capacity if $N=0$, because the t-distribution is, at high values of $\nu$, approximately equal to the capacity-achieving Gaussian distribution. As $N$ increases, the capacity bound tends, still in the linear regime, to the mutual information of constant-amplitude transmission [51, 52].

Interestingly, we can see that as $N$ increases, the curves approach an asymptotic bound (the curves for $N=10$, $20$, and $50$ almost overlap). It follows that reliable communication in the high-input-power regime is indeed possible for every finite $N$. This result should be compared with the regular GN model, whose capacity (32) decreases to zero at high average transmit power [8]. It may seem contradictory that the GN model, which can be characterized as a limiting case of the finite-memory model (cf. (7) and (2)–(3)), nevertheless exhibits a fundamentally different channel capacity. This can be intuitively understood as follows. In every block of $2N+1$ symbols, we transmit $2N$ constant-amplitude symbols with low power and only one symbol with variable (potentially very large) power. Although the amplitude of this variable-power symbol is chosen so that the average power constraint is satisfied according to (35) (which requires averaging across many blocks of length $2N+1$), the convergence to the average power illustrated in (3) does not occur within a block, even when $N$ is taken very large.

It can be observed that the lower bounds in Fig. 8 all exhibit a peak before they converge to their asymptotic values at high $P$. Such bounds can always be improved using the law of monotonic channel capacity [53]. Cast in the framework of this paper, this law states that the channel capacity never decreases with power for any finite-memory channel. The law does not give a capacity lower bound per se, but it provides an instrument by which a lower bound at a certain power $P$ can be propagated to any power greater than $P$. Hence, the part of each curve in Fig. 8 to the right of its peak can be lifted up to the level of the peak, which yields marginally tighter lower bounds (dashed lines in Fig. 8).
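In code, this lifting is simply a running maximum over increasing power; a minimal sketch (array names hypothetical):

```python
import numpy as np

def lift_bound(P_dBm, C_lower):
    """Tighten a capacity lower bound via the law of monotonic channel
    capacity [53]: a bound valid at power P also holds at every power
    above P, so the lifted curve is the running maximum over power."""
    order = np.argsort(P_dBm)
    return (np.asarray(P_dBm)[order],
            np.maximum.accumulate(np.asarray(C_lower)[order]))
```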

V Discussion and Conclusions

We extended the popular GN model for nonlinear fiber channels with a parameter to account for the channel memory. The extended channel model, which is given by (7), is able to model the time-varying output of an optical fiber whose input is a nonstationary process. If the input varies on a time scale comparable to or longer than the memory of the channel, then this model gives more realistic results than the regular GN model, as we showed in Fig. 2.

The validity of the GN model remains undisputed in the case of i.i.d. input symbols, such as in an uncoded scenario with a fixed, not too heavy-tailed modulation format (examples of “heavy-tailed” modulation formats are t-distributions (Sec. IV-E) and satellite constellations [37]) and a fixed transmit power. These are the conditions under which the GN model was derived and validated. The uncoded bit and symbol error rates computed in Sec. III confirm that the finite-memory model behaves similarly to the GN model as the channel memory $N$ increases.

The scene changes completely if we instead study capacity, as in Fig. 8. In this case, the finite-memory GN model does not, even at high $N$, behave as the regular GN model. This is because the channel capacity by definition involves a maximization over all possible transmission schemes, including nonstationary input, heavy-tailed modulation formats, etc. In the nonlinear regime, it turns out to be beneficial to transmit using a heavy-tailed input sequence, whose output the GN model cannot reliably predict. Hence, the GN model and other infinite-memory models (in the sense defined in Sec. II-C) should be used with caution in capacity analysis. It is still possible (and often easy) to calculate the capacity of such channel models, but this capacity should not be interpreted as the capacity of some underlying physical phenomenon with finite memory. As a rule of thumb, if the model depends on the average transmit power, we recommend avoiding it in capacity analysis.

A challenging area for future work is to derive more realistic finite-memory models than (7), i.e., discrete-time channel models that give the channel output as a function of a finite number of input symbols, ideally including not only a time-varying sequence of symbols but also symbols in other wavelengths, polarizations, modes, and/or cores, and to analyze these models from an information-theoretic perspective. This may lead to innovative transmission techniques that could increase the capacity significantly over known results in the nonlinear regime. The so-called nonlinear Shannon limit, which has only been derived for infinite-memory channel models, does not preclude the existence of such techniques.

Appendix A Proof of Theorem 1

Let $\{B_{q}\}$, $q=1,\dots,4$, be the four bits associated with the 16-QAM constellation point chosen as the $k$th transmitted symbol $X_{k}$. The BER for the 16-QAM constellation in Fig. 3 is given by

$$\begin{aligned}
\mathrm{BER}&\triangleq\frac{1}{4}\sum_{q=1}^{4}\Pr\{\hat{B}_{q}\neq B_{q}\} &&(39)\\
&=\frac{1}{64}\sum_{q=1}^{4}\sum_{i=1}^{16}P_{\hat{B}_{q}|X_{k}}\bigl(\overline{B}_{q}\,|\,s_{i}\bigr), &&(40)
\end{aligned}$$

where $\hat{B}_{q}$ is the estimated bit obtained by the MED detector in (III-A) and $\overline{B}$ denotes bit negation. Using the law of total probability, we can then express (40) as

$$\mathrm{BER}=\frac{1}{64}\sum_{q=1}^{4}\sum_{i=1}^{16}\sum_{\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2}}P_{\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}}\bigl(\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2}\bigr)\,P_{\hat{B}_{q}|X_{k},\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}}\bigl(\overline{B}_{q}\,|\,s_{i},\|\boldsymbol{x}^{\text{mem}}_{k}\|^{2}\bigr).\qquad(41)$$

We now compute the PMF $P_{\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}}$. As $\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}$ is a sum of $2N$ i.i.d. random variables, its PMF is the $2N$-fold self-convolution of the PMF of one such random variable. This convolution can be readily computed using probability generating functions [54, Sec. 5.1]. Let

$$\hat{P}_{|X_{k}|^{2}}(z)=\frac{1}{4}\bigl(z^{2\Delta^{2}}+2z^{10\Delta^{2}}+z^{18\Delta^{2}}\bigr)=\frac{1}{4}\bigl(z^{\Delta^{2}}+z^{9\Delta^{2}}\bigr)^{2}\qquad(42)$$

denote the probability generating function of $|X_{k}|^{2}$. The probability generating function of $\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}$ is given by

$$\begin{aligned}
\hat{P}_{\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}}(z)&=\bigl(\hat{P}_{|X_{k}|^{2}}(z)\bigr)^{2N} &&(43)\\
&=\frac{1}{4^{2N}}\bigl(z^{\Delta^{2}}+z^{9\Delta^{2}}\bigr)^{4N} &&(44)\\
&=\sum_{l=0}^{4N}\frac{1}{4^{2N}}\binom{4N}{l}z^{(4N+8l)\Delta^{2}}. &&(45)
\end{aligned}$$

We see from (45) that the possible outcomes of $\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}$ are

$$\delta_{l}\triangleq(4N+8l)\Delta^{2},\quad l=0,1,\ldots,4N,\qquad(46)$$

and $\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}=\delta_{l}$ occurs with probability $\binom{4N}{l}4^{-2N}$. Using this in (41) yields

$$\begin{aligned}
\mathrm{BER}&=\frac{4^{-3}}{4^{2N}}\sum_{l=0}^{4N}\binom{4N}{l}\sum_{q=1}^{4}\sum_{i=1}^{16}P_{\hat{B}_{q}|X_{k},\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}}\bigl(\overline{B}_{q}\,|\,s_{i},\delta_{l}\bigr)\\
&=\frac{4^{-3}}{4^{2N}}\sum_{l=0}^{4N}\binom{4N}{l}\sum_{q=1}^{4}\sum_{i=1}^{16}\sum_{\substack{j=1\\c_{j,q}\neq c_{i,q}}}^{16}\int_{\mathcal{V}_{j}}\frac{\pi^{-1}}{\rho(|s_{i}|^{2}+\delta_{l})}\exp\left(-\frac{|y-s_{i}|^{2}}{\rho(|s_{i}|^{2}+\delta_{l})}\right)\mathrm{d}y, &&(47)
\end{aligned}$$

where (47) follows from (III) and $c_{j,q}$ denotes the $q$th bit of the label of the symbol $s_{j}$ for $j=1,\ldots,16$.

The density in the integral in (47) corresponds to a Gaussian random variable with total variance $\rho(|s_{i}|^{2}+\delta_{l})$, so we now focus on this function. First, we partition the constellation point indices as $\{1,2,\ldots,16\}=\mathcal{I}_{1}\cup\mathcal{I}_{5}\cup\mathcal{I}_{9}$, where $\mathcal{I}_{1}\triangleq\{6,7,10,11\}$, $\mathcal{I}_{5}\triangleq\{2,3,5,8,9,12,14,15\}$, and $\mathcal{I}_{9}\triangleq\{1,4,13,16\}$. From Fig. 3, we see that $|s_{i}|^{2}=2\Delta^{2}$ if $i\in\mathcal{I}_{1}$, $|s_{i}|^{2}=10\Delta^{2}$ if $i\in\mathcal{I}_{5}$, and $|s_{i}|^{2}=18\Delta^{2}$ if $i\in\mathcal{I}_{9}$. Using the definition of $\rho(|s_{i}|^{2}+\delta_{l})$ in (11) together with (46) and $P=10\Delta^{2}$, we obtain

$$\rho(|s_{i}|^{2}+\delta_{l})=
\begin{cases}
P_{\text{ASE}}+\frac{\eta}{(2N+1)^{3}}\left(\frac{P(2N+4l+1)}{5}\right)^{3}, & \text{if } i\in\mathcal{I}_{1}\\[4pt]
P_{\text{ASE}}+\frac{\eta}{(2N+1)^{3}}\left(\frac{P(2N+4l+5)}{5}\right)^{3}, & \text{if } i\in\mathcal{I}_{5}\\[4pt]
P_{\text{ASE}}+\frac{\eta}{(2N+1)^{3}}\left(\frac{P(2N+4l+9)}{5}\right)^{3}, & \text{if } i\in\mathcal{I}_{9}.
\end{cases}\qquad(48)$$

We recognize the three values of $\rho(|s_{i}|^{2}+\delta_{l})$ in (48) as $\gamma_{l,1,N}$, $\gamma_{l,5,N}$, and $\gamma_{l,9,N}$, respectively. Combining this with $P=10\Delta^{2}$ and inspecting the constellation and labeling in Fig. 3 yields (15).
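A quick numerical sanity check of (45)–(46), under the normalization $\Delta=1$ (hypothetical, chosen only for this demo): drawing $2N$ i.i.d. 16-QAM symbol powers from $\{2,10,18\}$ with probabilities $\{1/4,1/2,1/4\}$ reproduces the binomial weights $\binom{4N}{l}4^{-2N}$ on the grid $4N+8l$.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
N, trials = 3, 10**6
powers = rng.choice([2, 10, 18], p=[0.25, 0.5, 0.25], size=(trials, 2 * N))
sums = powers.sum(axis=1)                 # realizations of ||X_k^mem||^2
for l in (0, N, 2 * N, 4 * N):
    emp = np.mean(sums == 4 * N + 8 * l)  # empirical frequency of delta_l
    ana = comb(4 * N, l) / 4**(2 * N)     # binomial weight from (45)
    print(l, emp, ana)
```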

Appendix B Proof of Theorem 2

The SER for the 16-QAM constellation in Fig. 3 is

$$\begin{aligned}
\mathrm{SER}&\triangleq\Pr\{\hat{X}_{k}\neq X_{k}\} &&(49)\\
&=\frac{1}{16}\sum_{i=1}^{16}\Pr\{\hat{X}_{k}\neq s_{i}\,|\,X_{k}=s_{i}\} &&(50)\\
&=\frac{1}{16}\sum_{i=1}^{16}\sum_{\substack{j=1\\j\neq i}}^{16}\Pr\{Y_{k}\in\mathcal{V}_{j}\,|\,X_{k}=s_{i}\}. &&(51)
\end{aligned}$$

By conditioning on the possible values of $\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}$, we obtain

$$\begin{aligned}
\mathrm{SER}&=\frac{4^{-2}}{4^{2N}}\sum_{l=0}^{4N}\binom{4N}{l}\sum_{i=1}^{16}\sum_{\substack{j=1\\j\neq i}}^{16}\Pr\{Y_{k}\in\mathcal{V}_{j}\,|\,X_{k}=s_{i},\|\boldsymbol{X}^{\text{mem}}_{k}\|^{2}=\delta_{l}\} &&(52)\\
&=\frac{4^{-2}}{4^{2N}}\sum_{l=0}^{4N}\binom{4N}{l}\sum_{i=1}^{16}\sum_{\substack{j=1\\j\neq i}}^{16}\int_{\mathcal{V}_{j}}\frac{\pi^{-1}}{\rho(|s_{i}|^{2}+\delta_{l})}\exp\left(-\frac{|y-s_{i}|^{2}}{\rho(|s_{i}|^{2}+\delta_{l})}\right)\mathrm{d}y, &&(53)
\end{aligned}$$

where $\delta_{l}$ is given by (46).

The expression in (20) is obtained by recognizing the density in the integral in (53) as that of a Gaussian random variable with total variance given by (48), and by integrating, for each $i$ in the sets $\mathcal{I}_{1}$, $\mathcal{I}_{5}$, and $\mathcal{I}_{9}$, over $\mathcal{V}_{j}$ with $j\neq i$. This completes the proof of Theorem 2.
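The conditional-Gaussian structure in (52)–(53) also suggests a direct Monte Carlo cross-check: draw the interference index $l$ from the binomial law behind (46), add Gaussian noise of the corresponding total variance, and detect by minimum Euclidean distance. The sketch below does this with hypothetical values of $P_{\text{ASE}}$, $\eta$, and $P$, and with $\rho(\cdot)$ written out as in (48).

```python
import numpy as np

P_ASE, ETA, N, P = 1e-4, 1e-3, 5, 1.0   # hypothetical system parameters
Delta = np.sqrt(P / 10)                 # from P = 10 * Delta^2
pts = np.array([a + 1j * b for a in (-3, -1, 1, 3)
                for b in (-3, -1, 1, 3)]) * Delta   # 16-QAM constellation

def rho(sigma2):                        # cf. (11) and (48)
    return P_ASE + ETA * (sigma2 / (2 * N + 1))**3

rng = np.random.default_rng(0)
trials = 10**5
x = rng.choice(pts, trials)             # transmitted symbols
l = rng.binomial(4 * N, 0.5, trials)    # interference index, cf. (46)
v = rho(np.abs(x)**2 + (4 * N + 8 * l) * Delta**2)
y = x + (rng.normal(size=trials)
         + 1j * rng.normal(size=trials)) * np.sqrt(v / 2)
xhat = pts[np.argmin(np.abs(y[:, None] - pts[None, :])**2, axis=1)]
print("SER ~", np.mean(xhat != x))      # to be compared with (20)
```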

Appendix C Proof of Corollary 1

The SER in (20) can be expressed as

$$\mathrm{SER}=\frac{1}{4}\sum_{\substack{e\in\{1,2\}\\t\in\{1,5,9\}}}S_{e,t}\sum_{l=0}^{4N}\frac{1}{4^{2N}}\binom{4N}{l}u_{e}\left(\frac{4l+t-1}{4(2N+1)}\right),\qquad(54)$$

where

$$u_{e}(x)\triangleq Q\left(\sqrt{\frac{P/5}{P_{\text{ASE}}+(1+4x)^{3}\eta(P/5)^{3}}}\right)^{\!e}\qquad(55)$$

is a continuous and bounded function on $[0,2]$ for every $e\in\{1,2\}$ and $t\in\{1,5,9\}$. We can interpret the innermost sum in (54) in probabilistic terms as

$$\sum_{l=0}^{4N}\frac{1}{4^{2N}}\binom{4N}{l}u_{e}\left(\frac{4l+t-1}{4(2N+1)}\right)=\mathbb{E}\left[u_{e}\left(\frac{4S_{4N}+t-1}{4(2N+1)}\right)\right],\qquad(56)$$

where $S_{4N}$ is a binomial random variable with parameters $(4N,1/2)$, i.e., $S_{4N}$ is the sum of $4N$ i.i.d. Bernoulli random variables that take the values $0$ and $1$ with equal probability. We use the notation $S_{4N}$ to emphasize the dependency on $N$. To establish (25), we first calculate

$$\begin{aligned}
\lim_{N\to\infty}\mathbb{E}\left[u_{e}\left(\frac{4S_{4N}+t-1}{4(2N+1)}\right)\right]
&=\mathbb{E}\left[\lim_{N\to\infty}u_{e}\left(\frac{4S_{4N}+t-1}{4(2N+1)}\right)\right] &&(57)\\
&=\mathbb{E}\left[u_{e}\left(\lim_{N\to\infty}\frac{4S_{4N}+t-1}{4(2N+1)}\right)\right] &&(58)\\
&=\mathbb{E}[u_{e}(1)] &&(59)\\
&=u_{e}(1). &&(60)
\end{aligned}$$

Here, (57) follows from the dominated convergence theorem [54, Sec. 5.6.(12).(b)], whose application is possible because $u_{e}(x)$ is a bounded function, (58) holds because $u_{e}(x)$ is continuous, and (59) follows from the law of large numbers (see, e.g., [54, Sec. 7.4.(3)]). The proof of (25) is completed by using

$$u_{e}(1)=Q\left(\sqrt{\frac{P/5}{P_{\text{ASE}}+\eta P^{3}}}\right)^{\!e}\qquad(61)$$

and (59) in (54) together with (21)–(23).

The proof of the BER expression in (24) follows steps similar to the ones we presented above.
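The convergence in (57)–(60) is easy to visualize numerically: for growing $N$, the binomial average of $u_{e}$ collapses onto $u_{e}(1)$. A sketch with hypothetical values of $P_{\text{ASE}}$, $\eta$, and $P$, chosen so that $u_{e}$ varies visibly over $[0,2]$:

```python
import numpy as np
from scipy.stats import norm

P_ASE, ETA, P, e, t = 1e-4, 1e-3, 10.0, 1, 5   # demo parameters

def u(x):                                # u_e(x) from (55); Q(.) = norm.sf(.)
    snr = (P / 5) / (P_ASE + (1 + 4 * x)**3 * ETA * (P / 5)**3)
    return norm.sf(np.sqrt(snr))**e

rng = np.random.default_rng(0)
for N in (1, 10, 100, 1000):
    S = rng.binomial(4 * N, 0.5, size=10**5)    # samples of S_{4N}
    avg = np.mean(u((4 * S + t - 1) / (4 * (2 * N + 1))))
    print(N, avg, u(1.0))                # the average approaches u_e(1)
```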

Appendix D Proof of Theorem 3

Consider a sequence of independent symbols $X_{k}=R_{k}e^{\jmath\Phi_{k}}$, $k\in\mathbb{Z}$, where for each $k$ the magnitude $R_{k}$ is independent of the phase $\Phi_{k}$, which is uniform in $[0,2\pi)$. The magnitude $R_{k}$ is distributed according to $f_{R}$ if $k\equiv 0\pmod{2N+1}$ and is otherwise equal to the constant $r_{1}$. Furthermore, $f_{R}$ and $r_{1}$ are chosen so that (35) holds, which guarantees that the average power constraint is satisfied. We will next show that the right-hand side of (36) is the mutual information (in bits per channel use) obtainable with this input distribution. Hence, it is a lower bound on capacity.

We define blocks of length $2N+1$ of transmitted and received symbols as

$$\boldsymbol{Y}_{l}\triangleq\boldsymbol{Y}_{l(2N+1)-N}^{l(2N+1)+N},\qquad
\boldsymbol{X}_{l}\triangleq\boldsymbol{X}_{l(2N+1)-N}^{l(2N+1)+N}$$

for $l\in\mathbb{Z}$. Let us focus for a moment on the received block $\boldsymbol{Y}_{0}$, and let $Y_{k}$ be its $k$th element ($k=-N,\dots,N$). It follows from (7) that the additive noise contribution to $Y_{k}$ depends on the input vector $\boldsymbol{X}_{k-N}^{k+N}$, which may span more than one input block, only through its norm $\|\boldsymbol{X}_{k-N}^{k+N}\|$. By construction, however, all elements of $\boldsymbol{X}_{k-N}^{k+N}$ with the exception of $X_{0}$ have constant magnitude equal to $r_{1}$. Hence,

$$\|\boldsymbol{X}_{k-N}^{k+N}\|^{2}=|X_{0}|^{2}+2Nr_{1}^{2}.\qquad(62)$$

This implies that

$$f_{Y_{k}|\boldsymbol{X}_{k-N}^{k+N}}(y_{k}|\boldsymbol{x}_{k-N}^{k+N})=\frac{1}{\pi\rho(2Nr_{1}^{2}+|x_{0}|^{2})}\exp\left(-\frac{|y_{k}-x_{k}|^{2}}{\rho(2Nr_{1}^{2}+|x_{0}|^{2})}\right).\qquad(63)$$

We see from (63) that each output sample $Y_{k}$ in $\boldsymbol{Y}_{0}$ depends on the input symbols only through $X_{k}$ and $X_{0}$. We then conclude that $\boldsymbol{Y}_{0}$ depends on the whole input sequence only through $\boldsymbol{X}_{0}$. This, together with the assumption of independent input symbols, implies that the output blocks $\{\boldsymbol{Y}_{l}\}$ are independent. Hence, from (34),

$$C\geq\frac{1}{2N+1}I(\boldsymbol{X}_{l};\boldsymbol{Y}_{l})\qquad(64)$$

for an arbitrary $l\in\mathbb{Z}$, say, $l=0$.

Next, we calculate $I(\boldsymbol{X}_{0};\boldsymbol{Y}_{0})$. The mutual information can be decomposed into differential entropies as

$$I(\boldsymbol{X}_{0};\boldsymbol{Y}_{0})=h(\boldsymbol{Y}_{0})-h(\boldsymbol{Y}_{0}|\boldsymbol{X}_{0}),\qquad(65)$$

where

$$\begin{aligned}
h(\boldsymbol{Y}_{0})&=-\mathbb{E}[\log_{2}f_{\boldsymbol{Y}_{0}}(\boldsymbol{Y}_{0})], &&(66)\\
h(\boldsymbol{Y}_{0}|\boldsymbol{X}_{0})&=-\mathbb{E}[\log_{2}f_{\boldsymbol{Y}_{0}|\boldsymbol{X}_{0}}(\boldsymbol{Y}_{0}|\boldsymbol{X}_{0})]. &&(67)
\end{aligned}$$

We start by evaluating (67). Because of (63), the conditional distribution of $\boldsymbol{Y}_{0}$ given $\boldsymbol{X}_{0}$ is the multivariate Gaussian density

$$f_{\boldsymbol{Y}_{0}|\boldsymbol{X}_{0}}(\boldsymbol{y}_{0}|\boldsymbol{x}_{0})=\frac{1}{\bigl(\pi\rho(2Nr_{1}^{2}+|x_{0}|^{2})\bigr)^{2N+1}}\exp\left(-\frac{\|\boldsymbol{y}_{0}-\boldsymbol{x}_{0}\|^{2}}{\rho(2Nr_{1}^{2}+|x_{0}|^{2})}\right).\qquad(68)$$

Using [17, Theorem 8.4.1], we conclude that

$$h(\boldsymbol{Y}_{0}|\boldsymbol{X}_{0})=(2N+1)\,\mathbb{E}\bigl[\log_{2}\bigl(\pi e\,\rho(2Nr_{1}^{2}+|X_{0}|^{2})\bigr)\bigr],\qquad(69)$$

where the expectation is with respect to the random variable $|X_{0}|$, which is distributed according to $f_{R}$.

To evaluate (66), we start by noting that all elements of $\boldsymbol{Y}_{0}$ have uniform phase, because the transmitted symbols and the additive noise samples have uniform phase by assumption. We use this property to simplify (66). Specifically, let $U_{k}=|Y_{k}|^{2}$ and

$$\boldsymbol{U}\triangleq[U_{-N},U_{-N+1},\ldots,U_{N}].\qquad(70)$$

By [55, eq. (320)]

$$h(\boldsymbol{Y}_{0})=(2N+1)\log_{2}(\pi)+h(\boldsymbol{U}).\qquad(71)$$

To evaluate $h(\boldsymbol{U})=-\mathbb{E}[\log_{2}f_{\boldsymbol{U}}(\boldsymbol{U})]$, we first derive the conditional distribution $f_{\boldsymbol{U}|\,|X_{0}|}$ of $\boldsymbol{U}$ given $|X_{0}|$. Note that $U_{k}$ has the same distribution as $\bigl||X_{k}|+\sqrt{\rho(2Nr_{1}^{2}+|X_{0}|^{2})}\,\tilde{Z}_{k}\bigr|^{2}$ (see (1) and (7)). Hence, given $|X_{0}|=r$, the random variables $\{2U_{k}/\rho(2Nr_{1}^{2}+r^{2})\}$ follow a noncentral chi-square distribution with two degrees of freedom and noncentrality parameters $\{2|X_{k}|^{2}/\rho(2Nr_{1}^{2}+r^{2})\}$, where $|X_{k}|=r_{1}$ if $k\neq 0$ and $|X_{k}|=r$ otherwise. Furthermore, these random variables are conditionally independent given $|X_{0}|$. Using the change-of-variable theorem for transformations of random variables, we finally obtain, after algebraic manipulations,

$$\begin{aligned}
f_{\boldsymbol{U}|\,|X_{0}|}(\boldsymbol{u}|r)=
&\frac{\exp\left(-\frac{\sum_{k=-N}^{N}u_{k}+2Nr_{1}^{2}+r^{2}}{\rho(2Nr_{1}^{2}+r^{2})}\right)}{\bigl(\rho(2Nr_{1}^{2}+r^{2})\bigr)^{2N+1}}
\cdot I_{0}\!\left(\frac{2r\sqrt{u_{0}}}{\rho(2Nr_{1}^{2}+r^{2})}\right)\\
&\cdot\prod_{\substack{k=-N\\k\neq 0}}^{N}I_{0}\!\left(\frac{2r_{1}\sqrt{u_{k}}}{\rho(2Nr_{1}^{2}+r^{2})}\right).\qquad(72)
\end{aligned}$$

The probability distribution $f_{\boldsymbol{U}}$, which is given in (37), is obtained from (72) by taking the expectation with respect to $f_{R}$, the probability distribution of $|X_{0}|$. Finally, we obtain the capacity lower bound (36) by substituting (37) into (66) and (69) into (67), by computing the difference between the two resulting differential entropies according to (65), and by dividing by $2N+1$.
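The distributional claim leading to (72) can be verified empirically: with $|X_{0}|=r$ fixed, $2U_{0}/\rho$ should follow a noncentral chi-square law with two degrees of freedom and noncentrality $2r^{2}/\rho$. A sketch with hypothetical values of $r$ and of the variance $\rho$:

```python
import numpy as np
from scipy.stats import ncx2

r, v = 1.2, 0.3                       # v stands in for rho(2N r1^2 + r^2)
rng = np.random.default_rng(0)
z = (rng.normal(size=10**6) + 1j * rng.normal(size=10**6)) * np.sqrt(v / 2)
u0 = np.abs(r + z)**2                 # the k = 0 component of U
# Compare empirical quantiles of 2*U_0/v with the noncentral chi-square law:
qs = [0.1, 0.5, 0.9]
print(np.quantile(2 * u0 / v, qs))
print(ncx2.ppf(qs, df=2, nc=2 * r**2 / v))
```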

References

  • [1] H. Sun, K.-T. Wu, and K. Roberts, “Real-time measurements of a 40 Gb/s coherent system,” Opt. Express, vol. 16, no. 2, pp. 873–879, 2008.
  • [2] K. Roberts, M. O’Sullivan, K.-T. Wu, H. Sun, A. Awadalla, D. J. Krause, and C. Laperle, “Performance of dual-polarization QPSK for optical transport systems,” J. Lightw. Technol., vol. 27, no. 16, pp. 3546–3559, Aug. 2009.
  • [3] A. D. Ellis, J. Zhao, and D. Cotter, “Approaching the non-linear Shannon limit,” J. Lightw. Technol., vol. 28, no. 4, pp. 423–433, Feb. 2010.
  • [4] A. Mecozzi and R.-J. Essiambre, “Nonlinear Shannon limit in pseudolinear coherent systems,” J. Lightw. Technol., vol. 30, no. 12, pp. 2011–2024, June 2012.
  • [5] A. Splett, C. Kurtzke, and K. Petermann, “Ultimate transmission capacity of amplified optical fiber communication systems taking into account fiber nonlinearities,” in Proc. European Conference on Optical Communication (ECOC), Montreux, Switzerland, Sept. 1993.
  • [6] J. Tang, “The channel capacity of a multispan DWDM system employing dispersive nonlinear optical fibers and an ideal coherent optical receiver,” J. Lightw. Technol., vol. 20, no. 7, pp. 1095–1101, July 2002.
  • [7] P. Poggiolini, A. Carena, V. Curri, G. Bosco, and F. Forghieri, “Analytical modeling of nonlinear propagation in uncompensated optical transmission links,” IEEE Photon. Technol. Lett., vol. 23, no. 11, pp. 742–744, June 2011.
  • [8] G. Bosco, P. Poggiolini, A. Carena, V. Curri, and F. Forghieri, “Analytical results on channel capacity in uncompensated optical links with coherent detection,” Opt. Express, vol. 19, no. 26, pp. B440–B449, Dec. 2011.
  • [9] A. Carena, V. Curri, G. Bosco, P. Poggiolini, and F. Forghieri, “Modeling of the impact of nonlinear propagation effects in uncompensated optical coherent transmission links,” J. Lightw. Technol., vol. 30, no. 10, pp. 1524–1539, May 2012.
  • [10] P. Poggiolini, “The GN model of non-linear propagation in uncompensated coherent optical systems,” J. Lightw. Technol., vol. 30, no. 24, pp. 3857–3879, Dec. 2012.
  • [11] L. Beygi, E. Agrell, P. Johannisson, M. Karlsson, and H. Wymeersch, “A discrete-time model for uncompensated single-channel fiber-optical links,” IEEE Trans. Commun., vol. 60, no. 11, pp. 3440–3450, Nov. 2012.
  • [12] P. Johannisson and M. Karlsson, “Perturbation analysis of nonlinear propagation in a strongly dispersive optical communication system,” J. Lightw. Technol., vol. 31, no. 8, pp. 1273–1282, Apr. 2013.
  • [13] E. Grellier and A. Bononi, “Quality parameter for coherent transmissions with Gaussian-distributed nonlinear noise,” Opt. Express, vol. 19, no. 13, pp. 12 781–12 788, June 2011.
  • [14] L. Beygi, N. V. Irukulapati, E. Agrell, P. Johannisson, M. Karlsson, H. Wymeersch, P. Serena, and A. Bononi, “On nonlinearly-induced noise in single-channel optical links with digital backpropagation,” Optics Express, vol. 21, no. 22, pp. 26 376–26 386, Nov. 2013.
  • [15] F. V. O. Rival, C. Simonneau, E. Grellier, A. Bononi, L. Lorcy, J.-C. Antona, and S. Bigo, “On nonlinear distortions of highly dispersive optical coherent systems,” Opt. Express, vol. 20, no. 2, pp. 1022–1032, Jan. 2012.
  • [16] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, July, Oct. 1948.
  • [17] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley, 2006.
  • [18] J. B. Stark, “Fundamental limits of information capacity for optical communications channels,” in Proc. European Conference on Optical Communication (ECOC), Nice, France, Sep. 1999.
  • [19] P. P. Mitra and J. B. Stark, “Nonlinear limits to the information capacity of optical fibre communications,” Nature, vol. 411, pp. 1027–1030, June 2001.
  • [20] K. S. Turitsyn, S. A. Derevyanko, I. V. Yurkevich, and S. K. Turitsyn, “Information capacity of optical fiber channels with zero average dispersion,” Physical Review Letters, vol. 91, no. 20, pp. 203 901 1–4, Nov. 2003.
  • [21] I. B. Djordjevic and B. Vasic, “Achievable information rates for high-speed long-haul optical transmission,” J. Lightw. Technol., vol. 23, no. 11, pp. 3755–3763, Nov. 2005.
  • [22] M. H. Taghavi, G. C. Papen, and P. H. Siegel, “On the multiuser capacity of WDM in a nonlinear optical fiber: Coherent communication,” IEEE Trans. Inf. Theory, vol. 52, no. 11, pp. 5008–5022, Nov. 2006.
  • [23] R.-J. Essiambre, G. Kramer, P. J. Winzer, G. J. Foschini, and B. Goebel, “Capacity limits of optical fiber networks,” J. Lightw. Technol., vol. 28, no. 4, pp. 662–701, Feb. 2010.
  • [24] M. Secondini, E. Forestieri, and G. Prati, “Achievable information rate in nonlinear WDM fiber-optic systems with arbitrary modulation formats and dispersion maps,” J. Lightw. Technol., vol. 31, no. 23, pp. 3839–3852, Dec. 2013.
  • [25] R. Dar, M. Shtaif, and M. Feder, “New bounds on the capacity of the nonlinear fiber-optic channel,” Optics Letters, vol. 39, no. 2, pp. 398–401, Jan. 2014.
  • [26] J. M. Kahn and K.-P. Ho, “Spectral efficiency limits and modulation/detection techniques for DWDM systems,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 10, no. 2, pp. 259–272, Mar./Apr 2004.
  • [27] E. Narimanov and P. Mitra, “The channel capacity of a fiber optics communication system: perturbation theory,” J. Lightw. Technol., vol. 20, no. 3, pp. 530–537, Mar. 2002.
  • [28] L. G. L. Wegener, M. L. Povinelli, A. G. Green, P. P. Mitra, J. B. Stark, and P. B. Littlewood, “The effect of propagation nonlinearities on the information capacity of WDM optical fiber systems: Cross-phase modulation and four-wave mixing,” Physica D: Nonlinear Phenomena, vol. 189, no. 1-2, pp. 81–99, Feb. 2004.
  • [29] R.-J. Essiambre, G. J. Foschini, G. Kramer, and P. J. Winzer, “Capacity limits of information transport in fiber-optic networks,” Physical Review Letters, vol. 101, no. 16, pp. 163 901 1–4, Oct. 2008.
  • [30] T. Freckmann, R.-J. Essiambre, P. J. Winzer, G. J. Foschini, and G. Kramer, “Fiber capacity limits with optimized ring constellations,” IEEE Photon. Technol. Lett., vol. 21, no. 20, pp. 1496–1498, Oct. 2009.
  • [31] I. B. Djordjevic, H. G. Batshon, L. Xu, and T. Wang, “Coded polarization-multiplexed iterative polar modulation (PM-IPM) for beyond 400 Gb/s serial optical transmission,” in Proc. Optical Fiber Communication Conference (OFC), San Diego, CA, Mar. 2010.
  • [32] R. I. Killey and C. Behrens, “Shannon’s theory in nonlinear systems,” Journal of Modern Optics, vol. 58, no. 1, pp. 1–10, Jan. 2011.
  • [33] E. Agrell and M. Karlsson, “Power-efficient modulation formats in coherent transmission systems,” J. Lightw. Technol., vol. 27, no. 22, pp. 5115–5126, Nov. 2009.
  • [34] ——, “WDM channel capacity and its dependence on multichannel adaptation models,” in Proc. Optical Fiber Communication Conference (OFC), Anaheim, CA, Mar. 2013.
  • [35] B. Goebel, R.-J. Essiambre, G. Kramer, P. J. Winzer, and N. Hanik, “Calculation of mutual information for partially coherent Gaussian channels with applications to fiber optics,” IEEE Trans. Inf. Theory, vol. 57, no. 9, pp. 5720–5736, Sep. 2011.
  • [36] G. Bosco, P. Poggiolini, A. Carena, V. Curri, and F. Forghieri, “Analytical results on channel capacity in uncompensated optical links with coherent detection: Erratum,” Opt. Express, vol. 20, no. 17, pp. 19 610–19 611, Aug. 2012.
  • [37] E. Agrell and M. Karlsson, “Satellite constellations: Towards the nonlinear channel capacity,” in Proc. IEEE Photon. Conf. (IPC), Burlingame, CA, Sept. 2012.
  • [38] A. Carena, G. Bosco, V. Curri, P. Poggiolini, M. T. Taiba, and F. Forghieri, “Statistical characterization of PM-QPSK signals after propagation in uncompensated fiber links,” in Proc. European Conference on Optical Communication (ECOC), London, U.K., Sept. 2010.
  • [39] T. Koch, A. Lapidoth, and P. Sotiriadis, “Channels that heat up,” IEEE Trans. Inf. Theory, vol. 55, no. 8, pp. 3594 –3612, Aug. 2009.
  • [40] R. G. Gallager, Information Theory and Reliable Communication. New York, NY: Wiley, 1968.
  • [41] E. Ip and J. M. Kahn, “Digital equalization of chromatic dispersion and polarization mode dispersion,” J. Lightw. Technol., vol. 25, no. 8, pp. 2033–2043, Aug. 2007.
  • [42] E. Agrell, J. Lassing, E. G. Ström, and T. Ottosson, “On the optimality of the binary reflected Gray code,” IEEE Trans. Inf. Theory, vol. 50, no. 12, pp. 3170–3182, Dec. 2004.
  • [43] M. P. Fitz and J. P. Seymour, “On the bit error probability of QAM modulation,” International Journal of Wireless Information Networks, vol. 1, no. 2, pp. 131–139, Apr. 1994.
  • [44] M. K. Simon, S. M. Hinedi, and W. C. Lindsey, Digital Communication Techniques: Signal Design and Detection. Englewood Cliffs, NJ: Prentice-Hall, 1995.
  • [45] A. Mecozzi, “Limits to long-haul coherent transmission set by the Kerr nonlinearity and noise of the in-line amplifiers,” J. Lightw. Technol., vol. 12, no. 11, pp. 1993–2000, Nov. 1994.
  • [46] A. Demir, “Nonlinear phase noise in optical-fiber-communication systems,” J. Lightw. Technol., vol. 25, no. 8, pp. 2002–2032, Aug. 2007.
  • [47] G. P. Agrawal, Fiber-optic communication systems, 4th ed. Wiley, 2010.
  • [48] S. Verdú and T. S. Han, “A general formula for channel capacity,” IEEE Trans. Inf. Theory, vol. 40, no. 4, pp. 1147–1157, July 1994.
  • [49] K.-T. Fang, S. Kotz, and K. W. Ng, Symmetric Multivariate and Related Distributions. Springer, 1990.
  • [50] S. Kotz and S. Nadarajah, Multivariate t Distributions and Their Applications. Cambridge University Press, 2004.
  • [51] N. M. Blachman, “A comparison of the informational capacities of amplitude- and phase-modulation communication systems,” Proceedings of the I.R.E., vol. 41, no. 6, pp. 748–759, June 1953.
  • [52] K.-P. Ho and J. M. Kahn, “Channel capacity of WDM systems using constant-intensity modulation formats,” in Proc. Optical Fiber Communication Conference (OFC), Anaheim, CA, Mar. 2002.
  • [53] E. Agrell, “On monotonic capacity–cost functions,” 2012, preprint. [Online]. Available: http://arxiv.org/abs/1209.2820
  • [54] G. R. Grimmett and D. R. Stirzaker, Probability and Random Processes, 3rd ed. Oxford University Press, 2001.
  • [55] A. Lapidoth and S. M. Moser, “Capacity bounds via duality with applications to multiple-antenna systems on flat-fading channels,” IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2426–2467, Oct. 2003.