Capacity of a Nonlinear Optical Channel
with Finite Memory
Abstract
The channel capacity of a nonlinear, dispersive fiber-optic link is revisited. To this end, the popular Gaussian noise (GN) model is extended with a parameter to account for the finite memory of realistic fiber channels. This finite-memory model is harder to analyze mathematically but, in contrast to previous models, it is valid also for nonstationary or heavy-tailed input signals. For uncoded transmission and standard modulation formats, the new model gives the same results as the regular GN model when the memory of the channel is about 10 symbols or more. These results confirm previous results that the GN model is accurate for uncoded transmission. However, when coding is considered, the results obtained using the finite-memory model are very different from those obtained by previous models, even when the channel memory is large. In particular, the peaky behavior of the channel capacity, which has been reported for numerous nonlinear channel models, appears to be an artifact of applying models derived for independent input in a coded (i.e., dependent) scenario.
Index Terms:
Channel capacity, channel model, fiber-optic communications, Gaussian noise model, nonlinear distortion.

I Introduction
The introduction of coherent optical receivers has brought significant advantages to fiber-optic communications, e.g., enabling efficient polarization demultiplexing, higher-order modulation formats, increased sensitivity, and electrical mitigation of transmission impairments [1, 2]. Even if the linear transmission impairments (such as chromatic and polarization-mode dispersion) can be dealt with electronically, the Kerr nonlinearity in the fiber remains a significant obstacle. Since the nonlinearity causes signal distortions at high signaling powers, arbitrarily high signal-to-noise ratios are inaccessible, which limits transmission over long distances and at high spectral efficiencies. This is sometimes referred to as the “nonlinear Shannon limit” [3, 4].
For systems with large accumulated dispersion and weak nonlinearity, the joint effect of chromatic dispersion and the Kerr effect is similar to that of additive Gaussian noise. This was pointed out already by Splett [5] and Tang [6]. This Gaussian noise is prevalent in links that have no inline dispersion compensation, such as today’s coherent links, where the dispersion compensation takes place electronically in the receiver signal processing. The Gaussian noise approximation has recently been rediscovered and applied to today’s coherent links in a series of papers by Poggiolini et al. [7, 8, 9, 10] and by other groups [11, 12, 13]. The resulting so-called Gaussian noise model, or GN model for short, is valid for multi-channel (wavelength- and polarization-division multiplexed) signals. It has also been shown to work for single-channel and single-polarization transmission if the dispersive decorrelation is large enough [11, 14].
A crucial assumption in the derivation of the GN model is that of independent, identically distributed (i.i.d.) inputs: the transmitted symbols are independent of each other, are drawn from the same constellation, and have the same constellation scaling (the same average transmit power). Under these assumptions, the model has been experimentally verified to be very accurate [15, 9] for the most common modulation formats, such as quadrature amplitude modulation (QAM) or phase-shift keying.
In this paper, the assumption of i.i.d. inputs is, perhaps for the first time in optical channel modeling, relaxed. This is done by introducing a modified GN model, which we call the finite-memory GN model. This new model includes the memory of the channel as a parameter and differs from previous channel models in that it is valid also when the channel input statistics are time-varying, or when “heavy-tailed” constellations are used.
The performance predicted by the regular GN model (both in terms of uncoded error probability and channel capacity) is compared with the ones predicted by the finite-memory GN model. The uncoded performance is characterized in terms of symbol error rate (SER) and bit error rate (BER), assuming i.i.d. data. Exact analytical expressions are obtained for 16-ary QAM (16-QAM), which show that the GN model is accurate for uncoded transmission and standard modulation formats, confirming previous results.
The main contributions of the paper are in terms of coded performance. Shannon, the father of information theory, proved that for a given channel, it is possible to achieve an arbitrarily small error probability, if the transmission rate in bits per symbol is small enough. A rate for which virtually error-free transmission is possible is called an achievable rate, and the supremum over all achievable rates for a given channel, represented as a statistical relation between its input and output, is defined as the channel capacity [16], [17, p. 195]. A capacity-approaching transmission scheme operates in general by grouping the data to be transmitted into blocks, encoding each block into a sequence of coded symbols, modulating and transmitting this sequence over the channel, and decoding the block in the receiver. This coding process introduces, by definition, dependencies among the transmitted symbols, which is the reason why channel models derived for i.i.d. inputs may be questionable for the purpose of capacity analysis.
More fundamentally, the regular GN model is not well suited to capacity analysis, because in this model each output sample depends on the statistics of the previously transmitted input symbols (through their average power) rather than on their actual values. This yields artifacts in capacity analysis. One such artifact is the peaky behavior of the capacity of the GN model as a function of the transmit power. Indeed, through a capacity lower bound, it is shown in this paper that this peaky behavior does not occur for the finite-memory GN model, even when the memory is taken to be arbitrarily large.
The analysis of channel capacity for fiber-optical transmission dates back to 1993 [5], when Splett et al. quantified the impact of nonlinear four-wave mixing on the channel capacity. By applying Shannon’s formula for the additive white Gaussian noise (AWGN) channel capacity to a channel with power-dependent noise, Splett et al. found that there exists an “optimal” finite signal-to-noise ratio that maximizes capacity. Beyond this value, capacity starts decreasing. It was however not motivated in [5] why the noise was assumed Gaussian. Using a different model for four-wave mixing, Stark [18] showed that capacity saturates, but does not decrease, at high power. In the same paper, the capacity loss due to the quantum nature of light was quantified. In 2001, Mitra and Stark [19] considered the capacity in links where cross-phase modulation dominates, proved that the capacity is lower-bounded by the capacity of a linear, Gaussian channel with the same input–output covariance matrix, and evaluated this bound via Shannon’s AWGN formula. The obtained bound vanishes at high input power. It was claimed, without motivation, that the true capacity would have the same qualitative nonmonotonic behavior.
Since 2001, the interest in optical channel capacity has virtually exploded. The zero-dispersion channel was considered by Turitsyn et al. [20]. The joint effect of nonlinearity and dispersion was modeled by Djordjevic et al. [21] as a finite-state machine, which allowed the capacity to be estimated using the Bahl–Cocke–Jelinek–Raviv (BCJR) algorithm. Taghavi et al. [22] considered a fiber-optical multiuser system as a multiple-access channel and characterized its capacity region. In a very detailed tutorial paper, Essiambre et al. [23] applied a channel model based on extensive lookup tables and obtained capacity lower bounds for a variety of scenarios. Secondini et al. [24] obtained lower bounds using the theory of mismatched decoding. Recently, Dar et al. [25] modeled the nonlinear phase noise as being blockwise constant for a certain number of symbols, which is a channel with finite memory, obtaining improved capacity bounds.
Detailed literature reviews are provided in [26] for the early results, and in [23] for more recent results. Other capacity estimates, or lower bounds thereon, were reported for various nonlinear transmission scenarios in, e.g., [27, 28, 29, 30, 31, 32, 8, 4]. Most of these estimates or bounds decrease to zero as the power increases.
This paper is organized as follows. In Sec. II, the GN model is reviewed and the finite-memory GN model is introduced. In Sec. III, the uncoded error performance of the new finite-memory model is studied. The channel capacity is studied in Sec. IV and conclusions are drawn in Sec. V. The mathematical proofs are relegated to appendices.
Notation: Throughout this paper, vectors are denoted by boldface letters $\boldsymbol{x}$ and sets by calligraphic letters $\mathcal{X}$. Random variables are denoted by uppercase letters $X$ and their (deterministic) outcomes by the same letter in lowercase $x$. Probability density functions (PDFs) and conditional PDFs are denoted by $f_X(x)$ and $f_{Y|X}(y|x)$, respectively. Analogously, probability mass functions (PMFs) are denoted by $P_X(x)$ and $P_{Y|X}(y|x)$. Expectations are denoted by $\mathbb{E}[\cdot]$ and random sequences by $\{X_k\}$.
II Channel Modeling: Finite and Infinite Memory
In this section, we will begin with a high-level description of the nonlinear interference in optical dual-polarization wavelength-division multiplexing (WDM) systems, highlighting the role of the channel memory, and thereafter in Sec. II-B–II-D describe in detail the channel models considered in this paper.
II-A Nonlinear Interference in Optical Channels
A coherent optical communication link converts a discrete, complex-valued electric data signal to a modulated, continuous optical signal, which is transmitted through an optical fiber, received coherently, and then converted back to a discrete output sequence . The coherent link is particularly simple theoretically, in that the transmitter and receiver directly map the electric data to the optical field, which is a linear operation (in contrast with, e.g., direct-detection receivers), and can ideally be performed without distortions. The channel is then well described by the propagation of the (continuous) optical field in the fiber link. It should be emphasized that this assumes the coherent receiver to be ideal, with perfect synchronization and negligible phase noise. Experiments have shown [2] that commercial coherent receivers can indeed perform well enough for the fiber propagation effects to be the main limitations. Two main linear propagation effects in the fiber need to be addressed: dispersion and attenuation. The attenuation effects can be overcome by periodic optical amplification, at the expense of additive Gaussian noise from the inline amplifiers. The dispersion effects are usually equalized electronically by a filter in the coherent receiver. Such a linear optical link can be well-described by an AWGN channel, the capacity of which is unbounded with the signal power.
However, the fiber Kerr-nonlinearity introduces signal distortions, and greatly complicates the transmission modeling. The nonlinear signal propagation in the fiber is described by a nonlinear partial differential equation, the nonlinear Schrödinger equation (NLSE), which includes dispersion, attenuation, and nonlinearity. At high power levels, the three effects can no longer be conveniently separated. However, in contemporary coherent links (distance at least and symbol rate at least ), the nonlinearity is significantly weaker than the other two effects, and a perturbation approach can be successfully applied to the NLSE [5, 10, 11, 12]. This leads to the GN model, which will be described in Sec. II-C.
II-B Finite Memory
Even today’s highly dispersive optical links have a finite memory. For example, a signal with dispersive length $L_D \approx 1/(|\beta_2| B^2)$, where $\beta_2$ is the group velocity dispersion and $B$ the optical bandwidth, broadens (temporally) by a factor of approximately $L/L_D$ over a fiber of length $L$. With typical dispersion lengths of –, this broadening factor can correspond to hundreds to thousands of adjacent symbols, a large but finite number. The same will hold for interaction among WDM channels; if one interprets $B$ as the channel separation, $L/L_D$ gives an approximation of the number of symbols by which two WDM channels separate due to walk-off (and hence interact with nonlinearly during transmission). The channel memory will thus be even larger in the WDM case, and increase with channel separation, but the nonlinear interaction will decrease due to the shorter $L_D$. Thus, the principle of a finite channel memory holds also for WDM signals. To keep the notation as simple as possible, we will consider a single, scalar, wavelength channel in this paper. Extensions to dual polarizations and WDM are possible, but would involve obscuring complications such as a four-dimensional constellation space [33] in the former case and behavioral models [34] in the latter. We can thus say that in an optical link, a given symbol senses the interference from a finite number of neighboring symbols, which is the physical reason for introducing a finite-memory model.
If we let the range of the interfering symbols go to infinity, an even simpler type of model is obtained. The interference is now averaged over infinitely many transmitted symbols. Assuming that an i.i.d. sequence is transmitted, this time average converges to a statistical average, which greatly simplifies the analysis. All models suggested for dispersive optical channels so far belong to this category [5, 23, 35, 10, 4, 11, 12, 14], of which the GN model described in Sec. II-C is the most common.
For a given transmitted complex symbol $X_k$, the (complex) single-channel output $Y_k$ at each discrete time $k$ is modeled as

$Y_k = X_k + Z_k$ (1)

where $\{Z_k\}$ is a circularly symmetric, complex, white, Gaussian random sequence, independent of $\{X_k\}$. In (1), $Z_k$ is assumed to be independent of the actual transmitted sequence $\{X_k\}$. However, the variance of $Z_k$ depends on the transmit power, as detailed in Sec. II-C and II-D.
II-C The Regular GN Model
For coherent long-haul fiber-optical links without dispersion compensation, Splett et al. [5], Poggiolini et al. [7], and Beygi et al. [11] have all derived models where the nonlinear interference (NLI) appears as Gaussian noise, whose statistics depend on the transmitted signal power via a cubic relationship. The models assume that the transmitted symbols $X_k$ in time slot $k$ are i.i.d. realizations of the same complex random variable $X$. In this model, the additive noise $Z_k$ in (1) is given by

$Z_k = N_k \sqrt{P_{\mathrm{ASE}} + \eta P^3}$ (2)

where $N_k$ are i.i.d. zero-mean, unit-variance, circularly symmetric complex Gaussian random variables, $P_{\mathrm{ASE}}$ and $\eta$ are real, nonnegative constants, and $P$ is the average transmit power. Therefore, the noise is distributed as $Z_k \sim \mathcal{CN}(0, P_{\mathrm{ASE}} + \eta P^3)$, where $\mathcal{CN}(\mu, \sigma^2)$ denotes a circularly symmetric complex Gaussian random variable with mean $\mu$ and variance $\sigma^2$. The parameter $P$, which is a property of the transmitter, governs the behavior of the channel model. It can be intuitively understood as a long-term average of the input power. Mathematically,

$P = \mathbb{E}[|X|^2] = \lim_{M \to \infty} \frac{1}{2M+1} \sum_{i=k-M}^{k+M} |X_i|^2$ (3)

for any given $k$, still assuming i.i.d. symbols $X_i$. For this reason, we will refer to models that depend on infinitely many past and/or future symbols, via $P$ in (3) or in some other way, as infinite-memory models.
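The convergence in (3) is easy to check numerically. The sketch below (our code, with illustrative parameter values) draws i.i.d. circularly symmetric Gaussian symbols of statistical power $P$ and verifies that the windowed empirical average approaches $P$ for a large window:

```python
import numpy as np

rng = np.random.default_rng(0)

P = 2.0        # statistical power E[|X|^2]
M = 100_000    # half-window; the window holds 2M + 1 symbols

# i.i.d. circularly symmetric complex Gaussian symbols with E[|X|^2] = P
x = np.sqrt(P / 2) * (rng.standard_normal(2 * M + 1)
                      + 1j * rng.standard_normal(2 * M + 1))

# empirical power over the window, as in (3)
p_emp = np.mean(np.abs(x) ** 2)

# law of large numbers: p_emp approaches P as M grows
```

For i.i.d. inputs the deviation shrinks as $1/\sqrt{2M+1}$, which is why all infinite-memory models can replace the empirical power by the constant $P$.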
The cubic relation in (2) between the transmit power and the additive noise variance is a consequence of the Kerr nonlinearity, and holds for both lumped and distributed amplification schemes. The constant $P_{\mathrm{ASE}}$ represents the total amplified spontaneous emission (ASE) noise of the optical amplifiers for the channel under study, while $\eta$ quantifies the NLI. Several related expressions for this coefficient have been proposed. For example, for distributed amplification and WDM signaling over the length $L$,
(4)
(5)
were proposed in [5] and [36], respectively, where $\gamma$ is the fiber nonlinear coefficient, $B$ is the total WDM bandwidth, and $R_s$ is the symbol rate. The expressions in (4) and (5) are qualitatively similar. For dual-polarization, single-channel transmission over lumped amplifier spans, the expression
(6)
was proposed in [14], and a qualitatively similar formula can be obtained from the results in [10]. Here, $\alpha$ is the attenuation coefficient of the fiber, and the coefficient $\epsilon$ is between 0 and 1 (see [10, 14]), depending on how well the nonlinear interference decorrelates between amplifier spans. For single-polarization transmission, the coefficient in (6) should be replaced by the corresponding value given in [11].
The benefits of the GN model are that it is very accurate for uncoded transmission with traditional modulation formats (the model is not valid for exotic modulation formats such as satellite constellations [37]), as demonstrated in experiments and simulations [38, 15, 9], and that it is very simple to analyze. It is, however, not intended for nonstationary input sequences, i.e., sequences whose statistics vary with time, because the transmit power $P$ in (2) is defined as the (constant) power of the random variable $X$ that generates the i.i.d. symbols $X_k$. In order to capture the behavior of a wider class of transmission schemes, the GN model can be modified to depend on a time-varying transmit power, which is the topic of the next section.
II-D The Finite-Memory GN Model
As mentioned in Sec. I and II-C, a finite-memory model is essential in order to model the channel output corresponding to time-varying input distributions. Therefore, we refine the GN model in Sec. II-C to make it explicitly dependent on the channel memory $M$, in such a way that the model “converges” to the regular GN model as $M \to \infty$. Many such models can be formulated. In this paper, we aim for simplicity rather than accuracy.
The proposed model assumes that the input–output relation is still given by (1), but the average transmit power $P$ in (2) is replaced by an empirical power, i.e., by the arithmetic average of the squared magnitudes of the symbol $X_k$ and of the symbols around it. Mathematically, (2) is replaced by

$Z_k = N_k \sqrt{P_{\mathrm{ASE}} + \eta \left( \frac{1}{2M+1} \sum_{i=k-M}^{k+M} |X_i|^2 \right)^{3}}$ (7)

for any $k$, where $M$ is the (one-sided) channel memory. We refer to (1) and (7) as the finite-memory GN model. Since (second-order) group velocity dispersion causes symmetric broadening with respect to the transit time of the signal, inter-symbol interference from dispersion will act both backwards and forwards in terms of the symbol index. This is why both past and future inputs contribute to the noise power in (7). A somewhat related model for the additive noise in the context of data transmission in electronic circuits was recently proposed in [39], where the memory is single-sided and the noise scales linearly with the signal power, not cubically as in (7).
Having introduced the finite-memory GN model, we now discuss some particular cases. First, the memoryless AWGN channel model can be obtained from both the GN and finite-memory GN models by setting $\eta = 0$. In this case, the noise variance is $P_{\mathrm{ASE}}$ for all $k$. Second, let us consider the scenario where the transmitted symbols form a random process $\{X_k\}$. Then the empirical power at any discrete time $k$ is a random variable that depends on the magnitudes of the $k$th symbol and the $2M$ symbols around it. In the limit $M \to \infty$, this empirical power converges to the “statistical” power $P$ in (3), for any i.i.d. process with power $P$, as mentioned in Sec. II-C. This observation shows that the proposed finite-memory model in (7) “converges” to the GN model in (2), provided that the channel memory is sufficiently large and that the process $\{X_k\}$ consists of i.i.d. symbols with zero mean and variance $P$.
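As a concrete illustration, the finite-memory channel (1), (7) can be simulated with a sliding-window average. The sketch below is ours (function name and parameter values are illustrative, not taken from Table I); for a constant-modulus input such as QPSK, every window sees the same power, so the model reduces exactly to the regular GN model:

```python
import numpy as np

def finite_memory_gn(x, M, p_ase, eta, rng):
    """Finite-memory GN model, eqs. (1) and (7): additive complex Gaussian
    noise whose variance is p_ase + eta * (windowed empirical power)^3."""
    win = np.ones(2 * M + 1)
    # sliding average of |x|^2; edge windows are renormalized to their length
    p_emp = (np.convolve(np.abs(x) ** 2, win, mode="same")
             / np.convolve(np.ones(len(x)), win, mode="same"))
    sigma2 = p_ase + eta * p_emp ** 3
    z = np.sqrt(sigma2 / 2) * (rng.standard_normal(len(x))
                               + 1j * rng.standard_normal(len(x)))
    return x + z, sigma2

rng = np.random.default_rng(1)
P = 1.0  # transmit power (illustrative)
qpsk = np.sqrt(P / 2) * (rng.choice([-1.0, 1.0], 1000)
                         + 1j * rng.choice([-1.0, 1.0], 1000))
y, sigma2 = finite_memory_gn(qpsk, M=10, p_ase=1e-3, eta=1e-2, rng=rng)
# constant-modulus input: every window sees power P, so the noise variance
# equals p_ase + eta * P**3 at every time instant, exactly as in (2)
```

For non-constant-modulus or nonstationary inputs, `sigma2` varies with time, which is the defining difference from the regular GN model.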
The purpose of the finite-memory model is to be able to predict the output of the channel when the transmitted symbols are not i.i.d. This is the case, for example, when the transmitted symbols form a nonstationary process (as will be exemplified in Sec. II-E) and also for coded sequences (which we discuss in Sec. IV). An advantage of the finite-memory model, from a theoretical viewpoint, is that the input–output relation of the channel is modeled as a fixed conditional probability of the output given the input and its history, which has been the common notion of a channel model in communication and information theory ever since the work of Shannon [16], [40, p. 74]. This is in contrast to the regular GN model and other channel models, whose conditional distributions change depending on which transmitter the channel is connected to. Specifically, the GN model is represented by a family of such conditional distributions, one for each value of the transmitter parameter $P$.
A drawback of the proposed finite-memory model is that it is more complex than the GN model. Also, our model is not accurate for small values of $M$, since the GN assumption relies on the central limit theorem [7, 11, 12]. Furthermore, we assumed that all the symbols around the symbol $X_k$ affect the noise variance equally. In practice, this is not the case. We nevertheless use the proposed model in this paper because it is relatively easy to analyze (see Sec. III and IV) and because even this simple finite-memory model captures the qualitative effects caused by non-i.i.d. symbols, which is essential for the capacity analysis in Sec. IV.
II-E Numerical Comparison
Before analyzing the finite-memory GN model, we first quantify the chromatic dispersion of the optical fiber. To this end, we simulated the transmission of a single symbol pulse over a single-channel, single-polarization fiber link without dispersion compensation. Ten amplifier spans over a total distance of are simulated using the lossy NLSE model. We used a raised-cosine pulse with peak power and a duration of at half the maximum amplitude, which corresponds to half the symbol slot in a transmission system. The result is illustrated in Fig. 1. At this low power, the nonlinear effects are almost negligible. For clarity of illustration, the ASE noise was neglected by setting $P_{\mathrm{ASE}} = 0$. The remaining system parameters are given in Table I and will be used throughout the paper, except when other values are explicitly stated. As we can see, the pulse broadens as it propagates along the fiber, having a width corresponding to about data symbols after of transmission, or a half-width of symbols. This is in good agreement with the relation for the symbol memory used in [41, p. 2037].
Symbol | Value | Meaning
---|---|---
$\alpha$ | | Fiber attenuation
$\beta_2$ | | Group velocity dispersion
$\gamma$ | | Fiber nonlinear coefficient
$N_s$ | | Number of amplifier spans
$L$ | | System length
$R_s$ | | Symbol rate
$P_{\mathrm{ASE}}$ | | Total ASE noise
$\eta$ | | NLI coefficient
Next, to validate the behavior of the finite-memory model with nonstationary input symbol sequences, we simulated the transmission of independent quadrature phase-shift keying (QPSK) data symbols with a time-varying magnitude, over the same km fiber link, at . The transmitted sequence consists of symbols with of average signal power, 128 symbols at power, 128 symbols at , and so on. The statistical power is then . The chosen pulse shape is a raised-cosine return-to-zero pulse. In Fig. 2, we show the amplitude of the transmitted symbols (red) and received symbols (blue) with three different models: the NLSE, the finite-memory GN model with , and the regular GN model. In the middle and lower plots of Fig. 2, we used the NLI coefficient , which was calculated from (6), using for simplicity. Also in Fig. 2, we used to better illustrate the properties of the nonlinear models.
As can be seen, the agreement between the NLSE simulations and the finite-memory model is quite reasonable, but the GN model cannot capture the nonstationary dynamics. The results in Fig. 2 also show that the noise variance in the NLSE simulation is low around the symbols with low input power and high around the symbols with high input power. This behavior is captured by the finite-memory GN model but not by the regular GN model, for which the variance of the noise is the same at every time instant. This illustrates that the GN model (2) should be avoided for nonstationary symbol sequences such as the ones used in Fig. 2. This is not surprising, as the model was derived under an i.i.d. assumption. In Sec. IV, we will return to this observation when analyzing coded transmission. We believe that the finite-memory GN model proposed here, albeit idealized, is the first model that is able to deal with nonstationary symbol sequences.
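The effect shown in Fig. 2 can be reproduced qualitatively with a few lines of code. The sketch below is ours (block lengths and coefficient values are illustrative). Because QPSK has constant modulus, the windowed power, and hence the model's noise variance, simply tracks the local transmit power:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 50                    # one-sided channel memory
p_ase, eta = 1e-3, 1e-2   # illustrative, not the values of Table I

# 128-symbol blocks alternating between a low and a high power level
power = np.concatenate([np.full(128, p) for p in (0.5, 2.0) * 4])
qpsk = (rng.choice([-1.0, 1.0], power.size)
        + 1j * rng.choice([-1.0, 1.0], power.size)) / np.sqrt(2)
x = np.sqrt(power) * qpsk

# windowed empirical power as in (7), edges renormalized
win = np.ones(2 * M + 1)
p_emp = (np.convolve(np.abs(x) ** 2, win, "same")
         / np.convolve(np.ones(x.size), win, "same"))
sigma2 = p_ase + eta * p_emp ** 3

# mid-block, the window lies inside a single block, so the variance equals
# the GN value for that block's power; near block edges it transitions
```

In the middle of each high-power block the variance is larger than in the low-power blocks, which is exactly the behavior that the regular GN model, with its single constant variance, cannot reproduce.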
III Uncoded Error Probability
We assume that the transmitted symbols are independently drawn from a discrete constellation $\mathcal{X} = \{x_1, \ldots, x_m\}$, where $m = |\mathcal{X}|$. The symbols are assumed to be selected with equal probability, and thus, the average transmit (statistical) power is given by

$P = \frac{1}{m} \sum_{j=1}^{m} |x_j|^2.$ (8)
For each time instant $k$, we denote the sequence of the symbols transmitted around $X_k$ by

$\boldsymbol{X}_k = (X_{k-M}, \ldots, X_{k-1}, X_{k+1}, \ldots, X_{k+M})$ (9)
where the notation emphasizes that $\boldsymbol{X}_k$ is a random vector describing the channel memory at time instant $k$. For future use, we define the function

$\sigma^2(p) \triangleq P_{\mathrm{ASE}} + \eta p^3.$ (10)
For a given sequence of symbols $\boldsymbol{x}_k$ and a given transmitted symbol $x_k$, the conditional variance of the additive noise in (7) can be expressed as an explicit function of $(x_k, \boldsymbol{x}_k)$ using (10), i.e.,

$\sigma^2\!\left( \frac{|x_k|^2 + \|\boldsymbol{x}_k\|^2}{2M+1} \right)$ (11)

where $\|\boldsymbol{x}_k\|$ denotes the Euclidean norm of $\boldsymbol{x}_k$. For a given transmitted symbol $x_k$ and a given sequence $\boldsymbol{x}_k$, the channel law for the finite-memory model is

$f_{Y_k \mid X_k, \boldsymbol{X}_k}(y_k \mid x_k, \boldsymbol{x}_k) = \frac{1}{\pi \sigma_k^2} \exp\!\left( -\frac{|y_k - x_k|^2}{\sigma_k^2} \right)$ (12)

where $\sigma_k^2$ denotes the conditional noise variance in (11).
III-A Error Probability Analysis
We consider the equally spaced 16-QAM constellation shown in Fig. 3. In this case, $m = 16$, the minimum Euclidean distance (MED) of the constellation is , and the statistical power is . The binary labeling is the binary reflected Gray code (BRGC) [42], where the first two bits determine the in-phase (real) component of a symbol and the last two bits determine its quadrature (imaginary) component. This is shown with colors in Fig. 3.
The maximum-likelihood (ML) symbol-by-symbol detection rule for a given sequence $\boldsymbol{x}_k$ chooses the symbol that maximizes the channel law in (12). The decision made by this detector can be expressed as

$\hat{x}_k = \operatorname*{argmin}_{x \in \mathcal{X}} \left[ \frac{|y_k - x|^2}{\sigma_x^2} + \log\left( \pi \sigma_x^2 \right) \right], \quad \sigma_x^2 = \sigma^2\!\left( \frac{|x|^2 + \|\boldsymbol{x}_k\|^2}{2M+1} \right)$ (13)

which shows that, due to the dependency of $\sigma_x^2$ on $x$, this detector is not an MED detector. For simplicity, however, we disregard this dependency and study the MED detector, which chooses the symbol closest, in Euclidean distance, to the channel output $y_k$. Thus

$\hat{x}_k = x_j \quad \text{if } y_k \in \mathcal{V}_j$ (14)

where $\mathcal{V}_j$ denotes the decision region, or Voronoi region, of $x_j$.
Remark 1
Remark 2
The ML symbol-by-symbol detector in (13) is suboptimal, i.e., better detectors can be devised. For example, one could design a detector that uses not only the current received symbol, but also the next $M$ received symbols. Since the current transmitted symbol will affect the noise of the next $M$ received symbols, this information could be taken into account to make a better decision on the current symbol. In this paper, however, we focus on the MED detector in (14) because of its simplicity.
The following two theorems give closed-form expressions for the BER and SER for the constellation in Fig. 3 when used over the finite-memory GN model.
Theorem 1
For the finite-memory GN model with arbitrary memory $M$, the BER of the MED detector for the 16-QAM constellation in Fig. 3 is given by
(15)
where
(16)
(17)
(18)
and where
(19)
Proof:
See Appendix A. ∎
Theorem 2
Proof:
See Appendix B. ∎
Corollary 1
The BER and SER for the finite-memory GN model in the limit $M \to \infty$ are
(24)
(25)
Proof:
See Appendix C. ∎
The other extreme case to consider is the memoryless AWGN channel. The BER and SER expressions in this case are given in the following corollary.
Corollary 2
The BER and SER for the memoryless AWGN channel are given by
(26)
(27)
The results in Corollaries 1 and 2 correspond to the well-known expressions for the BER and SER for the AWGN channel. In particular, (26) can be found in [43, eq. (10)], [44, eq. (10.36a)] and (27) in [44, eq. (10.32)]. Also, the results in Corollary 2 together with (2) show that the BER and SER for the finite-memory GN model, as $M \to \infty$, converge to the BER and SER for the regular GN model.
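For reference, AWGN expressions of the kind given in Corollary 2 can be evaluated as follows. This is our sketch of the standard Gray-labeled 16-QAM results of the type found in [44]; `snr` denotes the symbol SNR $E_s/N_0$ in linear units:

```python
from math import erfc, sqrt

def qfunc(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def ser_16qam(snr):
    """SER of 16-QAM over AWGN; snr = Es/N0 (linear)."""
    p = 1.5 * qfunc(sqrt(snr / 5.0))  # per-quadrature 4-PAM symbol error
    return 1.0 - (1.0 - p) ** 2

def ber_16qam(snr):
    """BER of Gray-labeled 16-QAM over AWGN; snr = Es/N0 (linear)."""
    a = sqrt(snr / 5.0)
    return 0.25 * (3 * qfunc(a) + 2 * qfunc(3 * a) - qfunc(5 * a))
```

Substituting the effective SNR $P/(P_{\mathrm{ASE}} + \eta P^3)$ of the GN model for `snr` should reproduce the large-$M$ limit discussed above.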
III-B Numerical Results
We consider the same scenario as in Sec. II-E, with parameters according to Table I. The BER and SER for the 16-QAM constellation in Fig. 3 given by Theorems 1 and 2 are shown in Fig. 4 for different values of $M$. Fig. 4 also shows the asymptotic cases $M \to \infty$ and the memoryless AWGN channel given by Corollaries 1 and 2, respectively. Furthermore, results obtained via computer simulations of (1) and (7) are included using the ML detector in (13), marked with squares, and the MED detector in (14), marked with circles. As expected, the MED detector yields a perfect match with the analytical expressions, whereas the ML detector deviates slightly for small channel memories.
The results in Fig. 4 show that in the low-input-power regime, the memory of the channel plays no role for the BER and SER, and all curves closely follow the BER and SER of a memoryless AWGN channel. However, as the power $P$ increases, the memory kicks in, causing the BER and SER for finite $M$ to have a minimum, and then to increase as $P$ increases further. Physically, this can be explained as follows: in the low-power regime, the BER is limited by the ASE noise, which is independent of the memory depth. In the high-power regime, the Kerr-induced noise dominates, resulting in a BER that increases with power. Similar behavior has been reported in most experiments and simulations on nonlinearly limited links, e.g., [45, 46, 9, 11], [47, Ch. 9]. The reason why the performance improves slightly with the memory depth is the nonlinear scaling of the Kerr-induced noise. For small $M$, sequences of two or more high-amplitude symbols will receive high noise power and dominate the average BER. For higher $M$, longer (and less probable) sequences of high-amplitude symbols are required to receive the same, high, noise power. Thus, on average, the performance improves with $M$, up to the limit given by the GN model.
The results in Fig. 4 also show how the finite-memory model approaches the GN model in the high-input-power regime. For $M \gtrsim 10$, the two models yield very similar BER and SER curves.
IV Channel Capacity
In this section, some fundamentals of information theory are first reviewed. Then a lower bound on the capacity of the finite-memory GN model is derived and evaluated numerically.
IV-A Preliminaries
Fig. 5 shows a generic coded communication system where a message $W$ is mapped to a codeword $\boldsymbol{X} = (X_1, \ldots, X_n)$. This codeword is then used to modulate a continuous-time waveform, which is transmitted through the physical channel. At the receiver side, the continuous-time waveform is processed (filtered, equalized, synchronized, matched-filtered, sampled, etc.), resulting in a discrete-time observation $\boldsymbol{Y} = (Y_1, \ldots, Y_n)$, which is a noisy version of the transmitted codeword $\boldsymbol{X}$. The decoder uses $\boldsymbol{Y}$ to estimate the transmitted message $W$.
When designing a coded communication system, the first step is to choose the set of codewords (i.e., the codebook) that will be transmitted through the channel. Once the codebook has been chosen, the mapping rule between messages and codewords should be chosen, which fully determines the encoding procedure. At the receiver side, the decoder block will use the mapping rule used at the transmitter (as well as the channel characteristics) to produce an estimate $\hat{W}$ of the message $W$. The triplet of codebook, encoder, and decoder forms a so-called coding scheme. Practical coding schemes are designed to minimize the probability that $\hat{W}$ differs from $W$, while at the same time keeping the complexity of both encoder and decoder low.
Channel capacity is the largest transmission rate at which reliable communication can occur. More formally, let a coding scheme with block length $n$ and error probability $\epsilon$ consist of:

• An encoder that maps a message $W$ into a block $(X_1, \ldots, X_n)$ of transmitted symbols satisfying a per-codeword power constraint

$\frac{1}{n} \sum_{k=1}^{n} |X_k|^2 \le P.$ (28)

• A decoder that maps the corresponding block of received symbols $(Y_1, \ldots, Y_n)$ into a message $\hat{W}$ so that the average error probability, i.e., the probability that $\hat{W}$ differs from $W$, does not exceed $\epsilon$.
Observe that $P$ is here defined differently from the previous sections. It still represents the average transmit power, but while this quantity was interpreted in Sec. II–III in a statistical sense, as the mean of an i.i.d. random variable, it is in this section the exact power of every codeword.
The maximum coding rate (measured in bit/symbol) for a given block length and error probability is defined as the largest ratio for which an coding scheme exists. The channel capacity is the largest coding rate for which a coding scheme with vanishing error probability exists, in the limit of large block length,
(29)
IV-B Memoryless Channels
By Shannon’s channel coding theorem, the channel capacity of a discrete-time memoryless channel, in bit/symbol, can be calculated as [16], [17, Ch. 7]
(30)
where is the mutual information (MI)
(31)
and the maximization in (30) is over all probability distributions that satisfy , for a given channel .
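For a discrete memoryless channel, the maximization in (30) can be carried out numerically with the standard Blahut–Arimoto algorithm. The following sketch (not from the paper) runs it on a binary symmetric channel, whose capacity has the known closed form 1 − H₂(ε); the channel matrix and crossover probability are illustrative choices.

```python
import numpy as np

# Blahut-Arimoto iteration for the maximization in (30) over input
# distributions of a discrete memoryless channel.  The binary symmetric
# channel below is an illustrative choice, not a channel from the paper.
def blahut_arimoto(W, iters=200):
    """W[x, y] = p(y|x).  Returns the channel capacity in bit/symbol."""
    m = W.shape[0]
    p = np.full(m, 1.0 / m)                 # start from the uniform input
    for _ in range(iters):
        q = p[:, None] * W                  # joint p(x) p(y|x)
        q /= q.sum(axis=0, keepdims=True)   # posterior q(x|y)
        c = np.exp(np.sum(W * np.log(q / p[:, None]), axis=1))
        p = p * c / np.sum(p * c)           # reweight the input distribution
    return np.log2(np.sum(p * c))

eps = 0.1                                   # crossover probability
W = np.array([[1 - eps, eps], [eps, 1 - eps]])
print(blahut_arimoto(W))                    # approx 1 - H2(0.1) = 0.531
```

For the symmetric channel above, the uniform input is already optimal, so the iteration converges immediately; for asymmetric channels the reweighting step does the work.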
Roughly speaking, a transmission scheme that operates at an arbitrary rate can be designed by creating a codebook of codewords of length , whose elements are i.i.d. random samples from the distribution that maximizes the mutual information in (30). This codebook is stored in both the encoder and decoder. During transmission, the encoder maps each message into a unique codeword , and the decoder identifies the codeword that is most similar, in some sense, to the received vector . An arbitrarily small error probability can be achieved by choosing large enough. This random coding paradigm was proposed already by Shannon [16]. In practice, however, randomly constructed codebooks are usually avoided for complexity reasons.
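The random-coding construction described above can be sketched as a toy Monte Carlo experiment. The block length, codebook size cap, and noise variance below are illustrative assumptions, and the per-codeword power constraint is only met on average in this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not from the paper): rate R, block length n.
n, R = 128, 0.5
M = min(int(2 ** (R * n)), 4096)   # cap the codebook size for this toy example
P, sigma2 = 1.0, 0.1               # average power and noise variance

# Random Gaussian codebook, stored at both encoder and decoder.
codebook = rng.normal(scale=np.sqrt(P), size=(M, n))

# Encoder: map the message w to its codeword x^n.
w = rng.integers(M)
x = codebook[w]

# Channel: y^n = x^n + z^n with i.i.d. Gaussian noise.
y = x + rng.normal(scale=np.sqrt(sigma2), size=n)

# Decoder: pick the codeword closest to y in Euclidean distance.
w_hat = int(np.argmin(np.sum((codebook - y) ** 2, axis=1)))
print(w_hat == w)
```

At this block length and noise level the nearest-codeword decoder recovers the message with overwhelming probability, illustrating why the error probability vanishes as the block length grows.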
Since the additive noise in (2) is statistically independent of , the channel capacity of the GN model (2) can be calculated exactly as [5, 8]
(32)
using Shannon’s well-known capacity expression [16, Sec. 24], [17, Ch. 9]. The capacity in (32) can be achieved by choosing the codewords to be drawn independently from a Gaussian distribution .
Considered as a function of the transmitted signal power , the capacity in (32) has the peculiar behavior of reaching a peak and eventually decreasing to zero at high enough power, since the denominator of (32) increases faster than the numerator. This phenomenon, sometimes called the “nonlinear Shannon limit” in the optical communications community, conveys the message that reliable communication over nonlinear optical channels becomes impossible at high powers. In the following sections, we shall question this pessimistic conclusion.
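The peaky behavior can be reproduced with a few lines of code. The sketch below assumes the common GN form of the total noise variance (a linear ASE term plus a nonlinear-interference term cubic in the power); the numerical constants are arbitrary illustrative values, not the paper's system parameters.

```python
import numpy as np

# GN-model capacity of the form C = log2(1 + P / (P_ASE + eta * P^3)),
# where the cubic term models nonlinear interference.  P_ASE and eta are
# illustrative values in arbitrary units.
P_ASE, eta = 1e-3, 1e-2
P = np.logspace(-3, 2, 501)        # transmit power sweep
C = np.log2(1.0 + P / (P_ASE + eta * P ** 3))

i = int(np.argmax(C))
print(f"peak capacity {C[i]:.2f} bit/symbol at P = {P[i]:.3g}")
print(f"capacity at highest power: {C[-1]:.4f} bit/symbol")
```

Since the denominator grows as P³ while the numerator grows as P, the effective SNR, and hence the capacity expression, decays to zero at high power, producing the characteristic peak.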
IV-C Channels with Memory
The capacity of channels with memory is, under certain assumptions on information stability [48, Sec. I],
(33)
where is defined as a multidimensional integral analogous to (31), and the maximization is over all joint distributions of satisfying . In this context, it is worth emphasizing that the maximization in (33) includes sequences that are not i.i.d. Hence, in order to calculate the channel capacity of a transmission link, it is essential that the employed channel model allows non-i.i.d. inputs.
An exact expression for the channel capacity of the finite-memory GN model (7) is not available. Shannon’s formula, which leads to (32), does not apply here, because the sequences and , where was defined in (7), are dependent. A capacity estimation via (33) is numerically infeasible, since it involves integration and maximization over high-dimensional spaces. We therefore turn our attention to bounds on the capacity for the finite-memory model. Every joint distribution of satisfying gives us a lower bound on capacity. Thus,
(34)
for any random process such that the limit exists.
IV-D Lower Bound
In this section, a lower bound on (33) is derived by applying (34) to the following random input process. In every block of consecutive symbols, we let the first symbols and the last symbols have a constant amplitude, whereas the amplitude of the symbol in the middle of the block follows an arbitrary distribution. The phase of each symbol in the block is assumed uniform. With this random input process, illustrated in Fig. 6, the memory in (7) depends only on a single variable-amplitude symbol. This enables us to derive an analytical expression for the resulting capacity lower bound in (34).
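One block of this input process can be generated as follows. The memory length, block structure, and the variable-amplitude law in this sketch are illustrative placeholders; the optimized amplitude distribution is introduced only in Sec. IV-E.

```python
import numpy as np

rng = np.random.default_rng(1)

# One block of the Sec. IV-D input process: constant-amplitude symbols on
# both sides of a single variable-amplitude symbol, all with uniform phase.
Delta = 4                    # assumed channel memory
n = 2 * Delta + 1            # block length: Delta + 1 + Delta symbols
r1 = 0.5                     # constant amplitude of the guard symbols

amp = np.full(n, r1)
amp[Delta] = rng.rayleigh(scale=1.0)   # placeholder law for the middle symbol

phase = rng.uniform(0.0, 2 * np.pi, size=n)   # i.i.d. uniform phases
x = amp * np.exp(1j * phase)

# Every symbol except the middle one has magnitude exactly r1.
print(np.allclose(np.abs(np.delete(x, Delta)), r1))
```

Because only the middle symbol has random amplitude, the nonlinear memory term in (7) depends on a single random variable per block, which is what makes the bound analytically tractable.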
Theorem 3
Proof:
See Appendix D. ∎
The bound will be numerically computed in the next section.
IV-E Numerical Results
Theorem 3 yields a lower bound on capacity for every constant and every probability distribution satisfying (35). Instead of optimizing the bound over all distributions , which is of limited interest, since the theorem itself provides only a lower bound on capacity, we study a heuristically chosen family of distributions and optimize its parameters along with the constant amplitude .
[Fig. 7: surface plots of the capacity lower bound $C$ versus the shape parameter $\nu$ and the ratio $r_{1}^{2}/s$, six panels for increasing transmit power.]
An attractive choice in this context is to let the variable-amplitude symbols follow a circularly symmetric bivariate t-distribution [49, p. 86], [50, p. 1],
(38)
where (with magnitude ) denotes one such variable-amplitude symbol, is a shape parameter, and scales the variance, which equals [50, p. 11] if and is otherwise undefined. The shape of this distribution is similar to a Gaussian, but the heaviness of the tail can be controlled via the shape parameter : the closer is to , the heavier the tail. This is, as we shall see later, what makes it an interesting choice for nonlinear optical channels.
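Samples from this t-distribution can be drawn via the standard Gaussian scale-mixture representation, t = Gaussian / sqrt(χ²ν / ν). Assuming the usual convention that the per-coordinate variance is s·ν/(ν − 2) for ν > 2, the sketch below checks this empirically; the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Sample the circularly symmetric bivariate t-distribution via its
# scale-mixture representation: Gaussian divided by sqrt(chi2_nu / nu).
def sample_bivariate_t(nu, s, size, rng):
    g = rng.normal(scale=np.sqrt(s), size=(size, 2))   # Gaussian, scale s
    u = rng.chisquare(nu, size=(size, 1))              # mixing variable
    return g / np.sqrt(u / nu)

nu, s, N = 5.0, 1.0, 200_000
z = sample_bivariate_t(nu, s, N, rng)

# Per-coordinate variance should approach s * nu / (nu - 2) for nu > 2.
print(np.var(z[:, 0]), s * nu / (nu - 2))
```

As ν decreases toward 2 the mixing variable concentrates less, the tails fatten, and the variance diverges, matching the heavy-tail behavior described above.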
Again, we consider the same scenario as in Sec. II-E, with the system parameters given in Table I. The distribution of is given by , with given by (38). The power constraint (35), which reduces to
leaves two degrees of freedom to optimize for each , which we can take to be the shape parameter and the ratio .
The lower bound on the capacity of the finite-memory model given by Theorem 3 is shown in Fig. 7 as a function of , , and , for the special case . The expectation in (36) was estimated by Monte Carlo integration. It can be seen that as the transmit power increases, the optimum shape parameter gets closer and closer to . In other words, the tail gets heavier, so that at high power the tail carries almost all of the power, while the probability of transmitting a high amplitude remains small. In this sense, a t-distribution with a shape parameter near is similar to a satellite constellation [37].
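The Monte Carlo estimation of the expectation in (36) follows the usual sample-average recipe. The sketch below shows the principle with a placeholder integrand and amplitude law, not the actual quantities of Theorem 3.

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo estimation of an expectation E[f(R)]: draw samples of R and
# average f over them.  The integrand f and the law of R are placeholders.
def mc_expectation(f, sampler, N, rng):
    return float(np.mean(f(sampler(N, rng))))

f = lambda r: np.log2(1.0 + r ** 2)              # placeholder integrand
sampler = lambda N, rng: rng.rayleigh(1.0, N)    # placeholder amplitude law

est = mc_expectation(f, sampler, 1_000_000, rng)
print(est)
```

The estimator's standard error shrinks as 1/sqrt(N), so a million samples suffice for the smooth surfaces shown in Fig. 7.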
Selecting the optimum parameters and for every power , the capacity bound is plotted in Fig. 8 as a function of transmit power , for selected values of the channel memory . The figure also shows the AWGN channel capacity and the exact capacity of the GN model given by (32). In the linear regime, the capacity bound is close to the AWGN capacity if , because the t-distribution is, at high values of , approximately equal to the capacity-achieving Gaussian distribution. As increases, the capacity bound tends, still in the linear regime, to the mutual information of constant-amplitude transmission [51, 52].
Interestingly, we can see that as increases, the curves approach an asymptotic bound (the curves for , , and almost overlap). It follows that reliable communication in the high input power regime is indeed possible for every finite . This result should be compared with the regular GN model, whose capacity (32) decreases to zero at high average transmit power [8]. It may seem contradictory that the GN model, which can be characterized as a limiting case of the finite-memory model (cf. (7) and (2)–(3)), nevertheless exhibits a fundamentally different channel capacity. This can be intuitively understood as follows. For every block of symbols, we transmit constant-amplitude symbols with low power and only one symbol with variable (potentially very large) power. Although the amplitude of this variable-power symbol is chosen so that the average power constraint is satisfied according to (35) (which requires averaging across many blocks of length ), the convergence to average power illustrated in (3) does not occur within a block, even when is taken very large.
It can be observed that the lower bounds in Fig. 8 all exhibit a local peak before they converge to their asymptotic values at high . Such bounds can always be improved using the law of monotonic channel capacity [53]. Cast in the framework of this paper, this law states that the channel capacity never decreases with power for any finite-memory channel. The law does not give a capacity lower bound per se, but it provides an instrument by which a lower bound at a certain power can be propagated to any power greater than . Hence, the part of the curves in Fig. 8 to the right of the peaks can be lifted up to the level of the peaks, which yields marginally tighter lower bounds (dashed lines in Fig. 8).
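Numerically, the lifting procedure implied by the law of monotonic capacity is simply a running maximum of the bound over increasing power. A minimal sketch with toy numbers:

```python
import numpy as np

# "Lifting" a capacity lower bound via the law of monotonic capacity [53]:
# a bound achieved at power P also holds at every power above P, so the
# improved bound is the running maximum over increasing power.
def lift_bound(C_lower):
    """C_lower: lower-bound values ordered by increasing power."""
    return np.maximum.accumulate(C_lower)

# Toy curve with a peak followed by a dip (illustrative numbers only).
C = np.array([0.5, 1.2, 2.0, 1.6, 1.4, 1.5])
print(lift_bound(C))   # the dip after the peak is lifted to the peak value
```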
V Discussion and Conclusions
We extended the popular GN model for nonlinear fiber channels with a parameter to account for the channel memory. The extended channel model, which is given by (7), is able to model the time-varying output of an optical fiber whose input is a nonstationary process. If the input varies on a time scale comparable to or longer than the memory of the channel, then this model gives more realistic results than the regular GN model, as we showed in Fig. 2.
The validity of the GN model remains undisputed in the case of i.i.d. input symbols, such as in an uncoded scenario with a fixed, not too heavy-tailed modulation format (examples of “heavy-tailed” modulation formats are t-distributions (Sec. IV-E) and satellite constellations [37]) and a fixed transmit power. These are the conditions under which the GN model was derived and validated. The uncoded bit and symbol error rates computed in Sec. III confirm that the finite-memory model behaves similarly to the GN model as the channel memory increases.
The scene changes completely if we instead study capacity, as in Fig. 8. In this case, the finite-memory GN model does not, even at high , behave as the regular GN model. This is because the channel capacity by definition involves a maximization over all possible transmission schemes, including nonstationary input, heavy-tailed modulation formats, etc. In the nonlinear regime, it turns out to be beneficial to transmit using a heavy-tailed input sequence, whose output the GN model cannot reliably predict. Hence, the GN model and other infinite-memory models (in the sense defined in Sec. II-C) should be used with caution in capacity analysis. It is still possible (and often easy) to calculate the capacity of such channel models, but this capacity should not be interpreted as the capacity of some underlying physical phenomenon with a finite memory. As a rule of thumb, if the model depends on the average transmit power, we recommend avoiding it in capacity analysis.
A challenging area for future work would be to derive more realistic finite-memory models than (7), i.e., discrete-time channel models that give the channel output as a function of a finite number of input symbols, ideally including not only a time-varying sequence of symbols but also symbols in other wavelengths, polarizations, modes, and/or cores, and to analyze these models from an information-theoretic perspective. This may lead to innovative new transmission techniques, which may potentially increase the capacity significantly over known results in the nonlinear regime. The so-called nonlinear Shannon limit, which has only been derived for infinite-memory channel models, does not prevent the existence of such techniques.
Appendix A Proof of Theorem 1
Let , , be the four bits associated with the -QAM constellation point chosen as the th transmitted symbol . The BER for the 16-QAM constellation in Fig. 3 is given by
(39)
(40)
where is the estimated bit obtained by the MED detector in (III-A) and denotes bit negation. Using the law of total probability we can then express (40) as
(41) |
We now compute the PMF . As is a sum of i.i.d. random variables, its PMF is the -fold self-convolution of the PMF of one such random variable. This convolution can be readily computed using probability generating functions [54, Sec. 5.1]. Let
(42) |
denote the probability generating function of . The probability generating function of is given by
(43)
(44)
(45)
We see from (45) that the possible outcomes of are
(46) |
and occurs with probability . Using this in (41) yields
(47) |
where (47) follows from (III) and represents the th bit label of the symbol for .
The density in the integral in (47) corresponds to a Gaussian random variable with total variance , and thus, we now focus on the function . First, we express the constellation point indices as , where , , and . From Fig. 3, we see that if , if , and if . Using the definition of in (11) together with (46) and , we obtain
(48) |
We recognize the three values of in (48) as , , and , respectively. Combining this with and inspecting the constellation and labeling in Fig. 3 yields (15).
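The probability-generating-function step of this proof amounts to an m-fold self-convolution of a PMF, which is straightforward to verify numerically. The PMF below is a placeholder, not the one appearing in the theorem.

```python
import numpy as np
from functools import reduce

# PMF of a sum of m i.i.d. discrete variables as the m-fold self-convolution
# of the single-variable PMF, i.e., raising its probability generating
# function to the m-th power.
def pmf_of_sum(pmf, m):
    return reduce(np.convolve, [pmf] * m)

pmf = np.array([0.25, 0.5, 0.25])   # placeholder single-variable PMF on {0,1,2}
p_sum = pmf_of_sum(pmf, 4)          # PMF of the sum of 4 copies, on {0,...,8}

print(len(p_sum), p_sum.sum())      # support size 9, probabilities sum to 1
```

Polynomial multiplication of the generating functions and discrete convolution of the PMFs are the same operation, which is why `np.convolve` computes (43)–(45) directly.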
Appendix B Proof of Theorem 2
Appendix C Proof of Corollary 1
The SER in (20) can be expressed as
(54) |
where
(55) |
is a continuous and bounded function in for any and . We can interpret the innermost sum in (54) in probabilistic terms as
(56) |
where is a binomial random variable with parameters , i.e., is the sum of i.i.d. Bernoulli random variables that take values and with the same probability. We use the notation to emphasize the dependency on . To establish (25), we first calculate
(57)
(58)
(59)
(60)
Here, (57) follows from the dominated convergence theorem [54, Sec. 5.6.(12).(b)], whose application is possible because is a bounded function, (58) holds because is continuous, and (59) follows from the law of large numbers (see e.g., [54, Sec. 7.4.(3)]). The proof of (25) is completed by using
(61) |
The proof of the BER expression in (24) follows steps similar to the ones we presented above.
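The limit established in (57)–(60) can be illustrated numerically: for any bounded continuous g, the expectation E[g(B_m/m)] approaches g(1/2) as m grows. The test function below is an illustrative choice, not the g from the proof.

```python
import numpy as np

rng = np.random.default_rng(5)

# For bounded continuous g, E[g(B_m / m)] -> g(1/2) as m -> infinity, where
# B_m is Binomial(m, 1/2): the law of large numbers drives B_m/m to 1/2, and
# dominated convergence moves the limit inside the expectation.
g = lambda t: np.cos(np.pi * t) ** 2      # a bounded continuous test function

for m in (10, 100, 10_000):
    b = rng.binomial(m, 0.5, size=200_000)
    print(m, np.mean(g(b / m)))           # decreases toward g(0.5) = 0
```

Boundedness of g is what licenses the dominated convergence step; without it, rare large deviations of B_m/m could keep the expectation away from g(1/2).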
Appendix D Proof of Theorem 3
Consider a sequence of independent symbols , where for each , the magnitude is independent of the phase , which is uniform in . The magnitude is distributed according to if and is otherwise equal to the constant . Furthermore, and are chosen so that (35) holds, which guarantees that the average power constraint is satisfied. We will next show that the right-hand side of (36) is the mutual information (in bits per channel use) obtainable with this input distribution. Hence, it is a lower bound on capacity.
We define blocks of length of transmitted and received symbols as
for . Let us focus for a moment on the received block . Let be the th element () of . It follows from (7) that the additive noise contribution to depends on the input vector , which may span more than one input block. By construction, however, all elements of with the exception of have constant magnitude equal to . Hence,
(62) |
This implies that
(63) |
We see from (63) that each output sample in actually depends on the input symbols only through and . We then conclude that depends on the whole input sequence only through . But this, together with the assumption of independent input symbols, implies that the output blocks are independent. Hence, from (34),
(64) |
for an arbitrary , say, .
Next, we calculate . The mutual information can be decomposed into differential entropies as
(65) |
where
(66)
(67)
We start by evaluating (67). Because of (63), the conditional distribution of given is the multivariate Gaussian density
(68) |
Using [17, Theorem 8.4.1], we conclude that
(69) |
where the expectation is with respect to the random variable , which is distributed according to .
To evaluate (66), we start by noting that all elements of have uniform phase because the transmitted symbols and the additive noise samples have uniform phase by assumption. We use this property to simplify (66). Specifically, let and
(70) |
By [55, eq. (320)]
(71) |
To evaluate , we first derive the conditional distribution of given . Note that has the same distribution as (see (1) and (7)). Hence, given , the random variables follow a noncentral chi-square distribution with two degrees of freedom and noncentrality parameters , where if and otherwise. Furthermore, these random variables are conditionally independent given . Using the change of variable theorem for transformation of random variables, we finally obtain after algebraic manipulations
(72)
The probability distribution , which is given in (37), is obtained from (72) by taking the expectation with respect to the probability distribution of . Finally, we obtain the capacity lower bound (36) by substituting (37) into (66) and (69) into (67), by computing the difference between the two resulting differential entropies according to (65), and by dividing by .
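The noncentral chi-square claim used above is easy to verify by simulation: if n is circularly symmetric Gaussian with per-component variance σ², then |x + n|²/σ² is noncentral chi-square with 2 degrees of freedom and noncentrality |x|²/σ², whose mean is 2 plus the noncentrality. The values of σ² and x below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

# Sanity check of the conditional law used in the proof: |x + n|^2 / sigma2
# is noncentral chi-square with 2 degrees of freedom and noncentrality
# lam = |x|^2 / sigma2, so its mean is 2 + lam.
sigma2 = 0.5
x = 1.2 + 0.7j
n = (rng.normal(scale=np.sqrt(sigma2), size=200_000)
     + 1j * rng.normal(scale=np.sqrt(sigma2), size=200_000))

u = np.abs(x + n) ** 2 / sigma2    # normalized magnitude-squared samples
lam = abs(x) ** 2 / sigma2         # noncentrality parameter

print(np.mean(u), 2 + lam)         # empirical vs. theoretical mean
```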
References
- [1] H. Sun, K.-T. Wu, and K. Roberts, “Real-time measurements of a 40 Gb/s coherent system,” Opt. Express, vol. 16, no. 2, pp. 873–879, 2008.
- [2] K. Roberts, M. O’Sullivan, K.-T. Wu, H. Sun, A. Awadalla, D. J. Krause, and C. Laperle, “Performance of dual-polarization QPSK for optical transport systems,” J. Lightw. Technol., vol. 27, no. 16, pp. 3546–3559, Aug. 2009.
- [3] A. D. Ellis, J. Zhao, and D. Cotter, “Approaching the non-linear Shannon limit,” J. Lightw. Technol., vol. 28, no. 4, pp. 423–433, Feb. 2010.
- [4] A. Mecozzi and R.-J. Essiambre, “Nonlinear Shannon limit in pseudolinear coherent systems,” J. Lightw. Technol., vol. 30, no. 12, pp. 2011–2024, June 2012.
- [5] A. Splett, C. Kurtzke, and K. Petermann, “Ultimate transmission capacity of amplified optical fiber communication systems taking into account fiber nonlinearities,” in Proc. European Conference on Optical Communication (ECOC), Montreux, Switzerland, Sept. 1993.
- [6] J. Tang, “The channel capacity of a multispan DWDM system employing dispersive nonlinear optical fibers and an ideal coherent optical receiver,” J. Lightw. Technol., vol. 20, no. 7, pp. 1095–1101, July 2002.
- [7] P. Poggiolini, A. Carena, V. Curri, G. Bosco, and F. Forghieri, “Analytical modeling of nonlinear propagation in uncompensated optical transmission links,” IEEE Photon. Technol. Lett., vol. 23, no. 11, pp. 742–744, June 2011.
- [8] G. Bosco, P. Poggiolini, A. Carena, V. Curri, and F. Forghieri, “Analytical results on channel capacity in uncompensated optical links with coherent detection,” Opt. Express, vol. 19, no. 26, pp. B440–B449, Dec. 2011.
- [9] A. Carena, V. Curri, G. Bosco, P. Poggiolini, and F. Forghieri, “Modeling of the impact of nonlinear propagation effects in uncompensated optical coherent transmission links,” J. Lightw. Technol., vol. 30, no. 10, pp. 1524–1539, May 2012.
- [10] P. Poggiolini, “The GN model of non-linear propagation in uncompensated coherent optical systems,” J. Lightw. Technol., vol. 30, no. 24, pp. 3875–3879, Dec. 2012.
- [11] L. Beygi, E. Agrell, P. Johannisson, M. Karlsson, and H. Wymeersch, “A discrete-time model for uncompensated single-channel fiber-optical links,” IEEE Trans. Commun., vol. 60, no. 11, pp. 3440–3450, Nov. 2012.
- [12] P. Johannisson and M. Karlsson, “Perturbation analysis of nonlinear propagation in a strongly dispersive optical communication system,” J. Lightw. Technol., vol. 31, no. 8, pp. 1273–1282, Apr. 2013.
- [13] E. Grellier and A. Bononi, “Quality parameter for coherent transmissions with Gaussian-distributed nonlinear noise,” Opt. Express, vol. 19, no. 13, pp. 12 781–12 788, June 2011.
- [14] L. Beygi, N. V. Irukulapati, E. Agrell, P. Johannisson, M. Karlsson, H. Wymeersch, P. Serena, and A. Bononi, “On nonlinearly-induced noise in single-channel optical links with digital backpropagation,” Optics Express, vol. 21, no. 22, pp. 26 376–26 386, Nov. 2013.
- [15] F. Vacondio, O. Rival, C. Simonneau, E. Grellier, A. Bononi, L. Lorcy, J.-C. Antona, and S. Bigo, “On nonlinear distortions of highly dispersive optical coherent systems,” Opt. Express, vol. 20, no. 2, pp. 1022–1032, Jan. 2012.
- [16] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, July, Oct. 1948.
- [17] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley, 2006.
- [18] J. B. Stark, “Fundamental limits of information capacity for optical communications channels,” in Proc. European Conference on Optical Communication (ECOC), Nice, France, Sep. 1999.
- [19] P. P. Mitra and J. B. Stark, “Nonlinear limits to the information capacity of optical fibre communications,” Nature, vol. 411, pp. 1027–1030, June 2001.
- [20] K. S. Turitsyn, S. A. Derevyanko, I. V. Yurkevich, and S. K. Turitsyn, “Information capacity of optical fiber channels with zero average dispersion,” Physical Review Letters, vol. 91, no. 20, pp. 203 901 1–4, Nov. 2003.
- [21] I. B. Djordjevic and B. Vasic, “Achievable information rates for high-speed long-haul optical transmission,” J. Lightw. Technol., vol. 23, no. 11, pp. 3755–3763, Nov. 2005.
- [22] M. H. Taghavi, G. C. Papen, and P. H. Siegel, “On the multiuser capacity of WDM in a nonlinear optical fiber: Coherent communication,” IEEE Trans. Inf. Theory, vol. 52, no. 11, pp. 5008–5022, Nov. 2006.
- [23] R.-J. Essiambre, G. Kramer, P. J. Winzer, G. J. Foschini, and B. Goebel, “Capacity limits of optical fiber networks,” J. Lightw. Technol., vol. 28, no. 4, pp. 662–701, Feb. 2010.
- [24] M. Secondini, E. Forestieri, and G. Prati, “Achievable information rate in nonlinear WDM fiber-optic systems with arbitrary modulation formats and dispersion maps,” J. Lightw. Technol., vol. 31, no. 23, pp. 3839–3852, Dec. 2013.
- [25] R. Dar, M. Shtaif, and M. Feder, “New bounds on the capacity of the nonlinear fiber-optic channel,” Optics Letters, vol. 39, no. 2, pp. 398–401, Jan. 2014.
- [26] J. M. Kahn and K.-P. Ho, “Spectral efficiency limits and modulation/detection techniques for DWDM systems,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 10, no. 2, pp. 259–272, Mar./Apr. 2004.
- [27] E. Narimanov and P. Mitra, “The channel capacity of a fiber optics communication system: perturbation theory,” J. Lightw. Technol., vol. 20, no. 3, pp. 530–537, Mar. 2002.
- [28] L. G. L. Wegener, M. L. Povinelli, A. G. Green, P. P. Mitra, J. B. Stark, and P. B. Littlewood, “The effect of propagation nonlinearities on the information capacity of WDM optical fiber systems: Cross-phase modulation and four-wave mixing,” Physica D: Nonlinear Phenomena, vol. 189, no. 1-2, pp. 81–99, Feb. 2004.
- [29] R.-J. Essiambre, G. J. Foschini, G. Kramer, and P. J. Winzer, “Capacity limits of information transport in fiber-optic networks,” Physical Review Letters, vol. 101, no. 16, pp. 163 901 1–4, Oct. 2008.
- [30] T. Freckmann, R.-J. Essiambre, P. J. Winzer, G. J. Foschini, and G. Kramer, “Fiber capacity limits with optimized ring constellations,” IEEE Photon. Technol. Lett., vol. 21, no. 20, pp. 1496–1498, Oct. 2009.
- [31] I. B. Djordjevic, H. G. Batshon, L. Xu, and T. Wang, “Coded polarization-multiplexed iterative polar modulation (PM-IPM) for beyond 400 Gb/s serial optical transmission,” in Proc. Optical Fiber Communication Conference (OFC), San Diego, CA, Mar. 2010.
- [32] R. I. Killey and C. Behrens, “Shannon’s theory in nonlinear systems,” Journal of Modern Optics, vol. 58, no. 1, pp. 1–10, Jan. 2011.
- [33] E. Agrell and M. Karlsson, “Power-efficient modulation formats in coherent transmission systems,” J. Lightw. Technol., vol. 27, no. 22, pp. 5115–5126, Nov. 2009.
- [34] ——, “WDM channel capacity and its dependence on multichannel adaptation models,” in Proc. Optical Fiber Communication Conference (OFC), Anaheim, CA, Mar. 2013.
- [35] B. Goebel, R.-J. Essiambre, G. Kramer, P. J. Winzer, and N. Hanik, “Calculation of mutual information for partially coherent Gaussian channels with applications to fiber optics,” IEEE Trans. Inf. Theory, vol. 57, no. 9, pp. 5720–5736, Sep. 2011.
- [36] G. Bosco, P. Poggiolini, A. Carena, V. Curri, and F. Forghieri, “Analytical results on channel capacity in uncompensated optical links with coherent detection: Erratum,” Opt. Express, vol. 20, no. 17, pp. 19 610–19 611, Aug. 2012.
- [37] E. Agrell and M. Karlsson, “Satellite constellations: Towards the nonlinear channel capacity,” in Proc. IEEE Photon. Conf. (IPC), Burlingame, CA, Sept. 2012.
- [38] A. Carena, G. Bosco, V. Curri, P. Poggiolini, M. T. Taiba, and F. Forghieri, “Statistical characterization of PM-QPSK signals after propagation in uncompensated fiber links,” in Proc. European Conference on Optical Communication (ECOC), London, U.K., Sept. 2010.
- [39] T. Koch, A. Lapidoth, and P. Sotiriadis, “Channels that heat up,” IEEE Trans. Inf. Theory, vol. 55, no. 8, pp. 3594–3612, Aug. 2009.
- [40] R. G. Gallager, Information Theory and Reliable Communication. New York, NY: Wiley, 1968.
- [41] E. Ip and J. M. Kahn, “Digital equalization of chromatic dispersion and polarization mode dispersion,” J. Lightw. Technol., vol. 25, no. 8, pp. 2033–2043, Aug. 2007.
- [42] E. Agrell, J. Lassing, E. G. Ström, and T. Ottosson, “On the optimality of the binary reflected Gray code,” IEEE Trans. Inf. Theory, vol. 50, no. 12, pp. 3170–3182, Dec. 2004.
- [43] M. P. Fitz and J. P. Seymour, “On the bit error probability of QAM modulation,” International Journal of Wireless Information Networks, vol. 1, no. 2, pp. 131–139, Apr. 1994.
- [44] M. K. Simon, S. M. Hinedi, and W. C. Lindsey, Digital Communication Techniques: Signal Design and Detection. Englewood Cliffs, NJ: Prentice-Hall, 1995.
- [45] A. Mecozzi, “Limits to long-haul coherent transmission set by the Kerr nonlinearity and noise of the in-line amplifiers,” J. Lightw. Technol., vol. 12, no. 11, pp. 1993–2000, Nov. 1994.
- [46] A. Demir, “Nonlinear phase noise in optical-fiber-communication systems,” J. Lightw. Technol., vol. 25, no. 8, pp. 2002–2032, Aug. 2007.
- [47] G. P. Agrawal, Fiber-optic communication systems, 4th ed. Wiley, 2010.
- [48] S. Verdú and T. S. Han, “A general formula for channel capacity,” IEEE Trans. Inf. Theory, vol. 40, no. 4, pp. 1147–1157, July 1994.
- [49] K.-T. Fang, S. Kotz, and K. W. Ng, Symmetric Multivariate and Related Distributions. Springer, 1990.
- [50] S. Kotz and S. Nadarajah, Multivariate t Distributions and Their Applications. Cambridge University Press, 2004.
- [51] N. M. Blachman, “A comparison of the informational capacities of amplitude- and phase-modulation communication systems,” Proceedings of the I.R.E., vol. 41, no. 6, pp. 748–759, June 1953.
- [52] K.-P. Ho and J. M. Kahn, “Channel capacity of WDM systems using constant-intensity modulation formats,” in Proc. Optical Fiber Communication Conference (OFC), Anaheim, CA, Mar. 2002.
- [53] E. Agrell, “On monotonic capacity–cost functions,” 2012, preprint. [Online]. Available: http://arxiv.org/abs/1209.2820
- [54] G. R. Grimmett and D. R. Stirzaker, Probability and Random Processes, 3rd ed. Oxford University Press, 2001.
- [55] A. Lapidoth and S. M. Moser, “Capacity bounds via duality with applications to multiple-antenna systems on flat-fading channels,” IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2426–2467, Oct. 2003.