
Retrieving Data Permutations from Noisy Observations: High and Low Noise Asymptotics

Minoh Jeong, Alex Dytso, Martina Cardone University of Minnesota, Minneapolis, MN 55455, USA, Email: {jeong316, mcardone}@umn.edu
New Jersey Institute of Technology, Newark, NJ 07102, USA Email: alex.dytso@njit.edu
The work of M. Jeong and M. Cardone was supported in part by the U.S. National Science Foundation under Grant CCF-1849757.
Abstract

This paper considers the problem of recovering the permutation of an $n$-dimensional random vector ${\bf X}$ observed in Gaussian noise. First, a general expression for the probability of error is derived when a linear decoder (i.e., a linear estimator followed by a sorting operation) is used. The derived expression holds with minimal assumptions on the distribution of ${\bf X}$ and when the noise has memory. Second, for the case of isotropic noise (i.e., noise with a diagonal scalar covariance matrix), the rates of convergence of the probability of error are characterized in the high and low noise regimes. In the low noise regime, for every dimension $n$, the probability of error is shown to behave proportionally to $\sigma$, where $\sigma$ is the noise standard deviation. Moreover, the slope is computed exactly for several distributions and it is shown to behave quadratically in $n$. In the high noise regime, for every dimension $n$, the probability of correctness is shown to behave as $1/\sigma$, and the exact expression for the rate of convergence is also provided.

I Introduction

The problem of recovering data permutations from noisy observations is becoming a common task in modern communication and computing systems. For example, systems built around data sorting operations, such as recommender systems and data analysis systems, leverage the information that can be obtained from the data ordering. In particular, recommender systems utilize the sorting information in order to optimize their next recommendation. Similarly, data analysis systems are often interested in rankings of massive data sets rather than in the exact values of the data. In such systems, users may wish to conceal their data when it contains sensitive information. A common solution to privatize individual data is to add a sufficient amount of random noise to guarantee the desired privacy level [1]. However, adding too much noise can render the task of recovering a permutation impossible, as the data will be too noisy. Therefore, for a given noise level, it is important to understand the fundamental limits of the data permutation recovery problem.

In this work, following the preliminary works in [2] and [3], we study the data permutation recovery problem in the framework of an $M$-ary hypothesis test. The specific goal of this paper is to study the fundamental limits of this problem under the constraint that a linear decoder (i.e., a linear estimator followed by a sorting operation) is employed. Studying linear decoders is interesting for several reasons. First, as shown in [2], linear decoders are optimal (i.e., they lead to the smallest probability of error) when the noise is isotropic and the distribution of the input data is exchangeable. Second, the optimal decoder can be linear even if the noise is colored; see [3] for the exact conditions. Third, linear decoders have at most polynomial complexity in the data dimension and hence, they are suitable for practical implementations.

The structure of the paper is as follows. In Section II, we introduce the notation and formally define the problem. In Section III, we characterize the probability of error when linear decoders are used. The derived expression holds with minimal assumptions on the distribution of the data and holds when the noise has memory. In Section IV, we utilize the expression for the error probability derived in Section III and characterize the asymptotic behavior of the probability of error for the isotropic noise case (i.e., when the noise covariance matrix is a diagonal scalar matrix) in the low and high noise regimes. For example, we show that the probability of error increases linearly in $\sigma$ (i.e., the standard deviation of the noise) in the low noise regime (i.e., when $\sigma\to 0$). We derive the exact slope and show it to be at most a quadratic function of the data dimension for a general class of distributions. In addition, we show that the behavior of the probability of correctness in the high noise regime (i.e., when $\sigma\to\infty$) is proportional to $\frac{1}{\sigma}$, and we characterize the exact slope.

I-A Related work

Permutation-related estimation problems have recently gained significant importance and are studied in various fields [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]. The ranking (i.e., data permutation) estimation problem under a joint Gaussian distribution was investigated in [4, 5, 6, 7]. In particular, in [4] the author considered a pairwise ordering for the bivariate case; the extension to the $n$-dimensional case was considered in [5]. The generalization from a Gaussian distribution to an elliptically contoured distribution can be found in [6, 7]. The authors in [4, 5, 6, 7] analyzed the structure of the covariance matrix that maximizes the probability of correctness of such estimation problems when the minimum mean square error (MMSE) estimator is used. In [3], the MMSE estimator was shown to be the only linear estimator that achieves the minimum probability of error for the ranking estimation problem. Most recent works study a linear regression framework premultiplied by an unknown permutation matrix, which suitably models problems with unknown labels. In [8], the feature matching problem in computer vision was formulated as a permutation recovery problem. The multivariate linear regression model with an unknown permutation was studied in [10, 9]. The authors provided necessary and sufficient conditions on the signal-to-noise ratio for exact permutation recovery and characterized the minimax prediction error. Isotonic regression without data labels, namely uncoupled isotonic regression, was discussed in [11]. Data estimation given randomly selected measurements – referred to as unlabeled sensing – was studied in [12, 13, 14]. In [12], the authors characterized a necessary condition on the dimension of the observation vector for uniquely recovering the original data in the noiseless case. A generalized framework of unlabeled sensing was presented in [15, 16, 17]. The estimation of a sorted vector based on noisy observations was considered in [18], where the MMSE estimator on sorted data was characterized as a linear combination of estimators on the unsorted data.

II Notation and Framework

Notation. Boldface upper case letters $\mathbf{X}$ denote vector random variables; the boldface lower case letter $\mathbf{x}$ indicates a specific realization of $\mathbf{X}$; $X_{i:n}$ denotes the $i$-th order statistic of ${\bf X}$; $\|{\bf X}\|$ is the norm of ${\bf X}$; $[n_{1}:n_{2}]$ is the set of integers from $n_{1}$ to $n_{2}\geq n_{1}$; $I_{n}$ is the identity matrix of dimension $n$; $\mathbf{0}_{n}$ is the column vector of dimension $n$ of all zeros; calligraphic letters indicate sets; $|\mathcal{A}|$ is the cardinality of $\mathcal{A}$; for $\mathcal{A}$ and $\mathcal{B}$, $\mathcal{A}\setminus\mathcal{B}$ is the set of elements that belong to $\mathcal{A}$ but not to $\mathcal{B}$, $\mathcal{A}\cap\mathcal{B}$ is the set of elements that belong to both $\mathcal{A}$ and $\mathcal{B}$, and $\mathcal{A}\cup\mathcal{B}$ is the set of elements that belong to either set. For a set ${\cal S}\subseteq\mathbb{R}^{n}$, ${\rm Vol}({\cal S})$ denotes its volume, i.e., its $n$-dimensional Lebesgue measure. For two $n$-dimensional vectors ${\bf x}$ and ${\bf y}$, if for all $i\in[1:n]$ the $i$-th element of ${\bf x}$ is larger than or equal to the $i$-th element of ${\bf y}$, then we write ${\bf x}\geq{\bf y}$. Finally, the multiplication of a matrix $A$ by a set ${\cal B}$ is denoted and defined as $A{\cal B}=\{A{\bf x}:{\bf x}\in{\cal B}\}$.

We consider the framework in Fig. 1, where an $n$-dimensional random vector $\mathbf{X}\in\mathbb{R}^{n}$ is first generated according to a certain distribution and then passed through an additive noisy channel with Gaussian transition probability, the output of which is denoted as $\mathbf{Y}$. Thus, we have $\mathbf{Y}=\mathbf{X}+\mathbf{N}$, with $\mathbf{N}\sim\mathcal{N}(\mathbf{0}_{n},K_{\mathbf{N}})$, where $K_{\mathbf{N}}$ is the covariance matrix of the additive noise $\mathbf{N}$, and where $\mathbf{X}$ and $\mathbf{N}$ are independent.

In this work, we are interested in studying the probability of error of the “data permutation recovery” problem formulated in [2, 3] that, given the observation of $\mathbf{Y}$, seeks to retrieve the permutation (among the $n!$ possible ones) according to which the vector $\mathbf{X}$ is sorted. Specifically, this problem can be formulated within a hypothesis testing framework with $n!$ hypotheses $\mathcal{H}_{\pi},\pi\in\mathcal{P}$, where $\mathcal{P}$ is the collection of all permutations of the elements of $[1:n]$, and where $\mathcal{H}_{\pi}$ is the hypothesis that $\mathbf{X}$ is an $n$-dimensional vector sorted according to the permutation $\pi\in\mathcal{P}$, that is

$$\mathcal{H}_{\pi}=\{\mathbf{x}\in\mathbb{R}^{n}:x_{\pi_{1}}\leq x_{\pi_{2}}\leq\cdots\leq x_{\pi_{n}}\}, \qquad (1)$$

with $x_{\pi_{i}},i\in[1:n]$ being the $\pi_{i}$-th element of $\mathbf{x}$, and $\pi_{i},i\in[1:n]$ being the $i$-th element of $\pi$. Given this, the optimal decoder in Fig. 1 will output $\mathcal{H}_{\hat{\pi}},\hat{\pi}\in\mathcal{P}$ such that

$$\mathcal{H}_{\hat{\pi}}:\ \hat{\pi}=\operatornamewithlimits{argmin}_{\pi\in\mathcal{P}}\ \{\Pr\left(\mathcal{H}_{\pi}\neq\mathcal{H}_{\pi^{\star}}\right)\}, \qquad (2)$$

where $\pi^{\star}$ denotes the permutation according to which the random vector $\mathbf{X}$ is sorted. In particular, the decoder will declare that the input vector $\mathbf{x}\in\mathcal{H}_{\pi}$ if and only if the observation vector $\mathbf{y}\in\mathcal{R}_{\pi,K_{\mathbf{N}}}$, where $\mathcal{R}_{\pi,K_{\mathbf{N}}},\pi\in\mathcal{P}$ are the so-called optimal decision regions (the notation $\mathcal{R}_{\pi,K_{\mathbf{N}}}$ indicates that, in general, the decision regions might be functions of the noise covariance matrix $K_{\mathbf{N}}$), which can be derived by leveraging the maximum a posteriori probability (MAP) criterion [19, Appendix 3C] and are given by [2, 3]

$$\mathcal{R}_{\pi,K_{\mathbf{N}}}=\left\{{\bf y}\in\mathbb{R}^{n}:f_{\bf Y}({\bf y},\mathcal{H}_{\pi})>\max_{\begin{subarray}{c}\tau\in\mathcal{P}\\ \tau\neq\pi\end{subarray}}f_{\bf Y}({\bf y},\mathcal{H}_{\tau})\right\}, \qquad (3)$$

where $f_{\bf Y}({\bf y},\mathcal{H}_{\pi})=f_{\bf Y}({\bf y}|\mathcal{H}_{\pi})\Pr(\mathcal{H}_{\pi})$, with $f_{\bf Y}({\bf y}|\mathcal{H}_{\pi})$ denoting the conditional probability density function of ${\bf Y}$ given that ${\bf X}\in\mathcal{H}_{\pi}$. In order to guarantee that the collection $\{\mathcal{R}_{\pi,K_{\mathbf{N}}},\pi\in\mathcal{P}\}$ is a partition of the $n$-dimensional space, we assume that if $\mathbf{y}\in\{\mathcal{R}_{\pi,K_{\mathbf{N}}},\pi\in\mathcal{S}\}$ for some $\mathcal{S}\subseteq\mathcal{P}$ with $|\mathcal{S}|>1$, then one of the hypotheses $\mathcal{H}_{\pi},\pi\in\mathcal{S}$ is arbitrarily selected.
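To make the complexity of the optimal test in (3) concrete, the following is an illustrative sketch (ours, not code from [2, 3]) of a brute-force Monte Carlo approximation of the MAP rule for i.i.d. standard Gaussian data: it scores all $n!$ hypotheses by estimating $f_{\bf Y}({\bf y},\mathcal{H}_{\pi})\propto\mathbb{E}_{\bf X}[1_{\{{\bf X}\in\mathcal{H}_{\pi}\}}f_{\bf N}({\bf y}-{\bf X})]$. The helper name map_decode, the sample size, and the parameter choices are assumptions made only for illustration.

```python
# Sketch of a brute-force MAP decoder for (3): score every hypothesis H_pi by a
# Monte Carlo estimate of f_Y(y, H_pi) and return the best permutation.
# Assumes i.i.d. standard Gaussian data X; all names here are hypothetical.
import itertools
import numpy as np
from scipy.stats import multivariate_normal

def map_decode(y, K_N, n_mc=200_000, seed=0):
    n = y.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_mc, n))                    # samples of X (exchangeable)
    w = multivariate_normal(np.zeros(n), K_N).pdf(y - X)  # f_N(y - x) per sample
    best_pi, best_score = None, -np.inf
    for pi in itertools.permutations(range(n)):           # n! hypotheses
        in_H_pi = np.all(np.diff(X[:, list(pi)], axis=1) >= 0, axis=1)
        score = np.mean(w * in_H_pi)                      # proportional to f_Y(y, H_pi)
        if score > best_score:
            best_pi, best_score = pi, score
    return best_pi

# Example with n = 3 and isotropic noise: the MAP estimate should typically
# coincide with the permutation that sorts y itself (see Sections III and IV).
y = np.array([0.3, -1.2, 0.7])
print(map_decode(y, K_N=0.5 * np.eye(3)), tuple(np.argsort(y)))
```

Each call evaluates $n!$ hypotheses, which is exactly the complexity barrier that motivates the linear decoder studied next.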

[Fig. 1 block diagram: a Data Generator produces $\mathbf{X}$ with ground truth $\mathcal{H}_{\pi^{\star}},\pi^{\star}\in\mathcal{P}$; noise $\mathbf{N}\sim\mathcal{N}(\mathbf{0}_{n},K_{\mathbf{N}})$ is added; a Decoder observes $\mathbf{Y}$ and outputs $\mathcal{H}_{\hat{\pi}},\hat{\pi}\in\mathcal{P}$.]
Figure 1: Graphical representation of the considered framework.

III Probability of Error with Linear Decoder

In this section, we focus on characterizing the probability of error of the data permutation recovery problem introduced in Section II. Given the hypotheses and decision regions defined in (1) and (3), we have that the error probability $P_{e}$ is given by

$$P_{e}=1-P_{c}, \qquad (4a)$$
$$P_{c}=\sum_{\pi\in{\cal P}}\Pr\left(\{{\bf Y}\in\mathcal{R}_{\pi,K_{\mathbf{N}}}\}\cap\{{\bf X}\in\mathcal{H}_{\pi}\}\right), \qquad (4b)$$

where $P_{c}$ is the probability of correctness.

In particular, we assess the probability of error when a linear decoder is employed. This decoder first computes a permutation-independent linear transformation $\mathbf{y}_{\ell}$ of $\mathbf{y}$, i.e., $\mathbf{y}_{\ell}=A\mathbf{y}+\mathbf{b}$, where $A\in\mathbb{R}^{n\times n}$ and $\mathbf{b}\in\mathbb{R}^{n}$ are the same for all permutations, and then it outputs the permutation according to which $\mathbf{y}_{\ell}$ is sorted. The decision regions in (3) when a linear decoder is used become

$$\mathcal{R}_{\pi,K_{\mathbf{N}}}=A\mathcal{H}_{\pi}+\mathbf{b},\ \forall\pi\in\mathcal{P}. \qquad (5)$$
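A minimal sketch (ours) of the linear decoder described above, assuming numpy is available; the function name linear_decode is hypothetical.

```python
# Linear decoder: apply the permutation-independent affine map y_l = A y + b,
# then output the permutation according to which y_l is sorted.
import numpy as np

def linear_decode(y, A, b):
    y_l = A @ y + b
    return tuple(np.argsort(y_l))   # pi with y_l[pi[0]] <= ... <= y_l[pi[-1]]

# Example: for isotropic noise, A = I_n and b = 0_n are optimal (Section IV).
rng = np.random.default_rng(1)
x = rng.standard_normal(4)
y = x + 0.1 * rng.standard_normal(4)
print(linear_decode(y, np.eye(4), np.zeros(4)), tuple(np.argsort(x)))
```

The cost is one matrix-vector product plus a sort, i.e., at most polynomial in $n$, in contrast with the $n!$ evaluations required by (3).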

Our choice of assessing the probability of error performance of a linear decoder stems primarily from its low complexity (at most polynomial in $n$) compared to a brute-force evaluation of the optimal test in (3), which has a practically prohibitive complexity of $n!$. Moreover, for the case ${\bf X}\sim\mathcal{N}(\mathbf{0}_{n},I_{n})$, it has been shown in [3] that a linear decoder is indeed optimal, i.e., it minimizes the probability of error, under certain conditions on the noise covariance matrix $K_{\bf N}$.

We next derive an expression for the probability of error when a linear decoder is used. Towards this end, for each $\pi\in\mathcal{P}$, we define a matrix $T_{\pi}\in\mathbb{R}^{(n-1)\times n}$ such that

$$(T_{\pi})_{i,j}=1_{\{j=\pi_{i+1}\}}-1_{\{j=\pi_{i}\}}, \qquad (6)$$

where $1_{\{x=y\}}=1$ if and only if $x=y$ and is equal to zero otherwise. For instance, let $n=4$ and consider $\pi=\{4,2,1,3\}$; then, we have that

$$T_{\{4,2,1,3\}}=\begin{bmatrix}0&1&0&-1\\ 1&-1&0&0\\ -1&0&1&0\end{bmatrix}.$$
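The following short sketch (ours) builds $T_{\pi}$ directly from the definition in (6) and reproduces the displayed matrix; note that the code uses 0-based indices, so the paper's $\pi=\{4,2,1,3\}$ becomes $(3,1,0,2)$.

```python
# Build T_pi from (6): row i has +1 in column pi_{i+1} and -1 in column pi_i.
import numpy as np

def make_T(pi):
    n = len(pi)
    T = np.zeros((n - 1, n), dtype=int)
    for i in range(n - 1):
        T[i, pi[i + 1]] = 1
        T[i, pi[i]] = -1
    return T

print(make_T([3, 1, 0, 2]))
# [[ 0  1  0 -1]
#  [ 1 -1  0  0]
#  [-1  0  1  0]]
```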

The theorem below provides an expression for the error probability of the data permutation recovery problem when a linear decoder is used.

Theorem 1.

Let ${\bf X}$ be an exchangeable random vector (a sequence of random variables $U_{1},\ldots,U_{n}$ is said to be exchangeable if, for any permutation $(\pi_{1},\ldots,\pi_{n})$ of the indices $[1:n]$, $(U_{1},\ldots,U_{n})$ is equal in distribution to $(U_{\pi_{1}},\ldots,U_{\pi_{n}})$). Then, for any invertible $A$ and any $\mathbf{b}$ as defined in (5), and for any noise covariance matrix $K_{\bf N}$, the probability of error is given by

$$P_{e}=1-\frac{1}{n!}\sum_{\pi\in{\cal P}}\mathbb{E}\left[Q_{\tilde{K}_{\pi}}\left(-T_{\pi}A^{-1}({\bf X}-{\bf b})\right)\;\middle|\;{\bf X}\in\mathcal{H}_{\pi}\right], \qquad (7)$$

where $\tilde{K}_{\pi}=T_{\pi}A^{-1}K_{\bf N}A^{-T}T_{\pi}^{T}\in\mathbb{R}^{(n-1)\times(n-1)}$, with $T_{\pi},\pi\in\mathcal{P}$ given by (6), and where $Q_{\tilde{K}_{\pi}}(\cdot)$ is the multivariate Gaussian Q-function with covariance matrix $\tilde{K}_{\pi}$.

Proof:

By substituting the decision regions (5) into (4) and by using Bayes' theorem, we obtain

$$\begin{aligned}P_{c}&=\sum_{\pi\in{\cal P}}\Pr\left({\bf Y}\in A\mathcal{H}_{\pi}+{\bf b}\;\middle|\;{\bf X}\in\mathcal{H}_{\pi}\right)\Pr\left({\bf X}\in\mathcal{H}_{\pi}\right)\\ &\overset{\rm(a)}{=}\frac{1}{n!}\sum_{\pi\in{\cal P}}\Pr\left({\bf X}+K_{{\bf N}}^{\frac{1}{2}}{\bf Z}-{\bf b}\in A\mathcal{H}_{\pi}\;\middle|\;{\bf X}\in\mathcal{H}_{\pi}\right)\\ &\overset{\rm(b)}{=}\frac{1}{n!}\sum_{\pi\in{\cal P}}\mathbb{E}\left[\Pr\left({\bf X}+K_{{\bf N}}^{\frac{1}{2}}{\bf Z}-{\bf b}\in A\mathcal{H}_{\pi}\;\middle|\;{\bf X}\right)\;\middle|\;{\bf X}\in\mathcal{H}_{\pi}\right], \qquad (8)\end{aligned}$$

where $\rm(a)$ follows from the fact that ${\bf X}$ is exchangeable, and hence $\Pr({\bf X}\in{\cal H}_{\pi})=\frac{1}{n!}$ for all $\pi\in{\cal P}$, and from letting ${\bf Z}\sim\mathcal{N}(\mathbf{0}_{n},I_{n})$; and $\rm(b)$ is due to the law of total expectation.

We now focus on the conditional probability inside the conditional expectation in (8). For each ${\cal H}_{\pi},\pi\in{\cal P}$, we have

$$\begin{aligned}&\Pr\left({\bf X}+K_{\bf N}^{\frac{1}{2}}{\bf Z}-{\bf b}\in A\mathcal{H}_{\pi}\;\middle|\;{\bf X}\right)\\ &=\Pr\left(A^{-1}({\bf X}-{\bf b})+A^{-1}K_{\bf N}^{\frac{1}{2}}{\bf Z}\in\mathcal{H}_{\pi}\;\middle|\;{\bf X}\right)\\ &=\Pr\left(A^{-1}({\bf X}-{\bf b})+{\bf U}\in\mathcal{H}_{\pi}\;\middle|\;{\bf X}\right), \qquad (9)\end{aligned}$$

where the last equality follows by letting ${\bf U}=A^{-1}K_{\bf N}^{\frac{1}{2}}{\bf Z}$. Note that ${\bf U}\sim{\cal N}(\mathbf{0}_{n},A^{-1}K_{\bf N}A^{-T})$.

Then, given ${\bf X}$, the event inside the conditional probability in (9) can be expressed as

$$\begin{aligned}&\left\{A^{-1}({\bf X}-{\bf b})+{\bf U}\in\mathcal{H}_{\pi}\right\}\\ &=\bigcap_{k=1}^{n-1}\left\{\left(A^{-1}({\bf X}-{\bf b})\right)_{\pi_{k}}-\left(A^{-1}({\bf X}-{\bf b})\right)_{\pi_{k+1}}\leq U_{\pi_{k+1}}-U_{\pi_{k}}\right\}\\ &=\left\{-T_{\pi}A^{-1}({\bf X}-{\bf b})\leq T_{\pi}{\bf U}\right\}, \qquad (10)\end{aligned}$$

where the last equality follows by using the definition of $T_{\pi},\pi\in\mathcal{P}$ in (6). By introducing the random vector ${\bf V}_{\pi}=T_{\pi}{\bf U}\sim{\cal N}(\mathbf{0}_{n-1},\tilde{K}_{\pi})$, where $\tilde{K}_{\pi}=T_{\pi}A^{-1}K_{\bf N}A^{-T}T_{\pi}^{T}$, we can equivalently express (9) as $\Pr\left({\bf X}+K_{\bf N}^{\frac{1}{2}}{\bf Z}-{\bf b}\in A\mathcal{H}_{\pi}\;\middle|\;{\bf X}\right)=\Pr\left(-T_{\pi}A^{-1}({\bf X}-{\bf b})\leq{\bf V}_{\pi}\;\middle|\;{\bf X}\right)$. By substituting this inside (8), we obtain

$$\begin{aligned}P_{c}&=\frac{1}{n!}\sum_{\pi\in{\cal P}}\mathbb{E}\left[\Pr\left(-T_{\pi}A^{-1}({\bf X}-{\bf b})\leq{\bf V}_{\pi}\;\middle|\;{\bf X}\right)\;\middle|\;{\bf X}\in\mathcal{H}_{\pi}\right]\\ &=\frac{1}{n!}\sum_{\pi\in{\cal P}}\mathbb{E}\left[Q_{\tilde{K}_{\pi}}\left(-T_{\pi}A^{-1}({\bf X}-{\bf b})\right)\;\middle|\;{\bf X}\in\mathcal{H}_{\pi}\right], \qquad (11)\end{aligned}$$

where the last equality follows by letting $Q_{\tilde{K}_{\pi}}(\cdot)$ be the multivariate Gaussian Q-function with covariance matrix $\tilde{K}_{\pi}$. We conclude the proof of Theorem 1 by using $P_{e}=1-P_{c}$. ∎

We highlight that (7) holds with minimal assumptions on the distribution of ${\bf X}$ (i.e., exchangeability) and hence, it can be used to study the error probability of the data permutation recovery problem in various noise settings, e.g., when the noise has memory or when it is isotropic. In the remainder of this paper, we focus on the isotropic noise scenario, i.e., we assume that $K_{\bf N}$ is a diagonal scalar matrix.
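As an illustrative numerical check of (7) (ours, not the authors' code), the sketch below compares the right-hand side of (7), evaluated by Monte Carlo with scipy's multivariate Gaussian CDF, against a direct simulation of the linear decoder for $n=3$, i.i.d. standard Gaussian data, colored noise, and $A=I_{n}$, $\mathbf{b}=\mathbf{0}_{n}$ (one particular linear decoder, not necessarily the optimal one for this noise). The covariance choice and sample sizes are assumptions.

```python
# Monte Carlo check of Theorem 1, eq. (7), for a linear decoder with A = I, b = 0.
import itertools
import math
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n = 3
K_N = 0.4 * np.array([[1.0, 0.5, 0.25],
                      [0.5, 1.0, 0.5],
                      [0.25, 0.5, 1.0]])          # noise with memory

def make_T(pi):                                   # definition (6)
    T = np.zeros((n - 1, n))
    for i in range(n - 1):
        T[i, pi[i + 1]], T[i, pi[i]] = 1.0, -1.0
    return T

# Right-hand side of (7): average Q_{K_tilde_pi}(-T_pi X) over X in H_pi.
n_form, pe_formula = 4_000, 0.0
for pi in itertools.permutations(range(n)):
    T = make_T(pi)
    mvn = multivariate_normal(np.zeros(n - 1), T @ K_N @ T.T)
    x = np.empty((n_form, n))
    x[:, list(pi)] = np.sort(rng.standard_normal((n_form, n)), axis=1)  # X | X in H_pi
    # Q_K(t) = Pr(V >= t) = Phi_K(-t); here t = -T x, so Q_K(-T x) = Phi_K(T x).
    pe_formula += np.mean(mvn.cdf(x @ T.T)) / math.factorial(n)
pe_formula = 1.0 - pe_formula

# Direct simulation of the decoder that sorts y = x + noise.
n_sim = 200_000
x = rng.standard_normal((n_sim, n))
y = x + rng.multivariate_normal(np.zeros(n), K_N, size=n_sim)
x_by_y = np.take_along_axis(x, np.argsort(y, axis=1), axis=1)
pe_sim = np.mean(np.any(np.diff(x_by_y, axis=1) < 0, axis=1))
print(pe_formula, pe_sim)   # the two estimates should agree up to Monte Carlo error
```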

IV Isotropic Noise

We here study the error probability of the data permutation recovery problem when the noise is isotropic, i.e., $K_{\bf N}=\sigma^{2}I_{n}$. Under this assumption, the regions $\mathcal{R}_{\pi,K_{\mathbf{N}}},\pi\in\mathcal{P}$ in (5) depend on $K_{\bf N}$ only through the parameter $\sigma$ and hence, we let $\mathcal{R}_{\pi,K_{\mathbf{N}}}=\mathcal{R}_{\pi,\sigma}$. Moreover, when ${\bf X}$ is exchangeable, it has been shown in [2] that $\mathcal{R}_{\pi,\sigma}=\mathcal{H}_{\pi},\pi\in\mathcal{P}$, i.e., for the isotropic noise setting the optimal decoder is indeed linear and hence, the probability of error in Theorem 1 is the minimum one.

In Section IV-A, we evaluate the probability of error $P_{e}$ in (7) when $K_{\bf N}=\sigma^{2}I_{n}$; then, in Section IV-B and Section IV-C, we use this expression to derive the rates of convergence of $P_{e}$ in the low noise regime (i.e., $\sigma\to 0$) and in the high noise regime (i.e., $\sigma\to\infty$), respectively.

IV-A Probability of Error

Under the assumption $K_{\bf N}=\sigma^{2}I_{n}$, we have that $\mathcal{R}_{\pi,\sigma}=\mathcal{H}_{\pi},\pi\in\mathcal{P}$ [2] and hence, with reference to (5), we have that $A=I_{n}$ and $\mathbf{b}=\mathbf{0}_{n}$. Moreover, by substituting these values inside $\tilde{K}_{\pi}\in\mathbb{R}^{(n-1)\times(n-1)}$ in Theorem 1, we obtain

$$\begin{aligned}\tilde{K}_{\pi}&=T_{\pi}A^{-1}K_{\bf N}A^{-T}T_{\pi}^{T}=\sigma^{2}T_{\pi}T_{\pi}^{T}=\sigma^{2}\tilde{K},\\ (\tilde{K})_{i,j}&=\left\{\begin{array}{ll}2&i=j,\\ -1&i=j+1\ \text{or}\ j=i+1,\\ 0&\text{otherwise,}\end{array}\right.\end{aligned} \qquad (12)$$

that is, $\tilde{K}\in\mathbb{R}^{(n-1)\times(n-1)}$ is a tridiagonal Toeplitz matrix.

The probability of error in the isotropic noise scenario is then given by the next corollary.

Corollary 1.

Let ${\bf X}$ be an exchangeable random vector and let $K_{\bf N}=\sigma^{2}I_{n}$. Then, for an arbitrary $\pi\in\mathcal{P}$, the probability of error is given by

$$P_{e}=1-\mathbb{E}\left[Q_{\sigma^{2}\tilde{K}}\left(-T_{\pi}{\bf X}\right)\;\middle|\;{\bf X}\in\mathcal{H}_{\pi}\right], \qquad (13)$$

where $\tilde{K}$ is defined in (12) and where $Q_{\sigma^{2}\tilde{K}}(\cdot)$ is the multivariate Gaussian Q-function with covariance matrix $\sigma^{2}\tilde{K}$.

Proof:

By substituting the expression of $\sigma^{2}\tilde{K}_{\pi}$ in (12) into (7), we obtain

$$P_{e}=1-\frac{1}{n!}\sum_{\pi\in{\cal P}}\mathbb{E}\left[Q_{\sigma^{2}\tilde{K}}\left(-T_{\pi}{\bf X}\right)\;\middle|\;{\bf X}\in\mathcal{H}_{\pi}\right]. \qquad (14)$$

We note that $\sigma^{2}\tilde{K}$ and the distribution of $T_{\pi}{\bf X}\mid{\bf X}\in{\cal H}_{\pi}$ do not depend on $\pi\in\mathcal{P}$ and hence, the conditional expectation in (14) is constant in $\pi\in\mathcal{P}$. Since $|\mathcal{P}|=n!$, we obtain

$$P_{e}=1-\mathbb{E}\left[Q_{\sigma^{2}\tilde{K}}\left(-T_{\tau}{\bf X}\right)\;\middle|\;{\bf X}\in\mathcal{H}_{\tau}\right], \qquad (15)$$

where $\tau\in{\cal P}$ can be arbitrary. ∎

We note that (14) is a function of $\sigma$ and hence, in what follows we will use $P_{e}(\sigma)$ to highlight this dependence.

IV-B Low Noise Asymptotic

We here focus on the asymptotic behavior of the probability of error in the low noise regime (i.e., $\sigma\to 0$). In particular, the next result, proved in Appendix A, shows that the probability of error in this regime is approximately linear in $\sigma$.

Theorem 2.

Let ${\bf X}$ consist of $n$ i.i.d. random variables generated according to $X$. Let $X^{\prime}$ be an independent copy of $X$ and assume that

$$f_{X-X^{\prime}}(x)<\infty,~\forall x\in\mathbb{R}. \qquad (16)$$

Then,

$$\lim_{\sigma\to 0}\frac{P_{e}(\sigma)}{\sigma}=\sum_{i=1}^{n-1}\frac{f_{W_{i}}\left(0^{+}\right)}{\sqrt{\pi}}, \qquad (17)$$

where $W_{i}=X_{i+1:n}-X_{i:n},~i\in[1:n-1]$.
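As a quick sanity check (ours, following the steps of Appendix A), the case $n=2$ can be worked out directly: with $A=I_{2}$ and $\mathbf{b}=\mathbf{0}_{2}$, an error occurs exactly when the noise difference flips the order of the two samples, and $N_{2}-N_{1}\sim\mathcal{N}(0,2\sigma^{2})$, so

```latex
\begin{aligned}
P_{e}(\sigma) &= \mathbb{E}\left[Q\!\left(\frac{W_{1}}{\sqrt{2}\,\sigma}\right)\right],
\qquad W_{1}=X_{2:2}-X_{1:2},\\
\frac{P_{e}(\sigma)}{\sigma}
 &= \sqrt{2}\int_{0}^{\infty} Q(u)\, f_{W_{1}}\!\left(\sqrt{2}\,\sigma u\right){\rm d}u
 \;\xrightarrow[\ \sigma\to 0\ ]{}\;
 \sqrt{2}\, f_{W_{1}}\!\left(0^{+}\right)\int_{0}^{\infty} Q(u)\,{\rm d}u
 =\frac{f_{W_{1}}\!\left(0^{+}\right)}{\sqrt{\pi}},
\end{aligned}
```

which is (17) for $n=2$, where the last equality uses $\int_{0}^{\infty}Q(u)\,{\rm d}u=\frac{1}{\sqrt{2\pi}}$.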

Remark 1.

The i.i.d. assumption on ${\bf X}$ in Theorem 2 can be relaxed to the case of an exchangeable ${\bf X}$, provided that the following holds: for $1\leq i<j\leq n-1$,

$$f_{W_{i}}(u)<\infty,~\forall u\in\mathbb{R}_{+}, \qquad (18a)$$
$$f_{W_{i},W_{j}}(u,v)<\infty,~\forall(u,v)\in\mathbb{R}_{+}^{2}. \qquad (18b)$$

Under these conditions, for the case of an exchangeable ${\bf X}$ we have the same result as in Theorem 2.

Remark 2.

The quantity $f_{W_{i}}\left(0^{+}\right)$ in (17) can be computed as follows [20]:

$$\begin{aligned}f_{W_{i}}(0^{+})&=\frac{n!\int_{-\infty}^{\infty}F(x)^{i-1}\left(1-F(x)\right)^{n-i-1}f^{2}(x)\,{\rm d}x}{(i-1)!(n-i-1)!}\\ &=\frac{n!\,\mathbb{E}\left[U^{i-1}\left(1-U\right)^{n-i-1}f(F^{-1}(U))\right]}{(i-1)!(n-i-1)!}, \qquad (19)\end{aligned}$$

where the last step uses the probability integral transform and the quantile function theorem [21], with $U\sim\text{Unif}(0,1)$.
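The expression in (19) is easy to evaluate numerically. The sketch below (ours, with hypothetical helper names and scipy assumed available) integrates the first line of (19) and matches the closed forms derived in Appendix D, namely $f_{W_{i}}(0^{+})=\frac{n}{b-a}$ for $\text{Unif}(a,b)$ and $f_{W_{i}}(0^{+})=\lambda(n-i)$ for $\text{Exp}(\lambda)$.

```python
# Numerical evaluation of f_{W_i}(0+) via the integral in (19).
import math
from scipy import integrate, stats

def f_wi_at_zero(dist, n, i):
    F, f = dist.cdf, dist.pdf
    integrand = lambda x: F(x) ** (i - 1) * (1 - F(x)) ** (n - i - 1) * f(x) ** 2
    val, _ = integrate.quad(integrand, *dist.support())
    return math.factorial(n) * val / (math.factorial(i - 1) * math.factorial(n - i - 1))

n = 5
for i in range(1, n):
    print(f_wi_at_zero(stats.uniform(0, 1), n, i), n)          # expect n/(b-a) = 5
    print(f_wi_at_zero(stats.expon(scale=1.0), n, i), n - i)   # expect lambda*(n-i)
```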

We now show that, under the condition $\sup_{x\in\mathbb{R}}f_{X}(x)=c$, the low-noise slope of the probability of error for an i.i.d. ${\bf X}$ is upper bounded by a quantity that grows as $O(n^{2})$. In particular, we have the following lemma.

Lemma 1.

Assume that $\sup_{x\in\mathbb{R}}f_{X}(x)=c$, where $c\in\mathbb{R}$ is a constant. Then,

$$\lim_{\sigma\to 0}\frac{P_{e}(\sigma)}{\sigma}\leq c\,\frac{n(n-1)}{\sqrt{\pi}}. \qquad (20)$$
Proof:

By using the expression in (19), we have that

$$\begin{aligned}f_{W_{i}}(0^{+})&=\frac{n!\,\mathbb{E}\left[U^{i-1}\left(1-U\right)^{n-i-1}f(F^{-1}(U))\right]}{(i-1)!(n-i-1)!}\\ &\leq c\,\frac{n!\,\mathbb{E}\left[U^{i-1}\left(1-U\right)^{n-i-1}\right]}{(i-1)!(n-i-1)!}\\ &=c\,\frac{n!\int_{0}^{1}x^{i-1}\left(1-x\right)^{n-i-1}\,{\rm d}x}{(i-1)!(n-i-1)!}\\ &=c\,\frac{n!}{(i-1)!(n-i-1)!}\frac{\Gamma(n-i)\Gamma(i)}{\Gamma(n)}\\ &=cn,\end{aligned}$$

where $\Gamma(\cdot)$ is the gamma function and where the inequality follows by using the bound $f(F^{-1}(U))\leq c=\sup_{x\in\mathbb{R}}f(x)$. Hence, (17) can be upper bounded as

$$\lim_{\sigma\to 0}\frac{P_{e}(\sigma)}{\sigma}=\sum_{i=1}^{n-1}\frac{f_{W_{i}}(0^{+})}{\sqrt{\pi}}\leq c\,\frac{n(n-1)}{\sqrt{\pi}}. \qquad (21)$$

This concludes the proof of Lemma 1. ∎

We conclude this section by providing some examples of (17) for a few distributions.

Example 1. Consider $X\sim\text{Unif}(a,b),~0\leq a<b<\infty$. Then,

$$\lim_{\sigma\to 0}\frac{P_{e}(\sigma)}{\sigma}=\frac{n(n-1)}{(b-a)\sqrt{\pi}}. \qquad (22)$$

The proof of (22) can be found in Appendix D-A. $\diamond$
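A direct Monte Carlo sketch (ours) of (22) for $\text{Unif}(0,1)$: the ratio $P_{e}(\sigma)/\sigma$ should approach $\frac{n(n-1)}{\sqrt{\pi}}$ as $\sigma\to 0$. The sample size and noise levels are arbitrary choices.

```python
# Low-noise slope check for X ~ Unif(0,1): P_e(sigma)/sigma -> n(n-1)/sqrt(pi).
import numpy as np

rng = np.random.default_rng(0)
n, n_mc = 4, 1_000_000
for sigma in [0.05, 0.02, 0.01]:
    x = rng.uniform(0.0, 1.0, size=(n_mc, n))
    y = x + sigma * rng.standard_normal((n_mc, n))
    x_by_y = np.take_along_axis(x, np.argsort(y, axis=1), axis=1)
    pe = np.mean(np.any(np.diff(x_by_y, axis=1) < 0, axis=1))
    print(sigma, pe / sigma, n * (n - 1) / np.sqrt(np.pi))
```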

Example 2. Consider $X\sim\text{Exp}(\lambda),~\lambda>0$. Then,

$$\lim_{\sigma\to 0}\frac{P_{e}(\sigma)}{\sigma}=\frac{\lambda n(n-1)}{2\sqrt{\pi}}. \qquad (23)$$

The proof of (23) can be found in Appendix D-B. $\diamond$

Example 3. Consider $X\sim{\cal N}(0,1)$. Then,

$$\frac{\sqrt{2}\,n(n-1)}{6\pi}\leq\lim_{\sigma\to 0}\frac{P_{e}(\sigma)}{\sigma}\leq\frac{n(n-1)}{\sqrt{2}\,\pi}. \qquad (24)$$

Note that the upper bound in (24) follows from Lemma 1, where we used the fact that $c=\sup_{x\in\mathbb{R}}f(x)=\frac{1}{\sqrt{2\pi}}$. For the lower bound, we use the following inequality [22, Lemma 10.1]:

$$f(F^{-1}(u))\geq\sqrt{\frac{2}{\pi}}\min\{u,1-u\}\geq\sqrt{\frac{2}{\pi}}\,u(1-u), \qquad (25)$$

where the last step follows since $\min\{a,b\}\geq\frac{ab}{a+b},~\forall a>0,b>0$. Combining the expression in (19) and the bound in (25), we arrive at the following lower bound,

$$\begin{aligned}f_{W_{i}}(0^{+})&\geq\sqrt{\frac{2}{\pi}}\frac{n!\,\mathbb{E}\left[U^{i}\left(1-U\right)^{n-i}\right]}{(i-1)!(n-i-1)!}\\ &=\sqrt{\frac{2}{\pi}}\frac{i(n-i)}{n+1}, \qquad (26)\end{aligned}$$

which implies the lower bound in (24). $\diamond$

IV-C High Noise Asymptotic

We now focus on the asymptotic behavior of the probability of error in the high noise regime (i.e., $\sigma\to\infty$). It is not difficult to argue that if ${\bf X}$ is exchangeable, then we have that

$$\lim_{\sigma\to\infty}P_{e}(\sigma)=1-\frac{1}{n!}=P_{e}(\infty). \qquad (27)$$

The interpretation is that if $\sigma$ is large, then the output ${\bf Y}$ carries no information about ${\bf X}$, and the decoder can only rely on the prior knowledge; hence, the best thing that the decoder can do is to guess one of the $n!$ hypotheses.

The next result, proved in Appendix B, sharpens the limit in (27) by finding the rate of convergence.

Theorem 3.

Let ${\bf X}$ be an exchangeable random vector such that $\mathbb{E}[\|{\bf X}\|]<\infty$. Then,

$$\lim_{\sigma\to\infty}\frac{P_{e}(\infty)-P_{e}(\sigma)}{\frac{1}{\sigma}}=\frac{1}{\sqrt{2\pi}}\sum_{i=1}^{n-1}\alpha_{i}\mathbb{E}\left[W_{i}\right], \qquad (28)$$

where $W_{i}=X_{i+1:n}-X_{i:n},~i\in[1:n-1]$, and

$$\alpha_{i}=\frac{{\rm Vol}\left({\cal E}(\mathbf{0}_{n-1},i)\cap{\cal H}_{[1:n-1]}\right)}{{\rm Vol}\left({\cal B}(\mathbf{0}_{n-1},1)\right)}, \qquad (29)$$

where ${\cal H}_{[1:n-1]}$ is defined in (1), ${\cal B}(\mathbf{0}_{n-1},1)$ is the $(n-1)$-dimensional ball centered at the origin with unit radius, and ${\cal E}(\mathbf{0}_{n-1},i)$ is the $(n-1)$-dimensional ellipsoid centered at the origin with unit radii along the standard axes, except for a $\frac{1}{\sqrt{2}}$ radius along the $i$-th axis.

Finding a closed-form expression for the $\alpha_{i}$'s in (29) does not appear to be an easy task. In the next lemma, we provide upper and lower bounds on the $\alpha_{i}$'s, which lead to expressions that are amenable to computations.

Lemma 2.

In the high noise regime, the convergence rate of the probability of correctness can be bounded as

$$\frac{\mathbb{E}\left[R_{n}\right]}{\sqrt{\pi}\,(n-1)!\,2^{\frac{n}{2}}}\leq\lim_{\sigma\to\infty}\frac{P_{e}(\infty)-P_{e}(\sigma)}{\frac{1}{\sigma}}\leq\frac{\mathbb{E}\left[R_{n}\right]}{\sqrt{2\pi}\,(n-1)!},$$

where $R_{n}=X_{n:n}-X_{1:n}$.

Proof:

We start by observing that

$${\cal B}\left(\mathbf{0}_{n-1},2^{-\frac{1}{2}}\right)\subset{\cal E}(\mathbf{0}_{n-1},i)\subset{\cal B}\left(\mathbf{0}_{n-1},1\right), \qquad (30)$$

that is, the ellipsoid ${\cal E}(\mathbf{0}_{n-1},i)$: (i) contains the ball ${\cal B}\left(\mathbf{0}_{n-1},2^{-\frac{1}{2}}\right)$, since ${\cal E}(\mathbf{0}_{n-1},i)$ has minimum radius equal to $2^{-\frac{1}{2}}$; and (ii) is contained inside the ball ${\cal B}\left(\mathbf{0}_{n-1},1\right)$, since ${\cal E}(\mathbf{0}_{n-1},i)$ has maximum radius equal to $1$.

Thus, from (30) we obtain

$$\alpha_{i}\leq\frac{{\rm Vol}\left({\cal B}\left(\mathbf{0}_{n-1},1\right)\cap{\cal H}_{[1:n-1]}\right)}{{\rm Vol}\left({\cal B}(\mathbf{0}_{n-1},1)\right)}=\frac{1}{(n-1)!}, \qquad (31)$$

where the last equality follows since ${\cal H}_{[1:n-1]}$ is a cone that occupies a $\frac{1}{(n-1)!}$ fraction of the space and hence, ${\rm Vol}\left({\cal B}\left(\mathbf{0}_{n-1},1\right)\cap{\cal H}_{[1:n-1]}\right)=\frac{1}{(n-1)!}{\rm Vol}\left({\cal B}\left(\mathbf{0}_{n-1},1\right)\right)$.

Similarly, from (30) we obtain

$$\begin{aligned}\alpha_{i}&\geq\frac{{\rm Vol}\left({\cal B}\left(\mathbf{0}_{n-1},2^{-\frac{1}{2}}\right)\cap{\cal H}_{[1:n-1]}\right)}{{\rm Vol}\left({\cal B}(\mathbf{0}_{n-1},1)\right)}\\ &=\left|\det\left(2^{-\frac{1}{2}}I_{n-1}\right)\right|\frac{{\rm Vol}\left({\cal B}\left(\mathbf{0}_{n-1},1\right)\cap{\cal H}_{[1:n-1]}\right)}{{\rm Vol}\left({\cal B}(\mathbf{0}_{n-1},1)\right)}\\ &=\frac{1}{2^{\frac{n-1}{2}}(n-1)!}, \qquad (32)\end{aligned}$$

where in the first equality we have used the facts that: (i) $2^{\frac{1}{2}}I_{n-1}{\cal B}\left(\mathbf{0}_{n-1},2^{-\frac{1}{2}}\right)={\cal B}\left(\mathbf{0}_{n-1},1\right)$, (ii) $2^{\frac{1}{2}}I_{n-1}{\cal H}_{[1:n-1]}={\cal H}_{[1:n-1]}$, and (iii) ${\rm Vol}(A{\cal S})=|\det(A)|\,{\rm Vol}({\cal S})$ for any invertible matrix $A$ and any set ${\cal S}$.

The proof of Lemma 2 is concluded by substituting (31) and (32) into (28) and by using the fact that

$$\sum_{i=1}^{n-1}\mathbb{E}[W_{i}]=\mathbb{E}[R_{n}], \qquad (33)$$

where $R_{n}=X_{n:n}-X_{1:n}$ denotes the range [23] of ${\bf X}$. ∎
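The bounds of Lemma 2 are simple to probe numerically. The sketch below (ours) estimates the scaled gap $(P_{e}(\infty)-P_{e}(\sigma))\,\sigma$ for $X\sim\text{Unif}(0,1)$ at a few large $\sigma$ and compares it with the two bounds, using $\mathbb{E}[R_{n}]=\frac{n-1}{n+1}$ from Appendix E; the sample size and noise levels are assumptions.

```python
# High-noise rate check for X ~ Unif(0,1) against the bounds of Lemma 2.
import math
import numpy as np

rng = np.random.default_rng(0)
n, n_mc = 3, 2_000_000
pe_inf = 1.0 - 1.0 / math.factorial(n)
ern = (n - 1) / (n + 1)                                  # E[R_n] for Unif(0,1)
lower = ern / (math.sqrt(math.pi) * math.factorial(n - 1) * 2 ** (n / 2))
upper = ern / (math.sqrt(2 * math.pi) * math.factorial(n - 1))
for sigma in [5.0, 10.0, 20.0]:
    x = rng.uniform(0.0, 1.0, size=(n_mc, n))
    y = x + sigma * rng.standard_normal((n_mc, n))
    x_by_y = np.take_along_axis(x, np.argsort(y, axis=1), axis=1)
    pe = np.mean(np.any(np.diff(x_by_y, axis=1) < 0, axis=1))
    print(sigma, (pe_inf - pe) * sigma, (lower, upper))
```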

We conclude this section by providing some examples of the range $R_{n}$ for a few common distributions (see Appendix E for the detailed computations). In particular, these examples show that the term $\frac{1}{(n-1)!}$ dominates the expression of the rate for several distributions of interest.

Example 1. Consider $X\sim\text{Unif}(a,b),~0\leq a<b<\infty$. Then,

$$\mathbb{E}[R_{n}]=(b-a)\frac{n-1}{n+1}.$$

Example 2. Consider $X\sim\text{Exp}(\lambda),~\lambda>0$. Then,

$$\mathbb{E}[R_{n}]=\frac{1}{\lambda}\sum_{k=1}^{n-1}\frac{1}{k}=O\left(\frac{1}{\lambda}\log(n)\right). \qquad (34)$$

Example 3. Let $X$ be $\gamma^{2}$-sub-Gaussian (a random variable $X$ is $\gamma^{2}$-sub-Gaussian if $\mathbb{E}[e^{\lambda(X-\mathbb{E}[X])}]\leq e^{\frac{\lambda^{2}\gamma^{2}}{2}}$ for all $\lambda\in\mathbb{R}$). Then [22],

$$\mathbb{E}[R_{n}]\leq 2\sqrt{2\gamma^{2}\log(n)}.$$

Appendix A Proof of Theorem 2

Before proceeding with the proof of Theorem 2, we first present two ancillary results.

Lemma 3.

Let ${\bf X}$ consist of $n$ i.i.d. random variables generated according to $X$. Let $X^{\prime}$ be an independent copy of $X$ and assume that

$$f_{X-X^{\prime}}(x)<\infty,~\forall x\in\mathbb{R}.$$

Then, the following holds:

$$f_{W_{i}}(u)<\infty,~\forall u\in\mathbb{R}_{+},\ 1\leq i\leq n-1,$$
$$f_{W_{i},W_{j}}(u,v)<\infty,~\forall(u,v)\in\mathbb{R}_{+}^{2},\ 1\leq i<j\leq n-1,$$

where $W_{i}=X_{i+1:n}-X_{i:n},~i\in[1:n-1]$.

Proof:

The proof is provided in Appendix C. ∎

Lemma 4.

Let ${\bf V}\sim{\cal N}(\mathbf{0}_{n-1},\tilde{K})$, where $\tilde{K}$ is defined in (12). Then, for any subset ${\cal I}\subseteq[1:n-1]$,

$$\Pr\left(\bigcap_{i\in{\cal I}}\{V_{i}\leq t_{i}\}\right)\leq\prod_{i\in{\cal I}}\Pr\left(\{V_{i}\leq t_{i}\}\right). \qquad (35)$$
Proof:

The bound in (35) holds if the random vector ${\bf V}$ consists of negatively associated random variables [24]. Observe that the Gaussian random vector ${\bf V}\sim{\cal N}(\mathbf{0}_{n-1},\tilde{K})$ consists of either negatively correlated or independent random variables (see the structure of $\tilde{K}$ in (12)). As was shown in [24], this implies that the random variables in ${\bf V}$ are negatively associated. This concludes the proof of Lemma 4. ∎

We now leverage the two lemmas above to prove Theorem 2. From Corollary 1 we have that

$$P_{e}=1-\mathbb{E}\left[\Pr\left({\bf V}\geq-\frac{T_{\tau}{\bf X}}{\sigma}\;\middle|\;{\bf X}\right)\;\middle|\;{\bf X}\in\mathcal{H}_{\tau}\right], \qquad (36)$$

where ${\bf V}\sim{\cal N}(\mathbf{0}_{n-1},\tilde{K})$. The expression in (36) can be equivalently written as

Pe\displaystyle P_{e} =𝔼[1Pr(i=1n1{ViXτiXτi+1σ}|𝐗)|𝐗τ]\displaystyle=\mathbb{E}\left[1\!-\!\Pr\!\left(\bigcap_{i=1}^{n-1}\left\{V_{i}\geq\frac{X_{\tau_{i}}\!-\!X_{\tau_{i+1}}}{\sigma}\right\}\;\middle|\;{\bf X}\right)\;\middle|\;{\bf X}\in{\cal H}_{\tau}\right]
=𝔼[Pr(i=1n1{Vi<XτiXτi+1σ}|𝐗)|𝐗τ]\displaystyle=\mathbb{E}\left[\Pr\left(\bigcup_{i=1}^{n-1}\left\{V_{i}<\frac{X_{\tau_{i}}\!-\!X_{\tau_{i+1}}}{\sigma}\right\}\;\middle|\;{\bf X}\right)\;\middle|\;{\bf X}\in{\cal H}_{\tau}\right]
=(a)Pr(i=1n1{Vi<XτiXτi+1σ}|𝐗τ)\displaystyle\overset{\rm(a)}{=}\Pr\left(\bigcup_{i=1}^{n-1}\left\{V_{i}<\frac{X_{\tau_{i}}\!-\!X_{\tau_{i+1}}}{\sigma}\right\}\;\middle|\;{\bf X}\in{\cal H}_{\tau}\right)
=(b)k=1n1((1)k1[1:n1]||=kPr(𝒜)),\displaystyle\overset{\rm(b)}{=}\sum_{k=1}^{n-1}\left((-1)^{k-1}\!\sum_{\begin{subarray}{c}{\cal I}\subseteq[1:n-1]\\ |{\cal I}|=k\end{subarray}}\Pr\left({\cal A}_{{\cal I}}\right)\right), (37)

where $\rm{(a)}$ is due to the law of total expectation, and $\rm{(b)}$ follows from the inclusion-exclusion principle, where ${\cal A}_{{\cal I}}=\cap_{i\in{\cal I}}{\cal A}_{i}$ with ${\cal A}_{i}=\{V_{i}<\sigma^{-1}(X_{\tau_{i}}-X_{\tau_{i+1}})\mid{\bf X}\in{\cal H}_{\tau}\}$.

From the expression in (37) it follows that

limσ0Peσ\displaystyle\lim_{\sigma\to 0}\frac{P_{e}}{\sigma} =k=1n1((1)k1[1:n1]||=klimσ01σPr(𝒜)).\displaystyle=\sum_{k=1}^{n-1}\left((-1)^{k-1}\sum_{\begin{subarray}{c}{\cal I}\subseteq[1:n-1]\\ |{\cal I}|=k\end{subarray}}\lim_{\sigma\to 0}\frac{1}{\sigma}\Pr\left({\cal A}_{{\cal I}}\right)\right). (38)

In what follows, we therefore analyze $\Pr\left({\cal A}_{{\cal I}}\right)$. We have that

Pr(𝒜)\displaystyle\Pr\left({\cal A}_{{\cal I}}\right) =𝔼[Pr(i{Vi<XτiXτi+1σ}|𝐗)|𝐗τ]\displaystyle=\mathbb{E}\left[\Pr\!\left(\bigcap_{i\in{\cal I}}\left\{V_{i}<\frac{X_{\tau_{i}}\!-\!X_{\tau_{i+1}}}{\sigma}\right\}\!\;\middle|\;\!{\bf X}\right)\!\;\middle|\;\!{\bf X}\in{\cal H}_{\tau}\right]
=𝔼[Pr(i{Vi<Xi:nXi+1:nσ}|𝐗)]\displaystyle=\mathbb{E}\left[\Pr\!\left(\bigcap_{i\in{\cal I}}\left\{V_{i}<\frac{X_{i:n}\!-\!X_{i+1:n}}{\sigma}\right\}\;\middle|\;{\bf X}\right)\right]
=𝔼[Pr(i{Vi<Wiσ}|𝐖)],\displaystyle=\mathbb{E}\left[\Pr\!\left(\bigcap_{i\in{\cal I}}\left\{V_{i}<\frac{-W_{i}}{\sigma}\right\}\;\middle|\;{\bf W}_{{\cal I}}\right)\right], (39)

where ${\bf W}_{{\cal I}}$ is a $k$-dimensional random vector with entries $W_{i}=X_{i+1:n}-X_{i:n}$ for $i\in{\cal I}$.

We next consider two separate cases.

• Case 1: $k=1$. Let ${\cal I}=\{i\}$; then, we can write (39) as

Pr(𝒜)\displaystyle\Pr\left({\cal A}_{{\cal I}}\right) =𝔼[Pr(Vi<Wiσ|Wi)]\displaystyle=\mathbb{E}\left[\Pr\left(V_{i}<\frac{-W_{i}}{\sigma}\;\middle|\;W_{i}\right)\right]
=𝔼[Q(Wi2σ)]\displaystyle=\mathbb{E}\left[Q\left(\frac{W_{i}}{\sqrt{2}\sigma}\right)\right]
=0Q(w2σ)fWi(w)dw\displaystyle=\int_{0}^{\infty}Q\left(\frac{w}{\sqrt{2}\sigma}\right)f_{W_{i}}\left(w\right){\rm d}w
=0Q(u)fWi(2σu)2σdu,\displaystyle=\int_{0}^{\infty}Q\left(u\right)f_{W_{i}}\left(\sqrt{2}\sigma u\right)\sqrt{2}\sigma\ {\rm d}u,

where the last equality follows by applying the change of variable. Thus, we have that

limσ01σPr(𝒜)\displaystyle\lim_{\sigma\to 0}\frac{1}{\sigma}\Pr\left({\cal A}_{{\cal I}}\right) =20limσ0Q(u)fWi(2σu)du\displaystyle=\sqrt{2}\int_{0}^{\infty}\lim_{\sigma\to 0}Q\left(u\right)f_{W_{i}}\left(\sqrt{2}\sigma u\right){\rm d}u
=2fWi(0+)0Q(u)du\displaystyle=\sqrt{2}f_{W_{i}}\left(0^{+}\right)\int_{0}^{\infty}Q\left(u\right){\rm d}u
=fWi(0+)π,\displaystyle=\frac{f_{W_{i}}\left(0^{+}\right)}{\sqrt{\pi}}, (40)

where the first equality follows from the dominated convergence theorem, which applies since, for any $\sigma$, $Q(u)f_{W_{i}}(\sqrt{2}\sigma u)\leq Q(u)\max_{t}f_{W_{i}}(t)<\infty$, where the fact that $f_{W_{i}}(t)<\infty,~\forall t\in\mathbb{R}_{+}$ is shown in Lemma 3. $\square$

• Case 2: $k\geq 2$. By using the bound in Lemma 4, we obtain

Pr(𝒜)\displaystyle\Pr\left({\cal A}_{{\cal I}}\right) 𝔼[iPr(Vi<Wiσ|𝐖)]\displaystyle\leq\mathbb{E}\left[\prod_{i\in{\cal I}}\Pr\!\left(V_{i}<\frac{-W_{i}}{\sigma}\;\middle|\;{\bf W}_{{\cal I}}\right)\right]
=𝔼[iQ(Wi2σ)]\displaystyle=\mathbb{E}\left[\prod_{i\in{\cal I}}Q\!\left(\frac{W_{i}}{\sqrt{2}\sigma}\right)\right]
𝔼[i𝒥𝒥,|𝒥|=2Q(Wi2σ)].\displaystyle\leq\mathbb{E}\left[\prod_{\begin{subarray}{c}i\in{\cal J}\\ {\cal J}\subset{\cal I},|{\cal J}|=2\end{subarray}}Q\!\left(\frac{W_{i}}{\sqrt{2}\sigma}\right)\right]. (41)

By letting ${\cal J}=\{s,t\}\subset{\cal I}$ in (41), we obtain that

limσ0Pr(𝒜)σ\displaystyle\lim_{\sigma\to 0}\frac{\Pr\left({\cal A}_{{\cal I}}\right)}{\sigma}
limσ0σ00Q(w2)Q(z2)fWs,Wt(σw,σz)dwdz\displaystyle\leq\lim_{\sigma\to 0}\sigma\!\!\int_{0}^{\infty}\!\!\!\int_{0}^{\infty}\!\!Q\!\left(\frac{w}{\sqrt{2}}\right)\!Q\!\left(\frac{z}{\sqrt{2}}\right)\!f_{W_{s},W_{t}}(\sigma w,\sigma z)\ {\rm d}w\ {\rm d}z
limσ0σfWs,Wt(0+,0+)π=0,\displaystyle\leq\lim_{\sigma\to 0}\sigma\frac{f_{W_{s},W_{t}}(0^{+},0^{+})}{\pi}=0, (42)

where the equality follows from Lemma 3, and the second inequality is due to the fact that

limσ000Q(w2)Q(z2)fWs,Wt(σw,σz)dwdz\displaystyle\lim_{\sigma\to 0}\int_{0}^{\infty}\!\!\!\int_{0}^{\infty}\!\!Q\!\left(\frac{w}{\sqrt{2}}\right)\!Q\!\left(\frac{z}{\sqrt{2}}\right)\!f_{W_{s},W_{t}}(\sigma w,\sigma z)\ {\rm d}w\ {\rm d}z
=(a)00Q(w2)Q(z2)fWs,Wt(0+,0+)dwdz\displaystyle\stackrel{{\scriptstyle{\rm{(a)}}}}{{=}}\int_{0}^{\infty}\!\!\!\int_{0}^{\infty}\!\!Q\!\left(\frac{w}{\sqrt{2}}\right)\!Q\!\left(\frac{z}{\sqrt{2}}\right)\!f_{W_{s},W_{t}}(0^{+},0^{+})\ {\rm d}w\ {\rm d}z
=fWs,Wt(0+,0+)π<,\displaystyle=\frac{f_{W_{s},W_{t}}(0^{+},0^{+})}{\pi}<\infty, (43)

where $\rm{(a)}$ follows from the dominated convergence theorem, which is verifiable by means of Lemma 3. $\square$

By using the limits in (40) and (42) inside (38), we obtain

limσ0Peσ\displaystyle\lim_{\sigma\to 0}\frac{P_{e}}{\sigma} =(a)[1:n1]||=1limσ01σPr(𝒜)\displaystyle\overset{\rm(a)}{=}\sum_{\begin{subarray}{c}{\cal I}\subseteq[1:n-1]\\ |{\cal I}|=1\end{subarray}}\lim_{\sigma\to 0}\frac{1}{\sigma}\Pr\left({\cal A}_{{\cal I}}\right)
=(b)i=1n1fWi(0+)π,\displaystyle\overset{\rm(b)}{=}\sum_{i=1}^{n-1}\frac{f_{W_{i}}\left(0^{+}\right)}{\sqrt{\pi}}, (44)

where $\rm(a)$ follows from (42), and $\rm(b)$ follows from (40). This concludes the proof of Theorem 2.

Appendix B Proof of Theorem 3

We start by noting that, in view of the limit in (27), we have that $\lim_{\sigma\to\infty}P_{c}=\frac{1}{n!}$. We now consider the following limit,

$$\lim_{\sigma\to\infty}\frac{P_{c}-\frac{1}{n!}}{\frac{1}{\sigma}}. \qquad (45)$$

Instead of working with $\sigma$, we parameterize the problem in terms of $\sigma=\frac{1}{\kappa}$. Then, (45) can be equivalently expressed as

$$\lim_{\sigma\to\infty}\frac{P_{c}-\frac{1}{n!}}{\frac{1}{\sigma}}=\lim_{\kappa\to 0}\frac{P_{c}-\frac{1}{n!}}{\kappa}=\lim_{\kappa\to 0}\frac{\partial P_{c}}{\partial\kappa}, \qquad (46)$$

where the last equality can be argued by using the definition of the derivative or L'Hôpital's rule.

From Corollary 1, the probability of correctness is given by

Pc=1Pe\displaystyle P_{c}=1-P_{e} =𝔼[Qσ2K~(Tτ𝐗)|𝐗τ]\displaystyle=\mathbb{E}\left[Q_{\sigma^{2}\tilde{K}}\!\left(-T_{\tau}{\bf X}\right)\;\middle|\;{\bf X}\in\mathcal{H}_{\tau}\right]
=𝔼[Pr(1κ𝐕𝐖|𝐖)]\displaystyle=\mathbb{E}\left[\Pr\left(\frac{1}{\kappa}{\bf V}\geq-{\bf W}\;\middle|\;{\bf W}\right)\right]
=𝔼[𝐯n11{𝐯κ𝐖}f𝐕(𝐯)d𝐯],\displaystyle=\mathbb{E}\left[\int_{{\bf v}\in\mathbb{R}^{n-1}}\!1_{\left\{{\bf v}\geq-\kappa{\bf W}\right\}}f_{{{\bf V}}}({\bf v}){\rm d}{\bf v}\right], (47)

where we let ${\bf V}\sim{\cal N}(\mathbf{0}_{n-1},\tilde{K})$ and ${\bf W}=T_{\tau}{\bf X}\mid{\bf X}\in{\cal H}_{\tau}$, and we used the exchangeability of ${\bf X}$.

Using the expression in (47), the derivative of $P_{c}$ with respect to $\kappa$ is now given by

Pcκ\displaystyle\frac{\partial P_{c}}{\partial\kappa} =(a)𝔼[𝐯n1κ1{𝐯κ𝐖}f𝐕(𝐯)d𝐯]\displaystyle\overset{\rm(a)}{=}\mathbb{E}\left[\int_{{\bf v}\in\mathbb{R}^{n-1}}\!\frac{\partial}{\partial\kappa}1_{\left\{{\bf v}\geq-\kappa{{\bf W}}\right\}}f_{{{\bf V}}}({\bf v}){\rm d}{\bf v}\right]
=(b)𝔼[𝐯n1(κ,𝐯,𝐖)f𝐕(𝐯)d𝐯],\displaystyle\overset{\rm(b)}{=}\mathbb{E}\left[\int_{{\bf v}\in\mathbb{R}^{n-1}}\!\bigtriangleup(\kappa,{\bf v},{\bf W})f_{{{\bf V}}}({\bf v}){\rm d}{\bf v}\right], (48)

where in $\rm(a)$ we used Leibniz's integral rule, and $\rm(b)$ follows since

κ1{𝐯κ𝐖}\displaystyle\frac{\partial}{\partial\kappa}1_{\left\{{\bf v}\geq-\kappa{{\bf W}}\right\}} =κi=1n11{viκWi}\displaystyle=\frac{\partial}{\partial\kappa}\prod_{i=1}^{n-1}1_{\left\{v_{i}\geq-\kappa W_{i}\right\}}
=(b1)κi=1n11{viWiκ}\displaystyle{\overset{\rm(b1)}{=}}\frac{\partial}{\partial\kappa}\prod_{i=1}^{n-1}1_{\left\{\frac{-v_{i}}{W_{i}}\leq\kappa\right\}}
=(b2)i=1n1δ(κ+viWi)j=1jin11{vjWjκ}\displaystyle{\overset{\rm(b2)}{=}}\sum_{i=1}^{n-1}\delta\left(\kappa+\frac{v_{i}}{W_{i}}\right)\prod_{\begin{subarray}{c}j=1\\ j\neq i\end{subarray}}^{n-1}1_{\left\{\frac{-v_{j}}{W_{j}}\leq\kappa\right\}}
=(b3)i=1n1Wiδ(κWi+vi)j=1jin11{vjWjκ}\displaystyle{\overset{\rm(b3)}{=}}\sum_{i=1}^{n-1}W_{i}\delta\left(\kappa W_{i}+v_{i}\right)\prod_{\begin{subarray}{c}j=1\\ j\neq i\end{subarray}}^{n-1}1_{\left\{\frac{-v_{j}}{W_{j}}\leq\kappa\right\}}
(κ,𝐯,𝐖),\displaystyle\triangleq\bigtriangleup(\kappa,{\bf v},{\bf W}), (49)

where the labeled equalities follow from: $\rm(b1)$ the fact that each entry $W_{i},i\in[1:n-1]$ of ${\bf W}$ is positive (we ignore the case $(T_{\tau}{\bf X})_{i}=0$; in fact, $f_{W_{i}}(0)=0$ for an i.i.d. ${\bf X}$); $\rm(b2)$ the product rule and the fact that $\frac{\partial}{\partial x}1_{\{0\leq x+t\}}=\delta(x+t)$, where $\delta(x)$ is the Dirac delta function; $\rm(b3)$ the scaling property of the Dirac delta function.

We now consider the integral inside the expectation in (48). By using the sifting property of the Dirac delta function, the integral becomes

𝐯n1(κ,𝐯,𝐖)f𝐕(𝐯)d𝐯\displaystyle\int_{{\bf v}\in\mathbb{R}^{n-1}}\!\bigtriangleup(\kappa,{\bf v},{\bf W})f_{{{\bf V}}}({\bf v}){\rm d}{\bf v}
=i=1n1Wi𝐮n2j=1jin21{ujWjκ}f𝐕i,Vi(𝐮,κWi)d𝐮,\displaystyle\!=\!\sum_{i=1}^{n-1}W_{i}\!\int_{{\bf u}\in\mathbb{R}^{n-2}}\!\prod_{\begin{subarray}{c}j=1\\ j\neq i\end{subarray}}^{n-2}\!1_{\left\{\frac{-u_{j}}{W_{j}}\leq\kappa\right\}}f_{{{\bf V}}_{\setminus i},V_{i}}({\bf u},-\kappa W_{i}){\rm d}{\bf u}, (50)

where ${\bf V}_{\setminus i}$ is obtained by retaining all the entries of ${\bf V}$ except the $i$-th one. We next substitute (50) inside (48) and compute the limit in (46). We obtain

limκ0Pcκ\displaystyle\lim_{\kappa\to 0}\frac{\partial P_{c}}{\partial\kappa} =(a)𝔼[i=1n1Wi𝐮n2j=1jin21{uj0}f𝐕i,Vi(𝐮,0)d𝐮]\displaystyle\overset{\rm(a)}{=}\mathbb{E}\!\left[\sum_{i=1}^{n-1}W_{i}\int_{{\bf u}\in\mathbb{R}^{n-2}}\!\!\prod_{\begin{subarray}{c}j=1\\ j\neq i\end{subarray}}^{n-2}\!1_{\left\{u_{j}\geq 0\right\}}f_{{{\bf V}}_{\setminus i},V_{i}}({\bf u},0)\ {\rm d}{\bf u}\right]
=i=1n1𝔼[Wi]𝐮+n2f𝐕i,Vi(𝐮,0)d𝐮\displaystyle=\sum_{i=1}^{n-1}\mathbb{E}\left[W_{i}\right]\int_{{\bf u}\in\mathbb{R}^{n-2}_{+}}f_{{{\bf V}}_{\setminus i},V_{i}}({\bf u},0)\ {\rm d}{\bf u}
=(b)i=1n1𝔼[Wi]Pr(𝐕i𝟎n2Vi=0)fVi(0),\displaystyle\stackrel{{\scriptstyle\rm(b)}}{{=}}\!\sum_{i=1}^{n-1}\!\mathbb{E}\!\left[W_{i}\right]\!\Pr({{\bf V}}_{\setminus i}\!\geq\!\mathbf{0}_{n-2}\mid V_{i}=0)f_{V_{i}}(0), (51)

where the labeled equalities follow from: $\rm(a)$ the dominated convergence theorem, which applies since $W_{i}\prod_{j\neq i}1_{\left\{u_{j}\geq 0\right\}}f_{{\bf V}_{\setminus i},V_{i}}({\bf u},0)\leq W_{i}f_{{\bf V}_{\setminus i},V_{i}}({\bf u},0)$, where $W_{i}$ is assumed to be absolutely integrable; and $\rm(b)$ the following,

𝐮+n2f𝐕i,Vi(𝐮,0)d𝐮\displaystyle\int_{{\bf u}\in\mathbb{R}^{n-2}_{+}}f_{{{\bf V}}_{\setminus i},V_{i}}({\bf u},0)\ {\rm d}{\bf u} =𝐮+n2f𝐕i|Vi(𝐮0)fVi(0)d𝐮\displaystyle=\int_{{\bf u}\in\mathbb{R}_{+}^{n-2}}\!f_{{{\bf V}}_{\setminus i}|V_{i}}({\bf u}\mid 0)f_{V_{i}}(0){\rm d}{\bf u}
=Pr(𝐕i𝟎n2|Vi=0)fVi(0).\displaystyle=\Pr({{\bf V}}_{\setminus i}\geq\mathbf{0}_{n-2}|V_{i}=0)f_{V_{i}}(0).

To finalize the proof, it remains to compute $\Pr({\bf V}_{\setminus i}\geq\mathbf{0}_{n-2}\mid V_{i}=0)=\alpha_{i}$. This can be done as follows,

αi=Pr(𝐕i𝟎n2Vi=0)\displaystyle\alpha_{i}=\Pr({{\bf V}}_{\setminus i}\geq\mathbf{0}_{n-2}\mid V_{i}=0)
=(a)Pr(j=1,jin1{Zj+1Zj0}|Zi+1Zi=0)\displaystyle\stackrel{{\scriptstyle\rm(a)}}{{=}}\Pr\!\left(\bigcap_{{j=1,j\neq i}}^{n-1}\{Z_{j+1}-Z_{j}\geq 0\}\;\middle|\;Z_{i+1}-Z_{i}=0\right)
=Pr({Z1Zi}{Zi+1Zn}Zi+1=Zi)\displaystyle=\Pr\!\left(\{Z_{1}\leq\cdots\leq Z_{i}\}\cap\{Z_{i+1}\leq\cdots\leq Z_{n}\}\mid Z_{i+1}=Z_{i}\right)
=(b)𝔼[Pr({Zi}{Zi+1}|Zi+1,Zi)|Zi+1=Zi]\displaystyle\stackrel{{\scriptstyle\rm(b)}}{{=}}\mathbb{E}\!\left[\Pr\!\left(\{\cdots\leq Z_{i}\}\cap\{Z_{i+1}\leq\cdots\}|Z_{i+1},Z_{i}\right)|Z_{i+1}=Z_{i}\right]
=Pr({t}{t})fZi,Zi+1Zi=Zi+1(t,t)dt\displaystyle=\int_{-\infty}^{\infty}\Pr\!\left(\{\cdots\leq t\}\cap\{t\leq\cdots\}\right)f_{Z_{i},Z_{i+1}\mid Z_{i}=Z_{i+1}}(t,t)\ {\rm d}t
=(c)Pr({Z1t}{tZn})f12Z(t)dt\displaystyle\stackrel{{\scriptstyle\rm(c)}}{{=}}\int_{-\infty}^{\infty}\Pr\!\left(\{Z_{1}\leq\cdots\leq t\}\cap\{t\leq\cdots\leq Z_{n}\}\right)f_{\frac{1}{\sqrt{2}}{Z}}(t)\ {\rm d}t
=Pr(Z1Zi112ZiZi+2Zn)\displaystyle=\Pr\!\left(Z_{1}\leq\cdots\leq Z_{i-1}\leq\frac{1}{\sqrt{2}}{Z}_{i}\leq Z_{i+2}\leq\cdots\leq Z_{n}\right)
=(d)Pr(Ai𝐙[1:n1])\displaystyle\stackrel{{\scriptstyle\rm(d)}}{{=}}\Pr\!\left(A_{i}{\bf Z}\in{\cal H}_{[1:n-1]}\right)
=Pr(𝐙Ai1[1:n1])\displaystyle=\Pr\!\left({\bf Z}\in A_{i}^{-1}{\cal H}_{[1:n-1]}\right)
=(e)Vol((𝟎n1,1)Ai1[1:n1])Vol((𝟎n1,1))\displaystyle\overset{\rm(e)}{=}\frac{{\rm Vol}\left({\cal B}(\mathbf{0}_{n-1},1)\cap A_{i}^{-1}{\cal H}_{[1:n-1]}\right)}{{\rm Vol}\left({\cal B}(\mathbf{0}_{n-1},1)\right)}
=(f)|det(Ai1)|Vol(Ai(𝟎n1,1)[1:n1])Vol((𝟎n1,1))\displaystyle\overset{\rm(f)}{=}\left|{\hbox{det}}\left(A_{i}^{-1}\right)\right|\frac{{\rm Vol}\left(A_{i}{\cal B}(\mathbf{0}_{n-1},1)\cap{\cal H}_{[1:n-1]}\right)}{{\rm Vol}\left({\cal B}(\mathbf{0}_{n-1},1)\right)}
=(g)2Vol((𝟎n1,i)[1:n1])Vol((𝟎n1,1)),\displaystyle\overset{\rm(g)}{=}\sqrt{2}\frac{{\rm Vol}\left({\cal E}(\mathbf{0}_{n-1},i)\cap{\cal H}_{[1:n-1]}\right)}{{\rm Vol}\left({\cal B}(\mathbf{0}_{n-1},1)\right)}, (52)

where the labeled equalities follow from: $\rm(a)$ the definition of ${\bf V}$ and rewriting it in terms of standard normal random variables; $\rm(b)$ the law of total expectation, where we abbreviate $\{\cdots\leq Z_{i}\}\cap\{Z_{i+1}\leq\cdots\}\triangleq\{Z_{1}\leq\cdots\leq Z_{i}\}\cap\{Z_{i+1}\leq\cdots\leq Z_{n}\}$; $\rm(c)$ using the fact that

fZi,Zi+1Zi=Zi+1(t,t)\displaystyle f_{Z_{i},Z_{i+1}\mid Z_{i}=Z_{i+1}}(t,t) =fZi,Zi+1(t,t)fZi,Zi+1(z,z)dz\displaystyle=\frac{f_{Z_{i},Z_{i+1}}(t,t)}{\int_{-\infty}^{\infty}f_{Z_{i},Z_{i+1}}(z,z)\ {\rm d}z}
=f12Z(t);\displaystyle=f_{\frac{1}{\sqrt{2}}{Z}}(t);

$\rm(d)$ letting $\mathbf{Z}\sim\mathcal{N}(\mathbf{0}_{n-1},I_{n-1})$, defining a diagonal matrix $A_{i}\in\mathbb{R}^{(n-1)\times(n-1)}$ with the $i$-th diagonal element equal to $\frac{1}{\sqrt{2}}$ and the others equal to one, and recalling that from (1) we have ${\cal H}_{[1:n-1]}=\{{\bf x}\in\mathbb{R}^{n-1}:x_{1}\leq\cdots\leq x_{n-1}\}$; $\rm(e)$ using the $(n-1)$-dimensional volume expression for the probability of a standard normal vector [2]; $\rm(f)$ the fact that ${\rm Vol}(A{\cal S})=|\det(A)|\,{\rm Vol}({\cal S})$ for any invertible matrix $A$ and any set ${\cal S}$; $\rm(g)$ letting ${\cal E}(\mathbf{0}_{n-1},i)$ be the $(n-1)$-dimensional ellipsoid centered at the origin with unit radii along the standard axes, except for a $\frac{1}{\sqrt{2}}$ radius along the $i$-th axis.

Substituting (52) into (51), and noting that $f_{V_{i}}(0)=\frac{1}{2\sqrt{\pi}}$ for all $i\in[1:n-1]$, we obtain

limσPe()Pe(σ)1σ\displaystyle\lim_{\sigma\to\infty}\frac{P_{e}(\infty)-P_{e}(\sigma)}{\frac{1}{\sigma}} =12πi=1n1αi𝔼[Wi],\displaystyle=\frac{1}{\sqrt{2\pi}}\sum_{i=1}^{n-1}\alpha_{i}\mathbb{E}\left[W_{i}\right], (53)

where, for all i[1:n1]i\in[1:n-1], we have

αi\displaystyle\alpha_{i} =Vol((𝟎n1,i)[1:n1])Vol((𝟎n1,1)).\displaystyle=\frac{{\rm Vol}\left({\cal E}(\mathbf{0}_{n-1},i)\cap{\cal H}_{[1:n-1]}\right)}{{\rm Vol}\left({\cal B}(\mathbf{0}_{n-1},1)\right)}. (54)

This concludes the proof of Theorem 3.

Appendix C Proof of Lemma 3

We start by noting that the joint density function $f_{W_{i},W_{j}}(u,v),~1\leq i<j\leq n-1$, is given by [20]

fWi,Wj(u,v)\displaystyle f_{W_{i},W_{j}}(u,v)
=n!x+uF(y)i2(i2)!(F(x)F(x+u))ji2(ji2)!\displaystyle=n!\!\int_{-\infty}^{\infty}\int_{x+u}^{\infty}\!\frac{F(y)^{i-2}}{(i-2)!}\frac{(F(x)\!-\!F(x+u))^{j-i-2}}{(j-i-2)!}
×(1F(y+v))nj(nj)!f(x)f(x+u)f(y)f(y+v)dydx,\displaystyle~{}~{}~{}\times\!\frac{(1\!-\!F(y\!+\!v))^{n-j}}{(n-j)!}f(x)f(x\!+\!u)f(y)f(y\!+\!v)\ {\rm d}y\ {\rm d}x,

where $F(\cdot)$ is the cumulative distribution function of $X$ and $f(\cdot)$ is the probability density function of $X$.

By using the upper bounds $F(x)\leq 1$ and $1-F(x)\leq 1$, we obtain

fWi,Wj(u,v)\displaystyle f_{W_{i},W_{j}}(u,v)
n!x+uf(x)f(x+u)f(y)f(y+v)dydx(i2)!(ji2)!(nj)!\displaystyle\leq n!\frac{\int_{-\infty}^{\infty}\int_{x+u}^{\infty}f(x)f(x+u)f(y)f(y+v)\ {\rm d}y\ {\rm d}x}{(i-2)!(j-i-2)!(n-j)!}
n!f(x)f(x+u)f(y)f(y+v)dydx(i2)!(ji2)!(nj)!\displaystyle\leq n!\frac{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f(x)f(x+u)f(y)f(y+v)\ {\rm d}y\ {\rm d}x}{(i-2)!(j-i-2)!(n-j)!}
=n!f(x)f(x+u)dxf(y)f(y+v)dy(i2)!(ji2)!(nj)!\displaystyle=n!\frac{\int_{-\infty}^{\infty}f(x)f(x+u)\ {\rm d}x\int_{-\infty}^{\infty}f(y)f(y+v)\ {\rm d}y}{(i-2)!(j-i-2)!(n-j)!}
=n!fXX(u)fXX(v)(i2)!(ji2)!(nj)!\displaystyle=\frac{n!f_{X-X^{\prime}}(u)f_{X-X^{\prime}}(v)}{(i-2)!(j-i-2)!(n-j)!}
<,\displaystyle<\infty,

where the second inequality follows since the integrand is always positive, and the last inequality is due to the assumption that $f_{X-X^{\prime}}(x)<\infty,~\forall x\in\mathbb{R}$. This shows that the joint density is bounded everywhere.

For the marginal density, we obtain

fWi(u)\displaystyle f_{W_{i}}(u) =fWi,Wj(u,v)dv\displaystyle=\int_{-\infty}^{\infty}\!f_{W_{i},W_{j}}(u,v)\ {\rm d}v
n!fXX(u)fXX(v)(i2)!(ji2)!(nj)!dv\displaystyle\leq\int_{-\infty}^{\infty}\!\frac{n!f_{X-X^{\prime}}(u)f_{X-X^{\prime}}(v)}{(i-2)!(j-i-2)!(n-j)!}\ {\rm d}v
=n!fXX(u)(i2)!(ji2)!(nj)!\displaystyle=\frac{n!f_{X-X^{\prime}}(u)}{(i-2)!(j-i-2)!(n-j)!}
<,\displaystyle<\infty,

where the inequality follows from the fact that we have shown above that

fWi,Wj(u,v)n!fXX(u)fXX(v)(i2)!(ji2)!(nj)!.\displaystyle f_{W_{i},W_{j}}(u,v)\leq\frac{n!f_{X-X^{\prime}}(u)f_{X-X^{\prime}}(v)}{(i-2)!(j-i-2)!(n-j)!}.

This concludes the proof of Lemma 3.

Appendix D Examples for the Low Noise Regime

D-A Uniform Distribution

For $X\sim\text{Unif}(a,b),~0\leq a<b<\infty$, using the formula in (19), we have that

fWi(0+)\displaystyle f_{W_{i}}(0^{+}) =n!ab(xaba)i1(bxba)ni1(ba)2dx(i1)!(ni1)!\displaystyle=\frac{n!\int_{a}^{b}\left(\frac{x-a}{b-a}\right)^{i-1}\left(\frac{b-x}{b-a}\right)^{n-i-1}(b-a)^{-2}\ {\rm d}x}{(i-1)!(n-i-1)!}
=n!ab(xa)i1(bx)ni1dx(ba)n(i1)!(ni1)!\displaystyle=\frac{n!\int_{a}^{b}\left(x-a\right)^{i-1}\left(b-x\right)^{n-i-1}\ {\rm d}x}{(b-a)^{n}(i-1)!(n-i-1)!}
=n!(ba)n(i1)!(ni1)!Γ(ni)Γ(i)(ba)1nΓ(n)\displaystyle=\frac{n!}{(b-a)^{n}(i-1)!(n-i-1)!}\frac{\Gamma(n-i)\Gamma(i)}{(b-a)^{1-n}\Gamma(n)}
=nba,\displaystyle=\frac{n}{b-a},

where $\Gamma(\cdot)$ is the gamma function. Hence, (17) becomes

limσ0Peσ=i=1n1fWi(0+)π=n(n1)(ba)π.\displaystyle\lim_{\sigma\to 0}\frac{P_{e}}{\sigma}=\sum_{i=1}^{n-1}\frac{f_{W_{i}}(0^{+})}{\sqrt{\pi}}=\frac{n(n-1)}{(b-a)\sqrt{\pi}}.

D-B Exponential Distribution

For the case of $X\sim\text{Exp}(\lambda)$, using the formula in (19), we have that

fWi(0+)\displaystyle f_{W_{i}}(0^{+}) =λ2n!0(1eλx)i1e(ni1)λxe2λxdx(i1)!(ni1)!\displaystyle=\frac{\lambda^{2}n!\int_{0}^{\infty}(1-e^{-\lambda x})^{i-1}e^{-(n-i-1)\lambda x}e^{-2\lambda x}\ {\rm d}x}{(i-1)!(n-i-1)!}
=λ2n!0(1eλx)i1e(ni+1)λxdx(i1)!(ni1)!\displaystyle=\frac{\lambda^{2}n!\int_{0}^{\infty}(1-e^{-\lambda x})^{i-1}e^{-(n-i+1)\lambda x}\ {\rm d}x}{(i-1)!(n-i-1)!}
=λ2n!(i1)!(ni1)!Γ(ni+1)Γ(i)λΓ(1+n)\displaystyle=\frac{\lambda^{2}n!}{(i-1)!(n-i-1)!}\frac{\Gamma(n-i+1)\Gamma(i)}{\lambda\Gamma(1+n)}
=λ2n!(i1)!(ni1)!(ni)!(i1)!λn!\displaystyle=\frac{\lambda^{2}n!}{(i-1)!(n-i-1)!}\frac{(n-i)!(i-1)!}{\lambda n!}
=λ(ni),\displaystyle=\lambda(n-i),

where $\Gamma(\cdot)$ is the gamma function. Hence, (17) becomes

limσ0Peσ=i=1n1fWi(0+)π=λ(n1)n2π.\displaystyle\lim_{\sigma\to 0}\frac{P_{e}}{\sigma}=\sum_{i=1}^{n-1}\frac{f_{W_{i}}(0^{+})}{\sqrt{\pi}}=\frac{\lambda(n-1)n}{2\sqrt{\pi}}.

Appendix E Examples for the High Noise Regime

The key to the proof is to use the following expressions from [23],

𝔼[X1:n]=nx(1F(x))n1f(x)dx,\displaystyle\mathbb{E}[X_{1:n}]=n\int_{-\infty}^{\infty}x(1-F(x))^{n-1}f(x)\ {\rm d}x,
𝔼[Xn:n]=nxF(x)n1f(x)dx.\displaystyle\mathbb{E}[X_{n:n}]=n\int_{-\infty}^{\infty}xF(x)^{n-1}f(x)\ {\rm d}x.

First, consider $X_{i}\sim\text{Unif}(a,b),~0\leq a<b<\infty$. Then,

𝔼[X1:n]=b+ann+1 and 𝔼[Xn:n]=a+bnn+1,\displaystyle\mathbb{E}[X_{1:n}]=\frac{b+an}{n+1}\ \text{ and }\ \mathbb{E}[X_{n:n}]=\frac{a+bn}{n+1},

and hence, we obtain

𝔼[Rn]=(ba)(n1)n+1.\displaystyle\mathbb{E}[R_{n}]=\frac{(b-a)(n-1)}{n+1}.

Next, let $X_{i}\sim\text{Exp}(\lambda)$. Then,

𝔼[X1:n]=1λn and 𝔼[Xn:n]=k=1n1λk,\displaystyle\mathbb{E}[X_{1:n}]=\frac{1}{\lambda n}\ \text{ and }\ \mathbb{E}[X_{n:n}]=\sum_{k=1}^{n}\frac{1}{\lambda k},

and hence, we obtain

𝔼[Rn]=k=1n11λk.\displaystyle\mathbb{E}[R_{n}]=\sum_{k=1}^{n-1}\frac{1}{\lambda k}.

References

  • [1] C. Dwork, “Differential privacy: A survey of results,” in International conference on theory and applications of models of computation.   Springer, 2008, pp. 1–19.
  • [2] M. Jeong, A. Dytso, M. Cardone, and H. V. Poor, “Recovering structure of noisy data through hypothesis testing,” in Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), June 2020, pp. 1307–1312.
  • [3] ——, “Recovering data permutations from noisy observations: The linear regime,” IEEE Journal on Selected Areas in Information Theory, vol. 1, no. 3, pp. 854–869, 2020.
  • [4] S. R. Searle et al., “Prediction, mixed models, and variance components,” 1973.
  • [5] S. Portnoy, “Maximizing the probability of correctly ordering random variables using linear predictors,” Journal of Multivariate Analysis, vol. 12, no. 2, pp. 256–269, 1982.
  • [6] K. Nomakuchi and T. Sakata, “Characterizations of the forms of covariance matrix of an elliptically contoured distribution,” Sankhyā: The Indian Journal of Statistics, Series A, pp. 205–210, 1988.
  • [7] ——, “Characterization of conditional covariance and unified theory in the problem of ordering random variables,” Annals of the Institute of Statistical Mathematics, vol. 40, no. 1, pp. 93–99, 1988.
  • [8] O. Collier and A. S. Dalalyan, “Minimax rates in permutation estimation for feature matching,” The Journal of Machine Learning Research, vol. 17, no. 6, pp. 1 –31, January 2016.
  • [9] A. Pananjady, M. J. Wainwright, and T. A. Courtade, “Linear regression with shuffled data: Statistical and computational limits of permutation recovery,” IEEE Transactions on Information Theory, vol. 64, no. 5, pp. 3286–3300, May 2018.
  • [10] ——, “Denoising linear models with permuted data,” in Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), June 2017, pp. 446–450.
  • [11] P. Rigollet and J. Weed, “Uncoupled isotonic regression via minimum Wasserstein deconvolution,” Information and Inference: A Journal of the IMA, vol. 8, no. 4, pp. 691–717, December 2019.
  • [12] J. Unnikrishnan, S. Haghighatshoar, and M. Vetterli, “Unlabeled sensing with random linear measurements,” IEEE Transactions on Information Theory, vol. 64, no. 5, pp. 3237–3253, May 2018.
  • [13] S. Haghighatshoar and G. Caire, “Signal recovery from unlabeled samples,” IEEE Transactions on Signal Processing, vol. 66, no. 5, pp. 1242–1257, March 2018.
  • [14] H. Zhang, M. Slawski, and P. Li, “Permutation recovery from multiple measurement vectors in unlabeled sensing,” in Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), July 2019, pp. 1857–1861.
  • [15] I. Dokmanić, “Permutations unlabeled beyond sampling unknown,” IEEE Signal Processing Letters, vol. 26, no. 6, pp. 823–827, April 2019.
  • [16] M. Tsakiris and L. Peng, “Homomorphic sensing,” in Proceedings of the 36th International Conference on Machine Learning (ICML), vol. 97, June 2019, pp. 6335–6344.
  • [17] M. C. Tsakiris, “Eigenspace conditions for homomorphic sensing,” arXiv:1812.07966, April 2019.
  • [18] A. Dytso, M. Cardone, M. S. Veedu, and H. V. Poor, “On estimation under noisy order statistics,” in Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), July 2019, pp. 36–40.
  • [19] S. M. Kay, Fundamentals of Statistical Signal Processing, vol. 2: Detection Theory.   Prentice Hall PTR, 1998.
  • [20] R. Pyke, “Spacings,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 27, no. 3, pp. 395–449, 1965. [Online]. Available: http://www.jstor.org/stable/2345793
  • [21] J. E. Angus, “The probability integral transform and related results,” SIAM review, vol. 36, no. 4, pp. 652–654, 1994.
  • [22] S. Boucheron, G. Lugosi, and P. Massart, Concentration inequalities: A nonasymptotic theory of independence.   Oxford university press, 2013.
  • [23] H. A. David and H. N. Nagaraja, “Order statistics,” Encyclopedia of statistical sciences, 2004.
  • [24] K. Joag-Dev and F. Proschan, “Negative association of random variables with applications,” The Annals of Statistics, pp. 286–295, 1983.