Retrieving Data Permutations from Noisy Observations: High and Low Noise Asymptotics
Abstract
This paper considers the problem of recovering the permutation of an $n$-dimensional random vector $\mathbf{X}$ observed in Gaussian noise. First, a general expression for the probability of error is derived when a linear decoder (i.e., a linear estimator followed by a sorting operation) is used. The derived expression holds with minimal assumptions on the distribution of $\mathbf{X}$ and allows the noise to have memory. Second, for the case of isotropic noise (i.e., noise with covariance matrix $\sigma^2 \mathbf{I}_n$), the rates of convergence of the probability of error are characterized in the high and low noise regimes. In the low noise regime, for every dimension $n$, the probability of error is shown to behave proportionally to $\sigma$, where $\sigma$ is the noise standard deviation. Moreover, the slope is computed exactly for several distributions and it is shown to behave quadratically in $n$. In the high noise regime, for every dimension $n$, the probability of correctness is shown to approach $1/n!$ at a rate proportional to $1/\sigma$, and the exact expression for the rate of convergence is also provided.
I Introduction
The problem of recovering data permutations from noisy observations is becoming a common task in modern communication and computing systems. For example, systems based on data sorting operations, such as recommender systems or data analysis systems, make use of data permutations and leverage the information that can be obtained from the data ordering. In particular, recommender systems clearly utilize the sorting information in order to optimize their next recommendation. As in the case of a recommender system, data analysis systems are also often interested in rankings of massive data sets rather than in the exact values of the data. In such systems, users may wish to conceal their data when it contains sensitive information. A common solution for privatizing individual data is to add a sufficient amount of random noise to guarantee the desired privacy level [1]. However, adding too much noise can render the task of recovering a permutation impossible, as the data will be too noisy. Therefore, for a given noise level, it is important to understand the fundamental limits of the data permutation recovery problem.
In this work, following the preliminary works in [2] and [3], we study the data permutation recovery problem within an $n!$-ary hypothesis testing framework. The specific goal of this paper is to study fundamental limits of this problem under the constraint that a linear decoder (i.e., a linear estimator followed by a sorting operation) is employed. Studying linear decoders is interesting for several reasons. First, as shown in [2], linear decoders are optimal (i.e., they lead to the smallest probability of error) when the noise is isotropic and the distribution of the input data is exchangeable. Second, the optimal decoder can be linear even if the noise is colored; see [3] for the exact conditions. Third, linear decoders have at most polynomial complexity in the data dimension and hence, they are suitable for practical implementations.
The structure of the paper is as follows. In Section II, we introduce the notation and formally define the problem. In Section III, we characterize the probability of error when linear decoders are used. The derived expression holds with minimal assumptions on the distribution of the data and holds when the noise has memory. In Section IV, we utilize the expression for the error probability derived in Section III and characterize the asymptotic behavior of the probability of error for the isotropic noise case (i.e., when the noise covariance matrix is $\sigma^2 \mathbf{I}_n$) in the low and high noise regimes. For example, we show that the probability of error increases linearly in $\sigma$ (i.e., the standard deviation of the noise) in the low noise regime (i.e., when $\sigma \to 0$). We derive the exact slope and we show it to be at most a quadratic function of the data dimension $n$ for a general class of distributions. In addition, we show that the gap between the probability of correctness and $1/n!$ in the high noise regime (i.e., when $\sigma \to \infty$) is proportional to $1/\sigma$, and we characterize the exact slope.
I-A Related work
Permutation-related estimation problems have recently gained significant importance and are studied in various fields [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]. The ranking (i.e., data permutation) estimation problem under a joint Gaussian distribution was investigated in [4, 5, 6, 7]. In particular, in [4] the author considered a pairwise ordering for the bivariate case; the extension to the $n$-dimensional case was considered in [5]. The generalization from the assumption of a Gaussian distribution to an elliptically contoured distribution can be found in [6, 7]. The authors in [4, 5, 6, 7] analyzed the structure of the covariance matrix that maximizes the probability of correctness of such estimation problems when the minimum mean square error (MMSE) estimator is used. In [3], the MMSE estimator was shown to be the only linear estimator that achieves the minimum probability of error for the ranking estimation problem. Most recent works study a linear regression framework in which the measurements are premultiplied by an unknown permutation matrix, which suitably models problems with unknown labels. In [8], the feature matching problem in computer vision was formulated as a permutation recovery problem. The multivariate linear regression model with an unknown permutation was studied in [9, 10]. The authors provided necessary and sufficient conditions on the signal-to-noise ratio for exact permutation recovery and characterized the minimax prediction error. Isotonic regression without data labels, namely uncoupled isotonic regression, was discussed in [11]. Data estimation given randomly selected measurements – referred to as unlabeled sensing – was studied in [12, 13, 14]. In [12], the authors characterized a necessary condition on the dimension of the observation vector for uniquely recovering the original data in the noiseless case. A generalized framework of unlabeled sensing was presented in [15, 16, 17]. The estimation of a sorted vector based on noisy observations was considered in [18], where the MMSE estimator of the sorted data was characterized as a linear combination of estimators of the unsorted data.
II Notation and Framework
Notation. Boldface upper case letters (e.g., $\mathbf{X}$) denote vector random variables; the boldface lower case letter $\mathbf{x}$ indicates a specific realization of $\mathbf{X}$; $X_{(i)}$ denotes the $i$-th order statistic of $\mathbf{X}$; $\|\mathbf{x}\|$ is the Euclidean norm of $\mathbf{x}$; $[n]$ is the set of integers from $1$ to $n$; $\mathbf{I}_n$ is the identity matrix of dimension $n$; $\mathbf{0}_n$ is the column vector of dimension $n$ of all zeros; calligraphic letters indicate sets; $|\mathcal{A}|$ is the cardinality of $\mathcal{A}$; for two sets $\mathcal{A}$ and $\mathcal{B}$, $\mathcal{A} \setminus \mathcal{B}$ is the set of elements that belong to $\mathcal{A}$ but not to $\mathcal{B}$, $\mathcal{A} \cap \mathcal{B}$ is the set of elements that belong to both $\mathcal{A}$ and $\mathcal{B}$, and $\mathcal{A} \cup \mathcal{B}$ is the set of elements that are in either set. For a set $\mathcal{A} \subseteq \mathbb{R}^n$, $\mathrm{Vol}(\mathcal{A})$ denotes its volume, i.e., its $n$-dimensional Lebesgue measure. For two $n$-dimensional vectors $\mathbf{x}$ and $\mathbf{y}$, if for all $i \in [n]$ the $i$-th element of $\mathbf{x}$ is larger than or equal to the $i$-th element of $\mathbf{y}$, then we write $\mathbf{x} \geq \mathbf{y}$. Finally, the multiplication of a matrix $\mathbf{A}$ by a set $\mathcal{S}$ is denoted and defined as $\mathbf{A}\mathcal{S} = \{\mathbf{A}\mathbf{s} : \mathbf{s} \in \mathcal{S}\}$.
We consider the framework in Fig. 1, where an $n$-dimensional random vector $\mathbf{X}$ is first generated according to a certain distribution and then passed through an additive noisy channel with Gaussian transition probability, the output of which is denoted as $\mathbf{Y}$. Thus, we have $\mathbf{Y} = \mathbf{X} + \mathbf{Z}$, with $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}_n, \mathbf{K}_Z)$, where $\mathbf{K}_Z$ is the covariance matrix of the additive noise $\mathbf{Z}$, and where $\mathbf{X}$ and $\mathbf{Z}$ are independent.
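To make the observation model concrete, the following minimal sketch (in Python, with illustrative names and parameters that are not part of the paper) draws a realization of $\mathbf{X}$ and produces a noisy observation $\mathbf{Y} = \mathbf{X} + \mathbf{Z}$:

```python
import numpy as np

def observe(x, noise_cov, rng):
    """Return a noisy observation y = x + z with z ~ N(0, noise_cov)."""
    n = x.shape[0]
    z = rng.multivariate_normal(np.zeros(n), noise_cov)
    return x + z

# Example: n = 4, isotropic noise with standard deviation sigma = 0.1.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)               # a realization of the input vector X
y = observe(x, (0.1 ** 2) * np.eye(4), rng)
```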
In this work, we are interested in studying the probability of error of the “data permutation recovery” problem formulated in [2, 3] that, given the observation of $\mathbf{Y}$, seeks to retrieve the permutation (among the $n!$ possible ones) according to which the vector $\mathbf{X}$ is sorted. Specifically, this problem can be formulated within a hypothesis testing framework with $n!$ hypotheses $\{\mathcal{H}_{\pi}\}_{\pi \in \mathcal{P}_n}$, where $\mathcal{P}_n$ is the collection of all permutations of the elements of $[n]$, and where $\mathcal{H}_{\pi}$ is the hypothesis that $\mathbf{X}$ is an $n$-dimensional vector sorted according to the permutation $\pi$, that is
$$\mathcal{H}_{\pi}: \; X_{\pi(1)} \leq X_{\pi(2)} \leq \cdots \leq X_{\pi(n)}, \qquad (1)$$
with $X_i$ being the $i$-th element of $\mathbf{X}$, and $\pi(i)$ being the $i$-th element of $\pi$. Given this, the optimal decoder in Fig. 1 will output $\hat{\pi}$ such that
(2) |
where $\Pi$ denotes the permutation according to which the random vector $\mathbf{X}$ is sorted. In particular, the decoder will declare that the input vector is sorted according to $\pi$ (i.e., $\hat{\pi} = \pi$) if and only if the observation vector $\mathbf{y} \in \mathcal{R}_{\pi}(\mathbf{K}_Z)$, where $\{\mathcal{R}_{\pi}(\mathbf{K}_Z)\}_{\pi \in \mathcal{P}_n}$ are the so-called optimal decision regions (the notation $\mathcal{R}_{\pi}(\mathbf{K}_Z)$ indicates that, in general, the decision regions might be functions of the noise covariance matrix $\mathbf{K}_Z$), which can be derived by leveraging the maximum a posteriori probability (MAP) criterion [19, Appendix 3C] and are given by [2, 3]
(3) |
where $f_{\pi}(\mathbf{y})$ denotes the conditional probability density function of $\mathbf{Y}$ given that $\mathcal{H}_{\pi}$ holds. In order to guarantee that the collection $\{\mathcal{R}_{\pi}(\mathbf{K}_Z)\}_{\pi \in \mathcal{P}_n}$ is a partition of the $n$-dimensional space, we assume that if $\mathbf{y}$ attains the maximum in (3) for more than one permutation, then one of the corresponding hypotheses is arbitrarily selected.
III Probability of Error with Linear Decoder
In this section, we focus on characterizing the probability of error of the data permutation recovery problem introduced in Section II. Given the hypotheses and decision regions defined in (1) and (3), we have that the error probability is given by
$$P_e = 1 - P_c \qquad (4a)$$
$$\phantom{P_e} = 1 - \sum_{\pi \in \mathcal{P}_n} \Pr\!\left(\mathbf{Y} \in \mathcal{R}_{\pi}(\mathbf{K}_Z),\, \mathcal{H}_{\pi}\right), \qquad (4b)$$
where $P_c$ is the probability of correctness.
In particular, we assess the probability of error when a linear decoder is employed. This decoder first computes a permutation-independent linear transformation of $\mathbf{y}$, i.e., $\mathbf{W}\mathbf{y} + \mathbf{b}$, where $\mathbf{W} \in \mathbb{R}^{n \times n}$ and $\mathbf{b} \in \mathbb{R}^{n}$ are the same for all permutations, and then it outputs the permutation according to which $\mathbf{W}\mathbf{y} + \mathbf{b}$ is sorted. The decision regions in (3) when a linear decoder is used become
$$\mathcal{R}_{\pi}(\mathbf{W},\mathbf{b}) = \left\{\mathbf{y} \in \mathbb{R}^{n} : [\mathbf{W}\mathbf{y}+\mathbf{b}]_{\pi(1)} \leq [\mathbf{W}\mathbf{y}+\mathbf{b}]_{\pi(2)} \leq \cdots \leq [\mathbf{W}\mathbf{y}+\mathbf{b}]_{\pi(n)}\right\}. \qquad (5)$$
Our choice of assessing the probability of error performance of a linear decoder stems primarily from its low complexity (at most polynomial in $n$) compared to a brute force evaluation of the optimal test in (3), which requires examining all $n!$ hypotheses and hence has a practically prohibitive complexity. Moreover, it has been shown in [3] that a linear decoder is indeed optimal, i.e., it minimizes the probability of error, under certain conditions on the noise covariance matrix $\mathbf{K}_Z$.
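For illustration, a minimal sketch of such a linear decoder is given below (Python; the default choice $\mathbf{W} = \mathbf{I}_n$, $\mathbf{b} = \mathbf{0}_n$, i.e., plain sorting of $\mathbf{y}$, is used only as an example):

```python
import numpy as np

def linear_decoder(y, W=None, b=None):
    """Linear decoder: apply the permutation-independent map W y + b,
    then output the permutation that sorts the result (via argsort)."""
    n = y.shape[0]
    W = np.eye(n) if W is None else W
    b = np.zeros(n) if b is None else b
    s = W @ y + b
    return tuple(np.argsort(s))  # estimated sorting permutation (0-based)
```

The cost is dominated by the matrix-vector product and the sort, i.e., at most polynomial in $n$, in contrast with the $n!$ hypotheses of the brute force test.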
We next derive an expression for the probability of error when a linear decoder is used. Towards this end, for each $\pi \in \mathcal{P}_n$, we define a matrix $\mathbf{M}_{\pi} \in \mathbb{R}^{(n-1) \times n}$ such that
$$\left[\mathbf{M}_{\pi}\mathbf{x}\right]_{i} = x_{\pi(i+1)} - x_{\pi(i)}, \quad i \in [n-1], \qquad (6)$$
where $[\mathbf{M}_{\pi}]_{i,j} = 1$ if and only if $j = \pi(i+1)$, $[\mathbf{M}_{\pi}]_{i,j} = -1$ if and only if $j = \pi(i)$, and $[\mathbf{M}_{\pi}]_{i,j}$ is equal to zero otherwise. For instance, the construction of $\mathbf{M}_{\pi}$ for $n = 3$ is illustrated in the sketch below.
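The following sketch builds $\mathbf{M}_{\pi}$ under the convention assumed above, namely $[\mathbf{M}_{\pi}\mathbf{x}]_i = x_{\pi(i+1)} - x_{\pi(i)}$ (the sign and indexing convention is an assumption made here for illustration; 0-based indices are used):

```python
import numpy as np

def difference_matrix(pi):
    """Build the (n-1) x n matrix M_pi with [M_pi x]_i = x[pi[i+1]] - x[pi[i]],
    so that M_pi @ x >= 0 exactly when x is sorted according to pi."""
    n = len(pi)
    M = np.zeros((n - 1, n))
    for i in range(n - 1):
        M[i, pi[i + 1]] = 1.0
        M[i, pi[i]] = -1.0
    return M

# For n = 3 and the identity permutation (0, 1, 2) this prints
# [[-1.  1.  0.]
#  [ 0. -1.  1.]]
print(difference_matrix((0, 1, 2)))
```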
The theorem below provides an expression for the error probability of the data permutation recovery problem when a linear decoder is used.
Theorem 1.
Let $\mathbf{X}$ be an exchangeable random vector (a sequence of random variables $X_1, \ldots, X_n$ is said to be exchangeable if, for any permutation $\pi$ of the indices $[n]$, the vector $(X_{\pi(1)}, \ldots, X_{\pi(n)})$ is equal in distribution to $(X_1, \ldots, X_n)$). Then, for any invertible $\mathbf{W}$ and any $\mathbf{b}$ as defined in (5), and any noise covariance matrix $\mathbf{K}_Z$, the probability of error is given by
(7) |
where $\mathbf{\Sigma} = \mathbf{M}_{\pi}\mathbf{W}\mathbf{K}_Z\mathbf{W}^{T}\mathbf{M}_{\pi}^{T}$ with $\mathbf{M}_{\pi}$ given by (6), and where $Q_{\mathbf{\Sigma}}$ is the multivariate Gaussian Q-function with covariance $\mathbf{\Sigma}$.
Proof:
By substituting the decision regions in (5) into (4) and by using Bayes' theorem, we obtain
(8) |
where follows from the fact that is exchangeable and hence, and letting , and is due to the law of total expectation.
We now focus on the conditional probability inside the conditional expectation in (8), for which we have
(9) |
where the last equality follows by letting . Note that .
Then, given , the event inside the conditional probability in (III) can be expressed as
(10) |
where the last equality follows by using the definition of in (6). By introducing a random vector , where , we have an equivalent expression for (III) as . By substituting this inside (III), we obtain
(11) |
where the last equality follows by letting be the multivariate Gaussian Q-function with covariance . We conclude the proof of Theorem 1 by using . ∎
We highlight that (7) holds under minimal assumptions on the distribution of $\mathbf{X}$ (i.e., exchangeability) and hence, it can be used to study the error probability of the data permutation recovery problem in various noise settings, e.g., when the noise has memory or when the noise is isotropic. In the remainder of this paper, we focus on the isotropic noise scenario, i.e., we assume that $\mathbf{K}_Z = \sigma^2 \mathbf{I}_n$.
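Since (7) involves the multivariate Gaussian Q-function, a quick Monte Carlo estimate is a convenient sanity check. The sketch below (illustrative only; the decoder is plain sorting, i.e., $\mathbf{W} = \mathbf{I}_n$ and $\mathbf{b} = \mathbf{0}_n$, and the input distribution is an arbitrary example) estimates the probability of error by simulation:

```python
import numpy as np

def monte_carlo_pe(sample_x, noise_cov, trials=100_000, seed=0):
    """Estimate P_e of the sorting decoder: draw X, add Gaussian noise,
    sort the observation, and check whether the permutation is recovered."""
    rng = np.random.default_rng(seed)
    n = noise_cov.shape[0]
    errors = 0
    for _ in range(trials):
        x = sample_x(rng)
        z = rng.multivariate_normal(np.zeros(n), noise_cov)
        if tuple(np.argsort(x + z)) != tuple(np.argsort(x)):
            errors += 1
    return errors / trials

# Example: i.i.d. standard Gaussian data, isotropic noise with sigma = 0.5.
pe_hat = monte_carlo_pe(lambda rng: rng.standard_normal(3), 0.25 * np.eye(3))
```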
IV Isotropic Noise
We here study the error probability of the data permutation recovery problem when the noise is isotropic, i.e., $\mathbf{K}_Z = \sigma^2 \mathbf{I}_n$. Under this assumption, the regions in (5) depend on the noise covariance only through the parameter $\sigma$ and hence, we let $\mathcal{R}_{\pi}(\sigma)$ denote them. Moreover, when $\mathbf{X}$ is exchangeable, it has been shown in [2] that the optimal (MAP) decision regions are achieved by a linear decoder, i.e., for the isotropic noise setting the optimal decoder is indeed linear and hence, the probability of error in Theorem 1 is the minimum one.
In Section IV-A, we evaluate the probability of error in (7) when $\mathbf{K}_Z = \sigma^2 \mathbf{I}_n$, and then in Section IV-B and Section IV-C we use this expression to derive the rates of convergence of the probability of error in the low noise regime (i.e., $\sigma \to 0$) and in the high noise regime (i.e., $\sigma \to \infty$), respectively.
IV-A Probability of Error
Under the assumption $\mathbf{K}_Z = \sigma^2 \mathbf{I}_n$, we have from [2] that the optimal decision regions are those induced by plain sorting and hence, with reference to (5), we have that $\mathbf{W} = \mathbf{I}_n$ and $\mathbf{b} = \mathbf{0}_n$. Moreover, by substituting these values inside $\mathbf{\Sigma}$ in Theorem 1, we obtain
(12) |
that is, $\mathbf{\Sigma}$ is a tridiagonal Toeplitz matrix.
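The exact entries of (12) are not reproduced here; however, under the difference-matrix convention sketched after (6), $\mathbf{M}_{\pi}\mathbf{M}_{\pi}^{T}$ is, for every $\pi$, the $(n-1)\times(n-1)$ tridiagonal Toeplitz matrix with $2$ on the diagonal and $-1$ on the first off-diagonals (this explicit form is an assumption consistent with, but not verbatim from, the text):

```python
import numpy as np

def tridiag_toeplitz(n):
    """(n-1) x (n-1) tridiagonal Toeplitz matrix: 2 on the diagonal,
    -1 on the first off-diagonals; equals M_pi @ M_pi.T for any pi
    under the difference-matrix convention assumed earlier."""
    return 2.0 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)

# Sanity check against the earlier construction (same result for every pi):
# M = difference_matrix((2, 0, 3, 1)); np.allclose(M @ M.T, tridiag_toeplitz(4))
```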
The probability of error in the isotropic noise scenario is then given by the next corollary.
Corollary 1.
Let $\mathbf{X}$ be an exchangeable random vector and let $\mathbf{K}_Z = \sigma^2 \mathbf{I}_n$. Then, for an arbitrary $\sigma > 0$, the probability of error is given by
(13) |
where is defined in (12) and where is the multivariate Gaussian Q-function with covariance .
Proof:
We note that (14) is a function of $\sigma$ and hence, in what follows we will use $P_e(\sigma)$ to highlight this dependence.
IV-B Low Noise Asymptotic
We here focus on the asymptotic behavior of the probability of error in the low noise regime (i.e., $\sigma \to 0$). In particular, the next result, proved in Appendix A, shows that the probability of error in this regime is approximately linear in $\sigma$.
Theorem 2.
Let consist of i.i.d. random variables generated according to . Let be an independent copy of and assume that
(16) |
Then,
(17) |
where .
Remark 1.
Remark 2.
We now show that, under the assumption that the probability density function of the data is bounded, the asymptotic behavior of the probability of error in the low noise regime for an i.i.d. $\mathbf{X}$ is upper bounded by a quantity that grows quadratically in $n$. In particular, we have the following lemma.
Lemma 1.
Assume that the probability density function of $X_1$ is upper bounded by a constant $B$. Then,
(20) |
Proof:
We conclude this section by providing some examples of (17) for a few distributions.
Example 3. Consider . Then,
(24) |
Note that the upper bound in (24) follows from Lemma 1, where we used the fact that . For the lower bound we use the following inequality [22, Lemma 10.1]:
(25) |
where the last step follows since . Combining the expression in (19) and the bound in (25), we arrive at the following lower bound,
(26) |
which implies the lower bound in (24).
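As a rough numerical companion to Theorem 2, the following sketch (illustrative only; i.i.d. standard Gaussian data and the plain sorting decoder are assumed as an example) estimates the ratio $P_e(\sigma)/\sigma$ for decreasing $\sigma$, which should settle near a constant slope:

```python
import numpy as np

def pe_over_sigma(sigma, n=3, trials=400_000, seed=2):
    """Empirical P_e(sigma)/sigma for i.i.d. N(0,1) data with isotropic noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((trials, n))
    y = x + sigma * rng.standard_normal((trials, n))
    err = np.any(np.argsort(x, axis=1) != np.argsort(y, axis=1), axis=1)
    return err.mean() / sigma

for sigma in (0.2, 0.1, 0.05, 0.02):
    print(sigma, pe_over_sigma(sigma))
```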
IV-C High Noise Asymptotic
We now focus on the asymptotic behavior of the probability of error in the high noise regime (i.e., $\sigma \to \infty$). It is not difficult to argue that if $\mathbf{X}$ is exchangeable, then we have that
$$\lim_{\sigma \to \infty} P_c(\sigma) = \frac{1}{n!}. \qquad (27)$$
The interpretation is that if $\sigma$ is large, then the output $\mathbf{Y}$ carries no information about $\mathbf{X}$, and the decoder can only rely on the prior knowledge; hence, the best the decoder can do is to guess one of the $n!$ equally likely hypotheses.
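A quick simulation illustrates this behavior; the sketch below (illustrative assumptions: i.i.d. standard Gaussian data, plain sorting decoder, $n = 3$) shows the empirical probability of correctness approaching $1/n! = 1/6$ as $\sigma$ grows:

```python
import numpy as np
from math import factorial

def correct_rate(sigma, n=3, trials=200_000, seed=1):
    """Empirical probability that sorting the noisy observation recovers
    the permutation of an i.i.d. N(0,1) input under isotropic noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((trials, n))
    y = x + sigma * rng.standard_normal((trials, n))
    hits = np.all(np.argsort(x, axis=1) == np.argsort(y, axis=1), axis=1)
    return hits.mean()

for sigma in (0.1, 1.0, 10.0, 100.0):
    print(sigma, correct_rate(sigma), "target:", 1 / factorial(3))
```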
The next result, proved in Appendix B, sharpens the limit in (27) by finding the rate of convergence.
Theorem 3.
Let be an exchangeable random vector such that . Then,
(28) |
where and
(29) |
where is defined in (1), is the -dimensional ball centered at the origin with unitary radius, and is the -dimensional ellipsoid centered at the origin with unit radii along standard axes except a radius along the -th axis.
Finding a closed-form expression for the terms in (29) does not appear to be an easy task. In the next lemma, we provide upper and lower bounds on these terms, which lead to expressions that are amenable to computation.
Lemma 2.
In the high noise regime, the convergence rate of the probability of correctness can be bounded as
where .
Proof:
We start by observing that
(30) |
that is, the ellipsoid : (i) contains the ball since has minimum radius equal to ; and (ii) is contained inside the ball since has maximum radius equal to .
Thus, from (30) we obtain
(31) |
where the last equality follows since the region of integration is a cone that occupies a fraction $\frac{1}{n!}$ of the space and hence, its intersection with the unit ball has volume equal to the volume of the ball divided by $n!$.
Similarly, from (30) we obtain
(32) |
where in the equality we have used the facts that: (i) , (ii) , and (iii) for any invertible matrix and any set .
We conclude this section by providing some examples of the range for a few common distributions (see Appendix E for the detailed computations). In particular, these examples show that the term dominates in the expression of the rate for several distributions of interest.
Example 1. Consider . Then,
Example 2. Consider . Then,
(34) |
Example 3. Let $X_1$ be $\sigma_0$-sub-Gaussian (a random variable $X$ is $\sigma_0$-sub-Gaussian if $\mathbb{E}\!\left[e^{\lambda (X - \mathbb{E}[X])}\right] \leq e^{\lambda^2 \sigma_0^2 / 2}$ for all $\lambda \in \mathbb{R}$). Then [22],
Appendix A Proof of Theorem 2
Before proceeding with the proof of Theorem 2, we first present two ancillary results.
Lemma 3.
Let consist of i.i.d. random variables generated according to . Let be an independent copy of and assume that
Then, the following holds
where .
Proof:
The proof is provided in Appendix C. ∎
Lemma 4.
Let $\mathbf{V} \sim \mathcal{N}(\mathbf{0}_{n-1}, \mathbf{\Sigma})$, where $\mathbf{\Sigma}$ is defined in (12). Then, for any subset $\mathcal{S} \subseteq [n-1]$,
(35) |
Proof:
The bound in (35) holds if the random vector $\mathbf{V}$ consists of negatively associated random variables [24]. Observe that the Gaussian random vector $\mathbf{V}$ consists of either negatively correlated or independent random variables (see the structure of $\mathbf{\Sigma}$ in (12)). As was shown in [24], this implies that the random variables in $\mathbf{V}$ are negatively associated. This concludes the proof of Lemma 4. ∎
We now leverage the two lemmas above to prove Theorem 2. From Corollary 1 we have that
(36) |
where . The expression in (36) can be equivalently written as
(37) |
where is due to the law of total expectation, and follows from the inclusion-exclusion principle where with .
From the expression in (37) it follows that
(38) |
In what follows, we therefore analyze . We have that
(39) |
where is a -dimensional random vector with entries for .
We next consider two separate cases.
Case 1: . Let ; then, we can write (A) as
where the last equality follows by applying a change of variables. Thus, we have that
(40) |
where the first equality follows from the dominated convergence theorem, which is verifiable since for any , where the fact that is shown in Lemma 3.
Appendix B Proof of Theorem 3
We start by noting that, in view of the limit in (27), we have that . We now consider the following limit,
(45) |
Instead of working with , we parameterize the problem in terms of . Then, (45) can be equivalently expressed as
(46) |
where the last equality can be argued by using the definition of the derivative or L'Hôpital's rule.
From Corollary 1, the probability of correctness is given by
(47) |
where we let and , and we used the exchangeability of $\mathbf{X}$.
Using the expression in (47), the derivative of with respect to is now given by
(48) |
where in we used Leibniz's integral rule, and follows since
(49) |
where the labeled equalities follow from: since each entry of is positive (we ignore the case when , and in fact for an i.i.d. ); the product rule and the fact that , where is the Dirac delta function; the scaling property of the Dirac delta function.
We now consider the integral inside the expectation in (48). By using the sifting property of the Dirac delta function, the integral becomes
(50) |
where is obtained by retaining all the entries of except the -th one. We next substitute (50) inside (48) and we compute the limit in (46). We obtain
(51) |
where the labeled equalities follow from: using the dominated convergence theorem, which is verified since , where is assumed to be absolutely integrable; and using the following,
To finalize the proof, it remains to compute . This can be done as follows,
(52) |
where the labeled equalities follow from: the definition of and writing it in terms of standard normal; the law of total expectation, where we abbreviate ; using the fact that
letting , defining a diagonal matrix with the -th element equal to and the others equal to one, and recalling that from (1) we have ; using the -dimensional volume expression for the probability of a standard normal vector [2]; the fact that for any invertible matrix and any set ; letting be the -dimensional ellipsoid centered at the origin with unit radii along standard axes except a radius along the -th axis.
Appendix C Proof of Lemma 3
We start by noting that the joint density function is given by [20]
where is the cumulative distribution function of and is the probability density function of .
By using the upper bounds of and , we obtain
where the second inequality follows since the integrand is always positive, and the last inequality is due to the assumption that . This shows that the joint density is bounded everywhere.
For the marginal density, we obtain
where the inequality follows from the fact that we have shown above that
This concludes the proof of Lemma 3.
Appendix D Examples for the Low Noise Regime
D-A Uniform Distribution
D-B Exponential Distribution
Appendix E Examples for the High Noise Regime
The key to the proof is to use the following expressions from [23],
First, consider . Then,
and hence, we obtain
Next, let . Then,
and hence, we obtain
References
- [1] C. Dwork, “Differential privacy: A survey of results,” in International conference on theory and applications of models of computation. Springer, 2008, pp. 1–19.
- [2] M. Jeong, A. Dytso, M. Cardone, and H. V. Poor, “Recovering structure of noisy data through hypothesis testing,” in Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), June 2020, pp. 1307–1312.
- [3] ——, “Recovering data permutations from noisy observations: The linear regime,” IEEE Journal on Selected Areas in Information Theory, vol. 1, no. 3, pp. 854–869, 2020.
- [4] S. R. Searle et al., “Prediction, mixed models, and variance components,” 1973.
- [5] S. Portnoy, “Maximizing the probability of correctly ordering random variables using linear predictors,” Journal of Multivariate Analysis, vol. 12, no. 2, pp. 256–269, 1982.
- [6] K. Nomakuchi and T. Sakata, “Characterizations of the forms of covariance matrix of an elliptically contoured distribution,” Sankhyā: The Indian Journal of Statistics, Series A, pp. 205–210, 1988.
- [7] ——, “Characterization of conditional covariance and unified theory in the problem of ordering random variables,” Annals of the Institute of Statistical Mathematics, vol. 40, no. 1, pp. 93–99, 1988.
- [8] O. Collier and A. S. Dalalyan, “Minimax rates in permutation estimation for feature matching,” The Journal of Machine Learning Research, vol. 17, no. 6, pp. 1–31, January 2016.
- [9] A. Pananjady, M. J. Wainwright, and T. A. Courtade, “Linear regression with shuffled data: Statistical and computational limits of permutation recovery,” IEEE Transactions on Information Theory, vol. 64, no. 5, pp. 3286–3300, May 2018.
- [10] ——, “Denoising linear models with permuted data,” in Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), June 2017, pp. 446–450.
- [11] P. Rigollet and J. Weed, “Uncoupled isotonic regression via minimum Wasserstein deconvolution,” Information and Inference: A Journal of the IMA, vol. 8, no. 4, pp. 691–717, December 2019.
- [12] J. Unnikrishnan, S. Haghighatshoar, and M. Vetterli, “Unlabeled sensing with random linear measurements,” IEEE Transactions on Information Theory, vol. 64, no. 5, pp. 3237–3253, May 2018.
- [13] S. Haghighatshoar and G. Caire, “Signal recovery from unlabeled samples,” IEEE Transactions on Signal Processing, vol. 66, no. 5, pp. 1242–1257, March 2018.
- [14] H. Zhang, M. Slawski, and P. Li, “Permutation recovery from multiple measurement vectors in unlabeled sensing,” in Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), July 2019, pp. 1857–1861.
- [15] I. Dokmanić, “Permutations unlabeled beyond sampling unknown,” IEEE Signal Processing Letters, vol. 26, no. 6, pp. 823–827, April 2019.
- [16] M. Tsakiris and L. Peng, “Homomorphic sensing,” in Proceedings of the 36th International Conference on Machine Learning (ICML), vol. 97, June 2019, pp. 6335–6344.
- [17] M. C. Tsakiris, “Eigenspace conditions for homomorphic sensing,” arXiv:1812.07966, April 2019.
- [18] A. Dytso, M. Cardone, M. S. Veedu, and H. V. Poor, “On estimation under noisy order statistics,” in Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), July 2019, pp. 36–40.
- [19] S. M. Kay, Fundamentals of Statistical Signal Processing, vol. 2: Detection Theory. Prentice Hall PTR, 1998.
- [20] R. Pyke, “Spacings,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 27, no. 3, pp. 395–449, 1965. [Online]. Available: http://www.jstor.org/stable/2345793
- [21] J. E. Angus, “The probability integral transform and related results,” SIAM Review, vol. 36, no. 4, pp. 652–654, 1994.
- [22] S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
- [23] H. A. David and H. N. Nagaraja, “Order statistics,” Encyclopedia of statistical sciences, 2004.
- [24] K. Joag-Dev and F. Proschan, “Negative association of random variables with applications,” The Annals of Statistics, pp. 286–295, 1983.