
Optimal Decision Rules for Simple Hypothesis Testing under General Criterion Involving Error Probabilities

Berkan Dulek, Cuneyd Ozturk, Student Member, IEEE, and Sinan Gezici, Senior Member, IEEE. B. Dulek is with the Department of Electrical and Electronics Engineering, Hacettepe University, Beytepe Campus, Ankara 06800, Turkey, e-mail: berkan@ee.hacettepe.edu.tr. C. Ozturk and S. Gezici are with the Department of Electrical and Electronics Engineering, Bilkent University, Ankara 06800, Turkey, e-mails: {cuneyd,gezici}@ee.bilkent.edu.tr.
Abstract

The problem of simple $M$-ary hypothesis testing under a generic performance criterion that depends on arbitrary functions of error probabilities is considered. Using results from convex analysis, it is proved that an optimal decision rule can be characterized as a randomization among at most two deterministic decision rules of a form reminiscent of the Bayes rule, if the boundary points corresponding to each rule have zero probability under each hypothesis. Otherwise, a randomization among at most $M(M-1)+1$ deterministic decision rules is sufficient. The form of the deterministic decision rules is explicitly specified. Likelihood ratios are shown to be sufficient statistics. Classical performance measures including Bayesian, minimax, Neyman-Pearson, generalized Neyman-Pearson, restricted Bayesian, and prospect theory based approaches are all covered under the proposed formulation. A numerical example is presented for prospect theory based binary hypothesis testing.

Index Terms– Hypothesis testing, optimal tests, convexity, likelihood ratio, randomization.

I Problem Statement

Consider a detection problem with $M$ simple hypotheses:

$$\mathcal{H}_{j}:\boldsymbol{Y}\sim f_{j}(\cdot),\text{ with }j=0,1,\ldots,M-1, \qquad (1)$$

where the random observation $\boldsymbol{Y}$ takes values from an observation set $\Gamma$ with $\Gamma\subset\mathbb{R}^{N}$. Depending on whether the observed random vector $\boldsymbol{Y}\in\Gamma$ is continuous-valued or discrete-valued, $f_{j}(\boldsymbol{y})$ denotes either the probability density function (pdf) or the probability mass function (pmf) under hypothesis $\mathcal{H}_{j}$. For compactness of notation, the term density is used for both pdf and pmf. In order to decide among the hypotheses, we consider the set of pointwise randomized decision functions, denoted by $\mathsf{D}$, i.e., $\boldsymbol{\delta}:=(\delta_{0},\delta_{1},\ldots,\delta_{M-1})\in\mathsf{D}$ such that $\sum_{i=0}^{M-1}\delta_{i}(\boldsymbol{y})=1$ and $\delta_{i}(\boldsymbol{y})\in[0,1]$ for $0\leq i\leq M-1$ and $\boldsymbol{y}\in\Gamma$. More explicitly, given the observation $\boldsymbol{y}$, the detector decides in favor of hypothesis $\mathcal{H}_{i}$ with probability $\delta_{i}(\boldsymbol{y})$. Then, the probability of choosing hypothesis $\mathcal{H}_{i}$ when hypothesis $\mathcal{H}_{j}$ is true, denoted by $p_{ij}$ with $0\leq i,j\leq M-1$, is given by

$$p_{ij}:=\mathbb{E}_{j}[\delta_{i}(\boldsymbol{y})]=\int_{\Gamma}\delta_{i}(\boldsymbol{y})f_{j}(\boldsymbol{y})\,\mu(d\boldsymbol{y}), \qquad (2)$$

where $\mathbb{E}_{j}[\cdot]$ denotes expected value under hypothesis $\mathcal{H}_{j}$ and $\mu(d\boldsymbol{y})$ is used in (2) to denote the $N$-fold integral and sum for continuous and discrete cases, respectively. Let $\boldsymbol{p}(\boldsymbol{\delta})$ denote the (column) vector containing all pairwise error probabilities $p_{ij}$ for $0\leq i,j\leq M-1$ and $i\neq j$ corresponding to the decision rule $\boldsymbol{\delta}$. It is sufficient to include only the pairwise error probabilities in $\boldsymbol{p}(\boldsymbol{\delta})$, i.e., $p_{ij}$ with $i\neq j$. To see this, note that (2) in conjunction with $\sum_{i=0}^{M-1}\delta_{i}(\boldsymbol{y})=1$ implies $\sum_{i=0}^{M-1}p_{ij}=1$, from which we get the probability of correctly identifying hypothesis $\mathcal{H}_{j}$ as $p_{jj}=1-\sum_{i=0,i\neq j}^{M-1}p_{ij}$.
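For a finite observation set, the error probabilities in (2) reduce to matrix products. Below is a minimal sketch with a hypothetical three-point observation set, hypothetical pmfs, and a hypothetical randomized rule (all numbers are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical discrete setup: Gamma = {0, 1, 2} and M = 2 hypotheses.
# Rows of F are the pmfs f_j over Gamma; rows of delta give delta_i(y).
F = np.array([[0.7, 0.2, 0.1],     # f_0
              [0.1, 0.3, 0.6]])    # f_1
delta = np.array([[1.0, 0.5, 0.0],   # delta_0(y): probability of deciding H_0
                  [0.0, 0.5, 1.0]])  # delta_1(y): probability of deciding H_1

# p[i, j] = E_j[delta_i(Y)] = sum_y delta_i(y) f_j(y), i.e., equation (2).
p = delta @ F.T

# Each column of p sums to one, so p_jj = 1 - sum over i != j of p_ij.
assert np.allclose(p.sum(axis=0), 1.0)
```

Here the vector $\boldsymbol{p}(\boldsymbol{\delta})$ of the text collects the off-diagonal entries `p[1, 0]` and `p[0, 1]`.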

For $M$-ary hypothesis testing, we consider a generic decision criterion that can be expressed in terms of the error probabilities as follows:

$$\begin{aligned}
\underset{\boldsymbol{\delta}\in\mathsf{D}}{\text{minimize}}\quad & g_{0}(\boldsymbol{p}(\boldsymbol{\delta}))\\
\text{subject to}\quad & g_{i}(\boldsymbol{p}(\boldsymbol{\delta}))\leq 0,\quad i=1,2,\ldots,m\\
& h_{j}(\boldsymbol{p}(\boldsymbol{\delta}))=0,\quad j=1,2,\ldots,p
\end{aligned} \qquad (3)$$

where gig_{i} and hjh_{j} denote arbitrary functions of the pairwise error probability vector. Classical hypothesis testing criteria such as Bayesian, minimax, Neyman-Pearson (NP) [1], generalized Neyman-Pearson [2], restricted Bayesian [3], and prospect theory based hypothesis testing [4] are all special cases of the formulation in (3). For example, in the restricted Bayesian framework, the Bayes risk with respect to (w.r.t.) a certain prior is minimized subject to a constraint on the maximum conditional risk [3]:

$$\begin{aligned}
\underset{\boldsymbol{\delta}\in\mathsf{D}}{\text{minimize}}\quad & r_{B}(\boldsymbol{\delta})\\
\text{subject to}\quad & \max_{0\leq j\leq M-1} R_{j}(\boldsymbol{\delta})\leq\alpha
\end{aligned} \qquad (4)$$

for some $\alpha\geq\alpha_{m}$, where $\alpha_{m}$ is the maximum conditional risk of the minimax procedure [1]. The conditional risk when hypothesis $\mathcal{H}_{j}$ is true, denoted by $R_{j}(\boldsymbol{\delta})$, is given by $R_{j}(\boldsymbol{\delta})=\sum_{i=0}^{M-1}c_{ij}p_{ij}$, and the Bayes risk is expressed as $r_{B}(\boldsymbol{\delta})=\sum_{j=0}^{M-1}\pi_{j}R_{j}(\boldsymbol{\delta})$, where $\pi_{j}$ denotes the a priori probability of hypothesis $\mathcal{H}_{j}$ and $c_{ij}$ is the cost incurred by choosing hypothesis $\mathcal{H}_{i}$ when in fact hypothesis $\mathcal{H}_{j}$ is true. Hence, (4) is a special case of (3).
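As a sanity check on these definitions, the sketch below evaluates the conditional risks $R_j$, the Bayes risk $r_B$, and the restricted-Bayes constraint for hypothetical uniform costs, equal priors, and illustrative error probabilities (none of these numbers come from the paper):

```python
import numpy as np

# Hypothetical M = 2 setup with uniform error costs and equal priors.
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])       # C[i, j] = c_ij
pi = np.array([0.5, 0.5])        # prior probabilities pi_j
p = np.array([[0.8, 0.25],
              [0.2, 0.75]])      # p[i, j] from (2); each column sums to one

R = (C * p).sum(axis=0)          # conditional risks R_j = sum_i c_ij p_ij
r_B = pi @ R                     # Bayes risk r_B = sum_j pi_j R_j
alpha = 0.3
feasible = R.max() <= alpha      # the constraint in (4)
```

With uniform costs, $R_j$ is simply the conditional error probability under $\mathcal{H}_j$, which is why the minimax and restricted-Bayes criteria are directly comparable in this special case.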

In this letter, for the first time in the literature, we provide a unified characterization of optimal decision rules for simple hypothesis testing under a general criterion involving error probabilities.

II Preliminaries

Let 𝒗\boldsymbol{v} be a real (column) vector of length M(M1)M(M-1) whose elements are denoted as vijv_{ij} for 0i,jM10\leq i,j\leq M-1 and iji\neq j. Next, we present an optimal deterministic decision rule that minimizes the weighted sum of pijp_{ij}’s with arbitrary real weights 𝒗\boldsymbol{v}.111In classical Bayesian MM-ary hypothesis testing, vij=πj(cijcjj)v_{ij}=\pi_{j}(c_{ij}-c_{jj}).

II-A Optimal decision rule that minimizes $\boldsymbol{v}^{T}\boldsymbol{p}(\boldsymbol{\delta})$

The corresponding weighted sum of pairwise error probabilities can be written as

$$\begin{aligned}
\boldsymbol{v}^{T}\boldsymbol{p}(\boldsymbol{\delta}) &=\sum_{i=0}^{M-1}\sum_{j=0,j\neq i}^{M-1}v_{ij}p_{ij}\\
&=\int_{\Gamma}\sum_{i=0}^{M-1}\delta_{i}(\boldsymbol{y})\left(\sum_{j=0,j\neq i}^{M-1}v_{ij}f_{j}(\boldsymbol{y})\right)\mu(d\boldsymbol{y}),
\end{aligned} \qquad (5)$$

where (2) is substituted for $p_{ij}$ in (5). Defining $V_{i}(\boldsymbol{y}):=\sum_{j=0,j\neq i}^{M-1}v_{ij}f_{j}(\boldsymbol{y})$, we get

$$\begin{aligned}
\boldsymbol{v}^{T}\boldsymbol{p}(\boldsymbol{\delta}) &=\int_{\Gamma}\sum_{i=0}^{M-1}\delta_{i}(\boldsymbol{y})V_{i}(\boldsymbol{y})\,\mu(d\boldsymbol{y})\\
&\geq\int_{\Gamma}\min_{0\leq i\leq M-1}\{V_{i}(\boldsymbol{y})\}\,\mu(d\boldsymbol{y}).
\end{aligned} \qquad (6)$$

The lower bound in (6) is achieved if, for all $\boldsymbol{y}\in\Gamma$, we set

$$\delta_{\ell}(\boldsymbol{y})=1\ \text{ for }\ \ell=\operatorname*{argmin}_{0\leq i\leq M-1}V_{i}(\boldsymbol{y}) \qquad (7)$$

(and hence, $\delta_{i}(\boldsymbol{y})=0$ for all $i\neq\ell$), i.e., each observed vector $\boldsymbol{y}$ is assigned to the hypothesis that minimizes $V_{i}(\boldsymbol{y})$ over all $0\leq i\leq M-1$. When multiple hypotheses achieve the same minimum value $V_{\ell}(\boldsymbol{y})$ for a given observation $\boldsymbol{y}$, the ties can be broken by arbitrarily selecting one of them, since the boundary decision does not affect the decision criterion $\boldsymbol{v}^{T}\boldsymbol{p}(\boldsymbol{\delta})$. However, the pairwise probabilities for erroneously selecting hypotheses $\mathcal{H}_{i}$ and $\mathcal{H}_{j}$ will change if the set of boundary points

$$\mathsf{B}_{i,j}(\boldsymbol{v}):=\{\boldsymbol{y}\in\Gamma\,:\,V_{i}(\boldsymbol{y})=V_{j}(\boldsymbol{y})\leq V_{k}(\boldsymbol{y})\text{ for all }0\leq k\leq M-1,\,k\neq i,\,k\neq j\} \qquad (8)$$

has nonzero probability. We also define the set of all boundary points

$$\mathsf{B}(\boldsymbol{v}):=\bigcup_{0\leq i<j\leq M-1}\mathsf{B}_{i,j}(\boldsymbol{v}) \qquad (9)$$

and the complementary set where $V_{i}(\boldsymbol{y})$ for some $0\leq i\leq M-1$ is strictly smaller than the rest:

$$\bar{\mathsf{B}}(\boldsymbol{v}):=\Gamma\setminus\mathsf{B}(\boldsymbol{v})=\{\boldsymbol{y}\in\Gamma\,:\,V_{i}(\boldsymbol{y})<V_{j}(\boldsymbol{y})\text{ for some }0\leq i\leq M-1\text{ and all }0\leq j\leq M-1,\,j\neq i\} \qquad (10)$$
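A minimal sketch of the rule in (7) on a discrete observation set (hypothetical weights and densities, chosen for illustration) confirms that its weighted error $\boldsymbol{v}^{T}\boldsymbol{p}$ attains the lower bound in (6):

```python
import numpy as np

# Hypothetical discrete setup: Gamma = {0, 1, 2}, M = 2, illustrative weights.
F = np.array([[0.7, 0.2, 0.1],   # f_0 over Gamma
              [0.1, 0.3, 0.6]])  # f_1 over Gamma
v = np.array([[0.0, 1.0],        # v_01 = 1
              [2.0, 0.0]])       # v_10 = 2 (diagonal entries unused)

M = F.shape[0]
# V_i(y) = sum over j != i of v_ij f_j(y)
V = np.array([sum(v[i, j] * F[j] for j in range(M) if j != i) for i in range(M)])
decision = V.argmin(axis=0)      # rule (7): decide argmin_i V_i(y)

# Error probabilities of this deterministic rule via (2), then its weighted error.
p = np.array([[float(decision[y] == i) for y in range(F.shape[1])]
              for i in range(M)]) @ F.T
vTp = sum(v[i, j] * p[i, j] for i in range(M) for j in range(M) if j != i)

# The weighted error equals the lower bound sum_y min_i V_i(y) from (6).
assert abs(vTp - V.min(axis=0).sum()) < 1e-12
```

In this example no observation lies in $\mathsf{B}(\boldsymbol{v})$, so the rule is unique up to sets of zero probability.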

II-B The set of achievable pairwise error probability vectors

Let 𝖯\mathsf{P} denote the set of all pairwise error probability vectors that can be achieved by randomized decision functions 𝜹𝖣\boldsymbol{\delta}\in\mathsf{D}, i.e., 𝖯:={𝒑(𝜹):𝜹𝖣}\mathsf{P}\vcentcolon=\{\boldsymbol{p}(\boldsymbol{\delta})\,:\,\boldsymbol{\delta}\in\mathsf{D}\}. In this part, we present some properties of 𝖯\mathsf{P}.

Property 1: $\mathsf{P}$ is a convex set.

Proof: Let $\boldsymbol{p}^{1}(\boldsymbol{\delta}^{1})$ and $\boldsymbol{p}^{2}(\boldsymbol{\delta}^{2})$ be two pairwise error probability vectors obtained by employing randomized decision functions $\boldsymbol{\delta}^{1}$ and $\boldsymbol{\delta}^{2}$, respectively. Then, for any $\theta$ with $0\leq\theta\leq 1$, $\boldsymbol{p}_{\theta}=\theta\boldsymbol{p}^{1}(\boldsymbol{\delta}^{1})+(1-\theta)\boldsymbol{p}^{2}(\boldsymbol{\delta}^{2})\in\mathsf{P}$ since $\boldsymbol{p}_{\theta}$ is the pairwise error probability vector corresponding to the randomized decision rule $\theta\boldsymbol{\delta}^{1}+(1-\theta)\boldsymbol{\delta}^{2}$, as seen from (2).
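The linearity argument behind this proof can be checked numerically; below is a small sketch with hypothetical densities and two hypothetical deterministic rules:

```python
import numpy as np

# Sketch: the error vector of the mixture theta*delta1 + (1-theta)*delta2 equals
# the same convex combination of the individual error vectors.
F = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])            # f_0, f_1 over a 3-point set
d1 = np.array([[1.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])           # deterministic rule delta^1
d2 = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 1.0]])           # deterministic rule delta^2
theta = 0.3

p = lambda d: d @ F.T                      # error probabilities via (2)
p_mix = p(theta * d1 + (1 - theta) * d2)   # randomize pointwise, then evaluate
assert np.allclose(p_mix, theta * p(d1) + (1 - theta) * p(d2))
```

The identity holds because (2) is linear in $\boldsymbol{\delta}$, which is exactly why $\mathsf{P}$ is convex.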

Property 2: Let $\boldsymbol{p}_{0}$ be a point on the boundary of $\mathsf{P}$. There exists a hyperplane $\{\boldsymbol{p}\,:\,\boldsymbol{v}^{T}\boldsymbol{p}=\boldsymbol{v}^{T}\boldsymbol{p}_{0}\}$ that is tangent to $\mathsf{P}$ at $\boldsymbol{p}_{0}$, with $\boldsymbol{v}^{T}\boldsymbol{p}\geq\boldsymbol{v}^{T}\boldsymbol{p}_{0}$ for all $\boldsymbol{p}\in\mathsf{P}$.

Proof: Follows immediately from the supporting hyperplane theorem [5, Sec. 2.5.2].

III Characterization of Optimal Decision Rule

In order to characterize the solution of (3), we first present the following lemma.

Lemma: Let $\boldsymbol{p}_{0}$ be a point on the boundary of $\mathsf{P}$ and $\{\boldsymbol{p}\,:\,\boldsymbol{v}^{T}\boldsymbol{p}=\boldsymbol{v}^{T}\boldsymbol{p}_{0}\}$ be a supporting hyperplane to $\mathsf{P}$ at the point $\boldsymbol{p}_{0}$.
Case 1: Any deterministic decision rule of the form given in (7) corresponding to the weights specified by $\boldsymbol{v}$ yields $\boldsymbol{p}_{0}$ if $\mathsf{B}(\boldsymbol{v})$, defined in (9), has zero probability under all hypotheses.
Case 2: $\boldsymbol{p}_{0}$ is achieved by a randomization among at most $M(M-1)$ deterministic decision rules of the form given in (7), all corresponding to the same weights specified by $\boldsymbol{v}$, if $\mathsf{B}(\boldsymbol{v})$, defined in (9), has nonzero probability under some hypotheses.

Proof: See Appendix A.

It should be noted that the condition in case 1 of the lemma, i.e., that $\mathsf{B}(\boldsymbol{v})$ has zero probability under all hypotheses, is not difficult to satisfy. A simple example is when the observation under hypothesis $\mathcal{H}_{i}$ is Gaussian distributed with mean $\mu_{i}$ and variance $\sigma^{2}$ for all $0\leq i\leq M-1$. Furthermore, the lemma implies that any extreme point of the convex set $\mathsf{P}$, i.e., any point on the boundary of $\mathsf{P}$ that is not a convex combination of any other points in the set, can be achieved by a deterministic decision rule of the form (7) without any randomization. The points that are on the boundary but are not extreme points can be obtained via randomization, as stated in case 2.

Next, we present a unified characterization of the optimal decision rule for problems that are in the form of (3). We suppose that the problem in (3) is feasible, and let $\boldsymbol{\delta}^{\ast}$ and $\boldsymbol{p}^{\ast}(\boldsymbol{\delta}^{\ast})$ denote an optimal decision rule and the corresponding pairwise error probability vector, respectively.

Theorem: An optimal decision rule that solves (3) can be obtained as
Case 1: a randomization among at most two deterministic decision rules of the form given in (7), each specified by some real $\boldsymbol{v}$, if $\mathsf{B}(\boldsymbol{v})$, defined in (9), has zero probability under all hypotheses for all real $\boldsymbol{v}$; otherwise,
Case 2: a randomization among at most $M(M-1)+1$ deterministic decision rules of the form given in (7), one specified by some real $\boldsymbol{v}$ and the remaining $M(M-1)$ corresponding to the same weights specified by another real $\boldsymbol{v}$.

Proof: If the optimal point $\boldsymbol{p}^{\ast}(\boldsymbol{\delta}^{\ast})$ is on the boundary of $\mathsf{P}$, the lemma takes care of the proof. Here, we consider the case when $\boldsymbol{p}^{\ast}(\boldsymbol{\delta}^{\ast})$ is an interior point of $\mathsf{P}$. First, we pick an arbitrary $\boldsymbol{v}^{1}\in\mathbb{R}^{M(M-1)}$ and derive the optimal deterministic decision rule according to (7). Let $\boldsymbol{p}^{1}$ denote the pairwise error probability vector corresponding to the employed decision rule. Then, we move along the ray that originates from $\boldsymbol{p}^{1}$ and passes through $\boldsymbol{p}^{\ast}(\boldsymbol{\delta}^{\ast})$. Since $\mathsf{P}$ is bounded, this ray intersects the boundary of $\mathsf{P}$ at some point, say $\boldsymbol{p}^{2}$. If the condition in case 1 is satisfied, then by case 1 of the lemma, there exists a deterministic decision rule of the form given in (7) that yields $\boldsymbol{p}^{2}$. Otherwise, by case 2 of the lemma, $\boldsymbol{p}^{2}$ is achieved by a randomization among at most $M(M-1)$ deterministic decision rules of the form given in (7), all sharing the same weight vector $\boldsymbol{v}^{2}$. Since $\boldsymbol{p}^{\ast}(\boldsymbol{\delta}^{\ast})$ resides on the line segment that connects $\boldsymbol{p}^{1}$ to $\boldsymbol{p}^{2}$, it can be attained by appropriately randomizing among the decision rules that yield $\boldsymbol{p}^{1}$ and $\boldsymbol{p}^{2}$. $\blacksquare$

When the optimization problem in (3) possesses certain structure, the maximum number of deterministic decision rules required to achieve optimal performance may be smaller than stated in the theorem. For example, suppose that the objective is a concave function of $\boldsymbol{p}$ and there are a total of $n$ constraints in (3), all linear in $\boldsymbol{p}$ (i.e., the feasible set, denoted by $\mathsf{P}^{\prime}$, is the intersection of $\mathsf{P}$ with halfspaces and hyperplanes). It is well known that the minimum of a concave function over a closed bounded convex set is achieved at an extreme point [5]. Hence, in this case, the optimal point $\boldsymbol{p}^{\ast}$ is an extreme point of $\mathsf{P}^{\prime}$. By Dubins' theorem [6], any extreme point of $\mathsf{P}^{\prime}$ can be written as a convex combination of $n+1$ or fewer extreme points of $\mathsf{P}$. Since any extreme point of $\mathsf{P}$ can be achieved by a deterministic decision rule of the form (7), the optimal decision rule is obtained as a randomization among at most $n+1$ deterministic decision rules of the form (7). If there are no constraints in (3), i.e., $n=0$, the deterministic decision rule given in (7) is optimal, and no randomization is required with a concave objective function.

An immediate and important corollary of the theorem is given below.

Corollary: Likelihood ratios are sufficient statistics for simple $M$-ary hypothesis testing under any decision criterion that is expressed in terms of arbitrary functions of error probabilities as specified in (3).

Proof: The theorem states that a solution of the generic optimization problem in (3) can be expressed in terms of decision rules of the form given in (7). These decision rules only involve comparisons among the $V_{i}(\boldsymbol{y})$'s, which are linear w.r.t. the densities $f_{i}(\boldsymbol{y})$. Normalizing the $f_{i}(\boldsymbol{y})$'s by $f_{0}(\boldsymbol{y})$ and defining $L_{i}(\boldsymbol{y}):=f_{i}(\boldsymbol{y})/f_{0}(\boldsymbol{y})$, we see that an optimal decision rule that solves the problem in (3) depends on the observation $\boldsymbol{y}$ only through the likelihood ratios. $\blacksquare$
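The normalization step can be illustrated numerically: dividing each $V_{i}(\boldsymbol{y})$ by $f_{0}(\boldsymbol{y})>0$ rescales all candidates by the same positive factor for each $\boldsymbol{y}$, so the argmin in (7) is unchanged. A sketch with hypothetical densities and weights:

```python
import numpy as np

# Hypothetical densities (f_0 > 0 everywhere) and weights over Gamma = {0, 1, 2}.
F = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
v = np.array([[0.0, 1.0],
              [2.0, 0.0]])

M = F.shape[0]
V = np.array([sum(v[i, j] * F[j] for j in range(M) if j != i) for i in range(M)])
L = F / F[0]   # likelihood ratios L_i(y) = f_i(y) / f_0(y), with L_0 = 1
W = np.array([sum(v[i, j] * L[j] for j in range(M) if j != i) for i in range(M)])

# Per-observation scaling by 1/f_0(y) > 0 preserves the comparisons in (7).
assert (V.argmin(axis=0) == W.argmin(axis=0)).all()
```

Consequently, a detector may compute and compare the $V_i$'s directly from the likelihood ratios without access to the raw densities.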

IV Numerical Examples

In this section, numerical examples are presented by considering a binary hypothesis testing problem, i.e., $M=2$ in (1). Suppose that a bit ($0$ or $1$) is sent over two independent binary channels to a decision maker, which aims to make an optimal decision based on the binary channel outputs. The output of binary channel $k$ is denoted by $y_{k}\in\{0,1\}$, $k=1,2$, and the decision maker declares its decision based on $\boldsymbol{y}=[y_{1},y_{2}]$. The probability that the output of binary channel $k$ is $i$ when bit $j$ is sent is denoted by $p_{ij}^{(k)}$ for $0\leq i,j\leq 1$, with $p_{0j}^{(k)}+p_{1j}^{(k)}=1$. Then, the pmf of $\boldsymbol{y}$ under $\mathcal{H}_{j}$ is given by

$$f_{j}(\boldsymbol{y})=\begin{cases}p_{0j}^{(1)}p_{0j}^{(2)}, & \text{if }\boldsymbol{y}=[0,0]\\ p_{0j}^{(1)}p_{1j}^{(2)}, & \text{if }\boldsymbol{y}=[0,1]\\ p_{1j}^{(1)}p_{0j}^{(2)}, & \text{if }\boldsymbol{y}=[1,0]\\ p_{1j}^{(1)}p_{1j}^{(2)}, & \text{if }\boldsymbol{y}=[1,1]\end{cases} \qquad (11)$$

for $j\in\{0,1\}$. As in the previous sections, the pairwise error probability vector of the decision maker for a given decision rule $\boldsymbol{\delta}$ is represented by $\boldsymbol{p}(\boldsymbol{\delta})$, which is expressed as $\boldsymbol{p}(\boldsymbol{\delta})=[p_{10},p_{01}]^{T}$ in this case. It is assumed that the decision maker knows the conditional pmfs in (11).
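The pmf in (11) can be written out directly; the crossover probabilities below are the values used later in the first example ($p_{10}^{(k)}=0.4$, $p_{01}^{(k)}=0.1$):

```python
from itertools import product

# Channel crossover probabilities: on channel k,
# p10[k] = P(output 1 | bit 0 sent) and p01[k] = P(output 0 | bit 1 sent).
p10 = [0.4, 0.4]
p01 = [0.1, 0.1]

def f(j, y):
    """pmf of y = [y1, y2] under H_j, as in (11); channels are independent."""
    out = 1.0
    for k in range(2):
        if j == 0:
            out *= p10[k] if y[k] == 1 else 1.0 - p10[k]
        else:
            out *= 1.0 - p01[k] if y[k] == 1 else p01[k]
    return out

# Each f_j is a valid pmf over the four possible channel-output pairs.
for j in (0, 1):
    assert abs(sum(f(j, y) for y in product((0, 1), repeat=2)) - 1.0) < 1e-12
```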

In this section, a special case of (3) is considered based on prospect theory by focusing on a behavioral decision maker [4, 7, 8, 9]. In particular, there exist no constraints (i.e., $m=p=0$ in (3)) and the objective function in (3) is expressed as

$$g_{0}(\boldsymbol{p}(\boldsymbol{\delta}))=\sum_{i=0}^{1}\sum_{j=0}^{1}w\big(P(\mathcal{H}_{i}\text{ is selected \& }\mathcal{H}_{j}\text{ is true})\big)\,v(c_{ij}) \qquad (12)$$

where $w(\cdot)$ is a weight function and $v(\cdot)$ is a value function, which characterize how a behavioral decision maker distorts probabilities and costs, respectively [4], and $P(\cdot)$ denotes the probability of its argument. In the numerical examples, the following weight function is employed: $w(p)=\frac{p^{\kappa}}{(p^{\kappa}+(1-p)^{\kappa})^{1/\kappa}}$ [4, 7, 8, 9]. In addition, the other parameters are set as $v(c_{00})=3$, $v(c_{01})=10$, $v(c_{10})=20$, and $v(c_{11})=7$. Furthermore, the prior probabilities of bit $0$ and bit $1$ are assumed to be equal.
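The weight function and an objective of the form (12) can be sketched as follows. This is only a sketch under the assumption that, with equal priors, $P(\mathcal{H}_i \text{ is selected \& } \mathcal{H}_j \text{ is true}) = \tfrac{1}{2}p_{ij}$; it is not claimed to reproduce the reported example values:

```python
import numpy as np

def w(p, kappa):
    """Probability weighting function w(p) = p^k / (p^k + (1-p)^k)^(1/k)."""
    p = np.asarray(p, dtype=float)
    return p**kappa / (p**kappa + (1.0 - p)**kappa)**(1.0 / kappa)

# Value-function outputs from the examples: vc[i, j] = v(c_ij).
vc = np.array([[3.0, 10.0],
               [20.0, 7.0]])

def g0(p10, p01, kappa):
    """Objective of the form (12), assuming equal priors (pi_0 = pi_1 = 1/2)."""
    P = 0.5 * np.array([[1.0 - p10, p01],
                        [p10, 1.0 - p01]])  # joint selection/truth probabilities
    return float(np.sum(w(P, kappa) * vc))
```

Note that $w$ fixes the endpoints ($w(0)=0$, $w(1)=1$) while distorting intermediate probabilities, which is what makes randomization beneficial in this setting.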

The aim of the decision maker is to obtain a decision rule that minimizes (12). In the first example, $\kappa$ is set to $5$, and the parameters of the binary channels are selected as $p_{10}^{(1)}=p_{10}^{(2)}=0.4$ and $p_{01}^{(1)}=p_{01}^{(2)}=0.1$. In this case, it can be shown via (11) that there exist $6$ different deterministic decision rules in the form of (7), which achieve the pairwise error probability vectors marked with blue stars in Fig. 1. The convex hull of these pairwise error probability vectors is also illustrated in the figure. Over these deterministic decision rules (i.e., in the absence of randomization), the minimum achievable value of (12) becomes $0.1901$, which corresponds to the pairwise error probability vector shown with the green square in Fig. 1. If randomization between two deterministic decision rules in the form of (7) is considered, the resulting minimum objective value becomes $0.0422$, and the corresponding pairwise error probability vector is indicated with the red triangle in the figure. On the other hand, in compliance with the theorem (case 2), the minimum value of (12) is achieved via randomization among (at most) three deterministic decision rules in the form of (7) (since $M(M-1)+1=3$). In this case, the optimal decision rule randomizes among $\delta_{1}$, $\delta_{2}$, and $\delta_{3}$, with randomization coefficients $0.41$, $0.51$, and $0.08$, respectively, as given below:

$$\begin{aligned}
\delta_{1}(\boldsymbol{y}) &=0~\text{for all }\boldsymbol{y}\\
\delta_{2}(\boldsymbol{y}) &=\begin{cases}0, & \text{if }\boldsymbol{y}\in\{[0,1],[1,0],[1,1]\}\\ 1, & \text{if }\boldsymbol{y}=[0,0]\end{cases}\\
\delta_{3}(\boldsymbol{y}) &=\begin{cases}0, & \text{if }\boldsymbol{y}=[1,1]\\ 1, & \text{if }\boldsymbol{y}\in\{[0,0],[0,1],[1,0]\}\end{cases}
\end{aligned} \qquad (13)$$

This optimal decision rule achieves the lowest objective value of $0.0400$, and the corresponding pairwise error probability vector is marked with the black circle in Fig. 1. Hence, this example shows that randomization among three deterministic decision rules may be required to obtain the solution of (3).

Figure 1: Convex hull of the pairwise error probability vectors corresponding to the deterministic decision rules in (7), together with the pairwise error probability vectors of the decision rules that yield the minimum objective values attained via no randomization (square), randomization of two (triangle), and randomization of three deterministic decision rules (circle), where $p_{10}^{(1)}=p_{10}^{(2)}=0.4$, $p_{01}^{(1)}=p_{01}^{(2)}=0.1$, and $\kappa=5$.

In the second example, the parameters are taken as $\kappa=1.5$, $p_{10}^{(1)}=0.3$, $p_{10}^{(2)}=0.2$, $p_{01}^{(1)}=0.4$, and $p_{01}^{(2)}=0.25$. In this case, there exist $8$ different deterministic decision rules in the form of (7), which achieve the pairwise error probability vectors marked with blue stars in Fig. 2. The minimum value of (12) among these deterministic decision rules is $3.9278$, which corresponds to the pairwise error probability vector shown with the green square in the figure. In addition, the pairwise error probability vectors corresponding to the solutions with randomization of two and three deterministic decision rules are marked with the red triangle and the black circle, respectively. In this scenario, the minimum objective value ($3.8432$) can be achieved via randomization of only two deterministic decision rules. This is again in compliance with the theorem (case 2), which states that an optimal decision rule can be obtained as a randomization among at most $M(M-1)+1$ deterministic decision rules of the form given in (7).

Figure 2: Convex hull of the pairwise error probability vectors corresponding to the deterministic decision rules in (7), together with the pairwise error probability vectors of the decision rules that yield the minimum objective values attained via no randomization (square), randomization of two (triangle), and randomization of three deterministic decision rules (circle), where $p_{10}^{(1)}=0.3$, $p_{10}^{(2)}=0.2$, $p_{01}^{(1)}=0.4$, $p_{01}^{(2)}=0.25$, and $\kappa=1.5$.

V Concluding Remarks

This letter presents a unified characterization of optimal decision rules for simple $M$-ary hypothesis testing under a generic performance criterion that depends on arbitrary functions of error probabilities. It is shown that optimal performance with respect to the design criterion can be achieved by randomizing among at most two deterministic decision rules of a form reminiscent of (but not necessarily identical to) the Bayes rule when points on the decision boundary do not contribute to the error probabilities. In the general case, the search for an optimal decision rule reduces to a search over two weight coefficient vectors, each of length $M(M-1)$. Likelihood ratios are shown to be sufficient statistics. Classical performance measures including Bayesian, minimax, Neyman-Pearson, generalized Neyman-Pearson, restricted Bayesian, and prospect theory based approaches all appear as special cases of the considered framework.

Finally, we point out that the form of optimal local sensor decision rules for the problem of distributed detection [10, 11, 12, 13] with conditionally independent observations at the sensors and an arbitrary fusion rule can be characterized using the proposed framework.

Appendix A Proof of Lemma

Since $\{\boldsymbol{p}\,:\,\boldsymbol{v}^{T}\boldsymbol{p}=\boldsymbol{v}^{T}\boldsymbol{p}_{0}\}$ is a supporting hyperplane to $\mathsf{P}$ at the point $\boldsymbol{p}_{0}$, we get $\boldsymbol{v}^{T}\boldsymbol{p}\geq\boldsymbol{v}^{T}\boldsymbol{p}_{0}$ for all $\boldsymbol{p}\in\mathsf{P}$. Furthermore, the deterministic decision rule given in (7), denoted here by $\boldsymbol{\delta}^{\ast}$, minimizes $\boldsymbol{v}^{T}\boldsymbol{p}$ among all decision rules $\boldsymbol{\delta}\in\mathsf{D}$ (and consequently over all $\boldsymbol{p}\in\mathsf{P}$). Since $\boldsymbol{p}_{0}\in\mathsf{P}$ as well, the deterministic decision rule given in (7) achieves a performance score of $\boldsymbol{v}^{T}\boldsymbol{p}_{0}$. Any other decision rule that does not agree with $\boldsymbol{\delta}^{\ast}$ on a subset of $\bar{\mathsf{B}}(\boldsymbol{v})$ with nonzero probability measure will have a strictly greater performance score than $\boldsymbol{v}^{T}\boldsymbol{p}_{0}$ (due to the optimality of $\boldsymbol{\delta}^{\ast}$), and hence cannot be on the supporting hyperplane.
Case 1: We prove the first part by contraposition. Suppose that the deterministic decision rule $\boldsymbol{\delta}^{\ast}$ given in (7) yields $\boldsymbol{p}^{\ast}\neq\boldsymbol{p}_{0}$, meaning that $\boldsymbol{p}_{0}$ is achieved by some other decision rule $\boldsymbol{\delta}^{0}\in\mathsf{D}$. Since $\boldsymbol{\delta}^{\ast}$ minimizes $\boldsymbol{v}^{T}\boldsymbol{p}$ over all $\boldsymbol{p}\in\mathsf{P}$, $\boldsymbol{v}^{T}\boldsymbol{p}^{\ast}=\boldsymbol{v}^{T}\boldsymbol{p}_{0}$ holds, and both $\boldsymbol{p}^{\ast}$ and $\boldsymbol{p}_{0}$ are located on the supporting hyperplane $\{\boldsymbol{p}\,:\,\boldsymbol{v}^{T}\boldsymbol{p}=\boldsymbol{v}^{T}\boldsymbol{p}_{0}\}$. This implies that $\boldsymbol{\delta}^{\ast}$ and $\boldsymbol{\delta}^{0}$ must agree on every subset of $\bar{\mathsf{B}}(\boldsymbol{v})$ with nonzero probability measure. As a result, the difference between the pairwise error probability vectors $\boldsymbol{p}^{\ast}$ and $\boldsymbol{p}_{0}$ must stem from the difference between $\boldsymbol{\delta}^{\ast}$ and $\boldsymbol{\delta}^{0}$ over $\mathsf{B}(\boldsymbol{v})$. Consequently, the set $\mathsf{B}(\boldsymbol{v})$ cannot have zero probability under all hypotheses.
Case 2: Suppose that the set of boundary points specified by $\mathsf{B}(\boldsymbol{v})$ has nonzero probability under some hypotheses. In this case, each point in $\mathsf{B}_{i,j}(\boldsymbol{v})$ can be assigned arbitrarily (or in a randomized manner) to hypotheses $\mathcal{H}_{i}$ and $\mathcal{H}_{j}$. Since the way the ties are broken does not change $\boldsymbol{v}^{T}\boldsymbol{p}$, the resulting error probability vectors are all located on the intersection of the set $\mathsf{P}$ with the $(M(M-1)-1)$-dimensional supporting hyperplane $\{\boldsymbol{p}\,:\,\boldsymbol{v}^{T}\boldsymbol{p}=\boldsymbol{v}^{T}\boldsymbol{p}_{0}\}$. By Carathéodory's theorem [14], any point (including $\boldsymbol{p}_{0}$) in the intersection set, whose dimension is at most $M(M-1)-1$, can be represented as a convex combination of at most $M(M-1)$ extreme points of this set. Since these extreme points can only be obtained via deterministic decision rules that all agree with $\boldsymbol{\delta}^{\ast}$ on the set $\bar{\mathsf{B}}(\boldsymbol{v})$, $\boldsymbol{p}_{0}$ can be achieved by a randomization among at most $M(M-1)$ deterministic decision rules of the form given in (7), all corresponding to the weights specified by $\boldsymbol{v}$. $\blacksquare$

References

  • [1] H. V. Poor, An Introduction to Signal Detection and Estimation. New York: Springer-Verlag, 1994.
  • [2] E. L. Lehmann, Testing Statistical Hypotheses, 2nd ed. New York, USA: Chapman & Hall, 1986.
  • [3] J. L. Hodges, Jr. and E. L. Lehmann, “The use of previous experience in reaching statistical decisions,” Ann. Math. Stat., vol. 23, no. 3, pp. 396–407, Sep. 1952.
  • [4] S. Gezici and P. K. Varshney, “On the optimality of likelihood ratio test for prospect theory-based binary hypothesis testing,” IEEE Signal Process. Lett., vol. 25, no. 12, pp. 1845–1849, Dec 2018.
  • [5] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, UK: Cambridge University Press, 2004.
  • [6] H. Witsenhausen, “Some aspects of convexity useful in information theory,” IEEE Trans. Inform. Theory, vol. 26, no. 3, pp. 265–271, May 1980.
  • [7] R. Gonzalez and G. Wu, “On the shape of the probability weighting function,” Cognitive Psychology, vol. 38, no. 1, pp. 129–166, 1999.
  • [8] D. Prelec, “The probability weighting function,” Econometrica, vol. 66, no. 3, pp. 497–527, 1998.
  • [9] A. Tversky and D. Kahneman, “Advances in prospect theory: Cumulative representation of uncertainty,” Journal of Risk and Uncertainty, vol. 5, pp. 297–323, 1992.
  • [10] C. Altay and H. Delic, “Optimal quantization intervals in distributed detection,” IEEE Trans. Aerosp. Electron. Syst., vol. 52, no. 1, pp. 38–48, February 2016.
  • [11] C. A. M. Sotomayor, R. P. David, and R. Sampaio-Neto, “Adaptive nonassisted distributed detection in sensor networks,” IEEE Trans. Aerosp. Electron. Syst., vol. 53, no. 6, pp. 3165–3174, Dec 2017.
  • [12] A. Ghobadzadeh and R. S. Adve, “Separating function estimation test for binary distributed radar detection with unknown parameters,” IEEE Trans. Aerosp. Electron. Syst., 2018.
  • [13] D. Warren and P. Willett, “Optimum quantization for detector fusion: some proofs, examples, and pathology,” J. Franklin Inst., vol. 336, no. 2, pp. 323–359, 1999.
  • [14] R. T. Rockafellar, Convex Analysis. Princeton, NJ: Princeton University Press, 1968.