
Multi-Way Number Partitioning: an Information-Theoretic View

Niloufar Ahmadypour and Amin Gohari This work was supported in part by INSF grant 96015883 and INSF grant on “Nanonetwork Communications”. Niloufar Ahmadypour is with the Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran (e-mail: ahmadypour_n@ee.sharif.edu). Amin Gohari was previously with the Department of Electrical Engineering, Sharif University of Technology. He is currently with the Tehran Institute for Advanced Studies (TeIAS), Tehran, Iran (email: a.gohari@teias.institute).
Abstract

The number partitioning problem is the problem of partitioning a given list of numbers into multiple subsets so that the sums of the numbers in the subsets are as nearly equal as possible. We introduce two closely related notions of the “most informative” and “most compressible” partitions. Most informative partitions satisfy a principle of optimality. We also give an exact algorithm (based on Huffman coding) with a running time of $\mathcal{O}(n\log n)$ in the input size $n$ to find the most compressible partition.

Index Terms:
Multi-way number partitioning, Entropy, Huffman codes.

I Introduction

Let $S=(\alpha_{1},\alpha_{2},\cdots,\alpha_{n})$ be a list of $n$ positive integers. The number partitioning problem is the task of partitioning $S$ into $k$ subsets $S_{1},S_{2},\cdots,S_{k}$ so that the sums of the numbers in the subsets ($q_{i}=\sum_{\alpha_{j}\in S_{i}}\alpha_{j}$ for $1\leq i\leq k$) are as nearly equal as possible. For instance, if $S=(1,1,2,3,4,5)$ and $k=2$, we can consider the partition $(1,1,2,4)$ and $(3,5)$. The numbers in each subset add up to 8, so this is a completely balanced partition. Three typical objective functions exist for this problem [1] (the short sketch after the list below evaluates them on this example):

  1. [Min-Difference objective function] Minimize the difference between the largest and smallest subset sums, i.e., minimize $\max_{1\leq i\leq k}q_{i}-\min_{1\leq i\leq k}q_{i}$,

  2. [Min-Max objective function] Minimize the largest subset sum, i.e., minimize $\max_{1\leq i\leq k}q_{i}$,

  3. [Max-Min objective function] Maximize the smallest subset sum, i.e., maximize $\min_{1\leq i\leq k}q_{i}$.
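As a quick illustration, the following minimal Python sketch (not part of the original paper; all names are ours) evaluates these three objectives for the balanced partition of the running example:

# Evaluate the three classical objectives for a given partition (illustrative sketch).
partition = [[1, 1, 2, 4], [3, 5]]            # balanced partition of S = (1, 1, 2, 3, 4, 5), k = 2
sums = [sum(subset) for subset in partition]  # subset sums q_i

min_difference = max(sums) - min(sums)  # Min-Difference objective (to be minimized)
min_max = max(sums)                     # Min-Max objective (to be minimized)
max_min = min(sums)                     # Max-Min objective (to be maximized)

print(sums, min_difference, min_max, max_min)  # [8, 8] 0 8 8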

While these objective functions are equivalent when $k=2$, they are no longer equivalent to one another for $k>2$ [2]. For the case of $k=2$, Karp proved that the decision version of the number partitioning problem is NP-complete [3]. However, there are algorithms, such as a pseudo-polynomial-time dynamic programming solution and various heuristic and exact search algorithms, that solve the problem approximately or exactly [4, 2, 3, 5, 6, 7, 1, 8, 9, 10].

In this paper, we introduce a new objective function for the number partitioning problem, different from the three objective functions described above. Let $M=\sum_{i=1}^{n}\alpha_{i}=\sum_{i=1}^{k}q_{i}$ be the sum of all numbers in the list. Then, $(q_{1}/M,q_{2}/M,\cdots,q_{k}/M)$ is a probability distribution, and we can measure its distance from the uniform distribution via its Shannon entropy:

H(q_{1}/M,q_{2}/M,\cdots,q_{k}/M)=\sum_{i=1}^{k}\frac{q_{i}}{M}\log_{2}\frac{M}{q_{i}}.

The above Shannon entropy is less than or equal to $\log(k)$ and reaches its maximum $\log(k)$ for the uniform distribution. We define a new objective function as maximizing this Shannon entropy and call it the entropic objective function. In information theory, entropy also finds an interpretation in terms of the optimal compression rate of a source. This interpretation of entropy allows us to define another objective function, closely related to the entropic objective function, which we call the compression objective function. Using a variant of Huffman coding, we present an exact algorithm, with a running time of $\mathcal{O}(n\log n)$, to solve the optimization problem with the compression objective function.
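For concreteness, here is a minimal Python sketch (ours, for illustration only) of the entropic objective, taking the subset sums $q_{i}$ as input:

import math

def entropic_objective(subset_sums):
    # Shannon entropy (in bits) of the normalized subset sums q_i / M.
    M = sum(subset_sums)
    return -sum((q / M) * math.log2(q / M) for q in subset_sums if q > 0)

print(entropic_objective([8, 8]))   # balanced partition of the running example: 1.0 = log2(2)
print(entropic_objective([11, 5]))  # a less balanced partition of the same list: about 0.896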

The rest of this paper is organized as follows. In Section II, the entropic objective function is presented and a principle of optimality is proven for it. Section III introduces the compression objective function.

II Entropic objective function and a principle of optimality

Definition 1.

Given a list $S=(\alpha_{1},\alpha_{2},\cdots,\alpha_{n})$, we define a random variable $X$ over the alphabet set $\mathcal{X}=\{1,2,\cdots,n\}$ such that $\mathbb{P}[X=i]=\alpha_{i}/M$ where $M=\sum_{i}\alpha_{i}$. Let $\mathcal{A}$ be a set of size $k$. Then, an $(n,k)$-partition function is a mapping $f:\mathcal{X}\to\mathcal{A}$. This partitions $\mathcal{X}$ into $k$ sets $f^{-1}(a)$ for $a\in\mathcal{A}$. Let $A=f(X)$. Then, the marginal distribution of $A$ is characterized by the sums of the numbers in the different subsets divided by $M$.

Definition 2.

For two discrete random variables $X$ and $Y$ with joint probability mass function $p(x,y)$, define

H(Y|X)=-\sum_{x\in\mathcal{X}}p(x)\sum_{y\in\mathcal{Y}}p(y|x)\log_{2}p(y|x)
I(X;Y)=H(X)-H(X|Y)=H(Y)-H(Y|X).

The number partition problem with the entropic objective function can be expressed as follows

\operatorname*{arg\ max}_{f:\mathcal{X}\to\mathcal{A}}H(A)   (1)

where $A=f(X)$. Since $H(A)=I(X;A)$, we are looking for a partition function $f$ such that $f(X)$ is most informative about $X$. As an example, consider the list $S=(1,1,2,3,4,5)$. This corresponds to a random variable $X$ with distribution $(1/16,1/16,2/16,3/16,4/16,5/16)$ on the set $\{1,2,3,4,5,6\}$. If $k=2$, we should consider functions $f:\{1,2,3,4,5,6\}\to\{1,2\}$. Assume that $f(1)=f(2)=f(3)=f(5)=1$ and $f(4)=f(6)=2$. Then $A=f(X)$ is uniform and $H(A)=1$ bit. Note that for $k=2$, the random variable $A$ is binary and maximizing $H(A)$ is equivalent to making $p(A=1)$ as close as possible to $1/2$. Thus, the entropic objective function is equivalent to the Min-Difference, Min-Max and Max-Min objective functions, reviewed in the introduction, for $k=2$. For $k>2$, the entropic objective function is related to the Min-Max objective function. Recall that the min-entropy of a random variable, $H_{\infty}(A)$, is defined as $H_{\infty}(A)\triangleq-\log\max_{a\in\mathcal{A}}p_{A}(a)$. Thus, maximizing $H_{\infty}(A)$ is equivalent to minimizing the largest subset sum. Since the min-entropy is never larger than the Shannon entropy, the maximum value of $H_{\infty}(A)$ yields a lower bound on the maximum of $H(A)$. (Footnote: Maximizing $H_{\infty}(A)$ is equivalent to minimizing $D_{\infty}(p_{A}\|u_{A})$, where $u_{A}$ is the uniform distribution on the alphabet $\mathcal{A}$ and $D_{\infty}$ is the Rényi divergence of order infinity. The entropic objective function is equivalent to minimizing $D(p_{A}\|u_{A})$, where $D$ is the KL divergence. On the other hand, minimizing $D_{\infty}(u_{A}\|p_{A})$ is equivalent to maximizing the smallest subset sum. Note that minimizing $D(u_{A}\|p_{A})$ is equivalent to maximizing the product of all subset sums, a different objective function that also satisfies a principle of optimality as in Theorem 3.) Finally, the entropic objective function depends on all of the subset sums, not just the largest or smallest subset sum. Next, we discuss a principle of optimality for the entropic objective function.

Principle of optimality: A property of the Min-Difference objective function is the following: given an optimal $k$-way partition, if the numbers in any $k-1$ of its subsets are re-partitioned optimally into $k-1$ subsets, the resulting partition is also optimal (principle of optimality) [4]. This property underlies the recursive algorithms of [4]. A different and somewhat more general principle of optimality (called the recursive principle of optimality in [5]) holds for the Min-Max and Max-Min objective functions [2, 5]. It says that for any optimal $k$-way partition and any $k_{1}+k_{2}=k$, combining any optimal $k_{1}$-way partition of the numbers in $k_{1}$ of its subsets with any optimal $k_{2}$-way partition of the numbers in the other $k_{2}$ subsets results in an optimal partition of the original list [5]. In [6], the authors develop a principle of weakest-link optimality for minimizing the largest subset sum. In [7], the authors incorporate the ideas of [5, 6] and [11] and develop an algorithm that is similar to [6] in the sense of weakest-link optimality. See [1] for a review.

Next, we prove that the entropic objective function has a principle of optimality property similar to the one in [5] (which is the basis of algorithms in [5]).

Theorem 3.

Take an optimal $(n,k)$-partition function $f:\mathcal{X}\to\mathcal{A}$. Let $\mathcal{A}_{1}$, $\mathcal{A}_{2}$ be an arbitrary partition of $\mathcal{A}$ into two sets. Define a partition of $\mathcal{X}$ into $\mathcal{X}_{1}$ and $\mathcal{X}_{2}$ by $\mathcal{X}_{i}=f^{-1}(\mathcal{A}_{i})$. Define a random variable $X_{i}$ on the set $\mathcal{X}_{i}$ whose distribution equals the conditional distribution of $X$ given $A\in\mathcal{A}_{i}$. Set $k_{i}=|\mathcal{A}_{i}|$, $n_{i}=|\mathcal{X}_{i}|$, $i\in\{1,2\}$. Let $f_{1}:\mathcal{X}_{1}\to\mathcal{A}_{1}$ be an arbitrary optimal $(n_{1},k_{1})$-partition function of $X_{1}$, and $f_{2}:\mathcal{X}_{2}\to\mathcal{A}_{2}$ be an arbitrary optimal $(n_{2},k_{2})$-partition function of $X_{2}$. Then, the following function is an optimal $(n,k)$-partition function for $X$:

f_{c}(x)=\begin{cases}f_{1}(x)&x\in\mathcal{X}_{1}\\ f_{2}(x)&x\in\mathcal{X}_{2}\end{cases}

The following lemma is the key to proving Theorem 3.

Lemma 4 (Grouping Axiom of Entropy).

For any probability vector $\boldsymbol{p}=(p_{1},p_{2},\ldots,p_{k})$ and $1\leq r\leq k-1$,

H(p_{1},p_{2},\ldots,p_{k})=H\left(\sum_{i=1}^{r}p_{i},\sum_{i=r+1}^{k}p_{i}\right)
+\left(\sum_{i=1}^{r}p_{i}\right)H\left(\frac{p_{1}}{\sum_{i=1}^{r}p_{i}},\ldots,\frac{p_{r}}{\sum_{i=1}^{r}p_{i}}\right)
+\left(\sum_{i=r+1}^{k}p_{i}\right)H\left(\frac{p_{r+1}}{\sum_{i=r+1}^{k}p_{i}},\ldots,\frac{p_{k}}{\sum_{i=r+1}^{k}p_{i}}\right).   (2)
Proof of Theorem 3.

Let $f(X)=A$ be an optimal $(n,k)$-partition function of $X$. Suppose that $f_{1}(X_{1})=A^{\prime}_{1}$ is an arbitrary optimal partition function for $X_{1}$. Thus, $H(A^{\prime}_{1})\geq H(f(X_{1}))$ by definition. On the other hand, using Lemma 4 we have $H(A^{\prime}_{1})\leq H(f(X_{1}))$, because otherwise combining $f_{1}$ and $f|_{\mathcal{X}_{2}}$ results in an $(n,k)$-partition function $f^{\prime}(X)=A^{\prime}$ such that $H(A^{\prime})>H(A)$, contradicting the optimality of $f$. A similar argument holds for $X_{2}$. Hence, combining any optimal $(n_{1},k_{1})$-partition function of $X_{1}$ with any optimal $(n_{2},k_{2})$-partition function of $X_{2}$ must yield an optimal $(n,k)$-partition function of $X$. ∎
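As a sanity check of the grouping identity (2), the following small Python sketch (the helper H is ours and only for illustration) verifies it numerically on the distribution of the running example with $r=3$:

import math

def H(*p):
    # Shannon entropy (bits) of a probability vector.
    return -sum(x * math.log2(x) for x in p if x > 0)

p = [1/16, 1/16, 2/16, 3/16, 4/16, 5/16]
r = 3
s1, s2 = sum(p[:r]), sum(p[r:])
lhs = H(*p)
rhs = H(s1, s2) + s1 * H(*[x / s1 for x in p[:r]]) + s2 * H(*[x / s2 for x in p[r:]])
print(abs(lhs - rhs) < 1e-12)  # True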

III Compression objective function and an algorithm

Observe that $H(X,A)=H(X)+H(A|X)=H(A)+H(X|A)$. Since $H(A|X)=0$, we have $H(A)=H(X)-H(X|A)$. Since $H(X)$ does not depend on the choice of partition function, we can minimize $H(X|A)$ instead of maximizing $H(A)$. The conditional entropy $H(X|A)$ can be understood as the average uncertainty remaining in $X$ when $A$ is revealed. Moreover, $H(X|A)$ approximates the average number of bits required to compress the source $X$ when $A$ is revealed. Consider the running example of $S=(1,1,2,3,4,5)$ and $k=2$. A worst-case partition is to put all numbers in the first subset, and nothing in the other subset. This partition is also worst-case from the perspective of compression: assume that the random variable $X$ takes values in $\{1,2,3,4,5,6\}$ with probabilities $(1/16,1/16,2/16,3/16,4/16,5/16)$. The partition given above implies $A=1$ with probability one, and its revelation provides no information about $X$. Thus, one still needs to fully compress $X$.

To go from the entropic objective function to the compression objective function (which is more operational), we note the following connection between entropy and compression. It is known that the minimum expected length among all prefix-free codes for describing a source $X$ is achieved by the Huffman code [12]. Moreover, we have

\mathbb{E}(\ell(X))-1<H(X)\leq\mathbb{E}(\ell(X)),   (3)

where $\ell(x)$ is the length of the Huffman codeword assigned to symbol $x$ [12].

Given an $(n,k)$-partition function $f:\mathcal{X}\to\mathcal{A}$ where $\mathcal{A}=\{1,2,\cdots,k\}$, let $p_{i}(x)$ be the conditional distribution of $X$ given $A=i$ for $1\leq i\leq k$. Let $\mathcal{C}_{i}$ be the Huffman code for compressing $X$ when $X\sim p_{i}(x)$. Then, the compression objective function is defined as

L(X|A)\triangleq\sum_{i}\mathbb{P}[A=i]\,\mathbb{E}(\ell_{i}(X)|A=i)   (4)

where $\ell_{i}(x)$ is the length of the codeword assigned to symbol $x$ in $\mathcal{C}_{i}$. Using (3) we obtain

L(X|A)-1<H(X|A)\leq L(X|A).   (5)

Thus, minimizing $L(X|A)$ is approximately the same as minimizing $H(X|A)$. Unlike $H(X|A)$, $L(X|A)$ does not admit an explicit formula. However, we give a fast algorithm for solving

\operatorname*{arg\,min}_{f:\mathcal{X}\to\mathcal{A}}L(X|A)   (6)

where $A=f(X)$.
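The quantity in (4) can be computed directly for any given partition by building one Huffman code per subset. The following Python sketch (our own helper functions, for illustration) does this, using the standard fact that the expected length of a Huffman code equals the sum of the merged weights over all merge steps divided by the total weight; it reproduces the value $22/16$ obtained below for the balanced partition of the running example:

import heapq
from itertools import count

def huffman_expected_length(weights):
    # Expected codeword length (bits) of a Huffman code for the given unnormalized weights.
    if len(weights) <= 1:
        return 0.0
    total = sum(weights)
    tie = count()  # tie-breaker so the heap never compares beyond the weight
    heap = [(w, next(tie)) for w in weights]
    heapq.heapify(heap)
    cost = 0.0
    while len(heap) > 1:
        w1, _ = heapq.heappop(heap)
        w2, _ = heapq.heappop(heap)
        cost += (w1 + w2) / total  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (w1 + w2, next(tie)))
    return cost

def compression_objective(partition):
    # L(X|A) from (4): sum_i P[A=i] * E[l_i(X) | A=i], for a partition given as lists of numbers.
    M = sum(sum(subset) for subset in partition)
    return sum((sum(s) / M) * huffman_expected_length(s) for s in partition)

print(compression_objective([[1, 1, 2, 4], [3, 5]]))  # 1.375 = 22/16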

Consider a list $S=(\alpha_{1},\alpha_{2},\cdots,\alpha_{n})$ where $\alpha_{1}\leq\alpha_{2}\leq\cdots\leq\alpha_{n}$. We show in Lemma 5 that there is an optimal partition (minimizing $L(X|A)$) such that the two smallest numbers in the list, namely $\alpha_{1}$ and $\alpha_{2}$, belong to the same subset in that partition. Knowing this, we can simply merge these two numbers together and replace $\alpha_{1}$ and $\alpha_{2}$ by $\alpha_{1}+\alpha_{2}$. We claim that the problem then reduces to finding an optimal partition for the new list $(\alpha_{1}+\alpha_{2},\alpha_{3},\cdots,\alpha_{n})$. The reason is as follows: assume that $f(\alpha_{1})=f(\alpha_{2})=i$ for some $1\leq i\leq k$. Then, in the distribution of $X$ conditioned on $A=i$, the probabilities $\alpha_{1}/(Mp(A=i))$ and $\alpha_{2}/(Mp(A=i))$ are still the two smallest numbers. It is known that Huffman coding starts off by merging the two symbols of lowest probability. Therefore, as $\alpha_{1}$ and $\alpha_{2}$ are in the same group, an optimal Huffman code also begins by merging them into a symbol of probability $(\alpha_{1}+\alpha_{2})/(Mp(A=i))$. Thus, there is a one-to-one correspondence between Huffman codes for partitions of $(\alpha_{1},\alpha_{2},\cdots,\alpha_{n})$ in which $\alpha_{1}$ and $\alpha_{2}$ are in the same group, and Huffman codes for partitions of $(\alpha_{1}+\alpha_{2},\alpha_{3},\cdots,\alpha_{n})$. Moreover, from (4), $L(X|A)$ for a partition of $(\alpha_{1},\alpha_{2},\cdots,\alpha_{n})$ in which $\alpha_{1}$ and $\alpha_{2}$ are in the same group equals $p(A=i)\times(\alpha_{1}+\alpha_{2})/(Mp(A=i))=(\alpha_{1}+\alpha_{2})/M$ plus $L(X|A)$ for the corresponding partition of $(\alpha_{1}+\alpha_{2},\alpha_{3},\cdots,\alpha_{n})$. Since $(\alpha_{1}+\alpha_{2})/M$ is a constant that does not depend on the choice of partition, it suffices to proceed by minimizing $L(X|A)$ over partitions of $(\alpha_{1}+\alpha_{2},\alpha_{3},\cdots,\alpha_{n})$.

Algorithm 1 gives the formal description. This algorithm is similar to Huffman coding except that it is stopped prematurely when the size of the list becomes equal to $k$.

Input a list $S_{0}=(\alpha_{1},\alpha_{2},\cdots,\alpha_{n})$;
Set $i\leftarrow 0$;
while $|S_{i}|>k$ do
       Sort the list $S_{i}$ in increasing order as $S_{i}=(b_{1},b_{2},\ldots,b_{m})$ where $b_{1}\leq b_{2}\leq\cdots\leq b_{m}$ ;
       Merge the smallest numbers $b_{1}$ and $b_{2}$ together and form the list $S_{i+1}=(b_{1}+b_{2},b_{3},b_{4},\cdots,b_{m})$ ;
       Increase $i$ by one;

end while
Assign all original numbers that were merged into the same element of the final list to the same subset.
Algorithm 1 Minimizing $L(X|A)$
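Below is a minimal runnable rendering of Algorithm 1 in Python (our own sketch; it uses a heap so that the two smallest entries can be extracted efficiently, and the variable names are ours):

import heapq

def algorithm1(numbers, k):
    # Repeatedly merge the two smallest entries until only k entries remain,
    # keeping track of which original numbers were merged into each entry.
    assert 1 <= k <= len(numbers)
    heap = [(x, i, [x]) for i, x in enumerate(numbers)]  # (current sum, tie-breaker, merged numbers)
    heapq.heapify(heap)
    tie = len(numbers)
    while len(heap) > k:
        s1, _, g1 = heapq.heappop(heap)
        s2, _, g2 = heapq.heappop(heap)
        heapq.heappush(heap, (s1 + s2, tie, g1 + g2))
        tie += 1
    return [group for _, _, group in heap]

subsets = algorithm1([1, 1, 2, 3, 4, 5], 2)
print([sum(s) for s in subsets], subsets)
# Subset sums 7 and 9; ties may be grouped differently from the trace in the text,
# but the resulting value L(X|A) = 22/16 is the same.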

Consider the running example of $S=(1,1,2,3,4,5)$ and $k=2$. The algorithm produces the following lists: $(1,1,2,3,4,5)\mapsto(\textbf{2},2,3,4,5)\mapsto(3,\textbf{4},4,5)\mapsto(4,5,\textbf{7})\mapsto(\textbf{7},\textbf{9})$. One can see that the numbers $1,1,2,3$ are grouped together during the execution of the algorithm (adding up to $7$), and $4,5$ are also grouped together (adding up to $9$). This shows that the minimum of $L(X|A)$ equals $22/16$: the value of $L(X|A)$ for the produced partition is the sum of the intermediate merge sums divided by $M$, i.e., $(2+4+7+9)/16$. The balanced partition $(1,1,2,4)$ and $(3,5)$ also yields $L(X|A)=22/16$. However, unlike $H(X|A)$, it is not always the case that $L(X|A)$ is minimized by a perfectly balanced partition (if such a partition exists). Nonetheless, as $|L(X|A)-H(X|A)|\leq 1$, minimizing $L(X|A)$ and minimizing $H(X|A)$ are approximately the same when $H(X|A)$ and $k$ are large.

Lemma 5.

Assume $n>k$. There is an optimal mapping $f$ minimizing (6) such that the two smallest numbers are in the same subset, i.e., $f(1)=f(2)$, for a list $S=(\alpha_{1},\alpha_{2},\cdots,\alpha_{n})$ where $\alpha_{1}\leq\alpha_{2}\leq\alpha_{3}\leq\cdots\leq\alpha_{n}$.

Proof.

Take a list $S=(\alpha_{1},\alpha_{2},\cdots,\alpha_{n})$ where $\alpha_{1}\leq\alpha_{2}\leq\cdots\leq\alpha_{n}$. Let $f$ be an optimal mapping minimizing (6) such that $f(1)\neq f(2)$. There are two cases:

  1.

    One cannot find $i\in\{3,4,\cdots,n\}$ such that $f(i)=f(1)$ or $f(i)=f(2)$. In this case, the Huffman code for $X$ given $A=f(1)$ or $A=f(2)$ has zero length. Since $n>k$, one can find numbers $j_{1},j_{2}\in\{3,4,\cdots,n\}$ such that $f(j_{1})=f(j_{2})=a$ for some $a\notin\{f(1),f(2)\}$. We construct a new partition function $f^{\prime}(\cdot)$ such that $f^{\prime}(j_{1})=f(1)$, $f^{\prime}(1)=a$, $f^{\prime}(j_{2})=f(2)$, $f^{\prime}(2)=a$, and $f(\cdot)$, $f^{\prime}(\cdot)$ are equal on the other values. Then, the expected length $L(X|A)$ decreases by

    \Delta=\ell_{a}(j_{1})(\alpha_{j_{1}}-\alpha_{1})/M+\ell_{a}(j_{2})(\alpha_{j_{2}}-\alpha_{2})/M,

    where $\ell_{a}(j_{1})$ and $\ell_{a}(j_{2})$ are the lengths of the Huffman codewords assigned to $X=j_{1}$ and $X=j_{2}$ conditioned on $A=a$. This is a contradiction with the optimality of $f$ unless $\Delta=0$. If $\Delta=0$, then $f^{\prime}$ is also an optimal mapping satisfying $f^{\prime}(1)=f^{\prime}(2)$.

  2.

    There exists some $i\in\{3,4,\cdots,n\}$ such that either $f(i)=f(1)$ or $f(i)=f(2)$. Let $a=f(1)$ and $b=f(2)$. Let $\mathcal{C}_{a}$ and $\mathcal{C}_{b}$ be the Huffman codes for the distribution of $X$ given $A=a$ and $A=b$, respectively. At least one of the Huffman codes $\mathcal{C}_{a}$ and $\mathcal{C}_{b}$ has a non-zero average length. In any Huffman code with at least two symbols, the two longest codewords have the same length and are assigned to the symbols with the lowest probabilities [12]. First assume that $\ell_{a}(1)\geq\ell_{b}(2)$. Then, $\mathcal{C}_{a}$ certainly has more than one codeword. Since $X=1$ has the least probability (corresponding to $\alpha_{1}$), it has the least probability in its group and its codeword has the largest length in the code $\mathcal{C}_{a}$. Moreover, there is another codeword of this length that corresponds to some $i_{1}\in\{3,4,\cdots,n\}$. Thus, $f(i_{1})=a$ and $\ell_{a}(i_{1})=\ell_{a}(1)$. Construct the new partition function $f^{\prime}(\cdot)$ such that $f^{\prime}(i_{1})=f(2)$, $f^{\prime}(2)=f(1)$, and $f(\cdot)$, $f^{\prime}(\cdot)$ are equal on the other values. Using the same Huffman codewords as before, this change in the mapping reduces the expected length of codewords by $\Delta=(\ell_{a}(1)-\ell_{b}(2))(\alpha_{i_{1}}-\alpha_{2})/M$. This is a contradiction unless $\Delta=0$, in which case $f^{\prime}$ is also optimal. For the case $\ell_{a}(1)<\ell_{b}(2)$, a similar argument goes through. Therefore, similar to Case 1, we can construct an optimal mapping satisfying $f^{\prime}(1)=f^{\prime}(2)$.∎

We end this section by giving an information-theoretic characterization of $L(X|A)$ in terms of a source coding problem. Suppose that we have a discrete memoryless source $X$ with alphabet $\mathcal{X}$ available at the encoder; see Fig. 1. The encoder has access to two noiseless parallel channels. The first channel (depicted as channel (A) in Fig. 1) is a free channel and can carry a symbol $A\in\{1,2,\cdots,k\}$. The second channel is not free, and the encoder is charged for each transmitted bit. The goal is to minimize the average number of bits that are transmitted on the second channel in such a way that the receiver is able to perfectly reconstruct the source $X$. The solution to this problem is $L(X|A)$.

Figure 1: Sending a source over two links.

References

  • [1] E. L. Schreiber, R. E. Korf, and M. D. Moffitt, “Optimal multi-way number partitioning,” J. ACM, vol. 65, no. 4, Jul. 2018.
  • [2] R. E. Korf, “Objective functions for multi-way number partitioning,” in Third Annual Symposium on Combinatorial Search, 2010.
  • [3] R. M. Karp, “Reducibility among combinatorial problems,” in Complexity of computer computations, 1972, pp. 85–103.
  • [4] R. E. Korf, “Multi-way number partitioning,” in Twenty-First International Joint Conference on Artificial Intelligence, 2009.
  • [5] ——, “A hybrid recursive multi-way number partitioning algorithm,” in Twenty-Second International Joint Conference on Artificial Intelligence, 2011.
  • [6] M. D. Moffitt, “Search strategies for optimal multi-way number partitioning,” in Twenty-Third International Joint Conference on Artificial Intelligence, 2013.
  • [7] R. E. Korf, E. L. Schreiber, and M. D. Moffitt, “Optimal sequential multi-way number partitioning.” in ISAIM, 2014.
  • [8] R. L. Graham, “Bounds on multiprocessing timing anomalies,” SIAM Journal on Applied Mathematics, vol. 17, no. 2, pp. 416–429, 1969.
  • [9] R. E. Korf, “From approximate to optimal solutions: A case study of number partitioning,” in IJCAI, 1995, pp. 266–272.
  • [10] N. Karmarkar and R. M. Karp, The differencing method of set partitioning.   University of California Berkeley, 1982.
  • [11] R. E. Korf and E. L. Schreiber, “Optimally scheduling small numbers of identical parallel machines,” in Twenty-Third International Conference on Automated Planning and Scheduling, 2013.
  • [12] D. A. Huffman, “A method for the construction of minimum-redundancy codes,” Proceedings of the IRE, vol. 40, no. 9, pp. 1098–1101, 1952.