Multi-Way Number Partitioning: an Information-Theoretic View
Abstract
The number partitioning problem is the problem of partitioning a given list of numbers into multiple subsets so that the sums of the numbers in the subsets are as nearly equal as possible. We introduce two closely related notions of the “most informative” and “most compressible” partitions. Most informative partitions satisfy a principle of optimality property. We also give an exact algorithm (based on Huffman coding) with a running time of $O(n\log n)$ in the input size $n$ to find the most compressible partition.
Index Terms: Multi-way number partitioning, Entropy, Huffman codes.

I Introduction
Let $S = (a_1, a_2, \ldots, a_n)$ be a list of positive integers. The number partitioning problem is the task of partitioning $S$ into $k$ subsets $S_1, \ldots, S_k$ so that the sums of the numbers in the different subsets ($s_j = \sum_{a \in S_j} a$ for $j = 1, \ldots, k$) are as nearly equal as possible. For instance, if $k = 2$ and the numbers in the list add up to $16$, we may look for a partition into two subsets whose numbers each add up to $8$; such a partition is completely balanced. Three typical objective functions exist for this problem [1]:
1. [Min-Difference objective function] Minimize the difference between the largest and smallest subset sums, i.e., minimize $\max_j s_j - \min_j s_j$,
2. [Min-Max objective function] Minimize the largest subset sum, i.e., minimize $\max_j s_j$,
3. [Max-Min objective function] Maximize the smallest subset sum, i.e., maximize $\min_j s_j$.
While these objective functions are equivalent when $k = 2$, no two of them are equivalent for $k > 2$ [2]. For the case of $k = 2$, Karp proved that the decision version of the number partitioning problem is NP-complete [3]. However, there are algorithms, such as a pseudo-polynomial time dynamic programming solution and several heuristic algorithms, that solve the problem approximately or exactly [4, 2, 3, 5, 6, 7, 1, 8, 9, 10].
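To make the three objective functions concrete, the short Python sketch below evaluates all of them for a given assignment of numbers to $k$ subsets. The toy list, the labeling, and the function names are our own illustrative choices and are not taken from the paper.

```python
from typing import List, Tuple

def subset_sums(numbers: List[int], labels: List[int], k: int) -> List[int]:
    """Sum of the numbers assigned to each of the k subsets (labels[i] in {0, ..., k-1})."""
    sums = [0] * k
    for a, j in zip(numbers, labels):
        sums[j] += a
    return sums

def classic_objectives(numbers: List[int], labels: List[int], k: int) -> Tuple[int, int, int]:
    """Return (min-difference, min-max, max-min) objective values for this assignment."""
    s = subset_sums(numbers, labels, k)
    return max(s) - min(s), max(s), min(s)

# Illustrative instance: a list summing to 16 and a balanced 2-way assignment.
numbers = [3, 5, 2, 6]
labels = [0, 0, 1, 1]                             # subsets {3, 5} and {2, 6}, both summing to 8
print(classic_objectives(numbers, labels, k=2))   # (0, 8, 8)
```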
In this paper, we introduce a new objective function for the number partitioning problem, different from the three objective functions described above. Let $\sigma = \sum_{i=1}^{n} a_i$ be the sum of all numbers in the list. Then, $\left(\frac{s_1}{\sigma}, \ldots, \frac{s_k}{\sigma}\right)$ will be a probability distribution, and we can measure its distance from the uniform distribution via its Shannon entropy:
$$H\!\left(\frac{s_1}{\sigma}, \ldots, \frac{s_k}{\sigma}\right) = -\sum_{j=1}^{k} \frac{s_j}{\sigma}\,\log_2\frac{s_j}{\sigma}.$$
The above Shannon entropy is less than or equal to $\log_2 k$ and reaches its maximum for the uniform distribution. We define a new objective function as maximizing this Shannon entropy and call it the entropic objective function. In information theory, entropy also finds an interpretation in terms of the optimal compression rate of a source. This interpretation of entropy allows us to define another objective function, closely related to the entropic objective function, which we call the compression objective function. Using a variant of Huffman coding, we present an exact algorithm, with a running time of $O(n\log n)$, to solve the optimization problem with the compression objective function.
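A minimal sketch of the entropic objective function follows, using the same illustrative setup as above: it normalizes the subset sums into a probability vector and returns its Shannon entropy in bits, to be compared against the upper bound $\log_2 k$.

```python
import math
from typing import List

def entropic_objective(numbers: List[int], labels: List[int], k: int) -> float:
    """Shannon entropy (in bits) of the normalized subset sums s_j / sigma."""
    sums = [0] * k
    for a, j in zip(numbers, labels):
        sums[j] += a
    sigma = sum(numbers)
    return -sum((s / sigma) * math.log2(s / sigma) for s in sums if s > 0)

numbers = [3, 5, 2, 6]
print(entropic_objective(numbers, [0, 0, 1, 1], k=2))  # 1.0, the maximum log2(2)
print(entropic_objective(numbers, [0, 0, 0, 1], k=2))  # ~0.95 for an unbalanced split
```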
II Entropic objective function and a principle of optimality
Definition 1.
Given a list $S = (a_1, \ldots, a_n)$, we define a random variable $X$ over the alphabet set $\mathcal{X} = \{x_1, \ldots, x_n\}$ such that $\Pr[X = x_i] = a_i/\sigma$ where $\sigma = \sum_{i=1}^{n} a_i$. Let $\mathcal{Y}$ be a set of size $k$. Then, a $k$-partition function is a mapping $f: \mathcal{X} \to \mathcal{Y}$. This partitions $\mathcal{X}$ into the sets $f^{-1}(y)$ for $y \in \mathcal{Y}$. Let $Y = f(X)$. Then, the marginal distribution of $Y$ is characterized by the sums of the numbers in the different partitions divided by $\sigma$.
Definition 2.
For two discrete random variables $X$ and $Y$ with the joint probability mass function $p(x,y)$, define
$$H(Y) = -\sum_{y} p(y)\log_2 p(y), \qquad H(X|Y) = -\sum_{x,y} p(x,y)\log_2 p(x|y), \qquad I(X;Y) = H(X) - H(X|Y).$$
The number partitioning problem with the entropic objective function can be expressed as follows:
$$\max_{f:\,\mathcal{X}\to\mathcal{Y}} H(Y), \qquad (1)$$
where $Y = f(X)$. Since $H(Y) = I(X;Y)$, we are looking for a partition function $f$ such that $Y$ is most informative about $X$. As an example, consider the running example of the introduction. It corresponds to a random variable $X$ whose distribution on $\mathcal{X}$ assigns probability $a_i/\sigma$ to $x_i$. If $k = 2$, we should consider functions $f: \mathcal{X} \to \{y_1, y_2\}$. For the balanced partition of the introduction, $Y$ is uniform and $H(Y) = 1$ bit. Note that for $k = 2$, the random variable $Y$ is binary and maximizing $H(Y)$ is equivalent to making $\Pr[Y = y_1]$ as close as possible to $1/2$. Thus, the entropic objective function is equivalent to the Min-Difference, Min-Max and Max-Min objective functions, reviewed in the introduction, for $k = 2$. For $k > 2$, the entropic objective function is related to the Min-Max objective function. Remember that the min-entropy of a random variable $Y$ is defined as $H_{\infty}(Y) = -\log_2 \max_{y} p_Y(y)$. Thus, maximizing $H_{\infty}(Y)$ is equivalent to minimizing the largest subset sum. Since the min-entropy is never larger than the Shannon entropy, the maximum value of $H_{\infty}(Y)$ yields a lower bound on the maximum of $H(Y)$. Maximizing $H_{\infty}(Y)$ is equivalent to minimizing $D_{\infty}(p_Y\|u)$, where $u$ is the uniform distribution on the alphabet of $Y$ and $D_{\infty}$ is the Rényi divergence of order infinity. The entropic objective function is equivalent to minimizing $D(p_Y\|u)$, where $D$ is the KL divergence. On the other hand, minimizing $D_{\infty}(u\|p_Y)$ is equivalent to maximizing the smallest subset sum objective function. Note that minimizing $D(u\|p_Y)$ is equivalent to maximizing the product of all subset sums, a different objective function that also satisfies a principle of optimality as in Theorem 3. Finally, the entropic objective function depends on all of the subset sums, not just the largest or smallest subset sums. Next, we discuss a principle of optimality for the entropic objective function.
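For very small instances, the optimization in (1) can be carried out by brute force, which may help in experimenting with the entropic objective. The sketch below enumerates all $k^n$ partition functions, keeps one maximizing $H(Y)$, and also reports the min-entropy $H_\infty(Y)$ of the winner. The toy list and all names are our own; the paper itself does not propose this exhaustive search.

```python
import math
from itertools import product
from typing import List

def entropy_bits(p: List[float]) -> float:
    """Shannon entropy in bits of a probability vector (zero entries are skipped)."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def most_informative_partition(numbers: List[int], k: int):
    """Exhaustively maximize H(Y) over all k^n partition functions f (tiny n only)."""
    sigma = sum(numbers)
    best_f, best_p, best_H = None, None, -1.0
    for f in product(range(k), repeat=len(numbers)):   # f[i] is the label of x_i
        sums = [0] * k
        for a, y in zip(numbers, f):
            sums[y] += a
        p_Y = [s / sigma for s in sums]
        H = entropy_bits(p_Y)
        if H > best_H:
            best_f, best_p, best_H = f, p_Y, H
    min_entropy = -math.log2(max(best_p))              # H_inf(Y) of the best partition
    return best_f, best_H, min_entropy

print(most_informative_partition([3, 5, 2, 6], k=2))
# ((0, 0, 1, 1), 1.0, 1.0): a balanced split attains H(Y) = H_inf(Y) = 1 bit
```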
Principle of optimality: A property of the Min-Difference objective function is that, in each optimal $k$-way partition, if the numbers in any of the subsets are optimally partitioned, the new partition is also optimal (principle of optimality) [4]. This property underlies the recursive algorithms of [4]. A different and somewhat more general principle of optimality (called the recursive principle of optimality in [5]) is valid for the Min-Max and Max-Min objective functions [2, 5]. It says that for any optimal $k$-way partition with subsets $S_1, \ldots, S_k$ and any $1 \le r < k$, combining any optimal $r$-way partition of the numbers in $r$ of the subsets and any optimal $(k-r)$-way partition of the numbers in the other $k - r$ subsets results in an optimal partition for the main set [5]. In [6], the authors develop a principle of weakest-link optimality for minimizing the largest subset sum. In [7], the authors incorporate the ideas of [5, 6] and [11] and develop an algorithm that is similar to [6] in the sense of weakest-link optimality. See [1] for a review.
Next, we prove that the entropic objective function has a principle of optimality property similar to the one in [5] (which is the basis of algorithms in [5]).
Theorem 3.
Take an optimal $k$-partition function $f: \mathcal{X} \to \mathcal{Y}$. Let $\mathcal{Y}_1, \mathcal{Y}_2$ be an arbitrary partition of $\mathcal{Y}$ into two sets. Define a partition of $\mathcal{X}$ into $\mathcal{X}_1$ and $\mathcal{X}_2$ by $\mathcal{X}_i = f^{-1}(\mathcal{Y}_i)$. Define a random variable $X_i$ on the set $\mathcal{X}_i$ whose distribution equals the conditional distribution of $X$ given $X \in \mathcal{X}_i$. Set $k_i = |\mathcal{Y}_i|$. Let $g_1: \mathcal{X}_1 \to \mathcal{Y}_1$ be an arbitrary optimal $k_1$-partition function of $X_1$, and $g_2: \mathcal{X}_2 \to \mathcal{Y}_2$ be an arbitrary optimal $k_2$-partition function of $X_2$. Then, the following function is an optimal $k$-partition function for $X$:
$$g(x) = \begin{cases} g_1(x), & x \in \mathcal{X}_1,\\ g_2(x), & x \in \mathcal{X}_2.\end{cases}$$
The following lemma is the key to proving Theorem 3.
Lemma 4 (Grouping Axiom of Entropy).
For any probability vector $(p_1, \ldots, p_m)$ and any $1 \le r < m$,
$$H(p_1, \ldots, p_m) = H(q_1, q_2) + q_1\, H\!\left(\frac{p_1}{q_1}, \ldots, \frac{p_r}{q_1}\right) + q_2\, H\!\left(\frac{p_{r+1}}{q_2}, \ldots, \frac{p_m}{q_2}\right), \qquad (2)$$
where $q_1 = \sum_{i=1}^{r} p_i$ and $q_2 = \sum_{i=r+1}^{m} p_i$.
Proof of Theorem 3.
Let $f$ be an optimal $k$-partition function of $X$, and let $f_1$, $f_2$ denote its restrictions to $\mathcal{X}_1$ and $\mathcal{X}_2$. By Lemma 4, any $k$-partition function $h$ that maps $\mathcal{X}_1$ into $\mathcal{Y}_1$ and $\mathcal{X}_2$ into $\mathcal{Y}_2$ satisfies
$$H(h(X)) = H(q_1, q_2) + q_1\, H(h(X_1)) + q_2\, H(h(X_2)),$$
where $q_i = \Pr[X \in \mathcal{X}_i]$ does not depend on $h$. Suppose that $g_1$ is an arbitrary optimal $k_1$-partition function for $X_1$. Thus, $H(g_1(X_1)) \ge H(f_1(X_1))$ by definition. On the other hand, using Lemma 4 we have $H(g_1(X_1)) \le H(f_1(X_1))$, because otherwise combining $g_1$ and $f_2$ results in a $k$-partition function $\tilde{f}$ such that $H(\tilde{f}(X)) > H(f(X))$. That is a contradiction with the optimality of $f$. A similar argument is true for $g_2$. Hence, combining any optimal $k_1$-partition function of $X_1$ with any optimal $k_2$-partition function of $X_2$ yields the same value of $H$ as $f$, and must therefore be an optimal $k$-partition function of $X$. ∎
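The proof above rests on the grouping identity of Lemma 4. The following few lines are a simple numerical sanity check of that identity; they are our own addition, not part of the paper.

```python
import math
import random

def H(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)

random.seed(0)
w = [random.random() for _ in range(6)]
p = [x / sum(w) for x in w]                  # a random probability vector
r = 4                                        # group the first r entries vs. the rest
q1, q2 = sum(p[:r]), sum(p[r:])
lhs = H(p)
rhs = H([q1, q2]) + q1 * H([x / q1 for x in p[:r]]) + q2 * H([x / q2 for x in p[r:]])
print(abs(lhs - rhs) < 1e-12)                # True: the grouping axiom holds
```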
III Compression objective function and an algorithm
Observe that $I(X;Y) = H(Y) - H(Y|X) = H(X) - H(X|Y)$. Since $H(Y|X) = 0$ (as $Y = f(X)$ is a function of $X$), we have $H(Y) = H(X) - H(X|Y)$. Since $H(X)$ does not depend on the choice of the partition function, we can minimize $H(X|Y)$ instead of maximizing $H(Y)$. The conditional entropy $H(X|Y)$ can be understood as the average uncertainty remaining in $X$ when $Y$ is revealed. Moreover, $H(X|Y)$ approximates the average number of bits required to compress the source $X$ when $Y$ is revealed. Consider the running example of the introduction with $k = 2$. A worst-case partition is to put all numbers in the first subset, and nothing in the other subset. This partition is also worst-case from the perspective of compression: assume that the random variable $X$ takes the values $x_1, \ldots, x_n$ with probabilities $a_1/\sigma, \ldots, a_n/\sigma$. The partition given above implies $Y = y_1$ with probability one, and its revelation provides no information about $X$. Thus, one still needs to fully compress $X$.
To go from the entropic objective function to the compression objective function (which is more operational), we note the following connection between entropy and compression. It is known that the minimum expected length among all prefix-free codes to describe a source $X$ is achieved by the Huffman code [12]. Moreover, we have
$$H(X) \le \sum_{x} p(x)\,\ell(x) < H(X) + 1, \qquad (3)$$
where $\ell(x)$ is the length of the Huffman codeword assigned to symbol $x$ [12].
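As an illustration of (3), the sketch below computes Huffman codeword lengths with a binary heap and checks that the expected length is sandwiched between $H(X)$ and $H(X)+1$. The weights and the helper name are an arbitrary illustrative choice of ours.

```python
import heapq
import math
from typing import List

def huffman_lengths(weights: List[float]) -> List[int]:
    """Codeword length of each symbol in a Huffman code for the given (unnormalized) weights."""
    if len(weights) == 1:
        return [0]                           # a single symbol needs no bits
    lengths = [0] * len(weights)
    heap = [(w, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, s1 = heapq.heappop(heap)
        w2, s2 = heapq.heappop(heap)
        for i in s1 + s2:                    # every symbol under the merged node gets one bit deeper
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, s1 + s2))
    return lengths

weights = [3, 5, 2, 6]                       # unnormalized probabilities a_i
sigma = sum(weights)
p = [w / sigma for w in weights]
lengths = huffman_lengths(weights)
avg_len = sum(pi * li for pi, li in zip(p, lengths))
H_X = -sum(pi * math.log2(pi) for pi in p)
print(lengths, avg_len, H_X <= avg_len < H_X + 1)   # [3, 2, 3, 1] 1.9375 True
```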
Given a $k$-partition function $f$ with $Y = f(X)$, let $p_{X|Y=y}$ be the conditional distribution of $X$ given $Y = y$ for $y \in \mathcal{Y}$. Let $C_y$ be the Huffman code for compressing $X$ when $Y = y$. Then, the compression objective function is defined as
$$\bar{L}(f) = \sum_{y\in\mathcal{Y}} p_Y(y) \sum_{x\in\mathcal{X}} p_{X|Y}(x|y)\,\ell_y(x), \qquad (4)$$
where $\ell_y(x)$ is the length of the Huffman codeword assigned to symbol $x$ in $C_y$. Using (3) we obtain
$$H(X|Y) \le \bar{L}(f) < H(X|Y) + 1. \qquad (5)$$
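The compression objective (4) can be evaluated for a given partition function by running a Huffman construction inside each group. A compact way to obtain the per-group cost $\sum_i a_i \ell_i$ is to sum the merge weights produced by the Huffman procedure. The helper names and the toy instance below are our own; the sketch simply checks the sandwich bound (5) numerically.

```python
import heapq
import math
from typing import List

def huffman_cost(weights: List[int]) -> int:
    """Total cost sum_i a_i * l_i of a Huffman code, computed as the sum of merge weights."""
    heap = list(weights)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        cost += merged
        heapq.heappush(heap, merged)
    return cost                              # 0 for a single-symbol group

def compression_objective(numbers: List[int], labels: List[int], k: int) -> float:
    """L-bar(f): expected number of Huffman bits needed for X once Y = f(X) is revealed."""
    sigma = sum(numbers)
    groups = [[a for a, y in zip(numbers, labels) if y == j] for j in range(k)]
    return sum(huffman_cost(g) for g in groups if g) / sigma

def conditional_entropy(numbers: List[int], labels: List[int], k: int) -> float:
    """H(X|Y) in bits, with Pr[X = x_i] = a_i / sigma and Y determined by the labels."""
    sigma = sum(numbers)
    sums = [0] * k
    for a, j in zip(numbers, labels):
        sums[j] += a
    return sum((a / sigma) * math.log2(sums[j] / a) for a, j in zip(numbers, labels))

numbers, labels, k = [3, 5, 2, 6], [0, 0, 1, 1], 2
L_bar = compression_objective(numbers, labels, k)
H_XY = conditional_entropy(numbers, labels, k)
print(L_bar, round(H_XY, 3), H_XY <= L_bar < H_XY + 1)   # 1.0 0.883 True
```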
Thus, minimizing $\bar{L}(f)$ and minimizing $H(X|Y)$ are approximately the same task. Unlike $H(X|Y)$, $\bar{L}(f)$ does not admit an explicit formula. However, we give a fast algorithm for solving
$$\min_{f:\,\mathcal{X}\to\mathcal{Y}} \bar{L}(f), \qquad (6)$$
where $|\mathcal{Y}| = k$.
Consider a list $(a_1, a_2, \ldots, a_n)$ where $a_1 \le a_2 \le \cdots \le a_n$. We show in Lemma 5 that there is an optimal partition (minimizing $\bar{L}$) such that the two smallest numbers in the list, namely $a_1$ and $a_2$, belong to the same subset in that partition. Knowing this, we can simply merge these two numbers together and replace $a_1$ and $a_2$ by $a_1 + a_2$. We claim that the problem then reduces to finding an optimal partition for the new list $(a_1 + a_2, a_3, \ldots, a_n)$. The reason is as follows: assume that $f(x_1) = f(x_2) = y$ for some $y \in \mathcal{Y}$. Then, in the distribution of $X$ conditioned on $Y = y$, the probabilities of $x_1$ and $x_2$ are still the two smallest ones. It is known that a Huffman code starts off by merging the two symbols of lowest probability. Therefore, as $x_1$ and $x_2$ are in the same group, an optimal Huffman code for that group also begins by merging $x_1$ and $x_2$ into a single symbol $x'$. Thus, there is a one-to-one correspondence between Huffman codes for partitions of $(a_1, \ldots, a_n)$ in which $a_1$ and $a_2$ are in the same group, and Huffman codes for partitions of $(a_1 + a_2, a_3, \ldots, a_n)$. Moreover, from (4), $\bar{L}$ for a partition of $(a_1, \ldots, a_n)$ in which $a_1$ and $a_2$ are in the same group equals $\bar{L}$ for the corresponding partition of $(a_1 + a_2, a_3, \ldots, a_n)$ plus $(a_1 + a_2)/\sigma$. Since $(a_1 + a_2)/\sigma$ is a constant that does not depend on the choice of partition, it suffices to proceed by minimizing $\bar{L}$ over partitions of $(a_1 + a_2, a_3, \ldots, a_n)$.
Algorithm 1 gives the formal algorithm. It is similar to the Huffman code, except that the algorithm is stopped prematurely when the size of the list becomes equal to $k$.
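A possible implementation of this truncated Huffman procedure is sketched below; as argued above, each merge of two entries with sums $a$ and $b$ contributes $(a+b)/\sigma$ to $\bar{L}$, so the sketch reports both the resulting partition and the value of $\bar{L}$ it achieves. The function name, the return format, and the toy list are our own; the paper's Algorithm 1 may present the same steps differently.

```python
import heapq
from typing import List, Tuple

def truncated_huffman_partition(numbers: List[int], k: int) -> Tuple[List[List[int]], float]:
    """Merge the two smallest entries until k remain; return the groups and the achieved L-bar."""
    assert 1 <= k <= len(numbers)
    sigma = sum(numbers)
    # Heap entries: (current sum of the group, tie-breaker, original numbers in the group).
    heap = [(a, i, [a]) for i, a in enumerate(numbers)]
    heapq.heapify(heap)
    tie, total_cost = len(numbers), 0
    while len(heap) > k:
        s1, _, g1 = heapq.heappop(heap)
        s2, _, g2 = heapq.heappop(heap)
        total_cost += s1 + s2                # each merge adds (a + b) / sigma to L-bar
        heapq.heappush(heap, (s1 + s2, tie, g1 + g2))
        tie += 1
    groups = [g for _, _, g in heap]
    return groups, total_cost / sigma

print(truncated_huffman_partition([3, 5, 2, 6], k=2))
# ([[6], [5, 2, 3]], 0.9375)   (the ordering of the groups may differ)
```

For this toy list the optimal partition returned by the procedure is unbalanced and achieves $\bar{L} = 0.9375$, strictly smaller than the value $1.0$ of the balanced split $\{3,5\}, \{2,6\}$; this already illustrates the remark below that $\bar{L}$ need not be minimized by a perfectly balanced partition.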
Consider the running example of the introduction with $k = 2$. One can see that the numbers of one subset are grouped together during the execution of the algorithm (adding up to $8$), and the remaining numbers are also grouped together (adding up to $8$). This shows that the minimum of $\bar{L}$ is attained by a balanced partition in this example, and the balanced partition of the introduction yields the same value of $\bar{L}$. However, unlike $H(Y)$, it is not always the case that $\bar{L}$ is minimized by a perfectly balanced partition (if such a partition exists). Nonetheless, since $H(X|Y) \le \bar{L}(f) < H(X|Y) + 1$, maximizing $H(Y)$ and minimizing $\bar{L}(f)$ are approximately the same task when $n$ and $k$ are large.
Lemma 5.
Assume $n > k$. There is an optimal mapping $f$ minimizing (6) such that the two smallest numbers are in the same partition, i.e., $f(x_1) = f(x_2)$ for a list $(a_1, \ldots, a_n)$ where $a_1 \le a_2 \le \cdots \le a_n$.
Proof.
Take a list $(a_1, \ldots, a_n)$ where $a_1 \le a_2 \le \cdots \le a_n$. Let $f$ be an optimal mapping minimizing (6); if $f(x_1) = f(x_2)$ we are done, so assume $f(x_1) \ne f(x_2)$. There are two cases:
1. One cannot find $j \notin \{1, 2\}$ such that $f(x_j) = f(x_1)$ or $f(x_j) = f(x_2)$. In this case, the Huffman code for $X$ given $Y = f(x_1)$ or $Y = f(x_2)$ has zero length. Since $n > k$, one can find numbers $x_i, x_j$ with $i, j \notin \{1, 2\}$ such that $f(x_i) = f(x_j) = y$ for some $y \in \mathcal{Y}$; we may take $x_i$ and $x_j$ to be two symbols whose codewords are siblings of maximal length in the Huffman code of this group. We construct a new partition function $f'$ such that $f'(x_2) = f(x_1)$, and $f'(x_j) = f(x_2)$, and $f$, $f'$ are equal on the other values; then the expected length decreases by at least
$$\frac{a_i \ell_i + a_j \ell_j - a_i(\ell_i - 1) - (a_1 + a_2)}{\sigma} = \frac{a_i + a_j \ell_j - a_1 - a_2}{\sigma} \ge 0,$$
where $\ell_i$ and $\ell_j$ are the lengths of the Huffman codewords assigned to $x_i$ and $x_j$ conditioned on $Y = y$. This is a contradiction with the optimality of $f$ unless the decrease is zero. If the decrease is zero, $f'$ will also be an optimal mapping, and it satisfies $f'(x_1) = f'(x_2)$.
2. There exists some $j \notin \{1, 2\}$ such that either $f(x_j) = f(x_1)$ or $f(x_j) = f(x_2)$. Let $y_1 = f(x_1)$ and $y_2 = f(x_2)$. Let $C_1$ and $C_2$ be the Huffman codes for the distribution of $X$ given $Y = y_1$ and $Y = y_2$, respectively. At least one of the Huffman codes $C_1$ and $C_2$ has a non-zero average length. In any Huffman code with at least two symbols, the two longest codewords have the same length and they are assigned to the symbols with the lowest probabilities [12]. First assume that $\ell_1 \ge \ell_2$, where $\ell_1$ and $\ell_2$ denote the lengths of the codewords assigned to $x_1$ in $C_1$ and to $x_2$ in $C_2$. Then, $C_1$ certainly has more than one codeword. Since $x_1$ has the least probability among all symbols (its probability corresponds to $a_1$), it has the least probability in its group and also its codeword has the largest length in code $C_1$. Moreover, there is another codeword with this length that corresponds to some $x_j$ with $j \notin \{1, 2\}$. Thus, $\ell_j = \ell_1 \ge \ell_2$ and $a_j \ge a_2$. Construct the new partition function $f'$ such that $f'(x_j) = y_2$, $f'(x_2) = y_1$, and $f$, $f'$ are equal on the other values. Using the same Huffman codewords as before, this change in the mapping reduces the expected length of the codewords by
$$\frac{(a_j - a_2)(\ell_j - \ell_2)}{\sigma} \ge 0.$$
This is a contradiction unless the reduction is zero, which implies the optimality of $f'$. For the case $\ell_2 > \ell_1$, a similar argument goes through with the roles of $x_1$ and $x_2$ exchanged. Therefore, similar to Case 1, we can construct an optimal mapping $f'$ satisfying $f'(x_1) = f'(x_2)$. ∎
We end this section by giving an information-theoretic characterization of $\min_f \bar{L}(f)$ in terms of a source coding problem. Suppose that we have a discrete memoryless source $X$ with alphabet $\mathcal{X}$ available at the encoder; see Fig. 1. The encoder has access to two noiseless parallel channels. The first channel (depicted as channel (A) in Fig. 1) is a free channel and can carry a symbol from an alphabet of size $k$. The second channel is not free, and the encoder is charged for each transmitted bit. The goal is to minimize the average number of bits that are transmitted on the second channel in such a way that the receiver is able to perfectly reconstruct the source $X$. The solution to this problem is $\min_f \bar{L}(f)$, the minimum in (6).
References
- [1] E. L. Schreiber, R. E. Korf, and M. D. Moffitt, “Optimal multi-way number partitioning,” J. ACM, vol. 65, no. 4, Jul. 2018.
- [2] R. E. Korf, “Objective functions for multi-way number partitioning,” in Third Annual Symposium on Combinatorial Search, 2010.
- [3] R. M. Karp, “Reducibility among combinatorial problems,” in Complexity of computer computations, 1972, pp. 85–103.
- [4] R. E. Korf, “Multi-way number partitioning,” in Twenty-First International Joint Conference on Artificial Intelligence, 2009.
- [5] ——, “A hybrid recursive multi-way number partitioning algorithm,” in Twenty-Second International Joint Conference on Artificial Intelligence, 2011.
- [6] M. D. Moffitt, “Search strategies for optimal multi-way number partitioning,” in Twenty-Third International Joint Conference on Artificial Intelligence, 2013.
- [7] R. E. Korf, E. L. Schreiber, and M. D. Moffitt, “Optimal sequential multi-way number partitioning.” in ISAIM, 2014.
- [8] R. L. Graham, “Bounds on multiprocessing timing anomalies,” SIAM journal on Applied Mathematics, vol. 17, no. 2, pp. 416–429, 1969.
- [9] R. E. Korf, “From approximate to optimal solutions: A case study of number partitioning,” in IJCAI, 1995, pp. 266–272.
- [10] N. Karmarkar and R. M. Karp, The differencing method of set partitioning. University of California Berkeley, 1982.
- [11] R. E. Korf and E. L. Schreiber, “Optimally scheduling small numbers of identical parallel machines,” in Twenty-Third International Conference on Automated Planning and Scheduling, 2013.
- [12] D. A. Huffman, “A method for the construction of minimum-redundancy codes,” Proceedings of the IRE, vol. 40, no. 9, pp. 1098–1101, 1952.