A bound on partitioning clusters

Daniel Kane
Department of Mathematics, UC San Diego
9500 Gilman Drive, La Jolla CA 92093-0112
dakane@ucsd.edu
   Terence Tao
Department of Mathematics, UCLA
405 Hilgard Ave, Los Angeles CA 90095
tao@math.ucla.edu
(Submitted: Feb 2, 2017; Accepted: Feb 2, 2017; Published: XX)
Mathematics Subject Classifications: 11B30
Abstract

Let $X$ be a finite collection of sets (or “clusters”). We consider the problem of counting the number of ways a cluster $A\in X$ can be partitioned into two disjoint clusters $A_{1},A_{2}\in X$, so that $A=A_{1}\uplus A_{2}$ is the disjoint union of $A_{1}$ and $A_{2}$; this problem arises in the run time analysis of the ASTRAL algorithm in phylogenetic reconstruction. We obtain the bound

$$|\{(A_{1},A_{2},A)\in X\times X\times X:A=A_{1}\uplus A_{2}\}|\leqslant|X|^{3/p}$$

where $|X|$ denotes the cardinality of $X$, and $p\coloneqq\log_{3}\frac{27}{4}=1.73814\dots$, so that $\frac{3}{p}=1.72598\dots$. Furthermore, the exponent $p$ cannot be replaced by any larger quantity. This improves upon the trivial bound of $|X|^{2}$. The argument relies on a one-dimensional convolution inequality, which is established by elementary calculus combined with some numerical verification.

In a similar vein, we show that for any subset $A$ of a discrete cube $\{0,1\}^{n}$, the additive energy of $A$ (the number of quadruples $(a_{1},a_{2},a_{3},a_{4})$ in $A^{4}$ with $a_{1}+a_{2}=a_{3}+a_{4}$) is at most $|A|^{\log_{2}6}$, and that this exponent is best possible.

1 Introduction

The purpose of this note is to establish the following theorem:

Theorem 1.

Let $X$ be a finite collection of sets. Then we have

$$|\{(A_{1},A_{2},A)\in X\times X\times X:A=A_{1}\uplus A_{2}\}|\leqslant|X|^{3/p} \tag{1}$$

where $|X|$ denotes the cardinality of $X$, $A=A_{1}\uplus A_{2}$ denotes the assertion that $A$ is the disjoint union of $A_{1}$ and $A_{2}$, and $p\coloneqq\log_{3}\frac{27}{4}=1.73814\dots$. Furthermore, the exponent $p$ cannot be replaced by any larger quantity.

Note that $\frac{3}{p}=1.72598\dots$. Thus the inequality (1) improves upon the trivial bound

$$|\{(A_{1},A_{2},A)\in X\times X\times X:A=A_{1}\uplus A_{2}\}|\leqslant|X|^{2} \tag{2}$$

that arises simply because there are $|X|^{2}$ pairs $(A_{1},A_{2})$, and each such pair uniquely determines $A$.

Theorem 1 has applications in analyzing the running time of several dynamic programming algorithms used in the phylogenetic reconstruction literature. Several published algorithms [4, 5, 6, 7] seek to find a median tree that minimizes the total distance to an input set of trees, for various definitions of a distance between two trees. For example, a method called ASTRAL [4, 5] defines the distance between two trees as the number of quartet trees induced by one tree but not the other, and seeks to find the unrooted tree that minimizes the sum of this quartet distance to its input set of unrooted trees, a problem that turns out to be NP-Hard [8]. To make this optimization problem tractable, ASTRAL uses a dynamic programming approach where the final tree is built by successively dividing each subset of leaves (called a cluster) $A$ into smaller clusters $A^{\prime}$ and $A\backslash A^{\prime}$, while minimizing the number of quartets in the input tree set that will have to be missing from any tree that includes $A^{\prime}|A\backslash A^{\prime}$ as a bipartition (i.e., a branch; note that a branch in an unrooted tree is just a bipartition of the leaves). If all possible subsets $A^{\prime}\subset A$ are considered when dividing $A$ into two subsets, the algorithm provably returns the optimal tree (also, the optimal solution has been shown to enjoy statistical consistency under certain models of gene and species evolution [4]). However, such a dynamic programming algorithm will have to explore the power set of the set of leaves and will thus require time exponential in the number of leaves.

To give a practical alternative, ASTRAL solves a constrained version of the problem where a set $X$ of clusters is defined in advance to constrain the search space; when dividing $A$, we only look for $A^{\prime}\subset A$ such that $A^{\prime}\in X$ and $A\backslash A^{\prime}\in X$. The set $X$ is defined heuristically, and the running time of ASTRAL should be defined asymptotically as a function of $|X|$. Throughout the dynamic programming execution, ASTRAL considers a pair of clusters $x,y\in X$ (exactly once) if and only if $x\cap y=\emptyset$ and $x\cup y\in X$. Therefore, establishing the asymptotic running time of ASTRAL with respect to $|X|$ requires bounding the left-hand side of (1). The ASTRAL authors simply used the trivial $O(|X|^{2})$ upper bound in their running time analysis. We can now improve that analysis to $O(|X|^{1.72598\dots})$.
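To make the quantity being bounded concrete, the following minimal Python sketch counts the pairs that such a dynamic program would visit for a given cluster set and compares the count with both bounds. The function names and the toy input are illustrative assumptions of ours and are not taken from the ASTRAL implementation.

```python
import math
from itertools import product

# Exponent from Theorem 1: p = log_3(27/4), so 3/p = 1.72598...
P = math.log(27 / 4, 3)


def count_partition_triples(clusters):
    """Count triples (A1, A2, A) in X^3 with A the disjoint union of A1 and A2."""
    X = {frozenset(c) for c in clusters}
    return sum(
        1
        for a1, a2 in product(X, repeat=2)
        if not (a1 & a2) and (a1 | a2) in X
    )


def bounds(clusters):
    """Return (trivial bound |X|^2, refined bound |X|^(3/p))."""
    size = len({frozenset(c) for c in clusters})
    return size ** 2, size ** (3 / P)


# Toy example (hypothetical input): X is the power set of {1, 2}.
toy = [[], [1], [2], [1, 2]]
print(count_partition_triples(toy), bounds(toy))
```

This brute-force loop itself inspects all $|X|^{2}$ ordered pairs, so it is only a way to check the count on small inputs, not a model of how an efficient implementation would enumerate the valid pairs.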

We first demonstrate why the exponent $p$ is best possible. Let $n$ be a large multiple of $3$, and let $X$ denote the collection of all sets $A\subset\{1,\dots,n\}$ whose cardinality $|A|$ is equal to either $n/3$ or $2n/3$. Clearly

$$|X|=\binom{n}{n/3}+\binom{n}{2n/3}=2\frac{n!}{(n/3)!(2n/3)!}.$$

On the other hand, if $A\in X$ has cardinality $2n/3$, then it can be partitioned in $\binom{2n/3}{n/3}$ ways into $A_{1}\uplus A_{2}$ with $A_{1},A_{2}\in X$, and no partition is available when $A$ has cardinality $n/3$. Thus the left-hand side of (1) is equal to

$$\binom{n}{2n/3}\times\binom{2n/3}{n/3}=\frac{n!}{(n/3)!(n/3)!(n/3)!}.$$

Using the Stirling approximation $n!=n^{n}e^{-n+o(n)}$ as $n\to\infty$, we conclude that the left-hand side of (1) is equal to $\exp(n(\log 3+o(1)))$, while $|X|$ is equal to $\exp(\frac{n}{3}(\log\frac{27}{4}+o(1)))$, and on sending $n\to\infty$ we conclude that (1) can fail whenever $p>\log_{3}\frac{27}{4}$.
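The convergence in this example can be observed numerically. The sketch below (our own illustration; the function names are hypothetical) evaluates the two binomial expressions exactly and reports the exponent $e$ for which the number of triples equals $|X|^{e}$, which should approach $3/p$ from below as $n$ grows.

```python
import math

P = math.log(27 / 4, 3)


def extremal_counts(n):
    """Return (|X|, number of triples) for the family of subsets of {1..n}
    of size n/3 or 2n/3, where n is a positive multiple of 3."""
    if n <= 0 or n % 3 != 0:
        raise ValueError("n must be a positive multiple of 3")
    m = n // 3
    size_x = math.comb(n, m) + math.comb(n, 2 * m)
    triples = math.comb(n, 2 * m) * math.comb(2 * m, m)
    return size_x, triples


def achieved_exponent(n):
    """Exponent e with triples = |X|^e for the extremal family."""
    size_x, triples = extremal_counts(n)
    return math.log(triples) / math.log(size_x)


for n in (3, 30, 300, 3000):
    print(n, achieved_exponent(n), 3 / P)
```

The approach to $3/p$ is slow because of the polynomial factors hidden in the $o(n)$ terms of the Stirling approximation.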

Now we show why (1) holds for $p=\log_{3}\frac{27}{4}$. This will be a consequence of the following convolution estimate on the discrete unit cube $\{0,1\}^{n}$ (which we view as a subset of $\mathbb{Z}^{n}$ for the purposes of defining convolution):

Theorem 2 (Convolution).

Let $n\geqslant 1$ be a natural number, and let $f,g,h:\{0,1\}^{n}\to\mathbb{R}$ be functions. Set $p\coloneqq\log_{3}\frac{27}{4}$. Then we have

$$f*g*h(1^{n})\leqslant\|f\|_{\ell^{p}(\{0,1\}^{n})}\|g\|_{\ell^{p}(\{0,1\}^{n})}\|h\|_{\ell^{p}(\{0,1\}^{n})} \tag{3}$$

where $1^{n}$ denotes the element $(1,\dots,1)$ of $\{0,1\}^{n}$, $f*g*h$ denotes the convolution

$$f*g*h(w)\coloneqq\sum_{x,y,z\in\{0,1\}^{n}:x+y+z=w}f(x)g(y)h(z)$$

and $\|\cdot\|_{\ell^{p}(\{0,1\}^{n})}$ denotes the $\ell^{p}$ norm

$$\|f\|_{\ell^{p}(\{0,1\}^{n})}\coloneqq\Big(\sum_{x\in\{0,1\}^{n}}|f(x)|^{p}\Big)^{1/p}.$$

Let us now see why Theorem 2 implies Theorem 1. We first observe that to prove Theorem 1, it suffices to do so under the additional assumption that all the sets in $X$ are finite. Indeed, if we let $\Omega\coloneqq\bigcup_{A\in X}A$ denote the union of the sets in $X$, then $X$ partitions $\Omega$ into at most $2^{|X|}$ cells. Some of these cells may be infinite, but we may replace any such cell with a single point without affecting either the left or right-hand side of (1). After applying this replacement, every set in $X$ is now finite.

Without loss of generality, we may now assume that all the sets $A$ in $X$ are subsets of $\{1,\dots,n\}$ for some natural number $n\geqslant 1$. We now define the functions $f,g,h:\{0,1\}^{n}\to\{0,1\}$ as follows. For any $(a_{1},\dots,a_{n})\in\{0,1\}^{n}$, we set $f(a_{1},\dots,a_{n})=g(a_{1},\dots,a_{n})=1$ when the set $\{1\leqslant i\leqslant n:a_{i}=1\}$ lies in $X$, and $f(a_{1},\dots,a_{n})=g(a_{1},\dots,a_{n})=0$ otherwise. Similarly, we set $h(a_{1},\dots,a_{n})=1$ when the set $\{1\leqslant i\leqslant n:a_{i}=0\}$ lies in $X$, and $h(a_{1},\dots,a_{n})=0$ otherwise. It is then easy to see that

$$\|f\|_{\ell^{p}(\{0,1\}^{n})}=\|g\|_{\ell^{p}(\{0,1\}^{n})}=\|h\|_{\ell^{p}(\{0,1\}^{n})}=|X|^{1/p}$$

and

$$f*g*h(1^{n})=|\{(A_{1},A_{2},A)\in X\times X\times X:A=A_{1}\uplus A_{2}\}|$$

giving the claim.
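The reduction can be checked on a small example. In the sketch below (our own illustration, with points of $\{0,1\}^{n}$ encoded as $n$-bit masks and all function names hypothetical), the triple convolution at $1^{n}$ is evaluated by brute force, first for the indicator functions built from a small family $X$ and then for random real-valued functions as a sanity check of (3).

```python
import math
import random

P = math.log(27 / 4, 3)


def triple_convolution_at_ones(f, g, h, n):
    """Compute f*g*h(1^n), with points of {0,1}^n encoded as n-bit masks."""
    full = (1 << n) - 1
    total = 0.0
    for x in range(full + 1):
        for y in range(full + 1):
            if x & y == 0:  # x + y + z = 1^n forces disjoint supports
                z = full ^ (x | y)  # z is the complement of x OR y
                total += f[x] * g[y] * h[z]
    return total


def lp_norm(f):
    """The l^p norm with p = log_3(27/4)."""
    return sum(abs(v) ** P for v in f) ** (1 / P)


def reduction_functions(masks, n):
    """Indicator functions f = g and h built from a family X of bitmasks."""
    full = (1 << n) - 1
    X = set(masks)
    f = [1.0 if x in X else 0.0 for x in range(full + 1)]
    h = [1.0 if (full ^ z) in X else 0.0 for z in range(full + 1)]
    return f, f, h


n = 3
X = {0b001, 0b010, 0b011, 0b100, 0b111}  # hypothetical toy family
f, g, h = reduction_functions(X, n)
print(triple_convolution_at_ones(f, g, h, n))  # number of triples for this X

random.seed(0)
rf, rg, rh = ([random.uniform(-1, 1) for _ in range(1 << n)] for _ in range(3))
lhs = triple_convolution_at_ones(rf, rg, rh, n)
print(lhs <= lp_norm(rf) * lp_norm(rg) * lp_norm(rh))
```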

Remark 3.

Young’s convolution inequality establishes (3) with $p$ replaced by $3/2$ (or any exponent less than $3/2$). This corresponds to the trivial bound (2). The ability to improve the exponents in Young’s convolution inequality is reminiscent of the Kunze-Stein phenomenon [3] in semisimple Lie groups, as well as the hypercontractivity inequality on the Boolean cube (see e.g. [2]). Indeed, the proof of Theorem 2 will be similar to the proof of hypercontractivity in that we will soon reduce matters to verifying the one-dimensional case $n=1$.

Remark 4.

The above argument in fact establishes the more general inequality

$$|\{(A_{1},A_{2},A)\in X\times X_{1}\times X_{2}:A=A_{1}\uplus A_{2}\}|\leqslant|X|^{1/p}|X_{1}|^{1/p}|X_{2}|^{1/p}$$

whenever $X,X_{1},X_{2}$ are finite collections of sets.

The form of Theorem 2 is very amenable to an induction on dimension:

Proposition 5.

Let $n_{1},n_{2}\geqslant 1$ be natural numbers. If Theorem 2 holds for $n=n_{1}$ and $n=n_{2}$, then it holds for $n=n_{1}+n_{2}$.

Proof.

Let $f,g,h:\{0,1\}^{n_{1}+n_{2}}\to\mathbb{R}$ be functions. For any $x\in\{0,1\}^{n_{1}}$, let $f_{x}:\{0,1\}^{n_{2}}\to\mathbb{R}$ denote the function

$$f_{x}(x^{\prime})\coloneqq f(x,x^{\prime})$$

for $x^{\prime}\in\{0,1\}^{n_{2}}$ (where we identify the pair $(x,x^{\prime})$ with an element of $\{0,1\}^{n_{1}+n_{2}}$ in the usual fashion), and define $g_{y}$ and $h_{z}$ similarly. Then we can write

$$f*g*h(1^{n_{1}+n_{2}})=\sum_{x,y,z\in\{0,1\}^{n_{1}}:x+y+z=1^{n_{1}}}f_{x}*g_{y}*h_{z}(1^{n_{2}}).$$

Applying Theorem 2 for $n=n_{2}$ and the functions $f_{x},g_{y},h_{z}$, we thus have

$$f*g*h(1^{n_{1}+n_{2}})\leqslant\sum_{x,y,z\in\{0,1\}^{n_{1}}:x+y+z=1^{n_{1}}}\|f_{x}\|_{\ell^{p}(\{0,1\}^{n_{2}})}\|g_{y}\|_{\ell^{p}(\{0,1\}^{n_{2}})}\|h_{z}\|_{\ell^{p}(\{0,1\}^{n_{2}})}.$$

Applying Theorem 2 for $n=n_{1}$ and the functions $x\mapsto\|f_{x}\|_{\ell^{p}(\{0,1\}^{n_{2}})}$, $y\mapsto\|g_{y}\|_{\ell^{p}(\{0,1\}^{n_{2}})}$, $z\mapsto\|h_{z}\|_{\ell^{p}(\{0,1\}^{n_{2}})}$, we obtain

$$f*g*h(1^{n_{1}+n_{2}})\leqslant\|f\|_{\ell^{p}(\{0,1\}^{n_{1}+n_{2}})}\|g\|_{\ell^{p}(\{0,1\}^{n_{1}+n_{2}})}\|h\|_{\ell^{p}(\{0,1\}^{n_{1}+n_{2}})}$$

which gives Theorem 2 for $n=n_{1}+n_{2}$. ∎
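The final step of this proof implicitly uses the fact that the $\ell^{p}$ norm on the product cube can be computed one factor at a time. Spelled out (this intermediate identity is our own elaboration and is not displayed in the argument above), it reads as follows for $f$, and identically for $g$ and $h$:

```latex
\[
\sum_{x\in\{0,1\}^{n_{1}}}\|f_{x}\|_{\ell^{p}(\{0,1\}^{n_{2}})}^{p}
=\sum_{x\in\{0,1\}^{n_{1}}}\sum_{x'\in\{0,1\}^{n_{2}}}|f(x,x')|^{p}
=\|f\|_{\ell^{p}(\{0,1\}^{n_{1}+n_{2}})}^{p}.
\]
```

Taking $p$-th roots shows that the $\ell^{p}(\{0,1\}^{n_{1}})$ norm of $x\mapsto\|f_{x}\|_{\ell^{p}(\{0,1\}^{n_{2}})}$ is exactly $\|f\|_{\ell^{p}(\{0,1\}^{n_{1}+n_{2}})}$.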

From this proposition and induction, we see that to prove Theorem 2, it suffices to do so in the one-dimensional case $n=1$. Replacing $f,g,h$ by their absolute values (which can only increase the left-hand side of (3)), we may take them to be non-negative, and we may normalize

$$\|f\|_{\ell^{p}(\{0,1\})}=\|g\|_{\ell^{p}(\{0,1\})}=\|h\|_{\ell^{p}(\{0,1\})}=1,$$

so that we may write

$$\begin{aligned}
f(0)&=a^{1/p}, & f(1)&=(1-a)^{1/p},\\
g(0)&=b^{1/p}, & g(1)&=(1-b)^{1/p},\\
h(0)&=c^{1/p}, & h(1)&=(1-c)^{1/p}
\end{aligned}$$

for some $0\leqslant a,b,c\leqslant 1$. The inequality (3) then simplifies to the elementary inequality

$$(ab(1-c))^{1/p}+(bc(1-a))^{1/p}+(ca(1-b))^{1/p}\leqslant 1. \tag{4}$$

Observe that equality is attained here when $(a,b,c)=(0,1,1),(1,0,1),(1,1,0),(2/3,2/3,2/3)$; the final case $(a,b,c)=(2/3,2/3,2/3)$ also reveals that the inequality (4) fails if $p$ is replaced by any quantity larger than $\log_{3}\frac{27}{4}$. This is of course consistent with the second part of Theorem 1.
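The equality cases and the sharpness of the exponent are easy to confirm numerically. The following sketch (our own illustration; the grid resolution and the perturbation $0.01$ are arbitrary choices) evaluates the left-hand side of (4) at the four equality points, searches a coarse grid for its maximum, and shows the violation at the symmetric point for a slightly larger exponent.

```python
import math

P = math.log(27 / 4, 3)


def lhs4(a, b, c, p=P):
    """Left-hand side of inequality (4) for 0 <= a, b, c <= 1."""
    e = 1 / p
    return (
        (a * b * (1 - c)) ** e
        + (b * c * (1 - a)) ** e
        + (c * a * (1 - b)) ** e
    )


# The four equality cases.
for point in [(0, 1, 1), (1, 0, 1), (1, 1, 0), (2 / 3, 2 / 3, 2 / 3)]:
    print(point, lhs4(*point))

# Coarse grid search for the maximum over the unit cube.
steps = 30
grid = [i / steps for i in range(steps + 1)]
print(max(lhs4(a, b, c) for a in grid for b in grid for c in grid))

# With a slightly larger exponent the symmetric point violates the bound.
print(lhs4(2 / 3, 2 / 3, 2 / 3, p=P + 0.01))
```

Because of floating-point rounding, the values at the equality points may differ from $1$ in the last few digits.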

The fact that equality is attained in (4) in four different locations seems to rule out any quick proof of (4) using convexity-based methods such as Jensen’s inequality. Instead, we argue as follows. First observe that when $a=0$, the left-hand side of (4) simplifies to $(bc)^{1/p}$, and it is then clear that the inequality (4) holds whenever $a=0$ and is strict unless $(a,b,c)=(0,1,1)$. Next, we analyze the left-hand side of (4) for $(a,b,c)$ close to $(0,1,1)$. Writing $(a,b,c)=(\alpha,1-\beta,1-\gamma)$ for some small $\alpha,\beta,\gamma\geqslant 0$, we can bound the left-hand side of (4) from above by

$$(\alpha\gamma)^{1/p}+((1-\beta)(1-\gamma)(1-\alpha))^{1/p}+(\alpha\beta)^{1/p}.$$

For $\alpha,\beta,\gamma$ small enough, we have

$$(1-\beta)(1-\gamma)(1-\alpha)\leqslant 1-\frac{1}{2}(\alpha+\beta+\gamma)$$

(say), which by the concavity of $x\mapsto x^{1/p}$ implies that

$$((1-\beta)(1-\gamma)(1-\alpha))^{1/p}\leqslant 1-\frac{1}{2p}(\alpha+\beta+\gamma).$$

On the other hand, from the arithmetic mean-geometric mean inequality we certainly have

$$(\alpha\gamma)^{1/p},(\alpha\beta)^{1/p}\leqslant(\alpha+\beta+\gamma)^{2/p}.$$

Since $p<2$, we conclude that the inequality (4) holds whenever $\alpha+\beta+\gamma$ is sufficiently small, or equivalently when $(a,b,c)$ is sufficiently close to $(0,1,1)$. Since both sides of (4) depend continuously on $a,b,c$, we now see that (4) holds whenever $a$ is sufficiently small, and similarly for $b$ and $c$. Thus we may assume $a,b,c\geqslant\varepsilon$ for some small absolute constant $\varepsilon>0$.

We next consider the boundary case $a=1$, $b,c>\varepsilon$. Here, we claim strict inequality:

$$(b(1-c))^{1/p}+(c(1-b))^{1/p}<1. \tag{5}$$

Indeed, from the Cauchy-Schwarz inequality one has

$$(b(1-c))^{1/2}+(c(1-b))^{1/2}\leqslant\left((b^{1/2})^{2}+((1-b)^{1/2})^{2}\right)^{1/2}\left(((1-c)^{1/2})^{2}+(c^{1/2})^{2}\right)^{1/2}=1$$

and the claim follows since $\frac{1}{p}>\frac{1}{2}$.

For similar reasons, we obtain strict inequality in (4) when $b=1$ or $c=1$. By continuity, this establishes (4) in all regions except the region

$$\varepsilon\leqslant a,b,c\leqslant 1-\varepsilon$$

for some small absolute constant $\varepsilon>0$. We now work in this region.

We can rewrite (4) as

$$\left(\frac{1-c}{c}\right)^{1/p}+\left(\frac{1-a}{a}\right)^{1/p}+\left(\frac{1-b}{b}\right)^{1/p}\leqslant\frac{1}{(abc)^{1/p}};$$

writing $x\coloneqq(\frac{1-a}{a})^{1/p}$, $y\coloneqq(\frac{1-b}{b})^{1/p}$, $z\coloneqq(\frac{1-c}{c})^{1/p}$, the triple $(x,y,z)$ lies in the region

$$\left(\frac{\varepsilon}{1-\varepsilon}\right)^{1/p}\leqslant x,y,z\leqslant\left(\frac{1-\varepsilon}{\varepsilon}\right)^{1/p} \tag{6}$$

and the above inequality transforms to

$$x+y+z\leqslant\exp\left(\frac{f(x)+f(y)+f(z)}{p}\right)$$

or equivalently

$$f(x)+f(y)+f(z)-p\log(x+y+z)\geqslant 0 \tag{7}$$

where, from now on, $f:(0,+\infty)\to(0,+\infty)$ denotes the function $f(z)\coloneqq\log(1+z^{p})$ (rather than the function $f$ appearing in Theorem 2).

Since the region (6) is compact, and the inequality (7) is already known on the boundary of this region, it suffices to verify (7) when $(x,y,z)$ is a critical point of the left-hand side, that is to say that

$$f^{\prime}(x)=f^{\prime}(y)=f^{\prime}(z)=\frac{p}{x+y+z}.$$

Since $f^{\prime}(z)=\frac{p}{z+z^{1-p}}$, we can rewrite this condition as

$$x+x^{1-p}=y+y^{1-p}=z+z^{1-p}=x+y+z. \tag{8}$$

The function $x\mapsto x+x^{1-p}$ is decreasing for $x<(p-1)^{1/p}$ and increasing for $x>(p-1)^{1/p}$, so it can attain any given value at most twice. From (8) and the pigeonhole principle, we conclude that at least two of $x,y,z$ are equal. Without loss of generality we may assume $x=y$; then from (8) we have

$$x^{1-p}=x+z$$

and

$$z^{1-p}=2x$$

and hence

$$x^{1-p}=x+(2x)^{\frac{1}{1-p}}.$$

Dividing by $x$ we obtain

$$x^{-p}=1+2^{\frac{1}{1-p}}x^{\frac{p}{1-p}}$$

and then setting $u\coloneqq x^{-p}$ we conclude that

$$u=1+(u/2)^{\frac{1}{p-1}}. \tag{9}$$

The function $u\mapsto 1+(u/2)^{\frac{1}{p-1}}$ is convex, equals $2$ when $u=2$, equals $1$ when $u=0$, and is larger than $u$ for sufficiently large $u$. As a consequence, the equation (9) has exactly two solutions, one at $u=2$ and one with $u>2$; see Figure 2. The second solution can be computed numerically as $u=10.70297\dots$. Thus, there are two critical points $(x,y,z)$ with $x=y$, the first of which is

$$(2^{-1/p},2^{-1/p},2^{-1/p})=(0.67113\dots,0.67113\dots,0.67113\dots),$$

and the second of which can be computed numerically as

$$(x,y,z)=(0.25568\dots,0.25568\dots,2.48086\dots).$$

At the first critical point, we have $f(x)=f(y)=f(z)=\log(3/2)$, and one easily verifies that the left-hand side of (7) vanishes since $p=\log(27/4)/\log(3)$. At the second critical point, one can numerically verify that

$$f(x)=f(y)=0.089321\dots;\quad f(z)=1.766695\dots$$

and hence

$$f(x)+f(y)+f(z)-p\log(x+y+z)=0.040307\dots>0$$

at this critical point, giving the claim.
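The numerical verification at the second critical point can be reproduced in a few lines. The sketch below is our own illustration and is not the authors' computation: it solves equation (9) by plain bisection on the bracket $[3,20]$ (a choice of ours that isolates the root with $u>2$), recovers $x$ and $z$ from $u=x^{-p}$ and $z^{1-p}=2x$, and evaluates the left-hand side of (7).

```python
import math

P = math.log(27 / 4, 3)


def f(z):
    """The function f(z) = log(1 + z^p) appearing in (7)."""
    return math.log(1 + z ** P)


def bisect(func, lo, hi, iterations=200):
    """Plain bisection; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def equation9(u):
    """Right-hand side of (9) minus the left-hand side."""
    return 1 + (u / 2) ** (1 / (P - 1)) - u


u = bisect(equation9, 3.0, 20.0)  # the bracket [3, 20] is our choice
x = u ** (-1 / P)                 # from u = x^(-p)
z = (2 * x) ** (1 / (1 - P))      # from z^(1-p) = 2x
value = 2 * f(x) + f(z) - P * math.log(2 * x + z)
print(u, (x, x, z), value)
```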

Figure 1: A graph of $f(x)+f(y)+f(z)-p\log(x+y+z)$ for $0\leqslant x\leqslant 1$ with $y=x$ and $z=x^{1-p}-x$.

Figure 2: A graph of $u$ (blue) and $1+(u/2)^{\frac{1}{p-1}}$ (red) for $2\leqslant u\leqslant 12$.

Remark 6.

The above methods extend (we thank Paata Ivanishvili for this comment) to establish the more general bound

$$f_{1}*\dots*f_{k}(1^{n})\leqslant\|f_{1}\|_{\ell^{p}(\{0,1\}^{n})}\dots\|f_{k}\|_{\ell^{p}(\{0,1\}^{n})}$$

for any $k\geqslant 3$, where now $p\coloneqq\log_{k}\frac{k^{k}}{(k-1)^{k-1}}$. In particular one has

$$|\{(A_{1},\dots,A_{k-1},A)\in X^{k}:A=A_{1}\uplus\dots\uplus A_{k-1}\}|\leqslant|X|^{k/p}$$

for any finite collection $X$ of sets. We sketch the details as follows. By repeating the above arguments (and using an induction on $k$ to handle boundary cases), one needs to show that

$$f(x_{1})+\dots+f(x_{k})-p\log(x_{1}+\dots+x_{k})\geqslant 0$$

for $x_{1},\dots,x_{k}$ in a compact region of the form (6). We can again restrict attention to critical points, in which

$$x_{1}+x_{1}^{1-p}=\dots=x_{k}+x_{k}^{1-p}=x_{1}+\dots+x_{k}.$$

As before, $x_{1},\dots,x_{k}$ can take only two values, say $x$ and $z$, leading to the equations

$$x+x^{1-p}=z+z^{1-p}=ax+bz$$

for some positive integers $a,b$ summing to $k$. Writing $u\coloneqq x^{-p}$ and $v\coloneqq z/x$, we have the system

$$\begin{aligned}
u&=a-1+bv\\
uv^{1-p}&=a+(b-1)v.
\end{aligned}$$

Differentiating the second equation once with respect to $u$ gives

$$\frac{dv}{du}=\frac{v}{(b-1)v^{p}+(p-1)u}>0$$

and differentiating twice gives (after some algebra)

$$\frac{d^{2}v}{du^{2}}=\frac{p-1}{v^{p}}\frac{dv}{du}\frac{(2-p)u}{(b-1)v^{p}+(p-1)u}>0$$

so $v$ is again a convex function of $u$ (since $1<p<2$), and so as before the equation $u=a-1+bv$ has at most two solutions, including the one at $u=k-1$ and $v=1$. Using the equation $x+x^{1-p}=ax+bz$ to implicitly define $z$ in terms of $x$, the function

$$af(x)+bf(z)-p\log(ax+bz)$$

then has two critical points, including one at $x=z=(k-1)^{-1/p}$ where the function vanishes. Direct calculation shows that this critical point is a local minimum (basically because $f^{\prime\prime}((k-1)^{-1/p})>0$, which in turn follows from the inequality $p>\frac{k}{k-1}$), so the function must be positive at the other critical point (otherwise there would be an additional critical point from the mean value theorem and intermediate value theorem), giving the claim.
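As a quick sanity check on the exponent in this remark, the sketch below (our own illustration, with hypothetical function names) computes $p=\log_{k}\frac{k^{k}}{(k-1)^{k-1}}$ for several $k$, confirms that the functional vanishes at the symmetric critical point $x_{1}=\dots=x_{k}=(k-1)^{-1/p}$, and samples random points for $k=3$, the case proved in full above. The sampling range $[0.01,5]$ is an arbitrary choice.

```python
import math
import random


def exponent(k):
    """p = log_k(k^k / (k-1)^(k-1)) from Remark 6; k = 3 gives log_3(27/4)."""
    return (k * math.log(k) - (k - 1) * math.log(k - 1)) / math.log(k)


def functional(xs, p):
    """f(x_1) + ... + f(x_k) - p log(x_1 + ... + x_k) with f(z) = log(1 + z^p)."""
    return sum(math.log(1 + x ** p) for x in xs) - p * math.log(sum(xs))


for k in range(3, 7):
    p = exponent(k)
    symmetric = [(k - 1) ** (-1 / p)] * k
    print(k, p, functional(symmetric, p))

# Random sampling for k = 3; every sampled value should be non-negative.
random.seed(1)
p3 = exponent(3)
worst = min(
    functional([random.uniform(0.01, 5.0) for _ in range(3)], p3)
    for _ in range(10000)
)
print(worst)
```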

2 A variant for additive energy

Recall from [9] that the additive energy $E(A)$ of a finite subset $A$ of an additive group $G$ is defined as the number of quadruples $(a_{1},a_{2},a_{3},a_{4})\in A\times A\times A\times A$ such that $a_{1}+a_{2}=a_{3}+a_{4}$. We have the trivial bound $E(A)\leqslant|A|^{3}$, which is attained for instance when $A$ is itself a finite group. By modifying the above arguments, we have the following refinement in the discrete cube $\{0,1\}^{n}$:

Theorem 7.

Let $n\geqslant 0$, and let $A\subset\{0,1\}^{n}$. Then $E(A)\leqslant|A|^{p}$, where $p\coloneqq\log_{2}6=2.58496\dots$. Furthermore, the exponent $p$ cannot be replaced by any smaller quantity.

The second claim is clear, since if $A=\{0,1\}^{n}$ then one easily computes that $|A|=|\{0,1\}|^{n}=2^{n}$ and $E(A)=E(\{0,1\})^{n}=6^{n}$. As in the previous section, the theorem is proven by induction on $n$ together with an elementary inequality, namely

Lemma 8 (Elementary inequality).

If $a_{0},a_{1}\geqslant 0$, then

$$a_{0}^{p}+4(a_{0}a_{1})^{\frac{p}{2}}+a_{1}^{p}\leqslant(a_{0}+a_{1})^{p}.$$

Figure 3: A graph of $(x^{p}+4x^{p/2}+1)/(1+x)^{p}$ for $0\leqslant x\leqslant 1$.

Proof.

By symmetry and scaling we may assume that $a_{1}=1$ and $a_{0}=x\in[0,1]$, thus we need to show that

$$x^{p}+4x^{p/2}+1\leqslant(1+x)^{p}$$

for $0\leqslant x\leqslant 1$ (see Figure 3). Near $x=0$, the left-hand side is $1+O(x^{p/2})$ and the right-hand side is at least $1+px$, so the claim holds for $x$ sufficiently close to zero. At $x=1$, the function $x^{p}+4x^{p/2}+1$ takes the value $6$, has first derivative $3p$, and has second derivative

$$p(p-1)+p(p-2)=5.60917\dots,$$

while $(1+x)^{p}$ takes the value $2^{p}=6$, has first derivative $p2^{p-1}=3p$, and has second derivative

$$p(p-1)2^{p-2}=6.14560\dots,$$

so the claim also holds for $x$ sufficiently close to $1$. It thus suffices to verify the inequality at any critical point of the functional

$$\frac{x^{p}+4x^{p/2}+1}{(1+x)^{p}}$$

in $0<x<1$. Differentiating, we see that such a critical point solves the equation

$$(px^{p-1}+2px^{(p-2)/2})(1+x)^{p}=(x^{p}+4x^{p/2}+1)p(1+x)^{p-1}$$

which simplifies to

$$x^{p-1}+2x^{(p-2)/2}-2x^{p/2}=1. \tag{10}$$

The second derivative of the left-hand side is

$$\frac{p-2}{2}x^{\frac{p-6}{2}}((2p-2)x^{p/2}+p-4-px);$$

the function $x\mapsto(2p-2)x^{p/2}+p-4-px$ is convex on $[0,1]$ (as $p/2>1$) and takes the negative values $p-4$ and $2p-6$ at the endpoints $x=0$ and $x=1$, so it is negative throughout $[0,1]$. Since $p>2$, we conclude that $x^{p-1}+2x^{(p-2)/2}-2x^{p/2}$ is strictly concave. As this function is $0$ at $x=0$ and $1$ at $x=1$, and has a derivative of $p-3<0$ at $x=1$, there are exactly two solutions to (10) for $0\leqslant x\leqslant 1$, one at $x=1$ and another with $0<x<1$; see Figure 4. The second solution can be numerically evaluated as $x=0.131657\dots$, at which

$$x^{p}+4x^{p/2}+1=1.29634\dots$$

and

$$(1+x)^{p}=1.376738\dots$$

giving the claim. ∎
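The numerical values quoted in this proof can be reproduced as follows. This is a minimal sketch of our own, not the authors' computation: it locates the interior solution of (10) by bisection on the bracket $[0.01,0.5]$ (chosen by us so that the left-hand side of (10) minus $1$ changes sign inside it) and then checks the inequality of Lemma 8 on a grid.

```python
import math

P = math.log2(6)


def lhs(x):
    """Left-hand side x^p + 4 x^(p/2) + 1 of the normalized inequality."""
    return x ** P + 4 * x ** (P / 2) + 1


def rhs(x):
    """Right-hand side (1 + x)^p of the normalized inequality."""
    return (1 + x) ** P


def equation10(x):
    """Left-hand side of (10) minus 1; its zeros are the critical points."""
    return x ** (P - 1) + 2 * x ** ((P - 2) / 2) - 2 * x ** (P / 2) - 1


# Bisection on [0.01, 0.5]: equation10 is negative at 0.01 and positive at 0.5.
lo, hi = 0.01, 0.5
for _ in range(200):
    mid = (lo + hi) / 2
    if equation10(mid) < 0:
        lo = mid
    else:
        hi = mid
root = (lo + hi) / 2
print(root, lhs(root), rhs(root))

# Grid check of the inequality on [0, 1], with a small rounding tolerance.
print(all(lhs(i / 1000) <= rhs(i / 1000) + 1e-9 for i in range(1001)))
```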

Figure 4: A graph of $x^{p-1}+2x^{(p-2)/2}-2x^{p/2}$ for $0\leqslant x\leqslant 1$.

Now we establish Theorem 7. The claim is trivial for $n=0$, so suppose that $n\geqslant 1$ and that the claim has already been proven for $n-1$. For $A\subset\{0,1\}^{n}$, we may partition

$$A=(A_{0}\times\{0\})\uplus(A_{1}\times\{1\})$$

for some $A_{0},A_{1}\subset\{0,1\}^{n-1}$. We can then split

$$E(A)=E(A_{0})+4|\{(a_{0},a_{1},a^{\prime}_{0},a^{\prime}_{1})\in A_{0}\times A_{1}\times A_{0}\times A_{1}:a_{0}+a_{1}=a^{\prime}_{0}+a^{\prime}_{1}\}|+E(A_{1}).$$

By the Cauchy-Schwarz inequality (and writing $a_{0}+a_{1}=a^{\prime}_{0}+a^{\prime}_{1}$ as $a_{0}-a^{\prime}_{0}=a^{\prime}_{1}-a_{1}$) we have

$$|\{(a_{0},a_{1},a^{\prime}_{0},a^{\prime}_{1})\in A_{0}\times A_{1}\times A_{0}\times A_{1}:a_{0}+a_{1}=a^{\prime}_{0}+a^{\prime}_{1}\}|\leqslant E(A_{0})^{1/2}E(A_{1})^{1/2}$$

and hence by the induction hypothesis

$$E(A)\leqslant|A_{0}|^{p}+4(|A_{0}||A_{1}|)^{p/2}+|A_{1}|^{p}.$$

Applying Lemma 8 and noting that $|A_{0}|+|A_{1}|=|A|$, we obtain $E(A)\leqslant|A|^{p}$, closing the induction.
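Theorem 7 is easy to test by brute force in small dimension. The sketch below (our own illustration; the function name and the random sampling are assumptions of ours) computes the additive energy by counting representations of each sum in $\mathbb{Z}^{n}$, confirms $E(\{0,1\}^{n})=6^{n}$, and compares $E(A)$ with $|A|^{\log_{2}6}$ for a few random subsets.

```python
import math
import random
from collections import Counter
from itertools import product

P = math.log2(6)


def additive_energy(A):
    """Number of (a1, a2, a3, a4) in A^4 with a1 + a2 = a3 + a4 (sums in Z^n)."""
    points = list(set(A))
    # r(s) = number of ordered pairs (a, b) with a + b = s; E(A) = sum of r(s)^2.
    sums = Counter(
        tuple(s + t for s, t in zip(a, b)) for a, b in product(points, repeat=2)
    )
    return sum(r * r for r in sums.values())


n = 3
cube = list(product((0, 1), repeat=n))
print(additive_energy(cube), 6 ** n)

random.seed(2)
for _ in range(5):
    A = random.sample(cube, random.randint(1, len(cube)))
    print(len(A), additive_energy(A), len(A) ** P)
```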

Remark 9.

The same argument shows that

$$\|f*f\|_{\ell^{2}}\leqslant\|f\|_{\ell^{4/p}}^{2}$$

for any function $f:\{0,1\}^{n}\to\mathbb{C}$ (where the convolution $f*f$ is viewed as a function on $\{0,1,2\}^{n}$). By several applications of the Cauchy-Schwarz inequality, this implies that

$$\Big|\sum_{a_{1},a_{2},a_{3},a_{4}\in\{0,1\}^{n}:a_{1}+a_{2}=a_{3}+a_{4}}f_{1}(a_{1})f_{2}(a_{2})f_{3}(a_{3})f_{4}(a_{4})\Big|\leqslant\|f_{1}\|_{\ell^{4/p}}\|f_{2}\|_{\ell^{4/p}}\|f_{3}\|_{\ell^{4/p}}\|f_{4}\|_{\ell^{4/p}}$$

for any functions $f_{1},f_{2},f_{3},f_{4}:\{0,1\}^{n}\to\mathbb{C}$. Thus, for instance, if $A_{1},A_{2},A_{3},A_{4}\subset\{0,1\}^{n}$, the number of solutions to $a_{1}+a_{2}=a_{3}+a_{4}$ with $a_{1}\in A_{1},a_{2}\in A_{2},a_{3}\in A_{3},a_{4}\in A_{4}$ is at most $|A_{1}|^{p/4}|A_{2}|^{p/4}|A_{3}|^{p/4}|A_{4}|^{p/4}$.

Remark 10.

In [1], the method of compressions is used to obtain optimal lower bounds for the size $|A+B|$ of a sumset of two subsets $A,B$ of $\{0,1\}^{n}$ of specified cardinality. It is possible that compression methods could also be used to obtain an alternate proof of Theorem 7, and perhaps to also refine the upper bound of $|A|^{\log_{2}6}$ slightly when $|A|$ is not a power of two. However, we were unable to use the method of compressions to attack Theorem 1.

3 Acknowledgments

The authors would like to thank Siavash Mirarab for bringing this problem to our attention, and David Speyer and the anonymous referee for helpful comments. DK is supported by NSF award CCF-1553288 (CAREER). TT is supported by NSF grant DMS-1266164, the James and Carol Collins Chair, and by a Simons Investigator Award.

References

  • [1] B. Bollobás, I. Leader, Sums in the grid, Discrete Math. 162 (1996), no. 1–3, 31–48.
  • [2] P. Diaconis, L. Saloff-Coste, Logarithmic Sobolev inequalities for finite Markov chains, The Annals of Applied Probability, 6 (1996), 695–750.
  • [3] R. A. Kunze, E. M. Stein, Uniformly bounded representations and harmonic analysis of the $2\times 2$ real unimodular group, Amer. J. Math. 82 (1960), 1–62.
  • [4] S. Mirarab, R. Reaz, M. S. Bayzid, T. Zimmermann, M. S. Swenson, T. Warnow, ASTRAL: genome-scale coalescent-based species tree estimation, Bioinformatics. 2014;30(17):i541-i548. doi:10.1093/bioinformatics/btu462.
  • [5] S. Mirarab, T. Warnow, ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes, Bioinformatics. 2015;31(12):i44-i52. doi:10.1093/bioinformatics/btv234.
  • [6] M. T. Hallett, J. Lagergren, New Algorithms for the Duplication-Loss Model; 2000. doi:10.1145/332306.332359.
  • [7] D. Bryant, M. Steel, Constructing Optimal Trees from Quartets, J Algorithms. 2001;38:237-259. doi:10.1006/jagm.2000.1133.
  • [8] M. Lafond, C. Scornavacca, On the Weighted Quartet Consensus problem. October 2016.
  • [9] T. Tao, V. Vu, Additive combinatorics. Cambridge Studies in Advanced Mathematics, 105. Cambridge University Press, Cambridge, 2006