
PPP-Completeness and Extremal Combinatorics

[Footnote: Part of this work was done while R.B., L.F., P.H., and N.I.S. were visiting Bocconi University.]

Romain Bourneuf (ENS de Lyon)
Lukáš Folwarczný (Charles University, Faculty of Mathematics and Physics; Institute of Mathematics, Czech Academy of Sciences). Supported by the Grant Agency of the Czech Republic under the grant agreement no. 19-27871X and by the Charles University grant SVV–2020–260578.
Pavel Hubáček (Charles University, Faculty of Mathematics and Physics). Supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 101019547), by the Cariplo CRYPTONOMEX grant, by the Grant Agency of the Czech Republic under the grant agreement no. 19-27871X, and by the Charles University project UNCE/SCI/004.
Alon Rosen (Bocconi University and Reichman University). Supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 101019547) and Cariplo CRYPTONOMEX grant.
Nikolaj I. Schwartzbach (Aarhus University)

Abstract

Many classical theorems in combinatorics establish the emergence of substructures within sufficiently large collections of objects. Well-known examples are Ramsey’s theorem on monochromatic subgraphs and the Erdős-Rado sunflower lemma. Implicit versions of the corresponding total search problems are known to be PWPP-hard; here “implicit” means that the collection is represented by a poly-sized circuit inducing an exponentially large number of objects.

We show that several other well-known theorems from extremal combinatorics – including Erdős-Ko-Rado, Sperner, and Cayley’s formula – give rise to complete problems for PWPP and PPP. This is in contrast to the Ramsey and Erdős-Rado problems, for which establishing inclusion in PWPP has remained elusive. Besides significantly expanding the set of problems that are complete for PWPP and PPP, our work identifies some key properties of combinatorial proofs of existence that can give rise to completeness for these classes.

Our completeness results rely on efficient encodings for which finding collisions allows extracting the desired substructure. These encodings are made possible by the tightness of the bounds for the problems at hand (tighter than what is known for Ramsey’s theorem and the sunflower lemma). Previous techniques for proving bounds in TFNP invariably made use of structured algorithms. Such algorithms are not known to exist for the theorems considered in this work, as their proofs “from the book” are non-constructive.

1 Introduction

A well-known theorem by Ramsey gives a lower bound on the size of the largest monochromatic clique in any edge-coloring of the complete graph using two colors.

Ramsey [Ram30]

Any edge-coloring of the complete graph on $n$ vertices with two colors contains a monochromatic clique of size at least $\frac{1}{2}\log n$.

Ramsey’s theorem gives rise to a natural computational search problem Ramsey [Kra05, KNY19]: given a description of an edge-coloring, output the vertices of a monochromatic clique of size $\frac{1}{2}\log n$. Since the theorem guarantees the existence of a monochromatic clique of this size, Ramsey belongs to the complexity class TFNP consisting of efficiently verifiable search problems to which a solution is guaranteed to exist [MP91].

The computational complexity of Ramsey very much depends on its representation. On the one hand, it is efficiently solvable when the graph is given explicitly; a folklore proof of Ramsey’s theorem gives an efficient algorithm to find such a subgraph – see Appendix A. On the other hand, the situation is less clear when the graph is represented implicitly, e.g., via a Boolean circuit that, for any pair of vertices, outputs the corresponding color of the edge-coloring of the graph. [Footnote 1: Given such a representation, it might even be hard to compute the degree of a node with respect to one of the two colors.]

Another TFNP problem considered in the literature that is motivated by a result in extremal combinatorics arises from the well-known Erdős-Rado sunflower lemma.

Erdős-Rado [ER60]

Any family $\mathcal{F}$ of $n$-sets of cardinality greater than $n^{n}n!$ contains an $n$-sunflower of size $n+1$, i.e., subsets $A_{1},A_{2},\ldots,A_{n+1}\in\mathcal{F}$ such that, for some $\Delta$, $A_{i}\cap A_{j}=\Delta$ for every distinct $A_{i},A_{j}$.

An instance of the total search problem Sunflower [KNY19] can be implicitly represented, e.g., via a Boolean circuit that, given an index of a set in the family, outputs its characteristic vector.

In general, little is known of the complexity of the implicit variants of Ramsey or Sunflower – the proofs of the corresponding theorems are either non-constructive or result in inefficient (i.e., superpolynomial-time) algorithms. Both problems are known to be PWPP-hard, as shown by Krajíček [Kra05] and Komargodski, Naor, and Yogev [KNY19]. This means that finding the desired substructure is at least as hard as finding collisions in an arbitrary poly-sized shrinking circuit and, hence, hard in the worst-case if collision-resistant hash functions exist. However, they are not known to be complete for the class PWPP and the intriguing question of whether they give rise to a complexity class distinct from PWPP has remained open for years.

1.1 Our Results

We explore new connections between classical theorems in extremal combinatorics and the complexity classes PPP [Pap94] and PWPP [Jeř16], i.e., the classes of search problems with totality guaranteed by the (weak) pigeonhole principle. We show that PPP and PWPP can be characterized via a number of new TFNP problems based on the following theorems.

Erdős-Ko-Rado [EKR61].

Any family of distinct pairwise-intersecting $k$-sets on a universe of size $m \geq 2k$ has size at most $\binom{m-1}{k-1}$.

Sperner [Spe28].

The largest antichain, i.e., a family of subsets such that no member is contained in any other, on a universe with $2n$ elements is unique and consists of all subsets of size $n$.

Cayley [Cay89].

There are exactly $n^{n-2}$ spanning trees of the complete graph on $n$ vertices.

Just as for Ramsey and Sunflower, the corresponding search problems are efficiently solvable when given explicit access to the family of objects and, again, their computational complexity is open when we consider implicit access to the structure, e.g., where the instance is given by a circuit that on input $i$ returns an encoding of the $i^{\text{th}}$ object in the collection. [Footnote 2: Note that an implicit representation of the collection might not necessarily satisfy the assumptions of the underlying theorem. For instance, representing sets via characteristic vectors for Erdős-Ko-Rado does not ensure that they are actually $k$-sets or that they are distinct. Importantly, such a violation could allow evading the totality of the search problem. Nevertheless, we can ensure totality by allowing locally verifiable evidence of a malformed representation as a solution, e.g., an index not corresponding to a $k$-set or two indices corresponding to the same set.] The totality of the problems we define follows from a common principle – the instances are given via an implicit representation of a sufficiently large collection of objects (e.g., subsets for Erdős-Ko-Rado) such that, by the corresponding theorem, there exists a small subset of these objects satisfying some efficiently verifiable property (e.g., a pair of disjoint subsets for Erdős-Ko-Rado).

In addition to the above completeness results, we define TFNP problems arising from the following two results in extremal combinatorics.

Mantel [Man07].

Any triangle-free graph on $n$ vertices has at most $n^{2}/4$ edges.

Ward-Szabo [WS95].

Any edge-coloring of the complete graph on $n$ vertices with $2\leq r\leq\sqrt{n}$ colors must contain a bichromatic triangle.

We show that variants of the corresponding problems are hard for PWPP and PPP. However, proving their inclusion in PWPP or PPP remains open and they join Ramsey and Sunflower as candidate problems that might define a new class above PWPP or PPP (see Section 1.5). An overview of our results in terms of weak and strong problems (see Section 1.3) is given in Table 1.

| Problem | Hardness | Containment |
|---|---|---|
| Ramsey, Sunflower | PWPP [Kra05, KNY19] | TFNP |
| Ward-Szabo, weak-Mantel, weak-Turán$_r$ | PWPP [Theorems 7.4, 8.3 and 8.12] | PPP [Theorems 7.7, 8.4 and 8.13] |
| Ward-Szabo-Colorful-Collisions, Ward-Szabo-Collisions, weak-Erdős-Ko-Rado, weak-general-Erdős-Ko-Rado$_k$, weak-Sperner-Antichain, weak-Cayley | PWPP [Theorems 7.5, 7.4, 5.2, 6.2, 4.4 and 4.21] | PWPP [same theorems] |
| Erdős-Ko-Rado, general-Erdős-Ko-Rado$_k$, Sperner-Antichain, Cayley | PPP [Theorems 6.7, 5.7, 4.9 and 4.26] | PPP [same theorems] |
| Mantel | PPP [Theorem 8.8] | TFNP |

Table 1: Summary of the complexity of problems we consider. Except for Ramsey and Sunflower, all problems were introduced in this work. The containment results for weak-general-Erdős-Ko-Rado$_k$ and general-Erdős-Ko-Rado$_k$ rely on the efficient Baranyai assumption (Assumption 4.18).

1.2 Techniques and Ideas

A long-standing open problem regarding Ramsey and Sunflower has been to determine their status with respect to the classes PWPP and PPP. Typically, the most challenging part of establishing completeness for some syntactic subclass of TFNP lies in proving hardness (see, e.g., [DGP09, Meh18, FG18]). For subclasses of TFNP such as PPAD, PPA, and PLS, the inclusion in a subclass mostly follows from the existence of an inefficient yet structured algorithm for the problem at hand; for example, the chessplayer algorithm for PPA [Pap94] or the steepest descent algorithm for PLS [JPY88]. However, this methodology seems inapplicable for proving inclusion in PWPP or PPP as these classes do not exhibit any characterizing graph-theoretic structure that could capture some class of natural algorithms.

In contrast to many existing bounds in TFNP, our work does not make use of structured algorithms but instead makes use of encodings that translate between substructures and collisions in circuits. In order to establish inclusion in PWPP, we encode the objects of the collection using a “property-preserving encoding” that encodes the objects in a way that translates some specific relation into collisions. More precisely, we want an encoding function that is efficiently computable and (nearly) optimal, such that whenever two elements have the same encoding, these two elements give a solution to the original problem. While this technique is quite general, it is not always clear how to instantiate the encoding to get the desired collisions.

Consider, for example, the total search problem corresponding to the Erdős-Ko-Rado theorem for intersecting families of $n$-sets on a universe of size $2n$. An instance can be given by a Boolean circuit $C\colon\{0,1\}^{\left\lceil\log\binom{2n-1}{n-1}\right\rceil+1}\to\{0,1\}^{2n}$ representing a family of subsets of $[2n]$, i.e., $C(i)$ is the characteristic vector of the $i$-th $n$-set in the family. Suppose the outputs of $C$ define distinct $n$-sets. Since there are more than $\binom{2n-1}{n-1}$ of them, then, by the Erdős-Ko-Rado theorem, there must exist a pair of inputs mapped to disjoint $n$-sets by $C$. We define any such pair of inputs to be a solution. [Footnote 3: To ensure the totality of the problem, we introduce additional solutions corresponding to succinct certificates that $C$ does not define a family of distinct $n$-sets, i.e., either an $i$ such that $C(i)$ is not of Hamming weight $n$ or a pair $i\neq j$ such that $C(i)=C(j)$.]

When proving that the above total search problem is contained in the complexity class PWPP, at a high level, we want to encode the $n$-sets of the family using a shrinking circuit, in such a way that collisions correspond to disjoint sets. Observe that for $n$-sets in a universe of size $2n$, the only disjoint sets are complements and, hence, we get an equivalent instance of the problem if we map each set to either itself or its complement, arbitrarily. In our construction, we map each set $S$ to the representative not containing the element 1. That is, if $1\notin S$, the set is left unchanged and, otherwise, it is mapped to its complement $\overline{S}$. Note that, by the pigeonhole principle, two sets that do not contain 1 must have a non-empty intersection since we work with $n$-subsets of $[2n]$. To obtain a shrinking circuit, we make use of Cover encodings (Section 3.1) that give an optimal encoding of all $n$-sets by considering their lexicographic order. Notice that if the input $S$ is not an $n$-set, we may map it arbitrarily to any $n$-set, as a collision, in this case, yields a solution to the instance of the above problem motivated by the Erdős-Ko-Rado theorem.
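The mapping described above can be checked by brute force on a tiny universe. The following Python sketch is only an illustration under stated assumptions: the helper names (`char_vector`, `representative`, `build_encoder`, `find_collisions`) are hypothetical, and a sorted lookup table stands in for the efficient Cover encoding of Section 3.1. For $n=3$ it confirms that the $\binom{6}{3}=20$ sets are compressed into $\binom{5}{2}=10$ codes and that every collision is a pair of disjoint sets.

```python
from itertools import combinations

def char_vector(subset, size):
    """Characteristic vector of a subset of {1, ..., size} as a 0/1 tuple."""
    return tuple(1 if i in subset else 0 for i in range(1, size + 1))

def representative(vec):
    """Return whichever of S and its complement does not contain element 1."""
    return tuple(1 - b for b in vec) if vec[0] == 1 else vec

def build_encoder(n):
    """Assumption: a sorted table replaces the efficient Cover encoding."""
    universe = range(1, 2 * n + 1)
    vectors = sorted(char_vector(set(c), 2 * n) for c in combinations(universe, n))
    rank = {v: r for r, v in enumerate(vectors)}
    return vectors, (lambda vec: rank[representative(vec)])

def find_collisions(vectors, encode):
    """Collect pairs of distinct sets that receive the same code."""
    first_seen, pairs = {}, []
    for v in vectors:
        code = encode(v)
        if code in first_seen:
            pairs.append((first_seen[code], v))
        else:
            first_seen[code] = v
    return first_seen, pairs

n = 3
vectors, encode = build_encoder(n)
first_seen, pairs = find_collisions(vectors, encode)
```

Because sets avoiding element 1 have a characteristic vector starting with 0, they occupy the lowest ranks, which is why the codes fit in a range half the size of the family.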

In contrast, the PWPP-hardness results for Ramsey and Sunflower follow an extremely elegant but rather direct (compared to other hardness results for subclasses of TFNP) technique of graph-hash product [Kra05, KNY19], which we illustrate on Ramsey. Recall that there are known randomized constructions of edge-colorings of the complete graph $K_{2^{n/4}}$ on $2^{n/4}$ vertices that do not contain a monochromatic clique of size $n/2$ [Erd47]. Given such an underlying edge-coloring of $K_{2^{n/4}}$ and a hash function $h$ mapping $n$-bit strings to $n/4$-bit strings, one can construct an edge-coloring of the complete graph on $2^{n}$ vertices by assigning to every edge $(u,v)\in\{0,1\}^{n}\times\{0,1\}^{n}$ the color of the edge $(h(u),h(v))\in\{0,1\}^{n/4}\times\{0,1\}^{n/4}$ from the underlying coloring. Since the underlying edge-coloring of $K_{2^{n/4}}$ does not contain a monochromatic clique of size $n/2$, it is easy to see that any monochromatic clique of size $n/2$ in the resulting edge-coloring of $K_{2^{n}}$ (guaranteed to exist by Ramsey’s theorem) must have been introduced via a collision in the hash $h$.
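To make the graph-hash product concrete, here is a toy Python sketch. It does not use the parameters from the text: as an assumption for illustration, the base coloring is the pentagon coloring of $K_5$ (which has no monochromatic triangle), the "hash" is reduction modulo 5 on 20 vertices, and edges whose endpoints collide under $h$ are given color 0. All function names are hypothetical.

```python
from itertools import combinations

def pentagon_color(a, b):
    """2-coloring of K_5 with no monochromatic triangle (toy base coloring)."""
    return 1 if (a - b) % 5 in (1, 4) else 0

def graph_hash_product(base_color, h):
    """Color the edge (u, v) of the large graph by the base color of (h(u), h(v))."""
    def color(u, v):
        hu, hv = h(u), h(v)
        # Assumption: edges whose endpoints collide under h get color 0.
        return 0 if hu == hv else base_color(hu, hv)
    return color

def is_monochromatic(clique, color):
    """True if all edges inside the clique share one color."""
    return len({color(u, v) for u, v in combinations(clique, 2)}) == 1

def extract_collision(clique, h):
    """Return u != v in the clique with h(u) == h(v), or None if h is injective on it."""
    seen = {}
    for u in clique:
        if h(u) in seen:
            return seen[h(u)], u
        seen[h(u)] = u
    return None

h = lambda u: u % 5  # toy shrinking "hash" on the vertex set {0, ..., 19}
color = graph_hash_product(pentagon_color, h)
mono = [t for t in combinations(range(20), 3) if is_monochromatic(t, color)]
collisions = [extract_collision(t, h) for t in mono]
```

In this toy instance every monochromatic triangle of the large graph yields a collision of `h`, mirroring the argument that solutions can only arise from collisions.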

As noted by [KNY19], the structure of a PWPP-hardness proof using the graph-hash product is not restricted to total search problems corresponding to graph-theoretic theorems of existence; indeed, [KNY19] used the graph-hash product to prove also PWPP-hardness of Sunflower. On a high level, for a problem to be amenable to the graph-hash product technique, it is sufficient to be able to construct a collection of objects such that 1) it does not contain the desired substructure, 2) its size is at least a constant fraction of the threshold necessary for the existential theorem to apply, and 3) it can be efficiently indexed. [Footnote 4, on condition 2: This is a technical condition ensuring that we can reduce from a PWPP-complete variant of the problem of finding collisions in a shrinking hash. Note that it is easy to find collisions in functions that exhibit extreme shrinking.] Then, we can interpret the output of an appropriately shrinking hash $h$ as an index into the small collection of objects, and, for each index, we can efficiently compute and output the corresponding element in the collection. Again, since the small collection does not contain the desired substructure, all solutions of the instance constructed via graph-hash product must in some way result from a collision in the hash $h$.

For example, consider the total search problem arising from Sperner’s theorem on antichains – here, the threshold size is $\binom{2n}{n}$, meaning that if we have a family with strictly more than $\binom{2n}{n}$ distinct subsets of $[2n]$ then one subset from the family must be contained in another member of the family. It is straightforward to construct a family of subsets that does not contain the specific substructure (i.e., a subset that is included in another one) with size equal to the threshold size $\binom{2n}{n}$. It suffices to consider the family of all the $n$-subsets of $[2n]$. Similarly, for many other combinatorial problems we study, an adequate collection of objects can be found by looking at a collection of maximum size that does not contain the substructure.

We also show natural reductions between some of the problems we define (from Erdős-Ko-Rado to Sperner-Antichain for instance), which, in our opinion, highlights the relevance of these new problems and the fact that their definition is the correct one.

1.3 PPP-Completeness From Extremal Combinatorics

Up to this point, our discussion did not explicitly distinguish between the classes PWPP and PPP. However, our work highlights important structural differences between the two complexity classes. Recall that the class PWPP contains the search problems in TFNP whose totality can be proved using the weak pigeonhole principle: “In any assignment of $2n$ pigeons to $n$ holes there must be two pigeons sharing the same hole.”

This statement can be seen as a result in extremal combinatorics bounding the maximum number of pigeons that can be assigned to $n$ holes without two pigeons being sent to the same hole. More generally, we say that a theorem from extremal combinatorics is “weak” if it gives an upper bound (which may or may not be tight) on the maximum size of a collection of objects that does not contain some substructure (above, two pigeons sharing the same hole). On the contrary, we say that a theorem from extremal combinatorics is “strong” if it gives a tight upper bound on the maximum size of a collection of objects that does not contain some substructure, as well as some structural property about the maximum families without the substructure. For instance, the strong pigeonhole principle can be stated as: “In any assignment of $n$ pigeons to $n$ holes there is either a pigeon in the first hole or two pigeons sharing the same hole.” Note that it is exactly this formulation of the strong pigeonhole principle that defines the class PPP.

Many results in extremal combinatorics have a weak statement and a strong statement. For such results, we can define a problem corresponding to the weak statement, which often is related to PWPP, and a problem corresponding to the strong statement, which often is related to PPP. In this paper, all PWPP-hard problems correspond to a weak theorem in extremal combinatorics, while PPP-hard problems correspond to strong theorems in extremal combinatorics. As an example, consider Cayley’s formula and note that the bound $n^{n-2}$ is tight. Hence, if we are given a collection of exactly $n^{n-2}$ distinct graphs on $n$ vertices, then either one of the graphs is not a spanning tree, or every spanning tree is in the collection. This observation induces a TFNP problem that we show to be PPP-complete.

1.4 Related Work

Compared to the majority of subclasses of TFNP that have been extensively studied and are known to capture various total search problems from diverse domains of mathematics, PPP and PWPP might seem less expressive and the first non-trivial completeness results appeared only recently.

Sotiraki, Zampetakis, and Zirdelis [SZZ18] and Ban, Jain, Papadimitriou, Psomas, and Rubinstein [BJP+19] demonstrated that PPP contains computational problems from number theory and the theory of integral lattices. In particular, Sotiraki et al. showed PPP-completeness of a computational problem related to Blichfeldt’s theorem and PPP-completeness (resp. PWPP-completeness) of a problem motivated by the Short Integer Solution problem. Hubáček and Václavek [HV21] showed that some general formalizations of the discrete logarithm problem are complete for PWPP and PPP and, motivated by classical constructions of collision-resistant hashing, they characterized PWPP via the problem of breaking claw-free (pseudo-)permutations.

1.5 Open Problems

Our work suggests various interesting directions for future research:

  • We exploit the power of strong statements in extremal combinatorics for establishing PPP-completeness. The notorious lack of tight bounds for the Erdős-Rado sunflower lemma and Ramsey’s theorem implies that we have no strong version of these theorems, which may explain why showing the inclusion of the corresponding problems in, e.g., PPP has eluded researchers.

  • We introduced total search problems corresponding to Mantel’s theorem, Turán’s theorem, and Ward-Szabo’s theorem. In this work, we only prove hardness results for these problems but no inclusion results. Hence, it is still open whether they are complete for the classes PPP and PWPP, or whether they could define a new subclass of TFNP.

  • The Turán$_r$ problem is defined in a similar fashion to Mantel, yet, unlike for Mantel, we currently do not have a proof of PPP-hardness for it. Thus, the question of PPP-hardness of Turán$_r$ is immediate. Alternatively, it would be interesting to define a different PPP-hard problem in a natural way from Turán’s theorem.

  • Another exciting question is whether the efficient Baranyai assumption (Assumption 4.18) holds, as well as whether it is possible to prove the inclusion results of the problems associated to the general version of Erdős-Ko-Rado’s theorem without that assumption. Showing reductions between general-Erdős-Ko-Rado$_k$ and general-Erdős-Ko-Rado$_l$ for $k\neq l$ without the efficient Baranyai assumption would also be intriguing.

  • Finally, we believe the problems $\textsc{General-Pigeon}_{k}^{m}$ deserve a more thorough investigation to further our understanding of the classes they define and their interrelation.

2 Preliminaries

We denote by $\log x$ the binary logarithm of $x$. We denote by $[n]$ the set $\{1,2,3,\ldots,n-1,n\}$. We interpret elements of $\{0,1\}^{*}$ as strings and write them as $x=x_{1}x_{2}\cdots x_{n}$ for $x_{i}\in\{0,1\}$. Each element $x_{i}$ is also called a bit. We say $n$ is the length of $x\in\{0,1\}^{n}$, and say $x$ is an $n$-bit string. We denote by $0^{n}$ (resp. $1^{n}$) the $n$-bit string consisting of all 0 (resp. 1). If $x,y\in\{0,1\}^{*}$ are two strings of lengths $n,m$, respectively, we denote by $x\mathbin{\|}y=x_{1}x_{2}\cdots x_{n}y_{1}y_{2}\cdots y_{m}$ the concatenation of $x$ and $y$. We denote by $\leq$ the lexicographical order on strings. Note that $\leq$ is a partial order as it is only well-defined for strings of the same length. We use $x<y$ to denote $x\leq y$ and $x\neq y$. We may occasionally abuse notation and write $x<k$ where $k\in\mathbb{N}$, in which case we mean the binary encoding of $k$ on the same number of bits as $x$. If $\left\lceil\log k\right\rceil$ exceeds the length of $x$, we define $x<k$ such that the order is total.

If $\Omega$ is a set of size $n$, we associate the set $2^{\Omega}$ with the characteristic vectors from $\{0,1\}^{n}$ for some arbitrary (but fixed) order on $\Omega$. We denote by $\subseteq$ the partial order on $\{0,1\}^{n}$ where $x\subseteq y$ iff $x_{i}\leq y_{i}$ for every $i=1,\ldots,n$. If $x\in\{0,1\}^{n}$ is a string, we denote by $\overline{x}:=\overline{x}_{1}\overline{x}_{2}\cdots\overline{x}_{n}$ the complement of $x$, defined by $\overline{x}_{i}=1-x_{i}$. We also use other set-theoretic operators $\cap,\cup,\setminus$ that are defined in a natural way. We also denote by $|x|=\sum_{i=1}^{n}x_{i}$ the number of 1s in $x$ when the length is implicit from the context.

2.1 Total Search Problems

A search problem is defined by a binary relation $R\subseteq\{0,1\}^{*}\times\{0,1\}^{*}$ – a string $s\in\{0,1\}^{*}$ is a solution for an instance $x\in\{0,1\}^{*}$ if $(x,s)\in R$. A search problem defined by relation $R$ is total if for every $x$, there exists an $s$ such that $(x,s)\in R$. We define TFNP as the class of all total search problems that can be efficiently verified, i.e., there is a deterministic polynomial-time Turing machine that, given $(x,s)$, outputs 1 if and only if $(x,s)\in R$ and, for every instance $x$, there exists a solution $s$ of polynomial length in the size of $x$.

To avoid unnecessarily cumbersome phrasing throughout the paper, we define TFNP relations implicitly by presenting the set of valid instances $X\subseteq\{0,1\}^{*}$ recognizable in polynomial time (in the length of an instance) and, for each instance $i\in X$, the set of admissible solutions $Y_{i}\subseteq\{0,1\}^{*}$ for the instance $i$. It is then implicitly assumed that, for any invalid instance $i\in\{0,1\}^{*}\setminus X$, we define the corresponding solution set as $Y_{i}=\{0,1\}^{*}$.

Next, we recall the definitions of the complexity classes PWPP and PPP via their canonical complete problems weak-Pigeon and Pigeon.

Definition 2.1 (weak-Pigeon and PWPP [Jeř16]).

The problem weak-Pigeon is defined by the relation

Instance:

A Boolean circuit $C\colon\{0,1\}^{n}\to\{0,1\}^{n-1}$.

Solution:

$x_{1}\neq x_{2}$ s.t. $C(x_{1})=C(x_{2})$.

The class of all TFNP problems reducible to weak-Pigeon is called PWPP.

Definition 2.2 (Pigeon and PPP [Pap94]).

The problem Pigeon is defined by the relation

Instance:

A Boolean circuit $C\colon\{0,1\}^{n}\to\{0,1\}^{n}$.

Solution:

One of the following:

  i) $x$ s.t. $C(x)=0^{n}$,

  ii) $x\neq y$ s.t. $C(x)=C(y)$.

The class of all TFNP problems reducible to Pigeon is called PPP.
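As an illustration of why both problems are total, the following Python sketch searches exhaustively for a solution. It is only a sketch under stated assumptions: a circuit is modeled as an ordinary Python function on bit tuples, the search takes exponential time, and the names `solve_pigeon`, `solve_weak_pigeon`, and the three example circuits are hypothetical.

```python
from itertools import product

def solve_pigeon(C, n):
    """Exhaustive search for a Pigeon solution; C maps n-bit tuples to n-bit tuples."""
    zero = (0,) * n
    seen = {}
    for x in product((0, 1), repeat=n):
        y = C(x)
        if y == zero:
            return ("preimage_of_zero", x)
        if y in seen:
            return ("collision", seen[y], x)
        seen[y] = x
    # With no preimage of 0^n, 2^n inputs share 2^n - 1 outputs, so a collision exists.
    raise AssertionError("unreachable for functions with n-bit outputs")

def solve_weak_pigeon(C, n):
    """Exhaustive search for a collision; C maps n-bit tuples to (n-1)-bit tuples."""
    seen = {}
    for x in product((0, 1), repeat=n):
        y = C(x)
        if y in seen:
            return (seen[y], x)
        seen[y] = x
    raise AssertionError("unreachable for functions with (n-1)-bit outputs")

shift = lambda x: x[1:] + (1,)   # never outputs 0^n, so it must have a collision
identity = lambda x: x           # a permutation, so 0^n has a preimage
drop_first = lambda x: x[1:]     # shrinks by one bit

pigeon_shift = solve_pigeon(shift, 3)
pigeon_identity = solve_pigeon(identity, 3)
weak_pair = solve_weak_pigeon(drop_first, 3)
```

The difficulty captured by PWPP and PPP is that the circuit is given succinctly, so this kind of enumeration is exponential in the size of the instance.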

3 Property-Preserving Encodings

A key ingredient to our proofs of inclusion in PWPP and PPP is the use of efficient encodings. We rely on two different types of encodings. The first one simply consists of bijections between two different representations of the same set of objects, the first one being more natural and more convenient to work with, and the second one being more concise. The second type of encodings, which we call property-preserving encodings, consists of shrinking functions, in the sense that the range of the encoding is smaller than the domain, whose collisions exactly correspond to elements sharing some property. The following definition gives a precise description of the features we require from these encodings.

Definition 3.1 (Property-preserving encoding).

Let $\mathcal{X}\subseteq\{0,1\}^{k}$ and $\mathcal{Y}$ be sets, and let $\sim$ be an equivalence relation on $\mathcal{X}$. Let $E\colon\{0,1\}^{k}\rightarrow\mathcal{Y}$ be a surjection. We say that $E$ constitutes a property-preserving encoding for $\sim$ on $\mathcal{X}$ if it satisfies the following.

  • (Efficiency). $E$ can be computed in polynomial time.

  • (Compression). $|\mathcal{Y}|\leq|\mathcal{X}|$.

  • ($\sim$-correctness). $E$ is constant on every coset of $\mathcal{X}$ for $\sim$.

We first describe some bijective encodings before studying some property-preserving encodings.

3.1 Cover Encodings

Our reductions in Section 4 make use of Cover encodings [Cov73] that efficiently encode subsets of a specified size in optimal space: namely, we may encode every subset $S\subseteq\{0,1\}^{m}$ such that $|S|=k$ by considering the lexicographic order of all $\binom{m}{k}$ such sets (in fact we consider the lexicographic order over their characteristic vectors in $\{0,1\}^{m}$), and mapping this into binary strings: this requires $\left\lceil\log\binom{m}{k}\right\rceil$ bits, which is optimal. We denote the encoding and decoding functions as follows, with $\alpha(k,m)=\left\lceil\log\binom{m}{k}\right\rceil$.

$$
\begin{aligned}
E_{\textsf{Cover}}^{k,m}\colon\{0,1\}^{m} &\rightarrow\{0,1\}^{\alpha(k,m)}\\
D_{\textsf{Cover}}^{k,m}\colon\{0,1\}^{\alpha(k,m)} &\rightarrow\{0,1\}^{m}
\end{aligned}
$$

We set $E_{\textsf{Cover}}=E_{\textsf{Cover}}^{n,2n}$ and $D_{\textsf{Cover}}=D_{\textsf{Cover}}^{n,2n}$, and $\alpha=\alpha(n,2n)$. As described in [Cov73], these functions can be made efficient.

Lemma 3.2.

For every $k\leq m$, $D_{\textsf{Cover}}^{k,m}\circ E_{\textsf{Cover}}^{k,m}$ is the identity over all $k$-subsets of $\{0,1\}^{m}$. Similarly, $E_{\textsf{Cover}}^{k,m}\circ D_{\textsf{Cover}}^{k,m}$ is the identity over the first $\binom{m}{k}$ elements in the lexicographic order of $\{0,1\}^{\alpha(k,m)}$.

Note that the behavior of $D_{\textsf{Cover}}^{k,m}$ is undefined for the last $2^{\alpha(k,m)}-\binom{m}{k}$ inputs. Furthermore, by design, $E_{\textsf{Cover}}^{k,m}$ is well-defined on any subset of $[m]$ (even if this subset does not have size $k$), but the encoding only makes sense for subsets of size $k$. We also note the following identity which will be useful later when dealing with $n$-subsets of $[2n]$.

$$D_{\textsf{Cover}}(0^{\alpha})=0^{n}1^{n}=\overline{[n]}\qquad(1)$$

Remark 3.3.

When we encode $n$-subsets of $[2n]$, since we encode sets according to the rank of their characteristic vector in the lexicographic order, any set that does not contain element 1 is one of the $\binom{2n-1}{n-1}=\frac{1}{2}\binom{2n}{n}\leq 2^{\alpha-1}$ first ones in the lexicographic order, hence its encoding starts with a 0. Conversely, if we decode an element whose first two bits are 0’s, this means that the corresponding $n$-subset of $[2n]$ is one of the first $2^{\alpha-2}\leq\binom{2n-1}{n-1}$ in the lexicographic order, hence that it does not contain the element 1.
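A minimal Python sketch of rank-based encoding and decoding in the spirit of Cover encodings follows. It is not the construction of [Cov73] verbatim: as a simplifying assumption it returns the rank as an integer rather than as an $\alpha(k,m)$-bit string, and the names `cover_encode`, `cover_decode`, and `alpha` are hypothetical.

```python
from math import comb

def alpha(k, m):
    """ceil(log2(binom(m, k))), computed exactly with integer arithmetic."""
    return (comb(m, k) - 1).bit_length()

def cover_encode(vec):
    """Lexicographic rank of a 0/1 tuple among tuples of the same length and weight."""
    rank, ones_left, m = 0, sum(vec), len(vec)
    for i, bit in enumerate(vec):
        if bit:
            # Every tuple that agrees so far but has a 0 here comes earlier.
            rank += comb(m - i - 1, ones_left)
            ones_left -= 1
    return rank

def cover_decode(rank, k, m):
    """Inverse of cover_encode on ranks 0, ..., binom(m, k) - 1."""
    if not 0 <= rank < comb(m, k):
        raise ValueError("rank outside the defined range")
    vec, ones_left = [], k
    for i in range(m):
        with_zero_here = comb(m - i - 1, ones_left)
        if rank < with_zero_here:
            vec.append(0)
        else:
            vec.append(1)
            rank -= with_zero_here
            ones_left -= 1
    return tuple(vec)

n = 3
decoded = [cover_decode(r, n, 2 * n) for r in range(comb(2 * n, n))]
```

For $n=3$ the rank 0 decodes to $0^{3}1^{3}$, matching identity (1), and the sets avoiding element 1 are exactly those with rank below $\binom{5}{2}=10$, in line with Remark 3.3.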

3.2 Encoding 2-subsets of $[2^{n}]$

In Section 7, we need to encode the subsets of $[2^{n}]$ with 2 distinct elements in an injective way. Unfortunately, since the base set is large, we cannot use Cover encodings to do so. However, we can use the idea behind Cover encodings, that is to encode the subsets by their rank in the lexicographic order. Consider $(x,y)\in[2^{n}]\times[2^{n}]$, with $x<y$. What is its rank in the lexicographic order?
All subsets whose smallest element is smaller than $x$ have a lower rank. The number of such subsets is

$$
\begin{aligned}
(2^{n}-1)+(2^{n}-2)+\ldots+(2^{n}-x+1) &=\sum_{j=2^{n}-x+1}^{2^{n}-1}j\\
&=\sum_{j=1}^{2^{n}-1}j-\sum_{j=1}^{2^{n}-x}j\\
&=\frac{2^{n}(2^{n}-1)}{2}-\frac{(2^{n}-x)(2^{n}-x+1)}{2}.
\end{aligned}
$$

All subsets whose smallest element is $x$ and whose second smallest element is smaller than $y$ also have a lower rank. There are exactly $y-x-1$ such subsets.
Hence, the rank of the subset $(x,y)$ in the lexicographic order is

$$\frac{2^{n}(2^{n}-1)}{2}-\frac{(2^{n}-x)(2^{n}-x+1)}{2}+y-x-1.$$

Note that since there are $\binom{2^{n}}{2}<2^{2n-1}$ subsets of $[2^{n}]$ with 2 distinct elements, the rank of any subset $(x,y)$ with $x<y$ can be written in binary using $2n-1$ bits. Now, denote as $E_{lex}\colon\{0,1\}^{n}\times\{0,1\}^{n}\rightarrow\{0,1\}^{2n-1}$ the following circuit. On input $(x,y)$, it proceeds as follows.

  1. If $x=y$, it returns $0^{2n-1}$.

  2. If $x<y$, it computes and returns the binary encoding on $2n-1$ bits of $\frac{2^{n}(2^{n}-1)}{2}-\frac{(2^{n}-x)(2^{n}-x+1)}{2}+y-x-1$.

  3. If $x>y$, it computes and returns the binary encoding on $2n-1$ bits of $\frac{2^{n}(2^{n}-1)}{2}-\frac{(2^{n}-y)(2^{n}-y+1)}{2}+x-y-1$.

Note that $E_{lex}$ has polynomial size, and is injective on the set of subsets of $[2^{n}]$ with 2 distinct elements by construction.
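The three cases above translate directly into code. The following Python sketch is an illustration under two assumptions: inputs are integers in $[2^{n}]=\{1,\ldots,2^{n}\}$ rather than $n$-bit strings, and the output is the rank as an integer rather than its $(2n-1)$-bit binary encoding. The name `e_lex` is hypothetical.

```python
from itertools import combinations

def e_lex(x, y, n):
    """Rank of the 2-subset {x, y} of {1, ..., 2^n}; returns 0 when x == y."""
    size = 2 ** n
    if x == y:
        return 0
    if x > y:
        x, y = y, x  # the encoding is symmetric in its two arguments
    # Both products are of consecutive integers, so the divisions are exact.
    below_x = size * (size - 1) // 2 - (size - x) * (size - x + 1) // 2
    return below_x + y - x - 1

n = 3
# combinations() enumerates pairs x < y in lexicographic order.
ranks = [e_lex(x, y, n) for x, y in combinations(range(1, 2 ** n + 1), 2)]
```

For $n=3$ the 28 pairs receive the ranks $0,1,\ldots,27$ in order, so the map is injective on 2-subsets and every rank fits in $2n-1=5$ bits.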

Remark 3.4.

In fact, this encoding is a bijection from the set of 2-subsets of $[2^{n}]$ to the set $[\binom{2^{n}}{2}]$. The inverse of that bijection can also be computed by a circuit $D_{lex}$ of polynomial size.

3.3 Prüfer Codes

In Section 6, we make use of Prüfer codes [Pru18] that give an efficiently computable bijection between the set of labelled spanning trees on $n$ vertices and the set of sequences of $n-2$ elements of $[n]$. They were originally used by Heinz Prüfer [Pru18] to prove Cayley’s formula (Classical Theorem 5).

We denote by $E_{\textsf{Prüfer}}$ a circuit that efficiently computes the Prüfer encoding of a spanning tree described by an element of $\{0,1\}^{\binom{n}{2}}$. Similarly, let $D_{\textsf{Prüfer}}$ be a circuit that efficiently computes the spanning tree associated with a Prüfer code. By inspecting the algorithm that computes Prüfer encodings, it is clear that we can assume these circuits to have polynomial size. We also assume that $E_{\textsf{Prüfer}}$ outputs elements of the right form even on inputs which do not correspond to spanning trees. Consider the lexicographic order on $[n]^{n-2}$. Let $R$ be a circuit that efficiently computes the rank of an element of $[n]^{n-2}$, and let $\tilde{E}_{\textsf{Prüfer}}=R\circ E_{\textsf{Prüfer}}$. Given a spanning tree, $\tilde{E}_{\textsf{Prüfer}}$ returns the rank of its Prüfer code in the lexicographic order.

Let $R^{\prime}$ be a circuit which on input $x$ computes the sequence of $[n]^{n-2}$ whose rank in the lexicographic order is $x$. Let $\tilde{D}_{\textsf{Prüfer}}=D_{\textsf{Prüfer}}\circ R^{\prime}$. Given a rank, $\tilde{D}_{\textsf{Prüfer}}$ returns the spanning tree whose Prüfer code has the corresponding rank in the lexicographic order. Note that $\tilde{D}_{\textsf{Prüfer}}$ and $\tilde{E}_{\textsf{Prüfer}}$ both have polynomial size. Now, if $\beta=\lceil(n-2)\log(n)\rceil$, then $\tilde{E}_{\textsf{Prüfer}}:\{0,1\}^{\binom{n}{2}}\rightarrow\{0,1\}^{\beta}$ and $\tilde{D}_{\textsf{Prüfer}}:\{0,1\}^{\beta}\rightarrow\{0,1\}^{\binom{n}{2}}$. By construction, we have the following.

Lemma 3.5.

The following statements are true.

  1. $\tilde{D}_{\textsf{Prüfer}}\circ\tilde{E}_{\textsf{Prüfer}}$ is the identity over the set of labelled spanning trees on $n$ vertices.

  2. $\tilde{E}_{\textsf{Prüfer}}\circ\tilde{D}_{\textsf{Prüfer}}$ is the identity over the first $n^{n-2}$ elements of $\{0,1\}^{\beta}$.

Remark 3.6.

The behavior of $\tilde{D}_{\textsf{Prüfer}}$ on its last $2^{\beta}-n^{n-2}$ inputs is undefined.

Remark 3.7.

Let $T_{1}$ be the tree composed of the edges $(1,2),(1,3),\ldots,(1,n)$. Then, $\tilde{E}_{\textsf{Prüfer}}(T_{1})=0^{\beta}$ and $\tilde{D}_{\textsf{Prüfer}}(0^{\beta})=T_{1}$.
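The maps above can be illustrated with a short Python sketch. It is only a sketch under assumptions of ours: trees are given as edge lists rather than as vectors in $\{0,1\}^{\binom{n}{2}}$, the encoding is the standard textbook Prüfer procedure (repeatedly delete the smallest-labelled leaf and record its neighbour), and the rank of a code $(a_{1},\ldots,a_{n-2})$ is obtained by reading the digits $a_{i}-1$ as a base-$n$ number. All function names are hypothetical.

```python
def prufer_encode(edges, n):
    """Prüfer code of a labelled tree on vertices 1..n, given as an edge list."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    code = []
    for _ in range(n - 2):
        leaf = min(v for v in adj if len(adj[v]) == 1)  # smallest-labelled leaf
        neighbour = next(iter(adj[leaf]))
        code.append(neighbour)
        adj[neighbour].remove(leaf)
        del adj[leaf]
    return code

def prufer_decode(code, n):
    """Sorted edge list of the tree on vertices 1..n with the given Prüfer code."""
    degree = {v: 1 for v in range(1, n + 1)}
    for v in code:
        degree[v] += 1
    edges = []
    for v in code:
        leaf = min(u for u in degree if degree[u] == 1)
        edges.append((min(leaf, v), max(leaf, v)))
        degree[leaf] -= 1  # the leaf is now used up
        degree[v] -= 1
    u, w = [v for v in degree if degree[v] == 1]  # the last two vertices
    edges.append((u, w))
    return sorted(edges)

def rank(code, n):
    """Lexicographic rank of a sequence in [n]^(n-2), counting from 0."""
    r = 0
    for a in code:
        r = r * n + (a - 1)
    return r

def unrank(r, n):
    """Inverse of rank: the sequence in [n]^(n-2) of lexicographic rank r."""
    code = []
    for _ in range(n - 2):
        r, d = divmod(r, n)
        code.append(d + 1)
    return code[::-1]
```

Under these assumptions, the star $T_{1}$ has code $(1,\ldots,1)$ and hence rank 0, in line with Remark 3.7, and composing `unrank`, `prufer_decode`, `prufer_encode`, and `rank` acts as the identity on the ranks $0,\ldots,n^{n-2}-1$.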

3.4 Catalan Factorization

Catalan factorization [EK99] is an encoding of subsets of $[2n]$ that allows us to decompose the partially ordered set $(2^{[2n]},\subseteq)$ into $\binom{2n}{n}$ chains and to move efficiently within each chain to find a canonical representative, namely the only $n$-subset of the chain.

Let $x\in\{0,1\}^{2n}$ be a bitmap representing a subset of $[2n]$. We introduce a new symbol $z$, and construct the Catalan factorization as follows. During the construction, we record for each symbol whether or not it is underlined.

  1. Underline the leftmost substring that starts with a non-underlined 1, followed by a (possibly empty) sequence of underlined symbols, and ends in a non-underlined 0. If no such substring exists, go to step 3.

  2. Go to step 1.

  3. Record the number $k$ of non-underlined 1’s.

  4. Replace all non-underlined symbols in $x$ with $z$, and let $x^{\prime}\in\{0,1,z\}^{2n}$ be the resulting string (with underlinings removed).

  5. Output $(x^{\prime},k)$.

We denote the output of the Catalan factorization as $E_{\textsf{Catalan}}(x)\in\{0,1,z\}^{2n}\times[2n]$. We say $x^{\prime}=\tilde{E}_{\textsf{Catalan}}(x)$ is the Catalan string of $x$. If $x^{\prime}\in\{0,1,z\}^{2n}$ and $m$ is the number of $z$’s in $x^{\prime}$, then for any $l\leq m$, we define $D_{\textsf{Catalan}}(x^{\prime},l)$ as the string obtained from $x^{\prime}$ by replacing the last $l$ $z$’s by 1 and the remaining $z$’s by 0.

Example 3.8.

Let $n=4$ and let $x=01101100$ be the string corresponding to the set $\{2,3,5,6\}$. Then, we construct the Catalan factorization by repeating step 1 to get the underlined version.

\[
01101100\rightarrow 01\underline{10}1100\rightarrow 01\underline{10}1\underline{10}0\rightarrow 01\underline{10}\,\underline{1\underline{10}0}
\]

We terminate as there is no non-underlined 0 with a non-underlined 1 to its left. We record that there is $k=1$ non-underlined 1. We then replace all non-underlined symbols with $z$ to obtain the Catalan factorization.

\[
(x^{\prime},k)=(zz101100,1)
\]

Note that we have $D_{\textsf{Catalan}}(x^{\prime},k)=01101100=x$, so the encoding and decoding operations behave as expected. Note also that $D_{\textsf{Catalan}}(x^{\prime},0)=00101100$ corresponds to the set $\{3,5,6\}$ and $D_{\textsf{Catalan}}(x^{\prime},2)=11101100$ corresponds to the set $\{1,2,3,5,6\}$. For this reason, we say that the Catalan string $x^{\prime}$ identifies the following chain.

\[
\{3,5,6\}\subset\{2,3,5,6\}\subset\{1,2,3,5,6\}
\]

In that chain, $k$ identifies that $x$ is the $1^{\text{st}}$ element, counting from 0.
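The example can be reproduced with a direct transcription of the five steps into Python. This is a minimal sketch rather than an efficient circuit: strings over $\{0,1,z\}$ are represented as Python `str` values, and the names `e_catalan` and `d_catalan` are ours.

```python
def e_catalan(x: str):
    """Catalan factorization (x', k) of a bit string x, following steps 1 to 5."""
    underlined = [False] * len(x)
    while True:
        found = False
        for i, c in enumerate(x):  # scan for the leftmost admissible substring
            if c == '1' and not underlined[i]:
                j = i + 1
                while j < len(x) and underlined[j]:
                    j += 1  # skip the (possibly empty) run of underlined symbols
                if j < len(x) and x[j] == '0':
                    underlined[i] = underlined[j] = True
                    found = True
                    break
        if not found:
            break
    k = sum(1 for c, u in zip(x, underlined) if c == '1' and not u)
    x_prime = ''.join(c if u else 'z' for c, u in zip(x, underlined))
    return x_prime, k

def d_catalan(x_prime: str, l: int) -> str:
    """Replace the last l z's by 1 and the remaining z's by 0 (l is capped at #z)."""
    m = x_prime.count('z')
    l = min(l, m)
    out, seen = [], 0
    for c in x_prime:
        if c == 'z':
            out.append('1' if seen >= m - l else '0')
            seen += 1
        else:
            out.append(c)
    return ''.join(out)
```

Calling `e_catalan("01101100")` walks through the same three underlining rounds as the example, and `d_catalan` with $l=0,1,2$ enumerates the three sets of the chain.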

Lemma 3.9.

$D_{\textsf{Catalan}}\circ E_{\textsf{Catalan}}$ acts as the identity over $\{0,1\}^{2n}$.

Proof.

Let $x\in\{0,1\}^{2n}$, and let $(x^{\prime},k)=E_{\textsf{Catalan}}(x)$ be its Catalan factorization. Let $m$ be the number of $z$’s in $x^{\prime}$. We claim that at the end of the underlining phase of the Catalan factorization of $x$, the entries that are not underlined are first $m-k$ 0’s and then $k$ 1’s. Indeed, by definition, $k$ of them are 1, so $m-k$ of them are 0. Furthermore, if we had a non-underlined 1 before a non-underlined 0, then we could consider the rightmost non-underlined 1 that is before a non-underlined 0. This 1 is followed by a sequence of underlined symbols and then by a non-underlined 0, so this 1 and the corresponding 0 should have been underlined. Thus, the entries that are not underlined are indeed first $m-k$ 0’s and then $k$ 1’s. These are the entries that are turned into $z$’s when we go from $x$ to $x^{\prime}$.

Now, when we compute $D_{\textsf{Catalan}}(x^{\prime},k)$, we replace the last $k$ $z$’s in $x^{\prime}$ by 1’s and the $m-k$ other ones by 0’s, which is exactly what we had in $x$. Hence, $D_{\textsf{Catalan}}\circ E_{\textsf{Catalan}}(x)=D_{\textsf{Catalan}}(x^{\prime},k)=x$. ∎

We also denote by $D_{\textsf{Catalan}}^{(l)}:\{0,1,z\}^{2n}\rightarrow\{0,1\}^{2n}$ the map $x^{\prime}\mapsto D_{\textsf{Catalan}}(x^{\prime},l)$. If, on input $x^{\prime}$, $l$ is larger than the number of $z$ symbols in $x^{\prime}$, all $z$ symbols are replaced with 1; this ensures the map is defined for all $l\geq 0$.

Lemma 3.10.

For every $l\geq 0$, $\tilde{E}_{\textsf{Catalan}}\circ D_{\textsf{Catalan}}^{(l)}$ acts as the identity on the set of Catalan strings. That is, if $x^{\prime}$ is a Catalan string, then for every $l$, the Catalan string of $D_{\textsf{Catalan}}^{(l)}(x^{\prime})$ is $x^{\prime}$.

Proof.

Let $x\in\{0,1\}^{2n}$ and let $x^{\prime}=\tilde{E}_{\textsf{Catalan}}(x)$ be the Catalan string of $x$. Now let $l\geq 0$, let $y=D_{\textsf{Catalan}}(x^{\prime},l)$, and let $y^{\prime}=\tilde{E}_{\textsf{Catalan}}(y)$ be the Catalan string of $y$. We want to show that $y^{\prime}=x^{\prime}$.

We proceed using induction on the steps of the algorithm. At first, no entries are underlined in either string. Next, suppose that after some number of steps, the underlined bits are exactly the same in $x$ and in $y$. Now, consider two bits that get underlined in $x$ at the next step. Then, all the bits between them are underlined in $x$ at this point, so this is also the case in $y$ by the induction hypothesis. Furthermore, since these two bits get underlined in $x$, they are not turned into $z$’s at the end of the algorithm, which means that they are still the same bits in $x^{\prime}$ and therefore in $y$. Hence, in $y$ we have these 2 bits, first a 1 and then a 0, such that every entry between them is underlined, so they get underlined at this step.

Conversely, consider two bits that get underlined in $y$ at the next step. Then, all the entries between them in $y$ are underlined at this point, so this is the case in $x$ too by the induction hypothesis. Suppose, for contradiction, that the corresponding bits in $x$ do not get underlined at this step. By the previous observation, it means that this pair of bits in $x$ is not $(1,0)$. There are three cases to consider:

  1. In $x$, these two bits are 0’s. Then, the first gets turned into a 1 in $y$, which means that it never gets underlined in $x$ (otherwise it would remain the same). Then, since all the bits in $x$ between these two are already underlined, and since the first never gets underlined, the second never gets underlined either (there will never be a non-underlined 1 before it such that all entries between them are underlined). Hence, these two bits never get underlined in the algorithm, and are finally turned into $z$’s. Then, to go from $x^{\prime}$ to $y$, we replace the last $l$ $z$’s by 1’s and the others by 0’s, thus making it impossible for the first of these two bits to be turned into a 1 while the second is turned into a 0.

  2. In $x$, these two bits are respectively 0 and 1. Then, both these bits are changed between $x$ and $y$, which means that they never get underlined in $x$, hence they are $z$’s in $x^{\prime}$. Thus, as before, it is impossible that the first one is turned into a 1 while the second is turned into a 0.

  3. In $x$, these two bits are 1’s. Then, the second bit gets turned into a 0 in $y$, which means that it never gets underlined in $x$. As in the first case, we get that the first bit never gets underlined either, once more making it impossible for these two bits to be turned into 1 and 0, respectively.

In all three cases, we get a contradiction. Thus, the corresponding bits in $x$ are also underlined at this step. Then, by induction, we get that at each step, the same bits are underlined in $x$ and $y$. Finally, we turn all the bits that are not underlined into $z$’s to get $x^{\prime}$ and $y^{\prime}$, hence $x^{\prime}=y^{\prime}$. ∎
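Lemmas 3.9 and 3.10 can also be checked exhaustively for small $n$. The sketch below assumes a bracket-matching reformulation of the underlining procedure (each 0 is paired with the nearest unpaired 1 to its left), which to our understanding produces the same underlined positions as steps 1 and 2; the function names are ours.

```python
from itertools import product

def e_catalan(x: str):
    """Catalan factorization via bracket matching: 1 opens, 0 closes."""
    stack, matched = [], set()
    for i, c in enumerate(x):
        if c == '1':
            stack.append(i)
        elif stack:
            matched.update((stack.pop(), i))  # pair this 0 with the nearest free 1
    k = len(stack)  # unmatched 1's
    return ''.join(c if i in matched else 'z' for i, c in enumerate(x)), k

def d_catalan(x_prime: str, l: int) -> str:
    """Replace the last l z's by 1 and the others by 0 (l is capped at #z)."""
    m = x_prime.count('z')
    l = min(l, m)
    fill = iter('0' * (m - l) + '1' * l)
    return ''.join(next(fill) if c == 'z' else c for c in x_prime)

def check_lemmas(n: int) -> bool:
    """Exhaustively test Lemma 3.9 and Lemma 3.10 on {0,1}^(2n)."""
    for bits in product('01', repeat=2 * n):
        x = ''.join(bits)
        x_prime, k = e_catalan(x)
        if d_catalan(x_prime, k) != x:  # Lemma 3.9
            return False
        for l in range(2 * n + 1):  # Lemma 3.10
            if e_catalan(d_catalan(x_prime, l))[0] != x_prime:
                return False
    return True

def num_chains(n: int) -> int:
    """Number of distinct Catalan strings over {0,1}^(2n)."""
    return len({e_catalan(''.join(b))[0] for b in product('01', repeat=2 * n)})
```

For $n=3$, `num_chains(3)` counts the distinct Catalan strings, which should match the $\binom{6}{3}$ chains of the decomposition.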

Remark 3.11.

We can define an equivalence relation $\sim$ over the subsets of $[2n]$ by saying that two subsets are equivalent if and only if they have the same Catalan string.
By combining Catalan factorization and Cover encodings, we can obtain a property-preserving encoding for $\sim$ on $\{0,1\}^{2n}$. We use this in Section 5.

4 Erdős-Ko-Rado Theorem on Intersecting Families

In this section, we define total search problems motivated by the well-known Erdős-Ko-Rado theorem on intersecting families and study their computational complexity. First, we present a PWPP-complete variant of the problem. Next, we modify the problem using a strong statement of the Erdős-Ko-Rado theorem to get a PPP-complete variant.

Recall the definition of an intersecting family and the statement of the Erdős-Ko-Rado theorem.

Definition 4.1 (Intersecting family).

Let $\Omega$ be any set. A family of sets $\mathcal{F}\subseteq 2^{\Omega}$ is an intersecting family if no two sets are disjoint, i.e., if for any $A,B\in\mathcal{F}$, it holds that $A\cap B\neq\emptyset$.

Classical Theorem 1 (Erdős-Ko-Rado [EKR61]).

Any intersecting family where each set has $k$ elements on a universe of size $m$ contains at most $\binom{m-1}{k-1}$ sets, and this bound is tight.

We start by defining a total search problem motivated by a special case of the Erdős-Ko-Rado theorem for families of $n$-sets in a universe of size $2n$, presented in the following corollary.

Corollary 4.2.

Any intersecting family where each set has $n$ elements on a universe of size $2n$ contains at most $\binom{2n-1}{n-1}$ sets, and this bound is tight. Furthermore, if $\mathcal{F}$ is an intersecting family of maximum size, then for every $n$-subset $S$, exactly one of $S$ and $\overline{S}$ is in $\mathcal{F}$.
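To make the corollary concrete, the following Python sketch examines one particular extremal family, namely all $n$-subsets of $[2n]$ that contain the element 1. It is an illustration on small cases, not a proof, and the helper names are ours.

```python
from itertools import combinations

def n_subsets(n: int):
    """All n-subsets of {1, ..., 2n} as frozensets."""
    return [frozenset(s) for s in combinations(range(1, 2 * n + 1), n)]

def is_intersecting(family) -> bool:
    """True if no two distinct sets of the family are disjoint."""
    return all(a & b for a, b in combinations(family, 2))

def star(n: int):
    """The family of all n-subsets of {1, ..., 2n} containing the element 1."""
    return [s for s in n_subsets(n) if 1 in s]

def one_of_each_complementary_pair(family, n: int) -> bool:
    """True if, for every n-subset S, exactly one of S and its complement is in the family."""
    universe = frozenset(range(1, 2 * n + 1))
    members = set(family)
    return all((s in members) != ((universe - s) in members) for s in n_subsets(n))
```

For $n=3$, the star family has $\binom{5}{2}=10$ sets, is intersecting, and contains exactly one set of each complementary pair, whereas the family of all 3-subsets of $[6]$ is not intersecting.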

Suppose that we have a collection containing more than $\binom{2n-1}{n-1}$ sets of size $n$ on $2n$ elements. Then, by Classical Theorem 1, there must be two sets that do not intersect. This induces a total search problem of finding two such disjoint sets. We consider an implicit representation of such a collection by a circuit $C$ whose inputs serve as indices in the collection. The output of the circuit is a representation of the corresponding set as a characteristic vector of the $2n$ elements. Of course, this representation does not guarantee that $C$ satisfies the conditions required for Classical Theorem 1 to apply, which would make the problem not total; in this case, we allow evidence of this fact to be a solution to the problem. Namely, if for a given input $x$ we do not have $|C(x)|=n$, or if two distinct indices $x,y$ represent the same set, i.e., $C(x)=C(y)$, we allow such inputs as solutions.

Definition 4.3 (weak-Erdős-Ko-Rado).

The problem weak-Erdős-Ko-Rado is defined by the relation

Instance:

A Boolean circuit $C\colon\{0,1\}^{\left\lceil\log\binom{2n-1}{n-1}\right\rceil+1}\to\{0,1\}^{2n}$.

Solution:

One of the following:

  i) $x$ s.t. $|C(x)|\neq n$,

  ii) $x\neq y$ s.t. $C(x)=C(y)$,

  iii) $x,y$ s.t. $C(x)\cap C(y)=\emptyset$.

As we discussed in the introduction, the totality of this problem is proved using a “weak” statement in extremal combinatorics, namely the first part of Corollary 4.2, hence the name Weak. However, the analogy with weak-Pigeon goes further. Indeed, our first main theorem is the following.

Theorem 4.4.

weak-Erdős-Ko-Rado is PWPP-complete.

Throughout this section, we maintain $\alpha=\left\lceil\log\binom{2n}{n}\right\rceil=\left\lceil\log\binom{2n-1}{n-1}\right\rceil+1$.

Lemma 4.5.

$\textsc{weak-Erdős-Ko-Rado}\in\textsf{PWPP}$.

Proof.

At a high level, we want to encode the sets using a shrinking circuit, in such a way that collisions correspond to disjoint sets. Observe that for $n$-sets in a universe of size $2n$, the only disjoint sets are complements, hence we get an equivalent instance of weak-Erdős-Ko-Rado if we map each set to either itself or its complement, arbitrarily. In our construction, we map each set $S$ to the representative not containing 1. That is, if $1\not\in S$, the set is left unchanged and, otherwise, it is mapped to its complement $\overline{S}$. Note that by the pigeonhole principle, two sets that do not contain 1 must have a non-empty intersection since we work with $n$-subsets of $[2n]$. To obtain a shrinking circuit, we make use of Cover encodings (Section 3.1) that give an optimal encoding of all $n$-sets by considering their lexicographic order. Notice that if the input $S$ is not an $n$-set, we may map it arbitrarily to any $n$-set, as a collision, in this case, yields a solution to the weak-Erdős-Ko-Rado instance.

Formally, recall that we have $E_{\textsf{Cover}}:\{0,1\}^{2n}\rightarrow\{0,1\}^{\alpha}$ and $D_{\textsf{Cover}}:\{0,1\}^{\alpha}\rightarrow\{0,1\}^{2n}$. Now let $C:\{0,1\}^{\alpha}\rightarrow\{0,1\}^{2n}$ be an instance of weak-Erdős-Ko-Rado. We proceed to construct an instance $C^{\prime}:\{0,1\}^{\alpha}\rightarrow\{0,1\}^{\alpha-1}$ of weak-Pigeon as follows:

\[
C^{\prime}(x)=\begin{cases}E_{\textsf{Cover}}(C(x))&\text{if $C(x)_{1}=0$}\\ E_{\textsf{Cover}}(\overline{C(x)})&\text{if $C(x)_{1}=1$}\end{cases}
\]

Note that since we only encode sets whose first bit is a 0, by Remark 3.3, the first bit of the encoding is always a 0, so we can consider only the last $\left\lceil\log\binom{2n}{n}\right\rceil-1=\alpha-1$ bits of $C^{\prime}(x)$ for every $x$, which is why we say that $C^{\prime}$ only outputs $\alpha-1$ bits. Note also that if for some $x$, $C(x)$ does not have size $n$, then $E_{\textsf{Cover}}(C(x))$ and $E_{\textsf{Cover}}(\overline{C(x)})$ are still well-defined, even if they are meaningless.

Now, suppose that we have a solution to $C^{\prime}$, that is, $x\neq y$ such that $C^{\prime}(x)=C^{\prime}(y)$. There are four cases to consider, depending on the first bits of $C(x),C(y)$. If $C(x)_{1}=C(y)_{1}=0$, then $E_{\textsf{Cover}}(C(x))=C^{\prime}(x)=C^{\prime}(y)=E_{\textsf{Cover}}(C(y))$. If both $C(x)$ and $C(y)$ have size $n$, then by injectivity of $E_{\textsf{Cover}}$ on inputs of size $n$ (see Lemma 3.2), we get $C(x)=C(y)$, which is a solution to weak-Erdős-Ko-Rado. If one of them does not have size $n$, we also get a solution to weak-Erdős-Ko-Rado. The other cases are similar; in particular, if $C(x)_{1}\neq C(y)_{1}$ and both sets have size $n$, injectivity yields $C(x)=\overline{C(y)}$, so the two sets are disjoint and form a solution of type iii). ∎
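The reduction in this proof can be sketched in Python. Since Section 3.1 is not reproduced here, the sketch assumes that the Cover encoding of an $n$-set is the lexicographic rank of its characteristic vector among all strings of the same length and weight, with 0 ordered before 1; this is consistent with sets avoiding element 1 receiving the smallest ranks. Circuits are modelled as Python callables on integers, sets as bit strings, and all names are hypothetical.

```python
from math import comb

def cover_rank(bits: str) -> int:
    """Lexicographic rank of bits among strings of the same length and weight."""
    remaining, rank = bits.count('1'), 0
    for i, c in enumerate(bits):
        if c == '1':
            # strings with the same prefix but a 0 at position i come earlier
            rank += comb(len(bits) - i - 1, remaining)
            remaining -= 1
    return rank

def complement(bits: str) -> str:
    return ''.join('1' if c == '0' else '0' for c in bits)

def reduce_to_weak_pigeon(C, n: int):
    """C' of the proof: encode the representative of C(x) that avoids element 1."""
    alpha = (comb(2 * n, n) - 1).bit_length()  # ceil(log2(binom(2n, n)))
    def C_prime(x: int) -> int:
        s = C(x)
        rep = s if s[0] == '0' else complement(s)
        return cover_rank(rep) % 2 ** (alpha - 1)  # keep the last alpha - 1 bits
    return C_prime

def solution_from_collision(C, n: int, x: int, y: int):
    """Turn a collision x != y of C' into a weak-Erdős-Ko-Rado solution."""
    for v in (x, y):
        if C(v).count('1') != n:
            return ('i', v)  # a set of the wrong size
    if C(x) == C(y):
        return ('ii', x, y)  # the same set twice
    return ('iii', x, y)  # complementary, hence disjoint, sets
```

On a toy instance with $n=2$ (so $\alpha=3$), every collision of `C_prime` falls into one of the three solution types, as the case analysis above predicts.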

Remark 4.6.

Consider the circuit $E:\{0,1\}^{2n}\rightarrow\{0,1\}^{\alpha-1}$, defined as follows.

\[
E(x)=\begin{cases}0^{\alpha-1}&\text{if $|x|\neq n$}\\ E_{\textsf{Cover}}(x)&\text{if $x_{1}=0$ and $|x|=n$}\\ E_{\textsf{Cover}}(\overline{x})&\text{if $x_{1}=1$ and $|x|=n$}\end{cases}
\]

Let $\mathcal{X}\subseteq\{0,1\}^{2n}$ be the subset of $\{0,1\}^{2n}$ corresponding to the $n$-subsets of $[2n]$. We define an equivalence relation $\sim$ on $\mathcal{X}$ by saying that two strings are equivalent if the corresponding subsets are either equal or disjoint. Note that this relation is transitive only because we work with $n$-subsets of $[2n]$.
Then, we have that $E$ is a property-preserving encoding for $\sim$ on $\mathcal{X}$.
Furthermore, the property that is preserved by $E$ is such that if two of its inputs collide, they form a solution to the problem we are interested in.
Then, to prove the inclusion of weak-Erdős-Ko-Rado in PWPP, it suffices to compose our instance of weak-Erdős-Ko-Rado with $E$.

Lemma 4.7.

weak-Erdős-Ko-Rado is PWPP-hard.

Proof.

Our goal is for the weak-Erdős-Ko-Rado solver to find collisions in an instance $C^{\prime}$ of weak-Pigeon. We use a variation of the graph hash product [Kra05, KNY19]. The idea is to interpret the output of $C^{\prime}$ as an index into the collection of all $n$-sets that do not contain 1. We then use the Cover decoding function to obtain a representation of the corresponding set, and by correctness of the encoding, any such set must have exactly $n$ elements, and all the sets intersect since they do not contain 1. Hence, the only solutions to the weak-Erdős-Ko-Rado instance are collisions, which yield solutions to the original circuit $C^{\prime}$.

Formally, let $C^{\prime}:\{0,1\}^{m}\rightarrow\{0,1\}^{m-1}$ be an instance of weak-Pigeon. Let $n$ be the minimal integer such that $2^{m+1}\leq\binom{2n}{n}$. Then, $m+1\leq\alpha$. We proceed to build a circuit $A:\{0,1\}^{\alpha}\rightarrow\{0,1\}^{\alpha-2}$ whose size is polynomial in $m$ and such that from any collision in $A$ we can efficiently find a collision in $C^{\prime}$. Recall that we have $E_{\textsf{Cover}}:\{0,1\}^{2n}\rightarrow\{0,1\}^{\alpha}$ and $D_{\textsf{Cover}}:\{0,1\}^{\alpha}\rightarrow\{0,1\}^{2n}$. We define $C:\{0,1\}^{\alpha}\rightarrow\{0,1\}^{2n}$ by

\[
C(x)=D_{\textsf{Cover}}(00\mathbin{\|}A(x))
\]

By Remark 3.3, since for every $x$, $(00\mathbin{\|}A(x))$ is one of the first $\binom{2n-1}{n-1}$ possible inputs, the set $D_{\textsf{Cover}}(00\mathbin{\|}A(x))$ is an $n$-subset of $[2n]$ which does not contain the element 1. We observe that $C$ defines an instance of weak-Erdős-Ko-Rado. Now suppose that we have a solution to this instance. By correctness of the decoding, we can only have solutions of type ii), that is, $x\neq y$ such that $C(x)=C(y)$. By injectivity of $D_{\textsf{Cover}}$ on its first $\binom{2n}{n}$ inputs (see Lemma 3.2), we get that $(00\mathbin{\|}A(x))=(00\mathbin{\|}A(y))$, hence $A(x)=A(y)$, and from there we can retrieve a collision for $C^{\prime}$. ∎
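The instance built in this proof can be sketched as follows. As before, we assume that the Cover decoding is lexicographic unranking of characteristic vectors of fixed weight (0 ordered before 1), which is our reading of Section 3.1 rather than a definition given here. The shrinking map `A` is treated as an arbitrary callable from $\alpha$-bit integers to $(\alpha-2)$-bit integers, and the names are ours.

```python
from math import comb

def cover_unrank(rank: int, n: int, length: int) -> str:
    """The weight-n string of the given length with the given lexicographic rank."""
    bits, remaining = [], n
    for i in range(length):
        # number of completions that put a 0 at position i
        zeros_first = comb(length - i - 1, remaining)
        if rank < zeros_first:
            bits.append('0')
        else:
            bits.append('1')
            rank -= zeros_first
            remaining -= 1
    return ''.join(bits)

def hard_instance(A, n: int):
    """C(x) = D_Cover(00 || A(x)); the leading zeros do not change the integer A(x)."""
    return lambda x: cover_unrank(A(x), n, 2 * n)
```

With $n=3$ we have $\alpha=5$, so a toy shrinking map such as `lambda x: x % 8` (an arbitrary stand-in for $A$) yields a circuit whose outputs are all 3-subsets of $[6]$ avoiding element 1, and whose only solutions are the collisions of `A`.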

PPP-completeness using the tight bound

We remark that Corollary 4.2 gives a tight upper bound on the size of the collection. Furthermore, we know some structure of any collection whose size is exactly $\binom{2n-1}{n-1}$: it must either not be an intersecting family, or it must contain either $[n]$ or $\overline{[n]}$. This is an example of a “strong” theorem in extremal combinatorics. As discussed in the introduction, this observation allows us to modify the problem to create a variant of weak-Erdős-Ko-Rado that is to weak-Erdős-Ko-Rado what Pigeon is to weak-Pigeon. The idea is to let $C$ represent a collection of exactly $\binom{2n-1}{n-1}$ sets, so that its size matches the threshold, and to also allow preimages of $[n]$ and $\overline{[n]}$ as solutions. We show that modifying the problem in this manner makes it PPP-complete, thus strengthening the analogy with Pigeon. This technique is quite general, and we utilise it again in later sections.

Definition 4.8 (Erdős-Ko-Rado).

The problem Erdős-Ko-Rado is defined by the relation

Instance:

A Boolean circuit $C\colon\{0,1\}^{\left\lceil\log\binom{2n-1}{n-1}\right\rceil}\to\{0,1\}^{2n}$.

Solution:

One of the following:

  i) $x$ s.t. $|C(x)|\neq n$ and $x<\binom{2n-1}{n-1}$,

  ii) $x\neq y$ s.t. $C(x)=C(y)$ and $x,y<\binom{2n-1}{n-1}$,

  iii) $x,y$ s.t. $C(x)\cap C(y)=\emptyset$ and $x,y<\binom{2n-1}{n-1}$,

  iv) $x$ s.t. $C(x)=[n]$ or $\overline{[n]}$ and $x<\binom{2n-1}{n-1}$.

Theorem 4.9.

Erdős-Ko-Rado is PPP-complete.

Lemma 4.10.

Erdős-Ko-Rado is PPP-hard.

Proof.

This proof is similar in spirit to that of Lemma 4.7, except for some minor changes. The first one is that the instance of Pigeon might be a permutation, and thus not have collisions. We then need to be able to find the preimage of 0. This is done by solutions of type iv). The second one is that we only look at the first $\binom{2n-1}{n-1}$ inputs of the Pigeon instance, so we have to modify it to make sure that all the possible solutions come from there. This is why we build the circuit $A$.

Formally, let $C^{\prime}:\{0,1\}^{m}\rightarrow\{0,1\}^{m}$ be an instance of Pigeon, and let $n$ be the minimal integer such that $2^{m}<\binom{2n-1}{n-1}$. Since $\alpha=\left\lceil\log\binom{2n-1}{n-1}\right\rceil+1$, we have $m<\alpha-1$. Define $A:\{0,1\}^{\alpha-1}\rightarrow\{0,1\}^{\alpha-1}$ by

\[
A(x)=\begin{cases}C^{\prime}(x)&\text{if $x<2^{m}$}\\ x&\text{o.w.}\end{cases}
\]

It might be the case that the output of $A$ has fewer than $\alpha-1$ bits, in which case we pad it with 0’s on the left to make it an $(\alpha-1)$-bit string. Recall that we have $E_{\textsf{Cover}}:\{0,1\}^{2n}\rightarrow\{0,1\}^{\alpha}$ and $D_{\textsf{Cover}}:\{0,1\}^{\alpha}\rightarrow\{0,1\}^{2n}$.

We proceed to build an instance $C:\{0,1\}^{\alpha-1}\rightarrow\{0,1\}^{2n}$ of Erdős-Ko-Rado by setting $C(x)=D_{\textsf{Cover}}(0\mathbin{\|}A(x))$. Note that for any $x<\binom{2n-1}{n-1}$, we have $A(x)<\binom{2n-1}{n-1}$, thus $C(x)\subseteq[2n]$ is an $n$-subset and does not contain the element 1 by Remark 3.3.

Now, suppose that we have a solution to $C$. Since the index of a solution is $<\binom{2n-1}{n-1}$, the corresponding subset(s) must have size $n$ and cannot contain 1. If the solution is of the form $x,y$ such that $C(x)\cap C(y)=\emptyset$, then we have $|C(x)\cup C(y)|=|C(x)|+|C(y)|=2n$, so we must have either $1\in C(x)$ or $1\in C(y)$, which is not possible.

Thus, any solution must be $x\neq y$ such that $C(x)=C(y)$, or $x$ such that $C(x)=[n]$ or $\overline{[n]}$. There are two cases to consider:

  • Case $D_{\textsf{Cover}}(0\mathbin{\|}A(x))=D_{\textsf{Cover}}(0\mathbin{\|}A(y))$. Then $A(x)=A(y)$ since $D_{\textsf{Cover}}$ is injective on its first $\binom{2n}{n}$ inputs. But $C^{\prime}$ has range $\subseteq[2^{m}-1]$, so any collision in $A$ must result from a collision in $C^{\prime}$. Hence, we get that $x,y<2^{m}$ give us a solution to $C^{\prime}$.

  • Case $D_{\textsf{Cover}}(0\mathbin{\|}A(x))=[n]$ or $\overline{[n]}$. Since $A(x)<\binom{2n-1}{n-1}$, the set $D_{\textsf{Cover}}(0\mathbin{\|}A(x))$ does not contain the element 1, so $C(x)=\overline{[n]}=D_{\textsf{Cover}}(0^{\alpha})$, thus $A(x)=0^{\alpha-1}$. This means that we have $x<2^{m}$ and $x$ corresponds to a preimage of $0^{m}$ for $C^{\prime}$.

In each case, we get a solution to our original problem. ∎
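The role of the auxiliary circuit $A$ can be illustrated with a small Python sketch. Circuits are modelled as callables on integers, bit strings are identified with the integers they encode (so left-padding with zeros is implicit), and the helper names are ours.

```python
def extend(c_prime, m: int):
    """A from the proof: C' on the first 2^m inputs, the identity elsewhere."""
    def A(x: int) -> int:
        return c_prime(x) if x < 2 ** m else x
    return A

def collisions(f, width: int):
    """All pairs x < y of width-bit inputs with f(x) == f(y)."""
    size = 2 ** width
    return [(x, y) for x in range(size) for y in range(x + 1, size) if f(x) == f(y)]

def zero_preimages(f, width: int):
    """All width-bit inputs mapped to 0 by f."""
    return [x for x in range(2 ** width) if f(x) == 0]
```

For example, extending the permutation $x\mapsto 3x+1 \bmod 8$ from 3 bits to 5 bits creates no collision and keeps a single preimage of 0, which lies below $2^{3}$; extending a non-injective map only produces collisions among the first $2^{3}$ inputs.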

Remark 4.11.

We often use this technique of creating a circuit $A$ from a circuit $C$ such that any collision (resp. preimage of 0) in $A$ must come from a collision (resp. preimage of 0) in $C$ and must occur among the first inputs of $A$ (in the range where we want it to occur).

Lemma 4.12.

$\textsc{Erdős-Ko-Rado}\in\textsf{PPP}$.

Proof.

This proof is essentially the same as the proof of Lemma 4.5, with two minor differences. The first one is that in the instance of Pigeon we create, there might be preimages of 0. These solutions to Pigeon correspond to solutions of type iv) for Erdős-Ko-Rado. The second difference is that we only perform the reduction on the first $\binom{2n-1}{n-1}$ inputs, and then map the others in such a way that they neither create a collision nor result in a preimage of 0.

Formally, suppose that we have an instance of Erdős-Ko-Rado, i.e., a circuit $C:\{0,1\}^{\alpha-1}\rightarrow\{0,1\}^{2n}$. We proceed to construct an instance $C^{\prime}:\{0,1\}^{\alpha-1}\rightarrow\{0,1\}^{\alpha-1}$ of Pigeon as follows:

\[
C^{\prime}(x)=\begin{cases}E_{\textsf{Cover}}(C(x))&\text{if $C(x)_{1}=0$ and $x<\binom{2n-1}{n-1}$}\\ E_{\textsf{Cover}}(\overline{C(x)})&\text{if $C(x)_{1}=1$ and $x<\binom{2n-1}{n-1}$}\\ x&\text{if $x\geq\binom{2n-1}{n-1}$}\end{cases}
\]

In the case $x<\binom{2n-1}{n-1}$, since we only encode sets whose first bit is a 0, by Remark 3.3, the first bit of the encoding is always a 0, so we can consider only the last $\left\lceil\log\binom{2n}{n}\right\rceil-1=\alpha-1$ bits of $C^{\prime}(x)$ for every such $x$. Furthermore, if we consider the output of $E_{\textsf{Cover}}$ as an integer, we get that this integer is $<\binom{2n-1}{n-1}$ (because the set we encode is one of the first $\binom{2n-1}{n-1}$ in the lexicographic order). Note also that if for some $x$ such that $x<\binom{2n-1}{n-1}$, $C(x)$ does not have size $n$, then $C^{\prime}(x)$ is still well-defined and less than $\binom{2n-1}{n-1}$, even if it is meaningless.

Now, suppose that we have a solution to $C^{\prime}$ of the form $x\neq y$ such that $C^{\prime}(x)=C^{\prime}(y)$. Again there are four cases to consider, depending on the first bits of $C(x),C(y)$. If $C(x)_{1}=C(y)_{1}=0$, then $E_{\textsf{Cover}}(C(x))=C^{\prime}(x)=C^{\prime}(y)=E_{\textsf{Cover}}(C(y))$. If both $C(x)$ and $C(y)$ have size $n$, then by injectivity of $E_{\textsf{Cover}}$ on inputs of size $n$ (see Lemma 3.2), we get $C(x)=C(y)$, which is a solution to Erdős-Ko-Rado. If one of them does not have size $n$, we also get a solution to Erdős-Ko-Rado. The other cases are similar.

Now, suppose that we have a solution to $C^{\prime}$ of the form $x$ such that $C^{\prime}(x)=0^{\alpha-1}$. As before, we get that $x<\binom{2n-1}{n-1}$. If $C(x)$ does not have size $n$, then $x$ is a solution. Now, suppose that $C(x)$ has size $n$. There are two cases to consider, depending on the first bit of $C(x)$. If the first bit of $C(x)$ is 0, then $E_{\textsf{Cover}}(C(x))=0^{\alpha}$, so $C(x)=0^{n}\mathbin{\|}1^{n}$ by Eq. 1 and Lemma 3.2. Thus, $C(x)=\overline{[n]}$. Instead, if the first bit of $C(x)$ is 1, then $E_{\textsf{Cover}}(\overline{C(x)})=0^{\alpha}$, so $\overline{C(x)}=\overline{[n]}$ and thus $C(x)=[n]$. In either case, we get a solution to our original problem. ∎

Remark 4.13.

Like previously, the idea behind that proof is to compose our instance of Erdős-Ko-Rado with the property-preserving encoding we defined in Remark 4.6. However, this time it is not only the collisions that are of interest to us, but also the preimages of the 0 string.

4.1 A Generalized Erdős-Ko-Rado Problem

For the previous problems, we only considered a very restricted version of the Erdős-Ko-Rado theorem, namely for an intersecting family of $n$-subsets of $[2n]$. We now consider a more general version for intersecting families of $n$-subsets of $[kn]$ for some $k>2$.

We now fix some $k>2$ for the rest of this section. The Erdős-Ko-Rado theorem states that if $\mathcal{F}$ is an intersecting family where each set has $n$ elements on a universe of size $kn$, then $\mathcal{F}$ contains at most $\binom{kn-1}{n-1}$ sets. Then, we can define the following TFNP problem, very similar to weak-Erdős-Ko-Rado.

Definition 4.14 (weak-general-Erdős-Ko-Rado$_{k}$).

The problem weak-general-Erdős-Ko-Rado$_{k}$ is defined by the relation

Instance:

A Boolean circuit $C\colon\{0,1\}^{\left\lceil\log\binom{kn-1}{n-1}\right\rceil+1}\to\{0,1\}^{kn}$.

Solution:

One of the following:

  i) $x$ s.t. $|C(x)|\neq n$,

  ii) $x\neq y$ s.t. $C(x)=C(y)$,

  iii) $x,y$ s.t. $C(x)\cap C(y)=\emptyset$.

Proposition 4.15.

weak-general-Erdős-Ko-Rado$_{k}$ is PWPP-hard.

Proof.

This proof is very similar to the proof of Lemma 4.7, except that instead of working with nn-subsets of [2n][2n], we work with nn-subsets of [kn][kn]. There is also a technical change, which is that this time we work with nn-subsets of [kn][kn] that do contain the element 1. This is necessary to make sure that we have an intersecting family, but it adds some more technicality. For the same reason, we need AA to shrink more than in the previous proof. However, the idea behind the proof is exactly the same, with the same use of the graph-hash product on a large intersecting family.

Formally, let C:{0,1}m{0,1}m1C^{\prime}:\{0,1\}^{m}\rightarrow\{0,1\}^{m-1} be an instance of weak-Pigeon. Let nn be the minimal integer such that 2m+1(knn)2^{m+1}\leq\binom{kn}{n}. Now, let α=log(knn)\alpha=\left\lceil\log\binom{kn}{n}\right\rceil. Then, m+1αm+1\leq\alpha. We also define a=log(k)a=\left\lceil\log(k)\right\rceil. By definition of α\alpha, we have (knn)2α1\binom{kn}{n}\geq 2^{\alpha-1}. We also have 1k12a\frac{1}{k}\geq\frac{1}{2^{a}}, so (kn1n1)=1k(knn)2α1k2α1a\binom{kn-1}{n-1}=\frac{1}{k}\binom{kn}{n}\geq\frac{2^{\alpha-1}}{k}\geq 2^{\alpha-1-a}. Like in the proof of Lemma 6.5, we can build a circuit A:{0,1}α{0,1}α1aA^{\prime}:\{0,1\}^{\alpha}\rightarrow\{0,1\}^{\alpha-1-a} whose size is polynomial in mm and such that from any collision in AA^{\prime} we can efficiently find a collision in CC^{\prime}. Let s{0,1}αs\in\{0,1\}^{\alpha} be the binary encoding on α\alpha bits of (knn)(kn1n1)\binom{kn}{n}-\binom{kn-1}{n-1}. We use the Cover encoding functions for nn-subsets of [kn][kn]: ECovern,kn:{0,1}kn{0,1}αE_{\textsf{Cover}}^{n,kn}:\{0,1\}^{kn}\rightarrow\{0,1\}^{\alpha} and DCovern,kn:{0,1}α{0,1}knD_{\textsf{Cover}}^{n,kn}:\{0,1\}^{\alpha}\rightarrow\{0,1\}^{kn}.

We define $C:\{0,1\}^{\alpha}\rightarrow\{0,1\}^{kn}$ by $C(x)=D_{\textsf{Cover}}^{n,kn}(s\oplus 0^{a+1}\mathbin{\|}A'(x))$. For every $x$, the string $(0^{a+1}\mathbin{\|}A'(x))$ is one of the first $2^{\alpha-1-a}$ elements of $\{0,1\}^{\alpha}$ in the lexicographic order, hence it is one of the first $\binom{kn-1}{n-1}$. Thus, the rank of $s\oplus 0^{a+1}\mathbin{\|}A'(x)$ in the lexicographic order is between $\binom{kn}{n}-\binom{kn-1}{n-1}$ and $\binom{kn}{n}-1$, counting from 0. The last $\binom{kn-1}{n-1}$ $n$-subsets of $[kn]$ in the lexicographic order are exactly the subsets that contain the element 1. Hence, for every $x$, the set $D_{\textsf{Cover}}^{n,kn}(s\oplus 0^{a+1}\mathbin{\|}A'(x))$ is an $n$-subset of $[kn]$ which contains the element 1. We observe that $C$ defines an instance of weak-general-Erdős-Ko-Rado$_k$.

Now, suppose that we have a solution to this instance. We consider each solution type separately.

  1. i)

    It cannot be $x$ such that $|C(x)|\neq n$, because $C(x)=D_{\textsf{Cover}}^{n,kn}(s\oplus 0^{a+1}\mathbin{\|}A'(x))$ is an $n$-subset of $[kn]$.

  2. ii)

    If it is $x\neq y$ such that $C(x)=C(y)$, then by injectivity of $D_{\textsf{Cover}}^{n,kn}$ on its first $\binom{kn}{n}$ inputs (see Lemma 3.2), we get that $(s\oplus 0^{a+1}\mathbin{\|}A'(x))=(s\oplus 0^{a+1}\mathbin{\|}A'(y))$, hence $A'(x)=A'(y)$, and from there we can retrieve a collision for $C'$.

  3. iii)

    It cannot be $x,y$ such that $C(x)\cap C(y)=\emptyset$: by the above, $1\in C(x)$ and $1\in C(y)$, so $1\in C(x)\cap C(y)$, which is a contradiction.∎
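The proof relies on the fact that, among the $n$-subsets of $[kn]$ listed in lexicographic order, the last $\binom{kn-1}{n-1}$ are exactly those containing the element 1. The following is a minimal brute-force check of that fact for small parameters. It assumes that the order in question is the lexicographic order on characteristic vectors, with the first bit standing for the element 1; the function names are ours and are not part of the Cover encoding of Section 3.

```python
from itertools import combinations
from math import comb


def characteristic_vectors(n, k):
    """All n-subsets of [kn] as 0/1 tuples, sorted lexicographically."""
    m = k * n
    vectors = []
    for subset in combinations(range(m), n):
        vec = [0] * m
        for element in subset:
            vec[element] = 1  # position 0 stands for the element 1
        vectors.append(tuple(vec))
    return sorted(vectors)


def star_is_final_block(n, k):
    """Check that the subsets containing element 1 form the last block."""
    vectors = characteristic_vectors(n, k)
    total, star = comb(k * n, n), comb(k * n - 1, n - 1)
    assert total == k * star  # the counting identity used in the proof
    head, tail = vectors[: total - star], vectors[total - star:]
    last_is_first_n = vectors[-1] == (1,) * n + (0,) * (k * n - n)
    return (all(v[0] == 0 for v in head)
            and all(v[0] == 1 for v in tail)
            and last_is_first_n)
```

The last condition also checks that the final vector in this order is the characteristic vector of $[n]$, which is the fact used for solutions of type iv) in the tight version of the problem.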

To prove that weak-general-Erdős-Ko-Rado$_k$ $\in\textsf{PWPP}$, we present some useful definitions and results related to the Erdős-Ko-Rado theorem.

Definition 4.16.

If $k$ divides $m$, a $(k,m)$-parallel class is a set of $m/k$ $k$-subsets of $[m]$ which partition $[m]$.

Classical Theorem 2 (Baranyai, [Bar73]).

If $k$ divides $m$, we can define $\binom{m-1}{k-1}$ $(k,m)$-parallel classes $\mathcal{A}_{1},\ldots,\mathcal{A}_{\binom{m-1}{k-1}}$ such that each $k$-subset of $[m]$ appears in exactly one $\mathcal{A}_{i}$.

Remark 4.17.

Note that this result proves the Erdős-Ko-Rado theorem in the case where the size of the subsets divides the size of the universe.
Note also that, up to renaming the elements, we can assume (in the setting of $n$-subsets of $[kn]$) that $\mathcal{A}_{1}$ consists exactly of the sets $\{1,2,\ldots,n\},\{n+1,n+2,\ldots,2n\},\ldots$, and $\{(k-1)n+1,(k-1)n+2,\ldots,kn\}$.

However, all known proofs of this theorem are inefficient, in the sense that there is no known way to define $\mathcal{A}_{1},\ldots,\mathcal{A}_{\binom{m-1}{k-1}}$ such that, given a $k$-subset of $[m]$, we can find in polynomial time the unique $i$ such that this subset appears in $\mathcal{A}_{i}$. We make this assumption explicit.

Assumption 4.18 (efficient Baranyai assumption).

There is an efficient procedure to define $\mathcal{A}_{1},\ldots,\mathcal{A}_{\binom{m-1}{k-1}}$ and a circuit $Bar:\{0,1\}^{m}\rightarrow[\binom{m-1}{k-1}]$ which takes as input a $k$-subset of $[m]$ and returns the unique index $i$ such that this subset appears in $\mathcal{A}_{i}$. Furthermore, when the assumption is applied to $n$-subsets of $[kn]$, we assume that $\mathcal{A}_{1}$ consists exactly of the sets $\{1,2,\ldots,n\},\{n+1,n+2,\ldots,2n\},\ldots$, and $\{(k-1)n+1,(k-1)n+2,\ldots,kn\}$.

Proposition 4.19.

Under Assumption 4.18, weak-general-Erdős-Ko-Rado$_k$ $\in\textsf{PWPP}$.

Proof.

At a high level, the proof goes as follows. We are given strictly more than $\binom{kn-1}{n-1}$ subsets of $[kn]$. We map them to elements of $[\binom{kn-1}{n-1}]$ in the following way. If a set does not have size $n$, we map it anywhere. If it has size $n$, we map it to the unique $i$ such that the set is in $\mathcal{A}_{i}$. This defines an instance of weak-Pigeon. In any collision for this instance, we must have either a set that does not have size $n$, or two sets in the same parallel class, which means that either they are equal, or they do not intersect.

Formally, by assumption, we have a circuit $Bar:\{0,1\}^{kn}\rightarrow[\binom{kn-1}{n-1}]$ which takes as input an $n$-subset of $[kn]$ and returns the unique index $i$ such that this subset appears in $\mathcal{A}_{i}$. We define a circuit $Bar':\{0,1\}^{kn}\rightarrow\{0,1\}^{\left\lceil\log\binom{kn-1}{n-1}\right\rceil}$ which takes as input an $n$-subset of $[kn]$ and returns the binary encoding on $\left\lceil\log\binom{kn-1}{n-1}\right\rceil$ bits of the unique index $i$ such that this subset appears in $\mathcal{A}_{i}$. Now, suppose that we have an instance $C:\{0,1\}^{\left\lceil\log\binom{kn-1}{n-1}\right\rceil+1}\rightarrow\{0,1\}^{kn}$ of weak-general-Erdős-Ko-Rado$_k$. We set $C'=Bar'\circ C$. Then, we have $C':\{0,1\}^{\left\lceil\log\binom{kn-1}{n-1}\right\rceil+1}\rightarrow\{0,1\}^{\left\lceil\log\binom{kn-1}{n-1}\right\rceil}$, so $C'$ is an instance of weak-Pigeon.

Now, suppose that we have a solution to this instance of weak-Pigeon, that is, $x\neq y\in\{0,1\}^{\left\lceil\log\binom{kn-1}{n-1}\right\rceil+1}$ such that $C'(x)=C'(y)$. Then, $Bar'(C(x))=Bar'(C(y))$. If one of $C(x),C(y)$ does not have size $n$, we have a solution to our instance of weak-general-Erdős-Ko-Rado$_k$, and similarly if $C(x)=C(y)$. Otherwise, $C(x),C(y)$ are distinct $n$-subsets of $[kn]$ that appear in the same $(n,kn)$-parallel class. By definition of a parallel class, these two sets are part of a partition of $[kn]$, hence they do not intersect and they form a solution to our original instance of weak-general-Erdős-Ko-Rado$_k$. ∎
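For intuition, the reduction can be run in the one special case where an efficient $Bar$ is classically available: $n=2$, where parallel classes are perfect matchings and the round-robin schedule gives an explicit 1-factorization of the complete graph. The sketch below is only an illustration under that restriction and says nothing about Assumption 4.18 in general. It uses the ground set $\{0,\ldots,m-1\}$ with $m$ even instead of $[kn]$, it does not reproduce the normalization of $\mathcal{A}_{1}$, it ignores sets of the wrong size, and the names `bar_pairs` and `find_solution` are ours.

```python
def bar_pairs(pair, m):
    """Index in {0, ..., m-2} of the perfect matching containing `pair`.

    Round-robin 1-factorization of the complete graph on {0, ..., m-1},
    m even: vertex m-1 is the fixed player, the others are residues
    modulo the odd number q = m-1.
    """
    i, j = sorted(pair)
    q = m - 1
    if j == m - 1:
        return (2 * i) % q
    return (i + j) % q


def find_solution(family, m):
    """Return indices of two pairs in the same class, or None.

    Two distinct pairs in the same class are disjoint, so the returned
    indices point at equal or disjoint sets. A family with more than
    m-1 pairs always yields a result by the pigeonhole principle.
    """
    seen = {}
    for index, pair in enumerate(family):
        cls = bar_pairs(pair, m)
        if cls in seen:
            return seen[cls], index
        seen[cls] = index
    return None
```

For example, with $m=8$ the seven pairs containing the vertex 0 fall into seven different classes, and adding any eighth pair forces a collision.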

Remark 4.20.

Let $\mathcal{X}$ be the set of $n$-subsets of $[kn]$. We define an equivalence relation $\sim$ on $\mathcal{X}$ by saying that two $n$-subsets $X$ and $Y$ of $[kn]$ are equivalent if and only if $Bar(X)=Bar(Y)$, meaning that they are in the same $(n,kn)$-parallel class in the partition induced by $Bar$.
Then, $Bar$ is a property-preserving encoding for $\sim$ on $\mathcal{X}$.
Note that two equivalent subsets are either equal or disjoint. Hence, the property that is preserved by $Bar$ is such that if two of its inputs collide, they form a solution to our problem.
Then, to prove the inclusion of weak-general-Erdős-Ko-Rado$_k$ in PWPP, it suffices to compose our instance of weak-general-Erdős-Ko-Rado$_k$ with $Bar$.

The previous two propositions establish the following result.

Theorem 4.21.

Under Assumption 4.18, weak-general-Erdős-Ko-Rado$_k$ is PWPP-complete.

PPP-completeness using the tight bound

As in the case of $n$-subsets of $[2n]$, we can define a “tight” version of the previous problem, which is very similar to Erdős-Ko-Rado.

Definition 4.22 (general-Erdős-Ko-Rado$_k$).

The problem general-Erdős-Ko-Rado$_k$ is defined by the relation

Instance:

A Boolean circuit $C\colon\{0,1\}^{\left\lceil\log\binom{kn-1}{n-1}\right\rceil}\to\{0,1\}^{kn}$.

Solution:

One of the following:

  1. i)

    $x$ s.t. $|C(x)|\neq n$ and $x<\binom{kn-1}{n-1}$,

  2. ii)

    $x\neq y$ s.t. $C(x)=C(y)$ and $x,y<\binom{kn-1}{n-1}$,

  3. iii)

    $x,y$ s.t. $C(x)\cap C(y)=\emptyset$ and $x,y<\binom{kn-1}{n-1}$,

  4. iv)

    $x$ s.t. $C(x)=\{1,2,\ldots,n\}$ or $\{n+1,n+2,\ldots,2n\}$, or …, or $\{(k-1)n+1,(k-1)n+2,\ldots,kn\}$ and $x<\binom{kn-1}{n-1}$.

First, let us see why this problem is total. Suppose that we have a list of $\binom{kn-1}{n-1}$ subsets of $[kn]$. If one of the sets does not have $n$ elements, if two of the sets are equal, or if two of the sets do not intersect, we have a solution. Otherwise, we have an intersecting family of $\binom{kn-1}{n-1}$ distinct $n$-subsets of $[kn]$.
Consider a collection of $(n,kn)$-parallel classes $\mathcal{A}_{1},\ldots,\mathcal{A}_{\binom{kn-1}{n-1}}$ such that each $n$-subset of $[kn]$ appears in exactly one $\mathcal{A}_{i}$ (which exists by Classical Theorem 2). Up to renaming the elements, we can assume that $\mathcal{A}_{1}$ is composed of the $k$ $n$-subsets $\{1,2,\ldots,n\}$, $\{n+1,n+2,\ldots,2n\}$, … and $\{(k-1)n+1,(k-1)n+2,\ldots,kn\}$.
Since we have an intersecting family of distinct subsets, no two subsets can be in the same $\mathcal{A}_{i}$, and we have as many subsets as $\mathcal{A}_{i}$'s, which means that one of the subsets is in $\mathcal{A}_{1}$, hence it is one of the particular subsets we are looking for. This proves that general-Erdős-Ko-Rado$_k$ $\in\textsf{TFNP}$. The problem is moreover PPP-hard, as stated in Proposition 4.23.
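The totality argument can be sanity-checked by exhaustive search for very small parameters. The sketch below enumerates every family of $\binom{kn-1}{n-1}$ distinct $n$-subsets of $[kn]$, keeps the intersecting ones, and looks for one that avoids all $k$ blocks $\{1,\ldots,n\},\ldots,\{(k-1)n+1,\ldots,kn\}$. It is a brute-force illustration only (the running time is exponential), and the function name is ours.

```python
from itertools import combinations
from math import comb


def counterexample_to_totality(n, k):
    """Search for an intersecting family of the tight size avoiding every block.

    Returns such a family if one exists and None otherwise; the totality
    argument predicts None.
    """
    universe = range(1, k * n + 1)
    subsets = [frozenset(s) for s in combinations(universe, n)]
    blocks = {frozenset(range(j * n + 1, (j + 1) * n + 1)) for j in range(k)}
    size = comb(k * n - 1, n - 1)
    for family in combinations(subsets, size):
        intersecting = all(a & b for a, b in combinations(family, 2))
        if intersecting and not blocks.intersection(family):
            return family
    return None
```

Families drawn by `combinations` consist of distinct sets, so only the intersection condition and the block condition need to be tested.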

Proposition 4.23.

general-Erdős-Ko-Rado$_k$ is PPP-hard.

Proof.

Informally, this proof is very much like the proof of Proposition 4.15, with the same technicalities as in the proof of Lemma 4.10. The idea is again to interpret the outputs of an instance of Pigeon as indices into the collection of all the $n$-subsets of $[kn]$ which contain the element 1. Solutions of type $iv)$ correspond to preimages of 0. As for Lemma 4.10, we need to define $A$ so that all solutions to our instance of general-Erdős-Ko-Rado$_k$ indeed come from the instance of Pigeon.

Formally, let $C':\{0,1\}^{m}\rightarrow\{0,1\}^{m}$ be an instance of Pigeon, and let $n$ be the minimal integer such that $2^{m}\leq\binom{kn-1}{n-1}$. We set $\alpha=\left\lceil\log\binom{kn}{n}\right\rceil$ and $\beta=\left\lceil\log\binom{kn-1}{n-1}\right\rceil+1$. Then, $\beta-1\geq m$. Define $A:\{0,1\}^{\beta-1}\rightarrow\{0,1\}^{\beta-1}$ by

\[A(x)=\begin{cases}C'(x)&\text{if $x<2^{m}$}\\ x&\text{if $x\geq 2^{m}$}\end{cases}\]

It might be the case that the output of $A$ has fewer than $\beta-1$ bits, in which case we pad it with 0's on the left to make it a $(\beta-1)$-bit string. Let $s\in\{0,1\}^{\alpha}$ be the binary encoding on $\alpha$ bits of $\binom{kn}{n}-1$. Recall that we have $E_{\textsf{Cover}}^{n,kn}:\{0,1\}^{kn}\rightarrow\{0,1\}^{\alpha}$ and $D_{\textsf{Cover}}^{n,kn}:\{0,1\}^{\alpha}\rightarrow\{0,1\}^{kn}$.

We proceed to build an instance $C:\{0,1\}^{\beta-1}\rightarrow\{0,1\}^{kn}$ of general-Erdős-Ko-Rado$_k$ by setting $C(x)=D_{\textsf{Cover}}^{n,kn}(s-0^{\alpha+1-\beta}\mathbin{\|}A(x))$, where $-$ denotes subtraction in binary (mod $2^{\alpha}$). For every $x<\binom{kn-1}{n-1}$, the string $(0^{\alpha+1-\beta}\mathbin{\|}A(x))$ is one of the first $\binom{kn-1}{n-1}$ elements of $\{0,1\}^{\alpha}$ in the lexicographic order. Thus, the rank of $s-0^{\alpha+1-\beta}\mathbin{\|}A(x)$ in the lexicographic order is between $\binom{kn}{n}-\binom{kn-1}{n-1}$ and $\binom{kn}{n}-1$, counting from 0. The last $\binom{kn-1}{n-1}$ $n$-subsets of $[kn]$ in the lexicographic order are exactly the subsets that contain the element 1. Hence, for every $x<\binom{kn-1}{n-1}$, the set $D_{\textsf{Cover}}^{n,kn}(s-0^{\alpha+1-\beta}\mathbin{\|}A(x))$ is an $n$-subset of $[kn]$ which contains the element 1. We observe that $C$ defines an instance of general-Erdős-Ko-Rado$_k$.

Now, suppose that we have a solution to this instance. We consider each solution type separately.

  1. i)

    It cannot be $x$ such that $|C(x)|\neq n$, because $C(x)=D_{\textsf{Cover}}^{n,kn}(s-0^{\alpha+1-\beta}\mathbin{\|}A(x))$ is an $n$-subset of $[kn]$.

  2. ii)

    If it is $x\neq y$ such that $C(x)=C(y)$, then by injectivity of $D_{\textsf{Cover}}^{n,kn}$ on its first $\binom{kn}{n}$ inputs (see Lemma 3.2), we get that $(s-0^{\alpha+1-\beta}\mathbin{\|}A(x))=(s-0^{\alpha+1-\beta}\mathbin{\|}A(y))$, hence $A(x)=A(y)$, and from there we can retrieve a collision for $C'$ by design of $A$.

  3. iii)

    It cannot be $x,y$ such that $C(x)\cap C(y)=\emptyset$: by the above, $1\in C(x)$ and $1\in C(y)$, so $1\in C(x)\cap C(y)$, which is a contradiction.

  4. iv)

    If it is $x$ such that $C(x)$ is one of the $k$ particular subsets we are looking for, then since $1\in C(x)$, it must be that $C(x)=[n]$. Among the $n$-subsets of $[kn]$, the characteristic vector of $[n]$ is the last one in the lexicographic order, which means that $[n]=D_{\textsf{Cover}}^{n,kn}(s)$. Furthermore, $[n]=C(x)=D_{\textsf{Cover}}^{n,kn}(s-0^{\alpha+1-\beta}\mathbin{\|}A(x))$, the rank of $s-0^{\alpha+1-\beta}\mathbin{\|}A(x)$ in the lexicographic order is between $\binom{kn}{n}-\binom{kn-1}{n-1}$ and $\binom{kn}{n}-1$ (counting from 0), and $D_{\textsf{Cover}}^{n,kn}$ is injective on its first $\binom{kn}{n}$ inputs. Thus, $s-0^{\alpha+1-\beta}\mathbin{\|}A(x)=s$, which implies that $A(x)=0$. By definition of $A$, this can only mean that $C'(x)=0^{m}$.

In every possible case, we get a solution to our original problem. ∎
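The role of $A$ in the proof is to extend the Pigeon instance $C'$ to a larger domain without creating new solutions. A minimal sketch of that padding step, with integers standing for bit strings, is given below; the names `extend_with_identity` and `pigeon_solutions` are ours, and the brute-force solver is only meant for tiny examples.

```python
def extend_with_identity(c_prime, m, size):
    """Return A on {0, ..., size-1}: C' below 2^m, the identity above.

    Assumes c_prime maps {0, ..., 2^m - 1} to itself and size >= 2^m.
    """
    assert size >= 2 ** m

    def A(x):
        assert 0 <= x < size
        return c_prime(x) if x < 2 ** m else x

    return A


def pigeon_solutions(f, size):
    """Brute-force all Pigeon solutions of f on {0, ..., size-1}.

    Returns the preimages of 0 and the colliding pairs (x, y) with x < y.
    """
    zeros = [x for x in range(size) if f(x) == 0]
    collisions = [(x, y)
                  for x in range(size)
                  for y in range(x + 1, size)
                  if f(x) == f(y)]
    return zeros, collisions
```

Inputs at or above $2^{m}$ are fixed points larger than every output of $C'$, so they can neither collide with anything nor map to 0; hence the solutions of $A$ coincide with those of $C'$.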

Proposition 4.24.

Under Assumption 4.18, general-Erdős-Ko-Rado$_k$ $\in\textsf{PPP}$.

Proof.

This proof closely resembles the proof of Proposition 4.19. The idea is the same: we are given $\binom{kn-1}{n-1}$ subsets of $[kn]$. We map each of them to an element of $[\binom{kn-1}{n-1}]$ as follows. If a set does not have $n$ elements, we map it anywhere, and if it has $n$ elements, we map it to the unique $i$ such that this set is in $\mathcal{A}_{i}$. This defines an instance of Pigeon. If we have a collision, it results in a solution like before. If we have a preimage of 0, it is a set in $\mathcal{A}_{1}$, which means it is one of the sets we are looking for. The definition of $C'$ involves some technicalities, since we need to take care of the last inputs to make sure that they are neither involved in a collision nor result in a preimage of 0.

More formally, we have by assumption a circuit $Bar:\{0,1\}^{kn}\rightarrow[\binom{kn-1}{n-1}]$ which takes as input an $n$-subset of $[kn]$ and returns the unique index $i$ such that this subset appears in $\mathcal{A}_{i}$. We define a circuit $Bar':\{0,1\}^{kn}\rightarrow\{0,1\}^{\left\lceil\log\binom{kn-1}{n-1}\right\rceil}$ which takes as input an $n$-subset of $[kn]$ and returns the binary encoding on $\left\lceil\log\binom{kn-1}{n-1}\right\rceil$ bits of $i-1$, where $i$ is the unique index such that this subset appears in $\mathcal{A}_{i}$.
Now, suppose that we have an instance $C:\{0,1\}^{\left\lceil\log\binom{kn-1}{n-1}\right\rceil}\rightarrow\{0,1\}^{kn}$ of general-Erdős-Ko-Rado$_k$.
We set

\[C'(x)=\begin{cases}Bar'\circ C(x)&\text{if $x<\binom{kn-1}{n-1}$}\\ x&\text{if $x\geq\binom{kn-1}{n-1}$}\end{cases}\]

Then, we have $C':\{0,1\}^{\left\lceil\log\binom{kn-1}{n-1}\right\rceil}\rightarrow\{0,1\}^{\left\lceil\log\binom{kn-1}{n-1}\right\rceil}$, so $C'$ is an instance of Pigeon.
Now, suppose that we have a solution to this instance of Pigeon. There are two cases to consider.

  1. 1.

    It is $x\neq y\in\{0,1\}^{\left\lceil\log\binom{kn-1}{n-1}\right\rceil}$ such that $C'(x)=C'(y)$. By construction of $C'$ (and by definition of $Bar'$), this means that $x,y<\binom{kn-1}{n-1}$. We have $Bar'(C(x))=Bar'(C(y))$. If one of $C(x),C(y)$ does not have size $n$, we have a solution to our instance of general-Erdős-Ko-Rado$_k$, and similarly if $C(x)=C(y)$. Otherwise, $C(x),C(y)$ are distinct $n$-subsets of $[kn]$ that appear in the same $(n,kn)$-parallel class. By definition of a parallel class, these two sets are part of a partition of $[kn]$, hence they do not intersect and they form a solution to our original instance of general-Erdős-Ko-Rado$_k$.

  2. 2.

    It is $x$ such that $C'(x)=0^{\left\lceil\log\binom{kn-1}{n-1}\right\rceil}$. By construction of $C'$, it means that $x<\binom{kn-1}{n-1}$. We have $Bar'(C(x))=0^{\left\lceil\log\binom{kn-1}{n-1}\right\rceil}$. If $C(x)$ does not have size $n$, it is a solution to our original instance. If it has size $n$, it is an $n$-subset of $[kn]$ which is in $\mathcal{A}_{1}$. By assumption, the only such subsets are the particular ones we are looking for. Hence, $x$ is a solution to our original instance of general-Erdős-Ko-Rado$_k$.∎

Remark 4.25.

As before, the idea behind that proof is to compose our instance of general-Erdős-Ko-Rado$_k$ with the property-preserving encoding $Bar$. However, this time it is not only the collisions that are of interest to us, but also the preimages of the 0 string.

The previous two propositions establish the following result.

Theorem 4.26.

Under Assumption 4.18, general-Erdős-Ko-Rado$_k$ is PPP-complete.

5 Sperner’s Theorem on Largest Antichains

We now turn our attention to a different existence theorem from extremal combinatorics, concerning antichains. We say a family of sets $\mathcal{F}\subseteq 2^{\Omega}$ is an antichain if for every $A\neq B\in\mathcal{F}$, it holds that $A\not\subseteq B$. A well-known theorem by Sperner gives a characterization of the largest antichain. As before, for an appropriate input size, this induces a total search problem of finding two distinct sets $A,B$ for which $A\subseteq B$. As in the previous section, we consider both a weak and a strong version, and prove the weak version to be PWPP-complete, and the strong one PPP-complete.

Classical Theorem 3 (Sperner [Spe28]).

The largest antichain on any universe of $2n$ elements is unique and consists of all subsets of size $n$.

Like before, we consider an implicit representation of the collection of subsets via a circuit CC whose input corresponds to an index into the collection, and whose output is the characteristic vector of the corresponding set.

Definition 5.1 (weak-Sperner-Antichain).

The problem weak-Sperner-Antichain is defined by the relation

Instance:

A Boolean circuit $C\colon\{0,1\}^{\left\lceil\log\binom{2n}{n}\right\rceil+1}\to\{0,1\}^{2n}$.

Solution:

$x\neq y$ s.t. $C(x)\subseteq C(y)$.

Theorem 5.2.

weak-Sperner-Antichain is PWPP-complete.

For the rest of this section, we set $\alpha=\left\lceil\log\binom{2n}{n}\right\rceil=\left\lceil\log\binom{2n-1}{n-1}\right\rceil+1$.

Lemma 5.3.

weak-Sperner-Antichain is PWPP-hard.

Proof.

We explain the reduction at a high level. We reduce from weak-Erdős-Ko-Rado and create an instance of weak-Sperner-Antichain by including each set from the weak-Erdős-Ko-Rado instance, as well as its complement. If we find a solution to weak-Sperner-Antichain, one of the sets must be contained within another. If one of the two sets does not have size nn, we obtain a solution to weak-Erdős-Ko-Rado of type i). Otherwise, the duplicated sets must be equal, and hence the original sets are either equal, or one of the sets is the complement of the other.

Formally, suppose that we have an instance $C:\{0,1\}^{\alpha}\rightarrow\{0,1\}^{2n}$ of weak-Erdős-Ko-Rado. Write $x=yb$ where $b$ is a bit. We build an instance $C':\{0,1\}^{\alpha+1}\rightarrow\{0,1\}^{2n}$ of weak-Sperner-Antichain as follows.

\[C'(x)=\begin{cases}C(y)&\text{if $b=0$}\\ \overline{C(y)}&\text{if $b=1$}\end{cases}\]

Now, suppose that we have a solution to this instance of weak-Sperner-Antichain, that is, $x\neq x'$ such that $C'(x)\subseteq C'(x')$. Write $x=yb$ and $x'=y'b'$. There are four cases to consider. If $b=b'=0$, then $y\neq y'$ and $C(y)=C'(x)\subseteq C'(x')=C(y')$. If $C(y)$ and $C(y')$ both have size $n$, then $C(y)=C(y')$, and if this is not the case we get a solution of type i) for $C$. In both cases, we get a solution for weak-Erdős-Ko-Rado. The other cases are similar; in all four cases, we get a solution to our original problem, so weak-Sperner-Antichain is PWPP-hard. ∎
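The four cases of the proof can be made concrete with a small sketch that builds the doubled instance and maps a solution back. It assumes that weak-Erdős-Ko-Rado has the same three solution types as weak-general-Erdős-Ko-Rado$_k$ with $k=2$ (wrong size, equal sets, disjoint sets); sets are modelled as Python `frozenset`s, indices as integers whose lowest bit plays the role of $b$, and all function names are ours.

```python
def doubled_instance(C, n):
    """Build C' from C: even indices give C(y), odd indices its complement."""
    universe = frozenset(range(1, 2 * n + 1))

    def C_prime(x):
        y, b = divmod(x, 2)
        return C(y) if b == 0 else universe - C(y)

    return C_prime


def pull_back(C, n, x, x_prime):
    """Map x != x' with C'(x) a subset of C'(x') to a solution for C.

    Returns ("size", y), ("equal", y, y') or ("disjoint", y, y').
    """
    (y, b), (y2, b2) = divmod(x, 2), divmod(x_prime, 2)
    for index in (y, y2):
        if len(C(index)) != n:
            return ("size", index)
    # Both sets have size n, so the inclusion C'(x) <= C'(x') is an equality.
    if b == b2:
        return ("equal", y, y2)   # C(y) = C(y') with y != y'
    return ("disjoint", y, y2)    # C(y) is the complement of C(y')
```

When $b\neq b'$, the indices $y$ and $y'$ may not be assumed distinct a priori, but a set of size $n\geq 1$ never equals its own complement, so the disjoint pair is always a genuine one.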

Classical Theorem 4 (Dilworth’s Theorem, [Dil50]).

The size of the largest antichain in $(2^{[2n]},\subseteq)$ is equal to the size of the smallest chain partition, namely $\binom{2n}{n}$.

Lemma 5.4.

weak-Sperner-Antichain $\in\textsf{PWPP}$.

Proof.

We give a high-level overview of the reduction from weak-Sperner-Antichain to weak-Pigeon.

Fix an arbitrary partition into chains of $(2^{[2n]},\subseteq)$ of size $\binom{2n}{n}$ (which exists by Classical Theorems 3 and 4). Since we have more than $\binom{2n}{n}$ inputs in an instance of weak-Sperner-Antichain, by the pigeonhole principle, two distinct inputs must end up in the same chain. We want to give an identifier to each of these chains, using $\alpha$ bits, such that for any subset we are able to quickly find the identifier of the chain to which it belongs. To do so, in each chain, we choose as representative the $n$-subset of the chain, which is guaranteed to exist by Classical Theorem 4. Then, the identifier of the chain is the Cover encoding of this subset. To map a subset to the representative of its chain, we make use of Catalan factorizations (Section 3.4). Once we have this, from each subset we can efficiently get the $n$-subset in its chain and therefore the identifier of the chain. Finally, a collision in the identifiers is equivalent to two elements in the same chain, which means a solution for weak-Sperner-Antichain.

Formally, let $C:\{0,1\}^{\alpha+1}\rightarrow\{0,1\}^{2n}$ be an instance of weak-Sperner-Antichain. We proceed to construct an instance of weak-Pigeon as follows: if $x\in\{0,1\}^{\alpha+1}$, we have $X:=C(x)\in\{0,1\}^{2n}$, which represents a subset of $[2n]$. Let $(X',k)=E_{\textsf{Catalan}}(X)$ be the Catalan factorization of $X$, let $l$ be the number of $z$'s in $X'$, and let $m$ be the number of bits underlined during the construction of $X'$. Note that every time we underline bits we underline simultaneously a 0 and a 1, thus $m$ is even. Then, $l=2n-m$ is an even number. Now, let $S(x)=D_{\textsf{Catalan}}^{(l/2)}(X')$. Then, since $X'$ has the same number of 1's and 0's and since we replaced half of the $z$'s by 1's and the other half by 0's, we have that $S(x)$ represents an $n$-subset of $[2n]$. Informally, it is the $n$-subset of the chain that contains $X$, and replacing $z$'s by 1's enables us to move inside that chain. Finally, we set $C'(x)=E_{\textsf{Cover}}(S(x))\in\{0,1\}^{\alpha}$. We observe that $C'$ is an instance of weak-Pigeon.

Now suppose that we have a solution to this instance of weak-Pigeon, that is, $x\neq y$ such that $C'(x)=C'(y)$. Then, by injectivity of $E_{\textsf{Cover}}$ on the $n$-subsets of $[2n]$ (see Lemma 3.2), we get that $S(x)=S(y)$. Informally, this means that $C(x)$ and $C(y)$ belong to the same chain and thus that one is contained in the other. Let us now prove it formally. Let $(X',k)=E_{\textsf{Catalan}}(X)=E_{\textsf{Catalan}}(C(x))$ be the Catalan factorization of $X$ and $l$ be the number of $z$'s in $X'$, and let $(Y',k')=E_{\textsf{Catalan}}(Y)=E_{\textsf{Catalan}}(C(y))$. We have $S(x)=D_{\textsf{Catalan}}(X',l/2)$, so by Lemma 3.10, the Catalan string that corresponds to $S(x)$ is $X'$. Similarly, the Catalan string that corresponds to $S(y)$ is $Y'$. Since $S(x)=S(y)$, we get $X'=Y'$. We have that $X=D_{\textsf{Catalan}}(E_{\textsf{Catalan}}(X))$ and that $Y=D_{\textsf{Catalan}}(E_{\textsf{Catalan}}(Y))$ by Lemma 3.9, so $X=D_{\textsf{Catalan}}(X',k)$ and $Y=D_{\textsf{Catalan}}(Y',k')=D_{\textsf{Catalan}}(X',k')$. By symmetry of $x$ and $y$ we can assume that $k\leq k'$. Then, to go from $X'$ to $X$ we added $k$ elements (the ones corresponding to the last $k$ $z$'s in $X'$), while to go from $X'$ to $Y$ we added these same $k$ elements plus $k'-k$ others. Hence, $C(x)=X\subseteq Y=C(y)$. ∎
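The map from a subset to the $n$-subset of its chain can be illustrated with the classical bracket-matching construction of a symmetric chain decomposition. The sketch below is not the $E_{\textsf{Catalan}}$/$D_{\textsf{Catalan}}$ pair of Section 3.4: it assumes one particular convention (a 0 opens a bracket, a later 1 closes it, and unmatched positions play the role of the $z$'s), which may differ from the paper's in orientation, and the function name is ours.

```python
def chain_representative(bits):
    """Return the weight-n vector in the chain of `bits` (a 0/1 tuple of length 2n).

    Scan left to right, matching each 1 with the closest unmatched 0 before
    it. Matched positions are kept; the unmatched ("free") positions, which
    read 1...10...0, are rebalanced so that exactly half of them are 1.
    """
    stack = []      # positions of currently unmatched 0's
    free_ones = []  # positions of 1's that found no 0 to match
    for pos, bit in enumerate(bits):
        if bit == 0:
            stack.append(pos)
        elif stack:
            stack.pop()
        else:
            free_ones.append(pos)
    free = free_ones + stack  # increasing: unmatched 1's precede unmatched 0's
    half = len(free) // 2     # len(free) is even since matches come in pairs
    rep = list(bits)
    for index, pos in enumerate(free):
        rep[pos] = 1 if index < half else 0
    return tuple(rep)
```

Two vectors with the same representative differ only in how many of the leading free positions carry a 1, so one of the two sets contains the other; composing with any injective encoding of $n$-subsets then turns collisions into inclusions, as in the proof.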

Remark 5.5.

Consider the circuit $E:\{0,1\}^{2n}\rightarrow\{0,1\}^{\alpha}$, defined as follows. On input $X\in\{0,1\}^{2n}$, it computes the Catalan factorization $(X',k)$ of $X$ and the number $l$ of $z$'s in $X'$. Then, it computes $S(X)=D_{\textsf{Catalan}}^{(l/2)}(X')$ and finally returns $E_{\textsf{Cover}}(S(X))$.
Let $\mathcal{X}=2^{[2n]}$. We define an equivalence relation $\sim$ on $\mathcal{X}$ by saying that two subsets are equivalent if and only if they have the same Catalan string.
Then, we showed in the previous proof that $E$ is a property-preserving encoding for $\sim$ on $\mathcal{X}$. Note that we also showed that if we have two equivalent subsets, one is included in the other. Hence, the property that is preserved by $E$ is such that if two of its inputs collide, they form a solution to our problem.
Then, to prove the inclusion of weak-Sperner-Antichain in PWPP, it suffices to compose our instance of weak-Sperner-Antichain with $E$.

PPP-completeness using the tight bound

As with Erdős-Ko-Rado, we observe that the bound in Classical Theorem 3 is tight, and we know the unique antichain of size $\binom{2n}{n}$, so we have some structural information about any collection of size $\binom{2n}{n}$. From that strong theorem, employing the same technique as before, we modify the problem to let the circuit represent a collection of that exact size. By Classical Theorem 3, we observe that if $\mathcal{F}$ is an antichain with $|\mathcal{F}|=\binom{2n}{n}$, then $\mathcal{F}$ must contain $\overline{[n]}$. This leads us to define the following problem.

Definition 5.6 (Sperner-Antichain).

The problem Sperner-Antichain is defined by the relation

Instance:

A Boolean circuit $C\colon\{0,1\}^{\left\lceil\log\binom{2n}{n}\right\rceil}\to\{0,1\}^{2n}$.

Solution:

One of the following:

  1. i)

    $x\neq y$ s.t. $C(x)\subseteq C(y)$ and $x,y<\binom{2n}{n}$,

  2. ii)

    $x$ s.t. $C(x)=\overline{[n]}$ and $x<\binom{2n}{n}$.

Theorem 5.7.

Sperner-Antichain is PPP-complete.

Lemma 5.8.

Sperner-Antichain is PPP-hard.

Proof.

The proof is the same as for Lemma 5.3, by reduction from Erdős-Ko-Rado. Observe that if we have a solution of type ii) for Sperner-Antichain, the corresponding set in the Erdős-Ko-Rado instance is either $[n]$ or $\overline{[n]}$, which is one of the desired solutions to Erdős-Ko-Rado. ∎

Lemma 5.9.

Sperner-Antichain $\in\textsf{PPP}$.

Proof.

Informally, this proof is the same as the proof of Lemma 5.4, with some additional technical details. First, we need to take care of preimages of 0. The indices corresponding to preimages of 0 correspond to solutions of type $ii)$. Second, since we only care about the first $\binom{2n}{n}$ inputs, we have to make sure that the last ones are neither part of a collision nor result in a preimage of 0.

Formally, let $C:\{0,1\}^{\alpha}\rightarrow\{0,1\}^{2n}$ be an instance of Sperner-Antichain. We proceed to construct an instance of Pigeon as follows: if $x\in\{0,1\}^{\alpha}$, we have $X:=C(x)\in\{0,1\}^{2n}$, which represents a subset of $[2n]$. Let $(X',k)=E_{\textsf{Catalan}}(X)$ be the Catalan factorization of $X$, let $l$ be the number of $z$'s in $X'$, and let $m$ be the number of bits underlined during the construction of $X'$. Note that every time we underline bits we underline simultaneously a 0 and a 1, thus $m$ is even. Then, $l=2n-m$ is an even number. Now, let

\[S(x)=D_{\textsf{Catalan}}^{(l/2)}(X')\]

Then, since $X'$ has the same number of 1's and 0's and since we replaced half of the $z$'s by 1's and the other half by 0's, we have that $S(x)$ represents an $n$-subset of $[2n]$. Informally, it is the $n$-subset of the chain that contains $X$, and replacing $z$'s by 1's enables us to move inside that chain. Finally, we set

\[C'(x)=\begin{cases}E_{\textsf{Cover}}(S(x))&\text{if $x<\binom{2n}{n}$}\\ x&\text{if $x\geq\binom{2n}{n}$}\end{cases}\]

Then $C':\{0,1\}^{\alpha}\rightarrow\{0,1\}^{\alpha}$ is an instance of Pigeon and has polynomial size. Suppose that we have a solution to this instance of Pigeon of the form $x$ such that $C'(x)=0^{\alpha}$. Then, $x<\binom{2n}{n}$ and $E_{\textsf{Cover}}(S(x))=0^{\alpha}$, so $S(x)=\overline{[n]}$. Let $(X',k)=E_{\textsf{Catalan}}(X)=E_{\textsf{Catalan}}(C(x))$ be the Catalan factorization of $X$. Like previously, we get that the Catalan string that corresponds to $S(x)$ is $X'$. However, $S(x)=\overline{[n]}$ and the Catalan string that corresponds to $\overline{[n]}$ is $0^{n}\mathbin{\|}1^{n}$. Thus, $X'=0^{n}\mathbin{\|}1^{n}$. Now, $C(x)=D_{\textsf{Catalan}}\circ E_{\textsf{Catalan}}(C(x))=D_{\textsf{Catalan}}(0^{n}\mathbin{\|}1^{n},k)=0^{n}\mathbin{\|}1^{n}$, so $C(x)=\overline{[n]}$.

Suppose instead that we have a solution to this instance of Pigeon of the form $x\neq y$ such that $C'(x)=C'(y)$. Like before, we have $x,y<\binom{2n}{n}$. Then, by injectivity of $E_{\textsf{Cover}}$ on the $n$-subsets of $[2n]$ (see Lemma 3.2), we get that $S(x)=S(y)$. Informally, this means that $C(x)$ and $C(y)$ belong to the same chain and thus that one is contained in the other. Let $(X',k)=E_{\textsf{Catalan}}(X)=E_{\textsf{Catalan}}(C(x))$ be the Catalan factorization of $X$ and $l$ be the number of $z$'s in $X'$, and let $(Y',k')=E_{\textsf{Catalan}}(Y)=E_{\textsf{Catalan}}(C(y))$. We have $S(x)=D_{\textsf{Catalan}}(X',l/2)$, so by Lemma 3.10, the Catalan string that corresponds to $S(x)$ is $X'$. Similarly, the Catalan string that corresponds to $S(y)$ is $Y'$. Since $S(x)=S(y)$, we get $X'=Y'$. We have that $X=D_{\textsf{Catalan}}(E_{\textsf{Catalan}}(X))$ and that $Y=D_{\textsf{Catalan}}(E_{\textsf{Catalan}}(Y))$ by Lemma 3.9, so $X=D_{\textsf{Catalan}}(X',k)$ and $Y=D_{\textsf{Catalan}}(Y',k')=D_{\textsf{Catalan}}(X',k')$. By symmetry of $x$ and $y$ we can assume that $k\leq k'$. Then, to go from $X'$ to $X$ we added $k$ elements (the ones corresponding to the last $k$ $z$'s in $X'$), while to go from $X'$ to $Y$ we added these same $k$ elements plus $k'-k$ others. Hence, $C(x)=X\subseteq Y=C(y)$. ∎

Remark 5.10.

Like previously, the idea behind that proof is to compose our instance of Sperner-Antichain with the property-preserving encoding we defined in Remark 5.5. However, this time it is not only the collisions that are of interest to us, but also the preimages of the 0 string.

6 Cayley’s Tree Formula

We consider yet another classic theorem from combinatorics, related to spanning trees. A classic result by Cayley establishes the number of spanning trees of the complete graph on $n$ vertices. We observe that, given a collection of sufficiently many such graphs, either one of the graphs is not a spanning tree, or two spanning trees collide. Note that two isomorphic trees on distinct vertices are not considered a collision. This allows us to define a total search problem of either finding a collision or finding an index not corresponding to a spanning tree. We represent trees using a bitmap on all possible edges, ordered arbitrarily. We show that this problem is equivalent to weak-Pigeon, in a more direct way than for the previous results. As before, the problem can be modified to become equivalent to Pigeon, and thus PPP-complete.

Classical Theorem 5 (Cayley [Cay89]).

There are exactly $n^{n-2}$ spanning trees of the complete graph on $n$ vertices.

Definition 6.1 (weak-Cayley).

The problem weak-Cayley is defined by the relation

Instance:

A Boolean circuit $C\colon\{0,1\}^{\lceil(n-2)\log(n)\rceil+1}\to\{0,1\}^{\binom{n}{2}}$.

Solution:

One of the following:

  1. i)

    $x$ s.t. $C(x)$ is not a spanning tree (i.e., is not spanning, not connected or contains a cycle),

  2. ii)

    $x\neq y$ s.t. $C(x)=C(y)$.

Theorem 6.2.

weak-Cayley is PWPP-complete.

For the rest of this section, we set $\beta=\lceil(n-2)\log(n)\rceil$.

Lemma 6.3.

$\textsc{weak-Cayley}\in\textsf{PWPP}$.

Proof.

We reduce to weak-Pigeon. Unlike the previous problems, here, we are interested in a very simple algebraic structure, namely equality. Thus, we want collisions in our encoding to correspond to equality. This means that we want an efficiently computable injective encoding of spanning trees. For this, we use Prüfer codes (Section 3.3). We map any input $x$ to the Prüfer encoding of $C(x)$ and, therefore, a collision yields either a collision in the trees or a graph that is not a spanning tree.

Formally, suppose that we have an instance $C\colon\{0,1\}^{\lceil(n-2)\log(n)\rceil+1}\to\{0,1\}^{\binom{n}{2}}$ of weak-Cayley. We may define an instance of weak-Pigeon by setting $C'(x)=\tilde{E}_{\textsf{Prüfer}}(C(x))$. We observe that $C'\colon\{0,1\}^{\beta+1}\rightarrow\{0,1\}^{\beta}$ is indeed an instance of weak-Pigeon. By definition, $C'(x)$ is the rank in the lexicographic order of the Prüfer code of $C(x)$. Now, suppose that we have a solution to this instance, that is $x\neq y\in\{0,1\}^{\beta+1}$ such that $C'(x)=C'(y)$. Then, $\tilde{E}_{\textsf{Prüfer}}(C(x))=\tilde{E}_{\textsf{Prüfer}}(C(y))$. If $C(x)$ or $C(y)$ is not a spanning tree, then we have a solution to our original instance of weak-Cayley. Otherwise, $C(x)$ and $C(y)$ are spanning trees, so by injectivity of $\tilde{E}_{\textsf{Prüfer}}$ on the set of labelled spanning trees on $n$ vertices (see Lemma 3.5), we have $C(x)=C(y)$, which is a solution to our original instance of weak-Cayley. ∎
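The encoding step of this reduction can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: vertices are labelled $0,\dots,n-1$, the tree is given as an edge list rather than the bitmap used above, and the names `prufer_code` and `prufer_rank` are hypothetical stand-ins for $\tilde{E}_{\textsf{Prüfer}}$ from Section 3.3, whose exact conventions may differ.

```python
import heapq

def prufer_code(n, edges):
    """Pruefer code (length n-2) of a labelled tree on vertices 0..n-1.

    Raises ValueError when the edge list is not a spanning tree, which
    corresponds to a solution of type i) in the definition above.
    """
    if len(edges) != n - 1:
        raise ValueError("a spanning tree has exactly n-1 edges")
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u == v or v in adj[u]:
            raise ValueError("loop or repeated edge")
        adj[u].add(v)
        adj[v].add(u)
    leaves = [v for v in range(n) if len(adj[v]) == 1]
    heapq.heapify(leaves)
    code = []
    for _ in range(n - 2):
        # With n-1 edges, a non-tree contains a cycle whose vertices never
        # become leaves, so the leaf supply runs out before n-2 removals.
        if not leaves:
            raise ValueError("not a tree: no leaf left to remove")
        leaf = heapq.heappop(leaves)
        if len(adj[leaf]) != 1:
            raise ValueError("not a tree: disconnected component")
        parent = next(iter(adj[leaf]))
        code.append(parent)
        adj[parent].remove(leaf)
        adj[leaf].clear()
        if len(adj[parent]) == 1:
            heapq.heappush(leaves, parent)
    return code

def prufer_rank(n, code):
    """Rank of the code among all sequences in {0..n-1}^(n-2), in lexicographic order."""
    rank = 0
    for label in code:
        rank = rank * n + label
    return rank
```

Composing the two functions on $C(x)$ gives a value below $n^{n-2}$, so two distinct inputs with the same rank are either equal trees or expose an input that is not a tree.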

Remark 6.4.

Here, we can interpret $\tilde{E}_{\textsf{Prüfer}}$ as a property-preserving encoding on the set of labelled spanning trees on $n$ vertices, where the equivalence relation is equality. Hence, this is another proof of inclusion using property-preserving encodings, where we compose the instance of our problem with an appropriate property-preserving encoding. The equivalence relation has to be equality since the only spanning trees that are solutions of weak-Cayley are spanning trees that are equal.

Lemma 6.5.

weak-Cayley is PWPP-hard.

Proof.

We interpret the output of the weak-Pigeon instance as an index into the collection of all labelled spanning trees on $n$ vertices. By correctness of the encoding, the output necessarily is a spanning tree and, hence, the only solutions are collisions. We also detail some technical work to get a circuit with the right input size and output size, for which finding collisions allows solving the original instance of weak-Pigeon.

Formally, let $C'\colon\{0,1\}^{m+1}\rightarrow\{0,1\}^{m}$ be an instance of weak-Pigeon. We define a circuit $A\colon\{0,1\}^{m+2}\rightarrow\{0,1\}^{m}$ as follows. For any $x\in\{0,1\}^{m+2}$, write $x=y\mathbin{\|}z$ with $y\in\{0,1\}^{m+1}$ and $z\in\{0,1\}$. Then, we set $A(x)=C'(C'(y)\mathbin{\|}z)$. Note that $A$ still has polynomial size and that any collision in $A$ allows us to retrieve a collision for $C'$ (as in the Merkle-Damgård construction, see [Mer79]).

Let $n$ be the smallest integer such that $m+1\leq(n-2)\log(n)$. Note that $n$ is polynomial in $m$. Let $\beta=\lceil(n-2)\log(n)\rceil$. Then, $m+1\leq\beta$, hence $m+2\leq\beta+1$. Now, we define a circuit $A'\colon\{0,1\}^{\beta+1}\rightarrow\{0,1\}^{\beta-1}$ as follows. For any $x\in\{0,1\}^{\beta+1}$, write $x=y\mathbin{\|}z$ with $y\in\{0,1\}^{m+2}$ and $z\in\{0,1\}^{\beta+1-m-2}$. Then, we set $A'(x)=A(y)\mathbin{\|}z$. Note that $A'$ also has polynomial size and that any collision in $A'$ allows us to retrieve a collision for $A$, hence for $C'$.

Recall that we have $\tilde{E}_{\textsf{Prüfer}}\colon\{0,1\}^{\binom{n}{2}}\rightarrow\{0,1\}^{\beta}$ and $\tilde{D}_{\textsf{Prüfer}}\colon\{0,1\}^{\beta}\rightarrow\{0,1\}^{\binom{n}{2}}$. We now define an instance $C$ of weak-Cayley by setting $C(x)=\tilde{D}_{\textsf{Prüfer}}(0\mathbin{\|}A'(x))$. Now, suppose that we have a solution to this instance of weak-Cayley. For every $x$, $0\mathbin{\|}A'(x)$ is one of the first $n^{n-2}$ elements of $\{0,1\}^{\beta}$ in the lexicographic order, so $\tilde{D}_{\textsf{Prüfer}}$ is well-defined and correct (i.e., it indeed returns a spanning tree) on input $0\mathbin{\|}A'(x)$. Then, this solution must be $x\neq y$ such that $C(x)=C(y)$. By injectivity of $\tilde{D}_{\textsf{Prüfer}}$ on its first $n^{n-2}$ inputs (Lemma 3.5), we get that $A'(x)=A'(y)$ and from this we can retrieve a solution to our original instance of weak-Pigeon. ∎
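The first step of the construction, $A(x)=C'(C'(y)\mathbin{\|}z)$, together with the way a collision for $A$ is turned back into a collision for $C'$, can be sketched in Python as follows. Bit strings are modelled as tuples of 0/1, and the names `extend_domain` and `extract_collision` are hypothetical helpers introduced only for this illustration.

```python
def extend_domain(compress, m):
    """From compress: (m+1)-bit tuple -> m-bit tuple, build A: (m+2)-bit tuple -> m-bit tuple."""
    def A(x):
        y, z = x[:m + 1], x[m + 1:]      # x = y || z with |y| = m+1 and |z| = 1
        return compress(compress(y) + z)
    return A

def extract_collision(compress, m, x1, x2):
    """Given x1 != x2 with A(x1) == A(x2), return a collision for compress."""
    y1, z1 = x1[:m + 1], x1[m + 1:]
    y2, z2 = x2[:m + 1], x2[m + 1:]
    outer1 = compress(y1) + z1
    outer2 = compress(y2) + z2
    if outer1 != outer2:
        # The outer application of compress already collides.
        return outer1, outer2
    # Otherwise z1 == z2 and compress(y1) == compress(y2), so x1 != x2 forces y1 != y2.
    return y1, y2
```

The same case analysis applies to $A'$: since the suffix $z$ is copied to the output unchanged, two colliding inputs of $A'$ must differ on their first $m+2$ bits.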

PPP-completeness using the tight bound

Again, we observe that Classical Theorem 5 gives an exact bound, namely that there are exactly $n^{n-2}$ labelled spanning trees on $n$ vertices. As before, this leads us to defining the following problem.

Definition 6.6 (Cayley).

The problem Cayley is defined by the relation

Instance:

A Boolean circuit $C\colon\{0,1\}^{\lceil(n-2)\log(n)\rceil}\to\{0,1\}^{\binom{n}{2}}$.

Solution:

One of the following:

  1. i)

    $x$ s.t. $C(x)$ is not a spanning tree and $x<n^{n-2}$,

  2. ii)

    $x\neq y$ s.t. $C(x)=C(y)$ and $x<n^{n-2}$,

  3. iii)

    $x$ s.t. $C(x)=T_{1}$ and $x<n^{n-2}$, with $T_{1}$ defined as in Remark 3.7.

Theorem 6.7.

Cayley is PPP-complete.

Lemma 6.8.

Cayley is PPP-hard.

Proof.

This proof is similar in spirit to the proof of Lemma 6.5. We interpret the outputs of the instance of Pigeon as indices in the list of all spanning trees of the complete graph on $n$ vertices. As in previous proofs, we have to define a circuit $A$ with sufficiently many inputs such that from any collision (resp. preimage of $0$) in $A$ we can find a collision (resp. preimage of $0$) in the instance of Pigeon. In the instance of Cayley we create, preimages of $T_{1}$ correspond to preimages of $0$.

Let $C'\colon\{0,1\}^{m}\rightarrow\{0,1\}^{m}$ be an instance of Pigeon, and let $n$ be the smallest integer such that $m\leq(n-2)\log(n)$. Note that $n$ is polynomial in $m$. Let $\beta=\lceil(n-2)\log(n)\rceil$. We define $A\colon\{0,1\}^{\beta}\rightarrow\{0,1\}^{\beta}$ as follows.

$$A(x)=\begin{cases}C'(x)&\text{if $x<2^{m}$}\\ x&\text{if $x\geq 2^{m}$}\end{cases}$$

If necessary, we pad the outputs of $A$ on the left by $0$'s so that they have length $\beta$ (this might be necessary for $x<2^{m}$). Note that $A([2^{m}-1])\subseteq[2^{m}-1]$ and $A$ acts as the identity over $[2^{\beta}-1]\setminus[2^{m}-1]$, hence any solution to $A$ as an instance of Pigeon immediately gives a solution to $C'$. Recall that we have $\tilde{E}_{\textsf{Prüfer}}\colon\{0,1\}^{\binom{n}{2}}\rightarrow\{0,1\}^{\beta}$ and $\tilde{D}_{\textsf{Prüfer}}\colon\{0,1\}^{\beta}\rightarrow\{0,1\}^{\binom{n}{2}}$. Then, we define an instance $C$ of Cayley by setting $C(x)=\tilde{D}_{\textsf{Prüfer}}(A(x))$.

Now, suppose that we have a solution to this instance of Cayley. Every solution must consist of inputs $<n^{n-2}$ but $A([n^{n-2}-1])\subseteq[n^{n-2}-1]$ by construction of $A$, and $\tilde{D}_{\textsf{Prüfer}}$ is well-defined, correct and injective on this set by Lemma 3.5. This implies that this solution cannot be $x$ such that $C(x)$ is not a spanning tree. Then, suppose that this solution is $x\neq y$ such that $C(x)=C(y)$. By injectivity of $\tilde{D}_{\textsf{Prüfer}}$ on $[n^{n-2}-1]$, we get that $A(x)=A(y)$ and from this we can retrieve a solution to our original instance of Pigeon. Now, if this solution is $x$ such that $C(x)=T_{1}$ then this means that $A(x)=0^{\beta}$ by Remark 3.7 and injectivity of $\tilde{D}_{\textsf{Prüfer}}$ over $[n^{n-2}-1]$, so $C'(x)=0^{m}$. ∎
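The construction $C(x)=\tilde{D}_{\textsf{Prüfer}}(A(x))$ can be sketched in Python as follows. This is an illustration under assumptions: bit strings are modelled as non-negative integers (so left-padding is implicit), trees are returned as sorted edge lists on vertices $0,\dots,n-1$, and `index_to_code`, `prufer_decode`, `pad_pigeon` and `cayley_instance` are hypothetical names. The decoder is the textbook Prüfer decoding and may differ in its conventions from $\tilde{D}_{\textsf{Prüfer}}$ in Section 3.3; in particular, whatever tree index $0$ decodes to plays the role of $T_{1}$ here.

```python
import heapq

def index_to_code(n, index):
    """Write index < n^(n-2) in base n with exactly n-2 digits (most significant first)."""
    digits = []
    for _ in range(n - 2):
        index, digit = divmod(index, n)
        digits.append(digit)
    return digits[::-1]

def prufer_decode(n, code):
    """Textbook Pruefer decoding: a code of length n-2 to a sorted edge list (n >= 2)."""
    degree = [1] * n
    for label in code:
        degree[label] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for label in code:
        leaf = heapq.heappop(leaves)          # smallest remaining leaf
        edges.append((min(leaf, label), max(leaf, label)))
        degree[label] -= 1
        if degree[label] == 1:
            heapq.heappush(leaves, label)
    u, v = heapq.heappop(leaves), heapq.heappop(leaves)
    edges.append((min(u, v), max(u, v)))
    return sorted(edges)

def pad_pigeon(c_prime, m):
    """A(x) = C'(x) for x < 2^m and A(x) = x otherwise."""
    return lambda x: c_prime(x) if x < 2 ** m else x

def cayley_instance(c_prime, m, n):
    """C(x) = decode(A(x)); only meaningful for inputs x < n^(n-2)."""
    A = pad_pigeon(c_prime, m)
    return lambda x: prufer_decode(n, index_to_code(n, A(x)))
```

Because the decoder is injective on indices below $n^{n-2}$, two inputs that produce the same edge list must already collide under `A`.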

Lemma 6.9.

$\textsc{Cayley}\in\textsf{PPP}$.

Proof.

The idea behind the proof is similar to that of Lemma 6.3: we use $\tilde{E}_{\textsf{Prüfer}}$ to create an instance of Pigeon, except that we apply the encoding only to the first $n^{n-2}$ elements of the collection and set the circuit to the identity on the rest of the inputs. Any preimage of $0$ corresponds to a preimage of $T_{1}$, and collisions arise from graphs that are not spanning trees, as well as from collisions in the Cayley instance. ∎

7 Ward-Szabo Theorem on Swell Colorings

We now focus on a different theorem from extremal combinatorics, and more precisely from extremal graph theory. Let $G=(V,E)$ be the complete graph on $N$ vertices. An edge-coloring $c\colon E\rightarrow[r]$ for some $r$ is called a swell coloring of $G$ if it uses at least 2 colors and if every triangle is either monochromatic or trichromatic. It is rather straightforward to see that in any $2$-coloring of $G$, there must exist a bichromatic triangle. On the contrary, if we color each edge with a different color, we trivially get a swell coloring. The natural question that appears is then to determine the minimal number of colors required to swell-color the complete graph on $N$ vertices. This was solved in some cases by Ward and Szabo in 1995.

Classical Theorem 6 (Ward-Szabo [WS95]).

The complete graph on $N$ vertices cannot be swell-colored with fewer than $\sqrt{N}+1$ colors, and this bound is tight.

From that theorem, we can define a TFNP problem as follows: the input is a coloring $C$ of the edges of the complete graph on $2^{2n}$ vertices with $2^{n}$ colors, as well as three vertices $a,b,c$ such that $C(a,b)\neq C(a,c)$ to guarantee that at least 2 colors are used in the coloring. A solution is then the vertices of a bichromatic triangle (which is guaranteed to exist by Classical Theorem 6). We also allow extra solutions, one to specify that the edges $(a,b)$ and $(a,c)$ have the same color, and one if the coloring of the graph is not consistent.

Definition 7.1 (Ward-Szabo).

The problem Ward-Szabo is defined by the relation

Instance:

The following:

  1. 1.

    A Boolean circuit $C\colon\{0,1\}^{2n}\times\{0,1\}^{2n}\to\{0,1\}^{n}$; and,

  2. 2.

    Distinct $a,b,c\in\{0,1\}^{2n}$.

Solution:

One of the following:

  1. i)

    $0$ if $C(a,b)=C(a,c)$,

  2. ii)

    $x,y$ s.t. $C(x,y)\neq C(y,x)$,

  3. iii)

    Distinct $x,y,z$ s.t. $C(x,y)=C(y,z)\neq C(x,z)$.

We also define two variants of this problem, whose totality is a consequence of the totality of Ward-Szabo.
In the first one, we allow an extra type of solution, namely the vertices of two distinct triangles with the same “color profile”.

Definition 7.2 (Ward-Szabo-Collisions).

The problem Ward-Szabo-Collisions is defined by the relation

Instance:

The following:

  1. 1.

    A Boolean circuit $C\colon\{0,1\}^{2n}\times\{0,1\}^{2n}\to\{0,1\}^{n}$; and,

  2. 2.

    Distinct $a,b,c\in\{0,1\}^{2n}$.

Solution:

One of the following:

  1. i)

    $0$ if $C(a,b)=C(a,c)$,

  2. ii)

    $x,y$ s.t. $C(x,y)\neq C(y,x)$,

  3. iii)

    Distinct $x,y,z$ s.t. $C(x,y)=C(y,z)\neq C(x,z)$,

  4. iv)

    Two triples $(x,y,z),(x',y',z')$, each with 3 distinct elements, s.t. $\{x,y,z\}\neq\{x',y',z'\}$ and $C(x,y)=C(x',y')$, $C(x,z)=C(x',z')$, $C(y,z)=C(y',z')$.

In the second variant, we allow the same extra type of solution, namely the vertices of two distinct triangles with the same “color profile”, with the additional constraint that these triangles should be trichromatic.

Definition 7.3 (Ward-Szabo-Colorful-Collisions).

The problem Ward-Szabo-Colorful-Collisions is defined by the relation

Instance:

The following:

  1. 1.

    A Boolean circuit $C\colon\{0,1\}^{2n}\times\{0,1\}^{2n}\to\{0,1\}^{n}$; and,

  2. 2.

    Distinct $a,b,c\in\{0,1\}^{2n}$.

Solution:

One of the following:

  1. i)

    $0$ if $C(a,b)=C(a,c)$,

  2. ii)

    $x,y$ s.t. $C(x,y)\neq C(y,x)$,

  3. iii)

    Distinct $x,y,z$ s.t. $C(x,y)=C(y,z)\neq C(x,z)$,

  4. iv)

    Two triples $(x,y,z),(x',y',z')$, each with 3 distinct elements, s.t. $\{x,y,z\}\neq\{x',y',z'\}$, $C(x,y)=C(x',y')$, $C(x,z)=C(x',z')$, $C(y,z)=C(y',z')$ and the triangle $(x,y,z)$ is trichromatic.

Theorem 7.4.

$\textsc{weak-Pigeon}\leq\textsc{Ward-Szabo-Collisions}\leq\textsc{Ward-Szabo-Colorful-Collisions}\leq\textsc{Ward-Szabo}$.

Proof.

At a high level, we use the weak-Pigeon circuit as the coloring of the graph. If we find a bichromatic triangle, we have found a collision. If we find two triangles with the same “color-profile”, we have also found a collision.

Formally, let us prove that weak-Pigeon reduces to Ward-Szabo-Collisions. Let $C\colon\{0,1\}^{n+1}\rightarrow\{0,1\}^{n}$ be an instance of weak-Pigeon. By the Merkle-Damgård construction, we can build a circuit $A\colon\{0,1\}^{4n}\rightarrow\{0,1\}^{n}$ of polynomial size such that finding a collision for $A$ allows finding a collision for $C$. We set $a=0^{2n}$, $b=1^{2n}$ and $c=0^{2n-1}\mathbin{\|}1$. If $A(a,b)=A(a,c)$ then we have a collision for $A$. Otherwise, we have $A(a,b)\neq A(a,c)$. We define a circuit $A'\colon\{0,1\}^{4n}\rightarrow\{0,1\}^{n}$ as follows.

$$A'(x,y)=\begin{cases}A(x,y)&\text{if $x\leq y$}\\ A(y,x)&\text{if $x>y$}\end{cases}$$

Then, we define an instance of Ward-Szabo-Collisions by saying that the coloring is $A'$ and that $A'(a,b)\neq A'(a,c)$.

Now, suppose that we have a solution to this instance of Ward-Szabo-Collisions. Note that this solution cannot be $x,y$ such that $A'(x,y)\neq A'(y,x)$ by definition of $A'$. If this solution is distinct $x,y,z$ such that $A'(x,y)=A'(x,z)\neq A'(y,z)$ then $A'(x\mathbin{\|}y)=A'(x\mathbin{\|}z)$, which implies a collision for $A$ in any case. If this solution is two triples $(x,y,z)\neq(x',y',z')$ such that $A'(x,y)=A'(x',y')$, $A'(x,z)=A'(x',z')$, $A'(y,z)=A'(y',z')$, then by symmetry of $x,y$ and $z$, and of $x',y'$ and $z'$, we can assume $x\neq x'$. If $x=y'$ and $y=x'$, then $A'(x,z)=A'(x',z')=A'(y,z')$ and $x\neq y$ so this gives us a collision for $A$. Otherwise, we have $A'(x\mathbin{\|}y)=A'(x'\mathbin{\|}y')$, from which we can find a collision for $A$.
In all cases, we get a collision for $A$ from which we can get a collision for $C$. ∎
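The symmetrization of $A$ and the bichromatic-triangle case of the argument can be sketched in Python. Vertices are modelled as integers and the coloring as a two-argument function; `symmetrize` and `collision_from_shared_color` are hypothetical helper names introduced for illustration only.

```python
def symmetrize(A):
    """Return A'(x, y), which evaluates A on the pair sorted in increasing order."""
    def A_sym(x, y):
        return A(x, y) if x <= y else A(y, x)
    return A_sym

def collision_from_shared_color(x, y, z):
    """Given distinct x, y, z with A'(x, y) == A'(x, z), return two distinct
    sorted pairs on which A itself takes the same value."""
    p = (min(x, y), max(x, y))
    q = (min(x, z), max(x, z))
    return p, q   # p != q because y != z and the pairs share the vertex x
```

Since $A'$ agrees with $A$ on sorted pairs, the two returned pairs are distinct inputs of $A$ with equal outputs, which is the collision used in the proof.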

Theorem 7.5.

$\textsc{Ward-Szabo-Collisions}\in\textsf{PWPP}$.

Proof.

We describe the proof informally. There are only $2^{3n}$ different “color profiles” possible, which is fewer than the number of triangles containing the vertex $0^{2n}$. Hence, if we map sufficiently many distinct triangles containing that vertex to their color profile, it defines an instance of weak-Pigeon, and any solution to this instance gives us a solution of type iv).

Formally, let $C\colon\{0,1\}^{2n}\times\{0,1\}^{2n}\rightarrow\{0,1\}^{n}$, $a,b,c\in\{0,1\}^{2n}$ be an instance of Ward-Szabo-Collisions. We consider the “color profile” of some triangles containing the vertex indexed by $0^{2n}$. Let $C'\colon\{0,1\}^{3n+1}\rightarrow\{0,1\}^{3n}$ be the circuit defined as follows. For every $x\in\{0,1\}^{3n+1}$, write $x=(y\mathbin{\|}z)$ with $y\in\{0,1\}^{n+3}$ and $z\in\{0,1\}^{2n-2}$. Then, let $y'=(1^{n-2}\mathbin{\|}y)$ and $z'=(10\mathbin{\|}z)\in\{0,1\}^{2n}$. Then, we set $C'(x)=(C(0^{2n},y'),C(0^{2n},z'),C(y',z'))$. $C'$ defines an instance of weak-Pigeon. Suppose now that we have a solution to this instance of weak-Pigeon, that is $x_{1}\neq x_{2}$ such that $C'(x_{1})=C'(x_{2})$.
Then, define $y_{1}',z_{1}',y_{2}'$ and $z_{2}'$ as above. Since $x_{1}\neq x_{2}$, by construction we have that $\{0^{2n},y_{1}',z_{1}'\}\neq\{0^{2n},y_{2}',z_{2}'\}$ and that each of these two sets has three distinct elements. Furthermore, $C'(x_{1})=C'(x_{2})$ implies that $C(0^{2n},y_{1}')=C(0^{2n},y_{2}')$, $C(0^{2n},z_{1}')=C(0^{2n},z_{2}')$ and $C(y_{1}',z_{1}')=C(y_{2}',z_{2}')$. Hence, we have a solution of type iv) to Ward-Szabo-Collisions. ∎

Remark 7.6.

The last two theorems prove that Ward-Szabo-Collisions is PWPP-complete. However, notice that the proof of inclusion into PWPP does not use solutions of the first three types. Hence, if we call Ward-Szabo-Collisions’ the problem similar to Ward-Szabo-Collisions but without the first three types of solutions, this new problem is also PWPP-complete. Indeed, the proof of inclusion into PWPP would be similar, and so would the proof of hardness, only with fewer cases to consider. Thus, it seems (at least that is how we prove it) that what makes Ward-Szabo-Collisions PWPP-complete is only its last type of solutions. Now, one could wonder how hard this problem becomes if we slightly modify this last type of solutions to make them harder to find. This is exactly what Ward-Szabo-Colorful-Collisions does.

Theorem 7.7.

$\textsc{Ward-Szabo-Colorful-Collisions}\in\textsf{PPP}$.

Proof.

We first give an overview of the proof. It is quite similar in spirit to the previous one, but we need to work to avoid getting collisions that would give us 2 monochromatic triangles. This costs an extra bit, hence the inclusion in PPP and not in PWPP. We are given three vertices $a,b,c\in\{0,1\}^{2n}$ such that the colors $C(a,b),C(a,c)$ and $C(b,c)$ are distinct (otherwise we have an easy solution to the instance). We create an instance of Pigeon by mapping any vertex $x$ to the pair of colors $(C(x,b),C(x,c))$ if we don’t have $C(x,b)=C(x,c)=C(b,c)$, which would be a monochromatic triangle, and to the color $C(x,a)$ otherwise. We need $2n$ bits to make sure that these two types of outputs don’t collide. We make sure that $0$ has no preimage. Then, any solution to the instance of Pigeon must be a collision. If it is a collision from the first case, we found 2 distinct non-monochromatic triangles with the same profile, hence a solution of type iii) or iv). If it is a collision from the second case, we found 2 non-monochromatic triangles with the same profile.

Formally, let $C\colon\{0,1\}^{2n}\times\{0,1\}^{2n}\rightarrow\{0,1\}^{n}$ and $a,b,c\in\{0,1\}^{2n}$ be an instance of Ward-Szabo-Colorful-Collisions. If $C(a,b)=C(a,c)$ then we have a solution to this instance of Ward-Szabo-Colorful-Collisions. Now, suppose $C(a,b)\neq C(a,c)$. If $C(b,c)=C(a,b)$ or $C(b,c)=C(a,c)$, then we have a solution of type iii) to this instance of Ward-Szabo-Colorful-Collisions. Hence, we can suppose that the colors $C(a,b),C(a,c)$ and $C(b,c)$ are all distinct. Furthermore, if $C(c,b)\neq C(b,c)$, we have a solution of type ii), so we also assume that $C(c,b)=C(b,c)$. We use the circuit $E_{lex}\colon\{0,1\}^{n}\times\{0,1\}^{n}\rightarrow\{0,1\}^{2n-1}$ defined in Section 3.2, to encode 2-subsets of $\{0,1\}^{n}$ using $2n-1$ bits.

We define an instance $C'\colon\{0,1\}^{2n}\rightarrow\{0,1\}^{2n}$ of Pigeon as follows.

$$C'(x)=\begin{cases}01110^{2n-4}&\text{if $x=a$}\\ 010^{2n-2}&\text{if $x=b$}\\ 0110^{2n-3}&\text{if $x=c$}\\ 01^{n-1}\mathbin{\|}C(x,a)&\text{if $C(x,b)=C(x,c)=C(b,c)$}\\ 1\mathbin{\|}E_{lex}(C(x,b),C(x,c))&\text{otherwise}\end{cases}$$

Now, suppose that we have a solution to this instance of Pigeon. By construction of $C'$, it cannot be $x\in\{0,1\}^{2n}$ such that $C'(x)=0^{2n}$. Then, it must be $x\neq y\in\{0,1\}^{2n}$ such that $C'(x)=C'(y)$. Furthermore, by design of $C'$, we have $x,y\notin\{a,b,c\}$. We consider two cases, depending on the first bit of $C'(x)$.

  1. 1.

    Suppose the first bit of $C'(y)=C'(x)$ is a $1$. Then, $E_{lex}(C(x,b),C(x,c))=E_{lex}(C(y,b),C(y,c))$. If $C(x,b)=C(x,c)$, then we have that $C(x,b)=C(x,c)\neq C(b,c)$, otherwise the first bit of $C'(x)$ would be a $0$. Then, the triangle $(x,b,c)$ is bichromatic so it’s a solution to our instance of Ward-Szabo-Colorful-Collisions. Similarly, if $C(y,b)=C(y,c)$, then the triangle $(y,b,c)$ is bichromatic. Now, if $C(x,b)\neq C(x,c)$ and $C(y,b)\neq C(y,c)$, then $\{C(x,b),C(x,c)\}=\{C(y,b),C(y,c)\}$ by injectivity of $E_{lex}$ on subsets of 2 distinct elements of $\{0,1\}^{n}$. Then, $\{x,b,c\}\neq\{y,b,c\}$, each has three distinct elements, and either $C(x,b)=C(y,b)$, $C(x,c)=C(y,c)$ and $C(b,c)=C(b,c)$, or $C(x,b)=C(y,c)$, $C(x,c)=C(y,b)$ and $C(b,c)=C(c,b)$. The triangle $(x,b,c)$ is not monochromatic so this gives us a solution to our instance of Ward-Szabo-Colorful-Collisions, either of type iv) if it is trichromatic, or of type iii) if it is bichromatic.

  2. 2.

    Otherwise, suppose that the first bit of $C'(y)=C'(x)$ is a $0$. By construction of $C'$, this means that $C(x,b)=C(x,c)=C(b,c)=C(y,c)=C(y,b)$. Furthermore, since $C'(x)=C'(y)$, we get that $C(x,a)=C(y,a)$. Then, $\{x,a,b\}\neq\{y,a,b\}$, each has three distinct elements, and $C(x,a)=C(y,a)$, $C(x,b)=C(y,b)$ and $C(a,b)=C(a,b)$. The triangle $(x,a,b)$ is not monochromatic since $C(x,b)=C(b,c)\neq C(a,b)$ so this gives us a solution to our instance of Ward-Szabo-Colorful-Collisions, either of type iv) if it is trichromatic, or of type iii) if it is bichromatic. ∎

7.1 A Hierarchy of Total Search Problems between weak-Pigeon and Pigeon?

In the last proof, we define a reduction to Pigeon where the circuit $C'$ only has a range of $2^{2n-1}+2^{n-1}$ elements. Indeed, we need exactly $\binom{2^{n}}{2}=2^{2n-1}-2^{n-1}$ elements to encode the pairs of colors. We also need exactly $2^{n}$ elements for the fourth case. However, we can map $x$ anywhere in that case if $C(x,a)\in\{C(a,b),C(a,c),C(b,c)\}$ because such an $x$ would give us a bichromatic triangle. Hence, we need only $2^{n}-3$ elements for this case. We also need 3 extra elements for $a,b$ and $c$. Hence, overall, we only need a range of $2^{2n-1}+2^{n-1}$ elements. Thus, we get a reduction from Ward-Szabo-Colorful-Collisions to a problem that is weaker than Pigeon (but stronger than weak-Pigeon), which is the following: given a circuit from $2n$ bits to $2n$ bits, either find a collision, or a preimage of one of the first $2^{2n}-(2^{2n-1}+2^{n-1})$ elements.
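The counting in this paragraph can be checked numerically. The following Python sketch simply adds up the three contributions named above; the function name `range_needed` is hypothetical.

```python
from math import comb

def range_needed(n):
    """Number of range elements used by C' in the proof of Theorem 7.7, as counted above."""
    color_pairs = comb(2 ** n, 2)   # unordered pairs of distinct colors
    fourth_case = 2 ** n - 3        # colors C(x, a) outside {C(a,b), C(a,c), C(b,c)}
    special = 3                     # dedicated images of a, b and c
    return color_pairs + fourth_case + special
```

For every $n$, the sum equals $2^{2n-1}+2^{n-1}$, because $\binom{2^{n}}{2}=2^{2n-1}-2^{n-1}$ and the last two terms add up to $2^{n}$.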

More generally, we can define the problem $\textsc{General-Pigeon}_{k}^{m}$ as follows.

Definition 7.8 ($\textsc{General-Pigeon}_{k}^{m}$).

The problem $\textsc{General-Pigeon}_{k}^{m}$ is defined by the relation

Instance:

A Boolean circuit $C\colon\{0,1\}^{m}\to\{0,1\}^{m}$.

Solution:

One of the following:

  1. i)

    $x\neq y\in\{0,1\}^{m}$ s.t. $C(x)=C(y)$,

  2. ii)

    $x\in\{0,1\}^{m}$ s.t. $C(x)$ is one of the first $k$ elements of $\{0,1\}^{m}$.

Note that this problem gets harder as $k$ decreases. It is trivial for $k=2^{m}$, equivalent to weak-Pigeon for $k=2^{m-1}$ and to Pigeon for $k=1$.
This problem induces an entire family of intermediary problems between weak-Pigeon and Pigeon. It is not clear how many non-equivalent problems appear in that hierarchy. It is also unclear whether each PWPP-hard problem that is in PPP is in fact equivalent to one of these.
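To make the two solution types concrete, here is a brute-force Python sketch for toy instances of $\textsc{General-Pigeon}_{k}^{m}$. It models $m$-bit strings as integers in $[0,2^{m})$ and runs in exponential time, so it only illustrates totality and says nothing about the complexity of the problem; the function name `solve_general_pigeon` is hypothetical.

```python
def solve_general_pigeon(C, m, k):
    """Exhaustively search for a solution of General-Pigeon_k^m (toy sizes only).

    Returns ("small", x) when C(x) is among the first k elements,
    or ("collision", x, y) when x != y and C(x) == C(y).
    """
    seen = {}
    for x in range(2 ** m):
        out = C(x)
        if out < k:
            return ("small", x)
        if out in seen:
            return ("collision", seen[out], x)
        seen[out] = x
    # Unreachable when k >= 1: if no output is below k, then 2^m inputs
    # map into fewer than 2^m values, so two of them must collide.
    return None
```

With $k=1$ the search reproduces the usual Pigeon dichotomy: either some input maps to $0$, or two inputs collide.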

8 Mantel’s Theorem on Triangle-Free Graphs

Next, we move on to another classical theorem in extremal graph theory. It answers the following question: What is the maximum number of edges in a triangle-free graph on $N$ vertices?

Classical Theorem 7 (Mantel [Man07]).

If $G=(V,E)$ is a triangle-free graph on $N$ vertices then $|E|\leq N^{2}/4$, and this bound is tight.

This gives rise to the following search problem. Suppose that we are given a collection of strictly more than $N^{2}/4$ distinct edges for a graph on $N$ vertices. Then, by Mantel’s theorem, there must be three of these edges forming a triangle in the graph. The search problem is then to find them. We can turn this problem into a TFNP problem if we also allow evidence that two edges in the collection are in fact the same, or that an edge is in fact a loop. For practical reasons, we demand that the endpoints of every edge are given in the lexicographic order. When the edges are represented implicitly by a poly-sized circuit, we get the following problem.

Definition 8.1 (weak-Mantel).

The problem weak-Mantel is defined by the relation

Instance:

A Boolean circuit $C\colon\{0,1\}^{2n-1}\to\{0,1\}^{n}\times\{0,1\}^{n}$.

Solution:

One of the following:

  1. i)

    Distinct $i,j,k$ s.t. $C(i),C(j),C(k)$ form a triangle,

  2. ii)

    $i$ s.t. $C(i)=(u,v)$ with $u\geq v$ in the lexicographic order,

  3. iii)

    $i\neq j$ s.t. $C(i)=C(j)$.

Remark 8.2.

As in the other problems, the size of the collection we receive (in this case, edges) is twice the threshold size (here, $2^{2n-2}$). However, here, we observe that the number of edges we receive as input is greater than the number of possible edges since $2^{2n-1}>\binom{2^{n}}{2}$. Thus, in any instance of weak-Mantel, there must be solutions of type ii) or iii).

Theorem 8.3.

weak-Mantel is PWPP-hard.

Proof.

To prove this result, we apply the graph-hash product to the complete balanced bipartite graph on $2^{n}$ vertices.

Formally, let $C\colon\{0,1\}^{n}\rightarrow\{0,1\}^{n-1}$ be an instance of weak-Pigeon. We define $C'\colon\{0,1\}^{2n-1}\rightarrow\{0,1\}^{2n-2}$ as follows. For every $x\in\{0,1\}^{2n-1}$, write $x=y\mathbin{\|}z$ with $y\in\{0,1\}^{n}$ and $z\in\{0,1\}^{n-1}$. We then set $C'(x)=C(y)\mathbin{\|}z$. Note that from any collision for $C'$ we can retrieve a collision for $C$ (by looking at the first $n$ bits). Now, we define $C''\colon\{0,1\}^{2n-1}\rightarrow\{0,1\}^{n}\times\{0,1\}^{n}$ as follows. For every $x\in\{0,1\}^{2n-1}$, write $C'(x)=(y\mathbin{\|}z)$ with $y,z\in\{0,1\}^{n-1}$. We then set $C''(x)=(0\mathbin{\|}y,1\mathbin{\|}z)$. We observe that $C''$ defines an instance of weak-Mantel. Note that the edges given by $C''$ correspond to edges of the complete balanced bipartite graph on $2^{n}$ vertices where one side of the bipartition consists of the $2^{n-1}$ first elements in the lexicographic order. In particular, the graph described by $C''$ is triangle-free, so there is no solution of type i). Similarly, by construction of $C''$, there can be no solution of type ii). Thus, any solution to this instance of weak-Mantel is $i\neq j$ such that $C''(i)=C''(j)$. By construction of $C''$, this means that $C'(i)=C'(j)$ and from there we can find a collision for $C$. ∎
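The construction of $C''$ can be sketched in Python as follows. Bit strings are modelled as integers, with the first $n$ bits of $x$ taken as its most significant bits; `mantel_instance` is a hypothetical name used only in this illustration.

```python
def mantel_instance(C, n):
    """Build C'' from C: n-bit int -> (n-1)-bit int.

    C'' maps a (2n-1)-bit int x = y || z to the edge (0 || C(y), 1 || z).
    """
    low_mask = (1 << (n - 1)) - 1
    def C2(x):
        y = x >> (n - 1)           # first n bits of x
        z = x & low_mask           # last n-1 bits of x
        u = C(y)                   # 0 || C(y): a vertex in the first half
        v = (1 << (n - 1)) | z     # 1 || z: a vertex in the second half
        return (u, v)
    return C2
```

Every edge produced this way joins the first half of the vertices to the second half, so the endpoints are ordered and no triangle can appear; two inputs giving the same edge share $z$ and therefore expose a collision for `C` on their first $n$ bits.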

Theorem 8.4.

$\textsc{weak-Mantel}\in\textsf{PPP}$.

Proof.

We give a high-level overview of the proof. Since we have more edges than there are possible distinct edges, we encode the edges injectively, mapping only ill-defined edges to 0. This defines an instance of Pigeon, where a solution can only be a collision, meaning two different indices corresponding to the same edge.

With the circuit Elex:{0,1}n×{0,1}n{0,1}2n1E_{lex}:\{0,1\}^{n}\times\{0,1\}^{n}\rightarrow\{0,1\}^{2n-1} defined in Section 3.2, we can encode 2-subsets of {0,1}n\{0,1\}^{n} using optimally many bits, that is log(2n2)=2n1\left\lceil\log\binom{2^{n}}{2}\right\rceil=2n-1.

Now, consider the following circuit $E:\{0,1\}^{n}\times\{0,1\}^{n}\rightarrow\{0,1\}^{2n-1}$,

$$E(u,v)=\begin{cases}0^{2n-1}&\text{if $u\geq v$}\\ E_{lex}(u,v)+0^{2n-2}1&\text{if $u<v$}\end{cases}$$

where $+$ represents the addition in binary. Note that since the range of $E_{lex}$ is exactly the first $\binom{2^{n}}{2}$ elements of $\{0,1\}^{2n-1}$ in the lexicographic order, if $E(u,v)=0^{2n-1}$, it must be that $u\geq v$.

Let $C:\{0,1\}^{2n-1}\rightarrow\{0,1\}^{n}\times\{0,1\}^{n}$ be an instance of weak-Mantel. For every $x\in\{0,1\}^{2n-1}$, we set $C'(x)=E(C(x))$. Then, $C':\{0,1\}^{2n-1}\rightarrow\{0,1\}^{2n-1}$ is an instance of Pigeon.

Now, suppose that we have a solution to this instance of Pigeon. If it is $x$ such that $C'(x)=0^{2n-1}$, then $E(C(x))=0^{2n-1}$, which means that $C(x)=(u,v)$ with $u\geq v$, so $x$ is a solution to our instance of weak-Mantel. Otherwise, it is a pair $x\neq y$ such that $C'(x)=C'(y)$. If $C'(x)=0^{2n-1}$, by the first case we have that $x$ is a solution to the instance of weak-Mantel. Now, if $C'(x)\neq 0^{2n-1}$, then it means that $E_{lex}(C(x))+0^{2n-2}1=E_{lex}(C(y))+0^{2n-2}1$, so $E_{lex}(C(x))=E_{lex}(C(y))$. By injectivity of $E_{lex}$ on well-defined inputs (that is, inputs of the form $(u,v)$ with $u<v$), this means that $C(x)=C(y)$, which is a solution to our original instance of weak-Mantel. ∎
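The encoding used in this proof can be illustrated with a short Python sketch. The circuit $E_{lex}$ is defined in Section 3.2, so the function `e_lex` below is only a hypothetical stand-in: it ranks a pair $u<v$ among all 2-subsets in lexicographic order, with bit strings treated as integers. The actual circuit of the paper may be implemented differently.

```python
def e_lex(u, v, n):
    """Hypothetical stand-in for E_lex: the rank of the pair u < v among all
    2-subsets of {0, ..., 2^n - 1}, listed in lexicographic order."""
    N = 1 << n
    # Pairs whose smaller element is below u, plus the offset of v after u.
    return u * N - u * (u + 1) // 2 + (v - u - 1)


def E(u, v, n):
    """Ill-formed pairs (u >= v) are sent to 0; valid edges to e_lex + 1."""
    return 0 if u >= v else e_lex(u, v, n) + 1


def pigeon_instance(C, n):
    """Compose a weak-Mantel instance C with E to obtain C' = E o C."""
    return lambda x: E(*C(x), n)
```

Since the largest rank is $\binom{2^{n}}{2}-1$, adding one never exceeds $2^{2n-1}-1$, so the output always fits in $2n-1$ bits.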

Remark 8.5.

Similarly to the proof that Ward-Szabo-Collisions $\in$ PPP, we only use the last two types of solutions, which suggests that what makes this problem easier than Pigeon is only the fact that we are given more edges than there are different possible edges in a graph on $2^{n}$ vertices.

Remark 8.6.

In fact, this last proof shows that weak-Mantel reduces to $\textsc{General-Pigeon}_{2^{n-1}}^{2n-1}$.

Mantel’s theorem states that there is a unique triangle-free graph on $2N$ vertices that has $N^{2}$ edges: the complete bipartite graph $K_{N,N}$. Now, consider any labelling of the vertices of $K_{N,N}$. If for every label $x$, the vertices labelled $x$ and $x+1 \bmod 2N$ were on the same side of the bipartition, then all the vertices would be on the same side of the bipartition, which is impossible. Hence, there must be two vertices labelled $x$ and $x+1 \bmod 2N$ on different sides of the bipartition, and therefore there must be an edge between them. Thus, the following problem is total.

Definition 8.7 (Mantel).

The problem Mantel is defined by the relation

Instance:

A Boolean circuit $C\colon\{0,1\}^{2n-2}\to\{0,1\}^{n}\times\{0,1\}^{n}$.

Solution:

One of the following:

  1. i)

    Distinct $i,j,k$ s.t. $C(i),C(j),C(k)$ form a triangle,

  2. ii)

    $i$ s.t. $C(i)=(u,v)$ with $u\geq v$ in the lexicographic order,

  3. iii)

    $i\neq j$ s.t. $C(i)=C(j)$,

  4. iv)

    $i$ s.t. $C(i)=(u,v)$ with $v=u+1 \bmod 2^{n}$ when we consider $u$ and $v$ as integers.
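The totality argument given just before Definition 8.7 can be checked mechanically on small cases. The following Python sketch assumes a labelling is given as a list `side`, where `side[x]` is the side of the bipartition holding the vertex labelled `x`. The function name `crossing_label` is hypothetical.

```python
def crossing_label(side):
    """Return a label x such that the vertices labelled x and x + 1 (modulo
    the number of vertices) lie on different sides, or None if every vertex
    is on the same side."""
    m = len(side)
    for x in range(m):
        if side[x] != side[(x + 1) % m]:
            return x
    return None
```

For instance, `crossing_label([0, 0, 1, 1])` returns `1`, since the vertices labelled 1 and 2 are on different sides.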

Theorem 8.8.

Mantel is PPP-hard.

Proof.

To prove this result, we do the graph-hash product on the complete balanced bipartite graph on $2^{n}$ vertices, where one side of the bipartition consists of the first $2^{n-1}$ vertices in the lexicographic order. We make sure to map $0$ into the edge $(01^{n-1},10^{n-1})$, which is the only edge satisfying $iv)$ in that graph.

Formally, let $C:\{0,1\}^{2n-2}\rightarrow\{0,1\}^{2n-2}$ be an instance of Pigeon.
We define a circuit $C':\{0,1\}^{2n-2}\rightarrow\{0,1\}^{n}\times\{0,1\}^{n}$ as follows. Let $x\in\{0,1\}^{2n-2}$. If $C(x)=0^{2n-2}$, we set $C'(x)=(0\|1^{n-1},1\|0^{n-1})$. If $C(x)=1^{n-1}\|0^{n-1}$, we set $C'(x)=(0^{n},1\|0^{n-1})$. Otherwise, if $C(x)=(u,v)$, we set $C'(x)=(0\|u,1\|v)$. $C'$ has polynomial size and defines an instance of Mantel.

Now, suppose that we have a solution to this instance of Mantel. As in the proof of Theorem 8.3, this solution cannot be of type $i)$ because the graph described by $C'$ is bipartite, hence triangle-free, and it cannot be of type $ii)$ either, by construction. If this solution is of the form $i\neq j$ such that $C'(i)=C'(j)$, then by construction of $C'$ it means that $C(i)=C(j)$, which is a collision for $C$. If this solution is of the form $i$ such that $C'(i)=(u,v)$ with $v=u+1 \bmod 2^{n}$, then by definition of $C'$, it can only be that $C'(i)=(0\|1^{n-1},1\|0^{n-1})$. By construction of $C'$, this means that $C(i)=0^{2n-2}$, hence $i$ is a solution to the original instance of Pigeon. ∎
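The construction of $C'$ in the proof of Theorem 8.8 can be sketched in Python as follows. As an assumption of this sketch, bit strings are represented as integers and $n\geq 2$. The name `mantel_from_pigeon` is hypothetical.

```python
def mantel_from_pigeon(C, n):
    """Turn a Pigeon instance C on 2n - 2 bits into a Mantel instance C'."""
    half = 1 << (n - 1)
    special = (half - 1) << (n - 1)   # the string 1^(n-1) || 0^(n-1)

    def Cp(x):
        w = C(x)
        if w == 0:
            # The zero string is sent to the edge (0 || 1^(n-1), 1 || 0^(n-1)).
            return (half - 1, half)
        if w == special:
            # The displaced string takes the edge (0^n, 1 || 0^(n-1)).
            return (0, half)
        # Otherwise w = u || v is sent to (0 || u, 1 || v).
        return (w >> (n - 1), half | (w & (half - 1)))

    return Cp
```

The two special cases swap the images of $0^{2n-2}$ and $1^{n-1}\|0^{n-1}$ under the map $(u,v)\mapsto(0\|u,1\|v)$, so the map from outputs of $C$ to edges stays injective.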

8.1 Generalization with Turán’s Theorem

Mantel’s theorem investigates the maximum number of edges in a triangle-free graph on $N$ vertices. Similarly, one could wonder about the maximum number of edges in a graph on $N$ vertices that does not contain a clique on $r$ vertices, where $r\geq 3$ is an arbitrary constant. This problem was solved by Turán in 1941.

Classical Theorem 8 (Turán [Tur41]).

If $G=(V,E)$ is a graph on $N=|V|$ vertices that does not contain any $(r+1)$-clique, then $|E|\leq(1-\frac{1}{r})\frac{N^{2}}{2}$ and this bound is tight when $r$ divides $N$.
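The tightness of the bound when $r$ divides $N$ can be checked numerically on small cases. The Python sketch below builds the complete balanced $r$-partite graph and counts its edges. Placing vertex `x` in part `x % r` is an assumption of this sketch (any balanced partition gives the same count), and the function names are hypothetical.

```python
from itertools import combinations


def turan_graph_edges(N, r):
    """Edges (u, v), u < v, of the complete r-partite graph on N vertices.

    Assumption of this sketch: vertex x is placed in part x % r.
    """
    return [(u, v) for u, v in combinations(range(N), 2) if u % r != v % r]


def turan_bound(N, r):
    """The value (1 - 1/r) * N^2 / 2, an exact integer when r divides N."""
    return (r - 1) * N * N // (2 * r)
```

For example, with $N=6$ and $r=3$ both `len(turan_graph_edges(6, 3))` and `turan_bound(6, 3)` evaluate to 12.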

Now, suppose that we are given a list of strictly more than $(1-\frac{1}{r})\frac{N^{2}}{2}$ edges for a graph on $N$ vertices. Then, by Turán’s theorem, if all these edges are distinct, the graph must contain an $(r+1)$-clique. This induces a total search problem, namely that of finding the vertices of such a clique. If the edges are given implicitly via a Boolean circuit which on input $i$ returns the endpoints of the $i$-th edge, we get the following TFNP problem.

Definition 8.9 (weak-Turán$_r$).

The problem weak-Turán$_r$ is defined by the relation

Instance:

A Boolean circuit $C\colon\{0,1\}^{2n-1}\to\{0,1\}^{n}\times\{0,1\}^{n}$.

Solution:

One of the following:

  1. i)

    Distinct $i_{1},i_{2},\ldots,i_{r(r+1)/2}$ such that $C(i_{1}),C(i_{2}),\ldots,C(i_{r(r+1)/2})$ are the edges of an $(r+1)$-clique,

  2. ii)

    $i$ s.t. $C(i)=(u,v)$ with $u\geq v$ in the lexicographic order,

  3. iii)

    $i\neq j$ s.t. $C(i)=C(j)$.

Remark 8.10.

Note that $r$ can be any polynomial in $n$ in the previous definition and it would still define a TFNP problem.

Theorem 8.11.

For every $r_{1}<r_{2}$, there is a reduction from weak-Turán$_{r_{1}}$ to weak-Turán$_{r_{2}}$.

Proof.

Let $C:\{0,1\}^{2n-1}\rightarrow\{0,1\}^{n}\times\{0,1\}^{n}$ be an instance of weak-Turán$_{r_{1}}$. Now, we interpret it as an instance of weak-Turán$_{r_{2}}$. Suppose that we have a solution to this instance of weak-Turán$_{r_{2}}$.
If we have $r_{2}(r_{2}+1)/2$ edges that form an $(r_{2}+1)$-clique, it suffices to remove some of them to get the edges of an $(r_{1}+1)$-clique. Otherwise, any solution of type $ii)$ or $iii)$ for weak-Turán$_{r_{2}}$ immediately translates into a solution of the same type for weak-Turán$_{r_{1}}$. ∎
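The step of removing edges from a large clique to obtain a smaller one can be made concrete. In the Python sketch below, a clique solution is a collection of indices whose images under `C` are the edges of the clique. The function name `shrink_clique` and the choice of keeping the smallest vertices are assumptions of this sketch.

```python
def shrink_clique(indices, C, r1):
    """Given indices whose images under C are the edges of a clique on more
    than r1 + 1 vertices, keep only the indices of the edges spanned by the
    r1 + 1 smallest vertices of that clique."""
    edges = {i: C(i) for i in indices}
    vertices = sorted({w for e in edges.values() for w in e})
    keep = set(vertices[: r1 + 1])
    return [i for i, (u, v) in edges.items() if u in keep and v in keep]
```

For a 4-clique on vertices 0 to 3 and `r1 = 2`, this returns the indices of the three edges of the triangle on vertices 0, 1, 2.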

Theorem 8.12.

For every $r\geq 2$, weak-Turán$_r$ is PWPP-hard.

Proof.

It is enough to notice that weak-Turán$_{2}$ is exactly weak-Mantel, which is PWPP-hard by Theorem 8.3. Then, apply Theorem 8.11. ∎

Theorem 8.13.

For every $r>2$, weak-Turán$_r$ $\in$ PPP.

The proof is analogous to the proof of Theorem 8.4. In this case too, it appears that what makes the problem easier than Pigeon is that we are given too many edges.

Turán’s theorem states that if $r$ divides $N$, there is a unique graph on $N$ vertices that does not contain any $(r+1)$-clique and that has the maximum number of edges. This graph is the complete $r$-partite graph, where each part has size $N/r$. As before, there must be two vertices labelled $x$ and $x+1 \bmod N$ with an edge between them. We denote by $N$ the largest multiple of $r$ that is at most $2^{n}$, and set $M=(1-\frac{1}{r})\frac{N^{2}}{2}$. Thus, the following problem is in TFNP.

Definition 8.14 (Turán$_r$).

The problem Turán$_r$ is defined by the relation

Instance:

The following:

  1. 1.

    A Boolean circuit $C\colon\{0,1\}^{2n-1}\to\{0,1\}^{n}\times\{0,1\}^{n}$; and,

  2. 2.

    Two integers $N$ and $M$.

Solution:

One of the following:

  1. i)

    $0$ if $r$ does not divide $N$, or if $N>2^{n}$, or if $N+r\leq 2^{n}$, or if $M\neq(1-\frac{1}{r})\frac{N^{2}}{2}$,

  2. ii)

    $i$ s.t. $C(i)=(u,v)$ with $u\geq N$ or $v\geq N$, and $i<M$,

  3. iii)

    Distinct $i_{1},i_{2},\ldots,i_{r(r+1)/2}$ such that $C(i_{1}),C(i_{2}),\ldots,C(i_{r(r+1)/2})$ are the edges of an $(r+1)$-clique, and $i_{j}<M$ for every $j$,

  4. iv)

    $i$ s.t. $C(i)=(u,v)$ with $u\geq v$ in the lexicographic order, and $i<M$,

  5. v)

    $i\neq j$ s.t. $C(i)=C(j)$, and $i,j<M$,

  6. vi)

    $i$ s.t. $C(i)=(u,v)$ with $v=u+1 \bmod 2^{n}$ when we consider $u$ and $v$ as integers, and $i<M$.

This last problem is in TFNP. However, we cannot adapt the proof of PPP-hardness of Mantel to it in a straightforward way and, in fact, it is open whether this problem is PPP-hard.
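The parameter check in item i) of Definition 8.14 is simple integer arithmetic. The Python sketch below is one way to write it, under the assumption that $N$ and $M$ are given as ordinary integers. The function names are hypothetical.

```python
def canonical_parameters(n, r):
    """N is the largest multiple of r that is at most 2^n, and
    M = (1 - 1/r) * N^2 / 2, which is an integer because r divides N."""
    N = ((1 << n) // r) * r
    M = (r - 1) * N * N // (2 * r)
    return N, M


def valid_parameters(n, r, N, M):
    """True exactly when none of the conditions of item i) applies."""
    return (
        N % r == 0
        and N <= (1 << n)
        and N + r > (1 << n)
        and 2 * r * M == (r - 1) * N * N
    )
```

With $n=3$ and $r=3$, the canonical parameters are $N=6$ and $M=12$.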

References

  • [Bar73] Zsolt Baranyai. Infinite and finite sets, vol. 1. proceedings of a colloquium held at Keszthely, June 25 – July 1, 1973. Dedicated to Paul Erdős on his 60th Birthday. J. Symb. Log., 1:91–108, 1973.
  • [BJP+19] Frank Ban, Kamal Jain, Christos H. Papadimitriou, Christos-Alexandros Psomas, and Aviad Rubinstein. Reductions in PPP. Inf. Process. Lett., 145:48–52, 2019.
  • [Cay89] Arthur Cayley. A theorem on trees. Quarterly Journal of Mathematics, 23:376–378, 1889.
  • [Cov73] Thomas M. Cover. Enumerative source encoding. IEEE Transactions on Information Theory, 19(1):73–77, 1973.
  • [DGP09] Constantinos Daskalakis, Paul W. Goldberg, and Christos H. Papadimitriou. The complexity of computing a Nash equilibrium. SIAM J. Comput., 39(1):195–259, 2009.
  • [Dil50] Robert P. Dilworth. A decomposition theorem for partially ordered sets. Annals of Mathematics 51, pages 161–166, 1950.
  • [EK99] Ömer Egecioglu and Alastair King. Random walks and Catalan factorization. 1999.
  • [EKR61] Paul Erdős, Chao Ko, and Richard Rado. Intersection theorems for systems of finite sets. The Quarterly Journal of Mathematics, 12(1):313–320, 1961.
  • [ER60] Paul Erdős and Richard Rado. Intersection theorems for systems of sets. Journal of the London Mathematical Society, s1-35(1):85–90, 1960.
  • [Erd47] Paul Erdős. Some remarks on the theory of graphs. Bulletin of the American Mathematical Society, 53(4):292–294, 1947.
  • [FG18] Aris Filos-Ratsikas and Paul W. Goldberg. Consensus halving is PPA-complete. In Ilias Diakonikolas, David Kempe, and Monika Henzinger, editors, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 51–64. ACM, 2018.
  • [HV21] Pavel Hubáček and Jan Václavek. On search complexity of discrete logarithm. In Filippo Bonchi and Simon J. Puglisi, editors, 46th International Symposium on Mathematical Foundations of Computer Science, MFCS 2021, August 23-27, 2021, Tallinn, Estonia, volume 202 of LIPIcs, pages 60:1–60:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
  • [Jeř16] Emil Jeřábek. Integer factoring and modular square roots. J. Comput. Syst. Sci., 82(2):380–394, 2016.
  • [JPY88] David S. Johnson, Christos H. Papadimitriou, and Mihalis Yannakakis. How easy is local search? J. Comput. Syst. Sci., 37(1):79–100, 1988.
  • [KNY19] Ilan Komargodski, Moni Naor, and Eylon Yogev. White-box vs. black-box complexity of search problems: Ramsey and graph property testing. J. ACM, 66(5), 2019.
  • [Kra05] Jan Krajíček. Structured pigeonhole principle, search problems and hard tautologies. J. Symb. Log., 70(2):619–630, 2005.
  • [Man07] Willem Mantel. Problem 28 (Solution by H. Gouwentak, W. Mantel, J. Teixeira de Mattes, F. Schuh and W. A. Wythoff). Wiskundige Opgaven, 18:60–61, 1907.
  • [Meh18] Ruta Mehta. Constant rank two-player games are PPAD-hard. SIAM J. Comput., 47(5):1858–1887, 2018.
  • [Mer79] Ralph Charles Merkle. Secrecy, Authentication, and Public Key Systems. PhD thesis, Stanford, CA, USA, 1979. AAI8001972.
  • [MP91] Nimrod Megiddo and Christos H. Papadimitriou. On total functions, existence theorems and computational complexity. Theor. Comput. Sci., 81(2):317–324, 1991.
  • [Pap94] Christos H. Papadimitriou. On the complexity of the parity argument and other inefficient proofs of existence. J. Comput. Syst. Sci., 48(3):498–532, 1994.
  • [Pru18] Heinz Prüfer. Neuer Beweis eines Satzes über Permutationen. Archiv der Mathematischen Physik, 27:742–744, 1918.
  • [Ram30] Frank P. Ramsey. On a Problem of Formal Logic. Proceedings of the London Mathematical Society, s2-30(1):264–286, 1930.
  • [Spe28] Emanuel Sperner. Ein Satz über Untermengen einer endlichen Menge. Mathematische Zeitschrift, 27(1):544–548, 1928.
  • [SZZ18] Katerina Sotiraki, Manolis Zampetakis, and Giorgos Zirdelis. PPP-completeness with connections to cryptography. In Mikkel Thorup, editor, 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7-9, 2018, pages 148–158. IEEE Computer Society, 2018.
  • [Tur41] Paul Turán. On an extremal problem in graph theory (in Hungarian). Matematikai és Fizikai Lapok, 48:436–452, 1941.
  • [WS95] Coburn Ward and Sandor Szabo. On swell-colored complete graphs. 1995.

Appendix A Efficient algorithm for the explicit Ramsey problem

The following proof of Ramsey’s theorem is folklore. Recall the statement of the theorem:

Ramsey [Ram30]

Any edge-coloring of the complete graph on $n$ vertices with two colors contains a monochromatic clique of size at least $\frac{1}{2}\log n$.

Proof.

Let $G=(V,E)$ be the complete graph on $n$ vertices, and $c:E\rightarrow\{0,1\}$ be a two-coloring of its edges.
Pick an arbitrary vertex $v_{1}\in V$.
$v_{1}$ has $n-1$ adjacent edges so at least $n/2$ of them have the same color by the pigeonhole principle.
Let $c_{1}$ be that color and $V_{1}=\{v\in V\setminus\{v_{1}\},c(v,v_{1})=c_{1}\}$.
Then, $V_{1}$ has at least $n/2$ elements.

Next, pick an arbitrary vertex $v_{2}\in V_{1}$.
There are at least $n/2-1$ edges between $v_{2}$ and another vertex in $V_{1}$. Like before, at least $n/4$ of them have the same color by the pigeonhole principle.
Let $c_{2}$ be that color and $V_{2}=\{v\in V_{1}\setminus\{v_{2}\},c(v,v_{2})=c_{2}\}$.

That way, we proceed to build by induction a finite family of vertices $(v_{i})$, a finite family of colors $(c_{i})$ and a finite family of sets of vertices $(V_{i})$ with the following properties:
• For every $i$, $V_{i}\subset V_{i-1}$.
• For every $i$, $V_{i}$ has size at least $n/2^{i}$.
• For every $i$, $v_{i+1}\in V_{i}$.
• For every $i$ and for every $u\in V_{i}$, we have $c(v_{i},u)=c_{i}$.

In particular, note that the second point implies that we have at least $\log(n)-1$ sets $V_{i}$, thus we can construct at least $\log(n)$ vertices $v_{i}$ (since we need that $V_{i}$ is not empty to build $v_{i+1}$).
This means that we define at least $\log(n)-1$ colors $c_{i}$. By the pigeonhole principle, at least $\log(n)/2$ of them are the same, say color $c\in\{0,1\}$.
Let $k=\log(n)/2$.
Pick $i_{1},i_{2},\ldots,i_{k}$ such that $c_{i_{1}}=c_{i_{2}}=\ldots=c_{i_{k}}=c$.
We claim that the subgraph whose vertices are $v_{i_{1}},v_{i_{2}},\ldots,v_{i_{k}}$ is monochromatic.
Indeed, let $j<l\in[k]$.
Then, $v_{i_{l}}\in V_{i_{l}-1}\subset V_{i_{l}-2}\subset\ldots\subset V_{i_{j}}$, so by the fourth point, we get that $c(v_{i_{j}},v_{i_{l}})=c_{i_{j}}=c$. ∎

Now, note that this proof is constructive and yields an algorithm to find a monochromatic subgraph of size $k=\log(n)/2$ of the complete graph on $n$ vertices.
In this algorithm, we have $\log(n)$ iterations, and each of them can be done in time $O(n)$, so overall we get an algorithm running in $O(n\log(n))$ time.
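A minimal Python sketch of this algorithm is given below. It assumes the coloring is available as a symmetric function `color(u, v)` returning 0 or 1 on vertices `0, ..., n - 1`. The function name `monochromatic_clique` and the tie-breaking choices are assumptions of this sketch.

```python
def monochromatic_clique(n, color):
    """Return (c, vertices) where the listed vertices span a clique whose
    edges all have color c. `color(u, v)` must be symmetric and 0/1-valued."""
    candidates = list(range(n))
    picked = []  # pairs (v_i, c_i) in the order they are chosen
    while candidates:
        v, rest = candidates[0], candidates[1:]
        by_color = ([], [])
        for u in rest:
            by_color[color(v, u)].append(u)
        # Keep the larger color class (pigeonhole), ties go to color 0.
        c = 0 if len(by_color[0]) >= len(by_color[1]) else 1
        picked.append((v, c))
        candidates = by_color[c]
    zeros = [v for v, c in picked if c == 0]
    ones = [v for v, c in picked if c == 1]
    return (0, zeros) if len(zeros) >= len(ones) else (1, ones)
```

Each pass of the loop scans the remaining candidates once, and the candidate set at least roughly halves each time, which matches the iteration count used in the running-time estimate.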