
Threshold phenomena for random discrete structures

Jinyoung Park

The author will be an assistant professor of mathematics at the Courant Institute of Mathematical Sciences, NYU, starting in Fall 2023. Her email address is jinyoungpark@nyu.edu. Her research is supported by NSF grant DMS-2153844.

1 Erdős-Rényi model

To begin, we briefly introduce a model of random graphs. Recall that a graph is a mathematical structure that consists of vertices (nodes) and edges.

Figure 1: A graph

Roughly speaking, a random graph in this article is a graph in which, given a vertex set, the existence of each potential edge is decided at random. We will specifically focus on the Erdős-Rényi random graph (denoted by $G_{n,p}$), which is defined as follows.

Consider $n$ vertices that are labelled from 1 to $n$.


Observe that on those $n$ vertices there are ${n\choose 2}$ potential edges, namely the edges $\{1,2\},\{1,3\},\ldots,\{n-1,n\}$. Given a probability $p\in[0,1]$, include each of the ${n\choose 2}$ potential edges with probability $p$, where the choice for each edge is made independently of the choices for the other edges.
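To make the definition concrete, here is a minimal Python sketch (an illustration added here, not part of the original article) that samples $G_{n,p}$ by flipping an independent $p$-coin for each potential edge; the function name sample_gnp is our own.

```python
import random
from itertools import combinations

def sample_gnp(n, p, seed=None):
    """Sample G(n,p): include each of the C(n,2) potential edges
    on vertices 1..n independently with probability p."""
    rng = random.Random(seed)
    return [e for e in combinations(range(1, n + 1), 2) if rng.random() < p]

edges = sample_gnp(10, 0.3, seed=0)
print(f"{len(edges)} of 45 potential edges present:", edges)
```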

Example 1.1.

As a toy example of the Erdős-Rényi random graph, let’s think about what $G_{n,p}$ looks like when $n=3$ and the value of $p$ varies. First, if $p=1/2$, then $G_{n,p}$ has the probability distribution shown in Figure 2, defined on the collection of eight graphs. Observe that each graph is equally likely (since each potential edge is present with probability $1/2$ independently).

Figure 2: $G_{3,1/2}$

Of course, we will have a different probability distribution if we change the value of $p$. For example, if $p$ is closer to 0, say $0.01$, then $G_{n,p}$ has the distribution shown in Figure 3, where sparser graphs are more likely (as expected). On the other hand, if $p$ is closer to 1, then denser graphs will be more likely.

Figure 3: $G_{3,0.01}$
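As a sanity check on Figures 2 and 3, the following illustrative sketch enumerates all $2^{3}=8$ graphs on 3 vertices and prints the probability $p^{e}(1-p)^{3-e}$ of each (where $e$ is the number of edges), for $p=1/2$ and $p=0.01$.

```python
from itertools import chain, combinations

potential = list(combinations((1, 2, 3), 2))  # {1,2}, {1,3}, {2,3}

def powerset(s):
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

for p in (0.5, 0.01):
    print(f"p = {p}:")
    for graph in powerset(potential):
        e = len(graph)
        # each edge present with prob. p, absent with prob. 1-p, independently
        print(f"  edge set {list(graph)}: probability {p**e * (1 - p)**(3 - e):.6f}")
```

For $p=1/2$ every graph gets probability $1/8$, matching Figure 2.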

In practice, when we consider $G_{n,p}$, $n$ is a large (yet finite) number, and we study the asymptotic behavior as $n$ tends to infinity; $p=p(n)$ is usually a function of $n$ that tends to zero as $n\rightarrow\infty$, for example $p=1/n$ or $p=\log n/n$.

As we saw in Example 1.1, a random graph is a random variable with a certain probability distribution (as opposed to a fixed graph) that depends on the values of $n$ and $p$. Assuming $n$ is given, the structure of $G_{n,p}$ changes as the value of $p$ changes, and in order to understand $G_{n,p}$, we ask questions about its structure such as

What’s the probability that $G_{n,p}$ is connected?

or

What’s the probability that $G_{n,p}$ is planar?

Basically, for any property $\mathcal{F}(=\mathcal{F}_{n})$ of interest, we can ask

What’s the probability that $G_{n,p}$ satisfies property $\mathcal{F}$?

In these questions, we are usually interested in understanding the typical structure/behavior of $G_{n,p}$. Observe that, unless $p=0$ or $1$, there is always a positive probability that all of the edges of $G_{n,p}$ are absent, or that all of them are present (see Examples 1.2 and 1.3). But in this article, we would rather ignore such extreme events, which happen with tiny probability, and focus on properties that $G_{n,p}$ possesses with probability close to 1.

We often use language and tools from probability theory to describe and understand the behavior of $G_{n,p}$. Below we discuss some very basic examples.

We will write $f(n)\ll g(n)$ if $\frac{f(n)}{g(n)}\rightarrow 0$ as $n\rightarrow\infty$.

Example 1.2.

One important object in graph theory is the complete graph, a graph with all the potential edges present. The complete graph on $n$ vertices is denoted by $K_{n}$.


We can easily imagine that, unless $p$ is very close to 1, it is extremely unlikely that $G_{n,p}$ is complete. Indeed,

$$\mathbb{P}(G_{n,p}=K_{n})=p^{n\choose 2}$$

(since we want all the edges present), which tends to 0 unless $1-p$ is of order at most $n^{-2}$.

Example 1.3.

Similarly, we can compute the probability that $G_{n,p}$ is "empty" (let’s denote this by $\emptyset$), meaning that no edges are present. (By definition, $G_{n,p}$ always has $n$ vertices.) The probability of this event is

$$\mathbb{P}(G_{n,p}=\emptyset)=(1-p)^{n\choose 2}.$$

When $p$ is small, $1-p$ is approximately $e^{-p}$, so the above computation tells us that

$$\mathbb{P}(G_{n,p}=\emptyset)\rightarrow\begin{cases}0&\text{ if }p\gg 1/n^{2};\\ 1&\text{ if }p\ll 1/n^{2}.\end{cases}$$
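This dichotomy is easy to check numerically; here is an illustrative sketch (the exponents $-2.5$ and $-1.5$ are chosen by us just to sit on either side of $1/n^{2}$):

```python
from math import comb

for n in (100, 1000, 10000):
    m = comb(n, 2)                        # number of potential edges
    p_small, p_large = n**-2.5, n**-1.5   # p << 1/n^2 and p >> 1/n^2
    print(n, (1 - p_small)**m, (1 - p_large)**m)
# the first probability tends to 1 and the second to 0 as n grows
```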
Example 1.4.

How many edges does $G_{n,p}$ typically have? The natural first step in answering this question is to compute the expected number of edges in $G_{n,p}$. Using linearity of expectation,

$$\begin{split}\mathbb{E}[\text{number of edges in $G_{n,p}$}]&=\sum_{i<j}\mathbb{P}(\text{edge $\{i,j\}$ is present in $G_{n,p}$})\\&=(\text{number of edges in $K_{n}$})\times\mathbb{P}(\text{each edge is present})\\&={n\choose 2}p.\end{split}$$
Remark 1.5.

For example, if $p=1/n$, then the expected number of edges in $G_{n,p}$ is $\frac{n-1}{2}$. But does this really imply that $G_{n,1/n}$ typically has about $\frac{n-1}{2}$ edges? The answer to this question is related to the fascinating topic of "concentration of a probability measure." We will very briefly discuss this topic in Example 2.5.

Example 1.6.

Similarly, we can compute the expected number of triangles (copies of the complete graph $K_{3}$) in $G_{n,p}$:

$$\begin{split}\mathbb{E}[\text{number of triangles in $G_{n,p}$}]&=\sum_{i<j<k}\mathbb{P}(\text{triangle $\{i,j,k\}$ is present in $G_{n,p}$})\\&=(\text{number of triangles in $K_{n}$})\times\mathbb{P}(\text{each triangle is present})\\&={n\choose 3}p^{3}.\end{split}$$

The above computation tells us that

$$\mathbb{E}[\text{number of triangles in $G_{n,p}$}]\rightarrow\begin{cases}0&\text{ if }p\ll 1/n;\\ \infty&\text{ if }p\gg 1/n,\end{cases}$$

from which we can conclude that $G_{n,p}$ is typically triangle-free if $p\ll 1/n$. (If the expectation tends to 0, then there is little chance that $G_{n,p}$ contains a triangle: by Markov’s inequality, $\mathbb{P}(X\geq 1)\leq\mathbb{E}X$ for a nonnegative integer-valued random variable $X$.)
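A quick Monte Carlo check of this expectation computation (an illustrative sketch; the brute-force triangle count below is cubic in $n$, so keep $n$ small):

```python
import random
from itertools import combinations
from math import comb

def triangle_count(n, p, rng):
    """Sample G(n,p) and count its triangles by brute force."""
    edges = {e for e in combinations(range(n), 2) if rng.random() < p}
    return sum(1 for t in combinations(range(n), 3)
               if all(e in edges for e in combinations(t, 2)))

rng = random.Random(1)
n, p, trials = 60, 0.05, 200
avg = sum(triangle_count(n, p, rng) for _ in range(trials)) / trials
print(f"empirical average: {avg:.2f}  vs  (n choose 3) p^3 = {comb(n, 3) * p**3:.2f}")
```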

Remark 1.7.

On the contrary, for $p\gg 1/n$ we cannot conclude from the above expectation computation that $G_{n,p}$ typically contains many triangles. Just think of a lottery with prize money of $10^{1000}$ dollars and a $10^{-100}$ chance of winning to see that a large expectation does not necessarily imply a high chance of the occurrence of an event. In general, showing that a desired structure typically exists in $G_{n,p}$ is a very challenging task, and this became a motivation for the Kahn–Kalai Conjecture that we will discuss in the later sections.

2 Threshold phenomena

One striking thing about $G_{n,p}$ is that the appearance and disappearance of certain properties is abrupt. Probably one of the most well-known examples of the threshold phenomena that $G_{n,p}$ exhibits is the appearance of the giant component. A component of a graph is a maximal connected subgraph. For example, the graph in Figure 4 consists of four components, and the sizes (numbers of vertices) of the components are $1, 2, 6$, and $8$.

Figure 4: A graph that consists of four components

For $G_{n,p}$, observe that when $p=0$, the size of a largest component of $G_{n,p}$ is 1; in this case all of the edges are absent with probability 1, so each of the components is an isolated vertex. On the other hand, when $p=1$, $G_{n,p}$ is the complete graph with probability 1, so the size of its largest component is $n$.

Figure 5: $G_{n,0}$ and $G_{n,1}$

Then what if $p$ is strictly between 0 and 1?

Question 2.1.

What’s the (typical) size of a largest component in $G_{n,p}$?

Of course, one would naturally guess that as $p$ increases from 0 to 1, the typical size of a largest component in $G_{n,p}$ would also increase from 1 to $n$. But what is really interesting here is that there is a "sudden jump" in this increase.

In the following statement and everywhere else, with high probability means that the probability that the event under consideration occurs tends to 1 as $n\rightarrow\infty$.

Theorem 2.2 (Erdős-Rényi [6]).

For any $\varepsilon>0$, the size of a largest component of $G_{n,p}$ is

$$\begin{cases}\leq C_{1}(\varepsilon)\log n&\text{if }np<1-\varepsilon\\ \geq C_{2}(\varepsilon)n&\text{if }np>1+\varepsilon\end{cases}$$

with high probability, where $C_{1}(\varepsilon),C_{2}(\varepsilon)$ depend only on $\varepsilon$.

The above theorem says that if $p$ is "slightly smaller" than $\frac{1}{n}$, then typically all of the components of $G_{n,p}$ are very small (note that $\log n$ is much smaller than the number of vertices, $n$).

Figure 6: $G_{n,p}$ with all components small ($np<1-\varepsilon$)

On the other hand, if $p$ is "slightly larger" than $\frac{1}{n}$, then the size of a largest component of $G_{n,p}$ is as large as linear in $n$. It is also well-known that all other components are very small (at most of order $\log n$), and this unique largest component is called the giant component.

Figure 7: $G_{n,p}$ with the giant component ($np>1+\varepsilon$)

So around the value $p=\frac{1}{n}$, the giant component "suddenly" appears, and therefore the structure of $G_{n,p}$ also drastically changes. This is one example of the threshold phenomena that $G_{n,p}$ exhibits, and the value $p=\frac{1}{n}$ is a threshold function for $G_{n,p}$ having the giant component. (The formal definition of a threshold function is given in Definition 2.3. See also the definition of the threshold in Section 5.)
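The sudden emergence of the giant component is easy to observe in simulation. The sketch below (illustrative only; parameters are ours) samples $G_{n,p}$ on both sides of $p=\frac{1}{n}$ and reports the largest component size, found by depth-first search.

```python
import random
from itertools import combinations

def largest_component(n, p, rng):
    """Sample G(n,p) and return the size of its largest component."""
    adj = [[] for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].append(v)
            adj[v].append(u)
    seen, best = [False] * n, 0
    for s in range(n):
        if not seen[s]:
            stack, size, seen[s] = [s], 0, True
            while stack:
                u = stack.pop()
                size += 1
                for w in adj[u]:
                    if not seen[w]:
                        seen[w] = True
                        stack.append(w)
            best = max(best, size)
    return best

rng = random.Random(0)
n = 2000
for c in (0.5, 1.5):  # np = c: subcritical vs. supercritical
    print(f"np = {c}: largest component has {largest_component(n, c / n, rng)} of {n} vertices")
```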

The abrupt appearance of the giant component of $G_{n,p}$ is just one instance of the vast threshold phenomena for random discrete structures. In this article, we will mostly deal with $G_{n,p}$ for the sake of concreteness, but there will be a brief discussion of a more general setting in Section 5.

Now we introduce the formal definition of a threshold function due to Erdős and Rényi. Recall that, in $G_{n,p}$, all the vertices are labelled $1,\ldots,n$. A graph property is a property that is invariant under graph isomorphisms (i.e., relabelling the vertices), such as $\{\text{connected}\}$, $\{\text{planar}\}$, $\{\text{triangle-free}\}$, etc. We use $\mathcal{F}(=\mathcal{F}_{n})$ for a graph property, and $G_{n,p}\in\mathcal{F}$ denotes that $G_{n,p}$ has property $\mathcal{F}$.

Definition 2.3.

Given a graph property $\mathcal{F}(=\mathcal{F}_{n})$, we say that $p_{0}=p_{0}(n)$ is a threshold function (or simply a threshold) for $\mathcal{F}$ (note: by the definition, a threshold function is determined only up to constant factors and is thus not unique, but conventionally people still call it the threshold function; in this article we will separately define the threshold in Section 5, which is distinguished from a threshold function) if

$$\mathbb{P}(G_{n,p}\in\mathcal{F})\rightarrow\begin{cases}0&\text{ if }p\ll p_{0}\\ 1&\text{ if }p\gg p_{0}.\end{cases}$$

For example, $p_{0}=\frac{1}{n}$ is a threshold function for the existence of the giant component.

Note that it is not at all obvious whether a given graph property admits a threshold function. Erdős and Rényi proved that many graph properties have a threshold function, and about 20 years later, Bollobás and Thomason proved that, in fact, a wide class of properties admit a threshold function. In what follows, an increasing (graph) property is a property that is preserved under the addition of edges. For example, connectivity is an increasing property, because a connected graph remains connected no matter what edges are added.

Theorem 2.4 (Bollobás-Thomason [5]).

Every increasing property has a threshold function.

Now it immediately follows from the above theorem that all the properties we have mentioned so far – connectivity, planarity (more precisely, we apply the theorem to non-planarity, which is an increasing property), having the giant component, etc. – have a threshold function (and thus exhibit a threshold phenomenon). How fascinating!

On the other hand, knowing that a property $\mathcal{F}$ has a threshold function $p_{0}=p_{0}(\mathcal{F})$ does not tell us anything about the value of $p_{0}$. So finding threshold functions for various increasing properties naturally became a central interest in the study of random graphs. One of the most studied classes of increasing properties is subgraph containment, i.e., the question of for which $p=p(n)$ the random graph $G_{n,p}$ is likely/unlikely to contain a copy of a given graph. Figure 8 shows some of the well-known threshold functions for various subgraph containments (and the one for connectivity).

Figure 8: Some well-known thresholds
Example 2.5.

Figure 8 says that $p=\frac{1}{n}$ is a threshold function for the property $\mathcal{F}=\{\text{contain a triangle}\}$. Recall from the definition of a threshold function that this means:

(i) if $p\ll\frac{1}{n}$ then $\mathbb{P}(G_{n,p}\text{ contains a triangle})\rightarrow 0$; and

(ii) if $p\gg\frac{1}{n}$ then $\mathbb{P}(G_{n,p}\text{ contains a triangle})\rightarrow 1$.

We have already justified (i) in Example 1.6 by showing that

$$\mathbb{E}[\text{number of triangles in $G_{n,p}$}]\rightarrow 0\ \text{ if }p\ll\frac{1}{n}.$$

However, showing (ii) is an entirely different story. As discussed in Remark 1.7, the fact that

$$\mathbb{E}[\text{number of triangles in $G_{n,p}$}]\rightarrow\infty$$

does not necessarily imply that $G_{n,p}$ typically contains many triangles. Here we briefly describe one technique, called the second moment method, that can be used to show (ii): let $X$ be the number of triangles in $G_{n,p}$, and note that $X$ is a random variable. By showing that the variance of $X$ is very small, which implies that $X$ is "concentrated around" $\mathbb{E}X$, we can derive (from the fact that $\mathbb{E}X$ is huge) that the number of triangles in $G_{n,p}$ is typically huge. We remark that the second moment method is only a tiny part of the much broader topic of concentration of a probability measure.
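To see this concentration empirically, here is an illustrative sketch (parameters chosen by us only for speed) comparing the mean and standard deviation of the triangle count $X$ for $p$ well above $\frac{1}{n}$; a standard deviation much smaller than the mean indicates concentration, and in particular $X>0$ with high probability.

```python
import random
import statistics
from itertools import combinations

def triangle_count(n, p, rng):
    """Sample G(n,p) and count its triangles by brute force."""
    edges = {e for e in combinations(range(n), 2) if rng.random() < p}
    return sum(1 for t in combinations(range(n), 3)
               if all(e in edges for e in combinations(t, 2)))

rng = random.Random(2)
n, p = 80, 0.1  # p = 0.1 is well above 1/n = 0.0125
samples = [triangle_count(n, p, rng) for _ in range(100)]
mean, sd = statistics.mean(samples), statistics.stdev(samples)
print(f"mean ~ {mean:.1f}, std dev ~ {sd:.1f} (ratio {sd / mean:.2f})")
```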

We stress that, in general, finding a threshold function for a given increasing property is a very hard task. To illustrate this point, let’s consider one of the most basic objects in graph theory, a spanning tree – a tree that contains all of the vertices.

Figure 9: A connected graph and a spanning tree in it

The question of finding a threshold function for $G_{n,p}$ to contain a spanning tree (equivalently, for $G_{n,p}$ to be connected) was one of the first questions studied by Erdős and Rényi. Already in their seminal paper [6], Erdős and Rényi showed that a threshold function for containing a spanning tree is $p_{0}=\frac{\log n}{n}$. However, the difficulty of this problem changes immensely if we require $G_{n,p}$ to contain a specific (up to isomorphism) spanning tree, or more broadly, a specific spanning graph (a graph that contains all of the vertices). For example, one of the biggest open questions in this area back in the 1960s was finding a threshold function for a Hamiltonian cycle (a cycle that contains all of the vertices).

Figure 10: A graph and a Hamiltonian cycle in it

This problem was famously solved by Pósa in 1976.

Theorem 2.6 (Pósa [16]).

A threshold function for $G_{n,p}$ to contain a Hamiltonian cycle is

$$p_{0}(n)=\frac{\log n}{n}.$$

Note that the threshold functions for both $\{\text{contain a spanning tree}\}$ and $\{\text{contain a Hamiltonian cycle}\}$ are of order $\frac{\log n}{n}$, even though the latter is a stronger requirement. Later we will see (in the discussion that follows Example 4.6) that $\frac{\log n}{n}$ is actually an easy lower bound on both threshold functions. It was long conjectured that for any spanning tree (more precisely, for any sequence of spanning trees $\{T_{n}\}$ with a constant maximum degree) the threshold function is of order $\frac{\log n}{n}$; this conjecture was only very recently proved by Montgomery [14].
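The $\frac{\log n}{n}$ threshold for connectivity can also be seen in simulation; here is an illustrative sketch (parameters are ours) estimating $\mathbb{P}(G_{n,p}\text{ is connected})$ at $p=c\log n/n$ for $c$ below and above 1.

```python
import math
import random
from itertools import combinations

def is_connected(n, p, rng):
    """Sample G(n,p) and test connectivity by depth-first search."""
    adj = [[] for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].append(v)
            adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

rng = random.Random(3)
n, trials = 500, 100
for c in (0.5, 2.0):  # p = c log(n)/n: below vs. above the threshold
    p = c * math.log(n) / n
    freq = sum(is_connected(n, p, rng) for _ in range(trials)) / trials
    print(f"c = {c}: empirical P(connected) ~ {freq:.2f}")
```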

3 The Kahn–Kalai Conjecture: a preview

Henceforth, \mathcal{F} always denotes an increasing property.

In 2006, Jeff Kahn and Gil Kalai [12] posed an extremely bold conjecture that captures the location of threshold functions for any increasing properties. Its formal statement will be given in Conjecture 4.11 (graph version) and Theorem 5.7 (abstract version), and in this section we will give an informal description of this conjecture first. All of the terms not defined here will be discussed in the forthcoming sections.

Given an $\mathcal{F}$, we are interested in locating its threshold function $p_{0}(\mathcal{F})$. (We switch the notation from $p_{0}(n)$ to $p_{0}(\mathcal{F})$ to emphasize its dependence on $\mathcal{F}$.) But again, this is in general a very hard task.

Kahn and Kalai introduced another quantity, which they named the expectation threshold and denoted by $p_{\mathbb{E}}(\mathcal{F})$, which is associated with a certain expectation calculation, as its name indicates. By its definition (Definition 4.5),

$$p_{\mathbb{E}}(\mathcal{F})\leq p_{0}(\mathcal{F})\ \text{ for any }\mathcal{F},$$

and, in particular, $p_{\mathbb{E}}(\mathcal{F})$ is easy to compute for many interesting increasing properties $\mathcal{F}$. So $p_{\mathbb{E}}(\mathcal{F})$ provides an "easy" lower bound on the hard parameter $p_{0}(\mathcal{F})$. The really fascinating part is that Kahn and Kalai conjectured that $p_{0}(\mathcal{F})$ is, in fact, bounded above by $p_{\mathbb{E}}(\mathcal{F})$ multiplied by some tiny quantity!


So this conjecture asserts that, for any $\mathcal{F}$, $p_{0}(\mathcal{F})$ is actually well-predicted by the (much) easier $p_{\mathbb{E}}(\mathcal{F})$!

The graph version of this conjecture (Conjecture 4.11) is still open, but the abstract version (Theorem 5.7) was recently proved in [15].

4 Motivating examples

The conjecture of Kahn and Kalai is very strong, and even the authors of the conjecture wrote in their paper [12] that "it would probably be more sensible to conjecture that it is not true." The fundamental question that motivated this conjecture was:

Question 4.1.

What drives thresholds?

All of the examples in this section are carefully chosen to show the motivation behind the conjecture.

Recall that the definition of a threshold function (Definition 2.3) doesn’t distinguish constant factors. So in this section, we will use the convenient notation $\gtrsim,\lesssim$, and $\asymp$ to mean (respectively) $\geq,\leq$, and $=$ up to constant factors. Finally, we write $p_{0}(H)$ for a threshold function for $G_{n,p}$ to contain a copy of $H$, for notational simplicity.

Example 4.2.

Let $H$ be the graph in Figure 11. Let’s find $p_{0}(H)$.

Figure 11: Graph $H$

In Example 2.5, we observed that there is a connection between a threshold function and computing expectations. As we did in Examples 1.4 and 1.6,

$$\begin{split}&\mathbb{E}[\text{number of $H$'s in $G_{n,p}$}]\\&=(\text{number of (labelled) $H$'s in $K_{n}$})\times\mathbb{P}(\text{each (labelled) copy of $H$ is present in $G_{n,p}$})\\&\stackrel{(\dagger)}{\asymp}n^{4}p^{5},\end{split}$$

where $(\dagger)$ holds because the number of $H$'s in $K_{n}$ is of order $n^{4}$ (since $H$ has four vertices), and $\mathbb{P}(\text{each copy of $H$ is present})$ is precisely $p^{5}$ (since $H$ has five edges). So we have

$$\mathbb{E}[\text{number of $H$'s in $G_{n,p}$}]\rightarrow\begin{cases}0&\text{ if }p\ll n^{-4/5};\\ \infty&\text{ if }p\gg n^{-4/5},\end{cases}\qquad(1)$$

and let’s (informally) call the value $p=n^{-4/5}$

"the threshold for the expectation of $H$."

This name makes sense since $p=n^{-4/5}$ is where the expected number of $H$'s drastically changes. Note that (1) tells us that

$$\mathbb{P}(G_{n,p}\supseteq H)\rightarrow 0\ \text{ if }p\ll n^{-4/5},$$

so, by the definition of a threshold, we have

$$n^{-4/5}\lesssim p_{0}(H).$$

In this way, we can always easily find a lower bound on $p_{0}(F)$ for any graph $F$.

What is interesting here is that, for the graph $H$ in Figure 11, we can actually show that

$$\mathbb{P}(G_{n,p}\supseteq H)\rightarrow 1\ \text{ if }p\gg n^{-4/5}$$

using the second moment method (discussed in Example 2.5). This tells us the rather surprising fact that $p_{0}(H)$ is actually equal to the threshold for the expectation of $H$.

Dream. Maybe $p_{0}(F)$ is always equal to the threshold for the expectation of $F$, for any graph $F$?

The next example shows that the above dream is too dreamy to be true.

Example 4.3.

Consider $\tilde{H}$ in Figure 12 this time. Notice that $\tilde{H}$ is the graph $H$ in Figure 11 with a "tail."

Figure 12: Graph $\tilde{H}$

By repeating a computation similar to the one before, we have

$$\mathbb{E}[\text{number of $\tilde{H}$'s in $G_{n,p}$}]\rightarrow\begin{cases}0&\text{ if }p\ll n^{-5/6};\\ \infty&\text{ if }p\gg n^{-5/6},\end{cases}$$

so the threshold for the expectation of $\tilde{H}$ is $n^{-5/6}$. Again, this gives that

$$\mathbb{P}(G_{n,p}\supseteq\tilde{H})\rightarrow 0\ \text{ if }p\ll n^{-5/6},$$

so we have $n^{-5/6}\lesssim p_{0}(\tilde{H})$. However, the actual threshold $p_{0}(\tilde{H})$ is $n^{-4/5}$, which is much larger than this lower bound.

Figure 13: Gap between $p_{0}(\tilde{H})$ and the expectational lower bound

This is interesting, because Figure 13 tells us that when $n^{-5/6}\ll p\ll n^{-4/5}$, $G_{n,p}$ contains a huge number of copies of $\tilde{H}$ "on average," but it is still very unlikely that $G_{n,p}$ actually contains $\tilde{H}$. What happens in this interval?

Here is an explanation. Recall from Example 4.2 that if $p\ll n^{-4/5}$, then $G_{n,p}$ is unlikely to contain $H$. But

the absence of $H$ implies the absence of $\tilde{H}$,

because $H$ is a subgraph of $\tilde{H}$!

So when $n^{-5/6}\ll p\ll n^{-4/5}$, it is highly unlikely that $G_{n,p}$ contains $\tilde{H}$, simply because it is already unlikely that $G_{n,p}$ contains $H$. However, if $G_{n,p}$ happens to contain $H$, then that copy of $H$ typically has lots of "tails" as in Figure 14. This produces a huge number of copies of $\tilde{H}$ in $G_{n,p}$.

Figure 14: $H$ with many "tails"

Maybe you have noticed the similarity between this example and the example of a lottery in Remark 1.7.

In Example 4.3, $p_{0}(\tilde{H})$ is not predicted by the expected number of copies of $\tilde{H}$, so the Dream is broken. However, the example still shows that $p_{0}(\tilde{H})$ is predicted by the expected number of some subgraph of $\tilde{H}$, and, intriguingly, this holds true in general. To give the formal statement, define the density of a graph $F$ by

$$\text{density}(F)=\frac{\text{number of edges of $F$}}{\text{number of vertices of $F$}}.$$

The next theorem tells us the exciting fact that we can find $p_{0}(F)$ by just looking at the densest subgraph of $F$, as long as $F$ is fixed. (For example, a Hamiltonian cycle is not a fixed graph, since it grows as $n$ grows.)

Theorem 4.4 (Bollobás [4]).

For any fixed graph $F$, $p_{0}(F)$ is equal to the threshold for the expectation of the densest subgraph of $F$.

For example, in Example 4.2, the densest subgraph of $H$ is $H$ itself, so $p_{0}(H)$ is determined by the expectation of $H$. This also determines $p_{0}(\tilde{H})$ in Example 4.3, since the densest subgraph of $\tilde{H}$ is again $H$.
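Theorem 4.4 turns the computation of $p_{0}(F)$ for fixed $F$ into a finite optimization: if $d$ is the maximum of $e(F')/v(F')$ over subgraphs $F'\subseteq F$, then solving $n^{v'}p^{e'}=1$ for the maximizing $F'$ gives $p_{0}(F)\asymp n^{-1/d}$. Below is an illustrative brute-force sketch (exponential in the number of edges, so only for small fixed graphs; we take $H$ to be the 4-vertex, 5-edge graph of Example 4.2, i.e., $K_{4}$ minus an edge – an assumption on our part, since Figure 11 is not reproduced here).

```python
from itertools import combinations

def max_subgraph_density(edges):
    """Brute-force max of e(F')/v(F') over nonempty edge subsets F'."""
    best = 0.0
    for k in range(1, len(edges) + 1):
        for sub in combinations(edges, k):
            verts = {v for e in sub for v in e}
            best = max(best, len(sub) / len(verts))
    return best

H = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)]  # 4 vertices, 5 edges
H_tilde = H + [(4, 5)]                        # H with a pendant "tail"
for name, g in (("H", H), ("H~", H_tilde)):
    d = max_subgraph_density(g)
    print(f"{name}: max subgraph density {d}, so p0 ~ n^(-{1 / d:.2f})")
# both print exponent 0.80, matching p0 = n^{-4/5} from Examples 4.2 and 4.3
```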

Motivated by the preceding examples and Theorem 4.4, we give a formal definition of the expectation threshold.

Definition 4.5 (Expectation threshold).

For any graph $F$, the expectation threshold for $F$ is

$$p_{\mathbb{E}}(F)=\min\{p:\mathbb{E}[\text{number of $F^{\prime}$'s in $G_{n,p}$}]\geq 1\ \ \forall F^{\prime}\subseteq F\}.$$

Observe that

$$p_{\mathbb{E}}(F)\lesssim p_{0}(F)\ \text{ for any }F,\qquad(2)$$

and in particular, Theorem 4.4 gives that

$$p_{\mathbb{E}}(F)\asymp p_{0}(F)\ \text{ for any fixed }F.$$

Note that this gives a beautiful answer to Question 4.1 whenever $\mathcal{F}$ is the property of containing a fixed graph.

Example 4.6.

Theorem 4.4 characterizes threshold functions for all fixed graphs. To extend our exploration, in this example we consider a graph that grows as $n$ grows. We say a graph $M$ is a matching if $M$ is a disjoint union of edges; $M$ is a perfect matching if $M$ is a matching that contains all the vertices. Write PM for perfect matching.

Figure 15: A matching (above) and a perfect matching (below)

Keeping Question 4.1 in mind, let’s first check the validity of Theorem 4.4 for a perfect matching (we assume $2\,|\,n$ to avoid a trivial obstruction to having a perfect matching), which is not a fixed graph. By repeating a computation similar to the one before, we obtain that

$$\mathbb{E}[\text{number of PM's in $G_{n,p}$}]\asymp(n/e)^{n/2}p^{n/2},$$

which tends to 0 if $p\ll 1/n$. In fact, it is easy to compute (by considering all subgraphs of a perfect matching) that $p_{\mathbb{E}}(\text{PM})\asymp 1/n$, so by (2),

$$p_{0}(\text{PM})\gtrsim 1/n.$$

However, unlike threshold functions for fixed graphs, $p_{0}(\text{PM})$ is not equal to $p_{\mathbb{E}}(\text{PM})$; it was proved by Erdős and Rényi that

$$p_{0}(\text{PM})\asymp\frac{\log n}{n}\ (\gg p_{\mathbb{E}}(\text{PM})).\qquad(3)$$
Figure 16: Gap between $p_{0}(\text{PM})$ and $p_{\mathbb{E}}(\text{PM})$

Notice that what happens in the gap in Figure 16 is fundamentally different from what happens in Figure 13. When $\frac{1}{n}\ll p\ll\frac{\log n}{n}$, $G_{n,p}$ contains huge numbers of PMs, and of all their subgraphs, "on average." This means that when $p\gg 1/n$, the absence of a subgraph of a PM is not what prevents $G_{n,p}$ from having a PM. Then what happens here, and what’s the real obstruction?

It turns out that we have

$$p_{0}(\text{PM})\gtrsim\frac{\log n}{n}$$

for a very simple reason: the existence of an isolated vertex (a vertex not contained in any edge) in $G_{n,p}$. It is well-known that if $p\ll\frac{\log n}{n}$, then $G_{n,p}$ contains an isolated vertex with high probability (this phenomenon is elaborated in Example 4.7). Of course, if a graph has an isolated vertex, then it cannot contain a perfect matching.

So (3) expresses the very compelling fact that once $p$ is large enough that $G_{n,p}$ avoids isolated vertices, $G_{n,p}$ contains a perfect matching!

The existence of an isolated vertex in $G_{n,p}$ is essentially equivalent to the Coupon Collector’s Problem:

Example 4.7 (Coupon-collector).

Each box of cereal contains a random coupon, and there are $n$ different types of coupons. If all coupons are equally likely, then how many boxes of cereal do we (typically) need to buy to collect all $n$ coupons?

The well-known answer to this question is that we need to buy $\gtrsim n\log n$ boxes of cereal. This phenomenon can be translated to $G_{n,p}$ in the following way: the $n$ vertices of $G_{n,p}$ are regarded as coupons, and a vertex is regarded as "collected" if it is contained in some (random) edge of $G_{n,p}$. Note that if $p\ll\frac{\log n}{n}$, then the number of edges in $G_{n,p}$ is typically ${n\choose 2}p\ll n\log n$, and then the Coupon Collector’s Problem says that there is typically an "uncollected coupon," which is an isolated vertex.
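An illustrative simulation of the coupon-collector bound; the expected number of boxes is exactly $nH_{n}=n(\log n+\gamma+o(1))$, so we compare against $n\log n$.

```python
import math
import random

def boxes_needed(n, rng):
    """Buy boxes until all n coupon types have been seen."""
    seen, boxes = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        boxes += 1
    return boxes

rng = random.Random(4)
n, trials = 1000, 50
avg = sum(boxes_needed(n, rng) for _ in range(trials)) / trials
print(f"empirical average: {avg:.0f}  vs  n log n ~ {n * math.log(n):.0f}")
```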

Observe that, in Example 4.6, the "coupon-collector behavior" of $G_{n,p}$ provides another lower bound on $p_{0}(\text{PM})$, pushing up the first lower bound, $p_{\mathbb{E}}(\text{PM})$, by a factor of $\log n$. And it turns out that this second (better) lower bound is equal to the threshold.

Lower bounds: $p_{0}\gtrsim p_{\mathbb{E}}$ (expectation threshold) and $p_{0}\gtrsim p_{\mathbb{E}}\log n$ (coupon collector). Threshold: $p_{0}\asymp p_{\mathbb{E}}\log n$.

Figure 17: Bounds on $p_{0}(\text{PM})$

Hitting time. Again, the existence of an isolated vertex trivially blocks a graph from containing any spanning subgraph. In Example 4.6, we observed the compelling phenomenon that if $p$ is large enough that $G_{n,p}$ typically avoids isolated vertices, then for those $p$, $G_{n,p}$ contains a perfect matching with high probability. Does this mean that, for $G_{n,p}$, isolated vertices are the only barriers to the existence of spanning subgraphs?

To investigate this question, we consider a random process defined as below. Consider a sequence of graphs on $n$ vertices

$$G_{0}=\emptyset,\ G_{1},\ G_{2},\ \ldots,\ G_{n\choose 2}=K_{n},$$

where $G_{m+1}$ is obtained from $G_{m}$ by adding a random edge.

Figure 18: Random process

Then $G_{m}$, the $m$-th graph in this sequence, is a random variable that is uniformly distributed among all the graphs on $n$ vertices with $m$ edges. The next theorem tells us that, indeed, isolated vertices are the obstruction to a random graph having a perfect matching.

Theorem 4.8 (Erdős-Rényi [7]).

Let $m_{0}$ denote the first time that $G_{m}$ contains no isolated vertices. Then, with high probability, $G_{m_{0}}$ contains a perfect matching.

We remark that Theorem 4.8 gives much more precise information about $p_{0}(\text{PM})$ (back in the $G_{n,p}$ setting). For example, Theorem 4.8 implies:

Theorem 4.9.

Let $p=\frac{\log n+c_{n}}{n}$. Then

$$\lim_{n\rightarrow\infty}\mathbb{P}(G_{n,p}\supseteq\text{PM})=\begin{cases}0&\text{ if }c_{n}\rightarrow-\infty\\ e^{-e^{-c}}&\text{ if }c_{n}\rightarrow c\\ 1&\text{ if }c_{n}\rightarrow\infty.\end{cases}$$
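The double-exponential limit in Theorem 4.9 is also the classical limiting probability that $G_{n,p}$ has no isolated vertices at $p=\frac{\log n+c}{n}$, which is exactly the isolated-vertex statement that Theorem 4.8 couples to the perfect matching. Here is an illustrative numeric check of that isolated-vertex statement (testing the full matching property would need a blossom-algorithm implementation, which we omit):

```python
import math
import random
from itertools import combinations

def no_isolated_vertex(n, p, rng):
    """Sample G(n,p) and check that every vertex lies in some edge."""
    deg = [0] * n
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            deg[u] += 1
            deg[v] += 1
    return all(d > 0 for d in deg)

rng = random.Random(5)
n, c, trials = 500, 0.0, 200
p = (math.log(n) + c) / n
freq = sum(no_isolated_vertex(n, p, rng) for _ in range(trials)) / trials
print(f"empirical: {freq:.2f}  vs  e^(-e^(-c)) = {math.exp(-math.exp(-c)):.2f}")
```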

We observe a similar phenomenon for Hamiltonian cycles. Notice that in order for a graph to contain a Hamiltonian cycle, a minimum requirement is that every vertex is contained in at least two edges. The next theorem tells us that, again, this naive requirement is essentially the only barrier.

Theorem 4.10 (Ajtai-Komlós-Szemerédi [1], Bollobás [3]).

Let $m_{1}$ denote the first time that every vertex in $G_{m}$ is contained in at least two edges. Then, with high probability, $G_{m_{1}}$ contains a Hamiltonian cycle.

Returning to Question 4.1, so far we have identified two factors that affect threshold functions. We first observed that $p_{\mathbb{E}}$ always gives a lower bound on $p_{0}$. We then observed that, when it applies, the coupon-collector behavior of $G_{n,p}$ pushes up this expectational lower bound by a factor of $\log n$. Conjecture 4.11 below dauntingly proposes that there are no other factors that affect thresholds.

Conjecture 4.11 (Kahn-Kalai [12]).

For any graph $F$,

$$p_{0}(F)\lesssim p_{\mathbb{E}}(F)\log n.$$

Conjecture 4.11 is still wide open, even now that the "abstract version" of this conjecture has been proved. We close this section with a very interesting example in which $p_{0}$ lies strictly between $p_{\mathbb{E}}$ and $p_{\mathbb{E}}\log n$. A triangle factor is a (vertex-)disjoint union of triangles that contains all the vertices.

Figure 19: A triangle factor

The question of a threshold function for a triangle factor (or, more generally, a $K_{r}$-factor for any fixed $r$) was famously solved by Johansson, Kahn, and Vu [10]. Observe that an obvious obstruction to a graph having a triangle factor is the existence of a vertex that is not contained in any triangle. The result below is the hitting time version of [10], obtained by combining [11] and [9].

Theorem 4.12.

Let $m_{2}$ denote the first time that every vertex in $G_{m}$ is contained in at least one triangle. Then, with high probability, $G_{m_{2}}$ contains a triangle factor.

The above theorem implies that

$$p_{0}(\text{triangle factor})\asymp p_{\mathbb{E}}(\text{triangle factor})\cdot(\log n)^{1/3}.$$

5 The Expectation Threshold Theorem

The abstract version of the Kahn–Kalai Conjecture, which is the main content of this section, was recently proved in [15]. We remark that the discussion in this section is no longer restricted to the language of graph theory.

We introduce some more definitions for this general setting. Given a finite set $X$, the $p$-biased product probability measure, $\mu_{p}$, on $2^{X}$ is defined by

$$\mu_{p}(A)=p^{|A|}(1-p)^{|X\setminus A|}\quad\forall A\subseteq X.$$

We use $X_{p}$ for the random variable whose law is

$$\mathbb{P}(X_{p}=A)=\mu_{p}(A)\quad\forall A\subseteq X.$$

In other words, $X_{p}$ is a "$p$-random subset" of $X$: $X_{p}$ contains each element of $X$ with probability $p$ independently.
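Sampling $X_{p}$ in this abstract setting is just as simple as sampling $G_{n,p}$; here is a minimal illustrative sketch for an arbitrary finite ground set (the helper name is ours):

```python
import random

def sample_Xp(X, p, seed=None):
    """Return a p-random subset of the finite set X: each element
    of X is included independently with probability p."""
    rng = random.Random(seed)
    return {x for x in sorted(X) if rng.random() < p}

print(sample_Xp({"a", "b", "c", "d", "e"}, 0.5, seed=0))
```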

Example 5.1.

If $X={[n]\choose 2}$, then

$$X_{p}=G_{n,p}.$$

So $G_{n,p}$ is a special case of the random model $X_{p}$.

We redefine increasing property in our new set-up. A property is a subset of $2^{X}$, and $\mathcal{F}\subseteq 2^{X}$ is an increasing property if

$$B\supseteq A\in\mathcal{F}\ \Rightarrow\ B\in\mathcal{F}.$$

Informally, a property is increasing if we cannot "destroy" this property by adding elements. Note that in this new definition, $\mathcal{F}$ is not required to possess the strong symmetry of increasing graph properties; for example, there is no longer a requirement like "invariant under graph isomorphisms."

Observe that $\mu_{p}(\mathcal{F})\ (:=\sum_{A\in\mathcal{F}}\mu_{p}(A)=\mathbb{P}(X_{p}\in\mathcal{F}))$ is a polynomial in $p$, and thus continuous. Furthermore, it is a well-known fact that $\mu_{p}(\mathcal{F})$ is strictly increasing in $p$ unless $\mathcal{F}=\emptyset$ or $2^{X}$ (see Figure 20). For the rest of this section, we always assume $\mathcal{F}\neq\emptyset,2^{X}$.

Figure 20: $\mu_{p}(\mathcal{F})$ for $p\in[0,1]$, and $p_{c}(\mathcal{F})$

Because $\mu_{p}(\mathcal{F})$ is continuous and strictly increasing in $p$, there exists a unique $p_{c}(\mathcal{F})$ for which $\mu_{p_{c}}(\mathcal{F})=1/2$. This $p_{c}(\mathcal{F})$ is called the threshold for $\mathcal{F}$.

Remark 5.2.

The definition of $p_{c}(\mathcal{F})$ does not require sequences. Incidentally, by Theorem 2.4, for any increasing graph property $\mathcal{F}(=\mathcal{F}_{n})$, $p_{c}(\mathcal{F})$ is an (Erdős-Rényi) threshold function for $\mathcal{F}$.

For a general increasing property $\mathcal{F}\subseteq 2^{X}$, the definition of $p_{\mathbb{E}}$ is no longer applicable. Kahn and Kalai introduced the following generalized notion of the expectation threshold, which was also introduced by Talagrand [17].

Definition 5.3.

Given a finite set $X$ and an increasing property $\mathcal{F}\subseteq 2^{X}$, $q(\mathcal{F})$ is the maximum of $q\in[0,1]$ for which there exists $\mathcal{G}\subseteq 2^{X}$ satisfying the following two properties:

(a) each $A\in\mathcal{F}$ contains some member of $\mathcal{G}$;

(b) $\sum_{S\in\mathcal{G}}q^{|S|}\leq 1/2$.

A family $\mathcal{G}\subseteq 2^{X}$ that satisfies (a) is called a cover of $\mathcal{F}$.
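Any explicit cover $\mathcal{G}$ certifies a lower bound on $q(\mathcal{F})$: since $\sum_{S\in\mathcal{G}}q^{|S|}$ is increasing in $q$, the largest $q$ admissible for that cover can be found by bisection. An illustrative sketch (the helper is our own; the cover is described by how many sets of each size it contains):

```python
def max_q_for_cover(size_counts, tol=1e-12):
    """Largest q in [0,1] with sum_{S in G} q^{|S|} <= 1/2, where
    size_counts maps a set size |S| to the number of such S in G."""
    def total(q):
        return sum(cnt * q**s for s, cnt in size_counts.items())
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) <= 0.5:
            lo = mid
        else:
            hi = mid
    return lo

# Cover of F = {contain a triangle} by all triangles in K_n (each has 3 edges):
n = 1000
q = max_q_for_cover({3: n * (n - 1) * (n - 2) // 6})
print(f"q ~ {q:.2e}  (compare the triangle threshold 1/n = {1 / n:.2e})")
```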

Remark 5.4.

The definition of $q(\mathcal{F})$ eliminates the "symmetry" requirement – which seems very natural (and easier to deal with) in the context of thresholds for subgraph containment – from the definition of $p_{\mathbb{E}}$. It is worth noting that this flexibility is used crucially in the proof of Theorem 5.7 in [15].

The next proposition says that $q(\mathcal{F})$ still provides a lower bound on the threshold.

Proposition 5.5.

For any finite set $X$ and increasing property $\mathcal{F}\subseteq 2^{X}$,

$$q(\mathcal{F})\leq p_{c}(\mathcal{F}).$$

Justification.

Write $q=q(\mathcal{F})$, and let $\mathcal{G}$ be a cover of $\mathcal{F}$ witnessing this value of $q$ in Definition 5.3. By the definition of $p_{c}(\mathcal{F})$, it suffices to show that $\mu_{q}(\mathcal{F})\leq 1/2$. We have

$$\begin{split}\mu_{q}(\mathcal{F})\leq\sum_{S\in\mathcal{G}}\sum_{S\subseteq A\in\mathcal{F}}\mu_{q}(A)&\leq\sum_{S\in\mathcal{G}}\sum_{B\supseteq S}\mu_{q}(B)\\&=\sum_{S\in\mathcal{G}}q^{|S|}\leq 1/2,\end{split}$$

where the first inequality uses the fact that $\mathcal{G}$ covers $\mathcal{F}$. ∎

For a graph $F$, write $\mathcal{F}_{F}$ for the increasing graph property of containing a copy of $F$. The example below illustrates the relationship between $p_{\mathbb{E}}(F)$ and $q(\mathcal{F}_{F})$.

Example 5.6 (Example 4.3 revisited).

For $X={[n]\choose 2}$ (so $X_{p}=G_{n,p}$) and the increasing property $\mathcal{F}=\{\text{contain }\tilde{H}\}\ (\subseteq 2^{X})$,

$$\mathcal{G}_{1}:=\{\text{all the (labelled) copies of $\tilde{H}$ in $K_{n}$}\}$$

is a cover of $\mathcal{F}$. The left-hand side of Definition 5.3(b) is

$$\sum_{S\in\mathcal{G}_{1}}q^{|S|}=(\text{number of $\tilde{H}$'s in $K_{n}$})\times\mathbb{P}(\text{each copy of $\tilde{H}$ is present in $G_{n,q}$}),$$

which is precisely the expected number of $\tilde{H}$'s in $G_{n,q}$. Combined with Proposition 5.5, the above computation gives that $n^{-5/6}\lesssim p_{c}(\mathcal{F})$.

On the other hand, we have (implicitly) discussed in Example 4.3 that there is another cover that gives a better lower bound than that of $\mathcal{G}_{1}$; if we take

$$\mathcal{G}_{2}:=\{\text{all the (labelled) copies of $H$ in $K_{n}$}\},$$

then the computation in Definition 5.3(b) gives that $n^{-4/5}\lesssim p_{c}(\mathcal{F})$.

The above discussion shows that, for any (not necessarily fixed) graph $F$,

$$p_{\mathbb{E}}(F)\lesssim q(\mathcal{F}_{F})$$

(whether $p_{\mathbb{E}}(F)\asymp q(\mathcal{F}_{F})$ holds is unknown). The abstract version of the Kahn–Kalai Conjecture is similar to its graph version, with $p_{\mathbb{E}}$ replaced by $q(\mathcal{F})$. This is what is proved in [15].

Theorem 5.7 (Park-Pham [15], conjectured in [12, 17]).

There exists a constant $K$ such that for any finite set $X$ and increasing property $\mathcal{F}\subseteq 2^{X}$,

$$p_{c}(\mathcal{F})\leq Kq(\mathcal{F})\log\ell(\mathcal{F}),$$

where $\ell(\mathcal{F})$ is the size of a largest minimal element of $\mathcal{F}$.

Theorem 5.7 is extremely powerful; for instance, its immediate consequences include historically difficult results such as the resolutions of Shamir’s Problem [10] and the "Tree Conjecture" [14]. Here we just mention one smaller consequence:

Example 5.8.

If $F$ is a fixed graph, then $\ell(\mathcal{F}_{F})$ is the number of edges in $F$, thus a constant. So in this case Theorem 5.7 says $p_{c}(\mathcal{F})\asymp q(\mathcal{F})$, which recovers Theorem 4.4.

Sunflower Conjecture, and "fractional" Kahn-Kalai. The proof of Theorem 5.7 is strikingly easy given its powerful consequences. The approach is inspired by remarkable work of Alweiss, Lovett, Wu, and Zhang [2] on the Erdős-Rado Sunflower Conjecture, which seemingly has no connection to threshold phenomena. This totally unexpected connection was first exploited by Frankston, Kahn, Narayanan, and the author in [8], where a "fractional" version of the Kahn-Kalai Conjecture (conjectured by Talagrand [17]) was proved, illustrating how two seemingly unrelated fields of mathematics can be nicely connected!

Note that $q(\mathcal{F})$ is in general hard to compute. For instance, in Example 4.3, we can estimate $p_{\mathbb{E}}(\tilde{H})$ by finding $F\subseteq\tilde{H}$ with the maximum $e(F)/v(F)$. On the other hand, to compute $q(\mathcal{F}_{\tilde{H}})$, we would in principle have to consider all possible covers of $\mathcal{F}_{\tilde{H}}$, which is typically not feasible. The good news is that there is a convenient way to find an upper bound on $q(\mathcal{F})$, which is often of the correct order. Namely, Talagrand [17] introduced a notion of fractional expectation threshold, $q_{f}(\mathcal{F})$, satisfying

$$q(\mathcal{F})\leq q_{f}(\mathcal{F})\leq p_{c}(\mathcal{F})$$

for any increasing property $\mathcal{F}$. He conjectured (and it was proved in [8]) that the abstract Kahn-Kalai Conjecture (now Theorem 5.7) holds with $q_{f}(\mathcal{F})$ in place of $q(\mathcal{F})$. This puts us in linear programming territory: by LP duality, a bound $q_{f}(\mathcal{F})\leq\alpha$ ($\alpha\in[0,1]$) is essentially equivalent to the existence of an "$\alpha$-spread" probability measure on $\mathcal{F}$. In all applications of Theorem 5.7 to date, what is actually used to upper bound $q(\mathcal{F})$ is an appropriately spread measure. (The problem of constructing well-spread measures is receiving growing attention; see e.g. [13] for a start.) So all these applications actually follow from the weaker Talagrand version.

We close this article with a very interesting conjecture of Talagrand [17] that would imply the equivalence of Theorem 5.7 and its fractional version:

Conjecture 5.9.

There exists a constant $K$ such that for any finite set $X$ and increasing property $\mathcal{F}\subseteq 2^{X}$,

$$q_{f}(\mathcal{F})\leq Kq(\mathcal{F}).$$

Acknowledgement. The author is grateful to Jeff Kahn for his helpful comments.

References

• [1] M. Ajtai, J. Komlós, and E. Szemerédi, The first occurrence of Hamilton cycles in random graphs, Annals of Discrete Mathematics, 27 (1985), 173-178. MR0821516
  • [2] R. Alweiss, S. Lovett, K. Wu, and J. Zhang, Improved bounds for the sunflower lemma, Ann. of Math. (2) 194 (2021), no. 3, 795–815. MR4334977
  • [3] B. Bollobás, The evolution of sparse graphs, Graph theory and combinatorics, Cambridge (1983), 35–57. MR0777163
  • [4] B. Bollobás, Random Graphs, second ed., Cambridge Studies in Advanced Mathematics, vol. 73, Cambridge University Press, Cambridge (2001). MR1864966
  • [5] B. Bollobás and A. Thomason, Threshold functions, Combinatorica 7 (1987), 35-38. MR0905149
  • [6] P. Erdős and A. Rényi, On the evolution of random graphs, Publ Math Inst Hungar Acad Sci 5 (1960), 17-61. MR0125031
  • [7] P. Erdős and A. Rényi, On the existence of a factor of degree one of a connected random graph, Acta. Math. Acad. Sci. Hungar. 17 (1966) 359-368. MR0200186
  • [8] K. Frankston, J. Kahn, B. Narayanan, and J. Park, Thresholds versus fractional expectation-thresholds, Ann. of Math. (2) 194 (2021), no. 2, 475-495. MR4298747
• [9] A. Heckel, M. Kaufmann, N. Müller, and M. Pasch, The hitting time of clique factors, Preprint, arXiv:2302.08340
  • [10] A. Johansson, J. Kahn, and V. Vu, Factors in random graphs, Random Structures Algorithms 33 (2008), 1–28. MR2428975
  • [11] J. Kahn, Hitting times for Shamir’s problem, Transactions of the American Mathematical Society 375, no. 01 (2022), 627-668. MR4358678
  • [12] J. Kahn and G. Kalai, Thresholds and expectation thresholds, Combin. Probab. Comput. 16 (2007), 495-502. MR2312440
• [13] D. Kang, T. Kelly, D. Kühn, A. Methuku, and D. Osthus, Thresholds for Latin squares and Steiner triple systems: Bounds within a logarithmic factor, Transactions of the American Mathematical Society, to appear.
  • [14] R. Montgomery, Spanning trees in random graphs, Adv. Math. 356 (2019), 106793, 92. MR3998769
  • [15] J. Park and H. T. Pham, A proof of the Kahn–Kalai Conjecture, Journal of the American Mathematical Society, to appear.
  • [16] L. Pósa, Hamiltonian circuits in random graphs, Discrete Math. 14 (1976), 359-364. MR0389666
  • [17] M. Talagrand, Are many small sets explicitly small?, STOC’10 – Proceedings of the 2010 ACM International Symposium on Theory of Computing, 13-35, ACM, New York, 2010. MR2743011