How to determine if a random graph with a fixed degree sequence has a giant component

Felix Joos School of Mathematics, University of Birmingham, Birmingham, B15 2TT, UK. E-mail: f.joos@bham.ac.uk. This coauthor was supported by the EPSRC, grant no. EP/M009408/1.    Guillem Perarnau School of Mathematics, University of Birmingham, Birmingham, B15 2TT, UK. E-mail: p.melliug@gmail.com. This coauthor was supported by the European Research Council, ERC Grant Agreement no. 306349.    Dieter Rautenbach Institute of Optimization and Operations Research, Ulm University, Ulm, Germany. E-mail: dieter.rautenbach@uni-ulm.de    Bruce Reed CNRS, France; Kawarabayashi Large Graph Project, NII, Japan; IMPA, Brasil; E-mail: breed@sophia.inria.fr
Abstract

For a fixed degree sequence $\mathcal{D}=(d_{1},\dots,d_{n})$, let $G(\mathcal{D})$ be a uniformly chosen (simple) graph on $\{1,\dots,n\}$ where the vertex $i$ has degree $d_{i}$. In this paper we determine whether $G(\mathcal{D})$ has a giant component with high probability, essentially imposing no conditions on $\mathcal{D}$. We simply insist that the sum of the degrees in $\mathcal{D}$ which are not $2$ is at least $\lambda(n)$ for some function $\lambda$ going to infinity with $n$. This is a relatively minor technical condition, and when $\mathcal{D}$ does not satisfy it, both the probability that $G(\mathcal{D})$ has a giant component and the probability that $G(\mathcal{D})$ has no giant component are bounded away from $1$.

1 Introduction

The traditional Erdős-Rényi model of a random network is of little use in modelling the type of complex networks which modern researchers study. Such a graph can be constructed by adding edges one by one such that in every step, every pair of non-adjacent vertices is equally likely to be connected by the new edge. However, 21st century networks are of diverse nature and usually exhibit inhomogeneity among their nodes and correlations among their edges. For example, we observe empirically in the web that certain authoritative pages will have many more links entering them than typical ones. This motivates the study, for a fixed degree sequence $\mathcal{D}=(d_{1},\dots,d_{n})$, of a uniformly chosen simple graph $G(\mathcal{D})$ on $\{1,\dots,n\}$ where the vertex $i$ has degree $d_{i}$. In this paper, we study the existence of a giant component in $G(\mathcal{D})$.

A heuristic argument suggests that a giant component will exist provided that the sum of the squares of the degrees is larger than twice the sum of the degrees. In 1995, Molloy and Reed essentially proved this to be the case provided that the degree sequence under consideration satisfied certain technical conditions [24]. This work has attracted considerable attention and has been applied to random models of a wide range of complex networks such as the World Wide Web or biological systems operating at a sub-molecular level [1, 2, 4, 27, 28]. Furthermore, many authors have obtained related results which formalize the Molloy-Reed heuristic argument under different sets of technical conditions [5, 16, 18, 21, 25].

Unfortunately, these technical conditions do not allow the application of such results to many degree sequences that describe real-world networks. While these conditions are of different nature, here we exemplify their limitations with a well-known example, scale-free networks. A network is scale-free if its degree distribution follows a power law, governed by a specific exponent. It is well known that many real-world networks are scale-free, and one of the main research topics in this area is to determine the exponent of a particular network. It has been observed that many scale-free networks have a fat-tailed power-law degree distribution with exponent between $2$ and $3$. This is the case of the World Wide Web, where the exponent is between $2.15$ and $2.2$ [9], or the Movie Actor network, with exponent $2.3$ [3]. In scale-free networks with exponents between $2$ and $3$, the vertices of high degree (called hubs) have a crucial role in several of the network properties, such as the “small-world” phenomenon. However, one of the many technical conditions under which the previous results on the existence of a giant component in $G(\mathcal{D})$ hold is that the vertices of high degree do not have a large impact on the structure of the graph. (In particular, it is required that there is no mass of edges in vertices of non-constant degree.) Hence, these results often cannot be directly applied to real-world networks where hubs are present, and for each particular network ad-hoc approaches are needed (see e.g. the Aiello-Chung-Lu model for the case of scale-free networks [1]).

Another problem is that all the previous results apply to a sequence of degree sequences $(\mathcal{D}_{n})_{n\geq 1}$ satisfying various technical conditions instead of a single degree sequence $\mathcal{D}$. Finally, none of the previous results on the existence of a giant component in $G(\mathcal{D})$ covers degree sequences where most of the vertices have degree $2$.

In this paper we characterize when $G(\mathcal{D})$ has a giant component for every feasible degree sequence $\mathcal{D}$ of length $n$. (A degree sequence is feasible if there is a graph with the given degree sequence.) We only require that the sum of the degrees in the sequence which are not $2$ is at least $\lambda(n)$ for some arbitrary function $\lambda$ going to infinity with $n$. Besides the fact that it is a relatively minor technical condition, we also show that if it is not satisfied, both the probability that $G(\mathcal{D})$ has a giant component and the probability that $G(\mathcal{D})$ has no giant component are bounded away from $1$.

It turns out that the heuristic argument which was used in [24] to describe the existence of a giant component in $G(\mathcal{D})$ for degree sequences satisfying some technical conditions, and which was generalized in the subsequent papers [5, 16, 18, 21, 25], actually suggests the wrong answer for general degree sequences. Precisely, if we let $S$ be a smallest set such that (i) no vertex outside of $S$ has degree bigger than a vertex in $S$, and (ii) the sum of the squares of the degrees of the vertices outside of $S$ is at most twice the sum of their degrees, then whether or not a giant component exists depends on the sum of the degrees of the vertices in $S$, not on the sum of the squares of the degrees of the vertices in $S$ as suggested by this heuristic argument.

This new unified criterion on the existence of a giant component in $G(\mathcal{D})$ is valid for every sequence $\mathcal{D}$ and implies all the previous results on the topic, both for arbitrary degree sequences [5, 18, 24] and for particular models [1].

1.1 The Molloy-Reed Approach

Let us first describe the result of Molloy and Reed [24]. For every $1\leq i\leq n$, one can explore the component containing a specific initial vertex $i$ of a graph on $\{1,\dots,n\}$ via breadth-first search. Initially we have $d_{i}$ “open” edges out of $i$. Upon exposing the other endpoint $j$ of such an open edge, it is no longer open, but we gain $d_{j}-1$ open edges out of $j$. Thus, the number of open edges has increased by $d_{j}-2$ (note that this is negative if $d_{j}=1$).

One can generate the random graph $G(\mathcal{D})$ for $\mathcal{D}=(d_{1},\dots,d_{n})$ and carry out this exploration at the same time, by choosing each vertex as $j$ with the appropriate probability.

Intuitively speaking, the probability we pick a specific vertex $j$ as the other endpoint of the first exposed edge is proportional to its degree. So, the expected increase in the number of open edges in the first step is equal to $\frac{\sum_{k\in\{1,\dots,n\},\,k\neq i}d_{k}(d_{k}-2)}{\sum_{k\in\{1,\dots,n\},\,k\neq i}d_{k}}$. Thus, it is positive essentially if and only if the sum of the squares of the degrees exceeds twice the sum of the degrees.
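The heuristic quantity above is easy to evaluate for a concrete degree sequence. The following Python sketch does so with exact rational arithmetic; the function name `expected_initial_increase` and the 0-based index `start` for the initial vertex are illustrative choices, not notation from the paper.

```python
from fractions import Fraction


def expected_initial_increase(degrees, start):
    """Heuristic expected change in the number of open edges after one step.

    `degrees` is a list of vertex degrees; `start` is the 0-based index of the
    initial vertex i, which is excluded from both sums (hypothetical interface).
    """
    others = [d for k, d in enumerate(degrees) if k != start]
    numerator = sum(d * (d - 2) for d in others)  # sum of d_k (d_k - 2), k != i
    denominator = sum(others)                     # sum of d_k, k != i
    return Fraction(numerator, denominator)


# A 3-regular sequence gains one open edge per step in expectation,
# while a perfect matching (all degrees 1) loses one.
cubic = expected_initial_increase([3, 3, 3, 3], start=0)
matching = expected_initial_increase([1, 1, 1, 1], start=0)
```

The sign of the returned value is what the heuristic cares about: it is positive exactly when the sum of squared degrees (over the remaining vertices) exceeds twice the sum of their degrees.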

Suppose that this expected increase remains the same until we have exposed a linear number of vertices. It seems intuitively clear that if the expected increase is less than $0$, then the probability that the initial vertex $i$ is in a linear order component is $o(1)$, and hence the probability that $G(\mathcal{D})$ has no linear order component is $1-o(1)$. If for some positive constant $\epsilon$, the expected increase is at least $\epsilon$, then there is some $\gamma=\gamma(\epsilon)>0$ such that the probability that $i$ is in a component with at least $\gamma n$ vertices exceeds $\gamma$.

In [24], Molloy and Reed proved, subject to certain technical conditions which required them to discuss sequences of degree sequences rather than one single degree sequence, that we essentially have that (i) if $\frac{\sum_{k=1}^{n}d_{k}(d_{k}-2)}{\sum_{k=1}^{n}d_{k}}>\epsilon$ for some $\epsilon>0$, then the probability that $G(\mathcal{D})$ has a giant component is $1-o(1)$, and (ii) if $\frac{\sum_{k=1}^{n}d_{k}(d_{k}-2)}{\sum_{k=1}^{n}d_{k}}<-\epsilon$ for some $\epsilon>0$, then the probability that $G(\mathcal{D})$ has no giant component is $1-o(1)$. We present their precise result and some of its generalizations later in this introductory section.

1.2 Our Refinement

It turns out that, absent the imposed technical conditions, the expected increase may change drastically during the exploration process. Consider for example the situation in which $n=k^{2}$ for some large odd integer $k$, $d_{1}=d_{2}=\dots=d_{n-1}=1$ and $d_{n}=2k$. Then $\frac{\sum_{i=1}^{n}d_{i}(d_{i}-2)}{\sum_{i=1}^{n}d_{i}}=\frac{4k^{2}-4k-(n-1)}{2k+n-1}\approx 3$, and so the Molloy-Reed approach would suggest that with probability $1-o(1)$ there is a giant component. However, with probability $1$, $G(\mathcal{D})$ is the disjoint union of a star with $2k$ leaves and $\frac{n-2k-1}{2}$ components of order $2$, and hence it has no giant component. The problem is that as soon as we explore vertex $n$, the expected increase drops from roughly $3$ to $-1$, so it does not stay positive throughout (the beginning of) the process.
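The drop in the expected increase can be checked numerically. The sketch below builds the degree sequence of this example for one arbitrarily chosen value of $k$ and evaluates the Molloy-Reed ratio before and after the hub has been explored; the helper name `molloy_reed_ratio` and the value $k=101$ are illustrative assumptions.

```python
def molloy_reed_ratio(degrees):
    """Ratio sum d(d-2) / sum d over the vertices that are still unexplored."""
    return sum(d * (d - 2) for d in degrees) / sum(degrees)


k = 101                              # any large odd integer; 101 is arbitrary
n = k * k
degrees = [1] * (n - 1) + [2 * k]    # n - 1 vertices of degree 1 and one hub

before_hub = molloy_reed_ratio(degrees)       # close to 3 for large k
after_hub = molloy_reed_ratio(degrees[:-1])   # only degree-1 vertices remain

print(f"before exploring the hub: {before_hub:.4f}")
print(f"after exploring the hub:  {after_hub:.4f}")
```

Once the hub is removed from the pool, every remaining vertex has degree $1$ and contributes $-1$ to the numerator, which is why the ratio is exactly $-1$.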

Thus, we see that the Molloy-Reed criterion cannot be extended for general degree sequences. To find a variant which applies to arbitrary degree sequences, we need to characterize those for which the expected increase remains positive for a sufficiently long time.

Intuitively, since the probability that we explore a vertex is essentially proportional to its degree, in lower bounding the length of the period during which the expected increase remains positive, we could assume that the exploration process picks at each step a highest degree vertex that has not been explored yet. Moreover, note that vertices of degree $2$ have a neutral role in the exploration process as exposing such a vertex does not change the number of open edges, provided we assume that our component locally looks like a tree (which turns out to be a good approximation around the critical window). These observations suggest that we should focus on the following invariants of $\mathcal{D}$ defined by considering a permutation $\pi$ of the vertices that satisfies $d_{\pi_{1}}\leq\dots\leq d_{\pi_{n}}$:

  • $j_{\mathcal{D}}=\min\left(\Big\{j:j\in[n]\mbox{ and }\sum\limits_{i=1}^{j}d_{\pi_{i}}(d_{\pi_{i}}-2)>0\Big\}\cup\Big\{n\Big\}\right)$,

  • $R_{\mathcal{D}}=\sum\limits_{i=j_{\mathcal{D}}}^{n}d_{\pi_{i}}$, and

  • $M_{\mathcal{D}}=\sum\limits_{i\in[n]:\,d_{i}\neq 2}d_{i}$.

We emphasize that these invariants are determined by the multiset of the degrees given by $\mathcal{D}$ and are independent of the choice of $\pi$ subject to $d_{\pi_{1}}\leq\dots\leq d_{\pi_{n}}$.
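As a concrete illustration of these definitions, the following Python sketch computes $j_{\mathcal{D}}$, $R_{\mathcal{D}}$ and $M_{\mathcal{D}}$ from a list of degrees. The function name `degree_invariants` is a hypothetical choice, and the returned index $j_{\mathcal{D}}$ is 1-based to match the text.

```python
def degree_invariants(degrees):
    """Return (j_D, R_D, M_D) for a degree sequence given as a list of ints."""
    d = sorted(degrees)              # realises d_{pi_1} <= ... <= d_{pi_n}
    n = len(d)
    j = n                            # default when no prefix sum is positive
    running = 0
    for idx, deg in enumerate(d, start=1):
        running += deg * (deg - 2)   # prefix sum of d(d - 2)
        if running > 0:
            j = idx                  # smallest j with a positive prefix sum
            break
    r = sum(d[j - 1:])               # degrees from position j_D up to n
    m = sum(x for x in d if x != 2)  # degrees that are not 2
    return j, r, m


# The star-plus-matching example with k = 5, so n = 25:
k = 5
star_example = [1] * (k * k - 1) + [2 * k]
j_star, r_star, m_star = degree_invariants(star_example)
```

For the example with $n=k^{2}$, $n-1$ vertices of degree $1$ and one vertex of degree $2k$, the prefix sums only become positive at the last vertex, so $j_{\mathcal{D}}=n$, $R_{\mathcal{D}}=2k$ and $M_{\mathcal{D}}=n-1+2k$. The ratio $R_{\mathcal{D}}/M_{\mathcal{D}}$ therefore tends to $0$ as $k$ grows, in agreement with the absence of a giant component in that example.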

Our intuition further suggests that in the exploration process, the expected increase in the number of open edges will be positive until we have explored $R_{\mathcal{D}}$ edges and will then become negative. Thus, we might expect to explore a component with about $R_{\mathcal{D}}$ edges, and indeed we can show this is the case.

This allows us to prove our main result, which is that whether $G(\mathcal{D})$ has a giant component essentially depends on whether $R_{\mathcal{D}}$ is of the same order as $M_{\mathcal{D}}$ or not. There is, however, a caveat: this is not true if essentially all vertices have degree $2$.

For any function $\lambda:\mathbb{N}\to\mathbb{N}$, we say a degree sequence $\mathcal{D}$ is $\lambda$-well-behaved or simply well-behaved if $M_{\mathcal{D}}$ is at least $\lambda(n)$. Our main results hold for any function $\lambda\to\infty$ as $n\to\infty$.

Theorem 1.

For any function $\delta\to 0$ as $n\to\infty$, for every $\gamma>0$, if $\mathcal{D}$ is a well-behaved feasible degree sequence with $R_{\mathcal{D}}\leq\delta(n)M_{\mathcal{D}}$, then the probability that $G(\mathcal{D})$ has a component of order at least $\gamma n$ is $o(1)$.

Theorem 2.

For any positive constant $\epsilon$, there is a $\gamma>0$, such that if $\mathcal{D}$ is a well-behaved feasible degree sequence with $R_{\mathcal{D}}\geq\epsilon M_{\mathcal{D}}$, then the probability that $G(\mathcal{D})$ has a component of order at least $\gamma n$ is $1-o(1)$.

As we shall see momentarily, previous results in this field apply to sequences of degree sequences, and require that both the proportion of elements of a given degree and the average degree of an element of the degree sequences in the sequence approach a limit in some smooth way. We can easily deduce results for every sequence of (feasible) degree sequences from the two theorems above, and from our results on degree sequences which are not well-behaved, presented in the next section.

We denote by $\mathfrak{D}=(\mathcal{D}_{n})_{n\geq 1}$ a sequence of degree sequences where $\mathcal{D}_{n}$ has length $n$. We say that $\mathfrak{D}$ is well-behaved if for every $b$, there is an $n_{b}$ such that for all $n>n_{b}$, we have $M_{\mathcal{D}_{n}}>b$; $\mathfrak{D}$ is lower bounded if for some $\epsilon>0$, there is an $n_{\epsilon}$ such that for all $n>n_{\epsilon}$, we have $R_{\mathcal{D}_{n}}\geq\epsilon M_{\mathcal{D}_{n}}$; and $\mathfrak{D}$ is upper bounded if for every $\epsilon>0$, there is an $n_{\epsilon}$ such that for all $n>n_{\epsilon}$, we have $R_{\mathcal{D}_{n}}\leq\epsilon M_{\mathcal{D}_{n}}$.

The following is an immediate consequence of Theorems 1 and 2, and Theorem 6, which will be presented in the next section.

Theorem 3.

For any well-behaved lower bounded sequence of feasible degree sequences $\mathfrak{D}$, there is a $\gamma>0$ such that the probability that $G(\mathcal{D}_{n})$ has a component of order at least $\gamma n$ is $1-o(1)$.

For any well-behaved upper bounded sequence of feasible degree sequences $\mathfrak{D}$ and every $\gamma>0$, the probability that $G(\mathcal{D}_{n})$ has a component of order at least $\gamma n$ is $o(1)$.

If a sequence of feasible degree sequences $\mathfrak{D}$ is either not well-behaved or neither upper bounded nor lower bounded, then for every sufficiently small positive $\gamma$, there is a $0<\delta<1$ such that there are both arbitrarily large $n$ for which the probability that $G(\mathcal{D}_{n})$ has a component of order at least $\gamma n$ is at least $\delta$, and arbitrarily large $n$ for which the probability that $G(\mathcal{D}_{n})$ has a component of order at least $\gamma n$ is at most $1-\delta$.

1.3 The Special Role of Vertices of Degree 2

At first glance, it may be surprising that the existence of a giant component depends on the ratio between $R_{\mathcal{D}}$ and $M_{\mathcal{D}}$ rather than the ratio between $R_{\mathcal{D}}$ and $\sum_{i=1}^{n}d_{i}$. It may also be unclear why we have to treat degree sequences differently when the sum of the degrees which are not $2$ is bounded.

To clarify why our results are stated as they are, we now highlight the special role of vertices of degree $2$.

The non-cyclic components of $G(\mathcal{D})$ can be obtained by subdividing the edges of a multigraph $H(\mathcal{D})$ none of whose vertices have degree $2$ so that every loop is subdivided at least twice, and all but at most one edge of every set of parallel edges is subdivided at least once. (A component is cyclic if it is a cycle and non-cyclic if it is not.) Note that $H(\mathcal{D})$ can be obtained from $G(\mathcal{D})$ by deleting all cyclic components and suppressing all vertices of degree $2$. (Here and throughout the paper, when we say we suppress a vertex $u$ of degree $2$, this means we delete $u$ and we add an edge between its neighbours. Observe that this may create loops and multiple edges, so the resulting object might not be a simple graph.) Clearly, $H(\mathcal{D})$ is uniquely determined by $G(\mathcal{D})$. Moreover, the degree sequence of $H(\mathcal{D})$ is precisely that of $G(\mathcal{D})$ without the vertices of degree $2$.
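The passage from $G(\mathcal{D})$ to $H(\mathcal{D})$ can be sketched in a few lines of Python. The code below is only an illustration under stated assumptions: the graph is given as an edge list without isolated vertices, the function name `suppress_degree_two` is hypothetical, and a cyclic component is recognised by the fact that repeated suppression eventually leaves a single vertex carrying one loop.

```python
from collections import defaultdict


def suppress_degree_two(edges):
    """Edge list of H obtained from the edge list of a simple graph G."""
    alive = {}                     # edge id -> (u, w); loops allowed later on
    incident = defaultdict(list)   # vertex -> ids of incident edges (loop twice)
    for eid, (u, w) in enumerate(edges):
        alive[eid] = (u, w)
        incident[u].append(eid)
        incident[w].append(eid)
    next_id = len(edges)

    def other_end(eid, v):
        u, w = alive[eid]
        return w if u == v else u

    # Suppression never changes the degree of the remaining vertices,
    # so the list of degree-2 vertices can be fixed in advance.
    for v in [x for x in incident if len(incident[x]) == 2]:
        e1, e2 = incident.pop(v)
        if e1 == e2:               # v only carries a loop: rest of a cycle
            del alive[e1]
            continue
        a, b = other_end(e1, v), other_end(e2, v)
        del alive[e1], alive[e2]
        incident[a].remove(e1)
        incident[b].remove(e2)
        alive[next_id] = (a, b)    # new edge joining the neighbours of v
        incident[a].append(next_id)
        incident[b].append(next_id)
        next_id += 1
    return sorted(tuple(sorted(e)) for e in alive.values())


# A triangle (cyclic component) plus a star at vertex 3 with one subdivided edge.
g_edges = [(0, 1), (1, 2), (2, 0), (3, 7), (7, 4), (3, 5), (3, 6)]
h_edges = suppress_degree_two(g_edges)
```

In this small example the triangle disappears entirely and vertex $7$ is suppressed, so the result is the star with centre $3$ and leaves $4$, $5$, $6$; the degrees of the surviving vertices are unchanged.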

The number of vertices of a non-cyclic component of $G(\mathcal{D})$ equals the sum of the number of vertices of the corresponding component of $H(\mathcal{D})$ and the number of vertices of degree $2$ used in subdividing its edges. Intuitively, the second term in this sum depends on the proportion of the edges in the corresponding component of $H(\mathcal{D})$. Subject to the caveat mentioned above and discussed below, if the number of vertices of degree $2$ in $G(\mathcal{D})$ is much larger than the size of $H(\mathcal{D})$, then the probability that $G(\mathcal{D})$ has a giant component is essentially the same as the probability that $H(\mathcal{D})$ has a component containing a positive fraction of its edges. (As is standard, we use order and size to denote the number of vertices and the number of edges of a graph, respectively.) The same is true, although not as immediately obvious, even if the number of vertices of degree $2$ is not this large.

Theorem 4.

For every $\gamma>0$, there exists a $\rho>0$ such that for every well-behaved feasible degree sequence $\mathcal{D}$, the probability that $G(\mathcal{D})$ has a component of order at least $\gamma n$ and $H(\mathcal{D})$ has no component of size at least $\rho M_{\mathcal{D}}$ is $o(1)$.

Theorem 5.

For every $\rho>0$, there exists a $\gamma>0$ such that for every well-behaved feasible degree sequence $\mathcal{D}$, the probability that $G(\mathcal{D})$ has no component of order at least $\gamma n$ and $H(\mathcal{D})$ has a component of size at least $\rho M_{\mathcal{D}}$ is $o(1)$.

As we mentioned above, degree sequences which are not well-behaved behave differently from well-behaved degree sequences. For instance, suppose that $M_{\mathcal{D}}=0$. Then $H(\mathcal{D})$ is empty and $G(\mathcal{D})$ is a uniformly chosen disjoint union of cycles. In this case it is known that the probability of having a giant component is bounded away both from $0$ and $1$. Indeed, the latter statement also holds whenever $M_{\mathcal{D}}$ is at most $b$ for any constant $b$.

Theorem 6.

For every $b\geq 0$ and every $0<\gamma<\frac{1}{8}$, there exist a positive integer $n_{b,\gamma}$ and a $0<\delta<1$ such that if $n>n_{b,\gamma}$ and $\mathcal{D}$ is a degree sequence with $M_{\mathcal{D}}\leq b$, then the probability that there is a component of order at least $\gamma n$ in $G(\mathcal{D})$ lies between $\delta$ and $1-\delta$.

This theorem both explains why we concentrate on well-behaved degree sequences and sets out how degree sequences which are not well-behaved actually behave (badly, obviously). Combined with Theorems 1 and 2, it immediately implies Theorem 3. We omit the straightforward details.

1.4 Previous Results

The study of the existence of a giant component in random graphs with an arbitrary prescribed degree sequence started with the result of Molloy and Reed [24]. (Random graphs with special degree sequences had been studied earlier; see, e.g., [22, 33].) Although they define the concept of asymptotic degree sequences, we will state all the previous results in terms of sequences of degree sequences $\mathfrak{D}=(\mathcal{D}_{n})_{n\geq 1}$. Using a symmetry argument, one can easily translate results for sequences of degree sequences to asymptotic degree sequences, and vice versa. For every $\mathcal{D}_{n}=(d_{1}^{(n)},\dots,d_{n}^{(n)})$, we define $n_{i}=n_{i}(n)=|\{j\in[n]:\,d_{j}^{(n)}=i\}|$. Recall that we only consider degree sequences such that $n_{0}=0$.

Before stating their result, we need to introduce a number of properties of sequences of degree sequences. A sequence of degree sequences $\mathfrak{D}$ is

  • feasible, if for every $n\geq 1$, there exists at least one simple graph on $n$ vertices with degree sequence $\mathcal{D}_{n}$.

  • smooth, if for every nonnegative integer $i\geq 0$, there exists $\lambda_{i}\in[0,1]$ such that $\lim_{n\to\infty}\frac{n_{i}}{n}=\lambda_{i}$.

  • sparse, if there exists $\lambda\in(0,\infty)$ such that $\lim_{n\to\infty}\sum_{i\geq 1}\frac{in_{i}}{n}=\lambda$.

  • $f$-bounded, for some function $f:\mathbb{N}\to\mathbb{R}$, if $n_{i}=0$ for every $i>f(n)$.

In particular, observe that random graphs $G(\mathcal{D}_{n})$ arising from a sparse sequence of degree sequences $\mathfrak{D}$ have a linear number of edges, provided that $n$ is large enough.

Given that $\mathfrak{D}$ is smooth, we define the following parameter

$$Q(\mathfrak{D})=\sum_{i\geq 1}i(i-2)\lambda_{i}\;.$$

Note that $Q(\mathfrak{D})$ is very close to the notion of initial expected increase described in Section 1.1.

We say a sequence of degree sequences $\mathfrak{D}$ satisfies the $\mathbf{MR}$-conditions if

  (a.1) it is feasible, smooth and sparse,

  (a.2) it is $n^{1/4-\epsilon}$-bounded, for some $\epsilon>0$,

  (a.3) for every $i\geq 1$, $\frac{i(i-2)n_{i}}{n}$ converges uniformly to $i(i-2)\lambda_{i}$, and

  (a.4) $\lim_{n\to\infty}\sum_{i\geq 1}i(i-2)\frac{n_{i}}{n}$ exists and converges uniformly to $\sum_{i\geq 1}i(i-2)\lambda_{i}$.

For a precise statement of the uniform convergence in conditions (a.3)–(a.4), we refer the reader to [24]. Note that these conditions imply that $\lambda=\sum_{i\geq 1}i\lambda_{i}$.

Now we can precisely state the result of Molloy and Reed [24].

Theorem 7 (Molloy and Reed [24]).

Let $\mathfrak{D}=(\mathcal{D}_{n})_{n\geq 1}$ be a sequence of degree sequences that satisfies the $\mathbf{MR}$-conditions. Then,

  1. if $Q(\mathfrak{D})>0$, then there exists a constant $c_{1}>0$ such that the probability that $G(\mathcal{D}_{n})$ has a component of order at least $c_{1}n$ is $1-o(1)$.

  2. if $Q(\mathfrak{D})<0$ and the sequence is $n^{1/8-\epsilon}$-bounded for some $\epsilon>0$, then for every constant $c_{2}>0$, the probability that $G(\mathcal{D}_{n})$ has no component of order at least $c_{2}n$ is $1-o(1)$.

Note that the case $\lambda_{2}=1$ is not considered in Theorem 7, since $\lambda_{2}=1$ implies $Q(\mathfrak{D})=0$.

Theorem 7 has been generalized to other sequences of degree sequences, which in particular include the case $Q(\mathfrak{D})=0$. In Section 8, we show that Theorem 3 implies all the criteria for the existence of a giant component in $G(\mathcal{D}_{n})$ introduced below. (Note that some of these results give a more precise description of the order of the largest component. Our results only deal with the existential question.)

We say a sequence of degree sequences $\mathfrak{D}$ satisfies the $\mathbf{JL}$-conditions if

  (b.1) it is feasible, smooth, and sparse,

  (b.2) $\sum_{i\geq 1}i^{2}n_{i}=O(n)$, and

  (b.3) $\lambda_{1}>0$.

Observe that if $\mathfrak{D}$ satisfies the $\mathbf{JL}$-conditions, then, by (b.2), it is also $O(n^{1/2})$-bounded. Moreover, these conditions also imply that $\lambda=\sum_{i\geq 1}i\lambda_{i}$. Janson and Luczak in [18] showed that one can prove a variant of Theorem 7 obtained by replacing the $\mathbf{MR}$-conditions by the $\mathbf{JL}$-conditions. (Their result gives convergence in probability of the proportion of vertices in the giant component and they also consider the case $Q(\mathfrak{D})=0$.) They also note that if $\lambda_{2}=1$, then the criterion based on $Q(\mathfrak{D})$ does not apply. Our results completely describe the case $\lambda_{2}=1$.

We say a sequence of degree sequences $\mathfrak{D}$ satisfies the $\mathbf{BR}$-conditions if

  (c.1) it is feasible, smooth and sparse,

  (c.2) $\sum_{i\geq 3}\lambda_{i}>0$, and

  (c.3) $\lambda=\sum_{i\geq 1}i\lambda_{i}$.

Bollobás and Riordan in [5] proved a version of Theorem 7 for sequences of degree sequences obtained by replacing the $\mathbf{MR}$-conditions by the $\mathbf{BR}$-conditions. (They also proved some results on the distribution of the order of the largest component and also consider the case $Q(\mathfrak{D})=0$.)

Theorem 7 and its extensions provide easy-to-use criteria for the existence of a giant component and have been widely used by many researchers in the area of complex networks [2, 4, 27]. However, the technical conditions on $\mathfrak{D}$ under which these results apply restrict their applicability, seem to be artificial and are only required due to the nature of the proofs. It turns out that many real-world networks do not satisfy these conditions. For this reason, researchers have developed both ad-hoc approaches for proving results for specific types of degree sequences and variants of the Molloy-Reed result which require different sets of technical conditions to be satisfied.

An early example of an ad-hoc approach is the work of Aiello, Chung and Lu on Power-Law Random Graphs [1]. They introduce a model depending on two parameters $\alpha,\beta>0$ that define a degree sequence satisfying $n_{i}=\lfloor e^{\alpha}i^{-\beta}\rfloor$. One should think about these parameters as follows: $\alpha$ is typically large and determines the order of the graph (we always have $\alpha=\Theta(\log{n})$), and $\beta$ is a fixed constant that determines the power-decay of the degree distribution. Among other results, the authors prove that there exists $\beta_{0}>0$, such that if $\beta>\beta_{0}$ the probability that there is a component of linear order is $o(1)$ and if $\beta<\beta_{0}$ the probability there is a component of linear order is $1-o(1)$. Here, the previous conditions are only satisfied for certain values of $\beta$ and the authors need to do additional work to determine when a giant component exists for other values of $\beta$. In Section 8 we will show how Theorem 3 also implies the Aiello-Chung-Lu results on the existence of a giant component in the model of Power-Law Random Graphs.
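To make the formula $n_{i}=\lfloor e^{\alpha}i^{-\beta}\rfloor$ concrete, the following Python sketch tabulates the degree counts for given parameters. It assumes that $i$ ranges over the positive integers for which $n_{i}\geq 1$; the function name `power_law_degree_counts` and the sample values $\alpha=7$, $\beta=2.5$ are illustrative choices only.

```python
import math


def power_law_degree_counts(alpha, beta):
    """Map each degree i to n_i = floor(e^alpha * i^(-beta)), while n_i >= 1."""
    counts = {}
    i = 1
    while True:
        n_i = math.floor(math.exp(alpha) * i ** (-beta))
        if n_i < 1:        # n_i is non-increasing in i, so we can stop here
            break
        counts[i] = n_i
        i += 1
    return counts


counts = power_law_degree_counts(alpha=7.0, beta=2.5)
order = sum(counts.values())   # number of vertices n
max_degree = max(counts)       # largest degree that occurs
```

With these sample values there are $\lfloor e^{7}\rfloor=1096$ vertices of degree $1$ and the largest degree is $\lfloor e^{\alpha/\beta}\rfloor=16$, which shows how $\alpha$ controls the order of the graph while $\beta$ controls how quickly the counts decay.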

1.5 Future Directions

Beginning with the early results of Molloy and Reed, the study of the giant component in random graphs with prescribed degree sequence has attracted a lot of attention. Directions of study include determining the asymptotic order of the largest component in the subcritical regime or estimating the order of the second largest component in both regimes [5, 16, 18, 20, 21, 25, 30]. It would be interesting to extend these known results to arbitrary degree sequences.

For example, Theorems 1 and 2 precisely describe the appearance of a giant component when the degree sequence is well-behaved. While bounds on the constant $\gamma$ in terms of $\delta$ and $\epsilon$, respectively, may follow from their respective proofs, these bounds are probably not of the right order of magnitude. Molloy and Reed in [25] precisely determined this dependence for sequences of degree sequences that satisfy the $\mathbf{MR}$-conditions. Precise constants are also given in [5, 14, 18]. We wonder whether it is possible to determine the precise dependence on the parameters for arbitrary degree sequences. It is likely that our methods can be used to find this dependence and to determine the order of the second largest component when a giant one exists.

Another direction is the study of site and bond percolation in $G(\mathcal{D})$ for arbitrary degree sequences $\mathcal{D}$. This problem has already been approached for sequences of degree sequences that satisfy certain conditions similar to the ones presented in Section 1.4 [5, 13, 17, 30].

Motivated by some applications in peer-to-peer networks (see, e.g. [6]), one can study efficient sampling of the random graph $G(\mathcal{D})$. Cooper et al. [8] showed that the switching chain rapidly mixes for $d$-regular graphs for every $3\leq d\leq n-1$. Greenhill [15] recently extended this result to $G(\mathcal{D})$, but, due to some technical reasons, this result only holds if the maximum degree in $\mathcal{D}$ is small enough.

Many other basic properties of $G(\mathcal{D})$, such as determining its diameter [11, 31, 32] or the existence of giant cores [7, 10, 19], have already been studied for certain sequences of degree sequences. We believe that our method can help to extend these results to arbitrary degree sequences.

2 A Proof Sketch

2.1 The Approach

The proofs of Theorems 4, 5 and 6 are simpler than the remaining proofs and we delay any discussion of these results to Sections 6 and 7. By applying them, we see that in order to prove Theorems 1 and 2 it is enough to prove the following results:

Theorem 8.

For any function $\delta\to 0$ as $n\to\infty$ and for every $\gamma>0$, if $\mathcal{D}$ is a well-behaved degree sequence with $R_{\mathcal{D}}\leq\delta(n)M_{\mathcal{D}}$, then the probability that $H(\mathcal{D})$ has a component of size at least $\gamma M_{\mathcal{D}}$ is $o(1)$.

Theorem 9.

For any positive constant $\epsilon$, there is a $\gamma>0$ such that if $\mathcal{D}$ is a well-behaved degree sequence with $R_{\mathcal{D}}\geq\epsilon M_{\mathcal{D}}$, then the probability that $H(\mathcal{D})$ has a component of size at least $\gamma M_{\mathcal{D}}$ is $1-o(1)$.

The proofs of both theorems analyse an exploration process similar to the one discussed in Section 1.1 by combining probabilistic tools with a combinatorial switching argument. However, we will focus on the edges of $H(\mathcal{D})$ rather than the ones of $G(\mathcal{D})$. Again, we will need to bound the expected increase of the number of open edges throughout the process and prove that the (random) increase is highly concentrated around its expected value. In order to do so, we will need to bound the probability that the next vertex of $H(\mathcal{D})$ explored in the process is a specific vertex $w$. One of the key applications of our combinatorial switching technique will be to estimate this probability and show that it is approximately proportional to the degree of $w$.

Crucial to this approach is that the degrees of the vertices explored throughout the process are not too high. Standard arguments for proving concentration of a random variable require that the change at each step is relatively small. This translates precisely to an upper bound on the maximum degree of the explored graph. Furthermore, without such a bound on the maximum degree, we cannot obtain good bounds on the probability that a certain vertex $w$ is the next vertex explored in the process. So, a second key ingredient in our proofs will be a preprocessing step which allows us to handle the vertices of high degree, ensuring that we will not encounter them in our exploration process.

2.2 The Exploration Process

We consider a variant of the exploration process where we start our exploration at a non-empty set $S_{0}$ of vertices of $H(\mathcal{D})$, rather than at just one vertex.

Thus, we see that the exploration takes $|V(H(\mathcal{D}))\setminus S_{0}|$ steps and produces sets

$$S_{0}\subset S_{1}\subset S_{2}\subset\ldots\subset S_{|V(H(\mathcal{D}))\setminus S_{0}|}\;,$$

where $w_{t}=S_{t}\setminus S_{t-1}$ is either a neighbour of a vertex $v_{t}$ of $S_{t-1}$ or is a randomly chosen vertex in $V(H(\mathcal{D}))\setminus S_{t-1}$ if there are no edges between $S_{t-1}$ and $V(H(\mathcal{D}))\setminus S_{t-1}$.

To specify our exploration process precisely, we need to describe how we choose vtv_{t} and wtw_{t}. To aid in this process, for each vertex vV(H(𝒟))v\in V(H({\cal D})) we will choose a uniformly random permutation of its adjacency list in G(𝒟)G({\cal D}). For this purpose, an input of our exploration process consists of a graph GG equipped with an ordering of its adjacency lists for all vertices vV(H(𝒟))v\in V(H(\mathcal{D})). Applying the method of deferred decisions (cf. Section 2.4 in [26]), we can generate these random linear orders as we go along with our process. We note that this yields, in a natural manner, an ordering of the non-loop edges of H(𝒟)H({\cal D}) which have the vertex vv as an endpoint. If there are no edges between St1S_{t-1} and V(H(𝒟))St1V(H(\mathcal{D}))\setminus S_{t-1}, we choose each vertex of V(H(𝒟))St1V(H({\cal D}))\setminus S_{t-1} to be wtw_{t} with probability proportional to its degree. Otherwise we choose the smallest vertex vtv_{t} of St1S_{t-1} (with respect to the natural order in {1,,n}\{1,\dots,n\}), which has a neighbour in V(H(𝒟))St1V(H(\mathcal{D}))\setminus S_{t-1}. We expose the edge of H(𝒟)H(\mathcal{D}) from vtv_{t} to V(H(𝒟))St1V(H(\mathcal{D}))\setminus S_{t-1} which appears first in our random ordering and let wtw_{t} be its other endpoint. Furthermore, we expose all the edges of H(𝒟)H(\mathcal{D}) from wtw_{t} to St1{vt}S_{t-1}\setminus\{v_{t}\} as well as the loops incident to wtw_{t}. Finally, we expose the paths of G(𝒟)G({\cal D}) corresponding to the edges of H(𝒟)H({\cal D}) which we have just exposed and the position in the random permutation of the adjacency list of wtw_{t} in G(𝒟)G({\cal D}) of the edges we have just exposed.

Thus, after tt iterations of our exploration process we have exposed

  • the subgraph of H(𝒟)H({\cal D}) induced by StS_{t},

  • the paths of G(𝒟)G({\cal D}) corresponding to the exposed edges of H(𝒟)H({\cal D}), and

  • where each initial and final edge of such a path appears in the random permutation of the adjacency list of any of its endpoints which is also an endpoint of the path.

We refer to this set of information as the configuration 𝒞t{\cal C}_{t} at time tt. A configuration can also be understood as a set of inputs. During our analysis of the exploration process, we will consider all the probabilities of events conditional on the current configuration.

An important parameter for our exploration process is the number XtX_{t} of edges of H(𝒟)H({\cal D}) between StS_{t} and V(H(𝒟))StV(H({\cal D}))\setminus{S}_{t}. We note that if Xt=0X_{t}=0, then StS_{t} is the union of some components of H(𝒟)H({\cal D}) containing all of S0S_{0}. We note that if |S0|=1|S_{0}|=1, then every XtX_{t} is a lower bound on the maximum size of a component of H(𝒟)H({\cal D}) (not necessarily the one containing the vertex in S0S_{0}).
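The exploration can be sketched in code as follows. This is a minimal illustration in Python under simplifying assumptions, not the procedure analysed in the paper: it works directly on a multigraph given by adjacency lists (the names `adj` and `explore` are hypothetical), fixes the random adjacency orders up front instead of using deferred decisions, ignores the underlying paths of $G(\mathcal{D})$, and records only the number $X_t$ of edges leaving $S_t$. It assumes every vertex has degree at least 1, as is the case in $H(\mathcal{D})$.

```python
import random


def explore(adj, start, seed=0):
    """Return [X_0, X_1, ...] for an exploration started at the set `start`.

    adj maps each vertex to its list of neighbours, with multiplicity
    (a loop at v lists v twice). Vertices must be sortable.
    """
    rng = random.Random(seed)
    # Uniformly random order of each adjacency list (chosen up front here).
    order = {v: rng.sample(nbrs, len(nbrs)) for v, nbrs in adj.items()}
    explored = set(start)

    def boundary():
        # Number of edges between the explored set and its complement.
        return sum(1 for v in explored for u in adj[v] if u not in explored)

    xs = [boundary()]
    while len(explored) < len(adj):
        # Smallest explored vertex that still has an unexplored neighbour.
        frontier = [v for v in sorted(explored)
                    if any(u not in explored for u in order[v])]
        if frontier:
            v_t = frontier[0]
            # First edge to the unexplored part in the random order of v_t.
            w_t = next(u for u in order[v_t] if u not in explored)
        else:
            # No open edges: restart at a vertex chosen proportionally to degree.
            outside = sorted(u for u in adj if u not in explored)
            weights = [len(adj[u]) for u in outside]
            w_t = rng.choices(outside, weights=weights)[0]
        explored.add(w_t)
        xs.append(boundary())
    return xs
```

On a path with four vertices started at an endpoint, one edge stays open until the last vertex is absorbed, so the recorded sequence is `[1, 1, 1, 0]`.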

We prove Theorem 8 by showing that under its hypotheses for every vertex vv of H(𝒟)H({\cal D}), there is a set S0=S0(v)S_{0}=S_{0}(v) containing vv such that, given we start our exploration process with S0S_{0}, the probability that there is a tt with Xt=0X_{t}=0 for which the number of edges within StS_{t} is at most γM𝒟\gamma M_{\cal D}, is 1o(M𝒟1)1-o(M_{\cal D}^{-1}). Since H(𝒟)H({\cal D}) has at most 2M𝒟2M_{\cal D} vertices, it follows that the probability that H(𝒟)H({\cal D}) has a component of size at least γM𝒟\gamma M_{\cal D} is o(1)o(1). The set S0{v}S_{0}\setminus\{v\} is a set of highest degree vertices the sum of whose degrees exceeds R𝒟R_{\mathcal{D}}. By the definition of j𝒟j_{\cal D} and R𝒟R_{\cal D}, this implies that, unless X0=0X_{0}=0, the expectation of X1X0X_{1}-X_{0} is negative. We show that, as the process continues, the expectation of XtXt1X_{t}-X_{t-1} becomes even smaller. We can prove that the actual change of XtX_{t} is highly concentrated around its expectation and hence complete the proof, because S0S_{0} contains all the high degree vertices and so in the analysis of our exploration process we only have to deal with low degree vertices.

We prove (a slight strengthening of) Theorem 9 for graphs without large degree vertices by showing that under its hypotheses and setting S0S_{0} to be a random vertex vv chosen with probability proportional to its degree, with probability 1o(1)1-o(1), there exists some tt such that XtγM𝒟X_{t}\geq\gamma M_{\cal D} (and hence there is a component of H(𝒟)H({\cal D}) of size at least γM𝒟\gamma M_{\cal D}). Key to doing so is that the expected increase of XtX_{t} is a positive fraction of the increase in the sum of the degrees of the vertices in StS_{t} until this sum approaches R𝒟R_{\cal D}. To handle the high degree vertices, we expose the edges whose endpoints are in components containing a high degree vertex. If this number of edges is at least a constant fraction of M𝒟M_{\cal D}, then we can show that in fact all the high degree vertices lie in one component, which therefore contains a constant fraction of the edges of H(𝒟)H({\cal D}). Otherwise, we show that the conditions of Theorem 9 (slightly relaxed) hold in the remainder of the graph, which has no high degree vertices, so we can apply (a slight strengthening of) Theorem 9 to find the desired component of H(𝒟)H({\cal D}).

3 Switching

As mentioned above, the key to extending our branching analysis to arbitrary well-behaved degree sequences is a combinatorial switching argument. In this section, we describe the type of switchings we consider and demonstrate the power of the technique.

Let $H$ be a multigraph. We say a multigraph $H'$ is obtained by switching from $H$ on a pair of oriented and distinct edges $uv$ and $xy$ if $H'$ can be obtained from $H$ by deleting $uv$ and $xy$ as well as adding the edges $ux$ and $vy$. Observe that switching $ux$ and $vy$ in $H'$ yields $H$. Observe further that if $H$ is simple and we want to ensure that $H'$ is simple, then we must insist that $u\neq x$, $v\neq y$ and, unless $u=y$ or $v=x$, the edges $ux$ and $vy$ are not edges of $H$.
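As a concrete illustration of the definition, the following Python sketch performs a switching on a multigraph stored as a multiset of edges. The representation (a `Counter` keyed by sorted vertex pairs) and the names `key`, `switch` and `degrees` are assumptions made for this example only.

```python
from collections import Counter


def key(a, b):
    """Canonical (sorted) key for the undirected edge ab; a loop is (a, a)."""
    return (a, b) if a <= b else (b, a)


def switch(edges, e1, e2):
    """Switch on the oriented edges e1 = (u, v) and e2 = (x, y).

    Deletes uv and xy, adds ux and vy, and returns the new edge multiset.
    """
    (u, v), (x, y) = e1, e2
    result = Counter(edges)
    for a, b in ((u, v), (x, y)):
        if result[key(a, b)] == 0:
            raise ValueError(f"edge {a}-{b} is not available for switching")
        result[key(a, b)] -= 1
    result[key(u, x)] += 1
    result[key(v, y)] += 1
    return +result  # unary plus drops entries with count zero


def degrees(edges):
    """Degree of every vertex; a loop contributes 2 to its endpoint."""
    deg = Counter()
    for (a, b), mult in edges.items():
        deg[a] += mult
        deg[b] += mult
    return deg
```

Switching never changes the degree sequence, and applying `switch` to the two new edges $ux$ and $vy$ recovers the original multigraph, which is the reversibility used in the counting arguments.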

Switching was introduced in the late 19th century by Petersen [29]. Much later, McKay [23] reintroduced the method to count graphs with prescribed degree sequences and, together with Wormald, used it in the study of random regular graphs. We refer the interested reader to the survey of Wormald on random regular graphs for a short introduction to the method [34].

In this paper we will consider standard switchings as well as a particular extension of them. This extension concerns pairs consisting of a simple graph GG and the multigraph HGH_{G} obtained from GG by deleting its cyclic components and suppressing the vertices of degree 2 in the non-cyclic ones. For certain switchings of HGH_{G} which yield HH^{\prime}, our extension constructs a simple graph GG^{\prime} from GG such that HG=HH_{G^{\prime}}=H^{\prime}. We now describe for which switchings in HGH_{G} we can obtain such an HH^{\prime} and how we do so.

Our extension considers directed walks (either a path or a cycle) of $G$ which correspond to (oriented) edges in $H_{G}$ (note that an edge of $H_{G}$ corresponds to exactly two such directed walks, even if it is a loop and hence has only one orientation). We can switch on an ordered pair of such directed walks in $G$, corresponding to an ordered pair of oriented distinct edges $e_{1}=uv$ and $e_{2}=xy$ of $H_{G}$, such that none of the following hold:

  (i) there is an edge of $G$ between $u$ and $x$ which forms neither $e_{1}$ nor $e_{2}$, and the walk corresponding to $e_{1}$ has one edge,

  (ii) there is an edge of $G$ between $v$ and $y$ which forms neither $e_{1}$ nor $e_{2}$, and the walk corresponding to $e_{2}$ has one edge,

  (iii) $u=x$ and the directed walk corresponding to $e_{1}$ has at most two edges, or

  (iv) $v=y$ and the directed walk corresponding to $e_{2}$ has at most two edges.

To do so, let u=w0,w1,,wr=vu=w_{0},w_{1},\dots,w_{r}=v be the directed walk corresponding to e1e_{1} and let x=z0,z1,,zs=yx=z_{0},z_{1},\dots,z_{s}=y be the directed walk corresponding to e2e_{2}. We delete the edges wr1vw_{r-1}v and xz1xz_{1} and add the edges wr1xw_{r-1}x and vz1vz_{1}.

We note that (i)-(iv) ensure that we obtain a simple graph GG^{\prime}. Furthermore, we have that HGH_{G^{\prime}} is obtained from HGH_{G} by switching on uvuv and xyxy. We remark further that if we reverse both the ordering of the edges and the orientation of both edges, we always obtain the same graph GG^{\prime}; that is, it is equivalent to switch the ordered pair (uv,xy)(uv,xy) or the ordered pair (yx,vu)(yx,vu). Therefore, given two walks between uu and vv and between xx and yy (either paths or cycles) of GG, we always consider the four following possible switches: (uv,xy)(uv,xy), (uv,yx)(uv,yx), (vu,xy)(vu,xy) and (vu,yx)(vu,yx). We note that some of these choices might give rise to the same graph GG^{\prime}. However, we consider each of them as a valid switch since it will be simpler to count them considering these multiplicities.

Figure 1: A graph $G$ and its corresponding graph $H_{G}$ before and after switching the directed walks corresponding to the ordered pair of oriented edges $uv$ and $xy$ in $H_{G}$.

Given any two disjoint sets of (multi)graphs $\mathcal{A}$ and $\mathcal{B}$, we can build an auxiliary bipartite graph with vertex set $\mathcal{A}\cup\mathcal{B}$ where we add an edge between $H\in\mathcal{A}$ and $H'\in\mathcal{B}$ for every (extended) switching that transforms $H$ into $H'$, or equivalently, $H'$ into $H$. We can also consider subgraphs of this auxiliary graph where we only add an edge if the switching satisfies some special property. Given a lower bound $d_{\mathcal{A}}$ on the degrees in $\mathcal{A}$ and an upper bound $d_{\mathcal{B}}$ on the degrees in $\mathcal{B}$, we obtain immediately that $|\mathcal{A}|\leq\frac{d_{\mathcal{B}}}{d_{\mathcal{A}}}|\mathcal{B}|$. We frequently use this fact without explicitly referring to it.
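For completeness, the double-counting step behind this inequality can be written out as follows. Here $e(\mathcal{A},\mathcal{B})$ denotes the number of edges of the auxiliary bipartite graph and $\deg$ the degree in that graph; both symbols are introduced only for this derivation.

```latex
d_{\mathcal{A}}\,|\mathcal{A}|
  \;\le\; \sum_{H\in\mathcal{A}} \deg(H)
  \;=\; e(\mathcal{A},\mathcal{B})
  \;=\; \sum_{H'\in\mathcal{B}} \deg(H')
  \;\le\; d_{\mathcal{B}}\,|\mathcal{B}|
\qquad\Longrightarrow\qquad
|\mathcal{A}| \;\le\; \frac{d_{\mathcal{B}}}{d_{\mathcal{A}}}\,|\mathcal{B}| .
```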

To illustrate our method, we show here that if M𝒟M_{\mathcal{D}} is large with respect to the number of vertices, then there exists a component containing most of the vertices.

Lemma 10.

If M𝒟nloglognM_{\mathcal{D}}\geq n\log\log n, then the probability that G(𝒟)G(\mathcal{D}) has a component of order (1o(1))n(1-o(1))n is 1o(1)1-o(1).

In proving the lemma, we will need the following straightforward result on 22-edge cuts of graphs. We defer its proof to the end of the section.

Lemma 11.

The number of pairs of orientations of edges uv,xyuv,xy in a graph GG of order nn such that by switching on uvuv and xyxy we obtain a graph with one more component than GG is at most 8n28n^{2}.

Proof of Lemma 10.

We can assume nn is large enough to satisfy an inequality stated below since the lemma makes a statement about asymptotic behaviour. Let K=(11loglogn)nK=\lfloor(1-\frac{1}{\sqrt{\log\log n}})n\rfloor. For every integer k1k\geq 1, let k\mathcal{F}_{k} be the event that G(𝒟)G(\mathcal{D}) has exactly kk components and let k\mathcal{F}_{k}^{\prime} be the event that GG is in k\mathcal{F}_{k} and that all components of GG have order at most KK. Denote by =k2k\mathcal{F}^{\prime}=\cup_{k\geq 2}\mathcal{F}_{k}^{\prime}. Our goal is to show that []=o(1)\mathbb{P}[\mathcal{F}^{\prime}]=o(1). If so, with high probability GG has a component of order larger than KK. Observe that if one proves for some function f:+f:\mathbb{N}\to\mathbb{R}^{+} with f(n)0f(n)\to 0 as nn\to\infty that [k+1]f(n)[k]\mathbb{P}[\mathcal{F}_{k+1}^{\prime}]\leq f(n)\mathbb{P}[\mathcal{F}_{k}] for every k1k\geq 1, then []=k1[k+1]f(n)(k1[k])=f(n)=o(1)\mathbb{P}[\mathcal{F}^{\prime}]=\sum_{k\geq 1}\mathbb{P}[\mathcal{F}_{k+1}^{\prime}]\leq f(n)\left(\sum_{k\geq 1}\mathbb{P}[\mathcal{F}_{k}]\right)=f(n)=o(1). We adopt this approach with f(n)=16loglognf(n)=\frac{16}{\sqrt{\log\log{n}}}.

Fix k1k\geq 1. Now suppose that there exist s+s^{+} and ss^{-} such that for every GG in k\mathcal{F}_{k}, there are at most s+s^{+} switchings that transform GG into a graph in k+1\mathcal{F}_{k+1}^{\prime}, and for every graph GG in k+1\mathcal{F}_{k+1}^{\prime}, there are at least ss^{-} switchings that transform GG into a graph in k\mathcal{F}_{k}. Then,

$$\mathbb{P}[\mathcal{F}_{k+1}^{\prime}]\,s^{-}\leq\mathbb{P}[\mathcal{F}_{k}]\,s^{+}\;.$$

Let us now obtain some values for s+s^{+} and ss^{-}. On the one hand, applying Lemma 11, we can choose

$$s^{+}=8n^{2}\;.$$

On the other hand, if GG is in k+1\mathcal{F}_{k+1}^{\prime}, in order to merge two components it is enough to perform a switching between an oriented non-cut edge (at least M𝒟2n(loglogn2)nM_{\mathcal{D}}-2n\geq(\log\log n-2)n choices) and any other oriented edge not in the same component as the first one (since GG has minimum degree at least 11 and the largest component has order at most KK, there are at least nKn-K choices). Since nn is large, we can choose

$$s^{-}=(\log\log n-2)\,n\cdot(n-K)\geq\frac{\sqrt{\log\log n}}{2}\,n^{2}\;.$$

From the previous bounds, we obtain the desired result

$$\mathbb{P}[\mathcal{F}_{k+1}^{\prime}]\leq\frac{s^{+}}{s^{-}}\cdot\mathbb{P}[\mathcal{F}_{k}]\leq\frac{16}{\sqrt{\log\log n}}\cdot\mathbb{P}[\mathcal{F}_{k}]\;.$$

Proof of Lemma 11.

Any such pair of oriented edges must lie in the same component and switching on them within the component must yield a disconnected graph. So, as the function $x\mapsto 8x^{2}$ is convex, we may assume that $G$ is connected.

First, suppose that at least one of the oriented edges, say uvuv, is an edge cut of the graph. Then, if xyxy is not an edge cut of GG, the switching does not disconnect the graph. Since there are at most n1n-1 cut-edges, there are at most 4(n1)24(n-1)^{2} switchings using at least one (and hence two) oriented cut-edges.

Otherwise $\{uv,xy\}$ is a proper 2-edge cut (that is, both $G-uv$ and $G-xy$ are connected). Consider an arbitrary spanning tree of $G$. This tree contains at least one edge of every $2$-edge cut of $G$. Thus, we may select $uv$ among the edges of this tree ($n-1$ choices). Observe now that, in order to construct the proper 2-edge cut, we need to select $xy$ as a cut-edge in $G-uv$ (at most $n-1$ choices). Counting the four possible choices of orientations, there are at most $4(n-1)^{2}$ switchings of this kind.

Therefore, in total there are at most $8(n-1)^{2}\leq 8n^{2}$ switchings in $G$ which disconnect it. ∎
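The bound of Lemma 11 can be checked by brute force on small graphs. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, vertices are labelled $0,\dots,n-1$, each pair of distinct edges is counted with its four choices of orientations, and switchings that create loops or parallel edges are counted as well, so the count can only be larger than the one in the lemma.

```python
def components(n, edges):
    """Number of connected components of a (multi)graph on vertices 0..n-1."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(v) for v in range(n)})


def count_disconnecting_switchings(n, edges):
    """Count switchings (with orientations) that create one more component."""
    base = components(n, edges)
    count = 0
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            rest = [e for k, e in enumerate(edges) if k != i and k != j]
            a, b = edges[i]
            c, d = edges[j]
            # Four orientation choices: (uv, xy), (uv, yx), (vu, xy), (vu, yx).
            for u, v in ((a, b), (b, a)):
                for x, y in ((c, d), (d, c)):
                    switched = rest + [(u, x), (v, y)]
                    if components(n, switched) == base + 1:
                        count += 1
    return count
```

For the cycle on four vertices this count is 12, well below $8\cdot 4^{2}=128$.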

4 The Proof of Theorem 8

Theorem 8 follows immediately from the following result.

Lemma 12.

For every sufficiently small ω>0\omega>0 and every degree sequence 𝒟{\cal D} such that R𝒟ωM𝒟R_{\cal D}\leq\omega M_{\cal D} and M𝒟M_{\cal D} is sufficiently large in terms of ω\omega, for every vertex vv of H(𝒟)H({\cal D}), the probability that vv lies in a component of H(𝒟)H({\cal D}) of size larger than ω1/9M𝒟\omega^{1/9}M_{\cal D} is less than eM𝒟1/4e^{-M_{\mathcal{D}}^{1/4}}.

In order to prove this lemma, we analyse our random exploration process on $H(\mathcal{D})$ using a set $S_{0}$ of vertices including $v$ and show that the probability that there is a $t$ with $X_{t}=0$ and such that there are at most $\omega^{1/9}M_{\mathcal{D}}$ edges in the graph induced by $S_{t}$ in $H(\mathcal{D})$ is at least $1-e^{-M_{\mathcal{D}}^{1/4}}$.

Since it is difficult to keep track of XtX_{t}, we will instead focus on the random variable XtX_{t}^{\prime} (defined below), which overestimates XtX_{t} until Xt=0X_{t}=0. Clearly, X0X_{0} is at most

$$X^{\prime}_{0}=\sum_{u\in S_{0}}d(u)\;.$$

Provided that Xt1>0X_{t-1}>0, the edge vtwtv_{t}w_{t} is an edge of H(𝒟)H(\mathcal{D}), and we can upper bound XtX_{t} by

$$X^{\prime}_{t}=X_{0}^{\prime}+\sum_{i=1}^{t}(d(w_{i})-2)\;.$$

Observe that, provided that $X_{t-1}>0$, the process $X_{t}^{\prime}$ coincides with $X_{t}$ if the explored components are trees and $S_{0}$ is a stable set. More importantly, we observe that if $X^{\prime}_{t}=0$, then there is a $t^{\prime}\leq t$ for which $X_{t^{\prime}}=0$. We note further that the number of edges in the graph induced by $S_{t}$ in $H(\mathcal{D})$ is at most $X^{\prime}_{t}+2t$, so we only need to show that the probability that there is no $t<\frac{\omega^{1/9}M_{\mathcal{D}}}{2}$ for which $X^{\prime}_{t}=0$ is less than $e^{-M_{\mathcal{D}}^{1/4}}$.
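The upper process $X'_t$ depends only on the degrees of the vertices in $S_0$ and of the vertices $w_1,w_2,\dots$ in the order in which they are explored. A minimal Python sketch, with hypothetical function names and requiring Python 3.8 or later for the `initial` argument, makes this explicit.

```python
from itertools import accumulate


def upper_process(start_degrees, explored_degrees):
    """Return [X'_0, X'_1, ..., X'_t] where X'_t = X'_0 + sum_i (d(w_i) - 2)."""
    x0 = sum(start_degrees)
    return list(accumulate((d - 2 for d in explored_degrees), initial=x0))


def first_zero(values):
    """Smallest t with X'_t = 0, or None if the process never reaches 0."""
    for t, x in enumerate(values):
        if x == 0:
            return t
    return None
```

For instance, starting from a single vertex of degree 3 and then exploring three vertices of degree 1 gives the sequence `[3, 2, 1, 0]`. The sum of the degrees in $S_3$ is $6=X'_3+2\cdot 3$, matching the bound on the number of edges used above.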

As suggested by our introductory discussion and proven below, the probability that wtw_{t} is a specific vertex ww in V(H(𝒟))St1V(H({\cal D}))\setminus{S}_{t-1} is essentially proportional to the degree of ww. Therefore, the expected value of XtXt1=d(wt)2X^{\prime}_{t}-X^{\prime}_{t-1}=d(w_{t})-2 is with high probability close to

$$\frac{1}{\sum\limits_{w\in V(H(\mathcal{D}))\setminus S_{t-1}}d(w)}\sum\limits_{w\in V(H(\mathcal{D}))\setminus S_{t-1}}d(w)(d(w)-2).$$
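The quantity above is the mean of $d(w)-2$ when $w$ is drawn from the unexplored vertices with probability exactly proportional to its degree. The short Python sketch below evaluates it for a given list of unexplored degrees; the function name is hypothetical and exact proportionality is an idealisation of the approximate statement proved later.

```python
from fractions import Fraction


def expected_drift(degrees_outside):
    """Size-biased mean of d(w) - 2 over the unexplored vertices."""
    total = sum(degrees_outside)
    weighted = sum(d * (d - 2) for d in degrees_outside)
    return Fraction(weighted, total)
```

With unexplored degrees `[1, 1, 1, 1, 3]` the drift is $-1/7$, so the number of open edges tends to shrink, whereas `[1, 1, 3, 3]` gives a drift of $1/2$.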

By putting the high degree vertices of H(𝒟)H({\cal D}) into S0{v}S_{0}\setminus\{v\}, we can ensure that the expectation of X1X0X^{\prime}_{1}-X^{\prime}_{0} is negative. The fact that, if the process has not died out by time tt, then the sum of the degrees of the vertices we picked cannot be much less than 2t2t, allows us to obtain a bound on the expectation of XtXt1X^{\prime}_{t}-X^{\prime}_{t-1} which decreases as tt increases. Having all the high degree vertices in S0S_{0} facilitates our analysis of the exploration process, and allows us to show that the probability that XtX^{\prime}_{t} drops to 0 before the number of edges in the graph induced by StS_{t} in H(𝒟)H(\mathcal{D}) is at least ω1/9M𝒟\omega^{1/9}M_{\cal D}, is more than 1eM𝒟1/41-e^{-M_{\mathcal{D}}^{1/4}}. Forthwith the details.

We use HH for H(𝒟)H({\cal D}), VV for V(H(𝒟))V(H({\cal D})), MM for M𝒟M_{\cal D}, RR for R𝒟R_{\cal D} and GG for G(𝒟)G({\cal D}). We implicitly assume that ω\omega is small enough and MM is large enough in terms of ω\omega to ensure various inequalities scattered throughout the proof are satisfied.

Let SS be a smallest set of vertices of HH such that iSdi5ω1/4M\sum_{i\in S}d_{i}\geq 5\omega^{1/4}M and there is no vertex outside of SS with degree bigger than a vertex in SS. Since the sum of the degrees of the vertices in HH is M>5ω1/4MM>5\omega^{1/4}M, such a set SS exists. Furthermore, since RωM<5ω1/4MR\leq\omega M<5\omega^{1/4}M, the definition of jDj_{D} implies that wVSd(w)(d(w)2)0\sum_{w\in V\setminus S}d(w)(d(w)-2)\leq 0. It is straightforward to prove, as we do below, the following strengthening, which is important for our analysis.

Lemma 13.

We have

  (a) $\sum\limits_{w\in V\setminus S}d(w)(d(w)-2)\leq-4\omega^{1/4}M$, and

  (b) there is a vertex of $S$ with degree at most $\omega^{-1/4}$.

The sum of the degrees in SS is at most the sum of 5ω1/4M5\omega^{1/4}M and the minimum degree in SS. Let S0=S{v}S_{0}=S\cup\{v\}. Since every vertex not in SS has degree at most the minimum degree in SS and X0X^{\prime}_{0} is the sum of the degrees of the vertices in S{v}S\cup\{v\}, the following observation holds.

Observation 14.

We have

  (a) $d(u)\leq\omega^{-1/4}$ for every $u\in V\setminus S_{0}$, and

  (b) $X_{0}^{\prime}\leq 7\omega^{1/4}M$.

As we carry out our process, we let $Y_{t}=X^{\prime}_{t}-X^{\prime}_{t-1}-\mathbb{E}[d(w_{t})-2]$, where the expectation is conditional on $\mathcal{C}_{t-1}$. By construction $\mathbb{E}[Y_{t}]=0$ and by Lemma 13 (b), the absolute value of $Y_{t}$ is bounded from above by $\omega^{-1/4}$. As we explain below, by applying Azuma’s Inequality we immediately obtain:

Lemma 15.

The probability that there is a $t$ such that $\sum_{t^{\prime}\leq t}Y_{t^{\prime}}>M^{2/3}$ is less than $e^{-M^{1/4}}$.

Thus, in order to bound $X^{\prime}_{t}$ from above, we need to bound $\mathbb{E}[d(w_{t})]$ from above for each $t\geq 1$. Letting $M_{t-1}$ be the sum of the degrees of the vertices of $H$ which are not in $S_{t-1}$ and using our switching arguments, we can prove the following result, which will be useful to give a precise estimate for $\mathbb{E}[d(w_{t})]$.

Lemma 16.

If tω1/9Mt\leq\omega^{1/9}M and Xt1ω1/5MX^{\prime}_{t-1}\leq\omega^{1/5}M, and Xt>0X_{t^{\prime}}>0 for all t<tt^{\prime}<t, then the following statements hold:

  (a) If $w\in V\setminus S_{t-1}$ and $d(w)=1$, then

    $$\mathbb{P}[w_{t}=w]\geq\left(1-9\omega^{1/5}\right)\frac{1}{M_{t-1}}\;.$$

  (b) If $w\in V\setminus S_{t-1}$, then

    $$\mathbb{P}[w_{t}=w]\leq\left(1+9\omega^{1/5}\right)\frac{d(w)}{M_{t-1}}\;.$$

Iteratively applying this result to bound 𝔼[d(wt)]\mathbb{E}[d(w_{t})], we obtain:

Lemma 17.

Letting τ\tau be the minimum of ω1/9M2\lfloor\frac{\omega^{1/9}M}{2}\rfloor and the first tt for which ttYt>M2/3\sum_{t^{\prime}\leq t}Y_{t^{\prime}}>M^{2/3} or Xt=0X_{t}=0, we have the following for all tτt\leq\tau:

$$\mathbb{E}[d(w_{t})-2]\leq-\frac{t}{M}+19\omega^{1/5}.$$

The next lemma completes the proof of Lemma 12 and thus, of Theorem 8.

Lemma 18.

With probability greater than 1eM1/41-e^{-M^{1/4}}, there exists tω1/9M3t\leq\lceil\frac{\omega^{1/9}M}{3}\rceil such that Xt=0X_{t}=0.

Proof.

By Lemma 15, with probability greater than 1eM1/41-e^{-M^{1/4}} there is no tt such that ttYt>M2/3\sum_{t^{\prime}\leq t}Y_{t^{\prime}}>M^{2/3}. If this event holds, and there is no tω1/9M3t\leq\lceil\frac{\omega^{1/9}M}{3}\rceil for which Xt=0X_{t}=0, then by applying Lemma 17 we see that for every such tt,

$$X_{t}\leq X^{\prime}_{t}\leq X^{\prime}_{0}-\frac{t(t-1)}{2M}+19\omega^{1/5}t+M^{2/3}.$$

Since $X^{\prime}_{0}\leq 7\omega^{1/4}M$, it follows that for $t=\lceil\frac{\omega^{1/9}M}{3}\rceil$, we have $X_{t}<0$, which is a contradiction. ∎
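The arithmetic behind the final contradiction can be spelled out as follows. This is a sketch of the comparison of exponents, using only $X'_0\leq 7\omega^{1/4}M$ and $\frac{\omega^{1/9}M}{3}\leq t\leq\frac{\omega^{1/9}M}{3}+1$; the explicit lower-order terms are bookkeeping introduced here.

```latex
\frac{t(t-1)}{2M} \;\ge\; \frac{\omega^{2/9}M}{18}-\frac{\omega^{1/9}}{6},
\qquad
19\,\omega^{1/5}\,t \;\le\; \frac{19}{3}\,\omega^{14/45}M+19\,\omega^{1/5},
\\[4pt]
X'_{t} \;\le\; 7\omega^{1/4}M-\frac{\omega^{2/9}M}{18}
  +\frac{19}{3}\,\omega^{14/45}M+M^{2/3}
  +\frac{\omega^{1/9}}{6}+19\,\omega^{1/5} \;<\; 0 .
```

The last inequality holds for $\omega$ small enough and $M$ large enough in terms of $\omega$, because $\frac{2}{9}<\frac{1}{4}<\frac{14}{45}$, so the negative term of order $\omega^{2/9}M$ dominates all the others.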

Therefore, it only remains to prove Lemmas 13, 15, 16, and 17.

4.1 The Details

We start with some simple observations.

Observation 19.

The maximum degree dπnd_{\pi_{n}} of HH is at most ωM\omega M.

Proof.

By definition, jDnj_{D}\leq n, which implies dπnRωMd_{\pi_{n}}\leq R\leq\omega M. ∎

We let n1n_{1} be the number of vertices of degree 1 in HH.

Observation 20.

We have n1M3+1n_{1}\geq\frac{M}{3}+1.

Proof.

By the definition of jDj_{D}, we obtain

$$-2n_{1}+\sum_{i\in[j_{D}-1]:d_{\pi_{i}}\neq 2}d_{\pi_{i}}\leq\sum_{i=1}^{j_{D}-1}d_{\pi_{i}}(d_{\pi_{i}}-2)\leq 0.$$

Hence, 2n1i[jD1]:dπi2dπiMR(1ω)M2n_{1}\geq\sum_{i\in[j_{D}-1]:d_{\pi_{i}}\not=2}d_{\pi_{i}}\geq M-R\geq(1-\omega)M, which implies n1(1ω)M2M3+1n_{1}\geq\frac{(1-\omega)M}{2}\geq\frac{M}{3}+1. ∎

If there is a vertex in $S$ of degree $1$, then every vertex outside of $S$ has degree $1$. Thus every edge in the components of $H$ containing $S_{0}$ is incident to a vertex of $S_{0}$, hence there are at most $X^{\prime}_{0}<\frac{\omega^{1/9}M}{2}$ such edges and we are done. So we may assume that every vertex in $S$ has degree at least $3$.

Proof of Lemma 13.

Let jDj_{D}^{*} be such that i=jDndπi=iSdi\sum_{i=j_{D}^{*}}^{n}d_{\pi_{i}}=\sum_{i\in S}d_{i}. Since iSdi>5ω1/4M>ωM\sum_{i\in S}d_{i}>5\omega^{1/4}M>\omega M, we have jD<jDj_{D}^{*}<j_{D}. By the definition of RR, we have i=jDjD1dπi5ω1/4MR>4ω1/4M\sum_{i=j_{D}^{*}}^{j_{D}-1}d_{\pi_{i}}\geq 5\omega^{1/4}M-R>4\omega^{1/4}M. Now, since SS contains only vertices of degree at least 33, the definition of jDj_{D} implies wVSd(w)(d(w)2)=i=1jD1dπi(dπi2)i=jDjD1dπi(dπi2)i=jDjD1dπi<4ω1/4M\sum_{w\in V\setminus S}d(w)(d(w)-2)=\sum_{i=1}^{j_{D}^{*}-1}d_{\pi_{i}}(d_{\pi_{i}}-2)\leq-\sum_{i=j_{D}^{*}}^{j_{D}-1}d_{\pi_{i}}(d_{\pi_{i}}-2)\leq-\sum_{i=j_{D}^{*}}^{j_{D}-1}d_{\pi_{i}}<-4\omega^{1/4}M, and the first statement follows.

If every vertex in $S$ has degree at least $\omega^{-1/4}$, then, since $\omega$ is sufficiently small, every such degree $d$ satisfies $d-2\geq\omega^{-1/4}-2>\frac{3\omega^{-1/4}}{4}$. Together with the above bound on $\sum_{i=j_{D}^{*}}^{j_{D}-1}d_{\pi_{i}}$, we have $\sum_{i=j_{D}^{*}}^{j_{D}-1}d_{\pi_{i}}(d_{\pi_{i}}-2)>\frac{3\omega^{-1/4}}{4}\cdot 4\omega^{1/4}M=3M$. Since there are at most $M$ values of $i$ for which $d_{i}=1$, it follows that $\sum_{i=1}^{j_{D}-1}d_{\pi_{i}}(d_{\pi_{i}}-2)\geq 3M-M>0$, which is a contradiction to the choice of $j_{D}$, and the second statement follows. ∎

For the proof of Lemma 15 we recall a standard concentration inequality.

Lemma 21 (Azuma’s Inequality (see, e.g. [26])).

Let XX be a random variable determined by a sequence of NN random experiments T1,,TNT_{1},\ldots,T_{N} such that for every 1iN1\leq i\leq N and any possible sequences t1,,ti1,tit_{1},\ldots,t_{i-1},t_{i} and t1,,ti1,tit_{1},\ldots,t_{i-1},t_{i}^{\prime}:

$$\left|\mathbb{E}[X\mid T_{1}=t_{1},\ldots,T_{i}=t_{i}]-\mathbb{E}[X\mid T_{1}=t_{1},\ldots,T_{i}=t_{i}^{\prime}]\right|\leq c_{i},$$

then

$$\mathbb{P}[|X-\mathbb{E}[X]|>t]<2e^{-\frac{t^{2}}{2\sum_{i=1}^{N}c_{i}^{2}}}.$$

Proof of Lemma 15.

Recall that Yt=XtXt1𝔼[d(wt)2]Y_{t}=X^{\prime}_{t}-X^{\prime}_{t-1}-\mathbb{E}[d(w_{t})-2], which implies 𝔼[Yt]=0\mathbb{E}[Y_{t}]=0. By Lemma 13, we have |Yt|ω1/4|Y_{t}|\leq\omega^{-1/4}. Azuma’s inequality applied to ttYt\sum_{t^{\prime}\leq t}Y_{t^{\prime}} with N=tN=t and ci=ω1/4c_{i}=\omega^{-1/4} gives us

$$\mathbb{P}\left[\sum_{t^{\prime}\leq t}Y_{t^{\prime}}>M^{2/3}\right]<2e^{-\frac{M^{4/3}}{2\omega^{-1/2}t}}<e^{-M^{2/7}}\;,$$

where we used that tMt\leq M. A union bound over all tMt\leq M suffices to obtain the desired statement. ∎
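The two estimates used in this proof can be made explicit. The derivation below is a sketch that assumes, as throughout this section, that $M$ is sufficiently large in terms of $\omega$.

```latex
\sum_{i=1}^{t}c_{i}^{2}=t\,\omega^{-1/2}
\quad\Longrightarrow\quad
\frac{M^{4/3}}{2\,\omega^{-1/2}\,t}\;\ge\;\frac{\omega^{1/2}M^{1/3}}{2}
\;\ge\; M^{2/7}+\ln 2
\qquad (t\le M),
\\[4pt]
\sum_{t\le M}\mathbb{P}\Big[\sum_{t'\le t}Y_{t'}>M^{2/3}\Big]
\;<\; M\,e^{-M^{2/7}}
\;\le\; e^{-M^{1/4}} .
```

The first line uses $\frac{1}{3}>\frac{2}{7}$, and the last inequality holds as soon as $M^{2/7}-M^{1/4}\geq\ln M$, which is true for large $M$ because $\frac{2}{7}>\frac{1}{4}$.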

Proof of Lemma 16.

We can assume that there is an edge of HH from St1S_{t-1} to VSt1V\setminus S_{t-1} as otherwise the probability that ww is wtw_{t} is exactly d(w)Mt1\frac{d(w)}{M_{t-1}}, by construction of the exploration process, and we are done.

It is enough to prove this result conditioned on the current configuration $\mathcal{C}_{t-1}$. In doing so, we partition the inputs within $\mathcal{C}_{t-1}$ (a graph with an ordering of the adjacency list of each vertex) into different equivalence classes. All the inputs in the same equivalence class share the same underlying graph $G$, a partial order of the adjacency list of each vertex in $S_{t-1}$ and a specification of the first edge from $v_{t}$ to $V\setminus S_{t-1}$ in the ordering of the adjacency list for $v_{t}$. Observe that each equivalence class corresponds to the same number of inputs; these arise from each other by suitably reordering some of the adjacency lists. More precisely, every equivalence class contains $\prod_{i\in V}(d^{\prime}_{i})!$ inputs, where $d^{\prime}_{i}=d_{i}$ if $i\in V\setminus S_{t-1}$, and $d^{\prime}_{i}$ is the number of edges of $H$ from $i$ to $V\setminus S_{t-1}$ if $i\in S_{t-1}\setminus\{v_{t}\}$ and is one less than this number if $i=v_{t}$.

We generalise the definition of extended switching from graphs to equivalence classes of inputs, provided that the switching neither uses nor creates an edge of $H$ within $S_{t-1}$. This ensures that after the switching, the set of edges of $H$ within $S_{t-1}$ (and the set of edges of $G$ corresponding to them) remains unchanged. Consider switching the pair of edges $(e_{1}=uv,e_{2}=xy)$ as explained in Section 3. Let $u=w_{0},w_{1},\dots,w_{r}=v$ be the directed walk in $G$ corresponding to $e_{1}$ and let $x=z_{0},z_{1},\dots,z_{s}=y$ be the directed walk in $G$ corresponding to $e_{2}$. When switching, we delete the edges $w_{r-1}v$ and $xz_{1}$ and add the edges $w_{r-1}x$ and $vz_{1}$. This naturally preserves the ordering of the adjacency lists of $u=w_{0},w_{1},\dots,w_{r-2}$ and $z_{2},\dots,z_{s-1},z_{s}=y$. The orderings of the adjacency lists of $w_{r-1}$ and $z_{1}$ are obtained by preserving the position of the edges $w_{r-2}w_{r-1}$ and $z_{1}z_{2}$ in them, respectively. If $v\neq x$, after the switching, the ordering of the adjacency list of $v$ is obtained by preserving the position of the edges incident to $v$ and different from $e_{1}$. If $v=x$, after the switching, the positions of the edges $e_{1}$ and $e_{2}$ in the adjacency list of $v$ are swapped. This works analogously for $x$. Note that this generalisation preserves the equivalence class of the input: if a switching is performed on an input, first, no edges within $S_{t-1}$ are modified (nor are their positions in the corresponding adjacency lists) and, second, the specification of the first edge from $v_{t}$ to $V\setminus S_{t-1}$ in the adjacency list of $v_{t}$ is preserved.

For any vertex wVSt1w\in V\setminus{S}_{t-1}, we let 𝒜w{\cal A}_{w} be the set of equivalence classes for which ww is wtw_{t}, and let w{\cal B}_{w} be the set of equivalence classes for which ww is not wtw_{t}. In order to bound the probability that ww is chosen to be the vertex wtw_{t} that will be added to St1S_{t-1}, we consider switchings between elements of 𝒜w{\cal A}_{w} and w{\cal B}_{w}, which allow us to relate |𝒜w||{\cal A}_{w}| and |w||{\cal B}_{w}|.

Since t<M12t<\frac{M}{12}, the set VSt1V\setminus S_{t-1} contains at least M4+1\frac{M}{4}+1 of the at least M3+1\frac{M}{3}+1 vertices of degree 11 and so Mt1M4+1M_{t-1}\geq\frac{M}{4}+1.

Proof of (a). A switching from 𝒜w{\cal A}_{w} to w{\cal B}_{w} involves the oriented edge vtwv_{t}w and one of the remaining Mt11M_{t-1}-1 oriented edges with first endpoint in VSt1V\setminus S_{t-1} or wvtwv_{t} and one of the remaining Mt11M_{t-1}-1 oriented edges with last endpoint in VSt1V\setminus S_{t-1}. This implies that there are less than 4Mt1|𝒜w|4M_{t-1}|{\cal A}_{w}| switchings from 𝒜w{\cal A}_{w} to w{\cal B}_{w}.

Next, we prove a lower bound for the number of switchings from $\mathcal{B}_{w}$ to $\mathcal{A}_{w}$. The choice of an equivalence class in $\mathcal{B}_{w}$ fixes a choice of the edge $v_{t}w_{t}$. A vertex in $V\setminus S_{t-1}$ is bad if it either has a neighbour in $S_{t-1}$, has a common neighbour with $w_{t}$ or is adjacent to $w_{t}$. If $w$ is not bad, then for the unique neighbour $z$ of $w$ there exist four switchings using $v_{t}w_{t}$ and $wz$ (or their reverses), from $\mathcal{B}_{w}$ to $\mathcal{A}_{w}$. By the hypothesis of the lemma, at most $X_{t-1}\leq X_{t-1}^{\prime}\leq\omega^{1/5}M$ vertices in $V\setminus S_{t-1}$ have a neighbour in $S_{t-1}$. By Observation 14, the degrees of the vertices in $V\setminus S_{t-1}$ are at most $\omega^{-1/4}$, which implies that there are at most $\omega^{1/5}M+\omega^{-1/2}+\omega^{-1/4}\leq 2\omega^{1/5}M$ bad vertices. Since sampling an element of $\mathcal{B}_{w}$ uniformly at random, and applying a random permutation to the at least $M/4$ vertices of degree $1$ in $V\setminus(S_{t-1}\cup\{w_{t}\})$, yields an equivalence class of $\mathcal{B}_{w}$ that is still sampled uniformly at random, the proportion of equivalence classes in $\mathcal{B}_{w}$ in which a specific vertex $w$ of degree $1$ is not bad is at least $1-\frac{2\omega^{1/5}M}{M/4}=1-8\omega^{1/5}$. This implies that there are at least $4\left(1-8\omega^{1/5}\right)|\mathcal{B}_{w}|$ switchings from $\mathcal{B}_{w}$ to $\mathcal{A}_{w}$.

Double-counting these switchings yields $|\mathcal{A}_{w}|\geq\frac{(1-8\omega^{1/5})|\mathcal{B}_{w}|}{M_{t-1}}$. Since $\mathbb{P}[w=w_{t}]=\frac{|\mathcal{A}_{w}|}{|\mathcal{A}_{w}|+|\mathcal{B}_{w}|}$, and $M_{t-1}=M-X_{t-1}^{\prime}-2(t-1)$ is large in terms of $\omega$, it follows that $\mathbb{P}[w=w_{t}]\geq\frac{1-9\omega^{1/5}}{M_{t-1}}$.
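The passage from the ratio of class sizes to the probability bound can be made explicit as follows. Here $c=1-8\omega^{1/5}$ is shorthand introduced for this derivation, and the step uses that $x\mapsto\frac{x}{1+x}$ is increasing.

```latex
|\mathcal{A}_{w}| \;\ge\; \frac{c}{M_{t-1}}\,|\mathcal{B}_{w}|
\quad\Longrightarrow\quad
\mathbb{P}[w=w_{t}]
  \;=\; \frac{|\mathcal{A}_{w}|}{|\mathcal{A}_{w}|+|\mathcal{B}_{w}|}
  \;\ge\; \frac{c/M_{t-1}}{1+c/M_{t-1}}
  \;=\; \frac{c}{M_{t-1}+c}
  \;\ge\; \frac{c-\omega^{1/5}}{M_{t-1}}
  \;=\; \frac{1-9\omega^{1/5}}{M_{t-1}} .
```

The last inequality is equivalent to $\omega^{1/5}(M_{t-1}+c)\geq c^{2}$, which holds whenever $M_{t-1}\geq\omega^{-1/5}$ since $c\leq 1$.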

Proof of (b). Let d=d(w)d=d(w). For a switching from w{\cal B}_{w} to 𝒜w{\cal A}_{w}, we have to switch the ordered edge vtwtv_{t}w_{t} with one of the dd ordered edges wywy or the ordered edge wtvtw_{t}v_{t} with one of the dd ordered edges ywyw. Therefore, the number of switchings from w{\cal B}_{w} to 𝒜w{\cal A}_{w} is at most 4d|w|4d|{\cal B}_{w}|.

Next, we prove a lower bound for the number of switchings from 𝒜w{\cal A}_{w} to w{\cal B}_{w}. We have such a switching involving the ordered edge vtwv_{t}w, and some ordered edge yzyz provided yy is in VSt1V\setminus{S}_{t-1}, and vtyv_{t}y and wzwz are not edges. We have such a switching involving the ordered edge wvtwv_{t}, and some ordered edge zyzy provided yy is in VSt1V\setminus{S}_{t-1}, and vtyv_{t}y and wzwz are not edges. Since vtv_{t} has degree at most ωM\omega M, and the maximum degree of a vertex in VSt1V\setminus S_{t-1} is ω1/4\omega^{-1/4}, the number of choices for zyzy is at least Mt1ω3/4MXt1ω1/2>(12ω1/5)Mt1M_{t-1}-\omega^{3/4}M-X^{\prime}_{t-1}-\omega^{-1/2}>(1-2\omega^{1/5})M_{t-1}.

Double-counting the number of switchings, we obtain |w|(12ω1/5)Mt1d|𝒜w||{\cal B}_{w}|\geq\left(1-2\omega^{1/5}\right)\frac{M_{t-1}}{d}|{\cal A}_{w}|. As before [w=wt]=|𝒜w||𝒜w|+|w|\mathbb{P}[w=w_{t}]=\frac{|{\cal A}_{w}|}{|{\cal A}_{w}|+|{\cal B}_{w}|}, and hence [wt=w](1+9ω1/5)dMt1.\mathbb{P}[w_{t}=w]\leq\frac{(1+9\omega^{1/5})d}{M_{t-1}}.
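
The double-counting step used in both parts can be illustrated on a toy example. The sketch below is not the construction of the proof: it assumes the simplest form of a switching on simple graphs (ordered edges $ab$, $cd$ are replaced by $ac$, $bd$), ignores equivalence classes of inputs, and uses a small hypothetical degree sequence. All function names are illustrative. It enumerates every graph with the given degrees, splits them into a class `A` (vertices 0 and 1 adjacent) and a class `B` (not adjacent), and counts switchings in both directions.

```python
from itertools import combinations

def graphs_with_degrees(degrees):
    """Yield every simple graph (frozenset of edges) with this degree sequence."""
    n = len(degrees)
    m = sum(degrees) // 2
    all_pairs = list(combinations(range(n), 2))
    for edges in combinations(all_pairs, m):
        deg = [0] * n
        for a, b in edges:
            deg[a] += 1
            deg[b] += 1
        if deg == list(degrees):
            yield frozenset(edges)

def edge(a, b):
    """Canonical (sorted) representation of the edge ab."""
    return (a, b) if a < b else (b, a)

def switchings(graph):
    """Yield one graph per valid ordered switching: ab, cd are replaced by ac, bd."""
    oriented = [(a, b) for a, b in graph] + [(b, a) for a, b in graph]
    for a, b in oriented:
        for c, d in oriented:
            if len({a, b, c, d}) < 4:
                continue  # the four endpoints must be distinct
            if edge(a, c) in graph or edge(b, d) in graph:
                continue  # the result must stay a simple graph
            yield (graph - {edge(a, b), edge(c, d)}) | {edge(a, c), edge(b, d)}

def count_between(source, target):
    """For each graph in source, count the switchings that land in target."""
    target = set(target)
    return {g: sum(1 for h in switchings(g) if h in target) for g in source}

graphs = list(graphs_with_degrees((2, 2, 1, 1, 1, 1)))
A = [g for g in graphs if (0, 1) in g]
B = [g for g in graphs if (0, 1) not in g]

out_of_B = count_between(B, A)
out_of_A = count_between(A, B)
total = sum(out_of_B.values())

# Every switching is reversible, so both directions count the same pairs.
assert total == sum(out_of_A.values())

# Double counting: (min per element of B) * |B| <= total <= (max per element of A) * |A|.
lower = min(out_of_B.values())
upper = max(out_of_A.values())
assert lower * len(B) <= total <= upper * len(A)
```

The two assertions are exactly the inequalities that turn a per-element lower bound on one side and a per-element upper bound on the other into a bound on the ratio $|{\cal A}|/|{\cal B}|$.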

Proof of Lemma 17.

The proof is by induction on tt. Recall that there are between M4\frac{M}{4} and MM vertices of degree 11 in VSt1V\setminus S_{t-1}.

First, let t=1t=1. Observation 14 (b) gives us X0ω1/5M2X_{0}^{\prime}\leq\frac{\omega^{1/5}M}{2}, hence Lemma 16 implies

\[ \mathbb{E}[d(w_{1})-2]\leq\frac{(1+9\omega^{1/5})\sum_{w\in V\setminus S_{0}}d(w)(d(w)-2)+18\omega^{1/5}n_{1}}{M_{0}}. \]

Thus, by Lemma 13 and the fact that n1n_{1} is at most M0+1M_{0}+1 (recall that there are no vertices of degree 11 in S0S_{0}), we have 𝔼[d(w1)2]1M+19ω1/5\mathbb{E}[d(w_{1})-2]\leq-\frac{1}{M}+19\omega^{1/5}.

Now, let 2tω1/9M22\leq t\leq\frac{\omega^{1/9}M}{2}. By induction, we obtain

\[ X^{\prime}_{t-1}=X^{\prime}_{0}+\sum_{i=1}^{t-1}(\mathbb{E}[d(w_{i})]-2)+\sum_{t^{\prime}<t}Y_{t^{\prime}}\leq\frac{\omega^{1/5}M}{2}+19\omega^{1/5}t+M^{2/3}\leq\omega^{1/5}M. \]

Claim 1.

For any sequence of positive integers a1,,aja_{1},\ldots,a_{j} distinct from 2 and a nonnegative integer \ell such that i=1jai2j\sum_{i=1}^{j}a_{i}\geq 2j-\ell, we have i=1jai(ai2)j2\sum_{i=1}^{j}a_{i}(a_{i}-2)\geq j-2\ell.

Proof.

The proof is by induction on i=1jai\sum_{i=1}^{j}a_{i}. The statement is trivially true if j=0j=0.

If all aia_{i} are at least 33, then i=1jai(ai2)i=1jai2jj2\sum_{i=1}^{j}a_{i}(a_{i}-2)\geq\sum_{i=1}^{j}a_{i}\geq 2j-\ell\geq j-2\ell. Hence, we may assume without loss of generality that aj=1a_{j}=1. Since i=1j1ai2j1=2(j1)(1)\sum_{i=1}^{j-1}a_{i}\geq 2j-\ell-1=2(j-1)-(\ell-1), if 0\ell\neq 0, we obtain by induction that i=1jai(ai2)(j1)2(1)1=j2\sum_{i=1}^{j}a_{i}(a_{i}-2)\geq(j-1)-2(\ell-1)-1=j-2\ell.

So, we can assume that $\ell=0$. If for all $i\in[j]$, the integer $a_{i}\in\{1,3\}$, then the claim also follows easily, as then $|\{i\in[j]:a_{i}=3\}|\geq|\{i\in[j]:a_{i}=1\}|$. Since $3(3-2)+1(1-2)=2$, the negative contribution of a $1$ is compensated by a $3$.

Hence, without loss of generality, a14a_{1}\geq 4 and j>1j>1. If i=1jai>2j\sum_{i=1}^{j}a_{i}>2j then by decreasing a1a_{1} by 1 and then applying induction we are done. So, we may assume i=1jai=2j\sum_{i=1}^{j}a_{i}=2j, and thus there must be at least a12a_{1}-2 values of ii for which ai=1a_{i}=1. The sum of ak(ak2)a_{k}(a_{k}-2) over these a12a_{1}-2 elements and a1a_{1} is (a11)(a12)a11(a_{1}-1)(a_{1}-2)\geq a_{1}-1. So, deleting these a11a_{1}-1 elements from the set and again applying induction, we are done. ∎
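
Claim 1 is easy to sanity-check by exhaustive search on small instances. The following sketch is a numerical check only, not part of the proof; the function names and the search bounds (`max_len`, `max_value`, `max_ell`) are arbitrary choices made for this illustration.

```python
from itertools import combinations_with_replacement

def claim_holds(seq, ell):
    """Check Claim 1 for one sequence of positive integers != 2 and one ell >= 0."""
    j = len(seq)
    if sum(seq) < 2 * j - ell:
        return True  # hypothesis of the claim fails, nothing to verify
    return sum(a * (a - 2) for a in seq) >= j - 2 * ell

def check_claim(max_len=6, max_value=7, max_ell=6):
    """Return all (sequence, ell) pairs within the bounds that violate Claim 1."""
    values = [a for a in range(1, max_value + 1) if a != 2]
    violations = []
    for j in range(max_len + 1):
        # The claim is symmetric in the a_i, so multisets suffice.
        for seq in combinations_with_replacement(values, j):
            for ell in range(max_ell + 1):
                if not claim_holds(seq, ell):
                    violations.append((seq, ell))
    return violations

violations = check_claim()
```

Within these bounds the search is expected to return an empty list. The sequence $(1,3)$ with $\ell=0$ is a tight case: $\sum a_{i}=4=2j$ and $\sum a_{i}(a_{i}-2)=2=j$.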

Since Xt1>0X_{t-1}>0, we have Xt1>0X^{\prime}_{t-1}>0, and hence i=1t1d(wi)=2(t1)+i=1t1(d(wi)2)=2(t1)+Xt1X02(t1)X0\sum_{i=1}^{t-1}d(w_{i})=2(t-1)+\sum_{i=1}^{t-1}(d(w_{i})-2)=2(t-1)+X^{\prime}_{t-1}-X^{\prime}_{0}\geq 2(t-1)-X^{\prime}_{0}. By Claim 1,

\[ \sum_{i=1}^{t-1}d(w_{i})(d(w_{i})-2)\geq(t-1)-2X^{\prime}_{0}. \]

Since VSt1=V(S0{w1,,wt1})V\setminus S_{t-1}=V\setminus(S_{0}\cup\{w_{1},...,w_{t-1}\}), Lemma 13 yields

\[ \sum_{w\in V\setminus{S}_{t-1}}d(w)(d(w)-2)\leq 2X^{\prime}_{0}-(t-1)\;. \]

Combining this bound with Lemma 16 yields

\[ \mathbb{E}[d(w_{t})]-2\leq\frac{(1+9\omega^{1/5})(2X^{\prime}_{0}-(t-1))+18\omega^{1/5}n_{1}}{M_{t-1}}\;. \]

Since n1Mt1n_{1}\leq M_{t-1}, X07ω1/4MX^{\prime}_{0}\leq 7\omega^{1/4}M, and M4Mt1<M\frac{M}{4}\leq M_{t-1}<M, we obtain

\[ \mathbb{E}[d(w_{t})]-2\leq-\frac{t-1}{M}+4(1+9\omega^{1/5})(14\omega^{1/4})+18\omega^{1/5}\leq-\frac{t}{M}+19\omega^{1/5} \]

which completes the proof of Lemma 17. ∎

5 The Proof of Theorem 9

Again, we use HH for H(𝒟)H({\cal D}), VV for V(H(𝒟))V(H({\cal D})), MM for M𝒟M_{\cal D}, RR for R𝒟R_{\cal D} and GG for G(𝒟)G({\cal D}). Since 𝒟{\cal D} is well-behaved, we can also assume that MM is sufficiently large to satisfy various inequalities scattered throughout the proof.

As we have already mentioned, in the preprocessing step we need to handle the vertices of large degree. We let LL be the set of vertices of degree exceeding MlogM\frac{\sqrt{M}}{\log M}. We will prove the following result using the combinatorial switching approach:

Lemma 22.

The probability that the number of edges in the components of HH intersecting LL is at least R200\frac{R}{200} and there exists no component containing at least R200\frac{R}{200} edges is o(1)o(1).

We also need the following straightforward result:

Lemma 23.

Let UU be a set of vertices of HH that contains all vertices of degree exceeding MlogM\frac{\sqrt{M}}{\log M} and let 14<c<1\frac{1}{4}<c<1 be such that uUd(u)cR\sum_{u\in U}d(u)\leq cR. Then

\[ \sum_{u\in V\setminus U}d(u)(d(u)-2)\geq\frac{(1-c)}{2}R. \]

Proof.

The assumptions imply dπj𝒟MlogMd_{\pi_{j_{\mathcal{D}}}}\leq\frac{\sqrt{M}}{\log M}. Letting U={i:ij𝒟,πiU}U^{\prime}=\{i:\,i\leq j_{\mathcal{D}},\,\pi_{i}\in U\} and K=iUdπiK=\sum_{i\in U^{\prime}}d_{\pi_{i}}, the definition of j𝒟j_{\mathcal{D}} implies

\begin{align*}
\sum_{u\in V\setminus U}d(u)(d(u)-2) &\geq-\sum_{i\in U^{\prime}}d_{\pi_{i}}(d_{\pi_{i}}-2)+\sum_{i=j_{\mathcal{D}}+1}^{n}d_{\pi_{i}}(d_{\pi_{i}}-2)-\sum_{i\in U\setminus U^{\prime}}d_{\pi_{i}}(d_{\pi_{i}}-2)\\
&\geq-K(d_{\pi_{j_{\mathcal{D}}}}-2)+(d_{\pi_{j_{\mathcal{D}}}}-2)(R-d_{\pi_{j_{\mathcal{D}}}}-(cR-K))\\
&\geq\frac{(1-c)}{2}R.
\end{align*}

Starting with the set LL, we first explore all the components in HH that contain at least one vertex in LL. Let UU be the set of all vertices in such components. If uUd(u)R100\sum_{u\in U}d(u)\geq\frac{R}{100}, then by Lemma 22 the probability that there does not exist a component in HH with at least R200\frac{R}{200} edges is o(1)o(1). So, in what follows, we condition on uUd(u)R100\sum_{u\in U}d(u)\leq\frac{R}{100}.

We let S0=U{v}S_{0}=U\cup\{v\}, where vv is a random vertex selected with probability proportional to d(v)d(v). Since there are no edges from UU to VUV\setminus U, we have v1=vv_{1}=v. Note that this implies that, for every tt, the edges counted by XtX_{t} belong to the same component (not necessarily the one of vv). In addition, the maximum degree of a vertex in VUV\setminus U is at most MlogM\frac{\sqrt{M}}{\log{M}} and MM0MR100d(v0)MM100MlogM98M100M\geq M_{0}\geq M-\frac{R}{100}-d(v_{0})\geq M-\frac{M}{100}-\frac{\sqrt{M}}{\log{M}}\geq\frac{98M}{100}.

For each vertex wVSt1w\in V\setminus S_{t-1}, we let dt(w)d^{\prime}_{t}(w) be the sum of the number of loops of HH at ww and the number of edges of HH between ww and St1{vt}S_{t-1}\setminus\{v_{t}\}. Observe that we can control the number of edges between StS_{t} and VStV\setminus S_{t} as follows

\[ X_{t}=X_{t-1}+(d(w_{t})-2)-2d^{\prime}_{t}(w_{t})\;. \tag{1} \]
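
Identity (1) can be checked mechanically on a small example. The sketch below makes simplifying assumptions that are not in the text: the graph is simple (no loops or parallel edges, so $d^{\prime}_{t}(w_{t})$ is just the number of edges from $w_{t}$ to $S_{t-1}\setminus\{v_{t}\}$), the exploration starts from a single vertex, and the next edge $v_{t}w_{t}$ is picked in lexicographic order purely to make the run deterministic. The names `boundary_edges` and `explore` are hypothetical.

```python
def boundary_edges(adj, explored):
    """Number of edges with exactly one endpoint in the explored set."""
    return sum(1 for u in explored for v in adj[u] if v not in explored)

def explore(adj, start):
    """Explore the component of start, updating X_t via identity (1) at each step."""
    explored = {start}
    x = boundary_edges(adj, explored)
    history = [x]
    while True:
        frontier = sorted(
            (v, w) for v in explored for w in adj[v] if w not in explored
        )
        if not frontier:
            break
        v_t, w_t = frontier[0]
        # d'_t(w_t): edges from w_t back into the explored set, excluding v_t w_t.
        back = sum(1 for u in adj[w_t] if u in explored and u != v_t)
        x = x + (len(adj[w_t]) - 2) - 2 * back
        explored.add(w_t)
        # The recursive update must agree with a direct recount.
        assert x == boundary_edges(adj, explored)
        history.append(x)
    return history

# A small connected simple graph given by adjacency sets (toy data).
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
history = explore(adj, 0)
```

Each exploration step removes the edge $v_{t}w_{t}$ and the $d^{\prime}_{t}(w_{t})$ other back edges from the boundary and adds the remaining edges at $w_{t}$, which is where the terms $-2$ and $-2d^{\prime}_{t}(w_{t})$ come from.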

The next lemma shows that the probability of selecting a vertex ww at time tt is essentially proportional to its degree.

Lemma 24.

Let β<106\beta<10^{-6} be a fixed constant. If Mt13M4M_{t-1}\geq\frac{3M}{4} and Xt1βMX_{t-1}\leq\beta M, then for every wVSt1w\in V\setminus S_{t-1},

\[ (1-10\sqrt{\beta})\frac{d(w)}{M_{t-1}}\leq\mathbb{P}[w=w_{t}]\leq(1+10\sqrt{\beta})\frac{d(w)}{M_{t-1}}\;, \]

and,

\[ \mathbb{P}\Big[d^{\prime}_{t}(w)\geq\lfloor 2\sqrt{\beta}d(w)\rfloor+i\;\Big|\;w=w_{t}\Big]\leq\beta^{i/2}\;. \]

We are now in a position to carry out our exploration process. Let At=d(wt)𝔼[d(wt)]A_{t}=d(w_{t})-\mathbb{E}[d(w_{t})] and let Bt=dt(wt)𝔼[dt(wt)]B_{t}=d^{\prime}_{t}(w_{t})-\mathbb{E}[d^{\prime}_{t}(w_{t})]. By our convention, both expectations are conditional on 𝒞t1\mathcal{C}_{t-1}. We let bad\mathcal{F}_{bad} be the set of those inputs such that for some tt either ttAt>MloglogM\sum_{t^{\prime}\leq t}A_{t^{\prime}}>\frac{M}{\log\log M} or ttBt>MloglogM\sum_{t^{\prime}\leq t}B_{t^{\prime}}>\frac{M}{\log\log M}.

Lemma 25.

[bad]=o(1)\mathbb{P}[\mathcal{F}_{bad}]=o(1).

Let ϵ=RMϵ\epsilon^{\prime}=\frac{R}{M}\geq\epsilon, β=106ϵ2\beta=10^{-6}\epsilon^{2}, and τ\tau be the smallest tt for which either XtβMX_{t}\geq\beta M or Mt(1ϵ4)M0M_{t}\leq\left(1-\frac{\epsilon^{\prime}}{4}\right)M_{0}.

Lemma 26.

For any $t<\tau$, $\mathbb{E}[d(w_{t})-2]\geq\frac{\epsilon}{4}$, $\mathbb{E}[d^{\prime}_{t}(w_{t})]\leq\frac{\mathbb{E}[d(w_{t})-2]}{3}$, and $\mathbb{E}[X_{t}-X_{t-1}]\geq\frac{\mathbb{E}[d(w_{t})-2]}{3}\geq\frac{\epsilon}{12}$.

Proof.

We have

\[ \sum_{u\in S_{t-1}}d(u)=\sum_{u\in S_{0}}d(u)+\sum_{u\in S_{t-1}\setminus S_{0}}d(u)\leq\frac{R}{100}+M_{0}-M_{t-1}\leq\left(\frac{1}{100}+\frac{1}{4}\right)R\leq\frac{R}{3}\;. \]

By the definition of τ\tau, the hypotheses of Lemma 24 are satisfied. Using Lemma 23 as well as Lemma 24 we conclude

\begin{align*}
\mathbb{E}[d(w_{t})-2] &=\sum_{w\in V\setminus S_{t-1}}(d(w)-2)\mathbb{P}[w=w_{t}]\\
&\geq\frac{1}{M_{t-1}}(1-10\sqrt{\beta})\sum_{\substack{w\in V\setminus S_{t-1}\\ d(w)\geq 3}}d(w)(d(w)-2)+\frac{1}{M_{t-1}}(1+10\sqrt{\beta})\sum_{\substack{w\in V\setminus S_{t-1}\\ d(w)=1}}d(w)(d(w)-2)\\
&\geq\frac{1}{M_{t-1}}(1-10\sqrt{\beta})\sum_{w\in V\setminus S_{t-1}}d(w)(d(w)-2)-\frac{20\sqrt{\beta}n_{1}}{M_{t-1}}\\
&\geq\frac{1}{3}\left(1-10\sqrt{\beta}\right)\frac{R}{M}-30\sqrt{\beta}\geq\frac{\epsilon}{4}\;,
\end{align*}

since β106ϵ2\beta\leq 10^{-6}\epsilon^{2}. This proves the first statement. Again, by Lemma 24, we obtain

\[ \mathbb{E}[d^{\prime}_{t}(w_{t})]\leq 7\sqrt{\beta}\mathbb{E}[d(w_{t})]=7\sqrt{\beta}\mathbb{E}[d(w_{t})-2]+14\sqrt{\beta}\leq\frac{\mathbb{E}[d(w_{t})-2]}{6}+\frac{\epsilon}{70}\leq\frac{\mathbb{E}[d(w_{t})-2]}{3}\;, \]

where the last inequality follows from the first statement of this lemma.
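
The constants in this chain can be checked directly. As a sketch, write $E=\mathbb{E}[d(w_{t})-2]$ (a shorthand used only here) and assume $\epsilon\leq 1$, so that $\beta\leq 10^{-6}\epsilon^{2}$ gives $\sqrt{\beta}\leq 10^{-3}\epsilon\leq 10^{-3}$:

```latex
\begin{align*}
7\sqrt{\beta}\,E&\leq\frac{7}{1000}\,E\leq\frac{E}{6},\\
14\sqrt{\beta}&\leq\frac{14}{1000}\,\epsilon\leq\frac{\epsilon}{70}
  \qquad\text{since }14\cdot 70=980\leq 1000,\\
\frac{\epsilon}{70}&\leq\frac{4E}{70}\leq\frac{E}{6}
  \qquad\text{since }E\geq\frac{\epsilon}{4}.
\end{align*}
```

Adding the first and third lines bounds the two terms together by $E/3$, which is the second statement of the lemma.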

Now, since 𝔼[XtXt1]=𝔼[d(wt)2]𝔼[2dt(wt)]\mathbb{E}[X_{t}-X_{t-1}]=\mathbb{E}[d(w_{t})-2]-\mathbb{E}[2d^{\prime}_{t}(w_{t})] the third statement follows directly from the first and second one. ∎

Since all the edges counted by XtX_{t} are in the same component of HH, this next lemma proves Theorem 9.

Lemma 27.

With probability 1o(1)1-o(1), we have XτβMX_{\tau}\geq\beta M.

Proof.

We show that if our configuration is not in bad{\cal F}_{bad} then XτβMX_{\tau}\geq\beta M. By Lemma 25, the result follows.

Applying (1) recursively, we have

\[ X_{\tau}=X_{0}+\sum_{t\leq\tau}(d(w_{t})-2)-2\sum_{t\leq\tau}d^{\prime}_{t}(w_{t})\;. \]

By adding 𝔼[Xτ]\mathbb{E}[X_{\tau}], subtracting the expectation of the right hand side in the previous equation and since 𝔼[X0]=X0\mathbb{E}[X_{0}]=X_{0}, we obtain that

\begin{align*}
X_{\tau} &=\mathbb{E}[X_{\tau}]+\sum_{t\leq\tau}(d(w_{t})-2-\mathbb{E}[d(w_{t})-2])-2\sum_{t\leq\tau}(d^{\prime}_{t}(w_{t})-\mathbb{E}[d^{\prime}_{t}(w_{t})])\\
&=\mathbb{E}[X_{\tau}]+\sum_{t\leq\tau}A_{t}-2\sum_{t\leq\tau}B_{t}\\
&\geq\mathbb{E}[X_{\tau}]-\frac{3M}{\log\log M}.
\end{align*}

If τ>ϵM400\tau>\lceil\frac{\epsilon^{\prime}M}{400}\rceil, then Lemma 26 implies Xτϵτ123MloglogMβMX_{\tau}\geq\frac{\epsilon\tau}{12}-\frac{3M}{\log\log M}\geq\beta M, and we are done. Now, let τϵM400\tau\leq\lceil\frac{\epsilon^{\prime}M}{400}\rceil. If Xτ<βMX_{\tau}<\beta M, then, by the definition of τ\tau, Mτ(1ϵ4)M0M_{\tau}\leq\left(1-\frac{\epsilon^{\prime}}{4}\right)M_{0}. Note that tτd(wt)=M0Mτϵ4M0ϵ5M\sum_{t\leq\tau}d(w_{t})=M_{0}-M_{\tau}\geq\frac{\epsilon^{\prime}}{4}M_{0}\geq\frac{\epsilon^{\prime}}{5}M, because M098M100M_{0}\geq\frac{98M}{100}. Using Lemma 26 as before, we obtain

\[ X_{\tau}\geq X_{0}+\sum_{t\leq\tau}\frac{\mathbb{E}[d(w_{t})-2]}{3}-\frac{3M}{\log\log M}\geq\frac{\epsilon^{\prime}}{15}M-\frac{2\tau}{3}-\frac{3M}{\log\log M}\geq\frac{\epsilon^{\prime}}{20}M>\beta M, \]

a contradiction. Thus, Xτ>βMX_{\tau}>\beta M in both cases. ∎

It remains to prove Lemmas 22, 24 and 25.

5.1 The Details

We start this section with a result showing that if there are many vertices in LL, then they all lie in the same component of HH.

Lemma 28.

If $|L|\geq\log^{7}M$, then the probability that the vertices of $L$ lie in the same component of $H$ is $1-o(1)$.

Proof.

Let L6L_{6} and L7L_{7} be the vertices of degree at least Mlog6M\frac{\sqrt{M}}{\log^{6}{M}} and Mlog7M\frac{\sqrt{M}}{\log^{7}{M}}, respectively. We divide the proof into two cases depending on the size of L6L_{6}.


Case 1: |L6|Mlog6M|L_{6}|\geq\frac{\sqrt{M}}{\log^{6}{M}}.

We begin with a claim which shows that every vertex in LL is adjacent in HH to a large number of vertices in L7L_{7}.

Claim 2.

For every uLu\in L, the probability that uu is adjacent to at most Mlog14M\frac{\sqrt{M}}{\log^{14}{M}} vertices in L7L_{7} is at most M7M^{-7}.

Proof.

Let K=2Mlog14MK=\lceil\frac{2\sqrt{M}}{\log^{14}{M}}\rceil. Assume for a contradiction that the claim fails for uLu\in L. For every k{0,,K}k\in\{0,\dots,K\}, let k\mathcal{F}_{k} be the event that uu is adjacent to exactly kk vertices in L7L_{7}. By our assumption, there is some k0{0,,K2}k_{0}\in\{0,\dots,\frac{K}{2}\} such that [k0]>M8\mathbb{P}[{\mathcal{F}}_{k_{0}}]>M^{-8}.

Suppose that GG is in k\mathcal{F}_{k}. We consider switchings that lead to a multigraph in either k+1\mathcal{F}_{k+1} or k+2\mathcal{F}_{k+2}. We stress here that we will use a specially adjusted version of switchings. Consider edges uvE(H)uv\in E(H) such that vL7v\notin L_{7} or uvuv is not an edge in GG. We have at least MlogMkM2logM\frac{\sqrt{M}}{\log M}-k\geq\frac{\sqrt{M}}{2\log M} choices for such an edge. Moreover, there are at least Mlog6MMlog14MM2log6M\frac{\sqrt{M}}{\log^{6}M}-\frac{\sqrt{M}}{\log^{14}M}\geq\frac{\sqrt{M}}{2\log^{6}M} vertices xL6NH(v)x\in L_{6}\setminus N_{H}(v). Now we discuss different switching situations depending on the structure of GG.

First, suppose that there are at least M4logM\frac{\sqrt{M}}{4\log{M}} edges uvuv such that vL7v\notin L_{7}. Choose such an oriented edge uvuv. Then, for each xL6NH(v)x\in L_{6}\setminus N_{H}(v), there are at least Mlog6MMlog7MM2log6M\frac{\sqrt{M}}{\log^{6}{M}}-\frac{\sqrt{M}}{\log^{7}{M}}\geq\frac{\sqrt{M}}{2\log^{6}{M}} edges xyxy such that yvy\neq v and either xyxy is not an edge in GG or vyE(H)vy\notin E(H). In both cases we get at least M3/216log13M\frac{M^{3/2}}{16\log^{13}M} switchings which increase the degree of uu in L7L_{7}.

Otherwise, there are at least $\frac{\sqrt{M}}{4\log{M}}$ edges $uv$ that are not edges in $G$. Choose such an edge $uv$. Next, suppose that there are at least $\frac{\sqrt{M}}{4\log^{6}M}$ vertices $x\in L_{6}\setminus N_{H}(v)$ such that there are at least $\frac{\sqrt{M}}{2\log^{6}{M}}$ edges $xy$ with $y\neq v$. As before, this gives rise to at least $\frac{M^{3/2}}{32\log^{13}M}$ switchings. (Observe that if $u=v$, then the obtained graph is in $\mathcal{F}_{k+2}$. Observe also that we chose to switch so that the new edge between $y$ and $v$ corresponds to the edge between $u$ and $v$ and hence has an internal vertex). Otherwise, there are at least $\frac{\sqrt{M}}{4\log^{6}M}$ vertices $x\in L_{6}\setminus N_{H}(v)$ such that there are at least $\frac{\sqrt{M}}{2\log^{6}{M}}$ edges $xy$ with $y=v$. Choose such an $xy$ that is not an edge in $G$ (all but at most one of them are not edges of $G$). If either $uv$ or $xy$ corresponds to a path of length at least $3$ in $G$, then there exists at least one switching (the one that switches such an edge to a new loop in $y=v$) that transforms $G$ into a graph in $\mathcal{F}_{k+1}$. If both $uv$ and $xy$ correspond to paths of length $2$, then we perform a special type of switching. Let $uwv$ and $xzy$ be the corresponding paths in $G$. Then, we obtain the switched graph by deleting the edges $uw$, $wv$, $xz$ and $zy$ and by adding the edges $ux$, $vw$, $wz$ and $zy$. This gives a graph in $\mathcal{F}_{k+1}$. In this case, there are also at least $\frac{M^{3/2}}{32\log^{13}M}$ switchings.

Now, for any GG in either k+1\mathcal{F}_{k+1} or k+2\mathcal{F}_{k+2}, consider the switchings that transform it into a multigraph in k\mathcal{F}_{k}. We must use an edge uvuv for vL7v\in L_{7} which is not a parallel edge in HH. While there might be many edges between uu and L7L_{7}, note that there are at most k+2k+2 of this type. We can select xyxy in at most MM ways. Thus there are at most 4(k+2)M10M3/2log14M4(k+2)M\leq\frac{10M^{3/2}}{\log^{14}M} switchings leading to a multigraph in k\mathcal{F}_{k}. The factor 44 comes from the fact that we performed the special type of switching introduced above, that given two edges in a graph can give rise to at most 44 graphs.

Hence [k+1]+[k+2]logM320[k]8[k]\mathbb{P}[\mathcal{F}_{k+1}]+\mathbb{P}[\mathcal{F}_{k+2}]\geq\frac{\log{M}}{320}\mathbb{P}[\mathcal{F}_{k}]\geq 8\mathbb{P}[\mathcal{F}_{k}], and in particular max{[k+1],[k+2]}4[k]\max\{\mathbb{P}[\mathcal{F}_{k+1}],\mathbb{P}[\mathcal{F}_{k+2}]\}\geq 4\mathbb{P}[\mathcal{F}_{k}]. Using that [k0]M8\mathbb{P}[{\mathcal{F}}_{k_{0}}]\geq M^{-8}, we have max{[K1],[K]}2Kk0[k0]>1\max\{\mathbb{P}[{\mathcal{F}}_{K-1}],\mathbb{P}[{\mathcal{F}}_{K}]\}\geq 2^{K-k_{0}}\mathbb{P}[\mathcal{F}_{k_{0}}]>1, which is a contradiction. ∎

Now we use Claim 2 to show that any two vertices in LL whose degree is not extremely large, lie in the same component.

Claim 3.

For every u,vLu,v\in L each of degree at most Mlog24M\frac{M}{\log^{24}M}, the probability that they are not in the same component is at most M4M^{-4}.

Proof.

Let K=logMK=\lceil\log{M}\rceil. For every k{0,,K}k\in\{0,\dots,K\}, we define the following events,

  • A1:A_{1}: uu and vv have no common neighbour in HH.

  • A2k:A^{k}_{2}: there are kk edges between NH(u)N_{H}(u) and NH(v)N_{H}(v) in HH.

  • A3k:A^{k}_{3}: HH has an edge-cut of size at most 2k2k separating NH(u)N_{H}(u) and NH(v)N_{H}(v).

Let k\mathcal{F}_{k} be the event A1A2kA3kA_{1}\cap A^{k}_{2}\cap A^{k}_{3}. Observe that if 0\mathcal{F}_{0} is not satisfied, then there exists a path between uu and vv. Thus, it suffices to show [0]M4\mathbb{P}[\mathcal{F}_{0}]\leq M^{-4}.

Here, we will show that for every kk satisfying [k]M4\mathbb{P}[\mathcal{F}_{k}]\geq M^{-4}, we have

\[ \max\{\mathbb{P}[\mathcal{F}_{k+1}],\mathbb{P}[\mathcal{F}_{k+2}]\}\geq\log M\cdot\mathbb{P}[\mathcal{F}_{k}]\;. \]

This implies that [0]max{M4,(logM)K2}M4\mathbb{P}[\mathcal{F}_{0}]\leq\max\{M^{-4},(\log{M})^{-\frac{K}{2}}\}\leq M^{-4} and proves the claim.
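
As a sketch of why the displayed inequality implies this bound, suppose $\mathbb{P}[\mathcal{F}_{0}]>M^{-4}$ and build indices $k_{0}=0<k_{1}<\dots$ with $k_{s+1}\in\{k_{s}+1,k_{s}+2\}$ chosen to maximise the probability. The indices $k_{s}$ are introduced here for illustration only; the computation assumes natural logarithms and ignores the rounding of $K/2$.

```latex
\begin{align*}
\mathbb{P}[\mathcal{F}_{k_{s}}]&\geq(\log M)^{s}\,\mathbb{P}[\mathcal{F}_{0}]\geq M^{-4},
  \qquad k_{s}\leq 2s,\\
1\geq\mathbb{P}[\mathcal{F}_{k_{K/2}}]&\geq(\log M)^{K/2}\,\mathbb{P}[\mathcal{F}_{0}]
  \quad\Longrightarrow\quad
  \mathbb{P}[\mathcal{F}_{0}]\leq(\log M)^{-K/2},\\
(\log M)^{-K/2}&\leq\exp\Big(-\tfrac{1}{2}\log M\cdot\log\log M\Big)
  =M^{-\frac{1}{2}\log\log M}\leq M^{-4}
  \quad\text{once }\log\log M\geq 8 .
\end{align*}
```

The first line is what allows the inequality to be reapplied: every intermediate probability stays above the threshold $M^{-4}$.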

So, suppose [k]M4\mathbb{P}[\mathcal{F}_{k}]\geq M^{-4}. Let kk\mathcal{F}_{k}^{\prime}\subseteq\mathcal{F}_{k} be the event that u,vu,v are in addition adjacent to at least Mlog14M\frac{\sqrt{M}}{\log^{14}{M}} vertices in L7L_{7}. By Claim 2 and since [k]M4\mathbb{P}[\mathcal{F}_{k}]\geq M^{-4}, we obtain that [k]12[k]\mathbb{P}[\mathcal{F}_{k}^{\prime}]\geq\frac{1}{2}\mathbb{P}[\mathcal{F}_{k}].

We consider switchings from a graph GG in k\mathcal{F}_{k}^{\prime} to k+1\mathcal{F}_{k+1} or k+2\mathcal{F}_{k+2}, which use no edge incident to uu or vv. We are going to switch using edges from two sets which we now define.

Fix an edge cut F1F_{1} of size at most 2k2k separating NH(u)N_{H}(u) and NH(v)N_{H}(v) and let F2F_{2} be the set of kk edges between NH(u)N_{H}(u) and NH(v)N_{H}(v). These two sets of edges exist by A3kA^{k}_{3} and A2kA^{k}_{2}, respectively.

Given that GG is in k\mathcal{F}_{k}^{\prime}, there are at least Mlog14M\frac{\sqrt{M}}{\log^{14}{M}} vertices xNH(u)L7x\in N_{H}(u)\cap L_{7} and for each such xx, there are at least Mlog7M\frac{\sqrt{M}}{\log^{7}{M}} edges xyxy. Since d(u)Mlog24Md(u)\leq\frac{M}{\log^{24}M}, essentially all such xyxy satisfy yuy\neq u. Indeed, we can find a set XX of M2log14M\frac{\sqrt{M}}{2\log^{14}{M}} vertices xNH(u)L7x\in N_{H}(u)\cap L_{7} such that xx is not an endpoint of F1F2F_{1}\cup F_{2} and there are at least M2log7M\frac{\sqrt{M}}{2\log^{7}{M}} edges xyxy with yuy\neq u. Let E1E_{1} be the set of edges xyxy such that xXx\in X, yuy\neq u and either xyxy is not an edge of GG or yy is not an endpoint of any edge of F1F2F_{1}\cup F_{2}. Since |F1F2|3k|F_{1}\cup F_{2}|\leq 3k, |E1|M8log21M|E_{1}|\geq\frac{M}{8\log^{21}{M}}.

In the same vein, we can obtain a set of vertices WW such that for each wWw\in W, we have wNH(v)L7w\in N_{H}(v)\cap L_{7}, ww is not an endpoint of F1F2F_{1}\cup F_{2} and there are at least M2log7M\frac{\sqrt{M}}{2\log^{7}{M}} edges wzwz with zvz\neq v. Moreover, we can also obtain a set of edges E2E_{2} with |E2|M8log21M|E_{2}|\geq\frac{M}{8\log^{21}M} such that for each wzE2wz\in E_{2}, wWw\in W, zvz\neq v and either wzwz is not an edge of GG or zz is not an endpoint of any edge of F1F2F_{1}\cup F_{2}.

Observe that for any $xy\in E_{1}$ and any $wz\in E_{2}$, we have $y\neq z$. Otherwise, if $y=z$, there exists a path $uxywv$ none of whose edges are in the edge cut $F_{1}$, a contradiction.

If $yz$ is an edge of $H$, then $yz\in F_{1}$ and both $xy$ and $wz$ are not edges of $G$. Note that $xw\notin E(H)$. Thus, we can always switch $xy$ and $wz$ to obtain a new graph which is in $\mathcal{F}_{k+1}$ or $\mathcal{F}_{k+2}$ (it only belongs to $\mathcal{F}_{k+2}$ if $y\in N_{H}(u)$ and $z\in N_{H}(v)$). There are at least $\frac{M^{2}}{64\log^{42}M}$ switchings.

Given a graph in k+1k+2\mathcal{F}_{k+1}\cup\mathcal{F}_{k+2}, there are at most 4(k+2)MM3/24(k+2)M\leq M^{3/2} switchings which yield a graph in k\mathcal{F}_{k}.

We conclude that max{[k+1],[k+2]}logM[k]\max\{\mathbb{P}[\mathcal{F}_{k+1}],\mathbb{P}[\mathcal{F}_{k+2}]\}\geq\log M\cdot\mathbb{P}[\mathcal{F}_{k}], as desired. ∎

A union bound over all pairs u,vLu,v\in L together with Claim 3 suffices to show that in Case 1, with probability 1o(1)1-o(1) all the vertices in LL with degree at most Mlog24M\frac{M}{\log^{24}M} lie in the same component of HH.

Now we consider a set SS consisting of all the vertices of LL of degree at least Mlog24M\frac{M}{\log^{24}M} and one other vertex of LL, if there are any more. Therefore, |S|log24M+1|S|\leq\log^{24}M+1.

For any u,vSu,v\in S, we let 𝒜u,v\mathcal{A}_{u,v} be the event that uu and vv are in the same component and u,v\mathcal{B}_{u,v} be the event that they are in different components. We will use switchings involving uu and vv. On the one hand, for any graph in u,v\mathcal{B}_{u,v}, there are d(u)d(v)M3/2log25Md(u)d(v)\geq\frac{M^{3/2}}{\log^{25}M} switchings which yield a graph GG in 𝒜u,v\mathcal{A}_{u,v} in which uvuv is an edge of HH. On the other hand, for any graph GG in 𝒜u,v\mathcal{A}_{u,v} there are at most 4M4M switchings using the edge uvuv and another edge, that transform the graph into a graph in u,v\mathcal{B}_{u,v}. Therefore, [u,v]M1/3\mathbb{P}[\mathcal{B}_{u,v}]\leq M^{-1/3}. So, with probability 1o(1)1-o(1) all the vertices in SS lie in the same component. This implies that with probability 1o(1)1-o(1), all the vertices in LL lie in the same component of HH and this completes the proof of the lemma for Case 1.


Case 2: |L6|Mlog6M|L_{6}|\leq\frac{\sqrt{M}}{\log^{6}{M}}.

Let \ell be the size of LL. By the hypothesis of the lemma and of this case, we have

\[ \log^{7}M\leq\ell\leq\frac{\sqrt{M}}{\log^{6}M}\;. \tag{2} \]

The following claim shows that with high probability, the multigraph induced in HH by the vertices in LL, has large minimum degree.

Claim 4.

With probability at least 171-\ell^{-7}, every vertex uLu\in L is incident to at least log\sqrt{\ell\log{\ell}} edges of HH which join it to other vertices of LL.

Proof.

Fix a vertex uLu\in L. For every 0k<2log0\leq k<2\sqrt{\ell\log{\ell}}, let k\mathcal{F}_{k} be the event that uu is incident to exactly kk edges joining it to vertices of LL in HH. Using (2), we have that k2logM1/4k\leq 2\sqrt{\ell\log{\ell}}\leq M^{1/4}.

Let GG be a graph in k\mathcal{F}_{k}. We will count how many (extended) switchings lead to a graph in k+1\mathcal{F}_{k+1}. By the hypothesis of Case 2, there are at least MlogMMlog6MkM2logM\frac{\sqrt{M}}{\log M}-\frac{\sqrt{M}}{\log^{6}M}-k\geq\frac{\sqrt{M}}{2\log M} edges uvuv such that either vL6v\not\in L_{6} or vLv\not\in L and uvuv corresponds to a path of length at least 22 in GG.

For any such edge $uv$, we can switch with any edge $xy$ disjoint from $uv$ such that $x\in L\setminus\{u\}$ and is not adjacent to $u$ unless one of the following situations happens:

  (i) $xy$ and $uv$ both correspond to edges of $G$ and there is an edge corresponding to an edge of $G$ between $y$ and $v$, or

  (ii) $v=y$.

There are at least k12\ell-k-1\geq\frac{\ell}{2} choices for xLNH(v)x\in L\setminus N_{H}(v). Given the choice of uvuv and xx, since xLx\in L and if uvuv is an edge of GG then vL6v\in L_{6}, there are at least M2logM\frac{\sqrt{M}}{2\log{M}} choices for an edge xyxy that do not satisfy (i)(i). Since vLv\not\in L, there are in total at most MlogM\frac{\sqrt{M}}{\log M} edges xyxy satisfying (ii) for uvuv.

Thus, there are at least M4logMMlogMM5logM\frac{\ell\sqrt{M}}{4\log M}-\frac{\sqrt{M}}{\log{M}}\geq\frac{\ell\sqrt{M}}{5\log M} choices for an edge xyxy that give a valid switching with uvuv. So, in total there are at least M10log2M\frac{\ell M}{10\log^{2}M} switchings.

If GG is in k+1\mathcal{F}_{k+1}, then there are at most 4(k+1)M8logM4(k+1)M\leq 8\sqrt{\ell\log\ell}M switchings that lead to a multigraph in k\mathcal{F}_{k}.

Using (2), we conclude,

\[ \mathbb{P}[\mathcal{F}_{k}]\leq\frac{80\sqrt{\ell\log{\ell}}\log^{2}M}{\ell}\cdot\mathbb{P}[\mathcal{F}_{k+1}]\leq\frac{1}{\log M}\cdot\mathbb{P}[\mathcal{F}_{k+1}]. \]

In particular, for every klogk\leq\sqrt{\ell\log{\ell}}, we have

\[ \mathbb{P}[\mathcal{F}_{k}]\leq(\log{M})^{-\sqrt{\ell\log{\ell}}}\cdot\mathbb{P}[\mathcal{F}_{2\sqrt{\ell\log{\ell}}}]<\ell^{-9}. \]

A union bound over all $k\leq\sqrt{\ell\log{\ell}}$ and all vertices $v\in L$ now yields the desired result. ∎
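
A sketch of the arithmetic behind the end of this proof, with the shorthand $m=\sqrt{\ell\log\ell}$ introduced here for illustration. It ignores the integrality of $m$ and assumes $M$ (and hence $\ell$) is large enough that $\log M\geq e$, $m\geq 9\log\ell$ and $m+1\leq\ell$:

```latex
\begin{align*}
\mathbb{P}[\mathcal{F}_{k}]
  &\leq(\log M)^{-(2m-k)}\,\mathbb{P}[\mathcal{F}_{2m}]
   \leq(\log M)^{-m}
   \leq e^{-m}\leq e^{-9\log\ell}=\ell^{-9}
   \qquad(k\leq m),\\
\sum_{v\in L}\ \sum_{k\leq m}\mathbb{P}[\mathcal{F}_{k}]
  &\leq\ell\,(m+1)\,\ell^{-9}\leq\ell^{-7}.
\end{align*}
```

In the second line the events $\mathcal{F}_{k}$ are those defined for the vertex being summed over, and the resulting bound $\ell^{-7}$ matches the failure probability in the statement of Claim 4.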

To complete Case $2$, we will use the minimum degree within the vertices in $L$ to show that the multigraph induced by $L$ in $H$, denoted by $H[L]$, is connected. For every $k\geq 1$, let $\mathcal{F}_{k}$ be the event that $H[L]$ has exactly $k$ components. We show that there is an $f(\ell)$ which is $o(1)$ such that for every $k\geq 1$ that satisfies $\mathbb{P}[\mathcal{F}_{k+1}]\geq\ell^{-2}$, we have $\mathbb{P}[\mathcal{F}_{k+1}]\leq f(\ell)\mathbb{P}[\mathcal{F}_{k}]$. If so, $\mathbb{P}[\mathcal{F}_{1}]=1-o(1)$, or in other words, with probability $1-o(1)$ the multigraph $H[L]$ is connected. This proof follows the same lines as the one in Lemma 10.

Fix k1k\geq 1 such that [k+1]2\mathbb{P}[\mathcal{F}_{k+1}]\geq\ell^{-2}. Suppose GG is in k\mathcal{F}_{k}. Any (extended) switching from GG that leads to a graph in k+1\mathcal{F}_{k+1} creates a new component and hence either uses two cut edges or uses a 2-edge cut which does not contain a cut edge. By Lemma 11, there are at most 828\ell^{2} switchings leading to a multigraph in k+1\mathcal{F}_{k+1}.

For every $k\geq 1$, let $\mathcal{F}_{k}^{\prime}$ be the event $\mathcal{F}_{k}$ with the additional restriction that $H[L]$ has minimum degree at least $\sqrt{\ell\log{\ell}}$. Since $\mathbb{P}[\mathcal{F}_{k+1}]\geq\ell^{-2}$, by Claim 4, $\mathbb{P}[\mathcal{F}_{k+1}^{\prime}]\geq\frac{1}{2}\mathbb{P}[\mathcal{F}_{k+1}]$. Suppose now that $G$ is in $\mathcal{F}_{k+1}^{\prime}$. We will lower bound the number of (extended) switchings to graphs in $\mathcal{F}_{k}$. In order to merge two components it is enough to select a non-cut edge $xy$ and an edge $uv$ in another component. By the definition of $\mathcal{F}_{k+1}^{\prime}$, there are at least $|E(H[L])|-\ell\geq\frac{1}{2}\ell^{3/2}\sqrt{\log{\ell}}-\ell\geq\ell^{3/2}$ choices for $xy$. Given the choice of $xy$, there is at least one vertex in another component, and hence, there are at least $\sqrt{\ell\log{\ell}}$ choices for $uv$. The total number of switchings is at least $\ell^{2}\sqrt{\log{\ell}}$.

Hence, for every k1k\geq 1 and since \ell\to\infty as nn\to\infty,

\[ \mathbb{P}[\mathcal{F}_{k+1}]\leq 2\mathbb{P}[\mathcal{F}_{k+1}^{\prime}]\leq 2\cdot 8\ell^{2}\cdot\frac{1}{\ell^{2}\sqrt{\log{\ell}}}\cdot\mathbb{P}[\mathcal{F}_{k}]=\frac{16}{\sqrt{\log{\ell}}}\mathbb{P}[\mathcal{F}_{k}]. \]

This completes the proof of Lemma 28. ∎

We proceed with the proof of Lemma 22.

Proof of Lemma 22.

If |L|log7M|L|\geq\log^{7}{M}, we can use Lemma 28 to show that with probability 1o(1)1-o(1), all the vertices in LL lie in the same connected component, and hence the statement of the lemma holds in this case.

Therefore, we can assume that $|L|\leq\log^{7}M$. Since $R\geq\epsilon M$, this implies that if the union of the components intersecting $L$ has at least $\frac{R}{200}$ edges, there is a component of size at least $M^{2/3}$ which contains a vertex of $L$. So, it is enough to prove that for every pair $u,v\in L$, the probability that $u$ is in a component with at least $M^{\frac{2}{3}}$ edges not containing $v$ is $o(M^{-\frac{1}{10}})$.

Fix u,vLu,v\in L. Let \mathcal{F}_{-} be the event that the component of uu in HH has at least M23M^{\frac{2}{3}} edges and u,vu,v are in different components. Let +\mathcal{F}_{+} be the event that u,vu,v are in the same component of HH. We will show that []=o(M110)[+]\mathbb{P}[\mathcal{F}_{-}]=o(M^{-\frac{1}{10}})\mathbb{P}[\mathcal{F}_{+}].

Let GG be a graph in \mathcal{F}_{-}. Since vLv\in L, there are at least M2logM\frac{\sqrt{M}}{2\log{M}} oriented edges vwvw (i.e. we count loops twice) and at least M2/3M^{2/3} oriented edges xyxy in the same component as uu ordered in such a way that xx is at least as close to uu, as yy. Thus, the total number of switchings, using an edge xyxy ordered in such a way leading to a multigraph in +\mathcal{F}_{+} with vxvx an edge is at least M7/6logM\frac{M^{7/6}}{\log M}.

Consider GG in +\mathcal{F}_{+} obtained by such a swap. If vwv\neq w and xyx\neq y, then there exists a unique edge vxvx in any shortest path from uu to vv. Otherwise we can find two edges incident to vv, such that every shortest path from uu to vv contains one of them. So, the total number of such switchings leading to GG is at most 8M8M.

We conclude the desired result, []8logMM1/6[+]=o(M110)\mathbb{P}[\mathcal{F}_{-}]\leq\frac{8\log M}{M^{1/6}}\cdot\mathbb{P}[\mathcal{F}_{+}]=o(M^{-\frac{1}{10}}). ∎

Proof of Lemma 24.

Recall that for every t0t\geq 0, we have LStL\subset S_{t}. Moreover, by construction of the exploration process, the component we are exploring at time tt has no vertices in LL. Thus any vertex that belongs either to the current explored component or to VStV\setminus S_{t}, has degree at most MlogM\frac{\sqrt{M}}{\log M}. This implies that, for every vertex vv that plays a role in the exploration process, the number of edges incident to a neighbour of vv is at most Mlog2M\frac{M}{\log^{2}M}. This property will be crucial in our analysis.

In this proof we consider inputs (graphs equipped with an order of each adjacency list) instead of graphs. As in the proof of Lemma 16, we will perform our switchings between equivalence classes of inputs.

We start by proving the second part of the lemma. We fix $w\in V\setminus S_{t-1}$ and condition on the configuration $\mathcal{C}_{t-1}$ at time $t-1$. For every $0\leq i\leq d(w)-1$, we let $\mathcal{F}_{i}$ be the union of the equivalence classes of inputs such that $w_{t}=w$ and that the sum of the number of loops on $w$ and edges from $w$ to $S_{t-1}$ is $i+1$. For any equivalence class in $\mathcal{F}_{i+1}$, there are at least $(i+1)(M_{t-1}-\beta M-\frac{2M}{\log^{2}M})\geq\frac{2(i+1)M}{3}$ switchings that lead to one in $\mathcal{F}_{i}$. For any equivalence class in $\mathcal{F}_{i}$, there are at most $8(d(w)-i)(\beta M+d(w))$ switchings that lead to $\mathcal{F}_{i+1}$. It follows that for $i+1\geq 2\sqrt{\beta}d(w)$, we have $\mathbb{P}[\mathcal{F}_{i+1}]\leq 16\sqrt{\beta}\,\mathbb{P}[\mathcal{F}_{i}]$. The second statement of the lemma follows from applying the last inequality recursively.

In the same vein, one can obtain that conditional on wtww_{t}\neq w, the probability that ww is incident to more than 3βd(w)3\sqrt{\beta}d(w) edges connecting to St1S_{t-1}, is at most 2β2\sqrt{\beta}. We omit the details. This fact will be used at the end of the proof.

As before, we fix wVSt1w\in V\setminus S_{t-1} and condition on the configuration 𝒞t1\mathcal{C}_{t-1} at time t1t-1. We let 𝒜w\mathcal{A}_{w} be the union of the equivalence classes of inputs consistent with this configuration where wt=ww_{t}=w and we let w\mathcal{B}_{w} be those where wtww_{t}\neq w.

Let wi\mathcal{B}^{i}_{w} be the elements of w\mathcal{B}_{w} such that there are ii edges between ww and vtv_{t}. For each equivalence class in wi+1\mathcal{B}^{i+1}_{w}, we can switch any edge vtwv_{t}w with some other ordered edge xyxy with xVSt1x\in V\setminus S_{t-1} to get an element of wi\mathcal{B}^{i}_{w} unless xx is a neighbour of vtv_{t} or yy is a neighbour of ww. Thus, given a graph in wi+1\mathcal{B}^{i+1}_{w}, there are at least (i+1)(Mt12Mlog2M)(i+1)(M_{t-1}-\frac{2M}{\log^{2}M}) switchings that lead to wi\mathcal{B}^{i}_{w}. On the other hand, given an element of wi\mathcal{B}^{i}_{w}, there are at most d(w)d(vt)Mlog2Md(w)d(v_{t})\leq\frac{M}{\log^{2}M} switchings that lead to wi+1\mathcal{B}^{i+1}_{w}. Since Mt13M4M_{t-1}\geq\frac{3M}{4}, we have that Mt12Mlog2MM2M_{t-1}-\frac{2M}{\log^{2}M}\geq\frac{M}{2}. This implies, |wi|log2M2|wi+1||\mathcal{B}^{i}_{w}|\geq\frac{\log^{2}M}{2}|\mathcal{B}^{i+1}_{w}|. Thus,

|w0|\displaystyle|\mathcal{B}^{0}_{w}| (112logM)|w|.\displaystyle\geq\left(1-\frac{1}{2\log M}\right)|\mathcal{B}_{w}|\;. (3)

We let 𝒞wi\mathcal{C}^{i}_{w} be the elements of w0\mathcal{B}^{0}_{w} such that there are ii edges between wtw_{t} and the neighbours of ww (recall that wwtw\neq w_{t}). Given an equivalence class in 𝒞wi+1\mathcal{C}^{i+1}_{w}, there exist at least (i+1)(Mt1βM2Mlog2M)(i+1)(M_{t-1}-\beta M-\frac{2M}{\log^{2}M}) switchings that lead to 𝒞wi\mathcal{C}^{i}_{w} and that do not use any edge incident to vtv_{t}. On the other hand, given an equivalence class in 𝒞wi\mathcal{C}^{i}_{w}, there are at most (d(wt)(i+1))d(w)MlogMd(w)Mlog2M(d(w_{t})-(i+1))d(w)\frac{\sqrt{M}}{\log M}\leq d(w)\frac{M}{\log^{2}{M}} switchings that lead to 𝒞wi+1\mathcal{C}^{i+1}_{w}.

Given that i+1βd(w)i+1\geq\sqrt{\beta}d(w), we have that,

|𝒞wi|βd(w)M2d(w)Mlog2M|𝒞wi+1|logM|𝒞wi+1|.|\mathcal{C}^{i}_{w}|\geq\frac{\sqrt{\beta}d(w)\frac{M}{2}}{d(w)\frac{M}{\log^{2}{M}}}|\mathcal{C}^{i+1}_{w}|\geq\log{M}|\mathcal{C}^{i+1}_{w}|\;.

Using (3), it follows that conditional on our input being in w\mathcal{B}_{w}, the probability that ww has no edge to vtv_{t} and wtw_{t} has at most βd(w)\sqrt{\beta}d(w) edges incident to NH(w)N_{H}(w) is at least (112logM)(12logM)13logM\left(1-\frac{1}{2\log M}\right)\left(1-\frac{2}{\log M}\right)\geq 1-\frac{3}{\log{M}}.

Combining the statements above, we obtain that the proportion of elements of w\mathcal{B}_{w} such that in the corresponding multigraph HH we have that ww has no edge to vtv_{t}, is incident to at most 3βd(w)3\sqrt{\beta}d(w) edges that are also incident to St1S_{t-1} and wtw_{t} is incident to at most βd(w)\sqrt{\beta}d(w) edges that are also incident to NH(w)N_{H}(w), is at least 12β3logM13β1-2\sqrt{\beta}-\frac{3}{\log{M}}\geq 1-3\sqrt{\beta}. Note that this implies that wwtww_{t} is not an edge of HH.

We consider switching using an ordered pair of oriented edges $v_{t}w_{t}$ and $wx$ of $H$ such that $x$ is not in $S_{t-1}$. For inputs as in the last paragraph, there are at least $(1-4\sqrt{\beta})d(w)$ choices for an oriented edge $wx$ to switch with $v_{t}w_{t}$ to construct an input in $\mathcal{A}_{w}$ (we simply cannot choose $x$ in $S_{t-1}$ or such that $wx$ is an edge of $G$ and $w_{t}x$ is an edge of $G$). Clearly, for any input in $\mathcal{B}_{w}$, there are at most $d(w)$ oriented edges to switch that lead to $\mathcal{A}_{w}$. For any equivalence class in $\mathcal{A}_{w}$, similarly as before, there are between $M_{t-1}-\beta M-\frac{2M}{\log^{2}M}$ and $M_{t-1}$ such switchings that lead to $\mathcal{B}_{w}$ (we pick an oriented edge $xy$ of $H$ with both endpoints outside $S_{t-1}$ and say we produced $v_{t}w$ and $xy$ by swapping on $v_{t}x$ and $wy$; this will work for certain provided $x$ is not incident to a neighbour of $v_{t}$ and $y$ is not incident to a neighbour of $w$).

Straightforward computations give,

(110β)d(w)Mt1[𝒜w](1+10β)d(w)Mt1.\left(1-10\sqrt{\beta}\right)\frac{d(w)}{M_{t-1}}\leq\mathbb{P}[\mathcal{A}_{w}]\leq\left(1+10\sqrt{\beta}\right)\frac{d(w)}{M_{t-1}}\;.

Proof of Lemma 25.

Recall that At=d(wt)𝔼[d(wt)]A_{t}=d(w_{t})-\mathbb{E}[d(w_{t})] and that Bt=d(wt)𝔼[d(wt)]B_{t}=d^{\prime}(w_{t})-\mathbb{E}[d^{\prime}(w_{t})]. Note that 𝔼[At]=𝔼[Bt]=0\mathbb{E}[A_{t}]=\mathbb{E}[B_{t}]=0 and since the maximum degree of the vertices in V(H)S0V(H)\setminus S_{0} is at most MlogM\frac{\sqrt{M}}{\log{M}}, we have |At|,|Bt|2MlogM|A_{t}|,|B_{t}|\leq\frac{2\sqrt{M}}{\log{M}}. We can apply Azuma’s Inequality (see Lemma 21) to ttAt\sum_{t^{\prime}\leq t}A_{t^{\prime}} with N=tN=t and ci=2MlogMc_{i}=\frac{2\sqrt{M}}{\log{M}}, to obtain

[ttAt>MloglogM]<2eMlog2M8t(loglogM)2<elog3/2M,\displaystyle\mathbb{P}\left[\sum_{t^{\prime}\leq t}A_{t^{\prime}}>\frac{M}{\log\log M}\right]<2e^{-\frac{M\log^{2}{M}}{8t(\log\log{M})^{2}}}<e^{-\log^{3/2}{M}}\;,

since tMt\leq M and MM is large enough. A union bound over all tMt\leq M suffices to obtain that the probability there exists a tt such that ttAt>MloglogM\sum_{t^{\prime}\leq t}A_{t^{\prime}}>\frac{M}{\log\log M} is o(1)o(1). The same argument can be used for BtB_{t}. Thus, we obtain [bad]=o(1)\mathbb{P}[\mathcal{F}_{bad}]=o(1). ∎
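For reference, the exponent in the bound above can be recovered as follows. This is a sketch that assumes Lemma 21 is the usual two-sided Azuma bound $\mathbb{P}[|X|>\lambda]<2e^{-\lambda^{2}/(2\sum_{i}c_{i}^{2})}$ (its exact statement is not reproduced in this section), applied with $\lambda=\frac{M}{\log\log M}$, $N=t$ and $c_{i}=\frac{2\sqrt{M}}{\log M}$:

```latex
\[
\frac{\lambda^{2}}{2\sum_{i=1}^{N}c_{i}^{2}}
=\frac{M^{2}/(\log\log M)^{2}}{2t\cdot 4M/\log^{2}M}
=\frac{M\log^{2}M}{8t(\log\log M)^{2}}
\geq\frac{\log^{2}M}{8(\log\log M)^{2}}
\qquad\text{for } t\leq M .
\]
```

The right-hand side exceeds $\log^{3/2}M+\log 2$ once $M$ is large, because $\sqrt{\log M}$ eventually dominates $8(\log\log M)^{2}$; this is what allows the factor $2$ to be absorbed.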

6 Handling Vertices of Degree 2

6.1 Disjoint Unions of Cycles

The graph G(𝒟)G(\mathcal{D}) partitions into a set of cyclic components and a subdivision of H(𝒟)H({\cal D}). We consider first the structure of the graph formed by its cyclic components. We let TT be the set of vertices in these components and let J(T)J(T) be a union of cycles chosen uniformly at random among all 22-regular graphs with vertex set TT. We emphasize that G(𝒟)G(\mathcal{D}) is a simple graph so these cycles have length at least 33.

Fix some vertex vTv\in T. Let p,tp_{\ell,t} be the probability that vv is in a cycle of length \ell in J(T)J(T) if t=|T|t=|T|. Let CtC_{t} be the number of configurations of tt vertices into disjoint cycles of length at least 33. We will use the following result on the asymptotic enumeration of 22-regular graphs (see, e.g. Example VI.2.2 in [12]).

Theorem 29 ([12]).

We have

\[C_{t}=\left(1-\frac{5}{8t}+O(t^{-2})\right)\frac{e^{-3/4}}{\sqrt{\pi t}}\,t!\;.\]
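As a numerical sanity check on the leading term of Theorem 29, the following sketch computes $C_{t}$ exactly from the recurrence obtained by removing the cycle through a fixed vertex (the same decomposition used in the proof of Corollary 30) and compares it with $\frac{e^{-3/4}}{\sqrt{\pi t}}t!$. The function names are illustrative and are not part of the paper.

```python
from fractions import Fraction
from math import comb, exp, factorial, pi, sqrt

def cycle_cover_counts(t_max):
    """Return [C_0, ..., C_t_max], where C_t counts 2-regular simple graphs
    (disjoint unions of cycles of length >= 3) on t labelled vertices."""
    counts = [0] * (t_max + 1)
    counts[0] = 1  # the empty graph
    for t in range(3, t_max + 1):
        total = 0
        for length in range(3, t + 1):
            # choose the other vertices on the cycle through a fixed vertex,
            # then one of (length-1)!/2 cyclic arrangements, then cover the rest
            total += (comb(t - 1, length - 1)
                      * (factorial(length - 1) // 2)
                      * counts[t - length])
        counts[t] = total
    return counts

def ratio_to_leading_term(t, counts):
    """C_t divided by e^{-3/4} t! / sqrt(pi t)."""
    exact = Fraction(counts[t], factorial(t))  # exact value of C_t / t!
    return float(exact) / (exp(-0.75) / sqrt(pi * t))

counts = cycle_cover_counts(200)
print(counts[:8])
for t in (10, 50, 200):
    print(t, ratio_to_leading_term(t, counts))
```

The first values are 1, 0, 0, 1, 3, 12, 70, 465, and the printed ratios approach 1 as $t$ grows, in line with the leading term of the theorem.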
Corollary 30.

For every integer $t\geq 3$ and every $3\leq\ell\leq\frac{3}{4}t$, we have

\[p_{\ell,t}=\frac{1+O(\ell t^{-2})}{2\sqrt{t(t-\ell)}}\;.\]
Proof.

If vv belongs to a cycle of length \ell, then there are (t11)\binom{t-1}{\ell-1} ways to select the remaining vertices in its cycle. In addition, there are (1)!2\frac{(\ell-1)!}{2} possible configurations for the cycle containing vv given we have selected the vertices in this cycle. Hence p,t=(t11)(1)!2CtCtp_{\ell,t}=\binom{t-1}{\ell-1}\frac{(\ell-1)!}{2}\frac{C_{t-\ell}}{C_{t}}. The desired results follow from straightforward computations using the bounds from Theorem 29. ∎
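The formula for $p_{\ell,t}$ in the proof can be evaluated exactly for moderate $t$ and compared with the approximation $\frac{1}{2\sqrt{t(t-\ell)}}$. The sketch below does this with exact rational arithmetic; the helper names and the choice $t=400$ are assumptions made only for illustration.

```python
from fractions import Fraction
from math import comb, factorial, sqrt

def cycle_cover_counts(t_max):
    """Return [C_0, ..., C_t_max] via the recurrence on the cycle through a fixed vertex."""
    counts = [0] * (t_max + 1)
    counts[0] = 1
    for t in range(3, t_max + 1):
        counts[t] = sum(
            comb(t - 1, length - 1) * (factorial(length - 1) // 2) * counts[t - length]
            for length in range(3, t + 1)
        )
    return counts

def p_exact(length, t, counts):
    """Exact probability that a fixed vertex lies on a cycle of the given length."""
    ways = comb(t - 1, length - 1) * (factorial(length - 1) // 2) * counts[t - length]
    return Fraction(ways, counts[t])

def p_approx(length, t):
    """Main term of Corollary 30."""
    return 1.0 / (2.0 * sqrt(t * (t - length)))

t = 400
counts = cycle_cover_counts(t)
for length in (3, 100, 300):
    print(length, float(p_exact(length, t, counts)), p_approx(length, t))
```

For $\ell\leq\frac{3}{4}t$ the two columns should agree closely, with the relative difference shrinking as $t$ grows.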

Corollary 31.

For every 0<δ<340<\delta<\frac{3}{4} and for every sufficiently large tt, the probability that vv lies on a cycle of J(T)J(T) of length at least δt2\frac{\delta t}{2} but less than δt\delta t is at least δ5\frac{\delta}{5}.

Proof.

Using Corollary 30 and that t(t)<t\sqrt{t(t-\ell)}<t, we obtain that the desired probability is at least

=δt2δtp,t=δt2δt(1+ot(1))2t(t)δ5.\sum_{\ell=\frac{\delta t}{2}}^{\delta t}p_{\ell,t}\geq\sum_{\ell=\frac{\delta t}{2}}^{\delta t}\frac{(1+o_{t}(1))}{2\sqrt{t(t-\ell)}}\geq\frac{\delta}{5}\;.

Corollary 32.

For every 0<δ<380<\delta<\frac{3}{8} and for every sufficiently large tt, the probability that J(T)J(T) contains a cycle of length at least δt\delta t is at least δ3\frac{\delta}{3}.

Proof.

The probability that there exists a cycle of length at least $\ell$ is at least the probability that $v$ is in a cycle of length at least $\ell$. Applying Corollary 31 with $2\delta<\frac{3}{4}$ in place of $\delta$, we conclude that the probability that $v$ is contained in a cycle of length at least $\delta t$ (and less than $2\delta t$) is at least $\frac{2\delta}{5}\geq\frac{\delta}{3}$. ∎

Corollary 33.

For every 0<ϵ<10<\epsilon<1, there exists a pϵ>0p_{\epsilon}>0 such that for any sufficiently large tt, the probability that J(T)J(T) contains no cycle of length at least ϵt\epsilon t is at least pϵp_{\epsilon}.

Proof.

It suffices to prove the statement for ϵ<110\epsilon<\frac{1}{10}.

We let DϵD_{\epsilon} be the event that the sum of the lengths of the cycles of J(T)J(T) which have length at least ϵt4\frac{\epsilon t}{4} but less than ϵt2\frac{\epsilon t}{2} exceeds (1ϵ)t(1-\epsilon)t. Clearly it is enough to prove a lower bound on [Dϵ]\mathbb{P}[D_{\epsilon}].

For k0k\geq 0, we let Ek,ϵE_{k,\epsilon} be the event that J(T)J(T) contains kk cycles P1,,PkP_{1},\ldots,P_{k} of length 1,,k\ell_{1},\dots,\ell_{k} such that for each 1ik1\leq i\leq k, if ti=tj<ijϵtt_{i}=t-\sum_{j<i}\ell_{j}\geq\epsilon t, then PiP_{i} is disjoint from PjP_{j} for all j<ij<i and the lowest indexed vertex in T(j<iPj)T\setminus\left(\cup_{j<i}P_{j}\right) is in PiP_{i}, and ϵt4iϵt2\frac{\epsilon t}{4}\leq\ell_{i}\leq\frac{\epsilon t}{2}.

We set k=4ϵk^{*}=\lceil\frac{4}{\epsilon}\rceil. Observe that [Dϵ][Ek,ϵ]\mathbb{P}[D_{\epsilon}]\geq\mathbb{P}[E_{k^{*},\epsilon}], so it suffices to lower bound [Ek,ϵ]\mathbb{P}[E_{k^{*},\epsilon}]. Clearly, [E0,ϵ]=1\mathbb{P}[E_{0,\epsilon}]=1. For 1kk1\leq k\leq k^{*}, we have [Ek,ϵEk1,ϵ]=1\mathbb{P}[E_{k,\epsilon}\mid E_{k-1,\epsilon}]=1 if the number tt^{\prime} of vertices not in the union of the PjP_{j} for j<ij<i is less than ϵt\epsilon t as we can simply set Pi=Pi1P_{i}=P_{i-1}. Given that tϵtt^{\prime}\geq\epsilon t, this conditional probability is at least the probability the vertex vv with lowest index in a uniformly random disjoint union of cycles of total length tt^{\prime} is in a cycle of length \ell with ϵt4ϵt2\frac{\epsilon t}{4}\leq\ell\leq\frac{\epsilon t}{2}. By applying Corollary 31 with the parameters δ=ϵt2t12\delta=\frac{\epsilon t}{2t^{\prime}}\leq\frac{1}{2} and tt^{\prime}, we have [Ek,ϵ][Ek,ϵEk1,ϵ][Ek1,ϵ]ϵ5[Ek1,ϵ]\mathbb{P}[E_{k,\epsilon}]\geq\mathbb{P}[E_{k,\epsilon}\mid E_{k-1,\epsilon}]\mathbb{P}[E_{k-1,\epsilon}]\geq\frac{\epsilon}{5}\mathbb{P}[E_{k-1,\epsilon}]. Hence [Ek,ϵ](ϵ5)k(ϵ5)4ϵ1\mathbb{P}[E_{k^{*},\epsilon}]\geq(\frac{\epsilon}{5})^{k^{*}}\geq(\frac{\epsilon}{5})^{\lceil 4\epsilon^{-1}\rceil}. ∎

6.2 The order of the components

In this section we study the number of vertices in the union of some components of G=G(𝒟)G=G(\mathcal{D}) conditioned on an (at least partial) choice of H=H(𝒟)H=H({\cal D}). As before, we write M=M𝒟M=M_{\mathcal{D}}. Observe that the degree sequence 𝒟\mathcal{D} fixes the number m=M2m=\frac{M}{2} of edges of HH and also n2n_{2}, the number of vertices of degree 2 in GG, while a choice of HH fixes the number of edges of H(𝒟)H({\cal D}) in each of its components.

For a given choice of HH, we expose edges and vertices of degree 22 in GG in two phases. In the first phase we do the following.

(1.1) For each set of parallel edges between two vertices in $H$, we expose at most one of them as an edge of $G$. The parallel edges of $H$ that we exposed as edges of $G$ will be called fixed edges of $H$.

(1.2) For each non-fixed edge in $H$ with distinct endpoints which is parallel to some other edge (so it corresponds to a path in $G$ of length at least $2$), we expose the edge on the corresponding path of $G$ which is incident to its endpoint of lowest index; that is, we expose the other endpoint of such an edge.

(1.3) For each loop of $H$ rooted at a vertex, we expose the two edges of $G$ incident to this vertex in the cycle of $G$ corresponding to the loop; that is, we expose the other endpoints of these two edges.

We note that if an edge of HH is neither a loop nor parallel to another edge, then we do not expose whether it corresponds to an edge or a non-trivial path of GG.

Let mm^{\prime} be the number of edges of HH that are non-fixed. We let n2n^{*}_{2} be the number of vertices of degree 22 which have been exposed in (1.2) or (1.3). Let n2=n2n2n^{\prime}_{2}=n_{2}-n^{*}_{2} be the vertices of degree 22 that have not been exposed yet.

Observation 34.

We observe that

(a) for every component $K$ of $H$, there are at most $\frac{|E(K)|}{2}$ fixed edges, and

(b) for every component $K$ of $H$, the number of vertices of degree $2$ exposed in (1.2) or (1.3) inside $K$ is at most $2|E(K)|$, and the sum of the number of such vertices and $|V(K)|$ is at most $3|E(K)|$.

Suppose that we also condition on the set TH={v1,,vn2H}T_{H}=\{v_{1},\dots,v_{n^{H}_{2}}\} of the n2Hn^{H}_{2} vertices of degree 22 that have not been exposed yet and which lie in the non-cyclic components of GG. Then we can specify the non-cyclic components of the graph GG in a second phase.

(2.1) Fix an arbitrary ordering of the $m^{\prime}$ edges of $H$ that are non-fixed and a direction on each of these edges.

(2.2) Choose a uniformly random permutation $\pi$ of length $n^{H}_{2}+m^{\prime}-1$ of $T_{H}$ and a set of $m^{\prime}-1$ indistinguishable delimiters. We let $d_{i}$ be the position of the $i$-th delimiter in the permutation and add a delimiter $d_{0}=0$ at the start of the permutation and a delimiter $d_{m^{\prime}}=n^{H}_{2}+m^{\prime}$ at its end.

(2.3) For every $1\leq i\leq m^{\prime}$, let $e_{i}=xy$ be the $i$-th non-fixed edge of $H$, with the corresponding direction. We let $x^{\prime}$ (resp. $y^{\prime}$) be the neighbour of $x$ (resp. $y$) on the path of $G$ corresponding to $e_{i}$ if we have exposed it, otherwise we set $x^{\prime}=x$ (resp. $y^{\prime}=y$). We expose the vertices $v_{j}\in T_{H}$ with $d_{i-1}\leq\pi(j)<d_{i}$ to construct a path in $G$ connecting $x^{\prime}$ and $y^{\prime}$. We do this by starting at $x$ and by following the order induced by $\pi$.

Now, conditional on the choice of THT_{H} of size n2Hn^{H}_{2}, the number of choices for the non-cyclic components of G(𝒟)G(\mathcal{D}) is exactly

NH(n2H,m)=(n2H+m1)!(m1)!.N^{H}(n^{H}_{2},m^{\prime})=\frac{(n^{H}_{2}+m^{\prime}-1)!}{(m^{\prime}-1)!}\;.
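The second phase can be sketched in code as follows: shuffle the unexposed degree-2 vertices together with $m^{\prime}-1$ identical delimiters and cut the sequence at the delimiters, so that the $i$-th block lists the internal vertices of the path replacing the $i$-th non-fixed edge. The brute-force counter then checks the formula for $N^{H}$ on small parameters. All names are hypothetical, and the sketch ignores the vertices already exposed in the first phase.

```python
import random
from itertools import permutations
from math import factorial

def split_into_paths(vertices, num_edges, rng):
    """Shuffle the vertices with num_edges - 1 delimiters and cut at the
    delimiters; block i holds, in order, the vertices placed on edge i."""
    delim = object()  # sentinel standing for an indistinguishable delimiter
    sequence = list(vertices) + [delim] * (num_edges - 1)
    rng.shuffle(sequence)
    paths, current = [], []
    for item in sequence:
        if item is delim:
            paths.append(current)
            current = []
        else:
            current.append(item)
    paths.append(current)
    return paths

def count_outcomes(num_vertices, num_edges):
    """Brute-force number of distinct arrangements (small parameters only)."""
    items = list(range(num_vertices)) + [None] * (num_edges - 1)
    return len(set(permutations(items)))

def formula(num_vertices, num_edges):
    """(n + m' - 1)! / (m' - 1)!, the claimed value of N^H(n, m')."""
    return factorial(num_vertices + num_edges - 1) // factorial(num_edges - 1)

print(split_into_paths(["a", "b", "c", "d", "e"], 3, random.Random(0)))
print(count_outcomes(3, 3), formula(3, 3))
```

For three vertices and three non-fixed edges both counts equal $5!/2!=60$. Shuffling the multiset is uniform over distinct arrangements because each arrangement arises from exactly $(m^{\prime}-1)!$ orderings of the delimiters.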

Recall that, if we only condition on the information exposed in (1.1)–(1.3), we have mm^{\prime} non-fixed edges in HH and n2n^{*}_{2} vertices of degree 22 which were exposed in non-cyclic components. Also recall, n2=n2n2n^{\prime}_{2}=n_{2}-n_{2}^{*}. For each s,ts,t with s+t=n2s+t=n^{\prime}_{2} and every m1m^{\prime}\geq 1, we let N(s,t,m)N(s,t,m^{\prime}) be the number of graphs with tt vertices of degree 22 in cyclic components and s+n2s+n^{*}_{2} vertices of degree 22 in non-cyclic components given our exposure of HH.

By the previous observations, N(s,t,m)=(s+tt)NH(s,m)CtN(s,t,m^{\prime})=\binom{s+t}{t}N^{H}(s,m^{\prime})C_{t}, where CtC_{t} has been defined in the previous section as the number of configurations of disjoint cycles using tt vertices. Now, NH(s+1,m)NH(s,m)=s+m\frac{N^{H}(s+1,m^{\prime})}{N^{H}(s,m^{\prime})}=s+m^{\prime}. Theorem 29 allows us to estimate the ratio Ct/Ct1C_{t}/C_{t-1}. We thus obtain that there exists some function ff such that f(t)=O(t2)f(t)=O(t^{-2}) and 0<1f(t)<tt10<1-f(t)<\sqrt{\frac{t}{t-1}}, and such that for every t4t\geq 4,

N(s,t,m)N(s+1,t1,m)\displaystyle\frac{N(s,t,m^{\prime})}{N(s+1,t-1,m^{\prime})} =(s+tt)(s+tt1)NH(s,m)NH(s+1,m)CtCt1=(1f(t))t1t(1m1s+m).\displaystyle=\frac{\binom{s+t}{t}}{\binom{s+t}{t-1}}\cdot\frac{N^{H}(s,m^{\prime})}{N^{H}(s+1,m^{\prime})}\cdot\frac{C_{t}}{C_{t-1}}=\left(1-f(t)\right)\sqrt{\frac{t-1}{t}}\left(1-\frac{m^{\prime}-1}{s+m^{\prime}}\right)\;. (4)

It is also not hard to see that for non-negative ss we have:

N(s,3,m)N(s+3,0,m)=13!(s+3)(s+2)(s+1)(s+m)(s+m+1)(s+m+2).\displaystyle\frac{N(s,3,m^{\prime})}{N(s+3,0,m^{\prime})}=\frac{1}{3!}\frac{(s+3)(s+2)(s+1)}{(s+m^{\prime})(s+m^{\prime}+1)(s+m^{\prime}+2)}\;. (5)
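Both (4) and (5) can be checked with exact arithmetic. The sketch below verifies (5) directly and computes the value of $f(t)$ for which (4) holds with equality, namely $1-f(t)=\sqrt{\frac{t}{t-1}}\cdot\frac{C_{t}}{tC_{t-1}}$, so that one can observe that $t^{2}f(t)$ stays bounded. The function names and the sample values of $s$ and $m^{\prime}$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, factorial, sqrt

def cycle_cover_counts(t_max):
    """Return [C_0, ..., C_t_max] via the recurrence on the cycle through a fixed vertex."""
    counts = [0] * (t_max + 1)
    counts[0] = 1
    for t in range(3, t_max + 1):
        counts[t] = sum(
            comb(t - 1, length - 1) * (factorial(length - 1) // 2) * counts[t - length]
            for length in range(3, t + 1)
        )
    return counts

def n_h(s, m):
    """N^H(s, m') = (s + m' - 1)! / (m' - 1)!."""
    return factorial(s + m - 1) // factorial(m - 1)

def n_total(s, t, m, counts):
    """N(s, t, m') = binom(s + t, t) * N^H(s, m') * C_t."""
    return comb(s + t, t) * n_h(s, m) * counts[t]

def implied_f(t, counts):
    """f(t) defined by 1 - f(t) = sqrt(t / (t - 1)) * C_t / (t * C_{t-1})."""
    return 1.0 - sqrt(t / (t - 1)) * float(Fraction(counts[t], t * counts[t - 1]))

counts = cycle_cover_counts(120)
s, m = 7, 5
lhs = Fraction(n_total(s, 3, m, counts), n_total(s + 3, 0, m, counts))
rhs = Fraction((s + 3) * (s + 2) * (s + 1), 6 * (s + m) * (s + m + 1) * (s + m + 2))
print(lhs == rhs)
for t in (10, 30, 120):
    print(t, t * t * implied_f(t, counts))
```

The first line printed is `True`, and the scaled values $t^{2}f(t)$ remain bounded, consistent with $f(t)=O(t^{-2})$.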

We let N(n2,m)=N(n2,0,m)+t=3n2N(n2t,t,m)N^{*}(n_{2}^{\prime},m^{\prime})=N(n_{2}^{\prime},0,m^{\prime})+\sum_{t=3}^{n_{2}^{\prime}}N(n_{2}^{\prime}-t,t,m^{\prime}). For every t3t\geq 3, let

\[q_{t}(n_{2}^{\prime},m^{\prime})=\frac{N(n_{2}^{\prime}-t,t,m^{\prime})}{N^{*}(n_{2}^{\prime},m^{\prime})}\;.\]

That is, qtq_{t} equals the probability that there are tt vertices in the cycle components given our choices of H(𝒟)H(\mathcal{D}) and the exploration explained above. Observe that if gg is a function such that g(i)=O(i2)g(i)=O(i^{-2}) for each i4i\geq 4 and we have g(i)<1g(i)<1, then there are two positive constants such that for every integer j4j\geq 4, the product i=4j(1g(i))\prod_{i=4}^{j}\left(1-g(i)\right) lies between them. Using (4) and letting ft=i=4t(1f(i))f_{t}=\prod_{i=4}^{t}\left(1-f(i)\right), we have that for t3t\geq 3,

\[q_{t}=\frac{\sqrt{3}\,q_{3}f_{t}}{\sqrt{t}}\prod_{i=4}^{t}\left(1-\frac{m^{\prime}-1}{n_{2}^{\prime}-i+m^{\prime}}\right)\;.\tag{6}\]
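For completeness, (6) follows by telescoping (4). Since $\frac{q_{i}}{q_{i-1}}=\frac{N(n_{2}^{\prime}-i,i,m^{\prime})}{N(n_{2}^{\prime}-i+1,i-1,m^{\prime})}$, applying (4) with $s=n_{2}^{\prime}-i$ and $t=i$ for each $i=4,\dots,t$ gives the short derivation sketched here; note that the last factor depends on the running index $i$.

```latex
\[
\frac{q_{t}}{q_{3}}
=\prod_{i=4}^{t}\frac{q_{i}}{q_{i-1}}
=\prod_{i=4}^{t}\left(1-f(i)\right)\sqrt{\frac{i-1}{i}}
  \left(1-\frac{m^{\prime}-1}{n_{2}^{\prime}-i+m^{\prime}}\right)
=f_{t}\sqrt{\frac{3}{t}}\,\prod_{i=4}^{t}
  \left(1-\frac{m^{\prime}-1}{n_{2}^{\prime}-i+m^{\prime}}\right).
\]
```

The middle product of square roots telescopes to $\sqrt{3/t}$, which accounts for the factor $\frac{\sqrt{3}}{\sqrt{t}}$.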

We now provide the proof of Theorem 6.

Proof of Theorem 6.

Define α=γ2\alpha=\frac{\gamma}{2}. Using (6) and the fact that mMbm^{\prime}\leq M\leq b, we can find positive constants c1c_{1} and c2c_{2} such that for every 3t<(1α2)n23\leq t<(1-\frac{\alpha}{2})n^{\prime}_{2}, we have

c1q3tqtc2q3t,\displaystyle\frac{c_{1}q_{3}}{\sqrt{t}}\leq q_{t}\leq\frac{c_{2}q_{3}}{\sqrt{t}}\;, (7)

where we use that n2n_{2}^{\prime} is large in terms of α\alpha and bb. From (5), we obtain that q07q3q05\frac{q_{0}}{7}\leq q_{3}\leq\frac{q_{0}}{5}, provided that n2n_{2}^{\prime} is large enough. Recall that q1=q2=0q_{1}=q_{2}=0. We also observe that qtqt1q_{t}\leq q_{t-1} for every t4t\geq 4 by (4). Thus, qtq(1α2)n2q_{t}\leq q_{(1-\frac{\alpha}{2})n_{2}^{\prime}} for every t>(1α2)n2t>(1-\frac{\alpha}{2})n_{2}^{\prime}. Observe that 1(1α2)n22t\frac{1}{\sqrt{(1-\frac{\alpha}{2})n_{2}^{\prime}}}\leq\frac{\sqrt{2}}{\sqrt{t}} for all t>(1α2)n2t>(1-\frac{\alpha}{2})n_{2}^{\prime}. Since qtq_{t} is a probability distribution, it follows from (7) that

1=q0+t=3n2qt7q3+2c2q3t=3n21t7q3+3c2q3n2,\displaystyle 1=q_{0}+\sum_{t=3}^{n_{2}^{\prime}}q_{t}\leq 7q_{3}+\sqrt{2}c_{2}q_{3}\sum_{t=3}^{n_{2}^{\prime}}\frac{1}{\sqrt{t}}\leq 7q_{3}+3c_{2}q_{3}\sqrt{n_{2}^{\prime}}\;,

from where we obtain that q3c1n2q_{3}\geq\frac{c_{1}^{\prime}}{\sqrt{n^{\prime}_{2}}}, for some positive constant c1c_{1}^{\prime}. Therefore, we conclude that the probability that t(1α)n2t\geq(1-\alpha)n^{\prime}_{2} is at least

t=(1α)n2(1α2)n2qt\displaystyle\sum_{t=(1-\alpha)n^{\prime}_{2}}^{(1-\frac{\alpha}{2})n^{\prime}_{2}}q_{t} c1c1(1α2)n2αn22=c1c1α2(1α2)=:δ.\displaystyle\geq\frac{c_{1}c_{1}^{\prime}}{\sqrt{(1-\frac{\alpha}{2})}\cdot n^{\prime}_{2}}\cdot\frac{\alpha n^{\prime}_{2}}{2}=\frac{c_{1}c_{1}^{\prime}\alpha}{2\sqrt{(1-\frac{\alpha}{2})}}=:\delta^{*}\;.

Now we are able to conclude the proof. Recall that α=γ2\alpha=\frac{\gamma}{2}. Observe that n2=n2n2n2Mn^{\prime}_{2}=n_{2}-n^{*}_{2}\geq n-2M, since n2Mn^{*}_{2}\leq M and n2nMn_{2}\geq n-M. It follows that with probability δ\delta^{*} we have t(1α)n2(12γ3)nt\geq(1-\alpha)n^{\prime}_{2}\geq(1-\frac{2\gamma}{3})n. If this is the case, there are at most 2γ3n\frac{2\gamma}{3}n vertices in non-cyclic components, and thus, there is no component of order at least γn\gamma n in such components. First, we apply Corollary 32 with t(12γ3)nn2t\geq(1-\frac{2\gamma}{3})n\geq\frac{n}{2} and δ=2γ\delta=2\gamma, and obtain that the probability that there exists a cycle of length at least 2γtγn2\gamma t\geq\gamma n is at least δ1>0\delta_{1}>0. Second, we apply Corollary 33 with t(12γ3)nt\geq(1-\frac{2\gamma}{3})n and ϵ=γ\epsilon=\gamma, and obtain that the probability that there exists no cycle of length at least γtγn\gamma t\leq\gamma n is at least δ2>0\delta_{2}>0. Let δ=min{δ1,δ2}\delta^{\prime}=\min\{\delta_{1},\delta_{2}\}. Finally, since this holds for every conditioning on HH, mm^{\prime} and n2n_{2}^{*}, it also holds for the unconditioned statement and we have proved the theorem for δ=δδ\delta=\delta^{*}\delta^{\prime}. ∎

We finish this section with two results that will be useful in proving Theorems 4 and 5.

Lemma 35.

For any positive constant β\beta, if n2βMn_{2}^{\prime}\geq\beta M, then the probability that there are more than n2M\frac{n^{\prime}_{2}}{\sqrt{M}} vertices in cyclic components of GG is oM(1)o_{M}(1).

Proof.

Recall that by Observation 34 (a), we have M4mM\frac{M}{4}\leq m^{\prime}\leq M. We split the proof into two cases.

First, suppose that n22Mn_{2}^{\prime}\leq 2M. We use (6) to upper bound qtq_{t} and obtain the desired probability. There exists some constant c>0c>0 such that

\begin{align*}
\sum_{t\geq\frac{n^{\prime}_{2}}{\sqrt{M}}}q_{t}
&\leq c\sum_{t\geq\frac{n^{\prime}_{2}}{\sqrt{M}}}\left(1-\frac{m^{\prime}-1}{n^{\prime}_{2}+m^{\prime}}\right)^{t-3}
\leq c\sum_{t\geq\frac{n^{\prime}_{2}}{\sqrt{M}}}\left(1-\frac{M/4}{2M+M/4}\right)^{t-3}\\
&\leq c\sum_{t\geq\frac{n^{\prime}_{2}}{\sqrt{M}}}\left(\frac{8}{9}\right)^{t-3}
\leq 9c\left(\frac{8}{9}\right)^{n^{\prime}_{2}/\sqrt{M}-3}=o_{M}(1)\;,
\end{align*}

since n2βMn_{2}^{\prime}\geq\beta M.

Now, suppose that n22Mn_{2}^{\prime}\geq 2M. We use (6) to lower bound qtq_{t} for every tn210t\leq\frac{n^{\prime}_{2}}{10}. There exists some constant c>0c>0 such that,

qtcq3n2(1m1n2n210+m)tcq3n2(13M2n2)t.q_{t}\geq\frac{cq_{3}}{\sqrt{n^{\prime}_{2}}}\left(1-\frac{m^{\prime}-1}{n^{\prime}_{2}-\frac{n^{\prime}_{2}}{10}+m^{\prime}}\right)^{t}\geq\frac{cq_{3}}{\sqrt{n^{\prime}_{2}}}\left(1-\frac{3M}{2n^{\prime}_{2}}\right)^{t}\;.

Since qtq_{t} is a probability distribution and since n22Mn_{2}^{\prime}\geq 2M, we have

1\displaystyle 1 t=3n2/10qtcq3n2t=3n2/10(13M2n2)tcq3n2(13M2n2)3(13M2n2)n2/10+11(13M2n2)=cq3n2M,\displaystyle\geq\sum^{n^{\prime}_{2}/10}_{t=3}q_{t}\geq\frac{cq_{3}}{\sqrt{n^{\prime}_{2}}}\sum_{t=3}^{n^{\prime}_{2}/10}\left(1-\frac{3M}{2n^{\prime}_{2}}\right)^{t}\geq\frac{cq_{3}}{\sqrt{n^{\prime}_{2}}}\cdot\frac{(1-\frac{3M}{2n^{\prime}_{2}})^{3}-(1-\frac{3M}{2n^{\prime}_{2}})^{n^{\prime}_{2}/10+1}}{1-(1-\frac{3M}{2n^{\prime}_{2}})}=\frac{c^{\prime}q_{3}\sqrt{n^{\prime}_{2}}}{M}\;,

for some c>0c^{\prime}>0, from which we conclude that q3Mcn2q_{3}\leq\frac{M}{c^{\prime}\sqrt{n^{\prime}_{2}}}. Now we use again (6) to upper bound the desired probability

\begin{align*}
\sum_{t\geq\frac{n^{\prime}_{2}}{\sqrt{M}}}q_{t}
&\leq\frac{cq_{3}}{\sqrt{n^{\prime}_{2}/\sqrt{M}}}\sum_{t\geq\frac{n^{\prime}_{2}}{\sqrt{M}}}\left(1-\frac{M/4-1}{n^{\prime}_{2}+M/4}\right)^{t}
\leq\frac{cM^{5/4}}{c^{\prime}n^{\prime}_{2}}\sum_{t\geq\frac{n^{\prime}_{2}}{\sqrt{M}}}\left(1-\frac{M}{8n^{\prime}_{2}}\right)^{t}\\
&\leq\frac{cM^{5/4}}{c^{\prime}n^{\prime}_{2}}\cdot\frac{\left(1-\frac{M}{8n^{\prime}_{2}}\right)^{\frac{n^{\prime}_{2}}{\sqrt{M}}}}{1-\left(1-\frac{M}{8n^{\prime}_{2}}\right)}
\leq\frac{8c}{c^{\prime}}\cdot M^{1/4}e^{-\sqrt{M}/8}=o_{M}(1)\;.
\end{align*}

Lemma 36.

For every positive constant β<1100\beta<\frac{1}{100}, if n2βMn^{\prime}_{2}\geq\beta M the following is satisfied. Fix a choice of HH and let UHU_{H} be a union of some components of HH with |E(UH)|βM|E(U_{H})|\geq\beta M. Let UGU_{G} be the union of the corresponding components of GG. Then, with probability 1oM(1)1-o_{M}(1),

|E(UH)|8Mn2|V(UG)|16|E(UH)|Mn2+4|E(UH)|.\frac{|E(U_{H})|}{8M}\cdot n^{\prime}_{2}\leq|V(U_{G})|\leq\frac{16|E(U_{H})|}{M}\cdot n^{\prime}_{2}+4|E(U_{H})|\;.
Proof.

Observe that the choice of HH determines mm^{\prime}, the number of edges that have not been fixed in (1.1), and n2n^{*}_{2}, the number of vertices of degree 2 which have been exposed in (1.2) or (1.3). We will also condition on n2Hn_{2}^{H}, the number of vertices of degree 22 that have been exposed in (2.3). Since n2βMn_{2}^{\prime}\geq\beta M, by Lemma 35, the probability that n2Hn22n_{2}^{H}\leq\frac{n^{\prime}_{2}}{2} is oM(1)o_{M}(1). Hence, we may assume that n2Hn22n_{2}^{H}\geq\frac{n^{\prime}_{2}}{2}.

We denote by n2(UH)n_{2}^{*}(U_{H}) the number of vertices of degree 22 exposed in (1.2) or (1.3) in a component of UHU_{H}. By Observation 34 (b), we have

0n2(UH)2|E(UH)|.0\leq n_{2}^{*}(U_{H})\leq 2|E(U_{H})|\;.

We let m(UH)m^{\prime}(U_{H}) be the number of non-fixed edges in UHU_{H}. By Observation 34 (a), we have |E(UH)|2m(UH)|E(UH)|\frac{|E(U_{H})|}{2}\leq m^{\prime}(U_{H})\leq|E(U_{H})|. Similarly, M4=m2mm=M2\frac{M}{4}=\frac{m}{2}\leq m^{\prime}\leq m=\frac{M}{2}. Thus,

|E(UH)|Mm(UH)m4|E(UH)|M.\frac{|E(U_{H})|}{M}\leq\frac{m^{\prime}(U_{H})}{m^{\prime}}\leq\frac{4|E(U_{H})|}{M}\;.

Let n2H(UH)n_{2}^{H}(U_{H}) be the number of vertices of degree 22 which have been exposed in (2.3) to the edges of UHU_{H}. Since the ordering of the edges in (2.1) was arbitrary, symmetry amongst the non-fixed edges yields:

n2|E(UH)|2Mn2H|E(UH)|M𝔼[n2H(UH)]=n2Hm(UH)m4n2H|E(UH)|M4n2|E(UH)|M.\frac{n_{2}^{\prime}|E(U_{H})|}{2M}\leq\frac{n_{2}^{H}|E(U_{H})|}{M}\leq\mathbb{E}[n_{2}^{H}(U_{H})]=\frac{n_{2}^{H}m^{\prime}(U_{H})}{m^{\prime}}\leq\frac{4n_{2}^{H}|E(U_{H})|}{M}\leq\frac{4n^{\prime}_{2}|E(U_{H})|}{M}\;.

Since the minimum degree in HH is at least one, there are at most 2|E(UH)|2|E(U_{H})| vertices in UHU_{H}. So, the number of vertices in UGU_{G} satisfies

\[n_{2}^{H}(U_{H})\leq|V(U_{G})|=|V(U_{H})|+n_{2}^{*}(U_{H})+n_{2}^{H}(U_{H})\leq n_{2}^{H}(U_{H})+4|E(U_{H})|\;.\]

Now, we will use our random permutation model to show that the random variable n2H(UH)n_{2}^{H}(U_{H}) is concentrated around its expected value, which by the previous equation implies that |V(UG)||V(U_{G})| also is. In (2.1) we insist on choosing an ordering of the non-fixed edges of HH in such a way that the m(UH)m^{\prime}(U_{H}) first ones correspond to E(UH)E(U_{H}).

The probability that $n_{2}^{H}(U_{H})\geq\ell$ conditional on the value of $n_{2}^{H}$ is the same as the probability that in choosing $m^{\prime}-1$ elements in $\{1,2,\ldots,m^{\prime}-1+n_{2}^{H}\}$, we choose fewer than $m^{\prime}(U_{H})$ of them that are smaller than $m^{\prime}(U_{H})+\ell$. So, since $n_{2}^{H}\geq\frac{n^{\prime}_{2}}{2}\geq\frac{\beta M}{2}$, letting $X_{\ell}$ be the number of elements chosen from the $m^{\prime}-1$ ones which are smaller than $m^{\prime}(U_{H})+\ell$, a standard concentration argument on $X_{\ell}$ for $\ell=4\mathbb{E}[n_{2}^{H}(U_{H})]$ (observe that $X_{\ell}$ follows a hypergeometric distribution) shows that with probability $1-o_{M}(1)$ we have $n_{2}^{H}(U_{H})\leq 4\mathbb{E}[n_{2}^{H}(U_{H})]$. The same holds for the probability of the event $n_{2}^{H}(U_{H})\geq\frac{\mathbb{E}[n_{2}^{H}(U_{H})]}{4}$, concluding the proof of the lemma. ∎
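The concentration step can be illustrated by simulating the delimiter model directly: choose the $m^{\prime}-1$ delimiter positions among $n_{2}^{H}+m^{\prime}-1$ slots and count the vertices that fall before the $m^{\prime}(U_{H})$-th delimiter. The sketch below is an empirical illustration under hypothetical parameter values, not a substitute for the argument; the function and parameter names are assumptions.

```python
import random

def vertices_on_first_edges(n_vertices, n_edges, n_first, rng):
    """Sample how many of n_vertices degree-2 vertices land on the first
    n_first of n_edges non-fixed edges (requires 1 <= n_first <= n_edges)."""
    if n_first == n_edges:
        return n_vertices
    # delimiter positions among the n_vertices + n_edges - 1 slots, in order
    positions = sorted(rng.sample(range(1, n_vertices + n_edges), n_edges - 1))
    # slots before the n_first-th delimiter, minus the earlier delimiters
    return positions[n_first - 1] - n_first

def simulate(n_vertices, n_edges, n_first, trials, seed=0):
    """Return the empirical mean and the fraction of samples within a
    factor 4 of the expectation n_vertices * n_first / n_edges."""
    rng = random.Random(seed)
    expected = n_vertices * n_first / n_edges
    samples = [vertices_on_first_edges(n_vertices, n_edges, n_first, rng)
               for _ in range(trials)]
    mean = sum(samples) / trials
    inside = sum(expected / 4 <= x <= 4 * expected for x in samples) / trials
    return mean, inside

mean, inside = simulate(n_vertices=500, n_edges=50, n_first=10, trials=2000)
print(mean, inside)
```

By symmetry among the $m^{\prime}$ blocks, the expectation is $n_{2}^{H}m^{\prime}(U_{H})/m^{\prime}$, which is $100$ for the parameters above; the empirical mean should be close to it and almost all samples should lie within a factor $4$.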

7 Relating the size of a component of H(𝒟)H(\mathcal{D}) to the order of a component of G(𝒟)G(\mathcal{D})

As usual we use M=M𝒟M=M_{\mathcal{D}} and R=R𝒟R=R_{\mathcal{D}} throughout this section. We start with the proof of Theorem 4 and conclude the section with the proof of Theorem 5.

Proof of Theorem 4.

By Theorem 9 with ϵ=1/3\epsilon=1/3, there is a γ~\tilde{\gamma} such that if RM3R\geq\frac{M}{3}, then the probability that HH has no component of size γ~M\tilde{\gamma}M is o(1)o(1). We will choose ρ\rho to be the minimum of γ30\frac{\gamma}{30} and γ~\tilde{\gamma}. Hence we can assume that RM3R\leq\frac{M}{3}.

To complete the proof, we show that under this assumption, the probability GG has a component of order 30ρn30\rho n given HH has no component of size ρM\rho M is o(1)o(1). Under this hypothesis, for every component KK of HH, we have |E(K)|ρM|E(K)|\leq\rho M. Since each component is connected, this also implies that |V(K)|ρM+1|V(K)|\leq\rho M+1.

The following claim allows us to bound $M$ in terms of $n$.

Claim 5.

If n2n_{2} is the number of vertices of degree 22 in 𝒟\mathcal{D}, then

RM2(nn2).R\geq M-2(n-n_{2})\;.
Proof.

Suppose that R<M2(nn2)R<M-2(n-n_{2}). By the definition of j𝒟j_{\mathcal{D}}, we obtain k=1j𝒟1dπkM+2n2R>2n\sum_{k=1}^{j_{\mathcal{D}}-1}d_{\pi_{k}}\geq M+2n_{2}-R>2n. Since the function f(x)=x(x2)f(x)=x(x-2) is convex,

k=1j𝒟1dπk(dπk2)>(j𝒟1)2nj𝒟1(2nj𝒟12)>0,\sum_{k=1}^{j_{\mathcal{D}}-1}d_{\pi_{k}}(d_{\pi_{k}}-2)>(j_{\mathcal{D}}-1)\frac{2n}{j_{\mathcal{D}}-1}\left(\frac{2n}{j_{\mathcal{D}}-1}-2\right)>0,

which is a contradiction to the choice of j𝒟j_{\mathcal{D}}. Thus RM2(nn2)R\geq M-2(n-n_{2}). ∎

Since RM3R\leq\frac{M}{3}, Claim 5 implies M3nM\leq 3n.

Condition now on the choice of HH and on the set of fixed edges and let mm^{\prime} be the number of non-fixed ones. These choices determine n2n_{2}^{\prime}.

Suppose that n2ρnn^{\prime}_{2}\leq\rho n and recall that n2(K)2|E(K)|n_{2}^{*}(K)\leq 2|E(K)| by Observation 34. Then, deterministically, each component of GG has at most |V(KH)|+n2(K)+n24ρM+113ρn|V(K_{H})|+n^{*}_{2}(K)+n^{\prime}_{2}\leq 4\rho M+1\leq 13\rho n vertices.

Otherwise, n2ρnρM3n^{\prime}_{2}\geq\rho n\geq\frac{\rho M}{3}. Group the components of HH in sets U1,,UsU_{1},\dots,U_{s} such that ρM2|E(Ui)|ρM\frac{\rho M}{2}\leq|E(U_{i})|\leq\rho M. Clearly such a collection exists and s2(ρ)1s\leq 2(\rho)^{-1}. For every 1is1\leq i\leq s, we apply Lemma 36 with UH=UiU_{H}=U_{i} and β=ρ4\beta=\frac{\rho}{4}. Since 𝒟\mathcal{D} is well-behaved, the conclusion of Lemma 36 holds with probability 1o(1)1-o(1). Using a union bound over all the sets UiU_{i}, we obtain that with probability 1o(1)1-o(1), for 1is1\leq i\leq s, we have

|V(Ui)|16|E(Ui)|Mn2+4|E(Ui)|16ρn+4ρM28ρn.|V(U_{i})|\leq\frac{16|E(U_{i})|}{M}\cdot n^{\prime}_{2}+4|E(U_{i})|\leq 16\rho n+4\rho M\leq 28\rho n\;.

Proof of Theorem 5.

It is enough to prove the theorem for sufficiently small positive $\rho$, for which we will prove it with $\gamma=\rho^{3}$.

We can use Lemma 10 to show that if $M\geq n\log\log{n}$, then the probability that $G$ has no component of order $(1-o(1))n$ is $o(1)$.

Next, suppose that $M\leq\frac{n}{3}$. We show that in this case, the probability that $G$ has no component of order $\rho^{3}n$, given that $H$ contains a component of size $\rho M$, is $o(1)$. We have that $n_{2}'\geq n-2M\geq\frac{n}{3}\geq M$. Letting $K_{H}$ be the component of $H$ of size at least $\rho M$ and $K_{G}$ be the component of $G$ corresponding to $K_{H}$, and applying Lemma 36 with $U_{H}=K_{H}$ and $\beta=\rho$, we conclude that with probability $1-o(1)$ (since $\mathcal{D}$ is well-behaved)

\[|V(K_{G})|\geq\frac{|E(K_{H})|}{8M}\cdot n'_{2}\geq\frac{\rho n}{24}\geq\rho^{3}n\;.\]

Thus, it remains to prove the theorem for

\[\frac{n}{3}\leq M\leq n\log\log{n}\;.\]

For any subgraph $K$ of $G$, let $\text{ex}(K)=|E(K)|-|V(K)|$ be the excess of $K$. We also define the near-excess of $K$ as $\text{nex}(K)=\text{ex}(K)+|L\cap V(K)|$, where $L$ is the set of vertices of degree at least $\frac{\sqrt{M}}{\log{M}}$ in $K$. Let $S$ be the set of vertices of $G$ that correspond to a component $K$ in $H$ with $\text{nex}(K)\geq\frac{\rho M}{2}$.

The following claim is crucial to finish the proof of the theorem.

Claim 6.

We have

\[\mathbb{P}\left[S\neq\emptyset,\;|S|<3\rho^{2}n\right]=o(1)\;.\]

We now conclude the proof of the theorem, assuming that the claim is true. Suppose $H$ contains a component $K_{H}$ that satisfies $|E(K_{H})|\geq\rho M$. If the corresponding component $K_{G}$ in $G$ satisfies $|V(K_{G})|\geq\rho^{3}n$, there is nothing to prove. So, suppose $|V(K_{G})|<\rho^{3}n$. Then

\[\text{nex}(K_{H})\geq\text{ex}(K_{H})=|E(K_{H})|-|V(K_{H})|\geq\rho M-\rho^{3}n\geq\frac{\rho M}{2}\;,\tag{8}\]

and $S$ is non-empty.

Since the total excess of the graph is at most $M$, the total near-excess is at most $M+|L|\leq 3M/2$ and there are at most $\frac{3}{\rho}$ components in $S$. Hence, there exists a component in $G$ with at least $\frac{\rho|S|}{3}$ vertices, which by Claim 6 implies that with probability $1-o(1)$, $G$ has a component of order at least $\rho^{3}n$.

So, it is indeed enough to prove Claim 6. To do so, we need the following:

Claim 7.

We have

\[\mathbb{P}\left[S\neq\emptyset,~~\sum_{v\in L\setminus S}d(v)>M^{2/3}\right]=o(1)\;.\]
Proof of Claim 7.

We let $\mathcal{A}$ be the event that $S\neq\emptyset$, and for any vertex $v\in L$, we let $\mathcal{B}_{v}$ be the event that $S\neq\emptyset$, but $v\notin S$.

Say a graph $G$ is in $\mathcal{B}_{v}$. We only consider switchings between ordered pairs of oriented edges $vx$ and $yz$ in $H$, where $yz$ is an edge that is not a cut-edge and lies in a component of $H$ whose vertex set is in $S$; we allow $v=x$ or $y=z$.

Since $yz$ is in a component of $S$ and $S\neq\emptyset$, there are at least $\text{ex}(S)\geq\frac{\rho M}{3}$ choices for $yz$. Clearly, there are $d(v)$ choices for the (directed) edge $vx$.

We show below that there are at most $4M$ such switchings from any element of $\mathcal{A}\setminus\mathcal{B}_{v}$ to $\mathcal{B}_{v}$. Thus

\[\mathbb{P}[\mathcal{B}_{v}]\leq\frac{12\,\mathbb{P}[\mathcal{A}\setminus\mathcal{B}_{v}]}{\rho d(v)}\leq\frac{12}{\rho d(v)}.\]

So, we have

\[\mathbb{E}\left[\sum_{S\neq\emptyset,\,v\in L\setminus S}d(v)\right]=\sum_{v\in L}d(v)\mathbb{P}[\mathcal{B}_{v}]\leq 12\rho^{-1}|L|\;.\]

Since $|L|\leq\sqrt{M}\log{M}$, Markov’s inequality implies that

\[\mathbb{P}\left[S\neq\emptyset,\sum_{v\in L\setminus S}d(v)>M^{2/3}\right]=o(1)\;,\]

and the claim follows.

It remains to prove the mentioned bound on the number of switchings between $\mathcal{A}\setminus\mathcal{B}_{v}$ and $\mathcal{B}_{v}$. In doing so, we note that (i) if a connected subgraph $J$ contains a subgraph of near-excess $a$, then $J$ also has near-excess at least $a$, and (ii) the near-excess of the disjoint union of $J_{1}$ and $J_{2}$ is at least the sum of the near-excesses of $J_{1}$ and $J_{2}$.

Consider a graph $G$ that is in $\mathcal{A}\setminus\mathcal{B}_{v}$. Let $K$ be the component of $H$ containing $v$ and let $K_{1},\ldots,K_{\ell}$ be the components of $K-(N(v)\cup\{v\})$. For each neighbour $w$ of $v$, let $K_{w}$ be the graph induced by $w$ and all components in $\{K_{1},\ldots,K_{\ell}\}$ in which $w$ has a neighbour.

In the following, we consider a triple of vertices $x,y,z$ such that switching $\{vy,xz\}$ leads to a graph in $\mathcal{B}_{v}$ such that the edge $yz$ is not a cut-edge and the component containing $y$ and $z$ has near-excess at least $\frac{\rho M}{2}$. We note that unless $v=x$, this implies there is at most one edge between $v$ and $y$. We note further that $v,x$ are distinct from $y,z$, but $z$ and $y$ may coincide, as may $v$ and $x$.

If $v=x$, then $y$ and $z$ are both neighbours of $v$ in $H$ and there are exactly two edges of $H$ from $H-K_{z}-K_{y}$ to $K_{z}\cup K_{y}$. Furthermore, there is a path of $K_{z}\cup K_{y}$ from $z$ to $y$ and the near-excess of $K_{z}\cup K_{y}$ is at least $\frac{\rho M-2}{2}$. So, we see that there are at most two choices for the pair $y,z$ and at most eight choices for switchings of this type.

If $v\neq x$, then $z$ must be in $K_{y}$ and in $H-xz$ there can be no path from $z$ to any of $N(v)\setminus\{y\}$. Thus, there is a unique choice of $vy$ for each such ordered $xz$ and at most $2M-2d(v)$ total choices for such switchings. ∎

Proof of Claim 6.

Let $Z$ be the number of sets of vertices $T$ that satisfy

  (1) $|T|\leq 3\rho^{2}n$,

  (2) the sum of the degrees of the elements in $T$ is at least $\rho M$,

  (3) $\sum_{v\in L\setminus T}d(v)\leq M^{2/3}$,

  (4) there are no edges between $T$ and $V(H)\setminus T$.

Observe that $S$ as defined satisfies (2), since it is non-empty and each component in $S$ has at least $\frac{\rho M}{2}$ edges. Since $S$ is a union of components, it also satisfies (4).

We will show that $\mathbb{E}[Z]=o(1)$. This directly implies that the probability that $S$ satisfies (1) and (3) is $o(1)$, which combined with Claim 7 yields Claim 6.

Fix a set of vertices $T$ that satisfies (1), (2) and (3). For every $0\leq k\leq k^{*}:=2\lceil\frac{\rho n}{26}\rceil$, let $\mathcal{A}_{T,k}$ be the event that there are exactly $k$ edges that connect $T$ with $V(H)\setminus T$.

There are at most $k^{2}$ switchings from a graph in $\mathcal{A}_{T,k}$ that yield a graph in $\mathcal{A}_{T,k-2}$.

We now lower bound the number of switchings from a graph in $\mathcal{A}_{T,k-2}$ to a graph in $\mathcal{A}_{T,k}$. To do so we consider pairs consisting of (i) an edge $xy$ such that $x\notin T$ and $x$ is not a neighbour of a vertex in $T\cup L$, and (ii) an edge $uv$ with both endpoints in $T$ such that $y$ is not adjacent simultaneously to both $u$ and $v$. For such a pair, we can always switch $xy$ with at least one of $uv$ or $vu$. There are at least $n-|T|-(k-2)-M^{2/3}>\frac{n}{2}$ choices for $x$, since $T$ satisfies (3) and $M\leq n\log\log{n}$. Since $d(x)\geq 1$, there are at least the same number of choices for the edge $xy$. Since $y\notin L$, we have $d(y)\leq\frac{\sqrt{M}}{\log{M}}$, which implies that there are at most $\left(\frac{\sqrt{M}}{\log{M}}\right)^{2}=\frac{M}{\log^{2}{M}}$ edges within $T$ that have both endpoints adjacent to $y$. Using that $T$ satisfies (2) and that $n\leq 3M$, we conclude that there are at least $\frac{\rho M}{2}-(k-2)-\frac{M}{\log^{2}{M}}\geq\frac{\rho M}{4}$ choices for the edge $uv$, given the choice of $xy$. The total number of switchings is at least $\frac{\rho nM}{8}$.

So, using that $k\leq k^{*}$ and that $n\leq 3M$,

\[\mathbb{P}[\mathcal{A}_{T,k-2}]\leq\frac{k^{2}}{\frac{\rho nM}{8}}\mathbb{P}[\mathcal{A}_{T,k}]\leq\frac{\rho n}{3M}\mathbb{P}[\mathcal{A}_{T,k}]\leq\rho\,\mathbb{P}[\mathcal{A}_{T,k}]\;.\]

Therefore,

\[\mathbb{P}[\mathcal{A}_{T,0}]\leq\rho^{\frac{k^{*}}{2}}\mathbb{P}[\mathcal{A}_{T,k^{*}}]\leq\rho^{\frac{\rho n}{26}}\;.\]
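The numerical steps in the last two displays can be made explicit. The following derivation is a sketch that assumes $n$ is large enough that $\rho n\geq 312$, so that the rounding in $k^{*}$ is absorbed; the intermediate constants $12$ and $18$ are our choice for illustration and are not taken from the original argument.

```latex
\begin{align*}
% Step 1: bound k using k <= k^* = 2 * ceil(rho n / 26).
k &\leq 2\left\lceil\frac{\rho n}{26}\right\rceil \leq \frac{\rho n}{13}+2 \leq \frac{\rho n}{12}
  && \text{(using } \rho n\geq 312\text{)},\\
% Step 2: plug this into the ratio of switching counts.
\frac{k^{2}}{\rho nM/8} &\leq \frac{8}{\rho nM}\cdot\frac{\rho^{2}n^{2}}{144} = \frac{\rho n}{18M} \leq \frac{\rho n}{3M} \leq \rho
  && \text{(using } n\leq 3M\text{)},\\
% Step 3: apply the one-step bound for k = 2, 4, ..., k^*, that is, k^*/2 times.
\mathbb{P}[\mathcal{A}_{T,0}] &\leq \rho^{k^{*}/2}\,\mathbb{P}[\mathcal{A}_{T,k^{*}}] \leq \rho^{\lceil\rho n/26\rceil} \leq \rho^{\rho n/26}
  && \text{(using } \rho<1\text{)}.
\end{align*}
```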

Since $T$ satisfies (1), there are at most $\binom{n}{3\rho^{2}n}$ choices for $T$. Provided that $\rho$ is small enough, we conclude that $\mathbb{E}[Z]=o(1)$. ∎
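To see how small $\rho$ has to be for this last step, one can compare the exponential growth rate of the number of candidate sets with the decay rate of $\rho^{\rho n/26}$. The sketch below assumes the standard estimate $\binom{n}{an}\leq(e/a)^{an}$ with $a=3\rho^{2}$ and ignores polynomial factors in $n$; the function name `expected_count_rate` is hypothetical and the resulting threshold is only what this crude estimate gives, not a value stated in the argument above.

```python
import math


def expected_count_rate(rho):
    """Crude upper bound on (1/n) * ln E[Z], ignoring lower-order terms.

    Uses binom(n, a n) <= (e / a)^(a n) with a = 3 rho^2 for the number of
    candidate sets T, and rho^(rho n / 26) for the probability that a fixed
    set T has no edges leaving it.
    """
    a = 3 * rho ** 2
    log_choices = a * math.log(math.e / a)        # from the binomial estimate
    log_probability = (rho / 26) * math.log(rho)  # negative for rho < 1
    return log_choices + log_probability


if __name__ == "__main__":
    for rho in (0.1, 0.01, 0.001):
        print(rho, expected_count_rate(rho))
```

A negative value means that this estimate of $\mathbb{E}[Z]$ decays exponentially in $n$; with this particular bound that happens for $\rho=0.001$ but not for $\rho=0.1$.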

This completes the proof of Theorem 5. ∎

8 Applications of Theorem 3

We briefly show that Theorem 3 implies the results mentioned in Section 1.4. We consider a sequence of degree sequences $\mathfrak{D}=(\mathcal{D}_{n})_{n\geq 1}$ such that

  (d.1) it is feasible, smooth, and sparse,

  (d.2) $\lambda_{2}<1$, and

  (d.3) $\lambda=\sum_{i\geq 1}i\lambda_{i}$.

Conditions (d.1)–(d.3) are essentially $\mathbf{BR}$-conditions and they are weaker than $\mathbf{MR}$-conditions and $\mathbf{JL}$-conditions. An interesting consequence of them is the following: for every $\epsilon>0$, there exist positive integers $C$ and $n_{\epsilon}$ such that if $n\geq n_{\epsilon}$, then

\[\sum_{i\geq C}in_{i}\leq\epsilon n\;.\]

Therefore, any sequence of degree sequences that satisfies (d.1)–(d.3) has almost all the edges incident to vertices of bounded degree.

The results of Molloy and Reed [24], Janson and Luczak [18], and Bollobás and Riordan [5] on the existence of a giant component are of the following form: provided that $\mathfrak{D}$ satisfies certain conditions, if $Q(\mathfrak{D})>0$ then $G(\mathcal{D}_{n})$ has a linear order component with probability $1-o(1)$, and if $Q(\mathfrak{D})\leq 0$, then $G(\mathcal{D}_{n})$ has no linear order component with probability $1-o(1)$. (Footnote 10: In fact, the Molloy-Reed result does not discuss the case $Q(\mathfrak{D})=0$.) We will show that if $\mathfrak{D}$ satisfies (d.1)–(d.3), then Theorem 3 implies the same statement.
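As a small illustration of this dichotomy, the quantity $Q(\mathfrak{D})=\sum_{i\geq 1}i(i-2)\lambda_{i}$ can be evaluated directly for a limiting degree distribution with finite support. The sketch below is only meant to show the sign test; the dictionary representation of $(\lambda_{i})_{i\geq 1}$, the function name `molloy_reed_q` and the two example distributions are assumptions made for the example.

```python
def molloy_reed_q(lam):
    """Q = sum_i i (i - 2) lam[i] for a limiting degree distribution.

    `lam` maps a degree i to the limiting proportion lambda_i of vertices of
    degree i; this dictionary representation is an assumption of the sketch.
    """
    return sum(i * (i - 2) * p for i, p in lam.items())


if __name__ == "__main__":
    supercritical = {1: 0.5, 3: 0.5}  # Q = -0.5 + 1.5 > 0
    subcritical = {1: 0.8, 3: 0.2}    # Q = -0.8 + 0.6 < 0
    print(molloy_reed_q(supercritical) > 0)
    print(molloy_reed_q(subcritical) <= 0)
```

Vertices of degree $1$ are the only ones contributing negatively, and vertices of degree $2$ contribute nothing, which is why condition (d.2) excludes the case $\lambda_{2}=1$.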

Theorem 37.

Let $\mathfrak{D}=(\mathcal{D}_{n})_{n\geq 1}$ be a sequence of degree sequences that satisfies conditions (d.1)–(d.3). Then,

  1. if $Q(\mathfrak{D})>0$, then there exists a constant $c_{1}>0$ such that the probability that $G(\mathcal{D}_{n})$ has a component of order at least $c_{1}n$ is $1-o(1)$;

  2. if $Q(\mathfrak{D})\leq 0$, then for every constant $c_{2}>0$, the probability that $G(\mathcal{D}_{n})$ has no component of order at least $c_{2}n$ is $1-o(1)$.

Proof.

First of all, observe that (d.2) implies that $\mathfrak{D}$ is well-behaved.

Fix $\delta>0$ small enough. Since $\lambda=\sum_{i\geq 1}i\lambda_{i}$, there is an integer $K$ such that $\sum_{i=1}^{K}i\lambda_{i}\geq\lambda-\delta$. Also, since $\mathfrak{D}$ is smooth and provided that $n$ is large enough, we have

\[\sum_{i=1}^{K}i\left|\lambda_{i}-\frac{n_{i}}{n}\right|<\delta\;,\]

and hence, as $\mathfrak{D}$ is sparse,

\[\sum_{i>K}i\cdot\frac{n_{i}}{n}<3\delta\;.\tag{9}\]

Recall $Q=Q(\mathfrak{D})=\sum_{i\geq 1}i(i-2)\lambda_{i}$. Assume first that $Q>0$. We will show that $\mathfrak{D}$ is lower-bounded. Fix $\gamma>0$ such that $Q>\gamma$. Now note that there exists a positive integer $k$ such that $\sum_{i=1}^{k}i(i-2)\lambda_{i}>\frac{\gamma}{2}$. Since $\mathfrak{D}$ is smooth and provided that $n$ is large enough, we have that for every $1\leq i\leq k$

\[\left|i(i-2)\frac{n_{i}}{n}-i(i-2)\lambda_{i}\right|<\frac{\gamma}{4k}.\]

Therefore,

\[\sum_{i=1}^{k}i(i-2)\frac{n_{i}}{n}\geq\sum_{i=1}^{k}i(i-2)\lambda_{i}-\sum_{i=1}^{k}\left|i(i-2)\frac{n_{i}}{n}-i(i-2)\lambda_{i}\right|\geq\frac{\gamma}{4}.\]

Since every vertex of degree $i\in[k]$ contributes at most $i(k-2)$ to the previous sum, this implies $R_{\mathcal{D}_{n}}\geq\frac{\gamma}{4k}\cdot n$. Using that $\mathfrak{D}$ is sparse and since $n$ is large enough, we have that $\sum_{i\geq 1}in_{i}\leq 2\lambda n$ and hence $M_{\mathcal{D}_{n}}\leq 2\lambda n$. Therefore,

\[R_{\mathcal{D}_{n}}\geq\frac{\gamma}{8\lambda k}\cdot M_{\mathcal{D}_{n}}\]

and thus, the sequence $\mathcal{D}_{n}$ is lower-bounded with $\epsilon=\frac{\gamma}{8\lambda k}$. Since it is also well-behaved, Theorem 3 implies that there exists a constant $c_{1}>0$ such that the probability that $G$ has a component of order at least $c_{1}n$ is $1-o(1)$.

Now suppose that $Q\leq 0$. We aim to show that $\mathfrak{D}$ is upper-bounded; that is, for every sufficiently small $\epsilon>0$ and large enough $n$, we have $R_{\mathcal{D}_{n}}\leq\epsilon M_{\mathcal{D}_{n}}$. We fix an arbitrary and sufficiently small $\epsilon>0$. Observe first that $\sum_{i=1}^{k}i(i-2)\lambda_{i}\leq 0$ for any positive integer $k$, as $i(i-2)\geq 0$ for every $i\geq 2$.

Furthermore, for sufficiently large $n$, the number of vertices of degree different from $2$ is at least $\frac{1-\lambda_{2}}{2}n$. Since the minimum degree is at least one, $M_{\mathcal{D}_{n}}\geq\frac{1-\lambda_{2}}{4}n$.

Using (9), we can choose $K$ large enough such that

\[\sum_{i>K}i\cdot\frac{n_{i}}{n}<\frac{(1-\lambda_{2})\epsilon}{8}\;.\]

As before, since $\mathfrak{D}$ is smooth, we can consider $n$ large enough such that for every $1\leq i\leq K$, we have $\left|i(i-2)\frac{n_{i}}{n}-i(i-2)\lambda_{i}\right|<\frac{(1-\lambda_{2})\epsilon}{8K}$. Therefore,

\[\sum_{i=1}^{K}i(i-2)\frac{n_{i}}{n}\leq\sum_{i=1}^{K}i(i-2)\lambda_{i}+\sum_{i=1}^{K}\left|i(i-2)\frac{n_{i}}{n}-i(i-2)\lambda_{i}\right|\leq 0+\frac{(1-\lambda_{2})\epsilon}{8}\;.\]

Because $i\leq i(i-2)$ for every $i\geq 3$, this implies

\begin{align*}
R_{\mathcal{D}_{n}} &\leq\frac{(1-\lambda_{2})\epsilon}{8}n+\sum_{i>K}i\cdot n_{i}\\
&\leq\frac{(1-\lambda_{2})\epsilon}{4}\cdot n\\
&\leq\epsilon M_{\mathcal{D}_{n}}\;.
\end{align*}

Note that the choice of $\epsilon$ was arbitrary and thus $\mathfrak{D}$ is upper-bounded. Since it is also well-behaved, Theorem 3 implies that for every constant $c_{2}>0$, the probability that $G$ has no component of order at least $c_{2}n$ is $1-o(1)$. ∎

Theorem 3 can also be applied to obtain results on the existence of a giant component in specific models of random graphs. Here, as an example, we will study the case of the Power-Law Random Graph introduced by Aiello, Chung and Lu [1]. Let us first recall its definition. Choose two parameters $\alpha,\beta>0$ and consider the sequence of degree sequences $\mathfrak{D}=(\mathcal{D}_{n})_{n\geq 1}$ where $\mathcal{D}_{n}$ has $n_{i}(n)=\lfloor e^{\alpha}i^{-\beta}\rfloor$ vertices of degree $i$, for every $i\geq 1$. We should think about these parameters as follows: $\alpha$ is typically large and determines the order of the graph (we always have $\alpha=\Theta(\log{n})$), and $\beta$ is a fixed constant that determines the power-decay of the degree sequence. The total number of vertices can be determined in terms of $\alpha$ and $\beta$,

\[n\approx\begin{cases}\zeta(\beta)e^{\alpha}&\quad\text{if }\beta>1\\ \alpha e^{\alpha}&\quad\text{if }\beta=1\\ \frac{e^{\frac{\alpha}{\beta}}}{1-\beta}&\quad\text{if }0<\beta<1\end{cases}\]

and similarly for the number of edges

\[m\approx\begin{cases}\frac{1}{2}\zeta(\beta-1)e^{\alpha}&\quad\text{if }\beta>2\\ \frac{1}{4}\alpha e^{\alpha}&\quad\text{if }\beta=2\\ \frac{1}{2}\frac{e^{\frac{2\alpha}{\beta}}}{2-\beta}&\quad\text{if }0<\beta<2\end{cases}\]

where $\zeta(x)=\sum_{i\geq 1}i^{-x}$ is the standard zeta function. Moreover, the maximum degree is $e^{\alpha/\beta}$.
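The degree counts of this model are easy to generate, which gives a quick way to compare a concrete sequence with the asymptotic formulas above. The following is a minimal sketch, assuming $\alpha\geq 0$ so that at least one vertex of degree $1$ exists; the function names `power_law_degree_counts` and `summary` and the parameter values in the example are illustrative only.

```python
import math


def power_law_degree_counts(alpha, beta):
    """Return {i: n_i} with n_i = floor(e^alpha * i^(-beta)), for all i with n_i >= 1."""
    counts = {}
    i = 1
    while True:
        n_i = math.floor(math.exp(alpha) * i ** (-beta))
        if n_i < 1:  # n_i is non-increasing in i, so no larger degree occurs
            break
        counts[i] = n_i
        i += 1
    return counts


def summary(alpha, beta):
    """Return (number of vertices, sum of degrees, maximum degree)."""
    counts = power_law_degree_counts(alpha, beta)
    n = sum(counts.values())
    degree_sum = sum(i * c for i, c in counts.items())  # twice the edge count
    max_degree = max(counts)  # assumes alpha >= 0, so counts is non-empty
    return n, degree_sum, max_degree


if __name__ == "__main__":
    alpha, beta = 10.0, 2.5
    n, degree_sum, max_degree = summary(alpha, beta)
    print("n =", n, " sum of degrees =", degree_sum, " max degree =", max_degree)
    print("e^(alpha/beta) =", math.exp(alpha / beta))
```

For finite $\alpha$ the rounding and the truncation at the maximum degree make the counts fall below the asymptotic expressions, so the comparison should be read as approximate.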

The Power-Law Random Graph, denoted by $G=G(\alpha,\beta)$, is a graph on $n$ vertices chosen uniformly at random among all such graphs with degree sequence $\mathcal{D}_{n}$. There are some values of $\beta$ (for instance $\beta=1$) for which $\mathfrak{D}$ does not satisfy the conditions (d.1)–(d.3). Thus, Theorem 37 cannot be used in general.

Nevertheless, observe that for every $\alpha,\beta>0$, we have $n_{2}\leq\frac{n}{2}$, and thus, the asymptotic degree sequence is well-behaved. This in particular implies that $M_{\mathcal{D}_{n}}\geq 2m-n$.

Let $\beta_{0}=3.47875\dots$ be a solution to the equation $\zeta(\beta-2)-2\zeta(\beta-1)=0$. This is the important threshold for the appearance of the giant component since $\beta\geq\beta_{0}$ if and only if

\[\sum_{i\geq 1}i(i-2)\frac{n_{i}}{n}\leq 0\;.\tag{10}\]

We refer the reader to the beginning of Section 3 in [1] for a formal proof of this fact. Thus for $\beta\geq\beta_{0}$, the parameter $R_{\mathcal{D}_{n}}$ is simply the maximum degree of $G$, which is $e^{\frac{\alpha}{\beta}}$. Since $\beta_{0}>1$, we have $R_{\mathcal{D}_{n}}\ll M_{\mathcal{D}_{n}}$.
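The value of $\beta_{0}$ can be reproduced numerically. The sketch below approximates $\zeta$ by a partial sum with an Euler-Maclaurin tail correction and then locates the root of $\zeta(\beta-2)-2\zeta(\beta-1)$ by bisection. The bracket $[3.1,4]$, the number of summands and the function names are our assumptions; this is not the method used in [1].

```python
import math


def zeta(s, terms=1000):
    """Approximate the Riemann zeta function for real s > 1.

    Sums the first terms - 1 summands exactly and estimates the remaining
    tail with the leading Euler-Maclaurin correction terms.
    """
    N = terms
    head = sum(i ** (-s) for i in range(1, N))
    tail = N ** (1 - s) / (s - 1) + 0.5 * N ** (-s) + s * N ** (-s - 1) / 12
    return head + tail


def threshold_function(beta):
    """zeta(beta - 2) - 2 zeta(beta - 1), whose root is beta_0 (needs beta > 3)."""
    return zeta(beta - 2) - 2 * zeta(beta - 1)


def find_beta0(lo=3.1, hi=4.0, iterations=60):
    """Bisection, assuming threshold_function(lo) > 0 > threshold_function(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if threshold_function(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print("zeta(2) check:", zeta(2), math.pi ** 2 / 6)
    print("beta_0 approx:", find_beta0())
```

The sign change on the bracket follows from $\zeta(\beta-2)\to\infty$ as $\beta\to 3^{+}$ and from $\zeta(2)-2\zeta(3)<0$ at $\beta=4$.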

Theorem 38 (Aiello, Chung and Lu).

Let $G=G(\alpha,\beta)$ be a Power-Law Random Graph. Then,

  1. if $\beta<\beta_{0}$, then there exists a constant $c_{1}>0$ such that the probability that $G$ has a component of order at least $c_{1}n$ is $1-o(1)$;

  2. if $\beta\geq\beta_{0}$, then for every constant $c_{2}>0$, the probability that $G$ has no component of order at least $c_{2}n$ is $1-o(1)$.

We want to emphasize that the structural description that Aiello, Chung and Lu give in their paper is more precise than the result stated here.

Proof.

We already addressed the case $\beta\geq\beta_{0}$ before the theorem. In such a case, the sequence $\mathfrak{D}$ is upper-bounded and it follows from Theorem 3 that for every constant $c_{2}>0$, the probability that $G$ has no component of order at least $c_{2}n$ is $1-o(1)$.

Now, we consider $\beta<\beta_{0}$. We will show that in this case $\mathfrak{D}$ is lower-bounded. We split the proof into three cases.

Case $0<\beta\leq 2$: Suppose $\beta\neq 2$. The number of edges is of order $e^{\frac{2\alpha}{\beta}}$ and thus the average degree is of order $e^{(\frac{2}{\beta}-1)\alpha}$, which tends to $\infty$ as $\alpha\to\infty$. Thus, provided that $n$ is large enough, we have $R_{\mathcal{D}_{n}}\geq\frac{M_{\mathcal{D}_{n}}}{2}$. The same argument applies for $\beta=2$, as the average degree is of order $\alpha=\Theta(\log n)$.

Case $2<\beta<3$: Let $k$ be the smallest integer such that $\sum_{i=1}^{k}i^{2-\beta}>4\zeta(\beta-1)$. Thus

\[\sum_{i=1}^{k}i(i-2)e^{\alpha}i^{-\beta}\geq e^{\alpha}\left(\sum_{i=1}^{k}i^{2-\beta}-2\sum_{i=1}^{\infty}i^{1-\beta}\right)\geq\frac{2\zeta(\beta-1)}{\zeta(\beta)}\cdot n.\]

Therefore,

\[R_{\mathcal{D}_{n}}\geq\sum_{i=k+1}^{e^{\frac{\alpha}{\beta}}}i\cdot\frac{e^{\alpha}}{i^{\beta}}\geq cM_{\mathcal{D}_{n}},\]

for some small constant $c=c(\beta)>0$. Here we used that $M_{\mathcal{D}_{n}}\geq 2m-n\approx(\zeta(\beta-1)-\zeta(\beta))e^{\alpha}$.

Case $3\leq\beta<\beta_{0}$: Suppose now that $\beta>3$ (the case $\beta=3$ is similar to what follows). Let $\epsilon=\zeta(\beta-2)-2\zeta(\beta-1)$ and let $k$ be the smallest integer such that $\frac{1}{(\beta-3)k^{\beta-3}}<\frac{\epsilon}{2}$. Using an integral approximation of the sum, we obtain $\sum_{i=1}^{k}i^{2-\beta}\approx\zeta(\beta-2)-\frac{1}{(\beta-3)k^{\beta-3}}$. Thus

\[\sum_{i=1}^{k}i(i-2)e^{\alpha}i^{-\beta}\geq e^{\alpha}\left(\sum_{i=1}^{k}i^{2-\beta}-2\sum_{i=1}^{\infty}i^{1-\beta}\right)\geq\frac{\epsilon}{2\zeta(\beta)}\cdot n.\]

As in the previous case, we have $R_{\mathcal{D}_{n}}\geq cM_{\mathcal{D}_{n}}$ for some constant $c=c(\beta)>0$.

Since $\mathfrak{D}$ is well-behaved and lower-bounded for $\beta<\beta_{0}$, we can use Theorem 3 to conclude that there exists a constant $c_{1}>0$ such that the probability that $G$ has a component of order at least $c_{1}n$ is $1-o(1)$. ∎

References

  • [1] W. Aiello, F. Chung, and L. Lu, A random graph model for massive graphs, Proceedings of the thirty-second annual ACM Symposium on Theory of Computing (STOC), ACM, 2000, pp. 171–180.
  • [2] R. Albert and A.-L. Barabási, Statistical mechanics of complex networks, Reviews of Modern Physics 74 (2002), 47.
  • [3] A.-L. Barabási and R. Albert, Emergence of scaling in random networks, Science 286 (1999), no. 5439, 509–512.
  • [4] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang, Complex networks: Structure and dynamics, Physics reports 424 (2006), 175–308.
  • [5] B. Bollobás and O. Riordan, An old approach to the giant component problem, J. Combin. Theory (Series B) 113 (2015), 236–260.
  • [6] V. Bourassa and F. Holt, SWAN: Small-world wide area networks, Proceeding of International Conference on Advances in Infrastructures (SSGRR 2003w), L’Aquila, Italy, 2003.
  • [7] C. Cooper, The cores of random hypergraphs with a given degree sequence, Random Structures Algorithms 25 (2004), 353–375.
  • [8] C. Cooper, M. Dyer, and C. Greenhill, Sampling regular graphs and a peer-to-peer network, Comb., Probab. Comput. 16 (2007), 557–593.
  • [9] M. Faloutsos, P. Faloutsos, and C. Faloutsos, On power-law relationships of the internet topology, ACM SIGCOMM computer communication review, vol. 29, ACM, 1999, pp. 251–262.
  • [10] D. Fernholz and V. Ramachandran, Cores and connectivity in sparse random graphs, The University of Texas at Austin, Department of Computer Sciences, technical report TR-04-13 (2004).
  • [11] D. Fernholz and V. Ramachandran, The diameter of sparse random graphs, Random Structures Algorithms 31 (2007), 482–516.
  • [12] P. Flajolet and R. Sedgewick, Analytic combinatorics, Cambridge University Press, 2009.
  • [13] N. Fountoulakis, Percolation on sparse random graphs with given degree sequence, Internet Mathematics 4 (2007), 329–356.
  • [14] N. Fountoulakis and B. Reed, A general critical condition for the emergence of a giant component in random graphs with given degrees, Electronic Notes in Discrete Mathematics 34 (2009), 639–645.
  • [15] C. Greenhill, The switch Markov chain for sampling irregular graphs, Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, 2015, pp. 1564–1572.
  • [16] H. Hatami and M. Molloy, The scaling window for a random graph with a given degree sequence, Random Structures Algorithms 41 (2012), 99–123.
  • [17] S. Janson, On percolation in random graphs with given vertex degrees, Electron. J. Probab. 14 (2009), 87–118.
  • [18] S. Janson and M. J. Luczak, A new approach to the giant component problem, Random Structures Algorithms 34 (2009), 197–216.
  • [19] S. Janson and M.J. Luczak, A simple solution to the k-core problem, Random Structures Algorithms 30 (2007), 50–62.
  • [20] A. Joseph, The component sizes of a critical random graph with pre-described degree sequence, The Annals of Applied Probability 24 (2014), 2560–2594.
  • [21] M. Kang and T. G. Seierstad, The critical phase for random graphs with a given degree sequence, Comb., Probab. Comput. 17 (2008), 67–86.
  • [22] T. Luczak, Sparse random graphs with a given degree sequence, Proceedings of the Symposium on Random Graphs, Poznan, 1989, pp. 165–182.
  • [23] B. D. McKay, Subgraphs of random graphs with specified degrees, vol. 33, 1981, pp. 213–223.
  • [24] M. Molloy and B. Reed, A critical point for random graphs with a given degree sequence, Random Structures Algorithms 6 (1995), 161–180.
  • [25] M. Molloy and B. Reed, The size of the giant component of a random graph with a given degree sequence, Comb., Probab. Comput. 7 (1998), 295–305.
  • [26] M. Molloy and B. Reed, Graph colouring and the probabilistic method, vol. 23, Springer Science & Business Media, 2013.
  • [27] M. E. J. Newman, The structure and function of complex networks, SIAM review 45 (2003), 167–256.
  • [28] M. E. J. Newman, D.J. Watts, and S.H. Strogatz, Random graph models of social networks, Proceedings of the National Academy of Sciences 99 (2002), no. suppl 1, 2566–2572.
  • [29] J. Petersen, Die Theorie der regulären graphs, Acta Math. 15 (1891), 193–220.
  • [30] O. Riordan, The phase transition in the configuration model, Comb., Probab. Comput. 21 (2012), 265–299.
  • [31] H. van den Esker, R. van der Hofstad, G. Hooghiemstra, and D. Znamenski, Distances in random graphs with infinite mean degrees, Extremes 8 (2005), no. 3, 111–141.
  • [32] R. van der Hofstad, G. Hooghiemstra, and P. van Mieghem, Distances in random graphs with finite variance degrees, Random Structures Algorithms 27 (2005), 76–123.
  • [33] N. C. Wormald, Some problems in the enumeration of labelled graphs, Doctoral Thesis (1980).
  • [34] N. C. Wormald, Models of random regular graphs, Surveys in combinatorics, 1999 (Canterbury), London Math. Soc. Lecture Note Ser., vol. 267, Cambridge Univ. Press, Cambridge, 1999, pp. 239–298.