Uniform Limit Theory for Network Data
Abstract
I present a novel uniform law of large numbers (ULLN) for network-dependent data. While Kojevnikov, Marmer, and Song (KMS, 2021) provide a comprehensive suite of limit theorems and a robust variance estimator for network-dependent processes, their analysis focuses on pointwise convergence. Uniform convergence, however, is essential for nonlinear estimators such as M and GMM estimators (e.g., Newey and McFadden, 1994, Section 2). Building on KMS, I establish the ULLN under network dependence and demonstrate its utility by proving the consistency of both M and GMM estimators. A byproduct of this work is a new maximal inequality for network data, which may prove useful for future research beyond the scope of this paper.
Keywords: GMM estimation, M estimation, maximal inequality, network, uniform law of large numbers
JEL Codes: C12, C21, C31
1 Introduction
In recent years, asymptotic analysis of network-dependent data has garnered significant attention in econometrics (e.g., Kuersteiner, 2019; Leung and Moon, 2019; Kuersteiner and Prucha, 2020; Kojevnikov, Marmer, and Song, 2021).¹ Among these, Kojevnikov, Marmer, and Song (KMS, 2021) establish limit theorems and develop a robust variance estimator for a general class of dependent processes that encompass dependency-graph models in particular. Their framework, grounded in a conditional ψ-dependence concept adopted from Doukhan and Louhichi (1999), offers powerful tools for handling network data and has spurred further research in related fields. Furthermore, the theory and methods introduced by KMS have been widely applied in economic and econometric studies of network models.

¹ This set of references excludes another important branch of the literature on network asymptotics—namely, the literature on dyadic-like networks—because its strong dependence structure differs significantly from the focus of this paper, both in terms of network structure and limit theory.
Many applications, particularly those involving nonlinear models like limited dependent variable models, demand uniform convergence results. In the context of general classes of M estimators (including maximum likelihood estimators) and generalized method of moments (GMM) estimators, a uniform law of large numbers (ULLN) is crucial for ensuring that the empirical criterion function converges uniformly to its population counterpart. This uniform convergence is fundamental for establishing the consistency—and subsequently the asymptotic normality—of these estimators, as detailed in standard references such as the handbook chapter by Newey and McFadden (1994, Section 2).
Although KMS offer elegant pointwise limit theorems under network dependence, their results do not directly yield the uniform law of large numbers (ULLN) required for nonlinear estimation. Achieving uniform convergence necessitates controlling not only the individual moments of network-dependent observations but also the fluctuations of the entire process uniformly across the parameter space. This task is further complicated by the intricate dependence structure inherent in network data.
The main contribution of this paper is to bridge this gap by establishing a novel ULLN under network dependence. My results build on the KMS framework, which utilizes model restrictions based on conditional ψ-dependence, decay rates of network dependence, and moment bounds – these concepts will be concisely reviewed in Section 2.1. In addition, I impose further regularity conditions—such as uniform boundedness, uniform Lipschitz continuity, uniform integrability, and modified network decay conditions—that enable the extension from pointwise to uniform convergence.
These developments lay the groundwork for establishing consistency—and subsequently asymptotic normality—in nonlinear model estimation, including M and GMM estimators, under network dependence. I illustrate the utility of the novel ULLN by proving the consistency of these estimators. Specifically, the ULLN established herein permits one to follow the standard arguments detailed in Newey and McFadden (1994, Section 2).
Finally, although the primary contribution of this paper is the novel ULLN, a key by-product of the analysis is the development of a new maximal inequality for conditionally ψ-dependent processes in network data. Recognizing its potential utility for future research beyond the scope of this work, I present this result independently in Appendix B, separate from the assumptions introduced in the main text.
The remainder of the paper is organized as follows. In Section 2, I introduce the setup. Section 3 presents the main theoretical results on the ULLN, while Section 4 illustrates applications of the ULLN to M and GMM estimators. Section 5 concludes. Detailed mathematical derivations are provided in the appendices: Appendix A contains the proofs of the main results, and Appendix B presents an auxiliary lemma (the maximal inequality) and its proof.
2 The Setup
This section introduces the econometric framework.
First, I introduce some basic notation. Let $\mathbb{N}$ denote the set of positive integers. For any function $f:\mathbb{R}^{v\times a}\to\mathbb{R}$, define
$$\|f\|_{\infty} = \sup_{x}|f(x)| \quad\text{and}\quad \mathrm{Lip}(f) = \sup_{x\neq y}\frac{|f(x)-f(y)|}{d(x,y)},$$
where $d$ is a metric on $\mathbb{R}^{v\times a}$. With these definitions, we introduce the class of uniformly bounded Lipschitz functions:
$$\mathcal{L}_{v,a} = \left\{f:\mathbb{R}^{v\times a}\to\mathbb{R} \;\middle|\; \|f\|_{\infty}<\infty \text{ and } \mathrm{Lip}(f)<\infty\right\}.$$
2.1 Conditionally ψ-Dependent Processes
This subsection provides a concise overview of the baseline model introduced in Kojevnikov, Marmer, and Song (KMS, 2021) and the notational conventions used in the KMS framework; for a more detailed exposition, please refer to the original paper by KMS.
For each $n\in\mathbb{N}$, let $N_n=\{1,\dots,n\}$ denote the set of indices corresponding to the nodes in the network with the adjacency matrix $A_n$, whose elements $A_{n,ij}\in\{0,1\}$ satisfy $A_{n,ii}=0$. A link between nodes $i$ and $j$ exists if and only if the $(i,j)$-th entry of $A_n$ equals one. For each $n\in\mathbb{N}$, let $\mathcal{C}_n$ be the $\sigma$-algebra with respect to which the adjacency matrix $A_n$ is measurable. Let $d_n(i,j)$ denote the network distance between nodes $i$ and $j$ in $A_n$, defined as the length of the shortest path connecting $i$ and $j$ in $A_n$.
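As a computational aside, the network distance $d_n(i,j)$ is simply the shortest-path length in an unweighted graph, which a breadth-first search recovers. The following minimal Python sketch (the function name and the toy adjacency matrix are my illustrations, not part of the formal setup) makes the definition concrete.

```python
from collections import deque
import numpy as np

def network_distance(A):
    """All-pairs shortest-path lengths in an unweighted graph.

    A is an n-by-n 0/1 adjacency matrix; returns an n-by-n matrix D with
    D[i, j] = d_n(i, j), using np.inf for disconnected pairs.
    """
    n = A.shape[0]
    D = np.full((n, n), np.inf)
    for i in range(n):
        D[i, i] = 0
        queue = deque([i])
        while queue:
            j = queue.popleft()
            for k in np.flatnonzero(A[j]):
                if D[i, k] == np.inf:  # k not yet visited from i
                    D[i, k] = D[i, j] + 1
                    queue.append(k)
    return D

# Example: a path network 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(network_distance(A)[0, 3])  # 3.0
```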
For $a,b\in\mathbb{N}$ and a positive real number $s$, define
$$\mathcal{P}_n(a,b;s) = \left\{(A,B)\;\middle|\; A,B\subset N_n,\ |A|=a,\ |B|=b,\ d_n(A,B)\ge s\right\},$$
where
$$d_n(A,B) = \min_{i\in A}\min_{j\in B} d_n(i,j).$$
Thus, each element of $\mathcal{P}_n(a,b;s)$ is a pair of node sets of sizes $a$ and $b$ with a distance of at least $s$ between them.
Consider a triangular array $\{Y_{n,i}\}_{i\in N_n}$, $n\in\mathbb{N}$, of random vectors in $\mathbb{R}^{v}$. The following definition introduces the notion of conditional $\psi$-dependence as provided in KMS.
Definition 1 (Conditional $\psi$-Dependence; KMS, Definition 2.2).
A triangular array $\{Y_{n,i}\}_{i\in N_n}$ is conditionally $\psi$-dependent given $\mathcal{C}=\{\mathcal{C}_n\}_{n\in\mathbb{N}}$ if for each $n\in\mathbb{N}$, there exists a $\mathcal{C}_n$-measurable sequence
$$\theta_n = \{\theta_{n,s}\}_{s\ge 0}, \qquad \theta_{n,0}=1,$$
and a collection of nonrandom functions
$$\psi_{a,b}:\mathcal{L}_{v,a}\times\mathcal{L}_{v,b}\to[0,\infty), \qquad a,b\in\mathbb{N},$$
such that for all positive integers $a$ and $b$, for every pair $(A,B)\in\mathcal{P}_n(a,b;s)$ with $s>0$, and for all functions $f\in\mathcal{L}_{v,a}$ and $g\in\mathcal{L}_{v,b}$, the following inequality holds almost surely:
$$\bigl|\mathrm{Cov}\bigl(f(Y_{n,A}),\,g(Y_{n,B})\mid\mathcal{C}_n\bigr)\bigr| \le \psi_{a,b}(f,g)\,\theta_{n,s},$$
where $Y_{n,A}=(Y_{n,i})_{i\in A}$.
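As a worked special case (my illustration, not taken from KMS), consider a dependency graph in which $Y_{n,A}$ and $Y_{n,B}$ are independent conditional on $\mathcal{C}_n$ whenever no node of $A$ is linked to a node of $B$, that is, whenever $d_n(A,B)\ge 2$. Definition 1 then holds with
$$\theta_{n,s} = \begin{cases} 1 & \text{if } s\le 1,\\ 0 & \text{if } s\ge 2,\end{cases} \qquad \psi_{a,b}(f,g) = 4\,\|f\|_{\infty}\,\|g\|_{\infty},$$
since conditional independence forces the covariance to vanish for $s\ge 2$, while the crude bound $|\mathrm{Cov}(f(Y_{n,A}),g(Y_{n,B})\mid\mathcal{C}_n)|\le 4\|f\|_{\infty}\|g\|_{\infty}$ covers $s\le 1$.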
As emphasized in KMS, it is important to note that the decay coefficients $\{\theta_{n,s}\}$ are generally random, allowing one to accommodate the “common shocks” present in the network. I now present the following three key assumptions from KMS, which will be employed throughout the present paper.
Assumption 1 (KMS, Assumption 2.1 (a)).
The triangular array $\{Y_{n,i}\}_{i\in N_n}$ is conditionally $\psi$–dependent given $\mathcal{C}=\{\mathcal{C}_n\}_{n\in\mathbb{N}}$ with dependence coefficients $\{\theta_{n,s}\}$, and there exists a constant $C_\psi>0$ such that for all $a,b\in\mathbb{N}$, $f\in\mathcal{L}_{v,a}$, and $g\in\mathcal{L}_{v,b}$,
$$\psi_{a,b}(f,g) \le C_\psi\, ab\,\bigl(\|f\|_{\infty}+\mathrm{Lip}(f)\bigr)\bigl(\|g\|_{\infty}+\mathrm{Lip}(g)\bigr).$$
For $p\ge 1$, let $\|\cdot\|_{\mathcal{C}_n,p}$ denote the conditional $L_p$ norm defined by
$$\|Y\|_{\mathcal{C}_n,p} = \bigl(E[|Y|^{p}\mid\mathcal{C}_n]\bigr)^{1/p}.$$
With this notation, the following assumption imposes a moment condition.
Assumption 2 (KMS, Assumption 3.1).
There exists $p>1$ such that
$$\sup_{n\in\mathbb{N}}\max_{i\in N_n}\|Y_{n,i}\|_{\mathcal{C}_n,p} < \infty \quad\text{a.s.}$$
For each node $i\in N_n$ in the network, for each $n\in\mathbb{N}$ and $s\ge 0$, define the $s$–shell
$$N_n^{\partial}(i;s) = \{j\in N_n : d_n(i,j)=s\}.$$
Then, define the average shell size
$$\delta_n^{\partial}(s) = \frac{1}{n}\sum_{i\in N_n}\bigl|N_n^{\partial}(i;s)\bigr|.$$
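To make the bookkeeping concrete, here is a small Python sketch (my illustration; the toy distance matrix corresponds to the path network used earlier) that computes the average shell size from a matrix of pairwise network distances.

```python
import numpy as np

def avg_shell_size(D, s):
    """delta_n(s): average number of nodes at network distance exactly s,
    given the matrix D of pairwise network distances."""
    return float(np.mean(np.sum(D == s, axis=1)))

# Distances for the path network 0 - 1 - 2 - 3 (from the earlier sketch)
D = np.array([[0, 1, 2, 3],
              [1, 0, 1, 2],
              [2, 1, 0, 1],
              [3, 2, 1, 0]])
print([avg_shell_size(D, s) for s in (1, 2, 3)])  # [1.5, 1.0, 0.5]
```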
With this notation, the following assumption restricts the denseness of the network and the decay rate of dependence with the network distance.
Assumption 3 (KMS, Assumption 3.2).
The combined effect of network denseness and the decay of dependence is controlled so that
$$\frac{1}{n}\sum_{s\ge 1}\delta_n^{\partial}(s)\,\theta_{n,s}^{1-1/p} \to 0 \quad\text{a.s.},$$
where $p>1$ is the constant from Assumption 2.
I refer readers to the original paper by KMS for detailed discussions of these assumptions, as they are excerpted from KMS. Under these assumptions, along with an additional regularity condition, KMS establish the pointwise law of large numbers – see Proposition 3.1 in their paper.
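For intuition, a sufficient condition is easy to verify in a concrete design (my example, stated under the form of Assumption 3 displayed above): suppose the maximum degree of $A_n$ is bounded by $\Delta\ge 2$ uniformly in $n$, and $\theta_{n,s}\le\rho^{s}$ for some nonrandom $\rho\in(0,1)$. Counting along a breadth-first tree gives $\delta_n^{\partial}(s)\le\Delta(\Delta-1)^{s-1}$ for $s\ge 1$, so
$$\frac{1}{n}\sum_{s\ge 1}\delta_n^{\partial}(s)\,\theta_{n,s}^{1-1/p} \le \frac{1}{n}\sum_{s\ge 1}\Delta(\Delta-1)^{s-1}\rho^{s(1-1/p)} = O\!\left(\frac{1}{n}\right),$$
provided $(\Delta-1)\rho^{1-1/p}<1$, in which case Assumption 3 holds.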
2.2 Function Classes
This subsection introduces a parameter-indexed class of functions and imposes additional restrictions to establish the uniform law of large numbers.
Let $\Theta\subset\mathbb{R}^{d_\theta}$ denote a parameter space. For each $\theta\in\Theta$, let
$$f(\cdot,\theta):\mathbb{R}^{v}\to\mathbb{R}$$
be a measurable function.
In addition to the three assumptions inherited from KMS, I impose the following conditions on the parameter space $\Theta$ and the function class $\mathcal{F}=\{f(\cdot,\theta):\theta\in\Theta\}$.
Assumption 4 (Compactness).
The parameter space $\Theta$ is compact.
Assumption 5 (Function Class).
For each fixed $\theta\in\Theta$, the function $f(\cdot,\theta)$ belongs to $\mathcal{L}_{v,1}$.
Assumption 5 imposes the uniform bound and Lipschitz conditions on each function in the class.
For each fixed $\theta\in\Theta$, the pointwise law of large numbers (LLN), as stated in Proposition 3.1 of KMS, holds under Assumptions 1, 2, 3, and 5. I will leverage this pointwise result by KMS as an auxiliary step in establishing the uniform law of large numbers.
Assumption 6 (Uniform Equicontinuity).
The function class $\mathcal{F}$ is uniformly equicontinuous in $\theta$. In particular, there exists a constant $C_L>0$ such that for all $y$ in the support of $Y_{n,i}$, for all $n\in\mathbb{N}$ and $i\in N_n$, and for all $\theta,\theta'\in\Theta$,
$$|f(y,\theta)-f(y,\theta')| \le C_L\,\|\theta-\theta'\|.$$
3 The Main Results: Uniform Laws of Large Numbers
I now state the first main result of this paper—the uniform law of large numbers for network-dependent data.

Theorem 1 (Uniform Law of Large Numbers for the Conditional Mean).
Suppose that Assumptions 1, 2, 3, 4, 5, and 6 hold. Then
$$\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_n}\Bigl(f(Y_{n,i},\theta)-E\bigl[f(Y_{n,i},\theta)\mid\mathcal{C}_n\bigr]\Bigr)\right| \to 0 \quad\text{a.s.}$$

See Appendix A.1 for a proof.
Note that Theorem 1 establishes uniform almost sure convergence to the conditional mean. In order to further deduce uniform almost sure convergence to the unconditional mean, averaging over all the common shocks, I strengthen Assumptions 3 and 5 into Assumptions 3′ and 5′, as stated below.
Assumption 3′.
Assumption 3 holds and, in addition, there exist a constant $\kappa\in(0,\infty)$ and an integer $m\ge 1$ such that $\theta_{n,s}\le\kappa\,s^{-m}$ for all $s\ge 1$ and all $n\in\mathbb{N}$ a.s.
Assumption 5′.
There exist constants $\bar{B},\bar{L}\in(0,\infty)$ such that $f(\cdot,\theta)\in\mathcal{L}_{v,1}$ with $\|f(\cdot,\theta)\|_{\infty}\le\bar{B}$ and $\mathrm{Lip}(f(\cdot,\theta))\le\bar{L}$ for all $\theta\in\Theta$.
These modifications to the baseline assumptions facilitate the application of a maximal inequality for conditionally $\psi$-dependent processes; see Appendix B for further details. In particular, it is advantageous to impose the bounds $\|f(\cdot,\theta)\|_{\infty}\le\bar{B}$ and $\mathrm{Lip}(f(\cdot,\theta))\le\bar{L}$ uniformly for all functions in the class $\mathcal{F}$. This uniformity ensures that a universal constant in the maximal inequality is applicable across the entire function class. This point is subtle but constitutes an important aspect of these assumptions.
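To see why the uniformity matters (my illustration, combining Assumption 1 with Assumption 5′), note that for any $\theta,\theta'\in\Theta$ the dependence functional is bounded by a constant free of $\theta$ and $\theta'$:
$$\psi_{1,1}\bigl(f(\cdot,\theta),f(\cdot,\theta')\bigr) \le C_\psi\bigl(\|f(\cdot,\theta)\|_{\infty}+\mathrm{Lip}(f(\cdot,\theta))\bigr)\bigl(\|f(\cdot,\theta')\|_{\infty}+\mathrm{Lip}(f(\cdot,\theta'))\bigr) \le C_\psi(\bar{B}+\bar{L})^{2}.$$
Without the uniform bounds of Assumption 5′, this constant could drift with $\theta$, and the maximal inequality would not deliver a bound valid simultaneously over the whole class.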
In addition to Assumptions 1, 2, 3′, 4, 5′, and 6, I also impose the following network sparsity condition.
Assumption 7 (Network Sparsity).
There exist finite non-zero constants $C_1$, $C_2$, $c_1$, and $c_2$ with $c_2\ge c_1>0$ such that one can find a partition $\{H_{n,j}\}_{j=1}^{J_n}$ of $N_n$ into $J_n$ equally-sized blocks of size $h_n=\lceil C_1 n^{c_1}\rceil$ satisfying
$$\min\bigl\{d_n(i,i') : i,i'\in H_{n,j},\ i\ne i'\bigr\} \ge C_2\,n^{c_2} \quad\text{for each } j\in\{1,\dots,J_n\}$$
for all sufficiently large $n$.
The purpose of this assumption is to facilitate the blocking strategy employed in deriving the maximal inequality – see Appendix B. In the context of time-series data, the network structure is ‘linear’ and inherently sparse, which naturally permits a blocking scheme with alternating blocks for establishing a maximal inequality. Similarly, Assumption 7 postulates a blocking strategy for network data that ensures sufficient separation of nodes within each block $H_{n,j}$, a condition that is feasible under sparse networks. In social networks, it is plausible that for any individual there exists a set of other individuals arbitrarily far away in the network. In contrast, this assumption rules out densely connected networks, such as the global trade network where nearly every country is linked to most others.
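The time-series analogy can be made concrete: for a path network (the ‘linear’ structure mentioned above), interleaving nodes into blocks separates within-block nodes, exactly as alternating blocks do for time series. The following Python sketch (the function and the toy example are mine) verifies within-block separation for a given partition.

```python
import numpy as np

def min_within_block_distance(D, blocks):
    """Smallest network distance between two distinct nodes of the same
    block; D is the pairwise distance matrix, blocks a list of index lists."""
    best = np.inf
    for block in blocks:
        for a, i in enumerate(block):
            for j in block[a + 1:]:
                best = min(best, D[i, j])
    return best

# Path network 0 - 1 - 2 - 3: interleaved blocks {0, 2} and {1, 3}
D = np.array([[0, 1, 2, 3],
              [1, 0, 1, 2],
              [2, 1, 0, 1],
              [3, 2, 1, 0]])
print(min_within_block_distance(D, [[0, 2], [1, 3]]))  # 2
```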
Remark 1 (On the Value of $m$ and Restrictiveness of Assumptions 3′ and 7).
The value of $m$ in Assumption 3′ must be compatible with the exponents in Assumption 7 whenever both are invoked (for instance, the proof of Corollary 1 requires $mc_1-d_\theta>1$). Nonetheless, there is no fundamental trade-off between these assumptions, and one can choose $m$ to be an arbitrarily large positive integer, rendering the $m$-relevant parts of Assumptions 3′ and 7 effectively non-restrictive. Specifically, Assumption 3′ may be taken with
$$\theta_{n,s} \le \kappa\,s^{-(m+\epsilon)}$$
for some small $\epsilon>0$, and Assumption 7 with
$$c_1 = c_2 = \epsilon'$$
for small $\epsilon'>0$. While there is no explicit cost to choosing a large $m$ for the purpose of proving the uniform law of large numbers, there is an implicit cost in the background. For instance, if one makes Assumption 7 less restrictive by lowering $c_1$ and $c_2$ while keeping $m$ in Assumption 3′ fixed, then the rate of divergence in the bound of the maximal inequality increases. In particular,
$$J_n^{2m}h_n^{m}+h_n^{2m} = O\bigl(n^{2m-mc_1}\bigr),$$
where $J_n=n/h_n$ and $h_n\asymp n^{c_1}$ with $c_1\le 2/3$, so that the exponent $2m-mc_1$ grows as $c_1$ shrinks. Consequently, if the maximal inequality (Lemma 1 in Appendix B) is to be used for other applications in the readers’ future research, one should be aware of this implicit cost incurred by making the generalized counterparts of Assumptions 3′ and 7, namely Assumptions 11 (ii) and 12, less restrictive. See Appendix B for further details.
Now, I state the second main result of this paper – the uniform law of large numbers for the unconditional mean:
Corollary 1 (Uniform Law of Large Numbers for the Unconditional Mean).
Suppose that Assumptions 1, 2, 3′, 4, 5′, 6, and 7 hold. Then
$$\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_n}\Bigl(f(Y_{n,i},\theta)-E\bigl[f(Y_{n,i},\theta)\bigr]\Bigr)\right| \to 0 \quad\text{a.s.}$$

See Appendix A.2 for a proof.
4 Applications
Recall from the introductory section that the primary motivation for developing the uniform law of large numbers in this paper is to facilitate the consistency of M (MLE-type) and GMM estimators. In this section, I demonstrate how the uniform law of large numbers, specifically Corollary 1, can be applied to establish the consistency of M estimators (Section 4.1) and GMM estimators (Section 4.2).
4.1 Application I: M Estimation
Let $Q_0$ and $\hat{Q}_n$ be the population and sample criterion functions for M estimation, defined on $\Theta$ by
$$Q_0(\theta) = \lim_{n\to\infty}\frac{1}{n}\sum_{i\in N_n}E\bigl[f(Y_{n,i},\theta)\bigr] \quad\text{and}\quad \hat{Q}_n(\theta) = \frac{1}{n}\sum_{i\in N_n}f(Y_{n,i},\theta),$$
respectively, where the limit defining $Q_0$ is assumed to exist. The M estimator is defined as
$$\hat{\theta} = \operatorname*{arg\,max}_{\theta\in\Theta}\,\hat{Q}_n(\theta).$$
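To fix ideas, here is a minimal Python sketch of the computation (the criterion $f(y,\theta)=-(y-\theta)^2$, i.e., least-squares location, is my stand-in example; the theory above applies to any $f$ satisfying the stated assumptions).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def m_estimate(y, theta_lo, theta_hi):
    """Maximize Q_n(theta) = (1/n) sum_i f(y_i, theta) over a compact
    interval, with the illustrative criterion f(y, t) = -(y - t)**2."""
    neg_Qn = lambda t: np.mean((y - t) ** 2)  # minimize -Q_n
    res = minimize_scalar(neg_Qn, bounds=(theta_lo, theta_hi), method="bounded")
    return res.x

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=500)  # placeholder for network data
print(m_estimate(y, -10.0, 10.0))  # close to 1.0
```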
Suppose that the population criterion satisfies the following condition.
Assumption 8 (Identification for M Estimation).
There exists a unique $\theta_0\in\Theta$ such that
$$Q_0(\theta_0) > Q_0(\theta) \quad\text{for all }\theta\in\Theta\setminus\{\theta_0\}.$$
With this identification condition, the standard argument based on Newey and McFadden (1994, Theorem 2.1), for example, yields the consistency of $\hat{\theta}$ via the uniform law of large numbers (my Corollary 1).
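For the reader's convenience, recall the four conditions of Newey and McFadden (1994, Theorem 2.1), stated here in the present notation: if (i) $Q_0$ is uniquely maximized at $\theta_0$; (ii) $\Theta$ is compact; (iii) $Q_0$ is continuous; and (iv)
$$\sup_{\theta\in\Theta}\bigl|\hat{Q}_n(\theta)-Q_0(\theta)\bigr| \stackrel{p}{\to} 0,$$
then $\hat{\theta}\stackrel{p}{\to}\theta_0$. Conditions (i) and (ii) correspond to Assumptions 8 and 4, respectively, condition (iii) is a directly assumed continuity of $Q_0$, and Corollary 1 delivers condition (iv).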
Let me state this conclusion formally as a proposition.
Proposition 1 (Consistency of the M Estimator).
Suppose that Assumptions 1, 2, 3′, 4, 5′, 6, 7, and 8 hold and that $Q_0$ is continuous on $\Theta$. Then $\hat{\theta}\stackrel{p}{\to}\theta_0$.
Although this proposition immediately follows by applying my Corollary 1 and Newey and McFadden (1994, Theorem 2.1) as mentioned above, I provide a proof for completeness in Appendix A.3.
Note that the limit theorems of KMS alone could not produce this crucial result due to their focus on pointwise convergence. Once the consistency has been established, then the standard argument along the lines of Newey and McFadden (1994, Section 3), together with the limit distribution theory of KMS (their Theorem 3.2), yields the asymptotic normality under additional regularity conditions.
4.2 Application II: GMM Estimation
Let $g:\mathbb{R}^{v}\times\Theta\to\mathbb{R}^{d_g}$ denote the moment function such that the true parameter vector $\theta_0\in\Theta$ satisfies the moment equality
$$E\bigl[g(Y_{n,i},\theta_0)\bigr] = 0 \quad\text{for all } n\in\mathbb{N} \text{ and } i\in N_n.$$
Define the sample moment function by
$$\hat{g}_n(\theta) = \frac{1}{n}\sum_{i\in N_n}g(Y_{n,i},\theta).$$
For any sequence $\{\hat{W}_n\}$ of positive definite weighting matrices (which may depend on the data) converging in probability to a positive definite matrix $W$, the GMM estimator is defined as
$$\hat{\theta} = \operatorname*{arg\,min}_{\theta\in\Theta}\,\hat{g}_n(\theta)'\,\hat{W}_n\,\hat{g}_n(\theta).$$
We can define the population criterion by
$$Q_0(\theta) = -g_0(\theta)'\,W\,g_0(\theta), \qquad g_0(\theta) = \lim_{n\to\infty}\frac{1}{n}\sum_{i\in N_n}E\bigl[g(Y_{n,i},\theta)\bigr],$$
where the limit defining $g_0$ is assumed to exist.
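As with M estimation, a minimal Python sketch may help (my illustration: a just-identified mean restriction $g(y,\theta)=y-\theta$ with identity weighting; any $g$ satisfying the assumptions below would do).

```python
import numpy as np
from scipy.optimize import minimize

def gmm_estimate(y, theta_init, W=None):
    """Minimize g_n(theta)' W g_n(theta) with the illustrative moment
    g(y, t) = y - t, so that g_n(theta) = mean(y) - theta."""
    W = np.eye(1) if W is None else W
    def objective(theta):
        g_n = np.array([np.mean(y) - theta[0]])
        return g_n @ W @ g_n
    return minimize(objective, x0=[theta_init], method="Nelder-Mead").x

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, size=500)  # placeholder for network data
print(gmm_estimate(y, 0.0))  # close to 2.0
```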
Suppose that the population moment satisfies the following condition.
Assumption 9 (Identification for GMM Estimation).
There exists a unique $\theta_0\in\Theta$ such that
$$g_0(\theta) = 0 \quad\text{if and only if}\quad \theta=\theta_0.$$
With this identification condition, the standard argument based on Newey and McFadden (1994, Theorem 2.1), for example, yields the consistency of $\hat{\theta}$ via the uniform law of large numbers (my Corollary 1).
Let me state this conclusion formally as a proposition.
Proposition 2 (Consistency of the GMM Estimator).
Suppose that Assumptions 1, 2, 3′, 4, 5′, 6, and 7 hold for each element of $g$ in place of $f$, that Assumption 9 holds, and that $Q_0$ is continuous on $\Theta$. Then $\hat{\theta}\stackrel{p}{\to}\theta_0$.
Although this proposition immediately follows by applying my Corollary 1 and Newey and McFadden (1994, Theorem 2.1) as mentioned above, I provide a proof for completeness in Appendix A.4.
Note that the results of KMS alone could not produce this crucial result due to their focus on pointwise convergence. Once the consistency has been established, then the standard argument along the lines of Newey and McFadden (1994, Section 3), together with the limit distribution theory of KMS (their Theorem 3.2), yields the asymptotic normality under additional regularity conditions.
5 Summary
In this paper, I extend the limit theory for the framework of network-dependent data, introduced by Kojevnikov, Marmer, and Song (KMS, 2021), by developing a novel uniform law of large numbers (ULLN). While KMS provide a comprehensive set of limit theorems and a robust variance estimator for network-dependent processes, their pointwise results are insufficient for applications that require uniform convergence—such as establishing the consistency of M and GMM estimators, where the empirical criterion function must converge uniformly over a compact parameter space.
My main contribution fills this gap by setting forth conditions under which uniform convergence to both the conditional and unconditional means holds, thereby paving the way for consistent estimation of nonlinear models in the presence of network dependence. I illustrate the applications by using Corollary 1 to establish the consistency of both M and GMM estimators.
Finally, as a byproduct of this paper, I establish a novel maximal inequality for conditionally ψ-dependent processes, which may be of independent interest for future research beyond the scope of this study. To highlight its broader applicability, I present this result separately in Appendix B, along with an independent set of assumptions distinct from those introduced in the main text.
References
- Doukhan and Louhichi (1999) Doukhan, P. and S. Louhichi (1999). A new weak dependence condition and applications to moment inequalities. Stochastic Processes and Their Applications 84(2), 313–342.
- Kojevnikov et al. (2021) Kojevnikov, D., V. Marmer, and K. Song (2021). Limit theorems for network dependent random variables. Journal of Econometrics 222(2), 882–908.
- Kuersteiner (2019) Kuersteiner, G. M. (2019). Limit theorems for data with network structure. arXiv preprint arXiv:1908.02375.
- Kuersteiner and Prucha (2020) Kuersteiner, G. M. and I. R. Prucha (2020). Dynamic spatial panel models: Networks, common shocks, and sequential exogeneity. Econometrica 88(5), 2109–2146.
- Leung and Moon (2019) Leung, M. P. and H. R. Moon (2019). Normal approximation in large network models. arXiv preprint arXiv:1904.11060.
- Newey and McFadden (1994) Newey, W. and D. McFadden (1994). Large sample estimation and hypothesis testing. In R. Engle and D. McFadden (Eds.), Handbook of Econometrics, Volume 4, pp. 2111–2245. Elsevier.
Appendix
The appendix consists of two sections. Appendix A collects proofs of the theoretical results presented in the main text. Specifically, Appendices A.1, A.2, A.3, and A.4 present proofs of Theorem 1, Corollary 1, Proposition 1, and Proposition 2, respectively. Appendix B presents an auxiliary lemma (Lemma 1), the novel maximal inequality for network-dependent data, which is used for the proof of Corollary 1. Notation carries over from the main text.
Appendix A Proofs
A.1 Proof of Theorem 1
Proof.
By Assumption 4, for any $\epsilon>0$ there exists a finite $\epsilon$–net $T_\epsilon=\{\theta_1,\dots,\theta_{K(\epsilon)}\}\subset\Theta$ such that for every $\theta\in\Theta$, there exists some $\theta_k\in T_\epsilon$ with $\|\theta-\theta_k\|\le\epsilon$.
For an arbitrary $\theta\in\Theta$, let $\theta_k$ be a net point with $\|\theta-\theta_k\|\le\epsilon$. Decompose
$$\frac{1}{n}\sum_{i\in N_n}\Bigl(f(Y_{n,i},\theta)-E[f(Y_{n,i},\theta)\mid\mathcal{C}_n]\Bigr) = A_n(\theta)+B_n(\theta_k)+C_n(\theta),$$
where the three components on the right-hand side are:
$$A_n(\theta)=\frac{1}{n}\sum_{i\in N_n}\bigl(f(Y_{n,i},\theta)-f(Y_{n,i},\theta_k)\bigr),$$
$$B_n(\theta_k)=\frac{1}{n}\sum_{i\in N_n}\bigl(f(Y_{n,i},\theta_k)-E[f(Y_{n,i},\theta_k)\mid\mathcal{C}_n]\bigr),$$
$$C_n(\theta)=\frac{1}{n}\sum_{i\in N_n}\Bigl(E[f(Y_{n,i},\theta_k)\mid\mathcal{C}_n]-E[f(Y_{n,i},\theta)\mid\mathcal{C}_n]\Bigr).$$
Applying Proposition 3.1 of Kojevnikov, Marmer, and Song (2021) under my Assumptions 1, 2, 3, and 5 for each fixed $\theta_k\in T_\epsilon$ gives
$$B_n(\theta_k)\to 0 \quad\text{a.s.}$$
Since the $\epsilon$–net $T_\epsilon$ is finite, it follows that
$$\max_{1\le k\le K(\epsilon)}\bigl|B_n(\theta_k)\bigr| \to 0 \quad\text{a.s.} \tag{1}$$
For any $\theta\in\Theta$, Assumption 6 yields $|A_n(\theta)|\le C_L\epsilon$ and $|C_n(\theta)|\le C_L\epsilon$,
and thus
$$\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_n}\Bigl(f(Y_{n,i},\theta)-E[f(Y_{n,i},\theta)\mid\mathcal{C}_n]\Bigr)\right| \le 2C_L\epsilon + \max_{1\le k\le K(\epsilon)}\bigl|B_n(\theta_k)\bigr|.$$
Since for every fixed $\epsilon>0$ the maximum over the $\epsilon$-net converges to 0 almost surely by (1), it follows that
$$\limsup_{n\to\infty}\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_n}\Bigl(f(Y_{n,i},\theta)-E[f(Y_{n,i},\theta)\mid\mathcal{C}_n]\Bigr)\right| \le 2C_L\epsilon \quad\text{a.s.}$$
Because $\epsilon>0$ is arbitrary, this shows that
$$\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_n}\Bigl(f(Y_{n,i},\theta)-E[f(Y_{n,i},\theta)\mid\mathcal{C}_n]\Bigr)\right| \to 0 \quad\text{a.s.},$$
as claimed in the statement of the theorem. ∎
A.2 Proof of Corollary 1
Proof.
For any fixed $\theta\in\Theta$,
$$\frac{1}{n}\sum_{i\in N_n}\Bigl(f(Y_{n,i},\theta)-E[f(Y_{n,i},\theta)]\Bigr) = \frac{1}{n}\sum_{i\in N_n}\Bigl(f(Y_{n,i},\theta)-E[f(Y_{n,i},\theta)\mid\mathcal{C}_n]\Bigr) + \frac{1}{n}\sum_{i\in N_n}\Bigl(E[f(Y_{n,i},\theta)\mid\mathcal{C}_n]-E[f(Y_{n,i},\theta)]\Bigr).$$
The triangle inequality yields
$$\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_n}\Bigl(f(Y_{n,i},\theta)-E[f(Y_{n,i},\theta)]\Bigr)\right| \le \sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_n}\Bigl(f(Y_{n,i},\theta)-E[f(Y_{n,i},\theta)\mid\mathcal{C}_n]\Bigr)\right| + \sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_n}\Bigl(E[f(Y_{n,i},\theta)\mid\mathcal{C}_n]-E[f(Y_{n,i},\theta)]\Bigr)\right|.$$
By Theorem 1, the first term on the right-hand side converges to 0 almost surely under Assumptions 1, 2, 3′, 4, 5′, and 6.
It remains to show that
$$\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_n}\Bigl(E[f(Y_{n,i},\theta)\mid\mathcal{C}_n]-E[f(Y_{n,i},\theta)]\Bigr)\right| \to 0 \quad\text{a.s.} \tag{2}$$
(Note that identical distribution is not assumed.) For each $n\in\mathbb{N}$, $i\in N_n$, and $\theta\in\Theta$, the law of iterated expectations gives
$$E\bigl[E[f(Y_{n,i},\theta)\mid\mathcal{C}_n]\bigr] = E\bigl[f(Y_{n,i},\theta)\bigr].$$
Thus, the difference
$$Z_{n,i}(\theta) := E[f(Y_{n,i},\theta)\mid\mathcal{C}_n]-E[f(Y_{n,i},\theta)]$$
has mean zero. For notational convenience, write the sum as
$$S_n(\theta) := \sum_{i\in N_n}Z_{n,i}(\theta).$$
By Assumption 4, for any $\epsilon>0$, there exists a finite $\epsilon$-net $T_\epsilon=\{\theta_1,\dots,\theta_{K(\epsilon)}\}\subset\Theta$ with $K(\epsilon)<\infty$ such that for every $\theta\in\Theta$, there exists some $\theta_k\in T_\epsilon$ with $\|\theta-\theta_k\|\le\epsilon$.
For any $\theta\in\Theta$, let $\theta_k$ be an element of the $\epsilon$-net satisfying $\|\theta-\theta_k\|\le\epsilon$. Then, we can decompose $S_n(\theta)$ as
$$S_n(\theta) = \bigl(S_n(\theta)-S_n(\theta_k)\bigr) + S_n(\theta_k).$$
The triangle inequality yields
$$\sup_{\theta\in\Theta}|S_n(\theta)| \le \sup_{\theta\in\Theta}\bigl|S_n(\theta)-S_n(\theta_k)\bigr| + \max_{1\le k\le K(\epsilon)}\bigl|S_n(\theta_k)\bigr|. \tag{3}$$
By Assumption 6, for each $i\in N_n$, we have
$$\bigl|Z_{n,i}(\theta)-Z_{n,i}(\theta_k)\bigr| \le 2C_L\|\theta-\theta_k\| \le 2C_L\epsilon. \tag{4}$$
Summing over $i\in N_n$, it follows that
$$\sup_{\theta\in\Theta}|S_n(\theta)| \le 2C_L\epsilon\,n + \max_{1\le k\le K(\epsilon)}\bigl|S_n(\theta_k)\bigr|. \tag{5}$$
Lemma 1 under Assumptions 1, 3′, 5′, and 7, in which $\bar{B}$, $\bar{L}$, $\kappa$, $m$, $C_1$, $C_2$, $c_1$, and $c_2$ are all constant across all $\theta\in\Theta$, provides a universal constant $\bar{C}\in(0,\infty)$ such that, for each $\theta_k$ in the $\epsilon$-net $T_\epsilon$,
$$E\Bigl[\bigl|S_n(\theta_k)\bigr|^{2m}\,\Big|\,\mathcal{C}_n\Bigr] \le \bar{C}\,M_n \quad\text{a.s.},$$
where $M_n = J_n^{2m}h_n^{m}+h_n^{2m}$ with $J_n$ and $h_n$ denoting the number and the size, respectively, of the blocks in Assumption 7.
Since the $\epsilon$-net $T_\epsilon$ is finite with $K(\epsilon)$ points,
$$E\Bigl[\max_{1\le k\le K(\epsilon)}\bigl|S_n(\theta_k)\bigr|^{2m}\,\Big|\,\mathcal{C}_n\Bigr] \le K(\epsilon)\,\bar{C}\,M_n \quad\text{a.s.} \tag{6}$$
Combine (5) and (6) to bound the $2m$-th moment of the supremum:
$$E\Bigl[\sup_{\theta\in\Theta}|S_n(\theta)|^{2m}\,\Big|\,\mathcal{C}_n\Bigr] \le 2^{2m-1}\Bigl((2C_L\epsilon\,n)^{2m} + K(\epsilon)\,\bar{C}\,M_n\Bigr) \quad\text{a.s.}$$
Recall that $\epsilon>0$ is arbitrary, and choose $\epsilon=\epsilon_n=n^{-1}$ to get
$$E\Bigl[\sup_{\theta\in\Theta}|S_n(\theta)|^{2m}\,\Big|\,\mathcal{C}_n\Bigr] \le 2^{2m-1}\Bigl((2C_L)^{2m} + K(\epsilon_n)\,\bar{C}\,M_n\Bigr) \quad\text{a.s.}$$
The law of iterated expectations yields
$$E\Bigl[\sup_{\theta\in\Theta}|S_n(\theta)|^{2m}\Bigr] \le 2^{2m-1}\Bigl((2C_L)^{2m} + K(\epsilon_n)\,\bar{C}\,M_n\Bigr).$$
Divide both sides by $n^{2m}$ to obtain
$$E\left[\sup_{\theta\in\Theta}\Bigl|\frac{1}{n}S_n(\theta)\Bigr|^{2m}\right] \le 2^{2m-1}\,n^{-2m}\Bigl((2C_L)^{2m} + K(\epsilon_n)\,\bar{C}\,M_n\Bigr) =: b_n.$$
Now, let $\eta>0$ be arbitrary. By Markov’s inequality,
$$P\left(\sup_{\theta\in\Theta}\Bigl|\frac{1}{n}S_n(\theta)\Bigr| > \eta\right) \le \frac{b_n}{\eta^{2m}}. \tag{7}$$
Since
$$b_n = O\bigl(n^{-2m}\bigr) + O\bigl(n^{d_\theta-mc_1}\bigr) + O\bigl(n^{d_\theta-2m(1-c_1)}\bigr)$$
by the definitions of $K(\epsilon_n)$ and $M_n$ (note $K(\epsilon_n)=O(\epsilon_n^{-d_\theta})=O(n^{d_\theta})$ for the compact set $\Theta\subset\mathbb{R}^{d_\theta}$), it follows that
$$\sum_{n=1}^{\infty}b_n < \infty$$
for $m$ chosen large enough that $mc_1-d_\theta>1$ and $2m(1-c_1)-d_\theta>1$ (cf. Remark 1), and, therefore, the series
$$\sum_{n=1}^{\infty}P\left(\sup_{\theta\in\Theta}\Bigl|\frac{1}{n}S_n(\theta)\Bigr| > \eta\right)$$
(obtained from taking the sum on the right-hand side of (7) over $n$) converges. Hence, by the Borel–Cantelli lemma, for every $\eta>0$,
$$P\left(\sup_{\theta\in\Theta}\Bigl|\frac{1}{n}S_n(\theta)\Bigr| > \eta \ \text{infinitely often}\right) = 0.$$
Because $\eta>0$ was arbitrary, it follows that
$$\sup_{\theta\in\Theta}\Bigl|\frac{1}{n}S_n(\theta)\Bigr| \to 0 \quad\text{a.s.},$$
showing that (2) holds. ∎
A.3 Proof of Proposition 1
Proof.
I am going to check the four conditions of Newey and McFadden (NM, 1994, Theorem 2.1). Assumption 8 implies condition (i) in NM. Assumption 4 implies condition (ii) in NM. Condition (iii) in NM is directly assumed in the statement of the proposition. Corollary 1 under Assumptions 1, 2, 3′, 4, 5′, 6, and 7 implies condition (iv) in NM. Therefore, the claim of the proposition follows by Theorem 2.1 of NM. ∎
A.4 Proof of Proposition 2
Proof.
I am going to check the four conditions of Newey and McFadden (NM, 1994, Theorem 2.1). Assumption 9 and the positive definiteness of $W$ imply condition (i) in NM. Assumption 4 implies condition (ii) in NM. Condition (iii) in NM is directly assumed in the statement of the proposition. Since $\hat{W}_n\stackrel{p}{\to}W$ where $W$ is positive definite, Corollary 1 under Assumptions 1, 2, 3′, 4, 5′, 6, and 7 implies condition (iv) in NM. Therefore, the claim of the proposition follows by Theorem 2.1 of NM. ∎
Appendix B An Auxiliary Lemma: A Maximal Inequality for Conditionally ψ-Dependent-Type Processes
This section presents a novel maximal inequality for conditionally ψ-dependent processes, which is used in the proof of Corollary 1. Since this inequality itself is potentially useful for future research on network analysis outside of the scope of the current paper, I state a self-contained set of assumptions here separately from those in the main text for general applicability.
Let $\{Y_{n,i}\}_{i\in N_n}$, $n\in\mathbb{N}$, be a triangular array of random variables, and consider the following set of assumptions with the basic notations inherited from Section 2 in the main text.
Assumption 10 (Zero Mean and Boundedness).
There exists $\bar{B}\in(0,\infty)$ such that
$$E[Y_{n,i}\mid\mathcal{C}_n] = 0$$
and $|Y_{n,i}|\le\bar{B}$ for all $n\in\mathbb{N}$ and $i\in N_n$ a.s.
Assumption 11 (Decay Rate).
(i) There exist $\kappa_0\in(0,\infty)$ and $\mathcal{C}_n$-measurable coefficients $\{\theta_{n,s}\}_{s\ge 0}$ such that
$$\bigl|\mathrm{Cov}\bigl(f(Y_{n,A}),\,g(Y_{n,B})\mid\mathcal{C}_n\bigr)\bigr| \le \kappa_0\,\|f\|_{\infty}\,\|g\|_{\infty}\,\theta_{n,s}$$
for all $(A,B)\in\mathcal{P}_n(a,b;s)$ with $a,b\in\mathbb{N}$ and $s>0$, and all bounded measurable $f$ and $g$, for all $n\in\mathbb{N}$ a.s.
(ii) There exist $\kappa\in(0,\infty)$ and an integer $m\ge 1$ such that
$$\theta_{n,s} \le \kappa\,s^{-m}$$
for all $s\ge 1$ and $n\in\mathbb{N}$ a.s.
Assumption 12 (Network Sparsity).
There exist constants $C_1$, $C_2$, $C_3$, $c_1$, and $c_2$ with $c_2\ge c_1>0$ such that one can find a partition $\{H_{n,j}\}_{j=1}^{J_n}$ of $N_n$ into $J_n$ equally-sized blocks, with each size denoted by $h_n$, satisfying $C_1 n^{c_1}\le h_n\le C_2 n^{c_1}$ and
$$\min\bigl\{d_n(i,i') : i,i'\in H_{n,j},\ i\ne i',\ j=1,\dots,J_n\bigr\} \ge C_3\,n^{c_2}$$
for all $n$ that are sufficiently large.
The last assumption may be interpreted as a network sparsity condition, and Assumption 7 in the main text is a special case of Assumption 12 with $h_n=\lceil C_1 n^{c_1}\rceil$ and $C_3=C_2$. As discussed below Assumption 7 in the main text, it facilitates the blocking strategy in the proof of the maximal inequality:
Lemma 1 (Maximal Inequality for Conditionally ψ-Dependent-Type Processes).
Suppose that Assumptions 10, 11, and 12 hold. Then there exists a constant $\bar{C}\in(0,\infty)$, depending only on $\bar{B}$, $\kappa_0$, $\kappa$, $m$, $C_1$, $C_2$, and $C_3$, such that
$$E\left[\max_{1\le r\le n}\left|\sum_{i=1}^{r}Y_{n,i}\right|^{2m}\,\middle|\,\mathcal{C}_n\right] \le \bar{C}\bigl(J_n^{2m}h_n^{m}+h_n^{2m}\bigr) \quad\text{a.s.}$$
for all sufficiently large $n$, where $J_n$ and $h_n$ denote the number and the size, respectively, of the blocks in Assumption 12.
Proof of Lemma 1.
The proof of the lemma consists of four steps. Throughout the proof, I will drop the qualification “a.s.” for ease of writing.
Step 1 (Blocking)
Under Assumption 12, for all sufficiently large $n$, take a partition $\{H_{n,j}\}_{j=1}^{J_n}$ of $N_n$ into $J_n$ equally-sized blocks of size $h_n$ (relabeling nodes if necessary so that the blocks consist of consecutive indices), and define the block sums
$$W_{n,j} = \sum_{i\in H_{n,j}}Y_{n,i}, \qquad j=1,\dots,J_n.$$
Step 2 (Bounding Block Sums)
In this step, I am now going to show that there exists a constant $C^{*}$ (depending on $\bar{B}$, $\kappa_0$, $\kappa$, $m$, and the constants of Assumption 12) such that for each block index $j$,
$$E\bigl[|W_{n,j}|^{2m}\mid\mathcal{C}_n\bigr] \le C^{*}\,h_n^{m}. \tag{8}$$
I begin with getting a bound for each term when the $2m$-th power on the left-hand side of (8) is expanded. Applying Hölder’s inequality iteratively yields
$$E\bigl[|Y_{n,i_1}\cdots Y_{n,i_{2m}}|\mid\mathcal{C}_n\bigr] \le \prod_{k=1}^{2m}\bigl\|Y_{n,i_k}\bigr\|_{\mathcal{C}_n,2m}.$$
Hence, for each $2m$-tuple $(i_1,\dots,i_{2m})\in H_{n,j}^{2m}$,
$$E\bigl[|Y_{n,i_1}\cdots Y_{n,i_{2m}}|\mid\mathcal{C}_n\bigr] \le \bar{B}^{2m} \tag{9}$$
is true under Assumption 10. Now, I branch into two types of such terms, labeled as (a) and (b) below.
(a) (Indices Are Fully Distinct)
First, consider a term indexed by a $2m$-tuple $(i_1,\dots,i_{2m})$ where all the indices are distinct. Since all the indices belong to the same block $H_{n,j}$, the singleton $\{i_1\}$ and the set $\{i_2,\dots,i_{2m}\}$ are at network distance of at least $C_3 n^{c_2}$ under Assumption 12, and $E[Y_{n,i_1}\mid\mathcal{C}_n]=0$ implies that the term equals a conditional covariance. Under Assumption 11 (i), the bound (9) can be in turn bounded above as
$$\bigl|E[Y_{n,i_1}\cdots Y_{n,i_{2m}}\mid\mathcal{C}_n]\bigr| = \bigl|\mathrm{Cov}\bigl(Y_{n,i_1},\,Y_{n,i_2}\cdots Y_{n,i_{2m}}\mid\mathcal{C}_n\bigr)\bigr| \le \kappa_0\,\bar{B}^{2m}\,\theta_{n,C_3 n^{c_2}}$$
for such a term. Summing over all fully distinct $2m$-tuples gives
$$\sum_{\substack{(i_1,\dots,i_{2m})\in H_{n,j}^{2m}\\ \text{fully distinct}}}\bigl|E[Y_{n,i_1}\cdots Y_{n,i_{2m}}\mid\mathcal{C}_n]\bigr| \le h_n(h_n-1)\cdots(h_n-2m+1)\;\kappa_0\,\bar{B}^{2m}\,\theta_{n,C_3 n^{c_2}}, \tag{10}$$
where the total number of fully distinct tuples in $H_{n,j}^{2m}$ (i.e., the number of summands in (10)) is
$$h_n(h_n-1)\cdots(h_n-2m+1) \le h_n^{2m}.$$
Under Assumptions 11 (ii) and 12, the typical product (i.e., the summand in the right-hand side of (10))
$$\kappa_0\,\bar{B}^{2m}\,\theta_{n,C_3 n^{c_2}}$$
will be bounded above by
$$\kappa_0\,\bar{B}^{2m}\,\kappa\,\bigl(C_3 n^{c_2}\bigr)^{-m}.$$
Therefore, the overall off-diagonal contribution (10) is of order
$$O\bigl(h_n^{2m}\,n^{-mc_2}\bigr) = O\bigl(h_n^{m}\bigr),$$
where the last equality uses $h_n\le C_2 n^{c_1}$ and $c_2\ge c_1$. This shows that the sum of the off-diagonal terms is bounded by a constant times $h_n^{m}$.
(b) (Indices Are Not Fully Distinct)
Next, consider a term indexed by a $2m$-tuple where not all the indices are distinct, and let $q<2m$ denote the number of distinct indices in such a tuple. If every distinct index appears with multiplicity at least two, then $q\le m$, and the bound (9) together with the tuple count $O(h_n^{q})=O(h_n^{m})$ suffices. Otherwise, some index appears with multiplicity one; since that coordinate has zero conditional mean by Assumption 10, the term again reduces to a conditional covariance, and the bound (9) will be in turn bounded above as
$$\bigl|E[Y_{n,i_1}\cdots Y_{n,i_{2m}}\mid\mathcal{C}_n]\bigr| \le \kappa_0\,\bar{B}^{2m}\,\theta_{n,C_3 n^{c_2}}$$
under Assumptions 10 and 11 (i). The product
$$\kappa_0\,\theta_{n,C_3 n^{c_2}}$$
on the right-hand side of the last display will be bounded above by
$$\kappa_0\,\kappa\,\bigl(C_3 n^{c_2}\bigr)^{-m}$$
under Assumptions 11 (ii) and 12. The total number of such terms with $q$ distinct indices is
$$O\bigl(h_n^{q}\bigr) = O\bigl(h_n^{2m-1}\bigr).$$
Therefore, the overall contribution from terms with $q$ distinct indices is of order
$$O\bigl(h_n^{2m-1}\,n^{-mc_2}\bigr) = O\bigl(h_n^{m}\bigr),$$
where $h_n^{m-1}\,n^{-mc_2} = O(1)$ (which follows from $h_n\le C_2 n^{c_1}$ and $c_1\le c_2$) is used to obtain the last bound. Since this bound is the same for all $q$, it shows that the sum of all the terms with repeated indices is bounded above by a constant times $h_n^{m}$.
Combining the bounds from (a) of order $O(h_n^{m})$ and (b) of order $O(h_n^{m})$ yields
$$E\bigl[|W_{n,j}|^{2m}\mid\mathcal{C}_n\bigr] \le C^{*}\,h_n^{m}$$
for a constant $C^{*}$ that depends on $\bar{B}$, $\kappa_0$, $\kappa$, $m$, and the constants of Assumption 12. This establishes (8).
Step 3 (Summing over Blocks)
Now, define the cumulative block sums
$$B_{n,k} = \sum_{j=1}^{k}W_{n,j}, \qquad k=1,\dots,J_n.$$
For every $k\in\{1,\dots,J_n\}$,
$$|B_{n,k}| \le \sum_{j=1}^{J_n}|W_{n,j}|,$$
and hence
$$\max_{1\le k\le J_n}|B_{n,k}|^{2m} \le \left(\sum_{j=1}^{J_n}|W_{n,j}|\right)^{2m}.$$
Then, applying the generic inequality
$$\left(\sum_{j=1}^{J}|a_j|\right)^{2m} \le J^{2m-1}\sum_{j=1}^{J}|a_j|^{2m}$$
with $a_j=W_{n,j}$ yields
$$\max_{1\le k\le J_n}|B_{n,k}|^{2m} \le J_n^{2m-1}\sum_{j=1}^{J_n}|W_{n,j}|^{2m}.$$
Taking conditional expectation gives
$$E\Bigl[\max_{1\le k\le J_n}|B_{n,k}|^{2m}\,\Big|\,\mathcal{C}_n\Bigr] \le J_n^{2m-1}\sum_{j=1}^{J_n}E\bigl[|W_{n,j}|^{2m}\mid\mathcal{C}_n\bigr].$$
By the bound (8) established in the previous step,
$$\sum_{j=1}^{J_n}E\bigl[|W_{n,j}|^{2m}\mid\mathcal{C}_n\bigr] \le J_n\,C^{*}\,h_n^{m}.$$
Therefore,
$$E\Bigl[\max_{1\le k\le J_n}|B_{n,k}|^{2m}\,\Big|\,\mathcal{C}_n\Bigr] \le C^{*}\,J_n^{2m}\,h_n^{m}.$$
Step 4 (Extension to All Indices)
For any natural number $r$ with $r\le n$, there exist $k\in\{0,1,\dots,J_n\}$ and $l\in\{0,1,\dots,h_n-1\}$ such that $r=k\,h_n+l$.
Then, the partial sum can be bounded with two components:
$$\left|\sum_{i=1}^{r}Y_{n,i}\right| \le |B_{n,k}| + \left|\sum_{i=kh_n+1}^{kh_n+l}Y_{n,i}\right|.$$
By Assumption 10, the second term on the right-hand side is bounded by
$$l\,\bar{B} \le h_n\,\bar{B}.$$
Thus,
$$\max_{1\le r\le n}\left|\sum_{i=1}^{r}Y_{n,i}\right| \le \max_{1\le k\le J_n}|B_{n,k}| + h_n\,\bar{B}.$$
Taking the $2m$-th power and then conditional expectation yield
$$E\left[\max_{1\le r\le n}\left|\sum_{i=1}^{r}Y_{n,i}\right|^{2m}\,\middle|\,\mathcal{C}_n\right] \le 2^{2m-1}\left(E\Bigl[\max_{1\le k\le J_n}|B_{n,k}|^{2m}\,\Big|\,\mathcal{C}_n\Bigr] + \bigl(h_n\bar{B}\bigr)^{2m}\right).$$
Therefore, by Step 3, there exists a constant $\bar{C}$ depending on $\bar{B}$, $\kappa_0$, $\kappa$, $m$, $C_1$, $C_2$, and $C_3$ such that
$$E\left[\max_{1\le r\le n}\left|\sum_{i=1}^{r}Y_{n,i}\right|^{2m}\,\middle|\,\mathcal{C}_n\right] \le \bar{C}\bigl(J_n^{2m}h_n^{m}+h_n^{2m}\bigr)$$
under Assumption 12, as claimed in the lemma. ∎