
Uniform Limit Theory for Network Data

Yuya Sasaki
Vanderbilt University
Brian and Charlotte Grove Chair and Professor of Economics. Department of Economics, Vanderbilt University, PMB #351819, 2301 Vanderbilt Place, Nashville, TN 37235-1819. Email: yuya.sasaki@vanderbilt.edu. I thank Brian and Charlotte Grove for their generous research support.
Abstract

I present a novel uniform law of large numbers (ULLN) for network-dependent data. While Kojevnikov, Marmer, and Song (KMS, 2021) provide a comprehensive suite of limit theorems and a robust variance estimator for network-dependent processes, their analysis focuses on pointwise convergence. On the other hand, uniform convergence is essential for nonlinear estimators such as M and GMM estimators (e.g., Newey and McFadden, 1994, Section 2). Building on KMS, I establish the ULLN under network dependence and demonstrate its utility by proving the consistency of both M and GMM estimators. A byproduct of this work is a novel maximal inequality for network data, which may prove useful for future research beyond the scope of this paper.


Keywords: GMM estimation, M estimation, maximal inequality, network, uniform law of large numbers

JEL Codes: C12, C21, C31

1 Introduction

In recent years, asymptotic analysis of network-dependent data has garnered significant attention in econometrics (e.g., Kuersteiner, 2019; Leung and Moon, 2019; Kuersteiner and Prucha, 2020; Kojevnikov, Marmer, and Song, 2021). (This set of references excludes another important branch of the literature on network asymptotics—namely, the literature on dyadic-like networks—because its strong dependence structure differs significantly from the focus of this paper, both in terms of network structure and limit theory.) Among these, Kojevnikov, Marmer, and Song (KMS, 2021) establish limit theorems and develop a robust variance estimator for a general class of dependent processes that encompass dependency-graph models in particular. Their framework, grounded in a conditional $\psi$-dependence concept adopted from Doukhan and Louhichi (1999), offers powerful tools for handling network data and has spurred further research in related fields. Furthermore, the theory and methods introduced by KMS have been widely applied in economic and econometric studies of network models.

Many applications, particularly those involving nonlinear models like limited dependent variable models, demand uniform convergence results. In the context of general classes of M estimators (including maximum likelihood estimators) and generalized method of moments (GMM) estimators, a uniform law of large numbers (ULLN) is crucial for ensuring that the empirical criterion function converges uniformly to its population counterpart. This uniform convergence is fundamental for establishing the consistency—and subsequently the asymptotic normality—of these estimators, as detailed in standard references such as the handbook chapter by Newey and McFadden (1994, Section 2).

Although KMS offer elegant pointwise limit theorems under network dependence, their results do not directly yield the uniform law of large numbers (ULLN) required for nonlinear estimation. Achieving uniform convergence necessitates controlling not only the individual moments of network-dependent observations but also the fluctuations of the entire process uniformly across the parameter space. This task is further complicated by the intricate dependence structure inherent in network data.

The main contribution of this paper is to bridge this gap by establishing a novel ULLN under network dependence. My results build on the KMS framework, which utilizes model restrictions based on conditional $\psi$-dependence, decay rates of network dependence, and moment bounds – these concepts will be concisely reviewed in Section 2.1. In addition, I impose further regularity conditions—such as uniform boundedness, uniform Lipschitz continuity, uniform integrability, and modified network decay conditions—that enable the extension from pointwise to uniform convergence.

These developments lay the groundwork for establishing consistency—and subsequently asymptotic normality—in nonlinear model estimation, including M and GMM estimators, under network dependence. I illustrate the utility of the novel ULLN by proving the consistency of these estimators. Specifically, the ULLN established herein permits one to follow the standard arguments detailed in Newey and McFadden (1994, Section 2).

Finally, although the primary contribution of this paper is the novel ULLN, a key by-product of the analysis is the development of a new maximal inequality for conditionally $\psi$-dependent processes in network data. Recognizing its potential utility for future research beyond the scope of this work, I present this result independently in Appendix B, separate from the assumptions introduced in the main text.

The remainder of the paper is organized as follows. In Section 2, I introduce the setup. Section 3 presents the main theoretical results on the ULLN, while Section 4 illustrates applications of the ULLN to M and GMM estimators. Section 5 concludes. Detailed mathematical derivations are provided in the appendices: Appendix A contains the proofs of the main results, and Appendix B presents an auxiliary lemma (the maximal inequality) and its proof.

2 The Setup

This section introduces the econometric framework.

First, I introduce some basic notation. Let $v,a\in\mathbb{N}$. For any function $f:\mathbb{R}^{v\times a}\to\mathbb{R}$, define

\|f\|_{\infty}=\sup_{x\in\mathbb{R}^{v\times a}}|f(x)|\quad\text{and}\quad\operatorname{Lip}(f)=\sup_{x\neq y}\frac{|f(x)-f(y)|}{d(x,y)},

where $d(x,y)$ is a metric on $\mathbb{R}^{v\times a}$. With these definitions, we introduce the class of uniformly bounded Lipschitz functions:

L_{v,a}=\bigl\{f:\mathbb{R}^{v\times a}\to\mathbb{R}\,:\,\|f\|_{\infty}<\infty\text{ and }\operatorname{Lip}(f)<\infty\bigr\}.

2.1 Conditionally $\psi$-Dependent Processes

This subsection provides a concise overview of the baseline model introduced in Kojevnikov, Marmer, and Song (KMS, 2021) and the notational conventions used in the KMS framework; for a more detailed exposition, please refer to the original paper by KMS.

For each $n\in\mathbb{N}$, let $N_{n}=\{1,2,\ldots,n\}$ denote the set of indices corresponding to the nodes in the network $G_{n}$ with the adjacency matrix $A_{n}$, whose elements are 0 and 1. A link between nodes $i$ and $j$ exists if and only if the $(i,j)$-th entry of $A_{n}$ equals one. For each $n\in\mathbb{N}$, let $\mathcal{C}_{n}$ be the $\sigma$-algebra with respect to which the adjacency matrix $A_{n}$ is measurable. Let $d_{n}(i,j)$ denote the network distance between nodes $i$ and $j$ in $N_{n}$, defined as the length of the shortest path connecting $i$ and $j$ in $G_{n}$.

For $a,b\in\mathbb{N}$ and a positive real number $s$, define

P_{n}(a,b;s)=\bigl\{(A,B)\,:\,A,B\subset N_{n},\;|A|=a,\;|B|=b,\;\text{and}\;d_{n}(A,B)\geq s\bigr\},

where

d_{n}(A,B)=\min\{d_{n}(i,j):i\in A,\;j\in B\}.

Thus, each element of $P_{n}(a,b;s)$ is a pair of node sets of sizes $a$ and $b$ with a distance of at least $s$ between them.
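To fix ideas, here is a small Python sketch (my own illustration, not part of the KMS framework; the path network and all function names are hypothetical) that computes the shortest-path distance $d_{n}(i,j)$ by breadth-first search from an adjacency matrix, the set distance $d_{n}(A,B)$, and a membership check for $P_{n}(a,b;s)$.

from collections import deque

def network_distance(adj, i):
    # breadth-first search from node i; returns shortest-path lengths d_n(i, .)
    n = len(adj)
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if adj[u][v] == 1 and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist  # unreachable nodes are absent (distance regarded as infinite)

def set_distance(adj, A, B):
    # d_n(A, B) = min over i in A and j in B of d_n(i, j)
    best = float("inf")
    for i in A:
        dist = network_distance(adj, i)
        best = min(best, min(dist.get(j, float("inf")) for j in B))
    return best

def in_P(adj, A, B, a, b, s):
    # checks whether the pair (A, B) belongs to P_n(a, b; s)
    return len(A) == a and len(B) == b and set_distance(adj, A, B) >= s

adj = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]  # path 0-1-2-3-4-5
print(set_distance(adj, {0, 1}, {4, 5}))   # 3
print(in_P(adj, {0, 1}, {4, 5}, 2, 2, 3))  # True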

Consider a triangular array $\{Y_{n,i}\}_{i\in N_{n}}$ of random vectors in $\mathbb{R}^{v}$. The following definition introduces the notion of conditional $\psi$-dependence as provided in KMS.


Definition 1 (Conditional $\psi$-Dependence; KMS, Definition 2.2).


A triangular array $\{Y_{n,i}\}_{i\in N_{n}}$ is conditionally $\psi$-dependent given $\{\mathcal{C}_{n}\}$ if for each $n\in\mathbb{N}$, there exists a $\mathcal{C}_{n}$-measurable sequence

\vartheta_{n}=\{\vartheta_{n,s}\}_{s\geq 0}\quad\text{with }\vartheta_{n,0}=1,

and a collection of nonrandom functions

\psi_{a,b}:L_{v,a}\times L_{v,b}\to[0,\infty),\quad a,b\in\mathbb{N},

such that for all positive integers $a,b$, for every pair $(A,B)\in P_{n}(a,b;s)$ with $s>0$, and for all functions $f\in L_{v,a}$ and $g\in L_{v,b}$, the following inequality holds almost surely:

\bigl|\operatorname{Cov}\bigl(f(Y_{n,A}),\,g(Y_{n,B})\mid\mathcal{C}_{n}\bigr)\bigr|\leq\psi_{a,b}(f,g)\,\vartheta_{n,s}.

As emphasized in KMS, it is important to note that the decay coefficients are generally random, allowing one to accommodate the “common shocks” $\mathcal{C}_{n}$ present in the network. I now present the following three key assumptions from KMS, which will be employed throughout the present paper.


Assumption 1 (KMS, Assumption 2.1 (a)).

The triangular array $\{Y_{n,i}\}$ is conditionally $\psi$-dependent given $\{\mathcal{C}_{n}\}$ with dependence coefficients $\{\vartheta_{n,s}\}$, and there exists a constant $C>0$ such that for all $a,b\in\mathbb{N}$, $f\in L_{v,a}$, and $g\in L_{v,b}$,

\psi_{a,b}(f,g)\leq C\,ab\,\bigl(\|f\|_{\infty}+\operatorname{Lip}(f)\bigr)\bigl(\|g\|_{\infty}+\operatorname{Lip}(g)\bigr).

For $p>0$, let $\|Y_{n,i}\|_{\mathcal{C}_{n},p}$ denote the conditional $L^{p}$ norm defined by

\|Y_{n,i}\|_{\mathcal{C}_{n},p}=\bigl(E\bigl(|Y_{n,i}|^{p}\mid\mathcal{C}_{n}\bigr)\bigr)^{1/p}.

With this notation, the following assumption imposes a moment condition.


Assumption 2 (KMS, Assumption 3.1).

There exists $\varepsilon>0$ such that

\sup_{n\in\mathbb{N}}\max_{i\in N_{n}}\|Y_{n,i}\|_{\mathcal{C}_{n},1+\varepsilon}<\infty\quad\text{a.s.}

For each row $n$, each node $i\in N_{n}$, and each $s\geq 1$, define the $s$-shell

N_{n}^{\partial}(i;s)=\{j\in N_{n}:d_{n}(i,j)=s\}.

Then, define the average shell size

\delta_{n}^{\partial}(s)=\frac{1}{n}\sum_{i\in N_{n}}|N_{n}^{\partial}(i;s)|.

With this notation, the following assumption restricts the denseness of the network and the decay rate of dependence with the network distance.


Assumption 3 (KMS, Assumption 3.2).

The combined effect of network denseness and the decay of dependence is controlled so that

\frac{1}{n}\sum_{s\geq 1}\delta_{n}^{\partial}(s)\,\vartheta_{n,s}\to 0\quad\text{a.s.}

I refer readers to the original paper by KMS for detailed discussions of these assumptions, as they are excerpted from KMS. Under these assumptions, along with an additional regularity condition, KMS establish the pointwise law of large numbers – see Proposition 3.1 in their paper.
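As a concrete illustration of these objects (my own sketch, separate from KMS; the ring network and the polynomial decay $\vartheta_{n,s}=s^{-2}$ are hypothetical choices), the following Python code computes the average shell size $\delta_{n}^{\partial}(s)$ and the denseness-decay quantity $\frac{1}{n}\sum_{s\geq 1}\delta_{n}^{\partial}(s)\,\vartheta_{n,s}$ appearing in Assumption 3 for a given adjacency matrix.

from collections import deque

def all_distances(adj):
    # pairwise shortest-path distances d_n(i, j) via breadth-first search from each node
    n = len(adj)
    dist = [[float("inf")] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] == 1 and dist[i][v] == float("inf"):
                    dist[i][v] = dist[i][u] + 1
                    queue.append(v)
    return dist

def avg_shell_size(dist, s):
    # delta_n^partial(s) = (1/n) * sum_i |{j : d_n(i, j) = s}|
    n = len(dist)
    return sum(sum(1 for j in range(n) if dist[i][j] == s) for i in range(n)) / n

def denseness_decay(dist, decay):
    # (1/n) * sum_{s >= 1} delta_n^partial(s) * vartheta_{n,s}
    n = len(dist)
    return sum(avg_shell_size(dist, s) * decay(s) for s in range(1, n)) / n

adj = [[1 if abs(i - j) in (1, 11) else 0 for j in range(12)] for i in range(12)]  # ring of 12 nodes
dist = all_distances(adj)
print(avg_shell_size(dist, 1))                      # 2.0 neighbors on average
print(denseness_decay(dist, lambda s: s ** -2.0))   # small when the network is sparse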

2.2 Function Classes

This subsection introduces a parameter-indexed class of functions and imposes additional restrictions to establish the uniform law of large numbers.

Let $\Theta\subset\mathbb{R}^{d}$ denote a parameter space. For each $\theta\in\Theta$, let

f(\cdot,\theta):\mathbb{R}^{v}\to\mathbb{R}

be a measurable function.

In addition to the three assumptions inherited from KMS, I impose the following conditions on the parameter space $\Theta$ and the function class $\{f(\cdot,\theta):\theta\in\Theta\}$.


Assumption 4 (Compactness).

The parameter space $\Theta\subset\mathbb{R}^{d}$ is compact.


Assumption 5 (Function Class).

For each fixed $\theta\in\Theta$, the function $f(\cdot,\theta)$ belongs to $L_{v,1}$.


Assumption 5 imposes the uniform boundedness and Lipschitz conditions on each function $f(\cdot,\theta)$ in the class.

For ‘each’ $\theta\in\Theta$, the pointwise law of large numbers (LLN), as stated in Proposition 3.1 of KMS, holds under Assumptions 1, 2, 3, and 5. I will leverage this pointwise result by KMS as an auxiliary step in establishing the uniform law of large numbers.


Assumption 6 (Uniform Equicontinuity).

The function class $\{f(\cdot,\theta):\theta\in\Theta\}$ is uniformly equicontinuous in $\theta$. In particular, there exists a constant $\overline{L}>0$ such that for all $y$ in the support of $Y_{n,i}$ for all $n$ and for all $\theta,\theta^{\prime}\in\Theta$,

|f(y,\theta)-f(y,\theta^{\prime})|\leq\overline{L}\|\theta-\theta^{\prime}\|.

Assumption 6, together with Assumption 4, allows a finite-net approximation of $f(Y_{n,i},\theta)$ uniformly over $\theta\in\Theta$, which is the key device for establishing the uniform result.
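To make the finite-net device concrete, here is a minimal Python sketch (my own illustration; the rectangular parameter space and the grid construction are assumptions made for the example) that builds a $\delta$-net of a compact box $\Theta\subset\mathbb{R}^{d}$ and confirms that an arbitrary point of $\Theta$ lies within $\delta$ of some net point.

import itertools
import math

def delta_net(lower, upper, delta):
    # grid over the box [lower_1, upper_1] x ... x [lower_d, upper_d]; the spacing is
    # chosen so that every point of the box is within delta (Euclidean norm) of the net
    d = len(lower)
    step = 2.0 * delta / math.sqrt(d)
    axes = []
    for lo, hi in zip(lower, upper):
        m = max(1, math.ceil((hi - lo) / step))
        axes.append([lo + (k + 0.5) * (hi - lo) / m for k in range(m)])
    return list(itertools.product(*axes))

net = delta_net(lower=[0.0, 0.0], upper=[1.0, 1.0], delta=0.1)
print(len(net))                                       # J(delta), of order delta^(-d)
theta = (0.37, 0.82)
print(min(math.dist(theta, t) for t in net) <= 0.1)   # True: theta is within delta of the net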

3 The Main Results: Uniform Laws of Large Numbers

I now state the first main result of this paper—the uniform law of large numbers for network-dependent data.


Theorem 1 (Uniform Law of Large Numbers).

If Assumptions 1–6 are satisfied, then

\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta)-E\bigl(f(Y_{n,i},\theta)\mid\mathcal{C}_{n}\bigr)\Bigr]\right|\to 0\quad\text{a.s.}

See Appendix A.1 for a proof.

Note that Theorem 1 establishes the uniform almost sure convergence of the sample average to its conditional mean. In order to further deduce uniform almost sure convergence to the unconditional mean averaging over all the common shocks,

\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta)-E\bigl(f(Y_{n,i},\theta)\bigr)\Bigr]\right|\to 0\quad\text{a.s.},

I strengthen Assumptions 3 and 5 into Assumptions 3′ and 5′, as stated below.


  • Assumption 3′.

    In addition to Assumption 3, there exist a constant $A>0$ and an integer $p>2$ such that

    \vartheta_{n,s}\leq As^{-p/(p-1)}

    holds a.s. for all positive real values $s$ and all $n\in\mathbb{N}$.



  • Assumption 5′.

    There exist $R,L<\infty$ such that $|f(\cdot,\theta)|\leq R$ and $\operatorname{Lip}(f(\cdot,\theta))\leq L$ for all $\theta\in\Theta$.


These modifications to the baseline assumptions facilitate the application of a maximal inequality for conditionally $\psi$-dependent processes; see Appendix B for further details. In particular, it is advantageous to impose the bounds $R$ and $L$ uniformly for all functions in the class $\{f(\cdot,\theta):\theta\in\Theta\}$. This uniformity ensures that a universal constant $K$ in the maximal inequality is applicable across the entire function class. This point is subtle but constitutes an important aspect of these assumptions.

In addition to Assumptions 1, 2, 3′, 4, 5′, and 6, I also impose the following network sparsity condition.


Assumption 7 (Network Sparsity).

There exist finite non-zero constants $C$, $c_{1}$, $c_{2}$, and $\eta\in(0,1)$ such that one can find a partition of $N_{n}$ into $J_{n}$ equally sized blocks $\{I_{1},\ldots,I_{J_{n}}\}$ of size $b_{n}$ satisfying

c_{1}n^{2\left(\frac{1}{p}+\frac{d}{p^{2}-1}\right)+\eta}\leq b_{n}\leq c_{2}n^{1-\frac{1}{p}-\frac{d}{p^{2}-1}-\eta}\quad\text{and}\quad\min_{j\in\{1,\ldots,J_{n}\}}\;\min_{\substack{i,i^{\prime}\in I_{j}\\ i\neq i^{\prime}}}d_{n}(i,i^{\prime})\geq Cb_{n}

for all sufficiently large $n$.


The purpose of this assumption is to facilitate the blocking strategy employed in deriving the maximal inequality – see Appendix B. In the context of time-series data, the network structure is ‘linear’ and inherently sparse, which naturally permits a blocking scheme with $J$ alternating blocks for establishing a maximal inequality. Similarly, Assumption 7 postulates a blocking strategy for network data that ensures sufficient separation of nodes within each block $I_{j}$, a condition that is feasible under sparse networks. In social networks, it is plausible that for any individual there exists a set of other individuals arbitrarily far away in the network. In contrast, this assumption rules out densely connected networks, such as the global trade network where nearly every country is linked to most others.
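The following Python sketch (my own illustration; the circulant ‘ring’ network and the particular partition are hypothetical choices) checks the separation requirement of Assumption 7, namely that any two distinct nodes assigned to the same block $I_{j}$ are at network distance at least $Cb_{n}$ from each other.

from collections import deque

def pairwise_distances(adj):
    # shortest-path distances via breadth-first search from every node
    n = len(adj)
    dist = [[float("inf")] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] == 1 and dist[i][v] == float("inf"):
                    dist[i][v] = dist[i][u] + 1
                    queue.append(v)
    return dist

def min_within_block_distance(dist, blocks):
    # left-hand side of the separation requirement in Assumption 7
    return min(dist[i][k] for block in blocks for i in block for k in block if i != k)

n, b_n = 36, 6                                   # J_n = n / b_n = 6 equally sized blocks
J_n = n // b_n
blocks = [[j + k * J_n for k in range(b_n)] for j in range(J_n)]   # every J_n-th node
adj = [[1 if abs(i - j) in (1, n - 1) else 0 for j in range(n)] for i in range(n)]  # ring
dist = pairwise_distances(adj)
print(min_within_block_distance(dist, blocks))   # 6 = b_n, so the condition holds with C = 1

On this sparse ring, nodes placed in the same block are spaced $J_{n}$ steps apart, so the within-block distance equals $b_{n}$ here; on a dense network the same construction would fail, which is the sense in which Assumption 7 rules out densely connected networks.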


Remark 1 (On the Value of $p$ and Restrictiveness of Assumptions 3′ and 7).

The value of $p$ in Assumption 3′ must match the value of $p$ in Assumption 7 whenever both are invoked. Nonetheless, there is no fundamental trade-off between these assumptions, and one can choose $p$ to be an arbitrarily large positive integer, rendering the $p$-relevant parts of Assumptions 3′ and 7 effectively non-restrictive. Specifically, Assumption 3′ may be taken with

\vartheta_{n,s}\leq As^{-1-\epsilon_{0}}

for some small $\epsilon_{0}>0$, and Assumption 7 with

c_{1}n^{\epsilon_{1}}\leq b_{n}\leq c_{2}n^{1-\epsilon_{2}}

for small $\epsilon_{1},\epsilon_{2}>0$. While there is no explicit cost to choosing a large $p$ for the purpose of proving the uniform law of large numbers, there is an implicit cost in the background. For instance, if one makes Assumption 7 less restrictive by lowering $\epsilon_{1}$ and $\epsilon_{2}$ while keeping $\epsilon_{0}$ in Assumption 3′ fixed, then the rate of divergence in the bound of the maximal inequality increases. In particular,

E\Bigl[\max_{1\leq k\leq n}\Bigl|\sum_{i=1}^{k}X_{n,i}\Bigr|^{p}\;\Bigm|\;\mathcal{C}_{n}\Bigr]\;\leq\;K\,n^{p\cdot\max\{1-\epsilon_{1}/2,\,1-\epsilon_{2}\}},

where $p=(1+\epsilon_{0})/\epsilon_{0}$. Consequently, if the maximal inequality (Lemma 1 in Appendix B) is to be used for other applications in the readers’ future research, one should be aware of this implicit cost incurred by making the generalized counterparts of Assumptions 3′ and 7, namely Assumptions 11 (ii) and 12, less restrictive. See Appendix B for further details. $\blacktriangle$
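For a concrete sense of the implicit cost discussed in Remark 1 (a worked numerical example I add here; the numbers are arbitrary), take $\epsilon_{0}=1/3$, so that $p=(1+\epsilon_{0})/\epsilon_{0}=4$. Then the bound displayed above gives

K\,n^{4\max\{1-\epsilon_{1}/2,\,1-\epsilon_{2}\}}=K\,n^{3.6}\quad\text{for }\epsilon_{1}=\epsilon_{2}=0.2,\qquad\text{versus}\qquad K\,n^{3.96}\quad\text{for }\epsilon_{1}=\epsilon_{2}=0.02,

so relaxing Assumption 7 by shrinking $\epsilon_{1}$ and $\epsilon_{2}$ pushes the bound toward the trivial rate $n^{p}=n^{4}$.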


Now, I state the second main result of this paper – the uniform law of large numbers for the unconditional mean:


Corollary 1 (Uniform Law of Large Numbers for the Unconditional Mean).

Suppose that Assumptions 1, 2, 3′, 4, 5′, 6, and 7 hold. Then,

\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta)-E\bigl(f(Y_{n,i},\theta)\bigr)\Bigr]\right|\to 0\quad\text{a.s.}

See Appendix A.2 for a proof.

This result (Corollary 1) is more advantageous than the first main result (Theorem 1) when the econometric model establishes identification via the unconditional moment function $\theta\mapsto E\bigl(f(Y_{n,i},\theta)\bigr)$. For examples, see the following section.

4 Applications

Recall from the introductory section that the primary motivation for developing the uniform law of large numbers in this paper is to facilitate the consistency of M (MLE-type) and GMM estimators. In this section, I demonstrate how the uniform law of large numbers, specifically Corollary 1, can be applied to establish the consistency of M estimators (Section 4.1) and GMM estimators (Section 4.2).

4.1 Application I: M Estimation

Let $Q(\cdot)$ and $Q_{n}(\cdot)$ be the population and sample criterion functions for M estimation, defined on $\Theta$ by

Q(\theta)=E\bigl(f(Y_{n,i},\theta)\bigr)\quad\text{and}\quad Q_{n}(\theta)=\frac{1}{n}\sum_{i\in N_{n}}f(Y_{n,i},\theta),

respectively. The M estimator is defined as

\hat{\theta}_{M}\in\arg\max_{\theta\in\Theta}Q_{n}(\theta).

Suppose that the population criterion satisfies the following condition.


Assumption 8 (Identification for M Estimation).

There exists a unique $\theta_{0}\in\Theta$ such that

\theta_{0}=\arg\max_{\theta\in\Theta}Q(\theta).

With this identification condition, the standard argument based on Newey and McFadden (1994, Theorem 2.1), for example, yields the consistency $\hat{\theta}_{M}\stackrel{p}{\rightarrow}\theta_{0}$ by the uniform law of large numbers (my Corollary 1).

Let me state this conclusion formally as a proposition.


Proposition 1 (Consistency of the M Estimator).

Suppose that Assumptions 1, 2, 3′, 4, 5′, 6, 7, and 8 hold, and $Q(\cdot)$ is continuous on $\Theta$. Then, $\hat{\theta}_{M}\stackrel{p}{\rightarrow}\theta_{0}$.


Although this proposition immediately follows by applying my Corollary 1 and Newey and McFadden (1994, Theorem 2.1) as mentioned above, I provide a proof for completeness in Appendix A.3.

Note that the limit theorems of KMS alone could not produce this crucial result due to their focus on pointwise convergence. Once the consistency has been established, then the standard argument along the lines of Newey and McFadden (1994, Section 3), together with the limit distribution theory of KMS (their Theorem 3.2), yields the asymptotic normality under additional regularity conditions.
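As a purely computational illustration of the M estimation step (my own sketch, not taken from KMS or from the theory above; the logistic log-likelihood, the simulated i.i.d. data, and the grid search are hypothetical choices, and network dependence is not simulated), the following Python code forms the sample criterion $Q_{n}(\theta)=\frac{1}{n}\sum_{i}f(Y_{n,i},\theta)$ and maximizes it over a compact parameter set.

import math
import random

random.seed(0)

# hypothetical data: Y_{n,i} = (binary outcome, scalar regressor)
n, theta0 = 500, 1.0
data = []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    prob = 1.0 / (1.0 + math.exp(-theta0 * x))
    data.append((1 if random.random() < prob else 0, x))

def f(y, theta):
    # logistic log-likelihood contribution
    outcome, x = y
    index = theta * x
    return outcome * index - math.log(1.0 + math.exp(index))

def Q_n(theta):
    return sum(f(y, theta) for y in data) / n

grid = [-5.0 + 0.01 * k for k in range(1001)]   # compact parameter space Theta = [-5, 5]
theta_hat = max(grid, key=Q_n)
print(theta_hat)   # close to theta0 = 1.0 in this illustration

Under the conditions of Proposition 1, the same maximization applied to genuinely network-dependent data delivers a consistent estimator, since Corollary 1 guarantees the uniform convergence of $Q_{n}$ to $Q$.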

4.2 Application II: GMM Estimation

Let $f(\cdot,\cdot)$ denote the moment function such that the true parameter vector $\theta_{0}\in\Theta$ satisfies the moment equality

E\bigl(f(Y_{n,i},\theta_{0})\bigr)=0.

Define the sample moment function by

\bar{f}_{n}(\theta)=\frac{1}{n}\sum_{i=1}^{n}f(Y_{n,i},\theta).

For any sequence $W_{n}$ of positive definite weighting matrices (which may depend on the data) converging in probability to a positive definite matrix $W$, the GMM estimator is defined as

\hat{\theta}_{GMM}=\arg\min_{\theta\in\Theta}Q_{n}(\theta),\quad\text{where}\quad Q_{n}(\theta)=\bar{f}_{n}(\theta)^{\top}W_{n}\,\bar{f}_{n}(\theta).

We can define the population criterion by

Q(\theta)=E\bigl(f(Y_{n,i},\theta)\bigr)^{\top}W\,E\bigl(f(Y_{n,i},\theta)\bigr).

Suppose that the population moment satisfies the following condition.


Assumption 9 (Identification for GMM Estimation).

There exists a unique $\theta_{0}\in\Theta$ such that

E\bigl(f(Y_{n,i},\theta)\bigr)=0\quad\text{if and only if}\quad\theta=\theta_{0}.

With this identification condition, the standard argument based on Newey and McFadden (1994, Theorem 2.1), for example, yields the consistency $\hat{\theta}_{GMM}\stackrel{p}{\rightarrow}\theta_{0}$ by the uniform law of large numbers (my Corollary 1).

Let me state this conclusion formally as a proposition.


Proposition 2 (Consistency of the GMM Estimator).

Suppose that Assumptions 1, 2, 3′, 4, 5′, 6, 7, and 9 hold, and $Q(\cdot)$ is continuous on $\Theta$. Then, $\hat{\theta}_{GMM}\stackrel{p}{\rightarrow}\theta_{0}$.


Although this proposition immediately follows by applying my Corollary 1 and Newey and McFadden (1994, Theorem 2.1) as mentioned above, I provide a proof for completeness in Appendix A.4.

Note that the results of KMS alone could not produce this crucial result due to their focus on pointwise convergence. Once the consistency has been established, then the standard argument along the lines of Newey and McFadden (1994, Section 3), together with the limit distribution theory of KMS (their Theorem 3.2), yields the asymptotic normality under additional regularity conditions.
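Analogously, here is a minimal Python sketch of the GMM computation (my own illustration; the linear instrumental-variable moment $f(Y_{n,i},\theta)=z_{i}(y_{i}-\theta x_{i})$, the simulated i.i.d. data, and the scalar weighting are assumptions made for the example), which forms $\bar{f}_{n}(\theta)$ and minimizes $Q_{n}(\theta)=\bar{f}_{n}(\theta)^{\top}W_{n}\bar{f}_{n}(\theta)$ over a grid on a compact parameter set.

import random

random.seed(1)

# hypothetical data (y_i, x_i, z_i) with a single instrument z_i and true theta0 = 2
n, theta0 = 500, 2.0
data = []
for _ in range(n):
    z = random.gauss(0.0, 1.0)
    u = random.gauss(0.0, 1.0)
    x = z + 0.5 * u
    y = theta0 * x + u
    data.append((y, x, z))

def f(obs, theta):
    # moment function: E[z (y - theta x)] = 0 holds at theta = theta0
    y, x, z = obs
    return z * (y - theta * x)

def f_bar(theta):
    return sum(f(obs, theta) for obs in data) / n

W_n = 1.0   # scalar weighting since there is a single moment condition

def Q_n(theta):
    g = f_bar(theta)
    return g * W_n * g

grid = [0.01 * k for k in range(401)]   # compact parameter space Theta = [0, 4]
theta_hat = min(grid, key=Q_n)
print(theta_hat)   # close to theta0 = 2.0 in this illustration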

5 Summary

In this paper, I extend the limit theory for the framework of network-dependent data, introduced by Kojevnikov, Marmer, and Song (KMS 2021), by developing a novel uniform law of large numbers (ULLN). While KMS provide a comprehensive set of limit theorems and a robust variance estimator for network-dependent processes, their pointwise results are insufficient for applications that require uniform convergence—such as establishing the consistency of M and GMM estimators, where the empirical criterion function must converge uniformly over a compact parameter space.

My main contribution fills this gap by setting forth conditions under which uniform convergence to both the conditional and unconditional means holds, thereby paving the way for consistent estimation of nonlinear models in the presence of network dependence. I illustrate the applications by using Corollary 1 to establish the consistency of both M and GMM estimators.

Finally, as a byproduct of this paper, I establish a novel maximal inequality for conditionally $\psi$-dependent processes, which may be of independent interest for future research beyond the scope of this study. To highlight its broader applicability, I present this result separately in Appendix B, along with an independent set of assumptions distinct from those introduced in the main text.


References

  • Doukhan and Louhichi (1999) Doukhan, P. and S. Louhichi (1999). A new weak dependence condition and applications to moment inequalities. Stochastic Processes and Their Applications 84(2), 313–342.
  • Kojevnikov et al. (2021) Kojevnikov, D., V. Marmer, and K. Song (2021). Limit theorems for network dependent random variables. Journal of Econometrics 222(2), 882–908.
  • Kuersteiner (2019) Kuersteiner, G. M. (2019). Limit theorems for data with network structure. arXiv preprint arXiv:1908.02375.
  • Kuersteiner and Prucha (2020) Kuersteiner, G. M. and I. R. Prucha (2020). Dynamic spatial panel models: Networks, common shocks, and sequential exogeneity. Econometrica 88(5), 2109–2146.
  • Leung and Moon (2019) Leung, M. P. and H. R. Moon (2019). Normal approximation in large network models. arXiv preprint arXiv:1904.11060.
  • Newey and McFadden (1994) Newey, W. and D. McFadden (1994). Large sample estimation and hypothesis testing. In R. Engle and D. McFadden (Eds.), Handbook of Econometrics, pp.  2111–2245. Elsevier.

Appendix

The appendix consists of two sections. Appendix A collects proofs of the theoretical results presented in the main text. Specifically, Appendices A.1, A.2, A.3, and A.4 present proofs of Theorem 1, Corollary 1, Proposition 1, and Proposition 2, respectively. Appendix B presents an auxiliary lemma (Lemma 1), the novel maximal inequality for network-dependent data, which is used for the proof of Corollary 1. Notation carries over from the main text.

Appendix A Proofs

A.1 Proof of Theorem 1

Proof.

By Assumption 4, for any $\delta>0$ there exists a finite $\delta$-net $\{\theta_{1},\theta_{2},\ldots,\theta_{J}\}\subset\Theta$ such that for every $\theta\in\Theta$, there exists some $\theta_{j}\in\{\theta_{1},\theta_{2},\ldots,\theta_{J}\}$ with

\|\theta-\theta_{j}\|\leq\delta.

For an arbitrary $\theta\in\Theta$, let $\theta_{j}(\theta)\in\{\theta_{1},\theta_{2},\ldots,\theta_{J}\}$ be a net point with $\|\theta-\theta_{j}(\theta)\|\leq\delta$. Decompose

\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta)-E\bigl(f(Y_{n,i},\theta)\mid\mathcal{C}_{n}\bigr)\Bigr]=A_{n}(\theta)+B_{n}(\theta)+C_{n}(\theta),

where the three components on the right-hand side are:

A_{n}(\theta)=\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta)-f(Y_{n,i},\theta_{j}(\theta))\Bigr],
B_{n}(\theta)=\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta_{j}(\theta))-E\bigl(f(Y_{n,i},\theta_{j}(\theta))\mid\mathcal{C}_{n}\bigr)\Bigr],\qquad\text{and}
C_{n}(\theta)=\frac{1}{n}\sum_{i\in N_{n}}E\Bigl[f(Y_{n,i},\theta_{j}(\theta))-f(Y_{n,i},\theta)\mid\mathcal{C}_{n}\Bigr].

By Assumption 6, for every $n$ and $i$, we have

|f(Y_{n,i},\theta)-f(Y_{n,i},\theta_{j}(\theta))|\leq\overline{L}\,\|\theta-\theta_{j}(\theta)\|\leq\overline{L}\,\delta.

Taking the sample mean gives

|A_{n}(\theta)|\leq\frac{1}{n}\sum_{i\in N_{n}}|f(Y_{n,i},\theta)-f(Y_{n,i},\theta_{j}(\theta))|\leq\overline{L}\,\delta.

Similarly,

|C_{n}(\theta)|\leq\frac{1}{n}\sum_{i\in N_{n}}E\Bigl[|f(Y_{n,i},\theta_{j}(\theta))-f(Y_{n,i},\theta)|\mid\mathcal{C}_{n}\Bigr]\leq\overline{L}\,\delta.

Thus, it follows that

|A_{n}(\theta)+C_{n}(\theta)|\leq 2\overline{L}\,\delta.

Applying Proposition 3.1 of Kojevnikov, Marmer, and Song (2021) under my Assumptions 1, 2, 3, and 5 for each fixed $\theta_{j}\in\{\theta_{1},\theta_{2},\ldots,\theta_{J}\}$ gives

\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta_{j})-E\bigl(f(Y_{n,i},\theta_{j})\mid\mathcal{C}_{n}\bigr)\Bigr]\to 0\quad\text{a.s.}

Since the $\delta$-net $\{\theta_{1},\ldots,\theta_{J}\}$ is finite, it follows that

S_{n}:=\max_{1\leq j\leq J}\left|\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta_{j})-E\bigl(f(Y_{n,i},\theta_{j})\mid\mathcal{C}_{n}\bigr)\Bigr]\right|\to 0\quad\text{a.s.}\qquad(1)

For any $\theta\in\Theta$,

\left|\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta)-E\bigl(f(Y_{n,i},\theta)\mid\mathcal{C}_{n}\bigr)\Bigr]\right|\leq|B_{n}(\theta)|+|A_{n}(\theta)+C_{n}(\theta)|,

and thus

\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta)-E\bigl(f(Y_{n,i},\theta)\mid\mathcal{C}_{n}\bigr)\Bigr]\right|\leq S_{n}+2\overline{L}\,\delta.

Since for every fixed $\delta>0$ the maximum $S_{n}$ over the $\delta$-net converges to 0 almost surely by (1), it follows that

\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta)-E\bigl(f(Y_{n,i},\theta)\mid\mathcal{C}_{n}\bigr)\Bigr]\right|\leq 2\overline{L}\,\delta+o(1)\quad\text{a.s.}

Because $\delta>0$ is arbitrary, this shows that

\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta)-E\bigl(f(Y_{n,i},\theta)\mid\mathcal{C}_{n}\bigr)\Bigr]\right|\to 0\quad\text{a.s.}

as claimed in the statement of the theorem. ∎

A.2 Proof of Corollary 1

Proof.

For any fixed $\theta\in\Theta$,

\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta)-E\bigl(f(Y_{n,i},\theta)\bigr)\Bigr]=\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta)-E\bigl(f(Y_{n,i},\theta)\mid\mathcal{C}_{n}\bigr)\Bigr]
+\frac{1}{n}\sum_{i\in N_{n}}\Bigl[E\bigl(f(Y_{n,i},\theta)\mid\mathcal{C}_{n}\bigr)-E\bigl(f(Y_{n,i},\theta)\bigr)\Bigr].

The triangle inequality yields

\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta)-E\bigl(f(Y_{n,i},\theta)\bigr)\Bigr]\right|\leq\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta)-E\bigl(f(Y_{n,i},\theta)\mid\mathcal{C}_{n}\bigr)\Bigr]\right|
+\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_{n}}\Bigl[E\bigl(f(Y_{n,i},\theta)\mid\mathcal{C}_{n}\bigr)-E\bigl(f(Y_{n,i},\theta)\bigr)\Bigr]\right|.

By Theorem 1, the first term on the right-hand side converges to 0 almost surely under Assumptions 1, 2, 3, 4, 5, and 6.

It remains to show that

\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_{n}}\Bigl[E\bigl(f(Y_{n,i},\theta)\mid\mathcal{C}_{n}\bigr)-E\bigl(f(Y_{n,i},\theta)\bigr)\Bigr]\right|\to 0\quad\text{a.s.}\qquad(2)

(Note that identical distribution is not assumed.) For each $n$, $i$, and $\theta\in\Theta$, the law of iterated expectations gives

E\Bigl[E\bigl(f(Y_{n,i},\theta)\mid\mathcal{C}_{n}\bigr)\Bigr]=E\bigl(f(Y_{n,i},\theta)\bigr).

Thus, the difference

d_{n,i}(\theta)=E\bigl(f(Y_{n,i},\theta)\mid\mathcal{C}_{n}\bigr)-E\bigl(f(Y_{n,i},\theta)\bigr)

has mean zero. For notational convenience, write the sum as

S_{n}(\theta):=\sum_{i=1}^{n}d_{n,i}(\theta).

By Assumption 4, for any $\delta>0$, there exists a $\delta$-net $\{\theta_{1},\dots,\theta_{J}\}\subset\Theta$ with $J=J(\delta)\lesssim\delta^{-d}$ such that for every $\theta\in\Theta$, there exists some $\theta_{j}$ with

\|\theta-\theta_{j}\|\leq\delta.

For any $\theta\in\Theta$, let $\theta_{j}(\theta)\in\{\theta_{1},\dots,\theta_{J}\}$ be an element of the $\delta$-net satisfying $\|\theta-\theta_{j}(\theta)\|\leq\delta$. Then, we can decompose $S_{n}(\theta)$ as

S_{n}(\theta)=S_{n}(\theta_{j}(\theta))+\Bigl(S_{n}(\theta)-S_{n}(\theta_{j}(\theta))\Bigr).

The triangle inequality yields

\sup_{\theta\in\Theta}\bigl|S_{n}(\theta)\bigr|\leq\max_{1\leq j\leq J}\bigl|S_{n}(\theta_{j})\bigr|+\sup_{\theta\in\Theta}\Bigl|S_{n}(\theta)-S_{n}(\theta_{j}(\theta))\Bigr|.\qquad(3)

By Assumption 6, for each $i$, we have

\bigl|d_{n,i}(\theta)-d_{n,i}(\theta_{j}(\theta))\bigr|\leq 2\overline{L}\|\theta-\theta_{j}(\theta)\|\leq 2\overline{L}\delta.

Summing over $i$, it follows that

\Bigl|S_{n}(\theta)-S_{n}(\theta_{j}(\theta))\Bigr|\leq\sum_{i=1}^{n}\bigl|d_{n,i}(\theta)-d_{n,i}(\theta_{j}(\theta))\bigr|\leq 2n\overline{L}\delta.\qquad(4)

Thus, (3) and (4) yield

\sup_{\theta\in\Theta}\bigl|S_{n}(\theta)\bigr|\leq\max_{1\leq j\leq J}\bigl|S_{n}(\theta_{j})\bigr|+2n\overline{L}\delta.\qquad(5)

Lemma 1 under Assumptions 1, 3′, 5′, and 7, in which $A$, $C$, $c_{1}$, $c_{2}$, $\eta$, $L$, $p$, and $R$ are all constant across all $\theta\in\Theta$, provides a universal constant $K$ such that, for each $\theta_{j}$ in the $\delta$-net $\{\theta_{1},\ldots,\theta_{J}\}$,

E\Bigl[\bigl|S_{n}(\theta_{j})\bigr|^{p}\,\Big|\,\mathcal{C}_{n}\Bigr]\leq K\,n^{p\beta},

where $\beta=\max\{1-\beta_{1}/2,\,\beta_{2}\}$ with

\beta_{1}=2\left(\frac{1}{p}+\frac{d}{p^{2}-1}\right)+\eta\quad\text{and}\quad\beta_{2}=1-\frac{1}{p}-\frac{d}{p^{2}-1}-\eta.

Since the $\delta$-net is finite with $J=J(\delta)$ points,

E\Bigl[\max_{1\leq j\leq J}\bigl|S_{n}(\theta_{j})\bigr|^{p}\,\Big|\,\mathcal{C}_{n}\Bigr]\leq J(\delta)\,K\,n^{p\beta}.\qquad(6)

Combine (5) and (6) to bound the $p$-th moment of the supremum:

E\Bigl[\sup_{\theta\in\Theta}\bigl|S_{n}(\theta)\bigr|^{p}\,\Big|\,\mathcal{C}_{n}\Bigr]\leq 2^{p-1}\,E\Bigl[\max_{1\leq j\leq J}\bigl|S_{n}(\theta_{j})\bigr|^{p}\,\Big|\,\mathcal{C}_{n}\Bigr]+2^{p-1}\,\bigl(2n\overline{L}\delta\bigr)^{p}
\leq 2^{p-1}\,J(\delta)\,K\,n^{p\beta}+2^{2p-1}\,\overline{L}^{p}\,n^{p}\,\delta^{p}.

Recall that $J(\delta)\lesssim\delta^{-d}$, and choose $\delta=n^{-p/(p^{2}-1)}$ to get

E\Bigl[\sup_{\theta\in\Theta}\bigl|S_{n}(\theta)\bigr|^{p}\,\Big|\,\mathcal{C}_{n}\Bigr]\lesssim n^{p(\beta+d/(p^{2}-1))}+n^{p(1-p/(p^{2}-1))}.

The law of iterated expectations yields

E\Bigl[\sup_{\theta\in\Theta}\bigl|S_{n}(\theta)\bigr|^{p}\Bigr]\lesssim n^{p(\beta+d/(p^{2}-1))}+n^{p(1-p/(p^{2}-1))}.

Divide both sides by $n^{p}$ to obtain

E\left[\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i=1}^{n}d_{n,i}(\theta)\right|^{p}\right]\lesssim n^{p(\beta+d/(p^{2}-1)-1)}+n^{-p^{2}/(p^{2}-1)}.

Now, let $\epsilon>0$ be arbitrary. By Markov’s inequality,

P\Biggl(\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i=1}^{n}d_{n,i}(\theta)\right|>\epsilon\Biggr)\lesssim\epsilon^{-p}\left(n^{p(\beta+d/(p^{2}-1)-1)}+n^{-p^{2}/(p^{2}-1)}\right).\qquad(7)

Since

\beta_{1}>2\left(\frac{1}{p}+\frac{d}{p^{2}-1}\right)\quad\text{and}\quad\beta_{2}<1-\frac{1}{p}-\frac{d}{p^{2}-1}

by the definitions of $\beta_{1}$ and $\beta_{2}$, it follows that

p\left(\beta+\frac{d}{p^{2}-1}-1\right)=p\left(\max\{1-\beta_{1}/2,\,\beta_{2}\}+\frac{d}{p^{2}-1}-1\right)<-1

and, therefore, the series

\sum_{n=1}^{\infty}\epsilon^{-p}\left(n^{p(\beta+d/(p^{2}-1)-1)}+n^{-p^{2}/(p^{2}-1)}\right)

(obtained by taking the sum of the right-hand side of (7) over $n\in\mathbb{N}$) converges. Hence, by the Borel–Cantelli lemma, for every $\epsilon>0$,

P\Biggl(\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i=1}^{n}d_{n,i}(\theta)\right|>\epsilon\text{ infinitely often}\Biggr)=0.

Because ϵ>0\epsilon>0 was arbitrary, it follows that

\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i=1}^{n}d_{n,i}(\theta)\right|\to 0\quad\text{a.s.},

showing that (2) holds.

Finally, combining Theorem 1 and the convergence (2) just established yields

\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i\in N_{n}}\Bigl[f(Y_{n,i},\theta)-E\bigl(f(Y_{n,i},\theta)\bigr)\Bigr]\right|\to 0\quad\text{a.s.}

as claimed in the corollary. ∎

A.3 Proof of Proposition 1

Proof.

I verify the four conditions of Newey and McFadden (NM, 1994, Theorem 2.1). Assumption 8 implies condition (i) in NM. Assumption 4 implies condition (ii) in NM. Condition (iii) in NM is directly assumed in the statement of the proposition. Corollary 1 under Assumptions 1, 2, 3′, 4, 5′, 6, and 7 implies condition (iv) in NM. Therefore, the claim of the proposition follows by Theorem 2.1 of NM. ∎

A.4 Proof of Proposition 2

Proof.

I verify the four conditions of Newey and McFadden (NM, 1994, Theorem 2.1). Assumption 9 and the positive definiteness of $W$ imply condition (i) in NM. Assumption 4 implies condition (ii) in NM. Condition (iii) in NM is directly assumed in the statement of the proposition. Since $W_{n}\stackrel{p}{\rightarrow}W$ where $W$ is positive definite, Corollary 1 under Assumptions 1, 2, 3′, 4, 5′, 6, and 7 implies condition (iv) in NM. Therefore, the claim of the proposition follows by Theorem 2.1 of NM. ∎

Appendix B An Auxiliary Lemma: A Maximal Inequality for Conditionally $\psi$-Dependent-Type Processes

This section presents a novel maximal inequality for conditionally $\psi$-dependent processes, which is used in the proof of Corollary 1. Since this inequality itself is potentially useful for future research on network analysis outside of the scope of the current paper, I state a self-contained set of assumptions here separately from those in the main text for general applicability.

Let $\{X_{n,i}\}_{i\in N_{n}}$ be a triangular array, and consider the following set of assumptions with the basic notation inherited from Section 2 in the main text.


Assumption 10 (Zero Mean and Boundedness).


There exists $M<\infty$ such that $E[X_{n,i}\mid\mathcal{C}_{n}]=0$ and $|X_{n,i}|\leq M$ for all $i$ a.s.


Assumption 11 (Decay Rate).


(i) There exists $\psi<\infty$ such that $\bigl|\operatorname{Cov}\bigl(X_{n,i},X_{n,i^{\prime}}\mid\mathcal{C}_{n}\bigr)\bigr|\leq\psi\,\vartheta_{n,d_{n}(i,i^{\prime})}$ for all $i\neq i^{\prime}$ a.s.
(ii) There exist $A>0$ and an integer $p>2$ such that $\vartheta_{n,s}\leq As^{-p/(p-1)}$ for all $s>0$ a.s.


Assumption 12 (Network Sparsity).


There exist constants $C$, $c_{1}$, $c_{2}$, $\beta_{1}$, and $\beta_{2}$ such that one can find a partition of $N_{n}$ into $J_{n}$ equally sized blocks $\{I_{1},\ldots,I_{J_{n}}\}$, with each size denoted by $b_{n}$, satisfying $c_{1}n^{\beta_{1}}\leq b_{n}\leq c_{2}n^{\beta_{2}}$ and

\min_{j\in\{1,\ldots,J_{n}\}}\;\min_{\substack{i,i^{\prime}\in I_{j}\\ i\neq i^{\prime}}}d_{n}(i,i^{\prime})\geq Cb_{n}

for all nn that are sufficiently large.


The last assumption may be interpreted as a network sparsity condition, and Assumption 7 in the main text is a special case of Assumption 12 with $\beta_{1}=2/(p-\delta)$ and $\beta_{2}=1-1/(p-\delta)$. As discussed below Assumption 7 in the main text, it facilitates the blocking strategy in the proof of the maximal inequality:


Lemma 1 (Maximal Inequality for Conditionally $\psi$-Dependent Processes).

Suppose that Assumptions 10, 11, and 12 hold. Then, there exists a constant $K<\infty$ (depending on $A$, $C$, $M$, $p$, and $\psi$) such that, for $\beta=\max\{1-\beta_{1}/2,\beta_{2}\}$,

E\Biggl[\max_{1\leq k\leq n}\Bigl|\sum_{i=1}^{k}X_{n,i}\Bigr|^{p}\,\Big|\,\mathcal{C}_{n}\Biggr]\leq K\,n^{p\beta}\quad\text{a.s.}

Proof of Lemma 1.

The proof of the lemma consists of four steps. Throughout the proof, I will drop the qualification “a.s.” for ease of writing.


Step 1 (Blocking)
Let $\{I_{1},\ldots,I_{J_{n}}\}$ be a partition of $N_{n}$ satisfying Assumption 12. Define the block sum

S_{j}^{*}=\sum_{i\in I_{j}}X_{n,i}

for each $j\in\{1,\ldots,J_{n}\}$.


Step 2 (Bounding Block Sums)
In this step, I show that there exists a constant $B$ (depending on $A$, $C$, $M$, $p$, and $\psi$) such that for each block index $j\in\{1,\ldots,J_{n}\}$,

E\Bigl[|S_{j}^{*}|^{p}\,\Big|\,\mathcal{C}_{n}\Bigr]\leq B\,b_{n}^{p/2}.\qquad(8)

I begin by bounding each term that arises when the $p$-th power on the left-hand side of (8) is expanded. Applying Hölder’s inequality iteratively yields

E\left[\prod_{1\leq k<k^{\prime}\leq p}|X_{n,i_{k}}X_{n,i_{k^{\prime}}}|^{1/p}\,\Big|\,\mathcal{C}_{n}\right]\leq\left(\prod_{1\leq k<k^{\prime}\leq p}E\Bigl[|X_{n,i_{k}}X_{n,i_{k^{\prime}}}|\,\Big|\,\mathcal{C}_{n}\Bigr]\right)^{1/p}.

Hence, for each $p$-tuple $(i_{1},\dots,i_{p})\in I_{j}^{p}$,

\Bigl|E\Bigl[X_{n,i_{1}}\cdots X_{n,i_{p}}\,\Big|\,\mathcal{C}_{n}\Bigr]\Bigr|\leq\left(M^{p}\prod_{1\leq k<k^{\prime}\leq p}E\Bigl[|X_{n,i_{k}}X_{n,i_{k^{\prime}}}|\,\Big|\,\mathcal{C}_{n}\Bigr]\right)^{1/p}\qquad(9)

holds under Assumption 10. Now, I branch into two types of such terms, labeled (a) and (b) below.


(a) (Indices Are Fully Distinct)


First, consider a term indexed by a $p$-tuple $(i_{1},\ldots,i_{p})\in I_{j}^{p}$ where all of the $p$ indices are distinct. Under Assumption 11 (i), the bound (9) can in turn be bounded above as

\Bigl|E\Bigl[X_{n,i_{1}}\cdots X_{n,i_{p}}\,\Big|\,\mathcal{C}_{n}\Bigr]\Bigr|\leq M\psi^{(p-1)/2}\,\prod_{1\leq k<k^{\prime}\leq p}\vartheta_{n,d_{n}(i_{k},i_{k^{\prime}})}^{1/p}

for such a term. Summing over all fully distinct $p$-tuples gives

\sum_{\substack{(i_{1},\dots,i_{p})\in I_{j}^{p}\\ \text{all distinct}}}\Bigl|E\Bigl[X_{n,i_{1}}\cdots X_{n,i_{p}}\,\Big|\,\mathcal{C}_{n}\Bigr]\Bigr|\leq M\psi^{(p-1)/2}\,\sum_{\substack{(i_{1},\dots,i_{p})\in I_{j}^{p}\\ \text{all distinct}}}\prod_{1\leq k<k^{\prime}\leq p}\vartheta_{n,d_{n}(i_{k},i_{k^{\prime}})}^{1/p},\qquad(10)

where the total number of fully distinct tuples in $I_{j}^{p}$ (i.e., the number of summands in (10)) is

\frac{b_{n}!}{(b_{n}-p)!}\sim O(b_{n}^{p}).

Under Assumptions 11 (ii) and 12, the typical product (i.e., the summand on the right-hand side of (10))

\prod_{1\leq k<k^{\prime}\leq p}\vartheta_{n,d_{n}(i_{k},i_{k^{\prime}})}^{1/p}

is bounded above by

A^{(p-1)/2}C^{-p/2}b_{n}^{-p/2}.

Therefore, the overall off-diagonal contribution (10) is of order

O\bigl(b_{n}^{p}\cdot b_{n}^{-p/2}\bigr)=O(b_{n}^{p/2}).

This shows that the sum of the off-diagonal terms is bounded by a constant times $b_{n}^{p/2}$.


(b) (Indices Are Not Fully Distinct)


Next, consider a term indexed by a $p$-tuple $(i_{1},\ldots,i_{p})\in I_{j}^{p}$ where not all of the $p$ indices are distinct. Let $d\leq p-1$ denote the number of distinct indices in such $(i_{1},\dots,i_{p})$. For this type of term, the bound (9) can in turn be bounded above as

\Bigl|E\Bigl[X_{n,i_{1}}\cdots X_{n,i_{p}}\,\Big|\,\mathcal{C}_{n}\Bigr]\Bigr|\leq M^{p}\left(\prod_{1\leq k<k^{\prime}\leq d}E\Bigl[|X_{n,i_{k}}X_{n,i_{k^{\prime}}}|\,\Big|\,\mathcal{C}_{n}\Bigr]\right)^{1/p}
\leq M^{p}\psi^{\frac{d(d-1)}{2p}}\,\prod_{1\leq k<k^{\prime}\leq d}\vartheta_{n,d_{n}(i_{k},i_{k^{\prime}})}^{1/p}

under Assumptions 10 and 11 (i). The product

\prod_{1\leq k<k^{\prime}\leq d}\vartheta_{n,d_{n}(i_{k},i_{k^{\prime}})}^{1/p}

on the right-hand side of the last display is bounded above by

A^{\frac{d(d-1)}{2p}}C^{-\frac{d(d-1)}{2(p-1)}}b_{n}^{-\frac{d(d-1)}{2(p-1)}}

under Assumptions 11 (ii) and 12. The total number of such terms with $d$ distinct indices is

\frac{b_{n}!}{(b_{n}-d)!}\sim O(b_{n}^{d}).

Therefore, the overall contribution from terms with $d$ distinct indices is of order

O\bigl(b_{n}^{d}\cdot b_{n}^{-\frac{d(d-1)}{2(p-1)}}\bigr)=O\bigl(b_{n}^{\frac{p}{2}-\frac{(p-d)(p-d-1)}{2(p-1)}}\bigr),\text{ which is }O\bigl(b_{n}^{p/2}\bigr),

where $d\leq p-1$ is used to obtain the last bound. Since this bound is the same for all $d\in\{1,\ldots,p-1\}$, it shows that the sum of all the terms with repeated indices is bounded above by a constant times $b_{n}^{p/2}$.

Combining the bounds from (a) and (b), each of order $O(b_{n}^{p/2})$, yields

E\Bigl[\Bigl|\sum_{i\in I_{j}}X_{n,i}\Bigr|^{p}\,\Big|\,\mathcal{C}_{n}\Bigr]\leq B\,b_{n}^{p/2},

for a constant $B$ that depends on $A$, $C$, $M$, $p$, and $\psi$. This establishes (8).


Step 3 (Summing over Blocks)
Now, define the cumulative block sums

T_{j}=\sum_{j^{\prime}=1}^{j}S_{j^{\prime}}^{*},\quad j=1,\ldots,J_{n}.

For every $j\in\{1,\ldots,J_{n}\}$,

|T_{j}|\leq\sum_{j^{\prime}=1}^{J_{n}}|S_{j^{\prime}}^{*}|,

and hence

\max_{1\leq j\leq J_{n}}|T_{j}|^{p}\leq\left(\sum_{j=1}^{J_{n}}|S_{j}^{*}|\right)^{p}.

Then, applying the generic inequality

\Biggl(\sum_{j=1}^{J_{n}}a_{j}\Biggr)^{p}\leq J_{n}^{p-1}\sum_{j=1}^{J_{n}}a_{j}^{p}\quad\text{for }a_{j}\geq 0,

with $a_{j}=|S_{j}^{*}|$ yields

\max_{1\leq j\leq J_{n}}|T_{j}|^{p}\leq J_{n}^{p-1}\sum_{j=1}^{J_{n}}|S_{j}^{*}|^{p}.

Taking conditional expectation gives

E\Bigl[\max_{1\leq j\leq J_{n}}|T_{j}|^{p}\,\Big|\,\mathcal{C}_{n}\Bigr]\leq J_{n}^{p-1}\sum_{j=1}^{J_{n}}E\Bigl[|S_{j}^{*}|^{p}\,\Big|\,\mathcal{C}_{n}\Bigr].

By the bound (8) established in the previous step,

\sum_{j=1}^{J_{n}}E\Bigl[|S_{j}^{*}|^{p}\,\Big|\,\mathcal{C}_{n}\Bigr]\leq J_{n}\,B\,b_{n}^{p/2}.

Therefore,

E\Bigl[\max_{1\leq j\leq J_{n}}|T_{j}|^{p}\,\Big|\,\mathcal{C}_{n}\Bigr]\leq J_{n}^{p}Bb_{n}^{p/2}=Bn^{p}b_{n}^{-p/2}.

Step 4 (Extension to All Indices)
For any natural number $k$ with $1\leq k\leq n$, there exist $j(k)\in\{1,\ldots,J_{n}\}$ and $\mathcal{J}(k)\subset\{1,\ldots,J_{n}\}$ such that $j(k)\not\in\mathcal{J}(k)$ and $\{1,\ldots,k\}=(\cup_{j\in\mathcal{J}(k)}I_{j})\cup\{i\in I_{j(k)}:i\leq k\}$. Then, the partial sum can be bounded by two components:

\left|\sum_{i=1}^{k}X_{n,i}\right|\leq\left|\sum_{j\in\mathcal{J}(k)}S_{j}^{*}\right|+\left|\sum_{\substack{i\in I_{j(k)}\\ i\leq k}}X_{n,i}\right|.

By Assumption 10, the second term on the right-hand side is bounded by

\left|\sum_{\substack{i\in I_{j(k)}\\ i\leq k}}X_{n,i}\right|\leq M\,b_{n}.

Thus,

\max_{1\leq k\leq n}\left|\sum_{i=1}^{k}X_{n,i}\right|\leq\max_{1\leq j\leq J_{n}}|T_{j}|+M\,b_{n}.

Taking the $p$-th power and then the conditional expectation yields

E\Biggl[\max_{1\leq k\leq n}\left|\sum_{i=1}^{k}X_{n,i}\right|^{p}\,\Big|\,\mathcal{C}_{n}\Biggr]\leq 2^{p-1}\,E\Bigl[\max_{1\leq j\leq J_{n}}|T_{j}|^{p}\,\Big|\,\mathcal{C}_{n}\Bigr]+2^{p-1}(M\,b_{n})^{p}.

Therefore, by Step 3, there exists a constant $K$ depending on $A$, $C$, $c_{1}$, $c_{2}$, $M$, $p$, and $\psi$ such that

E\left[\max_{1\leq k\leq n}\left|\sum_{i=1}^{k}X_{n,i}\right|^{p}\,\Big|\,\mathcal{C}_{n}\right]\leq 2^{p-1}Bn^{p}b_{n}^{-p/2}+2^{p-1}M^{p}b_{n}^{p}
\leq Kn^{p\beta}

under Assumption 12, as claimed in the lemma. ∎