
A Regression Framework for Studying Relationships among Attributes under Network Interference

Cornelius Fritz
Michael Schweinberger
Subhankar Bhadra
David R. Hunter
Department of Statistics, The Pennsylvania State University
Corresponding author.
Abstract

To understand how the interconnected and interdependent world of the twenty-first century operates and make model-based predictions, joint probability models for networks and interdependent outcomes are needed. We propose a comprehensive regression framework for networks and interdependent outcomes with multiple advantages, including interpretability, scalability, and provable theoretical guarantees. The regression framework can be used for studying relationships among attributes of connected units and captures complex dependencies among connections and attributes, while retaining the virtues of linear regression, logistic regression, and other regression models by being interpretable and widely applicable. On the computational side, we show that the regression framework is amenable to scalable statistical computing based on convex optimization of pseudo-likelihoods using minorization-maximization methods. On the theoretical side, we establish convergence rates for pseudo-likelihood estimators based on a single observation of dependent connections and attributes. We demonstrate the regression framework using simulations and an application to hate speech on the social media platform X.

Keywords: Dependent Data, Generalized Linear Models, Minorization-Maximization, Pseudo-Likelihood

1 Introduction

In the interconnected and interdependent world of the twenty-first century, individual and collective outcomes—such as personal and public health, economic welfare, or war and peace—are affected by relationships among individual, corporate, state, and non-state actors. To understand how the world of the twenty-first century operates and make model-based predictions, it is vital to study networks of relationships and gain insight into how the structure of networks affects individual and collective outcomes.

While the structure of networks has been widely studied (see Kolaczyk, 2017, and references therein), the structure of networks is rarely of primary interest. Instead, we often wish to understand how networks affect individual or collective outcomes. For example, social, economic, and financial relationships among individual and corporate actors can affect the welfare of people, but the outcome of primary interest is the welfare of billions of people around the world. Relationships among state and non-state actors can affect war and peace, but the outcome of primary interest is the welfare of nations. Contact networks mediate the spread of infectious diseases, but the outcome of primary interest is public health. A final example is causal inference under network interference: If the outcomes of units are affected by the treatments or outcomes of other units, the spillover effect of treatments on outcomes can be represented by an intervention network, but the target of statistical inference is the direct and indirect causal effects of treatments on outcomes.

To learn how networks are wired and how the structure of networks affects outcomes of interest, data on outcomes 𝒀(Yi)i=1N\bm{Y}\coloneqq(Y_{i})_{i=1}^{N} and connections 𝒁(Zi,j)i,jN\bm{Z}\coloneqq(Z_{i,j})_{i,j}^{N} among NN units are needed along with predictors 𝑿(𝑿i)i=1N\bm{X}\coloneqq(\bm{X}_{i})_{i=1}^{N}. Statistical work on joint probability models for dependent outcomes and connections (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x} is scarce. Snijders et al. (2007) and Niezink and Snijders (2017) develop models for behavioral outcomes and connections using continuous-time Markov processes, assuming that the behavioral outcomes and connections are observed at two or more time points. Wang et al. (2024) combine Ising models for binary outcomes with exponential family models for binary connections, with applications to causal inference (Clark and Handcock, 2024). In a Bayesian framework, Fosdick and Hoff (2015) unite models for continuous outcomes with latent variable models that capture dependencies among connections. A common feature of these approaches is that the models and methods in these works may be useful in small populations with, say, hundreds of members, but may be less useful in large populations with, say, thousands or millions of members. For example, many of these models make dependence assumptions that are reasonable in small populations but are less reasonable in large populations. In the special case of exponential-family models, it is known that models that make unreasonable dependence assumptions can give rise to undesirable probabilistic and statistical behavior in large populations, such as model near-degeneracy (Handcock, 2003; Schweinberger, 2011; Chatterjee and Diaconis, 2013). In addition, these works rely on Monte Carlo and Markov chain Monte Carlo methods for moment- and likelihood-based inference, which limits the scalability of the mentioned approaches. Last, but not least, the theoretical properties of statistical procedures based on dependent outcomes and connections (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}, such as the convergence rates of estimators, are unknown.

While statistical work on joint probability models for (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x} is scarce, recent progress has been made on conditional models for outcomes 𝒀(𝑿,𝒁)=(𝒙,𝒛)\bm{Y}\mid(\bm{X},\bm{Z})=(\bm{x},\bm{z}) and connections 𝒁𝑿=𝒙\bm{Z}\mid\bm{X}=\bm{x}. For example, the literature on network-aware regression uses conditional models for outcomes 𝒀(𝑿,𝒁)=(𝒙,𝒛)\bm{Y}\mid(\bm{X},\bm{Z})=(\bm{x},\bm{z}): Lei et al. (2024) assume that the dependence among outcomes decays as a function of distance in the population network, while Li et al. (2019) and Le and Li (2022) encourage outcomes of connected units to be similar. A related branch of literature, concerned with causal inference under network interference, leverages conditional models for outcomes 𝒀(𝑿,𝒁)=(𝒙,𝒛)\bm{Y}\mid(\bm{X},\bm{Z})=(\bm{x},\bm{z}) given treatment assignments 𝑿=𝒙\bm{X}=\bm{x} and connections 𝒁=𝒛\bm{Z}=\bm{z}. Some of them consider fixed connections (e.g., Tchetgen Tchetgen et al., 2021; Ogburn et al., 2024), while others combine conditional models for 𝒀(𝑿,𝒁)=(𝒙,𝒛)\bm{Y}\mid(\bm{X},\bm{Z})=(\bm{x},\bm{z}) with marginal models for 𝒁\bm{Z}, assuming that connections are independent (Li and Wager, 2022). Other works advance autoregressive network models for 𝒀(𝑿,𝒁)=(𝒙,𝒛)\bm{Y}\mid(\bm{X},\bm{Z})=(\bm{x},\bm{z}) (Huang et al., 2019, 2020; Zhu et al., 2020). Conditional models for connections 𝒁𝑿=𝒙\bm{Z}\mid\bm{X}=\bm{x} include stochastic block and exponential-family models with covariates (e.g., Handcock, 2003; Huang et al., 2024; Wang et al., 2024; Stein et al., 2025).

All of the cited work is limited to special cases, such as real-valued outcomes or binary connections, rather than presenting a comprehensive regression framework for studying relationships among attributes (𝑿,𝒀)(\bm{X},\bm{Y}) under network interference 𝒁\bm{Z}. To fill the void left by existing work, we propose a comprehensive regression framework for studying relationships among attributes (𝑿,𝒀)(\bm{X},\bm{Y}) under network interference 𝒁\bm{Z} based on joint probability models for (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}. The proposed regression framework has important advantages over existing work, including interpretability, scalability, and provable theoretical guarantees:

  1.

    We show in Sections 2.1 and 2.2 that the proposed regression framework can be viewed as a generalization of linear regression, logistic regression, and other regression models for studying relationships among attributes under network interference, adding a simple and widely applicable set of tools to the toolbox of data scientists. We demonstrate the advantages of the regression framework with an application to hate speech on the social media platform X in Section 6.

  2.

    The proposed regression framework can be applied to small and large populations by leveraging additional structure to control the dependence among outcomes and connections, facilitating the construction of models with complex dependencies among outcomes and connections in small and large populations.

  3.

    We develop scalable methods using minorization-maximization algorithms for convex optimization of pseudo-likelihoods in Section 3. To disseminate the regression framework and its scalable methods, we provide an R package.

  4.

    We establish theoretical guarantees for pseudo-likelihood estimators in Section 4. To the best of our knowledge, these are the first theoretical guarantees for joint probability models of (\bm{Y},\bm{Z})\mid\bm{X}=\bm{x} based on a single observation (\bm{y},\bm{z}) of (\bm{Y},\bm{Z}). The simulation results in Section 5 demonstrate that pseudo-likelihood estimators perform well as the number of units N and the number of parameters p increase.

In addition, the regression framework has conceptual and statistical advantages:

  5.

    Compared with conditional models for outcomes 𝒀(𝑿,𝒁)=(𝒙,𝒛)\bm{Y}\mid(\bm{X},\,\bm{Z})=(\bm{x},\,\bm{z}) and connections 𝒁𝑿=𝒙\bm{Z}\mid\bm{X}=\bm{x}, the proposed regression framework for (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x} provides insight into outcome-connection dependencies, in addition to outcome-outcome and connection-connection dependencies.

  6.

    Compared with conditional models for outcomes 𝒀(𝑿,𝒁)=(𝒙,𝒛)\bm{Y}\mid(\bm{X},\,\bm{Z})=(\bm{x},\,\bm{z}), conclusions based on the proposed regression framework are not limited to a specific population network 𝒛\bm{z}, but can be extended to the superpopulation of all possible population networks. In addition, the proposed regression framework provides insight into the probability law governing the superpopulation of all possible population networks.

  7.

    The proposed regression framework retains the advantages of two general approaches to building joint probability models for dependent data, elucidated in the celebrated paper by Besag (1974): Specifying a joint probability distribution directly guarantees desirable mathematical properties, while specifying it indirectly via conditional probability distributions helps build complex models from simple building blocks. We show how to directly specify a joint probability model from simple building blocks. The resulting regression framework possesses desirable mathematical properties and induces conditional distributions that can be represented by regression models, facilitating interpretation. We showcase these advantages in Sections 2.2 and 6.2.

We elaborate the proposed regression framework in the remainder of the article.

2 Regression under Network Interference

Consider a population of N\geq 2 units \mathscr{P}_{N}\coloneqq\{1,\dots,N\}, where each unit i\in\mathscr{P}_{N} possesses

  • one or more binary, count-, or real-valued predictors \bm{X}_{i}\in\mathscr{X}_{i}, which may include covariates and treatment assignments;

  • binary, count-, or real-valued outcomes or responses Y_{i}\in\mathscr{Y}_{i};

  • binary, count-, or real-valued connections Z_{i,j}\in\mathscr{Z}_{i,j} to other units j\in\mathscr{P}_{N}\setminus\,\{i\}, which represent indicators of connections or weights of connections (e.g., the number of interactions between i and j).

We first consider undirected connections, for which Z_{i,j} equals Z_{j,i}, and describe extensions to directed connections in Section 6. We write \bm{X}\coloneqq(\bm{X}_{i})_{1\leq i\leq N},  \bm{Y}\coloneqq(Y_{i})_{1\leq i\leq N},  \bm{Z}\coloneqq(Z_{i,j})_{1\leq i<j\leq N},  \mathscr{X}\coloneqq\bigtimes_{i=1}^{N}\,\mathscr{X}_{i},  \mathscr{Y}\coloneqq\bigtimes_{i=1}^{N}\,\mathscr{Y}_{i},  and \mathscr{Z}\coloneqq\bigtimes_{i<j}^{N}\,\mathscr{Z}_{i,j}, and refer to \bm{Y} without Y_{i} and \bm{Z} without Z_{i,j} as \bm{Y}_{-i}\in\mathscr{Y}_{-i} and \bm{Z}_{-\{i,j\}}\in\mathscr{Z}_{-\{i,j\}}, respectively. In line with Generalized Linear Models (GLMs), we introduce a known scale parameter \psi\in(0,+\infty) and define Y_{i}^{\star}\coloneqq Y_{i}\,/\,\psi and \bm{Y}_{-i}^{\star}\coloneqq\bm{Y}_{-i}\,/\,\psi. Throughout, \mathbb{I}(\cdot) is an indicator function, which is 1 if its argument is true and is 0 otherwise. We write a_{N}=O(b_{N}) and a_{N}=o(b_{N}) to indicate that |a_{N}/b_{N}| remains bounded and \lim_{N\to\infty}|a_{N}/b_{N}|=0, respectively.

Following the bulk of the literature on regression models, we condition on predictors 𝑿=𝒙\bm{X}=\bm{x}. To construct joint probability models for dependent responses and connections (𝒀,𝒁)𝑿=𝒙(\bm{Y},\,\bm{Z})\mid\bm{X}=\bm{x}, we introduce a family of probability measures {𝜽,𝜽𝚯}\{\mathbb{P}_{\bm{\theta}},\,\bm{\theta}\in\bm{\Theta}\} dominated by a σ\sigma-finite measure ν\nu, with densities of the form

f𝜽(𝒚,𝒛𝒙)=1φ(𝜽)[i=1Na𝒴(yi)exp(𝜽ggi(𝒙i,yi))]×[i=1N1j=i+1Na𝒵(zi,j)exp(𝜽hhi,j(𝒙,yi,yj,𝒛))]:\begin{array}[]{llllllllll}f_{\bm{\theta}}(\bm{y},\,\bm{z}\mid\bm{x})&=&\dfrac{1}{\varphi(\bm{\theta})}\left[\displaystyle\prod\limits_{i=1}^{N}a_{\mathscr{Y}}(y_{i})\,\exp\left(\bm{\theta}_{g}^{\top}\,g_{i}(\bm{x}_{i},\,y_{i}^{\star})\right)\right]\vskip 7.11317pt\\ &\times&\left[\displaystyle\prod\limits_{i=1}^{N-1}\,\displaystyle\prod\limits_{j=i+1}^{N}a_{\mathscr{Z}}(z_{i,j})\,\exp\left(\bm{\theta}_{h}^{\top}\,h_{i,j}(\bm{x},\,y_{i}^{\star},\,y_{j}^{\star},\,\bm{z})\right)\right]:\end{array} (1)
  • a_{\mathscr{Y}}:\mathscr{Y}_{i}\mapsto[0,+\infty) and a_{\mathscr{Z}}:\mathscr{Z}_{i,j}\mapsto[0,+\infty) are known functions of responses Y_{i} of units i\in\mathscr{P}_{N} and connections Z_{i,j} of pairs of units \{i,j\}\subset\mathscr{P}_{N};

  • gi:𝒳i×𝒴iqg_{i}:\mathscr{X}_{i}\times\mathscr{Y}_{i}\mapsto\mathbb{R}^{q} are known functions describing the relationship of predictors 𝒙i\bm{x}_{i} and responses YiY_{i}  of  units i𝒫Ni\in\mathscr{P}_{N}, which can depend on ψ\psi;

  • hi,j:𝒳×𝒴i×𝒴j×𝒵rh_{i,j}:\mathscr{X}\times\mathscr{Y}_{i}\times\mathscr{Y}_{j}\times\mathscr{Z}\mapsto\mathbb{R}^{r} are known functions specifying how the responses and connections of pairs of units {i,j}𝒫N\{i,j\}\subset\mathscr{P}_{N} depend on the predictors, responses, and connections to other units, which can depend on ψ\psi;

  • 𝜽(𝜽g,𝜽h)𝚯\bm{\theta}\coloneqq(\bm{\theta}_{g},\,\bm{\theta}_{h})\in\bm{\Theta} is a parameter vector of dimension pq+rp\coloneqq q+r, where 𝚯{𝜽p:φ(𝜽)<}\bm{\Theta}\coloneqq\{\bm{\theta}\in\mathbb{R}^{p}:\varphi(\bm{\theta})<\infty\} and φ:𝚯(0,+]\varphi:\bm{\Theta}\mapsto(0,+\infty] ensures that 𝒴×𝒵f𝜽(𝒚,𝒛|𝒙)dν(𝒚,𝒛)=1\int_{\mathscr{Y}\times\mathscr{Z}}\,f_{\bm{\theta}}(\bm{y},\bm{z}\,|\,\bm{x})\mathop{\mbox{d}}\nolimits\nu(\bm{y},\bm{z})=1, with the dependence of φ\varphi on 𝒙\bm{x} suppressed;

  • ν\nu is a σ\sigma-finite product measure of the form

    ν(𝒚,𝒛)[i=1Nν𝒴(yi)][i=1N1j=i+1Nν𝒵(zi,j)],\begin{array}[]{llllllllll}\nu(\bm{y},\bm{z})&\coloneqq&\left[\displaystyle\prod\limits_{i=1}^{N}\nu_{\mathscr{Y}}(y_{i})\right]\left[\displaystyle\prod\limits_{i=1}^{N-1}\,\displaystyle\prod\limits_{j=i+1}^{N}\nu_{\mathscr{Z}}(z_{i,j})\right],\end{array}

    where ν𝒴\nu_{\mathscr{Y}} and ν𝒵\nu_{\mathscr{Z}} are σ\sigma-finite measures that depend on the support sets of responses YiY_{i} and connections Zi,jZ_{i,j} (e.g., Lebesgue or counting measure).

Remark: Importance of Additional Structure. To respect real-world constraints and facilitate theoretical guarantees, joint probability models for dependent responses and connections (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x} should leverage additional structure, e.g., local dependence structure. For one thing, units in large populations may not be aware of most other units in the population, so it is not credible that the responses and connections of units depend on the responses and connections of all other units in the population. In addition, models permitting strong dependence among the responses and connections of all units in the population may suffer from model near-degeneracy (Handcock, 2003; Schweinberger, 2011; Chatterjee and Diaconis, 2013). By contrast, Stewart and Schweinberger (2025) demonstrate that leveraging additional structure to control dependence can lead to theoretical guarantees. Motivated by these considerations, we assume that each unit i𝒫Ni\in\mathscr{P}_{N} has a known set of neighbors 𝒩i𝒫N\mathscr{N}_{i}\subset\mathscr{P}_{N}, which includes ii and is independent of connections 𝒁\bm{Z}, and that the dependence among responses and connections (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x} is local in the sense that it is limited to overlapping neighborhoods. We provide examples of joint probability models for (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x} with local dependence in Sections 2.2 and 6.1.

Remark: Fixed versus Random Design. In line with the bulk of the literature on regression models, we consider a fixed design: We view predictors 𝑿\bm{X} and neighborhoods 𝒩1,,𝒩N\mathscr{N}_{1},\ldots,\mathscr{N}_{N} as exogenous and known, and we do not make assumptions about the mechanism generating them. Thus, Equation (1) specifies the joint probability density function of responses and connections (𝒀,𝒁)(\bm{Y},\bm{Z}) conditional on predictors 𝑿=𝒙\bm{X}=\bm{x} and neighborhoods 𝒩1,,𝒩N\mathscr{N}_{1},\ldots,\mathscr{N}_{N}. If 𝑿\bm{X} and 𝒩1,,𝒩N\mathscr{N}_{1},\ldots,\mathscr{N}_{N} were random, the conditional model for (𝒀,𝒁)𝑿=𝒙,𝒩1,,𝒩N(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x},\,\mathscr{N}_{1},\ldots,\mathscr{N}_{N} could be combined with marginal models for 𝑿\bm{X} and 𝒩1,,𝒩N\mathscr{N}_{1},\ldots,\mathscr{N}_{N}. In the social media application in Section 6, the neighborhoods are fixed and known: The neighborhoods of users are the sets of followees, because users choose whom to follow and hence who can influence them, and these choices are observed. If the neighborhoods were unobserved, one could view them as unobserved constants (if neighborhoods were fixed) or unobserved variables (if neighborhoods were random) and learn them from attributes (𝑿,𝒀)(\bm{X},\bm{Y}) or connections 𝒁\bm{Z}. The problem of how to learn neighborhoods is an open problem and constitutes a promising avenue for future research.

2.1 GLM Representations

The proposed joint probability models of (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x} can be viewed as generalizations of Generalized Linear Models (GLMs) (Efron, 2022). GLMs form a well-known, interpretable, and widely applicable statistical framework for univariate responses Yi𝒴iY_{i}\in\mathscr{Y}_{i} given predictors 𝒙id\bm{x}_{i}\in\mathbb{R}^{d} (d1d\geq 1), including logistic regression (Yi{0,1}Y_{i}\in\{0,1\}), Poisson regression (Yi{0,1,}Y_{i}\in\{0,1,\ldots\}), and linear regression (YiY_{i}\in\mathbb{R}). GLMs are characterized by two properties:

  1.

    Conditional mean: The conditional mean μi(ηi)𝔼ηi(Yi𝒙i)\mu_{i}(\eta_{i})\coloneqq\mathbb{E}_{\eta_{i}}(Y_{i}\mid\bm{x}_{i}) of response Yi𝒴iY_{i}\in\mathscr{Y}_{i}, conditional on predictors 𝒙id\bm{x}_{i}\in\mathbb{R}^{d} with weights 𝜷d\bm{\beta}\in\mathbb{R}^{d}, is a (possibly nonlinear) function of a linear predictor ηi𝜷𝒙i\eta_{i}\coloneqq\bm{\beta}^{\top}\bm{x}_{i}.

  2.

    Conditional distribution: The conditional distribution of response YiY_{i} is an exponential family distribution with a known scale parameter ψ(0,+)\psi\in(0,\,+\infty), which admits a density with respect to a σ\sigma-finite measure ν𝒴\nu_{\mathscr{Y}} of the form

    fηi(yi𝒙i)a𝒴(yi)exp(ηiyibi(ηi)ψ),\begin{array}[]{llllllllll}f_{\eta_{i}}(y_{i}\mid\bm{x}_{i})&\coloneqq&a_{\mathscr{Y}}(y_{i})\,\exp\left(\dfrac{\eta_{i}\,y_{i}-b_{i}(\eta_{i})}{\psi}\right),\end{array}

    with cumulant-generating function

    bi(ηi)ψlog𝒴ia𝒴(y)exp(ηiyψ)dν𝒴(y).\begin{array}[]{llllllllll}b_{i}(\eta_{i})&\coloneqq&\psi\,\log\displaystyle\int\limits_{\mathscr{Y}_{i}}\,a_{\mathscr{Y}}(y)\,\exp\left(\dfrac{\eta_{i}\,y}{\psi}\right)\mathop{\mbox{d}}\nolimits\nu_{\mathscr{Y}}(y).\end{array}

    The conditional mean \mu_{i}(\eta_{i}) can be obtained by differentiating b_{i}(\eta_{i}): \mu_{i}(\eta_{i})=\nabla_{\eta_{i}}\,b_{i}(\eta_{i}) (Corollary 2.3, Brown, 1986, pp. 35–36).

The relationship to GLMs facilitates the interpretation and dissemination of results. The following proposition clarifies the relationship to GLMs.

Proposition 1: GLM Representation of Conditionals. Consider any pair of units {i,j}𝒫N\{i,j\}\subset\mathscr{P}_{N} (i<ji<j) and assume that gig_{i} and hi,jh_{i,j} are affine functions of yiy_{i}^{\star} for any given (𝐱,𝐲i,𝐳)𝒳×𝒴i×𝒵(\bm{x},\,\bm{y}_{-i},\,\bm{z})\in\mathscr{X}\times\mathscr{Y}_{-i}\times\mathscr{Z}, in the sense that there exist known functions gi,0:𝒳iqg_{i,0}:\mathscr{X}_{i}\mapsto\mathbb{R}^{q},  gi,1:𝒳iqg_{i,1}:\mathscr{X}_{i}\mapsto\mathbb{R}^{q},  hi,j,0:𝒳×𝒴j×𝒵rh_{i,j,0}:\mathscr{X}\times\mathscr{Y}_{j}\times\mathscr{Z}\mapsto\mathbb{R}^{r},  and hi,j,1:𝒳×𝒴j×𝒵rh_{i,j,1}:\mathscr{X}\times\mathscr{Y}_{j}\times\mathscr{Z}\mapsto\mathbb{R}^{r} such that

gi(𝒙i,yi)gi,0(𝒙i)+gi,1(𝒙i)yihi,j(𝒙,yi,yj,𝒛)hi,j,0(𝒙,yj,𝒛)+hi,j,1(𝒙,yj,𝒛)yi.\begin{array}[]{llllllllll}g_{i}(\bm{x}_{i},\,y_{i}^{\star})\;\coloneqq\;g_{i,0}(\bm{x}_{i})+g_{i,1}(\bm{x}_{i})\;y_{i}^{\star}\vskip 7.11317pt\\ h_{i,j}(\bm{x},\,y_{i}^{\star},\,y_{j}^{\star},\,\bm{z})\;\coloneqq\;h_{i,j,0}(\bm{x},\,y_{j}^{\star},\,\bm{z})+h_{i,j,1}(\bm{x},\,y_{j}^{\star},\,\bm{z})\;y_{i}^{\star}.\end{array}

Then the conditional distribution of response  Yi(𝐗,𝐘i,𝐙)=(𝐱,𝐲i,𝐳)Y_{i}\mid(\bm{X},\,\bm{Y}_{-i},\,\bm{Z})=(\bm{x},\,\bm{y}_{-i},\,\bm{z}) by unit ii can be represented by a GLM with linear predictor

ηi(𝜽;𝒙,𝒚i,𝒛)𝜽(gi,1(𝒙i),j𝒫N{i}hi,j,1(𝒙,yj,𝒛))\begin{array}[]{llllllllll}\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z})&\coloneqq&\bm{\theta}^{\top}\left(g_{i,1}(\bm{x}_{i}),\;\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,h_{i,j,1}(\bm{x},\,y_{j}^{\star},\,\bm{z})\right)\end{array}

and cumulant-generating function

bi(ηi(𝜽;𝒙,𝒚i,𝒛))ψlog𝒴ia𝒴(y)exp(ηi(𝜽;𝒙,𝒚i,𝒛)yψ)dν𝒴(y).\begin{array}[]{llllllllll}b_{i}(\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z}))&\coloneqq&\psi\,\log\displaystyle\int\limits_{\mathscr{Y}_{i}}\,a_{\mathscr{Y}}(y)\,\exp\left(\dfrac{\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z})\;y}{\psi}\right)\mathop{\mbox{d}}\nolimits\nu_{\mathscr{Y}}(y).\end{array}

To ease the notation, we henceforth write ηi\eta_{i} instead of  ηi(𝜽;𝒙,𝒚i,𝒛)\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z}).

Proposition 1 supplies a recipe for representing the conditional distribution of responses Y_{i}\mid(\bm{X},\,\bm{Y}_{-i},\,\bm{Z})=(\bm{x},\,\bm{y}_{-i},\,\bm{z}) by a GLM:

  1.

    Conditional distribution: The conditional distribution of response YiY_{i} is an exponential family distribution, which can be represented by a GLM with conditional mean μi(ηi)\mu_{i}(\eta_{i}), linear predictor ηi\eta_{i}, and scale parameter ψ\psi.

  2.

    Conditional mean: The conditional mean μi(ηi)𝔼ηi(Yi𝒙,𝒚i,𝒛)\mu_{i}(\eta_{i})\coloneqq\mathbb{E}_{\eta_{i}}(Y_{i}\mid\bm{x},\,\bm{y}_{-i},\,\bm{z}) can be obtained by differentiating bi(ηi)b_{i}(\eta_{i}): μi(ηi)=ηibi(ηi)\mu_{i}(\eta_{i})=\nabla_{\eta_{i}}\,b_{i}(\eta_{i}). Since the map ηiμi\eta_{i}\mapsto\mu_{i} is one-to-one and invertible (Theorem 3.6, Brown, 1986, p. 74),  ηi\eta_{i} can be obtained by inverting μi(ηi)\mu_{i}(\eta_{i}).

Thus, the proposed regression framework for dependent responses and connections (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x} inherits the GLM advantages of being interpretable and widely applicable, without assuming that responses or connections are independent. As a result, the proposed regression framework can be viewed as a generalization of GLMs.

2.2 Example: Model Specification

We showcase how a joint probability model for dependent responses and connections (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x} with local dependence can be constructed, leveraging additional structure in the form of overlapping neighborhoods 𝒩1,,𝒩N\mathscr{N}_{1},\ldots,\mathscr{N}_{N} to control the dependence among responses and connections in small and large populations.

We focus on units i𝒫Ni\in\mathscr{P}_{N} with binary, count-, or real-valued predictors xi𝒳ix_{i}\in\mathscr{X}_{i} and responses Yi𝒴iY_{i}\in\mathscr{Y}_{i} and binary connections Zi,j{0,1}Z_{i,j}\in\{0,1\}. Starting with gig_{i}, we capture the main effect of YiY_{i}^{\star} and the interaction effect of xix_{i} and YiY_{i}^{\star} by specifying gig_{i} as follows:

𝜽g(α𝒴β𝒳,𝒴)2,gi(yixiyi)2.\begin{array}[]{llllllllll}\bm{\theta}_{g}\,\coloneqq\,\left(\begin{array}[]{ccc}\alpha_{\mathscr{Y}}\\ \beta_{\mathscr{X},\mathscr{Y}}\end{array}\right)\,\in\,\mathbb{R}^{2},&g_{i}\,\coloneqq\,\left(\begin{array}[]{ccc}y_{i}^{\star}\\ x_{i}\,y_{i}^{\star}\end{array}\right)\,\in\,\mathbb{R}^{2}.\end{array} (2)

Turning to hi,jh_{i,j}, we define neighborhood-related terms

ci,j𝟙(𝒩i𝒩j),di,j(𝒛)𝟙(k𝒩i𝒩j:zi,k=zk,j=1).\begin{array}[]{llllllllll}c_{i,j}\coloneqq\mathbbm{1}(\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset),\quad d_{i,j}(\bm{z})\coloneqq\mathbbm{1}(\exists\;k\,\in\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,:\,z_{i,k}=z_{k,j}=1).\end{array} (3)

To capture unobserved heterogeneity in the propensities of units to form connections, we introduce the NN-vector 𝜶𝒵(α𝒵,1,,α𝒵,N)N\bm{\alpha}_{\mathscr{Z}}\coloneqq(\alpha_{\mathscr{Z},1},\ldots,\alpha_{\mathscr{Z},N})\in\mathbb{R}^{N}. In addition, we penalize connections among units ii and jj with non-overlapping neighborhoods and capture transitive closure along with treatment and outcome spillover by specifying hi,jh_{i,j} as follows:

𝜽h(𝜶𝒵λγ𝒵,𝒵γ𝒳,𝒴,𝒵γ𝒴,𝒴,𝒵)N+4,hi,j(𝒆i,jzi,j(1ci,j)zi,jlogNdi,j(𝒛)zi,jci,j(xiyj+xjyi)zi,jci,jyiyjzi,j)N+4,\begin{array}[]{llllllllll}\bm{\theta}_{h}\coloneqq\left(\begin{array}[]{ccc}\bm{\alpha}_{\mathscr{Z}}\\ \lambda\\ \gamma_{\mathscr{Z},\mathscr{Z}}\\ \gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\\ \gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\end{array}\right)\in\mathbb{R}^{N+4},&h_{i,j}\coloneqq\left(\begin{array}[]{ccc}\bm{e}_{i,j}\,z_{i,j}\\ -(1-c_{i,j})\,z_{i,j}\log N\\ d_{i,j}(\bm{z})\,z_{i,j}\\ c_{i,j}\,(x_{i}\,y_{j}^{\star}+x_{j}\,y_{i}^{\star})\,z_{i,j}\\ c_{i,j}\,y_{i}^{\star}\,y_{j}^{\star}\,z_{i,j}\end{array}\right)\in\mathbb{R}^{N+4},\end{array} (4)

where \bm{e}_{i,j} denotes the N-vector whose ith and jth coordinates are 1 and whose other coordinates are all 0. The parameters \alpha_{\mathscr{Z},1},\ldots,\alpha_{\mathscr{Z},N} can be interpreted as the propensities of units 1,\dots,N to form connections; \lambda>0 discourages connections among units with non-overlapping neighborhoods; \gamma_{\mathscr{Z},\mathscr{Z}} quantifies the tendency towards transitive closure among connections; and \gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}} and \gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}} capture treatment and outcome spillover, respectively. Sections 2.2.1 and 2.2.2 demonstrate that the interpretation of these effects is facilitated by the fact that the conditional distributions of Y_{i} and Z_{i,j} can be represented by GLMs.
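
The building blocks in Equation (3) are simple to compute. The following R sketch is illustrative rather than part of the accompanying R package: the input names neighborhoods and z are assumptions, and units k\in\{i,j\} are excluded when checking for a shared connected partner because z_{i,i} is not defined.

## Minimal R sketch (illustrative): `neighborhoods` is a list whose i-th element
## contains the indices in N_i (including i), and `z` is the N x N symmetric
## binary adjacency matrix with zero diagonal.

c_ij <- function(i, j, neighborhoods) {
  ## c_{i,j}: indicator that the neighborhoods of units i and j overlap
  as.integer(length(intersect(neighborhoods[[i]], neighborhoods[[j]])) > 0)
}

d_ij <- function(i, j, z, neighborhoods) {
  ## d_{i,j}(z): indicator that some k in N_i intersect N_j (other than i and j)
  ## is connected to both i and j, i.e., z_{i,k} = z_{k,j} = 1
  shared <- setdiff(intersect(neighborhoods[[i]], neighborhoods[[j]]), c(i, j))
  as.integer(any(z[i, shared] == 1 & z[shared, j] == 1))
}

## Example with the three units of Figure 1:
## neighborhoods <- list(c(1, 2), c(1, 2, 3), c(2, 3))
## z <- matrix(0, 3, 3); z[1, 2] <- z[2, 1] <- z[2, 3] <- z[3, 2] <- 1
## c_ij(1, 3, neighborhoods)     # 1: N_1 and N_3 share unit 2
## d_ij(1, 3, z, neighborhoods)  # 1: both 1 and 3 are connected to 2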

2.2.1 GLM Representation of Responses YiY_{i}

To interpret the model specified by Equations (2) and (4), we take advantage of the fact that the conditional distribution of response Yi(𝑿,𝒀i,𝒁)=(𝒙,𝒚i,𝒛)Y_{i}\mid(\bm{X},\,\bm{Y}_{-i},\,\bm{Z})=(\bm{x},\,\bm{y}_{-i},\,\bm{z}) by unit ii can be represented by a GLM with linear predictor

ηi=α𝒴+β𝒳,𝒴xi+γ𝒳,𝒴,𝒵j:𝒩i𝒩jxjzi,j+γ𝒴,𝒴,𝒵j:𝒩i𝒩jyjzi,j.\begin{array}[]{llllllllll}\eta_{i}&=&\alpha_{\mathscr{Y}}+\beta_{\mathscr{X},\mathscr{Y}}\;x_{i}+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j:\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset}\,x_{j}\,z_{i,j}+\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j:\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset}\,y_{j}^{\star}\,z_{i,j}.\end{array} (5)

Figure 1 depicts the predictors, responses, and connections that affect the conditional distribution of response YiY_{i}. We provide three specific examples, depending on the support set of response YiY_{i}.

Figure 1: Given N=3N=3 units 1,2,31,2,3 with neighborhoods 𝒩1{1,2}\mathscr{N}_{1}\coloneqq\{1,2\},  𝒩2{1,2,3}\mathscr{N}_{2}\coloneqq\{1,2,3\},  and 𝒩3{2,3}\mathscr{N}_{3}\coloneqq\{2,3\}, the arrows indicate which predictors, responses, and connections can affect the response Y1Y_{1} of unit 11 according to the model specified by Equations (2) and (4).

Example 1: Real-valued Responses YiY_{i}\in\mathbb{R}. Let ψ(0,+)\psi\in(0,+\infty) and

a𝒴(yi)12πψexp(yi22ψ)𝕀(yi).\begin{array}[]{llllllllll}a_{\mathscr{Y}}(y_{i})&\coloneqq&\dfrac{1}{\sqrt{2\,\pi\,\psi}}\,\exp\left(-\dfrac{y_{i}^{2}}{2\,\psi}\right)\,\mathbb{I}(y_{i}\in\mathbb{R}).\end{array}
  1.

    Conditional distribution: The conditional distribution of response YiY_{i} is N(μi(ηi),ψ)N(\mu_{i}(\eta_{i}),\,\psi).

  2.

    Conditional mean: The conditional mean μi(ηi)\mu_{i}(\eta_{i}) can be obtained by differentiating bi(ηi)=ηi2/ 2b_{i}(\eta_{i})=\eta_{i}^{2}/\,2 with respect to ηi\eta_{i},  giving μi(ηi)=ηi\mu_{i}(\eta_{i})=\eta_{i}:

    μi(ηi)=α𝒴+β𝒳,𝒴xi+γ𝒳,𝒴,𝒵j:𝒩i𝒩jxjzi,j+γ𝒴,𝒴,𝒵j:𝒩i𝒩jyjzi,j.\begin{array}[]{llllllllll}\mu_{i}(\eta_{i})&=&\alpha_{\mathscr{Y}}+\beta_{\mathscr{X},\mathscr{Y}}\;x_{i}+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j:\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset}\,x_{j}\,z_{i,j}+\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j:\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset}\,y_{j}^{\star}\,z_{i,j}.\end{array}

    Under certain restrictions on γ𝒴,𝒴,𝒵\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}, the conditional distribution of 𝒀(𝑿,𝒁)=(𝒙,𝒛)\bm{Y}\mid(\bm{X},\bm{Z})=(\bm{x},\bm{z}) is NN-variate Gaussian. The restrictions on γ𝒴,𝒴,𝒵\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}} depend on the neighborhoods 𝒩i\mathscr{N}_{i} and 𝒩j\mathscr{N}_{j} and connections Zi,jZ_{i,j} of pairs of units {i,j}𝒫N\{i,j\}\subset\mathscr{P}_{N}; see Proposition A in Section A of the Supplementary Materials.

Example 2: Count-valued Responses Yi{0,1,}Y_{i}\in\{0,1,\dots\}. Let ψ1\psi\coloneqq 1 and

a𝒴(yi)1yi!𝕀(yi{0,1,}).\begin{array}[]{llllllllll}a_{\mathscr{Y}}(y_{i})&\coloneqq&\dfrac{1}{y_{i}!}\;\mathbb{I}(y_{i}\in\{0,1,\dots\}).\end{array}
  1.

    Conditional distribution: The conditional distribution of response YiY_{i} is Poisson(μi(ηi))\mbox{Poisson}(\mu_{i}(\eta_{i})).

  2.

    Conditional mean: The conditional mean μi(ηi)\mu_{i}(\eta_{i}) can be obtained by differentiating bi(ηi)=exp(ηi)b_{i}(\eta_{i})=\exp(\eta_{i}) with respect to ηi\eta_{i},  giving μi(ηi)=exp(ηi)\mu_{i}(\eta_{i})=\exp(\eta_{i}).

Example 3: Binary Responses Yi{0, 1}Y_{i}\in\{0,\,1\}. Let ψ1\psi\coloneqq 1 and a𝒴(yi)𝕀(yi{0,1})a_{\mathscr{Y}}(y_{i})\coloneqq\mathbb{I}(y_{i}\in\{0,1\}).

  1.

    Conditional distribution: The conditional distribution of response YiY_{i} is Bernoulli(μi(ηi))\mbox{Bernoulli}(\mu_{i}(\eta_{i})).

  2.

    Conditional mean: The conditional mean μi(ηi)\mu_{i}(\eta_{i}) can be obtained by differentiating bi(ηi)=log(1+exp(ηi))b_{i}(\eta_{i})=\log(1+\exp(\eta_{i})) with respect to ηi\eta_{i},  giving μi(ηi)=logit1(ηi)\mu_{i}(\eta_{i})=\mbox{logit}^{-1}(\eta_{i}).

Interpretation of Examples. According to Equations (2) and (4), regardless of the conditional distribution of response YiY_{i}, α𝒴\alpha_{\mathscr{Y}} can be viewed as an intercept, while β𝒳,𝒴\beta_{\mathscr{X},\mathscr{Y}} captures the relationship between predictor xix_{i} and response YiY_{i}. The parameters γ𝒳,𝒴,𝒵\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}} and γ𝒴,𝒴,𝒵\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}} capture two distinct spillover effects:

  • Treatment spillover: γ𝒳,𝒴,𝒵0\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\neq 0 allows the outcome YiY_{i} of unit ii to be affected by the treatments xjx_{j} of its neighbors j𝒩ij\in\mathscr{N}_{i} and non-neighbors j𝒩ij\not\in\mathscr{N}_{i},  provided 𝒩i𝒩j\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset and ii and jj are connected (see Figure 1).

  • Outcome spillover: γ𝒴,𝒴,𝒵0\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\neq 0 allows the outcome YiY_{i} of unit ii to be affected by the outcomes yjy_{j} of its neighbors j𝒩ij\in\mathscr{N}_{i} and non-neighbors j𝒩ij\not\in\mathscr{N}_{i},  provided 𝒩i𝒩j\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset and ii and jj are connected (see Figure 1).

The proposed regression framework can be used for causal inference under network interference, which studies treatment spillover. That said, the framework is considerably broader, because it permits outcome spillover, and spillover need not be studied in a causal setting.
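
To make the GLM representation concrete, the following R sketch (illustrative, with assumed input names) evaluates the linear predictor of Equation (5) and maps it to the conditional mean in each of the three examples; theta is assumed to be a list holding alpha_Y, beta_XY, gamma_XYZ, and gamma_YYZ, and overlap is the N x N matrix of the indicators c_{i,j} of Equation (3).

## Minimal R sketch (illustrative): x and y are N-vectors, z is the N x N binary
## adjacency matrix, `overlap` holds the indicators c_{i,j}, and psi is the known
## scale parameter.

eta_i <- function(i, theta, x, y, z, overlap, psi = 1) {
  y_star  <- y / psi
  others  <- setdiff(seq_along(y), i)
  weights <- overlap[i, others] * z[i, others]  # nonzero only for connected j with N_i and N_j overlapping
  theta$alpha_Y +
    theta$beta_XY   * x[i] +
    theta$gamma_XYZ * sum(weights * x[others]) +
    theta$gamma_YYZ * sum(weights * y_star[others])
}

## Conditional means mu_i(eta_i) for the three examples:
## mu_gaussian  <- function(eta) eta          # Example 1: N(eta, psi)
## mu_poisson   <- function(eta) exp(eta)     # Example 2: Poisson(exp(eta))
## mu_bernoulli <- function(eta) plogis(eta)  # Example 3: Bernoulli(logit^{-1}(eta))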

2.2.2 GLM Representation of Connections Zi,jZ_{i,j}

The conditional mean μi,j(ηi,j)𝔼ηi,j(Zi,j𝒙,𝒚,𝒛{i,j})=logit1(ηi,j)\mu_{i,j}(\eta_{i,j})\coloneqq\mathbb{E}_{\eta_{i,j}}(Z_{i,j}\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})=\mbox{logit}^{-1}(\eta_{i,j}) of connection Zi,j{0,1}Z_{i,j}\in\{0,1\} depends on the linear predictor

ηi,j=α𝒵,i+α𝒵,j(1ci,j)λlogN+ci,j[γ𝒵,𝒵Δi,j(𝒛)+γ𝒳,𝒴,𝒵(xiyj+xjyi)+γ𝒴,𝒴,𝒵yiyj],\begin{array}[]{llllllllll}\eta_{i,j}=\alpha_{\mathscr{Z},i}+\alpha_{\mathscr{Z},j}-(1-c_{i,j})\,\lambda\,\log N+c_{i,j}\left[\gamma_{\mathscr{Z},\mathscr{Z}}\,\Delta_{i,j}(\bm{z})+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,(x_{i}\,y_{j}^{\star}+x_{j}\,y_{i}^{\star})+\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,y_{i}^{\star}\,y_{j}^{\star}\right],\end{array}

where Δi,j:𝒵\Delta_{i,j}:\mathscr{Z}\mapsto\mathbb{R} is the change in a<bNda,b(𝒛)\sum_{a<b}^{N}d_{a,b}(\bm{z}) due to transforming zi,jz_{i,j} from 0 to 11. The logistic regression representation of Zi,j(𝑿,𝒀,𝒁{i,j})=(𝒙,𝒚,𝒛{i,j})Z_{i,j}\mid(\bm{X},\bm{Y},\bm{Z}_{-\{i,j\}})=(\bm{x},\bm{y},\bm{z}_{-\{i,j\}}) facilitates interpretation: e.g., α𝒵,i\alpha_{\mathscr{Z},i} captures heterogeneity among units ii in forming connections. If λ>0\lambda>0, the sparsity-inducing term (1ci,j)λlogN-(1-c_{i,j})\,\lambda\,\log N penalizes connections between pairs of units with non-overlapping neighborhoods, where the logN\log N-term can be motivated in the special case of Bernoulli random graphs (Krivitsky et al., 2023): If Zi,jiidBernoulli(π)Z_{i,j}\mathop{\rm\sim}\limits^{\mbox{\tiny iid}}\mbox{Bernoulli}(\pi) and the expected degrees 𝔼j=1NZi,j\mathbb{E}\sum_{j=1}^{N}Z_{i,j} are bounded, then π=O(1/N)\pi=O(1/N) and logit(π)=O(logN)\mbox{logit}(\pi)=O(\log N). In addition, the model captures three forms of dependencies. First, the model encourages ii and jj to be connected when ii and jj are both connected to some k𝒩i𝒩jk\,\in\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}, provided 𝒩i𝒩j\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset and γ𝒵,𝒵>0\gamma_{\mathscr{Z},\mathscr{Z}}>0. Second, the model encourages ii and jj to be connected when xiyj>0x_{i}\,y_{j}^{\star}>0 or xjyi>0x_{j}\,y_{i}^{\star}>0, provided 𝒩i𝒩j\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset and γ𝒳,𝒴,𝒵>0\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}>0. Third, the model encourages ii and jj to be connected when yiyj>0y_{i}^{\star}\;y_{j}^{\star}>0, provided 𝒩i𝒩j\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset and γ𝒴,𝒴,𝒵> 0\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,>\,0.

3 Scalable Statistical Computing

To learn the regression framework from a single observation (𝒚,𝒛)(\bm{y},\bm{z}) of dependent responses and connections (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}, we develop scalable methods based on convex optimization of pseudo-likelihoods using minorization-maximization methods.

3.1 Pseudo-Loglikelihood

Let

(𝜽;𝒚,𝒛)i=1Ni(𝜽;𝒚,𝒛)+i=1N1j=i+1Ni,j(𝜽;𝒚,𝒛),\begin{array}[]{llllllllll}\ell(\bm{\theta};\,\bm{y},\,\bm{z})&\coloneqq&\displaystyle\sum\limits_{i=1}^{N}\ell_{i}(\bm{\theta};\,\bm{y},\,\bm{z})+\displaystyle\sum\limits_{i=1}^{N-1}\,\displaystyle\sum\limits_{j=i+1}^{N}\ell_{i,j}(\bm{\theta};\,\bm{y},\,\bm{z}),\end{array} (6)

where the dependence on predictors 𝒙𝒳\bm{x}\in\mathscr{X} is suppressed and i\ell_{i} and i,j\ell_{i,j} are defined by

i(𝜽;𝒚,𝒛)logf𝜽(yi𝒚i,𝒛)andi,j(𝜽;𝒚,𝒛)logf𝜽(zi,j𝒚,𝒛{i,j}).\begin{array}[]{llllllllll}\ell_{i}(\bm{\theta};\,\bm{y},\,\bm{z})&\coloneqq&\log f_{\bm{\theta}}(y_{i}\mid\bm{y}_{-i},\,\bm{z})&\mbox{and}&\ell_{i,j}(\bm{\theta};\,\bm{y},\,\bm{z})&\coloneqq&\log f_{\bm{\theta}}(z_{i,j}\mid\bm{y},\,\bm{z}_{-\{i,j\}}).\end{array}

The pseudo-loglikelihood \ell is based on full conditional densities of responses Y_{i} and connections Z_{i,j} and is hence tractable. In addition, \ell is a sum of exponential family loglikelihood functions \ell_{i} and \ell_{i,j}, each of which is concave and twice differentiable on the convex set \bm{\Theta} (Brown, 1986, Theorem 1.13, p. 19 and Lemma 5.3, p. 146), proving Lemma 1:

Lemma 1: Convexity and Smoothness. The set 𝚯\bm{\Theta} is convex and the pseudo-loglikelihood function :𝚯\ell:\bm{\Theta}\mapsto\mathbb{R}, considered as a function of  𝛉\bm{\theta} for fixed (𝐲,𝐳)𝒴×𝒵(\bm{y},\bm{z})\in\mathscr{Y}\times\mathscr{Z}, is twice differentiable with a negative semidefinite Hessian matrix on 𝚯\bm{\Theta}.

In light of the tractability and concavity of \ell, it makes sense to base statistical learning on pseudo-likelihood estimators of the form

𝚯^(δN){𝜽𝚯:𝜽(𝜽;𝒚,𝒛)δN},\begin{array}[]{llllllllll}\widehat{\bm{\Theta}}(\delta_{N})&\coloneqq&\left\{\bm{\theta}\,\in\,\bm{\Theta}:\;|\!|\nabla_{\bm{\theta}}~\ell(\bm{\theta};\,\bm{y},\,\bm{z})|\!|_{\infty}\;\leq\;\delta_{N}\right\},\end{array} (7)

where 𝜽\nabla_{\bm{\theta}} denotes the gradient with respect to 𝜽\bm{\theta} while 𝒗max1kp|vk||\!|\bm{v}|\!|_{\infty}\,\coloneqq\,\max_{1\leq k\leq p}\,|v_{k}| denotes the \ell_{\infty}-norm of vectors 𝒗p\bm{v}\in\mathbb{R}^{p}. The quantity δN[0,+)\delta_{N}\in[0,+\infty) can be viewed as a convergence criterion of a root-finding algorithm and can depend on NN. The set 𝚯^(δN)\widehat{\bm{\Theta}}(\delta_{N}) consists of maximizers of \ell when δN=0\delta_{N}=0, and maximizers and near-maximizers when δN>0\delta_{N}>0.
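
For the binary-response, binary-connection specification of Example 3, the pseudo-loglikelihood in Equation (6) can be written down directly. The R sketch below is illustrative; it reuses eta_i() and eta_ij() from the sketches in Section 2.2, and the double loop over pairs dominates its cost.

## Minimal R sketch (illustrative) of Equation (6) for binary Y_i and Z_{i,j}:
## each full conditional is Bernoulli with success probability plogis(eta).

pseudo_loglik <- function(theta, x, y, z, neighborhoods, overlap) {
  N  <- length(y)
  ll <- 0
  for (i in seq_len(N))                          # response terms l_i
    ll <- ll + dbinom(y[i], size = 1,
                      prob = plogis(eta_i(i, theta, x, y, z, overlap)), log = TRUE)
  for (i in seq_len(N - 1)) for (j in (i + 1):N) # connection terms l_{i,j}
    ll <- ll + dbinom(z[i, j], size = 1,
                      prob = plogis(eta_ij(i, j, theta, x, y, z, neighborhoods)), log = TRUE)
  ll
}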

3.2 Minorization-Maximization (MM)

While pseudo-likelihood estimators \widehat{\bm{\theta}}\in\widehat{\bm{\Theta}}(\delta_{N}) can be obtained by standard root-finding algorithms, inverting the p\times p negative Hessian of \ell at each iteration is time-consuming, because inversions require O(p^{3}) operations and p can increase with N. We thus divide the task of estimating p parameters into two subtasks using MM methods (Hunter and Lange, 2004). In the example model specified by Equations (2) and (4) for binary, count-, or real-valued predictors and responses (X_{i},Y_{i}) and binary connections Z_{i,j}, we partition \bm{\theta}\in\mathbb{R}^{N+6} into N nuisance parameters,  \bm{\theta}_{1}\coloneqq(\alpha_{\mathscr{Z},1},\,\dots,\,\alpha_{\mathscr{Z},N})\in\mathbb{R}^{N}, and 6 parameters of primary interest,  \bm{\theta}_{2}\coloneqq(\lambda,\,\alpha_{\mathscr{Y}},\,\beta_{\mathscr{X},\mathscr{Y}},\,\gamma_{\mathscr{Z},\mathscr{Z}},\,\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}},\,\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}})\in\mathbb{R}^{6}. We then partition the negative Hessian of \ell accordingly:

𝜽2(𝜽;𝒚,𝒛)(𝐀(𝜽)𝐁(𝜽)𝐁(𝜽)𝐂(𝜽)),\begin{array}[]{llllllllll}-\nabla_{\bm{\theta}}^{2}~\ell(\bm{\theta};\,\bm{y},\,\bm{z})&\coloneqq&\begin{pmatrix}\mathbf{A}(\bm{\theta})&\mathbf{B}(\bm{\theta})\\ \mathbf{B}(\bm{\theta})^{\top}&\mathbf{C}(\bm{\theta})&\end{pmatrix},\end{array} (8)

where 𝐀(𝜽)N×N\mathbf{A}(\bm{\theta})\in\mathbb{R}^{N\times N},  𝐁(𝜽)N×6\mathbf{B}(\bm{\theta})\in\mathbb{R}^{N\times 6},  and 𝐂(𝜽)6×6\mathbf{C}(\bm{\theta})\in\mathbb{R}^{6\times 6}. We suppress the dependence of 𝐀(𝜽)\mathbf{A}(\bm{\theta}), 𝐁(𝜽)\mathbf{B}(\bm{\theta}), and 𝐂(𝜽)\mathbf{C}(\bm{\theta}) on (𝒚,𝒛)(\bm{y},\,\bm{z}) and henceforth write (𝜽1,𝜽2;𝒚,𝒛)\ell(\bm{\theta}_{1},\bm{\theta}_{2};\,\bm{y},\,\bm{z}) instead of (𝜽;𝒚,𝒛)\ell(\bm{\theta};\,\bm{y},\,\bm{z}).

Iteration t+1t+1 then consists of two steps:

  • Step 1: Find 𝜽1(t+1)\bm{\theta}_{1}^{(t+1)} satisfying (𝜽1(t+1),𝜽2(t);𝒚,𝒛)(𝜽1(t),𝜽2(t);𝒚,𝒛)\ell(\bm{\theta}_{1}^{(t+1)},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})\,\geq\,\ell(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z}).

  • Step 2: Find 𝜽2(t+1)\bm{\theta}_{2}^{(t+1)} satisfying (𝜽1(t+1),𝜽2(t+1);𝒚,𝒛)(𝜽1(t+1),𝜽2(t);𝒚,𝒛)\ell(\bm{\theta}_{1}^{(t+1)},\,\bm{\theta}_{2}^{(t+1)};\,\bm{y},\,\bm{z})\,\geq\,\ell(\bm{\theta}_{1}^{(t+1)},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z}).

In Step 1, it is inconvenient to invert the high-dimensional N×NN\times N matrix

𝑨(𝜽(t))i<jN𝜽12i,j(𝜽1,𝜽2(t);𝒚,𝒛)|𝜽1=𝜽1(t)=i<jNπi,j(t)(1πi,j(t))𝒆i,j𝒆i,j,\begin{array}[]{llllllllll}\bm{A}(\bm{\theta}^{(t)})&\coloneqq&-\displaystyle\sum\limits_{i<j}^{N}\,\nabla_{\bm{\theta}_{1}}^{2}\,\ell_{i,j}(\bm{\theta}_{1},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})\Big{|}_{\bm{\theta}_{1}=\bm{\theta}_{1}^{(t)}}&=&\displaystyle\sum\limits_{i<j}^{N}\pi_{i,j}^{(t)}\,(1-\pi_{i,j}^{(t)})\,\bm{e}_{i,j}\,\bm{e}_{i,j}^{\top},\end{array} (9)

where πi,j(t)𝜽(t)(Zi,j=1𝒚,𝒛{i,j})\pi_{i,j}^{(t)}\coloneqq\mathbb{P}_{\bm{\theta}^{(t)}}(Z_{i,j}=1\mid\bm{y},\,\bm{z}_{-\{i,j\}}). We thus increase \ell by maximizing a minorizer of \ell, replacing 𝑨(𝜽(t))\bm{A}(\bm{\theta}^{(t)}) by a constant matrix 𝑨\bm{A}^{\star} that only needs to be inverted once.

Lemma 2: Minorizer. Define

𝑨14i<jN𝒆i,j𝒆i,j=14[(N2)𝑰+𝟏𝟏]=[4N2(𝑰12N2 11)]1,\begin{array}[]{llllllllll}\bm{A}^{\star}&\coloneqq&\dfrac{1}{4}\,\displaystyle\sum\limits_{i<j}^{N}\bm{e}_{i,j}\,\bm{e}_{i,j}^{\top}\ =\ \dfrac{1}{4}\,\left[(N-2)\,\bm{I}+\bm{1}\bm{1}^{\top}\right]\ =\ \left[\dfrac{4}{N-2}\,\left(\bm{I}-\dfrac{1}{2\,N-2}\,\bm{1}\bm{1}^{\top}\right)\right]^{-1},\end{array}

where 𝐈\bm{I} is the N×NN\times N identity matrix and 𝟏\bm{1} is the NN-vector of ones. Then the function

m(𝜽1;𝜽1(t),𝜽2(t),𝒚,𝒛)(𝜽1(t),𝜽2(t);𝒚,𝒛)+(𝜽1(𝜽1,𝜽2(t);𝒚,𝒛)|𝜽1=𝜽1(t))(𝜽1𝜽1(t))+12(𝜽1𝜽1(t))(𝑨)(𝜽1𝜽1(t))\begin{array}[]{llllllllll}m(\bm{\theta}_{1};\,\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{2}^{(t)},\,\bm{y},\,\bm{z})&\coloneqq&\ell(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})\vskip 7.11317pt\\ &+&\left(\nabla_{\bm{\theta}_{1}}\,\ell(\bm{\theta}_{1},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})\Big{|}_{\bm{\theta}_{1}=\bm{\theta}_{1}^{(t)}}\right)^{\top}(\bm{\theta}_{1}-\bm{\theta}_{1}^{(t)})\vskip 7.11317pt\\ &+&\dfrac{1}{2}\,(\bm{\theta}_{1}-\bm{\theta}_{1}^{(t)})^{\top}\,(-\bm{A}^{\star})\,(\bm{\theta}_{1}-\bm{\theta}_{1}^{(t)})\end{array}

is a minorizer of \ell at 𝛉1(t)\bm{\theta}_{1}^{(t)} for fixed 𝛉2(t)\bm{\theta}_{2}^{(t)}, in the sense that

m(𝜽1;𝜽1(t),𝜽2(t),𝒚,𝒛)(𝜽1,𝜽2(t);𝒚,𝒛) for all 𝜽1N,m(𝜽1(t);𝜽1(t),𝜽2(t),𝒚,𝒛)=(𝜽1(t),𝜽2(t);𝒚,𝒛).\begin{array}[]{llllllllll}m(\bm{\theta}_{1};\,\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{2}^{(t)},\,\bm{y},\,\bm{z})&\leq&\ell(\bm{\theta}_{1},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})\;\mbox{ for all }\;\bm{\theta}_{1}\in\mathbb{R}^{N}\vskip 7.11317pt,\\ m(\bm{\theta}_{1}^{(t)};\,\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{2}^{(t)},\,\bm{y},\,\bm{z})&=&\ell(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z}).\end{array}

Lemma 2 is proved in Section B of the Supplementary Materials. Step 1 may be implemented by an MM algorithm, as the closed-form maximizer of the minorizer m is

𝜽1(t+1)𝜽1(t)+(𝐀)1(𝜽1(𝜽1,𝜽2(t);𝒚,𝒛)|𝜽1=𝜽1(t)).\begin{array}[]{llllllllll}\bm{\theta}_{1}^{(t+1)}&\coloneqq&\bm{\theta}_{1}^{(t)}+\left(\mathbf{A}^{\star}\right)^{-1}\left(\nabla_{\bm{\theta}_{1}}\,\ell(\bm{\theta}_{1},\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})\Big{|}_{\bm{\theta}_{1}=\bm{\theta}_{1}^{(t)}}\right).\end{array} (10)

We accelerate the MM step in Equation (10) with quasi-Newton methods. Details can be found in Section E of the Supplementary Materials. Compared to a Newton-Raphson algorithm, the accelerated MM step reduces the per-iteration computational complexity from O(N^{3}) to O(N^{2}). Step 2 updates the low-dimensional parameter vector of interest \bm{\theta}_{2}^{(t+1)}\in\mathbb{R}^{6} given the high-dimensional nuisance parameter vector \bm{\theta}_{1}^{(t+1)}\in\mathbb{R}^{N} by a Newton-Raphson step. The concavity of \ell, established in Lemma 1, guarantees that

(𝜽1(t),𝜽2(t);𝒚,𝒛)(𝜽1(t+1),𝜽2(t);𝒚,𝒛)(𝜽1(t+1),𝜽2(t+1);𝒚,𝒛).\begin{array}[]{llllllllll}\ell(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})&\leq&\ell(\bm{\theta}_{1}^{(t+1)},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})&\leq&\ell(\bm{\theta}_{1}^{(t+1)},\,\bm{\theta}_{2}^{(t+1)};\,\bm{y},\,\bm{z}).\end{array}
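
Because (\bm{A}^{\star})^{-1} is available in closed form (Lemma 2), the MM update in Equation (10) requires no matrix inversion: (\bm{A}^{\star})^{-1}\,\bm{g}=\tfrac{4}{N-2}\,(\bm{g}-\tfrac{\bm{1}^{\top}\bm{g}}{2\,N-2}\,\bm{1}). The R sketch below is illustrative; grad_theta1 is an assumed user-supplied function returning the gradient of \ell with respect to \bm{\theta}_{1}, so the update costs O(N) operations beyond the gradient computation.

## Minimal R sketch (illustrative) of the MM update in Equation (10); theta1 is
## the N-vector of alpha_Z parameters and grad_theta1(theta1, theta2, ...) is
## assumed to return the N-vector gradient of the pseudo-loglikelihood.

mm_step1 <- function(theta1, theta2, grad_theta1, ...) {
  g <- grad_theta1(theta1, theta2, ...)
  N <- length(theta1)
  ## closed-form (A*)^{-1} g = 4 / (N - 2) * (g - sum(g) / (2 * N - 2))
  theta1 + (4 / (N - 2)) * (g - sum(g) / (2 * N - 2))
}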

3.3 Quantifying Uncertainty

The uncertainty about the maximum pseudo-likelihood estimator 𝜽^\widehat{\bm{\theta}} of the data-generating parameter vector 𝜽\bm{\theta}^{\star} can be quantified based on the covariance matrix of the sampling distribution of 𝜽^\widehat{\bm{\theta}}, which we derive as follows: The mean-value theorem for vector-valued functions (Ortega and Rheinboldt, 2000, Equations (2) and (3), pp. 68–69) implies that there exist real numbers t1,,tp(0, 1)t_{1},\ldots,t_{p}\in(0,\,1) such that

𝜽(𝜽;𝒚,𝒛)|𝜽=𝜽^𝜽(𝜽;𝒚,𝒛)|𝜽=𝜽=𝑯(𝜽^,𝜽;𝒚,𝒛)(𝜽^𝜽),\begin{array}[]{llllllllll}\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\,\bm{y},\,\bm{z})\Big{|}_{\bm{\theta}=\widehat{\bm{\theta}}}-\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\,\bm{y},\,\bm{z})\Big{|}_{\bm{\theta}=\bm{\theta}^{\star}}&=&\bm{H}(\widehat{\bm{\theta}},\,\bm{\theta}^{\star};\,\bm{y},\,\bm{z})\;(\widehat{\bm{\theta}}-\bm{\theta}^{\star}),\end{array} (11)

where

𝑯(𝜽^,𝜽;𝒚,𝒛)(g1(𝜽+t1(𝜽^𝜽);𝒚,𝒛)gp(𝜽+tp(𝜽^𝜽);𝒚,𝒛)).\begin{array}[]{llllllllll}\bm{H}(\widehat{\bm{\theta}},\,\bm{\theta}^{\star};\,\bm{y},\,\bm{z})&\coloneqq&\left(\begin{array}[]{cccccc}g_{1}^{\prime}(\bm{\theta}^{\star}+t_{1}\,(\widehat{\bm{\theta}}-\bm{\theta}^{\star});\,\bm{y},\,\bm{z})\\ \vdots\\ g_{p}^{\prime}(\bm{\theta}^{\star}+t_{p}\,(\widehat{\bm{\theta}}-\bm{\theta}^{\star});\,\bm{y},\,\bm{z})\end{array}\right).\end{array}

Here, g_{k}(\bm{\theta};\,\bm{y},\,\bm{z}) is the kth coordinate of \nabla_{\bm{\theta}}\,\ell(\bm{\theta};\,\bm{y},\bm{z}) and g_{k}^{\prime}(\bm{\theta};\,\bm{y},\,\bm{z}) is the row vector of partial derivatives of g_{k}(\bm{\theta};\,\bm{y},\,\bm{z}) with respect to \bm{\theta} (k=1,\ldots,p). Leveraging Equation (11) along with \nabla_{\bm{\theta}}\,\ell(\bm{\theta};\,\bm{y},\,\bm{z})|_{\bm{\theta}=\widehat{\bm{\theta}}}=\bm{0} gives the exact covariance matrix of \widehat{\bm{\theta}}:

𝕍𝜽(𝜽^)=𝕍𝜽[𝑯(𝜽^,𝜽;𝒀,𝒁)1𝜽(𝜽;𝒀,𝒁)|𝜽=𝜽].\begin{array}[]{llllllllll}\mathbb{V}_{\bm{\theta}^{\star}}(\widehat{\bm{\theta}})&=&\mathbb{V}_{\bm{\theta}^{\star}}\left[-\bm{H}(\widehat{\bm{\theta}},\,\bm{\theta}^{\star};\,\bm{Y},\bm{Z})^{-1}\;\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\,\bm{Y},\,\bm{Z})\Big{|}_{\bm{\theta}=\bm{\theta}^{\star}}\right].\end{array}

If N is large, \bm{\theta}^{\star} can be replaced by \widehat{\bm{\theta}}, because |\!|\widehat{\bm{\theta}}-\bm{\theta}^{\star}|\!|_{\infty} is small with high probability according to Theorem 1 in Section 4. The resulting approximation of \mathbb{V}_{\bm{\theta}^{\star}}(\widehat{\bm{\theta}}) is

𝕍𝜽^(𝜽^)=𝕍𝜽^[𝜽2(𝜽;𝒀,𝒁)|𝜽=𝜽^1𝜽(𝜽;𝒀,𝒁)|𝜽=𝜽^],\begin{array}[]{llllllllll}\mathbb{V}_{\widehat{\bm{\theta}}}(\widehat{\bm{\theta}})&=&\mathbb{V}_{\widehat{\bm{\theta}}}\left[-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{Y},\,\bm{Z})\Big{|}_{\bm{\theta}=\widehat{\bm{\theta}}}^{-1}\;\,\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\,\bm{Y},\,\bm{Z})\Big{|}_{\bm{\theta}=\widehat{\bm{\theta}}}\right],\end{array} (12)

which can be estimated by simulating responses and connections (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x} from 𝜽^\mathbb{P}_{\widehat{\bm{\theta}}} using Markov chain Monte Carlo methods.
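
Concretely, the covariance matrix in Equation (12) can be approximated by simulating from \mathbb{P}_{\widehat{\bm{\theta}}} and taking a sample covariance. The R sketch below is illustrative: simulate_yz(), grad_ell(), and hess_ell() are hypothetical helpers, assumed to draw (\bm{y},\bm{z}) from the fitted model (e.g., by Markov chain Monte Carlo) and to return the gradient and Hessian of the pseudo-loglikelihood; they are not functions of the accompanying R package.

## Minimal R sketch (illustrative) of approximating Equation (12); the helpers
## simulate_yz(theta), grad_ell(theta, y, z), and hess_ell(theta, y, z) are
## hypothetical placeholders supplied by the user.

estimate_vcov <- function(theta_hat, simulate_yz, grad_ell, hess_ell, R = 1000) {
  draws <- replicate(R, {
    sim <- simulate_yz(theta_hat)                 # one draw of (y, z) from the fitted model
    solve(-hess_ell(theta_hat, sim$y, sim$z),     # -(Hessian)^{-1} %*% gradient at theta_hat
          grad_ell(theta_hat, sim$y, sim$z))
  })
  cov(t(draws))                                   # p x p sample covariance across draws
}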

Remark: Asymptotic Distribution. Establishing asymptotic normality for pseudo-likelihood estimators based on a single observation of dependent responses and connections (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x} in scenarios with pp\to\infty parameters is an open problem. Asymptotic normality results in the most closely related literature—the literature in applied probability concerned with Ising models, Gibbs measures, and Markov random fields in single-observation scenarios (e.g., Jensen and Künsch, 1994; Comets and Janzura, 1998)—assume the presence of lattice structure and a fixed number of parameters pp, in addition to other assumptions motivated by applications in physics. In the current setting, none of these assumptions holds, although simulation results in Section 5 suggest that the sampling distribution of 𝜽^\widehat{\bm{\theta}} is approximately normal and that normal-based confidence intervals based on Equation (12) achieve close-to-nominal coverage probabilities.

4 Theoretical Guarantees

We establish convergence rates for pseudo-likelihood estimators 𝚯^(δN)\widehat{\bm{\Theta}}(\delta_{N}) based on a single observation of dependent responses and connections (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}. To cover a wide range of models for binary, count-, and real-valued responses and connections, we introduce a general theoretical framework and showcase convergence rates in a specific example.

We denote by 𝜽𝚯p\bm{\theta}^{\star}\in\bm{\Theta}\subseteq\mathbb{R}^{p} the data-generating parameter vector and by (𝜽,ρ){𝜽p:𝜽𝜽<ρ}\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\rho)\,\coloneqq\,\{\bm{\theta}\in\mathbb{R}^{p}:{\left|\!\left|\bm{\theta}-\bm{\theta}^{\star}\right|\!\right|_{\infty}}<\rho\} a hypercube with center 𝜽𝚯\bm{\theta}^{\star}\in\bm{\Theta} and width 2ρ(0,+)2\,\rho\in(0,+\infty). Let

(S){(𝒚,𝒛)𝒴×𝒵:𝜽2(𝜽;𝒚,𝒛) is invertible for all 𝜽S}\mathscr{I}(S)\,\coloneqq\,\left\{(\bm{y},\,\bm{z})\in\mathscr{Y}\times\mathscr{Z}:\;-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{y},\,\bm{z})\mbox{ is invertible for all $\bm{\theta}\in S$}\right\}

and, for some ϵ(0,+)\epsilon^{\star}\in(0,+\infty) and ((𝜽,ϵ))\mathscr{H}\,\subseteq\,\mathscr{I}\left(\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})\right), let

ΛN(𝜽)sup(𝒚,𝒛)sup𝜽(𝜽,ϵ)|(𝜽2(𝜽;𝒚,𝒛))1|,\Lambda_{N}(\bm{\theta}^{\star})\,\coloneqq\,\sup\limits_{(\bm{y},\,\bm{z})\,\in\,\mathscr{H}}\;\sup\limits_{\bm{\theta}\,\in\,\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})}\,|\!|\!|(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{y},\,\bm{z}))^{-1}|\!|\!|_{\infty},

where |||.||||\!|\!|.|\!|\!|_{\infty} is the \ell_{\infty}-induced matrix norm. The set \mathscr{H} can be a proper subset of  ((𝜽,ϵ))\mathscr{I}\left(\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})\right), provided \mathscr{H} is a high probability subset of 𝒴×𝒵\mathscr{Y}\times\mathscr{Z}. The definition of \mathscr{H} is motivated by the fact that characterizing the set of all (𝒚,𝒛)𝒴×𝒵(\bm{y},\,\bm{z})\in\mathscr{Y}\times\mathscr{Z}  for which the Hessian is invertible can be challenging, but finding a sufficient condition for invertibility is often possible.

Theorem 1: Convergence Rate. Consider a single observation of (𝐘,𝐙)𝒴×𝒵(\bm{Y},\bm{Z})\in\mathscr{Y}\times\mathscr{Z} generated by model (1) with parameter vector  𝛉𝚯p\bm{\theta}^{\star}\in\bm{\Theta}\subseteq\mathbb{R}^{p}, where 𝒴×𝒵\mathscr{Y}\times\mathscr{Z} is a finite, countably infinite, or uncountable set. Assume that there exists a sequence ρ1,ρ2,[0,+)\rho_{1},\rho_{2},\ldots\in[0,+\infty) satisfying ρN=o(1)\rho_{N}=o(1) so that the events ||𝛉(𝛉;𝐘,𝐙)|𝛉=𝛉𝔼𝛉(𝛉;𝐘,𝐙)|𝛉=𝛉||<δN|\!|\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\,\bm{Y},\bm{Z})|_{\bm{\theta}=\bm{\theta}^{\star}}-\mathbb{E}\;\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\,\bm{Y},\bm{Z})|_{\bm{\theta}=\bm{\theta}^{\star}}|\!|_{\infty}<\delta_{N} and (𝐘,𝐙)(\bm{Y},\bm{Z})\in\mathscr{H} occur with probability 1o(1)1-o(1), where δNρN/(2ΛN(𝛉))\delta_{N}\coloneqq\rho_{N}/(2\,\Lambda_{N}(\bm{\theta}^{\star})). Then there exists a positive integer N0N_{0} such that, for all N>N0N>N_{0}, the random set 𝚯^(δN)\widehat{\bm{\Theta}}(\delta_{N}) is non-empty and, with probability 1o(1)1-o(1), satisfies

𝚯^(δN)(𝜽,ρN).\begin{array}[]{llllllllll}\widehat{\bm{\Theta}}(\delta_{N})&\subseteq&\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\rho_{N}).\end{array}

Theorem 1 is proved in Section C of the Supplementary Materials. The requirement \delta_{N}\coloneqq\rho_{N}/(2\,\Lambda_{N}(\bm{\theta}^{\star})) implies \rho_{N}\propto\delta_{N}\,\Lambda_{N}(\bm{\theta}^{\star}), so the convergence rate depends on

  • the strength of concentration of the gradient 𝜽(𝜽;𝒀,𝒁)|𝜽=𝜽\nabla_{\bm{\theta}}\,\,\ell(\bm{\theta};\,\bm{Y},\bm{Z})|_{\bm{\theta}=\bm{\theta}^{\star}} around its expectation 𝔼𝜽(𝜽;𝒀,𝒁)|𝜽=𝜽\mathbb{E}\,\nabla_{\bm{\theta}}\,\,\ell(\bm{\theta};\,\bm{Y},\bm{Z})|_{\bm{\theta}=\bm{\theta}^{\star}} via δN\delta_{N};

  • the inverse negative Hessian (𝜽2(𝜽;𝒚,𝒛))1(-\nabla_{\bm{\theta}}^{2}~\ell(\bm{\theta};\,\bm{y},\bm{z}))^{-1} in a neighborhood (𝜽,ϵ)\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star}) of 𝜽𝚯\bm{\theta}^{\star}\in\bm{\Theta} and a high probability subset (𝒚,𝒛)(\bm{y},\bm{z})\in\mathscr{H} of 𝒴×𝒵\mathscr{Y}\times\mathscr{Z} via ΛN(𝜽)\Lambda_{N}(\bm{\theta}^{\star}).

The strength of concentration of 𝜽(𝜽;𝒀,𝒁)|𝜽=𝜽\nabla_{\bm{\theta}}\,\,\ell(\bm{\theta};\,\bm{Y},\bm{Z})|_{\bm{\theta}=\bm{\theta}^{\star}} can be quantified by concentration inequalities for dependent random variables. In general, the strength of concentration depends on the sample space and the tails of the distribution, the smoothness of the functions gig_{i} and hi,jh_{i,j}, and the dependence induced by model (1). To control the dependence among responses and connections (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}, one can take advantage of additional structure (e.g., one or more neighborhood structures, non-overlapping or overlapping subpopulations, or a metric space in which units are embedded). For example, each unit can have one or more neighborhoods (e.g., geographical neighbors and colleagues in the workplace), and the responses and connections of the unit can be affected by any geographical neighbor and any colleague. Theoretical guarantees can be obtained as long as the neighborhoods are not too large and do not overlap too much.

Specific convergence rates depend on the model. To demonstrate, consider predictors xix_{i}\in\mathbb{R}, responses Yi{0,1}Y_{i}\in\{0,1\}, and connections Zi,j{0,1}Z_{i,j}\in\{0,1\} generated by a model capturing heterogeneity in the propensities α𝒵,1,,α𝒵,N\alpha_{\mathscr{Z},1},\,\dots,\alpha_{\mathscr{Z},N} of units 1,,N1,\dots,N to form connections, transitive closure among connections with weight γ𝒵,𝒵\gamma_{\mathscr{Z},\mathscr{Z}}, and treatment spillover with weight γ𝒳,𝒴,𝒵\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}; compare Equations (2) and (4) in Section 2.2. Since Yi{0,1}Y_{i}\in\{0,1\} and Zi,j{0,1}Z_{i,j}\in\{0,1\}, it is reasonable to specify a𝒴(yi)𝕀(yi{0,1})a_{\mathscr{Y}}(y_{i})\coloneqq\mathbb{I}(y_{i}\in\{0,1\}) and a𝒵(zi,j)𝕀(zi,j{0,1})a_{\mathscr{Z}}(z_{i,j})\coloneqq\mathbb{I}(z_{i,j}\in\{0,1\}). Convergence rates can be obtained under the following conditions.

Condition 1: Predictors. There exist finite constants 0<c<C0<c<C such that, for each i𝒫Ni\in\mathscr{P}_{N}, xi[0,C]x_{i}\in[0,\,C] and there exists j𝒫N{i}j\in\mathscr{P}_{N}\setminus\,\{i\} such that 𝒩i𝒩j\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset and xj[c,C]x_{j}\in[c,\,C].

Condition 2: Parameters. The parameter space is 𝚯=N+2\bm{\Theta}=\mathbb{R}^{N+2} and there exists a constant A(0,+)A\in(0,+\infty), not depending on NN, such that 𝛉<A\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}<A.

Condition 3: Dependence. The population \mathscr{P} consists of overlapping subpopulations \mathscr{A}_{1},\mathscr{A}_{2},\ldots, which can be represented as vertices of a subpopulation graph \mathscr{G}_{\mathscr{A}} with an edge connecting \mathscr{A}_{k} and \mathscr{A}_{l} if \mathscr{A}_{k}\,\cap\,\mathscr{A}_{l}\neq\emptyset (k<l). For each \mathscr{A}_{k}, the number of subpopulations at geodesic distance K in \mathscr{G}_{\mathscr{A}} is O(\log K). For each i\in\mathscr{P}_{N}, the neighborhood is

𝒩i{j𝒫N: there exists k{1,2,} such that i𝒜k and j𝒜k}.\begin{array}[]{llllllllll}\mathscr{N}_{i}&\coloneqq&\{j\,\in\,\mathscr{P}_{N}:\mbox{ there exists }k\in\{1,2,\ldots\}\text{ such that }i\in\mathscr{A}_{k}\text{ and }j\in\mathscr{A}_{k}\}.\end{array}

There exists a constant B(0,+)B\in(0,\,+\infty) such that max1iN|𝒩i|<B\max_{1\leq i\leq N}|\mathscr{N}_{i}|<B.
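To make Condition 3 concrete, the following minimal Python sketch (illustrative only; all function and variable names are ours) builds the neighborhoods \mathscr{N}_{i} as unions of the overlapping subpopulations containing unit i and checks that their sizes are bounded by a constant B that does not grow with N.

def neighborhoods_from_subpopulations(subpopulations, N):
    # N_i is the union of all subpopulations A_k containing unit i (Condition 3)
    nbrs = [set() for _ in range(N)]
    for A in subpopulations:
        for i in A:
            nbrs[i] |= A
    return nbrs

# Toy example: N = 12 units and overlapping blocks of 6 units shifted by 3,
# so each unit belongs to at most two blocks and |N_i| <= 9, uniformly in N.
N = 12
blocks = [set(range(s, s + 6)) for s in range(0, N - 5, 3)]
nbrs = neighborhoods_from_subpopulations(blocks, N)
print(max(len(nb) for nb in nbrs))   # 9, a bound B that does not depend on N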

Condition 1 imposes restrictions on the predictors \bm{x}\in\mathbb{R}^{N}. Condition 2 requires that the data-generating parameter vector \bm{\theta}^{\star} be contained in a compact subset of \bm{\Theta}=\mathbb{R}^{N+2}. The set of estimators \widehat{\bm{\Theta}}(\delta_{N}) is not restricted by Condition 2 and consists of all \bm{\theta}\in\mathbb{R}^{N+2} such that |\!|\nabla_{\bm{\theta}}~\ell(\bm{\theta};\,\bm{Y},\bm{Z})|\!|_{\infty}\,\leq\,\delta_{N}. Condition 2 can be weakened in special cases, allowing |\!|\bm{\theta}^{\star}|\!|_{\infty}=O(\log N); see Section D.3 of the Supplementary Materials. Condition 3 controls the dependence among responses and connections (\bm{Y},\bm{Z})\mid\bm{X}=\bm{x} and can be weakened to \max_{1\leq i\leq N}|\mathscr{N}_{i}|=O(\log N), as demonstrated by Stewart and Schweinberger (2025) in the special case of connections \bm{Z} (without predictors \bm{X} and responses \bm{Y}).

Corollary 1: Example of Convergence Rate. Consider a single observation of dependent responses and connections (\bm{Y},\bm{Z}) generated by the model with parameter vector \bm{\theta}^{\star}\coloneqq(\alpha_{\mathscr{Z},1}^{\star},\,\dots,\alpha_{\mathscr{Z},N}^{\star},\,\gamma_{\mathscr{Z},\mathscr{Z}}^{\star},\,\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}^{\star})\in\mathbb{R}^{N+2}. If Conditions 1–3 hold, there exist constants K\in(0,+\infty) and 0<L\leq U<+\infty along with an integer N_{0}\in\{3,4,\dots\} such that, for all N>N_{0}, the quantity \delta_{N} satisfies

LNlogNδNUNlogN,\begin{array}[]{llllllllll}L\,\sqrt{N\log N}&\leq&\delta_{N}&\leq&U\,\sqrt{N\log N},\end{array}

and the random set 𝚯^(δN)\widehat{\bm{\Theta}}(\delta_{N}) is non-empty and satisfies

𝚯^(δN)(𝜽,KlogNN)\begin{array}[]{llllllllll}\widehat{\bm{\Theta}}(\delta_{N})&\subseteq&\mathscr{B}_{\infty}\left(\bm{\theta}^{\star},\;K\,\sqrt{\dfrac{\log N}{N}}\right)\end{array}

with probability at least 16/N21-6\,/N^{2}.

Corollary 1 is proved in Section D of the Supplementary Materials. The same method of proof can be used to establish convergence rates for pseudo-likelihood estimators \widehat{\bm{\Theta}}(\delta_{N}) based on other models for dependent responses and connections (\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}, provided there is additional structure to control the dependence among responses and connections.

5 Simulation Results

To evaluate the performance of pseudo-likelihood estimators 𝜽^𝚯^(δN)\widehat{\bm{\theta}}\in\widehat{\bm{\Theta}}(\delta_{N}) and the accompanying uncertainty quantification, we simulate data from the example model specified by Equations (2) and (4). The coordinates of the nuisance parameter vector, 𝜽1(α𝒵,1\bm{\theta}_{1}^{\star}\coloneqq(\alpha_{\mathscr{Z},1}^{\star}, \dots, α𝒵,N)N\alpha_{\mathscr{Z},N}^{\star})\in\mathbb{R}^{N}, are independent Gaussian draws with mean 3/2-3/2 and standard deviation 3/103/10. The parameter vector of primary interest, 𝜽2(λ\bm{\theta}_{2}^{\star}\coloneqq(\lambda^{\star}, α𝒴\alpha_{\mathscr{Y}}^{\star}, β𝒳,𝒴\beta_{\mathscr{X},\mathscr{Y}}^{\star}, γ𝒵,𝒵\gamma_{\mathscr{Z},\mathscr{Z}}^{\star}, γ𝒳,𝒴,𝒵\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}^{\star}, γ𝒴,𝒴,𝒵)6\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}^{\star})\in\mathbb{R}^{6}, is specified as (1/5,1, 3, 4/5, 1/2,1/2)(1/5,\,-1,\,3,\,4/5,\,1/2,\,-1/2). The sparsity parameter λ=1/5\lambda^{\star}=1/5 ensures that each unit has on average approximately 30 connections, regardless of the value of NN. The neighborhood structure is based on L=(N25)/25L=(N-25)/25 intersecting subpopulations 𝒜1,,𝒜L\mathscr{A}_{1},\ldots,\mathscr{A}_{L}, where 𝒜l\mathscr{A}_{l} consists of the 50 units 1+25(l1),,25(l+1)1+25\,(l-1),\ldots,25\,(l+1) (l=1,,L1l=1,\dots,L-1). For each i𝒫Ni\in\mathscr{P}_{N}, we define the neighborhood 𝒩i𝒫N\mathscr{N}_{i}\subset\mathscr{P}_{N} to be the 50- or 75-unit union of all subpopulations 𝒜l\mathscr{A}_{l} containing ii.
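The subpopulation and neighborhood structure used in these simulations can be sketched in a few lines of Python; this is a minimal illustration of the design just described (variable names are ours), not the authors' simulation code.

import numpy as np

rng = np.random.default_rng(0)

N = 500                                # number of units (a multiple of 25)
L = (N - 25) // 25                     # number of overlapping subpopulations
# A_l consists of the 50 units 1 + 25(l-1), ..., 25(l+1); written 0-based below
subpops = [set(range(25 * (l - 1), 25 * (l + 1))) for l in range(1, L + 1)]

# N_i is the union of all subpopulations containing i (a 50- or 75-unit set)
nbrs = [set() for _ in range(N)]
for A in subpops:
    for i in A:
        nbrs[i] |= A

# Nuisance parameters alpha_{Z,i} ~ N(-3/2, (3/10)^2) and the primary parameters
alpha_Z = rng.normal(-1.5, 0.3, size=N)
theta_2 = np.array([1 / 5, -1.0, 3.0, 4 / 5, 1 / 2, -1 / 2])  # (lambda, alpha_Y, beta_XY,
                                                              #  gamma_ZZ, gamma_XYZ, gamma_YYZ)
print(min(len(nb) for nb in nbrs), max(len(nb) for nb in nbrs))  # 50 75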

Figure 2: Simulation results based on 1,000 replications. Left: Statistical error 𝜽^𝜽|\!|\widehat{\bm{\theta}}-\bm{\theta}^{\star}|\!|_{\infty} of maximum pseudo-likelihood estimator 𝜽^N+6\widehat{\bm{\theta}}\in\mathbb{R}^{N+6} as a function of NN. Right: Kernel density estimators of ZZ-scores (λ^λ)/S.E.(λ^)(\widehat{\lambda}-\lambda^{\star})/\text{S.E.}(\widehat{\lambda}),  (α^𝒴α𝒴)/S.E.(α^𝒴)(\widehat{\alpha}_{\mathscr{Y}}-\alpha_{\mathscr{Y}}^{\star})/\text{S.E.}(\widehat{\alpha}_{\mathscr{Y}}),  (β^𝒳,𝒴β𝒳,𝒴)/S.E.(β^𝒳,𝒴)(\widehat{\beta}_{\mathscr{X},\mathscr{Y}}-\beta_{\mathscr{X},\mathscr{Y}}^{\star})/\text{S.E.}(\widehat{\beta}_{\mathscr{X},\mathscr{Y}}),  (γ^𝒵,𝒵γ𝒵,𝒵)/S.E.(γ^𝒵,𝒵)(\widehat{\gamma}_{\mathscr{Z},\mathscr{Z}}-\gamma_{\mathscr{Z},\mathscr{Z}}^{\star})/\text{S.E.}(\widehat{\gamma}_{\mathscr{Z},\mathscr{Z}}),  (γ^𝒳,𝒴,𝒵γ𝒳,𝒴,𝒵)/S.E.(γ^𝒳,𝒴,𝒵)(\widehat{\gamma}_{\mathscr{X},\mathscr{Y},\mathscr{Z}}-\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}^{\star})/\text{S.E.}(\widehat{\gamma}_{\mathscr{X},\mathscr{Y},\mathscr{Z}}),  and (γ^𝒴,𝒴,𝒵γ𝒴,𝒴,𝒵)/S.E.(γ^𝒴,𝒴,𝒵)(\widehat{\gamma}_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}-\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}^{\star})/\text{S.E.}(\widehat{\gamma}_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}) based on N=500N=500 units, where the dashed line corresponds to the standard normal density and CP is the coverage probability of interval estimators with nominal coverage probability .95.95.

In Figure 2, the left panel shows that 𝜽^𝜽|\!|\widehat{\bm{\theta}}-\bm{\theta}^{\star}|\!|_{\infty} decreases as NN increases. The right panel depicts the empirical distributions of the standardized univariate estimators and the empirical coverage probabilities, demonstrating that the covariance estimator in Equation (12) appears accurate and normal-based inference seems reasonable. As discussed in Section 1, comparisons to other estimation approaches are infeasible due to their lack of scalability.

6 Hate Speech on X

We analyze posts of U.S. state legislators on the social media platform X in the six months preceding the insurrection at the U.S. Capitol on January 6, 2021 (Kim et al., 2022), with a view to studying how hate speech depends on the attributes of legislators and connections among them. Using Large Language Models (LLMs), we classify the contents of 109,974 posts by N=N= 2,191 legislators as “non-hate speech” or “hate speech,” as explained in Section G of the Supplementary Materials. The response YiY_{i} of legislator ii indicates whether ii released at least one post classified as hate speech. We use four covariates: xi,1x_{i,1} indicates that legislator ii’s party affiliation is Republican, xi,2x_{i,2} indicates that legislator ii is female, xi,3x_{i,3} indicates that legislator ii is white, and xi,4x_{i,4} is the state legislature that legislator ii is a member of (e.g., New York). The directed connections Zi,jZ_{i,j} are based on the mentions and reposts exchanged between January 6, 2020 and January 6, 2021: Zi,j=1Z_{i,j}=1 if legislator ii mentioned or reposted posts by legislator jj in a post. To construct the neighborhoods 𝒩i\mathscr{N}_{i} of legislators ii, we exploit the fact that users of X choose whom to follow and that these choices are known, so 𝒩i\mathscr{N}_{i} is defined as the union of {i}\{i\} and the set of users followed by ii.
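Schematically, the variables of this application can be assembled as follows. The post labels, mention/repost edges, and follow lists below are hypothetical placeholders (this is not the actual data pipeline), and all names are ours.

# Hypothetical inputs: LLM labels per post, mention/repost edges, follow lists
legislators = ["a", "b", "c", "d"]
idx = {u: k for k, u in enumerate(legislators)}
N = len(legislators)

hate_flags = {"a": [0, 0, 1], "b": [0], "c": [0, 0], "d": [1, 1]}
mentions = [("a", "b"), ("b", "a"), ("c", "a")]          # i mentioned or reposted j
follows = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}

# Y_i = 1 if legislator i released at least one post classified as hate speech
Y = [int(any(hate_flags[u])) for u in legislators]

# Z_{i,j} = 1 if legislator i mentioned or reposted posts by legislator j
Z = [[0] * N for _ in range(N)]
for i, j in mentions:
    Z[idx[i]][idx[j]] = 1

# Neighborhood N_i: the union of {i} and the set of accounts followed by i
nbrs = [{u} | set(follows[u]) for u in legislators]
print(Y, Z[idx["a"]], sorted(nbrs[idx["a"]]))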

6.1 Model Specification

To accommodate binary responses Yi{0,1}Y_{i}\in\{0,1\} and connections Zi,j{0,1}Z_{i,j}\in\{0,1\} that are directed, i.e., Zi,jZ_{i,j} may not be equal to Zj,iZ_{j,i}, we consider a model of the form

f𝜽(𝒚,𝒛𝒙)[i=1Na𝒴(yi)exp(𝜽ggi(𝒙i,yi))]×[i=1Nj=1,jiNa𝒵(zi,j)exp(𝜽hhi,j(𝒙,yi,yj,𝒛))],\begin{array}[]{llllllllll}f_{\bm{\theta}}(\bm{y},\,\bm{z}\mid\bm{x})&\propto&\left[\displaystyle\prod\limits_{i=1}^{N}a_{\mathscr{Y}}(y_{i})\,\exp(\bm{\theta}_{g}^{\top}\,g_{i}(\bm{x}_{i},\,y_{i}^{\star}))\right]\vskip 7.11317pt\\ &\times&\left[\displaystyle\prod\limits_{i=1}^{N}\,\displaystyle\prod\limits_{j=1,\,j\neq i}^{N}a_{\mathscr{Z}}(z_{i,j})\,\exp(\bm{\theta}_{h}^{\top}\,h_{i,j}(\bm{x},\,y_{i}^{\star},\,y_{j}^{\star},\,\bm{z}))\right],\end{array} (13)

where yiyi/ψ=yiy_{i}^{\star}\coloneqq y_{i}/\psi=y_{i} because ψ1\psi\coloneqq 1 when Yi{0,1}Y_{i}\in\{0,1\}; see Example 3 in Section 2.2.1. Since yi=yiy_{i}^{\star}=y_{i}, we henceforth write yiy_{i} instead of yiy_{i}^{\star}.

Using the definitions of ci,jc_{i,j} and di,jd_{i,j} in Equation (3), we specify gig_{i} and hi,jh_{i,j} as follows:

𝜽g(α𝒴β𝒳,𝒴,m,m=1,2,3),gi(yixi,myi,m=1,2,3),\begin{array}[]{llllllllll}\bm{\theta}_{g}\,\coloneqq\,\left(\begin{array}[]{ccc}\alpha_{\mathscr{Y}}\\ \beta_{\mathscr{X},\mathscr{Y},m},\;m=1,2,3\end{array}\right),&g_{i}\,\coloneqq\,\left(\begin{array}[]{ccc}y_{i}\\ x_{i,m}\,y_{i},\;m=1,2,3\end{array}\right),\end{array} (14)
𝜽h(𝜶𝒵,O𝜶𝒵,Iλγ𝒳,𝒵,1γ𝒳,𝒵,m,m=2,3,4γ𝒴,𝒵γ𝒵,𝒵,1γ𝒵,𝒵,2γ𝒳,𝒴,𝒵),hi,j(𝒆izi,j𝒆jzi,j(1ci,j)zi,jlogNci,jxi,1zi,jci,j𝕀(xi,m=xj,m)zi,j,m=2,3,4ci,jyjzi,j12zi,jzj,idi,j(𝒛)zi,jci,jxi,1yjzi,j),\begin{array}[]{llllllllll}\bm{\theta}_{h}\coloneqq\left(\begin{array}[]{ccc}\bm{\alpha}_{\mathscr{Z},O}\\ \bm{\alpha}_{\mathscr{Z},I}\\ \lambda\\ \gamma_{\mathscr{X},\mathscr{Z},1}\\ \gamma_{\mathscr{X},\mathscr{Z},m},\;m=2,3,4\\ \gamma_{\mathscr{Y},\mathscr{Z}}\\ \gamma_{\mathscr{Z},\mathscr{Z},1}\\ \gamma_{\mathscr{Z},\mathscr{Z},2}\\ \gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\end{array}\right),&h_{i,j}\coloneqq\left(\begin{array}[]{ccc}\bm{e}_{i}\,z_{i,j}\\ \bm{e}_{j}\,z_{i,j}\\ -(1-c_{i,j})\,z_{i,j}\log N\\ c_{i,j}\,x_{i,1}\,z_{i,j}\\ c_{i,j}\,\mathbb{I}(x_{i,m}=x_{j,m})\,z_{i,j},\;m=2,3,4\\ c_{i,j}\,y_{j}\,z_{i,j}\\ \dfrac{1}{2}\,z_{i,j}\,z_{j,i}\\ d_{i,j}(\bm{z})\,z_{i,j}\\ c_{i,j}\,x_{i,1}\,y_{j}\,z_{i,j}\end{array}\right),\end{array} (15)

where the iith coordinate of NN-vector 𝒆i{0,1}N\bm{e}_{i}\in\{0,1\}^{N} is 11 and all other coordinates are 0. Here, 𝜶𝒵,O(α𝒵,O,1,,α𝒵,O,N)N\bm{\alpha}_{\mathscr{Z},O}\coloneqq(\alpha_{\mathscr{Z},O,1},\ldots,\alpha_{\mathscr{Z},O,N})\in\mathbb{R}^{N} quantifies the activity of legislators 1,,N1,\dots,N, i.e., their tendency to mention or repost posts of other legislators; 𝜶𝒵,I(α𝒵,I,1,,α𝒵,I,N)N\bm{\alpha}_{\mathscr{Z},I}\coloneqq(\alpha_{\mathscr{Z},I,1},\ldots,\alpha_{\mathscr{Z},I,N})\in\mathbb{R}^{N} quantifies the attractiveness of legislators 1,,N1,\dots,N, i.e., the tendency for other legislators to mention or repost posts by them; λ>0\lambda>0 discourages connections between legislators with non-overlapping neighborhoods; γ𝒳,𝒵,1,,γ𝒳,𝒵,4\gamma_{\mathscr{X},\mathscr{Z},1},\dots,\gamma_{\mathscr{X},\mathscr{Z},4}\in\mathbb{R} capture the effects of covariates xi,1,,xi,4x_{i,1},\dots,x_{i,4} on connections Zi,jZ_{i,j}; γ𝒴,𝒵\gamma_{\mathscr{Y},\mathscr{Z}}\in\mathbb{R} is the weight of the interaction of YjY_{j} and Zi,jZ_{i,j};  γ𝒵,𝒵,1\gamma_{\mathscr{Z},\mathscr{Z},1}\in\mathbb{R} quantifies the tendency to reciprocate connections; γ𝒵,𝒵,2\gamma_{\mathscr{Z},\mathscr{Z},2}\in\mathbb{R} quantifies the tendency to form transitive connections; and γ𝒳,𝒴,𝒵\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}} captures spillover from covariate xi,1x_{i,1} on response YjY_{j} through connection Zi,jZ_{i,j}; note that the spillover effect should not be interpreted as a causal effect, because the party affiliations xi,1x_{i,1} of legislators ii are not under the control of investigators (Kim et al., 2022). Since i=1NZi,j=j=1NZi,j\sum_{i=1}^{N}Z_{i,j}=\sum_{j=1}^{N}Z_{i,j} with probability 11, we set α𝒵,I,N0\alpha_{\mathscr{Z},I,N}\coloneqq 0 to address the identifiability problem that would result if all α𝒵,O,i\alpha_{\mathscr{Z},O,i} and α𝒵,I,j\alpha_{\mathscr{Z},I,j} were allowed to vary freely. These model terms were chosen based on domain knowledge, because model selection is an open problem: For instance, the statistic ci,jxi,1yjzi,jc_{i,j}\,x_{i,1}\,y_{j}\,z_{i,j} with weight γ𝒳,𝒴,𝒵\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}} is included to assess whether the party affiliation xi,1x_{i,1} of state legislators ii affects posts yjy_{j} of state legislators jj who are connected (zi,j=1z_{i,j}=1) and whose neighborhoods overlap (ci,j=1c_{i,j}=1). In practice, data scientists can consult domain experts to make informed choices regarding model specifications.
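To illustrate how the statistics in Equation (15) enter the model, the following sketch evaluates the non-degree coordinates of h_{i,j} for a single directed pair (i, j), with c_{i,j} and d_{i,j}(\bm{z}) computed from the neighborhoods as in Equation (3); the function and variable names are ours.

import math

def h_pair(i, j, x, y, z, nbrs, N):
    # Non-degree coordinates of h_{i,j} in Equation (15) for the directed pair (i, j).
    # x[i] = (republican, female, white, state); y[i] in {0, 1}; z is an N x N 0/1
    # adjacency matrix with zero diagonal; nbrs[i] is the neighborhood of unit i.
    common = nbrs[i] & nbrs[j]
    c_ij = int(len(common) > 0)
    d_ij = int(any(z[i][k] == 1 and z[k][j] == 1 for k in common))
    z_ij = z[i][j]
    return {
        "sparsity":     -(1 - c_ij) * z_ij * math.log(N),
        "covariate_1":  c_ij * x[i][0] * z_ij,                     # Republican sender
        "match_2_3_4":  [c_ij * int(x[i][m] == x[j][m]) * z_ij     # gender, race, state
                         for m in (1, 2, 3)],
        "response":     c_ij * y[j] * z_ij,
        "reciprocity":  0.5 * z_ij * z[j][i],
        "transitivity": d_ij * z_ij,
        "spillover":    c_ij * x[i][0] * y[j] * z_ij,
    }

# Toy example with N = 3 legislators
x = [(1, 0, 1, "NY"), (0, 1, 1, "NY"), (1, 1, 0, "PA")]
y = [1, 0, 1]
z = [[0, 1, 1], [1, 0, 0], [0, 1, 0]]
nbrs = [{0, 1}, {0, 1, 2}, {1, 2}]
print(h_pair(0, 1, x, y, z, nbrs, N=3))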

The specified model is estimated by an extension of the algorithm in Section 3.2 to directed connections; see Section F of the Supplementary Materials.

6.2 Results

Table 1: Maximum pseudo-likelihood estimates and standard errors based on the model specified by Equations (14) and (15).
Weight                                   Estimate   Standard Error      Weight                                   Estimate   Standard Error
\alpha_{\mathcal{Y}}                     -.893      .134                \gamma_{\mathcal{Z},\mathcal{Z},1}       2.57       .033
\beta_{\mathcal{X},\mathcal{Y},1}        -.257      .105                \gamma_{\mathcal{Z},\mathcal{Z},2}       .604       .037
\beta_{\mathcal{X},\mathcal{Y},2}        .069       .094                \gamma_{\mathcal{X},\mathcal{Z},1}       -.007      .07
\beta_{\mathcal{X},\mathcal{Y},3}        -.034      .127                \gamma_{\mathcal{X},\mathcal{Z},2}       .236       .016
\gamma_{\mathcal{Y},\mathcal{Z}}         .035       .005                \gamma_{\mathcal{X},\mathcal{Z},3}       .756       .025
\gamma_{\mathcal{X},\mathcal{Y},\mathcal{Z}}   .038  .013               \gamma_{\mathcal{X},\mathcal{Z},4}       4.729      .049
\lambda                                  .184       .006

To interpret the results, we exploit the fact that the conditional distributions of responses YiY_{i} and connections Zi,jZ_{i,j} can be represented by logistic regression models, with log odds

log𝜽(Yi=1others)1𝜽(Yi=1others)=α𝒴+m=13β𝒳,𝒴,mxi,m+j:𝒩i𝒩j(γ𝒴,𝒵+γ𝒳,𝒴,𝒵xj,1)zj,i\begin{array}[]{llllllllll}\log\dfrac{\mathbb{P}_{\bm{\theta}}(Y_{i}=1\mid\text{others})}{1-\mathbb{P}_{\bm{\theta}}(Y_{i}=1\mid\text{others})}&=&\alpha_{\mathscr{Y}}+\displaystyle\sum\limits_{m=1}^{3}\beta_{\mathscr{X},\mathscr{Y},m}\,x_{i,m}+\displaystyle\sum\limits_{j:\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset}(\gamma_{\mathscr{Y},\mathscr{Z}}+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,x_{j,1})\,z_{j,i}\end{array}

and

log𝜽(Zi,j=1others)1𝜽(Zi,j=1others)=α𝒵,O,i+α𝒵,I,j+12γ𝒵,𝒵,1zj,i(1ci,j)λlogN+ci,j(γ𝒵,𝒵,2Δi,j(𝒛)+γ𝒳,𝒵,1xi,1+m=24γ𝒳,𝒵,m𝕀(xi,m=xj,m)+γ𝒴,𝒵yj+γ𝒳,𝒴,𝒵xi,1yj).\begin{array}[]{llllllllll}&&\log\dfrac{\mathbb{P}_{\bm{\theta}}(Z_{i,j}=1\mid\text{others})}{1-\mathbb{P}_{\bm{\theta}}(Z_{i,j}=1\mid\text{others})}\;=\;\alpha_{\mathscr{Z},O,i}+\alpha_{\mathscr{Z},I,j}+\dfrac{1}{2}\,\gamma_{\mathscr{Z},\mathscr{Z},1}\,z_{j,i}-(1-c_{i,j})\,\lambda\,\log N\vskip 7.11317pt\\ &+&c_{i,j}\left(\gamma_{\mathscr{Z},\mathscr{Z},2}\;\Delta_{i,j}(\bm{z})+\gamma_{\mathscr{X},\mathscr{Z},1}\,x_{i,1}+\displaystyle\sum\limits_{m=2}^{4}\gamma_{\mathscr{X},\mathscr{Z},m}\,\mathbb{I}(x_{i,m}=x_{j,m})+\gamma_{\mathscr{Y},\mathscr{Z}}\,y_{j}+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,x_{i,1}\,y_{j}\right).\end{array}
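The first of these two logistic representations can be evaluated directly from the estimates in Table 1; the sketch below (helper and variable names are ours) computes the conditional log-odds and the conditional probability of Y_i = 1 for one unit, given all other responses and the connections.

import math

def log_odds_Y(i, x, y, z, nbrs, w):
    # Conditional log-odds of Y_i = 1 given everything else, following the logistic
    # representation above; w holds fitted weights, x[i] = (republican, female, white).
    lo = w["alpha_Y"] + sum(w["beta_XY"][m] * x[i][m] for m in range(3))
    for j in range(len(y)):
        if j != i and len(nbrs[i] & nbrs[j]) > 0:
            lo += (w["gamma_YZ"] + w["gamma_XYZ"] * x[j][0]) * z[j][i]
    return lo

w = {"alpha_Y": -0.893, "beta_XY": [-0.257, 0.069, -0.034],
     "gamma_YZ": 0.035, "gamma_XYZ": 0.038}              # estimates from Table 1

x = [(1, 0, 1), (1, 1, 1), (0, 0, 1)]
y = [0, 1, 0]
z = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]                    # z[j][i] = 1: j mentioned/reposted i
nbrs = [{0, 1}, {0, 1, 2}, {1, 2}]

lo = log_odds_Y(0, x, y, z, nbrs, w)
print(lo, 1 / (1 + math.exp(-lo)))                       # log-odds and conditional probability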

For instance, the positive sign of \widehat{\gamma}_{\mathscr{X},\mathscr{Y},\mathscr{Z}}=.038 suggests that the more Republicans interact with legislator i, the higher the conditional probability that legislator i uses offensive text in a post, holding everything else constant. Alternatively, one can interpret \widehat{\gamma}_{\mathscr{X},\mathscr{Y},\mathscr{Z}} in terms of the conditional probability of observing a connection: the positive sign of \widehat{\gamma}_{\mathscr{X},\mathscr{Y},\mathscr{Z}}=.038 indicates that Republican legislators are more likely to interact with legislators who post harmful language. Other estimates align with expectations. For example, serving in the same state legislature is the strongest predictor of reposting and mentioning activity (\widehat{\gamma}_{\mathscr{X},\mathscr{Z},4}=4.729), while matching gender (\widehat{\gamma}_{\mathscr{X},\mathscr{Z},2}=.236) and race (\widehat{\gamma}_{\mathscr{X},\mathscr{Z},3}=.756) likewise increase the conditional probability of interaction. At the same time, connections affect other connections: for example, connections that complete transitive configurations are observed more often than expected under the model with \gamma_{\mathscr{Z},\mathscr{Z},2}=0, holding everything else constant.
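As a back-of-the-envelope illustration of the magnitudes implied by Table 1, an additional interaction z_{j,i}=1 from a legislator j with overlapping neighborhood multiplies the conditional odds of Y_{i}=1 by \exp(\widehat{\gamma}_{\mathscr{Y},\mathscr{Z}}+\widehat{\gamma}_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,x_{j,1}), that is, by \exp(.035+.038)\approx 1.08 if j is Republican and by \exp(.035)\approx 1.04 otherwise, holding everything else constant; the Republican affiliation of j by itself therefore corresponds to an odds multiplier of \exp(.038)\approx 1.04.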

6.3 Model Assessment

Figure 3: Model-based predictions of spillover in- and out-degrees of U.S. state legislators in the subnetwork with i being Republican, j using offensive language, and the neighborhoods of i and j overlapping, i.e., x_{i,1}=Y_{j}=1 and \mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset. By construction, the possible connections in the subnetwork act as potential channels of spillover. The spillover in- and out-degrees are defined as the respective degrees of a unit in this subnetwork. The observed spillover in- and out-degrees are shown by solid lines with overlaid points.
Figure 4: Comparing model-based predictions of Y_{i}\mid\bm{X}_{i}=\bm{x}_{i} without network interference (solid) to model-based predictions of Y_{i}\mid(\bm{X},\bm{Y}_{-i},\bm{Z})=(\bm{x},\bm{y}_{-i},\bm{z}) with network interference (dashed) based on the area under the curve (AUC). Sensitivity is the true positive rate: the fraction of legislators making offensive posts whom the model predicts to do so. Specificity is the true negative rate: the fraction of legislators not making offensive posts whom the model predicts not to do so.
Table 2: Comparison of maximum pseudo-likelihood estimates based on the logistic regression model for Yi𝑿i=𝒙iY_{i}\mid\bm{X}_{i}=\bm{x}_{i} without network interference and the joint probability model for (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x} with network interference.
                                 \alpha_{\mathcal{Y}}    \beta_{\mathcal{X},\mathcal{Y},1}    \beta_{\mathcal{X},\mathcal{Y},2}    \beta_{\mathcal{X},\mathcal{Y},3}
Without network interference     -.101 (.103)            -.235 (.097)                         .032 (.093)                          -.169 (.113)
With network interference        -.893 (.134)            -.257 (.105)                         .069 (.094)                          -.034 (.127)

We assess the model using model-based predictions of (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}. First, we focus on the subnetwork of all pairs of legislators {i,j}𝒫N\{i,j\}\subset\mathscr{P}_{N} with xi,1=Yj=1x_{i,1}=Y_{j}=1 and 𝒩i𝒩j\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset, with a view to assessing how well the interplay of xi,1x_{i,1}, YjY_{j}, and Zi,jZ_{i,j} can be represented by the model. Figure 3 shows that the model captures the effect of xi,1x_{i,1} on YjY_{j} among pairs of legislators {i,j}𝒫N\{i,j\}\subset\mathscr{P}_{N} with 𝒩i𝒩j\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset. Second, we compare predictions of responses YiY_{i} based on models with and without network interference. Predictions without network interference are based on the logistic regression model for Yi𝑿i=𝒙iY_{i}\mid\bm{X}_{i}=\bm{x}_{i} with weights α𝒴\alpha_{\mathscr{Y}}, β𝒳,𝒴,1\beta_{\mathscr{X},\mathscr{Y},1}, β𝒳,𝒴,2\beta_{\mathscr{X},\mathscr{Y},2}, and β𝒳,𝒴,3\beta_{\mathscr{X},\mathscr{Y},3}, which assumes that the posts YiY_{i} of legislators ii are independent and do not depend on the connections 𝒁\bm{Z} among the legislators. By contrast, predictions with network interference are based on the joint probability model for (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x} specified by Equation (13), which allows the posts YiY_{i} of state legislators ii to be affected by the posts YjY_{j} of connected legislators jj whose sets of followees overlap. Figure 4 demonstrates that predictions based on models with network interference outperform those without network interference, suggesting that posts of connected state legislators with overlapping sets of followees are interdependent. Table 2 compares estimates of α𝒴\alpha_{\mathcal{Y}}, β𝒳,𝒴,1\beta_{\mathcal{X},\mathcal{Y},1}, β𝒳,𝒴,2\beta_{\mathcal{X},\mathcal{Y},2}, and β𝒳,𝒴,3\beta_{\mathcal{X},\mathcal{Y},3} based on models with and without network interference. While both models agree on the signs of parameter estimates, the estimates of β𝒳,𝒴,2\beta_{\mathcal{X},\mathcal{Y},2} and β𝒳,𝒴,3\beta_{\mathcal{X},\mathcal{Y},3} differ by a factor of 2 and 5, respectively, suggesting that network interference affects parameter estimates. Third, we demonstrate in Section G.2 of the Supplementary Materials that the model preserves salient features of connections 𝒁\bm{Z}.
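The AUC comparison in Figure 4 can be reproduced in outline with scikit-learn. In the sketch below, the response vector and the two vectors of predicted probabilities are random placeholders (in practice they would be the observed responses and the fitted probabilities from the models with and without network interference), and all names are ours.

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# y_obs: observed responses; p_no_interf, p_interf: model-based probabilities of Y_i = 1
y_obs = rng.integers(0, 2, size=200)
p_no_interf = np.clip(0.5 * y_obs + rng.uniform(0.0, 0.7, size=200), 0.0, 1.0)
p_interf = np.clip(0.6 * y_obs + rng.uniform(0.0, 0.6, size=200), 0.0, 1.0)

print("AUC without network interference:", roc_auc_score(y_obs, p_no_interf))
print("AUC with network interference:   ", roc_auc_score(y_obs, p_interf))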

7 Discussion

The proposed regression framework is flexible, allowing data scientists to specify a wide range of models for dependent responses and connections (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}.

The large set of possible models raises the question of how data scientists can select a model from a set of candidate models. While model selection for independent responses Yi𝑿i=𝒙iY_{i}\mid\bm{X}_{i}=\bm{x}_{i} is well-established and model selection for independent connections 𝒁𝑿=𝒙\bm{Z}\mid\bm{X}=\bm{x} is an active area of research (e.g., Wang and Bickel, 2017; Wang et al., 2024; Stein et al., 2025), model selection for dependent responses and connections (𝒀,𝒁)𝑿=𝒙(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x} with pp\to\infty parameters is an open problem. Two model selection ideas that hold promise are those of Ravikumar et al. (2010) for high-dimensional graphical models for dependent responses 𝒀\bm{Y} and those of Chen and Chen (2012) for high-dimensional generalized linear models for independent responses Yi𝑿i=𝒙iY_{i}\mid\bm{X}_{i}=\bm{x}_{i} based on the extended BIC.

Likewise, open questions remain in the realm of uncertainty quantification, as discussed in Section 3.3. For example, a proof of asymptotic normality based on dependent data remains elusive. A related avenue for future research is Godambe information: if the inverse negative Hessian (-\nabla_{\bm{\theta}}^{2}\,\ell(\bm{\theta};\,\bm{Y},\bm{Z})|_{\bm{\theta}=\widehat{\bm{\theta}}})^{-1} were constant, it could be pulled out of the approximate covariance matrix in Equation (12), giving rise to the Godambe information. Simulations suggest that uncertainty quantification based on the Godambe information achieves accuracy comparable to the method reported here, while avoiding multiple matrix inversions.
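As a sketch of the idea only (a generic sandwich form under our own notation, not the authors' Equation (12) or implementation), a Godambe-type covariance estimate combines a single sensitivity matrix with the variability of the pseudo-score.

import numpy as np

def godambe_covariance(neg_hessian, score_cov):
    # With sensitivity matrix H (negative Hessian of the pseudo-loglikelihood at the
    # estimate) and variability matrix J (estimated variance of the pseudo-score), the
    # Godambe information is G = H J^{-1} H and the covariance estimate is
    # G^{-1} = H^{-1} J H^{-1}; a single inversion of H suffices.
    H_inv = np.linalg.inv(neg_hessian)
    return H_inv @ score_cov @ H_inv

# Toy 2 x 2 example with made-up matrices
H = np.array([[4.0, 0.5], [0.5, 3.0]])
J = np.array([[5.0, 0.2], [0.2, 2.5]])
print(godambe_covariance(H, J))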

The question of neighborhood recovery is another important direction for future research, as discussed in Section 2.

Supplementary Materials

The supplementary materials contain proofs of all theoretical results.

Acknowledgements

The authors are indebted to an anonymous associate editor and two referees for their constructive comments and suggestions, which have led to numerous improvements.

Disclosure Statement

The authors report there are no competing interests to declare.

Funding

The authors acknowledge support by DFG award FR 4768/1-1 (CF) and ARO award W911NF-21-1-0335 (CF, MS, SB).

References

  • Besag (1974) Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society: Series B 36, 192–225.
  • Brown (1986) Brown, L. (1986). Fundamentals of Statistical Exponential Families: With Applications in Statistical Decision Theory. Hayward, CA, USA: Institute of Mathematical Statistics.
  • Chatterjee and Diaconis (2013) Chatterjee, S. and P. Diaconis (2013). Estimating and understanding exponential random graph models. The Annals of Statistics 41, 2428–2461.
  • Chen and Chen (2012) Chen, J. and Z. Chen (2012). Extended BIC for small-nn-large-pp sparse GLM. Statistica Sinica 22, 555–574.
  • Clark and Handcock (2024) Clark, D. A. and M. S. Handcock (2024). Causal inference over stochastic networks. Journal of the Royal Statistical Society Series A: Statistics in Society 187, 772–795.
  • Comets and Janzura (1998) Comets, F. and M. Janzura (1998). A central limit theorem for conditionally centred random fields with an application to Markov fields. Journal of Applied Probability 35, 608–621.
  • Efron (2022) Efron, B. (2022). Exponential families in theory and practice. Cambridge, MA: Cambridge University Press.
  • Fosdick and Hoff (2015) Fosdick, B. K. and P. D. Hoff (2015). Testing and modeling dependencies between a network and nodal attributes. Journal of the American Statistical Association 110, 1047–1056.
  • Handcock (2003) Handcock, M. S. (2003). Statistical models for social networks: Inference and degeneracy. In R. Breiger, K. Carley, and P. Pattison (Eds.), Dynamic Social Network Modeling and Analysis, pp.  1–12. Washington, D.C.: National Academies Press.
  • Huang et al. (2019) Huang, D., W. Lan, H. H. Zhang, and H. Wang (2019). Least squares estimation of spatial autoregressive models for large-scale social networks. Electronic Journal of Statistics 13, 1135–1165.
  • Huang et al. (2020) Huang, D., F. Wang, X. Zhu, and H. Wang (2020). Two-mode network autoregressive model for large-scale networks. Journal of Econometrics 216, 203–219.
  • Huang et al. (2024) Huang, S., J. Sun, and Y. Feng (2024). PCABM: Pairwise covariates-adjusted block model for community detection. Journal of the American Statistical Association 119, 2092–2104.
  • Hunter and Lange (2004) Hunter, D. R. and K. Lange (2004). A tutorial on MM algorithms. The American Statistician 58, 30–37.
  • Jensen and Künsch (1994) Jensen, J. L. and H. R. Künsch (1994). On asymptotic normality of pseudo likelihood estimates for pairwise interaction processes. Annals of the Institute of Statistical Mathematics 46, 475–486.
  • Kim et al. (2022) Kim, T., N. Nakka, I. Gopal, B. A. Desmarais, A. Mancinelli, J. J. Harden, H. Ko, and F. J. Boehmke (2022). Attention to the COVID‐19 pandemic on Twitter: Partisan differences among U.S. state legislators. Legislative Studies Quarterly 47, 1023–1041.
  • Kolaczyk (2017) Kolaczyk, E. D. (2017). Topics at the Frontier of Statistics and Network Analysis: (Re)Visiting the Foundations. Cambridge University Press.
  • Krivitsky et al. (2023) Krivitsky, P. N., P. Coletti, and N. Hens (2023). A tale of two datasets: Representativeness and generalisability of inference for samples of networks. Journal of the American Statistical Association 118, 2213–2224.
  • Le and Li (2022) Le, C. M. and T. Li (2022). Linear regression and its inference on noisy network-linked data. Journal of the Royal Statistical Society Series B: Statistical Methodology 84, 1851–1885.
  • Lei et al. (2024) Lei, J., K. Chen, and H. Moon (2024). Least squares inference for data with network dependency. Available at arXiv:2404.01977.
  • Li and Wager (2022) Li, S. and S. Wager (2022). Random graph asymptotics for treatment effect estimation under network interference. The Annals of Statistics 50, 2334 – 2358.
  • Li et al. (2019) Li, T., E. Levina, and J. Zhu (2019). Prediction models for network-linked data. The Annals of Applied Statistics 13(1), 132–164.
  • Niezink and Snijders (2017) Niezink, N. M. D. and T. A. B. Snijders (2017). Co-evolution of social networks and continuous actor attributes. The Annals of Applied Statistics 11, 1948–1973.
  • Ogburn et al. (2024) Ogburn, E. L., O. Sofrygin, I. Diaz, and M. J. Van der Laan (2024). Causal inference for social network data. Journal of the American Statistical Association 119, 597–611.
  • Ortega and Rheinboldt (2000) Ortega, J. M. and W. C. Rheinboldt (2000). Iterative solution of nonlinear equations in several variables. Society for Industrial and Applied Mathematics.
  • Ravikumar et al. (2010) Ravikumar, P., M. J. Wainwright, and J. Lafferty (2010). High-dimensional Ising model selection using 1\ell_{1}-regularized logistic regression. The Annals of Statistics 38, 1287–1319.
  • Schweinberger (2011) Schweinberger, M. (2011). Instability, sensitivity, and degeneracy of discrete exponential families. Journal of the American Statistical Association 106, 1361–1370.
  • Snijders et al. (2007) Snijders, T. A. B., C. E. G. Steglich, and M. Schweinberger (2007). Modeling the co-evolution of networks and behavior. In K. van Montfort, H. Oud, and A. Satorra (Eds.), Longitudinal models in the behavioral and related sciences, pp.  41–71. Lawrence Erlbaum.
  • Stein et al. (2025) Stein, S., R. Feng, and C. Leng (2025). A sparse beta regression model for network analysis. Journal of the American Statistical Association. To appear.
  • Stewart and Schweinberger (2025) Stewart, J. R. and M. Schweinberger (2025). Pseudo-likelihood-based MM-estimators for random graphs with dependent edges and parameter vectors of increasing dimension. The Annals of Statistics. To appear.
  • Tchetgen Tchetgen et al. (2021) Tchetgen Tchetgen, E. J., I. R. Fulcher, and I. Shpitser (2021). Auto-G-computation of causal effects on a network. Journal of the American Statistical Association 116, 833–844.
  • Wang et al. (2024) Wang, J., X. Cai, X. Niu, and R. Li (2024). Variable selection for high-dimensional nodal attributes in social networks with degree heterogeneity. Journal of the American Statistical Association 119, 1322–1335.
  • Wang and Bickel (2017) Wang, Y. X. and P. J. Bickel (2017). Likelihood-based model selection for stochastic block models. The Annals of Statistics 45, 500–528.
  • Wang et al. (2024) Wang, Z., I. E. Fellows, and M. S. Handcock (2024). Understanding networks with exponential-family random network models. Social Networks 78, 81–91.
  • Zhu et al. (2020) Zhu, X., D. Huang, R. Pan, and H. Wang (2020). Multivariate spatial autoregressive model for large scale social networks. Journal of Econometrics 215, 591–606.

Supplementary Materials:
A Regression Framework for Studying Relationships among Attributes under Network Interference


Appendix A Proofs of Propositions 2.1 and 2

Proof of Proposition 2.1. The joint probability density function of (𝒀,𝒁)𝑿=𝒙(\bm{Y},\,\bm{Z})\mid\bm{X}=\bm{x} stated in Equation (1) in Section 2 implies that the conditional probability density function of Yi(𝑿,𝒀i,𝒁)=(𝒙,𝒚i,𝒛)Y_{i}\mid(\bm{X},\,\bm{Y}_{-i},\,\bm{Z})=(\bm{x},\,\bm{y}_{-i},\,\bm{z}) can be written as

f𝜽(yi𝒙,𝒚i,𝒛)=f𝜽(yi,𝒚i,𝒛𝒙)𝒴if𝜽(y,𝒚i,𝒛𝒙)dν𝒴(y)=a𝒴(yi)exp(𝜽ggi,1(𝒙i)yi+(j𝒫N{i}𝜽hhi,j,1(𝒙,yj,𝒛))yi)𝒴ia𝒴(y)exp(𝜽ggi,1(𝒙i)y+(j𝒫N{i}𝜽hhi,j,1(𝒙,yj,𝒛))y)dν𝒴(y)=a𝒴(yi)exp(ηi(𝜽;𝒙,𝒚i,𝒛)yibi(ηi(𝜽;𝒙,𝒚i,𝒛))ψ),\begin{array}[]{llllllllll}&&f_{\bm{\theta}}(y_{i}\mid\bm{x},\,\bm{y}_{-i},\,\bm{z})\;=\;\dfrac{f_{\bm{\theta}}(y_{i},\,\bm{y}_{-i},\,\bm{z}\mid\bm{x})}{\displaystyle\int\limits_{\mathscr{Y}_{i}}f_{\bm{\theta}}(y,\,\bm{y}_{-i},\,\bm{z}\mid\bm{x})\mathop{\mbox{d}}\nolimits\nu_{\mathscr{Y}}(y)}\vskip 7.11317pt\vskip 7.11317pt\\ &=&\dfrac{a_{\mathscr{Y}}(y_{i})\,\exp\left(\bm{\theta}_{g}^{\top}\,g_{i,1}(\bm{x}_{i})\,y_{i}^{\star}+\left(\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\;\bm{\theta}_{h}^{\top}\,h_{i,j,1}(\bm{x},\,y_{j}^{\star},\,\bm{z})\right)y_{i}^{\star}\right)}{\displaystyle\int\limits_{\mathscr{Y}_{i}}a_{\mathscr{Y}}(y)\,\exp\left(\bm{\theta}_{g}^{\top}\,g_{i,1}(\bm{x}_{i})\,y^{\star}+\left(\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\;\bm{\theta}_{h}^{\top}\,h_{i,j,1}(\bm{x},\,y_{j}^{\star},\,\bm{z})\right)y^{\star}\right)\mathop{\mbox{d}}\nolimits\nu_{\mathscr{Y}}(y)}\vskip 7.11317pt\vskip 7.11317pt\\ &=&a_{\mathscr{Y}}(y_{i})\,\exp\left(\dfrac{\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z})\,y_{i}-b_{i}(\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z}))}{\psi}\right),\end{array}

where yy/ψy^{\star}\coloneqq y/\psi,  yiyi/ψy_{i}^{\star}\coloneqq y_{i}/\psi,  and 𝒚i𝒚i/ψ\bm{y}_{-i}^{\star}\coloneqq\bm{y}_{-i}/\psi, while

ηi(𝜽;𝒙,𝒚i,𝒛)𝜽(gi,1(𝒙i),j𝒫N{i}hi,j,1(𝒙,yj,𝒛))bi(ηi(𝜽;𝒙,𝒚i,𝒛))ψlog𝒴ia𝒴(y)exp(ηi(𝜽;𝒙,𝒚i,𝒛)yψ)dν𝒴(y).\begin{array}[]{llllllllll}\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z})\;\coloneqq\;\bm{\theta}^{\top}\left(g_{i,1}(\bm{x}_{i}),\;\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,h_{i,j,1}(\bm{x},\,y_{j}^{\star},\,\bm{z})\right)\vskip 7.11317pt\\ b_{i}(\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z}))\;\coloneqq\;\psi\,\log\displaystyle\int\limits_{\mathscr{Y}_{i}}\,a_{\mathscr{Y}}(y)\exp\left(\dfrac{\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z})\,y}{\psi}\right)\mathop{\mbox{d}}\nolimits\nu_{\mathscr{Y}}(y).\end{array}

Proposition 2. Consider Example 1 in Section 2.2.1. Let 𝐔{0,1}N×N\bm{U}\in\{0,1\}^{N\times N} be the N×NN\times N matrix with elements

ui,jci,jzi,j=𝟙(𝒩i𝒩j)zi,j,\begin{array}[]{llllllllll}u_{i,j}&\coloneqq&c_{i,j}\,z_{i,j}&=&\mathbbm{1}(\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset)\,z_{i,j},\end{array} (A.1)

and let 𝐯N\bm{v}\in\mathbb{R}^{N} be the NN-vector with coordinates

viα𝒴+β𝒳,𝒴xi+γ𝒳,𝒴,𝒵j𝒫N{i}ui,jxj.\begin{array}[]{llllllllll}v_{i}&\coloneqq&\alpha_{\mathscr{Y}}+\beta_{\mathscr{X},\mathscr{Y}}\;x_{i}+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;x_{j}.\end{array} (A.2)

Denote by 𝐈\bm{I} the N×NN\times N identity matrix and define ξ𝒴,𝒴,𝒵γ𝒴,𝒴,𝒵/ψ\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\coloneqq\,\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}/\psi. If (𝐈ξ𝒴,𝒴,𝒵𝐔)(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U}) is positive definite, the conditional distribution of 𝐘(𝐗,𝐙)=(𝐱,𝐳)\bm{Y}\mid(\bm{X},\,\bm{Z})=(\bm{x},\,\bm{z}) is NN-variate Gaussian with mean vector (𝐈ξ𝒴,𝒴,𝒵𝐔)1𝐯(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})^{-1}\,\bm{v} and covariance matrix ψ(𝐈ξ𝒴,𝒴,𝒵𝐔)1\psi\,(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})^{-1}.

Remark. The requirement that (𝑰ξ𝒴,𝒴,𝒵𝑼)(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U}) be positive definite imposes restrictions on γ𝒴,𝒴,𝒵\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}. The restrictions on γ𝒴,𝒴,𝒵\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}} depend on the neighborhoods 𝒩i\mathscr{N}_{i} of units i𝒫Ni\in\mathscr{P}_{N} and connections Zi,jZ_{i,j} among pairs of units {i,j}𝒫N\{i,j\}\subset\mathscr{P}_{N}.
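As a small numerical check of the proposition under hypothetical inputs, the following sketch (variable names are ours) builds \bm{U} and \bm{v}, verifies that \bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\bm{U} is positive definite, and forms the implied mean vector and covariance matrix.

import numpy as np

N = 4
z = np.array([[0, 1, 1, 0],          # undirected toy network: z is symmetric
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
c = np.ones((N, N)) - np.eye(N)      # here all pairs of neighborhoods overlap: c_{i,j} = 1
U = c * z                            # u_{i,j} = c_{i,j} z_{i,j}, as in Equation (A.1)

x = np.array([0.2, 0.8, 0.5, 0.1])
alpha_Y, beta_XY, gamma_XYZ, gamma_YYZ, psi = -1.0, 3.0, 0.5, 0.2, 1.0
xi = gamma_YYZ / psi

v = alpha_Y + beta_XY * x + gamma_XYZ * (U @ x)     # coordinates v_i of Equation (A.2)

# Positive definiteness of I - xi U guarantees a proper N-variate Gaussian
assert np.all(np.linalg.eigvalsh(np.eye(N) - xi * U) > 0)
mean = np.linalg.solve(np.eye(N) - xi * U, v)       # (I - xi U)^{-1} v
cov = psi * np.linalg.inv(np.eye(N) - xi * U)       # psi (I - xi U)^{-1}
print(mean, np.diag(cov))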

Proof of Proposition 2. Example 1 in Section 2.2.1 demonstrates that the conditional distribution of Y_{i}\mid(\bm{X},\,\bm{Y}_{-i},\,\bm{Z})=(\bm{x},\,\bm{y}_{-i},\,\bm{z}) is Gaussian with conditional mean

𝔼(Yi𝒙,𝒚i,𝒛)\displaystyle\mathbb{E}(Y_{i}\mid\bm{x},\,\bm{y}_{-i},\,\bm{z})~ =α𝒴+β𝒳,𝒴xi+γ𝒳,𝒴,𝒵j𝒫N{i}ui,jxj+γ𝒴,𝒴,𝒵j𝒫N{i}ui,jyj\displaystyle=~\alpha_{\mathscr{Y}}+\beta_{\mathscr{X},\mathscr{Y}}\,x_{i}+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;x_{j}+\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;y_{j}^{\star}\vskip 7.11317pt
=vi+ξ𝒴,𝒴,𝒵j𝒫N{i}ui,jyj,\displaystyle=~v_{i}+\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;y_{j}, (A.3)

where

viα𝒴+β𝒳,𝒴xi+γ𝒳,𝒴,𝒵j𝒫N{i}ui,jxj\begin{array}[]{llllllllll}v_{i}&\coloneqq&\alpha_{\mathscr{Y}}+\beta_{\mathscr{X},\mathscr{Y}}\,x_{i}+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;x_{j}\end{array}

and

ξ𝒴,𝒴,𝒵γ𝒴,𝒴,𝒵ψ.\begin{array}[]{llllllllll}\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}&\coloneqq&\dfrac{\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}}{\psi}.\end{array}

The conditional variance of Yi(𝑿,𝒀i,𝒁)=(𝒙,𝒚i,𝒛)Y_{i}\mid(\bm{X},\,\bm{Y}_{-i},\bm{Z})=(\bm{x},\,\bm{y}_{-i},\,\bm{z}) is

𝕍(Yi𝒙,𝒚i,𝒛)=ψ.\begin{array}[]{llllllllll}\mathbb{V}(Y_{i}\mid\bm{x},\,\bm{y}_{-i},\,\bm{z})&=&\psi.\end{array} (A.4)

Let \bm{m}\coloneqq(m_{i})\in\mathbb{R}^{N} be the conditional mean of \bm{Y}\mid(\bm{X},\,\bm{Z})=(\bm{x},\,\bm{z}). Upon taking expectations on both sides of (A.3) conditional on (\bm{X},\,\bm{Z})=(\bm{x},\,\bm{z}), we obtain

mi=vi+ξ𝒴,𝒴,𝒵j𝒫N{i}ui,jmj,\begin{array}[]{llllllllll}m_{i}&=&v_{i}+\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;m_{j},\end{array} (A.5)

which implies that

vi=miξ𝒴,𝒴,𝒵j𝒫N{i}ui,jmj\begin{array}[]{llllllllll}v_{i}&=&m_{i}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;m_{j}\end{array}

and hence

𝔼(Yi𝒙,𝒚i,𝒛)\displaystyle\mathbb{E}(Y_{i}\mid\bm{x},\,\bm{y}_{-i},\,\bm{z}) =vi+ξ𝒴,𝒴,𝒵j𝒫N{i}ui,jyj\displaystyle=~v_{i}+\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;y_{j}\vskip 7.11317pt
=miξ𝒴,𝒴,𝒵j𝒫N{i}ui,jmj+ξ𝒴,𝒴,𝒵j𝒫N{i}ui,jyj\displaystyle=~m_{i}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;m_{j}+\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;y_{j}\vskip 7.11317pt
=mi+ξ𝒴,𝒴,𝒵j𝒫N{i}ui,j(yjmj)\displaystyle=~m_{i}+\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;(y_{j}-m_{j})\vskip 7.11317pt
=mij𝒫N{i}bi,j(yjmj),\displaystyle=~m_{i}-\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,b_{i,j}\;(y_{j}-m_{j}), (A.6)

where

bi,jξ𝒴,𝒴,𝒵ui,j.\begin{array}[]{llllllllll}b_{i,j}&\coloneqq&-\,\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;u_{i,j}.\end{array}

By comparing Equations (A.4) and (A.6) to Equations (2.17) and (2.18) of \citetsupprue2005gaussian and invoking Theorem 2.6 of \citetsupprue2005gaussian, we conclude that the conditional distribution of \bm{Y}\mid(\bm{X},\,\bm{Z})=(\bm{x},\,\bm{z}) is N-variate Gaussian with mean vector \bm{m}\in\mathbb{R}^{N} and precision matrix \bm{P}\in\mathbb{R}^{N\times N} with elements

pi,j{1ψif i=jbi,jψif ij,\begin{array}[]{llllllllll}p_{i,j}&\coloneqq&\begin{cases}\dfrac{1}{\psi}&\mbox{if }i=j\vskip 7.11317pt\\ \dfrac{b_{i,j}}{\psi}&\mbox{if }i\neq j,\end{cases}\end{array}

provided ui,j=uj,iu_{i,j}=u_{j,i} for all iji\neq j and 𝑷\bm{P} is positive definite; note that ui,j=uj,iu_{i,j}=u_{j,i} is satisfied in undirected networks with zi,j=zj,iz_{i,j}=z_{j,i}.

To state these results in matrix form, note that (A.5) can be expressed as

𝒎=𝒗+ξ𝒴,𝒴,𝒵𝑼𝒎,\begin{array}[]{llllllllll}\bm{m}&=&\bm{v}+\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U}\,\bm{m},\end{array}

implying

𝒎=(𝑰ξ𝒴,𝒴,𝒵𝑼)1𝒗,\begin{array}[]{llllllllll}\bm{m}&=&(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})^{-1}\;\bm{v},\end{array}

while 𝑷\bm{P} can be expressed as

𝑷=1ψ(𝑰ξ𝒴,𝒴,𝒵𝑼),\begin{array}[]{llllllllll}\bm{P}&=&\dfrac{1}{\psi}\,(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U}),\end{array}

implying

𝑷1=ψ(𝑰ξ𝒴,𝒴,𝒵𝑼)1.\begin{array}[]{llllllllll}\bm{P}^{-1}&=&\psi\,(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})^{-1}.\end{array}

To conclude, the conditional distribution of 𝒀(𝑿,𝒁)=(𝒙,𝒛)\bm{Y}\mid(\bm{X},\,\bm{Z})=(\bm{x},\,\bm{z}) is NN-variate Gaussian with mean vector (𝑰ξ𝒴,𝒴,𝒵𝑼)1𝒗(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})^{-1}\,\bm{v} and covariance matrix ψ(𝑰ξ𝒴,𝒴,𝒵𝑼)1\psi\,(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})^{-1}, provided (𝑰ξ𝒴,𝒴,𝒵𝑼)(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U}) is positive definite.

Appendix B Proofs of Lemmas 3.1 and 3.2

Proof of Lemma 3.1. Lemma 3.1 is proved in the sentence preceding the statement of Lemma 3.1 in Section 3.1.

Proof of Lemma 3.2. Letting 𝚯1\bm{\Theta}_{1} denote the parameter space of 𝜽1\bm{\theta}_{1}, suppose that v:𝚯1v:\bm{\Theta}_{1}\mapsto\mathbb{R} is any twice differentiable function and that 2v(𝜽)𝑴\nabla^{2}\,v(\bm{\theta})-\bm{M} is non-negative definite for all 𝜽𝚯1\bm{\theta}\in\bm{\Theta}_{1} for some constant matrix 𝑴d×d\bm{M}\in\mathbb{R}^{d\times d} (d1d\geq 1). Then the function u:𝚯1u:\bm{\Theta}_{1}\mapsto\mathbb{R} given by

u(𝜽1)v(𝜽0)+(𝜽1𝜽0)v(𝜽0)+12(𝜽1𝜽0)𝑴(𝜽1𝜽0),𝜽0𝚯1\begin{array}[]{llllllllll}u(\bm{\theta}_{1})&\coloneqq&v(\bm{\theta}_{0})+(\bm{\theta}_{1}-\bm{\theta}_{0})^{\top}\,\nabla\,v(\bm{\theta}_{0})+\dfrac{1}{2}(\bm{\theta}_{1}-\bm{\theta}_{0})^{\top}\bm{M}(\bm{\theta}_{1}-\bm{\theta}_{0}),&&\bm{\theta}_{0}\in\bm{\Theta}_{1}\end{array}

satisfies u(𝜽1)v(𝜽1)u(\bm{\theta}_{1})\leq v(\bm{\theta}_{1}) for all 𝜽1𝚯1\bm{\theta}_{1}\in\bm{\Theta}_{1}, because Taylor’s theorem (Theorem 6.11, \citealpsupp[p. 124]magnus_matrix_2019) gives

\begin{array}[]{llllllllll}u(\bm{\theta}_{1})-v(\bm{\theta}_{1})\;=\;\dfrac{1}{2}\,(\bm{\theta}_{1}-\bm{\theta}_{0})^{\top}\left[\bm{M}-\nabla^{2}\,v(\dot{\bm{\theta}})\right](\bm{\theta}_{1}-\bm{\theta}_{0}),\end{array}

where 𝜽˙ϕ𝜽0+(1ϕ)𝜽1𝚯1\dot{\bm{\theta}}\coloneqq\phi\,\bm{\theta}_{0}+(1-\phi)\,\bm{\theta}_{1}\in\bm{\Theta}_{1} (ϕ[0, 1]\phi\in[0,\,1]). The inequality 1/4πi,j(1πi,j)1/4\,\geq\,\pi_{i,j}\,(1-\pi_{i,j}) implies that

[𝑨(𝜽1)𝑨]=i=1Nj=i+1N[14πi,j(t)(1πi,j(t))]𝒆i,j𝒆i,j\begin{array}[]{llllllllll}-[\bm{A}(\bm{\theta}_{1})-\bm{A}^{\star}]&=&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\left[\dfrac{1}{4}-\pi_{i,j}^{(t)}\,(1-\pi_{i,j}^{(t)})\right]\,\bm{e}_{i,j}\,\bm{e}_{i,j}^{\top}\end{array}

is non-negative definite. Lemma 3.1 proves that \ell(\bm{\theta}) is concave in \bm{\theta}_{1} and that the restriction of \ell(\bm{\theta}) to \bm{\theta}_{1} has the properties of v(\bm{\theta}_{1}) stated above, proving Lemma 3.2.
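The minorization used in this argument can be made concrete in one dimension: for an intercept-only Bernoulli log-likelihood with n observations, the second derivative equals -n\,\pi\,(1-\pi)\geq-n/4, so a quadratic with curvature -n/4 minorizes the log-likelihood, and maximizing it yields a simple MM update. The sketch below is a toy illustration under these assumptions, not the paper's pseudo-likelihood algorithm.

import math

def loglik(theta, ys):
    # Intercept-only Bernoulli log-likelihood: sum_i [y_i * theta - log(1 + e^theta)]
    return sum(y * theta - math.log(1.0 + math.exp(theta)) for y in ys)

def minorizer(theta, theta0, ys):
    # Quadratic minorizer at theta0 with curvature -n/4, valid because the second
    # derivative of loglik equals -n * pi * (1 - pi) >= -n/4
    n = len(ys)
    pi0 = 1.0 / (1.0 + math.exp(-theta0))
    grad0 = sum(ys) - n * pi0
    return loglik(theta0, ys) + grad0 * (theta - theta0) - (n / 8.0) * (theta - theta0) ** 2

ys = [1, 0, 1, 0, 1]
theta = 0.0
for _ in range(50):
    # Maximizing the minorizer gives the MM update theta <- theta + (4/n) * gradient
    pi = 1.0 / (1.0 + math.exp(-theta))
    theta += (4.0 / len(ys)) * (sum(ys) - len(ys) * pi)

print(theta, math.log(3 / 2))                        # the MM iterate approaches the MLE
print(minorizer(1.0, 0.0, ys) <= loglik(1.0, ys))    # the minorizer never exceeds loglik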

Appendix C Proof of Theorem 1

Theorem 1 is a generalization of Theorem 2 of \citetsupp[][abbreviated as S25]StSc20 from exponential family models for binary connections \bm{Z} to exponential family models for binary, count-, and real-valued responses and connections (\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}. We henceforth suppress predictors \bm{x}\in\mathscr{X}.

Proof of Theorem 1. Let \bm{s}(\bm{\theta};\,\bm{y},\bm{z})\coloneqq\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\,\bm{y},\bm{z}) and consider the events

𝒞(δN){(𝒚,𝒛)𝒴×𝒵:𝒔(𝜽;𝒚,𝒛)δN}{(𝒚,𝒛)𝒴×𝒵:𝜽2(𝜽;𝒚,𝒛) is invertible for all 𝜽(𝜽,ϵ)}.\begin{array}[]{cll}\mathscr{C}(\delta_{N})&\coloneqq&\left\{(\bm{y},\,\bm{z})\in\mathscr{Y}\times\mathscr{Z}:\,\left|\!\left|\bm{s}(\bm{\theta}^{\star};\,\bm{y},\,\bm{z})\right|\!\right|_{\infty}\;\leq\;\delta_{N}\right\}\vskip 7.11317pt\\ \mathscr{H}&\subseteq&\left\{(\bm{y},\,\bm{z})\in\mathscr{Y}\times\mathscr{Z}:\;-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{y},\,\bm{z})\mbox{ is invertible for all $\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})$}\right\}.\end{array}

Define

ΛN,𝒚,𝒛(𝜽)sup𝜽(𝜽,ϵ)|(𝜽2(𝜽;𝒚,𝒛))1|,(𝒚,𝒛)ΛN(𝜽)sup(𝒚,𝒛)ΛN,𝒚,𝒛(𝜽).\begin{array}[]{cll}\Lambda_{N,\bm{y},\bm{z}}(\bm{\theta}^{\star})&\coloneqq&\sup\limits_{\bm{\theta}\,\in\,\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})}\,|\!|\!|(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{y},\,\bm{z}))^{-1}|\!|\!|_{\infty},\;\;\;(\bm{y},\,\bm{z})\in\mathscr{H}\vskip 7.11317pt\\ \Lambda_{N}(\bm{\theta}^{\star})&\coloneqq&\sup\limits_{(\bm{y},\,\bm{z})\,\in\,\mathscr{H}}\,\Lambda_{N,\bm{y},\bm{z}}(\bm{\theta}^{\star}).\end{array}

It follows from results of S25 that s(𝜽;𝒚,𝒛)s(\bm{\theta};\,\bm{y},\bm{z}), considered as a function of 𝜽𝚯\bm{\theta}\in\bm{\Theta} for fixed (𝒚,𝒛)(\bm{y},\bm{z})\in\mathscr{H}, is a homeomorphism and is continuously differentiable.

In the event (\bm{Y},\bm{Z})\in\mathscr{C}(\delta_{N}), the set \widehat{\bm{\Theta}}(\delta_{N}) is non-empty. By construction of the sets \mathscr{C}(\delta_{N}) and \widehat{\bm{\Theta}}(\delta_{N}), the set \widehat{\bm{\Theta}}(\delta_{N}) is non-empty for all (\bm{y},\,\bm{z})\in\mathscr{C}(\delta_{N}), because \widehat{\bm{\Theta}}(\delta_{N}) contains the data-generating parameter vector \bm{\theta}^{\star}\in\bm{\Theta} provided (\bm{y},\,\bm{z})\in\mathscr{C}(\delta_{N}):

𝜽𝚯^(δN){𝜽𝚯:𝒔(𝜽;𝒚,𝒛)δN}.\begin{array}[]{llllllllll}\bm{\theta}^{\star}&\in&\widehat{\bm{\Theta}}(\delta_{N})&\coloneqq&\left\{\bm{\theta}\in\bm{\Theta}:\;\left|\!\left|\bm{s}(\bm{\theta};\,\bm{y},\,\bm{z})\right|\!\right|_{\infty}\,\leq\,\delta_{N}\right\}.\end{array}

In the event (Y,Z)𝒞(δN)(\bm{Y},\bm{Z})\in\mathscr{C}(\delta_{N})\,\cap\,\,\mathscr{H},  the set 𝚯^(δN)\widehat{\bm{\Theta}}(\delta_{N}) satisfies 𝚯^(δN)(θ,ρN)\widehat{\bm{\Theta}}(\delta_{N})\subseteq\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\rho_{N}) provided N>N0N>N_{0}. By assumption, there exists a sequence ρ1,ρ2,[0,+)\rho_{1},\rho_{2},\dots\in[0,+\infty) such that ρN=o(1)\rho_{N}=o(1). Therefore, there exists an integer N0{1,2,}N_{0}\in\{1,2,\dots\} such that ρN<ϵ\rho_{N}<\epsilon^{\star} for all N>N0N>N_{0}. Consider any N>N0N>N_{0} and any (𝒚,𝒛)𝒞(δN)(\bm{y},\,\bm{z})\in\mathscr{C}(\delta_{N})\,\cap\,\mathscr{H}. Since 𝒔1(;𝒚,𝒛)\bm{s}^{-1}(\,\cdot\,;\,\bm{y},\,\bm{z}) is continuous on 𝚯\bm{\Theta}, there exists, for each (𝒚,𝒛)(\bm{y},\,\bm{z})\in\mathscr{H}, a real number ϵN(ρN)(0,+)\epsilon_{N}(\rho_{N})\in(0,\,+\infty) (which depends on (𝒚,𝒛)(\bm{y},\,\bm{z})\in\mathscr{H}) such that

𝒔(𝜽;𝒚,𝒛)𝒔(𝜽;𝒚,𝒛)ϵN(ρN)implies𝜽𝜽ρN.\begin{array}[]{llllllllll}\left|\!\left|\bm{s}(\bm{\theta};\,\bm{y},\,\bm{z})-\bm{s}(\bm{\theta}^{\star};\,\bm{y},\,\bm{z})\right|\!\right|_{\infty}&\leq&\epsilon_{N}(\rho_{N})&\mbox{implies}&\left|\!\left|\bm{\theta}-\bm{\theta}^{\star}\right|\!\right|_{\infty}&\leq&\rho_{N}.\end{array} (C.1)

As 𝒔(𝜽;𝒚,𝒛)\bm{s}(\bm{\theta};\,\bm{y},\bm{z}) is a homeomorphism and continuously differentiable, we can invoke Lemma 1 of S25 to conclude that ϵN(ρN)\epsilon_{N}(\rho_{N}) is related to ρN\rho_{N} by the following inequality:

ρNΛN,𝒚,𝒛(𝜽)ϵN(ρN).\begin{array}[]{llllllllll}\dfrac{\rho_{N}}{\Lambda_{N,\bm{y},\bm{z}}(\bm{\theta}^{\star})}&\leq&\epsilon_{N}(\rho_{N}).\end{array} (C.2)

To take advantage of (C.2), observe that, for all 𝜽𝚯^(δN)\bm{\theta}\in\widehat{\bm{\Theta}}(\delta_{N}) and all (𝒚,𝒛)𝒞(δN)(\bm{y},\,\bm{z})\in\mathscr{C}(\delta_{N})\,\cap\,\mathscr{H},

𝒔(𝜽;𝒚,𝒛)𝒔(𝜽;𝒚,𝒛)𝒔(𝜽;𝒚,𝒛)+𝒔(𝜽;𝒚,𝒛) 2δN=ρNΛN(𝜽),\begin{array}[]{llllllll}\left|\!\left|\bm{s}(\bm{\theta};\,\bm{y},\bm{z})-\bm{s}(\bm{\theta}^{\star};\,\bm{y},\bm{z})\right|\!\right|_{\infty}\,\leq\,\left|\!\left|\bm{s}(\bm{\theta};\,\bm{y},\bm{z})\right|\!\right|_{\infty}+\left|\!\left|\bm{s}(\bm{\theta}^{\star};\,\bm{y},\bm{z})\right|\!\right|_{\infty}\,\leq\,2\,\delta_{N}=\dfrac{\rho_{N}}{\Lambda_{N}(\bm{\theta}^{\star})},\end{array} (C.3)

because 𝒔(𝜽;𝒚,𝒛)δN\left|\!\left|\bm{s}(\bm{\theta};\,\bm{y},\,\bm{z})\right|\!\right|_{\infty}\leq\delta_{N} for all 𝜽𝚯^(δN)\bm{\theta}\in\widehat{\bm{\Theta}}(\delta_{N}),  𝒔(𝜽;𝒚,𝒛)δN\left|\!\left|\bm{s}(\bm{\theta}^{\star};\,\bm{y},\,\bm{z})\right|\!\right|_{\infty}\leq\delta_{N} for all (𝒚,𝒛)𝒞(δN)(\bm{y},\,\bm{z})\in\mathscr{C}(\delta_{N})\,\cap\,\mathscr{H},  and δNρN/(2ΛN(𝜽))\delta_{N}\coloneqq\rho_{N}\,/\,(2\,\Lambda_{N}(\bm{\theta}^{\star})). Using (C.3) along with the definition of ΛN(𝜽)sup(𝒚,𝒛)ΛN,𝒚,𝒛(𝜽)>0\Lambda_{N}(\bm{\theta}^{\star})\coloneqq\sup_{(\bm{y},\,\bm{z})\in\mathscr{H}}\,\Lambda_{N,\bm{y},\bm{z}}(\bm{\theta}^{\star})>0, we obtain

𝒔(𝜽;𝒚,𝒛)𝒔(𝜽;𝒚,𝒛)ρNΛN(𝜽)ρNΛN,𝒚,𝒛(𝜽),\begin{array}[]{llllllllll}\left|\!\left|\bm{s}(\bm{\theta};\,\bm{y},\,\bm{z})-\bm{s}(\bm{\theta}^{\star};\,\bm{y},\,\bm{z})\right|\!\right|_{\infty}&\leq&\dfrac{\rho_{N}}{\Lambda_{N}(\bm{\theta}^{\star})}&\leq&\dfrac{\rho_{N}}{\Lambda_{N,\bm{y},\bm{z}}(\bm{\theta}^{\star})},\end{array} (C.4)

and, using (C.2),

𝒔(𝜽;𝒚,𝒛)𝒔(𝜽;𝒚,𝒛)ρNΛN,𝒚,𝒛(𝜽)ϵN(ρN).\begin{array}[]{llllllllll}\left|\!\left|\bm{s}(\bm{\theta};\,\bm{y},\,\bm{z})-\bm{s}(\bm{\theta}^{\star};\,\bm{y},\,\bm{z})\right|\!\right|_{\infty}&\leq&\dfrac{\rho_{N}}{\Lambda_{N,\bm{y},\bm{z}}(\bm{\theta}^{\star})}&\leq&\epsilon_{N}(\rho_{N}).\end{array} (C.5)

In light of the fact that

𝒔(𝜽;𝒚,𝒛)𝒔(𝜽;𝒚,𝒛)ϵN(ρN)implies𝜽𝜽ρN,\begin{array}[]{llllllllll}\left|\!\left|\bm{s}(\bm{\theta};\,\bm{y},\,\bm{z})-\bm{s}(\bm{\theta}^{\star};\,\bm{y},\,\bm{z})\right|\!\right|_{\infty}&\leq&\epsilon_{N}(\rho_{N})&\mbox{implies}&|\!|\bm{\theta}-\bm{\theta}^{\star}|\!|_{\infty}&\leq&\rho_{N},\end{array}

the set 𝚯^(δN)\widehat{\bm{\Theta}}(\delta_{N}) is non-empty and satisfies

𝚯^(δN)(𝜽,ρN)\begin{array}[]{llllllllll}\widehat{\bm{\Theta}}(\delta_{N})&\subseteq&\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\rho_{N})\end{array} (C.6)

in the event (𝒀,𝒁)𝒞(δN)(\bm{Y},\bm{Z})\in\mathscr{C}(\delta_{N})\cap\,\mathscr{H}, provided N>N0N>N_{0}.

The event (Y,Z)𝒞(δN)(\bm{Y},\bm{Z})\,\in\,\mathscr{C}(\delta_{N})\,\cap\,\mathscr{H} occurs with probability 1o(1)1-o(1). The probability of event (𝒀,𝒁)𝒞(δN)(\bm{Y},\bm{Z})\,\in\,\mathscr{C}(\delta_{N})\,\cap\,\mathscr{H} is bounded below by

((𝒀,𝒁)𝒞(δN))1((𝒀,𝒁)𝒞(δN))((𝒀,𝒁))=1o(1).\begin{array}[]{llllllllll}\mathbb{P}\left((\bm{Y},\bm{Z})\in\mathscr{C}(\delta_{N})\cap\,\mathscr{H}\right)&\geq&1-\mathbb{P}\left((\bm{Y},\bm{Z})\not\in\mathscr{C}(\delta_{N})\right)-\mathbb{P}\left((\bm{Y},\bm{Z})\not\in\mathscr{H}\right)&=&1-o(1).\end{array}

The above inequality stems from a union bound, while the identity follows from the assumption that the probabilities of the events (𝒀,𝒁)𝒞(δN)(\bm{Y},\bm{Z})\not\in\mathscr{C}(\delta_{N}) and (𝒀,𝒁)(\bm{Y},\bm{Z})\not\in\mathscr{H} satisfy

((𝒀,𝒁)𝒞(δN))=(𝒔(𝜽;𝒀,𝒁)𝔼𝒔(𝜽;𝒀,𝒁)δN)=o(1)((𝒀,𝒁))=o(1),\begin{array}[]{llllllllll}\mathbb{P}\left((\bm{Y},\bm{Z})\not\in\mathscr{C}(\delta_{N})\right)&=&\mathbb{P}\left(\left|\!\left|\bm{s}(\bm{\theta}^{\star};\,\bm{Y},\bm{Z})-\mathbb{E}\,\,\bm{s}(\bm{\theta}^{\star};\,\bm{Y},\bm{Z})\right|\!\right|_{\infty}\geq\delta_{N}\right)&=&o(1)\vskip 7.11317pt\\ \mathbb{P}\left((\bm{Y},\bm{Z})\not\in\mathscr{H}\right)&=&o(1),\end{array}

where the first result leverages the fact that 𝔼𝒔(𝜽;𝒀,𝒁)=𝟎\mathbb{E}\,\,\bm{s}(\bm{\theta}^{\star};\,\bm{Y},\bm{Z})=\bm{0} by Lemma 7 of S25.

Conclusion. Combining (C.6) with the lower bound on \mathbb{P}((\bm{Y},\bm{Z})\in\mathscr{C}(\delta_{N})\,\cap\,\mathscr{H}) established above shows that, for all N>N_{0}, the random set \widehat{\bm{\Theta}}(\delta_{N}) is non-empty and, with probability 1-o(1), satisfies

𝚯^(δN)(𝜽,ρN).\begin{array}[]{llllllllll}\widehat{\bm{\Theta}}(\delta_{N})&\subseteq&\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\rho_{N}).\end{array}

Appendix D Corollaries 1 and D.3

To state and prove Corollaries 1 and D.3, we first introduce notation along with background on conditional independence graphs \citepsuppgraphical.models and couplings \citepsuppLi02.

D.1 Notation and Background

We consider the model of Corollary 1, with joint probability mass function

𝜽((𝒀,𝒁)=(𝒚,𝒛)𝑿=𝒙)exp(𝜽𝒃(𝒙,𝒚,𝒛)).\begin{array}[]{llllllllll}\mathbb{P}_{\bm{\theta}}\left((\bm{Y},\,\bm{Z})=(\bm{y},\,\bm{z})\mid\bm{X}=\bm{x}\right)&\propto&\exp\left(\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y},\,\bm{z})\right).\end{array} (D.1)

The parameter vector is 𝜽(α𝒵,1,,α𝒵,N,γ𝒵,𝒵,γ𝒳,𝒴,𝒵)N+2\bm{\theta}\coloneqq(\alpha_{\mathscr{Z},1},\ldots,\alpha_{\mathscr{Z},N},\gamma_{\mathscr{Z},\mathscr{Z}},\,\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}})\;\in\;\mathbb{R}^{N+2} and the vector of sufficient statistics is 𝒃(𝒙,𝒚,𝒛)N+2\bm{b}(\bm{x},\,\bm{y},\,\bm{z})\in\mathbb{R}^{N+2}, with coordinates

  • bi(𝒙,𝒚,𝒛)j𝒫N{i}zi,jb_{i}(\bm{x},\,\bm{y},\,\bm{z})\coloneqq\sum_{j\in\mathscr{P}_{N}\setminus\,\{i\}}\,z_{i,j} (i=1,,Ni=1,\ldots,N),

  • bN+1(𝒙,𝒚,𝒛)i=1Nj=i+1Ndi,j(𝒛)zi,jb_{N+1}(\bm{x},\,\bm{y},\,\bm{z})\coloneqq\sum_{i=1}^{N}\sum_{j=i+1}^{N}d_{i,j}(\bm{z})\,z_{i,j},

  • bN+2(𝒙,𝒚,𝒛)i=1Nj=i+1Nci,j(xiyj+xjyi)zi,jb_{N+2}(\bm{x},\,\bm{y},\,\bm{z})\coloneqq\sum_{i=1}^{N}\sum_{j=i+1}^{N}\,c_{i,j}\,(x_{i}\,y_{j}+x_{j}\,y_{i})\,z_{i,j},

where the terms ci,jc_{i,j} and di,j(𝒛)d_{i,j}(\bm{z}) are defined as follows:

ci,j𝟙(𝒩i𝒩j)di,j(𝒛)𝟙(k𝒩i𝒩j:zi,k=zk,j=1).\begin{array}[]{llllllllll}c_{i,j}\coloneqq\mathbbm{1}(\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset)\\ d_{i,j}(\bm{z})\coloneqq\mathbbm{1}(\exists\;k\,\in\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,:\,z_{i,k}=z_{k,j}=1).\end{array} (D.2)
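The sufficient statistic vector \bm{b}(\bm{x},\,\bm{y},\,\bm{z}) can be computed directly from these definitions; the sketch below (names ours) evaluates it for a small undirected toy example.

import numpy as np

def sufficient_statistics(x, y, z, nbrs):
    # b(x, y, z) for model (D.1): degrees, transitive-edge statistic, and spillover
    # statistic, with c_{i,j} and d_{i,j}(z) as in Equation (D.2); z is symmetric 0/1.
    N = len(y)
    b = np.zeros(N + 2)
    b[:N] = z.sum(axis=1)                                 # b_i = sum_j z_{i,j}
    for i in range(N):
        for j in range(i + 1, N):
            common = nbrs[i] & nbrs[j]
            c_ij = int(len(common) > 0)
            d_ij = int(any(z[i, k] == 1 and z[k, j] == 1 for k in common))
            b[N] += d_ij * z[i, j]                                        # b_{N+1}
            b[N + 1] += c_ij * (x[i] * y[j] + x[j] * y[i]) * z[i, j]      # b_{N+2}
    return b

z = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
x = np.array([0.4, 1.0, 0.0])
y = np.array([1, 0, 1])
nbrs = [{0, 1, 2}, {0, 1, 2}, {1, 2}]
print(sufficient_statistics(x, y, z, nbrs))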

In light of ψ1\psi\coloneqq 1, we do not distinguish between 𝒚\bm{y} and 𝒚\bm{y}^{\star} or yiy_{i} and yiy_{i}^{\star}. To ease the presentation, we write Yi𝒙,𝒚i,𝒛Y_{i}\mid\bm{x},\,\,\bm{y}_{-i},\,\bm{z} rather than Yi(𝑿,𝒀i,𝒁)=(𝒙,𝒚i,𝒛)Y_{i}\mid(\bm{X},\,\,\bm{Y}_{-i},\,\bm{Z})=(\bm{x},\,\,\bm{y}_{-i},\,\bm{z}), and Zi,j𝒙,𝒚,𝒛{i,j}Z_{i,j}\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}} rather than Zi,j(𝑿,𝒀,𝒁{i,j})=(𝒙,𝒚,𝒛{i,j})Z_{i,j}\mid(\bm{X},\,\bm{Y},\,\bm{Z}_{-\{i,j\}})=(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}}). Expectations, variances, and covariances with respect to the conditional distributions of Yi𝒙,𝒚i,𝒛Y_{i}\mid\bm{x},\,\bm{y}_{-i},\,\bm{z} and Zi,j𝒙,𝒚,𝒛{i,j}Z_{i,j}\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}} are denoted by 𝔼𝒴,i\mathbb{E}_{\mathscr{Y},i}, 𝕍𝒴,i\mathbb{V}_{\mathscr{Y},i}, 𝒴,i\mathbb{C}_{\mathscr{Y},i} and 𝔼𝒵,i,j\mathbb{E}_{\mathscr{Z},i,j}, 𝕍𝒵,i,j\mathbb{V}_{\mathscr{Z},i,j}, 𝒵,i,j\mathbb{C}_{\mathscr{Z},i,j}, respectively.

Conditional independence graph. Let MN+(N2)M\coloneqq N+\binom{N}{2} be the total number of responses and connections and

𝑾(W1,,WM)(Y1,,YN,Z1,2,,ZN1,N)𝒲{0, 1}N+(N2)\begin{array}[]{llllllllll}\bm{W}\,\coloneqq\,(W_{1},\,\ldots,\,W_{M})\,\coloneqq\,(Y_{1},\,\ldots,\,Y_{N},\,Z_{1,2},\,\ldots,\,Z_{N-1,N})\,\in\,\mathscr{W}\,\coloneqq\,\{0,\,1\}^{N+\binom{N}{2}}\end{array} (D.3)

be the vector consisting of responses and connections. The conditional independence structure of the model can be represented by a conditional independence graph 𝒢(𝒱,)\mathscr{G}\coloneqq(\mathscr{V},\,\mathscr{E}) with a set of vertices 𝒱{W1,,WM}\mathscr{V}\coloneqq\{W_{1},\ldots,W_{M}\} and a set of undirected edges \mathscr{E}. We refer to elements of 𝒱\mathscr{V} and \mathscr{E} as vertices and edges of 𝒢\mathscr{G}. There are two distinct subsets of vertices in 𝒢\mathscr{G}:

  • the subset 𝒱𝒴{W1,,WN}\mathscr{V}_{\mathscr{Y}}\,\coloneqq\,\{W_{1},\dots,W_{N}\} corresponding to responses Y1,,YNY_{1},\ldots,Y_{N};

  • the subset 𝒱𝒵{WN+1,,WM}\mathscr{V}_{\mathscr{Z}}\,\coloneqq\,\{W_{N+1},\dots,W_{M}\} corresponding to connections Z1,2,,ZN1,NZ_{1,2},\ldots,Z_{N-1,N}.

An undirected edge between two vertices in 𝒢\mathscr{G} represents dependence of the two corresponding random variables conditional on all other random variables. The vertices in 𝒢\mathscr{G} are connected to the following subsets of vertices (neighborhoods):

  • The neighborhood of YiY_{i} in 𝒢\mathscr{G} consists of all YjY_{j} and all Zi,jZ_{i,j} such that j𝒫N{i}j\,\in\,\mathscr{P}_{N}\setminus\,\{i\} and 𝒩i𝒩j\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset.

  • The neighborhood of Z_{i,j} in \mathscr{G} consists of
    1. Y_{i} and Y_{j};
    2. all Z_{i,h} and Z_{j,h} such that h\in\mathscr{P}_{N}\setminus\,\{i,\,j\} and h\in\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j};
    3. all Z_{i,h} and Z_{j,h} such that h\in\mathscr{P}_{N}\setminus\,\{i,\,j\} and h\not\in\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}, provided that either j\in\mathscr{N}_{i}\,\cap\,\mathscr{N}_{h} or i\in\mathscr{N}_{j}\,\cap\,\mathscr{N}_{h} holds.

Let d𝒢(i,j)d_{\mathscr{G}}(i,j) be the length of the shortest path from vertex Wi𝒱W_{i}\in\mathscr{V} to vertex Wj𝒱W_{j}\in\mathscr{V} in 𝒢\mathscr{G} and let 𝒮𝒢,i,k\mathscr{S}_{\mathscr{G},i,k} be the set of vertices with distance k{1,2,}k\in\left\{1,2,\ldots\right\} to the iith vertex WiW_{i} in 𝒢\mathscr{G}:

𝒮𝒢,i,k{Wj𝒱{Wi}:d𝒢(i,j)=k}.\begin{array}[]{llllllllll}\mathscr{S}_{\mathscr{G},i,k}&\coloneqq&\left\{W_{j}\in\mathscr{V}\setminus\{W_{i}\}:d_{\mathscr{G}}(i,j)=k\right\}.\end{array}

We define the maximum degree of vertices relating to connections in 𝒢\mathscr{G} as follows:

DNmax1iM|𝒮𝒢,i,1|.\begin{array}[]{llllllllll}D_{N}&\coloneqq&\underset{1\,\leq\,i\,\leq\,M}{\text{max}}\;|\mathscr{S}_{\mathscr{G},i,1}|.\end{array} (D.4)
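
To make the neighborhood rules above and the definition (D.4) of D_N concrete, here is a small Python sketch of our own (all function names are hypothetical). It builds the vertex set {Y_1, ..., Y_N, Z_{1,2}, ..., Z_{N-1,N}}, adds an undirected edge whenever one of the listed rules names the pair (a literal union of the rules), and returns the maximum degree.

from itertools import combinations

def build_ci_graph(N, neighborhoods):
    # Sketch of the conditional independence graph G: vertices ('Y', i) and ('Z', i, j) with i < j.
    def share(a, b):
        return len(neighborhoods[a] & neighborhoods[b]) > 0

    adj = {('Y', i): set() for i in range(N)}
    dyads = list(combinations(range(N), 2))
    adj.update({('Z',) + d: set() for d in dyads})

    def connect(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for i, j in dyads:
        connect(('Y', i), ('Z', i, j))       # rule 1 for Z_{i,j}: Y_i and Y_j
        connect(('Y', j), ('Z', i, j))
        if share(i, j):                      # rule for Y_i: Y_j whenever N_i and N_j intersect
            connect(('Y', i), ('Y', j))
    for d1, d2 in combinations(dyads, 2):
        common = set(d1) & set(d2)
        if len(common) != 1:                 # Z-vertices are only tied through a shared unit
            continue
        a = common.pop()
        b = (set(d1) - {a}).pop()
        c = (set(d2) - {a}).pop()
        rule2 = c in (neighborhoods[a] & neighborhoods[b])
        rule3 = (b in (neighborhoods[a] & neighborhoods[c])
                 or a in (neighborhoods[b] & neighborhoods[c]))
        if rule2 or rule3:
            connect(('Z',) + d1, ('Z',) + d2)
    return adj

def max_degree(adj):
    # D_N in (D.4): the largest number of neighbors of any vertex of G
    return max(len(nbrs) for nbrs in adj.values())

neighborhoods = {0: {0, 1, 2}, 1: {0, 1, 2}, 2: {0, 1, 2, 3}, 3: {2, 3}}
print(max_degree(build_ci_graph(4, neighborhoods)))

When the neighborhoods are of bounded size, the degree returned by max_degree stays bounded as N grows, in line with the boundedness of D_N used below.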

Coupling matrix. Let 𝑾a:b(Wa,,Wb)𝒲a:b\bm{W}_{a:b}\coloneqq(W_{a},\,\ldots,\,W_{b})\in\mathscr{W}_{a:b} be the subvector consisting of responses and connections with indices 1abM1\leq a\leq b\leq M. The set of random variables excluding the random variable Wv𝒱W_{v}\in\mathscr{V} with v{1,,M}v\in\{1,\ldots,M\} is denoted by 𝒘v𝒲v\bm{w}_{-v}\in\mathscr{W}_{-v}. Consider any 𝒂{0,1}Mi\bm{a}\in\{0,1\}^{M-i} and define

𝜽,𝒘1:(i1),wi(𝑾(i+1):M=𝒂)𝜽(𝑾(i+1):M=𝒂(𝑾1:(i1),Wi)=(𝒘1:(i1),wi)).\begin{array}[]{llllllllll}\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},w_{i}}(\bm{W}_{(i+1):M}=\bm{a})&\coloneqq&\mathbb{P}_{\bm{\theta}^{\star}}(\bm{W}_{(i+1):M}=\bm{a}\mid(\bm{W}_{1:(i-1)},W_{i})=(\bm{w}_{1:(i-1)},w_{i})).\end{array}

We use the total variation distance between the conditional distributions 𝜽,𝒘1:(i1),0\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},0} and 𝜽,𝒘1:(i1),1\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},1} for quantifying the amount of dependence induced by the model, where 𝜽𝚯\bm{\theta}^{\star}\in\bm{\Theta} is the data-generating parameter vector. The total variation distance between 𝜽,𝒘1:(i1),0\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},0} and 𝜽,𝒘1:(i1),1\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},1} can be bounded from above by using coupling methods \citepsuppLi02. A coupling of 𝜽,𝒘1:(i1),0\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},0} and 𝜽,𝒘1:(i1),1\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},1} is a joint probability distribution 𝜽,i,𝒘1:(i1)\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}} for a pair of random vectors (𝑾(i+1):M,𝑾(i+1):M){0,1}Mi×{0,1}Mi(\bm{W}_{(i+1):M}^{\star},\,\bm{W}_{(i+1):M}^{\star\star})\in\{0,1\}^{M-i}\times\{0,1\}^{M-i} with marginals 𝜽,𝒘1:(i1),0\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},0} and 𝜽,𝒘1:(i1),1\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},1}. For convenience, we define (𝑾,𝑾){0, 1}M×{0, 1}M(\bm{W}^{\star},{\bm{W}}^{\star\star})\in\{0,\,1\}^{M}\times\{0,\,1\}^{M}, where the first ii elements are given by 𝑾1:i=(𝒘1:(i1), 0)\bm{W}^{\star}_{1:i}=(\bm{w}_{1:(i-1)},\,0) and 𝑾1:i=(𝒘1:(i1), 1){\bm{W}}^{\star\star}_{1:i}=(\bm{w}_{1:(i-1)},\,1), respectively. The basic coupling inequality \citepsupp[][Theorem 5.2, p. 19]Li02 shows that any coupling satisfies

𝜽,𝒘1:(i1),0𝜽,𝒘1:(i1),1TV𝜽,i,𝒘1:(i1)(𝑾(i+1):M𝑾(i+1):M),\begin{array}[]{llllllllll}\left|\!\left|\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},0}-\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},1}\right|\!\right|_{\text{TV}}&\leq&\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}(\bm{W}_{(i+1):M}^{\star}\neq\bm{W}_{(i+1):M}^{\star\star}),\end{array} (D.5)

where ||.||TV\left|\!\left|.\right|\!\right|_{\text{TV}} denotes the total variation distance between probability measures. If the two sides in Equation (D.5) are equal, the coupling is called optimal. An optimal coupling is guaranteed to exist, but may not be unique \citepsupp[][pp. 99–107]Li02. To prove Corollary 4, we need an upper bound on the spectral norm |𝒟(𝜽)|2{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}} of the coupling matrix 𝒟(𝜽)\mathscr{D}(\bm{\theta}^{\star}), so we construct a coupling that is convenient but may not be optimal.
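
The coupling inequality (D.5) can be checked directly in a toy case: for two Bernoulli distributions, coupling them through a common uniform random variable is an optimal coupling, so the probability of disagreement matches the total variation distance |p - q|. The sketch below is our own illustration and is not part of the proof.

import numpy as np

def tv_bernoulli(p, q):
    # total variation distance between Bernoulli(p) and Bernoulli(q)
    return abs(p - q)

def coupled_disagreement(p, q, n_draws=200_000, seed=1):
    # couple Bernoulli(p) and Bernoulli(q) through a common uniform draw
    # and estimate the probability that the two coordinates disagree
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n_draws)
    w_star, w_starstar = (u < p), (u < q)
    return np.mean(w_star != w_starstar)

p, q = 0.35, 0.60
print(tv_bernoulli(p, q))            # 0.25
print(coupled_disagreement(p, q))    # close to 0.25, so (D.5) holds with near equality here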

A coupling 𝜽,i,𝒘1:(i1)\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}} of 𝜽,𝒘1:(i1),0\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},0} and 𝜽,𝒘1:(i1),1\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},1} can be constructed as follows:

  1. Step 1: Set 𝒰={1,,i}\mathscr{U}=\{1,\ldots,i\} and 𝒦={1,,M}\mathscr{K}=\{1,\ldots,M\}.

  2. Step 2: Set 𝒜={j𝒦𝒰: there exists u𝒰 such that (Wu,Wj) and WuWu}\mathscr{A}=\{j\in\mathscr{K}\,\setminus\,\mathscr{U}:\;\text{there exists }u\in\mathscr{U}\text{ such that }(W_{u},\,W_{j})\in\mathscr{E}\text{ and }W_{u}^{\star}\neq W_{u}^{\star\star}\}.

    • (a)

      If 𝒜\mathscr{A}\neq\emptyset, pick the smallest element j𝒜j\in\mathscr{A} and let (Wj,Wj)(W_{j}^{\star},\,W_{j}^{\star\star}) be distributed according to an optimal coupling of 𝜽(Wj=𝑾𝒰=𝒘𝒰)\mathbb{P}_{\bm{\theta}^{\star}}(W_{j}=\cdot\mid\bm{W}_{\mathscr{U}}=\bm{w}_{\mathscr{U}}^{\star}) and 𝜽(Wj=𝑾𝒰=𝒘𝒰)\mathbb{P}_{\bm{\theta}^{\star}}(W_{j}=\cdot\mid\bm{W}_{\mathscr{U}}=\bm{w}_{\mathscr{U}}^{\star\star}).

    • (b)

      If 𝒜=\mathscr{A}=\emptyset, pick the smallest element j𝒦𝒰j\in\mathscr{K}\,\setminus\,\mathscr{U} and let (Wj,Wj)(W_{j}^{\star},W_{j}^{\star\star}) be distributed according to an optimal coupling of 𝜽(Wj=𝑾𝒰=𝒘𝒰)\mathbb{P}_{\bm{\theta}^{\star}}(W_{j}=\cdot\mid\bm{W}_{\mathscr{U}}=\bm{w}_{\mathscr{U}}^{\star}) and 𝜽(Wj=𝑾𝒰=𝒘𝒰)\mathbb{P}_{\bm{\theta}^{\star}}(W_{j}=\cdot\mid\bm{W}_{\mathscr{U}}=\bm{w}_{\mathscr{U}}^{\star\star}).

  3. Step 3: Replace 𝒰\mathscr{U} by 𝒰{j}\mathscr{U}\,\cup\,\{j\} and repeat Step 2 until 𝒦𝒰=\mathscr{K}\,\setminus\,\mathscr{U}=\emptyset.
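
The three steps can be turned into code. In the sketch below (our own; cond_prob is a hypothetical stand-in for the model's conditional probability P_{theta*}(W_j = 1 | W_U = w_U)), the uncoupled coordinates are processed in the stated order and each coordinate is coupled through a common uniform draw, which is an optimal coupling for a pair of Bernoulli conditionals.

import numpy as np

def sequential_coupling(i, w_prefix, M, edges, cond_prob, rng):
    # One draw of (W*, W**) from the coupling Q_{theta*, i, w_{1:(i-1)}}.
    #   i         : 1-based index of the coordinate that is flipped
    #   w_prefix  : observed values w_1, ..., w_{i-1}
    #   edges     : set of frozensets {u, v} of 1-based vertex indices (edges of G)
    #   cond_prob : cond_prob(j, coupled) -> P(W_j = 1 | coupled coordinates), a hypothetical
    #               stand-in for the conditional probabilities implied by the model
    w_star = dict(enumerate(w_prefix, start=1)); w_star[i] = 0
    w_sstar = dict(enumerate(w_prefix, start=1)); w_sstar[i] = 1
    coupled = set(w_star)                                  # Step 1: U = {1, ..., i}
    while len(coupled) < M:                                # Step 3: repeat until K \ U is empty
        # Step 2: uncoupled coordinates adjacent to a coupled coordinate at which W* and W** disagree
        A = sorted(j for j in range(1, M + 1) if j not in coupled
                   and any(frozenset((u, j)) in edges and w_star[u] != w_sstar[u] for u in coupled))
        pool = A if A else sorted(set(range(1, M + 1)) - coupled)
        j = pool[0]
        p_star, p_sstar = cond_prob(j, w_star), cond_prob(j, w_sstar)
        unif = rng.uniform()                               # common uniform draw: optimal Bernoulli coupling
        w_star[j], w_sstar[j] = int(unif < p_star), int(unif < p_sstar)
        coupled.add(j)
    return w_star, w_sstar

rng = np.random.default_rng(0)
edges = {frozenset((1, 2)), frozenset((2, 3))}
dummy_cond_prob = lambda j, coupled: 0.5      # purely illustrative conditional probabilities
print(sequential_coupling(i=1, w_prefix=[], M=3, edges=edges, cond_prob=dummy_cond_prob, rng=rng))

Averaging the indicator of disagreement on the coordinates i+1, ..., M over repeated calls then estimates the right-hand side of (D.5) for this coupling.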

Based on 𝜽,i,𝒘1:(i1)\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}, we construct a coupling matrix 𝒟(𝜽)M×M\mathscr{D}(\bm{\theta}^{\star})\in\mathbb{R}^{M\times M} with elements

𝒟i,j(𝜽){0if i<j1if i=jmax𝒘1:(i1)𝒲1:i1𝜽,i,𝒘1:(i1)(WjWj)if i>j.\begin{array}[]{llllllllll}\mathscr{D}_{i,j}(\bm{\theta}^{\star})&\coloneqq&\begin{cases}0&\mbox{if }i<j\\ 1&\mbox{if }i=j\\ \max\limits_{\bm{w}_{1:(i-1)}\,\in\,\mathscr{W}_{1:i-1}}\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}(W_{j}^{\star}\neq W_{j}^{\star\star})&\mbox{if }i>j.\end{cases}\end{array}

Overlapping subpopulations. To obtain convergence rates based on a single observation of dependent random variables 𝑾\bm{W}, we need to control the dependence of 𝑾\bm{W} in the form of |𝒟(𝜽)|2{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}. In line with the simulation setting in Section 5, we therefore assume that overlapping subpopulations 𝒜1,𝒜2,\mathscr{A}_{1},\mathscr{A}_{2},\ldots characterize the neighborhoods. The neighborhood 𝒩i\mathscr{N}_{i} of unit i𝒫Ni\in\mathscr{P}_{N} is then defined as

𝒩i{j𝒫N: there exists k{1,2,} such that i𝒜k and j𝒜k}.\begin{array}[]{llllllllll}\mathscr{N}_{i}&\coloneqq&\{j\,\in\,\mathscr{P}_{N}:\mbox{ there exists }k\in\{1,2,\ldots\}\text{ such that }i\in\mathscr{A}_{k}\text{ and }j\in\mathscr{A}_{k}\}.\end{array} (D.6)

Let 𝒢𝒜\mathscr{G}_{\mathscr{A}} be a subpopulation graph with a set of vertices 𝒱𝒜{𝒜1,𝒜2,}\mathscr{V}_{\mathscr{A}}\coloneqq\left\{\mathscr{A}_{1},\mathscr{A}_{2},\ldots\right\} and a set of edges connecting distinct subpopulations 𝒜k\mathscr{A}_{k} and 𝒜l\mathscr{A}_{l} with 𝒜k𝒜l\mathscr{A}_{k}\,\cap\,\mathscr{A}_{l}\neq\emptyset. Define

𝒮𝒢𝒜,i,k{𝒜j𝒱𝒜{𝒜i}:d𝒢𝒜(i,j)=k}.\begin{array}[]{llllllllll}\mathscr{S}_{\mathscr{G}_{\mathscr{A}},\,i,k}&\coloneqq&\left\{\mathscr{A}_{j}\in\mathscr{V}_{\mathscr{A}}\setminus\{\mathscr{A}_{i}\}:\;d_{\mathscr{G}_{\mathscr{A}}}(i,j)=k\right\}.\end{array} (D.7)
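
As an illustration of (D.6) and (D.7), the Python sketch below (ours; all names are hypothetical) derives the neighborhoods from a list of overlapping subpopulations, builds the subpopulation graph, and computes the distance-k sets S_{G_A, i, k} by breadth-first search.

from collections import deque
from itertools import combinations

def neighborhoods_from_subpops(N, subpops):
    # N_i = units sharing at least one subpopulation with unit i, cf. (D.6)
    nbhd = {i: set() for i in range(N)}
    for A in subpops:
        for i in A:
            nbhd[i] |= set(A)
    return nbhd

def subpop_graph(subpops):
    # edges of G_A connect distinct subpopulations with nonempty intersection
    K = len(subpops)
    adj = {k: set() for k in range(K)}
    for k, l in combinations(range(K), 2):
        if set(subpops[k]) & set(subpops[l]):
            adj[k].add(l); adj[l].add(k)
    return adj

def distance_shells(adj, i):
    # S_{G_A, i, k}: vertices at graph distance k from vertex i, cf. (D.7)
    dist, queue = {i: 0}, deque([i])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    shells = {}
    for v, d in dist.items():
        if d > 0:
            shells.setdefault(d, set()).add(v)
    return shells

subpops = [{0, 1, 2}, {2, 3, 4}, {4, 5, 6}, {6, 7, 0}]     # a ring of overlapping subpopulations
print(neighborhoods_from_subpops(8, subpops)[2])            # {0, 1, 2, 3, 4}
print(distance_shells(subpop_graph(subpops), 0))            # {1: {1, 3}, 2: {2}}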

Using the background introduced above, we restate Condition 4 more formally.

Condition 4: Dependence. The population 𝒫\mathscr{P} consists of intersecting subpopulations 𝒜1,𝒜2,\mathscr{A}_{1},\mathscr{A}_{2},\ldots, whose intersections are represented by subpopulation graph 𝒢𝒜\mathscr{G}_{\mathscr{A}}. Let DN{2,3,}D_{N}\in\{2,3,\ldots\} be defined by (D.4) and 𝒮𝒢𝒜,i,k\mathscr{S}_{\mathscr{G}_{\mathscr{A}},\,i,k} be defined by (D.7), and assume that

maxk{1,2,}|𝒮𝒢𝒜,k,l|ω1+ω22DN3log(l+1),l=1,2,,\begin{array}[]{llllllllll}\max\limits_{k\,\in\,\{1,2,\ldots\}}~|\mathscr{S}_{\mathscr{G}_{\mathscr{A}},k,l}|&\leq&\omega_{1}+\dfrac{\omega_{2}}{2\,D_{N}^{3}}\log(l+1),&&l=1,2,\ldots,\end{array}

where ω1 0\omega_{1}\,\geq\,0 and 0ω2min{ω1, 1/((ω1+1)|log(1U)|)}0\,\leq\,\omega_{2}\,\leq\,\min\limits\{\omega_{1},\,1/((\omega_{1}+1)\,|\log(1-U)|)\} with U(1+exp(A))1>0U\coloneqq(1+\exp(-A))^{-1}>0. The constant A>0A>0 is identical to the constant AA in Condition 4. In addition, for each unit i𝒫Ni\in\mathscr{P}_{N}, the neighborhood 𝒩i\mathscr{N}_{i} is defined by (D.6), and there exists a constant B(0,+)B\in(0,\,+\infty) such that max1iN|𝒩i|<B\max_{1\leq i\leq N}|\mathscr{N}_{i}|<B.

The assumption max1iN|𝒩i|<B\max_{1\leq i\leq N}|\mathscr{N}_{i}|<B implies that DND_{N} is bounded above by a constant D{2,3,}D\in\{2,3,\ldots\}.
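
Once the sets S_{G_A, k, l} are available, for instance from the breadth-first-search sketch above, the growth restriction in Condition 4 reduces to an elementary inequality check. The following sketch is our own, and the numerical values of omega1, omega2, D_N, and the shell sizes are purely illustrative.

import math

def check_growth_condition(max_shell_sizes, omega1, omega2, D_N):
    # verify max_k |S_{G_A, k, l}| <= omega1 + omega2 / (2 D_N^3) * log(l + 1) for all listed l
    return all(size <= omega1 + omega2 / (2 * D_N ** 3) * math.log(l + 1)
               for l, size in max_shell_sizes.items())

# illustrative shell sizes: max over k of |S_{G_A, k, l}| for l = 1, 2, 3
print(check_growth_condition({1: 2, 2: 2, 3: 2}, omega1=2.0, omega2=0.5, D_N=2))  # True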

D.2 Proof of Corollary 4

To prove Corollary 4, define

121{𝒘𝒲:i=1N𝑯i,1(𝒘)N2(1+χ(𝜽))2}2{𝒘𝒲:i=1N𝑯i,2(𝒘)c2N2(1+χ(𝜽))}𝑯i,1(𝒘)(di,1(𝒛),,di,i1(𝒛),di,i+1(𝒛),,di,N(𝒛))𝑯i,2(𝒘)(ci,1x12zi,1,,ci,i1xi12zi,i1,ci,i+1xi+12zi,i+1,,ci,NxN2zi,N)\begin{array}[]{llllllllll}\mathscr{H}\;\coloneqq\;\mathscr{H}_{1}\,\cap\,\mathscr{H}_{2}\vskip 7.11317pt\\ \mathscr{H}_{1}\;\coloneqq\;\left\{\bm{w}\in\mathscr{W}:\;\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{w})\right|\!\right|_{\infty}}\geq\dfrac{N}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}}\right\}\vskip 7.11317pt\\ \mathscr{H}_{2}\;\coloneqq\;\left\{\bm{w}\in\mathscr{W}:\;\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{w})\right|\!\right|_{\infty}}\geq\dfrac{c^{2}\,N}{2\,(1+\chi(\bm{\theta}^{\star}))}\right\}\vskip 7.11317pt\\ \bm{H}_{i,1}(\bm{w})\;\coloneqq\;(d_{i,1}(\bm{z}),\,\ldots,\,d_{i,i-1}(\bm{z}),\,d_{i,i+1}(\bm{z}),\,\ldots,\,d_{i,N}(\bm{z}))\vskip 7.11317pt\\ \bm{H}_{i,2}(\bm{w})\;\coloneqq\;(c_{i,1}\,x_{1}^{2}\,z_{i,1},\,\ldots,\,c_{i,i-1}\,x_{i-1}^{2}\,z_{i,i-1},\,c_{i,i+1}\,x_{i+1}^{2}\,z_{i,i+1},\,\ldots,\,c_{i,N}\,x_{N}^{2}\,z_{i,N})\end{array} (D.8)

and

χ(𝜽)exp(CD2(𝜽+ϵ)),\begin{array}[]{llllllllll}\chi(\bm{\theta}^{\star})&\coloneqq&\exp(C\,D^{2}\,(|\!|\bm{\theta}^{\star}|\!|_{\infty}+\epsilon^{\star})),\end{array} (D.9)

where the constants 0<c<C<0<c<C<\infty and D{2,3,}D\in\{2,3,\ldots\} are identical to the corresponding constants defined in Condition 4 and Equation (D.4), respectively.

Proof of Corollary 4. We prove Corollary 4 using Theorem 4 in five steps:

  • Step 1: We bound

    (||𝜽(𝜽;𝑾)|𝜽=𝜽𝔼𝜽(𝜽;𝑾)|𝜽=𝜽||<ρN2ΛN(𝜽))1τ(ρN2ΛN(𝜽)),\begin{array}[]{llllllllll}\mathbb{P}\left({\left|\!\left|\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\bm{W})|_{\bm{\theta}=\bm{\theta}^{\star}}-\mathbb{E}\,\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\bm{W})|_{\bm{\theta}=\bm{\theta}^{\star}}\right|\!\right|_{\infty}}<\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}\right)&\geq&1-\tau\left(\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}\right),\end{array}

    and choose ρN\rho_{N} so that 1τ(ρN/(2ΛN(𝜽))) 12/max{N,p}21-\tau(\rho_{N}/(2\,\Lambda_{N}(\bm{\theta}^{\star})))\,\geq\,1-2\,/\max\{N,\,p\}^{2}.

  • Step 2: We show that 𝜽2(𝜽;𝒘)-\nabla_{\bm{\theta}}^{2}~\ell(\bm{\theta};\,\bm{w}) is invertible for all 𝜽(𝜽,ϵ)\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star}) and all 𝒘\bm{w}\in\mathscr{H}.

  • Step 3: We prove that the event 𝑾\bm{W}\in\mathscr{H} occurs with probability at least 1υ(ρN/(2ΛN(𝜽)))14/max{N,p}21-\upsilon(\rho_{N}/(2\,\Lambda_{N}(\bm{\theta}^{\star})))\,\geq\,1-4\,/\max\{N,\,p\}^{2}.

  • Step 4: We bound δN\delta_{N}.

  • Step 5: We bound ρN\rho_{N}.

The proof of Corollary 4 leverages auxiliary results supplied by Lemmas D.4, D.5, and D.6, which show that there exists an integer N1{3,4,}N_{1}\in\{3,4,\dots\} such that, for all N>N1N>N_{1},

ΛN(𝜽)C1χ(𝜽)9Nby Lemma D.4N/2ΨNC2Nby Lemma D.5|𝒟(𝜽)|2C3by Lemma D.6,\begin{array}[]{llllllllll}\Lambda_{N}(\bm{\theta}^{\star})&\leq&C_{1}\;\dfrac{\chi(\bm{\theta}^{\star})^{9}}{N}&\text{by Lemma \ref{lemma.bounds.lambda}}\vskip 7.11317pt\\ \sqrt{N/2}&\leq&\Psi_{N}~\leq~C_{2}\,\sqrt{N}&\text{by Lemma \ref{lemma.bounds.psi}}\vskip 7.11317pt\\ {|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&C_{3}&\text{by Lemma \ref{lemma.bounds.d}},\end{array}

where C1>0C_{1}>0,  C2>0C_{2}>0,  and C31C_{3}\geq 1 are constants.

Step 1: Since 𝑾{0,1}M\bm{W}\in\{0,1\}^{M},  Lemma 6 of S25 establishes

(||𝜽(𝜽;𝑾)|𝜽=𝜽𝔼𝜽(𝜽;𝑾)|𝜽=𝜽||<ρN2ΛN(𝜽))1τ(ρN2ΛN(𝜽)),\begin{array}[]{llllllllll}\mathbb{P}\left({\left|\!\left|\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\bm{W})|_{\bm{\theta}=\bm{\theta}^{\star}}-\mathbb{E}\,\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\bm{W})|_{\bm{\theta}=\bm{\theta}^{\star}}\right|\!\right|_{\infty}}<\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}\right)&\geq&1-\tau\left(\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}\right),\end{array}

where

τ(ρN2ΛN(𝜽))2exp(ρN232ΛN(𝜽)2(1+D)2|𝒟N(𝜽)|22ΨN2+logp),\begin{array}[]{llllllllll}\tau\left(\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}\right)&\coloneqq&2\,\exp\left(-\dfrac{\rho_{N}^{2}}{32\,\Lambda_{N}(\bm{\theta}^{\star})^{2}\;(1+D)^{2}\;|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}^{2}\;\Psi_{N}^{2}}+\log\,p\right),\end{array}

with D{2,3,}D\in\{2,3,\ldots\} defined in (D.4). Choosing

ρN96ΛN(𝜽)(1+D)|𝒟(𝜽)|2ΨNlogmax{N,p}\begin{array}[]{llllllllll}\rho_{N}&\coloneqq&\sqrt{96}\;\Lambda_{N}(\bm{\theta}^{\star})\,(1+D)\,{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}\,\Psi_{N}\sqrt{\log\,\max\{N,\,p\}}\end{array} (D.10)

implies that the event

||𝜽(𝜽;𝑾)|𝜽=𝜽𝔼𝜽(𝜽;𝑾)|𝜽=𝜽||<ρN2ΛN(𝜽)\begin{array}[]{llllllllll}{\left|\!\left|\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\bm{W})|_{\bm{\theta}=\bm{\theta}^{\star}}-\mathbb{E}\,\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\bm{W})|_{\bm{\theta}=\bm{\theta}^{\star}}\right|\!\right|_{\infty}}&<&\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}\end{array}

occurs with probability at least

1τ(ρN2ΛN(𝜽))12max{N,p}2.\begin{array}[]{llllllllll}1-\tau\left(\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}\right)&\geq&1-\dfrac{2}{\max\{N,\,p\}^{2}}.\end{array}
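
As a quick sanity check on Step 1, the exponent in τ under the choice (D.10) can be simplified symbolically. The sketch below (our own) confirms that the exponent equals 3 log max{N, p}, so that τ ≤ 2 exp(−3 log max{N, p} + log p) ≤ 2/max{N, p}^2.

import sympy as sp

Lam, D, Dnorm, Psi, logmax = sp.symbols('Lambda D Dnorm Psi logmax', positive=True)
rho = sp.sqrt(96) * Lam * (1 + D) * Dnorm * Psi * sp.sqrt(logmax)     # the choice (D.10)
exponent = rho**2 / (32 * Lam**2 * (1 + D)**2 * Dnorm**2 * Psi**2)
print(sp.simplify(exponent))                                          # 3*logmax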

Step 2: Let \mathscr{H} be defined in (D.8). Lemma D.4 establishes that 𝜽2(𝜽;𝒘)-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{w}) is invertible for all 𝜽(𝜽,ϵ)\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star}) and all 𝒘\bm{w}\in\mathscr{H}.

Step 3: Lemma D.7 shows that there exists an integer N2{3,4,}N_{2}\in\{3,4,\dots\} such that, for all N>N2N>N_{2}, the event 𝑾\bm{W}\in\mathscr{H} occurs with probability at least

1υ(δN)=14max{N,p}2.\begin{array}[]{llllllllll}1-\upsilon(\delta_{N})&=&1-\dfrac{4}{\max\{N,\,p\}^{2}}.\end{array}

Step 4: The quantity

δNρN2ΛN(𝜽)=24(1+D)|𝒟(𝜽)|2ΨNlogmax{N,p}\begin{array}[]{llllllllll}\delta_{N}&\coloneqq&\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}&=&\sqrt{24}\;\,(1+D)\,{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}\,\Psi_{N}\,\sqrt{\log\,\max\{N,\,p\}}\end{array}

is bounded below by

δN24DN/2logN=12DNlogN\begin{array}[]{llllllllll}\delta_{N}&\geq&\sqrt{24}\;\,D\,\sqrt{N/2}\;\sqrt{\log N}&=&\sqrt{12}\;\,D\,\sqrt{N\log N}\end{array}

and is bounded above by

δN24C2C3(2D)N2logN=192C2C3DNlogN,\begin{array}[]{llllllllll}\delta_{N}&\leq&\sqrt{24}\;\,C_{2}\;C_{3}\,(2\,D)\,\sqrt{N}\,\sqrt{2\,\log N}&=&\sqrt{192}\;\,C_{2}\;C_{3}\,D\,\sqrt{N\log N},\end{array}

using D{2,3,}D\in\{2,3,\ldots\}, 1|𝒟(𝜽)|2C31\leq{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}\leq C_{3},  N/2ΨNC2N\sqrt{N/2}\leq\Psi_{N}\leq C_{2}\,\sqrt{N},  and max{N,p}=p=N+2\max\{N,\,p\}=p=N+2. Since C2>0C_{2}>0, C31C_{3}\geq 1, and D{2,3,}D\in\{2,3,\ldots\} defined in (D.4) are constants, there exist constants 0<LU<0\,<\,L\,\leq\,U\,<\,\infty such that

LNlogNδNUNlogN.\begin{array}[]{llllllllll}L\;\sqrt{N\log N}&\leq&\delta_{N}&\leq&U\;\sqrt{N\log N}.\end{array}
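
The constant simplifications behind the bounds on δ_N in Step 4 can be verified symbolically; a small check of our own (comparing squares to sidestep radical simplification):

import sympy as sp

N, D, C2, C3, logN = sp.symbols('N D C2 C3 logN', positive=True)
lower = sp.sqrt(24) * D * sp.sqrt(N / 2) * sp.sqrt(logN)
upper = sp.sqrt(24) * C2 * C3 * (2 * D) * sp.sqrt(N) * sp.sqrt(2 * logN)
print(sp.simplify(lower**2 - 12 * D**2 * N * logN))                   # 0: lower bound = sqrt(12) D sqrt(N log N)
print(sp.simplify(upper**2 - 192 * C2**2 * C3**2 * D**2 * N * logN))  # 0: upper bound = sqrt(192) C2 C3 D sqrt(N log N)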

Step 5: Substituting the bounds on ΛN(𝜽)\Lambda_{N}(\bm{\theta}^{\star}),  ΨN\Psi_{N},  and |𝒟(𝜽)|2{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}} supplied by Lemmas D.4, D.5, and D.6 into (D.10) reveals that

ρN96ΛN(𝜽)(1+D)|𝒟(𝜽)|2ΨNlogmax{N,p}96C1C2C3(2D)χ(𝜽)9NNlog(N+2)768C1C2C3Dχ(𝜽)9logNN,\begin{array}[]{llllllllll}\rho_{N}&\coloneqq&\sqrt{96}\;\Lambda_{N}(\bm{\theta}^{\star})\,(1+D)\,{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}\,\Psi_{N}\sqrt{\log\,\max\{N,\,p\}}\vskip 7.11317pt\\ &\leq&\sqrt{96}\;\,C_{1}\;C_{2}\;C_{3}\;(2\,D)\;\dfrac{\chi(\bm{\theta}^{\star})^{9}}{N}\;\sqrt{N\log(N+2)}\vskip 7.11317pt\\ &\leq&\sqrt{768}\;\,C_{1}\;C_{2}\;C_{3}\;D\,\chi(\bm{\theta}^{\star})^{9}\;\sqrt{\dfrac{\log N}{N}},\end{array} (D.11)

using max{N,p}=p=N+2\max\{N,\,p\}=p=N+2 and log(N+2)log(2N)2logN\log(N+2)\leq\log(2\,N)\leq 2\,\log N (N2N\geq 2). To bound χ(𝜽)\chi(\bm{\theta}^{\star}), we invoke Condition 4:

χ(𝜽)9exp(CD2(||𝜽||+ϵ))9exp(CD2(A+ϵ))9=exp(9CD2(A+ϵ)).\begin{array}[]{llllllllll}\chi(\bm{\theta}^{\star})^{9}\;\coloneqq\;\exp(C\,D^{2}\,(|\!|\bm{\theta}^{\star}|\!|_{\infty}+\epsilon^{\star}))^{9}\;\leq\;\exp(C\,D^{2}\,(A+\epsilon^{\star}))^{9}\;=\;\exp(9\;C\,D^{2}\,(A+\epsilon^{\star})).\end{array}

Define

K768C1C2C3Dexp(9CD2(A+ϵ))>0.\begin{array}[]{llllllllll}K&\coloneqq&\sqrt{768}\;\,C_{1}\;C_{2}\;C_{3}\;D\,\exp(9\;C\,D^{2}\,(A+\epsilon^{\star}))&>&0.\end{array}

Since AA, CC, C1C_{1}, C2C_{2}, C3C_{3}, DD, and ϵ\epsilon^{\star} are independent of NN, so is KK. We conclude that

ρNKlogNN0asN.\begin{array}[]{llllllllll}\rho_{N}&\leq&K\;\sqrt{\dfrac{\log N}{N}}&\to&0&\mbox{as}&N\to\infty.\end{array}

Conclusion. Theorem 4 implies that, for all N>N0max{N1,N2}N>N_{0}\coloneqq\max\{N_{1},N_{2}\}, the random set 𝚯^(δN)\widehat{\bm{\Theta}}(\delta_{N}) is non-empty and satisfies

𝚯^(δN)(𝜽,KlogNN)\begin{array}[]{llllllllll}\widehat{\bm{\Theta}}(\delta_{N})&\subseteq&\mathscr{B}_{\infty}\left(\bm{\theta}^{\star},\;K\,\sqrt{\dfrac{\log N}{N}}\right)\end{array}

with probability at least

1τ(δN)υ(δN)16max{N,p}216N2,\begin{array}[]{llllllllll}1-\tau(\delta_{N})-\upsilon(\delta_{N})&\geq&1-\dfrac{6}{\max\{N,\,p\}^{2}}&\geq&1-\dfrac{6}{N^{2}},\end{array}

using max{N,p}2=p2=(N+2)2N2\max\{N,\,p\}^{2}=p^{2}=(N+2)^{2}\geq N^{2}.

D.3 Statement and Proof of Corollary D.3

If subpopulations do not overlap, 𝜽{\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}} can grow as a function of NN.  Condition D.3 details how fast 𝜽{\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}} can grow.

Condition D.3. The parameter space is 𝚯=N+2\bm{\Theta}=\mathbb{R}^{N+2} and the data-generating parameter vector 𝛉N+2\bm{\theta}^{\star}\in\mathbb{R}^{N+2} satisfies

𝜽E+ϑlogNCD2ϵ,\begin{array}[]{llllllllll}\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}&\leq&\dfrac{E+\vartheta\,\log N}{C\,D^{2}}-\epsilon^{\star},\end{array}

where E0E\geq 0 and ϑ[0, 1/18)\vartheta\in[0,\,1/18) are constants,  C>0C>0 is identical to the constant CC in Condition 4,  D{2,3,}D\in\{2,3,\ldots\} is identical to the constant DD in (D.4),  and ϵ>0\epsilon^{\star}>0 is identical to the constant ϵ\epsilon^{\star} in the definition of ΛN(𝛉)\Lambda_{N}(\bm{\theta}^{\star}) in Section 4.

Corollary D.3 replaces Condition 4 with Condition D.3. As a consequence, the constant UU appearing in Condition D.1 is redefined as U(1+exp(D))1>0U\coloneqq(1+\exp(-D))^{-1}>0.

Corollary D.3. Consider a single observation of dependent responses and connections (𝐘,𝐙)(\bm{Y},\bm{Z}) generated by the model with parameter vector 𝛉(α𝒵,1,,α𝒵,N,γ𝒵,𝒵,γ𝒳,𝒴,𝒵)N+2\bm{\theta}^{\star}\coloneqq(\alpha_{\mathscr{Z},1}^{\star},\,\dots,\alpha_{\mathscr{Z},N}^{\star},\,\gamma_{\mathscr{Z},\mathscr{Z}}^{\star},\,\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}^{\star})\in\mathbb{R}^{N+2}. If Conditions 4, D.1, and D.3 are satisfied with ϑ[0,1/18)\vartheta\in[0,1/18), there exist constants K(0,+)K\in(0,+\infty) and 0<LU<+0<L\leq U<+\infty along with an integer N0{3,4,}N_{0}\in\{3,4,\dots\} such that, for all N>N0N>N_{0}, the quantity δN\delta_{N} satisfies

LNlogNδNUNlogN,\begin{array}[]{llllllllll}L\,\sqrt{N\log N}&\leq&\delta_{N}&\leq&U\,\sqrt{N\log N},\end{array}

and the random set 𝚯^(δN)\widehat{\bm{\Theta}}(\delta_{N}) is non-empty and satisfies

𝚯^(δN)(𝜽,KlogNN118ϑ)\begin{array}[]{llllllllll}\widehat{\bm{\Theta}}(\delta_{N})&\subseteq&\mathscr{B}_{\infty}\left(\bm{\theta}^{\star},\;K\,\sqrt{\dfrac{\log N}{N^{1-18\,\vartheta}}}\right)\end{array}

with probability at least 16/N21-6\,/N^{2}.

Proof of Corollary D.3. The proof of Corollary D.3 resembles the proof of Corollary 4, with Condition 4 replaced by Condition D.3. The proof of Corollary 4 shows that

ρN768C1C2C3Dχ(𝜽)9logNN,\begin{array}[]{llllllllll}\rho_{N}&\leq&\sqrt{768}\;\,C_{1}\;C_{2}\,C_{3}\;D\;\chi(\bm{\theta}^{\star})^{9}\;\sqrt{\dfrac{\log N}{N}},\end{array}

where the constants C1>0C_{1}>0, C2>0C_{2}>0,  C31C_{3}\geq 1, and D{2,3,}D\in\{2,3,\ldots\} are defined in Lemmas D.4, D.5, and D.6, and Equation (D.4), respectively. Condition D.3 implies that

χ(𝜽)9exp(CD2(||𝜽||+ϵ))9exp(CD2(E+ϑlogNCD2))9=exp(9E)N9ϑ,\begin{array}[]{llllllllll}\chi(\bm{\theta}^{\star})^{9}\;\coloneqq\;\exp(C\,D^{2}\,(|\!|\bm{\theta}^{\star}|\!|_{\infty}+\epsilon^{\star}))^{9}\;\leq\;\exp\left(C\,D^{2}\,\left(\dfrac{E+\vartheta\,\log N}{C\,D^{2}}\right)\right)^{9}\;=\;\exp(9\,E)\,N^{9\,\vartheta},\end{array}

which in turn implies that

ρN768C1C2C3Dexp(9E)logNN118ϑ=KlogNN118ϑ,\begin{array}[]{llllllllll}\rho_{N}&\leq&\sqrt{768}\;\,C_{1}\;C_{2}\,C_{3}\;D\,\exp(9\,E)\,\sqrt{\dfrac{\log N}{N^{1-18\,\vartheta}}}&=&K\;\sqrt{\dfrac{\log N}{N^{1-18\,\vartheta}}},\end{array}

where K768C1C2C3Dexp(9E)>0K\coloneqq\sqrt{768}\;C_{1}\;C_{2}\,C_{3}\;D\,\exp(9\,E)>0. The remainder of the proof of Corollary D.3 resembles the proof of Corollary 4. We conclude that there exists an integer N0{3,4,}N_{0}\in\{3,4,\dots\} such that, for all N>N0N>N_{0}, the random set 𝚯^(δN)\widehat{\bm{\Theta}}(\delta_{N}) is non-empty and satisfies

𝚯^(δN)(𝜽,KlogNN118ϑ)\begin{array}[]{llllllllll}\widehat{\bm{\Theta}}(\delta_{N})&\subseteq&\mathscr{B}_{\infty}\left(\bm{\theta}^{\star},\;K\;\sqrt{\dfrac{\log N}{N^{1-18\,\vartheta}}}\right)\end{array}

with probability at least 16/N21-6\,/N^{2}.
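
To see how the exponent 1 − 18ϑ slows the rate as ϑ approaches 1/18, here is a small numerical comparison of our own; the constant K = 1 is arbitrary and purely illustrative.

import math

def radius(N, vartheta, K=1.0):
    # K * sqrt(log N / N^{1 - 18 vartheta}), the radius appearing in Corollary D.3 (K illustrative)
    return K * math.sqrt(math.log(N) / N ** (1 - 18 * vartheta))

for vartheta in (0.0, 0.02, 0.05):        # all values below the threshold 1/18
    print(vartheta, [round(radius(N, vartheta), 4) for N in (10**3, 10**5)])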

D.4 Bounding ΛN(𝜽)\Lambda_{N}(\bm{\theta}^{\star})

Lemma D.4. Consider the model of Corollary 4. If Conditions 4 and D.1 are satisfied along with either Condition 4 or Condition D.3 with ϑ[0,1/18)\vartheta\in[0,1/18), there exists a constant C1>0C_{1}>0 along with an integer N0{3,4,}N_{0}\in\{3,4,\dots\} such that, for all N>N0N>N_{0},

  • 𝜽2(𝜽;𝒘)-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{w}) is invertible for all 𝜽(𝜽,ϵ)\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star}) and all 𝒘\bm{w}\in\mathscr{H},

  • the event 𝑾\bm{W}\in\mathscr{H} occurs with probability at least 14/max{N,p}21-4\,/\max\{N,\,p\}^{2},

  • ΛN(𝜽)sup𝒘sup𝜽(𝜽,ϵ)|(𝜽2(𝜽;𝒘))1|C1χ(𝜽)9N\Lambda_{N}(\bm{\theta}^{\star})~\coloneqq~\sup\limits_{\bm{w}\,\in\,\mathscr{H}}\;\,\sup\limits_{\bm{\theta}\,\in\,\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})}\,{\left|\!\left|\!\left|(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{w}))^{-1}\right|\!\right|\!\right|_{\infty}}~\leq~C_{1}\,\dfrac{\chi(\bm{\theta}^{\star})^{9}}{N},

where \mathscr{H} is defined in (D.8) and χ(𝛉)\chi(\bm{\theta}^{\star}) is defined in (D.9).

Proof of Lemma D.4. We first partition 𝜽2(𝜽;𝒘)-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{w}) in accordance with 𝜽(𝜽1,𝜽2)\bm{\theta}\coloneqq(\bm{\theta}_{1},\,\bm{\theta}_{2}), given by 𝜽1(α𝒵,1,,α𝒵,N)N\bm{\theta}_{1}\coloneqq(\alpha_{\mathscr{Z},1},\dots,\alpha_{\mathscr{Z},N})\in\mathbb{R}^{N} and 𝜽2(γ𝒵,𝒵,γ𝒳,𝒴,𝒵)2\bm{\theta}_{2}\coloneqq(\gamma_{\mathscr{Z},\mathscr{Z}},\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}})\in\mathbb{R}^{2}:

𝜽2(𝜽;𝒘)(𝑨(𝜽,𝒘)𝑪(𝜽,𝒘)𝑪(𝜽,𝒘)𝑩(𝜽,𝒘)),\begin{array}[]{llllllllll}-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{w})&\coloneqq&\begin{pmatrix}\bm{A}(\bm{\theta},\,\bm{w})&\bm{C}(\bm{\theta},\,\bm{w})\\ \bm{C}(\bm{\theta},\,\bm{w})^{\top}&\bm{B}(\bm{\theta},\,\bm{w})\end{pmatrix},\end{array} (D.12)

where the matrices 𝑨(𝜽,𝒘)N×N\bm{A}(\bm{\theta},\,\bm{w})\in\mathbb{R}^{N\times N} and 𝑩(𝜽,𝒘)2×2\bm{B}(\bm{\theta},\,\bm{w})\in\mathbb{R}^{2\times 2} define the covariance matrices of the sufficient statistics corresponding to the parameters 𝜽1\bm{\theta}_{1} and 𝜽2\bm{\theta}_{2}, respectively. Define 𝑪(𝜽,𝒘)(𝑪1(𝜽,𝒘),𝑪2(𝜽,𝒘))N×2\bm{C}(\bm{\theta},\,\bm{w})\coloneqq\left(\bm{C}_{1}(\bm{\theta},\,\bm{w}),\,\bm{C}_{2}(\bm{\theta},\,\bm{w})\right)\in\mathbb{R}^{N\times 2}, where 𝑪1(𝜽,𝒘)N\bm{C}_{1}(\bm{\theta},\,\bm{w})\in\mathbb{R}^{N} and 𝑪2(𝜽,𝒘)N\bm{C}_{2}(\bm{\theta},\,\bm{w})\in\mathbb{R}^{N} are the covariances of the degree terms with the transitive connection term with weight γ𝒵,𝒵\gamma_{\mathscr{Z},\mathscr{Z}} and spillover term with weight γ𝒳,𝒴,𝒵\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}, respectively.

We wish to bound the infinity norm of (𝜽2(𝜽;𝒘))1\left(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\bm{w})\right)^{-1}, given by

(𝜽2(𝜽,𝒘))1=(𝑨(𝜽,𝒘)𝑪(𝜽,𝒘)𝑪(𝜽,𝒘)𝑩(𝜽,𝒘))1=(𝑨(𝜽,𝒘)1𝟎N,2𝟎2,N𝟎2,2)+(𝑨(𝜽,𝒘)1𝑪(𝜽,𝒘)𝑰2,2)𝑽(𝜽,𝒘)1(𝑨(𝜽,𝒘)1𝑪(𝜽,𝒘)𝑰2,2),\begin{array}[]{llllllllll}(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta},\,\bm{w}))^{-1}&=&\begin{pmatrix}\bm{A}(\bm{\theta},\,\bm{w})&\bm{C}(\bm{\theta},\,\bm{w})\\ \bm{C}(\bm{\theta},\,\bm{w})^{\top}&\bm{B}(\bm{\theta},\,\bm{w})\end{pmatrix}^{-1}\vskip 7.11317pt\\ &=&\begin{pmatrix}\bm{A}(\bm{\theta},\,\bm{w})^{-1}&\bm{0}_{N,2}\\ \bm{0}_{2,N}&\bm{0}_{2,2}\end{pmatrix}\\ &+&\begin{pmatrix}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\bm{C}(\bm{\theta},\,\bm{w})\\ -\bm{I}_{2,2}\end{pmatrix}\bm{V}(\bm{\theta},\,\bm{w})^{-1}\begin{pmatrix}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\bm{C}(\bm{\theta},\,\bm{w})\\ -\bm{I}_{2,2}\end{pmatrix}^{\top},\end{array}

where 𝟎a,b{0,1}a×b\bm{0}_{a,b}\in\{0,1\}^{a\times b} denotes the matrix of zeros and 𝑰a,b{0,1}a×b\bm{I}_{a,b}\in\{0,1\}^{a\times b} denotes the matrix with ones on the main diagonal and zeros elsewhere (a,b{1,2,}a,b\in\{1,2,\ldots\}), and

𝑽(𝜽,𝒘)𝑩(𝜽,𝒘)𝑪(𝜽,𝒘)𝑨(𝜽,𝒘)1𝑪(𝜽,𝒘)\begin{array}[]{llllllllll}\bm{V}(\bm{\theta},\,\bm{w})&\coloneqq&\bm{B}(\bm{\theta},\,\bm{w})-\bm{C}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}(\bm{\theta},\,\bm{w})\end{array}

is the Schur complement of 𝜽2(𝜽,𝒘)-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta},\,\bm{w}) with respect to the block 𝑨(𝜽,𝒘)\bm{A}(\bm{\theta},\,\bm{w}).
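
The block-inverse identity used here can be checked numerically on a random symmetric positive definite matrix partitioned as in (D.12); the following sketch is our own and uses generic numerical values rather than the model's Hessian.

import numpy as np

rng = np.random.default_rng(0)
N = 6
G = rng.normal(size=(N + 2, N + 2))
H = G @ G.T + (N + 2) * np.eye(N + 2)               # symmetric positive definite stand-in for -grad^2 ell
A, C, B = H[:N, :N], H[:N, N:], H[N:, N:]           # blocks as in (D.12)

A_inv = np.linalg.inv(A)
V = B - C.T @ A_inv @ C                             # Schur complement of H with respect to the block A
top = np.vstack([A_inv @ C, -np.eye(2)])            # the (N + 2) x 2 factor; symmetry of A is used here
lhs = np.linalg.inv(H)
rhs = np.block([[A_inv, np.zeros((N, 2))], [np.zeros((2, N)), np.zeros((2, 2))]]) \
      + top @ np.linalg.inv(V) @ top.T
print(np.allclose(lhs, rhs))                        # True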

The \ell_{\infty}-induced norm is submultiplicative, so

|(𝜽2(𝜽,𝒘))1||𝑨(𝜽,𝒘)1|+|(𝑨(𝜽,𝒘)1𝑪(𝜽,𝒘)𝑰p,p)||𝑽(𝜽,𝒘)1||(𝑨(𝜽,𝒘)1𝑪(𝜽,𝒘)𝑰p,p)||𝑨(𝜽,𝒘)1|+max{1,|𝑨(𝜽,𝒘)1𝑪(𝜽,𝒘)|}×|𝑽(𝜽,𝒘)1|(|𝑪(𝜽,𝒘)𝑨(𝜽,𝒘)1|+1)|𝑨(𝜽,𝒘)1|+max{1,|𝑨(𝜽,𝒘)1||𝑪(𝜽,𝒘)|}×|𝑽(𝜽,𝒘)1|(|𝑪(𝜽,𝒘)||𝑨(𝜽,𝒘)1|+1).\begin{array}[]{llllllllll}&&\,{\left|\!\left|\!\left|(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta},\,\bm{w}))^{-1}\right|\!\right|\!\right|_{\infty}}\vskip 7.11317pt\\ &\leq&\,{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}\\ &+&{\left|\!\left|\!\left|\begin{pmatrix}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}(\bm{\theta},\,\bm{w})\\ -\bm{I}_{p,p}\end{pmatrix}\right|\!\right|\!\right|_{\infty}}{\left|\!\left|\!\left|\bm{V}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}{\left|\!\left|\!\left|\begin{pmatrix}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}(\bm{\theta},\,\bm{w})\\ -\bm{I}_{p,p}\end{pmatrix}^{\top}\right|\!\right|\!\right|_{\infty}}\vskip 7.11317pt\\ &\leq&{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}\,+\,\max\{1,\;{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}(\bm{\theta},\,\bm{w})\right|\!\right|\!\right|_{\infty}}\}\\ &\times&{\left|\!\left|\!\left|\bm{V}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}({\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}+1)\\ &\leq&{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}+\max\{1,\;{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}\,{\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})\right|\!\right|\!\right|_{\infty}}\}\\ &\times&\,{\left|\!\left|\!\left|\bm{V}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}\,\left({\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})^{\top}\right|\!\right|\!\right|_{\infty}}\,{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}+1\right).\end{array} (D.13)

We bound the terms |𝑨(𝜽,𝒘)1|{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}},  |𝑪(𝜽,𝒘)|{\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})^{\top}\right|\!\right|\!\right|_{\infty}},  and |𝑽(𝜽,𝒘)1|{\left|\!\left|\!\left|\bm{V}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}} one by one.

Bounding |A(θ,w)1|{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}. The proof of Lemma 9 in S25 shows that

|𝑨(𝜽,𝒘)1|18χ(𝜽)2N\begin{array}[]{llllllllll}{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}&\leq&\dfrac{18\,\chi(\bm{\theta}^{\star})^{2}}{N}\end{array} (D.14)

for all 𝜽(𝜽,ϵ)\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star}), where χ(𝜽)\chi(\bm{\theta}^{\star}) is an upper bound on the inverse standard deviation of connections Zi,jZ_{i,j} of pairs of units {i,j}𝒫N\{i,j\}\subset\mathscr{P}_{N} with 𝒩i𝒩j\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset conditional on 𝑿,𝒀,𝒁{i,j}\bm{X},\,\bm{Y},\,\bm{Z}_{-\{i,j\}}. Under the model considered here, the conditional distribution of Zi,jZ_{i,j} is Bernoulli, as shown in Section 2.2.2. Therefore, 𝕍𝒵,i,j(Zi,j)\mathbb{V}_{\mathscr{Z},i,j}(Z_{i,j}) is given by

𝕍𝒵,i,j(Zi,j)=(Zi,j=1𝒙,𝒚,𝒛{i,j})×(1(Zi,j=1𝒙,𝒚,𝒛{i,j})).\begin{array}[]{llllllllll}\mathbb{V}_{\mathscr{Z},i,j}(Z_{i,j})&=&\mathbb{P}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})\times(1-\mathbb{P}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})).\end{array}

Applying the bounds on (Zi,j=1𝒙,𝒚,𝒛{i,j})\mathbb{P}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}}) supplied by Lemma D.7 gives

𝕍𝒵,i,j(Zi,j)1(exp(CD2𝜽))21(exp(CD2(𝜽+ϵ)))2,\begin{array}[]{llllllllll}\mathbb{V}_{\mathscr{Z},i,j}(Z_{i,j})&\geq&\dfrac{1}{(\exp\left(C\,D^{2}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right))^{2}}&\geq&\dfrac{1}{(\exp\left(C\,D^{2}\,(\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}+\epsilon^{\star})\right))^{2}},\end{array} (D.15)

provided D{2,3,}D\in\{2,3,\ldots\}, where DD corresponds to the constant DD defined in (D.4) and CC corresponds to the constant CC in Condition 4. For the second inequality of (D.15), we use the fact that 𝜽𝜽+ϵ\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\leq\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}+\epsilon^{\star} for all 𝜽(𝜽,ϵ)\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star}). With

χ(𝜽)exp(CD2(𝜽+ϵ)),\begin{array}[]{llllllllll}\chi(\bm{\theta}^{\star})&\coloneqq&\exp\left(C\,D^{2}\,(\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}+\epsilon^{\star})\right),\end{array}

we therefore deduce that χ(𝜽)\chi(\bm{\theta}^{\star}) is an upper bound on the inverse standard deviation of connections Zi,jZ_{i,j}:

χ(𝜽)1𝕍𝒵,i,j(Zi,j).\begin{array}[]{llllllllll}\chi(\bm{\theta}^{\star})&\geq&\dfrac{1}{\sqrt{\mathbb{V}_{\mathscr{Z},i,j}(Z_{i,j})}}.\end{array}

Bounding |C(θ,w)|{\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})^{\top}\right|\!\right|\!\right|_{\infty}}. Define 𝑪(𝜽,𝒘)(𝑪1(𝜽,𝒘),𝑪2(𝜽,𝒘))\bm{C}(\bm{\theta},\,\bm{w})\coloneqq\left(\bm{C}_{1}(\bm{\theta},\,\bm{w}),\,\bm{C}_{2}(\bm{\theta},\,\bm{w})\right), where
𝑪1(𝜽,𝒘)N\bm{C}_{1}(\bm{\theta},\,\bm{w})\in\mathbb{R}^{N} and 𝑪2(𝜽,𝒘)N\bm{C}_{2}(\bm{\theta},\,\bm{w})\in\mathbb{R}^{N} are the covariance terms of the degree terms with the sufficient statistics pertaining to the transitive connection term weighted by γ𝒵,𝒵\gamma_{\mathscr{Z},\mathscr{Z}} and the spillover term weighted by γ𝒳,𝒴,𝒵\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}, respectively. Then

|𝑪(𝜽,𝒘)|𝑪1(𝜽,𝒘)+𝑪2(𝜽,𝒘).\begin{array}[]{llllllllll}{\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})^{\top}\right|\!\right|\!\right|_{\infty}}&\leq&\left|\!\left|\bm{C}_{1}(\bm{\theta},\,\bm{w})\right|\!\right|_{\infty}\,+\,\left|\!\left|\bm{C}_{2}(\bm{\theta},\,\bm{w})\right|\!\right|_{\infty}.\end{array}

We bound the terms 𝑪1(𝜽,𝒘)\left|\!\left|\bm{C}_{1}(\bm{\theta},\,\bm{w})\right|\!\right|_{\infty} and 𝑪2(𝜽,𝒘)\left|\!\left|\bm{C}_{2}(\bm{\theta},\,\bm{w})\right|\!\right|_{\infty} one by one.

By Lemma 13 of S25, 𝑪1(𝜽,𝒘)3D3\left|\!\left|\bm{C}_{1}(\bm{\theta},\,\bm{w})\right|\!\right|_{\infty}\leq 3\,D^{3}. The term 𝑪2(𝜽,𝒘)(C2,1(𝜽,𝒘),,C2,N(𝜽,𝒘))N\bm{C}_{2}(\bm{\theta},\,\bm{w})\coloneqq(C_{2,1}(\bm{\theta},\,\bm{w}),\ldots,C_{2,N}(\bm{\theta},\,\bm{w}))\in\mathbb{R}^{N} refers to the covariances between the degrees bi(𝒙,𝒚,𝒛)b_{i}(\bm{x},\,\bm{y},\,\bm{z}) of units i{1,,N}i\in\{1,\dots,N\} and bN+2(𝒙,𝒚,𝒛)b_{N+2}(\bm{x},\,\bm{y},\,\bm{z}). An upper bound on the tt-th element of 𝑪2(𝜽,𝒘)\bm{C}_{2}(\bm{\theta},\,\bm{w}) can be obtained as follows:

|C2,t(𝜽,𝒘)|=|i=1Nj=i+1N𝒵,i,j(bt(𝒙,𝒚,𝒁),bN+2(𝒙,𝒚,𝒁))|=|i=1Nj=i+1N𝒵,i,j(htZh,t,h=1Nk=h+1Nch,k(xhyk+xkyh)Zh,k)|=|it:𝒩i𝒩t𝒵,i,t(Zi,t,(xiyt+yixt)Zi,t)|=|it:𝒩i𝒩t(xiyt+yixt)𝕍𝒵,i,t(Zi,t)||C2it:𝒩i𝒩tN1|CD2,\begin{array}[]{llllllllll}|C_{2,t}(\bm{\theta},\,\bm{w})|&=&\left|\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{C}_{\mathscr{Z},i,j}\left(b_{t}(\bm{x},\,\bm{y},\,\bm{Z}),\,b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right|\vskip 7.11317pt\\ &=&\left|\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{C}_{\mathscr{Z},i,j}\left(\displaystyle\sum\limits_{h\neq t}Z_{h,t},\;\,\sum_{h=1}^{N}\sum_{k=h+1}^{N}\,c_{h,k}\,(x_{h}\,y_{k}+x_{k}\,y_{h})\,Z_{h,k}\right)\right|\vskip 7.11317pt\\ &=&\left|\displaystyle\sum\limits_{i\neq t:\;\mathscr{N}_{i}\,\cap\,\mathscr{N}_{t}\,\neq\,\emptyset}\mathbb{C}_{\mathscr{Z},i,t}\left(Z_{i,t},\;(x_{i}\,y_{t}+y_{i}\,x_{t})\,Z_{i,t}\right)\right|\vskip 7.11317pt\\ &=&\left|\displaystyle\sum\limits_{i\neq t:\;\mathscr{N}_{i}\,\cap\,\mathscr{N}_{t}\,\neq\,\emptyset}(x_{i}\,y_{t}+y_{i}\,x_{t})\,\mathbb{V}_{\mathscr{Z},i,t}\left(Z_{i,t}\right)\right|\vskip 7.11317pt\\ &\leq&\left|\dfrac{C}{2}\;\displaystyle\sum\limits_{i\neq t:\;\mathscr{N}_{i}\,\cap\,\mathscr{N}_{t}\,\neq\,\emptyset}^{N}1\right|\;\;\leq\;\;C\,D^{2},\end{array} (D.16)

where CC corresponds to the constant from Condition 4 and DD is defined in (D.4). On the third line, note that bt(𝒙,𝒚,𝒛)b_{t}(\bm{x},\,\bm{y},\,\bm{z}) only depends on connection Zi,jZ_{i,j} if t{i,j}t\in\{i,j\}. Therefore, the covariance of bt(𝒙,𝒚,𝒛)b_{t}(\bm{x},\,\bm{y},\,\bm{z}) with respect to any other connection is 0. The first inequality follows from the observation that xiyj+xjyi2Cx_{i}\,y_{j}+x_{j}\,y_{i}\leq 2\,C and 𝕍𝒵,i,j(Zi,j)1/4\mathbb{V}_{\mathscr{Z},i,j}\left(Z_{i,j}\right)\leq 1/4, which follows from 0xiC<0\leq x_{i}\leq C<\infty by Condition 4 and Yi{0,1}Y_{i}\in\{0,1\}. The second inequality follows from Lemma 15 in S25 bounding the pairs of units ii and tt such that 𝒩i𝒩t\mathscr{N}_{i}\,\cap\mathscr{N}_{t}\,\neq\,\emptyset from above by D2D^{2}. Since the bound from (D.16) holds for all t{1,,N}t\in\{1,\ldots,N\}, we obtain 𝑪2(𝜽,𝒘)CD2\left|\!\left|\bm{C}_{2}(\bm{\theta},\,\bm{w})\right|\!\right|_{\infty}\leq C\,D^{2}. Taken together,

|𝑪(𝜽,𝒘)|3D3+CD2max{3,C}D3.\begin{array}[]{llllllllll}{\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})^{\top}\right|\!\right|\!\right|_{\infty}}&\leq&3\,D^{3}+C\,D^{2}&\leq&\max\{3,\,C\}\,D^{3}.\end{array} (D.17)

Bounding |V(θ,w)1|{\left|\!\left|\!\left|\bm{V}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}. Write

𝑩(𝜽,𝒘)(B1,1(𝜽,𝒘)B1,2(𝜽,𝒘)B1,2(𝜽,𝒘)B2,2(𝜽,𝒘))𝑽(𝜽,𝒘)(V1,1(𝜽,𝒘)V1,2(𝜽,𝒘)V1,2(𝜽,𝒘)V2,2(𝜽,𝒘)).\begin{array}[]{llllllllll}\bm{B}(\bm{\theta},\,\bm{w})&\coloneqq&\begin{pmatrix}B_{1,1}(\bm{\theta},\,\bm{w})&B_{1,2}(\bm{\theta},\,\bm{w})\\ B_{1,2}(\bm{\theta},\,\bm{w})&B_{2,2}(\bm{\theta},\,\bm{w})\end{pmatrix}\vskip 7.11317pt\\ \bm{V}(\bm{\theta},\,\bm{w})&\coloneqq&\begin{pmatrix}V_{1,1}(\bm{\theta},\,\bm{w})&V_{1,2}(\bm{\theta},\,\bm{w})\\ V_{1,2}(\bm{\theta},\,\bm{w})&V_{2,2}(\bm{\theta},\,\bm{w})\end{pmatrix}.\end{array}

The elements of 𝑽(𝜽,𝒘)\bm{V}(\bm{\theta},\,\bm{w}) are then given by

Vi,j(𝜽,𝒘)=Bi,j(𝜽,𝒘)𝑪i(𝜽,𝒘)𝑨(𝜽,𝒘)1𝑪j(𝜽,𝒘).\begin{array}[]{llllllllll}V_{i,j}(\bm{\theta},\,\bm{w})&=&B_{i,j}(\bm{\theta},\,\bm{w})-\bm{C}_{i}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{j}(\bm{\theta},\,\bm{w}).\end{array}

The inverse of 𝑽(𝜽,𝒘)\bm{V}(\bm{\theta},\,\bm{w}) is

𝑽(𝜽,𝒘)1=1V1,1(𝜽,𝒘)V2,2(𝜽,𝒘)V1,2(𝜽,𝒘)2(V2,2(𝜽,𝒘)V1,2(𝜽,𝒘)V1,2(𝜽,𝒘)V1,1(𝜽,𝒘)),\begin{array}[]{llllllllll}\bm{V}(\bm{\theta},\,\bm{w})^{-1}&=&\dfrac{1}{V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}}\begin{pmatrix}V_{2,2}(\bm{\theta},\,\bm{w})&-V_{1,2}(\bm{\theta},\,\bm{w})\\ -V_{1,2}(\bm{\theta},\,\bm{w})&V_{1,1}(\bm{\theta},\,\bm{w})\end{pmatrix},\end{array}

implying that

|(𝑽(𝜽,𝒘))1|max{V1,1(𝜽,𝒘),V2,2(𝜽,𝒘)}+|V1,2(𝜽,𝒘)||V1,1(𝜽,𝒘)V2,2(𝜽,𝒘)V1,2(𝜽,𝒘)2|.\begin{array}[]{llllllllll}\,{\left|\!\left|\!\left|(\bm{V}(\bm{\theta},\,\bm{w}))^{-1}\right|\!\right|\!\right|_{\infty}}&\leq&\dfrac{\max\left\{V_{1,1}(\bm{\theta},\,\bm{w}),\,V_{2,2}(\bm{\theta},\,\bm{w})\right\}+|V_{1,2}(\bm{\theta},\,\bm{w})|}{|V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}|}.\end{array} (D.18)
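
Inequality (D.18) follows from the explicit formula for the inverse of a 2 x 2 matrix; a quick numerical spot check of our own:

import numpy as np

rng = np.random.default_rng(2)
V = rng.normal(size=(2, 2))
V = V @ V.T + np.eye(2)                                        # symmetric positive definite 2 x 2 matrix
inf_norm = np.abs(np.linalg.inv(V)).sum(axis=1).max()          # infinity norm of V^{-1}: maximum absolute row sum
bound = (max(V[0, 0], V[1, 1]) + abs(V[0, 1])) / abs(np.linalg.det(V))
print(inf_norm <= bound + 1e-12)                               # True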

Invoking the inequalities from (D.14) and (D.17), we obtain for i,j{1,2}i,j\in\{1,2\}

|𝑪i(𝜽,𝒘)𝑨(𝜽,𝒘)1𝑪j(𝜽,𝒘)|N𝑪i(𝜽,𝒘)|𝑨(𝜽,𝒘)1|𝑪i(𝜽,𝒘)N|𝑪(𝜽,𝒘)||𝑨(𝜽,𝒘)1||𝑪(𝜽,𝒘)|18max{9,C2}D6χ(𝜽)2,\begin{array}[]{llllllllll}&&|\bm{C}_{i}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{j}(\bm{\theta},\,\bm{w})|\\ &\leq&N\,\left|\!\left|\bm{C}_{i}(\bm{\theta},\,\bm{w})\right|\!\right|_{\infty}{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}\left|\!\left|\bm{C}_{i}(\bm{\theta},\,\bm{w})\right|\!\right|_{\infty}\vskip 7.11317pt\\ &\leq&N\,{\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})\right|\!\right|\!\right|_{\infty}}{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}{\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})\right|\!\right|\!\right|_{\infty}}\vskip 7.11317pt\\ &\leq&18\,\max\{9,\,C^{2}\}\,D^{6}\,\chi(\bm{\theta}^{\star})^{2},\end{array} (D.19)

where DD corresponds to the constant DD defined in (D.4) and CC corresponds to the constant CC from Condition 4.

By applying Lemma D.7 along with (D.19), we get for i,j{1,2}i,j\in\{1,2\}

|Vi,j(𝜽,𝒘)|≤|Bi,j(𝜽,𝒘)|+|𝑪i(𝜽,𝒘)𝑨(𝜽,𝒘)1𝑪j(𝜽,𝒘)|max{1,C2}ND54+18max{9,C2}D6χ(𝜽)2max{9,C2}D5(N4+18Dχ(𝜽)2)\begin{array}[]{llllllllll}|V_{i,j}(\bm{\theta},\,\bm{w})|&\leq&\,|B_{i,j}(\bm{\theta},\,\bm{w})|\,+\,|\bm{C}_{i}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{j}(\bm{\theta},\,\bm{w})|\vskip 7.11317pt\\ &\leq&\max\{1,\,C^{2}\}\dfrac{N\,D^{5}}{4}+18\,\max\{9,\,C^{2}\}\,D^{6}\,\chi(\bm{\theta}^{\star})^{2}\vskip 7.11317pt\\ &\leq&\max\{9,\,C^{2}\}\,D^{5}\left(\dfrac{N}{4}+18\,D\,\chi(\bm{\theta}^{\star})^{2}\right).\end{array}

Thus, the numerator of (D.18) is bounded above by

max{V1,1(𝜽,𝒘),V2,2(𝜽,𝒘)}+|V1,2(𝜽,𝒘)|max{9,C2}D5(N2+36Dχ(𝜽)2).\begin{array}[]{llllllllll}&\max\left\{V_{1,1}(\bm{\theta},\,\bm{w}),\,V_{2,2}(\bm{\theta},\,\bm{w})\right\}+|V_{1,2}(\bm{\theta},\,\bm{w})|\\ &\leq~\max\{9,\,C^{2}\}\,D^{5}\left(\dfrac{N}{2}+36\,D\,\chi(\bm{\theta}^{\star})^{2}\right).\end{array} (D.20)

The denominator of (D.18), which is the determinant of 𝑽(𝜽,𝒘)\bm{V}(\bm{\theta},\,\bm{w}), is

V1,1(𝜽,𝒘)V2,2(𝜽,𝒘)V1,2(𝜽,𝒘)2=(B1,1(𝜽,𝒘)𝑪1(𝜽,𝒘)𝑨(𝜽,𝒘)1𝑪1(𝜽,𝒘))×(B2,2(𝜽,𝒘)𝑪2(𝜽,𝒘)𝑨(𝜽,𝒘)1𝑪2(𝜽,𝒘))(B1,2(𝜽,𝒘)𝑪1(𝜽,𝒘)𝑨(𝜽,𝒘)1𝑪2(𝜽,𝒘))2=B1,1(𝜽,𝒘)B2,2(𝜽,𝒘)B1,1(𝜽,𝒘)𝑪2(𝜽,𝒘)𝑨(𝜽,𝒘)1𝑪2(𝜽,𝒘)B2,2(𝜽,𝒘)𝑪1(𝜽,𝒘)𝑨(𝜽,𝒘)1𝑪1(𝜽,𝒘)+(𝑪1(𝜽,𝒘)𝑨(𝜽,𝒘)1𝑪1(𝜽,𝒘))(𝑪2(𝜽,𝒘)𝑨(𝜽,𝒘)1𝑪2(𝜽,𝒘))B1,2(𝜽,𝒘)2+2B1,2(𝜽,𝒘)(𝑪1(𝜽,𝒘)𝑨(𝜽,𝒘)1𝑪2(𝜽,𝒘))(𝑪1(𝜽,𝒘)𝑨(𝜽,𝒘)1𝑪2(𝜽,𝒘))2.\begin{array}[]{llllllllll}&&~V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}\\ &=&\,(B_{1,1}(\bm{\theta},\,\bm{w})-\bm{C}_{1}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{1}(\bm{\theta},\,\bm{w}))\\ &\times&(B_{2,2}(\bm{\theta},\,\bm{w})-\bm{C}_{2}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{2}(\bm{\theta},\,\bm{w}))\\ &-&(B_{1,2}(\bm{\theta},\,\bm{w})-\bm{C}_{1}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{2}(\bm{\theta},\,\bm{w}))^{2}\\ &=&\,B_{1,1}(\bm{\theta},\,\bm{w})\,B_{2,2}(\bm{\theta},\,\bm{w})-B_{1,1}(\bm{\theta},\,\bm{w})\,\bm{C}_{2}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{2}(\bm{\theta},\,\bm{w})\\ &-&\,B_{2,2}(\bm{\theta},\,\bm{w})\,\bm{C}_{1}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{1}(\bm{\theta},\,\bm{w})\\ &+&(\bm{C}_{1}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{1}(\bm{\theta},\,\bm{w}))\,(\bm{C}_{2}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{2}(\bm{\theta},\,\bm{w}))-B_{1,2}(\bm{\theta},\,\bm{w})^{2}\\ &+&\,2\,B_{1,2}(\bm{\theta},\,\bm{w})\,(\bm{C}_{1}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{2}(\bm{\theta},\,\bm{w}))-(\bm{C}_{1}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{2}(\bm{\theta},\,\bm{w}))^{2}.\end{array}

Applying the property of positive semidefinite matrices 𝑷n×n\bm{P}\in\mathbb{R}^{n\times n} that (𝒂𝑷𝒂)(𝒃𝑷𝒃)(𝒂𝑷𝒃)2(\bm{a}^{\top}\bm{P}\,\bm{a})\,(\bm{b}^{\top}\bm{P}\,\bm{b})\geq(\bm{a}^{\top}\bm{P}\,\bm{b})^{2} is true for all vectors 𝒂n\bm{a}\in\mathbb{R}^{n} and 𝒃n\bm{b}\in\mathbb{R}^{n} (n1n\geq 1), we obtain

V1,1(𝜽,𝒘)V2,2(𝜽,𝒘)V1,2(𝜽,𝒘)2B1,1(𝜽,𝒘)B2,2(𝜽,𝒘)B1,2(𝜽,𝒘)24maxi,j|Bi,j(𝜽,𝒘)|maxi,j|𝑪i(𝜽,𝒘)𝑨(𝜽,𝒘)1𝑪j(𝜽,𝒘)|B1,1(𝜽,𝒘)B2,2(𝜽,𝒘)B1,2(𝜽,𝒘)218max{81,C4}ND11χ(𝜽)2=U(𝜽,𝒘)18max{81,C4}ND11χ(𝜽)2,\begin{array}[]{llllllllll}&&~V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}\\ &\geq&\,B_{1,1}(\bm{\theta},\,\bm{w})\,B_{2,2}(\bm{\theta},\,\bm{w})-B_{1,2}(\bm{\theta},\,\bm{w})^{2}\\ &-&4\,\max\limits_{i,\,j}\,|B_{i,j}(\bm{\theta},\,\bm{w})|\,\,\max\limits_{i,\,j}\,|\bm{C}_{i}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{j}(\bm{\theta},\,\bm{w})|\\ &\geq&B_{1,1}(\bm{\theta},\,\bm{w})\,B_{2,2}(\bm{\theta},\,\bm{w})-B_{1,2}(\bm{\theta},\,\bm{w})^{2}-18\,\max\{81,\,C^{4}\}\,N\,D^{11}\,\chi(\bm{\theta}^{\star})^{2}\\ &=&U(\bm{\theta},\,\bm{w})-18\,\max\{81,\,C^{4}\}\,N\,D^{11}\,\chi(\bm{\theta}^{\star})^{2},\end{array}

where

U(𝜽,𝒘)B1,1(𝜽,𝒘)B2,2(𝜽,𝒘)B1,2(𝜽,𝒘)2.\begin{array}[]{llllllllll}U(\bm{\theta},\,\bm{w})&\coloneqq&B_{1,1}(\bm{\theta},\,\bm{w})\,B_{2,2}(\bm{\theta},\,\bm{w})-B_{1,2}(\bm{\theta},\,\bm{w})^{2}.\end{array} (D.21)

The final inequality follows from invoking (D.19) along with Lemma D.7.
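
The positive semidefinite inequality (a'Pa)(b'Pb) >= (a'Pb)^2 invoked above can be spot-checked numerically; the sketch below is our own illustration.

import numpy as np

rng = np.random.default_rng(3)
n = 5
P = rng.normal(size=(n, n))
P = P @ P.T                                   # positive semidefinite matrix
a, b = rng.normal(size=n), rng.normal(size=n)
print((a @ P @ a) * (b @ P @ b) >= (a @ P @ b) ** 2 - 1e-12)   # True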

For (D.21), we obtain

U(𝜽,𝒘)=B1,1(𝜽,𝒘)B2,2(𝜽,𝒘)B1,2(𝜽,𝒘)2=(i=1Nj=i+1N𝕍𝒵,i,j(bN+1(𝒙,𝒚,𝒁)))×(i=1Nj=i+1N𝕍𝒴,i(bN+2(𝒙,𝒀,𝒛))+i=1Nj=i+1N𝕍𝒵,i,j(bN+2(𝒙,𝒚,𝒁)))(i=1Nj=i+1N𝒵,i,j(bN+1(𝒙,𝒚,𝒁),bN+2(𝒙,𝒚,𝒁)))2=(i=1Nj=i+1N𝕍𝒵,i,j(bN+1(𝒙,𝒚,𝒁)))(i=1N𝕍𝒴,i(bN+2(𝒙,𝒀,𝒛)))+(i=1Nj=i+1N𝕍𝒵,i,j(bN+1(𝒙,𝒚,𝒁)))(i=1Nj=i+1N𝕍𝒵,i,j(bN+2(𝒙,𝒚,𝒁)))(i=1Nj=i+1N𝒵,i,j(bN+1(𝒙,𝒚,𝒁),bN+2(𝒙,𝒚,𝒁)))2.\begin{array}[]{llllllllll}U(\bm{\theta},\,\bm{w})&=&\,B_{1,1}(\bm{\theta},\,\bm{w})\,B_{2,2}(\bm{\theta},\,\bm{w})-B_{1,2}(\bm{\theta},\,\bm{w})^{2}\vskip 7.11317pt\\ &=&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\vskip 7.11317pt\\ &\times&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Y},i}\left(b_{N+2}(\bm{x},\,\bm{Y},\,\bm{z})\right)+\,\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\vskip 7.11317pt\\ &-&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{C}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z}),\;b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)^{2}\vskip 7.11317pt\\ &=&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\left(\displaystyle\sum\limits_{i=1}^{N}\mathbb{V}_{\mathscr{Y},i}\left(b_{N+2}(\bm{x},\,\bm{Y},\,\bm{z})\right)\right)\vskip 7.11317pt\\ &+&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\vskip 7.11317pt\\ &-&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{C}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z}),\;b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)^{2}.\end{array}

Next, we show that the third term

(i=1Nj=i+1N𝒵,i,j(bN+1(𝒙,𝒚,𝒁),bN+2(𝒙,𝒚,𝒁)))2\begin{array}[]{llllllllll}\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{C}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z}),b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)^{2}\end{array}

is bounded above by the second term

(i=1Nj=i+1N𝕍𝒵,i,j(bN+1(𝒙,𝒚,𝒁)))(i=1Nj=i+1N𝕍𝒵,i,j(bN+2(𝒙,𝒚,𝒁))).\begin{array}[]{llllllllll}\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right).\end{array}

Define

u1,i,j𝕍𝒵,i,j(bN+1(𝒙,𝒚,𝒁))andu2,i,j𝕍𝒵,i,j(bN+2(𝒙,𝒚,𝒁)),1≤i<j≤N.\begin{array}[]{llllllllll}u_{1,i,j}\coloneqq\sqrt{\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)}\ \ \ \text{and}\ \ \ u_{2,i,j}\coloneqq\sqrt{\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)},\ \ \ 1\leq i<j\leq N.\end{array}

Then the second term can be restated as follows:

(i=1Nj=i+1N𝕍𝒵,i,j(bN+1(𝒙,𝒚,𝒁)))(i=1Nj=i+1N𝕍𝒵,i,j(bN+2(𝒙,𝒚,𝒁)))=(i=1Nj=i+1Nu1,i,j2)(i=1Nj=i+1Nu2,i,j2),\begin{array}[]{llllllllll}&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\\ &=\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}u_{1,i,j}^{2}\right)\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}u_{2,i,j}^{2}\right),\end{array}

while the third term is

(i=1Nj=i+1N𝒵,i,j(bN+1(𝒙,𝒚,𝒁),bN+2(𝒙,𝒚,𝒁)))2(i=1Nj=i+1N|𝒵,i,j(bN+1(𝒙,𝒚,𝒁),bN+2(𝒙,𝒚,𝒁))|)2(i=1Nj=i+1N𝕍𝒵,i,j(bN+1(𝒙,𝒚,𝒁))𝕍𝒵,i,j(bN+2(𝒙,𝒚,𝒁)))2=(i=1Nj=i+1Nu1,i,ju2,i,j)2(i=1Nj=i+1Nu1,i,j2)(i=1Nj=i+1Nu2,i,j2),\begin{array}[]{llllllllll}&&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\,\mathbb{C}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z}),b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)^{2}\vskip 7.11317pt\\ &\leq&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\,|\mathbb{C}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z}),b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)|\right)^{2}\vskip 7.11317pt\\ &\leq&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\,\sqrt{\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)\,\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)}\right)^{2}\vskip 7.11317pt\\ &=&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}u_{1,i,j}\,u_{2,i,j}\right)^{2}\vskip 7.11317pt\\ &\leq&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}u_{1,i,j}^{2}\right)\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}u_{2,i,j}^{2}\right),\end{array}

where the Cauchy-Schwarz inequality is invoked on the third and last line. This translates to the following lower bound on U(𝜽,𝒘)U(\bm{\theta},\,\bm{w}):

U(𝜽,𝒘)=B1,1(𝜽,𝒘)B2,2(𝜽,𝒘)B1,2(𝜽,𝒘)2(i=1Nj=i+1N𝕍𝒵,i,j(bN+1(𝒙,𝒚,𝒁)))(i=1N𝕍𝒴,i(bN+2(𝒙,𝒀,𝒛)))(i=1N𝑯i,1(𝒘)(1+χ(𝜽))2)i=1N(jiNci,jxjzi,j)2𝕍𝒴,i(Yi),\begin{array}[]{llllllllll}U(\bm{\theta},\,\bm{w})&=&\,B_{1,1}(\bm{\theta},\,\bm{w})\,B_{2,2}(\bm{\theta},\,\bm{w})-B_{1,2}(\bm{\theta},\,\bm{w})^{2}\vskip 7.11317pt\\ &\geq&\,\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\left(\displaystyle\sum\limits_{i=1}^{N}\mathbb{V}_{\mathscr{Y},i}\left(b_{N+2}(\bm{x},\,\bm{Y},\,\bm{z})\right)\right)\vskip 7.11317pt\\ &\geq&\,\left(\displaystyle\sum\limits_{i=1}^{N}\dfrac{{\left|\!\left|\bm{H}_{i,1}(\bm{w})\right|\!\right|_{\infty}}}{(1+\chi(\bm{\theta}^{\star}))^{2}}\right)\,\displaystyle\sum\limits_{i=1}^{N}\left(\displaystyle\sum\limits_{j\neq i}^{N}c_{i,j}\,x_{j}\,z_{i,j}\right)^{2}\;\mathbb{V}_{\mathscr{Y},i}\left(Y_{i}\right),\end{array}

where 𝑯i,1(𝒘)\bm{H}_{i,1}(\bm{w}) is defined in (D.8) and the function ci,jc_{i,j} is defined in (D.2). For the second inequality, we use the result from the proof of Lemma 13 in S25, which implies that

i=1Nj=i+1N𝕍𝒵,i,j(bN+1(𝒚,𝒁))i=1Nj=i+1Nci,jdi,j(𝒛)𝕍𝒵,i,j(Zi,j)\begin{array}[]{llllllllll}\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{y},\,\bm{Z})\right)&\geq&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}c_{i,j}\,d_{i,j}(\bm{z})\,\mathbb{V}_{\mathscr{Z},i,j}\left(Z_{i,j}\right)\\ \end{array} (D.22)

where the function di,j(𝒁)d_{i,j}(\bm{Z}) is defined in (D.2). By Lemma D.7, we get

𝕍𝒵,i,j(Zi,j)=(Zi,j=1𝒙,𝒚,𝒛{i,j})×(1(Zi,j=1𝒙,𝒚,𝒛{i,j}))1(1+χ(𝜽))2,\begin{array}[]{llllllllll}\mathbb{V}_{\mathscr{Z},i,j}(Z_{i,j})=\mathbb{P}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})\times(1-\mathbb{P}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}}))&\geq&\dfrac{1}{(1+\chi(\bm{\theta}^{\star}))^{2}},\end{array}

where χ(𝜽)\chi(\bm{\theta}^{\star}) is defined in (D.9). When combined with (D.22), this results in

i=1Nj=i+1N𝕍𝒵,i,j(bN+1(𝒚,𝒁))i=1Nj=i+1Nci,jdi,j(𝒛)(1+χ(𝜽))2i=1N𝑯i,1(𝒘)2(1+χ(𝜽))2,\begin{array}[]{llllllllll}\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{y},\,\bm{Z})\right)&\geq&\dfrac{\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}c_{i,j}\,d_{i,j}(\bm{z})}{(1+\chi(\bm{\theta}^{\star}))^{2}}\\ &\geq&\displaystyle\sum\limits_{i=1}^{N}\dfrac{{\left|\!\left|\bm{H}_{i,1}(\bm{w})\right|\!\right|_{\infty}}}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}},\end{array}

where the second inequality is again from the proof of Lemma 13 in S25.

By applying Lemma D.7 and expanding the quadratic term, we obtain

U(𝜽,𝒘)(i=1N𝑯i,1(𝒘)2(1+χ(𝜽))2)i=1N𝕍𝒴,i(Yi)×(j=i+1Nxj2ci,jzi,j+h=1NkhNci,hci,kxhxkzi,hzi,k)(i=1N𝑯i,1(𝒘)2(1+χ(𝜽))2)(i=1N𝕍𝒴,i(Yi)j=i+1Nxj2ci,jzi,j)(i=1N𝑯i,1(𝒘))(i=1N𝑯i,2(𝒘))2(1+χ(𝜽))4,\begin{array}[]{llllllllll}U(\bm{\theta},\,\bm{w})&\geq&\left(\displaystyle\sum\limits_{i=1}^{N}\dfrac{{\left|\!\left|\bm{H}_{i,1}(\bm{w})\right|\!\right|_{\infty}}}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}}\right)\,\displaystyle\sum\limits_{i=1}^{N}\,\mathbb{V}_{\mathscr{Y},i}\left(Y_{i}\right)\,\vskip 7.11317pt\\ &\times&\Bigg{(}\displaystyle\sum\limits_{j=i+1}^{N}x_{j}^{2}\,c_{i,j}\,z_{i,j}+\displaystyle\sum\limits_{h=1}^{N}\displaystyle\sum\limits_{k\neq h}^{N}\,c_{i,h}\,c_{i,k}\,x_{h}\,x_{k}\,z_{i,h}\,z_{i,k}\Bigg{)}\vskip 7.11317pt\\ &\geq&\left(\displaystyle\sum\limits_{i=1}^{N}\dfrac{{\left|\!\left|\bm{H}_{i,1}(\bm{w})\right|\!\right|_{\infty}}}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}}\right)\left(\displaystyle\sum\limits_{i=1}^{N}\,\mathbb{V}_{\mathscr{Y},i}\left(Y_{i}\right)\,\displaystyle\sum\limits_{j=i+1}^{N}x_{j}^{2}\;c_{i,j}\,z_{i,j}\right)\vskip 7.11317pt\\ &\geq&\dfrac{\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{w})\right|\!\right|_{\infty}}\right)\,\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{w})\right|\!\right|_{\infty}}\right)}{2\,(1+\chi(\bm{\theta}^{\star}))^{4}},\end{array}

where 𝑯i,2(𝒘)\bm{H}_{i,2}(\bm{w}) is defined in (D.8) and CC corresponds to the constant from Condition 4. The second inequality follows from the assumption xi[0,C]x_{i}\in[0,\,C] by Condition 4 along with ci,j{0,1}c_{i,j}\in\{0,1\} and zi,j{0,1}z_{i,j}\in\{0,1\}. Lemma D.7 shows that

(𝑾)14max{N,p}2,\begin{array}[]{llllllllll}\mathbb{P}(\bm{W}\in\mathscr{H})&\geq&1-\dfrac{4}{\max\{N,\,p\}^{2}},\end{array}

where \mathscr{H} is defined in (D.8). For all 𝒘\bm{w}\in\mathscr{H}, we obtain by definition

U(𝜽,𝒘)(i=1N𝑯i,1(𝒘))(i=1N𝑯i,2(𝒘))2(1+χ(𝜽))4c2N28(1+χ(𝜽))7,\begin{array}[]{llllllllll}U(\bm{\theta},\,\bm{w})&\geq&\dfrac{\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{w})\right|\!\right|_{\infty}}\right)\,\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{w})\right|\!\right|_{\infty}}\right)}{2\,(1+\chi(\bm{\theta}^{\star}))^{4}}&\geq&\dfrac{c^{2}\,N^{2}}{8\,(1+\chi(\bm{\theta}^{\star}))^{7}},\end{array}

which results in the following bound for the denominator of (D.18):

V1,1(𝜽,𝒘)V2,2(𝜽,𝒘)V1,2(𝜽,𝒘)2U(𝜽,𝒘)18max{81,C4}ND11χ(𝜽)2c2N28(1+χ(𝜽))718max{81,C4}ND11χ(𝜽)2>c2N28(1+χ(𝜽))7(118432max{81,C4}D11χ(𝜽)9c2N),\begin{array}[]{llllllllll}V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}&\geq&U(\bm{\theta},\,\bm{w})-18\,\max\{81,\,C^{4}\}\,N\,D^{11}\,\chi(\bm{\theta}^{\star})^{2}\\ &\geq&\dfrac{c^{2}\,N^{2}}{8\,(1+\chi(\bm{\theta}^{\star}))^{7}}-18\,\max\{81,\,C^{4}\}\,N\,D^{11}\,\chi(\bm{\theta}^{\star})^{2}\vskip 7.11317pt\\ &>&\dfrac{c^{2}\,N^{2}}{8\,(1+\chi(\bm{\theta}^{\star}))^{7}}\,\left(1-\dfrac{18432\,\max\{81,\,C^{4}\}\,D^{11}\,\chi(\bm{\theta}^{\star})^{9}}{c^{2}\,N}\right),\end{array}

using the fact that C>0C>0,  D2D\geq 2, and ϵ>0\epsilon^{\star}>0, which implies that

χ(𝜽)exp(CD2(𝜽+ϵ))>1.\begin{array}[]{llllllllll}\chi(\bm{\theta}^{\star})&\coloneqq&\exp(C\,D^{2}\,(|\!|\bm{\theta}^{\star}|\!|_{\infty}+\epsilon^{\star}))&>&1.\end{array}

Under either Condition 4 or Condition D.3 with ϑ[0, 1/18)\vartheta\in[0,\,1/18), we have, for all 𝒘\bm{w}\in\mathscr{H},

18432max{81,C4}D11χ(𝜽)9c2N0asN.\begin{array}[]{llllllllll}\dfrac{18432\,\max\{81,\,C^{4}\}\,D^{11}\,\chi(\bm{\theta}^{\star})^{9}}{c^{2}\,N}&\to&0&\mbox{as}&N\to\infty.\end{array}

Thus, there exists a real number ϵ(0,1)\epsilon\in(0,\,1) along with an integer N3{3,4,}N_{3}\in\{3,4,\ldots\} such that

18432max{81,C4}D11χ(𝜽)9c2Nϵ\begin{array}[]{llllllllll}\dfrac{18432\,\max\{81,\,C^{4}\}\,D^{11}\,\chi(\bm{\theta}^{\star})^{9}}{c^{2}\,N}&\leq&\epsilon\end{array}

for all N>N3N>N_{3}, which implies that

V1,1(𝜽,𝒘)V2,2(𝜽,𝒘)V1,2(𝜽,𝒘)2c2N28(1+χ(𝜽))7(1ϵ).\begin{array}[]{llllllllll}&&V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}&\geq&\dfrac{c^{2}\,N^{2}}{8\,(1+\chi(\bm{\theta}^{\star}))^{7}}\,(1-\epsilon).\end{array} (D.23)

Observe that (D.23) provides a positive lower bound on the determinant of 𝑽(𝜽,𝒘)\bm{V}(\bm{\theta},\,\bm{w}) for 𝒘\bm{w}\in\mathscr{H}, demonstrating that

|V1,1(𝜽,𝒘)V2,2(𝜽,𝒘)V1,2(𝜽,𝒘)2|=V1,1(𝜽,𝒘)V2,2(𝜽,𝒘)V1,2(𝜽,𝒘)2.\begin{array}[]{llllllllll}|V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}|&=&V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}.\end{array} (D.24)

Combining (D.20), (D.23), and (D.24) shows that, for all 𝒘\bm{w}\in\mathscr{H},

|𝑽(𝜽,𝒘)1|max{V2,2(𝜽,𝒘),V1,1(𝜽,𝒘)}+V1,2(𝜽,𝒘)V1,1(𝜽,𝒘)V2,2(𝜽,𝒘)V1,2(𝜽,𝒘)2max{9,C2}D5(N2+32Dχ(𝜽)2)8(1+χ(𝜽))7c2N2(1ϵ)K1D5χ(𝜽)7Nmax{1,Dχ(𝜽)2N},\begin{array}[]{llllllllll}{\left|\!\left|\!\left|\bm{V}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}&\leq&\,\dfrac{\max\left\{V_{2,2}(\bm{\theta},\,\bm{w}),\,V_{1,1}(\bm{\theta},\,\bm{w})\right\}+V_{1,2}(\bm{\theta},\,\bm{w})}{V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}}\vskip 7.11317pt\vskip 7.11317pt\\ &\leq&\,\max\{9,\,C^{2}\}\,D^{5}\left(\dfrac{N}{2}+32\,D\,\chi(\bm{\theta}^{\star})^{2}\right)\,\dfrac{8\,(1+\chi(\bm{\theta}^{\star}))^{7}}{c^{2}\,N^{2}\,(1-\epsilon)}\vskip 7.11317pt\vskip 7.11317pt\\ &\leq&\,K_{1}\,\dfrac{D^{5}\,\chi(\bm{\theta}^{\star})^{7}}{N}\;\max\left\{1,\;\dfrac{D\,\chi(\bm{\theta}^{\star})^{2}}{N}\right\},\end{array} (D.25)

where K1>0K_{1}>0 is a constant.

Conclusion. We show in two steps that 𝜽2(𝜽,𝒘)-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta},\,\bm{w}) is invertible for all 𝜽(𝜽,ϵ)\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star}) and all 𝒘\bm{w}\in\mathscr{H}. First, by Lemma 9 in S25, the matrix 𝑨(𝜽,𝒘)\bm{A}(\bm{\theta},\,\bm{w}) is invertible for all 𝜽(𝜽,ϵ)\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star}) and all 𝒘\bm{w}\in\mathscr{H}. Second, (D.24) demonstrates that the determinant of 𝑽(𝜽,𝒘)\bm{V}(\bm{\theta},\,\bm{w}) is bounded away from 0 for all 𝜽(𝜽,ϵ)\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star}) and all 𝒘\bm{w}\in\mathscr{H}. Thus, 𝑽(𝜽,𝒘)\bm{V}(\bm{\theta},\,\bm{w}) is nonsingular for all 𝜽(𝜽,ϵ)\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star}) and all 𝒘\bm{w}\in\mathscr{H}, and so is 𝜽2(𝜽,𝒘)-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta},\,\bm{w}) by Theorem 8.5.11 of \citetsupp[][p. 99]harville_matrix_1997. Combining (D.13), (D.14), (D.17), and (D.25) shows that, for all 𝜽(𝜽,ϵ)\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star}) and all 𝒘\bm{w}\in\mathscr{H},

|(𝜽2(𝜽,𝒘))1||𝑨(𝜽,𝒘)1|+max{1,|𝑨(𝜽,𝒘)1||𝑪(𝜽,𝒘)|}|𝑽(𝜽,𝒘)1|×(N|𝑪(𝜽,𝒘)||𝑨(𝜽,𝒘)1|+1)18χ(𝜽)2N+max{1,max{3,C}D318χ(𝜽)2N}K1D5χ(𝜽)7N×max{1,D2χ(𝜽)2N}(max{3,C}D3 18χ(𝜽)2+1).\begin{array}[]{llllllllll}{\left|\!\left|\!\left|(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta},\,\bm{w}))^{-1}\right|\!\right|\!\right|_{\infty}}&\leq&{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}\vskip 7.11317pt\\ &+&\max\{1,\;{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}{\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})\right|\!\right|\!\right|_{\infty}}\}\;{\left|\!\left|\!\left|\bm{V}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}\vskip 7.11317pt\\ &\times&\left(N\,{\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})\right|\!\right|\!\right|_{\infty}}{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}+1\right)\vskip 7.11317pt\vskip 7.11317pt\\ &\leq&\dfrac{18\,\chi(\bm{\theta}^{\star})^{2}}{N}\vskip 7.11317pt\vskip 7.11317pt\\ &+&\max\left\{1,\;\max\{3,\,C\}\,D^{3}\,\dfrac{18\,\chi(\bm{\theta}^{\star})^{2}}{N}\right\}\,K_{1}\;\dfrac{D^{5}\,\chi(\bm{\theta}^{\star})^{7}}{N}\,\vskip 7.11317pt\\ &\times&\max\left\{1,\;\dfrac{D^{2}\,\chi(\bm{\theta}^{\star})^{2}}{N}\right\}\;\left(\max\{3,\,C\}\,D^{3}\,18\,\chi(\bm{\theta}^{\star})^{2}+1\right).\end{array}

Conditions 4 and D.3 with ϑ[0, 1/18)\vartheta\in[0,\,1/18) imply that

χ(𝜽)2N<χ(𝜽)9N 0 as N.\begin{array}[]{llllllllll}\dfrac{\chi(\bm{\theta}^{\star})^{2}}{N}~<~\dfrac{\chi(\bm{\theta}^{\star})^{9}}{N}\;\rightarrow\;0\mbox{ as }N\to\infty.\end{array}

Thus, there exists an integer N0{3,4,}N_{0}\in\{3,4,\dots\} such that the two maxima in the upper bound on |(𝜽2(𝜽,𝒘))1|{\left|\!\left|\!\left|(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta},\,\bm{w}))^{-1}\right|\!\right|\!\right|_{\infty}} are equal to 11 for all N>N0N>N_{0}, so that

|(𝜽2(𝜽,𝒘))1|18χ(𝜽)2N+K1D5χ(𝜽)7N(max{3,C}D3 18χ(𝜽)2+1)18χ(𝜽)2N+K2D8χ(𝜽)9NC1χ(𝜽)9N,\begin{array}[]{llllllllll}&&{\left|\!\left|\!\left|(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta},\,\bm{w}))^{-1}\right|\!\right|\!\right|_{\infty}}\vskip 7.11317pt\\ &\leq&\,\dfrac{18\,\chi(\bm{\theta}^{\star})^{2}}{N}+K_{1}\,\dfrac{D^{5}\,\chi(\bm{\theta}^{\star})^{7}}{N}\left(\max\{3,\,C\}\,D^{3}\,18\,\chi(\bm{\theta}^{\star})^{2}+1\right)\vskip 7.11317pt\vskip 7.11317pt\\ &\leq&\,\dfrac{18\,\chi(\bm{\theta}^{\star})^{2}}{N}+K_{2}\,\dfrac{D^{8}\,\chi(\bm{\theta}^{\star})^{9}}{N}\vskip 7.11317pt\vskip 7.11317pt\\ &\leq&C_{1}\,\dfrac{\chi(\bm{\theta}^{\star})^{9}}{N},\end{array} (D.26)

where K2>0K_{2}>0 and C1>0C_{1}>0 are constants. Substituting (D.26) into the definition of ΛN(𝜽)\Lambda_{N}(\bm{\theta}^{\star}) concludes the proof of Lemma D.4:

ΛN(𝜽)sup𝒘sup𝜽(𝜽,ϵ)|(𝜽2(𝜽;𝒘))1|C1χ(𝜽)9N.\begin{array}[]{llllllllll}\Lambda_{N}(\bm{\theta}^{\star})&\coloneqq&\sup\limits_{\bm{w}\,\in\,\mathscr{H}}\;\,\sup\limits_{\bm{\theta}\,\in\,\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})}\,{\left|\!\left|\!\left|(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{w}))^{-1}\right|\!\right|\!\right|_{\infty}}&\leq&C_{1}\,\dfrac{\chi(\bm{\theta}^{\star})^{9}}{N}.\end{array}

D.5 Bounding ΨN\Psi_{N}

Lemma 4. Consider the model of Corollary 4. Then N/2ΨNC2N\sqrt{N/2}~\leq~\Psi_{N}~\leq~C_{2}\,\sqrt{N}, where C2>0C_{2}>0 is a constant.

Proof of Lemma D.5. The term ΨN\Psi_{N} is defined as

ΨNmax1aN+2𝚵a2,\begin{array}[]{llllllllll}\Psi_{N}&\coloneqq&\underset{1\leq a\leq N+2}{\max}\;\left|\!\left|\bm{\Xi}_{a}\right|\!\right|_{2},\end{array}

where

𝚵a(Ξ{1},a,,Ξ{N},a,Ξ{1,2},a,,Ξ{N,N1},a)=(𝚵𝒴,a,𝚵𝒵,a),a{1,,N+2}.\begin{array}[]{llllllllll}\bm{\Xi}_{a}&\coloneqq&(\Xi_{\{1\},a},\ldots,\Xi_{\{N\},a},\Xi_{\{1,2\},a},\ldots,\Xi_{\{N,N-1\},a})=(\bm{\Xi}_{\mathscr{Y},a},\bm{\Xi}_{\mathscr{Z},a}),~\text{$a\in\{1,\ldots,N+2\}$}.\end{array}

The sensitivity of the sufficient statistic ba(𝒙,𝒚,𝒛)b_{a}(\bm{x},\,\bm{y},\,\bm{z}) with respect to changes in the responses is quantified by the vector 𝚵𝒴,aN\bm{\Xi}_{\mathscr{Y},a}\in\mathbb{R}^{N}:

𝚵𝒴,a(Ξ{1},a,,Ξ{N},a),\begin{array}[]{llllllllll}\bm{\Xi}_{\mathscr{Y},a}&\coloneqq(\Xi_{\{1\},a},\ldots,\Xi_{\{N\},a}),\end{array}

where

Ξ{i},amax(𝒘,𝒘)𝒲×𝒲:yk=yk for all ki,𝒛=𝒛|ba(𝒙,𝒚,𝒛)ba(𝒙,𝒚,𝒛)|.\begin{array}[]{llllllllll}\Xi_{\{i\},a}&\coloneqq&\max\limits_{(\bm{w},\bm{w}^{\prime})\,\in\,\mathscr{W}\,\times\,\mathscr{W}:\;y_{k}=y_{k}^{\prime}\text{ for all }k\neq i,\,\bm{z}=\bm{z}^{\prime}}|b_{a}(\bm{x},\,\bm{y},\,\bm{z})-b_{a}(\bm{x},\,\bm{y}^{\prime},\bm{z}^{\prime})|.\end{array}

The sensitivity of the sufficient statistic ba(𝒙,𝒚,𝒛)b_{a}(\bm{x},\,\bm{y},\,\bm{z}) with respect to changes in the connections is quantified by the vector 𝚵𝒵,a\bm{\Xi}_{\mathscr{Z},a}\in\mathbb{R}^{N(N-1)/2}:

𝚵𝒵,a(Ξ{1,2},a,,Ξ{N,N1},a),\begin{array}[]{llllllllll}\bm{\Xi}_{\mathscr{Z},a}&\coloneqq&(\Xi_{\{1,2\},a},\ldots,\Xi_{\{N,N-1\},a}),\end{array}

where

Ξ{i,j},amax(𝒘,𝒘)𝒲×𝒲:𝒚=𝒚,zk,l=zk,l for all {k,l}{i,j}|ba(𝒙,𝒚,𝒛)ba(𝒙,𝒚,𝒛)|.\begin{array}[]{llllllllll}\Xi_{\{i,j\},a}&\coloneqq&\max\limits_{(\bm{w},\bm{w}^{\prime})\,\in\,\mathscr{W}\,\times\,\mathscr{W}:\;\bm{y}=\bm{y}^{\prime},\,z_{k,l}=z_{k,l}^{\prime}\text{ for all }\,\{k,\,l\}\,\neq\,\{i,\,j\}}|b_{a}(\bm{x},\,\bm{y},\,\bm{z})-b_{a}(\bm{x},\,\bm{y}^{\prime},\bm{z}^{\prime})|.\end{array}

Equivalently,

ΨN=max1aN+2i=1N|Ξ{i},a|2+i=1Nj=i+1N|Ξ{i,j},a|2.\begin{array}[]{llllllllll}\Psi_{N}&=&\max\limits_{1\leq a\leq N+2}\,\sqrt{\displaystyle\sum\limits_{i=1}^{N}|\Xi_{\{i\},a}|^{2}+\displaystyle\sum\limits_{i=1}^{N}\displaystyle\sum\limits_{j=i+1}^{N}|\Xi_{\{i,j\},a}|^{2}}.\end{array} (D.27)
  • For a=1,,Na=1,\ldots,N, the statistic ba(𝒙,𝒚,𝒛)b_{a}(\bm{x},\,\bm{y},\,\bm{z}) refers to the degree effects of unit aa:

    ba(𝒙,𝒚,𝒛)=j=1;jaNza,j.\begin{array}[]{llllllllll}b_{a}(\bm{x},\,\bm{y},\,\bm{z})&=&\displaystyle\sum\limits_{j=1;\,j\neq a}^{N}z_{a,j}.\end{array}

    The term Ξ{i,j},a\Xi_{\{i,j\},a} is 11 if a{i,j}a\in\{i,\,j\} and is 0 otherwise. Since the statistic is unaffected by the responses, Ξ{i},a=0\Xi_{\{i\},a}=0 for all i=1,,Ni=1,\ldots,N. In the sum in (D.27) over all i<ji<j, the indicator 𝕀(a{i,j})=1\mathbb{I}(a\in\{i,j\})=1 holds exactly N1N-1 times, yielding 𝚵a2=N1N\left|\!\left|\bm{\Xi}_{a}\right|\!\right|_{2}\,=\,\sqrt{N-1}\,\leq\,\sqrt{N} for all a=1,,Na=1,\ldots,N.

  • The statistic bN+1(𝒙,𝒚,𝒛)b_{N+1}(\bm{x},\,\bm{y},\,\bm{z}) refers to the transitive connections effect given by

    bN+1(𝒙,𝒚,𝒛)=i=1Nj=i+1Ndi,j(𝒛)zi,j,\begin{array}[]{llllllllll}b_{N+1}(\bm{x},\,\bm{y},\,\bm{z})&=&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}d_{i,j}(\bm{z})\,z_{i,j},\end{array}

    where the function di,j(𝒁)d_{i,j}(\bm{Z}) is defined in (D.2). Since this statistic is not affected by 𝒚\bm{y}, Ξ{i},N+1=0\Xi_{\{i\},N+1}=0 for all i=1,,Ni=1,\ldots,N. Following Lemma 18 in S25,

    𝚵N+12ND2(1+D)24ND4=2D2N,\begin{array}[]{llllllllll}\left|\!\left|\bm{\Xi}_{N+1}\right|\!\right|_{2}&\leq&\sqrt{N\,D^{2}\,(1+D)^{2}}&\leq&\sqrt{4\,N\,D^{4}}&=&2\,D^{2}\,\sqrt{N},\end{array}

    where DD corresponds to the constant defined in (D.4).

  • The statistic bN+2(𝒙,𝒚,𝒛)b_{N+2}(\bm{x},\,\bm{y},\,\bm{z}) refers to the spillover effect given by

    bN+2(𝒙,𝒚,𝒛)=i=1Nj=i+1Nci,j(xiyj+yixj)zi,j,\begin{array}[]{llllllllll}b_{N+2}(\bm{x},\,\bm{y},\,\bm{z})&=&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}c_{i,j}\,(x_{i}\,y_{j}+y_{i}\,x_{j})\,z_{i,j},\end{array}

    where the function ci,jc_{i,j} is defined in (D.2). For {i,j}𝒫N\{i,j\}\subset\mathscr{P}_{N}, the terms Ξ{i,j},N+2\Xi_{\{i,j\},N+2} are (yixj+yjxi)2C(y_{i}\,x_{j}+y_{j}\,x_{i})\leq 2\,C if 𝒩i𝒩j\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset and 0 otherwise. For all i𝒫Ni\in\mathscr{P}_{N},

    Ξ{i},N+2j:𝒩i𝒩jxjzi,jj:𝒩i𝒩jCCD2,\begin{array}[]{llllllllll}\Xi_{\{i\},N+2}&\leq&\displaystyle\sum\limits_{j:\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset}x_{j}\,z_{i,j}&\leq&\displaystyle\sum\limits_{j:\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset}C&\leq&C\,D^{2},\end{array}

    because, according to Lemma 15 in S25, there are at most D2D^{2} units whose neighborhood overlaps with 𝒩i\mathscr{N}_{i}, and xiCx_{i}\leq C for i=1,,Ni=1,\ldots,N according to Condition 4. Combining the bounds on Ξ{i,j},N+2\Xi_{\{i,j\},N+2} and Ξ{i},N+2\Xi_{\{i\},N+2} gives

    𝚵N+222NCD2+NCD2D3NC2DNC,\begin{array}[]{llllllllll}\left|\!\left|\bm{\Xi}_{N+2}\right|\!\right|_{2}&\leq&\sqrt{2\,N\,C\,D^{2}+N\,C\,D^{2}}&\leq&D\,\sqrt{3\,N\,C}&\leq&2\,D\,\sqrt{N\,C},\end{array}

    where CC corresponds to the constant CC in Condition 4.

Combining the results for 𝚵a2\left|\!\left|\bm{\Xi}_{a}\right|\!\right|_{2} for a=1,,N+2a=1,\ldots,N+2 yields the upper bound on ΨN\Psi_{N}, while the lower bound follows from the degree statistics, for which 𝚵a2=N1N/2\left|\!\left|\bm{\Xi}_{a}\right|\!\right|_{2}=\sqrt{N-1}\,\geq\,\sqrt{N/2} holds for all N2N\geq 2. Together,

N/2ΨN2DNCN/2ΨNC2N,\begin{array}[]{llllllllll}\sqrt{N/2}&\leq&\Psi_{N}&\leq&2\,D\,\sqrt{N\,C}\\ \sqrt{N/2}&\leq&\Psi_{N}&\leq&C_{2}\,\sqrt{N},\end{array}

where C22DC>0C_{2}\coloneqq 2\,D\,\sqrt{C}>0 is a constant.
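To make the sensitivity coefficients above concrete, the following minimal Python sketch (not part of the proof; the toy size NN and the use of NumPy are illustrative assumptions) brute-forces Ξ{i,j},a\Xi_{\{i,j\},a} for the degree statistic of a single unit and confirms that 𝚵a2=N1N\left|\!\left|\bm{\Xi}_{a}\right|\!\right|_{2}=\sqrt{N-1}\leq\sqrt{N}:

import itertools
import numpy as np

N = 5   # illustrative toy population size
a = 0   # unit whose degree statistic b_a is examined (0-indexed)
pairs = list(itertools.combinations(range(N), 2))

def degree_stat(z, unit):
    # b_unit(x, y, z) = sum_{j != unit} z_{unit, j}: the degree of `unit`.
    return sum(value for (i, j), value in z.items() if unit in (i, j))

xi = {}
for (i, j) in pairs:
    # Xi_{{i,j},a}: maximal change of b_a over configurations differing only in z_{i,j}.
    max_change = 0
    for z_values in itertools.product([0, 1], repeat=len(pairs)):
        z = dict(zip(pairs, z_values))
        z_flipped = dict(z)
        z_flipped[(i, j)] = 1 - z_flipped[(i, j)]
        max_change = max(max_change, abs(degree_stat(z, a) - degree_stat(z_flipped, a)))
    xi[(i, j)] = max_change

assert all(xi[(i, j)] == int(a in (i, j)) for (i, j) in pairs)  # 1 if a in {i, j}, else 0
print(np.sqrt(sum(v ** 2 for v in xi.values())), np.sqrt(N - 1))  # both equal sqrt(N - 1)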

D.6 Bounding |𝒟N(𝜽)|2{|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}}

Lemma 5. Consider the model of Corollary 4. If Conditions 4, 4, and D.1 are satisfied with ϑ[0,1/18)\vartheta\in[0,1/18), there exists a constant C31C_{3}\geq 1 such that |𝒟(𝛉)|2C3{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}\leq C_{3} for all N2N\geq 2. If the population 𝒫N\mathscr{P}_{N} consists of non-overlapping subpopulations with dependence restricted to subpopulations, the same result holds when Condition 4 is replaced by Condition D.3.

Proof of Lemma D.6. To bound |𝒟(𝜽)|2{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}} from above, we use Hölder's inequality

|𝒟(𝜽)|2|𝒟(𝜽)|1|𝒟(𝜽)|,\begin{array}[]{llllllllll}{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&\sqrt{{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{1}}\,{\left|\!\left|\!\left|\mathscr{D}(\bm{\theta}^{\star})\right|\!\right|\!\right|_{\infty}}},\end{array} (D.28)

where

|𝒟(𝜽)|1max1jMi=1M|𝒟i,j(𝜽)||𝒟(𝜽)|max1iMj=1M|𝒟i,j(𝜽)|.\begin{array}[]{llllllllll}{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{1}}&\coloneqq&\max\limits_{1\leq j\leq M}\displaystyle\sum\limits_{i=1}^{M}|\mathscr{D}_{i,j}(\bm{\theta}^{\star})|\vskip 7.11317pt\\ {\left|\!\left|\!\left|\mathscr{D}(\bm{\theta}^{\star})\right|\!\right|\!\right|_{\infty}}&\coloneqq&\max\limits_{1\leq i\leq M}\displaystyle\sum\limits_{j=1}^{M}|\mathscr{D}_{i,j}(\bm{\theta}^{\star})|.\end{array}
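As a numerical sanity check of the norm inequality (D.28) (an illustration only, assuming NumPy is available; the random lower triangular matrix below merely mimics the structure of the coupling matrix, it is not 𝒟(𝜽)\mathscr{D}(\bm{\theta}^{\star})):

import numpy as np

rng = np.random.default_rng(0)
M = 50
# Lower triangular matrix with unit diagonal and off-diagonal entries in [0, 1),
# mimicking the structure of the coupling matrix.
D = np.tril(rng.uniform(0.0, 1.0, size=(M, M)), k=-1) + np.eye(M)

spectral = np.linalg.norm(D, 2)        # |D|_2
col_sum  = np.linalg.norm(D, 1)        # |D|_1: maximum absolute column sum
row_sum  = np.linalg.norm(D, np.inf)   # |D|_inf: maximum absolute row sum
assert spectral <= np.sqrt(col_sum * row_sum) + 1e-10
print(spectral, np.sqrt(col_sum * row_sum))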

We can therefore bound |𝒟(𝜽)|2{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}} by bounding the elements of the lower triangular coupling matrix 𝒟(𝜽)M×M\mathscr{D}(\bm{\theta}^{\star})\in\mathbb{R}^{M\times M}, which are

𝒟i,j(𝜽){0if i<j1if i=jmax𝒘1:(i1)𝒲1:i1𝜽,i,𝒘1:(i1)(WjWj)if i>j.\begin{array}[]{llllllllll}\mathscr{D}_{i,j}(\bm{\theta}^{\star})&\coloneqq&\begin{cases}0&\mbox{if }i<j\\ 1&\mbox{if }i=j\\ \max\limits_{\bm{w}_{1:(i-1)}\,\in\,\mathscr{W}_{1:i-1}}\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}(W_{j}^{\star}\neq W_{j}^{\star\star})&\mbox{if }i>j.\\ \end{cases}\end{array}

Next, we define a symmetrized version of the coupling matrix denoted by 𝒯(𝜽)M×M\mathscr{T}(\bm{\theta}^{\star})\in\mathbb{R}^{M\times M} with elements

𝒯i,j(𝜽){𝒟j,i(𝜽)if i<j𝒟i,i(𝜽)if i=j𝒟i,j(𝜽)if i>j.\begin{array}[]{llllllllll}\mathscr{T}_{i,j}(\bm{\theta}^{\star})&\coloneqq&\begin{cases}\mathscr{D}_{j,i}(\bm{\theta}^{\star})&\mbox{if }i<j\\ \mathscr{D}_{i,i}(\bm{\theta}^{\star})&\mbox{if }i=j\\ \mathscr{D}_{i,j}(\bm{\theta}^{\star})&\mbox{if }i>j.\\ \end{cases}\end{array}

The symmetry of 𝒯(𝜽)\mathscr{T}(\bm{\theta}^{\star}) yields the following upper bound for (D.28):

|𝒟(𝜽)|2|𝒯(𝜽)|1|𝒯(𝜽)|=|𝒯(𝜽)|=1+max1iMj=1:jiM𝜽,i,𝒘1:(i1)(WjWj)\begin{array}[]{llllllllll}{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&\sqrt{{|\!|\!|\mathscr{T}(\bm{\theta}^{\star})|\!|\!|_{1}}\,{\left|\!\left|\!\left|\mathscr{T}(\bm{\theta}^{\star})\right|\!\right|\!\right|_{\infty}}}~=~{\left|\!\left|\!\left|\mathscr{T}(\bm{\theta}^{\star})\right|\!\right|\!\right|_{\infty}}\vskip 7.11317pt\\ &=&1+\underset{1\leq i\leq M}{\max}\displaystyle\sum\limits_{j=1:\,j\neq i}^{M}\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}(W_{j}^{\star}\neq W_{j}^{\star\star})\vskip 7.11317pt\\ \end{array} (D.29)

where the constant 1 in the second line stems from the diagonal elements of 𝒯(𝜽)\mathscr{T}(\bm{\theta}^{\star}).

Consider any (i,j){1,,M}×{1,,M}(i,j)\in\{1,\ldots,M\}\times\{1,\ldots,M\} such that iji\neq j and define the event Wi\centernotWjW_{i}\centernot{\longleftrightarrow}W_{j} as the event that there exists a path of disagreement between vertices WiW_{i} and WjW_{j} in 𝒢\mathscr{G}. A path of disagreement between vertices WiW_{i} and WjW_{j} in 𝒢\mathscr{G} is a path from WiW_{i} to WjW_{j} in 𝒢\mathscr{G} such that the coupling (W(i+1):M,W(i+1):M)(W_{(i+1):M}^{\star},\,W_{(i+1):M}^{\star\star}) with joint probability mass function 𝜽,i,𝒘1:(i1)\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}} disagrees at each vertex on the path, in the sense that WWW^{\star}\neq W^{\star\star} holds for all vertices WW on the path. Theorem 1 of \citetsupp[p. 753]BeMa94 shows that

𝜽,i,𝒘1:(i1)(WjWj)𝔹𝝅(Wi\centernotWj in 𝒢),\begin{array}[]{llllllllll}\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}(W_{j}^{\star}\neq W_{j}^{\star\star})&\leq&\mathbb{B}_{\bm{\pi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}),\end{array} (D.30)

where 𝔹𝝅\mathbb{B}_{\bm{\pi}} is a Bernoulli product measure based on MM independent Bernoulli experiments with success probabilities 𝝅(π1,,πM)[0,1]M\bm{\pi}\coloneqq(\pi_{1},\dots,\pi_{M})\in[0,1]^{M}. With v{1,,M}v\in\{1,\ldots,M\}, the success probabilities πv\pi_{v} are

πv{0if v{1,,i1}1if v=imax(𝒘v,𝒘v)𝒲v×𝒲vπv,𝒘v,𝒘vif v{i+1,,M},\begin{array}[]{llllllllll}\pi_{v}&\coloneqq&\begin{cases}0&\text{if $v\in\{1,\ldots,i-1\}$}\\ 1&\text{if $v=i$}\\ \underset{(\bm{w}_{-v},\bm{w}_{-v}^{\prime})\in\mathscr{W}_{-v}\times\mathscr{W}_{-v}}{\max}\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&\text{if $v\in\{i+1,\ldots,M\},$}\end{cases}\end{array}

where

πv,𝒘v,𝒘v||𝜽(𝒘v)𝜽(𝒘v)||TV.\begin{array}[]{llllllllll}\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&\coloneqq&\left|\!\left|\mathbb{P}_{\bm{\theta}}(\,\cdot\mid\bm{w}_{-v})-\mathbb{P}_{\bm{\theta}}(\,\cdot\mid\bm{w}_{-v}^{\prime})\right|\!\right|_{\text{TV}}.\end{array} (D.31)

Lemma D.7 provides the following upper bound:

πv,𝒘v,𝒘v11+exp(CD2𝜽),\begin{array}[]{llllllllll}\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&\leq&\dfrac{1}{1+\exp(-C\,D^{2}\,\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty})},\end{array} (D.32)

where CC corresponds to the positive constant from Condition 4 and DD is defined in (D.4). Combining (D.32) with Condition D.3 shows that

11+exp(CD2𝜽)11+exp(EϑlogN)UN.\begin{array}[]{llllllllll}\dfrac{1}{1+\exp(-C\,D^{2}\,\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty})}&\leq&\dfrac{1}{1+\exp(-E-\vartheta\log N)}&\eqqcolon&U_{N}.\end{array} (D.33)

The constant UN=UU_{N}=U coincides with the constant UU considered in Condition D.1.

With (D.33), we define the vector 𝝃[0,1]M\bm{\xi}\in[0,1]^{M} with elements

ξv{0if v{1,,i1}1if v=iUNif v{i+1,,M},\begin{array}[]{llllllllll}\xi_{v}&\coloneqq&\begin{cases}0&\text{if $v\in\{1,\ldots,i-1\}$}\\ 1&\text{if $v=i$}\\ U_{N}&\text{if $v\in\{i+1,\ldots,M\}$},\end{cases}\end{array}

and obtain

𝔹𝝅(Wi\centernotWj in 𝒢)𝔹𝝃(Wi\centernotWj in 𝒢),\begin{array}[]{llllllllll}\mathbb{B}_{\bm{\pi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G})&\leq&\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}),\end{array} (D.34)

because πvξv\pi_{v}\leq\xi_{v} for all v=1,,Mv=1,\ldots,M.

Next, we construct the set

a,b{{c,d}:c𝒩a𝒩b,d𝒩a𝒩b{c}}{{c}:c𝒩a𝒩b}\begin{array}[]{llllllllll}\mbox{$\mathscr{M}$}_{a,b}&\coloneqq&\{\{c,d\}:\;c\,\in\,\mathscr{N}_{a}\,\cup\,\mathscr{N}_{b},\;d\,\in\,\mathscr{N}_{a}\,\cup\,\mathscr{N}_{b}\setminus\{c\}\}\;\cup\;\{\{c\}:\;c\,\in\,\mathscr{N}_{a}\,\cup\,\mathscr{N}_{b}\}\end{array}

and two additional graphs with the same set of vertices as 𝒢\mathscr{G}:

  1.

    𝒢1(𝒱,1)\mathscr{G}_{1}\coloneqq(\mathscr{V},\mathscr{E}_{1}):

    • Vertex W𝒱𝒵W\in\mathscr{V}_{\mathscr{Z}} relating to connection Zi,jZ_{i,j} has edges to vertices that relate to all connections Zh,kZ_{h,k} and responses YhY_{h} with {h,k},{h}i,j\{h,k\},\{h\}\in\mbox{$\mathscr{M}$}_{i,j}.

    • Vertex W𝒱𝒴W\in\mathscr{V}_{\mathscr{Y}} relating to attribute YiY_{i} has edges to vertices that relate to all connections Zh,kZ_{h,k} and responses YhY_{h} with {h,k},{h}i,N+1\{h,k\},\{h\}\in\mbox{$\mathscr{M}$}_{i,N+1} for a fictional unit N+1N+1 with 𝒩N+1=\mathscr{N}_{N+1}=\emptyset.

  2.

    𝒢2(𝒱,12)\mathscr{G}_{2}\coloneqq(\mathscr{V},\,\mathscr{E}_{1}\cup\mathscr{E}_{2}): The set 2\mathscr{E}_{2} includes edges of all vertices Wi𝒱W_{i}\in\mathscr{V} with i{1,,M}i\in\{1,\ldots,M\} to vertices in 𝒮𝒢1,i,2\mathscr{S}_{\mathscr{G}_{1},i,2}.

The graph 𝒢2\mathscr{G}_{2} is a covering of 𝒢\mathscr{G}, so

𝔹𝝃(Wi\centernotWj in 𝒢)𝔹𝝃(Wi\centernotWj in 𝒢2).\begin{array}[]{llllllllll}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G})&\leq&\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2}).\end{array} (D.35)

Combining the previous results gives

|𝒟(𝜽)|21+max1iMj=1:jiM𝜽,i,𝒘1:(i1)(WjWj)1+max1iMj=1:jiM𝔹𝝅(Wi\centernotWj in 𝒢)1+max1iMj=1:jiM𝔹𝝃(Wi\centernotWj in 𝒢)1+max1iMj=1:jiM𝔹𝝃(Wi\centernotWj in 𝒢2),\begin{array}[]{llllllllll}{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&1+\underset{1\leq i\leq M}{\max}\displaystyle\sum\limits_{j=1:\,j\neq i}^{M}\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}(W_{j}^{\star}\neq W_{j}^{\star\star})\\ &\leq&1+\max\limits_{1\leq i\leq M}\displaystyle\sum\limits_{j=1:\,j\neq i}^{M}\mathbb{B}_{\bm{\pi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G})\\ &\leq&1+\max\limits_{1\leq i\leq M}\displaystyle\sum\limits_{j=1:\,j\neq i}^{M}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G})\\ &\leq&1+\max\limits_{1\leq i\leq M}\displaystyle\sum\limits_{j=1:\,j\neq i}^{M}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2}),\end{array} (D.36)

using (D.29), (D.30), (D.34), and (D.35). Sorting the vertices other than WiW_{i} by their geodesic distance to WiW_{i} (i.e., by the length of the shortest path to WiW_{i}), we obtain

j=1:jiM𝔹𝝃(Wi\centernotWj in 𝒢2)|𝒮𝒢2,i,1|(maxWj𝒮𝒢2,i,1𝔹𝝃(Wi\centernotWj in 𝒢2))+k=2|𝒮𝒢2,i,k|(maxWj𝒮𝒢2,i,k𝔹𝝃(Wi\centernotWj in 𝒢2))|𝒮𝒢2,i,1|+k=2|𝒮𝒢2,i,k|maxWj𝒮𝒢2,i,k𝔹𝝃(Wi\centernotWj in 𝒢2).\begin{array}[]{llllllllll}\displaystyle\sum\limits_{j=1:\,j\neq i}^{M}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2})&\leq&|\mathscr{S}_{\mathscr{G}_{2},i,1}|\left(\max\limits_{W_{j}\in\mathscr{S}_{\mathscr{G}_{2},i,1}}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2})\right)\\ &+&\displaystyle\sum\limits_{k=2}^{\infty}|\mathscr{S}_{\mathscr{G}_{2},i,k}|\,\left(\max\limits_{W_{j}\in\mathscr{S}_{\mathscr{G}_{2},i,k}}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2})\right)\\ &\leq&|\mathscr{S}_{\mathscr{G}_{2},i,1}|\\ &+&\displaystyle\sum\limits_{k=2}^{\infty}|\mathscr{S}_{\mathscr{G}_{2},i,k}|\,\max\limits_{W_{j}\in\mathscr{S}_{\mathscr{G}_{2},i,k}}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2}).\end{array} (D.37)

For the event Wi\centernotWjW_{i}\centernot{\longleftrightarrow}W_{j} in 𝒢2\mathscr{G}_{2} with Wj𝒮𝒢2,i,kW_{j}\in\mathscr{S}_{\mathscr{G}_{2},i,k} and k 2k\,\geq\,2 to occur, there must exist at least one vertex in each set 𝒮𝒢2,i,1,,𝒮𝒢2,i,k1\mathscr{S}_{\mathscr{G}_{2},i,1},\ldots,\mathscr{S}_{\mathscr{G}_{2},i,k-1} at which the coupling disagrees. Therefore, we next derive bounds on |𝒮𝒢2,i,k||\mathscr{S}_{\mathscr{G}_{2},i,k}| to obtain an upper bound on 𝔹𝝃(Wi\centernotWj in 𝒢2)\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2}). Following Lemma D.7, Condition D.1 implies that for i{1,,M}i\in\{1,\ldots,M\} and k{2,3,}k\in\{2,3,\ldots\}

|𝒮𝒢2,i,k|K1+K2logk\begin{array}[]{llllllllll}|\mathscr{S}_{\mathscr{G}_{2},i,k}|&\leq&K_{1}+K_{2}\,\log k\end{array} (D.38)

and |𝒮𝒢2,i,1|K3|\mathscr{S}_{\mathscr{G}_{2},i,1}|\,\leq \,K_{3}, with constants K1 0,K2 0K_{1}\,\geq\,0,\,K_{2}\,\geq\,0, and K3> 0K_{3}\,>\,0 being functions of the constants ω1 0\omega_{1}\,\geq\,0 and ω2 0\omega_{2}\,\geq\,0 defined in Condition D.1 and the constant D{2,3,}D\in\{2,3,\ldots\} defined in (D.4). The probability of event Wi\centernotWjW_{i}\centernot{\longleftrightarrow}W_{j} in 𝒢2\mathscr{G}_{2} can then be bounded as follows:

𝔹𝝃(Wi\centernotWj in 𝒢2)UN(1(1UN)K3)l=2k1[1(1UN)K1+K2logl]l=2k1[1(1UN)K1+K2logl][1(1UN)K1+K2log(k1)]k2,\begin{array}[]{llllllllll}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2})&\leq&U_{N}\,(1-(1-U_{N})^{K_{3}})\,\displaystyle\prod\limits_{l=2}^{k-1}\left[1-(1-U_{N})^{K_{1}+K_{2}\,\log l}\right]\\ &\leq&\displaystyle\prod\limits_{l=2}^{k-1}\left[1-(1-U_{N})^{K_{1}+K_{2}\,\log l}\right]\\ &\leq&\left[1-(1-U_{N})^{K_{1}+K_{2}\,\log(k-1)}\right]^{k-2},\end{array}

The second inequality follows from

UN(1(1UN)K3)1,\begin{array}[]{llllllllll}U_{N}\,(1-(1-U_{N})^{K_{3}})&\leq&1,\end{array}

because UN[0,1]U_{N}\in[0,1] and K3>0K_{3}>0. Defining KNexp(K1|log(1UN)|)K_{N}\coloneqq\exp(-K_{1}\,|\log(1-U_{N})|), we obtain for Wj𝒮𝒢2,i,kW_{j}\in\mathscr{S}_{\mathscr{G}_{2},i,k}

𝔹𝝃(Wi\centernotWj in 𝒢2)[1(1UN)K1+K2log(k1)]k2exp(KN(k1)1K2|log(1UN)|)\begin{array}[]{llllllllll}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2})&\leq&\left[1-(1-U_{N})^{K_{1}+K_{2}\,\log(k-1)}\right]^{k-2}\\ &\leq&\exp(-K_{N}\,(k-1)^{1-K_{2}\,|\log(1-U_{N})|})\end{array} (D.39)

where we used the inequality 1aexp(a)1-a\,\leq\,\exp(-a) for all a(0,1)a\in(0,1).

Plugging (D.38) and (D.39) into (D.37), we obtain:

j=1:jiM𝔹𝝃(Wi\centernotWj in 𝒢2)K3+k=2(K1+K2logk)×exp(KN(k1)1K2|log(1UN)|)=K3+K1k=2exp(KN(k1)1K2|log(1UN)|)+K2k=2logkexp(KN(k1)1K2|log(1UN)|),\begin{array}[]{llllllllll}\displaystyle\sum\limits_{j=1:\,j\neq i}^{M}&\mathbb{B}_{\bm{\xi}}&(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2})\\ &\leq&K_{3}+\displaystyle\sum\limits_{k=2}^{\infty}(K_{1}+K_{2}\,\log k)\\ &\times&\exp\left(-K_{N}\,(k-1)^{1-K_{2}\,|\log(1-U_{N})|}\right)\vskip 7.11317pt\\ &=&K_{3}+K_{1}\,\displaystyle\sum\limits_{k=2}^{\infty}\exp\left(-K_{N}\,(k-1)^{1-K_{2}\,|\log(1-U_{N})|}\right)\\ &+&K_{2}\,\displaystyle\sum\limits_{k=2}^{\infty}\log k\,\exp\left(-K_{N}\,(k-1)^{1-K_{2}\,|\log(1-U_{N})|}\right),\\ \end{array} (D.40)

resulting in two series that we bound one by one. With \lceil\cdot\rceil denoting the ceiling function and uN2/(1K2|log(1UN)|)u_{N}\coloneqq\lceil 2/(1-K_{2}\,|\log(1-U_{N})|)\rceil, the first series can be bounded as follows:

k=2exp(KN(k1)1K2|log(1UN)|)=k=1exp(KNk1K2|log(1UN)|)uN!(KN)uNk=11k2=uN!π2(KN)uN 6.\begin{array}[]{llllllllll}&&\displaystyle\sum\limits_{k=2}^{\infty}\exp\left(-K_{N}\,(k-1)^{1-K_{2}\,|\log(1-U_{N})|}\right)\vskip 7.11317pt\\ &=&\displaystyle\sum\limits_{k=1}^{\infty}\exp\left(-K_{N}\,k^{1-K_{2}\,|\log(1-U_{N})|}\right)\vskip 7.11317pt\\ &\leq&\dfrac{u_{N}!}{{(K_{N})}^{u_{N}}}\displaystyle\sum\limits_{k=1}^{\infty}\dfrac{1}{k^{2}}\;\;=\;\;\dfrac{u_{N}!\,\pi^{2}}{{(K_{N})}^{u_{N}}\,6}.\end{array}

The above bound is based on a Taylor expansion of exp(z)\exp(z), which establishes the inequality exp(z)>zu/u!\exp(z)>z^{u}\,/\,u! for any z>0z>0 and any u{1,2,}u\in\{1,2,\dots\}. This, in turn, implies the inequality exp(z)<u!/zu\exp(-z)<u!\,/\,z^{u} for any z>0z>0 and any u{1,2,}u\in\{1,2,\dots\}. With vN3/(1K2|log(1UN)|)v_{N}\coloneqq\lceil 3/(1-K_{2}\,|\log(1-U_{N})|)\rceil, we apply the same inequality to the second series:

k=2log(k)exp(KN(k1)1K2|log(1UN)|)=k=1log(k+1)exp(KNk1K2|log(1UN)|)vN!(KN)vNk=1log(k+1)k3vN!(KN)vNk=1kk3=vN!(KN)vNk=11k2=vN!π2(KN)vN 6.\begin{array}[]{llllllllll}&&\displaystyle\sum\limits_{k=2}^{\infty}\log(k)\,\exp\left(-{K_{N}}\,(k-1)^{1-K_{2}\,|\log(1-U_{N})|}\right)\vskip 7.11317pt\\ &=&\displaystyle\sum\limits_{k=1}^{\infty}\log(k+1)\,\exp\left(-{K_{N}}\,k^{1-K_{2}\,|\log(1-U_{N})|}\right)\vskip 7.11317pt\\ &\leq&\dfrac{v_{N}!}{{(K_{N})}^{v_{N}}}\displaystyle\sum\limits_{k=1}^{\infty}\dfrac{\log(k+1)}{k^{3}}\vskip 7.11317pt\\ &\leq&\dfrac{v_{N}!}{{(K_{N})}^{v_{N}}}\displaystyle\sum\limits_{k=1}^{\infty}\dfrac{k}{k^{3}}~=~\dfrac{v_{N}!}{{(K_{N})}^{v_{N}}}\displaystyle\sum\limits_{k=1}^{\infty}\dfrac{1}{k^{2}}~=~\dfrac{v_{N}!\,\pi^{2}}{{(K_{N})}^{v_{N}}\,6}.\end{array}

Plugging these results into (D.40) gives

j=1:jiM𝔹𝝃(Wi\centernotWj in 𝒢2)K3+π26(K1uN!(KN)uN+K2vN!(KN)vN).\begin{array}[]{llllllllll}\displaystyle\sum\limits_{j=1:\,j\neq i}^{M}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2})&\leq&K_{3}+\dfrac{\pi^{2}}{6}\left(K_{1}\,\dfrac{u_{N}!}{{(K_{N})}^{u_{N}}}+K_{2}\,\dfrac{v_{N}!}{{(K_{N})}^{v_{N}}}\right).\\ \end{array} (D.41)

Last but not least, combining (D.41) with (D.36) yields

|𝒟(𝜽)|21+K3+π26(K1uN!(KN)uN+K2vN!(KN)vN).\begin{array}[]{llllllllll}{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&1+K_{3}+\dfrac{\pi^{2}}{6}\left(K_{1}\,\dfrac{u_{N}!}{{(K_{N})}^{u_{N}}}+K_{2}\,\dfrac{v_{N}!}{{(K_{N})}^{v_{N}}}\right).\end{array}

Under Condition 4, ϑ=0\vartheta=0 holds, hence UNU_{N} in (D.33) and the quantities KNK_{N}, uNu_{N}, and vNv_{N} reduce to

UN=11+exp(E)UKN=exp(K1|log(1U)|)K4uN=21K2|log(1U)|uvN=31K2|log(1U)|v,\begin{array}[]{llllllllll}U_{N}&=&\dfrac{1}{1+\exp(-E)}&\eqqcolon&U\vskip 7.11317pt\\ K_{N}&=&\exp(-K_{1}\,|\log(1-U)|)&\eqqcolon&K_{4}\vskip 7.11317pt\\ u_{N}&=&\left\lceil\dfrac{2}{1-K_{2}\,|\log(1-U)|}\right\rceil&\eqqcolon&u\vskip 7.11317pt\\ v_{N}&=&\left\lceil\dfrac{3}{1-K_{2}\,|\log(1-U)|}\right\rceil&\eqqcolon&v,\end{array}

which are constants independent of ϑ\vartheta and NN. The constant UU corresponds to the constant UU from Condition D.1. This translates to

|𝒟(𝜽)|2C3,\begin{array}[]{llllllllll}{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&C_{3},\end{array}

with C31+K3+(π2/ 6)(K1u!/K4u+K2v!/K4v) 1C_{3}\coloneqq 1+K_{3}+(\pi^{2}/\,6)\,(K_{1}\,u!/K_{4}^{u}+K_{2}\,v!/K_{4}^{v})\,\geq\,1. For non-overlapping subpopulations, we have K1=K2=0K_{1}=K_{2}=0 and

|𝒟(𝜽)|21+K3=C3.\begin{array}[]{llllllllll}{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&1+K_{3}&=&C_{3}.\end{array}
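The two series bounds used in the proof above can also be checked numerically. A minimal Python sketch (assuming NumPy is available; the values of KK and of the exponent are illustrative stand-ins for KNK_{N} and 1K2|log(1UN)|1-K_{2}\,|\log(1-U_{N})|, not quantities derived from the model):

import math
import numpy as np

K, alpha = 0.3, 0.6                 # illustrative stand-ins, K in (0, 1), alpha in (0, 1)
u = math.ceil(2.0 / alpha)          # plays the role of u_N
v = math.ceil(3.0 / alpha)          # plays the role of v_N
k = np.arange(1, 100_001, dtype=float)

series_1 = np.sum(np.exp(-K * k ** alpha))
series_2 = np.sum(np.log(k + 1.0) * np.exp(-K * k ** alpha))
bound_1 = math.factorial(u) * np.pi ** 2 / (6.0 * K ** u)
bound_2 = math.factorial(v) * np.pi ** 2 / (6.0 * K ** v)
assert series_1 <= bound_1 and series_2 <= bound_2
print(series_1, bound_1, series_2, bound_2)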

D.7 Auxiliary Results

Lemma 6. Consider the model of Corollary 4. Condition D.1 implies that there exist constants K10K_{1}\geq 0, K20K_{2}\geq 0, K3>0K_{3}>0 such that, for all k{2,3,}k\in\{2,3,\ldots\} and all i{1,,M}i\in\{1,\ldots,M\},

|𝒮𝒢2,i,k|K1+K2logk|𝒮𝒢2,i,1|K3.\begin{array}[]{llllllllll}|\mathscr{S}_{\mathscr{G}_{2},i,k}|&\leq&K_{1}+K_{2}\,\log k\vskip 7.11317pt\\ |\mathscr{S}_{\mathscr{G}_{2},i,1}|&\leq&K_{3}.\end{array}

Proof of Lemma D.7. With the set

a,b{{c,d}:c𝒩a𝒩b,d𝒩a𝒩b{c}}{{c}:c𝒩a𝒩b},\begin{array}[]{llllllllll}\mbox{$\mathscr{M}$}_{a,b}&\coloneqq&\{\{c,d\}:\;c\,\in\,\mathscr{N}_{a}\,\cup\,\mathscr{N}_{b},\;d\,\in\,\mathscr{N}_{a}\,\cup\,\mathscr{N}_{b}\setminus\{c\}\}\;\cup\;\{\{c\}:\;c\,\in\,\mathscr{N}_{a}\,\cup\,\mathscr{N}_{b}\},\end{array}

we constructed from 𝒢\mathscr{G} two additional graphs 𝒢1\mathscr{G}_{1} and 𝒢2\mathscr{G}_{2} as follows:

  1.

    𝒢1(𝒱,1)\mathscr{G}_{1}\coloneqq(\mathscr{V},\mathscr{E}_{1}):

    • Vertex W𝒱𝒵W\in\mathscr{V}_{\mathscr{Z}} relating to connection Zi,jZ_{i,j} has edges to vertices that relate to all connections Zh,kZ_{h,k} and responses YhY_{h} with {h,k},{h}i,j\{h,k\},\{h\}\in\mbox{$\mathscr{M}$}_{i,j}.

    • Vertex W𝒱𝒴W\in\mathscr{V}_{\mathscr{Y}} relating to attribute YiY_{i} has edges to vertices that relate to all connections Zh,kZ_{h,k} and responses YhY_{h} with {h,k},{h}i,N+1\{h,k\},\{h\}\in\mbox{$\mathscr{M}$}_{i,N+1} for a fictional unit N+1N+1 with 𝒩N+1=\mathscr{N}_{N+1}=\emptyset.

  2.

    𝒢2(𝒱,12)\mathscr{G}_{2}\coloneqq(\mathscr{V},\,\mathscr{E}_{1}\,\cup\,\mathscr{E}_{2}): The set 2\mathscr{E}_{2} includes edges of all vertices Wi𝒱W_{i}\in\mathscr{V} with i{1,,M}i\in\{1,\ldots,M\} to vertices in 𝒮𝒢1,i,2\mathscr{S}_{\mathscr{G}_{1},i,2}.

The graph 𝒢1\mathscr{G}_{1} is equivalent to the graph cover 𝒢\mathscr{G}^{\star} defined in Lemma 16 of S25. Therefore, we can use results from the proof of Lemma 16 in S25, which demonstrate that Condition D.1 implies the following bound on |𝒮𝒢1,i,k||\mathscr{S}_{\mathscr{G}_{1},i,k}|:

|𝒮𝒢1,i,k|(ω1+1)(2D3ω1+ω2log(k1)),k{2,3,},\begin{array}[]{llllllllll}|\mathscr{S}_{\mathscr{G}_{1},i,k}|&\leq&(\omega_{1}+1)(2\,D^{3}\,\omega_{1}+\omega_{2}\,\log(k-1)),&&k\in\{2,3,\ldots\},\end{array}

where DD corresponds to the constant defined in (D.4) and the constants ω1 0\omega_{1}\,\geq\,0 and 0ω2min{ω1, 1/((ω1+1)|log(1U)|)}0\,\leq\,\omega_{2}\,\leq\,\min\limits\{\omega_{1},\,1/((\omega_{1}+1)\,|\log(1-U)|)\} with U(1+exp(A))1>0U\coloneqq(1+\exp(-A))^{-1}>0 correspond to the constants from Condition D.1. Defining K52ω1(ω1+1)D3 0K_{5}\coloneqq 2\,\omega_{1}\,(\omega_{1}+1)\,D^{3}\,\geq\,0 and K6ω2(ω1+1) 0K_{6}\coloneqq\omega_{2}\,(\omega_{1}+1)\,\geq\,0, the bound becomes

|𝒮𝒢1,i,k|K5+K6log(k1),k{2,3,}.\begin{array}[]{llllllllll}|\mathscr{S}_{\mathscr{G}_{1},i,k}|&\leq&K_{5}+K_{6}\,\log(k-1),&&k\in\{2,3,\ldots\}.\end{array}

The bound |𝒮𝒢1,i,1|4D2+D|\mathscr{S}_{\mathscr{G}_{1},i,1}|\leq 4\,D^{2}+D differs from the result in S25 because, for our definition of i,j\mbox{$\mathscr{M}$}_{i,j}, the set i,j\mbox{$\mathscr{M}$}_{i,j} contains |𝒩i𝒩j|D|\mathscr{N}_{i}\cup\mathscr{N}_{j}|\,\leq\,D additional responses.

Adding edges 2\mathscr{E}_{2}, defined as the edges from vertices to other vertices with a geodesic distance of two in 𝒢1\mathscr{G}_{1}, to 𝒢1\mathscr{G}_{1} reduces the geodesic distance between all vertices from k{1,2,}k\in\{1,2,\ldots\} in 𝒢1\mathscr{G}_{1} to k/2\lceil k/2\rceil in 𝒢2\mathscr{G}_{2}. Therefore, |𝒮𝒢2,i,k|=|𝒮𝒢1,i,2k|+|𝒮𝒢1,i,2k1||\mathscr{S}_{\mathscr{G}_{2},i,k}|=|\mathscr{S}_{\mathscr{G}_{1},i,2\,k}|+|\mathscr{S}_{\mathscr{G}_{1},i,2\,k-1}| holds for k{1,2,}k\in\{1,2,\ldots\} and i{1,,M}i\in\{1,\ldots,M\}. This allows us to relate the bounds for |𝒮𝒢1,i,k||\mathscr{S}_{\mathscr{G}_{1},i,k}| to bounds for |𝒮𝒢2,i,k||\mathscr{S}_{\mathscr{G}_{2},i,k}| with k=2,3,k=2,3,\ldots and i{1,,M}i\in\{1,\ldots,M\}:

|𝒮𝒢2,i,k|=|𝒮𝒢1,i,2k|+|𝒮𝒢1,i,2k1|2K5+K6(log(2k)+log(2k1))2K5+2K6log(2k)=K1+K2logk\begin{array}[]{llllllllll}|\mathscr{S}_{\mathscr{G}_{2},i,k}|&=&|\mathscr{S}_{\mathscr{G}_{1},i,2\,k}|+|\mathscr{S}_{\mathscr{G}_{1},i,2\,k-1}|\\ &\leq&2\,K_{5}+K_{6}\,(\log(2\,k)+\log(2\,k-1))\\ &\leq&2\,K_{5}+2\,K_{6}\,\log(2\,k)\\ &=&K_{1}+K_{2}\,\log k\end{array}

and

|𝒮𝒢2,i,1|4D2+D+K1K3,\begin{array}[]{llllllllll}|\mathscr{S}_{\mathscr{G}_{2},i,1}|&\leq&4\,D^{2}+D+K_{1}\eqqcolon K_{3},\end{array}

with K12K5+2K6log2K_{1}\coloneqq 2\,K_{5}+2\,K_{6}\log 2 and K22K6K_{2}\coloneqq 2\,K_{6}. This proves the statement with K10,K20,K_{1}\geq 0,K_{2}\geq 0, and K3>0K_{3}>0.
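The halving of geodesic distances used in the proof above can be verified numerically. The following minimal Python sketch (an illustration only; the random toy graph is not the dependence graph 𝒢1\mathscr{G}_{1} of the proof) adds an edge between every pair of vertices at geodesic distance two and checks that a distance of kk becomes k/2\lceil k/2\rceil:

import itertools
import math
import random
from collections import deque

def bfs_distances(adj, source):
    # Breadth-first search returning geodesic distances from source to all reachable vertices.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

random.seed(1)
n = 30
adj1 = {v: set() for v in range(n)}
for u, v in itertools.combinations(range(n), 2):
    if random.random() < 0.08:      # sparse random toy graph, playing the role of G_1
        adj1[u].add(v)
        adj1[v].add(u)

# G_2: add an edge between every pair of vertices at geodesic distance two in G_1.
adj2 = {v: set(adj1[v]) for v in range(n)}
for u in range(n):
    for v, k in bfs_distances(adj1, u).items():
        if k == 2:
            adj2[u].add(v)
            adj2[v].add(u)

for u in range(n):
    d1 = bfs_distances(adj1, u)
    d2 = bfs_distances(adj2, u)
    assert all(d2[v] == math.ceil(k / 2) for v, k in d1.items())
print("geodesic distances are halved (rounded up), as claimed")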

Lemma 7. Consider the model of Corollary 4. Then, for any pair of units {i,j}𝒫N\{i,j\}\subset\mathscr{P}_{N} such that 𝒩i𝒩j\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset,

11+exp(CD2𝜽)𝜽(Zi,j=1𝒙,𝒚,𝒛{i,j})11+exp(CD2𝜽).\begin{array}[]{llllllllll}\dfrac{1}{1+\exp\left(C\,D^{2}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)}~\leq~\mathbb{P}_{\bm{\theta}}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})~\leq~\dfrac{1}{1+\exp\left(-C\,D^{2}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)}.\end{array}

Proof of Lemma D.7. For all {i,j}𝒫N\{i,j\}\subset\mathscr{P}_{N} such that 𝒩i𝒩j\mathscr{N}_{i}\cap\mathscr{N}_{j}\neq\emptyset, the conditional probability of Zi,jZ_{i,j} given (𝑿,𝒀,𝒁{i,j})=(𝒙,𝒚,𝒛{i,j})(\bm{X},\bm{Y},\bm{Z}_{-\{i,j\}})=(\bm{x},\bm{y},\bm{z}_{-\{i,j\}}) is

𝜽(Zi,j=zi,j𝒙,𝒚,𝒛{i,j})=exp(𝜽𝒃(𝒙,𝒚,𝒛{i,j},zi,j))exp(𝜽𝒃(𝒙,𝒚,𝒛{i,j},1))+exp(𝜽𝒃(𝒙,𝒚,𝒛{i,j},0))=11+g(1zi,j;𝒛{i,j},zi,j,𝜽),\begin{array}[]{llllllllll}&\mathbb{P}_{\bm{\theta}}(Z_{i,j}=z_{i,j}\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})\vskip 7.11317pt\\ =&\dfrac{\exp\left(\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},z_{i,j})\right)}{\exp\left(\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},1)\right)+\exp\left(\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},0)\right)}\vskip 7.11317pt\\ =&\dfrac{1}{1+g(1-z_{i,j};\bm{z}_{-\{i,j\}},z_{i,j},\bm{\theta})},\end{array}

with

g(z;𝒛{i,j},zi,j,𝜽)=exp(𝜽(𝒃(𝒙,𝒚,𝒛{i,j},z)𝒃(𝒙,𝒚,𝒛{i,j},zi,j))).\begin{array}[]{llllllllll}g(z;\,\bm{z}_{-\{i,j\}},z_{i,j},\bm{\theta})&=&\exp\left(\bm{\theta}^{\top}\left(\bm{b}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},z)-\bm{b}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},z_{i,j})\right)\right).\end{array}

Note that

maxz{i,j}𝒵{i,j}|ba(𝒙,𝒚,𝒛{i,j},0)ba(𝒙,𝒚,𝒛{i,j},1)|{0if a{1,,N}{i,j}1if a{i,j}1+Dif a=N+12Cif a=N+2,\begin{array}[]{llllllllll}\underset{z_{-\{i,j\}}\in\mathscr{Z}_{-\{i,j\}}}{\max}|b_{a}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},0)-b_{a}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},1)|\vskip 7.11317pt\\ \leq\begin{cases}0&\text{if $a\in\{1,\ldots,N\}\setminus\{i,j\}$}\\ 1&\text{if $a\in\{i,j\}$}\\ 1+D&\text{if $a=N+1$}\\ 2\,C&\text{if $a=N+2$}\\ \end{cases},\end{array}

where 𝒵{i,j}\mathscr{Z}_{-\{i,j\}}\coloneqq\bigtimes_{\{k,h\}\neq\{i,j\}}\,\mathscr{Z}_{k,h} is the domain of 𝒁\bm{Z} excluding Zi,jZ_{i,j}, CC corresponds to the constant from Condition 4, and DD matches the constant defined in (D.4). The bounds for a=1,,Na=1,\ldots,N follow from the observation that the degree statistic of unit aa can, first, only be affected by connections zi,jz_{i,j} with a{i,j}a\in\{i,j\} and, second, change by at most 1 in that case. For a=N+1a=N+1, the bound follows from Lemma 18 of S25. For a=N+2a=N+2, the sufficient statistic counts the number of connections with overlapping neighborhoods and either Yixj>0Y_{i}\,x_{j}>0 or Yjxi>0Y_{j}\,x_{i}>0. For 𝒩i𝒩j\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset, the maximal change in the statistic is 2C2\,C since yi{0,1}y_{i}\in\{0,1\} and xiCx_{i}\leq C for i𝒫Ni\in\mathscr{P}_{N}; otherwise the maximal change is 0.

Upon applying the triangle inequality,

|𝜽𝒃(𝒙,𝒚,𝒛{i,j},z)𝜽𝒃(𝒙,𝒚,𝒛{i,j},zi,j)|(2+2C+D)𝜽,\begin{array}[]{llllllllll}|\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},z)-\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},z_{i,j})|~\leq~(2+2\,C+D)\;\left|\!\left|\bm{\theta}\right|\!\right|_{\infty},\end{array}

we obtain for 𝒩i𝒩j\mathscr{N}_{i}\cap\mathscr{N}_{j}\neq\emptyset

exp((2+2C+D)𝜽)g(1zi,j;𝒛{i,j},zi,j,𝜽)exp((2+2C+D)𝜽).\begin{array}[]{llllllllll}\exp\left(-(2+2\,C+D)\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)~\leq~g(1-z_{i,j};\bm{z}_{-\{i,j\}},z_{i,j},\bm{\theta})~\leq~\exp\left((2+2\,C+D)\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right).\end{array}

Upon collecting terms, we obtain the final result:

11+exp(C6𝜽)𝜽(Zi,j=1𝒙,𝒚,𝒛{i,j})11+exp(C6𝜽)11+exp(CD2𝜽)𝜽(Zi,j=1𝒙,𝒚,𝒛{i,j})11+exp(CD2𝜽)\begin{array}[]{llllllllll}\dfrac{1}{1+\exp\left(C_{6}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)}&\leq&\mathbb{P}_{\bm{\theta}}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})&\leq&\dfrac{1}{1+\exp\left(-C_{6}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)}\vskip 7.11317pt\\ \dfrac{1}{1+\exp\left(C\,D^{2}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)}&\leq&\mathbb{P}_{\bm{\theta}}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})&\leq&\dfrac{1}{1+\exp\left(-C\,D^{2}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)}\end{array}

where D{2,3,}D\in\{2,3,\ldots\} and C62+2C+D>0C_{6}\coloneqq 2+2\,C+D>0 are constants.
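For concreteness, with the purely illustrative values C=1C=1, D=2D=2, and 𝜽=1/2\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}=1/2 (chosen only to exemplify the bound; they are not constants used elsewhere in this supplement), CD2𝜽=2C\,D^{2}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}=2 and the conditional connection probability is confined to

\begin{array}[]{llllllllll}\dfrac{1}{1+\exp(2)}\;\approx\;0.12&\leq&\mathbb{P}_{\bm{\theta}}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})&\leq&\dfrac{1}{1+\exp(-2)}\;\approx\;0.88.\end{array}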

Lemma 8. Consider the model of Corollary 4. Then, for any unit i𝒫Ni\in\mathscr{P}_{N}

11+exp(CD2𝜽)𝜽(Yi=1𝒙,𝒚i,𝒛)11+exp(CD2𝜽).\begin{array}[]{llllllllll}\dfrac{1}{1+\exp\left(C\,D^{2}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)}~\leq~\mathbb{P}_{\bm{\theta}}(Y_{i}=1\mid\bm{x},\,\bm{y}_{-i},\,\bm{z})~\leq~\dfrac{1}{1+\exp\left(-C\,D^{2}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)}.\end{array}

Proof of Lemma D.7. The conditional probability of YiY_{i} given (𝑿,𝒀i,𝒁)=(𝒙,𝒚i,𝒛)(\bm{X},\,\bm{Y}_{-i},\,\bm{Z})=(\bm{x},\,\bm{y}_{-i},\,\bm{z}) is

𝜽(Yi=yi𝒙,𝒚i,𝒛)=exp(𝜽𝒃(𝒙,𝒚i,yi,𝒛))exp(𝜽𝒃(𝒙,𝒚i,0,𝒛))+exp(𝜽𝒃(𝒙,𝒚i,1,𝒛))=1g(0;𝒚i,yi,𝜽)+g(1;𝒚i,yi,𝜽),\begin{array}[]{llllllllll}\mathbb{P}_{\bm{\theta}}(Y_{i}=y_{i}\mid\bm{x},\,\bm{y}_{-i},\,\bm{z})&=&\dfrac{\exp\left(\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y}_{-i},\,y_{i},\bm{z})\right)}{\exp\left(\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y}_{-i},0,\bm{z})\right)+\exp\left(\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y}_{-i},1,\bm{z})\right)}\vskip 7.11317pt\\ &=&\dfrac{1}{g(0;\bm{y}_{-i},\,y_{i},\bm{\theta})+g(1;\bm{y}_{-i},\,y_{i},\bm{\theta})},\end{array}

where

g(y;𝒚i,yi,𝜽)=exp(𝜽(𝒃(𝒙,𝒚i,y,𝒛)𝒃(𝒙,𝒚i,yi,𝒛))).\begin{array}[]{llllllllll}g(y;\,\bm{y}_{-i},\,y_{i},\,\bm{\theta})&=&\exp\left(\bm{\theta}^{\top}\left(\bm{b}(\bm{x},\,\bm{y}_{-i},\,y,\,\bm{z})-\bm{b}(\bm{x},\,\bm{y}_{-i},\,y_{i},\,\bm{z})\right)\right).\end{array}

Note that

maxyi𝒴i|ba(𝒙,𝒚i, 0,𝒛)ba(𝒙,𝒚i,1,𝒛)|{0if a{1,,N+1}CD2if a=N+2,\begin{array}[]{llllllllll}\underset{y_{-i}\in\mathscr{Y}_{-i}}{\max}\;|b_{a}(\bm{x},\,\bm{y}_{-i},\,0,\bm{z})-b_{a}(\bm{x},\,\bm{y}_{-i},1,\bm{z})|\;\leq\;\begin{cases}0&\text{if $a\in\{1,\ldots,N+1\}$}\\ C\,D^{2}&\text{if $a=N+2$,}\\ \end{cases}\end{array}

where 𝒴i\mathscr{Y}_{-i}\coloneqq\bigtimes_{j\neq i}\,\mathscr{Y}_{j} is the domain of 𝒀\bm{Y} without YiY_{i}, CC corresponds to the constant from Condition 4, and DD matches the constant defined in (D.4). The bounds for a=1,,N+1a=1,\ldots,N+1 are 0, as the corresponding statistics are not affected by changes in 𝒚\bm{y}. For a=N+2a=N+2, the maximal change is bounded by the number of units jj such that 𝒩i𝒩j\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset, which is at most D2D^{2} by Lemma 15 in S25, times the maximal value CC of the predictors. The remainder of the proof proceeds along the same lines as the proof of the corresponding bound for the connections Zi,jZ_{i,j} above.

Lemma 9. Consider the model of Corollary 4. If Conditions 4, D.1, and D.3 are satisfied with ϑ[0,1/18)\vartheta\in[0,1/18), the following bounds hold for all elements of 𝑩(𝜽,𝒘)\bm{B}(\bm{\theta},\,\bm{w}), the covariance matrix of the sufficient statistics bN+1(𝒙,𝒚,𝒛)b_{N+1}(\bm{x},\,\bm{y},\,\bm{z}) and bN+2(𝒙,𝒚,𝒛)b_{N+2}(\bm{x},\,\bm{y},\,\bm{z}) defined in Section D.1, for all 𝜽𝚯\bm{\theta}\in\bm{\Theta} and all 𝒘𝒲\bm{w}\in\mathscr{W}:

B1,1(𝜽,𝒘)ND54|B1,2(𝜽,𝒘)|NC2D54B2,2(𝜽,𝒘)NC2D54.\begin{array}[]{llllllllll}B_{1,1}(\bm{\theta},\,\bm{w})&\leq&\,\dfrac{ND^{5}}{4}\vskip 7.11317pt\\ |B_{1,2}(\bm{\theta},\,\bm{w})|&\leq&\,\dfrac{N\,C^{2}\,D^{5}}{4}\vskip 7.11317pt\\ B_{2,2}(\bm{\theta},\,\bm{w})&\leq&\ \dfrac{N\,C^{2}\,D^{5}}{4}.\end{array}

Proof of Lemma D.7. We first bound B1,1(𝜽,𝒘)B_{1,1}(\bm{\theta},\,\bm{w}) from above as follows:

B1,1(𝜽,𝒘)=i=1Nj=i+1N𝕍𝒵,i,j(sN+1(𝒙,𝒚,𝒁))=i=1Nj=i+1N𝕍𝒵,i,j(a=1Nb=a+1Nda,b(𝒁)Za,b)=i=1Nj=i+1Nci,j𝕍𝒵,i,j(a=1Nb=a+1NZa,bda,b(𝒁))i=1Nj=i+1Nci,jD2(a=1Nb=a+1N𝕍𝒵,i,j(Za,bda,b(𝒁)))\begin{array}[]{llllllllll}B_{1,1}(\bm{\theta},\,\bm{w})&=&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(s_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)\vskip 7.11317pt\\ &=&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(\displaystyle\sum\limits_{a=1}^{N}\displaystyle\sum\limits_{b=a+1}^{N}d_{a,b}(\bm{Z})\,Z_{a,b}\right)\vskip 7.11317pt\\ &=&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}c_{i,j}\,\mathbb{V}_{\mathscr{Z},i,j}\,\left(\displaystyle\sum\limits_{a=1}^{N}\displaystyle\sum\limits_{b=a+1}^{N}Z_{a,b}\,d_{a,b}(\bm{Z})\right)\vskip 7.11317pt\\ &\leq&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}c_{i,j}\,D^{2}\left(\displaystyle\sum\limits_{a=1}^{N}\displaystyle\sum\limits_{b=a+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\,\left(Z_{a,b}\,d_{a,b}(\bm{Z})\right)\right)\end{array} (D.42)

where DD matches the constant defined in (D.4) and the function da,b(𝒁)d_{a,b}(\bm{Z}) is defined in (D.2). On the second line of (D.42), we use the fact that 𝒩i𝒩j=\mathcal{N}_{i}\,\cap\,\mathcal{N}_{j}\,=\,\emptyset implies that di,j(𝒁)Zi,j=0d_{i,j}(\bm{Z})\,Z_{i,j}=0 and da,b(𝒁)Za,bd_{a,b}(\bm{Z})\,Z_{a,b} does not depend on Zi,jZ_{i,j} for any {a,b}{i,j}\{a,b\}\neq\{i,j\}. For the inequality in the last line of (D.42), we use the fact that the number of pairs (a,b)(a,b) for which da,b(𝒁)Za,bd_{a,b}(\bm{Z})\,Z_{a,b} is a function of Zi,jZ_{i,j} is bounded above by DD (see proof of Lemma 19 in S25). Invoking Lemma 15 of S25 and applying

a=1Nb=a+1N𝕍𝒵,i,j(da,b(𝒁)Za,b)D4\begin{array}[]{llllllllll}\displaystyle\sum\limits_{a=1}^{N}\displaystyle\sum\limits_{b=a+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\,\left(d_{a,b}(\bm{Z})\,Z_{a,b}\right)&\leq&\dfrac{D}{4}\end{array}

gives:

B1,1(𝜽,𝒘)i=1Nj=i+1Nci,jD2(a=1Nb=a+1N𝕍𝒵,i,j(Za,bda,b(𝒁)))ND54\begin{array}[]{llllllllll}B_{1,1}(\bm{\theta},\,\bm{w})&\leq&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}c_{i,j}\,D^{2}\left(\displaystyle\sum\limits_{a=1}^{N}\displaystyle\sum\limits_{b=a+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\,\left(Z_{a,b}\,d_{a,b}(\bm{Z})\right)\right)\;\;\leq\;\;\dfrac{N\,D^{5}}{4}\end{array}

We proceed with bounding B2,2(𝜽,𝒘)B_{2,2}(\bm{\theta},\,\bm{w}):

B2,2(𝜽,𝒘)=i=1N𝕍𝒴,i(sN+2(𝒙,𝒀,𝒛))+i=1Nj=i+1N𝕍𝒵,i,j(sN+2(𝒙,𝒚,𝒁))=i=1N𝕍𝒴,i((j=1Nci,jxjzi,j)Yi)+i=1Nj=i+1N𝕍𝒵,i,j(ci,j(xiyj+xjyi)Zi,j)=i=1N(j=1Nci,jxjzi,j)2𝕍𝒴,i(Yi)+i=1Nj=i+1Nci,j(xiyj+xjyi)2𝕍𝒵,i,j(Zi,j)5NC2D44NC2D54,\begin{array}[]{llllllllll}B_{2,2}(\bm{\theta},\,\bm{w})&=&\displaystyle\sum\limits_{i=1}^{N}\mathbb{V}_{\mathscr{Y},i}\left(s_{N+2}(\bm{x},\bm{Y},\,\bm{z})\right)+\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(s_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\\ &=&\displaystyle\sum\limits_{i=1}^{N}\mathbb{V}_{\mathscr{Y},i}\left(\left(\displaystyle\sum\limits_{j=1}^{N}c_{i,j}\,x_{j}\,z_{i,j}\right)\,Y_{i}\right)+\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(c_{i,j}\,(x_{i}\,y_{j}+x_{j}\,y_{i})\,Z_{i,j}\right)\\ &=&\displaystyle\sum\limits_{i=1}^{N}\left(\displaystyle\sum\limits_{j=1}^{N}c_{i,j}\,x_{j}\,z_{i,j}\right)^{2}\mathbb{V}_{\mathscr{Y},i}\left(Y_{i}\right)+\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}c_{i,j}\,(x_{i}\,y_{j}+x_{j}\,y_{i})^{2}\,\mathbb{V}_{\mathscr{Z},i,j}\left(Z_{i,j}\right)\vskip 7.11317pt\\ &\leq&\,\dfrac{5\,N\,C^{2}\,D^{4}}{4}~\leq~\dfrac{N\,C^{2}\,D^{5}}{4},\end{array}

because |xj|C|x_{j}|\,\leq C according to Condition 4. For the first inequality, we also use that

j=1Nci,jD2\begin{array}[]{llllllllll}\displaystyle\sum\limits_{j=1}^{N}c_{i,j}&\leq&D^{2}\end{array}

by Lemma 15 in S25. We obtain

max{B1,1(𝜽,𝒘),B2,2(𝜽,𝒘)}NC2D54,\begin{array}[]{llllllllll}\max\{B_{1,1}(\bm{\theta},\,\bm{w}),\,B_{2,2}(\bm{\theta},\,\bm{w})\}&\leq&\dfrac{N\,C^{2}\,D^{5}}{4},\end{array}

which provides an upper bound on |B1,2(𝜽,𝒘)||B_{1,2}(\bm{\theta},\,\bm{w})| by the Cauchy-Schwarz inequality:

|B1,2(𝜽,𝒘)|B1,1(𝜽,𝒘)B2,2(𝜽,𝒘)NC2D54.\begin{array}[]{llllllllll}|B_{1,2}(\bm{\theta},\,\bm{w})|&\leq&\sqrt{B_{1,1}(\bm{\theta},\,\bm{w})}\,\sqrt{B_{2,2}(\bm{\theta},\,\bm{w})}\ \leq\,\dfrac{N\,C^{2}\,D^{5}}{4}.\end{array}

Lemma 10. Consider the model of Corollary 4. Define

πv,𝒘v,𝒘v||𝜽(𝒘v)𝜽(𝒘v)||TV\begin{array}[]{llllllllll}\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&\coloneqq&\left|\!\left|\mathbb{P}_{\bm{\theta}}(\,\cdot\mid\bm{w}_{-v})-\mathbb{P}_{\bm{\theta}}(\,\cdot\mid\bm{w}_{-v}^{\prime})\right|\!\right|_{\text{TV}}\end{array}
πmax1vMmax(𝒘v,𝒘v)𝒲v×𝒲vπv,𝒘v,𝒘v.\begin{array}[]{llllllllll}\pi^{\star}&\coloneqq&\underset{1\,\leq\,v\,\leq\,M}{\max}\;\underset{(\bm{w}_{-v},\,\bm{w}_{-v}^{\prime})\,\in\,\mathscr{W}_{-v}\times\mathscr{W}_{-v}}{\max}\;\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}.\end{array}

Let D{2,3,}D\in\{2,3,\ldots\} be the maximum degree of vertices Zi,jZ_{i,j} in 𝒢\mathscr{G}. Then

π11+exp(CD2𝜽).\begin{array}[]{llllllllll}\pi^{\star}&\leq&\dfrac{1}{1+\exp(-C\,D^{2}\,\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty})}.\end{array}

Proof of Lemma D.7. The proof resembles the proof of Lemma 21 in S25, adapted to the bounds on the conditional probabilities of the connections Zi,jZ_{i,j} and the attributes YiY_{i} derived above. We distinguish four cases, where WvW_{v} with v{1,,M}v\in\{1,\ldots,M\} relates to:

  1.

    Connection Zi,jZ_{i,j} of a pair of nodes {i,j}𝒫N\{i,\,j\}\subset\mathscr{P}_{N} with 𝒩i𝒩j=\mathscr{N}_{i}\cap\mathscr{N}_{j}=\emptyset.

  2.

    Attribute YiY_{i} with i𝒫Ni\in\mathscr{P}_{N} and {j𝒫N:𝒩i𝒩j}=\{j\in\mathscr{P}_{N}:\;\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset\}=\emptyset.

  3.

    Connection Zi,jZ_{i,j} of a pair of nodes {i,j}𝒫N\{i,\,j\}\subset\mathscr{P}_{N} with 𝒩i𝒩j\mathscr{N}_{i}\cap\mathscr{N}_{j}\neq\emptyset.

  4.

    Attribute YiY_{i} with i𝒫Ni\in\mathscr{P}_{N} and {j𝒫N:𝒩i𝒩j}\{j\in\mathscr{P}_{N}:\;\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset\}\neq\emptyset.

In cases 1 and 2, WvW_{v} is independent of 𝑾v\bm{W}_{-v}, so that πv,𝒘v,𝒘v=0\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}=0; note that case 2 cannot occur, because Condition 4 ensures that there are no units i𝒫Ni\in\mathscr{P}_{N} with {j𝒫N:𝒩i𝒩j}=\{j\in\mathscr{P}_{N}:\;\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset\}=\emptyset. In cases 3 and 4, WvW_{v} depends on a non-empty subset of other vertices in 𝒢\mathscr{G}. Consider any v{1,,M}v\in\{1,\ldots,M\} such that πv,𝒘v,𝒘v>0\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}>0 for some (𝒘v,𝒘v)𝒲v×𝒲v(\bm{w}_{-v},\bm{w}_{-v}^{\prime})\in\mathscr{W}_{-v}\times\mathscr{W}_{-v} and define

a0,v𝜽(Wv=0𝑾v=𝒘v) and a1,v𝜽(Wv=1𝑾v=𝒘v)b0,v𝜽(Wv=0𝑾v=𝒘v) and b1,v𝜽(Wv=1𝑾v=𝒘v).\begin{array}[]{llllllllll}a_{0,v}&\coloneqq&\mathbb{P}_{\bm{\theta}}(W_{v}=0\mid\bm{W}_{-v}=\bm{w}_{-v})~\text{ and }~a_{1,v}&\coloneqq&\mathbb{P}_{\bm{\theta}}(W_{v}=1\mid\bm{W}_{-v}=\bm{w}_{-v})\\ b_{0,v}&\coloneqq&\mathbb{P}_{\bm{\theta}}(W_{v}=0\mid\bm{W}_{-v}=\bm{w}_{-v}^{\prime})~\text{ and }~b_{1,v}&\coloneqq&\mathbb{P}_{\bm{\theta}}(W_{v}=1\mid\bm{W}_{-v}=\bm{w}_{-v}^{\prime}).\end{array}

Lemma 21 in S25 shows that

πv,𝒘v,𝒘vmin{max{a0,v,b0,v},max{a1,v,b1,v}}.\begin{array}[]{llllllllll}\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&\leq&\min\{\max\{a_{0,v},\,b_{0,v}\},\;\max\{a_{1,v},\,b_{1,v}\}\}.\end{array}

Plugging in the bounds on the conditional probabilities of the connections Zi,jZ_{i,j} and the attributes YiY_{i} derived above, we obtain

πv,𝒘v,𝒘v11+exp(CD2𝜽),v𝒱Z\begin{array}[]{llllllllll}\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&\leq&\dfrac{1}{1+\exp\left(-C\,D^{2}\,\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}\right)},&v\in\mathscr{V}_{Z}\end{array}

and

πv,𝒘v,𝒘v11+exp(CD2𝜽),v𝒱Y.\begin{array}[]{llllllllll}\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&\leq&\dfrac{1}{1+\exp\left(-C\,D^{2}\,\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}\right)},&v\in\mathscr{V}_{Y}.\end{array}

Since D{2,3,}D\in\{2,3,\ldots\}, we obtain

πmax1vMmax(𝒘v,𝒘v)𝒲v×𝒲vπv,𝒘v,𝒘v11+exp(CD2𝜽).\begin{array}[]{llllllllll}\pi^{\star}&\coloneqq&\underset{1\,\leq\,v\,\leq\,M}{\max}\;\;\underset{(\bm{w}_{-v},\,\bm{w}_{-v}^{\prime})\,\in\,\mathscr{W}_{-v}\times\mathscr{W}_{-v}}{\max}\;\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&\leq&\dfrac{1}{1+\exp(-C\,D^{2}\,\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty})}.\end{array}
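For Bernoulli conditionals, the inequality from Lemma 21 in S25 used above reduces to |a1,vb1,v|min{max{a0,v,b0,v},max{a1,v,b1,v}}|a_{1,v}-b_{1,v}|\leq\min\{\max\{a_{0,v},\,b_{0,v}\},\;\max\{a_{1,v},\,b_{1,v}\}\}, which can be checked numerically; a minimal Python sketch (assuming NumPy; the random probabilities are illustrative):

import numpy as np

rng = np.random.default_rng(0)
a1, b1 = rng.uniform(size=100_000), rng.uniform(size=100_000)  # P(W_v = 1 | .) under two conditionings
a0, b0 = 1.0 - a1, 1.0 - b1

tv = np.abs(a1 - b1)                                    # total variation distance of two Bernoulli laws
bound = np.minimum(np.maximum(a0, b0), np.maximum(a1, b1))
assert np.all(tv <= bound + 1e-12)
print(tv.max(), bound[np.argmax(tv)])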

Lemma 11. Consider the model of Corollary 4. If Conditions 4 and D.1 are satisfied along with either Condition 4 or Condition D.3 with ϑ[0,1/18)\vartheta\in[0,1/18), there exists an integer N0{3,4,}N_{0}\in\{3,4,\ldots\} such that, for all N>N0N>N_{0},

(𝑾)4max{N,p}2,\begin{array}[]{llllllllll}\mathbb{P}(\bm{W}\not\in\mathscr{H})&\leq&\dfrac{4}{\max\{N,\,p\}^{2}},\end{array}

where \mathscr{H} is defined in (D.8).

Proof of Lemma D.7. We prove Lemma D.7 by showing that

(i=1N𝑯i,1(𝑾)<N2(1+χ(𝜽))2)2max{N,p}2(i=1N𝑯i,2(𝑾)<c2N2(1+χ(𝜽)))2max{N,p}2.\begin{array}[]{llllllllll}\mathbb{P}\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}~<~\dfrac{N}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}}\right)&\leq&\dfrac{2}{\max\{N,\,p\}^{2}}\vskip 7.11317pt\vskip 7.11317pt\\ \mathbb{P}\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{W})\right|\!\right|_{\infty}}~<~\dfrac{c^{2}\,N}{2\,(1+\chi(\bm{\theta}^{\star}))}\right)&\leq&\dfrac{2}{\max\{N,\,p\}^{2}}.\end{array} (D.43)

To prove the first line of (D.43), we first bound (1/2)i=1N𝑯i,1(𝑾)(1/2)\sum_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}} from below. We then use Theorem 1 of \citetsupp[][p. 207]Chetal07 to concentrate i=1N𝑯i,1(𝑾)\sum_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}. Last, but not least, we show that there exists an integer N0{3,4,}N_{0}\in\{3,4,\ldots\} such that the obtained lower bound for (1/2)i=1N𝑯i,1(𝑾)(1/2)\sum_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}} is, with high probability, greater than the deviation of i=1N𝑯i,1(𝑾)\sum_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}} from its mean. The first line of (D.43) follows from combining these steps. The second line of (D.43) can be established along the same lines. A union bound then establishes the desired result:

(𝑾)(i=1N𝑯i,1(𝑾)<N2(1+χ(𝜽))2)+(i=1N𝑯i,2(𝑾)<c2N2(1+χ(𝜽)))4max{N,p}2.\begin{array}[]{llllllllll}\mathbb{P}(\bm{W}\not\in\mathscr{H})&\leq&\mathbb{P}\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}~<~\dfrac{N}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}}\right)\vskip 7.11317pt\\ &+&\mathbb{P}\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{W})\right|\!\right|_{\infty}}~<~\dfrac{c^{2}\,N}{2\,(1+\chi(\bm{\theta}^{\star}))}\right)\vskip 7.11317pt\\ &\leq&\dfrac{4}{\max\{N,\,p\}^{2}}.\end{array}

Step 1: Condition 4 implies that, for each unit i𝒫Ni\in\mathscr{P}_{N}, there exists a unit j𝒫N{i}j\in\mathscr{P}_{N}\setminus\{i\} such that 𝒩i𝒩j\mathscr{N}_{i}\cap\mathscr{N}_{j}\neq\emptyset and xj[c,C]x_{j}\in[c,\,C]. Thus, by Lemma D.7, Lemma 17 of S25, and Conditions 4 and 4, we obtain

12i=1N𝑯i,1(𝒘)12i=1N𝔼di,j(𝒁)N2(1+χ(𝜽))2N4χ(𝜽)2N12ϑ4exp(2E).\begin{array}[]{llllllllll}\dfrac{1}{2}\,\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{w})\right|\!\right|_{\infty}}&\geq&\dfrac{1}{2}\,\displaystyle\sum\limits_{i=1}^{N}\mathbb{E}\,d_{i,j}(\bm{Z})&\geq&\dfrac{N}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}}&\geq&\dfrac{N}{4\,\chi(\bm{\theta}^{\star})^{2}}\vskip 7.11317pt&\geq&\dfrac{N^{1-2\,\vartheta}}{4\,\exp(2\,E)}.\end{array}

Theorem 1 of \citetsupp[][p. 207]Chetal07 implies

(|i=1N𝑯i,1(𝑾)𝔼i=1N𝑯i,1(𝑾)|<t)12exp(2t2ΨN2|𝒟N(𝜽)|22).\begin{array}[]{llllllllll}\mathbb{P}\left(\left|\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}-\mathbb{E}\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}\right|<t\right)&\geq&1-2\exp\left(-\dfrac{2\,t^{2}}{\Psi_{N}^{2}\,{|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}^{2}}}\right).\end{array}

Choosing

tlogmax{N,p}ΨN|𝒟N(𝜽)|2\begin{array}[]{llllllllll}t&\coloneqq&\sqrt{\log\max\{N,\,p\}}\,\Psi_{N}\,{|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}}\end{array}

gives

(|i=1N𝑯i,1(𝑾)𝔼i=1N𝑯i,1(𝑾)|<logmax{N,p}ΨN|𝒟N(𝜽)|2)12max{N,p}2.\begin{array}[]{llllllllll}&&\mathbb{P}\left(\left|\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}-\mathbb{E}\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}\right|<\sqrt{\log\max\{N,\,p\}}\,\Psi_{N}\,{|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}}\right)\vskip 7.11317pt\\ &\geq&1-\dfrac{2}{\max\{N,\,p\}^{2}}.\end{array}

Next, we demonstrate that there exists an integer N1{3,4,}N_{1}\in\{3,4,\dots\} such that, for all N>N1N>N_{1},

logmax{N,p}ΨN|𝒟N(𝜽)|2N12ϑ4exp(2E).\begin{array}[]{llllllllll}\sqrt{\log\max\{N,\,p\}}\,\Psi_{N}\,{|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&\dfrac{N^{1-2\,\vartheta}}{4\,\exp(2\,E)}.\end{array}

To do so, we bound the three terms one by one. Using max{N,p}=N+2\max\{N,\,p\}=N+2, the first term, logmax{N,p}\sqrt{\log\max\{N,\,p\}}, is bounded above by logmax{N,p} 2logN\sqrt{\log\max\{N,\,p\}}\,\leq\,2\,\sqrt{\log N} provided N2N\geq 2. The second term is bounded above by ΨNDN\Psi_{N}~\leq~D\,\sqrt{N} as shown in the proof of Lemma 14 in S25. The third term is bounded above by |𝒟N(𝜽)|2<C3{|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}}<C_{3} by Lemma D.6, where C3>0C_{3}>0 is a constant.

Combining these results gives

2NlogNC3DN1ϑ4exp(E)8C3Dexp(E)N12ϑlogN.\begin{array}[]{llllllllll}2\,\sqrt{N\,\log N}\,C_{3}\,D&\leq&\dfrac{N^{1-\vartheta}}{4\,\exp(E)}\vskip 7.11317pt\\ 8\,C_{3}\,D\,\exp(E)&\leq&\sqrt{\dfrac{N^{1-2\,\vartheta}}{\log N}}.\end{array}

Because the right-hand side of the second inequality diverges as NN\to\infty under Conditions 4 and D.3 with ϑ[0, 1/18)\vartheta\in[0,\,1/18), both inequalities hold for all N>N1N>N_{1}. Similar to the proof of Lemma 14 in S25, this implies

(i=1N𝑯i,1(𝑾)N2(1+χ(𝜽))2)12max{N,p}2.\begin{array}[]{llllllllll}\mathbb{P}\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}~\geq~\dfrac{N}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}}\right)&\geq&1-\dfrac{2}{\max\{N,\,p\}^{2}}.\end{array}

Step 2: Conditions 4 and 4 along with Lemma D.7 establish

12i=1N𝑯i,2(𝒘)c22i=1N𝔼Zi,jc2N2(1+χ(𝜽))c2N4χ(𝜽)c2N1ϑ4exp(E).\begin{array}[]{llllllllll}\dfrac{1}{2}\,\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{w})\right|\!\right|_{\infty}}&\geq&\dfrac{c^{2}}{2}\,\displaystyle\sum\limits_{i=1}^{N}\mathbb{E}\,Z_{i,j}&\geq&\dfrac{c^{2}\,N}{2\,(1+\chi(\bm{\theta}^{\star}))}&\geq&\dfrac{c^{2}\,N}{4\,\chi(\bm{\theta}^{\star})}&\geq&\dfrac{c^{2}\,N^{1-\vartheta}}{4\,\exp(E)}.\end{array}

Once more, we invoke Theorem 1 of \citetsupp[][p. 207]Chetal07 to obtain

(|i=1N𝑯i,2(𝑾)𝔼i=1N𝑯i,2(𝑾)|<logmax{N,p}ΨN|𝒟N(𝜽)|2)12max{N,p}2.\begin{array}[]{llllllllll}&&\mathbb{P}\left(\left|\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{W})\right|\!\right|_{\infty}}-\mathbb{E}\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{W})\right|\!\right|_{\infty}}\right|\,<\,\sqrt{\log\max\{N,\,p\}}\,\Psi_{N}\,{|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}}\right)\vskip 7.11317pt\\ &\geq&1-\dfrac{2}{\max\{N,\,p\}^{2}}.\end{array}

We proceed by showing that there exists an integer N2{3,4,}N_{2}\in\{3,4,\dots\} such that, for all N>N2N>N_{2},

logmax{N,p}ΨN|𝒟N(𝜽)|2c2N1ϑ4exp(E).\begin{array}[]{llllllllll}\sqrt{\log\max\{N,\,p\}}\,\Psi_{N}\,{|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&\dfrac{c^{2}\,N^{1-\vartheta}}{4\,\exp(E)}.\end{array} (D.44)

We bound the three terms on the left-hand side of (D.44) one by one. The bounds on the first term, logmax{N,p}\sqrt{\log\max\{N,\,p\}}, and third term, |𝒟N(𝜽)|2{|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}}, are the same as in the first step. With regard to the second term, we obtain ΨNC2N\Psi_{N}~\leq~C_{2}\,\sqrt{N} by the proof of Lemma D.5 with C2>0C_{2}>0.

Combining these bounds gives

2NlogNC3c2N1ϑ4exp(E)8c2C3exp(E)N12ϑlogN,\begin{array}[]{llllllllll}2\,\sqrt{N\,\log N}\,C_{3}&\leq&\dfrac{c^{2}\,N^{1-\vartheta}}{4\,\exp(E)}\vskip 7.11317pt\\ \dfrac{8}{c^{2}}\,C_{3}\,\exp(E)&\leq&\sqrt{\dfrac{N^{1-2\,\vartheta}}{\log N}},\end{array}

where the right-hand side of the second inequality diverges as NN\rightarrow\infty under Conditions 4 and D.3 with ϑ[0, 1/18)\vartheta\in[0,\,1/18), so that the inequality holds for all sufficiently large NN. Thus, for all N>N2N>N_{2},

(i=1N𝑯i,2(𝑾)c2N2(1+χ(𝜽)))12max{N,p}2.\begin{array}[]{llllllllll}\mathbb{P}\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{W})\right|\!\right|_{\infty}}~\geq~\dfrac{c^{2}\,N}{2\,(1+\chi(\bm{\theta}^{\star}))}\right)&\geq&1-\dfrac{2}{\max\{N,\,p\}^{2}}.\end{array}

Appendix E Quasi-Newton Acceleration

The two-step algorithm described in Section 3.2 alternates between two steps:

  • Step 1: Update 𝜽1(t)\bm{\theta}_{1}^{(t)} given 𝜽2(t1)\bm{\theta}_{2}^{(t-1)} using an MM algorithm with a linear convergence rate (\citealpsuppbohning_monotonicity_1988, Theorem 4.1).

  • Step 2: Update 𝜽2(t)\bm{\theta}_{2}^{(t)} given 𝜽1(t)\bm{\theta}_{1}^{(t)} using a Newton-Raphson update with a quadratic convergence rate.

To accelerate Step 1, we use quasi-Newton methods (\citealpsupplange_optimization_2000): We approximate the difference between (𝑨)1(\bm{A}^{\star})^{-1} and [𝑨(𝜽(t))]1[\bm{A}(\bm{\theta}^{(t)})]^{-1}, defined in Lemma 3.2 and Equation (9), respectively, by rank-one updates.

A first-order Taylor approximation of 𝜽1(𝜽1,𝜽2(t))\nabla_{\bm{\theta}_{1}}\,\ell(\bm{\theta}_{1},\bm{\theta}_{2}^{(t)}) around 𝜽1(t)\bm{\theta}_{1}^{(t)} shows that

\displaystyle-\left[\bm{A}(\bm{\theta}^{(t)})\right]^{-1}\bm{k}(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{1};\,\bm{\theta}_{2}^{(t)})\;\approx\;\bm{\theta}_{1}^{(t)}-\bm{\theta}_{1}, (E.1)

where

𝒌(𝜽1(t),𝜽1;𝜽2(t))𝜽1(𝜽1,𝜽2(t))|𝜽1=𝜽1(t)𝜽1(𝜽1,𝜽2(t)).\begin{array}[]{llllllllll}\bm{k}(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{1};\,\bm{\theta}_{2}^{(t)})&\coloneqq&\nabla_{\bm{\theta}_{1}}\,\ell(\bm{\theta}_{1},\bm{\theta}_{2}^{(t)})\Big{|}_{\bm{\theta}_{1}=\bm{\theta}_{1}^{(t)}}-\nabla_{\bm{\theta}_{1}}\,\ell(\bm{\theta}_{1},\bm{\theta}_{2}^{(t)}).\end{array} (E.2)
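
To make the approximation (E.1) explicit, note that \bm{A}(\bm{\theta}^{(t)}), defined in Equation (9), plays the role of the negative Hessian of \ell(\,\cdot\,,\bm{\theta}_{2}^{(t)}) with respect to \bm{\theta}_{1} at \bm{\theta}_{1}^{(t)} (compare the directed case in Appendix F). A first-order expansion then gives

\begin{array}[]{lll}\nabla_{\bm{\theta}_{1}}\,\ell(\bm{\theta}_{1},\bm{\theta}_{2}^{(t)})&\approx&\nabla_{\bm{\theta}_{1}}\,\ell(\bm{\theta}_{1},\bm{\theta}_{2}^{(t)})\Big{|}_{\bm{\theta}_{1}=\bm{\theta}_{1}^{(t)}}-\bm{A}(\bm{\theta}^{(t)})\,(\bm{\theta}_{1}-\bm{\theta}_{1}^{(t)}),\end{array}

so that \bm{k}(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{1};\,\bm{\theta}_{2}^{(t)})\approx\bm{A}(\bm{\theta}^{(t)})\,(\bm{\theta}_{1}-\bm{\theta}_{1}^{(t)}); premultiplying by -[\bm{A}(\bm{\theta}^{(t)})]^{-1} yields (E.1).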

Since a standard Newton-Raphson algorithm corresponds to (E.1) with 𝜽1=𝜽1(t1)\bm{\theta}_{1}=\bm{\theta}_{1}^{(t-1)}, the change in consecutive estimates carries information on [𝑨(𝜽(t))]1[\bm{A}(\bm{\theta}^{(t)})]^{-1}, which we want to approximate. More specifically, we approximate the difference between (𝑨)1\left(\bm{A}^{\star}\right)^{-1} and [𝑨(𝜽(t))]1[\bm{A}(\bm{\theta}^{(t)})]^{-1}. Thus we write (𝑨)1[𝑨(𝜽(t))]1𝑴(t)\left(\bm{A}^{\star}\right)^{-1}-\left[\bm{A}(\bm{\theta}^{(t)})\right]^{-1}\eqqcolon\bm{M}^{(t)} and set 𝜽1=𝜽1(t1)\bm{\theta}_{1}=\bm{\theta}_{1}^{(t-1)}, so that (E.1) becomes

𝑴(t)𝒌(𝜽1(t),𝜽1(t1);𝜽2(t))=(𝜽1(t)𝜽1(t1))+(𝑨)1𝒌(𝜽1(t),𝜽1(t1);𝜽2(t))𝒓(t),\begin{array}[]{llllllllll}\bm{M}^{(t)}\,\bm{k}(\bm{\theta}^{(t)}_{1},\bm{\theta}_{1}^{(t-1)};\bm{\theta}_{2}^{(t)})&=&(\bm{\theta}_{1}^{(t)}-\bm{\theta}_{1}^{(t-1)})+\left(\bm{A}^{\star}\right)^{-1}\,\bm{k}(\bm{\theta}^{(t)}_{1},\bm{\theta}_{1}^{(t-1)};\bm{\theta}_{2}^{(t)})\eqqcolon\bm{r}^{(t)},\end{array} (E.3)

which is called the inverse secant condition for updating \bm{M}^{(t)}. Since (E.3) relates [\bm{A}(\bm{\theta}^{(t)})]^{-1} to the score functions through the definition of \bm{k}(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{1}^{(t-1)};\,\bm{\theta}_{2}^{(t)}) in (E.2) and to the estimates \bm{\theta}_{1}^{(t)} and \bm{\theta}_{1}^{(t-1)}, we base the update of \bm{M}^{(t)} on (E.3). We employ the parsimonious symmetric rank-one update of \citetsuppdavidon_variable_1991 to satisfy (E.3), updating \bm{M}^{(t)} as follows:

𝑴(t)=𝑴(t1)+𝒒(t)(𝒒(t))[c(t)]1,\begin{array}[]{llllllllll}\bm{M}^{(t)}&=&\bm{M}^{(t-1)}+\bm{q}^{(t)}\left(\bm{q}^{(t)}\right)^{\top}\left[c^{(t)}\right]^{-1},\end{array} (E.4)

with \bm{q}^{(t)}\coloneqq\bm{r}^{(t)}-\bm{M}^{(t-1)}\,\bm{k}(\bm{\theta}_{1}^{(t)},\bm{\theta}_{1}^{(t-1)};\,\bm{\theta}_{2}^{(t)}) and c^{(t)}\coloneqq(\bm{q}^{(t)})^{\top}\,\bm{k}(\bm{\theta}_{1}^{(t)},\bm{\theta}_{1}^{(t-1)};\,\bm{\theta}_{2}^{(t)}). We seed the algorithm with the MM update described in Section 3.2 by setting \bm{M}^{(0)}=\bm{0}, the N\times N null matrix.
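
For completeness, the update (E.4) indeed satisfies the inverse secant condition (E.3): writing \bm{k}\coloneqq\bm{k}(\bm{\theta}_{1}^{(t)},\bm{\theta}_{1}^{(t-1)};\,\bm{\theta}_{2}^{(t)}),

\begin{array}[]{lll}\bm{M}^{(t)}\,\bm{k}&=&\bm{M}^{(t-1)}\,\bm{k}+\bm{q}^{(t)}\,\dfrac{(\bm{q}^{(t)})^{\top}\bm{k}}{c^{(t)}}\;=\;\bm{M}^{(t-1)}\,\bm{k}+\bm{q}^{(t)}\;=\;\bm{r}^{(t)},\end{array}

because c^{(t)}=(\bm{q}^{(t)})^{\top}\bm{k} and \bm{q}^{(t)}=\bm{r}^{(t)}-\bm{M}^{(t-1)}\,\bm{k}.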

In summary, the quasi-Newton acceleration of the MM algorithm updates \bm{\theta}_{1}^{(t)} to \bm{\theta}_{1}^{(t+1)}, given \bm{\theta}_{1}^{(t-1)} and \bm{\theta}_{2}^{(t)}, as follows:

  1. Step 1: Calculate 𝒌(𝜽1(t),𝜽1(t1);𝜽2(t))\bm{k}(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{1}^{(t-1)};\,\bm{\theta}_{2}^{(t)}) defined in (E.2).

  2. Step 2: Update 𝑴(t)\bm{M}^{(t)} according to (E.4).

  3. Step 3: Update 𝜽1(t+1)\bm{\theta}_{1}^{(t+1)} from 𝜽1(t)\bm{\theta}_{1}^{(t)}:

    𝜽1(t+1)=𝜽1(t)+[(𝑨)1𝑴(t)][𝜽1(𝜽1,𝜽2(t))|𝜽1=𝜽1(t)].\displaystyle\bm{\theta}_{1}^{(t+1)}=\bm{\theta}_{1}^{(t)}+\left[\left(\bm{A}^{\star}\right)^{-1}-\bm{M}^{(t)}\right]\left[\nabla_{\bm{\theta}_{1}}\,\ell(\bm{\theta}_{1},\bm{\theta}_{2}^{(t)})\Big{|}_{\bm{\theta}_{1}=\bm{\theta}_{1}^{(t)}}\right]. (E.5)

Unlike the unaccelerated MM algorithm, the described quasi-Newton algorithm does not guarantee that \ell(\bm{\theta}_{1}^{(t+1)},\,\bm{\theta}_{2}^{(t)})\,\geq\,\ell(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{2}^{(t)}). Therefore, \bm{\theta}_{1}^{(t+1)} is set to either the quasi-Newton update (E.5) or the MM update (10), whichever gives rise to the higher pseudo-likelihood. The resulting updates slightly increase the computing time per iteration while potentially decreasing the total number of iterations dramatically.
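
As a minimal illustration (not the authors' implementation), one accelerated update of \bm{\theta}_{1} can be sketched in Python as follows. The callables grad_theta1 (the gradient \nabla_{\bm{\theta}_{1}}\ell), pll (the pseudo-loglikelihood \ell), and mm_update (the MM update (10)), as well as the precomputed inverse of \bm{A}^{\star}, are assumed to be supplied by the user; their names are hypothetical, and the tolerance guard on c^{(t)} is a standard numerical safeguard for symmetric rank-one updates rather than part of the description above.

\begin{verbatim}
import numpy as np

def qn_mm_step(theta1, theta1_prev, theta2, grad_theta1, pll, mm_update,
               A_star_inv, M):
    """One quasi-Newton-accelerated update of theta1, given theta2."""
    # (E.2) with theta1 = theta1^(t-1): difference of gradients.
    k = grad_theta1(theta1, theta2) - grad_theta1(theta1_prev, theta2)
    # Right-hand side of the inverse secant condition (E.3).
    r = (theta1 - theta1_prev) + A_star_inv @ k
    # Symmetric rank-one update (E.4).
    q = r - M @ k
    c = q @ k
    if abs(c) > 1e-12:                      # numerical safeguard
        M = M + np.outer(q, q) / c
    # Quasi-Newton proposal (E.5).
    theta1_qn = theta1 + (A_star_inv - M) @ grad_theta1(theta1, theta2)
    # Fall back to the MM update if the proposal does not give a higher
    # pseudo-loglikelihood, preserving the ascent property of Step 1.
    theta1_mm = mm_update(theta1, theta2)
    if pll(theta1_qn, theta2) >= pll(theta1_mm, theta2):
        return theta1_qn, M
    return theta1_mm, M
\end{verbatim}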

Appendix F MM Algorithm: Directed Connections

If connections are directed, Zi,jZ_{i,j} may differ from Zj,iZ_{j,i}. In such cases, the pseudo-loglikelihood can be written as

\begin{array}[]{llllllllll}\ell(\bm{\theta})&\coloneqq&\displaystyle\sum\limits_{i=1}^{N}\ell_{i}(\bm{\theta})+\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=1,\,j\neq i}^{N}\ell_{i,j}(\bm{\theta}),\end{array}

where i\ell_{i} and i,j\ell_{i,j} are defined by

i(𝜽)logp𝜽(yi𝒚i,𝒛)andi,j(𝜽)logp𝜽(zi,j𝒚,𝒛{i,j}).\begin{array}[]{llllllllll}\ell_{i}(\bm{\theta})\ \coloneqq\ \log\,p_{\bm{\theta}}(y_{i}\mid\bm{y}_{-i},\,\bm{z})\quad\mbox{and}\quad\ell_{i,j}(\bm{\theta})\ \coloneqq\ \log\,p_{\bm{\theta}}(z_{i,j}\mid\bm{y},\,\bm{z}_{-\{i,j\}}).\end{array}
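
As a small illustration, the directed pseudo-loglikelihood sums one term per unit and one term per ordered pair; the sketch below is in Python and assumes user-supplied callables log_p_y_cond and log_p_z_cond (hypothetical names) that evaluate \log\,p_{\bm{\theta}}(y_{i}\mid\bm{y}_{-i},\,\bm{z}) and \log\,p_{\bm{\theta}}(z_{i,j}\mid\bm{y},\,\bm{z}_{-\{i,j\}}), respectively.

\begin{verbatim}
def pseudo_loglik(y, z, log_p_y_cond, log_p_z_cond):
    """Directed pseudo-loglikelihood: sum of log full conditionals."""
    N = len(y)
    # Attribute terms: one per unit i.
    total = sum(log_p_y_cond(i, y, z) for i in range(N))
    # Connection terms: one per ordered pair (i, j) with i != j.
    total += sum(log_p_z_cond(i, j, y, z)
                 for i in range(N) for j in range(N) if i != j)
    return total
\end{verbatim}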

We partition the parameter vector 𝜽(𝜽1,𝜽2)2N+12\bm{\theta}\coloneqq(\bm{\theta}_{1},\,\bm{\theta}_{2})\in\mathbb{R}^{2N+12} into

  • the nuisance parameter vector: 𝜽1(α𝒵,O,1\bm{\theta}_{1}\coloneqq(\alpha_{\mathscr{Z},O,1}, \dots, α𝒵,O,N,α𝒵,I,1\alpha_{\mathscr{Z},O,N},\alpha_{\mathscr{Z},I,1}, \dots, α𝒵,I,N1)2N1\alpha_{\mathscr{Z},I,N-1})\in\mathbb{R}^{2\,N-1};

  • the parameter vector of primary interest: 𝜽2(α𝒴,\bm{\theta}_{2}\coloneqq(\alpha_{\mathscr{Y}}, β𝒳,𝒴,1,β𝒳,𝒴,2,\,\beta_{\mathscr{X},\mathscr{Y},1},\,\beta_{\mathscr{X},\mathscr{Y},2},\, β𝒳,𝒴,3,\beta_{\mathscr{X},\mathscr{Y},3}, λ,γ𝒵,𝒵,1\,\lambda,\,\gamma_{\mathscr{Z},\mathscr{Z},1}, γ𝒵,𝒵,2\gamma_{\mathscr{Z},\mathscr{Z},2}, γ𝒳,𝒵,1\gamma_{\mathscr{X},\mathscr{Z},1}, γ𝒳,𝒵,2\gamma_{\mathscr{X},\mathscr{Z},2}, γ𝒳,𝒵,3\gamma_{\mathscr{X},\mathscr{Z},3}, γ𝒳,𝒵,4\gamma_{\mathscr{X},\mathscr{Z},4}, γ𝒴,𝒵\gamma_{\mathscr{Y},\mathscr{Z}}, γ𝒳,𝒴,𝒵)13\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}})\in\mathbb{R}^{13}.

As explained in Section 6.1, \alpha_{\mathscr{Z},I,N} is set to 0 to address identifiability issues. The negative Hessian is partitioned accordingly:

\begin{array}[]{llllllllll}-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta})&\coloneqq&\begin{pmatrix}\bm{A}(\bm{\theta})&\bm{B}(\bm{\theta})\\ \bm{B}(\bm{\theta})^{\top}&\bm{C}(\bm{\theta})\end{pmatrix},\end{array}

where 𝑨(𝜽)(2N1)×(2N1)\bm{A}(\bm{\theta})\in\mathbb{R}^{(2\,N-1)\times(2\,N-1)}, 𝑩(𝜽)(2N1)×13\bm{B}(\bm{\theta})\in\mathbb{R}^{(2\,N-1)\times 13}, and 𝑪(𝜽)13×13\bm{C}(\bm{\theta})\in\mathbb{R}^{13\times 13}. Writing (𝜽1,𝜽2)\ell(\bm{\theta}_{1},\bm{\theta}_{2}) in place of (𝜽)\ell(\bm{\theta}), we compute at iteration t+1t+1:

  • Step 1: Given 𝜽2(t)\bm{\theta}_{2}^{(t)}, find 𝜽1(t+1)\bm{\theta}_{1}^{(t+1)} satisfying (𝜽1(t+1),𝜽2(t))(𝜽1(t),𝜽2(t))\ell(\bm{\theta}_{1}^{(t+1)},\bm{\theta}_{2}^{(t)})\,\geq\,\ell(\bm{\theta}_{1}^{(t)},\bm{\theta}_{2}^{(t)}).

  • Step 2: Given 𝜽1(t+1)\bm{\theta}_{1}^{(t+1)}, find 𝜽2(t+1)\bm{\theta}_{2}^{(t+1)} satisfying (𝜽1(t+1),𝜽2(t+1))(𝜽1(t+1),𝜽2(t))\ell(\bm{\theta}_{1}^{(t+1)},\bm{\theta}_{2}^{(t+1)})\,\geq\,\ell(\bm{\theta}_{1}^{(t+1)},\bm{\theta}_{2}^{(t)}).

In Step 1, it is inconvenient to invert the high-dimensional (2N1)×(2N1)(2\,N-1)\times(2\,N-1) matrix

𝑨(𝜽(t))i=1Nj=1,jiN𝜽12i,j(𝜽1,𝜽2(t))|𝜽1=𝜽1(t)=i=1Nj=1,jiNπi,j(t)(1πi,j(t))𝒆i,j𝒆i,j.\begin{array}[]{llllllllll}\bm{A}(\bm{\theta}^{(t)})&\coloneqq&-\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=1,\,j\neq i}^{N}\,\nabla_{\bm{\theta}_{1}}^{2}\,\ell_{i,j}(\bm{\theta}_{1},\bm{\theta}_{2}^{(t)})\Big{|}_{\bm{\theta}_{1}=\bm{\theta}_{1}^{(t)}}&=&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=1,\,j\neq i}^{N}\pi_{i,j}^{(t)}\,(1-\pi_{i,j}^{(t)})\,\bm{e}_{i,j}\,\bm{e}_{i,j}^{\top}.\end{array}

Note that the definition of the vector \bm{e}_{i,j}\in\mathbb{R}^{2\,N-1} differs from the undirected case described in Section 3.2. For j\neq N, let \bm{e}_{i,j} be the (2\,N-1)-vector whose ith and (j+N)th coordinates are 1 and whose other coordinates are 0. For j=N, let \bm{e}_{i,j} be the (2\,N-1)-vector whose ith coordinate is 1 and whose other coordinates are 0. Along the lines of the MM algorithm for undirected connections described in Section 3.2, we increase \ell by maximizing a minorizing function of \ell, replacing \bm{A}(\bm{\theta}^{(t)}) by a constant matrix \bm{A}^{\star} that is more convenient to invert. The constant matrix \bm{A}^{\star} is defined as

𝑨(𝑨1,1𝑨1,2(𝑨1,2)𝑨2,2)\begin{array}[]{llllllllll}\bm{A}^{\star}&\coloneqq&\begin{pmatrix}\bm{A}_{1,1}^{\star}&\bm{A}_{1,2}^{\star}\\ \left(\bm{A}_{1,2}^{\star}\right)^{\top}&\bm{A}_{2,2}^{\star}\\ \end{pmatrix}\end{array}

where

  • 𝑨1,1N×N\bm{A}_{1,1}^{\star}\in\mathbb{R}^{N\times N} and 𝑨2,2(N1)×(N1)\bm{A}_{2,2}^{\star}\in\mathbb{R}^{(N-1)\times(N-1)} are diagonal matrices with elements (N1)/4(N-1)/4 on the main diagonal;

  • 𝑨1,2N×(N1)\bm{A}_{1,2}^{\star}\in\mathbb{R}^{N\times(N-1)} is a matrix with vanishing elements on its main diagonal and off-diagonal elements 1/41/4.

Applying Theorem 8.5.11 in \citetsuppharville_matrix_1997 to \bm{A}_{1,2}^{\star} and \bm{A}^{\star} shows that \bm{A}^{\star} can be inverted in O(N) operations. With the above change in the constant matrix \bm{A}^{\star}, we estimate \bm{\theta} along the lines of Section 3.2.
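
To make the directed-case matrices concrete, the following Python sketch constructs \bm{e}_{i,j}, accumulates \bm{A}(\bm{\theta}^{(t)}) from an N\times N array of conditional connection probabilities \pi_{i,j}^{(t)}, and builds \bm{A}^{\star}; the dense construction and the generic matrix inverse are for illustration only, whereas the O(N) inversion referred to above exploits the block structure of \bm{A}^{\star}.

\begin{verbatim}
import numpy as np

def e_vec(i, j, N):
    """Basis vector e_{i,j} in R^(2N-1) for directed connections (1-based i, j)."""
    e = np.zeros(2 * N - 1)
    e[i - 1] = 1.0               # coordinate i (sender i)
    if j != N:
        e[N + j - 1] = 1.0       # coordinate j + N (receiver j), absent for j = N
    return e

def A_of_theta(pi):
    """A(theta^(t)) = sum over i != j of pi_ij (1 - pi_ij) e_ij e_ij^T (naive loop)."""
    N = pi.shape[0]
    A = np.zeros((2 * N - 1, 2 * N - 1))
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            if i != j:
                e = e_vec(i, j, N)
                A += pi[i - 1, j - 1] * (1.0 - pi[i - 1, j - 1]) * np.outer(e, e)
    return A

def A_star(N):
    """Constant surrogate A*: diagonal blocks (N-1)/4 I, off-diagonal block with
    zeros on its main diagonal and 1/4 elsewhere."""
    A11 = (N - 1) / 4.0 * np.eye(N)
    A22 = (N - 1) / 4.0 * np.eye(N - 1)
    A12 = np.full((N, N - 1), 0.25)
    np.fill_diagonal(A12, 0.0)
    return np.block([[A11, A12], [A12.T, A22]])

# Small example; the dense inverse is O(N^3) and shown for illustration only.
N = 5
pi = np.random.default_rng(0).uniform(0.05, 0.95, size=(N, N))
A_t = A_of_theta(pi)
A_star_inv = np.linalg.inv(A_star(N))
\end{verbatim}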

Appendix G Hate Speech on X: Additional Information

G.1 Data

For the application, we use posts of N=2,191 U.S. state legislators on the social media platform X, collected by \citetsuppkim_attention_2022 in the six months leading up to and including the insurrection at the United States Capitol on January 6, 2021. We restrict attention to active legislators, that is, legislators who posted during the aforementioned period and mentioned or reposted content from other active legislators. Since reposts do not necessarily reflect politicians' opinions, we exclude all reposts and non-unique posts that are direct copies of other users' messages when constructing the responses. Employing pre-trained large language models of \citetsuppcamacho-collados_tweetnlp_2022, we categorize the 109,974 posts into those containing hate speech statements and those that do not: the algorithm assigns each post a continuous score between 0 and 1, and we classify a post as hate speech if its score exceeds 0.5. Accordingly, the binary attribute Y_{i} equals 1 if the corresponding legislator sent at least one post classified as hate speech and 0 otherwise. The attribute x_{i,1}\in\{0,1\} is 1 if legislator i is a Republican and 0 otherwise. In addition, we incorporate information on each legislator's gender (x_{i,2}=1 if legislator i is female and 0 otherwise), race (x_{i,3}=1 if legislator i is white and 0 otherwise), and state (x_{i,4}). On the social media platform X, users can mention or repost other users' posts. The resulting network, denoted by \bm{Z}, is based on the mentions and reposts exchanged between January 6, 2020 and January 6, 2021: Z_{i,j}=1 if legislator i mentioned or reposted legislator j in a post, and Z_{i,j}=0 otherwise.
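
As a rough illustration of the data construction (not the authors' processing code), the responses Y_{i} and the network \bm{Z} can be assembled from post-level records as sketched below; the toy data frame and its column names (author, mentioned, hate_score) are hypothetical stand-ins for the data of \citetsuppkim_attention_2022 and the classifier scores of \citetsuppcamacho-collados_tweetnlp_2022.

\begin{verbatim}
import numpy as np
import pandas as pd

# Toy stand-in for the post-level data; column names are hypothetical.
posts = pd.DataFrame({
    "author":     ["A", "A", "B", "C"],
    "mentioned":  [["B"], [], ["A", "C"], ["B"]],
    "hate_score": [0.10, 0.70, 0.20, 0.40],
})
legislators = sorted(posts["author"].unique())
idx = {name: k for k, name in enumerate(legislators)}
N = len(legislators)

# Y_i = 1 if legislator i sent at least one post with classifier score > 0.5.
Y = np.zeros(N, dtype=int)
for _, row in posts.iterrows():
    if row["hate_score"] > 0.5:
        Y[idx[row["author"]]] = 1

# Z_{i,j} = 1 if legislator i mentioned or reposted legislator j.
Z = np.zeros((N, N), dtype=int)
for _, row in posts.iterrows():
    i = idx[row["author"]]
    for other in row["mentioned"]:
        if other in idx and other != row["author"]:
            Z[i, idx[other]] = 1
\end{verbatim}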

G.2 Plots

In addition to the goodness-of-fit checks reported in Section 6, we assess whether the model preserves salient characteristics of connections \bm{Z}. Figure 5 suggests that the proposed model captures the shared partner distribution, i.e., the numbers of connected pairs of legislators \{i,j\}\subset\mathscr{P}_{N} with 1, 2, \dots shared partners.
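
For reference, the observed shared partner distribution can be computed from the adjacency matrix as sketched below; the sketch symmetrizes the directed mention/repost network, treating i and j as connected if either mentioned or reposted the other, which is our reading of how pairs are formed.

\begin{verbatim}
import numpy as np
from collections import Counter

def shared_partner_distribution(Z):
    """Counts of connected pairs {i, j} by their number of shared partners."""
    A = ((Z + Z.T) > 0).astype(int)    # symmetrize the directed network
    np.fill_diagonal(A, 0)
    counts = Counter()
    N = A.shape[0]
    for i in range(N):
        for j in range(i + 1, N):
            if A[i, j]:
                counts[int(A[i] @ A[j])] += 1   # common neighbors of i and j
    return counts
\end{verbatim}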

Figure 5: Hate speech on X: The red line indicates the observed shared partners distribution of the network of repost and mention interactions of U.S. legislators, while the boxplots represent the shared partners distributions of simulated networks from the estimated model.
\bibliographystylesupp{chicago}
\bibliographysupp{base}