
A Regression Framework for Studying Relationships among Attributes under Network Interference

Cornelius Fritz
Michael Schweinberger
Subhankar Bhadra
David R. Hunter
Department of Statistics, The Pennsylvania State University Corresponding author.

Abstract

To understand how the interconnected and interdependent world of the twenty-first century operates and make model-based predictions, joint probability models for networks and interdependent outcomes are needed. We propose a comprehensive regression framework for networks and interdependent outcomes with multiple advantages, including interpretability, scalability, and provable theoretical guarantees. The regression framework can be used for studying relationships among attributes of connected units and captures complex dependencies among connections and attributes, while retaining the virtues of linear regression, logistic regression, and other regression models by being interpretable and widely applicable. On the computational side, we show that the regression framework is amenable to scalable statistical computing based on convex optimization of pseudo-likelihoods using minorization-maximization methods. On the theoretical side, we establish convergence rates for pseudo-likelihood estimators based on a single observation of dependent connections and attributes. We demonstrate the regression framework using simulations and an application to hate speech on the social media platform X.

Keywords: Dependent Data, Generalized Linear Models, Minorization-Maximization, Pseudo-Likelihood

1 Introduction

In the interconnected and interdependent world of the twenty-first century, individual and collective outcomes—such as personal and public health, economic welfare, or war and peace—are affected by relationships among individual, corporate, state, and non-state actors. To understand how the world of the twenty-first century operates and make model-based predictions, it is vital to study networks of relationships and gain insight into how the structure of networks affects individual and collective outcomes.

While the structure of networks has been widely studied (see Kolaczyk, 2017, and references therein), the structure of networks is rarely of primary interest. Instead, we often wish to understand how networks affect individual or collective outcomes. For example, social, economic, and financial relationships among individual and corporate actors can affect the welfare of people, but the outcome of primary interest is the welfare of billions of people around the world. Relationships among state and non-state actors can affect war and peace, but the outcome of primary interest is the welfare of nations. Contact networks mediate the spread of infectious diseases, but the outcome of primary interest is public health. A final example is causal inference under network interference: If the outcomes of units are affected by the treatments or outcomes of other units, the spillover effect of treatments on outcomes can be represented by an intervention network, but the target of statistical inference is the direct and indirect causal effects of treatments on outcomes.

To learn how networks are wired and how the structure of networks affects outcomes of interest, data on outcomes $\bm{Y}\coloneqq(Y_{i})_{i=1}^{N}$ and connections $\bm{Z}\coloneqq(Z_{i,j})_{i,j}^{N}$ among $N$ units are needed along with predictors $\bm{X}\coloneqq(\bm{X}_{i})_{i=1}^{N}$ . Statistical work on joint probability models for dependent outcomes and connections $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ is scarce. Snijders et al. (2007) and Niezink and Snijders (2017) develop models for behavioral outcomes and connections using continuous-time Markov processes, assuming that the behavioral outcomes and connections are observed at two or more time points. Wang et al. (2024) combine Ising models for binary outcomes with exponential family models for binary connections, with applications to causal inference (Clark and Handcock, 2024). In a Bayesian framework, Fosdick and Hoff (2015) unite models for continuous outcomes with latent variable models that capture dependencies among connections. A common limitation of these approaches is that they may be useful in small populations with, say, hundreds of members, but less useful in large populations with, say, thousands or millions of members. For example, many of these models make dependence assumptions that are reasonable in small populations but are less reasonable in large populations. In the special case of exponential-family models, it is known that models that make unreasonable dependence assumptions can give rise to undesirable probabilistic and statistical behavior in large populations, such as model near-degeneracy (Handcock, 2003; Schweinberger, 2011; Chatterjee and Diaconis, 2013). In addition, these works rely on Monte Carlo and Markov chain Monte Carlo methods for moment- and likelihood-based inference, which limits their scalability. Last, but not least, the theoretical properties of statistical procedures based on dependent outcomes and connections $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ , such as the convergence rates of estimators, are unknown.

While statistical work on joint probability models for $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ is scarce, recent progress has been made on conditional models for outcomes $\bm{Y}\mid(\bm{X},\bm{Z})=(\bm{x},\bm{z})$ and connections $\bm{Z}\mid\bm{X}=\bm{x}$ . For example, the literature on network-aware regression uses conditional models for outcomes $\bm{Y}\mid(\bm{X},\bm{Z})=(\bm{x},\bm{z})$ : Lei et al. (2024) assume that the dependence among outcomes decays as a function of distance in the population network, while Li et al. (2019) and Le and Li (2022) encourage outcomes of connected units to be similar. A related branch of literature, concerned with causal inference under network interference, leverages conditional models for outcomes $\bm{Y}\mid(\bm{X},\bm{Z})=(\bm{x},\bm{z})$ given treatment assignments $\bm{X}=\bm{x}$ and connections $\bm{Z}=\bm{z}$ . Some of them consider fixed connections (e.g., Tchetgen Tchetgen et al., 2021; Ogburn et al., 2024), while others combine conditional models for $\bm{Y}\mid(\bm{X},\bm{Z})=(\bm{x},\bm{z})$ with marginal models for $\bm{Z}$ , assuming that connections are independent (Li and Wager, 2022). Other works advance autoregressive network models for $\bm{Y}\mid(\bm{X},\bm{Z})=(\bm{x},\bm{z})$ (Huang et al., 2019, 2020; Zhu et al., 2020). Conditional models for connections $\bm{Z}\mid\bm{X}=\bm{x}$ include stochastic block and exponential-family models with covariates (e.g., Handcock, 2003; Huang et al., 2024; Wang et al., 2024; Stein et al., 2025).

All of the cited work is limited to special cases, such as real-valued outcomes or binary connections, rather than presenting a comprehensive regression framework for studying relationships among attributes $(\bm{X},\bm{Y})$ under network interference $\bm{Z}$ . To fill the void left by existing work, we propose a comprehensive regression framework for studying relationships among attributes $(\bm{X},\bm{Y})$ under network interference $\bm{Z}$ based on joint probability models for $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ . The proposed regression framework has important advantages over existing work, including interpretability, scalability, and provable theoretical guarantees:

1.

We show in Sections 2.1 and 2.2 that the proposed regression framework can be viewed as a generalization of linear regression, logistic regression, and other regression models for studying relationships among attributes under network interference, adding a simple and widely applicable set of tools to the toolbox of data scientists. We demonstrate the advantages of the regression framework with an application to hate speech on the social media platform X in Section 6.
2.

The proposed regression framework can be applied to small and large populations: It leverages additional structure to control the dependence among outcomes and connections, which facilitates the construction of models with complex dependencies among outcomes and connections regardless of population size.
3.

We develop scalable methods using minorization-maximization algorithms for convex optimization of pseudo-likelihoods in Section 3. To disseminate the regression framework and its scalable methods, we provide an R package.
4.

We establish theoretical guarantees for pseudo-likelihood estimators in Section 4. To the best of our knowledge, these are the first theoretical guarantees for joint probability models of $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ based on a single observation $(\bm{y},\bm{z})$ of $(\bm{Y},\bm{Z})$ . The simulation results in Section 5 demonstrate that pseudo-likelihood estimators perform well as the number of units $N$ and the number of parameters $p$ increase.

In addition, the regression framework has conceptual and statistical advantages:

5.

Compared with conditional models for outcomes $\bm{Y}\mid(\bm{X},\,\bm{Z})=(\bm{x},\,\bm{z})$ and connections $\bm{Z}\mid\bm{X}=\bm{x}$ , the proposed regression framework for $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ provides insight into outcome-connection dependencies, in addition to outcome-outcome and connection-connection dependencies.
6.

Compared with conditional models for outcomes $\bm{Y}\mid(\bm{X},\,\bm{Z})=(\bm{x},\,\bm{z})$ , conclusions based on the proposed regression framework are not limited to a specific population network $\bm{z}$ , but can be extended to the superpopulation of all possible population networks. In addition, the proposed regression framework provides insight into the probability law governing the superpopulation of all possible population networks.
7.

The proposed regression framework retains the advantages of two general approaches to building joint probability models for dependent data, elucidated in the celebrated paper by Besag (1974): Specifying a joint probability distribution directly guarantees desirable mathematical properties, while specifying it indirectly via conditional probability distributions helps build complex models from simple building blocks. We show how to directly specify a joint probability model from simple building blocks. The resulting regression framework possesses desirable mathematical properties and induces conditional distributions that can be represented by regression models, facilitating interpretation. We showcase these advantages in Sections 2.2 and 6.2.

We elaborate the proposed regression framework in the remainder of the article.

2 Regression under Network Interference

Consider a population of $N\geq 2$ units $\mathscr{P}_{N}\coloneqq\{1,\dots,N\}$ , where each unit $i\in\mathscr{P}_{N}$ possesses

•

one or more binary, count-, or real-valued predictors $\bm{X}_{i}\in\mathscr{X}_{i}$ , which may include covariates and treatment assignments;
•

binary, count-, or real-valued outcomes or responses $Y_{i}\in\mathscr{Y}_{i}$ ;
•

binary, count-, or real-valued connections $Z_{i,j}\in\mathscr{Z}_{i,j}$ to other units $j\in\mathscr{P}_{N}\setminus\,\{i\}$ , which represent indicators of connections or weights of connections (e.g., the number of interactions between $i$ and $j$ ).

We first consider undirected connections, for which $Z_{i,j}$ equals $Z_{j,i}$ , and describe extensions to directed connections in Section 6. We write $\bm{X}\coloneqq(\bm{X}_{i})_{1\leq i\leq N}$ , $\bm{Y}\coloneqq(Y_{i})_{1\leq i\leq N}$ , $\bm{Z}\coloneqq(Z_{i,j})_{1\leq i<j\leq N}$ , $\mathscr{X}\coloneqq\bigtimes_{i=1}^{N}\,\mathscr{X}_{i}$ , $\mathscr{Y}\coloneqq\bigtimes_{i=1}^{N}\,\mathscr{Y}_{i}$ , and $\mathscr{Z}\coloneqq\bigtimes_{i<j}^{N}\,\mathscr{Z}_{i,j}$ , and refer to $\bm{Y}$ without $Y_{i}$ and $\bm{Z}$ without $Z_{i,j}$ as $\bm{Y}_{-i}\in\mathscr{Y}_{-i}$ and $\bm{Z}_{-\{i,j\}}\in\mathscr{Z}_{-\{i,j\}}$ , respectively. In line with Generalized Linear Models (GLMs), we introduce a known scale parameter $\psi\in(0,+\infty)$ and define $Y_{i}^{\star}\coloneqq Y_{i}\,/\,\psi$ and $\bm{Y}_{-i}^{\star}\coloneqq\bm{Y}_{-i}\,/\,\psi$ . Throughout, $\mathbb{I}(\cdot)$ is an indicator function, which is $1$ if its argument is true and is $0$ otherwise. We write $a_{N}=O(b_{N})$ and $a_{N}=o(b_{N})$ to indicate that $|a_{N}/b_{N}|$ remains bounded and $\lim_{N\to\infty}|a_{N}/b_{N}|=0$ , respectively.

Following the bulk of the literature on regression models, we condition on predictors $\bm{X}=\bm{x}$ . To construct joint probability models for dependent responses and connections $(\bm{Y},\,\bm{Z})\mid\bm{X}=\bm{x}$ , we introduce a family of probability measures $\{\mathbb{P}_{\bm{\theta}},\,\bm{\theta}\in\bm{\Theta}\}$ dominated by a $\sigma$ -finite measure $\nu$ , with densities of the form

\begin{array}[]{llllllllll}f_{\bm{\theta}}(\bm{y},\,\bm{z}\mid\bm{x})&=&\dfrac{1}{\varphi(\bm{\theta})}\left[\displaystyle\prod\limits_{i=1}^{N}a_{\mathscr{Y}}(y_{i})\,\exp\left(\bm{\theta}_{g}^{\top}\,g_{i}(\bm{x}_{i},\,y_{i}^{\star})\right)\right]\vskip 7.11317pt\\ &\times&\left[\displaystyle\prod\limits_{i=1}^{N-1}\,\displaystyle\prod\limits_{j=i+1}^{N}a_{\mathscr{Z}}(z_{i,j})\,\exp\left(\bm{\theta}_{h}^{\top}\,h_{i,j}(\bm{x},\,y_{i}^{\star},\,y_{j}^{\star},\,\bm{z})\right)\right]:\end{array}

(1)

•

$a_{\mathscr{Y}}:\mathscr{Y}_{i}\mapsto[0,+\infty)$ and $a_{\mathscr{Z}}:\mathscr{Z}_{i,j}\mapsto[0,+\infty)$ are known functions of responses $Y_{i}$ of units $i\in\mathscr{P}_{N}$ and connections $Z_{i,j}$ of pairs of units $\{i,j\}\subset\mathscr{P}_{N}$ ;
•

$g_{i}:\mathscr{X}_{i}\times\mathscr{Y}_{i}\mapsto\mathbb{R}^{q}$ are known functions describing the relationship of predictors $\bm{x}_{i}$ and responses $Y_{i}$ of units $i\in\mathscr{P}_{N}$ , which can depend on $\psi$ ;
•

$h_{i,j}:\mathscr{X}\times\mathscr{Y}_{i}\times\mathscr{Y}_{j}\times\mathscr{Z}\mapsto\mathbb{R}^{r}$ are known functions specifying how the responses and connections of pairs of units $\{i,j\}\subset\mathscr{P}_{N}$ depend on the predictors, responses, and connections to other units, which can depend on $\psi$ ;
•

$\bm{\theta}\coloneqq(\bm{\theta}_{g},\,\bm{\theta}_{h})\in\bm{\Theta}$ is a parameter vector of dimension $p\coloneqq q+r$ , where $\bm{\Theta}\coloneqq\{\bm{\theta}\in\mathbb{R}^{p}:\varphi(\bm{\theta})<\infty\}$ and $\varphi:\bm{\Theta}\mapsto(0,+\infty]$ ensures that $\int_{\mathscr{Y}\times\mathscr{Z}}\,f_{\bm{\theta}}(\bm{y},\bm{z}\,|\,\bm{x})\mathop{\mbox{d}}\nolimits\nu(\bm{y},\bm{z})=1$ , with the dependence of $\varphi$ on $\bm{x}$ suppressed;

•

$\nu$ is a $\sigma$ -finite product measure of the form

\begin{array}[]{llllllllll}\nu(\bm{y},\bm{z})&\coloneqq&\left[\displaystyle\prod\limits_{i=1}^{N}\nu_{\mathscr{Y}}(y_{i})\right]\left[\displaystyle\prod\limits_{i=1}^{N-1}\,\displaystyle\prod\limits_{j=i+1}^{N}\nu_{\mathscr{Z}}(z_{i,j})\right],\end{array}

where $\nu_{\mathscr{Y}}$ and $\nu_{\mathscr{Z}}$ are $\sigma$ -finite measures that depend on the support sets of responses $Y_{i}$ and connections $Z_{i,j}$ (e.g., Lebesgue or counting measure).

Remark: Importance of Additional Structure. To respect real-world constraints and facilitate theoretical guarantees, joint probability models for dependent responses and connections $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ should leverage additional structure, e.g., local dependence structure. For one thing, units in large populations may not be aware of most other units in the population, so it is not credible that the responses and connections of units depend on the responses and connections of all other units in the population. In addition, models permitting strong dependence among the responses and connections of all units in the population may suffer from model near-degeneracy (Handcock, 2003; Schweinberger, 2011; Chatterjee and Diaconis, 2013). By contrast, Stewart and Schweinberger (2025) demonstrate that leveraging additional structure to control dependence can lead to theoretical guarantees. Motivated by these considerations, we assume that each unit $i\in\mathscr{P}_{N}$ has a known set of neighbors $\mathscr{N}_{i}\subset\mathscr{P}_{N}$ , which includes $i$ and is independent of connections $\bm{Z}$ , and that the dependence among responses and connections $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ is local in the sense that it is limited to overlapping neighborhoods. We provide examples of joint probability models for $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ with local dependence in Sections 2.2 and 6.1.

Remark: Fixed versus Random Design. In line with the bulk of the literature on regression models, we consider a fixed design: We view predictors $\bm{X}$ and neighborhoods $\mathscr{N}_{1},\ldots,\mathscr{N}_{N}$ as exogenous and known, and we do not make assumptions about the mechanism generating them. Thus, Equation (1) specifies the joint probability density function of responses and connections $(\bm{Y},\bm{Z})$ conditional on predictors $\bm{X}=\bm{x}$ and neighborhoods $\mathscr{N}_{1},\ldots,\mathscr{N}_{N}$ . If $\bm{X}$ and $\mathscr{N}_{1},\ldots,\mathscr{N}_{N}$ were random, the conditional model for $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x},\,\mathscr{N}_{1},\ldots,\mathscr{N}_{N}$ could be combined with marginal models for $\bm{X}$ and $\mathscr{N}_{1},\ldots,\mathscr{N}_{N}$ . In the social media application in Section 6, the neighborhoods are fixed and known: The neighborhoods of users are the sets of followees, because users choose whom to follow and hence who can influence them, and these choices are observed. If the neighborhoods were unobserved, one could view them as unobserved constants (if neighborhoods were fixed) or unobserved variables (if neighborhoods were random) and learn them from attributes $(\bm{X},\bm{Y})$ or connections $\bm{Z}$ . The problem of how to learn neighborhoods is an open problem and constitutes a promising avenue for future research.

2.1 GLM Representations

The proposed joint probability models of $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ can be viewed as generalizations of Generalized Linear Models (GLMs) (Efron, 2022). GLMs form a well-known, interpretable, and widely applicable statistical framework for univariate responses $Y_{i}\in\mathscr{Y}_{i}$ given predictors $\bm{x}_{i}\in\mathbb{R}^{d}$ ( $d\geq 1$ ), including logistic regression ( $Y_{i}\in\{0,1\}$ ), Poisson regression ( $Y_{i}\in\{0,1,\ldots\}$ ), and linear regression ( $Y_{i}\in\mathbb{R}$ ). GLMs are characterized by two properties:

1.

Conditional mean: The conditional mean $\mu_{i}(\eta_{i})\coloneqq\mathbb{E}_{\eta_{i}}(Y_{i}\mid\bm{x}_{i})$ of response $Y_{i}\in\mathscr{Y}_{i}$ , conditional on predictors $\bm{x}_{i}\in\mathbb{R}^{d}$ with weights $\bm{\beta}\in\mathbb{R}^{d}$ , is a (possibly nonlinear) function of a linear predictor $\eta_{i}\coloneqq\bm{\beta}^{\top}\bm{x}_{i}$ .

2.

Conditional distribution: The conditional distribution of response $Y_{i}$ is an exponential family distribution with a known scale parameter $\psi\in(0,\,+\infty)$ , which admits a density with respect to a $\sigma$ -finite measure $\nu_{\mathscr{Y}}$ of the form

\begin{array}[]{llllllllll}f_{\eta_{i}}(y_{i}\mid\bm{x}_{i})&\coloneqq&a_{\mathscr{Y}}(y_{i})\,\exp\left(\dfrac{\eta_{i}\,y_{i}-b_{i}(\eta_{i})}{\psi}\right),\end{array}

with cumulant-generating function

\begin{array}[]{llllllllll}b_{i}(\eta_{i})&\coloneqq&\psi\,\log\displaystyle\int\limits_{\mathscr{Y}_{i}}\,a_{\mathscr{Y}}(y)\,\exp\left(\dfrac{\eta_{i}\,y}{\psi}\right)\mathop{\mbox{d}}\nolimits\nu_{\mathscr{Y}}(y).\end{array}

The conditional mean $\mu_{i}(\eta_{i})$ can be obtained by differentiating $b_{i}(\eta_{i})$ : $\mu_{i}(\eta_{i})=\nabla_{\eta_{i}}\,b_{i}(\eta_{i})$ (Corollary 2.3, Brown, 1986, pp. 35–36).

The relationship to GLMs facilitates the interpretation and dissemination of results. The following proposition clarifies the relationship to GLMs.

Proposition 1: GLM Representation of Conditionals. Consider any pair of units $\{i,j\}\subset\mathscr{P}_{N}$ ( $i<j$ ) and assume that $g_{i}$ and $h_{i,j}$ are affine functions of $y_{i}^{\star}$ for any given $(\bm{x},\,\bm{y}_{-i},\,\bm{z})\in\mathscr{X}\times\mathscr{Y}_{-i}\times\mathscr{Z}$ , in the sense that there exist known functions $g_{i,0}:\mathscr{X}_{i}\mapsto\mathbb{R}^{q}$ , $g_{i,1}:\mathscr{X}_{i}\mapsto\mathbb{R}^{q}$ , $h_{i,j,0}:\mathscr{X}\times\mathscr{Y}_{j}\times\mathscr{Z}\mapsto\mathbb{R}^{r}$ , and $h_{i,j,1}:\mathscr{X}\times\mathscr{Y}_{j}\times\mathscr{Z}\mapsto\mathbb{R}^{r}$ such that

\begin{array}[]{llllllllll}g_{i}(\bm{x}_{i},\,y_{i}^{\star})\;\coloneqq\;g_{i,0}(\bm{x}_{i})+g_{i,1}(\bm{x}_{i})\;y_{i}^{\star}\vskip 7.11317pt\\ h_{i,j}(\bm{x},\,y_{i}^{\star},\,y_{j}^{\star},\,\bm{z})\;\coloneqq\;h_{i,j,0}(\bm{x},\,y_{j}^{\star},\,\bm{z})+h_{i,j,1}(\bm{x},\,y_{j}^{\star},\,\bm{z})\;y_{i}^{\star}.\end{array}

Then the conditional distribution of response $Y_{i}\mid(\bm{X},\,\bm{Y}_{-i},\,\bm{Z})=(\bm{x},\,\bm{y}_{-i},\,\bm{z})$ by unit $i$ can be represented by a GLM with linear predictor

\begin{array}[]{llllllllll}\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z})&\coloneqq&\bm{\theta}^{\top}\left(g_{i,1}(\bm{x}_{i}),\;\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,h_{i,j,1}(\bm{x},\,y_{j}^{\star},\,\bm{z})\right)\end{array}

and cumulant-generating function

\begin{array}[]{llllllllll}b_{i}(\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z}))&\coloneqq&\psi\,\log\displaystyle\int\limits_{\mathscr{Y}_{i}}\,a_{\mathscr{Y}}(y)\,\exp\left(\dfrac{\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z})\;y}{\psi}\right)\mathop{\mbox{d}}\nolimits\nu_{\mathscr{Y}}(y).\end{array}

To ease the notation, we henceforth write $\eta_{i}$ instead of $\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z})$ .

Proposition 1 supplies a recipe for representing the conditional distribution of responses $Y_{i}\mid(\bm{X},\,\bm{Y}_{-i},\,\bm{Z})=(\bm{x},\,\bm{y}_{-i},\,\bm{z})$ by a GLM:

1.

Conditional distribution: The conditional distribution of response $Y_{i}$ is an exponential family distribution, which can be represented by a GLM with conditional mean $\mu_{i}(\eta_{i})$ , linear predictor $\eta_{i}$ , and scale parameter $\psi$ .
2.

Conditional mean: The conditional mean $\mu_{i}(\eta_{i})\coloneqq\mathbb{E}_{\eta_{i}}(Y_{i}\mid\bm{x},\,\bm{y}_{-i},\,\bm{z})$ can be obtained by differentiating $b_{i}(\eta_{i})$ : $\mu_{i}(\eta_{i})=\nabla_{\eta_{i}}\,b_{i}(\eta_{i})$ . Since the map $\eta_{i}\mapsto\mu_{i}$ is one-to-one and invertible (Theorem 3.6, Brown, 1986, p. 74), $\eta_{i}$ can be obtained by inverting $\mu_{i}(\eta_{i})$ .

Thus, the proposed regression framework for dependent responses and connections $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ inherits the GLM advantages of being interpretable and widely applicable, without assuming that responses or connections are independent. As a result, the proposed regression framework can be viewed as a generalization of GLMs.
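A brief sketch of why this representation arises, offered as an informal reading of Equation (1) rather than a formal proof: Under the affine assumption of Proposition 1, and reading $h_{i,j}$ as $h_{j,i}$ whenever $j<i$ (an indexing convention assumed here for illustration), the only factors of Equation (1) that involve $y_{i}$ are $a_{\mathscr{Y}}(y_{i})$ , the term with $g_{i}$ , and the terms with $h_{i,j}$ for $j\neq i$ . Dropping all factors that are constant in $y_{i}$ gives

\begin{array}[]{llllllllll}f_{\bm{\theta}}(y_{i}\mid\bm{x},\,\bm{y}_{-i},\,\bm{z})&\propto&a_{\mathscr{Y}}(y_{i})\,\exp\left(\bm{\theta}_{g}^{\top}\,g_{i,1}(\bm{x}_{i})\;y_{i}^{\star}+\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,\bm{\theta}_{h}^{\top}\,h_{i,j,1}(\bm{x},\,y_{j}^{\star},\,\bm{z})\;y_{i}^{\star}\right)\vskip 7.11317pt\\ &=&a_{\mathscr{Y}}(y_{i})\,\exp\left(\dfrac{\eta_{i}\,y_{i}}{\psi}\right),\end{array}

because $y_{i}^{\star}=y_{i}\,/\,\psi$ . Normalizing over $\mathscr{Y}_{i}$ with respect to $\nu_{\mathscr{Y}}$ then yields the GLM density $a_{\mathscr{Y}}(y_{i})\,\exp((\eta_{i}\,y_{i}-b_{i}(\eta_{i}))\,/\,\psi)$ with the cumulant-generating function $b_{i}$ stated in Proposition 1.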

2.2 Example: Model Specification

We showcase how a joint probability model for dependent responses and connections $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ with local dependence can be constructed, leveraging additional structure in the form of overlapping neighborhoods $\mathscr{N}_{1},\ldots,\mathscr{N}_{N}$ to control the dependence among responses and connections in small and large populations.

We focus on units $i\in\mathscr{P}_{N}$ with binary, count-, or real-valued predictors $x_{i}\in\mathscr{X}_{i}$ and responses $Y_{i}\in\mathscr{Y}_{i}$ and binary connections $Z_{i,j}\in\{0,1\}$ . Starting with $g_{i}$ , we capture the main effect of $Y_{i}^{\star}$ and the interaction effect of $x_{i}$ and $Y_{i}^{\star}$ by specifying $g_{i}$ as follows:

\begin{array}[]{llllllllll}\bm{\theta}_{g}\,\coloneqq\,\left(\begin{array}[]{ccc}\alpha_{\mathscr{Y}}\\ \beta_{\mathscr{X},\mathscr{Y}}\end{array}\right)\,\in\,\mathbb{R}^{2},&g_{i}\,\coloneqq\,\left(\begin{array}[]{ccc}y_{i}^{\star}\\ x_{i}\,y_{i}^{\star}\end{array}\right)\,\in\,\mathbb{R}^{2}.\end{array}

(2)

Turning to $h_{i,j}$ , we define neighborhood-related terms

\begin{array}[]{llllllllll}c_{i,j}\coloneqq\mathbb{I}(\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset),\quad d_{i,j}(\bm{z})\coloneqq\mathbb{I}(\exists\;k\,\in\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,:\,z_{i,k}=z_{k,j}=1).\end{array}

(3)
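The two neighborhood-related terms in Equation (3) can be illustrated with a minimal Python sketch. The data layout is an assumption made for illustration only: neighborhoods are stored as a dictionary of sets, undirected connections as a dictionary keyed by `frozenset` pairs, and a missing pair (including the undefined self-connection $z_{i,i}$) is treated as $0$. The function names are hypothetical and are not taken from the R package mentioned in Section 1.

```python
def overlap_indicator(neighborhoods, i, j):
    """c_{i,j}: 1 if the neighborhoods of i and j intersect, 0 otherwise."""
    return int(bool(neighborhoods[i] & neighborhoods[j]))


def shared_partner_indicator(neighborhoods, z, i, j):
    """d_{i,j}(z): 1 if some k in N_i ∩ N_j satisfies z_{i,k} = z_{k,j} = 1."""
    common = neighborhoods[i] & neighborhoods[j]
    return int(any(
        # Missing pairs, including self-pairs, are treated as 0 (assumption).
        z.get(frozenset((i, k)), 0) == 1 and z.get(frozenset((k, j)), 0) == 1
        for k in common
    ))


# Three units with overlapping neighborhoods; units 1 and 3 are both
# connected to unit 2 but not to each other.
neighborhoods = {1: {1, 2}, 2: {1, 2, 3}, 3: {2, 3}}
z = {frozenset((1, 2)): 1, frozenset((2, 3)): 1, frozenset((1, 3)): 0}

c_13 = overlap_indicator(neighborhoods, 1, 3)
d_13 = shared_partner_indicator(neighborhoods, z, 1, 3)
d_12 = shared_partner_indicator(neighborhoods, z, 1, 2)
```

In this toy configuration, units 1 and 3 share unit 2 as a common neighbor connected to both, so both indicators equal $1$ for the pair $\{1,3\}$, whereas the pair $\{1,2\}$ has no third unit in the intersection of its neighborhoods.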

To capture unobserved heterogeneity in the propensities of units to form connections, we introduce the $N$ -vector $\bm{\alpha}_{\mathscr{Z}}\coloneqq(\alpha_{\mathscr{Z},1},\ldots,\alpha_{\mathscr{Z},N})\in\mathbb{R}^{N}$ . In addition, we penalize connections among units $i$ and $j$ with non-overlapping neighborhoods and capture transitive closure along with treatment and outcome spillover by specifying $h_{i,j}$ as follows:

\begin{array}[]{llllllllll}\bm{\theta}_{h}\coloneqq\left(\begin{array}[]{ccc}\bm{\alpha}_{\mathscr{Z}}\\ \lambda\\ \gamma_{\mathscr{Z},\mathscr{Z}}\\ \gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\\ \gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\end{array}\right)\in\mathbb{R}^{N+4},&h_{i,j}\coloneqq\left(\begin{array}[]{ccc}\bm{e}_{i,j}\,z_{i,j}\\ -(1-c_{i,j})\,z_{i,j}\log N\\ d_{i,j}(\bm{z})\,z_{i,j}\\ c_{i,j}\,(x_{i}\,y_{j}^{\star}+x_{j}\,y_{i}^{\star})\,z_{i,j}\\ c_{i,j}\,y_{i}^{\star}\,y_{j}^{\star}\,z_{i,j}\end{array}\right)\in\mathbb{R}^{N+4},\end{array}

(4)

where $\bm{e}_{i,j}$ denotes the $N$ -vector whose $i$ th and $j$ th coordinates are $1$ and whose other coordinates are all $0$ . The parameters $\alpha_{\mathscr{Z},1},\ldots,\alpha_{\mathscr{Z},N}$ can be interpreted as the propensities of units $1,\dots,N$ to form connections; $\lambda>0$ discourages connections among units with non-overlapping neighborhoods; $\gamma_{\mathscr{Z},\mathscr{Z}}$ quantifies the tendency towards transitive closure among connections; and $\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}$ and $\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}$ capture treatment and outcome spillover, respectively. Sections 2.2.1 and 2.2.2 demonstrate that the interpretation of these effects is facilitated by the fact that the conditional distributions of $Y_{i}$ and $Z_{i,j}$ can be represented by GLMs.

2.2.1 GLM Representation of Responses $Y_{i}$

To interpret the model specified by Equations (2) and (4), we take advantage of the fact that the conditional distribution of response $Y_{i}\mid(\bm{X},\,\bm{Y}_{-i},\,\bm{Z})=(\bm{x},\,\bm{y}_{-i},\,\bm{z})$ by unit $i$ can be represented by a GLM with linear predictor

\begin{array}[]{llllllllll}\eta_{i}&=&\alpha_{\mathscr{Y}}+\beta_{\mathscr{X},\mathscr{Y}}\;x_{i}+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j:\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset}\,x_{j}\,z_{i,j}+\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j:\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset}\,y_{j}^{\star}\,z_{i,j}.\end{array}

(5)

Figure 1 depicts the predictors, responses, and connections that affect the conditional distribution of response $Y_{i}$ . We provide three specific examples, depending on the support set of response $Y_{i}$ .

Figure 1: Given $N=3$ units $1,2,3$ with neighborhoods $\mathscr{N}_{1}\coloneqq\{1,2\}$ , $\mathscr{N}_{2}\coloneqq\{1,2,3\}$ , and $\mathscr{N}_{3}\coloneqq\{2,3\}$ , the arrows indicate which predictors, responses, and connections can affect the response $Y_{1}$ of unit $1$ according to the model specified by Equations (2) and (4).
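The linear predictor in Equation (5) can be evaluated directly for the three-unit configuration of Figure 1. The following Python sketch is illustrative only: the predictor values, responses, connections, and parameter values are made up, the dictionary-based data layout and the function name `linear_predictor` are assumptions, and missing pairs are treated as unconnected.

```python
def linear_predictor(i, x, y, z, neighborhoods, theta, psi=1.0):
    """Evaluate eta_i of Equation (5) for unit i.

    theta = (alpha_Y, beta_XY, gamma_XYZ, gamma_YYZ); y_j is rescaled
    to y_j / psi, matching the definition of y_j^star.
    """
    alpha, beta, gamma_xyz, gamma_yyz = theta
    eta = alpha + beta * x[i]
    for j in x:
        # Only units j != i with overlapping neighborhoods contribute.
        if j == i or not (neighborhoods[i] & neighborhoods[j]):
            continue
        z_ij = z.get(frozenset((i, j)), 0)
        eta += gamma_xyz * x[j] * z_ij           # treatment spillover
        eta += gamma_yyz * (y[j] / psi) * z_ij   # outcome spillover
    return eta


# Neighborhoods of Figure 1; all other values are hypothetical.
neighborhoods = {1: {1, 2}, 2: {1, 2, 3}, 3: {2, 3}}
x = {1: 1.0, 2: 0.0, 3: 1.0}
y = {1: 0.5, 2: 2.0, 3: -1.0}
z = {frozenset((1, 2)): 1, frozenset((1, 3)): 1, frozenset((2, 3)): 0}
theta = (0.5, 1.0, 0.2, 0.1)

eta_1 = linear_predictor(1, x, y, z, neighborhoods, theta)
eta_1_isolated = linear_predictor(1, x, y, {}, neighborhoods, theta)
```

Because $\mathscr{N}_{1}\cap\mathscr{N}_{3}=\{2\}$ is non-empty, unit 3 contributes to $\eta_{1}$ when $z_{1,3}=1$ even though $3\not\in\mathscr{N}_{1}$ . Without any connections, $\eta_{1}$ reduces to $\alpha_{\mathscr{Y}}+\beta_{\mathscr{X},\mathscr{Y}}\,x_{1}$ .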

Example 1: Real-valued Responses $Y_{i}\in\mathbb{R}$ . Let $\psi\in(0,+\infty)$ and

\begin{array}[]{llllllllll}a_{\mathscr{Y}}(y_{i})&\coloneqq&\dfrac{1}{\sqrt{2\,\pi\,\psi}}\,\exp\left(-\dfrac{y_{i}^{2}}{2\,\psi}\right)\,\mathbb{I}(y_{i}\in\mathbb{R}).\end{array}

1.

Conditional distribution: The conditional distribution of response $Y_{i}$ is $N(\mu_{i}(\eta_{i}),\,\psi)$ .

2.

Conditional mean: The conditional mean $\mu_{i}(\eta_{i})$ can be obtained by differentiating $b_{i}(\eta_{i})=\eta_{i}^{2}/\,2$ with respect to $\eta_{i}$ , giving $\mu_{i}(\eta_{i})=\eta_{i}$ :

\begin{array}[]{llllllllll}\mu_{i}(\eta_{i})&=&\alpha_{\mathscr{Y}}+\beta_{\mathscr{X},\mathscr{Y}}\;x_{i}+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j:\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset}\,x_{j}\,z_{i,j}+\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j:\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset}\,y_{j}^{\star}\,z_{i,j}.\end{array}

Under certain restrictions on $\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}$ , the conditional distribution of $\bm{Y}\mid(\bm{X},\bm{Z})=(\bm{x},\bm{z})$ is $N$ -variate Gaussian. The restrictions on $\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}$ depend on the neighborhoods $\mathscr{N}_{i}$ and $\mathscr{N}_{j}$ and connections $Z_{i,j}$ of pairs of units $\{i,j\}\subset\mathscr{P}_{N}$ ; see Proposition A in Section A of the Supplementary Materials.
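One way to see where such restrictions come from is the following sketch, derived here from Equations (1), (2), and (4); the precise statement is given in the Supplementary Materials and may be formulated differently. Let $\bm{A}(\bm{z})$ denote the symmetric $N\times N$ matrix with off-diagonal entries $c_{i,j}\,z_{i,j}$ and zero diagonal, and let $\bm{m}$ denote the $N$ -vector with coordinates $m_{i}\coloneqq\alpha_{\mathscr{Y}}+\beta_{\mathscr{X},\mathscr{Y}}\,x_{i}+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\sum_{j\neq i}c_{i,j}\,x_{j}\,z_{i,j}$ (both symbols are introduced here for illustration). Collecting the terms of Equation (1) that involve $\bm{y}$ for fixed $\bm{z}$ gives

\begin{array}[]{llllllllll}\log f_{\bm{\theta}}(\bm{y}\mid\bm{x},\,\bm{z})&=&\mbox{const}-\dfrac{1}{2\,\psi}\;\bm{y}^{\top}\left(\bm{I}-\dfrac{\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}}{\psi}\,\bm{A}(\bm{z})\right)\bm{y}+\dfrac{1}{\psi}\;\bm{m}^{\top}\bm{y}.\end{array}

This is the log-density of a proper $N$ -variate Gaussian if and only if $\bm{I}-(\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,/\,\psi)\,\bm{A}(\bm{z})$ is positive definite, in which case the mean vector is $(\bm{I}-(\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,/\,\psi)\,\bm{A}(\bm{z}))^{-1}\,\bm{m}$ and the covariance matrix is $\psi\,(\bm{I}-(\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,/\,\psi)\,\bm{A}(\bm{z}))^{-1}$ . The full conditional of $Y_{i}$ implied by this quadratic form has variance $\psi$ and mean $m_{i}+\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\sum_{j\neq i}c_{i,j}\,y_{j}^{\star}\,z_{i,j}$ , in agreement with Equation (5).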

Example 2: Count-valued Responses $Y_{i}\in\{0,1,\dots\}$ . Let $\psi\coloneqq 1$ and

\begin{array}[]{llllllllll}a_{\mathscr{Y}}(y_{i})&\coloneqq&\dfrac{1}{y_{i}!}\;\mathbb{I}(y_{i}\in\{0,1,\dots\}).\end{array}

1.

Conditional distribution: The conditional distribution of response $Y_{i}$ is $\mbox{Poisson}(\mu_{i}(\eta_{i}))$ .
2.

Conditional mean: The conditional mean $\mu_{i}(\eta_{i})$ can be obtained by differentiating $b_{i}(\eta_{i})=\exp(\eta_{i})$ with respect to $\eta_{i}$ , giving $\mu_{i}(\eta_{i})=\exp(\eta_{i})$ .

Example 3: Binary Responses $Y_{i}\in\{0,\,1\}$ . Let $\psi\coloneqq 1$ and $a_{\mathscr{Y}}(y_{i})\coloneqq\mathbb{I}(y_{i}\in\{0,1\})$ .

1.

Conditional distribution: The conditional distribution of response $Y_{i}$ is $\mbox{Bernoulli}(\mu_{i}(\eta_{i}))$ .
2.

Conditional mean: The conditional mean $\mu_{i}(\eta_{i})$ can be obtained by differentiating $b_{i}(\eta_{i})=\log(1+\exp(\eta_{i}))$ with respect to $\eta_{i}$ , giving $\mu_{i}(\eta_{i})=\mbox{logit}^{-1}(\eta_{i})$ .
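The three conditional means in Examples 1–3 all arise as derivatives of the respective functions $b_{i}$ . The following Python sketch checks this numerically with a central finite difference. It is an illustration only: the dictionary and function names are hypothetical and the evaluation points are arbitrary.

```python
import math

# Functions b(eta) for the three examples in the text.
CUMULANTS = {
    "gaussian": lambda eta: eta ** 2 / 2.0,
    "poisson": lambda eta: math.exp(eta),
    "bernoulli": lambda eta: math.log1p(math.exp(eta)),
}

# Closed-form conditional means mu(eta) = b'(eta) stated in the text.
MEANS = {
    "gaussian": lambda eta: eta,
    "poisson": lambda eta: math.exp(eta),
    "bernoulli": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
}


def numerical_mean(family, eta, h=1e-6):
    """Approximate b'(eta) by a central finite difference."""
    b = CUMULANTS[family]
    return (b(eta + h) - b(eta - h)) / (2.0 * h)


# The numerical derivative agrees with the closed-form mean in each example.
for family in CUMULANTS:
    for eta in (-1.5, 0.0, 0.7):
        assert abs(numerical_mean(family, eta) - MEANS[family](eta)) < 1e-6
```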

Interpretation of Examples. According to Equations (2) and (4), regardless of the conditional distribution of response $Y_{i}$ , $\alpha_{\mathscr{Y}}$ can be viewed as an intercept, while $\beta_{\mathscr{X},\mathscr{Y}}$ captures the relationship between predictor $x_{i}$ and response $Y_{i}$ . The parameters $\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}$ and $\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}$ capture two distinct spillover effects:

•

Treatment spillover: $\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\neq 0$ allows the outcome $Y_{i}$ of unit $i$ to be affected by the treatments $x_{j}$ of its neighbors $j\in\mathscr{N}_{i}$ and non-neighbors $j\not\in\mathscr{N}_{i}$ , provided $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset$ and $i$ and $j$ are connected (see Figure 1).
•

Outcome spillover: $\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\neq 0$ allows the outcome $Y_{i}$ of unit $i$ to be affected by the outcomes $y_{j}$ of its neighbors $j\in\mathscr{N}_{i}$ and non-neighbors $j\not\in\mathscr{N}_{i}$ , provided $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset$ and $i$ and $j$ are connected (see Figure 1).

The proposed regression framework can be used for causal inference under network interference, which studies treatment spillover. That said, the framework is considerably broader, because it permits outcome spillover, and spillover need not be studied in a causal setting.

2.2.2 GLM Representation of Connections $Z_{i,j}$

The conditional mean $\mu_{i,j}(\eta_{i,j})\coloneqq\mathbb{E}_{\eta_{i,j}}(Z_{i,j}\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})=\mbox{logit}^{-1}(\eta_{i,j})$ of connection $Z_{i,j}\in\{0,1\}$ depends on the linear predictor

\begin{array}[]{llllllllll}\eta_{i,j}=\alpha_{\mathscr{Z},i}+\alpha_{\mathscr{Z},j}-(1-c_{i,j})\,\lambda\,\log N+c_{i,j}\left[\gamma_{\mathscr{Z},\mathscr{Z}}\,\Delta_{i,j}(\bm{z})+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,(x_{i}\,y_{j}^{\star}+x_{j}\,y_{i}^{\star})+\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,y_{i}^{\star}\,y_{j}^{\star}\right],\end{array}

where $\Delta_{i,j}:\mathscr{Z}\mapsto\mathbb{R}$ is the change in $\sum_{a<b}^{N}d_{a,b}(\bm{z})$ due to transforming $z_{i,j}$ from $0$ to $1$ . The logistic regression representation of $Z_{i,j}\mid(\bm{X},\bm{Y},\bm{Z}_{-\{i,j\}})=(\bm{x},\bm{y},\bm{z}_{-\{i,j\}})$ facilitates interpretation: e.g., $\alpha_{\mathscr{Z},i}$ captures heterogeneity among units $i$ in forming connections. If $\lambda>0$ , the sparsity-inducing term $-(1-c_{i,j})\,\lambda\,\log N$ penalizes connections between pairs of units with non-overlapping neighborhoods, where the $\log N$ -term can be motivated in the special case of Bernoulli random graphs (Krivitsky et al., 2023): If $Z_{i,j}\mathop{\rm\sim}\limits^{\mbox{\tiny iid}}\mbox{Bernoulli}(\pi)$ and the expected degrees $\mathbb{E}\sum_{j=1}^{N}Z_{i,j}$ are bounded, then $\pi=O(1/N)$ and $\mbox{logit}(\pi)=O(\log N)$ . In addition, the model captures three forms of dependencies. First, the model encourages $i$ and $j$ to be connected when $i$ and $j$ are both connected to some $k\,\in\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}$ , provided $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset$ and $\gamma_{\mathscr{Z},\mathscr{Z}}>0$ . Second, the model encourages $i$ and $j$ to be connected when $x_{i}\,y_{j}^{\star}>0$ or $x_{j}\,y_{i}^{\star}>0$ , provided $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset$ and $\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}>0$ . Third, the model encourages $i$ and $j$ to be connected when $y_{i}^{\star}\;y_{j}^{\star}>0$ , provided $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset$ and $\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,>\,0$ .
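The linear predictor $\eta_{i,j}$ can be evaluated term by term. The Python sketch below is illustrative only: the function names and all numerical inputs are hypothetical, and $\Delta_{i,j}(\bm{z})$ is passed in as a precomputed number rather than derived from $\bm{z}$ .

```python
import math


def linear_predictor(alpha_i, alpha_j, c_ij, lam, n_units, delta_ij,
                     x_i, x_j, y_i, y_j, gamma_zz, gamma_xyz, gamma_yyz):
    """Linear predictor eta_{i,j} of the displayed equation.

    delta_ij stands for Delta_{i,j}(z) and is taken as given; y_i and y_j
    stand for the scaled responses y_i^* and y_j^*.
    """
    # Sparsity term: active only when the neighborhoods do not overlap.
    sparsity = -(1 - c_ij) * lam * math.log(n_units)
    # Dependence terms: active only when the neighborhoods overlap.
    dependence = c_ij * (gamma_zz * delta_ij
                         + gamma_xyz * (x_i * y_j + x_j * y_i)
                         + gamma_yyz * y_i * y_j)
    return alpha_i + alpha_j + sparsity + dependence


def connection_probability(eta):
    """Conditional mean mu_{i,j} = logit^{-1}(eta_{i,j})."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical inputs: the same pair with non-overlapping (c_ij = 0)
# and overlapping (c_ij = 1) neighborhoods.
args = dict(alpha_i=-1.5, alpha_j=-1.5, lam=0.2, n_units=1000, delta_ij=3.0,
            x_i=1.0, x_j=0.0, y_i=1.0, y_j=1.0,
            gamma_zz=0.8, gamma_xyz=0.5, gamma_yyz=-0.5)
eta_apart = linear_predictor(c_ij=0, **args)
eta_close = linear_predictor(c_ij=1, **args)
```

With $c_{i,j}=0$ only the propensities and the sparsity term contribute, so the log odds decrease by $\lambda\,\log N$ relative to $\alpha_{\mathscr{Z},i}+\alpha_{\mathscr{Z},j}$ ; with $c_{i,j}=1$ the sparsity term vanishes and the three dependence terms enter.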

3 Scalable Statistical Computing

To learn the regression framework from a single observation $(\bm{y},\bm{z})$ of dependent responses and connections $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ , we develop scalable methods based on convex optimization of pseudo-likelihoods using minorization-maximization methods.

3.1 Pseudo-Loglikelihood

Let

\begin{array}[]{llllllllll}\ell(\bm{\theta};\,\bm{y},\,\bm{z})&\coloneqq&\displaystyle\sum\limits_{i=1}^{N}\ell_{i}(\bm{\theta};\,\bm{y},\,\bm{z})+\displaystyle\sum\limits_{i=1}^{N-1}\,\displaystyle\sum\limits_{j=i+1}^{N}\ell_{i,j}(\bm{\theta};\,\bm{y},\,\bm{z}),\end{array}

(6)

where the dependence on predictors $\bm{x}\in\mathscr{X}$ is suppressed and $\ell_{i}$ and $\ell_{i,j}$ are defined by

\begin{array}[]{llllllllll}\ell_{i}(\bm{\theta};\,\bm{y},\,\bm{z})&\coloneqq&\log f_{\bm{\theta}}(y_{i}\mid\bm{y}_{-i},\,\bm{z})&\mbox{and}&\ell_{i,j}(\bm{\theta};\,\bm{y},\,\bm{z})&\coloneqq&\log f_{\bm{\theta}}(z_{i,j}\mid\bm{y},\,\bm{z}_{-\{i,j\}}).\end{array}

The pseudo-loglikelihood $\ell$ is based on full conditional densities of responses $Y_{i}$ and connections $Z_{i,j}$ and is hence tractable. In addition, $\ell$ is a sum of exponential family loglikelihood functions $\ell_{i}$ and $\ell_{i,j}$ , each of which is concave and twice differentiable on the convex set $\bm{\Theta}$ (Brown, 1986, Theorem 1.13, p. 19 and Lemma 5.3, p. 146), proving Lemma 1:

Lemma 1: Convexity and Smoothness. The set $\bm{\Theta}$ is convex and the pseudo-loglikelihood function $\ell:\bm{\Theta}\mapsto\mathbb{R}$ , considered as a function of $\bm{\theta}$ for fixed $(\bm{y},\bm{z})\in\mathscr{Y}\times\mathscr{Z}$ , is twice differentiable with a negative semidefinite Hessian matrix on $\bm{\Theta}$ .

In light of the tractability and concavity of $\ell$ , it makes sense to base statistical learning on pseudo-likelihood estimators of the form

\begin{array}[]{llllllllll}\widehat{\bm{\Theta}}(\delta_{N})&\coloneqq&\left\{\bm{\theta}\,\in\,\bm{\Theta}:\;|\!|\nabla_{\bm{\theta}}~\ell(\bm{\theta};\,\bm{y},\,\bm{z})|\!|_{\infty}\;\leq\;\delta_{N}\right\},\end{array}

(7)

where $\nabla_{\bm{\theta}}$ denotes the gradient with respect to $\bm{\theta}$ while $|\!|\bm{v}|\!|_{\infty}\,\coloneqq\,\max_{1\leq k\leq p}\,|v_{k}|$ denotes the $\ell_{\infty}$ -norm of vectors $\bm{v}\in\mathbb{R}^{p}$ . The quantity $\delta_{N}\in[0,+\infty)$ can be viewed as a convergence criterion of a root-finding algorithm and can depend on $N$ . The set $\widehat{\bm{\Theta}}(\delta_{N})$ consists of maximizers of $\ell$ when $\delta_{N}=0$ , and maximizers and near-maximizers when $\delta_{N}>0$ .

3.2 Minorization-Maximization (MM)

While pseudo-likelihood estimators $\widehat{\bm{\theta}}\in\widehat{\bm{\Theta}}(\delta_{N})$ can be obtained by standard root-finding algorithms, inverting the $p\times p$ negative Hessian of $\ell$ at each iteration is time-consuming, because inversions require $O(p^{3})$ operations and $p$ can increase with $N$ . We thus divide the task of estimating $p$ parameters into two subtasks using MM methods (Hunter and Lange, 2004). In the example model specified by Equations (2) and (4) for binary, count-, or real-valued predictors and responses $(X_{i},Y_{i})$ and binary connections $Z_{i,j}$ , we partition $\bm{\theta}\in\mathbb{R}^{N+6}$ into $N$ nuisance parameters, $\bm{\theta}_{1}\coloneqq(\alpha_{\mathscr{Z},1},\,\dots,\,\alpha_{\mathscr{Z},N})\in\mathbb{R}^{N}$ , and $6$ parameters of primary interest, $\bm{\theta}_{2}\coloneqq(\lambda,\,\alpha_{\mathscr{Y}},\,\beta_{\mathscr{X},\mathscr{Y}},\,\gamma_{\mathscr{Z},\mathscr{Z}},\,\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}},\,\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}})\in\mathbb{R}^{6}$ . We then partition the negative Hessian of $\ell$ accordingly:

\begin{array}[]{llllllllll}-\nabla_{\bm{\theta}}^{2}~\ell(\bm{\theta};\,\bm{y},\,\bm{z})&\coloneqq&\begin{pmatrix}\mathbf{A}(\bm{\theta})&\mathbf{B}(\bm{\theta})\\ \mathbf{B}(\bm{\theta})^{\top}&\mathbf{C}(\bm{\theta})\end{pmatrix},\end{array}

(8)

where $\mathbf{A}(\bm{\theta})\in\mathbb{R}^{N\times N}$ , $\mathbf{B}(\bm{\theta})\in\mathbb{R}^{N\times 6}$ , and $\mathbf{C}(\bm{\theta})\in\mathbb{R}^{6\times 6}$ . We suppress the dependence of $\mathbf{A}(\bm{\theta})$ , $\mathbf{B}(\bm{\theta})$ , and $\mathbf{C}(\bm{\theta})$ on $(\bm{y},\,\bm{z})$ and henceforth write $\ell(\bm{\theta}_{1},\bm{\theta}_{2};\,\bm{y},\,\bm{z})$ instead of $\ell(\bm{\theta};\,\bm{y},\,\bm{z})$ .

Iteration $t+1$ then consists of two steps:

Step 1: Find $\bm{\theta}_{1}^{(t+1)}$ satisfying $\ell(\bm{\theta}_{1}^{(t+1)},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})\,\geq\,\ell(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})$ .
Step 2: Find $\bm{\theta}_{2}^{(t+1)}$ satisfying $\ell(\bm{\theta}_{1}^{(t+1)},\,\bm{\theta}_{2}^{(t+1)};\,\bm{y},\,\bm{z})\,\geq\,\ell(\bm{\theta}_{1}^{(t+1)},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})$ .

In Step 1, it is inconvenient to invert the high-dimensional $N\times N$ matrix

\begin{array}[]{llllllllll}\bm{A}(\bm{\theta}^{(t)})&\coloneqq&-\displaystyle\sum\limits_{i<j}^{N}\,\nabla_{\bm{\theta}_{1}}^{2}\,\ell_{i,j}(\bm{\theta}_{1},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})\Big{|}_{\bm{\theta}_{1}=\bm{\theta}_{1}^{(t)}}&=&\displaystyle\sum\limits_{i<j}^{N}\pi_{i,j}^{(t)}\,(1-\pi_{i,j}^{(t)})\,\bm{e}_{i,j}\,\bm{e}_{i,j}^{\top},\end{array}

(9)

where $\pi_{i,j}^{(t)}\coloneqq\mathbb{P}_{\bm{\theta}^{(t)}}(Z_{i,j}=1\mid\bm{y},\,\bm{z}_{-\{i,j\}})$ . We thus increase $\ell$ by maximizing a minorizer of $\ell$ , replacing $\bm{A}(\bm{\theta}^{(t)})$ by a constant matrix $\bm{A}^{\star}$ that only needs to be inverted once.

Lemma 2: Minorizer. Define

\begin{array}[]{llllllllll}\bm{A}^{\star}&\coloneqq&\dfrac{1}{4}\,\displaystyle\sum\limits_{i<j}^{N}\bm{e}_{i,j}\,\bm{e}_{i,j}^{\top}\ =\ \dfrac{1}{4}\,\left[(N-2)\,\bm{I}+\bm{1}\bm{1}^{\top}\right]\ =\ \left[\dfrac{4}{N-2}\,\left(\bm{I}-\dfrac{1}{2\,N-2}\,\bm{1}\bm{1}^{\top}\right)\right]^{-1},\end{array}

where $\bm{I}$ is the $N\times N$ identity matrix and $\bm{1}$ is the $N$ -vector of ones. Then the function

\begin{array}[]{llllllllll}m(\bm{\theta}_{1};\,\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{2}^{(t)},\,\bm{y},\,\bm{z})&\coloneqq&\ell(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})\vskip 7.11317pt\\ &+&\left(\nabla_{\bm{\theta}_{1}}\,\ell(\bm{\theta}_{1},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})\Big{|}_{\bm{\theta}_{1}=\bm{\theta}_{1}^{(t)}}\right)^{\top}(\bm{\theta}_{1}-\bm{\theta}_{1}^{(t)})\vskip 7.11317pt\\ &+&\dfrac{1}{2}\,(\bm{\theta}_{1}-\bm{\theta}_{1}^{(t)})^{\top}\,(-\bm{A}^{\star})\,(\bm{\theta}_{1}-\bm{\theta}_{1}^{(t)})\end{array}

is a minorizer of $\ell$ at $\bm{\theta}_{1}^{(t)}$ for fixed $\bm{\theta}_{2}^{(t)}$ , in the sense that

\begin{array}[]{llllllllll}m(\bm{\theta}_{1};\,\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{2}^{(t)},\,\bm{y},\,\bm{z})&\leq&\ell(\bm{\theta}_{1},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})\;\mbox{ for all }\;\bm{\theta}_{1}\in\mathbb{R}^{N}\vskip 7.11317pt,\\ m(\bm{\theta}_{1}^{(t)};\,\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{2}^{(t)},\,\bm{y},\,\bm{z})&=&\ell(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z}).\end{array}

Lemma 2 is proved in Section B of the Supplementary Materials. Step 1 may be implemented by an MM algorithm, as the closed-form maximizer of the minorizer $m$ is

\begin{array}[]{llllllllll}\bm{\theta}_{1}^{(t+1)}&\coloneqq&\bm{\theta}_{1}^{(t)}+\left(\mathbf{A}^{\star}\right)^{-1}\left(\nabla_{\bm{\theta}_{1}}\,\ell(\bm{\theta}_{1},\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})\Big{|}_{\bm{\theta}_{1}=\bm{\theta}_{1}^{(t)}}\right).\end{array}

(10)

We accelerate the MM step in Equation (10) with quasi-Newton methods. Details can be found in Section E of the Supplementary Materials. Compared to a Newton-Raphson algorithm, the accelerated MM-step reduces the per-iteration computational complexity from $O(N^{3})$ to $O(N^{2})$ . Step 2 updates the low-dimensional parameter vector of interest $\bm{\theta}_{2}^{(t+1)}\in\mathbb{R}^{6}$ given the high-dimensional nuisance parameter vector $\bm{\theta}_{1}^{(t+1)}\in\mathbb{R}^{N}$ by a Newton-Raphson step. The concavity of $\ell$ , established in Lemma 1, guarantees that

\begin{array}[]{llllllllll}\ell(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})&\leq&\ell(\bm{\theta}_{1}^{(t+1)},\,\bm{\theta}_{2}^{(t)};\,\bm{y},\,\bm{z})&\leq&\ell(\bm{\theta}_{1}^{(t+1)},\,\bm{\theta}_{2}^{(t+1)};\,\bm{y},\,\bm{z}).\end{array}
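The MM update in Equation (10) can be illustrated on a deliberately simplified model. The Python sketch below assumes a toy model with $\mbox{logit}\;\mathbb{P}(Z_{i,j}=1)=\alpha_{i}+\alpha_{j}$ and no other terms, so it covers Step 1 only, without the quasi-Newton acceleration and without the parameters $\bm{\theta}_{2}$ . The function names, the simulated data, and the number of iterations are hypothetical.

```python
import math
import random


def log_lik_and_grad(alpha, z):
    """Log-likelihood and gradient of the toy model with
    logit P(Z_ij = 1) = alpha_i + alpha_j (no other terms)."""
    n = len(alpha)
    value, grad = 0.0, [0.0] * n
    for i in range(n - 1):
        for j in range(i + 1, n):
            eta = alpha[i] + alpha[j]
            pi = 1.0 / (1.0 + math.exp(-eta))
            value += z[i][j] * eta - math.log1p(math.exp(eta))
            grad[i] += z[i][j] - pi
            grad[j] += z[i][j] - pi
    return value, grad


def mm_step(alpha, grad):
    """Update alpha + (A*)^{-1} grad, using the closed form
    (A*)^{-1} = 4/(N-2) * (I - 11^T / (2N-2)) from Lemma 2.
    Applying the inverse to a vector costs O(N) operations."""
    n = len(alpha)
    shift = sum(grad) / (2 * n - 2)
    return [a + 4.0 / (n - 2) * (g - shift) for a, g in zip(alpha, grad)]


# Simulate a small undirected network (hypothetical data).
random.seed(1)
N = 30
z = [[0] * N for _ in range(N)]
for i in range(N - 1):
    for j in range(i + 1, N):
        z[i][j] = z[j][i] = int(random.random() < 0.3)

# Run MM iterations and record the objective before each update.
alpha = [0.0] * N
trace = []
for _ in range(50):
    value, grad = log_lik_and_grad(alpha, z)
    trace.append(value)
    alpha = mm_step(alpha, grad)

# Ascent property: the objective never decreases (up to rounding error).
assert all(b >= a - 1e-9 for a, b in zip(trace, trace[1:]))
```

Because $\bm{A}^{\star}$ does not depend on the current iterate, no matrix is formed or inverted inside the loop; the cost of each iteration in this sketch is dominated by the $O(N^{2})$ evaluation of the gradient.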

3.3 Quantifying Uncertainty

The uncertainty about the maximum pseudo-likelihood estimator $\widehat{\bm{\theta}}$ of the data-generating parameter vector $\bm{\theta}^{\star}$ can be quantified based on the covariance matrix of the sampling distribution of $\widehat{\bm{\theta}}$ , which we derive as follows: The mean-value theorem for vector-valued functions (Ortega and Rheinboldt, 2000, Equations (2) and (3), pp. 68–69) implies that there exist real numbers $t_{1},\ldots,t_{p}\in(0,\,1)$ such that

\begin{array}[]{llllllllll}\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\,\bm{y},\,\bm{z})\Big{|}_{\bm{\theta}=\widehat{\bm{\theta}}}-\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\,\bm{y},\,\bm{z})\Big{|}_{\bm{\theta}=\bm{\theta}^{\star}}&=&\bm{H}(\widehat{\bm{\theta}},\,\bm{\theta}^{\star};\,\bm{y},\,\bm{z})\;(\widehat{\bm{\theta}}-\bm{\theta}^{\star}),\end{array}

(11)

where

\begin{array}[]{llllllllll}\bm{H}(\widehat{\bm{\theta}},\,\bm{\theta}^{\star};\,\bm{y},\,\bm{z})&\coloneqq&\left(\begin{array}[]{cccccc}g_{1}^{\prime}(\bm{\theta}^{\star}+t_{1}\,(\widehat{\bm{\theta}}-\bm{\theta}^{\star});\,\bm{y},\,\bm{z})\\ \vdots\\ g_{p}^{\prime}(\bm{\theta}^{\star}+t_{p}\,(\widehat{\bm{\theta}}-\bm{\theta}^{\star});\,\bm{y},\,\bm{z})\end{array}\right).\end{array}

Here, $g_{k}(\bm{\theta};\,\bm{y},\,\bm{z})$ is the $k$ th coordinate of $\nabla_{\bm{\theta}}\,\,\ell(\bm{\theta};\,\bm{y},\bm{z})$ and $g_{k}^{\prime}(\bm{\theta};\,\bm{y},\,\bm{z})$ is the row vector of partial derivatives of $g_{k}(\bm{\theta};\,\bm{y},\,\bm{z})$ with respect to $\bm{\theta}$ ( $k=1,\ldots,p$ ). Leveraging Equation (11) along with $\nabla_{\bm{\theta}}\,\,\ell(\bm{\theta};\,\bm{y},\,\bm{z})|_{\bm{\theta}=\widehat{\bm{\theta}}}=\bm{0}$ gives the exact covariance matrix of $\widehat{\bm{\theta}}$ :

\begin{array}[]{llllllllll}\mathbb{V}_{\bm{\theta}^{\star}}(\widehat{\bm{\theta}})&=&\mathbb{V}_{\bm{\theta}^{\star}}\left[-\bm{H}(\widehat{\bm{\theta}},\,\bm{\theta}^{\star};\,\bm{Y},\bm{Z})^{-1}\;\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\,\bm{Y},\,\bm{Z})\Big{|}_{\bm{\theta}=\bm{\theta}^{\star}}\right].\end{array}

If $N$ is large, $\bm{\theta}^{\star}$ can be replaced by $\widehat{\bm{\theta}}$ , because $|\!|\widehat{\bm{\theta}}-\bm{\theta}^{\star}|\!|_{\infty}$ is small with high probability according to Theorem 1 in Section 4. The resulting approximation of $\mathbb{V}_{\bm{\theta}^{\star}}(\widehat{\bm{\theta}})$ is

\begin{array}[]{llllllllll}\mathbb{V}_{\widehat{\bm{\theta}}}(\widehat{\bm{\theta}})&=&\mathbb{V}_{\widehat{\bm{\theta}}}\left[-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{Y},\,\bm{Z})\Big{|}_{\bm{\theta}=\widehat{\bm{\theta}}}^{-1}\;\,\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\,\bm{Y},\,\bm{Z})\Big{|}_{\bm{\theta}=\widehat{\bm{\theta}}}\right],\end{array}

(12)

which can be estimated by simulating responses and connections $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ from $\mathbb{P}_{\widehat{\bm{\theta}}}$ using Markov chain Monte Carlo methods.
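The simulation-based estimate of Equation (12) can be sketched in a one-parameter toy model. The Python code below assumes independent $\mbox{Bernoulli}(\mbox{logit}^{-1}(\theta))$ responses, so that direct sampling replaces the Markov chain Monte Carlo step needed for dependent data and the exact variance is available for comparison. The function names, the value of $\widehat{\theta}$ , and the simulation sizes are hypothetical.

```python
import math
import random


def score_and_hessian(theta, y):
    """Gradient and Hessian of the log-likelihood of independent
    Bernoulli(logit^{-1}(theta)) responses (a one-parameter toy model)."""
    pi = 1.0 / (1.0 + math.exp(-theta))
    return sum(y) - len(y) * pi, -len(y) * pi * (1.0 - pi)


def simulated_variance(theta_hat, n, n_sim, rng):
    """Monte Carlo estimate of Var[-H^{-1} * gradient] under theta_hat,
    mirroring Equation (12) with p = 1."""
    pi_hat = 1.0 / (1.0 + math.exp(-theta_hat))
    draws = []
    for _ in range(n_sim):
        # Simulate a data set from the fitted model.
        y = [int(rng.random() < pi_hat) for _ in range(n)]
        score, hessian = score_and_hessian(theta_hat, y)
        draws.append(-score / hessian)
    mean = sum(draws) / n_sim
    return sum((d - mean) ** 2 for d in draws) / (n_sim - 1)


rng = random.Random(0)
theta_hat, n = 0.4, 200
estimate = simulated_variance(theta_hat, n, n_sim=2000, rng=rng)

# In this toy model the target variance is 1 / (n * pi * (1 - pi)).
pi_hat = 1.0 / (1.0 + math.exp(-theta_hat))
exact = 1.0 / (n * pi_hat * (1.0 - pi_hat))
```

In the regression framework itself, the simulated data sets would be draws of $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ from $\mathbb{P}_{\widehat{\bm{\theta}}}$ and the scalar ratio would become a linear solve with the $p\times p$ Hessian.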

Remark: Asymptotic Distribution. Establishing asymptotic normality for pseudo-likelihood estimators based on a single observation of dependent responses and connections $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ in scenarios with $p\to\infty$ parameters is an open problem. Asymptotic normality results in the most closely related literature—the literature in applied probability concerned with Ising models, Gibbs measures, and Markov random fields in single-observation scenarios (e.g., Jensen and Künsch, 1994; Comets and Janzura, 1998)—assume the presence of lattice structure and a fixed number of parameters $p$ , in addition to other assumptions motivated by applications in physics. In the current setting, none of these assumptions holds, although simulation results in Section 5 suggest that the sampling distribution of $\widehat{\bm{\theta}}$ is approximately normal and that normal-based confidence intervals based on Equation (12) achieve close-to-nominal coverage probabilities.

4 Theoretical Guarantees

We establish convergence rates for pseudo-likelihood estimators $\widehat{\bm{\Theta}}(\delta_{N})$ based on a single observation of dependent responses and connections $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ . To cover a wide range of models for binary, count-, and real-valued responses and connections, we introduce a general theoretical framework and showcase convergence rates in a specific example.

We denote by $\bm{\theta}^{\star}\in\bm{\Theta}\subseteq\mathbb{R}^{p}$ the data-generating parameter vector and by $\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\rho)\,\coloneqq\,\{\bm{\theta}\in\mathbb{R}^{p}:{\left|\!\left|\bm{\theta}-\bm{\theta}^{\star}\right|\!\right|_{\infty}}<\rho\}$ a hypercube with center $\bm{\theta}^{\star}\in\bm{\Theta}$ and width $2\,\rho\in(0,+\infty)$ . Let

\mathscr{I}(S)\,\coloneqq\,\left\{(\bm{y},\,\bm{z})\in\mathscr{Y}\times\mathscr{Z}:\;-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{y},\,\bm{z})\mbox{ is invertible for all $\bm{\theta}\in S$}\right\}

and, for some $\epsilon^{\star}\in(0,+\infty)$ and $\mathscr{H}\,\subseteq\,\mathscr{I}\left(\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})\right)$ , let

\Lambda_{N}(\bm{\theta}^{\star})\,\coloneqq\,\sup\limits_{(\bm{y},\,\bm{z})\,\in\,\mathscr{H}}\;\sup\limits_{\bm{\theta}\,\in\,\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})}\,|\!|\!|(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{y},\,\bm{z}))^{-1}|\!|\!|_{\infty},

where $|\!|\!|.|\!|\!|_{\infty}$ is the $\ell_{\infty}$ -induced matrix norm. The set $\mathscr{H}$ can be a proper subset of $\mathscr{I}\left(\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})\right)$ , provided $\mathscr{H}$ is a high probability subset of $\mathscr{Y}\times\mathscr{Z}$ . The definition of $\mathscr{H}$ is motivated by the fact that characterizing the set of all $(\bm{y},\,\bm{z})\in\mathscr{Y}\times\mathscr{Z}$ for which the Hessian is invertible can be challenging, but finding a sufficient condition for invertibility is often possible.

Theorem 1: Convergence Rate. Consider a single observation of $(\bm{Y},\bm{Z})\in\mathscr{Y}\times\mathscr{Z}$ generated by model (1) with parameter vector $\bm{\theta}^{\star}\in\bm{\Theta}\subseteq\mathbb{R}^{p}$ , where $\mathscr{Y}\times\mathscr{Z}$ is a finite, countably infinite, or uncountable set. Assume that there exists a sequence $\rho_{1},\rho_{2},\ldots\in[0,+\infty)$ satisfying $\rho_{N}=o(1)$ so that the events $|\!|\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\,\bm{Y},\bm{Z})|_{\bm{\theta}=\bm{\theta}^{\star}}-\mathbb{E}\;\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\,\bm{Y},\bm{Z})|_{\bm{\theta}=\bm{\theta}^{\star}}|\!|_{\infty}<\delta_{N}$ and $(\bm{Y},\bm{Z})\in\mathscr{H}$ occur with probability $1-o(1)$ , where $\delta_{N}\coloneqq\rho_{N}/(2\,\Lambda_{N}(\bm{\theta}^{\star}))$ . Then there exists a positive integer $N_{0}$ such that, for all $N>N_{0}$ , the random set $\widehat{\bm{\Theta}}(\delta_{N})$ is non-empty and, with probability $1-o(1)$ , satisfies

\begin{array}[]{llllllllll}\widehat{\bm{\Theta}}(\delta_{N})&\subseteq&\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\rho_{N}).\end{array}

Theorem 1 is proved in Section C of the Supplementary Materials. The requirement $\delta_{N}\coloneqq\rho_{N}/(2\,\Lambda_{N}(\bm{\theta}^{\star}))$ implies $\rho_{N}\propto\delta_{N}\,\Lambda_{N}(\bm{\theta}^{\star})$ , so the convergence rate depends on

•

the strength of concentration of the gradient $\nabla_{\bm{\theta}}\,\,\ell(\bm{\theta};\,\bm{Y},\bm{Z})|_{\bm{\theta}=\bm{\theta}^{\star}}$ around its expectation $\mathbb{E}\,\nabla_{\bm{\theta}}\,\,\ell(\bm{\theta};\,\bm{Y},\bm{Z})|_{\bm{\theta}=\bm{\theta}^{\star}}$ via $\delta_{N}$ ;
•

the inverse negative Hessian $(-\nabla_{\bm{\theta}}^{2}~\ell(\bm{\theta};\,\bm{y},\bm{z}))^{-1}$ in a neighborhood $\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})$ of $\bm{\theta}^{\star}\in\bm{\Theta}$ and a high probability subset $(\bm{y},\bm{z})\in\mathscr{H}$ of $\mathscr{Y}\times\mathscr{Z}$ via $\Lambda_{N}(\bm{\theta}^{\star})$ .

The strength of concentration of $\nabla_{\bm{\theta}}\,\,\ell(\bm{\theta};\,\bm{Y},\bm{Z})|_{\bm{\theta}=\bm{\theta}^{\star}}$ can be quantified by concentration inequalities for dependent random variables. In general, the strength of concentration depends on the sample space and the tails of the distribution, the smoothness of the functions $g_{i}$ and $h_{i,j}$ , and the dependence induced by model (1). To control the dependence among responses and connections $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ , one can take advantage of additional structure (e.g., one or more neighborhood structures, non-overlapping or overlapping subpopulations, or a metric space in which units are embedded). For example, each unit can have one or more neighborhoods (e.g., geographical neighbors and colleagues in the workplace), and the responses and connections of the unit can be affected by any geographical neighbor and any colleague. Theoretical guarantees can be obtained as long as the neighborhoods are not too large and do not overlap too much.

Specific convergence rates depend on the model. To demonstrate, consider predictors $x_{i}\in\mathbb{R}$ , responses $Y_{i}\in\{0,1\}$ , and connections $Z_{i,j}\in\{0,1\}$ generated by a model capturing heterogeneity in the propensities $\alpha_{\mathscr{Z},1},\,\dots,\alpha_{\mathscr{Z},N}$ of units $1,\dots,N$ to form connections, transitive closure among connections with weight $\gamma_{\mathscr{Z},\mathscr{Z}}$ , and treatment spillover with weight $\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}$ ; compare Equations (2) and (4) in Section 2.2. Since $Y_{i}\in\{0,1\}$ and $Z_{i,j}\in\{0,1\}$ , it is reasonable to specify $a_{\mathscr{Y}}(y_{i})\coloneqq\mathbb{I}(y_{i}\in\{0,1\})$ and $a_{\mathscr{Z}}(z_{i,j})\coloneqq\mathbb{I}(z_{i,j}\in\{0,1\})$ . Convergence rates can be obtained under the following conditions.

Condition 1: Predictors. There exist finite constants $0<c<C$ such that, for each $i\in\mathscr{P}_{N}$ , $x_{i}\in[0,\,C]$ and there exists $j\in\mathscr{P}_{N}\setminus\,\{i\}$ such that $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset$ and $x_{j}\in[c,\,C]$ .

Condition 2: Parameters. The parameter space is $\bm{\Theta}=\mathbb{R}^{N+2}$ and there exists a constant $A\in(0,+\infty)$ , not depending on $N$ , such that $\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}<A$ .

Condition 3: Dependence. The population $\mathscr{P}$ consists of overlapping subpopulations $\mathscr{A}_{1},\mathscr{A}_{2},\ldots$ , which can be represented as vertices of a subpopulation graph $\mathscr{G}_{\mathscr{A}}$ with an edge connecting $\mathscr{A}_{k}$ and $\mathscr{A}_{l}$ if $\mathscr{A}_{k}\,\cap\,\mathscr{A}_{l}\neq\emptyset$ ( $k<l$ ). For each $\mathscr{A}_{k}$ , the number of subpopulations at geodesic distance $K$ in $\mathscr{G}_{\mathscr{A}}$ is $O(\log K)$ . For each $i\in\mathscr{P}_{N}$ , the neighborhood is

\begin{array}[]{llllllllll}\mathscr{N}_{i}&\coloneqq&\{j\,\in\,\mathscr{P}_{N}:\mbox{ there exists }k\in\{1,2,\ldots\}\text{ such that }i\in\mathscr{A}_{k}\text{ and }j\in\mathscr{A}_{k}\}.\end{array}

There exists a constant $B\in(0,\,+\infty)$ such that $\max_{1\leq i\leq N}|\mathscr{N}_{i}|<B$ .

Condition 1 imposes restrictions on $\bm{x}\in\mathbb{R}^{N}$ . Condition 2 requires that the data-generating parameter vector $\bm{\theta}^{\star}$ be contained in a compact subset of $\bm{\Theta}=\mathbb{R}^{N+2}$ . The set of estimators $\widehat{\bm{\Theta}}(\delta_{N})$ is not restricted by Condition 2 and consists of all $\bm{\theta}\in\mathbb{R}^{N+2}$ such that $|\!|\nabla_{\bm{\theta}}~\ell(\bm{\theta};\,\bm{Y},\bm{Z})|\!|_{\infty}\,\leq\,\delta_{N}$ . Condition 2 can be weakened in special cases, allowing $|\!|\bm{\theta}^{\star}|\!|_{\infty}=O(\log N)$ ; see Section D.3 of the Supplementary Materials. Condition 3 controls the dependence among responses and connections $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ and can be weakened to $\max_{1\leq i\leq N}|\mathscr{N}_{i}|=O(\log N)$ , as demonstrated by Stewart and Schweinberger (2025) in the special case of connections $\bm{Z}$ (without predictors $\bm{X}$ and responses $\bm{Y}$ ).

Corollary 1: Example of Convergence Rate. Consider a single observation of dependent responses and connections $(\bm{Y},\bm{Z})$ generated by the model with parameter vector $\bm{\theta}^{\star}\coloneqq(\alpha_{\mathscr{Z},1}^{\star},\,\dots,\alpha_{\mathscr{Z},N}^{\star},\,\gamma_{\mathscr{Z},\mathscr{Z}}^{\star},\,\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}^{\star})\in\mathbb{R}^{N+2}$ . If Conditions 1–3 hold, there exist constants $K\in(0,+\infty)$ and $0<L\leq U<+\infty$ along with an integer $N_{0}\in\{3,4,\dots\}$ such that, for all $N>N_{0}$ , the quantity $\delta_{N}$ satisfies

\begin{array}[]{llllllllll}L\,\sqrt{N\log N}&\leq&\delta_{N}&\leq&U\,\sqrt{N\log N},\end{array}

and the random set $\widehat{\bm{\Theta}}(\delta_{N})$ is non-empty and satisfies

\begin{array}[]{llllllllll}\widehat{\bm{\Theta}}(\delta_{N})&\subseteq&\mathscr{B}_{\infty}\left(\bm{\theta}^{\star},\;K\,\sqrt{\dfrac{\log N}{N}}\right)\end{array}

with probability at least $1-6\,/N^{2}$ .

Corollary 1 is proved in Section D of the Supplementary Materials. The same method of proof can be used to establish convergence rates for pseudo-likelihood estimators $\widehat{\bm{\Theta}}(\delta_{N})$ based on other models for dependent responses and connections $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ , provided there is additional structure to control the dependence among responses and connections.
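The rates in the corollary can be related to the general theorem by simple arithmetic. The display below is a back-of-the-envelope reading rather than a statement taken from the proof: it assumes that the corollary applies the relation $\delta_{N}=\rho_{N}/(2\,\Lambda_{N}(\bm{\theta}^{\star}))$ with $\rho_{N}=K\,\sqrt{\log N/N}$ . Dividing $\rho_{N}$ by the two bounds on $2\,\delta_{N}$ would then give

```latex
\begin{array}{lllll}
\dfrac{K}{2\,U\,N}
& \leq &
\Lambda_{N}(\bm{\theta}^{\star}) \;=\; \dfrac{\rho_{N}}{2\,\delta_{N}}
& \leq &
\dfrac{K}{2\,L\,N},
\end{array}
```

so, under this assumption, the $\ell_{\infty}$ -induced norm of the inverse negative Hessian would be of order $1/N$ in the example.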

5 Simulation Results

To evaluate the performance of pseudo-likelihood estimators $\widehat{\bm{\theta}}\in\widehat{\bm{\Theta}}(\delta_{N})$ and the accompanying uncertainty quantification, we simulate data from the example model specified by Equations (2) and (4). The coordinates of the nuisance parameter vector, $\bm{\theta}_{1}^{\star}\coloneqq(\alpha_{\mathscr{Z},1}^{\star}$ , $\dots$ , $\alpha_{\mathscr{Z},N}^{\star})\in\mathbb{R}^{N}$ , are independent Gaussian draws with mean $-3/2$ and standard deviation $3/10$ . The parameter vector of primary interest, $\bm{\theta}_{2}^{\star}\coloneqq(\lambda^{\star}$ , $\alpha_{\mathscr{Y}}^{\star}$ , $\beta_{\mathscr{X},\mathscr{Y}}^{\star}$ , $\gamma_{\mathscr{Z},\mathscr{Z}}^{\star}$ , $\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}^{\star}$ , $\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}^{\star})\in\mathbb{R}^{6}$ , is specified as $(1/5,\,-1,\,3,\,4/5,\,1/2,\,-1/2)$ . The sparsity parameter $\lambda^{\star}=1/5$ ensures that each unit has on average approximately 30 connections, regardless of the value of $N$ . The neighborhood structure is based on $L=(N-25)/25$ intersecting subpopulations $\mathscr{A}_{1},\ldots,\mathscr{A}_{L}$ , where $\mathscr{A}_{l}$ consists of the 50 units $1+25\,(l-1),\ldots,25\,(l+1)$ ( $l=1,\dots,L$ ). For each $i\in\mathscr{P}_{N}$ , we define the neighborhood $\mathscr{N}_{i}\subset\mathscr{P}_{N}$ to be the 50- or 75-unit union of all subpopulations $\mathscr{A}_{l}$ containing $i$ .
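The neighborhood structure of the simulation study can be reproduced in a few lines. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, $N$ is taken to be a multiple of 25, and the subpopulation index is taken to run over $l=1,\dots,L$ so that the subpopulations cover all $N$ units.

```python
def build_subpopulations(n_units, block=25):
    """Subpopulations A_l = {1 + block*(l-1), ..., block*(l+1)} for
    l = 1, ..., L with L = (n_units - block) / block."""
    if n_units % block != 0 or n_units < 2 * block:
        raise ValueError("n_units must be a multiple of block and at least 2*block")
    n_sub = (n_units - block) // block
    return [set(range(1 + block * (l - 1), block * (l + 1) + 1))
            for l in range(1, n_sub + 1)]


def build_neighborhoods(n_units, subpopulations):
    """Neighborhood of unit i: union of all subpopulations containing i."""
    neighborhoods = {}
    for i in range(1, n_units + 1):
        members = set()
        for sub in subpopulations:
            if i in sub:
                members |= sub
        neighborhoods[i] = members
    return neighborhoods


# Example with N = 500 units (an arbitrary choice for illustration).
subpops = build_subpopulations(500)
hoods = build_neighborhoods(500, subpops)
sizes = sorted({len(h) for h in hoods.values()})
```

Units that belong to a single subpopulation (the first and last 25 units) have 50-unit neighborhoods, whereas units that belong to two adjacent subpopulations have 75-unit neighborhoods.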

In Figure 2, the left panel shows that $|\!|\widehat{\bm{\theta}}-\bm{\theta}^{\star}|\!|_{\infty}$ decreases as $N$ increases. The right panel depicts the empirical distributions of the standardized univariate estimators and the empirical coverage probabilities, demonstrating that the covariance estimator in Equation (12) appears accurate and normal-based inference seems reasonable. As discussed in Section 1, comparisons to other estimation approaches are infeasible due to their lack of scalability.

6 Hate Speech on X

We analyze posts of U.S. state legislators on the social media platform X in the six months preceding the insurrection at the U.S. Capitol on January 6, 2021 (Kim et al., 2022), with a view to studying how hate speech depends on the attributes of legislators and connections among them. Using Large Language Models (LLMs), we classify the contents of 109,974 posts by $N=$ 2,191 legislators as “non-hate speech” or “hate speech,” as explained in Section G of the Supplementary Materials. The response $Y_{i}$ of legislator $i$ indicates whether $i$ released at least one post classified as hate speech. We use four covariates: $x_{i,1}$ indicates that legislator $i$ ’s party affiliation is Republican, $x_{i,2}$ indicates that legislator $i$ is female, $x_{i,3}$ indicates that legislator $i$ is white, and $x_{i,4}$ is the state legislature that legislator $i$ is a member of (e.g., New York). The directed connections $Z_{i,j}$ are based on the mentions and reposts exchanged between January 6, 2020 and January 6, 2021: $Z_{i,j}=1$ if legislator $i$ mentioned or reposted posts by legislator $j$ in a post. To construct the neighborhoods $\mathscr{N}_{i}$ of legislators $i$ , we exploit the fact that users of X choose whom to follow and that these choices are known, so $\mathscr{N}_{i}$ is defined as the union of $\{i\}$ and the set of users followed by $i$ .

6.1 Model Specification

To accommodate binary responses $Y_{i}\in\{0,1\}$ and connections $Z_{i,j}\in\{0,1\}$ that are directed, i.e., $Z_{i,j}$ may not be equal to $Z_{j,i}$ , we consider a model of the form

\begin{array}[]{llllllllll}f_{\bm{\theta}}(\bm{y},\,\bm{z}\mid\bm{x})&\propto&\left[\displaystyle\prod\limits_{i=1}^{N}a_{\mathscr{Y}}(y_{i})\,\exp(\bm{\theta}_{g}^{\top}\,g_{i}(\bm{x}_{i},\,y_{i}^{\star}))\right]\vskip 7.11317pt\\ &\times&\left[\displaystyle\prod\limits_{i=1}^{N}\,\displaystyle\prod\limits_{j=1,\,j\neq i}^{N}a_{\mathscr{Z}}(z_{i,j})\,\exp(\bm{\theta}_{h}^{\top}\,h_{i,j}(\bm{x},\,y_{i}^{\star},\,y_{j}^{\star},\,\bm{z}))\right],\end{array}

(13)

where $y_{i}^{\star}\coloneqq y_{i}/\psi=y_{i}$ because $\psi\coloneqq 1$ when $Y_{i}\in\{0,1\}$ ; see Example 3 in Section 2.2.1. Since $y_{i}^{\star}=y_{i}$ , we henceforth write $y_{i}$ instead of $y_{i}^{\star}$ .

Using the definitions of $c_{i,j}$ and $d_{i,j}$ in Equation (3), we specify $g_{i}$ and $h_{i,j}$ as follows:

\begin{array}[]{llllllllll}\bm{\theta}_{g}\,\coloneqq\,\left(\begin{array}[]{ccc}\alpha_{\mathscr{Y}}\\ \beta_{\mathscr{X},\mathscr{Y},m},\;m=1,2,3\end{array}\right),&g_{i}\,\coloneqq\,\left(\begin{array}[]{ccc}y_{i}\\ x_{i,m}\,y_{i},\;m=1,2,3\end{array}\right),\end{array}

(14)

\begin{array}[]{llllllllll}\bm{\theta}_{h}\coloneqq\left(\begin{array}[]{ccc}\bm{\alpha}_{\mathscr{Z},O}\\ \bm{\alpha}_{\mathscr{Z},I}\\ \lambda\\ \gamma_{\mathscr{X},\mathscr{Z},1}\\ \gamma_{\mathscr{X},\mathscr{Z},m},\;m=2,3,4\\ \gamma_{\mathscr{Y},\mathscr{Z}}\\ \gamma_{\mathscr{Z},\mathscr{Z},1}\\ \gamma_{\mathscr{Z},\mathscr{Z},2}\\ \gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\end{array}\right),&h_{i,j}\coloneqq\left(\begin{array}[]{ccc}\bm{e}_{i}\,z_{i,j}\\ \bm{e}_{j}\,z_{i,j}\\ -(1-c_{i,j})\,z_{i,j}\log N\\ c_{i,j}\,x_{i,1}\,z_{i,j}\\ c_{i,j}\,\mathbb{I}(x_{i,m}=x_{j,m})\,z_{i,j},\;m=2,3,4\\ c_{i,j}\,y_{j}\,z_{i,j}\\ \dfrac{1}{2}\,z_{i,j}\,z_{j,i}\\ d_{i,j}(\bm{z})\,z_{i,j}\\ c_{i,j}\,x_{i,1}\,y_{j}\,z_{i,j}\end{array}\right),\end{array}

(15)

where the $i$ th coordinate of $N$ -vector $\bm{e}_{i}\in\{0,1\}^{N}$ is $1$ and all other coordinates are $0$ . Here, $\bm{\alpha}_{\mathscr{Z},O}\coloneqq(\alpha_{\mathscr{Z},O,1},\ldots,\alpha_{\mathscr{Z},O,N})\in\mathbb{R}^{N}$ quantifies the activity of legislators $1,\dots,N$ , i.e., their tendency to mention or repost posts of other legislators; $\bm{\alpha}_{\mathscr{Z},I}\coloneqq(\alpha_{\mathscr{Z},I,1},\ldots,\alpha_{\mathscr{Z},I,N})\in\mathbb{R}^{N}$ quantifies the attractiveness of legislators $1,\dots,N$ , i.e., the tendency for other legislators to mention or repost posts by them; $\lambda>0$ discourages connections between legislators with non-overlapping neighborhoods; $\gamma_{\mathscr{X},\mathscr{Z},1},\dots,\gamma_{\mathscr{X},\mathscr{Z},4}\in\mathbb{R}$ capture the effects of covariates $x_{i,1},\dots,x_{i,4}$ on connections $Z_{i,j}$ ; $\gamma_{\mathscr{Y},\mathscr{Z}}\in\mathbb{R}$ is the weight of the interaction of $Y_{j}$ and $Z_{i,j}$ ; $\gamma_{\mathscr{Z},\mathscr{Z},1}\in\mathbb{R}$ quantifies the tendency to reciprocate connections; $\gamma_{\mathscr{Z},\mathscr{Z},2}\in\mathbb{R}$ quantifies the tendency to form transitive connections; and $\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}$ captures spillover from covariate $x_{i,1}$ on response $Y_{j}$ through connection $Z_{i,j}$ ; note that the spillover effect should not be interpreted as a causal effect, because the party affiliations $x_{i,1}$ of legislators $i$ are not under the control of investigators (Kim et al., 2022). Since $\sum_{i=1}^{N}\sum_{j\neq i}Z_{i,j}=\sum_{j=1}^{N}\sum_{i\neq j}Z_{i,j}$ with probability $1$ (the out-degrees and the in-degrees have the same total), we set $\alpha_{\mathscr{Z},I,N}\coloneqq 0$ to address the identifiability problem that would result if all $\alpha_{\mathscr{Z},O,i}$ and $\alpha_{\mathscr{Z},I,j}$ were allowed to vary freely.
These model terms were chosen based on domain knowledge, because model selection is an open problem: For instance, the statistic $c_{i,j}\,x_{i,1}\,y_{j}\,z_{i,j}$ with weight $\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}$ is included to assess whether the party affiliation $x_{i,1}$ of state legislators $i$ affects posts $y_{j}$ of state legislators $j$ who are connected ( $z_{i,j}=1$ ) and whose neighborhoods overlap ( $c_{i,j}=1$ ). In practice, data scientists can consult domain experts to make informed choices regarding model specifications.
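The identifiability constraint $\alpha_{\mathscr{Z},I,N}\coloneqq 0$ can be illustrated numerically. The activity and attractiveness weights enter the log odds of $Z_{i,j}$ only through the sum $\alpha_{\mathscr{Z},O,i}+\alpha_{\mathscr{Z},I,j}$ , so adding a constant to all activity weights and subtracting it from all attractiveness weights leaves every such sum unchanged. The following Python sketch uses hypothetical weights for $N=4$ units; the numerical values and function names are illustrative assumptions, not estimates or code from the paper.

```python
# Hypothetical activity (outgoing) and attractiveness (incoming) weights, N = 4.
alpha_out = [0.5, -1.0, 0.2, 0.0]
alpha_in = [-0.3, 0.4, 1.1, 0.7]


def pair_sums(a_out, a_in):
    """Return a_out[i] + a_in[j] for all ordered pairs (i, j) with i != j."""
    n = len(a_out)
    return {(i, j): a_out[i] + a_in[j]
            for i in range(n) for j in range(n) if i != j}


def shift(a_out, a_in, c):
    """Add c to every activity weight and subtract c from every attractiveness weight."""
    return [a + c for a in a_out], [a - c for a in a_in]


def pin_last_attractiveness(a_out, a_in):
    """Select the equivalent parameterization whose last attractiveness weight is 0."""
    return shift(a_out, a_in, a_in[-1])


original = pair_sums(alpha_out, alpha_in)
shifted = pair_sums(*shift(alpha_out, alpha_in, 2.5))
max_gap = max(abs(original[k] - shifted[k]) for k in original)

pinned_out, pinned_in = pin_last_attractiveness(alpha_out, alpha_in)
pinned = pair_sums(pinned_out, pinned_in)
max_gap_pinned = max(abs(original[k] - pinned[k]) for k in original)

print(f"largest change in pair sums after shifting by 2.5: {max_gap:.1e}")
print(f"last attractiveness weight after pinning: {pinned_in[-1]}")
```

Fixing one weight at zero selects a single representative from each family of equivalent parameter vectors, which is the role played by the constraint on $\alpha_{\mathscr{Z},I,N}$ .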

The specified model is estimated by an extension of the algorithm in Section 3.2 to directed connections; see Section F of the Supplementary Materials.

6.2 Results

Table 1: Maximum pseudo-likelihood estimates and standard errors based on the model specified by Equations (14) and (15).

Weight	Estimate	Standard Error	Weight	Estimate	Standard Error
$\alpha_{\mathcal{Y}}$	$-.893$	.134	$\gamma_{\mathcal{Z},\mathcal{Z},1}$	$2.57$	.033
$\beta_{\mathcal{X},\mathcal{Y},1}$	$-.257$	.105	$\gamma_{\mathcal{Z},\mathcal{Z},2}$	$.604$	.037
$\beta_{\mathcal{X},\mathcal{Y},2}$	$.069$	.094	$\gamma_{\mathcal{X},\mathcal{Z},1}$	$-.007$	.07
$\beta_{\mathcal{X},\mathcal{Y},3}$	$-.034$	.127	$\gamma_{\mathcal{X},\mathcal{Z},2}$	$.236$	.016
$\gamma_{\mathcal{Y},\mathcal{Z}}$	$.035$	.005	$\gamma_{\mathcal{X},\mathcal{Z},3}$	$.756$	.025
$\gamma_{\mathcal{X},\mathcal{Y},\mathcal{Z}}$	$.038$	.013	$\gamma_{\mathcal{X},\mathcal{Z},4}$	$4.729$	.049
			$\lambda$	$.184$	.006

To interpret the results, we exploit the fact that the conditional distributions of responses $Y_{i}$ and connections $Z_{i,j}$ can be represented by logistic regression models, with log odds

\begin{array}[]{llllllllll}\log\dfrac{\mathbb{P}_{\bm{\theta}}(Y_{i}=1\mid\text{others})}{1-\mathbb{P}_{\bm{\theta}}(Y_{i}=1\mid\text{others})}&=&\alpha_{\mathscr{Y}}+\displaystyle\sum\limits_{m=1}^{3}\beta_{\mathscr{X},\mathscr{Y},m}\,x_{i,m}+\displaystyle\sum\limits_{j:\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset}(\gamma_{\mathscr{Y},\mathscr{Z}}+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,x_{j,1})\,z_{j,i}\end{array}

and

\begin{array}[]{llllllllll}&&\log\dfrac{\mathbb{P}_{\bm{\theta}}(Z_{i,j}=1\mid\text{others})}{1-\mathbb{P}_{\bm{\theta}}(Z_{i,j}=1\mid\text{others})}\;=\;\alpha_{\mathscr{Z},O,i}+\alpha_{\mathscr{Z},I,j}+\dfrac{1}{2}\,\gamma_{\mathscr{Z},\mathscr{Z},1}\,z_{j,i}-(1-c_{i,j})\,\lambda\,\log N\vskip 7.11317pt\\ &+&c_{i,j}\left(\gamma_{\mathscr{Z},\mathscr{Z},2}\;\Delta_{i,j}(\bm{z})+\gamma_{\mathscr{X},\mathscr{Z},1}\,x_{i,1}+\displaystyle\sum\limits_{m=2}^{4}\gamma_{\mathscr{X},\mathscr{Z},m}\,\mathbb{I}(x_{i,m}=x_{j,m})+\gamma_{\mathscr{Y},\mathscr{Z}}\,y_{j}+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,x_{i,1}\,y_{j}\right).\end{array}

For instance, the positive sign of $\widehat{\gamma}_{\mathscr{X},\mathscr{Y},\mathscr{Z}}=.038$ suggests that the more Republicans interact with legislator $i$ , the higher the conditional probability that legislator $i$ uses offensive text in a post, holding everything else constant. Alternatively, one can interpret $\widehat{\gamma}_{\mathscr{X},\mathscr{Y},\mathscr{Z}}$ in terms of the conditional probability of observing a connection: The positive sign of $\widehat{\gamma}_{\mathscr{X},\mathscr{Y},\mathscr{Z}}=.038$ indicates that Republican legislators are more likely to interact with legislators who post harmful language. Other estimates align with expectations. For example, serving in the same state is the strongest predictor of reposting and mentioning activities ( $\widehat{\gamma}_{\mathscr{X},\mathscr{Z},4}=4.729$ ), while matching gender ( $\widehat{\gamma}_{\mathscr{X},\mathscr{Z},2}=.236$ ) and race ( $\widehat{\gamma}_{\mathscr{X},\mathscr{Z},3}=.756$ ) likewise increase the conditional probability of interacting. At the same time, connections affect other connections: For example, forming a connection that leads to a transitive connection is observed more often than expected under the model with $\gamma_{\mathscr{Z},\mathscr{Z},2}=0$ , holding everything else constant.
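To make the interpretation concrete, the conditional log odds of $Y_{i}=1$ stated above can be evaluated at the estimates in Table 1. The Python sketch below does so for a hypothetical legislator. It assumes that $x_{j,1}=1$ codes a Republican legislator (as the interpretation above suggests) and uses made-up covariate values and numbers of incoming connections; only the weights are taken from Table 1.

```python
import math

# Estimates copied from Table 1.
ALPHA_Y = -0.893
BETA_XY = (-0.257, 0.069, -0.034)
GAMMA_YZ = 0.035
GAMMA_XYZ = 0.038
GAMMA_XZ_4 = 4.729


def log_odds_y(x_i, senders_x1):
    """Conditional log odds of Y_i = 1.

    x_i: covariates (x_{i,1}, x_{i,2}, x_{i,3}) of legislator i.
    senders_x1: values x_{j,1} of legislators j with z_{j,i} = 1 whose
        neighborhoods overlap with the neighborhood of legislator i.
    """
    covariate_part = sum(b * x for b, x in zip(BETA_XY, x_i))
    network_part = sum(GAMMA_YZ + GAMMA_XYZ * x_j1 for x_j1 in senders_x1)
    return ALPHA_Y + covariate_part + network_part


def probability(log_odds):
    """Inverse logit transformation."""
    return 1.0 / (1.0 + math.exp(-log_odds))


x_i = (0, 1, 0)  # hypothetical covariate values of legislator i
p_none = probability(log_odds_y(x_i, []))
p_five_other = probability(log_odds_y(x_i, [0] * 5))
p_five_rep = probability(log_odds_y(x_i, [1] * 5))

# Factor by which the conditional odds of a connection change when two
# legislators with overlapping neighborhoods serve in the same state.
odds_ratio_same_state = math.exp(GAMMA_XZ_4)

print(f"no incoming connections:        {p_none:.3f}")
print(f"five incoming, x_j1 = 0:        {p_five_other:.3f}")
print(f"five incoming, x_j1 = 1:        {p_five_rep:.3f}")
print(f"odds ratio for the same state:  {odds_ratio_same_state:.1f}")
```

Under these assumptions, each incoming connection raises the conditional log odds by $\widehat{\gamma}_{\mathscr{Y},\mathscr{Z}}$ , and by an additional $\widehat{\gamma}_{\mathscr{X},\mathscr{Y},\mathscr{Z}}$ when the sender has $x_{j,1}=1$ .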

6.3 Model Assessment

Table 2: Comparison of maximum pseudo-likelihood estimates based on the logistic regression model for $Y_{i}\mid\bm{X}_{i}=\bm{x}_{i}$ without network interference and the joint probability model for $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ with network interference. Standard errors are shown in parentheses.

	$\alpha_{\mathcal{Y}}$	$\beta_{\mathcal{X},\mathcal{Y},1}$	$\beta_{\mathcal{X},\mathcal{Y},2}$	$\beta_{\mathcal{X},\mathcal{Y},3}$
Without network interference	$-.101$ (.103)	$-.235$ (.097)	$.032$ (.093)	$-.169$ (.113)
With network interference	$-.893$ (.134)	$-.257$ (.105)	$.069$ (.094)	$-.034$ (.127)

We assess the model using model-based predictions of $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ . First, we focus on the subnetwork of all pairs of legislators $\{i,j\}\subset\mathscr{P}_{N}$ with $x_{i,1}=Y_{j}=1$ and $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset$ , with a view to assessing how well the interplay of $x_{i,1}$ , $Y_{j}$ , and $Z_{i,j}$ can be represented by the model. Figure 3 shows that the model captures the effect of $x_{i,1}$ on $Y_{j}$ among pairs of legislators $\{i,j\}\subset\mathscr{P}_{N}$ with $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset$ . Second, we compare predictions of responses $Y_{i}$ based on models with and without network interference. Predictions without network interference are based on the logistic regression model for $Y_{i}\mid\bm{X}_{i}=\bm{x}_{i}$ with weights $\alpha_{\mathscr{Y}}$ , $\beta_{\mathscr{X},\mathscr{Y},1}$ , $\beta_{\mathscr{X},\mathscr{Y},2}$ , and $\beta_{\mathscr{X},\mathscr{Y},3}$ , which assumes that the posts $Y_{i}$ of legislators $i$ are independent and do not depend on the connections $\bm{Z}$ among the legislators. By contrast, predictions with network interference are based on the joint probability model for $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ specified by Equation (13), which allows the posts $Y_{i}$ of state legislators $i$ to be affected by the posts $Y_{j}$ of connected legislators $j$ whose sets of followees overlap. Figure 4 demonstrates that predictions based on models with network interference outperform those without network interference, suggesting that posts of connected state legislators with overlapping sets of followees are interdependent. Table 2 compares estimates of $\alpha_{\mathcal{Y}}$ , $\beta_{\mathcal{X},\mathcal{Y},1}$ , $\beta_{\mathcal{X},\mathcal{Y},2}$ , and $\beta_{\mathcal{X},\mathcal{Y},3}$ based on models with and without network interference. 
While both models agree on the signs of parameter estimates, the estimates of $\beta_{\mathcal{X},\mathcal{Y},2}$ and $\beta_{\mathcal{X},\mathcal{Y},3}$ differ by a factor of 2 and 5, respectively, suggesting that network interference affects parameter estimates. Third, we demonstrate in Section G.2 of the Supplementary Materials that the model preserves salient features of connections $\bm{Z}$ .
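The comparison of the two rows of Table 2 can be reproduced with a few lines of arithmetic. The Python sketch below copies the point estimates from Table 2 and reports, for each weight, whether the signs agree and the ratio of the larger to the smaller absolute estimate; the dictionary keys and the helper function are illustrative names.

```python
# Point estimates copied from Table 2 (standard errors omitted).
without_interference = {"alpha_Y": -0.101, "beta_1": -0.235,
                        "beta_2": 0.032, "beta_3": -0.169}
with_interference = {"alpha_Y": -0.893, "beta_1": -0.257,
                     "beta_2": 0.069, "beta_3": -0.034}


def fold_difference(a, b):
    """Ratio of the larger absolute value to the smaller one."""
    return max(abs(a), abs(b)) / min(abs(a), abs(b))


signs_agree = all(
    (without_interference[k] < 0) == (with_interference[k] < 0)
    for k in without_interference
)
folds = {
    k: fold_difference(without_interference[k], with_interference[k])
    for k in without_interference
}

print("signs agree:", signs_agree)
for name, fold in folds.items():
    print(f"{name}: estimates differ by a factor of {fold:.2f}")
```

The ratios for $\beta_{\mathcal{X},\mathcal{Y},2}$ and $\beta_{\mathcal{X},\mathcal{Y},3}$ round to 2 and 5, in line with the statement above; the same calculation applied to the intercept $\alpha_{\mathcal{Y}}$ gives a ratio of roughly 9.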

7 Discussion

The proposed regression framework is flexible, allowing data scientists to specify a wide range of models for dependent responses and connections $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ .

The large set of possible models raises the question of how data scientists can select a model from a set of candidate models. While model selection for independent responses $Y_{i}\mid\bm{X}_{i}=\bm{x}_{i}$ is well-established and model selection for independent connections $\bm{Z}\mid\bm{X}=\bm{x}$ is an active area of research (e.g., Wang and Bickel, 2017; Wang et al., 2024; Stein et al., 2025), model selection for dependent responses and connections $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ with $p\to\infty$ parameters is an open problem. Two model selection ideas that hold promise are those of Ravikumar et al. (2010) for high-dimensional graphical models for dependent responses $\bm{Y}$ and those of Chen and Chen (2012) for high-dimensional generalized linear models for independent responses $Y_{i}\mid\bm{X}_{i}=\bm{x}_{i}$ based on the extended BIC.

Likewise, open questions remain in the realm of uncertainty quantification, as discussed in Section 3.3. For example, a proof of asymptotic normality based on dependent data remains elusive. A related avenue for future research is Godambe information: If $-\nabla_{\bm{\theta}}^{2}\,\,\ell(\bm{\theta};\,\bm{Y},\bm{Z})|_{\bm{\theta}=\widehat{\bm{\theta}}}^{-1}$ were constant, it could be pulled out of the approximate covariance matrix in Equation (12), giving rise to Godambe information. Simulations suggest that uncertainty quantification based on Godambe information achieves comparable accuracy to the method reported here, while avoiding multiple matrix inversions.

The question of neighborhood recovery is another important direction for future research, as discussed in Section 2.

Supplementary Materials

The supplementary materials contain proofs of all theoretical results.

Acknowledgements

The authors are indebted to the constructive comments and suggestions of an anonymous associate editor and two referees, which have led to numerous improvements.

Disclosure Statement

The authors report there are no competing interests to declare.

Funding

The authors acknowledge support by DFG award FR 4768/1-1 (CF) and ARO award W911NF-21-1-0335 (CF, MS, SB).

References

Besag (1974) Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society: Series B 36, 192–236.
Brown (1986) Brown, L. (1986). Fundamentals of Statistical Exponential Families: With Applications in Statistical Decision Theory. Hayward, CA, USA: Institute of Mathematical Statistics.
Chatterjee and Diaconis (2013) Chatterjee, S. and P. Diaconis (2013). Estimating and understanding exponential random graph models. The Annals of Statistics 41, 2428–2461.
Chen and Chen (2012) Chen, J. and Z. Chen (2012). Extended BIC for small- $n$ -large- $p$ sparse GLM. Statistica Sinica 22, 555–574.
Clark and Handcock (2024) Clark, D. A. and M. S. Handcock (2024). Causal inference over stochastic networks. Journal of the Royal Statistical Society Series A: Statistics in Society 187, 772–795.
Comets and Janzura (1998) Comets, F. and M. Janzura (1998). A central limit theorem for conditionally centred random fields with an application to Markov fields. Journal of Applied Probability 35, 608–621.
Efron (2022) Efron, B. (2022). Exponential Families in Theory and Practice. Cambridge, UK: Cambridge University Press.
Fosdick and Hoff (2015) Fosdick, B. K. and P. D. Hoff (2015). Testing and modeling dependencies between a network and nodal attributes. Journal of the American Statistical Association 110, 1047–1056.
Handcock (2003) Handcock, M. S. (2003). Statistical models for social networks: Inference and degeneracy. In R. Breiger, K. Carley, and P. Pattison (Eds.), Dynamic Social Network Modeling and Analysis, pp. 1–12. Washington, D.C.: National Academies Press.
Huang et al. (2019) Huang, D., W. Lan, H. H. Zhang, and H. Wang (2019). Least squares estimation of spatial autoregressive models for large-scale social networks. Electronic Journal of Statistics 13, 1135–1165.
Huang et al. (2020) Huang, D., F. Wang, X. Zhu, and H. Wang (2020). Two-mode network autoregressive model for large-scale networks. Journal of Econometrics 216, 203–219.
Huang et al. (2024) Huang, S., J. Sun, and Y. Feng (2024). PCABM: Pairwise covariates-adjusted block model for community detection. Journal of the American Statistical Association 119, 2092–2104.
Hunter and Lange (2004) Hunter, D. R. and K. Lange (2004). A tutorial on MM algorithms. The American Statistician 58, 30–37.
Jensen and Künsch (1994) Jensen, J. L. and H. R. Künsch (1994). On asymptotic normality of pseudo likelihood estimates for pairwise interaction processes. Annals of the Institute of Statistical Mathematics 46, 475–486.
Kim et al. (2022) Kim, T., N. Nakka, I. Gopal, B. A. Desmarais, A. Mancinelli, J. J. Harden, H. Ko, and F. J. Boehmke (2022). Attention to the COVID‐19 pandemic on Twitter: Partisan differences among U.S. state legislators. Legislative Studies Quarterly 47, 1023–1041.
Kolaczyk (2017) Kolaczyk, E. D. (2017). Topics at the Frontier of Statistics and Network Analysis: (Re)Visiting the Foundations. Cambridge University Press.
Krivitsky et al. (2023) Krivitsky, P. N., P. Coletti, and N. Hens (2023). A tale of two datasets: Representativeness and generalisability of inference for samples of networks. Journal of the American Statistical Association 118, 2213–2224.
Le and Li (2022) Le, C. M. and T. Li (2022). Linear regression and its inference on noisy network-linked data. Journal of the Royal Statistical Society Series B: Statistical Methodology 84, 1851–1885.
Lei et al. (2024) Lei, J., K. Chen, and H. Moon (2024). Least squares inference for data with network dependency. Available at arXiv:2404.01977.
Li and Wager (2022) Li, S. and S. Wager (2022). Random graph asymptotics for treatment effect estimation under network interference. The Annals of Statistics 50, 2334 – 2358.
Li et al. (2019) Li, T., E. Levina, and J. Zhu (2019). Prediction models for network-linked data. The Annals of Applied Statistics 13(1), 132–164.
Niezink and Snijders (2017) Niezink, N. M. D. and T. A. B. Snijders (2017). Co-evolution of social networks and continuous actor attributes. The Annals of Applied Statistics 11, 1948–1973.
Ogburn et al. (2024) Ogburn, E. L., O. Sofrygin, I. Diaz, and M. J. Van der Laan (2024). Causal inference for social network data. Journal of the American Statistical Association 119, 597–611.
Ortega and Rheinboldt (2000) Ortega, J. M. and W. C. Rheinboldt (2000). Iterative solution of nonlinear equations in several variables. Society for Industrial and Applied Mathematics.
Ravikumar et al. (2010) Ravikumar, P., M. J. Wainwright, and J. Lafferty (2010). High-dimensional Ising model selection using $\ell_{1}$ -regularized logistic regression. The Annals of Statistics 38, 1287–1319.
Schweinberger (2011) Schweinberger, M. (2011). Instability, sensitivity, and degeneracy of discrete exponential families. Journal of the American Statistical Association 106, 1361–1370.
Snijders et al. (2007) Snijders, T. A. B., C. E. G. Steglich, and M. Schweinberger (2007). Modeling the co-evolution of networks and behavior. In K. van Montfort, H. Oud, and A. Satorra (Eds.), Longitudinal models in the behavioral and related sciences, pp. 41–71. Lawrence Erlbaum.
Stein et al. (2025) Stein, S., R. Feng, and C. Leng (2025). A sparse beta regression model for network analysis. Journal of the American Statistical Association. To appear.
Stewart and Schweinberger (2025) Stewart, J. R. and M. Schweinberger (2025). Pseudo-likelihood-based $M$ -estimators for random graphs with dependent edges and parameter vectors of increasing dimension. The Annals of Statistics. To appear.
Tchetgen Tchetgen et al. (2021) Tchetgen Tchetgen, E. J., I. R. Fulcher, and I. Shpitser (2021). Auto-G-computation of causal effects on a network. Journal of the American Statistical Association 116, 833–844.
Wang et al. (2024) Wang, J., X. Cai, X. Niu, and R. Li (2024). Variable selection for high-dimensional nodal attributes in social networks with degree heterogeneity. Journal of the American Statistical Association 119, 1322–1335.
Wang and Bickel (2017) Wang, Y. X. and P. J. Bickel (2017). Likelihood-based model selection for stochastic block models. The Annals of Statistics 45, 500–528.
Wang et al. (2024) Wang, Z., I. E. Fellows, and M. S. Handcock (2024). Understanding networks with exponential-family random network models. Social Networks 78, 81–91.
Zhu et al. (2020) Zhu, X., D. Huang, R. Pan, and H. Wang (2020). Multivariate spatial autoregressive model for large scale social networks. Journal of Econometrics 215, 591–606.

Supplementary Materials:
A Regression Framework for Studying Relationships among Attributes under Network Interference


Appendix A Proofs of Propositions 2.1 and 2

Proof of Proposition 2.1. The joint probability density function of $(\bm{Y},\,\bm{Z})\mid\bm{X}=\bm{x}$ stated in Equation (1) in Section 2 implies that the conditional probability density function of $Y_{i}\mid(\bm{X},\,\bm{Y}_{-i},\,\bm{Z})=(\bm{x},\,\bm{y}_{-i},\,\bm{z})$ can be written as

\begin{array}[]{llllllllll}&&f_{\bm{\theta}}(y_{i}\mid\bm{x},\,\bm{y}_{-i},\,\bm{z})\;=\;\dfrac{f_{\bm{\theta}}(y_{i},\,\bm{y}_{-i},\,\bm{z}\mid\bm{x})}{\displaystyle\int\limits_{\mathscr{Y}_{i}}f_{\bm{\theta}}(y,\,\bm{y}_{-i},\,\bm{z}\mid\bm{x})\mathop{\mbox{d}}\nolimits\nu_{\mathscr{Y}}(y)}\vskip 7.11317pt\vskip 7.11317pt\\ &=&\dfrac{a_{\mathscr{Y}}(y_{i})\,\exp\left(\bm{\theta}_{g}^{\top}\,g_{i,1}(\bm{x}_{i})\,y_{i}^{\star}+\left(\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\;\bm{\theta}_{h}^{\top}\,h_{i,j,1}(\bm{x},\,y_{j}^{\star},\,\bm{z})\right)y_{i}^{\star}\right)}{\displaystyle\int\limits_{\mathscr{Y}_{i}}a_{\mathscr{Y}}(y)\,\exp\left(\bm{\theta}_{g}^{\top}\,g_{i,1}(\bm{x}_{i})\,y^{\star}+\left(\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\;\bm{\theta}_{h}^{\top}\,h_{i,j,1}(\bm{x},\,y_{j}^{\star},\,\bm{z})\right)y^{\star}\right)\mathop{\mbox{d}}\nolimits\nu_{\mathscr{Y}}(y)}\vskip 7.11317pt\vskip 7.11317pt\\ &=&a_{\mathscr{Y}}(y_{i})\,\exp\left(\dfrac{\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z})\,y_{i}-b_{i}(\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z}))}{\psi}\right),\end{array}

where $y^{\star}\coloneqq y/\psi$ , $y_{i}^{\star}\coloneqq y_{i}/\psi$ , and $\bm{y}_{-i}^{\star}\coloneqq\bm{y}_{-i}/\psi$ , while

\begin{array}[]{llllllllll}\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z})\;\coloneqq\;\bm{\theta}^{\top}\left(g_{i,1}(\bm{x}_{i}),\;\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,h_{i,j,1}(\bm{x},\,y_{j}^{\star},\,\bm{z})\right)\vskip 7.11317pt\\ b_{i}(\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z}))\;\coloneqq\;\psi\,\log\displaystyle\int\limits_{\mathscr{Y}_{i}}\,a_{\mathscr{Y}}(y)\exp\left(\dfrac{\eta_{i}(\bm{\theta};\,\bm{x},\,\bm{y}_{-i}^{\star},\,\bm{z})\,y}{\psi}\right)\mathop{\mbox{d}}\nolimits\nu_{\mathscr{Y}}(y).\end{array}
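As a check on the form of the conditional density, the definition of $b_{i}$ can be evaluated for two familiar special cases. The Python sketch below assumes $\psi=1$ and a counting measure $\nu_{\mathscr{Y}}$ , so that the integral becomes a sum: with $a_{\mathscr{Y}}(y)=1$ on $\{0,1\}$ the log normalizer is that of logistic regression, and with $a_{\mathscr{Y}}(y)=1/y!$ on the non-negative integers (truncated at 60 for numerical summation) it is that of Poisson regression. The value of the linear predictor and all function names are hypothetical.

```python
import math


def log_normalizer(eta, support, a, psi=1.0):
    """b(eta) = psi * log sum_y a(y) exp(eta * y / psi) over a finite support."""
    return psi * math.log(sum(a(y) * math.exp(eta * y / psi) for y in support))


def density(y, eta, support, a, psi=1.0):
    """a(y) * exp((eta * y - b(eta)) / psi), the form derived above."""
    b = log_normalizer(eta, support, a, psi)
    return a(y) * math.exp((eta * y - b) / psi)


eta = 0.4  # hypothetical value of the linear predictor eta_i


def one(y):
    return 1.0


def inv_factorial(y):
    return 1.0 / math.factorial(y)


# Binary responses: b(eta) = log(1 + exp(eta)) and P(Y = 1) is the inverse logit.
binary_support = (0, 1)
binary_b = log_normalizer(eta, binary_support, one)
binary_p1 = density(1, eta, binary_support, one)

# Count responses: b(eta) = exp(eta), up to a negligible truncation error.
count_support = range(61)
count_b = log_normalizer(eta, count_support, inv_factorial)
count_total = sum(density(y, eta, count_support, inv_factorial)
                  for y in count_support)

print(binary_b, math.log1p(math.exp(eta)))
print(binary_p1, 1.0 / (1.0 + math.exp(-eta)))
print(count_b, math.exp(eta))
print(count_total)
```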

Proposition 2. Consider Example 1 in Section 2.2.1. Let $\bm{U}\in\{0,1\}^{N\times N}$ be the $N\times N$ matrix with elements

\begin{array}[]{llllllllll}u_{i,j}&\coloneqq&c_{i,j}\,z_{i,j}&=&\mathbbm{1}(\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset)\,z_{i,j},\end{array}

(A.1)

and let $\bm{v}\in\mathbb{R}^{N}$ be the $N$ -vector with coordinates

\begin{array}[]{llllllllll}v_{i}&\coloneqq&\alpha_{\mathscr{Y}}+\beta_{\mathscr{X},\mathscr{Y}}\;x_{i}+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;x_{j}.\end{array}

(A.2)

Denote by $\bm{I}$ the $N\times N$ identity matrix and define $\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\coloneqq\,\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}/\psi$ . If $(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})$ is positive definite, the conditional distribution of $\bm{Y}\mid(\bm{X},\,\bm{Z})=(\bm{x},\,\bm{z})$ is $N$ -variate Gaussian with mean vector $(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})^{-1}\,\bm{v}$ and covariance matrix $\psi\,(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})^{-1}$ .

Remark. The requirement that $(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})$ be positive definite imposes restrictions on $\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}$ . The restrictions on $\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}$ depend on the neighborhoods $\mathscr{N}_{i}$ of units $i\in\mathscr{P}_{N}$ and connections $Z_{i,j}$ among pairs of units $\{i,j\}\subset\mathscr{P}_{N}$ .
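The restriction can be made explicit in the smallest non-trivial case. For $N=2$ units with $u_{1,2}=u_{2,1}=1$ , the matrix $(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})$ has eigenvalues $1\pm\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}$ and is therefore positive definite if and only if $|\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}|<1$ . The Python sketch below computes the mean vector and covariance matrix of Proposition 2 in this case from the closed-form inverse of a $2\times 2$ matrix; the input values and the function name are hypothetical.

```python
def gaussian_pair(v, xi, psi):
    """Mean and covariance of (Y_1, Y_2) when u_{1,2} = u_{2,1} = 1.

    I - xi U = [[1, -xi], [-xi, 1]] has eigenvalues 1 - xi and 1 + xi,
    so it is positive definite if and only if |xi| < 1.
    """
    if abs(xi) >= 1.0:
        raise ValueError("I - xi U is not positive definite")
    det = 1.0 - xi * xi
    inverse = [[1.0 / det, xi / det],
               [xi / det, 1.0 / det]]
    mean = [inverse[0][0] * v[0] + inverse[0][1] * v[1],
            inverse[1][0] * v[0] + inverse[1][1] * v[1]]
    covariance = [[psi * entry for entry in row] for row in inverse]
    return mean, covariance


v = (1.0, 2.0)      # hypothetical values of v_1 and v_2
xi, psi = 0.5, 2.0  # hypothetical values of xi and psi
mean, covariance = gaussian_pair(v, xi, psi)
print("mean:", mean)
print("covariance:", covariance)
```

As $|\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}|$ approaches 1, the determinant $1-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}^{2}$ approaches 0 and the variances grow without bound, which shows why the restriction is needed.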

Proof of Proposition 2. Example 1 in Section 2.2.1 demonstrates that the conditional distribution of $Y_{i}\mid(\bm{X},\,\bm{Y}_{-i},\,\bm{Z})=(\bm{x},\,\bm{y}_{-i},\,\bm{z})$ is Gaussian with conditional mean

\begin{array}[]{llllllllll}\mathbb{E}(Y_{i}\mid\bm{x},\,\bm{y}_{-i},\,\bm{z})&=&\alpha_{\mathscr{Y}}+\beta_{\mathscr{X},\mathscr{Y}}\,x_{i}+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;x_{j}+\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;y_{j}^{\star}\vskip 7.11317pt\\ &=&v_{i}+\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;y_{j},\end{array}

(A.3)

where

\begin{array}[]{llllllllll}v_{i}&\coloneqq&\alpha_{\mathscr{Y}}+\beta_{\mathscr{X},\mathscr{Y}}\,x_{i}+\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;x_{j}\end{array}

and

\begin{array}[]{llllllllll}\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}&\coloneqq&\dfrac{\gamma_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}}{\psi}.\end{array}

The conditional variance of $Y_{i}\mid(\bm{X},\,\bm{Y}_{-i},\bm{Z})=(\bm{x},\,\bm{y}_{-i},\,\bm{z})$ is

\begin{array}[]{llllllllll}\mathbb{V}(Y_{i}\mid\bm{x},\,\bm{y}_{-i},\,\bm{z})&=&\psi.\end{array}

(A.4)

Let $\bm{m}\coloneqq(m_{i})\in\mathbb{R}^{N}$ be the conditional mean of $\bm{Y}\mid(\bm{X},\,\bm{Z})=(\bm{x},\,\bm{z})$ . Upon taking expectation on both sides of (A.3) conditional on $(\bm{X},\,\bm{Z})=(\bm{x},\,\bm{z})$ , we obtain

\begin{array}[]{llllllllll}m_{i}&=&v_{i}+\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;m_{j},\end{array}

(A.5)

which implies that

\begin{array}[]{llllllllll}v_{i}&=&m_{i}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;m_{j}\end{array}

and hence

\begin{array}[]{llllllllll}\mathbb{E}(Y_{i}\mid\bm{x},\,\bm{y}_{-i},\,\bm{z})&=&v_{i}+\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;y_{j}\vskip 7.11317pt\\ &=&m_{i}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;m_{j}+\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;y_{j}\vskip 7.11317pt\\ &=&m_{i}+\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\,\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,u_{i,j}\;(y_{j}-m_{j})\vskip 7.11317pt\\ &=&m_{i}-\displaystyle\sum\limits_{j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}}\,b_{i,j}\;(y_{j}-m_{j}),\end{array}

(A.6)

where

\begin{array}[]{llllllllll}b_{i,j}&\coloneqq&-\,\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;u_{i,j}.\end{array}

By comparing Equations (A.4) and (A.6) to Equations (2.17) and (2.18) of Rue and Held (2005) and invoking Theorem 2.6 of Rue and Held (2005), we conclude that the conditional distribution of $\bm{Y}\mid(\bm{X},\,\bm{Z})=(\bm{x},\,\bm{z})$ is $N$ -variate Gaussian with mean vector $\bm{m}\in\mathbb{R}^{N}$ and precision matrix $\bm{P}\in\mathbb{R}^{N\times N}$ with elements

\begin{array}[]{llllllllll}p_{i,j}&\coloneqq&\begin{cases}\dfrac{1}{\psi}&\mbox{if }i=j\vskip 7.11317pt\\ \dfrac{b_{i,j}}{\psi}&\mbox{if }i\neq j,\end{cases}\end{array}

provided $u_{i,j}=u_{j,i}$ for all $i\neq j$ and $\bm{P}$ is positive definite; note that $u_{i,j}=u_{j,i}$ is satisfied in undirected networks with $z_{i,j}=z_{j,i}$ .

To state these results in matrix form, note that (A.5) can be expressed as

\begin{array}[]{llllllllll}\bm{m}&=&\bm{v}+\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U}\,\bm{m},\end{array}

implying

\begin{array}[]{llllllllll}\bm{m}&=&(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})^{-1}\;\bm{v},\end{array}

while $\bm{P}$ can be expressed as

\begin{array}[]{llllllllll}\bm{P}&=&\dfrac{1}{\psi}\,(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U}),\end{array}

implying

\begin{array}[]{llllllllll}\bm{P}^{-1}&=&\psi\,(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})^{-1}.\end{array}

To conclude, the conditional distribution of $\bm{Y}\mid(\bm{X},\,\bm{Z})=(\bm{x},\,\bm{z})$ is $N$ -variate Gaussian with mean vector $(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})^{-1}\,\bm{v}$ and covariance matrix $\psi\,(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})^{-1}$ , provided $(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})$ is positive definite.
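The matrix identities in the proof can be checked numerically on a small example. The Python sketch below considers $N=3$ units on a path ( $u_{1,2}=u_{2,1}=u_{2,3}=u_{3,2}=1$ ) and solves $(\bm{I}-\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U})\,\bm{m}=\bm{v}$ by iterating $\bm{m}\leftarrow\bm{v}+\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U}\,\bm{m}$ . The iteration is a choice made for this sketch, not the method of the paper, and it assumes that all eigenvalues of $\xi_{\mathscr{Y},\mathscr{Y},\mathscr{Z}}\;\bm{U}$ are less than 1 in absolute value, which is stronger than positive definiteness. All numerical values are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def solve_by_iteration(U, xi, v, sweeps=200):
    """Solve (I - xi U) m = v by iterating m <- v + xi U m.

    Converges when all eigenvalues of xi U are below 1 in absolute value.
    """
    m = list(v)
    for _ in range(sweeps):
        m = [v_i + xi * um_i for v_i, um_i in zip(v, matvec(U, m))]
    return m


N = 3
U = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]        # path 1 - 2 - 3 with overlapping neighborhoods
v = [1.0, 0.5, -1.0]   # hypothetical vector v
xi, psi = 0.3, 2.0     # eigenvalues of xi U are 0 and +/- 0.3 * sqrt(2)

# Mean vector m = (I - xi U)^{-1} v and the fixed-point equation (A.5).
m = solve_by_iteration(U, xi, v)
fixed_point_gap = max(abs(m_i - v_i - xi * um_i)
                      for m_i, v_i, um_i in zip(m, v, matvec(U, m)))


def unit_vector(k):
    return [1.0 if i == k else 0.0 for i in range(N)]


# Columns of the covariance matrix psi (I - xi U)^{-1}.
cov_columns = [[psi * e for e in solve_by_iteration(U, xi, unit_vector(k))]
               for k in range(N)]

# Precision matrix P = (I - xi U) / psi; P times the covariance should be I.
P = [[((1.0 if i == j else 0.0) - xi * U[i][j]) / psi for j in range(N)]
     for i in range(N)]
identity_gap = max(abs(matvec(P, cov_columns[k])[i] - unit_vector(k)[i])
                   for k in range(N) for i in range(N))

print("mean vector:", m)
print("fixed-point gap:", fixed_point_gap)
print("gap between P times covariance and I:", identity_gap)
```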

Appendix B Proofs of Lemmas 3.1 and 3.2

Proof of Lemma 3.1. Lemma 3.1 is proved in the sentence preceding the statement of Lemma 3.1 in Section 3.1.

Proof of Lemma 3.2. Letting $\bm{\Theta}_{1}$ denote the parameter space of $\bm{\theta}_{1}$ , suppose that $v:\bm{\Theta}_{1}\mapsto\mathbb{R}$ is any twice differentiable function and that $\nabla^{2}\,v(\bm{\theta})-\bm{M}$ is non-negative definite for all $\bm{\theta}\in\bm{\Theta}_{1}$ for some constant matrix $\bm{M}\in\mathbb{R}^{d\times d}$ ( $d\geq 1$ ). Then the function $u:\bm{\Theta}_{1}\mapsto\mathbb{R}$ given by

\begin{array}[]{llllllllll}u(\bm{\theta}_{1})&\coloneqq&v(\bm{\theta}_{0})+(\bm{\theta}_{1}-\bm{\theta}_{0})^{\top}\,\nabla\,v(\bm{\theta}_{0})+\dfrac{1}{2}(\bm{\theta}_{1}-\bm{\theta}_{0})^{\top}\bm{M}(\bm{\theta}_{1}-\bm{\theta}_{0}),&&\bm{\theta}_{0}\in\bm{\Theta}_{1}\end{array}

satisfies $u(\bm{\theta}_{1})\leq v(\bm{\theta}_{1})$ for all $\bm{\theta}_{1}\in\bm{\Theta}_{1}$ , because Taylor’s theorem (Theorem 6.11, Magnus and Neudecker, 2019, p. 124) gives

\begin{array}[]{llllllllll}v(\bm{\theta}_{1})-u(\bm{\theta}_{1})\;=\;\dfrac{1}{2}\,(\bm{\theta}_{1}-\bm{\theta}_{0})^{\top}\left[\nabla^{2}\,v(\dot{\bm{\theta}})-\bm{M}\right](\bm{\theta}_{1}-\bm{\theta}_{0})\;\geq\;0,\end{array}

where $\dot{\bm{\theta}}\coloneqq\phi\,\bm{\theta}_{0}+(1-\phi)\,\bm{\theta}_{1}\in\bm{\Theta}_{1}$ ( $\phi\in[0,\,1]$ ). The inequality $1/4\,\geq\,\pi_{i,j}\,(1-\pi_{i,j})$ implies that

\begin{array}[]{llllllllll}-[\bm{A}(\bm{\theta}_{1})-\bm{A}^{\star}]&=&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\left[\dfrac{1}{4}-\pi_{i,j}^{(t)}\,(1-\pi_{i,j}^{(t)})\right]\,\bm{e}_{i,j}\,\bm{e}_{i,j}^{\top}\end{array}

is non-negative definite. Lemma 3.1 proves that $\bm{\theta}_{1}$ is concave and that the restriction of $\ell(\bm{\theta})$ to $\bm{\theta}_{1}$ has the properties of $v(\bm{\theta}_{1})$ stated above, proving Lemma 3.2.
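A scalar analogue may help to see how the quadratic minorizer works. The Python sketch below takes $v(\theta)$ to be the log-likelihood of 10 hypothetical binary observations with 3 successes and log odds $\theta$ , whose second derivative $-10\,\pi\,(1-\pi)$ is bounded below by $M=-10/4$ . It checks on a grid that the quadratic function $u$ constructed as above never exceeds $v$ , and then iterates the maximizer of $u$ , which is a minorization-maximization update in this one-dimensional setting. The data, starting value, and function names are assumptions made for illustration.

```python
import math

N_OBS, SUCCESSES = 10, 3  # hypothetical data
M = -N_OBS / 4.0          # lower bound on the second derivative of loglik


def loglik(theta):
    """Log-likelihood v(theta) of N_OBS binary observations with log odds theta."""
    return SUCCESSES * theta - N_OBS * math.log1p(math.exp(theta))


def score(theta):
    """First derivative of loglik."""
    return SUCCESSES - N_OBS / (1.0 + math.exp(-theta))


def minorizer(theta1, theta0):
    """Quadratic function u(theta1) anchored at theta0."""
    step = theta1 - theta0
    return loglik(theta0) + step * score(theta0) + 0.5 * M * step * step


def mm_update(theta0):
    """Maximizer of the minorizer anchored at theta0."""
    return theta0 - score(theta0) / M


theta0 = 2.0
grid = [-6.0 + 0.05 * k for k in range(241)]
smallest_gap = min(loglik(t) - minorizer(t, theta0) for t in grid)

path = [theta0]
for _ in range(50):
    path.append(mm_update(path[-1]))

print("smallest value of v - u on the grid:", smallest_gap)
print("final iterate:", path[-1])
print("maximizer log(3/7):", math.log(3.0 / 7.0))
```

Because $u$ touches $v$ at $\theta_{0}$ and lies below it elsewhere, each update cannot decrease $v$ , which is the ascent property that Lemma 3.2 supplies for the algorithm.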

Appendix C Proof of Theorem 4

Theorem 4 is a generalization of Theorem 2 of Stewart and Schweinberger (2025, abbreviated as S25) from exponential family models for binary connections $\bm{Z}$ to exponential family models for binary, count-, and real-valued responses and connections $(\bm{Y},\bm{Z})\mid\bm{X}=\bm{x}$ . We henceforth suppress predictors $\bm{x}\in\mathscr{X}$ .

Proof of Theorem 4. Let $\bm{s}(\bm{\theta};\,\bm{y},\bm{z})\coloneqq\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\,\bm{y},\bm{z})$ and consider events

\begin{array}[]{cll}\mathscr{C}(\delta_{N})&\coloneqq&\left\{(\bm{y},\,\bm{z})\in\mathscr{Y}\times\mathscr{Z}:\,\left|\!\left|\bm{s}(\bm{\theta}^{\star};\,\bm{y},\,\bm{z})\right|\!\right|_{\infty}\;\leq\;\delta_{N}\right\}\vskip 7.11317pt\\ \mathscr{H}&\subseteq&\left\{(\bm{y},\,\bm{z})\in\mathscr{Y}\times\mathscr{Z}:\;-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{y},\,\bm{z})\mbox{ is invertible for all $\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})$}\right\}.\end{array}

Define

\begin{array}[]{cll}\Lambda_{N,\bm{y},\bm{z}}(\bm{\theta}^{\star})&\coloneqq&\sup\limits_{\bm{\theta}\,\in\,\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})}\,|\!|\!|(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{y},\,\bm{z}))^{-1}|\!|\!|_{\infty},\;\;\;(\bm{y},\,\bm{z})\in\mathscr{H}\vskip 7.11317pt\\ \Lambda_{N}(\bm{\theta}^{\star})&\coloneqq&\sup\limits_{(\bm{y},\,\bm{z})\,\in\,\mathscr{H}}\,\Lambda_{N,\bm{y},\bm{z}}(\bm{\theta}^{\star}).\end{array}

It follows from results of S25 that $\bm{s}(\bm{\theta};\,\bm{y},\bm{z})$ , considered as a function of $\bm{\theta}\in\bm{\Theta}$ for fixed $(\bm{y},\bm{z})\in\mathscr{H}$ , is a homeomorphism and is continuously differentiable.

In the event $(\bm{Y},\bm{Z})\in\mathscr{C}(\delta_{N})$ , the set $\widehat{\bm{\Theta}}(\delta_{N})$ is non-empty. By construction of the sets $\mathscr{C}(\delta_{N})$ and $\widehat{\bm{\Theta}}(\delta_{N})$ , the set $\widehat{\bm{\Theta}}(\delta_{N})$ is non-empty for all $(\bm{y},\,\bm{z})\in\mathscr{C}(\delta_{N})$ , because $\widehat{\bm{\Theta}}(\delta_{N})$ contains the data-generating parameter vector $\bm{\theta}^{\star}\in\bm{\Theta}$ provided $(\bm{y},\,\bm{z})\in\mathscr{C}(\delta_{N})$ :

\begin{array}[]{llllllllll}\bm{\theta}^{\star}&\in&\widehat{\bm{\Theta}}(\delta_{N})&\coloneqq&\left\{\bm{\theta}\in\bm{\Theta}:\;\left|\!\left|\bm{s}(\bm{\theta};\,\bm{y},\,\bm{z})\right|\!\right|_{\infty}\,\leq\,\delta_{N}\right\}.\end{array}

In the event $(\bm{Y},\bm{Z})\in\mathscr{C}(\delta_{N})\,\cap\,\,\mathscr{H}$ , the set $\widehat{\bm{\Theta}}(\delta_{N})$ satisfies $\widehat{\bm{\Theta}}(\delta_{N})\subseteq\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\rho_{N})$ provided $N>N_{0}$ . By assumption, there exists a sequence $\rho_{1},\rho_{2},\dots\in[0,+\infty)$ such that $\rho_{N}=o(1)$ . Therefore, there exists an integer $N_{0}\in\{1,2,\dots\}$ such that $\rho_{N}<\epsilon^{\star}$ for all $N>N_{0}$ . Consider any $N>N_{0}$ and any $(\bm{y},\,\bm{z})\in\mathscr{C}(\delta_{N})\,\cap\,\mathscr{H}$ . Since $\bm{s}^{-1}(\,\cdot\,;\,\bm{y},\,\bm{z})$ is continuous on $\bm{\Theta}$ , there exists, for each $(\bm{y},\,\bm{z})\in\mathscr{H}$ , a real number $\epsilon_{N}(\rho_{N})\in(0,\,+\infty)$ (which depends on $(\bm{y},\,\bm{z})\in\mathscr{H}$ ) such that

\begin{array}[]{llllllllll}\left|\!\left|\bm{s}(\bm{\theta};\,\bm{y},\,\bm{z})-\bm{s}(\bm{\theta}^{\star};\,\bm{y},\,\bm{z})\right|\!\right|_{\infty}&\leq&\epsilon_{N}(\rho_{N})&\mbox{implies}&\left|\!\left|\bm{\theta}-\bm{\theta}^{\star}\right|\!\right|_{\infty}&\leq&\rho_{N}.\end{array}

(C.1)

As $\bm{s}(\bm{\theta};\,\bm{y},\bm{z})$ is a homeomorphism and continuously differentiable, we can invoke Lemma 1 of S25 to conclude that $\epsilon_{N}(\rho_{N})$ is related to $\rho_{N}$ by the following inequality:

\begin{array}[]{llllllllll}\dfrac{\rho_{N}}{\Lambda_{N,\bm{y},\bm{z}}(\bm{\theta}^{\star})}&\leq&\epsilon_{N}(\rho_{N}).\end{array}

(C.2)

To take advantage of (C.2), observe that, for all $\bm{\theta}\in\widehat{\bm{\Theta}}(\delta_{N})$ and all $(\bm{y},\,\bm{z})\in\mathscr{C}(\delta_{N})\,\cap\,\mathscr{H}$ ,

\begin{array}[]{llllllll}\left|\!\left|\bm{s}(\bm{\theta};\,\bm{y},\bm{z})-\bm{s}(\bm{\theta}^{\star};\,\bm{y},\bm{z})\right|\!\right|_{\infty}\,\leq\,\left|\!\left|\bm{s}(\bm{\theta};\,\bm{y},\bm{z})\right|\!\right|_{\infty}+\left|\!\left|\bm{s}(\bm{\theta}^{\star};\,\bm{y},\bm{z})\right|\!\right|_{\infty}\,\leq\,2\,\delta_{N}=\dfrac{\rho_{N}}{\Lambda_{N}(\bm{\theta}^{\star})},\end{array}

(C.3)

because $\left|\!\left|\bm{s}(\bm{\theta};\,\bm{y},\,\bm{z})\right|\!\right|_{\infty}\leq\delta_{N}$ for all $\bm{\theta}\in\widehat{\bm{\Theta}}(\delta_{N})$ , $\left|\!\left|\bm{s}(\bm{\theta}^{\star};\,\bm{y},\,\bm{z})\right|\!\right|_{\infty}\leq\delta_{N}$ for all $(\bm{y},\,\bm{z})\in\mathscr{C}(\delta_{N})\,\cap\,\mathscr{H}$ , and $\delta_{N}\coloneqq\rho_{N}\,/\,(2\,\Lambda_{N}(\bm{\theta}^{\star}))$ . Using (C.3) along with the definition of $\Lambda_{N}(\bm{\theta}^{\star})\coloneqq\sup_{(\bm{y},\,\bm{z})\in\mathscr{H}}\,\Lambda_{N,\bm{y},\bm{z}}(\bm{\theta}^{\star})>0$ , we obtain

\begin{array}[]{llllllllll}\left|\!\left|\bm{s}(\bm{\theta};\,\bm{y},\,\bm{z})-\bm{s}(\bm{\theta}^{\star};\,\bm{y},\,\bm{z})\right|\!\right|_{\infty}&\leq&\dfrac{\rho_{N}}{\Lambda_{N}(\bm{\theta}^{\star})}&\leq&\dfrac{\rho_{N}}{\Lambda_{N,\bm{y},\bm{z}}(\bm{\theta}^{\star})},\end{array}

(C.4)

and, using (C.2),

\begin{array}[]{llllllllll}\left|\!\left|\bm{s}(\bm{\theta};\,\bm{y},\,\bm{z})-\bm{s}(\bm{\theta}^{\star};\,\bm{y},\,\bm{z})\right|\!\right|_{\infty}&\leq&\dfrac{\rho_{N}}{\Lambda_{N,\bm{y},\bm{z}}(\bm{\theta}^{\star})}&\leq&\epsilon_{N}(\rho_{N}).\end{array}

(C.5)

In light of the fact that

\begin{array}[]{llllllllll}\left|\!\left|\bm{s}(\bm{\theta};\,\bm{y},\,\bm{z})-\bm{s}(\bm{\theta}^{\star};\,\bm{y},\,\bm{z})\right|\!\right|_{\infty}&\leq&\epsilon_{N}(\rho_{N})&\mbox{implies}&|\!|\bm{\theta}-\bm{\theta}^{\star}|\!|_{\infty}&\leq&\rho_{N},\end{array}

the set $\widehat{\bm{\Theta}}(\delta_{N})$ is non-empty and satisfies

\begin{array}[]{llllllllll}\widehat{\bm{\Theta}}(\delta_{N})&\subseteq&\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\rho_{N})\end{array}

(C.6)

in the event $(\bm{Y},\bm{Z})\in\mathscr{C}(\delta_{N})\cap\,\mathscr{H}$ , provided $N>N_{0}$ .

The event $(\bm{Y},\bm{Z})\,\in\,\mathscr{C}(\delta_{N})\,\cap\,\mathscr{H}$ occurs with probability $1-o(1)$ . The probability of the event $(\bm{Y},\bm{Z})\,\in\,\mathscr{C}(\delta_{N})\,\cap\,\mathscr{H}$ is bounded below by

\begin{array}[]{llllllllll}\mathbb{P}\left((\bm{Y},\bm{Z})\in\mathscr{C}(\delta_{N})\cap\,\mathscr{H}\right)&\geq&1-\mathbb{P}\left((\bm{Y},\bm{Z})\not\in\mathscr{C}(\delta_{N})\right)-\mathbb{P}\left((\bm{Y},\bm{Z})\not\in\mathscr{H}\right)&=&1-o(1).\end{array}

The above inequality stems from a union bound, while the identity follows from the assumption that the probabilities of the events $(\bm{Y},\bm{Z})\not\in\mathscr{C}(\delta_{N})$ and $(\bm{Y},\bm{Z})\not\in\mathscr{H}$ satisfy

\begin{array}[]{llllllllll}\mathbb{P}\left((\bm{Y},\bm{Z})\not\in\mathscr{C}(\delta_{N})\right)&=&\mathbb{P}\left(\left|\!\left|\bm{s}(\bm{\theta}^{\star};\,\bm{Y},\bm{Z})-\mathbb{E}\,\,\bm{s}(\bm{\theta}^{\star};\,\bm{Y},\bm{Z})\right|\!\right|_{\infty}\geq\delta_{N}\right)&=&o(1)\vskip 7.11317pt\\ \mathbb{P}\left((\bm{Y},\bm{Z})\not\in\mathscr{H}\right)&=&o(1),\end{array}

where the first result leverages the fact that $\mathbb{E}\,\,\bm{s}(\bm{\theta}^{\star};\,\bm{Y},\bm{Z})=\bm{0}$ by Lemma 7 of S25.

Conclusion. Combining (C.6) with (C) establishes that, for all $N>N_{0}$ , the random set $\widehat{\bm{\Theta}}(\delta_{N})$ is non-empty and, with probability $1-o(1)$ , satisfies

\begin{array}[]{llllllllll}\widehat{\bm{\Theta}}(\delta_{N})&\subseteq&\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\rho_{N}).\end{array}

Appendix D Corollaries 4 and D.3

To state and prove Corollaries 4 and D.3, we first introduce notation along with background on conditional independence graphs \citepsupp{graphical.models} and couplings \citepsupp{Li02}.

D.1 Notation and Background

We consider the model of Corollary 4, with joint probability mass function

\begin{array}[]{llllllllll}\mathbb{P}_{\bm{\theta}}\left((\bm{Y},\,\bm{Z})=(\bm{y},\,\bm{z})\mid\bm{X}=\bm{x}\right)&\propto&\exp\left(\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y},\,\bm{z})\right).\end{array}

(D.1)

The parameter vector is $\bm{\theta}\coloneqq(\alpha_{\mathscr{Z},1},\ldots,\alpha_{\mathscr{Z},N},\gamma_{\mathscr{Z},\mathscr{Z}},\,\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}})\;\in\;\mathbb{R}^{N+2}$ and the vector of sufficient statistics is $\bm{b}(\bm{x},\,\bm{y},\,\bm{z})\in\mathbb{R}^{N+2}$ , with coordinates

•

$b_{i}(\bm{x},\,\bm{y},\,\bm{z})\coloneqq\sum_{j\in\mathscr{P}_{N}\setminus\,\{i\}}\,z_{i,j}$ ( $i=1,\ldots,N$ ),
•

$b_{N+1}(\bm{x},\,\bm{y},\,\bm{z})\coloneqq\sum_{i=1}^{N}\sum_{j=i+1}^{N}d_{i,j}(\bm{z})\,z_{i,j}$ ,
•

$b_{N+2}(\bm{x},\,\bm{y},\,\bm{z})\coloneqq\sum_{i=1}^{N}\sum_{j=i+1}^{N}\,c_{i,j}\,(x_{i}\,y_{j}+x_{j}\,y_{i})\,z_{i,j}$ ,

where the terms $c_{i,j}$ and $d_{i,j}(\bm{z})$ are defined as follows:

\begin{array}[]{llllllllll}c_{i,j}\coloneqq\mathbbm{1}(\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset)\\ d_{i,j}(\bm{z})\coloneqq\mathbbm{1}(\exists\;k\,\in\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,:\,z_{i,k}=z_{k,j}=1).\end{array}

(D.2)
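
As a small illustration of these statistics (a hypothetical example with values chosen here, not taken from the paper), suppose that $N=3$ , that $\mathscr{N}_{1}=\mathscr{N}_{2}=\mathscr{N}_{3}=\{1,2,3\}$ , that connections are undirected with $z_{1,2}=z_{1,3}=z_{2,3}=1$ , and that $\bm{x}=(1,0,1)$ and $\bm{y}=(0,1,1)$ . Then $c_{i,j}=1$ for all pairs, and $d_{i,j}(\bm{z})=1$ for all pairs because the third unit is connected to both $i$ and $j$ . Suppressing the arguments $(\bm{x},\,\bm{y},\,\bm{z})$ , the coordinates of the vector of sufficient statistics are

\begin{array}[]{llllllllll}b_{1}\;=\;b_{2}\;=\;b_{3}&=&2\\ b_{4}&=&d_{1,2}(\bm{z})\,z_{1,2}+d_{1,3}(\bm{z})\,z_{1,3}+d_{2,3}(\bm{z})\,z_{2,3}&=&3\\ b_{5}&=&(x_{1}\,y_{2}+x_{2}\,y_{1})+(x_{1}\,y_{3}+x_{3}\,y_{1})+(x_{2}\,y_{3}+x_{3}\,y_{2})&=&1+1+1\;=\;3.\end{array}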

In light of $\psi\coloneqq 1$ , we do not distinguish between $\bm{y}$ and $\bm{y}^{\star}$ or $y_{i}$ and $y_{i}^{\star}$ . To ease the presentation, we write $Y_{i}\mid\bm{x},\,\,\bm{y}_{-i},\,\bm{z}$ rather than $Y_{i}\mid(\bm{X},\,\,\bm{Y}_{-i},\,\bm{Z})=(\bm{x},\,\,\bm{y}_{-i},\,\bm{z})$ , and $Z_{i,j}\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}}$ rather than $Z_{i,j}\mid(\bm{X},\,\bm{Y},\,\bm{Z}_{-\{i,j\}})=(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})$ . Expectations, variances, and covariances with respect to the conditional distributions of $Y_{i}\mid\bm{x},\,\bm{y}_{-i},\,\bm{z}$ and $Z_{i,j}\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}}$ are denoted by $\mathbb{E}_{\mathscr{Y},i}$ , $\mathbb{V}_{\mathscr{Y},i}$ , $\mathbb{C}_{\mathscr{Y},i}$ and $\mathbb{E}_{\mathscr{Z},i,j}$ , $\mathbb{V}_{\mathscr{Z},i,j}$ , $\mathbb{C}_{\mathscr{Z},i,j}$ , respectively.

Conditional independence graph. Let $M\coloneqq N+\binom{N}{2}$ be the total number of responses and connections and

\begin{array}[]{llllllllll}\bm{W}\,\coloneqq\,(W_{1},\,\ldots,\,W_{M})\,\coloneqq\,(Y_{1},\,\ldots,\,Y_{N},\,Z_{1,2},\,\ldots,\,Z_{N-1,N})\,\in\,\mathscr{W}\,\coloneqq\,\{0,\,1\}^{N+\binom{N}{2}}\end{array}

(D.3)

be the vector consisting of responses and connections. The conditional independence structure of the model can be represented by a conditional independence graph $\mathscr{G}\coloneqq(\mathscr{V},\,\mathscr{E})$ with a set of vertices $\mathscr{V}\coloneqq\{W_{1},\ldots,W_{M}\}$ and a set of undirected edges $\mathscr{E}$ . We refer to elements of $\mathscr{V}$ and $\mathscr{E}$ as vertices and edges of $\mathscr{G}$ . There are two distinct subsets of vertices in $\mathscr{G}$ :

•

the subset $\mathscr{V}_{\mathscr{Y}}\,\coloneqq\,\{W_{1},\dots,W_{N}\}$ corresponding to responses $Y_{1},\ldots,Y_{N}$ ;
•

the subset $\mathscr{V}_{\mathscr{Z}}\,\coloneqq\,\{W_{N+1},\dots,W_{M}\}$ corresponding to connections $Z_{1,2},\ldots,Z_{N-1,N}$ .

An undirected edge between two vertices in $\mathscr{G}$ represents dependence of the two corresponding random variables conditional on all other random variables. The vertices in $\mathscr{G}$ are connected to the following subsets of vertices (neighborhoods):

•

The neighborhood of $Y_{i}$ in $\mathscr{G}$ consists of all $Y_{j}$ and all $Z_{i,j}$ such that $j\,\in\,\mathscr{P}_{N}\setminus\,\{i\}$ and $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset$ .
•
The neighborhood of $Z_{i,j}$ in $\mathscr{G}$ consists of
1.
  
  $Y_{i}$ and $Y_{j}$ ;
2.
  
  all $Z_{i,h}$ and $Z_{j,h}$ such that $h\in\mathscr{P}_{N}\setminus\,\{i,\,j\}$ and $h\in\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}$ ;
3.
  
  all $Z_{i,h}$ and $Z_{j,h}$ such that $h\in\mathscr{P}_{N}\setminus\,\{i,\,j\}$ and $h\not\in\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}$ provided that either $j\in\mathscr{N}_{i}\,\cap\,\mathscr{N}_{h}$ holds or $i\in\mathscr{N}_{j}\,\cap\,\mathscr{N}_{h}$ holds.

Let $d_{\mathscr{G}}(i,j)$ be the length of the shortest path from vertex $W_{i}\in\mathscr{V}$ to vertex $W_{j}\in\mathscr{V}$ in $\mathscr{G}$ and let $\mathscr{S}_{\mathscr{G},i,k}$ be the set of vertices with distance $k\in\left\{1,2,\ldots\right\}$ to the $i$ th vertex $W_{i}$ in $\mathscr{G}$ :

\begin{array}[]{llllllllll}\mathscr{S}_{\mathscr{G},i,k}&\coloneqq&\left\{W_{j}\in\mathscr{V}\setminus\{W_{i}\}:d_{\mathscr{G}}(i,j)=k\right\}.\end{array}

We define the maximum degree of vertices relating to connections in $\mathscr{G}$ as follows:

\begin{array}[]{llllllllll}D_{N}&\coloneqq&\underset{1\,\leq\,i\,\leq\,M}{\text{max}}\;|\mathscr{S}_{\mathscr{G},i,1}|.\end{array}

(D.4)

Coupling matrix. Let $\bm{W}_{a:b}\coloneqq(W_{a},\,\ldots,\,W_{b})\in\mathscr{W}_{a:b}$ be the subvector consisting of responses and connections with indices $1\leq a\leq b\leq M$ . The set of random variables excluding the random variable $W_{v}\in\mathscr{V}$ with $v\in\{1,\ldots,M\}$ is denoted by $\bm{w}_{-v}\in\mathscr{W}_{-v}$ . Consider any $\bm{a}\in\{0,1\}^{M-i}$ and define

\begin{array}[]{llllllllll}\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},w_{i}}(\bm{W}_{(i+1):M}=\bm{a})&\coloneqq&\mathbb{P}_{\bm{\theta}^{\star}}(\bm{W}_{(i+1):M}=\bm{a}\mid(\bm{W}_{1:(i-1)},W_{i})=(\bm{w}_{1:(i-1)},w_{i})).\end{array}

We use the total variation distance between the conditional distributions $\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},0}$ and $\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},1}$ for quantifying the amount of dependence induced by the model, where $\bm{\theta}^{\star}\in\bm{\Theta}$ is the data-generating parameter vector. The total variation distance between $\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},0}$ and $\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},1}$ can be bounded from above by using coupling methods \citepsupp{Li02}. A coupling of $\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},0}$ and $\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},1}$ is a joint probability distribution $\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}$ for a pair of random vectors $(\bm{W}_{(i+1):M}^{\star},\,\bm{W}_{(i+1):M}^{\star\star})\in\{0,1\}^{M-i}\times\{0,1\}^{M-i}$ with marginals $\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},0}$ and $\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},1}$ . For convenience, we define $(\bm{W}^{\star},{\bm{W}}^{\star\star})\in\{0,\,1\}^{M}\times\{0,\,1\}^{M}$ , where the first $i$ elements are given by $\bm{W}^{\star}_{1:i}=(\bm{w}_{1:(i-1)},\,0)$ and ${\bm{W}}^{\star\star}_{1:i}=(\bm{w}_{1:(i-1)},\,1)$ , respectively. The basic coupling inequality \citepsupp[][Theorem 5.2, p. 19]{Li02} shows that any coupling satisfies

\begin{array}[]{llllllllll}\left|\!\left|\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},0}-\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},1}\right|\!\right|_{\text{TV}}&\leq&\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}(\bm{W}_{(i+1):M}^{\star}\neq\bm{W}_{(i+1):M}^{\star\star}),\end{array}

(D.5)

where $\left|\!\left|.\right|\!\right|_{\text{TV}}$ denotes the total variation distance between probability measures. If the two sides in Equation (D.5) are equal, the coupling is called optimal. An optimal coupling is guaranteed to exist, but may not be unique \citepsupp[][pp. 99–107]{Li02}. To prove Corollary 4, we need an upper bound on the spectral norm ${|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}$ of the coupling matrix $\mathscr{D}(\bm{\theta}^{\star})$ , so we construct a coupling that is convenient but may not be optimal.

A coupling $\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}$ of $\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},0}$ and $\mathbb{P}_{\bm{\theta}^{\star},\bm{w}_{1:(i-1)},1}$ can be constructed as follows:

Step 1: Set $\mathscr{U}=\{1,\ldots,i\}$ and $\mathscr{K}=\{1,\ldots,M\}$ .
Step 2: Set $\mathscr{A}=\{j\in\mathscr{K}\,\setminus\,\mathscr{U}:\;(W_{h},\,W_{j})\in\mathscr{E}\text{ for some }h\in\mathscr{U}\text{ such that }W_{h}^{\star}\neq W_{h}^{\star\star}\}$ .
- (a)
  
  If $\mathscr{A}\neq\emptyset$ , pick the smallest element $j\in\mathscr{A}$ and let $(W_{j}^{\star},\,W_{j}^{\star\star})$ be distributed according to an optimal coupling of $\mathbb{P}_{\bm{\theta}^{\star}}(W_{j}=\cdot\mid\bm{W}_{\mathscr{U}}=\bm{w}_{\mathscr{U}}^{\star})$ and $\mathbb{P}_{\bm{\theta}^{\star}}(W_{j}=\cdot\mid\bm{W}_{\mathscr{U}}=\bm{w}_{\mathscr{U}}^{\star\star})$ .
- (b)
  
  If $\mathscr{A}=\emptyset$ , pick the smallest element $j\in\mathscr{K}\,\setminus\,\mathscr{U}$ and let $(W_{j}^{\star},W_{j}^{\star\star})$ be distributed according to an optimal coupling of $\mathbb{P}_{\bm{\theta}^{\star}}(W_{j}=\cdot\mid\bm{W}_{\mathscr{U}}=\bm{w}_{\mathscr{U}}^{\star})$ and $\mathbb{P}_{\bm{\theta}^{\star}}(W_{j}=\cdot\mid\bm{W}_{\mathscr{U}}=\bm{w}_{\mathscr{U}}^{\star\star})$ .
Step 3: Replace $\mathscr{U}$ by $\mathscr{U}\,\cup\,\{j\}$ and repeat Step 2 until $\mathscr{K}\,\setminus\,\mathscr{U}=\emptyset$ .
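
For binary $W_{j}$ , the optimal coupling used in Step 2 admits a simple explicit form. The following is a minimal sketch, in which the symbols $\pi^{\star}$ , $\pi^{\star\star}$ , and $R$ are introduced for illustration only and do not appear elsewhere in the proof. Let $\pi^{\star}\coloneqq\mathbb{P}_{\bm{\theta}^{\star}}(W_{j}=1\mid\bm{W}_{\mathscr{U}}=\bm{w}_{\mathscr{U}}^{\star})$ and $\pi^{\star\star}\coloneqq\mathbb{P}_{\bm{\theta}^{\star}}(W_{j}=1\mid\bm{W}_{\mathscr{U}}=\bm{w}_{\mathscr{U}}^{\star\star})$ , draw $R\sim\text{Uniform}(0,\,1)$ , and set

\begin{array}[]{llllllllll}W_{j}^{\star}\;\coloneqq\;\mathbbm{1}(R\leq\pi^{\star}),&&W_{j}^{\star\star}\;\coloneqq\;\mathbbm{1}(R\leq\pi^{\star\star}).\end{array}

Both coordinates then have the required Bernoulli marginals, and they disagree if and only if $R$ falls between $\pi^{\star}$ and $\pi^{\star\star}$ , which happens with probability $|\pi^{\star}-\pi^{\star\star}|$ . This probability equals the total variation distance between the two Bernoulli distributions, so the coupling is optimal. In particular, $W_{j}^{\star}=W_{j}^{\star\star}$ with probability one whenever $\pi^{\star}=\pi^{\star\star}$ .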

Based on $\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}$ , we construct a coupling matrix $\mathscr{D}(\bm{\theta}^{\star})\in\mathbb{R}^{M\times M}$ with elements

\begin{array}[]{llllllllll}\mathscr{D}_{i,j}(\bm{\theta}^{\star})&\coloneqq&\begin{cases}0&\mbox{if }j<i\\ 1&\mbox{if }j=i\\ \max\limits_{\bm{w}_{1:(i-1)}\,\in\,\mathscr{W}_{1:(i-1)}}\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}(W_{j}^{\star}\neq W_{j}^{\star\star})&\mbox{if }j>i.\end{cases}\end{array}

Overlapping subpopulations. To obtain convergence rates based on a single observation of dependent random variables $\bm{W}$ , we need to control the dependence of $\bm{W}$ in the form of ${|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}$ . In line with the simulation setting in Section 5, we therefore assume that overlapping subpopulations $\mathscr{A}_{1},\mathscr{A}_{2},\ldots$ characterize the neighborhoods. The neighborhood $\mathscr{N}_{i}$ of unit $i\in\mathscr{P}_{N}$ is then defined as

\begin{array}[]{llllllllll}\mathscr{N}_{i}&\coloneqq&\{j\,\in\,\mathscr{P}_{N}:\mbox{ there exists }k\in\{1,2,\ldots\}\text{ such that }i\in\mathscr{A}_{k}\text{ and }j\in\mathscr{A}_{k}\}.\end{array}

(D.6)

Let $\mathscr{G}_{\mathscr{A}}$ be a subpopulation graph with a set of vertices $\mathscr{V}_{\mathscr{A}}\coloneqq\left\{\mathscr{A}_{1},\mathscr{A}_{2},\ldots\right\}$ and a set of edges connecting distinct subpopulations $\mathscr{A}_{k}$ and $\mathscr{A}_{l}$ with $\mathscr{A}_{k}\,\cap\,\mathscr{A}_{l}\neq\emptyset$ . Define

\begin{array}[]{llllllllll}\mathscr{S}_{\mathscr{G}_{\mathscr{A}},\,i,k}&\coloneqq&\left\{\mathscr{A}_{j}\in\mathscr{V}_{\mathscr{A}}\setminus\{\mathscr{A}_{i}\}:\;d_{\mathscr{G}_{\mathscr{A}}}(i,j)=k\right\}.\end{array}

(D.7)

Using the background introduced above, we restate Condition 4 more formally.

Condition 4: Dependence. The population $\mathscr{P}$ consists of intersecting subpopulations $\mathscr{A}_{1},\mathscr{A}_{2},\ldots$ , whose intersections are represented by subpopulation graph $\mathscr{G}_{\mathscr{A}}$ . Let $D_{N}\in\{2,3,\ldots\}$ be defined by (D.4) and $\mathscr{S}_{\mathscr{G}_{\mathscr{A}},\,i,k}$ be defined by (D.7), and assume that

\begin{array}[]{llllllllll}\max\limits_{k\,\in\,\{1,2,\ldots\}}~|\mathscr{S}_{\mathscr{G}_{\mathscr{A}},k,l}|&\leq&\omega_{1}+\dfrac{\omega_{2}}{2\,D_{N}^{3}}\log(l+1),&&l=1,2,\ldots,\end{array}

where $\omega_{1}\,\geq\,0$ and $0\,\leq\,\omega_{2}\,\leq\,\min\{\omega_{1},\,1/((\omega_{1}+1)\,|\log(1-U)|)\}$ with $U\coloneqq(1+\exp(-A))^{-1}>0$ . The constant $A>0$ is identical to the constant $A$ in Condition 4. In addition, for each unit $i\in\mathscr{P}_{N}$ , the neighborhood $\mathscr{N}_{i}$ is defined by (D.6), and there exists a constant $B\in(0,\,+\infty)$ such that $\max_{1\leq i\leq N}|\mathscr{N}_{i}|<B$ .

The assumption $\max_{1\leq i\leq N}|\mathscr{N}_{i}|<B$ implies that $D_{N}$ is bounded above by a constant $D\in\{2,3,\ldots\}$ .

D.2 Proof of Corollary 4

To prove Corollary 4, define

\begin{array}[]{llllllllll}\mathscr{H}\;\coloneqq\;\mathscr{H}_{1}\,\cap\,\mathscr{H}_{2}\vskip 7.11317pt\\ \mathscr{H}_{1}\;\coloneqq\;\left\{\bm{w}\in\mathscr{W}:\;\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{w})\right|\!\right|_{\infty}}\geq\dfrac{N}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}}\right\}\vskip 7.11317pt\\ \mathscr{H}_{2}\;\coloneqq\;\left\{\bm{w}\in\mathscr{W}:\;\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{w})\right|\!\right|_{\infty}}\geq\dfrac{c^{2}\,N}{2\,(1+\chi(\bm{\theta}^{\star}))}\right\}\vskip 7.11317pt\\ \bm{H}_{i,1}(\bm{w})\;\coloneqq\;(d_{i,1}(\bm{z}),\,\ldots,\,d_{i,i-1}(\bm{z}),\,d_{i,i+1}(\bm{z}),\,\ldots,\,d_{i,N}(\bm{z}))\vskip 7.11317pt\\ \bm{H}_{i,2}(\bm{w})\;\coloneqq\;(c_{i,1}\,x_{1}^{2}\,z_{i,1},\,\ldots,\,c_{i,i-1}\,x_{i-1}^{2}\,z_{i,i-1},\,c_{i,i+1}\,x_{i+1}^{2}\,z_{i,i+1},\,\ldots,\,c_{i,N}\,x_{N}^{2}\,z_{i,N})\end{array}

(D.8)

and

\begin{array}[]{llllllllll}\chi(\bm{\theta}^{\star})&\coloneqq&\exp(C\,D^{2}\,(|\!|\bm{\theta}^{\star}|\!|_{\infty}+\epsilon^{\star})),\end{array}

(D.9)

where the constants $0<c<C<\infty$ and $D\in\{2,3,\ldots\}$ are identical to the corresponding constants defined in Condition 4 and Equation (D.4), respectively.

Proof of Corollary 4. We prove Corollary 4 using Theorem 4 in five steps:

Step 1: We bound

\begin{array}[]{llllllllll}\mathbb{P}\left({\left|\!\left|\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\bm{W})|_{\bm{\theta}=\bm{\theta}^{\star}}-\mathbb{E}\,\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\bm{W})|_{\bm{\theta}=\bm{\theta}^{\star}}\right|\!\right|_{\infty}}<\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}\right)&\geq&1-\tau\left(\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}\right),\end{array}

and choose $\rho_{N}$ so that $1-\tau(\rho_{N}/(2\,\Lambda_{N}(\bm{\theta}^{\star})))\,\geq\,1-2\,/\max\{N,\,p\}^{2}$ .

Step 2: We show that $-\nabla_{\bm{\theta}}^{2}~\ell(\bm{\theta};\,\bm{w})$ is invertible for all $\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})$ and all $\bm{w}\in\mathscr{H}$ .
Step 3: We prove that the event $\bm{W}\in\mathscr{H}$ occurs with probability at least $1-\upsilon(\rho_{N}/(2\,\Lambda_{N}(\bm{\theta}^{\star})))\,\geq\,1-4\,/\max\{N,\,p\}^{2}$ .
Step 4: We bound $\delta_{N}$ .
Step 5: We bound $\rho_{N}$ .

The proof of Corollary 4 leverages auxiliary results supplied by Lemmas D.4, D.5, and D.6, which show that there exists an integer $N_{1}\in\{3,4,\dots\}$ such that, for all $N>N_{1}$ ,

\begin{array}[]{llllllllll}\Lambda_{N}(\bm{\theta}^{\star})&\leq&C_{1}\;\dfrac{\chi(\bm{\theta}^{\star})^{9}}{N}&\text{by Lemma D.4}\vskip 7.11317pt\\ \sqrt{N/2}&\leq&\Psi_{N}~\leq~C_{2}\,\sqrt{N}&\text{by Lemma D.5}\vskip 7.11317pt\\ {|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&C_{3}&\text{by Lemma D.6},\end{array}

where $C_{1}>0$ , $C_{2}>0$ , and $C_{3}\geq 1$ are constants.

Step 1: Since $\bm{W}\in\{0,1\}^{M}$ , Lemma 6 of S25 establishes

\begin{array}[]{llllllllll}\mathbb{P}\left({\left|\!\left|\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\bm{W})|_{\bm{\theta}=\bm{\theta}^{\star}}-\mathbb{E}\,\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\bm{W})|_{\bm{\theta}=\bm{\theta}^{\star}}\right|\!\right|_{\infty}}<\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}\right)&\geq&1-\tau\left(\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}\right),\end{array}

where

\begin{array}[]{llllllllll}\tau\left(\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}\right)&\coloneqq&2\,\exp\left(-\dfrac{\rho_{N}^{2}}{32\,\Lambda_{N}(\bm{\theta}^{\star})^{2}\;(1+D)^{2}\;|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}^{2}\;\Psi_{N}^{2}}+\log\,p\right),\end{array}

with $D\in\{2,3,\ldots\}$ defined in (D.4). Choosing

\begin{array}[]{llllllllll}\rho_{N}&\coloneqq&\sqrt{96}\;\Lambda_{N}(\bm{\theta}^{\star})\,(1+D)\,{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}\,\Psi_{N}\sqrt{\log\,\max\{N,\,p\}}\end{array}

(D.10)

implies that the event

\begin{array}[]{llllllllll}{\left|\!\left|\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\bm{W})|_{\bm{\theta}=\bm{\theta}^{\star}}-\mathbb{E}\,\nabla_{\bm{\theta}}\;\ell(\bm{\theta};\bm{W})|_{\bm{\theta}=\bm{\theta}^{\star}}\right|\!\right|_{\infty}}&<&\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}\end{array}

occurs with probability at least

\begin{array}[]{llllllllll}1-\tau\left(\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}\right)&\geq&1-\dfrac{2}{\max\{N,\,p\}^{2}}.\end{array}
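
The arithmetic behind this bound can be checked directly. As a short verification that uses only the definition of $\tau$ , the choice of $\rho_{N}$ in (D.10), and $p\leq\max\{N,\,p\}$ , squaring (D.10) gives

\begin{array}[]{llllllllll}\dfrac{\rho_{N}^{2}}{32\,\Lambda_{N}(\bm{\theta}^{\star})^{2}\,(1+D)^{2}\,|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}^{2}\,\Psi_{N}^{2}}&=&\dfrac{96}{32}\,\log\,\max\{N,\,p\}&=&3\,\log\,\max\{N,\,p\},\end{array}

so that

\begin{array}[]{llllllllll}\tau\left(\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}\right)&=&2\,\exp\left(-3\,\log\,\max\{N,\,p\}+\log\,p\right)&\leq&2\,\exp\left(-2\,\log\,\max\{N,\,p\}\right)&=&\dfrac{2}{\max\{N,\,p\}^{2}}.\end{array}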

Step 2: Let $\mathscr{H}$ be defined in (D.8). Lemma D.4 establishes that $-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{w})$ is invertible for all $\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})$ and all $\bm{w}\in\mathscr{H}$ .

Step 3: Lemma D.7 shows that there exists an integer $N_{2}\in\{3,4,\dots\}$ such that, for all $N>N_{2}$ , the event $\bm{W}\in\mathscr{H}$ occurs with probability at least

\begin{array}[]{llllllllll}1-\upsilon(\delta_{N})&=&1-\dfrac{4}{\max\{N,\,p\}^{2}}.\end{array}

Step 4: The quantity

\begin{array}[]{llllllllll}\delta_{N}&\coloneqq&\dfrac{\rho_{N}}{2\,\Lambda_{N}(\bm{\theta}^{\star})}&=&\sqrt{24}\;\,(1+D)\,{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}\,\Psi_{N}\,\sqrt{\log\,\max\{N,\,p\}}\end{array}

is bounded below by

\begin{array}[]{llllllllll}\delta_{N}&\geq&\sqrt{24}\;\,D\,\sqrt{N/2}\;\sqrt{\log N}&=&\sqrt{12}\;\,D\,\sqrt{N\log N}\end{array}

and is bounded above by

\begin{array}[]{llllllllll}\delta_{N}&\leq&\sqrt{24}\;\,C_{2}\;C_{3}\,(2\,D)\,\sqrt{N}\,\sqrt{2\,\log N}&=&\sqrt{192}\;\,C_{2}\;C_{3}\,D\,\sqrt{N\log N},\end{array}

using $D\in\{2,3,\ldots\}$ , $1\leq{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}\leq C_{3}$ , $\sqrt{N/2}\leq\Psi_{N}\leq C_{2}\,\sqrt{N}$ , and $\max\{N,\,p\}=p=N+2$ . Since $C_{2}>0$ , $C_{3}\geq 1$ , and $D\in\{2,3,\ldots\}$ defined in (D.4) are constants, there exist constants $0\,<\,L\,\leq\,U\,<\,\infty$ such that

\begin{array}[]{llllllllll}L\;\sqrt{N\log N}&\leq&\delta_{N}&\leq&U\;\sqrt{N\log N}.\end{array}
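
One admissible choice of the constants, read off from the lower and upper bounds on $\delta_{N}$ derived in this step, is

\begin{array}[]{llllllllll}L&\coloneqq&\sqrt{12}\;D,&&U&\coloneqq&\sqrt{192}\;\,C_{2}\;C_{3}\,D.\end{array}

The requirement $L\leq U$ amounts to $4\,C_{2}\,C_{3}\geq 1$ , which holds because $\sqrt{N/2}\leq\Psi_{N}\leq C_{2}\,\sqrt{N}$ forces $C_{2}\geq 1/\sqrt{2}$ and because $C_{3}\geq 1$ .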

Step 5: Substituting the bounds on $\Lambda_{N}(\bm{\theta}^{\star})$ , $\Psi_{N}$ , and ${|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}$ supplied by Lemmas D.4, D.5, and D.6 into (D.10) reveals that

\begin{array}[]{llllllllll}\rho_{N}&\coloneqq&\sqrt{96}\;\Lambda_{N}(\bm{\theta}^{\star})\,(1+D)\,{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}\,\Psi_{N}\sqrt{\log\,\max\{N,\,p\}}\vskip 7.11317pt\\ &\leq&\sqrt{96}\;\,C_{1}\;C_{2}\;C_{3}\;(2\,D)\;\dfrac{\chi(\bm{\theta}^{\star})^{9}}{N}\;\sqrt{N\log(N+2)}\vskip 7.11317pt\\ &\leq&\sqrt{768}\;\,C_{1}\;C_{2}\;C_{3}\;D\,\chi(\bm{\theta}^{\star})^{9}\;\sqrt{\dfrac{\log N}{N}},\end{array}

(D.11)

using $\max\{N,\,p\}=p=N+2$ and $\log(N+2)\leq\log(2\,N)\leq 2\,\log N$ ( $N\geq 2$ ). To bound $\chi(\bm{\theta}^{\star})$ , we invoke Condition 4:

\begin{array}[]{llllllllll}\chi(\bm{\theta}^{\star})^{9}\;\coloneqq\;\exp(C\,D^{2}\,(|\!|\bm{\theta}^{\star}|\!|_{\infty}+\epsilon^{\star}))^{9}\;\leq\;\exp(C\,D^{2}\,(A+\epsilon^{\star}))^{9}\;=\;\exp(9\;C\,D^{2}\,(A+\epsilon^{\star})).\end{array}

Define

\begin{array}[]{llllllllll}K&\coloneqq&\sqrt{768}\;\,C_{1}\;C_{2}\;C_{3}\;D\,\exp(9\;C\,D^{2}\,(A+\epsilon^{\star}))&>&0.\end{array}

Since $A$ , $C$ , $C_{1}$ , $C_{2}$ , $C_{3}$ , $D$ , and $\epsilon^{\star}$ are independent of $N$ , so is $K$ . We conclude that

\begin{array}[]{llllllllll}\rho_{N}&\leq&K\;\sqrt{\dfrac{\log N}{N}}&\to&0&\mbox{as}&N\to\infty.\end{array}

Conclusion. Theorem 4 implies that, for all $N>N_{0}\coloneqq\max\{N_{1},N_{2}\}$ , the random set $\widehat{\bm{\Theta}}(\delta_{N})$ is non-empty and satisfies

\begin{array}[]{llllllllll}\widehat{\bm{\Theta}}(\delta_{N})&\subseteq&\mathscr{B}_{\infty}\left(\bm{\theta}^{\star},\;K\,\sqrt{\dfrac{\log N}{N}}\right)\end{array}

with probability at least

\begin{array}[]{llllllllll}1-\tau(\delta_{N})-\upsilon(\delta_{N})&\geq&1-\dfrac{6}{\max\{N,\,p\}^{2}}&\geq&1-\dfrac{6}{N^{2}},\end{array}

using $\max\{N,\,p\}^{2}=p^{2}=(N+2)^{2}\geq N^{2}$ .

D.3 Statement and Proof of Corollary D.3

If subpopulations do not overlap, ${\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}}$ can grow as a function of $N$ . Condition D.3 details how fast ${\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}}$ can grow.

Condition D.3. The parameter space is $\bm{\Theta}=\mathbb{R}^{N+2}$ and the data-generating parameter vector $\bm{\theta}^{\star}\in\mathbb{R}^{N+2}$ satisfies

\begin{array}[]{llllllllll}\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}&\leq&\dfrac{E+\vartheta\,\log N}{C\,D^{2}}-\epsilon^{\star},\end{array}

where $E\geq 0$ and $\vartheta\in[0,\,1/18)$ are constants, $C>0$ is identical to the constant $C$ in Condition 4, $D\in\{2,3,\ldots\}$ is identical to the constant $D$ in (D.4), and $\epsilon^{\star}>0$ is identical to the constant $\epsilon^{\star}$ in the definition of $\Lambda_{N}(\bm{\theta}^{\star})$ in Section 4.

Corollary D.3 replaces Condition 4 by Condition D.3. As a result, the constant $U$ appearing in Condition D.1 is redefined as $U\coloneqq(1+\exp(-D))^{-1}>0$ .

Corollary D.3. Consider a single observation of dependent responses and connections $(\bm{Y},\bm{Z})$ generated by the model with parameter vector $\bm{\theta}^{\star}\coloneqq(\alpha_{\mathscr{Z},1}^{\star},\,\dots,\alpha_{\mathscr{Z},N}^{\star},\,\gamma_{\mathscr{Z},\mathscr{Z}}^{\star},\,\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}^{\star})\in\mathbb{R}^{N+2}$ . If Conditions 4, D.1, and D.3 are satisfied with $\vartheta\in[0,1/18)$ , there exist constants $K\in(0,+\infty)$ and $0<L\leq U<+\infty$ along with an integer $N_{0}\in\{3,4,\dots\}$ such that, for all $N>N_{0}$ , the quantity $\delta_{N}$ satisfies

\begin{array}[]{llllllllll}L\,\sqrt{N\log N}&\leq&\delta_{N}&\leq&U\,\sqrt{N\log N},\end{array}

and the random set $\widehat{\bm{\Theta}}(\delta_{N})$ is non-empty and satisfies

\begin{array}[]{llllllllll}\widehat{\bm{\Theta}}(\delta_{N})&\subseteq&\mathscr{B}_{\infty}\left(\bm{\theta}^{\star},\;K\,\sqrt{\dfrac{\log N}{N^{1-18\,\vartheta}}}\right)\end{array}

with probability at least $1-6\,/N^{2}$ .

Proof of Corollary D.3. The proof of Corollary D.3 resembles the proof of Corollary 4, with Condition 4 replaced by Condition D.3. The proof of Corollary 4 shows that

\begin{array}[]{llllllllll}\rho_{N}&\leq&\sqrt{768}\;\,C_{1}\;C_{2}\,C_{3}\;D\;\chi(\bm{\theta}^{\star})^{9}\;\sqrt{\dfrac{\log N}{N}},\end{array}

where the constants $C_{1}>0$ , $C_{2}>0$ , $C_{3}\geq 1$ , and $D\in\{2,3,\ldots\}$ are defined in Lemmas D.4, D.5, and D.6, and Equation (D.4), respectively. Condition D.3 implies that

\begin{array}[]{llllllllll}\chi(\bm{\theta}^{\star})^{9}\;\coloneqq\;\exp(C\,D^{2}\,(|\!|\bm{\theta}^{\star}|\!|_{\infty}+\epsilon^{\star}))^{9}\;\leq\;\exp\left(C\,D^{2}\,\left(\dfrac{E+\vartheta\,\log N}{C\,D^{2}}\right)\right)^{9}\;=\;\exp(9\,E)\,N^{9\,\vartheta},\end{array}

which in turn implies that

\begin{array}[]{llllllllll}\rho_{N}&\leq&\sqrt{768}\;\,C_{1}\;C_{2}\,C_{3}\;D\,\exp(9\,E)\,\sqrt{\dfrac{\log N}{N^{1-18\,\vartheta}}}&=&K\;\sqrt{\dfrac{\log N}{N^{1-18\,\vartheta}}},\end{array}

where $K\coloneqq\sqrt{768}\;C_{1}\;C_{2}\,C_{3}\;D\,\exp(9\,E)>0$ . The remainder of the proof of Corollary D.3 resembles the proof of Corollary 4. We conclude that there exists an integer $N_{0}\in\{3,4,\dots\}$ such that, for all $N>N_{0}$ , the random set $\widehat{\bm{\Theta}}(\delta_{N})$ is non-empty and satisfies

\begin{array}[]{llllllllll}\widehat{\bm{\Theta}}(\delta_{N})&\subseteq&\mathscr{B}_{\infty}\left(\bm{\theta}^{\star},\;K\;\sqrt{\dfrac{\log N}{N^{1-18\,\vartheta}}}\right)\end{array}

with probability at least $1-6\,/N^{2}$ .
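
The exponent in this rate follows from elementary arithmetic, spelled out here as a worked step. Combining the factor $N^{9\,\vartheta}$ from the bound on $\chi(\bm{\theta}^{\star})^{9}$ with the factor $\sqrt{\log N\,/\,N}$ gives

\begin{array}[]{llllllllll}N^{9\,\vartheta}\;\sqrt{\dfrac{\log N}{N}}&=&\sqrt{\dfrac{N^{18\,\vartheta}\,\log N}{N}}&=&\sqrt{\dfrac{\log N}{N^{1-18\,\vartheta}}}.\end{array}

The right-hand side vanishes as $N\to\infty$ because $1-18\,\vartheta>0$ for all $\vartheta\in[0,\,1/18)$ . For instance, the illustrative value $\vartheta=1/36$ yields the rate $\sqrt{\log N\,/\,N^{1/2}}$ , while $\vartheta=0$ recovers the rate $\sqrt{\log N\,/\,N}$ of Corollary 4.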

D.4 Bounding $\Lambda_{N}(\bm{\theta}^{\star})$

Lemma D.4. Consider the model of Corollary 4. If Conditions 4 and D.1 are satisfied along with either Condition 4 or Condition D.3 with $\vartheta\in[0,1/18)$ , there exists a constant $C_{1}>0$ along with an integer $N_{0}\in\{3,4,\dots\}$ such that, for all $N>N_{0}$ ,

•

$-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{w})$ is invertible for all $\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})$ and all $\bm{w}\in\mathscr{H}$ ,
•

the event $\bm{W}\in\mathscr{H}$ occurs with probability at least $1-4\,/\max\{N,\,p\}^{2}$ ,
•

$\Lambda_{N}(\bm{\theta}^{\star})~\coloneqq~\sup\limits_{\bm{w}\,\in\,\mathscr{H}}\;\,\sup\limits_{\bm{\theta}\,\in\,\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})}\,{\left|\!\left|\!\left|(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{w}))^{-1}\right|\!\right|\!\right|_{\infty}}~\leq~C_{1}\,\dfrac{\chi(\bm{\theta}^{\star})^{9}}{N}$ ,

where $\mathscr{H}$ is defined in (D.8) and $\chi(\bm{\theta}^{\star})$ is defined in (D.9).

Proof of Lemma D.4. We first partition $-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{w})$ in accordance with $\bm{\theta}\coloneqq(\bm{\theta}_{1},\,\bm{\theta}_{2})$ , given by $\bm{\theta}_{1}\coloneqq(\alpha_{\mathscr{Z},1},\dots,\alpha_{\mathscr{Z},N})\in\mathbb{R}^{N}$ and $\bm{\theta}_{2}\coloneqq(\gamma_{\mathscr{Z},\mathscr{Z}},\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}})\in\mathbb{R}^{2}$ :

\begin{array}[]{llllllllll}-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{w})&\coloneqq&\begin{pmatrix}\bm{A}(\bm{\theta},\,\bm{w})&\bm{C}(\bm{\theta},\,\bm{w})\\ \bm{C}(\bm{\theta},\,\bm{w})^{\top}&\bm{B}(\bm{\theta},\,\bm{w})\end{pmatrix},\end{array}

(D.12)

where the matrices $\bm{A}(\bm{\theta},\,\bm{w})\in\mathbb{R}^{N\times N}$ and $\bm{B}(\bm{\theta},\,\bm{w})\in\mathbb{R}^{2\times 2}$ define the covariance matrices of the sufficient statistics corresponding to the parameters $\bm{\theta}_{1}$ and $\bm{\theta}_{2}$ , respectively. Define $\bm{C}(\bm{\theta},\,\bm{w})\coloneqq\left(\bm{C}_{1}(\bm{\theta},\,\bm{w}),\,\bm{C}_{2}(\bm{\theta},\,\bm{w})\right)\in\mathbb{R}^{N\times 2}$ , where $\bm{C}_{1}(\bm{\theta},\,\bm{w})\in\mathbb{R}^{N}$ and $\bm{C}_{2}(\bm{\theta},\,\bm{w})\in\mathbb{R}^{N}$ are the covariances of the degree terms with the transitive connection term with weight $\gamma_{\mathscr{Z},\mathscr{Z}}$ and spillover term with weight $\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}$ , respectively.

We wish to bound the infinity norm of $\left(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\bm{w})\right)^{-1}$ , given by

\begin{array}[]{llllllllll}(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta},\,\bm{w}))^{-1}&=&\begin{pmatrix}\bm{A}(\bm{\theta},\,\bm{w})&\bm{C}(\bm{\theta},\,\bm{w})\\ \bm{C}(\bm{\theta},\,\bm{w})^{\top}&\bm{B}(\bm{\theta},\,\bm{w})\end{pmatrix}^{-1}\vskip 7.11317pt\\ &=&\begin{pmatrix}\bm{A}(\bm{\theta},\,\bm{w})^{-1}&\bm{0}_{N,2}\\ \bm{0}_{2,N}&\bm{0}_{2,2}\end{pmatrix}\\ &+&\begin{pmatrix}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}(\bm{\theta},\,\bm{w})\\ -\bm{I}_{2,2}\end{pmatrix}\bm{V}(\bm{\theta},\,\bm{w})^{-1}\begin{pmatrix}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}(\bm{\theta},\,\bm{w})\\ -\bm{I}_{2,2}\end{pmatrix}^{\top},\end{array}

where $\bm{0}_{a,b}\coloneqq\text{diag}(0,\ldots,0)\in\{0,1\}^{a\times b}$ and $\bm{I}_{a,b}\coloneqq\text{diag}(1,\ldots,1)\ \in\{0,1\}^{a\times b}$ ( $a,b\in\{1,2,\ldots\}$ ) are diagonal matrices, and

\begin{array}[]{llllllllll}\bm{V}(\bm{\theta},\,\bm{w})&\coloneqq&\bm{B}(\bm{\theta},\,\bm{w})-\bm{C}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}(\bm{\theta},\,\bm{w})\end{array}

is the Schur complement of $-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta},\,\bm{w})$ with respect to the block $\bm{A}(\bm{\theta},\,\bm{w})$ .
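
As a sanity check of the block-inverse identity, consider the scalar special case (an illustrative assumption, not part of the argument) with $\bm{A}=a>0$ , $\bm{B}=b$ , and $\bm{C}=c$ , so that the Schur complement is $v\coloneqq b-c^{2}/a$ , assumed non-zero. The identity then reads

\begin{array}[]{llllllllll}\begin{pmatrix}a&c\\ c&b\end{pmatrix}^{-1}&=&\begin{pmatrix}1/a&0\\ 0&0\end{pmatrix}+\dfrac{1}{v}\begin{pmatrix}c/a\\ -1\end{pmatrix}\begin{pmatrix}c/a&-1\end{pmatrix}&=&\dfrac{1}{a\,b-c^{2}}\begin{pmatrix}b&-c\\ -c&a\end{pmatrix},\end{array}

which agrees with the familiar formula for the inverse of a $2\times 2$ matrix, because $1/v=a/(a\,b-c^{2})$ and $1/a+c^{2}/(a^{2}\,v)=b/(a\,b-c^{2})$ .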

The $\ell_{\infty}$ -induced norm is subadditive and submultiplicative, so

\begin{array}[]{llllllllll}&&\,{\left|\!\left|\!\left|(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta},\,\bm{w}))^{-1}\right|\!\right|\!\right|_{\infty}}\vskip 7.11317pt\\ &\leq&\,{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}\\ &+&{\left|\!\left|\!\left|\begin{pmatrix}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}(\bm{\theta},\,\bm{w})\\ -\bm{I}_{2,2}\end{pmatrix}\right|\!\right|\!\right|_{\infty}}{\left|\!\left|\!\left|\bm{V}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}{\left|\!\left|\!\left|\begin{pmatrix}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}(\bm{\theta},\,\bm{w})\\ -\bm{I}_{2,2}\end{pmatrix}^{\top}\right|\!\right|\!\right|_{\infty}}\vskip 7.11317pt\\ &\leq&{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}\,+\,\max\{1,\;{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}(\bm{\theta},\,\bm{w})\right|\!\right|\!\right|_{\infty}}\}\\ &\times&{\left|\!\left|\!\left|\bm{V}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}({\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}+1)\\ &\leq&{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}+\max\{1,\;{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}\,{\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})\right|\!\right|\!\right|_{\infty}}\}\\ &\times&\,{\left|\!\left|\!\left|\bm{V}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}\,\left({\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})^{\top}\right|\!\right|\!\right|_{\infty}}\,{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}+1\right).\end{array}

(D.13)
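As an informal sanity check (not part of the proof), the block-inverse decomposition behind (D.13) can be verified on a small example. The sketch below assumes a hypothetical symmetric block matrix with blocks `A`, `C`, and `B` chosen for illustration only; it uses exact rational arithmetic, and the helper names are ours.

```python
from fractions import Fraction


def mat_mul(P, Q):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def inverse(P):
    """Gauss-Jordan inverse with exact rational arithmetic."""
    n = len(P)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(P)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def inf_norm(P):
    """Induced l-infinity norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in P)


# Hypothetical blocks (illustration only): M = [[A, C], [C^T, B]].
A = [[4, 1, 0], [1, 4, 1], [0, 1, 4]]
C = [[1, 0], [0, 1], [1, 1]]
B = [[5, 1], [1, 6]]
n, p = len(A), len(B)
M = [A[i] + C[i] for i in range(n)] + [transpose(C)[i] + B[i] for i in range(p)]

A_inv = inverse(A)
AinvC = mat_mul(A_inv, C)
CtAinvC = mat_mul(transpose(C), AinvC)
V = [[B[i][j] - CtAinvC[i][j] for j in range(p)] for i in range(p)]  # Schur complement
V_inv = inverse(V)

# G stacks A^{-1} C on top of -I, as in the second term of (D.13).
G = AinvC + [[Fraction(-int(i == j)) for j in range(p)] for i in range(p)]
low_rank = mat_mul(mat_mul(G, V_inv), transpose(G))
padded = [[A_inv[i][j] if i < n and j < n else Fraction(0)
           for j in range(n + p)] for i in range(n + p)]
reconstructed = [[padded[i][j] + low_rank[i][j] for j in range(n + p)]
                 for i in range(n + p)]

M_inv = inverse(M)
lhs = inf_norm(M_inv)
rhs = inf_norm(A_inv) + inf_norm(G) * inf_norm(V_inv) * inf_norm(transpose(G))
```

In this example `reconstructed` coincides with `M_inv`, and `lhs` does not exceed `rhs`, which mirrors the first inequality of (D.13).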

We bound the terms ${\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}$ , ${\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})^{\top}\right|\!\right|\!\right|_{\infty}}$ , and ${\left|\!\left|\!\left|\bm{V}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}$ one by one.

Bounding ${\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}$ . The proof of Lemma 9 in S25 shows that

\begin{array}[]{llllllllll}{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}&\leq&\dfrac{18\,\chi(\bm{\theta}^{\star})^{2}}{N}\end{array}

(D.14)

for all $\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})$ , where $\chi(\bm{\theta}^{\star})$ is an upper bound on the inverse standard deviation of connections $Z_{i,j}$ of pairs of units $\{i,j\}\subset\mathscr{P}_{N}$ with $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset$ conditional on $\bm{X},\,\bm{Y},\,\bm{Z}_{-\{i,j\}}$ . Under the model considered here, the conditional distribution of $Z_{i,j}$ is Bernoulli, as shown in Section 2.2.2. Therefore, $\mathbb{V}_{\mathscr{Z},i,j}(Z_{i,j})$ is given by

\begin{array}[]{llllllllll}\mathbb{V}_{\mathscr{Z},i,j}(Z_{i,j})&=&\mathbb{P}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})\times(1-\mathbb{P}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})).\end{array}

Applying the bounds on $\mathbb{P}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})$ supplied by Lemma D.7 gives

\begin{array}[]{llllllllll}\mathbb{V}_{\mathscr{Z},i,j}(Z_{i,j})&\geq&\dfrac{1}{(\exp\left(C\,D^{2}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right))^{2}}&\geq&\dfrac{1}{(\exp\left(C\,D^{2}\,(\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}+\epsilon^{\star})\right))^{2}},\end{array}

(D.15)

provided $D\in\{2,3,\ldots\}$ , where $D$ corresponds to the constant $D$ defined in (D.4) and $C$ corresponds to the constant $C$ in Condition 4. For the second inequality of (D.15), we use the fact that $\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\leq\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}+\epsilon^{\star}$ for all $\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})$ . With

\begin{array}[]{llllllllll}\chi(\bm{\theta}^{\star})&\coloneqq&\exp\left(C\,D^{2}\,(\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}+\epsilon^{\star})\right),\end{array}

we therefore deduce that $\chi(\bm{\theta}^{\star})$ is an upper bound on the inverse standard deviation of connections $Z_{i,j}$ :

\begin{array}[]{llllllllll}\chi(\bm{\theta}^{\star})&\geq&\dfrac{1}{\sqrt{\mathbb{V}_{\mathscr{Z},i,j}(Z_{i,j})}}.\end{array}

Bounding ${\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})^{\top}\right|\!\right|\!\right|_{\infty}}$ . Define $\bm{C}(\bm{\theta},\,\bm{w})\coloneqq\left(\bm{C}_{1}(\bm{\theta},\,\bm{w}),\,\bm{C}_{2}(\bm{\theta},\,\bm{w})\right)$ , where
$\bm{C}_{1}(\bm{\theta},\,\bm{w})\in\mathbb{R}^{N}$ and $\bm{C}_{2}(\bm{\theta},\,\bm{w})\in\mathbb{R}^{N}$ are the covariance terms of the degree terms with the sufficient statistics pertaining to the transitive connection term weighted by $\gamma_{\mathscr{Z},\mathscr{Z}}$ and the spillover term weighted by $\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}}$ , respectively. Then

\begin{array}[]{llllllllll}{\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})^{\top}\right|\!\right|\!\right|_{\infty}}&\leq&\left|\!\left|\bm{C}_{1}(\bm{\theta},\,\bm{w})\right|\!\right|_{\infty}\,+\,\left|\!\left|\bm{C}_{2}(\bm{\theta},\,\bm{w})\right|\!\right|_{\infty}.\end{array}

We bound the terms $\left|\!\left|\bm{C}_{1}(\bm{\theta},\,\bm{w})\right|\!\right|_{\infty}$ and $\left|\!\left|\bm{C}_{2}(\bm{\theta},\,\bm{w})\right|\!\right|_{\infty}$ one by one.

By Lemma 13 of S25, $\left|\!\left|\bm{C}_{1}(\bm{\theta},\,\bm{w})\right|\!\right|_{\infty}\leq 3\,D^{3}$ . The term $\bm{C}_{2}(\bm{\theta},\,\bm{w})\coloneqq(C_{2,1}(\bm{\theta},\,\bm{w}),\ldots,C_{2,N}(\bm{\theta},\,\bm{w}))\in\mathbb{R}^{N}$ refers to the covariances between the degrees $b_{i}(\bm{x},\,\bm{y},\,\bm{z})$ of units $i\in\{1,\dots,N\}$ and $b_{N+2}(\bm{x},\,\bm{y},\,\bm{z})$ . An upper bound on the $t$ -th element of $\bm{C}_{2}(\bm{\theta},\,\bm{w})$ can be obtained by

\begin{array}[]{llllllllll}|C_{2,t}(\bm{\theta},\,\bm{w})|&=&\left|\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{C}_{\mathscr{Z},i,j}\left(b_{t}(\bm{x},\,\bm{y},\,\bm{Z}),\,b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right|\vskip 7.11317pt\\ &=&\left|\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{C}_{\mathscr{Z},i,j}\left(\displaystyle\sum\limits_{h\neq t}Z_{h,t},\;\,\sum_{h=1}^{N}\sum_{k=h+1}^{N}\,c_{h,k}\,(x_{h}\,y_{k}+x_{k}\,y_{h})\,Z_{h,k}\right)\right|\vskip 7.11317pt\\ &=&\left|\displaystyle\sum\limits_{i\neq t:\;\mathscr{N}_{i}\,\cap\,\mathscr{N}_{t}\,\neq\,\emptyset}\mathbb{C}_{\mathscr{Z},i,t}\left(Z_{i,t},\;(x_{i}\,y_{t}+y_{i}\,x_{t})\,Z_{i,t}\right)\right|\vskip 7.11317pt\\ &=&\left|\displaystyle\sum\limits_{i\neq t:\;\mathscr{N}_{i}\,\cap\,\mathscr{N}_{t}\,\neq\,\emptyset}(x_{i}\,y_{t}+y_{i}\,x_{t})\,\mathbb{V}_{\mathscr{Z},i,t}\left(Z_{i,t}\right)\right|\vskip 7.11317pt\\ &\leq&\left|\dfrac{C}{2}\;\displaystyle\sum\limits_{i\neq t:\;\mathscr{N}_{i}\,\cap\,\mathscr{N}_{t}\,\neq\,\emptyset}^{N}1\right|\;\;\leq\;\;C\,D^{2},\end{array}

(D.16)

where $C$ corresponds to the constant from Condition 4 and $D$ is defined in (D.4). On the third line, note that $b_{t}(\bm{x},\,\bm{y},\,\bm{z})$ depends on connection $Z_{i,j}$ only if $t\in\{i,j\}$ . Therefore, the covariance of $b_{t}(\bm{x},\,\bm{y},\,\bm{z})$ with respect to any other connection is 0. The first inequality follows from $x_{i}\,y_{j}+x_{j}\,y_{i}\leq 2\,C$ , which holds because $0\leq x_{i}\leq C<\infty$ by Condition 4 and $Y_{i}\in\{0,1\}$ , and from $\mathbb{V}_{\mathscr{Z},i,j}\left(Z_{i,j}\right)\leq 1/4$ , which holds because $Z_{i,j}$ is conditionally Bernoulli. The second inequality follows from Lemma 15 in S25, which bounds the number of units $i$ with $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{t}\,\neq\,\emptyset$ from above by $D^{2}$ . Since the bound from (D.16) holds for all $t\in\{1,\ldots,N\}$ , we obtain $\left|\!\left|\bm{C}_{2}(\bm{\theta},\,\bm{w})\right|\!\right|_{\infty}\leq C\,D^{2}$ . Taken together,

\begin{array}[]{llllllllll}{\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})^{\top}\right|\!\right|\!\right|_{\infty}}&\leq&3\,D^{3}+C\,D^{2}&\leq&\max\{3,\,C\}\,D^{3}.\end{array}

(D.17)

Bounding ${\left|\!\left|\!\left|\bm{V}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}$ . Write

\begin{array}[]{llllllllll}\bm{B}(\bm{\theta},\,\bm{w})&\coloneqq&\begin{pmatrix}B_{1,1}(\bm{\theta},\,\bm{w})&B_{1,2}(\bm{\theta},\,\bm{w})\\ B_{1,2}(\bm{\theta},\,\bm{w})&B_{2,2}(\bm{\theta},\,\bm{w})\end{pmatrix}\vskip 7.11317pt\\ \bm{V}(\bm{\theta},\,\bm{w})&\coloneqq&\begin{pmatrix}V_{1,1}(\bm{\theta},\,\bm{w})&V_{1,2}(\bm{\theta},\,\bm{w})\\ V_{1,2}(\bm{\theta},\,\bm{w})&V_{2,2}(\bm{\theta},\,\bm{w})\end{pmatrix}.\end{array}

The elements of $\bm{V}(\bm{\theta},\,\bm{w})$ are then given by

\begin{array}[]{llllllllll}V_{i,j}(\bm{\theta},\,\bm{w})&=&B_{i,j}(\bm{\theta},\,\bm{w})-\bm{C}_{i}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{j}(\bm{\theta},\,\bm{w}).\end{array}

The inverse of $\bm{V}(\bm{\theta},\,\bm{w})$ is

\begin{array}[]{llllllllll}\bm{V}(\bm{\theta},\,\bm{w})^{-1}&=&\dfrac{1}{V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}}\begin{pmatrix}V_{2,2}(\bm{\theta},\,\bm{w})&-V_{1,2}(\bm{\theta},\,\bm{w})\\ -V_{1,2}(\bm{\theta},\,\bm{w})&V_{1,1}(\bm{\theta},\,\bm{w})\end{pmatrix},\end{array}

implying that

\begin{array}[]{llllllllll}\,{\left|\!\left|\!\left|(\bm{V}(\bm{\theta},\,\bm{w}))^{-1}\right|\!\right|\!\right|_{\infty}}&\leq&\dfrac{\max\left\{V_{1,1}(\bm{\theta},\,\bm{w}),\,V_{2,2}(\bm{\theta},\,\bm{w})\right\}+|V_{1,2}(\bm{\theta},\,\bm{w})|}{|V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}|}.\end{array}

(D.18)
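The bound (D.18) can be illustrated numerically. The following is a minimal sketch, assuming hypothetical entries $V_{1,1}=4$ , $V_{1,2}=1$ , $V_{2,2}=3$ chosen for illustration only; the function names are ours and do not appear in the paper.

```python
def inverse_inf_norm_2x2(v11, v12, v22):
    """Induced l-infinity norm of the inverse of [[v11, v12], [v12, v22]]."""
    det = v11 * v22 - v12 ** 2
    if det == 0:
        raise ValueError("matrix is singular")
    inv = [[v22 / det, -v12 / det], [-v12 / det, v11 / det]]
    return max(sum(abs(entry) for entry in row) for row in inv)


def d18_bound(v11, v12, v22):
    """Right-hand side of (D.18)."""
    return (max(v11, v22) + abs(v12)) / abs(v11 * v22 - v12 ** 2)


# Hypothetical entries (illustration only).
norm = inverse_inf_norm_2x2(4.0, 1.0, 3.0)
bound = d18_bound(4.0, 1.0, 3.0)
```

When the diagonal entries are positive, as for a positive definite $\bm{V}(\bm{\theta},\,\bm{w})$ , the two quantities coincide, so (D.18) holds with equality in that case.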

Invoking the inequalities from (D.14) and (D.17), we obtain for $i,j\in\{1,2\}$

\begin{array}[]{llllllllll}&&|\bm{C}_{i}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{j}(\bm{\theta},\,\bm{w})|\\ &\leq&N\,\left|\!\left|\bm{C}_{i}(\bm{\theta},\,\bm{w})\right|\!\right|_{\infty}{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}\left|\!\left|\bm{C}_{j}(\bm{\theta},\,\bm{w})\right|\!\right|_{\infty}\vskip 7.11317pt\\ &\leq&N\,{\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})\right|\!\right|\!\right|_{\infty}}{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}{\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})\right|\!\right|\!\right|_{\infty}}\vskip 7.11317pt\\ &\leq&18\,\max\{9,\,C^{2}\}\,D^{6}\,\chi(\bm{\theta}^{\star})^{2},\end{array}

(D.19)

where $D$ corresponds to the constant $D$ defined in (D.4) and $C$ corresponds to the constant $C$ from Condition 4.
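
For reference, the constant in the last line of (D.19) can be traced by substituting the bound $\max\{3,\,C\}\,D^{3}$ from (D.17) for each factor involving $\bm{C}(\bm{\theta},\,\bm{w})$ and the bound (D.14) for ${\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}$ . The display below is a worked restatement of that substitution, not an additional result:

\begin{array}[]{llllllllll}N\,\left(\max\{3,\,C\}\,D^{3}\right)^{2}\,\dfrac{18\,\chi(\bm{\theta}^{\star})^{2}}{N}&=&18\,\max\{9,\,C^{2}\}\,D^{6}\,\chi(\bm{\theta}^{\star})^{2},\end{array}

where $\left(\max\{3,\,C\}\right)^{2}=\max\{9,\,C^{2}\}$ because $C>0$ .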

By applying Lemma D.7 along with (D.19), we get for $i,j\in\{1,2\}$

\begin{array}[]{llllllllll}|V_{i,j}(\bm{\theta},\,\bm{w})|&\leq&\,|B_{i,j}(\bm{\theta},\,\bm{w})|\,+\,|\bm{C}_{i}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{j}(\bm{\theta},\,\bm{w})|\vskip 7.11317pt\\ &\leq&\max\{1,\,C^{2}\}\dfrac{N\,D^{5}}{4}+18\,\max\{9,\,C^{2}\}\,D^{6}\,\chi(\bm{\theta}^{\star})^{2}\vskip 7.11317pt\\ &\leq&\max\{9,\,C^{2}\}\,D^{5}\left(\dfrac{N}{4}+18\,D\,\chi(\bm{\theta}^{\star})^{2}\right).\end{array}

Thus, the numerator of (D.18) is bounded above by

\begin{array}[]{llllllllll}&\max\left\{V_{1,1}(\bm{\theta},\,\bm{w}),\,V_{2,2}(\bm{\theta},\,\bm{w})\right\}+|V_{1,2}(\bm{\theta},\,\bm{w})|\\ &\leq~\max\{9,\,C^{2}\}\,D^{5}\left(\dfrac{N}{2}+36\,D\,\chi(\bm{\theta}^{\star})^{2}\right).\end{array}

(D.20)

The denominator of (D.18), which is the determinant of $\bm{V}(\bm{\theta},\,\bm{w})$ , is

\begin{array}[]{llllllllll}&&~V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}\\ &=&\,(B_{1,1}(\bm{\theta},\,\bm{w})-\bm{C}_{1}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{1}(\bm{\theta},\,\bm{w}))\\ &\times&(B_{2,2}(\bm{\theta},\,\bm{w})-\bm{C}_{2}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{2}(\bm{\theta},\,\bm{w}))\\ &-&(B_{1,2}(\bm{\theta},\,\bm{w})-\bm{C}_{1}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{2}(\bm{\theta},\,\bm{w}))^{2}\\ &=&\,B_{1,1}(\bm{\theta},\,\bm{w})\,B_{2,2}(\bm{\theta},\,\bm{w})-B_{1,1}(\bm{\theta},\,\bm{w})\,\bm{C}_{2}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{2}(\bm{\theta},\,\bm{w})\\ &-&\,B_{2,2}(\bm{\theta},\,\bm{w})\,\bm{C}_{1}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{1}(\bm{\theta},\,\bm{w})\\ &+&(\bm{C}_{1}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{1}(\bm{\theta},\,\bm{w}))\,(\bm{C}_{2}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{2}(\bm{\theta},\,\bm{w}))-B_{1,2}(\bm{\theta},\,\bm{w})^{2}\\ &+&\,2\,B_{1,2}(\bm{\theta},\,\bm{w})\,(\bm{C}_{1}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{2}(\bm{\theta},\,\bm{w}))-(\bm{C}_{1}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{2}(\bm{\theta},\,\bm{w}))^{2}.\end{array}

Applying the property of positive semidefinite matrices $\bm{P}\in\mathbb{R}^{n\times n}$ that $(\bm{a}^{\top}\bm{P}\,\bm{a})\,(\bm{b}^{\top}\bm{P}\,\bm{b})\geq(\bm{a}^{\top}\bm{P}\,\bm{b})^{2}$ is true for all vectors $\bm{a}\in\mathbb{R}^{n}$ and $\bm{b}\in\mathbb{R}^{n}$ ( $n\geq 1$ ), we obtain

\begin{array}[]{llllllllll}&&~V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}\\ &\geq&\,B_{1,1}(\bm{\theta},\,\bm{w})\,B_{2,2}(\bm{\theta},\,\bm{w})-B_{1,2}(\bm{\theta},\,\bm{w})^{2}\\ &-&4\,\max\limits_{i,\,j}\,|B_{i,j}(\bm{\theta},\,\bm{w})|\,\,\max\limits_{i,\,j}\,|\bm{C}_{i}(\bm{\theta},\,\bm{w})^{\top}\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\bm{C}_{j}(\bm{\theta},\,\bm{w})|\\ &\geq&B_{1,1}(\bm{\theta},\,\bm{w})\,B_{2,2}(\bm{\theta},\,\bm{w})-B_{1,2}(\bm{\theta},\,\bm{w})^{2}-18\,\max\{81,\,C^{4}\}\,N\,D^{11}\,\chi(\bm{\theta}^{\star})^{2}\\ &=&U(\bm{\theta},\,\bm{w})-18\,\max\{81,\,C^{4}\}\,N\,D^{11}\,\chi(\bm{\theta}^{\star})^{2},\end{array}

where

\begin{array}[]{llllllllll}U(\bm{\theta},\,\bm{w})&\coloneqq&B_{1,1}(\bm{\theta},\,\bm{w})\,B_{2,2}(\bm{\theta},\,\bm{w})-B_{1,2}(\bm{\theta},\,\bm{w})^{2}.\end{array}

(D.21)

The final inequality follows from invoking (D.19) along with Lemma D.7.

For (D.21), we obtain

\begin{array}[]{llllllllll}U(\bm{\theta},\,\bm{w})&=&\,B_{1,1}(\bm{\theta},\,\bm{w})\,B_{2,2}(\bm{\theta},\,\bm{w})-B_{1,2}(\bm{\theta},\,\bm{w})^{2}\vskip 7.11317pt\\ &=&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\vskip 7.11317pt\\ &\times&\left(\displaystyle\sum\limits_{i=1}^{N}\mathbb{V}_{\mathscr{Y},i}\left(b_{N+2}(\bm{x},\,\bm{Y},\,\bm{z})\right)+\,\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\vskip 7.11317pt\\ &-&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{C}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z}),\;b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)^{2}\vskip 7.11317pt\\ &=&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\left(\displaystyle\sum\limits_{i=1}^{N}\mathbb{V}_{\mathscr{Y},i}\left(b_{N+2}(\bm{x},\,\bm{Y},\,\bm{z})\right)\right)\vskip 7.11317pt\\ &+&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\vskip 7.11317pt\\ &-&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{C}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z}),\;b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)^{2}.\end{array}

Next, we show that the third term

\begin{array}[]{llllllllll}\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{C}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z}),b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)^{2}\end{array}

is no larger than the second term

\begin{array}[]{llllllllll}\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right).\end{array}

Define

\begin{array}[]{llllllllll}u_{1,i,j}\coloneqq\sqrt{\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)}\ \ \ \text{and}\ \ \ u_{2,i,j}\coloneqq\sqrt{\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)},\ \ \ 1\leq i<j\leq N.\end{array}

Then the second term can be restated as follows:

\begin{array}[]{llllllllll}&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\\ &=\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}u_{1,i,j}^{2}\right)\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}u_{2,i,j}^{2}\right),\end{array}

while the third term is

\begin{array}[]{llllllllll}&&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\,\mathbb{C}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z}),b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)^{2}\vskip 7.11317pt\\ &\leq&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\,|\mathbb{C}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z}),b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)|\right)^{2}\vskip 7.11317pt\\ &\leq&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\,\sqrt{\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)\,\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)}\right)^{2}\vskip 7.11317pt\\ &=&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}u_{1,i,j}\,u_{2,i,j}\right)^{2}\vskip 7.11317pt\\ &\leq&\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}u_{1,i,j}^{2}\right)\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}u_{2,i,j}^{2}\right),\end{array}

where the Cauchy-Schwarz inequality is invoked on the third and last lines. This translates to the following lower bound on $U(\bm{\theta},\,\bm{w})$ :

\begin{array}[]{llllllllll}U(\bm{\theta},\,\bm{w})&=&\,B_{1,1}(\bm{\theta},\,\bm{w})\,B_{2,2}(\bm{\theta},\,\bm{w})-B_{1,2}(\bm{\theta},\,\bm{w})^{2}\vskip 7.11317pt\\ &\geq&\,\left(\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)\right)\left(\displaystyle\sum\limits_{i=1}^{N}\mathbb{V}_{\mathscr{Y},i}\left(b_{N+2}(\bm{x},\,\bm{Y},\,\bm{z})\right)\right)\vskip 7.11317pt\\ &\geq&\,\left(\displaystyle\sum\limits_{i=1}^{N}\dfrac{{\left|\!\left|\bm{H}_{i,1}(\bm{w})\right|\!\right|_{\infty}}}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}}\right)\,\displaystyle\sum\limits_{i=1}^{N}\left(\displaystyle\sum\limits_{j\neq i}^{N}c_{i,j}\,x_{j}\,z_{i,j}\right)^{2}\;\mathbb{V}_{\mathscr{Y},i}\left(Y_{i}\right),\end{array}

where $\bm{H}_{i,1}(\bm{w})$ is defined in (D.8) and the function $c_{i,j}$ is defined in (D.2). For the second inequality, we use the result from the proof of Lemma 13 in S25, which implies that

\begin{array}[]{llllllllll}\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)&\geq&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}c_{i,j}\,d_{i,j}(\bm{z})\,\mathbb{V}_{\mathscr{Z},i,j}\left(Z_{i,j}\right),\end{array}

(D.22)

where the function $d_{i,j}(\bm{Z})$ is defined in (D.2). By Lemma D.7, we get

\begin{array}[]{llllllllll}\mathbb{V}_{\mathscr{Z},i,j}(Z_{i,j})=\mathbb{P}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})\times(1-\mathbb{P}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}}))&\geq&\dfrac{1}{(1+\chi(\bm{\theta}^{\star}))^{2}},\end{array}

where $\chi(\bm{\theta}^{\star})$ is defined in (D.9). When combined with (D.22), this results in

\begin{array}[]{llllllllll}\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(b_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)&\geq&\dfrac{\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}c_{i,j}\,d_{i,j}(\bm{z})}{(1+\chi(\bm{\theta}^{\star}))^{2}}\\ &\geq&\displaystyle\sum\limits_{i=1}^{N}\dfrac{{\left|\!\left|\bm{H}_{i,1}(\bm{w})\right|\!\right|_{\infty}}}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}},\end{array}

where the second inequality is again from the proof of Lemma 13 in S25.
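
The Cauchy-Schwarz step that removes the squared covariance term from $U(\bm{\theta},\,\bm{w})$ can be checked numerically. The sketch below is illustrative only: it assumes that, conditional on everything else, each statistic changes by a fixed amount (called `slope1` and `slope2` here, both hypothetical) when a single Bernoulli connection flips, so that the conditional variances and covariance are products of those amounts with the Bernoulli variance.

```python
import random


def cross_term_vs_product(num_pairs, seed):
    """Return ((sum of covariances)^2, product of the two summed variances)."""
    rng = random.Random(seed)
    cov_sum = var1_sum = var2_sum = 0.0
    for _ in range(num_pairs):
        p = rng.uniform(0.05, 0.95)     # hypothetical conditional connection probability
        slope1 = rng.uniform(0.0, 3.0)  # hypothetical change in b_{N+1} when Z_{i,j} flips
        slope2 = rng.uniform(0.0, 2.0)  # hypothetical change in b_{N+2} when Z_{i,j} flips
        bernoulli_var = p * (1.0 - p)
        cov_sum += slope1 * slope2 * bernoulli_var
        var1_sum += slope1 ** 2 * bernoulli_var
        var2_sum += slope2 ** 2 * bernoulli_var
    return cov_sum ** 2, var1_sum * var2_sum


# Repeat over several seeds; the squared covariance sum (third term) should
# never exceed the product of the variance sums (second term).
results = [cross_term_vs_product(num_pairs=50, seed=s) for s in range(20)]
all_hold = all(third <= second * (1 + 1e-12) for third, second in results)
```

Under these assumptions `all_hold` is true for every seed, in line with the chain of inequalities bounding the third term by the second term.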

By applying Lemma D.7 and expanding the quadratic term, we obtain

\begin{array}[]{llllllllll}U(\bm{\theta},\,\bm{w})&\geq&\left(\displaystyle\sum\limits_{i=1}^{N}\dfrac{{\left|\!\left|\bm{H}_{i,1}(\bm{w})\right|\!\right|_{\infty}}}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}}\right)\,\displaystyle\sum\limits_{i=1}^{N}\,\mathbb{V}_{\mathscr{Y},i}\left(Y_{i}\right)\,\vskip 7.11317pt\\ &\times&\Bigg{(}\displaystyle\sum\limits_{j=i+1}^{N}x_{j}^{2}\,c_{i,j}\,z_{i,j}+\displaystyle\sum\limits_{h=1}^{N}\displaystyle\sum\limits_{k\neq h}^{N}\,c_{i,h}\,c_{i,k}\,x_{h}\,x_{k}\,z_{i,h}\,z_{i,k}\Bigg{)}\vskip 7.11317pt\\ &\geq&\left(\displaystyle\sum\limits_{i=1}^{N}\dfrac{{\left|\!\left|\bm{H}_{i,1}(\bm{w})\right|\!\right|_{\infty}}}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}}\right)\left(\displaystyle\sum\limits_{i=1}^{N}\,\mathbb{V}_{\mathscr{Y},i}\left(Y_{i}\right)\,\displaystyle\sum\limits_{j=i+1}^{N}x_{j}^{2}\;c_{i,j}\,z_{i,j}\right)\vskip 7.11317pt\\ &\geq&\dfrac{\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{w})\right|\!\right|_{\infty}}\right)\,\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{w})\right|\!\right|_{\infty}}\right)}{2\,(1+\chi(\bm{\theta}^{\star}))^{4}},\end{array}

where $\bm{H}_{i,2}(\bm{w})$ is defined in (D.8) and $C$ corresponds to the constant from Condition 4. The second inequality follows from the assumption $x_{i}\in[0,\,C]$ by Condition 4 along with $c_{i,j}\in\{0,1\}$ and $z_{i,j}\in\{0,1\}$ . Lemma D.7 shows that

\begin{array}[]{llllllllll}\mathbb{P}(\bm{W}\in\mathscr{H})&\geq&1-\dfrac{4}{\max\{N,\,p\}^{2}},\end{array}

where $\mathscr{H}$ is defined in (D.8). For all $\bm{w}\in\mathscr{H}$ , we obtain by definition

\begin{array}[]{llllllllll}U(\bm{\theta},\,\bm{w})&\geq&\dfrac{\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{w})\right|\!\right|_{\infty}}\right)\,\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{w})\right|\!\right|_{\infty}}\right)}{2\,(1+\chi(\bm{\theta}^{\star}))^{4}}&\geq&\dfrac{c^{2}\,N^{2}}{8\,(1+\chi(\bm{\theta}^{\star}))^{7}},\end{array}

which results in the following bound for the denominator of (D.18):

\begin{array}[]{llllllllll}V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}&\geq&U(\bm{\theta},\,\bm{w})-18\,\max\{81,\,C^{4}\}\,N\,D^{11}\,\chi(\bm{\theta}^{\star})^{2}\\ &\geq&\dfrac{c^{2}\,N^{2}}{8\,(1+\chi(\bm{\theta}^{\star}))^{7}}-18\,\max\{81,\,C^{4}\}\,N\,D^{11}\,\chi(\bm{\theta}^{\star})^{2}\vskip 7.11317pt\\ &>&\dfrac{c^{2}\,N^{2}}{8\,(1+\chi(\bm{\theta}^{\star}))^{7}}\,\left(1-\dfrac{18432\,\max\{81,\,C^{4}\}\,D^{11}\,\chi(\bm{\theta}^{\star})^{9}}{c^{2}\,N}\right),\end{array}

using the fact that $C>0$ , $D\geq 2$ , and $\epsilon^{\star}>0$ , which implies that

\begin{array}[]{llllllllll}\chi(\bm{\theta}^{\star})&\coloneqq&\exp(C\,D^{2}\,(|\!|\bm{\theta}^{\star}|\!|_{\infty}+\epsilon^{\star}))&>&1.\end{array}
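
The constant $18432$ in the bound on the denominator of (D.18) can be traced as follows. This is a worked restatement that uses only $\chi(\bm{\theta}^{\star})>1$ , which gives $(1+\chi(\bm{\theta}^{\star}))^{7}<(2\,\chi(\bm{\theta}^{\star}))^{7}=128\,\chi(\bm{\theta}^{\star})^{7}$ :

\begin{array}[]{llllllllll}18\,\max\{81,\,C^{4}\}\,N\,D^{11}\,\chi(\bm{\theta}^{\star})^{2}&=&\dfrac{c^{2}\,N^{2}}{8\,(1+\chi(\bm{\theta}^{\star}))^{7}}\;\dfrac{144\,\max\{81,\,C^{4}\}\,D^{11}\,\chi(\bm{\theta}^{\star})^{2}\,(1+\chi(\bm{\theta}^{\star}))^{7}}{c^{2}\,N}\\ &<&\dfrac{c^{2}\,N^{2}}{8\,(1+\chi(\bm{\theta}^{\star}))^{7}}\;\dfrac{18432\,\max\{81,\,C^{4}\}\,D^{11}\,\chi(\bm{\theta}^{\star})^{9}}{c^{2}\,N},\end{array}

since $144\times 128=18432$ .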

Under Conditions 4 and D.3 with $\vartheta\in[0,\,1/18)$ , we have, for all $\bm{w}\in\mathscr{H}$ ,

\begin{array}[]{llllllllll}\dfrac{18432\,\max\{81,\,C^{4}\}\,D^{11}\,\chi(\bm{\theta}^{\star})^{9}}{c^{2}\,N}&\to&0&\mbox{as}&N\to\infty.\end{array}

Thus, there exists a real number $\epsilon\in(0,\,1)$ along with an integer $N_{3}\in\{3,4,\ldots\}$ such that

\begin{array}[]{llllllllll}\dfrac{18432\,\max\{81,\,C^{4}\}\,D^{11}\,\chi(\bm{\theta}^{\star})^{9}}{c^{2}\,N}&\leq&\epsilon\end{array}

for all $N>N_{3}$ , which implies that

\begin{array}[]{llllllllll}&&V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}&\geq&\dfrac{c^{2}\,N^{2}}{8\,(1+\chi(\bm{\theta}^{\star}))^{7}}\,(1-\epsilon).\end{array}

(D.23)

Observe that (D.23) provides a positive lower bound on the determinant of $\bm{V}(\bm{\theta},\,\bm{w})$ for $\bm{w}\in\mathscr{H}$ , demonstrating that

\begin{array}[]{llllllllll}|V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}|&=&V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}.\end{array}

(D.24)

Combining (D.20), (D.23), and (D.24) shows that, for all $\bm{w}\in\mathscr{H}$ ,

\begin{array}[]{llllllllll}{\left|\!\left|\!\left|\bm{V}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}&\leq&\,\dfrac{\max\left\{V_{2,2}(\bm{\theta},\,\bm{w}),\,V_{1,1}(\bm{\theta},\,\bm{w})\right\}+|V_{1,2}(\bm{\theta},\,\bm{w})|}{V_{1,1}(\bm{\theta},\,\bm{w})\,V_{2,2}(\bm{\theta},\,\bm{w})-V_{1,2}(\bm{\theta},\,\bm{w})^{2}}\vskip 7.11317pt\vskip 7.11317pt\\ &\leq&\,\max\{9,\,C^{2}\}\,D^{5}\left(\dfrac{N}{2}+36\,D\,\chi(\bm{\theta}^{\star})^{2}\right)\,\dfrac{8\,(1+\chi(\bm{\theta}^{\star}))^{7}}{c^{2}\,N^{2}\,(1-\epsilon)}\vskip 7.11317pt\vskip 7.11317pt\\ &\leq&\,K_{1}\,\dfrac{D^{5}\,\chi(\bm{\theta}^{\star})^{7}}{N}\;\max\left\{1,\;\dfrac{D\,\chi(\bm{\theta}^{\star})^{2}}{N}\right\},\end{array}

(D.25)

where $K_{1}>0$ is a constant.

Conclusion. We show in two steps that $-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta},\,\bm{w})$ is invertible for all $\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})$ and all $\bm{w}\in\mathscr{H}$ . First, by Lemma 9 in S25, the matrix $\bm{A}(\bm{\theta},\,\bm{w})$ is invertible for all $\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})$ and all $\bm{w}\in\mathscr{H}$ . Second, (D.24) demonstrates that the determinant of $\bm{V}(\bm{\theta},\,\bm{w})$ is bounded away from $0$ for all $\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})$ and all $\bm{w}\in\mathscr{H}$ . Thus, $\bm{V}(\bm{\theta},\,\bm{w})$ is nonsingular for all $\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})$ and all $\bm{w}\in\mathscr{H}$ , and so is $-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta},\,\bm{w})$ by Theorem 8.5.11 of \citetsupp[][p. 99]{harville_matrix_1997}. Combining (D.13), (D.14), (D.17), and (D.25) shows that, for all $\bm{\theta}\in\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})$ and all $\bm{w}\in\mathscr{H}$ ,

\begin{array}[]{llllllllll}{\left|\!\left|\!\left|(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta},\,\bm{w}))^{-1}\right|\!\right|\!\right|_{\infty}}&\leq&{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}\vskip 7.11317pt\\ &+&\max\{1,\;{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}{\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})\right|\!\right|\!\right|_{\infty}}\}\;{\left|\!\left|\!\left|\bm{V}(\bm{\theta},\,\bm{w})^{-1}\right|\!\right|\!\right|_{\infty}}\vskip 7.11317pt\\ &\times&\left(N\,{\left|\!\left|\!\left|\bm{C}(\bm{\theta},\,\bm{w})\right|\!\right|\!\right|_{\infty}}{\left|\!\left|\!\left|\bm{A}(\bm{\theta},\,\bm{w})^{-1}\,\right|\!\right|\!\right|_{\infty}}+1\right)\vskip 7.11317pt\vskip 7.11317pt\\ &\leq&\dfrac{18\,\chi(\bm{\theta}^{\star})^{2}}{N}\vskip 7.11317pt\vskip 7.11317pt\\ &+&\max\left\{1,\;\max\{3,\,C\}\,D^{3}\,\dfrac{18\,\chi(\bm{\theta}^{\star})^{2}}{N}\right\}\,K_{1}\;\dfrac{D^{5}\,\chi(\bm{\theta}^{\star})^{7}}{N}\,\vskip 7.11317pt\\ &\times&\max\left\{1,\;\dfrac{D^{2}\,\chi(\bm{\theta}^{\star})^{2}}{N}\right\}\;\left(\max\{3,\,C\}\,D^{3}\,18\,\chi(\bm{\theta}^{\star})^{2}+1\right).\end{array}

Conditions 4 and D.3 with $\vartheta\in[0,\,1/18)$ imply that

\begin{array}[]{llllllllll}\dfrac{\chi(\bm{\theta}^{\star})^{2}}{N}~<~\dfrac{\chi(\bm{\theta}^{\star})^{9}}{N}\;\rightarrow\;0\mbox{ as }N\to\infty.\end{array}

Thus, there exists an integer $N_{0}\in\{3,4,\dots\}$ such that the two maxima in the upper bound on ${\left|\!\left|\!\left|(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta},\,\bm{w}))^{-1}\right|\!\right|\!\right|_{\infty}}$ are equal to $1$ for all $N>N_{0}$ , so that

\begin{array}[]{llllllllll}&&{\left|\!\left|\!\left|(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta},\,\bm{w}))^{-1}\right|\!\right|\!\right|_{\infty}}\vskip 7.11317pt\\ &\leq&\,\dfrac{18\,\chi(\bm{\theta}^{\star})^{2}}{N}+K_{1}\,\dfrac{D^{5}\,\chi(\bm{\theta}^{\star})^{7}}{N}\left(\max\{3,\,C\}\,D^{3}\,18\,\chi(\bm{\theta}^{\star})^{2}+1\right)\vskip 7.11317pt\vskip 7.11317pt\\ &\leq&\,\dfrac{18\,\chi(\bm{\theta}^{\star})^{2}}{N}+K_{2}\,\dfrac{D^{8}\,\chi(\bm{\theta}^{\star})^{9}}{N}\vskip 7.11317pt\vskip 7.11317pt\\ &\leq&C_{1}\,\dfrac{\chi(\bm{\theta}^{\star})^{9}}{N},\end{array}

(D.26)

where $K_{2}>0$ and $C_{1}>0$ are constants. Substituting (D.26) into the definition of $\Lambda_{N}(\bm{\theta}^{\star})$ concludes the proof of Lemma D.4:

\begin{array}[]{llllllllll}\Lambda_{N}(\bm{\theta}^{\star})&\coloneqq&\sup\limits_{\bm{w}\,\in\,\mathscr{H}}\;\,\sup\limits_{\bm{\theta}\,\in\,\mathscr{B}_{\infty}(\bm{\theta}^{\star},\,\epsilon^{\star})}\,{\left|\!\left|\!\left|(-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta};\,\bm{w}))^{-1}\right|\!\right|\!\right|_{\infty}}&\leq&C_{1}\,\dfrac{\chi(\bm{\theta}^{\star})^{9}}{N}.\end{array}

D.5 Bounding $\Psi_{N}$

Lemma D.5. Consider the model of Corollary 4. Then $\sqrt{N/2}~\leq~\Psi_{N}~\leq~C_{2}\,\sqrt{N}$ , where $C_{2}>0$ is a constant.

Proof of Lemma D.5. The term $\Psi_{N}$ is defined as

\begin{array}[]{llllllllll}\Psi_{N}&\coloneqq&\underset{1\leq a\leq N+2}{\max}\;\left|\!\left|\bm{\Xi}_{a}\right|\!\right|_{2},\end{array}

where

\begin{array}[]{llllllllll}\bm{\Xi}_{a}&\coloneqq&(\Xi_{\{1\},a},\ldots,\Xi_{\{N\},a},\Xi_{\{1,2\},a},\ldots,\Xi_{\{N,N-1\},a})=(\bm{\Xi}_{\mathscr{Y},a},\bm{\Xi}_{\mathscr{Z},a}),~\text{$a\in\{1,\ldots,N+2\}$}.\end{array}

The sensitivity of the sufficient statistic vector $b_{a}(\bm{x},\,\bm{y},\,\bm{z})$ with respect to changes of responses is quantified by the vector $\bm{\Xi}_{\mathscr{Y},a}\in\mathbb{R}^{N}$ :

\begin{array}[]{llllllllll}\bm{\Xi}_{\mathscr{Y},a}&\coloneqq(\Xi_{\{1\},a},\ldots,\Xi_{\{N\},a}),\end{array}

where

\begin{array}[]{llllllllll}\Xi_{\{i\},a}&\coloneqq&\max\limits_{(\bm{w},\bm{w}^{\prime})\,\in\,\mathscr{W}\,\times\,\mathscr{W}:\;y_{k}=y_{k}^{\prime}\text{ for all }k\neq i,\,\bm{z}=\bm{z}^{\prime}}|b_{a}(\bm{x},\,\bm{y},\,\bm{z})-b_{a}(\bm{x},\,\bm{y}^{\prime},\bm{z}^{\prime})|.\end{array}

The sensitivity of the sufficient statistic vector $b_{a}(\bm{x},\,\bm{y},\,\bm{z})$ with respect to changes of connections is quantified by the vector $\bm{\Xi}_{\mathscr{Z},a}\in\mathbb{R}^{\binom{N}{2}}$ :

\begin{array}[]{llllllllll}\bm{\Xi}_{\mathscr{Z},a}&\coloneqq&(\Xi_{\{1,2\},a},\ldots,\Xi_{\{N,N-1\},a}),\end{array}

where

\begin{array}[]{llllllllll}\Xi_{\{i,j\},a}&\coloneqq&\max\limits_{(\bm{w},\bm{w}^{\prime})\,\in\,\mathscr{W}\,\times\,\mathscr{W}:\;\bm{y}=\bm{y}^{\prime},\,z_{k,l}=z_{k,l}^{\prime}\text{ for all }\,\{k,\,l\}\,\neq\,\{i,\,j\}}|b_{a}(\bm{x},\,\bm{y},\,\bm{z})-b_{a}(\bm{x},\,\bm{y}^{\prime},\bm{z}^{\prime})|.\end{array}

Equivalently,

\begin{array}[]{llllllllll}\Psi_{N}&=&\max\limits_{1\leq a\leq N+2}\,\sqrt{\displaystyle\sum\limits_{i=1}^{N}|\Xi_{\{i\},a}|^{2}+\displaystyle\sum\limits_{i=1}^{N}\displaystyle\sum\limits_{j=i+1}^{N}|\Xi_{\{i,j\},a}|^{2}}.\end{array}

(D.27)

•

For $a=1,\ldots,N$ , the statistic $b_{a}(\bm{x},\,\bm{y},\,\bm{z})$ refers to the degree effects of unit $a$ :

$\begin{array}[]{llllllllll}b_{a}(\bm{x},\,\bm{y},\,\bm{z})&=&\displaystyle\sum\limits_{j=1;\,j\neq a}^{N}z_{a,j}.\end{array}$

The term $\Xi_{\{i,j\},a}$ is $1$ if $a\in\{i,\,j\}$ and is $0$ otherwise. Since the statistic is unaffected by the responses, $\Xi_{\{i\},a}=0$ for all $i=1,\ldots,N$ . In the sum in (D.27) over all $i<j$ , the indicator $\mathbb{I}(a\in\{i,j\})$ equals $1$ exactly $N-1$ times, yielding $\left|\!\left|\bm{\Xi}_{a}\right|\!\right|_{2}\,=\,\sqrt{N-1}\,\leq\,\sqrt{N}$ for all $a=1,\ldots,N$ .

•

The statistic $b_{N+1}(\bm{x},\,\bm{y},\,\bm{z})$ refers to the transitive connections effect given by

\begin{array}[]{llllllllll}b_{N+1}(\bm{x},\,\bm{y},\,\bm{z})&=&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}d_{i,j}(\bm{z})\,z_{i,j},\end{array}

where the function $d_{i,j}(\bm{z})$ is defined in (D.2). Since this statistic is not affected by $\bm{y}$ , $\Xi_{\{i\},N+1}=0$ for all $i=1,\ldots,N$ . Following Lemma 18 in S25,

\begin{array}[]{llllllllll}\left|\!\left|\bm{\Xi}_{N+1}\right|\!\right|_{2}&\leq&\sqrt{N\,D^{2}\,(1+D)^{2}}&\leq&\sqrt{4\,N\,D^{4}}&=&2\,D^{2}\,\sqrt{N},\end{array}

where $D$ corresponds to the constant defined in (D.4).

•

The statistic $b_{N+2}(\bm{x},\,\bm{y},\,\bm{z})$ refers to the spillover effect given by

\begin{array}[]{llllllllll}b_{N+2}(\bm{x},\,\bm{y},\,\bm{z})&=&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}c_{i,j}\,(x_{i}\,y_{j}+y_{i}\,x_{j})\,z_{i,j},\end{array}

where the function $c_{i,j}$ is defined in (D.2). For $\{i,j\}\subset\mathscr{P}_{N}$ , the terms $\Xi_{\{i,j\},N+2}$ are $(y_{i}\,x_{j}+y_{j}\,x_{i})\leq 2\,C$ if $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset$ and $0$ otherwise. For all $i\in\mathscr{P}_{N}$ ,

\begin{array}[]{llllllllll}\Xi_{\{i\},N+2}&=&\displaystyle\sum\limits_{j:\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset}x_{j}\,z_{i,j}&\leq&\displaystyle\sum\limits_{j:\,\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset}C&\leq&C\,D^{2},\end{array}

because according to Lemma 15 in S25 there are at most $D^{2}$ units such that $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset$ and $x_{i}\leq C$ for $i=1,\ldots,N$ according to Condition 4. Combining $\Xi_{\{i,j\},N+2}$ and $\Xi_{\{i\},N+2}$ gives

\begin{array}[]{llllllllll}\left|\!\left|\bm{\Xi}_{N+2}\right|\!\right|_{2}&\leq&\sqrt{2\,N\,C\,D^{2}+N\,C\,D^{2}}&\leq&D\,\sqrt{3\,N\,C}&\leq&2\,D\,\sqrt{N\,C},\end{array}

where $C$ corresponds to the constant $C$ in Condition 4.
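
As an illustrative check of the degree-effect bound (a worked example added for orientation, not part of the original argument), take $N=4$ and $a=1$ . The pairs $\{i,j\}$ with $i<j$ that contain unit $1$ are $\{1,2\}$ , $\{1,3\}$ , and $\{1,4\}$ , so exactly $N-1=3$ of the six pair-indexed entries equal $1$ and the remaining three equal $0$ :

\begin{array}[]{llllllllll}\left|\!\left|\bm{\Xi}_{1}\right|\!\right|_{2}&=&\sqrt{1^{2}+1^{2}+1^{2}}&=&\sqrt{3}&\leq&\sqrt{4}&=&\sqrt{N}.\end{array}

In this example $\sqrt{N/2}=\sqrt{2}\leq\sqrt{3}$ also holds.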

Combining the results for $\left|\!\left|\bm{\Xi}_{a}\right|\!\right|_{2}$ for $a=1,\ldots,N+2$ gives

\begin{array}[]{llllllllll}\sqrt{N/2}&\leq&\Psi_{N}&\leq&2\,D\,\sqrt{N\,C}\\ \sqrt{N/2}&\leq&\Psi_{N}&\leq&C_{2}\,\sqrt{N},\end{array}

where $C_{2}\coloneqq 2\,D\,\sqrt{C}>0$ is a constant.

D.6 Bounding ${|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}}$

Lemma 5. Consider the model of Corollary 4. If Conditions 4, 4, and D.1 are satisfied with $\vartheta\in[0,1/18)$ , there exists a constant $C_{3}\geq 1$ such that ${|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}\leq C_{3}$ for all $N\geq 2$ . If the population $\mathscr{P}_{N}$ consists of non-overlapping subpopulations with dependence restricted to subpopulations, the same result holds when Condition 4 is replaced by Condition D.3.

Proof of Lemma 5. To bound ${|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}$ from above, we use Hölder's inequality

\begin{array}[]{llllllllll}{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&\sqrt{{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{1}}\,{\left|\!\left|\!\left|\mathscr{D}(\bm{\theta}^{\star})\right|\!\right|\!\right|_{\infty}}},\end{array}

(D.28)

where

\begin{array}[]{llllllllll}{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{1}}&\coloneqq&\max\limits_{1\leq j\leq M}\displaystyle\sum\limits_{i=1}^{M}|\mathscr{D}_{i,j}(\bm{\theta}^{\star})|\vskip 7.11317pt\\ {\left|\!\left|\!\left|\mathscr{D}(\bm{\theta}^{\star})\right|\!\right|\!\right|_{\infty}}&\coloneqq&\max\limits_{1\leq i\leq M}\displaystyle\sum\limits_{j=1}^{M}|\mathscr{D}_{i,j}(\bm{\theta}^{\star})|.\end{array}

We can therefore bound ${|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}$ by bounding the elements of the upper triangular coupling matrix $\mathscr{D}(\bm{\theta}^{\star})\in\mathbb{R}^{M\times M}$ which are

\begin{array}[]{llllllllll}\mathscr{D}_{i,j}(\bm{\theta}^{\star})&\coloneqq&\begin{cases}0&\mbox{if }i<j\\ 1&\mbox{if }i=j\\ \max\limits_{\bm{w}_{1:(i-1)}\,\in\,\mathscr{W}_{1:i-1}}\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}(W_{j}^{\star}\neq W_{j}^{\star\star})&\mbox{if }i>j.\\ \end{cases}\end{array}

Next, we define a symmetrized version of the coupling matrix denoted by $\mathscr{T}(\bm{\theta}^{\star})\in\mathbb{R}^{M\times M}$ with elements

\begin{array}[]{llllllllll}\mathscr{T}_{i,j}(\bm{\theta}^{\star})&\coloneqq&\begin{cases}\mathscr{D}_{j,i}(\bm{\theta}^{\star})&\mbox{if }i<j\\ \mathscr{D}_{i,i}(\bm{\theta}^{\star})&\mbox{if }i=j\\ \mathscr{D}_{i,j}(\bm{\theta}^{\star})&\mbox{if }i>j.\\ \end{cases}\end{array}

The symmetry of $\mathscr{T}(\bm{\theta}^{\star})$ yields the following upper bound for (D.28):

\begin{array}[]{llllllllll}{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&\sqrt{{|\!|\!|\mathscr{T}(\bm{\theta}^{\star})|\!|\!|_{1}}\,{\left|\!\left|\!\left|\mathscr{T}(\bm{\theta}^{\star})\right|\!\right|\!\right|_{\infty}}}~=~{\left|\!\left|\!\left|\mathscr{T}(\bm{\theta}^{\star})\right|\!\right|\!\right|_{\infty}}\vskip 7.11317pt\\ &=&1+\underset{1\leq i\leq M}{\max}\displaystyle\sum\limits_{j=1:\,j\neq i}^{M}\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}(W_{j}^{\star}\neq W_{j}^{\star\star})\vskip 7.11317pt\\ \end{array}

(D.29)

where the constant 1 in the second line stems from the diagonal elements of $\mathscr{T}(\bm{\theta}^{\star})$ .
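
To illustrate (D.28) and (D.29) on a toy case (a hypothetical $2\times 2$ example with an assumed off-diagonal entry of $1/2$ , chosen only for illustration), consider

\begin{array}[]{llllllllll}\mathscr{D}&=&\begin{pmatrix}1&0\\ 1/2&1\end{pmatrix},&&\mathscr{T}&=&\begin{pmatrix}1&1/2\\ 1/2&1\end{pmatrix}.\end{array}

Here the maximum absolute column sum and the maximum absolute row sum of $\mathscr{T}$ coincide because $\mathscr{T}$ is symmetric, and the spectral norm of $\mathscr{D}$ is indeed below the resulting bound:

\begin{array}[]{llllllllll}{|\!|\!|\mathscr{T}|\!|\!|_{1}}&=&{|\!|\!|\mathscr{T}|\!|\!|_{\infty}}&=&1+1/2&=&3/2\vskip 7.11317pt\\ {|\!|\!|\mathscr{D}|\!|\!|_{2}}&=&(1+\sqrt{17})\,/\,4&\approx&1.28&\leq&3/2.\end{array}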

Consider any $(i,j)\in\{1,\ldots,M\}\times\{1,\ldots,M\}$ such that $i\neq j$ and define the event $W_{i}\centernot{\longleftrightarrow}W_{j}$ as the event that there exists a path of disagreement between vertices $W_{i}$ and $W_{j}$ in $\mathscr{G}$ . A path of disagreement between vertices $W_{i}$ and $W_{j}$ in $\mathscr{G}$ is a path from $W_{i}$ to $W_{j}$ in $\mathscr{G}$ such that the coupling $(W_{(i+1):M}^{\star},\,W_{(i+1):M}^{\star\star})$ with joint probability mass function $\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}$ disagrees at each vertex on the path, in the sense that $W^{\star}\neq W^{\star\star}$ holds for all vertices $W$ on the path. Theorem 1 of \citetsupp[p. 753]{BeMa94} shows that

\begin{array}[]{llllllllll}\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}(W_{j}^{\star}\neq W_{j}^{\star\star})&\leq&\mathbb{B}_{\bm{\pi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}),\end{array}

(D.30)

where $\mathbb{B}_{\bm{\pi}}$ is a Bernoulli product measure based on $M$ independent Bernoulli experiments with success probabilities $\bm{\pi}\coloneqq(\pi_{1},\dots,\pi_{M})\in[0,1]^{M}$ . With $v\in\{1,\ldots,M\}$ , the success probabilities $\pi_{v}$ are

\begin{array}[]{llllllllll}\pi_{v}&\coloneqq&\begin{cases}0&\text{if $v\in\{1,\ldots,i-1\}$}\\ 1&\text{if $v=i$}\\ \underset{(\bm{w}_{-v},\bm{w}_{-v}^{\prime})\in\mathscr{W}_{-v}\times\mathscr{W}_{-v}}{\max}\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&\text{if $v\in\{i+1,\ldots,M\},$}\end{cases}\end{array}

where

\begin{array}[]{llllllllll}\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&\coloneqq&\left|\!\left|\mathbb{P}_{\bm{\theta}}(\,\cdot\mid\bm{w}_{-v})-\mathbb{P}_{\bm{\theta}}(\,\cdot\mid\bm{w}_{-v}^{\prime})\right|\!\right|_{\text{TV}}.\end{array}

(D.31)

Lemma 10 provides the following upper bound:

\begin{array}[]{llllllllll}\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&\leq&\dfrac{1}{1+\exp(-C\,D^{2}\,\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty})},\end{array}

(D.32)

where $C$ corresponds to the positive constant from Condition 4 and $D$ is defined in (D.4). Combining (D.32) with Condition D.3 shows that

\begin{array}[]{llllllllll}\dfrac{1}{1+\exp(-C\,D^{2}\,\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty})}&\leq&\dfrac{1}{1+\exp(-E-\vartheta\log N)}&\eqqcolon&U_{N}.\end{array}

(D.33)

When $\vartheta=0$ , the quantity $U_{N}$ does not depend on $N$ and coincides with the constant $U$ considered in Condition D.1.
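
For orientation, $U_{N}$ in (D.33) can be rewritten using only the identity $\exp(-\vartheta\log N)=N^{-\vartheta}$ . The following lines are a rearrangement of (D.33) added for the reader, not an additional result:

\begin{array}[]{llllllllll}U_{N}&=&\dfrac{1}{1+\exp(-E)\,N^{-\vartheta}},&&1-U_{N}&=&\dfrac{\exp(-E)\,N^{-\vartheta}}{1+\exp(-E)\,N^{-\vartheta}}&=&\dfrac{1}{1+\exp(E)\,N^{\vartheta}}.\end{array}

Written this way, $1-U_{N}$ is free of $N$ when $\vartheta=0$ and is of order $N^{-\vartheta}$ when $\vartheta>0$ .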

With (D.33), we define the vector $\bm{\xi}\in[0,1]^{M}$ with elements

\begin{array}[]{llllllllll}\xi_{v}&\coloneqq&\begin{cases}0&\text{if $v\in\{1,\ldots,i-1\}$}\\ 1&\text{if $v=i$}\\ U_{N}&\text{if $v\in\{i+1,\ldots,M\}$},\end{cases}\end{array}

and obtain

\begin{array}[]{llllllllll}\mathbb{B}_{\bm{\pi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G})&\leq&\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}),\end{array}

(D.34)

because $\pi_{v}\leq\xi_{v}$ for all $v=1,\ldots,M$ .

Next, we construct the set

\begin{array}[]{llllllllll}\mbox{$\mathscr{M}$}_{a,b}&\coloneqq&\{\{c,d\}:\;c\,\in\,\mathscr{N}_{a}\,\cup\,\mathscr{N}_{b},\;d\,\in\,\mathscr{N}_{a}\,\cup\,\mathscr{N}_{b}\setminus\{c\}\}\;\cup\;\{\{c\}:\;c\,\in\,\mathscr{N}_{a}\,\cup\,\mathscr{N}_{b}\}\end{array}

and two additional graphs with the same set of vertices as $\mathscr{G}$ :

1.

$\mathscr{G}_{1}\coloneqq(\mathscr{V},\mathscr{E}_{1})$ :

•

Vertex $W\in\mathscr{V}_{\mathscr{Z}}$ relating to connection $Z_{i,j}$ has edges to vertices that relate to all connections $Z_{h,k}$ and responses $Y_{h}$ with $\{h,k\},\{h\}\in\mbox{$\mathscr{M}$}_{i,j}$ .

•

Vertex $W\in\mathscr{V}_{\mathscr{Y}}$ relating to attribute $Y_{i}$ has edges to vertices that relate to all connections $Z_{h,k}$ and responses $Y_{h}$ with $\{h,k\},\{h\}\in\mbox{$\mathscr{M}$}_{i,N+1}$ for a fictional unit $N+1$ with $\mathscr{N}_{N+1}=\emptyset$ .

2.

$\mathscr{G}_{2}\coloneqq(\mathscr{V},\,\mathscr{E}_{1}\cup\mathscr{E}_{2})$ : The set $\mathscr{E}_{2}$ includes edges of all vertices $W_{i}\in\mathscr{V}$ with $i\in\{1,\ldots,M\}$ to vertices in $\mathscr{S}_{\mathscr{G}_{1},i,2}$ .

The graph $\mathscr{G}_{2}$ is a covering of $\mathscr{G}$ , so

\begin{array}[]{llllllllll}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G})&\leq&\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2}).\end{array}

(D.35)

Combining the previous results gives

\begin{array}[]{llllllllll}{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&1+\underset{1\leq i\leq M}{\max}\displaystyle\sum\limits_{j=1:\,j\neq i}^{M}\mathbb{Q}_{\bm{\theta}^{\star},i,\bm{w}_{1:(i-1)}}(W_{j}^{\star}\neq W_{j}^{\star\star})\\ &\leq&1+\max\limits_{1\leq i\leq M}\displaystyle\sum\limits_{j=1:\,j\neq i}^{M}\mathbb{B}_{\bm{\pi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G})\\ &\leq&1+\max\limits_{1\leq i\leq M}\displaystyle\sum\limits_{j=1:\,j\neq i}^{M}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G})\\ &\leq&1+\max\limits_{1\leq i\leq M}\displaystyle\sum\limits_{j=1:\,j\neq i}^{M}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2}),\end{array}

(D.36)

using (D.29), (D.30), (D.34), and (D.35). Sorting the vertices other than $W_{i}$ by their geodesic distance to $W_{i}$ (i.e., by the length of the shortest path to $W_{i}$ ), we obtain

\begin{array}[]{llllllllll}\displaystyle\sum\limits_{j=1:\,j\neq i}^{M}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2})&\leq&|\mathscr{S}_{\mathscr{G}_{2},i,1}|\left(\max\limits_{W_{j}\in\mathscr{S}_{\mathscr{G}_{2},i,1}}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2})\right)\\ &+&\displaystyle\sum\limits_{k=2}^{\infty}|\mathscr{S}_{\mathscr{G}_{2},i,k}|\,\left(\max\limits_{W_{j}\in\mathscr{S}_{\mathscr{G}_{2},i,k}}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2})\right)\\ &\leq&|\mathscr{S}_{\mathscr{G}_{2},i,1}|\\ &+&\displaystyle\sum\limits_{k=2}^{\infty}|\mathscr{S}_{\mathscr{G}_{2},i,k}|\,\max\limits_{W_{j}\in\mathscr{S}_{\mathscr{G}_{2},i,k}}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2}).\end{array}

(D.37)

For the event $W_{i}\centernot{\longleftrightarrow}W_{j}$ in $\mathscr{G}_{2}$ with $W_{j}\in\mathscr{S}_{\mathscr{G}_{2},i,k}$ and $k\,\geq\,2$ to occur, there must exist at least one vertex in each set $\mathscr{S}_{\mathscr{G}_{2},i,1},\ldots,\mathscr{S}_{\mathscr{G}_{2},i,k-1}$ at which the coupling disagrees. Therefore, we next derive bounds on $|\mathscr{S}_{\mathscr{G}_{2},i,k}|$ to obtain an upper bound on $\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2})$ . By Lemma 6, Condition D.1 implies that for $i\in\{1,\ldots,M\}$ and $k\in\{2,3,\ldots\}$

\begin{array}[]{llllllllll}|\mathscr{S}_{\mathscr{G}_{2},i,k}|&\leq&K_{1}+K_{2}\,\log k\end{array}

(D.38)

and $|\mathscr{S}_{\mathscr{G}_{2},i,1}|\,\leq \,K_{3}$ , with constants $K_{1}\,\geq\,0,\,K_{2}\,\geq\,0$ , and $K_{3}\,>\,0$ being functions of the constants $\omega_{1}\,\geq\,0$ and $\omega_{2}\,\geq\,0$ defined in Condition D.1 and the constant $D\in\{2,3,\ldots\}$ defined in (D.4). The probability of event $W_{i}\centernot{\longleftrightarrow}W_{j}$ in $\mathscr{G}_{2}$ can then be bounded as follows:

\begin{array}[]{llllllllll}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2})&\leq&U_{N}\,(1-(1-U_{N})^{K_{3}})\,\displaystyle\prod\limits_{l=2}^{k-1}\left[1-(1-U_{N})^{K_{1}+K_{2}\,\log l}\right]\\ &\leq&\displaystyle\prod\limits_{l=2}^{k-1}\left[1-(1-U_{N})^{K_{1}+K_{2}\,\log l}\right]\\ &\leq&\left[1-(1-U_{N})^{K_{1}+K_{2}\,\log(k-1)}\right]^{k-2},\end{array}

where the second inequality follows from

\begin{array}[]{llllllllll}U_{N}\,(1-(1-U_{N})^{K_{3}})&\leq&1,\end{array}

because $U_{N}\in[0,1]$ and $K_{3}>0$ . Defining $K_{N}\coloneqq\exp(-K_{1}\,|\log(1-U_{N})|)$ , we obtain for $W_{j}\in\mathscr{S}_{\mathscr{G}_{2},i,k}$

\begin{array}[]{llllllllll}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2})&\leq&\left[1-(1-U_{N})^{K_{1}+K_{2}\,\log(k-1)}\right]^{k-2}\\ &\leq&\exp(-K_{N}\,(k-1)^{1-K_{2}\,|\log(1-U_{N})|})\end{array}

(D.39)

with the inequality $1-a\,\leq\,\exp(-a)$ for all $a\in(0,1)$ .
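
The role of $K_{N}$ in (D.39) can be made explicit. The following rearrangement is a sketch added for the reader and assumes $U_{N}\in(0,1)$ , so that $\log(1-U_{N})<0$ . It uses only $(1-U_{N})^{t}=\exp(-t\,|\log(1-U_{N})|)$ for $t\geq 0$ :

\begin{array}[]{llllllllll}(1-U_{N})^{K_{1}+K_{2}\,\log(k-1)}&=&\exp(-K_{1}\,|\log(1-U_{N})|)\;\exp(-K_{2}\,|\log(1-U_{N})|\,\log(k-1))\vskip 7.11317pt\\ &=&K_{N}\,(k-1)^{-K_{2}\,|\log(1-U_{N})|}.\end{array}

The inequality $1-a\,\leq\,\exp(-a)$ is then applied with $a$ equal to this quantity.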

Plugging (D.38) and (D.39) in (D.37), we obtain:

\begin{array}[]{llllllllll}\displaystyle\sum\limits_{j=1:\,j\neq i}^{M}&\mathbb{B}_{\bm{\xi}}&(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2})\\ &\leq&K_{3}+\displaystyle\sum\limits_{k=2}^{\infty}(K_{1}+K_{2}\,\log k)\\ &\times&\exp\left(-K_{N}\,(k-1)^{1-K_{2}\,|\log(1-U_{N})|}\right)\vskip 7.11317pt\\ &=&K_{3}+K_{1}\,\displaystyle\sum\limits_{k=2}^{\infty}\exp\left(-K_{N}\,(k-1)^{1-K_{2}\,|\log(1-U_{N})|}\right)\\ &+&K_{2}\,\displaystyle\sum\limits_{k=2}^{\infty}\log k\,\exp\left(-K_{N}\,(k-1)^{1-K_{2}\,|\log(1-U_{N})|}\right),\\ \end{array}

(D.40)

resulting in two series that we bound one by one. With $\lceil\cdot\rceil$ denoting the ceiling function and $u_{N}\coloneqq\lceil 2/(1-K_{2}\,|\log(1-U_{N})|)\rceil$ , the first series can be bounded as follows:

\begin{array}[]{llllllllll}&&\displaystyle\sum\limits_{k=2}^{\infty}\exp\left(-K_{N}\,(k-1)^{1-K_{2}\,|\log(1-U_{N})|}\right)\vskip 7.11317pt\\ &=&\displaystyle\sum\limits_{k=1}^{\infty}\exp\left(-K_{N}\,k^{1-K_{2}\,|\log(1-U_{N})|}\right)\vskip 7.11317pt\\ &\leq&\dfrac{u_{N}!}{{(K_{N})}^{u_{N}}}\displaystyle\sum\limits_{k=1}^{\infty}\dfrac{1}{k^{2}}\;\;=\;\;\dfrac{u_{N}!\,\pi^{2}}{{(K_{N})}^{u_{N}}\,6}.\end{array}

The above bound is based on a Taylor expansion of $\exp(z)$ , which establishes the inequality $\exp(z)>z^{u}\,/\,u!$ for any $z>0$ and any $u\in\{1,2,\dots\}$ . This, in turn, implies the inequality $\exp(-z)<u!\,/\,z^{u}$ for any $z>0$ and any $u\in\{1,2,\dots\}$ . With $v_{N}\coloneqq\lceil 3/(1-K_{2}\,|\log(1-U_{N})|)\rceil$ , we apply the same inequality to the second series:

\begin{array}[]{llllllllll}&&\displaystyle\sum\limits_{k=2}^{\infty}\log(k)\,\exp\left(-{K_{N}}\,(k-1)^{1-K_{2}\,|\log(1-U_{N})|}\right)\vskip 7.11317pt\\ &=&\displaystyle\sum\limits_{k=1}^{\infty}\log(k+1)\,\exp\left(-{K_{N}}\,k^{1-K_{2}\,|\log(1-U_{N})|}\right)\vskip 7.11317pt\\ &\leq&\dfrac{v_{N}!}{{(K_{N})}^{v_{N}}}\displaystyle\sum\limits_{k=1}^{\infty}\dfrac{\log(k+1)}{k^{3}}\vskip 7.11317pt\\ &\leq&\dfrac{v_{N}!}{{(K_{N})}^{v_{N}}}\displaystyle\sum\limits_{k=1}^{\infty}\dfrac{k}{k^{3}}~=~\dfrac{v_{N}!}{{(K_{N})}^{v_{N}}}\displaystyle\sum\limits_{k=1}^{\infty}\dfrac{1}{k^{2}}~=~\dfrac{v_{N}!\,\pi^{2}}{{(K_{N})}^{v_{N}}\,6}.\end{array}
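
The termwise step behind both series bounds can be spelled out as follows. This is a sketch under the assumption $1-K_{2}\,|\log(1-U_{N})|>0$ , which is needed for $u_{N}$ and $v_{N}$ to be well-defined positive integers. The shorthand $z_{k}$ is introduced here only for illustration. With $z_{k}\coloneqq K_{N}\,k^{1-K_{2}\,|\log(1-U_{N})|}$ and $k\geq 1$ ,

\begin{array}[]{llllllllll}\exp(-z_{k})&<&\dfrac{u_{N}!}{z_{k}^{u_{N}}}&=&\dfrac{u_{N}!}{{(K_{N})}^{u_{N}}\,k^{u_{N}\,(1-K_{2}\,|\log(1-U_{N})|)}}&\leq&\dfrac{u_{N}!}{{(K_{N})}^{u_{N}}\,k^{2}},\end{array}

where the last inequality holds because $u_{N}\,(1-K_{2}\,|\log(1-U_{N})|)\geq 2$ by the definition of $u_{N}$ . Replacing $u_{N}$ by $v_{N}$ gives the exponent $3$ instead of $2$ .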

Plugging these results into (D.40) gives

\begin{array}[]{llllllllll}\displaystyle\sum\limits_{j=1:\,j\neq i}^{M}\mathbb{B}_{\bm{\xi}}(W_{i}\centernot{\longleftrightarrow}W_{j}\text{ in }\mathscr{G}_{2})&\leq&K_{3}+\dfrac{\pi^{2}}{6}\left(K_{1}\,\dfrac{u_{N}!}{{(K_{N})}^{u_{N}}}+K_{2}\,\dfrac{v_{N}!}{{(K_{N})}^{v_{N}}}\right).\\ \end{array}

(D.41)

Finally, combining (D.41) with (D.36) yields

\begin{array}[]{llllllllll}{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&1+K_{3}+\dfrac{\pi^{2}}{6}\left(K_{1}\,\dfrac{u_{N}!}{{(K_{N})}^{u_{N}}}+K_{2}\,\dfrac{v_{N}!}{{(K_{N})}^{v_{N}}}\right).\end{array}

Under Condition 4, $\vartheta=0$ holds, hence $U_{N}$ in (D.33), $K_{N}$ , $u_{N}$ , and $v_{N}$ reduce to

\begin{array}[]{llllllllll}U_{N}&=&\dfrac{1}{1+\exp(-E)}&\eqqcolon&U\vskip 7.11317pt\\ K_{N}&=&\exp(-K_{1}\,|\log(1-U)|)&\eqqcolon&K_{4}\vskip 7.11317pt\\ u_{N}&=&\left\lceil\dfrac{2}{1-K_{2}\,|\log(1-U)|}\right\rceil&\eqqcolon&u\vskip 7.11317pt\\ v_{N}&=&\left\lceil\dfrac{3}{1-K_{2}\,|\log(1-U)|}\right\rceil&\eqqcolon&v,\end{array}

which are constants independent of $\vartheta$ and $N$ . The constant $U$ corresponds to the constant $U$ from Condition D.1. This translates to

\begin{array}[]{llllllllll}{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&C_{3},\end{array}

with $C_{3}\coloneqq 1+K_{3}+(\pi^{2}/\,6)\,(K_{1}\,u!/K_{4}^{u}+K_{2}\,v!/K_{4}^{v})\,\geq\,1$ . For non-overlapping subpopulations, we have $K_{1}=K_{2}=0$ and

\begin{array}[]{llllllllll}{|\!|\!|\mathscr{D}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&1+K_{3}&=&C_{3}.\end{array}

D.7 Auxiliary Results

Lemma 6. Consider the model of Corollary 4. Condition D.1 implies that there exist constants $K_{1}\geq 0$ , $K_{2}\geq 0$ , $K_{3}>0$ such that, for all $k\in\{2,3,\ldots\}$ and all $i\in\{1,\ldots,M\}$ ,

\begin{array}[]{llllllllll}|\mathscr{S}_{\mathscr{G}_{2},i,k}|&\leq&K_{1}+K_{2}\,\log k\vskip 7.11317pt\\ |\mathscr{S}_{\mathscr{G}_{2},i,1}|&\leq&K_{3}.\end{array}

Proof of Lemma 6. With the set

\begin{array}[]{llllllllll}\mbox{$\mathscr{M}$}_{a,b}&\coloneqq&\{\{c,d\}:\;c\,\in\,\mathscr{N}_{a}\,\cup\,\mathscr{N}_{b},\;d\,\in\,\mathscr{N}_{a}\,\cup\,\mathscr{N}_{b}\setminus\{c\}\}\;\cup\;\{\{c\}:\;c\,\in\,\mathscr{N}_{a}\,\cup\,\mathscr{N}_{b}\},\end{array}

we constructed from $\mathscr{G}$ two additional graphs $\mathscr{G}_{1}$ and $\mathscr{G}_{2}$ as follows:

1.

$\mathscr{G}_{1}\coloneqq(\mathscr{V},\mathscr{E}_{1})$ :

•

Vertex $W\in\mathscr{V}_{\mathscr{Z}}$ relating to connection $Z_{i,j}$ has edges to vertices that relate to all connections $Z_{h,k}$ and responses $Y_{h}$ with $\{h,k\},\{h\}\in\mbox{$\mathscr{M}$}_{i,j}$ .

•

Vertex $W\in\mathscr{V}_{\mathscr{Y}}$ relating to attribute $Y_{i}$ has edges to vertices that relate to all connections $Z_{h,k}$ and responses $Y_{h}$ with $\{h,k\},\{h\}\in\mbox{$\mathscr{M}$}_{i,N+1}$ for a fictional unit $N+1$ with $\mathscr{N}_{N+1}=\emptyset$ .

2.

$\mathscr{G}_{2}\coloneqq(\mathscr{V},\,\mathscr{E}_{1}\,\cup\,\mathscr{E}_{2})$ : The set $\mathscr{E}_{2}$ includes edges of all vertices $W_{i}\in\mathscr{V}$ with $i\in\{1,\ldots,M\}$ to vertices in $\mathscr{S}_{\mathscr{G}_{1},i,2}$ .

The graph $\mathscr{G}_{1}$ is equivalent to the graph cover $\mathscr{G}^{\star}$ defined in Lemma 16 of S25. Therefore, we are able to use results from the proof of Lemma 16 in S25 demonstrating that Condition D.1 implies the following bound for $\mathscr{S}_{\mathscr{G}_{1},i,k}$ :

\begin{array}[]{llllllllll}|\mathscr{S}_{\mathscr{G}_{1},i,k}|&\leq&(\omega_{1}+1)(2\,D^{3}\,\omega_{1}+\omega_{2}\,\log(k-1)),&&k\in\{2,3,\ldots\},\end{array}

where $D$ corresponds to the constant defined in (D.4) and the constants $\omega_{1}\,\geq\,0$ and $0\,\leq\,\omega_{2}\,\leq\,\min\{\omega_{1},\,1/((\omega_{1}+1)\,|\log(1-U)|)\}$ with $U\coloneqq(1+\exp(-A))^{-1}>0$ correspond to the constants from Condition D.1. Defining $K_{5}\coloneqq 2\,\omega_{1}\,(\omega_{1}+1)\,D^{3}\,\geq\,0$ and $K_{6}\coloneqq\omega_{2}\,(\omega_{1}+1)\,\geq\,0$ , this bound becomes:

\begin{array}[]{llllllllll}|\mathscr{S}_{\mathscr{G}_{1},i,k}|&\leq&K_{5}+K_{6}\,\log(k-1),&&k\in\{2,3,\ldots\}.\end{array}

The bound $|\mathscr{S}_{\mathscr{G}_{1},i,1}|\leq 4\,D^{2}+D$ differs from the result in S25 because, under our definition of $\mbox{$\mathscr{M}$}_{i,j}$ , there are additional $|\mathscr{N}_{i}\cup\mathscr{N}_{j}|\,\leq\,D$ responses in $\mbox{$\mathscr{M}$}_{i,j}$ .

Adding the edges $\mathscr{E}_{2}$ , defined as the edges between vertices at geodesic distance two in $\mathscr{G}_{1}$ , to $\mathscr{G}_{1}$ yields $\mathscr{G}_{2}$ and reduces the geodesic distance between any two vertices from $k\in\{1,2,\ldots\}$ in $\mathscr{G}_{1}$ to $\lceil k/2\rceil$ in $\mathscr{G}_{2}$ . Therefore, $|\mathscr{S}_{\mathscr{G}_{2},i,k}|=|\mathscr{S}_{\mathscr{G}_{1},i,2\,k}|+|\mathscr{S}_{\mathscr{G}_{1},i,2\,k-1}|$ holds for $k\in\{1,2,\ldots\}$ and $i\in\{1,\ldots,M\}$ . This allows us to relate the bounds for $|\mathscr{S}_{\mathscr{G}_{1},i,k}|$ to bounds for $|\mathscr{S}_{\mathscr{G}_{2},i,k}|$ with $k=2,3,\ldots$ and $i\in\{1,\ldots,M\}$ :

\begin{array}[]{llllllllll}|\mathscr{S}_{\mathscr{G}_{2},i,k}|&=&|\mathscr{S}_{\mathscr{G}_{1},i,2\,k}|+|\mathscr{S}_{\mathscr{G}_{1},i,2\,k-1}|\\ &\leq&2\,K_{5}+K_{6}\,(\log(2\,k)+\log(2\,k-1))\\ &\leq&2\,K_{5}+2\,K_{6}\,\log(2\,k)\\ &=&K_{1}+K_{2}\,\log k\end{array}

and

\begin{array}[]{llllllllll}|\mathscr{S}_{\mathscr{G}_{2},i,1}|&\leq&4\,D^{2}+D+K_{1}\eqqcolon K_{3},\end{array}

with $K_{1}\coloneqq 2\,K_{5}+2\,K_{6}\log 2$ and $K_{2}\coloneqq 2\,K_{6}$ . This proves the statement with $K_{1}\geq 0,K_{2}\geq 0,$ and $K_{3}>0$ .
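
For reference, the constants can be unpacked in terms of $\omega_{1}$ , $\omega_{2}$ , and $D$ by substituting the definitions of $K_{5}$ and $K_{6}$ and using $\log(2\,k)=\log 2+\log k$ . This is a direct substitution added for the reader:

\begin{array}[]{llllllllll}K_{1}&=&2\,K_{5}+2\,K_{6}\,\log 2&=&4\,\omega_{1}\,(\omega_{1}+1)\,D^{3}+2\,\omega_{2}\,(\omega_{1}+1)\,\log 2\vskip 7.11317pt\\ K_{2}&=&2\,K_{6}&=&2\,\omega_{2}\,(\omega_{1}+1)\vskip 7.11317pt\\ K_{3}&=&4\,D^{2}+D+K_{1}.\end{array}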

Lemma 7. Consider the model of Corollary 4. Then, for any pair of units $\{i,j\}\subset\mathscr{P}_{N}$ such that $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset$ ,

\begin{array}[]{llllllllll}\dfrac{1}{1+\exp\left(C\,D^{2}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)}~\leq~\mathbb{P}_{\bm{\theta}}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})~\leq~\dfrac{1}{1+\exp\left(-C\,D^{2}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)}.\end{array}

Proof of Lemma 7. For all $\{i,j\}\subset\mathscr{P}_{N}$ such that $\mathscr{N}_{i}\cap\mathscr{N}_{j}\neq\emptyset$ , the conditional probability of $Z_{i,j}$ given $(\bm{X},\bm{Y},\bm{Z}_{-\{i,j\}})=(\bm{x},\bm{y},\bm{z}_{-\{i,j\}})$ is

\begin{array}[]{llllllllll}&\mathbb{P}_{\bm{\theta}}(Z_{i,j}=z_{i,j}\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})\vskip 7.11317pt\\ =&\dfrac{\exp\left(\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},z_{i,j})\right)}{\exp\left(\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},1)\right)+\exp\left(\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},0)\right)}\vskip 7.11317pt\\ =&\dfrac{1}{1+g(1-z_{i,j};\bm{z}_{-\{i,j\}},z_{i,j},\bm{\theta})},\end{array}

with

\begin{array}[]{llllllllll}g(z;\,\bm{z}_{-\{i,j\}},z_{i,j},\bm{\theta})&=&\exp\left(\bm{\theta}^{\top}\left(\bm{b}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},z)-\bm{b}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},z_{i,j})\right)\right).\end{array}
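
To see the logistic form concretely, it may help to write out the case $z_{i,j}=1$ . The shorthand $\Delta_{i,j}$ below is introduced only for this illustration and does not appear elsewhere. With $\Delta_{i,j}\coloneqq\bm{\theta}^{\top}\left(\bm{b}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},1)-\bm{b}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},0)\right)$ , we have $g(0;\,\bm{z}_{-\{i,j\}},1,\bm{\theta})=\exp(-\Delta_{i,j})$ and $g(1;\,\bm{z}_{-\{i,j\}},0,\bm{\theta})=\exp(\Delta_{i,j})$ , so that

\begin{array}[]{llllllllll}\mathbb{P}_{\bm{\theta}}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})&=&\dfrac{1}{1+\exp(-\Delta_{i,j})}\vskip 7.11317pt\\ \mathbb{P}_{\bm{\theta}}(Z_{i,j}=0\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})&=&\dfrac{1}{1+\exp(\Delta_{i,j})},\end{array}

and the two probabilities sum to $1$ , as they should.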

Note that

\begin{array}[]{llllllllll}\underset{z_{-\{i,j\}}\in\mathscr{Z}_{-\{i,j\}}}{\max}|b_{a}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},0)-b_{a}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},1)|\vskip 7.11317pt\\ \leq\begin{cases}0&\text{if $a\in\{1,\ldots,N\}\setminus\{i,j\}$}\\ 1&\text{if $a\in\{i,j\}$}\\ 1+D&\text{if $a=N+1$}\\ 2\,C&\text{if $a=N+2$}\\ \end{cases},\end{array}

where $\mathscr{Z}_{-\{i,j\}}\coloneqq\bigtimes_{(k,h)\neq(i,j)}^{N}\,\mathscr{Z}_{k,h}$ is the domain of $\bm{Z}$ excluding $Z_{i,j}$ , $C$ corresponds to the constant from Condition 4, and $D$ matches the constant defined in (D.4). The bounds for $a=1,\ldots,N$ follow from the observation that the degree statistic of unit $a$ can only be affected by connections $z_{i,j}$ with $a\in\{i,j\}$ and, when this is the case, changes by at most 1. For $a=N+1$ , the bound follows from Lemma 18 of S25. For $a=N+2$ , the sufficient statistic counts the number of connections with overlapping neighborhoods and either $Y_{i}\,x_{j}>0$ or $Y_{j}\,x_{i}>0$ . For $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset$ , the maximal change in the statistic is $2\,C$ since $y_{i}\in\{0,1\}$ and $x_{i}\leq C$ for $i\in\mathscr{P}_{N}$ ; otherwise the maximal change is 0.

Upon applying the triangle inequality,

\begin{array}[]{llllllllll}|\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},z)-\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}},z_{i,j})|~\leq~(2+2\,C+D)\;\left|\!\left|\bm{\theta}\right|\!\right|_{\infty},\end{array}

we obtain for $\mathscr{N}_{i}\cap\mathscr{N}_{j}\neq\emptyset$

\begin{array}[]{llllllllll}\exp\left(-(2+2\,C+D)\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)~\leq~g(1-z_{i,j};\bm{z}_{-\{i,j\}},z_{i,j},\bm{\theta})~\leq~\exp\left((2+2\,C+D)\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right).\end{array}

Upon collecting terms, we obtain the final result:

\begin{array}[]{llllllllll}\dfrac{1}{1+\exp\left(C_{6}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)}&\leq&\mathbb{P}_{\bm{\theta}}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})&\leq&\dfrac{1}{1+\exp\left(-C_{6}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)}\vskip 7.11317pt\\ \dfrac{1}{1+\exp\left(C\,D^{2}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)}&\leq&\mathbb{P}_{\bm{\theta}}(Z_{i,j}=1\mid\bm{x},\,\bm{y},\,\bm{z}_{-\{i,j\}})&\leq&\dfrac{1}{1+\exp\left(-C\,D^{2}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)}\end{array}

where $D\in\{2,3,\ldots\}$ and $C_{6}\coloneqq 2+2\,C+D>0$ are constants.
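
The passage from the bounds on $g$ to the bounds on the conditional probability uses only the fact that $t\mapsto 1/(1+t)$ is decreasing on $(0,\infty)$ . As a generic sketch, where the symbols $g$ and $c\geq 0$ are placeholders for the quantities above:

\begin{array}[]{llllllllll}\exp(-c)\;\leq\;g\;\leq\;\exp(c)&\Longrightarrow&\dfrac{1}{1+\exp(c)}\;\leq\;\dfrac{1}{1+g}\;\leq\;\dfrac{1}{1+\exp(-c)}.\end{array}

Taking $c=C_{6}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}$ gives the first line of the final display.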

Lemma 8. Consider the model of Corollary 4. Then, for any $i\in\{1,\ldots,N\}$ ,

\begin{array}[]{llllllllll}\dfrac{1}{1+\exp\left(C\,D^{2}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)}~\leq~\mathbb{P}_{\bm{\theta}}(Y_{i}=1\mid\bm{x},\,\bm{y}_{-i},\,\bm{z})~\leq~\dfrac{1}{1+\exp\left(-C\,D^{2}\,\left|\!\left|\bm{\theta}\right|\!\right|_{\infty}\right)}.\end{array}

Proof of Lemma 8. The conditional probability of $Y_{i}$ given $(\bm{X},\,\bm{Y}_{-i},\bm{Z})=(\bm{x},\,\bm{y}_{-i},\,\bm{z})$ is

\begin{array}[]{llllllllll}\mathbb{P}_{\bm{\theta}}(Y_{i}=y_{i}\mid\bm{x},\,\bm{y}_{-i},\,\bm{z})&=&\dfrac{\exp\left(\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y}_{-i},\,y_{i},\bm{z})\right)}{\exp\left(\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y}_{-i},0,\bm{z})\right)+\exp\left(\bm{\theta}^{\top}\,\bm{b}(\bm{x},\,\bm{y}_{-i},1,\bm{z})\right)}\vskip 7.11317pt\\ &=&\dfrac{1}{g(0;\bm{y}_{-i},\,y_{i},\bm{\theta})+g(1;\bm{y}_{-i},\,y_{i},\bm{\theta})},\end{array}

where

\begin{array}[]{llllllllll}g(y;\,\bm{y}_{-i},\,y_{i},\,\bm{\theta})&=&\exp\left(\bm{\theta}^{\top}\left(\bm{b}(\bm{x},\,\bm{y}_{-i},\,y,\,\bm{z})-\bm{b}(\bm{x},\,\bm{y}_{-i},\,y_{i},\,\bm{z})\right)\right).\end{array}

Note that

\begin{array}[]{llllllllll}\underset{y_{-i}\in\mathscr{Y}_{-i}}{\max}\;|b_{a}(\bm{x},\,\bm{y}_{-i},\,0,\bm{z})-b_{a}(\bm{x},\,\bm{y}_{-i},1,\bm{z})|\;\leq\;\begin{cases}0&\text{if $a\in\{1,\ldots,N+1\}$}\\ C\,D^{2}&\text{if $a=N+2$,}\\ \end{cases}\end{array}

where $\mathscr{Y}_{-i}\coloneqq\bigtimes_{j\neq i}^{N}\,\mathscr{Y}_{j}$ is the domain of $\bm{Y}$ without $Y_{i}$ , $C$ corresponds to the constant from Condition 4, and $D$ matches the constant defined in (D.4). The bounds for $a=1,\ldots,N+1$ are 0 as the corresponding statistics are not affected by changes in $\bm{y}$ . For $a=N+2$ , the maximal change is bounded by the number of units $j$ such that $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\neq\emptyset$ , which is at most $D^{2}$ , times the maximal value $C$ of the predictors. The remainder of the proof of Lemma 8 resembles the proof of Lemma 7.

Lemma 9. Consider the model of Corollary 4. If Conditions 4, D.1, and D.3 are satisfied with $\vartheta\in[0,1/18)$ , we obtain the following bounds for all elements of $\bm{B}(\bm{\theta},\,\bm{w})$ , being the covariance matrix of the sufficient statistics $b_{N+1}(\bm{x},\,\bm{y},\,\bm{z})$ and $b_{N+2}(\bm{x},\,\bm{y},\,\bm{z})$ defined in Section D.1, for all $\bm{\theta}\in\bm{\Theta}$ and all $\bm{w}\in\mathscr{W}$ :

\begin{array}[]{llllllllll}B_{1,1}(\bm{\theta},\,\bm{w})&\leq&\,\dfrac{ND^{5}}{4}\vskip 7.11317pt\\ |B_{1,2}(\bm{\theta},\,\bm{w})|&\leq&\,\dfrac{N\,C^{2}\,D^{5}}{4}\vskip 7.11317pt\\ B_{2,2}(\bm{\theta},\,\bm{w})&\leq&\ \dfrac{N\,C^{2}\,D^{5}}{4}.\end{array}

Proof of Lemma 9. We first bound $B_{1,1}(\bm{\theta},\,\bm{w})$ from above as follows:

\begin{array}[]{llllllllll}B_{1,1}(\bm{\theta},\,\bm{w})&=&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(s_{N+1}(\bm{x},\,\bm{y},\,\bm{Z})\right)\vskip 7.11317pt\\ &=&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(\displaystyle\sum\limits_{a=1}^{N}\displaystyle\sum\limits_{b=a+1}^{N}d_{a,b}(\bm{Z})\,Z_{a,b}\right)\vskip 7.11317pt\\ &=&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}c_{i,j}\,\mathbb{V}_{\mathscr{Z},i,j}\,\left(\displaystyle\sum\limits_{a=1}^{N}\displaystyle\sum\limits_{b=a+1}^{N}Z_{a,b}\,d_{a,b}(\bm{Z})\right)\vskip 7.11317pt\\ &\leq&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}c_{i,j}\,D^{2}\left(\displaystyle\sum\limits_{a=1}^{N}\displaystyle\sum\limits_{b=a+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\,\left(Z_{a,b}\,d_{a,b}(\bm{Z})\right)\right)\end{array}

(D.42)

where $D$ matches the constant defined in (D.4) and the function $d_{a,b}(\bm{Z})$ is defined in (D.2). The factor $c_{i,j}$ in (D.42) can be introduced because $\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,=\,\emptyset$ implies that $d_{i,j}(\bm{Z})\,Z_{i,j}=0$ and that $d_{a,b}(\bm{Z})\,Z_{a,b}$ does not depend on $Z_{i,j}$ for any $\{a,b\}\neq\{i,j\}$ . For the inequality in the last line of (D.42), we use the fact that the number of pairs $(a,b)$ for which $d_{a,b}(\bm{Z})\,Z_{a,b}$ is a function of $Z_{i,j}$ is bounded above by $D$ (see proof of Lemma 19 in S25). Invoking Lemma 15 of S25 together with applying

\begin{array}[]{llllllllll}\displaystyle\sum\limits_{a=1}^{N}\displaystyle\sum\limits_{b=a+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\,\left(d_{a,b}(\bm{Z})\,Z_{a,b}\right)&\leq&\dfrac{D}{4}\end{array}

gives:

\begin{array}[]{llllllllll}B_{1,1}(\bm{\theta},\,\bm{w})&\leq&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}c_{i,j}\,D^{2}\left(\displaystyle\sum\limits_{a=1}^{N}\displaystyle\sum\limits_{b=a+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\,\left(Z_{a,b}\,d_{a,b}(\bm{Z})\right)\right)\;\;\leq\;\;\dfrac{N\,D^{5}}{4}\end{array}

We proceed with bounding $B_{2,2}(\bm{\theta},\,\bm{w})$ :

\begin{array}[]{llllllllll}B_{2,2}(\bm{\theta},\,\bm{w})&=&\displaystyle\sum\limits_{i=1}^{N}\mathbb{V}_{\mathscr{Y},i}\left(s_{N+2}(\bm{x},\bm{Y},\,\bm{z})\right)+\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(s_{N+2}(\bm{x},\,\bm{y},\,\bm{Z})\right)\\ &=&\displaystyle\sum\limits_{i=1}^{N}\mathbb{V}_{\mathscr{Y},i}\left(\left(\displaystyle\sum\limits_{j=1}^{N}c_{i,j}\,x_{j}\,z_{i,j}\right)\,Y_{i}\right)+\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}\mathbb{V}_{\mathscr{Z},i,j}\left(c_{i,j}\,(x_{i}\,y_{j}+x_{j}\,y_{i})\,Z_{i,j}\right)\\ &=&\displaystyle\sum\limits_{i=1}^{N}\left(\displaystyle\sum\limits_{j=1}^{N}c_{i,j}\,x_{j}\,z_{i,j}\right)^{2}\mathbb{V}_{\mathscr{Y},i}\left(Y_{i}\right)+\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=i+1}^{N}c_{i,j}\,(x_{i}\,y_{j}+x_{j}\,y_{i})^{2}\,\mathbb{V}_{\mathscr{Z},i,j}\left(Z_{i,j}\right)\vskip 7.11317pt\\ &\leq&\,\dfrac{5\,N\,C^{2}\,D^{4}}{4}~\leq~\dfrac{N\,C^{2}\,D^{5}}{4},\end{array}

because $|x_{j}|\,\leq C$ according to Condition 4. For the first inequality, we also use that

\begin{array}[]{llllllllll}\displaystyle\sum\limits_{j=1}^{N}c_{i,j}&\leq&D^{2}\end{array}

by Lemma 15 in S25. We obtain

\begin{array}[]{llllllllll}\max\{B_{1,1}(\bm{\theta},\,\bm{w}),\,B_{2,2}(\bm{\theta},\,\bm{w})\}&\leq&\dfrac{N\,C^{2}\,D^{5}}{4},\end{array}

which provides an upper bound on $|B_{1,2}(\bm{\theta},\,\bm{w})|$ by the Cauchy-Schwarz inequality:

\begin{array}[]{llllllllll}|B_{1,2}(\bm{\theta},\,\bm{w})|&\leq&\sqrt{B_{1,1}(\bm{\theta},\,\bm{w})}\,\sqrt{B_{2,2}(\bm{\theta},\,\bm{w})}\ \leq\,\dfrac{N\,C^{2}\,D^{5}}{4}.\end{array}

Lemma 10. Consider the model of Corollary 4. Define

\begin{array}[]{llllllllll}\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&\coloneqq&\left|\!\left|\mathbb{P}_{\bm{\theta}}(\,\cdot\mid\bm{w}_{-v})-\mathbb{P}_{\bm{\theta}}(\,\cdot\mid\bm{w}_{-v}^{\prime})\right|\!\right|_{\text{TV}}\end{array}

\begin{array}[]{llllllllll}\pi^{\star}&\coloneqq&\underset{1\,\leq\,v\,\leq\,M}{\max}\;\underset{(\bm{w}_{-v},\,\bm{w}_{-v}^{\prime})\,\in\,\mathscr{W}_{-v}\times\mathscr{W}_{-v}}{\max}\;\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}.\end{array}

Let $D\in\{2,3,\ldots\}$ be the maximum degree of vertices $Z_{i,j}$ in $\mathscr{G}$ . Then

\begin{array}[]{llllllllll}\pi^{\star}&\leq&\dfrac{1}{1+\exp(-C\,D^{2}\,\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty})}.\end{array}

Proof of Lemma 10. The proof of Lemma 10 resembles the proof of Lemma 21 in S25, adapted to the bounds on the conditional probabilities derived in Lemmas D.7 and D.7. We distinguish four cases, where $W_{v}$ with $v\in\{1,\ldots,M\}$ relates to:

1. Connection $Z_{i,j}$ of a pair of nodes $\{i,\,j\}\subset\mathscr{P}_{N}$ with $\mathscr{N}_{i}\cap\mathscr{N}_{j}=\emptyset$.

2. Attribute $Y_{i}$ with $i\in\mathscr{P}_{N}$ and $\{j\in\mathscr{P}_{N}:\;\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset\}=\emptyset$.

3. Connection $Z_{i,j}$ of a pair of nodes $\{i,\,j\}\subset\mathscr{P}_{N}$ with $\mathscr{N}_{i}\cap\mathscr{N}_{j}\neq\emptyset$.

4. Attribute $Y_{i}$ with $i\in\mathscr{P}_{N}$ and $\{j\in\mathscr{P}_{N}:\;\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset\}\neq\emptyset$.

In cases 1 and 2, $W_{v}$ is independent of $\bm{W}_{-v}$ , so that $\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}=0$ ; note that case 2 cannot occur, because Condition 4 ensures that there are no units $i\in\mathscr{P}_{N}$ with $\{j\in\mathscr{P}_{N}:\;\mathscr{N}_{i}\,\cap\,\mathscr{N}_{j}\,\neq\,\emptyset\}=\emptyset$ . In cases 3 and 4, $W_{v}$ depends on a non-empty subset of other vertices in $\mathscr{G}$ . Consider any $v\in\{1,\ldots,M\}$ such that $\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}>0$ for some $(\bm{w}_{-v},\bm{w}_{-v}^{\prime})\in\mathscr{W}_{-v}\times\mathscr{W}_{-v}$ and define

\begin{array}[]{llllllllll}a_{0,v}&\coloneqq&\mathbb{P}_{\bm{\theta}}(W_{v}=0\mid\bm{W}_{-v}=\bm{w}_{-v})~\text{ and }~a_{1,v}&\coloneqq&\mathbb{P}_{\bm{\theta}}(W_{v}=1\mid\bm{W}_{-v}=\bm{w}_{-v})\\ b_{0,v}&\coloneqq&\mathbb{P}_{\bm{\theta}}(W_{v}=0\mid\bm{W}_{-v}=\bm{w}_{-v}^{\prime})~\text{ and }~b_{1,v}&\coloneqq&\mathbb{P}_{\bm{\theta}}(W_{v}=1\mid\bm{W}_{-v}=\bm{w}_{-v}^{\prime}).\end{array}

Lemma 21 in S25 shows that

\begin{array}[]{llllllllll}\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&\leq&\min\{\max\{a_{0,v},\,b_{0,v}\},\;\max\{a_{1,v},\,b_{1,v}\}\}.\end{array}
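
The shape of this bound can be understood from a short calculation, sketched here under the assumptions that $W_{v}\in\{0,1\}$ and that the total variation distance between two distributions $\mathbb{P}$ and $\mathbb{Q}$ is defined as $\sup_{A}|\mathbb{P}(A)-\mathbb{Q}(A)|$ (the complete argument is given in Lemma 21 of S25):

\begin{array}[]{llllllllll}\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&=&|a_{1,v}-b_{1,v}|&\leq&\max\{a_{1,v},\,b_{1,v}\}\\ \pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&=&|a_{0,v}-b_{0,v}|&\leq&\max\{a_{0,v},\,b_{0,v}\}.\end{array}

Both lines hold simultaneously because $a_{0,v}+a_{1,v}=b_{0,v}+b_{1,v}=1$, so the smaller of the two right-hand sides bounds $\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}$.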

Plugging in the bounds on the conditional probabilities in Lemmas D.7 and D.7, we obtain

\begin{array}[]{llllllllll}\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&\leq&\dfrac{1}{1+\exp\left(-C\,D^{2}\,\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}\right)},&v\in\mathscr{V}_{Z}\end{array}

and

\begin{array}[]{llllllllll}\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&\leq&\dfrac{1}{1+\exp\left(-C\,D^{2}\,\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty}\right)},&v\in\mathscr{V}_{Y}.\end{array}

Since $D\in\{2,3,\ldots\}$ , we obtain

\begin{array}[]{llllllllll}\pi^{\star}&\coloneqq&\underset{1\,\leq\,v\,\leq\,M}{\max}\;\;\underset{(\bm{w}_{-v},\,\bm{w}_{-v}^{\prime})\,\in\,\mathscr{W}_{-v}\times\mathscr{W}_{-v}}{\max}\;\pi_{v,\bm{w}_{-v},\bm{w}_{-v}^{\prime}}&\leq&\dfrac{1}{1+\exp(-C\,D^{2}\,\left|\!\left|\bm{\theta}^{\star}\right|\!\right|_{\infty})}.\end{array}

Lemma 11. Consider the model of Corollary 4. If Conditions 4 and D.1 are satisfied along with either Condition 4 or Condition D.3 with $\vartheta\in[0,1/18)$ , there exists an integer $N_{0}\in\{3,4,\ldots\}$ such that, for all $N>N_{0}$ ,

\begin{array}[]{llllllllll}\mathbb{P}(\bm{W}\not\in\mathscr{H})&\leq&\dfrac{4}{\max\{N,\,p\}^{2}},\end{array}

where $\mathscr{H}$ is defined in (D.8).

Proof of Lemma 11. We prove Lemma 11 by showing that

\begin{array}[]{llllllllll}\mathbb{P}\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}~<~\dfrac{N}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}}\right)&\leq&\dfrac{2}{\max\{N,\,p\}^{2}}\vskip 7.11317pt\vskip 7.11317pt\\ \mathbb{P}\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{W})\right|\!\right|_{\infty}}~<~\dfrac{c^{2}\,N}{2\,(1+\chi(\bm{\theta}^{\star}))}\right)&\leq&\dfrac{2}{\max\{N,\,p\}^{2}}.\end{array}

(D.43)

To prove the first line of (D.43), we first bound $(1/2)\sum_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}$ from below. We then use Theorem 1 of \citetsupp[][p. 207]{Chetal07} to show that $\sum_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}$ concentrates around its mean. Finally, we show that there exists an integer $N_{0}\in\{3,4,\ldots\}$ such that the obtained lower bound for $(1/2)\sum_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}$ is, with high probability, greater than the deviation of $\sum_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}$ from its mean. The first line of (D.43) follows from combining these steps. The second line of (D.43) can be established along the same lines. A union bound then establishes the desired result:

\begin{array}[]{llllllllll}\mathbb{P}(\bm{W}\not\in\mathscr{H})&\leq&\mathbb{P}\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}~<~\dfrac{N}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}}\right)\vskip 7.11317pt\\ &+&\mathbb{P}\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{W})\right|\!\right|_{\infty}}~<~\dfrac{c^{2}\,N}{2\,(1+\chi(\bm{\theta}^{\star}))}\right)\vskip 7.11317pt\\ &\leq&\dfrac{4}{\max\{N,\,p\}^{2}}.\end{array}

Step 1: Condition 4 implies that, for each unit $i\in\mathscr{P}_{N}$ , there exists a unit $j\in\mathscr{P}_{N}\setminus\{i\}$ such that $\mathscr{N}_{i}\cap\mathscr{N}_{j}\neq\emptyset$ and $x_{j}\in[c,\,C]$ . Thus, by Lemma D.7, Lemma 17 of S25, and Conditions 4 and 4, we obtain

\begin{array}[]{llllllllll}\dfrac{1}{2}\,\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{w})\right|\!\right|_{\infty}}&\geq&\dfrac{1}{2}\,\displaystyle\sum\limits_{i=1}^{N}\mathbb{E}\,d_{i,j}(\bm{Z})&\geq&\dfrac{N}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}}&\geq&\dfrac{N}{4\,\chi(\bm{\theta}^{\star})^{2}}&\geq&\dfrac{N^{1-2\,\vartheta}}{4\,\exp(2\,E)}.\end{array}

Theorem 1 of \citetsupp[][p. 207]{Chetal07} implies

\begin{array}[]{llllllllll}\mathbb{P}\left(\left|\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}-\mathbb{E}\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}\right|<t\right)&\geq&1-2\exp\left(-\dfrac{2\,t^{2}}{\Psi_{N}^{2}\,{|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}^{2}}}\right).\end{array}

Choosing

\begin{array}[]{llllllllll}t&\coloneqq&\sqrt{\log\max\{N,\,p\}}\,\Psi_{N}\,{|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}}\end{array}

gives

\begin{array}[]{llllllllll}&&\mathbb{P}\left(\left|\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}-\mathbb{E}\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}\right|<\sqrt{\log\max\{N,\,p\}}\,\Psi_{N}\,{|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}}\right)\vskip 7.11317pt\\ &\geq&1-\dfrac{2}{\max\{N,\,p\}^{2}}.\end{array}
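
The probability bound follows from substituting the chosen $t$ into the exponent of the concentration inequality; spelled out as a one-line check:

\begin{array}[]{llllllllll}2\exp\left(-\dfrac{2\,t^{2}}{\Psi_{N}^{2}\,{|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}^{2}}}\right)&=&2\exp\left(-2\,\log\max\{N,\,p\}\right)&=&\dfrac{2}{\max\{N,\,p\}^{2}}.\end{array}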

Next, we demonstrate that there exists an integer $N_{1}\in\{3,4,\dots\}$ such that, for all $N>N_{1}$ ,

\begin{array}[]{llllllllll}\sqrt{\log\max\{N,\,p\}}\,\Psi_{N}\,{|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&\dfrac{N^{1-2\,\vartheta}}{4\,\exp(2\,E)}.\end{array}

To do so, we bound the three terms one by one. Using $\max\{N,\,p\}=N+2$ , the first term, $\sqrt{\log\max\{N,\,p\}}$ , is bounded above by $\sqrt{\log\max\{N,\,p\}}\,\leq\,2\,\sqrt{\log N}$ provided $N\geq 2$ . The second term is bounded above by $\Psi_{N}~\leq~D\,\sqrt{N}$ as shown in the proof of Lemma 14 in S25. The third term is bounded above by ${|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}}<C_{3}$ by Lemma D.6, where $C_{3}>0$ is a constant.

Combining these results gives

\begin{array}[]{llllllllll}2\,\sqrt{N\,\log N}\,C_{3}\,D&\leq&\dfrac{N^{1-\vartheta}}{4\,\exp(E)}\vskip 7.11317pt\\ 8\,C_{3}\,D\,\exp(E)&\leq&\sqrt{\dfrac{N^{1-2\,\vartheta}}{\log N}}.\end{array}

Similar to the proof of Lemma 14 in S25, this implies

\begin{array}[]{llllllllll}\mathbb{P}\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,1}(\bm{W})\right|\!\right|_{\infty}}~\geq~\dfrac{N}{2\,(1+\chi(\bm{\theta}^{\star}))^{2}}\right)&\geq&1-\dfrac{2}{\max\{N,\,p\}^{2}}.\end{array}

Step 2: Conditions 4 and 4 along with Lemma D.7 establish

\begin{array}[]{llllllllll}\dfrac{1}{2}\,\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{w})\right|\!\right|_{\infty}}&\geq&\dfrac{c^{2}}{2}\,\displaystyle\sum\limits_{i=1}^{N}\mathbb{E}\,Z_{i,j}&\geq&\dfrac{c^{2}\,N}{2\,(1+\chi(\bm{\theta}^{\star}))}&\geq&\dfrac{c^{2}\,N}{4\,\chi(\bm{\theta}^{\star})}&\geq&\dfrac{c^{2}\,N^{1-\vartheta}}{4\,\exp(E)}.\end{array}

Once more, we invoke Theorem 1 of \citetsupp[][p. 207]{Chetal07} to obtain

\begin{array}[]{llllllllll}&&\mathbb{P}\left(\left|\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{W})\right|\!\right|_{\infty}}-\mathbb{E}\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{W})\right|\!\right|_{\infty}}\right|\,<\,\sqrt{\log\max\{N,\,p\}}\,\Psi_{N}\,{|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}}\right)\vskip 7.11317pt\\ &\geq&1-\dfrac{2}{\max\{N,\,p\}^{2}}.\end{array}

We proceed by showing that there exists an integer $N_{2}\in\{3,4,\dots\}$ such that, for all $N>N_{2}$ ,

\begin{array}[]{llllllllll}\sqrt{\log\max\{N,\,p\}}\,\Psi_{N}\,{|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}}&\leq&\dfrac{c^{2}\,N^{1-\vartheta}}{4\,\exp(E)}.\end{array}

(D.44)

We bound the three terms on the left-hand side of (D.44) one by one. The bounds on the first term, $\sqrt{\log\max\{N,\,p\}}$ , and third term, ${|\!|\!|\mathscr{D}_{N}(\bm{\theta}^{\star})|\!|\!|_{2}}$ , are the same as in the first step. With regard to the second term, we obtain $\Psi_{N}~\leq~C_{2}\,\sqrt{N}$ by the proof of Lemma D.5 with $C_{2}>0$ .

Combining these bounds gives

\begin{array}[]{llllllllll}2\,\sqrt{N\,\log N}\,C_{2}\,C_{3}&\leq&\dfrac{c^{2}\,N^{1-\vartheta}}{4\,\exp(E)}\vskip 7.11317pt\\ \dfrac{8}{c^{2}}\,C_{2}\,C_{3}\,\exp(E)&\leq&\sqrt{\dfrac{N^{1-2\,\vartheta}}{\log N}},\end{array}

which holds for all sufficiently large $N$, because the right-hand side grows without bound as $N\rightarrow\infty$ under Conditions 4 and D.3 with $\vartheta\in[0,\,1/18)$. Thus, for all $N>N_{2}$,

\begin{array}[]{llllllllll}\mathbb{P}\left(\displaystyle\sum\limits_{i=1}^{N}{\left|\!\left|\bm{H}_{i,2}(\bm{W})\right|\!\right|_{\infty}}~\geq~\dfrac{c^{2}\,N}{2\,(1+\chi(\bm{\theta}^{\star}))}\right)&\geq&1-\dfrac{2}{\max\{N,\,p\}^{2}}.\end{array}

Appendix E Quasi-Newton Acceleration

The two-step algorithm described in Section 3.2 iterates two steps:

Step 1: Update $\bm{\theta}_{1}^{(t)}$ given $\bm{\theta}_{2}^{(t-1)}$ using an MM algorithm with a linear convergence rate (\citealpsupp{bohning_monotonicity_1988}, Theorem 4.1).
Step 2: Update $\bm{\theta}_{2}^{(t)}$ given $\bm{\theta}_{1}^{(t)}$ using a Newton-Raphson update with a quadratic convergence rate.

To accelerate Step 1, we use quasi-Newton methods (\citealpsupp{lange_optimization_2000}): we approximate the difference between $(\bm{A}^{\star})^{-1}$ and $[\bm{A}(\bm{\theta}^{(t)})]^{-1}$, defined in Lemma 3.2 and Equation (9), respectively, by rank-one updates.

A first-order Taylor approximation of $\nabla_{\bm{\theta}_{1}}\,\ell(\bm{\theta}_{1},\bm{\theta}_{2}^{(t)})$ around $\bm{\theta}_{1}^{(t)}$ shows that

\displaystyle-\left[\bm{A}(\bm{\theta}^{(t)})\right]^{-1}\,\bm{k}(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{1};\,\bm{\theta}_{2}^{(t)})\;\approx\;\bm{\theta}_{1}^{(t)}-\bm{\theta}_{1},

(E.1)

where

\begin{array}[]{llllllllll}\bm{k}(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{1};\,\bm{\theta}_{2}^{(t)})&\coloneqq&\nabla_{\bm{\theta}_{1}}\,\ell(\bm{\theta}_{1},\bm{\theta}_{2}^{(t)})\Big{|}_{\bm{\theta}_{1}=\bm{\theta}_{1}^{(t)}}-\nabla_{\bm{\theta}_{1}}\,\ell(\bm{\theta}_{1},\bm{\theta}_{2}^{(t)}).\end{array}

(E.2)
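
The expansion behind this approximation can be spelled out in three lines. The following is a sketch, assuming that $\bm{A}(\bm{\theta}^{(t)})$ denotes the negative Hessian of $\ell$ with respect to $\bm{\theta}_{1}$ evaluated at $\bm{\theta}_{1}^{(t)}$ and that it is invertible:

\begin{array}[]{llllllllll}\nabla_{\bm{\theta}_{1}}\,\ell(\bm{\theta}_{1},\bm{\theta}_{2}^{(t)})&\approx&\nabla_{\bm{\theta}_{1}}\,\ell(\bm{\theta}_{1},\bm{\theta}_{2}^{(t)})\Big{|}_{\bm{\theta}_{1}=\bm{\theta}_{1}^{(t)}}-\bm{A}(\bm{\theta}^{(t)})\,(\bm{\theta}_{1}-\bm{\theta}_{1}^{(t)})\\ \bm{k}(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{1};\,\bm{\theta}_{2}^{(t)})&\approx&\bm{A}(\bm{\theta}^{(t)})\,(\bm{\theta}_{1}-\bm{\theta}_{1}^{(t)})\\ -\left[\bm{A}(\bm{\theta}^{(t)})\right]^{-1}\,\bm{k}(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{1};\,\bm{\theta}_{2}^{(t)})&\approx&\bm{\theta}_{1}^{(t)}-\bm{\theta}_{1}.\end{array}

The last line links $[\bm{A}(\bm{\theta}^{(t)})]^{-1}$, the quantity to be approximated, to differences of gradients and parameter values that are available at no extra cost.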

Since a standard Newton-Raphson algorithm corresponds to (E.1) with $\bm{\theta}_{1}=\bm{\theta}_{1}^{(t-1)}$ , the change in consecutive estimates carries information on $[\bm{A}(\bm{\theta}^{(t)})]^{-1}$ , which we want to approximate. More specifically, we approximate the difference between $\left(\bm{A}^{\star}\right)^{-1}$ and $[\bm{A}(\bm{\theta}^{(t)})]^{-1}$ . Thus we write $\left(\bm{A}^{\star}\right)^{-1}-\left[\bm{A}(\bm{\theta}^{(t)})\right]^{-1}\eqqcolon\bm{M}^{(t)}$ and set $\bm{\theta}_{1}=\bm{\theta}_{1}^{(t-1)}$ , so that (E.1) becomes

\begin{array}[]{llllllllll}\bm{M}^{(t)}\,\bm{k}(\bm{\theta}^{(t)}_{1},\bm{\theta}_{1}^{(t-1)};\bm{\theta}_{2}^{(t)})&=&(\bm{\theta}_{1}^{(t)}-\bm{\theta}_{1}^{(t-1)})+\left(\bm{A}^{\star}\right)^{-1}\,\bm{k}(\bm{\theta}^{(t)}_{1},\bm{\theta}_{1}^{(t-1)};\bm{\theta}_{2}^{(t)})\eqqcolon\bm{r}^{(t)},\end{array}

(E.3)

which is called the inverse secant condition for updating $\bm{M}^{(t)}$. Because (E.3) relates $[\bm{A}(\bm{\theta}^{(t)})]^{-1}$ to the score function, through the definition of $\bm{k}(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{1}^{(t-1)};\,\bm{\theta}_{2}^{(t)})$ in (E.2), and to the estimates $\bm{\theta}_{1}^{(t)}$ and $\bm{\theta}_{1}^{(t-1)}$, we base the updates of $\bm{M}^{(t)}$ on (E.3). We employ the parsimonious symmetric rank-one update of \citetsupp{davidon_variable_1991} to satisfy (E.3) by updating $\bm{M}^{(t)}$ as follows:

\begin{array}[]{llllllllll}\bm{M}^{(t)}&=&\bm{M}^{(t-1)}+\bm{q}^{(t)}\left(\bm{q}^{(t)}\right)^{\top}\left[c^{(t)}\right]^{-1},\end{array}

(E.4)

with $\bm{q}^{(t)}\coloneqq\bm{r}^{(t)}-\bm{M}^{(t-1)}\,\bm{k}(\bm{\theta}_{1}^{(t)},\bm{\theta}_{1}^{(t-1)};\,\bm{\theta}_{2}^{(t)})$ and $c^{(t)}\coloneqq(\bm{q}^{(t)})^{\top}\,\bm{k}(\bm{\theta}_{1}^{(t)},\bm{\theta}_{1}^{(t-1)};\,\bm{\theta}_{2}^{(t)})$. We seed the algorithm with the MM update described in Section 3.2 by setting $\bm{M}^{(0)}=\bm{0}$, the $N\times N$ null matrix.

In summary, the quasi-Newton acceleration of the MM algorithm updates $\bm{\theta}_{1}^{(t)}$ given $\bm{\theta}_{2}^{(t)}$ as follows:

Step 1: Calculate $\bm{k}(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{1}^{(t-1)};\,\bm{\theta}_{2}^{(t)})$ defined in (E.2).
Step 2: Update $\bm{M}^{(t)}$ according to (E.4).

Step 3: Update $\bm{\theta}_{1}^{(t+1)}$ from $\bm{\theta}_{1}^{(t)}$ :

\displaystyle\bm{\theta}_{1}^{(t+1)}=\bm{\theta}_{1}^{(t)}+\left[\left(\bm{A}^{\star}\right)^{-1}-\bm{M}^{(t)}\right]\left[\nabla_{\bm{\theta}_{1}}\,\ell(\bm{\theta}_{1},\bm{\theta}_{2}^{(t)})\Big{|}_{\bm{\theta}_{1}=\bm{\theta}_{1}^{(t)}}\right].

(E.5)

Unlike the unaccelerated MM algorithm, the described quasi-Newton algorithm does not guarantee that $\ell(\bm{\theta}_{1}^{(t+1)},\,\bm{\theta}_{2}^{(t)})\,\geq\,\ell(\bm{\theta}_{1}^{(t)},\,\bm{\theta}_{2}^{(t)})$. Therefore, $\bm{\theta}_{1}^{(t+1)}$ is updated by either the quasi-Newton update (E.5) or the MM update (10), whichever gives rise to the higher pseudo-likelihood. The resulting updates slightly increase the computing time per iteration, but can substantially decrease the total number of iterations.
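
The three steps and the fallback rule can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used for the results: the function names `sr1_update` and `accelerated_step`, the callables `grad` and `loglik`, and the numerical safeguard against a vanishing $c^{(t)}$ are assumptions introduced here, and the MM candidate is taken to be (E.5) with $\bm{M}^{(t)}=\bm{0}$, in line with the seeding described above.

```python
import numpy as np

def sr1_update(M_prev, k, r, tol=1e-12):
    """Symmetric rank-one update (E.4); the result M satisfies M @ k == r."""
    q = r - M_prev @ k
    c = q @ k
    # Safeguard (an assumption, not part of the text): skip the update
    # when the denominator c is numerically zero.
    if abs(c) <= tol * np.linalg.norm(q) * np.linalg.norm(k):
        return M_prev
    return M_prev + np.outer(q, q) / c

def accelerated_step(theta, theta_prev, grad, A_star_inv, M_prev, loglik):
    """One update of theta_1 with theta_2 held fixed.

    grad(theta) and loglik(theta) are hypothetical callables returning the
    gradient with respect to theta_1 and the pseudo-loglikelihood.
    """
    g, g_prev = grad(theta), grad(theta_prev)
    k = g - g_prev                              # (E.2) with theta_1 = theta_1^(t-1)
    r = (theta - theta_prev) + A_star_inv @ k   # right-hand side of (E.3)
    M = sr1_update(M_prev, k, r)                # (E.4)
    qn = theta + (A_star_inv - M) @ g           # quasi-Newton candidate (E.5)
    mm = theta + A_star_inv @ g                 # MM candidate: (E.5) with M = 0
    # Keep whichever candidate gives the higher objective value.
    best = qn if loglik(qn) >= loglik(mm) else mm
    return best, M
```

Because the MM candidate is always evaluated, each accepted step is at least as good as the plain MM step, which is what restores the ascent property.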

Appendix F MM Algorithm: Directed Connections

If connections are directed, $Z_{i,j}$ may differ from $Z_{j,i}$ . In such cases, the pseudo-loglikelihood can be written as

\begin{array}[]{llllllllll}\ell(\bm{\theta})&\coloneqq&\displaystyle\sum\limits_{i=1}^{N}\ell_{i}(\bm{\theta})+\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=1,\,j\neq i}^{N}\ell_{i,j}(\bm{\theta}),\end{array}

where $\ell_{i}$ and $\ell_{i,j}$ are defined by

\begin{array}[]{llllllllll}\ell_{i}(\bm{\theta})\ \coloneqq\ \log\,p_{\bm{\theta}}(y_{i}\mid\bm{y}_{-i},\,\bm{z})\quad\mbox{and}\quad\ell_{i,j}(\bm{\theta})\ \coloneqq\ \log\,p_{\bm{\theta}}(z_{i,j}\mid\bm{y},\,\bm{z}_{-\{i,j\}}).\end{array}

We partition the parameter vector $\bm{\theta}\coloneqq(\bm{\theta}_{1},\,\bm{\theta}_{2})\in\mathbb{R}^{2N+12}$ into

• the nuisance parameter vector: $\bm{\theta}_{1}\coloneqq(\alpha_{\mathscr{Z},O,1},\,\dots,\,\alpha_{\mathscr{Z},O,N},\,\alpha_{\mathscr{Z},I,1},\,\dots,\,\alpha_{\mathscr{Z},I,N-1})\in\mathbb{R}^{2\,N-1}$;

• the parameter vector of primary interest: $\bm{\theta}_{2}\coloneqq(\alpha_{\mathscr{Y}},\,\beta_{\mathscr{X},\mathscr{Y},1},\,\beta_{\mathscr{X},\mathscr{Y},2},\,\beta_{\mathscr{X},\mathscr{Y},3},\,\lambda,\,\gamma_{\mathscr{Z},\mathscr{Z},1},\,\gamma_{\mathscr{Z},\mathscr{Z},2},\,\gamma_{\mathscr{X},\mathscr{Z},1},\,\gamma_{\mathscr{X},\mathscr{Z},2},\,\gamma_{\mathscr{X},\mathscr{Z},3},\,\gamma_{\mathscr{X},\mathscr{Z},4},\,\gamma_{\mathscr{Y},\mathscr{Z}},\,\gamma_{\mathscr{X},\mathscr{Y},\mathscr{Z}})\in\mathbb{R}^{13}$.

As explained in Section 6.1, $\alpha_{\mathscr{Z},I,N}$ is set to $0$ in order to address identifiability issues. The negative Hessian is partitioned accordingly:

\begin{array}[]{llllllllll}-\nabla_{\bm{\theta}}^{2}\;\ell(\bm{\theta})&\coloneqq&\begin{pmatrix}\bm{A}(\bm{\theta})&\bm{B}(\bm{\theta})\\ \bm{B}(\bm{\theta})^{\top}&\bm{C}(\bm{\theta})\end{pmatrix},\end{array}

where $\bm{A}(\bm{\theta})\in\mathbb{R}^{(2\,N-1)\times(2\,N-1)}$ , $\bm{B}(\bm{\theta})\in\mathbb{R}^{(2\,N-1)\times 13}$ , and $\bm{C}(\bm{\theta})\in\mathbb{R}^{13\times 13}$ . Writing $\ell(\bm{\theta}_{1},\bm{\theta}_{2})$ in place of $\ell(\bm{\theta})$ , we compute at iteration $t+1$ :

Step 1: Given $\bm{\theta}_{2}^{(t)}$ , find $\bm{\theta}_{1}^{(t+1)}$ satisfying $\ell(\bm{\theta}_{1}^{(t+1)},\bm{\theta}_{2}^{(t)})\,\geq\,\ell(\bm{\theta}_{1}^{(t)},\bm{\theta}_{2}^{(t)})$ .
Step 2: Given $\bm{\theta}_{1}^{(t+1)}$ , find $\bm{\theta}_{2}^{(t+1)}$ satisfying $\ell(\bm{\theta}_{1}^{(t+1)},\bm{\theta}_{2}^{(t+1)})\,\geq\,\ell(\bm{\theta}_{1}^{(t+1)},\bm{\theta}_{2}^{(t)})$ .

In Step 1, it is inconvenient to invert the high-dimensional $(2\,N-1)\times(2\,N-1)$ matrix

\begin{array}[]{llllllllll}\bm{A}(\bm{\theta}^{(t)})&\coloneqq&-\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=1,\,j\neq i}^{N}\,\nabla_{\bm{\theta}_{1}}^{2}\,\ell_{i,j}(\bm{\theta}_{1},\bm{\theta}_{2}^{(t)})\Big{|}_{\bm{\theta}_{1}=\bm{\theta}_{1}^{(t)}}&=&\displaystyle\sum\limits_{i=1}^{N}\,\displaystyle\sum\limits_{j=1,\,j\neq i}^{N}\pi_{i,j}^{(t)}\,(1-\pi_{i,j}^{(t)})\,\bm{e}_{i,j}\,\bm{e}_{i,j}^{\top}.\end{array}

Note that the definition of vector $\bm{e}_{i,j}\in\mathbb{R}^{2\,N-1}$ differs from the undirected case described in Section 3.2. For $j\neq N$ , let $\bm{e}_{i,j}$ be the $(2\,N-1)$ -vector whose $i$ th and $(j+N)$ th coordinates are $1$ and whose other coordinates are $0$ . For $j=N$ , let $\bm{e}_{i,j}$ be the $(2\,N-1)$ -vector whose $i$ th coordinate is $1$ and whose other coordinates are $0$ . Along the lines of the MM algorithm for undirected connections described in Section 3.2, we increase $\ell$ by maximizing a minorizing function of $\ell$ , replacing $\bm{A}(\bm{\theta}^{(t)})$ by a constant matrix $\bm{A}^{\star}$ that is more convenient to invert. The constant matrix $\bm{A}^{\star}$ is defined as

\begin{array}[]{llllllllll}\bm{A}^{\star}&\coloneqq&\begin{pmatrix}\bm{A}_{1,1}^{\star}&\bm{A}_{1,2}^{\star}\\ \left(\bm{A}_{1,2}^{\star}\right)^{\top}&\bm{A}_{2,2}^{\star}\\ \end{pmatrix}\end{array}

where

• $\bm{A}_{1,1}^{\star}\in\mathbb{R}^{N\times N}$ and $\bm{A}_{2,2}^{\star}\in\mathbb{R}^{(N-1)\times(N-1)}$ are diagonal matrices with elements $(N-1)/4$ on the main diagonal;

• $\bm{A}_{1,2}^{\star}\in\mathbb{R}^{N\times(N-1)}$ is a matrix with vanishing elements on its main diagonal and off-diagonal elements $1/4$.

Applying Theorem 8.5.11 in \citetsupp{harville_matrix_1997} to $\bm{A}_{1,2}^{\star}$ and $\bm{A}^{\star}$ shows that $\bm{A}^{\star}$ can be inverted in $O(N)$ operations. With the above change in the constant matrix $\bm{A}^{\star}$, we estimate $\bm{\theta}$ along the lines of Section 3.2.
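
The block structure of $\bm{A}^{\star}$ can be checked numerically against the definition of $\bm{e}_{i,j}$. The sketch below is illustrative only: the function names `e_vec`, `A_of_pi`, and `A_star` are hypothetical, indices are 0-based, and the matrices are built densely for small $N$, so the code does not implement the $O(N)$ inversion. It shows that $\bm{A}^{\star}$ coincides with $\bm{A}(\bm{\theta})$ when every $\pi_{i,j}$ equals $1/2$ and that $\bm{A}^{\star}-\bm{A}(\bm{\theta})$ is positive semidefinite otherwise, which follows from $\pi\,(1-\pi)\leq 1/4$.

```python
import numpy as np

def e_vec(i, j, N):
    """Vector e_{i,j} in R^{2N-1} (0-based i, j); the in-effect of unit N is dropped."""
    e = np.zeros(2 * N - 1)
    e[i] = 1.0              # sender coordinate
    if j != N - 1:          # the last unit has no receiver coordinate
        e[N + j] = 1.0
    return e

def A_of_pi(pi):
    """A(theta) = sum over ordered pairs i != j of pi(1 - pi) e e^T."""
    N = pi.shape[0]
    A = np.zeros((2 * N - 1, 2 * N - 1))
    for i in range(N):
        for j in range(N):
            if i != j:
                e = e_vec(i, j, N)
                A += pi[i, j] * (1.0 - pi[i, j]) * np.outer(e, e)
    return A

def A_star(N):
    """Constant matrix A* assembled from its three blocks."""
    A11 = (N - 1) / 4.0 * np.eye(N)
    A22 = (N - 1) / 4.0 * np.eye(N - 1)
    A12 = 0.25 * (1.0 - np.eye(N, N - 1))   # zeros on the main diagonal, 1/4 elsewhere
    return np.block([[A11, A12], [A12.T, A22]])

N = 5
rng = np.random.default_rng(0)
pi = rng.uniform(0.05, 0.95, size=(N, N))   # hypothetical conditional probabilities
gap = A_star(N) - A_of_pi(pi)
min_eig = np.linalg.eigvalsh(gap).min()     # nonnegative up to rounding error
```

The dense double loop costs $O(N^{2})$ outer products and is meant only as a check of the definitions for small $N$.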

Appendix G Hate Speech on X: Additional Information

G.1 Data

For the application, we use posts of $N=2{,}191$ U.S. state legislators on the social media platform X collected by \citetsupp{kim_attention_2022} in the six months leading up to and including the insurrection at the United States Capitol on January 6, 2021. We restrict attention to active legislators, that is, legislators who posted during the aforementioned period and mentioned or reposted content from other active legislators. Since reposts do not necessarily reflect politicians’ opinions, we exclude all reposts and non-unique posts that are direct copies of other users’ messages to gather information on responses. Employing large language models of \citetsupp{camacho-collados_tweetnlp_2022} pre-trained on these posts enables categorizing the 109,974 posts into those containing hate speech statements versus those that do not. The algorithm of \citetsupp{camacho-collados_tweetnlp_2022} provides for each post a continuous value between 0 and 1, and we classify the post as using hate speech if its value is larger than 0.5. Accordingly, the binary attribute $Y_{i}$ equals 1 if the corresponding legislator sent at least one post classified as hate speech and 0 otherwise. The attribute $x_{i,1}\in\{0,1\}$ is 1 if legislator $i$ is a Republican and 0 otherwise. In addition, we incorporate information on each legislator’s gender ($x_{i,2}=1$ if legislator $i$ is female and $0$ otherwise), race ($x_{i,3}=1$ if legislator $i$ is white and $0$ otherwise), and state ($x_{i,4}$). On the social media platform X, users have the ability to either mention or repost other users’ posts. The resulting network, denoted as $\bm{Z}$, is based on the mentions and reposts exchanged between January 6, 2020 and January 6, 2021: $Z_{i,j}=1$ if legislator $i$ mentioned or reposted legislator $j$ in a post.

G.2 Plots

In addition to the goodness-of-fit checks reported in Section 6, we assess whether the model preserves salient characteristics of connections $\bm{Z}$ . Figure 5 suggests that the proposed model captures the shared partner distribution, i.e., the numbers of connected pairs of legislators $\{i,j\}\subset\mathscr{P}_{N}$ with $1$ , $2$ , $\dots$ shared partners.

\bibliographystylesupp{chicago}
\bibliographysupp{base}
