Asymptotic product-form steady-state for generalized Jackson networks in multi-scale heavy traffic

J.G. Dai
Cornell University
Peter Glynn
Stanford University
Yaosheng Xu
University of Chicago

Abstract

We prove that under a multi-scale heavy traffic condition, the stationary distribution of the scaled queue length vector process in any generalized Jackson network has a product-form limit. Each component in the product form follows an exponential distribution, corresponding to the Brownian approximation of a single station queue. The “single station” can be constructed precisely and its parameters have a good intuitive interpretation.

Keywords: single-class queueing networks, product-form stationary distributions, reflected Brownian motions, interchange of limits, heavy traffic approximation, performance analysis

Mathematics Subject Classification: 60K25, 60J27, 60K37

1 Introduction

Jackson (1957) introduced a class of first-come-first-serve (FCFS) queueing networks that are known today as open Jackson networks. The defining characteristics of a Jackson network are (a) all customers visiting a service station are homogeneous in terms of the service time distribution and the routing probabilities, and (b) all interarrival and service time distributions, referred to as the primitive distributions, are exponential. For a Jackson network with $J$ stations, the queue length vector process is a continuous time Markov chain (CTMC). Jackson’s pioneering contribution is that when the traffic intensity at each station is less than one, the CTMC has a unique stationary distribution that is of product form, meaning that the steady-state queue lengths at the various stations in the network are independent. This product-form result makes the computation of steady-state performance measures highly tractable.

When the primitive distributions are general, the corresponding network is known as a generalized Jackson network (GJN). For a GJN, the product-form result no longer holds in general. Creating and justifying approximations for GJNs have been an active research area for more than 60 years. In this paper, we prove the following asymptotic product-form result for a GJN: for a GJN with $J$ stations, satisfying the multi-scale heavy traffic condition

\displaystyle 1\ll\frac{1}{1-\rho_{1}}\ll\frac{1}{1-\rho_{2}}\ll\cdots\ll\frac{1}{1-\rho_{J}},

(1.1)

the following approximation holds

\displaystyle\Big{(}(1-\rho_{1})Z_{1},\ldots,(1-\rho_{J})Z_{J}\Big{)}\approx\left(\rho_{1}d_{1}E_{1},\ldots,\rho_{J}d_{J}E_{J}\right),

(1.2)

where $E_{1},\ldots,E_{J}$ are i.i.d exponential random variables with mean $1$ , and $(d_{1},\ldots,d_{J})>0$ is some deterministic vector. In (1.2), $\rho_{j}<1$ is the traffic intensity at station $j$ defined in (2.4), $Z=(Z_{1},\ldots,Z_{J})$ is the vector of steady-state queue length. The right side of (1.2) indicates a product-form distribution. Regarding the multi-scale heavy traffic condition (1.1), $1/(1-\rho_{j})$ will be called the heavy traffic normalization factor for station $j$ and $a\ll b$ is interpreted as “ $b$ being much larger than $a$ ”. Condition (1.1) says that the heavy traffic normalization factors for different stations are all large, but of widely separated magnitudes. A precise definition of the multi-scale heavy traffic condition will be defined in (3.5) in Section 3, and a precise version of (1.2) in terms of convergence in distribution will be stated in Theorem 3.1.

Our result is the first of its kind. It has many potential ramifications. An obvious benefit of our result is that the right side of (1.2) offers a scalable method to approximately evaluate the steady-state performance of a GJN. In Section 4, we demonstrate that approximations based on (1.2) can be accurate sometimes even when the load-separation condition in (1.1) is violated. Our asymptotic product-form result has been extended to multiclass queueing networks operating under static buffer priority service policies in Dai and Huo (2024). An active research direction is to explore whether the asymptotic product-form phenomenon holds for a much broader class of stochastic systems including stochastic processing networks as defined in Dai and Harrison (2020). These exciting developments have the potential to radically speed up the performance analysis and optimization of real world complex systems.

When a GJN is in conventional heavy traffic, namely all the stations have roughly equal heavy traffic normalization factors, Reiman (1984) and Johnson (1983) prove that the scaled queue length vector process converges to a multi-dimensional reflecting Brownian motion (RBM). In their conventional heavy traffic limit theorem, one common scaling is applied (based on the heavy traffic normalization factor for an arbitrarily chosen station) across all the different stations. Their RBMs are first introduced in Harrison and Reiman (1981), and the stationary distributions of these RBMs were shown to exist in Harrison and Williams (1987a) when $\rho_{j}<1$ for each station $j$ . These pioneering works have propelled the development of Brownian models for queueing network performance analysis and controls in the last forty years; see, for example, Harrison (1988), Harrison and Nguyen (1993), Chen and Yao (2001), Kang et al. (2009), Ata et al. (2024).

The stationary distribution of a multi-dimensional RBM is typically not of product-form except under a “skew-symmetry” condition characterized in Harrison and Williams (1987b); see, e.g., Harrison (1978). Numerical algorithms have been developed for computing the stationary distributions of RBMs in low dimensions; see Dai and Harrison (1992) and Shen et al. (2002). It is likely that these algorithms, even when convergent, do not scale well when the number $J$ of stations gets large. Moreover, a recent simulation-based algorithm proposed in Blanchet et al. (2021) has a complexity that scales linearly in dimension $J$ when the RBM data satisfies (a) a uniform (in $J$ ) contraction condition and (b) a uniform (in $J$ ) stability condition. The requirement of uniformity in $J$ is a strong condition when $J$ is large. In other words, the complexity guarantee of their work may not cover many real-world high-dimensional network examples. On the other hand, the approximation based on (1.2) can be easily computed even when $J$ is (very) large. The approximation (1.2) can be either used directly to approximate the original networks or it can be used at a “pre-conditioner” for numerical algorithms, such as that of Dai and Harrison (1992). Specifically, their algorithm computes a density relative to a reference measure which must be chosen well in order to guarantee convergence. The approximation (1.2) is a possible choice for that reference measure; its effectiveness will be explored elsewhere.

Our proof of Theorem 3.1 employs the BAR-approach recently developed in Braverman et al. (2017, 2024) for continuous-time queueing networks with general primitive distributions. The BAR-approach was developed initially for providing an alternative to the limit interchange procedure in Gamarnik and Zeevi (2006), Budhiraja and Lee (2009), Gurvich (2014), and Ye and Yao (2018), justifying the convergence of the prelimit stationary distribution to the RBM stationary distribution under conventional heavy traffic. Our proof of Theorem 3.1 relies heavily on a steady-state moment bound for the scaled queue length vector. This bound is proved in a companion paper Guang et al. (2023).

The parameter $d_{j}$ in (1.2) has an explicit formula (3.9) in terms of the first two moments of the primitive distributions and the routing probabilities. The distribution of $d_{j}E_{j}$ corresponds to the stationary distribution of a one-dimensional RBM approximating a single station queue. The “single station” can be constructed precisely and has a good intuitive interpretation as described immediately below the statement of Theorem 3.1, consistent with the bottleneck analysis advanced in Chen and Mandelbaum (1991). Our work contributes to a line of research pioneered by Whitt (1983) on decomposition approaches for the performance analysis of GJNs, and partly justifies the sequential bottleneck decomposition (SBD) heuristic proposed in Dai et al. (1994). For a recent survey of decomposition approaches including “robust queueing approximations”, readers are referred to Whitt and You (2022). We leave it as a future work to compare the approximation (1.2) with the extensive numerical work in Whitt and You (2022).

We note that when the normalization factors $(1-\rho_{j})^{-1}$ are widely separated, this implies that in heavy traffic, the diffusion temporal scalings $(1-\rho_{j})^{-2}$ that dictate the time scales governing the dynamics at each station are also widely separated. It is this “time scale separation” that leads to the independence across stations. A precise statement of this mathematical result and its proof has been developed in Chen et al. (2025).

2 Generalized Jackson networks

We first define an open generalized Jackson network, following closely Section 2.1 of Braverman et al. (2017) both in terms of terminology and notation. There are $J$ single-server stations in the network. Each station has an infinite-size buffer that holds jobs waiting for service. Each station potentially has an external job arrival process. An arriving job is processed at various stations in the network, one station at a time, before eventually exiting the network. When a job arrives at a station and finds the server busy processing another job, the arriving job waits in the buffer until its turn for service. After being processed at one station, the job either is routed to another station for further processing or exits the network, independent of all previous history. Jobs at each station, either waiting or in service, are homogeneous in terms of service times and routing probabilities. Jobs at each station are processed following the first-come-first-serve (FCFS) discipline.

Let ${\cal J}=\{1,\ldots,J\}$ be the set of stations. The following notational system follows Braverman et al. (2024) closely. For each station $j\in\cal{J}$ , we are given two nonnegative real numbers $\alpha_{j}\in\mathbb{R}_{+}$ and $\mu_{j}>0$ , two sequences of i.i.d. random variables $T_{e,j}=\{T_{e,j}(i),i\in\mathbb{N}\}$ and $T_{s,j}=\{T_{s,j}(i),i\in\mathbb{N}\}$ , and one sequence of i.i.d. random vectors $\Phi_{j}=\{\Phi_{j}(i),i\in\mathbb{N}\}$ , where $\mathbb{N}$ denotes the set of natural numbers. We allow the possibility that some stations may not have external arrivals by assuming $\alpha_{j}=0$ . If such a station $j$ exists, all terms associated with external arrivals at station $j$ can be removed without ambiguity. To maintain clean notation, we assume each station $j\in\cal{J}$ has external arrival. We assume $T_{e,1}$ , …, $T_{e,J}$ , $T_{s,1}$ , …, $T_{s,J}$ , $\Phi_{1}$ , $\ldots$ , $\Phi_{J}$ are independent. Following Braverman et al. (2024), we assume $\mathbb{E}[T_{e,j}(1)]=1$ , $\mathbb{E}[T_{s,j}(1)]=1$ such that $T_{e,j}(1)$ and $T_{s,j}(1)$ are unitized. For $j\in\cal{J}$ , we use $T_{e,j}(i)/\alpha_{j}$ to represent the interarrival time between the $(i-1)$ th and $i$ th external arriving jobs at station $j$ , and $T_{s,j}(i)/\mu_{j}$ to denote the $i$ th job service time at station $j$ . Therefore, $\alpha_{j}$ is interpreted as the external arrival rate and $\mu_{j}$ as the service rate at station $j$ . The random vector $\Phi_{j}(1)$ takes vector $e^{(k)}$ with probability $P_{jk}$ for $k\in\mathcal{J}$ and takes vector $e^{(0)}$ with probability $P_{j0}\equiv 1-\sum_{k\in\mathcal{J}}P_{jk}$ , where $e^{(k)}$ is the $J$ -vector with component $k$ being 1 and all other components being $0$ , and $e^{(0)}$ is the $J$ -vector of zeros. When $\Phi_{j}(i)=e^{(k)}$ , the $i$ th departing job from station $j$ goes next to station $k\in\mathcal{J}$ . When $\Phi_{j}(i)=e^{(0)}$ , the job exits the network. We assume the $J\times J$ routing matrix $P=(P_{ij})$ is transient, which is equivalent to requiring that $(I-P)$ is invertible. This assumption ensures that each arriving job will eventually exit the network, and thus the network is known as an open generalized Jackson network.

Denote by $c^{2}_{e,j}=\operatorname{Var}\left(T_{e,j}(1)\right)$ and $c^{2}_{s,j}=\operatorname{Var}\left(T_{s,j}(1)\right)$ . It is known that $c^{2}_{e,j}$ and $c^{2}_{s,j}$ are the squared coefficients of variation for interarrival time and service time, respectively.

Traffic Equation. Given the queueing network, let $\lambda$ be the unique solution to the traffic equation

\lambda=\alpha+P^{\prime}\lambda,

(2.3)

where $P^{\prime}$ is the transpose of $P$ . Here $\lambda_{j}$ is referred to as the nominal total arrival rate to station $j$ , including the external arrival and the arrivals from other stations. For each station $j\in\mathcal{J}$ , define the traffic intensity at station $j$ by

\displaystyle\rho_{j}=\lambda_{j}/\mu_{j}.

(2.4)

We assume that

\displaystyle\rho_{j}<1,\quad j\in\mathcal{J}.

Markov process. For $t\geq 0$ and $j\in\cal{J}$ , let $Z_{j}(t)$ be the number of jobs at station $j$ , including possibly one being in service. Let $R_{e,j}(t)$ be the residual time until the next external arrival to station $j$ . Let $R_{s,j}(t)$ be the residual service time for the job being processed in the station $j$ . If $Z_{j}(t)=0$ , the residual service time is the service time of the next job at station $j$ , meaning $R_{s,j}(t)=T_{s,j}(i)$ for an appropriate $i\in\mathbb{N}$ . We write $Z(t),R_{e}(t),R_{s}(t)$ as the vectors of $Z_{j}(t),R_{e,j}(t)$ , and $R_{s,j}(t)$ , respectively. Define

X(t)=(Z(t),R_{e}(t),R_{s}(t))

for each $t\geq 0$ . Then $\{X(t),t\geq 0\}$ is a Markov process with respect to the filtration $\mathbb{F}^{X}\equiv\left\{\mathcal{F}_{t}^{X};t\geq 0\right\}$ defined on the state space $\mathbb{Z}_{+}^{J}\times\mathbb{R}_{+}^{J}\times\mathbb{R}_{+}^{J}$ , where $\mathcal{F}_{t}^{X}=\sigma(\{X(u);0\leq u\leq t\})$ .

3 Multi-scale heavy traffic and the main result

We consider a family of generalized Jackson networks indexed by $r\in(0,1)$ . We denote by $\mu^{(r)}_{j}$ the service rate at station $j$ in the $r$ th network. To keep the presentation clean, we assume the service rate vector $\mu^{(r)}$ is the only network parameter that depends on $r\in(0,1)$ , and that the service time $T_{s,j}/\mu^{(r)}_{j}$ depends on $r$ solely through the service rate. The arrival rate $\alpha$ , the unitized interarrival and service times, and routing vectors do not depend on $r$ . Consequently, the nominal total arrival rate $\lambda$ does not depend on $r$ .

Assumption 3.1 (Multi-scale heavy traffic).

We assume there is a constant $b_{j}>0$ for each station $j\in\mathcal{J}$ and for $r\in(0,1)$

\displaystyle\mu^{(r)}_{j}-\lambda_{j}=b_{j}r^{j},\quad j\in\mathcal{J}.

(3.5)

Condition (3.5) is equivalent to

\displaystyle 1-\rho^{(r)}_{j}=b_{j}r^{j}/\mu^{(r)}_{j},\quad j\in\mathcal{J}.

Condition (3.5) implies $\mu^{(r)}_{j}\to\lambda_{j}$ and

\displaystyle 1/(1-\rho_{1}^{(r)})=\frac{\mu^{(r)}_{1}}{b_{1}}\frac{1}{r}\to\infty,\quad\frac{1/(1-\rho_{j+1}^{(r)})}{1/(1-\rho_{j}^{(r)})}=\frac{b_{j}\mu^{(r)}_{j+1}}{b_{j+1}\mu^{(r)}_{j}}\frac{1}{r}\to\infty,\quad j=1,\ldots,J-1.

as $r\downarrow 0$ , which is consistent with (1.1). We call condition (3.5) the multi-scale heavy traffic condition. Condition (3.5) is one way to achieve “widely separated traffic normalization factors”. Clearly, it is not the only way. We choose to use the form in condition (3.5) to make our proofs clean. We leave it to future research to understand the most general form of load-separation under which Theorem 3.1 still holds. In the rest of this paper, we choose $b_{j}=1$ for $j\in\mathcal{J}$ . When $b_{j}\neq 1$ , all the proofs remain essentially the same.

Assumption 3.2 (Moment).

We assume that there exists a $\delta_{0}>0$ such that the primitive distributions have $J+1+\delta_{0}$ moments. Namely,

\displaystyle\mathbb{E}\big{[}\big{(}T_{e,j}(1)\big{)}^{J+1+\delta_{0}}\big{]}<\infty,~{}\mathbb{E}\big{[}\big{(}T_{s,j}(1)\big{)}^{J+1+\delta_{0}}\big{]}<\infty,~{}j\in\mathcal{J}.

(3.6)

The $(J+1+\delta_{0})$ th moment assumption is used to derive in Section 7 the asymptotic basic adjoint relationship (7.75) and Taylor expansions (7.72) and (7.73). These three equations are critical in the proof of Theorem 3.1.

The following assumption is mild. Dai (1995) proves that the following assumption holds when $\rho_{j}^{(r)}<1$ for each station $j$ , and the interarrival times are “unbounded and spread-out”.

Assumption 3.3 (Stability).

For each $r\in(0,1)$ , the Markov process $\{X^{(r)}(t),t\geq 0\}$ is positive Harris recurrent and it has a unique stationary distribution.

Under Assumption 3.3, we use

\displaystyle X^{(r)}=\big{(}Z^{(r)},R_{e}^{(r)},R_{s}^{(r)}\big{)}

to denote the steady-state random vector that has the stationary distribution. It is well known that

\displaystyle\mathbb{P}\{Z^{(r)}_{j}=0\}=1-\rho^{(r)}_{j}=r^{j}/\mu^{(r)}_{j},\quad j\in\mathcal{J}.

(3.7)

See, for example, (A.11) and (A.13) of Braverman et al. (2017) for a proof. Therefore, under multi-scale heavy traffic condition (3.5), each server approaches $100\%$ utilization.

The following is the main result of this paper.

Theorem 3.1.

Assume Assumptions 3.1-3.3 all hold. Then

\displaystyle\Big{(}(1-\rho^{(r)}_{1})Z_{1}^{(r)},\ldots,(1-\rho^{(r)}_{J})Z_{J}^{(r)}\Big{)}\Rightarrow\left(d_{1}E_{1},\ldots,d_{J}E_{J}\right),\quad\text{ as }r\to 0,

(3.8)

where “ $\Rightarrow$ ” stands for convergence in distribution. Furthermore, $E_{1},\ldots,E_{J}$ are independent exponential random variables with mean $1$ , and for each station $j\in\mathcal{J}$ ,

$\displaystyle d_{j}=$	$\displaystyle\frac{\sigma^{2}_{j}}{2\lambda_{j}(1-w_{jj})},\quad\text{ and }$	(3.9)
$\displaystyle\sigma_{j}^{2}=$	$\displaystyle\sum_{i<j}\alpha_{i}\bigl{(}w_{ij}^{2}c_{e,i}^{2}+w_{ij}(1-w_{ij})\bigr{)}+\alpha_{j}c_{e,j}^{2}+\sum_{i>j}\lambda_{i}\bigl{(}w_{ij}^{2}c_{s,i}^{2}+w_{ij}(1-w_{ij}))$
	$\displaystyle+\lambda_{j}\bigl{(}c_{s,j}^{2}(1-w_{jj})^{2}+w_{jj}(1-w_{jj})\bigr{)},$	(3.10)

where $(w_{ij})$ is computed from the routing matrix $P$ through a formula given at the end this section.

As discussed in the introduction, the pre-limit stationary distribution (the left side of 3.8) is not of product form, nor is stationary distribution of conventional heavy traffic limit Reiman (1984). It is somewhat surprising that the multi-scale limit $E_{1},\ldots,E_{J}$ are independent. Theorem 3.1 suggests an obvious approximation to the stationary distribution of the queue length vector process for a generalized Jackson network with widely separated values of $(\mu_{j}-\lambda_{j})^{-1}$ , $j\in\mathcal{J}$ . Our approximation is

\displaystyle\big{(}Z_{1}(\infty),Z_{2}(\infty),\ldots,Z_{J}(\infty)\big{)}\stackrel{{\scriptstyle d}}{{\approx}}\left(\frac{\rho_{1}}{1-\rho_{1}}d_{1}E_{1},\frac{\rho_{2}}{1-\rho_{2}}d_{2}E_{2},\ldots,\frac{\rho_{J}}{1-\rho_{J}}d_{J}E_{J}\right),

(3.11)

where $(E_{1},\ldots,E_{J})$ is described in Theorem 3.1, and $\stackrel{{\scriptstyle d}}{{\approx}}$ denotes “has approximately the same distribution as” (and carries no rigorous meaning beyond that imparted by Theorem 3.1).

We now define the $J\times J$ matrix $w=(w_{ij})$ used in (3.9) and (3.10). For each $j\in\mathcal{J}$ ,

		$\displaystyle(w_{1j},\ldots,w_{j-1,j})^{\prime}=(I-P_{j-1})^{-1}P_{[1:j-1],j},$		(3.12)
		$\displaystyle(w_{jj},\ldots,w_{J,j})^{\prime}=P_{[j:J],j}+P_{[j:J],[1:j-1]}(I-P_{j-1})^{-1}P_{[1:j-1],j},$		(3.13)

where prime denotes the transpose, and, for $A,B\subseteq\mathcal{J}$ , $P_{A,B}$ is the submatrix of $P$ whose entries are $(P_{k\ell})$ with $k\in A$ and $\ell\in B$ with the following conventions: for $\ell\leq k$ ,

\displaystyle[\ell:k]=\{\ell,\ell+1,\ldots,k\},\quad P_{A,k}=P_{A,\{k\}},\quad P_{\ell,B}=P_{\{\ell\},B},\quad P_{k}=P_{[1:k],[1:k]},

and any expression involving $P_{A,B}$ is interpreted to be zero when either $A$ or $B$ is the empty set. The expressions in (3.12) and (3.13) may look complicated, but $(w_{ij})$ has the following probabilistic interpretation. Recall that the routing matrix $P$ can be embedded within a transition matrix of a DTMC on state space $\{0\}\cup\mathcal{J}$ with $0$ being the absorbing state. For each $i\in\mathcal{J}$ and $j\in\mathcal{J}$ , $w_{ij}$ is the probability that starting from state $i$ , the DTMC will eventually visit state $j$ while avoiding visiting states $\{0,j+1,\ldots,J\}$ . See more discussion of $(w_{ij})$ in Lemma A.1 in Section 5.

The variance parameter $\sigma_{j}^{2}$ in (3.10) has the following appealing interpretation. Consider a renewal arrival process $\{E(t),t\geq 0\}$ with arrival rate $\nu$ and squared coefficient of variation $c^{2}$ of the interarrival time. Each arrival has probability $p_{j}$ of going to station $j$ instantly, independent of all previous history. Let $\{A_{j}(t),t\geq 0\}$ be the corresponding arrival process to station $j$ . It is known that

\displaystyle\lim_{t\to\infty}\frac{\text{Var}(A_{j}(t))}{t}=\bigl{(}c^{2}p_{j}^{2}+p_{j}(1-p_{j})\bigr{)}\nu.

(3.14)

Thus, each of the first $J$ terms in the right side of (3.10) corresponds to the variance parameter in (3.14) of $J$ arrival sources to station $j$ . The first $j-1$ terms correspond to arrival sources from lightly-loaded stations, the $j$ th term corresponds to the external arrival process to station $j$ , and the last $J-j$ terms correspond to arrival sources from service completions at heavily-loaded stations. For a lightly-loaded station $\ell<j$ , its external arrival process serves as the renewal arrival process, each arrival having probability $w_{\ell j}$ to join station $j$ . For a heavily-loaded station $k>j$ , its service process serves as the renewal arrival process (as if the server is always busy), each arrival having probability $w_{kj}$ to join station $j$ . The resulting formula $d_{j}$ in (3.9) is consistent with the diffusion approximation of a $\sum_{i}G_{i}/GI/1$ queue with $J$ external arrival processes and immediate feedback probability $w_{jj}$ after the service at station $j$ .

Theorem 3.1 will be proved in Section 5. The proof relies on the BAR in Section 6 and the asymptotic BAR in Section 7. In addition, the proof relies critically on a uniform moment bound that is proved in the companion paper (Guang et al., 2023, Theorem 2). For easy reference, we state that result here.

Proposition 3.1 (Theorem 2 of Guang et al. (2023)).

Assume Assumptions 3.1–3.3. There exists $r_{0}\in(0,1)$ and $\epsilon_{0}>0$ such that

\sup_{r\in(0,r_{0})}\mathbb{E}\Big{[}\big{(}r^{j}Z_{j}^{(r)}\big{)}^{J+\epsilon_{0}}\Big{]}<\infty,\quad j\in\mathcal{J}.

(3.15)

The moment bound (3.15) holds of course when interarrival and service time distributions are exponential because $Z_{j}^{(r)}$ is geometrically distributed with mean $\rho^{(r)}_{j}/(1-\rho^{(r)}_{j})$ . In the exponential case, Lemma 23 of Xu (2022) provides an alternative proof via Lyapunov functions.

4 Numerical study

In this section, we illustrate the value of approximation (3.11) that is rooted in Theorem 3.1. In Section 4.1, we propose an approximation formula for the cumulative distribution function (cdf) of the queue length vector. In Section 4.2, we perform a simulation study on a two-station queueing network to demonstrate the accuracy of the approximation. We experiment with two sets of network parameters. In one set, we vary the level of load separation between stations, and in the other, we vary the variability of primitive distributions. In all cases, our proposed approximation performs well in multiple performance metrics, even in the case when the load separation condition is grossly violated. The latter phenomenon deserves a future study.

4.1 Approximation formula

We aim to approximate the steady-state queue length vector $Z(\infty)\equiv(Z_{1}(\infty),\ldots,Z_{J}(\infty))$ by the vector of exponential random variables $E\equiv(E_{1},\ldots,E_{J})$ as specified in (3.11). To approximate the probability mass function of $Z(\infty)$ , it is important to note that $Z(\infty)$ in the left side of (3.11) takes discrete values, whereas the exponential random variables $E$ on the right-hand side are continuous. We propose to approximate $Z(\infty)$ by $Y\equiv(Y_{1},\ldots,Y_{J})$ , where $Y_{1},\ldots,Y_{J}$ are independent and each $Y_{j}$ has a mixed distribution given below. From the distribution of $Y$ , we further develop approximations that are tailored to the particular performance measures of interest.

We define $Y_{j}$ to be the random variable with the following cumulative distribution function (cdf):

F_{j}(x)=(1-\rho_{j})+\rho_{j}(1-e^{-\frac{x}{\rho_{j}d_{j}/(1-\rho_{j})}}),\text{ for all }x\geq 0\text{ and }F_{j}(0-)=0.

(4.16)

In this approximation (4.16), $Y_{j}$ has a positive mass at $0$ , namely,

\displaystyle\mathbb{P}\{Y_{j}=0\}=1-\rho_{j},

which is equal to $\mathbb{P}\{Z_{j}(\infty)=0\}$ because of (3.7), and $Y_{j}$ is distributed exponentially with mean $\rho_{j}d_{j}/(1-\rho_{j})$ on $(0,\infty)$ with total mass $\rho_{j}$ .

Steady-state mean approximation. For the mean steady-state queue length $Z_{j}(\infty)$ , we propose

\mathbb{E}[Z_{j}(\infty)]=\frac{\rho_{j}}{1-\rho_{j}}d_{j};

(4.17)

consistent with (3.11).

Probability mass function approximation. For the probability mass approximation of $Z_{j}(\infty)$ , we directly assign $1-\rho_{j}$ as a fixed probability for approximating $Z_{j}(\infty)=0$ . Then we split the remaining probability mass $\rho_{j}$ for $Z_{j}(\infty)$ by discretizing $Y_{j}$ , such that the approximated probabilities of $Z_{j}(\infty)$ sum up to 1. Specifically, with this mixed discretization, $Z_{j}(\infty)$ has a probability mass function (pmf):

$\displaystyle\mathbb{P}(Z_{j}(\infty)=0)$	$\displaystyle\approx\mathbb{P}(Y_{j}=0)=F_{j}(0)=1-\rho_{j},$	(4.18)
$\displaystyle\mathbb{P}(Z_{j}(\infty)=k)$	$\displaystyle\approx\mathbb{P}(k<Y_{j}\leq k+1)=F_{j}\big{(}k+1\big{)}-F_{j}\big{(}k\big{)}$
	$\displaystyle=\rho_{j}\big{(}e^{-\frac{k(1-\rho_{j})}{\rho_{j}d_{j}}}-e^{-\frac{(k+1)(1-\rho_{j})}{\rho_{j}d_{j}}}\big{)},\quad k\in\mathbb{N}.$

We will use the pmf specified in (4.18) to plot the histogram as shown in Section 4.2.

4.2 Simulation

Refer to caption — Figure 1: Two-station generalized Jackson network

Consider an example of two-station generalized Jackson network shown in Figure 1 with external interarrival and service time following Gamma distributions. The external arrival rates and the routing matrix are fixed to be

\alpha_{1}=0.3,\quad\alpha_{2}=0.2,\quad P=\begin{pmatrix}0.3&0.6\\ 0.4&0.2\end{pmatrix},

from which one has the nominal arrival rates $\lambda_{1}=1$ and $\lambda_{2}=1$ . We set the service rates to be

\displaystyle\quad\mu_{1}=\lambda_{1}/\rho_{1},\quad\mu_{2}=\lambda_{2}/\rho_{2},

with the traffic intensity $\rho_{1},\rho_{2}$ being the parameters.

Recall that a Gamma distribution is characterized by two parameters: shape and scale. When the mean of the Gamma distribution is fixed as the reciprocal of the external arrival and service rates, the distribution can be fully specified by selecting the shape parameter, with the scale parameter determined as the mean divided by the shape. Accordingly, Table 1 presents our choices for the shape parameters of the external arrival and service distributions under the second and third columns. Each vector in these columns contains two values, representing the shape parameters for stations 1 and 2, respectively. The parameters $d_{1},d_{2}$ in Table 1 are calculated below according to (3.9):

	$\displaystyle d_{1}$	$\displaystyle=\frac{1}{2\lambda_{1}(1-w_{11})}\left\{\alpha_{1}c_{e,1}^{2}+\lambda_{2}\bigl{(}w_{21}^{2}c_{s,2}^{2}+w_{21}(1-w_{21}))+\lambda_{1}\bigl{(}c_{s,1}^{2}(1-w_{11})^{2}+w_{11}(1-w_{11})\bigr{)}\right\},$		(4.19)
	$\displaystyle d_{2}$	$\displaystyle=\frac{1}{2\lambda_{2}(1-w_{22})}\left\{\alpha_{1}\bigl{(}w_{12}^{2}c_{e,1}^{2}+w_{12}(1-w_{12})\bigr{)}+\alpha_{2}c_{e,2}^{2}+\lambda_{2}\bigl{(}c_{s,2}^{2}(1-w_{22})^{2}+w_{22}(1-w_{22})\bigr{)}\right\},$		(4.19)

where the matrix $w$ is calculated from (3.12) and (3.13) and its value is

\displaystyle w_{11}=0.3,\quad w_{12}=0.857,\quad w_{21}=0.4,\quad w_{22}=0.543.

We conduct two sets of simulation experiments. In the first set, we analyze three cases, labeled as A, B, and C, varying the variability in the primitive distributions. In the second set, we examine four cases, labeled as I–IV, varying the degree of load separation. In the first set, we fix the traffic intensities at $\rho_{1}=92\%,\rho_{2}=98\%$ . We simulate 3 cases by varying the shape parameters of the arrival and service distributions. For the second set, we use the shape parameters of the Gamma distribution as case B in the first set and simulate four cases by varying the multi-scale separation between the two stations. This is achieved by fixing $\rho_{2}=99\%$ , and adjusting the traffic intensity $\rho_{1}$ from no separation ( $99\%$ ) to large separation ( $84\%$ ).

Throughout this numerical study, we compare the simulation estimates (denoted as Sim in tables) with the theoretical estimates (denoted as Exp in tables) derived in Section 4.1. Before presenting the comparison, we illustrate how the simulation estimates are calculated.

In both sets, each simulation case is run for $10^{9}$ time units, with queue lengths recorded during the final one-tenth of the simulation period. This final segment is divided into 20 batches to construct a $95\%$ batch means confidence interval for the mean estimates for $Z(\infty)$ . Those estimates are presented under the Sim column in Tables 1 and 2. In comparison, the theoretical means are computed using (4.17) and presented under the Exp column of Tables 1 and 2.

				Traffic intensity $(92\%,98\%)$
Case	Arrival	Service	d	Station 1		Station 2
Case	shape	shape	(4.19)	Sim	Exp	Sim	Exp
A	[1.2, 1.3]	[1.1, 1.25]	[0.910,0.864]	$10.48\pm 0.08$	10.46	$41.87\pm 0.78$	42.32
B	[0.75, 0.8]	[0.95, 0.6]	[1.166,1.287]	$13.32\pm 0.09$	13.41	$62.42\pm 2.06$	63.08
C	[0.6, 0.45]	[0.5, 0.4]	[1.664,1.771]	$18.36\pm 0.14$	19.14	$87.43\pm 2.69$	86.78

Table 1: Mean estimates of queue length on different arrival and service distributions

Case	Traffic intensity	Station 1		Station 2
Case	Traffic intensity	Sim	Exp	Sim	Exp
I	$(99\%,99\%)$	$113.15\pm 6.53$	115.44	$127.94\pm 5.77$	127.46
II	$(96\%,99\%)$	$27.89\pm 0.38$	27.98	$128.75\pm 8.03$	127.46
III	$(90\%,99\%)$	$10.45\pm 0.06$	10.49	$126.84\pm 9.02$	127.46
IV	$(84\%,99\%)$	$6.05\pm 0.03$	6.12	$127.36\pm 6.72$	127.46

Table 2: Mean estimates of queue length on different multi-scale separation

According to Tables 1 and 2, our theoretical mean proposed in (4.17) demonstrate strong performance. In Table 1, the theoretical means for cases A and B align closely with the simulation ones. Although the mean estimates for case C are slightly less aligned, the values remain reasonably close. We hypothesize that the accuracy of our theoretical approximation decreases as the coefficient of variation of the arrival and service distributions increases (equivalently, when the shape parameter of the Gamma distribution becomes smaller). In Table 2, the mean estimates for cases I–IV perform similarly well, suggesting that the presence of multi-scale separation does not significantly impact estimation accuracy when the coefficient of variation remains moderate, as in case B. Interestingly, the performance is also remarkably strong in case I, in which there is no load separation. This unexpected result cannot be explained by the results in this paper, and we plan to explore this further in future research.

One may wonder how the approximation performs beyond estimating the mean of the queue lengths. To evaluate the performance, we present a histogram comparison for case B, shown in Figures 2 and 3. In these histograms, the Sim and Sim CI represent the mean and 95% confidence intervals of the percentage of time the queue length takes different values over 20 simulation batches. In contrast, the Exp represents the histogram derived using the approximation in (4.18). Note that the zoomed-in histograms on the third panel of Figures 2 and 3 use different y-axis scales. Figures 2 and 3 provide further evidence that our theoretical framework offers a highly accurate approximation, extending beyond mean estimates to closely align with the simulation results.

Additionally, we present a quantile comparison for the first set in Table 3. The quantiles before the parentheses (denoted as Sim) are calculated using a sample-based approach, where the $q$ th quantile is defined as the smallest queue length at which the cumulative percentage (averaged over 20 batches) at least reaches $q$ . In contrast, the quantiles within the parentheses (denoted as Exp) are calculated using the cumulative distribution function (cdf) described in (4.16).

Case	Station	25% quantile	50% quantile	75% quantile	90% quantile
A	1	3(2.14)	7(6.38)	15(13.63)	25(23.21)
A	2	12(11.32)	29(28.48)	58(57.81)	97(96.59)
B	1	3(2.74)	9(8.18)	19(17.47)	32(29.76)
B	2	18(16.87)	43(42.45)	87(86.18)	144(143.98)
C	1	4(3.91)	12(11.67)	26(24.93)	44(42.47)
C	2	24(23.21)	60(58.40)	122(118.54)	204(198.06)

Table 3: Quantile estimates of queue length on Sim and (Exp)

Our Theorem 3.1 establishes that the multi-scale limits $E_{1}$ and $E_{2}$ are independent. To conclude this section, we use simulations to investigate whether the queue lengths at the two stations exhibit asymptotic independence. The independence of two discrete random variables is defined by the relationship $\mathbb{P}(Z_{1}=k_{1},Z_{1}=k_{2})=\mathbb{P}(Z_{1}=k_{1})\cdot\mathbb{P}(Z_{1}=k_{2})$ for any integers $k_{1},k_{2}$ . In our simulations, we record both the joint percentages of time during which the queue lengths take specific values, and the individual percentages of time each queue length reaches its respective values over the final one-tenth of the simulated time units. The table 4 summarizes the $5\times 5$ combinations with the largest percentages. In the table, values before the parentheses represent the product of the marginal percentages, $\mathbb{P}(Z_{1}=k_{1})\cdot\mathbb{P}(Z_{1}=k_{2})$ , while the values in parentheses show the joint percentages, $\mathbb{P}(Z_{1}=k_{1},Z_{1}=k_{2})$ . All percentages are expressed on a scale of $10^{-3}$ , showing that the product of the marginal percentages and the joint percentages are very close.

Formally, we investigate independence through statistical testing. To this end, we construct a contingency table and perform the likelihood-ratio test, which, in this case, is known as the $G$ -test of independence (McDonald, 2014). Note that Table 4 is not a contingency table, as the recorded occurrences are not independent. To obtain independent samples, we run the simulation for a longer duration of $4*10^{9}$ time units. During the last $3*10^{9}$ time units, we divide the time into intervals of $10^{6}$ time units. At the start of each interval, we record the joint queue lengths as a representative point. These representative points, observed over $10^{6}$ -unit intervals, are considered independent occurrences. As a result, we obtain $3000(=3*10^{9}/10^{6})$ independent observations and count their occurrences to construct the contingency table. The resulting contingency table has dimensions $81\times 273$ . The $G$ statistic is $6911.50$ with $21760$ degrees of freedom. The $p$ -value is $1.0$ , meaning we cannot reject the null hypothesis that the queue lengths are independent. This result further supports the asymptotic independent behavior.

$k_{1}$ \ $k_{2}$	0	1	2	3	4	5
0	1.60(1.60)	1.27(1.23)	1.21(1.18)	1.18(1.15)	1.16(1.14)	1.14(1.12)
1	1.29(1.28)	1.03(1.01)	0.98(0.96)	0.95(0.94)	0.93(0.92)	0.92(0.91)
2	1.19(1.17)	0.94(0.93)	0.90(0.88)	0.87(0.86)	0.86(0.85)	0.84(0.83)
3	1.10(1.08)	0.87(0.86)	0.83(0.82)	0.81(0.80)	0.79(0.79)	0.78(0.78)
4	1.02(1.01)	0.81(0.80)	0.77(0.76)	0.75(0.75)	0.74(0.73)	0.73(0.72)
5	0.95(0.94)	0.76(0.75)	0.72(0.72)	0.70(0.70)	0.69(0.68)	0.68(0.67)

Table 4: Joint percentage of time (values in

10^{-3}

)

$\mathbb{P}(Z_{1}=k_{1})\cdot\mathbb{P}(Z_{1}=k_{2})$ ( $\mathbb{P}(Z_{1}=k_{1},Z_{1}=k_{2})$ )

These simulation findings further validate the accuracy and robustness of our product-form theoretical limit.

5 Proof of Theorem 3.1

In this section, we present a proof of Theorem 3.1 under an additional assumption (see Assumption 5.1 below). The proof relies on a transform version of the basic adjoint relationship (BAR), which will be developed in Section 6. Recall that $X^{(r)}=(Z^{(r)},R_{e}^{(r)},R^{(r)}_{s})$ denotes the steady state of the Markov process that governs the dynamics of the GJN. Define the moment generating function (MGF), or Laplace transform, $\phi^{(r)}$ of $Z^{(r)}$ as

\displaystyle\phi^{(r)}(\theta)=\mathbb{E}\Big{[}\exp{\Big{(}\sum_{j\in\mathcal{J}}\theta_{j}Z_{j}^{(r)}\Big{)}}\Big{]},\quad\theta\in\mathbb{R}^{J}_{-}\equiv\{y\in\mathbb{R}^{J},y\leq 0\}.

Then $\phi^{(r)}(r\eta_{1},\ldots,r^{J}\eta_{J})$ , with $\eta=(\eta_{1},\ldots,\eta_{J})^{\prime}\in\mathbb{R}^{J}_{-}$ , is the MGF of $(rZ^{(r)}_{1},\ldots,r^{J}Z^{(r)}_{J})$ . It is well known that proving (3.8) is equivalent to proving

\displaystyle\lim_{r\to\infty}\phi^{(r)}(r\eta_{1},\ldots,r^{J}\eta_{J})=\prod_{j\in\mathcal{J}}\frac{1}{1-d_{j}\eta_{j}},\quad\eta=(\eta_{1},\ldots,\eta_{J})^{\prime}\in\mathbb{R}^{J}_{-}.

(5.20)

To prove (5.20), it suffices to prove the following lemma. Equation (5.20) then follows from 5.21 by induction.

Proposition 5.1.

Assume Assumptions 3.1-3.3 hold. Fix $\eta\in\mathbb{R}^{J}_{-}$ . For each $j\in\mathcal{J}$ , as $r\to 0$ ,

\displaystyle\phi^{(r)}(0,\ldots,0,r^{j}\eta_{j},\ldots,r^{J}\eta_{J})-\frac{1}{1-d_{j}\eta_{j}}\phi^{(r)}(0,\ldots,0,r^{j+1}\eta_{j+1},\ldots,r^{J}\eta_{J})\to 0,

(5.21)

with the convention that $\phi^{(r)}(0,\ldots,0,r^{j+1}\eta_{j+1},\ldots,r^{J}\eta_{J})=1$ when $j=J$ .

We will present two proofs for Proposition 5.1. The first proof is given in this section under the following bounded-support assumption.

Assumption 5.1 (Bounded support).

For each $i\in\mathcal{J}$ , random variables $T_{e,i}(1)$ and $T_{s,i}(1)$ have bounded supports. In addition, they satisfy

\displaystyle\mathbb{P}\{T_{e,i}(1)>0\}=1,\quad\mathbb{P}\{T_{s,i}(1)>0\}=1,\quad i\in\mathcal{J}.

(5.22)

Assumption 5.1 is stronger than the moment assumption, Assumption 3.2. The rest of this section is devoted to the proof of Proposition 5.1 under Assumption 5.1. This proof relies on a transform version of the basic adjoint relationship (BAR) (5.27), which will be proved in Section 6. In Section 7, we will present the second proof of Proposition 5.1 without Assumption 5.1.

In the rest of this section, instead of working with $\phi^{(r)}(\theta)$ directly, we will work with

\displaystyle{\psi}^{(r)}(\theta)=\mathbb{E}[{f}_{\theta}(X^{(r)})],\quad\text{ and }\quad{\psi}^{(r)}_{i}(\theta)=\mathbb{E}[f_{\theta}(X^{(r)})\mid Z^{(r)}_{i}=0]\quad i\in\mathcal{J},

(5.23)

for each $\theta\in\mathbb{R}^{J}_{-}$ , where for $x=(z,u,v)\in\mathbb{Z}^{J}_{+}\times\mathbb{R}^{J}_{+}\times\mathbb{R}^{J}_{+}$ ,

\displaystyle{f}_{\theta}(x)=\exp\Big{(}\langle\theta,z\rangle-\sum_{i}\alpha_{i}\gamma_{i}(\theta_{i})u_{i}-\sum_{i}\mu^{(r)}_{i}\xi_{i}(\theta)v_{i}\Big{)}.

(5.24)

In (5.24), $\gamma_{i}(\theta)$ and $\xi_{i}(\theta)$ are defined via

	$\displaystyle e^{\theta_{i}}\mathbb{E}[e^{-\gamma_{i}(\theta_{i})T_{e,i}(1)}]=1,$		(5.25)
	$\displaystyle\bigl{(}P_{i0}e^{-\theta_{i}}+\sum_{j\in\mathcal{J}}P_{ij}e^{\theta_{j}-\theta_{i}}\bigr{)}\mathbb{E}[e^{-\xi_{i}(\theta)T_{s,i}(1)}]=1.$		(5.26)

The advantage of working with ${\psi}^{(r)}$ is that it, together with $({\psi}^{(r)}_{1},\ldots,{\psi}^{(r)}_{J})$ , satisfies the following transform version of BAR.

Lemma 5.1.

Assume Assumptions 3.3 and 5.1 hold. Then, $\gamma_{i}(\theta_{i})$ and $\xi_{i}(\theta)$ are well defined for every $\theta\in\mathbb{R}^{J}$ . Furthermore, $(\psi^{(r)},\psi^{(r)}_{1},\ldots,\psi^{(r)}_{J})$ satisfies the transform-BAR

\displaystyle Q(\theta){{\psi}}^{(r)}(\theta)+\sum_{i\in\mathcal{J}}(\mu^{(r)}_{i}-\lambda_{i})\xi_{i}(\theta)\bigl{(}{\psi}^{(r)}(\theta)-{\psi}_{i}^{(r)}(\theta)\bigr{)}=0,\quad\theta\in\mathbb{R}^{J}_{-},

(5.27)

where $Q(\theta)$ is defined to be

\displaystyle Q(\theta)=\sum_{i\in\mathcal{J}}\bigl{(}\alpha_{i}\gamma_{i}(\theta_{i})+\lambda_{i}\xi_{i}(\theta)\bigr{)}.

(5.28)

The fact that $\gamma_{i}(\theta_{i})$ and $\xi_{i}(\theta)$ are well defined for all $\theta\in\mathbb{R}^{J}$ follows from Lemma 4.1 of Braverman et al. (2017) and condition (5.22). The transform-BAR (5.27) follows from BAR (3.20) of Braverman et al. (2017) by taking $f(x)=f_{\theta}(x)$ in (5.24). The proof of Lemma 5.1 will be recapitulated in Section 6.

When the primitive distributions are exponential, $\gamma_{i}(\theta_{i})$ and $\xi_{i}(\theta)$ have the following explicit expressions: for each $\theta\in\mathbb{R}^{J}$ and $i\in\mathcal{J}$

	$\displaystyle\gamma_{i}(\theta_{i})=e^{\theta_{i}}-1,$		(5.29)
	$\displaystyle\xi_{i}(\theta)=P_{i0}\big{(}e^{-\theta_{i}}-1\big{)}+\sum_{j\in\cal{J}}P_{ij}\big{(}e^{\theta_{j}-\theta_{i}}-1\big{)}.$		(5.30)

Although $\gamma_{i}(\theta_{i})$ and $\xi_{i}(\theta)$ are defined implicitly in general through (5.25) and (5.26), the following Taylor expansions are what we need to access these implicit functions. As $\theta\to 0$ ,

	$\displaystyle\gamma_{i}(\theta_{i})=\theta_{i}+\frac{1}{2}\gamma^{*}_{i}(\theta_{i})+o(\theta_{i}^{2}),$		(5.31)
	$\displaystyle\xi_{i}(\theta)=\bar{\xi}_{i}(\theta)+\frac{1}{2}\xi^{*}_{i}(\theta)+o(\lvert\theta\rvert^{2}),$		(5.32)

where $\lvert\theta\rvert=\sum_{i}\lvert\theta_{i}\rvert$ , and

	$\displaystyle\gamma_{i}^{*}(\theta_{i})=c_{e,i}^{2}\theta_{i}^{2},$		(5.33)
	$\displaystyle\bar{\xi}_{i}(\theta)=\sum_{j\in\mathcal{J}}P_{ij}\theta_{j}-\theta_{i},$		(5.34)
	$\displaystyle\xi^{*}_{i}(\theta)=c_{s,i}^{2}\Big{(}\sum_{j\in\mathcal{J}}P_{ij}\theta_{j}-\theta_{i}\Big{)}^{2}+\sum_{j\in\mathcal{J}}P_{ij}\theta_{j}^{2}-\Big{(}\sum_{j\in\mathcal{J}}P_{ij}\theta_{j}\Big{)}^{2}.$		(5.35)

In (5.31) and (5.32), we adopt the standard notation that $f(r)=o(g(r))$ means $f(r)/g(r)\to 0$ as $r\to 0$ , and in (5.33) and (5.35), $c^{2}_{e,i}$ and $c^{2}_{s,i}$ are the variances of the unitized random variables $T_{e,i}$ and $T_{s,i}$ , respectively. The $c^{2}_{e,i}$ and $c^{2}_{s,i}$ are also equal to the squared coefficient of variations (SCVs) of the corresponding interarrival and service time distributions. A direct Taylor expansion of the functions in (5.29) and (5.30) are consistent with the expansions in (5.31) and (5.32) with $c^{2}_{e,i}=1$ and $c^{2}_{s,i}=1$ .

The following state space collapse (SSC) result implies that working with $\phi^{(r)}$ is equivalent to working with $\psi^{(r)}$ for our purposes.

Lemma 5.2 (SSC).

Assume $\theta{(r)}\in\mathbb{R}^{J}_{-}$ satisfying

\displaystyle\lvert\theta{(r)}\rvert\leq c\,r,\quad r\in(0,1)

(5.36)

for some constant $c>0$ . Then,

	$\displaystyle\phi^{(r)}\big{(}\theta{(r)}\big{)}-\psi^{(r)}\big{(}\theta{(r)}\big{)}=o(1)\quad\text{ as }r\to 0,$		(5.37)
	$\displaystyle\phi^{(r)}_{j}\big{(}\theta{(r)}\big{)}-\psi^{(r)}_{j}\big{(}\theta{(r)}\big{)}=o(1)\quad\text{ as }r\to 0,\quad j\in\mathcal{J},$		(5.38)

where $\phi^{(r)}_{j}(\theta)=\mathbb{E}[f_{(\theta,r)}(X^{(r)})\mid Z^{(r)}_{j}=0]$ is the MGF of $Z^{(r)}$ conditioning on $Z^{(r)}_{j}=0$ .

We call (5.37) and (5.38) the transform version of the state space collapse (SSC) because components $(R^{(r)}_{e},R^{(r)}_{s})$ in the steady-state variable $X^{(r)}$ are bounded by (deterministic) constants, and therefore “collapse to zero” in the sense that

\displaystyle\langle a^{(r)},R^{(r)}_{e}\rangle\to 0\quad\langle a^{(r)},R^{(r)}_{s}\rangle\to 0

in expectation for any sequence $a^{(r)}\to 0$ . Lemma 5.2 is a special case of Lemma 7.2, which will be proved in Appendix B.

Proof of Proposition 5.1 (under Assumption 5.1).

Recall $(w_{ij})$ defined in (3.12) and (3.13). To prove (5.21), fix a vector $\eta\in\mathbb{R}^{J}_{-}$ and a station $j\in\mathcal{J}$ . We define $\theta=(\theta_{1},\ldots,\theta_{J})^{\prime}$ with the components being given as the following:

	$\displaystyle\theta_{k}=\eta_{k}r^{k},\quad k=j+1,\ldots,J,$		(5.39)
	$\displaystyle\theta_{j}=\eta_{j}r^{j}+\frac{1}{1-w_{jj}}\sum_{i=j+1}^{J}w_{ji}\theta_{i},$		(5.40)
	$\displaystyle\theta_{\ell}=w_{\ell j}\theta_{j}+\sum_{i=j+1}^{J}w_{\ell i}\theta_{i},\quad\ell=1,\ldots,j-1.$		(5.41)

In the rest of this proof, we use $\theta^{(r)}$ to denote the $\theta$ defined by (5.39)-(5.41), omitting the dependence on $\eta\in\mathbb{R}^{J}_{-}$ and the station index $j\in\mathcal{J}$ . We will prove

\displaystyle\psi^{(r)}(\theta^{(r)})-\frac{1}{1-d_{j}\eta_{j}}\psi^{(r)}_{j}(\theta^{(r)})=o(1)\quad\text{ as }r\to 0.

(5.42)

Assuming (5.42), we now complete the proof of Proposition 5.1. First, the uniform moment bound (3.15) implies

	$\displaystyle\phi^{(r)}(\theta^{(r)})-\phi^{(r)}(0,\ldots,0,r^{j}\eta_{j},\ldots,r^{J}\eta_{J})=o(1),$		(5.43)
	$\displaystyle\phi^{(r)}_{j}(\theta^{(r)})-\phi^{(r)}_{j}(0,\ldots,0,r^{j}\eta_{j},\ldots,r^{J}\eta_{J})=o(1).$		(5.44)

Since $\theta^{(r)}_{j}=r^{j}\eta_{j}+o(r^{j})=(1-\rho^{(r)}_{j})\eta_{j}+o(r^{j})$ , $\theta^{(r)}_{i}=w_{ij}\theta^{(r)}_{j}+o(r^{j})=w_{ij}\eta_{j}(1-\rho^{(r)}_{j})+o(r^{j})$ , for $i=1,\ldots,j-1$ , and the uniform moment bound condition (3.15) implies that as $r\to 0$

\displaystyle\theta^{(r)}_{i}Z^{(r)}_{i}\to 0\quad\text{ for }i=1,\ldots,j-1,\quad\text{ and }(\theta^{(r)}_{j}-r^{j}\eta_{j})Z^{(r)}_{j}\to 0

in expectation. Intuitively, from station $j$ ’s point of view, stations $i=1,\ldots,j-1$ are so lightly loaded that multiplying by station $j$ ’s scaling factor $(1-\rho^{(r)}_{j})$ , $Z^{(r)}_{i}$ “collapses to zero” for $i=1,\ldots,j-1$ . Therefore, one has heuristically

	$\displaystyle\phi^{(r)}(\theta^{(r)})=\mathbb{E}\Big{[}\exp\Big{(}\sum_{i=1}^{j-1}\theta^{(r)}_{i}Z^{(r)}_{i}+(\theta^{(r)}_{j}-r^{j}\eta_{j})Z^{(r)}_{j}+\sum_{i=j}^{J}r^{i}\eta_{i}Z^{(r)}_{i}\Big{)}\Big{]}$
	$\displaystyle\approx\mathbb{E}\Big{[}\exp\Big{(}\sum_{i=j}^{J}r^{i}\eta_{i}Z^{(r)}_{i}\Big{)}\Big{]}=\phi^{(r)}(0,\ldots,0,r^{j}\eta_{j},\ldots,r^{J}\eta_{J}).$

Both sets of results, (5.37)-(5.38) and (5.43)-(5.44), will be proved in Appendix B under the heading of SSC results.

It follows from SSC results (5.37)-(5.38) and (5.43)-(5.44), and (5.42) that

\displaystyle\phi^{(r)}(0,\ldots,0,r^{j}\eta_{j},\ldots,r^{J}\eta_{J})-\frac{1}{1-d_{j}\eta_{j}}\phi^{(r)}_{j}(0,\ldots,0,r^{j}\eta_{j},\ldots,r^{J}\eta_{J})\to 0.

(5.45)

By setting $\eta_{j}=0$ in (5.45), one has

\displaystyle\phi^{(r)}(0,\ldots,0,r^{j+1}\eta_{j+1},\ldots,r^{J}\eta_{J})-\phi^{(r)}_{j}(0,\ldots,0,r^{j+1}\eta_{j+1},\ldots,r^{J}\eta_{J})\to 0.

(5.46)

By definition, $\phi^{(r)}_{j}(0,\ldots,0,r^{j}\eta_{j},\ldots,r^{J}\eta_{J})=\phi^{(r)}_{j}(0,\ldots,0,r^{j+1}\eta_{j+1},\ldots,r^{J}\eta_{J})$ . Therefore, (5.45) and (5.46) imply (5.21). Thus, we have proved Proposition 5.1, assuming (5.42) holds.

We now use transform-BAR (5.27) to prove (5.42). First, recall the definitions of $\bar{\xi}_{i}$ , $\gamma^{*}_{i}$ , and $\xi^{*}_{i}$ in (5.33)-(5.35). It follows from traffic equation (2.3) that

\displaystyle\sum_{i\in\mathcal{J}}\bigl{(}\alpha_{i}\theta_{i}+\lambda_{i}\bar{\xi}_{i}(\theta)\bigr{)}=0\quad\text{ for each }\theta\in\mathbb{R}^{J}.

(5.47)

By Taylor expansions (5.31)-(5.32), the $Q$ in (5.28) has the following Taylor expansion

	$\displaystyle Q(\theta)$	$\displaystyle=\sum_{i\in\mathcal{J}}\bigl{(}\alpha_{i}\theta_{i}+\lambda_{i}\bar{\xi}_{i}(\theta)\bigr{)}+\frac{1}{2}\sum_{i\in\mathcal{J}}\big{(}\alpha_{i}\gamma_{i}^{}(\theta_{i})+\lambda_{i}\xi_{i}^{}(\theta)\big{)}+o(\lvert\theta\rvert^{2})$
		$\displaystyle=Q^{*}(\theta)+o(\lvert\theta\rvert^{2})\quad\text{ as }\theta\to 0,$		(5.48)

where

\displaystyle Q^{*}(\theta)=\frac{1}{2}\sum_{i\in\mathcal{J}}\big{(}\alpha_{i}\gamma_{i}^{*}(\theta_{i})+\lambda_{i}\xi_{i}^{*}(\theta)\big{)}.

(5.49)

As a result, the transform-BAR (5.27) implies that for any sequence $\theta=\theta{(r)}\in\mathbb{R}^{J}_{-}$ with $\theta{(r)}\to 0$ as $r\to 0$ ,

	$\displaystyle Q^{*}(\theta)\psi^{(r)}(\theta)+\sum_{i\in\mathcal{J}}(\mu^{(r)}_{i}-\lambda_{i})\bar{\xi}_{i}(\theta)\bigl{(}\psi^{(r)}(\theta)-\psi^{(r)}_{i}(\theta)\bigr{)}$
	$\displaystyle=-\frac{1}{2}\sum_{i\in\mathcal{J}}(\mu^{(r)}_{i}-\lambda_{i}){\xi}^{*}_{i}(\theta)\bigl{(}\psi^{(r)}(\theta)-\psi^{(r)}_{i}(\theta)\bigr{)}+o(\lvert\theta\rvert^{2}).$		(5.50)

We keep the justification of (5.50) brief here because a general version of (5.50) will be presented and proved in Section 7.

Recall $\theta^{(r)}$ defined in (5.39)-(5.41). One can verify that

	$\displaystyle\bar{\xi}_{i}(\theta^{(r)})=0\quad i=1,\ldots,j-1,$		(5.51)
	$\displaystyle\bar{\xi}_{j}(\theta^{(r)})=(1-w_{jj})r^{j}\eta_{j};$		(5.52)

see Lemma A.2 for a proof. From the definition of $\theta^{(r)}$ ,

	$\displaystyle\theta^{(r)}_{\ell}=w_{\ell j}r^{j}\eta_{j}+o(r^{j}),\quad\ell=1,\ldots,j-1,$
	$\displaystyle\theta^{(r)}_{j}=r^{j}\eta_{j}+o(r^{j}),$
	$\displaystyle\theta^{(r)}_{k}=o(r^{j}),\quad k=j+1,\ldots,J.$

Therefore, under the multi-scale heavy traffic condition (3.5),

	$\displaystyle(\mu^{(r)}_{i}-\lambda_{i}){\xi}^{*}_{i}(\theta^{(r)})=o(r^{2j})\quad i\in\mathcal{J},$		(5.53)
	$\displaystyle(\mu^{(r)}_{i}-\lambda_{i})\bar{\xi}_{i}(\theta^{(r)})=o(r^{2j})\quad i=j+1,\ldots,J.$		(5.54)

It follows from (5.51)-(5.54) and (5.50) that

\displaystyle Q^{*}\big{(}\theta^{(r)}\big{)}\psi^{(r)}\big{(}\theta^{(r)}\big{)}+(1-w_{jj})r^{2j}\eta_{j}\Bigl{(}\psi^{(r)}\big{(}\theta^{(r)}\big{)}-\psi^{(r)}_{i}\big{(}\theta^{(r)}\big{)}\Bigr{)}=o(r^{2j}).

(5.55)

It is easy to check that

\displaystyle\lim_{r\to 0}\frac{Q^{*}(\theta^{(r)})}{r^{2j}}=\eta_{j}^{2}Q^{*}(w_{1j},\ldots,w_{j-1,j},1,0,\ldots,0),

which is equal to $\frac{1}{2}\sigma^{2}_{j}\eta_{j}^{2}$ by Lemma A.3. Dividing both sides of (5.55) by $r^{2j}$ , we have

\displaystyle(\sigma^{2}_{j}/2)\eta_{j}^{2}\psi^{(r)}(\theta^{(r)})+(1-w_{jj})\eta_{j}\bigl{(}\psi^{(r)}(\theta^{(r)})-\psi^{(r)}_{i}(\theta^{(r)})\bigr{)}=o(1),

from which (5.42) follows.

∎

6 Basic adjoint relationship and proof of Lemma 5.1

The BAR characterizes the distribution of $X^{(r)}$ , and it is developed in Braverman et al. (2017, 2024) for steady-state analysis of continuous-time queueing networks with general primitive distributions. In this section, we recapitulate the BAR and provide a proof for transform-BAR (5.27) in Lemma 5.1. The BAR-approach provides an alternative to the interchange of limits, a common approach for proving steady-state convergence of queueing networks in conventional heavy traffic. This paper and Dai and Huo (2024) demonstrate that BAR is also a natural technical tool to prove asymptotic steady-state independence under multi-scale heavy traffic.

To state the BAR for a GJN, denote by $\mathcal{D}$ the set of bounded functions $f:S\to\mathbb{R}$ satisfying (a) for each $z\in\mathbb{Z}^{J}_{+}$ , $f(z,\cdot,\cdot):\mathbb{R}_{+}^{J}\times\mathbb{R}_{+}^{J}\rightarrow\mathbb{R}$ is continuously differentiable at all but finitely many points, and (b) the derivatives of $f(z,\cdot,\cdot)$ in each dimension have a uniform bound over $z\in\mathbb{Z}^{J}_{+}$ . The BAR takes the form: for each $f\in\mathcal{D}$ ,

\displaystyle\mathbb{E}_{\pi}[\mathcal{A}f(X)]

\displaystyle+\sum_{j\in\mathcal{J}}\alpha_{j}\mathbb{E}_{e,j}\left[f(X_{+})-f(X_{-})\right]+\sum_{j\in\mathcal{J}}\lambda_{j}\mathbb{E}_{s,j}\left[f(X_{+})-f(X_{-})\right]=0,

(6.56)

where

\displaystyle\mathcal{A}f(x)=-\sum_{j\in\mathcal{J}}\frac{\partial f}{\partial r_{e,j}}(x)-\sum_{j\in\mathcal{J}}\frac{\partial f}{\partial r_{s,j}}(x)\mathbbm{1}\left(z_{j}>0\right),\quad x=\left(z,r_{e},r_{s}\right)\in\mathbb{Z}^{J}_{+}\times\mathbb{R}^{J}_{+}\times\mathbb{R}^{J}_{+}.

(6.57)

In (6.56), $\mathbb{E}_{\pi}[\cdot]$ is the expectation when $X$ follows the stationary distribution $\pi$ , $E_{e,i}[\cdot]$ is the expectation when $X_{-}$ follows the Palm distribution $\nu_{e,i}$ . The Palm distribution $\nu_{e,i}$ measures the distribution of the state $X_{-}=(Z_{-},R_{e,-},R_{s,-})$ , immediately before an external arrival to station $i$ . In particular, it is necessary true that under the Palm distribution $\nu_{e,i}$ , $R_{e,i,-}=0$ and the post-jump state $X_{+}=(Z_{+},R_{e,+},R_{s,+})$ satisfying

\displaystyle Z_{i,+}=1+Z_{i,-},\quad R_{e,i,+}=T_{e,i}/\alpha_{i},

(6.58)

and other components of $X_{+}$ remain the same as $X_{-}$ , where $T_{e,i}$ is a random variable that has the same distribution as $T_{e,i}(1)$ and is independent of $X_{-}$ . Equation (6.58) states that an external arrival to station $i$ causes queue length increases by $1$ and the remaining interarrival time is reset to a “fresh” interarrival time. In vector form, under Palm measure $\nu_{e,i}$ ,

\displaystyle X_{+}-X_{-}=\Delta_{e,i}\equiv\left(e^{(i)},e^{(i)}T_{e,i}/\alpha_{i},0\right),\quad i\in\mathcal{J},

where $e^{(i)}$ is the $J$ -vector with the $i$ th component being $1$ and other components being $0$ . Palm expectation $\mathbb{E}_{s,i}[\cdot]$ with respect to the Palm distribution $v_{s,i}$ can be explained similarly with

\displaystyle\Delta_{s,i}\equiv\left(-e^{(i)}+\phi_{i},0,e^{(i)}T_{s,i}/\mu_{i}\right),\quad i\in\mathcal{J},

and $(\phi_{i},T_{s,i})$ is the random vector that has the same distribution as $(\Phi_{i}(1),T_{s,i}(1))$ and is independent of $X_{-}$ . The proof of (6.56) is given in (6.16) and Lemma 6.3 of Braverman et al. (2024), and the Palm distributions are defined in (6.11) and (6.21) there. The proof follows from the stationary equation

\displaystyle\mathbb{E}[f(X(1))-f(X(0))]=0,\quad\forall f\in\mathcal{D},

and the fundamental theorem of calculus, where $\{X(t),t\geq 0\}$ is the Markov process describing the GJN dynamics and $X(0)$ follows the stationary distribution $\pi$ . BAR (6.56) is the same as BAR (3.15) of Braverman et al. (2017), except that Palm distributions were not defined explicitly in Braverman et al. (2017).

When $f\in\mathcal{D}$ satisfies

\displaystyle\mathbb{E}[f(z+e^{(i)},u+e^{(i)}T_{e,i}/\alpha_{i},v)]=f(z,u,v)

(6.59)

for each $x=(z,u,v)\in S$ with $u_{i}=0$ , and

\displaystyle\mathbb{E}[f(z-e^{(i)}+\phi_{i},u,v+e^{(i)}T_{s,i}/\mu^{(r)}_{i}]=f(z,u,v)

(6.60)

for each $x=(z,u,v)\in S$ with $z_{i}>0$ and $v_{i}=0$ , the Palm terms in (6.56) are all zero and BAR (6.56) reduces to

\displaystyle\mathbb{E}[\mathcal{A}f(X)]=0.

(6.61)

Equation (6.61) was first proved in Lemma 3.1 of Braverman et al. (2017).

Proof of Lemma 5.1.

First, condition (5.22) and Lemma 4.1 of Braverman et al. (2017) imply that $\gamma_{i}(\theta_{i})$ and $\xi_{i}(\theta)$ are well defined for all $\theta\in\mathbb{R}^{J}$ . We now prove that $\psi^{(r)}$ , together with $(\psi^{(r)}_{1},\ldots,\psi^{(r)}_{J})$ satisfies (5.27). For each $\theta\in\mathbb{R}^{J}_{-}$ , consider $f_{\theta}(x)$ defined in (5.24). Because $\gamma_{i}(\theta_{i})$ satisfies (5.25), $f_{\theta}(x)$ satisfies (6.59). Similarly, since $\xi_{i}(\theta)$ satisfies (5.26), $f_{\theta}(x)$ satisfies (6.60). Since $T_{e,i}(1)$ and $T_{s,i}(1)$ have bounded supports for each $i\in\mathcal{J}$ , we can restrict $(u,v)$ component of state $x=(z,u,v)$ in a bounded region. Thus, $f_{\theta}\in\mathcal{D}$ for each $\theta\in\mathbb{R}^{J}_{-}$ . It is easy to see

	$\displaystyle\mathcal{A}f_{\theta}(x)=$	$\displaystyle-\sum_{i\in\mathcal{J}}\alpha_{i}\gamma_{i}(\theta_{i})f_{\theta}(x)-\sum_{i\in\mathcal{J}}\mu_{i}^{(r)}\xi_{i}(\theta)f_{\theta}(x)\mathbbm{1}(z_{k}>0)$
	$\displaystyle=$	$\displaystyle-\sum_{i\in\mathcal{J}}\Big{(}\alpha^{i}\gamma_{i}(\theta_{i})+\mu^{(r)}_{i}\xi_{i}(\theta)\Big{)}f_{\theta}(x)+\sum_{i\in\mathcal{J}}\mu_{i}^{(r)}\xi_{i}(\theta)f_{\theta}(x)\mathbbm{1}(z_{k}=0)$
	$\displaystyle=$	$\displaystyle-\sum_{i\in\mathcal{J}}\Big{(}\alpha^{i}\gamma_{i}(\theta_{i})+\lambda_{i}\xi_{i}(\theta)\Big{)}f_{\theta}(x)-\sum_{i\in\mathcal{J}}(\mu^{(r)}_{i}-\lambda_{i})f_{\theta}(x)$
		$\displaystyle+\sum_{i\in\mathcal{J}}\mu_{i}^{(r)}\xi_{i}(\theta)f_{\theta}(x)\mathbbm{1}(z_{i}=0).$

Thus, (6.61) implies

	$\displaystyle\sum_{i\in\mathcal{J}}\Big{(}\alpha^{i}\gamma_{i}(\theta_{i})+\lambda_{i}\xi_{i}(\theta)\Big{)}\mathbb{E}[f_{\theta}(X^{(r)})]+\sum_{i\in\mathcal{J}}(\mu^{(r)}_{i}-\lambda_{i})\mathbb{E}[f_{\theta}(X^{(r)})]$
	$\displaystyle-\sum_{i\in\mathcal{J}}\mu_{i}^{(r)}\xi_{i}(\theta)\mathbb{E}[f_{\theta}(X^{(r)})\mathbbm{1}(Z^{(r)}_{i}=0)]=0.$		(6.62)

Finally, (5.27) follows from (6.62), $\psi^{(r)}(\theta)=\mathbb{E}[f_{\theta}(X^{(r)})]$ , and

	$\displaystyle\mu_{i}^{(r)}\mathbb{E}[f_{\theta}(X^{(r)})\mathbbm{1}(Z^{(r)}_{i}=0)]=\mu_{i}^{(r)}\mathbb{E}[f_{\theta}(X^{(r)})\mid Z^{(r)}_{i}=0]\mathbb{P}\{Z^{(r)}_{i}=0\}$
	$\displaystyle=\mu^{r}_{i}(1-\rho^{(r)}_{i})\psi^{(r)}_{i}(\theta)=(\mu^{(r)}_{i}-\lambda_{i})\psi^{(r)}_{i}(\theta),$

where the second equality follows from (3.7).

∎

7 Asymptotic BAR

Sections 5 and 6 provided a proof of Theorem 3.1 under Assumption 5.1 (bounded supports). This section details the changes needed to prove Theorem 3.1 without Assumption 5.1. As in Section 5, to prove Theorem 3.1, it suffices to prove Proposition 5.1. The key is to develop an asymptotic version of the transform-BAR (7.75) to replace (5.50) that is used in the proof of Proposition 5.1 in Section 5.

Throughout this section, we assume Assumptions 3.1-3.3. When the unitized interarrival and service times $T_{e,i}(1)$ and $T_{s,i}(1)$ are bounded, we have employed test function $f_{\theta}(x)$ in (5.24) in Section 5, where $\gamma_{i}(\theta_{i})$ and $\xi_{i}(\theta)$ satisfy (5.25) and (5.26) in order for $f_{\theta}(x)$ to satisfy (6.59) and (6.60). Without Assumption 5.1, we will work with the truncated random variables $T_{e,i}(1)\wedge r^{-1}$ and $T_{s,i}(1)\wedge r^{-1}$ for $r\in(0,1)$ , where $a\wedge b\equiv\min(a,b)$ for $a,b\in\mathbb{R}$ . Equivalently, for each $\theta\in\mathbb{R}^{J}_{-}$ and each $r\in(0,1)$ , we wish to work with test function $f_{(\theta,r)}(x)$ for $x=(z,u,v)\in\mathbb{Z}^{J}_{+}\times\mathbb{R}^{J}_{+}\times\mathbb{R}^{J}_{+}$ via

\displaystyle f_{(\theta,r)}(x)=\exp\Bigl{(}\langle\theta,z\rangle-\sum_{j}\gamma_{j}(\theta_{j},r)(\alpha_{j}u_{j}\wedge r^{-1})-\sum_{j}\xi_{j}(\theta,r)(\mu_{j}^{(r)}v_{j}\wedge r^{-1})\Bigr{)}.

(7.63)

In order for $f_{(\theta,r)}$ to satisfy (6.59) and (6.60), $\gamma_{i}(\theta_{i},r)$ and $\xi_{i}(\theta,r)$ need to satisfy

	$\displaystyle e^{\theta_{j}}\mathbb{E}[e^{-\gamma_{j}(\theta_{j},r)(T_{e,j}(1)\wedge r^{-1})}]=1,$		(7.64)
	$\displaystyle\bigl{(}\sum_{k\in\mathcal{J}}P_{jk}e^{\theta_{k}-\theta_{j}}+P_{j0}e^{-\theta_{j}}\bigr{)}\mathbb{E}[e^{-\xi_{j}(\theta,r)(T_{s,j}(1)\wedge r^{-1})}]=1.$		(7.65)

Without condition (5.22) in Assumption 5.1, $\gamma_{i}(\theta_{i},r)$ and $\xi_{i}(\theta,r)$ are not guaranteed to be well defined for every $\theta\in\mathbb{R}^{J}_{-}$ . Lemma 3.2 of Braverman et al. (2017) says that when $\lvert\theta\rvert<M$ for small enough constant $M>0$ , $\gamma_{j}(\theta_{j},r)$ and $\xi_{j}(\theta,r)$ are defined for each $r\in(0,1)$ . Clearly, when the interarrival and service times are bounded by $1/r$ , $\gamma_{j}(\theta_{j},r)=\gamma_{j}(\theta_{j})$ and $\xi_{j}(\theta,r)=\xi_{j}(\theta)$ . Therefore, the proof of Proposition 5.1 in Section 5 is a special case of the proof presented below. The proof in Section 5 is concise under the bounded support assumption.

In the following, define $\Theta=\{\theta\in\mathbb{R}^{J}_{-}:\lvert\theta\rvert<M\}$ . For each $\theta\in\Theta$ , define

\displaystyle{\psi}^{(r)}(\theta)=\mathbb{E}[f_{(\theta,r)}(X^{(r)})],\quad{\psi}^{(r)}_{j}(\theta)=\mathbb{E}\big{[}f_{(\theta,r)}(X^{(r)})\mid Z^{(r)}_{j}=0\big{]}.

(7.66)

The symbol $\psi^{(r)}$ is identical to the one used in (5.23). The reusage of the symbol is intentional. First, when the primitive distributions have bounded supports, the two definitions are indeed identical when $r$ is small enough. Second, the $\psi^{(r)}$ in this section always follow the definition in (7.66), but the same symbol reminds us to trace expressions and equations involving $\psi^{r}$ in Section 5, which could be copied verbatim into this section. This fact allows us to focus on those aspects that affect current proof differently.

We first establish three lemmas. The following lemma replaces Lemma 5.1, and equation (7.67) replaces the transform-BAR (5.27).

Lemma 7.1.

Assume Assumption 3.3 holds. Then, for each $r\in(0,1)$ ,

\displaystyle Q^{(r)}(\theta){{\psi}}^{(r)}(\theta)+\sum_{i\in\mathcal{J}}(\mu^{(r)}_{i}-\lambda_{i})\xi_{i}(\theta,r)\bigl{(}{\psi}^{(r)}(\theta)-{\psi}_{i}^{(r)}(\theta)\bigr{)}=\epsilon(\theta,r),\quad\theta\in\Theta,

(7.67)

where

$\displaystyle{Q}^{(r)}(\theta)$	$\displaystyle=\sum_{i\in\mathcal{J}}\Big{(}\alpha_{i}\gamma_{i}(\theta_{i},r)+\lambda_{i}\xi_{i}(\theta,r)\Big{)},$	(7.68)
$\displaystyle\epsilon(\theta,r)$	$\displaystyle=\sum_{i\in\mathcal{J}}\alpha_{i}\gamma_{i}(\theta_{i},r)\mathbb{E}\big{[}1(\alpha_{i}R^{(r)}_{e,i}>1/r)f_{(\theta,r)}(X^{(r)})\big{]}$
	$\displaystyle+\sum_{i\in\mathcal{J}}\mu^{(r)}_{i}\xi_{i}(\theta,r)\mathbb{E}\big{[}1(\mu_{i}^{(r)}R^{(r)}_{s,i}>1/r,Z^{(r)}_{i}>0)f_{(\theta,r)}(X^{(r)})\big{]},$	(7.69)

and $\gamma_{i}(\theta_{i},r)$ , $\xi_{i}(\theta,r)$ are defined in (7.64) and (7.65), respectively.

Proof.

For each $\theta\in\Theta$ and $r\in(0,1)$ , $\gamma_{i}(\theta_{i},r)$ and $\xi_{i}(\theta,r)$ are well defined, $f_{(\theta,r)}\in\mathcal{D}$ , and $f_{(\theta,r)}$ satisfies (6.59) and (6.60). It follows from (6.61) that $\mathbb{E}[f_{(\theta,r)}(X^{(r)})]=0$ . Note that

	$\displaystyle\mathcal{A}f_{(\theta,r)}(x)=$	$\displaystyle-\sum_{i\in\mathcal{J}}\alpha_{i}\gamma_{i}(\theta_{i},r)\mathbbm{1}(\alpha_{i}u_{i}\leq r^{-1})f_{(\theta,r)}(x)$
		$\displaystyle-\sum_{i\in\mathcal{J}}\mu_{i}^{(r)}\xi_{i}(\theta,r)\mathbbm{1}(\mu^{(r)}_{i}v_{i}\leq r^{-1})f_{(\theta,r)}(x)\mathbbm{1}(z_{i}>0)$
	$\displaystyle=$	$\displaystyle-\sum_{i\in\mathcal{J}}\alpha_{i}\gamma_{i}(\theta_{i},r)f_{(\theta,r)}(x)-\sum_{i\in\mathcal{J}}\mu_{i}^{(r)}\xi_{i}(\theta,r)f_{(\theta,r)}(x)\mathbbm{1}(z_{k}>0)$
		$\displaystyle+\sum_{i\in\mathcal{J}}\alpha_{i}\gamma_{i}(\theta_{i},r)\mathbbm{1}(\alpha_{i}u_{i}>r^{-1})f_{(\theta,r)}(x)$
		$\displaystyle+\sum_{i\in\mathcal{J}}\mu_{i}^{(r)}\xi_{i}(\theta,r)\mathbbm{1}(\mu^{(r)}_{i}v_{i}>r^{-1})f_{(\theta,r)}(x)\mathbbm{1}(z_{i}>0)$

The rest of the proof is identical to the proof of Lemma 5.1 in Section 6.

∎

The second lemma replaces Lemma 5.2. Actually, Lemma 5.2 is a special case of Lemma 7.2, which will be proved in Appendix B.

Lemma 7.2.

For $(\psi^{(r)},\psi^{(r)}_{1},\ldots,\psi^{(r)}_{J})$ defined in (7.66), the SSC results continue to hold. Namely, as $r\to 0$ ,

	$\displaystyle\phi^{(r)}(\theta(r))-\psi^{(r)}(\theta(r))=o(1),\quad$		(7.70)
	$\displaystyle\phi_{j}^{(r)}(\theta(r))-\psi^{(r)}_{j}(\theta(r))=o(1)\quad\text{ for }j\in\mathcal{J}$		(7.71)

for any sequence $\theta(r)\in\Theta$ satisfying (5.36).

The third lemma provides Taylor expansions for $\gamma_{i}(\theta_{i},r)$ and $\xi_{i}(\theta,r)$ , replacing Taylor expansions for $\gamma_{i}(\theta_{i})$ and $\xi_{i}(\theta)$ in (5.31) and (5.32). The proof is analogous to the proof of Lemma 7.5 in Braverman et al. (2024). For completeness, we provide a full proof here.

Lemma 7.3.

Let $\theta(r)\in\Theta$ , $r\in(0,1)$ , satisfying (5.36). Denote $\theta=\theta(r)$ . Then, as $r\to 0$ ,

		$\displaystyle\gamma_{i}(\theta_{i},r)=\theta_{i}+\frac{1}{2}\gamma_{i}^{*}(\theta_{i})+o(r^{J}\theta_{i})+o(\theta_{i}^{2}),\quad i\in\mathcal{J}$		(7.72)
		$\displaystyle\xi_{i}(\theta,r)=\bar{\xi}_{i}(\theta)+\frac{1}{2}\xi_{i}^{*}(\theta)+o(r^{J}\lvert\theta\rvert)+o(\lvert\theta\rvert^{2})\quad i\in\mathcal{J}.$		(7.73)

Proof.

Fix $i\in\mathcal{J}$ . For $r\in(0,1)$ , define $g^{(r)}(u)=x\wedge r^{-1}$ for $u\in\mathbb{R}_{+}$ . Let

\displaystyle a^{(r)}=1/\mathbb{E}\big{[}g^{(r)}\big{(}T_{e,i}(1)\big{)}\big{]},\quad b^{(r)}=\mathbb{E}\big{[}\big{(}g^{(r)}(T_{e,i}(1))\big{)}^{2}\big{]},\quad\sigma^{2}_{(r)}=b^{(r)}-(1/a^{(r)})^{2}.

For $y\in\mathbb{R}$ , define

\displaystyle c^{(r)}_{e,i}(y)=\gamma_{i}(y,r)-ya^{(r)}-\frac{1}{2}y^{2}(a^{(r)})^{3}\sigma^{2}_{(r)}.

Lemma 4.2 of Braverman et al. (2017) proved that there exists $K>0$ ,

\displaystyle\lim_{r\to 0}\sup_{0<\lvert y\rvert\leq K}\frac{c^{(r)}_{e,i}(ry)}{r^{2}y^{2}}=0,

which implies that for any $\theta_{i}(r)\in\mathbb{R}$ satisfying (5.36),

\displaystyle c^{(r)}_{e,i}(\theta_{i}(r))=o((\theta_{i}(r))^{2})\quad\text{ as }r\to 0.

(7.74)

Note that

	$\displaystyle\mathbb{E}[T_{e,i}(1)]-\mathbb{E}[g^{(r)}(T_{e,i}(1))]=\mathbb{E}[(T_{e,i}(1)-r^{-1})1(T_{e,i}(1)>r^{-1})]$
	$\displaystyle\leq r^{J+\delta_{0}}\mathbb{E}([T_{e,i}(1))^{J+1+\delta_{0}}1(T_{e,i}(1)>r^{-1})]=o(r^{J+\delta_{0}})=o(r^{J}),$
	$\displaystyle\mathbb{E}[(T_{e,i}(1))^{2}]-\mathbb{E}[(g^{(r)}(T_{e,i}(1)))^{2}]=\mathbb{E}\big{[}(T^{2}_{e,i}(1)-r^{-2})1(T_{e,i}(1)>r^{-1})\big{]}$
	$\displaystyle\leq r^{J-1+\delta_{0}}\mathbb{E}([T_{e,i}(1))^{J+1+\delta_{0}}1(T_{e,i}(1)>r^{-1})]=o(r^{J-1+\delta_{0}})=o(r^{J-1}).$

where the first inequality is by applying $\frac{1}{r^{J+\delta_{0}}}1(T_{e,i}(1)>r^{-1})\leq\left(T_{e,i}(1)\right)^{J+\delta_{0}}1(T_{e,i}(1)>r^{-1})$ , and similar technique applies on the second inequality. Therefore, using the facts that $\mathbb{E}[T_{e,i}(1)]=1$ and $c^{2}_{e,i}={\rm Var}(T_{e,i}(1))$ ,

	$\displaystyle a^{(r)}=1/\mathbb{E}[g^{(r)}(T_{e,i}(1))]=1/(1+o(r^{J}))=1+o(r^{J}),$
	$\displaystyle b^{(r)}=\mathbb{E}[T_{e,i}(1)]^{2}+o(r^{J-1}),$
	$\displaystyle\sigma^{2}_{(r)}=c^{2}_{e,i}+o(r^{J-1}),$
	$\displaystyle(a^{(r)})^{3}\sigma^{2}_{(r)}=c^{2}_{e,i}+o(r^{J-1}).$

Finally, (7.72) follows from (7.74) and

	$\displaystyle a^{(r)}\theta_{i}-\theta_{i}=\theta_{i}o(r^{J})=o(r^{J}\theta_{i}),$
	$\displaystyle(a^{(r)})^{3}\sigma^{2}_{(r)}\theta^{2}_{i}-c^{2}_{e,i}\theta^{2}_{i}=\theta^{2}_{i}o(r^{J-1})=o(r^{J-1}\theta^{2}_{i})=o(\theta_{i}^{2}).$

Expansion (7.73) can be proved similarly, again using Lemma 4.2 of Braverman et al. (2017).

∎

We now complete the proof of Proposition 5.1 without Assumption 5.1.

Proof of Proposition 5.1 (without Assumption 5.1).

Assume Assumptions 3.1-3.3 all hold. Examine the proof of Proposition 5.1 in Section 5 under the bounded support Assumption 5.1. With Lemma 7.2 replacing Lemma 5.2, the remaining step is to prove equation (5.42) for $(\psi^{(r)},\psi^{(r)}_{1},\ldots,\psi^{(r)}_{J})$ defined in (7.66).

To prove (5.42) for $(\psi^{(r)},\psi^{(r)}_{1},\ldots,\psi^{(r)}_{J})$ defined in (7.66), the key is to extend asymptotic BAR (5.50) to an asymptotic version that is appropriate for the current setting. We claim, as $r\to 0$ ,

	$\displaystyle Q^{*}(\theta)\psi^{(r)}(\theta)+\sum_{i\in\mathcal{J}}(\mu^{r}_{i}-\lambda_{i})\bar{\xi}_{i}(\theta)\bigl{(}\psi^{(r)}(\theta)-\psi^{(r)}_{i}(\theta)\bigr{)}$
	$\displaystyle=-\frac{1}{2}\sum_{i\in\mathcal{J}}(\mu^{(r)}_{i}-\lambda_{i}){\xi}^{*}_{i}(\theta)\bigl{(}\psi^{(r)}(\theta)-\psi^{(r)}_{i}(\theta)\bigr{)}+o(r^{J}\lvert\theta\rvert)+o(\lvert\theta\rvert^{2})$		(7.75)

for any sequence $\theta=\theta(r)$ satisfying (5.36).

To prove (7.75), our starting point is (7.67) in Lemma 7.1. We first prove

\displaystyle\epsilon(\theta(r),r)=o(\lvert\theta(r)\rvert r^{J}),

(7.76)

where $\epsilon(\theta,r)$ are defined in (7.69). To prove (7.76), we first claim

	$\displaystyle\mathbb{P}\big{\{}\alpha_{j}R^{(r)}_{e,j}>1/r\big{\}}=o(r^{J}),$		(7.77)
	$\displaystyle\mathbb{P}\big{\{}\mu^{(r)}_{j}R^{(r)}_{s,j}>1/r\big{\}}=o(r^{J}).$		(7.78)

Indeed,

	$\displaystyle\mathbb{P}\{\alpha_{j}R^{(r)}_{e,j}>1/r\}$	$\displaystyle\leq r^{J}\mathbb{E}[(\alpha_{j}R^{(r)}_{e,j})^{J}1(\alpha_{j}R^{(r)}_{e,j}>1/r)]$
		$\displaystyle=\frac{r^{J}}{J+1}\mathbb{E}\big{[}\big{(}T_{e,j}(1)\big{)}^{J+1}1(T_{e,j}(1)>r^{-1})\big{]}=o(r^{J}),$

where the first equality follows from Lemma 6.4 of Braverman et al. (2024), and the second equality follows from condition (3.6). Equation (7.78) can be proved similarly.

We next claim there exists a $r_{0}\in(0,1)$ and $M>0$ such that

\displaystyle\sup_{r\in(0,r_{0})}\sup_{x=(z,u,v)}f_{(\theta(r),r)}(x)\leq M.

(7.79)

Indeed, by Lemma 7.3, there exists $r_{0}\in(0,1)$ and $c_{1}>0$ such that for $r\in(0,r_{0})$

\displaystyle\lvert\gamma_{i}(\theta_{i}(r),r)\rvert\leq c_{1}\lvert\theta_{i}(r)\rvert\leq c_{1}c\,r,\quad\lvert\xi_{i}(\theta(r),r)\rvert\leq c_{1}\lvert\theta(r)\rvert\leq c_{1}c\,r,

(7.80)

from which (7.79) follows. It is clear that (7.76) follows from (7.77), (7.78), (7.79), and (7.80).

To complete the proof of (7.75), it remains to prove that for $\theta(r)$ satisfying (5.36), the following holds

	$\displaystyle\Big{(}Q^{(r)}(\theta){{\psi}}^{(r)}(\theta)+\sum_{i\in\mathcal{J}}(\mu^{(r)}_{i}-\lambda_{i})\xi_{i}(\theta,r)\bigl{(}{\psi}^{(r)}(\theta)-{\psi}_{i}^{(r)}(\theta)\bigr{)}\Big{)}$
	$\displaystyle-\Big{(}Q^{}(\theta){{\psi}}^{(r)}(\theta)+\sum_{i\in\mathcal{J}}(\mu^{(r)}_{i}-\lambda_{i})\bigl{(}\bar{\xi}_{i}(\theta)+\frac{1}{2}\xi^{}_{i}(\theta)\bigr{)}\bigl{(}{\psi}^{(r)}(\theta)-{\psi}_{i}^{(r)}(\theta)\bigr{)}\Big{)}$
	$\displaystyle=o(r^{J}\lvert\theta\rvert(r))+o(\lvert\theta(r)\rvert^{2}),$

which follows from the Taylor expansions (7.72) and (7.73), definition of $Q^{(r)}$ in (7.68), and (5.47), and the fact that $\psi^{(r)}(\theta(r))\leq 1$ and $\psi^{(r)}_{i}(\theta(r))\leq 1$ for each $r\in(0,1)$ and $i\in\mathcal{J}$ . ∎

References

Ata et al. (2024) Ata, B., Harrison, J. M. and Si, N. (2024). Drift control of high-dimensional reflected brownian motion: A computational method based on neural networks. Stochastic Systems .
URL https://doi.org/10.1287/stsy.2023.0044
Blanchet et al. (2021) Blanchet, J., Chen, X., Si, N. and Glynn, P. W. (2021). Efficient steady-state simulation of high-dimensional stochastic networks. Stochastic Systems 11 174–192.
URL https://doi.org/10.1287/stsy.2021.0077
Braverman et al. (2017) Braverman, A., Dai, J. and Miyazawa, M. (2017). Heavy traffic approximation for the stationary distribution of a generalized Jackson network: the BAR approach. Stochastic Systems 7 143–196.
URL http://projecteuclid.org/euclid.ssy/1495785619
Braverman et al. (2024) Braverman, A., Dai, J. and Miyazawa, M. (2024). The BAR approach for multiclass queueing networks with SBP service policies. Stochastic Systems. Published online in Articles in Advance 02 May 2024 .
URL https://doi.org/10.1287/stsy.2023.0011
Budhiraja and Lee (2009) Budhiraja, A. and Lee, C. (2009). Stationary distribution convergence for generalized Jackson networks in heavy traffic. Math. Oper. Res. 34 45–56.
URL http://dx.doi.org/10.1287/moor.1080.0353
Chen and Mandelbaum (1991) Chen, H. and Mandelbaum, A. (1991). Discrete flow networks: bottleneck analysis and fluid approximations. Math. Oper. Res. 16 408–446.
URL https://doi.org/10.1287/moor.16.2.408
Chen and Yao (2001) Chen, H. and Yao, D. (2001). Fundamentals of Queueing Networks: Performance, Asymptotics, and Optimization. Springer-Verlag, New York.
Chen et al. (2025) Chen, Z., Dai, J. G. and Guang, J. (2025). Generalized Jackson networks in multi-scale heavy traffic. Tech. rep., Preprint.
Dai and Harrison (2020) Dai, J. and Harrison, J. M. (2020). Processing Networks: Fluid Models and Stability. Cambridge University Press.
URL https://spnbook.org
Dai (1995) Dai, J. G. (1995). On positive Harris recurrence of multiclass queueing networks: a unified approach via fluid limit models. Ann. Appl. Probab. 5 49–77.
URL http://links.jstor.org/sici?sici=1050-5164(199502)5:1<49:OPHROM>2.0.CO;2-4&origin=MSN
Dai and Harrison (1992) Dai, J. G. and Harrison, J. M. (1992). Reflected Brownian motion in an orthant: numerical methods for steady-state analysis. The Annals of Applied Probability 2 65–86.
Dai and Huo (2024) Dai, J. G. and Huo, D. (2024). Asymptotic product-form steady-state for multiclass queueing networks with SBP service policies in multi-scale heavy traffic .
URL https://arxiv.org/abs/2403.04090
Dai et al. (1994) Dai, J. G., Nguyen, V. and Reiman, M. I. (1994). Sequential bottleneck decomposition: an approximation method for open queueing networks. Operations Research 42 119–136.
Gamarnik and Zeevi (2006) Gamarnik, D. and Zeevi, A. (2006). Validity of heavy traffic steady-state approximation in generalized Jackson networks. Ann. Appl. Probab. 16 56–90.
URL http://dx.doi.org/10.1214/105051605000000638
Guang et al. (2023) Guang, J., Chen, X. and Dai, J. G. (2023). Uniform moment bounds for generalized Jackson networks in multi-scale heavy traffic. arXiv preprint arXiv2401.14647 .
URL https://doi.org/10.48550/arXiv.2401.14647
Gurvich (2014) Gurvich, I. (2014). Validity of heavy-traffic steady-state approximations in multiclass queueing networks: the case of queue-ratio disciplines. Mathematics of Operations Research 39 121–162.
URL http://dx.doi.org/10.1287/moor.2013.0593
Harrison (1978) Harrison, J. M. (1978). The diffusion approximation for tandem queues in heavy traffic. Advances in Applied Probability 10 886–905.
Harrison (1988) Harrison, J. M. (1988). Brownian models of queueing networks with heterogeneous customer populations. In Stochastic Differential Systems, Stochastic Control Theory and Applications (W. Fleming and P. L. Lions, eds.), vol. 10 of The IMA Volumes in Mathematics and Its Applications. Springer, New York, NY.
URL https://doi.org/10.1007/978-1-4613-8762-6_11
Harrison and Nguyen (1993) Harrison, J. M. and Nguyen, V. (1993). Brownian models of multiclass queueing networks: current status and open problems. Queueing Systems Theory Appl. 13 5–40.
URL https://doi.org/10.1007/BF01158927
Harrison and Reiman (1981) Harrison, J. M. and Reiman, M. I. (1981). Reflected Brownian motion on an orthant. Ann. Probab. 9 302–308.
URL http://links.jstor.org/sici?sici=0091-1798(198104)9:2<302:RBMOAO>2.0.CO;2-P&origin=MSN
Harrison and Williams (1987a) Harrison, J. M. and Williams, R. J. (1987a). Brownian models of open queueing networks with homogeneous customer populations. Stochastics 22 77–115.
URL http://dx.doi.org/10.1080/17442508708833469
Harrison and Williams (1987b) Harrison, J. M. and Williams, R. J. (1987b). Multidimensional reflected Brownian motions having exponential stationary distributions. Annals of Probability 15 115–137.
Jackson (1957) Jackson, J. R. (1957). Networks of waiting lines. Operations Research 5 518–521.
Johnson (1983) Johnson, D. P. (1983). Diffusion approximations for optimal filtering of jump processes and for queueing networks. Ph.D. thesis, University of Wisconsin.
URL http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:8316215
Kang et al. (2009) Kang, W. N., Kelly, F. P., Lee, N. H. and Williams, R. J. (2009). State space collapse and diffusion approximation for a network operating under a fair bandwidth sharing policy. The Annals of Applied Probability 19 1719 – 1780.
URL https://doi.org/10.1214/08-AAP591
McDonald (2014) McDonald, J. H. (2014). Handbook of Biological Statistics. 3rd ed. Sparky House Publishing, Baltimore, Maryland.
Reiman (1984) Reiman, M. I. (1984). Open queueing networks in heavy traffic. Mathematics of Operations Research 9 441–458.
URL http://dx.doi.org/10.1287/moor.9.3.441
Shen et al. (2002) Shen, X., Chen, H., Dai, J. G. and Dai, W. (2002). The finite element method for computing the stationary distribution of an SRBM in a hypercube with applications to finite buffer queueing networks. Queueing Systems 42 33–62.
Whitt (1983) Whitt, W. (1983). The queueing network analyzer. Bell System Technical Journal 62 2779–2815.
Whitt and You (2022) Whitt, W. and You, W. (2022). A robust queueing network analyzer based on indices of dispersion. Naval Research Logistics (NRL) 69 36–56.
URL https://onlinelibrary.wiley.com/doi/abs/10.1002/nav.22010
Xu (2022) Xu, Y. (2022). WWTA load-balancing for parallel-server systems with heterogeneous servers and multi-scale heavy traffic limits for generalized jackson networks. Ph.D. thesis, Cornell University.
Ye and Yao (2018) Ye, H.-Q. and Yao, D. D. (2018). Justifying diffusion approximations for multiclass queueing networks under a moment condition. The Annals of Applied Probability 28 3652 – 3697.
URL https://doi.org/10.1214/18-AAP1401

Appendix A Supporting lemmas to prove Proposition 5.1

To assist the proof of Proposition 5.1, we state and prove the following three lemmas.

Lemma A.1.

Fix $j\in\mathcal{J}$ . The following equations

w_{ij}=P_{ij}+\sum_{k<j}P_{ik}w_{kj},\quad i\in\cal{J}

(A.81)

has a unique solution $(w_{1j},\ldots,w_{J,j})$ that is given by (3.12) and (3.13).

Proof.

The first $j-1$ equations in (A.81) uniquely solve $(w_{1j},\ldots,w_{j-1,j})^{\prime}$ , which is given by (3.12). (3.13) is simply the vector version of the last $J-j+1$ equations with $(w_{1j},\ldots,w_{j-1,j})^{\prime}$ being substituted by the right side of (3.12). ∎

The quantity $w_{ij}$ has the following probabilistic interpretations. Consider the DTMC on state space $\{0\}\cup\mathcal{J}$ corresponding to the routing matrix $P$ . Let $w_{ij}$ be the probability that starting from state $i$ , the DTMC will eventually visit state $j$ , avoiding visiting states in $\{0,j+1,\ldots,J\}$ . By the “first-step” method, $w_{ij}$ satisfies (A.81).

Lemma A.2 (Solution to a linear system).

Fix $\eta\in\mathbb{R}^{J}_{-}$ and $j\in\mathcal{J}$ . Recall $\theta=\theta^{(r)}$ defined in (5.39)-(5.41). Then, $\theta$ satisfies the following equations.

	$\displaystyle\bar{\xi}_{\ell}(\theta)=0,\quad\ell=1,\ldots,j-1,$		(A.82)
	$\displaystyle\bar{\xi}_{j}(\theta)=(1-w_{jj})r^{j}\eta_{j},$		(A.83)

Proof.

With $\theta_{k}=r^{k}\eta_{k}$ for $k=j+1,\ldots,J$ being fixed, the linear system of equations

	$\displaystyle\bar{\xi}_{\ell}(\theta)=0,\quad\ell=1,\ldots,j-1,$
	$\displaystyle\bar{\xi}_{j}(\theta)=(1-w_{jj})r^{j}\eta_{j}$

has $j$ equations and $j$ variables $(\theta_{1},\ldots,\theta_{j})$ . It has a unique solution $(\theta_{1},\ldots,\theta_{j})^{\prime}$ given by (5.40) and (5.41). ∎

Lemma A.3.

Fix $j\in\mathcal{J}$ . Recall $\sigma^{2}_{j}$ is defined in (3.10). Then,

\displaystyle\sigma^{2}_{j}=2Q^{*}(w_{1j},\ldots,w_{j-1,j},1,0,\ldots,0),

(A.84)

where $Q^{*}$ is defined in (5.49).

Proof.

Define

\displaystyle u=(w_{1j},\ldots,w_{j-1,j},1,0,\ldots,0)^{\prime}\in\mathbb{R}^{J},

(A.85)

where prime denotes transpose. It follows from (A.81) that

\displaystyle\sum_{k\in\mathcal{J}}P_{ik}u_{k}=w_{ij},\quad i\in\mathcal{J}.

Recall the definition $\xi^{*}_{i}(u)$ in (5.35). It follows that

	$\displaystyle\xi^{*}_{i}(u)$	$\displaystyle=c^{2}_{s,i}\Bigl{(}-u_{i}+\sum_{k\in\mathcal{J}}P_{i,k}u_{k}\Bigr{)}^{2}+\sum_{k\in\mathcal{J}}P_{ik}u_{k}^{2}-\big{(}\sum_{k\in\mathcal{J}}P_{ik}u_{k}\big{)}^{2}$
		$\displaystyle=c_{s,i}^{2}(-u_{i}+w_{ij})^{2}+\sum_{k\in\mathcal{J}}P_{ik}(u_{k}^{2}-u_{k})+w_{ij}-w_{ij}^{2}.$

For $u$ in (A.85),

\displaystyle-u_{i}+w_{ij}=\begin{cases}0&i<j,\\ -1+w_{jj},&i=j,\\ w_{ij}&i>j.\end{cases}

Denoting

\displaystyle b_{i}=\sum_{k\in\mathcal{J}}P_{ik}(u_{k}^{2}-u_{k})+w_{ij}-w_{ij}^{2},

one has the following

\displaystyle\sum_{i=1}^{J}\lambda_{i}\xi^{*}_{i}(u)=\lambda_{j}c_{s,j}^{2}(1-w_{jj})^{2}+\sum_{i>j}\lambda_{i}c_{s,i}^{2}w_{ij}^{2}+\sum_{i\in\mathcal{J}}\lambda_{i}b_{i},

where the last term $\sum_{i\in\mathcal{J}}\lambda_{i}b_{i}$ is equal to

	$\displaystyle\sum_{i\in\mathcal{J}}\lambda_{i}\sum_{k\in\mathcal{J}}P_{ik}(u_{k}^{2}-u_{k})+\sum_{i\in\mathcal{J}}\lambda_{i}w_{ij}(1-w_{ij})$
	$\displaystyle=\sum_{k\in\mathcal{J}}(\lambda_{k}-\alpha_{k})(u_{k}^{2}-u_{k})+\sum_{i\in\mathcal{J}}\lambda_{i}w_{ij}(1-w_{ij})$
	$\displaystyle=\sum_{k<j}(\alpha_{k}-\lambda_{k})(w_{kj}-w_{kj}^{2})+\sum_{i\in\mathcal{J}}\lambda_{i}w_{ij}(1-w_{ij})$
	$\displaystyle=\sum_{k<j}\alpha_{k}(w_{kj}-w_{kj}^{2})+\sum_{i\geq j}\lambda_{i}w_{ij}(1-w_{ij}).$

Following the definition of the quadratic function $Q^{*}(u)$ in (5.49) with $\eta^{*}_{i}(u_{i})$ defined in (5.33), one has

	$\displaystyle 2Q^{*}(u)$	$\displaystyle=\alpha_{j}c_{e,j}^{2}+\sum_{i<j}\alpha_{i}c_{e,i}^{2}w^{2}_{ij}+\lambda_{j}c_{s,j}^{2}(1-w_{jj})^{2}+\sum_{i>j}\lambda_{i}c_{s,i}^{2}w_{ij}^{2}$
		$\displaystyle+\sum_{i<j}\alpha_{i}(w_{ij}-w_{ij}^{2})+\sum_{i\geq j}\lambda_{i}w_{ij}(1-w_{ij})=\sigma^{2}_{j}.$

∎

Appendix B State space collapse

In this section, we first prove Lemma 7.2, which supersedes Lemma 5.2. We next prove (5.43) and (5.44) in Lemma B.2, whose proof utilizes Lemma B.1.

Proof of Lemma 7.2.

Let $\theta(r)\in\Theta$ be a sequence satisfying (5.36). We prove (7.70)-(7.71). In the following, all $\theta$ usage should be interpreted as $\theta(r)$ . We first prove (7.70). By condition (5.36) and Taylor expansion (7.72), there exists $r_{0}\in(0,1)$ such that

\displaystyle\lvert\gamma_{i}(\theta_{i},r)\rvert\leq c_{1}r\text{ and }\lvert\xi_{i}(\theta,r)\rvert\leq c_{1}r\quad r\in(0,r_{0}),i\in\mathcal{J},

where $c_{1}>0$ is some constant. It follows that for $r\in(0,r_{0})$

	$\displaystyle\lvert\phi^{(r)}(\theta)-{\psi}^{(r)}(\theta)\rvert$
	$\displaystyle\leq\mathbb{E}\Big{\lvert}1-\exp\Big{(}-\sum_{i}\gamma_{i}(\theta_{i},r)(\alpha_{i}R^{(r)}_{e,i}\wedge r^{-1})-\sum_{i}\xi_{i}(\theta,r)(\mu^{(r)}_{i}R^{(r)}_{s,i}\wedge r^{-1})\Big{)}\Big{\rvert}$
	$\displaystyle\leq e^{2Jc_{1}}c_{1}\Big{(}\sum_{i}\alpha_{i}r\mathbb{E}[R^{(r)}_{e,i}]+\sum_{i}\mu^{(r)}_{i}r\mathbb{E}[R^{(r)}_{s,i}]\Big{)}=o(1),$

where the second inequality follows from

\displaystyle\lvert 1-e^{-x}\rvert\leq e^{\lvert x\rvert}\lvert x\rvert,\quad x\in\mathbb{R},

and the last equality follows from Lemma 4.5 of Braverman et al. (2017). Therefore, (7.70) is proved.

We next prove (7.71). It follows that, for $r\in(0,r_{0})$ ,

	$\displaystyle\lvert\phi^{(r)}_{j}(\theta)-{\psi}^{(r)}_{j}(\theta)\rvert$
	$\displaystyle\leq\mathbb{E}\Big{[}\,\big{\lvert}1-\exp\big{(}-\sum_{i}\gamma_{i}(\theta,r)(\alpha_{i}R^{(r)}_{e,i}\wedge(1/r))-\sum_{i}\xi_{i}(\theta,r)(\mu^{(r)}_{i}R^{(r)}_{s,i}\wedge(1/r))\big{)}\big{\rvert}\mid Z^{(r)}_{j}=0\Big{]}$
	$\displaystyle\leq e^{2Jc_{1}}c_{1}\Big{(}\sum_{i}\alpha_{i}r\mathbb{E}\big{[}R^{(r)}_{e,i}\mid Z^{(r)}_{j}=0\big{]}+\sum_{i}\mu^{(r)}_{i}r\mathbb{E}\big{[}R^{(r)}_{s,i}\mid Z^{(r)}_{j}=0\big{]}\Big{)}.$

Now,

	$\displaystyle r\mathbb{E}\big{[}R^{(r)}_{e,i}\mid Z^{(r)}_{j}=0\big{]}=\frac{r\mathbb{E}\big{[}R^{(r)}_{e,i}1_{\{Z^{(r)}_{j}=0\}}\big{]}}{\mathbb{P}\{Z^{(r)}_{j}=0\}}\leq\frac{r\big{(}\mathbb{E}\big{[}\big{(}R^{(r)}_{e,i}\big{)}^{p}\big{]}\big{)}^{1/p}\big{(}\mathbb{P}\{Z^{(r)}_{j}=0\}\big{)}^{1/q}}{\mathbb{P}\{Z^{(r)}_{j}=0\}}$
	$\displaystyle=r^{1-j/p}\Big{(}\mathbb{E}\big{[}\big{(}R^{(r)}_{e,i}\big{)}^{(J+\delta_{0})}]\Big{)}^{1/(J+\delta_{0})}(\mu_{j}^{(r)})^{1/p}=o(1)\quad\text{ for }i,j\in\mathcal{J},$

where in obtaining the inequality, we have used Holder’s inequality with $p=J+\delta_{0}$ ( $\delta_{0}$ is given in (3.6)) and $1/p+1/q=1$ , and the last equality follows from Lemma 6.6 of Braverman et al. (2024) and Assumption 3.2. Similarly, one can prove

\displaystyle r\mathbb{E}\big{[}R^{(r)}_{s,i}\mid Z^{(r)}_{j}=0\big{]}=o(1)\quad\text{ for }i,j\in\mathcal{J}.

∎

The following two lemmas are consequences of the uniform moment bounds in Proposition 3.1.

Lemma B.1.

Assume the assumptions in Proposition 3.1. Then

\displaystyle\lim_{r\to 0}\mathbb{E}\Big{[}\sum_{\ell<j}r^{\ell+1}Z^{(r)}_{\ell}\mid Z_{j}^{(r)}=0\Big{]}=0\quad\text{ for }j\in{\cal{J}}.

(B.86)

Proof.

For each $j\in\mathcal{J}$ and $\ell<j$ , by choosing $p=j+\epsilon_{0}$ , $q=(j+\epsilon_{0})/(j+\epsilon_{0}-1)$ and employing the Holder’s inequality

\displaystyle\mathbb{E}\Big{[}r^{\ell}Z^{(r)}_{\ell}1_{\{Z_{j}^{(r)}=0\}}\Big{]}\leq\Big{(}\mathbb{E}\big{[}(r^{\ell}Z^{(r)}_{\ell})^{p}\big{]}\Big{)}^{1/p}\bigl{(}\mathbb{P}\{Z_{j}^{(r)}=0\}\bigr{)}^{1/q},

one can show that

	$\displaystyle r\mathbb{E}\Big{[}r^{\ell}Z^{(r)}_{\ell}\mid Z_{j}^{(r)}=0\Big{]}$	$\displaystyle\leq r\Big{(}\mathbb{E}\big{[}(r^{\ell}Z^{(r)}_{\ell})^{p}\big{]}\Big{)}^{1/p}\bigl{(}\mathbb{P}\{Z_{j}^{(r)}=0\}\bigr{)}^{1/q-1}$
		$\displaystyle=\Big{(}\mathbb{E}\big{[}(r^{\ell}Z^{(r)}_{\ell})^{p}\big{]}\Big{)}^{1/p}(\mu^{(r)}_{j})^{1/(j+\epsilon_{0})}r^{\frac{\epsilon_{0}}{j+\epsilon_{0}}}\to 0,$

which proves (B.86). ∎

Lemma B.2.

Fix a vector $\eta\in\mathbb{R}^{J}_{-}$ and a station $j\in\mathcal{J}$ . Let $\theta=\theta^{(r)}$ be defined in (5.39)-(5.41). As $r\to 0$ ,

	$\displaystyle\phi^{(r)}(\theta^{(r)})-\phi^{(r)}(0,\ldots,0,r^{j}\eta_{j},\ldots,r^{J}\eta_{J})=o(1),$		(B.87)
	$\displaystyle\phi^{(r)}_{j}(\theta^{(r)})-\phi^{(r)}_{j}(0,\ldots,0,r^{j}\eta_{j},\ldots,r^{J}\eta_{J})=o(1),$		(B.88)

Proof.

In the proof, we drop the superscript $r$ from $\theta^{(r)}$ . We first prove (B.87), i.e. (5.43). Note that $\theta_{k}\leq 0$ for $k<j$ and $\theta_{j}-r^{j}\eta_{j}\leq 0$ .

	$\displaystyle\phi^{(r)}(0,\ldots,0,r^{j}\eta_{j},\ldots,r^{J}\eta_{J})-\phi^{(r)}(\theta)=\mathbb{E}\Big{[}\Big{(}1-e^{\sum_{k<j}\theta_{k}Z^{(r)}_{k}+(\theta_{j}-r^{j}\eta_{j})Z^{(r)}_{j}}\Big{)}e^{\sum_{k\geq j}\eta_{k}r^{k}Z^{(r)}_{k}}\Big{]}$
	$\displaystyle\leq\mathbb{E}\Big{[}1-e^{\sum_{k<j}\theta_{k}Z^{(r)}_{k}+(\theta_{j}-r^{j}\eta_{j})Z^{(r)}_{j}}\Big{]}\leq\mathbb{E}\Big{[}\sum_{k<j}\lvert\theta_{k}\rvert Z^{(r)}_{k}+\lvert\theta_{j}-r^{j}\eta_{j}\rvert Z^{(r)}_{j}\Big{]}$
	$\displaystyle=\sum_{k<j}\frac{\lvert\theta_{k}\rvert}{r^{k}}\mathbb{E}\Big{[}r^{k}Z^{(r)}_{k}\Big{]}+\frac{\lvert\theta_{j}-r^{j}\eta_{j}\rvert}{r^{j}}\mathbb{E}\Big{[}r^{j}Z^{(r)}_{j}\Big{]}=o(1),$

where the second inequality follows from $1-e^{-x}\leq x$ for $x\geq 0$ and the last equality is due to $\theta_{k}=o(r^{k})$ for $k<j$ , $\theta_{j}-r^{j}\eta_{j}=o(r^{j})$ , and (3.15).

We next prove (B.88), i.e. (5.44)

	$\displaystyle\phi^{(r)}_{j}(0,\ldots,0,r^{j}\eta_{j},\ldots,r^{J}\eta_{J})-\phi^{(r)}_{j}(\theta)=\mathbb{E}\Big{[}\Big{(}1-e^{\sum_{k<j}\theta_{k}Z^{(r)}_{k}}\Big{)}e^{\sum_{k\geq j}\eta_{k}r^{k}Z^{(r)}_{k}}\mid Z^{(r)}_{j}=0\Big{]}$
	$\displaystyle\leq\mathbb{E}\Big{[}\Big{(}1-e^{\sum_{k<j}\theta_{k}Z^{(r)}_{k}}\Big{)}\mid Z^{(r)}_{j}=0\Big{]}\leq\mathbb{E}\Big{[}\sum_{k<j}\lvert\theta_{k}\rvert Z^{(r)}_{k}\mid Z^{(r)}_{j}=0\Big{]}$
	$\displaystyle=\sum_{k<j}\frac{\lvert\theta_{k}\rvert}{r^{k+1}}\mathbb{E}\Big{[}r^{k+1}Z^{(r)}_{k}\mid Z^{(r)}_{j}=0\Big{]}=o(1),$

where the last equality is due to $\lvert\theta_{k}\rvert\leq c_{k}r^{j}$ for some constant $c_{k}>0$ as $r\to 0$ for $k<j$ and (B.86).

∎