Exploiting Data Locality to Improve Performance of Heterogeneous Server Clusters

Zhisheng Zhao¹ Debankur Mukherjee² Ruoyu Wu³

Abstract

We consider load balancing in large-scale heterogeneous server systems in the presence of data locality that imposes constraints on which tasks can be assigned to which servers. The constraints are naturally captured by a bipartite graph between the servers and the dispatchers handling assignments of various arrival flows. When a task arrives, the corresponding dispatcher assigns it to a server with the shortest queue among $d\geq 2$ randomly selected servers obeying the above constraints. Server processing speeds are heterogeneous and depend on the server type. For a broad class of bipartite graphs, we characterize the limit of the appropriately scaled occupancy process, both at the process level and in steady state, as the system size becomes large. Using such a characterization, we show that data locality constraints can be used to significantly improve the performance of heterogeneous systems. This is in stark contrast to either heterogeneous servers in a fully flexible system or data locality constraints in systems with homogeneous servers, both of which have been observed to degrade the system performance. Extensive numerical experiments corroborate the theoretical results.

¹Georgia Institute of Technology, Email: zhisheng@gatech.edu

²Georgia Institute of Technology, Email: debankur.mukherjee@isye.gatech.edu

³Iowa State University, Email: ruoyu@iastate.edu

Keywords and phrases. Heterogeneous load balancing system, data locality, compatibility constraint, Power-of-Two, Mean-field, McKean-Vlasov Process, Stochastic Coupling

Acknowledgements. The work was partially supported by the NSF grant CIF-2113027.

1 Introduction

Over the last two decades, large-scale load balancing has emerged as a fundamental research problem. In simple terms, the goal is to investigate how to efficiently allocate tasks in large-scale service systems, such as data centers and cloud networks. As modern data centers continue to process massive amounts of data with increasingly stringent processing time requirements, the need for more efficient, scalable, and dynamic load balancing algorithms is greater than ever. The study of scalable load balancing algorithms started with the seminal works of Mitzenmacher [17, 1, 16] and Vvedenskaya et al. [32], where the popular ‘power-of- $d$ choices’ or the JSQ( $d$ ) algorithm was introduced. Here a canonical model was considered that consists of $N$ identical parallel servers, each serving a dedicated queue of tasks. Arriving tasks are routed to the shortest of $d\geq 2$ randomly selected queues by a centralized dispatcher, irrevocably and instantaneously, at the time of arrival. Since then, this model has received significant attention from the research community and we have seen tremendous progress in our understanding of the performance of various algorithms; see [31] for a recent survey.

Despite this phenomenal progress, when it comes to modern large-scale systems, much of the existing wisdom turns out to be false. This is primarily because the above classical model fails to capture two of the most significant factors that impact the performance of these systems: (a) Data locality constraints: In simple terms, tasks of a particular type can only be routed to a small subset of servers that are equipped with the appropriate resources to execute them [33, 22, 27, 29]. For example, an image classification request must be routed to a server that hosts appropriately trained machine learning models, such as a deep convolutional neural network. Also, in online video services like Netflix and YouTube, users’ requests may only be routed to servers that are equipped with the required data (e.g., movies, music). The classical model ignores this effect and assumes full flexibility, that is, that any task can be assigned to any server in the system. In the presence of data locality constraints, the delay performance of the system may degrade drastically as compared to fully flexible systems. (b) Heterogeneity in service rates: Servers in any modern large-scale server cluster do not process tasks at equal speeds. This heterogeneity of the service rates is a major bottleneck in implementing the existing heuristics of the classical model. For example, if there are two groups of servers in the system, one faster and the other slower, then popular dynamic algorithms like JSQ( $d$ ), which have provably excellent delay performance when all server speeds are identical, can be observed to be unstable (i.e., their queue lengths blow up) [10, 13, 21, 20]. In other words, heterogeneity shrinks the stability region, as formally established in [13]. This happens simply because if all the servers are treated equally, then the slower server pool may receive a higher flow of arrivals than it can process.

Takeaway. In summary, both data locality and heterogeneity of server speeds may significantly degrade the system performance. The main contribution of the current work is to establish that when these two aspects are considered together, the performance can in fact be drastically improved. That is, if servers are heterogeneous, then efficiently designing the data locality constraints (by appropriately placing the resource files in the server network) can regain the full stability region, which was shrunk for fully flexible systems. Moreover, we also establish that carefully designed data locality constraints can ensure the celebrated double-exponential decay of the tail probability of the steady-state queue length distribution even for heterogeneous systems.

1.1 Our Contributions

Motivated by this, in the current paper, we consider a bipartite graph model for large-scale load balancing systems, which has recently gained popularity in the research community. In this model, a bipartite graph between the servers and task types describes the compatibility between the two, where an edge represents the server’s ability to process the corresponding task type. This encompasses the classical full-flexibility models as those having a complete bipartite compatibility graph. An immediate difficulty of the new model is that when the graph is non-trivial (i.e., not a collection of isolated pairs or a complete bipartite graph), the mean-field techniques break down. This is because the queues no longer remain exchangeable, making aggregate processes such as the vector of the numbers of servers with queue length $i$ , $i=0,1,2,\ldots$ , non-Markovian. In addition, we also consider that each dispatcher handles the arrival flow of one of $K$ possible task types and that there are $M$ server types. The rate of service at a server depends on its type. Throughout the paper, the key quantity of interest will be the global occupancy process $\mathbf{q}^{N}(t)=(q^{N}_{m,l}(t),m=1,\ldots,M,l\geq 1)$ , where $q^{N}_{m,l}(t)$ represents the fraction of servers of type $m$ with queue length at least $l$ at time $t$ in the $N$ -th system with $N$ servers, and we will look at the large-system asymptotic regime: $N\to\infty$ .

Due to the compatibility constraints, the servers become non-exchangeable, even if they belong to the same type. This causes most of the existing frameworks [8, 17, 24] to break down. To characterize the process-level limit of the queue length process, we resort to the theory of weakly interacting particle systems and asymptotically couple the evolution of the $N$ -dimensional vector of queue lengths with an appropriately defined infinite system of independent McKean-Vlasov processes [26, 15]. We also show the asymptotic independence of any finite number of queue length processes, also known as the propagation of chaos property. This convergence of the queue length processes (in the $L_{2}$ sense) is then used to establish the transient convergence of the occupancy process. One downside of the above convergence is that it depends on the assumption that the initial queue lengths within each set of servers of the same type are independent and identically distributed (i.i.d.) and are independent across the sets of servers of different types. Due to this assumption, this convergence result cannot be used to establish the interchange of $t\to\infty$ and $N\to\infty$ limits, which is crucial in studying the limit of steady states.

To overcome this issue, we use the framework of [22], recently introduced in the context of homogeneous systems. There, a notion called proportional sparsity for graph sequences was introduced, which ensures that the empirical queue length distribution within the set of compatible servers of any dispatcher is close to the empirical queue length distribution of the entire system. This was used in [22] to construct conditions on graphs that match the performance of a fully flexible system. In the current setup, however, this notion is inadequate, since our goal is not to match the performance of the fully flexible system (which is usually poor under heterogeneity). This is why we extend this notion to what we call clustered proportional sparsity for a sequence of graphs with increasing size, to accommodate heterogeneous systems. The clustered proportional sparsity property allows us to construct a stochastic coupling between the system and another intermediate system whose task allocation is done by a carefully constructed algorithm called GWSQ( $d$ ) (Algorithm 1). This coupling with the intermediate system, along with clustered proportional sparsity, helps us establish that if the initial occupancies of two systems are close, then the distance (in the $\ell_{1}$ -norm) between their global occupancy processes remains small uniformly over any finite time interval. In turn, this implies that the limits of their global occupancy processes are the same. As a consequence, we can remove the i.i.d. assumption on the initial queue lengths, since the above guarantees that under clustered proportional sparsity, the convergence of the occupancy process depends only on the initial occupancy and not on how the individual queues are distributed.

The above process-level limit result shows that the transient limit of the occupancy process can be described as a system of ODEs that depend on various graph parameters. Next, we also show that the interchange of limits holds and that the sequence of occupancy states in stationarity converges weakly to the unique fixed point of the ODE. One celebrated feature of the classical JSQ( $d$ ) policy for homogeneous systems under full flexibility is that the tail of the steady-state queue length distribution decays doubly exponentially as $\lambda^{(d^{i}-1)/(d-1)}$ , where $\lambda\in(0,1)$ is the load per server [17, 32]. We establish this double-exponential decay property for the heterogeneous system.
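To get a feel for how quickly the expression $\lambda^{(d^{i}-1)/(d-1)}$ decays in $i$ , the following minimal Python sketch evaluates it for a few queue lengths. The function name `jsq_d_tail` and the values $\lambda=0.9$ , $d=2$ are illustrative assumptions and are not taken from the paper; the case $d=1$ (purely random routing, with geometric tail $\lambda^{i}$ ) is included only as a point of comparison.

```python
def jsq_d_tail(lam: float, d: int, i: int) -> float:
    """Evaluate lam**((d**i - 1)/(d - 1)), the classical homogeneous JSQ(d) tail.

    For d = 1 the exponent is taken to be i, which gives the geometric tail
    lam**i of purely random routing; this case is only a comparison baseline.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("load per server must lie in (0, 1)")
    # (d**i - 1)/(d - 1) = 1 + d + ... + d**(i-1) is an integer, so // is exact.
    exponent = i if d == 1 else (d**i - 1) // (d - 1)
    return lam ** exponent


# Illustrative comparison at load 0.9 (values chosen for the example only).
for i in range(6):
    print(i, jsq_d_tail(0.9, 1, i), jsq_d_tail(0.9, 2, i))
```

With $d=2$ the exponent grows as $2^{i}-1$ rather than $i$ , which is the sense in which the decay is doubly exponential.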

It is worthwhile to note that the strength of the above results lies in the fact that they hold for arbitrary deterministic sequences of graphs satisfying certain properties. Moreover, we show that all these properties are satisfied almost surely by a sequence of inhomogeneous random graphs with parameters prescribed by the theorems. This makes it easy to design graphs with the desired favorable properties.

1.2 Related Works

The research on task allocation systems with limited flexibility can be traced back to the works of Turner [30] and Foss and Chernova [9]. Of particular importance to the current work, Foss and Chernova [9] considered stability properties of the system using the fluid model. Later, Bramson [4] generalized some of the results in [9] to a broad class of Join-Shortest-Queue (JSQ)-type systems, including the JSQ( $d$ ) policy, via the Lyapunov function approach. Stolyar [23] considered optimal routing in output-queued flexible server systems, which is essentially the bipartite graph model for the load balancing system. Here the author considered a system with a fixed number of servers and dispatchers in the conventional heavy traffic regime and proposed a routing policy that is optimal in terms of server workload. Recently, Cruise et al. [7] considered load balancing problems on hypergraphs and established stability conditions. The above works, however, did not aim to precisely characterize the system performance in the large-scale scenario.

The analysis in the large-scale scenario became prominent in the last decade, with the emergence of its applications to load balancing in data centers and cloud networks. In the full-flexibility setup, the analysis of heterogeneous-server systems gained some attention. In this case, Stolyar [24, 25] studied the zero-queueing property of the Join-Idle-Queue (JIQ) policy, Mukhopadhyay et al. [21] and Mukhopadhyay and Mazumdar [20] analyzed the JSQ( $d$ ) policy in heterogeneous systems with processor-sharing service discipline, Hurtado-Lange and Maguluri [13] studied the throughput and delay optimality properties of JSQ( $d$ ), and Bhambay and Mukhopadhyay [3] studied a speed-aware JSQ policy. The above works on the JSQ( $d$ ) policy observe that the stability region shrinks if the dispatcher applies the JSQ( $d$ ) policy blindly. One way to mitigate this performance degradation is to take the server speeds into consideration while sampling servers or while assigning tasks to the sampled servers. Such a ‘hybrid JSQ( $d$ )’ scheme is able to recover the stability region. The current work can be contrasted with this approach. First, in the presence of data locality, both the server speeds and the underlying compatibility constraints need to be taken into account during the sampling procedure, and the approach becomes significantly more complicated. Second, we show how, by exploiting data locality, the blind JSQ( $d$ ) policy can recover the stability region and even achieve the double-exponential decay of tail probabilities of the steady-state queue length distribution. One advantage of the latter approach is that the dispatchers can be oblivious to the server speeds, which reduces the implementation complexity and also makes it robust against changes to the servers (e.g., when servers are added or removed).

Recently, Allmeier and Gast [2] studied the application of (refined) mean-field approximations for heterogeneous systems. Their method uses an ODE to approximate the evolution of each server, and the error vanishes as the system scales. However, this method cannot be directly used in our case. Due to the bipartite compatibility graph structure, it is hard to capture the interactions between two servers, which means that we cannot write the transition rates of the underlying Markov chain as [2] does. Also, one important assumption in their work is a finite buffer, whereas we consider the infinite buffer case here.

The aspect of task-server compatibility constraints in large-scale load balancing and scheduling gained popularity only recently, as data locality became prominent in data centers and cloud networks. This led to many works in this area [11, 18, 5, 22, 33, 29, 28]. All these works consider homogeneous processing speeds at the servers. The initial works [30, 11] focused on certain fixed-degree graphs and showed that the flexibility to forward tasks to even a few neighbors with possibly shorter queues may significantly improve the waiting time performance as compared to dedicated arrival streams or a collection of independent M/M/1 queues. Tsitsiklis and Xu [28, 29] considered asymptotic optimality properties of the bipartite graph topology in an input-queued, dynamic scheduling framework. Later, in the (output-queued) load balancing setup, Mukherjee et al. [18] considered the JSQ policy and Budhiraja et al. [5] considered the transient analysis of the JSQ( $d$ ) policy on non-bipartite graphs. The goal in these papers was to provide sufficient conditions on the graph sequence to asymptotically match the performance of a complete graph. Here we should mention that the non-bipartite graph model cannot be used to capture the data locality constraints. In the presence of data locality constraints, the analysis of the JSQ( $d$ ) policy for homogeneous systems, including both the transient analysis and the interchange of limits, was performed by Rutten and Mukherjee [22]. Weng et al. [33] were the first to consider the large-scale heterogeneous-server model under data locality. They showed that the Join-the-Fastest-Shortest-Queue (JFSQ) and Join-the-Fastest-Idle-Queue (JFIQ) policies achieve asymptotic optimality for minimizing mean steady-state waiting time when the bipartite graph is sufficiently well connected. However, these results fall in the category of JSQ-type policies where the asymptotic behavior is degenerate in the sense that the queue lengths at servers can be either 0 or 1. Naturally, the results and their analysis are very different from the JSQ( $d$ )-type policies where queues of any length are possible.

1.3 Notations

Let $\mathbb{N}_{0}=\mathbb{N}\cup\{0\}$ . For a set $S$ , its cardinality is denoted as $|S|$ . For a Polish space $\mathcal{S}$ , the space of right continuous functions with left limits from $[0,\infty)$ to $\mathcal{S}$ is denoted as $\mathbb{D}([0,\infty),\mathcal{S})$ , endowed with the Skorokhod topology. The distribution of an $\mathcal{S}$ -valued random variable $X$ will be denoted as $\mathcal{L}(X)$ . For a function $f:\ [0,\infty)\rightarrow\mathbb{R}$ , let $\left\lVert f\right\rVert_{*,t}\coloneqq\sup_{0\leq s\leq t}|f(s)|$ . For $x\in\mathcal{S}$ , the Dirac measure at the point $x$ is denoted as $\delta_{x}$ . $\left\lVert\cdot\right\rVert_{p}$ represents the $\ell_{p}$ -norm. Define ${X\choose Y}=\frac{X(X-1)\cdots(X-Y+1)}{Y!}$ if $X\geq Y$ , and ${X\choose Y}=0$ otherwise. RHS abbreviates right-hand side.

2 Model Description

The model below for large-scale systems with limited flexibility was considered by Tsitsiklis and Xu [28, 29] in the context of scheduling algorithms for input-queued systems. Subsequently, it was considered in [18, 5, 22, 33] for output-queued load balancing systems. Let $G^{N}=(W^{N},V^{N},E^{N})$ be a system with $N$ single servers, each serving its own queue, and $W(N)$ dispatchers, where $W^{N}=\{1,...,W(N)\}$ and $V^{N}=\{1,...,N\}$ denote the sets of dispatchers and servers, respectively. Similar to [28, 29], we assume that $\lim_{N\rightarrow\infty}W(N)/N=\xi$ where $\xi>0$ is a constant. The set $E^{N}\subseteq W^{N}\times V^{N}$ of edges represents hard compatibility between the dispatchers and servers in the $N$ -th system. In other words, tasks of type $i$ can be assigned to a server $j$ if and only if $(i,j)\in E^{N}$ . Tasks arriving at a dispatcher must be assigned instantaneously and irrevocably to one of the compatible servers.

• Task types: A task can be of one of $K$ possible types labelled in $\mathcal{K}=\{1,...,K\}$ and each dispatcher handles arrivals of exactly one task type. Thus, we will interchangeably use the terms task-type and dispatcher-type throughout the article. Let $W^{N}_{k}$ denote the set of all dispatchers handling type- $k$ tasks. As $N\to\infty$ , assume that $|W^{N}_{k}|/W(N)\to w_{k}\in(0,1)$ for $k\in\mathcal{K}$ with $\sum_{k=1}^{K}w_{k}=1$ . Tasks arrive at each dispatcher as an independent Poisson process with rate $\lambda$ .

• Server types: Each server belongs to one of $M$ possible types labelled in $\mathcal{M}=\{1,...,M\}$ . Let $V^{N}_{m}$ denote the set of type- $m$ servers, and as $N\to\infty$ , $|V^{N}_{m}|/N\to v_{m}\in(0,1)$ for $m\in\mathcal{M}$ with $\sum_{m=1}^{M}v_{m}=1$ .

• Service times: The processing time at a type- $m$ server is exponentially distributed with mean $1/u_{m}$ , where $u_{m}$ is a positive constant. Throughout, we will assume that asymptotically, the system has sufficient service capacity in the sense that

$\lambda\xi<\sum_{m\in\mathcal{M}}u_{m}v_{m}.$ (2.1)

Note that the left- and right-hand sides above represent the scaled total arrival rate and the scaled maximum departure rate, respectively.

For all the asymptotic results, we consider a general class of systems where the compatibility graph satisfies certain asymptotic criteria as specified in Condition 2.1 below. Define

$\deg^{N}_{w}(i,m)=|\{j\in V^{N}_{m}:(i,j)\in E^{N}\}|,\quad i\in W^{N},\ m\in\mathcal{M},$

$\deg^{N}_{v}(k,j)=|\{i\in W^{N}_{k}:(i,j)\in E^{N}\}|,\quad j\in V^{N},\ k\in\mathcal{K}.$

Namely, $\deg^{N}_{w}(i,m)$ is the number of the dispatcher $i$ ’s neighboring servers whose type is $m\in\mathcal{M}$ . Similarly, $\deg^{N}_{v}(k,j)$ is the number of the server $j$ ’s neighboring dispatchers whose type is $k\in\mathcal{K}$ .

Condition 2.1.

The sequence $\{G^{N}\}_{N\geq 1}$ satisfies the following:

(a) For each $k\in\mathcal{K}$ and $m\in\mathcal{M}$ , let $E^{N}(k,m)=\{(i,j)\in W^{N}_{k}\times V^{N}_{m}:(i,j)\in E^{N}\}$ . Then

$\lim_{N\rightarrow\infty}\frac{|E^{N}(k,m)|}{|W^{N}_{k}|\times|V^{N}_{m}|}=p_{k,m}\in[0,1].$ (2.2)

We call the matrix $\mathbf{p}=(p_{k,m},k\in\mathcal{K},m\in\mathcal{M})$ the compatibility matrix.

(b) For each $k\in\mathcal{K}$ and $m\in\mathcal{M}$ ,

$\lim_{N\rightarrow\infty}\frac{\max_{i\in W^{N}_{k}}\deg^{N}_{w}(i,m)}{\min_{i\in W^{N}_{k}}\deg^{N}_{w}(i,m)}=1,\quad\lim_{N\rightarrow\infty}\frac{\max_{j\in V^{N}_{m}}\deg^{N}_{v}(k,j)}{\min_{j\in V^{N}_{m}}\deg^{N}_{v}(k,j)}=1.$

Intuitively, the condition implies that the ‘asymptotic density’ of edges between type- $k$ dispatchers and type- $m$ servers is given by $p_{k,m}$ and that, for each task-type-server-type pair, the servers have similar levels of flexibility. The classical, well-studied setup where any task can be processed by any server corresponds to the complete bipartite graph with $p_{k,m}=1$ , $\forall k\in\mathcal{K},m\in\mathcal{M}$ . In Section 3.5, we show that for any given $\mathbf{p}:=(p_{k,m},k\in\mathcal{K},m\in\mathcal{M})$ , a sequence of graphs satisfying Condition 2.1 can be obtained simply by placing edges at random in a suitable way. This is a certain class of inhomogeneous random graphs, which we call irg( $\mathbf{p}$ ); see Definition 3.15 for details. In fact, the irg( $\mathbf{p}$ ) sequence of graphs will be proved to satisfy the required conditions for all the results of this article to hold.
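As a rough illustration of Condition 2.1, the sketch below samples a compatibility graph and reports the empirical edge density from (2.2) and the max/min degree ratio from part (b). Definition 3.15 is not reproduced in this section, so the construction used here, namely that each dispatcher-server pair of types $(k,m)$ is connected independently with probability $p_{k,m}$ , is an assumption made for the example and may differ from the paper's irg( $\mathbf{p}$ ). All function names, the zero-based type labels, and the numerical values are hypothetical.

```python
import random


def sample_compat_graph(disp_types, serv_types, p, seed=0):
    """Assumed construction: keep pair (i, j) independently with prob p[k][m],
    where k = disp_types[i] and m = serv_types[j]. Returns a set of (i, j)."""
    rng = random.Random(seed)
    return {
        (i, j)
        for i, k in enumerate(disp_types)
        for j, m in enumerate(serv_types)
        if rng.random() < p[k][m]
    }


def edge_density(edges, disp_types, serv_types, k, m):
    """Empirical |E^N(k,m)| / (|W^N_k| |V^N_m|), the finite-N quantity in (2.2)."""
    n_k = sum(1 for t in disp_types if t == k)
    n_m = sum(1 for t in serv_types if t == m)
    hits = sum(1 for (i, j) in edges if disp_types[i] == k and serv_types[j] == m)
    return hits / (n_k * n_m)


def degree_ratio(edges, disp_types, serv_types, k, m):
    """max / min over type-k dispatchers of deg_w(i, m), cf. Condition 2.1(b)."""
    deg = {i: 0 for i, t in enumerate(disp_types) if t == k}
    for (i, j) in edges:
        if disp_types[i] == k and serv_types[j] == m:
            deg[i] += 1
    lo = min(deg.values())
    # A zero minimum degree makes the ratio undefined; report infinity.
    return float("inf") if lo == 0 else max(deg.values()) / lo


# Hypothetical system: 2 dispatcher types, 2 server types (labels 0 and 1).
disp_types = [0] * 100 + [1] * 100
serv_types = [0] * 200 + [1] * 200
p = [[0.2, 1.0], [0.6, 0.3]]
edges = sample_compat_graph(disp_types, serv_types, p)
for k in range(2):
    for m in range(2):
        print(k, m, edge_density(edges, disp_types, serv_types, k, m),
              degree_ratio(edges, disp_types, serv_types, k, m))
```

For a finite graph the densities only approximate $p_{k,m}$ and the degree ratios only approximate 1; Condition 2.1 concerns the limit $N\to\infty$ .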

State Space. In the $N$ -th system, let $X^{N}_{j}(t)$ be the number of tasks (including those in service) in the queue of server $j\in V^{N}$ at time $t$ . Let $q^{N}_{m,l}(t)$ be the proportion of servers of type $m$ with queue length at least $l$ at time $t$ , namely,

$q^{N}_{m,l}(t)\coloneqq\frac{1}{|V^{N}_{m}|}\sum_{j\in V^{N}_{m}}\mathds{1}_{(X^{N}_{j}(t)\geq l)},\quad t\geq 0,\ m\in\mathcal{M},\ l\in\mathbb{N}_{0}.$ (2.3)

Let $\mathbf{q}^{N}(t)=\big{(}q^{N}_{m,l}(t),m\in\mathcal{M},l\in\mathbb{N}_{0}\big{)}$ . Then $\mathbf{q}^{N}\coloneqq\big{\{}\mathbf{q}^{N}(t)\big{\}}_{0\leq t<\infty}$ is a process with sample paths in $\mathbb{D}([0,\infty),\mathcal{S})$ where

$\mathcal{S}\coloneqq\Big\{\mathbf{q}\in[0,1]^{M\times\mathbb{N}_{0}}:q_{m,0}=1,\ q_{m,l}\geq q_{m,l+1},\text{ and }\sum_{l\in\mathbb{N}_{0}}q_{m,l}<\infty,\ \forall m\in\mathcal{M},\ l\in\mathbb{N}_{0}\Big\}$

is equipped with the $\ell_{1}$ -topology. Note that the space $\mathcal{S}$ is a complete metric space.
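The definition (2.3) can be made concrete with a small Python sketch that computes the occupancy of each server type from a snapshot of queue lengths. The function name `occupancy`, the truncation level `max_len`, and the zero-based type labels are assumptions introduced for this example only.

```python
def occupancy(queue_lengths, server_types, num_types, max_len):
    """Return q with q[m][l] = fraction of type-m servers whose queue length
    is at least l, for l = 0, ..., max_len, following (2.3)."""
    counts = [0] * num_types
    q = [[0.0] * (max_len + 1) for _ in range(num_types)]
    for x, m in zip(queue_lengths, server_types):
        counts[m] += 1
        # A server with queue length x contributes to every level l <= x.
        for l in range(min(x, max_len) + 1):
            q[m][l] += 1.0
    for m in range(num_types):
        if counts[m] == 0:
            raise ValueError(f"no servers of type {m}")
        q[m] = [c / counts[m] for c in q[m]]
    return q


# Four servers: two of type 0 (queues 0 and 2), two of type 1 (queues 1 and 3).
q = occupancy([0, 2, 1, 3], [0, 0, 1, 1], num_types=2, max_len=3)
print(q)
```

Each row of the result starts at 1 and is non-increasing in $l$ , matching the constraints $q_{m,0}=1$ and $q_{m,l}\geq q_{m,l+1}$ in the definition of $\mathcal{S}$ .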

Local JSQ( $d$ ) Policy. For any fixed $d\geq 2$ , each dispatcher uses the JSQ( $d$ ) policy [17, 32] to assign the incoming tasks to servers. To describe the policy, define the neighborhood of dispatcher $i\in W^{N}$ , $\mathcal{N}^{N}_{w}(i):=\{j\in V^{N}:(i,j)\in E^{N}\}$ with $\delta^{N}_{i}=|\mathcal{N}^{N}_{w}(i)|$ . When a new task arrives at the dispatcher $i\in W^{N}$ with $\delta^{N}_{i}\geq d$ , it is immediately assigned to the server with the shortest queue among $d$ servers selected uniformly at random from $\mathcal{N}^{N}_{w}(i)$ . Ties are broken uniformly at random. If $\delta^{N}_{i}<d$ , then the task is assigned to one server selected from $\mathcal{N}^{N}_{w}(i)$ uniformly at random. This $\delta^{N}_{i}<d$ scenario is asymptotically not relevant for us since all the graphs that we will consider have diverging degrees as $N\to\infty$ .
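The routing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `local_jsq_d` and its data layout are hypothetical, and the $d$ servers are sampled without replacement, which is how we read "selected uniformly at random" here given that (3.1) sums over subsets of size $d$ .

```python
import random


def local_jsq_d(neighbors, queues, d, rng):
    """Pick the destination server for one task arriving at a dispatcher.

    neighbors: non-empty list of compatible server ids (the set N_w(i)).
    queues:    mapping from server id to its current queue length.
    rng:       a random.Random instance used for sampling and tie-breaking.
    """
    if len(neighbors) < d:
        # Fewer than d compatible servers: route uniformly at random.
        return rng.choice(neighbors)
    sampled = rng.sample(neighbors, d)  # d distinct servers, uniformly chosen
    shortest = min(queues[j] for j in sampled)
    ties = [j for j in sampled if queues[j] == shortest]
    return rng.choice(ties)  # ties are broken uniformly at random


# Hypothetical snapshot: dispatcher compatible with servers 0..3, d = 2.
rng = random.Random(0)
queues = {0: 5, 1: 0, 2: 3, 3: 0}
print(local_jsq_d([0, 1, 2, 3], queues, 2, rng))
```

Note that the rule uses only queue lengths and the dispatcher's own neighborhood; server speeds never enter, which is what makes the policy speed-oblivious.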

3 Main Results

3.1 Mitigating the Stability Issue

As discussed earlier, when the server speeds are heterogeneous, the fully flexible systems (with the complete bipartite compatibility graph) may not be stable under the JSQ( $d$ ) policy, even if we assume that the sufficient service capacity in (2.1) is satisfied. The next lemma provides a necessary and sufficient condition for ergodicity of the queue length process. Recall $\delta^{N}_{i}=|\mathcal{N}^{N}_{w}(i)|$ . For any fixed $N$ , define

$\rho^{N}\coloneqq\max_{\substack{U\subseteq V^{N}\\ U\neq\emptyset}}\Big\{\Big(\sum_{j\in U}\sum_{m\in\mathcal{M}}\mathds{1}_{(j\in V^{N}_{m})}u_{m}\Big)^{-1}\sum_{i\in W^{N}}\Big(\mathds{1}_{(\delta^{N}_{i}\geq d)}\sum_{\substack{S\subseteq U\cap\mathcal{N}^{N}_{w}(i):\\ |S|=d}}\frac{\lambda}{{\delta^{N}_{i}\choose d}}+\mathds{1}_{(\delta^{N}_{i}<d)}\frac{|U\cap\mathcal{N}^{N}_{w}(i)|}{\delta^{N}_{i}}\Big)\Big\}.$ (3.1)

Lemma 3.1.

The queue length process $\big{(}X_{j}^{N}(t)\big{)}_{j\in V^{N}}$ under the local JSQ( $d$ ) policy is ergodic if and only if $\rho^{N}<1$ .

The above lemma is an immediate consequence of [9, Theorem 2.5]; see also [4]. We omit its proof. Intuitively, $\rho^{N}<1$ means that in the $N$ -th system, for any subset $U$ of servers with possibly long queues (compared to the remaining servers), the total rate at which tasks are assigned to some server in this set must be less than the rate of departure from this set.

Since we are interested in large- $N$ behavior, we will assume a certain asymptotic version of the above stability criterion. This is fairly standard in large-system analysis, as one would want to avoid the ‘heavy-traffic’ regime where $\rho^{N}\uparrow 1$ as $N\to\infty$ . The behavior in the latter scenario is typically qualitatively different from the so-called ‘subcritical’ regime as defined below.

Definition 3.2 (Subcritical Regime).

The sequence $\{G^{N}\}_{N}$ of systems defined as above is said to be in the subcritical regime with asymptotic load $\rho<1$ if $\rho^{N}\to\rho<1$ , as $N\to\infty$ .

Throughout this paper we will assume that the sequence of systems under consideration is in the subcritical regime. From Lemma 3.1, it is immediate that if a sequence of systems is in the subcritical regime, then its queue length process is ergodic for all large enough $N$ . The potential non-ergodicity of fully flexible, heterogeneous server clusters raises the following question: when the sufficient service capacity condition (2.1) is satisfied, can we design the underlying compatibility structure carefully so that the queue length process is ergodic? In other words, can we regain the stability region? Proposition 3.3 below shows that this is indeed the case. In some sense, this highlights the first-order improvements (i.e., in terms of stability properties) of a careful compatibility structure design in contrast to a fully flexible system.

Proposition 3.3.

Let the parameters $\lambda,\xi,d$ and $w_{k},v_{m},u_{m}$ , $k\in\mathcal{K}$ , $m\in\mathcal{M}$ , be such that (2.1) is satisfied. Then there exists $(p_{k,m})_{k\in\mathcal{K},m\in\mathcal{M}}\in[0,1]^{K\times M}$ such that, for any sequence of systems $\{G^{N}\}_{N\geq 1}$ satisfying Condition 2.1, the queue length process $\big{(}X_{j}^{N}(t)\big{)}_{j\in V^{N}}$ is ergodic for all large enough $N$ . Moreover, such a $(p_{k,m})_{k\in\mathcal{K},m\in\mathcal{M}}$ can be obtained explicitly by solving a set of inequalities.

In the following sections, we will demonstrate, in addition to the first-order improvements, how asymptotic queue length distribution can be improved as well, for example, in terms of having a double-exponential decay of tail probabilities.

The proof of Proposition 3.3 is provided in Appendix A. It relies on first building a simple criterion involving the system parameters, which, for a sequence of systems satisfying Condition 2.1, ensures stability for all large enough $N$ (Lemma 3.4). Then we show that given the other parameters, a value of $(p_{k,m})_{k\in\mathcal{K},m\in\mathcal{M}}$ satisfying this criterion can be found by checking the feasibility region defined by $M$ inequalities.

We end this subsection by presenting the above-mentioned simple asymptotic criterion for subcriticality. The proof is given in Appendix A. Denote $\delta_{k}\coloneqq\sum_{m\in\mathcal{M}}p_{k,m}v_{m}$ for each $k\in\mathcal{K}$ .

Lemma 3.4.

Let $\{G^{N}\}_{N}$ be a sequence satisfying Condition 2.1. The sequence of systems is in the subcritical regime if

$\frac{\lambda\xi}{u_{m}}\sum_{k\in\mathcal{K}}\frac{w_{k}p_{k,m}}{\delta_{k}}<1,\quad\text{ for all }m\in\mathcal{M}.$ (3.2)
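The criterion (3.2) is easy to evaluate numerically. The following Python sketch computes its left-hand side for every server type and compares a fully flexible design with a hand-picked sparser one. The function name `subcritical_lhs`, the zero-based type labels, and all numerical values are assumptions chosen for illustration; they are not taken from the paper. Since (3.2) is stated as a sufficient condition, a value above 1 only means that the lemma does not apply, not that the system is necessarily unstable.

```python
def subcritical_lhs(lam, xi, w, v, u, p):
    """Left-hand side of (3.2) for each server type m.

    w[k], v[m]: limiting fractions of dispatcher and server types,
    u[m]: service rates, p[k][m]: compatibility matrix.
    """
    K, M = len(w), len(v)
    # delta_k = sum_m p_{k,m} v_m, as defined just before Lemma 3.4.
    delta = [sum(p[k][m] * v[m] for m in range(M)) for k in range(K)]
    if any(dk <= 0 for dk in delta):
        raise ValueError("every dispatcher type needs a compatible server type")
    return [
        lam * xi / u[m] * sum(w[k] * p[k][m] / delta[k] for k in range(K))
        for m in range(M)
    ]


# Hypothetical parameters: slow servers (rate 0.5) and fast servers (rate 2).
lam, xi = 1.0, 1.0
w, v, u = [0.5, 0.5], [0.5, 0.5], [0.5, 2.0]
assert lam * xi < sum(um * vm for um, vm in zip(u, v))  # capacity condition (2.1)

full = subcritical_lhs(lam, xi, w, v, u, [[1.0, 1.0], [1.0, 1.0]])
designed = subcritical_lhs(lam, xi, w, v, u, [[0.2, 1.0], [0.2, 1.0]])
print("fully flexible:", full)
print("designed p:    ", designed)
```

In this example the fully flexible matrix violates (3.2) for the slow server type, whereas thinning the edges to slow servers brings both left-hand sides below 1, in line with the message of Proposition 3.3.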

3.2 Process-level Limit: IID Case

Our first main result characterizes the process-level limit of the queue-length process $\big{(}X^{N}_{j},j\in V^{N}\big{)}$ , as $N\to\infty$ , when the starting states $\big{\{}X^{N}_{j}(0):j\in V^{N}_{m}\big{\}}$ are i.i.d. for all $m\in\mathcal{M}$ and independent across different $m$ -values. When the sequence of graphs $\{G^{N}\}_{N}$ satisfies a stronger condition, called clustered proportional sparsity (Definition 3.8), the i.i.d. condition can be removed. This is the content of Section 3.3.

Now, note that for a fixed $N\geq 1$ , $\{X^{N}_{j}:j\in V^{N}\}$ is a system of $N$ stochastic processes with mean-field type interactions. Exploiting tools from the theory of weakly interacting particles, we show in Theorem 3.5 below that as the system size becomes large, the queue-length processes converge weakly to those of an infinite system of independent McKean-Vlasov processes $\{X_{j}:j\in\mathbb{N}\}$ (see e.g. [26, 15]). In fact, using a suitable coupling to be described in more detail in Section 4.1, the convergence holds in $L_{2}$ . Although the model description only assumes that certain fractions of servers are of certain types, for ease of describing such processes and the coupling it will be convenient to fix the type of each server $j\in\mathbb{N}$ in this subsection, by defining a membership map $\mathbf{M}:\mathbb{N}\rightarrow\mathcal{M}$ , so that $V^{N}_{m}=\{j\in V^{N}:\mathbf{M}(j)=m\}$ with $\lim_{N\rightarrow\infty}\frac{|V^{N}_{m}|}{N}=v_{m}$ and $V_{m}=\lim_{N\rightarrow\infty}V^{N}_{m}$ for each $m\in\mathcal{M}$ . With such fixed server types and $X_{j}^{N}(0)\equiv X_{j}(0)$ , let

$X_{j}(t)=X_{j}(0)-\int_{0}^{t}\mathds{1}_{(X_{j}(s-)>0)}D_{j}(ds)+\int_{[0,t]\times\mathbb{R}_{+}}\mathds{1}_{(0\leq y\leq C_{j}(s-))}A_{j}(ds\,dy),$ (3.3)

$C_{j}(t)=d\xi\sum_{k\in\mathcal{K}}\frac{p_{k,m}w_{k}}{\delta_{k}}\sum_{(M_{2},...,M_{d})\in\mathcal{M}^{d-1}}h_{t}(j,M_{2},...,M_{d}),$ (3.4)

where $\mathbf{M}(j)=m$ and

$h_{t}(j,M_{2},...,M_{d})=\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}\int_{\mathbb{N}^{d-1}}b\big(X_{j}(t),x_{j_{2}},...,x_{j_{d}}\big)\mu^{M_{2}}_{t}(dx_{j_{2}})\cdots\mu^{M_{d}}_{t}(dx_{j_{d}}),$

$b(\mathbf{x})=b(x_{1},...,x_{d})\coloneqq\sum_{r=1}^{d}\frac{1}{r}\mathds{1}_{(x_{1}=\min_{j\in[d]}x_{j},\ |\operatorname*{arg\,min}_{j\in[d]}x_{j}|=r)},\quad\mathbf{x}=(x_{1},...,x_{d})\in\mathbb{N}_{0}^{d},$ (3.5)

$\mu^{m}_{t}=\mathcal{L}\big(X_{i}(t)\big),\quad\forall\,i\in V_{m},\ m\in\mathcal{M},\ t\geq 0.$

Here, $\{D_{j}:j\in V_{m}\}$ are i.i.d. Poisson processes with rate $u_{m}$ for each $m\in\mathcal{M}$ , $\{A_{j}:j\in\mathbb{N}\}$ are i.i.d. Poisson random measures on $[0,\infty)\times\mathbb{R}_{+}$ with intensity $\lambda dsdy$ , and all $D_{j}$ ’s and $A_{j}$ ’s are independent. Loosely speaking, $A_{j}$ corresponds to the arrival processes and $D_{j}$ corresponds to the departure processes at servers. We note that the existence and uniqueness of solutions to (3.3) and (3.4) can be proved by standard arguments (see e.g., [26, 15]) using the boundedness and Lipschitz property of the functions $b$ and $x\mapsto\mathds{1}_{(x>0)}$ on $\mathbb{N}_{0}$ .
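The function $b$ in (3.5) has a simple reading: it is the probability that the server holding the first coordinate receives the task when the $d$ sampled queue lengths are $x_{1},\ldots,x_{d}$ and ties are broken uniformly. A minimal Python sketch, with the tuple-based calling convention being an assumption of this example:

```python
def b(x):
    """b(x_1, ..., x_d) from (3.5): 1/r if x_1 attains the minimum of x and
    exactly r coordinates attain it, and 0 otherwise."""
    lo = min(x)
    if x[0] != lo:
        return 0.0
    r = sum(1 for xi in x if xi == lo)  # number of coordinates tied at the minimum
    return 1.0 / r


print(b((2, 3, 5)), b((2, 2, 5)), b((3, 2, 5)))
```

Since $b$ only takes values in $[0,1]$ , it is bounded, which is one of the properties used in the existence and uniqueness argument mentioned above.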

Theorem 3.5 (Convergence to McKean-Vlasov process and propagation of chaos).

Consider any fixed $\mathbf{q}^{\infty}=\big{(}q^{\infty}_{m,l},m\in\mathcal{M},l\in\mathbb{N}_{0}\big{)}\in\mathcal{S}$ . Assume that all $X^{N}_{j}(0)$ ’s are independent, and for each $m\in\mathcal{M}$ , $\big{\{}X^{N}_{j}(0):j\in V^{N}_{m}\big{\}}$ is i.i.d. with $\mathbb{P}\big{(}X^{N}_{j}(0)\geq l\big{)}=q^{\infty}_{m,l}$ , $l\in\mathbb{N}_{0}$ . On any finite time interval $[0,T]$ , $T>0$ , for any $m\in\mathcal{M}$ and $j\in V_{m}$ , the queue length process $X^{N}_{j}(\cdot)$ at server $j$ weakly converges to the process $X_{j}(\cdot)$ in (3.3). In fact, one can suitably couple $X^{N}_{j}$ with $X_{j}$ such that

\max_{j\in V^{N}}\mathbb{E}\left\lVert X^{N}_{j}-X_{j}\right\rVert^{2}_{*,T}\xrightarrow{N\rightarrow\infty}0

(3.6)

and hence the propagation of chaos property holds, that is, for any $n\in\mathbb{N}$ and distinct $j_{h}\in V_{M_{h}}$ , $h=1,\dotsc,n$ ,

\mathcal{L}(X_{j_{1}}^{N},\dotsc,X_{j_{n}}^{N})\xrightarrow{N\rightarrow\infty}\mathcal{L}(X_{j_{1}},\dotsc,X_{j_{n}})=\mu^{M_{1}}\otimes\dotsb\otimes\mu^{M_{n}}.

(3.7)

Theorem 3.5 gives us the limit law of all individual queues. Next, in Theorem 3.6, we will show how such a server-level convergence can be used to obtain a convergence result for the global occupancy process $\mathbf{q}^{N}(\cdot)$ to a deterministic dynamical system, which was our primary goal. The proofs of Theorem 3.5 and Theorem 3.6 are provided in Section 4.

Theorem 3.6 (Process-level convergence for i.i.d. starting state).

Assume that all $X^{N}_{j}(0)$ ’s are independent, and for each $m\in\mathcal{M}$ , $\{X^{N}_{j}(0):j\in V^{N}_{m}\}$ is i.i.d. with $\mathbb{P}(X^{N}_{j}(0)\geq l)=q^{\infty}_{m,l}$ , $l\in\mathbb{N}_{0}$ for some $\mathbf{q}^{\infty}=(q^{\infty}_{m,l},m\in\mathcal{M},l\in\mathbb{N}_{0})\in\mathcal{S}$ . Then on any finite time interval, the occupancy process $\mathbf{q}^{N}(\cdot)$ converges weakly with respect to Skorokhod $J_{1}$ topology to the deterministic limit $\mathbf{q}(\cdot)\coloneqq(q_{m,l}(\cdot),m\in\mathcal{M},l\in\mathbb{N}_{0})$ given by the unique solution to the following system of ODEs: For all $m\in\mathcal{M}$ , $q_{m,0}(t)=1$ , $q_{m,l}(0)=q^{\infty}_{m,l}$ , and

	$\displaystyle\frac{dq_{m,l}(t)}{dt}$	$\displaystyle=-u_{m}\big{(}q_{m,l}(t)-q_{m,l+1}(t)\big{)}$
		$\displaystyle\quad+\lambda\xi\big{(}q_{m,l-1}(t)-q_{m,l}(t)\big{)}\sum_{k\in\mathcal{K}}\frac{p_{k,m}w_{k}}{\delta_{k}}\frac{(\tilde{q}_{k,l-1}(t))^{d}-(\tilde{q}_{k,l}(t))^{d}}{\tilde{q}_{k,l-1}(t)-\tilde{q}_{k,l}(t)},\quad\forall l\in\mathbb{N}.$		(3.8)

where $\tilde{q}_{k,l}(t)=\sum_{m\in\mathcal{M}}\frac{v_{m}p_{k,m}}{\delta_{k}}q_{m,l}(t)$ for all $k\in\mathcal{K}$ .
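As an illustration of how the system (3.8) could be explored numerically, the sketch below evaluates its right-hand side and integrates it with forward Euler. It is a hedged sketch under several assumptions that are not in the paper: queue lengths are truncated at a finite level $L$ (with $q_{m,L+1}$ treated as $0$), parameters are passed as plain Python lists, and the function names `power_ratio`, `mean_field_rhs` and `euler_solve` are hypothetical. The ratio $\big((\tilde{q}_{k,l-1})^{d}-(\tilde{q}_{k,l})^{d}\big)/(\tilde{q}_{k,l-1}-\tilde{q}_{k,l})$ is written as a finite geometric-type sum so that it remains defined when the two tails coincide.

```python
def power_ratio(a, c, d):
    """(a**d - c**d) / (a - c), written as a finite sum so it is defined when a == c."""
    return sum(a ** i * c ** (d - 1 - i) for i in range(d))

def mean_field_rhs(q, lam, xi, d, u, v, w, p):
    """Right-hand side of (3.8), truncated at queue length L = len(q[0]) - 1.

    q[m][l] stands for q_{m,l} with q[m][0] == 1, and p[k][m] for p_{k,m};
    u and v are indexed by server type m, w by dispatcher type k.
    Truncation assumption (not in the paper): q_{m,L+1} is treated as 0.
    """
    M, K, L = len(v), len(w), len(q[0]) - 1
    delta = [sum(p[k][m] * v[m] for m in range(M)) for k in range(K)]
    # q_tilde[k][l]: weighted tail seen by a type-k dispatcher
    q_tilde = [[sum(v[m] * p[k][m] / delta[k] * q[m][l] for m in range(M))
                for l in range(L + 1)] for k in range(K)]
    dq = [[0.0] * (L + 1) for _ in range(M)]
    for m in range(M):
        for l in range(1, L + 1):
            q_next = q[m][l + 1] if l < L else 0.0
            weight = sum(p[k][m] * w[k] / delta[k]
                         * power_ratio(q_tilde[k][l - 1], q_tilde[k][l], d)
                         for k in range(K))
            dq[m][l] = (-u[m] * (q[m][l] - q_next)
                        + lam * xi * (q[m][l - 1] - q[m][l]) * weight)
    return dq

def euler_solve(q0, horizon, dt, **params):
    """Forward-Euler integration of the truncated system up to time `horizon`."""
    q = [row[:] for row in q0]
    for _ in range(int(round(horizon / dt))):
        dq = mean_field_rhs(q, **params)
        for m in range(len(q)):
            for l in range(1, len(q[m])):
                q[m][l] += dt * dq[m][l]
    return q
```

With a single server type, a single dispatcher type and all of $\xi$, $u$, $v$, $w$, $p$ equal to $1$, (3.8) reduces to $\frac{dq_{l}}{dt}=-(q_{l}-q_{l+1})+\lambda(q_{l-1}^{d}-q_{l}^{d})$, so the right-hand side computed above should vanish (up to truncation and rounding) at $q_{l}=\lambda^{\frac{d^{l}-1}{d-1}}$.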

Remark 3.7.

Using the propagation of chaos property (3.7) and the fact that $\{X_{j}(t):j\in\mathbb{N}\}$ is independent and $\{X_{j}(t):j\in V_{m}\}$ is i.i.d. for each $m\in\mathcal{M}$ , it follows that the limit of the global occupancy process at any time instant $t$ , in fact, corresponds to the laws of $X_{j}(t)$ for each type of servers $j$ in (3.3), that is,

\mu_{t}^{m}[l,\infty)=\mathbb{P}(X_{j}(t)\geq l)=q_{m,l}(t),\quad j\in V_{m},m\in\mathcal{M},l\in\mathbb{N}_{0},t\geq 0.

3.3 Process-level Limit: General Case

Theorem 3.6 requires the strong assumption that for each $m\in\mathcal{M}$ , $X^{N}_{j}(0)$ , $j\in V^{N}_{m}$ , are i.i.d. To establish the interchange of limits, we need to relax this assumption on the initial states. This is because the argument for the interchange of limits involves initiating the prelimit system at the steady state and then showing that as $N\to\infty$ , the system must converge to the unique fixed point of the limiting ODE. This requires us to characterize the (process-level) limiting trajectory of the system starting from an arbitrary occupancy state, which we do in this section.

Intuitively, the i.i.d. assumption in Theorems 3.5 and 3.6 ensures that the local occupancy observed by any dispatcher $i\in W^{N}_{k}$ , $k\in\mathcal{K}$ is ‘close’, in a suitable sense, to the average occupancy of the entire system. This phenomenon can be ensured asymptotically, even without the i.i.d. assumption, if the graph sequence satisfies a property we call clustered proportional sparsity. This notion was first introduced for homogeneous systems in [22]. The definition below is a modified notion that is suitable for the current heterogeneous setting.

Definition 3.8 (Clustered Proportional Sparsity).

Recall $\mathcal{N}^{N}_{w}(i)=\{j\in V^{N}:(i,j)\in E^{N}\}$ . The sequence $\{G^{N}\}_{N}$ is called clustered proportionally sparse if for any $\varepsilon>0$ ,

\sup_{k\in\mathcal{K}}\sup_{U\subseteq V^{N}}\Big{|}\Big{\{}i\in W^{N}_{k}:\Big{|}\frac{|\mathcal{N}^{N}_{w}(i)\cap U|}{|\mathcal{N}^{N}_{w}(i)|}-\frac{|E^{N}_{k}(U)|}{|E^{N}_{k}(V^{N})|}\Big{|}\geq\varepsilon\Big{\}}\Big{|}/|W^{N}_{k}|\xrightarrow{N\rightarrow\infty}0,

(3.9)

where $E^{N}_{k}(U)\coloneqq\{(i,j)\in W^{N}_{k}\times U:(i,j)\in E^{N}\}$ .
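The quantity inside the supremum in (3.9) can be computed directly for a given finite graph and a fixed test set $U$. The sketch below does this for one dispatcher type. It is only illustrative: the supremum over all $U\subseteq V^{N}$ is not evaluated, and the data layout (a mapping `neighbors` from dispatcher to its set of compatible servers) and the name `local_deviation_fraction` are assumptions made here.

```python
def local_deviation_fraction(neighbors, dispatchers_k, U, eps):
    """Fraction of type-k dispatchers whose local share of U deviates from
    the type-k edge share |E_k(U)| / |E_k(V)| by at least eps, as in (3.9).

    neighbors[i] is the set of servers adjacent to dispatcher i;
    dispatchers_k lists the dispatchers of one type k (a fixed U only).
    """
    U = set(U)
    edges_to_U = sum(len(neighbors[i] & U) for i in dispatchers_k)
    edges_total = sum(len(neighbors[i]) for i in dispatchers_k)
    global_share = edges_to_U / edges_total
    deviating = sum(
        1 for i in dispatchers_k
        if abs(len(neighbors[i] & U) / len(neighbors[i]) - global_share) >= eps
    )
    return deviating / len(dispatchers_k)
```

On a complete bipartite graph every dispatcher sees exactly the global share, so the function returns 0 for any $U$ and any $\varepsilon>0$. When dispatchers of the same type have disjoint neighborhoods, a test set equal to one of those neighborhoods makes every dispatcher deviate.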

Remark 3.9.

We can view the subset $U$ in the definition as a test set, say $U=\mathcal{Q}^{N}_{m,l}(t)$ , where $\mathcal{Q}^{N}_{m,l}(t)$ is the set of type $m\in\mathcal{M}$ servers with queue length at least $l\in\mathbb{N}_{0}$ at time $t$ . Hence, Definition 3.8 ensures that for all but $o(N)$ dispatchers, the empirical queue length distribution observed within a dispatcher's neighborhood is close to the global weighted empirical queue length distribution (Definition 4.6) of its corresponding type. Then, the global occupancy process evolves similarly to (and converges to the same limit as) the case when the initial states are i.i.d.

Theorem 3.10 (Process-level convergence).

Let $\{G^{N}\}_{N}$ be a clustered proportionally sparse sequence of graphs. Assume that $\mathbf{q}^{N}(0)$ weakly converges to $\mathbf{q}^{\infty}\in\mathcal{S}$ . Then on any finite time interval, the occupancy process $\mathbf{q}^{N}(\cdot)$ converges weakly with respect to the Skorokhod $J_{1}$ topology to the deterministic limit $\mathbf{q}(\cdot)\coloneqq(q_{m,l}(\cdot),m\in\mathcal{M},l\in\mathbb{N}_{0})$ given by the unique solution to the system of ODEs defined by (3.8) with initial state $\mathbf{q}(0)=\big{(}q^{\infty}_{m,l},m\in\mathcal{M},l\in\mathbb{N}_{0}\big{)}$ .

The proof of Theorem 3.10 is given in Section 4.4.

3.4 Convergence of Steady States

In the last section, we showed the process-level convergence of the global occupancy process $\mathbf{q}^{N}(\cdot)$ to a mean-field limit $\mathbf{q}(\cdot)$ . In this section, we will establish the convergence of the sequence of stationary distributions to the unique fixed point of the mean-field limit by establishing the interchange of large- $N$ and large- $t$ limits: $\lim_{t\rightarrow\infty}\lim_{N\rightarrow\infty}\mathbf{q}^{N}(t)=\lim_{N\rightarrow\infty}\lim_{t\rightarrow\infty}\mathbf{q}^{N}(t)$ . Throughout this section, we will assume that the sequence of systems is in the subcritical regime (recall Definition 3.2). The first result below states that the limiting system of ODEs has a unique fixed point $\mathbf{q}^{*}$ and it satisfies the global stability property, i.e., for any initial point $\mathbf{q}(0)\in\mathcal{S}$ , $\lim_{t\rightarrow\infty}\mathbf{q}(t)=\mathbf{q}^{*}$ .

Theorem 3.11 (Global stability).

Let $\bar{\mathbf{q}}(t,\mathbf{q}_{0})$ be the solution to the system of ODEs in (3.8) with the initial point $\mathbf{q}(0)=\mathbf{q}_{0}\in\mathcal{S}$ . Then there exists a unique fixed point $\mathbf{q}^{*}=\big{(}q^{*}_{m,l},m\in\mathcal{M},l\in\mathbb{N}_{0}\big{)}\in\mathcal{S}$ such that

\lim_{t\rightarrow\infty}\bar{\mathbf{q}}(t,\mathbf{q}_{0})=\mathbf{q}^{*}.

The proof of Theorem 3.11 is given in Section 5. It relies on a monotonicity property of the system, which ensures that for two processes $\mathbf{q}^{1}(\cdot)$ and $\mathbf{q}^{2}(\cdot)$ , if $\mathbf{q}^{1}(0)\leq\mathbf{q}^{2}(0)$ , then $\mathbf{q}^{1}(t)\leq\mathbf{q}^{2}(t)$ for all $t\geq 0$ (see [24, 14]).

The last ingredient that we need in order to prove the interchange of limits is tightness of the sequence of random variables $\{\mathbf{q}^{N}(\infty)\}_{N\geq 1}$ under a suitable metric, where $\mathbf{q}^{N}(\infty):=\lim_{t\to\infty}\mathbf{q}^{N}(t)$ . Here, as before, we should note that the process $(\mathbf{q}^{N}(t))_{t\geq 0}$ is not Markovian. Therefore, the random variable $\mathbf{q}^{N}(\infty)$ should be interpreted as the functional applied to the steady-state system. The tightness result is stated in the next theorem.

Theorem 3.12 (Tightness).

For any $\varepsilon>0$ , there exists a compact subset $\bar{K}(\varepsilon)\subseteq\mathcal{S}$ , when $\mathcal{S}$ is equipped with the $\ell_{1}$ -topology, such that

\mathbb{P}(\mathbf{q}^{N}(\infty)\notin\bar{K}(\varepsilon))<\varepsilon,\quad\forall N\geq 1.

Theorem 3.12 is proved in Section 5. The key idea is to use a Lyapunov function approach to bound the expected sum of tails $q^{N}_{m,l}(\infty)$ . Combining Theorems 3.10, 3.11, and 3.12, we can prove the following interchange of limits result.

Theorem 3.13 (Convergence of steady states).

Let $\{G^{N}\}_{N\geq 1}$ be a clustered proportionally sparse sequence of graphs satisfying Condition 2.1. Then the sequence of random variables $\{\mathbf{q}^{N}(\infty)\}_{N\geq 1}$ converges weakly to $\mathbf{q}^{*}$ , the unique fixed point of the system of ODEs in (3.8).

One major discovery about the JSQ( $d$ ) policy for the classical, homogeneous, fully flexible system is that the limit of the stationary distribution (which, in our case, is given by $\mathbf{q}^{*}$ ) has a doubly exponentially decaying tail [17, 32] for any $d\geq 2$ . This is in sharp contrast with the (single) exponential decay of the corresponding tail for random routing or $d=1$ . In fact, in this case, for any $d\geq 2$ , $\mathbf{q}^{*}$ can be characterized explicitly as: $q^{*}_{l}=\lambda^{\frac{d^{l}-1}{d-1}}$ , where $q_{l}^{*}$ is the (limiting) steady-state fraction of servers with queue length at least $l=1,2,\ldots$ . In the current case of heterogeneous systems, it is intractable to characterize the fixed point $\mathbf{q}^{*}$ explicitly. However, as stated in the next theorem, we can still prove that the doubly exponential decay of the tails $q^{*}_{m,l}$ holds for each $m\in\mathcal{M}$ .

Theorem 3.14 (Double-exponential tail decay).

Let $\mathbf{q}^{*}=\big{(}q^{*}_{m,l},m\in\mathcal{M},l\in\mathbb{N}_{0}\big{)}$ be the unique fixed point of the system of ODEs in (3.8). Then, for all $m\in\mathcal{M}$ , the sequence $\big{\{}q^{*}_{m,l},l\in\mathbb{N}_{0}\big{\}}$ decreases doubly exponentially, i.e., there exist constants $l_{m}\in\mathbb{N}_{0}$ , $a_{m}\in(0,1)$ and $b_{m}>0$ such that for all $l\geq l_{m}$ ,

q^{*}_{m,l}\leq b_{m}a_{m}^{d^{l}}.

(3.10)
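For the classical homogeneous, fully flexible case recalled before Theorem 3.14, the contrast between single and double exponential decay can be tabulated with a few lines of Python. This sketch only evaluates the explicit classical formula $q^{*}_{l}=\lambda^{\frac{d^{l}-1}{d-1}}$ (and $\lambda^{l}$ for $d=1$); it does not compute the heterogeneous fixed point, and the function name `classical_tail` is a hypothetical choice.

```python
def classical_tail(lam, d, l):
    """Classical tail q*_l: lam ** ((d**l - 1) / (d - 1)) for d >= 2,
    and lam ** l for random routing (d = 1)."""
    if d == 1:
        return lam ** l
    # (d**l - 1) is always divisible by (d - 1), so integer division is exact
    return lam ** ((d ** l - 1) // (d - 1))

def tail_table(lam, d_values, max_level):
    """Rows (l, tail for each d in d_values) for l = 1, ..., max_level."""
    return [(l,) + tuple(classical_tail(lam, d, l) for d in d_values)
            for l in range(1, max_level + 1)]
```

In this special case the bound (3.10) holds with equality for $a=\lambda^{1/(d-1)}$ and $b=\lambda^{-1/(d-1)}$, since $\lambda^{\frac{d^{l}-1}{d-1}}=\lambda^{-\frac{1}{d-1}}\big(\lambda^{\frac{1}{d-1}}\big)^{d^{l}}$.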

3.5 Simple Data Locality Design using Randomization

Sections 3.1–3.4 characterize the performance of the occupancy process for an arbitrary deterministic sequence of systems where the underlying graph sequence satisfies certain properties. In particular, Condition 2.1 and Definition 3.8 provide sufficient criteria under which both the process-level convergence (Theorem 3.10) and the interchange of limits (Theorem 3.13) hold. In this section, we show that graphs satisfying these criteria can be obtained easily if the compatibility graph is suitably randomized. Given the asymptotic edge-density parameters in Condition 2.1, we define a certain sequence of inhomogeneous random graphs or irg as follows.

Definition 3.15 (irg( $\mathbf{p}$ )).

Given $\mathbf{p}\coloneqq(p_{k,m},k\in\mathcal{K},m\in\mathcal{M})$ , the $N$ -th system of irg( $\mathbf{p}$ ) is constructed as follows: For any $k\in\mathcal{K}$ and $m\in\mathcal{M}$ , each dispatcher $i\in W^{N}_{k}$ and each server $j\in V^{N}_{m}$ share an edge with probability $p_{k,m}$ , independently across all dispatcher-server pairs.
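The construction in Definition 3.15 translates directly into a sampling routine. The following sketch draws one irg( $\mathbf{p}$ ) graph for given type assignments; the representation of types as integer lists, the return format, and the name `sample_irg` are assumptions made for illustration.

```python
import random

def sample_irg(dispatcher_types, server_types, p, seed=None):
    """Sample one irg(p) compatibility graph.

    dispatcher_types[i] = k and server_types[j] = m give the types, and
    p[k][m] is the edge probability. Returns a list whose i-th entry is
    the set of servers adjacent to dispatcher i.
    """
    rng = random.Random(seed)
    neighbors = []
    for k in dispatcher_types:
        # each dispatcher-server pair is connected independently
        row = {j for j, m in enumerate(server_types) if rng.random() < p[k][m]}
        neighbors.append(row)
    return neighbors
```

Passing a fixed `seed` makes the sample reproducible, which is convenient when comparing several designs $\mathbf{p}$ on graphs of the same size.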

For any $\mathbf{p}$ for which the asymptotic stability criterion holds, we have the following result for the sequence of irg( $\mathbf{p}$ ).

Theorem 3.16.

Let $\mathbf{p}=(p_{k,m},k\in\mathcal{K},m\in\mathcal{M})$ be such that the stability criterion in (3.2) holds and $\{G_{N}\}_{N\geq 1}$ be a sequence of irg( $\mathbf{p}$ ) with increasing $N$ . Then the conclusions of Theorem 3.10 and Theorem 3.13 hold for $\{G_{N}\}_{N\geq 1}$ .

The proof of Theorem 3.16 is provided in Appendix I. It relies on verifying that the sequence of irg( $\mathbf{p}$ ) graphs satisfies Condition 2.1 and the property of clustered proportional sparsity almost surely. The verification involves using concentration of measure arguments to establish structural properties of the compatibility graphs.

4 Proof of Transient Limit Results

In this section, we prove the transient limit results, Theorems 3.5, 3.6, and 3.10, in Sections 4.2, 4.3, and 4.4, respectively. We start by proving a few auxiliary results in Section 4.1.

4.1 Auxiliary Results

First, we will need a characterization of the evolution of the queue length process at each server. To describe this evolution, let us introduce the following notations:

$\displaystyle\texttt{set}^{N}(j)$	$\displaystyle\coloneqq\Big{\{}(j_{2},...,j_{d})\in[N]^{d-1}:(j,j_{2},\ldots,j_{d})\text{ are distinct}\Big{\}},$	(4.1)
$\displaystyle\texttt{sett}^{N}(j)$	$\displaystyle\coloneqq\Big{\{}(j_{2},\ldots,j_{d},j^{\prime}_{2},\ldots,j^{\prime}_{d})\in[N]^{2d-2}:(j_{2},...,j_{d})\in\texttt{set}^{N}{(j)},(j^{\prime}_{2},...,j^{\prime}_{d})\in\texttt{set}^{N}{(j)},$
	$\displaystyle\hskip 199.16928pt(j_{2},...,j_{d})\cap(j^{\prime}_{2},...,j^{\prime}_{d})\neq\emptyset\Big{\}}.$	(4.2)

To represent the graph, define the edge occupancy $\xi^{N}_{i,j}$ to be the binary variable:

\xi^{N}_{i,j}=\begin{cases}1,&\text{if}\ (i,j)\in E^{N},\\ 0,&\text{otherwise},\end{cases}\quad\mbox{for all }i\in W^{N},j\in V^{N}.

Recall the function $b$ , Poisson processes $\{D_{j}\}$ and Poisson random measures $\{A_{j}\}$ in and below (3.5). By Condition 2.1, for all large enough $N$ , all dispatchers in the $N$ -th system have at least $d$ neighbors. Hence, WLOG, in the rest of this section, we will only consider the case $\delta^{N}_{i}\geq d$ , $\forall i\in W^{N}$ . In that case, due to the Poisson thinning property, note that we can write $X^{N}_{j}(t)$ as follows:

X^{N}_{j}(t)=X^{N}_{j}(0)-\int_{0}^{t}\mathds{1}_{\big{(}X^{N}_{j}(s-)>0\big{)}}D_{j}(ds)+\int_{[0,t]\times\mathbb{R}_{+}}\mathds{1}_{\big{(}0\leq y\leq C^{N}_{j}(s-)\big{)}}A_{j}(dsdy),

(4.3)

where

	$\displaystyle C^{N}_{j}(s)$	$\displaystyle=\sum_{i\in W^{N}}\xi^{N}_{i,j}\sum_{(j_{2},...,j_{d})\in\texttt{set}^{N}(j)}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}b\big{(}X^{N}_{j}(s),X^{N}_{j_{2}}(s),...,X^{N}_{j_{d}}(s)\big{)}$		(4.4)
		$\displaystyle=\sum_{k\in\mathcal{K}}\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\sum_{(j_{2},...,j_{d})\in\texttt{set}^{N}(j)}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}b\big{(}X^{N}_{j}(s),X^{N}_{j_{2}}(s),...,X^{N}_{j_{d}}(s)\big{)}.$

For each dispatcher $i\in W^{N}$ , the corresponding summand in the first line of (4.4) represents the probability that a job arriving at dispatcher $i$ is assigned to server $j\in V^{N}$ , given the state $\big{(}X^{N}_{j},j\in V^{N}\big{)}$ . Moreover, by Condition 2.1, $C^{N}_{j}(t)$ can be bounded above by a constant, uniformly in $j\in V^{N}$ and $t\geq 0$ , for all large enough $N$ ; this is stated in Lemma 4.2 below.
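For small graphs, the quantity $C^{N}_{j}$ in (4.4) can be evaluated by brute-force enumeration of the ordered tuples in $\texttt{set}^{N}(j)$ restricted to the neighbors of each dispatcher. The sketch below does exactly that. It is an illustrative assumption-laden example: the names `b` and `assignment_rate`, the list-of-sets graph representation, and the use of exact fractions are choices made here, and the enumeration is only practical for small $d$ and small neighborhoods.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial

def b(x):
    """Tie-breaking weight from (3.5) for the first entry of x."""
    smallest = min(x)
    if x[0] != smallest:
        return Fraction(0)
    return Fraction(1, sum(1 for value in x if value == smallest))

def assignment_rate(j, neighbors, X, d):
    """C^N_j from (4.4): neighbors[i] is the set of servers adjacent to
    dispatcher i (each assumed to have at least d neighbors) and X[s] is
    the queue length at server s."""
    total = Fraction(0)
    for servers in neighbors:
        if j not in servers:
            continue                                  # xi_{i,j} = 0
        norm = comb(len(servers), d) * factorial(d - 1)
        others = [s for s in servers if s != j]
        # ordered (d-1)-tuples of distinct neighbors other than j
        for tup in permutations(others, d - 1):
            total += b([X[j]] + [X[s] for s in tup]) / norm
    return total
```

A useful sanity check is that, for a single dispatcher, the values `assignment_rate(j, ...)` summed over all servers `j` equal 1, since every arriving task is assigned to exactly one server; with several dispatchers the sum equals the number of dispatchers.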

For estimates such as the bound on $C^{N}_{j}$ , we need to bound the number of neighbors of servers and dispatchers uniformly. Such uniformity is stated in Lemma 4.1 and is a direct consequence of Condition 2.1. Recall $\delta^{N}_{i}=|\mathcal{N}^{N}_{w}(i)|$ and $\delta_{k}=\sum_{m\in\mathcal{M}}p_{k,m}v_{m}$ .

Lemma 4.1.

For each $k\in\mathcal{K}$ ,

\lim_{N\rightarrow\infty}\max_{i\in W^{N}_{k}}\frac{\deg^{N}_{w}(i,m)}{|V^{N}_{m}|}=\lim_{N\rightarrow\infty}\min_{i\in W^{N}_{k}}\frac{\deg^{N}_{w}(i,m)}{|V^{N}_{m}|}=p_{k,m},\quad m\in\mathcal{M},

(4.5)

and

\lim_{N\rightarrow\infty}\max_{i\in W^{N}_{k}}\frac{\delta^{N}_{i}}{N}=\lim_{N\rightarrow\infty}\min_{i\in W^{N}_{k}}\frac{\delta^{N}_{i}}{N}=\delta_{k}.

(4.6)

Also, for each $m\in\mathcal{M}$ ,

\lim_{N\rightarrow\infty}\max_{j\in V^{N}_{m}}\frac{\deg^{N}_{v}(k,j)}{|W^{N}_{k}|}=\lim_{N\rightarrow\infty}\min_{j\in V^{N}_{m}}\frac{\deg^{N}_{v}(k,j)}{|W^{N}_{k}|}=p_{k,m},\quad k\in\mathcal{K}.

(4.7)

Lemma 4.2.

For all large enough $N$ , we have that for any $m\in\mathcal{M}$ , $j\in V^{N}_{m}$ , and $t\geq 0$ ,

C^{N}_{j}(t)\leq 2\xi d\sum_{k\in\mathcal{K}}\frac{p_{k,m}w_{k}}{\delta_{k}}.

(4.8)

Proof.

By the definition of $C^{N}_{j}(t)$ , for any $t\geq 0$ and large enough $N$ ,

C^{N}_{j}(t)\leq\sum_{k\in\mathcal{K}}\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\sum_{(j_{2},...,j_{d})\in\texttt{set}^{N}(j)}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}=\sum_{k\in\mathcal{K}}\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\frac{{\delta^{N}_{i}-1\choose d-1}}{{\delta^{N}_{i}\choose d}}\leq 2\xi d\sum_{k\in\mathcal{K}}\frac{p_{k,m}w_{k}}{\delta_{k}},

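where the first inequality is due to $b(\cdot)\leq 1$ , the equality holds because, for a dispatcher $i$ adjacent to $j$ , there are ${\delta^{N}_{i}-1\choose d-1}(d-1)!$ ordered tuples $(j_{2},...,j_{d})$ of distinct neighbors of $i$ other than $j$ , and the last inequality comes from Lemma 4.1 together with ${\delta^{N}_{i}-1\choose d-1}/{\delta^{N}_{i}\choose d}=d/\delta^{N}_{i}$ . ∎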

By Lemma 4.1, we know that the neighborhoods of dispatchers of the same type are almost the same. As the system size grows, the local graph structure of each dispatcher of a given type converges to the average one. The following two lemmas give the necessary approximations of the graph structure for large- $N$ systems. Their proofs are combinatorial and are based on Condition 2.1 and Lemma 4.1. They are provided in Appendix B.

Lemma 4.3.

Consider a sequence $\{G^{N}\}_{N}$ satisfying Condition 2.1. For each $m\in\mathcal{M}$ ,

\begin{split}\max_{j\in V^{N}_{m}}\max_{k\in\mathcal{K}}\max_{(M_{2},...,M_{d})\in\mathcal{M}^{d-1}}\Big{|}&\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\sum_{\begin{subarray}{c}(j_{2},...,j_{d})\in\texttt{set}^{N}(j)\\ s.t.\quad j_{2}\in V^{N}_{M_{2}},...,j_{d}\in V^{N}_{M_{d}}\end{subarray}}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}\\ &\hskip 113.81102pt-\xi d\frac{p_{k,m}w_{k}}{\delta_{k}}\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}\Big{|}\xrightarrow{N\rightarrow\infty}0.\end{split}

(4.9)

Lemma 4.4.

Consider any $m\in\mathcal{M}$ and $j\in V_{m}$ . For large enough $N$ ,

\begin{split}&\sum_{i\in W^{N}}\sum_{\texttt{sett}^{N}{(j)}}\frac{\xi^{N}_{i,j}\times\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}\frac{\xi^{N}_{i,j}\times\xi^{N}_{i,j^{\prime}_{2}}\times\cdots\times\xi^{N}_{i,j^{\prime}_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}\leq\frac{C_{1}}{N^{2}}\end{split}

(4.10)

where $C_{1}$ is a positive constant. Similarly,

\begin{split}&\sum_{\begin{subarray}{c}i_{1},i_{2}\in W^{N},\\ i_{1}\neq i_{2}\end{subarray}}\sum_{\texttt{sett}^{N}{(j)}}\frac{\xi^{N}_{i_{1},j}\times\xi^{N}_{i_{1},j_{2}}\times\cdots\times\xi^{N}_{i_{1},j_{d}}}{{\delta^{N}_{i_{1}}\choose d}(d-1)!}\frac{\xi^{N}_{i_{2},j}\times\xi^{N}_{i_{2},j^{\prime}_{2}}\times\cdots\times\xi^{N}_{i_{2},j^{\prime}_{d}}}{{\delta^{N}_{i_{2}}\choose d}(d-1)!}\leq\frac{C_{2}}{N}\end{split}

(4.11)

where $C_{2}$ is a positive constant.

4.2 Convergence to McKean-Vlasov Process: IID Case

Proof of Theorem 3.5.

It suffices to prove (3.6). Fix any $m\in\mathcal{M}$ , $j\in V_{m}$ and $T>0$ . We have that for any fixed $t\in[0,T]$ and any $N$ s.t. $j\in V^{N}$ ,

$\displaystyle\mathbb{E}\left\lVert X^{N}_{j}-X_{j}\right\rVert^{2}_{*,t}$	$\displaystyle\leq c_{0}\mathbb{E}\left\lVert X^{N}_{j}(t)-X_{j}(t)\right\rVert^{2}$
	$\displaystyle\leq c_{1}\mathbb{E}\int_{0}^{t}\|\mathds{1}_{(X^{N}_{j}(s)>0)}-\mathds{1}_{(X_{j}(s)>0)}\|^{2}ds+c_{1}\mathbb{E}\Big{(}\int_{0}^{t}\|\mathds{1}_{(X^{N}_{j}(s)>0)}-\mathds{1}_{(X_{j}(s)>0)}\|ds\Big{)}^{2}$
	$\displaystyle\quad+c_{1}\mathbb{E}\int_{[0,t]\times\mathbb{R}_{+}}\|\mathds{1}_{(0\leq y\leq C^{N}_{j}(s))}-\mathds{1}_{(0\leq y\leq C_{j}(s))}\|^{2}dsdy$
	$\displaystyle\quad+c_{1}\mathbb{E}\Big{(}\int_{[0,t]\times\mathbb{R}_{+}}\|\mathds{1}_{(0\leq y\leq C^{N}_{j}(s))}-\mathds{1}_{(0\leq y\leq C_{j}(s))}\|dsdy\Big{)}^{2}$
	$\displaystyle\leq c_{1}\mathbb{E}\int_{0}^{t}\|X^{N}_{j}(s)-X_{j}(s)\|^{2}ds+c_{1}\mathbb{E}\Big{(}\int_{0}^{t}\|X^{N}_{j}(s)-X_{j}(s)\|ds\Big{)}^{2}$
	$\displaystyle\hskip 56.9055pt+c_{1}\mathbb{E}\int_{0}^{t}\|C^{N}_{j}(s)-C_{j}(s)\|^{2}ds+c_{1}\mathbb{E}\Big{(}\int_{0}^{t}\|C^{N}_{j}(s)-C_{j}(s)\|ds\Big{)}^{2}$
	$\displaystyle\leq c_{2}\int_{0}^{t}\mathbb{E}\|X^{N}_{j}(s)-X_{j}(s)\|^{2}ds+c_{2}\int_{0}^{t}\mathbb{E}\|C^{N}_{j}(s)-C_{j}(s)\|ds,$	(4.12)

where $c_{0}$ , $c_{1}$ and $c_{2}$ are positive constants. The first two inequalities are by Doob’s inequalities and Cauchy-Schwarz, respectively. The last inequality is due to Lemma 4.1. By adding and subtracting terms, we have

\begin{split}|C^{N}_{j}&(s)-C_{j}(s)|\leq|C^{N}_{j}(s)-C^{N,1}_{j}(s)|+|C^{N,1}_{j}(s)-C^{N,2}_{j}(s)|+|C^{N,2}_{j}(s)-C_{j}(s)|,\end{split}

(4.13)

where

	$\displaystyle C^{N,1}_{j}$	$\displaystyle=\sum_{k\in\mathcal{K}}\sum_{i\in W^{N}_{k}}\Big{[}\xi^{N}_{i,j}\sum_{(j_{2},...,j_{d})\in\texttt{set}^{N}(j)}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}b(X_{j}(s),X_{j_{2}}(s),...,X_{j_{d}}(s))\Big{]},$
	$\displaystyle C^{N,2}_{j}$	$\displaystyle=\sum_{k\in\mathcal{K}}\sum_{i\in W^{N}_{k}}\bigg{[}\xi^{N}_{i,j}\sum_{(j_{2},...,j_{d})\in\texttt{set}^{N}(j)}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}\int_{\mathbb{N}^{d-1}}b(X_{j}(s),x_{j_{2}},...,x_{j_{d}})$
		$\displaystyle\hskip 256.0748pt\mu^{\mathbf{M}(j_{2})}_{s}(dx_{j_{2}})\cdots\mu^{\mathbf{M}(j_{d})}_{s}(dx_{j_{d}})\bigg{]}.$

First, consider $|C^{N}_{j}(s)-C^{N,1}_{j}(s)|$ . For large enough $N$ ,

		$\displaystyle\mathbb{E}\|C^{N}_{j}(s)-C^{N,1}_{j}(s)\|$
		$\displaystyle=\mathbb{E}\bigg{\|}\sum_{k\in\mathcal{K}}\sum_{i\in W^{N}_{k}}\Big{[}\xi^{N}_{i,j}\sum_{(j_{2},...,j_{d})\in\texttt{set}^{N}(j)}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}\big{(}b(X^{N}_{j}(s),X^{N}_{j_{2}}(s),...,X^{N}_{j_{d}}(s))$
		$\displaystyle\hskip 256.0748pt-b(X_{j}(s),X_{j_{2}}(s),...,X_{j_{d}}(s))\big{)}\Big{]}\bigg{\|}$
		$\displaystyle\leq\mathbb{E}\sum_{k\in\mathcal{K}}\sum_{i\in W^{N}_{k}}\Big{[}\xi^{N}_{i,j}\sum_{(j_{2},...,j_{d})\in\texttt{set}^{N}(j)}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}\big{(}\|X^{N}_{j}(s)-X_{j}(s)\|+\cdots+\|X^{N}_{j_{d}}(s)-X_{j_{d}}(s)\|\big{)}\Big{]}$
		$\displaystyle\leq d\times\max_{j\in V^{N}}\mathbb{E}[\|X^{N}_{j}(s)-X_{j}(s)\|]\times\sum_{k\in\mathcal{K}}\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\sum_{(j_{2},...,j_{d})\in\texttt{set}^{N}(j)}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}$
		$\displaystyle\leq c_{3}\max_{j\in V^{N}}\mathbb{E}[\|X^{N}_{j}(s)-X_{j}(s)\|],$		(4.14)

where $c_{3}$ is a constant. The first inequality is from the fact that $b(\cdot)$ is Lipschitz continuous with Lipschitz constant 1 and the last inequality is from (4.9).

Second, consider $|C^{N,1}_{j}(s)-C^{N,2}_{j}(s)|$ .

	$\displaystyle\mathbb{E}\big{[}\|C^{N,1}_{j}(s)-C^{N,2}_{j}(s)\|^{2}\big{]}$
	$\displaystyle=\mathbb{E}\Big{\|}\sum_{k\in\mathcal{K}}\sum_{i\in W^{N}_{k}}\Big{[}\xi^{N}_{i,j}\sum_{(j_{2},...,j_{d})\in\texttt{set}^{N}(j)}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}b(X_{j}(s),X_{j_{2}}(s),...,X_{j_{d}}(s))\Big{]}$
	$\displaystyle\hskip 42.67912pt-\sum_{k\in\mathcal{K}}\sum_{i\in W^{N}_{k}}\Big{[}\xi^{N}_{i,j}\sum_{(j_{2},...,j_{d})\in\texttt{set}^{N}(j)}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}\int_{\mathbb{N}^{d-1}_{0}}b(X_{j}(s),x_{j_{2}},...,x_{j_{d}})$		(4.15)
	$\displaystyle\hskip 256.0748pt\mu^{\mathbf{M}(j_{2})}_{s}(dx_{j_{2}})\cdots\mu^{\mathbf{M}(j_{d})}_{s}(dx_{j_{d}})\Big{]}\Big{\|}^{2}$
	$\displaystyle\leq\mathbb{E}\Big{[}\sum_{i_{1},i_{2}\in W^{N}}\sum_{\texttt{sett}^{N}(j)}\frac{\xi^{N}_{i_{1},j}\times\xi^{N}_{i_{1},j_{2}}\times\cdots\times\xi^{N}_{i_{1},j_{d}}}{{\delta^{N}_{i_{1}}\choose d}(d-1)!}\frac{\xi^{N}_{i_{2},j}\times\xi^{N}_{i_{2},j^{\prime}_{2}}\times\cdots\times\xi^{N}_{i_{2},j^{\prime}_{d}}}{{\delta^{N}_{i_{2}}\choose d}(d-1)!}\Big{]}$
	$\displaystyle\overset{(a)}{\leq}\mathbb{E}\Big{[}\sum_{i\in W^{N}}\sum_{\texttt{sett}^{N}{(j)}}\frac{\xi^{N}_{i,j}\times\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}\frac{\xi^{N}_{i,j}\times\xi^{N}_{i,j^{\prime}_{2}}\times\cdots\times\xi^{N}_{i,j^{\prime}_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}$
	$\displaystyle\hskip 42.67912pt+\sum_{i_{1},i_{2}\in W^{N},i_{1}\neq i_{2}}\sum_{\texttt{sett}^{N}{(j)}}\frac{\xi^{N}_{i_{1},j}\times\xi^{N}_{i_{1},j_{2}}\times\cdots\times\xi^{N}_{i_{1},j_{d}}}{{\delta^{N}_{i_{1}}\choose d}(d-1)!}\frac{\xi^{N}_{i_{2},j}\times\xi^{N}_{i_{2},j^{\prime}_{2}}\times\cdots\times\xi^{N}_{i_{2},j^{\prime}_{d}}}{{\delta^{N}_{i_{2}}\choose d}(d-1)!}\Big{]},$
	$\displaystyle\leq c_{4}N^{-2}+c_{5}N^{-1}$		(4.16)

where the first inequality holds because $\{X_{j}(0),j\in V_{m}\}$ are i.i.d. for each $m\in\mathcal{M}$ and independent across different $m$ , so that for each $m\in\mathcal{M}$ , $\{X_{j}(s),j\in V_{m}\}$ are also i.i.d., and the independence across the server pools holds for any fixed $s>0$ . Hence, if $(j,j_{2},...,j_{d},j^{\prime}_{2},...,j^{\prime}_{d})$ are distinct, then

\begin{split}\mathbb{E}\Big{[}&\big{(}b(X_{j}(s),X_{j_{2}}(s),...,X_{j_{d}}(s))-\int_{\mathbb{N}^{d-1}}b(X_{j}(s),x_{j_{2}},...,x_{j_{d}})\mu^{\mathbf{M}(j_{2})}_{s}(dx_{j_{2}})\cdots\mu^{\mathbf{M}(j_{d})}_{s}(dx_{j_{d}})\big{)}\\ &\big{(}b(X_{j}(s),X_{j^{\prime}_{2}}(s),...,X_{j^{\prime}_{d}}(s))-\int_{\mathbb{N}^{d-1}}b(X_{j}(s),x_{j^{\prime}_{2}},...,x_{j^{\prime}_{d}})\mu^{\mathbf{M}(j^{\prime}_{2})}_{s}(dx_{j^{\prime}_{2}})\cdots\mu^{\mathbf{M}(j^{\prime}_{d})}_{s}(dx_{j^{\prime}_{d}})\big{)}\Big{]}=0,\end{split}

and $b(\cdot)$ and $\int b(\cdot)\mu(d\cdot)$ are both in $[0,1]$ . The last inequality of (4.16) is by (4.10) and (4.11).

Third, consider $|C^{N,2}_{j}(s)-C_{j}(s)|$ .

	$\displaystyle\mathbb{E}\big{[}\|C^{N,2}_{j}(s)-C_{j}(s)\|\big{]}$
	$\displaystyle=\mathbb{E}\Big{[}\Big{\|}\sum_{k\in\mathcal{K}}\sum_{i\in W^{N}_{k}}\Big{[}\xi^{N}_{i,j}\sum_{(j_{2},...,j_{d})\in\texttt{set}^{N}(j)}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}\int_{\mathbb{N}^{d-1}}b(X_{j}(s),x_{j_{2}},...,x_{j_{d}})$
	$\displaystyle\hskip 256.0748pt\mu^{\mathbf{M}(j_{2})}_{s}(dx_{j_{2}})\cdots\mu^{\mathbf{M}(j_{d})}_{s}(dx_{j_{d}})\Big{]}$
	$\displaystyle-d\xi\sum_{k\in\mathcal{K}}\frac{p_{k,m}w_{k}}{\delta_{k}}\sum_{(M_{2},...,M_{d})\in\mathcal{M}^{d-1}}\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}\int_{\mathbb{N}^{d-1}}b(X_{j}(s),x_{j_{2}},...,x_{j_{d}})$
	$\displaystyle\hskip 256.0748pt\mu^{M_{2}}_{s}(dx_{j_{2}})\cdots\mu^{M_{d}}_{s}(dx_{j_{d}})\Big{\|}\Big{]}$
	$\displaystyle\leq c_{6}(N),$		(4.17)

where $c_{6}(N)$ only depends on $N$ and goes to $0$ as $N\rightarrow\infty$ and the inequality comes from (4.9) and the fact that $\int b(\cdot)\mu(d\cdot)\in[0,1]$ . Now, by (4.12), (4.13), (4.14), (4.16) and (4.17), we have that for large enough $N$ ,

\max_{j\in V^{N}}\mathbb{E}\left\lVert X^{N}_{j}-X_{j}\right\rVert^{2}_{*,t}\leq c_{10}\int_{0}^{t}\max_{j\in V^{N}}\mathbb{E}\left\lVert X^{N}_{j}-X_{j}\right\rVert^{2}_{*,s}ds+f(N),

where $c_{10}$ is a constant and $f(N)$ is a function which goes to 0 as $N\rightarrow\infty$ . Finally, by Gronwall’s inequality, we obtain (3.6), which completes the proof. ∎

4.3 Convergence of the Occupancy Process: IID Case

In this section, we want to show the convergence of the occupancy process $\mathbf{q}^{N}(\cdot)$ to the limit process $\mathbf{q}$ represented by the ODE (3.8). The first step is to investigate the existence and uniqueness of the solution of the ODE (3.8). Define

\bar{\mathcal{S}}\coloneqq\Big{\{}\mathbf{q}\in[0,1]^{M\times\mathbb{N}_{0}}:q_{m,0}=1,q_{m,l}\geq q_{m,l+1},\forall m\in\mathcal{M},l\in\mathbb{N}_{0}\Big{\}}

and clearly, $\mathcal{S}\subseteq\bar{\mathcal{S}}$ .

Lemma 4.5.

If $\mathbf{q}(0)=\mathbf{q}_{0}\in\bar{\mathcal{S}}$ , then the ODE system (3.8) has a unique solution denoted as $\bar{\mathbf{q}}(t,\mathbf{q}_{0})$ , $t\geq 0$ in $\bar{\mathcal{S}}$ .

The proof of Lemma 4.5 is based on the Picard successive approximation method ([14, Theorem 1(i)]) and is provided in Appendix C.

Proof of Theorem 3.6.

Fix any $T\in(0,\infty)$ . For each $m\in\mathcal{M}$ , consider random measures $\mu^{N}_{m}=\frac{1}{|V^{N}_{m}|}\sum_{j\in V^{N}_{m}}\delta_{X^{N}_{j}(\cdot)}$ and $\bar{\mu}^{N}_{m}=\frac{1}{|V^{N}_{m}|}\sum_{j\in V^{N}_{m}}\delta_{X_{j}(\cdot)}$ on $\mathbb{S}\coloneqq\mathbb{D}([0,T],\mathbb{N}_{0})$ , where $X_{j}(\cdot)$ is defined in (3.3). Denote the joint measures $\mu^{N}=(\mu^{N}_{1},...,\mu^{N}_{M})$ and $\bar{\mu}^{N}=(\bar{\mu}^{N}_{1},...,\bar{\mu}^{N}_{M})$ . Denote by $d_{BL}(\cdot,\cdot)$ the bounded-Lipschitz metric for probability measures on $\mathbb{S}$ :

d_{BL}(\mu_{1},\mu_{2})\coloneqq\sup_{\left\lVert f\right\rVert_{BL}\leq 1}\Big{|}\int_{\mathbb{S}}fd\mu_{1}-\int_{\mathbb{S}}fd\mu_{2}\Big{|},\quad\left\lVert f\right\rVert_{BL}\coloneqq\max\Big{\{}\left\lVert f\right\rVert_{\infty},\sup_{x\neq y}\frac{f(x)-f(y)}{d(x,y)}\Big{\}}.

From (3.6) we have

\begin{split}\mathbb{E}d_{BL}(\mu^{N}_{m},\bar{\mu}^{N}_{m})&\leq\mathbb{E}\sup_{\left\lVert f\right\rVert_{BL}\leq 1}\frac{1}{|V^{N}_{m}|}\sum_{j\in V^{N}_{m}}|f(X^{N}_{j})-f(X_{j})|\leq\frac{1}{|V_{m}^{N}|}\sum_{j\in V^{N}_{m}}\mathbb{E}\left\lVert X^{N}_{j}-X_{j}\right\rVert_{*,T}\xrightarrow{N\rightarrow\infty}0\end{split}

which implies that $d_{BL}(\mu^{N}_{m},\bar{\mu}^{N}_{m})\xrightarrow{\ \mathbb{P}\ }0$ for each $m\in\mathcal{M}$ . Since $\bar{\mu}^{N}_{m}\xrightarrow{\ \mathbb{P}\ }\mu_{m}$ by LLN, we have $\mu^{N}=(\mu^{N}_{1},...,\mu^{N}_{M})\xrightarrow{\ \mathbb{P}\ }(\mu_{1},...,\mu_{M})$ by Slutsky’s theorem. Also, it is easy to check that

\sup_{N}\mathbb{E}\Big{[}\sup_{0\leq t\leq T}\left\lVert\mathbf{q}^{N}(t)\right\rVert^{2}_{\ell_{1}}\Big{]}<\infty.

Thus, we have $\mathbf{q}^{N}\xrightarrow{\ \mathbb{P}\ }\mathbf{q}$ . Next, we need to show that $\mathbf{q}$ satisfies (3.8). Define $f_{l}(x)=\mathds{1}_{\{x\geq l\}}$ , $l\in\mathbb{N}_{0}$ . By (3.3), we have that for any $m\in\mathcal{M}$ and $j\in V_{m}$ ,

	$\displaystyle\mathbb{E}f_{l}(X_{j}(t))$	$\displaystyle=\mathbb{E}f_{l}(X_{j}(0))+\int_{0}^{t}u_{m}\mathbb{E}\mathds{1}_{\{X_{j}(s)>0\}}\big{(}f_{l}(X_{j}(s)-1)-f_{l}(X_{j}(s))\big{)}ds$
		$\displaystyle\quad+\int_{0}^{t}\int_{\mathbb{N}^{d-1}}\lambda\xi d\sum_{k\in\mathcal{K}}\frac{p_{k,m}w_{k}}{\delta_{k}}\sum_{(M_{2},...,M_{d})\in\mathcal{M}^{d-1}}\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}$
		$\displaystyle\qquad\times\mathbb{E}\big{[}b(X_{j}(s),x_{j_{2}},...,x_{j_{d}})\big{(}f_{l}(X_{j}(s)+1)-f_{l}(X_{j}(s))\big{)}\big{]}\mu^{M_{2}}_{s}(dx_{j_{2}})\cdots\mu^{M_{d}}_{s}(dx_{j_{d}})ds$
		$\displaystyle=\mathbb{E}f_{l}(X_{j}(0))+\int_{0}^{t}u_{m}\mathbb{E}\mathds{1}_{\{X_{j}(s)>0\}}(f_{l+1}(X_{j}(s))-f_{l}(X_{j}(s)))ds$
		$\displaystyle\quad+\int_{0}^{t}\int_{\mathbb{N}^{d-1}}\lambda\xi d\sum_{k\in\mathcal{K}}\frac{p_{k,m}w_{k}}{\delta_{k}}\sum_{(M_{2},...,M_{d})\in\mathcal{M}^{d-1}}\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}$
		$\displaystyle\qquad\times\mathbb{E}[b(l-1,x_{j_{2}},...,x_{j_{d}})(f_{l-1}(X_{j}(s))-f_{l}(X_{j}(s)))]\mu^{M_{2}}_{s}(dx_{j_{2}})\cdots\mu^{M_{d}}_{s}(dx_{j_{d}})ds.$

For any $m\in\mathcal{M}$ , if $j\in V_{m}$ , then $\mathbb{E}f_{l}(X_{j}(t))=q_{m,l}(t)=\mu^{m}_{t}[l,\infty)$ for $l=1,2,...$ . Hence,

	$\displaystyle q_{m,l}(t)$	$\displaystyle=q_{m,l}(0)-\int_{0}^{t}u_{m}(q_{m,l}(s)-q_{m,l+1}(s))ds+\int_{0}^{t}\lambda\xi d\sum_{k\in\mathcal{K}}\frac{p_{k,m}w_{k}}{\delta_{k}}(q_{m,l-1}(s)-q_{m,l}(s))$
		$\displaystyle\times\sum_{(M_{2},...,M_{d})\in\mathcal{M}^{d-1}}\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}\int_{\mathbb{N}^{d-1}}b(l-1,x_{j_{2}},...,x_{j_{d}})\mu^{M_{2}}_{s}(dx_{j_{2}})\cdots\mu^{M_{d}}_{s}(dx_{j_{d}})ds$		(4.18)

Also,

		$\displaystyle\sum_{(M_{2},...,M_{d})\in\mathcal{M}^{d-1}}\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}\int_{\mathbb{N}^{d-1}}b(l-1,x_{j_{2}},...,x_{j_{d}})\mu^{M_{2}}_{s}(dx_{j_{2}})\cdots\mu^{M_{d}}_{s}(dx_{j_{d}})$
		$\displaystyle=\sum_{\bar{r}\in\bar{\mathcal{R}}}\sum_{\bar{r}^{\prime}\in\bar{\mathcal{R}}^{\prime}(\bar{r})}\frac{1}{1+\|\bar{r}^{\prime}\|}\prod_{m\in\mathcal{M}}{r_{m}\choose r^{\prime}_{m}}\Big{(}\frac{v_{m}p_{k,m}}{\delta_{k}}\Big{)}^{r_{m}}(q_{m,l-1}(s)-q_{m,l}(s))^{r^{\prime}_{m}}(q_{m,l}(s))^{r_{m}-r^{\prime}_{m}}$
		$\displaystyle=\sum_{r=0}^{d-1}\frac{1}{1+r}{d-1\choose r}\Big{(}\sum_{m\in\mathcal{M}}\frac{v_{m}p_{k,m}}{\delta_{k}}q_{m,l-1}(s)-\sum_{m\in\mathcal{M}}\frac{v_{m}p_{k,m}}{\delta_{k}}q_{m,l}(s)\Big{)}^{r}\Big{(}\sum_{m\in\mathcal{M}}\frac{v_{m}p_{k,m}}{\delta_{k}}q_{m,l}(s)\Big{)}^{d-1-r}$
		$\displaystyle=\sum_{r=1}^{d}\frac{1}{r}{d-1\choose r-1}\big{(}\tilde{q}_{k,l-1}(s)-\tilde{q}_{k,l}(s)\big{)}^{r-1}(\tilde{q}_{k,l}(s))^{d-r}\quad(\text{Let }\tilde{q}_{k,l}(s)=\sum_{m\in\mathcal{M}}\frac{v_{m}p_{k,m}}{\delta_{k}}q_{m,l}(s))$
		$\displaystyle=\frac{(\tilde{q}_{k,l-1}(s))^{d}-(\tilde{q}_{k,l}(s))^{d}}{d(\tilde{q}_{k,l-1}(s)-\tilde{q}_{k,l}(s))}$		(4.19)

where $\bar{\mathcal{R}}=\{\bar{r}=(r_{1},...,r_{M})\in\mathbb{N}_{0}^{M}:\sum_{m\in\mathcal{M}}r_{m}=d-1\}$ and $\bar{\mathcal{R}}^{\prime}(\bar{r})=\{\bar{r}^{\prime}=(r^{\prime}_{1},...,r^{\prime}_{M})\in\mathbb{N}_{0}^{M}:r^{\prime}_{m}\leq r_{m},\forall m\in\mathcal{M}\}$ given $\bar{r}\in\bar{\mathcal{R}}$ . Plugging (4.19) into (4.18), we get the desired result. ∎
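The last two equalities in (4.19) rest on the identity $\frac{1}{r}{d-1\choose r-1}=\frac{1}{d}{d\choose r}$ followed by the binomial theorem. As a sanity check only, the following Python sketch verifies the resulting closed form in exact rational arithmetic. The helper names `lhs` and `rhs` and the test values $a=\tilde{q}_{k,l-1}$ , $b=\tilde{q}_{k,l}$ are illustrative choices and not part of the paper.

```python
from fractions import Fraction
from math import comb


def lhs(a, b, d):
    """Sum over r = 1..d of (1/r) * C(d-1, r-1) * (a-b)^(r-1) * b^(d-r)."""
    return sum(
        Fraction(comb(d - 1, r - 1), r) * (a - b) ** (r - 1) * b ** (d - r)
        for r in range(1, d + 1)
    )


def rhs(a, b, d):
    """Closed form (a^d - b^d) / (d * (a - b)); requires a != b."""
    return (a ** d - b ** d) / (d * (a - b))


# Hypothetical values standing in for q~_{k,l-1} and q~_{k,l}.
a, b = Fraction(7, 10), Fraction(2, 5)
print(all(lhs(a, b, d) == rhs(a, b, d) for d in range(2, 8)))
```

Exact fractions are used so that the comparison is an equality rather than a floating-point tolerance check.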

4.4 Convergence of the Occupancy Process: General Case

In this section, we consider the case where the sequence $\{G^{N}\}_{N}$ is clustered proportionally sparse, which allows us to drop the i.i.d. assumption on the starting state and prove Theorem 3.10. Intuitively, if $\{G^{N}\}_{N}$ is clustered proportionally sparse, then for each $k\in\mathcal{K}$ and each dispatcher $i\in W^{N}_{k}$ , the queue-length distribution of its neighborhood is always close (in an appropriate sense) to the corresponding global weighted queue-length distribution. Clustered proportional sparsity ensures that this statement holds uniformly over all occupancy states. Loosely speaking, this guarantees that, from any initial state, the occupancy process evolves in the same way as it does from an i.i.d. initial state. For homogeneous systems, the notion of proportional sparsity was introduced in [22]. There, proportional sparsity was defined so that for most dispatchers $i$ , the fraction of its neighbors within any subset $U$ of servers is proportional to the size of $U$ . However, due to the heterogeneous compatibility between dispatchers and servers, in the current setup this fraction also depends on the type of the dispatcher (see the term $\frac{E^{N}_{k}(U)}{E^{N}_{k}(V^{N})}$ in Definition 3.8). Thus, unlike the homogeneous case, where the local queue-length distribution is compared directly to the global queue-length distribution of the system, in the heterogeneous case we need to define $K$ types of global weighted queue-length distributions (see Definition 4.6), where the weights are determined by the asymptotic properties of the graph structure: $(v_{m},m\in\mathcal{M})$ and $(p_{k,m},k\in\mathcal{K},m\in\mathcal{M})$ . Then, we compare the local queue-length distribution of dispatcher $i$ to the global weighted queue-length distribution of the corresponding type as defined below.

Definition 4.6.

Consider any fixed $N\in\mathbb{N}$ and $k\in\mathcal{K}$ . Given the global occupancy $\mathbf{q}^{N}=(q^{N}_{m,l},m\in\mathcal{M},l\in\mathbb{N}_{0})$ of the $N$ -th system, the global weighted queue-length distribution (GWQD) of type $k$ is defined as $\big{(}x^{N}_{k,m,l},m\in\mathcal{M},l\in\mathbb{N}_{0}\big{)}$ , where

x^{N}_{k,m,l}=\frac{v_{m}p_{k,m}}{\delta_{k}}(q^{N}_{m,l}-q^{N}_{m,l+1}).

Also, the local queue-length distribution is defined as follows.

Definition 4.7.

Consider any fixed $N\in\mathbb{N}$ and $k\in\mathcal{K}$ . Given the state $(X^{N}_{j},j\in V^{N})$ of the $N$ -th system, the local queue-length distribution (LQD) of dispatcher $i\in W^{N}_{k}$ is defined as $(\hat{x}^{N}_{i,m,l},m\in\mathcal{M},l\in\mathbb{N}_{0})$ , where

\hat{x}^{N}_{i,m,l}=\frac{|\{j\in V^{N}_{m}:\xi^{N}_{i,j}=1\text{ and }X^{N}_{j}=l\}|}{|\mathcal{N}^{N}_{w}(i)|}.

Although a dispatcher following the JSQ( $d$ ) policy selects a target server based on its LQD, if its LQD is close (in a suitable sense) to the corresponding GWQD, then the selection can be viewed as if it were based on the GWQD. The latter case is easier to analyze. Hence, if a dispatcher’s LQD is close to its corresponding GWQD, we call it a good dispatcher:

Definition 4.8 ( $\varepsilon$ -Good Dispatcher).

Consider any fixed $N\in\mathbb{N}$ and $\varepsilon>0$ . Given the state $(X^{N}_{j},j\in V^{N})$ of the $N$ -th system, a dispatcher $i\in W^{N}_{k}$ , $k\in\mathcal{K}$ , is $\varepsilon$ -good if

\sum_{m\in\mathcal{M}}\sum_{l\in\mathbb{N}_{0}}|\hat{x}^{N}_{i,m,l}-x^{N}_{k,m,l}|\leq\varepsilon.	(4.20)

Also, a dispatcher is $\varepsilon$ -bad if it is not $\varepsilon$ -good.
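To make Definitions 4.6 to 4.8 concrete, the following Python sketch computes a GWQD and an LQD for a toy instance and tests condition (4.20). It is a minimal illustration under stated assumptions: the mass at level $l$ is taken to be $q_{m,l}-q_{m,l+1}$ (the fraction of type- $m$ servers with exactly $l$ tasks), the occupancy is truncated at a finite level, and $\delta_{k}$ is assumed to equal $\sum_{m}v_{m}p_{k,m}$ so that the weights sum to one. The function names and all numerical values are hypothetical.

```python
from collections import Counter


def gwqd(q, v, p_k, delta_k):
    """GWQD of type k: x[(m, l)] = v[m] * p_k[m] / delta_k * (q[m][l] - q[m][l+1]).

    q[m] lists the tail fractions q_{m,0}, q_{m,1}, ...; levels past the end count as 0.
    """
    x = {}
    for m, tails in enumerate(q):
        padded = list(tails) + [0.0]
        for l in range(len(tails)):
            x[(m, l)] = v[m] * p_k[m] / delta_k * (padded[l] - padded[l + 1])
    return x


def lqd(neighbors):
    """LQD of one dispatcher; neighbors is a list of (server_type, queue_length)."""
    counts = Counter(neighbors)
    return {key: c / len(neighbors) for key, c in counts.items()}


def is_good(x_hat, x, eps):
    """Condition (4.20): the l1 distance between LQD and GWQD is at most eps."""
    keys = set(x_hat) | set(x)
    return sum(abs(x_hat.get(key, 0.0) - x.get(key, 0.0)) for key in keys) <= eps


# Toy instance with two server types (all numbers are illustrative assumptions).
v, p_k = [0.5, 0.5], [1.0, 0.5]
delta_k = sum(vm * pm for vm, pm in zip(v, p_k))   # assumed normalisation
x = gwqd([[1.0, 0.5], [1.0, 0.25]], v, p_k, delta_k)

balanced = [(0, 0)] * 4 + [(0, 1)] * 4 + [(1, 0)] * 3 + [(1, 1)]
skewed = [(0, 0)] * 12
print(is_good(lqd(balanced), x, 0.05), is_good(lqd(skewed), x, 0.05))
```

In this toy instance the first neighborhood reproduces the GWQD exactly, while the second concentrates all mass on idle type-0 servers and is therefore far from it.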

4.4.1 Consequences of Clustered Proportional Sparsity

The proof of Theorem 3.10 relies on the idea that if the local occupancy of each dispatcher of a particular type evolves similarly to the global occupancy of that type, then the process-level limiting behavior should not depend on any specific initial state. This is what enables us to go beyond the i.i.d. assumption. The first step in this approach is to show that almost all dispatchers are $\varepsilon$ -good for any $\varepsilon>0$ . This is where the property of clustered proportional sparsity is needed, as stated in the next proposition.

Proposition 4.9.

Let $\{G^{N}\}_{N}$ be a sequence of clustered proportionally sparse graphs. For any $T\geq 0$ and $\varepsilon_{1},\varepsilon_{2}>0$ ,

\mathbb{P}\Big{(}\sup_{t\in[0,T]}\mathscr{B}^{\varepsilon_{1}}_{N}(t)\geq\varepsilon_{2}|W^{N}|\Big{)}\xrightarrow{N\rightarrow\infty}0,	(4.21)

where $\mathscr{B}^{\varepsilon_{1}}_{N}(t)$ is the number of $\varepsilon_{1}$ -bad dispatchers at time $t$ .

The intuition behind Proposition 4.9 is that the servers of type $m\in\mathcal{M}$ with queue length $l\in\mathbb{N}_{0}$ form a subset $U^{N}_{m,l}$ of the server set $V^{N}$ . If this set is large, then by clustered proportional sparsity, for any fixed $k\in\mathcal{K}$ and almost all $i\in W^{N}_{k}$ , the fraction of dispatcher $i$ ’s neighbors within $U^{N}_{m,l}$ is close to $\frac{|E^{N}_{k}(U^{N}_{m,l})|}{|E^{N}_{k}(V^{N})|}$ , which is close to $x^{N}_{k,m,l}$ for large enough $N$ by Condition 2.1. Also, in order to deal with the sum over $l\in\mathbb{N}_{0}$ , we will need to establish uniform bounds on the tail of the occupancy process on any finite time interval. The complete proof is given in Appendix D.
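The concentration behind this intuition can be observed numerically in a much simpler setting. The sketch below assumes a single dispatcher type and a single server type, draws each dispatcher-server edge independently with a fixed probability, and checks one fixed subset $U$ ; in that homogeneous setting the target fraction is simply $|U|/|V^{N}|$ . This is only an illustration: clustered proportional sparsity requires the property uniformly over subsets and with the type-dependent normalization, neither of which is tested here. The function name and all parameters are hypothetical.

```python
import random


def bad_fraction(num_dispatchers, num_servers, edge_prob, subset_size, eps, seed=0):
    """Fraction of dispatchers whose neighbor share in U deviates from |U|/|V| by more than eps.

    U is taken to be the first subset_size servers; edges are i.i.d. Bernoulli(edge_prob).
    """
    rng = random.Random(seed)
    target = subset_size / num_servers
    bad = 0
    for _ in range(num_dispatchers):
        neighbors = [j for j in range(num_servers) if rng.random() < edge_prob]
        if not neighbors:
            bad += 1  # a dispatcher with no neighbors is counted as bad
            continue
        share = sum(1 for j in neighbors if j < subset_size) / len(neighbors)
        if abs(share - target) > eps:
            bad += 1
    return bad / num_dispatchers


print(bad_fraction(num_dispatchers=50, num_servers=2000, edge_prob=0.2,
                   subset_size=500, eps=0.1))
```

With a few hundred neighbors per dispatcher, the neighbor share has a standard deviation of roughly 0.02, so deviations larger than 0.1 should be rare.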

4.4.2 Coupling with an intermediate system

The main methodology for the proof of Theorem 3.10 is a stochastic coupling with a sequence $\{G^{\prime N}\}_{N\geq 1}$ of carefully constructed systems where the evolution of each system $G^{\prime N}$ can be coupled with that of the system $G^{N}$ . For each $N$ , the system $G^{\prime N}$ has the same sets of dispatchers and servers as $G^{N}$ , i.e., $W^{\prime N}=W^{N}$ and $V^{\prime N}=V^{N}$ . However, the task assignment in $G^{\prime N}$ happens differently. To describe the task assignment policy, let us introduce the following notation: Let $X^{\prime N}_{j}(t)$ be the number of tasks (including those in service) in the queue of server $j\in V^{\prime N}$ at time $t$ . Let $\mathbf{q}^{\prime N}(t)=\big{(}q^{\prime N}_{m,l}(t),m\in\mathcal{M},l\in\mathbb{N}_{0}\big{)}$ be the corresponding global occupancy at time $t$ , which is defined in the same way as $\mathbf{q}^{N}$ for the system $G^{N}$ . Then, the system $G^{\prime N}$ assigns tasks under the Global Weighted Shortest Queue (GWSQ( $d$ )) policy as described in Algorithm 1. The GWSQ( $d$ ) policy is essentially a variant of the JSQ( $d$ ) policy since, for each new task, the dispatcher selects a target set of servers of size $d$ according to the global weighted queue-length distribution.

Algorithm 1 (GWSQ( $d$ ) policy).

while a new task arrives at dispatcher $i\in W^{N}_{k}$ , $k\in\mathcal{K}$ do

Get the current global occupancy $\mathbf{q}^{N}=(q^{N}_{m,l},m\in\mathcal{M},l\in\mathbb{N}_{0})$ ;

Calculate the global weighted queue-length distribution $\mathbf{x}^{N}_{k}=(x^{N}_{k,m,l},m\in\mathcal{M},l\in\mathbb{N}_{0})$ of type $k$ , where $x^{N}_{k,m,l}=\frac{v_{m}p_{k,m}}{\delta_{k}}(q^{N}_{m,l}-q^{N}_{m,l+1})$ ;

Randomly select a set $\texttt{select}^{N}$ of size $d$ as follows:

• Let $Y^{N}_{k,m,l}(t)\in\mathbb{N}_{0}$ be the number of servers of type $m\in\mathcal{M}$ with queue length $l\in\mathbb{N}_{0}$ in the set $\texttt{select}^{N}$ ;

• $(Y^{N}_{k,m,l}(t),m\in\mathcal{M},l\in\mathbb{N}_{0})$ satisfies $\sum_{m\in\mathcal{M},l\in\mathbb{N}_{0}}Y^{N}_{k,m,l}(t)=d$ ;

• The probability of selecting $(Y^{N}_{k,m,l}(t),m\in\mathcal{M},l\in\mathbb{N}_{0})$ is $\mathbb{P}(Y^{N}_{k,m,l}(t),m\in\mathcal{M},l\in\mathbb{N}_{0})=\prod_{m\in\mathcal{M},l\in\mathbb{N}_{0}}{X^{N}_{k,m,l}(t)\choose Y^{N}_{k,m,l}(t)}/{N\choose d}$ , where $X^{N}_{k,m,l}=N\times x^{N}_{k,m,l}$ .

Get $l^{*}=\min(l\in\mathbb{N}_{0}:\exists k\in\mathcal{K},m\in\mathcal{M}\text{ such that }Y^{N}_{k,m,l}>0)$ ;

Assign the task to a type $m\in\mathcal{M}$ server with queue length $l^{*}$ with probability $\frac{Y^{N}_{k,m,l^{*}}}{\sum_{m\in\mathcal{M}}Y^{N}_{k,m,l^{*}}}$ .

end while
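One way to read the selection step of Algorithm 1 is as $d$ draws without replacement from a virtual population of $N$ servers in which $X^{N}_{k,m,l}$ servers carry the label $(m,l)$ ; this yields exactly the multivariate hypergeometric probability stated above. The Python sketch below follows that reading. It assumes the counts $X^{N}_{k,m,l}$ are already available as nonnegative integers summing to $N$ , and the function name `gwsq_assign` and the example counts are hypothetical.

```python
import random


def gwsq_assign(counts, d, rng):
    """Return (m, l_star) for one task under the GWSQ(d) selection rule.

    counts maps (m, l) to the integer X_{k,m,l} for the arriving dispatcher's type k.
    """
    # Virtual population: label (m, l) repeated X_{k,m,l} times.
    population = [key for key, c in counts.items() for _ in range(c)]
    selected = rng.sample(population, d)            # d draws without replacement
    l_star = min(l for _, l in selected)            # shortest sampled queue length
    shortest = [m for m, l in selected if l == l_star]
    # Type m is chosen with probability Y_{k,m,l*} / sum_m Y_{k,m,l*}.
    return rng.choice(shortest), l_star


rng = random.Random(0)
# With d equal to the population size every server is sampled,
# so the unique shortest queue (type 1, length 1) is always chosen.
print(gwsq_assign({(0, 3): 2, (1, 1): 1}, 3, rng))  # (1, 1)
```

Materializing the population costs memory linear in $N$ , which is acceptable for a sketch; a direct multivariate hypergeometric sampler would avoid it.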

$\displaystyle\mathbb{E}\left\lVert X^{N}_{j}-X_{j}\right\rVert^{2}_{*,t}$	$\displaystyle\leq c_{0}\mathbb{E}\left\lVert X^{N}_{j}(t)-X_{j}(t)\right\rVert^{2}$
	$\displaystyle\leq c_{1}\mathbb{E}\int_{0}^{t}\|\mathds{1}_{(X^{N}_{j}(s)>0)}-\mathds{1}_{(X_{j}(s)>0)}\|^{2}ds+c_{1}\mathbb{E}\Big{(}\int_{0}^{t}\|\mathds{1}_{(X^{N}_{j}(s)>0)}-\mathds{1}_{(X_{j}(s)>0)}\|ds\Big{)}^{2}$
	$\displaystyle\quad+c_{1}\mathbb{E}\int_{[0,t]\times\mathbb{R}_{+}}\|\mathds{1}_{(0\leq y\leq C^{N}_{j}(s))}-\mathds{1}_{(0\leq y\leq C_{j}(s))}\|^{2}dsdy$
	$\displaystyle\quad+c_{1}\mathbb{E}\Big{(}\int_{[0,t]\times\mathbb{R}_{+}}\|\mathds{1}_{(0\leq y\leq C^{N}_{j}(s))}-\mathds{1}_{(0\leq y\leq C_{j}(s))}\|dsdy\Big{)}^{2}$
	$\displaystyle\leq c_{1}\mathbb{E}\int_{0}^{t}\|X^{N}_{j}(s)-X_{j}(s)\|^{2}ds+c_{1}\mathbb{E}\Big{(}\int_{0}^{t}\|X^{N}_{j}(s)-X_{j}(s)\|ds\Big{)}^{2}$
	$\displaystyle\hskip 56.9055pt+c_{1}\mathbb{E}\int_{0}^{t}\|C^{N}_{j}(s)-C_{j}(s)\|^{2}ds+c_{1}\mathbb{E}\Big{(}\int_{0}^{t}\|C^{N}_{j}(s)-C_{j}(s)\|ds\Big{)}^{2}$
	$\displaystyle\leq c_{2}\int_{0}^{t}\mathbb{E}\|X^{N}_{j}(s)-X_{j}(s)\|^{2}ds+c_{2}\int_{0}^{t}\mathbb{E}\|C^{N}_{j}(s)-C_{j}(s)\|ds,$	(4.12)

		$\displaystyle\mathbb{E}\|C^{N}_{j}(s)-C^{N,1}_{j}(s)\|$
		$\displaystyle=\mathbb{E}\bigg{\|}\sum_{k\in\mathcal{K}}\sum_{i\in W^{N}_{k}}\Big{[}\xi^{N}_{i,j}\sum_{(j_{2},...,j_{d})\in\texttt{set}^{N}(j)}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}\big{(}b(X^{N}_{j}(s),X^{N}_{j_{2}}(s),...,X^{N}_{j_{d}}(s))$
		$\displaystyle\hskip 256.0748pt-b(X_{j}(s),X_{j_{2}}(s),...,X_{j_{d}}(s))\big{)}\Big{]}\bigg{\|}$
		$\displaystyle\leq\mathbb{E}\sum_{k\in\mathcal{K}}\sum_{i\in W^{N}_{k}}\Big{[}\xi^{N}_{i,j}\sum_{(j_{2},...,j_{d})\in\texttt{set}^{N}(j)}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}\big{(}\|X^{N}_{j}(s)-X_{j}(s)\|+\cdots+\|X^{N}_{j_{d}}(s)-X_{j_{d}}(s)\|\big{)}\Big{]}$
		$\displaystyle\leq d\times\max_{j\in V^{N}}\mathbb{E}[\|X^{N}_{j}(s)-X_{j}(s)\|]\times\sum_{k\in\mathcal{K}}\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\sum_{(j_{2},...,j_{d})\in\texttt{set}^{N}(j)}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}$
		$\displaystyle\leq c_{3}\max_{j\in V^{N}}\mathbb{E}[\|X^{N}_{j}(s)-X_{j}(s)\|],$		(4.14)

	$\displaystyle q_{m,0}(\bar{\alpha})=1,\forall m\in\mathcal{M},$
	$\displaystyle q_{m,1}(\bar{\alpha})=\alpha_{m},\quad m\in\mathcal{M}\setminus\{M\},\quad\mbox{and}\quad q_{M,1}=\frac{\lambda\xi-\sum_{m\in\mathcal{M}\setminus\{M\}}\alpha_{m}v_{m}u_{m}}{v_{M}u_{M}},$
	$\displaystyle u_{m}(q_{m,l}(\bar{\alpha})-q_{m,l+1}(\bar{\alpha}))=\lambda\xi(q_{m,l-1}(\bar{\alpha})-q_{m,l}(\bar{\alpha}))\sum_{k\in\mathcal{K}}\frac{p_{k,m}w_{k}}{\delta_{k}}\frac{(\tilde{q}_{k,l-1}(\bar{\alpha}))^{d}-(\tilde{q}_{k,l}(\bar{\alpha}))^{d}}{\tilde{q}_{k,l-1}(\bar{\alpha})-\tilde{q}_{k,l}(\bar{\alpha})},l\geq 1.$		(5.3)

	$\displaystyle\Big{\|}\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\sum_{\begin{subarray}{c}(j_{2},...,j_{d})\in\texttt{set}^{N}(j)\\ s.t.\quad j_{2}\in V^{N}_{M_{2}},...,j_{d}\in V^{N}_{M_{d}}\end{subarray}}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}-d\xi\frac{p_{k,m}w_{k}}{\delta_{k}}\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}\Big{\|}$
	$\displaystyle\leq\Big{\|}\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\sum_{\begin{subarray}{c}(j_{2},...,j_{d})\in\texttt{set}^{N}(j)\\ s.t.\quad j_{2}\in V^{N}_{M_{2}},...,j_{d}\in V^{N}_{M_{d}}\end{subarray}}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}-\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\frac{{\delta^{N}_{i}\choose d-1}}{{\delta^{N}_{i}\choose d}}\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}\Big{\|}$		(B.1)
	$\displaystyle\quad+\Big{\|}\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\frac{{\delta^{N}_{i}\choose d-1}}{{\delta^{N}_{i}\choose d}}\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}-d\xi\frac{p_{k,m}w_{k}}{\delta_{k}}\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}\Big{\|}$		(B.2)

		$\displaystyle\max_{i\in W^{N}_{k}}\Big{\|}\sum_{\begin{subarray}{c}(j_{2},...,j_{d})\in\texttt{set}^{N}(j)\\ s.t.\quad j_{2}\in V^{N}_{M_{2}},...,j_{d}\in V^{N}_{M_{d}}\end{subarray}}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d-1}(d-1)!}-\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}\Big{\|}$
	$\displaystyle\leq$	$\displaystyle\max_{i\in W^{N}_{k}}\Big{\|}\sum_{\begin{subarray}{c}(j_{2},...,j_{d})\in\texttt{set}^{N}(j)\\ s.t.\quad j_{2}\in V^{N}_{M_{2}},...,j_{d}\in V^{N}_{M_{d}}\end{subarray}}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d-1}(d-1)!}-\frac{\deg^{N}_{w}(i,M_{2})\times\cdots\times\deg^{N}_{w}(i,M_{d})}{{\delta^{N}_{i}\choose d-1}(d-1)!}\Big{\|}$
		$\displaystyle+\max_{i\in W^{N}_{k}}\Big{\|}\frac{\deg^{N}_{w}(i,M_{2})\times\cdots\times\deg^{N}_{w}(i,M_{d})}{{\delta^{N}_{i}\choose d-1}(d-1)!}-\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}\Big{\|}.$

	$\displaystyle\sum_{m\in\mathcal{M}}\sum_{l\in\mathbb{N}_{0}}\|\hat{x}^{N}_{i,m,l}(t-)-x^{\prime N}_{k,m,l}(t-)\|$
	$\displaystyle\leq\sum_{m\in\mathcal{M}}\sum_{l\in\mathbb{N}_{0}}\|\hat{x}^{N}_{i,m,l}(t-)-x^{N}_{k,m,l}(t-)\|+\sum_{m\in\mathcal{M}}\sum_{l\in\mathbb{N}_{0}}\|x^{N}_{k,m,l}(t-)-x^{\prime N}_{k,m,l}(t-)\|=\varepsilon+\rho^{N}_{k}(t).$		(4.31)

	$\displaystyle\max_{i\in W^{N}_{k}}\Big{\|}\sum_{\begin{subarray}{c}(j_{2},...,j_{d})\in\texttt{set}^{N}(j)\\ s.t.\quad j_{2}\in V^{N}_{M_{2}},...,j_{d}\in V^{N}_{M_{d}}\end{subarray}}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d-1}(d-1)!}-\frac{\deg^{N}_{w}(i,M_{2})\times\cdots\times\deg^{N}_{w}(i,M_{d})}{{\delta^{N}_{i}\choose d-1}(d-1)!}\Big{\|}$
	$\displaystyle\leq\max_{i\in W^{N}_{k}}\frac{d(d-1)}{{\delta^{N}_{i}\choose d-1}(d-1)!}\max_{m\in\mathcal{M}}(\deg^{N}_{w}(i,m))^{d-2}$
	$\displaystyle\leq\frac{d(d-1)}{\min_{i\in W^{N}_{k}}{\delta^{N}_{i}\choose d-1}(d-1)!}\max_{i\in W^{N}_{k}}\max_{m\in\mathcal{M}}(\deg^{N}_{w}(i,m))^{d-2}$
	$\displaystyle\leq c^{N}(m,k)d(d-1)\frac{(N\max_{m\in\mathcal{M}}v_{m}p_{k,m})^{d-2}}{(N\delta_{k})^{d-1}}\xrightarrow{N\rightarrow\infty}0,$		(B.3)

	$\displaystyle\max_{i\in W^{N}_{k}}\Big{\|}\frac{\deg^{N}_{w}(i,M_{2})\times\cdots\times\deg^{N}_{w}(i,M_{d})}{{\delta^{N}_{i}\choose d-1}(d-1)!}-\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}\Big{\|}$
	$\displaystyle\leq\max\Big{(}\prod_{h=2}^{d}\big{(}\frac{\max_{i\in W^{N}_{k}}\deg^{N}_{w}(i,M_{h})}{\min_{i\in W^{N}_{k}}(\delta^{N}_{i}-d)}-\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}\big{)},\prod_{h=2}^{d}\big{(}\frac{\min_{i\in W^{N}_{k}}\deg^{N}_{w}(i,M_{h})}{\max_{i\in W^{N}_{k}}\delta^{N}_{i}}-\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}\big{)}\Big{)}$
	$\displaystyle\leq c^{N}(m,k,M_{2},...,M_{d})\xrightarrow{N\rightarrow\infty}0,$		(B.4)

	$\displaystyle\lim_{N\rightarrow\infty}\max_{i\in W^{N}_{k}}\frac{N{\delta^{N}_{i}\choose d-1}}{{\delta^{N}_{i}\choose d}}$	$\displaystyle=\lim_{N\rightarrow\infty}\min_{i\in W^{N}_{k}}\frac{N{\delta^{N}_{i}\choose d-1}}{{\delta^{N}_{i}\choose d}}=\frac{d}{\delta_{k}},$
	$\displaystyle\lim_{N\rightarrow\infty}\max_{j\in V^{N}_{m}}\frac{\deg^{N}_{v}(k,j)}{N}$	$\displaystyle=\lim_{N\rightarrow\infty}\min_{j\in V^{N}_{m}}\frac{\deg^{N}_{v}(k,j)}{N}=\xi p_{k,m}w_{k}.$

	$\displaystyle\Big{\|}\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\frac{{\delta^{N}_{i}\choose d-1}}{{\delta^{N}_{i}\choose d}}-d\xi\frac{p_{k,m}w_{k}}{\delta_{k}}\Big{\|}$	$\displaystyle\leq\Big{\|}\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\frac{{\delta^{N}_{i}\choose d-1}}{{\delta^{N}_{i}\choose d}}-\deg^{N}_{v}(k,j)\frac{d}{N\delta_{k}}\Big{\|}+\Big{\|}\deg^{N}_{v}(k,j)\frac{d}{N\delta_{k}}-d\xi\frac{p_{k,m}w_{k}}{\delta_{k}}\Big{\|}$
		$\displaystyle\leq c^{N}_{1}(m,k)\xrightarrow{N\rightarrow\infty}0,$		(B.6)

	$\displaystyle\Big{\|}\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\sum_{\begin{subarray}{c}(j_{2},...,j_{d})\in\texttt{set}^{N}(j)\\ s.t.\quad j_{2}\in V^{N}_{M_{2}},...,j_{d}\in V^{N}_{M_{d}}\end{subarray}}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}-\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\frac{{\delta^{N}_{i}\choose d-1}}{{\delta^{N}_{i}\choose d}}\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}\Big{\|}$
	$\displaystyle=\Big{\|}\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\frac{{\delta^{N}_{i}\choose d-1}}{{\delta^{N}_{i}\choose d}}\sum_{\begin{subarray}{c}(j_{2},...,j_{d})\in\texttt{set}^{N}(j)\\ s.t.\quad j_{2}\in V^{N}_{M_{2}},...,j_{d}\in V^{N}_{M_{d}}\end{subarray}}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d-1}(d-1)!}-\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\frac{{\delta^{N}_{i}\choose d-1}}{{\delta^{N}_{i}\choose d}}\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}\Big{\|}$
	$\displaystyle\leq\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\frac{{\delta^{N}_{i}\choose d-1}}{{\delta^{N}_{i}\choose d}}\Big{\|}\sum_{\begin{subarray}{c}(j_{2},...,j_{d})\in\texttt{set}^{N}(j)\\ s.t.\quad j_{2}\in V^{N}_{M_{2}},...,j_{d}\in V^{N}_{M_{d}}\end{subarray}}\frac{\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d-1}(d-1)!}-\prod_{h=2}^{d}\frac{v_{M_{h}}p_{k,M_{h}}}{\delta_{k}}\Big{\|}$
	$\displaystyle\overset{(a)}{\leq}\sum_{i\in W^{N}_{k}}\xi^{N}_{i,j}\frac{{\delta^{N}_{i}\choose d-1}}{{\delta^{N}_{i}\choose d}}c_{2}^{N}(m,k,M_{2},...,M_{d})$
	$\displaystyle\overset{(b)}{\leq}c_{2}^{N}(m,k,M_{2},...,M_{d})c^{N}_{2}(m,k)d\xi\frac{p_{k,m}w_{k}}{\delta_{k}}\xrightarrow{N\rightarrow\infty}0,$		(B.7)

	$\displaystyle\frac{\Big{[}(d-1)!{\delta^{N}_{i}-1\choose d}\Big{]}^{2}-(2d-2)!{\delta^{N}_{i}-1\choose 2d-2}}{{\delta^{N}_{i}\choose d}^{2}\big{(}(d-1)!\big{)}^{2}}$	$\displaystyle\leq\frac{\Big{[}(d-1)!\max_{i\in W^{N}_{k}}{\delta^{N}_{i}-1\choose d}\Big{]}^{2}-(2d-2)!\min_{i\in W^{N}_{k}}{\delta^{N}_{i}-1\choose 2d-2}}{\min_{i\in W^{N}_{k}}{\delta^{N}_{i}\choose d}^{2}\big{(}(d-1)!\big{)}^{2}}$
		$\displaystyle\leq c_{1}(N)\frac{\Big{[}(d-1)!{N\delta_{k}\choose d-1}\Big{]}^{2}-(2d-2)!{N\delta_{k}\choose 2d-2}}{{N\delta_{k}\choose d}^{2}\big{(}(d-1)!\big{)}^{2}}$

	$\displaystyle\sum_{i\in W^{N}}\sum_{\texttt{sett}^{N}{(j)}}\frac{\xi^{N}_{i,j}\times\xi^{N}_{i,j_{2}}\times\cdots\times\xi^{N}_{i,j_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}\frac{\xi^{N}_{i,j}\times\xi^{N}_{i,j^{\prime}_{2}}\times\cdots\times\xi^{N}_{i,j^{\prime}_{d}}}{{\delta^{N}_{i}\choose d}(d-1)!}$
	$\displaystyle=\sum_{k\in\mathcal{K}}\sum_{i\in W^{N}_{k}}\frac{\Big{[}(d-1)!{\delta^{N}_{i}-1\choose d}\Big{]}^{2}-(2d-2)!{\delta^{N}_{i}-1\choose 2d-2}}{{\delta^{N}_{i}\choose d}^{2}\big{(}(d-1)!\big{)}^{2}}$
	$\displaystyle\leq c_{1}(N)\sum_{k\in\mathcal{K}}\deg^{N}_{v}(k,j)\frac{\Big{[}(d-1)!{N\delta_{k}\choose d-1}\Big{]}^{2}-(2d-2)!{N\delta_{k}\choose 2d-2}}{{N\delta_{k}\choose d}^{2}\big{(}(d-1)!\big{)}^{2}}$
	$\displaystyle\leq c_{1}(N)c_{2}(N,m)\sum_{k\in\mathcal{K}}\|W^{N}_{k}\|p_{k,m}\frac{\Big{[}(d-1)!{N\delta_{k}\choose d-1}\Big{]}^{2}-(2d-2)!{N\delta_{k}\choose 2d-2}}{{N\delta_{k}\choose d}^{2}\big{(}(d-1)!\big{)}^{2}}.$

	$\displaystyle\bar{h}_{m,0}(\mathbf{q})$	$\displaystyle=0,$
	$\displaystyle\bar{h}_{m,l}(\mathbf{q})$	$\displaystyle=-u_{m}(q_{m,l}-q_{m,l+1})+\lambda\xi(q_{m,l-1}-q_{m,l})\sum_{k\in\mathcal{K}}\frac{p_{k,m}w_{k}}{\delta_{k}}\frac{(\tilde{q}_{k,l-1})^{d}-(\tilde{q}_{k,l})^{d}}{\tilde{q}_{k,l-1}-\tilde{q}_{k,l}},\quad l\geq 1.$		(C.2)
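For numerical work, the drift in (C.2) can be evaluated on a truncated state. The Python sketch below is one possible way to do so, under the assumptions that $q_{m,l}$ is set to zero beyond the truncation level and that the ratio $\frac{a^{d}-b^{d}}{a-b}$ is replaced by the equivalent polynomial $\sum_{j=0}^{d-1}a^{j}b^{d-1-j}$ to avoid division by zero when $a=b$ . The function names and the parameter values in the example are hypothetical. With a single server type and a single dispatcher type, the expression reduces to $\lambda(q_{l-1}^{d}-q_{l}^{d})-(q_{l}-q_{l+1})$ , which the example uses as a check.

```python
def power_gap(a, b, d):
    """(a^d - b^d) / (a - b) written as sum_{j<d} a^j * b^(d-1-j); safe when a == b."""
    return sum(a ** j * b ** (d - 1 - j) for j in range(d))


def drift(q, u, v, w, p, delta, lam, xi, d):
    """Evaluate h_{m,l}(q) of (C.2) for l >= 1 on a truncated state.

    q[m][l] is the tail fraction q_{m,l} with q[m][0] == 1; levels past the end count as 0.
    p[k][m], w[k], delta[k], v[m], u[m] play the roles of the symbols in (C.2).
    """
    M, K, L = len(q), len(w), len(q[0])
    # Weighted tails q~_{k,l} = sum_m v_m p_{k,m} / delta_k * q_{m,l}.
    qt = [[sum(v[m] * p[k][m] / delta[k] * q[m][l] for m in range(M))
           for l in range(L)] for k in range(K)]
    h = [[0.0] * L for _ in range(M)]               # h_{m,0} = 0
    for m in range(M):
        tails = list(q[m]) + [0.0]
        for l in range(1, L):
            rate = sum(p[k][m] * w[k] / delta[k] * power_gap(qt[k][l - 1], qt[k][l], d)
                       for k in range(K))
            h[m][l] = (-u[m] * (tails[l] - tails[l + 1])
                       + lam * xi * (tails[l - 1] - tails[l]) * rate)
    return h


# Homogeneous check: one server type, one dispatcher type, d = 2, lambda = 0.7.
h = drift([[1.0, 0.5, 0.0]], u=[1.0], v=[1.0], w=[1.0], p=[[1.0]],
          delta=[1.0], lam=0.7, xi=1.0, d=2)
print(h[0][1])  # approximately 0.7 * (1 - 0.25) - 0.5 = 0.025
```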

		$\displaystyle\mathbb{P}\Big{(}\sup_{t\in[0,T]}\Big{\|}\Big{\{}i\in W^{N}_{k}:\sum_{m\in\mathcal{M}}\sum_{l\in\mathbb{N}_{0}}\|\hat{x}^{N}_{i,m,l}(t)-x^{N}_{k,m,l}(t)\|>\varepsilon_{1}\Big{\}}\Big{\|}\geq\varepsilon_{2}M(N)/K\Big{)}$
		$\displaystyle\leq\mathbb{P}\Big{(}\sup_{t\in[0,T]}\Big{\|}\Big{\{}i\in W^{N}_{k}:\sum_{m\in\mathcal{M}}\sum_{0\leq l\leq\ell-1}\|\hat{x}^{N}_{i,m,l}(t)-x^{N}_{k,m,l}(t)\|>\varepsilon_{1}/4\Big{\}}\Big{\|}\geq\varepsilon_{2}M(N)/(4K)\Big{)}$
		$\displaystyle\quad+\mathbb{P}\Big{(}\sup_{t\in[0,T]}\Big{\|}\Big{\{}i\in W^{N}_{k}:\sum_{m\in\mathcal{M}}\sum_{l\geq\ell}\hat{x}^{N}_{i,m,l}(t)>\varepsilon_{1}/2\Big{\}}\Big{\|}\geq\varepsilon_{2}M(N)/(2K)\Big{)}$
		$\displaystyle\quad+\mathbb{P}\Big{(}\sup_{t\in[0,T]}\Big{\|}\Big{\{}i\in W^{N}_{k}:\sum_{m\in\mathcal{M}}\sum_{l\geq\ell}x^{N}_{k,m,l}(t)>\varepsilon_{1}/4\Big{\}}\Big{\|}\geq\varepsilon_{2}M(N)/(4K)\Big{)}$
		$\displaystyle\leq\sum_{0\leq l\leq\ell-1}\mathbb{P}\Big{(}\sup_{t\in[0,T]}\Big{\|}\Big{\{}i\in W^{N}_{k}:\sum_{m\in\mathcal{M}}\|\hat{x}^{N}_{i,m,l}(t)-x^{N}_{k,m,l}(t)\|>\varepsilon_{1}/(4\ell)\Big{\}}\Big{\|}\geq\varepsilon_{2}M(N)/(4\ell K)\Big{)}$
		$\displaystyle\quad+\mathbb{P}\Big{(}\sup_{t\in[0,T]}\Big{\|}\Big{\{}i\in W^{N}_{k}:\sum_{m\in\mathcal{M}}\big{\|}\sum_{l\geq\ell}(\hat{x}^{N}_{i,m,l}(t)-x^{N}_{k,m,l}(t))\big{\|}>\varepsilon_{1}/4\Big{\}}\Big{\|}\geq\varepsilon_{2}M(N)/(4K)\Big{)}$
		$\displaystyle\quad+2\mathbb{P}\Big{(}\sup_{t\in[0,T]}\Big{\|}\Big{\{}i\in W^{N}_{k}:\sum_{m\in\mathcal{M}}\sum_{l\geq\ell}x^{N}_{k,m,l}(t)>\varepsilon_{1}/4\Big{\}}\Big{\|}\geq\varepsilon_{2}M(N)/(4K)\Big{)}$		(D.12)

	$\displaystyle\sum_{0\leq l\leq\ell-1}\mathbb{P}\Big{(}\sup_{t\in[0,T]}\Big{\|}\Big{\{}i\in W^{N}_{k}:\sum_{m\in\mathcal{M}}\|\hat{x}^{N}_{i,m,l}(t)-x^{N}_{k,m,l}(t)\|>\varepsilon_{1}/(4\ell)\Big{\}}\Big{\|}\geq\varepsilon_{2}M(N)/(4\ell K)\Big{)}$
	$\displaystyle\leq\sum_{0\leq l\leq\ell-1}\Big{(}\mathbb{P}\Big{(}\sup_{t\in[0,T]}\Big{\|}\Big{\{}i\in W^{N}_{k}:\sum_{m\in\mathcal{M}}\big{\|}\hat{x}^{N}_{i,m,l}(t)-\frac{\|E^{N}_{k}(U^{N}_{m,l}(t))\|}{\|E^{N}_{k}(V^{N})\|}\big{\|}>\varepsilon_{1}/(8\ell)\Big{\}}\Big{\|}\geq\varepsilon_{2}M(N)/(4\ell K)\Big{)}$
	$\displaystyle\hskip 42.67912pt+\mathbb{P}\Big{(}\sup_{t\in[0,T]}\sum_{m\in\mathcal{M}}\Big{\|}\frac{\|E^{N}_{k}(U^{N}_{m,l}(t))\|}{\|E^{N}_{k}(V^{N})\|}-x^{N}_{k,m,l}\Big{\|}>\varepsilon_{1}/(8\ell)\Big{)}\Big{)}$
	$\displaystyle\leq\frac{4\ell K}{\varepsilon_{2}M(N)}\sum_{0\leq l\leq\ell-1}\mathbb{E}\Big{(}\sup_{t\in[0,T]}\Big{\|}\Big{\{}i\in W^{N}_{k}:\sum_{m\in\mathcal{M}}\|\hat{x}^{N}_{i,m,l}(t)-\frac{\|E^{N}_{k}(U^{N}_{m,l}(t))\|}{\|E^{N}_{k}(V^{N})\|}\|>\varepsilon_{1}/(8\ell)\Big{\}}\Big{\|}\Big{)}$
	$\displaystyle\hskip 42.67912pt+\sum_{0\leq l\leq\ell-1}\mathbb{P}\Big{(}\sum_{m\in\mathcal{M}}\sup_{U\in V^{N}_{m}}\Big{\|}\frac{\|E^{N}_{k}(U)\|}{\|E^{N}_{k}(V^{N})\|}-\frac{v_{m}p_{k,m}}{\delta_{k}}\frac{\|U\|}{\|V^{N}_{m}\|}\Big{\|}>\varepsilon_{1}/(8\ell)\Big{)}$
	$\displaystyle\leq\frac{4\ell K}{\varepsilon_{2}M(N)}\sum_{0\leq l\leq\ell-1}\sum_{m\in\mathcal{M}}\mathbb{E}\Big{(}\sup_{t\in[0,T]}\Big{\|}\Big{\{}i\in W^{N}_{k}:\|\hat{x}^{N}_{i,m,l}(t)-\frac{\|E^{N}_{k}(U^{N}_{m,l}(t))\|}{\|E^{N}_{k}(V^{N})\|}\|>\varepsilon_{1}/(8M\ell)\Big{\}}\Big{\|}\Big{)}$
	$\displaystyle\hskip 42.67912pt+\sum_{0\leq l\leq\ell-1}\sum_{m\in\mathcal{M}}\mathbb{P}\Big{(}\sup_{U\in V^{N}_{m}}\Big{\|}\frac{\|E^{N}_{k}(U)\|}{\|E^{N}_{k}(V^{N})\|}-\frac{v_{m}p_{k,m}}{\delta_{k}}\frac{\|U\|}{\|V^{N}_{m}\|}\Big{\|}>\varepsilon_{1}/(8M\ell)\Big{)}$
	$\displaystyle\leq\frac{4\ell K}{\varepsilon_{2}M(N)}\sum_{0\leq l\leq\ell-1}\sum_{m\in\mathcal{M}}\sup_{U\in V^{N}_{m}}\Big{\|}\Big{\{}i\in W^{N}_{k}:\big{\|}\frac{\|\mathcal{N}^{N}_{w}(i)\cap U\|}{\|\mathcal{N}^{N}_{w}(i)\|}-\frac{\|E^{N}_{k}(U)\|}{\|W^{N}_{k}(V^{N})\|}\big{\|}>\varepsilon_{1}/(8M\ell)\Big{\}}\Big{\|}$
	$\displaystyle\hskip 42.67912pt+\sum_{0\leq l\leq\ell-1}\sum_{m\in\mathcal{M}}\mathbb{P}\Big{(}\sup_{U\in V^{N}_{m}}\Big{\|}\frac{\|E^{N}_{k}(U)\|}{\|E^{N}_{k}(V^{N})\|}-\frac{v_{m}p_{k,m}}{\delta_{k}}\frac{\|U\|}{\|V^{N}_{m}\|}\Big{\|}>\varepsilon_{1}/(8M\ell)\Big{)}$
	$\displaystyle\leq\frac{4\ell^{2}KM}{\varepsilon_{2}M(N)}\sup_{U\in V^{N}}\Big{\|}\Big{\{}i\in W^{N}_{k}:\big{\|}\frac{\|\mathcal{N}^{N}_{w}(i)\cap U\|}{\|\mathcal{N}^{N}_{w}(i)\|}-\frac{\|E^{N}_{k}(U)\|}{\|W^{N}_{k}(V^{N})\|}\big{\|}>\varepsilon_{1}/(8M\ell)\Big{\}}\Big{\|}$
	$\displaystyle\hskip 42.67912pt+\sum_{0\leq l\leq\ell-1}\sum_{m\in\mathcal{M}}\mathbb{P}\Big{(}\sup_{U\in V^{N}_{m}}\Big{\|}\frac{\|E^{N}_{k}(U)\|}{\|E^{N}_{k}(V^{N})\|}-\frac{v_{m}p_{k,m}}{\delta_{k}}\frac{\|U\|}{\|V^{N}_{m}\|}\Big{\|}>\varepsilon_{1}/(8M\ell)\Big{)}$		(D.13)

		$\displaystyle\mathbb{P}\Big{(}\sup_{t\in[0,T]}\Big{\|}\Big{\{}i\in W^{N}_{k}:\sum_{m\in\mathcal{M}}\big{\|}\sum_{l\geq\ell}(\hat{x}^{N}_{i,m,l}(t)-x^{N}_{k,m,l}(t))\big{\|}>\varepsilon_{1}/4\Big{\}}\Big{\|}\geq\varepsilon_{2}M(N)/(4K)\Big{)}$
		$\displaystyle\leq\frac{4K}{\varepsilon_{2}M(N)}\sup_{U\in V^{N}}\Big{\|}\Big{\{}i\in W^{N}_{k}:\big{\|}\frac{\|\mathcal{N}^{N}_{w}(i)\cap U\|}{\|\mathcal{N}^{N}_{w}(i)\|}-\frac{\|E^{N}_{k}(U)\|}{\|W^{N}_{k}(V^{N})\|}\big{\|}>\varepsilon_{1}/4\Big{\}}\Big{\|}.$		(D.16)

		$\displaystyle\sum_{m\in\mathcal{M}}\sum_{l\in\mathbb{N}_{0}}\|F^{N}_{m,l}(\mathbf{x})-f_{m,l}(\mathbf{x})\|$
		$\displaystyle\leq\sum_{m\in\mathcal{M}}\sum_{l\in\mathbb{N}_{0}}\sum_{r=1}^{d}\sum_{r_{1}=1}^{r}\frac{r_{1}}{r}\frac{d!}{r_{1}!(r-r_{1})!(d-r)!}\Big{(}\big{(}x_{m,l}\big{)}^{r_{1}}\big{(}\sum_{\mathcal{M}\setminus\{m\}}x_{m,l}\big{)}^{r-r_{1}}\big{(}\sum_{\mathcal{M}}\sum_{l^{\prime}\geq l+1}x_{m,l^{\prime}}\big{)}^{d-r}$
		$\displaystyle\hskip 99.58464pt-\big{(}x_{m,l}-\frac{r_{1}}{N}\big{)}^{r_{1}}\big{(}\sum_{\mathcal{M}\setminus\{m\}}x_{m,l}-\frac{r-r_{1}}{N}\big{)}^{r-r_{1}}\big{(}\sum_{\mathcal{M}}\sum_{l^{\prime}\geq l+1}x_{m,l^{\prime}}-\frac{d-r}{N}\big{)}^{d-r}\Big{)}$
		$\displaystyle\leq\sum_{m\in\mathcal{M}}\sum_{l\in\mathbb{N}_{0}}\sum_{r=1}^{d}\sum_{r_{1}=1}^{r}\frac{r_{1}}{r}\frac{d!r_{1}(r-r_{1})(d-r)}{r_{1}!(r-r_{1})!(d-r)!}x_{m,l}(\sum_{\mathcal{M}\setminus\{m\}}x_{m,l})(\sum_{\mathcal{M}}\sum_{l^{\prime}\geq l+1}x_{m,l^{\prime}})\big{(}\frac{d}{N}\big{)}^{d}$
		$\displaystyle\leq\sum_{m\in\mathcal{M}}\sum_{l\in\mathbb{N}_{0}}\sum_{r=1}^{d}\sum_{r_{1}=1}^{r}\frac{r_{1}}{r}\frac{d!r_{1}(r-r_{1})(d-r)}{r_{1}!(r-r_{1})!(d-r)!}x_{m,l}\big{(}\frac{d}{N}\big{)}^{d}$
		$\displaystyle=\sum_{r=1}^{d}\sum_{r_{1}=1}^{r}\frac{r_{1}}{r}\frac{d!r_{1}(r-r_{1})(d-r)}{r_{1}!(r-r_{1})!(d-r)!}\big{(}\frac{d}{N}\big{)}^{d}\rightarrow 0\text{ as }N\rightarrow\infty.$		(E.3)

$\displaystyle\mathbb{P}(\textit{Mismatch})$	$\displaystyle\leq\sum_{m\in\mathcal{M}}\sum_{l\in\mathbb{N}_{0}}\|F^{N}_{m,l}(\hat{\mathbf{x}}^{N}_{i})-F^{N}_{m,l}(\mathbf{x}^{\prime N}_{k})\|$
	$\displaystyle\leq\sum_{m\in\mathcal{M}}\sum_{l\in\mathbb{N}_{0}}\|F^{N}_{m,l}(\hat{\mathbf{x}}^{N}_{i})-f_{m,l}(\hat{\mathbf{x}}^{N}_{i})\|+\sum_{m\in\mathcal{M}}\sum_{l\in\mathbb{N}_{0}}\|F^{N}_{m,l}(\mathbf{x}^{\prime N}_{k})-f_{m,l}(\mathbf{x}^{\prime N}_{k})\|$
	$\displaystyle\quad+\sum_{m\in\mathcal{M}}\sum_{l\in\mathbb{N}_{0}}\|f_{m,l}(\hat{\mathbf{x}}^{N}_{i})-f_{m,l}(\mathbf{x}^{\prime N}_{k})\|$	(E.4)

$\displaystyle\Delta L^{N}_{m,\ell}(X^{N})$	$\displaystyle=\sum_{i=\ell}^{\infty}\Big{(}\frac{\lambda W(N)}{\lambda W(N)+\sum_{m\in\mathcal{M}}\|V^{N}_{m}\|u_{m}}\mathbb{P}(\mathcal{E}(Q^{N}_{m,i-1}))-\frac{\sum_{m\in\mathcal{M}}\|V^{N}_{m}\|u_{m}}{\lambda W(N)+\sum_{m\in\mathcal{M}}\|V^{N}_{m}\|u_{m}}\frac{\|Q^{N}_{m,i}\|u_{m}}{\sum_{m\in\mathcal{M}}\|V^{N}_{m}\|u_{m}}\Big{)}$
	$\displaystyle\leq\sum_{i=\ell}^{\infty}\Big{(}\frac{\rho q^{N}_{m,i-1}u_{m}(1+\varepsilon)}{\lambda\xi+\sum_{m\in\mathcal{M}}v_{m}u_{m}}-\frac{q^{N}_{m,i}u_{m}}{\lambda\xi+\sum_{m\in\mathcal{M}}v_{m}u_{m}}\Big{)}$
	$\displaystyle=\frac{\rho q^{N}_{m,\ell-1}u_{m}(1+\varepsilon)}{\lambda\xi+\sum_{m\in\mathcal{M}}v_{m}u_{m}}-\frac{1-(1+\varepsilon)\rho}{\lambda\xi+\sum_{m\in\mathcal{M}}v_{m}u_{m}}\sum_{i=\ell}^{\infty}q^{N}_{m,i}u_{m}$	(H.4)


Appendix B Approximation of Graph Structure for Large $N$ Systems