Cooperative Learning-Based Framework
for VNF Caching and Placement Optimization
over Low Earth Orbit Satellite Networks

Khai Doan, Marios Avgeris, Aris Leivadeas, Ioannis Lambadaris, and Wonjae Shin

K. Doan and W. Shin are with Korea University, School of Electrical Engineering, Seoul, South Korea (e-mail: {khaidoan, wjshin}@korea.ac.kr). M. Avgeris and I. Lambadaris are with Carleton University, Department of Systems and Computer Engineering, Ottawa, ON, Canada (e-mail: {mariosavgeris@cunet, ioannis@sce}.carleton.ca). A. Leivadeas is with École de Technologie Supérieure, Department of Software and IT Engineering, Montreal, QC, Canada (e-mail: aris.leivadeas@etsmtl.ca).

Abstract

Low Earth Orbit Satellite Networks (LSNs) are integral to supporting a broad range of modern applications, which are typically modeled as Service Function Chains (SFCs). Each SFC is composed of Virtual Network Functions (VNFs), where each VNF performs a specific task. In this work, we tackle two key challenges in deploying SFCs across an LSN. Firstly, we aim to optimize the long-term system performance by minimizing the average end-to-end SFC execution delay, given that each satellite comes with a pre-installed/cached subset of VNFs. To achieve optimal SFC placement, we formulate an offline Dynamic Programming (DP) equation. To overcome the challenges associated with DP, such as its complexity, the need for probability knowledge, and centralized decision-making, we put forth an online Multi-Agent Q-Learning (MAQL) solution. Our MAQL approach addresses convergence issues in the non-stationary LSN environment by enabling satellites to share learning parameters and update their Q-tables based on distinct rules for their selected actions. Secondly, to determine the optimal VNF subsets for satellite caching, we develop a Bayesian Optimization (BO)-based learning mechanism that operates both offline and continuously in the background during runtime. Extensive experiments demonstrate that our MAQL approach achieves near-optimal performance comparable to the DP model and significantly outperforms existing baselines. Moreover, the BO-based approach effectively enhances the request serving rate over time.

Index Terms:
Network Function Virtualization, Service Function Chains, Satellite Networks, Multi-Agent Reinforcement Learning, Bayesian Optimization.

I Introduction

In the evolving field of communication networks, satellite technologies have introduced innovative solutions for global connectivity. Low Earth Orbit Satellite Networks (LSNs) and space-air-ground integrated networks are gaining momentum for providing seamless global connectivity [1, 2]. The deployment of thousands of Low Earth Orbit (LEO) satellites by operators such as SpaceX and OneWeb represents a shift towards ultra-dense mega constellations aimed at delivering high-data-rate, high-capacity global communication services. As demand for real-time, complex applications grows, especially in remote and latency-sensitive areas, traditional standalone networks often fall short [3]. Network paradigms such as Network Function Virtualization and Service Function Chains (SFCs) have been integrated into the context of LSNs as a solution [4]. Accordingly, Virtual Network Functions (VNFs) are designed and chained in a specific order to form SFCs, which are delivered as the required network services. The flexibility of virtualization allows services to be deployed on satellite platforms to meet stringent requirements. This approach ensures seamless and robust service delivery, enhances reconfigurability, and reduces reliance on specialized hardware [5].

While virtualization enables the distribution of SFC placements across the entire LSN, realizing its benefits is not straightforward. Effectively optimizing the placement of VNF components among the available satellites is crucial for enhancing the overall efficiency of the solution [6]. Specifically in the LSN context, traditional methods for SFC placement [7] may face additional challenges in adapting to the unique characteristics of satellite networks, such as high mobility, complex time-varying topology, and resource competition. Reinforcement learning, and specifically Multi-Agent Q-Learning (MAQL), has shown great potential in addressing these challenges [8]. Reinforcement learning algorithms empower autonomous agents to develop optimal strategies through interactions with their environments and feedback in the form of rewards or penalties. For SFC placement in LSNs, where network conditions frequently change, MAQL enables multiple agents to collaboratively learn and adapt to these dynamic scenarios. Specifically, agents can dynamically optimize SFC placement by considering factors like satellite movement, evolving network conditions, and varying service demands. Additionally, MAQL’s collaborative and decentralized approach enhances decision-making, increasing the adaptability and efficiency of SFC placement strategies in complex LSN environments. This method offers the potential for robust and adaptive SFC placement solutions tailored to the unique characteristics of LSNs.

In our previous work [8], we formulated a Dynamic Programming (DP)-based solution and proposed a MAQL model for SFC deployment that optimized the long-term average end-to-end service delay. In this extension, we provide a detailed description of the proposed cooperative VNF/SFC placement method, followed by formal algorithmic definitions. In addition, considering the practical aspect of our approach, where each satellite has a subset of VNFs pre-installed/cached, we introduce a caching policy to determine the optimal VNF subsets for pre-installation on the satellites. This policy complements the placement mechanism by further optimizing performance and maximizing the request serving rate, given the constraints of satellite resources. Our contribution is fourfold:

  1. We refine the DP-based method from [8] that aims to determine the optimal SFC placement policy to minimize end-to-end service delay. While this method provides an optimal solution, it is computationally intensive. Additionally, it necessitates centralized control and requires statistical information that is not always readily available.

  2. We revisit and refine the online MAQL approach from [8], which treats satellites as independent agents. This approach faces convergence challenges due to the non-stationary learning environment. To overcome this issue, we introduce a parameter sharing mechanism with distinct rules depending on satellites’ actions, which enhances convergence and overall performance.

  3. To optimize the subsets of VNFs pre-installed on satellites, we propose a novel VNF caching policy based on Bayesian Optimization (BO). This approach iteratively refines installed VNF sets at satellites and continuously improves them in the background as the system operates. By integrating this caching strategy with our VNF placement approach, we create a comprehensive framework for LSN service deployment that enhances both end-to-end service delays and request serving rates.

  4. We experimentally demonstrate that our proposed framework closely approximates the optimal solutions. By leveraging the predictable movements of the satellites, our approach achieves improved request serving rates, end-to-end delays, and overall system cost. Additionally, we compare our method against those in the literature to highlight its effectiveness.

The rest of this work is organized as follows: Section II discusses the related works. Section III describes the system model. Section IV provides the DP formulation and the proposed MAQL-based VNF placement solution. In Section V, we introduce the BO-based VNF caching scheme. The experimental results are provided in Section VI and finally a conclusion is drawn in Section VII.

II Related Works

Recent research has focused on exploiting the benefits of LSNs by optimizing the throughput and satellites’ coverage to enhance the communication quality through beam scheduling, resource allocation and satellite formation design. For instance, in [9] the authors proposed an Internet of Things (IoT) system that uses LEO satellites, emphasizing its benefits such as low latency and broad coverage. Mokhtar et al. [10] provided throughput bounds and a beam scheduling algorithm for a satellite downlink system. The authors in [11] used a unique 3D constellation optimization algorithm to optimize global coverage of an ultra-dense LEO satellite network. On the other hand, the authors in [12] examined channel capacity in dense LEO multi-terminal satellite systems and explored the impact of satellite distribution and formation size on the capacity. Wang et al. in [13] addressed a multi-layer LEO satellite-terrestrial network, aiming to minimize satellite utilization. However, beyond the communication aspects, the benefits of non-terrestrial networks, especially LSNs, can extend to computational services such as the one proposed here.

II-A VNF Placement in Static Satellite Topologies

LSNs create a breeding ground for diverse applications, including surveillance, tracking, mapping, weather forecasting, and disaster response [14], which can be decomposed into various supporting tasks such as remote sensing and Earth observation. An emerging paradigm within this context involves breaking down application tasks and abstracting them into VNFs [6]. This approach aligns with the SFC paradigm, facilitating the deployment and interconnection of VNFs in LSNs, the intricate challenges of which have concerned the research community during the last few years. Gao et al. [15] presented a hierarchical architecture for IoT users in satellite edge clouds, including IoT users, LEO satellites, and a cloud data center. They proposed a distributed VNF placement algorithm to minimize network bandwidth cost and service delay, considering satellite network characteristics like high latency and limited bandwidth. Also providing service to remote IoT users, the authors of [16] aimed at optimizing satellite deployment costs by modeling the problem as a network payoff maximization problem, solved with decentralized allocation through a non-cooperative potential game.

II-B VNF Placement in Dynamic Satellite Topologies

Qin et al. [17] examined a network model where service data is processed by satellites and sent to an Earth station. They formulated the problem as a congestion game to optimize delivery time and maintain continuous service, focusing on resource sharing and competition among SFCs. The authors then proposed two algorithms that achieve Nash Equilibrium to solve this problem. Jia et al. [18] tackled a VNF-based service delivery problem involving a network operation control center, Geostationary Equatorial Orbit (GEO) satellites as SDN controllers, and LEO satellites. They proposed an approximation algorithm focused on efficient resource allocation for practical uses. By employing both types of GEO and LEO satellites, the study in [19] divided the network into a control plane and a data plane. A joint optimization approach for elastic network resource provisioning was then introduced, enabling adaptable VNF deployment and traffic routing. In [20], the authors identified optimal SFC deployment paths that involve satellite deployment probabilities based on the satellite’s position and latitude. The authors of [21] proposed a heuristic and a mixed-integer linear programming algorithm to minimize end-to-end service latency while maximizing the service acceptance rate.

Evidently, placing SFCs on satellite networks is a challenging problem; thus following the recent interdisciplinary trends, Han et al. [2] devised an adaptive online SFC deployment algorithm for large-scale LSNs, based on deep reinforcement learning. Here, the authors proposed subnet division and a proximal policy optimization approach to reduce resource consumption in their solution. The cooperation/competition potential between individual satellites during the SFC placement problem was studied by Qin et al. in [3]; an approach based on potential game theory was adopted and three algorithms were proposed to minimize service delivery latency. On a different note, [4] suggested the use of a Tabu-search heuristic to solve the emerging integer non-linear programming problem of SFC deployment in satellite networks while minimizing VNF migrations.

II-C Novelty of the Proposed Framework

Despite the recent research interest around SFC deployment on LSNs, the majority of the works do not address the dynamic nature of the network nor optimize the long-term system performance. Additionally, the aforementioned works assume a priori that satellites are capable of executing every type of VNF available in the system. However, in the LSN context, embedded satellite processor capability and storage are typically limited, hence, this is not a practical assumption. In the proposed framework, which extends our prior work in [8], we aim to specifically address these two limitations in the literature, by providing a holistic LSN service deployment framework (VNF placement and caching) that simultaneously optimizes the end-to-end service delay and the request serving rate.

III System Model

III-A Communication Model

We assume an LSN that consists of multiple satellites, with $\mathbb{V}=\{v_{1},\ldots,v_{V}\}$ being the set of their indices ($V=|\mathbb{V}|$). The satellites gather data (e.g., images of the Earth’s surface) and cooperatively execute computational tasks for the supported applications. The satellites’ movements are considered periodic, and we assume that an inter-satellite link (ISL) between a pair of satellites is active when the two satellites enter the communication range of each other. Additionally, each satellite $v\in\mathbb{V}$ is equipped with $R_{v}$ computational and $Z_{v}$ storage resources.

Figure 1: Evolution of the $(v,u)$-ISL in time, when $e_{v,u}(1)=1$, $\tau_{v,u}=2$, and $T_{v,u}=3$. Two satellites signify an active ISL.

The introduced satellite network operates in a slotted, infinite time horizon, where $t\in\mathbb{N}=\{1,2,\ldots\}$ denotes a time slot. We use the binary parameter $e_{v,u}(t)=1$ (or $0$) to represent whether the ISL between satellites $v$ and $u$, referred to as the $(v,u)$-ISL, is active (or not). Due to the recurring nature of their orbits, satellites interact with each other at fixed intervals, and thus ISLs become active periodically as well. To model this periodicity, we introduce the following parameters. We assume that whenever the $(v,u)$-ISL is active, it remains active for $\tau_{v,u}$ time slots before becoming inactive. In addition, $T_{v,u}$ denotes the period of the $(v,u)$-ISL (in time slots), implying that $T_{v,u}\geq\tau_{v,u}$ and $e_{v,u}(t)=e_{v,u}(t+T_{v,u})$, $\forall t\in\mathbb{N},\;v,u\in\mathbb{V}$. Furthermore, we denote by $T$ the period of the entire network connectivity, which can be computed as the least common multiple of $T_{u,v},\forall u,v\in\mathbb{V}$. For simplicity, we assume that every data transfer between a satellite pair is completed within a single time slot using an inter-satellite free-space optical link with high transmission speed, long communication range, and high reliability. To simplify the presentation and enhance clarity, we use only the parameter sets $T_{u,v}$ and $\tau_{u,v}$, $\forall v,u$, to describe the network topology and the motions of the satellites. An illustrative example is provided in Fig. 1, where the presence/absence of the two satellites in a time slot implies an active/inactive ISL.
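
The periodic ISL model can be illustrated with a short Python sketch. It assumes, beyond what the text specifies, that the phase of each link is given by a hypothetical `start` slot (the first slot in which the link is active); the helper names `isl_active` and `system_period` are illustrative only.

```python
from functools import reduce
from math import gcd


def isl_active(t, start, tau, period):
    """Return e_{v,u}(t): 1 if the ISL is active in slot t, else 0.

    `start` is an assumed phase parameter (first active slot); the model
    itself only fixes the active duration tau and the period T_{v,u}.
    """
    if tau > period:
        raise ValueError("the active duration cannot exceed the period")
    return 1 if (t - start) % period < tau else 0


def system_period(periods):
    """Least common multiple of all ISL periods, i.e., the network period T."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)


# Setting of Fig. 1: e(1) = 1, tau = 2, T_{v,u} = 3.
pattern = [isl_active(t, start=1, tau=2, period=3) for t in range(1, 7)]
print(pattern)                   # [1, 1, 0, 1, 1, 0]
print(system_period([2, 3, 4]))  # 12
```

The modulo test reproduces the property $e_{v,u}(t)=e_{v,u}(t+T_{v,u})$ by construction, so only one period of each link needs to be stored.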

III-B SFCs, VNFs and Service Request Arrivals

As mentioned earlier, the satellites cooperatively execute computational tasks for certain types of applications. Each task requires data to be processed through the sequence of interconnected VNFs of an SFC. For instance, wildfire detection involves several steps such as image processing, feature extraction, and fire classification [14], each one carried out by a separate function represented by a VNF in this work. An ordered sequence of $l_{h}$ VNFs comprises an SFC $h\in\mathbb{H}$, where $\mathbb{H}=\{h_{1},\ldots,h_{H}\}$ is the index set of all the available SFCs/services and $H=|\mathbb{H}|$. Also, we let $\mathbb{F}=\{\mathcal{F}_{1},\mathcal{F}_{2},\ldots\}$ be the set of all available VNFs. We denote by $F^{h}_{f}\in\mathbb{F}$ the $f^{\text{th}}$ VNF of SFC $h$ and by $\mathbb{S}_{h}=\{F^{h}_{f}\;|\;f\in\{1,\ldots,l_{h}\}\}$ the set of VNFs comprising SFC $h\in\mathbb{H}$. For a satellite to execute a specific VNF, the corresponding source code needs to be installed in the satellite’s computing system. The storage used for caching is excluded from the $Z_{v}$ storage resources of a satellite. We denote by $\mathbb{F}_{v}$ the pre-installed set of VNF types on satellite $v\in\mathbb{V}$, with $|\mathbb{F}_{v}|$ being its caching capacity. A VNF $F^{h}_{f}$ can only be placed/executed on satellite $v$ if $F^{h}_{f}\in\mathbb{F}_{v}$. Due to the satellites’ limited physical resources, it is not always feasible to install every available VNF on each of them. Therefore, we consider a scenario where a subset of VNF images has been pre-installed on each satellite. This means that $\mathbb{F}_{v}\subseteq\mathbb{F}$, which includes the special case $\mathbb{F}_{v}=\mathbb{F}$ for a satellite $v\in\mathbb{V}$.

At the end of time slot $t$, a service request is initiated with probability $\mu$. A request initiation triggers satellite $v\in\mathbb{V}$ to request the placement of an SFC $h\in\mathbb{H}$. We call this satellite the requester; $\mu^{r}_{v}$ and $\mu^{s}_{h}$ are the probability that satellite $v$ will be the requester and the probability that SFC $h$ will be requested, respectively, conditioned on the occurrence of a service request. A service request for SFC $h$ will expire at the end of time slot $t+D_{h}$ if not successfully served, where $D_{h}$ is the SFC’s end-to-end delay tolerance. In our system model, we assume that at most one SFC request is initiated per time slot. Additionally, since the execution of an SFC may span multiple time slots, a satellite can manage multiple requests within a time slot, including its own and those forwarded from other satellites. In this work, each request is assigned a timestamp (described in Section IV-B) corresponding to its initiation time, and requests are processed in ascending order of these timestamps. In a practical context where a satellite initiates or receives multiple requests with the same initiation time slot, these requests can be buffered to eliminate their temporal overlap. Each buffered request is processed in a time slot, and the proposed framework in this paper remains applicable. Timestamps along with the use of buffers help determine the order in which requests are processed, thereby preventing resource conflicts.
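
As a rough illustration of this arrival model, the following Python sketch samples at most one request per slot and orders buffered requests by timestamp. The function names, the dictionary-based request record, and the numerical probabilities are hypothetical choices made for the example, not part of the proposed framework.

```python
import random


def sample_request(t, mu, mu_r, mu_s, delay_tol, rng):
    """Sample the request (if any) initiated at the end of slot t.

    mu        : probability that a request is initiated in a slot
    mu_r      : dict satellite -> probability of being the requester
    mu_s      : dict SFC -> probability of being requested
    delay_tol : dict SFC -> end-to-end delay tolerance D_h (in slots)
    """
    if rng.random() >= mu:
        return None  # at most one request per slot, possibly none
    sats, sfcs = list(mu_r), list(mu_s)
    n = rng.choices(sats, weights=[mu_r[v] for v in sats])[0]
    h = rng.choices(sfcs, weights=[mu_s[s] for s in sfcs])[0]
    return {"requester": n, "sfc": h, "timestamp": t, "expires": t + delay_tol[h]}


def processing_order(buffer):
    """Process buffered requests in ascending order of their timestamps."""
    return sorted(buffer, key=lambda y: y["timestamp"])


rng = random.Random(0)  # fixed seed, only to make the sketch repeatable
mu_r = {1: 0.5, 2: 0.5}   # assumed requester probabilities
mu_s = {1: 0.7, 2: 0.3}   # assumed SFC probabilities
D = {1: 6, 2: 7}          # assumed delay tolerances

buffer = []
for t in (5, 3, 4):
    y = sample_request(t, 1.0, mu_r, mu_s, D, rng)  # mu = 1: always a request
    if y is not None:
        buffer.append(y)
print([y["timestamp"] for y in processing_order(buffer)])  # [3, 4, 5]
```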

III-C VNF Execution and Inter-Satellite Data Transfer

We use the terms placement and execution interchangeably throughout this work. Executing SFC $h$ means that all the VNFs $F^{h}_{f}$ for $f=1,\ldots,l_{h}$ are executed in the given order. That is, the output of the $f^{\text{th}}$ VNF becomes the input of the $(f+1)^{\text{th}}$ VNF in the sequence. The output of the last VNF in the sequence carries the result of SFC $h$ and needs to be returned to the requester satellite to mark the request as successfully served. We also denote by $q^{h}_{f}$ and $g^{h}_{f}$ the computational and storage resources required to execute $F^{h}_{f}\in\mathbb{S}_{h}$ and to store its output, respectively.

We assume that if satellite $v$ executes VNF $F^{h}_{f}$, and VNF $F^{h}_{f+1}$ is assigned to another satellite $u$ following a placement policy (e.g., because $F^{h}_{f+1}$ has not been pre-installed on $v$), $v$ needs to store the output of $F^{h}_{f}$ in its storage before forwarding it to satellite $u$ (the forwarding path may consist of multiple satellites as relays). In this case, when $F^{h}_{f}$ has been executed, $q^{h}_{f}$ computational resources on satellite $v$ are released, but $g^{h}_{f}$ storage resources are consumed to temporarily store its output. This is necessary because communication can occur only when the $(v,u)$-ISL is active. After forwarding the data, $g^{h}_{f}$ storage resources are released on satellite $v$, while $g^{h}_{f}$ storage resources are consumed on satellite $u$ to store the received data. Consequently, data forwarding might fail if the receiving satellite does not have sufficient available storage space. In contrast, if a satellite executes a sequence of consecutive VNFs itself, the intermediate data is not stored in its storage.
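
The resource bookkeeping described above can be sketched in Python as follows. The `Satellite` class and the three helper functions are hypothetical names introduced for illustration. The case where the executing satellite itself lacks storage for its output is not specified in the text, so the sketch simply reports a failure there as an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Satellite:
    compute: int                              # available computational resources
    storage: int                              # available storage resources
    cached: set = field(default_factory=set)  # pre-installed VNFs, F_v


def start_vnf(sat, vnf, q):
    """Reserve q computational resources if the VNF is cached and resources suffice."""
    if vnf not in sat.cached or sat.compute < q:
        return False
    sat.compute -= q
    return True


def finish_vnf(sat, q, g, next_on_same_satellite):
    """Release compute; store the output only if the next VNF runs elsewhere."""
    sat.compute += q
    if next_on_same_satellite:
        return True   # intermediate data is not stored
    if sat.storage < g:
        return False  # assumed failure mode, not specified in the text
    sat.storage -= g
    return True


def forward(src, dst, g, link_active):
    """Move g units of stored output from src to dst over an active ISL."""
    if not link_active or dst.storage < g:
        return False  # inactive link, or the receiver lacks storage
    src.storage += g  # the sender releases its copy
    dst.storage -= g  # the receiver stores the data
    return True


s1 = Satellite(compute=2, storage=3, cached={"F1"})
s2 = Satellite(compute=2, storage=3, cached={"F2"})
start_vnf(s1, "F1", q=1)
finish_vnf(s1, q=1, g=1, next_on_same_satellite=False)
print(forward(s1, s2, g=1, link_active=False))  # False
print(forward(s1, s2, g=1, link_active=True))   # True
print(s1.storage, s2.storage)                   # 3 2
```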

In this work, we address the following two sub-problems:

  • The first sub-problem is to assign VNFs to satellites and define data routing paths to optimize the overall system performance. We refer to this as the VNF/service placement problem which is formally defined and solved in Section IV.

  • The second sub-problem is to define the optimal VNF subset installed on each satellite. We refer to this as the VNF caching problem which is formally defined and solved in Section V.

Fig. 2 illustrates a VNF placement constrained by installed VNF subsets. Table I summarizes the main notation. To reduce complexity, we will gradually augment the notation with additional variables as needed. In addition, we note that service requests and ISL transitions occur in each time slot. The former, along with the available resources of each satellite, constitutes the system state, which will be defined in the next section. The latter is due to satellite motions. Therefore, a time slot can be interpreted as a unit of time that quantifies both components.

TABLE I: Summary of main notation.
Notation | Description
$\mathbb{V}=\{v_{1},\ldots,v_{V}\}$ | Set of LSN’s satellites
$e_{v,u}(t)$ | $(v,u)$-ISL state at time slot $t\in\mathbb{N}$, $u,v\in\mathbb{V}$
$\tau_{v,u},T_{v,u},T$ | Active ISL duration, period, system period
$R_{v},Z_{v}$ | Computational, storage capacity of $v\in\mathbb{V}$
$\mathbb{H}=\{1,\ldots,H\}$ | Set of LSN’s available SFCs/services
$\mathbb{S}_{h}=\{F^{h}_{f}\}$ | Set of VNFs comprising SFC $h\in\mathbb{H}$, with $f\leq l_{h}$ ($l_{h}$ being the length of SFC $h$)
$\mathbb{F}_{v}=\{\mathcal{F}_{i}\}$ | Set of cached VNF types on satellite $v\in\mathbb{V}$, $i\leq\lvert\mathbb{F}_{v}\rvert$
$\mu,\mu^{r}_{v},\mu^{s}_{h}$ | Service request, requester satellite, $v\in\mathbb{V}$, and requested SFC, $h\in\mathbb{H}$, probabilities
$D_{h}$ | SFC $h\in\mathbb{H}$ end-to-end delay tolerance
$q^{h}_{f},g^{h}_{f}$ | $F^{h}_{f}$ computational, storage requirements
$d^{h}_{f,v}$ | $F^{h}_{f}$ execution delay on $v\in\mathbb{V}$
$\mathbf{p}=\{\boldsymbol{\phi},\mathbf{m}\}$ | (DP) SFC placement
$\boldsymbol{\phi}=\{\phi_{k}\}$ | Set of satellites $\phi_{k}\in\mathbb{V}$ that handle the request at each time slot $k\in\mathbb{N}$
$\mathbf{m}=\{m_{f}\}$ | Set of time slots $m_{f}\in\mathbb{N}$ where VNF $F^{h}_{f}\in\mathbb{F}_{v}$ is activated
$K$ | Largest end-to-end SFC delay tolerance
$\mathbf{r}_{v}=\{r_{v,k}\}$ | Computational resource of $v\in\mathbb{V},\forall k\leq K$
$\mathbf{z}_{v}=\{z_{v,k}\}$ | Storage resource of $v\in\mathbb{V},\forall k\leq K$
$\mathbf{x}=\{\mathbf{x}_{v},h,n\}$ | (DP) System state: $\mathbf{x}_{v}=(\mathbf{r}_{v},\mathbf{z}_{v})$, $\forall v\in\mathbb{V}$, requested SFC $h\in\mathbb{H}$, requester $n\in\mathbb{V}$
$\tilde{\mathbf{x}}=\{\tilde{\mathbf{x}}_{v},\tilde{h},\tilde{n}\}$ | (DP) Transitioned system state
$\mathcal{P}\{\mathbf{x}\rightarrow_{\mathbf{p}}\tilde{\mathbf{x}}\}$ | (DP) State transition probability given $\mathbf{p}$
$\mathcal{C}(\mathbf{x},\mathbf{p})$ | (DP) Cost of placement $\mathbf{p}$ on state $\mathbf{x}$
$\mathbb{Y}_{v}$ | Set of requests in the buffer of satellite $v$
$\mathbf{y}=\{h,n,f,\hat{t}\}$ | (MAQL) Request handled by satellite $v\in\mathbb{V}$, $h\in\mathbb{H}$, $n\in\mathbb{V}$, $\hat{t}\in\mathbb{N}$, $\mathbf{y}\in\mathbb{Y}_{v}$
$\mathbf{s}_{v}=\{(r_{v},z_{v}),\mathbf{y}\}$ | (MAQL) Satellite $v\in\mathbb{V}$ state, $\forall\mathbf{y}\in\mathbb{Y}_{v}$
$\mathbf{a}_{v}=\{a\}$ | (MAQL) Satellite $v\in\mathbb{V}$ actions, $\forall\mathbf{y}\in\mathbb{Y}_{v}$
$c_{v}(\mathbf{y},a)$ | (MAQL) Cost when satellite $v\in\mathbb{V}$ performs action $a\in\mathbf{a}_{v}$ on handled request $\mathbf{y}\in\mathbb{Y}_{v}$
$\mathbb{A}_{v}(\mathbf{y})$ | The set of available actions for request $\mathbf{y}$ of satellite $v$
$\pi_{v}(\mathbf{y})\rightarrow a$ | (MAQL) Placement policy, $v\in\mathbb{V},a\in\mathbb{A}_{v}(\mathbf{y})$
$C_{v}(\mathbf{s}_{v},\mathbf{a}_{v})$ | (MAQL) Total cost of satellite $v\in\mathbb{V}$ for state $\mathbf{s}_{v}$ and action sequence $\mathbf{a}_{v}$
$\zeta$ | Order of current time slot in system period
$\mathbb{Q}_{v}$ | Q-table of satellite $v$
$Q_{v}^{\zeta}(\mathbf{y},a)$ | Q-value $\in\mathbb{Q}_{v},\mathbf{y}\in\mathbb{Y}_{v},a\in\mathbf{a}_{v},v\in\mathbb{V}$, and $\zeta\in\{1,\ldots,T\}$
$\lambda,\delta$ | Learning rate and discount factor
$\boldsymbol{\theta}$ | System-wide VNF caching strategy
$M_{\Pi}(\boldsymbol{\theta})$ | Request serving rate under policy $\Pi$ and caching strategy $\boldsymbol{\theta}\in\boldsymbol{\Theta}$
$\dot{\mu}(\boldsymbol{\theta}),\dot{\sigma}(\boldsymbol{\theta}),\mathcal{K}(\boldsymbol{\theta},\tilde{\boldsymbol{\theta}})$ | Predicted mean, standard deviation, kernel function for caching strategies $\boldsymbol{\theta},\tilde{\boldsymbol{\theta}}\in\boldsymbol{\Theta}$
$I(\boldsymbol{\theta})$ | Acquisition function for caching strategy $\boldsymbol{\theta}$
Figure 2: The VNF/service placement problem on an LSN.

IV VNF Placement in LSNs

IV-A Dynamic Programming Formulation

The execution of an SFC can span several time slots, and its VNFs can be activated in multiple satellites at different time slots sequentially. We define the service placement as a tuple that contains the indices specifying which satellites will handle data relaying and which will execute VNFs of the requested SFC, as well as their activation time slots. We denote it as $\mathbf{p}=\{\boldsymbol{\phi},\mathbf{m}\}$, where:

$$\boldsymbol{\phi}=\{\phi_{k}\;|\;\phi_{k}\in\mathbb{V},k\in\mathbb{N}\}, \tag{1}$$

and $\phi_{k}$ denotes the index of the satellite that handles the request at the $k^{\text{th}}$ time slot, with $k=1$ representing the time slot when the placement $\mathbf{p}$ is performed. $\phi_{1}$ and $\phi_{|\boldsymbol{\phi}|}$ are, respectively, the indices of the requester and the last satellite handling the request before the final result is obtained by the requester. We note that by “handling” we refer to one of the following: forwarding, carrying, or rejecting a request, or executing one of its VNFs. Thus, the number of elements in $\boldsymbol{\phi}$, i.e., $|\boldsymbol{\phi}|$, gives the end-to-end SFC/service execution and data transfer time, in time slots, for placement $\mathbf{p}$. Set $\mathbf{m}$ is defined as:

$$\mathbf{m}=\{m_{f}\;|\;m_{f}\in\mathbb{N},f\in\{1,\ldots,l_{h}\}\}, \tag{2}$$

where $m_{f}$ is the time slot in which VNF $F^{h}_{f}\in\mathbb{F}_{v}$ is activated, with $m_{f}=1$ representing again the time slot when the placement $\mathbf{p}$ is performed. A request rejection is represented by $\mathbf{p}=\emptyset$. A simple and intuitive example follows:

Example 1

The requester satellite $1$ requests SFC $1$, comprised of two VNFs, $F_{1}^{1}$ and $F_{2}^{1}$. We assume for simplicity that $\mathbb{F}_{1}=\{F_{1}^{1}\}$ and $\mathbb{F}_{2}=\{F_{2}^{1}\}$, i.e., the first and second VNFs are cached on satellites $1$ and $2$, respectively. It is also assumed that each satellite requires one time slot to execute its VNF. A feasible service placement is:

  • $t=1$: Satellite $1$ executes $F^{1}_{1}$.

  • $t=2$: Satellite $1$ forwards its output to satellite $2$.

  • $t=3$: Satellite $2$ receives the data and executes $F^{1}_{2}$.

  • $t=4$: Satellite $2$ forwards the results back to satellite $1$, the requester.

This placement $\mathbf{p}$ spans $4$ time slots. By definition, we have $\boldsymbol{\phi}=\{1,1,2,2\}$ and $\mathbf{m}=\{1,3\}$, as the two VNFs are deployed at $t=1$ and $t=3$, respectively. $\square$
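
The construction of $\boldsymbol{\phi}$ and $\mathbf{m}$ in Example 1 can be reproduced with a small Python sketch. The per-slot step encoding and the helper names `build_placement` and `respects_caching` are hypothetical conveniences for this illustration, not notation from the paper.

```python
def build_placement(steps):
    """Build (phi, m) from per-slot steps.

    steps[k-1] = (satellite, action, vnf_index) describes slot k, where action
    is "execute" or "forward" and vnf_index is None for forwarding slots.
    """
    phi = [sat for sat, _, _ in steps]
    first_slot = {}
    for k, (_, action, f) in enumerate(steps, start=1):
        if action == "execute" and f not in first_slot:
            first_slot[f] = k  # activation slot m_f of the f-th VNF
    m = [first_slot[f] for f in sorted(first_slot)]
    return phi, m


def respects_caching(steps, cached):
    """Check that every VNF is executed only on a satellite that caches it."""
    return all(f in cached[sat] for sat, action, f in steps if action == "execute")


# Example 1: satellite 1 caches the first VNF, satellite 2 the second.
steps = [(1, "execute", 1), (1, "forward", None),
         (2, "execute", 2), (2, "forward", None)]
cached = {1: {1}, 2: {2}}
phi, m = build_placement(steps)
print(phi, m, len(phi))                 # [1, 1, 2, 2] [1, 3] 4
print(respects_caching(steps, cached))  # True
```

Here `len(phi)` plays the role of $|\boldsymbol{\phi}|$, the end-to-end delay of the placement in time slots.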

We note that when a placement $\mathbf{p}$ is made at time slot $t$, the required resources of all involved satellites in future time slots are reserved accordingly. We define $K=\max_{h\in\mathbb{H}}D_{h}$ as the largest end-to-end delay tolerance among all supported SFCs. We represent the system’s state by $\mathbf{x}$, which is updated at the beginning of every time slot; $\mathbf{x}$ encapsulates the satellites’ available resources for up to $K$ future time slots, the index of the requested SFC, as well as the index of the requester satellite. (It is sufficient for $\mathbf{x}$ to convey $K$ time slots from the current one because no placement can span more than $K$ time slots.) The vector of available resources of satellite $v$ for the next $K$ time slots is given by $\mathbf{x}_{v}=(\mathbf{r}_{v},\mathbf{z}_{v})$, with $\mathbf{r}_{v}=\{r_{v,k}\;|\;r_{v,k}\leq R_{v},\forall k\in\{1,\ldots,K\}\}$ and $\mathbf{z}_{v}=\{z_{v,k}\;|\;z_{v,k}\leq Z_{v},\forall k\in\{1,\ldots,K\}\}$, where $r_{v,k}$ and $z_{v,k}$ are the available computational and storage resources of satellite $v$, respectively, at time slot $k$. These allow us to formally define the system state as:

$$\mathbf{x}=\{\mathbf{x}_{v},h,n\;|\;\forall v\in\mathbb{V},h\in\mathbb{H},n\in\mathbb{V}\}, \tag{3}$$

where $h$ and $n$ are the indices of the requested SFC and the requester at the end of the previous time slot, respectively. If there is no request in the previous time slot, $h=n=0$.
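
One possible in-memory representation of the state in (3) is sketched below in Python. The `SystemState` class, the `idle_state` helper, and the numerical values are assumptions made for illustration; the paper does not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass
class SystemState:
    r: dict  # r[v][k-1]: available computational resources of v at future slot k
    z: dict  # z[v][k-1]: available storage resources of v at future slot k
    h: int   # index of the requested SFC (0 if there is no request)
    n: int   # index of the requester satellite (0 if there is no request)


def idle_state(R, Z, D):
    """State with all resources free over the next K slots and no pending request."""
    K = max(D.values())  # largest end-to-end delay tolerance
    r = {v: [R[v]] * K for v in R}
    z = {v: [Z[v]] * K for v in Z}
    return SystemState(r=r, z=z, h=0, n=0)


# Hypothetical instance: two satellites, two SFCs with tolerances of 6 and 7 slots.
x = idle_state(R={1: 2, 2: 2}, Z={1: 3, 2: 3}, D={1: 6, 2: 7})
print(len(x.r[1]), x.r[1][0], x.z[2][0], x.h, x.n)  # 7 2 3 0 0
```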

We define $\tilde{\mathbf{x}}=\{\tilde{\mathbf{x}}_{v},\tilde{h},\tilde{n}\;|\;\forall v\in\mathbb{V},\tilde{h}\in\mathbb{H},\tilde{n}\in\mathbb{V}\}$ as the transitioned system state resulting from placement $\mathbf{p}$ at the beginning of the following time slot; $\tilde{\mathbf{x}}_{v}=(\tilde{\mathbf{r}}_{v},\tilde{\mathbf{z}}_{v})$ represents the transitioned vector of available resources on satellite $v$. We compute the components of $\tilde{\mathbf{x}}_{v}$ as follows:

  (i) if $v=\phi_{m_{f}}\in\boldsymbol{\phi}$ and $m_{f}\in\mathbf{m}$, meaning that satellite $v$ is executing VNF $F^{h}_{f}$, hence, $\phi_{m_{f}+m}=v,\;\forall m\in[0,d^{h}_{f,v}-1]$. Therefore, $\tilde{r}_{v,k}=r_{v,(k+1)}-q^{h}_{f}$, and $\tilde{z}_{v,k}=z_{v,(k+1)}$, $\forall k\in[m_{f},m_{f}+d^{h}_{f,v}-1]$.

  (ii) else if $v=\phi_{m}\in\boldsymbol{\phi}$, $m\notin\mathbf{m}$, and $\exists m_{f}\in\mathbf{m}$ such that either $m_{f}<m<m_{f+1}$ or $m>m_{f=l_{h}}$, meaning $v$ stores the output of $F^{h}_{f}$. Therefore, $\tilde{r}_{v,(m-1)}=r_{v,m}$ and $\tilde{z}_{v,(m-1)}=z_{v,m}-g^{h}_{f}$. We note that if $m_{f}<m$, with $m_{f}\in\mathbf{m}$, then $m\geq 2$.

  (iii) else either $v\notin\boldsymbol{\phi}$, i.e., satellite $v$ is not part of the placement $\mathbf{p}$, then, $\tilde{r}_{v,k}=r_{v,(k+1)},\tilde{z}_{v,k}=z_{v,(k+1)}$, $\forall k\in[1,K-1]$, or the placement $\mathbf{p}$ spans less than $K$ time slots ($|\boldsymbol{\phi}|<K$), then, $\tilde{r}_{v,k}=r_{v,(k+1)},\tilde{z}_{v,k}=z_{v,(k+1)}$, $\forall k\in[|\boldsymbol{\phi}|+1,K-1]$.
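
For the simplest case, rule (iii) with a satellite that is not part of the placement, the transition amounts to shifting the $K$-slot availability window by one slot. The Python sketch below covers only this case; it also sets the new farthest slot to full capacity, in line with the model's premise that no valid placement spans more than $K$ time slots. The function names and the numerical vectors are hypothetical.

```python
def shift_window(avail, capacity):
    """Advance a K-slot availability vector by one time slot.

    avail[k-1] is the resource available at future slot k. After one slot
    elapses, the new slot k takes the old value of slot k + 1, and the new
    farthest slot K is fully available.
    """
    return avail[1:] + [capacity]


def transition_uninvolved(r_v, z_v, R_v, Z_v):
    """Rule (iii) for a satellite that is not part of the placement."""
    return shift_window(r_v, R_v), shift_window(z_v, Z_v)


# Hypothetical vectors with K = 4, R_v = 2, Z_v = 3; earlier reservations remain.
r_new, z_new = transition_uninvolved([1, 1, 2, 2], [3, 2, 3, 3], R_v=2, Z_v=3)
print(r_new)  # [1, 2, 2, 2]
print(z_new)  # [2, 3, 3, 3]
```

Rules (i) and (ii) would additionally subtract $q^{h}_{f}$ or $g^{h}_{f}$ in the slots where the satellite executes a VNF or stores its output; they are omitted here to keep the sketch minimal.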

Since a valid placement $\mathbf{p}$ cannot span more than $K$ time slots, $\tilde{r}_{v,K}=R_{v}$ and $\tilde{z}_{v,K}=Z_{v}$ always. A representative example is provided below to aid the understanding of the definitions of system states and state transitions:

Example 2

Let us consider an LSN consisting of two satellites, $V=2$, each equipped with computational resources $R_{1}=R_{2}=2$ and storage resources $Z_{1}=Z_{2}=3$. Two services are offered by the system as SFCs, $H=2$, of length $l_{1}=l_{2}=2$ and maximum end-to-end delay tolerances equal to $D_{1}=6$ and $D_{2}=7$ time slots. We have $K=\max(D_{1},D_{2})=7$. The execution of each of their VNF components consumes $q^{1}_{f}=q^{2}_{f}=1$ computational resources and requires $d^{1}_{f,v}=d^{2}_{f,v}=1$, $v\in\mathbb{V}$, time slots to complete. Their output consumes $g^{1}_{f}=g^{2}_{f}=1$ storage resources when stored. Data forwarding takes $1$ time slot. Let $T_{1,2}=T=2$ and $\tau_{1,2}=1$, i.e., the two satellites are in communication range of each other every $2$ time slots and, once they are connected, their connection remains active for $1$ time slot. Initially, we assume that the two satellites are not connected, i.e., the $(1,2)$-ISL is inactive, thus $e_{1,2}(1)=0$, while they have all their resources available, i.e., $\mathbf{r}_{v}=\{r_{v,k}=2\;|\;k=1,\ldots,7\}$ and $\mathbf{z}_{v}=\{z_{v,k}=3\;|\;k=1,\ldots,7\}$ for $v=1,2$. Additionally, let $\mathbb{F}_{1}=\{F_{2}^{1}\}$ and $\mathbb{F}_{2}=\{F_{1}^{1}\}$, i.e., satellites 1 and 2 come pre-installed with the second and the first VNF of SFC $1$, respectively.

Refer to caption
Figure 3: Visualization of SFC deployment in Example 2.

At the end of time slot $t=1$, the requester satellite $n=2$ initiates a service request for SFC $h=1$. Then, the system state at the beginning of time slot $t=2$ is $\mathbf{x}=\{\mathbf{x}_{1},\mathbf{x}_{2},1,2\}$, where $\mathbf{x}_{1}=\{\mathbf{r}_{1},\mathbf{z}_{1}\}$ and $\mathbf{x}_{2}=\{\mathbf{r}_{2},\mathbf{z}_{2}\}$. We visualize a feasible service placement of this example in Fig. 3, with the following details:

  • $t=1$: Satellite 2 executes $F_{1}^{1}$, which has been installed.

  • $t=2$: Satellite 2 forwards its output to satellite 1.

  • $t=3$: Satellite 1 receives the data and executes $F_{2}^{1}$.

  • $t=4$: Satellite 1 forwards the output of $F_{2}^{1}$ (the desired result) back to satellite 2, the requester.

From this process, we obtain $\boldsymbol{\phi}=\{2,2,1,1\}$ and $\mathbf{m}=\{1,3\}$. We can calculate the components of the transitioned vectors of available resources, $\tilde{\mathbf{x}}_{1}$ and $\tilde{\mathbf{x}}_{2}$, through the following steps:

  • For $v=\phi_{m_{1}}=2$ with $m_{1}=1$, condition (i) is satisfied. The same holds for $v=\phi_{m_{2}}=1$ with $m_{2}=3$. Therefore, $\tilde{r}_{1,2}=r_{1,3}-1=1$ and $\tilde{z}_{1,2}=z_{1,3}=3$.

  • For $v=\phi_{2}=2,\;m=2\notin\mathbf{m}$ and $\exists m_{1}\in\mathbf{m}$ such that $m_{1}=1<m<m_{2}=3$, condition (ii) is satisfied. The same holds for $v=\phi_{4}=1$. Therefore, $\tilde{r}_{2,1}=r_{2,2}=2$ and $\tilde{z}_{2,1}=z_{2,2}-1=2$. Also, $\tilde{r}_{1,3}=r_{1,4}=2$ and $\tilde{z}_{1,3}=z_{1,4}-1=2$.

  • Since the placement $\mathbf{p}$ spans $|\boldsymbol{\phi}|=4<K=7$ time slots, condition (iii) is satisfied for calculating the rest of the components. Therefore, $\tilde{r}_{v,k}=r_{v,(k+1)}=2$ and $\tilde{z}_{v,k}=z_{v,(k+1)}=3$, $\forall v\in\{1,2\},\;\forall k\in\{5,6\}$. Finally, $\tilde{r}_{1,7}=\tilde{r}_{2,7}=2$ and $\tilde{z}_{1,7}=\tilde{z}_{2,7}=3$. $\square$

Next, we define the transition probability from $\mathbf{x}$ to $\tilde{\mathbf{x}}$, given a placement $\mathbf{p}$, as:

$$\mathcal{P}\left\{\mathbf{x}\underset{\mathbf{p}}{\rightarrow}\tilde{\mathbf{x}}\right\}=\begin{cases}\mu\times\mu^{r}_{\tilde{n}}\times\mu^{s}_{\tilde{h}}, & \text{if a request arrives},\\ 1-\mu, & \text{otherwise}.\end{cases} \tag{4}$$

Additionally, we define the system cost $\mathcal{C}(\mathbf{x},\mathbf{p})$ for performing placement $\mathbf{p}$ while in state $\mathbf{x}$ as $\mathcal{C}(\mathbf{x},\mathbf{p})=|\boldsymbol{\phi}|$ if the request is served, and $\mathcal{C}(\mathbf{x},\mathbf{p})=C_{p}\gg 0$ if it is rejected.

To accommodate the calculation of the long-term, time-dependent discounted cost, we augment the state and placement variables with time slot notation, $\mathbf{x}(t)$ and $\mathbf{p}(t)$, respectively. Then, this discounted cost is given by:

$$\sum_{t=1}^{\infty}\gamma^{t}\,\mathbb{E}\left[\mathcal{C}\left(\mathbf{x}(t),\mathbf{p}(t)\right)\right], \tag{5}$$

where $\gamma\in[0,1]$ is the discount factor. The objective of the VNF placement task is to minimize the discounted cost (5). Let $J(\mathbf{x})$ be the minimum discounted cost, which can be calculated through the following DP equation:

$$J(\mathbf{x})=\underset{\mathbf{p}}{\min}\left\{\mathcal{C}(\mathbf{x},\mathbf{p})+\gamma\;\mathcal{P}\left\{\mathbf{x}\underset{\mathbf{p}}{\rightarrow}\tilde{\mathbf{x}}\right\}J(\tilde{\mathbf{x}})\right\}. \tag{6}$$

Solving (6) produces the optimal service placement $\mathbf{p}$ that minimizes the end-to-end delay for the requested service.
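To make the recursion in Eq. (6) concrete, the following minimal sketch runs value iteration on a toy cost-minimizing Markov decision process. It assumes that the second term of Eq. (6) is read as an expectation over successor states, i.e., a probability-weighted sum of $J(\tilde{\mathbf{x}})$; the state names, actions, costs, and probabilities are hypothetical and are not taken from the LSN model.

```python
# Toy value iteration for a cost-minimizing Bellman recursion.
# All states, actions, costs and probabilities below are hypothetical.
GAMMA = 0.9

# transitions[state][action] = (immediate cost, {next_state: probability})
transitions = {
    "idle": {
        "wait": (0.0, {"idle": 0.7, "busy": 0.3}),
    },
    "busy": {
        "serve": (4.0, {"idle": 1.0}),    # request served within 4 slots
        "reject": (50.0, {"idle": 1.0}),  # rejection penalty C_p >> 0
    },
}


def backup(cost, probs, J, gamma):
    """Immediate cost plus discounted expected cost-to-go."""
    return cost + gamma * sum(p * J[nxt] for nxt, p in probs.items())


def value_iteration(transitions, gamma, tol=1e-9, max_iter=10_000):
    J = {s: 0.0 for s in transitions}
    for _ in range(max_iter):
        new_J = {
            s: min(backup(c, pr, J, gamma) for c, pr in actions.values())
            for s, actions in transitions.items()
        }
        if max(abs(new_J[s] - J[s]) for s in J) < tol:
            return new_J
        J = new_J
    return J


def greedy_policy(transitions, J, gamma):
    """Pick, in every state, the action that attains the minimum in the backup."""
    return {
        s: min(actions, key=lambda a: backup(*actions[a], J, gamma))
        for s, actions in transitions.items()
    }


J = value_iteration(transitions, GAMMA)
policy = greedy_policy(transitions, J, GAMMA)
print(J, policy)
```

Even in this two-state toy, the table `J` has one entry per state; in the LSN model the state contains per-satellite resource vectors over $K$ slots, which is what makes the exact recursion expensive.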

A valid placement $\mathbf{p}$ adheres to the system constraints. Recall that $\mathbf{p}=\{\boldsymbol{\phi},\mathbf{m}\}$, with $\boldsymbol{\phi}$ and $\mathbf{m}$ defined in Eqs. (1) and (2), respectively. $\phi_{k}\in\boldsymbol{\phi}$ denotes the index of the satellite handling the request in the $k^{\text{th}}$ time slot and $m_{f}\in\mathbf{m}$ denotes the time slot at which the $f^{\text{th}}$ VNF is executed; $m_{f}=1$ denotes the current time slot. Therefore, $\phi_{m_{f}}$ is the index of the satellite executing the $f^{\text{th}}$ VNF. Then, the set of VNF placement constraints with respect to (wrt.) the requested SFC $h$ is given as:

$$|\boldsymbol{\phi}|\leq D_{h}, \tag{7}$$
$$q^{h}_{f}\leq r_{\phi_{k},k},\quad k\in\left[m_{f},m_{f}+d_{f,\phi_{k}}^{h}-1\right],\; f=1,\ldots,l_{h}, \tag{8}$$
$$g^{h}_{f}\leq z_{\phi_{k},k},\quad k\notin\left[m_{f},m_{f}+d_{f,\phi_{k}}^{h}-1\right],\; f=1,\ldots,l_{h}, \tag{9}$$
$$f\in\mathbb{F}_{\phi_{m_{f}}},\quad f=1,\ldots,l_{h}, \tag{10}$$
$$e_{\phi_{k},\phi_{k+1}}(k)=1,\quad k\in\left[1,|\boldsymbol{\phi}|\right],\text{ for }\phi_{k}\neq\phi_{k+1}. \tag{11}$$

Constraint (7) guarantees that the end-to-end delay of the placement does not exceed the maximum delay tolerance of SFC $h$. Constraint (8) ensures that the satellite executing VNF $f$ at time slot $m_{f}$ has sufficient computational resources. Constraint (9) ensures that every satellite handling the request has enough storage resources to store the VNFs' outputs. We note that before the first VNF is executed, only the request event is transferred between satellites, with no intermediate data produced by VNFs; hence, the storage resource is considered to be unaffected and constraint (9) is not required for $k\in[1,\max(1,m_{1}-1)]$. Constraint (10) ensures that only satellites installed with VNF $f$ can execute this VNF. Finally, constraint (11) guarantees that data is transferred only via active ISLs. Additionally, we note that if $\phi_{k}=\phi_{k+1}$, then $e_{\phi_{k},\phi_{k+1}}(k)=1$ always.
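As an illustration, the sketch below checks constraints (7), (8), (10), and (11) for a candidate placement, using the data of Example 2. It is a simplified reading rather than a full implementation: the storage constraint (9) is omitted, the ISL check is applied only between consecutive entries of $\boldsymbol{\phi}$, and the function name, the dictionary layouts, and the rule that the $(1,2)$-ISL is active in even slots are assumptions made for this example.

```python
def is_valid_placement(phi, m, h, D, q, d, cached, r, isl_active):
    """Check constraints (7), (8), (10) and (11); storage (9) is omitted.

    phi[k-1]   : satellite handling the request in slot k (slots are 1-based)
    m[f-1]     : slot in which the f-th VNF starts executing
    D[h]       : delay tolerance of SFC h
    q[(h, f)]  : computational demand of the f-th VNF of SFC h
    d[(h,f,v)] : execution time of that VNF on satellite v
    cached[v]  : set of (h, f) pairs installed on satellite v
    r[v][k-1]  : available computational resources of v in slot k
    isl_active(u, w, k): True if the (u, w)-ISL is active in slot k
    """
    if len(phi) > D[h]:                                   # (7)
        return False
    for f, m_f in enumerate(m, start=1):
        v = phi[m_f - 1]
        if (h, f) not in cached[v]:                       # (10)
            return False
        for k in range(m_f, m_f + d[(h, f, v)]):          # (8)
            if k > len(phi) or phi[k - 1] != v or q[(h, f)] > r[v][k - 1]:
                return False
    for k in range(1, len(phi)):                          # (11)
        u, w = phi[k - 1], phi[k]
        if u != w and not isl_active(u, w, k):
            return False
    return True


# Data of Example 2 (SFC 1 requested by satellite 2).
D = {1: 6, 2: 7}
q = {(1, 1): 1, (1, 2): 1}
d = {(1, f, v): 1 for f in (1, 2) for v in (1, 2)}
cached = {1: {(1, 2)}, 2: {(1, 1)}}
r = {1: [2] * 7, 2: [2] * 7}


def isl_active(u, w, k):
    # Assumed reading of T = 2, tau = 1, e(1) = 0: link is up in even slots.
    return k % 2 == 0


ok = is_valid_placement([2, 2, 1, 1], [1, 3], 1, D, q, d, cached, r, isl_active)
bad = is_valid_placement([2, 1, 1, 1], [1, 2], 1, D, q, d, cached, r, isl_active)
print(ok, bad)
```

The second placement is rejected because it forwards the request from satellite 2 to satellite 1 in slot 1, when the ISL is inactive.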

IV-B Multi-Agent Q-Learning Solution

Finding a solution for the DP Eq. (6) subject to constraints (7)-(11) can be computationally intractable due to its recursive nature. Furthermore, this equation assumes knowledge of the probabilities associated with service requests, i.e., $\mu$, $\mu_{v}^{r}$, and $\mu_{h}^{s}$, which are typically unobtainable in a realistic environment. To tackle this challenge, we reformulate the VNF placement problem as an MAQL model. In this context, each satellite operates as an independent agent with the goal of learning the optimal VNF placement policy for a requested service. The components of the MAQL model are defined as:

IV-B1 Agent’s State

The agent's state $\mathbf{s}_{v}$ of a satellite $v\in\mathbb{V}$ in a time slot $t$ describes its available resources and the SFC requests it is currently handling. This state is derived from the system state $\mathbf{x}$, introduced in the previous section for the DP formulation. We note that although only one request arrives at the system per time slot, each satellite might be handling multiple requests at any given moment. An agent's state is defined as follows:

$$\mathbf{s}_{v}=\{(r_{v},z_{v}),\mathbf{y}_{i}\;|\;\forall i\in\{1,\ldots,|\mathbb{Y}_{v}|\}\}, \tag{12}$$

where $r_{v}$ and $z_{v}$ are the available computational and storage resources at satellite $v$, respectively, in the considered time slot, $\mathbb{Y}_{v}$ is the set of requests handled by satellite $v$ at the considered time slot, and $|\mathbb{Y}_{v}|$ is the number of its elements; $\mathbf{y}_{i}\in\mathbb{Y}_{v}$ is the $i^{\text{th}}$ request in the set of requests handled by satellite $v$ and is defined as:

$$\mathbf{y}_{i}=\left\{h_{i},n_{i},f_{i},\hat{t}_{i}\right\}, \tag{13}$$

where $h_{i}\in\mathbb{H}$ and $n_{i}\in\mathbb{V}$ are the requested SFC and the requester satellite of the specific request, respectively; $\hat{t}_{i}$ is the timestamp indicating the initiation time slot of the request. We assume that the requests are stored sequentially in the agent's state, i.e., if $\hat{t}_{i}<\hat{t}_{j}$, then $i<j$, with $i,j\in[1,|\mathbb{Y}_{v}|]$. $f_{i}$ indicates the next VNF in SFC $h_{i}$ that needs to be executed. When $f_{i}$ exceeds the total number of VNFs in SFC $h_{i}$, i.e., $f_{i}=l_{h_{i}}+1$, all the VNFs of the requested SFC $h_{i}$ have been executed and the final result is to be returned to the requester. An example follows:

Example 3

We consider two SFCs with the following topologies: SFC 1: VNF 2 $\rightarrow$ VNF 3 and SFC 2: VNF 2 $\rightarrow$ VNF 1 $\rightarrow$ VNF 3. The execution of each VNF in these SFCs spans $1$ time slot. The following scenario takes place:

  • $t=1$: Satellite $1$ requests SFC 2 and carries the request over to the next time slot due to an inactive ISL.

  • $t=2$: Satellite $2$ requests SFC 1 and executes $F_{1}^{1}$. Satellite $1$ again carries the request over to the next time slot.

  • $t=3$: The $(1,3)$- and $(2,3)$-ISLs of the considered satellites are active. Satellite $1$ forwards the request for SFC 2 to satellite $3$, without executing any VNF. Satellite $2$ also forwards the request for SFC 1 and the output of $F_{1}^{1}$ to satellite $3$.

Therefore, at time slot $t=4$, satellite $3$ is handling two requests, $\mathbf{y}_{1}=\{2,1,1,1\}$ and $\mathbf{y}_{2}=\{1,2,2,2\}$, in ascending order of their initiation times. We present Fig. 4 to aid the intuition of this example. $\square$

Refer to caption
Figure 4: Visualization of the request transfer and satellites’ buffered requests in Example 3.

IV-B2 Agent’s Action

The VNF placement action of each agent/satellite $v\in\mathbb{V}$ at a given time slot is a set of sub-actions, each directed to one of the requests it is currently handling:

$$\mathbf{a}_{v}=\left\{a_{i}\;|\;a_{i}\in\{1,\ldots,V+2\},\forall i\in\{1,\ldots,|\mathbb{Y}_{v}|\}\right\}, \tag{14}$$

where $a_{i}$ is the sub-action concerning handled request $\mathbf{y}_{i}$. Recall that the request index $i$ is assigned in ascending order of request initiation time. Accordingly, request $i$ is handled before request $i+1$, i.e., decision $a_{i}$ is taken before $a_{i+1}$. We identify $V+2$ possible sub-actions for each request $\mathbf{y}_{i}$ and classify them into four categories denoted by $({\sf A1}),({\sf A2}),({\sf A3})$, and $({\sf A4})$ as follows:

  • (A1): $a_{i}=u\in\mathbb{V}\backslash\{v\}$; forward the request to satellite $u$. This sub-action is valid if the following hold: (i) the $(v,u)$-ISL is active at the current time slot, and (ii) if $f_{i}\geq 2$, $z_{u}\geq g_{f_{i}}^{h_{i}}$, i.e., satellite $u$ has sufficient storage resources to receive the request.

  • (A2): $a_{i}=V+1$; execute VNF $F^{h_{i}}_{f_{i}}$. This sub-action is carried out provided that the following three conditions are satisfied: (i) not all the VNFs of SFC $h_{i}$ have been executed, i.e., $f_{i}\leq l_{h_{i}}$; (ii) the said VNF is pre-installed on satellite $v$, i.e., $F^{h_{i}}_{f_{i}}\in\mathbb{F}_{v}$; (iii) the satellite has sufficient available resources.

  • (A3): $a_{i}=V+2$; reject the request (always valid).

  • (A4): $a_{i}=v$; stall and carry the request over to the next time slot (always valid).

Let $\mathbb{A}_{v}(\mathbf{y}_{i})$ be the family of valid actions of $v\in\mathbb{V}$ for request $\mathbf{y}_{i}\in\mathbb{Y}_{v}$. In this MAQL setup, we determine a VNF placement policy, denoted by $\pi_{v}(\mathbf{y}_{i})\rightarrow a_{i}$, as a rule that maps each request $\mathbf{y}_{i}$ to an action $a_{i}\in\mathbb{A}_{v}(\mathbf{y}_{i})$.
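The validity rules (A1)-(A4) can be sketched as a small helper that builds $\mathbb{A}_{v}(\mathbf{y}_{i})$. The function name and the dictionary-based inputs are hypothetical; "sufficient available resources" in (A2) is interpreted here as enough free computational resources, and a missing entry in `g` is treated as zero storage demand, both of which are assumptions of this sketch.

```python
def valid_sub_actions(v, request, V, isl_active, free_storage, free_compute,
                      cached, chain_len, q, g):
    """Return the set of valid sub-actions of satellite v for one request.

    request = (h, n, f, t_hat). Encoding of sub-actions:
    u in 1..V with u != v -> forward (A1), V + 1 -> execute (A2),
    V + 2 -> reject (A3), v -> stall (A4).
    """
    h, n, f, t_hat = request
    actions = {v, V + 2}                      # (A4) and (A3) are always valid
    for u in range(1, V + 1):                 # (A1) forward to u
        if u == v or not isl_active(v, u):
            continue
        # Storage is only required once a VNF output exists (f >= 2).
        # Assumption: a missing g entry means no storage demand.
        if f >= 2 and free_storage[u] < g.get((h, f), 0):
            continue
        actions.add(u)
    if (f <= chain_len[h] and (h, f) in cached[v]
            and free_compute[v] >= q[(h, f)]):  # (A2) execute
        actions.add(V + 1)
    return actions


# Satellite 3 handling request y_2 = {1, 2, 2, 2} of Example 3,
# with all ISLs inactive and the second VNF of SFC 1 assumed cached on it.
V = 3
actions = valid_sub_actions(
    v=3, request=(1, 2, 2, 2), V=V,
    isl_active=lambda a, b: False,
    free_storage={1: 3, 2: 3, 3: 3},
    free_compute={1: 2, 2: 2, 3: 2},
    cached={1: set(), 2: set(), 3: {(1, 2)}},
    chain_len={1: 2, 2: 3},
    q={(1, 2): 1}, g={(1, 2): 1},
)
print(sorted(actions))
```

With no active ISL, only stalling, executing, and rejecting remain available to satellite 3.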

IV-B3 Instant Cost

By performing sub-action $a_{i}$ for a request $\mathbf{y}_{i}$, a cost $c_{v}(\mathbf{y}_{i},a_{i})$ is induced on satellite $v$, depending on the type of sub-action:

$$c_{v}(\mathbf{y}_{i},a_{i})=\begin{cases}\mathds{1}\left\{z_{u}\geq g^{h_{i}}_{f_{i}}\right\}+C_{p}\;\mathds{1}\left\{z_{u}<g^{h_{i}}_{f_{i}}\right\}, & \text{if }({\sf A1}),\\ d^{h_{i}}_{f_{i},v}, & \text{if }({\sf A2}),\\ C_{p}, & \text{if }({\sf A3}),\\ 1, & \text{if }({\sf A4}),\end{cases}$$

where $\mathds{1}\{\cdot\}$ is the indicator function, which is equal to $1$ if the enclosed condition is satisfied and $0$ otherwise. In detail, if satellite $v$ forwards the request to satellite $u\in\mathbb{V}$ and $u$ has enough available storage to receive it, then the induced cost is equal to the transfer time ($1$ time slot). Otherwise, data cannot be received, and a penalty $C_{p}$ is incurred. In case satellite $v$ executes this part of the handled request, the cost is equal to the VNF's execution time. Rejecting a request costs a penalty equal to $C_{p}$. Finally, stalling and carrying a request over to the next time slot incurs a cost equal to $1$. Given the state $\mathbf{s}_{v}$ of a satellite $v$, we define the total cost induced by performing a sequence of sub-actions, i.e., action $\mathbf{a}_{v}$, as:

$$C_{v}(\mathbf{s}_{v},\mathbf{a}_{v})=\sum_{i=1}^{|\mathbb{Y}_{v}|}\gamma^{\hat{t}_{i}+1}c_{v}(\mathbf{y}_{i},a_{i}). \tag{15}$$
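The instant cost and the discounted total of Eq. (15) can be sketched as follows. The function names and dictionary layouts are hypothetical, and the numeric values ($\gamma=0.5$, $C_{p}=50$, unit execution time on satellite 3) are chosen only for illustration; the two requests are those held by satellite 3 in Example 3.

```python
def sub_action_cost(v, a, request, V, free_storage, g, d, C_p):
    """Instant cost c_v(y_i, a_i) for the four sub-action categories."""
    h, n, f, t_hat = request
    if a == V + 2:                 # (A3) reject
        return C_p
    if a == V + 1:                 # (A2) execute: cost is the execution time
        return d[(h, f, v)]
    if a == v:                     # (A4) stall for one slot
        return 1
    # (A1) forward to satellite a: one slot if it can store the data,
    # otherwise the penalty.
    return 1 if free_storage[a] >= g[(h, f)] else C_p


def total_cost(v, requests, actions, gamma, V, free_storage, g, d, C_p):
    """Eq. (15): each sub-action cost is weighted by gamma^(t_hat + 1)."""
    return sum(
        gamma ** (req[3] + 1)
        * sub_action_cost(v, a, req, V, free_storage, g, d, C_p)
        for req, a in zip(requests, actions)
    )


requests = [(2, 1, 1, 1), (1, 2, 2, 2)]   # y_1 and y_2 of Example 3
actions = [3, 4]                          # stall y_1, execute y_2 (V + 1 = 4)
cost = total_cost(
    v=3, requests=requests, actions=actions, gamma=0.5, V=3,
    free_storage={1: 3, 2: 3, 3: 3}, g={(1, 2): 1, (2, 1): 1},
    d={(1, 2, 3): 1}, C_p=50,
)
print(cost)  # 0.5**2 * 1 + 0.5**3 * 1 = 0.375
```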

IV-B4 Learning Mechanism & Optimal Action Estimation

In the following, we augment the agent's state, handled request, and agent's action notation to accommodate time slot locality, as $\mathbf{s}(t)$, $\mathbf{y}(t)$, and $\mathbf{a}(t)$. We additionally drop the satellite, $v$, and sequence, $i$, subscripts from them, to avoid overloading the equations. Given the above, in the MAQL environment we are defining, each satellite $v\in\mathbb{V}$ aims at learning the optimal VNF placement policy that minimizes the following discounted long-term cost:

$$\sum_{t=1}^{\infty}\mathbb{E}\left[C_{v}\left(\mathbf{s}(t),\mathbf{a}(t)\right)\right]=\sum_{\mathbf{y}(t)\in\mathbb{Y}_{v}}\Psi(\mathbf{y}(t),a(t)), \tag{16}$$

where $\Psi(\mathbf{y}(t),a(t))=\sum_{t=1}^{\infty}\mathbb{E}_{\pi_{v}}\left[\gamma^{\hat{t}+1}c_{v}\left(\mathbf{y}(t),a(t)\right)\right]$, with $a(t)$ being the sub-action for request $\mathbf{y}(t)\in\mathbb{Y}_{v}$ dictated by policy $\pi_{v}$. Approximating the optimal policy is equivalent to approximating, for every request in each time slot, an action that minimizes this discounted long-term cost.

In our MAQL framework, the Q-table contains the approximations of the long-term sub-costs $\Psi(\mathbf{y}(t),a(t))$. Let $\mathbb{Q}_{v}$ be the Q-table of satellite $v$ that stores all the learned Q-values. We remind the reader that the network topology varies deterministically and periodically with a period $T$. Therefore, at a time slot $t$, the network topology and its upcoming changes are the same as those at a time slot $\zeta$, where $1\leq\zeta\leq T$. This time slot $\zeta$ can then be computed wrt. $t$ as $\zeta=t\;\mathds{1}\{t<T\}+\text{mod}(t,T)\,\mathds{1}\{t\geq T\}$, where $\text{mod}(t,T)$ returns the remainder of $t\div T$. To this end, we denote by $Q_{v}^{(\zeta)}(\mathbf{y}(t),a(t))$ the Q-value estimation of cost $\Psi(\mathbf{y}(t),a(t))$. For presentation convenience, we fix the time slot $t$; thus, hereafter the $t$ notation is dropped.
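The slot mapping can be sketched as below. The first function transcribes the formula as written; note that it evaluates to $0$ whenever $t$ is a multiple of $T$. The second function, `zeta_one_based`, is an assumed alternative (not taken from the text) that keeps the index within $1\leq\zeta\leq T$, which may be preferable if $\zeta$ is used directly as a Q-table index.

```python
def zeta_as_written(t, T):
    """zeta = t * 1{t < T} + mod(t, T) * 1{t >= T}."""
    return t if t < T else t % T


def zeta_one_based(t, T):
    """Assumed alternative wrap that always lands in 1..T."""
    return (t - 1) % T + 1


T = 4
as_written = [zeta_as_written(t, T) for t in range(1, 9)]
one_based = [zeta_one_based(t, T) for t in range(1, 9)]
print(as_written)  # [1, 2, 3, 0, 1, 2, 3, 0]
print(one_based)   # [1, 2, 3, 4, 1, 2, 3, 4]
```

Both variants identify slots that are $T$ apart, so either can serve as the periodic index of the Q-table; they differ only in how multiples of $T$ are labeled.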

In this setting, each satellite's environment is non-stationary, as it depends on the policies employed by other satellites, which continually evolve, meaning that system convergence is not guaranteed. To address this concern, we promote collaboration among the agents in the form of knowledge sharing. When satellite $v\in\mathbb{V}$ forwards a request $\mathbf{y}\in\mathbb{Y}_{v}$ to satellite $u\in\mathbb{V}\setminus\{v\}$, $u$ shares its learned Q-values as follows:

  • If satellite $u$ has handled the request $\mathbf{y}$ before, i.e., the row $Q_{u}^{(\zeta)}(\mathbf{y},a)$ is populated in $\mathbb{Q}_{u}$, then the value $\underset{a}{\min}\,Q_{u}^{(\zeta)}(\mathbf{y},a)$ is shared with satellite $v$.

  • Otherwise, a predefined initial Q-value $Q_{init}$ is shared.

Let $\mathbf{Q}_{v}^{(\zeta)}(\mathbf{s}_{v},\mathbf{a})=\{Q_{v}^{(\zeta)}(\mathbf{y},a)\;|\;\forall\mathbf{y}\in\mathbb{Y}_{v}\}$. Having acquired shared Q-values from other satellites, satellite $v$ updates each component $Q_{v}^{(\zeta)}(\mathbf{y},a)$, if $a\neq V+2$, as follows:

$$Q_{v}^{(\zeta)}(\mathbf{y},a)\leftarrow(1-\lambda)\,Q_{v}^{(\zeta)}(\mathbf{y},a)+\lambda\left(c(\mathbf{y},a)+\delta\hat{Q}\right). \tag{17}$$

Here, $\hat{Q}=\underset{a^{\prime}\in\mathbb{A}_{w}(\mathbf{y}^{\prime})}{\min}\,Q_{v}^{(\zeta)}(\mathbf{y}^{\prime},a^{\prime})$, where $w=v$ if $a\in\{v,V+1\}$ and $w=u$ if $a\in\{1,\ldots,V\}\backslash\{v\}$. $\lambda,\delta\in[0,1]$ are the learning rate and the discount factor, respectively; $\mathbf{y}^{\prime}$ is the next state of $\mathbf{y}=\{h,n,f,\hat{t}\}$, where: (i) $\mathbf{y}^{\prime}=\mathbf{y}$ if $a\neq V+1,V+2$, and (ii) $\mathbf{y}^{\prime}=\{h,n,f+1,\hat{t}\}$ if $a=V+1$. If $a=V+2$, the request is rejected and the penalty cost $C_{p}$ is incurred: $Q_{v}^{(\zeta)}(\mathbf{y},V+2)=C_{p}$. Then, satellite $v$ approximates its optimal action $a^{*}$ for a handled request $\mathbf{y}\in\mathbb{Y}_{v}$ by:

$$a^{*}=\underset{a\in\mathbb{A}_{v}(\mathbf{y})}{\operatorname*{arg\,min}}\;Q_{v}^{(\zeta)}(\mathbf{y},a). \tag{18}$$
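A minimal sketch of the update in Eq. (17) and the action selection in Eq. (18) is given below. The Q-table is stored as a dictionary keyed by $(\zeta,\mathbf{y},a)$, which is an implementation assumption; the target $\hat{Q}$ is passed in by the caller, since it is either the satellite's own minimum or the value shared by the receiving satellite. The values of `Q_INIT`, `lam`, `delta`, and `C_p` are illustrative.

```python
from collections import defaultdict

Q_INIT = 0.0   # hypothetical initial Q-value


def make_q_table():
    # Keys are (zeta, request, action); unseen entries default to Q_INIT.
    return defaultdict(lambda: Q_INIT)


def update_q(Q, zeta, y, a, cost, q_hat, V, C_p, lam=0.1, delta=0.9):
    """One application of Eq. (17); the rejection entry is pinned to C_p."""
    key = (zeta, y, a)
    if a == V + 2:
        Q[key] = C_p
    else:
        Q[key] = (1 - lam) * Q[key] + lam * (cost + delta * q_hat)
    return Q[key]


def best_action(Q, zeta, y, valid_actions):
    """Eq. (18): the valid action with the smallest Q-value (ties -> lowest id)."""
    return min(sorted(valid_actions), key=lambda a: Q[(zeta, y, a)])


V, C_p = 3, 50.0
Q = make_q_table()
y = (1, 2, 2, 2)                       # request y_2 of Example 3, held by v = 3

q_exec = update_q(Q, 1, y, V + 1, cost=1, q_hat=0.0, V=V, C_p=C_p)     # execute
q_rej = update_q(Q, 1, y, V + 2, cost=C_p, q_hat=0.0, V=V, C_p=C_p)    # reject
q_stall = update_q(Q, 1, y, 3, cost=1, q_hat=q_exec, V=V, C_p=C_p)     # stall
a_star = best_action(Q, 1, y, {3, V + 1, V + 2})
print(q_exec, q_rej, q_stall, a_star)
```

In this toy run, stalling inherits the discounted cost of the follow-up execution on top of its own one-slot cost, so executing immediately is selected.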

A formal description of the MAQL-based VNF placement method is provided in Algorithm 1.

Algorithm 1 Cooperative VNF Placement on Satellite $v$
1: Initialize $t\leftarrow 1$
2: $e_{v,u}(1)\leftarrow 1$ if the $(v,u)$-ISL is active, $0$ if not, $\forall u\in\mathbb{V}\backslash\{v\}$
3: while $\mathbb{Y}_{v}\neq\emptyset$ do
4:     Select $\mathbf{y}_{i}=(h_{i},n_{i},f_{i},\hat{t}_{i})\in\mathbb{Y}_{v}$
5:     $\mathbb{A}_{v}(\mathbf{y}_{i})\leftarrow\{u\neq v\;|\;e_{v,u}(t)=1\}\cup\{v\}\cup\{V+2\}$
6:     if $F^{h_{i}}_{f_{i}}\in\mathbb{F}_{v}$ then
7:         $\mathbb{A}_{v}(\mathbf{y}_{i})\leftarrow\mathbb{A}_{v}(\mathbf{y}_{i})\cup\{V+1\}$
8:     if training then
9:         Sample action $a_{i}\in\mathbb{A}_{v}(\mathbf{y}_{i})$ uniformly
10:     else
11:         Perform $a^{*}$, Eq. (18)
12: for $u\in\mathbb{V}\backslash\{v\}$ such that $e_{v,u}(t)=1$ do
13:     Forward requests to $u$, $\forall\mathbf{y}_{i}$ for which $a_{i}=u$
14:     for $\mathbf{y}_{i}\in\mathbb{Y}_{u}$ transferred by $u$ with $i=1,2,\ldots$ do
15:         $\mathbb{Y}_{v}\leftarrow\mathbb{Y}_{v}\cup\{\mathbf{y}_{i}\}$ if sufficient storage
16:     Receive Q-value $Q_{u}^{(\zeta)}(\mathbf{y}_{i},a_{i})$ from $u$, $\mathbf{y}_{i}$: $a_{i}=u$
17:     $\mathbb{Y}_{v}\leftarrow\mathbb{Y}_{v}\backslash\{\mathbf{y}_{i}\in\mathbb{Y}_{v}\;|\;\mathbf{y}_{i}\text{ has been assigned }a_{i}=u\}$
18:     Update Q-table $\mathbb{Q}_{v}$ of satellite $v$ using Eq. (17)
19: $t\leftarrow t+1$
20: $e_{v,u}(t)\leftarrow 1$ if the $(v,u)$-ISL is active, $0$ if not, $\forall u\in\mathbb{V}\backslash\{v\}$
21: go to line 3

Mapping the policies $\pi_{v},\forall v\in\mathbb{V}$, resulting from the convergence of the MAQL algorithm, back to $\mathbf{p}$ allows us to define $\Pi:\mathbf{x}\rightarrow\mathbf{p}$ as the VNF placement policy that minimizes the end-to-end delay for the requested service.

Remark 1. (MAQL Framework's Features Supporting Large-Scale LEO Satellite Systems): In mega-constellation LEO satellite networks, the decentralized architecture of the MAQL framework facilitates scalability and reduces the need for extensive global coordination. Furthermore, the parameter-sharing mechanism enhances both scalability and learning efficiency within large networks.

V VNF Caching in LSNs

In the previous sections, we assumed that each satellite comes pre-installed with a subset of VNFs. Based on this, VNF placement is carried out. In this section, we aim to determine the optimal subset of VNFs to be cached/pre-installed on each satellite, thereby maximizing the successful request serving rate.

V-A Objective Function Formulation

We recall that $|\mathbb{F}_{v}|$ is the given, fixed caching capacity of satellite $v$. A system-wide caching strategy is given as:

$$\boldsymbol{\theta}=\left\{\theta_{i}\;|\;\theta_{i}\in\mathbb{F},\forall i\in\left[1,\sum_{u=1}^{V}|\mathbb{F}_{u}|\right]\right\}, \tag{19}$$

where $\mathbb{F}_{v}=\left\{\theta_{i}\;|\;\forall i\in\left[\sum_{u=1}^{v-1}|\mathbb{F}_{u}|+1,\sum_{u=1}^{v}|\mathbb{F}_{u}|\right]\right\}$ contains the VNFs cached by satellite $v\geq 2$, and $\mathbb{F}_{1}=\{\theta_{i}\;|\;\forall i\in[1,|\mathbb{F}_{1}|]\}$ are those cached by satellite 1. The objective of the VNF caching task is to maximize the average number of requests served in a time slot, i.e., the request serving rate. Thus, we aim at maximizing the objective function $\mathbb{E}\left[M_{\Pi}(\boldsymbol{\theta})\right]$, where $\mathbb{E}[\cdot]$ is the expectation operator and $M_{\Pi}(\boldsymbol{\theta})$ is associated with the VNF placement policy $\Pi$ and caching strategy $\boldsymbol{\theta}$ as:

$$M_{\Pi}(\boldsymbol{\theta})=\frac{1}{\Omega}\sum_{t=1}^{\Omega}\mathds{1}\left\{\mathcal{C}\left(\mathbf{x}(t),\Pi(\mathbf{x}(t)\;|\;\boldsymbol{\theta})\right)<C_{p}\right\}, \tag{20}$$

where $\Omega$ is a predefined time horizon for calculating the serving rate and $\Pi(\mathbf{x}(t)\;|\;\boldsymbol{\theta})$ is the VNF placement at time slot $t$, following policy $\Pi$, given the VNF caching strategy $\boldsymbol{\theta}$. Our objective is then to calculate the optimal caching strategy that maximizes the expected request serving rate:

$$\boldsymbol{\theta}^{*}=\underset{\boldsymbol{\theta}}{\arg\max}\;\mathbb{E}\left[M_{\Pi}(\boldsymbol{\theta})\right]. \tag{21}$$

V-B Bayesian Optimization Solution

Bayesian Optimization (BO) is a powerful iterative method for optimizing black-box functions [22]. This subsection outlines the key components and the BO-based algorithm for solving Eq. (21).

V-B1 Search Space

We define our solution search space as the family $\boldsymbol{\Theta}$ of caching strategies $\boldsymbol{\theta}$ that satisfy the following conditions: i) there exists at least one SFC $h\in\mathbb{H}$ such that $\mathbb{S}_{h}\subseteq\boldsymbol{\theta}$, and ii) a VNF is not cached more than once in each satellite. (Footnote 3: When executing multiple VNFs, a satellite can simultaneously access all installed VNFs; hence, it is sufficient to cache a VNF once.) The first condition ensures that requests for at least one type of SFC can be served, and the second guarantees that no caching storage is wasted.
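The two membership conditions can be sketched as a small check on a flat strategy vector, sliced per satellite as in Eq. (19). The function name, the string VNF identifiers, and the example chains are hypothetical.

```python
def in_search_space(theta, capacities, chains):
    """Return True if the flat caching strategy theta belongs to the search space.

    theta      : flat list of VNF ids, satellite 1's block first (Eq. (19))
    capacities : capacities[v-1] = |F_v| for satellites v = 1..V
    chains     : chains[h] = set of VNF ids that make up SFC h
    """
    if len(theta) != sum(capacities):
        return False
    start = 0
    for cap in capacities:
        block = theta[start:start + cap]     # VNFs cached on this satellite
        if len(set(block)) != len(block):    # ii) no duplicate per satellite
            return False
        start += cap
    cached_anywhere = set(theta)
    # i) at least one SFC is fully covered by the cached VNFs.
    return any(vnfs <= cached_anywhere for vnfs in chains.values())


chains = {1: {"F1", "F2"}, 2: {"G1", "G2"}}
print(in_search_space(["F2", "F1"], [1, 1], chains))        # True
print(in_search_space(["F1", "G1"], [1, 1], chains))        # False: no SFC covered
print(in_search_space(["F1", "F1", "F2"], [2, 1], chains))  # False: duplicate
```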

V-B2 Surrogate Model

A probabilistic surrogate model is selected and trained to predict the distributions of the objective function's outcomes, defined in Eq. (20), for various inputs $\boldsymbol{\theta}\in\boldsymbol{\Theta}$. Let $\dot{\mu}(\boldsymbol{\theta})$ and $\dot{\sigma}(\boldsymbol{\theta})$ be the predicted mean and standard deviation for $\boldsymbol{\theta}$, respectively. Let also the radial basis function $\mathcal{K}(\boldsymbol{\theta},\boldsymbol{\theta}^{\prime})=\exp\left(-\frac{1}{2\beta^{2}}\|\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}\|^{2}\right)$ serve as the kernel function, which measures how close the VNF caching strategy is to the training data, where $\beta$ is a predefined length scale parameter and $\boldsymbol{\theta}^{\prime}\neq\boldsymbol{\theta}$. We select the Gaussian Process (GP) as our surrogate model based on the following considerations.

First, $M_{\Pi}(\boldsymbol{\theta})$ defined in Eq. (20) is a summation of indicator functions, the outputs of which are random quantities. Then, when $\Omega$ is sufficiently large and the outputs of the indicator function are weakly dependent, the Central Limit Theorem [23] can be applied. This theorem suggests that, under these conditions, the distribution of $M_{\Pi}(\boldsymbol{\theta})$ will approximate a normal distribution.

Second, GP modeling offers computational tractability. Let $\tilde{\boldsymbol{\theta}}\in\boldsymbol{\Theta}$ be an input strategy for which $M_{\Pi}(\tilde{\boldsymbol{\theta}})$ has been evaluated. We call $\tilde{\boldsymbol{\theta}}$ a training data point, and $\tilde{\boldsymbol{\Theta}}=\{\tilde{\boldsymbol{\theta}}_{i}\;|\;\forall i\in[1,|\tilde{\boldsymbol{\Theta}}|]\}$ the training dataset. Under the GP modeling, we have the prior distribution $M_{\Pi}(\tilde{\boldsymbol{\theta}})\sim\mathcal{N}\left(\dot{\mu}(\tilde{\boldsymbol{\theta}}),\mathcal{K}(\tilde{\boldsymbol{\Theta}},\tilde{\boldsymbol{\Theta}})\right)$, where $\mathcal{K}(\tilde{\boldsymbol{\Theta}},\tilde{\boldsymbol{\Theta}})$ is a matrix consisting of elements $\mathcal{K}(\tilde{\boldsymbol{\theta}},\tilde{\boldsymbol{\theta}}^{\prime})$, $\forall\tilde{\boldsymbol{\theta}}\neq\tilde{\boldsymbol{\theta}}^{\prime}\in\tilde{\boldsymbol{\Theta}}$. For a family of non-evaluated strategies $\mathbb{\Delta}\subseteq\boldsymbol{\Theta}$, the joint distribution between $M_{\Pi}(\tilde{\boldsymbol{\Theta}})$ and $M_{\Pi}(\mathbb{\Delta})$ can be expressed as:

$$\begin{pmatrix}M_{\Pi}(\tilde{\boldsymbol{\Theta}})\\ M_{\Pi}(\mathbb{\Delta})\end{pmatrix}\sim\mathcal{N}\left(\begin{pmatrix}\dot{\mu}(\tilde{\boldsymbol{\Theta}})\\ \dot{\mu}(\mathbb{\Delta})\end{pmatrix},\begin{pmatrix}\mathcal{K}(\tilde{\boldsymbol{\Theta}},\tilde{\boldsymbol{\Theta}}) & \mathcal{K}(\tilde{\boldsymbol{\Theta}},\mathbb{\Delta})\\ \mathcal{K}(\mathbb{\Delta},\tilde{\boldsymbol{\Theta}}) & \mathcal{K}(\mathbb{\Delta},\mathbb{\Delta})\end{pmatrix}\right),$$

which provides analytically tractable expressions for the marginal and conditional distributions. The posterior distribution, i.e., the predicted distribution of non-evaluated strategies $\mathbb{\Delta}$, given the training dataset $\tilde{\boldsymbol{\Theta}}$, follows a Gaussian distribution, $\mathcal{N}\left(\dot{\mu}(\mathbb{\Delta}),\dot{\sigma}(\mathbb{\Delta})\right)$, where:

$$\dot{\mu}(\boldsymbol{\theta})=\dot{\mu}(\tilde{\boldsymbol{\Theta}})+\mathcal{K}(\boldsymbol{\theta},\tilde{\boldsymbol{\Theta}})^{\intercal}\mathcal{K}(\tilde{\boldsymbol{\Theta}},\tilde{\boldsymbol{\Theta}})^{-1}\left(M_{\Pi}(\tilde{\boldsymbol{\Theta}})-\dot{\mu}(\tilde{\boldsymbol{\Theta}})\right), \tag{22}$$
$$\mathcal{K}(\tilde{\boldsymbol{\theta}},\boldsymbol{\theta})=\mathcal{K}(\tilde{\boldsymbol{\theta}},\tilde{\boldsymbol{\theta}})-\mathcal{K}(\tilde{\boldsymbol{\theta}},\tilde{\boldsymbol{\Theta}})^{\intercal}\mathcal{K}(\tilde{\boldsymbol{\Theta}},\tilde{\boldsymbol{\Theta}})^{-1}\mathcal{K}(\boldsymbol{\theta},\tilde{\boldsymbol{\Theta}})^{\intercal}, \tag{23}$$

with $\boldsymbol{\theta}\in\mathbb{\Delta}$, $\boldsymbol{\theta}\notin\tilde{\boldsymbol{\Theta}}$, being a non-evaluated caching strategy and $\tilde{\boldsymbol{\theta}}\in\tilde{\boldsymbol{\Theta}}$ being a strategy from the training dataset. The obtained posterior distribution is used to predict the mean and variance of $M_{\Pi}(\boldsymbol{\theta})$. The current posterior distribution is then considered as the prior in the following iteration.
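For intuition, the sketch below computes a GP posterior mean and standard deviation at a single query point using the textbook conditioning formulas, in which the covariance explained by the training data is subtracted from the prior variance. It uses the standard squared-exponential kernel $\exp(-\|\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}\|^{2}/(2\beta^{2}))$, a constant prior mean, and a small diagonal jitter for numerical stability. Encoding caching strategies as numeric tuples, as well as all function names and data values, are assumptions of this example.

```python
import math


def rbf(a, b, beta=1.0):
    """Squared-exponential kernel between two numeric vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2 * beta ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_posterior(theta, train_X, train_y, prior_mean=0.0, beta=1.0, jitter=1e-9):
    """Posterior mean and standard deviation at theta given evaluated points."""
    K = [[rbf(a, b, beta) + (jitter if i == j else 0.0)
          for j, b in enumerate(train_X)] for i, a in enumerate(train_X)]
    k_star = [rbf(theta, x, beta) for x in train_X]
    alpha = solve(K, [y - prior_mean for y in train_y])
    mean = prior_mean + sum(k * a for k, a in zip(k_star, alpha))
    w = solve(K, k_star)
    var = rbf(theta, theta, beta) - sum(k * wi for k, wi in zip(k_star, w))
    return mean, math.sqrt(max(var, 0.0))


# Hypothetical one-dimensional encodings and observed serving rates.
train_X = [(0.0,), (1.0,), (2.0,)]
train_y = [0.2, 0.5, 0.4]
mean_seen, std_seen = gp_posterior((1.0,), train_X, train_y)
mean_far, std_far = gp_posterior((10.0,), train_X, train_y)
print(mean_seen, std_seen, mean_far, std_far)
```

At an already evaluated point the posterior reproduces the observation with near-zero uncertainty, whereas far from the data it reverts to the prior mean with unit standard deviation.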

V-B3 Acquisition Function

The role of the acquisition function in the BO is to guide the search for the optimal solution in the search space [22]. To come up with the most efficient way of doing so, we implemented and evaluated the following four acquisition functions:

a) Probability of Improvement (PI): This function provides an exploration-exploitation balance metric for guiding the search process by estimating the likelihood that a new sample will surpass the current best observation, $M_{\sf max}=\underset{\tilde{\boldsymbol{\theta}}\in\tilde{\boldsymbol{\Theta}}}{\max}\,M_{\Pi}(\tilde{\boldsymbol{\theta}})$. The PI function is defined as:

$$I(\boldsymbol{\theta})=1-\Phi\left(\frac{M_{\sf max}-\dot{\mu}(\boldsymbol{\theta})}{\dot{\sigma}(\boldsymbol{\theta})}\right), \tag{24}$$

where $\Phi$ is the Cumulative Distribution Function (CDF) of the standard normal distribution.

b) Expected Improvement (EI): This function measures the expected improvement over $M_{\sf max}$ when evaluating a candidate input strategy $\boldsymbol{\theta}$, and is defined as:

$$I(\boldsymbol{\theta})=\mathbb{E}\left[\max(M_{\Pi}(\boldsymbol{\theta})-M_{\sf max},0)\right]. \tag{25}$$

c) Upper Confidence Bound (UCB): this function aims to balance exploration and exploitation by considering the uncertainty associated with the objective function’s evaluations, and is given by:

$$I(\boldsymbol{\theta})=\dot{\mu}(\boldsymbol{\theta})+\xi\,\dot{\sigma}(\boldsymbol{\theta}), \tag{26}$$

where $\xi$ is a predefined parameter that controls the trade-off between exploration and exploitation.

d) Lower Confidence Bound (LCB): this function aims to maximize the lower estimate of the objective function’s evaluations, and is defined as:

$$I(\boldsymbol{\theta})=\dot{\mu}(\boldsymbol{\theta})-\xi\,\dot{\sigma}(\boldsymbol{\theta}). \tag{27}$$

LCB tends to exploit the known regions of the search space more aggressively.
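The four acquisition functions can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, `mu` and `sigma` stand for the posterior mean $\dot{\mu}(\boldsymbol{\theta})$ and standard deviation $\dot{\sigma}(\boldsymbol{\theta})$ of a single candidate, `sigma` is assumed strictly positive, and EI uses the standard closed form that holds when $M_{\Pi}(\boldsymbol{\theta})$ is Gaussian.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probability_of_improvement(mu: float, sigma: float, m_max: float) -> float:
    # Eq. (24): 1 - Phi((M_max - mu) / sigma); assumes sigma > 0.
    return 1.0 - norm_cdf((m_max - mu) / sigma)


def expected_improvement(mu: float, sigma: float, m_max: float) -> float:
    # Closed form of Eq. (25) under a Gaussian posterior N(mu, sigma^2).
    z = (mu - m_max) / sigma
    return (mu - m_max) * norm_cdf(z) + sigma * norm_pdf(z)


def upper_confidence_bound(mu: float, sigma: float, xi: float) -> float:
    # Eq. (26): optimistic estimate, larger xi favors uncertain candidates.
    return mu + xi * sigma


def lower_confidence_bound(mu: float, sigma: float, xi: float) -> float:
    # Eq. (27): pessimistic estimate, penalizes uncertain candidates.
    return mu - xi * sigma
```

When the posterior mean of a candidate equals $M_{\sf max}$, PI evaluates to 0.5 regardless of the variance, whereas EI still grows with `sigma`.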

V-B4 The BO Algorithm

We initiate the process by constructing the training set $\tilde{\boldsymbol{\Theta}}$. Depending on the desired size $|\tilde{\boldsymbol{\Theta}}|$, we repeatedly select candidate caching strategies $\boldsymbol{\theta}\in\boldsymbol{\Theta}$ at random from the search space for evaluation. $M_{\Pi}(\boldsymbol{\theta})$ is calculated by caching the VNFs on the satellites according to $\boldsymbol{\theta}$, letting the system run for $\Omega$ time slots under the VNF placement policy $\Pi$, and counting the number of successfully served requests. Naturally, this makes the evaluation step time-consuming. Therefore, during the BO runtime, we only evaluate caching strategies that have the potential to be the optimal solution; the next candidate, $\boldsymbol{\theta}_{\sf max}$, is selected by the acquisition function:

$$\boldsymbol{\theta}_{\sf max}=\underset{\boldsymbol{\theta}\in\boldsymbol{\Theta}}{\arg\max}~I(\boldsymbol{\theta}). \tag{28}$$

We then evaluate the objective function $M_{\Pi}(\boldsymbol{\theta}_{\sf max})$ and update the surrogate model; the prior distribution of $M_{\Pi}(\boldsymbol{\theta})$ is updated to a posterior multivariate Gaussian distribution, with a mean value and a kernel matrix given by Eqs. (22) and (23), where $\tilde{\boldsymbol{\Theta}}$ is now the training dataset augmented by $\boldsymbol{\theta}_{\sf max}$, the last evaluated input strategy.

After running the above steps for a sufficient number of time slots and candidate optimal solutions $\boldsymbol{\theta}_{\sf max}$, the optimal VNF caching strategy $\boldsymbol{\theta}^{*}$ is calculated as:

$$\boldsymbol{\theta}^{*}=\underset{\boldsymbol{\theta}\in\tilde{\boldsymbol{\Theta}}}{\arg\max}~M_{\Pi}(\boldsymbol{\theta}). \tag{29}$$

In practice, each satellite $v\in\mathbb{V}$ obtains the VNF caching data from the controller on the ground and installs the VNFs dictated by $\boldsymbol{\theta}^{*}$. We envision this step as a one-time task performed before the system goes online. However, the learning process can continue to operate in the background during runtime, seeking potential improvements in the VNF caching strategy. Each newly discovered superior solution can be seamlessly applied, leading to an iterative refinement process. The discussed BO-based VNF caching method is formally presented in Algorithm 2.

Algorithm 2 BO-based VNF Caching
1: Select $\dot{\mu}(\cdot)$ and $\mathcal{K}(\cdot,\cdot)$ ▷ (Prior distribution)
2: until $|\tilde{\boldsymbol{\Theta}}|$ is sufficient do
3:     Sample $\boldsymbol{\theta}\in\boldsymbol{\Theta}$, randomly & uniformly
4:     Evaluate $M_{\Pi}(\boldsymbol{\theta})$
5:     $\tilde{\boldsymbol{\Theta}}\leftarrow\tilde{\boldsymbol{\Theta}}\cup\{\boldsymbol{\theta}\}$ ▷ (Construct initial training set)
6: until $|\tilde{\boldsymbol{\Theta}}|$ is large enough do
7:     Calculate $\boldsymbol{\theta}_{\sf max}$ using Eq. (28)
8:     Evaluate $M_{\Pi}(\boldsymbol{\theta}_{\sf max})$
9:     $\tilde{\boldsymbol{\Theta}}\leftarrow\tilde{\boldsymbol{\Theta}}\cup\{\boldsymbol{\theta}_{\sf max}\}$
10:     Update $\dot{\mu}(\boldsymbol{\theta})$ and $\mathcal{K}(\tilde{\boldsymbol{\theta}},\boldsymbol{\theta})$ using Eqs. (22) and (23)
11:     ▷ (Posterior distribution)
12: return $\boldsymbol{\theta}^{*}$ using Eq. (29)
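The control flow of Algorithm 2 can be illustrated with a small, dependency-free Python sketch. Everything below is an assumption made for illustration: the surrogate is a textbook Gaussian process with an RBF kernel and a constant mean (not necessarily the exact form of Eqs. (22) and (23)), the acquisition function is UCB, the iteration budgets `n_init` and `n_iter` replace the "sufficient size" conditions, and `objective` is a toy stand-in for the expensive evaluation of $M_{\Pi}(\boldsymbol{\theta})$.

```python
import itertools
import math
import random


def rbf(a, b, length=1.0):
    """RBF kernel between two strategies encoded as integer tuples."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * length ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gp_posterior(X, y, x_new, noise=1e-6):
    """Textbook GP posterior mean and standard deviation at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k = [rbf(a, x_new) for a in X]
    mean_y = sum(y) / len(y)              # constant prior mean (assumption)
    alpha = solve(K, [v - mean_y for v in y])
    v = solve(K, k)
    mean = mean_y + sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, math.sqrt(var)


def bo_search(space, objective, n_init=3, n_iter=8, xi=2.0, seed=0):
    rng = random.Random(seed)
    evaluated = {}
    for theta in rng.sample(space, n_init):        # Algorithm 2, lines 2-5
        evaluated[theta] = objective(theta)
    for _ in range(n_iter):                        # Algorithm 2, lines 6-11
        X, y = list(evaluated), list(evaluated.values())
        candidates = [t for t in space if t not in evaluated]
        if not candidates:
            break

        def ucb(t):
            m, s = gp_posterior(X, y, t)
            return m + xi * s

        theta_max = max(candidates, key=ucb)       # Eq. (28)
        evaluated[theta_max] = objective(theta_max)
    return max(evaluated, key=evaluated.get)       # Eq. (29)


# Toy search space: 3 satellites, each choosing one of 3 caching sets.
space = list(itertools.product(range(3), repeat=3))
toy_objective = lambda theta: -sum((x - 1) ** 2 for x in theta)
best = bo_search(space, toy_objective)
```

The surrogate is refit from scratch for every candidate, which keeps the sketch short but is wasteful; a practical implementation would factorize the kernel matrix once per iteration.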

In addition, we note that the proposed VNF placement and the caching frameworks are dependent on each other. The optimal solution is obtained by executing the two processes of caching and placement in an iterative fashion. The BO-based algorithm presented in this subsection serves as an outer layer that provides the caching strategy that needs to be evaluated. The MAQL-based VNF placement policy, which serves as an inner iteration, returns the optimal serving rate after convergence. As we iterate, we obtain an optimal caching strategy and a converged MAQL model associated with this strategy.

VI Numerical Results

In this section, we assess the performance of the introduced VNF placement and caching algorithms. In this framework, we utilize the concept of time slots to discretize the transition of state $\mathbf{s}_{v}$ (defined in Eq. (12)) and the motion of every satellite $v$, where the latter leads to the intermittency of ISLs. For the simulations, the default values of the common parameters are given in Table II. Variations in these or other parameters are noted in each simulation. The details of the considered SFCs are given in Table III, with $D_{h}=15$ time slots, $\forall h$. The required computational and storage resources are given as follows:

  • SFC 1: $g_{f}^{1}=3,1$ and $q_{f}^{1}=5,4$ for $f=1,2$.

  • SFC 2: $g_{f}^{2}=1,3,1$ and $q_{f}^{2}=3,4,4$ for $f=1,2,3$.

  • SFC 3: $g_{f}^{3}=1,2$ and $q_{f}^{3}=3,3$ for $f=1,2$.

  • SFC $h\in\{4,\ldots,14\}$: $g_{f}^{h}=1$ and $q_{f}^{h}=2$, $\forall f$.

TABLE II: Experimental parameter configuration.
Parameter | Value
$d^{h}_{f,v}$ | 1 time slot
$\mu$, $\mu^{r}_{v}$, $\mu^{s}_{h}$ | 0.9, $1/V$, $1/H$
$\tau_{v,u}$ | 1 time slot
$C_{p}$, $\gamma$, $\delta$ | 100, 0.6, 0.6

TABLE III: SFC index ($h$) and topology ($\mathbb{S}_{h}$).
Index | Topology
1 | $2\rightarrow 3$
2 | $2\rightarrow 1\rightarrow 3$
3 | $1\rightarrow 3$
4 | $3\rightarrow 4$
5 | $2\rightarrow 3\rightarrow 5$
6 | $1\rightarrow 4\rightarrow 6$
7 | $5\rightarrow 6$
8 | $7\rightarrow 8\rightarrow 9$
9 | $1\rightarrow 10$
10 | $4\rightarrow 8$
11 | $6\rightarrow 10$
12 | $3\rightarrow 4$
13 | $7\rightarrow 8$
14 | $1\rightarrow 2$

VI-A Cooperative MAQL-based VNF Placement

Figure 5: Evaluation of the proposed MAQL-based scheme in comparison with the optimal DP solution. (a) Optimal DP cost and serving rate vs. resources. (b) Serving rate per training episode vs. resources. (c) System cost comparison with the optimal DP solution.
Figure 6: Assessment of the proposed MAQL-based scheme against baseline methods and illustration of the impact of the number of satellites on the achieved serving rate and average delay of SFC deployment. (a) Comparison against existing methods in serving rate. (b) Comparison against existing methods in system cost. (c) Impact of the number of satellites on the system.

First, we evaluate the efficiency of the MAQL-based VNF placement mechanism. The optimal placement is given by the DP Eq. (6). To calculate the average cost, we compute the discounted long-term cost using Eq. (16). Additionally, the serving rate is determined by dividing the number of successfully served requests by the total number of requests. The learning rate is initially $\lambda=0.1$ and decreases to $0.01$ if there is no improvement over a certain period of time.

1) Performance against the Optimal solution: We compare the proposed scheme against the optimal solution in Fig. 5. Our simulation setup involves $V=3$ satellites and a set of available services $\mathbb{H}=\{1,3\}$. The sets of VNFs cached on the satellites are $\mathbb{F}_{1}=\{1,2\}$, $\mathbb{F}_{2}=\{2\}$ and $\mathbb{F}_{3}=\{3\}$, respectively. The active ISL periods between satellites are $T_{1,2}=2$, $T_{2,3}=4$ and $T_{1,3}=4$ time slots. The numbers of computational and storage resource units are equal, $R_{v}=Z_{v}$, $\forall v\in\mathbb{V}$. To set the reference point, Fig. 5(a) presents the optimal discounted long-term cost and serving rate achieved through the proposed DP Eq. (6); as expected, the cost decreases and the serving rate increases as the satellites’ resources grow. Next, we define three levels of available satellite resources, Low, Medium and High, where $R_{v}=Z_{v}=4$, $7$ and $14$, respectively. Having established the optimal behavior, in Fig. 5(b) we compare the proposed MAQL approach against it; we observe that the proposed VNF placement solution quickly converges to near-optimal serving rates in all cases. Nonetheless, the more initial resources the satellites have, the better the performance, as the solution search space becomes broader. Similar observations can be made from Fig. 5(c), where the long-term system cost achieved after convergence is shown to be close to the optimal. We note that, throughout the experiments under the Low resource level, the placement policy executed VNF $2$ on satellite $2$ in $59\%$ of the cases and on satellite $1$ in $41\%$ of the cases; this preference for satellite $2$ results from the contention for the limited resources of satellite $1$. We should also note that, after the one-time offline training of the algorithm, inferring the best policy is nearly instantaneous; in this simple setup, the online execution time is on average $2.4\times 10^{-3}$ s for MAQL, while the optimal DP-based solution requires $128.57$ s to calculate a solution, making it unfit for real-time decision making.

2) Performance against baselines and impact of the satellite number: We then assess the proposed scheme against two baselines and illustrate the impact of a growing number of satellites in Fig. 6. Here, we assume a system with the SFC set $\mathbb{H}=\{4,\ldots,13\}$ and $V=6$ satellites grouped in two subgroups: $\mathbb{V}_{1}=\{1,3,5\}\subseteq\mathbb{V}$ and $\mathbb{V}_{2}=\{2,4,6\}\subseteq\mathbb{V}$, with $\mathbb{V}_{1}\cup\mathbb{V}_{2}=\mathbb{V}$. The active ISL periods are $T_{v,u}=2$ if $v,u\in\mathbb{V}_{1}$ or $v,u\in\mathbb{V}_{2}$, and $T_{v,u}=4$ otherwise. The sets of cached VNFs are $\mathbb{F}_{1}=\{1,2\}$, $\mathbb{F}_{2}=\{3,4\}$, $\mathbb{F}_{3}=\{5,6\}$, $\mathbb{F}_{4}=\{7,8\}$, $\mathbb{F}_{5}=\{9,10\}$ and $\mathbb{F}_{6}=\{1,10\}$. A brief description of the compared baselines follows:

a) Q-learning [24]: we modified this learning VNF placement algorithm for an LSN environment as follows:

  • State: the requested SFC and VNF for execution.

  • Action: the satellite that will execute the above VNF.

  • Reward: $25(\iota_{C}+\iota_{R})\cdot\exp(2d/\psi)$, where $\iota_{C}$ and $\iota_{R}$ are the ratios between the required and available computational and storage resources on the designated satellite, respectively; $d$ is the end-to-end delay of the SFC placement, and $\psi$ is the length of the SFC. The other parameter values are as in [24].

We note that this Q-learning-based method is a centralized method that makes a placement decision for an entire SFC in a single time slot, based on the set of active ISLs.
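The reward of the Q-learning baseline can be written directly from the expression above. The snippet is only a transcription of that formula; the function and argument names are hypothetical, and the constant 25 and the factor 2 are taken as stated in the text.

```python
import math


def q_learning_reward(iota_c: float, iota_r: float,
                      delay: float, sfc_length: int) -> float:
    """Reward 25 * (iota_C + iota_R) * exp(2 d / psi) of the baseline.

    iota_c, iota_r: required/available ratios for computational and
    storage resources on the designated satellite.
    delay: end-to-end delay d of the SFC placement (time slots).
    sfc_length: number of VNFs psi in the SFC.
    """
    return 25.0 * (iota_c + iota_r) * math.exp(2.0 * delay / sfc_length)
```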

b) Greedy [25]: We adapt this mechanism to our LSN environment by mapping our requester satellite to both the ingress and the egress switch, since the data must be returned to the requester. Based on this, a satellite $v\in\mathbb{V}$ selects another satellite to transfer data to by assigning each candidate a cost score. The original cost function is altered as follows: $\mathcal{G}(v,u)=\epsilon_{v,u}+\epsilon_{u,v}$, where $\epsilon_{v,u}$ is the delay for transferring data from satellite $v$ to $u$, which consists of the waiting time for the $(v,u)$-ISL and the transfer time ($1$ time slot). A satellite $v$ transfers the data to the satellite $u$ with the earliest ISL availability (if there is more than one, it selects uniformly at random). The VNF placement is performed step by step in each time slot.

To make a fair comparison, none of the three algorithms has knowledge of the pre-installed VNFs at the beginning of the experiment. Figs. 6(a) and 6(b) illustrate the results of this benchmarking in terms of achieved serving rate and cost. The MAQL-based solution outperforms the alternatives by allowing satellites to share learning parameters, enabling cooperative policy updates toward optimal solutions. The Greedy method prioritizes shorter paths but transfers data between satellites without any VNF caching knowledge, resulting in longer end-to-end delays. Although the high end-to-end delay tolerance in our setting allows data to eventually reach a satellite with the requested VNF cached, stricter tolerances would further degrade the Greedy approach’s serving rate. On the other hand, although the plain Q-learning method can eventually learn the VNF caching information, it performs worse than the Greedy scheme; this mechanism does not consider the ISL periodicity, meaning that data can be transferred during an inactive ISL, which renders the placements invalid.

Subsequently, we examine the influence of the number of available satellites $V$ on the proposed VNF placement scheme’s performance. The satellites are divided into three subgroups: $\mathbb{V}_{1}$ contains the first $\lceil V/3\rceil$ satellites, $\mathbb{V}_{2}$ contains the next $\lceil V/3\rceil$ satellites, and $\mathbb{V}_{3}$ contains the rest, with $\mathbb{V}_{1}\cup\mathbb{V}_{2}\cup\mathbb{V}_{3}=\mathbb{V}$. ISLs are always active within a subgroup and intermittent between subgroups, with the following periods: $T_{v,u}=2$ if neither $v$ nor $u$ belongs to $\mathbb{V}_{3}$; $T_{v,u}=4$ if either $v$ or $u$ belongs to $\mathbb{V}_{3}$. Each satellite has $R_{v}=Z_{v}=3$ units of available resources and has one VNF installed. We investigate such a limited-resource scenario in order to emphasize the impact of the topology expansion as the number of satellites, $V$, increases. $H=11$ SFCs are considered, with $\mathbb{H}=\{4,\ldots,14\}$ (details available in Table III), which comprise $10$ VNFs. The VNF execution time is fixed at 2 time slots, i.e., $d^{h}_{f,v}=2$, $\forall h,f,v$. VNFs are selected and installed on the satellites in a round-robin fashion. The results are presented in Fig. 6(c). There, we observe that the request serving rate increases significantly (from $22\%$ to almost $100\%$) when more satellites are available, due to the increased resource and caching capacity. The right y-axis of Fig. 6(c) presents the average end-to-end service delay, calculated by dividing the total induced delay by the total number of requests, including both successfully served and rejected ones. When $V$ is small compared to the number of VNFs (e.g., $V=3$), there might not be enough VNFs installed in the network to form the entire requested SFC. However, a satellite might still execute an (installed) VNF and transfer the results to another satellite, causing a small delay before the request is rejected (e.g., around 2 time slots for $V=5$). This occurs because the satellites are not aware of each other’s installed VNFs and resources. Naturally, the larger $V$ is, the more complete SFCs can be deployed, resulting in an increase in the average delay alongside the serving rate. On the other hand, as $V$ increases, $|\mathbb{V}_{1}|$, $|\mathbb{V}_{2}|$ and $|\mathbb{V}_{3}|$ also increase, allowing more VNFs to be installed in each subgroup. This, in turn, increases the probability that an SFC is deployed as a whole within a single subgroup, where ISLs are always active, which further reduces the deployment time by avoiding the waiting time for ISLs. As a result, after the sweet spot of $V=20$, where the serving rate is maximized, the average delay decreases from $9.25$ to $8.08$ time slots per request.
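The subgroup construction and the inter-subgroup ISL periods used in this experiment can be sketched as follows. The helper names are hypothetical, satellites are assumed to be indexed from 1, and returning 0 for an always-active ISL is a convention chosen for this illustration only.

```python
import math


def partition_subgroups(num_satellites: int):
    """Split satellites 1..V into V1, V2 (ceil(V/3) each) and V3 (the rest)."""
    size = math.ceil(num_satellites / 3)
    ids = list(range(1, num_satellites + 1))
    return ids[:size], ids[size:2 * size], ids[2 * size:]


def isl_period(v: int, u: int, groups) -> int:
    """ISL period in time slots; 0 denotes an always-active ISL."""
    _, _, g3 = groups
    if any(v in g and u in g for g in groups):
        return 0                      # same subgroup: always active
    return 4 if (v in g3 or u in g3) else 2


groups = partition_subgroups(10)
```

For $V=10$ this yields subgroups of sizes 4, 4 and 2, and for $V=20$ sizes 7, 7 and 6.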

3) VNF placement with large-scale service request: We now extend our analysis to a large-scale service request context, introducing increased randomization in the service request events. In particular, the set of VNFs is $\mathbb{F}=\{1,\ldots,10\}$, and a random requested SFC $h$ is generated via Algorithm 3, where the input distributions $\mathcal{D}_{\text{S}}$ and $\mathcal{D}_{\text{F}}$ are uniform distributions. The total number of SFCs in the sample space of Algorithm 3 is $H=\sum_{\nu=1}^{10}\nu!\binom{10}{\nu}=9864100$. We consider a setup with $V=10$ satellites separated into three subgroups, $\mathbb{V}_{1}=\{1,\ldots,4\}$, $\mathbb{V}_{2}=\{5,\ldots,8\}$, and $\mathbb{V}_{3}=\{9,10\}$. The available computational and storage resources are $R_{v}=Z_{v}=14$ units. We let $L=\max_{h\in\mathbb{H}} l_{h}$ be the maximum length of a requested SFC, and consider three different scenarios where $L=5$, $8$, and $10$, respectively. Intuitively, the longer a requested SFC, the more resource units and time slots are required for its deployment. Also, we set $D_{h}=80$ and $d_{f,v}^{h}=1$ time slots, $\forall h$, as the delay tolerance of an SFC and the VNF execution duration, respectively. ISLs within the same subgroup are always available; $T_{v,u}=2$ if $v$ and $u$ are in different subgroups and neither is in $\mathbb{V}_{3}$, and $T_{v,u}=4$ if $v$ and $u$ are in different subgroups and either is in $\mathbb{V}_{3}$.
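The size of the SFC sample space can be checked numerically: an SFC of length $\nu$ is an ordered selection of $\nu$ distinct VNFs out of 10, so there are $\nu!\binom{10}{\nu}=10!/(10-\nu)!$ of them. The short check below uses only the standard library; the function name is hypothetical.

```python
import math


def count_ordered_sfcs(num_vnfs: int) -> int:
    """Number of non-empty ordered sequences of distinct VNFs."""
    # math.perm(n, k) = n! / (n - k)! = k! * C(n, k)
    return sum(math.perm(num_vnfs, nu) for nu in range(1, num_vnfs + 1))


total = count_ordered_sfcs(10)
```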

Algorithm 3 Random SFC Request Generation
1: Input: VNF set $\mathbb{F}$, distributions $\mathcal{D}_{\text{S}}$ and $\mathcal{D}_{\text{F}}$.
2: Output: $\mathbb{S}$ - the ordered sequence of VNFs.
3: Sample a random number $\nu\in\{1,\ldots,|\mathbb{F}|\}$ following distribution $\mathcal{D}_{\text{S}}$.
4: Initialize $\tilde{\mathbb{F}}\leftarrow\mathbb{F}$ and $\mathbb{S}\leftarrow\emptyset$.
5: for the $f^{\text{th}}$ VNF where $f=1,2,\ldots,\nu$ do
6:     Sample a VNF $\mathcal{F}\in\tilde{\mathbb{F}}$ following distribution $\mathcal{D}_{\text{F}}$.
7:     $\mathbb{S}\leftarrow\mathbb{S}\cup\{\mathcal{F}\}$ and $\tilde{\mathbb{F}}\leftarrow\tilde{\mathbb{F}}\backslash\{\mathcal{F}\}$.
8: return $\mathbb{S}$
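A possible Python rendering of Algorithm 3 is given below as a sketch. The function and parameter names are hypothetical; `length_weights` and `vnf_weights` play the roles of $\mathcal{D}_{\text{S}}$ and $\mathcal{D}_{\text{F}}$ (uniform when omitted), and it is assumed that, after a VNF is removed, the remaining VNFs are drawn in proportion to their original weights.

```python
import random


def generate_sfc(vnfs, length_weights=None, vnf_weights=None, rng=None):
    """Return an ordered list of distinct VNFs (a random SFC request)."""
    rng = rng or random.Random()
    n = len(vnfs)
    # Line 3: sample the chain length nu from D_S.
    nu = rng.choices(range(1, n + 1), weights=length_weights)[0]
    # Line 4: working copy of the VNF set and an empty chain.
    remaining = list(vnfs)
    weights = list(vnf_weights) if vnf_weights is not None else [1.0] * n
    chain = []
    for _ in range(nu):                                  # lines 5-7
        idx = rng.choices(range(len(remaining)), weights=weights)[0]
        chain.append(remaining.pop(idx))                 # add to S, remove from F~
        weights.pop(idx)
    return chain


example = generate_sfc(list(range(1, 11)), rng=random.Random(1))
```

Because each sampled VNF is removed from the working set, no VNF appears twice in a generated chain.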
Figure 7: Comparison of the proposed MAQL-based scheme with the NBP [15] and the RP schemes for different maximum SFC lengths, $L$.

We compare the proposed MAQL-based scheme against the Neighbor-Based Placement (NBP) scheme proposed in [15]. Originally, in [15], placement decisions for all VNFs in the requested SFC are made simultaneously, and only the ISLs and resources available at the placement time are considered. A request is rejected if no satellite in the sub-network can execute it. To make the comparison fair, we modified the NBP method as follows: satellites can only execute VNFs cached on them, and requests are rejected, with a penalty incurred, if either the resources are insufficient or the required VNFs are not cached. Additionally, a Random Placement (RP) scheme is considered as a baseline, in which a satellite selects an action uniformly at random for each request in its buffer.

Our results are in Fig. 7. The MAQL-based scheme shows a superior serving rate compared to the NBP scheme. This advantage arises because the proposed approach allows satellites to handle one SFC deployment step per time slot, leading to more effective use of future ISLs and resources. Although the NBP scheme optimally places VNFs under the given constraints, its performance is significantly limited by the requirement that satellites can only execute installed VNFs. This restriction, which relies solely on currently available satellites, severely reduces the number of executable VNFs. In contrast, the MAQL-based framework demonstrates a clear advantage in this context. Besides, the RP scheme, with a lack of strategic planning, shows lower performance compared to the others.

VI-B BO-based VNF Caching

In what follows, we evaluate the efficiency of the BO-based VNF caching mechanism. To simplify the evaluation of this component and to avoid any bias potentially induced by the MAQL-based VNF placement, we test the caching mechanism with the following straightforward, greedy VNF placement scheme: if a satellite $v$ has the required VNF cached, it executes it; otherwise, $v$ forwards the request to the satellite $u$ that has the required VNF cached and offers the earliest ISL availability. It is assumed that every satellite has complete knowledge of the cached VNFs on every other satellite. For this family of experiments, we first utilize an infrastructure consisting of $V=20$ satellites and $H=2$ SFCs, specifically $h\in\mathbb{H}=\{1,2\}$. We group the satellites into three distinct subgroups as follows: $\mathbb{V}_{1}=\{1,\ldots,7\}$, $\mathbb{V}_{2}=\{8,\ldots,14\}$, and $\mathbb{V}_{3}=\{15,\ldots,20\}$, with $\mathbb{V}_{1}\cup\mathbb{V}_{2}\cup\mathbb{V}_{3}=\mathbb{V}$. The ISLs between satellites of the same group are always active. The remaining active ISL periods are $T_{v,u}=2$ between satellites of $\mathbb{V}_{1}$ and $\mathbb{V}_{2}$, $T_{v,u}=5$ between satellites of $\mathbb{V}_{1}$ and $\mathbb{V}_{3}$, and $T_{v,u}=3$ between satellites of $\mathbb{V}_{2}$ and $\mathbb{V}_{3}$. Each satellite can cache at most two VNFs, i.e., $|\mathbb{F}_{v}|=2$, $\forall v\in\mathbb{V}$, and the initial resource capacities are $R_{v}=Z_{v}=9$, $\forall v\in\mathbb{V}$.

Figure 8: Demonstration of the impact of the acquisition function $I(\cdot)$ on determining potentially optimal caching strategies. (a) Required number of evaluations without $I(\cdot)$. (b) Required number of evaluations with $I(\cdot)$. (c) Effectiveness of employing $I(\cdot)$.

1) Performance impact of Acquisition Functions: We study the impact of incorporating an acquisition function $I(\cdot)$ in Fig. 8, by closely examining three satellites, one from each group $\mathbb{V}_{1}$, $\mathbb{V}_{2}$ and $\mathbb{V}_{3}$. We employ the PI acquisition function, Eq. (24), in our proposed BO-based caching method to perform caching on three VNFs, $\mathbb{S}_{1}\cup\mathbb{S}_{2}=\{1,2,3\}$. Given that $|\mathbb{F}_{v}|=2$, $\forall v\in\mathbb{V}$, each satellite can choose between three unique caching sets $\mathbb{F}_{v}$, which gives a search space of $|\boldsymbol{\Theta}|=3^{3}=27$ caching strategies. In the case where no $I(\cdot)$ is used, $\boldsymbol{\theta}_{\sf max}$ is sampled randomly from $\boldsymbol{\Theta}$. The results of this case are illustrated in Fig. 8(a), where we observe that the GP model’s predictions align closely with the actual average serving rates for each caching decision represented on the x-axis. This confirms that the GP serves as a suitable surrogate model for our problem. However, without the guidance of an acquisition function, all the potential cache placements are subjected to evaluation, resulting in a considerable increase in execution time. In contrast, when $\boldsymbol{\theta}_{\sf max}$ is calculated through an acquisition function as in Eq. (28), the number of evaluations required is significantly reduced while the outcome still approaches the optimal solution, as shown in Fig. 8(b). For visualization purposes, the caching strategies in Figs. 8(a) and 8(b) are sorted based on the true mean serving rate achieved with them. In Fig. 8(c), we illustrate the benefit of using an acquisition function: the reduction of $80\%$ in the number of evaluations needed directly translates to large savings in real-world execution time. Specifically, in our setup, evaluating every potential caching strategy took $73.16$ s of total execution time, whereas the same optimal caching strategy was pinpointed through the acquisition function in just $14.88$ s.

Figure 9: Evaluations of the proposed BO-based scheme in terms of achieved serving rate and execution delay. (a) Achieved serving rate vs. training set size. (b) Comparison against baseline methods. (c) Number of $\boldsymbol{\theta}$ evaluations per iteration.

2) Performance impact of Acquisition Function type, cache size and comparison against baselines: In Fig. 9, we compare the performance of the four acquisition functions by analyzing the serving rate achieved as the training set size $|\tilde{\boldsymbol{\Theta}}|$ increases with each iteration of Algorithm 2. The superior performance of the PI function, Eq. (24), as illustrated in Fig. 9(a), suggests that in the VNF caching problem, exploration and prioritizing immediate improvements are dominant factors. PI places a strong emphasis on exploring regions of the search space where improvements are likely, and this approach proved advantageous. The EI function, Eq. (25), which also prioritizes exploration but to a slightly lesser extent, provided consistently good results, though below those of PI. In contrast, the UCB, Eq. (26), and LCB, Eq. (27), functions emphasize exploration by considering upper and lower bounds on the evaluated function, but they do not directly measure the likelihood of improvement over the current best value. They therefore give less weight to identifying areas with the potential for caching policy improvements, which is reflected in their mediocre performance. The serving rate in Fig. 9(a) improves in a step-function pattern, as the best serving rate achieved changes only when a better caching solution is found. Next, in Fig. 9(b), we present the results of benchmarking the proposed VNF caching mechanism (BO) against two baselines from the literature:

a) Coordinate Descent (CD) [26]: initialize the caching strategy $\boldsymbol{\theta}$ and $i=0$. In each iteration do:

  (i) Produce $|\bigcup^{H}_{h=1}\mathbb{S}_{h}|$ different caching strategies $\boldsymbol{\theta}^{\prime}$ by assigning, in turn, each of the VNF indices $1,\ldots,|\bigcup^{H}_{h=1}\mathbb{S}_{h}|$ to the $i^{\text{th}}$ component of $\boldsymbol{\theta}$. Then, evaluate $M_{\Pi}(\boldsymbol{\theta}^{\prime})$ for every obtained $\boldsymbol{\theta}^{\prime}$.

  (ii) Update $\boldsymbol{\theta}$ to the $\boldsymbol{\theta}^{\prime}$ with the maximum evaluation.

  (iii) $i\leftarrow i+1$.

  (iv) Return $\boldsymbol{\theta}$ if the termination criteria are met, else go to (i).
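The CD steps above can be sketched in Python as follows. This is an illustrative reading of the baseline, with several assumptions: a strategy is a tuple of VNF indices (one per cache slot), the coordinate index wraps around once it reaches the end of the vector, termination is a fixed iteration budget, and `objective` is a hypothetical stand-in for evaluating $M_{\Pi}(\boldsymbol{\theta}^{\prime})$.

```python
def coordinate_descent(theta, num_vnfs, objective, max_iters=20):
    """Optimize one component of theta at a time over VNF indices 1..num_vnfs."""
    theta = tuple(theta)
    i = 0
    for _ in range(max_iters):
        coord = i % len(theta)          # assumed wrap-around of the index i
        # Step (i): one candidate per VNF index for the current component.
        candidates = [theta[:coord] + (f,) + theta[coord + 1:]
                      for f in range(1, num_vnfs + 1)]
        # Step (ii): keep the candidate with the maximum evaluation.
        theta = max(candidates, key=objective)
        i += 1                          # step (iii)
    return theta                        # step (iv), after the budget is spent


# Toy objective: number of distinct VNFs cached across the slots.
distinct = lambda th: len(set(th))
result = coordinate_descent((1, 1, 1), num_vnfs=3, objective=distinct, max_iters=3)
```

Since the current value of the component is always among the candidates, the objective never decreases from one iteration to the next.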

b) Pattern Search (PS) [27]: initialize $\boldsymbol{\theta}$ randomly. In each iteration do:

  (i) Generate neighbor strategies $\boldsymbol{\theta}^{\prime}=\{\theta^{\prime}_{i}\,|\,\theta^{\prime}_{i}\in\mathbb{F}\}$; $\boldsymbol{\theta}^{\prime}$ is a neighbor of $\boldsymbol{\theta}$ if there exist indices $j\in\{1,\ldots,N\}$ and $k\in\{1,\ldots,|\mathbb{F}|\}$ such that $\theta_{j}=\mathcal{F}_{k}$, $\theta^{\prime}_{j}=\mathcal{F}_{k+1}$, and $\theta^{\prime}_{i}=\theta_{i}$ for $i\neq j$. Evaluate $M_{\Pi}(\boldsymbol{\theta}^{\prime})$ for every obtained $\boldsymbol{\theta}^{\prime}$.

  (ii) Update $\boldsymbol{\theta}$ to the $\boldsymbol{\theta}^{\prime}$ with the maximum evaluation.

  (iii) Return $\boldsymbol{\theta}$ if the termination criteria are met, else go to (i).
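The neighbor definition of the PS baseline can be made concrete with a short sketch. The assumptions are labeled in the comments: a component holding the last VNF $\mathcal{F}_{|\mathbb{F}|}$ has no successor (no wrap-around), and the search stops when no neighbor improves on the current strategy, which is one possible termination criterion rather than the one used in [27]. `objective` is a hypothetical stand-in for $M_{\Pi}$.

```python
def neighbors(theta, vnfs):
    """Strategies that differ from theta in one component, moved to the next VNF."""
    out = []
    for j, f in enumerate(theta):
        k = vnfs.index(f)
        if k + 1 < len(vnfs):           # assumption: no wrap-around past the last VNF
            out.append(theta[:j] + (vnfs[k + 1],) + theta[j + 1:])
    return out


def pattern_search(theta, vnfs, objective, max_iters=50):
    for _ in range(max_iters):
        candidates = neighbors(theta, vnfs)          # step (i)
        if not candidates:
            break
        best = max(candidates, key=objective)
        if objective(best) <= objective(theta):      # assumed termination criterion
            break
        theta = best                                 # step (ii)
    return theta


distinct = lambda th: len(set(th))
found = pattern_search((1, 1), [1, 2, 3], distinct)
```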

The results depicted in Fig. 9(b) indicate that the BO, CD, and PS schemes offer comparable serving rates upon completion. However, the BO scheme strikes a better balance between maximizing the objective function and maintaining a low execution time than the other candidates. This is due to the ability of the presented framework to predict the potential improvement of an unevaluated strategy. Finally, in Fig. 9(c) we briefly demonstrate the scalability of the BO-based algorithm. Since the objective function evaluation time accounts for the majority of the running time, we opted to show the average number of $M_{\Pi}(\boldsymbol{\theta})$ evaluations per iteration (lines 6–11 in Algorithm 2) as the size of the infrastructure grows, for different cache sizes $|\mathbb{F}_{v}|$.

For this experiment, we consider $\mathbb{H}=\{1,2,4\}$, which results in 4 VNFs to be cached. Additionally, satellite subgroups are formed as follows: $\mathbb{V}_{1}$ contains the first $\lceil V/3\rceil$ satellites, $\mathbb{V}_{2}$ the next $\lceil V/3\rceil$ satellites, and $\mathbb{V}_{3}$ the rest. The ISL periodicity between the logical subgroups remains the same as defined at the beginning of this subsection. Fig. 9(c) suggests that the number of evaluations increases monotonically as the infrastructure size increases. This is consistent with the fact that the total number of caching strategies is given by $\prod_{v=1}^{V}\binom{|\cup^{H}_{h=1}\mathbb{S}_{h}|}{|\mathbb{F}_{v}|}$. In addition, since the number of possible VNF subsets to cache on a satellite is given by the binomial coefficient $\binom{4}{|\mathbb{F}_{v}|}$, we see that more evaluations are required when $|\mathbb{F}_{v}|=2$ and $|\mathbb{F}_{v}|=1$ and fewer for $|\mathbb{F}_{v}|=3$.
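The size of the caching search space follows directly from the product of binomial coefficients. The helper below is a small illustrative check with a hypothetical name; `cache_sizes` lists $|\mathbb{F}_{v}|$ for each satellite.

```python
import math


def num_caching_strategies(num_vnfs: int, cache_sizes) -> int:
    """Product over satellites of C(num_vnfs, |F_v|)."""
    return math.prod(math.comb(num_vnfs, c) for c in cache_sizes)


# Three satellites, three VNFs, two cache slots each (setting of Fig. 8).
small_space = num_caching_strategies(3, [2, 2, 2])
# Per-satellite subset counts for four VNFs and cache sizes 1, 2, 3.
per_satellite = [math.comb(4, c) for c in (1, 2, 3)]
```

For four VNFs, the per-satellite counts are 4, 6 and 4 for cache sizes 1, 2 and 3, and the total grows exponentially with the number of satellites.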

3) Assessing the combined performance of the MAQL-based VNF Placement & the BO-based VNF Caching methods: In this simulation, we compare the proposed MAQL-based scheme with the NBP scheme [15]. For both schemes, the subsets of installed VNFs on the satellites are selected via the proposed BO-based algorithm. The setup in this experiment is as follows: $\mathbb{H}=\{1,2\}$ and the network consists of $V=3$ satellites, with $T_{1,2}=2$, $T_{1,3}=4$ and $T_{2,3}=4$ being the periods of the ISLs between the satellite pairs. The satellite resources and VNF execution time are $R_{v}=Z_{v}=14$ and $d^{h}_{f,v}=2$, $\forall v\in\mathbb{V}$.

Figure 10: Comparison between the proposed MAQL-based approach and the NBP [15] scheme, under the BO-based VNF caching method.

The results in terms of achieved serving rate are presented in Fig. 10; “SFC 1” stands for the case when SFC 1 is the only available service. Here, the BO-based caching results in $\mathbb{F}_{v}=\{2,3\}$, $\forall v\in\mathbb{V}$, for both the MAQL and NBP methods, which can be straightforwardly confirmed as the optimal caching strategy, since SFC 1 comprises only VNFs 2 and 3. In this case, every request can be handled by satellites individually; hence, both methods serve $100\%$ of the requests. Similarly, “SFC 2” stands for the case where SFC 2 is the only available service, and “Mixed” for the case where both SFCs are available and either is requested with equal probability of $0.5$. In these cases, we observe that the proposed method prevails due to the limitation of NBP, which considers only satellites with currently active ISLs. The benefit of exploiting the periodic movements of satellites to perform the placement over the entire LSN, as the proposed MAQL-based solution does, is evident.

4) VNF caching with large-scale service request: In this simulation, we examine the impact of cache size in a large-scale service request setting. We consider the same simulation setup described in Subsection VI-A-3, where a total of $H=9864100$ SFCs can be requested. We make the following modifications to the setup: $V=12$ satellites form three subgroups with four satellites each, and $T_{v,u}=10$ for satellites $v$ and $u$ belonging to different subgroups. To the best of our knowledge, none of the existing works in the literature has proposed a practical distribution for the popularity of VNFs. Therefore, to demonstrate the impact of the VNF popularity imbalance on the caching performance, we assume that the popularity of VNFs, i.e., $\mathcal{D}_{\text{F}}$, follows a Zipf distribution [28]. This choice is based on Zipf’s skewness factor, $\chi$, which effectively models varying levels of popularity imbalance. In particular, we consider low and high popularity imbalance levels defined by $\chi=1.5$ and $\chi=4$, respectively. The requested SFCs are generated through Algorithm 3.

Figure 11: Assessment of the proposed BO-based caching method against the Greedy scheme, where VNF popularity follows a Zipf distribution with skewness factor $\chi$.

The results are illustrated in Fig. 11, where both methods achieve improved performance as the caching capacity $|\mathbb{F}|_{v}$, $\forall v\in\mathbb{V}$, increases. We compare the proposed BO-based scheme against a Greedy Caching scheme, in which the popularity of VNFs is known and VNFs are cached in descending order of popularity. Since satellites operate independently and are not aware of each other's cached VNF subsets, the greedy scheme produces the same cache placement on every satellite and therefore lacks diversity in its caching decisions. In contrast, the presented framework captures both the popularity and the diversity of VNFs, which leads to a more effective caching strategy, as demonstrated in Fig. 11. Moreover, the bias of VNF popularity has a significant impact: both schemes perform better when the bias defined by $\chi$ increases from $1.5$ to $4$.
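The lack of diversity in the greedy baseline can be seen in a minimal sketch, given below under stated assumptions. The popularity values, the cache capacity, and the helper name `greedy_cache` are hypothetical and serve only to show that independent, popularity-ordered caching yields identical caches on all satellites.

```python
def greedy_cache(popularity, capacity):
    """Cache the `capacity` most popular VNFs (descending popularity)."""
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    return set(ranked[:capacity])


# Hypothetical popularity profile over four VNFs (illustrative values only).
popularity = {1: 0.50, 2: 0.25, 3: 0.15, 4: 0.10}
NUM_SATELLITES = 12  # matches V = 12 in the simulation setup
CAPACITY = 2         # hypothetical cache capacity per satellite

# Each satellite decides independently, using the same popularity knowledge.
caches = [greedy_cache(popularity, CAPACITY) for _ in range(NUM_SATELLITES)]

# Set of distinct VNFs available anywhere in the network.
network_vnfs = set().union(*caches)

print("cache on every satellite:", caches[0])
print("distinct VNFs cached network-wide:", len(network_vnfs))
```

In this sketch, the number of distinct VNFs available network-wide never exceeds the per-satellite capacity, regardless of how many satellites there are. Any SFC that needs a less popular VNF therefore cannot be served, which is the limitation that a diversity-aware caching strategy is meant to avoid.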

VII Conclusion

In this work, we have tackled the SFC placement problem in LSNs, aiming to optimize long-term system performance. We first developed an optimal offline service placement policy using a DP equation; however, its high computational complexity, its need for extensive statistical information, and its centralized nature make it impractical for online use. To overcome these limitations, we proposed a cooperative MAQL-based approach with a parameter-sharing mechanism to cope with the non-stationary satellite environment. Additionally, recognizing that SFC deployment depends on the VNFs pre-installed/cached on each satellite, we incorporated a BO-based VNF caching scheme to maximize the request serving rate, in which an acquisition function guides the search towards potentially optimal caching strategies. Simulations have demonstrated that the proposed framework outperforms the most recent approaches introduced in the literature. Future work will focus on scaling the framework to handle multiple requests and on exploring a distributed approach to enhance the BO-based method, particularly for applications in LEO satellite mega-constellations.

References

  • [1] H. Lee, B. Lee, H. Yang, J. Kim, S. Kim, W. Shin, B. Shim, and H. V. Poor, “Towards 6G Hyper-Connectivity: Vision, Challenges, and Key Enabling Technologies,” J. Commun. Netw., vol. 25, no. 3, pp. 344–354, 2023.
  • [2] C. Han, X. Li, H. Ji, and H. Zhang, “Adaptive Online Service Function Chain Deployment in Large-scale LEO Satellite Networks,” in Proc. IEEE Int. Symp. Pers., Indoor, Mobile Radio Commun., Toronto, ON, Canada, 2023, pp. 1–6.
  • [3] X. Qin, T. Ma, Z. Tang, X. Zhang, H. Zhou, and L. Zhao, “Service-Aware Resource Orchestration in Ultra-Dense LEO Satellite-Terrestrial Integrated 6G: A Service Function Chain Approach,” IEEE Trans. Wirel. Commun., vol. 22, no. 9, 2023.
  • [4] J. He, N. Cheng, Z. Yin, H. Zhou, W. Xu, H. Peng, C. Zhou, and R. Zhang, “Service-Oriented Resource Allocation in SDN Enabled LEO Satellite Networks,” in Proc. IEEE Int. Symp. Pers., Indoor, Mobile Radio Commun., Toronto, ON, Canada, 2023, pp. 1–6.
  • [5] T. Li, H. Zhou, H. Luo, Q. Xu, S. Hua, and B. Feng, “Service Function Chain in Small Satellite-Based Software Defined Satellite Networks,” China Commun., vol. 15, no. 3, pp. 157–167, 2018.
  • [6] A. Leivadeas, G. Kesidis, M. Ibnkahla, and I. Lambadaris, “VNF Placement Optimization at the Edge and Cloud,” Future Internet, vol. 11, no. 3, p. 69, 2019.
  • [7] G. L. Santos et al., “Service Function Chain Placement in Distributed Scenarios: A Systematic Review,” J. Netw. Syst. Manag., vol. 30, no. 1, p. 4, 2022.
  • [8] K. Doan, M. Avgeris, A. Leivadeas, I. Lambadaris, and W. Shin, “Service Function Chaining in LEO Satellite Networks via Multi-Agent Reinforcement Learning,” in Proc. IEEE Global Telecommun. Conf., Kuala Lumpur, Malaysia, 2023, pp. 7145–7150.
  • [9] L. Jin, L. Wang, X. Jin, J. Zhu, K. Duan, and Z. Li, “Research on the Application of LEO Satellite in IOT,” in Proc. Int. Conf. Eng. Technol. Innov., India, 2022, pp. 739–741.
  • [10] A. Mokhtar and M. Azizoglu, “On the Downlink Throughput of a Broadband LEO Satellite Network with Hopping Beams,” IEEE Commun. Lett., vol. 4, no. 12, pp. 390–393, 2000.
  • [11] R. Deng, B. Di, H. Zhang, L. Kuang, and L. Song, “Ultra-Dense LEO Satellite Constellations: How Many LEO Satellites Do We Need?” IEEE Trans. Wirel. Commun., vol. 20, no. 8, pp. 4843–4857, 2021.
  • [12] R. Deng, B. Di, and L. Song, “Ultra-Dense LEO Satellite Based Formation Flying,” IEEE Trans. Commun., vol. 69, no. 5, pp. 3091–3105, 2021.
  • [13] P. Wang, B. Di, and L. Song, “Multi-layer LEO Satellite Constellation Design for Seamless Global Coverage,” in Proc. IEEE Global Telecommun. Conf., Spain, 2021, pp. 1–6.
  • [14] B. Ko and S. Kwak, “Survey of Computer Vision-Based Natural Disaster Warning Systems,” Opt. Eng., vol. 51, pp. 901–936, 2012.
  • [15] X. Gao, R. Liu, A. Kaushik, J. Thompson, H. Zhang, and Y. Ma, “Dynamic Resource Management for Neighbor-Based VNF Placement in Decentralized Satellite Networks,” in Proc. Int. Conf. 6G Netw., Paris, France, 2022, pp. 1–5.
  • [16] X. Gao, R. Liu, and A. Kaushik, “Virtual Network Function Placement in Satellite Edge Computing With a Potential Game Approach,” IEEE Trans. Netw. Service Manag., vol. 19, no. 2, pp. 1243–1259, 2022.
  • [17] X. Qin, T. Ma, Z. Tang, X. Zhang, X. Liu, and H. Zhou, “SFC Enabled Data Delivery for Ultra-Dense LEO Satellite-Terrestrial Integrated Network,” in Proc. IEEE Global Telecommun. Conf., Rio de Janeiro, Brazil, 2022, pp. 668–673.
  • [18] Z. Jia, M. Sheng, J. Li, D. Zhou, and Z. Han, “VNF-Based Service Provision in Software Defined LEO Satellite Networks,” IEEE Trans. Wirel. Commun., vol. 20, no. 9, pp. 6139–6153, 2021.
  • [19] Z. Jia et al., “Joint Optimization of VNF Deployment and Routing in Software Defined Satellite Networks,” in Proc. IEEE Veh. Technol. Conf., Porto, Portugal, 2018, pp. 1–5.
  • [20] Y. Cai, Y. Wang, X. Zhong, W. Li, X. Qiu, and S. Guo, “An Approach to Deploy Service Function Chains in Satellite Networks,” in Proc. IEEE/IFIP Netw. Oper. Manage. Symp., Taipei, Taiwan, 2018, pp. 1–7.
  • [21] Q. Xia, G. Wang, Z. Xu, W. Liang, and Z. Xu, “Efficient Algorithms for Service Chaining in NFV-Enabled Satellite Edge Networks,” IEEE Trans. Mob. Comput., pp. 1–17, 2023.
  • [22] R. Garnett, Bayesian Optimization. Cambridge Univ. Press, 2023.
  • [23] L. Wasserman, All of Statistics: A Concise Course in Statistical Inference. New York: Springer, 2010.
  • [24] S. Pandey, J. W.-K. Hong, and J.-H. Yoo, “Q-Learning Based SFC Deployment on Edge Computing Environment,” in Proc. Asia-Pacific Netw. Oper. Manage. Symp., Daegu, Korea, 2020, pp. 220–226.
  • [25] J. Liu, Y. Li, Y. Zhang, L. Su, and D. Jin, “Improve Service Chaining Performance with Optimized Middlebox Placement,” IEEE Trans. Serv. Comput., vol. 10, no. 4, pp. 560–573, 2017.
  • [26] S. J. Wright and B. Recht, Coordinate Descent. Cambridge Univ. Press, 2022, pp. 100–117.
  • [27] C. Bogani, M. Gasparo, and A. Papini, “Generalized Pattern Search Methods for a Class of Nonsmooth Optimization Problems with Structure,” J. Comput. Appl. Math., vol. 229, no. 1, pp. 283–293, 2009.
  • [28] A. Saichev, Y. Malevergne, and D. Sornette, Theory of Zipf’s Law and Beyond. Springer Berlin Heidelberg, 2009.