
Private Information Retrieval From a Cellular Network With Caching at the Edge

Siddhartha Kumar, Alexandre Graell i Amat,
Eirik Rosnes, and Linda Senigagliesi
S. Kumar and E. Rosnes were supported by the Research Council of Norway (grant 240985/F20). A. Graell i Amat was supported by the Swedish Research Council under grant #2016-04253. S. Kumar and E. Rosnes are with Simula UiB, N-5020 Bergen, Norway (e-mail: {kumarsi,eirikrosnes}@simula.no). A. Graell i Amat is with the Department of Electrical Engineering, Chalmers University of Technology, SE-41296 Gothenburg, Sweden (e-mail: alexandre.graell@chalmers.se). L. Senigagliesi is with Dipartimento di Ingegneria dell’Informazione, Università Politecnica delle Marche, Ancona, Italy (e-mail: l.senigagliesi@pm.univpm.it).
Abstract

We consider the problem of downloading content from a cellular network where content is cached at the wireless edge while achieving privacy. In particular, we consider private information retrieval (PIR) of content from a library of files, i.e., the user wishes to download a file and does not want the network to learn any information about which file she is interested in. To reduce the backhaul usage, content is cached at the wireless edge in a number of small-cell base stations using maximum distance separable codes. We propose a PIR scheme for this scenario that achieves privacy against a number of spy SBSs that (possibly) collaborate. The proposed PIR scheme is an extension of a recently introduced scheme by Kumar et al. to the case of multiple code rates, suitable for the scenario where files have different popularities. We then derive the backhaul rate and optimize the content placement to minimize it. We prove that uniform content placement is optimal, i.e., all files that are cached should be stored using the same code rate. This is in contrast to the case where no PIR is required. Furthermore, we show numerically that popular content placement is optimal for some scenarios.

I Introduction

Bringing content closer to the end user in wireless networks, the so-called caching at the wireless edge, has emerged as a promising technique to reduce the backhaul usage. The literature on wireless caching is vast. Information-theoretic aspects of caching were studied in [1, 2]. To leverage the potential gains of caching, several papers proposed to cache files in densely deployed small-cell base stations (SBSs) with large storage capacity, see, e.g., [3, 4, 5, 6, 7]. In [5], content is cached in SBSs using maximum distance separable (MDS) codes to reduce the download delay. This scenario was further studied in [7], where the authors optimized the MDS-coded caching to minimize the backhaul rate. Caching content directly in the mobile devices and exploiting device-to-device communication has been considered in, e.g., [8, 9, 10, 11, 12].

Recently, private information retrieval (PIR) has attracted significant interest in the research community [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]. In PIR, a user would like to retrieve data from a distributed storage system (DSS) in the presence of spy nodes, without revealing to the spy nodes any information about the piece of data she is interested in. PIR was first studied by Chor et al. [24] for the case where a binary database is replicated among $n$ servers (nodes) and the aim is to privately retrieve a single bit from the database in the presence of a single spy node (referred to as the noncolluding case), while minimizing the total communication cost. In the last few years, spurred by the rise of DSSs, research on PIR has been focusing on the more general case where data is stored using a storage code.

The PIR capacity, i.e., the maximum achievable PIR rate, was studied in [18, 19, 21, 22, 23]. In [19, 23], the PIR capacity was derived for the scenario where data is stored in a DSS using a repetition code. In [22], for the noncolluding case, the authors derived the PIR capacity for the scenario where data is stored using a (single) MDS code, referred to as the MDS-PIR capacity. For the case where several spy nodes collaborate with each other, referred to as the colluding case, the MDS-PIR capacity is in general still unknown, except for some special cases [18] (and for repetition codes [23]). PIR protocols for DSSs have been proposed in [14, 16, 17, 20, 21]. In [16], a PIR protocol for MDS-coded DSSs was proposed and shown to achieve the MDS-PIR capacity for the case of noncolluding nodes when the number of files stored in the DSS goes to infinity. PIR protocols for the case where data is stored using non-MDS codes were proposed in [17, 20, 21].

In this paper, we consider PIR of content from a cellular network. In particular, we consider the private retrieval of content from a library of files that have different popularities. We consider a similar scenario as in [7] where, to reduce the backhaul usage, content is cached in SBSs using MDS codes. We propose a PIR scheme for this scenario that achieves privacy against a number of spy SBSs that possibly collude. The proposed PIR scheme is an extension of Protocol 3 in [21] to the case of multiple code rates, suitable for the scenario where files have different popularities. We also propose an MDS-coded content placement slightly different than the one in [7] but that is more adapted to the PIR case. We show that, for the conventional content retrieval scenario with no privacy, the proposed content placement is equivalent to the one in [7], in the sense that it yields the same average backhaul rate. We then derive the backhaul rate for the PIR case as a function of the content placement. We prove that uniform content placement, i.e., all files that are cached are encoded with the same code rate, is optimal. This is a somewhat surprising result, in contrast to the case where no PIR is considered, where optimal content placement is far from uniform [7]. We further consider the minimization of a weighted sum of the backhaul rate and the communication rate from the SBSs, relevant for the case where limiting the communication from the SBSs is also important. We finally report numerical results for both the scenario where SBSs are placed regularly in a grid and for a Poisson point process (PPP) deployment model where SBSs are distributed over the plane according to a PPP. We show numerically that popular content placement is optimal for some system parameters. To the best of our knowledge, PIR for the wireless caching scenario has not been considered before.

Notation: We use lower case bold letters to denote vectors, upper case bold letters to denote matrices, and calligraphic upper case letters to denote sets. For example, $\bm{x}$, $\bm{X}$, and $\mathcal{X}$ denote a vector, a matrix, and a set, respectively. We denote a submatrix of $\bm{X}$ that is restricted in columns by the set $\mathcal{I}$ by $\bm{X}|_{\mathcal{I}}$. $\mathcal{C}$ will denote a linear code over the finite field $\mathrm{GF}(q)$. The multiplicative subgroup of $\mathrm{GF}(q)$ (not containing the zero element) is denoted by $\mathrm{GF}(q)^{\times}$. We use the customary code parameters $(n,k)$ to denote a code $\mathcal{C}$ of blocklength $n$ and dimension $k$. A generator matrix for $\mathcal{C}$ will be denoted by $\bm{G}^{\mathcal{C}}$ and a parity-check matrix by $\bm{H}^{\mathcal{C}}$. A set of coordinates of $\mathcal{C}$, $\mathcal{I}\subseteq\{1,\ldots,n\}$, of size $k$ is said to be an information set if and only if $\bm{G}^{\mathcal{C}}|_{\mathcal{I}}$ is invertible. The Hadamard product of two linear subspaces $\mathcal{C}$ and $\mathcal{C}'$, denoted by $\mathcal{C}\circ\mathcal{C}'$, is the space generated by the Hadamard products $\bm{c}\circ\bm{c}'\triangleq(c_{1}c_{1}',\ldots,c_{n}c_{n}')$ for all pairs $\bm{c}\in\mathcal{C}$, $\bm{c}'\in\mathcal{C}'$. The inner product of two vectors $\bm{x}$ and $\bm{x}'$ is denoted by $\langle\bm{x},\bm{x}'\rangle$, while $w_{\mathsf{H}}(\bm{x})$ denotes the Hamming weight of $\bm{x}$. $(\cdot)^{\top}$ represents the transpose of its argument, while $\mathsf{H}(\cdot)$ represents the entropy function. With some abuse of language, we sometimes interchangeably refer to binary vectors as erasure patterns under the implicit assumption that the ones represent erasures. An erasure pattern (or binary vector) $\bm{x}$ is said to be correctable by a code $\mathcal{C}$ if the matrix $\bm{H}^{\mathcal{C}}|_{\chi(\bm{x})}$ has rank $|\chi(\bm{x})|$, where $\chi(\bm{x})$ denotes the support of $\bm{x}$.

II System Model

We consider a cellular network where a macro-cell is served by a macro base station (MBS). Mobile users wish to download files from a library of $F$ files that is always available at the MBS through a backhaul link. We assume that all files are of equal size. (Footnote 1: Assuming files of equal size is without loss of generality, since content can always be divided into chunks of equal size.) In particular, each file consists of $\beta L$ bits and is represented by a $\beta\times L$ matrix $\bm{X}^{(i)}$,

$$\bm{X}^{(i)}=\begin{pmatrix}\tilde{\bm{x}}^{(i)}_{1}\\ \vdots\\ \tilde{\bm{x}}^{(i)}_{\beta}\end{pmatrix},$$

where the superscript $i=1,\ldots,F$ is the file index. Therefore, each file can be seen as divided into $\beta$ stripes $\tilde{\bm{x}}^{(i)}_{1},\ldots,\tilde{\bm{x}}^{(i)}_{\beta}$ of $L$ bits each. The file library has popularity distribution $\bm{p}=(p_{1},\ldots,p_{F})$, where file $\bm{X}^{(i)}$ is requested with probability $p_{i}$. We also assume that $N_{\mathsf{SBS}}$ SBSs are deployed to serve requests and offload traffic from the MBS whenever possible. To this purpose, each SBS has a cache size equivalent to $M$ files. The considered scenario is depicted in Fig. 1.

II-A Content Placement

File $\bm{X}^{(i)}$ is partitioned into $\beta k_{i}$ packets of size $L/k_{i}$ bits and encoded before being cached in the SBSs. In particular, each packet is mapped onto a symbol of the field $\mathrm{GF}(q^{\delta_{i}})$, with $\delta_{i}\geq\frac{L}{k_{i}\log_{2}q}$. For simplicity, we assume that $\frac{L}{k_{i}\log_{2}q}$ is an integer and set $\delta_{i}=\frac{L}{k_{i}\log_{2}q}$. Thus, stripe $\tilde{\bm{x}}^{(i)}_{a}$ can be equivalently represented by a stripe $\bm{x}^{(i)}_{a}$, $a=1,\ldots,\beta$, of symbols over $\mathrm{GF}(q^{\delta_{i}})$. Each stripe $\bm{x}^{(i)}_{a}$ is then encoded using an $(N_{\mathsf{SBS}},k_{i})$ MDS code $\mathcal{C}_{i}$ over $\mathrm{GF}(q)$ into a codeword $\bm{c}^{(i)}_{a}=(c^{(i)}_{a,1},\ldots,c^{(i)}_{a,N_{\mathsf{SBS}}})$, where code symbols $c^{(i)}_{a,j}$, $j=1,\ldots,N_{\mathsf{SBS}}$, are over $\mathrm{GF}(q^{\delta_{i}})$. For later use, we define $k_{\mathsf{min}}\triangleq\min\{k_{i}\}$, $k_{\mathsf{max}}\triangleq\max\{k_{i}\}$, and $\delta_{\mathsf{max}}\triangleq\frac{L}{k_{\mathsf{min}}\log_{2}q}$.

The encoded file can be represented by a $\beta\times N_{\mathsf{SBS}}$ matrix $\bm{C}^{(i)}=(c^{(i)}_{a,j})$. Code symbols $c^{(i)}_{a,j}$ are then stored in the $j$-th SBS (the ordering is unimportant). Thus, for each file $\bm{X}^{(i)}$, each SBS caches one coded symbol of each stripe of the file, i.e., a fraction $\mu_{i}=1/k_{i}$ of the $i$-th file. As $k_{i}\in\{1,\ldots,N_{\mathsf{SBS}}-1\}$,

$$\mu_{i}\in\mathcal{M}\triangleq\{0,1/(N_{\mathsf{SBS}}-1),\ldots,1/2,1\},$$

where $\mu_{i}=0$ implies that file $\bm{X}^{(i)}$ is not cached. Note that, to achieve privacy, $k_{i}<N_{\mathsf{SBS}}$, i.e., files need to be cached with redundancy. As a result, $\mu_{i}=1/N_{\mathsf{SBS}}$ is not allowed. This is in contrast to the case of no PIR, where $k_{i}=N_{\mathsf{SBS}}$ (and hence $\mu_{i}=1/N_{\mathsf{SBS}}$) is possible.

Since each SBS can cache the equivalent of $M$ files, the $\mu_{i}$'s must satisfy

$$\sum_{i=1}^{F}\mu_{i}\leq M.$$

We define the vector $\bm{\mu}=(\mu_{1},\ldots,\mu_{F})$ and refer to it as the content placement. Also, we denote by $\mathcal{C}_{\mathsf{MDS}}^{\bm{\mu}}$ the caching scheme that uses MDS codes $\{\mathcal{C}_{i}\}$ according to the content placement $\bm{\mu}$. For later use, we define $\mu_{\mathsf{min}}\triangleq\min\{\mu_{i}\,|\,\mu_{i}\neq 0\}$ and $\mu_{\mathsf{max}}\triangleq\max\{\mu_{i}\}$.
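The two constraints on a content placement (each $\mu_i$ must lie in $\mathcal{M}$, and the fractions must fit in the cache) can be checked mechanically. The following is a minimal sketch; the function names and the toy values ($N_{\mathsf{SBS}}=6$, $F=4$, $M=2$) are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction


def allowed_fractions(n_sbs):
    """Set M = {0, 1/(N_SBS - 1), ..., 1/2, 1} of admissible cache fractions."""
    return {Fraction(0)} | {Fraction(1, k) for k in range(1, n_sbs)}


def is_valid_placement(mu, n_sbs, cache_size):
    """True if every mu_i lies in M and sum(mu) does not exceed the cache size."""
    allowed = allowed_fractions(n_sbs)
    return all(m in allowed for m in mu) and sum(mu) <= cache_size


# Hypothetical toy instance: N_SBS = 6, F = 4 files, cache size M = 2 files.
mu = [Fraction(1), Fraction(1, 2), Fraction(1, 5), Fraction(0)]
print(is_valid_placement(mu, n_sbs=6, cache_size=2))  # True
```

Exact rational arithmetic is used so that fractions such as $1/5$ are compared without rounding error.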

Figure 1: A wireless network for content delivery consisting of an MBS and five SBSs. Users download files from a library of $F$ files. The MBS has access to the library through a backhaul link. Some files are also cached at SBSs using a $(5,3)$ MDS code. User A retrieves a cached file from the three SBSs within range. User B retrieves a fraction $2/3$ of a cached file from the two SBSs within range and the remaining fraction from the MBS.

We remark that the content placement above is slightly different from the content placement proposed in [7]. In particular, we assume a fixed code length (equal to the number of SBSs, $N_{\mathsf{SBS}}$) and variable $k_{i}$, such that, for each file cached, each SBS caches a single symbol from each stripe of the file. In [7], the content placement is done by first dividing each file into $k$ symbols and encoding them using an $(\tilde{n}_{i},k)$ MDS code, where $\tilde{n}_{i}=k+(N_{\mathsf{SBS}}-1)m_{i}$, $m_{i}\leq k$. Then, $m_{i}$ (different) symbols of the $i$-th file are stored in each SBS and the MBS stores $k-m_{i}$ symbols. (Footnote 2: This is because the model in [7] assumes that one SBS is always accessible to the user. If this is not the case, the MBS must store all $k$ symbols of the file. Here, we consider the case where the MBS must store all $k$ symbols because it is a bit more general.) Our formulation is perhaps a bit simpler and more natural from a coding perspective. Furthermore, we will show in Section IV that the proposed content placement is equivalent to the one in [7], in the sense that it yields the same average backhaul rate.

II-B File Request

Mobile devices request files according to the popularity distribution $\bm{p}=(p_{1},\ldots,p_{F})$. Without loss of generality, we assume $p_{1}\geq p_{2}\geq\ldots\geq p_{F}$. The user request is initially served by the SBSs within communication range. We denote by $\gamma_{b}$ the probability that the user is served by $b$ SBSs and define $\bm{\gamma}=(\gamma_{0},\ldots,\gamma_{N_{\mathsf{SBS}}})$. If the user is not able to completely retrieve $\bm{X}^{(i)}$ from the SBSs, the additional required symbols are fetched from the MBS. Using the terminology in [7], the average fraction of files that are downloaded from the MBS is referred to as the backhaul rate, denoted by $\textnormal{R}$, and defined as

$$\textnormal{R}\triangleq\frac{\text{average no. of bits downloaded from the MBS}}{\beta L}.$$

Note that for the case of no caching $\textnormal{R}=1$.

As in [7], we assume that the communication is error free.

II-C Private Information Retrieval and Problem Formulation

We assume that some of the SBSs are spy nodes that (potentially) collaborate with each other. On the other hand, we assume that the MBS can be trusted. The users wish to retrieve files from the cellular network, but do not want the spy nodes to learn any information about which file is requested by the user. The goal is to retrieve data from the network privately while minimizing the use of the backhaul link, i.e., while minimizing $\textnormal{R}$. Thus, the goal is to optimize the content placement $\bm{\mu}$ to minimize $\textnormal{R}$.

III Private Information Retrieval Protocol

In this section, we present a PIR protocol for the caching scenario. The PIR protocol proposed here is an extension of Protocol 3 in [21] to the case of multiple code rates. (Footnote 3: Protocol 3 in [21] is based on and improves the protocol in [20], in the sense that it achieves higher PIR rates.)

Assume without loss of generality that the user wants to download file $\bm{X}^{(i)}$. To retrieve the file, the user generates $n\leq N_{\mathsf{SBS}}$ query matrices, $\bm{Q}^{(l)}$, $l=1,\ldots,n$, where $\bm{Q}^{(1)},\ldots,\bm{Q}^{(b)}$ are the queries sent to the $b$ SBSs within visibility and the remaining $n-b$ queries $\bm{Q}^{(b+1)},\ldots,\bm{Q}^{(n)}$ are sent to the MBS. Note that $n$ is a parameter that needs to be optimized. Each query matrix is of size $d\times\beta F$ symbols (from $\mathrm{GF}(q)$) and has the following structure,

$$\bm{Q}^{(l)}=\begin{pmatrix}\bm{q}^{(l)}_{1}\\ \bm{q}^{(l)}_{2}\\ \vdots\\ \bm{q}^{(l)}_{d}\end{pmatrix}=\begin{pmatrix}q^{(l)}_{1,1}&q^{(l)}_{1,2}&\cdots&q^{(l)}_{1,\beta F}\\ q^{(l)}_{2,1}&q^{(l)}_{2,2}&\cdots&q^{(l)}_{2,\beta F}\\ \vdots&\vdots&\cdots&\vdots\\ q^{(l)}_{d,1}&q^{(l)}_{d,2}&\cdots&q^{(l)}_{d,\beta F}\end{pmatrix}.$$

The query matrix $\bm{Q}^{(l)}$ consists of $d$ subqueries $\bm{q}^{(l)}_{j}$, $j=1,\ldots,d$, of length $\beta F$ symbols each. In response to query matrix $\bm{Q}^{(l)}$, an SBS (or the MBS) sends back to the user a response vector $\bm{r}^{(l)}=(r^{(l)}_{1},\ldots,r^{(l)}_{d})^{\top}$ of length $d$, computed as

$$\bm{r}^{(l)}=(r^{(l)}_{1},\ldots,r^{(l)}_{d})^{\top}=\bm{Q}^{(l)}\bigl(c^{(1)}_{1,l},\ldots,c^{(1)}_{\beta,l},\ldots,c^{(F)}_{\beta,l}\bigr)^{\top}. \tag{1}$$

We refer to the $j$-th entry of the response vector $\bm{r}^{(l)}$, i.e., $r^{(l)}_{j}$, as the $j$-th subresponse of $\bm{r}^{(l)}$. Each response vector consists of $d$ subresponses, each being a linear combination of $\beta F$ symbols. Note that the operations are performed over the largest extension field, i.e., $\mathrm{GF}(q^{\delta_{\mathsf{max}}})$, and the subresponses are also over this field, i.e., each subresponse is of size $L/k_{\mathsf{min}}=L\mu_{\mathsf{max}}$ bits and hence each response is of size $dL\mu_{\mathsf{max}}$ bits.
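The response computation in (1) is a matrix-vector product between the query matrix and the symbols cached at one node. The sketch below illustrates it for the binary case $q=2$ only, under the assumption that each cached symbol is modelled as a Python integer whose bits represent an element of $\mathrm{GF}(2^{\delta_{\mathsf{max}}})$, so that field addition is a bitwise XOR and a 0/1 query entry simply keeps or drops a symbol. The function name and the toy values are hypothetical.

```python
def response(query_matrix, cached_symbols):
    """Compute r^(l) = Q^(l) c as in (1) for binary queries (q = 2).

    cached_symbols holds the beta*F symbols cached at node l, each modelled as
    an int (assumption); addition in the extension field is bitwise XOR.
    """
    result = []
    for subquery in query_matrix:      # one subquery per row, d rows in total
        acc = 0
        for q, c in zip(subquery, cached_symbols):
            if q:                      # query entries are 0 or 1
                acc ^= c
        result.append(acc)             # the j-th subresponse
    return result


# Hypothetical toy values: beta*F = 3 cached symbols, d = 2 subqueries.
Q = [[1, 0, 1],
     [0, 1, 1]]
c = [0b1010, 0b0110, 0b0011]
print(response(Q, c))  # [9, 5]
```

For a general field $\mathrm{GF}(q)$ with $q>2$, the XOR and the 0/1 selection would have to be replaced by proper finite-field multiplication and addition.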

The queries and the responses must be such that privacy is ensured and the user is able to recover the requested file. More precisely, information-theoretic PIR in the context of wireless caching with spy SBSs is defined as follows.

Definition 1.

Consider a wireless caching scenario with $N_{\mathsf{SBS}}$ SBSs that cache parts of a library of $F$ files and in which a set $\mathcal{T}$ of $T$ SBSs act as colluding spies. A user wishes to retrieve the $i$-th file and generates queries $\bm{Q}^{(l)}$, $l=1,\ldots,n$. In response to the queries, the SBSs and (potentially) the MBS send back the responses $\bm{r}^{(l)}$. This scheme achieves perfect information-theoretic PIR if and only if

$$\text{Privacy:}\quad\mathsf{H}\bigl(i\,|\,\bm{Q}^{(l)},l\in\mathcal{T}\bigr)=\mathsf{H}(i); \tag{2a}$$

$$\text{Recovery:}\quad\mathsf{H}\bigl(\bm{X}^{(i)}\,|\,\bm{r}^{(1)},\ldots,\bm{r}^{(n)}\bigr)=0. \tag{2b}$$

Condition (2a) means that the spy SBSs gain no additional information about which file is requested from the queries (i.e., the uncertainty about the file requested after observing the queries is identical to the a priori uncertainty determined by the popularity distribution), while Condition (2b) guarantees that the user is able to recover the file from the $n$ response vectors.

We define the $(n,k_{i})$ code $\mathcal{C}'_{i}$, $i=1,\ldots,F$, as the code obtained by puncturing the underlying $(N_{\mathsf{SBS}},k_{i})$ storage code $\mathcal{C}_{i}$, and denote by $\mathcal{C}'_{\mathsf{max}}$ the code with parameters $(n,k_{\mathsf{max}})$. (Footnote 4: Without loss of generality, to simplify notation we assume that the last coordinates of the code are punctured.) For the protocol to work, we require that $k_{\mathsf{min}}$ divides $k_{i}$ for all $i$, i.e., $k_{\mathsf{min}}\mid k_{i}$. This ensures that $\mathrm{GF}(q^{\delta_{i}})\subseteq\mathrm{GF}(q^{\delta_{\mathsf{max}}})$. Furthermore, we require the codes $\mathcal{C}'_{i}$ to be such that $\mathcal{C}'_{i}\subseteq\mathcal{C}'_{\mathsf{max}}$. The protocol is characterized by the codes $\{\mathcal{C}'_{i}\}$ and by two other codes, $\bar{\mathcal{C}}$ and $\tilde{\mathcal{C}}$. Code $\bar{\mathcal{C}}$ (over $\mathrm{GF}(q)$) has parameters $(n,\bar{k})$ and characterizes the queries sent to the SBSs and the MBS, while code $\tilde{\mathcal{C}}$ (defined below) defines the responses sent back to the user from the SBSs and the MBS. The designed protocol achieves PIR against a number of colluding SBSs $T\leq d_{\mathsf{min}}^{\bar{\mathcal{C}}^{\perp}}-1$, where $d_{\mathsf{min}}^{\bar{\mathcal{C}}^{\perp}}$ is the minimum Hamming distance of the dual code of $\bar{\mathcal{C}}$.

III-A Query Construction

The queries must be constructed such that privacy is preserved and the user can retrieve the requested file from the $n$ response vectors $\bm{r}^{(l)}$, $l=1,\ldots,n$. In particular, the protocol is designed such that the subresponses $r^{(l)}_{j}$, $l=1,\ldots,n$, corresponding to the $n$ subqueries $\bm{q}_{j}^{(1)},\ldots,\bm{q}_{j}^{(n)}$ recover $\Gamma$ unique code symbols of the file $\bm{X}^{(i)}$.

The queries are constructed as follows. The user chooses $\beta F$ codewords $\bar{\bm{c}}_{m}^{(i)}=(\bar{c}_{m,1}^{(i)},\ldots,\bar{c}_{m,n}^{(i)})\in\bar{\mathcal{C}}$, $m=1,\ldots,\beta$, $i=1,\ldots,F$, independently and uniformly at random. Then, the user constructs $n$ vectors,

$$\mathring{\bm{c}}_{l}=(\mathring{\bm{c}}^{(1)}_{l},\ldots,\mathring{\bm{c}}^{(F)}_{l}),\quad l=1,\ldots,n, \tag{3}$$

where $\mathring{\bm{c}}^{(i)}_{l}$ collects the $l$-th coordinates of the $\beta$ codewords $\bar{\bm{c}}_{m}^{(i)}$, $m=1,\ldots,\beta$, i.e., $\mathring{\bm{c}}^{(i)}_{l}=(\bar{c}^{(i)}_{1,l},\ldots,\bar{c}^{(i)}_{\beta,l})$.

Assume that the user wants to retrieve file $\bm{X}^{(i)}$. Then, subquery $\bm{q}^{(l)}_{j}$ is constructed as

$$\bm{q}_{j}^{(l)}=\mathring{\bm{c}}_{l}+\bm{\delta}_{j}^{(l)}, \tag{4}$$

where

$$\bm{\delta}_{j}^{(l)}=\begin{cases}\bm{\omega}_{\beta(i-1)+s_{j}^{(l)}}&\text{if }l\in\mathcal{J}_{j},\\ \bm{\omega}_{0}&\text{otherwise},\end{cases} \tag{5}$$

for some set $\mathcal{J}_{j}$ that will be defined below. Vector $\bm{\omega}_{t}$, $t=1,\ldots,\beta F$, denotes the $t$-th $(\beta F)$-dimensional unit vector, i.e., the length-$\beta F$ vector with a one in the $t$-th coordinate and zeros in all other coordinates, and $\bm{\omega}_{0}$ denotes the all-zero vector. The meaning of index $s_{j}^{(l)}$ will become apparent later.

According to (4), each subquery vector is the sum of two vectors, $\mathring{\bm{c}}_{l}$ and $\bm{\delta}_{j}^{(l)}$. The purpose of $\mathring{\bm{c}}_{l}$ is to make the subquery appear random and thus ensure privacy (i.e., Condition (2a)). On the other hand, the vectors $\bm{\delta}_{j}^{(l)}$ are deterministic vectors which must be properly constructed such that the user is able to retrieve the requested file from the response vectors (i.e., Condition (2b)). Similar to Protocol 3 in [21], the vectors $\bm{\delta}_{j}^{(l)}$ are constructed from a $d\times n$ binary matrix $\hat{\bm{E}}$ where each row represents a weight-$\Gamma$ erasure pattern that is correctable by $\tilde{\mathcal{C}}$ and where the weights of its columns are determined from $\beta$ information sets $\mathcal{I}_{m}$, $m=1,\ldots,\beta$, of $\mathcal{C}'_{\mathsf{max}}$.

The construction of $\hat{\bm{E}}$ is addressed below. We define the set $\mathcal{F}_{l}$ as the index set of information sets $\mathcal{I}_{m}$ that contain the $l$-th coordinate of $\mathcal{C}'_{\mathsf{max}}$, i.e., $\mathcal{F}_{l}=\{m:l\in\mathcal{I}_{m}\}$. To allow the user to recover the requested file from the response vectors, $\hat{\bm{E}}$ is constructed such that it satisfies the following conditions.

  1. $\mathsf{C1.}$

    The user should be able to recover $\Gamma$ unique code symbols of the requested file $\bm{X}^{(i)}$ from the responses to each set of $n$ subqueries $\bm{q}^{(l)}_{j}$, $l=1,\ldots,n$. That is, each row of $\hat{\bm{E}}$ should have exactly $\Gamma$ ones. We denote by $\mathcal{J}_{j}$ the support of the $j$-th row of $\hat{\bm{E}}$.

  2. $\mathsf{C2.}$

    The user should be able to recover $\Gamma d\geq\beta k_{i}$ unique code symbols of the requested file $\bm{X}^{(i)}$, at least $k_{i}$ symbols from each stripe. This means that each row $\hat{\bm{e}}_{j}=(\hat{e}_{j,1},\ldots,\hat{e}_{j,n})$, $j=1,\ldots,d$, of $\hat{\bm{E}}$ should correspond to an erasure pattern that is correctable by $\tilde{\mathcal{C}}$.

  3. $\mathsf{C3.}$

    Let $\bm{t}_{l}$, $l=1,\ldots,n$, be the $l$-th column vector of $\hat{\bm{E}}$. The protocol should be able to recover $w_{\mathsf{H}}(\bm{t}_{l})$ unique code symbols from the $l$-th response vector, which means that it is required that $w_{\mathsf{H}}(\bm{t}_{l})=|\mathcal{F}_{l}|$. We call the vector $(w_{\mathsf{H}}(\bm{t}_{1}),\ldots,w_{\mathsf{H}}(\bm{t}_{n}))$ the column weight profile of $\hat{\bm{E}}$.

Finally, from $\hat{\bm{E}}$ we construct the vectors $\bm{\delta}_{j}^{(l)}$ in (5). In particular, index $s_{j}^{(l)}$ in (5) is such that $s_{j}^{(l)}\in\mathcal{F}_{l}$ and $s_{j}^{(l)}\neq s_{j'}^{(l)}$ for $j\neq j'$, $j,j'=1,\ldots,d$.
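The row and column conditions on $\hat{\bm{E}}$ can be verified numerically. The sketch below checks the row-weight part of C1, the erasure-correctability part of C2 (via the rank test from the Notation paragraph), and the column-weight condition C3. It is restricted to binary codes, so ranks are computed over $\mathrm{GF}(2)$; the function names are hypothetical. The test input is the matrix $\hat{\bm{E}}$ and information set $\mathcal{I}_{1}$ of the example in Section III-D, together with the all-ones row as parity-check matrix of the $(6,5)$ single parity-check code.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a binary matrix given as a list of 0/1 rows."""
    pivots = {}  # leading-bit position -> reduced row, stored as an int
    for row in rows:
        v = 0
        for bit in row:
            v = (v << 1) | bit
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)


def check_conditions(E, H, info_sets, gamma):
    """Return (C1, C2, C3) for a binary matrix E.

    C2 here covers only the correctability of each row by the code with
    parity-check matrix H; the counting requirement Gamma*d >= beta*k_i
    is not checked.
    """
    n = len(E[0])
    c1 = all(sum(row) == gamma for row in E)

    def correctable(row):
        support = [l for l in range(n) if row[l]]
        restricted = [[h[l] for l in support] for h in H]
        return gf2_rank(restricted) == len(support)

    c2 = all(correctable(row) for row in E)
    col_weights = [sum(row[l] for row in E) for l in range(n)]
    # |F_l|: number of information sets containing coordinate l (1-indexed)
    f_sizes = [sum(1 for I in info_sets if l + 1 in I) for l in range(n)]
    c3 = col_weights == f_sizes
    return c1, c2, c3


E = [[1 if l == j else 0 for l in range(6)] for j in range(5)]
H = [[1, 1, 1, 1, 1, 1]]  # parity-check matrix of the (6,5) single parity-check code
print(check_conditions(E, H, [{1, 2, 3, 4, 5}], gamma=1))  # (True, True, True)
```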

III-B Response Vectors

The $j$-th subresponse corresponding to subquery $\bm{q}^{(l)}_{j}$, $j=1,\ldots,d$, is (see (1))

$$r^{(l)}_{j}=\langle\bm{q}_{j}^{(l)},(c_{1,l}^{(1)},\ldots,c_{\beta,l}^{(F)})\rangle.$$

The user collects the $n$ subresponses $r^{(l)}_{j}$, $l=1,\ldots,n$, in the vector $\bm{\rho}_{j}$,

$$
\begin{aligned}
\bm{\rho}_{j}=\begin{pmatrix}r^{(1)}_{j}\\ r^{(2)}_{j}\\ \vdots\\ r^{(n)}_{j}\end{pmatrix}={}&\sum_{m=1}^{\beta}\Bigg[\underbrace{\begin{pmatrix}\bar{c}_{m,1}^{(1)}c_{m,1}^{(1)}\\ \bar{c}_{m,2}^{(1)}c_{m,2}^{(1)}\\ \vdots\\ \bar{c}_{m,n}^{(1)}c_{m,n}^{(1)}\end{pmatrix}}_{\in\,\{\bm{x}\in(\mathrm{GF}(q^{\delta_{\mathsf{max}}}))^{n}\colon\bm{H}^{\mathcal{C}'_{1}\circ\,\bar{\mathcal{C}}}\bm{x}=\bm{0}\}}+\underbrace{\begin{pmatrix}\bar{c}_{m,1}^{(2)}c_{m,1}^{(2)}\\ \bar{c}_{m,2}^{(2)}c_{m,2}^{(2)}\\ \vdots\\ \bar{c}_{m,n}^{(2)}c_{m,n}^{(2)}\end{pmatrix}}_{\in\,\{\bm{x}\in(\mathrm{GF}(q^{\delta_{\mathsf{max}}}))^{n}\colon\bm{H}^{\mathcal{C}'_{2}\circ\,\bar{\mathcal{C}}}\bm{x}=\bm{0}\}}\\
&+\cdots+\underbrace{\begin{pmatrix}\bar{c}_{m,1}^{(F)}c_{m,1}^{(F)}\\ \bar{c}_{m,2}^{(F)}c_{m,2}^{(F)}\\ \vdots\\ \bar{c}_{m,n}^{(F)}c_{m,n}^{(F)}\end{pmatrix}}_{\in\,\{\bm{x}\in(\mathrm{GF}(q^{\delta_{\mathsf{max}}}))^{n}\colon\bm{H}^{\mathcal{C}'_{\mathsf{max}}\circ\,\bar{\mathcal{C}}}\bm{x}=\bm{0}\}}\Bigg]+\begin{pmatrix}o_{j}^{(1)}\\ o_{j}^{(2)}\\ \vdots\\ o_{j}^{(n)}\end{pmatrix},
\end{aligned}
\tag{6}
$$

where symbol $o^{(l)}_{j}$ represents the code symbol from file $\bm{X}^{(i)}$ downloaded in the $j$-th subresponse from the $l$-th response vector. Due to the structure of the queries obtained from $\hat{\bm{E}}$, the user retrieves $\Gamma$ code symbols from the set of $n$ subresponses to the $j$-th subqueries. Consider a retrieval code $\tilde{\mathcal{C}}$ of the form

$$\tilde{\mathcal{C}}=\sum_{i=1}^{F}\mathcal{C}'_{i}\circ\bar{\mathcal{C}}\overset{(a)}{=}\bigg(\sum_{i=1}^{F}\mathcal{C}'_{i}\bigg)\circ\bar{\mathcal{C}}, \tag{7}$$

where $\mathcal{C}'_{i}+\mathcal{C}'_{j}$ denotes the sum of subspaces $\mathcal{C}'_{i}$ and $\mathcal{C}'_{j}$, resulting in the set consisting of all elements $\bm{c}+\bm{c}'$ for any $\bm{c}\in\mathcal{C}'_{i}$ and $\bm{c}'\in\mathcal{C}'_{j}$, and where $(a)$ follows due to the fact that the Hadamard product is distributive over addition.

Since each of the first $F$ terms in (6) satisfies the parity checks of $\tilde{\mathcal{C}}$, multiplying $\bm{\rho}_{j}$ by $\bm{H}^{\tilde{\mathcal{C}}}$ cancels them. The symbols requested by the user are then obtained by solving the system of linear equations defined by

$$\bm{H}^{\tilde{\mathcal{C}}}\bm{\rho}_{j}=\bm{H}^{\tilde{\mathcal{C}}}\begin{pmatrix}o^{(1)}_{j}\\ o^{(2)}_{j}\\ \vdots\\ o^{(n)}_{j}\end{pmatrix}.$$
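To make the recovery step concrete, the following sketch runs the query, response, and syndrome computations for a small binary instance. The parameters mirror those of the example in Section III-D ($F=2$, $n=N_{\mathsf{SBS}}=6$, $k_{1}=1$, $k_{2}=5$, $T=1$, $\beta=\Gamma=1$, $d=5$, $\mathcal{J}_{j}=\{j\}$), but the code itself is an illustrative assumption: symbols are modelled as Python integers with XOR as field addition, and the function name is hypothetical. The sketch follows (3)-(5) literally and only illustrates recovery (Condition (2b)); it makes no claim about privacy.

```python
import secrets


def retrieve_file1(x1, x2, n=6):
    """Recover file 1 in a binary toy instance (sketch, not the paper's code).

    x1: int, the single symbol of file 1 (repetition coded, k_1 = 1).
    x2: list of 5 ints, the symbols of file 2 (single parity-check coded).
    """
    parity = 0
    for s in x2:
        parity ^= s
    c1 = [x1] * n               # codeword of the (6,1) repetition code
    c2 = list(x2) + [parity]    # codeword of the (6,5) single parity-check code

    # beta*F = 2 random codewords of the binary repetition code C_bar:
    # each is all-zero or all-one, so one bit per codeword describes it.
    a = [secrets.randbelow(2) for _ in range(2)]

    d = 5
    recovered = []
    for j in range(d):
        rho = []
        for l in range(n):
            # Subquery (4): random part plus the unit vector omega_1 if l is in J_j.
            q = [a[0] ^ (1 if l == j else 0), a[1]]
            # Subresponse: inner product of q with the symbols cached at node l.
            r = (c1[l] if q[0] else 0) ^ (c2[l] if q[1] else 0)
            rho.append(r)
        # The parity-check matrix of C_tilde is the all-ones row, so the
        # syndrome is the XOR of all entries of rho_j.
        syndrome = 0
        for r in rho:
            syndrome ^= r
        recovered.append(syndrome)
    return recovered


print(retrieve_file1(0b1011, [1, 2, 3, 4, 5]))  # [11, 11, 11, 11, 11]
```

Each of the $d=5$ syndromes equals the code symbol of file 1 cached at one SBS; because file 1 is repetition coded, all five coincide with the file symbol itself, regardless of the random bits drawn.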

III-C Privacy

For the retrieval, we require $\tilde{\mathcal{C}}$ to be a valid code, i.e., it must have a code rate strictly less than $1$. For a given number of colluding SBSs $T$, the combination of conditions on $\bar{\mathcal{C}}$ and $\tilde{\mathcal{C}}$ restricts the choice for the underlying storage codes $\{\mathcal{C}_{i}\}$. In the following theorem, we present a family of MDS codes, namely generalized Reed-Solomon (GRS) codes, that work with the protocol. A GRS code $\mathcal{C}$ over $\mathrm{GF}(q)$ of length $n$ and dimension $k$ is a weighted polynomial evaluation code (of polynomials of degree less than $k$) defined by some weighting vector $\bm{v}=(v_{1},\ldots,v_{n})\in(\mathrm{GF}(q)^{\times})^{n}$ and an evaluation vector $\bm{\kappa}=(\kappa_{1},\ldots,\kappa_{n})\in(\mathrm{GF}(q)^{\times})^{n}$ satisfying $\kappa_{i}\neq\kappa_{j}$ for all $i\neq j$ [25, Ch. 5]. In the sequel, we refer to $(n,k,\bm{v},\bm{\kappa})$ as the parameters of a GRS code $\mathcal{C}$.

Lemma 1.

Given an $(n,k_{\mathsf{max}},\bm{v},\bm{\kappa})$ GRS code $\mathcal{C}_{\mathsf{max}}$, for all $k<k_{\mathsf{max}}$, there exists an $(n,k,\bm{v},\bm{\kappa})$ GRS code that is a subcode of $\mathcal{C}_{\mathsf{max}}$.

Proof:

The canonical generator matrix for an $(n,k_{\mathsf{max}},\bm{v},\bm{\kappa})$ GRS code $\mathcal{C}_{\mathsf{max}}$ is given by

$$\begin{pmatrix}1&1&\dots&1\\ \kappa_{1}&\kappa_{2}&\dots&\kappa_{n}\\ \vdots&\vdots&\dots&\vdots\\ \kappa_{1}^{k_{\mathsf{max}}-1}&\kappa_{2}^{k_{\mathsf{max}}-1}&\dots&\kappa_{n}^{k_{\mathsf{max}}-1}\end{pmatrix}\begin{pmatrix}v_{1}&0&\dots&0\\ 0&v_{2}&\dots&0\\ \vdots&\vdots&\dots&\vdots\\ 0&0&\dots&v_{n}\end{pmatrix}. \tag{8}$$

Clearly, taking the first $k$ rows of the leftmost matrix of (8) and multiplying it with the rightmost diagonal matrix generates an $(n,k)$ subcode of $\mathcal{C}_{\mathsf{max}}$ which by itself is an $(n,k,\bm{v},\bm{\kappa})$ GRS code. Thus, GRS codes are naturally nested, and the result follows. ∎
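The nesting argument of Lemma 1 can be checked on a small instance. The sketch below builds the canonical generator matrix (8) over a prime field $\mathrm{GF}(p)$, confirms that the generator of the dimension-$k$ code consists of the first $k$ rows of the larger one, and verifies the MDS property by testing that every $k$ columns are linearly independent. The field $\mathrm{GF}(7)$, the vectors $\bm{v}$ and $\bm{\kappa}$, and all function names are illustrative assumptions; `pow(x, -1, p)` requires Python 3.8 or later.

```python
from itertools import combinations


def grs_generator(k, v, kappa, p):
    """Canonical k x n generator matrix of a GRS code over GF(p), as in (8)."""
    return [[(vj * pow(kj, r, p)) % p for vj, kj in zip(v, kappa)]
            for r in range(k)]


def rank_mod_p(rows, p):
    """Rank of an integer matrix over the prime field GF(p)."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] % p), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)            # modular inverse of the pivot
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col] % p:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def is_mds(G, p):
    """True if every k columns of the k x n matrix G are linearly independent."""
    k, n = len(G), len(G[0])
    return all(rank_mod_p([[row[j] for j in cols] for row in G], p) == k
               for cols in combinations(range(n), k))


# Hypothetical toy parameters: GF(7), n = 6, k_max = 4, k = 2.
p, kappa, v = 7, [1, 2, 3, 4, 5, 6], [1, 1, 2, 3, 1, 5]
G_max = grs_generator(4, v, kappa, p)
G_sub = grs_generator(2, v, kappa, p)
print(G_sub == G_max[:2], rank_mod_p(G_max + G_sub, p))  # True 4
```

Stacking the two generator matrices does not increase the rank, which confirms that the smaller code is contained in the larger one.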

Theorem 1.

Let $\mathcal{C}_{\mathsf{MDS}}^{\bm{\mu}}$ be a caching scheme with GRS codes $\{\mathcal{C}_{i}\}$ of parameters $(N_{\mathsf{SBS}},k_{i},\bm{v},(\kappa_{1},\ldots,\kappa_{N_{\mathsf{SBS}}}))$ and let $\mathcal{C}'_{i}$ be the $(n,k_{i})$ code obtained by puncturing $\mathcal{C}_{i}$. Also, let $\bar{\mathcal{C}}$ be an $(n,T,\bar{\bm{v}},(\kappa_{1},\ldots,\kappa_{n}))$ GRS code. Then, for $\beta=\Gamma=n-(k_{\mathsf{max}}+T-1)$ and $d=k_{\mathsf{max}}$, the protocol achieves PIR against $T$ colluding SBSs.

Proof:

The proof is given in the appendix. ∎

Note that the retrieval code $\bar{\mathcal{C}}$ depends, through its evaluation vector, on the $n$ SBSs within visibility that are contacted by the user. Finally, we remark that, with some slight modifications, the proposed protocol can be adapted to work with non-MDS codes.
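The value $\Gamma=n-(k_{\mathsf{max}}+T-1)$ in Theorem 1 relies on the fact, used in the appendix, that the Hadamard product of two GRS codes with the same evaluation vector has dimension equal to the sum of the dimensions minus $1$. The sketch below checks this on one toy instance. It assumes a prime field $\mathrm{GF}(p)$; all function names and the parameters ($p=11$, $n=8$, $k_{\mathsf{max}}=3$, $T=2$) are hypothetical and chosen only for illustration.

```python
def grs_generator(k, v, kappa, p):
    """k x n canonical generator matrix of a GRS code over the prime field GF(p)."""
    return [[(w * pow(x, e, p)) % p for x, w in zip(kappa, v)] for e in range(k)]


def star_product_span(G1, G2, p):
    """Spanning set of the Hadamard product code: all component-wise row products."""
    return [[(a * b) % p for a, b in zip(r1, r2)] for r1 in G1 for r2 in G2]


def rank_mod_p(rows, p):
    """Rank of a matrix over GF(p), p prime, by Gauss-Jordan elimination."""
    rows = [list(r) for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)  # inverse via Fermat's little theorem
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


# Toy instance (hypothetical parameters).
p, n, k_max, T = 11, 8, 3, 2
kappa = list(range(1, n + 1))          # distinct nonzero evaluation points
v = [1, 2, 3, 4, 5, 6, 7, 8]           # weights of the storage code
v_bar = [3] * n                        # weights of the query code

G_storage = grs_generator(k_max, v, kappa, p)
G_query = grs_generator(T, v_bar, kappa, p)

dim_tilde = rank_mod_p(star_product_span(G_storage, G_query, p), p)
Gamma = n - (k_max + T - 1)            # number of erasures the product code tolerates
```

For this instance the product code has dimension $k_{\mathsf{max}}+T-1=4$, which leaves $\Gamma=4$ redundant coordinates.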

III-D Example

As an example, consider the case of $F=2$ files, $\bm{X}^{(1)}$ and $\bm{X}^{(2)}$, both of size $\beta L$ bits. The first file $\bm{X}^{(1)}$ is stored in the SBSs according to Fig. 2 using an $(N_{\mathsf{SBS}}=6,k_{1}=1)$ binary repetition code $\mathcal{C}_{1}$. Similarly, the second file $\bm{X}^{(2)}$ is stored (again according to Fig. 2) using an $(N_{\mathsf{SBS}}=6,k_{2}=5)$ binary single parity-check code $\mathcal{C}_{2}$. Assume $n=N_{\mathsf{SBS}}=6$ (i.e., no puncturing) and that none of the SBSs collude, i.e., $T=1$. Furthermore, we assume that the user wants to retrieve $\bm{X}^{(1)}$ and is able to contact $b=n=6$ SBSs (i.e., we consider the extreme case where the user is not contacting the MBS). According to Theorem 1, we can choose $\beta=\Gamma=n-(k_{\mathsf{max}}+T-1)=6-(5+1-1)=1$ and $d=k_{\mathsf{max}}=5$. Finally, we choose $\bar{\mathcal{C}}$ as an $(n=6,T=1)$ binary repetition code.

According to (7), the retrieval code is $\tilde{\mathcal{C}}=(\mathcal{C}_{1}+\mathcal{C}_{2})\circ\bar{\mathcal{C}}=\mathcal{C}_{1}+\mathcal{C}_{2}=\mathcal{C}_{2}$, which can be generated by

\[
\bm{G}^{\tilde{\mathcal{C}}}=\bm{G}^{\mathcal{C}_{2}}=\begin{pmatrix}1&0&0&0&0&1\\ 0&1&0&0&0&1\\ 0&0&1&0&0&1\\ 0&0&0&1&0&1\\ 0&0&0&0&1&1\end{pmatrix}.
\]

Moreover, let

\[
\hat{\bm{E}}=\begin{pmatrix}1&0&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&1&0\end{pmatrix}\quad\text{and}\quad\mathcal{I}_{1}=\{1,2,3,4,5\},
\]

where $\mathcal{I}_{1}$ is an information set of $\mathcal{C}_{\mathsf{max}}=\mathcal{C}_{2}$ (the submatrix $\bm{G}^{\mathcal{C}_{2}}|_{\mathcal{I}_{1}}$ has rank $k_{2}=5$). Note that $\hat{\bm{E}}$ satisfies all three conditions $\mathsf{C1}$–$\mathsf{C3}$ and has column weight profile $(1,1,1,1,1,0)=(|\mathcal{F}_{1}|,\ldots,|\mathcal{F}_{6}|)$.

Query Construction. The user generates $\beta F=2$ codewords $\bar{\bm{c}}^{(1)}_{1}$ and $\bar{\bm{c}}^{(2)}_{1}$ independently and uniformly at random from $\bar{\mathcal{C}}$. Without loss of generality, let $\bar{\bm{c}}^{(1)}_{1}=\bar{\bm{c}}^{(2)}_{1}=(1,\ldots,1)$. Next, the $n=6$ subqueries $\bm{q}_{1}^{(l)}$, $l=1,\ldots,6$, are constructed according to (4) and (5) as

\[
\bm{q}_{1}^{(l)}=\begin{cases}\mathring{\bm{c}}_{l}+(1,0)&\text{if $l=1$},\\ \mathring{\bm{c}}_{l}+(0,0)&\text{otherwise},\end{cases}
\]

where $\mathring{\bm{c}}_{l}$ is defined in (3).

File Retrieval. Consider the $n=6$ subresponses $r_{1}^{(l)}$, $l=1,\ldots,6$. Then, according to the response expression in Section III-B,

\begin{align*}
\bm{\rho}_{1}=\begin{pmatrix}r_{1}^{(1)}\\ r_{1}^{(2)}\\ r_{1}^{(3)}\\ r_{1}^{(4)}\\ r_{1}^{(5)}\\ r_{1}^{(6)}\end{pmatrix}
&=\underbrace{\begin{pmatrix}\bar{c}_{1,1}^{(1)}c_{1,1}^{(1)}\\ \bar{c}_{1,2}^{(1)}c_{1,2}^{(1)}\\ \vdots\\ \bar{c}_{1,6}^{(1)}c_{1,6}^{(1)}\end{pmatrix}}_{\in\,\left\{\bm{x}\in(\mathrm{GF}(2^{5}))^{n}\colon\bm{H}^{\mathcal{C}^{\prime}_{1}\circ\,\bar{\mathcal{C}}}\bm{x}=\bm{0}\right\}}
+\underbrace{\begin{pmatrix}\bar{c}_{1,1}^{(2)}c_{1,1}^{(2)}\\ \bar{c}_{1,2}^{(2)}c_{1,2}^{(2)}\\ \vdots\\ \bar{c}_{1,6}^{(2)}c_{1,6}^{(2)}\end{pmatrix}}_{\in\,\left\{\bm{x}\in(\mathrm{GF}(2^{5}))^{n}\colon\bm{H}^{\mathcal{C}^{\prime}_{2}\circ\,\bar{\mathcal{C}}}\bm{x}=\bm{0}\right\}}
+\begin{pmatrix}o_{1}^{(1)}\\ o_{1}^{(2)}\\ \vdots\\ o_{1}^{(6)}\end{pmatrix}\\
&=\begin{pmatrix}x_{1,1}^{(1)}\\ x_{1,1}^{(1)}\\ x_{1,1}^{(1)}\\ x_{1,1}^{(1)}\\ x_{1,1}^{(1)}\\ x_{1,1}^{(1)}\end{pmatrix}
+\begin{pmatrix}x_{1,1}^{(2)}\\ x_{1,2}^{(2)}\\ x_{1,3}^{(2)}\\ x_{1,4}^{(2)}\\ x_{1,5}^{(2)}\\ \sum_{l=1}^{5}x_{1,l}^{(2)}\end{pmatrix}
+\begin{pmatrix}x_{1,1}^{(1)}\\ 0\\ 0\\ 0\\ 0\\ 0\end{pmatrix},
\end{align*}

and the code symbol $x_{1,1}^{(1)}$ of the file $\bm{X}^{(1)}$ is recovered from

\[
\bm{H}^{\tilde{\mathcal{C}}}\bm{\rho}_{1}=\begin{pmatrix}1&1&1&1&1&1\end{pmatrix}\begin{pmatrix}x_{1,1}^{(1)}\\ 0\\ 0\\ 0\\ 0\\ 0\end{pmatrix}=x^{(1)}_{1,1}.
\]

Note that in order to retain privacy across the two files of the library, we need to send $d=k_{\mathsf{max}}=5$ subqueries to each SBS, thus generating $5$ subresponses from each SBS (even if the first file can be recovered from the $n=6$ subresponses $r_{1}^{(l)}$, $l=1,\ldots,6$).
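The recovery step of this example can be replayed in a few lines. The sketch below is an illustration under stated assumptions, not the protocol implementation: symbols of $\mathrm{GF}(2^{5})$ are represented as integers in $0,\ldots,31$ with addition given by bitwise XOR (multiplication is never needed because all coefficients are $0$ or $1$), the query codewords are the all-ones vectors chosen above, and the function names are hypothetical.

```python
from functools import reduce
from operator import xor


def response_vector(x1, x2_bits):
    """Length-6 vector rho_1 of the example: repetition part + parity-check part + offset.

    x1      -- the single GF(2^5) symbol of file 1, as an int in 0..31
    x2_bits -- the five symbols of file 2 (0 or 1)
    """
    c1 = [x1] * 6                                 # (6, 1) repetition codeword
    c2 = list(x2_bits) + [reduce(xor, x2_bits)]   # (6, 5) single parity-check codeword
    o = [x1, 0, 0, 0, 0, 0]                       # term caused by the query offset at SBS 1
    return [a ^ b ^ c for a, b, c in zip(c1, c2, o)]


def recover(rho):
    """Apply the all-ones parity-check row of the retrieval code: XOR of all entries."""
    return reduce(xor, rho)


rho = response_vector(19, [1, 0, 1, 1, 0])
recovered = recover(rho)
```

The repetition and parity-check contributions each XOR to zero over the six positions, so only the offset term, and hence the requested symbol, survives.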

Figure 2: Wireless caching scenario in which there are $N_{\mathsf{SBS}}=6$ SBSs. The SBSs store $F=2$ files, $\bm{X}^{(1)}=(x^{(1)}_{1,1})\in\mathrm{GF}(2^{5})^{1\times 1}$ and $\bm{X}^{(2)}=(x^{(2)}_{1,1},x^{(2)}_{1,2},x^{(2)}_{1,3},x^{(2)}_{1,4},x^{(2)}_{1,5})\in\mathrm{GF}(2)^{1\times 5}$, of $\beta L=5$ bits each. The first file $\bm{X}^{(1)}$ is encoded using an $(N_{\mathsf{SBS}}=6,k_{1}=1)$ binary repetition code $\mathcal{C}_{1}$, while the second file $\bm{X}^{(2)}$ is encoded using an $(N_{\mathsf{SBS}}=6,k_{2}=5)$ binary single parity-check code $\mathcal{C}_{2}$.

IV Backhaul Rate Analysis: No PIR Case

In this section, we derive the backhaul rate for the proposed caching scheme for the case of no PIR, i.e., the conventional caching scenario where PIR is not required.

Proposition 1.

The average backhaul rate for the caching scheme $\mathcal{C}_{\mathsf{MDS}}^{\bm{\mu}}$ in Section II for the case of no PIR is

\[
\textnormal{R}_{\mathsf{noPIR}}=\sum_{i=1}^{F}p_{i}\lceil\mu_{i}\rceil\sum_{b=0}^{N_{\mathsf{SBS}}}\gamma_{b}\max\left(0,1/\mu_{i}-b\right)\mu_{i}+\sum_{i=1}^{F}p_{i}\lfloor 1-\mu_{i}\rfloor. \tag{9}
\]
Proof:

To download file $\bm{X}^{(i)}$, if the user is in communication range of a number of SBSs, $b$, larger than or equal to $1/\mu_{i}$, the user can retrieve the file from the SBSs and there is no contribution to the backhaul rate. Otherwise, if $b<1/\mu_{i}$, the user retrieves a fraction $L/k_{i}=L\mu_{i}$ of the file from each of the $b$ SBSs, i.e., a total of $b\beta L\mu_{i}$ bits, and downloads the remaining $(1/\mu_{i}-b)\beta L\mu_{i}$ bits from the MBS. Averaging over $\bm{\gamma}$ and $\bm{p}$ (for the files cached) and normalizing by the file size $\beta L$, the contribution to the backhaul rate of the retrieval of files that are cached in the SBSs is

\[
\sum_{i=1}^{F}p_{i}\lceil\mu_{i}\rceil\sum_{b=0}^{N_{\mathsf{SBS}}}\gamma_{b}\max\left(0,1/\mu_{i}-b\right)\mu_{i}. \tag{10}
\]

On the other hand, the files that are not cached are retrieved completely from the MBS, and their contribution to the backhaul rate is

\[
\sum_{i=1}^{F}p_{i}\lfloor 1-\mu_{i}\rfloor. \tag{11}
\]

Combining (10) and (11) completes the proof. ∎

We denote by $\textnormal{R}_{\mathsf{noPIR}}^{*}$ the minimum backhaul rate resulting from the optimization of the content placement. $\textnormal{R}_{\mathsf{noPIR}}^{*}$ can be obtained by solving the following optimization problem,

\begin{align*}
\textnormal{R}_{\mathsf{noPIR}}^{*}=\min_{\mu_{i}\in\mathcal{M}^{\prime}}~~&\sum_{i=1}^{F}p_{i}\lceil\mu_{i}\rceil\sum_{b=0}^{N_{\mathsf{SBS}}}\gamma_{b}\max\big(0,1/\mu_{i}-b\big)\mu_{i}+\sum_{i=1}^{F}p_{i}\lfloor 1-\mu_{i}\rfloor\\
\text{s.t.}~~&\sum_{i=1}^{F}\mu_{i}\leq M,
\end{align*}

where $\mathcal{M}^{\prime}=\mathcal{M}\cup\{1/N_{\mathsf{SBS}}\}$, as $\mu_{i}=1/N_{\mathsf{SBS}}$ is a valid value for the case where PIR is not required.

In the following lemma, we show that the proposed content placement is equivalent to the one in [7], in the sense that it yields the same average backhaul rate.

Lemma 2.

The average backhaul rate given by (9) for the caching scheme $\mathcal{C}_{\mathsf{MDS}}^{\bm{\mu}}$ in Section II is equal to the one given by the caching scheme in [7], i.e., the two content placements are equivalent.

Proof:

We can rewrite (9) as

\begin{align*}
\textnormal{R}_{\mathsf{noPIR}}
&=\sum_{i=1}^{F}p_{i}\lceil\mu_{i}\rceil\sum_{b=0}^{N_{\mathsf{SBS}}}\gamma_{b}\max\big(0,1/\mu_{i}-b\big)\mu_{i}+\sum_{i=1}^{F}p_{i}\lfloor 1-\mu_{i}\rfloor\\
&=\sum_{i=1}^{F}p_{i}\lceil\mu_{i}\rceil\sum_{b=0}^{N_{\mathsf{SBS}}}\gamma_{b}\max\big(0,1-b\mu_{i}\big)+\sum_{i=1}^{F}p_{i}\lfloor 1-\mu_{i}\rfloor\\
&=\sum_{i=1}^{F}p_{i}\lceil\mu_{i}\rceil\sum_{b=0}^{N_{\mathsf{SBS}}}\gamma_{b}\big(1-\min\big(1,b\mu_{i}\big)\big)+\sum_{i=1}^{F}p_{i}\lfloor 1-\mu_{i}\rfloor\\
&\stackrel{(a)}{=}\sum_{i=1}^{F}p_{i}(\lceil\mu_{i}\rceil+\lfloor 1-\mu_{i}\rfloor)\sum_{b=0}^{N_{\mathsf{SBS}}}\gamma_{b}\big(1-\min\big(1,b\mu_{i}\big)\big)\\
&=\sum_{i=1}^{F}p_{i}\sum_{b=0}^{N_{\mathsf{SBS}}}\gamma_{b}\big(1-\min\big(1,b\mu_{i}\big)\big),
\end{align*}

which is the expression in [7, eq. (1)]. Step $(a)$ follows from the fact that we can write $p_{i}\lfloor 1-\mu_{i}\rfloor$ as $p_{i}\lfloor 1-\mu_{i}\rfloor\sum_{b=0}^{N_{\mathsf{SBS}}}\gamma_{b}\big(1-\min\big(1,b\mu_{i}\big)\big)$. For $0<\mu_{i}\leq 1$ both expressions are zero, while for $\mu_{i}=0$ both expressions reduce to $p_{i}$, as $p_{i}\lfloor 1-\mu_{i}\rfloor\sum_{b=0}^{N_{\mathsf{SBS}}}\gamma_{b}\big(1-\min\big(1,b\mu_{i}\big)\big)=p_{i}\sum_{b=0}^{N_{\mathsf{SBS}}}\gamma_{b}$ and $\sum_{b=0}^{N_{\mathsf{SBS}}}\gamma_{b}=1$. The last equality holds because $\lceil\mu_{i}\rceil+\lfloor 1-\mu_{i}\rfloor=1$ for all $0\leq\mu_{i}\leq 1$. ∎
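The equivalence stated in Lemma 2 is easy to check numerically. The sketch below evaluates both the original expression for $\textnormal{R}_{\mathsf{noPIR}}$ from Proposition 1 and the simplified final expression of the derivation on a toy instance. It uses exact rational arithmetic to avoid rounding issues in the floor and ceiling terms; the function names and the toy values of $\bm{p}$, $\bm{\mu}$, and $\bm{\gamma}$ are assumptions made for illustration only.

```python
from fractions import Fraction
from math import ceil, floor


def rate_no_pir(p, mu, gamma):
    """Backhaul rate of Proposition 1; gamma[b] is the probability of b SBSs in range."""
    cached = sum(
        pi * ceil(mi) * sum(g * max(0, 1 / mi - b) * mi for b, g in enumerate(gamma))
        for pi, mi in zip(p, mu)
        if mi > 0  # uncached files have ceil(mu_i) = 0 and contribute nothing here
    )
    uncached = sum(pi * floor(1 - mi) for pi, mi in zip(p, mu))
    return cached + uncached


def rate_no_pir_simplified(p, mu, gamma):
    """Final expression in the proof of Lemma 2."""
    return sum(
        pi * sum(g * (1 - min(1, b * mi)) for b, g in enumerate(gamma))
        for pi, mi in zip(p, mu)
    )


# Toy instance (hypothetical values): three files, at most three SBSs in range.
p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
mu = [Fraction(1), Fraction(1, 2), Fraction(0)]
gamma = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]

r_original = rate_no_pir(p, mu, gamma)
r_simplified = rate_no_pir_simplified(p, mu, gamma)
```

For this instance both expressions evaluate to $31/100$.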

For popular content placement, i.e., the case where the $M$ most popular files are cached in all SBSs (this corresponds to caching the $M$ most popular files using an $(N_{\mathsf{SBS}},1)$ repetition code, i.e., $\mu_{i}=1$ for $i\leq M$ and $\mu_{i}=0$ for $i>M$), the backhaul rate is given by

\[
\textnormal{R}_{\mathsf{noPIR}}^{\mathsf{pop}}=\gamma_{0}\sum_{i=1}^{M}p_{i}+\sum_{i=M+1}^{F}p_{i}. \tag{12}
\]

V Backhaul Rate Analysis: PIR Case

In this section, we derive the backhaul rate for the case of PIR (i.e., when the user wishes to download content privately) and we prove that uniform content placement (under the PIR protocol in Section III with GRS codes) is optimal. The average backhaul rate is given in the following proposition.

Proposition 2.

The average backhaul rate for the caching scheme $\mathcal{C}_{\mathsf{MDS}}^{\bm{\mu}}$ in Section II (with GRS codes) for the PIR case is

\[
\textnormal{R}_{\mathsf{PIR}}=\frac{\mu_{\mathsf{max}}}{\mu_{\mathsf{min}}(n-T+1)-1}\sum_{i=1}^{F}p_{i}\lceil\mu_{i}\rceil\sum_{b=0}^{n}\gamma_{b}(n-b)+\sum_{i=1}^{F}p_{i}\lfloor 1-\mu_{i}\rfloor. \tag{13}
\]
Proof:

To download file $\bm{X}^{(i)}$, the user generates $n$ query matrices. If the user is in communication range of $b$ SBSs, it receives $b$ responses (one from each SBS). The responses to the remaining $n-b$ query matrices need to be downloaded from the MBS. Since each response consists of $d$ subresponses of size $L\mu_{\mathsf{max}}$ bits, the user downloads $(n-b)dL\mu_{\mathsf{max}}$ bits from the MBS. Averaging over $\bm{\gamma}$ and $\bm{p}$ (for the files cached) and normalizing by the file size $\beta L$, the contribution to the backhaul rate of the retrieval of files that are cached in the SBSs is

\[
\frac{1}{\beta}\sum_{i=1}^{F}p_{i}\lceil\mu_{i}\rceil\sum_{b=0}^{n}\gamma_{b}(n-b)d\mu_{\mathsf{max}}. \tag{14}
\]

Now, using the fact that $\beta=\Gamma=n-(k_{\mathsf{max}}+T-1)=\frac{\mu_{\mathsf{min}}(n-T+1)-1}{\mu_{\mathsf{min}}}$ and $d=k_{\mathsf{max}}=1/\mu_{\mathsf{min}}$ (see Theorem 1), we can rewrite (14) as

\[
\frac{\mu_{\mathsf{max}}}{\mu_{\mathsf{min}}(n-T+1)-1}\sum_{i=1}^{F}p_{i}\lceil\mu_{i}\rceil\sum_{b=0}^{n}\gamma_{b}(n-b). \tag{15}
\]

On the other hand, the files that are not cached are retrieved completely from the MBS, and their contribution to the backhaul rate is (as for the no PIR case)

\[
\sum_{i=1}^{F}p_{i}\lfloor 1-\mu_{i}\rfloor. \tag{16}
\]

Combining (15) and (16) completes the proof. ∎
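As a quick sanity check of Proposition 2, the backhaul rate for the PIR case can be evaluated directly. The sketch below is a minimal illustration, assuming that $\mu_{\mathsf{max}}$ and $\mu_{\mathsf{min}}$ are taken over the cached files only (consistent with $k_{\mathsf{max}}=1/\mu_{\mathsf{min}}$); the function name `rate_pir` and the toy inputs are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor


def rate_pir(p, mu, gamma, n, T):
    """Average backhaul rate of Proposition 2 for placement mu, n contacted SBSs, T spies.

    gamma[b] is the probability that b SBSs are within range.
    Assumption: mu_max and mu_min range over the cached files (mu_i > 0) only.
    """
    uncached = sum(pi * floor(1 - mi) for pi, mi in zip(p, mu))
    cached_mu = [mi for mi in mu if mi > 0]
    if not cached_mu:
        return uncached  # nothing cached: every file comes from the MBS
    mu_max, mu_min = max(cached_mu), min(cached_mu)
    denom = mu_min * (n - T + 1) - 1
    if denom <= 0:
        raise ValueError("n must be at least 1/mu_min + T")
    # Expected number of responses that have to be fetched from the MBS.
    missing = sum(g * (n - b) for b, g in enumerate(gamma[: n + 1]))
    cached_mass = sum(pi * ceil(mi) for pi, mi in zip(p, mu))
    return mu_max / denom * cached_mass * missing + uncached


# Toy instance (hypothetical): two fully cached files, 2 or 3 SBSs in range.
p = [Fraction(1, 2), Fraction(1, 2)]
gamma = [0, 0, Fraction(1, 2), Fraction(1, 2)]
r = rate_pir(p, [Fraction(1), Fraction(1)], gamma, n=3, T=1)
```

Here $\mu_{\mathsf{max}}/(\mu_{\mathsf{min}}(n-T+1)-1)=1/2$ and on average half a response is missing, which gives a rate of $1/4$.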

V-A Optimal Content Placement

Let $\textnormal{R}_{\mathsf{PIR}}^{*}$ be the minimum backhaul rate resulting from the optimization of the content placement. $\textnormal{R}_{\mathsf{PIR}}^{*}$ can be obtained by solving the following optimization problem,

\begin{align*}
\textnormal{R}_{\mathsf{PIR}}^{*}=\min_{\substack{\mu_{i}\in\mathcal{M}\\ n\in\mathcal{A}}}~&\frac{\mu_{\mathsf{max}}}{\mu_{\mathsf{min}}(n-T+1)-1}\sum_{i=1}^{F}p_{i}\lceil\mu_{i}\rceil\sum_{b=0}^{n}\gamma_{b}(n-b)+\sum_{i=1}^{F}p_{i}\lfloor 1-\mu_{i}\rfloor \tag{17}\\
\text{s.t.}~&\sum_{i=1}^{F}\mu_{i}\leq M\;\text{and}\;k_{\mathsf{min}}\mid k_{i},
\end{align*}

where $\mathcal{A}=\{1/\mu_{\mathsf{min}}+T,\ldots,N_{\mathsf{SBS}}\}$ and the minimum value that $n$ can take on, i.e., $1/\mu_{\mathsf{min}}+T$, comes from the fact that $\mu_{\mathsf{min}}(n-T+1)-1$ has to be positive.

Lemma 3.

Uniform content allocation, i.e., $\mu_{i}=\mu$ for all files that are cached, is optimal. Furthermore, the optimal number of files to cache is the maximum possible, i.e., $\mu_{i}=\mu$ for $i\leq\min(M/\mu,F)$.

Proof:

We first prove the first part of the lemma. We need to show that either the optimal solution to the optimization problem in (17) is the all-zero vector $\bm{\mu}=(\mu_{1},\ldots,\mu_{F})=(0,\ldots,0)$, or there exists a nonzero optimal solution $\bm{\mu}=(\mu_{1},\ldots,\mu_{F})$ for which $\mu_{\mathsf{max}}=\mu_{\mathsf{min}}$. Consider the second case, and let $\bm{\mu}$ denote any nonzero feasible solution to (17), i.e., a nonzero solution that satisfies the cache size constraint. Furthermore, let $\bm{\mu}^{\prime}=(\mu^{\prime}_{1},\ldots,\mu^{\prime}_{F})$ denote the length-$F$ vector obtained from $\bm{\mu}$ as $\mu^{\prime}_{i}=\mu_{\mathsf{min}}$ for $\mu_{i}\neq 0$ and $\mu^{\prime}_{i}=0$ otherwise. Clearly, $\bm{\mu}^{\prime}$ satisfies the cache size constraint as well. Note that $\mu_{\mathsf{max}}^{\prime}=\mu_{\mathsf{min}}^{\prime}=\mu_{\mathsf{min}}$. Thus,

\begin{align*}
\frac{\mu_{\mathsf{max}}^{\prime}}{\mu_{\mathsf{min}}^{\prime}(n-T+1)-1}&=\frac{\mu_{\mathsf{min}}}{\mu_{\mathsf{min}}(n-T+1)-1}\\
&\leq\frac{\mu_{\mathsf{max}}}{\mu_{\mathsf{min}}(n-T+1)-1}.
\end{align*}

Furthermore, since both the double summation in the first term of the objective function in (17) and the second term in (17) only depend on the support of $\bm{\mu}$, it follows that the value of the objective function for $\bm{\mu}^{\prime}$ is smaller than or equal to the value of the objective function for $\bm{\mu}$. Thus, for any nonzero feasible solution $\bm{\mu}$ there exists another nonzero feasible solution $\bm{\mu}^{\prime}$ that is at least as good and for which all nonzero entries are the same (i.e., $\mu_{\mathsf{min}}^{\prime}=\mu_{\mathsf{max}}^{\prime}=\mu$), and the result follows by applying the above procedure to a (nonzero) optimal solution to (17).

We now prove the second part of the lemma. Caching a file helps in reducing the backhaul rate if

\[
\frac{\mu}{\mu(n-T+1)-1}\sum_{b=0}^{n}\gamma_{b}(n-b)<1, \tag{18}
\]

for some $n\in\mathcal{A}$ and $\mu\in\mathcal{M}$. This is independent of the file index $i$. Thus, if the optimal solution is to cache at least one file ($\bm{\mu}\neq\bm{0}$), (18) is met for some $n\in\mathcal{A}$ and caching other files (as many files as permitted up to the cache size constraint, in decreasing order of popularity) is optimal, as it further reduces the backhaul rate. ∎

Following Lemma 3, the optimization problem in (17) can be rewritten as

\[
\textnormal{R}_{\mathsf{PIR}}^{*}=\min_{\substack{\mu\in\mathcal{M}\\ n\in\mathcal{A}}}\;\frac{\mu}{\mu(n-T+1)-1}\sum_{i=1}^{\min(M/\mu,F)}p_{i}\sum_{b=0}^{n}\gamma_{b}(n-b)+\sum_{i=M/\mu+1}^{F}p_{i}. \tag{19}
\]
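Because Lemma 3 reduces the placement to a single rate $\mu$ and the number of contacted SBSs $n$, the optimized backhaul rate $\textnormal{R}_{\mathsf{PIR}}^{*}$ can be found by exhaustive search. The following is a minimal sketch under stated assumptions: the candidate set of rates is passed in by the caller (the set $\mathcal{M}$ is defined elsewhere in the paper), $M/\mu$ is rounded down to an integer number of files, and the function name and toy inputs are hypothetical.

```python
from fractions import Fraction


def optimize_uniform_placement(p, gamma, M, T, n_sbs, rates):
    """Brute-force search over (mu, n) for the uniform-placement backhaul rate.

    p      -- file popularities, sorted in decreasing order
    gamma  -- gamma[b] = probability that b SBSs are within range
    rates  -- candidate values of mu = 1/k (assumption: supplied by the caller)
    Returns (best_rate, best_mu, best_n); (sum(p), None, None) means no caching.
    """
    best = (sum(p), None, None)  # baseline: fetch everything from the MBS
    for mu in rates:
        k = int(1 / mu)
        n_files = min(int(M / mu), len(p))       # number of files that fit in the cache
        head, tail = sum(p[:n_files]), sum(p[n_files:])
        for n in range(k + T, n_sbs + 1):        # n in A = {1/mu + T, ..., N_SBS}
            missing = sum(g * (n - b) for b, g in enumerate(gamma[: n + 1]))
            rate = mu / (mu * (n - T + 1) - 1) * head * missing + tail
            if rate < best[0]:
                best = (rate, mu, n)
    return best


# Toy instance (hypothetical): three files, always exactly three SBSs in range.
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
gamma = [0, 0, 0, 1]
best_rate, best_mu, best_n = optimize_uniform_placement(
    p, gamma, M=1, T=1, n_sbs=3, rates=[Fraction(1), Fraction(1, 2)]
)
```

In this toy instance, halving the rate lets two files fit in the cache, and with three SBSs always in range no response is missing, so the search returns $\mu=1/2$, $n=3$, and a backhaul rate of $1/4$.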

V-B Popular Content Placement

For popular content placement, the backhaul rate is given by

\[
\textnormal{R}_{\mathsf{PIR}}^{\mathsf{pop}}=\min_{n\in\mathcal{A}}\;\frac{1}{n-T}\sum_{i=1}^{M}p_{i}\sum_{b=0}^{n}\gamma_{b}(n-b)+\sum_{i=M+1}^{F}p_{i}. \tag{20}
\]

Note that the optimization over $n$ is still required.

VI Weighted Communication Rate

So far, we have considered only the backhaul rate. However, it might also be desirable to limit the communication rate from the SBSs to the user. We thus consider the weighted communication rate, $\textnormal{C}_{\mathsf{PIR}}$, defined as

\[
\textnormal{C}_{\mathsf{PIR}}=\textnormal{R}_{\mathsf{PIR}}+\theta\textnormal{D}_{\mathsf{PIR}},
\]

where $\textnormal{D}_{\mathsf{PIR}}$ is the average communication rate (normalized by the file size $\beta L$) from the SBSs, and $\theta$ is a weighting parameter. We consider $\theta\leq 1$, stemming from the fact that the bottleneck is the backhaul. Note that minimizing the average backhaul rate corresponds to $\theta=0$. (For the case of no PIR, a linear scalarization of the MBS and SBS download delays was considered in [5]. The communication rate is directly related to the download delay.)

Proposition 3.

The average communication rate from the SBSs for the caching scheme $\mathcal{C}_{\mathsf{MDS}}^{\bm{\mu}}$ in Section II (with GRS codes) for the PIR case is

\[
\textnormal{D}_{\mathsf{PIR}}=\frac{\mu_{\mathsf{max}}}{\mu_{\mathsf{min}}(n-T+1)-1}\sum_{b=0}^{n}\tilde{\gamma}_{b}b, \tag{21}
\]

where $\tilde{\gamma}_{b}=\gamma_{b}$ for $b<n$ and $\tilde{\gamma}_{n}=\sum_{b=n}^{N_{\mathsf{SBS}}}\gamma_{b}$.

Proof:

To ensure privacy, the user needs to download data from the SBSs within visibility regardless of whether the requested file is cached or not. This is in contrast to the case of no PIR. Note that, if the user queried the SBSs only when the requested file is cached, then the spy SBSs would infer that the user is interested in one of the cached files, thus gaining some information about the requested file. In other words, the user sends dummy queries and downloads data that is useless for the retrieval of the file but is necessary to achieve privacy. The user receives $b$ responses from the $b$ SBSs within communication range, each of size $dL\mu_{\mathsf{max}}$ bits. Let $\tilde{\gamma}_{b}$ denote the probability of receiving responses from $b$ SBSs. For $b<n$, $\tilde{\gamma}_{b}$ is equal to the probability that $b$ SBSs are within communication range, i.e., $\tilde{\gamma}_{b}=\gamma_{b}$. On the other hand, the probability of receiving responses from $n$ SBSs, $\tilde{\gamma}_{n}$, is the probability that at least $n$ SBSs are within communication range, i.e., $\tilde{\gamma}_{n}=\sum_{b=n}^{N_{\mathsf{SBS}}}\gamma_{b}$. Averaging over $\tilde{\bm{\gamma}}$ and $\bm{p}$ (for all files, cached and not cached) and normalizing by the file size $\beta L$, the contribution to the communication rate of the retrieval of a file from the SBSs is

\[
\frac{1}{\beta}\sum_{i=1}^{F}p_{i}\sum_{b=0}^{n}\tilde{\gamma}_{b}bd\mu_{\mathsf{max}}. \tag{22}
\]

Now, using the fact that $\sum_{i=1}^{F}p_{i}=1$, $\beta=\Gamma=n-(k_{\mathsf{max}}+T-1)=\frac{\mu_{\mathsf{min}}(n-T+1)-1}{\mu_{\mathsf{min}}}$, and $d=k_{\mathsf{max}}=1/\mu_{\mathsf{min}}$ (see Theorem 1), we can rewrite (22) as (21). ∎
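The truncated distribution $\tilde{\bm{\gamma}}$ and the resulting SBS communication rate of Proposition 3 can be computed as follows. This is a small sketch for illustration; the function name and the toy distribution are assumptions, and exact fractions are used only to make the result easy to verify by hand.

```python
from fractions import Fraction


def sbs_rate_pir(mu_max, mu_min, gamma, n, T):
    """Average (normalized) download from the SBSs, as in Proposition 3.

    gamma[b] is the probability that b SBSs are in range. The user contacts at
    most n of them, so all mass at b >= n is folded into the entry for b = n.
    """
    gamma_tilde = list(gamma[:n]) + [sum(gamma[n:])]
    mean_responses = sum(b * g for b, g in enumerate(gamma_tilde))
    return mu_max / (mu_min * (n - T + 1) - 1) * mean_responses


# Toy distribution (hypothetical): 2, 3, or 4 SBSs in range.
gamma = [0, 0, Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
d_pir = sbs_rate_pir(Fraction(1), Fraction(1), gamma, n=3, T=1)
```

With $n=3$, the folded distribution is $(0,0,1/4,3/4)$, the mean number of responses is $11/4$, and the prefactor is $1/2$, giving $\textnormal{D}_{\mathsf{PIR}}=11/8$ for this toy input.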

The corresponding optimization problem is

\begin{align*}
\textnormal{C}_{\mathsf{PIR}}^{*}=\min_{\substack{\mu_{i}\in\mathcal{M}\\ n\in\mathcal{A}}}\;&\textnormal{R}_{\mathsf{PIR}}+\theta\textnormal{D}_{\mathsf{PIR}} \tag{23}\\
\text{s.t.}~&\sum_{i=1}^{F}\mu_{i}\leq M\;\text{and}\;k_{\mathsf{min}}\mid k_{i},
\end{align*}

where $\textnormal{R}_{\mathsf{PIR}}$ is given in (13).

Lemma 4.

Uniform content allocation, i.e., $\mu_{i}=\mu$ for all files that are cached, is optimal. Furthermore, the optimal number of files to cache is the maximum possible, i.e., $\mu_{i}=\mu$ for $i\leq\min(M/\mu,F)$.

Proof:

The proof of Lemma 3 applies to both terms in (23) and the result follows. ∎

Following Lemma 4, the optimization problem in (23) can be rewritten as

\begin{align*}
\textnormal{C}_{\mathsf{PIR}}^{*}=\min_{\substack{\mu\in\mathcal{M}\\ n\in\mathcal{A}}}~~&\frac{\mu}{\mu(n-T+1)-1}\sum_{i=1}^{\min(M/\mu,F)}p_{i}\sum_{b=0}^{n}\gamma_{b}(n-b)\\
&+\sum_{i=M/\mu+1}^{F}p_{i}+\theta\frac{\mu}{\mu(n-T+1)-1}\sum_{b=0}^{n}\tilde{\gamma}_{b}b. \tag{24}
\end{align*}

VII Numerical Results

For the numerical results in this section, we assume that the file popularity distribution $\bm{p}$ follows the Zipf law [26], i.e., the popularity of file $\bm{X}^{(i)}$ is

\[
p_{i}=\frac{1/i^{\alpha}}{\sum_{\ell}1/\ell^{\alpha}},
\]

where $\alpha\in[0.5,1.5]$ is the skewness factor [7] and by definition $p_{1}\geq p_{2}\geq\ldots\geq p_{F}$. In Figs. 3 and 4, we consider a network topology where SBSs are deployed over a macro-cell of radius $D$ meters according to a regular grid with distance $d$ meters between them [7, 5]. Each SBS has a communication radius of $r$ meters. Let $\mathcal{R}_{b}$ be the area where a user can be served by $b$ SBSs. Then, assuming that the users are uniformly distributed over the macro-cell area with density $\phi$ users per square meter, the probability that a user is in communication range of $b$ SBSs can be calculated as in [7]

\[
\gamma_{b}=\frac{\phi\mathcal{R}_{b}}{\phi\sum_{a=1}^{N_{\mathsf{max}}}\mathcal{R}_{a}},
\]

where the areas $\mathcal{R}_{b}$ can be obtained by simple geometrical evaluations, and $N_{\mathsf{max}}$ is the maximum number of SBSs within communication range of a user.

For the results in Figs. 3 and 4, the system parameters (taken from [7]) are $D=500$ meters, which results in $N_{\mathsf{SBS}}=316$ over the macro-cell area, $F=200$ files, $\alpha=0.7$, and $r=60$ meters. This results in $\bm{\gamma}=(0,0,0.1736,0.5113,0.3151,0,\ldots,0)$, i.e., the maximum number of SBSs in visibility of a user is $N_{\mathsf{max}}=4$.
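The two inputs of the numerical evaluation, the Zipf popularity profile and the coverage distribution $\bm{\gamma}$, are simple to set up. The sketch below builds $\bm{p}$ for $F=200$ and $\alpha=0.7$ and uses the vector $\bm{\gamma}$ quoted above; the function name `zipf` and the derived quantity `mean_sbs_in_range` are illustrative additions, not values reported in the paper.

```python
def zipf(F, alpha):
    """Zipf popularity profile p_i proportional to 1 / i**alpha, for i = 1, ..., F."""
    weights = [1.0 / i**alpha for i in range(1, F + 1)]
    total = sum(weights)
    return [w / total for w in weights]


p = zipf(200, 0.7)

# gamma[b] = probability that b SBSs are in range (values quoted in the text).
gamma = [0, 0, 0.1736, 0.5113, 0.3151]

# Illustrative derived quantity: the average number of SBSs within range.
mean_sbs_in_range = sum(b * g for b, g in enumerate(gamma))
```

The distribution sums to one, and with these values a user always sees at least two SBSs.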

Figure 3: Backhaul rate as a function of the cache size constraint $M$ for a system with $F=200$ files, $N_{\mathsf{SBS}}=316$, and $\alpha=0.7$.

In Fig. 3, we plot the optimized backhaul rate $\textnormal{R}_{\mathsf{PIR}}^{*}$ (red, solid lines) according to (19) as a function of the cache size constraint $M$ for the noncolluding case ($T=1$) and for $T=2$ and $T=3$ colluding SBSs. The curves in Fig. 3 should be interpreted as the minimum backhaul rate that is necessary in order to achieve privacy against $T$ spy SBSs out of the $n$ SBSs that are contacted by the user. For the particular system parameters considered, the optimal value of $n$ is $3$ for $T=1$ and $T=2$, and all values of $M$, i.e., the scheme yields privacy against $T$ spy SBSs out of the $n=3$ SBSs contacted. For $T=3$ the optimal value of $n$ is $4$ for all values of $M$, and thus the scheme yields privacy against $3$ spy SBSs out of $n=4$ SBSs. We also plot the optimized backhaul rate $\textnormal{R}_{\mathsf{noPIR}}^{*}$ for the case of no PIR. (The curve $\textnormal{R}_{\mathsf{noPIR}}^{*}$ in the figure is identical to that in [7, Fig. 4]. As proved in Lemma 2, while the proposed content placement is different from the one in [7], they are equivalent in terms of average backhaul rate.) As can be seen in the figure, caching helps in significantly reducing the backhaul rate for $T=1$ and $T=2$. For $T=3$ caching also helps in reducing the backhaul rate, but the reduction is smaller. Also, as expected, compared to the case of no PIR ($\textnormal{R}_{\mathsf{noPIR}}^{*}$, black, solid line), achieving privacy requires a higher backhaul rate. The required backhaul rate increases with the number of colluding SBSs $T$.

Figure 4: Optimized weighted communication rate as a function of the cache size constraint $M$ for a system with $T=1$ spy SBS, $F=200$ files, $N_{\mathsf{SBS}}=316$, $\alpha=0.7$, and several values of $\theta$.

For $M\geq 100$ and no PIR, the backhaul rate is zero, as all files can be downloaded from the SBSs. Indeed, for $M=100$, we can select $k_{i}=2~\forall i$ and cache one coded symbol from each stripe of each file in each SBS (thus satisfying the constraint $\sum_{i=1}^{F}\mu_{i}\leq M$, as $\sum_{i=1}^{200}\mu_{i}=\sum_{i=1}^{200}1/k_{i}=\sum_{i=1}^{200}0.5=100$). Since, for no PIR, it is enough to download $2$ symbols from each stripe of a file to retrieve it (due to the MDS property) and, according to $\bm{\gamma}$, at least $2$ SBSs are always within range, for $M=100$ (and hence for $M>100$ as well) the user can always retrieve the file from the SBSs and the backhaul rate is zero. For the case of PIR and $T=1$, on the other hand, the required backhaul rate is positive unless all complete files can be cached in all SBSs, i.e., $M=F$. For $T=2$ and $T=3$, even for $M=F$ the backhaul rate is not zero. This is because in this case the user needs to receive $n=3$ and $n=4$ responses $\bm{r}^{(l)}$, $l=1,\ldots,n$, respectively (from the SBSs or the MBS). However, for the considered system parameters the probability that the user has $b\geq 3$ SBSs within range is not one. Thus, with positive probability the user needs to download data from the MBS to recover the file, and the average backhaul rate is positive.
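The claim that the no-PIR backhaul rate vanishes at $M=100$ can be reproduced with the simplified rate expression from the proof of Lemma 2. The sketch below is an illustration only; the function name is hypothetical, and a uniform popularity profile is used as a placeholder because the result for $k_{i}=2$ does not depend on $\bm{p}$.

```python
def rate_no_pir(p, mu, gamma):
    """No-PIR backhaul rate in the simplified form derived in the proof of Lemma 2."""
    return sum(
        pi * sum(g * (1 - min(1, b * mi)) for b, g in enumerate(gamma))
        for pi, mi in zip(p, mu)
    )


F = 200
gamma = [0, 0, 0.1736, 0.5113, 0.3151]   # coverage distribution quoted in the text
p = [1.0 / F] * F                        # placeholder popularity profile (assumption)

mu_half = [0.5] * F                      # k_i = 2 for all files, total cache use 100
rate_k2 = rate_no_pir(p, mu_half, gamma)

mu_third = [1.0 / 3] * F                 # k_i = 3 for comparison, smaller cache use
rate_k3 = rate_no_pir(p, mu_third, gamma)
```

With $k_{i}=2$ every user sees enough SBSs and the rate is zero, whereas with $k_{i}=3$ the users that see only two SBSs (probability $0.1736$) still need one third of the file from the MBS.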

For comparison purposes, in the figure we also plot the backhaul rate for the case of popular content placement $\textnormal{R}_{\mathsf{PIR}}^{\mathsf{pop}}$ in (20) (blue, dashed lines). In this case, the optimal value of $n$ is $2$, $3$, and $4$ for $T=1$, $T=2$, and $T=3$, respectively. We remark that the curve $\textnormal{R}_{\mathsf{PIR}}^{\mathsf{pop}}$ for $T=1$ overlaps with the curve $\textnormal{R}_{\mathsf{noPIR}}^{\mathsf{pop}}$. This is due to the fact that for $T=1$, $n=2$, and $\gamma_{0}=\gamma_{1}=0$, $\textnormal{R}_{\mathsf{PIR}}^{\mathsf{pop}}$ in (20) reduces to $\sum_{i=M+1}^{F}p_{i}$, which is $\textnormal{R}_{\mathsf{noPIR}}^{\mathsf{pop}}$ in (12). However, for the general case, i.e., other $\bm{\gamma}$, $\textnormal{R}_{\mathsf{PIR}}^{\mathsf{pop}}$ and $\textnormal{R}_{\mathsf{noPIR}}^{\mathsf{pop}}$ may differ. As already shown in [7], for no PIR the optimized content placement yields a significantly lower backhaul rate than popular content placement. For the PIR case and $T=1$, up to $M=118$ the optimized content placement also yields some performance gains with respect to popular content placement, albeit not as significant as for the case of no PIR. Interestingly, as shown in the figure, for $M\geq 119$, PIR popular content placement is optimal. Furthermore, as shown in the figure, for $T=2$ and $T=3$ popular content placement is optimal for all $M$.
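The optimal values of $n$ for popular content placement and the overlap with the no-PIR curve for $T=1$ can be reproduced from (20) and (12) with the quoted $\bm{\gamma}$. The sketch below is illustrative; the function names are hypothetical and the cache size $M=50$ is an arbitrary choice for the check.

```python
def zipf(F, alpha):
    """Zipf popularity profile p_i proportional to 1 / i**alpha."""
    weights = [1.0 / i**alpha for i in range(1, F + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def rate_pir_popular(p, gamma, M, T, n_sbs):
    """Minimize the popular-placement PIR backhaul rate over n in {1 + T, ..., n_sbs}."""
    head, tail = sum(p[:M]), sum(p[M:])
    best_rate, best_n = None, None
    for n in range(T + 1, n_sbs + 1):
        # Expected number of responses that must come from the MBS.
        missing = sum(g * (n - b) for b, g in enumerate(gamma[: n + 1]))
        rate = head * missing / (n - T) + tail
        if best_rate is None or rate < best_rate:
            best_rate, best_n = rate, n
    return best_rate, best_n


def rate_no_pir_popular(p, gamma, M):
    """Popular-placement backhaul rate without PIR."""
    return gamma[0] * sum(p[:M]) + sum(p[M:])


gamma = [0, 0, 0.1736, 0.5113, 0.3151]
p = zipf(200, 0.7)
M = 50  # arbitrary cache size used for this check (assumption)

best = {T: rate_pir_popular(p, gamma, M, T, n_sbs=316) for T in (1, 2, 3)}
```

For these inputs the search selects $n=2$, $3$, and $4$ for $T=1$, $2$, and $3$, and the $T=1$ rate coincides with the no-PIR popular-placement rate.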

Figure 5: Backhaul rate as a function of the density of SBSs $\lambda$ and several values of $M$ for the scenario where SBSs are distributed according to a PPP and $T=1$. $F=200$ files and $\alpha=0.7$. Solid lines correspond to optimal content placement ($\textnormal{R}_{\mathsf{PIR}}^{*}$ in (19)) and dashed lines to popular content placement ($\textnormal{R}_{\mathsf{PIR}}^{\mathsf{pop}}$ in (20)).

In Fig. 4, we plot the optimized weighted communication rate $\textnormal{C}_{\mathsf{PIR}}^{*}$ in (24) for the noncolluding case ($T=1$) as a function of the cache size constraint $M$ and several values of $\theta$. For the considered system parameters, caching is still useful for small values of $\theta$ if the cache size is big enough. For example, for $\theta=0.5$ caching helps in reducing the weighted communication rate with respect to no caching for $M\geq 87$. For $\theta\geq 0.7$, caching does not bring any reduction of the weighted communication rate.

Figure 6: Backhaul rate as a function of the density of SBSs $\lambda$ and several values of $M$ for the scenario where SBSs are distributed according to a PPP and $T=2$ and $T=4$. $F=200$ files and $\alpha=0.7$. Solid lines correspond to optimal content placement ($\textnormal{R}_{\mathsf{PIR}}^{*}$ in (19)) and dashed lines to popular content placement ($\textnormal{R}_{\mathsf{PIR}}^{\mathsf{pop}}$ in (20)).

In Figs. 5 and 6, we plot the backhaul rate for a PPP deployment model where SBSs are distributed over the plane according to a PPP and a user at an arbitrary location in the plane can connect to all SBSs that are within radius $r_{\mathsf{u}}$. Let $\lambda$ be the density of SBSs per square meter. For this scenario, the probability that a user is in communication range of $b$ SBSs is given by [27]

\[
\gamma_{b}=\mathrm{e}^{-\psi}\frac{\psi^{b}}{b!},
\]

where $\psi=\lambda\pi r^{2}_{\mathsf{u}}$. In Fig. 5, we plot the optimized backhaul rate ($\textnormal{R}_{\mathsf{PIR}}^{*}$ in (19), solid lines) as a function of the density $\lambda$ for $F=200$ files, $\alpha=0.7$, $r_{\mathsf{u}}=60$ meters, different cache size constraints $M$, and a single spy SBS, i.e., $T=1$. For small densities, caching does not help in reducing the backhaul rate. However, as expected, the required backhaul rate diminishes as the density of SBSs increases. For comparison purposes, we also plot the backhaul rate for popular content placement ($\textnormal{R}_{\mathsf{PIR}}^{\mathsf{pop}}$ in (20), dashed lines). Interestingly, popular content placement is optimal up to a given density of SBSs, after which optimizing the content placement brings a significant reduction of the required backhaul rate. Similar results are observed for $T=2$ and $T=4$ colluding SBSs in Fig. 6 with the same system parameters as in Fig. 5. In Figs. 5 and 6, for each $M$ the optimal values of $n$ and $\mu$ depend on the density of SBSs. Typically, a pair $(n,\mu)$ is optimal for a range of densities. In the figures, we give the optimal values of $n$ and $k$ for $M=50$ (in particular, we give the pair $(n,k)$, with $k=1/\mu$, which are also the code parameters of the punctured code $\mathcal{C}^{\prime}$). For convenience, in the figures we only give the parameters for the densities where the optimal pair $(n,k)$ changes. The values should be read as follows: In Fig. 5, walking the curve from top-left to bottom-right, no caching is optimal for densities up to $\lambda=8\cdot 10^{-5}$. For $\lambda=9\cdot 10^{-5}$, $(4,1)$ is optimal. Then, $(3,1)$ is optimal for densities $\lambda=10^{-4}$ to $\lambda=1.2\cdot 10^{-4}$. From $\lambda=1.3\cdot 10^{-4}$ to $\lambda=3.2\cdot 10^{-4}$ the optimal value is $(2,1)$, and so on (the curves are plotted with steps of $10^{-5}$).
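For the PPP model, the coverage distribution is a Poisson probability mass function with mean $\psi=\lambda\pi r_{\mathsf{u}}^{2}$. The sketch below computes it for one density; the function name and the truncation point `b_max` are illustrative assumptions (the true distribution has unbounded support, so the tail beyond `b_max` is simply dropped).

```python
from math import exp, factorial, pi


def ppp_coverage_pmf(lam, r_u, b_max):
    """Probabilities gamma_b, b = 0..b_max, that b SBSs lie within radius r_u of the user.

    lam   -- SBS density per square meter
    r_u   -- user communication radius in meters
    b_max -- truncation point of the (infinite) Poisson support; an assumption
    """
    psi = lam * pi * r_u**2
    return [exp(-psi) * psi**b / factorial(b) for b in range(b_max + 1)]


lam, r_u = 1e-4, 60.0
gamma = ppp_coverage_pmf(lam, r_u, b_max=30)
psi = lam * pi * r_u**2
```

For $\lambda=10^{-4}$ and $r_{\mathsf{u}}=60$ meters, $\psi\approx 1.13$, so roughly a third of the users see no SBS at all, which is consistent with caching being of little help at low densities.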

VIII Conclusion

We proposed a private information retrieval scheme that allows a user to download files of different popularities from a cellular network in which, to reduce the backhaul usage, content is cached at the wireless edge in SBSs, while achieving privacy against a number of spy SBSs. We derived the backhaul rate for this scheme and formulated the content placement optimization. We showed that, as for the no PIR case, up to a number of spy SBSs caching helps in reducing the backhaul rate. Interestingly, contrary to the no PIR case, uniform content placement is optimal. Furthermore, popular content placement is optimal for some scenarios. Although uniform content placement is optimal, the proposed PIR scheme for multiple code rates may be useful in other scenarios, e.g., for distributed storage where data is stored using codes of different rates.

Appendix
Proof of Theorem 1

To prove that the protocol achieves PIR against $T$ colluding SBSs, we need to prove that both the privacy condition in (2a) and the recovery condition in (2b) are satisfied. We first prove that the recovery condition in (2b) is satisfied.

According to Lemma 1, GRS codes with a fixed weighting vector $\bm{v}$ and evaluation vector $\bm{\kappa}$ are naturally nested. Furthermore, puncturing a GRS code results in another GRS code, since GRS codes are weighted evaluation codes [25, Ch. 5]. Thus, $\mathcal{C}_{i}^{\prime}\subseteq\mathcal{C}_{\mathsf{max}}^{\prime}$ for all $i$, and it follows from (7) that

$$\tilde{\mathcal{C}}=\left(\sum_{i=1}^{F}\mathcal{C}^{\prime}_{i}\right)\circ\bar{\mathcal{C}}=\mathcal{C}_{\mathsf{max}}^{\prime}\circ\bar{\mathcal{C}}.$$

Furthermore, it can easily be shown that the Hadamard product of two GRS codes with the same evaluation vector $(\kappa_{1},\ldots,\kappa_{n})$ is also a GRS code with dimension equal to the sum of the dimensions minus $1$ (as long as this does not exceed $n$). Thus, $\tilde{\mathcal{C}}$ is a GRS code of dimension $k_{\mathsf{max}}+T-1$. As $\tilde{\mathcal{C}}$ is an $(n,k_{\mathsf{max}}+T-1)$ MDS code (GRS codes are MDS codes), it can correct arbitrary erasure patterns of up to $\Gamma=n-(k_{\mathsf{max}}+T-1)$ erasures. This implies that one can construct a valid $k_{\mathsf{max}}\times n$ ($d=k_{\mathsf{max}}$) matrix $\hat{\bm{E}}$ (satisfying conditions $\mathsf{C1}$–$\mathsf{C3}$) from $\beta=\Gamma$ information sets $\{\mathcal{I}_{m}\}$ of $\mathcal{C}_{\mathsf{max}}^{\prime}$ as shown below.
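The dimension claim for the Hadamard product can be checked numerically on a small instance. The sketch below is only an illustration under stated assumptions: it works over the prime field GF(13), takes $\bar{\mathcal{C}}$ to be a GRS code of dimension $T$ on the same evaluation points, and uses arbitrary nonzero weight vectors. The helper names (`grs_generator`, `hadamard_spanning_rows`, `rank_mod_p`) and all parameter values are hypothetical and not taken from the paper.

```python
P = 13  # small prime field GF(13), chosen only for illustration

def grs_generator(kappa, v, k, p=P):
    """Generator matrix of a GRS code: row i holds v_j * kappa_j**i (mod p)."""
    return [[(vj * pow(kj, i, p)) % p for kj, vj in zip(kappa, v)]
            for i in range(k)]

def hadamard_spanning_rows(G1, G2, p=P):
    """Component-wise products of all row pairs; they span the Hadamard product code."""
    return [[(a * b) % p for a, b in zip(r1, r2)] for r1 in G1 for r2 in G2]

def rank_mod_p(rows, p=P):
    """Rank over GF(p) via Gauss-Jordan elimination (entries assumed reduced mod p)."""
    rows = [list(r) for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse, Python 3.8+
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank

# Hypothetical parameters: n = 8 SBSs, k_max = 3, T = 2 colluding SBSs.
n, k_max, T = 8, 3, 2
kappa = list(range(1, n + 1))          # distinct nonzero evaluation points in GF(13)
v1 = [1] * n                           # weights of the storage code (assumption)
v2 = [1, 2, 3, 4, 5, 6, 7, 8]          # weights of the second code (assumption)
G_max = grs_generator(kappa, v1, k_max)
G_bar = grs_generator(kappa, v2, T)
product_dim = rank_mod_p(hadamard_spanning_rows(G_max, G_bar))
gamma = n - product_dim                # number of correctable erasures
print(product_dim, gamma)
```

The row products have the form $(v_{1,j}v_{2,j})\kappa_{j}^{a+b}$ with $0\leq a+b\leq k_{\mathsf{max}}+T-2$, which is why only $k_{\mathsf{max}}+T-1$ of them are linearly independent.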

Let $\mathcal{J}_{j}=\{j,\ldots,(j+\Gamma-1)\bmod n\}$, $j=1,\ldots,k_{\mathsf{max}}$. Construct $\hat{\bm{E}}$ in such a way that $\mathcal{J}_{j}$ is the support of the $j$-th row of $\hat{\bm{E}}$. Hence, $\mathsf{C1}$ is satisfied. Furthermore, since $\tilde{\mathcal{C}}$ is an $(n,k_{\mathsf{max}}+T-1)$ MDS code and $\Gamma=n-(k_{\mathsf{max}}+T-1)$, all rows of $\hat{\bm{E}}$ are correctable by $\tilde{\mathcal{C}}$, and thus $\mathsf{C2}$ is satisfied. Finally, run Algorithm 1, which constructs $\beta=\Gamma$ information sets $\{\mathcal{I}_{m}\}$ of $\mathcal{C}_{\mathsf{max}}^{\prime}$ (and the corresponding sets $\{\mathcal{F}_{l}\}$) such that $\mathsf{C3}$ is satisfied. Note that since $\mathcal{C}_{\mathsf{max}}^{\prime}$ is an MDS code, all coordinate sets of size $k_{\mathsf{max}}$ are information sets of $\mathcal{C}_{\mathsf{max}}^{\prime}$, and hence Algorithm 1 will always succeed in constructing a valid set of information sets of $\mathcal{C}_{\mathsf{max}}^{\prime}$ (the inequalities in the while and if conditions of Algorithm 1, together with the fact that the overall weight of $\hat{\bm{E}}$ is $\Gamma k_{\mathsf{max}}$, ensure that $\beta=\Gamma$ valid information sets for $\mathcal{C}_{\mathsf{max}}^{\prime}$ are constructed). In particular, the while-loop in Algorithm 1 will always terminate.
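The cyclic row supports described above can be sketched as follows. This is a minimal illustration that assumes 0-based coordinates (the text indexes from 1) and reads the column weight as the number of rows whose support contains that coordinate; the function names and the parameters $n=5$, $k_{\mathsf{max}}=2$, $T=2$ are hypothetical.

```python
def row_supports(n, k_max, T):
    """Supports J_j = {j, ..., (j + Gamma - 1) mod n} of the rows of E-hat (0-based)."""
    gamma = n - (k_max + T - 1)
    if gamma <= 0:
        raise ValueError("need n > k_max + T - 1 so that Gamma is positive")
    return [{(j + s) % n for s in range(gamma)} for j in range(k_max)]

def column_weights(supports, n):
    """Number of rows whose support contains coordinate l, for l = 0, ..., n-1."""
    return [sum(l in s for s in supports) for l in range(n)]

# Hypothetical small instance: Gamma = 5 - (2 + 2 - 1) = 2.
n, k_max, T = 5, 2, 2
supports = row_supports(n, k_max, T)
weights = column_weights(supports, n)
print(supports, weights)
```

Each row has exactly $\Gamma$ nonzero positions by construction, so the column weights always sum to $\Gamma k_{\mathsf{max}}$, the overall weight of $\hat{\bm{E}}$ used in the termination argument.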

From the constructed matrix $\hat{\bm{E}}$, the user is able to recover $\Gamma d\geq\beta k_{i}$ unique code symbols of the requested file $\bm{X}^{(i)}$, at least $k_{i}$ symbols from each stripe. Furthermore, a set of $k_{i}$ recovered code symbols from each stripe corresponds to an information set of $\mathcal{C}_{i}^{\prime}$ (any subset of size $k_{i}$ of any information set of size $k_{\mathsf{max}}$ of $\mathcal{C}_{\mathsf{max}}^{\prime}$ is an information set of $\mathcal{C}_{i}^{\prime}$), and the requested file $\bm{X}^{(i)}$ can be recovered. This can be seen following a similar argument as in the proof of [21, Th. 6], and it follows that the recovery condition in (2b) is satisfied.

Secondly, we consider the privacy condition in (2a). A reasoning similar to the proof of [21, Lem. 6] shows that it is satisfied, and we refer the interested reader to this proof for further details. The fundamental reason is that addition of a deterministic vector in (5) does not change the joint probability distribution of $\{\bm{Q}^{(l)},l\in\mathcal{T}\}$ for any set $\mathcal{T}$ of size $T$, and the proof follows the same lines as the proof of [20, Th. 8]. Note, however, that there is a subtle difference in the sense that independent instances of the protocol may query different sets of SBSs. Since the set of SBSs that are queried is independent of the requested file and depends only on which SBSs are within communication range, this fact does not leak any additional information on which file is requested by the user.

Input: $\hat{\bm{E}}$, $\beta$, $n$, $k_{\mathsf{max}}$
Output: $\{\mathcal{I}_{m}\}$, $\{\mathcal{F}_{l}\}$
for $m\in\{1,\ldots,\beta\}$ do
    $\mathcal{I}_{m}\leftarrow\emptyset$
end for
for $l\in\{1,\ldots,n\}$ do
    $\mathcal{F}_{l}\leftarrow\emptyset$, $m\leftarrow 1$
    while $|\mathcal{F}_{l}|<w_{\mathsf{H}}\left(\bm{t}_{l}\right)$ do
        if $|\mathcal{I}_{m}|<k_{\mathsf{max}}$ then
            $\mathcal{F}_{l}\leftarrow\mathcal{F}_{l}\cup\{m\}$
            $\mathcal{I}_{m}\leftarrow\mathcal{I}_{m}\cup\{l\}$
        end if
        $m\leftarrow m+1$
    end while
end for
Algorithm 1 Construction of $\{\mathcal{I}_{m}\}$ for Theorem 1
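A direct transcription of Algorithm 1 may help in following the termination argument. The sketch below is an illustration rather than the authors' implementation. It assumes that $w_{\mathsf{H}}(\bm{t}_{l})$ is the Hamming weight of the $l$-th column of $\hat{\bm{E}}$, passed in as a list of column weights, and it uses 0-based indices. It also uses a strict inequality in the while condition, so that coordinate $l$ is assigned to exactly $w_{\mathsf{H}}(\bm{t}_{l})$ sets, and adds a guard that raises an error if no set with free capacity remains. The function name is hypothetical.

```python
def construct_information_sets(weights, beta, k_max):
    """Greedy construction of beta sets I_m of size k_max and the sets F_l.

    weights[l] is assumed to be the Hamming weight of column l of E-hat.
    """
    info_sets = [set() for _ in range(beta)]   # I_m, initially empty
    assigned = []                              # F_l for each coordinate l
    for l, w in enumerate(weights):
        f_l = set()
        m = 0
        while len(f_l) < w:
            if m >= beta:
                # Guard added for this sketch; not part of the pseudocode.
                raise RuntimeError("no information set with free capacity left")
            if len(info_sets[m]) < k_max:
                f_l.add(m)
                info_sets[m].add(l)
            m += 1
        assigned.append(f_l)
    return info_sets, assigned

# Hypothetical instance: n = 5, k_max = 2, T = 2, hence beta = Gamma = 2,
# with column weights taken from the cyclic row supports {0,1} and {1,2}.
info_sets, assigned = construct_information_sets([1, 2, 1, 0, 0], beta=2, k_max=2)
print(info_sets, assigned)
```

In this instance the column weights sum to $\Gamma k_{\mathsf{max}}=4$, so every one of the $\beta$ sets ends up with exactly $k_{\mathsf{max}}$ coordinates and the guard is never triggered.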

References

  • [1] U. Niesen, D. Shah, and G. W. Wornell, “Caching in wireless networks,” IEEE Trans. Inf. Theory, vol. 58, no. 10, pp. 6524–6540, Oct. 2012.
  • [2] M. A. Maddah-Ali and U. Niesen, “Fundamental limits of caching,” IEEE Trans. Inf. Theory, vol. 60, no. 5, pp. 2856–2867, May 2014.
  • [3] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K. Soong, and J. C. Zhang, “What will 5G be?” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1065–1082, Jun. 2014.
  • [4] D. Liu, B. Chen, C. Yang, and A. F. Molisch, “Caching at the wireless edge: Design aspects, challenges, and future directions,” IEEE Commun. Mag., vol. 54, no. 9, pp. 22–28, Sep. 2016.
  • [5] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, “Femtocaching: Wireless content delivery through distributed caching helpers,” IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 8402–8413, Dec. 2013.
  • [6] E. Bastug, M. Bennis, and M. Debbah, “Living on the edge: The role of proactive caching in 5G wireless networks,” IEEE Commun. Mag., vol. 52, no. 8, pp. 82–89, Aug. 2014.
  • [7] V. Bioglio, F. Gabry, and I. Land, “Optimizing MDS codes for caching at the edge,” in Proc. Global Commun. Conf. (GLOBECOM), San Diego, CA, Dec. 2015.
  • [8] M. Ji, G. Caire, and A. F. Molisch, “Fundamental limits of caching in wireless D2D networks,” IEEE Trans. Inf. Theory, vol. 62, no. 2, pp. 849–869, Feb. 2016.
  • [9] N. Golrezaei, P. Mansourifard, A. F. Molisch, and A. G. Dimakis, “Base-station assisted device-to-device communications for high-throughput wireless video networks,” IEEE Trans. Wireless Commun., vol. 13, no. 7, pp. 3665–3676, Jul. 2014.
  • [10] J. Pedersen, A. Graell i Amat, I. Andriyanova, and F. Brännström, “Distributed storage in mobile wireless networks with device-to-device communication,” IEEE Trans. Commun., vol. 64, no. 11, pp. 4862–4878, Nov. 2016.
  • [11] A. Piemontese and A. Graell i Amat, “MDS-coded distributed storage for low delay wireless content delivery,” in Proc. 2016 9th Int. Symp. Turbo Codes & Iterative Inform. Process. (ISTC), Brest, France, 2016, pp. 320–324.
  • [12] J. Pedersen, A. Graell i Amat, I. Andriyanova, and F. Brännström, “Optimizing MDS coded caching in wireless networks with device-to-device communication,” Jan. 2017, arXiv:1701.06289v2 [cs.IT]. [Online]. Available: https://arxiv.org/abs/1701.06289
  • [13] Y. Ishai, E. Kushilevitz, R. Ostrovsky, and A. Sahai, “Batch codes and their applications,” in Proc. 36th Annual ACM Symp. Theory Comput. (STOC), Chicago, IL, Jun. 2004, pp. 262–271.
  • [14] N. B. Shah, K. V. Rashmi, and K. Ramchandran, “One extra bit of download ensures perfectly private information retrieval,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Honolulu, HI, Jun./Jul. 2014, pp. 856–860.
  • [15] T. H. Chan, S.-W. Ho, and H. Yamamoto, “Private information retrieval for coded storage,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Hong Kong, China, Jun. 2015, pp. 2842–2846.
  • [16] R. Tajeddine and S. El Rouayheb, “Private information retrieval from MDS coded data in distributed storage systems,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Barcelona, Spain, Jul. 2016, pp. 1411–1415.
  • [17] S. Kumar, E. Rosnes, and A. Graell i Amat, “Private information retrieval in distributed storage systems using an arbitrary linear code,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Aachen, Germany, Jun. 2017, pp. 1421–1425.
  • [18] H. Sun and S. A. Jafar, “Private information retrieval from MDS coded data with colluding servers: Settling a conjecture by Freij-Hollanti et al.” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Aachen, Germany, Jun. 2017, pp. 1893–1897.
  • [19] ——, “The capacity of private information retrieval,” IEEE Trans. Inf. Theory, vol. 63, no. 7, pp. 4075–4088, Jul. 2017.
  • [20] R. Freij-Hollanti, O. W. Gnilke, C. Hollanti, and D. A. Karpuk, “Private information retrieval from coded databases with colluding servers,” SIAM J. Appl. Algebra Geom., vol. 1, no. 1, pp. 647–664, Nov. 2017.
  • [21] S. Kumar, H.-Y. Lin, E. Rosnes, and A. Graell i Amat, “Achieving maximum distance separable private information retrieval capacity with linear codes,” 2017, arXiv:1712.03898v3 [cs.IT]. [Online]. Available: https://arxiv.org/abs/1712.03898
  • [22] K. Banawan and S. Ulukus, “The capacity of private information retrieval from coded databases,” IEEE Trans. Inf. Theory, vol. 64, no. 3, pp. 1945–1956, Mar. 2018.
  • [23] H. Sun and S. A. Jafar, “The capacity of robust private information retrieval with colluding databases,” IEEE Trans. Inf. Theory, vol. 64, no. 4, pp. 2361–2370, Apr. 2018.
  • [24] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan, “Private information retrieval,” in Proc. 36th IEEE Symp. Found. Comp. Sci. (FOCS), Milwaukee, WI, Oct. 1995, pp. 41–50.
  • [25] W. C. Huffman and V. Pless, Eds., Fundamentals of Error-Correcting Codes.   Cambridge, UK: Cambridge University Press, 2010.
  • [26] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, “Web caching and Zipf-like distributions: Evidence and implications,” in Proc. IEEE Joint Conf. Comput. Commun. Soc. (INFOCOM), New York, NY, Mar. 1999, pp. 126–134.
  • [27] B. Serbetci and J. Goseling, “On optimal geographical caching in heterogeneous cellular networks,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), San Francisco, CA, Mar. 2017.