Robust low-delay Streaming PIR using convolutional codes
Abstract
In this paper we investigate the design of a low-delay robust streaming PIR scheme on coded data that is resilient to unresponsive or slow servers and can privately retrieve streaming data in a sequential fashion subject to a fixed decoding delay. We present a scheme based on convolutional codes and the star product, and assume no collusion between servers. In particular, we propose the use of convolutional codes whose column distances increase as fast as possible, known as Maximum Distance Profile (MDP) codes. We show that the proposed scheme can deal with a large variety of erasure patterns.
Index Terms:
Private Information Retrieval, Private Streaming, Convolutional Codes, Low-Delay, MDP Codes, Erasure Channel.

I Introduction
Video traffic has grown explosively and is expected to keep growing exponentially in the coming years [1]. Service providers for real-time video streaming are typically hosted in a public cloud, with multiple servers in different data centers, e.g. Google Cloud, Amazon CloudFront and Microsoft Azure. These cloud services aim for private and low-latency communications.
The problem of Private Information Retrieval (PIR) has attracted a lot of attention in recent years; it studies how to retrieve a file from a storage system without revealing to the servers which file is desired. It was initially addressed for replicated files [7] and more recently for coded files [6, 8, 13, 16, 17]. In this last setting, the general model of the information theoretic PIR problem is as follows. A coded database is distributed over several servers storing the files, and we assume that the user knows how the content is stored on the servers. Each file is coded and stored independently using the same code, and the user wants to retrieve a particular file from the database with zero information gain by the servers, i.e., the user wants information theoretic privacy [19]. Recently, the literature on PIR has grown considerably, with extensions to more general PIR models with several additional constraints. Most of the efforts in private retrieval have focused on efficient schemes that optimize different metrics, such as communication cost or rate. However, in many cases some of the servers may be busy and fail to respond within a desirable time frame, or network failures may occur. For this reason, new robust schemes were proposed to deal with such scenarios [19, 18], adding redundancy to tolerate missing responses from some of the servers. PIR schemes on coded data for Byzantine or unresponsive servers were presented in [21, 18]. These schemes are suited for retrieving one single file and therefore use block codes. In [10] a scheme for sequential retrieval was proposed, but again for a given set of files of fixed size and assuming that all the responses of the servers are lost at the same time instant. The case of a non-bursty channel is also considered in [10], but only using unit memory convolutional codes. However, to the best of the authors' knowledge, the problem of low-delay private retrieval of a stream of files (of undetermined length) with some slow or unresponsive servers remains unexplored.
In this work we investigate this more general problem and propose a novel robust scheme for low-delay streaming retrieval of files from servers in the presence of possibly unresponsive servers by using Maximum Distance Profile (MDP) convolutional codes. This class of codes is suitable for low-delay streaming applications as these codes possess optimal error-correcting capabilities within a decoding window, see [5, 20]. One of the advantages of using convolutional codes over block codes is the sliding window flexibility, which allows one to select different decoding windows according to the erasure pattern. We show how to take advantage of this property to provide robust PIR in this context. We present a scheme that is able to stream files consisting of many stripes in the presence of erasures without assuming any particular structure in the sequence of erasures. The model in [10] treated burst erasure channels using general convolutional codes, whereas the non-bursty channel case was treated using unit memory convolutional codes. Unit memory codes are restricted to store only what occurred in the previous instant and are therefore far from optimal for low-delay applications when the given delay constraint is larger than one. Note also that when only burst erasure channels are assumed, there exist concrete constructions of convolutional codes that are optimal in such a context [5, 4, 14]. In this work we extend this thread of research and consider a not necessarily bursty channel using convolutional codes with no restriction on the memory, namely, MDP convolutional codes. In contrast to [10], where the response of the servers is built in a convolutional fashion but the storage code is still a block code, we also use a convolutional code to store the files on the servers.
II Preliminaries
In this section we recall basic material and introduce the definitions needed for this work, including the notions of convolutional code and superregular matrix. Let $\mathbb{F}$ be a finite field of size $q$ and $\mathbb{F}[z]$ be the ring of polynomials with coefficients in $\mathbb{F}$.
Definition 1
An $(n,k)$ block code $\mathcal{C}$ is a $k$-dimensional subspace of $\mathbb{F}^n$, i.e., there exists a full row rank matrix $G \in \mathbb{F}^{k \times n}$ such that
$$\mathcal{C} = \left\{ uG \ : \ u \in \mathbb{F}^k \right\}.$$
$G$ is called a generator matrix of the code and is unique up to left multiplication with an invertible matrix. Furthermore, $u \in \mathbb{F}^k$ is called a message vector and the elements $v = uG \in \mathcal{C}$ are called codewords.
Convolutional codes process a continuous sequence of data instead of blocks of fixed vectors as done by block codes. If we introduce a variable $z$, called the delay operator, to indicate the time instant at which each information vector arrived or each codeword was transmitted, then we can represent the message sequence $u_0, u_1, u_2, \dots$ as a polynomial vector $u(z) = \sum_{t} u_t z^t$. Formally, we can define convolutional codes as follows.
A rate $k/n$ convolutional code $\mathcal{C}$ [20] is an $\mathbb{F}[z]$-submodule of $\mathbb{F}^n[z]$ of rank $k$ given by
$$\mathcal{C} = \left\{ u(z)G(z) \ : \ u(z) \in \mathbb{F}^k[z] \right\},$$
where $G(z) \in \mathbb{F}^{k \times n}[z]$ is a matrix, called generator matrix, that is basic, i.e., has a polynomial right inverse.
Note that if $v(z) = u(z)G(z)$, with $G(z) = \sum_{i=0}^{\mu} G_i z^i$, $G_i \in \mathbb{F}^{k \times n}$, then the coefficients of $v(z) = \sum_{t} v_t z^t$ are given by
$$v_t = \sum_{i=0}^{\mu} u_{t-i} G_i, \qquad \text{where } u_j = 0 \text{ for } j < 0.$$
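To make this coefficientwise description concrete, the following minimal sketch encodes a short message sequence over a small prime field; the field size and the coefficient matrices are arbitrary illustrative choices and not parameters used later in the paper.

```python
# Minimal sketch of convolutional encoding over F_p via v_t = sum_i u_{t-i} G_i.
# The prime p and the matrices G_0, G_1 below are illustrative assumptions.
p = 5
G = [[[1, 1]], [[1, 2]]]          # G[i] is the k x n coefficient matrix G_i
k, n, mu = 1, 2, len(G) - 1       # rate-1/2 code with memory 1

def encode(u_seq):
    """Encode a message sequence u_0, u_1, ... (each u_t is a list of k symbols)."""
    v_seq = []
    for t in range(len(u_seq)):
        v_t = [0] * n
        for i in range(min(mu, t) + 1):       # only the last mu inputs still influence v_t
            for a in range(k):
                for b in range(n):
                    v_t[b] = (v_t[b] + u_seq[t - i][a] * G[i][a][b]) % p
        v_seq.append(v_t)
    return v_seq

print(encode([[1], [0], [3]]))    # [[1, 1], [1, 2], [3, 3]]
```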
The maximum degree of all polynomials in the $i$-th row of $G(z)$ is denoted by $\delta_i$. The degree $\delta$ of $\mathcal{C}$ is defined as the maximum degree of the full size minors of $G(z)$. We say that $\mathcal{C}$ is an $(n,k,\delta)$ convolutional code [15]. Important for the performance of a code in terms of error-free decoding is the (Hamming) distance between two codewords. In the case of convolutional codes, the most relevant notion of distance for low-delay decoding is the column distance, which can be defined as follows.
The $j$-th column distance [11] is defined as
$$d_j^c(\mathcal{C}) = \min \left\{ \operatorname{wt}\left(v_{[0,j]}(z)\right) \ : \ v(z) \in \mathcal{C},\ v_0 \neq 0 \right\},$$
where $v_{[0,j]}(z) = v_0 + v_1 z + \cdots + v_j z^j$ represents the $j$-th truncation of the codeword $v(z) \in \mathcal{C}$ and
$$\operatorname{wt}\left(v_{[0,j]}(z)\right) = \sum_{t=0}^{j} \operatorname{wt}(v_t),$$
where $\operatorname{wt}(v_t)$ is the Hamming weight of $v_t$, i.e., the number of nonzero components of $v_t$, for $t = 0, \dots, j$. For simplicity, we write $d_j^c$ instead of $d_j^c(\mathcal{C})$.
The $j$-th column distance is upper bounded [9] by
$$d_j^c \leq (n-k)(j+1) + 1,$$
and the maximality of any of the column distances implies the maximality of all the previous ones, that is, if $d_j^c = (n-k)(j+1)+1$ for some $j$, then $d_i^c = (n-k)(i+1)+1$ for all $i \leq j$. The value
$$L = \left\lfloor \frac{\delta}{k} \right\rfloor + \left\lfloor \frac{\delta}{n-k} \right\rfloor \qquad (1)$$
is the largest value of $j$ for which the bound can be achieved, and an $(n,k,\delta)$ convolutional code with $d_L^c = (n-k)(L+1)+1$ is called a maximum distance profile (MDP) code [9]. Hence, MDP codes have optimal error correcting capabilities within time intervals of length $L+1$ and are therefore ideal for low delay correction. In this work we shall assume that the retrieval must be performed within a given delay constraint, see [2, 5].
Assume that $G(z) = \sum_{i=0}^{\mu} G_i z^i$, with $G_i \in \mathbb{F}^{k \times n}$, and consider the associated sliding matrix
$$G_j^c = \begin{pmatrix} G_0 & G_1 & \cdots & G_j \\ & G_0 & \cdots & G_{j-1} \\ & & \ddots & \vdots \\ & & & G_0 \end{pmatrix} \qquad (2)$$
with $G_i = 0$ when $i > \mu$, for $j \in \mathbb{N}_0$.
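The following sketch brute-forces the first column distances of a small illustrative code directly from the truncated sliding matrices and compares them with the bound $(n-k)(j+1)+1$; the field size and the coefficient matrices are arbitrary assumptions made only for this example.

```python
import itertools

p = 5                     # illustrative prime field size (assumption)
n, k, mu = 2, 1, 1        # small rate-1/2 code with memory 1
G = [[[1, 1]], [[1, 2]]]  # G[i] is the k x n coefficient matrix G_i (example choice)

def sliding_matrix(j):
    """Truncated sliding generator matrix G_j^c of size (j+1)k x (j+1)n, cf. (2)."""
    rows = []
    for r in range(j + 1):                  # block row r corresponds to u_r
        for a in range(k):
            row = [0] * ((j + 1) * n)
            for c in range(r, j + 1):       # block column c receives G_{c-r}
                if c - r <= mu:
                    for b in range(n):
                        row[c * n + b] = G[c - r][a][b] % p
            rows.append(row)
    return rows

def column_distance(j):
    """Brute-force d_j^c: minimum weight of u_[0,j] G_j^c over all u with u_0 != 0."""
    M = sliding_matrix(j)
    best = None
    for u in itertools.product(range(p), repeat=(j + 1) * k):
        if all(x == 0 for x in u[:k]):      # the definition requires v_0 != 0,
            continue                        # i.e. u_0 != 0 since G_0 has full row rank
        v = [sum(u[i] * M[i][c] for i in range(len(u))) % p
             for c in range((j + 1) * n)]
        wt = sum(1 for x in v if x != 0)
        best = wt if best is None else min(best, wt)
    return best

for j in range(3):
    print(f"d_{j}^c = {column_distance(j)}   (bound (n-k)(j+1)+1 = {(n - k) * (j + 1) + 1})")
```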
Theorem 2 ([9])
Let $G_j^c$ be the matrices defined in (2). Then the following statements are equivalent:

1. $d_j^c = (n-k)(j+1)+1$;
2. every full size minor of $G_j^c$ formed from columns with indices $1 \leq t_1 < \cdots < t_{(j+1)k}$, where $t_{sk+1} > sn$ for $s = 1, \dots, j$, is nonzero.

In particular, when $j = L$, $\mathcal{C}$ is an MDP code.
Theorem 3
[20, Theorem 3.1] Let $\mathcal{C}$ be an $(n,k,\delta)$ MDP convolutional code. If in any sliding window of length $(L+1)n$ at most $(L+1)(n-k)$ erasures occur in a transmitted sequence, then complete recovery is possible.
Considering the proof of this theorem, one sees that the recovery is even possible within a delay of windows of size and that the given condition for complete recovery is only sufficient but not necessary.
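The following minimal sketch illustrates the mechanism behind this recovery for a single window: the unerased coordinates of the truncated codeword $v_{[0,2]} = u_{[0,2]} G_2^c$ form a linear system in the message symbols of the window, which can be solved over the field whenever the corresponding submatrix of the sliding matrix is invertible. The rate-1/2, memory-1 code over $\mathbb{F}_5$ and the erasure pattern used here are illustrative assumptions rather than the parameters of the scheme, and the actual decoder of [20] proceeds sequentially and may defer some symbols to later windows.

```python
# Window-based erasure recovery sketch over F_p (p, code and erasure pattern are
# illustrative assumptions).  G2c is the sliding matrix G_2^c of a small rate-1/2,
# memory-1 code with G_0 = [1 1] and G_1 = [1 2], cf. (2).
p = 5
G2c = [
    [1, 1, 1, 2, 0, 0],   # contribution of u_0
    [0, 0, 1, 1, 1, 2],   # contribution of u_1
    [0, 0, 0, 0, 1, 1],   # contribution of u_2
]

def solve_mod_p(A, b, p):
    """Solve the square system A x = b over F_p (A is assumed invertible)."""
    size = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(size):
        piv = next(r for r in range(col, size) if M[r][col] % p != 0)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], p - 2, p)
        M[col] = [(x * inv) % p for x in M[col]]
        for r in range(size):
            if r != col and M[r][col] % p != 0:
                f = M[r][col]
                M[r] = [(M[r][c] - f * M[col][c]) % p for c in range(size + 1)]
    return [M[r][size] for r in range(size)]

u = [3, 1, 4]                                   # message symbols u_0, u_1, u_2 of the window
v = [sum(u[i] * G2c[i][c] for i in range(3)) % p for c in range(6)]
erased = {0, 2, 4}                              # three erasures, spread over the window

# Every unerased coordinate v_c yields one linear equation in u_0, u_1, u_2.
cols = [c for c in range(6) if c not in erased]
A = [[G2c[i][c] for i in range(3)] for c in cols]
b = [v[c] for c in cols]
assert solve_mod_p(A, b, p) == u                # the whole window is recovered
```

With this example code, erasing the last three positions of the window instead would leave $u_2$ undetermined within the window, which is exactly the situation in which the sliding-window decoder defers that symbol to a later window.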
We will develop a PIR scheme in which the star product of certain block codes plays an important role.
Definition 4
The star product of two vectors $v, w \in \mathbb{F}^n$ is defined as $v \star w = (v_1 w_1, \dots, v_n w_n) \in \mathbb{F}^n$. The star product of two block codes $\mathcal{C}, \mathcal{D} \subseteq \mathbb{F}^n$ is defined as $\mathcal{C} \star \mathcal{D} = \operatorname{span}\{ c \star d \ : \ c \in \mathcal{C},\ d \in \mathcal{D} \}$.
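As a small illustration of this definition (the field size and generator matrices below are arbitrary assumptions), the following sketch computes componentwise products and the linear code generated by all such products of codewords; for the Reed-Solomon-type codes chosen here, that code is again of Reed-Solomon type with dimension $\dim\mathcal{C} + \dim\mathcal{D} - 1$, which is the kind of MDS behaviour used later in the scheme.

```python
from itertools import product

p = 7   # illustrative prime field size (assumption)

def star(v, w):
    """Componentwise (star) product of two vectors over F_p."""
    return tuple((a * b) % p for a, b in zip(v, w))

def span(gens):
    """All F_p-linear combinations of the generating vectors (tiny examples only)."""
    out = set()
    for coeffs in product(range(p), repeat=len(gens)):
        s = [0] * len(gens[0])
        for c, g in zip(coeffs, gens):
            s = [(x + c * y) % p for x, y in zip(s, g)]
        out.add(tuple(s))
    return out

# Two [4,2] Reed-Solomon-type codes sharing the evaluation points 0, 1, 2, 3
# (an illustrative choice; the scheme only needs suitable MDS codes).
GC = [(1, 1, 1, 1), (0, 1, 2, 3)]
GD = [(1, 1, 1, 1), (0, 1, 2, 3)]
# By bilinearity, the code generated by all star products of codewords is already
# generated by the star products of the generator rows.
star_gens = [star(gi, hj) for gi in GC for hj in GD]
C, D, CD = span(GC), span(GD), span(star_gens)
print(len(C), len(D), len(CD))   # 49 49 343: the generated code has dimension 2 + 2 - 1 = 3
```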
Star product PIR was first introduced in [8]. The main idea of this scheme is to design the queries to the different servers in such a way that, if the responses are formed as inner products of the query and the stored information, then the total response is a codeword of a certain star product code plus an error, where the error contains the information one is interested in. In [10] this scheme was adapted by forming the responses in a convolutional way. In the following section, we present a star product scheme where the responses as well as the storage code are convolutional.
III Streaming PIR scheme
We have sequences of files with for and . These are encoded with an MDP convolutional code with generator matrix to obtain the sequences of files with for and where we set for . Moreover, we have servers and for , we store the -th component of each vector (for , ) on server number . Furthermore, we assume that and that for , is the generator matrix of an MDS block code denoted by . We will present a construction for an MDP convolutional code with these properties later in this paper. It holds for all and for . Thus, we set for .
The user wants to stream the sequence for some without the servers knowing which one, i.e., without the servers knowing which sequence he or she is streaming. For our PIR scheme we assume that there is no collusion between the servers (i.e., the number of colluding servers, usually denoted by $t$ in the literature, is equal to $1$).
Set , let be the block code generated by and be a matrix whose rows are constituted by random codewords of (i.e. multiples of ). For a subset , we denote by the vector with entries and we denote by the -th standard basis vector of .
For , we send the following query to server :
(3)
where denotes the -th column of . We write with for and .
The response of server at time is
(4)
where for .
Hence the total response at time is given by
(5)
where diag(E) denotes the diagonal matrix with diagonal entries equal to the entries of the vector .
By Definition 4 and the definition of the code the star product code is equal to the MDS code . As is a linear code, any sum of codewords is again a codeword. Hence, the response has the form
(6)
for some .
We assume that it is possible that some parts of the response at time get lost during transmission and could not be received. Hence the vector could have some erased components. We denote by the set that consists of the positions of the erased components of the vector .
Lemma 5
If , the user is able to obtain the vector . In particular, this is true if , where is the number of erased components of the vector .
Proof:
Using equation (6) and the definition of the vector , we apply erasure decoding in the MDS code to the vector , where the set of erasures is the union of and . The lemma follows from the fact that an MDS code can correct any set of erasures whose cardinality is smaller than the minimum distance of the code. ∎
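To make the argument concrete, here is a single-time-instant sketch of star-product retrieval with Reed-Solomon codes. All names and parameters (field size, number of servers, code dimensions, file index, query positions, lost response) are illustrative assumptions, the query structure is simplified, and the convolutional structure across time instants of Section III is deliberately omitted; the sketch only mirrors the erasure-decode-and-subtract step used in this proof.

```python
import random

# Single-time-instant star-product retrieval sketch; all parameters below are
# illustrative assumptions, not the values used in the paper.
p = 11                      # prime field size
n = 7                       # number of servers
alphas = list(range(n))     # distinct evaluation points -> Reed-Solomon (MDS) codes
k_C, k_D = 2, 2             # dimensions of the storage code C and the retrieval code D
M = 3                       # number of files (one stripe per file at this time instant)
m = 1                       # index of the desired file
E = {2, 5}                  # positions where a standard basis vector is added to the query
lost = {0}                  # transmission erasure: the response of server 0 never arrives

def rs_encode(coeffs):
    """Evaluate the polynomial with the given coefficients at all points (RS encoding)."""
    return [sum(c * pow(a, i, p) for i, c in enumerate(coeffs)) % p for a in alphas]

def solve_mod_p(A, b):
    """Solve the square system A x = b over F_p (A is assumed invertible)."""
    N = len(A)
    Mx = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(N):
        piv = next(r for r in range(col, N) if Mx[r][col] % p != 0)
        Mx[col], Mx[piv] = Mx[piv], Mx[col]
        inv = pow(Mx[col][col], p - 2, p)
        Mx[col] = [(x * inv) % p for x in Mx[col]]
        for r in range(N):
            if r != col and Mx[r][col] % p != 0:
                f = Mx[r][col]
                Mx[r] = [(Mx[r][c] - f * Mx[col][c]) % p for c in range(N + 1)]
    return [Mx[r][N] for r in range(N)]

# Storage: file f is a stripe in F_p^{k_C}, encoded with C; server i stores column i.
stripes = [[random.randrange(p) for _ in range(k_C)] for _ in range(M)]
Y = [rs_encode(x) for x in stripes]                      # Y[f][i] is held by server i

# Queries: column i of a matrix of random D-codewords, plus the m-th basis vector if i in E.
D_rows = [rs_encode([random.randrange(p) for _ in range(k_D)]) for _ in range(M)]
def query(i):
    q = [D_rows[f][i] for f in range(M)]
    if i in E:
        q[m] = (q[m] + 1) % p
    return q

# Responses: inner product of the query with the stored column (lost servers stay silent).
resp = {}
for i in range(n):
    if i in lost:
        continue
    q = query(i)
    resp[i] = sum(q[f] * Y[f][i] for f in range(M)) % p

# The response vector is a codeword of C * D (an RS code of dimension k_C + k_D - 1)
# plus the desired coded symbols Y[m][i] at the positions i in E.  Erasure-decode that
# codeword from positions outside E and outside the lost responses ...
known = [i for i in range(n) if i not in E and i not in lost][: k_C + k_D - 1]
A = [[pow(alphas[i], j, p) for j in range(k_C + k_D - 1)] for i in known]
codeword = rs_encode(solve_mod_p(A, [resp[i] for i in known]))
# ... and subtract it to obtain the desired coded symbols at the queried positions.
recovered = {i: (resp[i] - codeword[i]) % p for i in E if i not in lost}
assert all(recovered[i] == Y[m][i] for i in recovered)
print("recovered coded symbols of file", m, ":", recovered)
```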
For each for which the condition of the preceding lemma is not fulfilled we are not able to obtain . Therefore, we define
(7)
where denotes the zero matrix.
It remains to show how to obtain the desired sequence of files from the sequence . With the definitions and
(11)
one obtains
(12)
Denote by the identity matrix and set where each block of rows of contains identity matrices. Then, one has
(16)
Therefore, one obtains the following lemma.
Lemma 6
The column distances of the convolutional code with generator matrix where are equal to the column distances of .
Proof:
First note that the matrix defined in (11) is the sliding generator matrix of . Denote by the -th column distance of the code and by the matrix that consists of the first rows and the first columns of the matrix . Then, it holds
(17)
∎
Hence, we can use equation (12) to obtain via erasure decoding with an MDP convolutional code , where the set of positions of the total erasures, denoted by , has the form with
(18)
where for , the set is defined as
and should be defined analogously.
Hence, using also Theorem 3, we get the following theorem.
Theorem 7
Assume that . If the set of erasures given in (18) is such that in every sliding window of the sequence of size there are not more than erasures, then one can obtain the desired sequence of files from the sequence within time delay , i.e. one can privately obtain the sequence of files within time delay .
From this theorem we can deduce which erasure patterns our proposed scheme is guaranteed to correct.
Corollary 8
With the proposed scheme private reception within time delay is possible if for , there are not more than transmission erasures in positions of the sequence of responses and in every sliding window of this sequence of length there are not more than transmission erasures in positions .
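A small helper for testing whether a concrete erasure pattern satisfies the sliding-window condition behind Theorem 7 and Corollary 8 can look as follows; the window length W and erasure budget T are left as parameters because their concrete values depend on the code parameters of the scheme, and the numbers in the example calls are purely illustrative.

```python
def satisfies_window_condition(erased_positions, seq_len, W, T):
    """True if every window of W consecutive positions contains at most T erasures."""
    erased = set(erased_positions)
    return all(sum(1 for i in range(start, start + W) if i in erased) <= T
               for start in range(seq_len - W + 1))

# Illustrative numbers only: three scattered erasures pass, a burst of four does not.
print(satisfies_window_condition([0, 4, 9], seq_len=12, W=6, T=3))     # True
print(satisfies_window_condition([0, 1, 2, 3], seq_len=12, W=6, T=3))  # False
```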
Finally, we have to choose the cardinality of the set . The set itself is then chosen randomly with this fixed cardinality. A larger cardinality leads to more erasures for to correct, whereas a smaller cardinality leads to more erasures for to correct. To balance the two, we want to determine the cardinality such that the number of erasures one can correct in positions is approximately the same as the number of erasures one can correct in positions . We denote this number of erasures by . This approach leads to the following equations:
(19)
(20)
This implies
(21)
and consequently,
(22)
Having equality in this last equation implies . However, we need to be an integer, independent of . Since, depending on the erasure pattern, the MDP convolutional code might be able to correct more erasures than (20) indicates, we propose to choose rather smaller, which finally leads to
(23)
Of course, depending on the erasures that occur during transmission, other choices for could lead to a better performance. However, as we do not know the erasure pattern before transmission and have to choose beforehand, we cannot adapt it to the erasure pattern but have to choose it in such a way that the numbers of channel erasures the codes and are able to tolerate are balanced.
Note that we can correct more erasures in if is small (as the code has a larger minimum distance if is small). This means that we could tolerate slightly more erasures at the beginning of the stream than at the end.
In the following, we illustrate the erasure correcting capability of our scheme with the help of two examples.
Example 9
Let , and . This implies and , i.e. is an MDP convolutional code that can recover all erasure patterns for which in each sliding window of size there are not more than erasures. We assume . Moreover, according to equation (23), we have . We illustrate one window of the response sequence in the following figure, where the squares with content denote the positions of the set :
[Figure: one window of the response sequence; the squares labeled j mark the positions of the corresponding set.]
According to Corollary 8 we are able to recover erasures in the first positions with erasure decoding in . Moreover, and are both able to correct an additional erasure. Finally, the convolutional code is able to correct erasures in the positions in which we have a . To count the total number of erasures as well as the number of erasure patterns that can be corrected (assuming that erasures occur independently of each other), we have to distinguish two cases.
For the first case, we assume that the erasure pattern allows decoding with , and . Hence, we are able to correct up to erasures in positions. Moreover, if we assume that the erasures occur independently of each other, we could correct different erasure patterns.
For the second case, we assume that the erasure pattern is such that there exists such that decoding with is not possible, i.e. the -th window of size has to be considered as completely lost for . In order that recovery is still possible, decoding with for has to be possible and only one additional erasure in the positions in outside the completely erased window can be tolerated. Thus, for the maximal number of erasures that can be corrected is and the number of correctable erasure patterns equals . For , the maximum number of erasures that can be corrected is and the number of correctable erasure patterns equals .
Summing up, considering all cases, one gets that there are erasure patterns that we can correct.
If one were to choose , correction would no longer be possible if one complete window of size were lost. We would still be able to correct erasures but all of these erasures would have to be in positions
whereas no erasures in positions
could be corrected. Counting the number of erasure patterns that we are able to correct under the assumption of independent erasures, we get .
If one were to choose , there are three cases to distinguish. For the first case, assume that no window of size is completely lost for recovery with . Then, we can again correct erasures but only of these erasures can have a position in
The number of erasure patterns that could be corrected is .
For the second case, assume that correction with is not possible for exactly one . For , one could correct up to erasures and erasure patterns, for , up to erasures and erasure patterns.
For the third case, assume that correction with is not possible for exactly two values , denoted by and . If , one could correct up to erasures and erasure patterns, for , up to erasures and erasure patterns.
Hence the total number of erasure patterns that could be corrected is . This illustrates that our choice of is optimal if we assume the erasures to occur independently of each other.
Finally, we want to consider how many erasures we can correct in a larger window and choose a window of size 24, which is illustrated as follows:
[Figure: a window of size 24 of the response sequence; the squares labeled j mark the positions of the corresponding sets.]
According to Corollary 8 we are able to recover erasures in the first positions with erasure decoding in and additional erasures with for . The convolutional code is able to correct up to erasures in the positions with . Under the assumption that decoding with is possible for , we are able to correct up to erasures. If decoding is not possible for exactly one , one can correct up to erasures if and up to erasures if . If decoding is not possible for exactly two of the star product codes, recovery is only possible if this happens for and , in which case up to erasures can be corrected.
If one were to choose , we would only be able to correct erasures and all of these erasures would have to be in positions in
If one were to choose , one has to distinguish four cases. Under the assumption that decoding with is possible for , we are able to correct up to erasures in total but only of these erasures could be in
If decoding is not possible for exactly one , one can correct up to erasures if and up to erasures if . If decoding is not possible for exactly two of the star product codes and is among them, one could correct up to erasures and if is not among them, one could correct up to erasures. If decoding is not possible for exactly three of the star product codes, one could correct up to erasures (but there are only two erasure patterns for this scenario).
Example 10
Let , and . This implies (if we use for the construction presented in the next section, where is full rank) and , i.e. is an MDP convolutional code that can recover all erasure patterns for which in each sliding window of size there are not more than erasures. We assume . Moreover, according to equation (23), we have .
According to Corollary 8 we are able to recover erasures in the first positions of the response sequence with erasure decoding in . Moreover, and are both able to correct an additional erasure. Finally, the convolutional code is able to correct erasures in the positions covered by one of the sets . In total, we are able to correct erasures in positions in the case that correction with all is possible, up to erasures in the case that (only) the first window of size is lost completely and up to erasures in the case that another window of size is erased completely.
If one were to choose , we would be able to correct erasures but all of these erasures would have to be in positions in
whereas no erasures in
could be corrected.
If one were to choose , we could again correct erasures in the case that correction with all is possible, but only of these erasures could have a position in
Moreover, we could correct up to erasures in the case that the first window of size is lost completely and up to erasures in the case that another window of size is erased completely.
Again our choice of is optimal if we assume the erasures to occur independently of each other.
Remark 11
The major advantage of using convolutional codes instead of block codes is that the symbols in different windows of size are dependent on each other, and hence erasures can be recovered not only with the help of the received symbols in the same window but also with the help of received symbols from other windows. This is also illustrated by the previous examples, where recovery is possible even if all symbols with positions in are erased, provided that not too many symbols with positions in and are erased. This is due to the fact that there are erasure patterns where all symbols of the first window of size are erased but recovery with a convolutional code is still possible. Of course, this can never be possible using block codes, since in that case all windows of size have to be decoded independently of each other.
IV Construction of suitable streaming codes
The aim of this section is to provide constructions for MDP convolutional codes , which have the additional property that, for , is an MDS block code, as proposed at the beginning of the previous section. To this end, we will use the following lemma and proposition.
Lemma 12
[12] Let be an block code with generator matrix . Then, is MDS if, and only if, all full size minors of are nonzero.
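As a quick illustration of this criterion (the field size, code dimensions and evaluation points below are arbitrary choices and not the matrices of Theorem 14), the following sketch checks all full size minors of a small Vandermonde-type generator matrix.

```python
from itertools import combinations

p = 7
k, n = 3, 5
alphas = [1, 2, 3, 4, 5]                                  # distinct points (assumption)
G = [[pow(a, i, p) for a in alphas] for i in range(k)]    # k x n Vandermonde-type generator

def det_mod_p(M):
    """Determinant over F_p via Gaussian elimination."""
    M = [row[:] for row in M]
    det, size = 1, len(M)
    for col in range(size):
        piv = next((r for r in range(col, size) if M[r][col] % p != 0), None)
        if piv is None:
            return 0
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            det = -det
        det = (det * M[col][col]) % p
        inv = pow(M[col][col], p - 2, p)
        for r in range(col + 1, size):
            f = (M[r][col] * inv) % p
            M[r] = [(M[r][c] - f * M[col][c]) % p for c in range(size)]
    return det % p

# Lemma 12: the generated code is MDS iff every k x k minor of G is nonzero.
minors = [det_mod_p([[G[r][c] for c in cols] for r in range(k)])
          for cols in combinations(range(n), k)]
print("all full size minors nonzero (code is MDS):", all(m != 0 for m in minors))
```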
Proposition 13
[3, Theorem 3.3] Let be a primitive element of a finite field and be a matrix over with the following properties
1. if , then for a positive integer ;
2. if , then for any or for any ;
3. if , and , then ;
4. if , and , then .
Suppose is greater than any exponent of appearing as a nontrivial term of any minor of . Then has the property that each of its minors which is not trivially zero is nonzero.
The following theorem gives the desired construction.
Theorem 14
Let be prime, and be a primitive element of . For , set
(27)
Then, the convolutional code with generator matrix is an MDP convolutional code and moreover, for , is the generator matrix of an MDS block code if .
Proof:
V Conclusion
We have studied the problem of private streaming of a sequence of files, taking resilience against unresponsive servers as the primary metric for judging the efficiency of a PIR scheme. We proposed, for the first time, a general scheme for this problem. The scheme is based on MDP convolutional codes and the star product of codes. It is suited to a context where some servers fail to respond, in contrast to other solutions in the literature, where all the servers were assumed to fail at the same time instant. The presented approach can retrieve files in a sequential fashion and is therefore well suited for low-delay streaming applications. Some examples were presented to show how to take advantage of the proposed scheme. We derived a large set of erasure patterns that our codes can recover. Concrete constructions of such codes exist, although large field sizes are required. The construction of optimal codes for PIR over small fields that can deal with both burst and isolated erasures/errors is an interesting open problem that requires further research.
Acknowledgment
The work of the first and third author was supported by the Portuguese Foundation for Science and Technology (FCT-Fundação para a Ciência e a Tecnologia), through CIDMA - Center for Research and Development in Mathematics and Applications, within project UID/MAT/04106/2019. The first author was supported by the German Research Foundation within grant LI 3101/1-1. The second author was partially supported by Spanish grant AICO/2017/128 of the Generalitat Valenciana and the University of Alicante under the project VIGROB-287.
References
- [1] Cisco visual network index: Forecast and methodology, 2016-2021. Tech. Rep., June 2017, 2018.
- [2] N. Adler and Y. Cassuto. Burst-erasure correcting codes with optimal average delay. IEEE Trans. Inform. Theory, 63(5):2848–2865, May 2017.
- [3] P. Almeida, D. Napp, and R. Pinto. Superregular matrices and applications to convolutional codes. Linear Algebra and its Applications, 499:1–25, 2016.
- [4] A. Badr, A. Khisti, W. T. Tan, and J. Apostolopoulos. Robust streaming erasure codes based on deterministic channel approximations. In 2013 IEEE International Symposium on Information Theory, pages 1002–1006, 2013.
- [5] A. Badr, A. Khisti, Wai-Tian. Tan, and J. Apostolopoulos. Layered constructions for low-delay streaming codes. IEEE Trans. Inform. Theory, 63(1):111–141, 2017.
- [6] K. Banawan and S. Ulukus. The capacity of private information retrieval from coded databases. IEEE Transactions on Information Theory, 64(3):1945–1956, 2018.
- [7] Benny Chor, Eyal Kushilevitz, Oded Goldreich, and Madhu Sudan. Private information retrieval. J. ACM, 45(6):965–981, 1998.
- [8] R. Freij-Hollanti, O. Gnilke, C. Hollanti, and D. Karpuk. Private information retrieval from coded databases with colluding servers. SIAM Journal on Applied Algebra and Geometry, 1(1):647–664, 2017.
- [9] H. Gluesing-Luerssen, J. Rosenthal, and R. Smarandache. Strongly MDS convolutional codes. IEEE Trans. Inform. Theory, 52(2):584–598, 2006.
- [10] Lukas Holzbaur, Ragnar Freij-Hollanti, Antonia Wachter-Zeh, and Camilla Hollanti. Private streaming with convolutional codes. In 2018 IEEE Information Theory Workshop, ITW 2018, pages 550–554. Institute of Electrical and Electronics Engineers, 2019.
- [11] R. Johannesson and K. Sh. Zigangirov. Fundamentals of Convolutional Coding. IEEE Press, New York, 2015.
- [12] F. J. MacWilliams and N. J.A. Sloane. The Theory of Error-Correcting Codes. North Holland, Amsterdam, 1977.
- [13] U. Martínez-Peñas. Private information retrieval from locally repairable databases with colluding servers. In 2019 IEEE International Symposium on Information Theory (ISIT), 2019.
- [14] E. Martinian and C. E. W. Sundberg. Burst erasure correction codes with low decoding delay. IEEE Transactions on Information Theory, 50(10):2494–2502, 2004.
- [15] R. J. McEliece. The algebraic theory of convolutional codes. In Handbook of Coding Theory, volume 1, pages 1065–1138. Elsevier Science Publishers, 1998.
- [16] N. B. Shah, K. V. Rashmi, and K. Ramchandran. One extra bit of download ensures perfectly private information retrieval. In 2014 IEEE International Symposium on Information Theory, pages 856–860, 2014.
- [17] R. Tajeddine and S. El Rouayheb. Private information retrieval from MDS coded data in distributed storage systems. In 2016 IEEE International Symposium on Information Theory (ISIT), pages 1411–1415, 2016.
- [18] R. Tajeddine, O. W. Gnilke, D. Karpuk, R. Freij-Hollanti, and C. Hollanti. Private information retrieval from coded storage systems with colluding, byzantine, and unresponsive servers. IEEE Transactions on Information Theory, 65(6):3898–3906, 2019.
- [19] R. Tajeddine and S. E. Rouayheb. Robust private information retrieval on coded data. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 1903–1907, 2017.
- [20] V. Tomas, J. Rosenthal, and R. Smarandache. Decoding of convolutional codes over the erasure channel. IEEE Trans. Inform. Theory, 58(1):90–108, January 2012.
- [21] Yiwei Zhang and Gennian Ge. Private information retrieval from MDS coded databases with colluding servers under several variant models. 2017.