
Xiao Hu, University of Waterloo, Canada, xiaohu@uwaterloo.ca, https://orcid.org/0000-0002-7890-665X
Zhiang Wu, University of Waterloo, Canada, zhiang.wu@uwaterloo.ca, https://orcid.org/0009-0004-8647-1416
28th International Conference on Database Theory (ICDT 2025)

Optimal Oblivious Algorithms for Multi-way Joins

Xiao Hu    Zhiang Wu
Abstract

In cloud databases, cloud computation over sensitive data uploaded by clients inevitably causes concern about data security and privacy. Even when encryption primitives and trusted computing environments are integrated into query processing to safeguard the actual contents of the data, access patterns of algorithms can still leak private information about the data. Oblivious RAM (ORAM) and circuits are two generic approaches to address this issue, ensuring that access patterns of algorithms remain oblivious to the data. However, deploying these methods on insecure algorithms, particularly for multi-way join processing, is computationally expensive and inherently challenging.

In this paper, we propose a novel sorting-based algorithm for multi-way join processing that operates without relying on ORAM simulations or other security assumptions. Our algorithm is a non-trivial, provably oblivious composition of basic primitives, with time complexity matching the insecure worst-case optimal join algorithm, up to a logarithmic factor. Furthermore, it is cache-agnostic, with cache complexity matching the insecure lower bound, also up to a logarithmic factor. This clean and straightforward approach has the potential to be extended to other security settings and implemented in practical database systems.

keywords:
oblivious algorithms, multi-way joins, worst-case optimality

1 Introduction

In outsourced query processing, a client entrusts sensitive data to a cloud service provider, such as Amazon, Google, or Microsoft, and subsequently issues queries to the provider. The service provider performs the required computations and returns the results to the client. Since these computations are carried out on remote infrastructure, ensuring the security and privacy of query evaluation is a critical requirement. Specifically, servers must remain oblivious to any information about the underlying data throughout the computation process. To achieve this, advanced cryptographic techniques and trusted computing hardware are employed to prevent servers from inferring the actual contents of the data [34, 19]. However, the memory accesses during execution may still lead to information leakage, posing an additional challenge to achieving comprehensive privacy. For example, consider the basic (natural) join operator on two database instances: $R_1=\{(a_i,b_i):i\in[N]\}\Join S_1=\{(b_i,c_i):i\in[N]\}$ and $R_2=\{(a_i,b_1):i\in[N]\}\Join S_2=\{(b_1,c_i):i\in[N]\}$ for some $N\in\mathbb{Z}^+$, where a pair of tuples can be joined if and only if they have the same $b$-value. Suppose each relation is sorted by its $b$-values. Using the merge join algorithm, there is only one access to $S_1$ between two consecutive accesses to $R_1$, but there are $N$ accesses to $S_2$ between two consecutive accesses to $R_2$. Hence, the server can distinguish the degree information of the join keys by observing the sequence of memory accesses. Moreover, if the server counts the total number of memory accesses, it can further infer the number of join results of the input data.
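
To make this leakage concrete, the following toy Python sketch (our own illustration, not part of the formal model) simulates a textbook sort-merge join on the two instances above and records which relation each access touches. Both instances have the same size, yet the traces differ visibly: on $R_1\Join S_1$ the accesses to $S_1$ come in short runs, whereas on $R_2\Join S_2$ every access to $R_2$ is followed by $N$ consecutive accesses to $S_2$.

def merge_join_trace(R, S):
    """Sort-merge join on the shared b-attribute.  Returns the sequence of
    relation accesses ('R' or 'S') that an adversary observing untrusted
    memory could record, plus the join results."""
    trace, out, j = [], [], 0
    for a, b in R:                              # R is sorted by its b-values
        trace.append('R')
        while j < len(S) and S[j][0] < b:       # skip S-tuples with smaller keys
            trace.append('S'); j += 1
        k = j
        while k < len(S) and S[k][0] == b:      # scan the run of matching S-tuples
            trace.append('S'); out.append((a, b, S[k][1])); k += 1
    return trace, out

N = 4
R1 = [(f"a{i}", i) for i in range(N)]; S1 = [(i, f"c{i}") for i in range(N)]
R2 = [(f"a{i}", 0) for i in range(N)]; S2 = [(0, f"c{i}") for i in range(N)]
print(merge_join_trace(R1, S1)[0])   # S-accesses come in runs of length at most 2
print(merge_join_trace(R2, S2)[0])   # every R-access is followed by N S-accesses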

The notion of obliviousness was proposed to formally capture such a privacy guarantee on the memory access pattern of algorithms [31, 30]. This concept has inspired a substantial body of research focused on developing algorithms that achieve obliviousness in practical database systems [55, 24, 20, 17]. A generic approach to achieving obliviousness is Oblivious RAM (ORAM) [31, 41, 29, 52, 23, 48, 7], which translates each logical access into a poly-logarithmic (in terms of the data size) number of physical accesses to random locations of the memory, but this poly-logarithmic additional cost per memory access is very expensive in practice [15]. Another generic approach leverages circuits [53, 26]. Despite their theoretical promise, generating circuits is inherently complex and resource-intensive, and integrating such constructions into database systems often proves to be inefficient. These challenges highlight the advantages of designing algorithms that are inherently oblivious to the input data, eliminating the need for ORAM frameworks or circuit constructions.

In this paper, we take on this question for multi-way join processing and examine the insecure worst-case optimal join (WCOJ) algorithm [43, 44, 50], which can compute any join query in time proportional to the maximum number of join results. Our objective is to investigate the intrinsic properties of the WCOJ algorithm and transform it into an oblivious version while preserving its optimal complexity guarantee.

1.1 Problem Definition

Multi-way join. A (natural) join query can be represented as a hypergraph $\mathcal{Q}=(\mathcal{V},\mathcal{E})$ [1], where $\mathcal{V}$ models the set of attributes and $\mathcal{E}\subseteq 2^{\mathcal{V}}$ models the set of relations. Let $\mathrm{dom}(x)$ be the domain of attribute $x\in\mathcal{V}$. An instance of $\mathcal{Q}$ is a function $\mathcal{R}$ that maps each $e\in\mathcal{E}$ to a set of tuples $R_e$, where each tuple $t\in R_e$ specifies a value in $\mathrm{dom}(x)$ for each attribute $x\in e$. The result of a join query $\mathcal{Q}$ over an instance $\mathcal{R}$, denoted by $\mathcal{Q}(\mathcal{R})$, is the set of all combinations of tuples, one from each relation $R_e$, that share the same values in their common attributes, i.e.,

$$\mathcal{Q}(\mathcal{R})=\left\{t\in\prod_{x\in\mathcal{V}}\mathrm{dom}(x)\mid\forall e\in\mathcal{E},\ \exists t_e\in R_e,\ \pi_e t=t_e\right\}.$$

Let $N=\sum_{e\in\mathcal{E}}|R_e|$ be the input size of instance $\mathcal{R}$, i.e., the total number of tuples over all relations. Let $\mathrm{OUT}=|\mathcal{Q}(\mathcal{R})|$ be the output size of the join query $\mathcal{Q}$ over instance $\mathcal{R}$. We study the data complexity [1] of join algorithms by measuring their running time in terms of the input and output size of the instance. We consider the size of $\mathcal{Q}$, i.e., $|\mathcal{V}|$ and $|\mathcal{E}|$, as constant.

Model of computation.

We consider a two-level hierarchical memory model [40, 18]. The computation is performed within trusted memory, which consists of $M$ registers of the same width. For simplicity, we assume that the trusted memory size is $c\cdot M$, where $c$ is a constant; this assumption changes our results by at most a constant factor. Since we assume the query size is a constant, the arity of each relation is irrelevant. Each tuple is assumed to fit into a single register, with one register allocated per tuple, including those from input relations as well as intermediate results. We further assume that $c\cdot M$ tuples from any set of relations can fit into the trusted memory. Input data and all intermediate results generated during the execution are encrypted and stored in an untrusted memory of unlimited size. Both trusted and untrusted memory are divided into blocks of size $B$. One memory access moves a block of $B$ consecutive tuples from trusted to untrusted memory or vice versa. The complexity of an algorithm is measured by the number of such memory accesses.

An algorithm typically operates by repeating the following three steps: (1) read encrypted data from the untrusted memory into the trusted memory, (2) perform computation inside the trusted memory, and (3) encrypt the necessary data and write it back to the untrusted memory. Adversaries can only observe the addresses of the blocks read from or written to the untrusted memory in (1) and (3), but not the data contents. They also cannot interfere with the execution of the algorithm. The sequence of memory accesses to the untrusted memory during the execution is referred to as the “access pattern” of the algorithm. In this context, we focus on two specific scenarios of interest:

  • Random Access Model (RAM). This model can simulate the classic RAM model with $M=O(1)$ and $B=1$, where the trusted memory corresponds to $O(1)$ registers and the untrusted memory corresponds to the main memory. The time complexity in this model is defined as the number of accesses to the main memory by a RAM algorithm.

  • External Memory Model (EM). This model can naturally simulate the classic EM model [3, 51], where the trusted memory corresponds to the main memory and the untrusted memory corresponds to the disk. Following prior work [28, 21, 18], we focus on cache-agnostic EM algorithms, which are unaware of the values of $M$ (memory size) and $B$ (block size), a property commonly referred to as cache-oblivious in the literature. To avoid ambiguity, we use the term “cache-agnostic” to refer to “cache-oblivious” and “oblivious” to refer to “access-pattern-oblivious” throughout this paper. The advantages of cache-agnostic algorithms have been extensively studied, particularly in multi-level memory hierarchies: a cache-agnostic algorithm can seamlessly adapt to operate efficiently between any two adjacent levels of the hierarchy. We adopt the tall cache assumption, $M=\Omega(B^2)$ and further $M=\Omega(\log^{1+\epsilon}N)$ for an arbitrarily small constant $\epsilon\in(0,1)$, and the wide block assumption, $B=\Omega(\log^{0.55}N)$. (In this work, $\log(\cdot)$ always means $\log_2(\cdot)$ and should be distinguished from $\log_{\frac{M}{B}}(\cdot)$.) These are standard assumptions widely adopted in the literature on EM algorithms [3, 51, 6, 28, 21, 18]. The cache complexity in this model is defined as the number of accesses to the disk by an EM algorithm.

Oblivious Algorithms. The notion of obliviousness is defined based on the access pattern of an algorithm. Memory accesses to the trusted memory are invisible to the adversary and, therefore, have no impact on security. Let $\mathcal{A}$ be an algorithm, $\mathcal{Q}$ a join query, and $\mathcal{R}$ an arbitrary input instance of $\mathcal{Q}$. We denote by $\mathsf{Access}_{\mathcal{A}}(\mathcal{Q},\mathcal{R})$ the sequence of memory accesses made by $\mathcal{A}$ to the untrusted memory when given $(\mathcal{Q},\mathcal{R})$ as the input, where each memory access is a read or write operation associated with a physical address. The join query $\mathcal{Q}$ and the size $N$ of the input instance are considered non-sensitive information and can be safely exposed to the adversary. In contrast, all input tuples are considered sensitive information and should be hidden from adversaries. Thus, the access pattern of an oblivious algorithm $\mathcal{A}$ should depend only on $\mathcal{Q}$ and $N$, ensuring no leakage of sensitive information.

Definition 1.1 (Obliviousness [30, 31, 14]).

An algorithm $\mathcal{A}$ is oblivious for a join query $\mathcal{Q}$ if, given an arbitrary parameter $N\in\mathbb{Z}^+$, for any pair of instances $\mathcal{R},\mathcal{R}'$ of $\mathcal{Q}$ with input size $N$, $\mathsf{Access}_{\mathcal{A}}(\mathcal{Q},\mathcal{R})\overset{\delta}{\equiv}\mathsf{Access}_{\mathcal{A}}(\mathcal{Q},\mathcal{R}')$, where $\delta$ is a negligible function in terms of $N$: for any positive constant $c$, there exists $N_c$ such that $\delta(N)<\frac{1}{N^c}$ for any $N>N_c$. The notation $\overset{\delta}{\equiv}$ indicates that the statistical distance between the two distributions is at most $\delta$.

This notion of obliviousness applies to both deterministic and randomized algorithms. For a randomized algorithm, different execution states may arise from the same input instance due to the algorithm's inherent randomness. Each execution state corresponds to a specific sequence of memory accesses, allowing the access pattern to be modeled as a random variable with an associated probability distribution over the set of all possible access patterns. The statistical distance between two probability distributions is typically quantified using standard metrics, such as the total variation distance. A randomized algorithm is thus oblivious if its access pattern exhibits statistically indistinguishable distributions across all input instances of the same size. More simply, a deterministic algorithm is oblivious if it displays an identical access pattern for all input instances of the same size.

1.2 Review of Existing Results

Oblivious RAM. ORAM is a general randomized framework designed to protect access patterns [31]. In ORAM, each logical access is translated into a poly-logarithmic number of random physical accesses, thereby incurring a poly-logarithmic overhead. Goldreich et al. [31] established a lower bound of $\Omega(\log N)$ on the access overhead of ORAMs in the RAM model. Subsequently, Asharov et al. [7] proposed a theoretically optimal ORAM construction with an overhead of $O(\log N)$ in the RAM model under the assumption of the existence of a one-way function, which is rather impractical [47]. It remains unknown whether a cache complexity better than $O(\log N)$ can be shown for such a construction. Path ORAM [48] is currently the most practical ORAM construction, but it introduces an $O(\log^2 N)$ overhead and requires $\Omega(1)$ trusted memory. In the EM model, one can place the tree data structures for ORAM in a van Emde Boas layout, resulting in a memory access overhead of $O(\log N\cdot\log_B N)$.

Insecure Join Algorithms.

The WCOJ algorithm [43] has been developed to compute any join query in $O(N^{\rho^*})$ time (a hashing-based algorithm achieves $O(N^{\rho^*})$ time in the worst case using the lazy array technique [27]), where $\rho^*$ is the fractional edge cover number of the join query (formally defined in Section 2.1). The optimality is implied by the AGM bound [8]: the maximum number of join results produced by any instance of input size $N$ is $O(N^{\rho^*})$, which is also tight in the sense that there exists some instance of input size $N$ that produces $\Theta(N^{\rho^*})$ join results. However, these WCOJ algorithms are not oblivious. In Section 4, we use the triangle join as an example to illustrate the information leakage from the WCOJ algorithm. Another line of research has explored output-sensitive join algorithms. A join query can be computed in $O((N^{\mathsf{subw}}+\mathrm{OUT})\cdot\mathsf{polylog}\,N)$ time [54, 2], where $\mathsf{subw}$ is the submodular width of the join query. For example, $\mathsf{subw}=1$ if and only if the join query is acyclic [11, 25]. These algorithms are also not oblivious due to various potential information leakages. For instance, the total number of memory accesses is influenced by the output size, which can range from a constant to a polynomially large value relative to the input size. A possible mitigation strategy is worst-case padding, which involves padding dummy accesses to match the worst case. However, this approach does not necessarily result in oblivious algorithms, as their access patterns may still vary significantly across instances with the same input size.

In contrast, there has been significantly less research on multi-way join processing in the EM model. First of all, we note that an EM version of the WCOJ algorithm incurs at least $\Omega\left(\frac{N^{\rho^*}}{B}\right)$ cache complexity, since there are $\Theta(N^{\rho^*})$ join results in the worst case and all join results must be written back to disk. For the basic two-way join, the nested-loop algorithm has cache complexity $O\left(\frac{N^2}{B}\right)$ and the sort-merge algorithm has cache complexity $O\left(\frac{N}{B}\log_{\frac{M}{B}}\frac{N}{B}+\frac{\mathrm{OUT}}{B}\right)$. For multi-way join queries, an EM algorithm with cache complexity $O\left(\frac{N^{\rho^*}}{M^{\rho^*-1}B}\cdot\log_{\frac{M}{B}}\frac{N}{B}+\frac{\mathrm{OUT}}{B}\right)$ has been achieved for Berge-acyclic joins [37], $\alpha$-acyclic joins [36, 39], graph joins [38, 22], and Loomis-Whitney joins [39]. (Some of these algorithms were developed for the Massively Parallel Computation (MPC) model [10] and can be adapted to the EM model through the MPC-to-EM reduction [39].) These results were previously stated without the output-dependent term $\frac{\mathrm{OUT}}{B}$ since they do not consider the cost of writing join results back to disk. Again, even after padding the output size to the worst case, these algorithms remain non-oblivious since their access patterns heavily depend on the input data. Furthermore, even in the insecure setting, no algorithm with cache complexity $O\left(\frac{N^{\rho^*}}{B}\right)$ is known for general join queries.

Oblivious Join Algorithms.

Oblivious algorithms have been studied for join queries in both the RAM and EM models. In the RAM model, the naive nested-loop algorithm can be transformed into an oblivious one by incorporating some dummy writes, as it enumerates all possible combinations of tuples from the input relations in a fixed order. This algorithm runs in $O(N^{|\mathcal{E}|})$ time, where $|\mathcal{E}|$ is the number of relations in the join query. Wang et al. [53] designed circuits for conjunctive queries (capturing all join queries as a special case) whose time complexity matches the AGM bound up to poly-logarithmic factors. Running such a circuit automatically yields an oblivious join algorithm with $O\left(N^{\rho^*}\cdot\mathsf{polylog}\,N\right)$ time complexity. By integrating the insecure WCOJ algorithm [44] with the optimal ORAM [7], it is possible to achieve an oblivious algorithm with $O(N^{\rho^*}\cdot\log N)$ time complexity, albeit under restrictive theoretical assumptions. Alternatively, incorporating the insecure WCOJ algorithm into Path ORAM yields an oblivious join algorithm with $O\left(N^{\rho^*}\cdot\log^2 N\right)$ time complexity.

In the EM model, He et al. [35] proposed a cache-agnostic nested-loop join algorithm for the basic two-way join RSR\Join S with O(|R||S|B)O\left(\frac{|R|\cdot|S|}{B}\right) cache complexity, which is also oblivious. Applying worst-case padding and the optimal ORAM construction to the existing EM join algorithms, we can derive an oblivious join algorithm with O(NρBlogMBNBlogN)O\left(\frac{N^{\rho^{*}}}{B}\cdot\log_{\frac{M}{B}}\frac{N}{B}\cdot\log N\right) cache complexity for specific cases such as acyclic joins, graph joins and Loomis-Whitney joins. However, these algorithms are not cache-agnostic. For general join queries, no specific oblivious algorithm has been proposed for the EM model, aside from results derived from the oblivious RAM join algorithm. These results yield cache complexities of either O(NρlogN)O\left(N^{\rho^{*}}\cdot\log N\right) or O(NρlogNlogBN)O\left(N^{\rho^{*}}\cdot\log N\cdot\log_{B}N\right), as they rely heavily on retrieving tuples from hash tables or range search indices.

RAM model. Previous: $O\left(N^{\rho^{*}}\cdot\log N\right)$ [44, 7] (one-way function assumption). New: $O\left(N^{\rho^{*}}\cdot\log N\right)$ (no assumption).
Cache-agnostic EM model. Previous: $O\left(\frac{N^{\min\{\rho^{*}+1,\rho\}}}{B}\cdot\log_{\frac{M}{B}}\frac{N^{\min\{\rho^{*}+1,\rho\}}}{B}\right)$ (no assumption). New: $O\left(\frac{N^{\rho^{*}}}{B}\cdot\log_{\frac{M}{B}}\frac{N^{\rho^{*}}}{B}\right)$ (tall cache and wide block assumptions).
Table 1: Comparison between previous and new oblivious algorithms for multi-way joins. $N$ is the input size, $\rho^{*}$ and $\rho$ are the input join query's fractional and integral edge cover numbers, respectively, $M$ is the trusted memory size, and $B$ is the block size.

Relaxed Variants of Oblivious Join Algorithms.

Beyond fully oblivious algorithms, researchers have explored relaxed notions of obliviousness by allowing specific types of leakage, such as the join size, the multiplicity of join values, and the size of intermediate results. One relevant line of work examines join processing with released input and output sizes. For example, integrating an insecure output-sensitive join algorithm into an ORAM framework produces a relaxed oblivious algorithm with $O\left((N^{\mathsf{subw}}+\mathrm{OUT})\cdot\mathrm{polylog}\,N\right)$ time complexity. Relaxed oblivious algorithms with $O((N+\mathrm{OUT})\cdot\log N)$ time complexity have been proposed without requiring ORAM [5, 40] for the basic two-way join as well as acyclic joins. Although not fully oblivious, these algorithms serve as fundamental building blocks for developing our oblivious algorithms for general join queries. Another line of work considered differentially oblivious algorithms [14, 12, 18], which require only that access patterns appear similar across neighboring input instances. However, differentially oblivious algorithms have so far been limited to the basic two-way join [18]. This paper does not pursue this direction further.

1.3 Our Contribution

Our main contribution can be summarized as follows (see Table 1):

  • We give a nested-loop-based algorithm for general join queries with $O\left(N^{\min\{\rho^*+1,\rho\}}\cdot\log N\right)$ time complexity and $O\left(\frac{N^{\min\{\rho^*+1,\rho\}}}{B}\cdot\log_{\frac{M}{B}}\frac{N^{\min\{\rho^*+1,\rho\}}}{B}\right)$ cache complexity, where $\rho^*$ and $\rho$ are the fractional and integral edge cover numbers of the join query, respectively (formally defined in Section 2.1). This algorithm is also cache-agnostic. For classes of join queries with $\rho^*=\rho$, such as acyclic joins, even-length cycle joins, and boat joins (see Section 3), this is optimal up to logarithmic factors.

  • We design an oblivious algorithm for general join queries with $O\left(N^{\rho^*}\cdot\log N\right)$ time complexity, which matches the insecure counterpart up to a logarithmic factor and recovers the previous ORAM-based result, which assumes the existence of one-way functions. This algorithm is also cache-agnostic, with $O\left(\frac{N^{\rho^*}}{B}\cdot\log_{\frac{M}{B}}\frac{N^{\rho^*}}{B}\right)$ cache complexity. This cache complexity simplifies to $O\left(\frac{N^{\rho^*}}{B}\cdot\log_{\frac{M}{B}}\frac{N}{B}\right)$ when $B<N^{\frac{c-\rho^*}{c-1}}$ for some sufficiently large constant $c$. This result also establishes the first worst-case near-optimal join algorithm in the insecure EM model when all join results are returned to disk.

  • We develop an improved algorithm for relaxed two-way joins with better cache complexity, which is also cache-agnostic. By integrating our oblivious algorithm with generalized hypertree decompositions [33], we obtain a relaxed oblivious algorithm for general join queries with $O\left((N^{\mathsf{fhtw}}+\mathrm{OUT})\cdot\log N\right)$ time complexity and $O\left(\frac{N^{\mathsf{fhtw}}+\mathrm{OUT}}{B}\cdot\log_{\frac{M}{B}}\frac{N^{\mathsf{fhtw}}+\mathrm{OUT}}{B}\right)$ cache complexity, where $\mathsf{fhtw}$ is the fractional hypertree width of the input query.

Roadmap. This paper is organized as follows. In Section 2, we introduce the preliminaries for building our algorithms. In Section 3, we present our first algorithm, based on the nested-loop algorithm. While effective, this algorithm is not always optimal, as demonstrated by the triangle join. In Section 4, we use the triangle join to demonstrate the leakage of the insecure WCOJ algorithm and show how to transform it into an oblivious algorithm. We introduce our oblivious WCOJ algorithm for general join queries in Section 5, and conclude in Section 6.

2 Preliminaries

2.1 Fractional and Integral Edge Cover Number

For a join query $\mathcal{Q}=(\mathcal{V},\mathcal{E})$, a function $W:\mathcal{E}\to[0,1]$ is a fractional edge cover for $\mathcal{Q}$ if $\sum_{e\in\mathcal{E}:x\in e}W(e)\geq 1$ for any $x\in\mathcal{V}$. An optimal fractional edge cover is one minimizing $\sum_{e\in\mathcal{E}}W(e)$, which is captured by the following linear program:

$$\min\ \sum_{e\in\mathcal{E}}W(e)\quad\textrm{s.t.}\quad\sum_{e\in\mathcal{E}:x\in e}W(e)\geq 1,\ \forall x\in\mathcal{V};\quad W(e)\in[0,1],\ \forall e\in\mathcal{E}\qquad(1)$$

The optimal value of (1) is the fractional edge cover number of $\mathcal{Q}$, denoted as $\rho^*(\mathcal{Q})$. Similarly, a function $W:\mathcal{E}\to\{0,1\}$ is an integral edge cover if $\sum_{e\in\mathcal{E}:x\in e}W(e)\geq 1$ for any $x\in\mathcal{V}$. The optimal integral edge cover is the one minimizing $\sum_{e\in\mathcal{E}}W(e)$, which can be captured by a linear program similar to (1), except that $W(e)\in[0,1]$ is replaced with $W(e)\in\{0,1\}$. The optimal value of this program is the integral edge cover number of $\mathcal{Q}$, denoted as $\rho(\mathcal{Q})$.
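
As a small illustration of these two quantities, the sketch below (our own, assuming SciPy is available) solves LP (1) for $\rho^*$ and enumerates all 0/1 covers for $\rho$; on the triangle query of Section 4 it returns $\rho^*=1.5$ and $\rho=2$.

from itertools import product
from scipy.optimize import linprog

def edge_cover_numbers(attributes, relations):
    """relations: list of attribute sets. Returns (rho*, rho)."""
    m = len(relations)
    # LP (1): minimize sum_e W(e)  s.t.  sum_{e: x in e} W(e) >= 1 for all x.
    c = [1.0] * m
    A_ub = [[-1.0 if x in e else 0.0 for e in relations] for x in attributes]
    b_ub = [-1.0] * len(attributes)
    lp = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * m, method="highs")
    rho_frac = lp.fun
    # Integral edge cover number: brute force over all 0/1 assignments.
    rho_int = min(sum(w) for w in product([0, 1], repeat=m)
                  if all(any(w[i] and x in e for i, e in enumerate(relations))
                         for x in attributes))
    return rho_frac, rho_int

# Triangle query: R1(x2,x3) joined with R2(x1,x3) and R3(x1,x2)
print(edge_cover_numbers(["x1", "x2", "x3"],
                         [{"x2", "x3"}, {"x1", "x3"}, {"x1", "x2"}]))
# expected output: (1.5, 2)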

2.2 Oblivious Primitives

We introduce the following oblivious primitives, which form the foundation of our algorithms. Each primitive displays an identical access pattern across instances of the same input size.

Linear Scan.

Given an input array of $N$ elements, a linear scan of all elements can be done with $O(N)$ time complexity and $O\left(\frac{N}{B}\right)$ cache complexity in a cache-agnostic way.

Sort [4, 9].

Given an input array of $N$ elements, the goal is to output the array sorted according to some predetermined ordering. The classical bitonic sorting network [9] requires $O(N\cdot\log^2 N)$ time. This time complexity was later improved to $O(N\cdot\log N)$ [4] in 1983. However, due to the large constant hidden in the $O(N\cdot\log N)$ bound, the classical bitonic sorting network is more commonly used in practice, particularly when the size $N$ is not too large. Ramachandran and Shi [45] showed a randomized algorithm for sorting with $O(N\cdot\log N)$ time complexity and $O\left(\frac{N}{B}\log_{\frac{M}{B}}\frac{N}{B}\right)$ cache complexity under the tall cache assumption.
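
For intuition on why sorting networks are oblivious, here is a minimal in-memory sketch of the bitonic sorting network [9] (array length assumed to be a power of two): the positions compared at every step depend only on the length of the array, never on its contents, so the induced access pattern is identical across all inputs of the same size.

def bitonic_sort(a):
    """In-place bitonic sorting network. The set of index pairs that are
    compared depends only on len(a), not on the values."""
    n = len(a)
    assert n > 0 and (n & (n - 1)) == 0, "length must be a power of two"
    k = 2
    while k <= n:                         # size of the bitonic blocks
        j = k // 2
        while j >= 1:                     # compare-exchange distance
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    up = (i & k) == 0     # sort this block ascending?
                    if (up and a[i] > a[partner]) or (not up and a[i] < a[partner]):
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

print(bitonic_sort([7, 3, 1, 8, 5, 2, 6, 4]))   # [1, 2, 3, 4, 5, 6, 7, 8]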

Compact [32, 46].

Given an input array of $N$ elements, some of which are distinguished as $\perp$, the goal is to output an array with all non-distinguished elements moved to the front before any $\perp$, while preserving the ordering of non-distinguished elements. Lin et al. [42] showed a randomized algorithm for compaction with $O(N\cdot\log\log N)$ time complexity and $O\left(\frac{N}{B}\right)$ cache complexity under the tall cache assumption.

We use the above primitives to construct additional building blocks for our algorithms. To ensure obliviousness, the output of each of these building blocks has a fixed size equal to the worst case, i.e., $N$, comprising both real and dummy elements. All these building blocks achieve $O(N\cdot\log N)$ time complexity and $O\left(\frac{N}{B}\cdot\log_{\frac{M}{B}}\frac{N}{B}\right)$ cache complexity. Further details are provided in Appendix B.

SemiJoin.

Given two input relations RR, SS of at most NN tuples and their common attribute(s) xx, the goal is to output the set of tuples in RR that can be joined with at least one tuple in SS.

Project.

Given an input relation $R$ of $N$ tuples defined over attributes $e$, and a subset of attributes $x\subseteq e$, the goal is to output $\{\pi_x t:t\in R\}$, ensuring no duplication.

Intersect.

Given two input arrays R,SR,S of at most NN elements, the goal is to output RSR\cap S.

Augment.

Given a relation $R$ and $k$ additional relations $S_1,S_2,\cdots,S_k$ (each with at most $N$ tuples) sharing common attribute(s) $x$, the goal is to attach to each tuple $t\in R$ the number of tuples in $S_i$ (for each $i\in[k]$) that can be joined with $t$ on $x$.

We note that any sequential composition of oblivious primitives yields a more complex algorithm that remains oblivious, which is the key principle underlying our approach.
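
As one concrete illustration of this composition principle, the sketch below realizes SemiJoin as a fixed pipeline of sort, linear scan, and compaction. Python's built-in sort and list filtering stand in for the oblivious Sort and Compact primitives; in a genuinely oblivious implementation each stage would be replaced by the corresponding primitive above, while the sequence of stages and the fixed output size would stay the same.

DUMMY = None   # stands for the dummy element

def semi_join(R, S, key):
    """SemiJoin(R, S): keep the tuples of R that join with at least one tuple
    of S on `key`, padded with dummies to the fixed size |R|."""
    # 1. Tag tuples with their origin and sort by (join key, origin),
    #    so every S-tuple precedes the R-tuples with the same key.
    tagged = [(t[key], 0, t) for t in S] + [(t[key], 1, t) for t in R]
    tagged.sort(key=lambda x: (x[0], x[1]))          # stand-in for oblivious Sort
    # 2. Linear scan: propagate a "key seen in S" flag.
    out, seen_key = [], object()
    for k, origin, t in tagged:
        if origin == 0:
            seen_key = k
        else:
            out.append(t if k == seen_key else DUMMY)
    # 3. Compact: move real tuples to the front, keep fixed length |R|.
    real = [t for t in out if t is not DUMMY]        # stand-in for oblivious Compact
    return real + [DUMMY] * (len(R) - len(real))

R = [{"b": 1, "a": "a1"}, {"b": 2, "a": "a2"}, {"b": 3, "a": "a3"}]
S = [{"b": 2, "c": "c1"}, {"b": 2, "c": "c2"}]
print(semi_join(R, S, "b"))   # [{'b': 2, 'a': 'a2'}, None, None]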

2.3 Oblivious Two-way Join

NestedLoop. The nested-loop algorithm computes $R\Join S$ with $O(|R|\cdot|S|)$ time complexity: it iterates over all combinations of tuples from $R$ and $S$ and writes a join result (or a dummy result, if necessary, to maintain obliviousness) for each combination. He et al. [35] proposed a cache-agnostic version in the EM model with $O\left(\frac{|R|\cdot|S|}{B}\right)$ cache complexity, which is also oblivious.

Theorem 2.1 ([35]).

For $R\Join S$, there is a cache-agnostic algorithm that can compute $R\Join S$ with $O\left(|R|\cdot|S|\right)$ time complexity and $O\left(\frac{|R|\cdot|S|}{B}\right)$ cache complexity, whose access pattern only depends on $M$, $B$, $|R|$, and $|S|$.

RelaxedTwoWay. The relaxed two-way join algorithm [5, 40] takes as input two relations $R,S$ and a parameter $\tau\geq|R\Join S|$, and outputs a table of $\tau$ elements containing the join results of $R\Join S$, whose access pattern only depends on $|R|$, $|S|$, and $\tau$. This algorithm can also be easily transformed into a cache-agnostic version with $O((|R|+|S|+\tau)\cdot\log(|R|+|S|+\tau))$ time complexity and $O\left(\frac{|R|+|S|+\tau}{B}\cdot\log\tau\right)$ cache complexity. In Appendix C, we show how to improve this algorithm to obtain better cache complexity without sacrificing the time complexity.

Theorem 2.2.

For $R\Join S$ and a parameter $\tau\geq|R\Join S|$, there is a cache-agnostic algorithm that can compute $R\Join S$ with $O\left((|R|+|S|+\tau)\cdot\log(|R|+|S|+\tau)\right)$ time complexity and $O\left(\frac{|R|+|S|+\tau}{B}\cdot\log_{\frac{M}{B}}\frac{|R|+|S|+\tau}{B}\right)$ cache complexity under the tall cache and wide block assumptions, whose access pattern only depends on $M$, $B$, $|R|$, $|S|$, and $\tau$.
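
The sketch below captures only the interface of RelaxedTwoWay that later algorithms rely on: given $R$, $S$, and a bound $\tau\geq|R\Join S|$, it returns a table of exactly $\tau$ entries, the real join results followed by dummies. The body computes the join non-obliviously and is only a placeholder; the constructions of [5, 40] and Appendix C realize the same contract with an access pattern depending only on $|R|$, $|S|$, and $\tau$.

DUMMY = None

def relaxed_two_way(R, S, key, tau):
    """Output table has exactly tau entries: the results of the join of R and S
    on `key`, padded with dummies.  Non-oblivious placeholder for the primitive."""
    results = [{**r, **s} for r in R for s in S if r[key] == s[key]]
    assert len(results) <= tau, "tau must upper-bound the join size"
    return results + [DUMMY] * (tau - len(results))

R = [{"b": 1, "a": "a1"}, {"b": 2, "a": "a2"}]
S = [{"b": 2, "c": "c1"}, {"b": 2, "c": "c2"}]
print(relaxed_two_way(R, S, "b", tau=4))
# [{'b': 2, 'a': 'a2', 'c': 'c1'}, {'b': 2, 'a': 'a2', 'c': 'c2'}, None, None]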

3 Beyond Oblivious Nested-loop Join

Although the nested-loop join algorithm is described for the two-way join, it can be extended to multi-way joins. For a general join query with kk relations, the nested-loop primitive can be recursively invoked k1k-1 times, resulting in an oblivious algorithm with O(NkB)O\left(\frac{N^{k}}{B}\right) cache complexity. Careful inspection reveals that we do not necessarily feed all input relations into the nested loop; instead, we can restrict enumeration to combinations of tuples from relations included in an integral edge cover of the join query. Recall that for 𝒬=(𝒱,)\mathcal{Q}=(\mathcal{V},\mathcal{E}), an integral edge cover of 𝒬\mathcal{Q} is a function W:{0,1}W:\mathcal{E}\to\{0,1\}, such that e:xeW(e)1\sum_{e:x\in e}W(e)\geq 1 holds for every x𝒱x\in\mathcal{V}. While enumerating combinations of tuples from relations “chosen” by WW, we can apply semi-joins using remaining relations to filter intermediate join results.

As described in Algorithm 1, it first chooses an optimal integral edge cover WW^{*} of 𝒬\mathcal{Q} (line 1), and then invokes the NestedLoop primitive to iteratively compute the combinations of tuples from relations with W(e)=1W^{*}(e)=1 (line 7), whose output is denoted as LL. Meanwhile, we apply the semi-join between LL and the remaining relations (line 8).

Below, we analyze the complexity of this algorithm. First, as $|\mathcal{E}'|\leq\rho$, the number of intermediate join results materialized in the while-loop is at most $O(N^{\rho})$. After semi-join filtering, the number of surviving results is at most $O(N^{\rho^*})$, by the AGM bound applied to the subquery induced by the attributes covered so far. Hence, the number of intermediate results materialized by line 7 is at most $O(N^{\rho^*+1})$. Putting everything together, we obtain:

Theorem 3.1.

For a general join query $\mathcal{Q}$, there is an oblivious and cache-agnostic algorithm that can compute $\mathcal{Q}(\mathcal{R})$ for an arbitrary instance $\mathcal{R}$ of input size $N$ with $O\left(N^{\min\{\rho,\rho^*+1\}}\right)$ time complexity and $O\left(\frac{N^{\min\{\rho,\rho^*+1\}}}{B}\cdot\log_{\frac{M}{B}}\frac{N^{\min\{\rho,\rho^*+1\}}}{B}\right)$ cache complexity under the tall cache and wide block assumptions, where $\rho^*$ and $\rho$ are the optimal fractional and integral edge cover numbers of $\mathcal{Q}$, respectively.

It is important to note that any oblivious join algorithm incurs a cache complexity of Ω(NρB)\Omega\left(\frac{N^{\rho^{*}}}{B}\right), so Theorem 3.1 is optimal up to a logarithmic factor for join queries where ρ=ρ\rho=\rho^{*}. Below, we list several important classes of join queries that exhibit this desirable property:

1 $W^{*}\leftarrow$ an optimal integral edge cover of $\mathcal{Q}$, $L\leftarrow\emptyset$;
2 $\mathcal{E}^{\prime}\leftarrow\{e\in\mathcal{E}:W^{*}(e)=1\}$;
3 while $\mathcal{E}^{\prime}\neq\emptyset$ do
4       $e\leftarrow$ an arbitrary relation in $\mathcal{E}^{\prime}$;
5       $\mathcal{E}^{\prime}\leftarrow\mathcal{E}^{\prime}-\{e\}$;
6       if $L=\emptyset$ then $L\leftarrow R_{e}$;
7       else $L\leftarrow\textsc{NestedLoop}(L,R_{e})$;
8       foreach $e^{\prime}\in\mathcal{E}-\{e\}$ do $L\leftarrow\textsc{SemiJoin}(L,R_{e^{\prime}})$;
9 return $L$;
Algorithm 1 $\textsc{ObliviousNestedLoopJoin}(\mathcal{Q},\mathcal{R})$
Example 3.2 (α\alpha-acyclic Join).

A join query 𝒬\mathcal{Q} is α\alpha-acyclic [11, 25] if there is a tree structure 𝒯\mathcal{T} of 𝒬=(𝒱,)\mathcal{Q}=(\mathcal{V},\mathcal{E}) such that (1) there is a one-to-one correspondence between relations in 𝒬\mathcal{Q} and nodes in 𝒯\mathcal{T}; (2) for every attribute x𝒱x\in\mathcal{V}, the set of nodes containing xx form a connected subtree of 𝒯\mathcal{T}. Any α\alpha-acyclic join admits an optimal fractional edge cover that is integral [36].

Example 3.3 (Even-length Cycle Join).

An even-length cycle join is defined as 𝒬=R1(x1,x2)R2(x2,x3)Rk1(xk1,xk)Rk(xk,x1)\mathcal{Q}=R_{1}(x_{1},x_{2})\Join R_{2}(x_{2},x_{3})\Join\cdots\Join R_{k-1}(x_{k-1},x_{k})\Join R_{k}(x_{k},x_{1}) for some even integer kk. It has two integral edge covers {R1,R3,,Rk1}\{R_{1},R_{3},\cdots,R_{k-1}\} and {R2,R4,,Rk}\{R_{2},R_{4},\cdots,R_{k}\}, both of which are also an optimal fractional edge cover. Hence, ρ=ρ=k2\rho^{*}=\rho=\frac{k}{2}.

Example 3.4 (Boat Join).

A boat join is defined as 𝒬=R1(x1,y1)R2(x2,y2)Rk(xk,yk)Rk+1(x1,x2,,xk)Rk+2(y1,y2,,yk)\mathcal{Q}=R_{1}(x_{1},y_{1})\Join R_{2}(x_{2},y_{2})\Join\cdots\Join R_{k}(x_{k},y_{k})\Join R_{k+1}(x_{1},x_{2},\cdots,x_{k})\Join R_{k+2}(y_{1},y_{2},\cdots,y_{k}). It has an integral edge cover {R1,R2}\{R_{1},R_{2}\} that is also an optimal fractional edge cover. Hence, ρ=ρ=2\rho^{*}=\rho=2.
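
As a quick sanity check, the edge_cover_numbers sketch from Section 2.1 confirms these values on small instances of the cycle and boat examples (a hypothetical usage, assuming that helper is in scope):

# 4-cycle R1(x1,x2), R2(x2,x3), R3(x3,x4), R4(x4,x1): rho* = rho = 2
print(edge_cover_numbers(["x1", "x2", "x3", "x4"],
                         [{"x1", "x2"}, {"x2", "x3"}, {"x3", "x4"}, {"x4", "x1"}]))
# Boat join with k = 2: R1(x1,y1), R2(x2,y2), R3(x1,x2), R4(y1,y2): rho* = rho = 2
print(edge_cover_numbers(["x1", "x2", "y1", "y2"],
                         [{"x1", "y1"}, {"x2", "y2"}, {"x1", "x2"}, {"y1", "y2"}]))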

4 Warm Up: Triangle Join

The simplest join query that the oblivious nested-loop join algorithm cannot solve optimally is the triangle join $\mathcal{Q}_{\triangle}=R_1(x_2,x_3)\Join R_2(x_1,x_3)\Join R_3(x_1,x_2)$, which has $\rho=2$ and $\rho^*=\frac{3}{2}$. While various worst-case optimal algorithms for the triangle join have been proposed in the RAM model, none of them is oblivious due to their inherent leakage of intermediate statistics. Below, we outline the issues with existing insecure algorithms and propose a strategy to make them oblivious.

Insecure Triangle Join Algorithm 2.

We start with attribute $x_1$. Each value $a\in\mathrm{dom}(x_1)$ induces a subquery $\mathcal{Q}_a=R_1\Join(\sigma_{x_1=a}R_2)\Join(\sigma_{x_1=a}R_3)$. Moreover, a value $a\in\mathrm{dom}(x_1)$ is heavy if $|\pi_{x_3}\sigma_{x_1=a}R_2|\cdot|\pi_{x_2}\sigma_{x_1=a}R_3|$ is greater than $|R_1|$, and light otherwise. If $a$ is light, $\mathcal{Q}_a$ is computed by materializing the Cartesian product between $\pi_{x_3}\sigma_{x_1=a}R_2$ and $\pi_{x_2}\sigma_{x_1=a}R_3$, and then filtering the intermediate result by a semi-join with $R_1$. Every surviving tuple forms a join result with $a$, which is written back to untrusted memory. If $a$ is heavy, $\mathcal{Q}_a$ is computed by applying semi-joins of $R_1$ with $\sigma_{x_1=a}R_2$ and $\sigma_{x_1=a}R_3$. This algorithm achieves a time complexity of $O(N^{\frac{3}{2}})$ (see [43] for a detailed analysis), but it leaks sensitive information through the following mechanisms:

  • |(πx1R2)(πx1R3)|\left|(\pi_{x_{1}}R_{2})\cap(\pi_{x_{1}}R_{3})\right| is leaked by the number of for-loop iterations in line 2;

  • |πx2σx1=aR3|\left|\pi_{x_{2}}\sigma_{x_{1}=a}R_{3}\right| and |πx3σx1=aR2|\left|\pi_{x_{3}}\sigma_{x_{1}=a}R_{2}\right| are leaked by the number of for-loop iterations in line 4;

  • The numbers of heavy and light values in $(\pi_{x_1}R_2)\cap(\pi_{x_1}R_3)$ are leaked by the if-else condition in lines 3 and 6.

To protect intermediate statistics, we could pad every intermediate result (such as $(\pi_{x_1}R_2)\cap(\pi_{x_1}R_3)$, $\pi_{x_3}\sigma_{x_1=a}R_2$, and $\pi_{x_2}\sigma_{x_1=a}R_3$) with dummy tuples to match the worst-case size $N$. To hide heavy and light values, we could replace the conditional if-else branches with a unified execution plan that visits every possible combination in $\left(\pi_{x_2}\sigma_{x_1=a}R_3\right)\times\left(\pi_{x_3}\sigma_{x_1=a}R_2\right)$ and every tuple of $R_1$. However, integrating these techniques leads to $N^2$ memory accesses, destroying the power of two choices that is a key advantage of the insecure WCOJ algorithm.

1 $L\leftarrow\emptyset$;
2 foreach $a\in\left(\pi_{x_{1}}R_{2}\right)\cap\left(\pi_{x_{1}}R_{3}\right)$ do
3       if $|\sigma_{x_{1}=a}R_{2}|\cdot|\sigma_{x_{1}=a}R_{3}|\leq|R_{1}|$ then
4             foreach $(b,c)\in\left(\pi_{x_{2}}\sigma_{x_{1}=a}R_{3}\right)\times\left(\pi_{x_{3}}\sigma_{x_{1}=a}R_{2}\right)$ do
5                   if $(b,c)\in R_{1}$ then write $(a,b,c)$ to $L$;
6       else
7             foreach $(b,c)\in R_{1}$ do
8                   if $(a,b)\in R_{3}$ and $(a,c)\in R_{2}$ then write $(a,b,c)$ to $L$;
9 return $L$;
Algorithm 2 Compute $\mathcal{Q}_{\triangle}$ by power of two choices
1 $A\leftarrow\left(\pi_{x_{1}}R_{2}\right)\cap\left(\pi_{x_{1}}R_{3}\right)$ by Project and Intersect;
2 $A\leftarrow\textsc{Augment}(A,\{R_{2},R_{3}\},x_{1})$;
3 $A_{1},A_{2}\leftarrow\emptyset$;
4 while read $(a,\Delta_{1},\Delta_{2})$ from $A$ do // $\Delta_{1}=|\pi_{x_{3}}\sigma_{x_{1}=a}R_{2}|$ and $\Delta_{2}=|\pi_{x_{2}}\sigma_{x_{1}=a}R_{3}|$
5       if $\Delta_{1}\cdot\Delta_{2}\leq|R_{1}|$ then write $a$ to $A_{1}$, write $\perp$ to $A_{2}$;
6       else write $a$ to $A_{2}$, write $\perp$ to $A_{1}$;
7 $L_{1}\leftarrow\textsc{RelaxedTwoWay}(A_{2},R_{1},N^{\frac{3}{2}})$;
8 $L_{1}\leftarrow\textsc{SemiJoin}(L_{1},R_{2})$, $L_{1}\leftarrow\textsc{SemiJoin}(L_{1},R_{3})$;
9 $R_{2}\leftarrow\textsc{SemiJoin}(R_{2},A_{1})$, $R_{3}\leftarrow\textsc{SemiJoin}(R_{3},A_{1})$;
10 $L_{2}\leftarrow\textsc{RelaxedTwoWay}(R_{2},R_{3},N^{\frac{3}{2}})$;
11 $L_{2}\leftarrow\textsc{SemiJoin}(L_{2},R_{1})$;
12 return Compact $L_{1}\cup L_{2}$ while keeping the first $N^{\frac{3}{2}}$ tuples;
Algorithm 3 Inject Obliviousness to Algorithm 2

Inject Obliviousness to Algorithm 2.

To inject obliviousness into Algorithm 2, Algorithm 3 leverages oblivious primitives to ensure the same access pattern across all instances of the same input size. We start with computing $A=\left(\pi_{x_1}R_2\right)\cap\left(\pi_{x_1}R_3\right)$ by the Project and Intersect primitives. Then, we partition the values in $A$ into two subsets $A_1,A_2$, depending on the relative order between $|\pi_{x_3}\sigma_{x_1=a}R_2|\cdot|\pi_{x_2}\sigma_{x_1=a}R_3|$ and $|R_1|$. We next compute the two-way joins $A_2\Join R_1$ and $(R_2\ltimes A_1)\Join(R_3\ltimes A_1)$ by invoking the RelaxedTwoWay primitive separately, each with the upper bound $N^{\frac{3}{2}}$. At last, we filter intermediate join results by the SemiJoin primitive and remove unnecessary dummy tuples by the Compact primitive.
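
The only data-dependent decision in Algorithm 3 is the light/heavy split inside the while-loop, and it is hidden by writing exactly one entry (real or dummy) to each of $A_1$ and $A_2$ per value read from $A$. A minimal sketch of that step (variable names are ours):

DUMMY = None

def split_light_heavy(A, size_R1):
    """One pass over the augmented list A = [(a, d1, d2), ...] where
    d1 = |pi_{x3} sigma_{x1=a} R2| and d2 = |pi_{x2} sigma_{x1=a} R3|.
    For every value we append exactly one entry to each of A1 and A2,
    so the write pattern reveals nothing about which side a falls on."""
    A1, A2 = [], []
    for a, d1, d2 in A:
        if d1 * d2 <= size_R1:           # light value: handled via Cartesian product
            A1.append(a); A2.append(DUMMY)
        else:                            # heavy value: handled via a scan of R1
            A1.append(DUMMY); A2.append(a)
    return A1, A2

# toy instance with |R1| = 4
print(split_light_heavy([("a1", 1, 2), ("a2", 3, 3)], size_R1=4))
# (['a1', None], [None, 'a2'])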

Analysis of Algorithm 3.

It suffices to show that |(R2A1)(R3A1)|N32|(R_{2}\ltimes A_{1})\Join(R_{3}\ltimes A_{1})|\leq N^{\frac{3}{2}} and |A2R1|N32|A_{2}\Join R_{1}|\leq N^{\frac{3}{2}}, which directly follows from the query decomposition lemma [44]:

$$\sum_{a\in A}\min\left\{\left|\sigma_{x_1=a}R_2\right|\cdot\left|\sigma_{x_1=a}R_3\right|,\,|R_1|\right\}\leq\sum_{a\in A}\left(\left|R_2\ltimes a\right|\cdot\left|R_3\ltimes a\right|\right)^{\frac{1}{2}}\cdot\left|R_1\ltimes a\right|^{\frac{1}{2}}\leq N^{\frac{3}{2}}.$$

All other primitives have O(NlogN)O(N\cdot\log N) time complexity and O(NBlogMBNB)O\left(\frac{N}{B}\cdot\log_{\frac{M}{B}}\frac{N}{B}\right) cache complexity. Hence, this whole algorithm incurs O(N32logN)O\left(N^{\frac{3}{2}}\cdot\log N\right) time complexity and O(N32BlogMBN32B)O\left(\frac{N^{\frac{3}{2}}}{B}\cdot\log_{\frac{M}{B}}\frac{N^{\frac{3}{2}}}{B}\right) cache complexity. As each step is oblivious, the composition of all these steps is also oblivious.

Insecure Triangle Join Algorithm 4.

We start with attribute $x_1$. We first compute the candidate values of $x_1$ that appear in some join results, i.e., $(\pi_{x_1}R_2)\cap(\pi_{x_1}R_3)$. For each candidate value $a$, we retrieve the candidate values of $x_2$ that can appear together with $a$ in some join results, i.e., $\left(\pi_{x_2}\sigma_{x_1=a}R_3\right)\cap\left(\pi_{x_2}R_1\right)$. Furthermore, for each candidate value $b$, we explore the possible values of $x_3$ that can appear together with $(a,b)$ in some join results. More precisely, every value $c$ that appears in $\pi_{x_3}\sigma_{x_2=b}R_1$ as well as in $\pi_{x_3}\sigma_{x_1=a}R_2$ forms a triangle with $(a,b)$. This algorithm runs in $O(N^{\frac{3}{2}})$ time (see [44] for a detailed analysis). Similarly, it is not oblivious, as the following intermediate statistics may be leaked:

  • |(πx1R2)(πx1R3)|\left|(\pi_{x_{1}}R_{2})\cap(\pi_{x_{1}}R_{3})\right| is leaked by the number of for-loop iterations in line 2;

  • |(πx2σx1=aR3)(πx2R1)|\left|(\pi_{x_{2}}\sigma_{x_{1}=a}R_{3})\cap(\pi_{x_{2}}R_{1})\right| is leaked by the number of for-loop iterations in line 3;

  • $\left|(\pi_{x_3}\sigma_{x_2=b}R_1)\cap(\pi_{x_3}\sigma_{x_1=a}R_2)\right|$ is leaked by the number of for-loop iterations in line 4;

1 $L\leftarrow\emptyset$;
2 foreach $a\in\left(\pi_{x_{1}}R_{2}\right)\cap\left(\pi_{x_{1}}R_{3}\right)$ do
3       foreach $b\in\left(\pi_{x_{2}}\sigma_{x_{1}=a}R_{3}\right)\cap\left(\pi_{x_{2}}R_{1}\right)$ do
4             foreach $c\in\left(\pi_{x_{3}}\sigma_{x_{2}=b}R_{1}\right)\cap\left(\pi_{x_{3}}\sigma_{x_{1}=a}R_{2}\right)$ do
5                   write $(a,b,c)$ to $L$;
6 return $L$;
Algorithm 4 Compute $\mathcal{Q}_{\triangle}$ by delaying computation
1 $R_{3}\leftarrow\textsc{Augment}(R_{3},R_{1},x_{2})$, $R_{3}\leftarrow\textsc{Augment}(R_{3},R_{2},x_{1})$;
2 $K_{1},K_{2}\leftarrow\emptyset$;
3 while read $(t,\Delta_{1},\Delta_{2})$ from $R_{3}$ do // Suppose $\Delta_{i}=|R_{i}\ltimes\{t\}|$
4       if $\Delta_{1}\leq\Delta_{2}$ then write $t$ to $K_{1}$, write $\perp$ to $K_{2}$;
5       else write $t$ to $K_{2}$, write $\perp$ to $K_{1}$;
6 $L_{1}\leftarrow\textsc{RelaxedTwoWay}(K_{1},R_{1},N^{\frac{3}{2}})$, $L_{1}\leftarrow\textsc{SemiJoin}(L_{1},R_{2})$;
7 $L_{2}\leftarrow\textsc{RelaxedTwoWay}(K_{2},R_{2},N^{\frac{3}{2}})$, $L_{2}\leftarrow\textsc{SemiJoin}(L_{2},R_{1})$;
8 return Compact $L_{1}\cup L_{2}$ while keeping the first $N^{\frac{3}{2}}$ tuples;
Algorithm 5 Inject Obliviousness to Algorithm 4

To achieve obliviousness, a straightforward solution is to pad every intermediate result with dummy tuples to match the worst-case size NN. However, this would result in N3N^{3} memory accesses, which is even less efficient than the nested-loop-based algorithm in Section 3.

Inject Obliviousness to Algorithm 4.

We transform Algorithm 4 into an oblivious version, presented as Algorithm 5, by employing oblivious primitives. The first modification merges the first two for-loops (lines 2–3 in Algorithm 4) into one step (line 1 in Algorithm 5). This is achieved by applying the semi-joins on R3R_{3} using R1,R2R_{1},R_{2} separately. Then, the third for-loop (line 4 in Algorithm 4) is replaced with a strategy based on the power of two choices. Specifically, for each surviving tuple (a,b)R3(a,b)\in R_{3}, we first compute the size of two lists, |πx3σx2=bR1|\left|\pi_{x_{3}}\sigma_{x_{2}=b}R_{1}\right| and |πx3σx1=aR2|\left|\pi_{x_{3}}\sigma_{x_{1}=a}R_{2}\right|, and put (a,b)(a,b) into either K1K_{1} or K2K_{2}, based on the relative order between |πx3σx2=bR1|\left|\pi_{x_{3}}\sigma_{x_{2}=b}R_{1}\right| and |πx3σx1=aR2|\left|\pi_{x_{3}}\sigma_{x_{1}=a}R_{2}\right|. We next compute the following two-way joins K1R1K_{1}\Join R_{1} and K2R2K_{2}\Join R_{2} by invoking the RelaxedTwoWay primitive, each with the upper bound N32N^{\frac{3}{2}} separately. Finally, we filter intermediate join results by the SemiJoin primitive and remove unnecessary dummy tuples by the Compact primitive.

Complexity of Algorithm 5.

It suffices to show that |K1R1|N32|K_{1}\Join R_{1}|\leq N^{\frac{3}{2}} and |K2R2|N32|K_{2}\Join R_{2}|\leq N^{\frac{3}{2}}, which directly follows from the query decomposition lemma [44]:

$$\sum_{(a,b)\in R_3}\min\left\{\left|\pi_{x_3}\sigma_{x_2=b}R_1\right|,\left|\pi_{x_3}\sigma_{x_1=a}R_2\right|\right\}\leq\sum_{(a,b)\in R_3}\left|R_1\ltimes(a,b)\right|^{\frac{1}{2}}\cdot\left|R_2\ltimes(a,b)\right|^{\frac{1}{2}}\leq N^{\frac{3}{2}}.$$

All other primitives incur O(NlogN)O(N\log N) time complexity and O(NBlogMBNB)O\left(\frac{N}{B}\cdot\log_{\frac{M}{B}}\frac{N}{B}\right) cache complexity. Hence, this algorithm incurs O(N32logN)O\left(N^{\frac{3}{2}}\cdot\log N\right) time complexity and O(N32BlogMBN32B)O\left(\frac{N^{\frac{3}{2}}}{B}\cdot\log_{\frac{M}{B}}\frac{N^{\frac{3}{2}}}{B}\right) cache complexity. As each step is oblivious, the composition of all these steps is also oblivious.

Theorem 4.1.

For the triangle join $\mathcal{Q}_{\triangle}$, there is an oblivious and cache-agnostic algorithm that can compute $\mathcal{Q}_{\triangle}(\mathcal{R})$ for any instance $\mathcal{R}$ of input size $N$ with $O\left(N^{\frac{3}{2}}\cdot\log N\right)$ time complexity and $O\left(\frac{N^{\frac{3}{2}}}{B}\cdot\log_{\frac{M}{B}}\frac{N^{\frac{3}{2}}}{B}\right)$ cache complexity under the tall cache and wide block assumptions.

5 Oblivious Worst-case Optimal Join Algorithm

1 if |𝒱|=1|\mathcal{V}|=1 then return eRe\cap_{e\in\mathcal{E}}R_{e} by Intersect;
2 (I,J)(I,J)\leftarrow an arbitrary partition of 𝒱\mathcal{V};
3 𝒬IGenericJoin((I,[I]),{πIRe:e})\mathcal{Q}_{I}\leftarrow\textsc{GenericJoin}((I,\mathcal{E}[I]),\left\{\pi_{I}R_{e}:e\in\mathcal{E}\right\});
4 foreach t𝒬It\in\mathcal{Q}_{I} do  𝒬tGenericJoin((J,[J]),{πJ(Ret):e})\mathcal{Q}_{t}\leftarrow\textsc{GenericJoin}((J,\mathcal{E}[J]),\left\{\pi_{J}(R_{e}\ltimes t):e\in\mathcal{E}\right\});
5 return t𝒬I{t}×𝒬t\bigcup_{t\in\mathcal{Q}_{I}}\{t\}\times\mathcal{Q}_{t};
Algorithm 6 GenericJoin(𝒬=(𝒱,),)\textsc{GenericJoin}(\mathcal{Q}=(\mathcal{V},\mathcal{E}),\mathcal{R}) [44]

In this section, we first revisit the insecure WCOJ algorithm in Section 5.1, then present our oblivious algorithm in Section 5.2 and its analysis in Section 5.3. Subsequently, in Section 5.4, we explore the implications of our oblivious algorithm for relaxed oblivious algorithms designed for cyclic join queries.

5.1 Generic Join Revisited

In a join query $\mathcal{Q}=(\mathcal{V},\mathcal{E})$, for a subset of attributes $S\subseteq\mathcal{V}$, we use $\mathcal{Q}[S]=(S,\mathcal{E}[S])$ to denote the sub-query induced by the attributes in $S$, where $\mathcal{E}[S]=\{e\cap S:e\in\mathcal{E}\}$. For each attribute $x\in\mathcal{V}$, we use $\mathcal{E}_x=\{e\in\mathcal{E}:x\in e\}$ to denote the set of relations containing $x$. The insecure WCOJ algorithm described in [44] is outlined in Algorithm 6, which takes as input a general join query $\mathcal{Q}=(\mathcal{V},\mathcal{E})$ and an instance $\mathcal{R}$. In the base case, when only one attribute exists, it computes the intersection of all relations. For the general case, it partitions the attributes into two disjoint subsets $I$ and $J$, such that $I\cap J=\emptyset$ and $I\cup J=\mathcal{V}$. The algorithm first computes the sub-query $\mathcal{Q}[I]$ induced by the attributes in $I$, whose join result is denoted $\mathcal{Q}_I$. Then, for each tuple $t\in\mathcal{Q}_I$, it recursively invokes the whole algorithm to compute the sub-query $\mathcal{Q}[J]$ induced by the attributes in $J$, over the tuples that can be joined with $t$. The resulting join result for each tuple $t$ is denoted as $\mathcal{Q}_t$. Finally, it attaches $t$ to each tuple in $\mathcal{Q}_t$, representing the join results in which $t$ participates. The algorithm ultimately returns the union of all join results for tuples in $\mathcal{Q}_I$. However, Algorithm 6 exhibits significant leakage of data statistics that violates the obliviousness constraint, for example:

  • |eRe|\left|\bigcap_{e\in\mathcal{E}}R_{e}\right| is leaked in line 1;

  • |πIRe|\left|\pi_{I}R_{e}\right| for each relation ee\in\mathcal{E} is leaked in line 3;

  • |𝒬I||\mathcal{Q}_{I}|, |πJ(Ret)|\left|\pi_{J}\left(R_{e}\ltimes t\right)\right|, and |𝒬t|\left|\mathcal{Q}_{t}\right| for each tuple t𝒬It\in\mathcal{Q}_{I} are leaked in line 4.

More importantly, this algorithm relies heavily on hashing indexes or range search indexes for retrieving tuples, so that the intersection at line 1 can be computed in $O\left(\min_{e\in\mathcal{E}}|R_e|\right)$ time. However, these indexes do not work well in the external memory model, since naively extending this algorithm could result in $O\left(N^{\rho^*}\right)$ cache complexity, which is too expensive. Consequently, designing a WCOJ algorithm that simultaneously maintains cache locality and achieves obliviousness remains a significant challenge.
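
For intuition about the recursion itself, here is a compact and deliberately insecure Python rendering of Algorithm 6, with relations stored as lists of attribute-to-value dictionaries and a fixed attribute split; loop bounds, projection sizes, and intermediate result sizes are all visible to an observer, which is exactly the leakage listed above.

def dedup(tuples):
    """Remove duplicate tuples (dicts) after a projection."""
    return [dict(items) for items in {frozenset(t.items()) for t in tuples}]

def generic_join(attrs, relations):
    """Insecure GenericJoin sketch. `relations` is a list of (schema, tuples),
    schema a set of attribute names, tuples a list of dicts."""
    if len(attrs) == 1:
        x = attrs[0]
        # Base case: intersect the x-values of all relations containing x.
        common = set.intersection(*[{t[x] for t in ts}
                                    for sch, ts in relations if x in sch])
        return [{x: v} for v in common]
    I, J = attrs[:-1], attrs[-1:]              # one particular attribute split
    proj_I = [(sch & set(I), dedup([{a: t[a] for a in sch & set(I)} for t in ts]))
              for sch, ts in relations if sch & set(I)]
    out = []
    for t in generic_join(I, proj_I):          # Q_I
        residual = [(sch & set(J),
                     dedup([{a: s[a] for a in sch & set(J)} for s in ts
                            if all(s[a] == t[a] for a in sch if a in t)]))
                    for sch, ts in relations if sch & set(J)]
        for u in generic_join(J, residual):    # Q_t
            out.append({**t, **u})
    return out

# Triangle query: R1(x2,x3), R2(x1,x3), R3(x1,x2)
R1 = [{"x2": 1, "x3": 2}, {"x2": 1, "x3": 3}]
R2 = [{"x1": 0, "x3": 2}]
R3 = [{"x1": 0, "x2": 1}]
print(generic_join(["x1", "x2", "x3"],
                   [({"x2", "x3"}, R1), ({"x1", "x3"}, R2), ({"x1", "x2"}, R3)]))
# -> [{'x1': 0, 'x2': 1, 'x3': 2}]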

1 if |𝒱|=1|\mathcal{V}|=1 then return eRe\cap_{e\in\mathcal{E}}R_{e} by Intersect;
2 (I,J)(I,J)\leftarrow a partition of 𝒱\mathcal{V} such that (1) |J|=1|J|=1; or (2) |J|=2|J|=2 (say J={y,z}J=\{y,z\}) and yz\mathcal{E}_{y}-\mathcal{E}_{z}\neq\emptyset and zy\mathcal{E}_{z}-\mathcal{E}_{y}\neq\emptyset;
3 foreach ee\in\mathcal{E} do SeProject(Re,eI)S_{e}\leftarrow\textsc{Project}(R_{e},e\cap I);
4 𝒬IObliviousGenericJoin((I,[I]),{Se:e})\mathcal{Q}_{I}\leftarrow\textsc{ObliviousGenericJoin}((I,\mathcal{E}[I]),\left\{S_{e}:e\in\mathcal{E}\right\});
5 if |J|=1|J|=1 then // Suppose J={x}J=\{x\}
6       foreach exe\in\mathcal{E}_{x} do  𝒬IAugment(𝒬I,Re,eI)\mathcal{Q}_{I}\leftarrow\textsc{Augment}(\mathcal{Q}_{I},R_{e},e\cap I);
7      {QIe}exPartition-I(𝒬I,x)\{Q_{I}^{e}\}_{e\in\mathcal{E}_{x}}\leftarrow\textsc{Partition-I}(\mathcal{Q}_{I},\mathcal{E}_{x});
8       foreach exe\in\mathcal{E}_{x} do
9             LeRelaxedTwoWay(𝒬Ie,Re,Nρ(𝒬))L_{e}\leftarrow\textsc{RelaxedTwoWay}\left(\mathcal{Q}^{e}_{I},R_{e},N^{\rho^{*}(\mathcal{Q})}\right);
10             for ex{e}e^{\prime}\in\mathcal{E}_{x}-\{e\} do LeSemiJoin(Le,Re)L_{e}\leftarrow\textsc{SemiJoin}(L_{e},R_{e^{\prime}});
11            
12      LexLeL\leftarrow\bigcup_{e\in\mathcal{E}_{x}}L_{e};
13      
14else // Suppose J={y,z}J=\{y,z\}
15       foreach eyze\in\mathcal{E}_{y}\cup\mathcal{E}_{z} do  𝒬IAugment(𝒬I,Re,eI)\mathcal{Q}_{I}\leftarrow\textsc{Augment}(\mathcal{Q}_{I},R_{e},e\cap I);
16       {𝒬Ie1,e2}(e1,e2)(yz)×(zy),{𝒬Ie3}e3xyPartition-II(QI,y,z)\{\mathcal{Q}_{I}^{e_{1},e_{2}}\}_{(e_{1},e_{2})\in(\mathcal{E}_{y}-\mathcal{E}_{z})\times(\mathcal{E}_{z}-\mathcal{E}_{y})},\{\mathcal{Q}_{I}^{e_{3}}\}_{e_{3}\in\mathcal{E}_{x}\cap\mathcal{E}_{y}}\leftarrow\textsc{Partition-II}(Q_{I},\mathcal{E}_{y},\mathcal{E}_{z});
17       foreach (e1,e2)(yz)×(zy)(e_{1},e_{2})\in(\mathcal{E}_{y}-\mathcal{E}_{z})\times(\mathcal{E}_{z}-\mathcal{E}_{y}) do
18             Le1,e2RelaxedTwoWay(𝒬Ie1,e2,Re1,Nρ(𝒬))L_{e_{1},e_{2}}\leftarrow\textsc{RelaxedTwoWay}\left(\mathcal{Q}^{e_{1},e_{2}}_{I},R_{e_{1}},N^{\rho^{*}(\mathcal{Q})}\right);
19             Le1,e2RelaxedTwoWay(Le1,e2,Re2,Nρ(𝒬))L_{e_{1},e_{2}}\leftarrow\textsc{RelaxedTwoWay}\left(L_{e_{1},e_{2}},R_{e_{2}},N^{\rho^{*}(\mathcal{Q})}\right);
20             foreach e{e1,e2}e\in\mathcal{E}-\{e_{1},e_{2}\} do Le1,e2SemiJoin(Le1,e2,Re)L_{e_{1},e_{2}}\leftarrow\textsc{SemiJoin}(L_{e_{1},e_{2}},R_{e});
21            
22      foreach e3yze_{3}\in\mathcal{E}_{y}\cap\mathcal{E}_{z} do
23             Le3RelaxedTwoWay(𝒬Ie3,Re3,Nρ(𝒬))L_{e_{3}}\leftarrow\textsc{RelaxedTwoWay}\left(\mathcal{Q}^{e_{3}}_{I},R_{e_{3}},N^{\rho^{*}(\mathcal{Q})}\right);
24             foreach e{e3}e\in\mathcal{E}-\{e_{3}\} do  Le3SemiJoin(Le3,Re)L_{e_{3}}\leftarrow\textsc{SemiJoin}(L_{e_{3}},R_{e});
25            
26      L((e1,e2)(yz)×(zy)Le1,e2)(e3yzLe3)L\leftarrow\left(\bigcup_{(e_{1},e_{2})\in(\mathcal{E}_{y}-\mathcal{E}_{z})\times(\mathcal{E}_{z}-\mathcal{E}_{y})}L_{e_{1},e_{2}}\right)\cup\left(\bigcup_{e_{3}\in\mathcal{E}_{y}\cap\mathcal{E}_{z}}L_{e_{3}}\right);
27      
28return Compact LL while keeping the first Nρ(𝒬)N^{\rho^{*}(\mathcal{Q})} tuples;
Algorithm 7 ObliviousGenericJoin(𝒬=(𝒱,),)\textsc{ObliviousGenericJoin}(\mathcal{Q}=(\mathcal{V},\mathcal{E}),\mathcal{R})
1 foreach exe\in\mathcal{E}_{x} do 𝒬Ie\mathcal{Q}^{e}_{I}\leftarrow\emptyset;
2 while read (t,{Δe(t)}ex)(t,\{\Delta_{e}(t)\}_{e\in\mathcal{E}_{x}}) from 𝒬I\mathcal{Q}_{I} do // Suppose Δe(t)=|Re{t}|\Delta_{e}(t)=|R_{e}\ltimes\{t\}|
3       eargminexΔe(t)e^{\prime}\leftarrow\arg\min_{e\in\mathcal{E}_{x}}\Delta_{e}(t);
4       write tt to 𝒬Ie\mathcal{Q}^{e^{\prime}}_{I} and write \perp to 𝒬Ie′′\mathcal{Q}^{e^{\prime\prime}}_{I} for each e′′x{e}e^{\prime\prime}\in\mathcal{E}_{x}-\{e^{\prime}\};
5      
return {QIe}ex\{Q_{I}^{e}\}_{e\in\mathcal{E}_{x}};
Algorithm 8 Partition-I(𝒬I,x)\textsc{Partition-I}(\mathcal{Q}_{I},\mathcal{E}_{x})
1 foreach (e1,e2)(yz)×(zy)(e_{1},e_{2})\in(\mathcal{E}_{y}-\mathcal{E}_{z})\times(\mathcal{E}_{z}-\mathcal{E}_{y}) do  𝒬Ie1,e2\mathcal{Q}^{e_{1},e_{2}}_{I}\leftarrow\emptyset;
2 foreach e3yze_{3}\in\mathcal{E}_{y}\cap\mathcal{E}_{z} do 𝒬Ie3\mathcal{Q}^{e_{3}}_{I}\leftarrow\emptyset;
3 while read (t,{Δe(t)}eyz)(t,\{\Delta_{e}(t)\}_{e\in\mathcal{E}_{y}\cup\mathcal{E}_{z}}) from 𝒬I\mathcal{Q}_{I} do // Suppose Δe(t)=|Re{t}|\Delta_{e}(t)=|R_{e}\ltimes\{t\}|
4       e1,e2,e3argmineyzΔe(t),argminezyΔe(t),argmineyzΔe(t)\displaystyle{e_{1},e_{2},e_{3}\leftarrow\arg\min_{e\in\mathcal{E}_{y}-\mathcal{E}_{z}}\Delta_{e}(t),\arg\min_{e\in\mathcal{E}_{z}-\mathcal{E}_{y}}\Delta_{e}(t),\arg\min_{e\in\mathcal{E}_{y}\cap\mathcal{E}_{z}}\Delta_{e}(t)};
5       if Δe1(t)Δe2(t)Δe3(t)\Delta_{e_{1}}(t)\cdot\Delta_{e_{2}}(t)\leq\Delta_{e_{3}}(t) then
6             write tt to 𝒬Ie1,e2\mathcal{Q}^{e_{1},e_{2}}_{I};
7             foreach (e1,e2)(yz)×(zy){(e1,e2)}(e_{1}^{\prime},e_{2}^{\prime})\in(\mathcal{E}_{y}-\mathcal{E}_{z})\times(\mathcal{E}_{z}-\mathcal{E}_{y})-\{(e_{1},e_{2})\} do write \perp to 𝒬Ie1,e2\mathcal{Q}^{e_{1}^{\prime},e_{2}^{\prime}}_{I};
8             foreach e3yze_{3}^{\prime}\in\mathcal{E}_{y}\cap\mathcal{E}_{z} do write \perp to 𝒬Ie3\mathcal{Q}^{e_{3}^{\prime}}_{I};
9            
10      else
11             write tt to 𝒬Ie3\mathcal{Q}^{e_{3}}_{I};
12             foreach (e1,e2)(yz)×(zy)(e_{1}^{\prime},e_{2}^{\prime})\in(\mathcal{E}_{y}-\mathcal{E}_{z})\times(\mathcal{E}_{z}-\mathcal{E}_{y}) do write \perp to 𝒬Ie1,e2\mathcal{Q}^{e_{1}^{\prime},e_{2}^{\prime}}_{I};
13             foreach e3yz{e3}e_{3}^{\prime}\in\mathcal{E}_{y}\cap\mathcal{E}_{z}-\{e_{3}\} do write \perp to 𝒬Ie3\mathcal{Q}^{e_{3}^{\prime}}_{I};
14            
15      
16return {𝒬Ie1,e2}(e1,e2)(yz)×(zy),{𝒬Ie3}e3yz\{\mathcal{Q}_{I}^{e_{1},e_{2}}\}_{(e_{1},e_{2})\in(\mathcal{E}_{y}-\mathcal{E}_{z})\times(\mathcal{E}_{z}-\mathcal{E}_{y})},\{\mathcal{Q}_{I}^{e_{3}}\}_{e_{3}\in\mathcal{E}_{y}\cap\mathcal{E}_{z}};
Algorithm 9 Partition-II(𝒬I,y,z)\textsc{Partition-II}(\mathcal{Q}_{I},\mathcal{E}_{y},\mathcal{E}_{z})

5.2 Our Algorithm

Now, we extend our oblivious triangle join algorithms from Section 4 to general join queries, as described in Algorithm 7. It is built on a recursive framework:

Base Case: |𝒱|=1|\mathcal{V}|=1. In this case, the join degenerates to the set intersection of all input relations, which can be efficiently computed by the Intersect primitive.

General Case: |𝒱|>1|\mathcal{V}|>1.

In general, we partition 𝒱\mathcal{V} into two subsets II and JJ, with the constraint that either |J|=1|J|=1, or |J|=2|J|=2 and the two attributes y,zy,z in JJ satisfy yz\mathcal{E}_{y}-\mathcal{E}_{z}\neq\emptyset and zy\mathcal{E}_{z}-\mathcal{E}_{y}\neq\emptyset. Similar to Algorithm 6, we compute the sub-query 𝒬[I]\mathcal{Q}[I] by invoking the whole algorithm recursively, and denote its join result by 𝒬I\mathcal{Q}_{I}. To prevent potential leakage, we must be careful about the projection of each relation involved in this sub-query, which is computed by the Project primitive. We further distinguish two cases based on |J||J|:

General Case 1: |J|=1|J|=1.

Suppose J={x}J=\{x\}. Recall that for each tuple t𝒬It\in\mathcal{Q}_{I}, Algorithm 6 computes the intersection ex(Ret)\cap_{e\in\mathcal{E}_{x}}\left(R_{e}\ltimes t\right) on xx in the base case. To ensure this step remains oblivious, we must conceal the size of RetR_{e}\ltimes t. To achieve this, we augment each tuple t𝒬It\in\mathcal{Q}_{I} with its degree in ReR_{e}, which is defined as Δe(t)=|Ret|\Delta_{e}(t)=\left|R_{e}\ltimes t\right|, using the Augment primitive. Then, we partition tuples in 𝒬I\mathcal{Q}_{I} into |x||\mathcal{E}_{x}| subsets based on their smallest degree across all relations in x\mathcal{E}_{x}. The details are described in Algorithm 8. Let 𝒬Ie𝒬I\mathcal{Q}^{e}_{I}\subseteq\mathcal{Q}_{I} denote the set of tuples whose degree is the smallest in ReR_{e}, i.e., e=argminexΔe(t)e=\arg\min_{e^{\prime}\in\mathcal{E}_{x}}\Delta_{e^{\prime}}(t) for each t𝒬Iet\in\mathcal{Q}^{e}_{I}. Whenever we write one tuple t𝒬It\in\mathcal{Q}_{I} to one subset, we also write a dummy tuple \perp to the other |x|1|\mathcal{E}_{x}|-1 subsets. At last, for each exe\in\mathcal{E}_{x}, we compute Re𝒬IeR_{e}\Join\mathcal{Q}^{e}_{I} by invoking the RelaxedTwoWay primitive (line 9), with upper bound NρN^{\rho^{*}}, and further filter them by remaining relations with semi-joins (line 10).
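To make this partitioning step concrete, the following minimal Python sketch mirrors the logic of Algorithm 8; it assumes the degrees Δe(t) have already been attached by the Augment primitive, and all names (partition_one, q_i, deg, DUMMY) are illustrative rather than part of our formal construction. Note that every input tuple triggers exactly one write to every output part, mirroring the dummy writes of Algorithm 8.

DUMMY = None  # stands for the dummy tuple ⊥

def partition_one(q_i, edges_x):
    """Sketch of Algorithm 8: route each tuple of Q_I to the relation in E_x
    in which its degree is smallest, writing a dummy to every other part so
    that the number and order of writes depend only on |Q_I| and |E_x|."""
    parts = {e: [] for e in edges_x}
    for t, deg in q_i:                     # deg[e] plays the role of Δ_e(t)
        e_min = min(edges_x, key=lambda e: deg[e])
        for e in edges_x:
            parts[e].append(t if e == e_min else DUMMY)
    return parts

# toy usage with two relations covering the attribute x
q_i = [("t1", {"R1": 3, "R2": 7}), ("t2", {"R1": 9, "R2": 2})]
print(partition_one(q_i, ["R1", "R2"]))   # {'R1': ['t1', None], 'R2': [None, 't2']}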

General Case 2: |J|=2|J|=2.

Suppose J={y,z}J=\{y,z\}. Consider an arbitrary tuple t𝒬It\in\mathcal{Q}_{I}. Algorithm 6 computes the residual query {eyz(Ret)}{eyz(Ret)}{ezy(Ret)}\left\{\bigcap_{e\in\mathcal{E}_{y}\cap\mathcal{E}_{z}}(R_{e}\ltimes t)\right\}\Join\left\{\bigcap_{e\in\mathcal{E}_{y}-\mathcal{E}_{z}}(R_{e}\ltimes t)\right\}\Join\left\{\bigcap_{e\in\mathcal{E}_{z}-\mathcal{E}_{y}}(R_{e}\ltimes t)\right\}. As in the case above, we first compute its degree Δe(t)\Delta_{e}(t) in each ReR_{e} using the Augment primitive. We then partition tuples in 𝒬I\mathcal{Q}_{I} into |yz|+|yz||zy||\mathcal{E}_{y}\cap\mathcal{E}_{z}|+|\mathcal{E}_{y}-\mathcal{E}_{z}|\cdot|\mathcal{E}_{z}-\mathcal{E}_{y}| subsets based on their degrees, in a more involved way than in Case 1. The details are described in Algorithm 9. More specifically, for each e3yze_{3}\in\mathcal{E}_{y}\cap\mathcal{E}_{z}, let

𝒬Ie3={t𝒬I:\displaystyle\mathcal{Q}^{e_{3}}_{I}=\biggl{\{}t\in\mathcal{Q}_{I}: Δe3(t)=mine′′yzΔe′′(t),Δe3(t)<mineyz,ezyΔe(t)Δe(t)};\displaystyle\Delta_{e_{3}}(t)=\min_{e^{\prime\prime}\in\mathcal{E}_{y}\cap\mathcal{E}_{z}}\Delta_{e^{\prime\prime}}(t),\Delta_{e_{3}}(t)<\min_{e\in\mathcal{E}_{y}-\mathcal{E}_{z},e^{\prime}\in\mathcal{E}_{z}-\mathcal{E}_{y}}\Delta_{e}(t)\cdot\Delta_{e^{\prime}}(t)\biggl{\}};

and for each pair (e1,e2)(yz)×(zy)(e_{1},e_{2})\in(\mathcal{E}_{y}-\mathcal{E}_{z})\times(\mathcal{E}_{z}-\mathcal{E}_{y}), let

𝒬Ie1,e2={t𝒬I:\displaystyle\mathcal{Q}^{e_{1},e_{2}}_{I}=\biggl{\{}t\in\mathcal{Q}_{I}: Δe1(t)Δe2(t)=mineyz,ezyΔe(t)Δe(t)mine′′yzΔe′′(t)}\displaystyle\Delta_{e_{1}}(t)\cdot\Delta_{e_{2}}(t)=\min_{e\in\mathcal{E}_{y}-\mathcal{E}_{z},e^{\prime}\in\mathcal{E}_{z}-\mathcal{E}_{y}}\Delta_{e}(t)\cdot\Delta_{e^{\prime}}(t)\leq\min_{e^{\prime\prime}\in\mathcal{E}_{y}\cap\mathcal{E}_{z}}\Delta_{e^{\prime\prime}}(t)\biggl{\}}

For each (e1,e2)(yz)×(zy)(e_{1},e_{2})\in(\mathcal{E}_{y}-\mathcal{E}_{z})\times(\mathcal{E}_{z}-\mathcal{E}_{y}), we compute Re1Re2𝒬Ie1,e2R_{e_{1}}\Join R_{e_{2}}\Join\mathcal{Q}_{I}^{e_{1},e_{2}} by invoking the RelaxedTwoWay primitive iteratively (lines 16-17), with the upper bound Nρ(𝒬)N^{\rho^{*}(\mathcal{Q})}, and filter these results by the remaining relations with semi-joins (line 18). For each e3yze_{3}\in\mathcal{E}_{y}\cap\mathcal{E}_{z}, we compute Re3𝒬Ie3R_{e_{3}}\Join\mathcal{Q}^{e_{3}}_{I} by invoking the RelaxedTwoWay primitive (line 20), with the upper bound Nρ(𝒬)N^{\rho^{*}(\mathcal{Q})}, and filter these results by the remaining relations with semi-joins (line 21).
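For intuition, consider an illustrative tuple t∈𝒬I for which the minimizing pair (e1,e2)∈(ℰy−ℰz)×(ℰz−ℰy) has degrees Δe1(t)=3 and Δe2(t)=4, while the minimizing e3∈ℰy∩ℰz has Δe3(t)=10. Since 3⋅4=12>10, Algorithm 9 routes t to 𝒬Ie3, so its residual query is answered through Re3⋉t (at most 10 tuples) rather than through the pair (e1,e2) (up to 12 tuples); if instead Δe3(t)=20, then 12≤20 and t would be routed to 𝒬Ie1,e2.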

5.3 Analysis of Algorithm 7

Base Case: |𝒱|=1|\mathcal{V}|=1. The obliviousness is guaranteed by the Intersect primitive. The cache complexity is O(NBlogMBNB)O\left(\frac{N}{B}\cdot\log_{\frac{M}{B}}\frac{N}{B}\right). In this case, ρ=1\rho^{*}=1. Hence, Theorem 5.1 holds.

General Case: |𝒱|>1|\mathcal{V}|>1.

By the induction hypothesis, the recursive invocation of ObliviousGenericJoin at line 4 takes O(Nρ(𝒬)logN)O\left(N^{\rho^{*}(\mathcal{Q})}\cdot\log N\right) time and incurs O(NρBlogMBNB)O\left(\frac{N^{\rho^{*}}}{B}\cdot\log_{\frac{M}{B}}\frac{N}{B}\right) cache complexity, since ρ((I,[I]))ρ(𝒬)\rho^{*}((I,\mathcal{E}[I]))\leq\rho^{*}(\mathcal{Q}). We next establish the correctness and complexity of all invocations of the RelaxedTwoWay primitive. Let ρ()\rho^{*}(\cdot) be an optimal fractional edge cover of 𝒬\mathcal{Q}. The real size of the two-way joins at line 9 can be rewritten as:

ex|Re𝒬Ie|=ext𝒬Ie|Ret|=ext𝒬Ieminex|Ret|t𝒬Iex|Ret|ρ(e)Nρ\sum_{e\in\mathcal{E}_{x}}\left|R_{e}\Join\mathcal{Q}^{e}_{I}\right|=\sum_{e\in\mathcal{E}_{x}}\sum\limits_{t\in\mathcal{Q}^{e}_{I}}\left|R_{e}\ltimes t\right|=\sum_{e\in\mathcal{E}_{x}}\sum\limits_{t\in\mathcal{Q}^{e}_{I}}\min_{e^{\prime}\in\mathcal{E}_{x}}\left|R_{e^{\prime}}\ltimes t\right|\leq\sum_{t\in\mathcal{Q}_{I}}\prod_{e^{\prime}\in\mathcal{E}_{x}}\left|R_{e^{\prime}}\ltimes t\right|^{\rho^{*}(e^{\prime})}\leq N^{\rho^{*}}

where the inequalities follow from the facts that exρ(e)1\displaystyle{\sum_{e^{\prime}\in\mathcal{E}_{x}}\rho^{*}(e^{\prime})\geq 1}, ex𝒬Ie=𝒬I\bigcup_{e\in\mathcal{E}_{x}}\mathcal{Q}^{e}_{I}=\mathcal{Q}_{I}, and the query decomposition lemma [44]. Hence, Nρ(𝒬)N^{\rho^{*}(\mathcal{Q})} is a valid upper bound on Re𝒬IeR_{e}\Join\mathcal{Q}^{e}_{I} for each exe\in\mathcal{E}_{x}. The real size of the two-way joins at lines 18-19 and line 22 can be rewritten as

e1yz,e2zy|Re1Re2𝒬Ie1,e2|+e3yz|Re3𝒬Ie3|\displaystyle\sum_{e_{1}\in\mathcal{E}_{y}-\mathcal{E}_{z},e_{2}\in\mathcal{E}_{z}-\mathcal{E}_{y}}\left|R_{e_{1}}\Join R_{e_{2}}\Join\mathcal{Q}^{e_{1},e_{2}}_{I}\right|+\sum_{e_{3}\in\mathcal{E}_{y}\cap\mathcal{E}_{z}}\left|R_{e_{3}}\Join\mathcal{Q}^{e_{3}}_{I}\right|
=e1yz,e2zyt𝒬Ie1,e2|(Re1t)(Re2t)|+e3yzt𝒬Ie3|Re3t|\displaystyle=\sum_{e_{1}\in\mathcal{E}_{y}-\mathcal{E}_{z},e_{2}\in\mathcal{E}_{z}-\mathcal{E}_{y}}\sum_{t\in\mathcal{Q}^{e_{1},e_{2}}_{I}}\left|\left(R_{e_{1}}\ltimes t\right)\Join\left(R_{e_{2}}\ltimes t\right)\right|+\sum_{e_{3}\in\mathcal{E}_{y}\cap\mathcal{E}_{z}}\sum_{t\in\mathcal{Q}^{e_{3}}_{I}}\left|R_{e_{3}}\ltimes t\right|
=t𝒬Imin{mine1yz,e2zy|Re1t||Re2t|,mine3yz|Re3t|}\displaystyle=\sum_{t\in\mathcal{Q}_{I}}\min\left\{\min_{e_{1}\in\mathcal{E}_{y}-\mathcal{E}_{z},e_{2}\in\mathcal{E}_{z}-\mathcal{E}_{y}}|R_{e_{1}}\ltimes t|\cdot|R_{e_{2}}\ltimes t|,\min_{e_{3}\in\mathcal{E}_{y}\cap\mathcal{E}_{z}}|R_{e_{3}}\ltimes t|\right\} (2)

Let ρ1=eyzρ(e)\displaystyle{\rho_{1}=\sum_{e\in\mathcal{E}_{y}-\mathcal{E}_{z}}\rho^{*}(e)}, ρ2=ezyρ(e)\displaystyle{\rho_{2}=\sum_{e\in\mathcal{E}_{z}-\mathcal{E}_{y}}\rho^{*}(e)} and ρ3=eyzρ(e)\displaystyle{\rho_{3}=\sum_{e\in\mathcal{E}_{y}\cap\mathcal{E}_{z}}\rho^{*}(e)}. Note ρ31min{ρ1,ρ2}\rho_{3}\geq 1-\min\{\rho_{1},\rho_{2}\} as ρ()\rho^{*}(\cdot) is a valid fractional edge cover for both yy and zz. For each tuple t𝒬It\in\mathcal{Q}_{I}, we have

min{mine1yz,e2zy|Re1t||Re2t|,mine3yz|Re3t|}\displaystyle\min\left\{\min_{e_{1}\in\mathcal{E}_{y}-\mathcal{E}_{z},e_{2}\in\mathcal{E}_{z}-\mathcal{E}_{y}}\left|R_{e_{1}}\ltimes t\right|\cdot|R_{e_{2}}\ltimes t|,\min_{e_{3}\in\mathcal{E}_{y}\cap\mathcal{E}_{z}}\left|R_{e_{3}}\ltimes t\right|\right\}
(mine1yz|Re1t|)ρ1(mine2zy|Re2t|)ρ2(mine3yz|Re3t|)ρ3\displaystyle\leq\left(\min_{e_{1}\in\mathcal{E}_{y}-\mathcal{E}_{z}}\left|R_{e_{1}}\ltimes t\right|\right)^{\rho_{1}}\cdot\left(\min_{e_{2}\in\mathcal{E}_{z}-\mathcal{E}_{y}}|R_{e_{2}}\ltimes t|\right)^{\rho_{2}}\cdot\left(\min_{e_{3}\in\mathcal{E}_{y}\cap\mathcal{E}_{z}}\left|R_{e_{3}}\ltimes t\right|\right)^{\rho_{3}}
eyz|Ret|ρ(e)ezy|Ret|ρ(e)eyz|Ret|ρ(e)=eyz|Ret|ρ(e),\displaystyle\leq\prod_{e\in\mathcal{E}_{y}-\mathcal{E}_{z}}\left|R_{e}\ltimes t\right|^{\rho^{*}(e)}\cdot\prod_{e\in\mathcal{E}_{z}-\mathcal{E}_{y}}\left|R_{e}\ltimes t\right|^{\rho^{*}(e)}\cdot\prod_{e\in\mathcal{E}_{y}\cap\mathcal{E}_{z}}\left|R_{e}\ltimes t\right|^{\rho^{*}(e)}=\prod_{e\in\mathcal{E}_{y}\cup\mathcal{E}_{z}}\left|R_{e}\ltimes t\right|^{\rho^{*}(e)},

where the first inequality follows from min{a,b}apb1p\min\left\{a,b\right\}\leq a^{p}\cdot b^{1-p} for a,b0a,b\geq 0 and p[0,1]p\in[0,1], applied with p=min{ρ1,ρ2}p=\min\left\{\rho_{1},\rho_{2}\right\}, together with the facts that ρ1,ρ2min{ρ1,ρ2}\rho_{1},\rho_{2}\geq\min\left\{\rho_{1},\rho_{2}\right\} and ρ31min{ρ1,ρ2}\rho_{3}\geq 1-\min\left\{\rho_{1},\rho_{2}\right\}; the second inequality follows since each minimum is no larger than every individual term in it. Now, we can further bound (5.3) as

(5.3)t𝒬Ieyz|Ret|ρ(e)e|Re|ρ(e)Nρ(\ref{eq:2})\leq\sum_{t\in\mathcal{Q}_{I}}\prod_{e\in\mathcal{E}_{y}\cup\mathcal{E}_{z}}\left|R_{e}\ltimes t\right|^{\rho^{*}(e)}\leq\prod_{e\in\mathcal{E}}|R_{e}|^{\rho^{*}(e)}\leq N^{\rho^{*}}

where the second inequality follows from the query decomposition lemma [44], and the last one follows since |Re|N|R_{e}|\leq N for each ee\in\mathcal{E}.

Theorem 5.1.

For a general join query 𝒬\mathcal{Q}, there is an oblivious and cache-agnostic algorithm that can compute 𝒬()\mathcal{Q}(\mathcal{R}) for any instance \mathcal{R} of input size NN with O(NρlogN)O\left(N^{\rho^{*}}\cdot\log N\right) time complexity and O(NρBlogMBNρB)O\left(\frac{N^{\rho^{*}}}{B}\cdot\log_{\frac{M}{B}}\frac{N^{\rho^{*}}}{B}\right) cache complexity under the tall cache and wide block assumptions, where ρ\rho^{*} is the optimal fractional edge cover number of 𝒬\mathcal{Q}.

5.4 Implications to Relaxed Oblivious Algorithms

Our oblivious WCOJ algorithm can be combined with the generalized hypertree decomposition framework [33] to develop a relaxed oblivious algorithm for general join queries.

Definition 5.2 (Generalized Hypertree Decomposition (GHD)).

Given a join query 𝒬=(𝒱,)\mathcal{Q}=(\mathcal{V},\mathcal{E}), a GHD of 𝒬\mathcal{Q} is a pair (𝒯,λ)(\mathcal{T},\lambda), where 𝒯\mathcal{T} is a tree over a set of nodes and λ:𝒯2𝒱\lambda:\mathcal{T}\to 2^{\mathcal{V}} is a labeling function that associates with each node u𝒯u\in\mathcal{T} a subset of attributes in 𝒱\mathcal{V}, denoted λu\lambda_{u}, such that (1) for each ee\in\mathcal{E}, there is a node u𝒯u\in\mathcal{T} such that eλue\subseteq\lambda_{u}; and (2) for each x𝒱x\in\mathcal{V}, the set of nodes {u𝒯:xλu}\{u\in\mathcal{T}:x\in\lambda_{u}\} forms a connected subtree of 𝒯\mathcal{T}. The fractional hypertree width of 𝒬\mathcal{Q} is defined as min(𝒯,λ)maxu𝒯ρ((λu,{eλu:e}))\displaystyle{\min_{(\mathcal{T},\lambda)}\max_{u\in\mathcal{T}}\rho^{*}\left((\lambda_{u},\{e\cap\lambda_{u}:e\in\mathcal{E}\})\right)}.
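For example, consider the path query with ℰ = {{x1,x2},{x2,x3},{x3,x4}}. Taking 𝒯 to be a path of three nodes u1,u2,u3 with λu1={x1,x2}, λu2={x2,x3} and λu3={x3,x4} satisfies both conditions, and every bag is covered by a single relation with weight 1, so this GHD has width 1; in particular, acyclic joins admit GHDs of width 1, namely their join trees.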

The pseudocode of our algorithm is given in Appendix D. Suppose we take as input a join query 𝒬=(𝒱,)\mathcal{Q}=(\mathcal{V},\mathcal{E}), an instance \mathcal{R}, and an upper bound on the output size τ|𝒬()|\tau\geq|\mathcal{Q}(\mathcal{R})|. Let (𝒯,λ)(\mathcal{T},\lambda) be an arbitrary GHD of 𝒬\mathcal{Q}. We first invoke Algorithm 7 to compute the subquery 𝒬u=(λu,u)\mathcal{Q}_{u}=(\lambda_{u},\mathcal{E}_{u}) defined by each node u𝒯u\in\mathcal{T}, where u={eλu:e}\mathcal{E}_{u}=\{e\cap\lambda_{u}:e\in\mathcal{E}\}, and materialize its join result as one relation. We then apply the classic Yannakakis algorithm [54] on the materialized relations by invoking the SemiJoin primitive for semi-joins and the RelaxedTwoWay primitive for pairwise joins. After removing dangling tuples, the size of each two-way join is upper bounded by the size of the final join result and, therefore, by τ\tau. This leads to a relaxed oblivious algorithm whose access pattern only depends on NN and τ\tau.

Theorem 5.3.

For a join query 𝒬\mathcal{Q}, an instance \mathcal{R} of input size NN, and parameter τ|𝒬()|\tau\geq|\mathcal{Q}(\mathcal{R})|, there is a cache-agnostic algorithm that can compute 𝒬()\mathcal{Q}(\mathcal{R}) with O((Nw+τ)log(Nw+τ))O\left((N^{w}+\tau)\cdot\log(N^{w}+\tau)\right) time complexity and O(Nw+τBlogMBNw+τB)O\left(\frac{N^{w}+\tau}{B}\cdot\log_{\frac{M}{B}}\frac{N^{w}+\tau}{B}\right) cache complexity, whose access pattern only depends on NN and τ\tau, where ww is the fractional hypertree width of 𝒬\mathcal{Q}.

6 Conclusion

This paper has introduced a general framework for oblivious multi-way join processing, achieving near-optimal time and cache complexity. However, several intriguing questions remain open for future exploration:

  • Balancing privacy and efficiency. Recent research has investigated relaxed security notions, such as differentially oblivious algorithms [14], to achieve better trade-offs between privacy and efficiency and to overcome worst-case overheads. It remains interesting to explore such relaxations in the context of multi-way join processing.

  • Emit model for EM algorithms. In the context of EM join algorithms, the emit model, where join results are directly output without being written back to disk, has been considered. It remains open whether oblivious, worst-case optimal join algorithms can be developed without requiring all join results to be written back to disk.

  • Communication-oblivious join algorithms for the MPC model. A natural connection exists between the MPC and EM models in join processing. While recent work has explored communication-oblivious algorithms in the MPC model [13, 49], extending these ideas to multi-way join processing remains an open challenge.

References

  • [1] S. Abiteboul, R. Hull, and V. Vianu. Foundations of databases, volume 8. Addison-Wesley Reading, 1995.
  • [2] M. Abo Khamis, H. Q. Ngo, and A. Rudra. Faq: questions asked frequently. In PODS, pages 13–28, 2016.
  • [3] A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116–1127, 1988.
  • [4] M. Ajtai, J. Komlós, and E. Szemerédi. An O(n log n) sorting network. In STOC, pages 1–9, 1983.
  • [5] A. Arasu and R. Kaushik. Oblivious query processing. ICDT, 2013.
  • [6] L. Arge, M. A. Bender, E. D. Demaine, B. Holland-Minkley, and J. Ian Munro. An optimal cache-oblivious priority queue and its application to graph algorithms. SIAM Journal on Computing, 36(6):1672–1695, 2007.
  • [7] G. Asharov, I. Komargodski, W.-K. Lin, K. Nayak, E. Peserico, and E. Shi. Optorama: Optimal oblivious ram. In Eurocrypt, pages 403–432. Springer, 2020.
  • [8] A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins. In FOCS, pages 739–748. IEEE, 2008.
  • [9] K. E. Batcher. Sorting networks and their applications. In Proceedings of the April 30–May 2, 1968, spring joint computer conference, pages 307–314, 1968.
  • [10] P. Beame, P. Koutris, and D. Suciu. Communication steps for parallel query processing. JACM, 64(6):1–58, 2017.
  • [11] C. Beeri, R. Fagin, D. Maier, and M. Yannakakis. On the desirability of acyclic database schemes. JACM, 30(3):479–513, 1983.
  • [12] A. Beimel, K. Nissim, and M. Zaheri. Exploring differential obliviousness. In APPROX/RANDOM, 2019.
  • [13] T. Chan, K.-M. Chung, W.-K. Lin, and E. Shi. Mpc for mpc: secure computation on a massively parallel computing architecture. In ITCS. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.
  • [14] T. H. Chan, K.-M. Chung, B. M. Maggs, and E. Shi. Foundations of differentially oblivious algorithms. In SODA, pages 2448–2467. SIAM, 2019.
  • [15] Z. Chang, D. Xie, and F. Li. Oblivious ram: A dissection and experimental evaluation. Proc. VLDB Endow., 9(12):1113–1124, 2016.
  • [16] Z. Chang, D. Xie, F. Li, J. M. Phillips, and R. Balasubramonian. Efficient oblivious query processing for range and knn queries. TKDE, 2021.
  • [17] Z. Chang, D. Xie, S. Wang, and F. Li. Towards practical oblivious join. In SIGMOD, 2022.
  • [18] S. Chu, D. Zhuo, E. Shi, and T.-H. H. Chan. Differentially Oblivious Database Joins: Overcoming the Worst-Case Curse of Fully Oblivious Algorithms. In ITC, volume 199, pages 19:1–19:24, 2021.
  • [19] V. Costan and S. Devadas. Intel sgx explained. Cryptology ePrint Archive, 2016.
  • [20] N. Crooks, M. Burke, E. Cecchetti, S. Harel, R. Agarwal, and L. Alvisi. Obladi: Oblivious serializable transactions in the cloud. In OSDI, pages 727–743, 2018.
  • [21] E. D. Demaine. Cache-oblivious algorithms and data structures. Lecture Notes from the EEF Summer School on Massive Data Sets, 8(4):1–249, 2002.
  • [22] S. Deng and Y. Tao. Subgraph enumeration in optimal i/o complexity. In ICDT. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2024.
  • [23] S. Devadas, M. v. Dijk, C. W. Fletcher, L. Ren, E. Shi, and D. Wichs. Onion oram: A constant bandwidth blowup oblivious ram. In TCC, pages 145–174. Springer, 2016.
  • [24] S. Eskandarian and M. Zaharia. Oblidb: Oblivious query processing for secure databases. Proc. VLDB Endow., 13(2).
  • [25] R. Fagin. Degrees of acyclicity for hypergraphs and relational database schemes. JACM, 30(3):514–550, 1983.
  • [26] A. Z. Fan, P. Koutris, and H. Zhao. Tight bounds of circuits for sum-product queries. SIGMOD, 2(2):1–20, 2024.
  • [27] J. Flum, M. Frick, and M. Grohe. Query evaluation via tree-decompositions. JACM, 49(6):716–752, 2002.
  • [28] M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In FOCS, pages 285–297. IEEE, 1999.
  • [29] C. Gentry, K. A. Goldman, S. Halevi, C. Julta, M. Raykova, and D. Wichs. Optimizing oram and using it efficiently for secure computation. In PETs, pages 1–18. Springer, 2013.
  • [30] O. Goldreich. Towards a theory of software protection and simulation by oblivious rams. In STOC, pages 182–194, 1987.
  • [31] O. Goldreich and R. Ostrovsky. Software protection and simulation on oblivious rams. JACM, 43(3):431–473, 1996.
  • [32] M. T. Goodrich. Data-oblivious external-memory algorithms for the compaction, selection, and sorting of outsourced data. In SPAA, pages 379–388, 2011.
  • [33] G. Gottlob, N. Leone, and F. Scarcello. Hypertree decompositions and tractable queries. JCSS, 64(3):579–627, 2002.
  • [34] H. Hacigümüş, B. Iyer, C. Li, and S. Mehrotra. Executing sql over encrypted data in the database-service-provider model. In SIGMOD, pages 216–227, 2002.
  • [35] B. He and Q. Luo. Cache-oblivious nested-loop joins. In CIKM, pages 718–727, 2006.
  • [36] X. Hu. Cover or pack: New upper and lower bounds for massively parallel joins. In PODS, pages 181–198, 2021.
  • [37] X. Hu, M. Qiao, and Y. Tao. I/o-efficient join dependency testing, loomis–whitney join, and triangle enumeration. JCSS, 82(8):1300–1315, 2016.
  • [38] B. Ketsman and D. Suciu. A worst-case optimal multi-round algorithm for parallel computation of conjunctive queries. In PODS, pages 417–428, 2017.
  • [39] P. Koutris, P. Beame, and D. Suciu. Worst-case optimal algorithms for parallel query processing. In ICDT. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016.
  • [40] S. Krastnikov, F. Kerschbaum, and D. Stebila. Efficient oblivious database joins. VLDB, 13(12):2132–2145, 2020.
  • [41] E. Kushilevitz, S. Lu, and R. Ostrovsky. On the (in)security of hash-based oblivious ram and a new balancing scheme. In SODA, pages 143–156. SIAM, 2012.
  • [42] W.-K. Lin, E. Shi, and T. Xie. Can we overcome the n log n barrier for oblivious sorting? In SODA, pages 2419–2438. SIAM, 2019.
  • [43] H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms. JACM, 65(3):1–40, 2018.
  • [44] H. Q. Ngo, C. Ré, and A. Rudra. Skew strikes back: New developments in the theory of join algorithms. Acm Sigmod Record, 42(4):5–16, 2014.
  • [45] V. Ramachandran and E. Shi. Data oblivious algorithms for multicores. In SPAA, pages 373–384, 2021.
  • [46] S. Sasy, A. Johnson, and I. Goldberg. Fast fully oblivious compaction and shuffling. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 2565–2579, 2022.
  • [47] E. Shi. Path oblivious heap: Optimal and practical oblivious priority queue. In 2020 IEEE Symposium on Security and Privacy (SP), pages 842–858. IEEE, 2020.
  • [48] E. Stefanov, M. V. Dijk, E. Shi, T.-H. H. Chan, C. Fletcher, L. Ren, X. Yu, and S. Devadas. Path oram: an extremely simple oblivious ram protocol. JACM, 65(4):1–26, 2018.
  • [49] Y. Tao, R. Wang, and S. Deng. Parallel communication obliviousness: One round and beyond. Proceedings of the ACM on Management of Data, 2(5):1–24, 2024.
  • [50] T. L. Veldhuizen. Leapfrog triejoin: A simple, worst-case optimal join algorithm. In ICDT, 2014.
  • [51] J. S. Vitter. External memory algorithms and data structures: Dealing with massive data. ACM Computing Surveys (CSUR), 33(2):209–271, 2001.
  • [52] X. Wang, H. Chan, and E. Shi. Circuit oram: On tightness of the goldreich-ostrovsky lower bound. In CCS, pages 850–861, 2015.
  • [53] Y. Wang and K. Yi. Query evaluation by circuits. In PODS, 2022.
  • [54] M. Yannakakis. Algorithms for acyclic database schemes. In VLDB, pages 82–94, 1981.
  • [55] W. Zheng, A. Dave, J. G. Beekman, R. A. Popa, J. E. Gonzalez, and I. Stoica. Opaque: An oblivious and encrypted distributed analytics platform. In NSDI 17, pages 283–298, 2017.

Appendix A Missing Materials in Section 1

Graph Joins. A join query 𝒬=(𝒱,)\mathcal{Q}=(\mathcal{V},\mathcal{E}) is a graph join if |e|2|e|\leq 2 for each ee\in\mathcal{E}, i.e., each relation contains at most two attributes.

Loomis-Whitney Joins.

A join query 𝒬=(𝒱,)\mathcal{Q}=(\mathcal{V},\mathcal{E}) is a Loomis-Whitney join if 𝒱={x1,x2,,xk}\mathcal{V}=\{x_{1},x_{2},\cdots,x_{k}\} and ={𝒱{xi}:i[k]}\mathcal{E}=\{\mathcal{V}-\{x_{i}\}:i\in[k]\}.

Appendix B Oblivious Primitives in Section 2

We provide the algorithm descriptions and pseudocode for the oblivious primitives declared in Section 2.2. We do not need to establish obliviousness for the local variables used in these primitives, namely key, val, 𝗉𝗈𝗌\mathsf{pos} and 𝖼𝗇𝗍\mathsf{cnt}, because they are stored in trusted memory during the entire execution of the algorithms, so the adversary cannot observe the access pattern to them. In contrast, all temporary arrays of non-constant size, such as KK and LL, are stored in untrusted memory.

SemiJoin.

Given two input relations RR, SS and their common attribute(s) xx, the goal is to replace each tuple in RR that cannot be joined with any tuple in SS with a dummy tuple \perp, i.e., compute RSR\ltimes S. As shown in Algorithm 10, we first sort all tuples by their join values and break ties by putting SS-tuples before RR-tuples if they share the same join value in xx. We then perform a linear scan, using an additional variable key to track the join value of the most recently visited SS-tuple, which, by the sort order, is the largest SS join value no larger than that of the current tuple tt. More specifically, we distinguish two cases on tt. Suppose tRt\in R. If πxt=key\pi_{x}t=\textsf{key}, we just write tt to the result array LL. Otherwise, we write a dummy tuple \perp to LL. Suppose tSt\in S. We simply write a dummy tuple \perp to LL and update key with πxt\pi_{x}t. At last, we compact the elements in LL, moving all \perp to the end and keeping the first |R||R| tuples of LL.

1 KK\leftarrow Sort RSR\cup S by attribute(s) xx, breaking ties by putting SS-tuples before RR-tuples when they have the same value in xx;
2key\textsf{key}\leftarrow\perp, LL\leftarrow\emptyset;
3 while read tt from KK do
4       if tRt\in R then
5             if tt\neq\perp and πxt=key\pi_{x}t=\textsf{key} then write tt to LL;
6             else write \perp to LL;
7            
8      else write \perp to LL, keyπxt\textsf{key}\leftarrow\pi_{x}t;
10      
11return Compact LL while keeping the first |R||R| tuples;
Algorithm 10 SemiJoin(R,S,xR,S,x)
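As a complement to the pseudocode, the following minimal Python sketch replays the scan of Algorithm 10 under simplifying assumptions: a single join attribute, Python's built-in sort standing in for the oblivious Sort primitive, and a plain filter standing in for Compact (so the sketch itself is not oblivious); all names are illustrative. The point to observe is that exactly one element is written per tuple of the sorted input, so the write pattern depends only on |R|+|S|.

DUMMY = None  # stands for "⊥"

def semi_join(R, S):
    """Sketch of Algorithm 10 for tuples of the form (join_value, payload)."""
    # sort R ∪ S by join value; S-tuples (tag 0) come before R-tuples (tag 1) on ties
    K = sorted([(t[0], 0, t) for t in S] + [(t[0], 1, t) for t in R],
               key=lambda x: (x[0], x[1]))
    key, L = DUMMY, []
    for join_val, tag, t in K:
        if tag == 1:                  # t ∈ R: keep it only if an S-tuple shares its key
            L.append(t if join_val == key else DUMMY)
        else:                         # t ∈ S: emit a dummy and remember its join value
            L.append(DUMMY)
            key = join_val
    # oblivious compaction would move dummies to the end; keep the first |R| slots
    real = [t for t in L if t is not DUMMY]
    return (real + [DUMMY] * len(R))[:len(R)]

print(semi_join([(1, "a"), (2, "b"), (3, "c")], [(2, "x"), (2, "y")]))
# [(2, 'b'), None, None]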
1 KSortK\leftarrow\textrm{{Sort}} RR by attribute(s) xx with all \perp moved to the last;
2 𝗄𝖾𝗒\mathsf{key}\leftarrow\perp, val0\textsf{val}\leftarrow 0, LL\leftarrow\emptyset;
3 while read tt from KK do
4       if t=t=\perp then  write \perp to LL;
5       else if tt\neq\perp and πxt=𝗄𝖾𝗒\pi_{x}t=\mathsf{key} then  write \perp to LL, valvalw(t)\textsf{val}\leftarrow\textsf{val}\oplus w(t);
6       else  write (key,val)(\textsf{key},\textsf{val}) to LL, valw(t)\textsf{val}\leftarrow w(t), keyπxt\textsf{key}\leftarrow\pi_{x}t;
7      
8write (key,val)(\textsf{key},\textsf{val}) to LL;
9 return Compact LL while keeping the first |R||R| tuples;
Algorithm 11 ReduceByKey(R,x,w(),R,x,w(\cdot),\oplus)

ReduceByKey.

Given an input relation RR defined over attributes ee, some of whose tuples are distinguished as \perp, a set of key attribute(s) xex\subseteq e, a weight function ww, and an aggregate function \oplus, the goal is to output, for each key value, the aggregate obtained by applying \oplus to the weights of all tuples sharing that key value. This primitive can be used to compute degree information, i.e., the number of tuples with a specific key value in a relation.

As shown in Algorithm 11, we sort all tuples by their key values (values in attribute(s) xx) while moving all distinguished tuples to the end of the relation. Then, we perform a linear scan, using an additional variable key to track the key value of the previous tuple, and val to track the aggregate over the weights of the tuples visited so far with that key. We distinguish three cases. If t=t=\perp, the remaining tuples in KK are all distinguished as \perp, implied by the sorting, and we write a dummy tuple \perp to LL. If tt\neq\perp and πxt=key\pi_{x}t=\textsf{key}, we write a dummy tuple \perp to LL and update val to valw(t)\textsf{val}\oplus w(t). If tt\neq\perp and keyπxt\textsf{key}\neq\pi_{x}t, the weights of all tuples with key key have already been aggregated into val; in this case, we write (key,val)(\textsf{key},\textsf{val}) to LL, update val with w(t)w(t), i.e., the weight of the current tuple, and update key with πxt\pi_{x}t. After the scan, we write the final pair (key,val)(\textsf{key},\textsf{val}) to LL (line 8). At last, we compact the tuples in LL by moving all \perp to the end and keeping the first |R||R| tuples in LL for obliviousness.
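For instance, with key-weight pairs (b1,2),(b1,3),(b2,5) and ⊕ = +, the scan writes a dummy entry for each of the first two pairs, writes (b1,5) when the key changes to b2, and writes (b2,5) after the loop; compaction then returns (b1,5),(b2,5) followed by one dummy tuple.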

1 KK\leftarrow Sort RSR\cup S by attribute(s) xx while moving all \perp to the last and breaking ties by putting SS-tuples before RR-tuples when they have the same value in xx;
2 key\textsf{key}\leftarrow\perp, val0\textsf{val}\leftarrow 0 , LL\leftarrow\emptyset;
3 while read tt from KK do
4       if t=t=\perp then write \perp to LL;
5       else if tSt\in S then
6             write \perp to LL, valπx¯t\textsf{val}\leftarrow\pi_{\bar{x}}t, keyπxt\textsf{key}\leftarrow\pi_{x}t
7      else if tRt\in R and πxt=key\pi_{x}t=\textsf{key} then write (t,val)(t,\textsf{val}) to LL;
8       else write \perp to LL;
9      
10return Compact LL while keeping the first |R||R| tuples;
Algorithm 12 Annotate(R,S,x)R,S,x))

Annotate.

Given an input relation RR, where each tuple is associated with a key, and a list SS of key-value pairs, where each pair has a distinct key, the goal is to attach to each tuple in RR the value of the pair in SS whose key matches it. As shown in Algorithm 12, we first sort all tuples in RR and SS by their key values in attribute xx, while moving all \perp to the end and breaking ties by putting SS-tuples before RR-tuples when they have the same key value. We then perform a linear scan, using two additional variables key,val\textsf{key},\textsf{val} to track the most recently visited SS-pair, i.e., the one with the largest key no larger than the key of the current tuple visited. We distinguish the following cases. If tt is an SS-tuple and tt\neq\perp, we update key,val\textsf{key},\textsf{val} with the key and value of tt and write \perp to LL. If tt is an RR-tuple, tt\neq\perp, and its key equals key, we attach val to tt by writing (t,val)(t,\textsf{val}) to LL. We write a dummy tuple \perp to LL in all remaining cases. Finally, we compact the tuples in LL to move the dummy tuples to the end, keeping the first |R||R| tuples.
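For instance, if R contains tuples r1,r2 with keys b1,b2 respectively and S = {(b1,5)}, the sorted order is (b1,5),r1,r2; the scan writes ⊥,(r1,5),⊥, and compaction returns (r1,5) followed by one dummy tuple, since no pair in S matches the key b2.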

1 KK\leftarrow Sort RR by attribute(s) xx;
2 key\textsf{key}\leftarrow\perp, val0\textsf{val}\leftarrow 0, LL\leftarrow\emptyset;
3 foreach tKt\in K do
4       if πxt=key\pi_{x}t=\textsf{key} then  valval+1\textsf{val}\leftarrow\textsf{val}+1 ;
5       else val1\textsf{val}\leftarrow 1,   keyπxt\textsf{key}\leftarrow\pi_{x}t;
6       write (t,val)(t,\textsf{val}) to LL;
7      
8return LL;
Algorithm 13 MultiNumber(R,xR,x)

MultiNumber.

Given an input relation RR, where each tuple is associated with a key in attribute(s) xx, the goal is to attach consecutive numbers 1,2,3,,1,2,3,\cdots, to the tuples sharing the same key.

As shown in Algorithm 13, we first sort all tuples in RR by attribute xx. We then perform a linear scan, using two additional variables key,val\textsf{key},\textsf{val} to track the key of the previous tuple and the number assigned to it. Consider tt as the current element visited. If πxt=key\pi_{x}t=\textsf{key}, we simply increase val by 11. Otherwise, we set val to 11 and update key with πxt\pi_{x}t. In both cases, we assign val to tuple tt and write (t,val)(t,\textsf{val}) to LL.
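For instance, three tuples with keys b1,b1,b2 (in sorted order) are assigned the numbers 1, 2, and 1, respectively.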

Project.

Given an input relation RR defined over attributes ee, and a subset of attributes xex\subseteq e, the goal is to output the list {πxt:tR}\{\pi_{x}t:t\in R\} (without duplicates). This primitive can be simply implemented by sorting by attribute(s) xx and then removing duplicates with a linear scan.

Intersect.

Given two input arrays R,SR,S, each containing distinct elements, the goal is to output the common elements appearing in both RR and SS. This primitive can be implemented by sorting by attribute(s) xx, after which a linear scan suffices to identify the common elements.

Augment.

Given two relations R,SR,S of at most NN tuples and their common attribute(s) xx, the goal is to attach to each tuple tRt\in R the number of tuples in SS that can be joined with tt on xx; Algorithm 14 states the slightly more general version in which RR is augmented with one such count for each of the relations S1,S2,,SkS_{1},S_{2},\cdots,S_{k}. The Augment primitive can be implemented by the ReduceByKey and Annotate primitives.

1 foreach i[k]i\in[k] do
2       LReduceByKey(Si,x)L\leftarrow\textsc{ReduceByKey}(S_{i},x);
3       RAnnotate(R,L,x)R\leftarrow\textsc{Annotate}(R,L,x);
4      
5return RR;
Algorithm 14 Augment(R,{S1,S2,,Sk},xR,\{S_{1},S_{2},\cdots,S_{k}\},x)

Appendix C RelaxedTwoWay Primitive

Given two relations R,SR,S of N1,N2N_{1},N_{2} tuples and an integral parameter τ\tau, where N1+N2=NN_{1}+N_{2}=N and |RS|τ|R\Join S|\leq\tau, the goal is to output a relation of size τ\tau whose first |RS||R\Join S| tuples are the join results and the remaining tuples are dummy tuples. Arasu et al. [5] first proposed an oblivious algorithm for τ=|RS|\tau=|R\Join S|, but it involves rather complicated primitives and does not give complete details [16]. Krastnikov et al. [40] later showed a cleaner and more effective version, but their algorithm does not have a satisfactory cache complexity. Below, we present our own version of the relaxed two-way join. We need one important helper primitive first.

Expand Primitive.

Given a sequence (ti,wi):wi+,i[N]\langle(t_{i},w_{i}):w_{i}\in\mathbb{Z}^{+},i\in[N]\rangle and a parameter τi[N]wi\tau\geq\sum_{i\in[N]}w_{i}, the goal is to expand each tuple tit_{i} into wiw_{i} copies and output a table of τ\tau tuples. The naive way of reading a pair (ti,wi)(t_{i},w_{i}) and then writing wiw_{i} copies does not preserve obliviousness, since the number of consecutive writes can leak the weights. Alternatively, one might consider writing a fixed number of tuples after reading each pair. Still, the order in which pairs are read is critical to avoid dummy writes and to avoid storing too many pairs in trusted memory (this is exactly the strategy adopted by [5]).

We present a simpler algorithm by composing our oblivious primitives. Suppose LL is the output table of RR, such that LL contains wiw_{i} copies of tit_{i}, and any copy of tit_{i} comes before any copy of tjt_{j} if i<ji<j. As described in Algorithm 15, the algorithm consists of the following phases:

  • (lines 1-4) for each pair (ti,wi)R(t_{i},w_{i})\in R with wi0w_{i}\neq 0, attach the beginning position of tit_{i} in R~\tilde{R}, which is 1+j<iwj1+\sum_{j<i}w_{j}. For the remaining pairs with wi=0w_{i}=0, replace them with \perp and attach the infinite position, as these tuples will not participate in any join result;

  • (lines 5-7) pad τ\tau dummy tuples and attach to them the consecutive positions 1.5,2.5,1.5,2.5,\cdots; after sorting by position, each tuple tit_{i} will be followed by wiw_{i} dummy tuples, and all tuples with infinite positions are placed at the end;

  • (lines 8-14) for each tuple tit_{i}, we replace it with \perp and replace the following wiw_{i} dummy tuples with copies of tit_{i}. After moving all remaining dummy tuples to the end, the first τ\tau elements form the output.
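For instance, on input ⟨(t1,2),(t2,0),(t3,1)⟩ with τ=5 (treating the zero-weight pair as a dummy, as described above), the first phase writes (t1,1) and (t3,3) and replaces (t2,0) by (⊥,+∞); after padding dummies at positions 1.5,2.5,…,5.5 and sorting, the scan of lines 8-14 produces ⊥,t1,t1,⊥,t3,⊥,⊥,⊥, and the final compaction returns t1,t1,t3,⊥,⊥.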

1 𝗉𝗈𝗌1\mathsf{pos}\leftarrow 1, KK\leftarrow\emptyset;
2 while read (ti,wi)(t_{i},w_{i}) from RR do
3      if ti=t_{i}=\perp or wi=0w_{i}=0 then write (,+)(\perp,+\infty) to KK;
4       else write (ti,𝗉𝗈𝗌)(t_{i},\mathsf{pos}) to KK, 𝗉𝗈𝗌𝗉𝗈𝗌+wi\mathsf{pos}\leftarrow\mathsf{pos}+w_{i};
5      
6𝗉𝗈𝗌1.5\mathsf{pos}\leftarrow 1.5;
7 foreach i[τ]i\in[\tau] do write (,𝗉𝗈𝗌)(\perp,\mathsf{pos}) to KK, 𝗉𝗈𝗌𝗉𝗈𝗌+1\mathsf{pos}\leftarrow\mathsf{pos}+1;
8 Sort KK by 𝗉𝗈𝗌\mathsf{pos};
9 tt\leftarrow\perp, 𝖼𝗇𝗍0\mathsf{cnt}\leftarrow 0, LL\leftarrow\emptyset;
10 while read (key,𝗉𝗈𝗌)(\textsf{key},\mathsf{pos}) from KK do
11       if 𝗉𝗈𝗌=+\mathsf{pos}=+\infty then  write \perp to LL;
12       else if key\textsf{key}\neq\perp then tkeyt\leftarrow\textsf{key}, write \perp to LL;
13       else if 𝖼𝗇𝗍<τ\mathsf{cnt}<\tau then write tt to LL ;
14       else write \perp to LL;
15       𝖼𝗇𝗍𝖼𝗇𝗍+1\mathsf{cnt}\leftarrow\mathsf{cnt}+1;
16      
17return Compact LL while keeping the first τ\tau elements;
Algorithm 15 Expand(R=(ti,wi):i[N],τR=\langle(t_{i},w_{i}):i\in[N]\rangle,\tau)

It can be easily checked that the access pattern of Expand only depends on the values of τ\tau and NN. Moreover, Expand is cache-agnostic since it is a sequential composition of cache-agnostic primitives (Scan, Sort, and Compact).

Lemma C.1.

Given a relation \mathcal{R} of input size NN and a parameter τ\tau, the Expand primitive is cache-agnostic with O((N+τ)log(N+τ))O\left((N+\tau)\cdot\log(N+\tau)\right) time complexity and O(N+τBlogMBN+τB)O\left(\frac{N+\tau}{B}\log_{\frac{M}{B}}\frac{N+\tau}{B}\right) cache complexity, whose access pattern only depends on NN and τ\tau.

Now, we are ready to describe the algorithmic details of the RelaxedTwoWay primitive. The high-level idea is to simulate the sort-merge join algorithm without revealing the movement of the pointers in the merge phase. Let L=R(x1,x2)S(x2,x3)L=R(x_{1},x_{2})\Join S(x_{2},x_{3}) be the join result sorted by x2,x3,x1x_{2},x_{3},x_{1} lexicographically. The idea is to transform RR and SS into the projections of LL onto attributes (x1,x2)(x_{1},x_{2}) and (x2,x3)(x_{2},x_{3}), respectively, without removing duplicates. Then, a one-to-one merge suffices to obtain the final join result. As described in Algorithm 16, we construct these two expanded relations from the input relations R,SR,S via the following steps (a running example is given in Figure 1):

  • (line 1) attach each tuple with the number of tuples it can be joined in the other relation;

  • (line 2) expand each tuple to the annotated number of copies;

  • (lines 3-4) prepare the expanded R~\tilde{R} and S~\tilde{S} with the “correct” ordering, as it appears in the final sort-merge join results;

  • (lines 5-8) perform a one-to-one merge of ordered tuples in R~\tilde{R} and S¯\bar{S};

As a sequential composition of (relaxed) oblivious primitives, RelaxedTwoWay is cache-agnostic, with O((N+τ)log(N+τ))O((N+\tau)\cdot\log(N+\tau)) time complexity and O(N+τBlogMBN+τB)O(\frac{N+\tau}{B}\cdot\log_{\frac{M}{B}}\frac{N+\tau}{B}) cache complexity, whose access pattern only depends on NN and τ\tau.

1 R^Augment(R,S,x2)\hat{R}\leftarrow\textsc{Augment}(R,S,x_{2}), S^Augment(S,R,x2)\hat{S}\leftarrow\textsc{Augment}(S,R,x_{2});
2 R~Expand(R^,τ)\tilde{R}\leftarrow\textsc{Expand}(\hat{R},\tau), S~Expand(S^,τ)\tilde{S}\leftarrow\textsc{Expand}(\hat{S},\tau);
3 S¯MultiNumber(S~,x2)\bar{S}\leftarrow\textsc{MultiNumber}(\tilde{S},x_{2}); // S~\tilde{S} is enriched with another attribute 𝗇𝗎𝗆\mathsf{num}
4 Sort S¯\bar{S} by attributes x2x_{2} and 𝗇𝗎𝗆\mathsf{num} lexicographically;
5 LL\leftarrow\emptyset;
6 while read t1t_{1} from R~\tilde{R} and read t2t_{2} from S¯\bar{S} do
7       if t1t_{1}\not=\perp and t2t_{2}\not=\perp then write t1t2t_{1}\Join t_{2} to LL;
8       else write \perp to LL;
9      
10return LL;
Algorithm 16 RelaxedTwoWay(R(x1,x2),S(x2,x3),τR(x_{1},x_{2}),S(x_{2},x_{3}),\tau)
Figure 1: A running example of Algorithm 16.

Appendix D Missing Materials in Section 5

1 (𝒯,λ)(\mathcal{T},\lambda)\leftarrow a GHD of 𝒬\mathcal{Q};
2 foreach node u𝒯u\in\mathcal{T} do
3       u{eλu:e}\mathcal{E}_{u}\leftarrow\{e\cap\lambda_{u}:e\in\mathcal{E}\};
4       foreach ee\in\mathcal{E} do Se,uπeλuReS_{e,u}\leftarrow\pi_{e\cap\lambda_{u}}R_{e} by the Project primitive;
5       𝒬uObliviousGenericJoin((λu,u),{Se,u:e})\mathcal{Q}_{u}\leftarrow\textsc{ObliviousGenericJoin}\left((\lambda_{u},\mathcal{E}_{u}),\{S_{e,u}:e\in\mathcal{E}\}\right);
6      
7while visit nodes u𝒯u\in\mathcal{T} in a bottom-up way (excluding the root) do
8       pup_{u}\leftarrow the parent node of uu;
9       𝒬puSemiJoin(𝒬pu,𝒬u)\mathcal{Q}_{p_{u}}\leftarrow\textsc{SemiJoin}(\mathcal{Q}_{p_{u}},\mathcal{Q}_{u});
10      
11while visit nodes u𝒯u\in\mathcal{T} in a top-down way (excluding the leaves) do
12       foreach child node vv of uu do  𝒬vSemiJoin(𝒬v,𝒬u)\mathcal{Q}_{v}\leftarrow\textsc{SemiJoin}(\mathcal{Q}_{v},\mathcal{Q}_{u});
13      
14while visit nodes u𝒯u\in\mathcal{T} in a bottom-up way (excluding the root) do
15       pup_{u}\leftarrow the parent node of uu;
16       𝒬puRelaxedTwoWay(𝒬pu,𝒬u,τ)\mathcal{Q}_{p_{u}}\leftarrow\textsc{RelaxedTwoWay}(\mathcal{Q}_{p_{u}},\mathcal{Q}_{u},\tau);
17      
18return 𝒬r\mathcal{Q}_{r} for the root node rr of 𝒯\mathcal{T};
Algorithm 17 RelaxedJoin(𝒬=(𝒱,),,τ)\textsc{RelaxedJoin}(\mathcal{Q}=(\mathcal{V},\mathcal{E}),\mathcal{R},\tau)