
Speed-up of Data Analysis with Kernel Trick
in Encrypted Domain

Joon Soo Yoo, Baek Kyung Song, Tae Min Ahn, Ji Won Heo, Ji Won Yoon
Korea University
Seoul, South Korea
{sandiegojs, baekkyung777, xoals3563, hjw4, jiwon_yoon}@korea.ac.kr
Abstract

Homomorphic encryption (HE) is pivotal for secure computation on encrypted data, crucial in privacy-preserving data analysis. However, efficiently processing high-dimensional data in HE, especially for machine learning and statistical (ML/STAT) algorithms, poses a challenge. In this paper, we present an effective acceleration method using the kernel method for HE schemes, enhancing time performance in ML/STAT algorithms within encrypted domains. This technique, independent of underlying HE mechanisms and complementing existing optimizations, notably reduces costly HE multiplications, offering near constant time complexity relative to data dimension. Aimed at accessibility, this method is tailored for data scientists and developers with limited cryptography background, facilitating advanced data analysis in secure environments.

Index Terms:
Homomorphic Encryption, Kernel Method, Privacy-Preserving Machine Learning, Privacy-Preserving Statistical Analysis, High-dimensional Data Analysis

I Introduction

Homomorphic encryption (HE) enables computations on encrypted data, providing quantum-resistant security in client-server models. Since its introduction by Gentry in 2009 [1], HE has rapidly evolved towards practical applications, offering privacy-preserving solutions in various data-intensive fields such as healthcare and finance.

Despite these advancements, applying HE to complex nonlinear data analysis poses significant challenges due to the computational demands of internal nonlinear functions. Although basic models such as linear regression are well-suited for HE, extending this to more complex algorithms, like those seen in logistic regression models [2, 3, 4, 5], often results in oversimplified linear or quasi-linear approaches. This limitation narrows the scope of HE’s applicability to more intricate models.

Furthermore, advanced data analysis algorithms resist simple linearization, shifting the focus of research to secure inference rather than encrypted training. Prominent models like nGraph-HE2 [6], LoLa [7], and CryptoNets [8] exemplify this trend, emphasizing inference over training, with reported MNIST latencies of 2.05 seconds for nGraph-HE2, 2.2 seconds for LoLa, and 250 seconds for CryptoNets. Note that these models are used only for inference and do not encrypt the trained parameters, relying instead on relatively inexpensive plaintext-ciphertext multiplications, which contributes to their reported time performance.

Since all of these problems arise from the time-consuming nature of HE operations, much of the literature focuses on optimizing FHE performance through hardware acceleration or new functionality in the internal primitives. Jung et al. [9] propose a memory-centric optimization technique that reorders the primary functions in bootstrapping to exploit massive parallelism; using GPUs, they achieve 100 times faster bootstrapping in one of the state-of-the-art FHE schemes, CKKS [10]. Likewise, [11] extracts a similar structure among the crucial functions in HE multiplication and uses GPUs to improve CKKS multiplication time by a factor of 4.05. Chillotti et al. [12] introduce the concept of programmable bootstrapping (PBS) in the TFHE library [13, 14], which can accelerate neural networks by evaluating nonlinear activation functions via PBS.

Our approach to optimization is fundamentally different from existing techniques, which primarily focus on hardware acceleration or improving cryptographic algorithms within specific schemes or libraries. Unlike these approaches, our optimizer operates independently of underlying cryptographic schemes or libraries, and works at a higher software (SW) level. As a result, it does not compete with other optimization techniques and can synergistically amplify their effects to improve time performance.

Moreover, our approach does not necessitate extensive knowledge of cryptographic design. By utilizing our technique, significant speed-ups in homomorphic circuits, particularly for machine learning and statistical algorithms, can be achieved with relative ease by data scientists and developers. Our approach requires only a beginner-level understanding of cryptography, yet yields remarkable improvements in performance that are difficult to match, especially for high-dimensional ML/STAT analysis.

Furthermore, our approach holds great promise for enabling training in the encrypted domain, where most advanced machine learning methods struggle to perform. We demonstrate the effectiveness of our technique by applying it to various machine learning algorithms, including the support vector machine (SVM), $k$-means clustering, and $k$-nearest neighbor algorithms. In classical implementations of these algorithms in the encrypted domain, the practical execution time is often prohibitive because high-dimensional data requires significant computation. Our approach, however, provides a nearly dimensionless property: the computation required for high-dimensional data remains nearly the same as for low-dimensional data, allowing efficient training even in the encrypted domain.

Our optimizer is based on the kernel method [15], a widely used technique in machine learning for identifying nonlinear relationships between attributes. The kernel method applies a function $\phi$ that maps the original input space to a higher-dimensional inner product space. This enables modified versions of machine learning algorithms expressed solely in terms of kernel elements, i.e., inner products of data points. By leveraging this approach, our optimizer enables efficient training of machine learning algorithms even in the presence of high-dimensional data, providing a powerful tool for encrypted machine learning.

Our findings suggest that the kernel trick can have a significant impact on HE execution time. This is due to the substantial performance gap between addition and multiplication operations in HE, where multiplication is far more complex by design. For example, in TFHE the multiplication-to-addition time ratio is about 21 for 16-bit inputs under Boolean evaluation. Moreover, BGV-like schemes (BGV, B/FV, CKKS) require additional steps after ciphertext multiplication, such as relinearization and modulus switching. In contrast, the latency of plain multiplication is approximately the same as that of plain addition [16].

The kernel method can effectively reduce the number of heavy multiplication operations required, as it transforms the original input space into an inner product space where kernel elements are used for algorithm evaluation. Since the inner products between data points are pre-evaluated in the preprocessing stage, and since each inner product involves $d$ (the data dimension) multiplications, much of the multiplication in the algorithm is eliminated in this process. This structure can lead to significant performance improvements for ML/STAT algorithms such as $k$-means and total variance.

To showcase the applicability of the kernel method to encrypted data, we performed SVM classification as a preliminary example on a subset of the MNIST dataset [17] using the CKKS encryption scheme, as implemented in the open-source OpenFHE library [18]. Specifically, we obtained estimated runtimes of 38.18 hours and 509.91 seconds for SVM's general and kernel methods, respectively (parameters: security level $\lambda=128$, polynomial ring dimension $N=2^{16}$, scaling factor $\Delta=2^{50}$, and circuit depth $L=110$; see [10]). This demonstrates a speed-up of approximately 269 times for the kernel method over the classical approach.

We summarize our contributions as follows:

  • Universal Applicability. Introduced the linear kernel trick to the HE domain, where our proposed method works independently, regardless of underlying HE schemes or libraries. The kernel method can synergistically combine with any of the underlying optimization techniques to boost performance.

  • Dimensionless Efficiency. Demonstrated near-constant time complexity across ML/STAT algorithms (classification, clustering, dimension reduction), leading to significant execution time reductions, especially for high-dimensional data.

  • Enhanced Training Potential. Shown potential for significantly improved ML training in the HE domain where current HE training models struggle.

  • User-Friendly Approach. Easily accessible for data scientists and developers with limited knowledge of cryptography.

II Preliminaries

II-A Homomorphic Encryption (HE)

Homomorphic encryption (HE) is a quantum-resistant encryption scheme that includes a unique evaluation step, allowing computations on encrypted data. The main steps for a general two-party (client-server model) symmetric HE scheme are as follows, with the client performing:

  • KeyGen($\lambda$): Given a security parameter $\lambda$, the algorithm outputs a secret key $\mathsf{sk}$ and an evaluation key $\mathsf{evk}$.

  • Enc($m$, $\mathsf{sk}$): Given a message $m\in M$ from the message space $M$, the encryption algorithm uses the secret key $\mathsf{sk}$ to generate a ciphertext $\mathsf{ct}$.

  • Dec($\mathsf{ct}$, $\mathsf{sk}$): Given a ciphertext $\mathsf{ct}$ encrypted under the secret key $\mathsf{sk}$, the decryption algorithm outputs the original message $m\in M$.

A distinguishing feature of HE, compared to other encryption schemes, is the evaluation step, Eval, which computes on encrypted messages and is performed by the server.

  • Eval($\mathsf{ct}_1,\dots,\mathsf{ct}_k$, $\mathsf{evk}$; $\psi$): Suppose a function $\psi:M^{k}\rightarrow M$ is to be evaluated over messages $m_1,\dots,m_k$. The evaluation algorithm takes as input the ciphertexts $\mathsf{ct}_1,\dots,\mathsf{ct}_k$, each encrypting the corresponding $m_i$ under the same $\mathsf{sk}$, and uses $\mathsf{evk}$ to generate a new ciphertext $\mathsf{ct}'$ such that $\mathsf{Dec}(\mathsf{ct}',\mathsf{sk})=\psi(m_1,\dots,m_k)$.
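To make the interface concrete, here is a minimal sketch of the client-server flow. `ToyHE` is a hypothetical placeholder mirroring the KeyGen/Enc/Eval/Dec steps above; it performs no real encryption and only illustrates the data flow.

```python
# Toy mock of the HE interface above -- NOT real encryption, just the data flow.

class ToyHE:
    def keygen(self, lam):             # KeyGen(lambda) -> (sk, evk)
        return "sk", "evk"

    def enc(self, m, sk):              # Enc(m, sk) -> ct
        return ("ct", m)               # identity "ciphertext" for illustration

    def eval(self, cts, evk, psi):     # Eval(ct_1..ct_k, evk; psi) -> ct'
        return ("ct", psi(*[m for _, m in cts]))   # server never sees sk

    def dec(self, ct, sk):             # Dec(ct', sk) = psi(m_1, ..., m_k)
        return ct[1]

he = ToyHE()
sk, evk = he.keygen(128)                        # client
cts = [he.enc(m, sk) for m in (3, 4)]           # client encrypts inputs
ct_out = he.eval(cts, evk, lambda a, b: a * b)  # server evaluates psi
assert he.dec(ct_out, sk) == 12                 # client recovers psi(3, 4)
```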

Fully or Leveled HE. The most promising HE schemes rely on the lattice-based learning with errors (LWE) problem, which Regev [19] proposed in 2005, followed by its ring variant by Stehlé et al. [20] in 2009. LWE-based HE uses noise for security; however, noise accumulates with each evaluation on a ciphertext, and once it exceeds a certain limit, decryption no longer guarantees the correct result. Gentry [1] introduced a bootstrapping technique that periodically reduces the noise in a ciphertext, enabling an unlimited number of evaluations, i.e., fully homomorphic encryption. Bootstrapping is costly, however, so many practical algorithms bypass it and instead use leveled homomorphic encryption (LHE): the circuit depth is predetermined, and parameters are chosen just large enough to evaluate the circuit without bootstrapping.

Arithmetic or Boolean. There are two primary branches of FHE: arithmetic and Boolean-based. (1) Arithmetic HE, the BGV [21] family, supports only addition and multiplication and must approximate nonlinear operations with them; it generally uses the LHE approach and offers faster evaluation. B/FV [22] and CKKS [10] are representative examples. (2) Boolean-based HE, the GSW [23] family including FHEW [24] and TFHE [14], provides fast-bootstrapping gates such as XOR and AND. It is typically slower than arithmetic HE but offers functionality beyond basic arithmetic.

II-B Kernel Method

In data analysis, the kernel method is a technique that facilitates a transformation $\phi:X\rightarrow F$ from the input space $X$ to a high-dimensional feature space $F$, enabling the separation of data in $F$ that is not linearly separable in the original space $X$. The kernel function $K(\mathbf{x}_i,\mathbf{x}_j)=\phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j)$ computes the inner product of $\mathbf{x}_i$ and $\mathbf{x}_j$ in $F$ without explicitly mapping the data to $F$. The kernel matrix $\mathbf{K}$ is an $n\times n$ symmetric matrix containing all pairwise inner products $K(\mathbf{x}_i,\mathbf{x}_j)$, where the data matrix $\mathbf{D}$ contains $n$ data points $\mathbf{x}_i$, each of dimension $d$. For more details, refer to [25].
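Since this paper uses the linear kernel, $\phi$ is the identity and $\mathbf{K}$ is simply the Gram matrix of the data. A minimal NumPy sketch (the data values are illustrative):

```python
import numpy as np

# Linear kernel: K[i, j] = x_i . x_j, i.e., the n x n Gram matrix of the data.
# D holds n data points of dimension d as rows (random values for illustration).
n, d = 10, 784
rng = np.random.default_rng(0)
D = rng.standard_normal((n, d))

K = D @ D.T  # all pairwise inner products in one shot

assert K.shape == (n, n)    # square, one entry per pair of points
assert np.allclose(K, K.T)  # symmetric: K(x_i, x_j) = K(x_j, x_i)
```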

III Kernel Effect in Homomorphic Encryption

III-A Why is Kernel More Effective in HE than Plain Domain?

The kernel method can greatly benefit from the structural difference between addition and multiplication operations in HE. The main reason is that the kernel method avoids the complex HE multiplication and relies instead on nearly free additions.

Time-consuming HE Multiplication. Homomorphic multiplication is significantly more complex than homomorphic addition, unlike plain multiplication, whose time performance is similar to that of plain addition. For example, in the BGV family of homomorphic encryption schemes, the multiplication of two ciphertexts $\mathbf{ct}=(\mathsf{ct}_0,\mathsf{ct}_1)$ and $\mathbf{ct}'=(\mathsf{ct}'_0,\mathsf{ct}'_1)\in\mathcal{R}_q^2$ is defined as:

$$\mathbf{ct}_{\text{mult}}=(\mathsf{ct}_0\cdot\mathsf{ct}'_0,\ \mathsf{ct}_0\cdot\mathsf{ct}'_1+\mathsf{ct}'_0\cdot\mathsf{ct}_1,\ \mathsf{ct}_1\cdot\mathsf{ct}'_1)$$

where $\mathcal{R}_q=\mathbb{Z}_q[X]/(X^N+1)$ is the polynomial ring. The ciphertext $\mathbf{ct}_{\text{mult}}$ grows in size after multiplication, requiring an additional process called relinearization to return it to normal form. In the CKKS scheme, a rescaling procedure is additionally used to maintain a constant scale. HE addition, by contrast, is merely component-wise addition of polynomial ring elements in $\mathcal{R}_q$. Thus, the performance gap between addition and multiplication in HE is substantial compared to the plain domain (see Table I).

TABLE I: Comparison of Execution Times and Ratios for Addition and Multiplication in the Plain and HE Domains. Plain times are averaged over 1000 runs; TFHE uses $\lambda=128$, $l=16$; CKKS uses $\lambda=128$, $N=2^{16}$, $\Delta=2^{50}$, $L=50$; B/FV uses $\lambda=128$, $L=20$, $n=2^{15}$, $\log_{2}(q)=780$.

                 Plain      TFHE       CKKS        B/FV
Addition         3.39 ns    1.06 s     24.85 ms    1.81 ms
Multiplication   3.56 ns    22.95 s    920.75 ms   284.62 ms
Ratio            1.05       21.65      37.04       156.73

III-B Two Kernel Properties for Acceleration

P1: Parallel Computation of Kernel Function. One property of kernel evaluation is that it induces parallel structure in the algorithm, namely in computing the kernel elements $K(\mathbf{x}_i,\mathbf{x}_j)=\phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j)$. Since evaluating kernel elements involves the same dot-product structure over input vectors of the same size, HE can benefit from computing these elements in parallel. For instance, in the basic HE model, (1) the server can compute the kernel elements concurrently during the pre-processing stage, or (2) the client can compute the kernel elements and send the encrypted kernel to the server as an alternative input, replacing the original encrypted data for further computation. Either way, the server bypasses the heavy multiplications in the pre-processing stage, accelerating the overall kernel evaluation (see Fig. 1).

Refer to caption
Figure 1: (P1) Parallel computation structure for evaluating kernel elements. Expensive HE multiplications can be pre-computed in parallel during the initial stage for enhanced performance. Each kernel element evaluation can be assigned to a separate processor, enabling concurrent computation over the kernel matrix $\mathbf{K}$.

P2: Dimensionless—Constant Time Complexity with respect to Dimension. A key feature of the kernel method in HE schemes is its dimensionless property. Generally, evaluating ML/STAT algorithms requires numerous inner products of vectors, each involving $d$ multiplications. Given the high cost of HE multiplications, this causes significant performance degradation. By employing the kernel method, we circumvent these dot products: dimension-dependent computations are avoided during kernel evaluation, leading to a total time complexity that is constant with respect to the data dimension $d$, unlike general algorithms, which depend linearly on $d$ (see Fig. 2).

Refer to caption
Figure 2: (P2) Dimensionless—constant time complexity w.r.t. dimension. While general ML/STAT algorithms require the computation of inner products involving costly multiplications, the kernel method bypasses these dot products by utilizing kernel elements or scalars.

III-C Example: $k$-means Algorithm

The $k$-means algorithm is an iterative process for finding optimal clusters. It alternates between (1) cluster reassignment and (2) centroid update (see Fig. 3). The bottleneck is the cluster reassignment step, which computes the distance $d(\mathbf{x}_j,\bm{\mu}_i)$ from each centroid $\bm{\mu}_i$. Since evaluating a distance requires a dot product of the deviation from the centroid, $d$ multiplications are needed per distance. Assuming the algorithm converges in $t$ iterations, the total number of multiplications is $tnkd$.

Refer to caption
Figure 3: Example of the $k$-means algorithm. The kernel method reduces the number of multiplications by approximately a factor of $d$ (P2). The inner products, i.e., the kernel elements, are computed in parallel at the initial stage (P1). The total time complexity is nearly constant with respect to the dimension $d$.

Using the kernel method and its properties, we can significantly improve the time performance of the circuit. First, we express the distance in terms of the kernel elements:

$$d(\mathbf{x}_j,\bm{\mu}_i)=\mathop{\sum\sum}_{\mathbf{x}_a,\mathbf{x}_b\in g_i}K(\mathbf{x}_a,\mathbf{x}_b)-2n_i\sum_{\mathbf{x}_a\in g_i}K(\mathbf{x}_j,\mathbf{x}_a) \qquad (1)$$

where $n_i$ is the number of elements in the cluster $g_i\in\{g_1,\dots,g_k\}$; as in Eq. (2) below, the common term $K(\mathbf{x}_j,\mathbf{x}_j)$ and the divisions are omitted for efficiency. Only one multiplication is performed in Eq. (1), which reduces the number of multiplications by a factor of $d$ relative to the general distance calculation. Therefore, the total number of multiplications in kernel $k$-means is $tnk$ (P2, dimensionless property), versus $tnkd$ in the general method. Note that the kernel elements can be evaluated in parallel at the start of the algorithm, compensating for the $d$-multiplication cost (P1, parallel structure).
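As a plain-domain illustration, the sketch below performs one cluster-reassignment pass using the score of Eq. (1) on a precomputed kernel matrix; the names `K` and `labels` are ours, not from the paper.

```python
import numpy as np

def kernel_kmeans_assign(K, labels, k):
    """One reassignment pass of kernel k-means: for each point j, choose
    argmin_i of  sum_{a,b in g_i} K[a,b] - 2 * n_i * sum_{a in g_i} K[j,a],
    i.e., Eq. (1) with the common K[j,j] term dropped."""
    n = K.shape[0]
    new_labels = np.empty(n, dtype=int)
    for j in range(n):
        scores = []
        for i in range(k):
            members = np.where(labels == i)[0]
            n_i = len(members)
            pairwise = K[np.ix_(members, members)].sum()  # sum over x_a, x_b in g_i
            cross = K[j, members].sum()                   # sum over x_a in g_i
            scores.append(pairwise - 2 * n_i * cross)     # one multiplication per cluster
        new_labels[j] = int(np.argmin(scores))
    return new_labels
```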

Kernel Method—More Effective in the Encrypted Domain: $k$-means Simulation. Based on the algorithms in Fig. 3, we simulate the $k$-means algorithm in different domains (plain, TFHE, CKKS, and B/FV) to demonstrate the effectiveness of the kernel method in the encrypted domain (see Fig. 4). We count the total number of additions and multiplications in $k$-means; based on the HE parameter sets and execution times in Table I, we compute the total simulation time with fixed parameters $t=10$, $k=3$.

Refer to caption
(a) $(n=10,\ d=784)$
Refer to caption
(b) $(n=100,\ d=784)$
Figure 4: $k$-means simulation: effectiveness of the kernel method in $k$-means across different domains (plain, TFHE, CKKS, and B/FV). The parameters were fixed at $k=3$ and $t=10$, and the log ratio of general time ($t_{gen}$) to kernel time ($t_{ker}$) was compared.

The results indicate that the kernel method is more effective in the encrypted domain. Fig. 4a shows that the kernel method yields a large speed-up when the time ratio between multiplication and addition is large. For instance, B/FV attains a maximum speed-up of 645 times, whereas the plain-domain kernel reaches 45 times (CKKS: 416 times, TFHE: 315 times). Moreover, Fig. 4b demonstrates that the kernel effect on time performance can differ depending on the parameter set; there, B/FV has a speed-up of 35 times.

Furthermore, the kernel trick can even have a negative impact on execution time. For example, the plain-domain kernel in Fig. 4b degrades the total execution time by 53 percent. This highlights the sensitivity of the plain-domain kernel's performance to parameter variations. We provide actual experimental results on the kernel effect in Section VI.

III-D Benefits of the Kernel Method

Software-Level Optimization—Synergistic with Any HE Scheme or Library. The kernel method is synergistic with any HE scheme, amplifying the speed-up achieved by HE hardware accelerators since it operates at the software level. For instance, [11] uses GPUs at the hardware level to accelerate HE multiplication by 4.05 times in the CKKS scheme. When the kernel method (269 times on its own) is applied on top to evaluate the SVM of dimension 784, the combined acceleration can exceed 1,076 times. Most current literature focuses on hardware acceleration of individual HE schemes; the kernel method does not compete with such accelerators but further enhances the time performance of HE circuits.

Reusability of the Kernel Matrix. The kernel method enables reuse of the kernel matrix across other kernelized algorithms. The server can store both the dataset $\mathbf{D}$ and the kernel matrix $\mathbf{K}$, eliminating the need to recompute inner products of data points. This results in faster computations and efficient resource utilization.

IV Kernel Trick and Asymptotic Complexities

This section analyzes the time complexities of the general and kernel evaluations of the ML/STAT algorithms. We first summarize the theoretical time complexities in Table II, then illustrate how they are derived. Note that the $k$-means and $k$-NN algorithms use a Boolean-based approach, while the remaining algorithms are implemented with an arithmetic HE scheme.

TABLE II: Comparison of Time Complexities in ML/STAT Algorithms

Algorithm        General TC                            Kernel TC
SVM              $O(tn^{2}d)$                          $O(tn^{2})$
PCA              $O(m^{2}+mt_{pow}+nm)$                $O(nr+rt_{pow}+3nt_{sqrt})$
$k$-NN           $O(10ndl+6ndl^{2}+\Delta_{shared})$   $O(15nl+\Delta_{shared})$
$k$-means        $O(6tnkdl^{2}+tnkdl)$                 $O(11tn^{2}kl+3tnk^{2}l)$
Total Variance   $O(nd)$                               $O(1)$
Distance         $O(d)$                                $O(1)$
Norm             $O(d+3t_{sqrt})$                      $O(3t_{sqrt})$
Similarity       $O(3d+3t_{sinv}+6t_{sqrt})$           $O(3t_{sinv}+6t_{sqrt})$

IV-A Arithmetic HE Construction

Matrix Multiplication and Linear Transformation. Halevi and Shoup [26] demonstrate efficient matrix and matrix-vector multiplication within HE schemes. Specifically, the time complexity of multiplying two $n\times n$ matrices is optimized to $O(n^2)$, while linear transformations are improved to $O(n)$.

Dominant Eigenvalue: Power Iteration. The power iteration method [28] repeatedly multiplies a matrix by an initial vector until convergence. For a square matrix of degree $m$, assuming $t_{pow}$ iterations, the time complexity is $O(mt_{pow})$.

Square Root and Inverse of Real Numbers. Wilkes's iterative algorithm [29] approximates the square root operation with complexity $O(t_{sqrt})$. Goldschmidt's division algorithm approximates the inverse operation with complexity $O(t_{sinv})$.
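The exact iterations of [29] and Goldschmidt's method are not reproduced here; as a stand-in under that assumption, the Newton-style iterations below show why such approximations cost $O(t_{sqrt})$ and $O(t_{sinv})$ multiplications and why they suit HE: each step uses only additions and multiplications.

```python
def inv_sqrt(x, t=25, y0=0.1):
    """Newton iteration for 1/sqrt(x): y <- y * (3 - x*y^2) / 2.
    Add/mult only, so each of the t steps maps to a few HE multiplications.
    y0 is a rough initial guess assumed to lie in the basin of convergence."""
    y = y0
    for _ in range(t):
        y = y * (3 - x * y * y) / 2
    return y

def approx_sqrt(x, t=25):
    return x * inv_sqrt(x, t)  # sqrt(x) = x * (1/sqrt(x))

def approx_inv(x, t=25, y0=0.1):
    """Newton iteration for 1/x: y <- y * (2 - x*y), again add/mult only."""
    y = y0
    for _ in range(t):
        y = y * (2 - x * y)
    return y
```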

IV-A1 SVM

Kernel Trick. The main computation in SVM is finding the optimal parameter $\bm{\alpha}=(\alpha_1,\dots,\alpha_n)$ that defines the optimal hyperplane $h^{*}(\mathbf{x})=\mathbf{w}^{T}\mathbf{x}+b$. For optimization, we use an SGD algorithm that iterates the gradient update rule: $\alpha_k=\alpha_k+\eta(1-y_k\sum_{i=1}^{n}\alpha_i y_i \mathbf{x}_i^{T}\mathbf{x}_k)$.

The kernel trick for SVM replaces $\mathbf{x}_i^{T}\mathbf{x}_k$ with the linear kernel element $K(\mathbf{x}_i,\mathbf{x}_k)$. Thus, we compute $\alpha_k=\alpha_k+\eta(1-y_k\sum_{i=1}^{n}\alpha_i y_i K(\mathbf{x}_i,\mathbf{x}_k))$ for the kernel evaluation.

(1) General Method. The gradient update rule for $\alpha_k$ requires $n(d+2)+1=O(nd)$ multiplications. For the complete set $\bm{\alpha}$, and assuming worst-case convergence in $t$ iterations, the total time complexity is $O(tn^{2}d)$.

(2) Kernel Method. The kernel trick bypasses the inner products, each of which requires $d$ multiplications. Hence, updating $\alpha_k$ requires only $2n+1=O(n)$ multiplications ($O(n^{2})$ for all of $\bm{\alpha}$). Therefore, the total time complexity is $O(tn^{2})$.
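A plain-domain sketch of this kernelized update; the learning rate `eta` and iteration count `t` are illustrative choices, not values from the paper.

```python
import numpy as np

def kernel_svm_sgd(K, y, eta=0.01, t=100):
    """Gradient updates on the dual variables:
    alpha_k += eta * (1 - y_k * sum_i alpha_i y_i K[i, k]).
    Every update reads kernel entries only; no d-dimensional vector appears."""
    n = K.shape[0]
    alpha = np.zeros(n)
    for _ in range(t):
        for k in range(n):
            margin = y[k] * np.dot(alpha * y, K[:, k])  # sum_i alpha_i y_i K(x_i, x_k)
            alpha[k] += eta * (1 - margin)
    return alpha
```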

IV-A2 PCA

Kernel Trick. Suppose $\lambda_1$ is the dominant eigenvalue of the covariance matrix $\bm{\Sigma}$ with corresponding eigenvector $\mathbf{u}_1$. Then the eigenpair $(\lambda_1,\mathbf{u}_1)$ satisfies $\bm{\Sigma}\mathbf{u}_1=\lambda_1\mathbf{u}_1$. Since $\bm{\Sigma}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i\mathbf{x}_i^{T}$, we can express $\mathbf{u}_1=\sum_{i=1}^{n}c_i\mathbf{x}_i$ where $c_i=\frac{\mathbf{x}_i^{T}\mathbf{u}_1}{n\lambda_1}$. Substituting the formulas for $\bm{\Sigma}$ and $\mathbf{u}_1$ and multiplying both sides by $\mathbf{x}_k^{T}$ yields $\sum_{i=1}^{n}\mathbf{x}_k^{T}\mathbf{x}_i\sum_{j=1}^{n}c_j\mathbf{x}_i^{T}\mathbf{x}_j=n\lambda_1\sum_{i=1}^{n}c_i\mathbf{x}_k^{T}\mathbf{x}_i$ for all $\mathbf{x}_k\in\mathbf{D}$. Replacing the inner products with the corresponding kernel elements $K(\cdot,\cdot)$, we derive the following equation expressed purely in kernel elements:

$$\sum_{i=1}^{n}K(\mathbf{x}_k,\mathbf{x}_i)\sum_{j=1}^{n}c_j K(\mathbf{x}_i,\mathbf{x}_j)=n\lambda_1\sum_{i=1}^{n}c_i K(\mathbf{x}_k,\mathbf{x}_i).$$

Further simplification yields $\mathbf{K}\mathbf{c}=(n\lambda_1)\mathbf{c}$, where $\mathbf{c}$ is an eigenvector of the kernel matrix $\mathbf{K}$. Thus, a reduced basis element $\mathbf{u}_i$ is derived by scaling $\mathbf{c}_i$ with $\sqrt{\lambda_i\mathbf{c}_i^{T}\mathbf{c}_i}$.
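A plain-domain sketch of this reduction, with our own normalization bookkeeping: power iteration on $\mathbf{K}$ recovers $(n\lambda_1,\mathbf{c})$, and projecting any point onto $\mathbf{u}_1=\sum_i c_i\mathbf{x}_i$ again needs only kernel entries.

```python
import numpy as np

def kernel_pca_top(K, t_pow=100):
    """Power iteration on the kernel matrix: K c = (n * lambda_1) c.
    Returns lambda_1 and c rescaled so that u_1 = sum_i c_i x_i has unit
    norm (u_1 . u_1 = c^T K c = 1)."""
    n = K.shape[0]
    c = np.ones(n) / np.sqrt(n)
    for _ in range(t_pow):
        c = K @ c
        c /= np.linalg.norm(c)
    n_lambda = c @ (K @ c)      # Rayleigh quotient = n * lambda_1
    c /= np.sqrt(n_lambda)      # now c^T K c = 1
    return n_lambda / n, c

# Projection of x_k onto u_1 via kernel entries only:
#   u_1 . x_k = sum_i c_i K(x_i, x_k)
```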

Comparison of Complexities. Both the general and kernel methods compute eigenvalues and eigenvectors by power iteration, applied to the covariance matrix $\bm{\Sigma}$ and the kernel matrix $\mathbf{K}$, respectively.

(1) General Method. This algorithm involves one matrix multiplication, power iteration, and $n$ matrix-vector multiplications. The time complexity is $O(m^{2}+mt_{pow}+nm)$, where $m=\max(n,d)$.

(2) Kernel Method. This algorithm involves power iteration, $r$ normalizations (square roots), and $rn$ inner products. The time complexity is $O(nt_{pow}+3rt_{sqrt}+nr)$.

IV-A3 Total Variance

Kernel Trick. The linear kernel trick simplifies the usual variance formula by expressing it in terms of dot products:

$$TV=\frac{1}{n}\sum_{i=1}^{n}\lVert\mathbf{x}_i-\bm{\mu}\rVert^{2}=\frac{1}{n}\sum_{i=1}^{n}K(\mathbf{x}_i,\mathbf{x}_i)-\frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{j=1}^{n}K(\mathbf{x}_i,\mathbf{x}_j)$$

where the second equality follows by substituting $\bm{\mu}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i$.
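With $\mathbf{K}$ precomputed, the kernelized total variance reduces to two sums over existing entries. A minimal sketch with a sanity check against the direct formula:

```python
import numpy as np

def total_variance_kernel(K):
    """TV = (1/n) * trace(K) - (1/n^2) * sum_{i,j} K[i,j]; no d-dependent work."""
    n = K.shape[0]
    return np.trace(K) / n - K.sum() / n**2

# Sanity check against the direct formula (1/n) * sum_i ||x_i - mu||^2.
D = np.random.default_rng(1).standard_normal((10, 784))
K = D @ D.T
direct = np.mean(np.sum((D - D.mean(axis=0)) ** 2, axis=1))
assert np.isclose(total_variance_kernel(K), direct)
```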

(1) General Method. This requires $n$ dot products of deviations; the time complexity is $O(nd)$.

(2) Kernel Method. The kernel trick for total variance bypasses the dot products, resulting in a time complexity of $O(1)$.

IV-A4 Distance / Norm

Kernel Trick. We can apply the linear kernel trick to the distance $d(\mathbf{x}_i,\mathbf{x}_j)=\lVert\mathbf{x}_i-\mathbf{x}_j\rVert^{2}$:

$$d(\mathbf{x}_i,\mathbf{x}_j)=\lVert\mathbf{x}_i\rVert^{2}-2\mathbf{x}_i^{T}\mathbf{x}_j+\lVert\mathbf{x}_j\rVert^{2}=K(\mathbf{x}_i,\mathbf{x}_i)-2K(\mathbf{x}_i,\mathbf{x}_j)+K(\mathbf{x}_j,\mathbf{x}_j).$$

(1) General Method. The general distance formula requires $O(d)$ for evaluating a dot product. The norm additionally entails a square root operation; hence, $O(d+3t_{sqrt})$.

(2) Kernel Method. The kernel method requires only additions and subtractions, i.e., $O(1)$. Likewise, the kernelized norm involves the square root operation; thus, $O(3t_{sqrt})$.
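A minimal plain-domain sketch of both kernelized quantities; `sqrt` stands in for the iterative HE square-root approximation discussed above.

```python
import math

def kernel_distance(K, i, j):
    """||x_i - x_j||^2 from kernel entries alone: O(1) instead of O(d)."""
    return K[i][i] - 2 * K[i][j] + K[j][j]

def kernel_norm(K, i, sqrt=math.sqrt):
    """||x_i|| = sqrt(K[i,i]); in HE, sqrt is the O(t_sqrt) iterative routine."""
    return sqrt(K[i][i])
```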

IV-A5 Similarity

Kernel Trick. Applying the linear kernel trick to the cosine similarity formula, we have

$$sim(\mathbf{x}_i,\mathbf{x}_j)=\frac{\mathbf{x}_i\cdot\mathbf{x}_j}{\lVert\mathbf{x}_i\rVert\,\lVert\mathbf{x}_j\rVert}=\frac{K(\mathbf{x}_i,\mathbf{x}_j)}{\sqrt{K(\mathbf{x}_i,\mathbf{x}_i)}\sqrt{K(\mathbf{x}_j,\mathbf{x}_j)}}.$$

(1) General Method. We compute three dot products, one multiplication, one scalar inverse, and two square roots for the general cosine similarity. Hence, it requires $O(3d+3t_{sinv}+6t_{sqrt})$.

(2) Kernel Method. We bypass the evaluation of the three inner products using the kernel method. Thus, the total time complexity is $O(3t_{sinv}+6t_{sqrt})$.
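The corresponding sketch for similarity; `sqrt` and `inv` stand in for the iterative HE approximations costing $O(t_{sqrt})$ and $O(t_{sinv})$.

```python
import math

def kernel_cosine(K, i, j, sqrt=math.sqrt, inv=lambda v: 1.0 / v):
    """Cosine similarity from kernel entries: no d-dimensional dot products.
    sqrt/inv are placeholders for the HE-friendly iterative approximations."""
    return K[i][j] * inv(sqrt(K[i][i]) * sqrt(K[j][j]))
```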

IV-B Boolean-based HE Construction

$k$-means and $k$-NN Boolean Construction. Performing $k$-means and $k$-NN in the encrypted domain requires functionalities beyond addition and multiplication, such as comparison operations. In the $k$-means algorithm, it is essential to evaluate the minimum encrypted distance to label data based on proximity to each cluster's encrypted mean. Similarly, $k$-NN necessitates evaluating the maximum among encrypted label counts given the encrypted distances to all data points. Detailed algorithms for $k$-means and $k$-NN are provided in the Appendix.

Time Complexity for TFHE Circuit Evaluation. We measure the algorithms' performance by counting the total number of binary gates required for exact computation. A MUX gate takes about twice the time of a single Boolean gate such as XOR, AND, or OR [14]. We assume that a TFHE ciphertext encrypts an $l$-bit fixed-point number, with $l/2$ bits for decimals, $l/2-1$ bits for integers, and 1 bit for the sign. Table III presents the time complexity of the fundamental TFHE operations used throughout the paper.

TABLE III: Time complexity of TFHE operations used in the paper

Operation           #(Binary Gates)
Comparison          $3l$
argmin / argmax     $k(k-1)(3l+1)$
Add / Subt.         $5l-3$
Multiplication      $6l^{2}+15l-6$
Absolute Value      $4l-1$
Two's Complement    $2l-1$
Division            $\frac{27}{2}l^{2}-\frac{3}{2}l+1$

IV-B1 $k$-means

Kernel Trick. Let $\mathcal{d}_{ij}$ denote the distance from data point $\mathbf{x}_i$ to the mean of cluster $g_j$.

$$\mathcal{d}_{ij}=\lVert\mathbf{x}_i-\bm{\mu}_j\rVert^{2}=\mathbf{x}_i^{T}\mathbf{x}_i-\frac{2}{n_j}\sum_{\mathbf{x}_a\in g_j}\mathbf{x}_i^{T}\mathbf{x}_a+\frac{1}{n_j^{2}}\mathop{\sum\sum}_{\mathbf{x}_a,\mathbf{x}_b\in g_j}\mathbf{x}_a^{T}\mathbf{x}_b$$

where $n_j$ is the number of elements in cluster $g_j$. Applying the linear kernel trick, we can compute $\mathcal{d}_{ij}$ using kernel elements:

$$\mathcal{d}_{ij}=K(\mathbf{x}_i,\mathbf{x}_i)-\frac{2}{n_j}\sum_{\mathbf{x}_a\in g_j}K(\mathbf{x}_i,\mathbf{x}_a)+\frac{1}{n_j^{2}}\mathop{\sum\sum}_{\mathbf{x}_a,\mathbf{x}_b\in g_j}K(\mathbf{x}_a,\mathbf{x}_b)$$

Further simplification yields the objective function:

$$j^{*}=\underset{j}{\text{argmin}}\left\{-2n_j\sum_{\mathbf{x}_a\in g_j}K(\mathbf{x}_i,\mathbf{x}_a)+\mathop{\sum\sum}_{\mathbf{x}_a,\mathbf{x}_b\in g_j}K(\mathbf{x}_a,\mathbf{x}_b)\right\} \qquad (2)$$

where the common factor $K(\mathbf{x}_i,\mathbf{x}_i)$ and the divisions are omitted for efficiency.

(1) General Method. The $k$-means algorithm involves four steps: computing the distances $\mathcal{d}_{ij}$, labeling $\mathbf{x}_i$, forming the labeled dataset $\mathbf{D}^{(i)}$, and calculating the cluster means $\bm{\mu}_j$. The total complexity of $k$-means is asymptotically $O(6tnkdl^{2}+tnkdl)$, assuming $t$ iterations until convergence.

(2) Kernel Method. The kernel $k$-means algorithm involves four steps: computing the labeled kernel matrix $K^{(j)}$, counting the number of elements $\mathcal{n}_j$ in each cluster, computing the distances $\mathcal{d}_{ij}$, and labeling $\mathbf{x}_i$. The total complexity of kernel $k$-means is asymptotically $O(11tn^{2}kl+3tnk^{2}l)$, assuming $t$ iterations until convergence.

IV-B2 $k$-NN

Kernel Trick. The linear kernel trick can be applied to the distance $\mathcal{d}_i$ in $k$-NN:

$$\mathcal{d}_i=\lVert\mathbf{x}-\mathbf{x}_i\rVert^{2}=\mathbf{x}^{T}\mathbf{x}-2\mathbf{x}^{T}\mathbf{x}_i+\mathbf{x}_i^{T}\mathbf{x}_i=K(\mathbf{x},\mathbf{x})-2K(\mathbf{x},\mathbf{x}_i)+K(\mathbf{x}_i,\mathbf{x}_i).$$

For computational efficiency, we simplify $\mathcal{d}_i$ by removing the common kernel element $K(\mathbf{x},\mathbf{x})$:

$$\mathcal{d}_i=-2K(\mathbf{x},\mathbf{x}_i)+K(\mathbf{x}_i,\mathbf{x}_i). \qquad (3)$$
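A plain-domain sketch of Eq. (3); `K_train` (the $n\times n$ training kernel) and `k_query` (the vector of $K(\mathbf{x},\mathbf{x}_i)$ values) are our own names. Sorting these scores is equivalent to sorting the true distances, since the dropped $K(\mathbf{x},\mathbf{x})$ term is common to all $i$.

```python
import numpy as np

def kernel_knn_scores(K_train, k_query):
    """Eq. (3): d_i = -2 * K(x, x_i) + K(x_i, x_i), computed with additions
    and one constant multiplication per point -- no d-dependent work."""
    return -2 * k_query + np.diag(K_train)
```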

Comparison of Complexities. Both $k$-NN algorithms share the same processes: sorting, counting, and finding the majority label. Specifically, sorting requires $(n-1)^{2}$ swaps, each involving 4 MUX gates and one comparison circuit, for a cost of $11(n-1)^{2}l$. Counting the elements in each class requires $ks$ comparisons and additions, costing $ks(8l-3)$. Finding the majority index among $s$ labels costs $s(s-1)(3l+1)$. Thus, the shared complexity is $O(11n^{2}l+8ksl+3s^{2}l)$, denoted $\Delta_{shared}$.

(1) General Method. Computing $\mathcal{d}_i=\lVert\mathbf{x}-\mathbf{x}_i\rVert^{2}$ involves $d$ subtractions, $d$ multiplications, and $d-1$ additions, giving $O(10ndl+6ndl^{2})$. Including the shared process, the total complexity is $O(10ndl+6ndl^{2}+\Delta_{shared})$.

(2) Kernel Method. Using Eq. (3), each $\mathcal{d}_i$ is computed with one addition and two subtractions, totaling $3n$ additions. Including the shared process, the total complexity is $O(15nl+\Delta_{shared})$.

V Experiment

V-A Evaluation Metric

We aim to evaluate the effectiveness of the kernel method in various ML/STAT algorithms using the following metrics.

(1) Kernel Effect in HE. Let $t_{gen}$ and $t_{ker}$ denote the execution times for the general and kernel methods, respectively. The kernel effectiveness, or speed-up, is evaluated by the ratio:

$$EFF=\frac{t_{gen}}{t_{ker}}.$$

(2) Kernel Effect Comparison: Plain and HE. Let $t_{gen}^{PL}$ and $t_{ker}^{PL}$ denote the execution times of the general and kernel methods in the plain domain, and $t_{gen}^{HE}$ and $t_{ker}^{HE}$ the corresponding times in the HE domain. The effectiveness of the kernel method in the two domains is compared via the respective ratios $EFF^{PL}$ and $EFF^{HE}$.

V-B Experiment Setting

Environment. Experiments were conducted on an Intel Core i7-7700 8-Core 3.60 GHz, 23.4 GiB RAM, running Ubuntu 20.04.5 LTS. We used TFHE version 1.1 for Boolean-based HE and CKKS from OpenFHE [18] for arithmetic HE.

Dataset and Implementation Strategy. We used a randomly generated dataset of $n=10$ data points with dimensions ranging over $[n/2, 3n/2]$. The dataset is intentionally small due to the computational cost of logic gates in TFHE; for example, $k$-means with parameters $(n=10, d=10)$ under the TFHE scheme takes 8,071 seconds. Despite the small dataset, this work aligns with ongoing efforts such as Google's Transpiler [27] and its extension HEIR, which use TFHE for practical HE circuit construction. Employing the kernel method can significantly enhance their scalability and efficiency.

Parameter Setting. Consistent parameters were used for both the general and kernel methods across all algorithms.

(1) TFHE Construction. TFHE constructions were set to a 110-bit security level with 16-bit message precision. (Initial tests at 128-bit security were revised after recent attacks demonstrated a lower effective security level.)

(2) CKKS Construction. CKKS constructions used a 128-bit security level and a leveled approach to avoid bootstrapping. The parameters ($N$, $\Delta$, $L$) were pre-determined to minimize computation time (see Table IV).

(3) $k$-means and $k$-NN. For the experiments, $k=3$ and $s=2$ were used for $k$-means, and $k=3$ for $k$-NN.

TABLE IV: Parameter setting for CKKS implementation

Algorithm    $\lambda$   RingDim $N$   ScalingMod $\Delta$   MultDepth $L$
SVM          128         $2^{16}$      $2^{50}$              110
PCA          128         $2^{16}$      $2^{50}$              39
Variance     128         $2^{16}$      $2^{50}$              33
Distance     128         $2^{16}$      $2^{50}$              33
Norm         128         $2^{16}$      $2^{50}$              33
Similarity   128         $2^{16}$      $2^{50}$              50

VI Results and Analysis

VI-A Kernel Effect in the Encrypted Domain

Refer to caption
(a) SVM
Refer to caption
(b) PCA
Refer to caption
(c) $k$-means
Refer to caption
(d) $k$-NN
Figure 5: Execution Time Comparison Between ML’s General and Kernel Methods Showing Kernel’s Dimensionless Property (P2).

Kernel Method in ML/STAT: Acceleration in HE Schemes. Fig. 5 demonstrates the stable execution times of the kernel methods in ML algorithms within encrypted domains, contrasting with the linear increase observed for the general methods. This consistency in both CKKS and TFHE settings aligns with our complexity analysis in Table II, improving computational efficiency for algorithms such as SVM, PCA, $k$-means, and $k$-NN. Notably, for general PCA, we observe a linear time increase for $d<n$ and a rapid, $O(m^{2})$ rise for $d>n$ due to the quadratic cost of evaluating the covariance matrix $\bm{\Sigma}=\mathbf{D}^{T}\mathbf{D}$.

Refer to caption
(a) Total variance
Refer to caption
(b) Distance
Refer to caption
(c) Norm
Refer to caption
(d) Similarity
Figure 6: Execution Time Comparison Between STAT’s General and Kernel Methods Showing Kernel’s Outperformance.

Fig. 6 shows that the kernel method consistently outperforms general approaches in STAT algorithms within HE schemes. The kernel method maintains stable execution times across various statistical algorithms, unlike the linearly increasing times of general methods as data dimension grows. This aligns with our complexity calculations (Table II), highlighting the kernel method’s efficiency in encrypted data analysis. The dimensionless property of the kernel method makes it a powerful tool for high-dimensional encrypted data analysis, enabling efficient computations in complex settings.

Kernelization Time (Pre-Processing Stage). Kernelization occurs during preprocessing, with the kernel elements computed in parallel. Although not included in the main evaluation duration, kernelization time is minimal, constituting less than 5% of the total evaluation time. For instance, in SVM, kernel generation accounts for only about 2% of the total evaluation time (10.82 of 509.91 seconds). Additionally, the reusability of kernel elements in subsequent applications further reduces the need for repeated computations.

Refer to caption
(a) SVM
Refer to caption
(b) PCA
Refer to caption
(c) $k$-NN
Refer to caption
(d) $k$-means
Refer to caption
(e) Total Variance
Figure 7: Comparison of Kernel and General Methods in ML/STAT Algorithms for Different Data Sizes ($n$) at Dimension $d=10$.

Kernel Method Outperforms with Increasing Data. We present the execution times of four ML algorithms (SVM, PCA, $k$-NN, $k$-means) and of total variance among the STAT algorithms for varying numbers of data points ($n=5,10,15,20$) at a fixed dimension ($d=10$), as shown in Fig. 7. Our results demonstrate that the kernel method consistently outperforms the general method in execution time even as the number of data points increases. This aligns with our complexity analysis (see Table II), where general SVM and PCA grow quadratically, while $k$-NN, $k$-means, and total variance grow linearly.

VI-B Kernel Effect Comparison: Plain vs HE

Refer to caption
(a) SVM
Refer to caption
(b) PCA
Refer to caption
(c) $k$-means
Refer to caption
(d) $k$-NN
Figure 8: Impact of Kernel Methods in ML Alg. Across Plain and HE Domains.
Refer to caption
(a) Total variance
Refer to caption
(b) Distance
Refer to caption
(c) Norm
Refer to caption
(d) Similarity
Figure 9: Impact of Kernel Method on STAT Alg. in Plain and HE Domains.

Significant Kernel Effect in the HE Domain. Our experiments demonstrate a more substantial kernel effect in the HE domain compared to the plain domain for both ML and STAT algorithms. This effect is consistently observed across algorithms (see Fig. 8 and Fig. 9). For example, the kernel effect in SVM within HE amplifies performance by 2.95-6.26 times, compared to 1.58-1.69 times in the plain domain, across dimensions 5-15. This enhanced performance is due to the kernel method’s ability to reduce heavy multiplicative operations, a significant advantage in HE where multiplication is more time-consuming than addition (see Section III).

VII Conclusion

This paper introduces the kernel method as an effective optimizer for homomorphic circuits in ML/STAT applications, applicable across various HE schemes. We systematically analyze the kernel optimization and demonstrate its effectiveness through complexity analysis and experiments. Our results show significant performance improvements in the HE domain, highlighting the potential for widespread use in secure ML/STAT applications.

References

  • [1] C. Gentry, ”Fully homomorphic encryption using ideal lattices,” in Proc. 41st Annu. ACM Symp. Theory Comput., 2009, pp. 169–178.
  • [2] A. Kim, Y. Song, M. Kim, K. Lee, and J. H. Cheon, ”Logistic regression model training based on the approximate homomorphic encryption,” BMC Med. Genomics, vol. 11, no. 4, pp. 23–31, 2018.
  • [3] H. Chen, R. Gilad-Bachrach, K. Han, Z. Huang, A. Jalali, K. Laine, and K. Lauter, ”Logistic regression over encrypted data from fully homomorphic encryption,” BMC Med. Genomics, vol. 11, no. 4, pp. 3–12, 2018.
  • [4] Y. Aono, T. Hayashi, L. T. Phong, and L. Wang, ”Scalable and secure logistic regression via homomorphic encryption,” in Proc. Sixth ACM Conf. Data Appl. Secur. Priv., 2016, pp. 142–144.
  • [5] E. Crockett, ”A low-depth homomorphic circuit for logistic regression model training,” Cryptology ePrint Archive, 2020.
  • [6] F. Boemer, A. Costache, R. Cammarota, and C. Wierzynski, ”nGraph-HE2: A high-throughput framework for neural network inference on encrypted data,” in Proc. 7th ACM Workshop Encrypted Comput. & Appl. Homomorphic Cryptogr., 2019, pp. 45–56.
  • [7] A. Brutzkus, R. Gilad-Bachrach, and O. Elisha, ”Low latency privacy preserving inference,” in Int. Conf. Mach. Learn., 2019, pp. 812–821.
  • [8] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, and J. Wernsing, ”Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy,” in Int. Conf. Mach. Learn., 2016, pp. 201–210.
  • [9] W. Jung, S. Kim, J. H. Ahn, J. H. Cheon, and Y. Lee, ”Over 100x faster bootstrapping in fully homomorphic encryption through memory-centric optimization with GPUs,” IACR Trans. Cryptogr. Hardw. Embed. Syst., pp. 114–148, 2021.
  • [10] J. H. Cheon, A. Kim, M. Kim, and Y. Song, ”Homomorphic encryption for arithmetic of approximate numbers,” in Int. Conf. Theory Appl. Cryptol. Inf. Secur., Springer, 2017, pp. 409–437.
  • [11] W. Jung, E. Lee, S. Kim, J. Kim, N. Kim, K. Lee, C. Min, J. H. Cheon, and J. H. Ahn, ”Accelerating fully homomorphic encryption through architecture-centric analysis and optimization,” IEEE Access, vol. 9, pp. 98772–98789, 2021.
  • [12] I. Chillotti, M. Joye, and P. Paillier, ”Programmable bootstrapping enables efficient homomorphic inference of deep neural networks,” in Int. Symp. Cyber Secur. Cryptogr. Mach. Learn., Springer, 2021, pp. 1–19.
  • [13] I. Chillotti, N. Gama, M. Georgieva, and M. Izabachene, ”Faster fully homomorphic encryption: Bootstrapping in less than 0.1 seconds,” in Adv. Cryptol.–ASIACRYPT, 2016, pp. 3–33.
  • [14] I. Chillotti, N. Gama, M. Georgieva, and M. Izabachène, ”TFHE: Fast Fully Homomorphic Encryption over the Torus,” J. Cryptol., vol. 33, no. 1, pp. 34–91, 2020.
  • [15] M. J. Zaki and W. Meira, Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, 2014.
  • [16] Part Guide, ”Intel® 64 and ia-32 architectures software developer’s manual,” Vol. 3B: System Programming Guide, Part, vol. 2, no. 11, 2011.
  • [17] L. Deng, ”The MNIST database of handwritten digit images for machine learning research,” IEEE Signal Process. Mag., vol. 29, no. 6, pp. 141–142, 2012.
  • [18] A. Al Badawi et al., ”OpenFHE: Open-Source Fully Homomorphic Encryption Library,” in Proc. 10th Workshop Encrypted Comput. & Appl. Homomorphic Cryptogr., 2022, pp. 53–63.
  • [19] O. Regev, ”On lattices, learning with errors, random linear codes, and cryptography,” J. ACM, vol. 56, no. 6, pp. 1–40, 2009.
  • [20] D. Stehlé, R. Steinfeld, K. Tanaka, and K. Xagawa, ”Efficient public key encryption based on ideal lattices,” in ASIACRYPT, 2009, pp. 617–640.
  • [21] Z. Brakerski, C. Gentry, and V. Vaikuntanathan, ”(Leveled) fully homomorphic encryption without bootstrapping,” ACM Trans. Comput. Theory, vol. 6, no. 3, pp. 1–36, 2014.
  • [22] J. Fan and F. Vercauteren, ”Somewhat practical fully homomorphic encryption,” Cryptology ePrint Archive, 2012.
  • [23] C. Gentry, A. Sahai, and B. Waters, ”Homomorphic encryption from learning with errors: Conceptually-simpler, asymptotically-faster, attribute-based,” in Annu. Cryptol. Conf., 2013, pp. 75–92.
  • [24] L. Ducas and D. Micciancio, ”FHEW: bootstrapping homomorphic encryption in less than a second,” in Annu. Int. Conf. Theory Appl. Cryptogr. Tech., 2015, pp. 617–640.
  • [25] B. Schölkopf, A. J. Smola, and F. Bach, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.
  • [26] S. Halevi and V. Shoup, ”Faster homomorphic linear transformations in HElib,” in Annu. Int. Cryptol. Conf., 2018, pp. 93–120.
  • [27] S. Gorantala, R. Springer, S. Purser-Haskell, W. Lam, R. Wilson, A. Ali, E. P. Astor, I. Zukerman, S. Ruth, C. Dibak, et al., ”A general purpose transpiler for fully homomorphic encryption,” arXiv preprint arXiv:2106.07893, 2021.
  • [28] G. H. Golub and H. A. Van der Vorst, ”Eigenvalue computation in the 20th century,” J. Comput. Appl. Math., vol. 123, no. 1–2, pp. 35–65, 2000.
  • [29] M. V. Wilkes, D. J. Wheeler, and S. Gill, The Preparation of Programs for an Electronic Digital Computer: With special reference to the EDSAC and the Use of a Library of Subroutines. Addison-Wesley Press, 1951.

Appendix A ML / STAT

Appendix B $k$-means and $k$-NN: Boolean Construction

B-A Boolean-based HE Construction: $k$-means

(1) General Method. Constructing the $k$-means circuit in the encrypted domain presents a primary challenge: the iterative update of the encrypted labels $\bm{\mathcal{l}}$ in each iteration.

$$l_i=\underset{j}{\text{argmin}}\{\mathcal{d}_{ij}\}_{j=1}^{k}. \qquad (4)$$

This involves two key tasks: 1) labeling the data $\mathbf{x}_i$ using Eq. (4), and 2) computing the average value for each cluster $g_i$ based on the encrypted labels.

Issue 1: Labeling Data. We solve Eq. (4) using the TFHE.LEQ($\mathsf{ct}_1$, $\mathsf{ct}_2$) comparison operation, which returns $\mathsf{Enc}(1)$ if $\mathsf{ct}_1$ is less than or equal to $\mathsf{ct}_2$, and $\mathsf{Enc}(0)$ otherwise. Algorithm 1 determines the index of the minimum value among a set of encrypted ciphertexts $\{\mathcal{d}_i\}_{i=1}^{k}$. The output is a binary ciphertext vector $\bm{\mathcal{l}}^{*}=(\mathcal{l}_1,\dots,\mathcal{l}_k)$, where a non-zero $\mathcal{l}_j$ indicates the label corresponding to $\mathbf{x}_i$.

for $i=1,\dots,k$ do
    $\mathcal{l}_i \leftarrow \mathsf{Enc}(1)$
    for $j=1,\dots,k$, $j\neq i$ do
        $\mathcal{t} \leftarrow$ TFHE.LEQ($\mathcal{d}_i$, $\mathcal{d}_j$)
        $\mathcal{l}_i \leftarrow$ TFHE.AND($\mathcal{l}_i$, $\mathcal{t}$)
$\bm{\mathcal{l}}^{*} \leftarrow (\mathcal{l}_1,\dots,\mathcal{l}_k)$
Algorithm 1 TFHE.argmin$_i$ $\{\mathcal{d}_i\}_{i=1,\dots,k}$

Issue 2: Computing Average Value. Identifying the data $\mathbf{x}_i$ in a specific group $g_j$ from the encrypted labels $\bm{\mathcal{l}}_i$ is challenging. We address this by extracting the cluster data $\mathbf{D}^{(j)}$, setting non-belonging data to zero through an AND operation between each label $\mathcal{l}_{ij}$ and its respective data $\mathbf{x}_i$ (see Algorithm 2).

/* $\mathbf{x}_j \leftarrow \mathbf{x}_j$ or $\mathbf{x}_j \leftarrow \mathsf{Enc}(0)$, selected by label $\mathcal{l}_{ji}$ */
foreach $\mathbf{x}_j=(x_{j1},\dots,x_{jl})\in\mathbf{D}$ do
    $x_{jk} \leftarrow$ TFHE.AND($\mathcal{l}_{ji}$, $x_{jk}$), $k=1,\dots,l$
Algorithm 2 TFHE.GetClusterData($\mathbf{D}$, $\bm{\mathcal{L}}$, $i$)

With $\mathbf{D}^{(j)}$ identified, the mean $\bm{\mu}_j$ of each cluster is computed by summing all elements in $\mathbf{D}^{(j)}$ and dividing by the number of elements in class $g_j$, which is obtained by summing the encrypted labels $\mathcal{l}_{ji}$ over all $\mathbf{x}_i\in\mathbf{D}$ (see Algorithm 3).

for $i=1,\dots,k$ do
    $\mathbf{D}^{(i)} \leftarrow$ TFHE.GetClusterData($\mathbf{D}$, $\bm{\mathcal{L}}$, $i$)
    $\mathcal{n}_i \leftarrow \sum_{j=1}^{n}\mathcal{l}_{ji}$
    $\bm{\mu}_i \leftarrow \frac{1}{\mathcal{n}_i}\sum_{\mathbf{x}_j'\in\mathbf{D}^{(i)}}\mathbf{x}_j'$
Algorithm 3 TFHE.ClusterMean($\mathbf{D}$, $\bm{\mathcal{L}}$)
/* parallel compute linear kernel */
$\bm{\mathcal{L}} \leftarrow \{\bm{\mathcal{l}}_i \mid \bm{\mathcal{l}}_i=\mathsf{Enc}(a \overset{\$}{\leftarrow} \{1,\dots,k\}),\ i=1,\dots,n\}$
repeat
    for $i=1,\dots,n$ do
        for $j=1,\dots,k$ do
            $\mathbf{K}^{(j)} \leftarrow$ TFHE.GetClusterData($\mathbf{K}$, $\bm{\mathcal{L}}$, $j$)
            $\mathcal{n}_j \leftarrow \sum_{s=1}^{n}\mathcal{l}_{sj}$
            $\mathcal{p} \leftarrow \sum_{a=1}^{n}\sum_{b=1}^{n}\mathbf{K}^{(j)}(\mathbf{x}_a,\mathbf{x}_b)$
            $\mathcal{q} \leftarrow -2\,\mathcal{n}_j\sum_{a=1}^{n}\mathbf{K}^{(j)}(\mathbf{x}_i,\mathbf{x}_a)$
            $\mathcal{d}_j \leftarrow \mathcal{p}+\mathcal{q}$
        $\bm{\mathcal{l}}_i \leftarrow$ TFHE.argmin$_j$ $\{\mathcal{d}_j\}_{j=1,\dots,k}$
until $t$ times
Algorithm 4 TFHE.Kernel$k$-means($\mathbf{D}$, $k$, $t$)

(2) Kernel Method. Kernel $k$-means reduces to solving Eq. (2). Directly extracting the partial sums from the kernel matrix is infeasible because the labels are encrypted. Therefore, we proceed as follows (see Algorithm 4 for the complete kernel evaluation):

  • Obtain the cluster-specific kernel $K^{(j)}$, where $K^{(j)}(\mathbf{x}_a,\mathbf{x}_b)$ equals $K(\mathbf{x}_a,\mathbf{x}_b)$ if both $\mathbf{x}_a,\mathbf{x}_b\in g_j$, and zero otherwise.

  • Compute the partial sums of Eq. (2) using the cluster-specific kernel $K^{(j)}$.

Proc. 1: Cluster-specific Kernel. We obtain $K^{(j)}$ using the labels $\mathcal{l}^{*}_{ij}$ and AND gates:

$$\mathcal{t} \leftarrow \text{TFHE.AND}(\mathcal{l}^{*}_{aj},\mathcal{l}^{*}_{bj}),$$
$$K^{(j)}(\mathbf{x}_a,\mathbf{x}_b)[s] \leftarrow \text{TFHE.AND}(\mathcal{t},K(\mathbf{x}_a,\mathbf{x}_b)[s])$$

where $\mathcal{t}$ is $\mathsf{Enc}(1)$ if both $\mathbf{x}_a$ and $\mathbf{x}_b$ belong to $g_j$, and $[s]$ indexes the bits of the ciphertext.

Proc. 2: Partial Sum Evaluation. We compute the sums over all $n\geq n_j$ entries of $K^{(j)}$:

$$\sum_{\mathbf{x}_a\in g_j}K(\mathbf{x}_i,\mathbf{x}_a)=\sum_{a=1}^{n}K^{(j)}(\mathbf{x}_i,\mathbf{x}_a)$$
$$\mathop{\sum\sum}_{\mathbf{x}_a,\mathbf{x}_b\in g_j}K(\mathbf{x}_a,\mathbf{x}_b)=\sum_{b=1}^{n}\sum_{a=1}^{n}K^{(j)}(\mathbf{x}_a,\mathbf{x}_b)$$

B-B Boolean-based HE Construction: $k$-NN

(1) General Method. The $k$-NN algorithm involves 1) sorting, 2) counting, and 3) finding the majority label among encrypted data. We address these issues as follows.

Issue 1: Sorting. We use the TFHE.minMax function, detailed in Algorithm 5, to arrange a pair of ciphertexts $\mathsf{ct}_1,\mathsf{ct}_2$ in ascending order using the TFHE.LEQ operation. Using $\mathcal{t}$ and its negation $\sim\mathcal{t}$ as selectors in MUX gates, we obtain the ordered pair $(\mathsf{ct}_{\text{min}},\mathsf{ct}_{\text{max}})$.

$\mathcal{t} \leftarrow$ TFHE.LEQ($\mathsf{ct}_1$, $\mathsf{ct}_2$); $\sim\mathcal{t} \leftarrow$ TFHE.NOT($\mathcal{t}$)
for $j=1,\dots,r$ do
    $\mathsf{ct}_j^{min} \leftarrow$ TFHE.MUX($\mathcal{t}$, $\mathsf{ct}_{1j}$, $\mathsf{ct}_{2j}$)
    $\mathsf{ct}_j^{max} \leftarrow$ TFHE.MUX($\sim\mathcal{t}$, $\mathsf{ct}_{1j}$, $\mathsf{ct}_{2j}$)
    $\mathcal{l}_j^{min} \leftarrow$ TFHE.MUX($\mathcal{t}$, $\mathcal{l}_{1j}$, $\mathcal{l}_{2j}$)
    $\mathcal{l}_j^{max} \leftarrow$ TFHE.MUX($\sim\mathcal{t}$, $\mathcal{l}_{1j}$, $\mathcal{l}_{2j}$)
Algorithm 5 TFHE.minMax($(\mathsf{ct}_1,\mathcal{l}_1)$, $(\mathsf{ct}_2,\mathcal{l}_2)$)

By employing the TFHE.minMax function, Algorithm 6 executes the bubble sort algorithm, yielding a sequence $\{\mathcal{d}_i^{*}\}_{i=1}^{n}$ sorted in ascending order, together with the respective labels $\mathcal{l}_i^{*}$.

$\mathcal{a}_j \leftarrow (\mathcal{d}_j,\mathcal{l}_j)$, $j=1,\dots,n$
repeat
    for $j=1,\dots,n-1$ do
        $(\mathcal{a}_j,\mathcal{a}_{j+1}) \leftarrow$ TFHE.minMax($\mathcal{a}_j$, $\mathcal{a}_{j+1}$)
until $n-1$ times
Algorithm 6 TFHE.BubbleSort($\{\mathcal{d}_i\}_{i=1}^{n}$, $\bm{\mathcal{l}}$)

Issue 2: Counting. From the sorted distances $\{\mathcal{d}_i^{*}\}_{i=1}^{n}$ and respective labels $\{\mathcal{l}_i^{*}\}_{i=1}^{n}$, we count the number of elements of each class $g_j$ among the $k$ nearest neighbors, repeatedly using the TFHE.EQ comparison, which outputs $\mathsf{Enc}(1)$ if the two input ciphertexts are equal (see Algorithm 7).

for $i=1,\dots,s$ do
    $\mathcal{n}_i \leftarrow \mathsf{Enc}(0)$
    for $j=1,\dots,k$ do
        $\mathcal{t} \leftarrow$ TFHE.EQ($\mathsf{Enc}(i)$, $\mathcal{l}_j^{*}$)
        $\mathcal{n}_i \leftarrow \mathcal{n}_i + \mathcal{t}$
Algorithm 7 TFHE.CountClass($\{\mathcal{l}_i^{*}\}_{i=1,\dots,k}$, $s$)

Issue 3: Majority Label. Finally, we apply the TFHE.argmax operation to $\{\mathcal{n}_j\}_{j=1}^{s}$ to determine the majority label.

(2) Kernel Method. The kernel evaluation of $k$-NN replaces the distance calculations with kernel elements (see Algorithm 8).

/* parallel compute linear kernel */
$\mathcal{d}_i \leftarrow -2K(\mathbf{x}_i,\mathbf{x})+K(\mathbf{x}_i,\mathbf{x}_i)$, $i=1,\dots,n$
$\bm{\mathcal{d}} \leftarrow (\mathcal{d}_1,\dots,\mathcal{d}_n)$; $\bm{\mathcal{y}} \leftarrow (\mathcal{y}_1,\dots,\mathcal{y}_n)$
$(\mathcal{d}_i^{*},\mathcal{l}_i^{*})_{i=1,\dots,n} \leftarrow$ TFHE.BubbleSort($\bm{\mathcal{d}}$, $\bm{\mathcal{y}}$)
$(\mathcal{n}_1,\dots,\mathcal{n}_s) \leftarrow$ TFHE.CountClass($\{\mathcal{l}_i^{*}\}_{i=1,\dots,k}$, $s$)
$\hat{y} \leftarrow$ TFHE.argmax$_j$ $\{\mathcal{n}_j\}_{j=1,\dots,s}$
Algorithm 8 TFHE.Kernel$k$-NN($\mathbf{D}$, $k$, $\mathbf{x}$)