
11institutetext: Center for Mathematical Sciences, Huazhong University of Science and Technology, Wuhan 430074, China 22institutetext: School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430074, China 33institutetext: School of Mathematical and Statistical Sciences, Clemson University, Clemson SC 29634-0975 U.S.A 44institutetext: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China 55institutetext: Greater Bay University, Dongguan 523830, China

Privacy-Preserving Discretized Spiking Neural Networks

Pengbo Li 11 Ting Gao Corresponding author: tgao0716@hust.edu.cn 11 Huifang Huang 22 Jiani Cheng 11 Shuhong Gao 33 Zhigang Zeng 44 Jinqiao Duan 55
Abstract

The rapid development of artificial intelligence has brought considerable convenience, yet it also introduces significant security risks. One research hotspot is how to balance data privacy and utility in real-world artificial intelligence. Present second-generation artificial neural networks have made tremendous advances, but large models can incur very high computational costs. The third-generation neural network, the Spiking Neural Network (SNN), mimics biological neurons using discrete spike signals, whose sequences exhibit strong sparsity, providing advantages such as low energy consumption and high efficiency. In this paper, we construct a framework named FHE-DiSNN to evaluate the homomorphic computation of SNNs, enabling an SNN to achieve good prediction performance on encrypted data. First, benefiting from the discrete nature of spike signals, our proposed model avoids the errors introduced by discretizing activation functions. Second, by applying bootstrapping, we design new privacy-preserving functions, FHE-Fire and FHE-Reset, through which noise can be refreshed, allowing us to evaluate an SNN for an arbitrary number of operations. Furthermore, we improve the computational efficiency of FHE-DiSNN while maintaining a high level of accuracy. Finally, we evaluate our model on the MNIST dataset. The experiments show that FHE-DiSNN with 30 neurons in the hidden layer achieves a minimum prediction accuracy of 94.4%. Under optimal parameters, it achieves 95.1% accuracy, only a 0.6% decrease compared to the original SNN (95.7%). These results demonstrate the superiority of SNNs over second-generation neural networks for homomorphic evaluation.

Keywords:
Privacy Computing · Fully Homomorphic Encryption · Spiking Neural Network · Bootstrapping

1 Introduction

Privacy-Preserved AI. Machine learning algorithms based on deep neural networks have attracted extensive attention as a key technology in Artificial Intelligence (AI). These achievements have been widely applied in various fields such as image processing, intelligent transportation, and security. However, users often lack the local computing power needed to train neural network models with a large number of parameters, which motivates MLaaS (Machine Learning as a Service) [33], i.e., outsourcing the computation of neural network models to cloud services. Outsourcing, in turn, brings risks of data security breaches. To address this issue, many privacy protection techniques are applied to machine learning models, such as homomorphic encryption (HE), differential privacy (DP), and cryptography-based secure multi-party computation (SMC).

Homomorphic encryption refers to the ability to perform arbitrary computations on ciphertexts without decryption. This unique property gives homomorphic encryption broad theoretical and practical applications, such as secure encrypted retrieval in cloud computing and secure multi-party computation, so researching homomorphic encryption holds significant scientific and practical value. In 2009, Gentry [20, 19] constructs the first fully homomorphic encryption (FHE) scheme, a major breakthrough in cryptography. So far, there have been four generations of FHE. In the first generation [19], Gentry constructs a true bootstrapping procedure, although its practical performance is poor. The second-generation schemes, represented by BGV [5] and BFV [15], introduce a technique called modulus reduction, which builds leveled HE schemes that can compute additions and multiplications of a predefined depth. Another advantage of the second-generation schemes is SIMD operation, allowing thousands of plaintexts to be processed in parallel in the corresponding ciphertext slots, greatly improving performance. CKKS [10] is a modification of the BFV scheme that supports homomorphic real-number operations with fixed precision. The third-generation schemes, including FHEW [14], TFHE [11], and Gao et al. [18, 8], feature fast bootstrapping and enable an unlimited number of operations.

Although there are many works based on the early second-generation FHE, those schemes only support homomorphic addition and multiplication, while practical computations often involve non-linear operations such as comparison and maximization, especially the activation functions in neural networks. To address these issues, Gilad-Bachrach et al. [23] propose the CryptoNets method, which replaces non-linear activation functions with polynomial functions; however, polynomials of high degree are needed for a good approximation of the non-linear functions used in machine learning. Mohassel et al. [31] introduce an interactive algorithm that utilizes two servers to handle non-linear functions, but it requires continuous interaction between the client and the servers, leading to high communication costs. Chabanne et al. [9] modify the model for the prediction phase to address the non-linear activation function problem, but this approach results in a loss of precision in prediction and training results.

In [4], the authors design FHE-DiNN, a discretized neural network framework based on the third-generation TFHE [11] scheme, in which the output of each neuron is refreshed through bootstrapping, enabling homomorphic computation for networks of arbitrary depth. Unlike standard neural networks, FHE-DiNN restricts the propagated signals to integers and employs the sign function as the activation function to achieve scale invariance. FHE-DiNN exhibits fast computation but lower prediction accuracy. This work inspires us to ask whether SNN neurons, which naturally output binary 0/1 values, can also be homomorphically evaluated efficiently.

Spiking Neural Network. Compared to other neural network models, Spiking Neural Networks (SNNs) are generally more biologically interpretable. As the third generation of neural networks [29], SNNs have gained increasing attention due to their rich spatiotemporal neural dynamics, diverse coding mechanisms, and low-power advantages on neuromorphic chips.

In contrast to the prosperity of artificial neural networks (ANNs), the development of SNNs is still at an early stage. Current research on SNNs mainly focuses on five major directions: neuron models, training algorithms, programming frameworks, datasets, and hardware chips. To capture the dynamic characteristics of neuronal membrane potentials, neurophysiologists have constructed many models. These models are the basic units that make up spiking neural networks and determine the basic dynamic characteristics of the network. Among them, the most influential models include the Hodgkin-Huxley (H-H) model [24], the leaky integrate-and-fire (LIF) model [34], the Izhikevich model [25], and the spike response model (SRM) [27].

The training algorithms of SNNs can be divided into three main types: (1) gradient-free training algorithms represented by spike-timing-dependent plasticity (STDP) [26]; (2) direct conversion of ANNs to SNNs; (3) gradient-surrogate training algorithms represented by error back-propagation in the spatiotemporal domain. Bohte et al. [3] first propose a gradient descent learning algorithm applicable to multi-layer feed-forward spiking neural networks, called the SpikeProp learning algorithm. Recently, Wu et al. [34] propose the spatiotemporal back-propagation (STBP) method for the direct training of SNNs and significantly improve it to be compatible with much deeper structures, larger datasets, and better performance.

Considering the superior stability and lower energy consumption of SNNs in handling discrete data, it is reasonable to explore their integration with FHE. The advantage of FHE-DiSNN lies in its strong robustness to discretization. When converting traditional ANNs to homomorphic computation, discretizing the activation function is a challenging problem: whether approximating it with low-degree polynomials [23] or directly replacing it with the sign function (as in DiNN), accuracy is lost. An SNN naturally avoids this problem, since all its outputs are binary spike signals taking values in {0, 1}. This property also satisfies scale invariance, eliminating the need to consider the influence of computation depth when designing the discretization. Inspired by FHE-DiNN, we also provide discretization methods for linear neuron models such as LIF and IF and prove that the resulting discretization error is small.

Our Contribution. In this paper, we construct a novel framework called FHE-DiSNN with the following benefits:

  • We develop a low-accuracy-loss method to discretize an SNN into a DiSNN with controllable error.

  • We design new privacy-preserving functions, FHE-Fire and FHE-Reset, using TFHE bootstrapping technology, so that the resulting FHE-DiSNN constructed from the DiSNN supports an arbitrary number of operations.

  • We propose an easily extensible framework (SNN → DiSNN → FHE-DiSNN) that allows the prediction procedure of an SNN to be evaluated homomorphically.

Our experiments on the MNIST [13] dataset confirm the advantages of FHE-DiSNN. First, we train a fully connected SNN with a single hidden layer of 30 neurons. This SNN is built on the IF (Integrate-and-Fire) neuron model and implemented with the Spikingjelly [16] Python package. Then, we convert it to a DiSNN with the optimal parameters determined experimentally. The experiments show that DiSNN achieves a prediction accuracy of 95.3% on plaintext data, only a marginal decrease of 0.4% compared to the original SNN's accuracy of 95.7%. Finally, the accuracy of FHE-DiSNN is evaluated on ciphertexts using the TFHE library, resulting in an accuracy of 95.1%, a slight degradation (0.2% and 0.6%, respectively) compared to DiSNN (95.3%) and SNN (95.7%).

Outline of the paper. The paper is structured as follows. In Section 2, we provide definitions and explanations of SNN and TFHE, including a brief introduction to the bootstrapping process of TFHE. In Section 3, we present our method of constructing Discretized Spiking Neural Networks and prove that the discretization error can be controlled. In Section 4, we highlight the challenges of evaluating a DiSNN homomorphically and provide a detailed explanation of our proposed solution. In Section 5, we present comprehensive experimental results that verify the proposed framework. Finally, we discuss the challenges and possible future work in Section 6.

2 Preliminary Knowledge

In this section, we first present the training and prediction methods of the SNN model. We then give a concise introduction to the bootstrapping process in the TFHE scheme.

2.1 Spiking Neural Network

The typical structure of a neuron comprises three components: dendrites, the soma (cell body), and the axon. To capture the dynamic characteristics of the membrane potential during a neuron's operation, neurophysiologists have devised diverse models; these models constitute the basic building blocks of spiking neural networks and determine the network's fundamental dynamic properties.

The Hodgkin-Huxley (H-H) model provides a comprehensive and accurate description of the electrical activity of neurons, but it entails a complex and extensive system of dynamic equations with substantial computational demands, so simplified models remain practical and valuable, the most widely used being the Leaky Integrate-and-Fire (LIF) model. The LIF model simplifies the action-potential process significantly while retaining three key characteristics, leakage, accumulation, and threshold excitation, and is given below:

\Omega\frac{dV}{dt}=-V+I, (1)

where $\Omega=RC$ is a time constant, and $R$ and $C$ denote the membrane resistance and capacitance, respectively. Building upon the LIF model, there exist several variant models, including the QIF model [7], the EIF model [17], and the adaptive EIF model [6]. The IF model [1] is a further simplification of the LIF model, in which $\Omega=1$ and the $-V$ term in Equation 1 disappears, i.e., $\frac{dV}{dt}=I$.

In practical applications, it is common to utilize discrete difference equations as an approximation method for modeling the equations governing neuronal electrical activity. Although the specific accumulation equations for various neuronal membrane potentials may differ, the threshold excitation and reset equations for the membrane potential remain consistent. Consequently, the neuronal electrical activity can be simplified into three distinct stages: charging, firing, and resetting.

H[t] = V[t-1] + f(V[t-1], I[t]), (2)
S[t] = \textbf{Fire}\left(H[t] - V_{threshold}\right),
V[t] = \textbf{Reset}(H[t]) = \begin{cases} V_{reset}, & \text{if } H[t] \geq V_{threshold}, \\ H[t], & \text{if } V_{reset} \leq H[t] < V_{threshold}, \\ V_{reset}, & \text{if } H[t] < V_{reset}. \end{cases}

$\textbf{Fire}(\cdot)$ is a step function:

\textbf{Fire}(x)=\begin{cases}1, & \text{if } x \geq 0,\\ 0, & \text{if } x < 0.\end{cases} (3)

$I[t]$ (the subscript $i$ denoting the $i$-th neuron is omitted here, since we refer to an arbitrary neuron) represents the total membrane current from the external input of the pre-synaptic neurons. This term can be interpreted as the voltage increment and is calculated by the equation below:

I[t]=\sum_{j}w_{ij}S_{j}[t]. (4)

To avoid confusion, we use $H[t]$ to denote the membrane potential of the neuron after the charging phase and before spike initiation, while $V[t]$ denotes the membrane potential after spike initiation. The function $f(V[t-1], I[t])$ is the state-transition equation of the neuron; the differences between neuron models lie in the specific form of $f$.
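To make the charging-firing-resetting cycle of Equations 2-4 concrete, the following minimal Python sketch simulates one time step of a single IF neuron on plaintext data. All names and parameter values are illustrative; the experiments in Section 5 use the Spikingjelly implementation instead.

import numpy as np

def if_neuron_step(v_prev, spikes_in, weights, v_threshold=1.0, v_reset=0.0):
    # charge (Eq. 2 with f(V, I) = I for the IF model), using Eq. 4 for I[t]
    i_t = float(np.dot(weights, spikes_in))
    h_t = v_prev + i_t
    # fire (Eq. 3): emit a spike when the potential reaches the threshold
    s_t = 1.0 if h_t >= v_threshold else 0.0
    # reset: back to V_reset after a spike, and clamp from below at V_reset
    if h_t >= v_threshold or h_t < v_reset:
        v_t = v_reset
    else:
        v_t = h_t
    return s_t, v_t

# toy usage: three presynaptic spikes and random weights
rng = np.random.default_rng(0)
spike, v = if_neuron_step(0.2, np.array([1.0, 0.0, 1.0]),
                          rng.uniform(-0.5, 0.5, size=3))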

In practical applications, spike encoding methods are commonly used to transform image data into a binary input format suitable for an SNN. Poisson encoding is a common choice, in which the inputs are encoded into rate-based spikes by a $\lambda$-Poisson process. Additionally, due to the non-differentiable nature of the spike function, the conventional gradient-descent back-propagation algorithm of ANNs is not directly applicable, so alternative training approaches must be sought. Poisson encoding and the surrogate gradient method are used in this paper; other common methodologies for encoding data and training SNNs are detailed in Appendix A.

2.2 Programmable Bootstrapping

Let $N=2^{k}$ and let $p>1$ be an even integer. Let $Z_{p}=\{-\frac{p}{2}+1,\dots,\frac{p}{2}\}$ be the ring of integers modulo $p$. Let $X^{N}+1$ be the $(2N)$-th cyclotomic polynomial. Let $q$ be a prime and define $R_{q,N}=R/qR\equiv\mathbb{Z}_{q}[X]/(X^{N}+1)\equiv\mathbb{Z}[X]/(X^{N}+1,q)$, where $R=\mathbb{Z}[X]/(X^{N}+1)$; $R_{p,N}$ is defined similarly. Vectors are represented by lowercase bold letters, such as $\mathbf{a}$. The $i$-th entry of a vector $\mathbf{a}$ is denoted $a_{i}$. The inner product between vectors $\mathbf{a}$ and $\mathbf{b}$ is denoted $\langle\mathbf{a},\mathbf{b}\rangle$. A polynomial $m(X)$ in $R_{p,N}$ corresponds to a message vector of length $N$ over $Z_{p}$, and the ciphertext for $m(X)$ is a pair of polynomials in $R_{q,N}$. Detailed fully homomorphic encryption schemes are given in Appendix B.

When referring to a probability distribution, we write $d\sim\mathcal{D}$ to indicate that a value $d$ is drawn from the distribution $\mathcal{D}$.

Theorem 2.1.

(Programmable bootstrapping [12]) TFHE/FHEW bootstrapping supports the computation of any function $g: Z_{p}\rightarrow Z_{p}$ satisfying $g(v+\frac{p}{2})=-g(v)$. We refer to $g$ as the program function of the bootstrapping. An LWE ciphertext $LWE_{s}(m)=(\mathbf{a},b)$, where $m\in Z_{p}$, $\mathbf{a}\in Z_{p}^{N}$ and $b\in Z_{p}$, can be bootstrapped into $LWE_{s}(g(m))$ with very low noise.

This process relies on the Homomorphic Accumulator [30], denoted $ACC_{g}$. Using the notation of [30], the bootstrapping process can be broken down into the following steps:

-Initialize: Set the initial polynomial:

ACC_{g}[-b]=X^{-b}\cdot\sum_{i=0}^{N-1}g\left(\left\lfloor\frac{i\cdot p}{2N}\right\rfloor\right)X^{i}\bmod (X^{N}+1). (5)

-Blind Rotation: $ACC_{g}\stackrel{+}{\leftarrow}-a_{i}\cdot ek_{i}$ modifies the content of the accumulator from $ACC_{g}[-b]$ to $ACC_{g}[-b+\sum a_{i}s_{i}]=ACC_{g}[-m-e]$, where

\text{ek}=\left(RGSW\left(X^{s_{1}}\right),\ldots,RGSW\left(X^{s_{n}}\right)\right),

which is a list of key material over $R_{q,N}$.

-Sample Extraction: $ACC_{g}=(a(X),b(X))$ is an RLWE ciphertext with component polynomials $a(X)=\sum_{0\leq i\leq N-1}a_{i}X^{i}$ and $b(X)=\sum_{0\leq i\leq N-1}b_{i}X^{i}$. The extraction operation outputs the LWE ciphertext:

RLWE_{z}\stackrel{\text{Sample Extraction}}{\longrightarrow}LWE_{z}(g(m))=(\mathbf{a},b_{0}),

where $\mathbf{a}=(a_{0},\ldots,a_{N-1})$ is the coefficient vector of $a(X)$, and $b_{0}$ is the constant coefficient of $b(X)$.

-Key Switching: Key switching transforms the LWE instance's key from the vector $\mathbf{z}$ back to the original vector $\mathbf{s}$ without changing the plaintext message $m$:

LWE_{\mathbf{z}}(g(m))\stackrel{\text{Key Switching}}{\longrightarrow}LWE_{\mathbf{s}}(g(m)).

Taking a bootstrapping key and a key switching key as input, bootstrapping can be defined as:

\text{bootstrapping}=\textbf{KeySwitch}\circ\textbf{Extract}\circ\textbf{BlindRotate}\circ\textbf{Initialize}. (6)

With program function $g$, bootstrapping takes the ciphertext $LWE_{s}(m)$ as input and outputs $LWE_{s}(g(m))$ under the original secret key $s$:

\text{bootstrapping}(LWE_{s}(m))=LWE_{s}(g(m)). (7)

This property will be used extensively in what follows. Since bootstrapping does not alter the secret key, we write $LWE(m)$ for an LWE ciphertext in the rest of the paper.
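As a plaintext illustration of Theorem 2.1, the following sketch models the functional effect of programmable bootstrapping as a lookup over $Z_{p}$ whose second half is forced by the negacyclic constraint $g(v+\frac{p}{2})=-g(v)$. It involves no encryption or noise, and all names are our own, not the TFHE library API.

P = 1024  # message modulus p; Z_p is taken as {-p/2+1, ..., p/2}

def to_zp(x, p=P):
    # reduce an integer to the centered representatives of Z_p
    x = x % p
    return x - p if x > p // 2 else x

def make_program_function(g_half, p=P):
    # extend g, given on [0, p/2), to all of Z_p via g(v + p/2) = -g(v)
    def g(v):
        v = to_zp(v, p)
        return g_half(v) if 0 <= v < p // 2 else -g_half(to_zp(v + p // 2, p))
    return g

def mock_bootstrap(m, g):
    # plaintext stand-in for bootstrapping(LWE(m)) = LWE(g(m))
    return to_zp(g(m))

sign_like = make_program_function(lambda v: 1)   # g = 1 on [0, p/2), -1 elsewhere
assert mock_bootstrap(37, sign_like) == 1 and mock_bootstrap(-5, sign_like) == -1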

3 Discretized Spiking Neural Network

This section has two parts. First, we present a simple discretization method to convert SNNs into Discretized Spiking Neural Networks (DiSNNs). We show that this method guarantees controllable errors for both the IF and the LIF neuron models, and we provide estimates of the extreme values generated by these two discretized models, which can be used to determine the size of the plaintext space. Second, we propose an efficient method for computing the Fire and Reset functions of the SNN neuron model on ciphertexts, denoted FHE-Fire and FHE-Reset.

Definition 1.

A Discretized Spiking Neural Network (DiSNN) is a feed-forward spiking neural network in which all weights, as well as the inputs and outputs of the neuron model, are discretized into a finite set $Z_{p}$.

We denote this discretization method as the function:

\hat{x}\triangleq\text{Discret}(x,\tau)=\lfloor x\cdot\tau\rceil, (8)

where $\hat{x}$ is the value of $x$ after discretization. The precision of the discretization is controlled by $\tau$, with a larger $\tau$ resulting in a finer discretization. Equation 2 can then be discretized as follows (the subscript $i$ is omitted, as in Equation 4):

\hat{I}[t] = \sum_{j}\hat{w}_{ij}S_{j}[t], (9)
\hat{H}[t] = \hat{V}[t-1]+f(\hat{V}[t-1],\hat{I}[t]),
S[t] = \textbf{Fire}\left(\hat{H}[t]-\hat{V}_{threshold}\right),
\hat{V}[t] = \textbf{Reset}(\hat{H}[t])=\begin{cases}\hat{V}_{reset}, & \text{if } \hat{H}[t]\geq\hat{V}_{threshold},\\ \hat{H}[t], & \text{if } \hat{V}_{reset}\leq\hat{H}[t]<\hat{V}_{threshold},\\ \hat{V}_{reset}, & \text{if } \hat{H}[t]\leq\hat{V}_{reset}.\end{cases}

This system of equations clearly shows the advantage of SNNs with respect to discretization. The binary spike signals, taking values 0 and 1, not only avoid any loss from discretizing the outputs themselves but also keep the error caused by the discretized weights under control. The two crucial parameters of the SNN, $V_{threshold}$ and $V_{reset}$, are generally set to integers, so they introduce no discretization error. In fact, the only quantity that requires attention is the discretization of the weights. An upper bound on the discretization error is given in the assertion below.
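As a small illustration of Equation 8, the sketch below (with hypothetical shapes and a toy weight matrix) discretizes trained weights and the IF parameters; as noted above, integer thresholds incur no rounding error.

import numpy as np

def discretize(x, tau):
    # Discret(x, tau) = round(x * tau) to the nearest integer (Eq. 8)
    return np.rint(np.asarray(x) * tau).astype(np.int64)

tau = 10
w = np.random.default_rng(1).normal(0.0, 0.1, size=(30, 784))  # toy trained weights
w_hat = discretize(w, tau)              # integer weights used by the DiSNN
v_threshold_hat = discretize(1.0, tau)  # integer threshold: no rounding error
v_reset_hat = discretize(0.0, tau)      # integer reset value: no rounding error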

Proposition 1.

For the IF and LIF neuron models, the discretization error is independent of the scaling factor $\tau$ and depends only on the number of spikes.

Proof.

For the IF and LIF neuron models, let the linear function $f$ denote their charging processes. We have

\tau f(V[t-1],I[t])=f(\tau V[t-1],\tau I[t])=f(\hat{V}[t-1],\hat{I}[t]).

This means that the discretization error is concentrated entirely in $\hat{I}[t]$:

\max_{i}|\hat{I}_{i}[t]-\tau I_{i}[t]| = \max_{i}\Big|\sum_{j}(\tau w_{ij}-\hat{w}_{ij})S_{j}[t]\Big|
\leq \max_{i,j}|\tau w_{ij}-\hat{w}_{ij}|\cdot\Big|\sum_{j}S_{j}[t]\Big|
\leq \frac{1}{2}\cdot\Big|\sum_{j}S_{j}[t]\Big|.

As shown above, the discretization error is independent of $\tau$ and proportional to the number of spikes. ∎

Proposition 1 provides an upper bound on the overall discretization error, where $\frac{1}{2}$ is the maximum discretization error of an individual weight. In practice, however, not all weights reach the maximum error, and from the perspective of mathematical expectation the discretization error is smaller, as the following Proposition shows.

Proposition 2.

For the IF and LIF neuron models, assuming the weights follow a uniform distribution on $[-\frac{1}{2},\frac{1}{2}]$ and the number of spikes follows a Poisson distribution with intensity $\lambda$, the mathematical expectation of the discretization error is $\lambda/4$.

Proof.

Denote the random variable $\tau w_{ij}-\hat{w}_{ij}$ by $\xi_{j}(\omega)$; it follows a uniform distribution on the interval $[-\frac{1}{2},\frac{1}{2}]$. Set $N(\omega)=\sum_{j}S_{j}[t]$, which is a Poisson random variable with intensity $\lambda$. Note that $\mathbb{E}(|\xi_{i}|)=\frac{1}{4}$, $\mathbb{P}(N=n)=e^{-\lambda}\frac{\lambda^{n}}{n!}$ and $\sum_{n=0}^{\infty}\mathbb{P}(N=n)=1$. The expectation of the error can then be written as follows:

\mathbb{E}|\hat{I}[t]-\tau I[t]| \approx \mathbb{E}\Big(\sum_{j=0}^{N(\omega)}|\xi_{j}|\Big)
= \sum_{n=0}^{\infty}\mathbb{E}\Big(\sum_{i=0}^{n}|\xi_{i}|\;\Big|\;N(\omega)=n\Big)\cdot\mathbb{P}(N=n)
= \sum_{n=0}^{\infty}\frac{n}{4}\cdot e^{-\lambda}\frac{\lambda^{n}}{n!}=\frac{\lambda}{4}.

Note that in the proof, $\mathbb{E}(|\xi_{i}|\mid N(\omega)=n)=\mathbb{E}(|\xi_{i}|)$ follows from the independence of $\xi_{i}(\omega)$ and $N(\omega)$. ∎
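The expectation in Proposition 2 is easy to check numerically. The following hedged sketch, with an arbitrarily chosen $\lambda$, simulates uniform rounding errors and Poisson spike counts and compares the empirical mean error with $\lambda/4$.

import numpy as np

rng = np.random.default_rng(42)
lam, trials = 20.0, 100_000

n_spikes = rng.poisson(lam, size=trials)            # N ~ Poisson(lambda)
errors = np.array([np.abs(rng.uniform(-0.5, 0.5, n)).sum() for n in n_spikes])

print(errors.mean(), lam / 4)   # the empirical mean should be close to lambda/4 = 5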

We obtain a conclusion similar to Proposition 1: it is the number of spikes, not the parameter $\tau$, that determines the magnitude of the error. Although the Proposition above indicates that the size of $\tau$ does not affect the growth of the error, we cannot increase $\tau$ indefinitely to improve accuracy, because a larger $\tau$ implies a larger message space, which places a heavier computational burden on the homomorphic encryption. In Proposition 3 we specify the relationship between them.

Proposition 3.

For the IF and LIF models, the maximum and minimum values generated during the computation are controlled by $\tau$, the number of spikes, and the extreme values of the weights.

Proof.

From Equation 2, for the IF model, the range of the membrane potential $V[t]$ is controlled by the Reset function of the neuron model and bounded within $[V_{reset},V_{threshold}]$. The extreme values can therefore only occur in the variable $\hat{H}$, and they satisfy the following inequalities:

Max \triangleq \max(\hat{H}[t])=\max(\hat{V}[t]+\hat{I}[t])
\leq \tau\Big(V_{threshold}+\max_{i,j}|w_{ij}|\cdot\sum_{j}|S_{j}[t]|\Big),
Min \triangleq \min(\hat{H}[t]) \geq -|\hat{V}_{reset}|-|\hat{I}[t]|
\geq -\tau\Big(V_{reset}+\max_{i,j}|w_{ij}|\cdot\sum_{j}|S_{j}[t]|\Big).

The LIF model can be treated in a similar way. We denote the upper and lower bounds by $\alpha$ and $\beta$, respectively. ∎

Corollary 1.

In general, when the neuron model has $V_{reset}=0$, the relation between the maximum and minimum values is given by:

\beta=-\alpha+\hat{V}_{threshold}, (10)

where $\alpha$ and $\beta$ are the upper and lower bounds of the DiSNN from Proposition 3.

This seemingly trivial but highly important conclusion ensures the correctness of homomorphic evaluation. We will encounter it in subsequent sections.

(a) Finite field used in FHE
(b) The prediction procedure of SNN
Figure 1: (a) The circle represents the message space $Z_{p}$, while the numbers on the circumference represent the finite field of the ciphertext space, conventionally denoted $Z_{q}$. The mapping between $Z_{p}$ and $Z_{q}$ is reflected in the partitioning of the circle. Operations on both plaintexts and ciphertexts follow the rules of finite fields: values exceeding the bounds undergo modular reduction, wrapping around to the opposite end. (b) The output of an SNN corresponds to the firing frequency of the output layer within a specific time window, where the magnitude of the firing rate reflects the response strength towards a particular category. Thus, the network runs for a designated duration, using the average firing rate after $T$ time steps as the classification score. During each time step, the image sequentially traverses the Poisson layer, the SNN hidden layer, and the SNN output layer.

Figure 1(a) illustrates the message space $Z_{p}$, which forms a cyclic group under addition. As on a circle, if a number exceeds $p/2$ it wraps around to the opposite end, and likewise for values below $-\frac{p}{2}+1$. Since DiSNN is defined over $Z_{p}$, all intermediate values during its computation must fit in $Z_{p}$; exceeding the boundaries results in computational errors. This means the inequality $-\frac{p}{2}+1\leq\beta<\alpha<\frac{p}{2}$ must be satisfied. However, a large $p$ decreases the computational efficiency of homomorphic encryption. Therefore, selecting a $\tau$ that strikes a balance between computational efficiency and discretization accuracy is a crucial consideration.

4 Homomorphic Evaluation of DiSNN

This section provides a detailed analysis of how DiSNN performs prediction on homomorphically encrypted images. As shown in the prediction procedure of Figure 1(b), all operations performed on ciphertexts can be summarized as Fire, Reset, and Multisum (scalar multiplication and summation). Poisson encoding merely involves comparing magnitudes, which can be computed with the Fire function. Multisum is naturally supported in FHE, so the challenge lies in computing the Reset and Fire functions on ciphertexts, since they are non-polynomial. We leverage the programmable bootstrapping technique introduced by Chillotti et al. [12] to evaluate the Fire and Reset functions of the SNN model while simultaneously refreshing the noise of the ciphertext.

4.1 Homomorphic Computation of Multisum

We select a neuron from the SNN layer; its input is computed as in Equation 11, which is correct as long as the noise carried by the ciphertext does not exceed the noise bound discussed in the following Remark.

\sum_{j}\hat{w}_{ij}LWE(S_{j}[t]) = \sum_{j}LWE(\hat{w}_{ij}\cdot S_{j}[t]) (11)
= LWE\Big(\sum_{j}\hat{w}_{ij}S_{j}[t]\Big)
= LWE(\hat{I}_{i}[t]).
Remark 1.

Multiplication and addition on ciphertexts amplify the noise they carry. To ensure the correctness of the above computation, which amounts to

Dec\Big(\sum_{j}\hat{w}_{ij}LWE(S_{j}[t])\Big)=Dec(LWE(\hat{I}_{i}[t])),

two conditions need to be satisfied: (1) $\sum_{j}\hat{w}_{ij}S_{j}[t]\in[-\frac{p}{2},\frac{p}{2})$; (2) the noise does not grow beyond the noise bound. The first condition is easy to satisfy by choosing a sufficiently large message space $Z_{p}$. For the noise issue, assume that $LWE(S_{j}[t])$ has initial noise $\sigma$ (as each spike is generated via bootstrapping). After the multiplication and addition operations, the noise in the ciphertext grows to $|\sum_{j}\hat{w}_{ij}|\cdot\sigma$, which is proportional to the discretization parameter $\tau$. One way to control the noise growth is to decrease $\tau$, which may reduce accuracy. Another is to trade off the security level by reducing the initial noise $\sigma$, where increasing the dimension $n$ of the LWE problem can remedy the situation [2].

4.2 Homomorphic Computation of Fire Function

The Fire function is non-polynomial, so we rely on Theorem 2.1 to evaluate it and refresh the ciphertext noise simultaneously. We implement the Fire function on ciphertexts as the FHE-Fire function, realized as:

\textbf{FHE-Fire}(LWE(m)) \triangleq \text{bootstrap}(LWE(m))+1 (12)
= \begin{cases}LWE(2), & \text{if } m\in[0,\frac{p}{2}),\\ LWE(0), & \text{if } m\in[-\frac{p}{2},0),\end{cases}
= LWE(2\cdot Spike),

by defining the program function $g$ of bootstrapping as:

g(m)\triangleq\begin{cases}1, & \text{if } m\in[0,\frac{p}{2}),\\ -1, & \text{if } m\in[-\frac{p}{2},0).\end{cases} (13)

In this case, the spike signal is mapped to $\{0,2\}$, doubling its original value. This slightly complicates the subsequent fully connected layer's computation, but it is easily overcome: since the spike signal is doubled, we keep the final result consistent by halving the weights, as the following equation shows:

LWE(\hat{I}) = \sum_{j}\hat{w}_{ij}\cdot LWE(S_{j}) \approx \sum_{j}\Big\lfloor\frac{\hat{w}_{ij}}{2}\Big\rfloor\cdot LWE(2\cdot S_{j}) (14)
\approx \sum_{j}\text{Discret}\Big(w_{ij},\frac{\tau}{2}\Big)\cdot LWE(2\cdot S_{j}).

This approach also helps control the ciphertext noise, since the discretization parameter is halved to $\tau/2$. Because the ciphertext $LWE(2\cdot S_{j})$ obtained through bootstrapping has very low initial noise $\sigma$, our method roughly halves the noise growth, which can be shown easily using Remark 1. This allows lower noise growth and more comfortable parameter choices.
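Continuing the plaintext mock from Section 2.2, FHE-Fire in Equation 12 amounts to bootstrapping with the sign-like program function of Equation 13 and adding 1, so spikes come out as 0 or 2 and the next layer discretizes its weights with $\tau/2$ (Equation 14). A sketch under these assumptions (no real ciphertexts, names are ours):

def mock_fhe_fire(m, p=1024):
    # plaintext model of Eq. 12: bootstrap with g of Eq. 13, then add 1
    g = 1 if 0 <= (m % p) < p // 2 else -1
    return g + 1                           # spike encoded as 0 or 2

def multisum_with_doubled_spikes(weights, doubled_spikes, tau):
    # Eq. 14: halve the discretization parameter to compensate for the doubling
    w_half = [round(w * tau / 2) for w in weights]   # Discret(w, tau/2)
    return sum(wh * s for wh, s in zip(w_half, doubled_spikes))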

4.3 Homomorphic Computation of Reset Function

The Reset function describes two characteristics of the neuron model's membrane potential. First, when the membrane potential exceeds $V_{threshold}$, a spike is emitted and the membrane potential returns to $V_{reset}$. Second, the membrane potential cannot be lower than $V_{reset}$; if such a value is produced during the computation, it is also set to $V_{reset}$.

For convenience, $V_{reset}$ is often set to 0; a non-zero $V_{reset}$ can be shifted to 0 by translation. Similarly to the Fire function, we set the program function $g$ of bootstrapping as:

g(m)\triangleq\begin{cases}0, & \text{if } m\in[\hat{V}_{threshold},\frac{p}{2}),\\ m, & \text{if } m\in[0,\hat{V}_{threshold}),\\ 0, & \text{if } m\in[\hat{V}_{threshold}-\frac{p}{2},0),\\ -(m+\frac{p}{2}), & \text{if } m\in[-\frac{p}{2},\hat{V}_{threshold}-\frac{p}{2}),\end{cases} (15)

where $g(x)=-g(x+\frac{p}{2})$ must be satisfied. The FHE-Reset function can then be computed as follows:

\textbf{FHE-Reset}(LWE(m)) \triangleq \text{bootstrap}(LWE(m)) (16)
= \begin{cases}LWE(0), & m\in[\hat{V}_{threshold},\frac{p}{2}),\\ LWE(m), & m\in[0,\hat{V}_{threshold}),\\ LWE(0), & m\in[\hat{V}_{threshold}-\frac{p}{2},0),\\ LWE(-(m+\frac{p}{2})), & m\in[-\frac{p}{2},\hat{V}_{threshold}-\frac{p}{2}).\end{cases}

Note that $g$ does not match the Reset function on the interval $[-\frac{p}{2},\hat{V}_{threshold}-\frac{p}{2})$, which would lead to computation errors. Therefore, the computation must stay within the interval $[\hat{V}_{threshold}-\frac{p}{2},\frac{p}{2})$, i.e., $\hat{V}_{threshold}-\frac{p}{2}\leq\beta\leq\alpha<\frac{p}{2}$ by Proposition 3. Using Equation 10, this condition simplifies to $\alpha<\frac{p}{2}$. It serves as a necessary requirement when selecting the message space, ensuring that intermediate values fall within the correct range without additional conditions.

Moreover, the FHE-Reset function not only computes the Reset function on the ciphertext but also refreshes the noise accumulated during the computation of $LWE(\hat{V}[t])$. This means we do not need to worry about noise, and an arbitrary number of computations can be supported.
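Likewise, FHE-Reset is bootstrapping with the program function of Equation 15. The plaintext sketch below (ours) implements $g$ on the centered representatives of $Z_{p}$ and checks the required identity $g(m)=-g(m+\frac{p}{2})$:

def center(x, p=1024):
    # reduce an integer to the half-open interval [-p/2, p/2)
    x = x % p
    return x - p if x >= p // 2 else x

def reset_program(m, v_th_hat, p=1024):
    # program function g of Eq. 15, evaluated on [-p/2, p/2)
    m = center(m, p)
    if v_th_hat <= m:                # [V_th, p/2): a spike fired, reset to 0
        return 0
    if 0 <= m:                       # [0, V_th): keep the membrane potential
        return m
    if v_th_hat - p // 2 <= m:       # [V_th - p/2, 0): clamp to V_reset = 0
        return 0
    return -(m + p // 2)             # [-p/2, V_th - p/2): negacyclic filler

# sanity check of the required identity g(m) = -g(m + p/2)
assert all(reset_program(m, 10) == -reset_program(m + 512, 10)
           for m in range(-512, 512))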

5 Experiment

In this section, we build a practical FHE-DiSNN network and demonstrate its advantages and limitations experimentally by comparing it with the original SNN. The section has three parts. First, we determine the neuron model and network depth and train a convergent SNN. Second, we convert the well-trained SNN into a DiSNN. Finally, we evaluate the accuracy and efficiency of FHE-DiSNN when performing forward propagation on encrypted images, providing insight into its performance in a secure, encrypted environment.

5.1 Building an SNN in the clear

We select a 784 ($28\times 28$)-dimensional Poisson encoding layer as the input layer, and 30-dimensional and 10-dimensional layers of IF neurons as the hidden layer and output layer, respectively. We use the Spikingjelly [16] library, a PyTorch-based framework for SNNs. Training SNNs is an active research direction; in this study the surrogate gradient method is chosen, while other common training methods such as ANN-to-SNN conversion and unsupervised training with STDP are not discussed here. In general, the network needs to run for a period of time, taking the average firing rate over $T$ time steps as the basis for classification.

The prediction process is essentially a forward propagation of the trained model, with the key difference that gradient approximation is not required. The predictive accuracy improves as $T$ increases, but with diminishing returns, while the time consumption grows linearly. Hence, to maintain accuracy at a reasonable cost, it is vital to keep $T$ as small as possible. Since bootstrapping is the most time-consuming operation in encrypted computation, we estimate the number of bootstrapping operations in FHE-DiSNN.

Proposition 4.

For a single-hidden-layer FHE-DiSNN network with an $n$-dimensional input, a $k$-dimensional hidden layer, an $m$-dimensional output, and simulation time $T$, the number of required bootstrapping operations is $(n+2k+2m)T$.

Proof.

In FHE-DiSNN, each Poisson encoding requires one FHE-Fire call, while the firing and resetting of each neuron require one FHE-Fire call and one FHE-Reset call, respectively. In a single simulation step there are $n$ Poisson encodings and $m+k$ FHE-Fire and FHE-Reset evaluations, for a total of $(n+2m+2k)$ bootstrappings. With $T$ repeated simulation steps, the total number of bootstrappings is:

\text{nums}=(n+2k+2m)\cdot T. ∎

We can reduce the number of bootstrappings in two ways to improve efficiency. First, we can encrypt the message after Poisson encoding, which eliminates $nT$ bootstrappings. Second, we can shorten the simulation time $T$. The curve in Figure 2(a) shows that the optimal trade-off is achieved at $T=10$, where the accuracy is comparable to the highest level achieved at $T=20$.
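As a concrete instance of Proposition 4, the network of Section 5.1 has $n=784$, $k=30$, $m=10$, so a full prediction with $T=10$ would require $(784+60+20)\cdot 10 = 8640$ bootstrappings; encrypting after Poisson encoding removes the $nT=7840$ encoding bootstrappings and leaves $(2k+2m)T=800$ per image.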

(a) $T$-accuracy in SNN
(b) $\tau$-accuracy in DiSNN
Figure 2: (a) The curve (orange) illustrates the influence of simulation time $T$ on the prediction accuracy of the original SNN: by $T=10$ the SNN reaches an accuracy of 95.3%, and at $T=20$ it has essentially reached its maximum accuracy. (b) The graph illustrates the correlation between $\tau$ and the prediction accuracy of DiSNN. The light blue and lavender shading represent the maximum and minimum fluctuation intervals over five independent experiments for $T=20$ and $T=10$, respectively, while the blue and purple lines represent the average results.

5.2 Constructing an FHE-DiSNN from SNN

The accuracy of DiSNN improves as $\tau$ increases, but again with diminishing returns. Moreover, increasing $\tau$ leads to linear growth of the noise, raising the computational cost, so selecting an appropriate $\tau$ is crucial. Following the design of the FHE-DiSNN algorithm, we experimentally examine the relationship between $\tau$ and the prediction accuracy of DiSNN for $T=10$ and $T=20$. The curves are plotted in Figure 2(b); the results indicate that $\tau=10$ achieves the best trade-off and $\tau=20$ the highest accuracy.

The next step is to choose an appropriate size $p$ for the message space. According to Equation 10, the requirement simplifies to $\alpha<\frac{p}{2}$. The maximum value depends on $\max_{i,j}(\sum_{j}w_{ij}S_{j}[t])$, which is fixed for a well-trained network and can be determined in advance. In our experiments, $\alpha\approx 50\tau$ for the first layer and approximately $10\tau$ for the second layer. A message space of size $p=1024$ is enough to accommodate the DiSNN with $\tau=10$, and $p=2048$ with $\tau=20$. We select the STD128 parameter set [30], as follows:
- Size of message space: $p=1024$ or $2048$
- Dimension of LWE ciphertexts: $n=512$
- Degree of the polynomials in the ring: $N=1024$
- Bits of the decomposition basis of key switching: $B_{ks}=14$
- Bits of the decomposition basis for TGSW ciphertexts: $B_{g}=7$

5.3 Exhibiting experiment result

We conduct the following experimental process on an Intel Core i7-7700HQ CPU @ 2.80 GHz:
1. The grayscale handwritten digit image is encoded by the Poisson encoding layer.
2. The Poisson-encoded image is encrypted into LWE ciphertexts.
3. The ciphertexts are multiplied by the discretized plaintext weights and passed into the SNN layer.
4. The IF neurons in the SNN layer evaluate the charging, firing, and reset procedures on the ciphertexts; the bootstrapping operations involved are accelerated with FFT.
5. Steps 1-4 are repeated $T$ times, and the resulting outputs are accumulated as classification scores.
6. The scores are decrypted, and the highest one is selected as the classification result.
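A compact plaintext mock of Steps 1-6 is sketched below, with encryption omitted and the Fire/Reset bootstrappings replaced by their plaintext effects; the function and variable names are ours and do not correspond to the TFHE library API.

import numpy as np

def encrypted_predict_mock(img, w1_hat, w2_hat, v_th_hat, T=10, rng=None):
    # img: normalized 784-vector; w1_hat, w2_hat: discretized weight matrices
    rng = rng if rng is not None else np.random.default_rng()
    v1 = np.zeros(w1_hat.shape[0]); v2 = np.zeros(w2_hat.shape[0])
    scores = np.zeros(w2_hat.shape[0])
    for _ in range(T):
        x = (img > rng.uniform(size=img.shape)).astype(float)   # Step 1: Poisson
        h1 = v1 + w1_hat @ x                                     # Steps 2-3: multisum
        s1 = (h1 >= v_th_hat).astype(float)                      # Step 4: Fire
        v1 = np.clip(np.where(s1 > 0, 0.0, h1), 0.0, None)       #         Reset
        h2 = v2 + w2_hat @ s1
        s2 = (h2 >= v_th_hat).astype(float)
        v2 = np.clip(np.where(s2 > 0, 0.0, h2), 0.0, None)
        scores += s2                                             # Step 5: accumulate
    return int(np.argmax(scores))                                # Step 6: argmax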
In Table 1, we show the experimental results, with the Fire and Reset functions in Step 4 implemented by the method of Section 4.

(T, τ) | FHE-DiSNN | DiSNN | SNN | Time per step | Time per image
(10, 10) | 94.40% | 95.00% | 95.30% | 0.83 s | 8.31 s
(10, 20) | 94.40% | 95.00% | 95.30% | 0.86 s | 8.67 s
(20, 10) | 94.80% | 95.10% | 95.70% | 0.81 s | 16.21 s
(20, 20) | 95.10% | 95.30% | 95.70% | 0.79 s | 15.97 s
 | FHE-DiNN | DiNN | NN | Time per step | Time per image
30 neurons | 93.46% | 93.55% | 94.46% | 0.49 s | 0.49 s
Table 1: Experimental results. The table presents the results of experiments with four parameter sets. The first three columns give the prediction accuracy of FHE-DiSNN, DiSNN, and SNN, respectively; for the original SNN, only the parameter $T$ influences prediction performance. The fourth and fifth columns give the time consumed by FHE-DiSNN for a single time step and for a complete prediction, which, as stated in Proposition 4, is directly proportional to the simulation time $T$. The last two rows excerpt the experimental results of FHE-DiNN [4] for comparison; the SNN and NN have the same structure.

The results confirm, as emphasized at the beginning of the article, that discretization has minimal impact on SNNs. The original SNN, with 30 hidden neurons, achieves a stable accuracy of about 95.7%, outperforming many second-generation neural networks. DiSNN also achieves a commendable accuracy of 95.3% (with the best parameters), a loss of only 0.4% compared to the original SNN, showcasing the inherent advantage of DiSNN. Furthermore, FHE-DiSNN performs impressively, consistently surpassing 94% accuracy across the four parameter sets; in particular, the (20, 20) parameter set matches DiSNN closely. However, FHE-DiSNN is time-inefficient, because the number of bootstrapping operations grows with the simulation time $T$: each prediction takes about 8 (or 16) seconds, with roughly 0.8 seconds per simulation step.

During the experiments, we observe that the number of spike firings sometimes differs between FHE-DiSNN and DiSNN, suggesting that certain ciphertexts experience noise overflow. However, this has minimal effect on the final classification results: slight noise overflow only causes a deviation of $\pm 1$, and abnormal spike firings occur only when the value lies at the edge of $\hat{V}_{threshold}$, which happens with small probability. In addition, individual abnormal spike firings are effectively averaged out over the $T$ simulation steps. This indicates that FHE-DiSNN is fairly tolerant of noise, which is an intriguing experimental finding.

6 Conclusion

This paper is an initial exploration of the fusion of Spiking Neural Networks (SNNs) with homomorphic encryption and presents a research avenue with great potential. The combination benefits us both in terms of low energy consumption on the machine learning side and data privacy from a security point of view. We give an upper bound for the discretization error of DiSNN and show that its expected value is $\lambda/4$ from a mathematical expectation perspective, which the experimental results further validate. Furthermore, we leverage TFHE bootstrapping to construct the FHE-Fire and FHE-Reset functions, enabling computations of unlimited depth. Our proposed framework is also easy to scale and extend to more complicated neuron models.

However, challenging tasks remain for future research, such as more complex neuron equations, different encoding methods, and parallel computing. Moreover, as highlighted by Proposition 4, Poisson encoding introduces numerous bootstrapping operations (equal to the dimension of the input data), which leads to a high evaluation time; addressing this is also one of our future directions.

References

  • [1] Abbott, L.F.: Lapicque’s introduction of the integrate-and-fire model neuron (1907). Brain Research Bulletin 50, 303–304 (1999)
  • [2] Albrecht, M.R., Player, R., Scott, S.: On the concrete hardness of learning with errors
  • [3] Bohté, S.M., Kok, J.N., Poutré, H.L.: Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing 48, 17–37 (2000)
  • [4] Bourse, F., Minelli, M., Minihold, M., Paillier, P.: Fast homomorphic evaluation of deep discretized neural networks. In: Shacham, H., Boldyreva, A. (eds.) Advances in Cryptology – CRYPTO 2018. Lecture Notes in Computer Science, Cham
  • [5] Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (leveled) fully homomorphic encryption without bootstrapping. ACM Transactions on Computation Theory 6(3)
  • [6] Brette, R., Gerstner, W.: Adaptive exponential integrate-and-fire model as an effective description of neuronal activity. Journal of Neurophysiology 94(5)
  • [7] Brunel, N., Latham, P.E.: Firing rate of the noisy quadratic integrate-and-fire neuron. Neural Computation 15(10)
  • [8] Case, B.M., Gao, S., Hu, G., Xu, Q.: Fully homomorphic encryption with k-bit arithmetic operations. Cryptology ePrint Archive
  • [9] Chabanne, H., de Wargny, A., Milgram, J., Morel, C., Prouff, E.: Privacy-preserving classification on deep neural network
  • [10] Cheon, J.H., Kim, A., Kim, M., Song, Y.: Homomorphic encryption for arithmetic of approximate numbers. In: Takagi, T., Peyrin, T. (eds.) Advances in Cryptology – ASIACRYPT 2017. Lecture Notes in Computer Science, Cham
  • [11] Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: Tfhe: Fast fully homomorphic encryption over the torus. Journal of Cryptology 33(1)
  • [12] Chillotti, I., Ligier, D., Orfila, J.B., Tap, S.: Improved programmable bootstrapping with larger precision and efficient arithmetic circuits for tfhe. In: Tibouchi, M., Wang, H. (eds.) Advances in Cryptology – ASIACRYPT 2021. Lecture Notes in Computer Science, Cham
  • [13] Deng, L.: The mnist database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine 29(6)
  • [14] Ducas, L., Micciancio, D.: Fhew: Bootstrapping homomorphic encryption in less than a second. In: Oswald, E., Fischlin, M. (eds.) Advances in Cryptology – EUROCRYPT 2015. Lecture Notes in Computer Science, Berlin, Heidelberg
  • [15] Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption
  • [16] Fang, W., Chen, Y., Ding, J., Chen, D., Yu, Z., Zhou, H., Tian, Y., other contributors: Spikingjelly. https://github.com/fangwei123456/spikingjelly (2020), accessed: YYYY-MM-DD
  • [17] Fourcaud-Trocmé, N., Hansel, D., van Vreeswijk, C., Brunel, N.: How spike generation mechanisms determine the neuronal response to fluctuating inputs. Journal of Neuroscience 23(37)
  • [18] Gao, S.: Efficient fully homomorphic encryption scheme. Cryptology ePrint Archive
  • [19] Gentry, C.: Computing arbitrary functions of encrypted data. Communications of the ACM 53(3)
  • [20] Gentry, C.: A Fully Homomorphic Encryption Scheme. Ph.D. thesis, Stanford University, Stanford, CA, USA
  • [21] Gentry, C., Sahai, A., Waters, B.: Homomorphic encryption from learning with errors: Conceptually-simpler, asymptotically-faster, attribute-based. In: Canetti, R., Garay, J.A. (eds.) Advances in Cryptology – CRYPTO 2013. Lecture Notes in Computer Science, Berlin, Heidelberg
  • [22] Georgopoulos, A.P., Schwartz, A.B., Kettner, R.E.: Neuronal population coding of movement direction. Science 233 4771, 1416–9 (1986)
  • [23] Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M., Wernsing, J.: Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy. In: Proceedings of The 33rd International Conference on Machine Learning
  • [24] Hodgkin, A.L., Huxley, A.F.: A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of physiology 117 4, 500–44 (1952)
  • [25] Izhikevich, E.M.: Simple model of spiking neurons. IEEE transactions on neural networks 14 6, 1569–72 (2003)
  • [26] Izhikevich, E.M.: Solving the distal reward problem through linkage of stdp and dopamine signaling. BMC Neuroscience 8, S15 – S15 (2007)
  • [27] Jolivet, R., J., T., Gerstner, W.: The spike response model: A framework to predict neuronal spike trains. In: Kaynak, O., Alpaydin, E., Oja, E., Xu, L. (eds.) Artificial Neural Networks and Neural Information Processing — ICANN/ICONIP 2003. Lecture Notes in Computer Science, Berlin, Heidelberg
  • [28] Lyubashevsky, V., Peikert, C., Regev, O.: On ideal lattices and learning with errors over rings
  • [29] Maass, W.: Networks of spiking neurons: The third generation of neural network models. Electron. Colloquium Comput. Complex. TR96 (1996)
  • [30] Micciancio, D., Polyakov, Y.: Bootstrapping in fhew-like cryptosystems
  • [31] Mohassel, P., Zhang, Y.: Secureml: A system for scalable privacy-preserving machine learning. In: 2017 IEEE Symposium on Security and Privacy (SP)
  • [32] Regev, O.: On lattices, learning with errors, random linear codes, and cryptography. In: Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing. STOC ’05, New York, NY, USA
  • [33] Ribeiro, M., Grolinger, K., Capretz, M.A.: Mlaas: Machine learning as a service. In: 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)
  • [34] Wu, Y., Deng, L., Li, G., Zhu, J., Shi, L.: Spatio-temporal backpropagation for training high-performance spiking neural networks. Frontiers in Neuroscience 12 (2017)
  • [35] Zhan, Q., Liu, G., Xie, X., Sun, G., Tang, H.: Effective transfer learning algorithm in spiking neural networks. IEEE Transactions on Cybernetics 52, 13323–13335 (2021)

Appendix A: SNN

Encoding Strategies

To handle various stimulus patterns, SNNs often use a variety of coding methods to process the input stimulus. At present, the most common neural coding methods include rate coding, temporal coding, burst coding, and population coding [22]. For visual recognition tasks, rate coding is a popular scheme. Rate coding [34, 35] is mainly based on spike counting, and the Poisson distribution can describe the number of random events occurring per unit time.

In our study, the inputs are encoded into rate-based spike trains by a Poisson process. Given a time interval $\Delta t$, the reaction time is divided evenly into $T$ intervals. During each time step $t$, a random matrix $M_{t}$ is generated from the uniform distribution on $[0,1]$. We then compare the original normalized pixel matrix $X_{o}$ with $M_{t}$ to determine whether there is a spike at time $t$. The final encoded spike train $X$ is calculated as follows:

X(i,j)=\begin{cases}0, & X_{o}(i,j)\leq M_{t}(i,j),\\ 1, & X_{o}(i,j)>M_{t}(i,j),\end{cases}

where $i$ and $j$ are the coordinates of the pixels in the image. Encoded in this way, the spike trains follow a Poisson distribution.
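A minimal NumPy sketch of this Poisson (rate) encoding, with illustrative names:

import numpy as np

def poisson_encode(x_norm, T, rng=None):
    # encode a normalized image x_norm in [0, 1] into T binary spike frames:
    # at each time step, pixel (i, j) spikes with probability x_norm[i, j]
    rng = rng if rng is not None else np.random.default_rng()
    return (x_norm[None, ...] > rng.uniform(size=(T, *x_norm.shape))).astype(np.uint8)

frames = poisson_encode(np.random.default_rng(0).uniform(size=(28, 28)), T=10)
print(frames.shape)  # (10, 28, 28)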

Training methods

Supervised learning algorithms for deep spiking neural networks mainly include indirectly supervised algorithms represented by ANN-to-SNN conversion and directly supervised algorithms represented by spatiotemporal back-propagation. However, the non-differentiability of the spike function remains a problem. A common solution is to replace the spike function, or its derivative, with a similar continuous function; this is called the surrogate gradient method and results in a spike-based BP algorithm. Wu et al. [34] introduce four curves to approximate the derivative of the spike activity, denoted $f_{1},f_{2},f_{3},f_{4}$, as follows:

f_{1}(V)=\frac{1}{a_{1}}\operatorname{sign}\left(\left|V-V_{th}\right|<\frac{a_{1}}{2}\right),
f_{2}(V)=\left(\frac{\sqrt{a_{2}}}{2}-\frac{a_{2}}{4}\left|V-V_{th}\right|\right)\operatorname{sign}\left(\frac{2}{\sqrt{a_{2}}}-\left|V-V_{th}\right|\right),
f_{3}(V)=\frac{1}{a_{3}}\frac{e^{\frac{V_{th}-V}{a_{3}}}}{\left(1+e^{\frac{V_{th}-V}{a_{3}}}\right)^{2}},
f_{4}(V)=\frac{1}{\sqrt{2\pi a_{4}}}e^{-\frac{\left(V-V_{th}\right)^{2}}{2a_{4}}}.
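For example, the sigmoid-like surrogate $f_{3}$ can stand in for the derivative of the Fire function during back-propagation. The PyTorch-style sketch below is our own illustration, not Spikingjelly's exact implementation; $V_{th}$ and $a_{3}$ are the symbols used above.

import torch

class SurrogateFire(torch.autograd.Function):
    # Heaviside step in the forward pass; surrogate derivative f3 in the backward pass
    @staticmethod
    def forward(ctx, v, v_th=1.0, a3=0.5):
        ctx.save_for_backward(v)
        ctx.v_th, ctx.a3 = v_th, a3
        return (v >= v_th).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        e = torch.exp((ctx.v_th - v) / ctx.a3)
        f3 = e / (ctx.a3 * (1.0 + e) ** 2)    # f3(V) = (1/a3) e^x / (1 + e^x)^2
        return grad_output * f3, None, None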

In general, SNN training follows three principles: (1) The output of spiking neurons is binary and can be affected by noise; the firing frequency over time represents the response strength of a category for classification. (2) The goal is for only the correct neuron to fire at the highest frequency while the others remain silent; MSE loss is often used and gives better results. (3) The network state must be reset after each simulation. Note that training such an SNN requires memory and simulation time linear in $T$. A larger $T$ corresponds to a smaller simulation time step and more "fine-grained" training, but does not necessarily yield better performance: when $T$ is too large, the SNN unfolds in time into a very deep network, which can lead to vanishing or exploding gradients.

Load the training data into traindata_loader
for img, label in traindata_loader:
    label_onehot = onehot_encode(label)
    optimizer.zero_grad()
    output = 0
    for 1:T do
        img_encode = Poisson_Encode(img)
        I = SNN.Linear(img_encode)
        spike = SNN.Atan(I)          # spiking activation with surrogate gradient
        output += spike
    output /= T
    loss = MSEloss(output, label_onehot)
    loss.backward()
    optimizer.step()
    reset(SNN)                       # clear membrane potentials (principle 3)
Table 2: Training process
Load the test data into testdata_loader
for img in testdata_loader:
    output = 0
    for 1:T do
        img_encode = Poisson_Encode(img)
        I = SNN.Linear(img_encode)
        spike = SNN.Atan(I)
        output += spike
    result = argmax(output)
    reset(SNN)
Table 3: Test process

Tables 2 and 3 present pseudocode for the training and testing processes, respectively, giving a more intuitive view of their differences.

The longer the simulation time $T$, the higher the testing accuracy; however, increasing $T$ significantly increases time consumption, especially when dealing with encrypted data.

Appendix B: Homomorphic Encryption Schemes

LWE. We recall that an LWE (Learning With Errors) [32] ciphertext encrypting $m\in Z_{p}$ has the form:

LWE_{s}(m)=(\mathbf{a},b)=\left(\mathbf{a},\langle\mathbf{a},\mathbf{s}\rangle+e+\left\lfloor\frac{q}{p}m\right\rceil\right)\bmod q,

where $\mathbf{a}\in\mathbb{Z}_{q}^{n}$ and $b\in\mathbb{Z}_{q}$, and the key is a vector $\mathbf{s}\in\mathbb{Z}_{q}^{n}$. The ciphertext $(\mathbf{a},b)$ is decrypted using:

\left\lfloor\frac{p}{q}(b-\langle\mathbf{a},\mathbf{s}\rangle)\right\rceil\bmod p=\left\lfloor m+\frac{p}{q}e\right\rceil=m.

Note. The rounding function $\lfloor\cdot\rceil$ has the nice property that if $m_{0}\equiv m_{1}\bmod p$, then $\lfloor\frac{q}{p}m_{0}\rceil\equiv\lfloor\frac{q}{p}m_{1}\rceil\bmod q$, so operations on message polynomials modulo $p$ are realized when computing on ciphertexts modulo $q$.
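As a toy numeric illustration of the LWE encryption and decryption above (insecure parameters, illustrative names only):

import numpy as np

n, q, p, sigma = 16, 1 << 20, 1024, 3.2      # toy parameters, not secure
rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=n)               # binary secret key

def lwe_encrypt(m):
    a = rng.integers(0, q, size=n)
    e = int(np.rint(rng.normal(0, sigma)))   # small Gaussian noise
    b = (int(a @ s) + e + round(q / p * m)) % q
    return a, b

def lwe_decrypt(ct):
    a, b = ct
    return round(p / q * ((b - int(a @ s)) % q)) % p

assert lwe_decrypt(lwe_encrypt(7)) == 7      # correct while |e| < q/(2p)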

RLWE. An RLWE (Ring Learning With Errors) [28] ciphertext of a message $m(X)\in R_{p,N}$ is obtained as follows:

RLWE_{s}(m(X))=\left(a(X),b(X)\right),\quad\text{where } b(X)=a(X)\cdot s(X)+e(X)+\left\lfloor\frac{q}{p}m(X)\right\rceil,

where $a(X)\leftarrow R_{q,N}$ is chosen uniformly at random and $e(X)\leftarrow\chi_{\sigma}^{n}$ is drawn from a discrete Gaussian distribution with parameter $\sigma$. The decryption algorithm for RLWE is similar to that of LWE.

GSW. GSW [21] is recognized as the first third-generation FHE scheme, and in practice the RGSW [21] variant is commonly used. Given a plaintext $m\in\mathbb{Z}_{p}$, the plaintext is embedded into the exponent of a polynomial to obtain $X^{m}\in R_{p,N}$, which is then encrypted as $RGSW(X^{m})$. RGSW enables efficient homomorphic multiplication, denoted $\diamond$, while effectively controlling noise growth:

RGSW\left(X^{m_{0}}\right)\diamond RGSW\left(X^{m_{1}}\right)=RGSW\left(X^{m_{0}+m_{1}}\right),
RLWE\left(X^{m_{0}}\right)\diamond RGSW\left(X^{m_{1}}\right)=RLWE\left(X^{m_{0}+m_{1}}\right).

Note that the first multiplication involves two RGSW ciphertexts, whereas the second multiplication operates on an RLWE ciphertext and an RGSW ciphertext and is used in bootstrapping.