
SpikeDyn: A Framework for Energy-Efficient Spiking Neural Networks with Continual and Unsupervised Learning Capabilities in Dynamic Environments

¹Rachmad Vidya Wicaksana Putra, ²Muhammad Shafique
¹Technische Universität Wien (TU Wien), Vienna, Austria
²New York University Abu Dhabi (NYUAD), Abu Dhabi, United Arab Emirates
Email: rachmad.putra@tuwien.ac.at, muhammad.shafique@nyu.edu
Abstract

Spiking Neural Networks (SNNs) offer the potential for efficient unsupervised and continual learning owing to their biological plausibility, but their complexity still poses a serious research challenge for enabling energy-efficient designs in resource-constrained scenarios (e.g., embedded systems and IoT-Edge devices). We propose SpikeDyn, a comprehensive framework for energy-efficient SNNs with continual and unsupervised learning capabilities in dynamic environments, covering both the training and inference phases. This is achieved through the following mechanisms: 1) reduction of neuronal operations, by replacing the inhibitory neurons with direct lateral inhibitions; 2) a memory- and energy-constrained SNN model search algorithm that employs analytical models to estimate the memory footprint and energy consumption of candidate SNN models and selects a Pareto-optimal SNN model; and 3) a lightweight continual and unsupervised learning algorithm that employs adaptive learning rates, an adaptive membrane threshold potential, weight decay, and a reduction of spurious updates. Our experimental results show that, for a network with 400 excitatory neurons, SpikeDyn reduces the energy consumption on average by 51% for training and by 37% for inference, as compared to the state-of-the-art. Due to the improved learning algorithm, SpikeDyn provides on average a 21% accuracy improvement over the state-of-the-art for classifying the most recently learned task, and an 8% improvement on average for the previously learned tasks.

Index Terms:
Spiking neural networks, SNNs, continual learning, unsupervised learning, energy-efficiency, embedded systems, complexity.

I Introduction

With the evolution of neuromorphic computing, SNNs are (re-)gaining researchers’ attention as they have demonstrated potential for better learning capabilities (especially in unsupervised settings) due to their biological plausibility, and relatively better energy efficiency compared to other competitive neural network models [1]. Previous works have explored different methodologies to build energy-efficient and unsupervised SNN systems, and most of them employ offline training [2, 3, 4, 5, 6]. However, the information learned by an offline-trained SNN system can become obsolete or may lead to low accuracy at run time under dynamically changing scenarios, as new data may contain new features that should be learned online [7, 8, 9, 10, 11]. This is especially important for use cases like IoT-Edge devices deployed in dynamically changing environments [8] and robotic agents/UAVs in unknown terrains [9]. New data gathered directly from such dynamic environments are usually unlabeled. Hence, an SNN-based system should employ unsupervised learning to process them [6]. Moreover, new data are uncontrolled and their classes might not be randomly distributed, thereby making it difficult for the system to learn different classes/tasks proportionally [12]. Therefore, the SNN system should employ real-time continual learning (continual learning is defined as the ability of a model to learn consecutive tasks, e.g., classes, while retaining the information that has already been learned [8][13][14]; real-time means during the operational lifetime of the dynamic system), while avoiding undesired conditions such as: (1) the system learns information from the new data, but quickly forgets the previously learned one (i.e., catastrophic forgetting) [15, 13, 14]; (2) the system mixes new information with the existing one, thereby corrupting/polluting the existing information [7][12]; and (3) the learning requires a large number of weights and neuron parameters, as well as complex exponential calculations, thereby consuming high energy.

Targeted Research Problem: Whether and how we can design lightweight and energy-efficient continual and unsupervised learning for SNNs that adapts to task changes (dynamic environments) to provide improved accuracy at run time. An efficient solution to this problem will enable SNN systems to achieve better learning capabilities on energy-constrained devices deployed in unpredictable, dynamic scenarios.

State-of-the-Art and Limitations: Previous works have tried to achieve continual learning through different techniques. The first category includes supervised learning techniques that minimize the cost function in the learning process using data labels [16, 17, 18]. Hence, they cannot process the unlabeled data required in the targeted problem. The second category includes unsupervised learning techniques that perform learning using unlabeled data [7][12][19]. However, they suffer from spurious updates which lead to sub-optimal accuracy, since they update the weights at each spike event, as observed in [3]. They also incur high energy consumption due to: (a) additional neurons [12][19]; (b) non-proportional quantities of training samples, i.e., samples from earlier tasks are presented in larger quantities than those of later tasks [7]; and (c) a large memory footprint and complex exponential calculations to achieve high accuracy [7], e.g., a total of 800 neurons are needed to achieve 75% accuracy on the MNIST dataset. Note that research on continual and unsupervised learning in SNNs is still at an early stage, and prominent SNN works mostly use the MNIST dataset [7][12][19]. Therefore, we adopt the same test conditions as widely used by the SNN research community. To highlight the targeted problem and the limitations of the state-of-the-art, we perform an experimental case study using the MNIST dataset, as discussed below.

I-A Motivational Case Study

We performed experiments that provide dynamic scenarios by feeding consecutive task (i.e., class; the SNN community uses the term “task” to refer to a “class” in the dataset) changes to the network. First, a stream of samples for digit-0 is fed. Then, the task is changed to digit-1. This process is repeated for the other tasks without re-feeding previous tasks, and each task has the same number of samples. More details of the experimental setup are presented in Section IV. The results are presented in Fig. 1, from which we make the following key observations. (1) The baseline does not efficiently learn new tasks from digit-2 onward, as most of the synapses are already occupied by previously learned tasks (digit-0 and digit-1), and it mixes new information with the existing one. (2) The state-of-the-art improves the accuracy over the baseline at the cost of an energy overhead due to: (a) a large number of weights and neuron parameters from the excitatory and inhibitory layers, and (b) complex exponential calculations for computing multiple spike traces, membrane-potential and threshold-potential decay, and weight decay. These observations expose several challenges that need to be solved to address the targeted problem, as discussed below.

Figure 1: (a) The SNN architecture employed by the baseline [2] and the state-of-the-art (ASP) [7]. The experimental results: (b) the energy consumption for networks with 200 excitatory neurons (N200) and 400 excitatory neurons (N400), and (c) the per-digit accuracy for N400.

I-B Key Scientific Challenges that are Addressed in this Paper

  • The SNN system should employ a simple yet effective learning algorithm at run time that achieves high accuracy, in both the dynamic and non-dynamic scenarios.

  • It should reduce the non-significant weights or neuron parameters, and the complex exponential calculations, to optimize the energy consumption.

  • The memory and energy constraints should be considered in the design process to support diverse design scenarios.

I-C Our Novel Contributions and Key Results

To address the above challenges, we propose the SpikeDyn framework (Section III) for developing energy-efficient SNNs with continual and unsupervised learning capabilities in dynamic environments, considering both the training and inference phases. SpikeDyn employs the following key mechanisms (see Fig. 2).

  1. Reducing the energy consumption of the neuronal operations (Section III-B) by replacing the inhibitory neurons with direct lateral inhibitory connections.

  2. An SNN model search algorithm (Section III-C) under the given memory and energy constraints. It quickly estimates the memory footprint and energy consumption of the investigated SNN models using our analytical models that leverage the network parameters, the bit precision, the energy for processing an input, and the number of samples.

  3. An algorithm for achieving continual and unsupervised learning (Section III-D) through the following means: (a) adaptive learning rates, (b) synaptic weight decay, (c) an adaptive membrane threshold potential, and (d) a reduction of the spurious weight updates.

Key Results: We evaluated SpikeDyn for accuracy and energy consumption using Python-based simulations with the MNIST dataset on an embedded GPU and GPGPUs, to show the generality of our solution. For a network with 400 excitatory neurons, when compared to the state-of-the-art [7], SpikeDyn reduces the energy consumption by up to 66% (avg. 51%) for training and up to 54% (avg. 37%) for inference. It also improves the accuracy by up to 29% (avg. 21%) for the most recently learned task and by up to 37% (avg. 8%) for the previously learned tasks, in dynamic scenarios.

Figure 2: The overview of novel contributions (shown in the blue boxes).

II Background: An Overview of SNNs

SNNs are considered the third generation of neural network computation models, as they exhibit high biological plausibility through their spike-coded information. An SNN model consists of spike coding, a network architecture, neuron and synapse models, and a learning rule [20]. Spike coding converts the information into a spike sequence/train. There are several coding schemes in the literature, such as rate, temporal, rank-order, phase, and burst coding [21, 22, 23, 24]. Here, we consider rate coding as it has demonstrated high accuracy in unsupervised SNNs [2]. We also consider the network architecture of Fig. 1(a), since it is widely employed in the literature for continual and unsupervised learning-based SNNs [7][12]. Here, each excitatory neuron is expected to recognize a class. For the neuron model, we use the Leaky Integrate-and-Fire (LIF) model, since it has the lowest computational complexity among the existing neuron models. This model increases its membrane potential at the occurrence of an incoming spike. It generates a spike when the membrane potential reaches the threshold potential ($V_{th}$), and then resets to the reset potential ($V_{reset}$). To prevent a neuron from dominating the spiking activity, the membrane threshold potential is usually defined by $V_{th}+\theta$ [2]. Here, $\theta$ denotes the adaptation potential, which is increased each time the neuron spikes, and otherwise decays with a rate of $\theta_{decay}$. Meanwhile, a synapse is modeled by its conductance and weight ($w$). The conductance is increased by $w$ when a spike arrives at the synapse, and otherwise decays exponentially. For the learning rule, the SNN model employs a spike-timing-dependent plasticity (STDP) mechanism. Further details on SNNs can be found in [1][20].
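To make these neuron and synapse dynamics concrete, the following minimal NumPy sketch illustrates one discrete simulation step of conductance-based LIF neurons with the adaptive threshold $V_{th}+\theta$. All parameter names and values (e.g., tau_mem, theta_plus) are illustrative placeholders, not the exact constants used in our experiments.

import numpy as np

def lif_step(v, theta, g, w, pre_spikes, dt=1.0,
             v_rest=-65.0, v_reset=-60.0, v_th=-52.0,
             tau_mem=100.0, tau_g=1.0, theta_plus=0.05, tau_theta=1e7):
    # One discrete step of conductance-based LIF neurons with adaptive threshold (V_th + theta).
    # Shapes: v, theta, g -> (n_exc,); w -> (n_exc, n_syn); pre_spikes -> (n_syn,) binary vector.
    g = g + w @ pre_spikes                                   # conductance increases by w on incoming spikes
    v = v + dt * ((v_rest - v) + g * (0.0 - v)) / tau_mem    # leaky integration driven by the conductance
    g = g * np.exp(-dt / tau_g)                              # conductance decays exponentially otherwise
    theta = theta * np.exp(-dt / tau_theta)                  # adaptation potential decays at its decay rate
    spikes = v >= (v_th + theta)                             # spike when the adaptive threshold is reached
    v = np.where(spikes, v_reset, v)                         # reset the membrane potential after a spike
    theta = theta + theta_plus * spikes                      # raise theta of neurons that just spiked
    return v, theta, g, spikes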

III SpikeDyn Framework

III-A Overview

Fig. 3 shows the detailed flow of our SpikeDyn framework with the following key steps, which are explained in the subsequent sections.

  1. Reducing the operations in the SNN model to minimize the energy consumption, through a replacement of the inhibitory neurons with direct lateral inhibitions. Thus, the operations in the inhibitory layer are eliminated.

  2. An SNN model search algorithm that explores different numbers of excitatory neurons to find a set of Pareto-optimal SNN models, while considering the memory and energy constraints. To quickly perform the search, we also propose analytical models that achieve less than 5% error compared to the actual runs, by incorporating:

    (a) the number of weights and neuron parameters, and the bit precision, to estimate the memory footprint,

    (b) the energy consumption for processing an input sample, and the number of samples that will be processed, to estimate the energy consumption.

  3. A continual and unsupervised learning algorithm that employs the following techniques.

    (a) Adaptive learning rates that determine the potentiation and depression factors in the STDP-based learning, using the presynaptic and postsynaptic spike activities.

    (b) Synaptic weight decay that helps the network gradually remove the weak synaptic connections (which represent old and insignificant information), thereby enabling the synapses to learn new information.

    (c) An adaptive membrane threshold potential that balances the neurons’ internal state, so that a neuron generates spikes only when the corresponding synapses need to learn the input features. It is determined by the threshold potential value and its decay rate.

    (d) Reduction of the spurious weight updates in the presynaptic and postsynaptic spike events by employing a timestep to carefully perform weight potentiation and depression.

Figure 3: Overview of our SpikeDyn framework; see novel blocks in blue.

III-B Reducing the neuronal operations

Previous works on continual and unsupervised learning for SNNs used the network architecture shown in Fig. 1(a), which consists of the input, excitatory, and inhibitory layers [7][12]. We observe that the inhibitory neurons have different parameters from the excitatory ones, which also need to be saved in memory. Therefore, employing such an architecture consumes high memory and energy. To address this issue, we reduce the neuronal operations to substantially decrease the memory footprint and the energy consumption, as shown in Figs. 4(a)-4(c). We also observe that the optimized architecture still achieves a similar accuracy profile as the baseline, as shown in Fig. 4(d). Therefore, we improve the accuracy of the optimized architecture with our learning mechanism, as discussed in Section III-D.

Figure 4: (a) Replacing the inhibitory neurons with the direct lateral inhibitions. The optimized architecture reduces (b) memory and (c) energy, but still has similar (d) accuracy profile in dynamic scenarios like the baseline architecture.
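As a rough illustration of the direct lateral inhibition, the sketch below emulates the effect of the removed inhibitory layer by letting every spiking excitatory neuron directly depress the membrane potentials of all other excitatory neurons. The inhibition strength v_inh and its fixed, uniform form are simplifying assumptions for illustration only.

import numpy as np

def apply_lateral_inhibition(v, spikes, v_inh=2.0):
    # Direct lateral inhibition: each spiking excitatory neuron lowers the
    # membrane potential of all non-spiking neurons, emulating the
    # winner-take-all behavior previously provided by the inhibitory layer.
    if spikes.any():
        inhibition = v_inh * spikes.sum()        # total inhibition from all spiking neurons
        v = np.where(spikes, v, v - inhibition)  # only non-spiking neurons are inhibited
    return v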

III-C An SNN model search algorithm

Each application has memory and energy constraints that need to be considered in the model generation. Towards this, we propose an algorithm to search for an appropriate model size for a given SNN architecture that meets the design constraints (see Alg. 1). The idea is to explore different sizes of the SNN model and estimate their memory footprint and energy consumption in both the training and inference phases, using the proposed analytical models.

Algorithm 1 Pseudo-code of the proposed search algorithm
Input: (1) Memory constraint (mem_c); (2) Energy constraints: training (E_ct), inference (E_ci); (3) SNN model (model): number of neurons (model.n_exc), size (model.mem), energy of training (model.E_t) and inference (model.E_i); (4) Energy for one sample: training (E_1t), inference (E_1i); (5) Number of additional neurons (n_add);
Output: SNN model (model);
Initialization:
1: model.n_exc = 0;
2: model.mem = 0;
Process:
3: while model.mem ≤ mem_c do
4:   if (model.n_exc > 0) then
5:     perform training with 1 sample using Alg. 2;
6:     calculate E_1t; // for 1 sample
7:     estimate model.E_t; // for all samples
8:     if (model.E_t ≤ E_ct) then
9:       perform inference with 1 sample;
10:      calculate E_1i; // for 1 sample
11:      estimate model.E_i; // for all samples
12:      if (model.E_i ≤ E_ci) then
13:        save model;
14:      end if
15:    end if
16:  end if
17:  model.n_exc += n_add;
18:  estimate model.mem;
19: end while
20: return model;

For each investigated SNN model size, the memory footprint ($mem$) is estimated using $mem=(P_{w}+P_{n})\cdot BP$, which leverages the number of weights ($P_{w}$), the number of neuron parameters ($P_{n}$), and the bit precision ($BP$). The reason is that these aspects are the dominant factors in determining the size of an SNN model. Meanwhile, the total energy ($E$) is estimated using $E=E_{1}\cdot N$, which leverages the energy for processing a single sample ($E_{1}$) and the number of samples that will be processed ($N$). The number of samples $N$ is important, as the deployed systems might have different numbers of samples available from the environment. If the estimated memory $mem$ and energy $E$ are within the memory constraint ($mem_{c}$) and energy constraint ($E_{c}$), respectively, then the investigated SNN model is selected as a candidate solution. Our algorithm then selects the largest-sized SNN model from the candidates as the solution, since a larger network can usually achieve higher accuracy [6]. We validated our analytical models against the actual execution runs; see the results presented in Fig. 5(a) for the memory footprint, and Figs. 5(b)-5(c) for the energy consumption of training and inference, respectively. The results show that our analytical models achieve less than 5% error compared to the actual runs. Thus, they are suitable for fast estimation. Employing our algorithm is beneficial compared to actually running all possible SNN configurations and selecting the desired one at the end, since it saves the exploration time, as shown in Figs. 5(d)-5(e).
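A minimal sketch of the two analytical models and their use in the constraint check is given below. The concrete numbers (bit precision, energy per sample, number of state variables per neuron, constraint values) are hypothetical placeholders for illustration only.

def estimate_memory(p_w, p_n, bit_precision):
    # mem = (P_w + P_n) * BP, returned in bits
    return (p_w + p_n) * bit_precision

def estimate_energy(e_one_sample, n_samples):
    # E = E_1 * N
    return e_one_sample * n_samples

# Hypothetical candidate: 400 excitatory neurons, 28x28 = 784 synapses each,
# 4 state variables per neuron, 32-bit precision, placeholder E_1t of 0.5 J.
n_exc, n_syn = 400, 28 * 28
mem_bits = estimate_memory(p_w=n_exc * n_syn, p_n=4 * n_exc, bit_precision=32)
e_train = estimate_energy(e_one_sample=0.5, n_samples=60000)
meets_constraints = (mem_bits <= 64e6) and (e_train <= 50e3)  # example mem_c and E_ct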

Figure 5: Validation of our analytical models against the actual runs in terms of (a) the memory footprint, and the energy consumption for (b) training and (c) inference, using full MNIST dataset. Our algorithm also reduces the exploration time over the actual runs for both (d) training and (e) inference phases.

III-D Proposed continual and unsupervised learning algorithm

Our SpikeDyn employs an algorithm that incorporates the following techniques (see pseudo-code in Alg. 2).

Adaptive Learning Rates: Our algorithm employs a potentiation factor ($k_{p}$) and a depression factor ($k_{d}$) to define the learning rates for weight potentiation and depression. The idea is to adjust the potentiation factor $k_{p}$ to a high value when the corresponding synapses need to learn input features, which is indicated by the occurrence of postsynaptic spikes. The value of $k_{p}$ is obtained by normalizing the maximum accumulated postsynaptic spikes ($maxSp_{post}$) with the spike threshold ($Sp_{th}$); see Eq. 1(a). Meanwhile, the depression factor $k_{d}$ provides weight depression when the corresponding synapses need to weaken their connections, which is indicated by the absence of postsynaptic spikes. The value of $k_{d}$ is obtained as the ratio between the maximum accumulated postsynaptic spikes ($maxSp_{post}$) and presynaptic spikes ($maxSp_{pre}$); see Eq. 1(b). These factors are incorporated into the improved STDP-based learning algorithm; see Eq. 2. Here, $\Delta w$ denotes the weight change, $\eta_{pre}$ and $\eta_{post}$ denote the learning rates for a pre- and postsynaptic spike, and $x_{pre}$ and $x_{post}$ denote the pre- and postsynaptic traces, respectively.

\textbf{(a)}\;\; k_{p}=\left\lceil\frac{maxSp_{post}}{Sp_{th}}\right\rceil \;;\;\; \textbf{(b)}\;\; k_{d}=\frac{maxSp_{post}}{maxSp_{pre}} \quad (1)
\Delta w=\begin{cases}-k_{d}\cdot\eta_{pre}\cdot x_{post}&\text{on depression update time}\\ k_{p}\cdot\eta_{post}\cdot x_{pre}&\text{on potentiation update time}\end{cases} \quad (2)
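The following sketch shows how Eqs. 1 and 2 can be computed; the zero-guard on the denominator of $k_{d}$ is an implementation assumption added for robustness rather than part of the formulation.

import numpy as np

def adaptive_rates(max_sp_post, max_sp_pre, sp_th):
    # Eq. 1(a): potentiation factor, and Eq. 1(b): depression factor
    k_p = np.ceil(max_sp_post / sp_th)
    k_d = max_sp_post / max(max_sp_pre, 1.0)  # zero-guard (assumption)
    return k_p, k_d

def stdp_weight_change(x_pre, x_post, eta_pre, eta_post, k_p, k_d, potentiation):
    # Eq. 2: weight change at a potentiation or depression update time
    if potentiation:
        return k_p * eta_post * x_pre
    return -k_d * eta_pre * x_post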

Synaptic Weight Decay: We employ a weight decay to gradually remove old and insignificant information, which is represented by small weight values. It follows the equation $\tau_{decay}\cdot(dw/dt)=-w_{decay}\cdot w$, where $\tau_{decay}$ denotes the decay time constant and $w_{decay}$ denotes the weight decay rate. In this manner, weak connections become progressively more disconnected over the training period. We define the value of $w_{decay}$ to be inversely proportional to the size of the network ($w_{decay}\propto 1/n_{exc}$), where $n_{exc}$ denotes the number of excitatory neurons. The reason is that a smaller network has fewer synapses for learning new information while retaining the old one. Therefore, it needs to forget old information at a faster rate than a larger network. We observe that an appropriate $w_{decay}$ can improve accuracy, as shown by label 1 in Fig. 6.
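A simple explicit-Euler sketch of the weight decay and the size-dependent decay rate is shown below; the proportionality constant c_decay is an illustrative placeholder.

def decay_weights(w, w_decay, tau_decay, dt=1.0):
    # One Euler step of tau_decay * dw/dt = -w_decay * w:
    # weak (old, insignificant) connections fade over the training period.
    return w - dt * (w_decay / tau_decay) * w

def decay_rate_for(n_exc, c_decay=1.0):
    # w_decay is set inversely proportional to the network size (w_decay ∝ 1/n_exc),
    # so a smaller network forgets old information at a faster rate.
    return c_decay / n_exc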

Adaptive Membrane Threshold Potential: The threshold potential is defined by $V_{th}+\theta$, as discussed in Section II. We observe that the adaptation potential $\theta$ plays an important role in determining whether a neuron will generate spikes easily for later inputs. If $\theta$ is too high, the neurons will not spike easily for later inputs, since the threshold potential has already been adjusted for recognizing earlier inputs. In the context of dynamic scenarios, the network will then face difficulties when learning new tasks. If $\theta$ is too low, the neurons will spike easily for any input. In the context of dynamic scenarios, the network will then quickly forget old information. Thus, the threshold potential should be balanced, so that some neurons are available for recognizing new features, while the others retain the old yet significant information. Towards this, we define the adaptation potential $\theta$ to be proportional to its decay rate $\theta_{decay}$ and the presentation time of a sample $t_{sim}$, which can be stated as $\theta=c_{\theta}\cdot\theta_{decay}\cdot t_{sim}$, where $c_{\theta}$ denotes the adaptation constant. An appropriate $\theta$ can improve accuracy, as shown by label 2 in Fig. 6.

Figure 6: Impact of employing weight decay and the adaptation potential $\theta$ on the accuracy of learning new tasks in a dynamic scenario.

Reducing the Spurious Weight Updates: Previous work [3] has observed that there are spurious updates in SNNs, which can degrade the accuracy. These are observed in two cases: (1) when the neurons spike unpredictably, due to the random weight initialization; and (2) when a neuron spikes for patterns that belong to different classes, due to overlapping features. We exploit this observation in a novel way to reduce the spurious updates that are induced by both the pre- and postsynaptic spikes. The idea is to employ a timestep, and then monitor whether at least one postsynaptic spike happens within it. If so, the weight potentiation is conducted; otherwise, the weight depression is conducted (see Fig. 7).

Figure 7: Overview of the proposed timestep-based weight updates.
Algorithm 2 Pseudo-code of the proposed learning algorithm
Input: (1) Simulation time for an input (t_sim); (2) Timestep (t_step); (3) SNN parameters: number of neurons (n_exc), number of synapses per neuron (n_syn), accumulated presynaptic spikes (N_sp_pre) and accumulated postsynaptic spikes (N_sp_post); (4) Presynaptic spike (sp_pre), postsynaptic spike (sp_post);
Output: Synaptic weight update (Δw);
Initialization:
1: Δw[n_exc, n_syn] = zeros[n_exc, n_syn];
2: N_sp_pre[n_exc, n_syn] = zeros[n_exc, n_syn];
3: N_sp_post[n_exc] = zeros[n_exc];
Process:
4: for (t = 0 to (t_sim − 1)) do
5:   for (i = 0 to (n_exc − 1)) do
6:     for (j = 0 to (n_syn − 1)) do
7:       if sp_pre then
8:         N_sp_pre[i, j] += 1;
9:       end if
10:      if sp_post then
11:        N_sp_post[i] += 1;
12:      end if
13:    end for
14:  end for
15:  if ((t mod t_step) == 0) then
16:    maxSp_pre = max(N_sp_pre);
17:    maxSp_post = max(N_sp_post);
18:    if (no sp_post within t_step) then
19:      update Δw[:, :] using Eq. 2; // weight depression
20:    else
21:      m ← index(max(N_sp_post));
22:      update Δw[m, :] using Eq. 2; // weight potentiation
23:    end if
24:  else
25:    update Δw[:, :] using weight decay;
26:  end if
27:  return Δw;
28: end for
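For readers who prefer array-based code, the sketch below restates the core of Alg. 2 in NumPy. It simplifies the presynaptic spike counter to a per-synapse vector and adds a zero-guard on the ratio in Eq. 1(b); both are assumptions for illustration, and the weight-decay branch (line 25) is omitted for brevity.

import numpy as np

def timestep_gated_updates(pre_spikes, post_spikes, x_pre, x_post,
                           eta_pre, eta_post, sp_th, t_step):
    # pre_spikes: (t_sim, n_syn) and post_spikes: (t_sim, n_exc) binary arrays;
    # x_pre: (n_syn,), x_post: (n_exc,) are spike traces sampled at update times.
    t_sim, n_exc = post_spikes.shape
    n_syn = pre_spikes.shape[1]
    dw = np.zeros((n_exc, n_syn))
    n_pre = np.zeros(n_syn)    # accumulated presynaptic spikes (simplified per synapse)
    n_post = np.zeros(n_exc)   # accumulated postsynaptic spikes
    post_in_window = 0         # postsynaptic spikes observed in the current window
    for t in range(t_sim):
        n_pre += pre_spikes[t]
        n_post += post_spikes[t]
        post_in_window += post_spikes[t].sum()
        if t % t_step == 0:                                # update once per timestep window
            k_p = np.ceil(n_post.max() / sp_th)            # Eq. 1(a)
            k_d = n_post.max() / max(n_pre.max(), 1.0)     # Eq. 1(b), zero-guarded
            if post_in_window == 0:                        # no postsynaptic spike: depression
                dw += -k_d * eta_pre * x_post[:, None]
            else:                                          # potentiate the most active neuron only
                m = int(n_post.argmax())
                dw[m] += k_p * eta_post * x_pre
            post_in_window = 0
    return dw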

IV Evaluation Methodology

Fig. 8 shows the experimental setup for evaluating the SpikeDyn framework. We used Python-based SNN simulations [25] that run on an embedded GPU (Nvidia Jetson Nano) and GPGPUs (Nvidia GTX 1080 Ti and RTX 2080 Ti) to perform diverse evaluations under different memory and compute capabilities, showing the generality of our solution. The GPU specifications are presented in Table I.

Figure 8: The experimental setup and tool flow.
TABLE I: GPU Specifications.
Category Jetson Nano GTX 1080 Ti RTX 2080 Ti
Architecture Maxwell Pascal Turing
CUDA cores 128 3584 4352
Memory 4GB LPDDR4 11GB GDDR5X 11GB GDDR6
Interface width 64-bit 352-bit 352-bit
Power 10W 250W 250W

To estimate the energy consumption of both the training and the inference phases, we adopted the approach of [26]. That is, we leverage the processing time and the processing power, which are reported by (1) the nvidia-smi utility for the GPGPUs, and (2) a power-meter measurement for the embedded GPU. We used MNIST as it is widely used for evaluating continual and unsupervised learning in SNNs [7][12][19], and employed rate coding to convert each pixel of an image into a Poisson-distributed spike train. For comparison partners, we used the work in [2] as the baseline, and the adaptive synaptic plasticity (ASP) [7] as the state-of-the-art (we used the work in [7] since it is the only available recent work with the complete set of configurations, parameters, and implementation details required to reproduce the design and results). The evaluation was performed for both dynamic and non-dynamic environments. Dynamic environments mean that the network is fed with consecutive task changes without re-feeding previous tasks, and each task has the same number of samples. This simulates an extreme condition where an SNN system receives training tasks from the environment in a consecutive manner, and each task has a defined number of samples. Non-dynamic environments mean that the network is fed with input samples whose tasks are distributed randomly. We used different network sizes, i.e., 200 and 400 excitatory neurons, which we refer to as N200 and N400, respectively.
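As an illustration of the rate coding employed here, the sketch below converts a grayscale image into a Poisson-distributed spike train; the presentation time t_sim and the maximum firing rate max_rate_hz are illustrative values rather than our exact configuration.

import numpy as np

def poisson_spike_train(image, t_sim=350, max_rate_hz=63.75, dt_ms=1.0, rng=None):
    # Each pixel intensity (0..255) is mapped to a firing rate proportional to it;
    # a binary spike array of shape (t_sim, n_pixels) is drawn per 1 ms timestep.
    rng = np.random.default_rng() if rng is None else rng
    rates_hz = image.astype(float).ravel() / 255.0 * max_rate_hz
    p_spike = rates_hz * dt_ms / 1000.0
    return (rng.random((t_sim, p_spike.size)) < p_spike).astype(np.uint8)

# Example: a random 28x28 image yields a (350, 784) spike train.
spikes = poisson_spike_train(np.random.randint(0, 256, (28, 28)))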

V Experimental Results and Discussions

V-A Maintaining the Classification Accuracy

Figure 9: Accuracy in the dynamic environments: for most recently learned task in (a.1) N200, (b.1) N400; and for the previously learned tasks in (a.2) N200, (b.2) N400. Accuracy in the non-dynamic environments: over the presentation of training samples in (c.1) N200 and (c.2) N400.
Figure 10: Confusion matrices of the SpikeDyn for classifying the previously learned tasks, which show the relation between the target labels and the predicted labels in (a) N200 and (b) N400.

Dynamic Environments: We evaluated the classification accuracy for two cases: (1) classifying the most recently learned task, which represents the capability of learning a new task; and (2) classifying the previously learned tasks, which represents the capability of retaining old information.

Case-1: Figs. 9(a.1) and 9(b.1) show the accuracy when the network classifies the most recently learned task (i.e., learning a new task), for N200 and N400, respectively. The ASP achieves better accuracy than the baseline since the ASP employs an adaptive learning rate and weight decay, while the baseline does not consider the dynamic tasks in its learning. Here, our SpikeDyn improves the learning capabilities beyond the ASP. SpikeDyn improves the accuracy over the ASP by up to 38% (avg. 23%) for N200, and by up to 29% (avg. 21%) for N400. The reason is that SpikeDyn employs: (1) a more careful mechanism to determine the rates of weight potentiation and depression for learning new features, (2) an appropriate threshold potential to keep some neurons active in the learning process, (3) a weight decay rate that effectively removes old and insignificant information, and (4) a reduction of the spurious weight updates.

Case-2: Figs. 9(a.2) and 9(b.2) show the accuracy when the network classifies the previously learned tasks (i.e., retaining old information), for N200 and N400, respectively. The baseline performs the worst as it does not decrease the weights, thereby suffering from mixed information in its synapses. Here, our SpikeDyn shows comparable accuracy to the ASP. SpikeDyn improves the accuracy over the ASP by up to 36% (avg. 4%) for N200, and by up to 37% (avg. 8%) for N400. The reason is that SpikeDyn employs: (1) a threshold potential that tunes some neurons to be inactive in the learning process, thereby retaining the old yet significant information, and (2) a weight decay rate that does not remove the old yet significant connections. Furthermore, we also observe that some classes are relatively difficult to learn in dynamic scenarios, especially in the case of retaining old information. For instance, in the N400 case, the accuracy for classifying digit-4 is low, as indicated by label 1 in Fig. 9(b.2). This is because a considerable number of misclassifications happens when digit-4 is recognized as another digit (i.e., digit-9), as indicated by label 1 in Fig. 10(b). The reason is that the learned features of digit-4 gradually change to represent the features of digit-9 over the training period, due to their overlapping features and the sequence of learning tasks. Therefore, some neurons that recognize digit-4 early in the training period are changed to recognize digit-9 by the end of the training period.

Non-dynamic Environments: Figs. 9(c.1) and 9(c.2) show the accuracy over the presentation of training samples for N200 and N400, respectively. The results show that our SpikeDyn achieves comparable accuracy to the other techniques. The reason is that SpikeDyn employs effective learning rates to potentiate and depress the weights, while reducing the spurious updates. In this manner, each weight update is adjusted appropriately, and hence the accuracy is maintained. Such observations are important, as SNN systems may have a different number of training samples available from the environment. Thus, the users can devise a strategy to define the minimum number of training samples for achieving the targeted accuracy.

Figure 11: The energy consumption (normalized to the baseline) for training and inference phases, and across different sizes of networks and different GPUs.

V-B Reduction of the Energy Consumption

Fig. 11 shows that SpikeDyn reduces the energy consumption compared to the other techniques, across different network sizes and different GPUs, for both the dynamic and non-dynamic environments. For N200, our SpikeDyn reduces the energy consumption compared to the ASP by up to 59% (avg. 57%) for training, and by up to 54% (avg. 51%) for inference. For N400, our SpikeDyn reduces the energy consumption compared to the ASP by up to 66% (avg. 51%) for training, and by up to 54% (avg. 37%) for inference. The energy savings in training come from the elimination of the inhibitory neurons, and the reduction of spurious updates and exponential calculations. Meanwhile, the energy savings in inference mainly come from the elimination of the inhibitory neurons. Furthermore, we also report the processing time of SpikeDyn for the training and inference phases (see Table II). The results show that running an SNN model on the embedded GPU (Jetson Nano) requires a longer time than on the GPGPUs, as the embedded GPU has fewer cores, a smaller memory, and a lower bandwidth. Therefore, the users should devise a strategy for defining the number of samples in the training and inference phases, to comply with the use cases’ requirements, especially for embedded applications.

VI Conclusion

We propose the novel SpikeDyn framework that supports continual and unsupervised learning for SNNs, while reducing the energy consumption, by optimizing the SNN operations and improving the learning algorithm. The experimental results show that our SpikeDyn consumes less energy and improves the accuracy, as compared to the state-of-the-art, in both the dynamic and non-dynamic scenarios. Therefore, our SpikeDyn enables energy-efficient embedded SNNs with one-time deployment.

TABLE II: Processing Time of SpikeDyn on Full MNIST Dataset (in hours).
Process Jetson Nano (N200 / N400) GTX 1080 Ti (N200 / N400) RTX 2080 Ti (N200 / N400)
Training (hours) 35.0 / 36.3 5.0 / 5.3 3.9 / 4.1
Inference (hours) 4.7 / 4.8 0.7 / 0.7 0.6 / 0.6
Inference of an image (seconds) 1.71 / 1.74 0.25 / 0.25 0.2 / 0.2

VII Acknowledgment

This work was partly supported by the Indonesia Endowment Fund for Education (IEFE/LPDP) Scholarship Program from the Ministry of Finance, Indonesia.

References

  • [1] M. Pfeiffer and T. Pfeil, “Deep learning with spiking neurons: Opportunities and challenges,” Frontiers in Neuroscience, vol. 12, 2018.
  • [2] P. Diehl and M. Cook, “Unsupervised learning of digit recognition using spike-timing-dependent plasticity,” Frontiers in Computational Neuroscience, vol. 9, p. 99, 2015.
  • [3] G. Srinivasan et al., “Spike timing dependent plasticity based enhanced self-learning for efficient pattern recognition in spiking neural networks,” in Proc. of IJCNN, May 2017, pp. 1847–1854.
  • [4] H. Hazan et al., “Unsupervised learning with self-organizing spiking neural networks,” in Proc. of IJCNN, July 2018, pp. 1–6.
  • [5] D. J. Saunders et al., “Locally connected spiking neural networks for unsupervised feature learning,” Neural Networks, vol. 119, 2019.
  • [6] R. V. W. Putra and M. Shafique, “Fspinn: An optimization framework for memory-efficient and energy-efficient spiking neural networks,” IEEE TCAD, vol. 39, no. 11, pp. 3601–3613, 2020.
  • [7] P. Panda et al., “Asp: Learning to forget with adaptive synaptic plasticity in spiking neural networks,” IEEE JETCAS, vol. 8, no. 1, March 2018.
  • [8] J. L. Lobo et al., “Spiking neural networks and online learning: An overview and perspectives,” Neural Networks, vol. 121, 2020.
  • [9] T. Lesort et al., “Continual learning for robotics: Definition, framework, learning strategies, opportunities and challenges,” Information Fusion, vol. 58, pp. 52–68, 2020.
  • [10] G. Anthes, “Lifelong learning in artificial neural networks,” Commun. ACM, vol. 62, no. 6, p. 13–15, May 2019.
  • [11] G. M. van de Ven et al., “Brain-inspired replay for continual learning with artificial neural networks,” Nature communications, vol. 11, no. 1, pp. 1–14, 2020.
  • [12] J. M. Allred and K. Roy, “Unsupervised incremental stdp learning using forced firing of dormant or idle neurons,” in Proc. of IJCNN, 2016, pp. 2492–2499.
  • [13] Z. Chen and B. Liu, “Lifelong machine learning,” Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 12, no. 3, 2018.
  • [14] G. I. Parisi et al., “Continual lifelong learning with neural networks: A review,” Neural Networks, vol. 113, pp. 54 – 71, 2019.
  • [15] M. McCloskey and N. J. Cohen, “Catastrophic interference in connectionist networks: The sequential learning problem,” Psychology of Learning and Motivation, vol. 24, pp. 109 – 165, 1989.
  • [16] J. Kirkpatrick et al., “Overcoming catastrophic forgetting in neural networks,” PNAS, vol. 114, no. 13, pp. 3521–3526, 2017.
  • [17] S.-W. Lee et al., “Overcoming catastrophic forgetting by incremental moment matching,” in Proc. of NIPS, 2017, pp. 4652–4662.
  • [18] S. G. Wysoski et al., “On-line learning with structural adaptation in a network of spiking neurons for visual pattern recognition,” in Proc. of ICANN, 2006, pp. 61–70.
  • [19] J. M. Allred and K. Roy, “Controlled forgetting: Targeted stimulation and dopaminergic plasticity modulation for unsupervised lifelong learning in spiking neural networks,” Frontiers in Neuroscience, vol. 14, p. 7, 2020.
  • [20] A. Tavanaei et al., “Deep learning in spiking neural networks,” Neural Networks, vol. 111, pp. 47–63, 2019.
  • [21] J. Gautrais and S. Thorpe, “Rate coding versus temporal order coding: a theoretical approach,” Biosystems, vol. 48, no. 1, pp. 57–65, 1998.
  • [22] C. Kayser et al., “Spike-phase coding boosts and stabilizes information carried by spatial and temporal spike patterns,” Neuron, vol. 61, no. 4, pp. 597 – 608, 2009.
  • [23] S. Thorpe and J. Gautrais, “Rank order coding,” in Computational neuroscience.   Springer, 1998, pp. 113–118.
  • [24] S. Park et al., “Fast and efficient information transmission with burst spikes in deep spiking neural networks,” in Proc. of DAC, 2019, p. 53.
  • [25] H. Hazan et al., “Bindsnet: A machine learning-oriented spiking neural networks library in python,” Frontiers in Neuroinformatics, vol. 12, 2018.
  • [26] S. Han et al., “Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding,” arXiv preprint arXiv:1510.00149, 2015.