AM-MTEEG: Multi-task EEG classification based on impulsive associative memory

Junyan Li, Bin Hu*, Zhi-Hong Guan

This work was supported in part by the National Natural Science Foundation of China under Fund 62322311. (*Corresponding author: B. Hu).

J. Li and B. Hu are with the School of Future Technology, South China University of Technology, and also with the Pazhou Lab, Guangzhou 510006, China. (E-mails: 202164690091@mail.scut.edu.cn; huu@scut.edu.cn)

Z.-H. Guan is with the School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China. (E-mail: zhguan@mail.hust.edu.cn)

Abstract

Electroencephalogram-based brain-computer interfaces (BCIs) have potential applications in various fields, but their development is hindered by limited data and significant cross-individual variability. Inspired by the principles of learning and memory in the human hippocampus, we propose a multi-task (MT) classification model, called AM-MTEEG, which combines learning-based impulsive neural representations with bidirectional associative memory (AM) for cross-individual BCI classification tasks. The model treats the EEG classification of each individual as an independent task and facilitates feature sharing across individuals. Our model consists of an impulsive neural population coupled with a convolutional encoder-decoder to extract shared features and a bidirectional associative memory matrix to map features to classes. Experimental results on two BCI competition datasets show that our model improves average accuracy compared to state-of-the-art models and reduces performance variance across individuals. The waveforms reconstructed by the bidirectional associative memory provide interpretability for the model’s classification results. The neuronal firing patterns in our model are highly coordinated, similar to the neural coding of hippocampal neurons, indicating that our model has biological similarities.

Index Terms:

Multi-task learning, brain-computer interface, electroencephalogram (EEG), bidirectional associative memory, impulsive neural network.

Figure 1: The overall architecture of the associative memory multi-task EEG (AM-MTEEG) model, which includes a convolutional encoder, an impulsive neural population, a bidirectional associative memory (BAM) module, and a decoder with transposed convolution. Both the convolution layer and the transposed convolution layer use a convolution kernel of length 5 and are activated using the ReLU function. In the experiment, we used 200 Leaky Integrate-and-Fire neurons as the impulsive neuron population. The encoder, impulsive neural population and decoder are trained by backpropagation (BP) with joint loss, and the BAM module is trained by Hebbian learning.

I Introduction

The brain-computer interface (BCI) can be defined as a system that translates a user’s brain activity patterns into messages or commands for interactive applications[1]. The brain-computer interface effectively exchanges information between the brain and physical devices and has broad application prospects in medical rehabilitation and neuroscience research[2]. Most of the current BCI data comes from neural electrical signals recorded by electroencephalogram (EEG), which enables researchers to measure and decode human brain activity. A classic BCI paradigm, such as motor imagery (MI), basically consists of five parts: EEG acquisition, EEG preprocessing, feature extraction, classification, and task execution[3]. The key step is extracting the features of EEG signals and classifying them. Although EEG data collection technology has become highly mature, the classification of EEG data is still constrained by the following problems.

• Since EEG data has large variability and the representation of neural activity changes over time[4], EEG classification models trained on one set of samples are difficult to generalize to other samples.

• Most EEG classification models have low cross-individual classification accuracy[5]; models are therefore only trained and tested on a single subject’s data[6], which leaves insufficient data for training the model.

• Because BCI experiments are time-consuming[7] and are limited by the energy and time of the subjects, the amount of EEG data collected from a single subject is relatively small.

Although data-driven deep learning (DL) has achieved remarkable achievements in image recognition, natural language processing, and other fields, the above problems limit its application in EEG signal classification. On the other hand, due to ethical and safety considerations, the medical field has placed higher requirements on the interpretability of deep learning models[8]. However, most current deep learning models have poor interpretability, which limits their application in BCI.

To address the problem of large variability in EEG data, we were inspired by Zheng et al.[9] and used multi-task learning (MTL) to model the EEG classification of each subject as a separate task, extract features common to the samples of different subjects to achieve cross-individual training, and map the features to the categories of each individual. In addition, to address the problems of limited EEG data and poor interpretability of deep learning models, we hybridized the deep learning model with an associative memory model, an abstract model inspired by the memory principles of the human brain. In contrast to deep learning, the human brain can learn effective features and make accurate mappings from very few demonstrations[10]. This ability of fast, few-shot learning comes from associative memory. The processes of learning and memory in the human brain can be categorized into three distinct phases[11]: encoding, storage, and retrieval. The brain structures responsible for encoding and retrieval have evolved over millions of years of species development and are refined through years of individual learning. In contrast, the neural architecture associated with the storage phase is established over a much shorter period.

In this work, we propose replacing the brain’s encoding mechanism with a deep learning encoder-decoder model trained across various samples. This model incorporates a layer of impulsive (also called spiking) neurons to encode the neural signals needed for associative memory formation. For each individual, we substitute the brain’s storage and retrieval functions with a bidirectional associative memory (BAM) matrix. This matrix enables the mapping between two modalities: impulsive signals and classification categories. Consequently, the original EEG signals can be reconstructed by decoding the impulsive signals associated with specific category labels, thereby enhancing the interpretability of the model’s classification process.

Our major contributions can be summarized as follows.

• To address the challenges posed by significant individual variability and limited data in EEG analysis, we propose a model that integrates deep learning-based impulsive encoding with a Hebbian learning bidirectional associative memory network for EEG data classification, particularly in the context of brain-computer interfaces. The encoder in our model is designed to capture shared features across different samples, while the associative memory network captures the variability among individual data. This work applies impulsive neural networks and bidirectional associative memory to multi-task learning in EEG signal processing.

• For any motor imagery EEG signal, the impulsive neurons in our model exhibit specific firing patterns characterized by high synchrony. This synchrony suggests that neuronal activity is governed by a low-dimensional latent manifold, a feature consistent with the neural coding mechanisms observed in hippocampal neurons. This alignment indicates the bio-inspired nature of our model, reflecting its biological plausibility.

• Our model achieved an average accuracy of 86% on the BCI Competition IV IIa dataset, surpassing existing state-of-the-art (SOTA) models and exhibiting minimal performance variance across different samples. Furthermore, the model can reconstruct EEG signals through the encoder-decoder framework. The EEG signals reconstructed by the bidirectional associative memory network can be compared with the event-related potentials (ERPs) of each category, thereby enhancing the interpretability of the model.

II Related Work and Study Motivation

In the classification of EEG data, particularly within the field of BCI applications, a common approach involves using common spatial patterns (CSP) for feature extraction[12], followed by classification algorithms such as linear discriminant analysis (LDA) and support vector machines (SVM). While CSP can significantly enhance EEG features, traditional machine learning models struggle to recognize more complex EEG patterns. Consequently, deep learning models have demonstrated greater flexibility in EEG classification for complex BCI tasks. For example, Lawhern et al.[5] proposed EEGNet, which applies separable two-dimensional convolutions to EEG classification problems. Liu et al.[13] combined the same spatiotemporal convolution with filter banks and proposed FBMSNet, which mixes deep convolution to extract temporal features at multiple scales and then performs spatial filtering to mitigate volume conduction. Similarly, Altaheri et al.[14] introduced ATCNet, a convolutional neural network with temporal attention mechanisms for EEG classification. This model achieved an average accuracy of 85.4% on the BCI Competition IV IIa dataset, setting a new state of the art on this dataset.

Multi-task learning optimizes multiple loss functions simultaneously, allowing different tasks to share the same features. Compared to single-task learning models, MTL leverages more data from different tasks[15], enabling the learning of more generalized representations. Moreover, MTL addresses the challenges of high individual variability and limited sample size in EEG data. Zheng et al.[9] developed an effective algorithm in which each subject’s samples are treated as a separate task, utilizing regularized tensors. This method employs Fisher’s discriminant criterion for feature selection and is optimized using the alternating direction method of multipliers (ADMM). In addition to MTL, ensemble learning can also significantly reduce the effect of EEG variability. For example, Yu et al.[16] used a dynamic ensemble Bayesian filter to assemble models. The ensemble models can cope with variability in signals and improve the robustness of online control.

Spiking neural networks (SNNs), inspired by biological neural systems, offer advantages such as low power consumption and high interpretability, making them widely applicable across various tasks. Diehl et al.[17] implemented an SNN using unsupervised STDP learning for handwritten digit classification, achieving 95% accuracy on the MNIST dataset. Xu et al.[18] proposed a spiking convolutional neural network (SCNN) for electromyography (EMG) pattern recognition, which can be used in prosthesis control and human-computer interaction. The effectiveness of SNNs in multi-task learning has also been demonstrated. For instance, Cachi et al.[19] proposed TM-SNN, which uses different spiking thresholds to represent different tasks while sharing the same structure and parameters across tasks.

Models that integrate Hebbian associative memory neural networks with deep learning have been shown to enhance performance across a range of tasks. Hu et al.[20] proposed that spiking neural networks using Hebbian learning can provide stable and fault-tolerant associative memory. Miconi et al.[21] combined Hebbian rule-based associative memory with traditional backpropagation neural networks, achieving efficient learning on small-sample image datasets. Building on this, Wu et al.[22] applied a similar structure to spiking neural networks, where the network weights are updated through both global learning (backpropagation) and local learning (Hebbian learning). This hybrid approach has demonstrated remarkable performance in tasks such as fault-tolerant learning, few-shot learning, and continual learning.

Despite advancements in BCI classification models utilizing neural manifolds and multi-task learning to reduce inter-individual variability, challenges remain in feature encoding clarity, interpretability, and consistent performance across individuals. These limitations hinder the widespread adoption of BCI technology in fields such as medical research and rehabilitation. To address these issues, our model integrates deep learning-based SNN with BAM networks. The BAM framework enhances the interpretability of BCI performance across different individuals, while the shared features extracted through multi-task learning improve classification stability across individuals.

III The AM-MTEEG Model

As illustrated in Fig.1, our proposed multi-task learning model consists of a spiking encoder and an associative memory classifier. The spiking encoder utilizes a one-dimensional convolutional neural network to extract signal features, which are then fed into a population of spiking neurons, encoding the input into low-dimensional spiking representations. A convolutional neural network decoder is employed to reconstruct the EEG signals. The associative memory classifier assigns an associative memory matrix to each classification task, mapping the encoded spikes to multi-task categories.

The training of the multi-task learning model is divided into two phases. In the first phase, we combine self-supervised learning with label-guided training to optimize the encoder-decoder model. Multi-channel EEG signals are used simultaneously as both input and target, training the spiking encoder to reconstruct the one-dimensional EEG signals. In the second phase, we freeze the parameters of the spiking encoder and train the corresponding associative memory matrices for different tasks. Here, we represent the input and category labels as input-output pattern pairs $\{\mathbf{x}_{i},\mathbf{y}_{i}\}$ , where $\mathbf{x}_{i}\in R^{n\times t}$ is the low-dimensional spiking representation vector produced by the spiking encoder, with $n$ representing the number of neurons in the population and $t$ denoting the time series length after CNN encoding. $\mathbf{y}_{i}$ is the target vector where the labels are one-hot encoded. The associative memory matrix matches the input patterns to the corresponding output patterns by forming bidirectional hetero-associative memory.

III-A Convolutional Feature Extractor

As shown in Fig.1, our convolutional feature extractor comprises an encoder $E$ and a decoder $D$ built from one-dimensional convolutions, where the convolution kernel length of each convolutional layer is 5 and the ReLU function is used for activation. Unlike existing EEG classification models based on 2D convolution[5, 14], we use only 1D convolution for feature extraction, so that EEG data can be classified while preserving the structure of the original EEG signal as much as possible. As a result, the number of convolution kernel parameters to be trained is reduced from $c\times n^{2}$ to $c\times n$ , where $c$ is the number of signal channels and $n$ is the convolution kernel size, which allows us to use larger convolution kernels. In the motor imagery task, we used the CNN model architecture shown in Table I. The input signal $\mathbf{x}\in R^{c\times t}$ is downsampled to 1/4 of the original length by two one-dimensional max-pooling layers in the encoder to obtain the hidden signal

$$\mathbf{h}=E(\mathbf{x}),\quad\mathbf{h}\in R^{n\times t/4}. \tag{1}$$

The encoded signal is passed through the fully connected layer and fed directly into the spiking neurons as input current, and the spike sequence $\mathbf{S}_{p}\in R^{n\times t/4}$ emitted by the neurons is recorded. In the decoder stage, the hidden layer spikes are mapped by the fully connected layer and then upsampled to the original length by two one-dimensional transposed convolutions (deconvolutions).

$$\mathbf{x}^{\prime}=D(\mathbf{S}_{p}),\quad\mathbf{x}^{\prime}\in R^{c\times t}. \tag{2}$$

In this process, the encoder-decoder model and the impulsive neural population obtain a low-dimensional representation of EEG activity through self-supervised reconstruction learning.

TABLE I: The size of each module output signal

Blocks	Layer	$N_{conv}$	Size	Stride	Activation	Output
Encoder	Input		$(C,T)$			$(C,T)$
	Conv1D Block	3	5	1	ReLU	$(128,T)$
	AvgPool1D		2	2		$(128,T/2)$
	Conv1D Block	5	5	1	ReLU	$(256,T/2)$
	AvgPool1D		2	2		$(256,T/4)$
	Conv1D Block	5	5	1	ReLU	$(256,T/4)$
Neural Population	FC		$(256,200)$			$(200,T/4)$
	LIF Neurons		200			$(200,T/4)$
	FC		$(200,256)$			$(256,T/4)$
Decoder	Conv1D Block	3	5	1	ReLU	$(128,T/4)$
	ConvTranspose1D		8	2		$(128,T/2)$
	Conv1D Block	5	5	1	ReLU	$(128,T/2)$
	ConvTranspose1D		8	2		$(128,T)$
	Conv1D Block	5	5	1	ReLU	$(C,T)$
Associative Memory Classifier	AMM		$(200\times T/4,N_{class})$			$N_{class}$
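
The output sizes in Table I can be traced with simple length arithmetic. The sketch below is illustrative only. The padding values (2 for the kernel-5 convolutions, 3 for the kernel-8 transposed convolutions) and the function names are assumptions chosen so that the lengths match the table; they are not stated in the text. The example input of 118 channels by 300 samples corresponds to 118 channels recorded at 100 Hz over 3 seconds.

```python
def conv1d_len(length, kernel=5, stride=1, padding=2):
    """Output length of a 1D convolution (dilation 1). Padding 2 is an assumption."""
    return (length + 2 * padding - kernel) // stride + 1


def pool1d_len(length, kernel=2, stride=2):
    """Output length of a 1D pooling layer without padding."""
    return (length - kernel) // stride + 1


def conv_transpose1d_len(length, kernel=8, stride=2, padding=3):
    """Output length of a 1D transposed convolution (dilation 1, no output padding).
    Padding 3 is an assumption that makes the layer exactly double the length."""
    return (length - 1) * stride - 2 * padding + kernel


def trace_shapes(channels, length):
    """Return (stage, (channels, length)) pairs following the rows of Table I."""
    stages = []
    t = length
    stages.append(("input", (channels, t)))
    t = conv1d_len(t)
    stages.append(("encoder conv block 1", (128, t)))
    t = pool1d_len(t)
    stages.append(("encoder pool 1", (128, t)))
    t = conv1d_len(t)
    stages.append(("encoder conv block 2", (256, t)))
    t = pool1d_len(t)
    stages.append(("encoder pool 2", (256, t)))
    t = conv1d_len(t)
    stages.append(("encoder conv block 3", (256, t)))
    stages.append(("LIF population", (200, t)))   # FC 256 -> 200, then 200 LIF neurons
    stages.append(("FC back to 256", (256, t)))
    t = conv1d_len(t)
    stages.append(("decoder conv block 1", (128, t)))
    t = conv_transpose1d_len(t)
    stages.append(("decoder upsample 1", (128, t)))
    t = conv1d_len(t)
    stages.append(("decoder conv block 2", (128, t)))
    t = conv_transpose1d_len(t)
    stages.append(("decoder upsample 2", (128, t)))
    t = conv1d_len(t)
    stages.append(("decoder output", (channels, t)))
    return stages


shapes = dict(trace_shapes(channels=118, length=300))
for name, shape in shapes.items():
    print(f"{name:22s} {shape}")
```

Under these assumptions the spike representation has size (200, 75), so the associative memory matrix in the last row of the table would map a vector of length 200 × 75 to the class scores.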

III-B Impulsive neural population

We use the encoder output signal as a current $\mathbf{I}$ , feed it into the Leaky Integrate-and-Fire (LIF) neuron population, and convert it into a discrete spike train $\mathbf{s}$ . The LIF membrane potential $u$ evolves over time $t$ as

$$\tau\frac{du}{dt}=-u+RI(t), \tag{3}$$

when the membrane potential is greater than $u_{th}$ , the neuron generates a spike, and the membrane potential is set to 0. To facilitate computer simulation, we use its discrete-time (difference) form

$$
\begin{aligned}
u^{t}&=(1-\tau)u^{t-1}-s^{t-1}u_{th}+\sum_{i}(w_{i}I_{i}^{t-1}),\\
s^{t}&=step(u^{t}-u_{th}),
\end{aligned}
\tag{4}
$$

where $\tau$ is the decay constant, $w_{i}$ is the synaptic weight of synapse $i$ , $s^{t}\in\{0,1\}$ is the spike fired at time $t$ , and $I_{i}^{t-1}$ represents the input current of synapse $i$ at time $t-1$ . When the membrane potential reaches $u_{th}$ , the neuron generates a spike, and the reset is applied at the next time step $t+1$ through the term $-s^{t-1}u_{th}$ . As shown in eq. 4, this process uses the unit step function

$$step(x)=\begin{cases}1&x\geq 0,\\ 0&x<0.\end{cases}$$
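
The discrete update in eq. 4 can be simulated directly. The following is a minimal NumPy sketch, not the authors' implementation: the function name `lif_population`, the parameter values `tau=0.5` and `u_th=1.0`, and the toy constant input are assumptions made for illustration.

```python
import numpy as np


def lif_population(currents, weights, tau=0.5, u_th=1.0):
    """Simulate eq. 4 for a population of LIF neurons.

    currents: array (n_inputs, T), input current I_i^t for each synapse and step.
    weights:  array (n_neurons, n_inputs), synaptic weights w_i of each neuron.
    Returns a binary spike array of shape (n_neurons, T).
    tau and u_th are illustrative values, not taken from the paper.
    """
    n_neurons = weights.shape[0]
    n_inputs, n_steps = currents.shape
    u = np.zeros(n_neurons)            # membrane potential u^{t-1}
    s = np.zeros(n_neurons)            # spikes s^{t-1}
    prev_input = np.zeros(n_inputs)    # I^{t-1}, taken as zero before the first step
    spikes = np.zeros((n_neurons, n_steps))
    for t in range(n_steps):
        # Leak, reset by u_th after a spike, and integrate the weighted input.
        u = (1.0 - tau) * u - s * u_th + weights @ prev_input
        s = (u >= u_th).astype(float)  # step(u^t - u_th)
        spikes[:, t] = s
        prev_input = currents[:, t]
    return spikes


# One neuron driven by a constant current of 0.6 through a unit weight.
currents = np.full((1, 8), 0.6)
weights = np.ones((1, 1))
spikes = lif_population(currents, weights)
print(spikes)  # spikes occur at steps 3 and 7
```

With these toy values the membrane potential needs three input steps to cross the threshold, which gives a regular firing pattern.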

However, when training the encoder using backpropagation, calculating the gradient of the step function poses a challenge. Since the step function is discontinuous, its derivative is the impulse function

$$\delta(x)=\begin{cases}+\infty&x=0,\\ 0&x\neq 0.\end{cases}$$

Therefore the gradient of membrane potential is

$$
\begin{aligned}
\nabla u^{t-1}&=\frac{\partial u^{t}}{\partial u^{t-1}}\nabla u^{t}+\frac{\partial s^{t-1}}{\partial u^{t-1}}\nabla s^{t-1}\\
&=(1-\tau)\nabla u^{t}+\delta(u^{t}-u_{th})\nabla s^{t-1}\\
&=\nabla u^{t}[(1-\tau)+u_{th}\delta(u^{t}-u_{th})].
\end{aligned}
\tag{5}
$$

where $\delta(u^{t}-u_{th})$ makes it difficult to train the encoder. As shown in Figure 2, we use the surrogate gradient method[23] to replace the impulse function with a rectangular window function

$$rect(x)=\begin{cases}1&|x|\leq 0.5\\ 0&|x|>0.5\end{cases}. \tag{6}$$

The membrane potential gradient using the surrogate gradient is

$$
\begin{aligned}
\nabla u^{t-1}&=\frac{\partial u^{t}}{\partial u^{t-1}}\nabla u^{t}+(\frac{\partial s^{t-1}}{\partial u^{t-1}})^{\prime}\nabla s^{t-1}\\
&=(1-\tau)\nabla u^{t}+rect(u^{t}-u_{th})\nabla s^{t-1}\\
&=\nabla u^{t}[(1-\tau)+u_{th}rect(u^{t}-u_{th})].
\end{aligned}
\tag{7}
$$
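
The surrogate gradient idea can be illustrated with a few lines of NumPy: the forward pass keeps the step function, while the backward pass uses the rectangular window of eq. 6 in place of the impulse. This is a sketch under stated assumptions; the helper name `membrane_grad` and the values `tau=0.5` and `u_th=1.0` are hypothetical, and the function follows the second line of eq. 7 for a single backward step.

```python
import numpy as np


def step(x):
    """Forward spike function: 1 where x >= 0, else 0."""
    return (np.asarray(x) >= 0).astype(float)


def rect(x):
    """Rectangular surrogate of eq. 6: 1 where |x| <= 0.5, else 0."""
    return (np.abs(np.asarray(x)) <= 0.5).astype(float)


def membrane_grad(grad_u_next, grad_s_prev, u, tau=0.5, u_th=1.0):
    """One backward step following the second line of eq. 7.

    grad_u_next: gradient with respect to u^t.
    grad_s_prev: gradient with respect to s^{t-1}.
    u:           membrane potential at which the surrogate is evaluated.
    tau and u_th are illustrative values, not taken from the paper.
    """
    return (1.0 - tau) * grad_u_next + rect(u - u_th) * grad_s_prev


surrogate = rect(np.array([-1.0, -0.5, 0.0, 0.5, 1.0]))
near_threshold = membrane_grad(grad_u_next=1.0, grad_s_prev=2.0, u=1.2)
far_from_threshold = membrane_grad(grad_u_next=1.0, grad_s_prev=2.0, u=2.0)
print(surrogate, near_threshold, far_from_threshold)
```

The spike gradient only flows when the membrane potential lies within 0.5 of the threshold; elsewhere only the leak term contributes.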

As illustrated in Fig.1, at each time step $t$ , the signal processed by the convolutional neural network is treated as an input current, which is passed through a fully connected layer into the LIF neurons in the hidden layer. The hidden layer generates spikes, and these spike sequences encode essential information from the original EEG signal. Subsequently, we employ a self-supervised approach, using a convolutional neural network decoder to reconstruct the EEG signal. To enable the model to pre-classify the input EEG while performing signal reconstruction, we connect the hidden layer neurons to an auxiliary classifier, implemented as a trainable fully connected layer. We decompose the training loss function into a reconstruction loss $L_{reg}$ and a classification loss $L_{cls}$ , where the reconstruction loss is the mean squared error (MSE), i.e. $L_{reg}(\mathbf{x},\mathbf{x^{\prime}})=\frac{1}{n}\sum_{i}(x_{i}-x_{i}^{\prime})^{2}$ , and the classification loss is the cross-entropy loss, i.e. $L_{cls}(\mathbf{x},label)=-\sum_{i}label_{i}\log(x_{i})$ , where $x_{i}$ in $L_{cls}$ denotes the output of the auxiliary classifier for class $i$ . The joint loss is

$$L=L_{reg}+\lambda L_{cls}, \tag{8}$$

where $\lambda$ is a manually set mixing factor, which we set to 0.1 here, and $L_{cls}$ is calculated from the hidden layer spikes through the auxiliary classifier.
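
The joint loss of eq. 8 can be written out as follows. This is a minimal sketch on toy arrays: the function names are hypothetical, the small constant `eps` is an added numerical safeguard that is not part of the paper, and `probs` is assumed to be the probability vector produced by the auxiliary classifier.

```python
import numpy as np


def reconstruction_loss(x, x_rec):
    """Mean squared error between the input and its reconstruction (L_reg)."""
    return float(np.mean((x - x_rec) ** 2))


def classification_loss(probs, label_onehot, eps=1e-12):
    """Cross-entropy between predicted class probabilities and a one-hot label (L_cls).
    eps avoids log(0) and is an assumption of this sketch."""
    return float(-np.sum(label_onehot * np.log(probs + eps)))


def joint_loss(x, x_rec, probs, label_onehot, lam=0.1):
    """Eq. 8: L = L_reg + lambda * L_cls, with lambda = 0.1 as in the text."""
    return reconstruction_loss(x, x_rec) + lam * classification_loss(probs, label_onehot)


x = np.array([1.0, 2.0])
x_rec = np.array([1.0, 4.0])
probs = np.array([0.5, 0.5])
label = np.array([1.0, 0.0])
total = joint_loss(x, x_rec, probs, label)
print(total)  # 2.0 + 0.1 * ln(2)
```

With the small mixing factor the reconstruction term dominates, so the classification term acts as a weak guide on the spike representation.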

III-C Associative Memory Classifier

To train efficiently and classify accurately on a physiological electrical signal dataset with a small amount of data, we use a bidirectional associative memory method to map the spike activity to the label. Let the input-output pattern pair be $\{\mathbf{x}_{k},\mathbf{y}_{k}\}$ , where $\mathbf{x}_{k}\in R^{n}$ is the input column vector and $\mathbf{y}_{k}\in R^{m}$ is the one-hot output vector. In the memory retrieval stage, the iterative process of the input-output pattern pair $\{\mathbf{x}_{k},\mathbf{y}_{k}\}$ using the associative memory matrix (AMM) $\mathbf{W}_{k}$ is[24]

$$
\begin{aligned}
\mathbf{y}_{k}^{t+1}&=sgn(\mathbf{W}_{k}\mathbf{x}_{k}^{t}),\\
\mathbf{x}_{k}^{t+1}&=sgn(\mathbf{W}_{k}^{T}\mathbf{y}_{k}^{t}),
\end{aligned}
\tag{9}
$$

where $sgn(x)=\begin{cases}-1&x\leq 0\\ +1&x>0\end{cases}$ . Omitting the $sgn$ function, we write eq. 9 element-wise as

$$
\begin{aligned}
y_{t+1}^{i}&=\sum_{j=1}^{n}w^{ij}x_{t}^{j},\\
x_{t+1}^{i}&=\sum_{j=1}^{m}w^{ji}y_{t}^{j}.
\end{aligned}
\tag{10}
$$

Its continuous-time (differential) form is

$$
\begin{aligned}
\frac{dy^{i}}{dt}&=\sum_{j=1}^{n}w^{ij}x^{j},\\
\frac{dx^{j}}{dt}&=\sum_{i=1}^{m}w^{ij}y^{i}.
\end{aligned}
\tag{11}
$$

When the associative memory system is stable, $y^{t+1}=y^{t},x^{t+1}=x^{t}$ , so the above process in effect minimizes the energy function of the system[25]

$$E=-\mathbf{y}^{T}\mathbf{Wx}. \tag{12}$$

We can write it as

$$E=\sum_{i}\sum_{j}-y^{i}w^{ij}x^{j}. \tag{13}$$

Therefore $\frac{\partial E}{\partial w^{ij}}=-y^{i}x^{j}$ , and when $w^{ij}=sgn(y^{i}x^{j})$ , the energy function is minimum. During the training stage, the associative memory matrix $\mathbf{W}_{k}$ of the task $k$ is

$$\mathbf{W}_{k}=\sum_{j}\mathbf{y}_{k}^{j}\mathbf{x}_{k}^{jT}. \tag{14}$$

From the above, we can see that bidirectional associative memory is a type of Hebbian learning: when the pre- and post-synaptic neurons emit spikes at the same time, the synaptic connection is strengthened. This process is similar to the synaptic learning mechanism of hippocampal neurons[26].
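
The Hebbian rule of eq. 14 amounts to a sum of outer products and needs no gradient descent. The sketch below builds the matrix for two toy pattern pairs and performs one forward recall as in eq. 9. The function names and the four-dimensional binary patterns are assumptions for illustration; in the model the inputs would be flattened spike representations.

```python
import numpy as np


def sgn(v):
    """Sign function as defined in the text: -1 for v <= 0, +1 for v > 0."""
    return np.where(np.asarray(v) > 0, 1, -1)


def train_amm(X, Y):
    """Eq. 14: W = sum_j y^j (x^j)^T.

    X: array (N, n) of input patterns, one per row.
    Y: array (N, m) of one-hot output patterns, one per row.
    Returns W with shape (m, n).
    """
    return Y.T @ X


# Two toy spike patterns (rows of X) with their one-hot labels (rows of Y).
X = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]])
Y = np.array([[1, 0],
              [0, 1]])

W = train_amm(X, Y)
y_recalled = sgn(W @ X[0])   # forward recall of the first pattern
print(W)
print(y_recalled)
```

Because training is a single pass of additions, a new pattern pair can be stored by adding its outer product to the existing matrix.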

Next, we will prove the convergence of the system. Let $\dot{x}^{j},\dot{y}^{i}$ be the time derivatives of $x^{j},y^{i}$ . Then the rate of change of the energy function E is

$$\dot{E}=\sum_{j}\frac{\partial E}{\partial x^{j}}\dot{x}^{j}+\sum_{i}\frac{\partial E}{\partial y^{i}}\dot{y}^{i}. \tag{15}$$

From eq. 13, we know

$$
\begin{aligned}
\frac{\partial E}{\partial x^{j}}&=\sum_{i}-y^{i}w^{ij},\\
\frac{\partial E}{\partial y^{i}}&=\sum_{j}-x^{j}w^{ij}.
\end{aligned}
\tag{16}
$$

Substituting eq. 16 and eq. 11 into eq. 15, we obtain

$$\dot{E}=-\sum_{j}(\sum_{i}w^{ij}y^{i})^{2}-\sum_{i}(\sum_{j}w^{ij}x^{j})^{2}, \tag{17}$$

where

$$
\begin{aligned}
(\sum_{i}w^{ij}y^{i})^{2}&\geq 0,\\
(\sum_{j}w^{ij}x^{j})^{2}&\geq 0.
\end{aligned}
\tag{18}
$$

Therefore, $\dot{E}\leq 0$ always holds, so the dynamics of the system never increase $E$ . Because the $sgn$ function keeps the states bounded, $E$ is bounded below, and the system gradually converges to a stable value.
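
The sign of the energy rate can also be checked numerically. The sketch below evaluates eq. 15 with the gradients of eq. 16 and the dynamics of eq. 11 on random values and compares the result with the closed form of eq. 17. The random matrix, the seed, and the function names are assumptions made purely for illustration.

```python
import numpy as np


def energy(W, x, y):
    """Energy function of eq. 12: E = -y^T W x."""
    return float(-y @ W @ x)


def energy_rate(W, x, y):
    """Eq. 15, using the gradients of eq. 16 and the dynamics of eq. 11."""
    dE_dx = -W.T @ y      # partial E / partial x^j
    dE_dy = -W @ x        # partial E / partial y^i
    x_dot = W.T @ y       # dx^j/dt
    y_dot = W @ x         # dy^i/dt
    return float(dE_dx @ x_dot + dE_dy @ y_dot)


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
x = rng.normal(size=5)
y = rng.normal(size=3)

rate = energy_rate(W, x, y)
# Closed form of eq. 17: minus the sum of two squared norms.
closed_form = float(-np.sum((W.T @ y) ** 2) - np.sum((W @ x) ** 2))
print(rate, closed_form)
```

Both values agree and are non-positive for any choice of the matrix and state vectors, since each term in eq. 17 is a negated square.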

Since we use one-hot encoded $\mathbf{y}^{j}$ as the output pattern, after applying the $sgn$ function, only the maximum value is set to 1, while the remaining values are set to -1. This system stabilizes after a single iteration. Therefore, during the testing phase, for a given task sample $\mathbf{x}_{i}$ , the classification result is obtained using the associative memory matrix

$$label^{i}=\mathop{\arg\max}(\mathbf{W}^{i}\mathbf{x}^{i}). \tag{19}$$

Considering all the pattern pairs $\{\mathbf{x}_{k},\mathbf{y}_{k}\}$ to be stored, for any input $\mathbf{x}_{i}$ in the prediction phase

$$\mathbf{y}_{i}=\sum_{k}\mathbf{y}_{k}\mathbf{x}_{k}^{T}\mathbf{x}_{i},$$

this process is equivalent to computing a weighted sum of the stored outputs $\mathbf{y}_{k}$ , where the weight of each stored pair is the similarity $\mathbf{x}_{k}^{T}\mathbf{x}_{i}$ between the current input $\mathbf{x}_{i}$ and the stored input $\mathbf{x}_{k}$ (the cosine similarity up to the norms of the two vectors).
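
The similarity-weighted view of prediction can be made concrete with a small example. The sketch below computes the class scores as a sum of stored labels weighted by inner products and takes the largest score as the class, consistent with the one-hot targets. The function names and the toy patterns are assumptions for illustration only.

```python
import numpy as np


def predict_scores(X_stored, Y_stored, x_new):
    """Class scores y = sum_k y_k (x_k^T x_new).

    X_stored: array (N, n) of stored input patterns.
    Y_stored: array (N, m) of stored one-hot labels.
    x_new:    array (n,), the input to classify.
    """
    similarities = X_stored @ x_new        # inner products x_k^T x_new
    return Y_stored.T @ similarities       # labels weighted by similarity


def classify(X_stored, Y_stored, x_new):
    """Return the index of the class with the largest score."""
    return int(np.argmax(predict_scores(X_stored, Y_stored, x_new)))


X_stored = np.array([[1, 1, 0, 0],
                     [1, 0, 1, 0],
                     [0, 0, 1, 1]])
Y_stored = np.array([[1, 0],
                     [1, 0],
                     [0, 1]])
x_new = np.array([1, 1, 1, 0])

scores = predict_scores(X_stored, Y_stored, x_new)
predicted = classify(X_stored, Y_stored, x_new)
print(scores, predicted)  # scores [4 1], class 0
```

The same scores are obtained by first forming the associative memory matrix from the stored pairs and then multiplying it by the new input, which is why a single matrix product suffices at test time.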

IV Experimental Results

We applied our model to the classic motor imagery BCI paradigm, in which users’ EEG signals are classified into different motor actions. In this study, we used two BCI Competition datasets, III IVa and IV IIa, and achieved average accuracies of 94% and 86%, respectively.

IV-A Experimental datasets

BCI Competition III IVa[27]

This dataset is a binary classification dataset, including right-hand and foot movement imagery tasks performed by 5 subjects. Each task includes 118 channels of EEG signals obtained at a sampling rate of 100 Hz within 3 seconds.

BCI Competition IV IIa[28]

This dataset is a four-category dataset, including motor imagery tasks of the left hand, right hand, feet, and tongue performed by 9 subjects. Each task includes 22 channels of EEG signals and 3 channels of EOG signals obtained at a sampling rate of 250 Hz within 3 seconds.

IV-B Performance evaluation

Comparative studies

As shown in Table II and Table III, we compared our proposed model with other models. Compared to existing SOTA models, our model achieved comparable accuracy and surpassed the current SOTA model in terms of average accuracy on the BCI Competition IV IIa dataset. In contrast to other multi-task models, our proposed model exhibited the smallest standard deviation in accuracy across different individuals, indicating its ability to provide stable classification performance across individuals. Additionally, when extending to new tasks, our model only requires retraining the associative memory matrix, and the Hebbian-like learning used in this process is highly efficient, demonstrating good scalability.

TABLE II: BCI Competition IV IIa comparison experiment accuracy

Tasks	Zheng et al.[9]	DMTL-BCI[29]	EEGNet (Non-MTL)[5]	ATCNet (Non-MTL, SOTA)[14]	Ours
1	0.840	0.835	0.858	0.885	0.810
2	0.573	0.490	0.615	0.705	0.793
3	0.549	0.927	0.886	0.976	0.879
4	0.959	0.670	0.749	0.810	0.831
5	0.912	0.713	0.559	0.830	0.948
6	0.826	0.637	0.521	0.736	0.897
7	0.792	0.808	0.896	0.931	0.862
8	0.835	0.800	0.833	0.903	0.879
9	0.819	0.817	0.795	0.910	0.844
AVG	0.790	0.753	0.745	0.854	0.860
STD	0.131	0.120	0.139	0.086	0.045
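
The summary rows of Table II can be reproduced from the per-subject values. The sketch below does this for the "Ours" column. The table does not state which standard deviation is reported; using the population standard deviation (`ddof=0`) is an assumption that happens to match the printed value.

```python
import numpy as np

# Per-subject accuracies of the "Ours" column in Table II (subjects 1 to 9).
ours_iv_iia = np.array([0.810, 0.793, 0.879, 0.831, 0.948,
                        0.897, 0.862, 0.879, 0.844])

mean_acc = float(np.mean(ours_iv_iia))
std_acc = float(np.std(ours_iv_iia))   # population standard deviation (ddof=0)

print(f"AVG = {mean_acc:.3f}, STD = {std_acc:.3f}")  # AVG = 0.860, STD = 0.045
```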

TABLE III: BCI Competition III IVa comparison experiment accuracy

Tasks	EEGNet (Non-MTL)[5]	EDPNet (Non-MTL)[30]	STL-Overlap[31]	Zheng et al.[9]	Ours
aa	1.000	1.000	0.857	0.911	0.958
al	0.688	0.884	0.982	1.000	0.975
av	0.582	0.704	0.643	0.768	0.838
aw	0.795	0.835	0.964	1.000	0.975
ay	0.516	0.679	0.911	0.929	0.950
AVG	0.716	0.820	0.871	0.921	0.942
STD	0.176	0.118	0.122	0.084	0.052

Ablation studies

We evaluated our proposed model on the BCI Competition III IVa dataset and compared it against two alternative models: (a) a model in which the spiking neurons were replaced with the tanh function, and (b) a model in which the associative memory matrix was replaced by a fully connected layer trained via gradient descent. The results, shown in Fig.3, demonstrate that the original model outperformed the simplified models in terms of accuracy on most samples. These findings suggest that both the spiking computation and the bidirectional associative memory classifier used in our model contribute to improved performance.

IV-C Model interpretability

Because bidirectional associative memory is reversible, we can input a classification label into the associative memory matrix to obtain the characteristic spike sequence corresponding to any category

$$x_{i}^{label}=sgn(\mathbf{W}_{i}^{T}\mathbf{y}^{label}). \tag{20}$$
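
Reverse retrieval as in eq. 20 only requires the transpose of the associative memory matrix. The sketch below recovers a class prototype from a one-hot label. The function name `class_prototype` and the small example matrix are hypothetical and serve only to illustrate the direction of the mapping.

```python
import numpy as np


def sgn(v):
    """Sign function as defined in the text: -1 for v <= 0, +1 for v > 0."""
    return np.where(np.asarray(v) > 0, 1, -1)


def class_prototype(W, class_index):
    """Eq. 20: characteristic pattern x^label = sgn(W^T y^label).

    W:           associative memory matrix of shape (m, n).
    class_index: index of the class whose prototype is retrieved.
    """
    y = np.zeros(W.shape[0])
    y[class_index] = 1.0          # one-hot label vector
    return sgn(W.T @ y)


# Toy matrix for two classes and four input units.
W = np.array([[2, 1, 0, 0],
              [0, 0, 1, 3]])

prototype_0 = class_prototype(W, 0)
prototype_1 = class_prototype(W, 1)
print(prototype_0, prototype_1)
```

Entries equal to +1 mark the units associated with the class, and the resulting pattern is what would be passed to the decoder to obtain a characteristic waveform.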

The spiking sequences obtained through the bidirectional associative memory on the BCI Competition III IVa dataset are shown in Fig. 5. It can be observed that the spikes from most individual neurons represent a fixed category, and these spikes exhibit a high degree of synchrony. This suggests that the spiking neurons in the hidden layer share similarities with the neural population coding observed in the hippocampus of the human brain[32].

We then used the decoder to reconstruct the original EEG data corresponding to the labels in the BCI Competition IV IIa dataset, obtaining characteristic waveforms for the four motor imagery categories. As shown in Fig.4a, the reconstructed EEG signals reveal distinct waveforms for each of the four categories. When comparing these waveforms with the actual ERP from the real data, as illustrated in Fig.4b, we observe a similarity between the reconstructed waveforms and the ERP. The greater the similarity between these two waveforms, the higher the confidence in the model’s correct classification.

V Discussion

Current research on multi-task learning for EEG classification is limited, but its effectiveness in cross-subject classification on highly variable EEG signals has been demonstrated. Our proposed method enables rapid adaptation to new individuals once the encoder has been trained, a capability derived from the bidirectional associative memory network. For brain-computer interface applications, our model can be fitted to new individuals using only a small number of samples, which facilitates the interaction of BCI devices across different users.

With the advent of neuromorphic computing, specialized chips designed for spiking neural networks have been developed[33, 34]. In contrast to CPUs and GPUs based on the Von Neumann architecture, neuromorphic circuits can be implemented using in-memory computing digital circuits or memristor-based analog circuits. These circuits offer advantages such as high parallel efficiency, low energy consumption, and compact size. Running spiking neural populations on such neuromorphic circuits makes deployment in edge scenarios possible, promoting the home use and miniaturization of BCI devices.

Although initially designed to address single-task cross-individual EEG classification, our multi-task learning model can also be applied to single-individual multi-task or multi-individual multi-task scenarios when tasks are correlated. Therefore, our model holds promise for achieving multimodal EEG decoding.

VI Conclusion

This article has developed AM-MTEEG, a multi-task EEG classification model based on impulsive associative memory. This model integrates impulsive neural representations from deep learning with bidirectional associative memory networks to address challenges in the brain-computer interface field, such as limited EEG data, high variability, and the lack of interpretability in deep learning models. Through the multi-task learning framework, our model treats each subject’s classification as an independent task and leverages cross-subject training to extract features shared across individuals. Our model demonstrated superior performance on the BCI Competition datasets, achieving accuracy that surpasses existing state-of-the-art models on the BCI Competition IV IIa dataset, while minimizing classification performance variance across different samples. These results validate our model’s effectiveness in extracting common EEG features, capturing data variability, and handling cross-subject classification tasks. Future work will focus on further integrating associative memory networks with deep impulsive neural networks, as well as exploring the joint learning of Hebbian rules and gradient descent.

References

[1] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, and F. Yger, “A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update,” Journal of neural engineering, vol. 15, no. 3, p. 031005, 2018.
[2] M. A. Lebedev and M. A. Nicolelis, “Brain-machine interfaces: from basic science to neuroprostheses and neurorehabilitation,” Physiological reviews, vol. 97, no. 2, pp. 767–837, 2017.
[3] F. Lotte, L. Bougrain, and M. Clerc, “Electroencephalography (EEG)-based brain-computer interfaces,” Wiley encyclopedia of electrical and electronics engineering, p. 44, 2015.
[4] A. D. Degenhart, W. E. Bishop, E. R. Oby, E. C. Tyler-Kabara, S. M. Chase, A. P. Batista, and B. M. Yu, “Stabilization of a brain–computer interface via the alignment of low-dimensional spaces of neural activity,” Nature biomedical engineering, vol. 4, no. 7, pp. 672–685, 2020.
[5] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, “EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces,” Journal of neural engineering, vol. 15, no. 5, p. 056013, 2018.
[6] X. Hu, J. Chen, F. Wang, and D. Zhang, “Ten challenges for EEG-based affective computing,” Brain Science Advances, vol. 5, no. 1, pp. 1–20, 2019.
[7] X. Shen, X. Liu, X. Hu, D. Zhang, and S. Song, “Contrastive learning of subject-invariant EEG representations for cross-subject emotion recognition,” IEEE Transactions on Affective Computing, vol. 14, no. 3, pp. 2496–2511, 2022.
[8] A. Adadi and M. Berrada, “Peeking inside the black-box: a survey on explainable artificial intelligence (XAI),” IEEE Access, vol. 6, pp. 52138–52160, 2018.
[9] Q. Zheng, Y. Wang, and P. A. Heng, “Multitask feature learning meets robust tensor decomposition for EEG classification,” IEEE Transactions on Cybernetics, vol. 51, no. 4, pp. 2242–2252, 2019.
[10] A. R. Seitz, “Sensory learning: rapid extraction of meaning from noise,” Current Biology, vol. 20, no. 15, pp. R643–R644, 2010.
[11] E. B. Herreras, “Cognitive neuroscience; the biology of the mind,” Cuadernos de Neuropsicología/Panamerican Journal of Neuropsychology, vol. 4, no. 1, pp. 87–90, 2010.
[12] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K.-R. Müller, “Optimizing spatial filters for robust EEG single-trial analysis,” IEEE Signal processing magazine, vol. 25, no. 1, pp. 41–56, 2007.
[13] K. Liu, M. Yang, Z. Yu, G. Wang, and W. Wu, “FBMSNet: A Filter-Bank Multi-Scale Convolutional Neural Network for EEG-Based Motor Imagery Decoding,” IEEE Transactions on Biomedical Engineering, vol. 70, no. 2, pp. 436–445, 2023.
[14] H. Altaheri, G. Muhammad, and M. Alsulaiman, “Physics-informed attention temporal convolutional network for EEG-based motor imagery classification,” IEEE transactions on industrial informatics, vol. 19, no. 2, pp. 2249–2258, 2022.
[15] Y. Zhang and Q. Yang, “A survey on multi-task learning,” IEEE transactions on knowledge and data engineering, vol. 34, no. 12, pp. 5586–5609, 2021.
[16] Y. Qi, X. Zhu, K. Xu, F. Ren, H. Jiang, J. Zhu, J. Zhang, G. Pan, and Y. Wang, “Dynamic Ensemble Bayesian Filter for Robust Control of a Human Brain-Machine Interface,” IEEE Transactions on Biomedical Engineering, vol. 69, no. 12, pp. 3825–3835, 2022.
[17] P. U. Diehl and M. Cook, “Unsupervised learning of digit recognition using spike-timing-dependent plasticity,” Frontiers in computational neuroscience, vol. 9, p. 99, 2015.
[18] M. Xu, X. Chen, A. Sun, X. Zhang, and X. Chen, “A Novel Event-Driven Spiking Convolutional Neural Network for Electromyography Pattern Recognition,” IEEE Transactions on Biomedical Engineering, vol. 70, no. 9, pp. 2604–2615, 2023.
[19] P. G. Cachi, S. V. Soto, and K. J. Cios, “TM-SNN: Threshold Modulated Spiking Neural Network for Multi-task Learning,” in International Work-Conference on Artificial Neural Networks. Springer, 2023, pp. 653–663.
[20] B. Hu, Z.-H. Guan, G. Chen, and F. L. Lewis, “Multistability of Delayed Hybrid Impulsive Neural Networks With Application to Associative Memories,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 5, pp. 1537–1551, 2019.
[21] T. Miconi, K. Stanley, and J. Clune, “Differentiable plasticity: training plastic neural networks with backpropagation,” in International Conference on Machine Learning. PMLR, 2018, pp. 3559–3568.
[22] Y. Wu, R. Zhao, J. Zhu, F. Chen, M. Xu, G. Li, S. Song, L. Deng, G. Wang, H. Zheng et al., “Brain-inspired global-local learning incorporated with neuromorphic computing,” Nature Communications, vol. 13, no. 1, p. 65, 2022.
[23] E. O. Neftci, H. Mostafa, and F. Zenke, “Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks,” IEEE Signal Processing Magazine, vol. 36, no. 6, pp. 51–63, 2019.
[24] B. Kosko, “Bidirectional associative memories,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 18, no. 1, pp. 49–60, 1988.
[25] B. Kosko, “Bidirectional associative memories,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 18, no. 1, pp. 49–60, 1988.
[26] S. R. Kelso, A. H. Ganong, and T. H. Brown, “Hebbian synapses in hippocampus,” Proceedings of the National Academy of Sciences, vol. 83, no. 14, pp. 5326–5330, 1986.
[27] G. Dornhege, B. Blankertz, G. Curio, and K.-R. Müller, “Boosting bit rates in noninvasive EEG single-trial classifications by feature combination and multiclass paradigms,” IEEE transactions on biomedical engineering, vol. 51, no. 6, pp. 993–1002, 2004.
[28] C. Brunner, R. Leeb, G. Müller-Putz, A. Schlögl, and G. Pfurtscheller, “BCI Competition 2008–graz data set a,” Institute for knowledge discovery (laboratory of brain-computer interfaces), Graz University of Technology, vol. 16, pp. 1–6, 2008.
[29] Y. Song, D. Wang, K. Yue, N. Zheng, and Z.-J. M. Shen, “EEG-based motor imagery classification with deep multi-task learning,” in 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019, pp. 1–8.
[30] C. Han, C. Liu, C. Cai, J. Wang, and D. Qian, “EDPNet: An Efficient Dual Prototype Network for Motor Imagery EEG Decoding,” arXiv preprint arXiv:2407.03177, 2024.
[31] K. Wimalawarne, R. Tomioka, and M. Sugiyama, “Theoretical and experimental analyses of tensor-based regression and classification,” Neural computation, vol. 28, no. 4, pp. 686–715, 2016.
[32] E. R. J. Levy, S. Carrillo-Segura, E. H. Park, W. T. Redman, J. R. Hurtado, S. Chung, and A. A. Fenton, “A manifold neural population code for space in hippocampal coactivity dynamics independent of place fields,” Cell reports, vol. 42, no. 10, 2023.
[33] M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain et al., “Loihi: A neuromorphic manycore processor with on-chip learning,” IEEE Micro, vol. 38, no. 1, pp. 82–99, 2018.
[34] J. Pei, L. Deng, S. Song, M. Zhao, Y. Zhang, S. Wu, G. Wang, Z. Zou, Z. Wu, W. He et al., “Towards artificial general intelligence with hybrid Tianjic chip architecture,” Nature, vol. 572, no. 7767, pp. 106–111, 2019.