SSVEP-DANet: Data Alignment Network for SSVEP-based Brain Computer Interfaces

Sung-Yu Chen, and Chun-Shu Wei

This work was supported in part by the National Science and Technology Council (NSTC) under Contracts 109-2222-E-009-006-MY3, 110-2221-E-A49-130-MY2, and 110-2314-B-037-061; and in part by the Higher Education Sprout Project of National Yang Ming Chiao Tung University and Ministry of Education. Corresponding author: Chun-Shu Wei (wei@nycu.edu.tw). Xin-Yao Huang, Sung-Yu Chen and Chun-Shu Wei are with the Department of Computer Science, National Yang Ming Chiao Tung University (NYCU), Hsinchu, Taiwan. Chun-Shu Wei is also with the Institute of Education and the Institute of Biomedical Engineering, NYCU, Hsinchu, Taiwan.

Abstract

Many advanced steady-state visual evoked potential (SSVEP) recognition methods use individual calibration data to enhance accuracy. However, the calibration process is laborious and time-consuming, and it induces considerable visual fatigue in subjects. Additionally, due to inter-subject variability, it is challenging to concatenate calibration data from different participants for training the detection algorithm. Therefore, we propose a data alignment network (SSVEP-DANet) to align cross-domain (sessions, subjects, devices) SSVEP data. We also introduce a novel domain adaptation framework called pre-adaptation to expedite the transformation process through intermediate participants. The evaluation of the proposed SSVEP-DANet on two Tsinghua SSVEP datasets demonstrates significant improvements over an SSVEP decoding algorithm (task-related component analysis, TRCA) as well as a competitive data alignment method (Least Squares Transform, LST). This paper demonstrates that the SSVEP-DANet-based approach can improve SSVEP detection algorithms by reducing cross-domain variations and enhancing the effectiveness of calibration data. Because SSVEP-DANet is the first neural network-based data alignment model for SSVEP-based brain-computer interfaces (BCIs), we foresee that it will not only assist future neural network developers in designing SSVEP data transformation models, but also expedite the practical application of SSVEP-based BCIs.

Index Terms:

Electroencephalogram (EEG), Brain-computer interface (BCI), steady-state visual evoked potentials (SSVEPs), Data Alignment, Domain Adaptation

I Introduction

Steady-State Visual Evoked Potentials (SSVEPs) are a specific type of EEG signal that occurs in the cortical region of the human brain when an individual focuses their attention on visual stimuli flickering at specific frequencies [david1977ssvep1, norcia2015ssvep2]. SSVEPs are known for their robustness [waytowich2016multiclass, waytowich2018compact] and have emerged as a reliable control signal for non-invasive brain-computer interfaces (BCIs) [wolpaw2007bci, birb2006breaking], facilitating communication between individuals and computers or external devices. The utilization of SSVEP-based BCIs has found widespread applications in various practical domains, including spelling [cheng2002design, chen2015high], gaming [martivsius2016prototype, chen2017single], and device control [muller2011using, guneysu2013ssvep].

Figure 1: An illustration of the concept of our domain adaptation framework. The domain adaptation involves transferring from the source domain to the target domain, denoted as $D_{S}\rightarrow D_{T}$.

To accurately detect and analyze users’ SSVEPs for distinguishing the corresponding stimuli, the development of efficient decoding algorithms has become important. Canonical correlation analysis (CCA) [lin2007cca, bin2009onelineCCA] is a training-free technique that explores the relationships between multichannel SSVEP and the reference signals corresponding to each stimulation frequency. In addition to CCA, researchers often employ training-based decoding algorithms, which utilize individual calibration data or SSVEP templates, to improve the detection of SSVEPs. One such method is task-related component analysis (TRCA) [nakan2018trca, chiang2022fastTRCA], which aims to separate task-related information from non-task-related information by maximizing the reproducibility of SSVEP data across trials. Furthermore, to outperform TRCA-based methods, a combination of non-linear convolutional neural network (CNN) models and traditional correlation analysis has been proposed, known as convolutional correlation analysis (Conv-CA) [li2020conCCA]. However, the individual calibration process is often time-consuming and laborious, leading to significant visual fatigue in subjects [cao2014fatigue]. Furthermore, due to substantial inter-subject variability, utilizing training data from a larger number of participants can actually decrease the performance of the decoding algorithm [Chiang2021LST]. Therefore, in this study, we employ domain adaptation techniques [pan2010survey] to mitigate inter-domain disparities and enhance the performance of SSVEP decoding algorithms, particularly in scenarios with limited calibration data. Figure 1 depicts the domain adaptation framework we propose.

TABLE I: Related work on the domain adaptation methods in SSVEP studies.

Method	DA approach	Shared subspace	Stimulus-specific	Classifier input	Domain transferred: Subjects	Domain transferred: Sessions	Domain transferred: Devices
RPA	Subspace alignment	Yes	No	SPD matrix	Yes	Yes	No
SLR	Subspace alignment	Yes	Yes	Spatial patterns	Yes	Yes	Yes
ALPHA	Subspace alignment	Yes	Yes	Spatial patterns	Yes	Yes	Yes
TSA	Subspace alignment	Yes	No	Tangent vectors	Yes	Yes	Yes
LST	Data alignment	No	Yes	Time-series data	Yes	Yes	Yes

Domain adaptation [pan2010survey] aims to transfer knowledge learned from a source domain to improve the performance of a model on a target domain. Many studies have applied domain adaptation techniques in SSVEP-BCI [Chiang2021LST, rodrigues2018rpa, masaki2020facil, liu2022alpha, bleuze2022tsa]. Subspace alignment methods align the data from different participants to a common subspace to reduce the inter-subject data distribution differences, thereby improving the accuracy of SSVEP-based BCI systems. However, some methods that output non-time-series data [rodrigues2018rpa, bleuze2022tsa], such as covariance matrices or tangent vectors, or methods that identify target stimuli by computing the correlation coefficients between spatial features [liu2022alpha, masaki2020facil], impose limitations on subsequent classification algorithms and are difficult to integrate with commonly used SSVEP decoding algorithms such as TRCA and Conv-CA. In contrast, data alignment methods align the source domain samples with the target domain samples and output time-series data, which can be combined with popular SSVEP recognition algorithms to obtain more powerful subject-specific decoding algorithms.

Although domain adaptation approaches have successfully enhanced the classification accuracy of SSVEP, most methods have focused on individual visual stimuli and learned stimulus-specific model parameters [Chiang2021LST, masaki2020facil, liu2022alpha]. This stimulus-specific learning approach limits the data available for each model, making it challenging to obtain accurate and stable model parameters. Furthermore, data quality significantly influences the model parameters. When the data from source subjects are collected via dry electrode devices, EEG data with lower SNR are often acquired, which could lead to less noticeable improvements from domain adaptation [Chiang2021LST].

To address these issues, we introduce a neural network-based data alignment method named SSVEP-DANet. SSVEP-DANet not only employs stimulus-independent training to acquire relatively robust model parameters but also employs non-linear transformation for efficient transfer of SSVEP signals across domains. In addition, the TRCA-based algorithm is considered among the most effective SSVEP decoding algorithms [wong2020spatial, chiang2022reformulating]. Therefore, we incorporate this algorithm into our domain adaptation study. A detailed explanation of the data alignment method combined with the TRCA-based algorithm is provided in Figure 2. This study proposes the first neural network architecture dedicated to SSVEP data alignment for improving SSVEP decoding performance under limited calibration data. The proposed architecture incorporates innovative training methodologies, notably stimulus-independent training and pre-training techniques, aimed at mitigating the challenges posed by data scarcity. The viability and effectiveness of the proposed SSVEP-DANet framework are validated through a rigorous validation process incorporating diverse cross-domain scenarios that correspond to practical SSVEP-based BCI speller applications.

II Related Work

This section provides the background of domain adaptation in SSVEP-based BCI and reviews related work on the domain adaptation approach in SSVEP studies. Table I presents a comparison of the current domain adaptation techniques utilized in SSVEP-based BCIs.

II-A Background of domain adaptation in SSVEP-based BCI

Domain adaptation techniques aim to adapt the trained model from the source domain to the target domain by leveraging the available data from the target domain while utilizing the knowledge learned from the source domain. The goal is to improve the model’s performance and generalization capabilities across different users without the need for extensive retraining or user-specific calibration.

Domain adaptation methods in SSVEP-based BCIs are typically achieved through shared feature spaces or subspaces, as well as utilizing transformation relationships between the data. Additionally, based on the differences in domain adaptation methods, we can categorize these methods into two groups: 1) subspace alignment, which involves sharing feature spaces or subspaces, and 2) data alignment, which involves finding transformation relationships between samples [sarafraz2022domain, wan2021review, wu2020transfer].

II-B Subspace alignment

Subspace alignment methods align the source and target domains to a common subspace to reduce the discrepancy between the two domains. Riemannian Procrustes Analysis (RPA) [rodrigues2018rpa] achieves this by applying simple geometric transformations (translation, scaling, and rotation) to symmetric positive definite (SPD) matrices, aligning the source and target domains to the same subspace. Although this method can be applied across subjects and sessions, its practicality is limited because its output consists of SPD matrices. Shared Latent Response (SLR) [masaki2020facil] uses common spatial filtering methods, including CCA and TRCA, to extract features from the training data and then uses least squares regression to obtain new spatial filters that project test data onto the same subspace as the training data. This approach is applicable to cross-subject, cross-session, and cross-device scenarios, and its input is ordinary time-series data, providing more flexibility in practical applications. ALign and Pool for EEG Headset domain Adaptation (ALPHA) [liu2022alpha] aligns spatial patterns through orthogonal transformations and aligns the covariance between different distributions using linear transformations, mitigating variations in spatial patterns and covariance. This method further improves the way new spatial filters are obtained and achieves better performance than SLR. Additionally, it can be applied to cross-subject, cross-session, and cross-device scenarios with time-series input data, thereby providing more practical opportunities. Tangent Space Alignment (TSA) [bleuze2022tsa] shares similarities with RPA, as it aligns different domains to the same subspace through translation, scaling, and rotation, but it operates within the tangent space. Because the tangent space is Euclidean, decoding is faster, and the rotation can be obtained with a singular value decomposition (SVD), making TSA computationally efficient compared to RPA. TSA is also applicable to cross-subject, cross-session, and cross-device scenarios. However, it outputs tangent vectors, which impose limitations on subsequent classifiers, thus restricting its practicality.

II-C Data alignment

Data alignment methods align source domain samples with target domain samples in order to mitigate the disparities between the two domains. The Least Squares Transformation (LST) approach [Chiang2021LST] finds a linear transformation of the SSVEP data that minimizes the error between the transformed source SSVEP data and the target SSVEP data. This method is applicable in cross-domain scenarios (sessions, subjects, and devices). Furthermore, its output is time-series data, unlike some subspace alignment methods [masaki2020facil, liu2022alpha] that can only identify target stimuli by computing correlation coefficients between spatial features in the same subspace. LST can therefore be integrated into commonly used SSVEP decoding algorithms such as CCA, TRCA, and Conv-CA, significantly enhancing its flexibility and feasibility in real-world applications. Therefore, we aim to design a data alignment method that can be applied to commonly used SSVEP decoding algorithms or future, more powerful SSVEP decoding models.

III Materials and Method

III-A Architecture

In recent years, several non-linear SSVEP models for data augmentation, such as SBSGAN [aznan2019simulating], S2S-StarGAN [kwon2022novel], and TEGAN [pan2023short], have demonstrated promising results. This suggests that SSVEP signals can exhibit non-linear characteristics. Additionally, because SSVEP data are time-synchronous, we focus on spatial rather than temporal characteristics. Therefore, in this study, we assume that there exists a non-linear, channel-wise transformation of SSVEP signals between subjects, and we propose a neural network-based transformation to transfer source SSVEP signals to target SSVEP signals. Figure 3 shows our network architecture, which consists of two modules: a spatial convolution module, and a module of two channel-wise fully connected layers with a hyperbolic tangent (Tanh) activation function between them. The input to SSVEP-DANet is SSVEP data obtained in the source domain, with dimensions $N_{C}\times N_{S}$. The transformation target, with dimensions $N_{C^{\prime}}\times N_{S}$, is the SSVEP template obtained in the target domain, which is computed by averaging multiple trials corresponding to a specific stimulus from the target subject for noise suppression and signal improvement. Here, $N_{C}$ is the number of channels in the source domain, $N_{C^{\prime}}$ is the number of channels in the target domain, and $N_{S}$ is the number of time samples.

Since SSVEP data are time-synchronous, several detection methods, such as CCA and TRCA, apply a spatial filter to find a linear combination of channel-wise SSVEP signals. These methods have been demonstrated to improve signal-to-noise ratio (SNR) [Johnson2006snr] and enhance SSVEP detection performance. SCCNet [wei2019sccnet] applies a network architecture based on spatial components for noise reduction. Therefore, in the first module, we use spatial convolution to project the original SSVEP data to a latent space to obtain spatial features, using $N_{C}$ spatial filters with a kernel size of $(N_{C},1)$. Then, the spatial features are permuted in order of $(2,1,3)$, and batch normalization is applied. Finally, we permute the features in order of $(1,3,2)$ as input for the next module.

In the second module, we apply two channel-wise fully connected layers and one activation function to find the non-linear channel relation between the spatial features and the target SSVEP templates. First, we use a channel-wise fully connected layer to integrate channel information at each time point and project the features to a new latent space. Second, we apply the $\tanh$ activation function to introduce non-linearity, which makes the data easier for the model to fit. Finally, we use a second channel-wise fully connected layer to project the features into the target domain space and permute the data in order of $(1,3,2)$, yielding SSVEP data similar to the target domain data.
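The two modules described above can be sketched as a plain NumPy forward pass. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the hidden width `n_hidden`, the random weight initialization, the use of batch statistics without learned scale and shift in the normalization step, and the names `init_params` and `danet_forward` are all hypothetical. A trainable version would normally be written in a deep learning framework.

```python
import numpy as np

def init_params(n_c, n_c_target, n_hidden, seed=0):
    """Random weights for the sketch (hypothetical initialization and hidden width)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "w_spatial": rng.normal(0.0, scale, (n_c, n_c)),   # N_C spatial filters over N_C channels
        "w_fc1": rng.normal(0.0, scale, (n_hidden, n_c)),
        "b_fc1": np.zeros(n_hidden),
        "w_fc2": rng.normal(0.0, scale, (n_c_target, n_hidden)),
        "b_fc2": np.zeros(n_c_target),
    }

def danet_forward(x, params, eps=1e-5):
    """Map source trials x of shape (batch, N_C, N_S) to shape (batch, N_C', N_S)."""
    # Module 1: a spatial convolution with kernel (N_C, 1) is a linear mix of
    # the channels applied identically at every time sample.
    z = np.einsum("fc,bcs->bfs", params["w_spatial"], x)
    # Normalize each spatial feature with batch statistics (assumption: no
    # learned scale/shift, no running averages).
    mean = z.mean(axis=(0, 2), keepdims=True)
    var = z.var(axis=(0, 2), keepdims=True)
    z = (z - mean) / np.sqrt(var + eps)
    # Module 2: channel-wise fully connected layers act on the channel axis,
    # independently at each time sample.
    z = np.transpose(z, (0, 2, 1))                        # (batch, N_S, N_C)
    h = np.tanh(z @ params["w_fc1"].T + params["b_fc1"])  # (batch, N_S, n_hidden)
    y = h @ params["w_fc2"].T + params["b_fc2"]           # (batch, N_S, N_C')
    return np.transpose(y, (0, 2, 1))                     # (batch, N_C', N_S)
```

For example, a batch of four 8-channel trials with 375 samples each is mapped to an array of the same shape when the source and target domains both use eight channels.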

III-B Model Training

(i) cross-stimulus : The training approach involving cross-stimulus training enables the learning of model parameters across different stimulus frequencies, even when the available calibration data is limited, thereby ensuring a relatively reliable learning process. Therefore, during the training of SSVEP-DANet, all stimuli from all subjects are utilized to learn distinct mapping relationships between the target domain and the source domain, facilitating the attainment of a more stable domain adaptation model.

(ii) pre-trained: Many studies utilize the fine-tuning technique in transfer learning, which involves taking a pre-trained model and training some of its layers on new data, to improve the recognition performance of SSVEP-based BCIs [ravi2020ssvepIDUD, guney2021ssvepDNN]. Similarly, in order to obtain better model weights for transferring the existing domain to the target domain, we fine-tune the pre-trained model separately for each source subject, producing a different set of model weights per subject. The procedure of our method is shown in Figure 4.

For each subject from the source domain, we use the fine-tuned model corresponding to that subject to transform the subject's data. We concatenate all transformed data from the source subjects with the calibration data from the target subject to form a larger training set. The new training set is used to obtain new spatial filters and SSVEP templates using the TRCA-based method. Finally, we apply the SSVEP decoding algorithm to classify the target stimuli.
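As a rough illustration of the last two steps, the following NumPy sketch concatenates transformed source trials with target calibration trials for one stimulus and then computes a TRCA spatial filter and template from the combined set. It follows the standard TRCA formulation (a generalized eigenvalue problem between the inter-trial covariance and the covariance of the concatenated trials) and omits the filter bank and ensemble steps. The function names are hypothetical, and the transformed source trials are assumed to be already available.

```python
import numpy as np

def build_training_set(transformed_source, target_calibration):
    """Concatenate trials of one stimulus along the trial axis.

    Both inputs have shape (n_trials, n_channels, n_samples).
    """
    return np.concatenate([transformed_source, target_calibration], axis=0)

def trca_filter(trials):
    """Return the spatial filter (n_channels,) that maximizes inter-trial covariance."""
    x = trials - trials.mean(axis=2, keepdims=True)  # center each channel per trial
    within = sum(t @ t.T for t in x)                 # Q: covariance of concatenated trials
    summed = x.sum(axis=0)
    between = summed @ summed.T - within             # S: sum over i != j of X_i X_j^T
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(within, between))
    return np.real(eigvecs[:, np.argmax(np.real(eigvals))])

def ssvep_template(trials):
    """Average the trials of one stimulus into a template (n_channels, n_samples)."""
    return trials.mean(axis=0)
```

At test time, a TRCA-based decoder would typically correlate the spatially filtered test trial with the spatially filtered template of every stimulus and pick the stimulus with the highest correlation.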

III-C Configuration

We fit the network using the Adam optimizer [kingma2014adam] with a learning rate of 0.0005. The mean squared error (MSE) between the target SSVEP data $X_{T}$ and the output SSVEP data $X^{\prime}_{T}$ is used as the transformation loss $L_{trans}$:

$$L_{trans}(X_{T},X^{\prime}_{T})=\|X_{T}-X^{\prime}_{T}\|^{2}_{2}$$

The source subjects are split into a training set and a validation set at a ratio of 8:2 in this study. We train for 500 epochs and save the model weights that produce the lowest validation loss. In the fine-tuning stage, we fine-tune for 150 epochs for each source subject.
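The configuration above can be summarized in a small, framework-free sketch. The constants come from the text; the helper names, the random seed, the rounding used for the 8:2 split, and the choice to average the squared error over all entries (which differs from the squared norm only by a constant factor) are assumptions made for illustration.

```python
import random

LEARNING_RATE = 0.0005   # Adam learning rate stated in the text
N_EPOCHS = 500           # pre-training epochs
N_FINETUNE_EPOCHS = 150  # fine-tuning epochs per source subject

def split_source_subjects(subject_ids, train_ratio=0.8, seed=0):
    """Shuffle source subjects and split them 8:2 into training and validation sets."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_ratio)
    return ids[:n_train], ids[n_train:]

def mse_loss(target, output):
    """Mean squared error between two equally shaped nested lists (channels x samples)."""
    diffs = [
        (t - o) ** 2
        for t_row, o_row in zip(target, output)
        for t, o in zip(t_row, o_row)
    ]
    return sum(diffs) / len(diffs)

def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss (the checkpoint that is kept)."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)
```

With 34 source subjects, for instance, this split yields 27 training subjects and 7 validation subjects.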

III-D Data

(i) Dataset I. : The benchmark dataset [wang2017benchmark] used in this study is a publicly available SSVEP dataset prepared by the Tsinghua group. In this dataset, the SSVEP-based BCI experiment involved 35 participants. Each participant completed 6 blocks of the experiment, each block comprising 40 trials presented in random order. Visual stimuli were presented within a frequency range of 8 to 15.8 Hz with an interval of 0.2 Hz. The phase values of the stimuli started at 0, with a phase interval of 0.5 $\pi$ . EEG signals were recorded using the extended 10-20 system through 64 channels. We selected EEG data from eight channels (PO3, PO4, PO5, PO6, POz, O1, O2, Oz) for the analysis and performance evaluation. The EEG signals were down-sampled from 1000 to 250 Hz, and a notch filter at 50 Hz was applied to remove the common power-line noise. The data were extracted in [ $L_{1}$ s, $L_{1}$ + $T_{w_{1}}$ s], where time zero denotes stimulus onset, $L_{1}$ is the latency delay ( $L_{1}=$ 0.14 s) and $T_{w_{1}}$ indicates the time-window length ( $T_{w_{1}}=$ 1.5 s).

(ii) Dataset II. : The wearable SSVEP BCI dataset [zhu2021wearalbe] used in this study is a publicly available SSVEP dataset prepared by the Tsinghua group. In this dataset, 102 healthy subjects participated in the wearable SSVEP-based BCI experiment. The experiment consisted of 10 blocks, each of which contained 12 trials, one for each of the 12 visual stimuli, presented in random order. Stimulation frequencies ranged from 9.25 to 14.75 Hz with an interval of 0.5 Hz. The phase values of the stimuli started at 0, and the phase difference between two adjacent frequencies was 0.5 $\pi$ . The 8-channel EEG data were recorded using wet and dry electrodes, and the electrodes were placed according to the international 10-20 system. All channels (PO3, PO4, PO5, PO6, POz, O1, O2, Oz) of the EEG signals were used in data analysis and performance evaluation. The EEG signals were down-sampled from 1000 to 250 Hz. To remove the common power-line noise, a 50 Hz notch filter was applied to the dataset. The data were extracted in [0.5 + $L_{2}$ s, 0.5 + $L_{2}$ + $T_{w_{2}}$ s], where 0.5 s denotes stimulus onset, $L_{2}$ indicates the latency delay ( $L_{2}$ = 0.14 s) and $T_{w_{2}}$ is the time-window length ( $T_{w_{2}}=$ 1.5 s).
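The extraction windows of both datasets translate into sample indices as sketched below, assuming the 250 Hz down-sampled signals and a trial stored as a list of channels. The function names and the rounding to the nearest sample are illustrative assumptions.

```python
def epoch_sample_range(fs_hz, onset_s, latency_s, window_s):
    """Return (start, stop) sample indices of the window [onset + L, onset + L + Tw]."""
    start = round((onset_s + latency_s) * fs_hz)
    stop = start + round(window_s * fs_hz)
    return start, stop

def extract_epoch(trial, fs_hz, onset_s, latency_s, window_s):
    """Cut the analysis window out of a trial given as a list of per-channel sample lists."""
    start, stop = epoch_sample_range(fs_hz, onset_s, latency_s, window_s)
    return [channel[start:stop] for channel in trial]

# Dataset I: time zero is stimulus onset; Dataset II: onset at 0.5 s.
DATASET_I_RANGE = epoch_sample_range(250, 0.0, 0.14, 1.5)
DATASET_II_RANGE = epoch_sample_range(250, 0.5, 0.14, 1.5)
```

In both cases the window is 375 samples long (1.5 s at 250 Hz), which is the $N_{S}$ seen by the network under these settings.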

TABLE II: Cases of domain adaptation

Task	Source domain	Target domain
Benchmark	Dataset I	Dataset I
Dry to dry	Dataset II - Dry	Dataset II - Dry
Wet to wet	Dataset II - Wet	Dataset II - Wet
Dry to wet	Dataset II - Dry	Dataset II - Wet
Wet to dry	Dataset II - Wet	Dataset II - Dry

IV Experiment

Cases of domain adaptation. Table II shows the cases of domain adaptation under different datasets.

(a) Benchmark : Both source and target subjects wore the same EEG device.

(b) Dry to dry : Both source and target subjects wore the same dry electrode EEG device.

(c) Wet to wet : Both source and target subjects wore the same wet electrode EEG device.

(d) Dry to wet : The source subjects wore dry electrode EEG devices, while the target subjects wore wet electrode EEG devices.

(e) Wet to dry : The source subjects wore wet electrode EEG devices, while the target subjects wore dry electrode EEG devices.

Compared methods. Our method is compared with the following methods: the TRCA-based method [nakan2018trca] and LST [Chiang2021LST].

(a) TRCA-based SSVEP decoding algorithm : TRCA is a training-based algorithm that aims to extract task-related components by maximizing the reproducibility of neural activity across multiple trials within each specific task [nakan2018trca]. Furthermore, the combination of TRCA and filter bank analysis facilitates the decomposition of SSVEP signals into multiple sub-band components, thereby effectively extracting independent information embedded within the harmonic components [chen2015fbcca]. We set the number of filter banks to 3 and 5 in the wearable SSVEP dataset and the benchmark dataset from the Tsinghua group, respectively. Finally, an ensemble approach is employed to integrate multiple filters trained using the aforementioned methods, thereby enhancing the robustness and accuracy of SSVEP-based BCIs.

(b) Least-squares transformation method : LST is a linear transformation method that aims to find the transformation relationship between SSVEP data by minimizing the error between the transformed source SSVEP data and the target SSVEP data [Chiang2021LST].
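The least-squares idea behind LST can be sketched in a few lines of NumPy: given a source trial $X_{S}$ and a target template $X_{T}$, find the matrix $P$ that minimizes $\|X_{T}-PX_{S}\|$ and apply it to the source trial. The function names are hypothetical, and whether the matrix is estimated per trial, per stimulus, or per subject is left open here; see [Chiang2021LST] for the actual procedure.

```python
import numpy as np

def lst_matrix(source_trial, target_template):
    """Least-squares P of shape (N_C', N_C) minimizing ||target_template - P @ source_trial||.

    source_trial has shape (N_C, N_S); target_template has shape (N_C', N_S).
    """
    # Solve source_trial.T @ P.T ~= target_template.T in the least-squares sense.
    p_transposed, *_ = np.linalg.lstsq(source_trial.T, target_template.T, rcond=None)
    return p_transposed.T

def lst_transform(source_trial, target_template):
    """Project a source trial toward the target template with the least-squares matrix."""
    return lst_matrix(source_trial, target_template) @ source_trial
```

Because the mapping is a single linear matrix fitted to the available trials, noisy source or target data directly affect the estimated matrix, which is consistent with the sensitivity to low-SNR recordings discussed in this paper.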

Performance evaluation. To evaluate how well the proposed SSVEP-DANet method aligns SSVEP data from different domains, we compare it with the TRCA-based SSVEP decoding algorithm and the LST-based method using the following training schemes:

(a) Baseline : The calibration data are obtained from the target domain only. This scheme is the same as the individual training scheme.

(b) Concat. : The calibration data from the target domain and the training data from the existing domain are combined to create a new training set.

(c) SSVEP-DANet : We use SSVEP-DANet to transform the training trials from the existing domain and concatenate the transformed data with the calibration data from the target domain to form a new training set.

(d) LST : We use LST to transform the training trials from the existing domain and concatenate the transformed data with the calibration data from the target domain to form a new training set.

To assess the performance of the SSVEP-DANet-based domain adaptation method in real-world SSVEP-based BCIs, we evaluated the training schemes using leave-one-subject-out cross-validation. During this validation process, one subject was considered the target user, while the remaining subjects were treated as existing users. When testing a new user (the target user), the trials corresponding to each stimulus were partitioned into calibration and testing sets, with a ratio of 4:2 for Dataset I and 6:4 for Dataset II. In the Baseline scheme, we used multiple calibration trials (ranging from 2 to 4 trials for Dataset I and 2 to 6 trials for Dataset II) for each stimulus (40 stimuli for Dataset I and 12 stimuli for Dataset II) of the target subject to construct the training set. In the Concat. scheme, all trials (6 trials for Benchmark and 10 trials for Wearable) of each stimulus from all non-target subjects (source subjects) were concatenated with the training set utilized in the Baseline scheme. Similarly, in the LST scheme, the data from source subjects were transformed using the LST method and then combined with the Baseline training set. In the SSVEP-DANet scheme, the data from source subjects were transformed using the SSVEP-DANet method and then merged with the Baseline training set. Subsequently, the SSVEP decoding performance of these four schemes was evaluated on the testing set, consisting of 2 trials for Benchmark and 4 trials for Wearable. Apart from the Baseline scheme, the other three schemes were further evaluated in a cross-device scenario, in which we again employed leave-one-subject-out cross-validation but selected different EEG devices for the target and source subjects.
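The evaluation protocol can be sketched as follows. The helper names are hypothetical, and holding out the last trials of each stimulus for testing is an assumption made for illustration; the text specifies only the calibration-to-testing ratios.

```python
def leave_one_subject_out(subject_ids):
    """Yield (target, sources) pairs, with each subject serving once as the target user."""
    subject_ids = list(subject_ids)
    for target in subject_ids:
        sources = [s for s in subject_ids if s != target]
        yield target, sources

def split_target_trials(n_trials, n_test, n_calibration):
    """Split per-stimulus trial indices of the target subject.

    The last n_test trials are held out for testing (assumption), and the first
    n_calibration of the remaining trials are used for calibration.
    """
    pool = list(range(n_trials - n_test))
    if not 1 <= n_calibration <= len(pool):
        raise ValueError("n_calibration must fit in the calibration pool")
    test = list(range(n_trials - n_test, n_trials))
    return pool[:n_calibration], test
```

For Dataset I, `split_target_trials(6, 2, k)` with `k` from 2 to 4 covers the reported calibration sizes; for Dataset II, `split_target_trials(10, 4, k)` with `k` from 2 to 6 does the same.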

The performance of all training mechanisms was averaged over 10 or more random initializations. To assess the significance of improvements between the SSVEP-DANet-based method and other training schemes, we employed the Wilcoxon signed-rank test.

IV-A Number of calibration trials per stimulus

In this section, we experimentally validate the performance of the SSVEP-DANet scheme compared to three other schemes (Baseline, Concat. , and LST) when the number of calibration trials for each stimulus varies for the target subject. We also discuss the impact of different numbers of calibration trials for each stimulus on the effectiveness of our SSVEP-DANet scheme.

Figure 5 demonstrates the cross-domain performance using different schemes under varying numbers of calibration trials for each stimulus. The results indicate that our proposed method consistently outperforms the other three schemes in most cases, regardless of whether the calibration data for the target subject are sufficient. Moreover, as the number of calibration trials for the target subject increases, our proposed method consistently improves the performance of the SSVEP decoding algorithm. Interestingly, due to domain variability, the naive transfer learning (Concat. scheme) exhibits a negative impact on the TRCA-based method. Similarly, the LST-based method also adversely affects the TRCA-based method, as the accuracy of the transformation matrix is influenced by unstable SSVEP signals. Additionally, with a large amount of training data, we found that our proposed method does not rely on the data quality of a single source domain and can stably enhance the performance of the SSVEP decoding algorithm under different numbers of calibration trials for each stimulus.

IV-B Number of supplementary subjects

This section validates the performance of the SSVEP-DANet scheme compared to the other three schemes under different numbers of supplementary subjects. Additionally, it discusses the impact of varying numbers of supplementary subjects on the effectiveness of our SSVEP-DANet scheme.

Figure 6 illustrates the cross-domain performance using different schemes under varying numbers of supplementary subjects. The results demonstrate that our proposed method outperforms the other three schemes in most cases, regardless of the number of supplementary subjects. Interestingly, the results indicate that both the Concat. scheme (i.e., without SSVEP-DANet) and the LST scheme exhibit negative impacts on the TRCA-based method. We observed a significant drop in performance for the LST-based method as the number of supplementary subjects from dry electrode devices increases, indicating that the LST method transfers more low-quality SSVEP data for training TRCA. Furthermore, similar to the experiments mentioned earlier, our proposed SSVEP-DANet model, trained on a large amount of data, is capable of improving the performance of the SSVEP decoding algorithm consistently across different numbers of supplementary subjects, irrespective of the quality of the SSVEP signals. On the other hand, in both the dry-to-dry and wet-to-dry scenarios, since SSVEP-DANet is a training-based approach, incorporating more supplementary subjects enhances its data alignment capability and allows it to transform the data from the source domain more effectively.

IV-C Visualization

We utilized t-SNE [van2008tSNE] to perform dimensionality reduction on EEG data, reducing it to 2D in order to compare SSVEP trials between two scenarios: SSVEP-DANet and Concat. , as well as among target subjects. Additionally, we investigated whether the use of SSVEP-DANet can effectively reduce differences among the subjects.

Figure 7 (a) presents the t-SNE visualization of the 13th target subject from the Benchmark dataset in a cross-subject scenario. Figure 7 (b) illustrates the t-SNE visualization of the first target subject wearing a dry electrode device in the Wet-to-dry scenario, using data from the Wearable dataset. From these figures, we observed that under the same stimulus, the clusters in the SSVEP-DANet scheme were smaller compared to the clusters in the Concat. scheme. Additionally, under different stimuli, the clusters in the SSVEP-DANet scheme showed stronger separation compared to the Concat. scheme. It is noteworthy that, under the same stimulus, the clusters in the SSVEP-DANet scheme did not consistently align closely with the target subject clusters. These findings suggest that the SSVEP-DANet method reduces inter-subject variability and increases inter-stimulus variability.

We utilized EEG spectrograms to investigate the impact of increased similarity among subjects on the power spectral density at the target frequency.

Figure 8 (a) depicts the average spectra of the 12.6 Hz SSVEP signals for target subject 3 in the Benchmark dataset under three schemes (Baseline, Concat., and SSVEP-DANet) when two calibration trials are available for each stimulus. Similarly, Figure 8 (b) shows the average spectra of the 14.75 Hz SSVEP signals for target subject 1 wearing a dry electrode device in the Wet-to-dry scenario from the Wearable dataset, again with two calibration trials for each stimulus, under the same three schemes. In the Concat. scheme, we observed different outcomes in Figure 8 (a) and Figure 8 (b). The results in Figure 8 (a) indicate that the Concat. scheme on the Benchmark dataset struggles to obtain stable spectra due to the high variability of SSVEP trials in the scheme, resulting in less concentrated spectral peaks at the target frequency. Conversely, the results in Figure 8 (b) demonstrate that on the Wearable dataset, where the target participants have suboptimal SNR, the Concat. scheme can effectively improve SNR by incorporating a large amount of higher-quality data, leading to more stable spectra. In the SSVEP-DANet scheme, significant enhancements in peak amplitudes are observed at the target frequency and its harmonics in both Figure 8 (a) and Figure 8 (b). These findings suggest that our proposed SSVEP-DANet method effectively reduces inter-trial variability in SSVEP experiments, enabling the utilization of non-target subject trials and increasing the SNR. Moreover, these phenomena are reflected in the decoding accuracy (Figure 5).
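A spectral comparison of this kind can be sketched with NumPy as follows. This is an illustrative sketch rather than the analysis code behind Figure 8: the function names, the single-channel input, and the narrow-band SNR definition (amplitude at the target bin divided by the mean amplitude of neighboring bins) are assumptions.

```python
import numpy as np

def average_amplitude_spectrum(trials, fs_hz, n_fft=None):
    """Amplitude spectrum of the trial average.

    trials has shape (n_trials, n_samples) for a single channel or a single
    spatially filtered component.
    """
    mean_trial = np.mean(trials, axis=0)
    n_fft = n_fft or mean_trial.size
    spectrum = np.abs(np.fft.rfft(mean_trial, n=n_fft)) / mean_trial.size
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs_hz)
    return freqs, spectrum

def narrowband_snr(freqs, spectrum, target_hz, n_neighbors=5):
    """Amplitude at the bin nearest target_hz over the mean of n_neighbors bins per side."""
    k = int(np.argmin(np.abs(freqs - target_hz)))
    lo, hi = max(k - n_neighbors, 0), min(k + n_neighbors + 1, spectrum.size)
    neighbors = np.concatenate([spectrum[lo:k], spectrum[k + 1:hi]])
    return spectrum[k] / neighbors.mean()
```

Averaging more well-aligned trials lowers the noise floor around the target frequency, so a training set whose trials are better aligned should yield a higher value of this ratio.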

IV-D Ablation study

TABLE III: Ablation study of SSVEP-DANet across different training procedures. The asterisks indicate a significant difference between SSVEP-DANet and other schemes (* $p<0.05$, ** $p<0.01$, *** $p<0.001$).

We conducted ablation experiments to evaluate the effectiveness of different training methods in SSVEP-DANet when target subjects have limited calibration data. Table III presents the results of SSVEP-DANet using different training methods. The Single + w/o FT scheme involves training a model on a single stimulus across subjects and using that model to transfer the SSVEP signals of the corresponding stimulus in the source domain. The Subject scheme entails training distinct subject-specific models, one for each individual source subject. These models are then employed to transfer SSVEP signals from the corresponding subjects within the source domain. The All + w/o FT scheme, on the other hand, involves training the model using all the training data across subjects and applying the same model to transform the SSVEP signals from the source domain. The results indicate that our proposed training approach significantly outperforms the Single + w/o FT scheme, the Subject scheme, and the All + w/o FT scheme in most cases. This suggests that our proposed training scheme can effectively train a robust and stable model by utilizing a large amount of data. Fine-tuning the model using specific source domain subjects and target subjects allows for efficient transfer of source domain subject information to the target subjects, thereby significantly improving the accuracy of SSVEP recognition.

V Conclusion

In this study, we propose SSVEP-DANet, a model for SSVEP data alignment, which integrates pioneering training methods, particularly cross-stimulus training and pre-training techniques. Our experimental results demonstrate that the SSVEP-DANet-based approach enhances subject similarity and improves SSVEP decoding accuracy by effectively utilizing data from non-target subjects. Additionally, the results of ablation studies indicate that through cross-stimulus training and pre-training techniques, further stability in performance enhancement can be achieved. Overall, these robust validation findings substantiate the practical viability and efficacy of the SSVEP-DANet framework in real-world applications of SSVEP-based BCI spellers.

VI Future Work

Although our proposed DANet-based approach shows improvement over the TRCA-based approach, it is essential to validate its effectiveness on other state-of-the-art SSVEP classifiers, including CORCA [zhang2018corca], SSVEPNet [Pan2022ssvepnet], Compact-CNN [Waytowich2018CCNN], and DNN [guney2021ssvepDNN]. Furthermore, despite our experiments demonstrating the effectiveness of the SSVEP-DANet model for SSVEP data alignment, we plan to investigate its applicability to other types of time-locked data, such as the BCI Challenge ERN dataset (BCI-ERN) [margaux2012ERNdataset]. This will help showcase the versatility of our model and its potential for improving time-locked signal decoding. Finally, model interpretation is crucial for EEG analysis as it allows us to understand how the model makes decisions based on EEG data. By employing model interpretation techniques, we can identify informative EEG features, validate the correctness of the SSVEP-DANet model, and modify the SSVEP-DANet model architecture to further enhance its performance.