Towards Neuromorphic Compression based Neural Sensing for Next-Generation Wireless Implantable Brain Machine Interface

Vivek Mohan, Wee Peng Tay, and Arindam Basu

V. Mohan and W. P. Tay are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore (e-mail: vivekmoh001@e.ntu.edu.sg, wptay@ntu.edu.sg). A. Basu is with the Department of Electrical Engineering, City University of Hong Kong (e-mail: arinbasu@cityu.edu.hk).

Note: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

ORCID: 0000-0002-0248-6417, 0000-0002-1543-195X, 0000-0003-1035-8770

Abstract

This work introduces a neuromorphic compression based neural sensing architecture with an address-event representation inspired readout protocol for massively parallel, next-generation wireless iBMI. The architectural trade-offs and implications of the proposed method are quantitatively analyzed in terms of compression ratio and spike information preservation. For the latter, we use metrics such as root-mean-square error and correlation coefficient between the original and recovered signal to assess the effect of neuromorphic compression on spike shape. Furthermore, we use accuracy, sensitivity, and false detection rate to understand the effect of compression on downstream iBMI tasks, specifically spike detection. We demonstrate that a data compression ratio of $50-100$ can be achieved, $5-18\times$ more than prior work, by selective transmission of event pulses corresponding to neural spikes. A correlation coefficient of $\approx 0.9$ and a spike detection accuracy of over $90\%$ are obtained for the worst-case analysis involving a $10$K-channel simulated recording and for the typical analysis using $100$- or $384$-channel real neural recordings. We also analyze the collision handling capability and scalability of the proposed pipeline.

Index Terms:

implantable-brain machine interface (iBMI), neurotechnology, neuromorphic compression, address event representation (AER).

List of Abbreviations-

iBMI	Implantable Brain Machine Interface
ADC	Analog to Digital Converter
ADM	Asynchronous Delta Modulator
SPDWOR	Wired-OR Readout
DVS	Dynamic Vision Sensor
AER	Address event representation
APM	All Pulse Mode
PCM	Pulse Count Mode
NHP	Non-human Primate
RMSE	Root Mean Square Error
CC	Correlation Coefficient
$\mathrm{Th_{ON(OFF)}}$	ON/OFF Pulse Generation Threshold
$\mathrm{Th_{spd}}$	Spike Detection Threshold
SPD	Spike Detection
AT-SPD	Absolute-threshold based SPD
NEO-SPD	Non-linear Energy Operator based SPD
TDR	Transmission Data Rate
CR	Compression Ratio

I Introduction

Advances in neurotechnology in recent years have allowed partial restoration of lost sensory capabilities such as vision [1], speech [2] and touch [3] through stimulation, and of limited motor capabilities in people suffering from motor impairment or paralysis. At the same time, a variety of neural sensors such as electroencephalography (EEG), electrocorticography (ECoG) and intracortical electrode based implantable brain-machine interfaces (iBMI) have demonstrated promising results for clinical applications [4]. A typical implementation of an iBMI system involves recording neural activity through a microelectrode array, followed by amplification, filtering, and spike-detection stages to capture the action potentials, which may then be decoded on- or off-chip to operate and control effectors such as a prosthetic arm, a computer cursor, or mobility devices, as shown in Fig. 1.

Figure 1: (a) Studies in [5] observed a Moore’s law like doubling of the number of simultaneously recorded neurons. This trend indicates the need to develop neural processing systems that scale well with electrode count while assuring robust performance within the allowable power, memory, and data rate budget (adapted from [6]). (b) Block diagram of typical iBMI systems involving different schemes for multiplexing the analog/digital signals followed by different options to transmit the recorded signal, ranging from transmission of all recorded signals to transmitting decoded signals. (c) Trade-off plot, putting into perspective the existing iBMI systems and the proposed pipeline.

Recent works using iBMI have enabled brain-to-handwritten-text conversion [7], brain-based speech synthesis [8], and therapies for epilepsy via deep brain stimulation and mental disorders [9]. Despite the compelling advances in the direction of futuristic BMI assistive technologies, the efficacy of such systems is limited by the number of recording channels. It therefore becomes necessary for the next generation of dexterous implantable brain-machine interfaces (Nx-iBMI) to support parallel recording from thousands of electrodes in order to improve iBMI performance and enable more sophisticated control of assistive technologies such as prosthetic arms. It is also necessary to consider implementing Nx-iBMI as wireless transcutaneous implants to reduce the risk of infection, enhance aesthetics and user mobility, and allow stable recording for longer durations. Increasing the cellular coverage, i.e., the number of simultaneously recorded neurons, allows the study of granular neuronal interactions and cooperation like never before, and potentially the treatment of neural disorders or even the enhancement of sensory perception. With advances in semiconductor technologies, the general direction toward increasing neural signal resolution has resulted in the development of neural interfaces with higher channel counts [10, 11, 12, 13], thereby creating a trend of Moore’s law like doubling of the number of simultaneously recorded neurons about every $6.3$ years, as shown in Fig. 1(a).

The main issues that arise with increasing the channel count are data bandwidth limitations and power dissipation, especially with wireless transmission. To illustrate the challenge with electrode scaling in neural implants, consider an example Nx-iBMI with about $10,000$ channels, following the Moore’s law like scaling of neural electrodes shown in Fig. 1(a). A major hurdle for the interface would be digitizing the massive amount of data ($10$ bits/sample $\times 30$ kS/s $\times 10,000$ channels = $3$ Gbps of neural data) and transmitting it off-chip (as done in Schemes I-A and II-A in Fig. 1(b)). The power budget is constrained by the maximum allowable thermal power dissipation of $\approx 80$ mW/cm², and the temperature increase in the neural interface is restricted to below $0.5^{\circ}$ C to prevent damage to brain tissue [14]. This points to the need for data compression in the implant to satisfy the increased channel count requirements of the future and to keep the transmission power low. Another issue relates to the wiring required to access numerous electrodes in a limited area, necessitating some form of multiplexing. In this work, we analyze a neuromorphic event-driven neural front-end that can potentially address both these issues.
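The raw data-rate figure quoted above can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text (10-bit samples, 30 kS/s, 10,000 channels); the function name is illustrative and not part of the original work.

```python
def raw_data_rate_bps(bits_per_sample, sample_rate_hz, n_channels):
    """Uncompressed data rate (bits/s) when every sample of every channel is sent."""
    return bits_per_sample * sample_rate_hz * n_channels


# Figures quoted in the text: 10-bit samples, 30 kS/s, 10,000 channels.
rate_bps = raw_data_rate_bps(bits_per_sample=10, sample_rate_hz=30_000, n_channels=10_000)
print(f"{rate_bps / 1e9:.1f} Gbps")  # prints "3.0 Gbps"
```

The rate grows linearly with channel count, which is why full-sample transmission becomes the bottleneck as arrays scale.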

The rest of the paper is organized as follows. Section II discusses related work and lists the key contributions of this work. Section III describes the proposed neuromorphic compression pipeline. Section IV presents the trade-off between compression and performance for the neuromorphic compression pipeline and compares it with relevant conventional and novel methods from previous works. Section V then discusses the main results and shows that our approach is scalable and yields the expected results with two real datasets. Finally, we summarize our findings and conclude in the last section.

II Related Work and Contribution

Various research works have made progressive improvements in the iBMI sensing architecture or processing flow. Following are some popular methods that try to address the issues that arise with increasing electrode count on neural recording arrays:

II-A Sub-array digitization

As the number of recording channels increases, traditionally used analog multiplexing schemes (as shown in Fig. 1(b) Scheme-I) become prone to noise and interference; multiplexing digital signals therefore becomes the preferred scalable solution. Newer neural recording arrays such as Neuropixel [11], Neuralink [15], Argo [13] and the $59$ K in-vitro electrode array [12] attempt to tackle the power and data-rate limitations either by using a programmable switch matrix or by implementing on-chip multiplexing for sub-array digitization. The latter is a hybrid between Schemes I and II in Fig. 1(b): a group of recording electrodes is multiplexed in the analog domain before digitization, so that only about $3-4\%$ of the electrodes ($2,048$ channels for Argo and $384$ channels for Neuropixel) can be addressed simultaneously. The designs in [12, 10] integrate the ADC under a neuro-pixel while increasing the electrode count. This allows full-frame readout from all electrodes, besides ensuring high-SNR, robust digital readout from an arbitrarily selectable subset of electrodes. However, they neither address the data deluge post-digitization nor provide compression strategies to realize scalable, low-power, wireless neural implants.

II-B On-chip compression

Analog techniques such as spatial compression before digitization and compression via superposition were proposed in [16, 17]. However, they either involve large complexity hardware-intensive algorithms for the front end or are limited by noise summation, which limits the scalability of such neural implants. Compression schemes such as compressive sensing [18] and autoencoder [19] also fall short of the requirements to meet the available wireless data rates with increasing electrode count. On-chip spike sorting [20, 21, 22, 23] and compression [24] have been proposed to reduce the transmitted data (as shown in Fig. 1(b) Scheme-I/II followed by Scheme-C). However, these do not tackle the issue of multiplexing and digitization in the front end of the neural implant sufficiently.

II-C Wired-OR readout (SPDWOR)

An interesting image sensor inspired technique, presented in [25], exploits the spatio-temporal sparsity of neural signals to simultaneously achieve compression and digital multiplexing with wired-OR interactions. It provides lossy compression by discarding samples in the baseline region via wired-OR contention among pixels, retaining samples potentially corresponding to spikes. When multiple pixels compete for access to the limited wires, a collision occurs and more than one row/column decoder is activated, resulting in no unique decoding solution for recovery of the quantized voltages. Such collisions may also occur for spike signals depending on the choice of the baseline, especially when neural firing is correlated, leading to loss of spike data. Reducing the loss of valid samples to collisions would require more complex wiring configurations and address decoding schemes.

II-D Activity dependent processing

Data rate and power-sensitive methods propose transmitting a binary train of spike information indicating the presence or absence of spikes in time bins[26] for real-time clinically viable iBMI. While these methods reduce power consumption and data rate by an order of magnitude, they preclude features such as spike shapes for tasks such as spike sorting. However, the performance of iBMI systems employing such compression with increasing electrode count on the neural interface array is still unknown. Some other works such as [27, 28] advocate the integration of decoders in the implant (as shown in Fig. 1(b) Scheme-I/II followed by Scheme-D). While this addresses the problem convincingly, it also limits the implant to specific tasks, reducing its adaptability in the future [29].

II-E Neuromorphic compression

Limitations imposed by power and bandwidth with electrode scaling on neural interfaces could be addressed effectively with the promising low-power neuromorphic approach, as demonstrated previously for sensory applications such as event-based vision [30], audio [31], tactile [32], and olfactory [33] sensors. This approach essentially ‘morphs’ the initial sensory information processing stages of biological sensors/receptors into VLSI chips using a combination of efficient analog and digital circuitry to mimic their asynchronous spatio-temporal activity. A neuromorphic event-driven neural recording approach was first proposed in [34] for iBMI and more recently implemented in [35, 36, 37]. The neuromorphic approach can also provide the benefits of digital multiplexing, owing to its integration with address event representation (AER) circuits, and of digitizing/communicating data only during spikes by virtue of in-pixel thresholding. The AER-based handling of events could potentially address the data loss due to collisions in the SPDWOR configuration. The works in [20, 35, 37] introduce a pipeline involving spike train generation using an analog-to-spike conversion block followed by on-chip decoding using a spiking neural network, which limits its usability to specific tasks. While [35] uses a round-robin arbitration based AER scheme, [20, 37] do not have a collision handling strategy that could scale with the size of the neural electrode array. None of the aforementioned neuromorphic/event-based neural recording schemes investigates the gains from using neuromorphic compression in terms of reduction in data rate, or assesses its impact on iBMI performance. The fidelity of the compressed signal and the effect of collisions in the AER block of a neuromorphic compressive iBMI on task-specific performance when scaling to large Nx-iBMI systems have hitherto been unexplored.

In this work, we investigate the architectural trade-offs of neuromorphic compression of neural signals for iBMI inspired by asynchronous event detection image sensors (AEDIS) such as the dynamic vision sensor (DVS) using next-generation large neuromorphic neural recording systems for simulated and realistic neural recordings. We make the following contributions:

• We explore the use of address event representation (AER) based readout with a pair of handshaking signals (request and acknowledgment) for elegant collision management to prevent loss of detected pulses by delaying them.
• We assess the extent and effect of neuromorphic compression on spike shape and spike detection performance for two modes of transmission - ‘All pulse mode’ (APM) and ‘Pulse count mode’ (PCM) quantitatively with a set of standard evaluation metrics and compare them with other popular methods of transmitting neural signals.
• We evaluate task-specific fidelity, specifically spike shape preservation and spike detection performance, for APM and PCM pipelines and investigate the performance variation across different synthetic and real datasets (100-channel non-human primate, i.e., NHP and 384-channel Neuropixel recordings).
• We assess the collision handling capability and effect of collision on signal recovery.
• We discuss the scalability of the proposed method to a higher number of recording channels and the potential to compress the proposed read-out further.

An initial version of this work was presented in [38], where the proposed neuromorphic compression based neural sensing pipeline was analyzed using a single-channel synthetic recording. This work extends the findings for higher channel count recordings including real neural recordings from non-human primates and mice to understand the scalability of the proposed pipeline, besides investigating its collision handling capability and exploring options for further compression. The neuromorphic iBMI event dataset obtained from the simulation of the proposed pipeline is made publicly available at: https://sites.google.com/view/brainsyslab/neuromorphic-ibmi-dataset.

III Methodology

The firing rate of a biological neuron is $\approx 1-200$ Hz. Combined with an approximate spike duration of $1-2$ ms, this implies that biological spikes occupy a small fraction of samples in neural recordings. This temporal sparsity of spikes allows the use of an AER-based readout strategy that leads to high compression rates in large iBMI interfaces.

III-A Overview of the Proposed Pipeline

A neuromorphic compression based neural sensing system, as shown in Fig. 2(a), is proposed. It consists of a DVS-pixel-like ON and OFF threshold crossing detection circuit [30] integrated into each cell, similar to the implementation in [34]. Generally, the front end comprises a capacitive low-noise amplifier with gain $A_{1}$ followed by a second programmable gain stage with gain $A_{2}$. The total gain of the amplifier stages is typically $\approx 200-1000$. This is followed by an asynchronous delta modulator that generates an output $\mathrm{V_{mod}}$ and pulses $p$ (+1/ON and -1/OFF). The AER readout strategy is facilitated by arbitration logic, address decoders, and a pair of handshaking signals for event readout. The readout events are then packetized depending on the mode of operation and transmitted wirelessly for further downstream processing of the neural event stream. The following subsections elaborate on the working principle of some of the main blocks of the proposed pipeline.

III-B Event Generation

Each pixel consists of an asynchronous delta modulator typically composed of an input operational transconductance amplifier (OTA) with a capacitive divider gain stage, a pair of comparators, and inverters that generate the pulse train commonly referred to as ‘events’. ON ($\mathrm{+1}$) and OFF ($\mathrm{-1}$) pulses/events are produced when the amplified version of the input signal ($\mathrm{V_{in}}$) increases above the reference voltage, $\mathrm{V_{ref}}$, by $\mathrm{Th_{ON}}$ or decreases below $\mathrm{V_{ref}}$ by $\mathrm{Th_{OFF}}$, respectively. The pulse generation process can be described as follows:

	$\displaystyle\frac{\mathrm{dV_{mod}}}{\mathrm{dt}}=A_{1}A_{2}\frac{\mathrm{dV_{in}}}{\mathrm{dt}}$
	$\displaystyle p(t)=\begin{cases}\phantom{-}1,\quad\mathrm{V_{mod}}(t+\delta)=\mathrm{V_{cm}},&\text{if }\mathrm{V_{mod}}(t)>\mathrm{Th_{ON}},\\ -1,\quad\mathrm{V_{mod}}(t+\delta)=\mathrm{V_{cm}},&\text{if }\mathrm{V_{mod}}(t)<\mathrm{Th_{OFF}},\\ \phantom{-}0,\quad\mathrm{V_{mod}}(t+\delta)=\mathrm{V_{mod}}(t)+\frac{\mathrm{dV_{mod}}}{\mathrm{dt}}\delta,&\text{otherwise,}\end{cases}$		(1)

where $\mathrm{V_{cm}}$ is the common mode voltage of the ADM. An example waveform with a spike reconstruction is shown in Fig. 2(b). Other implementations[39] have also combined this function in one stage, where the signal reconstruction method was different. An enhanced adaptive version of the asynchronous delta modulator, as introduced in [36], could be used to modulate and minimize the event generation rate by following the amplitude and noise characteristics of the input signal. This could effectively suppress event generation due to noise or abnormalities in the baseline region. For the purpose of simplicity, the event generation block in this work is assumed to contain a basic asynchronous delta modulator, as implemented in [34].
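The behaviour described by Eq. (1) can be illustrated with a small discrete-time sketch. This is not the circuit-level model used in the paper: it assumes a uniformly sampled input, measures $\mathrm{V_{mod}}$ relative to $\mathrm{V_{cm}}$ (taken as 0 by default), emits at most one event per sample, and ignores the refractory period. The function name and argument names are illustrative.

```python
def adm_events(v_in, gain, th_on, th_off, v_cm=0.0):
    """Discrete-time sketch of Eq. (1).

    v_in   : list of input samples
    gain   : total front-end gain A1 * A2
    th_on  : ON threshold (positive), relative to v_cm (assumption)
    th_off : OFF threshold (negative), relative to v_cm (assumption)
    Returns a list of (sample_index, polarity) events.
    """
    events = []
    v_mod = v_cm
    for i in range(1, len(v_in)):
        # Track the amplified change of the input since the last reset.
        v_mod += gain * (v_in[i] - v_in[i - 1])
        if v_mod - v_cm > th_on:
            events.append((i, +1))  # ON event, then reset to V_cm
            v_mod = v_cm
        elif v_mod - v_cm < th_off:
            events.append((i, -1))  # OFF event, then reset to V_cm
            v_mod = v_cm
    return events


# Hypothetical ramp up and back down (arbitrary units).
demo_input = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
demo_events = adm_events(demo_input, gain=1.0, th_on=1.5, th_off=-1.5)
print(demo_events)  # [(2, 1), (6, -1)]
```

Because the modulator resets after every event, a flat baseline produces no output at all, which is the source of the compression discussed in the following sections.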

III-C AER-based Readout

Upon generation of an ON/OFF event, the $\mathrm{Req}$ signal is raised, and additional logic asserts the $\mathrm{Ack}$ signal once the event is read out, which in turn resets the pixel. The reset state is held for a ‘refractory period’ determined by the values of the capacitor and the bias voltage in the pixel circuitry. The number of pulses generated by each pixel depends on the amplitude and frequency of the input signal, and also on the parameters of the delta modulator: the ON/OFF pulse generation thresholds and the refractory period. The pixels across the rows and columns of the electrode array are tied via a wired-OR connection to pass the read request (Req) to the readout block. Row $\mathrm{(Req~Y_{i})}$ and column $\mathrm{(Req~X_{i})}$ request lines from each of the neural recording pixels are connected to the row (Y) and column (X) arbiters, similar to DVS [30]. To realize an AER-like readout and manage collisions efficiently, we use a toggle-tree fair X, Y-arbiter [40] to decide the sequence of readout when pulse read requests are generated by multiple electrodes simultaneously. As in any AER system, time represents itself and events are generated asynchronously. The address $\mathrm{(x_{i},y_{i})}$ and polarity ($\mathrm{p_{i}}$) of the generated pulses (ON/$\mathrm{+1}$ or OFF/$\mathrm{-1}$) are communicated to the next stage. Acknowledgment signals ($\mathrm{Ack~Y_{i}}$ and $\mathrm{Ack~X_{i}}$) are then sent from the X, Y-arbiters to the cell ($\mathrm{x_{i},y_{i}}$) after readout, to reset its amplifier and continue the delta compression.

III-D Event Packaging

The proposed neuromorphic scheme allows the neural data to be encoded as a train of asynchronous binary ON/OFF pulses, instead of producing an n-bit word for each sample at the output of the ADC. Events from the event stream are packetized depending on the mode of transmission before they are transmitted. In this work, we investigate two modes of operation: APM and PCM. In APM, all the generated pulses are transmitted off-chip asynchronously through a wireless link, with the event pulse packetized to contain the address and polarity of the event, $\mathrm{(x_{i},y_{i},p_{i})}$. In contrast, PCM involves electrode-wise accumulation of events within fixed-time bins. If the width of the bin is chosen to be ‘n’ sampling intervals long, then the mode is denoted PCM‘n’ (e.g., PCM1, PCM2, and PCM4 have bin widths equal to 1, 2, and 4 times the traditional sampling interval in neural recording systems, respectively). In PCM, the ON and OFF events generated by the electrode at location $\mathrm{(x_{i},y_{i})}$ are packetized as the ON and OFF event counts, $\mathrm{n_{ON}}$ and $\mathrm{n_{OFF}}$, in the form $\mathrm{(x_{i},y_{i},n_{ON},n_{OFF})}$. For a bin width of duration $\mathrm{t_{b}}$, the ON/OFF events are accumulated in PCM as follows:

	$\displaystyle\mathrm{n_{ON(OFF)}}=\sum_{\mathrm{t_{b}}}\left|p(t)\right|_{p(t)=1(-1)}$		(2)

The number of bits per APM event is $1+\log_{2}\mathrm{N_{r}}+\log_{2}\mathrm{N_{c}}$, whereas the number of bits per PCM event is $\mathrm{n_{b-ON}}+\mathrm{n_{b-OFF}}+\log_{2}\mathrm{N_{r}}+\log_{2}\mathrm{N_{c}}$, where $\mathrm{n_{b-ON}}$ and $\mathrm{n_{b-OFF}}$ are the numbers of bits used to encode the counts $\mathrm{n_{ON}}$ and $\mathrm{n_{OFF}}$, and $\mathrm{N_{r}}\times\mathrm{N_{c}}$ is the size of the electrode array. In both modes, nothing is transmitted when no events are generated. At the receiver end, the electrode signals can be recovered by accumulating the pulses according to their polarity (stair-step reconstruction), or the events can be processed directly using spiking neural networks, for downstream processing such as spike detection, spike sorting, decoding, etc.
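The two packet formats and the accumulation in Eq. (2) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: address widths are rounded up to whole bits (the text uses $\log_{2}$ directly), the count field widths are passed in as hypothetical parameters, and events are assumed to be tuples `(t, x, y, p)` with `t` in the same unit as the bin width.

```python
import math
from collections import defaultdict


def address_bits(n_rows, n_cols):
    """Bits needed to address one electrode (rounded up; an assumption)."""
    return math.ceil(math.log2(n_rows)) + math.ceil(math.log2(n_cols))


def apm_bits_per_event(n_rows, n_cols):
    """APM packet (x, y, p): one polarity bit plus the address."""
    return 1 + address_bits(n_rows, n_cols)


def pcm_bits_per_event(n_rows, n_cols, count_bits_on, count_bits_off):
    """PCM packet (x, y, n_ON, n_OFF): two count fields plus the address."""
    return count_bits_on + count_bits_off + address_bits(n_rows, n_cols)


def pcm_packets(events, bin_width):
    """Accumulate (t, x, y, p) events per electrode and per time bin (Eq. 2).

    Only non-empty bins produce a packet (bin_index, x, y, n_on, n_off).
    """
    counts = defaultdict(lambda: [0, 0])
    for t, x, y, p in events:
        key = (int(t // bin_width), x, y)
        counts[key][0 if p > 0 else 1] += 1
    return [(b, x, y, n_on, n_off)
            for (b, x, y), (n_on, n_off) in sorted(counts.items())]


# Hypothetical event stream: times in units of the sampling interval.
demo_stream = [(0.0, 1, 2, +1), (0.4, 1, 2, +1), (0.6, 1, 2, -1),
               (1.2, 1, 2, -1), (0.3, 0, 0, +1)]
demo_packets = pcm_packets(demo_stream, bin_width=1.0)
print(demo_packets)  # [(0, 0, 0, 1, 0), (0, 1, 2, 2, 1), (1, 1, 2, 0, 1)]
```

For a 100 × 100 array, `apm_bits_per_event(100, 100)` evaluates to 15 under the round-up assumption.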

IV Results

IV-A Simulation Setup

Simulations were performed using the data processing pipeline shown in Fig. 3. In order to incorporate a realistic arbitration delay when simulating larger electrode counts, we simulated a toggle-tree fair arbiter, as introduced in [40], in a $65$ nm CMOS process using Cadence Virtuoso. The arbitration delay ($\mathrm{t_{arb}}$) was estimated to be on the order of a few nanoseconds. This delay was incorporated in the timing of colliding events by delaying transmission of the event by $\mathrm{p_{e}}\times\mathrm{t_{arb}}$, where $\mathrm{p_{e}}$ is the event priority decided by the arbiter. Thus, the estimated arbitration delay was fed to the processing pipeline along with the recorded neural signals and the calibrated thresholds for ON/OFF pulse generation and spike detection ($\mathrm{Th_{ON}}$/$\mathrm{Th_{OFF}}$ and $\mathrm{Th_{SPD}}$, respectively). To evaluate the correctness of neural signal encoding in the form of neural events using the proposed scheme, we recover the signal by following the steps detailed in Section IV-C1, obtaining a recovered signal that approximates the original channel recording. The drift of the recovered signal from the baseline, owing to the open-loop asynchronous behavior of delta modulation, was removed by applying a high-pass filter to the recovered signal. The following subsections elaborate on the neural recording datasets that were used for the simulations in this work, along with the evaluation metrics used and key results.
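The way the arbitration delay enters the event timing can be sketched as below. The sketch makes several assumptions that are not taken from the paper: timestamps are integers in nanoseconds, events collide only when their timestamps are identical, and the priority $\mathrm{p_{e}}$ is simply the position of an event among the colliding ones in a fixed row-then-column order (a stand-in for the toggle-tree fair arbiter of [40]). The value of 5 ns for $\mathrm{t_{arb}}$ is a placeholder for "a few nanoseconds".

```python
from itertools import groupby


def apply_arbitration_delay(events, t_arb_ns):
    """Delay colliding events by p_e * t_arb.

    events   : list of (t_ns, x, y, p) tuples
    t_arb_ns : arbitration delay per served event, in nanoseconds
    Events sharing a timestamp are served in (row y, column x) order;
    the k-th one served (k = 0, 1, ...) is delayed by k * t_arb_ns.
    """
    ordered = sorted(events, key=lambda e: (e[0], e[2], e[1]))
    delayed = []
    for t, group in groupby(ordered, key=lambda e: e[0]):
        for priority, (_, x, y, p) in enumerate(group):
            delayed.append((t + priority * t_arb_ns, x, y, p))
    return delayed


# Two events collide at t = 1000 ns; a third arrives alone at t = 2000 ns.
demo_collisions = [(1000, 3, 0, +1), (1000, 1, 0, -1), (2000, 0, 0, +1)]
demo_delayed = apply_arbitration_delay(demo_collisions, t_arb_ns=5)
print(demo_delayed)  # [(1000, 1, 0, -1), (1005, 3, 0, 1), (2000, 0, 0, 1)]
```

No event is dropped: a collision only shifts the later event in time, by an amount that is small compared with the sampling interval.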

IV-B Dataset

The simulations in this work have been performed for three diverse datasets, ranging from single-channel synthetic data to 100-channel non-human primate (NHP) motor cortex and 384-channel mice visual cortex neural recordings. The details of the datasets used are as follows:

• Synthetic dataset provided in [41] with varying noise levels ($\mathrm{\sigma_{noise}}$ = 0.05, 0.1, 0.15 and 0.2), a sampling frequency of $24$ kHz, and a spike duration of $\approx 1-2$ ms. The dataset also contains ground truth for the spike detection task. For worst-case simulation of large n-channel electrode arrays, the single-channel signal is replicated to all n channels.
• Non-human primate (NHP) motor cortex recordings containing 100 channels sampled at $30$ kHz, previously used in [27, 42] for decoding motor intention. Since no ground truth is available for this dataset, spike detection was performed on the original recorded signal using the absolute threshold method in [41]. The resultant spike detections were used as a ground truth of potential spike samples for performance analysis of the different compression methods investigated in this work.
• Recordings of mice’s visual cortex responses to visual stimuli, acquired with a 384-channel Neuropixel probe [43] at $30$ kHz with $<7~\mu$V RMS noise levels and made available in [44]. Spike detection ground truth containing potential spike times was obtained in the same way as described for the NHP dataset above. Spike signal duration is $<1$ ms in the Neuropixel recording.

IV-C Evaluation Metrics

In order to analyze the effect of neuromorphic compression on iBMI we evaluate the spike information retention i.e., how well the compressed signal preserves spike shape (used in tasks such as spike sorting). This is measured in terms of signal recovery performance. We also show that better tradeoffs may be obtained by using other metrics, such as task-specific performance analysis. To this end, we use spike detection performance to assess the task-specific usability of signal recovered from the compressed data for downstream iBMI tasks, such as spike-based decoding.

IV-C1 Signal Recovery

Signal recovery was performed by stair-step reconstruction of the transmitted event pulses. We recover the signal from APM event packets by adding/subtracting the $\mathrm{Th_{ON}}$/$\mathrm{Th_{OFF}}$ value at the corresponding ON ($\mathrm{+1}$) and OFF ($\mathrm{-1}$) event times. Signal recovery was done for PCM packets by adding/subtracting $\mathrm{Th_{ON}}$/$\mathrm{Th_{OFF}}$ multiplied by the event counts ($\mathrm{n_{ON}}$/$\mathrm{n_{OFF}}$) in the event packets. The recovered signal was then resampled to match the sampling rate of the neural recording, as shown in Fig. 4(a). To evaluate the signal recovery performance, we used two common metrics: root-mean-square error (RMSE) and Pearson’s correlation coefficient (CC). RMSE was normalized to $[0,1]$. These metrics are indicative of spike shape preservation in the event data obtained from the proposed APM/PCM pipeline. A low RMSE and a high CC indicate good spike shape preservation, i.e., the recovered signal closely approximates the original signal and may be assumed to yield good performance in tasks that depend on spike shape, such as spike cell classification/clustering. These metrics were computed between the original signal and the signal recovered from the event pulse packets.
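A minimal sketch of stair-step recovery from APM events and of the two recovery metrics is given below. It assumes events are given as `(sample_index, polarity)` pairs already aligned to the sampling grid, takes $\mathrm{Th_{OFF}}$ as a negative number (per Eq. 4), and omits the high-pass filtering and resampling steps. The RMSE is left un-normalized because the normalization used in the paper is not specified here.

```python
import math


def stair_step(events, n_samples, th_on, th_off):
    """Rebuild a signal by accumulating +th_on / th_off steps at event times.

    events : list of (sample_index, polarity) pairs
    th_off : negative OFF threshold, so adding it steps the signal down
    """
    steps = [0.0] * n_samples
    for i, p in events:
        steps[i] += th_on if p > 0 else th_off
    recovered, level = [], 0.0
    for s in steps:
        level += s
        recovered.append(level)
    return recovered


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson_cc(a, b):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical example: one ON event at sample 2 and one OFF event at sample 6.
demo_recovered = stair_step([(2, +1), (6, -1)], n_samples=7, th_on=1.5, th_off=-1.5)
print(demo_recovered)  # [0.0, 0.0, 1.5, 1.5, 1.5, 1.5, 0.0]
```

For PCM packets the same accumulation applies, with each step scaled by the corresponding event count.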

IV-C2 Spike Detection (SPD)

Evaluation of spike detection performance is done by determining detection accuracy (A), sensitivity (S), and false detection rate (FDR) as shown in Eq. 3 [45].

	$\displaystyle\mathrm{S}=\frac{\mathrm{TP}}{\mathrm{TP+FN}};\quad\mathrm{FDR}=\frac{\mathrm{FP}}{\mathrm{TP+FP}};\quad\mathrm{A}=\frac{\mathrm{TP}}{\mathrm{TP+FP+FN}}$		(3)

Spike detections occurring within a tolerance window of $\mathrm{t_{spike}}\pm\delta_{\mathrm{tolerance}}$ ($\delta_{\mathrm{tolerance}}$ here is about half the spike duration, i.e., $\approx 0.5$ ms) are marked as true positives ($\mathrm{TP}$), spurious detections that are absent in the ground truth are marked as false positives ($\mathrm{FP}$), and missed detections are marked as false negatives ($\mathrm{FN}$). Accuracy ($\mathrm{A}$), sensitivity ($\mathrm{S}$), and false detection rate ($\mathrm{FDR}$) calculated using Eq. (3) are used as metrics for measuring spike detection performance. Accuracy is an overall metric because it takes into account both sensitivity and FDR. A high sensitivity is desired for SPD performance evaluation in firing rate based BMI systems, especially considering the sparsity of spikes. Since the synthetic dataset used contains ground truth for SPD, we performed SPD using both the absolute threshold crossing method (AT-SPD) [41] and the non-linear energy operator (NEO-SPD) [46] method, which involves threshold crossing based detection on the NEO-enhanced signal. In the calibration stage, the spike detection threshold for NEO-SPD was determined as $\mathrm{Th_{SPD}}=8\times\mathrm{median}(\mathrm{NEO}(\mathrm{neuralSignal}))$. Spikes are detected wherever $\mathrm{NEO}(\mathrm{recoveredSignal})>\mathrm{Th_{SPD}}$. For the signal recovered from events of the synthetic dataset, NEO-SPD and AT-SPD yielded similar SPD performance results. Therefore, for simplicity, AT-SPD was used for ‘potential spike times’ ground truth generation from the NHP and Neuropixel datasets, and for the evaluation of SPD performance on the signal recovered from the event streams corresponding to these datasets.
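The metrics of Eq. (3) and the NEO-based threshold can be sketched as follows. The NEO is written in its standard discrete form, $\psi[n]=x[n]^{2}-x[n-1]\,x[n+1]$; the greedy one-to-one matching of detections to ground-truth spikes is an assumption of this sketch, as the exact matching procedure is not detailed in the text. Times are in milliseconds in the example.

```python
from statistics import median


def neo(x):
    """Non-linear energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The two endpoints, where the operator is undefined, are set to 0.
    """
    inner = [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
    return [0.0] + inner + [0.0]


def neo_threshold(x, scale=8.0):
    """Th_SPD = scale * median(NEO(x)), with scale = 8 as in the text."""
    return scale * median(neo(x))


def spd_metrics(detected, truth, tol):
    """Sensitivity, FDR and accuracy of Eq. (3).

    A detection within +/- tol of a not-yet-matched ground-truth spike is a
    true positive (greedy matching; an assumption of this sketch).
    """
    unmatched = sorted(truth)
    tp = fp = 0
    for d in sorted(detected):
        hit = next((t for t in unmatched if abs(d - t) <= tol), None)
        if hit is None:
            fp += 1
        else:
            tp += 1
            unmatched.remove(hit)
    fn = len(unmatched)
    return {
        "S": tp / (tp + fn) if tp + fn else 0.0,
        "FDR": fp / (tp + fp) if tp + fp else 0.0,
        "A": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
    }


# Hypothetical spike times (ms): two hits, one spurious detection, one miss.
demo_metrics = spd_metrics(detected=[10.2, 19.7, 25.0],
                           truth=[10.0, 20.0, 30.0], tol=0.5)
print(demo_metrics["A"])  # 0.5
```

In this example TP = 2, FP = 1 and FN = 1, so S = 2/3, FDR = 1/3 and A = 1/2.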

IV-D Choice of threshold

For APM and PCM, the ON and OFF thresholds were determined from trade-off curves between the threshold and the performance metrics described in Section IV-C. During the calibration stage (a few seconds long), the input-referred pulse generation threshold for each recording channel was determined as a fraction of the spike amplitude $\mathrm{V_{spike-max}}$, as follows:

	$\displaystyle\mathrm{Th_{ON/OFF}}=\pm k\times\mathrm{V_{spike-max}}$		(4)

The factor $k$ was obtained from trade-off curves, as shown in Fig. 4(b), by sweeping $k$ in the range $(0,1)$ in steps of $0.1$ and determining the value of $k$ that balances the performance metrics against the pulse generation rate. As shown in the left plot of Fig. 4(b), for the synthetic dataset, an input-referred threshold obtained with $k=\pm 0.3$ was found to provide a good trade-off between data rate and signal recovery. As expected, spike detection required a less stringent threshold, obtained with $k=\pm 0.45$. $\mathrm{Th_{ON/OFF}}$ is a parameter of the asynchronous delta modulator and therefore affects the pulse generation rate. A larger $\mathrm{Th_{ON/OFF}}$ results in a lower data rate. However, it also results in a coarser reconstruction of the signal from the pulse data and in suppression of samples whose values are below the threshold, degrading signal recovery (affecting CC and RMSE) and spike detection performance (affecting A and S). Similar trade-off curves were obtained for the NHP and Neuropixel datasets, and $k=\pm 0.3$ yielded better performance without a significant increase in the data rate, as shown in Fig. 4(d-e). The more stringent threshold for signal recovery is used to report the results presented in the following subsections.
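The data-rate side of this trade-off can be illustrated with a small sweep over $k$. The sketch below is only indicative: it uses a hypothetical triangular "spike" rather than any of the datasets, a simplified reset-to-baseline delta modulator with unity gain, and it reports only the event count (a proxy for data rate), not the recovery or detection metrics that the actual trade-off curves also include.

```python
def count_events(signal, th_on, th_off):
    """Number of ON/OFF events emitted by a reset-to-baseline delta modulator."""
    v_mod, n_events = 0.0, 0
    for prev, cur in zip(signal, signal[1:]):
        v_mod += cur - prev
        if v_mod > th_on or v_mod < th_off:
            n_events += 1
            v_mod = 0.0
    return n_events


def sweep_threshold_factor(signal, v_spike_max, k_values):
    """Event count for each k, with Th_ON/OFF = +/- k * V_spike-max (Eq. 4)."""
    return {k: count_events(signal, k * v_spike_max, -k * v_spike_max)
            for k in k_values}


# Hypothetical triangular "spike" with peak amplitude 10 (arbitrary units).
spike = [float(v) for v in list(range(0, 11)) + list(range(9, -1, -1))]
k_values = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
counts = sweep_threshold_factor(spike, v_spike_max=10.0, k_values=k_values)
for k in k_values:
    print(f"k = {k:.1f}: {counts[k]} events")
```

The event count falls as $k$ grows, which mirrors the statement that a larger threshold lowers the data rate at the cost of a coarser reconstruction.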

IV-E Data rates for the different modes

The theoretical model for the estimation of transmission data rate (TDR) in terms of the firing rate of the biological neuron, for the architecture studied in this work is:

	$\displaystyle\mathrm{TDR}=(\mathrm{N_{r}}\times\mathrm{N_{c}})\times\mathrm{R_{p}}\times(\mathrm{n_{b}}+\log_{2}\mathrm{N_{r}}+\log_{2}\mathrm{N_{c}})$
	$\displaystyle\mathrm{R_{p}}=\begin{cases}\mathrm{f_{neu}}\times\mathrm{N_{AP}}+\mathrm{R_{noise}},&\text{for APM}\\ \mathrm{R_{bin}},&\text{for PCM}\\ \mathrm{f_{neu}}\times\mathrm{N_{spike}},&\text{for SPDWOR}\\ \mathrm{f_{s}},&\text{for full-sample transmission [10, 11, 12]}\end{cases}$
	$\displaystyle\mathrm{n_{b}}=\begin{cases}1,&\text{for APM}\\ \mathrm{n_{b-ON}}+\mathrm{n_{b-OFF}},&\text{for PCM}\\ \mathrm{b_{ADC}},&\text{for SPDWOR and full-sample transmission [10, 11, 12]}\end{cases}$		(5)

where $\mathrm{R_{p}}$ is the sample/pulse generation rate for an $\mathrm{N_{r}}\times\mathrm{N_{c}}$ array, $\mathrm{n_{b}}$ is the number of bits per pulse/sample, $\mathrm{f_{neu}}$ is the biological spike firing rate, $\mathrm{N_{AP}}$ is the average number of pulses generated per spike, and $\mathrm{R_{noise}}$ is the pulse generation rate corresponding to non-spike samples. $\mathrm{R_{bin}}$ is the rate of non-empty event count bins in PCM, which depends on the number of bins per second (denoted $\mathrm{n_{b}}$ in Eq. 6, not to be confused with the bits per pulse in Eq. 5) and the probability of a non-empty bin ( $\alpha_{\mathrm{b}}$ ), and is related as follows:

$\displaystyle\mathrm{R_{bin}}=\alpha_{\mathrm{b}}\times\mathrm{n_{b}}$	(6)

$\alpha_{\mathrm{b}}$ in Eq. 6 represents the sparsity of PCM event counts. SPDWOR involves transmitting $\mathrm{b_{ADC}}$ bits for $\mathrm{f_{neu}}\times\mathrm{N_{spike}}$ spike samples, where $\mathrm{N_{spike}}$ is the number of samples per spike. The theoretical model was validated against several seconds of synthetic data and was found to match to within $\pm 5\%$ .
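The model of Eqs. 5 and 6 can be evaluated directly, as in the following sketch. The function names, the mode labels, and all numeric inputs (array size, sampling rate, ADC resolution, pulses per spike) are placeholders chosen for illustration and are not values reported in this work; the address term uses $\log_{2}$ without rounding, exactly as written in Eq. 5.

```python
import math


def pulse_rate(mode, f_neu=0.0, n_ap=0.0, r_noise=0.0,
               alpha_b=0.0, bins_per_s=0.0, n_spike=0.0, f_s=0.0):
    """R_p from Eq. 5: events (or samples) per second per channel."""
    if mode == "APM":
        return f_neu * n_ap + r_noise
    if mode == "PCM":
        return alpha_b * bins_per_s        # R_bin from Eq. 6
    if mode == "SPDWOR":
        return f_neu * n_spike
    if mode == "FULL":                     # full-sample readout
        return f_s
    raise ValueError(f"unknown mode: {mode}")


def tdr_bps(n_rows, n_cols, r_p, n_bits):
    """TDR from Eq. 5 in bits per second (payload bits + address bits)."""
    address_bits = math.log2(n_rows) + math.log2(n_cols)
    return n_rows * n_cols * r_p * (n_bits + address_bits)


# Placeholder inputs for illustration only (not taken from the paper).
rows = cols = 128
apm = tdr_bps(rows, cols, pulse_rate("APM", f_neu=60, n_ap=6), n_bits=1)
full = tdr_bps(rows, cols, pulse_rate("FULL", f_s=20_000), n_bits=10)
print(f"APM: {apm / 1e6:.1f} Mbps, full-sample: {full / 1e6:.1f} Mbps")
```

Because the address bits are added to every transmitted pulse, the per-event cost grows with the array size even when the payload is a single bit.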

IV-F Compression ratio

The extent of compression for the proposed method is evaluated by computing the compression ratio (CR), which is defined as the ratio of the transmission data rate for full sample transmission ( $\mathrm{TDR_{fs}}$ ) as done in [10, 11, 12] to the transmission data rate of spike-sample transmission ( $\mathrm{TDR_{spk}}$ ) as done in SPDWOR [25] or that of the proposed PCM/APM ( $\mathrm{TDR_{APM/PCMn}}$ ). The CRs for different transmission modes (APM and PCM) were computed using the following equations:

$\displaystyle\mathrm{CR1}=\frac{\mathrm{TDR_{fs}}}{\mathrm{TDR_{spk}}};\quad\mathrm{CR2}=\frac{\mathrm{TDR_{fs}}}{\mathrm{TDR_{APM}}};$
$\displaystyle\mathrm{CR3}=\frac{\mathrm{TDR_{fs}}}{\mathrm{TDR_{PCM1}}};\quad\mathrm{CR4}=\frac{\mathrm{TDR_{fs}}}{\mathrm{TDR_{PCM4}}}$	(7)

The CR comparison plots shown in Fig. 5(a-d) were obtained for the synthetic dataset by sweeping the firing rate $\mathrm{f_{neu}}$ of the channel and linearly extrapolating the compression ratio across $\mathrm{f_{neu}}$ at different background noise levels. As expected, $\mathrm{TDR_{APM/PCMn}}$ increases with increasing noise levels (due to the additional events generated by background noise), which results in a drop in the CR. The higher the $\mathrm{f_{neu}}$ , the higher the APM/PCM event generation rate ( $\mathrm{R_{p}}$ in Eq. 5), so the CR drops as $\mathrm{f_{neu}}$ increases. An ideal SPDWOR-like implementation transmits all the samples (8-12 bits per sample besides the address bits) corresponding to each of the spikes, irrespective of the change in the signal value compared to the previously recorded sample. In contrast, APM transmits 1 bit and PCM transmits $\mathrm{n_{b-ON}}+\mathrm{n_{b-OFF}}$ (typically $<12$ ) bits only when the signal exceeds $\mathrm{Th_{ON/OFF}}$ , thereby transmitting fewer bits per spike than SPDWOR. Therefore, at neural firing rates of $\approx 30-60$ Hz, the CR of the proposed method is $\approx 20-50$ , which is $6-8\times$ that of SPDWOR [25], as shown in Fig. 5(e). Even though the CR of the proposed methods decreases with increasing electrode count (Fig. 5(f)) because more bits are needed to encode the address, it is still $\approx 3\times$ better than [25] for $10$ K channels. For the 100-channel NHP dataset with an average neural firing rate per channel of $\approx 62$ Hz, CR1 = $3.23$ , CR2 = $25.2$ , CR3 = $15.4$ , and CR4 = $850$ were obtained, which is consistent with the estimated CR in Fig. 5(f).
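Since all ratios in Eq. 7 share the numerator $\mathrm{TDR_{fs}}$ , the reported NHP compression ratios can be cross-checked against the data rates listed for the NHP rows of Table I. The short sketch below does this arithmetic, under the assumption that the DR column of Table I is the TDR entering Eq. 7; the variable names are illustrative.

```python
# Reported values for the 100-channel NHP dataset.
cr2 = 25.2                 # CR2 = TDR_fs / TDR_APM
dr_apm_mbps = 1.19         # Table I, NHP, APM
dr_pcm1_mbps = 1.95        # Table I, NHP, PCM1
cr4_reported = 850         # CR4 = TDR_fs / TDR_PCM4

# Eq. 7 rearranged: TDR_fs = CR2 * TDR_APM.
tdr_fs_mbps = cr2 * dr_apm_mbps

# CR3 implied by the same TDR_fs and the PCM1 data rate.
cr3_implied = tdr_fs_mbps / dr_pcm1_mbps

# PCM4 data rate implied by the reported CR4.
dr_pcm4_implied_mbps = tdr_fs_mbps / cr4_reported

print(f"TDR_fs ~ {tdr_fs_mbps:.2f} Mbps")
print(f"implied CR3 ~ {cr3_implied:.1f}")
print(f"implied PCM4 rate ~ {dr_pcm4_implied_mbps:.3f} Mbps")
```

Under this assumption the implied CR3 rounds to the reported $15.4$ , and the PCM4 rate implied by CR4 rounds to the $0.04$ Mbps shown in Table I.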

TABLE I: Analysis of the effect of neuromorphic compression. The best results for each of the criteria are highlighted in boldface.

Threshold = $\pm 0.3\times\mathrm{V_{spike-max}}$
Dataset	#Channels	Mode	RMSE	CC	A (%)	S (%)	FDR (%)	DR (Mbps)
Synthetic ( $\mathrm{\sigma_{noise}}$ = 0.05)	10K	APM	0.1054	0.923	92.82	92.82	0	76.75
		PCM1	0.1236	0.8937	99.31	99.86	0.55	80.22
		PCM2	0.13	0.8827	99.04	99.57	0.53	78
		PCM4	0.7268	0.2681	71.23	71.81	1.13	63.59
Synthetic ( $\mathrm{\sigma_{noise}}$ = 0.1)	10K	APM	0.1196	0.9077	93.58	93.58	0	81.12
		PCM1	0.1386	0.8752	97.68	99.72	2.06	85.51
		PCM2	0.1401	0.8729	96.71	97.93	1.27	83.19
		PCM4	0.4165	0.4506	70.47	71.17	1.37	67.19
Synthetic ( $\mathrm{\sigma_{noise}}$ = 0.15)	10K	APM	0.1368	0.8903	93.99	94.09	0.12	114.22
		PCM1	0.1559	0.8561	95.53	96.45	0.99	124.86
		PCM2	0.156	0.8563	89.15	90.32	1.43	126.25
		PCM4	0.2344	0.7117	71.84	72.88	1.95	108.66
Synthetic ( $\mathrm{\sigma_{noise}}$ = 0.2)	10K	APM	0.1459	0.892	92.02	93.23	1.4	178.24
		PCM1	0.1705	0.8499	89.5	90.38	1.08	202.29
		PCM2	0.1696	0.8521	80.29	81.3	1.52	212.26
		PCM4	0.2468	0.7183	71.19	71.91	1.38	192.54
NHP	100	APM	0.0983	0.8997	90.28	96.91	6.7	1.19
		PCM1	0.1318	0.8515	87.32	93.2	8	1.95
		PCM2	0.1443	0.8234	86.51	92.58	0.08	0.36
		PCM4	0.2311	0.7422	65.01	88.38	0.29	0.04
Neuropixel	384	APM	0.0881	0.8981	84.94	89.45	5.54	6.58
		PCM1	0.088	0.9005	74.39	91.04	19.63	10.99
		PCM2	0.1186	0.854	71.36	89.63	22.15	2.03
		PCM4	0.1922	0.7201	57.86	87.97	36.86	1.83
Neuropixel	100	APM	0.0808	0.8389	88.44	94.36	6.53	0.25
		PCM1	0.0678	0.9101	72.58	91.2	21.88	0.46
		PCM2	0.0979	0.8571	72.28	89.62	21.05	0.48
		PCM4	0.2119	0.6238	63.39	86.17	28.93	0.42

IV-G Effect of compression

As discussed in Section IV-C, signal recovery and spike detection metrics are used to understand the effect of compression. Table I summarizes the effect of neuromorphic compression on large iBMI for the two modes of pulse transmission, APM and PCM, with PCM bin widths that are 1, 2, and 4 times the sampling interval used in conventional neural signal transmission (PCM1, PCM2, and PCM4), for all three datasets introduced earlier. On average, about 90% of the spike shape (CC $\approx 0.9$ ) is recovered from APM, whereas the recovered signal degrades with increasing levels of compression in PCM, as seen in the RMSE and CC columns of Table I. While PCM4 ensures a lower TDR than APM, it comes at the cost of spike shape recovery. The higher the level of PCM, the coarser the event generation and the recovery of the signal.
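For reference, the two signal recovery metrics can be computed as in the sketch below. It assumes the standard definitions of root-mean-square error and the Pearson correlation coefficient; the exact normalization used in Section IV-C is not restated here, and the two short waveforms are made-up examples rather than data from Table I.

```python
import math


def rmse(original, recovered):
    """Root-mean-square error between two equal-length signals."""
    n = len(original)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, recovered)) / n)


def correlation_coefficient(original, recovered):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(original)
    mean_o = sum(original) / n
    mean_r = sum(recovered) / n
    cov = sum((a - mean_o) * (b - mean_r) for a, b in zip(original, recovered))
    var_o = sum((a - mean_o) ** 2 for a in original)
    var_r = sum((b - mean_r) ** 2 for b in recovered)
    return cov / math.sqrt(var_o * var_r)


# Hypothetical normalized spike and a coarser recovered version of it.
original = [0.0, 0.2, 1.0, 0.3, 0.0]
recovered = [0.0, 0.3, 0.9, 0.3, 0.0]
print(rmse(original, recovered), correlation_coefficient(original, recovered))
```

RMSE penalizes amplitude errors directly, while CC is insensitive to a uniform scaling of the recovered waveform, which is why the two columns of Table I are reported together.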

Spike detection is a typical but significant task in firing-rate based iBMI systems. Therefore, it is necessary to assess whether the proposed APM/PCM effectively captures the spike-time information. We do this by performing spike detection on the signal recovered from the APM/PCM events, comparing the detections with the ground truth, and computing the metrics (A, S and FDR) described in Section IV-C2. It can be observed that the sensitivity of spike detection is not significantly affected by the poor spike shape recovery. A higher sensitivity ensures that spikes are not missed, especially those closer to the noise level, and this factor is all the more significant in noisy channels. Spike detection on signals recovered from APM, PCM1, and PCM2 results in very high sensitivity. As expected, spike detection performance, particularly FDR and accuracy, degrades with increasing noise ( $\sigma_{\mathrm{noise}}$ from $0.05$ to $0.2$ ) and compression (APM to PCM4).
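The detection metrics can be sketched from true-positive, false-positive, and false-negative counts as follows. The formulas below are commonly used definitions assumed here for illustration (Section IV-C2 holds the authoritative ones); with these forms, the A column of the synthetic rows of Table I is reproduced from the S and FDR columns to within rounding. The counts in the example are hypothetical.

```python
def detection_metrics(tp, fp, fn):
    """Return (accuracy, sensitivity, false detection rate) as fractions.

    Assumed definitions:
      sensitivity S = TP / (TP + FN)
      FDR           = FP / (TP + FP)
      accuracy A    = TP / (TP + FP + FN)
    """
    sensitivity = tp / (tp + fn)
    fdr = fp / (tp + fp)
    accuracy = tp / (tp + fp + fn)
    return accuracy, sensitivity, fdr


# Hypothetical counts: 10,000 true spikes, 14 missed, 55 false detections.
acc, sens, fdr = detection_metrics(tp=9986, fp=55, fn=14)
print(f"A = {100 * acc:.2f}%, S = {100 * sens:.2f}%, FDR = {100 * fdr:.2f}%")
```

With no false detections the accuracy equals the sensitivity, which matches the rows of Table I where FDR is zero.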

For the NHP and Neuropixel datasets, the evaluation of performance metrics relied on how ‘potential spikes’ were defined in the absence of ground truth. We determined the absolute threshold for spike detection, set a few times higher than the noise margin as typically done in AT-SPD, and recorded the times of the positive and negative spike-detection threshold ( $\mathrm{Th_{SPD}}$ ) crossings as potential spike times. Despite this constraint, the sensitivity of SPD (above $86\%$ in all modes) and the CC (above $0.82$ for APM, PCM1, and PCM2) remain high for the NHP and Neuropixel datasets in Table I. We found that the spike information (spike time and shape) is well-preserved in the proposed APM/PCM. We leave spike detection directly using the generated APM/PCM events as future work.

IV-H Collision Analysis

Simulations performed in this work using the synthetic datasets inherently test the robustness of the neuromorphic compression pipeline, owing to the replication of a single-channel signal to $\mathrm{10K}$ channels. This creates a worst-case scenario where events are simultaneously generated by each of the $\mathrm{10K}$ channels, requiring arbitration without loss of events. Thanks to the sparsity of spikes and the sampling interval, the arbiter has sufficient time to complete the AER handshaking and read out the events in time without dropping any event, which is evident from the high CC and S recorded in Table I. The AER-based read-out process, however, introduces a small delay owing to the arbitration time in case of collision, as shown in Fig. 6(a). For the NHP dataset, it was determined that an average of $10.0748\pm 3.5741$ channels collide in the recording, with $0-74$ collisions at any instant. Similarly, for the Neuropixel dataset, $29.5113\pm 33.51$ channels collide, with $\approx 5$ of them carrying spike samples. There are $0-321$ collisions at any instant in the Neuropixel recording. For the NHP and Neuropixel datasets, $\approx 25\%$ of events correspond to spike samples undergoing collision. If a spike-sample-only transmission scheme with an SPDWOR-like read-out were used in scenarios like this, it would result in the loss of the spike samples that undergo collision. While SPDWOR uses collisions to its advantage to suppress samples corresponding to the background activity, it also loses some samples corresponding to the spike. Reference [25] reports a spike recovery of $80-90\%$ , measured in terms of spike sorting performance, which indicates that over a tenth of the spike information is lost. The spike sample loss using the SPDWOR scheme might be more prominent in high-density neural probes that exhibit a high spatiotemporal correlation among the neighboring recording sites/channels [47].
Owing to the inherent noise sample suppression of the ADM and the collision handling mechanism of the AER-arbiter, the loss of spike samples is negligible for the proposed APM/PCM pipeline. This can be inferred from Fig. 6(b), where the change in RMSE due to collision is negligible and mainly due to the delay in the sample.
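A collision count of the kind reported above could be obtained from an event list as in the following sketch. It assumes that a collision means two or more distinct channels producing an event within the same sampling interval, which may differ from the exact criterion used for Fig. 6; the function name and the toy event list are illustrative.

```python
from statistics import mean


def colliding_channels_per_step(events):
    """Map each time step to the number of distinct channels that fired in it,
    keeping only steps where more than one channel fired (a collision).

    `events` is an iterable of (time_step, channel) pairs.
    """
    channels_at = {}
    for t, ch in events:
        channels_at.setdefault(t, set()).add(ch)
    return {t: len(chs) for t, chs in channels_at.items() if len(chs) > 1}


# Toy event stream: three channels collide at step 0, two at step 2.
events = [(0, 1), (0, 2), (0, 3), (1, 5), (2, 4), (2, 7)]
collisions = colliding_channels_per_step(events)
print(collisions, mean(collisions.values()))
```

In an AER readout, each colliding event would still be transmitted after arbitration, so this count indicates how much queuing delay to expect rather than how many events are lost.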

V Discussion

V-A Scalability simulation and worst-case analysis

In the simulations for the APM/PCM pipeline using the synthetic dataset, it is to be noted that the CRs reported are for the worst-case scenario where events are simultaneously generated in each of the $\mathrm{10K}$ channels and the AER-arbiter handles the collisions. The worst-case analysis is a by-product of simulations with single-channel synthetic data of varying noise levels copied to $\mathrm{10K}$ channels. We noticed only a small difference in the signal recovery performance due to the arbitration delay in the proposed pipeline. Since we did not implement SPDWOR, we estimate its data rate based on the assumption that, under ideal conditions, SPDWOR captures all the samples corresponding to the spike. The work in [25] demonstrates the scalability of the wired-OR readout for up to a 512-channel electrode array with about $80\%$ spike recovery; however, the increase in wiring complexity and the effect of collisions on the quality of spike information captured for higher electrode counts are still unknown.

In order to analyze the scalability of the proposed neuromorphic compression scheme for the real recordings, we take the subset of $100$ channels with the highest firing rates from the 384-channel Neuropixel dataset. As seen in the last few rows of Table I, the TDR for APM/PCM drops when the number of channels is limited to the 100 high firing-rate channels. We notice that the TDR increases significantly for APM and PCM1 when scaled from 100 to 384 channels of the Neuropixel dataset, while the increase is not as high for PCM2 and PCM4. The probability of non-empty bins (studied further in Section V-B) is higher for PCM2 and PCM4; as a result, the TDR does not change dramatically with scaling. On examining the cause for the higher TDR in APM/PCM1 with scaling, we identified that a significant number of the events generated by an ADM in APM/PCM with a single fixed threshold ( $\mathrm{Th_{ON/OFF}}$ with $k=0.3$ ) correspond to background activity, resulting in a higher TDR (as shown in green in Fig. 8(a)). In order to reduce the TDR and improve the CR, it is necessary to choose $\mathrm{Th_{ON/OFF}}$ such that the ADM generates events only for the spike and suppresses background activity in the channel. In Section V-C we explore some schemes to reduce the TDR and improve the CR by suppressing event generation due to background activity.

TABLE II: Sparsity $\alpha_{b}$ of PCM events

$\sigma_{\mathrm{noise}}$	PCM1	PCM2	PCM4
0.05	0.0256	0.0407	0.0615
0.1	0.03625	0.06165	0.1013
0.15	0.0649	0.11855	0.21045
0.2	0.09505	0.17765	0.31255

V-B Sparsity of events

While transmitting in PCM, as opposed to APM, seems to yield a better CR for lower noise and thresholds, as seen from the results in the previous sections, the improvement in compression does not change dramatically at higher noise levels. To understand this effect, Table II presents the sparsity of PCM events, in other words, the probability of non-zero event counts in PCM bins ( $\alpha_{b}$ in Eq. 6). It can be seen that there is a higher probability of non-empty bins in PCM at higher noise and for lower pulse generation thresholds, which results in a higher TDR and therefore a lower CR. Owing to the design of the asynchronous delta modulator ADC (ADM-based pulse generator) used in this work, a fast-rising or falling input signal such as a spike results in dense pulse generation, which translates to a higher APM event rate and higher PCM event counts per bin, whereas a smooth/flat input signal results in few or no event pulses. Thus, the sparsity of APM events or non-zero event counts depends on the choice of the ON/OFF pulse generation threshold.
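The sparsity $\alpha_{b}$ can be estimated from the event times of a channel by counting how many PCM bins contain at least one event. The sketch below assumes events are given as sample indices and that the bin width is an integer number of sampling intervals (1, 2, or 4 for PCM1, PCM2, and PCM4); the function name and the toy event list are illustrative.

```python
def nonempty_bin_fraction(event_samples, n_samples, bin_width):
    """Estimate alpha_b: the fraction of PCM bins holding at least one event.

    event_samples : sample indices at which events occurred
    n_samples     : total number of samples in the recording
    bin_width     : bin width in samples (e.g. 1, 2, 4)
    """
    n_bins = -(-n_samples // bin_width)          # ceiling division
    occupied = {s // bin_width for s in event_samples}
    return len(occupied) / n_bins


# Toy channel: a short burst of events plus one isolated event in 80 samples.
events = [3, 4, 5, 40]
for width in (1, 2, 4):
    print(width, nonempty_bin_fraction(events, n_samples=80, bin_width=width))
```

Wider bins merge the events of a burst into fewer bins, but the number of bins shrinks faster, so the fraction of non-empty bins rises with the bin width, in line with the trend across the columns of Table II.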

TABLE III: Comparison of characteristics and implementation of different iBMI neural recording systems

Compression Technique/Factor

Full-sample recording [10, 11, 12]

Spike-sample recording [47]

On-chip spike sorting/decoding [20, 21, 22, 23, 24, 27]

SPDWOR [25]

This work

Spike shape preservation

Circuit complexity and wiring

Moderate

High

Low

Collision handling

Scalability (No. of Channels)

384 (Neuropixel), 2048 (Argo)

256

64-384

512

10K (Synthetic), 384 (Neuropixel)

Compression ratio (CR)

2-40

>240

2-20

20-100

V-C Further Compression

TABLE IV: Comparison of CR for APM using a dual threshold Asynchronous Delta Modulator at firing rate $\mathrm{f_{AP}}=60$ Hz

$\sigma_{\mathrm{noise}}$	$k_{1}$	$k_{2}$	TDR (Mbps)	CR2	CR2:CR1
0.05	0.6	0.1	199.9102	13	6
0.05	0.6	0.3	57.4173	42	18
0.1	0.6	0.1	204.856	12	5
0.1	0.6	0.3	60.2111	40	17
0.15	0.6	0.1	214.2423	12	5
0.15	0.6	0.3	63.0822	39	16
0.2	0.6	0.1	225.3674	11	5
0.2	0.6	0.3	66.2197	37	16

A good threshold for the ADM is critical in ensuring higher compression. Higher thresholds, i.e., larger values of the factor $k$ , can ensure that background activity causes no pulse/event generation and improve overall compression; however, the spike shape captured might be coarser, resulting in poor recovery. An adaptive delta modulator as introduced in [36], or a delta modulator with a two-level threshold (one for detecting the spike and the other for finer asynchronous sampling of the spike), may be implemented.

Fig. 7(a) presents a logical block diagram for a simple dual-threshold ADM with two sets of thresholds, $\mathrm{Th_{High}}$ and $\mathrm{Th_{Low}}$ . In this approach, the ADM initially operates with the higher threshold until the input signal level exceeds $\mathrm{Th_{High}}$ , and then switches to the lower threshold $\mathrm{Th_{Low}}$ for a fixed time $T_{timer}$ determined by the timer (typically the spike duration, $\approx 1-3$ ms) to capture the spike shape with a finer step size. When the timer elapses, the ADM switches back to asynchronous sampling with $\mathrm{Th_{High}}$ . Following the notation used in Eq. 4, $\mathrm{Th_{High}}$ and $\mathrm{Th_{Low}}$ are adjusted by varying the factors $k_{1}$ and $k_{2}$ respectively. Table IV summarizes the observations for two different values of $k_{2}$ ( $k_{2}=0.1~\text{and}~0.3$ ), a fixed $T_{timer}=1$ ms, and a fixed higher $k_{1}=2\times k=0.6$ (with $k=0.3$ as discussed in Section IV-D). $k_{2}=0.3$ results in a higher CR ( $\approx 2\times$ that of the single-threshold ADM) and $16-18\times$ higher compression than SPDWOR. As shown in Fig. 8(a), a dual-threshold ADM captures the spike shape well depending on the chosen $k_{2}$ and prevents the spurious events from background activity that are captured by a single-threshold ADM.
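The switching behaviour described for the dual-threshold ADM can be sketched in software as follows. This is a sampled, behavioural model under stated assumptions: the timer is expressed in samples, the first crossing of $\mathrm{Th_{High}}$ is itself emitted as an event, and the function name, the integer amplitude units, and the example waveform are hypothetical. It is not the logic of Fig. 7(a) at circuit level.

```python
def dual_threshold_adm(signal, th_high, th_low, timer_samples):
    """Behavioural model of a dual-threshold delta modulator.

    The modulator compares against th_high until a crossing occurs, then uses
    th_low for `timer_samples` samples (including the triggering sample)
    before reverting to th_high. Returns a list of (sample_index, polarity).
    """
    events = []
    ref = signal[0]
    fine_left = 0                      # samples remaining in the fine window
    for i in range(1, len(signal)):
        x = signal[i]
        th = th_low if fine_left > 0 else th_high
        while abs(x - ref) >= th:
            polarity = 1 if x > ref else -1
            ref += polarity * th
            events.append((i, polarity))
            if fine_left == 0:         # Th_High crossed: start the timer
                fine_left = timer_samples
                th = th_low
        if fine_left > 0:
            fine_left -= 1
    return events


# Hypothetical waveform in integer amplitude units: noise, a spike, noise.
signal = [0, 12, -9, 11, 70, 100, 40, 0, 12, -9, 11]
dual = dual_threshold_adm(signal, th_high=60, th_low=10, timer_samples=4)
single = dual_threshold_adm(signal, th_high=10, th_low=10, timer_samples=0)
print(len(dual), len(single))
```

In this toy run the dual-threshold model emits events only during the spike, whereas the single-threshold configuration (both thresholds equal) also responds to the background samples before and after it.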

Another approach for increasing compression and suppressing background activity events is presented in Fig. 7(b), where event-based temporal filters may be employed following a single-ADM neuromorphic compression based readout to limit transmission to the APM/PCM events corresponding to spikes only. Leveraging the sparsity of events and the pulse generation pattern discussed in Section V-B, the event filters may track the event rate (for APM) or event counts (for PCM) within a temporal neighborhood to determine whether the current event corresponds to a spike or not. An event from a channel may be said to result from a spike if it is preceded by a dense spike train in the near past from the same channel. A low-complexity event-based spike detector may thus be realized as a byproduct of the event filters. However, implementation of this approach is left for future work. With an average of $4-8$ events per spike, the filtered event stream is estimated to result in up to $8\times$ more compression per channel. Thus, with future improvements to the proposed pipeline to transmit events corresponding to detected spikes only, a CR of about $50-100$ can be achieved for a mean neural firing rate of $\mathrm{f_{AP}=60}$ Hz.
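One possible form of such an event filter is sketched below: an event is kept only if enough earlier events from the same channel fall within a short preceding window. Since the implementation is left for future work in the text, everything here is an assumption made for illustration, including the function name, the window and count parameters, and the toy event stream.

```python
from collections import defaultdict, deque


def temporal_event_filter(events, window, min_prior):
    """Keep an event only if at least `min_prior` earlier events (in stream
    order) from the same channel have a timestamp >= t - window.

    `events` is a time-sorted iterable of (t, channel) pairs.
    """
    recent = defaultdict(deque)        # per-channel timestamps inside window
    kept = []
    for t, ch in events:
        q = recent[ch]
        while q and q[0] < t - window:
            q.popleft()                # drop events older than the window
        if len(q) >= min_prior:
            kept.append((t, ch))
        q.append(t)
    return kept


# Toy stream: isolated background events on channel 1, then a dense burst.
events = [(0, 1), (50, 1), (100, 1), (101, 1), (102, 1), (103, 1), (200, 2)]
kept = temporal_event_filter(events, window=5, min_prior=2)
print(kept)
```

A filter of this form rejects isolated background events but also drops the first few events of every burst, so the choice of `window` and `min_prior` trades compression against how much of the spike onset is preserved.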

V-D Hardware implementation

We do not present a circuit simulation of the Asynchronous Delta Modulator since such schemes have been published in several papers such as [34] and, more recently, [35, 36]. Unlike SPDWOR, no expensive wiring is needed to handle collision scenarios, which are inherently handled by the AER-based readout strategy in our proposed pipeline. Neuromorphic circuits are known for low power consumption, and thus a neuromorphic compression based iBMI system is expected to consume far less power than conventional neural recording systems with full-sample transmission. Based on the hardware measurements from [35], which implements ADM-based pulse transmission in $40$ nm CMOS technology, the ADM consumes $\approx 7~\mu$ W per channel. A neuromorphic compression based readout as proposed in this work, operating in low-power mode, can be estimated to result in a surface power density of $\approx 4.36$ mW/cm², which lies well within the power dissipation budget of $\approx 80$ mW/cm² for iBMI [14].

V-E Comparison with other works

A summary of the high-level comparison of the proposed neuromorphic compression based neural sensing with prior works is presented in Table III. The proposed pipeline is a low-complexity, scalable, and high-compression neural recording system that preserves spike shape by handling collision scenarios effectively without the need for additional wiring. High compression may be obtained by recording systems that capture spike times only or perform on-chip decoding; however, this restricts them to a limited set of iBMI tasks.

VI Conclusion

Neural electrode scaling for Nx-iBMI is severely limited by data rate and power budget. In this work, we quantitatively evaluate the effectiveness and extent of a neuromorphic compression based neural sensing architecture for iBMI, inspired by an asynchronous event detection image sensor such as the dynamic vision sensor. Transmission of pulses in APM or PCM results in compression ratios that are $2-5$ times that of transmitting spike samples as in [25] and $15-20$ times that of full sample transmission as in conventional CMOS-image sensor like readout architecture. We show that even with such high compression using the proposed neuromorphic architecture, there is $\approx 90\%$ similarity of the recovered spike shape while ensuring a spike detection accuracy of more than $92\%$ and a negligible false detection rate. A very high spike detection accuracy was obtained even for higher compression ratios of $5-50$ , thus demonstrating that neuromorphic compression based neural sensing systems can perform iBMI tasks well while lowering the data rate and the transmission power of Nx-iBMI. Transmission of event pulses corresponding to spikes alone can boost the compression ratio further to $50-100$ times that of full sample transmission and $5-18$ times more compression than spike sample transmission. Future work will analyze the effects of neuromorphic compression on spike sorting and motor decoding tasks in large Nx-iBMI, and explore event-based processing for spike detection and iBMI decoding using spiking neural networks.

Acknowledgment

The work described in this paper was partially supported by the Singapore Ministry of Education Academic Research Fund Tier 2 grant (MOE-T2EP20220-0002) and the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CityU 11200922).

References

[1] M. S. Beauchamp, D. Oswalt, P. Sun, B. L. Foster, J. F. Magnotti, S. Niketeghad, N. Pouratian, W. H. Bosking, and D. Yoshor, “Dynamic stimulation of visual cortex produces form vision in sighted and blind humans,” Cell, vol. 181, no. 4, pp. 774–783.e5, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0092867420304967
[2] F. R. Willett, E. M. Kunz, C. Fan, D. T. Avansino, G. H. Wilson, E. Y. Choi, F. Kamdar, M. F. Glasser, L. R. Hochberg, S. Druckmann, K. V. Shenoy, and J. M. Henderson, “A high-performance speech neuroprosthesis,” Nature, vol. 620, no. 7976, pp. 1031–1036, Aug 2023. [Online]. Available: https://doi.org/10.1038/s41586-023-06377-x
[3] S. N. Flesher, J. E. Downey, J. M. Weiss, C. L. Hughes, A. J. Herrera, E. C. Tyler-Kabara, M. L. Boninger, J. L. Collinger, and R. A. Gaunt, “A brain-computer interface that evokes tactile sensations improves robotic arm control,” Science, vol. 372, no. 6544, pp. 831–836, 2021. [Online]. Available: https://www.science.org/doi/abs/10.1126/science.abd0380
[4] K. Shen, O. Chen, J. L. Edmunds, D. K. Piech, and M. M. Maharbiz, “Translational opportunities and challenges of invasive electrodes for neural interfaces,” Nature Biomedical Engineering, vol. 7, no. 4, pp. 424–442, Apr 2023. [Online]. Available: https://doi.org/10.1038/s41551-023-01021-5
[5] I. H. Stevenson and K. P. Kording, “How advances in neural recording affect data analysis,” Nature Neuroscience, vol. 14, pp. 139–142, Jan 2011.
[6] I. Stevenson, “Tracking advances in neural recording.” [Online]. Available: https://stevenson.lab.uconn.edu/scaling/
[7] F. R. Willett, D. T. Avansino, L. R. Hochberg, J. M. Henderson, and K. V. Shenoy, “High-performance brain-to-text communication via handwriting,” Nature, vol. 593, no. 7858, pp. 249–254, May 2021. [Online]. Available: https://doi.org/10.1038/s41586-021-03506-2
[8] D. A. Moses, S. L. Metzger, J. R. Liu, G. K. Anumanchipalli, J. G. Makin, P. F. Sun, J. Chartier, M. E. Dougherty, P. M. Liu, G. M. Abrams, A. Tu-Chan, K. Ganguly, and E. F. Chang, “Neuroprosthesis for decoding speech in a paralyzed person with anarthria,” New England Journal of Medicine, vol. 385, no. 3, pp. 217–227, Jul. 2021. [Online]. Available: https://doi.org/10.1056/nejmoa2027540
[9] I. Basu, A. Yousefi, B. Crocker, R. Zelmann, A. C. Paulk, N. Peled, K. K. Ellard, D. S. Weisholtz, G. R. Cosgrove, T. Deckersbach, U. T. Eden, E. N. Eskandar, D. D. Dougherty, S. S. Cash, and A. S. Widge, “Closed-loop enhancement and neural decoding of cognitive control in humans,” Nature Biomedical Engineering, vol. 7, no. 4, pp. 576–588, Nov. 2021. [Online]. Available: https://doi.org/10.1038/s41551-021-00804-y
[10] X. Yuan, A. Hierlemann, and U. Frey, “Extracellular recording of entire neural networks using a dual-mode microelectrode array with 19584 electrodes and high snr,” IEEE Journal of Solid-State Circuits, vol. 56, no. 8, pp. 2466–2475, 2021.
[11] S. Wang, S. K. Garakoui, H. Chun, D. G. Salinas, W. Sijbers, J. Putzeys, E. Martens, J. Craninckx, N. Van Helleputte, and C. M. Lopez, “A compact quad-shank cmos neural probe with 5,120 addressable recording sites and 384 fully differential parallel channels,” IEEE Transactions on Biomedical Circuits and Systems, vol. 13, no. 6, pp. 1625–1634, 2019.
[12] J. Dragas, V. Viswam, A. Shadmani, Y. Chen, R. Bounik, A. Stettler, M. Radivojevic, S. Geissler, M. E. J. Obien, J. Müller, and A. Hierlemann, “In vitro multi-functional microelectrode array featuring 59 760 electrodes, 2048 electrophysiology channels, stimulation, impedance measurement, and neurotransmitter detection channels,” IEEE Journal of Solid-State Circuits, vol. 52, no. 6, pp. 1576–1590, 2017.
[13] K. Sahasrabuddhe, A. A. Khan, A. P. Singh, T. M. Stern, Y. Ng, A. Tadić, P. Orel, C. LaReau, D. Pouzzner, K. Nishimura, K. M. Boergens, S. Shivakumar, M. S. Hopper, B. Kerr, M.-E. S. Hanna, R. J. Edgington, I. McNamara, D. Fell, P. Gao, A. Babaie-Fishani, S. Veijalainen, A. V. Klekachev, A. M. Stuckey, B. Luyssaert, T. D. Y. Kozai, C. Xie, V. Gilja, B. Dierickx, Y. Kong, M. Straka, H. S. Sohal, and M. R. Angle, “The argo: a high channel count recording system for neural recording in vivo,” Journal of Neural Engineering, Dec. 2020. [Online]. Available: https://doi.org/10.1088/1741-2552/abd0ce
[14] P. D. Wolf, “Thermal considerations for the design of an implanted cortical brain–machine interface (bmi),” in Indwelling Neural Implants: Strategies for Contending with the In Vivo Environment, W. M. Reichert, Ed. Boca Raton (FL): CRC Press/Taylor & Francis, 2008, ch. 3. [Online]. Available: https://www.ncbi.nlm.nih.gov/books/NBK3932/
[15] E. Musk and Neuralink, “An integrated brain-machine interface platform with thousands of channels,” Journal of Medical Internet Research, vol. 21, no. 10, p. e16194, Oct. 2019. [Online]. Available: https://doi.org/10.2196/16194
[16] M. Shoaran, M. M. Lopez, V. S. R. Pasupureddi, Y. Leblebici, and A. Schmid, “A low-power area-efficient compressive sensing approach for multi-channel neural recording,” in 2013 IEEE International Symposium on Circuits and Systems (ISCAS), 2013, pp. 2191–2194.
[17] T. Okazawa and I. Akita, “A time-domain analog spatial compressed sensing encoder for multi-channel neural recording,” Sensors, vol. 18, no. 2, p. 184, Jan 2018. [Online]. Available: http://dx.doi.org/10.3390/s18010184
[18] W. Zhao, B. Sun, T. Wu, and Z. Yang, “On-chip neural data compression based on compressed sensing with sparse sensing matrices,” IEEE Transactions on Biomedical Circuits and Systems, vol. 12, no. 1, pp. 242–254, 2018.
[19] T. Wu, W. Zhao, E. Keefer, and Z. Yang, “Deep compressive autoencoder for action potential compression in large-scale neural recording,” Journal of Neural Engineering, vol. 15, no. 6, p. 066019, Oct. 2018. [Online]. Available: https://dx.doi.org/10.1088/1741-2552/aae18d
[20] Y. Liu, J. L. Pereira, and T. G. Constandinou, “Clockless continuous-time neural spike sorting: Method, implementation and evaluation,” in 2016 IEEE International Symposium on Circuits and Systems (ISCAS), 2016, pp. 538–541.
[21] Y. Liu, S. Luan, I. Williams, A. Rapeaux, and T. G. Constandinou, “A 64-channel versatile neural recording soc with activity-dependent data throughput,” IEEE Transactions on Biomedical Circuits and Systems, vol. 11, no. 6, pp. 1344–1355, 2017.
[22] Z. Zhang, O. W. Savolainen, and T. G. Constandinou, “Algorithm and hardware considerations for real-time neural signal on-implant processing,” Journal of Neural Engineering, vol. 19, no. 1, p. 016029, Feb. 2022. [Online]. Available: https://dx.doi.org/10.1088/1741-2552/ac5268
[23] Y. Chen, B. Tacca, Y. Chen, D. Biswas, G. Gielen, F. Catthoor, M. Verhelst, and C. M. Lopez, “A 384-channel online-spike-sorting ic using unsupervised geo-osort clustering and achieving 0.0013mm2/ch and $1.78\mu\text{W/ch}$ ,” in 2023 IEEE International Solid-State Circuits Conference (ISSCC), 2023, pp. 486–488.
[24] M. Pagin and M. Ortmanns, “A neural data lossless compression scheme based on spatial and temporal prediction,” in 2017 IEEE Biomedical Circuits and Systems Conference (BioCAS), 2017, pp. 1–4.
[25] D. G. Muratore, P. Tandon, M. Wootters, E. J. Chichilnisky, S. Mitra, and B. Murmann, “A data-compressive wired-or readout for massively parallel neural recording,” in 2019 IEEE International Symposium on Circuits and Systems (ISCAS), 2019, pp. 1–5.
[26] N. Even-Chen, D. G. Muratore, S. D. Stavisky, L. R. Hochberg, J. M. Henderson, B. Murmann, and K. V. Shenoy, “Power-saving design opportunities for wireless intracortical brain–computer interfaces,” Nature Biomedical Engineering, vol. 4, pp. 984–996, Aug 2020.
[27] S. Shaikh, R. So, T. Sibindi, C. Libedinsky, and A. Basu, “Towards intelligent intracortical bmi (i²bmi): Low-power neuromorphic decoders that outperform kalman filters,” IEEE Transactions on Biomedical Circuits and Systems, vol. 13, no. 6, pp. 1615–1624, 2019.
[28] Z. Zhang and T. G. Constandinou, “Firing-rate-modulated spike detection and neural decoding co-design,” Journal of Neural Engineering, vol. 20, no. 3, p. 036003, May 2023. [Online]. Available: https://dx.doi.org/10.1088/1741-2552/accece
[29] S. Shaikh and A. Basu, Intelligent Intracortical Brain-Machine Interfaces. New York, NY: Springer New York, 2022, pp. 869–889. [Online]. Available: https://doi.org/10.1007/978-1-4614-3447-4_64
[30] P. Lichtsteiner, C. Posch, and T. Delbruck, “A 128× 128 120 db 15 $\mu$ s latency asynchronous temporal contrast vision sensor,” IEEE Journal of Solid-State Circuits, vol. 43, no. 2, pp. 566–576, 2008.
[31] S.-C. Liu, A. van Schaik, B. A. Minch, and T. Delbruck, “Asynchronous binaural spatial audition sensor with 2 $\,\times\,$ 64 $\,\times\,$ 4 channel output,” IEEE Transactions on Biomedical Circuits and Systems, vol. 8, no. 4, pp. 453–464, 2014.
[32] W. W. Lee, S. L. Kukreja, and N. V. Thakor, “A kilohertz kilotaxel tactile sensor array for investigating spatiotemporal features in neuromorphic touch,” in 2015 IEEE Biomedical Circuits and Systems Conference (BioCAS), 2015, pp. 1–4.
[33] N. Imam and T. A. Cleland, “Rapid online learning and robust recall in a neuromorphic olfactory circuit,” Nature Machine Intelligence, vol. 2, no. 3, pp. 181–191, Mar. 2020. [Online]. Available: https://doi.org/10.1038/s42256-020-0159-4
[34] F. Corradi and G. Indiveri, “A neuromorphic event-based neural recording system for smart brain-machine-interfaces,” IEEE Transactions on Biomedical Circuits and Systems, vol. 9, no. 5, pp. 699–709, 2015.
[35] Y. He, F. Corradi, C. Shi, S. van der Ven, M. Timmermans, J. Stuijt, P. Detterer, P. Harpe, L. Lindeboom, E. Hermeling, G. Langereis, E. Chicca, and Y.-H. Liu, “An implantable neuromorphic sensing system featuring near-sensor computation and send-on-delta transmission for wireless neural sensing of peripheral nerves,” IEEE Journal of Solid-State Circuits, vol. 57, no. 10, pp. 3058–3070, 2022.
[36] M. Sharifshazileh and G. Indiveri, “An adaptive event-based data converter for always-on biomedical applications at the edge,” in 2023 IEEE International Symposium on Circuits and Systems (ISCAS), 2023, pp. 1–5.
[37] J. Chen, H. Wu, X. Liu, R. Eskandari, F. Tian, W. Zou, C. Fang, J. Yang, and M. Sawan, “NeuroBMI: A new neuromorphic implantable wireless brain machine interface with a 0.48 µW event-driven noise-tolerant spike detector,” in 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2023, pp. 1–5.
[38] V. Mohan, W. P. Tay, and A. Basu, “Architectural exploration of neuromorphic compression based neural sensing for next-gen wireless implantable-BMI,” in 2023 IEEE International Symposium on Circuits and Systems (ISCAS), 2023, pp. 1–5.
[39] C. Yi, A. Basu, et al., “A digitally assisted, signal folding neural recording amplifier,” IEEE Transactions on Biomedical Circuits and Systems, vol. 8, no. 4, pp. 528–542, 2014.
[40] A. M. T. Linn, D. A. Tuan, C. Shoushun, and Y. K. Seng, “Adaptive priority toggle asynchronous tree arbiter for AER-based image sensor,” in 2011 IEEE/IFIP 19th International Conference on VLSI and System-on-Chip, 2011, pp. 66–71.
[41] R. Q. Quiroga, Z. Nadasdy, and Y. Ben-Shaul, “Unsupervised Spike Detection and Sorting with Wavelets and Superparamagnetic Clustering,” Neural Computation, vol. 16, no. 8, pp. 1661–1687, Aug. 2004. [Online]. Available: https://doi.org/10.1162/089976604774201631
[42] C. Libedinsky, R. So, et al., “Independent mobility achieved through a wireless brain-machine interface,” PLOS ONE, vol. 11, no. 11, 2016.
[43] C. M. Lopez, S. Mitra, J. Putzeys, B. Raducanu, M. Ballini, A. Andrei, S. Severi, M. Welkenhuysen, C. Van Hoof, S. Musa, and R. F. Yazicioglu, “22.7 A 966-electrode neural probe with 384 configurable channels in 0.13 µm SOI CMOS,” in 2016 IEEE International Solid-State Circuits Conference (ISSCC), 2016, pp. 392–393.
[44] N. Steinmetz, M. Carandini, and K. D. Harris, “Single phase3 and dual phase3 Neuropixels datasets,” Mar. 2019. [Online]. Available: https://figshare.com/articles/_Single_Phase3_Neuropixels_Dataset/7666892
[45] Z. Zhang and T. G. Constandinou, “Adaptive spike detection and hardware optimization towards autonomous, high-channel-count BMIs,” Journal of Neuroscience Methods, vol. 354, p. 109103, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0165027021000388
[46] S. Mukhopadhyay and G. Ray, “A new interpretation of nonlinear energy operator and its efficacy in spike detection,” IEEE Transactions on Biomedical Engineering, vol. 45, no. 2, pp. 180–187, 1998.
[47] S.-Y. Park, J. Cho, K. Lee, and E. Yoon, “Dynamic power reduction in scalable neural recording interface using spatiotemporal correlation and temporal sparsity of neural signals,” IEEE Journal of Solid-State Circuits, vol. 53, no. 4, pp. 1102–1114, 2018.