
Quantitative Evaluation of Hardware Binary Stochastic Neurons

Orchi Hassan Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh    Supriyo Datta School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47906 USA    Kerem Y. Camsari Department of Electrical and Computer Engineering, University of California, Santa Barbara, Santa Barbara, CA 93106, USA
Abstract

Recently there has been increasing activity to build dedicated Ising Machines to accelerate the solution of combinatorial optimization problems by expressing these problems as a ground-state search of the Ising model. A common theme of such Ising Machines is to tailor the physics of underlying hardware to the mathematics of the Ising model to improve some aspect of performance that is measured in speed to solution, energy consumption per solution or area footprint of the adopted hardware. One such approach to build an Ising spin, or a binary stochastic neuron (BSN), is a compact mixed-signal unit based on a low-barrier nanomagnet based design that uses a single magnetic tunnel junction (MTJ) and three transistors (3T-1MTJ) where the MTJ functions as a stochastic resistor (1SR). Such a compact unit can drastically reduce the area footprint of BSNs while promising massive scalability by leveraging the existing Magnetic RAM (MRAM) technology that has integrated 1T-1MTJ cells in $\sim$Gbit densities. The 3T-1SR design, however, can be realized using different materials or devices that provide naturally fluctuating resistances. Extending previous work, we evaluate hardware BSNs from this general perspective by classifying necessary and sufficient conditions to design a fast and energy-efficient BSN that can be used in scaled Ising Machine implementations. We connect our device analysis to systems-level metrics by emphasizing hardware-independent figures-of-merit such as flips per second and dissipated energy per random bit that can be used to classify any Ising Machine.


I Introduction

In the era of the internet of things (IoT), combinatorial optimization problems are ubiquitous Yamaoka et al. (2015). In fact, most of the real-world problems that quantum computers aim to solve can be formulated as combinatorial optimization problems. Directing traffic flow Neukart et al. (2017), routing interconnections in integrated circuit design Barahona et al. (1988); Cook et al. (2018), making financial decisions Rosenberg et al. (2016), and discovering drugs Sakaguchi et al. (2016) all involve solving some form of combinatorial optimization problem. The demand for solving these problems faster and more efficiently is ever-increasing. But such problems typically fall into the NP-hard or NP-complete classes of complexity theory Barahona (1982), with no known polynomial-time solution, making them notoriously difficult to solve on digital computers using traditional computing methods. This has given rise to a new paradigm in computing, namely Ising computing. Ising computing maps a combinatorial optimization problem to an Ising model and solves it by searching for the ground state of the system described by Lucas (2014); Sutton et al. (2017):

$$E=-I_{0}\left(\frac{1}{2}\sum_{i,j=1}^{N}J_{ij}m_{i}m_{j}+\sum_{i=1}^{N}h_{i}m_{i}\right) \tag{1}$$

where $m$ denotes the Ising spin, $J$ is the coupling coefficient, $h$ is the external bias, and $I_{0}$ is the annealing parameter, which is proportional to the inverse of temperature. In the machine learning field, the same underlying principle is used for Boltzmann Machines, with the annealing parameter set to 1. The binary stochastic neurons (BSNs) BSN of stochastic neural networks are well suited to function as a ‘spin’ in such systems, described mathematically by:

$$m_{i}=\mathrm{sgn}[\tanh(I_{i})-r_{i}] \tag{2}$$

where $r_{i}$ is a random number between $+1$ and $-1$, and $I_{i}=-\partial E/\partial m_{i}$ is the input to the neuron.
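The BSN update rule and its input can be sketched in a few lines of Python. This is a minimal illustration, not the hardware model used in this paper: it assumes that $r_{i}$ is drawn uniformly from $[-1, 1)$ and that $J$ is symmetric, and the function names (`bsn_update`, `neuron_input`, `sample_mean`) are hypothetical.

```python
import numpy as np

def bsn_update(I_i, rng):
    """Eq. (2): m_i = sgn[tanh(I_i) - r_i], with r_i assumed uniform in [-1, 1)."""
    r = rng.uniform(-1.0, 1.0)
    return 1 if np.tanh(I_i) - r > 0 else -1

def neuron_input(m, J, h, I0, i):
    """I_i = -dE/dm_i from eq. (1); equals I0 * (sum_j J_ij m_j + h_i) for symmetric J."""
    return I0 * (J[i] @ m + h[i])

def sample_mean(I_i, n_samples=20000, seed=0):
    """Empirical average output of a single BSN held at a fixed input I_i."""
    rng = np.random.default_rng(seed)
    return np.mean([bsn_update(I_i, rng) for _ in range(n_samples)])

# Under the uniform-r assumption the average output approaches tanh(I_i).
print(sample_mean(0.5), np.tanh(0.5))
```

With these assumptions the probability of $m_{i}=+1$ is $(1+\tanh I_{i})/2$, so the time-averaged output follows $\tanh(I_{i})$.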

Figure 1: The compact 1MTJ-3T BSN hardware, which utilizes the natural physics of low-barrier nanomagnets, holds the promise of accelerating simulated annealing processors. (a) The underlying working principle of Ising Machines. (b) An implementation scheme utilizing MTJs and memristive crossbar arrays, where the BSN is the Ising spin $m_{i}$, memristors ($R_{ij}$) implement the weight and bias coefficients, and the feedback resistor $R$ can control the annealing temperature electrically.

Given the importance of optimization problems, a lot of research has gone into developing algorithms and identifying appropriate hardware for Ising computing. Various approaches including quantum computers based on quantum annealing (QA) or adiabatic quantum optimization (AQC) implemented with superconducting circuits Johnson et al. (2011), coherent Ising machines (CIMs) implemented with laser pulses McMahon et al. (2016), phase-change oscillators Dutta et al. (2020), or CMOS oscillators Goto, Tatsumura, and Dixon (2019); Wang and Roychowdhury (2019); Ahmed, Chiu, and Kim (2020); Chou et al. (2019) and digital annealers based on simulated annealing (SA) Kirkpatrick, Gelatt, and Vecchi (1983) implemented with digital circuits Baity-Jesi et al. (2014); Yamaoka et al. (2015); Takemoto et al. (2019); Aramon et al. (2019); Yamamoto et al. (2020); Patel et al. (2020); Patel, Canoza, and Salahuddin (2020) are being explored.

In this paper we comprehensively evaluate and characterize a stochastic magnetic tunnel junction (sMTJ) based realization of the Ising spin (eqn. 2), where random numbers are generated using the natural physics of low-barrier nanomagnets Camsari, Salahuddin, and Datta (2017) in a compact design. A network of these BSN units, coupled with a memristive crossbar array Xia et al. (2016); Cai et al. (2019a); Bayat et al. (2018) that performs the synaptic operation as shown in Fig. 1, can drastically reduce the area requirements and increase the computation speed of Ising Machines. We evaluate the performance of the BSN device in terms of its energy and delay metrics and connect these to the problem- and substrate-independent metric of flips per second that the probabilistic system makes Sutton et al. (2019).

Our evaluation of the 1MTJ-3T BSN design considers different types of low-barrier nanomagnet realizations of MTJs. As the MTJ essentially functions as a two-terminal stochastic resistor (SR), we first take a general 3T-1SR design approach, classifying necessary and sufficient conditions for achieving BSN operation with different types of SRs in Section II. We relate these conditions to the different sMTJ realizations in Section III. In Section IV, we report the timescale of operation, power, and energy for each case, based on benchmarked SPICE simulations of the BSN hardware consisting of spintronic elements from a modular circuit framework Torunbalci et al. (2018) coupled to 14 nm FinFET PTM models pre, and provide analytical results for the relevant quantities. Lastly, we use these device performance metrics to project hardware performance figures of merit, such as the flips per second that a probabilistic sampler makes. Our projections indicate orders of magnitude improvement potential over current digital implementations.

II General Approach to Design of BSN

Binary stochastic neurons (BSNs) are well suited to function as a ‘spin’ in Ising machines for solving combinatorial optimization problems BSN ; Hassan et al. (2019). A compact and efficient hardware realization of the BSN leveraging the natural physics of stochastic nanomagnets can be made by using unstable magnetic tunnel junctions (MTJs) Daniels et al. (2020); Parks et al. (2018); Grollier et al. (2020); Abeed and Bandyopadhyay (2019); Drobitch and Bandyopadhyay (2019) as shown in Fig. 1.

The compact BSN design based on low-barrier magnet (LBM) stochastic MTJs (sMTJs) was first proposed in 2017 Camsari, Salahuddin, and Datta (2017). Using magnet and circuit physics to analyze the performance, it was reported that using an LBM in a circular disk geometry with an energy barrier below $k_{B}T$ as the free layer of an MTJ results in sub-ns response times requiring only $\sim$ a few fJ of energy per random bit Hassan et al. (2019). The proposed design and its performance analysis consider a very specific type of sMTJ with circular in-plane magnetic anisotropy (IMA), whose fluctuations are undisturbed by the current in the circuit under typical current drive conditions. However, in 2019, a version of the BSN design that was implemented in hardware to solve an 8-bit factorization problem Borders et al. (2019) used an sMTJ whose free layer had perpendicular magnetic anisotropy (PMA) and a barrier of a few $k_{B}T$. Unlike the circular in-plane design, the PMA design relied on its resistance being tunable by the spin-transfer-torque effect to achieve BSN operation. This called for an extension of our initial analysis presented in Hassan et al. (2019), which we systematically perform in this paper.

As the MTJ in the BSN circuit effectively acts as a fluctuating resistor $R$ Parks et al. (2020), and the design principle is independent of how this resistor is realized, we establish the fundamental design rules from a general perspective. We hope these design rules stimulate discussion on the realization of stochastic resistors that use different mechanisms Cheemalavagu et al. (2005); Shukla et al. (2014); Kumar, Strachan, and Williams (2017); Stampfer et al. (2018); Cai et al. (2019b); Camsari et al. (2020).

II.1 Types of fluctuating resistances

We categorize the fluctuating $R$ into four types using two criteria. First, based on the nature of its fluctuations, it can be continuous or bipolar (telegraphic). Second, it can be tunable or non-tunable, depending on whether it is affected by the current flowing through it.

Figure 2: Categorizing Resistances: (a) Fluctuating nature: the fluctuations can be continuous or bipolar. The time dynamics and distribution are shown for each category. (b) Current-Tunability: The fluctuations can be unaffected by $I$ or be a function of $I$, as indicated by their transfer characteristics. $I_{50}$ is the current at the 50:50 point, where the resistance spends equal time in the $R_{P}$ and $R_{AP}$ states. $I_{0}$ is the biasing current, defined from the slope of the ($R$ vs $I$) curve at the 50:50 point. The pinning current is typically $\sim 3-5\ I_{0}$.
Figure 3: Transfer Characteristics: The BSN circuit is realized by coupling the fluctuating resistor, which is the physical realization of the random variable $r_{i}$ in the BSN equation, to an NMOS that provides the tunability, and then to an inverter that thresholds the output. The four types of resistances are coupled to a 14 nm FinFET, and the resistance parameters (based on experimental demonstrations of MTJs Lin et al. (2009)) are chosen to match the transistor characteristics. All resistance types except the bipolar non-tunable one achieve BSN operation following eq. 2. To function as a BSN, bipolar resistances need some means of tuning their probability distribution.

A continuous resistor can take any resistance value in the range $[R_{P}\rightarrow R_{AP}]$, while a bipolar resistor only assumes the two values $R_{P}$ and $R_{AP}$, as shown in Fig. 2(a). The distribution of a continuous resistance can also take different forms: it can be uniform or, as in the case of an MTJ, slightly bimodal, as shown in the figure. Different distributions typically result in different average $R$ values. Slightly bimodal or uniform distributions are better suited than Gaussian distributions for BSN realizations.

The current $I$ flowing in the circuit can tune the probability distribution of the resistance fluctuations, and we call such resistors tunable resistors. When designing a BSN with a current-tunable $R$, we need to know the current at which the resistance fluctuates equally between the two extreme states ($I_{50}$) Parks et al. (2020) and the current required to pin the resistance to one of those states. An important parameter in this case is the bias current $I_{0}$, which is set by the slope of the $R$ vs $I$ curve at the 50:50 point (a steeper slope corresponds to a smaller $I_{0}$). Typically, a current of $\sim 3-5\ I_{0}$ is required to pin the fluctuating resistance to one of its states. We will later provide analytical expressions for $I_{0}$ for four cases of resistors that can be obtained with various MTJs (Fig. 9).

Based on this analysis, we categorize the fluctuating resistance into four types: Non-tunable continuous (NTC), Non-tunable bipolar (NTB), tunable continuous (TC) and tunable bipolar (TB).

II.2 Performing the BSN function

We first look at the transfer characteristics of the device to see whether the four types of resistance can faithfully mimic the BSN operation described by eqn. 2. The fluctuating $R$ is a physical realization of the random variable $r_{i}$, the NMOS acts as a constant current source that provides tunability, and the inverter performs the $\mathrm{sgn}$ operation in eqn. 2.

Figure 4: Non-tunable Continuous vs Bipolar Resistance: (a) The transfer characteristics show that while the continuous resistor results in a sigmoidal output, the bipolar resistor gives a staircase-like function. (b) The bipolar $R$ is unable to follow the Boltzmann distribution of the invertible AND gate (described in ref. Camsari et al. (2017)). All states remain equally probable.

Fig. 3 shows that while all other resistance types reproduce the desired sigmoidal average curve $\langle m_{i}\rangle=\tanh(I_{i})$, the non-tunable bipolar resistor gives a staircase-like function instead. This is because of the fixed, delta-function-like resistance distribution at the two extreme states (see Fig. 2(a)ii). As there is no continuity in the resistance distribution, and no additional means of tuning the delta distribution itself has been introduced to the structure, the BSN output fluctuates equally between its two states until either of the threshold points is crossed, resulting in the staircase-like function.

Mathematically, a bipolar resistance means that $r_{i}$ is $\pm 1$. So, for any input $I_{i}$ where $|\tanh(I_{i})|<1$, the average output $\langle m\rangle$ is equal to zero. Fig. 4(b) considers a simple invertible AND gate Camsari et al. (2017); Camsari, Salahuddin, and Datta (2017) and shows that devices with such a staircase-like function are not suitable as BSNs. This has been demonstrated experimentally in refs. Lv, Bloom, and Wang (2019); Zink, Lv, and Wang (2019), where a stable MTJ was used as a bipolar resistor whose distribution was tuned by an external field. As the same experiment shows, the issue can be resolved by introducing additional control parameters such as an external field.
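The difference between a continuous and a bipolar random variable in eq. 2 can be illustrated numerically. The sketch below is an idealized model rather than the SPICE circuit: it assumes a uniformly distributed continuous $r_{i}$ and a bipolar $r_{i}$ that takes the values $\pm 1$ with a fixed 50:50 split, and all variable names are hypothetical.

```python
import numpy as np

def mean_output(I, r_samples):
    """Average of sgn[tanh(I) - r] over the supplied samples of r."""
    return np.mean(np.sign(np.tanh(I) - r_samples))

rng = np.random.default_rng(1)
n = 50000
r_continuous = rng.uniform(-1.0, 1.0, n)   # continuous fluctuation (assumed uniform)
r_bipolar = rng.choice([-1.0, 1.0], n)     # telegraphic, untunable 50:50 split

inputs = np.array([-2.0, -0.5, 0.5, 2.0])
avg_continuous = np.array([mean_output(I, r_continuous) for I in inputs])
avg_bipolar = np.array([mean_output(I, r_bipolar) for I in inputs])

print(avg_continuous)  # tracks tanh(inputs)
print(avg_bipolar)     # stays near zero for every finite input
```

In this idealized picture the continuous case follows the sigmoid, while the bipolar case carries no information about the input, which is the origin of the flat step in the staircase response.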

II.3 Parameter Dependence and Design Choices

Fig. 3 is created with a fixed set of resistor parameters coupled to a specific transistor technology, 14 nm FinFET models. In this section we explore how the transfer characteristics are affected by the parameters of the resistor and the FET characteristics, and how to choose the right combination of $R$ and FET.

Stochastic Region: The stochastic region, which we define next, is a function of the resistance ratio $n$ for non-tunable resistors and of the biasing current $I_{0}$ for tunable resistors, as shown in Fig. 5, and needs to be matched with the transistor characteristics.

Figure 5: Effect of $n$ and $I_{0}$: The stochastic region of non-tunable resistances is determined by the resistance ratio $n=R_{AP}/R_{P}$, while for tunable resistances it is controlled by the biasing current $I_{0}$. For large biasing currents, the tunable resistors behave effectively like non-tunable resistances.

Effect of $n$: The resistance ratio $n=R_{AP}/R_{P}$ is directly related to the stochastic region $\Delta v$ through the NMOS characteristics in the case of non-tunable resistor designs. The edges of the stochastic region, $v^{\pm}$, are defined by the condition $V_{i}=V_{DD}/2-[I^{+}R_{P},I^{-}R_{AP}]\approx 0$, where the currents $I^{\pm}$ are determined by the NMOS as shown in Fig. 6(c). For a desired stochastic region $\Delta v=v^{+}-v^{-}$ and a given NMOS transistor, the required $n=R_{AP}/R_{P}$ should approximately equal $I^{+}/I^{-}$. Ideally, the minimum value of the resistance should be $R_{P}=(V_{DD}/2)/I^{+}$, and to get full pinning, $\Delta v$ should be less than $V_{DD}$. For a 14 nm FinFET, to get a stochastic region of $\Delta v=50-200$ mV, the resistance ratio $n$ should be around $2-50$. In the case of MTJs, the resistance ratio $n$ is a measure of the tunneling magnetoresistance, TMR ($=(n-1)\times 100\%$). For the non-tunable case, the TMR needs to be large enough to provide a voltage swing that overcomes the noise margins of the inverter Hassan et al. (2019), and small enough that output pinning is achieved within the given input range. Typically MTJs have TMRs ranging from $100-300\%$ Parkin et al. (2004), with a maximum reported TMR of $604\%$ Ikeda et al. (2008), so the resistance ratios of MTJs are well within the desired range, but the general requirements we outline should be applicable to other types of stochastic resistors as well.
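The sizing relations for the non-tunable case can be worked through with a short calculation. This is only a sketch: the edge currents $I^{+}$ and $I^{-}$ below are hypothetical placeholders, not values extracted from the 14 nm FinFET model, and the function name is an illustrative choice.

```python
def size_non_tunable_resistor(v_dd, i_plus, i_minus):
    """Apply R_P = (V_DD/2)/I+, n = R_AP/R_P ~ I+/I-, and TMR = (n - 1) * 100%."""
    r_p = (v_dd / 2) / i_plus          # minimum resistance, pins V_i ~ 0 at I+
    n = i_plus / i_minus               # required resistance ratio
    r_ap = n * r_p                     # equivalently (V_DD/2)/I-
    tmr_percent = (n - 1) * 100
    return r_p, r_ap, n, tmr_percent

# Hypothetical NMOS currents at the two edges of the stochastic region.
r_p, r_ap, n, tmr = size_non_tunable_resistor(v_dd=0.8, i_plus=30e-6, i_minus=10e-6)
print(f"R_P = {r_p / 1e3:.1f} kOhm, R_AP = {r_ap / 1e3:.1f} kOhm, n = {n:.1f}, TMR = {tmr:.0f}%")
```

For these placeholder currents the required ratio is $n\approx 3$, i.e. a TMR of about 200%, which falls inside the typical MTJ range quoted in the text.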

Figure 6: Stochastic Region boundaries: The stochastic region boundaries [$v^{+},v^{-}$] are set by different parameters for tunable and non-tunable resistors. (a) The BSN circuit, with (b) the current transfer characteristics of the 14 nm FinFET NMOS when $V_{i}\sim 0$ V. (c) Non-tunable $R$: In this case the boundaries are set by the condition $V_{i}\approx 0$, which holds when the resistance ratio $n=R_{AP}/R_{P}\approx I^{+}/I^{-}$. (d) Tunable $R$: The stochastic range is determined by the pinning current $I_{P}$ characteristics of the resistance. The transfer characteristics of each stage in (c) and (d) indicate the stochastic range $v^{+}$ and $v^{-}$ and its relation to the NMOS characteristics in (b).

Effect of $I_{0}$: In the case of tunable resistances, the stochastic region is independent of the resistance ratio and depends instead on the pinning current, and thus on the bias current ($I_{P}^{\pm}\propto I_{0}$), as shown in Fig. 6(d). For large bias currents ($I_{0}\gg I$), the tunable resistances act essentially like non-tunable resistances. To get the full range of $R$, the NMOS needs to be able to supply the pinning current. If the pinning current is $\sim(3-5)I_{0}$ as shown in Fig. 2, then to get the full range of the resistance, $I_{Pmax}^{+}$ needs to be around $\sim(6-10)I_{0}$. In the case of 14 nm FinFETs, $I^{+}_{max}$ is around $\sim 40\ \mu$A, restricting $I_{0}$ to values less than $7\ \mu$A.
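The bound on $I_{0}$ follows from simple arithmetic, sketched below. The helper name and the interpretation of the factor of two (the pinning current is needed on either side of the 50:50 point) are assumptions made for illustration, chosen to reproduce the $\sim(6-10)I_{0}$ estimate in the text.

```python
def max_bias_current(i_max, pinning_factor):
    """Largest I0 for which the NMOS can cover the full resistance range.

    Assumes pinning takes ~pinning_factor * I0 on each side of the 50:50
    point, so the full range needs ~2 * pinning_factor * I0 <= i_max.
    """
    return i_max / (2 * pinning_factor)

i_max = 40e-6  # approximate maximum NMOS current quoted for the 14 nm FinFET
bounds = {k: max_bias_current(i_max, k) for k in (3, 5)}
for k, i0 in bounds.items():
    print(f"pinning at {k} I0 -> I0 below {i0 * 1e6:.1f} uA")
```

With a pinning factor of 3 the bound is about 6.7 $\mu$A, consistent with the "less than 7 $\mu$A" figure; a factor of 5 tightens it to 4 $\mu$A.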

Choice of $I_{50}$: Another parameter that is important for the operation of tunable resistors is $I_{50}$, which determines the midpoint of the sigmoid. $I_{50}$ is the current at which the resistance on average spends equal time in the $R_{P}$ and $R_{AP}$ states Parks et al. (2020). As the circuit can only support positive current values, $I_{50}$ needs to be a positive quantity, preferably matched with the saturation point ($V_{DS}=V_{GS}$) current $I_{Dsat}$ of the NMOS transistor. Changing $I_{50}$ shifts the transfer characteristics laterally, as shown in Fig. 7(a).

Figure 7: (a) Choice of $I_{50}$: $I_{50}$ is ideally a positive quantity matched with the $I_{Dsat}$ of the transistor; changing $I_{50}$ results in a lateral shift of the sigmoid. (b) $R$ vs $I$ relationship: The output characteristics also depend on how the resistance is tuned by the circuit current $I$. If $R$ decreases with $I$ ($R_{AP}\rightarrow R_{P}$), the opposing characteristics of the transistor current and the resistance change result in a non-monotonic output.

R vs I: One last requirement is that, for a current-tunable resistance, the resistance needs to increase from $R_{P}\rightarrow R_{AP}$ with increasing current $I$. This can be understood intuitively: increasing $I$ means the NMOS transistor is becoming more conductive. If the MTJ concomitantly becomes more conductive as $I$ increases, the transfer characteristics can show non-monotonic behavior, as shown in Fig. 7(b). This requirement holds true irrespective of whether the circuit’s $R$ branch consists of a PMOS-1R or 1R-NMOS topology.

III Realization of fluctuating resistances with sMTJs

A magnetic tunnel junction (MTJ) whose free layer is a low-barrier magnet (LBM) can serve as a physical realization of a fluctuating resistor. Depending on the nature and characteristics of the LBM magnetization fluctuations, we can get different types of $R$. Our previous analysis Hassan et al. (2019) was restricted to one type of LBM, the circular IMA with barrier $<k_{B}T$. In this section we extend it to include all possible LBMs.

A general description of the energy associated with a magnet is given by Hassan et al. (2019):

$$\begin{aligned}
E &= \frac{1}{2}H_{kp}M_{s}\Omega(1-m_{x}^{2})+\frac{1}{2}H_{ki}M_{s}\Omega(1-m_{z}^{2}) \\
&\quad -\hat{H}_{ext}M_{s}\Omega\cdot\hat{m}
\end{aligned} \tag{3}$$

where $H_{kp}=2K_{s}/t-4\pi M_{s}$ is the perpendicular anisotropy field along the x-axis, $K_{s}$ is the surface anisotropy density, $H_{ki}$ is the in-plane anisotropy field along the z-axis, $H_{ext}$ is the external field, $M_{s}$ is the saturation magnetization, and $\Omega=\pi(D/2)^{2}t$ is the volume of the magnet, with $D$ its diameter and $t$ its thickness. By adjusting the thickness or the shape of the magnet, the magnetic anisotropy can be scaled so that the magnet behaves like a low-barrier magnet Debashis et al. (2018); Hassan et al. (2019). Second-order magnetic anisotropy effects and in-plane components of the demagnetization field are not considered here and are left for future investigation, since macroscopic models without them seem to be reasonably consistent with recent experimental results involving low-barrier magnets Debashis et al. (2016); Safranski et al. (2020); Parks et al. (2020); Zhang et al. (2021). We use the stochastic LLG module from our spintronics library nanohub.org to simulate the LBM dynamics in HSPICE using its transient noise function. This model has been carefully benchmarked against general Fokker-Planck based methods Torunbalci et al. (2018).

Figure 8: Low-barrier magnet fluctuation dynamics: We use the benchmarked stochastic LLG module to simulate LBM dynamics. The saturation magnetization is taken to be $M_{s}=1000$ emu/cc, the volume $\Omega=6.3\times 10^{-19}$ cc, and $H_{k}$ is adjusted to get the indicated $\Delta$. Each simulation is carried out with a time step at least $\times 100$ smaller, and for a duration at least $\times 1000$ longer, than the characteristic timescales to avoid any dependence on simulation time; the exact parameters are indicated.

Magnets with $\Delta<k_{B}T$ have more continuous fluctuations, with (b) having a more uniform distribution than (a), while slightly higher-barrier magnets have more telegraphic fluctuations. In both cases, the presence of a high demagnetization field causes faster fluctuations in IMA magnets.

Figure 9: MTJ free layers and their corresponding $R$ types, along with the characteristic parameters and their analytical expressions. The numbers in brackets indicate an approximate range of values for each parameter. The proportionality constant for the correlation time of magnets with $\Delta>k_{B}T$ is $\tau_{0}\sim 0.1-1$ ns; the exact equation can be found in Coffey and Kalmykov (2012).

LBM Magnet Fluctuation Dynamics: By low-barrier magnet we refer to magnets whose barrier is $<10\,k_{B}T$ or so, and whose magnetization fluctuates randomly in the presence of thermal noise. Interestingly, the magnetization dynamics of low-barrier magnets with barrier $<k_{B}T$ are different from those with a slightly higher barrier Hassan et al. (2019); Kaiser et al. (2019). The simple exponential dependence of the retention time of the magnetization state on the barrier height is not valid around or below $k_{B}T$ Coffey and Kalmykov (2012).

Fig. 8 shows the fluctuation dynamics, the magnetization distribution, and the auto-correlation time ($\tau_{CORR}$) for low-barrier magnets. Magnetization fluctuations translate into resistance fluctuations in an MTJ. Magnets with barrier $<k_{B}T$ act like continuous resistances, while slightly higher-barrier magnets, which have two more clearly defined states, give telegraphic fluctuations. In both cases IMA magnets fluctuate orders of magnitude faster than their PMA counterparts, due to a novel mechanism in which the demagnetization field plays a central role Pufall et al. (2004); Safranski et al. (2020); Hassan et al. (2019); Kaiser et al. (2019); Faria, Camsari, and Datta (2017).

Current Response of LBM Magnets: Magnetic fluctuations can be tuned by a spin current. For high-barrier magnets, the minimum current required to switch the magnetization is called the critical current Sun (2000). In the case of low-barrier magnets, we refer to a biasing current instead, defined as the inverse of the derivative of $\langle m\rangle$ with respect to the spin current $I_{S}$, taken at $\langle m\rangle=0$: $I_{0}=(\partial\langle m\rangle/\partial I_{S})^{-1}$ at low bias. The current required to pin the magnetization, similar to the switching current in high-barrier magnets, is assumed to be $\sim 3-5\ I_{0}$, as indicated in Fig. 2. IMA magnets have a much larger pinning current than PMA magnets because of the large demagnetization field present due to their disk shape Hassan et al. (2019); Sun (2000); Faria, Camsari, and Datta (2018), meaning that tunable resistors based on IMA-magnet MTJs would require transistors with much larger current ranges than those based on PMA magnets.
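The definition of the biasing current as an inverse slope at $\langle m\rangle=0$ can be applied directly to sampled data. The sketch below uses a synthetic $\tanh$-shaped response as a stand-in for simulated or measured $\langle m\rangle$ vs $I_{S}$ data; that functional form, the value of `i0_true`, and the function name are assumptions for illustration only.

```python
import numpy as np

def bias_current_from_curve(i_s, m_avg):
    """Estimate I0 = (d<m>/dI_S)^-1 at <m> = 0 from sampled transfer data."""
    slope = np.gradient(m_avg, i_s)     # numerical derivative d<m>/dI_S
    k = np.argmin(np.abs(m_avg))        # sample closest to <m> = 0
    return 1.0 / slope[k]

i0_true = 5e-6                          # hypothetical bias current (A)
i_s = np.linspace(-30e-6, 30e-6, 601)   # spin-current sweep (A)
m_avg = np.tanh(i_s / i0_true)          # assumed sigmoidal response
i0_est = bias_current_from_curve(i_s, m_avg)
print(f"estimated I0 = {i0_est * 1e6:.2f} uA")
```

The same routine could be pointed at averaged magnetization data such as the curves in Fig. 10, provided the sweep is dense enough around the zero crossing.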

Figure 10: Current Response: LBM response to spin current with and without external fields for (a) a circular IMA magnet ($H_{ki}\sim 0,H_{kp}\sim-H_{D}$) and (b) an isotropic anisotropy magnet ($H_{kp}\sim 0$). Each point on the curve is a long-time ($T=1\ \mu$s, $\Delta t=1$ ps) average magnetization from our benchmarked sLLG module. The critical field was $\sim 130$ Oe for the IMA magnet and $\sim 200$ Oe for the isotropic magnet.

An important thing to note here is the current tunability in the presence of an external field, which can arise, for example, from the fixed, stable layer that acts as a reference to the free layer in the MTJ. For high-barrier PMA magnets, the spin-current-induced switching hysteresis loop simply shifts depending on the direction of the field, but for IMA magnets the shape of the hysteresis and the magnet dynamics change Sun (2000). The large demagnetizing field perpendicular to the magnetization plane in IMA magnets causes the magnetization to precess around it when the spin current is applied in the direction opposite to the external field. The same is observed in low-barrier magnets, as shown in Fig. 10. The larger the external field, the more pronounced the effect. The uniform precessional motion kicks in at high field, when a current close to or higher than the biasing current is applied in the direction opposite to the field. Very recently, this has been observed experimentally for low fields Safranski et al. (2020). While this is an undesired effect for BSN operation, it can be useful in the context of oscillator-based networks Romera et al. (2018).

This has important implications for acting as a fluctuating resistance in a BSN circuit. IMA magnets subject to external fields (i.e. uncompensated dipolar fields in the MTJ Jenkins et al. (2019)) greater than their pinning field are not suited to function as tunable or non-tunable resistors. IMA magnets with continuous magnetization, coupled to a transistor whose saturation current (tens of $\mu$A) is small compared to the biasing current of the IMA magnet (hundreds of $\mu$A), can work as non-tunable resistors, and as the experimental observations in ref. Safranski et al. (2020) suggest, they can withstand small (compared to the pinning field) stray fields.

PMA-magnet MTJs, with their small biasing currents ($\sim$ a few to a few tens of $\mu$A), act as tunable resistors in the BSN circuit when coupled to typical transistors. In this case an external bias field is actually preferred, since it enables a positive $I_{50}$ current Borders et al. (2019).

So, if we couple an MTJ with a 14 nm FinFET ($V_{DD}=0.8$ V and $I_{Dsat}=15\ \mu$A) pre, the table in Fig. 9 summarizes the resistance mapping and the associated parameters.

IV Performance Evaluation of MTJ based BSN

In the final section we compare the physical performance of these different sMTJs in a BSN.

Timescale of Operation: The two relevant timescales of operation for a BSN are the correlation time $\tau_{C}$, which is the average time it takes to produce a new output at a given input, and the response time $\tau_{N}$, which is the average time it takes for the circuit to give a random output with the correct statistics after the input is changed Hassan et al. (2019). Fig. 11 shows the two timescales for the three types of fluctuating resistances for MTJs with two different timescales. For simplicity we assumed the correlation time to be the same for all types of magnets, but in reality they would follow the $\tau_{CORR}$ relations indicated in Fig. 9 Kaiser et al. (2019); Hassan et al. (2019).

Figure 11: Timescales of operation for each resistor type, shown for two fluctuation times $\tau_{C}\sim[160~\mathrm{ps},320~\mathrm{ps}]$. For comparison purposes, the resistances are engineered to have similar characteristic timescales but different fluctuation behavior (tunable or non-tunable, and continuous or bipolar).

Fig. 11 shows that the response time $\tau_{N}$ of a non-tunable resistor is independent of the fluctuation time of the resistance and is instead proportional to the RC delay of the circuit. For the tunable cases, the response time is related to the characteristic timescales of the resistor. The time to give new numbers (the inverse of the flip rate), $\tau_{C}$, at $V_{IN}=0$, however, depends entirely on the resistance fluctuation time in all cases ($\tau_{C}\approx\tau_{CORR}$). So for the tunable case, the two timescales of operation are likely to be similar, as both are governed by the magnet fluctuation characteristics, while for the non-tunable case, the RC-dependent response time has the potential to be very short compared to the magnet-dependent correlation time. For most applications this difference may not be important, but for some applications where the network is directed, like Bayesian inference, having two different timescales seems to be a requisite Faria et al. (2020).

Power: Our SPICE simulations indicate that the average power consumed by the BSN circuit in its stochastic region is $\langle P\rangle\approx 2\times V_{DD}I_{Dsat}$ Hassan et al. (2019). The factor of $2$ accounts for the two branches, the MTJ branch and the inverter branch. This holds true for all types of resistors. For a 14 nm FinFET with $V_{DD}=0.8$ V and $I_{Dsat}\sim 15\ \mu$A, $\langle P\rangle\sim 20\ \mu$W. While the power of the MTJ branch is almost independent of the TMR or the resistance ratio ($n$) for a set 50:50 point and technology, its Joule heating increases with increasing TMR ($\sim\propto\sqrt{n}$) in the positive pinning region as the NMOS resistance reduces. So the lowest TMR that ensures a voltage swing $V_{i}$ greater than the noise margin of the inverter is considered best suited for BSN operation. The MTJ branch power could be reduced by operating in the subthreshold region ($I_{Dsub}\sim 1\ \mu$A), but this reduces the total power only by a factor of $\times 0.5$ while trading off a $\times 10$ increase in the RC response time. Given the flexibility, it is preferable to design the MTJ to operate in the saturation region of the transistor. For the tunable case this means matching $I_{50}\sim I_{Dsat}$; for the non-tunable case it means having $\langle R\rangle\approx(V_{DD}/2)/I_{Dsat}$.

Energy: As there are two timescales associated with the BSN operation, we can define two energies as well: the energy to produce the first random number after the input changes, $E_N\sim\tau_N\langle P\rangle$, and the energy required to produce a new random number at a given input state, $E_C=\tau_C\langle P\rangle$. Fig. 12(a) shows an energy-delay plot indicating the ranges for each type of MTJ. When describing the performance of a hardware BSN, we generally refer to the correlation time $\tau_C$ for the delay and $E_C$ for the energy. The individual energy-delay numbers can be used to project performance parameters for processors built with them.
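The energy per random bit follows directly from the product of a timescale and the average power. The sketch below assumes the 20 μW average power used for Fig. 12; the two correlation times are placeholders chosen for illustration only and are not the values of Fig. 10.

```python
# Minimal sketch: E_C = tau_C * <P>. The tau_C values below are hypothetical
# placeholders, not the timescales reported in the paper's figures.

def energy_per_random_bit(tau_seconds, p_avg_watts):
    """Energy dissipated over one correlation time at average power <P>."""
    return tau_seconds * p_avg_watts

P_AVG = 20e-6  # watts, average BSN power assumed in Fig. 12

HYPOTHETICAL_TAU_C = {"fast (1 ns)": 1e-9, "slow (1 us)": 1e-6}

energies = {
    label: energy_per_random_bit(tau, P_AVG)
    for label, tau in HYPOTHETICAL_TAU_C.items()
}
for label, energy in energies.items():
    print(f"{label}: E_C ~ {energy * 1e15:.0f} fJ")
```

The same function gives $E_N$ when called with the response time $\tau_N$ instead of $\tau_C$.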

Figure 12: (a) Energy-delay of each type of MTJ-based BSN, assuming an average power of $20\ \mu$W and the timescales in Fig. 10. (b) Flips per second projections for different numbers of neurons for each type of MTJ. For these projections only BSN performance numbers are used; the synapse would add to the power and thus to the energy per flip.

Figure 13: Flips per second (fps) is a substrate- and algorithm-independent performance metric for simulated annealing processors, much like the flops (floating-point operations per second) metric used for general-purpose computers. It is a measure of how many flips, and hence spin configurations, the system can cycle through in a second. fps can be derived from the reported performance metrics of the processors following ref. Sutton et al. (2019). Reported and derived quantities are indicated as such. Current CMOS-based annealing processors perform at $\sim 10^{12}$ fps. We project that MTJ-based hardware can increase this by a few orders of magnitude.

Hardware Projections: Typically the performance of Ising hardware is measured in terms of the time and energy it takes to solve a specific problem. Time to solution depends not only on the physical hardware performance but also on the algorithm being implemented. Here, we emphasize measuring the hardware performance in terms of a purely hardware metric, flips per second (fps) Sutton et al. (2019); Isakov et al. (2015); Baity-Jesi et al. (2014), which refers to the maximum number of spin configurations the hardware can cycle through per second. It depends on the number of spins in the system ($N$) and the time it takes for a spin to flip ($\tau$): $f=N/\tau$.

For digital annealers the spin update time is usually determined by the clock period ($\tau_{clk}$), which is typically in the range of tens of ns. To ensure fidelity, simultaneous updates of connected spins need to be avoided Aarts, Aarts, and Lenstra (2003), forcing digital annealers that operate on a clock edge to update connected spins sequentially. So in a network where all spins are connected, effectively only one spin can update per clock cycle Aramon et al. (2019). This restriction is relaxed if some spins are unconnected (e.g., nearest-neighbor Yamaoka et al. (2015); Baity-Jesi et al. (2014) or king-graph Takemoto et al. (2019) connections), or if spin updates are parallelized by implementing special algorithms Yamamoto et al. (2020); Patel et al. (2020); Patel, Canoza, and Salahuddin (2020). Based on the reported total spin numbers and clock speeds of today's digital annealing hardware, which has $\sim 10$K neurons that can update per $\sim 10$ ns clock period, we estimate their performance at $f\sim 10^{4}/10^{-8}=10^{12}$ flips per second Yamaoka et al. (2015); Sutton et al. (2019), as shown in Fig. 13.
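The estimate for digital annealers can be reproduced with $f=N/\tau$. In the sketch below, the function name is illustrative; the sparse-connectivity case uses the numbers quoted in the text, and the all-to-all case simply sets the number of spins updating per clock to one, as the paragraph describes.

```python
# Minimal sketch: flips per second f = N / tau for clocked digital annealers.
# The function name is hypothetical; the numbers are those quoted in the text.

def flips_per_second(n_updating_spins, tau_seconds):
    """Number of spin updates the hardware can attempt per second."""
    return n_updating_spins / tau_seconds

TAU_CLK = 10e-9  # seconds, ~10 ns clock period

# Sparse connectivity: ~10K spins can update in one clock period.
f_parallel = flips_per_second(1e4, TAU_CLK)

# All-to-all connectivity with sequential updates: one spin per clock period.
f_sequential = flips_per_second(1, TAU_CLK)

print(f"parallel updates:   {f_parallel:.1e} fps")
print(f"sequential updates: {f_sequential:.1e} fps")
```

The gap of four orders of magnitude between the two cases is just the number of spins that can update in the same clock period.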

Compared to digital annealers, the Ising spin hardware presented in this work can operate autonomously, i.e., without a synchronizing clock or a sequencer Sutton et al. (2019); Faria et al. (2020); Kaiser et al. (2020). In this mode, the speeds are governed by the neuron ($\tau_{neu}$) and synapse ($\tau_{syn}$) times only, and to ensure fidelity and avoid simultaneous updates of connected BSNs the synapse needs to update faster than the neuron ($\tau_{syn}<\tau_{neu}$). Sutton et al. (2019) define a metric $s=\tau_{syn}/\tau_{neu}$ and show that $s$ needs to be less than 1 to ensure the fidelity of operations. The exact requirements are problem and architecture dependent. Memristive crossbar arrays paired with a fast summing-amplifier synapse could operate very efficiently, with delays as low as a few tens of ps Xia et al. (2016); Cai et al. (2019c); Huang et al. (2015); Bayat et al. (2018); Hu et al. (2018); Cai et al. (2019a).
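The timing condition for autonomous operation can be written as a one-line check. This is only a sketch of the necessary condition $s<1$ stated in the text; the function names and the example delays are assumptions, and the actual margin required depends on the problem and the architecture.

```python
# Minimal sketch: the s = tau_syn / tau_neu condition for autonomous operation.
# Function names and example delays are hypothetical; s < 1 is the condition
# quoted in the text, and the required margin is problem dependent.

def synapse_to_neuron_ratio(tau_syn, tau_neu):
    """Ratio s of synapse update time to neuron fluctuation time."""
    return tau_syn / tau_neu

def satisfies_fidelity_condition(tau_syn, tau_neu):
    """True when the synapse updates faster than the neuron (s < 1)."""
    return synapse_to_neuron_ratio(tau_syn, tau_neu) < 1.0

# Example: a synapse with a delay of tens of ps and a neuron fluctuating on a ~ns scale.
tau_syn_example = 50e-12  # seconds, assumed
tau_neu_example = 1e-9    # seconds, assumed

s_example = synapse_to_neuron_ratio(tau_syn_example, tau_neu_example)
print(f"s = {s_example:.2f}, condition met: "
      f"{satisfies_fidelity_condition(tau_syn_example, tau_neu_example)}")
```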

Digital annealers mimic the Ising spin using a combination of random-number generators (LFSR, Xoshiro, etc.), look-up tables (LUTs) and comparators. The random number generator (RNG) unit is one of the most expensive elements in the design Gyoten, Hiromoto, and Sato (2018). Even in the most optimized design, the RNG unit takes up $\sim 11\%$ of the total logic gate area Yamamoto et al. (2020). The 3T-1MTJ design offers a drastic reduction in the area footprint, promising massive scalability by leveraging existing 1T-1MTJ Magnetic RAM technology that has already integrated 1 Gbit of cells Aggarwal et al. (2019); eve (2019).

Fig. 12(b) projects the fps number, taking $\tau\equiv\tau_{neu}\approx\tau_{CORR}$, for different numbers of spins $N$. An MTJ realization with circular IMA, with a $\sim$ ns timescale, can offer almost two orders of magnitude speedup with $<10$k neurons. If spins are implemented in Gbit densities, all stochastic implementations seem to outperform the CMOS implementations. For such systems the upper bound for $N$ is ultimately determined either by the area or by the power budget of the chip. Note that the fps number does not reflect the connectivity of the spins or the algorithm implemented by the hardware. It also does not indicate the solution accuracy obtainable for specific problems Zhang et al. (2020). What we highlight here is that, using the natural physics of the MTJ, we can design a very compact realization of eq. 2 compared to current state-of-the-art CMOS implementations and that, despite being magnetic circuits, low-barrier magnet implementations even offer an overall speedup due to their fast fluctuation rates.
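Two back-of-the-envelope quantities follow from this discussion: the number of autonomous spins needed to match a given fps figure, and the bound on $N$ set by a chip power budget. The sketch below is illustrative only. The neuron timescale and the power budget are placeholders rather than values from the paper, and synapse power is neglected, so the power-limited $N$ is an optimistic bound.

```python
# Minimal sketch: break-even spin count and a power-limited bound on N.
# tau_neu and the chip power budget are hypothetical placeholders; the CMOS
# reference (1e12 fps) and the per-BSN power (20 uW) are quoted in the text.

F_CMOS_REF = 1e12   # flips per second, reference from the text
P_PER_BSN = 20e-6   # watts per BSN, synapse power neglected

def spins_to_match(f_target, tau_neu):
    """Number of spins N such that N / tau_neu equals f_target."""
    return f_target * tau_neu

def max_spins_from_power(chip_power_budget, p_per_bsn):
    """Upper bound on N when only BSN power is counted."""
    return chip_power_budget / p_per_bsn

tau_neu_assumed = 1e-9   # seconds, placeholder
power_budget_assumed = 1.0  # watts, placeholder

n_break_even = spins_to_match(F_CMOS_REF, tau_neu_assumed)
n_power_limit = max_spins_from_power(power_budget_assumed, P_PER_BSN)
f_at_power_limit = n_power_limit / tau_neu_assumed

print(f"spins needed to reach {F_CMOS_REF:.0e} fps: {n_break_even:.0f}")
print(f"power-limited spin count: {n_power_limit:.0f}")
print(f"fps at the power limit: {f_at_power_limit:.1e}")
```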

V Conclusion

In this paper, we presented a comprehensive evaluation of naturally stochastic magnetic building blocks for implementing probabilistic algorithms compactly and efficiently. We generalized the proposed 1MTJ-3T design to a 1SR-3T design and presented the necessary design rules for BSN operation, which we hope will stimulate further interest in finding stochastic resistors (1SR) with suitable properties. We extended the physical performance analysis of the 1MTJ-3T BSN design to include unstable MTJs with different low-barrier magnets as free layers. These are evaluated as physical realizations of the general stochastic resistor (SR) with respect to 14 nm FinFET transistors. IMA magnets with barrier $\leq k_BT$ proved to be the best option; low-barrier PMA magnets can function as current-tunable resistors as well. While careful optimization of the fixed layer to cancel the stray fields is preferred in IMA MTJs, PMA MTJs can benefit from the presence of stray fields (which can be a source of the $I_{50}$). The most challenging working conditions are those for telegraphic IMA magnets: even if they are highly optimized and no stray fields are present in the circuit, they need to be coupled with high-current transistors because of their high pinning currents. Pairing them with low-current transistors such as 14 nm FinFETs results in a staircase-like functional behavior that, as we discussed, does not work as a p-bit.

These BSNs are an integral part of Ising machines, which are often referred to as annealing processors. Using the 1MTJ-3T BSN could speed up the operation of these processors by orders of magnitude. Another important application space for these BSNs is stochastic neural networks Kaiser et al. (2020); Nasrin et al. (2019); Schuman et al. (2017); Hinton (2002). In fact, binary stochastic neurons are desirable for deep learning networks, but are typically avoided because it is harder to generate random bits in CMOS hardware Courbariaux et al. (2016). Use of this compact neuron, which relies on the MTJ's natural physics to provide stochastic binarization, could accelerate computation in custom hardware Tsai et al. (2017); Park et al. (2015) through faster evaluation of the BSN function Hassan et al. (2019) and also encourage algorithmic advances using BSNs.

Acknowledgements.
This work was supported by the Center for Probabilistic Spin Logic for Low-Energy Boolean and Non-Boolean Computing (CAPSL), one of the Nanoelectronic Computing Research (nCORE) Centers as task 2759.005, a Semiconductor Research Corporation (SRC) program sponsored by the NSF through CCF 1739635.

Appendix A Derivation for Pinning Field of LBM

Magnets are generally used to store information, which puts the focus on evaluating and predicting the characteristics of stable high-barrier magnets. It is interesting to note that theoretical predictions and analytical derivations regarding low-barrier magnet (LBM, $\Delta\leq k_BT$) dynamics typically receive less attention, as cases of 'least practical interest' Brown Jr (1963). We document the analytical expressions associated with LBMs in Fig. 9. The expressions for the correlation time and biasing current can be found in refs. Coffey and Kalmykov (2012); Hassan et al. (2019); Kaiser et al. (2019); Sayed et al. (2019); in this appendix we derive the bias field.

Here we derive the expressions for the external magnetic field $H_0$ required to pin the magnetization of an LBM with $\Delta\leq k_BT$. We start from the energy expression for the magnet ($E$) and derive the expressions presented in Fig. 9 from the steady-state average magnetization defined by:

$$\langle m\rangle=\frac{\displaystyle\int_{\theta=0}^{\pi}\int_{\phi=-\pi}^{\pi}\sin\theta\,d\phi\,d\theta~m\,\exp(-E/k_BT)}{\displaystyle\int_{\theta=0}^{\pi}\int_{\phi=-\pi}^{\pi}\sin\theta\,d\phi\,d\theta~\exp(-E/k_BT)} \qquad (4)$$

where $(m_x,m_y,m_z)\equiv(\cos\theta,\ \sin\theta\sin\phi,\ \sin\theta\cos\phi)$.

Figure 14: Pinning field of low-barrier magnets. The numerical evaluations of the equations are compared to SPICE simulations for (a) isotropic magnets and (b) circular IMA magnets, both with $\Delta\leq k_BT$. The pinning fields are shown to be a function of $M_S\Omega$ only, where $M_S=600$ emu/cc and the volume of the magnet $\Omega$ is varied. The pinning field values for IMA magnets indicate that the pinning field is independent of the large demagnetization field $H_D$. The precise correspondence between the analytical formulas and the numerical simulation also serves as a benchmark for our finite-temperature (stochastic) LLG formulation.

A.0.1 Perpendicular Magnetic Anisotropy (PMA)

In the case of an LBM with perpendicular magnetization, the anisotropy field along the x-axis $H_{kp}\rightarrow 0$, and thus for a field applied in the x-direction the energy expression of eq. 1 reduces to:

$$E=-H_{ext}M_S\Omega\,m_x \qquad (5)$$

Evaluating eq. 4 with respect to this energy gives the Langevin function, $\langle m_x\rangle=\coth(H_{ext}M_S\Omega/k_BT)-k_BT/(H_{ext}M_S\Omega)\approx\tanh\left(H_{ext}M_S\Omega/(3k_BT)\right)$. So to pin the magnetization to either of its states, $\langle m_x\rangle=\pm 1$, the required external field for PMA magnets can be approximated by:

$$|H_{ext(PMA)}|=\frac{3k_BT}{M_S\Omega} \qquad (6)$$
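The Boltzmann average of eq. 4 for the energy of eq. 5 can be checked numerically. The sketch below is an independent illustration, not the authors' SPICE setup: it evaluates the integral on a midpoint grid in the dimensionless field $x=H_{ext}M_S\Omega/k_BT$ and compares it with the Langevin function $\coth(x)-1/x$ and with $\tanh(x/3)$. The function names and grid size are assumptions.

```python
# Minimal sketch: numerical evaluation of eq. 4 for E = -H_ext * M_S * Omega * m_x,
# written in terms of the dimensionless field x = H_ext * M_S * Omega / (k_B * T).
# Function names and the grid size are illustrative choices.
import numpy as np

def mean_mx_numeric(x, n_theta=20001):
    """Boltzmann average of m_x = cos(theta) over the unit sphere.

    The phi integral cancels between numerator and denominator because
    neither m_x nor the energy depends on phi.
    """
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta  # midpoints on [0, pi]
    weight = np.sin(theta) * np.exp(x * np.cos(theta))
    return float(np.sum(np.cos(theta) * weight) / np.sum(weight))

def langevin(x):
    """Closed form of the same average: coth(x) - 1/x."""
    return float(1.0 / np.tanh(x) - 1.0 / x)

for x in (0.3, 1.0, 3.0):
    print(f"x = {x}: numeric = {mean_mx_numeric(x):.4f}, "
          f"Langevin = {langevin(x):.4f}, tanh(x/3) = {np.tanh(x / 3.0):.4f}")
```

For small $x$ all three agree with the linear response $x/3$. At $x=3$, the field of eq. 6, the average is still below 1, so eq. 6 is best read as the characteristic field scale at which the magnetization approaches saturation rather than as a sharp threshold.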

A.0.2 In-plane Magnetic Anisotropy (IMA)

For an LBM with in-plane magnetization, the anisotropy field along the z-axis $H_{ki}\rightarrow 0$ and a large demagnetization field $H_D$ exists along the x-axis, which keeps the magnetization in-plane. For a field applied in the z-direction, the energy expression from eq. 1 in this case is:

$$E=H_DM_S\Omega\,m_x^{2}-H_{ext}M_S\Omega\,m_z. \qquad (7)$$

Once again evaluating eq. 4 with respect to this energy, the result for a very large demagnetizing field ($H_D\rightarrow\infty$) can be simplified to $\langle m_z\rangle\approx H_{ext}M_S\Omega/(2k_BT)$. So to pin the magnetization to either of its states, $\langle m_z\rangle=\pm 1$, the required external field for IMA magnets can be approximated by:

$$|H_{ext(IMA)}|=\frac{2k_BT}{M_S\Omega} \qquad (8)$$

The expression is independent of the demagnetization field. These approximate analytical expressions match our SPICE simulation results quite well, as shown in Fig. 14.
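The independence from the demagnetization field can also be illustrated by evaluating eq. 4 directly for the energy of eq. 7. The sketch below is an independent numerical illustration under stated assumptions, not the authors' SPICE simulation: both fields are expressed in units of $k_BT/(M_S\Omega)$, so `x` stands for $H_{ext}M_S\Omega/k_BT$ and `h_d` for $H_DM_S\Omega/k_BT$, and the function names and grid sizes are arbitrary choices.

```python
# Minimal sketch: eq. 4 evaluated on a grid for E = H_D*M_S*Omega*m_x^2 - H_ext*M_S*Omega*m_z.
# x and h_d are the external and demagnetization fields in units of k_B*T/(M_S*Omega).
# Function names and grid sizes are illustrative choices.
import numpy as np

def mean_mz_ima(x, h_d, n_theta=2001, n_phi=360):
    """Boltzmann average of m_z over the full sphere for the IMA energy."""
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta    # midpoints on [0, pi]
    phi = -np.pi + np.arange(n_phi) * 2.0 * np.pi / n_phi   # periodic grid on [-pi, pi)
    th, ph = np.meshgrid(theta, phi, indexing="ij")
    m_x = np.cos(th)
    m_z = np.sin(th) * np.cos(ph)
    weight = np.sin(th) * np.exp(-h_d * m_x**2 + x * m_z)
    return float(np.sum(m_z * weight) / np.sum(weight))

def mean_mz_in_plane(x, n_phi=360):
    """Limit h_d -> infinity: magnetization confined to the plane, m_z = cos(phi)."""
    phi = -np.pi + np.arange(n_phi) * 2.0 * np.pi / n_phi
    weight = np.exp(x * np.cos(phi))
    return float(np.sum(np.cos(phi) * weight) / np.sum(weight))

for x in (0.2, 1.0, 2.0):
    print(f"x = {x}: h_d=200 -> {mean_mz_ima(x, 200.0):.4f}, "
          f"h_d=2000 -> {mean_mz_ima(x, 2000.0):.4f}, "
          f"in-plane limit -> {mean_mz_in_plane(x):.4f}, x/2 = {x / 2:.4f}")
```

For small fields the average follows $x/2$, and changing `h_d` by a factor of ten barely moves the result, which is the sense in which the pinning field of eq. 8 does not depend on $H_D$.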

References

  • Yamaoka et al. (2015) M. Yamaoka, C. Yoshimura, M. Hayashi, T. Okuyama, H. Aoki,  and H. Mizuno, “24.3 20k-spin ising chip for combinational optimization problem with cmos annealing,” in 2015 IEEE International Solid-State Circuits Conference-(ISSCC), San Francisco, CA, USA (IEEE, 2015) pp. 1–3.
  • Neukart et al. (2017) F. Neukart, G. Compostella, C. Seidel, D. Von Dollen, S. Yarkoni,  and B. Parney, “Traffic flow optimization using a quantum annealer,” Frontiers in ICT 4, 29 (2017).
  • Barahona et al. (1988) F. Barahona, M. Grötschel, M. Jünger,  and G. Reinelt, “An application of combinatorial optimization to statistical physics and circuit layout design,” Operations Research 36, 493–513 (1988).
  • Cook et al. (2018) C. Cook, H. Zhao, T. Sato, M. Hiromoto,  and S. X.-D. Tan, “Gpu based parallel ising computing for combinatorial optimization problems in vlsi physical design,” arXiv preprint arXiv:1807.10750  (2018).
  • Rosenberg et al. (2016) G. Rosenberg, P. Haghnegahdar, P. Goddard, P. Carr, K. Wu,  and M. L. De Prado, “Solving the optimal trading trajectory problem using a quantum annealer,” IEEE Journal of Selected Topics in Signal Processing 10, 1053–1060 (2016).
  • Sakaguchi et al. (2016) H. Sakaguchi, K. Ogata, T. Isomura, S. Utsunomiya, Y. Yamamoto,  and K. Aihara, “Boltzmann sampling by degenerate optical parametric oscillator network for structure-based virtual screening,” Entropy 18, 365 (2016).
  • Barahona (1982) F. Barahona, “On the computational complexity of ising spin glass models,” Journal of Physics A: Mathematical and General 15, 3241 (1982).
  • Lucas (2014) A. Lucas, “Ising formulations of many np problems,” Frontiers in Physics 2, 5 (2014).
  • Sutton et al. (2017) B. Sutton, K. Y. Camsari, B. Behin-Aein,  and S. Datta, “Intrinsic optimization using stochastic nanomagnets,” Scientific Reports 7, 44370 (2017).
  • (10) “Binary stochastic neurons in tensorflow,” https://r2rt.com/binary-stochastic-neurons-in-tensorflow.html.
  • Johnson et al. (2011) M. W. Johnson, M. H. Amin, S. Gildert, T. Lanting, F. Hamze, N. Dickson, R. Harris, A. J. Berkley, J. Johansson, P. Bunyk, et al., “Quantum annealing with manufactured spins,” Nature 473, 194–198 (2011).
  • McMahon et al. (2016) P. L. McMahon, A. Marandi, Y. Haribara, R. Hamerly, C. Langrock, S. Tamate, T. Inagaki, H. Takesue, S. Utsunomiya, K. Aihara, et al., “A fully programmable 100-spin coherent ising machine with all-to-all connections,” Science 354, 614–617 (2016).
  • Dutta et al. (2020) S. Dutta, A. Khanna, H. Paik, D. Schlom, A. Raychowdhury, Z. Toroczkai,  and S. Datta, “Ising hamiltonian solver using stochastic phase-transition nano-oscillators,” arXiv preprint arXiv:2007.12331  (2020).
  • Goto, Tatsumura, and Dixon (2019) H. Goto, K. Tatsumura,  and A. R. Dixon, “Combinatorial optimization by simulating adiabatic bifurcations in nonlinear hamiltonian systems,” Science advances 5, eaav2372 (2019).
  • Wang and Roychowdhury (2019) T. Wang and J. Roychowdhury, “Oim: Oscillator-based ising machines for solving combinatorial optimisation problems,” in International Conference on Unconventional Computation and Natural Computation, Tokyo, Japan (Springer, 2019) pp. 232–256.
  • Ahmed, Chiu, and Kim (2020) I. Ahmed, P.-W. Chiu,  and C. H. Kim, “A probabilistic self-annealing compute fabric based on 560 hexagonally coupled ring oscillators for solving combinatorial optimization problems,” in 2020 IEEE Symposium on VLSI Circuits, Honolulu, USA (IEEE, 2020) pp. 1–2.
  • Chou et al. (2019) J. Chou, S. Bramhavar, S. Ghosh,  and W. Herzog, “Analog coupled oscillator based weighted ising machine,” Scientific reports 9, 1–10 (2019).
  • Kirkpatrick, Gelatt, and Vecchi (1983) S. Kirkpatrick, C. D. Gelatt,  and M. P. Vecchi, “Optimization by simulated annealing,” science 220, 671–680 (1983).
  • Baity-Jesi et al. (2014) M. Baity-Jesi, R. A. Baños, A. Cruz, L. A. Fernandez, J. M. Gil-Narvión, A. Gordillo-Guerrero, D. Iñiguez, A. Maiorano, F. Mantovani, E. Marinari, et al., “Janus ii: A new generation application-driven computer for spin-system simulations,” Computer Physics Communications 185, 550–559 (2014).
  • Takemoto et al. (2019) T. Takemoto, M. Hayashi, C. Yoshimura, and M. Yamaoka, “2.6 a 2×30k-spin multichip scalable annealing processor based on a processing-in-memory approach for solving large-scale combinatorial optimization problems,” in 2019 IEEE International Solid-State Circuits Conference-(ISSCC), San Francisco, CA, USA (IEEE, 2019) pp. 52–54.
  • Aramon et al. (2019) M. Aramon, G. Rosenberg, E. Valiante, T. Miyazawa, H. Tamura,  and H. G. Katzgraber, “Physics-inspired optimization for quadratic unconstrained problems using a digital annealer,” Frontiers in Physics 7, 48 (2019).
  • Yamamoto et al. (2020) K. Yamamoto, K. Ando, N. Mertig, T. Takemoto, M. Yamaoka, H. Teramoto, A. Sakai, S. Takamaeda-Yamazaki,  and M. Motomura, “7.3 statica: A 512-spin 0.25 m-weight full-digital annealing processor with a near-memory all-spin-updates-at-once architecture for combinatorial optimization with complete spin-spin interactions,” in 2020 IEEE International Solid-State Circuits Conference-(ISSCC), San Francisco, CA, USA (IEEE, 2020) pp. 138–140.
  • Patel et al. (2020) S. Patel, L. Chen, P. Canoza,  and S. Salahuddin, “Ising model optimization problems on a fpga accelerated restricted boltzmann machine,” arXiv preprint arXiv:2008.04436  (2020).
  • Patel, Canoza, and Salahuddin (2020) S. Patel, P. Canoza,  and S. Salahuddin, “Logically synthesized, hardware-accelerated, restricted boltzmann machines for combinatorial optimization and integer factorization,” arXiv preprint arXiv:2007.13489  (2020).
  • Camsari, Salahuddin, and Datta (2017) K. Y. Camsari, S. Salahuddin,  and S. Datta, “Implementing p-bits with embedded mtj,” IEEE Electron Device Letters 38, 1767–1770 (2017).
  • Xia et al. (2016) L. Xia, P. Gu, B. Li, T. Tang, X. Yin, W. Huangfu, S. Yu, Y. Cao, Y. Wang,  and H. Yang, “Technological exploration of rram crossbar array for matrix-vector multiplication,” Journal of Computer Science and Technology 31, 3–19 (2016).
  • Cai et al. (2019a) F. Cai, J. M. Correll, S. H. Lee, Y. Lim, V. Bothra, Z. Zhang, M. P. Flynn,  and W. D. Lu, “A fully integrated reprogrammable memristor-cmos system for efficient multiply–accumulate operations,” Nature Electronics 2, 290–299 (2019a).
  • Bayat et al. (2018) F. M. Bayat, M. Prezioso, B. Chakrabarti, H. Nili, I. Kataeva,  and D. Strukov, “Implementation of multilayer perceptron network with highly uniform passive memristive crossbar circuits,” Nature communications 9, 1–7 (2018).
  • Sutton et al. (2019) B. Sutton, R. Faria, L. A. Ghantasala, K. Y. Camsari,  and S. Datta, “Autonomous probabilistic coprocessing with petaflips per second,” arXiv preprint arXiv:1907.09664  (2019).
  • Torunbalci et al. (2018) M. M. Torunbalci, P. Upadhyaya, S. A. Bhave,  and K. Y. Camsari, “Modular compact modeling of mtj devices,” IEEE Transactions on Electron Devices 65, 4628–4634 (2018).
  • (31) “Predictive Technology Model (PTM),” http://ptm.asu.edu/.
  • Hassan et al. (2019) O. Hassan, R. Faria, K. Y. Camsari, J. Z. Sun,  and S. Datta, “Low-barrier magnet design for efficient hardware binary stochastic neurons,” IEEE Magnetics Letters 10, 1–5 (2019).
  • Daniels et al. (2020) M. W. Daniels, A. Madhavan, P. Talatchian, A. Mizrahi,  and M. D. Stiles, “Energy-efficient stochastic computing with superparamagnetic tunnel junctions,” Physical Review Applied 13, 034016 (2020).
  • Parks et al. (2018) B. Parks, M. Bapna, J. Igbokwe, H. Almasi, W. Wang,  and S. A. Majetich, “Superparamagnetic perpendicular magnetic tunnel junctions for true random number generators,” AIP Advances 8, 055903 (2018).
  • Grollier et al. (2020) J. Grollier, D. Querlioz, K. Camsari, K. Everschor-Sitte, S. Fukami,  and M. D. Stiles, “Neuromorphic spintronics,” Nature Electronics , 1–11 (2020).
  • Abeed and Bandyopadhyay (2019) M. A. Abeed and S. Bandyopadhyay, “Low energy barrier nanomagnet design for binary stochastic neurons: Design challenges for real nanomagnets with fabrication defects,” IEEE Magnetics Letters 10, 1–5 (2019).
  • Drobitch and Bandyopadhyay (2019) J. L. Drobitch and S. Bandyopadhyay, “Reliability and scalability of p-bits implemented with low energy barrier nanomagnets,” IEEE Magnetics Letters 10, 1–4 (2019).
  • Borders et al. (2019) W. A. Borders, A. Z. Pervaiz, S. Fukami, K. Y. Camsari, H. Ohno,  and S. Datta, “Integer factorization using stochastic magnetic tunnel junctions,” Nature 573, 390–393 (2019).
  • Parks et al. (2020) B. Parks, A. Abdelgawad, T. Wong, R. F. Evans, and S. A. Majetich, “Magnetoresistance dynamics in superparamagnetic Co-Fe-B nanodots,” Physical Review Applied 13, 014063 (2020).
  • Cheemalavagu et al. (2005) S. Cheemalavagu, P. Korkmaz, K. V. Palem, B. E. Akgul,  and L. N. Chakrapani, “A probabilistic cmos switch and its realization by exploiting noise,” in IFIP International Conference on VLSI (2005) pp. 535–541.
  • Shukla et al. (2014) N. Shukla, A. Parihar, E. Freeman, H. Paik, G. Stone, V. Narayanan, H. Wen, Z. Cai, V. Gopalan, R. Engel-Herbert, et al., “Synchronized charge oscillations in correlated electron systems,” Scientific reports 4, 4964 (2014).
  • Kumar, Strachan, and Williams (2017) S. Kumar, J. P. Strachan, and R. S. Williams, “Chaotic dynamics in nanoscale NbO2 mott memristors for analogue computing,” Nature 548, 318–321 (2017).
  • Stampfer et al. (2018) B. Stampfer, F. Zhang, Y. Y. Illarionov, T. Knobloch, P. Wu, M. Waltl, A. Grill, J. Appenzeller, and T. Grasser, “Characterization of single defects in ultrascaled MoS2 field-effect transistors,” ACS nano 12, 5368–5375 (2018).
  • Cai et al. (2019b) J. Cai, B. Fang, L. Zhang, W. Lv, B. Zhang, T. Zhou, G. Finocchio,  and Z. Zeng, “Voltage-controlled spintronic stochastic neuron based on a magnetic tunnel junction,” Physical Review Applied 11, 034015 (2019b).
  • Camsari et al. (2020) K. Y. Camsari, M. M. Torunbalci, W. A. Borders, H. Ohno,  and S. Fukami, “Double free-layer magnetic tunnel junctions for probabilistic bits,” arXiv preprint arXiv:2012.06950  (2020).
  • Lin et al. (2009) C. Lin, S. Kang, Y. Wang, K. Lee, X. Zhu, W. Chen, X. Li, W. Hsu, Y. Kao, M. Liu, et al., “45nm low power cmos logic compatible embedded stt mram utilizing a reverse-connection 1t/1mtj cell,” in 2009 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, USA (IEEE, 2009) pp. 1–4.
  • Camsari et al. (2017) K. Y. Camsari, R. Faria, B. M. Sutton,  and S. Datta, “Stochastic p-bits for invertible logic,” Physical Review X 7, 031014 (2017).
  • Lv, Bloom, and Wang (2019) Y. Lv, R. P. Bloom,  and J.-P. Wang, “Experimental demonstration of probabilistic spin logic by magnetic tunnel junctions,” IEEE Magnetics Letters 10, 1–5 (2019).
  • Zink, Lv, and Wang (2019) B. R. Zink, Y. Lv,  and J.-P. Wang, “Independent control of antiparallel-and parallel-state thermal stability factors in magnetic tunnel junctions for telegraphic signals with two degrees of tunability,” IEEE Transactions on Electron Devices 66, 5353–5359 (2019).
  • Parkin et al. (2004) S. S. Parkin, C. Kaiser, A. Panchula, P. M. Rice, B. Hughes, M. Samant,  and S.-H. Yang, “Giant tunnelling magnetoresistance at room temperature with mgo (100) tunnel barriers,” Nature materials 3, 862–867 (2004).
  • Ikeda et al. (2008) S. Ikeda, J. Hayakawa, Y. Ashizawa, Y. Lee, K. Miura, H. Hasegawa, M. Tsunoda, F. Matsukura, and H. Ohno, “Tunnel magnetoresistance of 604% at 300 K by suppression of Ta diffusion in CoFeB/MgO/CoFeB pseudo-spin-valves annealed at high temperature,” Applied Physics Letters 93, 082508 (2008).
  • Debashis et al. (2018) P. Debashis, R. Faria, K. Y. Camsari,  and Z. Chen, “Design of stochastic nanomagnets for probabilistic spin logic,” IEEE Magnetics Letters 9, 1–5 (2018).
  • Debashis et al. (2016) P. Debashis, R. Faria, K. Y. Camsari, J. Appenzeller, S. Datta,  and Z. Chen, “Experimental demonstration of nanomagnet networks as hardware for ising computing,” in 2016 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, USA (IEEE, 2016) pp. 34–3.
  • Safranski et al. (2020) C. Safranski, J. Kaiser, P. Trouilloud, P. Hashemi, G. Hu,  and J. Z. Sun, “Demonstration of nanosecond operation in stochastic magnetic tunnel junctions,” arXiv preprint arXiv:2010.14393  (2020).
  • Zhang et al. (2021) C. Zhang, Y. Takeuchi, S. Fukami,  and H. Ohno, “Field-free and sub-ns magnetization switching of magnetic tunnel junctions by combining spin-transfer torque and spin–orbit torque,” Applied Physics Letters 118, 092406 (2021).
  • (56) nanohub.org, “Modular approach to spintronics,” https://nanohub.org/groups/spintronics.
  • Coffey and Kalmykov (2012) W. T. Coffey and Y. P. Kalmykov, “Thermal fluctuations of magnetic nanoparticles: Fifty years after brown,” Journal of Applied Physics 112, 121301 (2012).
  • Kaiser et al. (2019) J. Kaiser, A. Rustagi, K. Y. Camsari, J. Z. Sun, S. Datta,  and P. Upadhyaya, “Subnanosecond fluctuations in low-barrier nanomagnets,” Physical Review Applied 12, 054056 (2019).
  • Pufall et al. (2004) M. R. Pufall, W. H. Rippard, S. Kaka, S. E. Russek, T. J. Silva, J. Katine,  and M. Carey, “Large-angle, gigahertz-rate random telegraph switching induced by spin-momentum transfer,” Physical Review B 69, 214409 (2004).
  • Faria, Camsari, and Datta (2017) R. Faria, K. Y. Camsari,  and S. Datta, “Low-barrier nanomagnets as p-bits for spin logic,” IEEE Magnetics Letters 8, 1–5 (2017).
  • Sun (2000) J. Z. Sun, “Spin-current interaction with a monodomain magnetic body: A model study,” Physical Review B 62, 570 (2000).
  • Faria, Camsari, and Datta (2018) R. Faria, K. Y. Camsari,  and S. Datta, “Implementing bayesian networks with embedded stochastic mram,” AIP Advances 8, 045101 (2018).
  • Romera et al. (2018) M. Romera, P. Talatchian, S. Tsunegi, F. A. Araujo, V. Cros, P. Bortolotti, J. Trastoy, K. Yakushiji, A. Fukushima, H. Kubota, et al., “Vowel recognition with four coupled spin-torque nano-oscillators,” Nature 563, 230–234 (2018).
  • Jenkins et al. (2019) S. Jenkins, A. Meo, L. E. Elliott, S. K. Piotrowski, M. Bapna, R. W. Chantrell, S. A. Majetich,  and R. F. Evans, “Magnetic stray fields in nanoscale magnetic tunnel junctions,” Journal of Physics D: Applied Physics 53, 044001 (2019).
  • Faria et al. (2020) R. Faria, J. Kaiser, K. Y. Camsari,  and S. Datta, “Hardware design for autonomous bayesian networks,” arXiv preprint arXiv:2003.01767  (2020).
  • Isakov et al. (2015) S. V. Isakov, I. N. Zintchenko, T. F. Rønnow,  and M. Troyer, “Optimised simulated annealing for ising spin glasses,” Computer Physics Communications 192, 265–271 (2015).
  • Aarts, Aarts, and Lenstra (2003) E. Aarts, E. H. Aarts,  and J. K. Lenstra, Local search in combinatorial optimization (Princeton University Press, NJ, USA, 2003).
  • Kaiser et al. (2020) J. Kaiser, R. Faria, K. Y. Camsari,  and S. Datta, “Probabilistic circuits for autonomous learning: A simulation study,” Frontiers in Computational Neuroscience 14, 14:1–7 (2020).
  • Cai et al. (2019c) F. Cai, S. Kumar, T. Van Vaerenbergh, R. Liu, C. Li, S. Yu, Q. Xia, J. J. Yang, R. Beausoleil, W. Lu, et al., “Harnessing intrinsic noise in memristor hopfield neural networks for combinatorial optimization,” arXiv preprint arXiv:1903.11194  (2019c).
  • Huang et al. (2015) H. Huang, J. Heilmeyer, M. Grözing, M. Berroth, J. Leibrich,  and W. Rosenkranz, “An 8-bit 100-gs/s distributed dac in 28-nm cmos for optical communications,” IEEE Transactions on Microwave Theory and Techniques 63, 1211–1218 (2015).
  • Hu et al. (2018) M. Hu, C. E. Graves, C. Li, Y. Li, N. Ge, E. Montgomery, N. Davila, H. Jiang, R. S. Williams, J. J. Yang, et al., “Memristor-based analog computation and neural network classification with a dot product engine,” Advanced Materials 30, 1705914 (2018).
  • Gyoten, Hiromoto, and Sato (2018) H. Gyoten, M. Hiromoto,  and T. Sato, “Area efficient annealing processor for ising model without random number generator,” IEICE TRANSACTIONS on Information and Systems 101, 314–323 (2018).
  • Aggarwal et al. (2019) S. Aggarwal, H. Almasi, M. DeHerrera, B. Hughes, S. Ikegawa, J. Janesky, H. Lee, H. Lu, F. Mancoff, K. Nagel, et al., “Demonstration of a reliable 1 gb standalone spin-transfer torque mram for industrial applications,” in 2019 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, USA (IEEE, 2019) pp. 2–1.
  • eve (2019) “Everspin enters pilot production phase for the world’s first 28 nm 1 gb stt-mram component,” Everspin Technology  (2019).
  • Zhang et al. (2020) X. Zhang, R. Bashizade, Y. Wang, C. Lyu, S. Mukherjee,  and A. R. Lebeck, “Beyond application end-point results: Quantifying statistical robustness of mcmc accelerators,” arXiv preprint arXiv:2003.04223  (2020).
  • Nasrin et al. (2019) S. Nasrin, J. L. Drobitch, S. Bandyopadhyay,  and A. R. Trivedi, “Low power restricted boltzmann machine using mixed-mode magneto-tunneling junctions,” IEEE Electron Device Letters 40, 345–348 (2019).
  • Schuman et al. (2017) C. D. Schuman, T. E. Potok, R. M. Patton, J. D. Birdwell, M. E. Dean, G. S. Rose,  and J. S. Plank, “A survey of neuromorphic computing and neural networks in hardware,” arXiv preprint arXiv:1705.06963  (2017).
  • Hinton (2002) G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural computation 14, 1771–1800 (2002).
  • Courbariaux et al. (2016) M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio, “Binarized neural networks: Training neural networks with weights and activations constrained to +1 or -1,” arXiv preprint arXiv:1602.02830 (2016).
  • Tsai et al. (2017) C.-H. Tsai, W.-J. Yu, W. H. Wong,  and C.-Y. Lee, “A 41.3/26.7 pj per neuron weight rbm processor supporting on-chip learning/inference for iot applications,” IEEE Journal of Solid-State Circuits 52, 2601–2612 (2017).
  • Park et al. (2015) S. Park, K. Bong, D. Shin, J. Lee, S. Choi,  and H.-J. Yoo, “93tops/w scalable deep learning/inference processor with tetra-parallel mimd architecture for big-data applications,” in 2015 IEEE International Solid-State Circuits Conference-(ISSCC), San Francisco, CA, USA (IEEE, 2015) pp. 1–3.
  • Brown Jr (1963) W. F. Brown Jr, “Thermal fluctuations of a single-domain particle,” Physical Review 130, 1677 (1963).
  • Sayed et al. (2019) S. Sayed, K. Y. Camsari, R. Faria,  and S. Datta, “Rectification in spin-orbit materials using low-energy-barrier magnets,” Physical Review Applied 11, 054063 (2019).