An FPGA Implementation of On-Chip Trainable Multilayer SAM Spiking Neural Network

This paper describes an implementation of a multilayer SAM spiking neural network in the PL (programmable logic, FPGA) part of a Xilinx Zynq processor. The SAM neuron model is a variant of the widely used LIF spiking neuron model. The SAM neural network can be trained on-chip because, under our proposed Back Propagation (BP) based training algorithm, the model requires no multiplier. As a result, the model realized an XOR logic element in units of spike timing. Moreover, we achieved a multiplier-less implementation with the intended algorithm and architecture. The design allows an arbitrary number of hidden and output neurons to be set.


Introduction
Recently, research and industrial applications of AI have progressed rapidly; in particular, the neural network area of AI is producing a wide variety of results (1). Researchers are reducing the precision of neuron outputs and neuron parameters in order to reduce circuit scale and improve performance. Spiking neural networks (SNNs) are therefore attracting substantial attention, because the output of an SNN neuron can be regarded as only 1 bit of information. Moreover, an SNN is a closer model to a biological neuron than ordinary artificial neural networks (ANNs) (2,3,4).
We have proposed a supervised training algorithm for the multilayer SAM spiking neural network (5). The algorithm is based on BP. Because an approximation that avoids multiplication allows a 'multiplier-less' implementation on digital circuits such as FPGAs, the model can achieve 'on-chip training' on such circuits.
So far, reports on the SAM-SNN have covered only simulation results of AND logic and third-order polynomial function approximation on Intel-Altera FPGAs (6,7). This time, we implemented XOR logic in units of spike timing using the SAM-SNN in the PL (FPGA) part of a Xilinx Zynq processor. Moreover, we designed the SAM-SNN so that the numbers of hidden neurons and output neurons can be set arbitrarily.

SAM neuron model
The Spike Accumulation and Modulation (SAM) neuron model is a variant of the LIF spiking neuron model. The behavior of the SAM neuron model is as follows. At a discrete time $t$, when the $j$-th neuron receives spikes $x_i(t) \in \{0,1\}$ from the $i$-th neurons, the inner potential $v_j(t)$ of the $j$-th neuron is calculated as the sum of products of the link weights $w_{ji}$ and the input spikes $x_i(t)$, plus the decayed previous inner potential $\gamma v_j(t-1)$ (where $\gamma$ is a decay parameter):

$$v_j(t) = \sum_i w_{ji} x_i(t) + \gamma v_j(t-1).$$

The $j$-th neuron output $y_j(t) \in \{0,1\}$ is obtained by applying the activation function $f(\cdot)$ to $v_j(t)$, where $f(\cdot)$ is a step function with threshold $\theta$: if $v_j(t)$ is less than $\theta$, the output is 0, corresponding to no spike; if $v_j(t)$ is greater than or equal to $\theta$, the output is 1 and a spike is generated, i.e.,

$$y_j(t) = f(v_j(t)) = \begin{cases} 1 & (v_j(t) \geq \theta) \\ 0 & (v_j(t) < \theta). \end{cases}$$

Moreover, the inner potential is decreased by $\theta$ upon activation, such that

$$v_j(t) \leftarrow v_j(t) - \theta \quad \text{when } y_j(t) = 1.$$

The model has the advantage of treating time information within a single neuron body, so the frequency of output spikes is not reduced even when the sampling interval is wide, compared to the LIF neuron model. Therefore, the SAM neuron model is suitable for implementation on digital circuits (FPGAs).
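The update rule above can be sketched as follows. This is an illustrative Python sketch, not the paper's Verilog-HDL design; the reset-by-subtraction of $\theta$ on firing is assumed from the description of the model.

```python
def sam_step(v_prev, weights, spikes_in, gamma=0.5, theta=0.5):
    """One discrete-time update of a SAM neuron (illustrative sketch).

    v_prev    : inner potential v(t-1)
    weights   : link weights w_i
    spikes_in : input spikes x_i(t), each 0 or 1
    Returns (y, v): output spike y(t) and the updated potential v(t).
    """
    # Accumulate weighted input spikes plus the decayed previous potential.
    v = sum(w * x for w, x in zip(weights, spikes_in)) + gamma * v_prev
    if v >= theta:   # step activation: fire when the threshold is reached
        v -= theta   # potential decreased by theta upon firing (assumed)
        return 1, v
    return 0, v
```

Running two consecutive sub-threshold inputs shows the accumulation effect: a single input spike with weight 0.4 does not fire the neuron, but a second one on the next step does, because the decayed potential carries over.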

Supervised training algorithm of Multilayer SAM neural network
The training approach for the multilayer SAM neural network is based on an approximation of BP. We define the objective function as the squared-error loss between the output spike $y_j^{\langle 3\rangle}(t)$ and the teacher spike $T_j^{\langle 3\rangle}(t)$:

$$E(t) = \frac{1}{2} \sum_j \left( T_j^{\langle 3\rangle}(t) - y_j^{\langle 3\rangle}(t) \right)^2.$$

Generally, for the $l$-th layer, the link weight $w_{ji}^{\langle l\rangle}$ is updated by the gradient, such as

$$w_{ji}^{\langle l\rangle} \leftarrow w_{ji}^{\langle l\rangle} - \eta \frac{\partial E(t)}{\partial w_{ji}^{\langle l\rangle}},$$

where

$$\frac{\partial E(t)}{\partial w_{ji}^{\langle l\rangle}} = \left( y_j^{\langle l\rangle}(t) - T_j^{\langle l\rangle}(t) \right) f'\!\left( v_j^{\langle l\rangle}(t) \right) x_i^{\langle l\rangle}(t).$$

Here, when the layer is a hidden layer, $T_j^{\langle l\rangle}(t)$ is the teacher signal for the hidden layer, calculated under the approximation of BP by propagating the output-layer error backward through the link weights. Because the derivative of the step function $f'(\cdot)$ is a delta function with infinite value, we instead use the pseudo-derivative function shown in figure 2.
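The per-neuron weight update can be sketched as below (Python, not the paper's HDL). The triangular `fprime` is an assumed stand-in for the pseudo-derivative; the actual shape is the one given in the paper's figure 2, and the values of `theta`, `width`, and `eta` here are illustrative.

```python
def fprime(v, theta=0.5, width=1.0):
    """Assumed triangular pseudo-derivative of the step activation:
    nonzero only near the threshold, with peak value 1 at v = theta."""
    return max(0.0, 1.0 - abs(v - theta) / width)

def update_weights(weights, x_in, v, y, t_teach, eta=0.25):
    """One gradient-style update of a single neuron's incoming weights.

    x_in    : input spikes x_i(t)
    v       : the neuron's inner potential v(t)
    y       : the neuron's output spike y(t)
    t_teach : the teacher spike T(t) for this neuron
    eta     : learning coefficient
    """
    err = y - t_teach        # output error (y - T)
    delta = err * fprime(v)  # error scaled by the pseudo-derivative
    # Each weight moves against the gradient; only weights with an input
    # spike (x = 1) change, which is what removes the need for multipliers.
    return [w - eta * delta * x for w, x in zip(weights, x_in)]
```

Note that because `x` is 0 or 1 and the pseudo-derivative can be chosen with power-of-two levels, the products above degenerate into gating and shifts in hardware.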

Design of the SAM model and the SAM neural network
The schematic design of the digital circuitry based on the training algorithm described in the previous section is shown in figure 3. Figure 3 shows that the SAM neuron model circuit is realized without any multiplier (namely, 'multiplier-less'). The multiplication by the decay constant $\gamma$ is realized by a shift operation (trapezoid symbol). Figure 4 shows the multilayer SAM neural network design, which has switches for inferring mode and training mode. The 'Learning' module produces the teacher spike signal for each layer. The HDL design is parameterized so that circuits are generated automatically by specifying a hyperparameter (the number of hidden units). We used Verilog-HDL for the design.
This time, we chose the XOR problem as the task. For number representation, we adopted a 17-bit fixed-point format (m.n format) with m = 5 integer bits and n = 12 fractional bits.
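The fixed-point handling and the shift-based decay can be sketched together in Python (illustrative; the actual design is in Verilog-HDL): values are stored as integers scaled by $2^{12}$, and multiplying by the decay parameter $\gamma = 0.5$ reduces to a 1-bit arithmetic right shift, which is why no hardware multiplier is needed for the decay.

```python
FRAC_BITS = 12  # n = 12 fractional bits (m = 5 integer bits, 17 bits total)

def to_fixed(x):
    """Encode a real value into the m.n fixed-point integer representation."""
    return int(round(x * (1 << FRAC_BITS)))

def from_fixed(q):
    """Decode a fixed-point integer back into a real value."""
    return q / (1 << FRAC_BITS)

def decay_half(q):
    """Multiplication by gamma = 0.5 as an arithmetic right shift.
    Python's >> on ints is arithmetic, so negative potentials also work."""
    return q >> 1
```

For example, a potential of 0.75 is stored as 3072, and one decay step shifts it to 1536, i.e. 0.375, with no multiplier involved.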
As the parameters of the SAM neuron model, we set the decay parameter $\gamma = 0.5$ and the threshold $\theta = 0.5$. The initial values of the link weights $w_{ji}^{\langle l\rangle}$ are set at random, and the learning coefficient is set to $\eta = 0.25$. The state design is as follows. There are three states: the initial state (S0), the inferring-and-training state (G0L0), and the next-time-step state (TNEXT). After the start, the controller repeats G0L0 and TNEXT alternately.
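The controller's state sequencing described above can be sketched as a minimal transition table (Python sketch; the state names follow the text, and the real controller is an HDL state machine):

```python
def next_state(state):
    """Three-state controller: S0 starts the machine, then G0L0
    (infer + train) and TNEXT (advance the time step) alternate."""
    return {"S0": "G0L0", "G0L0": "TNEXT", "TNEXT": "G0L0"}[state]
```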
Moreover, this time, we introduced a parameterized design that allows an arbitrary number of hidden neuron units and output neuron units to be set, using the generate statement of Verilog-HDL.

Simulation and Implementation results
The tool used for design, implementation, and simulation is the Xilinx Vivado Design Suite 2018.3 (8), and the target FPGA is a Zynq processor (xc7z020clg484-1), which has 17,600 LUTs and 35,200 FFs.
The result of the 2-2-1 (2 inputs, 2 hidden neurons, 1 output neuron) SAM neural network on the XOR task is shown in figure 5. Figure 5 shows six blocks, which are, from top to bottom: the spikes of the inputs (X1, X2), output (Xj), and teacher signal (Tj) of the net; the input/output/teacher signals of the output layer; those of hidden neuron 1; and those of hidden neuron 2. At training epoch 459 (signal l[9:0]), the XOR task is realized. Moreover, the figure shows that hidden neuron 1 performs as an OR logic element and hidden neuron 2 performs as an AND logic element. Figure 6 shows the implementation report of the design. The model requires only a small circuit scale: 72 LUTs (0.41% of 17,600 LUTs) and 57 FFs (0.16% of 35,200 FFs) in total. In addition, no DSP48E usage was reported; that is, a multiplier-less design was achieved.
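The observed decomposition, hidden neuron 1 acting as OR and hidden neuron 2 as AND, is sufficient to realize XOR, which can be checked with threshold units (an illustrative Python sketch; all weights and thresholds here are hand-picked for the check, not the trained values from figure 5):

```python
def fires(weights, inputs, theta):
    """A single threshold unit: spike (1) iff the weighted sum reaches theta."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= theta else 0

def xor_net(x1, x2):
    """2-2-1 structure: hidden 1 = OR, hidden 2 = AND, output combines them."""
    h_or  = fires([1.0, 1.0], [x1, x2], 0.5)   # fires for any input spike
    h_and = fires([0.5, 0.5], [x1, x2], 0.9)   # fires only for both inputs
    # Output fires only when OR fires and AND does not: exactly one input spike.
    return fires([1.0, -1.0], [h_or, h_and], 0.5)
```

Enumerating the four input patterns reproduces the XOR truth table, mirroring the role division the trained network converged to.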

Conclusions
This time, we implemented the three-layer SAM spiking neural network on the Xilinx Zynq processor. As concluding remarks: (a) in simulation, the model performs as an XOR logic element in units of spike timing; (b) the implemented circuit scale is small, and it is confirmed that the implementation uses no multiplier, so the design allows 'on-chip training' circuitry; (c) the design is parameterized in the numbers of hidden-layer and output-layer neuron units, which allows an arbitrary number of neuron units to be set.
Recently, the embedded trainable AI board Jetson Nano, which uses a GPU, has gone on sale from NVIDIA (8). Generally speaking, however, a GPU requires a considerable amount of electrical energy (the Jetson Nano needs 5 to 10 watts). This research aims to develop trainable AI devices characterized by energy efficiency. Possible applications of this research include autonomous robots, intelligent sensors, and myoelectric prosthetic hands.

Fig. 6. Utilization of the implementation for the Zynq processor (xc7z020clg484-1). No DSP48E (multiplier) usage is reported.