
EchoBay: optimization of Echo State Networks under memory and time constraints

The increase in computational power of embedded devices and the latency demands of novel applications have brought a paradigm shift in how and where computation is performed. AI inference is slowly moving from the Cloud to end-devices with limited resources, reducing bandwidth and latency through compression, distillation of large networks, or quantization methods. While this approach has worked well with regular Artificial Neural Networks, time-centric recurrent networks such as Long Short-Term Memory remain too complex to be transferred to embedded devices without extreme simplifications. To address this issue, the Reservoir Computing paradigm proposes sparse, untrained non-linear networks, the reservoir, that can embed temporal relations without some of the hindrances of Recurrent Neural Network training, and with a lower memory occupation. Echo State Networks (ESN) and Liquid State Machines are the most notable examples. In this scenario, we propose a methodology for ESN design and training based on Bayesian Optimization. Our Bayesian learning process efficiently searches for hyper-parameters that maximize a fitness function, while also considering soft memory and time boundaries, measured empirically on the target device (whether embedded or not) and subject to the user's constraints. Preliminary results show that the system is able to optimize the ESN hyper-parameters under stringent time and memory constraints while obtaining comparable prediction accuracy.


  1. EchoBay: Design of Echo State Networks under Memory and Time Constraints. 17th–31st May, NGCX @ San Francisco. Luca Cerina {luca.cerina@polimi.it}, Giuseppe Franco {g.franco4@studenti.unipi.it}, Marco D. Santambrogio {marco.santambrogio@polimi.it}
  2. Biological inspiration (image: cdn.aarp.net)
  3. Visual Cortex / GoogLeNet. In Convolutional Neural Networks and similar architectures, information traverses the network directly: ● Information and recognition are defined by neural weights ● Easier to learn (differentiable functions) ● Do not require temporal relations to function properly
  4. Memory in the Brain (image: human-memory.net). Complexity explodes if we want to mimic memory functions: memory is spread across different brain cortices. Following the biologically inspired road is technologically challenging.
  5. Memorize everything (image: digitaltrends.com). Although information density grows ever larger, data storage without semantics is not an efficient paradigm for memory (e.g. one million photos of cats do not explicitly represent the concept of a cat).
  6. Model everything. Graphs and equation models add semantics to the data to improve knowledge, but they either require human supervision (e.g. annotating graphs) or are extremely difficult to identify from data (e.g. NARMAX models of non-linear dynamic systems).
  7. Learn everything. Early Hopfield networks provided associative memory, but with a low storage capacity (roughly 0.13–0.14 patterns per neuron). Modern LSTM (Long Short-Term Memory) networks can learn complex temporal relations at different time scales.
  8. Power demands power. Novel RNN architectures are more efficient than LSTM, but they still require long training times and high computational power. These limitations confine RNNs to large cloud setups (38M parameters for DeepSpeech 2 [1]) or very shallow models (4 layers at most) on mobile systems [2][3]. Latency-critical applications require smarter models. [1] Dario Amodei et al. 2016. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. [2] Qingqing Cao et al. 2017. MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU. [3] J. Chauhan et al. 2018. Breathing-Based Authentication on Resource-Constrained IoT Devices using Recurrent Neural Networks.
  9. Enter Echo State Networks
  10. Echo State Network. ● Win and W are random → untrained ● Wout is trained with least-squares regression ● Fewer weights ● Less data required ● Efficient and fast training
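The untrained-reservoir recipe above can be sketched in a few lines of NumPy. This is an illustrative toy, not the EchoBay library itself; the sizes, density, spectral radius, and the one-step-recall task are all assumed values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 1 input, 50 reservoir units, 10% density.
n_in, n_r, density = 1, 50, 0.1

# Win and W are random and stay untrained.
W_in = rng.uniform(-1, 1, (n_r, n_in))
W = rng.uniform(-1, 1, (n_r, n_r)) * (rng.random((n_r, n_r)) < density)
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()  # rescale spectral radius to 0.9

def run_reservoir(u):
    """Collect reservoir states for an input sequence u of shape (T, n_in)."""
    x = np.zeros(n_r)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ u_t + W @ x)
        states.append(x.copy())
    return np.array(states)

# Only Wout is trained, with ridge (regularized least-squares) regression.
u = rng.uniform(-1, 1, (200, n_in))
y = np.roll(u, 1, axis=0)  # toy task: recall the previous input
X = run_reservoir(u)
lam = 1e-6
W_out = np.linalg.solve(X.T @ X + lam * np.eye(n_r), X.T @ y)
pred = X @ W_out
```

Because only `W_out` is fit, training reduces to a single linear solve, which is what makes ESNs cheap enough for constrained targets.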
  11. ESN embedded. Echo State Networks can exploit reservoirs with different topologies: random, cyclic, and mixed (small-world). Simpler topologies allow special implementations such as photonic devices [4], memristors [5], and FPGAs [6]. Regular random ESNs instead require fine tuning of hyper-parameters to reach a compromise between performance, memory usage, and computational time. [4] Laurent Larger et al. 2017. High-Speed Photonic Reservoir Computing Using a Time-Delay-Based Architecture: Million Words per Second Classification. [5] Shiping Wen, Rui Hu et al. 2018. Memristor-Based Echo State Network with Online Least Mean Square. [6] Miquel L. Alomar et al. 2016. FPGA-Based Stochastic Echo State Networks for Time-Series Forecasting.
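To see why simpler topologies map well to hardware, a cyclic reservoir can be built deterministically with a single weight value. A minimal sketch, with w = 0.5 as an assumed value:

```python
import numpy as np

def cyclic_reservoir(n_r, w=0.5):
    """Deterministic cyclic topology: each unit feeds only the next one,
    all with the same weight w, so only n_r weights must be stored."""
    W = np.zeros((n_r, n_r))
    for i in range(n_r):
        W[(i + 1) % n_r, i] = w
    return W

W_cyc = cyclic_reservoir(8)
# The spectral radius of a cyclic reservoir is simply |w|,
# so it can be set directly instead of being tuned.
```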
  12. Bayesian Optimization. Bayesian Optimization efficiently searches the regions of the hyper-parameter space that are expected to improve performance (i.e. the fitness function). How does it apply to embedded systems?
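As an illustration of this search loop (not EchoBay's actual optimizer), here is a toy Bayesian optimization over a single hyper-parameter, using a hand-rolled Gaussian-process surrogate and an upper-confidence-bound acquisition. The quadratic stand-in fitness, the kernel settings, and the search grid are all assumptions:

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel on 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Gaussian-process posterior mean and standard deviation (toy surrogate)."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    mu = Ks.T @ np.linalg.solve(K, y_train)
    Kss = rbf(x_test, x_test)
    var = np.clip(np.diag(Kss - Ks.T @ np.linalg.solve(K, Ks)), 1e-12, None)
    return mu, np.sqrt(var)

def fitness(rho):
    """Stand-in for the real fitness over one hyper-parameter (assumed shape)."""
    return -(rho - 0.8) ** 2  # maximized at rho = 0.8

grid = np.linspace(0.0, 1.2, 121)
xs = [0.1, 1.1]                 # initial evaluations
ys = [fitness(x) for x in xs]
for _ in range(10):
    mu, sd = gp_posterior(np.array(xs), np.array(ys), grid)
    # UCB acquisition: sample where the surrogate expects improvement.
    x_next = grid[int(np.argmax(mu + 2.0 * sd))]
    xs.append(float(x_next))
    ys.append(fitness(x_next))

best = xs[int(np.argmax(ys))]
```

The key property is that each new evaluation is placed where the surrogate predicts high fitness or high uncertainty, so expensive fitness calls (a full ESN train-and-test) are spent only on promising configurations.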
  13. Memory Constraints. Since ESNs are extremely sparse, memory occupation depends mostly on the number of active units, Runits. On embedded devices we can optimize the network both by setting hard constraints on density and reservoir size, and by introducing a penalty factor on Runits.
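The soft memory constraint described above can be sketched as follows, assuming a simple footprint model in which the active recurrent weights (Runits ≈ density · Nr²) dominate; the byte sizes, budget, and penalty weight are all assumed values:

```python
# Assumed footprint model: active recurrent weights dominate memory occupation.
def esn_memory_bytes(n_r, density, n_in=1, n_out=1, bytes_per_weight=4):
    r_units = density * n_r * n_r        # active (non-zero) recurrent weights
    dense = n_r * (n_in + n_out)         # Win and Wout are stored dense
    return bytes_per_weight * (r_units + dense)

def penalized_fitness(fitness, n_r, density, budget_bytes, weight=1e-3):
    """Soft constraint: subtract a penalty proportional to how far the
    estimated footprint exceeds the budget measured on the target device."""
    excess = max(0.0, esn_memory_bytes(n_r, density) - budget_bytes)
    return fitness - weight * excess
```

A hard constraint would instead reject any (Nr, density) pair whose footprint exceeds the budget before the fitness is ever evaluated; the penalty variant lets the optimizer trade a little memory for accuracy.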
  14. Time Constraints. Since ESNs are extremely sparse, computation time also scales with the number of active units. A target-dependent benchmark map gives us a contour of configurations that respects a given time constraint. Other options include precision reduction, quantization, and multithreading.
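The benchmark-map idea can be sketched as follows: time one reservoir update empirically, then keep only the (Nr, density) pairs inside the budget. In the real flow the measurement would run on the target device; here it runs on the host, and the budget and grid are assumed values:

```python
import time
import numpy as np

def bench_update(n_r, density, reps=50, seed=0):
    """Average the wall-clock time of one reservoir update x = tanh(Win*u + W x).
    On the real flow this would be measured on the target device."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, (n_r, n_r)) * (rng.random((n_r, n_r)) < density)
    W_in = rng.uniform(-1, 1, n_r)
    x = np.zeros(n_r)
    t0 = time.perf_counter()
    for _ in range(reps):
        x = np.tanh(W_in * 0.5 + W @ x)  # 0.5 stands in for the input sample
    return (time.perf_counter() - t0) / reps

# Benchmark map: keep only (Nr, density) pairs inside the assumed budget.
budget_s = 1e-3
grid = [(n, d) for n in (50, 100, 200) for d in (0.05, 0.1, 0.2)]
feasible = [(n, d) for n, d in grid if bench_update(n, d) <= budget_s]
```

The feasible set then becomes the search space handed to the optimizer, so every candidate it proposes already respects the time constraint.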
  15. Experimental analysis. The network was tested on state-of-the-art non-linear prediction tasks: ● Santa Fe Laser ● NARMA10 task ● Lorenz equations
  16. Results: memory. Hard-constraint optimization can be managed by choosing high Nr, high 𝜌, or balanced solutions, following the benchmark contours. 1-step Laser prediction task on an ESP32 target (240 MHz). Other hyper-parameters: 𝜔in = 𝛼 = 1, λ = 0.
  17. Results: memory. If the problem becomes too complex for small random topologies, we can couple the constraint with memory penalization. 5-step NARMA10 prediction task on an ARM target. Other hyper-parameters: 𝜔in = 0.4, 𝛼 = 1, λ = 0. The ESP32 reached only 45% accuracy.
  18. Results: time. Proper boundaries on the Bayesian optimization guarantee optimal or near-optimal performance under decreasing time constraints. 5-step Lorenz prediction task on an ARM target.
  19. Results: time. Proper boundaries on the Bayesian optimization guarantee optimal or near-optimal performance under decreasing time constraints.
  20. Conclusions. Bayesian Optimization and Echo State Networks provide competitive performance on temporal learning tasks. Target-dependent constraints allow performance tuning and smarter optimization. The EchoBay library simplifies the design and testing process without writing a single line of code. QUESTIONS? Luca Cerina {luca.cerina@polimi.it}, Giuseppe Franco {g.franco4@studenti.unipi.it}, Marco D. Santambrogio {marco.santambrogio@polimi.it}