Tech on#06 Next-Generation Simulation Optimization Using Reinforcement Learning, Eduardo Gonzalez @ Skymind

  1. • Self-introduction • Introduction to AnyLogic • Introduction to reinforcement learning • Benefits of AnyLogic + reinforcement learning • Examples and results | OUTLINE
  2. Eduardo Gonzalez • Currently VP of Engineering @ Skymind • Leading RL Applications • Previously: Assistant Manager @ JBS; Intern Researcher @ Panasonic • @wm_eddie https://qiita.com/wmeddie https://wm-eddie.info | WHO AM I
  3. ● Builds AI infrastructure for operating models in production ● Allows model access from cloud, server, desktop, and mobile ● Provides tooling for models, such as revision history and accuracy monitoring over time ● Created the widely used open-source AI framework Deeplearning4j, powering AI for large enterprises globally, from banking to telecom ● PRODUCTS: SKIL, an ML and DL model server | ABOUT SKYMIND
  4. Skymind’s team has contributed millions of lines of code to open source | OPEN SOURCE CONTRIBUTORS
  5. Deep Learning: A Practitioner’s Approach ● Written by Adam Gibson (CTO) and Josh Patterson (Contributor) ● Published in 2017 ● Good fundamentals for deep learning and the DL4J framework ● Many graphics in this talk come from the book | BOOK
  6. Deep Learning and the Game of Go ● Written by Max Pumperla, Deep Learning Engineer @ Skymind ● Published in 2019 ● Shows how to go from zero to a complete AlphaZero-style Go bot ● Introduces deep learning and reinforcement learning from scratch | BOOK
  7. Introduction to AnyLogic
  8. AnyLogic is multimethod simulation modeling software capable of system dynamics, agent-based, and discrete-event simulation. It is a de facto standard in the industry and is used by almost all of the Fortune 500. AnyLogic models can be exported into a Java application and deployed to customers. | ANYLOGIC
  9. AnyLogic models are extended with Java so you can create custom agents or experiments. Exported applications are Java libraries and can be integrated into and leverage data from Enterprise applications and Excel. | ANYLOGIC DETAILS
  10. DL4J includes RL4J, a reinforcement learning library for Java. It can be used inside AnyLogic without friction. Reinforcement learning was a main theme of the AnyLogic ’19 Conference. Skymind collaborated closely with AnyLogic on workshops and panel discussions. | WHY ANYLOGIC + SKYMIND
  11. Introduction to reinforcement learning
  12. | WHAT IS AI?
  13. | 4 TYPES OF LEARNING
  14. | REINFORCEMENT LEARNING IN DETAIL
  15. Q-learning is a method for training a reinforcement learning agent to anticipate how much reward it can expect in the future. The Q comes from the standard mathematical notation Q(s, a), which is a function of the state and a possible action. © Intel. Illustration from Deep Learning and the Game of Go © Manning | REINFORCEMENT LEARNING ALGORITHMS (VALUE)
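The Q(s, a) idea on this slide can be illustrated with a minimal tabular sketch. This is not code from the talk, which uses Java and RL4J with a neural network; it is plain Python, and the names `q_update`, `alpha`, and `gamma` as well as the state labels are assumptions chosen for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Best value the agent currently expects from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Target: immediate reward plus discounted expected future reward.
    target = reward + gamma * best_next
    # Move the old estimate a fraction alpha of the way towards the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-state, two-action example; all estimates start at 0.
q = defaultdict(float)
actions = [0, 1]
first = q_update(q, "s0", 1, reward=1.0, next_state="s1", actions=actions)
second = q_update(q, "s1", 0, reward=0.0, next_state="s0", actions=actions)
```

The second update shows how reward propagates backwards: "s1" receives no reward itself, yet its estimate rises because it leads to "s0", where a rewarding action is already known.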
  16. Actor-critic algorithms take the current state as input and output a set of moves to play (the policy, from the actor) and a value estimating which player is ahead (from the critic). © Intel. Illustration from Deep Learning and the Game of Go © Manning | REINFORCEMENT LEARNING ALGORITHMS (POLICY)
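As a rough sketch of the two outputs described here, the snippet below turns raw actor scores into a policy and compares an outcome against the critic's prediction using a one-step advantage. The advantage formula is a common textbook choice, not something stated on the slide, and the function names and numbers are illustrative assumptions.

```python
import math


def softmax(logits):
    """Turn raw actor scores into a probability distribution over actions."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def advantage(reward, value, next_value, gamma=0.99):
    """One-step advantage: how much better the outcome was than the critic predicted."""
    return reward + gamma * next_value - value


# Hypothetical actor scores for two actions, and hypothetical critic estimates.
policy = softmax([2.0, 1.0])
adv = advantage(reward=1.0, value=0.5, next_value=0.0)
```

A positive advantage suggests the chosen action did better than expected, so training would make it more likely; a negative one would make it less likely.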
  17. Benefits of AnyLogic + reinforcement learning
  18. • Many NP-hard problems arise in simulation • Current optimization techniques cannot solve them in practice • A good-enough solution is better than no solution • And better than hand-written heuristics | WHY REINFORCEMENT LEARNING
  19. © The AnyLogic Company | www.anylogic.com Learning and decision making from a simulation model: FINAL MODEL, LEARN. A simulation model is an extension of someone’s mental model.
  20. Learning and decision making from a simulation model: FINAL MODEL, LEARN.
  21. Simulation as the reinforcement learning environment: SIMULATED WORLD (Simulation Model)
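To make "simulation as the reinforcement learning environment" concrete, here is a minimal sketch of the usual interaction loop: the agent picks an action, the simulation advances one step, and an observation and reward come back. `ToyQueueSimulation`, its `reset`/`step` methods, and the reward are hypothetical stand-ins written in Python; they are not the AnyLogic or RL4J API.

```python
import random


class ToyQueueSimulation:
    """Hypothetical stand-in for a simulation model: one queue, served on demand."""

    def __init__(self, horizon=20, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.time = 0
        self.queue = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.time = 0
        self.queue = 0
        return [self.queue]

    def step(self, action):
        """Advance the simulated world by one tick under the chosen action."""
        # Exogenous, stochastic input: a new arrival with probability 0.5.
        if self.rng.random() < 0.5:
            self.queue += 1
        # Action 1 serves one waiting item; action 0 does nothing.
        if action == 1 and self.queue > 0:
            self.queue -= 1
        self.time += 1
        reward = -self.queue  # fewer waiting items is better
        done = self.time >= self.horizon
        return [self.queue], reward, done


def run_episode(env, policy):
    """Run one episode and return the total reward collected by the policy."""
    obs = env.reset()
    total, done = 0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


always_serve = run_episode(ToyQueueSimulation(), lambda obs: 1)
never_serve = run_episode(ToyQueueSimulation(), lambda obs: 0)
```

Both runs use the same seed and therefore see the same arrivals, so the difference in total reward comes only from the policy. A learning algorithm would run this loop repeatedly to search for a policy with higher total reward.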
  22. Examples and results
  23. Traffic Light Example. Eduardo Gonzalez (VP Engineering, Skymind), Samuel Audet (Deep Learning Engineer, Skymind), Tyler Wolfe-Adam (Technical Support Specialist, The AnyLogic Company)
  24. Traffic Light Example. Cars enter the intersection from 4 directions (N, S, W, E) and move towards the opposing side. The objective of the training experiment is to learn a policy that optimally controls the traffic light based on the current status of the traffic. [Figure: arrival rates (per hour) vs. time (seconds)]
  25. Implementation Architecture
  26. Implementation Architecture: AnyLogic Model, Imported RL4J library, Custom Experiment
  27. What is inside the Custom experiment? Hyperparameters, Network configuration, Training
  28. What is inside the Custom experiment? Network configuration: Input: 10, Hidden 1: 300, Hidden 2: 300, Output: 2
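The layer sizes on this slide (10 inputs, two hidden layers of 300 units, 2 outputs) describe a small fully connected network. The sketch below builds one in plain Python to show the shapes and the parameter count. The ReLU activation, the weight initialization, and the reading of the two outputs as one score per action are assumptions; the talk configures its network through RL4J in Java, which is not shown here.

```python
import random

LAYER_SIZES = [10, 300, 300, 2]  # input, hidden 1, hidden 2, output (sizes from the slide)


def init_layers(sizes, seed=0):
    """Create (weights, biases) per layer with small random weights (an assumption)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.gauss(0.0, 0.05) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Run one observation through the network and return the output scores."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers only (an assumption)
    return x


def parameter_count(sizes):
    """Number of trainable weights and biases in a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


layers = init_layers(LAYER_SIZES)
scores = forward(layers, [0.0] * 10)       # one score per output unit
n_params = parameter_count(LAYER_SIZES)    # 3,300 + 90,300 + 602 = 94,202
```

Most of the 94,202 parameters sit between the two hidden layers, so the width of 300 is the main knob if training time in the simulation becomes a concern.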
  29. What is inside the Custom experiment? Network configuration
  30. What is inside the Custom experiment? Network configuration, Training
  31. What is inside the Custom experiment?
  32. What is inside the Custom experiment? Array with 10 elements [Figure: diagram labels 1 to 9]
  33. What is inside the Custom experiment?
  34. What is inside the Custom experiment? Action == 0: do nothing. Action == 1: change the traffic light phase if not yellow.
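The two-action scheme on this slide can be sketched as a small function. The phase names and the four-phase cycle below are hypothetical, and the sketch assumes the simulation itself ends a yellow phase on a timer; only the meaning of actions 0 and 1 comes from the slide.

```python
# Hypothetical phase cycle; the slide only states that yellow cannot be interrupted.
PHASES = ["NS_GREEN", "NS_YELLOW", "EW_GREEN", "EW_YELLOW"]


def apply_action(phase, action):
    """Return the traffic light phase after the agent's action."""
    if action not in (0, 1):
        raise ValueError("action must be 0 (do nothing) or 1 (change phase)")
    # Action 1 advances the cycle, but only when the light is not yellow.
    if action == 1 and not phase.endswith("YELLOW"):
        return PHASES[(PHASES.index(phase) + 1) % len(PHASES)]
    # Action 0, or action 1 during yellow, leaves the phase unchanged.
    return phase


kept = apply_action("NS_GREEN", 0)
changed = apply_action("NS_GREEN", 1)
blocked = apply_action("NS_YELLOW", 1)
```

Keeping the action space this small means the network needs only two outputs, and the yellow-phase guard stops the policy from issuing unsafe switches no matter what it has learned.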
  35. Comparison of results (Optimized vs. Policy)
  37. Comparison of results (Base vs. Optimized vs. Policy). Real systems: dynamic + stochastic (exogenous inputs / system internals). Optimization: optimal fixed input parameters. Policy: optimal (or near-optimal) decisions over time.
  38. Reinforcement learning decision points: Hyperparameters, Observation Space, Action Space, Reward
  39. How are learned policies used? Trained policies can be deployed on all types of devices and equipment to complete tasks adaptively and autonomously. Edge devices could be used as controllers to deploy the learned policies.
  40. Machine Learning powered by Skymind http://www.skymind.ai/anylogic
  41. What should simulation modelers know about this new application? • The great news for simulation modelers is that their skills now have a new and exciting application! • To implement a reinforcement learning (or DRL) solution, a team of DRL expert(s) and simulation modeler(s) can collaborate. In theory, neither group needs in-depth knowledge of the other group’s tasks. • In developing simulation models that are going to be used as training environments, the stakes are higher because the human buffer is no longer there.
  42. Can simulation modelers’ jobs be replaced with AI too? At least in the near future, there is NO way to automate the process of abstracting reality into a simulation model, because it has two aspects that [current] machines are not good at: • The process of abstracting reality is an art • Simulation models are fundamentally based on uncovering causality and how something works
  43. Thank you!