[241] Engineering Used in AI Chip Development

  1. How can we build super-great AI chips? For software engineers who are new to HW. Paik June, FuriosaAI
  2. Contents ▪ Silicon Engineering ▪ Architecture Exploration ▪ HDL: describing HW computation ▪ Conclusion
  3. Silicon engineering: the foundation of computing
  4. We forget that Silicon Valley = Silicon + Valley. Silicon engineering is one of the most complex, coordinated processes that humankind has ever practiced. ▪ Enormous challenges lie ahead as design complexity explodes. ▪ Nvidia Volta GPUs pack more than 20 billion transistors. Silicon Valley's engineering culture was heavily influenced and shaped by this very disciplined silicon engineering. ▪ Jeff Dean, Sanjay Ghemawat and Urs Hölzle all came from HW companies before joining Google. ▪ Our DEVIEW keynote speaker Song also worked for DEC ☺ It proves my point.
  5. SW is eating the world. But, "People who are really serious about software should make their own hardware" – Alan Kay / Steve Jobs. ▪ There is not much distinction between HW and SW if we are serious about it. Google, Amazon, Facebook, Microsoft, Alibaba, Baidu, Apple: everyone is trying to build a strong silicon team, since it is strategically important to go vertical, customizing their architectures and controlling the entire stack. ▪ Ex: Google TPU
  6. What is our opportunity? We are riding the big wave of a global semiconductor super-cycle. ▪ Just think about cloud datacenters, autonomous cars, IoT and AR/VR: all the electronic gadgets that will be powered by semiconductors. It is simply the biggest driving engine of our economy, now and in the future. ▪ Global dominance in memory: 25% of the entire national exports. ▪ We all know that we are relatively weak in non-memory products. ▪ SSD sits in between memory and non-memory. ▪ How about AI chips?
  7. Yes we can. We have some of the most advanced semiconductor manufacturing facilities in the world. ▪ TSMC vs. Samsung. We have a new generation of engineers with great potential. ▪ Global-hit semiconductor product experience: Mobile Application Processor (AP), Solid State Drive (SSD). We also have AI application and service industries of good enough size. ▪ A good testbed before launching global products.
  8. Hell of challenges. We don't have much experience or many success stories of enterprise-level B2B solutions initiated by startups. ▪ The domestic market is too small: a weak ecosystem in terms of market size. Semiconductors are fundamentally a very tough business; it is not easy at all, even for the big players. It has always been very capital- and human-resource-intensive because ▪ It is a timing business. You have to be very fast. ▪ It requires extreme precision engineering. It must not fail.
  9. To pull off a successful design and sell to the masses, it takes a very strategic and orchestrated long-term effort. Let's go back to the fundamentals.
  10. AI chip engineering. There are many aspects of AI chip design; we will mainly focus on microarchitecture. ▪ Application ▪ Algorithm ▪ Software ▪ Microarchitecture ▪ Physical Design ▪ …
  11. Zoom into a microchip: rendering of a GDS2 file illustrating the physical structure of silicon chips.
  12. Microarchitecture = micro + architecture. Chip design companies (ex: Qualcomm, Nvidia, FuriosaAI) pass the architecture blueprint to the fab companies (ex: TSMC, Samsung, GlobalFoundries).
  13. Great architecture needs great architects. A great building serves people, enabling the best human activities in the most humane manner possible given the building material. A great microarchitecture serves the computation process, enabling the best applications in the most efficient manner possible given the silicon/power/budget. ▪ Real estate in the micro world. ▪ A great architect should know everything inside and out, and be able to implement the chip on schedule within the given budget.
  14. Microarchitect's toolkit ▪ Instruction Set Architecture ▪ VLIW, SIMD, Vector, Systolic Array ▪ Superscalar / Multithreading / Dataflow ▪ Pipelining ▪ Virtualization ▪ Prefetching/Caching ▪ IO/Memory subsystem ▪ Finite state machine ▪ …
  15. Key question: what is the winning architecture for AI computation?
  16. A more important question: how can we explore and find the best architecture, and then build it?
  17. Let's explore the architecture.
  18. Build a performance modeling simulator. This is a so-called cycle-accurate simulator, which can simulate both the behavior and the performance of the machine we are building at a very fine granularity and abstraction level, usually at the level of a clock cycle. This enforces the discipline of ▪ Concrete and precise thinking ▪ Data-driven evaluation of important design trade-offs. Architects should have strong (or at least reasonable) SW skills to build this simulator. An OOP language and the event-driven programming paradigm are a natural fit for this job; C++ is the standard choice. (A minimal sketch follows below.)
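To make the event-driven approach concrete, here is a minimal sketch of such a simulator core in C++. The class and event names (`Simulator`, `Event`, the DMA/MAC toy events) are illustrative assumptions of mine, not production FuriosaAI code; a real model would layer components, ports, and statistics on top of this skeleton.

```cpp
// Minimal sketch of an event-driven, cycle-level simulator core in C++.
#include <cstdint>
#include <functional>
#include <iostream>
#include <queue>
#include <vector>

struct Event {
    uint64_t cycle;                 // cycle at which the callback fires
    std::function<void()> action;   // what happens at that cycle
    bool operator>(const Event& o) const { return cycle > o.cycle; }
};

class Simulator {
public:
    uint64_t now() const { return now_; }
    // Schedule an action `delay` cycles in the future.
    void schedule(uint64_t delay, std::function<void()> action) {
        events_.push({now_ + delay, std::move(action)});
    }
    // Pop events in cycle order until nothing is left.
    void run() {
        while (!events_.empty()) {
            Event e = events_.top();
            events_.pop();
            now_ = e.cycle;          // advance simulated time
            e.action();
        }
    }
private:
    uint64_t now_ = 0;
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> events_;
};

int main() {
    Simulator sim;
    // Toy model: a DMA read that takes 100 cycles, followed by a 4-cycle MAC.
    sim.schedule(100, [&] {
        std::cout << "cycle " << sim.now() << ": DMA read done\n";
        sim.schedule(4, [&] {
            std::cout << "cycle " << sim.now() << ": MAC result ready\n";
        });
    });
    sim.run();   // prints events at cycles 100 and 104
    return 0;
}
```

The priority queue keyed by cycle is what makes the model "cycle accurate": every hardware action is pinned to an explicit clock cycle rather than to wall-clock order of C++ calls.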
  19. Architecture exploration takes time and experience. Korean industries have neglected this part because we didn't (or couldn't afford to) allocate enough time for defining and exploring the design space to come up with a solid architecture specification. It takes time because ▪ Workload characterization and prediction take time. ▪ Simulation needs supercomputer-scale computation. ▪ Understanding very detailed design trade-offs just takes time. In other words, cultivating intuition, refining it iteratively by methodically taking good measurements, takes time.
  20. Time schedule. Let's say it takes 1.5~2 years to build a commercial AI chip from concept to production. We need to allocate at least 6~8 months for performance modeling, which runs in parallel with the implementation: ▪ Performance Modeling / Architecting ▪ RTL Implementation ▪ Software Architecting / Implementation ▪ Verification ▪ Physical Design / Manufacturing
  21. Arch examples: quantization (suggested by Google). ▪ Aggressive operator fusion: performing as many operations as possible in a single pass can lower the cost of memory accesses and provide significant improvements in run time and power consumption. ▪ Compressed memory access: one can optimize memory bandwidth by supporting on-the-fly decompression of weights (and activations). A simple way to do that is to support lower-precision storage of weights and possibly activations. ▪ Lower-precision 4/8/16-bit arithmetic processing ▪ Per-layer selection of bit widths ▪ Per-channel quantization (a small numeric sketch follows below).
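As a small, hedged sketch of what "lower-precision storage" and "per-channel quantization" mean in practice, here is standard 8-bit affine quantization in C++. The function names and the per-channel layout are illustrative assumptions, not an API of the TPU or of FuriosaAI hardware.

```cpp
// Minimal sketch of 8-bit affine (asymmetric) quantization.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

struct QuantParams {
    float scale;        // real value represented by one integer step
    int32_t zero_point; // integer code that maps back to real 0.0
};

// Derive scale/zero-point so that [min_val, max_val] maps onto [0, 255].
QuantParams choose_params(float min_val, float max_val) {
    min_val = std::min(min_val, 0.0f);   // the range must contain 0
    max_val = std::max(max_val, 0.0f);
    float scale = (max_val - min_val) / 255.0f;
    int32_t zp = static_cast<int32_t>(std::lround(-min_val / scale));
    return {scale, zp};
}

uint8_t quantize(float x, const QuantParams& p) {
    int32_t q = p.zero_point + static_cast<int32_t>(std::lround(x / p.scale));
    if (q < 0) q = 0;
    if (q > 255) q = 255;
    return static_cast<uint8_t>(q);
}

float dequantize(uint8_t q, const QuantParams& p) {
    return (static_cast<int32_t>(q) - p.zero_point) * p.scale;
}

int main() {
    // "Per-channel" simply means each output channel gets its own params,
    // so a channel with small weights does not waste the 8-bit range.
    std::vector<std::vector<float>> channels = {{-1.0f, 0.2f, 0.9f},
                                                {-0.1f, 0.05f, 0.08f}};
    for (const auto& ch : channels) {
        auto [mn, mx] = std::minmax_element(ch.begin(), ch.end());
        QuantParams p = choose_params(*mn, *mx);
        for (float w : ch)
            std::cout << w << " -> " << int(quantize(w, p))
                      << " -> " << dequantize(quantize(w, p), p) << '\n';
    }
    return 0;
}
```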
  23. Implementation: the dirty game starts with a Hardware Description Language (HDL).
  24. Have you heard of Verilog or VHDL? ▪ HDL is notoriously hard to write in the right way. ▪ It is partly due to the syntax, but the main reason is that you need to specify every step of the computation process at a very precise level using logic gates and finite state machines. ▪ The state machine is the fundamental concept. Please read Leslie Lamport and TLA+.
  25. The best introduction to HW computation. Amazingly, SICP Ch. 5, "Computing with Register Machines", has one of the best explanations of the HW computation process.
  26. Euclid algorithm: SW implementation (a minimal C++ sketch follows below).
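The slide's code is not reproduced in this transcript; here is a minimal C++ sketch of the software view, using the same inputs as the SICP example. The point is that sequencing is implicit in the language's control flow.

```cpp
// Software view of Euclid's algorithm: the order of operations is
// handled for us by the language's loop and assignment semantics.
#include <iostream>

// Iterative gcd: keep replacing (a, b) with (b, a mod b) until b is 0.
unsigned gcd(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned r = a % b;  // remainder
        a = b;
        b = r;
    }
    return a;
}

int main() {
    std::cout << gcd(206, 40) << '\n';  // prints 2 (the SICP example inputs)
    return 0;
}
```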
  27. Euclid algorithm: HW implementation. Datapath + Controller
  28. Describing the HW datapath and controller (a cycle-level C++ sketch of this split follows below).
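Since the slide's diagram is not in this transcript, here is a hedged C++ sketch of the same GCD seen the hardware way: registers form the datapath, and an explicit finite-state-machine controller decides which register update happens on each clock edge. This is C++ standing in for what the HDL description would express; the state names are illustrative assumptions.

```cpp
// Hardware view of Euclid's algorithm: datapath registers + FSM controller.
#include <cstdint>
#include <iostream>

struct GcdMachine {
    // Datapath: two operand registers and a temporary for the remainder.
    uint32_t a = 0, b = 0, t = 0;
    // Controller: explicit states replace the implicit program control flow.
    enum class State { TEST, COMPUTE_REM, SWAP, DONE } state = State::TEST;

    // One clock cycle: the controller selects exactly one datapath operation.
    void tick() {
        switch (state) {
            case State::TEST:        state = (b == 0) ? State::DONE : State::COMPUTE_REM; break;
            case State::COMPUTE_REM: t = a % b;    state = State::SWAP; break;
            case State::SWAP:        a = b; b = t; state = State::TEST; break;
            case State::DONE:        break;        // result is held in register a
        }
    }
};

int main() {
    GcdMachine m;
    m.a = 206; m.b = 40;                           // load the input registers
    uint64_t cycles = 0;
    while (m.state != GcdMachine::State::DONE) { m.tick(); ++cycles; }
    std::cout << "gcd = " << m.a << " in " << cycles << " cycles\n";  // gcd = 2
    return 0;
}
```

In real HDL the `tick()` body becomes clocked always/process blocks, but the datapath/controller decomposition is the same.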
  29. Where is the programmability of HW?
  30. Real production HDL source code ▪ The Rocket RISC-V core source code, for example, is written in the Chisel language, which is Scala-based. HDL source code is the golden, most important part of the hardware IP and what our engineers spend most of their time on. It should be developed and maintained to the highest standard: ▪ A very strong discipline of testing: unit, random, formal, top-level, emulation, and system-level tests. It requires 100% test coverage, because once shipped, you can't change the hardware. ▪ But there are still many bugs, so observability such as performance and status registers should be baked into the hardware at every level.
  31. You have learned the major concepts. Can you describe the matrix computation in an HDL? Give it a try. (A C++ sketch of a systolic-array matrix multiply follows below.)
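As a starting point for that exercise, here is a small C++ cycle-level sketch of an output-stationary systolic array computing C = A × B, the kind of structure listed in the toolkit slide and used in TPU-like designs. The skewed input timing and PE layout are my own illustrative assumptions, not Google's or FuriosaAI's actual design.

```cpp
// Cycle-level sketch of an output-stationary systolic array: each PE(i,j)
// accumulates C[i][j]; A streams in from the left, B from the top, skewed.
#include <cstdio>
#include <vector>

int main() {
    const int M = 2, K = 3, N = 2;
    int A[M][K] = {{1, 2, 3}, {4, 5, 6}};
    int B[K][N] = {{7, 8}, {9, 10}, {11, 12}};

    std::vector<std::vector<int>> a_reg(M, std::vector<int>(N, 0));       // value held by each PE
    std::vector<std::vector<int>> b_reg(M, std::vector<int>(N, 0));
    std::vector<std::vector<long long>> acc(M, std::vector<long long>(N, 0));

    // Inputs enter skewed: row i of A is delayed by i cycles, column j of B by j.
    auto a_in = [&](int i, int t) { int k = t - i; return (k >= 0 && k < K) ? A[i][k] : 0; };
    auto b_in = [&](int j, int t) { int k = t - j; return (k >= 0 && k < K) ? B[k][j] : 0; };

    int total_cycles = M + N + K;   // enough cycles for all data to drain through
    for (int t = 0; t < total_cycles; ++t) {
        // Each PE latches the values arriving from its left and top neighbours...
        std::vector<std::vector<int>> na(M, std::vector<int>(N));
        std::vector<std::vector<int>> nb(M, std::vector<int>(N));
        for (int i = 0; i < M; ++i)
            for (int j = 0; j < N; ++j) {
                na[i][j] = (j == 0) ? a_in(i, t) : a_reg[i][j - 1];
                nb[i][j] = (i == 0) ? b_in(j, t) : b_reg[i - 1][j];
            }
        // ...then multiplies and accumulates in the same clock cycle.
        for (int i = 0; i < M; ++i)
            for (int j = 0; j < N; ++j)
                acc[i][j] += (long long)na[i][j] * nb[i][j];
        a_reg = na;   // registered values become next cycle's neighbour inputs
        b_reg = nb;
    }

    // acc now holds C = A * B, i.e. {{58, 64}, {139, 154}} for this input.
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) std::printf("%lld ", acc[i][j]);
        std::printf("\n");
    }
    return 0;
}
```

Translating the per-cycle latch-then-accumulate step into clocked HDL processes is exactly the exercise the slide proposes.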
  32. Example of AI chips: Google TPU
  33. Example of AI chips: Furiosa Madrun
  34. HDL to physical reality. It's the physical compiler (= physical + compiler) that does this job. Caution: it's a very capital-intensive, expensive translation.
  35. Let's wrap up here. ▪ We mainly focused on the microarchitecture and HDL aspects of AI chip engineering. ▪ AI-chip-focused design is a true interplay and co-design of Algorithm + SW + HW. ▪ SW and algorithms might matter even more; they are also really exciting technology. We have SW and algorithm teams as big as the HW team. ▪ Hope we can discuss this at the next DEVIEW event, after we have our chip out next year. ▪ Thank you! Good luck!
  36. 36. Q & A
  37. Please leave your questions on Slido: sli.do #deview TRACK 4
