False sharing: a performance trap hidden in multi-core systems

A brief introduction to false sharing, a performance trap hidden in multi-core systems.

Published in: Engineering

  1. 1. False Sharing: a performance trap hidden in multi-core systems G7
  2. 2. Agenda ● What is false sharing ● How to avoid it ● How to use it to improve performance
  3. 3. What is false sharing?
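  4. 4. Which one is faster? Two benchmarks that differ only in GOMAXPROCS (1 vs. 2):

         package falsesharing

         import (
             "runtime"
             "sync"
             "sync/atomic"
             "testing"
         )

         // MyTest holds two independent counters that sit next to each other in memory.
         type MyTest struct {
             param1 uint64
             param2 uint64
         }

         const addTimes = 100000000

         // Inc atomically increments *num addTimes times, then signals wg.
         func Inc(num *uint64, wg *sync.WaitGroup) {
             defer wg.Done()
             for i := 0; i < addTimes; i++ {
                 atomic.AddUint64(num, 1)
             }
         }

         // runInc increments both counters concurrently with GOMAXPROCS set to procs.
         func runInc(b *testing.B, procs int) {
             // GOMAXPROCS returns the previous setting; restore it when done.
             defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(procs))
             for n := 0; n < b.N; n++ {
                 myTest := &MyTest{}
                 var wg sync.WaitGroup
                 wg.Add(2)
                 go Inc(&myTest.param1, &wg)
                 go Inc(&myTest.param2, &wg)
                 wg.Wait()
             }
         }

         // Single core: both goroutines take turns on one processor.
         func BenchmarkTestProcessNum1(b *testing.B) { runInc(b, 1) }

         // Two cores: the goroutines can run in parallel.
         func BenchmarkTestProcessNum2(b *testing.B) { runInc(b, 2) }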
  5. 5. Trace result: the two-core run seems better
  6. 6. Which one is faster? (Same two benchmarks as slide 4.)
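  7. 7. Benchmark result: the single-core run is about 180% faster than the dual-core run
  8. 8. Two independent jobs, yet the single-core run is faster than the dual-core run. Why?
  9. 9. Two independent jobs, yet the single-core run is faster than the dual-core run. Why? False sharing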
  10. 10. CPU Cache
  11. 11. CPU Cache reference : https://chrisadkin.io/2015/01/20/large-memory-pages-how-they-work-and-the-logcache_access-spinlock/
  12. 12. CPU Cache
  13. 13. CPU Cache
  14. 14. CPU Cache
  15. 15. CPU Cache
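  16. 16. Two independent jobs, yet the single-core run is faster than the dual-core run. Why? False sharing: param1 and param2 sit on the same cache line, so each core's write invalidates the other core's cached copy and forces the CPU to fetch the data from slower memory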
  17. 17. How to avoid: cache padding
  18. 18. Cache padding
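  19. 19. Cache padding

          // Before: param1 and param2 are adjacent.
          type MyTest struct {
              param1 uint64
              param2 uint64
          }

          // After: 64 bytes of padding follow each counter.
          type MyTest struct {
              param1 uint64
              _p1    [8]int64
              param2 uint64
              _p2    [8]int64
          }

      P.S. Mainstream CPUs currently use 64-byte cache lines.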
  20. 20. Benchmark result after padding
  21. 21. How to use it to improve performance
  22. 22. Lock-free ring buffer
  23. 23. Lock-free ring buffer
  24. 24. Lock-free ring buffer (outline only; the method bodies are not shown on the slide)

          type RingBuffer struct {
              head    uint64
              tail    uint64
              mask    uint64
              ringbuf []*entity
          }

          func (rb *RingBuffer) Put(item interface{}) error {
              // Get the latest head position.
              // Store the item at that position.
          }

          func (rb *RingBuffer) Get() (interface{}, error) {
              // Get the latest tail position.
              // Take the item out of that position.
          }
  25. 25. Benchmark: channel, ring buffer
  26. 26. Who uses lock-free ring buffers ● LMAX Disruptor ● So You Wanna Go Fast?
  27. 27. example code: https://github.com/genchilu/falseSharingPresentation
  28. 28. QA

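To reproduce the before/after comparison from slides 7 and 20 outside `go test`, a small timing harness along the following lines may be enough. It is a rough sketch and not the deck's benchmark: `hammer`, `unpadded`, `padded`, and the reduced `iterations` count are names and values chosen here for illustration, and the 64-byte padding is the same assumption as on slide 19. The measured times depend heavily on the machine, so no expected output is given.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// iterations is deliberately smaller than the slides' addTimes
// so that the demo finishes quickly.
const iterations = 5000000

// unpadded keeps both counters next to each other.
type unpadded struct {
	a uint64
	b uint64
}

// padded separates the counters by 64 bytes (assumed cache-line size).
type padded struct {
	a uint64
	_ [8]uint64
	b uint64
	_ [8]uint64
}

// hammer increments *a and *b from two goroutines, one counter each,
// and returns how long both took to finish.
func hammer(a, b *uint64) time.Duration {
	var wg sync.WaitGroup
	start := time.Now()
	for _, p := range []*uint64{a, b} {
		wg.Add(1)
		go func(num *uint64) {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				atomic.AddUint64(num, 1)
			}
		}(p)
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	u := &unpadded{}
	p := &padded{}
	fmt.Println("unpadded:", hammer(&u.a, &u.b))
	fmt.Println("padded:  ", hammer(&p.a, &p.b))
}
```

On a multi-core machine the padded run would be expected to finish sooner, but a single wall-clock measurement is noisy. The `testing.B` benchmarks in the deck's repository are the better tool for real numbers.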
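Slide 24 only outlines `Put` and `Get`. One way to fill in the bodies is sketched below, under assumptions that the slides do not state: exactly one producer goroutine and one consumer goroutine, a capacity rounded up to a power of two, and 64-byte cache lines. `NewRingBuffer`, `ErrFull`, and `ErrEmpty` are hypothetical names, and the slots hold `interface{}` values because the deck's `entity` type is not shown. The implementation in the linked repository may differ.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"sync/atomic"
)

var (
	ErrFull  = errors.New("ring buffer is full")
	ErrEmpty = errors.New("ring buffer is empty")
)

// RingBuffer is a single-producer/single-consumer queue (an assumption of
// this sketch). head and tail are padded so that the producer and the
// consumer write to different cache lines (64 bytes assumed).
type RingBuffer struct {
	head    uint64 // next position to write; advanced only by the producer
	_       [56]byte
	tail    uint64 // next position to read; advanced only by the consumer
	_       [56]byte
	mask    uint64
	ringbuf []interface{}
}

// NewRingBuffer rounds size up to a power of two so that a position can be
// mapped to a slot with a bit mask.
func NewRingBuffer(size uint64) *RingBuffer {
	n := uint64(1)
	for n < size {
		n <<= 1
	}
	return &RingBuffer{mask: n - 1, ringbuf: make([]interface{}, n)}
}

// Put stores item at the head position. Producer goroutine only.
func (rb *RingBuffer) Put(item interface{}) error {
	head := atomic.LoadUint64(&rb.head)
	tail := atomic.LoadUint64(&rb.tail)
	if head-tail == uint64(len(rb.ringbuf)) {
		return ErrFull
	}
	rb.ringbuf[head&rb.mask] = item
	atomic.StoreUint64(&rb.head, head+1) // publish the slot to the consumer
	return nil
}

// Get removes the item at the tail position. Consumer goroutine only.
func (rb *RingBuffer) Get() (interface{}, error) {
	tail := atomic.LoadUint64(&rb.tail)
	head := atomic.LoadUint64(&rb.head)
	if tail == head {
		return nil, ErrEmpty
	}
	item := rb.ringbuf[tail&rb.mask]
	rb.ringbuf[tail&rb.mask] = nil       // drop the reference for the garbage collector
	atomic.StoreUint64(&rb.tail, tail+1) // hand the slot back to the producer
	return item, nil
}

func main() {
	rb := NewRingBuffer(4)
	done := make(chan struct{})
	go func() { // producer
		defer close(done)
		for i := 0; i < 10; i++ {
			for rb.Put(i) != nil {
				runtime.Gosched() // buffer full: let the consumer run
			}
		}
	}()
	sum := 0
	for received := 0; received < 10; {
		v, err := rb.Get()
		if err != nil {
			runtime.Gosched() // buffer empty: let the producer run
			continue
		}
		sum += v.(int)
		received++
	}
	<-done
	fmt.Println("sum:", sum) // 0+1+...+9 = 45
}
```

Because each index has exactly one writer, plain atomic loads and stores are sufficient here. Supporting several producers or consumers would need compare-and-swap on the positions plus per-slot bookkeeping, which this sketch leaves out.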