
Non-blocking IO to tame distributed systems: How and why ChatWork uses asynchbase

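title: Non-blocking IO to tame distributed systems: How and why ChatWork uses asynchbase (ノンブロッキングIOで分散システムを手懐ける ーチャットワークでのasynchbaseの利用)
event: LINE Developer Meetup in Tokyo #28: JVM Asynchronous Programming (JVM非同期プログラミング)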
https://line.connpass.com/event/78912/

Published in: Engineering

  1. Non-blocking IO to tame distributed systems: How and why ChatWork uses asynchbase
     (ノンブロッキングIOで分散システムを手懐ける ーチャットワークでのasynchbaseの利用)
     Yusuke Yasuda / 安田裕介 (@TanUkkii007)
  2. Agenda
     ● How we used a native HBase client
     ● Problems we faced with a native HBase client
     ● Migration to asynchbase
     ● Blocking IO vs Non-blocking IO: performance test results
  3. About me
     ● Yusuke Yasuda / 安田裕介
     ● @TanUkkii007
     ● Working for Chatwork for 2 years
     ● Scala developer
  4. About ChatWork
  5. How we used a native HBase client
  6. Messaging system architecture overview
     You can find more information about our architecture at Kafka Summit 2017.
     [Diagram label: Today’s topic]
  7. HBase
     ● Key-value storage to enable random access on HDFS
     ● HBase is used as a query-side storage in our system
       ○ version: 1.2.0
     ● Provides a streaming API called “Scan” to query a sequence of rows iteratively
     ● Scan is the most used HBase API in ChatWork
  8. Synchronous scan with native HBase client: a bad example

         import org.apache.hadoop.hbase.TableName
         import org.apache.hadoop.hbase.client.{Connection, Result, ResultScanner, Scan, Table}
         import scala.annotation.tailrec

         def scanHBase(connection: Connection, tableName: TableName, scan: Scan): Vector[Result] = {
           val table: Table = connection.getTable(tableName)
           val scanner: ResultScanner = table.getScanner(scan)

           // Blocks the calling thread on every next() until the scan is exhausted.
           @tailrec
           def loop(results: Vector[Result]): Vector[Result] = {
             val result = scanner.next()
             if (result == null) results
             else loop(results :+ result)
           }

           try {
             loop(Vector.empty)
           } finally {
             scanner.close() // close the scanner before the table it came from
             table.close()
           }
         }

     Cons:
     ● a thread is not released until the whole scan is finished
     ● throughput is bounded by the number of threads in a pool
     ● long-running blocking calls cause serious performance problems in event-loop-style applications such as Akka HTTP
     Code: Gist
  9. Throughput and latency trade-off in asynchronous and synchronous settings
     [Figure annotations] asynchronous: throughput=8, latency=2; synchronous: throughput=4, latency=1
     Asynchronous setting is more flexible and fair!

                               synchronous                 asynchronous
     Optimized for             latency                     throughput
     Under high workload       throughput is bounded       throughput increases while sacrificing latency
     Under low workload        both have equal latency and throughput
     Requests for many rows    are executed exclusively    are evenly scheduled as small requests
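
The "requests for many rows" row of the comparison above can be made concrete with a toy model. This is a sketch under simplifying assumptions that are not in the slides: a single worker, one tick per row fetched, and round-robin interleaving. It does not reproduce the throughput and latency figures from the slide's diagram; `SchedulingToy` and its method names are made up for illustration.

```scala
object SchedulingToy {
  /** Completion tick of each request when every request runs to the end
    * before the next one starts (the synchronous, exclusive case).
    * `rows(i)` is the number of rows request i needs; one row costs one tick. */
  def exclusive(rows: Vector[Int]): Vector[Int] =
    rows.scanLeft(0)(_ + _).tail

  /** Completion tick of each request when the worker fetches one row per
    * request in turn (the asynchronous, evenly scheduled case). */
  def interleaved(rows: Vector[Int]): Vector[Int] = {
    val remaining = rows.toArray
    val done = Array.fill(rows.length)(0)
    var tick = 0
    while (remaining.exists(_ > 0)) {
      for (i <- remaining.indices if remaining(i) > 0) {
        tick += 1
        remaining(i) -= 1
        if (remaining(i) == 0) done(i) = tick
      }
    }
    done.toVector
  }
}
```

For a 6-row request followed by a 2-row request, `exclusive` makes the small request wait behind the large one, while `interleaved` lets the small request finish early at the cost of delaying the large one. The total work is finished at the same tick in both cases.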
  10. Asynchronous streaming of Scan operation with Akka Stream

          import akka.stream.{Attributes, Outlet, SourceShape}
          import akka.stream.stage.{GraphStage, GraphStageLogic, OutHandler}
          import org.apache.hadoop.hbase.TableName
          import org.apache.hadoop.hbase.client.{Connection, Result, ResultScanner, Scan, Table}

          class HBaseScanStage(connection: Connection, tableName: TableName, scan: Scan)
              extends GraphStage[SourceShape[Result]] {

            val out: Outlet[Result] = Outlet("HBaseScanSource")

            override def shape: SourceShape[Result] = SourceShape(out)

            override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
              new GraphStageLogic(shape) {
                var table: Table = _
                var scanner: ResultScanner = _

                override def preStart(): Unit = {
                  table = connection.getTable(tableName)
                  scanner = table.getScanner(scan)
                }

                setHandler(out, new OutHandler {
                  // One (still blocking) next() call per downstream pull.
                  override def onPull(): Unit = {
                    val next = scanner.next()
                    if (next == null) complete(out)
                    else push(out, next)
                  }
                })

                override def postStop(): Unit = {
                  if (scanner != null) scanner.close()
                  if (table != null) table.close()
                  super.postStop()
                }
              }
          }

      ● ResultScanner#next() is called passively inside a callback, in a thread-safe way
      ● the thread is released immediately after a single ResultScanner#next() call
      ● Results are pushed to downstream asynchronously
      ● when and how many times next() is called is determined by downstream
      Code: Gist
  11. Problems we faced with a native HBase client
  12. Just a single unresponsive HBase region server caused whole-system degradation
      ● The call queue size of the hslave-5 region server spiked.
      ● All Message Read API servers suffered increased latency and reduced throughput.
  13. Distributed systems are supposed to fail only partially, so why didn't ours?
      ● Native HBase client uses blocking IO
      ● Requests to an unresponsive HBase region server block a thread until timeout
      ● All threads in the thread pool were consumed, so Message Read API servers were not able to respond
      [Charts: HBase IPC queue size; thread pool status in Read API servers (#active threads, upper limit of pool size)]
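
The thread exhaustion described on slide 13 can be reproduced in miniature with only the Scala standard library. This is a sketch, not the production setup: the `CountDownLatch` stands in for an unresponsive region server, the pool size of 2 is arbitrary, and `PoolExhaustionDemo` is a made-up name.

```scala
import java.util.concurrent.{CountDownLatch, Executors}
import scala.concurrent.{Await, ExecutionContext, Future, TimeoutException}
import scala.concurrent.duration._

object PoolExhaustionDemo {
  /** Returns (starvedWhileBlocked, resultAfterRelease). */
  def run(poolSize: Int = 2): (Boolean, String) = {
    val pool = Executors.newFixedThreadPool(poolSize)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    // Stand-in for the unresponsive region server: await() blocks like blocking IO.
    val serverResponds = new CountDownLatch(1)

    try {
      // Every pool thread gets stuck in a blocking call to the slow server.
      (1 to poolSize).foreach(_ => Future { serverResponds.await() })

      // A request that never touches the slow server still cannot run:
      // there is no free thread left to execute it.
      val healthy = Future("fast")
      val starved =
        try { Await.ready(healthy, 300.millis); false }
        catch { case _: TimeoutException => true }

      serverResponds.countDown() // the server recovers, or the client times out
      val result = Await.result(healthy, 5.seconds)
      (starved, result)
    } finally pool.shutdown()
  }
}
```

The healthy request only completes after the blocked threads are released, which is the same shape of failure as the Message Read API servers becoming unresponsive because of one region server.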
  14. What we learned
      Asynchronous streaming is not enough. Non-blocking IO matters.
  15. Migration to asynchbase
  16. asynchbase: non-blocking HBase client based on Netty
      ● https://github.com/OpenTSDB/asynchbase
      ● Netty 3.9
      ● Supports reverse scan since 1.8
      ● Asynchronous interface by Deferred
        ○ https://github.com/OpenTSDB/async
        ○ Observer pattern that provides callback interfaces
      ● Thread safety provided by Deferred
        ○ Event loop executes volatile checks at each step
        ○ Safe to mutate states inside callbacks
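
A callback-style result such as the one slide 16 describes can be adapted to a Scala `Future` with a `Promise`, without blocking a thread. The sketch below deliberately does not use the real asynchbase `Deferred` class: `ToyDeferred` is a hypothetical stand-in with a simplified `addCallbacks`, so that the example runs with the standard library alone.

```scala
import scala.concurrent.{Future, Promise}

object DeferredBridge {
  /** Hypothetical stand-in for a callback-style result; NOT the asynchbase Deferred API. */
  final class ToyDeferred[T] {
    private var outcome: Option[Either[Throwable, T]] = None
    private var listeners: List[Either[Throwable, T] => Unit] = Nil

    def addCallbacks(onSuccess: T => Unit, onError: Throwable => Unit): Unit = synchronized {
      val listener: Either[Throwable, T] => Unit = _.fold(onError, onSuccess)
      outcome match {
        case Some(o) => listener(o)           // already completed: fire immediately
        case None    => listeners ::= listener // otherwise remember the observer
      }
    }

    def complete(value: T): Unit = finish(Right(value))
    def fail(error: Throwable): Unit = finish(Left(error))

    private def finish(o: Either[Throwable, T]): Unit = synchronized {
      if (outcome.isEmpty) {
        outcome = Some(o)
        listeners.reverse.foreach(_(o))
        listeners = Nil
      }
    }
  }

  /** Adapts the callback pair to a Future; no thread waits for the result. */
  def toFuture[T](deferred: ToyDeferred[T]): Future[T] = {
    val promise = Promise[T]()
    deferred.addCallbacks(v => promise.success(v), e => promise.failure(e))
    promise.future
  }
}
```

With the real library the same idea would apply to its callback registration, but the exact types and conversions there should be checked against the asynchbase and async documentation.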
  17. Introduce streaming interface to asynchbase with Akka Stream

          import java.util
          import akka.stream.{Attributes, Outlet, SourceShape}
          import akka.stream.stage.{GraphStage, GraphStageLogic, OutHandler}
          import org.hbase.async.{KeyValue, Scanner}
          import scala.collection.JavaConverters._

          // HBaseCallbackConversion is not shown on the slide; it presumably converts
          // Scala functions into asynchbase callbacks.
          class HBaseAsyncScanStage(scanner: Scanner)
              extends GraphStage[SourceShape[util.ArrayList[KeyValue]]]
              with HBaseCallbackConversion {

            val out: Outlet[util.ArrayList[KeyValue]] = Outlet("HBaseAsyncScanStage")

            override def shape: SourceShape[util.ArrayList[KeyValue]] = SourceShape(out)

            override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
              new GraphStageLogic(shape) {
                // Rows already fetched from HBase but not yet pushed downstream.
                var buffer: List[util.ArrayList[KeyValue]] = List.empty

                setHandler(out, new OutHandler {
                  override def onPull(): Unit = {
                    if (buffer.isEmpty) {
                      // Non-blocking: the result arrives later through the callbacks.
                      val deferred = scanner.nextRows()
                      deferred.addCallbacks(
                        (results: util.ArrayList[util.ArrayList[KeyValue]]) => callback.invoke(Option(results)),
                        (e: Throwable) => errorback.invoke(e)
                      )
                    } else {
                      val (element, tailBuffer) = (buffer.head, buffer.tail)
                      buffer = tailBuffer
                      push(out, element)
                    }
                  }
                })

                override def postStop(): Unit = {
                  scanner.close()
                  super.postStop()
                }

                private val callback = getAsyncCallback[Option[util.ArrayList[util.ArrayList[KeyValue]]]] {
                  case Some(results) if !results.isEmpty =>
                    val element = results.remove(0)
                    buffer = results.asScala.toList
                    push(out, element)
                  case Some(results) if results.isEmpty => complete(out)
                  case None => complete(out)
                }

                private val errorback = getAsyncCallback[Throwable] { error =>
                  fail(out, error)
                }
              }
          }

      ※ This code contains a serious issue. You must handle downstream cancellation properly. Otherwise a Close request may be fired while a NextRows request is still running, which causes an HBase protocol violation. See how to solve this problem on the Gist.
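
One possible way to avoid firing Close while a NextRows request is in flight is to defer the close until the pending request has completed. The sketch below shows only that bookkeeping. It is an assumption about how the problem could be approached, not the solution from the author's Gist; `Guard` and its method names are hypothetical, and `sendClose` stands in for the scanner's close request.

```scala
object ScanCloseGuard {
  /** Hypothetical guard that orders close after any in-flight request.
    * Not thread-safe on its own: it assumes all calls are serialized,
    * as they would be when made from GraphStage handlers and async callbacks. */
  final class Guard(sendClose: () => Unit) {
    private var inFlight = false
    private var closeRequested = false
    private var closed = false

    /** Call right before issuing a nextRows request. Returns false if closing. */
    def beginRequest(): Boolean =
      if (closeRequested || closed) false else { inFlight = true; true }

    /** Call from the nextRows callback, on success or failure. */
    def endRequest(): Unit = {
      inFlight = false
      if (closeRequested) doClose()
    }

    /** Call on downstream cancellation or when the stage stops. */
    def requestClose(): Unit = {
      closeRequested = true
      if (!inFlight) doClose()
    }

    private def doClose(): Unit = if (!closed) { closed = true; sendClose() }
  }
}
```

In a stage, `beginRequest` would wrap the `nextRows()` call, `endRequest` would run inside the async callbacks, and `requestClose` would replace the direct `scanner.close()`. Keeping the stage alive until the deferred close has actually been sent is a separate concern that this sketch does not cover.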
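  18. Customizing Scan behavior with downstream pipelines

          import akka.stream.ThrottleMode
          import org.hbase.async.NotServingRegionException
          import scala.concurrent.duration._

          // HBaseAsyncScanSource is not defined on the slides; it is presumably a Source
          // built from HBaseAsyncScanStage.
          HBaseAsyncScanSource(scanner).take(1000)

          HBaseAsyncScanSource(scanner)
            .throttle(elements = 100, per = 1.second, maximumBurst = 100, mode = ThrottleMode.Shaping)

          HBaseAsyncScanSource(scanner).completionTimeout(5.seconds)

          HBaseAsyncScanSource(scanner).recoverWithRetries(10, {
            case _: NotServingRegionException => HBaseAsyncScanSource(scanner)
          })

      In the same order as the snippets above:
      ● early termination of scan when the row-count limit is reached
      ● scan iteration rate limiting
      ● early termination of scan by timeout
      ● retrying if a region server is not serving
      Code: Gist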
  19. Switching from synchronous API to asynchronous API
      ● Switching from a synchronous API to an asynchronous API usually requires rewriting whole APIs
      ● Abstracting database drivers is difficult
      ● Starting with an asynchronous interface like Future[T] is a good practice
      ● Another option for an abstract interface is streams
      ● Streams can behave like collection-like types such as Future, Option, List, and Try, but do not require monad transformers to integrate with each other
      ● A stream interface specification like reactive-streams (JEP266) gives a way to connect various asynchronous libraries
      ● Akka Stream is one of the implementations of reactive-streams
  20. Database access abstraction with streams
      ● Transport Interface Layer
        ○ interface: Directive[T], Future[T]
        ○ engine: Akka HTTP
      ● Stream Adaptor
        ○ interface: Source[Out, M], Flow[In, Out, M], Sink[In, M]
        ○ engine: Akka Stream
      ● Database Interface Layer
        ○ interface: implementation specific
        ○ engine: database driver
      ● UseCase Layer
        ○ interface: Source[Out, M], Flow[In, Out, M], Sink[In, M]
        ○ engine: Akka Stream
      ● Domain Layer
        ○ interface: Scala collections and case classes
        ○ engine: Scala standard library
      [Diagram labels: native HBase client, asynchbase; HBaseScanStage, HBaseAsyncScanStage, ReadMessageDAS]
  21. Database access abstraction with streams (same layer diagram as slide 20)
      ● Stream abstraction mitigates the impact of changes to underlying implementations
      ● Database access implementation can be switched by factory functions
      ● No change was required inside the UseCase and Domain layers
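
The idea of switching the database access implementation through a factory function can be sketched as follows. Everything here is illustrative: the trait shape, `Message`, `readMessageDAS`, and `latestBodies` are hypothetical names, neither implementation talks to HBase, and a `Future` of a `Vector` is used in place of Akka Stream's `Source` to keep the example free of dependencies.

```scala
import scala.concurrent.{ExecutionContext, Future}

object DasFactoryDemo {
  final case class Message(id: Long, body: String)

  /** Hypothetical database-access interface that use-case code depends on. */
  trait ReadMessageDAS {
    def read(roomId: Long, limit: Int): Future[Vector[Message]]
  }

  private val rows = Vector(Message(1, "hello"), Message(2, "world"), Message(3, "!"))

  // Stand-in for a blocking driver: work is shipped to an execution context.
  private final class BlockingDAS(ec: ExecutionContext) extends ReadMessageDAS {
    def read(roomId: Long, limit: Int): Future[Vector[Message]] =
      Future(rows.take(limit))(ec)
  }

  // Stand-in for a non-blocking driver: the result is delivered without a worker thread.
  private final class NonBlockingDAS extends ReadMessageDAS {
    def read(roomId: Long, limit: Int): Future[Vector[Message]] =
      Future.successful(rows.take(limit))
  }

  /** Factory function: the only place that knows which driver is in use. */
  def readMessageDAS(useAsyncDriver: Boolean)(implicit ec: ExecutionContext): ReadMessageDAS =
    if (useAsyncDriver) new NonBlockingDAS else new BlockingDAS(ec)

  /** Use-case code sees only the interface, so a driver switch does not touch it. */
  def latestBodies(das: ReadMessageDAS, roomId: Long)(implicit ec: ExecutionContext): Future[Vector[String]] =
    das.read(roomId, limit = 2).map(_.map(_.body))
}
```

Because `latestBodies` is written against the trait, flipping `useAsyncDriver` changes the driver without changing the use-case or domain code, which mirrors the third bullet on slide 21.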
  22. Blocking IO vs Non-blocking IO: performance test results
      Fortunately, we have not faced HBase issues in production since the asynchbase migration. The following slides show the results of performance tests that were conducted before the asynchbase deployment.
  23. Blocking IO vs Non-blocking IO: performance test settings
      ● Single Message Read API server
        ○ JVM heap size=4GiB
        ○ CPU request=3.5
        ○ CPU limit=4
      ● Production workload pattern simulated with the Gatling stress tool
      ● 1340 requests/second
      ● Mainly invokes HBase Scan, but there are Get and batch Get as well
      Both implementations, with asynchbase and with the native HBase client, are tested under the same conditions.
  24. Blocking IO vs Non-blocking IO: throughput
      [Charts: Message Read API server with native HBase client; Message Read API server with asynchbase]
      throughput: 1000 → 1300
  25. Blocking IO vs Non-blocking IO: latency
      [Charts: Message Read API server with native HBase client; Message Read API server with asynchbase]
      ※ Note that the scales of the y-axes are different.
      99th percentile: 2000ms → 300ms
      95th percentile: 1000ms → 200ms
  26. Blocking IO vs Non-blocking IO: thread pool usage
      [Charts: Message Read API server with native HBase client; Message Read API server with asynchbase]
      Note that hbase-dispatcher is an application thread pool, not the Netty IO worker thread pool.
      pool size: 600 → 8
      active threads: 80 → 2
  27. Blocking IO vs Non-blocking IO: JVM heap usage
      [Charts: Message Read API server with native HBase client; Message Read API server with asynchbase]
      heap usage: 2.6GiB → 1.8GiB
  28. Blocking IO vs Non-blocking IO: HBase scan metrics
      [Charts: Message Read API server with native HBase client; Message Read API server with asynchbase. Both plot the average of sum of millis sec between nexts.]
  29. HBase scan metrics may come to asynchbase
      https://github.com/OpenTSDB/asynchbase/pull/184
  30. Room for improvement: timeouts and rate limiting
      ● Proper timeouts and rate limiting are necessary for asynchronous and non-blocking systems
        ○ Without reins, an asynchronous system increases its throughput until it consumes all resources
      ● Timeouts
        ○ completionTimeout: timeout based on total processing time
          ■ Not ideal for Scan, which has a broad distribution of processing time
        ○ idleTimeout: timeout based on processing time between two data elements
          ■ A single iteration of Scan has a sharp distribution of processing time. Probably a better strategy.
      ● Rate limiting
        ○ Under high workload, the first bottleneck is the throughput of HBase storage
          ■ How to implement storage-aware rate limiting?
          ■ Tuning application resources may be necessary
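
The difference between the two timeout strategies can be shown with a small model that works on batch arrival times instead of a live stream. It is a simplification with stated assumptions: the stream is treated as complete when its last batch arrives, times are milliseconds since the scan started, and `TimeoutModel` and its method names are made up for this sketch.

```scala
object TimeoutModel {
  /** `arrivalsMs`: ascending arrival times of each batch, in ms since the scan started.
    * A completion timeout looks only at the total duration. */
  def violatesCompletionTimeout(arrivalsMs: Seq[Long], limitMs: Long): Boolean =
    arrivalsMs.lastOption.exists(_ > limitMs)

  /** An idle timeout looks at the gap between consecutive batches
    * (including the gap from the start to the first batch). */
  def violatesIdleTimeout(arrivalsMs: Seq[Long], limitMs: Long): Boolean = {
    val gaps = (0L +: arrivalsMs).zip(arrivalsMs).map { case (prev, next) => next - prev }
    gaps.exists(_ > limitMs)
  }
}
```

A long but healthy scan that delivers a batch every 100 ms for 6 seconds trips a 5-second completion timeout but not a 1-second idle timeout. A scan that stalls for 4 seconds between two batches and then finishes at 4.2 seconds does the opposite, which is why the per-iteration strategy fits Scan's broad distribution of total processing time.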
  31. Conclusion
      ● Blocking IO spoils the benefits of distributed databases
        ○ partial failure of the database exhausts application threads and makes the application unresponsive
      ● Non-blocking IO is resilient to partial failure
      ● Asynchronous streams are great as a flexible execution model and an abstract interface
      ● Asynchronous streams with non-blocking IO outperform blocking ones
      ● Our journey toward a resilient system continues
