PayPal Risk Platform High Performance Practice

PayPal Risk's async framework, targeting high performance and low latency

  1. PayPal Risk Platform High Performance Practice, by Ling ZhiJun (Brian Ling)
  2. 2017 Software Architecture Summit. Agenda: PayPal & PayPal Risk (Platform) • Risk DAL Service Challenge • Async Solution • Async Future Plan
  3. Agenda: PayPal & PayPal Risk (Platform)
  4. PayPal Overview
      • TPV/day: ~1 billion
      • Payments/year: 6.1 billion
      • Computations/day: ~20 billion
      • Active customer accounts: 210M
      • Data: 105 petabytes
      • Queries/day: 250 billion
      • Loss rate: 0.32%
      • PayPal operates one of the largest online payment platforms in the world.
      • The power of our platform: our technology transformation enables us to process payments at tremendous scale (200+ countries and 25 currencies supported), accelerate the innovation of new products, and engage world-class developers and technologists.
  5. PayPal Risk KPI (payment transactions): TPV $354 billion; payments/year 6.1 billion; payments/second at peak 1.8B; active customer accounts 210M; petabytes of data 73; database calls/quarter 4.5T; 0.32% loss rate
  6. Requirements for the Risk Platform: accuracy vs. latency; low latency + hardware investment vs. large throughput
  7. PayPal Risk Platform Architecture (diagram with online and offline halves and separate read and write paths)
      • Components: Gateway Service; Decision Service; Model + Variable Computation Service; Variable Rollup Service; DAL Service; Logging System/ETL; Offline Variable Simulation Platform; Model Training Platform; Offline Variable Aggregation Service
      • Data: real-time compute data; offline-generated data; simulated real-time data
  8. PayPal Risk Platform Architecture (the slide 7 diagram, with the Variable Aggregation Service in place of the Variable Rollup Service)
  9. Agenda: Risk DAL Service Challenge
  10. DAL Service: the ultimate questions for a JVM-based, high-performance, high-ATB DAL service
      • <100 ms P99.99 latency?
      • 20K-30K peak TPS on a single instance?
      • 99.99% availability-to-business (ATB)?
  11. DAL Service Technical Challenges
      • Budget cost: hardware investment, which must keep pace with traffic, is increasing exponentially
      • Performance issues: P99 latency differs significantly from average latency; too many latency spikes under traffic; storage-cluster unavailability hurts latency
      • Customer requirements: onboarding new use cases; access behavior that differs per colo; flexibility for fast-evolving use cases; replication; traffic strategy
      • Operational cost: too many clients to maintain, across multiple versions; releases too frequent because they are tied to business cases; switching over to the standby storage cluster
  12. Agenda: Async Solution
  13. Inherent Benefits of Async
      • More efficient thread scheduling
      • Non-blocking calls
      • Event-driven callbacks
      • Fewer context switches
      • Fault isolation
  14. Reactor Pattern Threading Model
  15. Async DAL Service KPI Comparison
      • Low latency: ~10-35% reduction (average/P99)
      • [Chart: E2E client-service-Aerospike benchmark, 50% read / 50% write, latency vs. throughput on a 4-core VM. X-axis: throughput (requests/s); y-axis: latency (µs). Series: average, P99, P99.9, and P99.99 latency for reads and updates.]
  16. Async DAL Service KPI Comparison (cont.)
      • High throughput: 3-10x increase (single-instance comparison)
  17. Async DAL Service KPI Comparison (cont.)
      • Lower CPU usage: 50% reduction
      • 66%+ reduction in context switches and system interrupts
  18. Async DAL Service KPI Comparison (cont.)
      • Fewer threads: 90% reduction in thread-pool threads
      • [Chart: thread count, async vs. sync] Server RPC threads: 9 vs. 200; operation threads: 0 vs. 14; replication threads: 0 vs. 40; management threads: 2 vs. 2
  19. Async DAL Service KPI Comparison (cont.)
      • Memory friendly: 20% reduction in memory allocation
      • 100+ MB young generation after young GC
      • 130+ MB pooled off-heap memory
      • [Charts: GC time / total time and GC count, sync vs. async]
  20. We Have ONE Async Dream
      • Shift the application's profile from CPU-bound to IO-bound
      • Traffic throughput grows (non-)linearly with CPU usage
      • While guaranteeing low latency, handle 20-30K TPS with a 500 MB JVM heap (after young GC)
      • Cloud-friendly application
      • Less hardware investment
      • Low operational cost
      • Easy capacity estimation
  21. High Performance Design
      • End-to-end async: a non-blocking pipeline of async RPC + async data access
      • Less is more: a shared thread pool over separate thread pools; inline execution over execution across multiple thread pools
      • Autonomous memory management: use off-heap memory as much as possible (inbound/outbound and [de]serialization); release inbound memory at an early stage (submitRequest)
  22. High Performance Good Practice
      • KPI sign-off: performance testing is on the critical path of every commit; continuous performance testing for each commit is mandatory
      • Inbound/outbound management: batch consolidation; order management; timeout management; retries happen only on the client side
      • Programming habits: fail fast rather than cascading thrown exceptions; logging and monitoring matter; thread-safe write operations in the control plane, exception-safe read operations in the data plane
  23. Async High Level Architecture (Real Time Data Service)
      • Client side (biz logic): Data Set 1 Client through Data Set N Client; Data Set Schema; Data Access API, Metadata API, Generic Configuration API, KV Store API; HTTP(s) RPC client
      • Server side (generic logic, schema-less read): HTTP(s) RPC server; KV Store API
      • KV store (store/cache): metadata namespace, data set namespace, configuration namespace
      • Access modes: direct access, service access
  24. Async DAL Service Hierarchy
  25. Async Data Access Maturity
      • DAL service features: client and server RoR identification (biz-schema aware on the client side, schema-less on the server side); traffic sharding and routing; active-active/active-standby; auto-failover; multi-tenancy; ACL; direct/service-to-service replication …
      • Metadata-driven data access: source of truth for online guidelines and offline inventory; centralized configuration; zero restart/auto-refresh
      • Mapping: DataSet => KV mapping; logical => physical DataSet mapping
  26. Async RPC Control Plane Abstraction
  27. Async RPC Maturity
      • High flexibility configuration: configurable execution chain per URL (customizable protobuf/JSON encoder; injectable monitoring module); execution resource configuration (thread pool size and Netty options such as tcp_nodelay; shareable or not)
      • RPC resource management: service listener registry; server container lifecycle management (graceful shutdown; partial shutdown of a given container); auto-rebuild of the RPC client channel
  28. Async RPC Embraces Async DataAccess
  29. Async Core Value
      • High performance: low latency + high throughput; low system load; SLA isolation; a better understanding of where performance comes from
      • Easy adoption: zero code change + zero release when onboarding a new case; minimal effort to integrate new DB storage; Lego-style customization; highly reusable functionality
      • Cost saving: less hardware investment; looser constraints on hardware/VM SKU
      • High flexibility configuration: execution chain per URL (RPC); DataAccess storage and options (consistency and TTL); traffic routing strategy; replication strategy
  30. Async Family: Async Data Access (In-Memory, Aerospike, HBase); RPC (server/client, Netty); Workflow; Messaging (pub-sub: Kafka, ActiveMQ)
  31. Agenda: Async Future Plan
  32. Future Plan
      • Continuous performance tuning deep dive: shared event loop; Netty option (IO ratio); NIO vs. epoll SocketChannel; JDK SSL vs. OpenSSL; Protobuf vs. Msgpack; sync client vs. async client; with/without monitoring and replication features
      • Async DataAccess: compute operation support; DB server-side UDF adoption; smart client for direct and service access; async HBase integration
      • Async RPC: finer-grained monitoring and throttling; error-handling injection; client-side multiplexing; server push of partial responses + RPC client consolidation of responses
      • Async+sync hybrid workflow execution
      • Open source in 2019
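
Slides 10, 11, and 15 judge the DAL service by tail percentiles (P99, P99.99) rather than by averages. The following is a minimal sketch that assumes the nearest-rank definition and latency samples already collected in microseconds. The deck does not say how PayPal computes its percentiles, and `LatencyStats` is an illustrative name. The sketch shows how a small fraction of spikes can leave the average low while entirely setting the P99.

```java
import java.util.Arrays;

// Sketch: nearest-rank percentiles over one window of latency samples (microseconds).
final class LatencyStats {
    private LatencyStats() {}

    /**
     * Nearest-rank percentile. The percentile is given in hundredths of a
     * percent (P50 = 5000, P99 = 9900, P99.99 = 9999) so the rank stays exact.
     */
    static long percentile(long[] samples, int hundredthsOfPercent) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (hundredthsOfPercent <= 0 || hundredthsOfPercent > 10_000) {
            throw new IllegalArgumentException("percentile out of range");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        // rank = ceil(q * n / 10000), computed with integer arithmetic only
        long rank = ((long) hundredthsOfPercent * sorted.length + 9_999) / 10_000;
        return sorted[(int) rank - 1];
    }

    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElseThrow();
    }

    public static void main(String[] args) {
        long[] samples = new long[1000];
        Arrays.fill(samples, 0, 980, 1_000L);     // 980 requests at 1 ms
        Arrays.fill(samples, 980, 1000, 50_000L); // 20 latency spikes at 50 ms
        System.out.printf("avg=%.0f us, p50=%d us, p99=%d us, p99.99=%d us%n",
                mean(samples), percentile(samples, 5000),
                percentile(samples, 9900), percentile(samples, 9999));
    }
}
```

Run as-is, `main` prints `avg=1980 us, p50=1000 us, p99=50000 us, p99.99=50000 us`. The 2% of requests at 50 ms barely move the mean but define the P99. Long-running services usually record into a streaming histogram instead of sorting raw samples.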
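
Slide 13 lists non-blocking calls and event-driven callbacks as core async benefits. Slide 21 prefers inline execution over hopping across thread pools. The sketch below illustrates both ideas with the JDK's `CompletableFuture`. `FakeKvClient` is a hypothetical stand-in for an async storage client, not the DAL service's actual API: it completes futures from a single scheduled "IO" thread after a simulated 5 ms delay.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class AsyncReadDemo {
    // Hypothetical stand-in for an async storage client: the caller gets a future
    // immediately and an "IO" thread completes it later, so no caller thread blocks.
    static final class FakeKvClient implements AutoCloseable {
        private final ScheduledExecutorService io =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "io-loop");
                    t.setDaemon(true);
                    return t;
                });

        CompletableFuture<String> get(String key) {
            CompletableFuture<String> result = new CompletableFuture<>();
            // Simulated 5 ms storage round trip, completed from the IO thread
            io.schedule(() -> result.complete("value-of-" + key), 5, TimeUnit.MILLISECONDS);
            return result;
        }

        @Override
        public void close() {
            io.shutdownNow();
        }
    }

    // Two dependent reads composed as callbacks: the second read is issued when
    // the first completes, and thenApply (no Async suffix) normally runs the final
    // transform on the completing thread instead of hopping to another pool.
    static CompletableFuture<String> readChain(FakeKvClient kv, String userId) {
        return kv.get("account:" + userId)
                .thenCompose(account -> kv.get("risk:" + account))
                .thenApply(String::toUpperCase);
    }

    public static void main(String[] args) {
        try (FakeKvClient kv = new FakeKvClient()) {
            // join() blocks only here, at the edge of the demo
            System.out.println(readChain(kv, "42").join());
        }
    }
}
```

`main` prints `VALUE-OF-RISK:VALUE-OF-ACCOUNT:42`. Using `thenApplyAsync` with an executor instead would hand the last step to another pool. That is the cross-pool hop slide 21 advises against.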
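
Slide 14 names the reactor pattern as the threading model. The sketch below is a minimal single-threaded reactor over `java.nio`: one `Selector` waits for readiness, and the event-loop thread dispatches each ready channel to its handler inline. It reads from a `Pipe` instead of a socket so it runs without a network. The class name and the fixed 1 KB read buffer are assumptions. Frameworks such as Netty, which slide 30 lists, build buffer pooling, write handling, and multiple event loops on top of this idea.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.function.Consumer;

// Single-threaded reactor: one Selector demultiplexes readiness events, and the
// event-loop thread dispatches each one to the handler attached to that channel.
final class MiniReactor implements AutoCloseable {
    private final Selector selector;

    MiniReactor() throws IOException {
        this.selector = Selector.open();
    }

    // Registers a non-blocking channel plus the handler to call when it is readable.
    void register(SelectableChannel channel, Consumer<String> onRead) throws IOException {
        channel.configureBlocking(false);
        channel.register(selector, SelectionKey.OP_READ, onRead);
    }

    // One turn of the event loop: wait up to timeoutMs, then run ready handlers inline.
    int runOnce(long timeoutMs) throws IOException {
        int ready = selector.select(timeoutMs);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (!key.isValid() || !key.isReadable()) {
                continue;
            }
            ByteBuffer buf = ByteBuffer.allocate(1024);
            int n = ((ReadableByteChannel) key.channel()).read(buf);
            if (n < 0) {
                key.cancel(); // peer closed: stop watching this channel
                continue;
            }
            if (n == 0) {
                continue;
            }
            buf.flip();
            @SuppressWarnings("unchecked")
            Consumer<String> handler = (Consumer<String>) key.attachment();
            handler.accept(StandardCharsets.UTF_8.decode(buf).toString());
        }
        return ready;
    }

    @Override
    public void close() throws IOException {
        selector.close();
    }

    public static void main(String[] args) throws IOException {
        try (MiniReactor reactor = new MiniReactor()) {
            Pipe pipe = Pipe.open();
            reactor.register(pipe.source(), msg -> System.out.println("read: " + msg));
            pipe.sink().write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));
            reactor.runOnce(1_000);
        }
    }
}
```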
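
Slide 20 lists easy capacity estimation as a goal, and slide 18 shows how far async cut the thread count. Little's law (mean requests in flight = arrival rate × mean time in system) is one standard way to connect the slide 10 targets to concurrency. The arithmetic below is only an upper-bound illustration: it plugs in the 100 ms P99.99 budget rather than a measured mean latency, and if every request finishes within 100 ms the mean cannot exceed it.

```latex
L = \lambda W
\qquad\Longrightarrow\qquad
L \le 30\,000\ \tfrac{\text{req}}{\text{s}} \times 0.1\ \text{s} = 3\,000\ \text{requests in flight}
```

With one blocked thread per in-flight request, a synchronous server could need up to that many threads at the worst-case bound. An event-loop design instead keeps those requests as pending callbacks on a handful of threads.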
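
Slide 22 calls for timeout management, retries only on the client side, and failing fast instead of cascading exceptions. The sketch below shows one way to express that with `CompletableFuture` (Java 9+ for `orTimeout` and `failedFuture`). The attempt count and timeout are caller-chosen assumptions; the deck does not describe PayPal's actual retry policy.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class ClientRetry {
    private ClientRetry() {}

    /**
     * Runs an async operation with a per-attempt timeout and retries failures,
     * for at most maxAttempts attempts in total. Retries live only here, on the
     * client; the server is assumed to fail fast and never retry by itself.
     */
    static <T> CompletableFuture<T> callWithRetry(Supplier<CompletableFuture<T>> op,
                                                  long timeoutMs, int maxAttempts) {
        // orTimeout fails the attempt with TimeoutException; it does not cancel the work
        CompletableFuture<T> attempt = op.get().orTimeout(timeoutMs, TimeUnit.MILLISECONDS);
        if (maxAttempts <= 1) {
            return attempt; // last attempt: surface the failure to the caller
        }
        return attempt.<CompletableFuture<T>>handle((value, error) -> error == null
                        ? CompletableFuture.completedFuture(value)
                        : callWithRetry(op, timeoutMs, maxAttempts - 1))
                .thenCompose(next -> next);
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Fails twice with a transient error, then succeeds
        Supplier<CompletableFuture<String>> flaky = () -> calls.incrementAndGet() < 3
                ? CompletableFuture.failedFuture(new IllegalStateException("transient"))
                : CompletableFuture.completedFuture("ok");
        System.out.println(callWithRetry(flaky, 100, 3).join() + " after " + calls.get() + " calls");
    }
}
```

`main` prints `ok after 3 calls`. `orTimeout` only fails the attempt's future. The storage call itself keeps running unless the client cancels it separately.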
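
Slide 22 also lists batch consolidation. The sketch below assumes the storage layer offers a batched multi-key read, modeled here as a hypothetical `multiGet` function. It collects single-key requests, coalesces duplicate keys, and issues one backend call once `maxBatch` distinct keys are pending.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Sketch: collect single-key reads and send them to storage as one batch.
final class BatchConsolidator<K, V> {
    private final int maxBatch;
    private final Function<List<K>, Map<K, V>> multiGet; // hypothetical batched backend call
    private final Map<K, CompletableFuture<V>> pending = new LinkedHashMap<>();

    BatchConsolidator(int maxBatch, Function<List<K>, Map<K, V>> multiGet) {
        this.maxBatch = maxBatch;
        this.multiGet = multiGet;
    }

    CompletableFuture<V> get(K key) {
        CompletableFuture<V> future;
        boolean full;
        synchronized (this) {
            // Duplicate keys within one batch share a single future
            future = pending.computeIfAbsent(key, k -> new CompletableFuture<>());
            full = pending.size() >= maxBatch;
        }
        if (full) {
            flush();
        }
        return future;
    }

    // Sends everything pending as one call, outside the lock.
    void flush() {
        Map<K, CompletableFuture<V>> batch;
        synchronized (this) {
            if (pending.isEmpty()) {
                return;
            }
            batch = new LinkedHashMap<>(pending);
            pending.clear();
        }
        try {
            Map<K, V> result = multiGet.apply(new ArrayList<>(batch.keySet()));
            batch.forEach((key, f) -> f.complete(result.get(key))); // missing keys complete with null
        } catch (RuntimeException e) {
            batch.values().forEach(f -> f.completeExceptionally(e));
        }
    }

    public static void main(String[] args) {
        BatchConsolidator<String, Integer> batcher = new BatchConsolidator<>(2, keys -> {
            System.out.println("one backend call for " + keys);
            Map<String, Integer> out = new HashMap<>();
            keys.forEach(k -> out.put(k, k.length()));
            return out;
        });
        CompletableFuture<Integer> a = batcher.get("a");
        CompletableFuture<Integer> bb = batcher.get("bb"); // second key fills the batch
        System.out.println(a.join() + " " + bb.join());
    }
}
```

`main` prints `one backend call for [a, bb]` followed by `1 2`. A production batcher would also flush on a short timer so a partially filled batch does not wait indefinitely.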
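
Slide 25 describes metadata-driven data access: DataSet => KV mapping, logical => physical mapping, and centralized configuration with zero restart. Slide 22 asks for thread-safe writes in the control plane and exception-safe reads in the data plane. The sketch below is one way to combine those ideas (Java 16+ for `record`). `DataSetMapping`, its fields, and the `set:businessKey` key layout are hypothetical, chosen only to show the shape of the lookup.

```java
import java.util.Map;
import java.util.Optional;

// Sketch of metadata-driven data access: logical data set names resolve to
// physical KV locations through a registry that can be swapped at runtime.
final class MetadataRegistry {
    // Hypothetical metadata entry; field names and the key layout are assumptions.
    record DataSetMapping(String logicalName, String namespace, String set, int ttlSeconds) {
        String physicalKey(String businessKey) {
            return set + ":" + businessKey;
        }
    }

    // volatile + immutable snapshot: the control plane swaps the whole map, and
    // data-plane readers always see either the old or the new version.
    private volatile Map<String, DataSetMapping> mappings;

    MetadataRegistry(Map<String, DataSetMapping> initial) {
        this.mappings = Map.copyOf(initial);
    }

    // Zero-restart refresh, e.g. when centralized configuration changes.
    void refresh(Map<String, DataSetMapping> latest) {
        this.mappings = Map.copyOf(latest);
    }

    // The read path returns empty for an unknown data set; the caller decides how to fail.
    Optional<DataSetMapping> resolve(String logicalName) {
        return Optional.ofNullable(mappings.get(logicalName));
    }

    public static void main(String[] args) {
        MetadataRegistry registry = new MetadataRegistry(Map.of(
                "user_velocity", new DataSetMapping("user_velocity", "risk", "uv", 86_400)));
        registry.resolve("user_velocity")
                .ifPresent(m -> System.out.println(m.namespace() + " / " + m.physicalKey("u123")));
    }
}
```

`main` prints `risk / uv:u123`. Swapping an immutable map through a `volatile` field keeps readers lock-free, and a refresh never exposes a half-updated mapping.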
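
Slide 25 lists active-active/active-standby and auto-failover. The following is a minimal routing sketch for those features. It assumes storage clusters are tried in a fixed priority order and that some external health check calls `markDown`/`markUp`. The cluster names are placeholders, and real routing would be driven by the traffic routing strategy configuration that slide 29 mentions.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of active-standby routing with auto-failover: clusters are tried in
// priority order and unhealthy ones are skipped.
final class FailoverRouter {
    private final List<String> clustersByPriority; // e.g. active first, standby next
    private final Set<String> unhealthy = ConcurrentHashMap.newKeySet();

    FailoverRouter(List<String> clustersByPriority) {
        this.clustersByPriority = List.copyOf(clustersByPriority);
    }

    // Would be driven by health checks or error-rate monitoring (not shown).
    void markDown(String cluster) {
        unhealthy.add(cluster);
    }

    void markUp(String cluster) {
        unhealthy.remove(cluster);
    }

    String route() {
        for (String cluster : clustersByPriority) {
            if (!unhealthy.contains(cluster)) {
                return cluster;
            }
        }
        throw new IllegalStateException("no healthy storage cluster"); // fail fast
    }

    public static void main(String[] args) {
        FailoverRouter router = new FailoverRouter(List.of("colo-a-primary", "colo-a-standby"));
        System.out.println(router.route()); // colo-a-primary
        router.markDown("colo-a-primary");
        System.out.println(router.route()); // colo-a-standby
    }
}
```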
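
Slide 27 lists a configurable execution chain per URL, with customizable protobuf/JSON encoders and an injectable monitoring module. The plain-Java sketch below shows only the lookup-and-apply shape of that idea. `ExecutionChains` and the string-wrapping steps are hypothetical placeholders. A Netty-based server would typically express each step as a handler in a `ChannelPipeline`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Sketch of a per-URL execution chain: each URL gets its own ordered list of
// steps (codec, monitoring, business handler). Steps here transform a String
// payload; a real RPC server would pass a richer request context.
final class ExecutionChains {
    private final Map<String, List<UnaryOperator<String>>> chains = new HashMap<>();

    ExecutionChains register(String url, List<UnaryOperator<String>> steps) {
        chains.put(url, List.copyOf(steps));
        return this;
    }

    String execute(String url, String payload) {
        List<UnaryOperator<String>> steps = chains.get(url);
        if (steps == null) {
            throw new IllegalArgumentException("no execution chain for " + url); // fail fast
        }
        String current = payload;
        for (UnaryOperator<String> step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        UnaryOperator<String> monitor = p -> {
            System.out.println("monitor saw " + p);
            return p;
        };
        ExecutionChains chains = new ExecutionChains()
                .register("/risk/json", List.of(p -> "json(" + p + ")", monitor))
                .register("/risk/pb", List.of(p -> "pb(" + p + ")"));
        System.out.println(chains.execute("/risk/json", "req"));
        System.out.println(chains.execute("/risk/pb", "req"));
    }
}
```

`main` prints `monitor saw json(req)`, then `json(req)`, then `pb(req)`. Only the JSON route runs the injected monitoring step.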
