Brought to you by
Extreme HTTP Performance Tuning:
1.2M API req/s on a 4 vCPU EC2 Instance
Marc Richards
Chief Problem Solver at Talawah Solutions
■ Based in Kingston, Jamaica
■ Cloud Computing Consultant for almost a decade
■ Solutions Architect / DevOps Engineer / Performance Engineer
■ No low-level systems performance tuning experience before this project!
Demystifying Systems Performance Tuning
■ You don't need to be a kernel developer or a wizard sysadmin.
■ FlameGraph and bpftrace have changed the game.
■ New eBPF-based tools coming out will only make things easier!
Overview
■ I accidentally fell down this optimization rabbit hole.
■ Started with a simple, high-performance API server written in C.
■ Used FlameGraph and bpftrace to analyze and optimize the entire stack.
Overview
■ Cloud: AWS
■ Hardware: 4 vCPU c5n.xlarge** (server) / 16 vCPU c5n.4xlarge (client)
■ Benchmark: TechEmpower JSON Serialization test
■ Server: TechEmpower libreactor implementation
** To minimize inconsistencies at the platform level, I ran the final benchmark on a c5n.9xlarge restricted to 4 vCPUs using the EC2 CPU Options feature.
Blog post with even more details
https://talawah.io/blog/extreme-http-performance-tuning-one-point-two-million/
Optimizations
Optimization                                   Gain*   Req/s
Ground Zero                                    -       224k
Application Optimizations                      55%     347k
Disabling Speculative Execution Mitigations    28%     446k
Disabling Syscall Auditing / Blocking          11%     495k
Disabling iptables / netfilter                 22%     603k
Perfect Locality                               38%     834k
Interrupt Optimizations                        28%     1.06M
The Case of the Nosy Neighbor                  6%      1.12M
The Battle Against the Spin Lock               2%      1.15M
This Goes to Twelve                            4%      1.20M
* Gain is relative to the previous row.
Ground Zero
Running 10s test @ http://server.tfb:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 1.14ms
90.00% 1.21ms
99.00% 1.26ms
99.99% 1.32ms
2243551 requests in 10.00s, 331.64MB read
Requests/sec: 224,353.73
* I modified nginx.conf to send back a hardcoded JSON response. This is not a part of the Techempower implementation.
Application Optimizations
■ Run on all logical cores/vCPUs: ~25%
■ gcc -O3 -march=native: ~15%
■ send/recv instead of write/read: ~5%
■ Remove pthread overhead: ~3%
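A minimal sketch of two of the application changes above, assuming Linux and glibc: sizing the worker pool from the number of online vCPUs, and handling a request with recv()/send() instead of read()/write(). On a connected socket with flags set to 0, recv()/send() behave like read()/write(), but they enter the socket code directly instead of going through the generic file read/write path, which is a plausible source of the small gain. The helper names `worker_count` and `handle_once` and the hard-coded response are hypothetical and not taken from libreactor.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: one worker per online logical CPU (vCPU). */
long worker_count(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Hypothetical handler: consume one request and reply with a fixed JSON
 * response, using recv()/send() rather than read()/write().
 * Returns the number of bytes sent, or -1 on EOF/error. */
ssize_t handle_once(int fd) {
    static const char response[] =
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: application/json\r\n"
        "Content-Length: 27\r\n"
        "\r\n"
        "{\"message\":\"Hello, World!\"}";
    char buf[4096];

    assert(fd >= 0);
    ssize_t n = recv(fd, buf, sizeof(buf), 0);   /* instead of read(fd, buf, len) */
    if (n <= 0)
        return -1;
    /* instead of write(); MSG_NOSIGNAL reports EPIPE rather than raising SIGPIPE */
    return send(fd, response, strlen(response), MSG_NOSIGNAL);
}
```

In a real server each worker would call handle_once() from its event loop. MSG_NOSIGNAL is an extra that send() offers and write() does not: a write to a peer that has already closed fails with EPIPE instead of killing the process with SIGPIPE.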
Application Optimizations
Running 10s test @ http://server.tfb:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 723.00us
90.00% 0.88ms
99.00% 0.94ms
99.99% 1.08ms
3470892 requests in 10.00s, 483.27MB read
Requests/sec: 347,087.15
[Figure: Application Optimizations, before vs. after]
Disabling...
■ Speculative Execution Mitigations: 28%
● nospectre_v1 nospectre_v2 pti=off mds=off tsx_async_abort=off
■ Syscall Auditing/Blocking: 11%
● auditctl -a never,task
● docker run -d --security-opt seccomp=unconfined libreactor
■ iptables/netfilter: 22%
● modprobe -rv ip_tables
● ExecStart=/usr/bin/dockerd --bridge=none --iptables=false --ip-forward=false
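These settings are easy to lose after a reboot or a container restart. A minimal sketch, assuming Linux, of a sanity check: it reports whether each boot parameter listed above appears on the kernel command line (`/proc/cmdline`) and reads the `Seccomp:` field of `/proc/self/status` (0 means no seccomp, 2 means a filter is installed). Running it inside the container is one way to confirm that `--security-opt seccomp=unconfined` took effect. The helper names are hypothetical; `/sys/devices/system/cpu/vulnerabilities/` gives the kernel's own view of which mitigations are active and is a useful cross-check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `token` appears as a space-separated word in `cmdline`. */
int cmdline_has(const char *cmdline, const char *token) {
    assert(cmdline != NULL && token != NULL && token[0] != '\0');
    size_t len = strlen(token);
    for (const char *p = cmdline; (p = strstr(p, token)) != NULL; p++) {
        int starts = (p == cmdline) || p[-1] == ' ';
        char end = p[len];
        if (starts && (end == '\0' || end == ' ' || end == '\n'))
            return 1;
    }
    return 0;
}

/* Parse a /proc/<pid>/status line such as "Seccomp:\t2".
 * Returns the mode (0 = disabled, 1 = strict, 2 = filter), or -1 for other lines. */
int parse_seccomp_line(const char *line) {
    int mode;
    return sscanf(line, "Seccomp: %d", &mode) == 1 ? mode : -1;
}

/* Seccomp mode of the calling process, or -1 if it cannot be determined. */
int current_seccomp_mode(void) {
    char line[256];
    int mode = -1;
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return -1;
    while (mode < 0 && fgets(line, sizeof(line), f))
        mode = parse_seccomp_line(line);
    fclose(f);
    return mode;
}

/* Print which of the tuning settings from the slide are in effect. */
void report_tuning(void) {
    static const char *params[] = {
        "nospectre_v1", "nospectre_v2", "pti=off", "mds=off", "tsx_async_abort=off",
    };
    char cmdline[4096] = "";
    FILE *f = fopen("/proc/cmdline", "r");
    if (f) {
        if (!fgets(cmdline, sizeof(cmdline), f))
            cmdline[0] = '\0';
        fclose(f);
    }
    for (size_t i = 0; i < sizeof(params) / sizeof(params[0]); i++)
        printf("%-20s %s\n", params[i], cmdline_has(cmdline, params[i]) ? "set" : "not set");
    printf("seccomp mode: %d\n", current_seccomp_mode());
}
```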
Disabling...
Running 10s test @ http://server.tfb:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 419.00us
90.00% 479.00us
99.00% 517.00us
99.99% 575.00us
6031161 requests in 10.00s, 839.76MB read
Requests/sec: 603,112.18
[Figure: Disabling..., before vs. after]
Perfect Locality + Interrupt Optimizations
■ Perfect Locality
● Pin processes to CPUs
● Pin network queues to CPUs (RSS + XPS)
● SO_REUSEPORT + SO_ATTACH_REUSEPORT_CBPF
■ Interrupt Moderation
● ethtool -C eth0 adaptive-rx on
■ Busy polling
● net.core.busy_poll=1
■ Perfect Locality + Interrupt Moderation + Busy Polling = 💯
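A minimal sketch of the socket side of perfect locality, assuming Linux 4.5 or later, one worker per CPU, and that listener i in the SO_REUSEPORT group belongs to the worker pinned to CPU i. The classic BPF program loads the ID of the CPU processing the incoming packet (`SKF_AD_CPU`) and returns it as the index of the socket that should receive the new connection; combined with RSS/XPS keeping each queue on its own CPU, a connection can then stay on one CPU from interrupt to application. The helper names are hypothetical; the two-instruction BPF program mirrors the one in the kernel selftest `reuseport_bpf_cpu.c`, which also attaches it after `listen()`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <arpa/inet.h>
#include <linux/filter.h>
#include <netinet/in.h>
#include <sched.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Pin the calling thread (the whole process if single-threaded) to one CPU. */
int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Create a TCP listener that joins the SO_REUSEPORT group for `port`. */
int reuseport_listener(uint16_t port) {
    int one = 1;
    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(port),
        .sin_addr.s_addr = htonl(INADDR_ANY),
    };
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Attach a classic BPF program to the whole reuseport group. Its return
 * value (the current CPU) selects the socket index for each new connection. */
int attach_cpu_steering(int fd) {
    struct sock_filter code[] = {
        /* A = ID of the CPU handling this packet */
        { BPF_LD | BPF_W | BPF_ABS, 0, 0, SKF_AD_OFF + SKF_AD_CPU },
        /* return A */
        { BPF_RET | BPF_A, 0, 0, 0 },
    };
    struct sock_fprog prog = {
        .len = sizeof(code) / sizeof(code[0]),
        .filter = code,
    };
    assert(fd >= 0);
    return setsockopt(fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF, &prog, sizeof(prog));
}
```

One plausible arrangement: the parent creates listeners 0..N-1 on the same port in CPU order, attaches the program to any one of them (it applies to the group), then forks; worker i calls pin_to_cpu(i) and accepts only on listener i. If the program returns an index outside the group, the kernel falls back to its default hash-based selection.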
Perfect Locality + Interrupt Optimizations
Running 10s test @ http://server.tfb:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 233.00us
90.00% 263.00us
99.00% 292.00us
99.99% 348.00us
10660410 requests in 10.00s, 1.45GB read
Requests/sec: 1,066,034.60
[Figure: Perfect Locality + Interrupt Optimizations, before vs. after]
The Case of the Nosy Neighbor + The Battle Against the Spin Lock
The Case of the Nosy Neighbor
Someone, somewhere was spying on all my packets (kinda)
■ dev_queue_xmit_nit() -> packet_rcv()
■ packet_rcv() implicates AF_PACKET
■ sudo ss --packet --processes -> (("dhclient",pid=3191,fd=5))
■ My (extreme) solution was to disable dhclient after boot
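Roughly, when an AF_PACKET socket is registered for all protocols (as dhclient's is), the transmit path calls dev_queue_xmit_nit() and hands each outgoing packet to that socket's packet_rcv(), even if the socket's own filter then discards it. Besides `ss --packet --processes`, the kernel lists packet sockets in `/proc/net/packet`: one header line, then one line per socket. A minimal sketch, assuming that layout, that counts them so a boot script could warn when a packet socket reappears; the helper names are hypothetical, and the file only covers the current network namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>

/* Count AF_PACKET sockets in a /proc/net/packet-style stream:
 * one header line followed by one line per socket. */
int count_packet_sockets(FILE *f) {
    char line[512];
    int count = 0;
    assert(f != NULL);
    if (!fgets(line, sizeof(line), f))   /* skip the header line */
        return 0;
    while (fgets(line, sizeof(line), f))
        if (line[0] != '\n')
            count++;
    return count;
}

/* Number of AF_PACKET sockets in this network namespace, or -1 on error. */
int host_packet_sockets(void) {
    FILE *f = fopen("/proc/net/packet", "r");
    if (!f)
        return -1;
    int n = count_packet_sockets(f);
    fclose(f);
    return n;
}
```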
The Case of the Nosy Neighbor
Running 10s test @ http://server.tfb:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 218.00us
90.00% 254.00us
99.00% 285.00us
99.99% 341.00us
11279049 requests in 10.00s, 1.53GB read
Requests/sec: 1,127,894.86
[Figure: The Case of the Nosy Neighbor, before vs. after]
The Battle Against the Spin Lock
Running 10s test @ http://server.tfb:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 212.00us
90.00% 246.00us
99.00% 276.00us
99.99% 338.00us
11551707 requests in 10.00s, 1.57GB read
Requests/sec: 1,155,162.15
[Figure: The Battle Against the Spin Lock, before vs. after]
This Goes to Twelve
■ Disabling Generic Receive Offload (GRO)
■ TCP Congestion Control: cubic -> reno
■ Static Interrupt Moderation
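The congestion control switch can be made system-wide through the `net.ipv4.tcp_congestion_control` sysctl; which mechanism the talk used is not stated here. A minimal sketch of the per-socket alternative, assuming Linux, uses the `TCP_CONGESTION` socket option. Reno is built into every kernel and unprivileged processes may select it. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Select the congestion control algorithm for one TCP socket.
 * Returns 0 on success, -1 with errno set on failure. */
int set_congestion_control(int fd, const char *name) {
    assert(name != NULL);
    return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, (socklen_t)strlen(name));
}

/* Copy the socket's current algorithm name into buf, always NUL-terminated. */
int get_congestion_control(int fd, char *buf, socklen_t len) {
    assert(buf != NULL && len > 1);
    memset(buf, 0, len);
    socklen_t optlen = len - 1;
    return getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &optlen);
}
```

A server would call set_congestion_control(fd, "reno") on its listening socket; accepted connections inherit the listener's algorithm.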
This Goes to Twelve
Running 10s test @ http://server.tfb:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 203.00us
90.00% 236.00us
99.00% 265.00us
99.99% 317.00us
12031718 requests in 10.00s, 1.64GB read
Requests/sec: 1,203,164.22
Conclusion
436% increase in requests per second. 79% reduction in p99 latency.
■ Throughput: 224k req/s -> 1.2M req/s
■ p99 latency: 1.26ms -> 265.00us
■ p99.99 latency: 1.32ms -> 317.00us
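The headline percentages follow from the wrk results for Ground Zero and This Goes to Twelve:

```latex
\begin{aligned}
\text{throughput gain} &= \frac{1{,}203{,}164}{224{,}354} - 1 \approx 4.36 = 436\% \\
\text{p99 reduction}   &= 1 - \frac{265\ \mu\text{s}}{1{,}260\ \mu\text{s}} \approx 0.79 = 79\%
\end{aligned}
```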
All 11 implementations on a c5n.xlarge using the stock Amazon Linux 2 AMI without any OS/Networking optimizations
All 11 implementations on a c5n.xlarge with all OS/Networking optimizations applied
Next Steps
■ Next gen kernel: 5.10 LTS
■ Next gen technologies: io_uring
■ Next gen instances: ARM vs Intel vs AMD
■ Driving performance from the bottom up using Rust, Java, etc.
Marc Richards
https://talawah.io/contact
@talawahtech

 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance

  • 1. Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance. Marc Richards, Chief Problem Solver at Talawah Solutions.
  • 2. Marc Richards, Chief Problem Solver at Talawah Solutions
      ■ Based in Kingston, Jamaica
      ■ Cloud computing consultant for almost a decade
      ■ Solutions Architect / DevOps Engineer / Performance Engineer
      ■ No low-level systems performance tuning experience before this project!
  • 3. Demystifying Systems Performance Tuning
      ■ You don't need to be a kernel developer or a wizard sysadmin.
      ■ FlameGraph and bpftrace have changed the game.
      ■ New eBPF-based tools coming out will only make things easier!
  • 4. Overview
      ■ I accidentally fell down this optimization rabbit hole.
      ■ Started with a simple, high-performance API server written in C.
      ■ Used FlameGraph and bpftrace to analyze and optimize the entire stack.
  • 5. Overview
      ■ Cloud: AWS
      ■ Hardware: 4 vCPU c5n.xlarge** (server) / 16 vCPU c5n.4xlarge (client)
      ■ Benchmark: TechEmpower JSON Serialization test
      ■ Server: TechEmpower libreactor implementation
      ** To minimize inconsistencies at the platform level, I did the final benchmark run on a c5n.9xlarge that was restricted to 4 vCPUs using the EC2 CPU Options feature.
  • 6. Blog post with even more details: https://talawah.io/blog/extreme-http-performance-tuning-one-point-two-million/
  • 7. Optimizations
      Optimization                                 Gain  Req/s
      Ground Zero                                  -     224k
      Application Optimizations                    55%   347k
      Disabling Speculative Execution Mitigations  28%   446k
      Disabling Syscall Auditing / Blocking        11%   495k
      Disabling iptables / netfilter               22%   603k
      Perfect Locality                             38%   834k
      Interrupt Optimizations                      28%   1.06M
      The Case of the Nosy Neighbor                6%    1.12M
      The Battle Against the Spin Lock             2%    1.15M
      This Goes to Twelve                          4%    1.20M
      (Each Gain is relative to the previous row; Req/s values are rounded.)
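Because each Gain in the table is measured against the previous row, the gains compound multiplicatively rather than adding up: summing the column gives only 194%, while the end-to-end improvement is about 436%. A minimal C sketch of that arithmetic, using the exact Requests/sec figures from the benchmark-output slides (the function and array names are illustrative, not from the talk):

```c
#include <assert.h>
#include <stddef.h>

/* Percentage gain of `after` over `before` (e.g. 224k -> 347k is ~55%). */
static double gain_pct(double before, double after) {
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}

/* Chain the per-stage speedup factors. Each stage multiplies the previous
 * throughput, so the product of reqs[i] / reqs[i-1] telescopes to
 * reqs[n-1] / reqs[0]: stage gains compound, they do not add. */
static double compounded_gain_pct(const double *reqs, size_t n) {
    assert(reqs != NULL && n >= 2);
    double factor = 1.0;
    for (size_t i = 1; i < n; i++)
        factor *= reqs[i] / reqs[i - 1];
    return (factor - 1.0) * 100.0;
}

/* Requests/sec from the slides that show full benchmark output
 * (slides 9, 15, 19, 23, 27, 29 and 33). */
static const double measured_reqs[] = {
    224353.73,   /* Ground Zero */
    347087.15,   /* Application Optimizations */
    603112.18,   /* Disabling... (mitigations, auditing, iptables) */
    1066034.60,  /* Perfect Locality + Interrupt Optimizations */
    1127894.86,  /* The Case of the Nosy Neighbor */
    1155162.15,  /* The Battle Against the Spin Lock */
    1203164.22,  /* This Goes to Twelve */
};
static const size_t measured_len = sizeof measured_reqs / sizeof measured_reqs[0];
```

Chaining the stages and dividing the last figure by the first give the same answer (up to floating-point rounding), which is why the Conclusion can quote a single 436% figure.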
  • 9. Ground Zero
      Running 10s test @ http://server.tfb:8080/json
      16 threads and 256 connections
      Latency Distribution
        50.00%    1.14ms
        90.00%    1.21ms
        99.00%    1.26ms
        99.99%    1.32ms
      2243551 requests in 10.00s, 331.64MB read
      Requests/sec: 224,353.73
  • 10. * I modified nginx.conf to send back a hardcoded JSON response. This is not part of the TechEmpower implementation.
  • 14. Application Optimizations
      ■ Run on all logical cores/vCPUs: ~25%
      ■ gcc -O3 and -march=native: ~15%
      ■ send/recv instead of write/read: ~5%
      ■ Remove pthread overhead: ~3%
  • 15. Application Optimizations
      Running 10s test @ http://server.tfb:8080/json
      16 threads and 256 connections
      Latency Distribution
        50.00%    723.00us
        90.00%    0.88ms
        99.00%    0.94ms
        99.99%    1.08ms
      3470892 requests in 10.00s, 483.27MB read
      Requests/sec: 347,087.15
  • 17. Disabling...
      ■ Speculative Execution Mitigations
      ■ Syscall Auditing / Blocking
      ■ iptables/netfilter
  • 18. Disabling...
      ■ Speculative Execution Mitigations: 28%
        ● nospectre_v1 nospectre_v2 pti=off mds=off tsx_async_abort=off (kernel command-line parameters)
      ■ Syscall Auditing/Blocking: 11%
        ● auditctl -a never,task
        ● docker run -d --security-opt seccomp=unconfined libreactor
      ■ iptables/netfilter: 22%
        ● modprobe -rv ip_tables
        ● ExecStart=/usr/bin/dockerd --bridge=none --iptables=false --ip-forward=false
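Whether `--security-opt seccomp=unconfined` took effect can be checked from inside the container: on Linux, /proc/self/status has a `Seccomp:` field that reads 0 when no seccomp filter is active and 2 when one (such as Docker's default profile) is installed. A small C sketch that extracts that field; the function names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the seccomp mode from the text of /proc/<pid>/status:
 * 0 = disabled, 1 = strict, 2 = filter. Returns -1 if the field is absent.
 * The "Seccomp_filters:" line (newer kernels) does not match "Seccomp:". */
static int seccomp_mode_from_status(const char *status) {
    assert(status != NULL);
    const char *p = status;
    while (p && *p) {
        if (strncmp(p, "Seccomp:", 8) == 0)
            return (int)strtol(p + 8, NULL, 10);  /* strtol skips the tab */
        p = strchr(p, '\n');
        if (p)
            p++;  /* move to the start of the next line */
    }
    return -1;
}

/* Read /proc/self/status and report this process's seccomp mode. */
static int current_seccomp_mode(void) {
    char buf[8192];
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return seccomp_mode_from_status(buf);
}
```

Run inside the libreactor container, a result of 0 would indicate seccomp filtering is off; 2 would mean a filter is still being applied to every syscall.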
  • 19. Disabling...
      Running 10s test @ http://server.tfb:8080/json
      16 threads and 256 connections
      Latency Distribution
        50.00%    419.00us
        90.00%    479.00us
        99.00%    517.00us
        99.99%    575.00us
      6031161 requests in 10.00s, 839.76MB read
      Requests/sec: 603,112.18
  • 22. Perfect Locality + Interrupt Optimizations
      ■ Perfect Locality
        ● Pin processes to CPUs
        ● Pin network queues to CPUs (RSS + XPS)
        ● SO_REUSEPORT + SO_ATTACH_REUSEPORT_CBPF
      ■ Interrupt Moderation
        ● ethtool -C eth0 adaptive-rx on
      ■ Busy polling
        ● net.core.busy_poll=1
      ■ Perfect Locality + Interrupt Moderation + Busy Polling = 💯
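One way to get the socket side of "perfect locality" is to open one SO_REUSEPORT listener per CPU, in CPU order, and attach a classic BPF program that returns the CPU the packet is being processed on; the kernel uses that return value as the index of the listener in the reuseport group (falling back to its normal hash when the index is out of range). Combined with pinning each worker to its CPU and pinning NIC queues with RSS/XPS, a connection can then stay on one CPU end to end. This is a sketch under those assumptions, not the talk's code; the helper names, IPv4-only setup and port are illustrative:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/filter.h>
#include <netinet/in.h>
#include <sched.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_ATTACH_REUSEPORT_CBPF
#define SO_ATTACH_REUSEPORT_CBPF 51  /* value from asm-generic/socket.h */
#endif

/* Open one SO_REUSEPORT listener on `port`. Create these in CPU order:
 * the n-th socket added to the group gets index n. */
static int reuseport_listener(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int one = 1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof one) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Attach "return current CPU" as the group's socket-selection program.
 * The program applies to the whole reuseport group. */
static int attach_cpu_steering(int fd) {
    struct sock_filter code[] = {
        /* A = id of the CPU processing this packet */
        { BPF_LD | BPF_W | BPF_ABS, 0, 0, (uint32_t)(SKF_AD_OFF + SKF_AD_CPU) },
        /* return A, i.e. pick the listener whose index equals the CPU */
        { BPF_RET | BPF_A, 0, 0, 0 },
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof code / sizeof code[0]),
        .filter = code,
    };
    return setsockopt(fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF, &prog, sizeof prog);
}

/* Pin the calling thread to one CPU so its worker and listener share it. */
static int pin_to_cpu(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}
```

In a real server each worker thread would call pin_to_cpu(i) and then accept() on the i-th listener; the NIC-queue pinning (RSS/XPS) is configured outside the program and is not shown here.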
  • 23. Perfect Locality + Interrupt Optimizations
      Running 10s test @ http://server.tfb:8080/json
      16 threads and 256 connections
      Latency Distribution
        50.00%    233.00us
        90.00%    263.00us
        99.00%    292.00us
        99.99%    348.00us
      10660410 requests in 10.00s, 1.45GB read
      Requests/sec: 1,066,034.60
  • 24. Perfect Locality + Interrupt Optimizations [figure: Before | After]
  • 25. The Case of the Nosy Neighbor + The Battle Against the Spin Lock
  • 26. The Case of the Nosy Neighbor
      Someone, somewhere, was spying on all my packets (kinda).
      ■ dev_queue_xmit_nit() -> packet_rcv()
      ■ packet_rcv() implicates AF_PACKET
      ■ sudo ss --packet --processes -> (("dhclient",pid=3191,fd=5))
      ■ My (extreme) solution was to disable dhclient after boot
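Besides `ss --packet`, Linux lists the AF_PACKET sockets of the current network namespace in /proc/net/packet, one line per socket after a header line, so a quick count shows whether anything (dhclient, in this case) could be tapping traffic on the dev_queue_xmit_nit() path. A small sketch, assuming that one-header-line layout; the function names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count AF_PACKET sockets in the text of /proc/net/packet.
 * Assumes one header line ("sk RefCnt Type Proto ...") followed by one
 * non-empty line per socket. */
static int count_packet_sockets(const char *text) {
    assert(text != NULL);
    int lines = 0;
    const char *p = text;
    while (*p) {
        const char *nl = strchr(p, '\n');
        size_t len = nl ? (size_t)(nl - p) : strlen(p);
        if (len > 0)
            lines++;
        if (!nl)
            break;
        p = nl + 1;
    }
    return lines > 0 ? lines - 1 : 0;  /* drop the header line */
}

/* Read /proc/net/packet; returns -1 if the file cannot be opened. */
static int current_packet_sockets(void) {
    char buf[16384];
    FILE *f = fopen("/proc/net/packet", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return count_packet_sockets(buf);
}
```

A non-zero count only says that packet sockets exist; mapping them to a process (as `ss --processes` does) requires matching the Inode column against /proc/<pid>/fd.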
  • 27. The Case of the Nosy Neighbor
      Running 10s test @ http://server.tfb:8080/json
      16 threads and 256 connections
      Latency Distribution
        50.00%    218.00us
        90.00%    254.00us
        99.00%    285.00us
        99.99%    341.00us
      11279049 requests in 10.00s, 1.53GB read
      Requests/sec: 1,127,894.86
  • 28. The Case of the Nosy Neighbor [figure: Before | After]
  • 29. The Battle Against the Spin Lock
      Running 10s test @ http://server.tfb:8080/json
      16 threads and 256 connections
      Latency Distribution
        50.00%    212.00us
        90.00%    246.00us
        99.00%    276.00us
        99.99%    338.00us
      11551707 requests in 10.00s, 1.57GB read
      Requests/sec: 1,155,162.15
  • 30. The Battle Against the Spin Lock [figure: Before | After]
  • 31. This Goes to Twelve
  • 32. This Goes to Twelve
      ■ Disabling Generic Receive Offload (GRO)
      ■ TCP Congestion Control: cubic -> reno
      ■ Static Interrupt Moderation
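The cubic -> reno switch can be made system-wide with the net.ipv4.tcp_congestion_control sysctl; Linux also lets a program choose the algorithm for a single socket with the TCP_CONGESTION socket option. The slide does not say which mechanism was used, so the following is only a sketch of the per-socket variant (helper names are illustrative):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Select a congestion control algorithm (e.g. "reno") for one TCP socket.
 * reno is always built in; other algorithms may need a module loaded, and
 * unprivileged processes are limited to net.ipv4.tcp_allowed_congestion_control. */
static int set_congestion(int fd, const char *algo) {
    return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, algo, (socklen_t)strlen(algo));
}

/* Read back the algorithm name into `out` as a NUL-terminated string. */
static int get_congestion(int fd, char *out, size_t cap) {
    assert(out != NULL && cap > 1);
    socklen_t len = (socklen_t)(cap - 1);
    if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, out, &len) < 0)
        return -1;
    out[len] = '\0';
    return 0;
}
```

For a server, setting the option on the listening socket is usually enough, since accepted sockets inherit it; the sysctl route avoids any code change at all.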
  • 33. This Goes to Twelve
      Running 10s test @ http://server.tfb:8080/json
      16 threads and 256 connections
      Latency Distribution
        50.00%    203.00us
        90.00%    236.00us
        99.00%    265.00us
        99.99%    317.00us
      12031718 requests in 10.00s, 1.64GB read
      Requests/sec: 1,203,164.22
  • 34. Conclusion
      436% increase in requests per second. 79% reduction in p99 latency.
      ■ Throughput: 224k req/s -> 1.2M req/s
      ■ p99 latency: 1.26ms -> 265.00us
      ■ p99.99 latency: 1.32ms -> 317.00us
  • 35. All 11 implementations on a c5n.xlarge using the stock Amazon Linux 2 AMI without any OS/Networking optimizations
  • 36. All 11 implementations on a c5n.xlarge with all OS/Networking optimizations applied
  • 37. Next Steps
      ■ Next-gen kernel: 5.10 LTS
      ■ Next-gen technologies: io_uring
      ■ Next-gen instances: ARM vs Intel vs AMD
      ■ Driving performance from the bottom up using Rust, Java, etc.
  • 38. Marc Richards: https://talawah.io/contact, @talawahtech