Millions of transactions per second, with an
advanced new programming model
Seastar
How multifarious and how mutually complicated are the considerations which the working of such an engine involve. There are frequently several distinct sets of effects going on simultaneously; all in a manner independent of each other, and yet to a greater or less degree exercising a mutual influence. To adjust each to every other, and indeed even to perceive and trace them out with perfect correctness and success, entails difficulties whose nature partakes to a certain extent of those involved in every question where conditions are very numerous and inter-complicated.
Hardware outgrowing software
+ CPU clocks not getting faster.
+ More cores, but hard to use them.
+ Locks have costs even when there is no contention
+ Data is allocated on one core, copied and used on others
+ Result: Software can’t keep up with new hardware (SSD, 10Gbps networking…)
[Diagram: traditional stack, with application threads above a kernel containing TCP/IP, the scheduler, and queues, plus NIC queues and memory.]
Workloads changing
+ Complex, multi-layered applications
+ NoSQL data stores
+ More users
+ Lower latencies needed
+ Microservices
- 81% of Redis processing is in the kernel.
- If 100 requests are needed for a page, the “99% latency” affects 63% of pageviews.
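The 63% figure follows if the 100 requests are treated as independent: the chance that at least one of them falls in the slowest 1% is 1 − 0.99^100, roughly 0.634. A minimal sketch of that arithmetic; the function name and the independence assumption are illustrative and not taken from the slides:

```cpp
#include <cassert>
#include <cmath>

// Probability that at least one of `requests` independent requests is slower
// than the given latency percentile (0.99 for the "99% latency").
// Assumption of this sketch: request latencies are independent of each other.
double fraction_of_pages_hit(int requests, double percentile) {
    assert(requests >= 0 && percentile >= 0.0 && percentile <= 1.0);
    return 1.0 - std::pow(percentile, requests);
}
```

Calling `fraction_of_pages_hit(100, 0.99)` gives about 0.634, which rounds to the 63% on the slide.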
7 Million IOPS
Benchmark hardware
■ 2x Xeon E5-2695v3, 2.3GHz, 35 MB cache, 14 cores each (28 cores total, 56 HT)
■ 8x 8GB DDR4 Micron memory
■ Intel Ethernet CNA XL710-QDA1
A new model
Threads
- Costly locking (example: POSIX requires that multiple threads be able to use the same socket)
+ Uses available skills/tools
Shared-nothing
+ Fewer wasted cycles
- Cross-core communication must be explicit, so harder to program
How
■ Single-threaded async engine running on each CPU
■ No threads
■ No shared data
■ All inter-CPU communication by message passing
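One way to picture this model is a set of per-core engines that each own their data and receive work only as messages. The sketch below is a toy, not Seastar code: `shard`, `engine_set`, and `submit_to` are hypothetical names, and all shards are driven from a single OS thread in round-robin order purely to keep the example deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// State owned by exactly one engine. Nothing here is shared, so no locks.
struct shard {
    long counter = 0;                               // touched only by its owner
    std::deque<std::function<void(shard&)>> inbox;  // messages sent to this shard
};

class engine_set {
public:
    explicit engine_set(std::size_t n) : shards_(n) {}

    // Message passing: ask shard `to` to run `task` against its own state.
    void submit_to(std::size_t to, std::function<void(shard&)> task) {
        assert(to < shards_.size());
        shards_[to].inbox.push_back(std::move(task));
    }

    // Visit the shards in turn until no inbox holds any message.
    void run() {
        bool progressed = true;
        while (progressed) {
            progressed = false;
            for (auto& s : shards_) {
                while (!s.inbox.empty()) {
                    auto task = std::move(s.inbox.front());
                    s.inbox.pop_front();
                    task(s);  // may submit further messages to any shard
                    progressed = true;
                }
            }
        }
    }

    long counter_of(std::size_t i) const { return shards_.at(i).counter; }

private:
    std::vector<shard> shards_;
};
```

A caller never reaches into another shard's `counter` directly; it submits a lambda and lets the owning shard apply it, which is the explicit cross-core communication the previous slide lists as the cost of the model.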
Linear scaling
+ One engine runs on each core
+ Shared-nothing per-core design
+ Fits existing shared-nothing distributed applications model
+ Full kernel bypass, supports zero-copy
+ No threads, no context switch and no locks!
+ Instead, asynchronous lambda invocation
[Diagram: four identical per-core stacks, each in userspace: Application, TCP/IP, Task Scheduler, queues plus an smp queue, NIC Queue, DPDK. The kernel isn’t involved.]
Comparison with old school
[Diagram: side-by-side comparison. Traditional stack: application threads above a kernel containing TCP/IP, the scheduler, and queues, plus NIC queues and memory. Seastar’s sharded stack: one userspace stack per core (Application, TCP/IP, Task Scheduler, queues plus an smp queue, NIC Queue, DPDK); the kernel is not involved.]
Millions of connections
Traditional stack vs. Seastar’s sharded stack
[Diagram: on the traditional side, each CPU’s scheduler runs threads, each with its own stack; on the Seastar side, each CPU runs many promise/task pairs.]
■ Thread is a function pointer
■ Stack is a byte array from 64k to megabytes
■ Promise is a pointer to an eventually computed value
■ Task is a pointer to a lambda function
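To get a feel for why per-connection stacks matter at this scale, the stack size quoted above can be multiplied out. This is a rough sketch with a hypothetical helper name. It counts reserved stack space only; how much of that becomes resident memory depends on how much of each stack is actually touched.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of stack reserved if every connection is served by its own thread.
inline std::uint64_t stack_bytes(std::uint64_t connections, std::uint64_t stack_size) {
    assert(stack_size > 0);
    return connections * stack_size;
}
```

With the smallest stack size from the slide, `stack_bytes(1000000, 64 * 1024)` is 65,536,000,000 bytes, about 61 GiB for one million connections.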
But how can you program it?
■ Ada Lovelace’s problem today
■ Need max. possible “easy” without giving up any “fast.”
If the answer were “no”, would this book be 467 pages long?
Basic model
■ Futures
■ Promises
■ Continuations
F-P-C defined: Future
A future is a result of a computation that may not be available yet.
■ a data buffer that we are reading from the network
■ the expiration of a timer
■ the completion of a disk write
■ the result of a computation that requires the values from one or more other futures.
F-P-C defined: Promise
A promise is an object or function that provides you with a future, with the expectation that it will fulfill the future.
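The relationship between the two objects can be sketched with a toy, single-threaded pair. The types `toy_promise`, `toy_future`, and `shared_slot` are invented for illustration and are neither Seastar's nor the standard library's: the promise hands out a future, and the future only becomes available once the promise is fulfilled.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Storage shared by one promise and the future it hands out.
template <typename T>
struct shared_slot {
    bool ready = false;  // false until the promise is fulfilled
    T value{};
};

template <typename T>
class toy_future {
public:
    explicit toy_future(std::shared_ptr<shared_slot<T>> s) : slot_(std::move(s)) {}
    bool available() const { return slot_->ready; }
    const T& get() const {
        assert(available());  // this toy never blocks; check available() first
        return slot_->value;
    }
private:
    std::shared_ptr<shared_slot<T>> slot_;
};

template <typename T>
class toy_promise {
public:
    toy_promise() : slot_(std::make_shared<shared_slot<T>>()) {}
    // The promise provides the future...
    toy_future<T> get_future() const { return toy_future<T>(slot_); }
    // ...and is expected to fulfil it later.
    void set_value(T v) {
        slot_->value = std::move(v);
        slot_->ready = true;
    }
private:
    std::shared_ptr<shared_slot<T>> slot_;
};
```

A consumer holding the `toy_future` sees `available()` return false until the producer calls `set_value` on the matching `toy_promise`.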
Basic future/promise
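#include <iostream>
#include <seastar/core/future.hh>
using seastar::future;

future<int> get();   // promises an int will be produced eventually
future<> put(int);   // promises to store an int

void f() {
    get().then([] (int value) {
        // Nested style: the second continuation is attached inside the first.
        put(value + 1).then([] {
            std::cout << "value stored successfully\n";
        });
    });
}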
Chaining
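#include <iostream>
#include <seastar/core/future.hh>
using seastar::future;

future<int> get();   // promises an int will be produced eventually
future<> put(int);   // promises to store an int

void f() {
    get().then([] (int value) {
        // Returning the future lets the next then() wait for put() to finish.
        return put(value + 1);
    }).then([] {
        std::cout << "value stored successfully\n";
    });
}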
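What makes the chained form flat is that a continuation returning a future does not produce a future of a future. The toy below illustrates only that flattening rule. `ready_future`, `wrap`, `get_value`, and `put_value` are hypothetical names, and the toy future is always ready, whereas a real future may complete later.

```cpp
#include <cassert>
#include <utility>

template <typename T> class ready_future;

// Wrap a plain value in a future; pass an existing future through unchanged.
// The second overload is what flattens future<future<T>> into future<T>.
template <typename T> ready_future<T> wrap(T value);
template <typename T> ready_future<T> wrap(ready_future<T> fut);

// Toy future that already holds its value.
template <typename T>
class ready_future {
public:
    explicit ready_future(T value) : value_(std::move(value)) {}
    const T& get() const { return value_; }

    // Run the continuation on the value and return a future for its result.
    template <typename F>
    auto then(F&& f) -> decltype(wrap(std::forward<F>(f)(std::declval<T&>()))) {
        return wrap(std::forward<F>(f)(value_));
    }
private:
    T value_;
};

template <typename T> ready_future<T> wrap(T value) {
    return ready_future<T>(std::move(value));
}
template <typename T> ready_future<T> wrap(ready_future<T> fut) { return fut; }

// Hypothetical stand-ins for the slide's get() and put().
inline ready_future<int> get_value() { return ready_future<int>(41); }
inline ready_future<int> put_value(int v) { return ready_future<int>(v); }

inline int chained() {
    return get_value()
        .then([](int value) { return put_value(value + 1); })  // returns a future
        .then([](int stored) { return stored * 2; })           // receives an int
        .get();
}
```

Because the first continuation's future is passed through rather than wrapped again, the second continuation receives a plain `int`, and `chained()` evaluates to 84.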
Zero copy friendly
future<temporary_buffer> socket::read(size_t n);
■ temporary_buffer points at driver-provided pages if possible
■ stack can linearize scatter-gather buffers using page tables
■ discarded after use
Zero copy friendly (2)
pair<future<size_t>, future<temporary_buffer>> socket::write(temporary_buffer);
■ First future becomes ready when TCP window allows sending more data (usually immediately)
■ Second future becomes ready when buffer can be discarded (after TCP ACK)
■ May complete in any order
Fully async filesystem
No threads
read_metadata().then([] {
    return lock_pages();
}).then([] {
    return read_data();
});
Shared state: networking
■ No shared state except index of net channels (1 per cpu)
■ No migration of existing TCP connections
Handling shared state: block
■ Each CPU is responsible for handling specific files/directories/free blocks (by hash)
■ Can delegate access to another CPU for locality, but not concurrent shared access
■ Flash optimized - no fancy layout
■ DMA only
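Assigning responsibility "by hash" can be sketched as mapping a file path to one owning CPU. The slides do not say which hash function or key is used, so `std::hash` over the path string and the helper name `owner_cpu` are assumptions made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Pick the CPU responsible for a path by hashing it.
// std::hash is a placeholder; its values are implementation-defined.
inline std::size_t owner_cpu(const std::string& path, std::size_t cpu_count) {
    assert(cpu_count > 0);
    return std::hash<std::string>{}(path) % cpu_count;
}
```

Every core computes the same owner for a given path, so a request can be forwarded to that core instead of being served through concurrent shared access.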
Deployment models
[Diagram: deployment options, labeled Seastar TCP, Linux sockets, DPDK, Virtio or raw device access, Linux process, and OSv networking.]
Licensing
■ Apache
■ Goals: compatibility and contributor safety
Performance results
■ Linear scaling to 20 cores and beyond
■ 250,000 transactions/core (memcached)
■ Currently limited by client. More client development in progress.
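Under perfectly linear scaling, aggregate throughput is just the per-core rate times the number of cores. The helper below is a hypothetical illustration of that arithmetic, not a model of the benchmark itself.

```cpp
#include <cassert>
#include <cstdint>

// Aggregate transactions per second if every core sustains the same rate.
// Perfect linear scaling is an idealisation assumed by this sketch.
inline std::uint64_t aggregate_tps(std::uint64_t per_core_tps, unsigned cores) {
    assert(cores > 0);
    return per_core_tps * cores;
}
```

At 250,000 transactions per core, 20 cores give 5,000,000 transactions per second. The same rate on all 28 cores of the benchmark machine would give 7,000,000, which is in line with the deck's headline number, although the slides do not state that the figure was derived this way.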
Applications
■ HTTP server
■ NoSQL system
■ Distributed filesystem
■ Object store
■ Transparent proxy
■ Cache (Memcache, CDN, ...)
■ NFV
Thank you
http://www.seastar-project.org/
@CloudiusSystems

Seastar Delivers 7M IOPS with Advanced New Programming Model

  • 1. Millions of transactions per second, with an advanced new programming model Seastar
  • 2. How multifarious and how mutually complicated are the considerations which the working of such an engine involve. There are frequently several distinct sets of effects going on simultaneously; all in a manner independent of each other, and yet to a greater or less degree exercising a mutual influence. To adjust each to every other, and indeed even to perceive and trace them out with perfect correctness and success, entails difficulties whose nature partakes to a certain extent of those involved in every question where conditions are very numerous and inter-complicated.
  • 3.
  • 4. Hardware outgrowing software
      + CPU clocks not getting faster.
      + More cores, but hard to use them.
      + Locks have costs even when there is no contention.
      + Data is allocated on one core, then copied and used on others.
      + Result: software can’t keep up with new hardware (SSD, 10 Gbps networking…)
      [Diagram: traditional stack; labels: Application, threads, Kernel, TCP/IP, Scheduler, queues, NIC queues, Memory]
  • 5. Workloads changing
      + Complex, multi-layered applications
      + NoSQL data stores
      + More users
      + Lower latencies needed
      + Microservices
      - 81% of Redis processing is in the kernel.
      - If a page needs 100 requests, the “99% latency” affects 63% of pageviews (1 − 0.99^100 ≈ 0.63, assuming the requests are independent).
      [Diagram: traditional stack; labels: Application, threads, Kernel, TCP/IP, Scheduler, queues, NIC queues, Memory]
  • 6.
  • 8.
  • 9. Benchmark hardware
      ■ 2x Xeon E5-2695v3, 2.3 GHz, 35M cache, 14 cores each (28 cores total, 56 HT)
      ■ 8x 8GB DDR4 Micron memory
      ■ Intel Ethernet CNA XL710-QDA1
  • 10. A new model
      Threads
      - Costly locking (example: POSIX requires multiple threads to be able to use the same socket)
      + Uses available skills/tools
      Shared-nothing
      + Fewer wasted cycles
      - Cross-core communication must be explicit, so harder to program
  • 11. How
      ■ Single-threaded async engine running on each CPU
      ■ No threads
      ■ No shared data
      ■ All inter-CPU communication by message passing
  • 12. Linear scaling
      + One engine runs on each core
      + Shared-nothing per-core design
      + Fits existing shared-nothing distributed applications model
      + Full kernel bypass, supports zero-copy
      + No threads, no context switches, and no locks!
      + Instead, asynchronous lambda invocation
      [Diagram: four identical per-core userspace stacks; labels: Application, TCP/IP, Task Scheduler, queues, smp queue, NIC Queue, DPDK, Kernel (isn’t involved)]
  • 13. Comparison with old school
      Traditional stack: [Diagram; labels: Application, threads, Kernel, TCP/IP, Scheduler, queues, NIC Queues, Memory]
      SeaStar’s sharded stack: [Diagram: four per-core userspace stacks; labels: Application, TCP/IP, Task Scheduler, queues, smp queue, NIC Queue, DPDK, Kernel (not involved)]
  • 14. Millions of connections
      Traditional stack: [Diagram: per-CPU schedulers, each with several Thread/Stack pairs]
        Thread is a function pointer
        Stack is a byte array from 64k to megabytes
      SeaStar’s sharded stack: [Diagram: per-CPU groups of Promise/Task pairs]
        Promise is a pointer to an eventually computed value
        Task is a pointer to a lambda function
  • 15. But how can you program it?
      ■ Ada Lovelace’s problem today
      ■ Need max. possible “easy” without giving up any “fast.”
      If the answer were “no”, would this book be 467 pages long?
  • 16. Basic model
      ■ Futures
      ■ Promises
      ■ Continuations
  • 17. F-P-C defined: Future
      A future is a result of a computation that may not be available yet. Examples:
      ■ a data buffer that we are reading from the network
      ■ the expiration of a timer
      ■ the completion of a disk write
      ■ the result of a computation that requires the values from one or more other futures
  • 18. F-P-C defined: Promise
      A promise is an object or function that provides you with a future, with the expectation that it will fulfill the future.
  • 19. Basic future/promise
      future<int> get();   // promises an int will be produced eventually
      future<> put(int);   // promises to store an int
      void f() {
          get().then([] (int value) {
              // nested continuation: runs once put() has completed
              put(value + 1).then([] {
                  std::cout << "value stored successfully\n";
              });
          });
      }
  • 20. Chaining
      future<int> get();   // promises an int will be produced eventually
      future<> put(int);   // promises to store an int
      void f() {
          get().then([] (int value) {
              // returning the future lets the next then() attach to it
              return put(value + 1);
          }).then([] {
              std::cout << "value stored successfully\n";
          });
      }
  • 21. Zero copy friendly
      future<temporary_buffer> socket::read(size_t n);
      ■ temporary_buffer points at driver-provided pages if possible
      ■ stack can linearize scatter-gather buffers using page tables
      ■ discarded after use
  • 22. Zero copy friendly (2)
      pair<future<size_t>, future<temporary_buffer>> socket::write(temporary_buffer);
      ■ First future becomes ready when TCP window allows sending more data (usually immediately)
      ■ Second future becomes ready when buffer can be discarded (after TCP ACK)
      ■ May complete in any order
  • 23. Fully async filesystem
      No threads
      read_metadata().then([] {
          return lock_pages();
      }).then([] {
          return read_data();
      });
  • 24. Shared state: networking
      ■ No shared state except index of net channels (1 per cpu)
      ■ No migration of existing TCP connections
  • 25. Handling shared state: block
      ■ Each CPU is responsible for handling specific files/directories/free blocks (by hash)
      ■ Can delegate access to another CPU for locality, but not concurrent shared access
      ■ Flash optimized - no fancy layout
      ■ DMA only
  • 26. Deployment models
      [Diagram; labels: Seastar TCP, Linux sockets, DPDK, Virtio or raw device access, Linux process, OSv, networking]
  • 27. Licensing
      ■ Apache
      ■ Goals: compatibility and contributor safety
  • 28. Performance results
      ■ Linear scaling to 20 cores and beyond
      ■ 250,000 transactions/core (memcached)
      ■ Currently limited by client. More client development in progress.
  • 29. Applications
      ■ HTTP server
      ■ NoSQL system
      ■ Distributed filesystem
      ■ Object store
      ■ Transparent proxy
      ■ Cache (Memcache, CDN, ...)
      ■ NFV
  • 30.
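
Slides 11, 12 and 14 describe each core as running a single-threaded engine whose unit of work is a lambda rather than a thread. The following is a minimal sketch of that idea in standard C++, not Seastar's scheduler: `toy_engine`, `schedule`, `run` and `run_example` are hypothetical names chosen for illustration, and the FIFO policy is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical per-core engine (not Seastar's API): a FIFO of lambdas,
// each run to completion on the one thread that owns the engine.
class toy_engine {
public:
    // Queue a task; no lock is needed because only one thread touches tasks_.
    void schedule(std::function<void()> task) {
        assert(task != nullptr);
        tasks_.push_back(std::move(task));
    }

    // Run tasks until the queue is empty; returns how many tasks ran.
    std::size_t run() {
        std::size_t ran = 0;
        while (!tasks_.empty()) {
            std::function<void()> task = std::move(tasks_.front());
            tasks_.pop_front();
            task();  // a task may schedule follow-up tasks
            ++ran;
        }
        return ran;
    }

private:
    std::deque<std::function<void()>> tasks_;
};

// Example: the first task schedules a follow-up, which runs after the
// task that was already waiting in the queue.
std::vector<int> run_example() {
    toy_engine engine;
    std::vector<int> order;
    engine.schedule([&] {
        order.push_back(1);
        engine.schedule([&] { order.push_back(3); });
    });
    engine.schedule([&] { order.push_back(2); });
    engine.run();
    return order;
}
```

Because a task never blocks and is never preempted, switching between tasks is just a function call, and the only per-task state is whatever the lambda captured, rather than a full thread stack.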

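Slide 25 says each CPU handles specific files, directories and free blocks "by hash". One way this could look is sketched below. The slides do not say which hash function or key is used, so `std::hash` over a path string and the helper name `owning_cpu` are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper (not Seastar's API): pick the CPU that owns a path.
// Every core computes the same answer locally, so no shared table or lock
// is needed to find the owner.
std::size_t owning_cpu(const std::string& path, std::size_t cpu_count) {
    assert(cpu_count > 0);
    return std::hash<std::string>{}(path) % cpu_count;
}
```

A core that receives a request for a path it does not own would forward the request to `owning_cpu(path, cpu_count)` as a message instead of touching the data itself, in line with the message-passing rule on slide 11.
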
Editor's notes

  1. 318,715 transactions/core at 2 cores, 274,114 transactions/core at 16 cores…
  2. 250,000 transactions/core
  3. Slide 7 - locking is only part of the problem, and it is mostly eliminated by "lock-free" alternatives to locking. The other problems are cache-line bouncing, slow atomic operations, and memory barriers. A "shared nothing" design cannot eliminate all of these (we still communicate between cores), but it can minimize them by making it very explicit when these things happen. If I understood Avi correctly, he also says that another problem of the thread model is that the large stacks mean large cache pollution on context switches, while our tiny "task" switches don't cause large cache pollution. You even mention this later on. But I have to admit I'm not completely convinced this is the case (even if the stack is large, don't the threads use only a tiny portion of it?).
  4. http://aws.amazon.com/ec2/pricing/
  5. http://aws.amazon.com/ec2/pricing/
  6. http://aws.amazon.com/ec2/pricing/
  7. Promises and futures simplify asynchronous programming since they decouple the event producer (the promise) and the event consumer (whoever uses the future). Whether the promise is fulfilled before the future is consumed, or vice versa, does not change the outcome of the code.
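
Note 7's point, that the outcome is the same whichever side acts first, can be sketched with a toy single-threaded promise/future pair. This is an illustration in standard C++, not Seastar's implementation: `toy_promise`, `toy_future`, `toy_state` and `observe_both_orders` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// State shared by one promise and its future (toy version, single-threaded).
template <typename T>
struct toy_state {
    bool has_value = false;
    T value{};
    std::function<void(T)> continuation;  // empty until then() is called
};

template <typename T>
class toy_future {
public:
    explicit toy_future(std::shared_ptr<toy_state<T>> s) : state_(std::move(s)) {}

    // Consumer side: run fn now if the value is already there,
    // otherwise remember it for when the promise is fulfilled.
    void then(std::function<void(T)> fn) {
        if (state_->has_value) {
            fn(state_->value);
        } else {
            state_->continuation = std::move(fn);
        }
    }

private:
    std::shared_ptr<toy_state<T>> state_;
};

template <typename T>
class toy_promise {
public:
    toy_promise() : state_(std::make_shared<toy_state<T>>()) {}

    toy_future<T> get_future() { return toy_future<T>(state_); }

    // Producer side: store the value and run a waiting continuation, if any.
    void set_value(T v) {
        assert(!state_->has_value);  // a promise is fulfilled at most once
        state_->value = std::move(v);
        state_->has_value = true;
        if (state_->continuation) {
            state_->continuation(state_->value);
        }
    }

private:
    std::shared_ptr<toy_state<T>> state_;
};

// Runs the same producer/consumer pair in both orders and returns what the
// consumer observed in each case.
std::pair<int, int> observe_both_orders() {
    int seen_a = 0;
    int seen_b = 0;

    toy_promise<int> p1;  // fulfill first, consume second
    p1.set_value(41);
    p1.get_future().then([&](int v) { seen_a = v + 1; });

    toy_promise<int> p2;  // consume first, fulfill second
    p2.get_future().then([&](int v) { seen_b = v + 1; });
    p2.set_value(41);

    return {seen_a, seen_b};
}
```

In both orders the continuation runs exactly once with the same value, so code written against the future does not need to know when the producer finishes.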