SlideShare une entreprise Scribd logo
1  sur  30
Télécharger pour lire hors ligne
Exception-Less System Calls for
    Event-Driven Servers


  Livio Soares and Michael Stumm

       University of Toronto
Talk overview
 ➔
     At OSDI'10: exception-less system calls
     ➔   Technique targeted at highly threaded servers
     ➔   Doubled performance of Apache
 ➔   Event-driven servers are popular
     ➔   Faster than threaded servers

             We show that exception-less system
             calls make event-driven server faster
 ➔   memcached speeds up by 25-35%
 ➔   nginx speeds up by 70-120%

Livio Soares | Exception-Less System Calls for Event-Driven Servers   2
Event-driven server architectures
 ➔   Supports I/O concurrency with a single
     execution context
     ➔   Alternative to thread based architectures
 ➔   At a high-level:
     ➔
         Divide program flow into non-blocking stages
     ➔   After each stage register interest in event(s)
     ➔   Notification of event is asynchronous, driving next
         stage in the program flow
     ➔   To avoid idle time, applications multiplex execution
         of multiple independent stages


Livio Soares | Exception-Less System Calls for Event-Driven Servers   3
Example: simple network server
void server() {
   ...
   ...
   fd = accept();
   ...
   ...
   read(fd);
   ...
   ...
   write(fd);
   ...
   ...
   close(fd);
   ...
   ...
}

Livio Soares | Exception-Less System Calls for Event-Driven Servers   4
Example: simple network server
void server() {                                 S1
   ...
   ...   S1
   fd = accept();
                                                             UNIX options:
   ...                                          S2
   ... S2                                                    Non-blocking I/O
   read(fd);                                                   poll()
   ...
   ... S3                                       S3             select()
   write(fd);                                                  epoll()
   ...
   ... S4                                       S4           Async I/O
   close(fd);
   ...
   ...   S5
}                                               S5

Livio Soares | Exception-Less System Calls for Event-Driven Servers          5
Performance: events vs. threads
                                ApacheBench
                 14000
                 12000
 Requests/sec.




                 10000
                  8000
                  6000
                  4000
                                                         nginx (events)
                  2000
                                                         Apache (threads)
                     0
                       0   100          200         300          400   500
                                  Concurrency

    nginx delivers 1.7x the throughput of Apache;
          gracefully copes with high loads
Livio Soares | Exception-Less System Calls for Event-Driven Servers          6
Issues with UNIX event primitives
 ➔   Do not cover all system calls
     ➔   Mostly work with file-descriptors (files and sockets)
 ➔   Overhead
     ➔   Tracking progress of I/O involves both application
         and kernel code
     ➔   Application and kernel communicate frequently

      Previous work shows that fine-grain mode
       switching can half processor efficiency


Livio Soares | Exception-Less System Calls for Event-Driven Servers   7
FlexSC component overview




FlexSC and FlexSC-Threads presented at OSDI 2010

This work: libflexsc for event-driven servers
 1) memcached throughput increase of up to 35%
 2) nginx throughput increase of up to 120%

Livio Soares | Exception-Less System Calls for Event-Driven Servers   8
Benefits for event-driven applications
 1) General purpose
     ➔   Any/all system calls can be asynchronous
 2) Non-intrusive kernel implementation
     ➔   Does not require per syscall code
 3) Facilitates multi-processor execution
     ➔   OS work is automatically distributed
 4) Improved processor efficiency
     ➔   Reduces frequent user/kernel mode switches



Livio Soares | Exception-Less System Calls for Event-Driven Servers   9
Summary of exception-less syscalls




Livio Soares | Exception-Less System Calls for Event-Driven Servers   10
Exception-less interface: syscall page
write(fd, buf, 4096);



entry = free_syscall_entry();

/* write syscall */
entry->syscall = 1;
entry->num_args = 3;
entry->args[0] = fd;
entry->args[1] = buf;
entry->args[2] = 4096;
entry->status = SUBMIT;
                SUBMIT

while (entry->status != DONE)
                        DONE
   do_something_else();

return entry->return_code;

Livio Soares | Exception-Less System Calls for Event-Driven Servers   11
Exception-less interface: syscall page
write(fd, buf, 4096);



entry = free_syscall_entry();

/* write syscall */
entry->syscall = 1;
entry->num_args = 3;
entry->args[0] = fd;
entry->args[1] = buf;
entry->args[2] = 4096;                                                SUBMIT
entry->status = SUBMIT;
                SUBMIT

while (entry->status != DONE)
                        DONE
   do_something_else();

return entry->return_code;

Livio Soares | Exception-Less System Calls for Event-Driven Servers            12
Exception-less interface: syscall page
write(fd, buf, 4096);



entry = free_syscall_entry();

/* write syscall */
entry->syscall = 1;
entry->num_args = 3;
entry->args[0] = fd;
entry->args[1] = buf;
entry->args[2] = 4096;                                                DONE
entry->status = SUBMIT;
                SUBMIT

while (entry->status != DONE)
                        DONE
   do_something_else();

return entry->return_code;

Livio Soares | Exception-Less System Calls for Event-Driven Servers          13
Syscall threads
 ➔   Kernel-only threads
     ➔   Part of application process
 ➔   Execute requests from syscall page
 ➔   Schedulable on a per-core basis




Livio Soares | Exception-Less System Calls for Event-Driven Servers   14
Dynamic multicore specialization



Core 0                                                                Core 1
Core 2                                                                Core 3




    1) FlexSC makes specializing cores simple
    2) Dynamically adapts to workload needs
Livio Soares | Exception-Less System Calls for Event-Driven Servers        15
libflexsc: async syscall library
 ➔   Async syscall and notification library

 ➔   Similar to libevent
     ➔   But operates on syscalls instead of file-descriptors

 ➔   Three main components:
     1) Provides main loop (dispatcher)
     2) Support asynchronous syscall with associated
       callback to notify completion
     3) Cancellation support


Livio Soares | Exception-Less System Calls for Event-Driven Servers   16
Main API: async system call
1  struct flexsc_cb {
2      void (*callback)(struct flexsc_cb *); /* event handler */
3      void *arg;                            /* auxiliary var */
4      int64_t ret;                          /* syscall return */
5  }
6 
7  int flexsc_##SYSCALL(struct flexsc_cb *, ... /*syscall args*/);
 
8   /* Example: asynchronous accept */
9   struct flexsc_cb cb;
10   cb.callback = handle_accept;
11  flexsc_accept(&cb, master_sock, NULL, 0);
12 
13   void handle_accept(struct flexsc_cb *cb) {
14    int fd = cb­>ret;
15    if (fd != ­1) {
16         struct flexsc_cb read_cb;
17         read_cb.callback = handle_read;
18         flexsc_read(&read_cb, fd, read_buf, read_count);
19    }
20   }


Livio Soares | Exception-Less System Calls for Event-Driven Servers   17
memcached port to libflexsc
 ➔   memcached: in-memory key/value store
     ➔   Simple code-base: 8K LOC
     ➔   Uses libevent


 ➔   Modified 293 LOC
 ➔   Transformed libevent calls to libflexsc
 ➔   Mostly in one file: memcached.c
 ➔   Most memcached syscalls are socket based


Livio Soares | Exception-Less System Calls for Event-Driven Servers   18
nginx port to libflexsc
 ➔   Most popular event-driven webserver
     ➔   Code base: 82K LOC
     ➔   Natively uses both non-blocking (epoll) I/O and
         asynchronous I/O

 ➔   Modified 255 LOC
 ➔   Socket based code already asynchronous
 ➔   Not all file-system calls were asynchronous
     ➔   e.g., open, fstat, getdents
 ➔   Special handling of stack allocated syscall args

Livio Soares | Exception-Less System Calls for Event-Driven Servers   19
Evaluation
 ➔   Linux 2.6.33
 ➔   Nehalem (Core i7) server, 2.3GHz
     ➔   4 cores
 ➔   Client connected through 1Gbps network
 ➔   Workloads
     ➔   memslap on memcached (30% user, 70% kernel)
     ➔   httperf on nginx (25% user, 75% kernel)
 ➔
     Default Linux (“epoll”) vs.
         libflexsc (“flexsc”)

Livio Soares | Exception-Less System Calls for Event-Driven Servers   20
memcached on 4 cores
                              140000
 Throughput (requests/sec.)


                              120000

                              100000

                              80000
                                                 30% improvement
                              60000

                              40000
                                                                        flexsc
                              20000
                                                                        epoll
                                  0
                                       0   200     400   600      800    1000
                                            Request Concurrency


Livio Soares | Exception-Less System Calls for Event-Driven Servers              21
memcached processor metrics
                       1.2
                                    User                                   Kernel
                        1
Relative Performance




                       0.8
    (flexsc/epoll)




                       0.6

                       0.4

                       0.2

                        0
                                   L2         i-cache               L2         i-cache
                             CPI        d-cache           CPI            d-cache




  Livio Soares | Exception-Less System Calls for Event-Driven Servers               22
httperf on nginx (1 core)
                    120
                              flexsc
                    100
                              epoll
Throughput (Mbps)




                    80

                    60

                    40
                                                   100% improvement
                    20

                     0
                          0   10000    20000     30000      40000     50000   60000

                                               Requests/s

Livio Soares | Exception-Less System Calls for Event-Driven Servers            23
nginx processor metrics
                       1.2
                                   User                               Kernel
                        1
Relative Performance




                       0.8
    (flexsc/epoll)




                       0.6

                       0.4

                       0.2

                        0
                                   L2        i-cache        CPI       d-cache    Branch
                             CPI        d-cache    Branch          L2      i-cache




     Livio Soares | Exception-Less System Calls for Event-Driven Servers            24
Concluding remarks
 ➔   Current event-based primitives add overhead
     ➔   I/O operations require frequent communication
         between OS and application
 ➔
     libflexsc: exception-less syscall library
     1) General purpose
     2) Non-intrusive kernel implementation
     3) Facilitates multi-processor execution
     4) Improved processor efficiency
 ➔   Ported memcached and nginx to libflexsc
     ➔   Performance improvements of 30 - 120%

Livio Soares | Exception-Less System Calls for Event-Driven Servers   25
Exception-Less System Calls for
    Event-Driven Servers


  Livio Soares and Michael Stumm

       University of Toronto
Backup Slides
Difference in improvements
Why does nginx improve more than memcached?

1) Frequency of mode switches:
                 Server                 memcached              nginx
        Frequency of syscalls
                                            3,750              1,460
           (in instructions)


2) nginx uses greater diversity of system calls
  ➔   More interference in processor structures (caches)

3) Instruction count reduction
 ➔ nginx with epoll() has connection timeouts



Livio Soares | Exception-Less System Calls for Event-Driven Servers    28
Limitations
 ➔   Scalability (number of outstanding syscalls)
     ➔   Interface: operations perform linear scan
     ➔   Implementation: overheads of syscall threads
         non-negligible
 ➔   Solutions
     ➔   Throttle syscalls at application or OS
     ➔   Switch interface to scalable message passing
     ➔   Provide exception-less versions of async I/O
     ➔   Make kernel fully non-blocking


Livio Soares | Exception-Less System Calls for Event-Driven Servers   29
Latency (ApacheBench)

                30
                                                             epoll
                25                                           flexsc
 Latency (ms)




                20

                15

                10                                                    50% latency
                                                                       reduction
                5

                0
                     1 core                        2 cores


Livio Soares | Exception-Less System Calls for Event-Driven Servers          30

Contenu connexe

Tendances

GlusterFS Update and OpenStack Integration
GlusterFS Update and OpenStack IntegrationGlusterFS Update and OpenStack Integration
GlusterFS Update and OpenStack IntegrationEtsuji Nakai
 
Make container without_docker_7
Make container without_docker_7Make container without_docker_7
Make container without_docker_7Sam Kim
 
Make container without_docker_6-overlay-network_1
Make container without_docker_6-overlay-network_1 Make container without_docker_6-overlay-network_1
Make container without_docker_6-overlay-network_1 Sam Kim
 
Fast boot
Fast bootFast boot
Fast bootSZ Lin
 
A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages! ...
A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages! ...A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages! ...
A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages! ...CODE BLUE
 
LXC, Docker, security: is it safe to run applications in Linux Containers?
LXC, Docker, security: is it safe to run applications in Linux Containers?LXC, Docker, security: is it safe to run applications in Linux Containers?
LXC, Docker, security: is it safe to run applications in Linux Containers?Jérôme Petazzoni
 
DeathNote of Microsoft Windows Kernel
DeathNote of Microsoft Windows KernelDeathNote of Microsoft Windows Kernel
DeathNote of Microsoft Windows KernelPeter Hlavaty
 
Namespaces and cgroups - the basis of Linux containers
Namespaces and cgroups - the basis of Linux containersNamespaces and cgroups - the basis of Linux containers
Namespaces and cgroups - the basis of Linux containersKernel TLV
 
Storage based on_openstack_mariocho
Storage based on_openstack_mariochoStorage based on_openstack_mariocho
Storage based on_openstack_mariochoMario Cho
 
Lecture 6 Kernel Debugging + Ports Development
Lecture 6 Kernel Debugging + Ports DevelopmentLecture 6 Kernel Debugging + Ports Development
Lecture 6 Kernel Debugging + Ports DevelopmentMohammed Farrag
 
Kernel Recipes 2019 - Kernel documentation: past, present, and future
Kernel Recipes 2019 - Kernel documentation: past, present, and futureKernel Recipes 2019 - Kernel documentation: past, present, and future
Kernel Recipes 2019 - Kernel documentation: past, present, and futureAnne Nicolas
 
Kdump-FUDcon-2015-Session
Kdump-FUDcon-2015-SessionKdump-FUDcon-2015-Session
Kdump-FUDcon-2015-SessionBuland Singh
 
도커 없이 컨테이너 만들기 5편 마운트 네임스페이스와 오버레이 파일시스템
도커 없이 컨테이너 만들기 5편 마운트 네임스페이스와 오버레이 파일시스템도커 없이 컨테이너 만들기 5편 마운트 네임스페이스와 오버레이 파일시스템
도커 없이 컨테이너 만들기 5편 마운트 네임스페이스와 오버레이 파일시스템Sam Kim
 
Stackless Python In Eve
Stackless Python In EveStackless Python In Eve
Stackless Python In Evel xf
 
From L3 to seL4: What have we learnt in 20 years of L4 microkernels
From L3 to seL4: What have we learnt in 20 years of L4 microkernelsFrom L3 to seL4: What have we learnt in 20 years of L4 microkernels
From L3 to seL4: What have we learnt in 20 years of L4 microkernelsmicrokerneldude
 
Possibility of arbitrary code execution by Step-Oriented Programming
Possibility of arbitrary code execution by Step-Oriented ProgrammingPossibility of arbitrary code execution by Step-Oriented Programming
Possibility of arbitrary code execution by Step-Oriented Programmingkozossakai
 
[若渴計畫] Black Hat 2017之過去閱讀相關整理
[若渴計畫] Black Hat 2017之過去閱讀相關整理[若渴計畫] Black Hat 2017之過去閱讀相關整理
[若渴計畫] Black Hat 2017之過去閱讀相關整理Aj MaChInE
 

Tendances (20)

GlusterFS Update and OpenStack Integration
GlusterFS Update and OpenStack IntegrationGlusterFS Update and OpenStack Integration
GlusterFS Update and OpenStack Integration
 
Make container without_docker_7
Make container without_docker_7Make container without_docker_7
Make container without_docker_7
 
Make container without_docker_6-overlay-network_1
Make container without_docker_6-overlay-network_1 Make container without_docker_6-overlay-network_1
Make container without_docker_6-overlay-network_1
 
Fast boot
Fast bootFast boot
Fast boot
 
A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages! ...
A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages! ...A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages! ...
A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages! ...
 
RunningFreeBSDonLinuxKVM
RunningFreeBSDonLinuxKVMRunningFreeBSDonLinuxKVM
RunningFreeBSDonLinuxKVM
 
LXC, Docker, security: is it safe to run applications in Linux Containers?
LXC, Docker, security: is it safe to run applications in Linux Containers?LXC, Docker, security: is it safe to run applications in Linux Containers?
LXC, Docker, security: is it safe to run applications in Linux Containers?
 
DeathNote of Microsoft Windows Kernel
DeathNote of Microsoft Windows KernelDeathNote of Microsoft Windows Kernel
DeathNote of Microsoft Windows Kernel
 
Namespaces and cgroups - the basis of Linux containers
Namespaces and cgroups - the basis of Linux containersNamespaces and cgroups - the basis of Linux containers
Namespaces and cgroups - the basis of Linux containers
 
Storage based on_openstack_mariocho
Storage based on_openstack_mariochoStorage based on_openstack_mariocho
Storage based on_openstack_mariocho
 
Lecture 6 Kernel Debugging + Ports Development
Lecture 6 Kernel Debugging + Ports DevelopmentLecture 6 Kernel Debugging + Ports Development
Lecture 6 Kernel Debugging + Ports Development
 
Kernel Recipes 2019 - Kernel documentation: past, present, and future
Kernel Recipes 2019 - Kernel documentation: past, present, and futureKernel Recipes 2019 - Kernel documentation: past, present, and future
Kernel Recipes 2019 - Kernel documentation: past, present, and future
 
Kdump-FUDcon-2015-Session
Kdump-FUDcon-2015-SessionKdump-FUDcon-2015-Session
Kdump-FUDcon-2015-Session
 
도커 없이 컨테이너 만들기 5편 마운트 네임스페이스와 오버레이 파일시스템
도커 없이 컨테이너 만들기 5편 마운트 네임스페이스와 오버레이 파일시스템도커 없이 컨테이너 만들기 5편 마운트 네임스페이스와 오버레이 파일시스템
도커 없이 컨테이너 만들기 5편 마운트 네임스페이스와 오버레이 파일시스템
 
Stackless Python In Eve
Stackless Python In EveStackless Python In Eve
Stackless Python In Eve
 
From L3 to seL4: What have we learnt in 20 years of L4 microkernels
From L3 to seL4: What have we learnt in 20 years of L4 microkernelsFrom L3 to seL4: What have we learnt in 20 years of L4 microkernels
From L3 to seL4: What have we learnt in 20 years of L4 microkernels
 
Possibility of arbitrary code execution by Step-Oriented Programming
Possibility of arbitrary code execution by Step-Oriented ProgrammingPossibility of arbitrary code execution by Step-Oriented Programming
Possibility of arbitrary code execution by Step-Oriented Programming
 
Genode Compositions
Genode CompositionsGenode Compositions
Genode Compositions
 
[若渴計畫] Black Hat 2017之過去閱讀相關整理
[若渴計畫] Black Hat 2017之過去閱讀相關整理[若渴計畫] Black Hat 2017之過去閱讀相關整理
[若渴計畫] Black Hat 2017之過去閱讀相關整理
 
Syslog Protocols
Syslog ProtocolsSyslog Protocols
Syslog Protocols
 

Similaire à Livio slides-libflexsc-usenix-atc11

[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화OpenStack Korea Community
 
Installing tivoli system automation for high availability of db2 udb bcu on a...
Installing tivoli system automation for high availability of db2 udb bcu on a...Installing tivoli system automation for high availability of db2 udb bcu on a...
Installing tivoli system automation for high availability of db2 udb bcu on a...Banking at Ho Chi Minh city
 
An Introduce of OPNFV (Open Platform for NFV)
An Introduce of OPNFV (Open Platform for NFV)An Introduce of OPNFV (Open Platform for NFV)
An Introduce of OPNFV (Open Platform for NFV)Mario Cho
 
Network & Filesystem: Doing less cross rings memory copy
Network & Filesystem: Doing less cross rings memory copyNetwork & Filesystem: Doing less cross rings memory copy
Network & Filesystem: Doing less cross rings memory copyScaleway
 
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogicRakuten Group, Inc.
 
Scaling the Container Dataplane
Scaling the Container Dataplane Scaling the Container Dataplane
Scaling the Container Dataplane Michelle Holley
 
Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Fwdays
 
LinuxONE cavemen mmit 20160505 v1.0
LinuxONE cavemen mmit 20160505 v1.0LinuxONE cavemen mmit 20160505 v1.0
LinuxONE cavemen mmit 20160505 v1.0Marcel Mitran
 
LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1Hajime Tazaki
 
Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014Hajime Tazaki
 
Scalable Networking
Scalable NetworkingScalable Networking
Scalable Networkingl xf
 
The linux kernel hidden inside windows 10
The linux kernel hidden inside windows 10The linux kernel hidden inside windows 10
The linux kernel hidden inside windows 10mark-smith
 
2003 scalable networking - unknown
2003 scalable networking - unknown2003 scalable networking - unknown
2003 scalable networking - unknownGeorge Ang
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetesTed Jung
 
Vm13 vnx mixed workloads
Vm13 vnx mixed workloadsVm13 vnx mixed workloads
Vm13 vnx mixed workloadspittmantony
 
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance BarriersCeph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance BarriersCeph Community
 
Storage Changes in VMware vSphere 4.1
Storage Changes in VMware vSphere 4.1Storage Changes in VMware vSphere 4.1
Storage Changes in VMware vSphere 4.1Scott Lowe
 
Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorAljoscha Krettek
 

Similaire à Livio slides-libflexsc-usenix-atc11 (20)

[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
 
Installing tivoli system automation for high availability of db2 udb bcu on a...
Installing tivoli system automation for high availability of db2 udb bcu on a...Installing tivoli system automation for high availability of db2 udb bcu on a...
Installing tivoli system automation for high availability of db2 udb bcu on a...
 
淺談探索 Linux 系統設計之道
淺談探索 Linux 系統設計之道 淺談探索 Linux 系統設計之道
淺談探索 Linux 系統設計之道
 
An Introduce of OPNFV (Open Platform for NFV)
An Introduce of OPNFV (Open Platform for NFV)An Introduce of OPNFV (Open Platform for NFV)
An Introduce of OPNFV (Open Platform for NFV)
 
Network & Filesystem: Doing less cross rings memory copy
Network & Filesystem: Doing less cross rings memory copyNetwork & Filesystem: Doing less cross rings memory copy
Network & Filesystem: Doing less cross rings memory copy
 
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
 
Scaling the Container Dataplane
Scaling the Container Dataplane Scaling the Container Dataplane
Scaling the Container Dataplane
 
Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...
 
LinuxONE cavemen mmit 20160505 v1.0
LinuxONE cavemen mmit 20160505 v1.0LinuxONE cavemen mmit 20160505 v1.0
LinuxONE cavemen mmit 20160505 v1.0
 
Mina2
Mina2Mina2
Mina2
 
LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1
 
Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014
 
Scalable Networking
Scalable NetworkingScalable Networking
Scalable Networking
 
The linux kernel hidden inside windows 10
The linux kernel hidden inside windows 10The linux kernel hidden inside windows 10
The linux kernel hidden inside windows 10
 
2003 scalable networking - unknown
2003 scalable networking - unknown2003 scalable networking - unknown
2003 scalable networking - unknown
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetes
 
Vm13 vnx mixed workloads
Vm13 vnx mixed workloadsVm13 vnx mixed workloads
Vm13 vnx mixed workloads
 
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance BarriersCeph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
 
Storage Changes in VMware vSphere 4.1
Storage Changes in VMware vSphere 4.1Storage Changes in VMware vSphere 4.1
Storage Changes in VMware vSphere 4.1
 
Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream Processor
 

Dernier

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Dernier (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Livio slides-libflexsc-usenix-atc11

  • 1. Exception-Less System Calls for Event-Driven Servers Livio Soares and Michael Stumm University of Toronto
  • 2. Talk overview ➔ At OSDI'10: exception-less system calls ➔ Technique targeted at highly threaded servers ➔ Doubled performance of Apache ➔ Event-driven servers are popular ➔ Faster than threaded servers We show that exception-less system calls make event-driven server faster ➔ memcached speeds up by 25-35% ➔ nginx speeds up by 70-120% Livio Soares | Exception-Less System Calls for Event-Driven Servers 2
  • 3. Event-driven server architectures ➔ Supports I/O concurrency with a single execution context ➔ Alternative to thread based architectures ➔ At a high-level: ➔ Divide program flow into non-blocking stages ➔ After each stage register interest in event(s) ➔ Notification of event is asynchronous, driving next stage in the program flow ➔ To avoid idle time, applications multiplex execution of multiple independent stages Livio Soares | Exception-Less System Calls for Event-Driven Servers 3
  • 4. Example: simple network server void server() { ... ... fd = accept(); ... ... read(fd); ... ... write(fd); ... ... close(fd); ... ... } Livio Soares | Exception-Less System Calls for Event-Driven Servers 4
  • 5. Example: simple network server void server() { S1 ... ... S1 fd = accept(); UNIX options: ... S2 ... S2 Non-blocking I/O read(fd); poll() ... ... S3 S3 select() write(fd); epoll() ... ... S4 S4 Async I/O close(fd); ... ... S5 } S5 Livio Soares | Exception-Less System Calls for Event-Driven Servers 5
  • 6. Performance: events vs. threads ApacheBench 14000 12000 Requests/sec. 10000 8000 6000 4000 nginx (events) 2000 Apache (threads) 0 0 100 200 300 400 500 Concurrency nginx delivers 1.7x the throughput of Apache; gracefully copes with high loads Livio Soares | Exception-Less System Calls for Event-Driven Servers 6
  • 7. Issues with UNIX event primitives ➔ Do not cover all system calls ➔ Mostly work with file-descriptors (files and sockets) ➔ Overhead ➔ Tracking progress of I/O involves both application and kernel code ➔ Application and kernel communicate frequently Previous work shows that fine-grain mode switching can half processor efficiency Livio Soares | Exception-Less System Calls for Event-Driven Servers 7
  • 8. FlexSC component overview FlexSC and FlexSC-Threads presented at OSDI 2010 This work: libflexsc for event-driven servers 1) memcached throughput increase of up to 35% 2) nginx throughput increase of up to 120% Livio Soares | Exception-Less System Calls for Event-Driven Servers 8
  • 9. Benefits for event-driven applications 1) General purpose ➔ Any/all system calls can be asynchronous 2) Non-intrusive kernel implementation ➔ Does not require per syscall code 3) Facilitates multi-processor execution ➔ OS work is automatically distributed 4) Improved processor efficiency ➔ Reduces frequent user/kernel mode switches Livio Soares | Exception-Less System Calls for Event-Driven Servers 9
  • 10. Summary of exception-less syscalls Livio Soares | Exception-Less System Calls for Event-Driven Servers 10
  • 11. Exception-less interface: syscall page write(fd, buf, 4096); entry = free_syscall_entry(); /* write syscall */ entry->syscall = 1; entry->num_args = 3; entry->args[0] = fd; entry->args[1] = buf; entry->args[2] = 4096; entry->status = SUBMIT; SUBMIT while (entry->status != DONE) DONE do_something_else(); return entry->return_code; Livio Soares | Exception-Less System Calls for Event-Driven Servers 11
  • 12. Exception-less interface: syscall page write(fd, buf, 4096); entry = free_syscall_entry(); /* write syscall */ entry->syscall = 1; entry->num_args = 3; entry->args[0] = fd; entry->args[1] = buf; entry->args[2] = 4096; SUBMIT entry->status = SUBMIT; SUBMIT while (entry->status != DONE) DONE do_something_else(); return entry->return_code; Livio Soares | Exception-Less System Calls for Event-Driven Servers 12
  • 13. Exception-less interface: syscall page write(fd, buf, 4096); entry = free_syscall_entry(); /* write syscall */ entry->syscall = 1; entry->num_args = 3; entry->args[0] = fd; entry->args[1] = buf; entry->args[2] = 4096; DONE entry->status = SUBMIT; SUBMIT while (entry->status != DONE) DONE do_something_else(); return entry->return_code; Livio Soares | Exception-Less System Calls for Event-Driven Servers 13
  • 14. Syscall threads ➔ Kernel-only threads ➔ Part of application process ➔ Execute requests from syscall page ➔ Schedulable on a per-core basis Livio Soares | Exception-Less System Calls for Event-Driven Servers 14
  • 15. Dynamic multicore specialization Core 0 Core 1 Core 2 Core 3 1) FlexSC makes specializing cores simple 2) Dynamically adapts to workload needs Livio Soares | Exception-Less System Calls for Event-Driven Servers 15
  • 16. libflexsc: async syscall library ➔ Async syscall and notification library ➔ Similar to libevent ➔ But operates on syscalls instead of file-descriptors ➔ Three main components: 1) Provides main loop (dispatcher) 2) Support asynchronous syscall with associated callback to notify completion 3) Cancellation support Livio Soares | Exception-Less System Calls for Event-Driven Servers 16
  • 17. Main API: async system call 1  struct flexsc_cb { 2  void (*callback)(struct flexsc_cb *); /* event handler */ 3  void *arg;                            /* auxiliary var */ 4  int64_t ret;                          /* syscall return */ 5  } 6  7  int flexsc_##SYSCALL(struct flexsc_cb *, ... /*syscall args*/);   8   /* Example: asynchronous accept */ 9   struct flexsc_cb cb; 10   cb.callback = handle_accept; 11  flexsc_accept(&cb, master_sock, NULL, 0); 12  13   void handle_accept(struct flexsc_cb *cb) { 14    int fd = cb­>ret; 15    if (fd != ­1) { 16    struct flexsc_cb read_cb; 17    read_cb.callback = handle_read; 18    flexsc_read(&read_cb, fd, read_buf, read_count); 19    } 20   } Livio Soares | Exception-Less System Calls for Event-Driven Servers 17
  • 18. memcached port to libflexsc ➔ memcached: in-memory key/value store ➔ Simple code-base: 8K LOC ➔ Uses libevent ➔ Modified 293 LOC ➔ Transformed libevent calls to libflexsc ➔ Mostly in one file: memcached.c ➔ Most memcached syscalls are socket based Livio Soares | Exception-Less System Calls for Event-Driven Servers 18
  • 19. nginx port to libflexsc ➔ Most popular event-driven webserver ➔ Code base: 82K LOC ➔ Natively uses both non-blocking (epoll) I/O and asynchronous I/O ➔ Modified 255 LOC ➔ Socket based code already asynchronous ➔ Not all file-system calls were asynchronous ➔ e.g., open, fstat, getdents ➔ Special handling of stack allocated syscall args Livio Soares | Exception-Less System Calls for Event-Driven Servers 19
  • 20. Evaluation ➔ Linux 2.6.33 ➔ Nehalem (Core i7) server, 2.3GHz ➔ 4 cores ➔ Client connected through 1Gbps network ➔ Workloads ➔ memslap on memcached (30% user, 70% kernel) ➔ httperf on nginx (25% user, 75% kernel) ➔ Default Linux (“epoll”) vs. libflexsc (“flexsc”) Livio Soares | Exception-Less System Calls for Event-Driven Servers 20
  • 21. memcached on 4 cores 140000 Throughput (requests/sec.) 120000 100000 80000 30% improvement 60000 40000 flexsc 20000 epoll 0 0 200 400 600 800 1000 Request Concurrency Livio Soares | Exception-Less System Calls for Event-Driven Servers 21
  • 22. memcached processor metrics 1.2 User Kernel 1 Relative Performance 0.8 (flexsc/epoll) 0.6 0.4 0.2 0 L2 i-cache L2 i-cache CPI d-cache CPI d-cache Livio Soares | Exception-Less System Calls for Event-Driven Servers 22
  • 23. httperf on nginx (1 core) 120 flexsc 100 epoll Throughput (Mbps) 80 60 40 100% improvement 20 0 0 10000 20000 30000 40000 50000 60000 Requests/s Livio Soares | Exception-Less System Calls for Event-Driven Servers 23
  • 24. nginx processor metrics 1.2 User Kernel 1 Relative Performance 0.8 (flexsc/epoll) 0.6 0.4 0.2 0 L2 i-cache CPI d-cache Branch CPI d-cache Branch L2 i-cache Livio Soares | Exception-Less System Calls for Event-Driven Servers 24
  • 25. Concluding remarks ➔ Current event-based primitives add overhead ➔ I/O operations require frequent communication between OS and application ➔ libflexsc: exception-less syscall library 1) General purpose 2) Non-intrusive kernel implementation 3) Facilitates multi-processor execution 4) Improved processor efficiency ➔ Ported memcached and nginx to libflexsc ➔ Performance improvements of 30 - 120% Livio Soares | Exception-Less System Calls for Event-Driven Servers 25
  • 26. Exception-Less System Calls for Event-Driven Servers Livio Soares and Michael Stumm University of Toronto
  • 28. Difference in improvements Why does nginx improve more than memcached? 1) Frequency of mode switches: Server memcached nginx Frequency of syscalls 3,750 1,460 (in instructions) 2) nginx uses greater diversity of system calls ➔ More interference in processor structures (caches) 3) Instruction count reduction ➔ nginx with epoll() has connection timeouts Livio Soares | Exception-Less System Calls for Event-Driven Servers 28
  • 29. Limitations ➔ Scalability (number of outstanding syscalls) ➔ Interface: operations perform linear scan ➔ Implementation: overheads of syscall threads non-negligible ➔ Solutions ➔ Throttle syscalls at application or OS ➔ Switch interface to scalable message passing ➔ Provide exception-less versions of async I/O ➔ Make kernel fully non-blocking Livio Soares | Exception-Less System Calls for Event-Driven Servers 29
  • 30. Latency (ApacheBench) 30 epoll 25 flexsc Latency (ms) 20 15 10 50% latency reduction 5 0 1 core 2 cores Livio Soares | Exception-Less System Calls for Event-Driven Servers 30