Kqueue : Generic Event Notification




Mahendra M
Mahendra_M@infosys.com
http://www.infosys.com


This work is licensed under a Creative Commons License
http://creativecommons.org/licenses/by-sa/2.5/
Agenda
   Traditional ways of multiplexing I/O
   Methods and issues in handling asynchronous events.
   Enter Kqueue
   The Kqueue architecture.
   Kqueue possibilities.
Traditional File/Socket handling
   Traditionally a single file can be handled as below
    /* No error checking here */
    while ( ( i = read( fd, ... ) ) > 0 ) {
         do_something( with_this_data );
    }
   The above case works fine for one file descriptor
   What about the case where we have two or more such
    descriptors ( for example, sockets ) and data can appear on any
    one of them at any given point of time ?
    –   Basically, we need a mechanism for event driven applications.
    –   This is a case for multiplexing I/O ( or events ) !!
Traditional I/O multiplexing
   Use select() and/or poll()
   select() or poll() pass a list of file descriptors to the kernel
    and wait for updates to happen. On return, the list indicates
    which file descriptors got updated.
   select() takes the descriptors as a bitmap, with each bit being
    set or unset to represent a file descriptor. poll() takes an
    array of struct pollfd instead.
   select() and poll() can watch for read/write/exception events
    on the list of file descriptors.
   On return, the applications have to scan the entire bitmap or
    array to see which file descriptors have to be handled.
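
A minimal sketch of the poll() flavour described above, assuming a small fixed upper bound on the number of watched descriptors. read_from_any() and MAX_WATCHED are hypothetical names chosen for this illustration; only poll() and read() are real calls.

```c
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_WATCHED 64   /* illustrative limit, not imposed by poll() */

/*
 * Wait until one of fds[0..count-1] is readable, then read from the
 * first ready descriptor. Stores its index in *which.
 * Returns bytes read, or -1 on error or timeout.
 */
ssize_t read_from_any(const int *fds, size_t count, int timeout_ms,
                      void *buf, size_t len, int *which)
{
    struct pollfd pfds[MAX_WATCHED];
    size_t i;

    if (count > MAX_WATCHED)
        return -1;

    /* The whole array is built and handed to the kernel on every call. */
    for (i = 0; i < count; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)count, timeout_ms) <= 0)
        return -1;

    /* The application scans the whole array again to find the ready one. */
    for (i = 0; i < count; i++) {
        if (pfds[i].revents & POLLIN) {
            *which = (int)i;
            return read(fds[i], buf, len);
        }
    }
    return -1;
}
```

Note that the array is rebuilt and rescanned on each call even when only one descriptor is active, which is the cost the following slides discuss.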
Traditional I/O multiplexing ( contd.. )

fd_set fds;
FD_ZERO( &fds );
FD_SET( 5, &fds );
/* First argument is the highest descriptor number plus one */
n = select( 6, &fds, NULL, NULL, NULL );
j = 0;
for ( i = 0; (i < MAX) && (j < n); i++ ) {
    if ( FD_ISSET( i, &fds ) ) {
       read_something_from_socket( i );
       j++;
    }
}
Issues with select()/poll()
   Problems of scalability
    –   Entire descriptor set has to be passed to each invocation of
        the system call ( especially with poll() - which uses an array )
    –   Massive copies from user space to kernel space and vice versa
    –   Not all descriptors may have activity all the time
    –   On return, apps have to scan the entire list to check for
        updated descriptors. ( duplicated effort in kernel and app ) -
        O(N) activity
    –   Results in inefficient memory usage within the kernel
    –   In case the call has to sleep, the list has to be parsed three
        times.
   select()/poll() can handle only file descriptors
   Coding was clunky for select()
    –   Descriptor set is a bitmap of fixed size ( FD_SETSIZE,
        commonly 256 or 1024 depending on the system )
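
The points above can be seen in one iteration of a select() loop. This is a sketch only: drain_ready() is a hypothetical helper, and the 256-byte buffer is an arbitrary choice for the illustration.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * One pass of a select()-based loop over fds[0..count-1].
 * Reads once from every readable descriptor.
 * Returns the total number of bytes read, or -1 on error.
 */
long drain_ready(const int *fds, size_t count, long timeout_ms)
{
    fd_set rset;
    struct timeval tv;
    char buf[256];
    long total = 0;
    int maxfd = -1;
    size_t i;

    /* select() overwrites the set, so it is rebuilt on every call. */
    FD_ZERO(&rset);
    for (i = 0; i < count; i++) {
        /* The bitmap has a fixed size: larger descriptors do not fit. */
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return -1;
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (select(maxfd + 1, &rset, NULL, NULL, &tv) < 0)
        return -1;

    /* O(N) scan in the application, after the kernel did its own scan. */
    for (i = 0; i < count; i++) {
        if (FD_ISSET(fds[i], &rset)) {
            ssize_t n = read(fds[i], buf, sizeof buf);
            if (n < 0)
                return -1;
            total += n;
        }
    }
    return total;
}
```

Both loops run over every descriptor on every call, whether or not it had any activity.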
Other forms of interesting events
   Asynchronous signal notifications
    –   Required in libraries that may want to be notified of signals
   Asynchronous timer expiry
   Asynchronous Read/Write ( aio_read(), aio_write() )
   VFS changes
   Process state Changes
   Thread state changes
   Device driver notifications
   Anything else – that will require some asynchronous event
    notification – and the design allowing it.
Available solutions
   Linux 2.4 : SIGIO
   Sun Solaris : /dev/poll
   Linux 2.4 : /dev/epoll
    –   Use ioctl() to manipulate the above.
   Even Microsoft Windows had something to offer.
   Kqueue – for BSD boxes.
    –   We shall be talking about that now !!
Kqueue - Goals
   A generic event notification framework
    –   File descriptors (read/write/exceptions), Signals,
        Asynchronous I/O ( not in OSFR ), Vnodes monitoring,
        process monitoring, Timer events.
   A single system call to handle all this.
   Capability to add new functionality.
   Efficient use of memory
    –   Memory should be allocated as per need.
    –   Should be able to register/receive only the events of
        interest.
    –   Events should be combined ( eg: data arriving over a socket )
   Should be a good replacement for the standard calls.
   Should be possible to extend this functionality easily
Kqueue APIs
   int kqueue( void );
    –   Creates a kernel queue and returns a file descriptor for it. It
        can be deleted using the close() system call.
   int kevent( kq, changes, nc, events, ne,
    timeout );
    –   To register events in the kernel queue
    –   To receive events that occurred between consecutive calls.
    –   Can simulate select(), poll() - Using different values of timeout
    –   No need to store the event descriptors locally in the
        application.
   EV_SET( &event, ident, filter, flags,
    fflags, data, udata)
    –   Used to prepare an event for registering in the kernel queue.
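
A hedged sketch of how the three calls fit together, and of how the timeout argument selects blocking or polling behaviour. kq_wait_readable() and the HAVE_KQUEUE macro are hypothetical names for this illustration. The platform guard is an assumption about where <sys/event.h> exists; elsewhere the helper simply fails with ENOSYS.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#define HAVE_KQUEUE 1
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#endif

/*
 * Wait until fd is readable.
 * timeout_ms < 0  : block indefinitely ( NULL timeout )
 * timeout_ms == 0 : poll and return at once ( zero timespec )
 * Returns the number of bytes available, 0 on timeout ( or on EOF
 * with nothing left to read ), -1 on error.
 */
long kq_wait_readable(int fd, long timeout_ms)
{
#ifdef HAVE_KQUEUE
    struct kevent change, event;
    struct timespec ts, *tsp = NULL;
    int kq, n;

    kq = kqueue();
    if (kq == -1)
        return -1;

    if (timeout_ms >= 0) {
        ts.tv_sec = timeout_ms / 1000;
        ts.tv_nsec = (timeout_ms % 1000) * 1000000L;
        tsp = &ts;
    }

    /* Register the descriptor and wait for it in a single kevent() call. */
    EV_SET(&change, fd, EVFILT_READ, EV_ADD, 0, 0, 0);
    n = kevent(kq, &change, 1, &event, 1, tsp);
    close(kq);

    if (n <= 0)
        return n;
    if (event.flags & EV_ERROR) {
        errno = (int)event.data;   /* registration failed for this event */
        return -1;
    }
    return (long)event.data;       /* bytes waiting to be read */
#else
    (void)fd;
    (void)timeout_ms;
    errno = ENOSYS;                /* no kqueue on this platform */
    return -1;
#endif
}
```

A real application would keep the kqueue open across calls instead of creating one per wait. It is created and closed here only to keep the sketch self-contained.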
Kqueue sample code
struct kevent kev[10];
kq = kqueue();
// Prepare an event
EV_SET( &kev[0], fd, EVFILT_READ, EV_ADD, 0, 0, 0);
// Register the one prepared event ( nchanges is 1, not 10 )
// timeout is a struct timespec *, or NULL to block
kevent( kq, kev, 1, NULL, 0, timeout );

// Receive up to 10 events
n = kevent( kq, NULL, 0, kev, 10, timeout );
for ( i = 0; i < n; i++ ) {
    // Do something with kev[i]
}
Kqueue filter types
   READ : Returns when data is available for read from
    sockets, vnodes, fifos, pipes
    –   ident = descriptor
    –   Data = amount of data to be read
    –   Flags = can be EOF etc.
   WRITE : Returns when it is possible to write to a descriptor
    ( ident ).
    –   Data = amount of data that can be written
   VNODE : Returns when the file ( vnode ) behind a descriptor changes
    –   fflags = delete, write, extend, attrib, link, rename, revoke
Kqueue filter types ( contd... )
   PROC : Monitors a process
    –   Ident = pid of the process to be monitored.
    –   Fflags = Exit, fork, exec, track, trackerr
   SIGNAL : Returns when a signal is delivered to a process.
    –   Ident = signal number
    –   Data = number of times the signal was delivered.
    –   Co-exists with signal() and sigaction() - and has a lower
        precedence.
    –   Is delivered even if SIG_IGN is set for the signal
   TIMER : Establishes a timer
    –   ident = timer id, Data = timeout in milliseconds when
        registering, number of expiries when the event is returned
    –   Periodic by default unless ONESHOT is specified
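
As an illustration of the TIMER filter, the sketch below registers a periodic timer and counts expiries. kq_count_ticks() and the HAVE_KQUEUE macro are hypothetical names, the timer id 1 is an arbitrary choice, and the platform guard is an assumption about where <sys/event.h> is available.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#define HAVE_KQUEUE 1
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#endif

/*
 * Start a periodic timer of period_ms milliseconds and block until it
 * has expired at least `ticks` times.
 * Returns the number of expiries seen, or -1 on error.
 */
long kq_count_ticks(int period_ms, int ticks)
{
#ifdef HAVE_KQUEUE
    struct kevent change, event;
    long seen = 0;
    int kq = kqueue();

    if (kq == -1)
        return -1;

    /* ident is a caller-chosen timer id, data is the period in ms. */
    EV_SET(&change, 1, EVFILT_TIMER, EV_ADD, 0, period_ms, 0);
    if (kevent(kq, &change, 1, NULL, 0, NULL) == -1) {
        close(kq);
        return -1;
    }

    while (seen < ticks) {
        if (kevent(kq, NULL, 0, &event, 1, NULL) != 1) {
            close(kq);
            return -1;
        }
        /* On return, data holds the expiries since the last retrieval. */
        seen += (long)event.data;
    }

    close(kq);
    return seen;
#else
    (void)period_ms;
    (void)ticks;
    errno = ENOSYS;                /* no kqueue on this platform */
    return -1;
#endif
}
```

Adding EV_ONESHOT to the flags would make the timer fire once instead of periodically.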
Kqueue Flags
   ADD : To add an event to the queue
   ENABLE : To enable a disabled event
   DISABLE : To temporarily disable an event ( not deleted )
   DELETE : Remove an event from the kernel queue
   ONESHOT : Cause the event to happen only once.
   CLEAR : Reset the state of the event after the user retrieves it
   EOF : Set by the kernel on return to indicate an end-of-file
    condition
   ERROR : Set by the kernel on return to report an error for an
    event
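
A small sketch of what ONESHOT changes in practice, assuming fd already has unread data. kq_oneshot_demo() and the HAVE_KQUEUE macro are hypothetical names, and the platform guard is an assumption about where <sys/event.h> is available.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#define HAVE_KQUEUE 1
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#endif

/*
 * Register a ONESHOT read event on fd, then poll the queue twice
 * without reading from fd. Stores the event count of each poll in
 * *first and *second. Returns 0 on success, -1 on error.
 */
int kq_oneshot_demo(int fd, int *first, int *second)
{
#ifdef HAVE_KQUEUE
    struct kevent change, event;
    struct timespec zero = { 0, 0 };   /* zero timeout: just poll */
    int kq = kqueue();

    if (kq == -1)
        return -1;

    EV_SET(&change, fd, EVFILT_READ, EV_ADD | EV_ONESHOT, 0, 0, 0);
    if (kevent(kq, &change, 1, NULL, 0, NULL) == -1) {
        close(kq);
        return -1;
    }

    /* Retrieving a ONESHOT event also removes it from the queue. */
    *first = kevent(kq, NULL, 0, &event, 1, &zero);
    *second = kevent(kq, NULL, 0, &event, 1, &zero);

    close(kq);
    return 0;
#else
    (void)fd;
    (void)first;
    (void)second;
    errno = ENOSYS;                    /* no kqueue on this platform */
    return -1;
#endif
}
```

On a system with kqueue, the first poll would be expected to report one event and the second none, even though the data is still unread. Without ONESHOT, the read filter would keep reporting the descriptor until the data is consumed.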
Kqueue – Good things
   As you would have seen, it is extremely scalable in
    handling large numbers of file descriptors
     –   Eliminates most of the deficiencies of select()/poll()
     –   Currently, efforts are underway to migrate some popular
         daemons ( Apache ) to use Kqueue.
   It supports a wide range of events – not just file descriptors.
   Is easily extensible.
   New kqueue filters can be added very easily inside the BSD
    kernels.
   Opens up a lot of interesting possibilities.
Issues with Kqueue
   Kqueue calls are not part of POSIX specifications.
    –   Most of the Unix systems do not implement it.
    –   Breaks portability across Unices
   Third party code may still use select(), poll() etc. We may
    have to migrate this or allow these to co-exist
   Relatively new in the field, so not yet time-tested.
References
   Kqueue: A generic and scalable event notification facility -
    Jonathan Lemon
        http://people.freebsd.org/~jlemon/papers/kqueue.pdf
   Man pages for kqueue, knote, kfilter_register
   Read the source, Luke !!
Finally ...
   Questions ??
   Thanks to
    –   Organizers for giving me a chance to speak at GNUnify 2006
    –   NetBSD and Linux developers who helped me during my work
    –   To Infosys for sponsoring my visit to GNUnify 2006
   Special thanks to YOU for listening...


                      You can contact me at :
                    Mahendra_M@infosys.com

Contenu connexe

Tendances

The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelDivye Kapoor
 
Modern Linux Tracing Landscape
Modern Linux Tracing LandscapeModern Linux Tracing Landscape
Modern Linux Tracing LandscapeKernel TLV
 
Realizing Linux Containers (LXC)
Realizing Linux Containers (LXC)Realizing Linux Containers (LXC)
Realizing Linux Containers (LXC)Boden Russell
 
The Silence of the Canaries
The Silence of the CanariesThe Silence of the Canaries
The Silence of the CanariesKernel TLV
 
Library Operating System for Linux #netdev01
Library Operating System for Linux #netdev01Library Operating System for Linux #netdev01
Library Operating System for Linux #netdev01Hajime Tazaki
 
Make Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance ToolsMake Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance ToolsKernel TLV
 
LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1Hajime Tazaki
 
Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Hajime Tazaki
 
Cgroups, namespaces, and beyond: what are containers made from? (DockerCon Eu...
Cgroups, namespaces, and beyond: what are containers made from? (DockerCon Eu...Cgroups, namespaces, and beyond: what are containers made from? (DockerCon Eu...
Cgroups, namespaces, and beyond: what are containers made from? (DockerCon Eu...Jérôme Petazzoni
 
Portable TeX Documents (PTD): PackagingCon 2021
Portable TeX Documents (PTD): PackagingCon 2021Portable TeX Documents (PTD): PackagingCon 2021
Portable TeX Documents (PTD): PackagingCon 2021Jonathan Fine
 
Kernel Proc Connector and Containers
Kernel Proc Connector and ContainersKernel Proc Connector and Containers
Kernel Proc Connector and ContainersKernel TLV
 
Virtualization which isn't: LXC (Linux Containers)
Virtualization which isn't: LXC (Linux Containers)Virtualization which isn't: LXC (Linux Containers)
Virtualization which isn't: LXC (Linux Containers)Dobrica Pavlinušić
 
Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545Kernel TLV
 
protothread and its usage in contiki OS
protothread and its usage in contiki OSprotothread and its usage in contiki OS
protothread and its usage in contiki OSSalah Amean
 
Introduction to eBPF and XDP
Introduction to eBPF and XDPIntroduction to eBPF and XDP
Introduction to eBPF and XDPlcplcp1
 
LXC on Ganeti
LXC on GanetiLXC on Ganeti
LXC on Ganetikawamuray
 
RxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance ResultsRxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance ResultsBrendan Gregg
 
How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.Naoto MATSUMOTO
 
Playing BBR with a userspace network stack
Playing BBR with a userspace network stackPlaying BBR with a userspace network stack
Playing BBR with a userspace network stackHajime Tazaki
 
The 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce Richardson
The 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce RichardsonThe 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce Richardson
The 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce Richardsonharryvanhaaren
 

Tendances (20)

The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux Kernel
 
Modern Linux Tracing Landscape
Modern Linux Tracing LandscapeModern Linux Tracing Landscape
Modern Linux Tracing Landscape
 
Realizing Linux Containers (LXC)
Realizing Linux Containers (LXC)Realizing Linux Containers (LXC)
Realizing Linux Containers (LXC)
 
The Silence of the Canaries
The Silence of the CanariesThe Silence of the Canaries
The Silence of the Canaries
 
Library Operating System for Linux #netdev01
Library Operating System for Linux #netdev01Library Operating System for Linux #netdev01
Library Operating System for Linux #netdev01
 
Make Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance ToolsMake Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance Tools
 
LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1
 
Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)
 
Cgroups, namespaces, and beyond: what are containers made from? (DockerCon Eu...
Cgroups, namespaces, and beyond: what are containers made from? (DockerCon Eu...Cgroups, namespaces, and beyond: what are containers made from? (DockerCon Eu...
Cgroups, namespaces, and beyond: what are containers made from? (DockerCon Eu...
 
Portable TeX Documents (PTD): PackagingCon 2021
Portable TeX Documents (PTD): PackagingCon 2021Portable TeX Documents (PTD): PackagingCon 2021
Portable TeX Documents (PTD): PackagingCon 2021
 
Kernel Proc Connector and Containers
Kernel Proc Connector and ContainersKernel Proc Connector and Containers
Kernel Proc Connector and Containers
 
Virtualization which isn't: LXC (Linux Containers)
Virtualization which isn't: LXC (Linux Containers)Virtualization which isn't: LXC (Linux Containers)
Virtualization which isn't: LXC (Linux Containers)
 
Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545
 
protothread and its usage in contiki OS
protothread and its usage in contiki OSprotothread and its usage in contiki OS
protothread and its usage in contiki OS
 
Introduction to eBPF and XDP
Introduction to eBPF and XDPIntroduction to eBPF and XDP
Introduction to eBPF and XDP
 
LXC on Ganeti
LXC on GanetiLXC on Ganeti
LXC on Ganeti
 
RxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance ResultsRxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance Results
 
How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.
 
Playing BBR with a userspace network stack
Playing BBR with a userspace network stackPlaying BBR with a userspace network stack
Playing BBR with a userspace network stack
 
The 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce Richardson
The 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce RichardsonThe 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce Richardson
The 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce Richardson
 

Similaire à Kqueue : Generic Event notification

Forensic artifacts in modern linux systems
Forensic artifacts in modern linux systemsForensic artifacts in modern linux systems
Forensic artifacts in modern linux systemsGol D Roger
 
REAL TIME OPERATING SYSTEM PART 2
REAL TIME OPERATING SYSTEM PART 2REAL TIME OPERATING SYSTEM PART 2
REAL TIME OPERATING SYSTEM PART 2Embeddedcraft Craft
 
Docker Runtime Security
Docker Runtime SecurityDocker Runtime Security
Docker Runtime SecuritySysdig
 
Auditing the Opensource Kernels
Auditing the Opensource KernelsAuditing the Opensource Kernels
Auditing the Opensource KernelsSilvio Cesare
 
A Scalable I/O Manager for GHC
A Scalable I/O Manager for GHCA Scalable I/O Manager for GHC
A Scalable I/O Manager for GHCJohan Tibell
 
Linux Performance Tunning Kernel
Linux Performance Tunning KernelLinux Performance Tunning Kernel
Linux Performance Tunning KernelShay Cohen
 
Lxc – next gen virtualization for cloud intro (cloudexpo)
Lxc – next gen virtualization for cloud   intro (cloudexpo)Lxc – next gen virtualization for cloud   intro (cloudexpo)
Lxc – next gen virtualization for cloud intro (cloudexpo)Boden Russell
 
The building blocks of docker.
The building blocks of docker.The building blocks of docker.
The building blocks of docker.Chafik Belhaoues
 
Containerization is more than the new Virtualization: enabling separation of ...
Containerization is more than the new Virtualization: enabling separation of ...Containerization is more than the new Virtualization: enabling separation of ...
Containerization is more than the new Virtualization: enabling separation of ...Jérôme Petazzoni
 
brief intro to Linux device drivers
brief intro to Linux device driversbrief intro to Linux device drivers
brief intro to Linux device driversAlexandre Moreno
 
Daniel Krasner - High Performance Text Processing with Rosetta
Daniel Krasner - High Performance Text Processing with Rosetta Daniel Krasner - High Performance Text Processing with Rosetta
Daniel Krasner - High Performance Text Processing with Rosetta PyData
 
Operating System 4 1193308760782240 2
Operating System 4 1193308760782240 2Operating System 4 1193308760782240 2
Operating System 4 1193308760782240 2mona_hakmy
 
Operating System 4
Operating System 4Operating System 4
Operating System 4tech2click
 
DevSecCon Singapore 2018 - System call auditing made effective with machine l...
DevSecCon Singapore 2018 - System call auditing made effective with machine l...DevSecCon Singapore 2018 - System call auditing made effective with machine l...
DevSecCon Singapore 2018 - System call auditing made effective with machine l...DevSecCon
 
Sector Sphere 2009
Sector Sphere 2009Sector Sphere 2009
Sector Sphere 2009lilyco
 

Similaire à Kqueue : Generic Event notification (20)

Linux IO
Linux IOLinux IO
Linux IO
 
UNIX Operating System ppt
UNIX Operating System pptUNIX Operating System ppt
UNIX Operating System ppt
 
Forensic artifacts in modern linux systems
Forensic artifacts in modern linux systemsForensic artifacts in modern linux systems
Forensic artifacts in modern linux systems
 
Basic Linux Internals
Basic Linux InternalsBasic Linux Internals
Basic Linux Internals
 
REAL TIME OPERATING SYSTEM PART 2
REAL TIME OPERATING SYSTEM PART 2REAL TIME OPERATING SYSTEM PART 2
REAL TIME OPERATING SYSTEM PART 2
 
Docker Runtime Security
Docker Runtime SecurityDocker Runtime Security
Docker Runtime Security
 
Auditing the Opensource Kernels
Auditing the Opensource KernelsAuditing the Opensource Kernels
Auditing the Opensource Kernels
 
A Scalable I/O Manager for GHC
A Scalable I/O Manager for GHCA Scalable I/O Manager for GHC
A Scalable I/O Manager for GHC
 
Linux Performance Tunning Kernel
Linux Performance Tunning KernelLinux Performance Tunning Kernel
Linux Performance Tunning Kernel
 
Lxc – next gen virtualization for cloud intro (cloudexpo)
Lxc – next gen virtualization for cloud   intro (cloudexpo)Lxc – next gen virtualization for cloud   intro (cloudexpo)
Lxc – next gen virtualization for cloud intro (cloudexpo)
 
The building blocks of docker.
The building blocks of docker.The building blocks of docker.
The building blocks of docker.
 
Containerization is more than the new Virtualization: enabling separation of ...
Containerization is more than the new Virtualization: enabling separation of ...Containerization is more than the new Virtualization: enabling separation of ...
Containerization is more than the new Virtualization: enabling separation of ...
 
brief intro to Linux device drivers
brief intro to Linux device driversbrief intro to Linux device drivers
brief intro to Linux device drivers
 
Daniel Krasner - High Performance Text Processing with Rosetta
Daniel Krasner - High Performance Text Processing with Rosetta Daniel Krasner - High Performance Text Processing with Rosetta
Daniel Krasner - High Performance Text Processing with Rosetta
 
Operating System 4 1193308760782240 2
Operating System 4 1193308760782240 2Operating System 4 1193308760782240 2
Operating System 4 1193308760782240 2
 
Operating System 4
Operating System 4Operating System 4
Operating System 4
 
DevSecCon Singapore 2018 - System call auditing made effective with machine l...
DevSecCon Singapore 2018 - System call auditing made effective with machine l...DevSecCon Singapore 2018 - System call auditing made effective with machine l...
DevSecCon Singapore 2018 - System call auditing made effective with machine l...
 
UNIX Basics and Cluster Computing
UNIX Basics and Cluster ComputingUNIX Basics and Cluster Computing
UNIX Basics and Cluster Computing
 
Unix 3 en
Unix 3 enUnix 3 en
Unix 3 en
 
Sector Sphere 2009
Sector Sphere 2009Sector Sphere 2009
Sector Sphere 2009
 

Dernier

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Dernier (20)

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Kqueue : Generic Event notification

  • 1. Kqueue : Generic Event Notification Mahendra M Mahendra_M@infosys.com http://www.infosys.com This work is licensed under a Creative Commons License http://creativecommons.org/licenses/by-sa/2.5/
  • 2. Agenda  Traditional ways of multiplexing I/O  Methods and issues in handling asynchronous events.  Enter Kqueue  The Kqueue architecture.  Kqueue possibilities.
  • 3. Traditional File/Socket handling  Traditionally a single file can be handled as below /* No error checking here */ while ( i = read( fd, ... ) ) { do_something( with_this_data ); }  The above case works fine for one file descriptor  What about the case where we have two or more such descriptors ( for sockets ) and data can appear on any one of the socket at any given point of time ? – Basically, we need a mechanism for event driven applications. – This is a case for multiplexing I/O ( or events ) !!
  • 4. Traditional I/O multiplexing  Use select() and/or poll()  select() or poll() pass a list of file descriptors to the kernel and wait for updates to happen. On receiving an update these calls have the list of file descriptors that got updated.  File descriptors passed as a bitmap – with each bit being set or unset to represent a file descriptor.  Select() and poll() can watch for read/write/exception events on the list of file descriptors.  On return, the applications have to parse the entire bitmap to see which file descriptors have to be handled.
  • 5. Traditional I/O multiplexing ( contd.. )
          fd_set fds;
          FD_ZERO( &fds );
          FD_SET( 5, &fds );
          /* First argument is the highest descriptor plus one, not a count */
          n = select( 6, &fds, NULL, NULL, NULL );
          j = 0;
          for ( i = 0; (i < MAX) && (j < n); i++ ) {
              if ( FD_ISSET( i, &fds ) ) {
                  read_something_from_socket( i );
                  j++;
              }
          }
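The select() loop above has a poll() counterpart, which may make the "scan the whole list on return" cost easier to see. The following is only a minimal sketch: the helper name first_readable and the MAX_FDS limit are assumptions made for illustration and do not come from the slides.

```c
#include <assert.h>
#include <poll.h>

#define MAX_FDS 16   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: wait up to timeout_ms for any descriptor in fds[]
 * to become readable. Returns the index of the first readable descriptor,
 * or -1 on timeout or error. */
int first_readable(const int *fds, int nfds, int timeout_ms)
{
    struct pollfd pfds[MAX_FDS];
    int i, n;

    assert(nfds > 0 && nfds <= MAX_FDS);

    /* The whole array is rebuilt and handed to the kernel on every call. */
    for (i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    n = poll(pfds, (nfds_t)nfds, timeout_ms);
    if (n <= 0)
        return -1;

    /* The application scans the whole array again to find the activity. */
    for (i = 0; i < nfds; i++) {
        if (pfds[i].revents & POLLIN)
            return i;
    }
    return -1;
}
```

Both loops run over every registered descriptor on every call, whether or not it had any activity.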
  • 6. Issues with select()/poll()
      ▪ Problems of scalability
          – The entire descriptor set has to be passed to each invocation of the system call ( especially with poll(), which uses an array )
          – Massive copies from user space to kernel space and vice-versa
          – Not all descriptors may have activity all the time
          – On return, apps have to parse the entire list to check for updated descriptors ( duplicated effort in kernel and app ) - O(N) activity
          – Results in inefficient memory usage within the kernel
          – If the call has to sleep, the list ends up being parsed three times.
      ▪ select()/poll() can handle only file descriptors
      ▪ Coding was clunky for select()
          – Descriptor set is a bitmap of fixed size ( FD_SETSIZE, 256 by default on NetBSD )
  • 7. Other forms of interesting events
      ▪ Asynchronous signal notifications
          – Required in libraries that may want to be notified of signals
      ▪ Asynchronous timer expiry
      ▪ Asynchronous Read/Write ( aio_read(), aio_write() )
      ▪ VFS changes
      ▪ Process state changes
      ▪ Thread state changes
      ▪ Device driver notifications
      ▪ Anything else that requires asynchronous event notification, provided the design allows it.
  • 8. Available solutions
      ▪ Linux 2.4 : SIGIO
      ▪ Sun Solaris : /dev/poll
      ▪ Linux 2.4 : /dev/epoll
          – Use ioctl() to manipulate the above.
      ▪ Even Microsoft Windows had something to offer.
      ▪ Kqueue – for BSD boxes.
          – We shall be talking about that now !!
  • 9. Kqueue - Goals
      ▪ A generic event notification framework
          – File descriptors ( read/write/exceptions ), signals, asynchronous I/O ( not in OSFR ), vnode monitoring, process monitoring, timer events.
      ▪ A single system call to handle all this.
      ▪ Capability to add new functionality.
      ▪ Efficient use of memory
          – Memory should be allocated as per need.
          – An application should be able to register and receive only as many events as it is interested in.
          – Events should be combined ( eg: data arriving over a socket )
      ▪ Should be a good replacement for the standard calls.
      ▪ Should be possible to extend this functionality easily
  • 10. Kqueue APIs
      ▪ int kqueue( void );
          – Creates a kernel queue and returns a descriptor for it. The descriptor behaves like a file descriptor and is released using the close() system call.
      ▪ int kevent( kq, changes, nc, events, ne, timeout );
          – To register events in the kernel queue
          – To receive events that occurred between consecutive calls.
          – Can simulate select() and poll() by using different values of timeout
          – No need to store the event descriptors locally in the application.
      ▪ EV_SET( &event, ident, filter, flags, fflags, data, udata )
          – Used to prepare an event for registering in the kernel queue.
  • 11. Kqueue sample code
          kq = kqueue();
          struct kevent kev[10];
          // Prepare an event
          EV_SET( &kev[0], fd, EVFILT_READ, EV_ADD, 0, 0, 0 );
          // Register the one event prepared above
          kevent( kq, kev, 1, NULL, 0, timeout );
          // Receive up to 10 events
          n = kevent( kq, NULL, 0, kev, 10, timeout );
          for ( i = 0; i < n; i++ ) {
              // Do something
          }
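The register-then-receive pattern in the sample can also be folded into a single kevent() call. The sketch below is one possible way to do that, not the author's code: the helper name kq_wait_readable is hypothetical, and the platform guard with its ENOSYS stub is an assumption added so that the file still compiles on systems without kqueue.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#if defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || \
    defined(__OpenBSD__) || defined(__DragonFly__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns the number of bytes the kernel reports as pending, 0 on timeout,
 * or -1 on error. */
long kq_wait_readable(int fd, int timeout_ms)
{
    struct kevent change, event;
    struct timespec ts;
    long pending = 0;
    int kq, n;

    assert(fd >= 0);

    kq = kqueue();
    if (kq < 0)
        return -1;

    EV_SET(&change, fd, EVFILT_READ, EV_ADD, 0, 0, 0);
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (long)(timeout_ms % 1000) * 1000000L;

    /* One call registers the change and waits for at most one event. */
    n = kevent(kq, &change, 1, &event, 1, &ts);
    if (n < 0 || (n > 0 && (event.flags & EV_ERROR)))
        pending = -1;
    else if (n > 0)
        pending = (long)event.data;   /* READ filter: bytes available */

    close(kq);   /* closing the queue drops its registrations */
    return pending;
}
#else
/* Stub for systems without kqueue (an assumption of this sketch). */
long kq_wait_readable(int fd, int timeout_ms)
{
    (void)fd;
    (void)timeout_ms;
    errno = ENOSYS;
    return -1;
}
#endif
```

A long-running server would normally create the queue once and keep it open, so that registrations persist between calls. The helper opens and closes it on every call only to stay self-contained.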
  • 12. Kqueue filter types
      ▪ READ : Returns when data is available for reading from sockets, vnodes, fifos, pipes
          – ident = descriptor
          – data = amount of data available to be read
          – flags = can carry EOF etc.
      ▪ WRITE : Returns when it is possible to write to a descriptor ( ident ).
          – data = amount of data that can be written
      ▪ VNODE : Returns when the file behind a descriptor changes
          – fflags = delete, write, extend, attrib, link, rename, revoke
  • 13. Kqueue filter types ( contd... )
      ▪ PROC : Monitors a process
          – ident = pid of the process to be monitored.
          – fflags = exit, fork, exec, track, trackerr
      ▪ SIGNAL : Returns when a signal is delivered to a process.
          – ident = signal number
          – data = number of times the signal was delivered.
          – Co-exists with signal() and sigaction(), and has a lower precedence.
          – Is delivered even if SIG_IGN is set for the signal
      ▪ TIMER : Establishes a timer
          – ident = timer id
          – data = timeout in milliseconds when registering; number of times the timer expired when the event is returned
          – Periodic by default unless ONESHOT is specified
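As an illustration of the TIMER filter, a one-shot timer can stand in for a millisecond sleep. This is a rough sketch under stated assumptions: kq_sleep_ms is a hypothetical helper, the timer id 1 is arbitrary, and the non-kqueue stub exists only so the file compiles elsewhere.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#if defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || \
    defined(__OpenBSD__) || defined(__DragonFly__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

/* Hypothetical helper: block until a one-shot kqueue timer of ms
 * milliseconds fires. Returns 0 on success, -1 on error. */
int kq_sleep_ms(int ms)
{
    struct kevent change, event;
    int kq, n;

    assert(ms > 0);

    kq = kqueue();
    if (kq < 0)
        return -1;

    /* ident 1 is an arbitrary timer id; data carries the timeout in ms.
     * EV_ONESHOT overrides the default periodic behaviour. */
    EV_SET(&change, 1, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0, ms, 0);

    /* A NULL timeout makes kevent() block until an event arrives. */
    n = kevent(kq, &change, 1, &event, 1, NULL);
    close(kq);

    if (n != 1 || (event.flags & EV_ERROR))
        return -1;
    return (event.filter == EVFILT_TIMER && event.ident == 1) ? 0 : -1;
}
#else
/* Stub for systems without kqueue (an assumption of this sketch). */
int kq_sleep_ms(int ms)
{
    (void)ms;
    errno = ENOSYS;
    return -1;
}
#endif
```

Without EV_ONESHOT the same registration would keep firing every ms milliseconds, and data in each returned event would report how many expirations had accumulated since the last retrieval.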
  • 14. Kqueue Flags
      ▪ ADD : To add an event to the queue
      ▪ ENABLE : To enable a disabled event
      ▪ DISABLE : To temporarily disable an event ( not deleted )
      ▪ DELETE : Remove an event from the kernel queue
      ▪ ONESHOT : Deliver the event only once.
      ▪ CLEAR : Clear the state of the filter after the event is received
      ▪ EOF : End-of-file condition ( set on returned events )
      ▪ ERROR : Specific errors ( set on returned events )
  • 15. Kqueue – Good things
      ▪ As you would have seen
          – It is extremely scalable in handling large numbers of file descriptors
          – Eliminates most of the deficiencies of select()/poll()
          – Currently, efforts are underway to migrate some popular daemons ( Apache ) to use Kqueue.
      ▪ It supports a wide range of events, not just file descriptors.
      ▪ Is easily extensible.
      ▪ New kqueue filters can be added very easily inside the BSD kernels.
      ▪ Opens up a lot of interesting possibilities.
  • 16. Issues with Kqueue
      ▪ Kqueue calls are not part of the POSIX specifications.
          – Most Unix systems do not implement it.
          – Breaks portability across Unices
      ▪ Third party code may still use select(), poll() etc. We may have to migrate this code or allow the two to co-exist
      ▪ Relatively new in the field
          – Not time-tested.
  • 17. References
      ▪ Kqueue: A generic and scalable event notification facility - Jonathan Lemon
        http://people.freebsd.org/~jlemon/papers/kqueue.pdf
      ▪ Man pages for kqueue, knote, kfilter_register
      ▪ Read the source, Luke !!
  • 18. Finally ...
      ▪ Questions ??
      ▪ Thanks to
          – Organizers for giving me a chance to speak at GNUnify 2006
          – NetBSD and Linux developers who helped me during my work
          – Infosys for sponsoring my visit to GNUnify 2006
      ▪ Special thanks to YOU for listening...
      You can contact me at : Mahendra_M@infosys.com