1. Kqueue : Generic Event Notification
Mahendra M
Mahendra_M@infosys.com
http://www.infosys.com
This work is licensed under a Creative Commons License
http://creativecommons.org/licenses/by-sa/2.5/
2. Agenda
Traditional ways of multiplexing I/O
Methods and issues in handling asynchronous events.
Enter Kqueue
The Kqueue architecture.
Kqueue possibilities.
3. Traditional File/Socket handling
Traditionally a single file can be handled as below
/* Minimal error handling: stop at EOF (0) or on error (-1) */
char buf[4096];
ssize_t n;
while ( ( n = read( fd, buf, sizeof( buf ) ) ) > 0 ) {
    do_something( buf, n );
}
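As a sketch, the single-descriptor loop can be wrapped into a self-contained function. count_bytes() is a hypothetical helper that only totals what it reads; it also retries reads interrupted by a signal (EINTR), which the short loop ignores.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read fd until EOF and return the total number of
 * bytes seen, or -1 on a read error. read() blocks while no data is
 * available, so nothing else can be serviced in the meantime. */
long count_bytes(int fd)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    for (;;) {
        n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            total += n;          /* "do_something" with buf[0..n) */
        } else if (n == 0) {
            return total;        /* EOF: the writer closed its end */
        } else if (errno != EINTR) {
            return -1;           /* a real error */
        }                        /* EINTR: interrupted by a signal, retry */
    }
}
```

With one descriptor this is all that is needed; the blocking read() is exactly what breaks down once several descriptors must be watched.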
The above case works fine for one file descriptor
What about the case where we have two or more such
descriptors ( e.g. sockets ) and data can arrive on any one
of them at any point in time?
– Basically, we need a mechanism for event driven applications.
– This is a case for multiplexing I/O ( or events ) !!
4. Traditional I/O multiplexing
Use select() and/or poll()
select() and poll() pass a list of file descriptors to the kernel
and wait for activity. On return, they report which of those
descriptors are ready.
select() passes descriptors as a bitmap ( fd_set ), with one bit
per descriptor; poll() passes an array of struct pollfd.
select() and poll() can watch for read/write/exception events
on the listed descriptors.
On return, the application has to scan the entire set to
find the descriptors that need handling.
5. Traditional I/O multiplexing ( contd.. )
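fd_set fds;
int maxfd = 5, n, i, j;
FD_ZERO( &fds );
FD_SET( maxfd, &fds );
/* nfds is the highest descriptor + 1, not the number of descriptors */
n = select( maxfd + 1, &fds, NULL, NULL, NULL );
/* Scan the bitmap, stopping once all n ready descriptors are found */
for ( i = 0, j = 0; ( i <= maxfd ) && ( j < n ); i++ ) {
    if ( FD_ISSET( i, &fds ) ) {
        read_something_from_socket( i );
        j++;
    }
}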
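For comparison, here is a minimal sketch of the same kind of wait written with poll(). first_readable() is a hypothetical helper: it copies the caller's descriptors into an array of struct pollfd, hands the whole array to the kernel, and then scans every entry's revents, which is the O(N) pattern criticised on the next slide.

```c
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>   /* pipe() and write() in the usage note below */

/* Wait up to timeout_ms for any of fds[0..count) to become readable and
 * return the first ready descriptor, or -1 on timeout or error. */
int first_readable(const int *fds, nfds_t count, int timeout_ms)
{
    struct pollfd *pfds = calloc(count, sizeof(*pfds));
    int ready = -1;
    nfds_t i;

    if (pfds == NULL)
        return -1;
    for (i = 0; i < count; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;        /* readability only */
    }
    /* The full array is copied into the kernel on every call. */
    if (poll(pfds, count, timeout_ms) > 0) {
        for (i = 0; i < count; i++) {   /* O(N) scan of the results */
            if (pfds[i].revents & POLLIN) {
                ready = pfds[i].fd;
                break;
            }
        }
    }
    free(pfds);
    return ready;
}
```

For example, with the read ends of two pipes in fds and data written only to the second, first_readable(fds, 2, 1000) returns the second descriptor. Unlike select(), poll() has no FD_SETSIZE limit, but it still passes and scans the whole set each time.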
6. Issues with select()/poll()
Problems of scalability
– The entire descriptor set has to be passed on each invocation of
the system call ( especially with poll(), which uses an array )
– Massive copies from user space to kernel space and vice versa
– Not all descriptors may have activity all the time
– On return, apps have to scan the entire list to find the
updated descriptors ( duplicated effort in kernel and app ):
O(N) activity
– Results in inefficient memory usage within the kernel
– If the call has to sleep, the list is walked three times.
select()/poll() can handle only file descriptors
Coding with select() is clunky
– The descriptor set is a fixed-size bitmap ( FD_SETSIZE bits,
256 by default on NetBSD )
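Two of these costs show up directly in code. A minimal sketch, assuming a hypothetical helper build_fd_set(): because select() overwrites its fd_set on return, the set has to be rebuilt before every call, and any descriptor at or above FD_SETSIZE simply cannot be represented.

```c
#include <sys/select.h>

/* Rebuild an fd_set from scratch and return the nfds argument for
 * select(), i.e. the highest descriptor + 1. Returns -1 if any descriptor
 * falls outside the fixed-size bitmap: FD_SET() on fd >= FD_SETSIZE is
 * undefined behaviour. */
int build_fd_set(const int *fds, int count, fd_set *set)
{
    int i, maxfd = -1;

    FD_ZERO(set);
    for (i = 0; i < count; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return -1;               /* cannot be watched with select() */
        FD_SET(fds[i], set);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    return maxfd + 1;
}
```

A server that can hold more than FD_SETSIZE connections therefore has to fall back to poll() or a mechanism like kqueue, regardless of performance.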
7. Other forms of interesting events
Asynchronous signal notifications
– Required in libraries that may want to be notified of signals
Asynchronous timer expiry
Asynchronous Read/Write ( aio_read(), aio_write() )
VFS changes
Process state changes
Thread state changes
Device driver notifications
Anything else that needs asynchronous event notification,
provided the design allows it.
8. Available solutions
Linux 2.4 : SIGIO
Sun Solaris : /dev/poll
Linux 2.4 : /dev/epoll
– Use ioctl() to manipulate the above.
Even Microsoft Windows has something to offer ( I/O completion ports ).
Kqueue – for BSD boxes.
– We shall be talking about that now !!
9. Kqueue - Goals
A generic event notification framework
– File descriptors (read/write/exceptions), Signals,
Asynchronous I/O ( not in OSFR ), Vnodes monitoring,
process monitoring, Timer events.
A single system call to handle all this.
Capability to add new functionality.
Efficient use of memory
– Memory should be allocated as per need.
– Should be able to register/receive only as many events as the
application is interested in.
– Events should be coalesced ( eg: data arriving over a socket in
several chunks is reported as a single event )
Should be a good replacement for standard calls ( select(), poll() ).
Should be possible to extend this functionality easily
10. Kqueue APIs
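int kqueue( void );
– Creates a new kernel event queue and returns a descriptor for it.
The queue is released with the close() system call.
int kevent( int kq, const struct kevent *changes, int nc,
struct kevent *events, int ne, const struct timespec *timeout );
– ( NetBSD declares nc and ne as size_t )
– To register events in the kernel queue ( changes, nc )
– To receive events that occurred between consecutive calls ( events, ne )
– Can behave like select() or poll() depending on timeout: NULL blocks
until an event arrives, a zeroed timespec returns immediately
– The kernel keeps the registered events, so the application does not
have to store the set locally and pass it in on every call.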
EV_SET( &event, ident, filter, flags,
fflags, data, udata)
– Used to prepare an event for registering in the kernel queue.
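Applications ported from poll() usually think of timeouts in milliseconds, while kevent() takes a pointer to a struct timespec. The helper below, kevent_timeout(), is hypothetical: a sketch that borrows poll()'s convention (negative means wait forever) and maps it onto the three timeout behaviours described above. It does not call kevent() itself, so it builds on any POSIX system.

```c
#include <time.h>

/* Hypothetical helper: convert a poll()-style millisecond timeout into the
 * last argument of kevent().
 *   ms < 0  -> NULL: block until an event arrives
 *   ms == 0 -> zeroed timespec: return immediately (non-blocking check)
 *   ms > 0  -> wait at most that long */
const struct timespec *kevent_timeout(int ms, struct timespec *ts)
{
    if (ms < 0)
        return NULL;
    ts->tv_sec = ms / 1000;
    ts->tv_nsec = (long)(ms % 1000) * 1000000L;
    return ts;
}
```

A call would then look like n = kevent( kq, NULL, 0, kev, 10, kevent_timeout( 2500, &ts ) ), waiting at most 2.5 seconds.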
11. Kqueue sample code
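int kq = kqueue();
struct kevent kev[10];
int i, n;
// Prepare an event: watch fd for readability
EV_SET( &kev[0], fd, EVFILT_READ, EV_ADD, 0, 0, 0 );
// Register the one prepared change ( pass kev, not &kev )
kevent( kq, kev, 1, NULL, 0, NULL );
// Receive up to 10 events; timeout is a const struct timespec *
n = kevent( kq, NULL, 0, kev, 10, timeout );
for ( i = 0; i < n; i++ ) {
    // kev[i].ident is the ready descriptor, kev[i].data the bytes readable
}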
12. Kqueue filter types
READ : Returns when data is available for read from
sockets, vnodes, fifos, pipes
– ident = descriptor
– Data = amount of data to be read
– Flags = EV_EOF may be set ( e.g. the writer closed the pipe )
WRITE : Returns when it is possible to write to a descriptor
( ident ).
– Data = amount of data that can be written
VNODE : Returns when the file referenced by a descriptor changes
– fflags = delete, write, extend, attrib, link, rename, revoke
13. Kqueue filter types ( contd... )
PROC : Monitors a process
– Ident = pid of the process to be monitored.
– Fflags = Exit, fork, exec, track, trackerr
SIGNAL : Returns when a signal is delivered to a process.
– Ident = signal number
– Data = number of times the signal was delivered since the last call.
– Co-exists with signal() and sigaction() - and has a lower
precedence.
– Is delivered even if SIG_IGN is set for the signal
TIMER : Establishes a timer
– ident = timer id; data = period in milliseconds when registering,
and the number of expirations since the last call on return
– Periodic by default unless ONESHOT is specified
14. Kqueue Flags
ADD : To add an event to the queue
ENABLE : To enable a disabled event
DISABLE : To temporarily disable an event ( not deleted )
DELETE : Remove an event from the kernel queue
ONESHOT : Cause the event to happen only once.
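CLEAR : Reset the state of the filter after the event is retrieved
EOF : Filter-specific end-of-file condition ( set by the filter )
ERROR : Set on return when a change could not be applied; data holds the errno value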
15. Kqueue – Good things
As you would have seen, it is extremely scalable in
handling large numbers of file descriptors
– Eliminates most of the deficiencies of select()/poll()
– Currently, efforts are underway to migrate some popular
daemons ( Apache ) to use Kqueue.
It supports a wide range of events – not just file descriptors.
Is easily extensible.
New kqueue filters can be added very easily inside the BSD
kernels.
Opens up a lot of interesting possibilities.
16. Issues with Kqueue
Kqueue calls are not part of POSIX specifications.
– Most Unix systems do not implement it.
– Breaks portability across Unices
Third party code may still use select(), poll() etc. We may
have to migrate this or allow these to co-exist
Relatively new in the field; not yet time-tested.
17. References
Kqueue: A generic and scalable event notification facility -
Jonathan Lemon
http://people.freebsd.org/~jlemon/papers/kqueue.pdf
Man pages for kqueue, knote, kfilter_register
Read the source, Luke !!
18. Finally ...
Questions ??
Thanks to
– Organizers for giving me a chance to speak at GNUnify 2006
– NetBSD and Linux developers who helped me during my work
– Infosys for sponsoring my visit to GNUnify 2006
Special thanks to YOU for listening...
You can contact me at :
Mahendra_M@infosys.com