A presentation given at the Programming Languages Meetup in San Francisco (Jun 10, 2014). Computation is about communicating state machines, but that message gets lost in the endless debates on threads vs. events and iterators vs. reactive approaches. Lightweight coroutine and thread options are available in all major mainstream languages; they combine the ease of sequential, thread-style programming with the performance of event-oriented code. You can have it all.
2. • Fundamental building block of computation
• Communicating State Machines model
• Synchronous and Asynchronous composition
• Hierarchical State Machines specification
• (Edward A. Lee and Pravin Varaiya, Structure and Interpretation of Signals and Systems, LeeVaraiya.org)
State machines
3. • Distributed Systems
• Hardware interfaces
• Components of a memory hierarchy
• Stream producers and consumers
• Parsers and Lexers
• Filesystem and tree walker
• Networking stack and Socket consumer
• Bidirectional communication
CSMs are Ubiquitous
5. • Language used in TinyOS to program wireless motes
nesC
• Components with bidirectional interfaces
• Separate configuration to stitch together components
11. Stacks as State Machines
void readMsgs(socket) {
    numMsgsRead = 0
    while (true) {
        msg = readMsg(socket)
        dispatch(msg)
        log(numMsgsRead++)
    }
}
Msg readMsg(socket) {
    len = readLen(socket)
    return readBody(socket, len)
}
byte[4] readLen(socket) {
    byte[4] len
    for i = 0 .. 4 {
        len[i] = readByte(socket)
    }
    return len
}
• Thread of control
• Control plane = call chain (each frame remembers its PC)
• Sequential flow of control defines hidden states
• Functions define major states
• Data plane = variables local to each frame
• Blocking semantics == synchronous (lock-step) communication
• readByte and dispatch interact with the network
• Easy API; that’s why POSIX and most DB calls are synchronous
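The hidden states become visible when the same reader is written without blocking. A minimal Python sketch, assuming a 4-byte big-endian length prefix and a hypothetical feed() method that an event loop calls with whatever bytes have arrived. MsgReader, feed and the state names are illustrative and not from the talk. The program counter turns into an explicit state field, and the per-frame locals turn into object fields.

```python
import struct

class MsgReader:
    """Explicit state machine doing the work of readMsgs/readMsg/readLen.

    Hypothetical sketch: the 4-byte big-endian length prefix and the
    feed() entry point are assumptions, not part of the slides.
    """
    READ_LEN, READ_BODY = "READ_LEN", "READ_BODY"

    def __init__(self, dispatch):
        self.dispatch = dispatch      # called once per complete message
        self.state = self.READ_LEN    # stands in for the program counter
        self.buf = bytearray()        # stands in for the locals of each frame
        self.need = 4                 # bytes required to leave the current state
        self.num_msgs_read = 0

    def feed(self, data):
        """Called by an event loop whenever bytes arrive; never blocks."""
        self.buf.extend(data)
        while len(self.buf) >= self.need:
            chunk = bytes(self.buf[:self.need])
            del self.buf[:self.need]
            if self.state == self.READ_LEN:
                (self.need,) = struct.unpack(">I", chunk)
                self.state = self.READ_BODY
            else:
                self.dispatch(chunk)
                self.num_msgs_read += 1
                self.state, self.need = self.READ_LEN, 4


received = []
reader = MsgReader(received.append)
reader.feed(b"\x00\x00\x00\x02h")   # length prefix plus half of the body
reader.feed(b"i\x00\x00")           # rest of the body, start of the next length
```

The blocking version needs none of this bookkeeping, which is the slide's point: the thread stack keeps the state for free.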
12. State machine in Erlang
bark() ->
    io:format("Dog says: BARK! BARK!~n"),
    receive
        pet ->
            wag_tail();
        _ ->
            io:format("Dog is confused~n"),
            bark()
    after 2000 ->
        bark()
    end.

wag_tail() ->
    io:format("Dog wags its tail~n"),
    receive
        pet ->
            sit();
        _ ->
            io:format("Dog is confused~n"),
            wag_tail()
    after 30000 ->
        bark()
    end.

sit() ->
    io:format("Dog is sitting. Gooooood boy!~n"),
    receive
        squirrel -> bark();
        _ ->
            io:format("Dog is confused~n"),
            sit()
    end.
• Tail-call optimization makes state changes trivial: each state is a function, and a transition is a tail call
Credit: http://learnyousomeerlang.com/finite-state-machines
15. • Problem: Obtain leaves from a tree one at a time
• Two interacting state machines:
• Producer: tree, Consumer: user code that acts on the leaves.
• Pull solution: Iterators
• Convenient for clients
• for leaf in tree:
      print(leaf.name)
• Push solution: Functional approach
• Tree pushes data to visitors or user-defined functions
• tree.visit( myfunc )
• Ideally: Duals of each other
• In practice: Duel with each other
Leaves from a Tree
16. Pull Solution: Iterators
class Node:
    …
    def __iter__(self): return Iter(self)

class Iter:
    def __init__(self, root):
        self.nxt = root.first_leaf()
        self.prev = None

    def __next__(self):
        nxt = self.nxt
        if nxt:  # First-time entry into the iterator
            self.nxt = None
            self.prev = nxt
            return nxt
        prev = self.prev
        if prev.sibling:
            nxt = prev.sibling.first_leaf()
        else:  # explore cousins: children of the parent's siblings
            parent = prev.parent
            while parent:
                uncle = parent.sibling
                if uncle:
                    nxt = uncle.first_leaf()
                    break
                else:
                    parent = parent.parent  # continue loop
        if nxt:
            self.prev = nxt  # for next iteration
            return nxt
        else:
            raise StopIteration

    next = __next__  # Python 2 name, used by explicit .next() calls
• Consumer code drives iteration
• Producer code (iterable) needs to save state between iterations
17. Push solution
class Node:
    …
    def leaves(self, callback):
        if self.is_leaf():
            callback(self)
        else:
            # Visiting every child covers all siblings at that level
            for c in self.children:
                c.leaves(callback)

def cb(node):
    print(node.name)

tree.leaves(cb)
• Consumer side:
• Callback hell
• Visitor pattern is an abomination
• Does not have flow-control between events
• Producer side:
• Drives iteration
• Stack for storing recursive state
• Allows async consumers to deliver events
18. Push: Consumer-side trouble
exports.processJob = function(options, next) {
  db.getUser(options.userId, function(err, user) {
    if (err) return next(err);
    db.updateAccount(user.accountId, options.total, function(err) {
      if (err) return next(err);
      http.post(options.url, function(err) {
        if (err) return next(err);
        next();
      });
    });
  });
};
def sameFringe(treeA, treeB):
    itreeA = iter(treeA)
    itreeB = iter(treeB)
    while True:
        nodeA = next(itreeA)
        nodeB = next(itreeB)
        if nodeA.name != nodeB.name: return False
        ….
• Callback hell: a sequential chain of events is verbose to express
• Inversion of control
• Concurrent traversals are trivial in the pull approach
20. Generators: Concurrent Stacks
def odds():
    i = 1
    while True:
        yield i
        i += 2

o = odds()

print(next(o))
print(next(o))
print(next(o))

# Print infinite stream
for n in odds():
    print(n)

class Tree:
    def leaves(self):
        if self.is_leaf():
            yield self
        else:
            for c in self.children:
                for leaf in c.leaves():
                    yield leaf

for leaf in tree.leaves():
    print(leaf.name)
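With a generator-based leaves(), the sameFringe comparison from the pull discussion can be written out in full. A minimal sketch, assuming a hypothetical Node class with just a name and a list of children (the talk's own node class is elided). Each traversal keeps its own suspended stack, so the two trees are walked in lock-step.

```python
from itertools import zip_longest

class Node:
    """Hypothetical minimal tree node, for illustration only."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def leaves(self):
        # Recursive generator with the same shape as Tree.leaves
        if not self.children:
            yield self
        else:
            for c in self.children:
                for leaf in c.leaves():
                    yield leaf

def same_fringe(tree_a, tree_b):
    """True if both trees have the same leaf names in the same order."""
    missing = object()  # sentinel marking the end of the shorter fringe
    pairs = zip_longest(tree_a.leaves(), tree_b.leaves(), fillvalue=missing)
    for a, b in pairs:
        if a is missing or b is missing or a.name != b.name:
            return False
    return True
```

Both traversals stop as soon as a mismatch is found; neither tree is flattened into a list first.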
21. • Generators/coroutines are simply a compiler transformation of threaded code into event-driven code on the same kernel thread
• Flow of control alternates between consumer and producer
• Cheap user-level tasks with explicit cooperative scheduling
• Scheduler calls next()
• Task calls yield() whenever necessary
• Wrapped in an abstraction called Fiber
• Ruby: Fiber.yield, JavaScript: function*, yield/yield*
• Symmetric vs Asymmetric coroutines
• Lazy streams — Infinite streams on demand
Generators
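The "scheduler calls next(), task calls yield" arrangement can be sketched in a few lines of Python. This is an illustrative round-robin scheduler, not code from the talk; run() and worker() are hypothetical names.

```python
from collections import deque

def run(tasks):
    """Round-robin scheduler: resume each task until it yields or finishes."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)            # scheduler calls next()
        except StopIteration:
            continue              # task finished; drop it
        ready.append(task)        # task yielded; put it back in the queue

def worker(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        yield                     # task gives up control voluntarily

log = []
run([worker("a", 2, log), worker("b", 3, log)])
# log == [("a", 0), ("b", 0), ("a", 1), ("b", 1), ("b", 2)]
```

All tasks share one kernel thread, and a switch costs one generator resume.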
23. • All threads have the same fixed stack size, set at creation time: usually the worst case
• Kernel Thread context switching is expensive (in μs)
• Preemption at any time ==> Save all registers: 16 general purpose registers, PC, SP, segment registers, 16 XMM registers, FP coprocessor state, X AVX registers, all MSRs
• TLB flushes, cache invalidation, crossing kernel protection boundary
• Even cooperative yields are expensive.
• A kernel thread is a precious resource. Can’t block it.
• No, not for IO-bound code, says Paul Tyma
Why can’t we just use Kernel Threads?
> ulimit -s
8192
25. • But horrible user-programming model
• libuv, libasync, EventMachine (Ruby), netty (Java)
• User code must not block or call other I/O operations
Event-driven I/O is faster
26. Netty inversion of control
io.netty.handler.codec.DecoderException: java.lang.RuntimeException: No packet with id 78
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:263)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:131)
at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:337)
at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:323)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:173)
at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:337)
at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:323)
at io.netty.handler.codec.ByteToMessageDecoder.handlerRemoved(ByteToMessageDecoder.java:109)
at io.netty.channel.DefaultChannelPipeline.callHandlerRemoved0(DefaultChannelPipeline.java:524)
at io.netty.channel.DefaultChannelPipeline.callHandlerRemoved(DefaultChannelPipeline.java:518)
at io.netty.channel.DefaultChannelPipeline.remove0(DefaultChannelPipeline.java:348)
at io.netty.channel.DefaultChannelPipeline.remove(DefaultChannelPipeline.java:319)
at io.netty.channel.DefaultChannelPipeline.remove(DefaultChannelPipeline.java:296)
at org.spigotmc.netty.LegacyDecoder.decode(LegacyDecoder.java:38)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:232)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:131)
at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:337)
at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:323)
at io.netty.handler.timeout.ReadTimeoutHandler.channelRead(ReadTimeoutHandler.java:149)
at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:337)
at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:323)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:100)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:478)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:447)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:341)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: No packet with id 78
at org.spigotmc.netty.Protocol$ProtocolDirection.createPacket(Protocol.java:272)
at org.spigotmc.netty.PacketDecoder.decode(PacketDecoder.java:44)
28. • All I/O handled by a special I/O event loop in a separate thread
• Can’t do I/O in callback
• Cannot block
• Handed off to a task on a separate thread pool
• Task cannot block there either; limited threads in thread pool
• Hand-rolled continuations
Current mainstream
29. Netty
public class WriteTimeOutHandler extends ChannelOutboundHandlerAdapter {
    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) {
        ctx.write(msg, promise);

        if (!promise.isDone()) {
            ctx.executor().schedule(new WriteTimeoutTask(promise), 30, TimeUnit.SECONDS);
        }
    }
}
Ugh.
30. Functional Reactive Programming
getDataFromNetwork()
    .skip(10)
    .take(5)
    .map({ s -> return s + " transformed" })
    .subscribe({ println "onNext => " + it })
• Reactive extensions .NET, RxJava, Scala
• Asynchronous stream: a chain of transformers ending with a callback.
• Effectively with the same kinds of restrictions:
• No blocking, worry about thread context (“can I write to a socket”)
31. • Pretty sequential code.
• Millions of Threads.
• Block when we want to.
• Receive and Send to other SMs anywhere.
• Receive from multiple sources
• Speed and lightness of Event-driven solutions
Can we have it all?
32. • kilim.malhar.net
• Bytecode transformer for coroutines/generators and lightweight tasks
• s/Thread/Task/
• s/run()/execute() throws Pausable/
• All functions that may block annotated as “throws Pausable”
• Use typed mailboxes to communicate
• Bytecode transformation of Java code.
• Offline or at class load time
Ta da! Kilim
36. • Lightweight threads — C layout, small dynamic stacks
• Multiplex on channel I/O — CSP’s alt operator.
• Fast context switching: three registers to save and restore (PC, SP and DX)
• Syntactic lightness
• Language and idioms fit in my L1 cache
• Closures, Duck-typing
• 0-sized channels == true synchronous lock-step
• What I want: Some aspects of Swift/Rust!
What I like about Go
37. Go
package main

func main() {
    ch := make(chan int)

    go func() { // producer
        i := 1
        for {
            ch <- i
            i += 2
        }
    }()

    for { // consumer
        println(<-ch)
    }
}
38. Go
func main() {
    // Listen and accept loop
    tcpaddr, err := net.ResolveTCPAddr("tcp", "localhost:9999")
    check(err)
    tcp_acceptor, err := net.ListenTCP("tcp", tcpaddr)
    check(err)
    fmt.Println("Listening on ", tcp_acceptor.Addr())

    for {
        tcp_conn, err := tcp_acceptor.AcceptTCP()
        check(err)
        go serve(tcp_conn) // one goroutine per connection
    }
}

func serve(conn *net.TCPConn) {
    // One decoder per connection: gob streams are stateful and the decoder buffers input
    dec := gob.NewDecoder(conn)
    for {
        //var msg Msg
        var data string
        //err := dec.Decode(&msg)
        err := dec.Decode(&data)
        check(err)
        println("Server: Rcvd ", data)
        //println("Server: Rcvd ", msg.Data, "from", msg.From)
        ….
    }
}
39. • Compiler transformation of ‘go’ blocks into event-driven code
• All blocking calls must be made directly inside a go block
• Channel receives and sends cannot be made in a called function
• In general, all approaches relying only on compiler transformations leak abstractions. They need deep runtime support, as in Go and Erlang
Clojure core.async
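Python generators show the same leak, and are used here only as an analogy for go blocks. A plain function cannot suspend its caller, so every frame between the task and the suspension point must itself be a generator and delegate with `yield from`. A minimal sketch; the drive() function and the "need_byte" request are hypothetical stand-ins for an event loop.

```python
def read_byte():
    # Suspends the whole chain until the driver sends a byte
    b = yield "need_byte"
    return b

def read_len():
    # A plain call to read_byte() would only create a generator object.
    # Each intermediate frame has to delegate with `yield from`.
    data = []
    for _ in range(4):
        data.append((yield from read_byte()))
    return int.from_bytes(bytes(data), "big")

def task(results):
    results.append((yield from read_len()))

def drive(gen, incoming):
    """Hypothetical driver standing in for an event loop."""
    it = iter(incoming)
    try:
        request = next(gen)
        while request == "need_byte":
            request = gen.send(next(it))
    except StopIteration:
        pass  # task finished, or the input ran out

results = []
drive(task(results), b"\x00\x00\x01\x02")
# results == [258]
```

In a runtime with real stacks per task (Go, Erlang), read_len could call read_byte as an ordinary function and still block.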
40. • Threaded style is easy to write and understand
• Actors are not internally concurrent; no internal data races.
• Undesirable combination: aliasing + mutability
• Either aliased + immutable: the Clojure approach
• Or unaliased + mutable: the Kilim, Rust, Go approach
• Isolate actor state, and exchange messages. Rust’s linear type system is wonderful.
• Go mantra: share memory by communicating; don’t communicate by sharing memory
• No more threads vs. events debates. You can have it all
• Erlang, Go, Rust, Kilim for Java, Akka for Scala, F#
Takeaways
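The "isolate actor state and exchange messages" advice can be sketched with Python's asyncio, which is not one of the runtimes listed above but follows the same pattern. The counter actor, the None shutdown message, and the sender tasks are all hypothetical. Each task reads as sequential code and blocks on its mailbox, and a thousand of them run on one kernel thread.

```python
import asyncio

async def counter(mailbox, replies):
    """Actor that owns `total`; nothing outside this coroutine can touch it."""
    total = 0
    while True:
        msg = await mailbox.get()     # blocks only this task, not the thread
        if msg is None:               # hypothetical shutdown message
            await replies.put(total)
            return
        total += msg

async def main(n_senders=1000):
    mailbox, replies = asyncio.Queue(), asyncio.Queue()
    actor = asyncio.create_task(counter(mailbox, replies))

    async def sender(i):
        await mailbox.put(i)

    # Many lightweight tasks, all multiplexed on one kernel thread
    await asyncio.gather(*(sender(i) for i in range(n_senders)))
    await mailbox.put(None)
    result = await replies.get()
    await actor
    return result

total = asyncio.run(main())
# total == 499500, the sum of 0..999
```

No locks are needed: the only shared objects are the two queues, and the counter's state never leaves its task.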