16-17. RESTful Tokyo Tyrant

require "rubygems"
require "rest_client"

# Interacting with Tokyo Tyrant via RESTful HTTP!
db = RestClient::Resource.new("http://localhost:1978")

db["key"].put "value 1"   # insert via HTTP
db["key"].put "value 2"   # update via HTTP
puts db["key"].get        # get via HTTP
# => "value 2"

db["key"].delete          # delete via HTTP
begin
  db["key"].get           # the record is gone, so this raises (404)
rescue RestClient::ResourceNotFound
  puts "key not found"
end

Awesome.
18. “Recently, I sophisticated Hanami and the Sumida River in a houseboat, I was sad that day and not even a minute yet mikio bloom …” … so I added Lua scripting to Tyrant. http://alpha.mixi.co.jp/blog/?p=236
23. What is Lua? It's like Ruby... except it's not. Fast + Lightweight = Great for embedded apps. Garbage collection. GZIP(Source + Docs + Examples) = 212 KB
24. Extending the Database? MySQL User Defined Functions

CREATE FUNCTION json_object RETURNS STRING SONAME 'lib_mysqludf_json.so';
SELECT json_object(customer_id, first_name) AS customer FROM customer;

+-----------------------------------+
| customer                          |
+-----------------------------------+
| {customer_id:1,first_name:"MARY"} |
+-----------------------------------+

JSON Response
http://www.mysqludf.org/lib_mysqludf_json/index.php
25. TC+Lua? Why? (C/C++ for MySQL, Lua for TC.) To make our lives easier, and more fun! Easy to learn & easy to extend!
30-32. Implementing INCR in Lua+TC

--
-- incr.lua
--
function incr(key, i)
  -- Verify input
  i = tonumber(i)
  if not i then
    return nil
  end
  -- Get old value & increment it
  local old = tonumber(_get(key))
  if old then
    i = old + i
  end
  -- Save new value
  if not _put(key, i) then
    return nil
  end
  return i
end
36. “Redis as a data structures server, it is not just another key-value DB”
37-38. Implementing Set operations in TC

-- SEP (the element separator) and the _set_len helper are defined
-- elsewhere in the extension; they are not shown on the slide.
function set_append(key, value)
  local stream = _get(key)
  if not stream then
    -- Empty set: store the first element
    _put(key, value)
  else
    -- Append value if unique
    local set_len = _set_len(stream)
    if set_len == 1 then
      if stream == value then
        return nil
      end
    elseif set_len > 1 then
      for _, element in ipairs(_split(stream, SEP)) do
        if element == value then
          return nil
        end
      end
    end
    if not _putcat(key, SEP .. value) then
      return nil
    end
  end
  return value
end
40. Implementing TTLs in TC

"memcached is a general-purpose distributed memory caching system that is used by many top sites on the internet"

Key     Value     Time
key1    value1    10
key2    value2    20

Time = 15

key2    value2    30
41. Expiring Records with Lua: DELETE WHERE x <= Time.now

function expire()
  local args = {}
  local cdate = string.format("%d", _time())
  -- search condition: column "x" numerically <= now (NUL-separated args)
  table.insert(args, "addcond\0x\0NUMLE\0" .. cdate)
  -- delete every matching record
  table.insert(args, "out")
  local res = _misc("search", args)
  if not res then
    _log("expiration failed")
  end
  print("rnum=" .. _rnum() .. " size=" .. _size())
end
42. Implementing TTLs in TC: TC + Lua = memcached?

[ilya@igvita] > ttserver -ext expire.lua -extpc expire 5 "casket.tct#idx=x:dec"

Invoke the "expire" command every 5 seconds. Table database, with a decimal (numeric) index on the expiry column (x).
46. Executing MR jobs within Tokyo Cabinet

_out(key)
_get(key)
_vsiz(key)
_addint(key, value)
_mapreduce(mapper, reducer, keys)
47-48. Map-Reduce within Tokyo Cabinet

function wordcount()
  -- Emit: {word: 1}
  local function mapper(key, value, mapemit)
    for word in string.gmatch(string.lower(value), "%w+") do
      mapemit(word, 1)
    end
    return true
  end
  local res = ""
  -- #values is sizeof(values): the number of 1s emitted for this word
  local function reducer(key, values)
    res = res .. key .. "\t" .. #values .. "\n"
    return true
  end
  if not _mapreduce(mapper, reducer) then
    res = nil
  end
  return res
end
49. Map-Reduce within Tokyo Cabinet: Execute Map-Reduce Job

[ilya@igvita] > ttserver -ext wordcount.lua test.tch
[ilya@igvita] > tcrmgr put localhost 1 "This is a pen."
[ilya@igvita] > tcrmgr put localhost 2 "Hello World"
[ilya@igvita] > tcrmgr put localhost 3 "Life is good"
[ilya@igvita] > tcrmgr ext localhost wordcount
a 1
good 1
is 2
life 1
pen 1
this 1
So what is Tokyo Cabinet? Not surprisingly, as the name implies, Tokyo Cabinet is a software project that was started in Japan, and as any Rubyist knows, Japanese developers have made some amazing contributions. Case in point: Ruby and Matz. Started as a research project back in '93, it made its way to North America around 2001 and 2002, and the rest is history. For that reason, when I stumbled across Tokyo Cabinet, I decided to do some digging. Turns out, the project is the brainchild of one developer... and Facebook came to the rescue to help us put a face to the project. And I don't know about you, but when I saw that photo... I had one association pop up...
Yeah? What do you think? In fact, I think the analogy may be well suited because Tokyo Cabinet has all the potential to make it big in the database world.
When we talk about TC, we are actually referring to three distinct products under one umbrella:
- cabinet – a set of database management routines
- tyrant – a standalone server implementation around cabinet
- dystopia – a full-text search engine built on top of Tokyo Cabinet
All of the projects are written in very clean, easy-to-read, and well-documented C, and released under the LGPL. For the purposes of this talk, we'll first focus on Tokyo Cabinet and work through some of the basics. Then we'll move on to Tyrant, which is where we'll spend most of the time. I'm not going to talk about Dystopia, but I'd encourage you to check it out and play with it.
QDBM is/was a library of routines for managing a database (2000 – 2007). Tokyo Cabinet is a successor to QDBM.
- Thread-safe
- Row-level locking
- Supports many different data layouts, as we will see shortly
- Full ACID support
- Has bindings for all the most popular languages, which is pretty important considering this is an embedded database
TC has many different engines. If you've ever used MySQL, choosing one is exactly like deciding between MyISAM, InnoDB, or any other engine. Each one has its advantages, and the right choice really depends on your data and performance requirements. The bread and butter is the hash table, which is just a key-value store like BDB and many other databases. It's fast, really fast. The B-Tree table is similar to the hash table, but because of its data layout it allows storage of duplicate keys and iteration over that data. Fixed-length is a no-frills array. It's probably the fastest engine there is, simply because of its simplicity, but you certainly give up a lot of functionality as well. Last but not least, the Table engine is one of the most interesting features of TC. It is essentially a schemaless document store with support for arbitrary indexing and query capabilities. Let's take a look at a few examples…
We're using Tokyo Cabinet via the rufus/tokyo gem, which means we're building an embedded database. There are two good alternatives for interfacing with TC from Ruby. There is a native gem provided by the author, and there is the rufus/tokyo gem, which I'm using here. It is built via FFI, which means you can use this library from JRuby, MRI, Rubinius, and so forth. It is definitely a little bit slower, but I like its syntax a lot more than the native gem's.
It walks & talks just like a Ruby Hash. Very easy to use.
Create a query, order by ‘age’. In this case we didn’t declare any explicit indexes, but the dataset is small so the query is still very fast. If we were working with a larger dataset we would explicitly declare an index either via ruby or when we created the database. Check the docs for details on how to do this.
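To make the query semantics concrete, here is a plain-Ruby sketch of what "filter, then order by 'age'" does over a schemaless table. It is not the rufus/tokyo API. TABLE and query are hypothetical names, a Hash of Hashes stands in for the table database, and the sample rows are made up. With no index, every row is scanned, which is fine at this size.

```ruby
# In-memory stand-in for a table database: primary key => column hash.
# As in the TC table engine, rows are schemaless and values are strings.
TABLE = {
  "pk0" => { "name" => "alfred", "age" => "22" },
  "pk1" => { "name" => "bob",    "age" => "18" },
  "pk2" => { "name" => "charly", "age" => "45", "city" => "Tokyo" }
}

# Hypothetical helper (not the rufus/tokyo API): keep rows that have the
# column and, optionally, a numeric value >= min. Then order numerically.
def query(table, column, min: nil)
  rows = table.filter_map do |pk, cols|
    next unless cols.key?(column)
    next if min && cols[column].to_i < min
    cols.merge(pk: pk)
  end
  rows.sort_by { |row| row[column].to_i }
end

p query(TABLE, "age").map { |row| row["name"] }       # ["bob", "alfred", "charly"]
p query(TABLE, "age", min: 20).map { |row| row[:pk] }  # ["pk0", "pk2"]
```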
Also, as you would expect, there is full transaction support. For this reason alone, it’s not completely unreasonable to think about using TC in place of Ruby hashes in some of your code. It’s dead simple, and fast.
http://www.slideshare.net/rawwell/tokyotalk
- High concurrency (multi-threaded, uses epoll/kqueue)
- 3 protocol options: binary, memcached, and HTTP
- Hot backup
- Update logging
- Replication (master/slave, master/master)
- Lua extensions
Interacting via rest_client.. No rufus/tokyo here!
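For comparison, the same PUT/GET/DELETE mapping can be expressed with Ruby's standard Net::HTTP instead of rest_client. This is a minimal sketch. It assumes Tyrant's HTTP interface on localhost:1978, as on the slide, with the record key carried as the URL-encoded request path. The helper names tyrant_request and tyrant_send are hypothetical. Only request construction runs here, since sending needs a live server.

```ruby
require "net/http"
require "uri"

# Assumed server address (as on the slide).
TYRANT_URI = URI("http://localhost:1978")

# Hypothetical helper: builds the HTTP request for one key-value operation.
def tyrant_request(verb, key, value = nil)
  path = "/" + URI.encode_www_form_component(key).gsub("+", "%20")
  request =
    case verb
    when :put    then Net::HTTP::Put.new(path)     # insert or overwrite
    when :get    then Net::HTTP::Get.new(path)     # fetch the value
    when :delete then Net::HTTP::Delete.new(path)  # remove the record
    else raise ArgumentError, "unsupported verb: #{verb.inspect}"
    end
  request.body = value if value
  request
end

# Sends a request to a running Tyrant server (requires the server; not called here).
def tyrant_send(request)
  Net::HTTP.start(TYRANT_URI.host, TYRANT_URI.port) { |http| http.request(request) }
end

put = tyrant_request(:put, "key", "value 2")
puts "#{put.method} #{put.path} body=#{put.body}"  # prints: PUT /key body=value 2
```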
Unfortunately, even though the project is very mature at this point and is being used in production at mixi, finding discussions or support around the project is a bit of a challenge. Mikio regularly writes on his developer blog at mixi, but even that is in Japanese... And I'm a big fan of statistical machine translation techniques, but there is definitely a lot of room for improvement… What you see on the slide is the start of one of his blog posts from August of last year, in which he goes on to announce… By the way, this is where you should gasp and all jump for joy, because this is huge!
"Lua" (pronounced LOO-ah) means "Moon" in Portuguese. As such, it is neither an acronym nor an abbreviation, but a noun. Lua runs on all flavors of Unix and Windows, and also on mobile devices (such as handheld computers and cell phones that use BREW, Symbian, Pocket PC, etc.) and embedded microprocessors (such as ARM and Rabbit) for applications like Lego MindStorms. Lua is a fast language engine with a small footprint that you can embed easily into your application. Lua has a simple and well-documented API that allows strong integration with code written in other languages. Lua allows many applications to easily add scripting within a sandbox…
So why is Lua + Tokyo Cabinet so interesting after all? Who's used MySQL UDFs? Anyone written one? They're a pain on both accounts. User-defined functions are compiled as object files and then added to and removed from the server dynamically using the CREATE FUNCTION and DROP FUNCTION statements. You can also add functions as native (built-in) MySQL functions: native functions are compiled into the mysqld server and become available on a permanent basis. There are a couple of problems. You write C/C++ against a very messy internal API, and if you get it wrong, you crash the database, so you'd better get it right. Why use them? They're faster than triggers, and they allow linking against other libraries. For example, there is a very popular memcached library which allows you to interface with memcached right from MySQL. There are a number of reasons why you would want to use this, but one great use case is replicating memcached. That is, use the MySQL protocol to replay queries and then update your memcached instances in different data centers, etc. This is exactly how Facebook keeps their clusters in sync.
Extending TT's protocol and implementation is a hassle. With Lua, the server can register arbitrary functions. All database operations share one common interface. The client sends a method name, a key string, and a value string. The server runs the named method with that key and value and returns a string result. With that single protocol in place, there is no need to define a new protocol for each operation.
- Lua Extension
  * defines arbitrary database operations
  * atomic operation by record locking
- define DB operations as Lua functions
  * clients call each, giving function name and record data
  * the server returns the return value of the function
- options about atomicity
  * no locking / record locking / global locking
http://www.scribd.com/doc/12016121/Tokyo-Cabinet-and-Tokyo-Tyrant-Presentation
Enough handwaving, let's look at the code… This is a no-op extension which returns the key and value pair without storing anything in the database. The ".." syntax is Lua's concatenation operator for strings.
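The protocol described above (function name, key, value in; a string out) can be modeled in a few lines of Ruby. This is a hypothetical in-process sketch, not Tyrant's implementation. ExtensionServer, register, and call are invented names. Only the "no locking" and "global locking" options are modeled, and joining key and value with a tab in the echo function is an assumption.

```ruby
# Hypothetical in-process model of Tyrant's extension protocol: every call
# is (function name, key string, value string) -> string, or nil on failure.
class ExtensionServer
  def initialize
    @functions = {}
    @global_lock = Mutex.new
  end

  # Register a named function that takes (key, value).
  def register(name, &fn)
    @functions[name.to_s] = fn
  end

  # Dispatch by name. lock: :global serializes calls through one mutex;
  # record-level locking is not modeled in this sketch.
  def call(name, key = "", value = "", lock: :none)
    fn = @functions[name.to_s]
    return nil unless fn
    return fn.call(key, value) unless lock == :global
    @global_lock.synchronize { fn.call(key, value) }
  end
end

server = ExtensionServer.new
# Echo-style function: return key and value joined by a tab, store nothing.
server.register(:echo) { |key, value| "#{key}\t#{value}" }
puts server.call(:echo, "hello", "world")  # prints "hello", a tab, then "world"
```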
Next, we start the Tokyo Tyrant server via the command-line utility 'ttserver' and pass in the -ext parameter with the name of the extension. That's all you need to start the server with our new extension. After that, we can use the manager utility (tcrmgr) to invoke some commands. First we specify ext as the command name, which indicates that we're going to be calling an extension. Next comes the address of the server, then the function name, which is "echo", and finally our key and value. In return, we get the string from TC!
Alternatively, we could do the same thing from ruby. Except this time, instead of creating a local database, we’re going to specify an IP address and port. From there, we call the ext method again, pass in the name of the command as a symbol and the parameters. Voila.
Ok, so echo is a cute example, but let's look at something slightly more interesting. We're going to build an increment command in Lua. Now, TC already has this in its API, but we're going to do this for the sake of an example.
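The three steps on the incr.lua slides (verify input, get and increment the old value, save the new value) can be mirrored in a plain-Ruby sketch, with a Hash standing in for the database. The incr helper is illustrative only. Unlike a Tyrant extension called with record locking, it is not atomic across threads, and it accepts integers only, whereas Lua's tonumber also accepts floats.

```ruby
# Plain-Ruby model of incr.lua: a Hash stands in for the database
# (store[key] plays the role of _get/_put), and values are stored as strings.
def incr(store, key, i)
  i = Integer(i.to_s, 10, exception: false)      # verify input
  return nil unless i
  old = Integer(store[key], 10, exception: false) if store.key?(key)
  i = old + i if old                             # get old value & increment it
  store[key] = i.to_s                            # save new value
  i
end

store = {}
incr(store, "hits", 1)     # => 1
incr(store, "hits", "5")   # => 6, stored as "6"
incr(store, "hits", "x")   # => nil, input rejected
```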
Redis is a key-value database. It is similar to memcached, but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists and sets with atomic operations to push/pop elements. In order to be very fast but at the same time persistent, the whole dataset is kept in memory. From time to time, and/or when a number of changes to the dataset have been performed, it is written asynchronously to disk. You may lose the last few queries, which is acceptable in many applications, but it is as fast as an in-memory DB (Redis supports non-blocking master-slave replication in order to solve this problem by redundancy).
Do some iteration over keys, and finally append the new value if it’s actually new.
In similar fashion, implement delete, get, and length functions, and we have a minimal set of SET operations in TC. Of course, this is not going to be as fast as a native implementation in Redis, but that's not the point either. The point is the potential extensibility of the database.
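Here is a plain-Ruby sketch of that minimal set API. It assumes members are stored as one string joined by a separator, as in set_append. SEP's value (a comma here) is an assumption, and set_get, set_len, and set_delete are hypothetical names for the get, length, and delete functions mentioned above. Splitting the stored string also covers the Lua version's single-element special case.

```ruby
SEP = "," # assumed separator; the slide does not show SEP's actual value

# Plain-Ruby model of the set extension: a set is a single string value
# with members joined by SEP, and a Hash stands in for the database.
def set_append(store, key, value)
  stream = store[key]
  if stream.nil?
    store[key] = value                               # empty set
  else
    return nil if stream.split(SEP).include?(value)  # already a member
    store[key] = stream + SEP + value                # append (cf. _putcat)
  end
  value
end

def set_get(store, key)
  store.fetch(key, "").split(SEP)
end

def set_len(store, key)
  set_get(store, key).length
end

def set_delete(store, key, value)
  members = set_get(store, key)
  return nil unless members.delete(value)
  if members.empty?
    store.delete(key)
  else
    store[key] = members.join(SEP)
  end
  value
end

store = {}
set_append(store, "tags", "ruby")
set_append(store, "tags", "lua")
set_append(store, "tags", "ruby")   # => nil, already present
store["tags"]                       # => "ruby,lua"
```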
The -extpc argument allows us to specify a function and an interval, in this case 5 seconds, and the Tokyo Tyrant server will call that function periodically. This means our cleanup script runs every 5 seconds, effectively removing the expired records from the database! Additionally, we're going to add a decimal (numeric) index on x to help speed up this operation. Once again, TC+Lua = memcached? Not quite. Memcached will perform better because it doesn't bother sweeping its memory for expired records. It just lets them hang around and evicts them when it runs out of free memory. But it's a great example of scripting your database.
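The sweep itself is simple. Below is a plain-Ruby sketch of what expire() does. A Hash of records stands in for the table database, and expiry timestamps live in column x. The sample records come from the TTL slide, and run_expiry is a hypothetical stand-in for -extpc. Unlike the TC version, this scans every record instead of using the index on x.

```ruby
# Plain-Ruby model of the expire job: each record carries its expiry time
# in column "x", and a sweep deletes every record whose x <= now (the
# NUMLE condition). Returns the number of records removed.
def expire(table, now = Time.now.to_i)
  expired = table.select { |_pk, cols| cols["x"].to_i <= now }.keys
  expired.each { |pk| table.delete(pk) }
  expired.length
end

# Rough analogue of `-extpc expire 5`: sweep every 5 seconds, forever.
# Not thread-safe on its own; guard the table with a Mutex if it is shared.
def run_expiry(table, interval = 5)
  loop do
    expire(table)
    sleep interval
  end
end

records = {
  "key1" => { "value" => "value1", "x" => "10" },
  "key2" => { "value" => "value2", "x" => "20" }
}
expire(records, 15)   # => 1; only "key2" remains
```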
Finally, here's a real use case deployed at mixi and documented by Mikio. They use Tyrant as a proxy cache server in front of a tier of their databases. They use Lua to interface with their cache and fall back to the actual MySQL and HBase tables on a cache miss or write. This is possible because Lua has bindings for MySQL, which means that the entire cache server layer is built in Lua and runs inside Tokyo Tyrant. There's no need for any extra frills, and the different TC engines allow them to customize the cache layout to fit their use case.
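Below is a hypothetical plain-Ruby sketch of the read-through pattern described above. ReadThroughCache is an invented name, and Hashes stand in for both the cache and the backing MySQL/HBase tier. The write-through policy is an assumption, since the exact write handling at mixi isn't described here.

```ruby
# Hypothetical read-through cache: reads hit the cache first and fall back
# to the backing store on a miss; writes go to the backing store and
# refresh the cache (an assumed write-through policy).
class ReadThroughCache
  attr_reader :misses

  def initialize(backend)
    @backend = backend   # anything responding to [] and []= (stand-in for MySQL)
    @cache = {}
    @misses = 0
  end

  def get(key)
    return @cache[key] if @cache.key?(key)
    @misses += 1
    value = @backend[key]
    @cache[key] = value unless value.nil?   # do not cache misses
    value
  end

  def put(key, value)
    @backend[key] = value   # write through to the backing store
    @cache[key] = value
  end
end
```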
Another example documented by Mikio is a session trail tracker. Instead of analyzing the logs offline, they built a system which records each user's visits based on their session cookie. In this case, there are two sessions, one with ID 1 and a second with ID 2. The first user visits resources 123, 256, and 987, which are then easily retrieved via the list operation. The source for this is available in my GitHub repo.
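The tracker's core data structure is just an append-only list per session. Below is a hypothetical plain-Ruby sketch, with a Hash of Arrays standing in for the database. SessionTrail, visit, and list are invented names, and session 2's resource ID in the usage lines is made up.

```ruby
# Hypothetical model of the session trail tracker: each visit appends a
# resource ID to the list stored under the session ID, and `list`
# returns the trail in visit order.
class SessionTrail
  def initialize
    @trails = Hash.new { |hash, id| hash[id] = [] }
  end

  def visit(session_id, resource_id)
    @trails[session_id] << resource_id
  end

  def list(session_id)
    @trails.fetch(session_id, []).dup   # copy, so callers cannot mutate the trail
  end
end

trail = SessionTrail.new
[123, 256, 987].each { |resource| trail.visit(1, resource) }
trail.visit(2, 42)   # hypothetical resource for session 2
p trail.list(1)      # [123, 256, 987]
```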
Back in November of '08, Mikio added another native interface for invoking map-reduce jobs on the database. There is no particular reason for this, as this implementation does not really have the distributed features that make Map-Reduce what it is, but it is a great example of a programming paradigm making the rounds… To get started with this functionality, you will once again need Lua, and we'll have to build just two functions to make it all work: a mapper and a reducer function. Let's take a look at an example…
We're going to build a wordcount example. That is, we're going to assume that the values in our database are strings, and we'll try to get a word frequency count across all the keys. To do that, we're going to iterate over all the records. Each key and value pair is passed to the mapper function in turn, which emits a temporary tuple where each word has a count of 1.
Next, the reducer function takes over. In the background, TC aggregates all the results with the same key and passes them into the reducer. At this point, we simply count the number of values for each word, and... we're done!
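Putting the map, group, and reduce steps together, here is a plain-Ruby model of the wordcount job. It sketches the data flow, not TC's _mapreduce. The grouping that TC performs in the background is done here with group_by, and the output format (word, tab, count, newline) follows the Lua reducer. Ruby's \w+ is close to, but not identical to, Lua's %w+, because \w also matches "_".

```ruby
# Plain-Ruby model of the wordcount job: map each record to [word, 1]
# pairs, group the pairs by word, then reduce each group by counting.
def wordcount(records)
  emitted = []
  mapper = lambda do |_key, value|
    value.downcase.scan(/\w+/) { |word| emitted << [word, 1] }   # Emit: {word: 1}
    true
  end
  records.each { |key, value| mapper.call(key, value) }

  # Grouping step (TC does this internally): word => [1, 1, ...]
  groups = emitted.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
  reducer = ->(word, values) { "#{word}\t#{values.length}\n" }
  groups.keys.sort.map { |word| reducer.call(word, groups[word]) }.join
end

print wordcount("1" => "This is a pen.", "2" => "Hello World", "3" => "Life is good")
```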
Start the server, add a few strings, and then execute our MR job. Easy as that.
On that note... Hopefully these examples gave you a taste of how easy it is to extend Tokyo Cabinet, and I would really encourage you to give it a try. All of the examples we went through in the slides are available in the GitHub repo I created for this talk, along with a number of other examples. Take a look at the high-low-game and the inverted-index extensions. Both are great examples of what you can do with less than a hundred lines of Lua and TC.