2. Redis
Key/Value
Async I/O
Very fast (most ops take O(1))
Active development (VM, append-only data file, HASH type)
Values can be data types: LISTS, SETS, ORDERED SETS
(http://code.google.com/p/redis/wiki/IntroductionToRedisDataTypes)
One step further than memcached, same intuitive
applications and patterns
3. Redis
Why ?
RestMQ
Brief MongoDB/Redis recap
Information Retrieval: using SETs to search books
4. Why ? Another Key/Value ?
Redis is a key/value store, but it offers several data types.
These data types are the building blocks of more complex things
you already use.
Think LISTs, SETs, Ordered SETs, and methods to deal with
them as you would with a good standard library.
Also: different persistence strategies, replication, locks,
increments...
5. RestMQ
RestMQ is a HTTP/REST/JSON based message queue.
HTTP as transport protocol
REST as a way to organize resources
JSON as data exchange format
- Initially built to mimic Amazon's SQS functionality on GAE
(http://jsonqueue.appspot.com)
- Standalone server, uses Python, Cyclone, Twisted and Redis.
- COMET consumer (bind your http client and get objects)
6. RestMQ and Redis
For each queue <<q>> in Redis:
q:uuid - The queue unique id counter
q:queue - The queue LIST (fifo)
q:control - Queue pause control
q:<<id>> - objects in queue.
Global:
QUEUESET: SET containing all queues
Also: Persistence, statistics
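The key layout above can be sketched in a few lines of Python. This is a minimal illustration, not RestMQ's actual code: FakeRedis is a hypothetical in-memory stand-in for the handful of Redis commands involved (INCR, SET, GET, LPUSH, RPOP, SADD), the function names queue_add and queue_get are made up for this example, and the q:control pause key is left out.

```python
from collections import deque


class FakeRedis:
    """Hypothetical in-memory stand-in for the Redis commands used below."""

    def __init__(self):
        self.strings = {}  # values for INCR / SET / GET
        self.lists = {}    # values for LPUSH / RPOP
        self.sets = {}     # values for SADD / SMEMBERS

    def incr(self, key):
        self.strings[key] = int(self.strings.get(key, 0)) + 1
        return self.strings[key]

    def set(self, key, value):
        self.strings[key] = value

    def get(self, key):
        return self.strings.get(key)

    def lpush(self, key, value):
        self.lists.setdefault(key, deque()).appendleft(value)

    def rpop(self, key):
        items = self.lists.get(key)
        return items.pop() if items else None

    def sadd(self, key, member):
        self.sets.setdefault(key, set()).add(member)

    def smembers(self, key):
        return set(self.sets.get(key, set()))


def queue_add(r, queue, value):
    """Store an object and push its key onto the queue LIST."""
    uid = r.incr(f"{queue}:uuid")          # q:uuid, the id counter
    object_key = f"{queue}:{uid}"          # q:<<id>>, the stored object
    r.set(object_key, value)
    r.lpush(f"{queue}:queue", object_key)  # q:queue, the FIFO LIST
    r.sadd("QUEUESET", queue)              # global SET of queue names
    return object_key


def queue_get(r, queue):
    """Pop the oldest object key and return (key, value), or None if empty."""
    object_key = r.rpop(f"{queue}:queue")
    if object_key is None:
        return None
    return object_key, r.get(object_key)


r = FakeRedis()
queue_add(r, "test", '{"msg": "first"}')
queue_add(r, "test", '{"msg": "second"}')
first = queue_get(r, "test")
second = queue_get(r, "test")
empty = queue_get(r, "test")
print(first, second, empty)
```

Pushing with LPUSH and popping with RPOP is what gives the LIST its FIFO behavior: the first object added is the first one returned.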
7. An async/sharding Redis client
Original python clients:
redis.py: Synchronous
txredis: Incomplete
Needed:
Async client, with connection pool and sharding (well sharding
is a plus).
http://github.com/fiorix/txredisapi
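To make the sharding requirement concrete, here is a minimal sketch of consistent hashing, one common way to spread keys across several Redis servers. It is not txredisapi's implementation: the HashRing class, the replica count, and the node names are all assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping keys to node names (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Each node gets several points on the ring to even out the load.
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def get_node(self, key):
        if not self._ring:
            raise ValueError("ring has no nodes")
        # First point clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[index % len(self._ring)][1]


# Hypothetical server names.
nodes = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]
ring = HashRing(nodes)
keys = [f"q:{i}" for i in range(1000)]
before = {key: ring.get_node(key) for key in keys}

ring.add_node("redis-d:6379")
after = {key: ring.get_node(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(len(moved), "of", len(keys), "keys moved")
```

The point of the ring is that adding a server only relocates the keys that now belong to the new server; every other key stays where it was.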
8. Web app framework
The original RestMQ was twisted.web based. Cool, but too much
work.
http://github.com/fiorix/cyclone
A Twisted-based Tornado clone. COMET is a breeze, lots of web
framework stuff, JSON encode/decode support built in.
Integrates easily with txredisapi. The core queue protocol was
ported and extended from the GAE version.
9. RestMQ
COMET consumer
REST producer/consumer
JSON Based producer/consumer
COMET is pausable (start/stop control)
HTTP based. Even CURL can operate a MQ now.
Asynchronous I/O
Map/Reduce and Actors are a given (easy to implement,
example shipped)
http://github.com/gleicon/restmq
11. Brief MongoDB/Redis recap - Books
MongoDB:
Search tags for erlang or haskell:
db.books.find({"tags": { $in: ['erlang', 'haskell'] } })
Search tags for erlang AND haskell (no results):
db.books.find({"tags": { $all: ['erlang', 'haskell'] } })
This search yields 3 results:
db.books.find({"tags": { $all: ['programming', 'computing'] } })
Redis:
SINTER 'tag:erlang' 'tag:haskell'
0 results
SINTER 'tag:programming' 'tag:computing'
3 results: 1, 2, 3
SUNION 'tag:erlang' 'tag:haskell'
2 results: 2 and 3
SDIFF 'tag:programming' 'tag:haskell'
2 results: 1 and 2 (haskell is excluded)
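The Redis set commands on this slide behave like ordinary set algebra, which plain Python sets can mimic. The sketch below is an illustration only: the book data is not shown in the slides, so the tag SETs here are an assumption, chosen to be consistent with the results quoted above (books 1, 2 and 3 tagged programming and computing, book 2 tagged erlang, book 3 tagged haskell). The helper names simply echo the Redis command names.

```python
# Assumed tag SETs (book IDs per tag), inferred from the quoted results.
tags = {
    "tag:programming": {1, 2, 3},
    "tag:computing": {1, 2, 3},
    "tag:erlang": {2},
    "tag:haskell": {3},
}


def members(key):
    """Like Redis, treat a missing key as an empty set."""
    return tags.get(key, set())


def sinter(*keys):
    """Books present in every given SET (AND)."""
    return set.intersection(*(members(k) for k in keys))


def sunion(*keys):
    """Books present in at least one given SET (OR)."""
    return set.union(*(members(k) for k in keys))


def sdiff(first, *others):
    """Books in the first SET that are in none of the others."""
    return members(first).difference(*(members(k) for k in others))


print(sinter("tag:erlang", "tag:haskell"))          # no book has both tags
print(sinter("tag:programming", "tag:computing"))   # books 1, 2 and 3
print(sunion("tag:erlang", "tag:haskell"))          # books 2 and 3
print(sdiff("tag:programming", "tag:haskell"))      # books 1 and 2
```

SINTER corresponds to MongoDB's $all, and SUNION to $in, when each tag is kept as its own SET of book IDs.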
12. DOCDB
http://github.com/gleicon/docdb
Almost a document database.
eBook indexing - Basic IR procedure
tokenize(split) each word
take the stop words out
stemming
group words to make composed searches possible
Lots of wordSETs, but as more documents are stored, the growth
rate slows.
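The four IR steps listed above can be sketched as a small pipeline. This is not the DOCDB code: the stop-word list is a tiny sample, the stemmer is a toy suffix stripper rather than a real algorithm such as Porter's, and reading "group words" as adjacent word pairs is an assumption made for this example.

```python
import re

# Small sample list; a real stop-word list is much longer.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Toy stemmer: strip one common suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_terms(text):
    """Return (stems, adjacent stem pairs) for a piece of text."""
    words = [w for w in tokenize(text) if w not in STOP_WORDS]
    stems = [stem(w) for w in words]
    # Adjacent pairs allow composed (two-word) searches.
    pairs = [f"{a} {b}" for a, b in zip(stems, stems[1:])]
    return stems, pairs


stems, pairs = to_terms("The cats are searching the indexed books")
print(stems)
print(pairs)
```

Each stem (and, if wanted, each pair) would become the name of one wordSET holding the IDs of the books that contain it.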
13. DOCDB
Simulation of how many wordSETs each book would create:
$ python doc_to_sets.py books/10702.txt
5965
$ python doc_to_sets.py books/13437-8.txt
6125
$ python doc_to_sets.py books/2346.txt
1920
$ python doc_to_sets.py books/24022.txt
3470
$ python doc_to_sets.py books/advsh12.txt
5576
14. DOCDB
Simulation of how many wordSETs would be created, accumulating the result across books:
$ python doc_to_sets.py books/10702.txt books/13437-8.txt books/2346.txt books/24022.txt books/advsh12.txt
5965
9183
9426
10030
11400
That would mean 11400 SETs in Redis, each named after the stem of a word and
containing the IDs of the books that contain that word. Growth starts at 5965 new SETs for
the first book (when no SETs exist yet) and is 1370 new SETs between the last two documents.
Searching works as shown before: use SINTER, SUNION and SDIFF to find
books by words.
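The accumulation and the search can be illustrated with a small in-memory index. This is a sketch under stated assumptions, not doc_to_sets.py itself: the three one-line "books" are made up, words are indexed without stemming or stop-word removal to keep it short, and Python sets stand in for Redis SETs. The last lines derive the per-book growth from the cumulative totals printed on the slide.

```python
import re
from collections import defaultdict


def index_books(books):
    """Build {word: set of book IDs} and record the SET count after each book."""
    index = defaultdict(set)
    totals = []
    for book_id, text in books.items():
        for word in set(re.findall(r"[a-z]+", text.lower())):
            index[word].add(book_id)
        totals.append(len(index))
    return index, totals


# Made-up sample documents.
books = {
    1: "redis stores sets and lists",
    2: "sets and lists index books",
    3: "redis lists",
}
index, totals = index_books(books)
print(totals)

# Searching books by words, mirroring SINTER, SUNION and SDIFF.
both = index["redis"] & index["lists"]
either = index["sets"] | index["redis"]
without = index["lists"] - index["redis"]
print(both, either, without)

# New SETs added per book, from the cumulative totals on the slide.
slide_totals = [5965, 9183, 9426, 10030, 11400]
new_sets = [b - a for a, b in zip([0] + slide_totals, slide_totals)]
print(new_sets)
```

In the toy data the third book adds no new SETs at all, which is the effect the slide describes: once common words are indexed, later documents mostly add IDs to existing SETs.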
15. The End
- Check the project's website: http://code.google.com/p/redis/
- Python/Twisted driver: http://github.com/fiorix/txredisapi (connection pool, consistent hashing)
- No silver bullet
- Plan ahead, use IR techniques
- Own your data
- SETs and LISTs are building blocks for most operations regarding indexes. Use them.
- http://code.google.com/p/redis/wiki/IntroductionToRedisDataTypes - Intro to Redis DataTypes
- More about its features: http://code.google.com/p/redis/wiki/Features
- http://code.google.com/p/redis/wiki/TwitterAlikeExample - Twitter clone using Redis