The ColumnFamily data model and wide-row support provide the ability to store and access data efficiently in a de-normalized state. Recent enhancements for CQL's sparse tables and built-in indexing provide the capability to store data in a manner similar to that of relational databases. For many use cases hybrid approaches are needed, because complete de-normalization is appropriate for some access patterns whereas more structured data is appropriate for others. At times a single logical event becomes multiple insertions across multiple column families. Likewise, a user request might require several reads across different column families. This talk describes some of these scenarios and demonstrates how advanced operations such as multi-step procedures, filtering, intersection, and paging can be implemented client side or server side with the help of the IntraVert plugin.
4. You can query up a storm
● SELECT firstname,lastname FROM users WHERE username='tcodd';
firstname | lastname
-----------+----------
Ted | Codd
● SELECT * FROM videos WHERE videoid = 'b3a76c6b-7c7f-4af6-964f-803a9283c401' and videoname>'N';
videoid | videoname | description | tags | upload_date | username
b3a76c6b-7c7f-4af6-964f-803a9283c401 | Now my dog plays piano! | My dog learned to play the piano because of the cat. | dogs,piano,lol | 2012-08-30 16:50:00+0000 | ctodd
6. ● Can I slice a slice (or sub query)?
● Can I do advanced WHERE clauses?
● Can I union two slices server side?
● Can I join data from two tables without two
request/response round trips?
● What about procedures?
● Can I write functions or aggregation functions?
7. Let's look at the APIs we have
http://www.slideshare.net/aaronmorton/apachecon-nafeb2013
8. But none of those APIs do what I
want, and it seems simple
enough to do...
10. Why not just do it client side?
● Move processing close to data
– Idea borrowed from Hadoop
● Doing work close to the source can result in:
– Less network IO
– Less memory spent encoding/decoding 'throw-away' data
– New storage and access paradigms
11. Vert.x + Cassandra
● What is Vert.x?
– Distributed Event Bus which spans the server and
even penetrates into the client side for effortless
'real-time' web applications
● What are the cool features?
– Asynchronous
– Hot re-loadable modules
– Modules can be written in Groovy, Ruby, Java, JavaScript
http://vertx.io
13. HTTP Transport
● HTTP is easy to use on firewalled networks
● Easy to secure
● Easy to compress
● The de facto way to do everything anyway
● IntraVert attempts to limit round trips
– Not to provide a terse binary format
14. JSON Payload
● Simple nested types like list, map, String
● Request is composed of N operations
● Each operation has a configurable timeout
● Again, IntraVert attempts to limit round trips
– Not to provide a terse message format
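As a rough illustration of a request composed of N operations, the sketch below batches several operations into one JSON body so that a single HTTP round trip carries all of them. Only the `type`/`op` shape appears in this deck; the wrapper key `operations`, the `timeoutMs` field, the `SLICE` type and its parameters are hypothetical names chosen for illustration, not IntraVert's documented wire format.

```javascript
// Sketch only: "operations", "timeoutMs", "SLICE" and its parameters are
// hypothetical names, not IntraVert's documented format.
function buildRequest(operations, defaultTimeoutMs) {
  return {
    operations: operations.map(function (o) {
      return {
        type: o.type,
        op: o.op,
        // Each operation carries its own timeout, falling back to a default.
        timeoutMs: o.timeoutMs !== undefined ? o.timeoutMs : defaultTimeoutMs
      };
    })
  };
}

var stoutFilter =
  "function(row) { if (row['value'] == 'Breakfast Stout') return row; else return null; }";

var request = buildRequest([
  { type: "CREATEFILTER", op: { name: "stouts", spec: "javascript", value: stoutFilter } },
  { type: "SLICE", op: { rowkey: "beers", start: "a", end: "z" }, timeoutMs: 500 }
], 10000);

// One serialized body, sent once, regardless of how many operations it holds.
var body = JSON.stringify(request);
console.log(request.operations.length + " operations in one request body");
```

The point of the design is that the payload stays plain nested JSON (lists, maps, strings), so any HTTP client can produce it without generated code.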
15. Why not use lightning-fast transport
and serialization library X?
● X's language/code gen issues
● You probably cannot tcpdump X
● Net-admins don't like 90 jars for health checks
● IntraVert attempts to limit round-trips:
– Prepared statements
– Server side filtering
– Other cool stuff
19. Application requirement
● User wants to know which beers are
“Breakfast Stouts”
● Common “solutions”:
– Write a copy of the data sorted by type
– Request all the data and parse on client side
20. Using an IntraVert filter
● Send a function to the server
● Function is applied to subsequent get or slice
operations
● Only results of the filter are returned to the
client
21. Defining a filter in JavaScript
● Syntax to create a filter
{
"type": "CREATEFILTER",
"op": {
"name": "stouts",
"spec": "javascript",
"value": "function(row) { if (row['value'] == 'Breakfast Stout') return row; else return null; }"
}
},
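To see what such a filter does to a slice result, the sketch below evaluates the same function locally against a few sample rows. The sample rows and the `applyFilter` helper are hypothetical and only mimic the behavior described on these slides (rows for which the function returns null are dropped); how IntraVert evaluates the function on the server is not shown here.

```javascript
// The filter function from the slide, as a plain JavaScript function.
var stouts = function (row) {
  if (row['value'] == 'Breakfast Stout') return row; else return null;
};

// Hypothetical helper: keep only rows for which the filter returns non-null.
function applyFilter(filter, rows) {
  var kept = [];
  rows.forEach(function (row) {
    var result = filter(row);
    if (result !== null && result !== undefined) kept.push(result);
  });
  return kept;
}

// Made-up sample data standing in for a slice result.
var slice = [
  { name: 'beer1', value: 'Breakfast Stout' },
  { name: 'beer2', value: 'IPA' },
  { name: 'beer3', value: 'Breakfast Stout' }
];

var filtered = applyFilter(stouts, slice);
console.log(filtered.map(function (r) { return r.name; }));
```

When the same step runs next to the data, only the rows that pass the filter need to be encoded and sent back to the client.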
22. Defining a filter in Groovy/Java
● We can define a Groovy closure or Java filter
{
"type": "CREATEFILTER",
"op": {
"name": "stouts",
"spec": "groovy",
"value": "{ row -> if (row['value'] == 'Breakfast Stout') return row else return null }"
}
},
29. Application Requirements
● User wishes to intersect the column names of
two slices/queries
● Common “solutions”
– Pull all results to client and apply the intersection
there
30. Server Side MultiProcessor
● Send a class that implements the MultiProcessor
interface to the server
● public List<Map> multiProcess(Map<Integer,Object> input, Map params);
● Do one or more get/slice operations as input
● Invoke MultiProcessor on input
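The intersection requirement can be sketched as a function with the same inputs as `multiProcess`. This is a JavaScript rendering for illustration only, not the Java class that would be sent to the server; it assumes that `input` maps each operation id to a list of `{name, value}` columns, which the slides do not specify.

```javascript
// Sketch: intersect the column names of every slice result in `input`.
// Assumption: input[id] is a list of {name, value} columns for operation `id`.
function multiProcess(input, params) {
  var ids = Object.keys(input);
  if (ids.length === 0) return [];

  // Start with the column names of the first result...
  var common = input[ids[0]].map(function (c) { return c.name; });

  // ...and keep only the names that also appear in every other result.
  ids.slice(1).forEach(function (id) {
    var names = new Set(input[id].map(function (c) { return c.name; }));
    common = common.filter(function (n) { return names.has(n); });
  });

  return common.map(function (n) { return { name: n }; });
}

// Made-up results of two earlier slice operations, keyed by operation id.
var result = multiProcess({
  1: [{ name: 'a', value: 1 }, { name: 'b', value: 2 }, { name: 'c', value: 3 }],
  2: [{ name: 'b', value: 9 }, { name: 'c', value: 8 }, { name: 'd', value: 7 }]
}, {});

console.log(JSON.stringify(result));
```

Run on the server, only the intersected names cross the network instead of both full slices.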
34. Imagine you want to insert this data
● User wishes to record this event in multiple column
families
– 09/10/2011 11:12:13
– App=Yahoo
– Platform=iOS
– OS=4.3.4
– Device=iPad2,1
– Resolution=768x1024
– Events: videoPlayPercent=38, Taste=great
http://www.slideshare.net/charmalloc/jsteincassandranyc2011
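One way a single logical event can turn into several insertions is sketched below: the event is fanned out into one write per dimension and per measured value, and the resulting writes could then travel together in a single request. The column family names (one per dimension), the row-key scheme, and the shape of the write objects are all hypothetical; the slides do not specify them.

```javascript
// The event from the slide, split into dimensions and measured values.
var event = {
  time: "09/10/2011 11:12:13",
  dimensions: {
    App: "Yahoo",
    Platform: "iOS",
    OS: "4.3.4",
    Device: "iPad2,1",
    Resolution: "768x1024"
  },
  events: { videoPlayPercent: 38, Taste: "great" }
};

// Hypothetical fan-out: one write per (dimension, measured value) pair.
function fanOut(event) {
  var writes = [];
  Object.keys(event.dimensions).forEach(function (dim) {
    Object.keys(event.events).forEach(function (name) {
      writes.push({
        columnFamily: "by_" + dim.toLowerCase(), // hypothetical naming scheme
        rowKey: event.dimensions[dim],
        column: event.time + ":" + name,
        value: String(event.events[name])
      });
    });
  });
  return writes;
}

var writes = fanOut(event);
console.log(writes.length + " writes from one logical event");
```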
38. IntraVert status
● Still pre 1.0
● Good docs
– https://github.com/zznate/intravert-ug/wiki/_pages
● Functionally equivalent to Thrift (most features
ported)
● CQL support
● Virgil (coming soon)
● HBase-like scanners (coming soon)