SpringOne 2GX 2010: Spring and non-relational data
1. Chicago, October 19 - 22, 2010
Using Spring with NoSQL databases
Mark Pollack
Chris Richardson
2. Goal of this talk
• The benefits and drawbacks of NoSQL databases
• How the Spring Data project simplifies the development of NoSQL applications
3. About Chris
• Grew up in England and live in Oakland, CA
• 25+ years of software development experience, including 14 years of Java
• Speaker at JavaOne, SpringOne, NFJS, JavaPolis, Spring Experience, etc.
• Organize the Oakland JUG and the Groovy Grails meetup
http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/
4. About Mark
• Spring committer since 2003
• Founder of Spring.NET
• Work on some other new open source projects
– Spring Gemfire
– Spring AMQP
• Used to work on large scale data collection/analysis in
High Energy Physics (~10yrs ago)
5. Agenda
• Why NoSQL?
• Some NoSQL Databases
• Spring NoSQL projects
• Demos/Code/Deep dive
7. Relational databases are great
• Well understood by developers
• Well supported by frameworks and tools, e.g. Spring JDBC,
Hibernate, JPA
• Well understood by operations
– Configuration
– Care and feeding
– Backups
– Tuning
– Failure and recovery
– Performance characteristics
• But….
8. The trouble with relational databases
• Object/relational impedance mismatch
– Complicated to map rich domain model to relational schema
– Relational schema is rigid
• Extremely difficult/impossible to scale writes:
– Vertical scaling is limited/expensive
– Horizontal scaling is limited or requires $$
• Performance can be suboptimal
9. Why NoSQL?
Internet scale is the primary driver in the use of non-relational DBs
Large Data Size – TBs to PBs
High Transaction rates
Need for High Availability
Pictures courtesy of Gilt Group
Relational DBs
Harder to scale to larger size
Transactions and joins reduce access rates
10. Different priorities
• Relational Databases with ACID semantics
– Atomic, Consistent, Isolated, Durable
• Internet Systems with BASE semantics
– Basically Available, Soft state, Eventually consistent
• NoSQL/BASE values remove standard DB behaviors
– No Joins
– Very limited "transactions"
– Different Durability/consistency guarantees
• NoSQL datastores emerge as “point solutions”
– Amazon/Google papers
– Facebook, LinkedIn …
11. The NoSQL Landscape
There are a variety of non-relational datastores
Many are Open Source
Many with companies behind them
Rapidly changing feature sets
Scalaris, Neo4J, db4o, Objectivity, Amazon Dynamo, Terrastore, Gemfire, Sones, HBase, CouchDB, InfoGrid, RavenDB, Hadoop, Tokyo Cabinet, MongoDB, Hypertable, Cassandra, Azure Table, Coherence, Mnesia, Voldemort, Gigaspaces, Riak, Amazon SimpleDB, Redis, BerkeleyDB, MemcacheDB, Tamino, InfiniteGraph, Chordless
Expect consolidation over time
With so many choices, picking which one to use can be difficult
12. Stops on a tour through NoSQL-land
What is the data model?
How do I access my data?
What are the primary use cases?
Which notable companies are using it?
13. Data Model
Key Value
Key -> value
Redis, Riak, Voldemort…
“Amazon Dynamo inspired”
K1 V1
K2 V2
K3 V2
Column
Key -> ‘column families’
HBase, Cassandra
“Google Bigtable inspired”
14. Data Model
Document
Collections containing semi-structured data: key-values, JSON, XML
CouchDB, MongoDB
{ id: '4b2b9f67a1f631733d917a7b',
  author: 'joe',
  tags: ['example', 'db'],
  comments: [ { author: 'jim', comment: 'OK' },
              { author: 'ida', comment: 'Bad' } ]
}
{ id: '4b2b9f67a1f631733d917a7c',
  author: 'ida', ...
{ id: '4b2b9f67a1f631733d917a7d',
  author: 'jim', ...
Graph
Graph: nodes and edges
Edges can be directional
Each can contain key-value
attributes
Neo4j, Sones, InfiniteGraph
16. How do I access my data?
Query Mechanism
Key lookup
Map-Reduce
Query by example
Query language
‘out of box’ query power usually linked to data model
Increasing power
KV -> Column -> Document/Graph
Counter examples
Riak is key-value and supports Map-Reduce
Spring Data Mapper…
17. Reality Check - Relational DBs are not going away
NoSQL usage small
by comparison…
But growing…
18. Future = multi-paradigm data storage for
enterprise applications
IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg
19. Agenda
• Why NoSQL?
• Some NoSQL Databases
• Spring NoSQL projects
• Demos/Code/Deep dive
20. Redis
• Advanced key-value store
– Think memcached on steroids (the good kind)
– Values can be binary strings, Lists, Sets, Ordered Sets, Hash maps, ..
– Operations for each data type, e.g. appending to a list, adding to a set, retrieving a slice of a list, …
• Very fast: ~100K operations/second on entry-level hardware
• Persistent
– Periodic snapshots of memory OR append commands to log file
– Limit: all keys must be retained in memory
• Has “transactions”
• Libraries – Many languages, all separate open source projects
(Diagram: key-value pairs K1 → V1, K2 → V2, K3 → V2)
21. Redis CLI
redis> sadd myset a
(integer) 1
redis> sadd myset b
(integer) 1
redis> smembers myset
1. "a"
2. "b"
redis> srem myset a
(integer) 1
redis> smembers myset
1. "b"
22. Scaling Redis
• Master/slave replication
– Tree of Redis servers
– Non-persistent master can replicate to a persistent slave
– Use slaves for read-only queries
• Sharding
– Client-side only – consistent hashing based on key
– Server-side sharding – coming one day
• Run multiple servers per physical host
– Leverage multiple CPUs
– 32-bit builds are more memory-efficient than 64-bit
• Optional "virtual memory"
– Ideally data should fit in RAM
– Values (not keys) written to disk
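Client-side sharding by consistent hashing can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class name ConsistentHashRing, the server names, the use of MD5, and the number of virtual nodes per server are all hypothetical choices, not the behaviour of any particular Redis client library.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

/** Illustrative client-side shard selector; not part of any Redis client library. */
final class ConsistentHashRing {
    // Ring position -> server name, kept sorted so the next position can be found quickly.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashRing(List<String> servers, int virtualNodesPerServer) {
        if (servers.isEmpty() || virtualNodesPerServer < 1) {
            throw new IllegalArgumentException("need at least one server and one virtual node");
        }
        // Several positions per server spread the keys more evenly around the ring.
        for (String server : servers) {
            for (int i = 0; i < virtualNodesPerServer; i++) {
                ring.put(hash(server + "#" + i), server);
            }
        }
    }

    /** Returns the server owning the first ring position at or after the key's hash. */
    String serverFor(String key) {
        Long position = ring.ceilingKey(hash(key));
        if (position == null) {
            position = ring.firstKey(); // past the last position: wrap around the ring
        }
        return ring.get(position);
    }

    /** First 8 bytes of the MD5 digest as a long (assumed hash; any stable hash would do). */
    private static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(
                Arrays.asList("redis-a", "redis-b", "redis-c"), 100);
        for (String key : Arrays.asList("trader:1", "s1", "s2")) {
            System.out.println(key + " -> " + ring.serverFor(key));
        }
    }
}
```

The point of the ring is that removing a server only moves the keys that server owned; keys on the remaining servers keep their placement.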
23. Redis use cases
• Use in conjunction with another database as the system of record (SOR)
• Drop-in replacement for Memcached
– Session state
– Cache of data retrieved from SOR
• Denormalized datastore for high-performance queries
• Hit counts using INCR command
• Randomly selecting an item – SRANDMEMBER
• Queuing – Lists with LPOP, RPUSH, ….
• High score tables – Sorted sets
• Notable users: github, guardian.co.uk, ….
24. Cassandra
• Originally developed by Facebook for inbox search
– Now an Apache open-source project
• Extremely scalable
• Data is replicated and sharded
• The data model will hurt your brain
• Column-oriented database
• 4 or 5-dimensional hash map
25. Cassandra data model – insert/update
My Column family (within a key space)
Keys  Columns
a     colA: value1  colB: value2  colC: value3
b     colA: value   colD: value   colE: value
Annotation: “This can be data!”
Insert(key=a, columnName=colZ, value=foo)
Keys  Columns
a     colA: value1  colB: value2  colC: value3  colZ: foo
b     colA: value   colD: value   colE: value
26. Cassandra query – range slice
Keys Columns
a colA: value1 colB: value2 colC: value3 colZ: foo
b colA: value colD: value colE: value
rangeSlice(startKey=a, endKey=b,
startColumn=colA, endColumnName=colB)
Keys Columns
a colA: value1 colB: value2
b colA: value
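The insert and range-slice behaviour shown on the last two slides can be mimicked with nested sorted maps. The sketch below is an in-memory stand-in, not a Cassandra client: the class ColumnFamilySketch and its method names are hypothetical, both bounds are assumed to be inclusive, and row keys are assumed to be stored in sorted order.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory stand-in for one column family; illustrative only, not a Cassandra client. */
final class ColumnFamilySketch {
    // Row key -> (column name -> value), both kept in sorted order.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    /** Insert and update are the same operation: set one column of one row. */
    void insert(String key, String columnName, String value) {
        rows.computeIfAbsent(key, k -> new TreeMap<String, String>()).put(columnName, value);
    }

    /** Returns rows in [startKey, endKey], keeping only columns in [startColumn, endColumn]. */
    NavigableMap<String, NavigableMap<String, String>> rangeSlice(
            String startKey, String endKey, String startColumn, String endColumn) {
        NavigableMap<String, NavigableMap<String, String>> result = new TreeMap<>();
        for (Map.Entry<String, NavigableMap<String, String>> row
                : rows.subMap(startKey, true, endKey, true).entrySet()) {
            NavigableMap<String, String> columns =
                    row.getValue().subMap(startColumn, true, endColumn, true);
            if (!columns.isEmpty()) {
                result.put(row.getKey(), new TreeMap<>(columns));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ColumnFamilySketch cf = new ColumnFamilySketch();
        cf.insert("a", "colA", "value1");
        cf.insert("a", "colB", "value2");
        cf.insert("a", "colC", "value3");
        cf.insert("a", "colZ", "foo");
        cf.insert("b", "colA", "value");
        cf.insert("b", "colD", "value");
        cf.insert("b", "colE", "value");
        // prints {a={colA=value1, colB=value2}, b={colA=value}}
        System.out.println(cf.rangeSlice("a", "b", "colA", "colB"));
    }
}
```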
27. Cassandra CLI
$ bin/cassandra-cli
cassandra> connect localhost/9160
Connected to "Test Cluster" on localhost/9160
cassandra> get Keyspace1.Standard1['foo']
Returned 0 results
cassandra> set Keyspace1.Standard1['foo']['a'] = '1'
Value inserted
cassandra> get Keyspace1.Standard1['foo']
=> (Column=61, value=1, timestamp...)
cassandra> set Keyspace1.Standard1['foo']['bar'] = 'abc'
Value inserted
cassandra> get Keyspace1.Standard1['foo']
=> (Column=626172, value=abc, timestamp...)
=> (Column=61, value=1, timestamp...)
cassandra> count Keyspace1.Standard1['foo']
2 columns
cassandra>
28. Scaling Cassandra through sharding/replication
• Client connects to any node
• Dynamically add/remove nodes
• Configurable # replicas
• Reads/Writes specify how many nodes
(Diagram: a ring of four nodes connected by arrows labeled “Replicates to”)
Node 1: Token = A, Keys = [D, A]
Node 2: Token = B, Keys = [A, B]
Node 3: Token = C, Keys = [B, C]
Node 4: Token = D, Keys = [C, D]
29. Cassandra use cases
• Use cases
• Big data
• Persistent cache
• (Write intensive) Logging
• Who is using it
– Digg, Facebook, Twitter, Reddit, Rackspace
– Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX
– “The largest production cluster has over 100 TB of data in over 150 machines.” – Cassandra web site
30. MongoDB
• JSON-style documents
– Lists, Maps, Primitive values
• Documents organized into collections (~table)
• Full or partial document updates
– Transactional update in place on one document
– Atomic Modifiers
• GridFS for efficiently storing large files
• Index support – secondary and compound
• Rich query language for dynamic queries
• Map/Reduce
• Replication and Auto Sharding
32. MongoDB query by example
• Find a restaurant that serves the 94619 zip code and is
open at 6pm on a Monday
{
  serviceArea: "94619",
  openingHours: {
    $elemMatch: {
      "dayOfWeek": "Monday",
      "open": {$lte: 1800},
      "close": {$gte: 1800}
    }
  }
}
DBCursor cursor = collection.find(qbeObject);
while (cursor.hasNext()) {
  DBObject o = cursor.next();
  …
}
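What $elemMatch adds is that all three conditions must hold for the same element of openingHours, not for different elements. The plain-Java sketch below spells out that matching rule in memory; ElemMatchSketch, OpeningHours and matches are hypothetical names, and nothing here calls the MongoDB driver.

```java
import java.util.Arrays;
import java.util.List;

/** In-memory illustration of the query's matching rule; not MongoDB driver code. */
final class ElemMatchSketch {
    static final class OpeningHours {
        final String dayOfWeek;
        final int open;   // e.g. 1730 for 5:30pm
        final int close;  // e.g. 2130 for 9:30pm

        OpeningHours(String dayOfWeek, int open, int close) {
            this.dayOfWeek = dayOfWeek;
            this.open = open;
            this.close = close;
        }
    }

    /** True if the zip code is served and ONE opening-hours entry satisfies all conditions. */
    static boolean matches(List<String> serviceArea, List<OpeningHours> openingHours,
                           String zipCode, String dayOfWeek, int time) {
        if (!serviceArea.contains(zipCode)) {
            return false;
        }
        for (OpeningHours h : openingHours) {
            if (h.dayOfWeek.equals(dayOfWeek) && h.open <= time && h.close >= time) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> area = Arrays.asList("94619", "94618");
        List<OpeningHours> hours = Arrays.asList(
                new OpeningHours("Monday", 1130, 1430),
                new OpeningHours("Tuesday", 1730, 2130));
        // Monday lunch and Tuesday dinner: closed on Monday at 6pm, although each
        // condition taken separately is satisfied by some element of the list.
        System.out.println(matches(area, hours, "94619", "Monday", 1800));  // false
        System.out.println(matches(area, hours, "94619", "Tuesday", 1800)); // true
    }
}
```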
34. Scaling MongoDB
(Diagram: a client talks to mongos routers, which use the Config Server mongod processes and route to Shard 1 and Shard 2; each shard is a group of mongod processes)
• A shard consists of a replica set = generalization of master slave
• Collections spread over multiple shards
35. MongoDB use cases
• Use cases
– Real-time analytics
– Content management systems
– Single document partial update
– Caching
– High volume writes
• Who is using it?
– Shutterfly, Foursquare
– Bit.ly, Intuit
– SourceForge, NY Times
– GILT Groupe, Evite
– SugarCRM
36. CouchDB
• JSON documents
• Database is flat collection of documents
• Queries are implemented as views using Map/Reduce in JavaScript
– Queries typically defined up front
– Views are indexed and incrementally updated
• MVCC based
– Need to deal with reconciliation
• Multi-master (bidirectional) replication
• Scale out with replication
• Mobile version - Android
• Use as a Server-Side JavaScript application server
– “CouchApps”
• Cloudant provides Java VM based views
37. CouchDB with curl
$ curl -X PUT http://localhost:5984/contacts
{"ok":true}
$ curl -X PUT http://localhost:5984/contacts/markpollack -d \
'{"firstName":"Mark",
"lastName":"Pollack",
"email":"mpollack@vmware.com"}'
{"ok":true,"id":"markpollack","rev":"1-7bb43f219a8d4306614dbd526ed577fe"}
$ curl -X GET http://localhost:5984/contacts/markpollack
{"_id":"markpollack",
"_rev":"1-7bb43f219a8d4306614dbd526ed577fe",
"firstName":"Mark",
"lastName":"Pollack",
"email":"mpollack@vmware.com"}
40. CouchDB use cases
• Use cases
– Synchronization
• master-master replication
• Offline replication for mobile devices: “Ground Computing”
– Web apps that consume JSON
• Server side JavaScript using CouchApps
– Content management systems
– Caching
• Who is using it?
– Mozilla, BBC, Palm, Apple
– IBM, Ubuntu, meebo, Engine Yard
– Cloudant, UNICEF, Credit Suisse
41. Neo4j
• Graph Database
• DB is a collection of graph nodes, relationships
– Nodes and relationships have properties
• Querying is done via a traversal API
• Indexes on node/relationship properties
• Written in Java, can be embedded
– Also a REST API
• Transactional (ACID)
• Master-Slave replication in next release
43. Neo4j Use Cases
• Use Cases
– Social networking
– Bioinformatics
– Fraud detection
• Who is using it?
– Fanbox, Jayway
– Large companies that wish to remain anonymous
44. Agenda
• Why NoSQL?
• Some NoSQL Databases
• Spring NoSQL projects
• Demos/Code/Deep dive
45. Spring Data Project Goals
• Bring classic Spring value propositions to NoSQL
– Productivity
– Programming model consistency
• Support for a wide range of NoSQL databases
• Many entry points to use
– Low level data access APIs
– Opinionated APIs (Think JdbcTemplate)
– Object Mapping (Java and GORM)
– Cross Store Persistence Programming model
– Productivity support in Roo and Grails
– Vertical Applications
– Guidance
46. Spring Data
• Working together with various projects/companies
– Redis / VMware
– Riak / Basho
– CouchDB / CouchOne
– Mongo / 10gen
– Neo4j / Neo Technology
47. Spring Data Projects
The big picture (diagram; some boxes are marked “Planned”)
Data Commons: Polyglot persistence
Datastore Key-Value: Redis
Datastore Key-Value: Riak
Datastore Document: MongoDB, CouchDB
Datastore Graph: Neo4j
Datastore Column: Cassandra, HBase
Datastore Mapping: Redis, Gemfire
Datastore Mapping: Additional DB Map
Hadoop
Blob storage: Amazon, Atmos, Azure
SQL - Generic DAOs
Grails/Roo support
Guidance Docs
Reaching M1 release in Q4.
48. Spring Datastore Key-Value (Redis)
Interface based low-level Redis Client API
Method names are Redis command names
Pluggable implementations (Jedis, JRedis)
Translation to Spring DAO exception hierarchy
Connection Pooling
RedisTemplate
Descriptive method names, grouped into natural categories
ServerOps, SetOps
Convenience methods – ‘redis recipes’
getAndConvert, convertAndSet for payload/value serialization
Java, JSON, XML
Protocol Buffers, Avro, to come…
Redis backed java.util.Set, List, Map, Capped Collections
RedisPlatformTransactionManager
Redis Messaging
RedisMessageTemplate, RedisMessageListenerContainer
49. Spring Redis Examples
RedisClientFactory clientFactory = new JedisClientFactory();
RedisClient client = clientFactory.createClient();
client.sadd("s1", "1"); client.sadd("s1", "2");
client.sadd("s1", "3");
client.sadd("s2", "2"); client.sadd("s2", "3");
Set<String> intersection = client.sinter("s1", "s2");
System.out.println(intersection); // [3,2]
RedisTemplate redisTemplate = new RedisTemplate(client);
Person p = new Person("Joe", "Trader", 33);
redisTemplate.convertAndSet("trader:1", p);
p = redisTemplate.getAndConvert("trader:1", Person.class);
intersection =
redisTemplate.getSetOps().getIntersectionOfSets("s1", "s2");
RedisSet s1 = new RedisSet(redisTemplate, "s1");
s1.add("1"); s1.add("2"); s1.add("3");
RedisSet s2 = new RedisSet(redisTemplate, "s2");
s2.add("2"); s2.add("3");
Set s3 = s1.intersection("s3", s1, s2); // [3,2]
50. Spring MongoDB
• MongoTemplate
– ‘RowMapper’ like functionality for mapping Mongo query
results.
– Basic POJO mapping support for simple CRUD
• Leverage Spring 3.0 TypeConverters and SpEL
– Connection affinity callback
• Support common map-reduce operations from Mongo
Cookbook
• Higher level integration
– File Upload
– Spring MVC activity monitoring and analysis
51. Spring MongoDB Example
Mongo mongo = new Mongo();
DB db = mongo.getDB("mvc");
MongoTemplate mongoTemplate = new MongoTemplate(db,
"people");
Person p = new Person("Joe", "Trader", 33);
mongoTemplate.save(p);
List<Person> people =
mongoTemplate.queryForCollection(Person.class);
52. Spring CouchDB
CouchTemplate
RestTemplate based API for CouchDB
Basic JSON/POJO mapping support
CouchDbMappingJacksonHttpMessageConverter
Android implementation
.NET implementation planned
Repository base classes
Out of the box CRUD
Embedded View Definitions
Automatic view generation
Plans to work with the Ektorp open source project
http://code.google.com/p/ektorp/
53. Spring CouchDB Example
RestTemplate restTemplate = new RestTemplate();
restTemplate.setMessageConverters(couchDbConverter);
List<Restaurant> rests = (List<Restaurant>)
restTemplate.getForObject(getDbURL() +
"_design/demo/_view/all", Restaurant.class);
UserAccount userAccount = new UserAccount("u1");
restTemplate.put(getDbURL() + userAccount.getUserName(),
userAccount);
//Note, not handling response to get new revision string…
55. Spring Neo4j
• Easy POJO based programming model
• Support for graph Entities, Relationships, and Properties
– Properties leverage Spring 3.0 converters
• DAO finder support
– findAll, count, findById, findByPropertyValue,
findAllByTraversal
See Graph Database Persistence Using Neo4j with Spring and Roo
Tomorrow @ 10:15am in Lilac C
56. Spring Cross Store
• Use Case
– An existing SQL-based application wants to use a NoSQL database for part of its data
• Pick a problem space for which NoSQL is well suited
• How to link together the two worlds?
– Developer application code task
– No longer a single domain entity
• Spring Cross store allows for a single entity to be
persisted across multiple datastores
– AspectJ manages NoSQL entities/fields in change sets
– Change sets are flushed when JPA commits
– Interception to rehydrate when performing JPA queries
60. Spring Mapping
• GORM for Redis, Gemfire….
– Built on pure Java
• Mapping metadata is not tied to GORM
– Can use annotations, XML etc.
• Will be working on making this mapping framework
available to Java developers
63. Agenda
• Why NoSQL?
• Some NoSQL Databases
• Spring NoSQL projects
• Demos/Code/Deep dive
64. Redis meets POJOs in Action
• Finding restaurants that are open at the delivery time (e.g. 6pm) and serve the delivery zip code (e.g. 94619)
select r.*
from FTGO_RESTAURANT_ZIPCODE zip,
     FTGO_RESTAURANT_TIME_RANGE tr, FTGO_RESTAURANT r
where r.RESTAURANT_ID = zip.RESTAURANT_ID
  and tr.RESTAURANT_ID = r.RESTAURANT_ID
  and zip.ZIPCODE = '94619'
  and tr.day_Of_week = 'Monday'
  and tr.open_time <= 1800
  and 1800 <= tr.close_time
Joins and complex where clause!
No way Redis can handle this! Right?
65. Redis – denormalizing the data is the key
Tr_id     Day of Week  Open_time  Close_time  Zip code
Ajanta10  Monday       1130       1430        94707
Ajanta10  Monday       1130       1430        94619
Ajanta11  Monday       1730       2130        94707
Ajanta11  Monday       1730       2130        94619
Egg30     Monday       0700       1430        94707
…
Two simpler queries
(SELECT tr_id
 FROM time_range_zip_code
 where day_of_week = ? and zip_code = ? and open_time < ?)
INTERSECT
(SELECT tr_id
 FROM time_range_zip_code
 where day_of_week = ? and zip_code = ? and ? < close_time)
66. Redis – storing the opening hours using sorted sets
Sorted Set
Keys               Members = TimeRangeId with opening time as score
open:94707:Monday  Ajanta10:1130 Ajanta11:1730
open:94619:Monday  Egg30:0700 Ajanta10:1130 Ajanta11:1730
ZRANGEBYSCORE open:94619:Monday 0 1800
{Egg30, Ajanta10, Ajanta11, …}
Finds all time ranges that open before 6pm
67. Redis – storing closing times with sorted sets
Sorted Set
Keys                Members = TimeRangeId with closing time as score
close:94707:Monday  Ajanta10:1430 Ajanta11:2130
close:94619:Monday  Egg30:1430 Ajanta10:1430 Ajanta11:2130
ZRANGEBYSCORE close:94619:Monday 1800 2400
{Ajanta11, …}
Finds all time ranges that close after 6pm
68. Redis – computing the intersection
{Egg30, Ajanta10, Ajanta11, …}
Intersection
{Ajanta11, …}
=
{Ajanta11, …}
• This is super-cool!
• Illustrates the feasibility of using Redis as a queryable
datastore to improve scalability
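The two range queries and the intersection can be traced in plain Java with the sample data from the previous slides. This is an in-memory sketch only: SortedSetQuerySketch and its methods are hypothetical names, each map stands in for one Redis sorted set (member to score), and the intersection is computed in application code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** In-memory stand-in for the two sorted sets of one zip code and day; not a Redis client. */
final class SortedSetQuerySketch {
    /** Mimics ZRANGEBYSCORE: members whose score lies in [min, max]. */
    static Set<String> zrangeByScore(Map<String, Integer> sortedSet, int min, int max) {
        Set<String> members = new TreeSet<>();
        for (Map.Entry<String, Integer> e : sortedSet.entrySet()) {
            if (e.getValue() >= min && e.getValue() <= max) {
                members.add(e.getKey());
            }
        }
        return members;
    }

    /** Time ranges that opened at or before the given time and close at or after it. */
    static Set<String> openAt(Map<String, Integer> openTimes,
                              Map<String, Integer> closeTimes, int time) {
        Set<String> result = zrangeByScore(openTimes, 0, time);
        result.retainAll(zrangeByScore(closeTimes, time, 2400)); // set intersection
        return result;
    }

    public static void main(String[] args) {
        // Scores are plain decimal ints: write 700, not 0700 (a leading 0 means octal in Java).
        Map<String, Integer> open = new HashMap<>();   // open:94619:Monday
        open.put("Egg30", 700);
        open.put("Ajanta10", 1130);
        open.put("Ajanta11", 1730);
        Map<String, Integer> close = new HashMap<>();  // close:94619:Monday
        close.put("Egg30", 1430);
        close.put("Ajanta10", 1430);
        close.put("Ajanta11", 2130);
        System.out.println(openAt(open, close, 1800)); // prints [Ajanta11]
    }
}
```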
69. Cassandra – Food to Go
AvailableRestaurants Column Family
Keys               Columns (super column, named by closing time)
94619_Monday_0700  1430: Egg30: {…}
94619_Monday_1130  1430: Ajanta10: {…}
94619_Monday_1730  2130: Ajanta11: {…}
70. Cassandra – Food to Go
rangeSlice(startKey=94619_Monday_0000,
           endKey=94619_Monday_1800,
           startColumn=1800,
           endColumnName=2359)
Keys               Columns
94619_Monday_1730  2130: Ajanta11: {…}
71. Food to Go - MongoDB
Document:
{
  "name" : "Ajanta",
  "type" : "Indian",
  "serviceArea" : [
    "94619",
    "94618"
  ],
  "openingHours" : [
    {
      "dayOfWeek" : "Monday",
      "open" : 1730,
      "close" : 2130
    }
  ],
  "_id" : ObjectId("4bddc2f49d1505567c6220a0")
}
Query:
{
  serviceArea: "94619",
  openingHours: {
    $elemMatch : {
      "dayOfWeek" : "Monday",
      "open": {$lte: 1800},
      "close": {$gte: 1800}
    }
  }
}
Straightforward to model a restaurant aggregate as a MongoDB document and execute queries
72. MongoDB – MVC Analytics
• Monitoring page view and web application activity is
critical to analyzing effectiveness of
– Marketing campaigns
– Site design / new features
• Standard solutions are problematic aside from scaling
– Google Analytics
• not real time, hard to setup
– MySQL
• Downtime for schema changes
• A solution needs to handle high write volume, handle large amounts of data, and allow for flexible data exploration.
73. Web Analytics - SQL
(Diagram: Application Server, MySQL Master, three MySQL Slaves, Reporting Server)
Still difficult to scale writes
74. Web Analytics - MongoDB
(Diagram: Application Server, mongod, Reporting Server)
75. Web Analytics - MongoDB
(Diagram: Application Server, Reporting Server, mongos, two mongod processes)
76. Spring MVC based solution
• Custom Handler Adapter with an ActionInterceptor
• ActionInterceptor has full context of web action
– Controller class, method name, and parameters
– WebRequest
– ModelAndView
– Exception
• Similar to Ruby on Rails/ASP.NET MVC Action Filters
• Collect data in MongoDB
– MvcEvent
– Increment counters per controller and action method
• AnalyticsController to display results
– Action method parameter information is critical for reporting
• Also a good fit for an Insight plugin!
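The "increment counters per controller and action method" step can be sketched as follows. This is an in-memory stand-in under stated assumptions: ActionCounterSketch, record and count are hypothetical names, the controller and action names in the example are made up, and a ConcurrentHashMap takes the place of the MongoDB collection in which the slide stores the counters.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** In-memory stand-in for per-action counters; the slide keeps these in MongoDB instead. */
final class ActionCounterSketch {
    // "Controller#action" -> number of invocations seen so far.
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    /** Called once per handled web action, e.g. from an interceptor. */
    void record(String controller, String action) {
        counts.computeIfAbsent(controller + "#" + action, k -> new LongAdder()).increment();
    }

    /** Returns how often the action has been recorded, or 0 if never. */
    long count(String controller, String action) {
        LongAdder adder = counts.get(controller + "#" + action);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        ActionCounterSketch counters = new ActionCounterSketch();
        counters.record("RestaurantController", "list");
        counters.record("RestaurantController", "list");
        counters.record("RestaurantController", "show");
        System.out.println(counters.count("RestaurantController", "list")); // prints 2
    }
}
```

Each increment touches a single counter and needs no read beforehand, which is why this kind of workload suits a store with atomic in-place updates and high write throughput.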
82. CouchDB – Mobile offline access and sync
• Smartphones have huge data capacity
• Can carry large data sets with you.
• Mobile devices are not always connected
• Data sync is a difficult problem to solve
– CouchOne Mobile handles the dirty bits of sync
– Available for Android
• Same programming model as on server side
• Spring RestTemplate based CouchDB API
– On Server and Phone
• Can also be used as a full-stack environment on mobile
– Database, JavaScript, Web Server
83. CouchDB – Offline data access and sync
(Diagram: continuous bi-directional replication with the Application Server)
85. Summary
• Many new options for persistence
– A particular use case in your application might be a good fit
– Polyglot persistence will be increasingly common
• Trade-offs with
– Normalization
– Transactions
– Familiarity
• The Spring Data project
– Makes it easier to build Spring-powered applications that use these new technologies
• More info at http://www.springframework.org/spring-data
– Welcome contributors, comments…
86. Q&A
Chris Richardson: crichardson@vmware.com, @crichardson
Mark Pollack: mpollack@vmware.com, @markpollack