Polyglot persistence for Java developers: time to move out of the relational comfort zone? (gids 2015)

Relational databases have long been considered the one true way to persist enterprise data. Even today, they are an excellent choice for many applications. But for some applications NoSQL databases are a viable alternative. They can simplify the persistence of complex data models and offer significantly better scalability and performance. But using NoSQL databases is very different from the ACID/SQL/JDBC/JPA world that we have become accustomed to. They have different and unfamiliar APIs and a very different and usually limited transaction model. So what’s a Java developer to do?

Published in: Software

  1. 1. Polyglot persistence for Java developers: time to move out of the relational comfort zone? Chris Richardson Author of POJOs in Action Founder of the original CloudFoundry.com @crichardson chris@chrisrichardson.net http://plainoldobjects.com
  2. 2. @crichardson Presentation Goal The benefits and drawbacks of polyglot persistence and How to design applications that use this approach
  3. 3. @crichardson About Chris
  4. 4. @crichardson About Chris Founder of a startup that’s creating a platform for developing event-driven microservices
  5. 5. @crichardson Agenda • Why polyglot persistence? • Persisting entities with MongoDB and Cassandra • Querying data with MongoDB and Cassandra • Scaling MongoDB and Cassandra
  6. 6. @crichardson Relational Databases
  7. 7. @crichardson Example: Food to Go • Take-out food delivery service • “Launched” in 2006
  8. 8. @crichardson Food To Go Architecture (diagram): Order taking, Restaurant Management, MySQL Database, CONSUMER, RESTAURANT OWNER
  9. 9. @crichardson Example: Device management server ~ 2003 • Everything was stored in an Oracle database • Device metadata • Firmware patches! • ….
  10. 10. @crichardson RDBMS are great • SQL = Rich, declarative query language • Database enforces referential integrity • ACID semantics • Well understood by developers • Well supported by frameworks and tools, e.g. Spring JDBC, Hibernate, JPA • Well understood by operations
  11. 11. @crichardson Impact of SSD/Flash storage • HDD = 200 IOPS vs. SSD = 100K IOPS • Massive performance improvement • Expands the range of use cases that a single RDBMS server can cost-effectively support
  12. 12. @crichardson • Hosted relational database • Compatible with MySQL 5.6 but with 5x performance • Vertically scales to 32 vCPUs and 244 GiB of RAM • SSD-backed virtualized storage layer, replicated 6 ways across 3 AZs • Up to 15 replicas that share storage with master - minimal replication lag • Fast restart after crash • No redo log replay • SSD-backed virtualized storage layer purpose-built for database workloads • Fast fail-over to replica after master instance failure without data loss AWS Aurora http://aws.amazon.com/rds/aurora/details/
  13. 13. NEW SQL • Next generation SQL databases, e.g. VoltDB, MemSQL, ... • Leverage modern, multi-core, commodity hardware • In-memory • Horizontally scalable • Transparently shardable • ACID “Current databases are designed for 1970s hardware and for both OLTP and data warehouses” http://nms.csail.mit.edu/~stavros/pubs/OLTP_sigmod08.pdf
  14. 14. @crichardson An RDBMS is great for many applications but ….
  15. 15. @crichardson Limitations of relational databases • Scalability • Multi data center, distributed database • Schema updates • O/R impedance mismatch • Handling semi-structured data
  16. 16. @crichardson Solution: Spend $$$ on Oracle’s high-end databases and servers
  17. 17. @crichardson Not so bad… http://www.powerandmotoryacht.com/megayachts/megayacht-musashi
  18. 18. @crichardson … or is it? http://www.iwtg.net/
  19. 19. @crichardson Solution: Spend $$$ - open- source stack + DevOps people http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_5_series/madone_5_2/#
  20. 20. @crichardson Apply the scale cube • X axis - horizontal duplication • Z axis - data partitioning (scale by splitting similar things) • Y axis - functional decomposition (scale by splitting different things)
  21. 21. @crichardson Applying the scale cube • Y-axis splits/functional decomposition • Application = Set[Microservice] - each with its own database • Monolithic database is functionally decomposed • Different types of entities in different databases • Z-axis splits/sharding • Entities of the same type partitioned across multiple databases
  22. 22. @crichardson How does each service access data? • Velocity and Volume • Variety of Data • Fixed or ad hoc queries • Access patterns • Distribution • Latency
  23. 23. @crichardson Velocity and Volume? • Velocity - speed at which data moves • Volume - the amount of data • Does it fit on a single machine?
  24. 24. @crichardson Variety of Data? • Relational • Aggregate oriented • Graph • Complex nested structures • Semi structured • Text • Binary blobs, e.g. images
  25. 25. @crichardson Fixed or ad hoc queries? • Fixed set of queries • Known in advance • Slowly changing • Ad hoc queries • Users can submit ad hoc queries
  26. 26. @crichardson Access patterns • PK-oriented access, e.g. load-modify-update a business entity • Bulk queries and/or updates • Non-relational queries: • text search • graph-oriented • geo search • …
  27. 27. @crichardson Reads vs. Writes • Mix of reads and writes • Write intensive, e.g. logging application • Read intensive • Data analytics/warehouse • Slowly changing data • …
  28. 28. @crichardson Distribution • Single database • Multiple active databases • on a LAN (low latency) • on a WAN (high latency)
  29. 29. @crichardson Transactions • Mandatory ACID • Eventual consistency OK?
  30. 30. @crichardson Latency • When should new data show up in results? • Low latency - seconds, milliseconds? • High latency - next day?
  31. 31. @crichardson And then pick your database…
  32. 32. @crichardson Use a NoSQL database Benefits • Higher performance • Higher scalability • Richer data-model • Schema-less Drawbacks • Limited transactions • Limited querying • Relaxed consistency • Unconstrained data
  33. 33. @crichardson Example NoSQL Databases Database Key features Cassandra Extensible column store, very scalable, distributed MongoDB Document-oriented, fast, scalable Redis Key-value store, very fast DynamoDB AWS hosted key-value and document store Neo4j Graph Database http://nosql-database.org/ lists 150 NoSQL databases
  34. 34. @crichardson Relative popularity http://www.indeed.com/jobtrends/mongodb%2Ccassandra%2Credis%2Cneo4j%2Cdynamodb.html
  35. 35. @crichardson But there are many other options • Blob store, e.g. AWS S3 • Text search engine, e.g. ElasticSearch, AWS CloudSearch, … • Big data technology: Apache Hadoop, Apache Spark, … • Real time streaming: Storm, Spark Streaming, …
  36. 36. @crichardson Polyglot persistence IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg Event sourcing and CQRS are a great approach
  37. 37. @crichardson Agenda • Why polyglot persistence? • Persisting entities with MongoDB and Cassandra • Querying data with MongoDB and Cassandra • Scaling MongoDB and Cassandra
  38. 38. @crichardson Food to Go – Domain model (partial)
      class Restaurant {
          long id;
          String name;
          Set<String> serviceArea;
          Set<TimeRange> openingHours;
          List<MenuItem> menuItems;
      }
      class MenuItem {
          String name;
          double price;
      }
      class TimeRange {
          long id;
          int dayOfWeek;
          int openTime;
          int closeTime;
      }
  39. 39. @crichardson Database schema
      RESTAURANT table
      ID  Name               …
      1   Ajanta
      2   Montclair Eggshop
      RESTAURANT_ZIPCODE table
      Restaurant_id  zipcode
      1              94707
      1              94619
      2              94611
      2              94619
      RESTAURANT_TIME_RANGE table
      Restaurant_id  dayOfWeek  openTime  closeTime
      1              Monday     1130      1430
      1              Monday     1730      2130
      2              Tuesday    1130      …
  40. 40. @crichardson RestaurantRepository
      public interface RestaurantRepository {
          void addRestaurant(Restaurant restaurant);
          Restaurant findById(long id);
          ...
      }
      Food To Go will eventually have scaling issues
  41. 41. @crichardson MongoDB • Document-oriented database • JSON-style documents: Lists, Maps, primitives • Schema-less • Transaction = update of a single document • Rich query language for dynamic/ad hoc queries + geo queries • Tunable writes: speed vs. reliability • Highly scalable and available
  42. 42. @crichardson MongoDB use cases • High volume writes • Complex data • Semi-structured data
  43. 43. @crichardson MongoDB data model Server → Database: Food To Go → Collection: Restaurants
      {
        "_id" : ObjectId("4bddc2f49d1505567c6220a0"),
        "name": "Ajanta",
        "serviceArea": ["94619", "99999"],
        "openingHours": [
          { "dayOfWeek": 1, "open": 1130, "close": 1430 },
          { "dayOfWeek": 2, "open": 1130, "close": 1430 },
          …
        ]
      }
      • "_id" is the PK • BSON = binary JSON • Sequence of bytes on disk → fast I/O • 16 MByte limit
  44. 44. @crichardson Many NoSQL Databases = Aggregate-oriented
  45. 45. @crichardson Basic MongoDB collection operations... • insert(document(s), options) • Application assigned ids • Mongo generated UUID • update(query, update, options) • query - selects document(s) • update - replace or modify document (e.g. increment a field) • options - upsert, multi, … (optional) • remove(query, options)
  46. 46. @crichardson ....Basic MongoDB collection operations • find/findOne(criteria, projection) • criteria - query • projection - fields to return (optional)
  47. 47. @crichardson Using Spring Data for Mongo
      @Repository
      class RestaurantRepositoryMongoDbImpl implements RestaurantRepository {
          @Override
          public void add(Restaurant restaurant) {
              mongoTemplate.insert(restaurant, "restaurants");
          }
          @Override
          public Restaurant findDetailsById(int id) {
              return mongoTemplate.findById(id, Restaurant.class, "restaurants");
          }
      }
      Spring Data’s Generic Repositories = even less code
  48. 48. @crichardson Apache Cassandra • Distributed/Extensible row store: row ~= java.util.SortedMap • Transaction = update of a row • Fast writes = append to a log • Tunable reads/writes: consistency vs. latency/availability • Extremely scalable • Transparent and dynamic clustering • Rack and datacenter aware data replication
  49. 49. @crichardson Apache Cassandra use cases • Big data • Multiple Data Center distributed database • (Write intensive) Logging • High-availability (writes)
  50. 50. @crichardson Cassandra data model Keyspace → Table → rows
      Row key K1: (N1, V1, TS1) (N2, V2, TS2) (N3, V3, TS3)
      Row key K2: (N1, V1, TS1) (N2, V2, TS2) (N3, V3, TS3)
      Each column = (Column Name, Column Value, Timestamp)
      Column name/value: number, string, Boolean, timestamp, counter, and composite
  51. 51. @crichardson Inserting/updating data
      table.insert(key=K1, (N4, V4, TS4), …)
      Idempotent = transaction
      Table before: K1: (N1, V1, TS1) (N2, V2, TS2) (N3, V3, TS3) …
      Table after:  K1: (N1, V1, TS1) (N2, V2, TS2) (N3, V3, TS3) (N4, V4, TS4) …
      • Optional column TTL • Application assigned keys - natural or UUID
  52. 52. @crichardson Reading data
      table.slice(key=K1, startColumn=N2, endColumn=N4)
      Table:  K1: (N1, V1, TS1) (N2, V2, TS2) (N3, V3, TS3) (N4, V4, TS4) …
      Result: K1: (N2, V2, TS2) (N3, V3, TS3) (N4, V4, TS4)
      Cassandra has secondary indexes but they aren’t always helpful
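The slice read above follows directly from the "row ~= java.util.SortedMap" analogy on the Apache Cassandra slide. A minimal sketch, assuming a row is modeled as a sorted map from column name to value (timestamps are omitted); the names `RowSlice`, `slice`, and `exampleRow` are hypothetical and are not part of any Cassandra API:

```java
import java.util.SortedMap;
import java.util.TreeMap;

class RowSlice {
    // Returns the columns whose names lie in [startColumn, endColumn], both inclusive.
    static SortedMap<String, String> slice(TreeMap<String, String> row,
                                           String startColumn, String endColumn) {
        return row.subMap(startColumn, true, endColumn, true);
    }

    // Builds the example row K1 with columns N1..N4 (illustrative data only).
    static TreeMap<String, String> exampleRow() {
        TreeMap<String, String> row = new TreeMap<>();
        row.put("N1", "V1");
        row.put("N2", "V2");
        row.put("N3", "V3");
        row.put("N4", "V4");
        return row;
    }

    public static void main(String[] args) {
        System.out.println(slice(exampleRow(), "N2", "N4"));
    }
}
```

Because the columns are kept sorted by name, a slice is a contiguous range scan rather than a search, which is why column naming matters so much in the later query slides.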
  53. 53. @crichardson Cassandra Query Language • SQL-like • DDL: Create table, ... • DML: Insert, Update, Select, ... • Restricted WHERE clauses, e.g. PK equality only (if you want efficiency) • Primary key: • Simple - 1 storage table row maps to 1 CQL row • Compound - 1 storage table row maps to multiple CQL rows! (clustered rows)
  54. 54. @crichardson Representing restaurants
      create table restaurant (
          restaurant_id int PRIMARY KEY,
          name text,
          service_area set<text>,
          day_of_weeks list<int>,
          opening_times list<int>,
          closing_times list<int>
      );
  55. 55. @crichardson Inserting and retrieving restaurants
      insert into restaurants.restaurant(
          restaurant_id, name, service_area,
          day_of_weeks, opening_times, closing_times)
      Values(?, ?, ?, ?, ?, ?)

      select * from restaurants.restaurant where restaurant_id = ?
  56. 56. @crichardson Storing restaurants in Cassandra
      Row key 1:
      name               Ajanta
      serviceArea:94619  -        (set member is part of the column name)
      serviceArea:94618  -
      daysOfWeeks:0      Monday   (column name holds the element index, column value holds the element value)
      daysOfWeeks:1      Monday
  57. 57. @crichardson Cassandra Java APIs • Java Driver • https://github.com/datastax/java-driver • Netflix Astyanax • http://techblog.netflix.com/2013/12/astyanax-update.html • Spring Data for Cassandra • http://projects.spring.io/spring-data-cassandra/
  58. 58. @crichardson Java Driver: Inserting a restaurant
      public class AvailableRestaurantRepositoryCassandraImpl ...
          public AvailableRestaurantRepositoryCassandraImpl(Session session) {
              insertStatement = session.prepare(
                  "insert into restaurants.restaurant(restaurant_id, name, service_area, day_of_weeks, opening_times, closing_times) Values(?, ?, ?, ?, ?, ?);"
              );
              ...
          }
          @Override
          public void add(Restaurant restaurant) {
              List<Integer> dayOfWeeks = new ArrayList<Integer>();
              List<Integer> openingTimes = new ArrayList<Integer>();
              List<Integer> closingTimes = new ArrayList<Integer>();
              for (TimeRange tr : restaurant.getOpeningHours()) {
                  dayOfWeeks.add(tr.getDayOfWeek());
                  openingTimes.add(tr.getOpenHour());
                  closingTimes.add(tr.getClosingTime());
              }
              session.execute(insertStatement.bind(restaurant.getId(),
                  restaurant.getName(), restaurant.getServiceArea(),
                  dayOfWeeks, openingTimes, closingTimes));
          }
  59. 59. @crichardson Java Driver: Finding a restaurant
      public class AvailableRestaurantRepositoryCassandraImpl implements AvailableRestaurantRepository {
          public AvailableRestaurantRepositoryCassandraImpl(Session session) {
              this.findByIdStatement = session.prepare(
                  "select * from restaurants.restaurant where restaurant_id = ?;");
              ...
          }
          @Override
          public Restaurant findDetailsById(int id) {
              Row row = session.execute(findByIdStatement.bind(id)).all().get(0);
              List<Integer> dayOfWeeks = row.getList("day_of_weeks", Integer.class);
              List<Integer> openingTimes = row.getList("opening_times", Integer.class);
              List<Integer> closingTimes = row.getList("closing_times", Integer.class);
              Set<TimeRange> openingHours = new HashSet<TimeRange>();
              for (int i = 0; i < dayOfWeeks.size(); i++) {
                  openingHours.add(
                      new TimeRange(dayOfWeeks.get(i), openingTimes.get(i), closingTimes.get(i)));
              }
              Restaurant r = new Restaurant(row.getString("name"), ...,
                  row.getSet("service_area", String.class), openingHours, null);
              r.setId(id);
              return r;
          }
  60. 60. @crichardson Agenda • Why polyglot persistence? • Persisting entities with MongoDB and Cassandra • Querying data with MongoDB and Cassandra • Scaling MongoDB and Cassandra
  61. 61. @crichardson Finding available restaurants
      Available restaurants = Serve the zip code of the delivery address AND Are open at the delivery time
      public interface AvailableRestaurantRepository {
          List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime);
          ...
      }
  62. 62. @crichardson Finding available restaurants on Monday, 6.15pm for 94619 zipcode
      Straightforward three-way join
      select r.*
      from restaurant r
        inner join restaurant_time_range tr on r.id = tr.restaurant_id
        inner join restaurant_zipcode sa on r.id = sa.restaurant_id
      where '94619' = sa.zip_code
        and tr.day_of_week = 'monday'
        and tr.openingtime <= 1815
        and 1815 <= tr.closingtime
  63. 63. @crichardson MongoDB = easy to query
      {
        serviceArea: "94619",
        openingHours: {
          $elemMatch : {
            "dayOfWeek" : "Monday",
            "open": {$lte: 1815},
            "close": {$gte: 1815}
          }
        }
      }
      DBCursor cursor = collection.find(qbeObject);
      while (cursor.hasNext()) {
          DBObject o = cursor.next();
          …
      }
      db.availableRestaurants.ensureIndex({serviceArea: 1})
  64. 64. @crichardson Using Spring Data for Mongo
      @Repository
      class RestaurantRepositoryMongoDbImpl implements RestaurantRepository {
          @Override
          public List<AvailableRestaurant> findAvailableRestaurants(
                  Address deliveryAddress, Date deliveryTime) {
              int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
              int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);
              Query query = new Query(
                  where("serviceArea").is(deliveryAddress.getZip())
                      .and("openingHours")
                      .elemMatch(
                          where("dayOfWeek").is(dayOfWeek)
                              .and("openingTime").lte(timeOfDay)
                              .and("closingTime").gte(timeOfDay)));
              return mongoTemplate.find(
                  query, AvailableRestaurant.class, AVAILABLE_RESTAURANTS_COLLECTION);
          }
  65. 65. @crichardson BUT how to do this with Cassandra??! • How can Cassandra support a query that has • A 3-way join • Multiple = • > and < ? → We need to denormalize the data!!
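The `$elemMatch` operator in the query above requires one single `openingHours` element to satisfy all of the conditions. Without it, each condition on an array field may be satisfied by a different element. A minimal plain-Java sketch of that difference, assuming a simplified `TimeRange` with numeric fields; the class and method names are hypothetical and no MongoDB driver is involved:

```java
import java.util.Arrays;
import java.util.List;

class ElemMatchSketch {
    static final class TimeRange {
        final int dayOfWeek, open, close;
        TimeRange(int dayOfWeek, int open, int close) {
            this.dayOfWeek = dayOfWeek;
            this.open = open;
            this.close = close;
        }
    }

    // $elemMatch semantics: one element must satisfy all three conditions.
    static boolean matchesWithElemMatch(List<TimeRange> hours, int day, int time) {
        return hours.stream().anyMatch(tr ->
            tr.dayOfWeek == day && tr.open <= time && time <= tr.close);
    }

    // Separate conditions on the array: each may be met by a different element.
    static boolean matchesWithoutElemMatch(List<TimeRange> hours, int day, int time) {
        return hours.stream().anyMatch(tr -> tr.dayOfWeek == day)
            && hours.stream().anyMatch(tr -> tr.open <= time)
            && hours.stream().anyMatch(tr -> time <= tr.close);
    }

    // Illustrative data: open on day 1 for lunch, on day 2 for dinner.
    static List<TimeRange> exampleHours() {
        return Arrays.asList(new TimeRange(1, 1130, 1430), new TimeRange(2, 1730, 2130));
    }
}
```

With the example data, a request for day 1 at 1815 is rejected by the first method but wrongly accepted by the second, because the day comes from one element and the closing time from another.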
  66. 66. @crichardson Simplification #1: Denormalization
      Restaurant_id  Day_of_week  Open_time  Close_time  Zip_code
      1              Monday       1130       1430        94707
      1              Monday       1130       1430        94619
      1              Monday       1730       2130        94707
      1              Monday       1730       2130        94619
      2              Monday       0700       1430        94619
      …
      SELECT restaurant_id
      FROM time_range_zip_code
      WHERE day_of_week = 'Monday'
        AND zip_code = 94619
        AND 1815 < close_time
        AND open_time < 1815
      Simpler query: • No joins • Two = and two <
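The denormalized table above is the cross product of a restaurant's opening time ranges and its service-area zip codes. A minimal sketch of producing those rows, assuming simplified in-memory types; `Denormalizer`, `Row`, and `TimeRange` here are hypothetical illustrations, not the classes from the talk:

```java
import java.util.ArrayList;
import java.util.List;

class Denormalizer {
    static final class TimeRange {
        final String dayOfWeek;
        final int openTime, closeTime;
        TimeRange(String dayOfWeek, int openTime, int closeTime) {
            this.dayOfWeek = dayOfWeek;
            this.openTime = openTime;
            this.closeTime = closeTime;
        }
    }

    static final class Row {
        final long restaurantId;
        final String dayOfWeek;
        final int openTime, closeTime;
        final String zipCode;
        Row(long restaurantId, TimeRange tr, String zipCode) {
            this.restaurantId = restaurantId;
            this.dayOfWeek = tr.dayOfWeek;
            this.openTime = tr.openTime;
            this.closeTime = tr.closeTime;
            this.zipCode = zipCode;
        }
    }

    // Emits one row per (time range, zip code) pair, so no join is needed at query time.
    static List<Row> denormalize(long restaurantId, List<TimeRange> openingHours,
                                 List<String> serviceArea) {
        List<Row> rows = new ArrayList<>();
        for (TimeRange tr : openingHours) {
            for (String zip : serviceArea) {
                rows.add(new Row(restaurantId, tr, zip));
            }
        }
        return rows;
    }
}
```

The cost of this approach is write amplification: every change to a restaurant's hours or service area rewrites several rows.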
  67. 67. @crichardson Simplification #2: Application filtering
      SELECT restaurant_id, open_time
      FROM time_range_zip_code
      WHERE day_of_week = 'Monday'
        AND zip_code = 94619
        AND 1815 < close_time
      The open_time < 1815 check moves out of the query and into the application.
      Even simpler query: • No joins • Two = and one <
      This is a CQL query!
  68. 68. @crichardson Available restaurants table
      create table available_restaurants (
          id int,
          name text,
          zip_code text,
          day_of_week int,
          open_time int,
          close_time int,
          primary key ((zip_code, day_of_week), close_time, id)
      );
      • Compound primary key • Composite partition key (zip_code, day_of_week) = row key • Clustering columns (close_time, id) prefix column names
  69. 69. @crichardson Cassandra available_restaurants table
      Row key (zipcode:day of week): 94619:Monday
      Column name (close_time:id:≪column name≫)   Value
      1430:1:name                                  Ajanta
      1430:1:open_time                             1130
      1730:1:name                                  Ajanta
      1730:1:open_time                             2130
      1430:2:name                                  Egg shop
      1430:2:open_time                             0800
      primary key ((zip_code, day_of_week), close_time, id)
  70. 70. @crichardson Finding available restaurants
      select *
      from available_restaurants
      where zip_code = '94619'
        and day_of_week = 1
        and close_time > 1815;
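The query above works because the partition key selects one storage row and the first clustering column (close_time) keeps its contents sorted, so `close_time > ?` becomes a range scan. A minimal in-memory sketch of that layout plus the application-side open_time filter, assuming a nested map in place of Cassandra storage; `AvailableRestaurantsSketch` and its methods are hypothetical names:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AvailableRestaurantsSketch {
    static final class Entry {
        final String name;
        final int openTime;
        Entry(String name, int openTime) { this.name = name; this.openTime = openTime; }
    }

    // Partition key "zip:day" -> entries sorted by close_time (the first clustering column).
    private final Map<String, TreeMap<Integer, List<Entry>>> partitions = new HashMap<>();

    void insert(String zip, int day, int closeTime, String name, int openTime) {
        partitions.computeIfAbsent(zip + ":" + day, k -> new TreeMap<>())
                  .computeIfAbsent(closeTime, k -> new ArrayList<>())
                  .add(new Entry(name, openTime));
    }

    List<String> findAvailable(String zip, int day, int timeOfDay) {
        List<String> result = new ArrayList<>();
        TreeMap<Integer, List<Entry>> partition = partitions.get(zip + ":" + day);
        if (partition == null) {
            return result;
        }
        // Range scan on the clustering column: close_time > timeOfDay.
        for (List<Entry> entries : partition.tailMap(timeOfDay, false).values()) {
            for (Entry e : entries) {
                if (e.openTime <= timeOfDay) { // filter applied in the application
                    result.add(e.name);
                }
            }
        }
        return result;
    }

    // Illustrative data loosely based on the slides.
    static AvailableRestaurantsSketch example() {
        AvailableRestaurantsSketch table = new AvailableRestaurantsSketch();
        table.insert("94619", 1, 1430, "Ajanta", 1130);
        table.insert("94619", 1, 2130, "Ajanta", 1730);
        table.insert("94619", 1, 1430, "Egg shop", 800);
        return table;
    }
}
```

The trade-off is that the scan may return rows the application then discards, which is acceptable when a partition holds a modest number of restaurants.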
  71. 71. @crichardson Cassandra query
      @Repository
      class AvailableRestaurantRepositoryCassandraImpl implements RestaurantRepository {
          public AvailableRestaurantRepositoryCassandraImpl(Session session) {
              this.findAvailable = session.prepare(
                  "Select open_time, restaurant_name " +
                  "From restaurants.available_restaurants " +
                  "Where zip_code = ? " +
                  "And day_of_week = ? " +
                  " And close_time >= ?;"
              );
              …
          }
  72. 72. @crichardson Cassandra query
      @Repository
      class AvailableRestaurantRepositoryCassandraImpl implements RestaurantRepository {
          @Override
          public List<AvailableRestaurant> findAvailableRestaurants(
                  Address deliveryAddress, Date deliveryTime) {
              List<AvailableRestaurant> result = new ArrayList<AvailableRestaurant>();
              int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
              BoundStatement bound = findAvailable.bind(deliveryAddress.getZip(),
                  DateTimeUtil.dayOfWeek(deliveryTime), timeOfDay);
              for (Row row : session.execute(bound).all()) {
                  if (row.getInt("open_time") <= timeOfDay) {
                      result.add(
                          new AvailableRestaurant(row.getString("restaurant_name")));
                  }
              }
              return result;
          }
  73. 73. @crichardson NoSQL Denormalized representation for each query
  74. 74. @crichardson Sorry Ted! http://en.wikipedia.org/wiki/Edgar_F._Codd
  75. 75. @crichardson About Cassandra and MongoDB • Cassandra: • Efficient storage of complex aggregates • Limited queries requiring denormalized representation
 • MongoDB • Efficient storage of complex aggregates • Rich ad hoc queries But where they get really interesting is when it comes to scaling
  76. 76. @crichardson Agenda • Why polyglot persistence? • Persisting entities with MongoDB and Cassandra • Querying data with MongoDB and Cassandra • Scaling MongoDB and Cassandra
  77. 77. Scaling MongoDB: Replica Sets Replica Set Mongod (secondary) Mongod (primary) Mongod (secondary) Client http://docs.mongodb.org/manual/replication/ Writes Consistent reads Inconsistent reads replication Automatic master election Connects to seed servers
  78. 78. Mongos Scaling MongoDB: Sharding Replica Set 2 (aka. Shard 2) Mongod (secondary) Mongod (primary) Mongod (secondary) Replica Set 1 (aka. Shard 1) Mongod (secondary) Mongod (primary) Mongod (secondary) Mongos Client Config Server mongod mongod mongod http://docs.mongodb.org/manual/core/sharding-introduction/ Key-based routing or Scatter/gather
  79. 79. @crichardson MongoDB Sharding • Collection is partitioned into chunks • Each shard is responsible for one or more chunks • Range-based sharding • Each chunk is responsible for a range of keys • Efficient execution of range queries BUT risk of uneven distribution • Hash-based sharding • Key is hashed and mapped into chunk • Good distribution BUT range queries processed by all shards
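The trade-off between range-based and hash-based sharding can be seen by counting how many shards a range query has to contact. A minimal sketch, assuming four shards, integer keys that are non-negative, a fixed range width, and a toy multiplicative hash; none of this reflects MongoDB's actual chunk management or hash function:

```java
import java.util.Set;
import java.util.TreeSet;

class ShardRouting {
    static final int SHARDS = 4;
    static final int KEYS_PER_RANGE = 1000; // assumed width of each shard's key range

    // Range-based: neighbouring keys land on the same shard.
    static int rangeShard(int key) {
        return Math.min(key / KEYS_PER_RANGE, SHARDS - 1);
    }

    // Hash-based: a toy hash scatters neighbouring keys across shards.
    static int hashShard(int key) {
        return Math.floorMod(key * 31, SHARDS);
    }

    // Shards that a query over the keys [from, to] must contact.
    static Set<Integer> shardsForRangeQuery(int from, int to, boolean hashed) {
        Set<Integer> shards = new TreeSet<>();
        for (int key = from; key <= to; key++) {
            shards.add(hashed ? hashShard(key) : rangeShard(key));
        }
        return shards;
    }
}
```

For keys 100 to 199 the range-based layout touches a single shard, while the hashed layout touches all four. The flip side, not modeled here, is that monotonically increasing keys concentrate writes on one shard under range-based sharding.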
  80. 80. @crichardson MongoDB reads and writes • Writes • Trade-off: request latency vs. safety • No acknowledgement! • Acknowledgement by primary or by primary & N - 1 replicas • Acknowledgement after committing to journal • Tag-based, e.g. write to servers in different data centers • Reads • Read uncommitted isolation - reads can return data that has not been committed yet • Master - the default • Secondary - if stale data is ok • Use tags { w: N, j: true/false, wtimeout: timeout }
  81. 81. @crichardson Cassandra cluster http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 • Key → Partitioner (MurmurHash or MD5) → 64/128-bit hash (a.k.a. token) • VNode owns a range of hash values • Node owns a collection of vnodes • Replicas
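The token ring described above can be sketched with a sorted map from vnode token to owning node. This is a minimal illustration, assuming tokens are supplied directly rather than computed by a partitioner, that the ring is non-empty, and that replication is ignored; `TokenRing` and its methods are hypothetical names:

```java
import java.util.Map;
import java.util.TreeMap;

class TokenRing {
    // vnode token -> physical node; one node may appear under several tokens.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addVNode(long token, String node) {
        ring.put(token, node);
    }

    // The vnode with the smallest token >= keyToken owns the key;
    // past the highest token the lookup wraps around to the first vnode.
    String ownerOf(long keyToken) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(keyToken);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Illustrative ring: node A holds two vnodes, node B holds one.
    static TokenRing example() {
        TokenRing r = new TokenRing();
        r.addVNode(100L, "A");
        r.addVNode(200L, "B");
        r.addVNode(300L, "A");
        return r;
    }
}
```

Giving each physical node many small vnode ranges means that adding or removing a node moves a little data from many peers instead of a lot from one neighbour.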
  82. 82. @crichardson Multiple data centers DC 1 DC 2
  83. 83. @crichardson Cassandra reads and writes • Any node can handle any request • Plays the role of coordinator • Communicates with replica nodes • Write request • Update is written to commit log of one or more replicas • Other replicas are updated asynchronously • Read request • Read data from one or more replicas • Choose the most recent data based on timestamp • Read repair: sends updates to stale replicas No Master!
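The read path above picks the most recent value by timestamp when replicas disagree. A minimal sketch of that last-write-wins step, assuming each replica response carries a value and a write timestamp and that at least one response arrived; `ReadReconciliation` and `Versioned` are hypothetical names, and read repair itself is not modeled:

```java
import java.util.Arrays;
import java.util.List;

class ReadReconciliation {
    static final class Versioned {
        final String value;
        final long timestamp;
        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Returns the response with the highest timestamp (last write wins).
    static Versioned mostRecent(List<Versioned> replicaResponses) {
        Versioned newest = replicaResponses.get(0);
        for (Versioned v : replicaResponses) {
            if (v.timestamp > newest.timestamp) {
                newest = v;
            }
        }
        return newest;
    }

    // Illustrative responses from three replicas, one of them stale.
    static List<Versioned> exampleResponses() {
        return Arrays.asList(new Versioned("old", 1L),
                             new Versioned("new", 5L),
                             new Versioned("mid", 3L));
    }
}
```

Any replica that returned an older timestamp than the winner is a candidate for read repair.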
  84. 84. @crichardson Cassandra read and write consistency • For each read and write request you specify: • How many nodes to read/write before responding • Local (single DC) vs. Multi-DCs • All replicas in all DCs will eventually be updated • Trade-off: • More nodes: greater consistency but less availability and higher latency • Fewer nodes: less consistency but higher availability and lower latency http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
  85. 85. @crichardson Consistency examples • High-performance, high-availability writes, e.g. logging • Write consistency of ANY - succeeds even if all replicas are down (the coordinator stores a hint) • Read consistency of ONE - any replica • Consistent reads • (nodes_written + nodes_read) > replication_factor • Read/Write consistency of LOCAL_QUORUM • Globally consistent reads • Read/write consistency of QUORUM
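The rule (nodes_written + nodes_read) > replication_factor guarantees that every read set overlaps every write set in at least one replica. A minimal arithmetic sketch, assuming a quorum is floor(replication_factor / 2) + 1; the class and method names are hypothetical:

```java
class ConsistencyCheck {
    // Quorum size for a given replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when any read is guaranteed to include at least one replica
    // that acknowledged the latest write.
    static boolean readsSeeLatestWrite(int nodesWritten, int nodesRead, int replicationFactor) {
        return nodesWritten + nodesRead > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("QUORUM/QUORUM consistent: " + readsSeeLatestWrite(q, q, rf));
        System.out.println("ONE/ONE consistent: " + readsSeeLatestWrite(1, 1, rf));
    }
}
```

With a replication factor of 3, QUORUM writes and QUORUM reads touch 2 + 2 = 4 > 3 replicas, so they overlap; ONE and ONE touch 1 + 1 = 2, so a read can miss the latest write.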
  86. 86. @crichardson Comparing Cassandra and MongoDB • Cassandra • Replica model • Write to any replica (or Node) • Sync locally/async globally
 • MongoDB • Master/slave model • Write to master • Sync to possibly remote master
  87. 87. @crichardson Summary • Each SQL/NoSQL database = set of tradeoffs • NoSQL databases: • Diverse • Aggregate-oriented (typically) • Use query-oriented data modeling (typically) • Polyglot persistence: leverage the strengths of SQL and NoSQL databases
  88. 88. @crichardson Questions? @crichardson chris@chrisrichardson.net http://plainoldobjects.com
