Polyglot persistence for Java developers:
time to move out of the relational comfort zone?
Chris Richardson
Author of POJOs in Action
Founder of the original CloudFoundry.com
@crichardson
chris@chrisrichardson.net
http://plainoldobjects.com
Presentation Goal
The benefits and drawbacks
of polyglot persistence
and
How to design applications
that use this approach
About Chris
Founder of a startup that’s creating
a platform for developing
event-driven microservices
Agenda
• Why polyglot persistence?
• Persisting entities with MongoDB and Cassandra
• Querying data with MongoDB and Cassandra
• Scaling MongoDB and Cassandra
Relational Databases
Example: Food to Go
• Take-out food delivery
service
• “Launched” in 2006
Food to Go Architecture
[Diagram: CONSUMER → Order taking; RESTAURANT OWNER → Restaurant Management; both components use a MySQL Database]
Example: Device management server ~ 2003
• Everything was stored in an Oracle database
• Device metadata
• Firmware patches!
• ….
RDBMS are great
• SQL = Rich, declarative query language
• Database enforces referential integrity
• ACID semantics
• Well understood by developers
• Well supported by frameworks and tools, e.g. Spring JDBC,
Hibernate, JPA
• Well understood by operations
Impact of SSD/Flash storage
• HDD = 200 IOPS vs. SSD = 100K IOPS
• Massive performance improvement
• Expands the range of use cases that a single RDBMS server
can cost-effectively support
AWS Aurora
• Hosted relational database
• Compatible with MySQL 5.6 but with 5x performance
• Vertically scales to 32 vCPUs and 244 GiB of RAM
• SSD-backed virtualized storage layer purpose-built for database workloads, replicated 6 ways across 3 AZs
• Up to 15 replicas that share storage with master - minimal replication lag
• Fast restart after crash
• No redo log replay
• Fast fail-over to replica after master instance failure without data loss
http://aws.amazon.com/rds/aurora/details/
NewSQL
• Next generation SQL databases,
e.g. VoltDB, MemSQL, ...
• Leverage modern, multi-core,
commodity hardware
• In-memory
• Horizontally scalable
• Transparently shardable
• ACID
“Current databases are designed for 1970s
hardware and for both OLTP and data
warehouses”
http://nms.csail.mit.edu/~stavros/pubs/OLTP_sigmod08.pdf
An RDBMS is great for many
applications but ….
Limitations of relational
databases
• Scalability
• Multi data center, distributed database
• Schema updates
• O/R impedance mismatch
• Handling semi-structured data
Solution: Spend $$$ on Oracle’s
high-end databases and servers
Not so bad…
http://www.powerandmotoryacht.com/megayachts/megayacht-musashi
… or is it?
http://www.iwtg.net/
Solution: Spend $$$ - open-source stack + DevOps people
http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_5_series/madone_5_2/#
Apply the scale cube
• X axis - horizontal duplication
• Y axis - functional decomposition: scale by splitting different things
• Z axis - data partitioning: scale by splitting similar things
Applying the scale cube
• Y-axis splits/functional decomposition
• Application = Set[Microservice] - each with its own database
• Monolithic database is functionally decomposed
• Different types of entities in different databases
• Z-axis splits/sharding
• Entities of the same type partitioned across multiple databases
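A Z-axis split can be sketched in a few lines of Java. This is only an illustration: the `ShardRouter` class, the shard names and the hash-modulo placement are assumptions made for the example, not something the slides prescribe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical router illustrating a Z-axis split: entities of the same type
// are partitioned across several databases by a hash of their key.
class ShardRouter {
    private final List<String> shardNames;

    ShardRouter(List<String> shardNames) {
        this.shardNames = new ArrayList<String>(shardNames);
    }

    // The same id always maps to the same shard.
    // Math.floorMod keeps the index non-negative even for negative hash codes.
    String shardFor(long entityId) {
        int index = Math.floorMod(Long.hashCode(entityId), shardNames.size());
        return shardNames.get(index);
    }
}
```

A repository would call `shardFor(restaurant.getId())` to pick the database before reading or writing. Note that plain modulo placement moves most keys when a shard is added, which is one reason real systems use chunks or token ranges instead.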
How does each service access data?
• Velocity and Volume
• Variety of Data
• Fixed or ad hoc queries
• Access patterns
• Distribution
• Latency
Velocity and Volume?
• Velocity - speed at which data moves
• Volume - the amount of data
• Does it fit on a single machine?
Variety of Data?
• Relational
• Aggregate oriented
• Graph
• Complex nested structures
• Semi structured
• Text
• Binary blobs, e.g. images
Fixed or ad hoc queries?
• Fixed set of queries
• Known in advance
• Slowly changing
• Ad hoc queries
• Users can submit ad hoc queries
Access patterns
• PK-oriented access, e.g. load-modify-update a business entity
• Bulk queries and/or updates
• Non-relational queries:
• text search
• graph-oriented
• geo search
• …
Reads vs. Writes
• Mix of reads and writes
• Write intensive, e.g. logging application
• Read intensive
• Data analytics/warehouse
• Slowly changing data
• …
Distribution
• Single database
• Multiple active databases
• on a LAN (low latency)
• on a WAN (high latency)
Transactions
• Mandatory ACID
• Eventual consistency OK?
Latency
• When should new data show up in results?
• Low latency - seconds, milliseconds?
• High latency - next day?
And then pick your database…
Use a NoSQL database
Benefits
• Higher performance
• Higher scalability
• Richer data-model
• Schema-less
Drawbacks
• Limited transactions
• Limited querying
• Relaxed consistency
• Unconstrained data
Example NoSQL Databases
Database    Key features
Cassandra   Extensible column store, very scalable, distributed
MongoDB     Document-oriented, fast, scalable
Redis       Key-value store, very fast
DynamoDB    AWS hosted key-value and document store
Neo4j       Graph database
http://nosql-database.org/ lists 150 NoSQL databases
Relative popularity
http://www.indeed.com/jobtrends/mongodb%2Ccassandra%2Credis%2Cneo4j%2Cdynamodb.html
But there are many other options
• Blob store, e.g. AWS S3
• Text search engine, e.g. ElasticSearch, AWS CloudSearch, …
• Big data technology: Apache Hadoop, Apache Spark, …
• Real time streaming: Storm, Spark Streaming, …
Polyglot persistence
IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg
Event sourcing and CQRS are a great approach
Agenda
• Why polyglot persistence?
• Persisting entities with MongoDB and Cassandra
• Querying data with MongoDB and Cassandra
• Scaling MongoDB and Cassandra
Food to Go – Domain model (partial)
class Restaurant {
long id;
String name;
Set<String> serviceArea;
Set<TimeRange> openingHours;
List<MenuItem> menuItems;
}
class MenuItem {
String name;
double price;
}
class TimeRange {
long id;
int dayOfWeek;
int openTime;
int closeTime;
}
Database schema
RESTAURANT table
ID  Name               …
1   Ajanta
2   Montclair Eggshop
RESTAURANT_ZIPCODE table
Restaurant_id  zipcode
1              94707
1              94619
2              94611
2              94619
RESTAURANT_TIME_RANGE table
Restaurant_id  dayOfWeek  openTime  closeTime
1              Monday     1130      1430
1              Monday     1730      2130
2              Tuesday    1130      …
RestaurantRepository
public interface RestaurantRepository {
void addRestaurant(Restaurant restaurant);
Restaurant findById(long id);
...
}
Food to Go will eventually have scaling issues
MongoDB
• Document-oriented database
• JSON-style documents: Lists, Maps, primitives
• Schema-less
• Transaction = update of a single document
• Rich query language for dynamic/ad hoc queries + geo queries
• Tunable writes: speed vs. reliability
• Highly scalable and available
MongoDB use cases
• High volume writes
• Complex data
• Semi-structured data
MongoDB data model
Server
Database: Food To Go
Collection: Restaurants
{
"_id" : ObjectId("4bddc2f49d1505567c6220a0"),
"name": "Ajanta",
"serviceArea": ["94619", "99999"],
"openingHours": [
{
"dayOfWeek": 1,
"open": 1130,
"close": 1430 },
{
"dayOfWeek": 2,
"open": 1130,
"close": 1430
}, …
]
}
BSON = binary JSON
Sequence of bytes on disk → fast I/O
16 MB document size limit
"_id" is the PK
Many NoSQL Databases
=
Aggregate-oriented
Basic MongoDB collection
operations...
• insert(document(s), options)
• Application assigned ids
• MongoDB-generated ObjectId
• update(query, update, options)
• query - selects document(s)
• update - replace or modify document (e.g. increment a field)
• options - upsert, multi, … (optional)
• remove(query, options)
....Basic MongoDB collection
operations
• find/findOne(criteria, projection)
• criteria - query
• projection - fields to return (optional)
Using Spring Data for Mongo
@Repository
class RestaurantRepositoryMongoDbImpl implements RestaurantRepository {
  @Autowired
  private MongoTemplate mongoTemplate;

  @Override
  public void addRestaurant(Restaurant restaurant) {
    // Stores the whole Restaurant aggregate as one document
    mongoTemplate.insert(restaurant, "restaurants");
  }

  @Override
  public Restaurant findById(long id) {
    return mongoTemplate.findById(id, Restaurant.class, "restaurants");
  }
}
Spring Data’s Generic Repositories = even less code
Apache Cassandra
• Distributed/Extensible row store: row ~= java.util.SortedMap
• Transaction = update of a row
• Fast writes = append to a log
• Tunable reads/writes: consistency vs. latency/availability
• Extremely scalable
• Transparent and dynamic clustering
• Rack and datacenter aware data replication
Apache Cassandra use cases
• Big data
• Multiple Data Center distributed database
• (Write intensive) Logging
• High-availability (writes)
Cassandra data model
Keyspace
Table
Row key   Columns (column name, column value, timestamp)
K1        (N1, V1, TS1) (N2, V2, TS2) (N3, V3, TS3)
K2        (N1, V1, TS1) (N2, V2, TS2) (N3, V3, TS3)
Column name/value: number, string, Boolean, timestamp, counter, and composite
Inserting/updating data
table.insert(key=K1, (N4, V4, TS4), …)
Idempotent, and each insert is a transaction
Table before:
K1   (N1, V1, TS1) (N2, V2, TS2) (N3, V3, TS3)
…
Table after:
K1   (N1, V1, TS1) (N2, V2, TS2) (N3, V3, TS3) (N4, V4, TS4)
…
Optional column TTL
Application assigned keys - natural or UUID
Reading data
table.slice(key=K1, startColumn=N2, endColumn=N4)
Table:
K1   (N1, V1, TS1) (N2, V2, TS2) (N3, V3, TS3) (N4, V4, TS4)
…
Result:
K1   (N2, V2, TS2) (N3, V3, TS3) (N4, V4, TS4)
Cassandra has secondary indexes but they aren’t always helpful
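Since a row behaves roughly like a `java.util.SortedMap`, the insert and slice operations can be mimicked with plain collections. The sketch below is a toy model, not Cassandra code: `ToyColumnTable` is a made-up name, and timestamps and TTLs are left out to keep it short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of one table: row key -> columns sorted by column name.
class ToyColumnTable {
    private final Map<String, TreeMap<String, String>> rows =
            new HashMap<String, TreeMap<String, String>>();

    // Writing the same column again just overwrites it, so repeating an insert is harmless.
    void insert(String rowKey, String columnName, String value) {
        TreeMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            row = new TreeMap<String, String>();
            rows.put(rowKey, row);
        }
        row.put(columnName, value);
    }

    // Returns the columns from startColumn to endColumn, both inclusive.
    SortedMap<String, String> slice(String rowKey, String startColumn, String endColumn) {
        TreeMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            return new TreeMap<String, String>();
        }
        return new TreeMap<String, String>(row.subMap(startColumn, true, endColumn, true));
    }
}
```

Because the columns are kept sorted, a slice is a contiguous range read, which is why column names are worth choosing with the queries in mind.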
Cassandra Query Language
• SQL-like
• DDL: Create table, ...
• DML: Insert, Update, Select, ...
• Restricted WHERE clauses, e.g. PK equality only (if you want efficiency)
• Primary key:
• Simple - 1 storage table row = 1 CQL row
• Compound - 1 storage table row = multiple CQL rows! (clustered rows)
Representing restaurants
create table restaurant (
	 restaurant_id int PRIMARY KEY,
	 name text,
	 service_area set<text>,
	 day_of_weeks list<int>,
	 opening_times list<int>,
	 closing_times list<int>
);
Inserting and retrieving
restaurants
insert into restaurants.restaurant(
restaurant_id, name, service_area,
day_of_weeks, opening_times,
closing_times)
Values(?, ?, ?, ?, ?, ?)
select *
from restaurants.restaurant
where restaurant_id = ?
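Storing restaurants in Cassandra
Row key: 1
Column name         Column value
name                Ajanta
serviceArea:94619   -
serviceArea:94618   -
daysOfWeeks:0       Monday
daysOfWeeks:1       Monday
For a set, the column name suffix is the set member; for a list, the suffix is the element index and the column value is the element value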
Cassandra Java APIs
• Java Driver
• https://github.com/datastax/java-driver
• Netflix Astyanax
• http://techblog.netflix.com/2013/12/astyanax-update.html
• Spring Data for Cassandra
• http://projects.spring.io/spring-data-cassandra/
Java Driver: Inserting a restaurant
public class AvailableRestaurantRepositoryCassandraImpl ... {
  public AvailableRestaurantRepositoryCassandraImpl(Session session) {
    this.session = session;
    // Prepare once, bind many times
    insertStatement = session.prepare(
        "insert into restaurants.restaurant(restaurant_id, name, service_area, day_of_weeks, "
        + "opening_times, closing_times) values (?, ?, ?, ?, ?, ?);"
    );
    ...
  }
  @Override
  public void add(Restaurant restaurant) {
    // Flatten the opening hours into three parallel lists, one per CQL list column
    List<Integer> dayOfWeeks = new ArrayList<Integer>();
    List<Integer> openingTimes = new ArrayList<Integer>();
    List<Integer> closingTimes = new ArrayList<Integer>();
    for (TimeRange tr : restaurant.getOpeningHours()) {
      dayOfWeeks.add(tr.getDayOfWeek());
      openingTimes.add(tr.getOpenTime());
      closingTimes.add(tr.getCloseTime());
    }
    session.execute(insertStatement.bind(restaurant.getId(),
        restaurant.getName(),
        restaurant.getServiceArea(),
        dayOfWeeks,
        openingTimes,
        closingTimes
    ));
  }
}
Java Driver: Finding a restaurant
public class AvailableRestaurantRepositoryCassandraImpl
    implements AvailableRestaurantRepository {
  public AvailableRestaurantRepositoryCassandraImpl(Session session) {
    this.session = session;
    this.findByIdStatement = session.prepare(
        "select * from restaurants.restaurant where restaurant_id = ?;");
    ...
  }
  @Override
  public Restaurant findDetailsById(int id) {
    // one() returns null when no row matches, whereas all().get(0) would throw
    Row row = session.execute(findByIdStatement.bind(id)).one();
    if (row == null) {
      return null;
    }
    List<Integer> dayOfWeeks = row.getList("day_of_weeks", Integer.class);
    List<Integer> openingTimes = row.getList("opening_times", Integer.class);
    List<Integer> closingTimes = row.getList("closing_times", Integer.class);
    // The three lists are parallel: element i of each describes one opening period
    Set<TimeRange> openingHours = new HashSet<TimeRange>();
    for (int i = 0; i < dayOfWeeks.size(); i++) {
      openingHours.add(
          new TimeRange(dayOfWeeks.get(i), openingTimes.get(i), closingTimes.get(i)));
    }
    Restaurant r = new Restaurant(row.getString("name"), ...,
        row.getSet("service_area", String.class), openingHours, null);
    r.setId(id);
    return r;
  }
}
Agenda
• Why polyglot persistence?
• Persisting entities with MongoDB and Cassandra
• Querying data with MongoDB and Cassandra
• Scaling MongoDB and Cassandra
Finding available restaurants
Available restaurants =
Serve the zip code of the delivery address
AND
Are open at the delivery time
public interface AvailableRestaurantRepository {
List<AvailableRestaurant>
findAvailableRestaurants(Address deliveryAddress, Date deliveryTime);
...
}
Finding available restaurants on Monday, 6.15pm for 94619 zipcode
Straightforward three-way join
select r.*
from restaurant r
inner join restaurant_time_range tr
on r.id = tr.restaurant_id
inner join restaurant_zipcode sa
on r.id = sa.restaurant_id
where '94619' = sa.zip_code
and tr.day_of_week = 'Monday'
and tr.openingtime <= 1815
and 1815 <= tr.closingtime
MongoDB = easy to query
{
  serviceArea: "94619",
  openingHours: {
    $elemMatch: {
      "dayOfWeek": 1,
      "open": {$lte: 1815},
      "close": {$gte: 1815}
    }
  }
}
DBCursor cursor = collection.find(qbeObject);
while (cursor.hasNext()) {
  DBObject o = cursor.next();
  …
}
db.availableRestaurants.createIndex({serviceArea: 1})
Using Spring Data for Mongo
@Repository
class RestaurantRepositoryMongoDbImpl implements RestaurantRepository {
@Override
public List<AvailableRestaurant> findAvailableRestaurants(
Address deliveryAddress, Date deliveryTime) {
int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);
Query query =
new Query(
where("serviceArea").is(deliveryAddress.getZip())
.and("openingHours")
.elemMatch(
where("dayOfWeek").is(dayOfWeek)
.and("openingTime").lte(timeOfDay)
.and("closingTime").gte(timeOfDay)));
return mongoTemplate.find(
query, AvailableRestaurant.class,
AVAILABLE_RESTAURANTS_COLLECTION);
}
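Why `elemMatch` matters can be shown without a database. The sketch below is plain Java, not MongoDB driver code; `TimeSlot` and `ElemMatchSketch` are names invented for the illustration. `$elemMatch` requires a single array element to satisfy every condition, whereas separate conditions on the array's fields can each be satisfied by a different element.

```java
import java.util.List;

// One entry of the openingHours array.
class TimeSlot {
    final int dayOfWeek;
    final int open;
    final int close;

    TimeSlot(int dayOfWeek, int open, int close) {
        this.dayOfWeek = dayOfWeek;
        this.open = open;
        this.close = close;
    }
}

class ElemMatchSketch {
    // $elemMatch semantics: one element must satisfy all three conditions.
    static boolean elemMatch(List<TimeSlot> slots, int day, int time) {
        for (TimeSlot s : slots) {
            if (s.dayOfWeek == day && s.open <= time && s.close >= time) {
                return true;
            }
        }
        return false;
    }

    // Without $elemMatch: each condition may be met by a different element.
    static boolean fieldByField(List<TimeSlot> slots, int day, int time) {
        boolean dayOk = false;
        boolean openOk = false;
        boolean closeOk = false;
        for (TimeSlot s : slots) {
            dayOk |= s.dayOfWeek == day;
            openOk |= s.open <= time;
            closeOk |= s.close >= time;
        }
        return dayOk && openOk && closeOk;
    }
}
```

With Monday slots 1130-1430 and 1730-2130, a 1500 delivery is rejected by `elemMatch` but wrongly accepted by `fieldByField`, because the lunch slot satisfies the open condition and the dinner slot satisfies the close condition.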
BUT how to do this with Cassandra??!
• How can Cassandra support a query that has
• A 3-way join
• Multiple =
• > and <
?
→ We need to denormalize the data!!
Simplification #1: Denormalization
Restaurant_id  Day_of_week  Open_time  Close_time  Zip_code
1              Monday       1130       1430        94707
1              Monday       1130       1430        94619
1              Monday       1730       2130        94707
1              Monday       1730       2130        94619
2              Monday       0700       1430        94619
…
SELECT restaurant_id
FROM time_range_zip_code
WHERE day_of_week = 'Monday'
AND zip_code = '94619'
AND 1815 < close_time
AND open_time < 1815
Simpler query:
• No joins
• Two = and two <
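The denormalized table is the cross product of a restaurant's opening periods and its zip codes. A minimal sketch of that step, using made-up helper types (`OpenPeriod`, `TimeRangeZipRow`, `Denormalizer`) rather than the deck's domain classes:

```java
import java.util.ArrayList;
import java.util.List;

// One opening period of a restaurant.
class OpenPeriod {
    final String dayOfWeek;
    final int openTime;
    final int closeTime;

    OpenPeriod(String dayOfWeek, int openTime, int closeTime) {
        this.dayOfWeek = dayOfWeek;
        this.openTime = openTime;
        this.closeTime = closeTime;
    }
}

// One row of the denormalized time_range_zip_code table.
class TimeRangeZipRow {
    final int restaurantId;
    final OpenPeriod period;
    final String zipCode;

    TimeRangeZipRow(int restaurantId, OpenPeriod period, String zipCode) {
        this.restaurantId = restaurantId;
        this.period = period;
        this.zipCode = zipCode;
    }
}

class Denormalizer {
    // Emits one row per (opening period, zip code) pair.
    static List<TimeRangeZipRow> denormalize(
            int restaurantId, List<OpenPeriod> periods, List<String> zipCodes) {
        List<TimeRangeZipRow> rows = new ArrayList<TimeRangeZipRow>();
        for (OpenPeriod period : periods) {
            for (String zip : zipCodes) {
                rows.add(new TimeRangeZipRow(restaurantId, period, zip));
            }
        }
        return rows;
    }
}
```

The cost is write amplification: a restaurant with P opening periods and Z zip codes becomes P × Z rows, all of which must be rewritten when either list changes.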
Simplification #2: Application filtering
SELECT restaurant_id, open_time
FROM time_range_zip_code
WHERE day_of_week = 'Monday'
AND zip_code = '94619'
AND 1815 < close_time
The open_time < 1815 condition is dropped from the query and checked by the application instead
Even simpler query
• No joins
• Two = and one <
This is a CQL query!
Available restaurants table
create table available_restaurants (
id int,
name text,
zip_code text,
day_of_week int,
open_time int,
close_time int,
primary key ((zip_code, day_of_week), close_time, id)
) ;
Compound primary key: ((zip_code, day_of_week), close_time, id)
Composite partition key (zip_code, day_of_week) = row key
Clustering columns (close_time, id) prefix column names
Cassandra available_restaurants table
primary key ((zip_code, day_of_week), close_time, id)
Row key (zipcode:day of week): 94619:Monday
Column name (close_time:id:≪column name≫)   Column value
1430:1:name                                 Ajanta
1430:1:open_time                            1130
1430:2:name                                 Egg shop
1430:2:open_time                            0800
2130:1:name                                 Ajanta
2130:1:open_time                            1730
Finding available restaurants
select *
from available_restaurants
where
zip_code = '94619'
and day_of_week = 1
and close_time > 1815;
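How the primary key makes this query cheap can be modeled with nested maps: the partition key selects one sorted row, and the close_time bound becomes a range scan over the clustering key. The sketch below is an in-memory toy with invented names (`AvailableRestaurantsTable`, `Slot`); it uses `>=` on close_time and does the open_time check in the application.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Non-key columns stored under one clustering key.
class Slot {
    final int openTime;
    final String name;

    Slot(int openTime, String name) {
        this.openTime = openTime;
        this.name = name;
    }
}

class AvailableRestaurantsTable {
    // partition key "zip:day" -> clustering key "closeTime:id" (zero padded so it sorts numerically)
    private final Map<String, TreeMap<String, Slot>> partitions =
            new HashMap<String, TreeMap<String, Slot>>();

    void insert(String zip, int day, int closeTime, int id, int openTime, String name) {
        String partitionKey = zip + ":" + day;
        TreeMap<String, Slot> partition = partitions.get(partitionKey);
        if (partition == null) {
            partition = new TreeMap<String, Slot>();
            partitions.put(partitionKey, partition);
        }
        partition.put(String.format("%04d:%06d", closeTime, id), new Slot(openTime, name));
    }

    List<String> findAvailable(String zip, int day, int timeOfDay) {
        List<String> result = new ArrayList<String>();
        TreeMap<String, Slot> partition = partitions.get(zip + ":" + day);
        if (partition == null) {
            return result;
        }
        // Range scan: every clustering key with close_time >= timeOfDay
        String from = String.format("%04d:", timeOfDay);
        for (Slot slot : partition.tailMap(from, true).values()) {
            // Application-side filter on open_time
            if (slot.openTime <= timeOfDay) {
                result.add(slot.name);
            }
        }
        return result;
    }
}
```

Only one partition is touched and the scan starts at the first qualifying close_time, so the database never evaluates a join or a second inequality.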
Cassandra query
@Repository
class AvailableRestaurantRepositoryCassandraImpl
    implements AvailableRestaurantRepository {
  public AvailableRestaurantRepositoryCassandraImpl(Session session) {
    this.session = session;
    this.findAvailable = session.prepare(
        "Select open_time, name " +
        "From restaurants.available_restaurants " +
        "Where zip_code = ? " +
        "And day_of_week = ? " +
        "And close_time >= ?;"
    );
    …
  }
  @Override
  public List<AvailableRestaurant> findAvailableRestaurants(
      Address deliveryAddress, Date deliveryTime) {
    List<AvailableRestaurant> result =
        new ArrayList<AvailableRestaurant>();
    int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
    BoundStatement bound = findAvailable.bind(deliveryAddress.getZip(),
        DateTimeUtil.dayOfWeek(deliveryTime), timeOfDay);
    // Cassandra applies the close_time range; open_time is filtered here in the application
    for (Row row : session.execute(bound).all()) {
      if (row.getInt("open_time") <= timeOfDay) {
        result.add(
            new AvailableRestaurant(row.getString("name"))
        );
      }
    }
    return result;
  }
}
NoSQL: denormalized representation for each query
Sorry Ted!
http://en.wikipedia.org/wiki/Edgar_F._Codd
About Cassandra and MongoDB
• Cassandra:
• Efficient storage of complex aggregates
• Limited queries requiring denormalized representation
• MongoDB:
• Efficient storage of complex aggregates
• Rich ad hoc queries
But where they get really interesting is when it comes to scaling
Agenda
• Why polyglot persistence?
• Persisting entities with MongoDB and Cassandra
• Querying data with MongoDB and Cassandra
• Scaling MongoDB and Cassandra
Scaling MongoDB: Replica Sets
[Diagram: a Client connects to the seed servers of a Replica Set of three mongod processes: one primary and two secondaries. Writes and consistent reads go to the primary, which replicates to the secondaries; reads from secondaries may be inconsistent. The master is elected automatically.]
http://docs.mongodb.org/manual/replication/
Scaling MongoDB: Sharding
[Diagram: a Client talks to Mongos routers, which use a Config Server (three mongod processes) and route requests to Replica Set 1 (aka Shard 1) and Replica Set 2 (aka Shard 2); each replica set has one primary and two secondary mongod processes. Routing is key-based, or scatter/gather.]
http://docs.mongodb.org/manual/core/sharding-introduction/
MongoDB Sharding
• Collection is partitioned into chunks
• Each shard is responsible for one or more chunks
• Range-based sharding
• Each chunk is responsible for a range of keys
• Efficient execution of range queries BUT risk of uneven distribution
• Hash-based sharding
• Key is hashed and mapped into chunk
• Good distribution BUT range queries processed by all shards
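The trade-off between the two schemes can be sketched in Java. The code below is an assumption-laden illustration, not MongoDB's implementation: `ChunkRouting` is a made-up name, the lowest chunk bound is assumed to be the empty string so every key has a chunk, and the hash variant uses a simple modulo, whereas MongoDB assigns ranges of hashed values to chunks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ChunkRouting {
    // Range-based: chunkLowerBounds maps each chunk's inclusive lower bound to its chunk id.
    // Returns the chunks that can hold keys in [from, to].
    static List<Integer> rangeChunks(
            TreeMap<String, Integer> chunkLowerBounds, String from, String to) {
        // The chunk containing 'from' is the one with the greatest lower bound <= from
        String first = chunkLowerBounds.floorKey(from);
        return new ArrayList<Integer>(
                chunkLowerBounds.subMap(first, true, to, true).values());
    }

    // Hash-based: neighbouring keys land on unrelated chunks.
    static int hashChunk(String key, int chunkCount) {
        return Math.floorMod(key.hashCode(), chunkCount);
    }
}
```

With range-based chunks a query over a narrow key range touches few chunks, while with hashed keys the router cannot tell which chunks hold the range and has to ask all shards.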
MongoDB reads and writes
• Writes
• Trade-off: request latency vs. safety
• No acknowledgement!
• Acknowledgement by primary or by primary & N - 1 replicas
• Acknowledgement after committing to journal
• Tag-based, e.g. write to servers in different data centers
• Reads
• Read uncommitted isolation - reads can return data that has not been committed yet
• Master - the default
• Secondary - if stale data is ok
• Use tags
Write concern: { w: N, j: true/false, wtimeout: timeout }
Cassandra cluster
Key → Partitioner (MurmurHash or MD5) → 64/128-bit hash (a.k.a. token)
VNode owns a range of hash values
Node owns a collection of vnodes
Each vnode's range is stored on several replicas
http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
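The token ring with virtual nodes can be approximated with a sorted map. This is a simplified sketch under stated assumptions: `TokenRing` is an invented class, tokens come from `String.hashCode()` instead of a real partitioner, and replicas are simply the next distinct nodes clockwise, ignoring racks and data centers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class TokenRing {
    // token -> physical node that owns the vnode starting at that token
    private final TreeMap<Integer, String> ring = new TreeMap<Integer, String>();

    // Each physical node owns several vnodes scattered around the ring.
    void addNode(String node, int vnodeCount) {
        for (int i = 0; i < vnodeCount; i++) {
            ring.put((node + "#" + i).hashCode(), node);
        }
    }

    // Walks clockwise from the key's token and collects distinct nodes.
    List<String> replicasFor(String key, int replicationFactor) {
        int token = key.hashCode();
        List<String> walk = new ArrayList<String>(ring.tailMap(token, true).values());
        walk.addAll(ring.headMap(token, false).values()); // wrap around the ring
        List<String> replicas = new ArrayList<String>();
        for (String node : walk) {
            if (replicas.size() == replicationFactor) {
                break;
            }
            if (!replicas.contains(node)) {
                replicas.add(node);
            }
        }
        return replicas;
    }
}
```

Because every node holds many small ranges, adding or removing a node moves a little data from many peers rather than a large range from one neighbour.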
Multiple data centers
DC 1 DC 2
Cassandra reads and writes
• Any node can handle any request
• Plays the role of coordinator
• Communicates with replica nodes
• Write request
• Update is written to commit log of one or more replicas
• Other replicas are updated asynchronously
• Read request
• Read data from one or more replicas
• Choose the most recent data based on timestamp
• Read repair: sends updates to stale replicas
No master!
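The read path (pick the most recent value by timestamp, then repair stale replicas) can be sketched as follows. `VersionedValue` and `ReadCoordinator` are hypothetical names, and the map stands in for the replicas that the coordinator contacted.

```java
import java.util.Map;

// A column value together with its write timestamp.
class VersionedValue {
    final String value;
    final long timestamp;

    VersionedValue(String value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }
}

class ReadCoordinator {
    // replicas: replica name -> value currently stored on that replica
    static VersionedValue read(Map<String, VersionedValue> replicas) {
        // 1. Choose the most recent value based on timestamp
        VersionedValue newest = null;
        for (VersionedValue v : replicas.values()) {
            if (newest == null || v.timestamp > newest.timestamp) {
                newest = v;
            }
        }
        // 2. Read repair: push the newest value to replicas that are behind
        for (Map.Entry<String, VersionedValue> entry : replicas.entrySet()) {
            if (entry.getValue().timestamp < newest.timestamp) {
                entry.setValue(newest);
            }
        }
        return newest;
    }
}
```

Conflict resolution here is last-write-wins, so it depends on timestamps being comparable across writers.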
Cassandra read and write
consistency
• For each read and write request you specify:
• How many nodes to read/write before responding
• Local (single DC) vs. Multi-DCs
• All replicas in all DCs will eventually be updated
• Trade-off:
• More nodes: greater consistency but less availability and higher latency
• Fewer nodes: less consistency but higher availability and lower latency
http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
Consistency examples
• High-performance, high-availability writes, e.g. logging
• Write consistency of ANY - succeeds even if all replica nodes are down (the write is stored as a hint)
• Read consistency of ONE - any replica
• Consistent reads
• (nodes_written + nodes_read) > replication_factor
• Read/Write consistency of LOCAL_QUORUM
• Globally consistent reads
• Read/write consistency of QUORUM
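The overlap rule is simple arithmetic, shown here as a small Java helper. `ConsistencyMath` is a name chosen for this example; the quorum formula (replication factor divided by two, rounded down, plus one) is the usual majority definition.

```java
class ConsistencyMath {
    // Majority of the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when every read set must overlap every write set in at least one replica,
    // i.e. (nodes_written + nodes_read) > replication_factor.
    static boolean readsSeeLatestWrite(int nodesWritten, int nodesRead, int replicationFactor) {
        return nodesWritten + nodesRead > replicationFactor;
    }
}
```

For a replication factor of 3, quorum is 2, so QUORUM writes plus QUORUM reads give 2 + 2 > 3 and are consistent, while writing and reading at ONE gives 1 + 1 = 2, which is not greater than 3.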
Comparing Cassandra and MongoDB
• Cassandra
• Replica model
• Write to any replica (or node)
• Sync locally/async globally
• MongoDB
• Master/slave model
• Write to master
• Sync to possibly remote master
Summary
• Each SQL/NoSQL database = set of tradeoffs
• NoSQL databases:
• Diverse
• Aggregate-oriented (typically)
• Use query-oriented data modeling (typically)
• Polyglot persistence: leverage the strengths of SQL and NoSQL
databases
Questions?
@crichardson chris@chrisrichardson.net
http://plainoldobjects.com

Contenu connexe

Tendances

Tendances (20)

Saturn2017: No such thing as a microservice!
Saturn2017: No such thing as a microservice! Saturn2017: No such thing as a microservice!
Saturn2017: No such thing as a microservice!
 
Oracle CodeOne 2019: Decompose Your Monolith: Strategies for Migrating to Mic...
Oracle CodeOne 2019: Decompose Your Monolith: Strategies for Migrating to Mic...Oracle CodeOne 2019: Decompose Your Monolith: Strategies for Migrating to Mic...
Oracle CodeOne 2019: Decompose Your Monolith: Strategies for Migrating to Mic...
 
SVCC Developing Asynchronous, Message-Driven Microservices
SVCC Developing Asynchronous, Message-Driven Microservices  SVCC Developing Asynchronous, Message-Driven Microservices
SVCC Developing Asynchronous, Message-Driven Microservices
 
Developing Event-driven Microservices with Event Sourcing & CQRS (gotoams)
Developing Event-driven Microservices with Event Sourcing & CQRS (gotoams)Developing Event-driven Microservices with Event Sourcing & CQRS (gotoams)
Developing Event-driven Microservices with Event Sourcing & CQRS (gotoams)
 
Melbourne Jan 2019 - Microservices adoption anti-patterns: Obstacles to decom...
Melbourne Jan 2019 - Microservices adoption anti-patterns: Obstacles to decom...Melbourne Jan 2019 - Microservices adoption anti-patterns: Obstacles to decom...
Melbourne Jan 2019 - Microservices adoption anti-patterns: Obstacles to decom...
 
YOW2018 - Events and Commands: Developing Asynchronous Microservices
YOW2018 - Events and Commands: Developing Asynchronous MicroservicesYOW2018 - Events and Commands: Developing Asynchronous Microservices
YOW2018 - Events and Commands: Developing Asynchronous Microservices
 
SVCC Microservices: Decomposing Applications for Testability and Deployability
SVCC Microservices: Decomposing Applications for Testability and Deployability SVCC Microservices: Decomposing Applications for Testability and Deployability
SVCC Microservices: Decomposing Applications for Testability and Deployability
 
JFokus: Cubes, Hexagons, Triangles, and More: Understanding Microservices
JFokus: Cubes, Hexagons, Triangles, and More: Understanding MicroservicesJFokus: Cubes, Hexagons, Triangles, and More: Understanding Microservices
JFokus: Cubes, Hexagons, Triangles, and More: Understanding Microservices
 
Oracle CodeOne 2019: Descending the Testing Pyramid: Effective Testing Strate...
Oracle CodeOne 2019: Descending the Testing Pyramid: Effective Testing Strate...Oracle CodeOne 2019: Descending the Testing Pyramid: Effective Testing Strate...
Oracle CodeOne 2019: Descending the Testing Pyramid: Effective Testing Strate...
 
Overview of the Eventuate Tram Customers and Orders application
Overview of the Eventuate Tram Customers and Orders applicationOverview of the Eventuate Tram Customers and Orders application
Overview of the Eventuate Tram Customers and Orders application
 
Decompose that WAR? A pattern language for microservices (@QCON @QCONSP)
Decompose that WAR? A pattern language for microservices (@QCON @QCONSP)Decompose that WAR? A pattern language for microservices (@QCON @QCONSP)
Decompose that WAR? A pattern language for microservices (@QCON @QCONSP)
 
DDD SoCal: Decompose your monolith: Ten principles for refactoring a monolith...
DDD SoCal: Decompose your monolith: Ten principles for refactoring a monolith...DDD SoCal: Decompose your monolith: Ten principles for refactoring a monolith...
DDD SoCal: Decompose your monolith: Ten principles for refactoring a monolith...
 
Mucon: Not Just Events: Developing Asynchronous Microservices
Mucon: Not Just Events: Developing Asynchronous MicroservicesMucon: Not Just Events: Developing Asynchronous Microservices
Mucon: Not Just Events: Developing Asynchronous Microservices
 
An overview of the Eventuate Platform
An overview of the Eventuate PlatformAn overview of the Eventuate Platform
An overview of the Eventuate Platform
 
Developing event-driven microservices with event sourcing and CQRS (london Ja...
Developing event-driven microservices with event sourcing and CQRS (london Ja...Developing event-driven microservices with event sourcing and CQRS (london Ja...
Developing event-driven microservices with event sourcing and CQRS (london Ja...
 
Events on the outside, on the inside and at the core - Chris Richardson
Events on the outside, on the inside and at the core - Chris RichardsonEvents on the outside, on the inside and at the core - Chris Richardson
Events on the outside, on the inside and at the core - Chris Richardson
 
A Pattern Language for Microservices
A Pattern Language for MicroservicesA Pattern Language for Microservices
A Pattern Language for Microservices
 
#DevNexus202 Decompose your monolith
#DevNexus202 Decompose your monolith#DevNexus202 Decompose your monolith
#DevNexus202 Decompose your monolith
 
Solving distributed data management problems in a microservice architecture (...
Solving distributed data management problems in a microservice architecture (...Solving distributed data management problems in a microservice architecture (...
Solving distributed data management problems in a microservice architecture (...
 
YOW! Perth: Cubes, Hexagons, Triangles, and More: Understanding the Microserv...
YOW! Perth: Cubes, Hexagons, Triangles, and More: Understanding the Microserv...YOW! Perth: Cubes, Hexagons, Triangles, and More: Understanding the Microserv...
YOW! Perth: Cubes, Hexagons, Triangles, and More: Understanding the Microserv...
 

En vedette

Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Johnny Miller
 
Developing polyglot persistence applications (SpringOne China 2012)
Developing polyglot persistence applications (SpringOne China 2012)Developing polyglot persistence applications (SpringOne China 2012)
Developing polyglot persistence applications (SpringOne China 2012)
Chris Richardson
 

En vedette (6)

Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
 
httpie
httpiehttpie
httpie
 
RESTful Web Services with Jersey
RESTful Web Services with JerseyRESTful Web Services with Jersey
RESTful Web Services with Jersey
 
Developing polyglot persistence applications (SpringOne China 2012)
Developing polyglot persistence applications (SpringOne China 2012)Developing polyglot persistence applications (SpringOne China 2012)
Developing polyglot persistence applications (SpringOne China 2012)
 
Developing polyglot persistence applications (gluecon 2013)
Developing polyglot persistence applications (gluecon 2013)Developing polyglot persistence applications (gluecon 2013)
Developing polyglot persistence applications (gluecon 2013)
 
Microservices pattern language (microxchg microxchg2016)
Microservices pattern language (microxchg microxchg2016)Microservices pattern language (microxchg microxchg2016)
Microservices pattern language (microxchg microxchg2016)
 

Similaire à Polyglot persistence for Java developers: time to move out of the relational comfort zone? (gids 2015)

Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data Lessons
George Stathis
 
Final_CloudEventFrankfurt2017 (1).pdf
Final_CloudEventFrankfurt2017 (1).pdfFinal_CloudEventFrankfurt2017 (1).pdf
Final_CloudEventFrankfurt2017 (1).pdf
MongoDB
 

Similaire à Polyglot persistence for Java developers: time to move out of the relational comfort zone? (gids 2015) (20)

Developing polyglot persistence applications (svcc, svcc2013)
Developing polyglot persistence applications (svcc, svcc2013)Developing polyglot persistence applications (svcc, svcc2013)
Developing polyglot persistence applications (svcc, svcc2013)
 
SQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveSQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The Move
 
CDC to the Max!
CDC to the Max!CDC to the Max!
CDC to the Max!
 
Polyglot persistence for Java developers: time to move out of the relational comfort zone? (gids 2015)

  • 1. Polyglot persistence for Java developers: time to move out of the relational comfort zone? Chris Richardson Author of POJOs in Action Founder of the original CloudFoundry.com @crichardson chris@chrisrichardson.net http://plainoldobjects.com
  • 2. @crichardson Presentation Goal The benefits and drawbacks of polyglot persistence and How to design applications that use this approach
  • 4. @crichardson About Chris Founder of a startup that’s creating a platform for developing event-driven microservices
  • 5. @crichardson Agenda • Why polyglot persistence? • Persisting entities with MongoDB and Cassandra • Querying data with MongoDB and Cassandra • Scaling MongoDB and Cassandra
  • 7. @crichardson Example: Food to Go • Take-out food delivery service • “Launched” in 2006
  • 9. @crichardson Example: Device management server ~ 2003 • Everything was stored in an Oracle database • Device metadata • Firmware patches! • ….
  • 10. @crichardson RDBMS are great • SQL = Rich, declarative query language • Database enforces referential integrity • ACID semantics • Well understood by developers • Well supported by frameworks and tools, e.g. Spring JDBC, Hibernate, JPA • Well understood by operations
  • 11. @crichardson Impact of SSD/Flash storage • HDD = 200 IOPS vs. SSD = 100K IOPS • Massive performance improvement • Expands the range of use cases that a single RDBMS server can cost-effectively support
  • 12. @crichardson • Hosted relational database • Compatible with MySQL 5.6 but with 5x performance • Vertically scales to 32 vCPUs and 244 GiB of RAM • SSD-backed virtualized storage layer, replicated 6 ways across 3 AZs • Up to 15 replicas that share storage with master - minimal replication lag • Fast restart after crash • No redo log replay • SSD-backed virtualized storage layer purpose-built for database workloads • Fast fail-over to replica after master instance failure without data loss AWS Aurora http://aws.amazon.com/rds/aurora/details/
  • 13. NEW SQL • Next generation SQL databases, e.g. VoltDB, MemSQL, ... • Leverage modern, multi-core, commodity hardware • In-memory • Horizontally scalable • Transparently shardable • ACID “Current databases are designed for 1970s hardware and for both OLTP and data warehouses” http://nms.csail.mit.edu/~stavros/pubs/OLTP_sigmod08.pdf
  • 14. @crichardson An RDBMS is great for many applications but ….
  • 15. @crichardson Limitations of relational databases • Scalability • Multi data center, distributed database • Schema updates • O/R impedance mismatch • Handling semi-structured data
  • 16. @crichardson Solution: Spend $$$ on Oracle’s high-end databases and servers
  • 18. @crichardson … or is it? http://www.iwtg.net/
  • 19. @crichardson Solution: Spend $$$ - open-source stack + DevOps people http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_5_series/madone_5_2/#
  • 20. @crichardson Apply the scale cube X axis - horizontal duplication Z axis - data partitioning Y axis - functional decomposition Scale by splitting similar things Scale by splitting different things
  • 21. @crichardson Applying the scale cube • Y-axis splits/functional decomposition • Application = Set[Microservice] - each with its own database • Monolithic database is functionally decomposed • Different types of entities in different databases • Z-axis splits/sharding • Entities of the same type partitioned across multiple databases
  • 22. @crichardson How does each service access data? ? Velocity and Volume Variety of Data Fixed or ad hoc queries Access patterns Distribution Latency
  • 23. @crichardson Velocity and Volume? • Velocity - speed at which data moves • Volume - the amount of data • Does it fit on a single machine?
  • 24. @crichardson Variety of Data? • Relational • Aggregate oriented • Graph • Complex nested structures • Semi structured • Text • Binary blobs, e.g. images
  • 25. @crichardson Fixed or ad hoc queries? • Fixed set of queries • Known in advance • Slowly changing • Ad hoc queries • Users can submit ad hoc queries
  • 26. @crichardson Access patterns • PK-oriented access, e.g. load-modify-update a business entity • Bulk queries and/or updates • Non-relational queries: • text search • graph-oriented • geo search • …
  • 27. @crichardson Reads vs. Writes • Mix of reads and writes • Write intensive, e.g. logging application • Read intensive • Data analytics/warehouse • Slowly changing data • …
  • 28. @crichardson Distribution • Single database • Multiple active databases • on a LAN (low latency) • on a WAN (high latency)
  • 30. @crichardson Latency • When should new data show up in results? • Low latency - seconds, milliseconds? • High latency - next day?
  • 31. @crichardson And then pick your database…
  • 32. @crichardson Use a NoSQL database Benefits • Higher performance • Higher scalability • Richer data-model • Schema-less Drawbacks • Limited transactions • Limited querying • Relaxed consistency • Unconstrained data
  • 33. @crichardson Example NoSQL Databases Database Key features Cassandra Extensible column store, very scalable, distributed MongoDB Document-oriented, fast, scalable Redis Key-value store, very fast DynamoDB AWS hosted key-value and document store Neo4j Graph Database http://nosql-database.org/ lists 150 NoSQL databases
  • 35. @crichardson But there are many other options • Blob store, e.g. AWS S3 • Text search engine, e.g. ElasticSearch, AWS CloudSearch, … • Big data technology: Apache Hadoop, Apache Spark, … • Real time streaming: Storm, Spark Streaming, …
  • 36. @crichardson Polyglot persistence IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg Event sourcing and CQRS are a great approach
  • 37. @crichardson Agenda • Why polyglot persistence? • Persisting entities with MongoDB and Cassandra • Querying data with MongoDB and Cassandra • Scaling MongoDB and Cassandra
  • 38. @crichardson Food to Go – Domain model (partial) class Restaurant { long id; String name; Set<String> serviceArea; Set<TimeRange> openingHours; List<MenuItem> menuItems; } class MenuItem { String name; double price; } class TimeRange { long id; int dayOfWeek; int openTime; int closeTime; }
  • 39. @crichardson Database schema ID Name … 1 Ajanta 2 Montclair Eggshop Restaurant_id zipcode 1 94707 1 94619 2 94611 2 94619 Restaurant_id dayOfWeek openTime closeTime 1 Monday 1130 1430 1 Monday 1730 2130 2 Tuesday 1130 … RESTAURANT table RESTAURANT_ZIPCODE table RESTAURANT_TIME_RANGE table
  • 40. @crichardson RestaurantRepository public interface RestaurantRepository { void addRestaurant(Restaurant restaurant); Restaurant findById(long id); ... } Food To Go will eventually have scaling issues
  • 41. @crichardson MongoDB • Document-oriented database • JSON-style documents: Lists, Maps, primitives • Schema-less • Transaction = update of a single document • Rich query language for dynamic/ad hoc queries + geo queries • Tunable writes: speed vs. reliability • Highly scalable and available
  • 42. @crichardson MongoDB use cases • High volume writes • Complex data • Semi-structured data
  • 43. @crichardson MongoDB data model Server Database: Food To Go Collection: Restaurants { "_id" : ObjectId("4bddc2f49d1505567c6220a0"), "name": "Ajanta", "serviceArea": ["94619", "99999"], "openingHours": [ { "dayOfWeek": 1, "open": 1130, "close": 1430 }, { "dayOfWeek": 2, "open": 1130, "close": 1430 }, … ] } BSON = binary JSON Sequence of bytes on disk → fast I/O 16MByte limit PK
  • 45. @crichardson Basic MongoDB collection operations... • insert(document(s), options) • Application assigned ids • Mongo generated UUID • update(query, update, options) • query - selects document(s) • update - replace or modify document (e.g. increment a field) • options - upsert, multi, … (optional) • remove(query, options)
  • 46. @crichardson ....Basic MongoDB collection operations • find/findOne(criteria, projection) • criteria - query • projection - fields to return (optional)
  • 47. @crichardson Using Spring Data for Mongo @Repository class RestaurantRepositoryMongoDbImpl implements RestaurantRepository { @Override public void add(Restaurant restaurant) { mongoTemplate.insert(restaurant, "restaurants"); } @Override public Restaurant findDetailsById(int id) { return mongoTemplate.findById(id, Restaurant.class, "restaurants"); } Spring Data’s Generic Repositories = even less code
  • 48. @crichardson Apache Cassandra • Distributed/Extensible row store: row ~= java.util.SortedMap • Transaction = update of a row • Fast writes = append to a log • Tunable reads/writes: consistency vs. latency/availability • Extremely scalable • Transparent and dynamic clustering • Rack and datacenter aware data replication
  • 49. @crichardson Apache Cassandra use cases • Big data • Multiple Data Center distributed database • (Write intensive) Logging • High-availability (writes)
  • 50. @crichardson Cassandra data model Keyspace Table K1 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3K2 Column Name Column Value Timestamp Row Key Column name/value: number, string, Boolean, timestamp, counter, and composite
  • 51. @crichardson Inserting/updating data table.insert(key=K1, (N4, V4, TS4), …) Idempotent = transaction Table K1 N1 V1 TS1 … N2 V2 TS2 N3 V3 TS3 Table K1 N1 V1 TS1 … N2 V2 TS2 N3 V3 TS3 N4 V4 TS4 optional column TTL Application assigned keys - natural or UUID
  • 52. @crichardson Reading data table.slice(key=K1, startColumn=N2, endColumn=N4) Tables K1 N1 V1 TS1 … N2 V2 TS2 N3 V3 TS3 N4 V4 TS4 K1 N2 V2 TS2 N3 V3 TS3 N4 V4 TS4 Cassandra has secondary indexes but they aren’t always helpful
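The row-as-sorted-map idea from the preceding slides can be sketched in plain Java. This is a minimal in-memory model, not the Cassandra API: the class `WideRowSketch`, its `Cell` type, and the method names are hypothetical, and the "newer timestamp wins" rule is an assumption used to illustrate why replaying an insert is idempotent.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of a single row; not the Cassandra API.
final class WideRowSketch {
    // A column cell: the value plus the timestamp of the write that produced it.
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Columns are kept sorted by column name, like a java.util.SortedMap.
    private final NavigableMap<String, Cell> columns = new TreeMap<>();

    // Insert or overwrite a column. The write with the newer timestamp wins,
    // so replaying the same insert leaves the row unchanged.
    void insert(String name, String value, long timestamp) {
        Cell current = columns.get(name);
        if (current == null || timestamp >= current.timestamp) {
            columns.put(name, new Cell(value, timestamp));
        }
    }

    // Slice: every column whose name lies in [startColumn, endColumn].
    NavigableMap<String, Cell> slice(String startColumn, String endColumn) {
        return columns.subMap(startColumn, true, endColumn, true);
    }

    public static void main(String[] args) {
        WideRowSketch k1 = new WideRowSketch();
        k1.insert("N1", "V1", 1L);
        k1.insert("N2", "V2", 2L);
        k1.insert("N3", "V3", 3L);
        k1.insert("N4", "V4", 4L);
        System.out.println(k1.slice("N2", "N4").keySet());
    }
}
```

Because the columns are sorted, a slice is a contiguous range scan, which is why column-range reads are cheap in this model.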
  • 53. @crichardson Cassandra Query Language • SQL-like • DDL: Create table, ... • DML: Insert, Update, Select, ... • Restricted WHERE clauses, e.g. PK equality only (if you want efficiency) • Primary key: • Simple - 1 storage table row → 1 CQL row • Compound - 1 storage table row → multiple CQL rows! (clustered rows)
  • 54. @crichardson Representing restaurants create table restaurant ( restaurant_id int PRIMARY KEY, name text, service_area set<text>, day_of_weeks list<int>, opening_times list<int>, closing_times list<int> );
  • 55. @crichardson Inserting and retrieving restaurants insert into restaurants.restaurant( restaurant_id, name, service_area, day_of_weeks, opening_times, closing_times) Values(?, ?, ?, ?, ?, ?) select * from restaurants.restaurant where restaurant_id = ?
  • 56. @crichardson Storing restaurants in Cassandra name Ajanta1 serviceArea:94619 - serviceArea:94618 - Set member daysOfWeeks:0 Monday daysOfWeeks:1 Monday Element index Element value
  • 57. @crichardson Cassandra Java APIs • Java Driver • https://github.com/datastax/java-driver • Netflix Astyanax • http://techblog.netflix.com/2013/12/astyanax-update.html • Spring Data for Cassandra • http://projects.spring.io/spring-data-cassandra/
  • 58. @crichardson Java Driver: Inserting a restaurant public class AvailableRestaurantRepositoryCassandraImpl ... public AvailableRestaurantRepositoryCassandraImpl(Session session) { insertStatement = session.prepare( "insert into restaurants.restaurant(restaurant_id, name, service_area, day_of_weeks, opening_times, closing_times) Values(?, ?, ?, ?, ?, ?);" ); ... } @Override public void add(Restaurant restaurant) { List<Integer> dayOfWeeks = new ArrayList<Integer>(); List<Integer> openingTimes = new ArrayList<Integer>(); List<Integer> closingTimes = new ArrayList<Integer>(); for (TimeRange tr : restaurant.getOpeningHours()) { dayOfWeeks.add(tr.getDayOfWeek()); openingTimes.add(tr.getOpenHour()); closingTimes.add(tr.getClosingTime()); } session.execute(insertStatement.bind(restaurant.getId(), restaurant.getName(), restaurant.getServiceArea(), dayOfWeeks, openingTimes, closingTimes )); }
  • 59. @crichardson Java Driver: Finding a restaurant public class AvailableRestaurantRepositoryCassandraImpl implements AvailableRestaurantRepository { public AvailableRestaurantRepositoryCassandraImpl(Session session) { this.findByIdStatement = session.prepare( "select * from restaurants.restaurant where restaurant_id = ?;"); ... } @Override public Restaurant findDetailsById(int id) { Row row = session.execute(findByIdStatement.bind(id)).all().get(0); List<Integer> dayOfWeeks = row.getList("day_of_weeks", Integer.class); List<Integer> openingTimes= row.getList("opening_times", Integer.class); List<Integer> closingTimes = row.getList("closing_times", Integer.class); Set<TimeRange> openingHours = new HashSet<TimeRange>(); for (int i = 0 ; i < dayOfWeeks.size(); i++) { openingHours.add( new TimeRange(dayOfWeeks.get(i), openingTimes.get(i), closingTimes.get(i))); } Restaurant r = new Restaurant(row.getString("name"), ..., row.getSet("service_area", String.class), openingHours, null); r.setId(id); return r; }
  • 60. @crichardson Agenda • Why polyglot persistence? • Persisting entities with MongoDB and Cassandra • Querying data with MongoDB and Cassandra • Scaling MongoDB and Cassandra
  • 61. @crichardson Finding available restaurants Available restaurants = Serve the zip code of the delivery address AND Are open at the delivery time public interface AvailableRestaurantRepository { List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime); ... }
  • 62. @crichardson Finding available restaurants on Monday, 6.15pm for 94619 zipcode Straightforward three-way join select r.* from restaurant r inner join restaurant_time_range tr on r.id = tr.restaurant_id inner join restaurant_zipcode sa on r.id = sa.restaurant_id where '94619' = sa.zip_code and tr.day_of_week = 'monday' and tr.openingtime <= 1815 and 1815 <= tr.closingtime
  • 63. @crichardson MongoDB = easy to query { serviceArea:"94619", openingHours: { $elemMatch : { "dayOfWeek" : "Monday", "open": {$lte: 1815}, "close": {$gte: 1815} } } } DBCursor cursor = collection.find(qbeObject); while (cursor.hasNext()) { DBObject o = cursor.next(); … } db.availableRestaurants.ensureIndex({serviceArea: 1})
  • 64. @crichardson Using Spring Data for Mongo @Repository class RestaurantRepositoryMongoDbImpl implements RestaurantRepository { @Override public List<AvailableRestaurant> findAvailableRestaurants( Address deliveryAddress, Date deliveryTime) { int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime); int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime); Query query = new Query( where("serviceArea").is(deliveryAddress.getZip()) .and("openingHours") .elemMatch( where("dayOfWeek").is(dayOfWeek) .and("openingTime").lte(timeOfDay) .and("closingTime").gte(timeOfDay))); return mongoTemplate.find( query, AvailableRestaurant.class, AVAILABLE_RESTAURANTS_COLLECTION); }
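The meaning of the `$elemMatch` query can be spelled out as an in-memory predicate: the zip code must be in the service area, and a single opening-hours element must satisfy all three conditions at once. The sketch below is plain Java with hypothetical classes (`ElemMatchSketch`, `TimeRange`, `Restaurant`); it is not MongoDB or Spring Data code, and the field names are assumed from the document shown earlier.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory equivalent of the availability query; illustration only.
final class ElemMatchSketch {
    static final class TimeRange {
        final int dayOfWeek;
        final int open;
        final int close;

        TimeRange(int dayOfWeek, int open, int close) {
            this.dayOfWeek = dayOfWeek;
            this.open = open;
            this.close = close;
        }
    }

    static final class Restaurant {
        final String name;
        final Set<String> serviceArea;
        final List<TimeRange> openingHours;

        Restaurant(String name, Set<String> serviceArea, List<TimeRange> openingHours) {
            this.name = name;
            this.serviceArea = serviceArea;
            this.openingHours = openingHours;
        }
    }

    // True when the restaurant serves the zip code AND one single time range
    // matches the day and contains the delivery time (the $elemMatch part).
    static boolean isAvailable(Restaurant r, String zip, int dayOfWeek, int timeOfDay) {
        if (!r.serviceArea.contains(zip)) {
            return false;
        }
        for (TimeRange tr : r.openingHours) {
            if (tr.dayOfWeek == dayOfWeek && tr.open <= timeOfDay && timeOfDay <= tr.close) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Restaurant ajanta = new Restaurant(
                "Ajanta",
                new HashSet<>(Arrays.asList("94619")),
                Arrays.asList(new TimeRange(1, 1130, 1430), new TimeRange(2, 1730, 2130)));
        // Day 1 is open only at lunch, so 18:15 on day 1 does not match,
        // even though another element covers 18:15 on day 2.
        System.out.println(isAvailable(ajanta, "94619", 1, 1815));
        System.out.println(isAvailable(ajanta, "94619", 2, 1815));
    }
}
```

Without the per-element requirement, the day condition could match one array element and the time condition another, which would report the restaurant as open when it is not.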
  • 65. @crichardson BUT how to do this with Cassandra??! • How can Cassandra support a query that has • A 3-way join • Multiple = • > and < ? → We need to denormalize the data!!
  • 66. @crichardson Simplification #1: Denormalization Restaurant_id Day_of_week Open_time Close_time Zip_code 1 Monday 1130 1430 94707 1 Monday 1130 1430 94619 1 Monday 1730 2130 94707 1 Monday 1730 2130 94619 2 Monday 0700 1430 94619 … SELECT restaurant_id FROM time_range_zip_code WHERE day_of_week = 'Monday' AND zip_code = 94619 AND 1815 < close_time AND open_time < 1815 Simpler query: • No joins • Two = and two <
  • 67. @crichardson Simplification #2: Application filtering SELECT restaurant_id, open_time FROM time_range_zip_code WHERE day_of_week = 'Monday' AND zip_code = 94619 AND 1815 < close_time Even simpler query • No joins • Two = and one < This is a CQL query!
  • 68. @crichardson Available restaurants table create table available_restaurants ( id int, name text, zip_code text, day_of_week int, open_time int, close_time int, primary key ((zip_code, day_of_week), close_time, id) ) ; Compound primary key Clustering columns prefix column names Composite partition key = row key
  • 69. @crichardson Cassandra available_restaurants table Row key (zipcode:day of week): 94619:Monday Columns (close_time:id:≪column name≫ → value): 1430:1:name → Ajanta, 1430:1:open_time → 1130, 1730:1:name → Ajanta, 1730:1:open_time → 2130, 1430:2:name → Egg shop, 1430:2:open_time → 0800 primary key ((zip_code, day_of_week), close_time, id)
  • 70. @crichardson Finding available restaurants select * from available_restaurants where zip_code = '94619' and day_of_week = 1 and close_time > 1815;
  • 71. @crichardson Cassandra query @Repository class AvailableRestaurantRepositoryCassandraImpl implements RestaurantRepository { public AvailableRestaurantRepositoryCassandraImpl(Session session) { this.findAvailable = session.prepare( "Select open_time, restaurant_name " + "From restaurants.available_restaurants " + "Where zip_code = ? " + "And day_of_week = ? " + " And close_time >= ?;" ); … }
  • 72. @crichardson Cassandra query @Repository class AvailableRestaurantRepositoryCassandraImpl implements RestaurantRepository { @Override public List<AvailableRestaurant> findAvailableRestaurants( Address deliveryAddress, Date deliveryTime) { List<AvailableRestaurant> result = new ArrayList<AvailableRestaurant>(); int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime); BoundStatement bound = findAvailable.bind(deliveryAddress.getZip(), DateTimeUtil.dayOfWeek(deliveryTime), timeOfDay); for (Row row : session.execute(bound).all()) { if (row.getInt("open_time") <= timeOfDay) { result.add( new AvailableRestaurant(row.getString("restaurant_name")) ); } } return result; }
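The application-side half of this query can be isolated from the driver code. In the sketch below, the database is assumed to have already applied the two equality conditions and `close_time >= ?`; the application then drops rows whose `open_time` is later than the delivery time. The class `AvailabilityFilterSketch` and its `Candidate` type are hypothetical, and "Late Diner" is made-up sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the rows returned by the CQL query; illustration only.
final class AvailabilityFilterSketch {
    static final class Candidate {
        final String name;
        final int openTime;
        final int closeTime;

        Candidate(String name, int openTime, int closeTime) {
            this.name = name;
            this.openTime = openTime;
            this.closeTime = closeTime;
        }
    }

    // Keep only candidates that are already open at timeOfDay.
    // The close_time condition is assumed to have been checked by the database.
    static List<String> openAt(List<Candidate> rows, int timeOfDay) {
        List<String> result = new ArrayList<>();
        for (Candidate row : rows) {
            if (row.openTime <= timeOfDay) {
                result.add(row.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Candidate> rows = Arrays.asList(
                new Candidate("Ajanta", 1730, 2130),
                new Candidate("Late Diner", 1900, 2300));
        System.out.println(openAt(rows, 1815));
    }
}
```

The trade-off is that the database may return rows the application discards, so this approach suits cases where the partition for one zip code and day stays small.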
  • 75. @crichardson About Cassandra and MongoDB • Cassandra: • Efficient storage of complex aggregates • Limited queries requiring denormalized representation
 • MongoDB • Efficient storage of complex aggregates • Rich ad hoc queries But where they get really interesting is when it comes to scaling
  • 76. @crichardson Agenda • Why polyglot persistence? • Persisting entities with MongoDB and Cassandra • Querying data with MongoDB and Cassandra • Scaling MongoDB and Cassandra
  • 77. Scaling MongoDB: Replica Sets Replica Set Mongod (secondary) Mongod (primary) Mongod (secondary) Client http://docs.mongodb.org/manual/replication/ Writes Consistent reads Inconsistent reads replication Automatic master election Connects to seed servers
  • 78. Mongos Scaling MongoDB: Sharding Replica Set 2 (aka. Shard 2) Mongod (secondary) Mongod (primary) Mongod (secondary) Replica Set 1 (aka. Shard 1) Mongod (secondary) Mongod (primary) Mongod (secondary) Mongos Client Config Server mongod mongod mongod http://docs.mongodb.org/manual/core/sharding-introduction/ Key-based routing or Scatter/gather
  • 79. @crichardson MongoDB Sharding • Collection is partitioned into chunks • Each shard is responsible for one or more chunks • Range-based sharding • Each chunk is responsible for a range of keys • Efficient execution of range queries BUT risk of uneven distribution • Hash-based sharding • Key is hashed and mapped into chunk • Good distribution BUT range queries processed by all shards
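The difference between the two sharding strategies can be sketched with a sorted map for ranges and a modulo for hashes. This is a simplified model under stated assumptions: `ChunkLookupSketch`, the shard names, and the use of `String.hashCode()` are all hypothetical and do not reflect how MongoDB computes chunk boundaries or hashes.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical chunk lookup; not MongoDB's routing implementation.
final class ChunkLookupSketch {
    // Range-based: each chunk is registered under its lowest key.
    private final TreeMap<String, String> chunkStartToShard = new TreeMap<>();

    void addChunk(String lowestKey, String shard) {
        chunkStartToShard.put(lowestKey, shard);
    }

    // The owning chunk is the one with the greatest start key <= the shard key,
    // so neighbouring keys land on the same shard and range queries stay local.
    String shardForRange(String shardKey) {
        Map.Entry<String, String> chunk = chunkStartToShard.floorEntry(shardKey);
        if (chunk == null) {
            throw new IllegalArgumentException("no chunk covers key " + shardKey);
        }
        return chunk.getValue();
    }

    // Hash-based: neighbouring keys scatter across chunks, which evens out load
    // but means a range query has to ask every shard.
    static int chunkForHash(String shardKey, int chunkCount) {
        return Math.floorMod(shardKey.hashCode(), chunkCount);
    }

    public static void main(String[] args) {
        ChunkLookupSketch lookup = new ChunkLookupSketch();
        lookup.addChunk("", "shard1");
        lookup.addChunk("94000", "shard2");
        System.out.println(lookup.shardForRange("94619"));
        System.out.println(chunkForHash("94619", 4));
    }
}
```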
  • 80. @crichardson MongoDB reads and writes • Writes • Trade-off: request latency vs. safety • No acknowledgement! • Acknowledgement by primary or by primary & N - 1 replicas • Acknowledgement after committing to journal • Tag-based, e.g. write to servers in different data centers • Reads • Read uncommitted isolation - reads can return data that has not been committed yet • Master - the default • Secondary - if stale data is ok • Use tags { w: N, j: true/false, wtimeout: timeout }
  • 81. @crichardson Cassandra cluster http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 Key Partitioner 64/128-bit hash (a.k.a. token) VNode owns a range of hash values Replicas MurmurHash MD5 Node owns collection of vnodes
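The token ring with virtual nodes can be approximated with a sorted map from token to owning node. The sketch below is a simplification and not Cassandra's implementation: `TokenRingSketch` is hypothetical, the token is assumed to be the first 8 bytes of an MD5 digest, vnode tokens are derived from the node name rather than assigned randomly, and replicas are simply the next distinct nodes clockwise (rack and datacenter awareness is ignored).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical token ring; a simplification of the vnode idea, not Cassandra code.
final class TokenRingSketch {
    // token -> node that owns the vnode starting at that token
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Assumption: use the first 8 bytes of the MD5 digest as a 64-bit token.
    static long token(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // Each physical node owns a collection of vnodes spread around the ring.
    void addNode(String node, int vnodeCount) {
        for (int i = 0; i < vnodeCount; i++) {
            ring.put(token(node + "#vnode" + i), node);
        }
    }

    // Walk clockwise from the key's token and collect distinct nodes.
    List<String> replicasFor(String rowKey, int replicationFactor) {
        long start = token(rowKey);
        Set<String> replicas = new LinkedHashSet<>();
        collect(ring.tailMap(start, true).values(), replicas, replicationFactor);
        collect(ring.headMap(start, false).values(), replicas, replicationFactor);
        return new ArrayList<>(replicas);
    }

    private static void collect(Collection<String> owners, Set<String> replicas, int limit) {
        for (String owner : owners) {
            if (replicas.size() >= limit) {
                return;
            }
            replicas.add(owner);
        }
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode("node-a", 8);
        ring.addNode("node-b", 8);
        ring.addNode("node-c", 8);
        System.out.println(ring.replicasFor("94619:Monday", 2));
    }
}
```

Giving each node many small ranges instead of one large range is what lets load rebalance gradually when a node joins or leaves.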
  • 83. @crichardson Cassandra reads and writes • Any node can handle any request • Plays the role of coordinator • Communicates with replica nodes • Write request • Update is written to commit log of one or more replicas • Other replicas are updated asynchronously • Read request • Read data from one or more replicas • Choose the most recent data based on timestamp • Read repair: sends updates to stale replicas No Master!
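The read path described above (pick the most recent value by timestamp, then repair stale replicas) can be sketched with plain maps standing in for replicas. `ReadRepairSketch` and `Versioned` are hypothetical names, and the sketch repairs synchronously for simplicity; it makes no claim about how or when Cassandra schedules repairs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical coordinator logic over in-memory replicas; illustration only.
final class ReadRepairSketch {
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Read the key from every contacted replica, return the newest value,
    // and push that value to any replica that was missing it or was stale.
    static String read(String key, List<Map<String, Versioned>> replicas) {
        Versioned newest = null;
        for (Map<String, Versioned> replica : replicas) {
            Versioned candidate = replica.get(key);
            if (candidate != null && (newest == null || candidate.timestamp > newest.timestamp)) {
                newest = candidate;
            }
        }
        if (newest == null) {
            return null;
        }
        for (Map<String, Versioned> replica : replicas) {
            Versioned current = replica.get(key);
            if (current == null || current.timestamp < newest.timestamp) {
                replica.put(key, newest); // read repair
            }
        }
        return newest.value;
    }

    public static void main(String[] args) {
        Map<String, Versioned> fresh = new HashMap<>();
        Map<String, Versioned> stale = new HashMap<>();
        fresh.put("name", new Versioned("Ajanta", 2L));
        stale.put("name", new Versioned("Ajanta (old)", 1L));
        List<Map<String, Versioned>> replicas = new ArrayList<>();
        replicas.add(stale);
        replicas.add(fresh);
        System.out.println(read("name", replicas));
        System.out.println(stale.get("name").value);
    }
}
```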
  • 84. @crichardson Cassandra read and write consistency • For each read and write request you specify: • How many nodes to read/write before responding • Local (single DC) vs. Multi-DCs • All replicas in all DCs will eventually be updated • Trade-off: • More nodes: greater consistency but less availability and higher latency • Fewer nodes: less consistency but higher availability and lower latency http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
  • 85. @crichardson Consistency examples • High-performance, high-availability writes, e.g. logging • Write consistency of ANY - even replicas can be down • Read consistency of ONE - any replica • Consistent reads • (nodes_written + nodes_read) > replication_factor • Read/Write consistency of LOCAL_QUORUM • Globally consistent reads • Read/write consistency of QUORUM
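The consistent-reads rule is simple arithmetic: if the set of replicas written and the set of replicas read must overlap, a read always touches at least one replica holding the latest write. A minimal sketch, with a hypothetical `ConsistencyMath` helper and the usual majority definition of a quorum:

```java
// Hypothetical helper for the overlap rule; illustration only.
final class ConsistencyMath {
    // A quorum is a majority of the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Reads see the latest acknowledged write when the write set and the
    // read set are guaranteed to share at least one replica.
    static boolean readsSeeLatestWrite(int nodesWritten, int nodesRead, int replicationFactor) {
        return nodesWritten + nodesRead > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println(q);
        System.out.println(readsSeeLatestWrite(q, q, rf));
        System.out.println(readsSeeLatestWrite(1, 1, rf));
    }
}
```

With a replication factor of 3, quorum writes and quorum reads touch 2 + 2 = 4 > 3 replicas, so they overlap; writing and reading a single replica each (1 + 1 = 2) does not.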
  • 86. @crichardson Comparing Cassandra and MongoDB • Cassandra • Replica model • Write to any replica (or Node) • Sync locally/async globally
 • MongoDB • Master/slave model • Write to master • Sync to possibly remote master
  • 87. @crichardson Summary • Each SQL/NoSQL database = set of tradeoffs • NoSQL databases: • Diverse • Aggregate-oriented (typically) • Use query-oriented data modeling (typically) • Polyglot persistence: leverage the strengths of SQL and NoSQL databases