Relational databases have long been considered the one true way to persist enterprise data. Even today, they are an excellent choice for many applications. But for some applications NoSQL databases are a viable alternative. They can simplify the persistence of complex data models and offer significantly better scalability and performance. But using NoSQL databases is very different from the ACID/SQL/JDBC/JPA world that we have become accustomed to. They have different and unfamiliar APIs and a very different and usually limited transaction model. So what’s a Java developer to do?
Polyglot persistence for Java developers: time to move out of the relational comfort zone? (gids 2015)
1. Polyglot persistence for Java developers: time to move out of the relational comfort zone?
Chris Richardson
Author of POJOs in Action
Founder of the original CloudFoundry.com
@crichardson
chris@chrisrichardson.net
http://plainoldobjects.com
5. @crichardson
Agenda
• Why polyglot persistence?
• Persisting entities with MongoDB and Cassandra
• Querying data with MongoDB and Cassandra
• Scaling MongoDB and Cassandra
10. @crichardson
RDBMS are great
• SQL = Rich, declarative query language
• Database enforces referential integrity
• ACID semantics
• Well understood by developers
• Well supported by frameworks and tools, e.g. Spring JDBC, Hibernate, JPA
• Well understood by operations
11. @crichardson
Impact of SSD/Flash storage
• HDD = 200 IOPS vs. SSD = 100K IOPS
• Massive performance improvement
• Expands the range of use cases that a single RDBMS server can cost-effectively support
12. @crichardson
• Hosted relational database
• Compatible with MySQL 5.6 but with 5x performance
• Vertically scales to 32 vCPUs and 244 GiB of RAM
• SSD-backed virtualized storage layer, replicated 6 ways across 3 AZs
• Up to 15 replicas that share storage with master - minimal replication lag
• Fast restart after crash
• No redo log replay
• SSD-backed virtualized storage layer purpose-built for database workloads
• Fast fail-over to replica after master instance failure without data loss
AWS Aurora
http://aws.amazon.com/rds/aurora/details/
13. NEW SQL
• Next generation SQL databases, e.g. VoltDB, MemSQL, ...
• Leverage modern, multi-core, commodity hardware
• In-memory
• Horizontally scalable
• Transparently shardable
• ACID
“Current databases are designed for 1970s hardware and for both OLTP and data warehouses”
http://nms.csail.mit.edu/~stavros/pubs/OLTP_sigmod08.pdf
20. @crichardson
Apply the scale cube
[Figure: the scale cube]
• X axis - horizontal duplication
• Y axis - functional decomposition: scale by splitting different things
• Z axis - data partitioning: scale by splitting similar things
21. @crichardson
Applying the scale cube
• Y-axis splits/functional decomposition
• Application = Set[Microservice] - each with its own database
• Monolithic database is functionally decomposed
• Different types of entities in different databases
• Z-axis splits/sharding
• Entities of the same type partitioned across multiple databases
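A Z-axis split can be sketched as a routing function from an entity's key to the database that owns it. The minimal Java sketch below assumes a fixed list of shards and simple modulo routing; the class name `ShardRouter` and the shard names are hypothetical and do not come from the talk.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical router: picks the database (shard) that owns a given entity id.
final class ShardRouter {
    private final List<String> shardNames;

    ShardRouter(List<String> shardNames) {
        if (shardNames.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shardNames = shardNames;
    }

    // Z-axis split: entities of the same type are spread across shards by key.
    // floorMod keeps the index non-negative even for negative ids.
    String shardFor(long entityId) {
        int index = (int) Math.floorMod(entityId, (long) shardNames.size());
        return shardNames.get(index);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(
            Arrays.asList("restaurants-0", "restaurants-1", "restaurants-2"));
        System.out.println(router.shardFor(7L));
    }
}
```

Modulo routing is the simplest possible choice; it reshuffles most keys when the number of shards changes, which is one reason real databases map keys to chunks or token ranges instead.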
22. @crichardson
How does each service access data?
[Figure: factors that influence the choice]
• Velocity and volume
• Variety of data
• Fixed or ad hoc queries
• Access patterns
• Distribution
• Latency
24. @crichardson
Variety of Data?
• Relational
• Aggregate oriented
• Graph
• Complex nested structures
• Semi structured
• Text
• Binary blobs, e.g. images
25. @crichardson
Fixed or ad hoc queries?
• Fixed set of queries
• Known in advance
• Slowly changing
• Ad hoc queries
• Users can submit ad hoc queries
26. @crichardson
Access patterns
• PK-oriented access, e.g. load-modify-update a business entity
• Bulk queries and/or updates
• Non-relational queries:
• text search
• graph-oriented
• geo search
• …
27. @crichardson
Reads vs. Writes
• Mix of reads and writes
• Write intensive, e.g. logging application
• Read intensive
• Data analytics/warehouse
• Slowly changing data
• …
35. @crichardson
But there are many other options
• Blob store, e.g. AWS S3
• Text search engine, e.g. ElasticSearch, AWS CloudSearch, …
• Big data technology: Apache Hadoop, Apache Spark, …
• Real time streaming: Storm, Spark Streaming, …
37. @crichardson
Agenda
• Why polyglot persistence?
• Persisting entities with MongoDB and Cassandra
• Querying data with MongoDB and Cassandra
• Scaling MongoDB and Cassandra
38. @crichardson
Food to Go – Domain model (partial)
class Restaurant {
    long id;
    String name;
    Set<String> serviceArea;
    Set<TimeRange> openingHours;
    List<MenuItem> menuItems;
}

class MenuItem {
    String name;
    double price;
}

class TimeRange {
    long id;
    int dayOfWeek;
    int openTime;
    int closeTime;
}
41. @crichardson
MongoDB
• Document-oriented database
• JSON-style documents: Lists, Maps, primitives
• Schema-less
• Transaction = update of a single document
• Rich query language for dynamic/ad hoc queries + geo queries
• Tunable writes: speed vs. reliability
• Highly scalable and available
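To make "JSON-style documents: Lists, Maps, primitives" concrete, here is a minimal sketch of how the Restaurant aggregate could be laid out as one nested document. It uses only plain Java collections rather than a MongoDB driver, and the class and method names are hypothetical. The `_id` key follows MongoDB's convention for the primary key; the other field names mirror the domain model and are an assumption about how the mapping would look.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: builds the nested Map/List shape of a restaurant document.
final class RestaurantDocumentSketch {

    // One opening-hours entry becomes a nested map inside the document.
    static Map<String, Object> timeRange(int dayOfWeek, int openTime, int closeTime) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("dayOfWeek", dayOfWeek);
        m.put("openTime", openTime);
        m.put("closeTime", closeTime);
        return m;
    }

    // The whole aggregate is a single document, so a single-document
    // update covers the restaurant and all of its opening hours.
    static Map<String, Object> restaurant(long id, String name,
                                          List<String> serviceArea,
                                          List<Map<String, Object>> openingHours) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", id);
        doc.put("name", name);
        doc.put("serviceArea", new ArrayList<>(serviceArea));
        doc.put("openingHours", new ArrayList<>(openingHours));
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = restaurant(1L, "Ajanta",
            Arrays.asList("94619", "94618"),
            Arrays.asList(timeRange(1, 1130, 1430)));
        System.out.println(doc);
    }
}
```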
47. @crichardson
Using Spring Data for Mongo
@Repository
class RestaurantRepositoryMongoDbImpl implements RestaurantRepository {

    // Injected by Spring; the declaration is not shown on the slide.
    @Autowired
    private MongoTemplate mongoTemplate;

    @Override
    public void add(Restaurant restaurant) {
        mongoTemplate.insert(restaurant, "restaurants");
    }

    @Override
    public Restaurant findDetailsById(int id) {
        return mongoTemplate.findById(id, Restaurant.class, "restaurants");
    }
}
Spring Data’s Generic Repositories = even less code
48. @crichardson
Apache Cassandra
• Distributed/Extensible row store: row ~= java.util.SortedMap
• Transaction = update of a row
• Fast writes = append to a log
• Tunable reads/writes: consistency vs. latency/availability
• Extremely scalable
• Transparent and dynamic clustering
• Rack and datacenter aware data replication
49. @crichardson
Apache Cassandra use cases
• Big data
• Multiple Data Center distributed database
• (Write intensive) Logging
• High-availability (writes)
50. @crichardson
Cassandra data model
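[Figure: Keyspace > Table > Row. Each row has a row key and a set of columns; each column has a name, a value and a timestamp]
Row key   Columns (name, value, timestamp)
K1        (N1, V1, TS1)  (N2, V2, TS2)  (N3, V3, TS3)
K2        (N1, V1, TS1)  (N2, V2, TS2)  (N3, V3, TS3)
Column name/value: number, string, Boolean, timestamp, counter, and composite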
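The "row ~= java.util.SortedMap" analogy can be sketched in a few lines of Java. This is an in-memory toy, not Cassandra's storage engine: the class `RowStoreSketch` and its methods are hypothetical, values are simplified to strings, and the rule that the newest timestamp wins is stated here as an assumption that matches the later slide on choosing the most recent data by timestamp.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of one table: row key -> columns sorted by column name.
final class RowStoreSketch {

    // One cell: a value plus the timestamp of the write that produced it.
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, SortedMap<String, Cell>> rows = new HashMap<>();

    // Keeps the cell with the newest timestamp; older writes are ignored.
    void put(String rowKey, String columnName, String value, long timestamp) {
        SortedMap<String, Cell> row = rows.computeIfAbsent(rowKey, k -> new TreeMap<>());
        Cell existing = row.get(columnName);
        if (existing == null || timestamp >= existing.timestamp) {
            row.put(columnName, new Cell(value, timestamp));
        }
    }

    // Returns the current value of a column, or null if it does not exist.
    String get(String rowKey, String columnName) {
        SortedMap<String, Cell> row = rows.get(rowKey);
        if (row == null) {
            return null;
        }
        Cell cell = row.get(columnName);
        return cell == null ? null : cell.value;
    }

    // Columns are sorted by name, so a range of column names is one contiguous slice.
    SortedMap<String, Cell> slice(String rowKey, String fromInclusive, String toExclusive) {
        SortedMap<String, Cell> row = rows.get(rowKey);
        if (row == null) {
            return new TreeMap<>();
        }
        return row.subMap(fromInclusive, toExclusive);
    }
}
```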
55. @crichardson
Inserting and retrieving restaurants
insert into restaurants.restaurant(
    restaurant_id, name, service_area,
    day_of_weeks, opening_times,
    closing_times)
values(?, ?, ?, ?, ?, ?)

select *
from restaurants.restaurant
where restaurant_id = ?
56. @crichardson
Storing restaurants in Cassandra
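[Figure: one restaurant row; collections are flattened into columns]
Column name          Column value
name                 Ajanta1
serviceArea:94619    -
serviceArea:94618    -
daysOfWeeks:0        Monday
daysOfWeeks:1        Monday
For a set, the set member is part of the column name. For a list, the column name carries the element index and the column value holds the element value.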
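The layout in the figure can be imitated with a small Java sketch that flattens a set and a list into column name/value pairs. It only mirrors the naming scheme shown on the slide (`serviceArea:<member>`, `daysOfWeeks:<index>`); the class `CollectionColumns` is hypothetical, the empty string stands in for the "-" placeholder, and none of this is the actual Cassandra encoding.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative flattening of collections into column name/value pairs.
final class CollectionColumns {

    static SortedMap<String, String> flatten(String name,
                                             Set<String> serviceArea,
                                             List<String> daysOfWeeks) {
        SortedMap<String, String> columns = new TreeMap<>();
        columns.put("name", name);
        // Set: the member itself goes into the column name; the value is empty.
        for (String zipCode : serviceArea) {
            columns.put("serviceArea:" + zipCode, "");
        }
        // List: the element index goes into the column name; the value is the element.
        for (int i = 0; i < daysOfWeeks.size(); i++) {
            columns.put("daysOfWeeks:" + i, daysOfWeeks.get(i));
        }
        return columns;
    }

    public static void main(String[] args) {
        Set<String> zips = new TreeSet<>(Arrays.asList("94619", "94618"));
        System.out.println(flatten("Ajanta", zips, Arrays.asList("Monday", "Monday")));
    }
}
```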
57. @crichardson
Cassandra Java APIs
• Java Driver
• https://github.com/datastax/java-driver
• Netflix Astyanax
• http://techblog.netflix.com/2013/12/astyanax-update.html
• Spring Data for Cassandra
• http://projects.spring.io/spring-data-cassandra/
58. @crichardson
Java Driver: Inserting a restaurant
public class AvailableRestaurantRepositoryCassandraImpl ...
    public AvailableRestaurantRepositoryCassandraImpl(Session session) {
        // A Java string literal cannot span lines, so the CQL is concatenated.
        insertStatement = session.prepare(
            "insert into restaurants.restaurant(restaurant_id, name, service_area, day_of_weeks, "
            + "opening_times, closing_times) values(?, ?, ?, ?, ?, ?);"
        );
        ...
    }

    @Override
    public void add(Restaurant restaurant) {
        // Flatten the TimeRange objects into three parallel lists, one per column.
        List<Integer> dayOfWeeks = new ArrayList<Integer>();
        List<Integer> openingTimes = new ArrayList<Integer>();
        List<Integer> closingTimes = new ArrayList<Integer>();
        for (TimeRange tr : restaurant.getOpeningHours()) {
            dayOfWeeks.add(tr.getDayOfWeek());
            // Getter names follow the TimeRange fields shown earlier (openTime, closeTime).
            openingTimes.add(tr.getOpenTime());
            closingTimes.add(tr.getCloseTime());
        }
        session.execute(insertStatement.bind(restaurant.getId(),
            restaurant.getName(),
            restaurant.getServiceArea(),
            dayOfWeeks,
            openingTimes,
            closingTimes
        ));
    }
}
59. @crichardson
Java Driver: Finding a restaurant
public class AvailableRestaurantRepositoryCassandraImpl
        implements AvailableRestaurantRepository {

    public AvailableRestaurantRepositoryCassandraImpl(Session session) {
        this.findByIdStatement = session.prepare(
            "select * from restaurants.restaurant where restaurant_id = ?;");
        ...
    }

    @Override
    public Restaurant findDetailsById(int id) {
        Row row = session.execute(findByIdStatement.bind(id)).all().get(0);
        List<Integer> dayOfWeeks = row.getList("day_of_weeks", Integer.class);
        List<Integer> openingTimes = row.getList("opening_times", Integer.class);
        List<Integer> closingTimes = row.getList("closing_times", Integer.class);
        Set<TimeRange> openingHours = new HashSet<TimeRange>();
        for (int i = 0; i < dayOfWeeks.size(); i++) {
            openingHours.add(
                new TimeRange(dayOfWeeks.get(i), openingTimes.get(i), closingTimes.get(i)));
        }
        Restaurant r = new Restaurant(row.getString("name"), ...,
            row.getSet("service_area", String.class), openingHours, null);
        r.setId(id);
        return r;
    }
}
60. @crichardson
Agenda
• Why polyglot persistence?
• Persisting entities with MongoDB and Cassandra
• Querying data with MongoDB and Cassandra
• Scaling MongoDB and Cassandra
61. @crichardson
Finding available restaurants
Available restaurants =
Serve the zip code of the delivery address
AND
Are open at the delivery time
public interface AvailableRestaurantRepository {
List<AvailableRestaurant>
findAvailableRestaurants(Address deliveryAddress, Date deliveryTime);
...
}
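The availability rule above (serves the zip code AND is open at the delivery time) can be written as a plain in-memory predicate before worrying about how each database evaluates it. This is a minimal sketch under stated assumptions: times are integers in HHMM form as in the later slides (6.15pm = 1815), and the class `AvailabilityCheck` and its nested `OpeningHours` type are hypothetical stand-ins for the domain classes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory version of "serves the zip code AND is open at the delivery time".
final class AvailabilityCheck {

    // Hypothetical stand-in for the TimeRange domain class.
    static final class OpeningHours {
        final int dayOfWeek;
        final int openTime;   // HHMM, e.g. 1730
        final int closeTime;  // HHMM, e.g. 2130

        OpeningHours(int dayOfWeek, int openTime, int closeTime) {
            this.dayOfWeek = dayOfWeek;
            this.openTime = openTime;
            this.closeTime = closeTime;
        }
    }

    static boolean isAvailable(Set<String> serviceArea, List<OpeningHours> openingHours,
                               String zipCode, int dayOfWeek, int timeOfDay) {
        if (!serviceArea.contains(zipCode)) {
            return false; // does not deliver to this zip code
        }
        for (OpeningHours h : openingHours) {
            if (h.dayOfWeek == dayOfWeek && h.openTime <= timeOfDay && timeOfDay <= h.closeTime) {
                return true; // open at the requested time
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> area = new HashSet<>(Arrays.asList("94619", "94618"));
        List<OpeningHours> hours = Arrays.asList(new OpeningHours(1, 1730, 2130));
        System.out.println(isAvailable(area, hours, "94619", 1, 1815));
    }
}
```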
62. @crichardson
Finding available restaurants on Monday, 6.15pm for 94619 zipcode
Straightforward three-way join
select r.*
from restaurant r
  inner join restaurant_time_range tr
    on r.id = tr.restaurant_id
  inner join restaurant_zipcode sa
    on r.id = sa.restaurant_id
where '94619' = sa.zip_code
  and tr.day_of_week = 'monday'
  and tr.openingtime <= 1815
  and 1815 <= tr.closingtime
64. @crichardson
Using Spring Data for Mongo
@Repository
class RestaurantRepositoryMongoDbImpl implements RestaurantRepository {

    @Override
    public List<AvailableRestaurant> findAvailableRestaurants(
            Address deliveryAddress, Date deliveryTime) {
        int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
        int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);
        Query query =
            new Query(
                where("serviceArea").is(deliveryAddress.getZip())
                    .and("openingHours")
                    .elemMatch(
                        where("dayOfWeek").is(dayOfWeek)
                            .and("openingTime").lte(timeOfDay)
                            .and("closingTime").gte(timeOfDay)));
        return mongoTemplate.find(
            query, AvailableRestaurant.class,
            AVAILABLE_RESTAURANTS_COLLECTION);
    }
}
65. @crichardson
BUT how to do this with Cassandra??!
• How can Cassandra support a query that has
• A 3-way join
• Multiple =
• > and <
?
=> We need to denormalize the data!!
66. @crichardson
Simplification #1: Denormalization
Restaurant_id  Day_of_week  Open_time  Close_time  Zip_code
1              Monday       1130       1430        94707
1              Monday       1130       1430        94619
1              Monday       1730       2130        94707
1              Monday       1730       2130        94619
2              Monday       0700       1430        94619
…
SELECT restaurant_id
FROM time_range_zip_code
WHERE day_of_week = 'Monday'
  AND zip_code = 94619
  AND 1815 < close_time
  AND open_time < 1815
Simpler query:
• No joins
• Two = and two <
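The denormalized table is the cross product of a restaurant's time ranges and its zip codes. The following Java sketch builds those rows in memory and runs the same filter as the query on the slide. All names (`DenormalizationSketch`, `Row`, `TimeRange`, `findAvailable`) are hypothetical, and a linear scan stands in for whatever index the database would use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Builds time_range_zip_code style rows and queries them without any join.
final class DenormalizationSketch {

    static final class TimeRange {
        final String dayOfWeek;
        final int openTime;
        final int closeTime;

        TimeRange(String dayOfWeek, int openTime, int closeTime) {
            this.dayOfWeek = dayOfWeek;
            this.openTime = openTime;
            this.closeTime = closeTime;
        }
    }

    static final class Row {
        final long restaurantId;
        final String dayOfWeek;
        final int openTime;
        final int closeTime;
        final String zipCode;

        Row(long restaurantId, TimeRange range, String zipCode) {
            this.restaurantId = restaurantId;
            this.dayOfWeek = range.dayOfWeek;
            this.openTime = range.openTime;
            this.closeTime = range.closeTime;
            this.zipCode = zipCode;
        }
    }

    // One row per (time range, zip code) combination.
    static List<Row> denormalize(long restaurantId, List<TimeRange> openingHours,
                                 List<String> zipCodes) {
        List<Row> rows = new ArrayList<>();
        for (TimeRange range : openingHours) {
            for (String zipCode : zipCodes) {
                rows.add(new Row(restaurantId, range, zipCode));
            }
        }
        return rows;
    }

    // Same predicate as the slide: two equality tests and two strict inequalities.
    static Set<Long> findAvailable(List<Row> table, String dayOfWeek, String zipCode, int time) {
        Set<Long> ids = new TreeSet<>();
        for (Row r : table) {
            if (r.dayOfWeek.equals(dayOfWeek) && r.zipCode.equals(zipCode)
                    && time < r.closeTime && r.openTime < time) {
                ids.add(r.restaurantId);
            }
        }
        return ids;
    }
}
```

The price of the simpler query is write amplification: a restaurant with T time ranges and Z zip codes produces T x Z rows that must all be rewritten when either collection changes.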
75. @crichardson
About Cassandra and MongoDB
• Cassandra:
• Efficient storage of complex aggregates
• Limited queries requiring denormalized representation
• MongoDB
• Efficient storage of complex aggregates
• Rich ad hoc queries
But where they get really interesting is when it comes to scaling
76. @crichardson
Agenda
• Why polyglot persistence?
• Persisting entities with MongoDB and Cassandra
• Querying data with MongoDB and Cassandra
• Scaling MongoDB and Cassandra
79. @crichardson
MongoDB Sharding
• Collection is partitioned into chunks
• Each shard is responsible for one or more chunks
• Range-based sharding
• Each chunk is responsible for a range of keys
• Efficient execution of range queries BUT risk of uneven distribution
• Hash-based sharding
• Key is hashed and mapped to a chunk
• Good distribution BUT range queries processed by all shards
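The trade-off between the two sharding schemes shows up in a small Java sketch. Range-based lookup uses a sorted map from each chunk's lower bound to its shard, so neighbouring keys stay together; hash-based lookup scatters neighbouring keys. The class `ShardingSketch`, the shard names and the integer mixing function are all assumptions made for illustration; they are not MongoDB's actual chunk metadata or hash function.

```java
import java.util.Map;
import java.util.TreeMap;

// Contrasts range-based and hash-based placement of keys.
final class ShardingSketch {

    // Range-based: a chunk owns every key from its lower bound up to the next lower bound.
    static String rangeShard(TreeMap<Integer, String> chunkLowerBounds, int key) {
        Map.Entry<Integer, String> chunk = chunkLowerBounds.floorEntry(key);
        if (chunk == null) {
            throw new IllegalArgumentException("no chunk covers key " + key);
        }
        return chunk.getValue();
    }

    // Hash-based: the key is mixed first, so adjacent keys usually land in different chunks.
    // The mixing step is a stand-in, not the hash a real database uses.
    static int hashChunk(int key, int chunkCount) {
        int h = key * 0x9E3779B9;
        h ^= (h >>> 16);
        return Math.floorMod(h, chunkCount);
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> chunks = new TreeMap<>();
        chunks.put(Integer.MIN_VALUE, "shardA");
        chunks.put(100, "shardB");
        chunks.put(200, "shardC");
        System.out.println(rangeShard(chunks, 150));
        System.out.println(hashChunk(150, 4));
    }
}
```

With the range scheme a query for keys 100 to 199 touches one shard; with the hash scheme the same query has to be sent to every shard because the keys are spread out.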
80. @crichardson
MongoDB reads and writes
• Writes
• Trade-off: request latency vs. safety
• No acknowledgement!
• Acknowledgement by primary or by primary & N - 1 replicas
• Acknowledgement after committing to journal
• Tag-based, e.g. write to servers in different data centers
• Reads
• Read uncommitted isolation - reads can return data that has not been committed yet
• Master - the default
• Secondary - if stale data is ok
• Use tags
{ w: N,
j: true/false,
wtimeout: timeout
}
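The `w` and `j` options can be read as a predicate over what has happened to a write so far. The sketch below models only that predicate in plain Java; it does not use the MongoDB driver, the class `WriteConcernSketch` is hypothetical, and `wtimeout` is left out to keep the example small.

```java
// Models the question "may the server acknowledge this write yet?"
final class WriteConcernSketch {
    final int w;      // number of members that must have the write (0 = no acknowledgement)
    final boolean j;  // true if the write must reach the journal first

    WriteConcernSketch(int w, boolean j) {
        if (w < 0) {
            throw new IllegalArgumentException("w must not be negative");
        }
        this.w = w;
        this.j = j;
    }

    // acknowledgingMembers counts the primary plus any replicas that have applied the write.
    boolean isSatisfied(int acknowledgingMembers, boolean journaled) {
        return acknowledgingMembers >= w && (!j || journaled);
    }

    public static void main(String[] args) {
        WriteConcernSketch safe = new WriteConcernSketch(2, true);
        System.out.println(safe.isSatisfied(1, true));
        System.out.println(safe.isSatisfied(2, true));
    }
}
```

Higher `w` and `j: true` make the predicate harder to satisfy, which is the request latency vs. safety trade-off from the slide.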
83. @crichardson
Cassandra reads and writes
• Any node can handle any request
• Plays the role of coordinator
• Communicates with replica nodes
• Write request
• Update is written to commit log of one or more replicas
• Other replicas are updated asynchronously
• Read request
• Read data from one or more replicas
• Choose the most recent data based on timestamp
• Read repair: sends updates to stale replicas
No master!
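The read path described above (pick the most recent value by timestamp, then repair stale replicas) can be sketched with plain maps standing in for replicas. This is a simplification under stated assumptions: the hypothetical `ReadRepairSketch.read` contacts every replica it is given, whereas a real coordinator contacts only as many replicas as the requested consistency level requires.

```java
import java.util.List;
import java.util.Map;

// Toy coordinator: reads from replicas, returns the newest value, repairs stale copies.
final class ReadRepairSketch {

    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    static String read(List<Map<String, Versioned>> replicas, String key) {
        // Step 1: choose the most recent data based on timestamp.
        Versioned newest = null;
        for (Map<String, Versioned> replica : replicas) {
            Versioned v = replica.get(key);
            if (v != null && (newest == null || v.timestamp > newest.timestamp)) {
                newest = v;
            }
        }
        if (newest == null) {
            return null; // no replica has the key
        }
        // Step 2: read repair, i.e. push the newest version to replicas that are behind.
        for (Map<String, Versioned> replica : replicas) {
            Versioned v = replica.get(key);
            if (v == null || v.timestamp < newest.timestamp) {
                replica.put(key, newest);
            }
        }
        return newest.value;
    }
}
```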
84. @crichardson
Cassandra read and write consistency
• For each read and write request you specify:
• How many nodes to read/write before responding
• Local (single DC) vs. Multi-DCs
• All replicas in all DCs will eventually be updated
• Trade-off:
• More nodes: greater consistency but less availability and higher latency
• Fewer nodes: less consistency but higher availability and lower latency
http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
85. @crichardson
Consistency examples
• High-performance, high-availability writes, e.g. logging
• Write consistency of ANY - the write succeeds even if all replicas are down (stored as a hint)
• Read consistency of ONE - any replica
• Consistent reads
• (nodes_written + nodes_read) > replication_factor
• Read/Write consistency of LOCAL_QUORUM
• Globally consistent reads
• Read/write consistency of QUORUM
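The rule (nodes_written + nodes_read) > replication_factor is simple arithmetic, and a quorum is defined as floor(replication_factor / 2) + 1. The Java sketch below encodes both so the examples on the slide can be checked; the class and method names are hypothetical.

```java
// Arithmetic behind "consistent reads" in a replicated store.
final class ConsistencyMath {

    // Quorum size for a given replication factor: a strict majority of replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when every read set must overlap every write set in at least one replica.
    static boolean readsSeeLatestWrite(int nodesWritten, int nodesRead, int replicationFactor) {
        return nodesWritten + nodesRead > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println(q);
        System.out.println(readsSeeLatestWrite(q, q, rf));
        System.out.println(readsSeeLatestWrite(1, 1, rf));
    }
}
```

With a replication factor of 3, quorum reads and quorum writes each touch 2 replicas, and 2 + 2 > 3, so at least one replica in every read has seen the latest write. Writing to one replica and reading from one (1 + 1 <= 3) gives no such guarantee.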
86. @crichardson
Comparing Cassandra and MongoDB
• Cassandra
• Replica model
• Write to any replica (or node)
• Sync locally/async globally
• MongoDB
• Master/slave model
• Write to master
• Sync to possibly remote master
87. @crichardson
Summary
• Each SQL/NoSQL database = set of tradeoffs
• NoSQL databases:
• Diverse
• Aggregate-oriented (typically)
• Use query-oriented data modeling (typically)
• Polyglot persistence: leverage the strengths of SQL and NoSQL databases