Data Modeling for Microservices with Cassandra and Spark

Strata + Hadoop World NYC Sept 26-29, 2016
Jeff Carpenter, Choice Hotels International

Agenda
1 IT Transformation – Distribution and Analytics
2 Creating a Data Architecture
3 Data Modeling for Microservices
4 Using Metadata for Diagnostics and Analytics
5 Challenges

IT Capabilities
[Diagram: Corporate IT, Guest, Franchise Relations, Hotel Management, Business Intelligence, Distribution; this talk covers Distribution]

Distribution - Central Reservation System
[Diagram: the Central Reservation System (CRS) connected to Web and Mobile, External Channels, Customer & Loyalty, Billing, Property Systems, and Reporting & Analytics, spanning the Distribution, Guest, Franchisee, Hotel Management, and Business Intelligence domains]

Current Reservation System – By The Numbers
• 25 years
• 6,000 hotels
• 50 transactions / second
• 4,000 distribution channels
• 1 instance

New Systems: Distribution and Data Platforms
[Diagram: Distribution Platform and Data Platform, linked by realtime data and history]
See: "Choice Hotels's journey to better understand its customers through self-service analytics"
This talk: how we model data and use the self-service platform

Distribution Platform - Architecture Tenets
Cloud-native
Microservices
Open Source Infrastructure
Extensibility
Stable, Scalable, Secure

What is a Microservice? (one definition)
[Diagram: a Client reaches a Composing Service, an Entity Service, and a Message Driven Service through REST APIs and AMQ Events; each service has its own Persistence (DB), illustrating Data Ownership]

How can we design our data architecture & models to be…
• Scalable?
• Extensible?
• Maintainable?
• Analytics-ready?

Our Data Stack
• Non-relational storage
• Long Term Storage
• Logging
• Reporting & Analytics
• Metrics

Data Modeling – Then and Now
Isolated Systems → Data Dictionary → SOA and Canonical Data Model → Services own data
Conceptual Data Model
• Identifying domains and relationships
Logical Data Model
• Identifying data types and relationships
Physical Models
• Java APIs
• RESTful APIs (JSON)
• Events (JSON)
• Cassandra Schemas

Conceptual Data Model - Domains
Domains: rates, inventory, hotels, reservations, offers

Conceptual Data Model – Domain Relationships
[Diagram: Hotel Management Domain, Guest Domain, and Distribution Domain, relating hotels, guest, stay, loyalty, rates, inventory, offers, and reservations]

Logical Data Model – Identifying Types
Rates Domain: Composite Rate Service, Rate Plan Service, Rate Service
Rate Plan
• id
• code
• hotelId
• effectiveDates
• conditions
Rate
• id
• ratePlanId
• productId
• hotelId
• dateSpan
Price
• condition
• amount
Product
• id
• code
• hotelId
• features
• …
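The logical types above can be sketched as plain data classes. This is a minimal Python sketch, not the services' actual Java types: the snake_case names, the DateSpan shape (inclusive start and end dates), Decimal amounts, conditions as strings, and the assumption that a Rate carries a list of Prices are all illustrative choices. Product's remaining fields are elided on the slide and are left out.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class DateSpan:
    """Inclusive date range (assumed shape for effectiveDates / dateSpan)."""
    start: date
    end: date

    def contains(self, day: date) -> bool:
        return self.start <= day <= self.end


@dataclass
class Price:
    condition: str    # hypothetical condition code, e.g. "1_ADULT"
    amount: Decimal   # Decimal avoids binary floating-point rounding for money


@dataclass
class RatePlan:
    id: str
    code: str
    hotel_id: str
    effective_dates: DateSpan
    conditions: List[str] = field(default_factory=list)


@dataclass
class Product:
    id: str
    code: str
    hotel_id: str
    features: List[str] = field(default_factory=list)
    # the slide elides Product's remaining fields ("…"); none are invented here


@dataclass
class Rate:
    id: str
    rate_plan_id: str   # refers to RatePlan.id
    product_id: str     # refers to Product.id
    hotel_id: str
    date_span: DateSpan
    prices: List[Price] = field(default_factory=list)  # assumption: prices nested in a Rate

    def price_for(self, condition: str) -> Decimal:
        """Return the amount for a condition; raise KeyError if none matches."""
        for p in self.prices:
            if p.condition == condition:
                return p.amount
        raise KeyError(condition)
```

As on the slide, a Rate refers to its RatePlan and Product by id rather than embedding them.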

Standardizing Common Data Types
• Instead of a Canonical Data Model, we standardize basic building blocks:
  – Feature, Category, Brand
  – Geospatial
  – Financial
  – Time
  – Contact information
Address
• lines[]
• city
• subdivision
• country
• postalCode
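A shared building block like Address only helps if every service serializes it the same way. A minimal Python sketch, assuming camelCase JSON keys that match the field names above and empty-string defaults; reading subdivision as "state, province, or similar region" is an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Address:
    lines: List[str] = field(default_factory=list)  # street lines, in order
    city: str = ""
    subdivision: str = ""   # assumed: state, province, or similar region
    country: str = ""
    postal_code: str = ""

    def to_json(self) -> str:
        """Serialize with camelCase keys (postalCode) and a stable key order."""
        return json.dumps({
            "lines": self.lines,
            "city": self.city,
            "subdivision": self.subdivision,
            "country": self.country,
            "postalCode": self.postal_code,
        }, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Address":
        """Parse the shared JSON shape; missing keys fall back to defaults."""
        d = json.loads(text)
        return cls(
            lines=list(d.get("lines", [])),
            city=d.get("city", ""),
            subdivision=d.get("subdivision", ""),
            country=d.get("country", ""),
            postal_code=d.get("postalCode", ""),
        )
```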

Data Types → Microservice Identification
[Diagram: one service per domain: Hotel Service (Hotel Domain), Rates Service (Rates Domain), Inventory Service (Inventory Domain), Offer Service (Offer Domain), Reservation Service (Reservation Domain); clients are Data Maintenance Apps and Internal / External Client Apps]

Physical Data Models
Physical Models: Java APIs, RESTful APIs (JSON), Events (JSON), Cassandra Schemas
JSON is the primary definition of the data type owned by each service

Key Data Types → RESTful Resource Paths
• Hotel Service: /hotels
• Rates Service: /rates
• Inventory Service: /inventory
• Offer Service: /offers
• Reservation Service: /reservations

Java and RESTful APIs – common pattern
REST                            | Java API
GET /types/<id>                 | Type getTypeById(TypeId)
GET /types?<query parameters>   | Type[] searchType(TypeSearchCriteria)
POST /types/ (JSON body)        | createType(Type)
PUT /types/ (JSON body)         | updateType(Type)
DELETE /types/<id>              | deleteType(TypeId)
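The common pattern can be sketched as a thin routing layer that maps each REST call onto the matching API method. This Python sketch is illustrative only: "type" stands for any service-owned data type and is represented as a dict with an "id" key, storage is an in-memory dict, and the routing is deliberately minimal (no status codes, validation, or JSON parsing).

```python
import re
from typing import Dict, List, Optional


class InMemoryTypeService:
    """Hypothetical service exposing the getTypeById/searchType/... shape."""

    def __init__(self) -> None:
        self._store: Dict[str, dict] = {}

    def get_type_by_id(self, type_id: str) -> Optional[dict]:
        return self._store.get(type_id)

    def search_type(self, criteria: Dict[str, str]) -> List[dict]:
        # every query parameter must match the stored value
        return [t for t in self._store.values()
                if all(str(t.get(k)) == v for k, v in criteria.items())]

    def create_type(self, t: dict) -> dict:
        self._store[t["id"]] = t
        return t

    def update_type(self, t: dict) -> dict:
        if t["id"] not in self._store:
            raise KeyError(t["id"])
        self._store[t["id"]] = t
        return t

    def delete_type(self, type_id: str) -> None:
        self._store.pop(type_id, None)


def dispatch(svc: InMemoryTypeService, method: str, path: str,
             query: Optional[Dict[str, str]] = None, body: Optional[dict] = None):
    """Route one REST call to the corresponding API method."""
    by_id = re.fullmatch(r"/types/([^/]+)", path)  # "/types/" alone does not match
    if method == "GET" and by_id:
        return svc.get_type_by_id(by_id.group(1))
    if method == "GET" and path == "/types":
        return svc.search_type(query or {})
    if method == "POST" and path == "/types/":
        return svc.create_type(body)
    if method == "PUT" and path == "/types/":
        return svc.update_type(body)
    if method == "DELETE" and by_id:
        return svc.delete_type(by_id.group(1))
    raise ValueError(f"unsupported route: {method} {path}")
```

Keeping one method per REST verb makes the Java API and the REST resource two views of the same contract.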

Cassandra Data Modeling
(an idealized view)

Cassandra Data Modeling – Access Patterns
• Q1 View hotels near POI
• Q2 View hotel info
• Q3 Show POIs near hotel
• Q4 Shop for rooms at hotel
• Q5 View room details
• Book a room
• Q6 View reservation by confirmation number
• Q7 View hotel reservations for a date
• Q8 Find reservation by guest name
• Q9 View guest details

Cassandra Data Modeling – Chebotko Diagrams
(K = partition key column, C↑ = clustering column, ascending order)
• Q1 → hotels_by_poi: poi_name K, hotel_id C↑, name, phone, address
• Q2 → hotels: hotel_id K, name, phone, address
• Q3 → pois_by_hotel: hotel_id K, poi_name C↑, description
• Q4 → available_rooms_by_hotel_date: hotel_id K, date C↑, room_number C↑, is_available
• Q5 → amenities_by_room: hotel_id K, room_id K, amenity_name C↑, description
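The K and C↑ markers describe how Cassandra lays out rows: rows sharing a partition key are stored together, sorted by the clustering columns, so a query like Q4 must name the partition and then reads a contiguous slice. A minimal in-memory Python sketch of available_rooms_by_hotel_date under that model; it ignores replication, tombstones, and paging, and the class and method names are hypothetical.

```python
import bisect
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, Tuple

Key = Tuple[date, int]  # clustering key: (date, room_number)


class AvailableRoomsByHotelDate:
    """In-memory model of PRIMARY KEY ((hotel_id), date, room_number)."""

    def __init__(self) -> None:
        # partition key (hotel_id) -> clustering keys kept in ascending order
        self._keys: Dict[str, List[Key]] = defaultdict(list)
        # (hotel_id, clustering key) -> is_available
        self._cells: Dict[Tuple[str, Key], bool] = {}

    def insert(self, hotel_id: str, day: date, room_number: int, is_available: bool) -> None:
        """Writes are upserts: a row with the same primary key is overwritten."""
        key = (day, room_number)
        if (hotel_id, key) not in self._cells:
            bisect.insort(self._keys[hotel_id], key)
        self._cells[(hotel_id, key)] = is_available

    def query(self, hotel_id: str, start: date, end: date,
              only_available: bool = False) -> List[Tuple[date, int, bool]]:
        """Q4: rows of one partition with start <= date <= end, in clustering order."""
        keys = self._keys.get(hotel_id, [])
        lo = bisect.bisect_left(keys, (start,))                     # first key with date >= start
        hi = bisect.bisect_left(keys, (end + timedelta(days=1),))   # first key with date > end
        rows = [(d, r, self._cells[(hotel_id, (d, r))]) for d, r in keys[lo:hi]]
        # is_available is a regular column, so it is filtered after the slice is read
        return [row for row in rows if row[2]] if only_available else rows
```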

Cassandra Data Modeling - Physical
Keyspace hotel (K = partition key column, C↑ = clustering column, ascending order)
• hotels_by_poi: poi_name text K, hotel_id text C↑, name text, phone text, address address
• hotels: hotel_id text K, name text, phone text, address address
• pois_by_hotel: hotel_id text K, poi_name text C↑, description text
• available_rooms_by_hotel_date: hotel_id text K, date date C↑, room_number smallint C↑, is_available boolean
• amenities_by_room: hotel_id text K, room_number smallint K, amenity_name text C↑, description text
• User-defined type address: street text, city text, state_or_province text, postal_code text, country text

Cassandra Data Modeling - Schemas
CREATE KEYSPACE hotel
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TYPE hotel.address (
  street text,
  city text,
  state_or_province text,
  postal_code text,
  country text
);

CREATE TABLE hotel.hotels_by_poi (
  poi_name text,
  hotel_id text,
  name text,
  phone text,
  address frozen<address>,
  PRIMARY KEY ((poi_name), hotel_id)
) WITH CLUSTERING ORDER BY (hotel_id ASC);
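Each Chebotko table maps mechanically onto a CREATE TABLE statement: the K columns form the parenthesized partition key and the C↑ columns follow as clustering columns. A small Python sketch of that mapping, for illustration only; the function name and arguments are hypothetical, and it neither quotes identifiers nor validates CQL types.

```python
from typing import Sequence, Tuple


def create_table_cql(keyspace: str, table: str,
                     columns: Sequence[Tuple[str, str]],
                     partition_key: Sequence[str],
                     clustering: Sequence[Tuple[str, str]] = ()) -> str:
    """Render CREATE TABLE from (name, type) columns, partition key names,
    and (clustering column, "ASC" | "DESC") pairs."""
    defined = {name for name, _ in columns}
    if not partition_key:
        raise ValueError("a table needs at least one partition key column")
    for col in list(partition_key) + [c for c, _ in clustering]:
        if col not in defined:
            raise ValueError(f"key column {col!r} is not defined")
    for _, direction in clustering:
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"bad clustering order {direction!r}")

    lines = [f"CREATE TABLE {keyspace}.{table} ("]
    lines += [f"  {name} {cql_type}," for name, cql_type in columns]
    key = "(" + ", ".join(partition_key) + ")"       # K columns
    if clustering:
        key += ", " + ", ".join(c for c, _ in clustering)  # C columns, in order
    lines.append(f"  PRIMARY KEY ({key})")
    stmt = "\n".join(lines) + "\n)"
    if clustering:
        order = ", ".join(f"{c} {d}" for c, d in clustering)
        stmt += f" WITH CLUSTERING ORDER BY ({order})"
    return stmt + ";"
```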

And now…
Back to reality

Access Patterns and Denormalization
Keyspace hotel
Access patterns:
• Locate hotel by identifier
• Find hotels within X miles of point Y
• Find hotels by city, state, country
• Find hotels by postal code
• Hotels by amenity
• Find hotels by brand
• Hotels by this
• Hotels by that
• Hotels by something else
Tables: hotels_by_id, hotels_by_brand, hotels_by_postal_code, …
Metadata
Request Context
• Requestor
• Tracking ID
• Token
• Locale
[Diagram: an Incoming Request enters a Service; the Service writes Logs to the ELK Stack and publishes Events to AMQ]

Asynchronous events
Event
• Type
  – Create
  – Update
  – Delete
• Request Context
• Old entity
• New entity
Request Context
• Requestor
• Tracking ID
• Token
• Locale
Entity (old/new)
• Id
• …
Sample Inventory Event
{
  "type": "UPDATE",
  "trackingId": "0da7b794-f2c3-…",
  "requestor": "Legacy CRS",
  "newEntity": {
    "hotelId": "AZ123",
    "productId": "NSK",
    "date": "2016-05-20",
    "consumedCount": "22",
    "totalCount": "25"
  },
  "oldEntity": {
    "hotelId": "AZ123",
    "productId": "NSK",
    "date": "2016-05-20",
    "consumedCount": "20",
    "totalCount": "25"
  }
}
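A consumer of these events can work out what changed by comparing oldEntity with newEntity; in the sample, only consumedCount moves (from 20 to 22). A minimal Python sketch, assuming events arrive as JSON text shaped like the sample, and assuming (the slide does not say) that CREATE events omit oldEntity and DELETE events omit newEntity.

```python
import json
from typing import Any, Dict, Tuple

EVENT_TYPES = {"CREATE", "UPDATE", "DELETE"}


def parse_event(text: str) -> Dict[str, Any]:
    """Parse an event and check its type against the three kinds listed above."""
    event = json.loads(text)
    if event.get("type") not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event.get('type')!r}")
    return event


def changed_fields(event: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each field whose value differs to (old value, new value).
    A missing entity (assumed for CREATE/DELETE) is treated as empty."""
    old = event.get("oldEntity") or {}
    new = event.get("newEntity") or {}
    return {k: (old.get(k), new.get(k))
            for k in sorted(set(old) | set(new))
            if old.get(k) != new.get(k)}
```

Carrying both entities in the event lets a consumer compute the change without calling back into the owning service.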

Putting It Together – Diagnostics
[Diagram: an Incoming Request reaches a Service backed by a four-node Cassandra (C*) cluster; the Service sends Data History to the Data Platform, Logs to the ELK Stack, and Metrics to the Metrics Store]

Putting It Together – Long Term Storage
[Diagram: Metrics Store, ELK Stack, Data Platform, and a four-node Cassandra (C*) cluster, with Long Term Storage]

Separating Active and History Data
Rate + Inventory Data: yesterday’s data is ancient history
[Diagram: a timeline split at Now into history data and active data]
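For rate and inventory data, the split can be made on each record's date relative to today. A minimal Python sketch, assuming each record carries an ISO-8601 date string like the "date" field in the inventory event; where history records go afterwards (for example the data platform) is outside the sketch.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple

Record = Dict[str, str]


def split_active_history(records: Iterable[Record],
                         today: Optional[date] = None) -> Tuple[List[Record], List[Record]]:
    """Return (active, history): dates before today are history,
    today and later stay active."""
    today = today or date.today()
    active: List[Record] = []
    history: List[Record] = []
    for record in records:
        day = date.fromisoformat(record["date"])
        (history if day < today else active).append(record)
    return active, history
```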

History architecture (Data Platform - Cloudera)
[Diagram: History capture: Service → AMQ → Kafka → S3, with other subscribers on the stream. History retrieval: Customer Service Apps → History Service, backed by a four-node Spark cluster and Impala*]
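One question the retrieval side might answer is "what did this inventory record look like at time T?". A minimal in-memory Python sketch of that query over captured events, assuming each event is stored with a capture timestamp under a hypothetical entity key such as (hotel, product, date); it stands in for the Kafka, S3, and Spark pieces rather than modeling them.

```python
import bisect
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, Hashable, List, Optional, Tuple

Entry = Tuple[datetime, int, Dict[str, Any]]


class HistoryStore:
    """Append-only event history per entity key, ordered by capture time."""

    def __init__(self) -> None:
        # key -> sorted (timestamp, sequence, event); the sequence breaks ties,
        # keeping arrival order and ensuring event dicts are never compared
        self._events: Dict[Hashable, List[Entry]] = defaultdict(list)
        self._seq = 0

    def capture(self, key: Hashable, timestamp: datetime, event: Dict[str, Any]) -> None:
        self._seq += 1
        bisect.insort(self._events[key], (timestamp, self._seq, event))

    def state_as_of(self, key: Hashable, when: datetime) -> Optional[Dict[str, Any]]:
        """newEntity of the latest event at or before `when`; None if nothing
        was captured yet or the entity was deleted."""
        events = self._events.get(key, [])
        i = bisect.bisect_right(events, (when, float("inf")))
        if i == 0:
            return None
        return events[i - 1][2].get("newEntity")

    def changes_between(self, key: Hashable, start: datetime, end: datetime) -> List[Dict[str, Any]]:
        """All events captured with start <= timestamp <= end, oldest first."""
        return [e for ts, _, e in self._events.get(key, []) if start <= ts <= end]
```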

Microservice Data Challenges
No Joins?
Data Maintenance
Data Integrity
Cascading Deletes
Transactions

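Distributed Transactions, Anyone?
A booking must both reserve the inventory and commit the contract, and those steps span two services that each own their data.
[Diagram: a Booking Client calls the Inventory Service (which owns inventory) to reserve the inventory and the Reservation Service (which owns reservations) to commit the contract; Data Maintenance Apps and data synchronization also touch both services]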

Alternatives to Distributed Transactions
Approach                     | Example                                         | Scope
C* Lightweight Transaction   | Updating inventory counts                       | Data Tier
C* Logged Batch              | Writing to multiple denormalized hotel tables   | Data Tier
Retrying failed calls        | Data synchronization, reservation processing    | Service
Compensating transactions    | Verifying reservation processing                | System
The approaches range from strong consistency (top) to eventual consistency (bottom).
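For the booking flow, the table's approaches can be combined: a conditional (compare-and-set) update of the inventory count, retries around the call to the Reservation Service, and a compensating release of the inventory if the contract cannot be committed. This Python sketch is single-process and in-memory; compare_and_set only imitates the semantics of a Cassandra lightweight transaction such as UPDATE ... IF consumed_count = ? (the column name is hypothetical), and all other names are hypothetical too.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class InventoryCount:
    """Stand-in for one inventory row updated with conditional writes."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.consumed = 0

    def compare_and_set(self, expected: int, new: int) -> bool:
        # imitates an LWT: apply only if the current value is still `expected`
        if self.consumed != expected:
            return False
        self.consumed = new
        return True


def reserve_room(inv: InventoryCount, attempts: int = 5) -> bool:
    """Increment consumed if a room is left; re-read and retry after a lost race."""
    for _ in range(attempts):
        current = inv.consumed
        if current >= inv.total:
            return False  # sold out
        if inv.compare_and_set(current, current + 1):
            return True
    raise RuntimeError("inventory update kept conflicting")


def release_room(inv: InventoryCount, attempts: int = 5) -> None:
    """Compensating action: give the room back."""
    for _ in range(attempts):
        current = inv.consumed
        if current == 0 or inv.compare_and_set(current, current - 1):
            return
    raise RuntimeError("inventory release kept conflicting")


def retry(call: Callable[[], T], attempts: int = 3, base_delay: float = 0.0) -> T:
    """Retry a failing call with exponential backoff; re-raise after the last try."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))


def book(inv: InventoryCount, commit_contract: Callable[[], object], attempts: int = 3) -> str:
    """Reserve inventory, then commit the contract; compensate if the commit fails."""
    if not reserve_room(inv):
        return "SOLD_OUT"
    try:
        retry(commit_contract, attempts)
    except Exception:
        release_room(inv)  # compensating transaction
        return "RELEASED_AFTER_FAILURE"
    return "BOOKED"
```

Across real services the compensation can itself fail, so a system-level check such as the table's "verifying reservation processing" is likely still needed.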

Final Thoughts
Data Models > Microservices
Events = Streams
Use Metadata Everywhere

Now Available!
Cassandra: The Definitive Guide, 2nd Edition
Completely reworked for Cassandra 3.X:
• Data modeling in CQL
• SASI indexes
• Materialized views
• Lightweight transactions
• DataStax drivers
• New chapters on security, deployment, and integration

Contact Info
@choicehotels
careers.choicehotels.com
@jscarp
jeffreyscarpenter
