© 2015 GridPoint, Inc. Proprietary and Confidential 1
Managing (Schema) Migrations in Cassandra
Mitch Gitman
senior software engineer
GridPoint, Inc.
10/23/2015
migration
A word with many meanings.
disclaimer…
image © Ana Camamiel
What I mean by migrations
• Live-data migrations
One-off as opposed to ETL
What I mean by migrations
• Source-driven migrations
− Schema migrations
− Reference data migrations
− Test/sample data migrations
• CQL commands as opposed to real data (sstables), generally
source control
versioning
artifact versioning
publish
Database refactoring
Source-driven DB refactoring: the benefits
• Integration test & functional test automation (bootstrap-ability)
• CI server pipelines
• Containerization??
• Consistency & repeatability across environments
− Local developer box
− Dev environments
− Integration & QA environments
− Staging
− Production
We need tools!
• Built into web application frameworks
• Standalone
What do (perhaps) all these tools have in common?
They’re relational. They’re for SQL.
NoSQL Distilled
Chapter 12. Schema Migrations
"We have seen that developing and maintaining an application in the brave new world of schemaless databases requires careful attention to be given to schema migration."
either/or:
• RDBMS = strong schema
• NoSQL = no schema
Does Cassandra like weak schemas?
“metadata-driven documents in columnar storage”:
CREATE TABLE entities (
    doc_id int,
    attribute_name text,
    attribute_value text,
    ...
    PRIMARY KEY (doc_id, attribute_name)
);
• partition keys & clustering keys
• table-per-query denormalization
• shift from Thrift to CQL
• Thrift: super columns & super column families
• CQL: collection types
So how have teams been managing their keyspace & table definitions?
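To make the `entities` layout concrete, here is a minimal Python sketch of how rows keyed by `(doc_id, attribute_name)` fold back into per-document attribute maps. With that primary key, `doc_id` is the partition key and `attribute_name` is the clustering key, so each document's attributes sit together in one partition. The sample rows and the `rows_to_documents` helper are hypothetical and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical sample rows shaped like the entities table:
# (doc_id, attribute_name, attribute_value)
rows = [
    (1, "name", "thermostat"),
    (1, "zone", "lobby"),
    (2, "name", "meter"),
]


def rows_to_documents(rows):
    """Group attribute rows by doc_id (the partition key) into one dict per document."""
    documents = defaultdict(dict)
    for doc_id, attribute_name, attribute_value in rows:
        # attribute_name plays the role of the clustering key within a partition.
        documents[doc_id][attribute_name] = attribute_value
    return dict(documents)


documents = rows_to_documents(rows)
```

The attribute names are free-form data, which is the "weak schema" part; the table definition itself is still a fixed schema that has to be created and versioned somewhere.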
The Cassandra migration tools landscape
• Flyway: First-class Cassandra support.
− Requires JDBC.
− https://github.com/flyway/flyway/issues/823
• Pillar: Scala tool.
• mutagen-cassandra: Java tool, Astyanax driver.
• Trireme: Python tool.
• cql-migrate: Python tool.
• mschematool: Python tool.
What’s the secret behind DB migration tools?
The migrations version tracking table
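The idea of a version tracking table can be sketched in a few lines of Python. This is only an illustration of the general technique, not the behavior of any particular tool: it uses an in-memory SQLite database so that it runs anywhere, and the table name `schema_version`, the `apply_pending` function, and the sample migrations are all assumptions made for the example.

```python
import sqlite3


def apply_pending(conn, migrations):
    """Apply migrations not yet recorded in the tracking table, in version order.

    `migrations` is a list of (version, description, statement) tuples.
    Returns the versions applied by this call.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version ("
        "version INTEGER PRIMARY KEY, description TEXT NOT NULL)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    newly_applied = []
    for version, description, statement in sorted(migrations):
        if version in applied:
            continue  # already run in this database; skip it
        conn.execute(statement)
        # Record the version only after the statement succeeded.
        conn.execute(
            "INSERT INTO schema_version (version, description) VALUES (?, ?)",
            (version, description),
        )
        conn.commit()
        newly_applied.append(version)
    return newly_applied


conn = sqlite3.connect(":memory:")
migrations = [
    (2, "add email column", "ALTER TABLE users ADD COLUMN email TEXT"),
    (1, "create users", "CREATE TABLE users (id INTEGER PRIMARY KEY)"),
]
first_run = apply_pending(conn, migrations)   # applies versions 1 and 2
second_run = apply_pending(conn, migrations)  # nothing left to apply
```

Because the table records what has already run, the same set of migrations can be pointed at any environment and only the missing ones are applied.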
Migration tool philosophies
© Martha Stewart Living Omnimedia Inc. © Harpo Print, LLC
Flyway for Cassandra
• First-class Flyway
• Faked-out Flyway
migrations (in SQL)
CQL
The tradeoff
• Store the migrations tracking table in an RDBMS
Programmatically invoke Flyway
CassandraFlywayCallback
implements FlywayCallback
Two-step process
source control
artifact repository
MigrationsBuilder
FlywayMigrator
The migrations source
The input to MigrationsBuilder
Run MigrationsBuilder for CQL:
Run MigrationsBuilder for SQL:
The generated migrations
The output from MigrationsBuilder
The generated SQL script
Faking out Flyway
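The slides do not reproduce the generated script, so the following Python sketch is only a guess at one way a "faked-out" SQL migration could look: a no-op SQL file that gives Flyway something to version and track in the relational database while the real CQL is executed elsewhere. The `placeholder_sql` function, the `cql-sha256` comment, and the `SELECT 1;` body are all assumptions; only the `V<version>__<description>.sql` naming pattern is Flyway's own convention.

```python
import hashlib


def placeholder_sql(version, description, cql_text):
    """Build a (filename, body) pair for a no-op SQL migration standing in for a CQL one."""
    # Embedding a digest of the CQL means the SQL file changes whenever the CQL does.
    digest = hashlib.sha256(cql_text.encode("utf-8")).hexdigest()
    filename = f"V{version}__{description.replace(' ', '_')}.sql"
    body = (
        f"-- Placeholder for CQL migration {version}; the CQL itself runs elsewhere.\n"
        f"-- cql-sha256: {digest}\n"
        "SELECT 1;\n"
    )
    return filename, body


name, body = placeholder_sql(
    "201505270800",
    "create entities",
    "CREATE TABLE entities (doc_id int PRIMARY KEY);",
)
```

One reason to embed a digest, under this design, is that Flyway checksums SQL migration files, so an edit to an already-applied CQL migration would surface as a checksum mismatch on the placeholder.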
Run FlywayMigrator for CQL:
java -classpath /…/flyway-migrator-cassandra.jar com.gridpoint.tools.migrator.flyway.FlywayMigrator cassandra
Run FlywayMigrator for SQL:
java -classpath /…/flyway-migrator-postgresql.jar com.gridpoint.tools.migrator.flyway.FlywayMigrator postgresql
flyway-migrator-postgresql.jar
flyway-migrator-cassandra.jar
The migrations version tracking table
The Cassandra incarnation
Best practices
• Variations on versions
− Version control: f94c7d7f8b130df360a4e9e4f586eafc618ddc50
− Artifact repository: 3.5.1
− Migration tool: 201505270800 or 10 or whatever you want
− Effective contract versions—multiple versions can coexist at runtime
• Consistent deployment across environments
• Failure handling
• Baselining
• Rollbacks?
• Check schema agreement
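On the "variations on versions" point, a migration tool has to order versions such as `10` and `201505270800` numerically rather than as strings. A minimal Python sketch of that ordering follows; it assumes Flyway-style file names (`V<version>__<description>`), and the `.cql` extension, the `version_key` helper, and the file names are hypothetical choices for the example.

```python
import re

# Flyway-style name: V<version>__<description>.<ext>; version parts may be
# separated by dots or underscores. Accepting ".cql" is an assumption here.
MIGRATION_NAME = re.compile(
    r"^V(?P<version>\d+(?:[._]\d+)*)__(?P<description>.+)\.(?:sql|cql)$"
)


def version_key(filename):
    """Return the version as a tuple of ints so that 10 sorts after 2."""
    match = MIGRATION_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a versioned migration file: {filename}")
    return tuple(int(part) for part in re.split(r"[._]", match.group("version")))


files = [
    "V10__add_index.cql",
    "V2__create_tables.cql",
    "V201505270800__add_column.cql",
    "V2.1__seed_reference_data.cql",
]
ordered = sorted(files, key=version_key)
```

A plain string sort would put `V10` before `V2`; comparing integer tuples avoids that, and it lets timestamp-style and counter-style versions share one ordering if a team ever mixes them.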
Schema agreement
https://datastax.github.io/java-driver/2.1.8/features/metadata/
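Conceptually, schema agreement means every node reports the same schema version; Cassandra exposes that value as `schema_version` in `system.local` and `system.peers`. The Python sketch below shows the comparison and a polling loop between migrations. It deliberately avoids any driver API: `fetch_versions` is a hypothetical callable you would back with your driver of choice, and ignoring peers that report no version is an assumption of this example. The driver documentation linked above describes the driver's own checks.

```python
import time


def schema_in_agreement(local_version, peer_versions):
    """True when the local node and every reporting peer share one schema version."""
    # Peers with no reported version (None) are ignored; that is an assumption.
    versions = {local_version, *(v for v in peer_versions if v is not None)}
    return len(versions) == 1


def wait_for_agreement(fetch_versions, timeout_s=10.0, interval_s=0.2,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_versions() until the schema agrees or the timeout elapses.

    fetch_versions is a hypothetical callable returning (local_version, peer_versions).
    """
    deadline = clock() + timeout_s
    while True:
        local, peers = fetch_versions()
        if schema_in_agreement(local, peers):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated cluster: one peer lags on the first poll, then catches up.
responses = iter([
    ("v1", ["v1", "v0"]),
    ("v1", ["v1", "v1"]),
])
agreed = wait_for_agreement(lambda: next(responses), sleep=lambda _: None)
```

Running the next migration only after this returns `True` is one way to avoid issuing DDL against a cluster that is still propagating the previous change.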
Cassandra… migrations… limitations
• Limitations of our Flyway-based solution
− You need a relational database
− Not open-sourced
• Limitations of source-driven migrations, in general
Static vs. dynamic tables
Deploy time vs. runtime
Dedicated migration application vs. part of main application
Source-driven, but…
• The orchestration is in source control
• Actual data rather than CQL commands
− Not necessarily live data
− Maybe doesn’t need to be in source control
Embracing polyglot persistence
A unified migrations solution
Takeaways
• challenging
• exciting
• routine
• boring
Thank you!
Mitch Gitman
mgitman@gridpoint.com
mgitman@nilistics.net
mgitman@gmail.com
skeletal presence @ LinkedIn

Contenu connexe

Tendances

Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with Ambari
DataWorks Summit
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
Dvir Volk
 

Tendances (20)

Application Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureApplication Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and Future
 
Cassandra & puppet, scaling data at $15 per month
Cassandra & puppet, scaling data at $15 per monthCassandra & puppet, scaling data at $15 per month
Cassandra & puppet, scaling data at $15 per month
 
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 
Couchbase 101
Couchbase 101 Couchbase 101
Couchbase 101
 
Oracle Exadata Exam Dump
Oracle Exadata Exam DumpOracle Exadata Exam Dump
Oracle Exadata Exam Dump
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Automate DBA Tasks With Ansible
Automate DBA Tasks With AnsibleAutomate DBA Tasks With Ansible
Automate DBA Tasks With Ansible
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebook
 
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patterns
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived
 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with Ambari
 
Query and audit logging in cassandra
Query and audit logging in cassandraQuery and audit logging in cassandra
Query and audit logging in cassandra
 
Getting started with MariaDB with Docker
Getting started with MariaDB with DockerGetting started with MariaDB with Docker
Getting started with MariaDB with Docker
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 

Similaire à Managing (Schema) Migrations in Cassandra

Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...
Ludovico Caldara
 
Meetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentMeetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline Development
Timothy Spann
 
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
ssuser73434e
 
The many uses of Kubernetes cross cluster migration of persistent data
The many uses of Kubernetes cross cluster migration of persistent dataThe many uses of Kubernetes cross cluster migration of persistent data
The many uses of Kubernetes cross cluster migration of persistent data
DoKC
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table Notes
Timothy Spann
 

Similaire à Managing (Schema) Migrations in Cassandra (20)

Governing Elastic IoT Cloud Systems under Uncertainties
Governing Elastic IoT Cloud Systems under UncertaintiesGoverning Elastic IoT Cloud Systems under Uncertainties
Governing Elastic IoT Cloud Systems under Uncertainties
 
Pivotal microservices spring_pcf_skillsmatter.pptx
Pivotal microservices spring_pcf_skillsmatter.pptxPivotal microservices spring_pcf_skillsmatter.pptx
Pivotal microservices spring_pcf_skillsmatter.pptx
 
To Microservices and Beyond
To Microservices and BeyondTo Microservices and Beyond
To Microservices and Beyond
 
Concevoir et déployer vos applications a base de microservices sur Cloud Foundry
Concevoir et déployer vos applications a base de microservices sur Cloud FoundryConcevoir et déployer vos applications a base de microservices sur Cloud Foundry
Concevoir et déployer vos applications a base de microservices sur Cloud Foundry
 
Plate spin migration and transformation prsesentation upload
Plate spin migration and transformation prsesentation uploadPlate spin migration and transformation prsesentation upload
Plate spin migration and transformation prsesentation upload
 
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...
 
Big Iron + Big Data = BIG DEAL! Unlock The Power of Your Mainframe Data
Big Iron + Big Data = BIG DEAL! Unlock The Power of Your Mainframe DataBig Iron + Big Data = BIG DEAL! Unlock The Power of Your Mainframe Data
Big Iron + Big Data = BIG DEAL! Unlock The Power of Your Mainframe Data
 
Designing CloudStack Clouds
Designing CloudStack CloudsDesigning CloudStack Clouds
Designing CloudStack Clouds
 
Meetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentMeetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline Development
 
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
 
Confluent Operator as Cloud-Native Kafka Operator for Kubernetes
Confluent Operator as Cloud-Native Kafka Operator for KubernetesConfluent Operator as Cloud-Native Kafka Operator for Kubernetes
Confluent Operator as Cloud-Native Kafka Operator for Kubernetes
 
Clocker, Calico and Docker
Clocker, Calico and DockerClocker, Calico and Docker
Clocker, Calico and Docker
 
Who's in your Cloud? Cloud State Monitoring
Who's in your Cloud? Cloud State MonitoringWho's in your Cloud? Cloud State Monitoring
Who's in your Cloud? Cloud State Monitoring
 
The many uses of Kubernetes cross cluster migration of persistent data
The many uses of Kubernetes cross cluster migration of persistent dataThe many uses of Kubernetes cross cluster migration of persistent data
The many uses of Kubernetes cross cluster migration of persistent data
 
The many uses of Kubernetes cross cluster migration of persistent data
The many uses of Kubernetes cross cluster migration of persistent dataThe many uses of Kubernetes cross cluster migration of persistent data
The many uses of Kubernetes cross cluster migration of persistent data
 
How Facebook's Technologies can define the future of VistA and Health IT
How Facebook's Technologies can define the future of VistA and Health ITHow Facebook's Technologies can define the future of VistA and Health IT
How Facebook's Technologies can define the future of VistA and Health IT
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table Notes
 
Microservices with kubernetes @190316
Microservices with kubernetes @190316Microservices with kubernetes @190316
Microservices with kubernetes @190316
 
Modeling the IoT with TitanDB and Cassandra
Modeling the IoT with TitanDB and CassandraModeling the IoT with TitanDB and Cassandra
Modeling the IoT with TitanDB and Cassandra
 
Removing Barriers Between Dev and Ops
Removing Barriers Between Dev and OpsRemoving Barriers Between Dev and Ops
Removing Barriers Between Dev and Ops
 

Plus de DataStax Academy

Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
DataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
DataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
DataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
DataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
DataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
DataStax Academy
 

Plus de DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

Dernier

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Managing (Schema) Migrations in Cassandra

  • 1. © 2015 GridPoint, Inc. Proprietary and Confidential 1 Managing (Schema) Migrations in Cassandra Mitch Gitman senior software engineer GridPoint, Inc.
  • 2. © 2015 GridPoint, Inc. Proprietary and Confidential 2 10/23/2015
  • 3. © 2015 GridPoint, Inc. Proprietary and Confidential 3 10/23/2015
  • 4. © 2015 GridPoint, Inc. Proprietary and Confidential 4 10/23/2015 migration A word with many meanings.
  • 5. © 2015 GridPoint, Inc. Proprietary and Confidential 5 10/23/2015 disclaimer… image © Ana Camamiel
  • 6. © 2015 GridPoint, Inc. Proprietary and Confidential 6 What I mean by migrations • Live-data migrations 10/23/2015 One-off as opposed to ETL
  • 7. © 2015 GridPoint, Inc. Proprietary and Confidential 7 What I mean by migrations • Source-driven migrations − Schema migrations − Reference data migrations − Test/sample data migrations • CQL commands as opposed to real data (sstables), generally 10/23/2015 source control versioning artifact versioning publish
  • 8. © 2015 GridPoint, Inc. Proprietary and Confidential 8 Database refactoring 10/23/2015
  • 9. © 2015 GridPoint, Inc. Proprietary and Confidential 9 • Integration test & functional test automation (bootstrap-ability) • CI server pipelines • Containerization?? • Consistency & repeatability across environments − Local developer box − Dev environments − Integration & QA environments − Staging − Production Source-driven DB refactoring—the benefits 10/23/2015
  • 10. © 2015 GridPoint, Inc. Proprietary and Confidential 10 We need tools! • Built into web application frameworks • Standalone 10/23/2015
  • 11. © 2015 GridPoint, Inc. Proprietary and Confidential 11 What do (perhaps) all these tools have in common? 10/23/2015 They’re relational. They’re for SQL.
  • 12. © 2015 GridPoint, Inc. Proprietary and Confidential 12 NoSQL Distilled 10/23/2015 Chapter 12. Schema Migrations "We have seen that developing and maintaining an application in the brave new world of schemaless databases requires careful attention to be given to schema migration." either/or: • RDBMS = strong schema • NoSQL = no schema
  • 13. © 2015 GridPoint, Inc. Proprietary and Confidential 13 10/23/2015 CREATE TABLE entities ( doc_id int, attribute_name String, attribute_value String, ... PRIMARY KEY(doc_id, attribute_name) ); • partition keys & clustering keys • table-per-query denormalization • shift from Thrift to CQL • Thrift: super columns & super column families • CQL: collection types “metadata-driven documents in columnar storage:” Does Cassandra like weak schemas? So how have teams been managing their keyspace & table definitions?
  • 14. © 2015 GridPoint, Inc. Proprietary and Confidential 14 The Cassandra migration tools landscape 10/23/2015 • Flyway: First-class Cassandra support. − Requires JDBC. − https://github.com/flyway/flyway/issues/823 • Pillar: Scala tool. • mutagen-cassandra: Java tool, Astyanax driver. • Trireme: Python tool. • cql-migrate: Python tool. • mschematool: Python tool.
  • 15. © 2015 GridPoint, Inc. Proprietary and Confidential 15 What’s the secret behind DB migration tools? 10/23/2015 The migrations version tracking table
  • 16. © 2015 GridPoint, Inc. Proprietary and Confidential 16 Migration tool philosophies 10/23/2015 © Martha Stewart Living Omnimedia Inc. © Harpo Print, LLC
  • 17. © 2015 GridPoint, Inc. Proprietary and Confidential 17 Flyway for Cassandra 10/23/2015 • First-class Flyway• Faked-out Flyway migrations (in SQL) CQL
  • 18. © 2015 GridPoint, Inc. Proprietary and Confidential 18 The tradeoff 10/23/2015 • Store the migrations tracking table in an RDBMS
  • 19. © 2015 GridPoint, Inc. Proprietary and Confidential 19 Programmatically invoke Flyway 10/23/2015
  • 20. © 2015 GridPoint, Inc. Proprietary and Confidential 20 10/23/2015
  • 21. © 2015 GridPoint, Inc. Proprietary and Confidential 21 CassandraFlywayCallback 10/23/2015 implements FlywayCallback
  • 22. © 2015 GridPoint, Inc. Proprietary and Confidential 22 Two-step process 10/23/2015 source control artifact repository MigrationsBuilder FlywayMigrator
  • 23. © 2015 GridPoint, Inc. Proprietary and Confidential 23 The migrations source 10/23/2015 The input to MigrationsBuilder
  • 24. © 2015 GridPoint, Inc. Proprietary and Confidential 24 10/23/2015 Run MigrationsBuilder for CQL: Run MigrationsBuilder for SQL:
  • 25. © 2015 GridPoint, Inc. Proprietary and Confidential 25 The generated migrations 10/23/2015 The output from MigrationsBuilder
  • 26. © 2015 GridPoint, Inc. Proprietary and Confidential 26 The generated SQL script 10/23/2015 Faking out Flyway
  • 27. © 2015 GridPoint, Inc. Proprietary and Confidential 27 10/23/2015 Run FlywayMigrator for CQL: Run FlywayMigrator for SQL: java -classpath /…/flyway-migrator-postgresql.jar com.gridpoint.tools.migrator.flyway.FlywayMigrator postgresql java -classpath /…/flyway-migrator-cassandra.jar com.gridpoint.tools.migrator.flyway.FlywayMigrator cassandra
  • 28. © 2015 GridPoint, Inc. Proprietary and Confidential 28 10/23/2015 flyway-migrator-postgresql.jarflyway-migrator-cassandra.jar
  • 29. The migrations version tracking table The Cassandra incarnation
  • 30. Best practices • Variations on versions − Version control: f94c7d7f8b130df360a4e9e4f586eafc618ddc50 − Artifact repository: 3.5.1 − Migration tool: 201505270800 or 10 or whatever you want − Effective contract versions—multiple versions can coexist at runtime • Consistent deployment across environments • Failure handling • Baselining • Rollbacks? • Check schema agreement
  • 31. Schema agreement https://datastax.github.io/java-driver/2.1.8/features/metadata/
  • 32. Cassandra… migrations… limitations • Limitations of our Flyway-based solution − You need a relational database − Not open-sourced • Limitations of source-driven migrations, in general
  • 33. Static vs. dynamic tables
  • 34. Deploy time vs. runtime Dedicated migration application vs. part of main application
  • 35. Source-driven, but… • The orchestration is in source control • Actual data rather than CQL commands − Not necessarily live data − Maybe doesn’t need to be in source control
  • 36. Embracing polyglot persistence A unified migrations solution
  • 37. Takeaways • challenging • exciting • routine • boring
  • 38. Thank you! Mitch Gitman  mgitman@gridpoint.com  mgitman@nilistics.net  mgitman@gmail.com  skeletal presence @ LinkedIn

Editor's notes

  1. We've had some sexy, exciting, cutting-edge topics today. This is not one of them. ... This is more the sort of routine, good-housekeeping, foundational work that can make the exciting stuff a little less exciting. I’m going to be talking about managing migrations in Cassandra and in particular schema migrations.
  2. Let me give a nod to my employer. From the web site: “GridPoint is a leader in comprehensive, data-driven energy management solutions (EMS) that leverage the power of real-time data collection, big data analytics and cloud computing to maximize energy savings, operational efficiency, capital utilization and sustainability benefits.” The company is based in Arlington, VA, with a development office in Seattle.
  3. Disclaimer… This is my perspective. Oh, the statue you see is from Bonn, Germany, according to the photographer.
  4. A live-data migration is the process that runs to take the data in one table and adapt it to another table, such that the data in the first table can eventually be retired. I’m not going to be focusing so much on live-data migrations.
  5. I’m going to be focusing instead on what I would call source-driven migrations. For schema migrations, think DDL. The migrations are stored in source control and subject to source control versioning. They may be published to an artifact repository, where artifact versioning and release versioning can be applied. I’ll be focusing in particular on schema migrations.
  6. These sorts of problems are covered in depth in this book from the Martin Fowler series that came out in 2006.
  7. I can’t speak to containerizing migrations. We haven’t explored that.
  8. A couple other established standalone tools are DBMaintain and DBDeploy, although those projects have not been active in recent years.
  9. 12.2. Schema Changes in RDBMS: Liquibase, MyBatis Migrator, DBDeploy, DBMaintain. 12.3. Schema Changes in a NoSQL Data Store: the schema needs to change frequently in response to changing business requirements | can use similar techniques as with databases with strong schemas | with schemalessness at the database level, the burden of supporting the effective schema is shifted up to the application | the application still needs to be able to (un)marshal the data
  10. With this slide, I hope you can see that I’m setting up a bit of a straw man. (A straw man with a strong man.) There was a Stack Overflow thread on schema migration tools for Cassandra (http://stackoverflow.com/questions/25286273/is-there-a-schema-versioning-tool-for-cassandra), and there was an erroneous answer I found amusing: “Cassandra is by its nature… 'schemaless.' It is a structured key-value store, so it is very different from a traditional rdbms in that regard.”

      Think about it though. With Cassandra as much as with a relational database, you pay a bitter price for getting your schema wrong. You end up defining a good number of tables.

      I have the good fortune of not having worked much with Thrift. But I know that with Thrift, you'd be in the business of manipulating the contents of messages, which obscures the database's desire to have a schema applied to it.

      With Thrift, you had super columns and super column families. With CQL, you have collections. But the collections still have to be part of a table. The things that might smack of schemalessness still come back to a schema.

      Thought experiment. Go into cqlsh and execute: describe keyspace keyspace_name. How big is that output getting? How much is it changing over time?

      At last month's Cassandra Summit, there was an interesting talk by a company called Reltio, and they described how they were using Cassandra to support "metadata-driven documents in columnar storage." So they produced a keyspace that had a generic table like this. And maybe that schema only had one or two tables. But even they acknowledged that this is an atypical use case for Cassandra.

      So how have teams been managing their keyspace and table definitions? My anecdotal experience is that whenever the question has come up, teams have usually rolled their own, especially because, on the face of it, or in the simple case, this seems like such a simple thing.
  11. Next I want to get into the tools that are out there for Cassandra migrations, and the roadblocks teams have faced trying to manage Cassandra schema migrations via Liquibase and Flyway.

      Some history. The obvious way to integrate Liquibase or Flyway with Cassandra comes back to the prospect of the DataStax Java Driver supporting JDBC. There’s this statement from the 2013 announcement of the introduction of the driver (http://www.datastax.com/dev/blog/new-datastax-drivers-a-new-face-for-cassandra): "Today, DataStax announces version 1.0.0 of a new Java Driver, designed for CQL and based on years of experience within the Cassandra community. This Java driver is a first step; an object mapping and a JDBC extension will be available soon…." Let’s keep that JDBC extension in mind.

      There was a liquibase-cassandra project that seemed to hit a wall. So some people gravitated toward Flyway.

      Then there was a GitHub issue for the Flyway project, “Cassandra support” (https://github.com/flyway/flyway/issues/823). In January someone mentions a cassandra-jdbc project that’s out there and which also seems to have hit a wall. "I … recently looked into adding support for Cassandra to Flyway, but using the existing cassandra-jdbc driver from https://code.google.com/a/apache-extras.org/p/cassandra-jdbc/ , just to see how far I could get. I found a few issues:" Proceeds to list the issues. "I disabled or stubbed out code to get past these, but gave up soon after." That same poster referenced a thread he started on the DataStax Java Driver user mailing list.

      So if we go to that thread, which is from last December (https://groups.google.com/a/lists.datastax.com/forum/#!msg/java-driver-user/kspAx0neZlI/8A59HmYc-rwJ): Subject: "Timeline for JDBC support?" "Is there any timeline for JDBC support in the DataStax Java Driver for Cassandra, please?" Alex Popescu, Sen. Product Manager @ DataStax responds: "While I cannot (yet) promise an ETA for JDBC support, what I can say is that it's on our todo list (and very close to the top)."

      I look forward to seeing how DataStax pulls off the Cassandra JDBC support, but to my mind, trying to do JDBC against Cassandra seems like, I dunno, a bit of an uphill climb. So let's set aside the prospect of first-class Cassandra support in Flyway and see what else is out there.

      Toward the end of the DataStax Java Driver mailing list thread, someone else chimes in and mentions Pillar, which is a dedicated Cassandra migrations tool written in Scala. And here’s roughly what I wrote in my own internal tool evaluation: “Before settling on (our) Flyway design for Cassandra schema migrations, I evaluated various open-source Cassandra migration tools. They’re listed below. Of them, the most promising tool was Pillar, which is implemented in Scala. The problem with Pillar vs. (Flyway) was the risk. I was afraid I’d invest time with Pillar and come up empty-handed, that it wouldn’t deliver the sort of contract I expect from Flyway.” That’s what I wrote. I’m happy we went down the road we did (if I weren’t I wouldn’t be here talking about it), but I’d still maintain that Pillar is worth checking out.

      There's mutagen-cassandra, which is a Java tool written against the Astyanax driver but which hasn't been adapted to the DataStax Java Driver.

      Then there are these three Python-based tools: Trireme, cql-migrate, mschematool.
  12. Here’s a view of a migrations table that’s responsible for several schemas in PostgreSQL, where PostgreSQL’s concept of a schema is analogous to a keyspace in Cassandra.
  13. So let’s get back to the two prominent database migration tools in the relational world. I think of Liquibase as the Martha Stewart of migration tools. It’s somewhat of a control freak. It wants to do everything itself. On the other hand, I think of Flyway as the Oprah of migration tools. It provides a framework and then gives you the space to figure things out for yourself. You see, Liquibase wants to generate the SQL from XML constructs. In the typical usage, the SQL is NOT a first-class citizen. You can define Liquibase migrations as SQL, but even then (to the best of my knowledge) you have to define it inline in the XML. With Flyway, though, SQL is a first-class citizen. You can make migrations out of straight .sql files. It’s Flyway’s lightweight, unobtrusive, extensible approach that’s going to provide the leverage for using it with Cassandra.
  14. So instead of first-class Flyway, we’re going to do faked-out Flyway. The idea is, let Flyway do what it knows, which is migrations. Let Cassandra do what it knows, which is CQL. All we need is an adapter or translator to connect the two. And one key point. When I say that Flyway knows migrations, I’m saying that Flyway knows migrations in SQL.
  15. So here’s the tradeoff. Or “the weird trick,” to use the parlance of an Internet ad. Here’s what I wrote in my own internal design doc: “The reality is that first-class Flyway support for Cassandra doesn’t really gain us anything more than our fake-Flyway solution does, especially considering that we’re fine with persisting the Flyway migrations table to PostgreSQL; once you’re embracing polyglot persistence, you’d realize that a relational database is a better fit anyway for keeping track of the migrations.” 
  16. Failure handling: If a migration produces invalid CQL, the driver throws a RuntimeException. The act of throwing a RuntimeException is the signal I need to tell the JDBC Connection to roll back the transaction. This emulates the JDBC contract where RuntimeExceptions cause the transaction in the actual migrate call to roll back. We do this with the beforeEachMigrate hook so that we have a chance to fail the migration before our dummy, token migration has a chance to run. Flyway will have succeeded with all the migrations up to that point; it will fail only with this particular migration. That preserves the expected Flyway behavior.
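The failure handling in the note above can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, not the author's Java `CassandraFlywayCallback`: SQLite stands in for the relational database that holds the tracking table, and the table name `cassandra_schema_version`, the `InvalidCql` exception, and `fake_execute_cql` are all hypothetical names chosen for the example.

```python
import sqlite3


class InvalidCql(RuntimeError):
    """Hypothetical stand-in for the runtime exception a driver raises on bad CQL."""


def fake_execute_cql(statement):
    # Hypothetical executor; a real one would send the statement to Cassandra.
    if not statement.strip().upper().startswith(("CREATE", "ALTER", "DROP")):
        raise InvalidCql("cannot execute: " + statement)


def migrate(conn, migrations, execute_cql=fake_execute_cql):
    """Apply pending (version, cql) pairs in version order; stop at the first failure."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cassandra_schema_version "
        "(version TEXT PRIMARY KEY, success INTEGER NOT NULL)"
    )
    conn.commit()
    applied = {row[0] for row in conn.execute(
        "SELECT version FROM cassandra_schema_version")}
    # Versions are assumed to be equal-length timestamps, so string order works.
    for version, cql in sorted(migrations):
        if version in applied:
            continue
        try:
            # "Before each migrate" step: run the real CQL first ...
            execute_cql(cql)
            # ... then record the version, as the token SQL migration would.
            conn.execute(
                "INSERT INTO cassandra_schema_version VALUES (?, 1)", (version,))
            conn.commit()
        except InvalidCql:
            conn.rollback()  # nothing from this migration is recorded
            raise
```

Because each migration commits on its own, everything before the failing migration stays recorded and everything from the failure onward stays pending, which is the behavior the note attributes to Flyway.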
  17. Our migrations follow a two-step process. At build time, we produce an artifact that gets published to an artifact repository. That’s the work of a proprietary class called MigrationsBuilder. At runtime, we have another custom class called FlywayMigrator that runs the published migrations against the target database. In the simple case with Flyway, there’s only a single step, the deploy-time step, even if that might be executed at build time, or to be precise, by a build tool like Maven or Gradle. It’s worth noting that we use the same two-step process, with the same classes, just the same way if the destination database is PostgreSQL.
  18. We have the .cql files organized into directories according to our releases.
  19. Here you can see that MigrationsBuilder is executed in a Maven build. And you can see that the execution for CQL as opposed to SQL differs only by some arguments.
  20. Here we can see the output of MigrationsBuilder. MigrationsBuilder creates .sql files in a package structure that Flyway expects. But our .cql files just show up in the root of the classpath. The generated .sql files have the same simple names as the generated .cql files, and those names have been tweaked from the names in source control to comply with Flyway conventions.
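The renaming step in the note above might look something like the following sketch. The target pattern `V<version>__<description>.sql` and the `db/migration` location are Flyway's defaults; everything else is an assumption, since the talk does not show the proprietary MigrationsBuilder or the source naming scheme. In particular, the input format `<version>-<words>.cql` and the function names are hypothetical.

```python
import re


def flyway_name(source_name):
    """Map a hypothetical '<version>-<words>.cql' name to Flyway's 'V<version>__<words>' stem."""
    match = re.fullmatch(r"(\d+)[-_](.+)\.cql", source_name)
    if match is None:
        raise ValueError("unexpected migration file name: " + source_name)
    version, description = match.groups()
    # Collapse anything that is not a letter or digit into single underscores.
    description = re.sub(r"[^A-Za-z0-9]+", "_", description).strip("_")
    return "V{}__{}".format(version, description)


def generated_paths(source_name):
    """Return (path of the token .sql file, path of the .cql file) on the classpath."""
    stem = flyway_name(source_name)
    # The .sql file goes where Flyway scans by default; the .cql sits at the root.
    return ("db/migration/" + stem + ".sql", stem + ".cql")
```

Keeping the two generated files on the same simple name is what lets a callback find the CQL that corresponds to the token SQL migration Flyway is about to run.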
  21. The generated SQL script contains the CQL script’s contents. This is the dummy, token script that the Flyway class executes with its migrate method.
  22. Now, at deploy time, when we go to execute FlywayMigrator against the destination database, you can see that the CQL and SQL invocations are quite similar.
  23. Here we see the dependencies for the standalone JAR that’s executed at deploy time. Both JARs depend on the flywayMigrator library. The Cassandra JAR has only one other dependency because it has to support only one keyspace. The PostgreSQL JAR has numerous other dependencies because it has to support multiple schemas along with some migrations and constructs that don’t fit nicely in a schema.
  24. Here you can see how the migrations version tracking table for Cassandra has been populated after a FlywayMigrator execution.
  25. Now I want to go beyond our own Cassandra migration solution and share some best practices that I’ve arrived at and that I’d recommend however you do your migrations. First, it’s worth keeping in mind the distinction between different kinds of versioning. Regarding effective contract versions, there’s a nice discussion in Chapter 12 of “NoSQL Distilled” of making two schema versions coexist in a running application. Consistent deployment across environments: You should be trying to execute your migrations the same way on a local dev box as you do in production. Or at least isolate the differences. Failure handling: This goes back to the rollback semantics I was describing in beforeEachMigrate. The Flyway contract is that every migration up to the one that failed sticks, because every migration up to that failure succeeded. Baselining: If you haven’t been doing formalized database migrations from the get-go, you can use the current state of production as the starting point for your migrations by taking the “describe keyspace” output from cqlsh and making that your initial migration, but only for installations that you want to create from scratch. And if you’ve made a lot of changes to your tables but your migrations haven’t made it to production yet, you can scrap all the history and start from your latest definitions. You get to call a mulligan. Declaring migration bankruptcy. Rollbacks: Something that Liquibase supports. Part of why Liquibase tries to be such a control freak. Flyway, on the other hand, purposely does not support rollbacks. When I first looked into Flyway, that to me was a downside. But I eventually came around to the Flyway way of thinking. You keep progressing forward, even if you’re semantically going backwards. A little like an event sourcing paradigm.
  26. The DataStax Java Driver has a nice mechanism for checking that your schema changes have propagated across the entire cluster. This snippet is taken from the DataStax Java Driver documentation.
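The snippet from the driver documentation is not reproduced in this transcript, so here is only a sketch of the underlying idea, not the driver's API. Each Cassandra node reports a `schema_version` (visible in the `system.local` and `system.peers` tables), and the cluster is in agreement when all reachable nodes report the same value. The function names and the `fetch_versions` callable are hypothetical; in practice you would rely on the driver's own mechanism.

```python
import time


def in_agreement(schema_versions):
    """True when every node reports the same schema version."""
    return len(set(schema_versions)) == 1


def wait_for_schema_agreement(fetch_versions, timeout_s=10.0, interval_s=0.2,
                              sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_versions() until the versions agree or the timeout expires.

    fetch_versions is a caller-supplied function returning one schema version
    per node, for example read from system.local and system.peers.
    """
    deadline = clock() + timeout_s
    while True:
        if in_agreement(fetch_versions()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A migration runner could call something like this after each DDL statement and refuse to proceed to the next migration when it returns False.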
  27. The graphic is showing how a source-driven migration can inevitably expand into incorporating a live-data migration as well. Maybe you’re changing a column or moving from one table to another, and in the process, you need to copy over the data. This isn’t so much a limitation. In a way, it’s a strength. Because we’re doing everything programmatically, there’s nothing stopping us from coupling a live-data migration with a source-driven migration. It’s just an extra amount of complexity to account for.
  28. Now here is an actual limitation. The two tables you see represent the same data, but with one having the data clustered in ascending order and the other with the data clustered in descending order. We need to have a time bucket to keep the partitions from growing indefinitely. In the ascending table, we’re able to incorporate the bucket into the partition key. But with the descending table, we want to be able to drop the tables entirely after a certain amount of time. So with those tables, we make the effective bucket part of the table name. The ascending table, where the bucket is part of the partition key—that we’re able to create statically in the migrations. But the descending table we have to create dynamically on the fly in the application. So it falls outside the realm of the migrations. I’m sure there’s a better solution out there; we’re living with this solution for now.
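To make the static versus dynamic distinction in the note above concrete, here is a sketch under stated assumptions. The slide's actual tables are not in this transcript, so the table names, columns, and the monthly bucket granularity below are all hypothetical. The ascending table is a fixed DDL string that can live in a migration file; the descending table's name depends on the time bucket, so its DDL has to be produced at runtime.

```python
from datetime import datetime

# Static: the bucket is part of the partition key, so one table serves all time
# and its definition can be checked in as an ordinary migration.
ASCENDING_DDL = (
    "CREATE TABLE IF NOT EXISTS readings_asc ("
    "device_id uuid, bucket text, reading_time timestamp, value double, "
    "PRIMARY KEY ((device_id, bucket), reading_time)"
    ") WITH CLUSTERING ORDER BY (reading_time ASC)"
)


def month_bucket(ts):
    """Hypothetical bucket granularity: one bucket per calendar month."""
    return ts.strftime("%Y%m")


def descending_table_ddl(ts, base_name="readings_desc"):
    """Dynamic: the bucket is part of the table name, so old tables can be dropped whole."""
    table = "{}_{}".format(base_name, month_bucket(ts))
    ddl = (
        "CREATE TABLE IF NOT EXISTS {} ("
        "device_id uuid, reading_time timestamp, value double, "
        "PRIMARY KEY (device_id, reading_time)"
        ") WITH CLUSTERING ORDER BY (reading_time DESC)"
    ).format(table)
    return table, ddl
```

Since `descending_table_ddl` runs inside the application whenever a new bucket starts, those tables never pass through the migration tool or its tracking table, which is the limitation the note describes.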
  29. Some other considerations… Making it part of the main app is what I believe a lot of teams do.
  30. Another use case is where you want to migrate not CQL but actual sstables. At this point you might consider storing the data in a filesystem like S3 or even a separate Cassandra cluster.
  31. I mentioned Chapter 12 of “NoSQL Distilled,” “Schema Migrations.” Well, Chapter 13 is “Polyglot Persistence.” And the authors proceed to state the obvious, that different databases solve different problems. Relational databases excel at enforcing the existence of relationships. They are not good at discovering relationships or pulling data from different tables into a single object. (Of course, these days some folks will say relational databases aren’t good enough at anything to justify their existence, but even then, that doesn’t necessarily mean that Cassandra is the best fit for everything either.) 13.5. Choosing the Right Technology: "Initially, the pendulum had shifted from specialty databases to a single RDBMS database which allows all types of data models to be stored, although with some abstraction. The trend is now shifting back to using the data storage that supports the implementation of solutions natively." "Encapsulating data access into services reduces the impact of data storage choices on other parts of a system." Our Flyway-based solution has the promise to be a unified migrations solution for disparate persistence stores. What you see here is the view in PostgreSQL’s pgAdmin3 GUI of our dedicated flyway schema. There are two tables, one for the Cassandra migration versions, the other for the PostgreSQL migration versions. The name of the latter is flyway_schema_version; it should really be called postgresql_schema_version. Not that I want to be encouraging persistence store proliferation, but you could see how we could create another table for another RDBMS vendor or for an entirely different type of persistence store.
  32. I hope by now you can appreciate that I’m not trying to sell you on our particular solution. I am trying to sell you on the value of source-driven schema migrations for Cassandra, and more broadly on the value of adding automation in building blocks at the right granularity. I’d initially figured this talk would be a better fit for the beginners’ track. It’s not one of the more challenging and exciting things you’ll be doing with Cassandra, but it’s doing the routine, boring things like this which I believe will eventually pay off for you and your work with Cassandra.