
Introduction to Apache Cassandra™ + What’s New in 4.0

Apache Cassandra has been a driving force behind applications that scale for over 10 years. This open-source database now powers 30% of the Fortune 100. Now is your chance to get an inside look, guided by the company responsible for 85% of the code commits. You won't want to miss this deep dive into the database that has become the power behind the moment, the force behind game-changing, scalable cloud applications. Patrick McFadin, VP Developer Relations at DataStax, goes behind the Cassandra curtain in an exclusive webinar.

View recording: https://youtu.be/z8fLn8GL5as

Explore all DataStax webinars: https://www.datastax.com/resources/webinars

Published in: Technology


  1. 1. Introduction to Apache Cassandra™ • Patrick McFadin, VP Developer Relations, DataStax • @PatrickMcFadin
  2. 2. Cloud Applications
  3. 3. You may consider one of these … or one of these
  4. 4. “The definition of insanity is doing the same thing over and over and expecting a different result” (attribution on slide: “Who knows”) • “The definition of bad engineering
  5. 5. Sharding (diagram: client, App Server, shard 1, shard 2, shard 3, shard 4)
  6. 6. Sharding (diagram: client, App Server, shards split by Customer Name: A-F, G-M, N-T, U-Z)
  7. 7. 2005 - It's broke!
  8. 8. June 29, 2007
  9. 9. Dynamo Paper (2007) • How do we build a data store that is reliable, performant, and “always on”? • Nothing new and shiny • 24 papers cited • Evolutionary. Real. Computer science. • Also the basis for Riak and Voldemort
  10. 10. BigTable (2006) • Richer data model • 1 key, lots of values • Fast sequential access • 38 papers cited
  11. 11. Cassandra (2008) • Distributed features of Dynamo • Data model and storage from BigTable • Graduated to a top-level Apache project on February 17, 2010
  12. 12. Basic Architecture
  13. 13. Node (diagram: one server)
  14. 14. Token • Each partition key hashes to a 64-bit value • Consistent hash between -2^63 and +2^63-1 • Each node owns a range of those values • A node's token is the beginning of its range, which runs up to the next node's token • Virtual nodes break these ranges down further (diagram: server, data, token, range 0 …)
  15. 15. The cluster • One server: token 0 owns range 0-100
  16. 16. The cluster • Two servers: token 0 owns range 0-50; token 51 owns range 51-100
  17. 17. The cluster • Four servers: token 0 owns range 0-25; token 26 owns 26-50; token 51 owns 51-75; token 76 owns 76-100
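
The token ranges on slides 14 to 17 can be sketched as a lookup on a sorted ring. This is a minimal illustration that assumes the simplified 0-100 ring from the slides. The node names and the `toy_token` hash are hypothetical stand-ins; real tokens are 64-bit values.

```python
import hashlib
from bisect import bisect_right

# Toy ring from slide 17: each node's token is the start of the range it owns.
# Node names are hypothetical.
RING = [(0, "node-a"), (26, "node-b"), (51, "node-c"), (76, "node-d")]


def owner(token):
    """Return the node whose range contains `token` on the 0-100 toy ring."""
    starts = [start for start, _ in RING]
    index = bisect_right(starts, token) - 1  # last range start <= token
    return RING[index][1]


def toy_token(partition_key):
    """Map a key onto 0-100. A stand-in hash for illustration only."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 101


print(owner(15))                      # partition 15 falls in range 0-25
print(owner(toy_token("some-key")))   # any key lands on exactly one owner
```

Adding a node only splits one existing range, which is why the cluster on slides 15 to 17 can grow without reshuffling every key.
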
  18. 18. Replication • DC1: RF=1 • 10.0.0.1: primary 00-25 • 10.0.0.2: primary 26-50 • 10.0.0.3: primary 51-75 • 10.0.0.4: primary 76-100
  19. 19. Replication • DC1: RF=2 • 10.0.0.1: primary 00-25, replica 76-100 • 10.0.0.2: primary 26-50, replica 00-25 • 10.0.0.3: primary 51-75, replica 26-50 • 10.0.0.4: primary 76-100, replica 51-75
  20. 20. Replication • DC1: RF=3 • 10.0.0.1: primary 00-25, replicas 76-100 and 51-75 • 10.0.0.2: primary 26-50, replicas 00-25 and 76-100 • 10.0.0.3: primary 51-75, replicas 26-50 and 00-25 • 10.0.0.4: primary 76-100, replicas 51-75 and 26-50
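
The replica tables on slides 18 to 20 follow one pattern: each range lives on its primary node and on the next RF-1 nodes around the ring. The sketch below reproduces those tables under that assumption. The function names are hypothetical, and multi-datacenter placement is not modeled.

```python
# Four-node ring from slides 18-20; RANGES[i] is the primary range of NODES[i].
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
RANGES = ["00-25", "26-50", "51-75", "76-100"]


def replicas_for(range_index, rf):
    """Nodes holding a range: the primary plus the next rf-1 nodes on the ring."""
    return [NODES[(range_index + step) % len(NODES)] for step in range(rf)]


def ranges_held_by(node_index, rf):
    """Ranges stored on a node: its primary range first, then replicated ranges."""
    return [RANGES[(node_index - step) % len(RANGES)] for step in range(rf)]


print(ranges_held_by(0, 3))  # row for 10.0.0.1 in the RF=3 table
print(replicas_for(0, 3))    # nodes that store partition 15 (range 00-25)
```
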
  21. 21. Consistency • DC1: RF=3, same ring and replica table as slide 20 • Client writes to partition 15
  22. 22. Consistency level • Consistency Level and Number of Nodes Acknowledged • One: one replica acknowledges the read or commits the write • Quorum: 51% of replicas (a majority) agree on the read or commit the write • Local Quorum: 51% of replicas in the local DC
  23. 23. Consistency • DC1: RF=3, same ring and replica table as slide 20 • Client writes to partition 15 with CL=One
  24. 24. Consistency • DC1: RF=3, same ring and replica table as slide 20 • Client writes to partition 15 with CL=One
  25. 25. Consistency • DC1: RF=3, same ring and replica table as slide 20 • Client writes to partition 15 with CL=Quorum
  26. 26. Multi-datacenter • AWS, DC1: RF=3, nodes 10.0.0.1 to 10.0.0.4 • GCP, DC2: RF=3, nodes 10.1.0.1 to 10.1.0.4 • Each DC holds ranges 00-25, 26-50, 51-75, 76-100 with the same replica layout as slide 20 • Client writes to partition 15
  27. 27. Multi-datacenter (same diagram as slide 26)
  28. 28. Multi-datacenter (same diagram as slide 26)
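
The consistency levels from slide 22 reduce to a count of replica acknowledgements. The sketch below assumes a quorum means a strict majority of replicas, and that QUORUM counts replicas across all datacenters while LOCAL_QUORUM counts only the coordinator's datacenter. `required_acks` is a hypothetical helper, not a driver API.

```python
def quorum(replication_factor):
    """Smallest strict majority of `replication_factor` replicas."""
    return replication_factor // 2 + 1


def required_acks(level, rf_by_dc, local_dc):
    """Replica acknowledgements needed before a request succeeds."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(sum(rf_by_dc.values()))  # majority across all DCs
    if level == "LOCAL_QUORUM":
        return quorum(rf_by_dc[local_dc])      # majority within one DC
    raise ValueError("unsupported consistency level: " + level)


single_dc = {"DC1": 3}
two_dcs = {"DC1": 3, "DC2": 3}
print(required_acks("QUORUM", single_dc, "DC1"))
print(required_acks("QUORUM", two_dcs, "DC1"))
print(required_acks("LOCAL_QUORUM", two_dcs, "DC1"))
```

With RF=3 in each of two datacenters, LOCAL_QUORUM needs fewer acknowledgements than QUORUM and none of them cross the link between clouds.
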
  29. 29. Data Modeling
  30. 30. Relational Data Models • 5 normal forms • Foreign keys • Joins • Employees (deptId, First, Last): (1, Edgar, Codd), (2, Raymond, Boyce) • Department (id, Dept): (1, Engineering), (2, Math)
  31. 31. (no text on this slide)
  32. 32. Relational Modeling • Create entity table • Add constraints • Index fields • Foreign key relationships
      CREATE TABLE users (
        id number(12) NOT NULL,
        firstname nvarchar2(25) NOT NULL,
        lastname nvarchar2(25) NOT NULL,
        email nvarchar2(50) NOT NULL,
        password nvarchar2(255) NOT NULL,
        created_date timestamp(6),
        PRIMARY KEY (id),
        CONSTRAINT email_uq UNIQUE (email)
      );
      -- Users by email address index
      CREATE INDEX idx_users_email ON users (email);
      CREATE TABLE videos (
        id number(12),
        userid number(12) NOT NULL,
        name nvarchar2(255),
        description nvarchar2(500),
        location nvarchar2(255),
        location_type int,
        added_date timestamp,
        CONSTRAINT users_userid_fk FOREIGN KEY (userid) REFERENCES users (id) ON DELETE CASCADE,
        PRIMARY KEY (id)
      );
  33. 33. Relational Modeling • Data • Models • Application
  34. 34. Cassandra Modeling • Data • Models • Application
  35. 35. Modeling Queries • What are your application's workflows? • How will I access the data? • Knowing your queries in advance is NOT optional • Different from an RDBMS because I can't just JOIN or create new indexes to support new queries
  36. 36. Some Application Workflows in KillrVideo • User logs into site • Show basic information about user • Show videos added by a user • Show comments posted by a user • Search for a video by tag • Show latest videos added to the site • Show comments for a video • Show ratings for a video • Show video and its details
  37. 37. Some Queries in KillrVideo to Support Workflows • Users: "User logs into site" needs "Find user by email address"; "Show basic information about user" needs "Find user by id" • Comments: "Show comments for a video" needs "Find comments by video (latest first)"; "Show comments posted by a user" needs "Find comments by user (latest first)" • Ratings: "Show ratings for a video" needs "Find ratings by video"
  38. 38. CQL vs SQL • No joins • Limited aggregations • Employees (deptId, First, Last): (1, Edgar, Codd), (2, Raymond, Boyce) • Department (id, Dept): (1, Engineering), (2, Math) • SQL join with no CQL equivalent: SELECT e.First, e.Last, d.Dept FROM Department d, Employees e WHERE 'Codd' = e.Last AND e.deptId = d.id
  39. 39. Denormalization • Combine table columns into a single view • Eliminate the need for joins • Employees (id, First, Last, Dept): (1, Edgar, Codd, Engineering), (2, Raymond, Boyce, Math) • SELECT First, Last, Dept FROM employees WHERE id = '1'
  40. 40. “Static” Table (callouts: table name, column name, column CQL type, primary key designation, partition key)
      CREATE TABLE videos (
        videoid uuid,
        userid uuid,
        name varchar,
        description varchar,
        location text,
        location_type int,
        preview_thumbnails map<text,text>,
        tags set<varchar>,
        added_date timestamp,
        PRIMARY KEY (videoid)
      );
  41. 41. Insert (callouts: table name, fields, values; the partition key is required)
      INSERT INTO videos (videoid, name, userid, description, location, location_type, preview_thumbnails, tags, added_date)
      VALUES (06049cbb-dfed-421f-b889-5f649a0de1ed,
        'The data model is dead. Long live the data model.',
        9761d3d7-7fbd-4269-9988-6cfd4e188678,
        'First in a three part series for Cassandra Data Modeling',
        'http://www.youtube.com/watch?v=px6U2n74q3g',
        1,
        {'YouTube':'http://www.youtube.com/watch?v=px6U2n74q3g'},
        {'cassandra','data model','relational','instruction'},
        '2013-05-02 12:30:29');
  42. 42. Partition keys • Consistent hash: the Murmur3 hash of the partition key gives a 64-bit token between -2^63 and +2^63-1 • 06049cbb-dfed-421f-b889-5f649a0de1ed hashes to token 7224631062609997448 • 873ff430-9c23-4e60-be5f-278ea2bb21bd hashes to token -6804302034103043898
      INSERT INTO videos (videoid, name, userid, description)
      VALUES (06049cbb-dfed-421f-b889-5f649a0de1ed, 'The data model is dead. Long live the data model.', 9761d3d7-7fbd-4269-9988-6cfd4e188678, 'First in a three part series for Cassandra Data Modeling');
      INSERT INTO videos (videoid, name, userid, description)
      VALUES (873ff430-9c23-4e60-be5f-278ea2bb21bd, 'Become a Super Modeler', 9761d3d7-7fbd-4269-9988-6cfd4e188678, 'Second in a three part series for Cassandra Data Modeling');
  43. 43. Select (callouts: fields, table name; the primary key's partition key is required)
      SELECT name, description, added_date FROM videos WHERE videoid = 06049cbb-dfed-421f-b889-5f649a0de1ed;

       name                                              | description                                              | added_date
      ---------------------------------------------------+----------------------------------------------------------+--------------------------
       The data model is dead. Long live the data model. | First in a three part series for Cassandra Data Modeling | 2013-05-02 12:30:29-0700
  44. 44. Locality • 1000 node cluster • videoid = 06049cbb-dfed-421f-b889-5f649a0de1ed
      SELECT name, description, added_date FROM videos WHERE videoid = 06049cbb-dfed-421f-b889-5f649a0de1ed;
  45. 45. No more sequences • Great for auto-creation of IDs • Guaranteed unique • Needs ACID to work. (Sorry. No sharding)
      CREATE SEQUENCE users_sequence INCREMENT BY 1 START WITH 1 NOMAXVALUE NOCYCLE CACHE 10;
      INSERT INTO user (id, firstName, LastName) VALUES (users_sequence.NEXTVAL, 'Ted', 'Codd');
  46. 46. No sequences??? • Almost impossible in a distributed system • Couple of great choices: Natural Key (unique values like email) or Surrogate Key (UUID) • UUID: Universally Unique Identifier • 128-bit number represented in character form • Easily generated on the client • Same as GUID for the MS folks • Example: 99051fe9-6a9c-46c2-b949-38ef78858dd0
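
Slide 46's point that a surrogate key is "easily generated on the client" can be shown with Python's standard `uuid` module. This is a small sketch; `new_video_id` is a hypothetical helper name, and a random (version 4) UUID is only one of the available UUID variants.

```python
import uuid


def new_video_id():
    """Create a random surrogate key on the client, with no database round trip."""
    return uuid.uuid4()


video_id = new_video_id()
text_form = str(video_id)            # 36-character form, like the slide's example
print(text_form)
print(video_id.int.bit_length() <= 128)  # a UUID is a 128-bit number
```

No coordination between clients is needed, which is what makes this workable where a central sequence is not.
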
  47. 47. “Dynamic” Table (callouts: tag is the partition key, videoid is the clustering column)
      CREATE TABLE videos_by_tag (
        tag text,
        videoid uuid,
        added_date timestamp,
        name text,
        preview_image_location text,
        tagged_date timestamp,
        PRIMARY KEY (tag, videoid)
      );
  48. 48. Primary key relationship • PRIMARY KEY (tag, videoid)
  49. 49. Primary key relationship • PRIMARY KEY (tag, videoid) • tag: partition key
  50. 50. Primary key relationship • PRIMARY KEY (tag, videoid) • tag: partition key • videoid: clustering column
  51. 51. Primary key relationship • PRIMARY KEY (tag, videoid) • Partition key value: data model • videoid: clustering column
  52. 52. Primary key relationship • PRIMARY KEY (tag, videoid) • Partition key value: data model • Clustering column values: 06049cbb-dfed-421f-b889-5f649a0de1ed, 873ff430-9c23-4e60-be5f-278ea2bb21bd, 49f64d40-7d89-4890-b910-dbf923563a33 • Dates shown: 2013-05-02 12:30:29, 2013-05-16 16:50:00, 2013-06-11 11:00:00
  53. 53. Row (diagram: Partition Key 1 with Column 1, Column 2, Column 3, Column 4)
  54. 54. Partition with Clustering (diagram: Partition Key 1 holds rows Cluster 1 to Cluster 4, each with Column 1, Column 2, Column 3, kept in Order By order)
  55. 55. Table (diagram: Partition Key 1 and Partition Key 2, each holding rows Cluster 1 to Cluster 4 with Column 1, Column 2, Column 3)
  56. 56. Keyspace (diagram: Keyspace 1 contains Table 1 and Table 2; each table holds partitions Partition Key 1 and Partition Key 2, each with rows Cluster 1 to Cluster 4 and Column 2, Column 3, Column 4)
  57. 57. Controlling Order
      CREATE TABLE raw_weather_data (
        wsid text,
        year int,
        month int,
        day int,
        hour int,
        temperature double,
        PRIMARY KEY ((wsid), year, month, day, hour)
      ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
      INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,10,-5.6);
      INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,9,-5.1);
      INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,8,-4.9);
      INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,7,-5.3);
  58. 58. Clustering Order • raw_weather_data, Order By DESC • 10010:99999, 2005, 12, 1, 10: -5.6 • 10010:99999, 2005, 12, 1, 9: -5.1 • 10010:99999, 2005, 12, 1, 8: -4.9 • 10010:99999, 2005, 12, 1, 7: -5.3
  59. 59. Clustering Order • user_videos, Order By ASC • userid 1, added_date 1, videoid 1 • userid 1, added_date 2, videoid 2 • userid 1, added_date 3, videoid 3 • userid 1, added_date 4, videoid 4 • Each row also stores name and preview_image
  60. 60. Clustering Order • user_videos, Order By DESC • userid 1, added_date 4, videoid 1 • userid 1, added_date 3, videoid 2 • userid 1, added_date 2, videoid 3 • userid 1, added_date 1, videoid 4 • Each row also stores name and preview_image
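
The effect of the clustering order on slides 57 to 60 can be modeled with a toy in-memory table. This is a sketch only: `ToyTable` is a hypothetical class, and it sorts at read time for brevity, whereas the slides describe data that is stored already sorted.

```python
from collections import defaultdict


class ToyTable:
    """Rows grouped by partition key and returned sorted by clustering key."""

    def __init__(self, descending=False):
        self.descending = descending
        self.partitions = defaultdict(dict)  # partition key -> {clustering key: value}

    def insert(self, partition_key, clustering_key, value):
        # Writing the same primary key again overwrites the earlier value.
        self.partitions[partition_key][clustering_key] = value

    def rows(self, partition_key):
        rows = self.partitions.get(partition_key, {})
        return sorted(rows.items(), reverse=self.descending)


weather = ToyTable(descending=True)
# Inserted out of order on purpose; the clustering order decides the layout.
for hour, temperature in [(7, -5.3), (10, -5.6), (8, -4.9), (9, -5.1)]:
    weather.insert("10010:99999", (2005, 12, 1, hour), temperature)

for clustering_key, temperature in weather.rows("10010:99999"):
    print(clustering_key, temperature)
```
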
  61. 61. Write Path • Client sends the INSERT to a node • On the node: Commit Log and Memtable (rows keyed by wsid, year, month, day, hour, holding Temp) • On disk (Data): SSTable, SSTable, SSTable, SSTable • Compaction
      INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,7,-5.3);
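
The write path on slide 61 names four pieces: commit log, memtable, SSTables, and compaction. The sketch below wires toy versions of them together. `ToyNode` and its flush threshold are hypothetical, everything lives in memory, and details such as commit log cleanup after a flush are left out.

```python
class ToyNode:
    """In-memory stand-in for one node's write path."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # latest value per key, held in memory
        self.sstables = []     # immutable sorted snapshots, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write for durability
        self.memtable[key] = value            # 2. apply it to the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                      # 3. spill a full memtable

    def flush(self):
        """Freeze the memtable into a sorted, immutable SSTable."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def compact(self):
        """Merge all SSTables into one, keeping the newest value per key."""
        if not self.sstables:
            return
        merged = {}
        for sstable in self.sstables:  # oldest first, so newer values win
            merged.update(sstable)
        self.sstables = [tuple(sorted(merged.items()))]


node = ToyNode(flush_threshold=2)
node.write(("10010:99999", 2005, 12, 1, 7), -5.3)
node.write(("10010:99999", 2005, 12, 1, 8), -4.9)  # second key triggers a flush
node.write(("10010:99999", 2005, 12, 1, 7), -5.0)  # newer value for hour 7
node.write(("10010:99999", 2005, 12, 1, 9), -5.1)  # triggers another flush
node.compact()
print(node.sstables)
```

Writes never modify an existing SSTable, so each write is an append plus an in-memory update. Compaction later removes the superseded values.
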
  62. 62. Storage Model - Logical View • Columns: wsid, hour, temperature • 10010:99999, 2005:12:1:10, -5.6 • 10010:99999, 2005:12:1:9, -5.1 • 10010:99999, 2005:12:1:8, -4.9 • 10010:99999, 2005:12:1:7, -5.3
      SELECT wsid, hour, temperature FROM raw_weather_data WHERE wsid='10010:99999' AND year = 2005 AND month = 12 AND day = 1;
  63. 63. Storage Model - Disk Layout • Merged, sorted and stored sequentially • Partition 10010:99999 holds 2005:12:1:10 (-5.6), 2005:12:1:9 (-5.1), 2005:12:1:8 (-4.9), 2005:12:1:7 (-5.3) • Same SELECT as slide 62
  64. 64. Storage Model - Disk Layout • Merged, sorted and stored sequentially • A new cell, 2005:12:1:11 (-4.9), joins the partition from slide 63 • Same SELECT as slide 62
  65. 65. Storage Model - Disk Layout • Merged, sorted and stored sequentially • Another new cell, 2005:12:1:12 (-5.4), joins the partition from slide 64 • Same SELECT as slide 62
  66. 66. Read Path • Client sends the SELECT to a node • On the node: Memtable (rows keyed by wsid, year, month, day, hour, holding Temp) • On disk (Data): SSTable, SSTable, SSTable
      SELECT wsid,hour,temperature FROM raw_weather_data WHERE wsid='10010:99999' AND year = 2005 AND month = 12 AND day = 1 AND hour >= 7 AND hour <= 10;
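
The read path on slide 66 consults the memtable and several SSTables for the same key. A minimal sketch of that merge follows, assuming every stored cell carries a write timestamp and the newest timestamp wins. The `read` function and the integer timestamps are illustrative, not Cassandra internals.

```python
def read(key, memtable, sstables):
    """Return the newest value for `key` across the memtable and all SSTables.

    Every source maps key -> (write_timestamp, value).
    """
    candidates = [source[key] for source in [memtable, *sstables] if key in source]
    if not candidates:
        return None
    _, value = max(candidates, key=lambda cell: cell[0])  # newest write wins
    return value


hour_7 = ("10010:99999", 2005, 12, 1, 7)
hour_8 = ("10010:99999", 2005, 12, 1, 8)

sstables = [{hour_7: (1, -5.3)}, {hour_8: (2, -4.9)}]  # older data on disk
memtable = {hour_7: (3, -5.0)}                         # newer write in memory

print(read(hour_7, memtable, sstables))
print(read(hour_8, memtable, sstables))
```
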
  67. 67. Query patterns • Range queries • “Slice” operation on disk • Partition key 10010:99999 gives locality • Single seek on disk covers 2005:12:1:10 (-5.6), 2005:12:1:9 (-5.1), 2005:12:1:8 (-4.9), 2005:12:1:7 (-5.3)
      SELECT wsid,hour,temperature FROM raw_weather_data WHERE wsid='10010:99999' AND year = 2005 AND month = 12 AND day = 1 AND hour >= 7 AND hour <= 10;
  68. 68. Query patterns • Programmers like this • Sorted by event_time • Columns: weather_station, hour, temperature • 10010:99999, 2005:12:1:10, -5.6 • 10010:99999, 2005:12:1:9, -5.1 • 10010:99999, 2005:12:1:8, -4.9 • 10010:99999, 2005:12:1:7, -5.3
      SELECT weatherstation,hour,temperature FROM temperature WHERE weatherstation_id='10010:99999' AND year = 2005 AND month = 12 AND day = 1 AND hour >= 7 AND hour <= 10;
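
The "slice" on slide 67 works because rows inside a partition are stored in sorted order, so a range query only needs the start and end positions. The sketch below assumes ascending order to keep the binary search simple, although the slides store this table descending. `slice_hours` is a hypothetical helper.

```python
from bisect import bisect_left, bisect_right

# One partition (wsid '10010:99999'), sorted by (year, month, day, hour).
PARTITION = [
    ((2005, 12, 1, 7), -5.3),
    ((2005, 12, 1, 8), -4.9),
    ((2005, 12, 1, 9), -5.1),
    ((2005, 12, 1, 10), -5.6),
    ((2005, 12, 1, 11), -4.9),
]


def slice_hours(rows, year, month, day, first_hour, last_hour):
    """Return the contiguous run of rows with first_hour <= hour <= last_hour."""
    keys = [key for key, _ in rows]
    start = bisect_left(keys, (year, month, day, first_hour))
    stop = bisect_right(keys, (year, month, day, last_hour))
    return rows[start:stop]  # one contiguous slice, no per-row filtering


for key, temperature in slice_hours(PARTITION, 2005, 12, 1, 7, 10):
    print(key, temperature)
```
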
  69. 69. What's Next?? Cassandra 4.0
  70. 70. Cassandra 4.0 • Massive Stability Release • Networking Changes: async internode communication; 20% faster streaming • Restart Conditions: gossip overhaul; nodes coordinate on restart; dead node detector • Queries: slow/large query log; stop large queries killing the cluster
  71. 71. Cassandra 4.0 • Big Features • Pluggable Storage • Audit Logging • Virtual Tables • Management Sidecar
  72. 72. ZGC and the end of GC • © DataStax, All Rights Reserved.
  73. 73. Thank You! • Follow me @PatrickMcFadin
  74. 74. May 21 - 23, 2019 • Gaylord National Resort & Convention Center, Maryland • Use discount code NEWYEAR19 for 19% off
