Cassandra Data Modelling

When the volume and variety of data grow, organizations need to rethink how data is represented and analyzed so that it can be extracted efficiently while maintaining performance and scalability. Cassandra, a NoSQL platform, is highly scalable and meets the requirement of storing massive amounts of data. It supports fast retrieval of data irrespective of the size of the stored data.

Explore these slides to understand the core concepts and to identify the query patterns needed to design a correct data model for a Cassandra cluster.

These slides cover:
1. What are Keys in Cassandra
2. Some Basic Goals of Cassandra
3. How to Model your Own Queries
4. Applying those Rules with Examples

  1. Cassandra Data Modeling. Presented by: Charmy Garg, Software Consultant, Knoldus Inc.
  2. Our Agenda: 01 Keys in Cassandra; 02 Basic Goals; 03 Model Your Own Queries; 04 Applying Rules: Examples; 05 Glance at Use Cases
  3. What is Apache Cassandra?
  4. Cassandra vs Relational (Cassandra data model ↔ relational data model): Keyspace ↔ Database; Column family ↔ Table; Partition key ↔ Primary key; Column name/key ↔ Column name; Column value ↔ Column value
  5. “Keys to Recall for Cassandra Data Modeling”: (1) Primary Key: equivalent to the partition key in a single-field-key (i.e. simple) table. (2) Composite Key: any multiple-column key. (3) Partition Key: responsible for data distribution across your nodes. (4) Clustering Key: responsible for data sorting within the partition.
  6. Primary Key
  7. Composite Key
  8. Clustering Key & Partition Key
  9. How Cassandra organizes data
  10. Partitioning and Hashing
  11. Non-Goals. Minimize Data Duplication: because Cassandra is a distributed database, data duplication provides instant data availability and no single point of failure. Minimize the Number of Writes: Cassandra is optimized for high write throughput, and almost all writes are equally efficient.
  12. Basic Goals. (1) Spread data evenly around the cluster: rows are spread around the cluster based on a hash of the partition key, which is the first element of the PRIMARY KEY. So, the key to spreading data evenly is this: pick a good primary key. (2) Minimize the number of partitions read: partitions are groups of rows that share the same partition key. When you issue a read query, you want to read rows from as few partitions as possible.
  13. Model Your Data. The way to minimize partition reads is to model your data to fit your queries. Don't model around relations. Don't model around objects. Model around your queries. Here's how you do that: Step 1: Determine what queries you want to support. Step 2: Create tables according to your queries.
  14. Step 1: Determine What Queries to Support. Try to determine exactly what queries you need to support. This can include a lot of considerations that you may not think of at first. For example, you may need to think about: ● Grouping by an attribute ● Ordering by an attribute ● Filtering based on some set of conditions ● Enforcing uniqueness in the result set. Changes to just one of these query requirements will frequently warrant a data model change for maximum efficiency.
  15. Step 2: Create Tables for Queries. Use one table per query pattern. If you need to support multiple query patterns, you usually need more than one table. If you need different types of answers, you usually need different tables. This is how you optimize for reads. Remember, in Cassandra data duplication is okay. Many of your tables may repeat the same data.
  16. Applying the Rules: Examples
  17. Example 1: Table Music Playlist. In this example, for the table Music Playlist, ● SongId is the partition key, and ● SongName is the clustering column.
  18. Example 1: Table Music Playlist. In this example, for the table Music Playlist, ● SongId and Year together form the partition key, and ● SongName is the clustering column.
  19. Glance at Use Cases
  20. Use Case 1. Suppose that we are storing Facebook posts of different users in Cassandra. Query: fetch the top ‘N’ posts made by a given user. We require user_id, post_id and content as fields. The Cassandra table schema for this use case (not reproduced in this transcript) stores all data for a particular user on a single partition, as per the above guidelines. Using the post timestamp as the clustering key helps retrieve the top ‘N’ posts more efficiently.
  21. Use Case 2. Suppose that we are storing the details of partner gyms across the cities and states of many countries. Query: fetch the gyms for a given city, sorted by their opening date. We require country_code, state, city, gym_name and opening_date as fields. The Cassandra table schema for this use case (not reproduced in this transcript) stores the gyms located in a given city of a specific state and country on a single partition and uses the opening date and gym name as the clustering key.
  22. References: Baeldung, Cassandra Data Modeling; Guru99, Data Modeling Rules in Cassandra; Simple Learn, Cassandra Data Modeling; Datastax, Cassandra Data Modeling Rules
  23. Q&A. Please email your queries to charmy.garg@knoldus.in
  24. Thank You! @charmygarg @charmygarg /facebook.com/charmiigarg
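
The first basic goal on slide 12 (spread data evenly by hashing the partition key) can be illustrated with a small Python sketch. It is a simplification under stated assumptions: MD5 with a modulo stands in for Cassandra's real partitioner and token ranges, and the names `node_for`, `rows_per_node`, and the sample keys are hypothetical.

```python
import hashlib
from collections import Counter


def node_for(partition_key: str, node_count: int) -> int:
    """Map a partition key to a node index with a stable hash.

    Assumption: MD5 plus modulo is only a stand-in for Cassandra's
    partitioner, which assigns token ranges to nodes.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return token % node_count


def rows_per_node(partition_keys, node_count):
    """Count how many rows land on each node for the given partition keys."""
    counts = Counter(node_for(key, node_count) for key in partition_keys)
    return [counts[node] for node in range(node_count)]


# Many distinct partition key values: rows spread over all four nodes.
by_user = rows_per_node([f"user-{i}" for i in range(10_000)], node_count=4)

# Only two distinct partition key values: at most two nodes hold all rows.
by_status = rows_per_node(
    ["active" if i % 2 else "inactive" for i in range(10_000)], node_count=4
)

print("keyed by user id:", by_user)
print("keyed by status: ", by_status)
```

A partition key with few distinct values concentrates rows on a few nodes no matter how many nodes the cluster has, which is why the choice of key drives distribution.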
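
The two Music Playlist variants on slides 17 and 18 differ only in how the PRIMARY KEY is declared. The Python sketch below renders the corresponding CQL `CREATE TABLE` text for both. The helper `create_table_cql`, the table name `music_playlist`, and the column types are assumptions for illustration; only the column names and their key roles come from the slides.

```python
def create_table_cql(table, columns, partition_key, clustering=()):
    """Render a CQL CREATE TABLE statement as a string.

    columns:       dict of column name -> CQL type (types here are assumed)
    partition_key: list of columns that decide which partition a row goes to
    clustering:    list of columns that sort rows inside a partition
    """
    if not partition_key:
        raise ValueError("at least one partition key column is required")
    unknown = [c for c in (*partition_key, *clustering) if c not in columns]
    if unknown:
        raise ValueError(f"key columns not declared: {unknown}")

    # A multi-column partition key needs its own pair of parentheses.
    if len(partition_key) == 1:
        pk = partition_key[0]
    else:
        pk = "(" + ", ".join(partition_key) + ")"

    primary = ", ".join([pk, *clustering])
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
    return f"CREATE TABLE {table} ({cols}, PRIMARY KEY ({primary}));"


columns = {"SongId": "int", "Year": "int", "SongName": "text"}

# Slide 17: SongId is the partition key, SongName the clustering column.
simple = create_table_cql("music_playlist", columns, ["SongId"], ["SongName"])

# Slide 18: SongId and Year form the partition key, SongName clusters.
composite = create_table_cql(
    "music_playlist", columns, ["SongId", "Year"], ["SongName"]
)

print(simple)
print(composite)
```

The inner parentheses in the second statement are what tell Cassandra that both columns together determine the partition.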
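
Use Case 1 (slide 20) keeps all of a user's posts in one partition and sorts them by post timestamp. The in-memory Python model below sketches why that layout makes a top-N read cheap. It is not Cassandra code: the class `PostsByUser`, the explicit integer `timestamp` argument, and the sample rows are hypothetical.

```python
import bisect
from collections import defaultdict


class PostsByUser:
    """Toy model: one partition per user_id, rows kept sorted by timestamp."""

    def __init__(self):
        # user_id -> list of (timestamp, post_id, content), ascending order
        self._partitions = defaultdict(list)

    def insert(self, user_id, timestamp, post_id, content):
        # insort keeps the partition ordered by the clustering value.
        bisect.insort(self._partitions[user_id], (timestamp, post_id, content))

    def top_n(self, user_id, n):
        """Return the n most recent posts of one user, newest first."""
        rows = self._partitions.get(user_id, [])
        return rows[::-1][:n]


posts = PostsByUser()
posts.insert("u1", 1, "p1", "first")
posts.insert("u1", 3, "p3", "third")
posts.insert("u1", 2, "p2", "second")
posts.insert("u2", 9, "p9", "other user")

latest = posts.top_n("u1", 2)
print(latest)
```

The read touches a single partition and takes rows from one end of an already sorted list, which mirrors the goal of reading from as few partitions as possible.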
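
Use Case 2 (slide 21) groups gyms by (country_code, state, city) and orders them by opening date and then gym name. A minimal Python sketch of that layout follows, again as an in-memory model and not as Cassandra code. The function names and all sample gyms are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# (country_code, state, city) -> list of (opening_date, gym_name), kept sorted
gyms_by_city = defaultdict(list)


def add_gym(country_code, state, city, gym_name, opening_date):
    """Store a gym in the partition for its city, ordered by the clustering key."""
    partition = (country_code, state, city)
    gyms_by_city[partition].append((opening_date, gym_name))
    gyms_by_city[partition].sort()  # opening_date first, gym_name breaks ties


def gyms_in_city(country_code, state, city):
    """Return gym names for one city, oldest opening date first."""
    rows = gyms_by_city.get((country_code, state, city), [])
    return [gym_name for _, gym_name in rows]


# Hypothetical sample data.
add_gym("US", "CA", "San Diego", "Iron Works", date(2019, 5, 1))
add_gym("US", "CA", "San Diego", "Beta Gym", date(2015, 1, 10))
add_gym("US", "CA", "San Diego", "Alpha Fit", date(2015, 1, 10))
add_gym("US", "NY", "New York", "Hudson Lift", date(2010, 3, 7))

san_diego = gyms_in_city("US", "CA", "San Diego")
print(san_diego)
```

Because the city lookup names the full partition key, the query reads one partition and returns rows already in the requested order.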