CCS334 BIG DATA ANALYTICS Session 3 Distributed models.pptx

A
Asst.Prof. M.GokilavaniProfessor à KL university
CCS334 BIG DATAANALYTICS
(R-21 III (I Sem))
Department of Artificial Intelligence and Data Science )
Session 3
by
Asst.Prof.M.Gokilavani
NIET
9/19/2023 Department of AI & DS 1
TEXT BOOKS
• Michael Minelli, Michelle Chambers, and AmbigaDhiraj, "Big Data,
Big Analytics: Emerging Business Intelligence and Analytic Trends for
Today's Businesses", Wiley, 2013.
• Eric Sammer, "Hadoop Operations", O'Reilley, 2012.
• Sadalage, Pramod J. “NoSQL distilled”, 2013.
REFERENCES
• E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive",
O'Reilley, 2012.
• Lars George, "HBase: The Definitive Guide", O'Reilley, 2011.
• Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilley, 2010.
9/19/2023 Department of AI & DS 2
Topics covered in Unit 2 session
9/19/2023 Department of AI & DS 3
UNIT II NOSQL DATA MANAGEMENT
Introduction to NoSQL – aggregate data models – key-value and
document data models – relationships – graph databases – schema
less databases – materialized views – distribution models – master-
slave replication – consistency - Cassandra – Cassandra data model
– Cassandra examples – Cassandra clients.
Distribution Models
• Depending on the distribution model the data store can give us the ability:
• To handle large Quantity of data
• To process a grater read or write traffic
• To have more availability in the case of network slowdowns of
breakages.
• There are two path for distribution:
• Sharding: Sharding distributes different data across multiple
servers, so each server acts as the single source for a subset of
data.
• Replication: Replication copies data across multiple servers,
so each bit of data can be found in multiple places.
9/19/2023 Department of AI & DS 4
Distributed Models
• Single server
• Sharding
• Master Slave Replication
• Peer-to-Peer Replication
• Combining Sharding & Replication
9/19/2023 Department of AI & DS 5
Single Server
• It is the first and simplest distribution option.
• Also if NoSQL database are designed to run on a cluster they can be
used in a single server application.
• This make sense if a NoSQL database is more suited for the
application data model.
• Graph database are the more obvious.
• If data usage is most about processing aggregates, than a key or a
document store may be useful.
9/19/2023 Department of AI & DS 6
Sharding
• Often, a data store is busy because different people are accessing
different part of the dataset.
• In this cases we can support horizontal scalability by putting different
part of the data onto different servers (Sharding)
• The concept of sharding is not new as a part of application logic.
• It consists in put all the customer with surname A-D on one shard and
E-G to another
• This complicates the programming model as the application code
needs to distributed the load across the shards.
• In the ideal setting we have each user to talk one server and the load is
balanced.
• Of course the ideal case is rare.
9/19/2023 Department of AI & DS 7
9/19/2023 Department of AI & DS 8
Sharding and NoSQL
• In general, many NoSQL databases offers auto- sharding.
• This can make much easier to use sharding in an application.
• Sharding is especially valuable for performance because it improves
read and write performances.
• It scales read and writes on the different nodes of the same cluster.
9/19/2023 Department of AI & DS 9
Master – Slave Replication
• In this setting one node is designated as the master, or primary ad the
other as slaves.
• The master is the authoritative source for the date and designed to
process updates and send them to slaves.
• The slaves are used for read operations.
• This allows us to scale in data intensive dataset.
9/19/2023 Department of AI & DS 10
9/19/2023 Department of AI & DS 11
Master – Slave Replication
• We can scale horizontally by adding more slaves.
• But, we are limited by the ability of the master in processing incoming
data.
An advantage is read resilience.
Disadvantage:
• Also if the master fails the slaves can still handle read requests.
• Anyway writes are not allowed until the master is not restored.
9/19/2023 Department of AI & DS 12
Master – Slave Replication
• Another characteristic is that a slave can be appointed as master.
• Masters can be appointed manually or automatically.
• In order to achieve resilience we need that read and write paths are
different.
• This is normally done using separate database connections.
• Disadvantage: Replication in master-slave have the analyzed
advantages but it come with the problem of inconsistency.
• The readers reading from the slaves can read data not updated.
9/19/2023 Department of AI & DS 13
Master – Slave Replication in MongoDB
9/19/2023 Department of AI & DS 14
Peer to Peer Replication
• Master-Slave replication helps with read scalability but has problems
on scalability of writes.
• Moreover, it provides resilience on read but not on writes.
• The master is still a single point of failure.
• Peer-to-Peer attacks these problems by not having a master.
• All the replica are equal (accept writes and reads).
• With a Peer-to-Peer we can have node failures without lose write
capability and losing data.
9/19/2023 Department of AI & DS 15
9/19/2023 Department of AI & DS 16
Combine Sharing
• We have multiple masters, but each data has a single master.
• Depending on the configuration we can decide the master for each
group of data.
• Peer-to-Peer and sharding is a common strategy for column-family
databases.
• This is commonly composed using replication of the shards.
9/19/2023 Department of AI & DS 17
Topics to be covered in next session 4
• Cassandra data model
9/19/2023 Department of AI & DS 18
Thank you!!!
1 sur 18

Recommandé

HBase in Practice par
HBase in Practice HBase in Practice
HBase in Practice DataWorks Summit/Hadoop Summit
5.4K vues46 diapositives
LLAP: Building Cloud First BI par
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BIDataWorks Summit
1.1K vues59 diapositives
Introduction to Redis par
Introduction to RedisIntroduction to Redis
Introduction to RedisArnab Mitra
11.1K vues31 diapositives
Using all of the high availability options in MariaDB par
Using all of the high availability options in MariaDBUsing all of the high availability options in MariaDB
Using all of the high availability options in MariaDBMariaDB plc
2.9K vues31 diapositives
MariaDB Performance Tuning and Optimization par
MariaDB Performance Tuning and OptimizationMariaDB Performance Tuning and Optimization
MariaDB Performance Tuning and OptimizationMariaDB plc
11.3K vues51 diapositives
Redis introduction par
Redis introductionRedis introduction
Redis introductionFederico Daniel Colombo Gennarelli
5.9K vues22 diapositives

Contenu connexe

Tendances

Understanding the architecture of MariaDB ColumnStore par
Understanding the architecture of MariaDB ColumnStoreUnderstanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreMariaDB plc
2.8K vues40 diapositives
Building Event Streaming Architectures on Scylla and Kafka par
Building Event Streaming Architectures on Scylla and KafkaBuilding Event Streaming Architectures on Scylla and Kafka
Building Event Streaming Architectures on Scylla and KafkaScyllaDB
868 vues56 diapositives
Introduction to redis par
Introduction to redisIntroduction to redis
Introduction to redisNexThoughts Technologies
989 vues21 diapositives
Cassandra par
CassandraCassandra
CassandraUpaang Saxena
2.9K vues28 diapositives
MariaDB Server Performance Tuning & Optimization par
MariaDB Server Performance Tuning & OptimizationMariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & OptimizationMariaDB plc
8.1K vues43 diapositives
Introduction To HBase par
Introduction To HBaseIntroduction To HBase
Introduction To HBaseAnil Gupta
87.9K vues18 diapositives

Tendances(20)

Understanding the architecture of MariaDB ColumnStore par MariaDB plc
Understanding the architecture of MariaDB ColumnStoreUnderstanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStore
MariaDB plc2.8K vues
Building Event Streaming Architectures on Scylla and Kafka par ScyllaDB
Building Event Streaming Architectures on Scylla and KafkaBuilding Event Streaming Architectures on Scylla and Kafka
Building Event Streaming Architectures on Scylla and Kafka
ScyllaDB868 vues
MariaDB Server Performance Tuning & Optimization par MariaDB plc
MariaDB Server Performance Tuning & OptimizationMariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & Optimization
MariaDB plc8.1K vues
Introduction To HBase par Anil Gupta
Introduction To HBaseIntroduction To HBase
Introduction To HBase
Anil Gupta87.9K vues
Top 5 Mistakes to Avoid When Writing Apache Spark Applications par Cloudera, Inc.
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Cloudera, Inc.127.8K vues
Understanding Data Consistency in Apache Cassandra par DataStax
Understanding Data Consistency in Apache CassandraUnderstanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache Cassandra
DataStax26.5K vues
Chapter1: NoSQL: It’s about making intelligent choices par Maynooth University
Chapter1: NoSQL: It’s about making intelligent choicesChapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choices
MySQL InnoDB Cluster - A complete High Availability solution for MySQL par Olivier DASINI
MySQL InnoDB Cluster - A complete High Availability solution for MySQLMySQL InnoDB Cluster - A complete High Availability solution for MySQL
MySQL InnoDB Cluster - A complete High Availability solution for MySQL
Olivier DASINI7.4K vues
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka par Edureka!
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
Edureka!1.5K vues
Oracle E-Business Suite R12.2.6 on Database 12c: Install, Patch and Administer par Andrejs Karpovs
Oracle E-Business Suite R12.2.6 on Database 12c: Install, Patch and AdministerOracle E-Business Suite R12.2.6 on Database 12c: Install, Patch and Administer
Oracle E-Business Suite R12.2.6 on Database 12c: Install, Patch and Administer
Andrejs Karpovs3.1K vues
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ... par Simplilearn
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn10.1K vues
Introduction to Apache Spark par Rahul Jain
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain24.8K vues
How to use WebKitGtk+ par Joone Hur
How to use WebKitGtk+How to use WebKitGtk+
How to use WebKitGtk+
Joone Hur5.6K vues
The Parquet Format and Performance Optimization Opportunities par Databricks
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
Databricks8.2K vues

Similaire à CCS334 BIG DATA ANALYTICS Session 3 Distributed models.pptx

Session 1 Introduction to NoSQL.pptx par
 Session 1 Introduction to NoSQL.pptx Session 1 Introduction to NoSQL.pptx
Session 1 Introduction to NoSQL.pptxAsst.Prof. M.Gokilavani
14 vues18 diapositives
Report 2.0.docx par
Report 2.0.docxReport 2.0.docx
Report 2.0.docxpinstechwork
7 vues6 diapositives
Introduction to NoSQL and MongoDB par
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBAhmed Farag
158 vues26 diapositives
NoSQLDatabases par
NoSQLDatabasesNoSQLDatabases
NoSQLDatabasesAdi Challa
395 vues39 diapositives
No sql database par
No sql databaseNo sql database
No sql databasevishal gupta
365 vues25 diapositives
6 Data Modeling for NoSQL 2/2 par
6 Data Modeling for NoSQL 2/26 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/2Fabio Fumarola
12.2K vues45 diapositives

Similaire à CCS334 BIG DATA ANALYTICS Session 3 Distributed models.pptx(20)

Introduction to NoSQL and MongoDB par Ahmed Farag
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDB
Ahmed Farag158 vues
6 Data Modeling for NoSQL 2/2 par Fabio Fumarola
6 Data Modeling for NoSQL 2/26 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/2
Fabio Fumarola12.2K vues
Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt... par PascalDesmarets1
Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...
Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...
A survey on data mining and analysis in hadoop and mongo db par Alexander Decker
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo db
Alexander Decker307 vues
A survey on data mining and analysis in hadoop and mongo db par Alexander Decker
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo db
Alexander Decker542 vues
Cloud Computing: The Hard Problems Never Go Away par ZendCon
Cloud Computing: The Hard Problems Never Go AwayCloud Computing: The Hard Problems Never Go Away
Cloud Computing: The Hard Problems Never Go Away
ZendCon2.4K vues
Objectivity/DB: A Multipurpose NoSQL Database par InfiniteGraph
Objectivity/DB: A Multipurpose NoSQL DatabaseObjectivity/DB: A Multipurpose NoSQL Database
Objectivity/DB: A Multipurpose NoSQL Database
InfiniteGraph5.1K vues

Plus de Asst.Prof. M.Gokilavani

AI3391 ARTIFICIAL INTELLIGENCE Assignment 1 questions.pdf par
AI3391 ARTIFICIAL INTELLIGENCE Assignment 1 questions.pdfAI3391 ARTIFICIAL INTELLIGENCE Assignment 1 questions.pdf
AI3391 ARTIFICIAL INTELLIGENCE Assignment 1 questions.pdfAsst.Prof. M.Gokilavani
11 vues16 diapositives
AI3391 ARTIFICIAL INTELLIGENCE UNIT II notes.pdf par
AI3391 ARTIFICIAL INTELLIGENCE UNIT II notes.pdfAI3391 ARTIFICIAL INTELLIGENCE UNIT II notes.pdf
AI3391 ARTIFICIAL INTELLIGENCE UNIT II notes.pdfAsst.Prof. M.Gokilavani
21 vues34 diapositives
AI3391 ARTIFICIAL INTELLIGENCE Unit I notes.pdf par
AI3391 ARTIFICIAL INTELLIGENCE Unit I notes.pdfAI3391 ARTIFICIAL INTELLIGENCE Unit I notes.pdf
AI3391 ARTIFICIAL INTELLIGENCE Unit I notes.pdfAsst.Prof. M.Gokilavani
14 vues33 diapositives
AI3391 Session 13 searching with Non-Deterministic Actions and partial observ... par
AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...
AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...Asst.Prof. M.Gokilavani
10 vues17 diapositives
AI3391 Session 12 Local search in continious space.pptx par
AI3391 Session 12 Local search in continious space.pptxAI3391 Session 12 Local search in continious space.pptx
AI3391 Session 12 Local search in continious space.pptxAsst.Prof. M.Gokilavani
7 vues16 diapositives
AI3391 Session 11 Hill climbing algorithm.pptx par
AI3391 Session 11 Hill climbing algorithm.pptxAI3391 Session 11 Hill climbing algorithm.pptx
AI3391 Session 11 Hill climbing algorithm.pptxAsst.Prof. M.Gokilavani
4 vues22 diapositives

Plus de Asst.Prof. M.Gokilavani(20)

AI3391 Session 13 searching with Non-Deterministic Actions and partial observ... par Asst.Prof. M.Gokilavani
AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...
AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...
AI3391 ARTIFICIAL INTELLIGENCE Session 8 Iterative deepening DFS and Bidirect... par Asst.Prof. M.Gokilavani
AI3391 ARTIFICIAL INTELLIGENCE Session 8 Iterative deepening DFS and Bidirect...AI3391 ARTIFICIAL INTELLIGENCE Session 8 Iterative deepening DFS and Bidirect...
AI3391 ARTIFICIAL INTELLIGENCE Session 8 Iterative deepening DFS and Bidirect...
AI3391 ARTIFICAL INTELLIGENCE Session 5 Problem Solving Agent and searching f... par Asst.Prof. M.Gokilavani
AI3391 ARTIFICAL INTELLIGENCE Session 5 Problem Solving Agent and searching f...AI3391 ARTIFICAL INTELLIGENCE Session 5 Problem Solving Agent and searching f...
AI3391 ARTIFICAL INTELLIGENCE Session 5 Problem Solving Agent and searching f...

Dernier

AWS Certified Solutions Architect Associate Exam Guide_published .pdf par
AWS Certified Solutions Architect Associate Exam Guide_published .pdfAWS Certified Solutions Architect Associate Exam Guide_published .pdf
AWS Certified Solutions Architect Associate Exam Guide_published .pdfKiran Kumar Malik
6 vues121 diapositives
02. COLEGIO - KIT SANITARIO .pdf par
02. COLEGIO - KIT SANITARIO .pdf02. COLEGIO - KIT SANITARIO .pdf
02. COLEGIO - KIT SANITARIO .pdfRAULALEJANDROMALDONA
5 vues7 diapositives
REACTJS.pdf par
REACTJS.pdfREACTJS.pdf
REACTJS.pdfArthyR3
39 vues16 diapositives
Solution Challenge Introduction.pptx par
Solution Challenge Introduction.pptxSolution Challenge Introduction.pptx
Solution Challenge Introduction.pptxGDSCCEC
13 vues16 diapositives
Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R... par
Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...
Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...IJCNCJournal
5 vues25 diapositives
DevFest 2023 Daegu Speech_이재규, Implementing easy and simple chat with gol... par
DevFest 2023 Daegu Speech_이재규,  Implementing easy and simple chat with gol...DevFest 2023 Daegu Speech_이재규,  Implementing easy and simple chat with gol...
DevFest 2023 Daegu Speech_이재규, Implementing easy and simple chat with gol...JQLEE6
8 vues31 diapositives

Dernier(20)

AWS Certified Solutions Architect Associate Exam Guide_published .pdf par Kiran Kumar Malik
AWS Certified Solutions Architect Associate Exam Guide_published .pdfAWS Certified Solutions Architect Associate Exam Guide_published .pdf
AWS Certified Solutions Architect Associate Exam Guide_published .pdf
REACTJS.pdf par ArthyR3
REACTJS.pdfREACTJS.pdf
REACTJS.pdf
ArthyR339 vues
Solution Challenge Introduction.pptx par GDSCCEC
Solution Challenge Introduction.pptxSolution Challenge Introduction.pptx
Solution Challenge Introduction.pptx
GDSCCEC13 vues
Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R... par IJCNCJournal
Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...
Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...
IJCNCJournal5 vues
DevFest 2023 Daegu Speech_이재규, Implementing easy and simple chat with gol... par JQLEE6
DevFest 2023 Daegu Speech_이재규,  Implementing easy and simple chat with gol...DevFest 2023 Daegu Speech_이재규,  Implementing easy and simple chat with gol...
DevFest 2023 Daegu Speech_이재규, Implementing easy and simple chat with gol...
JQLEE68 vues
Programmable Switches for Programmable Logic Devices par Usha Mehta
Programmable Switches for Programmable Logic DevicesProgrammable Switches for Programmable Logic Devices
Programmable Switches for Programmable Logic Devices
Usha Mehta19 vues
IRJET-Productivity Enhancement Using Method Study.pdf par SahilBavdhankar
IRJET-Productivity Enhancement Using Method Study.pdfIRJET-Productivity Enhancement Using Method Study.pdf
IRJET-Productivity Enhancement Using Method Study.pdf
SahilBavdhankar10 vues
GDSC Mikroskil Members Onboarding 2023.pdf par gdscmikroskil
GDSC Mikroskil Members Onboarding 2023.pdfGDSC Mikroskil Members Onboarding 2023.pdf
GDSC Mikroskil Members Onboarding 2023.pdf
gdscmikroskil72 vues
Programmable Logic Devices : SPLD and CPLD par Usha Mehta
Programmable Logic Devices : SPLD and CPLDProgrammable Logic Devices : SPLD and CPLD
Programmable Logic Devices : SPLD and CPLD
Usha Mehta27 vues
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf par AlhamduKure
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdfASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf
AlhamduKure10 vues
Building source code level profiler for C++.pdf par ssuser28de9e
Building source code level profiler for C++.pdfBuilding source code level profiler for C++.pdf
Building source code level profiler for C++.pdf
ssuser28de9e10 vues
Different type of computer networks .pptx par nazmul1514788
Different  type of computer networks .pptxDifferent  type of computer networks .pptx
Different type of computer networks .pptx
nazmul151478819 vues
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc... par csegroupvn
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...
csegroupvn16 vues
Ansari: Practical experiences with an LLM-based Islamic Assistant par M Waleed Kadous
Ansari: Practical experiences with an LLM-based Islamic AssistantAnsari: Practical experiences with an LLM-based Islamic Assistant
Ansari: Practical experiences with an LLM-based Islamic Assistant
M Waleed Kadous12 vues
Field Programmable Gate Arrays : Architecture par Usha Mehta
Field Programmable Gate Arrays : ArchitectureField Programmable Gate Arrays : Architecture
Field Programmable Gate Arrays : Architecture
Usha Mehta23 vues

CCS334 BIG DATA ANALYTICS Session 3 Distributed models.pptx

  • 1. CCS334 BIG DATAANALYTICS (R-21 III (I Sem)) Department of Artificial Intelligence and Data Science ) Session 3 by Asst.Prof.M.Gokilavani NIET 9/19/2023 Department of AI & DS 1
  • 2. TEXT BOOKS • Michael Minelli, Michelle Chambers, and AmbigaDhiraj, "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses", Wiley, 2013. • Eric Sammer, "Hadoop Operations", O'Reilley, 2012. • Sadalage, Pramod J. “NoSQL distilled”, 2013. REFERENCES • E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive", O'Reilley, 2012. • Lars George, "HBase: The Definitive Guide", O'Reilley, 2011. • Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilley, 2010. 9/19/2023 Department of AI & DS 2
  • 3. Topics covered in Unit 2 session 9/19/2023 Department of AI & DS 3 UNIT II NOSQL DATA MANAGEMENT Introduction to NoSQL – aggregate data models – key-value and document data models – relationships – graph databases – schema less databases – materialized views – distribution models – master- slave replication – consistency - Cassandra – Cassandra data model – Cassandra examples – Cassandra clients.
  • 4. Distribution Models • Depending on the distribution model the data store can give us the ability: • To handle large Quantity of data • To process a grater read or write traffic • To have more availability in the case of network slowdowns of breakages. • There are two path for distribution: • Sharding: Sharding distributes different data across multiple servers, so each server acts as the single source for a subset of data. • Replication: Replication copies data across multiple servers, so each bit of data can be found in multiple places. 9/19/2023 Department of AI & DS 4
  • 5. Distributed Models • Single server • Sharding • Master Slave Replication • Peer-to-Peer Replication • Combining Sharding & Replication 9/19/2023 Department of AI & DS 5
  • 6. Single Server • It is the first and simplest distribution option. • Also if NoSQL database are designed to run on a cluster they can be used in a single server application. • This make sense if a NoSQL database is more suited for the application data model. • Graph database are the more obvious. • If data usage is most about processing aggregates, than a key or a document store may be useful. 9/19/2023 Department of AI & DS 6
  • 7. Sharding • Often, a data store is busy because different people are accessing different part of the dataset. • In this cases we can support horizontal scalability by putting different part of the data onto different servers (Sharding) • The concept of sharding is not new as a part of application logic. • It consists in put all the customer with surname A-D on one shard and E-G to another • This complicates the programming model as the application code needs to distributed the load across the shards. • In the ideal setting we have each user to talk one server and the load is balanced. • Of course the ideal case is rare. 9/19/2023 Department of AI & DS 7
  • 9. Sharding and NoSQL • In general, many NoSQL databases offers auto- sharding. • This can make much easier to use sharding in an application. • Sharding is especially valuable for performance because it improves read and write performances. • It scales read and writes on the different nodes of the same cluster. 9/19/2023 Department of AI & DS 9
  • 10. Master – Slave Replication • In this setting one node is designated as the master, or primary ad the other as slaves. • The master is the authoritative source for the date and designed to process updates and send them to slaves. • The slaves are used for read operations. • This allows us to scale in data intensive dataset. 9/19/2023 Department of AI & DS 10
  • 12. Master – Slave Replication • We can scale horizontally by adding more slaves. • But, we are limited by the ability of the master in processing incoming data. An advantage is read resilience. Disadvantage: • Also if the master fails the slaves can still handle read requests. • Anyway writes are not allowed until the master is not restored. 9/19/2023 Department of AI & DS 12
  • 13. Master – Slave Replication • Another characteristic is that a slave can be appointed as master. • Masters can be appointed manually or automatically. • In order to achieve resilience we need that read and write paths are different. • This is normally done using separate database connections. • Disadvantage: Replication in master-slave have the analyzed advantages but it come with the problem of inconsistency. • The readers reading from the slaves can read data not updated. 9/19/2023 Department of AI & DS 13
  • 14. Master – Slave Replication in MongoDB 9/19/2023 Department of AI & DS 14
  • 15. Peer to Peer Replication • Master-Slave replication helps with read scalability but has problems on scalability of writes. • Moreover, it provides resilience on read but not on writes. • The master is still a single point of failure. • Peer-to-Peer attacks these problems by not having a master. • All the replica are equal (accept writes and reads). • With a Peer-to-Peer we can have node failures without lose write capability and losing data. 9/19/2023 Department of AI & DS 15
  • 17. Combine Sharing • We have multiple masters, but each data has a single master. • Depending on the configuration we can decide the master for each group of data. • Peer-to-Peer and sharding is a common strategy for column-family databases. • This is commonly composed using replication of the shards. 9/19/2023 Department of AI & DS 17
  • 18. Topics to be covered in next session 4 • Cassandra data model 9/19/2023 Department of AI & DS 18 Thank you!!!