Big Data Schema Design

Deepak
Overview
•   Schema design is vital for performance.
•   Keywords: non-relational, NoSQL, distributed
•   Underlying file systems: GFS, HDFS
•   Examples: Hadoop, GFS, HBase, Bigtable, etc.
•   Example deployments: Facebook, Walmart, etc.
When to use
• Typically used for systems with hundreds of millions or billions of rows
• Data volumes on the order of hundreds or thousands of TB
• No advanced query language needed
• Typed columns and other RDBMS features not needed
Hadoop Architecture
Hadoop Ecosystem
HBase Architecture
Overview
• HBase runs on top of HDFS
• HDFS was chosen for its fault tolerance, checksumming, and failover properties
• Clients use the native Java API or the REST API
• The Master (HMaster) manages the cluster; RegionServers manage the data
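For clients outside Java, the REST gateway exposes cells over HTTP. The sketch below is a minimal illustration, assuming an HBase REST server is running; the base URL, table `users`, row `row1`, and column `info:email` are hypothetical. It relies on the REST convention that a cell is addressed as `/<table>/<row>/<family:qualifier>` and that, when JSON is requested, row keys, column names, and values come back base64-encoded. Only `decode_cells` can be exercised without a live server, and it assumes values are UTF-8 text.

```python
import base64
import json
import urllib.parse
import urllib.request


def cell_url(base_url: str, table: str, row: str, column: str) -> str:
    """Build a REST path of the form /<table>/<row>/<family:qualifier>."""
    return "/".join([
        base_url.rstrip("/"),
        urllib.parse.quote(table, safe=""),
        urllib.parse.quote(row, safe=""),
        urllib.parse.quote(column, safe=":"),
    ])


def decode_cells(payload: str) -> dict:
    """Decode a JSON cell set into {(row, column): value}; fields are base64-encoded."""
    result = {}
    for row in json.loads(payload).get("Row", []):
        key = base64.b64decode(row["key"]).decode()
        for cell in row.get("Cell", []):
            column = base64.b64decode(cell["column"]).decode()
            result[(key, column)] = base64.b64decode(cell["$"]).decode()
    return result


def get_cell(base_url: str, table: str, row: str, column: str) -> dict:
    """Fetch one cell over HTTP (requires a running REST server)."""
    request = urllib.request.Request(
        cell_url(base_url, table, row, column),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return decode_cells(response.read().decode())


# Usage (hypothetical server and names):
# get_cell("http://localhost:8080", "users", "row1", "info:email")
```

The native Java client remains the primary interface; the REST route trades some efficiency for language independence.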
HBase Data Model
• Table: design-time namespace containing many rows
• Row: atomic key/value container, identified by a single row key
• Column Family: groups columns into physical files
• Column: a key in the k/v container inside a row
• Timestamp: a long (milliseconds); versions are sorted in descending order
• Value: a time-versioned value in the k/v container
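One way to picture the data model is as nested maps: row key → column family → column → timestamp → value. The sketch below is a toy in-memory model of that logical view only, not HBase's storage format; the class name `ToyTable` and the `max_versions` default are hypothetical. Row keys are plain strings here, whereas HBase compares raw bytes.

```python
import time
from collections import defaultdict


class ToyTable:
    """Toy model of the logical view: row -> family -> qualifier -> {timestamp: value}."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)      # column families are fixed at design time
        self.max_versions = max_versions   # number of versions kept per cell
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, value, ts=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = ts if ts is not None else int(time.time() * 1000)  # long milliseconds
        versions = self.rows[row][family].setdefault(qualifier, {})
        versions[ts] = value
        # Keep only the newest max_versions timestamps.
        for old in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old]

    def get(self, row, family, qualifier, ts=None):
        """Return the newest value, or the newest value at or before ts."""
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier, {})
        for t in sorted(versions, reverse=True):  # timestamps sorted descending
            if ts is None or t <= ts:
                return versions[t]
        return None

    def scan(self):
        """Yield rows in sorted row-key order."""
        for row in sorted(self.rows):
            yield row, self.rows[row]
```

A read without a timestamp returns the latest version; older versions remain readable until they fall outside the retained count.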
Distribution
More distribution
Thoughts on the logical view
• The unit of scalability is the region.
• Rows are not tied to a server; regions (and the rows they hold) may be moved between servers for load balancing.
• Add nodes so that no node hosts too many regions.
• Too many regions per node works against distribution.
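Because each region holds a contiguous, sorted range of row keys, finding a row's region amounts to a binary search over region start keys. The sketch below illustrates that with Python's `bisect`; the split points and server names are made up, and a real client resolves regions through the `hbase:meta` table rather than a local list.

```python
import bisect

# Hypothetical regions: region i covers [start_keys[i], start_keys[i + 1]).
# The first region starts at the empty key, so every row key maps to a region.
start_keys = ["", "g", "n", "t"]
region_servers = ["rs1", "rs2", "rs3", "rs1"]  # server currently hosting each region


def region_for(row_key: str) -> int:
    """Index of the region whose key range contains row_key."""
    return bisect.bisect_right(start_keys, row_key) - 1


def move_region(index: int, server: str) -> None:
    """Load balancing reassigns a whole region; the row-to-region mapping is unchanged."""
    region_servers[index] = server
```

Moving a region changes only which server answers for it, which is why rows are not tied to a particular server.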
Column Family
• Each column family is a physical storage unit (a directory)
• Data that is queried together should be stored together.
• Features such as compression can be enabled per column family.
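To see why grouping matters, consider a toy model in which each family is stored separately, so a read that names one family touches only that family's store. The schema (`info` for small, frequently read fields; `media` for large, rarely read blobs) and the compression labels are hypothetical, and the settings are recorded only as metadata here.

```python
from collections import defaultdict

# Hypothetical schema: per-family settings such as compression are declared
# separately for each family (stored here only as descriptive metadata).
FAMILIES = {"info": {"compression": "NONE"}, "media": {"compression": "SNAPPY"}}

# One store per column family, standing in for one directory of files per family.
stores = {name: defaultdict(dict) for name in FAMILIES}


def put(row: str, family: str, qualifier: str, value) -> None:
    stores[family][row][qualifier] = value


def get(row: str, families: list) -> tuple:
    """Read only the requested families; also report how many stores were touched."""
    cells, touched = {}, 0
    for family in families:
        touched += 1
        for qualifier, value in stores[family].get(row, {}).items():
            cells[f"{family}:{qualifier}"] = value
    return cells, touched
```

Keeping the blobs in their own family means a profile lookup never reads them.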
Bloom Filter
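• Generated when an HFile is written to disk (on flush or compaction)
• Held in main memory
• Built from the row keys in the HFile
• Column keys (CK) can be included along with the row key (RK), but that may use considerably more memory.
• Lets a read skip HFiles that cannot contain the requested key; it can only filter on what was indexed (row, or row plus column).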
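The guarantee that makes a Bloom filter useful for skipping HFiles is one-sided: it may answer "maybe present" for an absent key, but never "absent" for a present one. The sketch below is a generic Bloom filter, not HBase's implementation (which uses different hashing and sizing). `bloom_key` mimics the choice between indexing the row key only (HBase's ROW type) and the row plus column (ROWCOL); its `/` separator is hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def bloom_key(row: bytes, column: bytes = b"", row_col: bool = False) -> bytes:
    """Row-only filters index the row key; row+column filters add the column key."""
    return row + b"/" + column if row_col else row
```

Indexing row plus column answers finer-grained questions but adds one entry per cell instead of one per row, which is the memory cost noted above.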
Physical View
Key Cardinality
Tall vs Fat Tables
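• Fat (wide) tables hold a large amount of data in each row, spread over many columns.
• Tall (narrow) tables hold a large number of rows.
• Tall tables are good for searches and scans.
• Fat tables are good for fetches (gets).
• Rows don’t split: a single row is never divided across regions.
• Atomicity is guaranteed only at the row level; when data is spread across rows with compound keys, atomicity across those rows is not guaranteed.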
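As an illustration of the trade-off, the same hypothetical data (users' messages) can be laid out fat, with one row per user and one column per message, or tall, with one row per message under a compound key `user#msg_id`. Plain dicts and a sorted key list stand in for the table.

```python
import bisect

messages = [("alice", "m001", "hi"), ("alice", "m002", "lunch?"), ("bob", "m001", "yo")]

# Fat layout: one row per user, one column per message.
fat = {}
for user, msg_id, text in messages:
    fat.setdefault(user, {})[msg_id] = text

# Tall layout: one row per message, compound row key "user#msg_id".
tall = {f"{user}#{msg_id}": text for user, msg_id, text in messages}
tall_keys = sorted(tall)  # row keys are kept in sorted order


def get_fat(user):
    """Fat table: one get returns all of a user's messages (one row, so updates are atomic)."""
    return fat.get(user, {})


def scan_tall(user):
    """Tall table: a prefix scan over adjacent sorted keys returns the user's messages."""
    prefix = user + "#"
    result = {}
    for key in tall_keys[bisect.bisect_left(tall_keys, prefix):]:
        if not key.startswith(prefix):
            break
        result[key[len(prefix):]] = tall[key]
    return result
```

Both return the same data; the tall layout spreads it over many rows (and possibly regions), while the fat layout keeps it in one row that can never be split.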
Key Design
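• Sequential keys: e.g. a timestamp as the key
• With sequential keys, writes keep hotspotting on a single region.
• Salting: prefix keys with a bucket number to distribute the records
• Field promotion: move another field (e.g. a user ID) ahead of the sequential part of the key
• Random keys: e.g. hashed keys, at the cost of efficient range scans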
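The four key strategies can be compared on a batch of consecutive timestamps. The sketch below assumes a hypothetical `NUM_BUCKETS` of 4 and uses the leading two characters of each key as a rough proxy for which key range (and so, likely, which region) receives a write. The salt is derived from the key itself (`ts % NUM_BUCKETS`) so that readers can recompute it; this is one simple choice among several.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 4  # hypothetical salt bucket count, e.g. one per region


def sequential_key(ts: int) -> str:
    """Timestamp as key: every new write lands at the end of the key space."""
    return f"{ts:013d}"


def salted_key(ts: int) -> str:
    """Prefix a bucket derived from the key itself, spreading writes over NUM_BUCKETS ranges."""
    return f"{ts % NUM_BUCKETS:02d}-{ts:013d}"


def promoted_key(user_id: str, ts: int) -> str:
    """Field promotion: lead with a better-distributed field, then the timestamp."""
    return f"{user_id}-{ts:013d}"


def hashed_key(ts: int) -> str:
    """Random-looking key: an even spread, but time-range scans are lost."""
    return hashlib.md5(str(ts).encode()).hexdigest()


def salted_scan_prefixes() -> list:
    """A time-range read over salted keys needs one scan per bucket, merged client-side."""
    return [f"{b:02d}-" for b in range(NUM_BUCKETS)]


def spread(keys, prefix_len: int = 2) -> Counter:
    """Count keys per leading prefix, a rough proxy for how many key ranges receive writes."""
    return Counter(key[:prefix_len] for key in keys)
```

Salting and hashing both trade scan efficiency for write distribution; field promotion keeps each user's data in one scannable range while avoiding a single global hotspot.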
Key Design Performance
Summary
• Think twice before you decide on NoSQL technologies
• Avoid hotspots
• Store values in the appropriate places (e.g. row key, column qualifier, or cell value)
• Choose the right keys
• Store inferences (derived results) in an RDBMS if necessary
Visit us:

   Facebook: http://www.facebook.com/QBurst
        Twitter: http://twitter.com/qburst
 Google+: https://plus.google.com/+qburst/posts
LinkedIn: http://www.linkedin.com/company/qburst
YouTube: http://www.youtube.com/QBurstVideos


                www.qburst.com

Speaker notes

  1. Activity 1 - Study: Make a conscious effort to improve attention to detail everywhere. Wherever you go, look for things to recall later. When you're shopping, look for three things to study and take 15 to 20 seconds to study each object. After returning home, write down specific things about the objects: their size, shape, and color.
     Activity 2 - Recollection: People tend to get careless about things they are familiar with. Complacency, especially during routine actions, does not exercise the mind. Make a point of looking for details and noticing things as often as possible. Have you noticed the number of steps you need to climb from the ground floor to reach the 3rd and 4th floors at QBurst? (63, 85)