SlideShare une entreprise Scribd logo
1  sur  32
HBase and Hive a random walk down H street Joydeep Sen Sarma
this.toString() ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hadoop Ecosystem HDFS ZooKeeper HBase Map-Reduce Scheduler Hive Metastore Hive Pig
OMG - NoSQL looks like a DBMS! ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Why HIVE? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Quick Examples ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
HIVE: Components HDFS Hive CLI DDL Queries Browsing Map Reduce MetaStore Thrift API SerDe Thrift Jute JSON.. Execution Hive QL Parser Planner Mgmt. Web UI
Data Model Hive Entity Sample Metastore Entity Sample HDFS Location Table T /wh/T Partition date=d1 /wh/T/date=d1 Bucketing column userid /wh/T/date=d1/part-0000 … /wh/T/date=d1/part-1000 (hashed on userid) External Table extT /wh2/existing/dir (arbitrary location)
Using Hive: quick planner ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Join Processing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Group-By Processing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Common Issues ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Other goodies ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Future ,[object Object],[object Object],[object Object],[object Object]
Hive vs. .. ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive Warehouse @ Facebook ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive is just part of the story ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
HBase png s  via Lars George
Why/When use HBase? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Data Model
Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
System Picture (thx. Lars)
HBase Index Internals ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
BigTable (HBase) vs. Dynamo (Cassandra, …) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Appendix
Looks like this .. Disks Node Disks Node Disks Node Disks Node Disks Node Disks Node 1 Gigabit 4-8 Gigabit Node = DataNode  + Map-Reduce
HDFS ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
In pictures ..  NameNode Disks 32GB RAM Secondary NameNode Disks 32GB RAM DataNode DataNode DataNode DFS Client DataNode DataNode DataNode getLocations locations
Map/Reduce DataFLow
Programming with Map/Reduce ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Rubbing it in .. ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive Optimizations  – Merge Sequential Map Reduce Jobs ,[object Object],[object Object],A Map Reduce B C AB Map Reduce ABC key av bv 1 111 222 key av 1 111 key bv 1 222 key cv 1 333 key av bv cv 1 111 222 333

Contenu connexe

Tendances

Big Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemBig Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemRajkumar Singh
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopHadoop User Group
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop OverviewBrian Enochson
 
HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practicesHadoop User Group
 
Hadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and FutureHadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and FutureRyan Hennig
 
HBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraphHBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraphHBaseCon
 
Hadoop - Overview
Hadoop - OverviewHadoop - Overview
Hadoop - OverviewJay
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseHBaseCon
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseCloudera, Inc.
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreModern Data Stack France
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBaseCloudera, Inc.
 
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBaseHBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBaseMichael Stack
 
A Survey of HBase Application Archetypes
A Survey of HBase Application ArchetypesA Survey of HBase Application Archetypes
A Survey of HBase Application ArchetypesHBaseCon
 

Tendances (20)

Big Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemBig Data and Hadoop Ecosystem
Big Data and Hadoop Ecosystem
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
 
HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practices
 
מיכאל
מיכאלמיכאל
מיכאל
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
Hadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and FutureHadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and Future
 
HBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraphHBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraph
 
Hadoop - Overview
Hadoop - OverviewHadoop - Overview
Hadoop - Overview
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBase
 
Hadoop overview
Hadoop overviewHadoop overview
Hadoop overview
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScore
 
Hadoop and HBase @eBay
Hadoop and HBase @eBayHadoop and HBase @eBay
Hadoop and HBase @eBay
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBase
 
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBaseHBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
 
SQOOP - RDBMS to Hadoop
SQOOP - RDBMS to HadoopSQOOP - RDBMS to Hadoop
SQOOP - RDBMS to Hadoop
 
A Survey of HBase Application Archetypes
A Survey of HBase Application ArchetypesA Survey of HBase Application Archetypes
A Survey of HBase Application Archetypes
 
Introduction to Pig
Introduction to PigIntroduction to Pig
Introduction to Pig
 
Hadoop and Distributed Computing
Hadoop and Distributed ComputingHadoop and Distributed Computing
Hadoop and Distributed Computing
 

Similaire à Nextag talk

Hands on Hadoop and pig
Hands on Hadoop and pigHands on Hadoop and pig
Hands on Hadoop and pigSudar Muthu
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at  FacebookHadoop and Hive Development at  Facebook
Hadoop and Hive Development at FacebookS S
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
Hadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingHadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingCloudera, Inc.
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionCloudera, Inc.
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainYahoo Developer Network
 
Sf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBaseSf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBaseCloudera, Inc.
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Casesnzhang
 
Hw09 Hadoop Development At Facebook Hive And Hdfs
Hw09   Hadoop Development At Facebook  Hive And HdfsHw09   Hadoop Development At Facebook  Hive And Hdfs
Hw09 Hadoop Development At Facebook Hive And HdfsCloudera, Inc.
 
Hypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.comHypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.comEdward D. Kim
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry PerspectiveCloudera, Inc.
 
Apache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyApache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyJay Nagar
 
Hw09 Practical HBase Getting The Most From Your H Base Install
Hw09   Practical HBase  Getting The Most From Your H Base InstallHw09   Practical HBase  Getting The Most From Your H Base Install
Hw09 Practical HBase Getting The Most From Your H Base InstallCloudera, Inc.
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-servicesSreenu Musham
 

Similaire à Nextag talk (20)

Hands on Hadoop and pig
Hands on Hadoop and pigHands on Hadoop and pig
Hands on Hadoop and pig
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at  FacebookHadoop and Hive Development at  Facebook
Hadoop and Hive Development at Facebook
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Hadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingHadoop: Distributed Data Processing
Hadoop: Distributed Data Processing
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
 
Apache hadoop
Apache hadoopApache hadoop
Apache hadoop
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
 
Sf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBaseSf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBase
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Cases
 
Hw09 Hadoop Development At Facebook Hive And Hdfs
Hw09   Hadoop Development At Facebook  Hive And HdfsHw09   Hadoop Development At Facebook  Hive And Hdfs
Hw09 Hadoop Development At Facebook Hive And Hdfs
 
Hypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.comHypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.com
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
 
Apache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyApache Hadoop Big Data Technology
Apache Hadoop Big Data Technology
 
Hadoop
HadoopHadoop
Hadoop
 
Big data concepts
Big data conceptsBig data concepts
Big data concepts
 
Hw09 Practical HBase Getting The Most From Your H Base Install
Hw09   Practical HBase  Getting The Most From Your H Base InstallHw09   Practical HBase  Getting The Most From Your H Base Install
Hw09 Practical HBase Getting The Most From Your H Base Install
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-services
 

Nextag talk

  • 1. HBase and Hive a random walk down H street Joydeep Sen Sarma
  • 2.
  • 3.
  • 4. Hadoop Ecosystem HDFS ZooKeeper HBase Map-Reduce Scheduler Hive Metastore Hive Pig
  • 5.
  • 6.
  • 7.
  • 8. HIVE: Components HDFS Hive CLI DDL Queries Browsing Map Reduce MetaStore Thrift API SerDe Thrift Jute JSON.. Execution Hive QL Parser Planner Mgmt. Web UI
  • 9. Data Model Hive Entity Sample Metastore Entity Sample HDFS Location Table T /wh/T Partition date=d1 /wh/T/date=d1 Bucketing column userid /wh/T/date=d1/part-0000 … /wh/T/date=d1/part-1000 (hashed on userid) External Table extT /wh2/existing/dir (arbitrary location)
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18. HBase png s via Lars George
  • 19.
  • 20.
  • 21.
  • 23.
  • 24.
  • 26. Looks like this .. Disks Node Disks Node Disks Node Disks Node Disks Node Disks Node 1 Gigabit 4-8 Gigabit Node = DataNode + Map-Reduce
  • 27.
  • 28. In pictures .. NameNode Disks 32GB RAM Secondary NameNode Disks 32GB RAM DataNode DataNode DataNode DFS Client DataNode DataNode DataNode getLocations locations
  • 30.
  • 31.
  • 32.

Notes de l'éditeur

  1. Offline and Near-Real time data processing Not online
  2. Simple map-reduce is easy – but it can get complicated very quickly.
  3. Assume users know about Hadoop Streaming
  4. Nomenclature: Core switch and Top of Rack
  5. Compare to a standard unix file system
  6. Rack local and node local access rocks Scalability is bound by switches
  7. Point out that now we know how HDFS works – we can run maps close to data
  8. Point out how data local computing is useful in this example Exposes some of the features we need in hadoop – output of reducer can be directly sent to another reducer As an exercise to the reader – the results from the shell do not equal those from hadoop – interesting to find why.