HBase Introduction
      Scott Miao
      2012/06/25
Agenda
•   Course Credit
•   One common web site story
•   Why RDB not affordable ?
•   Big Data
•   Why use noSQL ?
•   HBase Introduction
•   Hands-on
•   noSQL architecture common practices
•   Case study

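One common web site story (1/3)
   • An RDBMS is the natural choice for the persistence tier
       • It handles transactions (ACID), enforces integrity
         constraints, offers standard SQL, and even provides
         stored procedures

   • The first time your user count grows…
       • Use an AP server cluster in which all AP servers share one DB server

   • The second time your user count grows…
       • Split the DB server into a Master-Slave architecture
          • Read data from the Slave servers
          • Write data to the Master server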
HBase: The Definitive Guide - http://www.amazon.com/HBase-Definitive-Guide-Lars-George/dp/1449396100/ref=sr_1_1?ie=UTF8&qid=1339060175&sr=8-1
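One common web site story (2/3)
• The third time your user count grows…
 • For the read bottleneck
  • Add a cache such as Memcached (an in-memory DB) between the
    server program and the DB
  • But the data in the cache and in the DB can easily become inconsistent
 • For the write bottleneck
  • Upgrade the DB server's hardware (CPU, memory, disk, etc.;
    vertical scaling)
  • Don't forget: the Slave servers' hardware has to be upgraded as well…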
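
The caching step on this slide can be sketched as a cache-aside read path. This is a minimal in-memory illustration, not Memcached itself: the `db` and `cache` dictionaries and the three helper functions are hypothetical stand-ins. It also shows one way the cache and the DB can drift apart, namely a write that skips cache invalidation.

```python
# Hypothetical stand-ins: one dict plays the database, another the cache.
db = {"u1": "Alice"}
cache = {}

def read_user(key):
    """Cache-aside read: try the cache first, fall back to the DB on a miss."""
    if key in cache:
        return cache[key]          # cache hit, no DB access
    value = db.get(key)            # cache miss, read from the DB
    if value is not None:
        cache[key] = value         # populate the cache for later reads
    return value

def write_user_db_only(key, value):
    """A write that forgets the cache: this is how inconsistency appears."""
    db[key] = value

def write_user(key, value):
    """A write that also invalidates the cached copy."""
    db[key] = value
    cache.pop(key, None)

first = read_user("u1")              # miss, loads "Alice" into the cache
write_user_db_only("u1", "Bob")      # DB now holds "Bob", cache still "Alice"
stale = read_user("u1")              # served from the cache: stale value
write_user("u1", "Carol")            # invalidates the cached entry
fresh = read_user("u1")              # miss again, reads "Carol" from the DB
print(first, stale, fresh)
```

The cache removes read load from the DB, but every write path has to remember the cache, which is why the slide flags inconsistency as the price of this step.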




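One common web site story (3/3)
• The fourth time your user count grows…
 • Use database sharding
   • Move from vertical scaling to horizontal scaling
   • The management nightmare begins
     • An RDBMS is inherently ill-suited to distributed storage (ACID, fixed schema)
     • The DBA has to define a set of sharding rules
        • When one of the DB servers goes down or runs out of storage,
          resharding has to be done by hand
        • Resharding means readjusting the sharding rules and then copying and
          migrating data with heavy I/O, while either keeping the web site in
          service or limiting the outage to a fixed time window


 • This is usually a last resort taken after the fact, and one of only a few options available
   • Who knew my web site would become this popular?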
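
To see why resharding is so I/O-heavy, consider the simplest possible sharding rule. The sketch below assumes a modulo rule over integer user ids; `shard_for` and `moved_keys` are hypothetical names, and real sharding rules may differ. It counts how many keys land on a different server when a fifth DB server is added to four.

```python
def shard_for(user_id: int, shard_count: int) -> int:
    """Hypothetical sharding rule: pick a DB server by modulo of the key."""
    return user_id % shard_count

def moved_keys(user_ids, old_count: int, new_count: int) -> int:
    """Count keys whose shard changes when the number of shards changes."""
    return sum(1 for uid in user_ids
               if shard_for(uid, old_count) != shard_for(uid, new_count))

user_ids = range(10_000)
moved = moved_keys(user_ids, old_count=4, new_count=5)
print(f"{moved} of {len(user_ids)} keys must be copied to another server")
```

With this rule, 8,000 of the 10,000 keys change servers, so adding one machine means migrating most of the data.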
Why RDB not affordable ? (1/6)
• Bottleneck of relational DBs
   • 90s vs. recent years (Web 2.0)

• Memcached + MySQL
   • Mitigates read stress effectively, but not write stress

• MySQL cluster solutions
   • Master/Slave
      • Not affordable for high-concurrency scenarios
   • Vertical Partitioning
   • Vertical/Horizontal Partitioning (database sharding)
      • Complex
      • Hard to scale out and to adapt to changing requirements
      • Low availability
• Some kinds of simple but very large data cause this condition
http://www.infoq.com/cn/news/2011/01/nosql-why
Why RDB not affordable ? (2/6) –
  A general HA system architecture design




Software Project Quality, Part 4: Overall Design, an Architecture Design Case - http://takeshi-experience.blogspot.tw/2012/04/blog-post.html
Why RDB not affordable ? (3/6) –
Master/Slave

Why RDB not affordable ? (4/6) –
Vertical Partitioning

Why RDB not affordable ? (5/6) –
Master/Slave + Vertical Partitioning

Why RDB not affordable ? (6/6) –
Vertical/Horizontal Partitioning

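• More data was generated in the past 3 years than in the preceding
  40,000 years!
• Walmart's data volume is 167 times that of the US Library of Congress!
• eBay's analytics platform processes up to 100 PB of data per day! (about
  100,000,000 GB)
• As of 2010, the world's stored electronic data amounted to 1.2 ZB!
  (1,200,000 PB)
• IDC predicts that by 2020 the world's stored electronic data will reach
  44 times the 2009 level, or 35 trillion GB!
 • 35,000,000,000,000 GB

 Architect (架构师), October issue - http://www.infoq.com/cn/minibooks/architect-oct-10-2011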
Trend Micro’s problem
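•   Each person visits about 20 ~ 60 HTML pages per day
•   Each HTML page contains about 15 ~ 30 URIs
•   Each URI object is about 10 ~ 150 KB
•   For one million users (taking the lower bound of each range)
    • 1 million X 20 = 20 million HTML pages
    • 20 million HTML pages X 15 = 300 million URIs
    • 300 million URI objects X 10 KB = 3,000,000,000 KB (about 3 TB)
• The figures above cover the Taiwan region alone

• Trend Micro is a global company
    • So the daily data volume is about tens of TB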
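
A quick way to check this estimate is to compute it with the lower bound of each range. The constant names below are illustrative, and decimal units (1 TB = 10^9 KB) are assumed.

```python
USERS = 1_000_000          # one million users
PAGES_PER_USER = 20        # lower bound of 20 ~ 60 pages per day
URIS_PER_PAGE = 15         # lower bound of 15 ~ 30 URIs per page
KB_PER_URI = 10            # lower bound of 10 ~ 150 KB per URI object

pages = USERS * PAGES_PER_USER       # HTML pages per day
uris = pages * URIS_PER_PAGE         # URI objects per day
total_kb = uris * KB_PER_URI         # data volume in KB per day
total_tb = total_kb / 1000**3        # decimal units: 1 TB = 10^9 KB

print(pages, uris, total_kb, total_tb)
```

This gives 20 million pages, 300 million URIs, and about 3 TB per day. The upper bounds (60 pages, 30 URIs, 150 KB) would give a far larger figure.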

Trend Micro's Cloud Discovery Journey (趨勢的雲端發現之旅) - http://findbook.tw/book/9789866126185/basic
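The new favorite of the big data era: NoSQL
 • Not only SQL
 • Started in 2009
 • Has the following characteristics
   •   Does not use the relational data model
   •   Distributed storage by design
   •   Easy to scale horizontally
   •   Open source
   •   Easy to extend
   •   Simple API operations (CRUD, usually without SQL support)
   •   CAP (as opposed to ACID)
       • Eventual Consistency, Availability, Partition-Tolerance
   • Stores massive, heterogeneous data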


  http://nosql-database.org/
Why use noSQL ?
• Easy to scale out
  • Unlike an RDB, there are no relationships between records, so data
    is easy to spread across machines


• High performance even with big data
  • Table-level cache (RDB) vs. record-level cache (noSQL)

• Elastic data model
  • Schema vs. schema-less/dynamic schema
• High availability
  • Easy to add new machines (nodes) without any performance
    impact
Comparison between RDB and noSQL
Given a really huge amount of data…

Aspects              RDB                        noSQL
Performance          Degrades                   Stays close to that of a small data set
Scalability          Mainly scale-up            Mainly scale-out
Reliability          ACID                       CAP
Availability         Hard to maintain SLA       Easy to maintain SLA
Security             Robust                     Depends
Economics            High-end machines          Commodity machines
Data model           Relational, fixed schema   Depends, but more likely simple and schema-less
Maturity             Very mature                Not mature, various products
Commercial support   Global companies           Small start-ups
OLAP/BI              Mature                     Immature
Human resources      Easy to find               Hard to find
noSQL basic categories





iTcloud: The New Cloud Era - http://www.ithome.com.tw/002/cloud/cloud.html
Introduction to Apache HBase
    • A top-level ASF project
    • A Key-Value type of noSQL DB
    • Derived from Google's
       • Bigtable: A Distributed Storage System for Structured Data
       • a distributed storage system for managing structured data that is
         designed to scale to a very large size: petabytes of data across
         thousands of commodity servers
       • a sparse, distributed, persistent multi-dimensional sorted map

HBase: The Definitive Guide - http://www.amazon.com/HBase-Definitive-Guide-Lars-George/dp/1449396100/ref=sr_1_1?ie=UTF8&qid=1339060175&sr=8-1
Apache HBase Concepts – Column-Oriented (1/2)


    http://ofps.oreilly.com/titles/9781449396107/intro.html
Apache HBase Concepts – Column-Oriented (2/2)
  • a sparse, distributed, persistent multi-dimensional sorted map
    • which is indexed by row key, column key (column family +
      qualifiers), and a timestamp
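
The "multi-dimensional sorted map" can be modelled in a few lines. This is a toy model for intuition only, not how HBase stores data internally; `table`, `put`, `get`, and `scan` are hypothetical names, and column keys are written as "family:qualifier" strings.

```python
from typing import Dict, Optional, Tuple

# (row key, "family:qualifier", timestamp) -> value
Cell = Tuple[str, str, int]
table: Dict[Cell, bytes] = {}

def put(row: str, column: str, ts: int, value: bytes) -> None:
    """Store one version of one cell."""
    table[(row, column, ts)] = value

def get(row: str, column: str) -> Optional[bytes]:
    """Return the newest version of a cell, or None if it was never written."""
    versions = [(ts, v) for (r, c, ts), v in table.items()
                if r == row and c == column]
    return max(versions)[1] if versions else None

def scan():
    """Yield cells ordered by row key, column key, then newest timestamp first."""
    for key in sorted(table, key=lambda k: (k[0], k[1], -k[2])):
        yield key, table[key]

put("r2", "f1:c1", 1, b"x")
put("r1", "f1:c1", 1, b"old")
put("r1", "f1:c1", 2, b"new")
put("r1", "f2:c9", 1, b"y")      # sparse: r2 simply has no f2:c9 cell

latest = get("r1", "f1:c1")
ordered = [key for key, _ in scan()]
print(latest, ordered)
```

"Sparse" here means a missing cell costs nothing, and "sorted" means a scan always returns rows in row-key order no matter the order in which they were written.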




                       Column Families




Apache HBase Concepts - Architecture


  http://ofps.oreilly.com/titles/9781449396107/architecture.html
Hands-on (1/3) –
Use your VM (Virtual Machine) to install tm-puppet

• Please refer to the SPN Dev HBase training program again
• Install git on your PC
• Install tm-puppet on your VM




Hands-on (2/3) –
Use HBase shell
• Basic operations
  • help, list, scan
• Create
  • A table 'MY_FIRST_TABLE'
  • Two column families 'FAM_1', 'FAM_2'
  • Ex.
      • create 't1', {NAME => 'f1'}, {NAME => 'f2'}
      • create 't1', 'f1', 'f2'
• Put two records (columns)
  • Ex. put 't1', 'r1', 'f1:c1', 'value'
• Update a record (column) (it is also a put)
• Delete a record (column)
  • Ex. delete 't1', 'r1', 'f1:c1'
Hands-on (3/3) –
Requirements
• Put a screenshot of your successfully installed tm-puppet into git
  • Use the following commands
      • jps
      • ifconfig
  • Capture the screen as an image
  • Path : ${git_home}/hbase-training/001/hands-on/${your_name}/hands-on-001.jpg
• Put a screenshot of your HBase shell records into git
  • Use the following commands
      • scan 'MY_TEST_TABLE'
      • ifconfig
  • Capture the screen as an image
  • Path : ${git_home}/hbase-training/001/hands-on/${your_name}/hands-on-002.jpg
• Commit and push to git
noSQL architecture practices (1/8) –
 Use noSQL as complement
  • Use noSQL as a mirror (implemented by code)
     • The RDB remains the primary store, and noSQL serves as a mirror

NoSQL Architecture Practices (1): NoSQL as a Complement - http://www.infoq.com/cn/news/2011/02/nosql-architecture-practice
noSQL architecture practices (2/8) –
Use noSQL as complement
//PSEUDO CODE for noSQL as a mirror
//We want to store the data object
bool status = false;
DB.startTransaction();             //start the RDB transaction
int id = DB.Insert(data);          //write the data object to the RDB, returns the new id
if (id > 0) {
   status = NoSQL.Add(id, data);   //mirror the data object to noSQL under the same id
}
if (id > 0 && status) {
   DB.commit();                    //both writes succeeded, commit the transaction
} else {
   DB.rollback();                  //one of the writes failed, roll back the transaction
}
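
A runnable version of the same idea might look like the sketch below. It assumes SQLite (via Python's standard `sqlite3` module) as the RDB and a dictionary-backed class as the noSQL mirror. `FlakyStore`, `save_mirrored`, and the `items` table are hypothetical names, not part of any real noSQL client.

```python
import sqlite3

class FlakyStore:
    """Hypothetical noSQL stand-in: a dict that can be told to fail."""
    def __init__(self, fail=False):
        self.items = {}
        self.fail = fail

    def add(self, key, value):
        if self.fail:
            return False           # simulate a failed noSQL write
        self.items[key] = value
        return True

def save_mirrored(conn, nosql, data):
    """Insert into the RDB, mirror to noSQL, commit only if both succeed."""
    cur = conn.cursor()
    cur.execute("INSERT INTO items (title) VALUES (?)", (data["title"],))
    row_id = cur.lastrowid
    if row_id and nosql.add(row_id, data):
        conn.commit()
        return row_id
    conn.rollback()                # undo the RDB insert
    return None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.commit()

ok_id = save_mirrored(conn, FlakyStore(), {"title": "first"})
failed_id = save_mirrored(conn, FlakyStore(fail=True), {"title": "second"})
row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(ok_id, failed_id, row_count)
```

One gap remains in both the pseudo code and this sketch: if the commit itself fails after the noSQL write succeeded, the mirror keeps an entry the RDB never stored, so a real system would need a cleanup or retry step.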

noSQL architecture practices (3/8) –
Use noSQL as complement
• Use noSQL as a mirror (implemented by synchronization)




noSQL architecture practices (4/8) –
Use noSQL as complement
• Combine RDB & noSQL




noSQL architecture practices (5/8) –
Use noSQL as complement
//PSEUDO CODE for RDB & noSQL combination
//We want to store the data object
data.title = "title";
data.name = "name";
data.time = "2009-12-01 10:10:01";
data.from = "1";
bool status = false;
DB.startTransaction(); //start transaction
//write only data.from into the RDB, the value needed by search criteria
int id = DB.Insert("INSERT INTO table (from) VALUES (?)", data.from);
if (id > 0) {
   //write the whole data object to noSQL by id
   status = NoSQL.Add(id, data);
}
if (id > 0 && status) {
   DB.commit(); //commit transaction
} else {
   DB.rollback(); //failed, rollback transaction
}
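
The pseudo code covers the write path only. The sketch below adds the read path: search in the RDB, then fetch the full objects from noSQL by id. It assumes SQLite as the RDB and a plain dictionary as the noSQL store. The names `posts`, `from_id`, `save`, and `find_by_from` are hypothetical (the column is called `from_id` because `from` is a reserved word in SQL), and error handling is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The RDB keeps only the id and the searchable field.
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, from_id TEXT)")
nosql = {}   # hypothetical noSQL stand-in: id -> full data object

def save(data):
    """Store the search field in the RDB and the whole object in noSQL."""
    cur = conn.execute("INSERT INTO posts (from_id) VALUES (?)", (data["from"],))
    nosql[cur.lastrowid] = data
    conn.commit()
    return cur.lastrowid

def find_by_from(from_id):
    """Search in the RDB, then fetch the full objects from noSQL by id."""
    rows = conn.execute(
        "SELECT id FROM posts WHERE from_id = ? ORDER BY id", (from_id,))
    return [nosql[row[0]] for row in rows]

save({"title": "title", "name": "name", "time": "2009-12-01 10:10:01", "from": "1"})
save({"title": "other", "name": "name2", "time": "2009-12-02 08:00:00", "from": "2"})
titles = [post["title"] for post in find_by_from("1")]
print(titles)
```

The RDB rows stay small because they carry only what queries need, while the bulky fields live in the noSQL store.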
noSQL architecture practices (6/8) –
Use noSQL as complement
• What benefits can we get from the RDB & noSQL combination
  practice?

• Less data is stored in the RDB, which saves storage space and
  reduces RDB I/O
• Increases the RDB table-level cache hit rate: only updates to
  key values (PK, FK, values used in search criteria) refresh
  the cache
• Increases the synchronization efficiency of the RDB Master/Slave
  architecture
• Increases the RDB backup/recovery efficiency
• Increases the scalability/performance of the whole system
noSQL architecture practices (7/8) –
  Use noSQL as master
   • Use noSQL only
   • Mainly for systems with simple query requirements
   • But some noSQL products can handle more complex
     queries
      • MongoDB, Tokyo Cabinet, etc.

NoSQL Architecture Practices (2): NoSQL as the Master - http://www.infoq.com/cn/news/2011/03/nosql-architecture-practice-2
noSQL architecture practices (8/8) –
Use noSQL as master
• Use noSQL as the major data source
• APs write data only into noSQL
• Then synchronize the data from noSQL to other data stores
  based on their application
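
One possible shape for this flow is a change log on the noSQL side that is replayed into the other stores. The sketch below is only an illustration under that assumption: `PrimaryStore`, `synchronize`, and the two target dictionaries are hypothetical stand-ins, not a real replication API.

```python
class PrimaryStore:
    """Hypothetical noSQL primary that records writes in a change log."""
    def __init__(self):
        self.items = {}
        self.changes = []          # ordered log of (key, value) writes

    def put(self, key, value):
        self.items[key] = value
        self.changes.append((key, value))

def synchronize(primary, targets, start=0):
    """Replay changes from position `start` into every target store.

    Returns the new log position so the next run can resume from it.
    """
    for key, value in primary.changes[start:]:
        for target in targets:
            target[key] = value
    return len(primary.changes)

primary = PrimaryStore()
search_index, report_db = {}, {}     # stand-ins for other data stores

primary.put("order:1", {"total": 30})
position = synchronize(primary, [search_index, report_db])
primary.put("order:2", {"total": 12})
position = synchronize(primary, [search_index, report_db], start=position)
print(position, sorted(search_index))
```

The APs only ever call `put` on the primary. Each downstream store catches up on its own schedule, so it may lag briefly behind the primary.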




Case Study (1/4) –
Facebook’s Real-time Message System
• Use HBase to store 135+ billion messages a month
   • Chosen over a few other candidates such as Cassandra and
     sharded MySQL

• Data Patterns
   • A short set of temporal data that tends to be volatile
   • An ever-growing set of data that rarely gets accessed

Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month - http://highscalability.com/blog/2010/11/16/facebooks-new-real-time-messaging-system-hbase-to-store-135.html
Case Study (2/4) –
Facebook’s Real-time Message System
• Some key aspects of their system:
  • HBase
     • Has a simpler consistency model than Cassandra.
     • Very good scalability and performance for their data patterns.
     • Most feature rich for their requirements: auto load balancing and
       failover, compression support, multiple shards per server, etc.
     • HDFS, the filesystem used by HBase, supports replication, end-to-end
       checksums, and automatic rebalancing.
     • Facebook's operational teams have a lot of experience using HDFS
       because Facebook is a big user of Hadoop and Hadoop uses HDFS as
       its distributed file system.


Case Study (3/4) –
Facebook’s Real-time Message System
• Haystack is used to store attachments.
• A custom application server was written from scratch in order
  to service the massive inflows of messages from many
  different sources.
• A user discovery service was written on top of ZooKeeper.
• Infrastructure services are accessed for: email account
  verification, friend relationships, privacy decisions, and
  delivery decisions.
• In keeping with their approach of small teams doing amazing things,
  20 new infrastructure services are being released by 15
  engineers in one year.
• Facebook is not going to standardize on a single database
  platform; they will use separate platforms for separate tasks.
Case Study (4/4) –
Alibaba China Site architecture





http://www.infoq.com/cn/presentations/hl-alibaba-cn-architecture-design-practice
Data Access pattern as the key
for noSQL
• Data Structure
  • Structured
  • Semi-structured
  • Unstructured
  • Size
• How many & how often: writes vs. reads (proportion)
• Data Writing
  • Transaction
• Data Reading
  • Random access
  • Sequential access
  • Relationship
Q&A




Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 


001 hbase introduction

  • 1. HBase Introduction Scott Miao 2012/06/25
  • 2. Agenda
      • Course Credit
      • One common web site story
      • Why RDB not affordable?
      • Big Data
      • Why use noSQL?
      • HBase Introduction
      • Hands-on
      • noSQL architecture common practices
      • Case study
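  • 3. One common web site story (1/3)
      • An RDBMS is the natural choice for the persistence tier
          • It handles transactions (ACID), enforces integrity constraints, offers the standard SQL language, and even provides stored procedures
      • The first time your user count keeps growing...
          • Use a cluster of AP servers that share a single DB server
      • The second time your user count keeps growing...
          • Split the DB server into a Master-Slave architecture
              • Read data from the Slave servers
              • Write data to the Master server
      Source: HBase: The Definitive Guide - http://www.amazon.com/HBase-Definitive-Guide-Lars-George/dp/1449396100/ref=sr_1_1?ie=UTF8&qid=1339060175&sr=8-1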
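  • 4. One common web site story (2/3)
      • The third time your user count keeps growing...
          • For the read bottleneck
              • Add a cache, such as Memcached (an in-memory DB), between the server program and the DB
              • But the server program's cache and the DB can easily become inconsistent
          • For the write bottleneck
              • Upgrade the DB server hardware (CPU, memory, disk, etc.; vertical scaling)
              • Don't forget: the Slave servers' hardware has to be upgraded along with it...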
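  • 5. One common web site story (3/3)
      • The fourth time your user count keeps growing...
          • Use database sharding
              • Move from vertical scaling to horizontal scaling
              • The management nightmare begins
                  • An RDBMS is inherently ill-suited to distributed storage (ACID, fixed schema)
                  • The DBA has to define a set of sharding rules
                      • When one of the DB servers goes down or runs out of storage, resharding has to be done by hand
                      • Resharding means revising the sharding rules and then copying and migrating data with heavy I/O, while either keeping the site in service or limiting the outage to a fixed window
          • This is usually a last resort adopted after the fact, and one of only a few available options
              • Who knew my web site would become this popular?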
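To make the resharding cost concrete, here is a minimal Python sketch. It assumes the simplest possible sharding rule (hash of the key modulo the number of DB servers); the function names and the `user-N` keys are hypothetical and not taken from the slides.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Assumed sharding rule: hash the key, then take it modulo the server count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def keys_to_migrate(keys, old_count, new_count):
    """Keys whose shard changes when the rule moves from old_count to new_count servers."""
    return [k for k in keys if shard_for(k, old_count) != shard_for(k, new_count)]


# Hypothetical data set: 10,000 user ids spread over 4 DB servers.
user_ids = [f"user-{i}" for i in range(10_000)]
moved = keys_to_migrate(user_ids, old_count=4, new_count=5)
print(f"{len(moved)} of {len(user_ids)} keys must move when going from 4 to 5 shards")
```

With a uniform hash, only keys whose hash gives the same remainder modulo 4 and modulo 5 stay put, so roughly four fifths of the rows have to be copied to a different server when a fifth server is added. That is the "heavy I/O" part of resharding under this naive rule.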
  • 6. Why RDB not affordable? (1/6)
      • Bottleneck of relational DBs
          • 1990s vs. recent years (Web 2.0)
      • Memcached + MySQL
          • Mitigates read stress effectively, but not write stress
      • MySQL cluster solutions
          • Master/Slave
              • Not affordable for high-concurrency scenarios
          • Vertical partitioning
          • Vertical/horizontal partitioning (database sharding)
              • Complex
              • Hard to scale out and to adapt to changing requirements
              • Low availability
      • Some types of simple but very large data cause this condition
      Source: http://www.infoq.com/cn/news/2011/01/nosql-why
  • 7. Why RDB not affordable? (2/6) – A general HA system architecture design
      Source: The Quality of Software Projects (4): Overall Design, an Architecture Design Case - http://takeshi-experience.blogspot.tw/2012/04/blog-post.html
  • 8. Why RDB not affordable? (3/6) – Master/Slave
  • 9. Why RDB not affordable? (4/6) – Vertical Partitioning
  • 10. Why RDB not affordable? (5/6) – Master/Slave + Vertical Partitioning
  • 11. Why RDB not affordable? (6/6) – Vertical/Horizontal Partitioning
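  • 12. Big Data
      • More data was generated in the past 3 years than in the previous 40,000 years!
      • Walmart holds 167 times as much data as the US Library of Congress!
      • eBay's analytics platform processes up to 100 PB of data per day! (about 100,000,000 GB)
      • As of 2010, the world's stored electronic data totaled 1.2 ZB! (1,200,000 PB)
      • IDC predicts that by 2020 the world's stored electronic data will have grown 44-fold from its 2009 level, reaching 35 trillion GB!
          • 35,000,000,000,000 gigabytes
      Source: Architect (架构师), October issue - http://www.infoq.com/cn/minibooks/architect-oct-10-2011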
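  • 13. Trend Micro's problem
      • Each person visits about 20 to 60 HTML pages per day
      • Each HTML page contains about 15 to 30 URIs
      • Each URI object is about 10 to 150 KB
      • For one million users
          • 1 million × 20 = 20 million HTML pages
          • 20 million HTML pages × 15 = 300 million URIs
          • 300 million URI objects × 10 KB = 3,000,000,000 KB (3 TB)
      • The figures above cover Taiwan alone
      • Trend Micro is a global company
      • So the daily data volume is on the order of tens of TB
      Source: Trend Micro's Cloud Discovery Journey (趨勢的雲端發現之旅) - http://findbook.tw/book/9789866126185/basic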
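The estimate on this slide can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes decimal units (1 TB = 1,000,000,000 KB).

```python
def daily_volume(users, pages_per_user, uris_per_page, kb_per_uri):
    """Return (pages, uris, terabytes) per day, using decimal units (1 TB = 10^9 KB)."""
    pages = users * pages_per_user
    uris = pages * uris_per_page
    total_kb = uris * kb_per_uri
    return pages, uris, total_kb / 1_000_000_000


# Lower bounds from the slide: 20 pages, 15 URIs per page, 10 KB per URI.
low = daily_volume(1_000_000, 20, 15, 10)
# Upper bounds from the slide: 60 pages, 30 URIs per page, 150 KB per URI.
high = daily_volume(1_000_000, 60, 30, 150)

print("low :", low)
print("high:", high)
```

The lower bounds give 20 million pages, 300 million URIs and about 3 TB per day for one million users; the upper bounds give 270 TB per day for the same user count.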
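  • 14. The new darling of the big data era: noSQL
      • Not only SQL
      • Started in 2009
      • Characteristics
          • Does not use the relational data model
          • Distributed storage by design
          • Easy to scale horizontally
          • Open source
          • Easy to extend
          • Simple API operations (CRUD, usually without SQL support)
          • CAP (as opposed to ACID)
              • Eventual Consistency, Availability, Partition-Tolerance
          • Stores huge volumes of heterogeneous data
      Source: http://nosql-database.org/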
  • 15. Why use noSQL?
      • Easy to scale out
          • Unlike an RDB, there are no relationships, so it is easy to scale out
      • High performance even with big data
          • Table-level cache (RDB) vs. record-level cache (noSQL)
      • Elastic data model
          • Schema vs. schema-less/dynamic schema
      • High availability
          • Easy to add new machines (nodes) without any performance impact
  • 16. Comparison between RDB and noSQL
      Given a really huge amount of data...

      Aspects              RDB                       noSQL
      Performance          Gets lower                Stays the same as with a small amount of data
      Scalability          Mainly scale up           Mainly scale out
      Reliability          ACID                      CAP
      Availability         Hard to maintain SLA      Easy to maintain SLA
      Security             Robust                    Depends
      Economics            High-end machines         Commodity machines
      Data Model           Relational, fixed schema  Depends, but more likely simple and schema-less
      Maturity             Very mature               Not mature, various products
      Commercial support   Global companies          Small start-ups
      OLAP/BI              Mature                    Immature
      Human resources      Easy to find              Hard to find
  • 17. noSQL basic categories
      Source: iTcloud: The New Cloud Era - http://www.ithome.com.tw/002/cloud/cloud.html
  • 18. Introduction to Apache HBase
      • A top-level project of the ASF
      • Belongs to the key-value category of noSQL DBs
      • Derived from Google's
          • Bigtable: A Distributed Storage System for Structured Data
              • a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers
              • a sparse, distributed, persistent multi-dimensional sorted map
      Source: HBase: The Definitive Guide - http://www.amazon.com/HBase-Definitive-Guide-Lars-George/dp/1449396100/ref=sr_1_1?ie=UTF8&qid=1339060175&sr=8-1
  • 19. Apache HBase Concepts – Column-Oriented (1/2)
      Source: http://ofps.oreilly.com/titles/9781449396107/intro.html
  • 20. Apache HBase Concepts – Column-Oriented (2/2)
      • a sparse, distributed, persistent multi-dimensional sorted map
          • which is indexed by row key, column key (column family + qualifiers), and a timestamp
      Figure label: Column Families
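The "multi-dimensional sorted map" can be pictured as nested maps. The following Python sketch is a toy in-memory model only, not how HBase is implemented; the class name, its methods, and the `family:qualifier` column-key strings are assumptions made for illustration.

```python
from bisect import insort


class SparseSortedTable:
    """Toy model: row key -> column key ("family:qualifier") -> {timestamp: value}."""

    def __init__(self):
        self._rows = {}       # row key -> {column key -> {timestamp -> value}}
        self._row_keys = []   # row keys kept in sorted order

    def put(self, row, column, timestamp, value):
        """Store one cell version; absent cells take no space (the map is sparse)."""
        if row not in self._rows:
            self._rows[row] = {}
            insort(self._row_keys, row)
        self._rows[row].setdefault(column, {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the value at the given timestamp, or the newest version by default."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

    def scan(self, start_row, stop_row):
        """Yield (row, columns) for start_row <= row < stop_row, in row-key order."""
        for row in self._row_keys:
            if start_row <= row < stop_row:
                yield row, self._rows[row]


table = SparseSortedTable()
table.put("row2", "FAM_1:name", 1, "b")
table.put("row1", "FAM_1:name", 1, "a")
table.put("row1", "FAM_1:name", 2, "a2")
print(table.get("row1", "FAM_1:name"))
print([row for row, _ in table.scan("row1", "row3")])
```

Each value is addressed by the triple (row key, column key, timestamp), and rows come back in key order regardless of insertion order, which is what makes range scans over row keys cheap in this kind of model.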
  • 21. Apache HBase Concepts – Architecture
      Source: http://ofps.oreilly.com/titles/9781449396107/architecture.html
  • 22. Hands-on (1/3) – Use your VM (Virtual Machine) to install tm-puppet
      • Please refer to the SPN Dev HBase training program again
      • Install git on your PC
      • Install tm-puppet on your VM
  • 23. Hands-on (2/3) – Use HBase shell
      • Basic operations
          • help, list, scan
      • Create
          • A table 'MY_FIRST_TABLE'
          • Two column families 'FAM_1', 'FAM_2'
          • Ex.
              • create 't1', {NAME => 'f1'}, {NAME => 'f2'}
              • create 't1', 'f1', 'f2'
      • Put two records (columns)
          • Ex. put 't1', 'r1', 'c1', 'value'
      • Update a record (column) (it is also a put)
      • Delete a record (column)
          • delete 't1', 'r1', 'c1'
  • 24. Hands-on (3/3) – Requirements
      • Put a screenshot of your successfully installed tm-puppet into git
          • Run the following commands
              • jps
              • ifconfig
          • Capture the screen as an image
          • Path: ${git_home}/hbase-training/001/hands-on/${your_name}/hands-on-001.jpg
      • Put a screenshot of your HBase shell records into git
          • Run the following commands
              • scan 'MY_TEST_TABLE'
              • ifconfig
          • Capture the screen as an image
          • Path: ${git_home}/hbase-training/001/hands-on/${your_name}/hands-on-002.jpg
      • Commit and push to git
  • 25. noSQL architecture practices (1/8) – Use noSQL as complement
      • Use noSQL as a mirror (implemented by code)
          • The RDB is still the primary store, with noSQL as its mirror
      Source: NoSQL Architecture Practices (1): NoSQL as a Complement - http://www.infoq.com/cn/news/2011/02/nosql-architecture-practice
  • 26. noSQL architecture practices (2/8) – Use noSQL as complement

      //PSEUDO CODE for noSQL as a mirror
      //We want to store the data object
      bool status = false;
      DB.startTransaction();                //start transaction
      id = DB.Insert(data);                 //write data object to RDB
      if (id > 0) {
          status = NoSQL.Add(id, data);     //write data object to noSQL by id
      }
      if (id > 0 && status == true) {
          DB.commit();                      //commit transaction
      } else {
          DB.rollback();                    //failed, rollback transaction
      }
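The same mirror pattern can be tried out in runnable form. The sketch below is only an illustration under stated assumptions: Python's built-in `sqlite3` stands in for the RDB, a plain `dict` stands in for the noSQL store, and the `items` table and `store_mirrored` function are hypothetical names.

```python
import sqlite3


def store_mirrored(conn, kv_store, title):
    """Insert into the RDB, mirror into the key-value store, commit only if both succeed."""
    row_id = None
    try:
        cur = conn.execute("INSERT INTO items (title) VALUES (?)", (title,))
        row_id = cur.lastrowid
        kv_store[row_id] = {"title": title}   # mirror write; a real client could raise here
        conn.commit()
        return row_id
    except Exception:
        conn.rollback()                       # undo the RDB insert
        if row_id is not None:
            kv_store.pop(row_id, None)        # undo the mirror write if it happened
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
mirror = {}   # stand-in for the noSQL store
new_id = store_mirrored(conn, mirror, "hello")
print(new_id, mirror[new_id])
```

One design point the pseudocode leaves open: the noSQL write is not covered by the RDB transaction, so if the commit itself fails the mirror entry has to be removed explicitly, as the `except` branch does here.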
  • 27. noSQL architecture practices (3/8) – Use noSQL as complement
      • Use noSQL as a mirror (implemented by synchronization)
  • 28. noSQL architecture practices (4/8) – Use noSQL as complement
      • Combine RDB & noSQL
  • 29. noSQL architecture practices (5/8) – Use noSQL as complement

      //PSEUDO CODE for RDB & noSQL combination
      //We want to store the data object
      data.title = "title";
      data.name = "name";
      data.time = "2009-12-01 10:10:01";
      data.from = "1";
      bool status = false;
      DB.startTransaction();                //start transaction
      //write into RDB; data.from is a value needed by the search criteria
      id = DB.Insert("INSERT INTO table (from) VALUES (?)", data.from);
      if (id > 0) {
          status = NoSQL.Add(id, data);     //write data object to noSQL by id
      }
      if (id > 0 && status == true) {
          DB.commit();                      //commit transaction
      } else {
          DB.rollback();                    //failed, rollback transaction
      }
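A runnable sketch of the combination, including the read path, might look like the following. It rests on assumptions: `sqlite3` stands in for the RDB, a `dict` stands in for the noSQL store, and the `messages` table, the `from_id` column (renamed because `from` is a reserved word in SQL), and both function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The RDB keeps only the id and the field used as a search criterion.
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, from_id TEXT)")
conn.execute("CREATE INDEX idx_messages_from ON messages (from_id)")
documents = {}   # stand-in for the noSQL store: id -> full data object


def save(data):
    """Write the search field to the RDB and the whole object to the noSQL store."""
    row_id = None
    try:
        cur = conn.execute("INSERT INTO messages (from_id) VALUES (?)", (data["from"],))
        row_id = cur.lastrowid
        documents[row_id] = dict(data)
        conn.commit()
        return row_id
    except Exception:
        conn.rollback()
        if row_id is not None:
            documents.pop(row_id, None)
        raise


def find_by_sender(from_id):
    """Search in the RDB for matching ids, then fetch the full objects by id."""
    rows = conn.execute(
        "SELECT id FROM messages WHERE from_id = ? ORDER BY id", (from_id,)
    ).fetchall()
    return [documents[row_id] for (row_id,) in rows]


save({"title": "title", "name": "name", "time": "2009-12-01 10:10:01", "from": "1"})
save({"title": "other", "name": "name", "time": "2009-12-01 10:11:00", "from": "2"})
print(find_by_sender("1"))
```

The RDB row stays small because it carries only what the query needs, while the bulky fields (`title`, `name`, `time`) live in the key-value store and are fetched by primary key.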
  • 30. noSQL architecture practices (6/8) – Use noSQL as complement
      • What benefits can we get from combining RDB & noSQL?
          • Decreases RDB I/O and saves RDB storage space
          • Increases the RDB table-level cache hit rate; the cache is refreshed only when key values (PK, FK, search-criteria values) are updated
          • Increases synchronization efficiency in an RDB Master/Slave architecture
          • Increases RDB backup/recovery efficiency
          • Increases the scalability and performance of the whole system
  • 31. noSQL architecture practices (7/8) – Use noSQL as master
      • Use noSQL only
          • Mainly for systems with simple query requirements
          • But some noSQL products can handle more complex queries
              • MongoDB, Tokyo Cabinet, etc.
      Source: NoSQL Architecture Practices (2): NoSQL as the Master - http://www.infoq.com/cn/news/2011/03/nosql-architecture-practice-2
  • 32. noSQL architecture practices (8/8) – Use noSQL as master
      • Use noSQL as the major data source
          • APs write data only into noSQL
          • The data is then synchronized from noSQL to other data stores, depending on the application
  • 33. Case Study (1/4) – Facebook's Real-time Message System
      • Uses HBase to store 135+ billion messages a month
      • Beat out a few other contenders such as Cassandra, sharded MySQL, etc.
      • Data patterns
          • A short set of temporal data that tends to be volatile
          • An ever-growing set of data that rarely gets accessed
      Source: Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month - http://highscalability.com/blog/2010/11/16/facebooks-new-real-time-messaging-system-hbase-to-store-135.html
  • 34. Case Study (2/4) – Facebook's Real-time Message System
      • Some key aspects of their system:
          • HBase
              • Has a simpler consistency model than Cassandra.
              • Very good scalability and performance for their data patterns.
              • Most feature rich for their requirements: auto load balancing and failover, compression support, multiple shards per server, etc.
              • HDFS, the filesystem used by HBase, supports replication, end-to-end checksums, and automatic rebalancing.
              • Facebook's operational teams have a lot of experience using HDFS because Facebook is a big user of Hadoop and Hadoop uses HDFS as its distributed file system.
  • 35. Case Study (3/4) – Facebook's Real-time Message System
      • Haystack is used to store attachments.
      • A custom application server was written from scratch in order to service the massive inflows of messages from many different sources.
      • A user discovery service was written on top of ZooKeeper.
      • Infrastructure services are accessed for: email account verification, friend relationships, privacy decisions, and delivery decisions.
      • In keeping with their approach of small teams doing amazing things, 20 new infrastructure services are being released by 15 engineers in one year.
      • Facebook is not going to standardize on a single database platform; they will use separate platforms for separate tasks.
  • 36. Case Study (4/4) – Alibaba China Site architecture
      Source: http://www.infoq.com/cn/presentations/hl-alibaba-cn-architecture-design-practice
  • 37.
  • 38. Data Access pattern as the key for noSQL
      • Data structure
          • Structured
          • Semi-structured
          • Unstructured
      • Size
      • How many & how often writes/reads (proportion)
      • Data writing
          • Transaction
      • Data reading
          • Random access
          • Sequential access
      • Relationship
  • 39. Q&A