ARCHITECTURE &
ADVANCED USAGE
  SCOTT MIAO 2012/7/19
AGENDA

• Course Credit

• Architecture
  • More…


• Advanced Usage
  • More…




COURSE CREDIT

•   Show up, 30 scores
•   Ask a question; each question earns 5 scores
•   Quiz, 40 scores; please see TCExam
•   70 scores pass this course

• Course credit is calculated once for each
  finished course
• The course credit will be sent to you and your
  supervisor by mail
ARCHITECTURE

• Seek V.S. Transfer
• Storage
• Write Path
• Files
• Region Splits
• Compactions
• HFile Format
• KeyValue Format
• Write-Ahead Log
• Read Path
• Regions Lookup
• Region Life Cycle
• Replication
SEEK V.S. TRANSFER

• HBase uses Log-Structured Merge Trees (LSM-Trees)
  as its underlying store file data structure
  • Derived from B+ Trees
  • Easy to handle data with an optimized layout
    • WAL log
    • MemStore
  • Operates at disk transfer speed
• B+ Trees
  • Many RDBMSs use B+ Trees
  • Require a periodic OPTIMIZATION process
  • Operate at disk seek speed
SEEK V.S. TRANSFER

  • Disk Transfer
     • Moving data between the disk surface and the host system
     • CPU, RAM, and disk size double every 18–24 months

  • Disk Seek
     • Measures the time it takes the head assembly on the actuator
       arm to travel to the track of the disk where the data will be
       read or written
     • Seek time improves by only about 5% per year

  • Conclusion
     • At scale, seek is inefficient compared to transfer
https://www.research.ibm.com/haifa/Workshops/ir2005/papers/DougCutting-Haifa05.pdf
SEEK V.S. TRANSFER –
     LSM TREES




STORAGE




STORAGE - COMPONENTS

•   ZooKeeper
•   -ROOT-, .META. tables
•   HMaster
•   HRegionServer
•   HLog (WAL, Write-Ahead Log)
•   HRegion
•   Store => ColumnFamily
•   StoreFile => HFile
•   DFS Client
•   HDFS, Amazon S3, local file system, etc.
WRITE PATH



1. A write arrives at a region server
2. The edit is written to the WAL log
3. After the WAL log is persisted, the edit is written to the
   corresponding MemStore
4. A new HFile is flushed if the MemStore size reaches the
   threshold

(A client-side sketch of step 1 follows.)
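The following is a minimal client-side sketch of step 1 using the 0.9x-era Java API; the table, row, and column names are made up for illustration, and steps 2 through 4 then happen inside the region server:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WritePathExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "testtable"); // hypothetical table
    Put put = new Put(Bytes.toBytes("row-1"));
    put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
        Bytes.toBytes("val1"));
    // The region server appends this edit to the WAL first,
    // then applies it to the MemStore (steps 2 and 3 above)
    table.put(put);
    table.close();
  }
}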
FILES

•   Root-Level files
•   Table-Level files
•   Region-Level files
•   A txt file for reference




REGION SPLITS

• Splits one region into two half-size regions
• Triggered when
  • hbase.hregion.max.filesize is reached, default is 256MB
  • HBase Shell split, HBaseAdmin.split(…) (see the sketch below)
• Steps the region server takes…
  • Create a folder called “split” under the parent region folder
  • Close the parent region, so it can no longer serve any requests
  • Prepare the two new daughter regions (with multiple threads)
    inside the split folder, including…
    • region folder structure, reference HFiles, etc.
  • Move the two daughter regions into the table folder once the
    above steps are completed
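As a hedged sketch, a manual split via the HBaseAdmin.split(…) call mentioned above might look like this (the table name is assumed):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class SplitExample {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    // Same effect as the shell's `split` command: asks the region
    // server(s) to run the split steps listed above
    admin.split("testtable");
  }
}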
REGION SPLITS

• Here is an example of how this looks in the .META.
  table
row: testtable,row-500,1309812163930.d9ffc3a5cd016ae58e23d7a6cb937949.
 column=info:regioninfo, timestamp=1309872211559, value=REGION => {NAME
=> 
  'testtable,row-500,1309812163930.d9ffc3a5cd016ae58e23d7a6cb937949. 
   TableName => 'testtable', STARTKEY => 'row-500', ENDKEY => 'row-700', 
   ENCODED => d9ffc3a5cd016ae58e23d7a6cb937949, OFFLINE => true,
   SPLIT => true,}

 column=info:splitA, timestamp=1309872211559, value=REGION => {NAME => 
  'testtable,row-500,1309872211320.d5a127167c6e2dc5106f066cc84506f8. 
  TableName => 'testtable', STARTKEY => 'row-500', ENDKEY => 'row-550', 
  ENCODED => d5a127167c6e2dc5106f066cc84506f8,}
 column=info:splitB, timestamp=1309872211559, value=REGION => {NAME => 
  'testtable,row-550,1309872211320.de27e14ffc1f3fff65ce424fcf14ae42. 
  TableName => [B@62892cc5', STARTKEY => 'row-550', ENDKEY => 'row-700', 
  ENCODED => de27e14ffc1f3fff65ce424fcf14ae42,}
REGION SPLITS

• The name of the reference file is another random
  number, but with the hash of the referenced region
  as a postfix
/hbase/testtable/d5a127167c6e2dc5106f066cc84506f8/colfam1/ 
6630747383202842155.d9ffc3a5cd016ae58e23d7a6cb937949




COMPACTIONS
• The store files are monitored by a background
  thread
• The flushes of memstores slowly build up an
  increasing number of on-disk files
• The compaction process combines them into a
  few, larger files
• This goes on until
 • The largest of these files exceeds the configured maximum
   store file size and triggers a region split
• Types
 • Minor
 • Major
COMPACTIONS

• A compaction check is triggered when…
 • A memstore has been flushed to disk
 • The compact or major_compact shell commands/API calls are issued
 • A background thread runs
   • Called CompactionChecker
   • Each region server runs a single instance
   • Wakes up every hbase.server.thread.wakefrequency ×
     hbase.server.thread.wakefrequency.multiplier (multiplier
     defaults to 1000) milliseconds
     • So it runs less often than the other thread-based tasks
COMPACTIONS - MINOR

• Rewrites the last few files into one larger one
• The minimum number of files is set with the
  hbase.hstore.compaction.min property
  • Default is 3
  • Must be at least 2
  • A number too large…
    • Would delay minor compactions
    • Would also require more resources and take longer
• The maximum number of files is set with the
  hbase.hstore.compaction.max property
  • Default is 10
COMPACTIONS - MINOR
• Includes all files that are under the size limit, up to the
  total number of files per compaction allowed
  • hbase.hstore.compaction.min.size property
• Any file larger than the maximum compaction size is
  always excluded
  • hbase.hstore.compaction.max.size property
  • Default is Long.MAX_VALUE
COMPACTIONS - MAJOR
• Compacts all files into a single file
• Also drops KeyValues covered by delete predicates
  • Explicit Delete actions
  • Versions beyond the configured maximum
  • Expired TTLs
• Triggered when…
  • major_compact shell command/majorCompact() API call
  • hbase.hregion.majorcompaction property
    • Default is 24 hours
  • hbase.hregion.majorcompaction.jitter property
    • Default is 0.2
    • Without the jitter, all stores would run a major compaction at the
      same time, every 24 hours
• Minor compactions might be promoted to major
  compactions
  • When they include all store files, i.e. every store file is
    smaller than the configured maximum compaction size
HFILE FORMAT

• The actual storage files are implemented by the
  HFile class
• Stores HBase’s data efficiently
• Blocks
 • Fixed size
   • Trailer, File Info
 • All others are variable size
HFILE FORMAT

• Default block size is 64KB
• Some recommendations written in the API docs
 • Block size between 8KB and 1MB for general usage
 • A larger block size is preferred for sequential access use cases
 • A smaller block size is preferred for random access use cases
   • Requires more memory to hold the block index
   • May be slower to create (leads to more FS I/O flushes)
   • The smallest useful block size is around 20KB–30KB
• Each block contains
 • A magic header
 • A number of serialized KeyValue instances

(A sketch of setting the block size per column family follows.)
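For illustration, a sketch of setting a non-default block size on a column family at table creation, using the 0.9x-era descriptor API (the table and family names are assumptions):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class BlockSizeExample {
  public static void main(String[] args) throws Exception {
    HTableDescriptor desc = new HTableDescriptor("testtable");
    HColumnDescriptor colfam = new HColumnDescriptor("colfam1");
    colfam.setBlocksize(8 * 1024); // 8KB blocks favor random access
    desc.addFamily(colfam);
    new HBaseAdmin(HBaseConfiguration.create()).createTable(desc);
  }
}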
HFILE FORMAT

• Each block is about as large as the configured
  block size
• In practice, it is not an exact science
  • If you store a KeyValue that is larger than the block size, the
    writer has to accept it
  • Even with smaller values, the check for the block size is done
    after the last value was written
  • So the majority of blocks will be slightly larger
• Using a compression algorithm
  • You will not have much control over block size
  • The final store file contains the same number of blocks, but the
    total size will be smaller since each block is smaller
HFILE FORMAT –
  HFILE BLOCK SIZE V.S. HDFS BLOCK SIZE
• Default HDFS block size is 64MB
  • Which is 1,024 times the default HFile block size (64KB)
• HBase stores its files transparently in a filesystem
  • There is no correlation between these two block types
  • It is just a coincidence
  • HDFS also does not know what HBase stores
HFILE FORMAT –
                  HFILE CLASS
• Access an HFile directly
• hadoop fs -cat <hfile>
• hbase org.apache.hadoop.hbase.io.hfile.HFile -f
  <hfile> -m -v -p
 • Actual data stored as serialized KeyValue instances
 • HFile.Reader properties and the trailer block details
 • File info block values
KEYVALUE FORMAT
• Each KeyValue in the HFile is a low-level byte array
• Fixed-length numbers describe the key and the value
  • Key Length
  • Value Length
• If you deal with small values
  • Try to keep the key small
    • Choose short row and column keys
    • A single-byte family name and an equally short qualifier
  • Compression should help mitigate the problem of
    overwhelming key sizes
• The sorting of all KeyValues in the store file helps to keep
  similar keys close together

(A sketch quantifying the key overhead follows.)
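To make the overhead concrete, a small sketch (all names and values invented) that builds a KeyValue with a short key and a one-byte value and prints the two lengths:

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

public class KeyValueSizeExample {
  public static void main(String[] args) {
    KeyValue kv = new KeyValue(Bytes.toBytes("row-1"),
        Bytes.toBytes("d"),            // single-byte family name
        Bytes.toBytes("q"),            // short qualifier
        1307097848000L,                // timestamp
        Bytes.toBytes("v"));           // one-byte value
    // The key part carries row, family, qualifier, timestamp, and
    // type, so it dwarfs the value part here
    System.out.println("key length: " + kv.getKeyLength()
        + ", value length: " + kv.getValueLength());
  }
}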
WRITE-AHEAD LOG

• Region servers keep data in memory until enough is
  collected to warrant a flush to disk
 • Avoids creating too many very small files
 • While the data resides in memory it is volatile, not persistent
• Write-Ahead Logging
 • A common approach to this problem, also used by most
   RDBMSs
 • Each update (edit) is written to a log first, then to the real
   persistent data store
 • The server is then free to batch or aggregate the data in
   memory as needed
WRITE-AHEAD LOG

• The WAL is the lifeline that is needed when disaster
  strikes
  • The WAL records all changes to the data
• If the server crashes
  • The log can be replayed to get everything up to where the
    server should have been just before the crash
• If writing the record to the WAL fails
  • The whole operation must be considered a failure
• The actual WAL resides on HDFS
  • HDFS is a replicated filesystem
  • Any other server can open the log and start replaying the
    edits
WRITE-AHEAD LOG –
   WRITE PATH




WRITE-AHEAD LOG –
  MAIN CLASSES




WRITE-AHEAD LOG –
               OTHER CLASSES
• LogSyncer Class
 • HTableDescriptor.setDeferredLogFlush(boolean)
 • Default is false
   • Every update to the WAL is synced to the filesystem
 • Set to true (see the sketch below)
   • A background process syncs instead
      • hbase.regionserver.optionallogflushinterval property
      • Default is 1 second
   • There is a chance of data loss in case of a server failure
   • Only applies to user tables, not catalog tables (-ROOT-,
     .META.)
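A sketch of enabling deferred log flush on a hypothetical user table, trading the per-edit sync described above for write throughput:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class DeferredFlushExample {
  public static void main(String[] args) throws Exception {
    HTableDescriptor desc = new HTableDescriptor("testtable");
    desc.addFamily(new HColumnDescriptor("colfam1"));
    // WAL edits are now synced by the background LogSyncer thread
    // (every hbase.regionserver.optionallogflushinterval) instead
    // of on every update; a crash can lose that last interval
    desc.setDeferredLogFlush(true);
    new HBaseAdmin(HBaseConfiguration.create()).createTable(desc);
  }
}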
WRITE-AHEAD LOG –
                OTHER CLASSES
• LogRoller Class
  • Takes care of rolling logfiles at certain intervals
  • hbase.regionserver.logroll.period property
    • Default is 1 hour

• Other parameters
  • hbase.regionserver.hlog.blocksize property
    • Default is 32MB
  • hbase.regionserver.logroll.multiplier property
    • Default is 0.95
    • Rotate logs when they are at 95% of the block size

• Logs are rotated when
  • A certain amount of time has passed, or
  • They are considered full
  • Whichever comes first
WRITE-AHEAD LOG –
SPLIT & REPLAY LOGS




WRITE-AHEAD LOG –
                 DURABILITY
• WAL log
  • Sync it for every edit
  • Set the log flush times as low as you want

• Durability is still dependent on the underlying filesystem
  • Especially HDFS

• Use Hadoop 0.21.0 or later
• Or a special 0.20.x build with the append-support patches
  • I used 0.20.203 before
  • Otherwise, you can very well face data loss!
READ PATH

[Figure: the read path merges the MemStore and store files per
column family; some store files (e.g. in ColFam2) are skipped
due to the timestamp and Bloom filter exclusion process]
REGION LOOKUPS

• Catalog tables
  • -ROOT-
    • Refers to all regions in the .META. table
  • .META.
    • Refers to all regions in all user tables

• A three-level, B+ tree-like lookup scheme
  • A node stored in ZooKeeper
    • Contains the location of the root table’s region
  • Lookup of the matching meta region in the -ROOT- table
  • Retrieval of the user table region from the .META. table
REGION LOOKUPS




THE REGION LIFE CYCLE




ZOOKEEPER
    • ZooKeeper serves as HBase’s distributed coordination
      service
    • Inspect it via the ZooKeeper shell
        • hbase zkcli

Znode                      Description

/hbase/hbaseid             Cluster ID, as stored in the hbase.id file on HDFS

/hbase/master              Holds the master server name

/hbase/replication         Contains replication details

/hbase/root-region-server  Server name of the region server hosting the
                           -ROOT- region
ZOOKEEPER

Znode              Description

/hbase/rs          The root node under which all region servers list
                   themselves when they start. It is used to track
                   server failures.

/hbase/shutdown    Used to track the cluster state. It contains the time
                   when the cluster was started, and is empty when it was
                   shut down.

/hbase/splitlog    All log-splitting-related coordination. States include
                   unassigned, owned, and RESCAN.

/hbase/table       Disabled tables are added to this znode.

/hbase/unassigned  Used by the AssignmentManager to track region states
                   across the entire cluster. It contains znodes for those
                   regions that are not open, but are in a transitional
                   state.
REPLICATION

• A way to copy data between HBase deployments
• It can serve as
  • A disaster recovery solution
  • A way to provide higher availability at the HBase layer
• (HBase cluster) Master-push
  • One master cluster can replicate to any number of slave
    clusters, and each region server participates in replicating
    its own stream of edits
  • Eventual consistency
REPLICATION




ADVANCED USAGE

• Key Design

• Secondary Indexes

• Search Integration

• Transactions

• Bloom Filters


KEY DESIGN

• Two fundamental key structures
  • Row Key
  • Column Key
    • A column family name + a column qualifier


• Use these keys
  • to solve commonly found problems when designing storage
    solutions


• Logical V.S. Physical layout


LOGICAL V.S. PHYSICAL LAYOUT




READ PERFORMANCE AND QUERY CRITERIA




KEY DESIGN –
TALL-NARROW V.S. FLAT-WIDE TABLES
• Tall-narrow table layout
  • A table with few columns but many rows
• Flat-wide table layout
  • Has fewer rows but many columns

• The tall-narrow layout is recommended
  • With a flat-wide design, a single row could outgrow the
    maximum file/region size and work against the region split
    facility
KEY DESIGN –
TALL-NARROW V.S. FLAT-WIDE TABLES
 • An email system as an example
   • Flat-wide layout
 <userId> : <colfam> : <messageId> : <timestamp> : <email-message>

 12345 : data : 5fc38314-e290-ae5da5fc375d : 1307097848 : "Hi Lars, ..."
 12345 : data : 725aae5f-d72e-f90f3f070419 : 1307099848 : "Welcome, and ..."
 12345 : data : cc6775b3-f249-c6dd2b1a7467 : 1307101848 : "To Whom It ..."
 12345 : data : dcbee495-6d5e-6ed48124632c : 1307103848 : "Hi, how are ..."

   • Tall-narrow
 <userId>-<messageId> : <colfam> : <qualifier> : <timestamp> : <email-message>

 12345-5fc38314-e290-ae5da5fc375d : data : : 1307097848 : "Hi Lars, ..."
 12345-725aae5f-d72e-f90f3f070419 : data : : 1307099848 : "Welcome, and ..."
 12345-cc6775b3-f249-c6dd2b1a7467 : data : : 1307101848 : "To Whom It ..."
 12345-dcbee495-6d5e-6ed48124632c : data : : 1307103848 : "Hi, how are ..."

Empty Qualifier !!
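A sketch of writing one email under the tall-narrow layout above, with the message ID in the row key and an empty qualifier (the table name is an assumption; IDs are taken from the example rows):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TallNarrowExample {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "emails");
    // userId + messageId form the row key
    byte[] rowkey = Bytes.toBytes("12345-5fc38314-e290-ae5da5fc375d");
    Put put = new Put(rowkey);
    put.add(Bytes.toBytes("data"),
        HConstants.EMPTY_BYTE_ARRAY,        // the empty qualifier
        1307097848L, Bytes.toBytes("Hi Lars, ..."));
    table.put(put);
    table.close();
  }
}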
PARTIAL KEY SCANS




• Make sure to pad the value of each field in a
  composite row key, to ensure the sorting order you
  expect
PARTIAL KEY SCANS

• Set startRow and stopRow (see the sketch below)
  • Set startRow to the exact user ID
    • Scan.setStartRow(…)
  • Set stopRow to the user ID + 1
    • Scan.setStopRow(…)
• Control the sorting order
  • Long.MAX_VALUE - <date-as-long>
  • String s = "Hello,";
     for (int i = 0; i < s.length(); i++) {
       System.out.print(Integer.toString(s.charAt(i) ^ 0xFF, 16) + " ");
     }
     // prints: b7 9a 93 93 90 d3
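Putting the startRow/stopRow idea together, a sketch of scanning all rows of one user under the tall-narrow layout (table name assumed, user ID from the earlier example):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class PartialKeyScanExample {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "emails");
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("12345"));  // exact user ID
    scan.setStopRow(Bytes.toBytes("12346"));   // user ID + 1, exclusive
    ResultScanner scanner = table.getScanner(scan);
    for (Result result : scanner) {
      System.out.println(Bytes.toString(result.getRow()));
    }
    scanner.close();
    table.close();
  }
}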
PAGINATION

• Use filters
  • PageFilter
  • ColumnPaginationFilter
• Steps (see the sketch below)
  1.   Open a scanner at the start row
  2.   Skip offset rows
  3.   Read the next limit rows and return them to the caller
  4.   Close the scanner
• Use case
  • A web-based email client
  • Read emails 1–50 first, then 51–100, and so on
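A sketch of the steps above with PageFilter; the offset and limit are illustrative, and the offset skipping is done client-side:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PageFilter;

public class PaginationExample {
  public static void main(String[] args) throws Exception {
    int offset = 50, limit = 50;       // e.g. emails 51-100
    HTable table = new HTable(HBaseConfiguration.create(), "emails");
    Scan scan = new Scan();
    // PageFilter caps rows per region server, so cap at
    // offset + limit and do the skipping on the client
    scan.setFilter(new PageFilter(offset + limit));
    ResultScanner scanner = table.getScanner(scan);
    int row = 0;
    for (Result result : scanner) {
      if (row++ < offset) continue;    // step 2: skip offset rows
      if (row > offset + limit) break; // step 3: read limit rows
      System.out.println(result);
    }
    scanner.close();                   // step 4
    table.close();
  }
}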
TIME SERIES DATA

• Dealing with stream processing of events
• The most common use case is time series data
 • Data could be coming from
   • A sensor in a power grid
   • A stock exchange
   • A monitoring system for computer systems
 • The row key represents the event time
• The sequential, monotonically increasing nature of
  time series data
 • Causes all incoming data to be written to the same region
 • Hot spot issue
TIME SERIES DATA

• Overcome this problem
 • By prefixing the row key with a nonsequential prefix


• Common choices
 • Salting
 • Field swap/promotion
 • Randomization




TIME SERIES DATA - SALTING

• Use a salting prefix in the key that guarantees a
  spread of all rows across all region servers
     byte prefix = (byte) (Long.valueOf(timestamp).hashCode() %
     <number of region servers>);
     byte[] rowkey = Bytes.add(new byte[] { prefix },
     Bytes.toBytes(timestamp));

• Which results in row keys like
     0myrowkey-1
     0myrowkey-4
     1myrowkey-2
     1myrowkey-5
     ...
TIME SERIES DATA - SALTING

• Access to a range of rows must be fanned out
• Read with <number of region servers> get or scan
  calls (see the sketch below)

• Is it good or not?
  • You can use multiple threads to read the data from distinct
    servers in parallel
  • Needs further study of the access pattern, plus trial runs
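A sketch of the fan-out read, one scan per salt bucket; the bucket count, table name, and key range are assumptions, and the scans run sequentially here for brevity (production code would use one thread per scan, as noted above):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedScanExample {
  public static void main(String[] args) throws Exception {
    int numBuckets = 8;                        // == number of region servers
    byte[] start = Bytes.toBytes(1307097848L); // time range to read
    byte[] stop = Bytes.toBytes(1307103848L);
    HTable table = new HTable(HBaseConfiguration.create(), "metrics");
    List<ResultScanner> scanners = new ArrayList<ResultScanner>();
    for (byte salt = 0; salt < numBuckets; salt++) {
      Scan scan = new Scan();
      // Re-apply the salt prefix to both ends of the range
      scan.setStartRow(Bytes.add(new byte[] { salt }, start));
      scan.setStopRow(Bytes.add(new byte[] { salt }, stop));
      scanners.add(table.getScanner(scan));
    }
    // ... merge results from all scanners, then close them
  }
}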
TIME SERIES DATA –
                  SALTING USECASE
• An open source crash reporter named Socorro from
  the Mozilla organization
  • For Firefox and Thunderbird
  • Reports are subsequently read and analyzed by the Mozilla
    development team


• Technologies
  • Python-based client code
  • Communicates with the HBase cluster using Thrift




Mozilla wiki for Socorro - https://wiki.mozilla.org/Socorro
TIME SERIES DATA –
                  SALTING USECASE
• How the client merges the previously salted,
  sequential keys when doing a scan operation

for salt in '0123456789abcdef':
  salted_prefix = "%s%s" % (salt, prefix)
  scanner = self.client.scannerOpenWithPrefix(table, salted_prefix, columns)
  iterators.append(salted_scanner_iterable(self.logger, self.client,
               self._make_row_nice, salted_prefix, scanner))
TIME SERIES DATA –
         FIELD SWAP/PROMOTION
• Uses the composite row key concept
  • Move the timestamp to a secondary position in the row key
• If you already have a row key with more than one
  field
  • Swap them
• If you have only the timestamp as the current row
  key
  • Promote another field from the column keys into the row
    key
  • You can even promote the value
• Note: you can then only access data, especially time
  ranges, for a given swapped or promoted field
TIME SERIES DATA –
     FIELD SWAP/PROMOTION USECASE
• OpenTSDB
  • A time series database
  • Stores metrics about servers and
    services, gathered by external
    collection agents
  • All of the data is stored in HBase
  • The system UI enables users to query
    various metrics, combining
    and/or downsampling them, all
    in real time

• The schema promotes the
  metric ID into the row key
  • <metric-id><base-timestamp>...

                                          http://opentsdb.net/
TIME SERIES DATA –
 FIELD SWAP/PROMOTION USECASE
• Example




 OpenTSDB Schema - http://opentsdb.net/schema.html
TIME SERIES DATA




TIME-ORDERED RELATIONS
• You can also store related, time-ordered data
  • By using the columns of a table
• Since all of the columns are sorted per column
  family
  • Treat this sorting as a replacement for a secondary index
  • For a small number of indexes, you can create a column
    family for them
    • For a large number of indexes, consider the secondary-index
      approaches later in this deck
• HBase currently (0.95) does not do well with
  anything above two or three column families
  • Because flushing and compactions are done on a per-region
    basis
    • This can create a lot of needless I/O load
   http://hbase.apache.org/book/number.of.cfs.html
TIME-ORDERED RELATIONS – EXAMPLE
• Column name = <indexId> + “-” + <value>
• Column value
  • Key in the data column family
  • Redundant values from the data column family for performance
      • Denormalization
… //data
12345 : data : 5fc38314-e290-ae5da5fc375d : 1307097848 : "Hi Lars, ..."
12345 : data : 725aae5f-d72e-f90f3f070419 : 1307099848 : "Welcome, and ..."
12345 : data : cc6775b3-f249-c6dd2b1a7467 : 1307101848 : "To Whom It ..."
12345 : data : dcbee495-6d5e-6ed48124632c : 1307103848 : "Hi, how are ..."
... //ascending index for the from email address
12345 : index : idx-from-asc-mary@foobar.com : 1307099848 : 725aae5f-d72e...
12345 : index : idx-from-asc-paul@foobar.com : 1307103848 : dcbee495-6d5e...
12345 : index : idx-from-asc-pete@foobar.com : 1307097848 : 5fc38314-e290...
12345 : index : idx-from-asc-sales@ignore.me : 1307101848 : cc6775b3-f249...
...// descending index for email subjects
12345 : index : idx-subject-desc-\xa8\x90\x8d\x93\x9b\xde :
  1307103848 : dcbee495-6d5e-6ed48124632c
12345 : index : idx-subject-desc-\xb7\x9a\x93\x93\x90\xd3 :
  1307099848 : 725aae5f-d72e-f90f3f070419
SECONDARY INDEXES

• HBase has no native support for secondary indexes
 • But there are use cases that need them
 • Look up a cell not just by the primary coordinates
   • The row key, column family name, and qualifier
 • But also by an alternative coordinate
   • For example, scan a range of rows from the main table, but
     ordered by the secondary index
• Secondary indexes store a mapping between the
  new coordinates and the existing ones
SECONDARY INDEXES -
              CLIENT-MANAGED
• Moves the responsibility into the application layer

• Combines a data table and one (or more)
  lookup/mapping tables

• Write data (see the write sketch below)
  • Into the data table, and also update the lookup tables

• Read data
  • Either a direct lookup in the main table
  • Or a lookup in a secondary index table, then retrieve the data
    from the main table
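A minimal sketch of the write path for a client-managed index, writing the lookup table first and the data table last (the ordering rationale is on the next slide); all table and key names are invented:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ClientManagedIndexExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable index = new HTable(conf, "emails-by-from"); // lookup table
    HTable data = new HTable(conf, "emails");          // data table

    byte[] dataKey = Bytes.toBytes("12345-5fc38314-e290-ae5da5fc375d");
    // 1. Write the mapping: secondary coordinate -> primary key
    Put idxPut = new Put(Bytes.toBytes("pete@foobar.com"));
    idxPut.add(Bytes.toBytes("idx"), Bytes.toBytes("key"), dataKey);
    index.put(idxPut);
    // 2. Write the data last, so a crash leaves at worst a dangling
    //    index entry (cleaned up later by the pruning jobs)
    Put dataPut = new Put(dataKey);
    dataPut.add(Bytes.toBytes("data"), Bytes.toBytes(""),
        Bytes.toBytes("Hi Lars, ..."));
    data.put(dataPut);
  }
}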
SECONDARY INDEXES -
              CLIENT-MANAGED
• Atomicity
  • There is no cross-row atomicity
  • Write to the secondary index tables first, then write to the
    data table at the end of the operation
  • Use asynchronous, regular pruning jobs to clean up stale
    index entries

• It is hardcoded in your application
  • Needs to evolve with overall schema changes and new
    requirements
SECONDARY INDEXES -
    INDEXED-TRANSACTIONAL HBASE
 • Indexed-Transactional HBase (ITHBase) project
    • It extends HBase by adding special implementations of the
      client- and server-side classes
 • Extension
    • The core extension is the addition of transactions
      • These guarantee that all secondary index updates are consistent
    • Most client and server classes are replaced by ones that
      handle indexing support
 • Drawbacks
    • May not support the latest version of HBase available
    • Adds a considerable amount of synchronization overhead
      that results in decreased performance

https://github.com/hbase-trx/hbase-transactional-tableindexed
SECONDARY INDEXES -
                 INDEXED HBASE
• Indexed HBase (IHBase)
   • Forfeits the use of separate tables for each index and
     maintains the indexes purely in memory
   • This approach is much faster than the previous one
• Index maintenance
   • The indexes are generated when
     • A region is opened for the first time
     • A memstore is flushed to disk
   • The index is never out of sync, and no explicit transactional
     control is necessary
• Drawbacks
   • It is quite intrusive; it requires an additional JAR and a config file
   • It needs extra resources; it trades memory for extra I/O
     requirements
   • It may not be available for the latest version of HBase
https://github.com/ykulbak/ihbase
SECONDARY INDEXES -
               COPROCESSOR
• Implement an indexing solution based on
  coprocessors
  • Using the server-side hooks, e.g. RegionObserver
  • Use a coprocessor to load the indexing layer for every region,
    which would subsequently handle the maintenance of the
    indexes
  • Use the scanner hooks to transparently iterate over a
    normal data table, or an index-backed view of the same
  • Currently in development
• JIRA ticket
  • https://issues.apache.org/jira/browse/HBASE-2038
SEARCH INTEGRATION

• Using indexes
  • You are still confined to the keys the user predefined

• Search-based lookup
  • Allows keys of an arbitrary nature
  • Often backed by full search engine integration

• The following are a few possible approaches
SEARCH INTEGRATION -
             CLIENT-MANAGED
• Example: Facebook inbox search
  • The schema is built roughly like this
• Every row is a single inbox, that is, every user has a
  single row in the search table
• The columns are the terms indexed from the
  messages
• The versions are the message IDs
• The values contain additional information, such as
  the position of the term in the document

  <inbox>:<COL_FAM_1>:<term>:<messageId>:<additionalInfo>
SEARCH INTEGRATION -
                 LUCENE
• Apache Lucene
  • Lucene Core
    • Provides Java-based indexing and search technology
  • Solr
    • A high-performance search server built using Lucene Core
• Steps
  1. HBase only stores the data
  2. A BuildTableIndex class scans an entire data table and
     builds the Lucene indexes
  3. They end up as directories/files on HDFS
  4. These indexes can be downloaded to a Lucene-based
     server for local use
  5. A search performed via Lucene returns row keys, which are
     then used for random lookups of specific values in the
     data table
SEARCH INTEGRATION -
             COPROCESSORS
• Currently in development
• Similar to the use of coprocessors to build
  secondary indexes
• Complements a data table with Lucene-based
  search functionality

• Ticket in JIRA
  • https://issues.apache.org/jira/browse/HBASE-3529
TRANSACTION

• This is an immature aspect of HBase
  • A consequence of the trade-offs imposed by the CAP theorem
• Here are two possible solutions
  • Transactional HBase
    • Comes with the aforementioned ITHBase

  • ZooKeeper
    • Comes with a lock recipe that can be used to implement a
      two-phase commit protocol
    • http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recip
      es_twoPhasedCommit
BLOOM FILTERS
• Problem
 • Cell count
   • A 1GB store file at a 64KB block size holds 16,384 blocks
   • A 1GB store file of 200-byte cells holds about 5 million cells
   • The block index only records the start row key of each block
 • Store files
   • There can be a number of store files within one column family,
     all of which may have to be checked for a given key
• Bloom filters allow you to improve lookup times; since they
  add overhead in terms of storage and memory, they are
  turned off by default
BLOOM FILTERS –
  WHY USE IT ?




BLOOM FILTERS –
                DO WE NEED IT ?


• If possible, you should try to use the row-level Bloom
  filter
  • A good balance between the additional space
    requirements and the gain in performance

• Only resort to the more costly row+column Bloom
  filter
  • When you gain no advantage from using the row-level one

(A configuration sketch follows.)
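A sketch of enabling a row-level Bloom filter on a column family with the 0.9x-era API, where the BloomType enum lived under StoreFile (table and family names are illustrative):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class BloomFilterExample {
  public static void main(String[] args) throws Exception {
    HTableDescriptor desc = new HTableDescriptor("testtable");
    HColumnDescriptor colfam = new HColumnDescriptor("colfam1");
    // ROW is usually the right balance; ROWCOL only when row-level
    // filtering gives no advantage
    colfam.setBloomFilterType(StoreFile.BloomType.ROW);
    desc.addFamily(colfam);
    new HBaseAdmin(HBaseConfiguration.create()).createTable(desc);
  }
}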
BACKUPS



Let's take a quick break, shall we? (  ̄ 3 ̄)Y▂Ξ

More Related Content

What's hot

HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance ImprovementBiju Nair
 
HBase Sizing Notes
HBase Sizing NotesHBase Sizing Notes
HBase Sizing Noteslarsgeorge
 
HBase Blockcache 101
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101Nick Dimiduk
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guidelarsgeorge
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon
 
Meet HBase 1.0
Meet HBase 1.0Meet HBase 1.0
Meet HBase 1.0enissoz
 
Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.Jack Levin
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars GeorgeJAX London
 
Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction HBaseCon
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersCloudera, Inc.
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
HBase: Extreme Makeover
HBase: Extreme MakeoverHBase: Extreme Makeover
HBase: Extreme MakeoverHBaseCon
 
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaHBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaCloudera, Inc.
 
Zero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationZero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationScott Miao
 
HBaseCon 2015: HBase 2.0 and Beyond Panel
HBaseCon 2015: HBase 2.0 and Beyond PanelHBaseCon 2015: HBase 2.0 and Beyond Panel
HBaseCon 2015: HBase 2.0 and Beyond PanelHBaseCon
 
HBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBaseHBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBaseHBaseCon
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low LatencyNick Dimiduk
 

What's hot (19)

HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 
HBase Sizing Notes
HBase Sizing NotesHBase Sizing Notes
HBase Sizing Notes
 
HBase Blockcache 101
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guide
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
Meet HBase 1.0
Meet HBase 1.0Meet HBase 1.0
Meet HBase 1.0
 
Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
 
HBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and CompactionHBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and Compaction
 
Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
 
HBase Low Latency
HBase Low LatencyHBase Low Latency
HBase Low Latency
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
HBase: Extreme Makeover
HBase: Extreme MakeoverHBase: Extreme Makeover
HBase: Extreme Makeover
 
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaHBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
 
Zero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationZero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter Migration
 
HBaseCon 2015: HBase 2.0 and Beyond Panel
HBaseCon 2015: HBase 2.0 and Beyond PanelHBaseCon 2015: HBase 2.0 and Beyond Panel
HBaseCon 2015: HBase 2.0 and Beyond Panel
 
HBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBaseHBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBase
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
 

Viewers also liked

003 admin featuresandclients
003 admin featuresandclients003 admin featuresandclients
003 admin featuresandclientsScott Miao
 
002 hbase clientapi
002 hbase clientapi002 hbase clientapi
002 hbase clientapiScott Miao
 
005 cluster monitoring
005 cluster monitoring005 cluster monitoring
005 cluster monitoringScott Miao
 
Ravi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS ArchitectureRavi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS ArchitectureRavi namboori
 
제3회 오픈 로보틱스 세미나 (제12세션) : 로봇 암 모델링과 MoveIt! 사용법
제3회 오픈 로보틱스 세미나 (제12세션) : 로봇 암 모델링과 MoveIt! 사용법제3회 오픈 로보틱스 세미나 (제12세션) : 로봇 암 모델링과 MoveIt! 사용법
제3회 오픈 로보틱스 세미나 (제12세션) : 로봇 암 모델링과 MoveIt! 사용법Yoonseok Pyo
 
Vertica 7.0 Architecture Overview
Vertica 7.0 Architecture OverviewVertica 7.0 Architecture Overview
Vertica 7.0 Architecture OverviewAndrey Karpov
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Designsudhakara st
 

Viewers also liked (9)

003 admin featuresandclients
003 admin featuresandclients003 admin featuresandclients
003 admin featuresandclients
 
002 hbase clientapi
002 hbase clientapi002 hbase clientapi
002 hbase clientapi
 
005 cluster monitoring
005 cluster monitoring005 cluster monitoring
005 cluster monitoring
 
Hdfs internals
Hdfs internalsHdfs internals
Hdfs internals
 
Ravi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS ArchitectureRavi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS Architecture
 
제3회 오픈 로보틱스 세미나 (제12세션) : 로봇 암 모델링과 MoveIt! 사용법
제3회 오픈 로보틱스 세미나 (제12세션) : 로봇 암 모델링과 MoveIt! 사용법제3회 오픈 로보틱스 세미나 (제12세션) : 로봇 암 모델링과 MoveIt! 사용법
제3회 오픈 로보틱스 세미나 (제12세션) : 로봇 암 모델링과 MoveIt! 사용법
 
Vertica 7.0 Architecture Overview
Vertica 7.0 Architecture OverviewVertica 7.0 Architecture Overview
Vertica 7.0 Architecture Overview
 
Disk scheduling
Disk schedulingDisk scheduling
Disk scheduling
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 

Similar to 004 architecture andadvanceduse

Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Dustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep DiveDustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep DiveGluster.org
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014Nick Dimiduk
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBaseCon
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012Chris Huang
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practicelarsgeorge
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation Yahoo Developer Network
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestHBaseCon
 
Hbase Introduction
Hbase IntroductionHbase Introduction
Hbase IntroductionKim Yong-Duk
 
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Alluxio, Inc.
 
Dissecting Scalable Database Architectures
Dissecting Scalable Database ArchitecturesDissecting Scalable Database Architectures
Dissecting Scalable Database Architectureshypertable
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messagesfeng1212
 
Elastic storage in the cloud session 5224 final v2
Elastic storage in the cloud session 5224 final v2Elastic storage in the cloud session 5224 final v2
Elastic storage in the cloud session 5224 final v2BradDesAulniers2
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemRutvik Bapat
 
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedPGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedEqunix Business Solutions
 
Disaggregated Container Attached Storage - Yet Another Topology with What Pur...
Disaggregated Container Attached Storage - Yet Another Topology with What Pur...Disaggregated Container Attached Storage - Yet Another Topology with What Pur...
Disaggregated Container Attached Storage - Yet Another Topology with What Pur...DoKC
 
Disaggregated Container Attached Storage - Yet Another Topology with What Pur...
Disaggregated Container Attached Storage - Yet Another Topology with What Pur...Disaggregated Container Attached Storage - Yet Another Topology with What Pur...
Disaggregated Container Attached Storage - Yet Another Topology with What Pur...DoKC
 

Similar to 004 architecture andadvanceduse (20)

Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Dustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep DiveDustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep Dive
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low Latency
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation
 
Giraffa - November 2014
Giraffa - November 2014Giraffa - November 2014
Giraffa - November 2014
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
Hbase Introduction
Hbase IntroductionHbase Introduction
Hbase Introduction
 
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
 
Dissecting Scalable Database Architectures
Dissecting Scalable Database ArchitecturesDissecting Scalable Database Architectures
Dissecting Scalable Database Architectures
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messages
 
Elastic storage in the cloud session 5224 final v2
Elastic storage in the cloud session 5224 final v2Elastic storage in the cloud session 5224 final v2
Elastic storage in the cloud session 5224 final v2
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedPGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
 
Disaggregated Container Attached Storage - Yet Another Topology with What Pur...
Disaggregated Container Attached Storage - Yet Another Topology with What Pur...Disaggregated Container Attached Storage - Yet Another Topology with What Pur...
Disaggregated Container Attached Storage - Yet Another Topology with What Pur...
 
Disaggregated Container Attached Storage - Yet Another Topology with What Pur...
Disaggregated Container Attached Storage - Yet Another Topology with What Pur...Disaggregated Container Attached Storage - Yet Another Topology with What Pur...
Disaggregated Container Attached Storage - Yet Another Topology with What Pur...
 

More from Scott Miao

My thoughts for - Building CI/CD Pipelines for Serverless Applications sharing
My thoughts for - Building CI/CD Pipelines for Serverless Applications sharingMy thoughts for - Building CI/CD Pipelines for Serverless Applications sharing
My thoughts for - Building CI/CD Pipelines for Serverless Applications sharingScott Miao
 
20171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v0120171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v01Scott Miao
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudScott Miao
 
analytic engine - a common big data computation service on the aws
analytic engine - a common big data computation service on the awsanalytic engine - a common big data computation service on the aws
analytic engine - a common big data computation service on the awsScott Miao
 
Attack on graph
Attack on graphAttack on graph
Attack on graphScott Miao
 
20121022 tm hbasecanarytool
20121022 tm hbasecanarytool20121022 tm hbasecanarytool
20121022 tm hbasecanarytoolScott Miao
 

More from Scott Miao (6)

My thoughts for - Building CI/CD Pipelines for Serverless Applications sharing
My thoughts for - Building CI/CD Pipelines for Serverless Applications sharingMy thoughts for - Building CI/CD Pipelines for Serverless Applications sharing
My thoughts for - Building CI/CD Pipelines for Serverless Applications sharing
 
20171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v0120171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v01
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloud
 
analytic engine - a common big data computation service on the aws
analytic engine - a common big data computation service on the awsanalytic engine - a common big data computation service on the aws
analytic engine - a common big data computation service on the aws
 
Attack on graph
Attack on graphAttack on graph
Attack on graph
 
20121022 tm hbasecanarytool
20121022 tm hbasecanarytool20121022 tm hbasecanarytool
20121022 tm hbasecanarytool
 

Recently uploaded

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
REGION SPLITS

• Here is an example of how this looks in the .META. table row:

  testtable,row-500,1309812163930.d9ffc3a5cd016ae58e23d7a6cb937949.
    column=info:regioninfo, timestamp=1309872211559, value=REGION =>
    {NAME => 'testtable,row-500,1309812163930.d9ffc3a5cd016ae58e23d7a6cb937949.
    TableName => 'testtable', STARTKEY => 'row-500', ENDKEY => 'row-700',
    ENCODED => d9ffc3a5cd016ae58e23d7a6cb937949, OFFLINE => true, SPLIT => true,}
    column=info:splitA, timestamp=1309872211559, value=REGION =>
    {NAME => 'testtable,row-500,1309872211320.d5a127167c6e2dc5106f066cc84506f8.
    TableName => 'testtable', STARTKEY => 'row-500', ENDKEY => 'row-550',
    ENCODED => d5a127167c6e2dc5106f066cc84506f8,}
    column=info:splitB, timestamp=1309872211559, value=REGION =>
    {NAME => 'testtable,row-550,1309872211320.de27e14ffc1f3fff65ce424fcf14ae42.
    TableName => [B@62892cc5', STARTKEY => 'row-550', ENDKEY => 'row-700',
    ENCODED => de27e14ffc1f3fff65ce424fcf14ae42,}

                                                          13
REGION SPLITS

• The name of the reference file is another random number, but with the hash
  of the referenced region as a postfix:

  /hbase/testtable/d5a127167c6e2dc5106f066cc84506f8/colfam1/
  6630747383202842155.d9ffc3a5cd016ae58e23d7a6cb937949

                                                          14
COMPACTIONS

• The store files are monitored by a background thread
  • The flushes of memstores slowly build up an increasing number of on-disk
    files
  • The compaction process combines them into a few, larger files
• This goes on until
  • The largest of these files exceeds the configured maximum store file size
    and triggers a region split
• Types
  • Minor
  • Major

                                                          15
COMPACTIONS

• A compaction check is triggered when…
  • A memstore has been flushed to disk
  • The compact or major_compact shell commands/API calls are issued (see the
    sketch below)
  • A background thread runs
    • Called CompactionChecker
    • Each region server runs a single instance, woken every
      hbase.server.thread.wakefrequency x
      hbase.server.thread.wakefrequency.multiplier milliseconds
      (the multiplier defaults to 1000)
    • So it runs less often than the other thread-based tasks

                                                          16
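The second trigger can be issued from code. A minimal sketch against the
0.92-era Java client API this deck assumes; the table name "testtable" is only
an example:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class CompactionTrigger {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);
      admin.compact("testtable");       // queue a (minor) compaction check
      admin.majorCompact("testtable");  // request a major compaction
    }
  }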
COMPACTIONS - MINOR

• Rewrites the last few files into one larger one
• The minimum number of files is set with the hbase.hstore.compaction.min
  property
  • Default is 3
  • Needs to be at least 2
  • A number too large…
    • Would delay minor compactions
    • Would also require more resources and take longer
• The maximum number of files is set with the hbase.hstore.compaction.max
  property
  • Default is 10

                                                          17
COMPACTIONS - MINOR

• All files that are under the size limit are included, up to the total
  number of files per compaction allowed
  • hbase.hstore.compaction.min.size property
• Any file larger than the maximum compaction size is always excluded
  • hbase.hstore.compaction.max.size property
  • Default is Long.MAX_VALUE

                                                          18
COMPACTIONS - MAJOR

• Compacts all files into a single file
• Also drops KeyValues marked for deletion, i.e. those covered by
  • A Delete action
  • The maximum version count
  • The TTL
• Triggered when…
  • The major_compact shell command/majorCompact() API call is issued
  • The hbase.hregion.majorcompaction interval elapses
    • Default is 24 hours
  • hbase.hregion.majorcompaction.jitter property
    • Default is 0.2
    • Without the jitter, all stores would run a major compaction at the
      same time, every 24 hours
• Minor compactions might be promoted to major compactions
  • When they end up including all store files, each below the configured
    maximum compaction size

                                                          19
HFILE FORMAT

• The actual storage files are implemented by the HFile class
  • Stores HBase’s data efficiently
• Blocks
  • Fixed size
    • Trailer, File Info
  • Others are variable size

                                                          20
HFILE FORMAT

• Default block size is 64KB
• Some recommendations from the API docs
  • A block size between 8KB and 1MB for general usage
  • A larger block size is preferred for sequential access use cases
  • A smaller block size is preferred for random access use cases
    • Requires more memory to hold the block index
    • May be slower to create (leads to more filesystem I/O flushes)
    • The smallest practical block size is around 20KB-30KB
• Each block contains
  • A magic header
  • A number of serialized KeyValue instances

                                                          21
HFILE FORMAT

• Each block is about as large as the configured block size
  • In practice, it is not an exact science
  • If you store a KeyValue that is larger than the block size, the writer
    has to accept it
  • Even with smaller values, the check for the block size is done after the
    last value was written
    • So the majority of blocks will be slightly larger
• When using a compression algorithm
  • You will not have much control over block size
  • The final store file contains the same number of blocks, but the total
    size will be smaller since each block is smaller

                                                          22
HFILE FORMAT – HFILE BLOCK SIZE V.S. HDFS BLOCK SIZE

• The default HDFS block size is 64MB
  • Which is 1,024 times the HFile default block size (64KB)
  • HBase stores its files transparently into a filesystem
• There is no correlation between these two block types
  • It is just a coincidence
  • HDFS also does not know what HBase stores

                                                          23
HFILE FORMAT – HFILE CLASS

• Access an HFile directly
  • hadoop fs -cat <hfile>
  • hbase org.apache.hadoop.hbase.io.hfile.HFile -f <hfile> -m -v -p
• Shows the actual data stored as serialized KeyValue instances
• HFile.Reader properties and the trailer block details
• File info block values

                                                          24
KEYVALUE FORMAT

• Each KeyValue in the HFile is a low-level byte array
• Fixed-length numbers
  • Key Length
  • Value Length
• If you deal with small values
  • Try to keep the key small
    • Choose a short row and column key, e.g. a family name of a single byte
      and an equally short qualifier, as in the sketch below
  • Compression should help mitigate the overwhelming-key-size problem
    • The sorting of all KeyValues in the store file helps to keep similar
      keys close together

                                                          25
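A minimal sketch of such a slim write; the table name "msgs", the one-byte
family "d", and the one-byte qualifier "m" are hypothetical, and the table is
assumed to exist:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ShortKeyWrite {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "msgs");            // hypothetical table
      Put put = new Put(Bytes.toBytes("12345-5fc38314")); // short composite row key
      // one-byte family "d" and qualifier "m" keep the per-KeyValue key
      // overhead low, since the key is repeated for every stored cell
      put.add(Bytes.toBytes("d"), Bytes.toBytes("m"), Bytes.toBytes("Hi Lars, ..."));
      table.put(put);
      table.close();
    }
  }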
WRITE-AHEAD LOG

• Region servers keep data in memory until enough is collected to warrant a
  flush to disk
  • Avoids the creation of too many very small files
  • But while the data resides in memory it is volatile, not persistent
• Write-Ahead Logging
  • A common approach to solving this issue, found in most RDBMSs as well
  • Each update (edit) is written to a log first, then to the real persistent
    data store
  • The server then has the liberty to batch or aggregate the data in memory
    as needed

                                                          26
WRITE-AHEAD LOG

• The WAL is the lifeline that is needed when disaster strikes
  • The WAL records all changes to the data
  • If the server crashes
    • The WAL can effectively be replayed to get everything up to where the
      server should have been just before the crash
  • If writing the record to the WAL fails
    • The whole operation must be considered a failure
• The actual WAL resides on HDFS
  • HDFS is a replicated filesystem
  • Any other server can open the log and start replaying the edits

                                                          27
WRITE-AHEAD LOG – WRITE PATH

                                                          28
WRITE-AHEAD LOG – MAIN CLASSES

                                                          29
WRITE-AHEAD LOG – OTHER CLASSES

• LogSyncer class
  • HTableDescriptor.setDeferredLogFlush(boolean)
  • Default is false
    • Every update to the WAL is synced to the filesystem immediately
  • Set to true (see the sketch below)
    • A background process syncs instead
      • hbase.regionserver.optionallogflushinterval property
      • Default is 1 second
    • There is a chance of data loss in case of a server failure
    • Applies only to user tables, not catalog tables (-ROOT-, .META.)

                                                          30
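A minimal sketch of enabling deferred log flush at table-creation time,
against the 0.92-era API; the table name "events" is only an example:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class DeferredFlushTable {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTableDescriptor desc = new HTableDescriptor("events");
      desc.addFamily(new HColumnDescriptor("data"));
      // WAL edits are then synced by the background LogSyncer thread,
      // trading a small data-loss window for write throughput
      desc.setDeferredLogFlush(true);
      HBaseAdmin admin = new HBaseAdmin(conf);
      admin.createTable(desc);
    }
  }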
WRITE-AHEAD LOG – OTHER CLASSES

• LogRoller class
  • Takes care of rolling logfiles at certain intervals
  • hbase.regionserver.logroll.period property
    • Default is 1 hour
• Other parameters
  • hbase.regionserver.hlog.blocksize property
    • Default is 32MB
  • hbase.regionserver.logroll.multiplier property
    • Default is 0.95
    • Rotate logs when they are at 95% of the block size
• Logs are rotated when
  • A certain amount of time has passed, or
  • They are considered full
  • Whichever comes first

                                                          31
WRITE-AHEAD LOG – SPLIT & REPLAY LOGS

                                                          32
WRITE-AHEAD LOG – DURABILITY

• WAL
  • Sync it for every edit
  • Set the log flush times as low as you want
• Still dependent on the underlying filesystem
  • Especially HDFS
  • Use Hadoop 0.21.0 or later
    • Or a special 0.20.x build with the append-support patches
    • I used 0.20.203 before
  • Otherwise, you can very well face data loss!

                                                          33
READ PATH

• (Diagram: ColFam2 is skipped, due to the timestamp and Bloom filter
  exclusion process)

                                                          34
REGION LOOKUPS

• Catalog tables
  • -ROOT-
    • Refers to all regions in the .META. table
  • .META.
    • Refers to all regions in all user tables
• A three-level, B+ tree-like lookup scheme
  1. A node stored in ZooKeeper
     • Contains the location of the root table’s region
  2. Lookup of a matching meta region from the -ROOT- table
  3. Retrieval of the user table region from the .META. table

                                                          35
THE REGION LIFE CYCLE

                                                          37
ZOOKEEPER

• ZooKeeper acts as HBase’s distributed coordination service
• Use the HBase shell
  • hbase zkcli

  Znode                      Description
  /hbase/hbaseid             Cluster ID, as stored in the hbase.id file on
                             HDFS
  /hbase/master              Holds the master server name
  /hbase/replication         Contains replication details
  /hbase/root-region-server  Server name of the region server hosting the
                             -ROOT- regions

                                                          38
ZOOKEEPER

  Znode              Description
  /hbase/rs          The root node for all region servers to list themselves
                     when they start. It is used to track server failures.
  /hbase/shutdown    Used to track the cluster state. It contains the time
                     when the cluster was started, and is empty when it was
                     shut down.
  /hbase/splitlog    All log-splitting-related coordination. States include
                     unassigned, owned, and RESCAN.
  /hbase/table       Disabled tables are added to this znode.
  /hbase/unassigned  Used by the AssignmentManager to track region states
                     across the entire cluster. It contains znodes for those
                     regions that are not open, but are in a transitional
                     state.

                                                          39
REPLICATION

• A way to copy data between HBase deployments
• It can serve as
  • A disaster recovery solution
  • A way to provide higher availability at the HBase layer
• Master-push (at the HBase cluster level)
  • One master cluster can replicate to any number of slave clusters, and
    each region server participates in replicating its own stream of edits
• Eventual consistency

                                                          40
ADVANCED USAGE

• Key Design
• Secondary Indexes
• Search Integration
• Transactions
• Bloom Filters

                                                          42
KEY DESIGN

• Two fundamental key structures
  • Row Key
  • Column Key
    • A column family name + a column qualifier
• Use these keys
  • To solve commonly found problems when designing storage solutions
• Logical V.S. physical layout

                                                          43
READ PERFORMANCE AND QUERY CRITERIA

                                                          45
KEY DESIGN – TALL-NARROW V.S. FLAT-WIDE TABLES

• Tall-narrow table layout
  • A table with few columns but many rows
• Flat-wide table layout
  • Has fewer rows but many columns
• The tall-narrow table layout is recommended
  • Because under a flat-wide design a single row could outgrow the maximum
    file/region size and work against the region split facility

                                                          46
KEY DESIGN – TALL-NARROW V.S. FLAT-WIDE TABLES

• An email system as an example
• Flat-wide layout
  <userId> : <colfam> : <messageId> : <timestamp> : <email-message>

  12345 : data : 5fc38314-e290-ae5da5fc375d : 1307097848 : "Hi Lars, ..."
  12345 : data : 725aae5f-d72e-f90f3f070419 : 1307099848 : "Welcome, and ..."
  12345 : data : cc6775b3-f249-c6dd2b1a7467 : 1307101848 : "To Whom It ..."
  12345 : data : dcbee495-6d5e-6ed48124632c : 1307103848 : "Hi, how are ..."

• Tall-narrow layout
  <userId>-<messageId> : <colfam> : <qualifier> : <timestamp> : <email-message>

  12345-5fc38314-e290-ae5da5fc375d : data : : 1307097848 : "Hi Lars, ..."
  12345-725aae5f-d72e-f90f3f070419 : data : : 1307099848 : "Welcome, and ..."
  12345-cc6775b3-f249-c6dd2b1a7467 : data : : 1307101848 : "To Whom It ..."
  12345-dcbee495-6d5e-6ed48124632c : data : : 1307103848 : "Hi, how are ..."

  Note the empty qualifier!

                                                          47
PARTIAL KEY SCANS

• Make sure to pad the value of each field in a composite row key, to ensure
  the sorting order you expect

                                                          48
PARTIAL KEY SCANS

• Set startRow and stopRow (see the sketch below)
  • Set startRow to the exact user ID
    • Scan.setStartRow(…)
  • Set stopRow to the user ID + 1
    • Scan.setStopRow(…)
• Control the sorting order
  • Long.MAX_VALUE - <date-as-long>
  • For strings, XOR each character with 0xFF:

    String s = "Hello,";
    for (int i = 0; i < s.length(); i++) {
      System.out.print(Integer.toString(s.charAt(i) ^ 0xFF, 16) + " ");
    }
    // prints: b7 9a 93 93 90 d3

                                                          49
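A minimal sketch of such a range scan over the tall-narrow mailbox keys
(<userId>-<messageId>) shown earlier; the table name "msgs" is hypothetical:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class PartialKeyScan {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "msgs");
      Scan scan = new Scan();
      scan.setStartRow(Bytes.toBytes("12345-"));  // all messages of user 12345
      scan.setStopRow(Bytes.toBytes("12346-"));   // user ID + 1, exclusive
      ResultScanner scanner = table.getScanner(scan);
      for (Result res : scanner) {
        System.out.println(Bytes.toString(res.getRow()));
      }
      scanner.close();
      table.close();
    }
  }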
PAGINATION

• Use filters
  • PageFilter
  • ColumnPaginationFilter
• Steps (a sketch follows below)
  1. Open a scanner at the start row
  2. Skip offset rows
  3. Read the next limit rows and return them to the caller
  4. Close the scanner
• Use case
  • A web-based email client
  • Read emails 1-50 first, then 51-100, and so on

                                                          50
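A minimal sketch of reading one page with PageFilter; the table name and
start key reuse the hypothetical mailbox layout, and skipping the offset rows
is left to the caller, as described in the steps above:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.filter.PageFilter;
  import org.apache.hadoop.hbase.util.Bytes;

  public class PaginationScan {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "msgs");
      Scan scan = new Scan();
      scan.setStartRow(Bytes.toBytes("12345-"));  // resume at last seen row + 1
      scan.setFilter(new PageFilter(50));         // at most 50 rows per page
      ResultScanner scanner = table.getScanner(scan);
      for (Result res : scanner) {
        System.out.println(Bytes.toString(res.getRow()));
      }
      scanner.close();
      table.close();
    }
  }

Note that PageFilter is evaluated per region server, so a scan spanning
several regions can return slightly more than the page size; the client
should stop reading once its own limit is reached.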
TIME SERIES DATA

• Dealing with stream processing of events
  • The most common use case is time series data
• The data could be coming from
  • A sensor in a power grid
  • A stock exchange
  • A monitoring system for computer systems
• Their row key represents the event time
  • The sequential, monotonously increasing nature of time series data causes
    all incoming data to be written to the same region
  • Hot-spot issue

                                                          51
TIME SERIES DATA

• Overcome this problem
  • By prefixing the row key with a nonsequential prefix
• Common choices
  • Salting
  • Field swap/promotion
  • Randomization

                                                          52
TIME SERIES DATA - SALTING

• Use a salting prefix to the key that guarantees a spread of all rows across
  all region servers:

  long timestamp = ...;
  int numRegionServers = ...;
  // spelled out; the original pseudocode used Long.hashCode(timestamp)
  byte prefix = (byte) (Math.abs(Long.valueOf(timestamp).hashCode())
      % numRegionServers);
  byte[] rowkey = Bytes.add(new byte[] { prefix }, Bytes.toBytes(timestamp));

• Which results in keys such as

  0myrowkey-1
  0myrowkey-4
  1myrowkey-2
  1myrowkey-5
  ...

                                                          53
TIME SERIES DATA - SALTING

• Access to a range of rows must be fanned out
  • Read with <number of region servers> get or scan calls
• Is this good or bad?
  • You can use multiple threads to read this data from distinct servers
  • Whether it pays off needs further study of the access pattern, plus
    trial runs

                                                          54
TIME SERIES DATA – SALTING USECASE

• An open source crash reporter named Socorro, from the Mozilla organization
  • For Firefox and Thunderbird
  • Reports are subsequently read and analyzed by the Mozilla development
    team
• Technologies
  • Python-based client code
  • Communicates with the HBase cluster using Thrift

Mozilla wiki for Socorro - https://wiki.mozilla.org/Socorro

                                                          55
TIME SERIES DATA – SALTING USECASE

• How the client merges the previously salted, sequential keys when doing a
  scan operation:

  for salt in '0123456789abcdef':
      salted_prefix = "%s%s" % (salt, prefix)
      scanner = self.client.scannerOpenWithPrefix(table, salted_prefix, columns)
      iterators.append(salted_scanner_iterable(self.logger, self.client,
          self._make_row_nice, salted_prefix, scanner))

                                                          56
TIME SERIES DATA – FIELD SWAP/PROMOTION

• Uses the composite row key concept
  • Move the timestamp to a secondary position in the row key
• If you already have a row key with more than one field
  • Swap them
• If you have only the timestamp as the current row key
  • Promote another field from the column keys into the row key
  • You can even promote the value
• Caveat: you can only access data, especially time ranges, for a given
  swapped or promoted field (see the sketch below)

                                                          57
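A minimal sketch of field promotion, moving a metric ID in front of the
timestamp; the metric name "sys.cpu.user" is illustrative:

  import org.apache.hadoop.hbase.util.Bytes;

  public class PromotedKey {
    public static void main(String[] args) {
      byte[] metricId = Bytes.toBytes("sys.cpu.user");  // promoted field first
      long baseTimestamp = System.currentTimeMillis() / 1000;
      // <metric-id><base-timestamp>: rows of one metric stay contiguous, so
      // time-range scans work per metric, but only per metric
      byte[] rowkey = Bytes.add(metricId, Bytes.toBytes(baseTimestamp));
      System.out.println(Bytes.toStringBinary(rowkey));
    }
  }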
TIME SERIES DATA – FIELD SWAP/PROMOTION USECASE

• OpenTSDB
  • A time series database
  • Stores metrics about servers and services, gathered by external
    collection agents
  • All of the data is stored in HBase
  • The system UI enables users to query various metrics, combining and/or
    downsampling them, all in real time
• The schema promotes the metric ID into the row key
  • <metric-id><base-timestamp>...

http://opentsdb.net/

                                                          58
TIME SERIES DATA – FIELD SWAP/PROMOTION USECASE

• Example OpenTSDB Schema - http://opentsdb.net/schema.html

                                                          59
TIME-ORDERED RELATIONS

• You can also store related, time-ordered data
  • By using the columns of a table
• Since all of the columns are sorted per column family
  • Treat this sorting as a replacement for a secondary index
• For a small number of indexes, you can create a column family for each
  • For a large number of indexes, consider the secondary-index approaches
    discussed later in this deck
• HBase currently (0.95) does not do well with anything above two or three
  column families
  • Because flushing and compactions are done on a per-region basis
  • This can cause a lot of needless I/O load

http://hbase.apache.org/book/number.of.cfs.html

                                                          61
TIME-ORDERED RELATIONS – EXAMPLE

• Column name = <indexId> + "-" + <value>
• Column value
  • Key in the data column family
  • Or redundant values from the data column family, for performance
    • Denormalization
• A maintenance sketch follows after the listing

  ... // data
  12345 : data : 5fc38314-e290-ae5da5fc375d : 1307097848 : "Hi Lars, ..."
  12345 : data : 725aae5f-d72e-f90f3f070419 : 1307099848 : "Welcome, and ..."
  12345 : data : cc6775b3-f249-c6dd2b1a7467 : 1307101848 : "To Whom It ..."
  12345 : data : dcbee495-6d5e-6ed48124632c : 1307103848 : "Hi, how are ..."
  ... // ascending index for the from email address
  12345 : index : idx-from-asc-mary@foobar.com : 1307099848 : 725aae5f-d72e...
  12345 : index : idx-from-asc-paul@foobar.com : 1307103848 : dcbee495-6d5e...
  12345 : index : idx-from-asc-pete@foobar.com : 1307097848 : 5fc38314-e290...
  12345 : index : idx-from-asc-sales@ignore.me : 1307101848 : cc6775b3-f249...
  ... // descending index for email subjects
  12345 : index : idx-subject-desc-\xa8\x90\x8d\x93\x9b\xde :
          1307103848 : dcbee495-6d5e-6ed48124632c
  12345 : index : idx-subject-desc-\xb7\x9a\x93\x93\x90\xd3 :
          1307099848 : 725aae5f-d72e-f90f3f070419

                                                          62
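A minimal sketch of maintaining such an in-row index, assuming a hypothetical
"mailbox" table with the layout above; because both cells live in the same
row, a single Put updates data and index together:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SameRowIndexPut {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "mailbox");  // hypothetical table
      Put put = new Put(Bytes.toBytes("12345"));
      // the data cell, keyed by message ID
      put.add(Bytes.toBytes("data"), Bytes.toBytes("725aae5f-d72e-f90f3f070419"),
          Bytes.toBytes("Welcome, and ..."));
      // the index cell: qualifier = <indexId>-<value>, value = data column key
      put.add(Bytes.toBytes("index"), Bytes.toBytes("idx-from-asc-mary@foobar.com"),
          Bytes.toBytes("725aae5f-d72e-f90f3f070419"));
      table.put(put);  // single-row Puts are atomic in HBase
      table.close();
    }
  }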
SECONDARY INDEXES

• HBase has no native support for secondary indexes
• But there are use cases that need them
  • Look up a cell not just by the primary coordinates
    • The row key, column family name, and qualifier
  • But also by an alternative coordinate
  • Scan a range of rows from the main table, but ordered by the secondary
    index
• Secondary indexes store a mapping between the new coordinates and the
  existing ones

                                                          63
SECONDARY INDEXES - CLIENT-MANAGED

• Moves the responsibility into the application layer
  • Combines a data table and one (or more) lookup/mapping tables
• Write data
  • Write into the data table and also update the lookup tables
• Read data
  • Either a direct lookup in the main table, or
  • A lookup in a secondary index table, then retrieval of the data from the
    main table

                                                          64
SECONDARY INDEXES - CLIENT-MANAGED

• Atomicity
  • There is no cross-row atomicity
  • So write to the secondary index tables first, then write to the data
    table at the end of the operation (see the sketch below)
  • Use asynchronous, regular pruning jobs to clean up stale index entries
• Drawback: it is hardcoded in your application
  • Needs to evolve with overall schema changes, and new requirements

                                                          65
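A minimal sketch of the write path for a client-managed index, in the order
just described: index table first, data table last. All table, family, and
key names here are hypothetical:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ClientManagedIndexWrite {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable index = new HTable(conf, "emails-by-from");  // lookup table
      HTable data = new HTable(conf, "emails");           // data table

      byte[] dataKey = Bytes.toBytes("12345-725aae5f-d72e-f90f3f070419");
      // 1. update the lookup table: secondary coordinate -> primary row key
      Put idxPut = new Put(Bytes.toBytes("mary@foobar.com"));
      idxPut.add(Bytes.toBytes("ref"), dataKey, new byte[0]);
      index.put(idxPut);
      // 2. then write the actual data; a crash in between leaves only a
      //    dangling index entry for the asynchronous pruning job to remove
      Put dataPut = new Put(dataKey);
      dataPut.add(Bytes.toBytes("data"), new byte[0],
          Bytes.toBytes("Welcome, and ..."));
      data.put(dataPut);

      index.close();
      data.close();
    }
  }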
SECONDARY INDEXES - INDEXED-TRANSACTIONAL HBASE

• The Indexed-Transactional HBase (ITHBase) project
  • It extends HBase by adding special implementations of the client- and
    server-side classes
• Extension
  • The core extension is the addition of transactions
    • It guarantees that all secondary index updates are consistent
  • Most client and server classes are replaced by ones that handle indexing
    support
• Drawbacks
  • May not support the latest version of HBase available
  • Adds a considerable amount of synchronization overhead that results in
    decreased performance

https://github.com/hbase-trx/hbase-transactional-tableindexed

                                                          66
SECONDARY INDEXES - INDEXED HBASE

• Indexed HBase (IHBase)
  • Forfeits the use of separate tables for each index; maintains them purely
    in memory
  • This approach is much faster than the previous one
• How the indexes are maintained
  • The indexes are generated when
    • A region is opened for the first time
    • A memstore is flushed to disk
  • The index is never out of sync, and no explicit transactional control is
    necessary
• Drawbacks
  • It is quite intrusive; it requires an additional JAR and a config file
  • It needs extra resources; it trades extra memory for saved I/O
  • It may not be available for the latest version of HBase

https://github.com/ykulbak/ihbase

                                                          67
SECONDARY INDEXES - COPROCESSOR

• Implement an indexing solution based on coprocessors
  • Using the server-side hooks, e.g. RegionObserver
  • Use a coprocessor to load the indexing layer for every region, which
    would subsequently handle the maintenance of the indexes
  • Use the scanner hooks to transparently iterate over a normal data table,
    or an index-backed view of the same
• Currently in development
  • JIRA ticket
    • https://issues.apache.org/jira/browse/HBASE-2038

                                                          68
SEARCH INTEGRATION

• Using indexes
  • You are still confined to the keys the user predefined
• Search-based lookup
  • Supports keys of an arbitrary nature
  • Often backed by full search engine integration
• The following are a few possible approaches

                                                          69
SEARCH INTEGRATION - CLIENT-MANAGED

• Example: Facebook inbox search
• The schema is built roughly like this
  • Every row is a single inbox, that is, every user has a single row in the
    search table
  • The columns are the terms indexed from the messages
  • The versions are the message IDs
  • The values contain additional information, such as the position of the
    term in the document

  <inbox> : <COL_FAM_1> : <term> : <messageId> : <additionalInfo>

                                                          70
SEARCH INTEGRATION - LUCENE

• Apache Lucene
  • Lucene Core
    • Provides Java-based indexing and search technology
  • Solr
    • A high-performance search server built using Lucene Core
• Steps
  1. HBase only stores the data
  2. The BuildTableIndex class scans an entire data table and builds the
     Lucene indexes
  3. They end up as directories/files on HDFS
  4. These indexes can be downloaded to a Lucene-based server for local use
  5. A search performed via Lucene returns row keys, which are then used for
     random lookups into the data table for the specific values

                                                          71
SEARCH INTEGRATION - COPROCESSORS

• Currently in development
• Similar to the use of coprocessors to build secondary indexes
  • Complements a data table with Lucene-based search functionality
• Ticket in JIRA
  • https://issues.apache.org/jira/browse/HBASE-3529

                                                          72
TRANSACTION

• Transactions are an immature aspect of HBase
  • Largely a consequence of the trade-offs imposed by the CAP theorem
• Here are two possible solutions
  • Transactional HBase
    • Comes with the aforementioned ITHBase
  • ZooKeeper
    • Comes with a lock recipe that can be used to implement a two-phase
      commit protocol
    • http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_twoPhasedCommit

                                                          73
BLOOM FILTERS

• Problem: the block index is coarse
  • 1GB store file / 64KB block size = 16,384 blocks
  • 1GB store file / 200 bytes cell size ≈ 5,000,000 cells
  • The block index stores only the start row key of each block
  • And there are usually a number of store files within one column family,
    so a lookup may read blocks from files that do not contain the row at all
• Bloom filters let lookups skip store files that definitely do not contain
  the requested row
  • Allowing you to improve lookup times
  • Since they add overhead in terms of storage and memory, they are turned
    off by default

                                                          74
BLOOM FILTERS – WHY USE IT ?

                                                          75
BLOOM FILTERS – DO WE NEED IT ?

• If possible, you should try to use the row-level Bloom filter (see the
  sketch below)
  • A good balance between the additional space requirements and the gain in
    performance
• Only resort to the more costly row+column Bloom filter
  • When you would gain no advantage from using the row-level one

                                                          76
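A minimal sketch of turning on a row-level Bloom filter at table-creation
time, against the 0.92-era API (where the type enum lives in StoreFile);
table and family names are illustrative:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.regionserver.StoreFile;

  public class RowBloomFilterTable {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HColumnDescriptor colfam = new HColumnDescriptor("data");
      colfam.setBloomFilterType(StoreFile.BloomType.ROW);  // ROWCOL for row+column
      HTableDescriptor desc = new HTableDescriptor("testtable");
      desc.addFamily(colfam);
      HBaseAdmin admin = new HBaseAdmin(conf);
      admin.createTable(desc);
    }
  }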
BACKUPS

                                                          77
Time to kick back, shall we? (  ̄ 3 ̄)Y▂Ξ

                                                          78