HBaseConAsia2018 Track1-6: Separating hot-cold data into heterogeneous storage based on layered compaction

WenLong Yang of Alibaba

  1. Separating hot-cold data into heterogeneous storage based on layered compaction. Allan Yang (HBase Committer)
  2. Content
     01 Typical Scenarios at Alibaba
     02 Hot-cold Data Separation
        — Hot-cold Data Recognition
        — Layered Compaction
        — Query Optimizations
     03 Conclusions
  3. 01 Typical Scenarios at Alibaba
  4. Typical Scenarios at Alibaba: Contacts & Chat, AI Bots, Risk Control, Bills, Logistics tracking, GMV
  5. Typical Scenarios: commonality in these scenarios
     • Massive amounts of data
     • No TTLs
     • Only a very small part of the data is frequently visited
     • Hotspots change as time goes by
  6. Definitions
     Hot data:
     • Accessed very frequently
     • Relatively small amount
     • Low latency is critical
     Cold data:
     • Accessed rarely
     • Large amount
     • Cost matters more
  7. 02 Hot-cold Data Separation
  8. Old Architecture
     (Figure: client plus separate hot and cold tables, kept in sync via coprocessor / CopyTable / replication; hot and cold queries hit different tables)
     Pros:
     • Simple, no HBase code change needed
     Cons:
     • High maintenance cost
     • The client must be aware of the separation
     • Hard to keep consistency
  9. Current Architecture
     • Separates hot and cold data automatically within a single table
     • Transparent to the user
     • Different storage policy for each layer
     • Automatic query optimization
  10. Hot-cold Data Separation — Hot-cold Data Recognition — Layered Compaction — Query Optimizations
  11. Separating hot-cold data: the problems of separating data by KeyValue timestamp
      • The timestamp may not represent the "heat" of the business data very well
      • The KeyValue timestamp is also used as the version number in HBase
      e.g. an order is written with a ts ahead of the current time
      e.g. the data source (Kafka, Spark, …) is delayed, resulting in a ts lag
  12. Secondary Field Slicer
      Besides the ts, we provide a way to parse a Secondary Field out of the rowkey and use it as the boundary of hot/warm/cold data.
      • FixPosFieldSlicer: rowkey = fixed-size prefix (e.g. a 16-bit UserID), then a 64-bit timestamp (the Secondary Field), then a postfix
      • DelimiterFieldSlicer: rowkey = variable-size prefix, a delimiter (e.g. '#'), then a 64-bit timestamp (the Secondary Field), then a postfix
      (A minimal code sketch of the two slicers follows below.)
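      The FixPosFieldSlicer and DelimiterFieldSlicer are internal to this work and their API is not shown in the talk; the following is a minimal Java sketch of the idea, assuming the Secondary Field is an 8-byte timestamp. All class and method names below are hypothetical.

      import org.apache.hadoop.hbase.util.Bytes;

      // Hypothetical sketch: both slicers extract a "Secondary Field"
      // (here an 8-byte timestamp) from a rowkey.
      interface SecondaryFieldSlicer {
          long slice(byte[] rowkey);
      }

      // Fixed-position variant: the field starts right after a fixed-size prefix.
      class FixPosFieldSlicer implements SecondaryFieldSlicer {
          private final int prefixLength;                 // e.g. 2 bytes for a 16-bit UserID
          FixPosFieldSlicer(int prefixLength) { this.prefixLength = prefixLength; }
          public long slice(byte[] rowkey) {
              return Bytes.toLong(rowkey, prefixLength);  // 8-byte ts right after the prefix
          }
      }

      // Delimiter variant: the field starts right after the first delimiter byte.
      class DelimiterFieldSlicer implements SecondaryFieldSlicer {
          private final byte delimiter;                   // e.g. '#'
          DelimiterFieldSlicer(byte delimiter) { this.delimiter = delimiter; }
          public long slice(byte[] rowkey) {
              for (int i = 0; i < rowkey.length; i++) {
                  if (rowkey[i] == delimiter) {
                      return Bytes.toLong(rowkey, i + 1);
                  }
              }
              throw new IllegalArgumentException("delimiter not found in rowkey");
          }
      }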
  13. Hot-cold Data Separation — Hot-cold Data Recognition — Layered Compaction — Query Optimizations
  14. Default compaction in HBase: the default compaction strategy is size-tiered, which aims to compact small files into bigger ones.
  15. Size-Tiered Compaction Strategy
      • Size is the only concern
      • Old data and new data spread across all HFiles
      • Cannot be used for separating hot and cold data
      (Figure: the time range of each HFile vs. the time range we want)
  16. Date-Tiered Compaction Strategy (HBASE-15181)
      Data is grouped into time windows, and multiple time windows are compacted into one tier as time goes by; the older the data, the bigger the tier.
      (Figure: logical view vs. physical view of the time windows; a configuration sketch follows below.)
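      For reference, stock HBase turns on date-tiered compaction through table or column-family configuration. Below is a sketch using the property names documented for HBASE-15181; the table name, family, and window sizes are placeholders rather than values from the talk.

      import org.apache.hadoop.hbase.TableName;
      import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
      import org.apache.hadoop.hbase.client.TableDescriptor;
      import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
      import org.apache.hadoop.hbase.util.Bytes;

      public class DateTieredExample {
          // Builds a table descriptor that uses the date-tiered store engine.
          static TableDescriptor metricsTable() {
              return TableDescriptorBuilder.newBuilder(TableName.valueOf("metrics"))
                  .setValue("hbase.hstore.engine.class",
                            "org.apache.hadoop.hbase.regionserver.DateTieredStoreEngine")
                  // keep tiering data for up to ~30 days (illustrative)
                  .setValue("hbase.hstore.compaction.date.tiered.max.storefile.age.millis",
                            String.valueOf(30L * 24 * 3600 * 1000))
                  // 6-hour base window, 4 windows per tier (illustrative)
                  .setValue("hbase.hstore.compaction.date.tiered.base.window.millis",
                            String.valueOf(6L * 3600 * 1000))
                  .setValue("hbase.hstore.compaction.date.tiered.windows.per.tier", "4")
                  .setColumnFamily(ColumnFamilyDescriptorBuilder.of(Bytes.toBytes("f")))
                  .build();
          }
      }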
  17. Separating hot-cold data
      Our layered compaction is inspired by date-tiered compaction.
      • Only a Cold/Warm/Hot window is needed
      • Data moves from the hot window to the warm window and then to the cold window
      • The Secondary Field or the timestamp is used as the boundary
  18. Layered Compaction
      • HFiles flushed by the MemStore always land in L0
      • The Hot/Warm/Cold layers each have their own compaction strategy
      • Data is separated by the Secondary Field or by timestamp
      • Data that falls outside a layer's boundary is compacted out to the next layer
  19. Layered Compaction
      • The compactor outputs multiple HFiles according to the separation boundary (see the splitting sketch below)
      • The Secondary Field range is written into the FileInfo section of each HFile
      e.g. with rowkey = userid + ts, an input HFile containing UserA002, UserA005, UserB003, UserB007 has Secondary Field range 002…007; the compactor splits it into a hot HFile and a cold HFile, each carrying its own Secondary Field range
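      The layered compactor itself is not shown in the talk; the sketch below only illustrates the splitting step under stated assumptions: cells are routed to a hot or a cold output by comparing the Secondary Field against the boundary, and the observed range is tracked so it could later be written into each output HFile's FileInfo. The CellSink interface and all names here are hypothetical stand-ins.

      import java.util.List;
      import java.util.function.ToLongFunction;
      import org.apache.hadoop.hbase.Cell;
      import org.apache.hadoop.hbase.CellUtil;

      // Hypothetical sketch of the boundary split performed during layered compaction.
      class LayeredCompactionSplitSketch {
          interface CellSink { void append(Cell cell); }   // stand-in for an HFile writer

          long hotMin = Long.MAX_VALUE, hotMax = Long.MIN_VALUE;
          long coldMin = Long.MAX_VALUE, coldMax = Long.MIN_VALUE;

          void split(List<Cell> sortedCells, long boundary,
                     ToLongFunction<byte[]> slicer, CellSink hotOut, CellSink coldOut) {
              for (Cell cell : sortedCells) {
                  long field = slicer.applyAsLong(CellUtil.cloneRow(cell));
                  if (field >= boundary) {                 // still within this layer's boundary
                      hotMin = Math.min(hotMin, field);
                      hotMax = Math.max(hotMax, field);
                      hotOut.append(cell);
                  } else {                                 // out of boundary: goes to the next layer
                      coldMin = Math.min(coldMin, field);
                      coldMax = Math.max(coldMax, field);
                      coldOut.append(cell);
                  }
              }
              // hotMin…hotMax and coldMin…coldMax are the "Secondary Field Range" values
              // that would be recorded in each output HFile's FileInfo section.
          }
      }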
  20. Heterogeneous storage
      We can specify data block encoding, compression, and storage type for each layer. Here is an example (a column-family-level sketch follows below):
      Layer | Data Encoding | Compression | Storage
      Hot   | None          | None        | SSD / RAM
      Warm  | DIFF          | LZO         | One_SSD
      Cold  | DIFF          | LZ4         | HDD / EC
      (Figure: HDFS placements such as All_RAM, All_SSD, One_SSD, All_HDD and Erasure-Coding)
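      Per-layer settings are part of the custom layered compaction described here; stock HBase only exposes these knobs per column family. As a rough per-family analogue of the Warm row above (DIFF encoding, LZO compression, the HDFS ONE_SSD policy), assuming an HBase version that supports column-family-level storage policies:

      import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
      import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
      import org.apache.hadoop.hbase.io.compress.Compression;
      import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
      import org.apache.hadoop.hbase.util.Bytes;

      public class WarmFamilyExample {
          // Approximates the "Warm" row of the table above at column-family granularity.
          static ColumnFamilyDescriptor warmFamily() {
              return ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f"))
                  .setDataBlockEncoding(DataBlockEncoding.DIFF)
                  .setCompressionType(Compression.Algorithm.LZO)   // needs the native LZO codec
                  .setStoragePolicy("ONE_SSD")                     // HDFS storage policy name
                  .build();
          }
      }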
  21. Storage-Computing Separation
      • Apsara HBase provides an architecture that separates storage from computing
      • High-density HDD will be available in Apsara HBase around this September
      Welcome to try Apsara HBase at https://www.aliyun.com/product/hbase
  22. Hot-cold Data Separation — Hot-cold Data Recognition — Layered Compaction — Query Optimizations
  23. HBase Read Path
      A quick tour of the HBase read path: a scan merges the MemStore and the HFiles of each store through a KeyValue heap; an HFile (HFile4 in the figure) can be filtered out before any seek by:
      • Bloom filter
      • Time range
      • Key range
  24. Goal of Query Optimization
      • Query optimization is only for hot queries
      • We try our best to filter out the cold HFiles and avoid seeking in them
      • Seeking in cold HFiles can tremendously increase the RT of hot queries
  25. Query Optimization: Case 1
      • Scenario: monitoring, e.g. OpenTSDB
      • Rowkey: MetricName + ts + postfix(tags)
      Data (rowkey / ts): cpuA001server1/001 … cpuA004server1/004, diskB001server1/001 … diskB004server1/004
      Separate data by boundary ts = 003:
      • HFile(hot): cpuA003server1, cpuA004server1, diskB003server1, diskB004server1; time range 003…004
      • HFile(cold): cpuA001server1, cpuA002server1, diskB001server1, diskB002server1; time range 001…002
      Query: Scan scan = new Scan(cpuA003, cpuA004)
      Optimization: Scan.setTimeRange(003, 004); the cold HFile can easily be filtered out by its time range (see the sketch below)
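      Scan and setTimeRange are part of the standard HBase client API; below is a small runnable sketch of the query on this slide, using the HBase 2.x client, with the table name and column family as placeholders and the slide's three-digit timestamps taken literally.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.TableName;
      import org.apache.hadoop.hbase.client.Connection;
      import org.apache.hadoop.hbase.client.ConnectionFactory;
      import org.apache.hadoop.hbase.client.Result;
      import org.apache.hadoop.hbase.client.ResultScanner;
      import org.apache.hadoop.hbase.client.Scan;
      import org.apache.hadoop.hbase.client.Table;
      import org.apache.hadoop.hbase.util.Bytes;

      public class HotScanExample {
          public static void main(String[] args) throws Exception {
              Configuration conf = HBaseConfiguration.create();
              try (Connection conn = ConnectionFactory.createConnection(conf);
                   Table table = conn.getTable(TableName.valueOf("metrics"))) {   // placeholder table
                  Scan scan = new Scan()
                      .withStartRow(Bytes.toBytes("cpuA003"))
                      .withStopRow(Bytes.toBytes("cpuA004"), true);   // inclusive stop row
                  // Restrict the scan to the hot time range; the cold HFile
                  // (time range 001…002) is then filtered out by its metadata.
                  scan.setTimeRange(3L, 5L);   // maxStamp is exclusive, so this covers ts 003 and 004
                  try (ResultScanner rs = table.getScanner(scan)) {
                      for (Result r : rs) {
                          System.out.println(Bytes.toString(r.getRow()));
                      }
                  }
              }
          }
      }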
  26. Query Optimization: Case 2
      • Scenario: tracing system
      • Rowkey: TraceID (events are recorded in different columns)
      Data (rowkey / ts): traceid1/001 … traceid8/008
      Separate data by boundary ts = 004:
      • HFile(hot): traceid5 … traceid8, with a Bloom filter
      • HFile(cold): traceid1 … traceid4, with a Bloom filter
      Query: Get get = new Get("traceid7")
      Optimization: the cold HFile can be filtered out by its Bloom filter (see the sketch below)
      Problem: Bloom filter false positives can cause latency spikes
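      Row-level Bloom filters are a standard per-column-family option in HBase (BloomType.ROW); the sketch below shows the table setup and the point Get from this slide, with the table and family names as placeholders.

      import org.apache.hadoop.hbase.TableName;
      import org.apache.hadoop.hbase.client.Admin;
      import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
      import org.apache.hadoop.hbase.client.Connection;
      import org.apache.hadoop.hbase.client.ConnectionFactory;
      import org.apache.hadoop.hbase.client.Get;
      import org.apache.hadoop.hbase.client.Result;
      import org.apache.hadoop.hbase.client.Table;
      import org.apache.hadoop.hbase.client.TableDescriptor;
      import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
      import org.apache.hadoop.hbase.regionserver.BloomType;
      import org.apache.hadoop.hbase.util.Bytes;

      public class TraceGetExample {
          public static void main(String[] args) throws Exception {
              try (Connection conn = ConnectionFactory.createConnection();
                   Admin admin = conn.getAdmin()) {
                  // Row-level Bloom filters let a Get skip HFiles that cannot contain
                  // the requested rowkey (subject to false positives).
                  TableDescriptor trace = TableDescriptorBuilder
                      .newBuilder(TableName.valueOf("trace"))
                      .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f"))
                          .setBloomFilterType(BloomType.ROW)
                          .build())
                      .build();
                  admin.createTable(trace);

                  try (Table table = conn.getTable(TableName.valueOf("trace"))) {
                      Result result = table.get(new Get(Bytes.toBytes("traceid7")));
                      System.out.println(result);
                  }
              }
          }
      }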
  27. Lazy Seek (HBASE-4465)
      Instead of really seeking, each HFile scanner first positions itself on a fake row: the requested row/column with the biggest ts possible for that file, so the KeyValue heap can order the scanners without touching the files (a toy sketch follows below).
      Example: HFile1 holds Row1,f:q,008 / Row2,f:q,007 / Row2,f:q,006 / Row3,f:q,005 (time range 005…008); HFile2 holds Row1,f:q,001 / Row2,f:q,003 / Row2,f:q,002 / Row3,f:q,004 (time range 001…004).
      Query: select row >= Row2,f:q with limit = 1. HFile1 contributes the fake row Row2,f:q,008 and HFile2 the fake row Row2,f:q,004; only HFile1 is really seeked, and the query is answered before HFile2 ever needs a real seek.
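      The real lazy-seek logic lives inside HBase's StoreFileScanner and KeyValueHeap; the toy sketch below is not HBase code and only illustrates the ordering trick: each file is positioned on a fake key carrying its maximum timestamp, and a real seek is issued only when that fake key reaches the top of the heap.

      import java.util.Comparator;
      import java.util.PriorityQueue;

      // Toy model of lazy seek: for the same row/column, keys sort by descending ts,
      // so the file with the larger max ts is polled (and really seeked) first.
      public class LazySeekSketch {
          record FileCursor(String name, String row, long ts) {}

          public static void main(String[] args) {
              Comparator<FileCursor> newestFirst =
                  Comparator.comparing(FileCursor::row)
                            .thenComparing(Comparator.comparingLong(FileCursor::ts).reversed());
              PriorityQueue<FileCursor> heap = new PriorityQueue<>(newestFirst);

              // Both scanners sit on FAKE keys: the seek row ("Row2") paired with each
              // file's maximum timestamp, without reading any file blocks yet.
              heap.add(new FileCursor("HFile1(hot)", "Row2", 8));
              heap.add(new FileCursor("HFile2(cold)", "Row2", 4));

              // The hot file surfaces first; only now would a real seek be issued.
              System.out.println("real seek only on " + heap.poll().name());
              // With limit = 1 the query is satisfied by the hot file, so the cold
              // file's real seek never happens.
          }
      }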
  28. Query Optimization: Case 3
      • Scenario: KV store
      • Rowkey: key (with only one qualifier)
      Data (rowkey / ts): Row1,f:q1/001 … Row6,f:q1/006
      Separate data by boundary ts = 004:
      • HFile(hot): Row4,f:q1, Row5,f:q1, Row6,f:q1; Bloom filter; time range 004…006; fake row Row5,f:q1,006
      • HFile(cold): Row1,f:q1, Row2,f:q1, Row3,f:q1; Bloom filter; time range 001…003; fake row Row5,f:q1,003
      Query: Get get = new Get("Row5,f:q1")
      Optimization: even on a Bloom filter false positive, the cold HFile is not really seeked thanks to lazy seek
  29. Query Optimization: Case 4
      • Scenario: logistics tracking in Alibaba
      • Rowkey: traceNo + actionCode + ts
      Data (rowkey / ts): trace1Collect001/001 … trace1Done004/004, trace2Collect005/005 … trace2Done008/008
      Separate data by boundary ts = 004:
      • HFile(hot): trace2Collect005, trace2Arrive006, trace2Delivery007, trace2Done008; time range 005…008
      • HFile(cold): trace1Collect001, trace1Arrive002, trace1Delivery003, trace1Done004; time range 001…004
      Query: Scan scan = new Scan("trace2", "trace2~")
      Problem: the scan only has a prefix, so no time range can be provided
  30. Prefix Bloom Filter
      • Use the prefix part of the rowkey (here the fixed-size, 32-bit traceNo; the actionCode, ts and other parts are ignored) both to generate the Bloom filter and to check against it
      • Each HFile (hot: trace2Collect005 … trace2Done008; cold: trace1Collect001 … trace1Done004) carries a Prefix Bloom Filter
      Query: Scan scan = new Scan("trace2", "trace2~"); the cold HFile can be skipped because the prefix "trace2" does not hit its Prefix Bloom Filter (an illustrative sketch follows below)
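      The Prefix Bloom Filter is a custom feature of this work; the sketch below uses Guava's BloomFilter purely to illustrate generating and checking a filter over a fixed-size rowkey prefix. The prefix length and the example rows come from this slide; everything else is hypothetical.

      import java.nio.charset.StandardCharsets;
      import com.google.common.hash.BloomFilter;
      import com.google.common.hash.Funnels;

      // Illustrative only: build a Bloom filter over the fixed-size rowkey prefix
      // (the traceNo) at write time, and probe it with the scan prefix at read time.
      public class PrefixBloomSketch {
          static final int PREFIX_LEN = 6;   // length of "trace2" in the slide's example data

          static byte[] prefix(String rowkey) {
              return rowkey.substring(0, PREFIX_LEN).getBytes(StandardCharsets.UTF_8);
          }

          public static void main(String[] args) {
              BloomFilter<byte[]> coldFileFilter =
                  BloomFilter.create(Funnels.byteArrayFunnel(), 10_000, 0.01);

              // Generate: the cold HFile only contains trace1* rows.
              for (String row : new String[]{"trace1Collect001", "trace1Arrive002",
                                             "trace1Delivery003", "trace1Done004"}) {
                  coldFileFilter.put(prefix(row));
              }

              // Check: a prefix scan for "trace2" can skip the cold HFile entirely
              // (false positives aside).
              boolean mayContain =
                  coldFileFilter.mightContain("trace2".getBytes(StandardCharsets.UTF_8));
              System.out.println("cold HFile may contain trace2? " + mayContain);
          }
      }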
  31. Query Optimization: Case 5
      • Scenario: bills history in Alibaba
      • Rowkey: userID + reverse(ts) + orderID (a rowkey sketch follows below)
      Data (rowkey / ts): userA991/008, userA992/007, userA993/006, userA994/005, userA995/004, userA996/003, userA997/002, userA998/001
      Separate data by boundary ts = 004:
      • HFile(hot): userA991 … userA994; time range 005…008
      • HFile(cold): userA995 … userA998; time range 001…004
      Query: Scan scan = new Scan("userA"), limit 4
      Problem: the scan only has a prefix, no end key, and no time range can be provided
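      The userID + reverse(ts) rowkey is a common HBase pattern; below is a small sketch of building such a key and of the prefix scan with a limit (Scan#setLimit is available in the HBase 2.x client). Names and sizes are placeholders, not values from the talk.

      import org.apache.hadoop.hbase.client.Scan;
      import org.apache.hadoop.hbase.util.Bytes;

      // Sketch: userID + reverse(ts) + orderID, so a user's newest bills sort first
      // and a prefix scan with limit 4 returns the 4 most recent ones, which live in
      // the hot layer.
      public class BillRowkeySketch {
          static byte[] billRowkey(String userId, long epochMillis, String orderId) {
              long reversedTs = Long.MAX_VALUE - epochMillis;   // newest ts gets the smallest value
              return Bytes.add(Bytes.toBytes(userId),
                               Bytes.toBytes(reversedTs),
                               Bytes.toBytes(orderId));
          }

          static Scan latestBills(String userId, int n) {
              return new Scan()
                  .setRowPrefixFilter(Bytes.toBytes(userId))   // prefix scan, no explicit end key
                  .setLimit(n);                                // e.g. the 4 newest bills
          }
      }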
  32. Secondary Field Lazy Seek
      • Store the Secondary Field range in each HFile's FileInfo section
      • Create a fake key from that range to perform a lazy seek
      Rowkey layout: fixed 32-bit prefix (userID) + ts (the Secondary Field)
      • HFile(hot): userA991 … userA994; Secondary Field range 991…994; fake row userA991
      • HFile(cold): userA995 … userA998; Secondary Field range 995…998; fake row userA995
      Query: Scan scan = new Scan("userA"), limit 4. The hot HFile's fake row sorts first in the KeyValue heap, so only the hot file is really seeked; the 4 results are returned before the cold HFile ever needs a real seek.
  33. Query Optimization: Case 1 revisited
      • Scenario: monitoring, e.g. OpenTSDB
      • Rowkey: MetricName + ts (the Secondary Field) + postfix(tags)
      Data (rowkey / ts): cpuA001server1/001 … cpuA004server1/004, cpuB001server1/001 … cpuB004server1/004
      Separate data by boundary ts = 003:
      • HFile(hot): cpuA003server1, cpuA004server1, cpuB003server1, cpuB004server1; Secondary Field range 003…004
      • HFile(cold): cpuA001server1, cpuA002server1, cpuB001server1, cpuB002server1; Secondary Field range 001…002
      Query: Scan scan = new Scan(cpuA003, cpuA004)
      Optimization: the cold HFile can easily be filtered out by its Secondary Field range
  34. 03 Conclusion
  35. Conclusion
      • A new approach to separating hot and cold data was introduced
      • A new Secondary Field Slicer was used to decide layer boundaries besides the timestamp
      • Layered compaction was used to separate data into different layers
      • Heterogeneous storage was used to balance cost and performance
      • New techniques such as the Prefix Bloom Filter and Secondary Field Range Lazy Seek were used for automatic query optimization
      • Production tests show that our approach can lower query RT by 50% and reduce storage usage by 25%
  36. We are hiring!
      • If you are interested in or familiar with the Hadoop ecosystem or any other NoSQL database
      • If you are eager to accept the challenge of building high-concurrency, low-latency and flexible systems
  37. FAQ
