HBaseCon 2012 | HBase Filtering - Lars George, Cloudera
2. Agenda
1 Introduction
2 Comparison Filters
3 Dedicated Filters
4 Decorating Filters
5 Combining Filters
6 Custom Filters
2 ©2012 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction
or redistribution without written permission is prohibited.
3. About Me
• Solutions Architect @ Cloudera
• Apache HBase & Whirr Committer
• Author of
HBase – The Definitive Guide
• Working with HBase since end
of 2007
• Organizer of the Munich OpenHUG
• Speaker at Conferences (Fosdem,
Hadoop World)
4. Introduction to Filters
• Used in combination with get() and scan()
API calls
• Steps:
– Create Filter instance
– Create Get or Scan instance
– Assign Filter to Get or Scan
– Call API and enjoy
• More fine-grained control over what is
returned to the client
5. Filter Features
• Allow client to further narrow down what is
retrieved
– Not just per row or column key, or per column
family
• Predicate Pushdown
– Move filtering from client to server to reduce
network traffic
• Varying performance implications,
dependent on the use-case
6. Filter Pushdown
7. Filter Features (cont.)
• Filters have access to the entire row to
decide its fate
– Access to KeyValue instances to check row keys,
column qualifiers, timestamps, or values
• Scan batching might conflict with the above
and might trigger an “Incompatible Filter”
exception
– Example: DependentColumnFilter
• There is no cross invocation state
– Cannot filter rows based on dependent rows
8. Available Filters
• Many filters are supplied by HBase
– Based on row key, column family, or column
qualifier
– Paging through rows and columns
– Based on dependencies
• Write your own filters
– Use FilterBase class to get a no-op
skeleton and fill in the gaps
9. Agenda
1 Introduction
2 Comparison Filters
3 Dedicated Filters
4 Decorating Filters
5 Combining Filters
6 Custom Filters
10. Comparison Filters
• Based on CompareFilter class
• Adds the compare() method to
FilterBase
• Takes operator that defines how the
comparison is performed
– Predefined by client API
• Also needs a comparator to do the actual
check
– HBase supplies a large set
11. Comparison Operators
12. Comparators
13. Comparison Filters (cont.)
• Not all combinations of operator and
comparator make sense
– For example, the SubstringComparator
returns only 0 (match) or 1 (no match)
– Only EQUAL and NOT_EQUAL are useful
– Using other operators is allowed but will most
likely yield unexpected results
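The point about operator and comparator combinations can be illustrated with a small plain-Java model. This is a sketch only, not HBase code: the `Op` enum, `substringCompare` and `matches` are hypothetical names, and the sign convention is an assumption. Which of LESS or GREATER degenerates into which depends on the direction in which the real filter applies the comparison; the takeaway is that a comparator that only ever returns 0 or 1 cannot express an ordering.

```java
final class SubstringOperatorDemo {
    // Hypothetical stand-in for the comparison operators named on the slide.
    enum Op { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    // Mirrors the slide's description: 0 on a match, 1 otherwise, never negative.
    static int substringCompare(String needle, String value) {
        return value.contains(needle) ? 0 : 1;
    }

    // Interprets the comparator result against zero, as an ordering-based
    // operator would (assumed convention for this sketch).
    static boolean matches(Op op, int result) {
        switch (op) {
            case LESS:             return result < 0;
            case LESS_OR_EQUAL:    return result <= 0;
            case EQUAL:            return result == 0;
            case NOT_EQUAL:        return result != 0;
            case GREATER_OR_EQUAL: return result >= 0;
            case GREATER:          return result > 0;
        }
        throw new IllegalArgumentException("unknown operator: " + op);
    }

    public static void main(String[] args) {
        int hit = substringCompare("val", "value-1");   // 0
        int miss = substringCompare("xyz", "value-1");  // 1
        for (Op op : Op.values()) {
            System.out.println(op + ": hit=" + matches(op, hit)
                + " miss=" + matches(op, miss));
        }
    }
}
```

In this model one ordering operator never matches, one always matches, and the remaining two collapse into EQUAL and NOT_EQUAL, which is why only those two are worth using.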
14. Comparison Filters (cont.)
• HBase filters are usually filtering data out
• Comparison filters work in reverse as they
include matching data
– Be mindful when selecting the comparison
operator!
15. Available Comparison Filters
• Row Filter
– Based on row key comparisons
• Family Filter
– Based on column family names
• Qualifier Filter
– Based on column names, aka qualifiers
• Value Filter
– Based on the actual value of a column
16. Available Comparison Filters (cont.)
• Dependent Column Filter
– Based on a timestamp of a reference column
– Includes all columns that have the same
timestamp
– Implies that the entire row is accessible, since
batching will not have access to the reference
column
• No scanner batching allowed!
– Useful for loading interdependent changes
within a row
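A rough plain-Java model of the timestamp matching described for the Dependent Column Filter might look like the following. It is a sketch, not the HBase implementation: `Cell`, `dependentColumns` and `sample` are hypothetical names, and dropping the whole row when the reference column is absent is an assumption of this model. Note that the method has to look at the whole row to find the reference timestamp first, which is the reason batching gets in the way.

```java
import java.util.ArrayList;
import java.util.List;

final class DependentColumnDemo {
    // Hypothetical minimal cell: a column qualifier plus its timestamp.
    static final class Cell {
        final String qualifier;
        final long timestamp;
        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    // Keeps the qualifiers whose timestamp equals that of the reference column.
    static List<String> dependentColumns(List<Cell> row, String reference) {
        Long refTs = null;
        for (Cell c : row) {
            if (c.qualifier.equals(reference)) { refTs = c.timestamp; break; }
        }
        List<String> kept = new ArrayList<>();
        if (refTs == null) return kept;  // assumption: no reference, nothing kept
        for (Cell c : row) {
            if (c.timestamp == refTs) kept.add(c.qualifier);
        }
        return kept;
    }

    // Sample row: two columns written together, one written later.
    static List<Cell> sample() {
        List<Cell> row = new ArrayList<>();
        row.add(new Cell("col-a", 100L));
        row.add(new Cell("col-b", 200L));
        row.add(new Cell("ref", 100L));
        return row;
    }

    public static void main(String[] args) {
        System.out.println(dependentColumns(sample(), "ref"));  // [col-a, ref]
    }
}
```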
17. Example Code
// Assumes `table` is an already opened HTable instance
Scan scan = new Scan();
// Restrict the scan to a single column
scan.addColumn(Bytes.toBytes("colfam1"),
    Bytes.toBytes("col-0"));
// Keep rows whose key sorts at or before "row-22"
Filter filter = new RowFilter(
    CompareFilter.CompareOp.LESS_OR_EQUAL,
    new BinaryComparator(Bytes.toBytes("row-22")));
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
for (Result res : scanner) {
    System.out.println(res);
}
scanner.close();
18. Example Output
keyvalues={row-1/colfam1:col-0/1301043190260/Put/vlen=7}
keyvalues={row-10/colfam1:col-0/1301043190908/Put/vlen=8}
keyvalues={row-100/colfam1:col-0/1301043195275/Put/vlen=9}
keyvalues={row-11/colfam1:col-0/1301043190982/Put/vlen=8}
keyvalues={row-12/colfam1:col-0/1301043191040/Put/vlen=8}
keyvalues={row-13/colfam1:col-0/1301043191172/Put/vlen=8}
keyvalues={row-14/colfam1:col-0/1301043191318/Put/vlen=8}
keyvalues={row-15/colfam1:col-0/1301043191429/Put/vlen=8}
keyvalues={row-16/colfam1:col-0/1301043191509/Put/vlen=8}
keyvalues={row-17/colfam1:col-0/1301043191593/Put/vlen=8}
keyvalues={row-18/colfam1:col-0/1301043191673/Put/vlen=8}
keyvalues={row-19/colfam1:col-0/1301043191771/Put/vlen=8}
keyvalues={row-2/colfam1:col-0/1301043190346/Put/vlen=7}
keyvalues={row-20/colfam1:col-0/1301043191841/Put/vlen=8}
keyvalues={row-21/colfam1:col-0/1301043191933/Put/vlen=8}
keyvalues={row-22/colfam1:col-0/1301043191998/Put/vlen=8}
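The output contains row-100 but not row-3, which follows from row keys being compared byte by byte rather than numerically. The plain-Java sketch below reproduces that ordering; `compareKeys` is a hypothetical helper written for illustration and stands in for the byte-wise comparison a binary comparator performs.

```java
import java.nio.charset.StandardCharsets;

final class RowKeyOrderDemo {
    // Lexicographic comparison of the keys' bytes, treating each byte as unsigned.
    static int compareKeys(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff);
            if (diff != 0) return diff;  // first differing byte decides
        }
        return x.length - y.length;      // shorter key sorts first on a tie
    }

    public static void main(String[] args) {
        // '1' sorts before '2', so "row-100" is less than "row-22"
        System.out.println(compareKeys("row-100", "row-22") < 0);  // true
        // '3' sorts after '2', so "row-3" is greater than "row-22"
        System.out.println(compareKeys("row-3", "row-22") > 0);    // true
    }
}
```

Zero-padding numeric parts of a row key (for example row-003, row-022, row-100) is one way to make byte order agree with numeric order.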
19. Agenda
1 Introduction
2 Comparison Filters
3 Dedicated Filters
4 Decorating Filters
5 Combining Filters
6 Custom Filters
20. Dedicated Filters
• Based directly on FilterBase class
• Often less useful for get() calls, since
entire rows are filtered
21. Available Dedicated Filters
• Single Column Value Filter
– Filter rows based on one specific column
– Extra features
• “Filter if missing”
• “Get latest version only”
– Column must be part of the scan selection
• Or else it is all or nothing
– Also needs compare operation and an
optional comparator
22. Available Dedicated Filters (cont.)
• Single Column Value Exclude Filter
– Same as the one before but excludes the
selection column
• Prefix Filter
– Based on prefix of row keys
– Can early out the scan!
• Combine with start row for best performance
23. Available Dedicated Filters (cont.)
• Page Filter
– Allows pagination through rows
– Needs to be combined with setting the start row on
subsequent scans
– Can early out the scan when limit is reached
• Key Only Filter
– Drop the value for every column
• First Key Only Filter
– Return only the first column key
– Useful for row counters, or "get newest post"
type applications
– Can early out rest of row scan
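The Page Filter note about setting the start row on subsequent scans can be sketched in plain Java with a sorted set standing in for the table. This is an illustrative model only: `page` and `nextStart` are hypothetical helpers, and appending a zero character to the last key is one common way to form the smallest key that sorts strictly after it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class PagingDemo {
    // Returns up to pageSize row keys at or after startRow, then stops early.
    static List<String> page(NavigableSet<String> rows, String startRow, int pageSize) {
        List<String> out = new ArrayList<>();
        for (String key : rows.tailSet(startRow, true)) {
            if (out.size() == pageSize) break;  // limit reached: early out
            out.add(key);
        }
        return out;
    }

    // Smallest key that sorts strictly after lastRow.
    static String nextStart(String lastRow) {
        return lastRow + "\0";
    }

    // Sample table with five row keys.
    static NavigableSet<String> sample() {
        NavigableSet<String> rows = new TreeSet<>();
        for (int i = 1; i <= 5; i++) rows.add("row-" + i);
        return rows;
    }

    public static void main(String[] args) {
        List<String> first = page(sample(), "", 2);
        System.out.println(first);  // [row-1, row-2]
        String start = nextStart(first.get(first.size() - 1));
        System.out.println(page(sample(), start, 2));  // [row-3, row-4]
    }
}
```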
24. Available Dedicated Filters (cont.)
• Inclusive Stop Filter
– As opposed to the exclusive stop row, this
filter will include the final row
• Timestamp Filter
– Takes list of timestamps to include in result
• Column Count Get Filter
– Used to limit number of columns returned by a
get() call
25. Available Dedicated Filters (cont.)
• Column Pagination Filter
– Allows paginating through columns within a
row
– Skips to offset parameter and returns
limit columns
• Column Prefix Filter
– Analogous to PrefixFilter, here for matching
column qualifiers
• Random Row Filter
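The offset and limit behaviour listed for the Column Pagination Filter can be pictured as taking a window out of a row's sorted column list. The sketch below is a plain-Java model under that assumption; `columnPage` and `sample` are hypothetical names and the parameter order is chosen for this example only.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnPageDemo {
    // Skips `offset` columns, then returns at most `limit` columns.
    static List<String> columnPage(List<String> columns, int limit, int offset) {
        int from = Math.min(offset, columns.size());
        int to = Math.min(from + limit, columns.size());
        return new ArrayList<>(columns.subList(from, to));
    }

    // Sample row with five column qualifiers in sorted order.
    static List<String> sample() {
        List<String> columns = new ArrayList<>();
        for (int i = 0; i < 5; i++) columns.add("col-" + i);
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(columnPage(sample(), 2, 1));  // [col-1, col-2]
    }
}
```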
26. Agenda
1 Introduction
2 Comparison Filters
3 Dedicated Filters
4 Decorating Filters
5 Combining Filters
6 Custom Filters
27. Decorating Filters
• Extend filters to gain additional control
over the returned data
• Skip Filter
– Skip entire row when a column is filtered
– Not all filters are compatible
• While Match Filter
– Aborts entire scan once the wrapped filter
indicates a row or column is omitted
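The difference between the two decorators can be sketched with a plain-Java model. It is a simplification that works at row granularity and is not HBase code: `skipRows`, `whileMatch` and `sample` are hypothetical names, and the wrapped filter is reduced to a predicate on integer cell values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

final class DecoratorDemo {
    // Skip semantics: a row survives only if every one of its cells passes.
    static List<String> skipRows(Map<String, int[]> rows, IntPredicate keep) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, int[]> row : rows.entrySet()) {
            boolean allPass = true;
            for (int v : row.getValue()) {
                if (!keep.test(v)) { allPass = false; break; }
            }
            if (allPass) out.add(row.getKey());
        }
        return out;
    }

    // While-match semantics: the whole scan ends at the first failing cell.
    static List<String> whileMatch(Map<String, int[]> rows, IntPredicate keep) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, int[]> row : rows.entrySet()) {
            for (int v : row.getValue()) {
                if (!keep.test(v)) return out;  // abort, later rows are never read
            }
            out.add(row.getKey());
        }
        return out;
    }

    // Sample table in scan order; row-2 holds a zero value.
    static Map<String, int[]> sample() {
        Map<String, int[]> rows = new LinkedHashMap<>();
        rows.put("row-1", new int[] {1, 2});
        rows.put("row-2", new int[] {0, 3});
        rows.put("row-3", new int[] {4, 5});
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(skipRows(sample(), v -> v != 0));    // [row-1, row-3]
        System.out.println(whileMatch(sample(), v -> v != 0));  // [row-1]
    }
}
```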
28. Agenda
1 Introduction
2 Comparison Filters
3 Dedicated Filters
4 Decorating Filters
5 Combining Filters
6 Custom Filters
29. Combining Filters
• Implemented by the FilterList class
– Wraps list of filters into a Filter compatible
class
– Takes optional operator to decide how to
handle the results of each wrapped filter
(default: MUST_PASS_ALL)
30. Combining Filters
• Filter lists can contain other filter lists
• Operator is fixed per list, but nesting lists
allows creating combinations
• Using the proper List implementation
helps control filter execution order
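How nesting lets one list use MUST_PASS_ALL while its parent uses MUST_PASS_ONE can be modelled with ordinary Java predicates. The sketch below is not the FilterList implementation: `combine`, `rangeOrExact` and the local `Operator` enum are hypothetical, and filters are reduced to predicates on a row key. Filters are evaluated in list order, so an ordered List keeps that order predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class FilterListDemo {
    // Local stand-in for the two operators named on the slides.
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // Combines predicates in list order with a single, fixed operator.
    static <T> Predicate<T> combine(Operator op, List<Predicate<T>> filters) {
        return item -> {
            for (Predicate<T> f : filters) {
                boolean pass = f.test(item);
                if (op == Operator.MUST_PASS_ALL && !pass) return false;
                if (op == Operator.MUST_PASS_ONE && pass) return true;
            }
            return op == Operator.MUST_PASS_ALL;
        };
    }

    // (key >= "row-03" AND key <= "row-06") OR key == "row-09"
    static Predicate<String> rangeOrExact() {
        Predicate<String> lower = k -> k.compareTo("row-03") >= 0;
        Predicate<String> upper = k -> k.compareTo("row-06") <= 0;
        Predicate<String> exact = k -> k.equals("row-09");
        Predicate<String> range =
            combine(Operator.MUST_PASS_ALL, Arrays.asList(lower, upper));
        return combine(Operator.MUST_PASS_ONE, Arrays.asList(range, exact));
    }

    public static void main(String[] args) {
        Predicate<String> filter = rangeOrExact();
        System.out.println(filter.test("row-04"));  // true
        System.out.println(filter.test("row-07"));  // false
        System.out.println(filter.test("row-09"));  // true
    }
}
```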
31.
List<Filter> filters = new ArrayList<Filter>();
Filter filter1 = new RowFilter(
    CompareFilter.CompareOp.GREATER_OR_EQUAL,
    new BinaryComparator(Bytes.toBytes("row-03")));
filters.add(filter1);
Filter filter2 = new RowFilter(
    CompareFilter.CompareOp.LESS_OR_EQUAL,
    new BinaryComparator(Bytes.toBytes("row-06")));
filters.add(filter2);
Filter filter3 = new QualifierFilter(
    CompareFilter.CompareOp.EQUAL,
    new RegexStringComparator("col-0[03]"));
filters.add(filter3);
FilterList filterList1 = new FilterList(filters);
…
FilterList filterList2 = new FilterList(
    FilterList.Operator.MUST_PASS_ONE, filters);
32. Agenda
1 Introduction
2 Comparison Filters
3 Dedicated Filters
4 Decorating Filters
5 Combining Filters
6 Custom Filters
33. Custom Filter
• Allows users to add missing filters
• Either implement Filter interface or use
FilterBase skeleton
• Provides hooks called at different stages
of the read process
34. Filter Interface
public interface Filter extends Writable {
    public enum ReturnCode {
        INCLUDE, SKIP, NEXT_COL, NEXT_ROW,
        SEEK_NEXT_USING_HINT }
    public void reset();
    public boolean filterRowKey(byte[] buffer,
        int offset, int length);
    public boolean filterAllRemaining();
    public ReturnCode filterKeyValue(KeyValue v);
    public void filterRow(List<KeyValue> kvs);
    public boolean hasFilterRow();
    public boolean filterRow();
    public KeyValue getNextKeyHint(KeyValue
        currentKV);
}
35. Filter Return Codes
36. Merge Reads
37. Filter Flow
• Filter hooks are called at
different stages
• Seeks are done initially to
find the next KeyValue
– Hint from previous filter
invocation might help
• Early out checks improve
performance
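The idea of hooks being called at different stages can be sketched with a toy scan loop in plain Java. This is a heavily simplified model, not the RegionServer's read path: `MiniFilter`, `ValueMatchFilter`, `scan` and `sample` are hypothetical, the hook set is trimmed to four methods, and the exact call order in HBase may differ from the order assumed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FilterFlowDemo {
    // Hypothetical, trimmed-down hook set loosely following the Filter interface.
    interface MiniFilter {
        void reset();                          // clear per-row state
        boolean filterRowKey(String rowKey);   // true = drop row before reading cells
        void filterKeyValue(String value);     // inspect one cell
        boolean filterRow();                   // true = drop row after all cells
    }

    // Keeps a row only if at least one of its cells holds the wanted value.
    static final class ValueMatchFilter implements MiniFilter {
        private final String wanted;
        private boolean filterRow = true;
        ValueMatchFilter(String wanted) { this.wanted = wanted; }
        public void reset() { filterRow = true; }
        public boolean filterRowKey(String rowKey) { return false; }
        public void filterKeyValue(String value) {
            if (wanted.equals(value)) filterRow = false;
        }
        public boolean filterRow() { return filterRow; }
    }

    // Toy scan loop: one pass over the rows, calling the hooks in a fixed order.
    static List<String> scan(Map<String, List<String>> table, MiniFilter f) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : table.entrySet()) {
            f.reset();
            if (f.filterRowKey(row.getKey())) continue;  // early out on the key
            for (String value : row.getValue()) f.filterKeyValue(value);
            if (!f.filterRow()) kept.add(row.getKey());
        }
        return kept;
    }

    // Sample table in scan order.
    static Map<String, List<String>> sample() {
        Map<String, List<String>> table = new LinkedHashMap<>();
        table.put("row-1", Arrays.asList("a", "b"));
        table.put("row-2", Arrays.asList("c"));
        table.put("row-3", Arrays.asList("b"));
        return table;
    }

    public static void main(String[] args) {
        System.out.println(scan(sample(), new ValueMatchFilter("b")));  // [row-1, row-3]
    }
}
```

Because the per-row flag is reset for every row, the filter carries no state from one row to the next.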
38. Example Code
public class CustomFilter extends FilterBase {
    private byte[] value = null;
    private boolean filterRow = true;
    public CustomFilter() { super(); }
    public CustomFilter(byte[] value) { this.value = value; }
    @Override
    public void reset() { this.filterRow = true; }
    @Override
    public ReturnCode filterKeyValue(KeyValue kv) {
        if (Bytes.compareTo(value, kv.getValue()) == 0) {
            filterRow = false;
        }
        return ReturnCode.INCLUDE;
    }
    @Override
    public boolean filterRow() { return filterRow; }
    ...
}
39. Deploying Custom Filters
• Need to provide JAR file with filter class
• Deploy JAR to RegionServers
• Add JAR to HBASE_CLASSPATH
• Restart RegionServers
• Tip: Testing on a cluster is more involved, so
test on a local machine first
40. Summary
41. Summary (cont.)