This document discusses storing and manipulating graphs in HBase. It provides an overview of graph theory concepts and different modeling techniques for storing graphs in HBase, including the adjacency matrix and adjacency list approaches. It also covers techniques for distributed traversal and querying of graphs stored in HBase using MapReduce jobs. Key tips discussed include implementing custom comparators and using utilities like MultiScanTableInputFormat and TableMapReduceUtil.
37. Adjacency List Design in HBase
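| row key               | "edges" column family                            |
|-----------------------|--------------------------------------------------|
| e:dan@fullcontact.com | p:+13039316251 = ...   t:danklynn = ...          |
| p:+13039316251        | e:dan@fullcontact.com = ...   t:danklynn = ...   |
| t:danklynn            | e:dan@fullcontact.com = ...   p:+13039316251 = ... |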
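The row keys and column qualifiers in the table carry a one-letter type prefix (`e:` for an email address, `p:` for a phone number, `t:` presumably for a Twitter handle), so every vertex type shares one key space. Below is a minimal sketch of building and parsing such keys. It assumes UTF-8 strings and a single `:` separator. `VertexKeys`, `vertexKey`, and `parse` are hypothetical names and are not part of HBase.

```java
import java.nio.charset.StandardCharsets;

class VertexKeys {
    // Hypothetical helper: builds a row key / column qualifier such as
    // "e:dan@fullcontact.com" from a one-letter type prefix and an identifier.
    static byte[] vertexKey(char type, String id) {
        return (type + ":" + id).getBytes(StandardCharsets.UTF_8);
    }

    // Splits a key back into {type, identifier}. The first ':' is the separator,
    // so identifiers may themselves contain ':'.
    static String[] parse(byte[] key) {
        String s = new String(key, StandardCharsets.UTF_8);
        int colon = s.indexOf(':');
        return new String[] { s.substring(0, colon), s.substring(colon + 1) };
    }

    public static void main(String[] args) {
        String[] parts = parse(vertexKey('e', "dan@fullcontact.com"));
        System.out.println(parts[0] + " -> " + parts[1]); // prints "e -> dan@fullcontact.com"
    }
}
```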
38. Adjacency List Design in HBase
What to store? (Same table as slide 37, with this callout over the `= ...` cell values.)
41. Don’t get fancy with byte[]
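```groovy
import org.apache.hadoop.io.Writable

class EdgeValueWritable implements Writable {
    EdgeValue edgeValue   // your own edge payload type

    byte[] toBytes() {
        // use strings if you can help it
    }

    static EdgeValueWritable fromBytes(byte[] bytes) {
        // use strings if you can help it
    }

    // Writable also requires write(DataOutput) and readFields(DataInput)
}
```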
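The advice to "use strings if you can help it" can be illustrated with a plain-Java edge payload whose `toBytes`/`fromBytes` pair round-trips through a tab-separated UTF-8 string. This is a sketch only. The `EdgeValue` fields (`weight`, `source`) are hypothetical, so substitute whatever your edges actually carry.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical edge payload: how often the link was seen, and where it came from.
final class EdgeValue {
    final int weight;
    final String source;

    EdgeValue(int weight, String source) {
        this.weight = weight;
        this.source = source;
    }

    // Encode as "weight<TAB>source" in UTF-8: readable in the HBase shell and
    // easy to extend later, at the cost of a few extra bytes per cell.
    byte[] toBytes() {
        return (weight + "\t" + source).getBytes(StandardCharsets.UTF_8);
    }

    // Split at the first tab; the weight never contains one, so tabs inside
    // the source string survive the round trip.
    static EdgeValue fromBytes(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        int tab = s.indexOf('\t');
        return new EdgeValue(Integer.parseInt(s.substring(0, tab)), s.substring(tab + 1));
    }

    public static void main(String[] args) {
        EdgeValue copy = fromBytes(new EdgeValue(3, "import").toBytes());
        System.out.println(copy.weight + " " + copy.source); // prints "3 import"
    }
}
```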
42. Querying by vertex
```groovy
def get = new Get(vertexKeyBytes)
get.addFamily(edgesFamilyBytes)
Result result = table.get(get)
result.noVersionMap.each { family, data ->
    // construct edge objects as needed
    // data is a Map<byte[], byte[]>: qualifier (destination vertex) -> edge value
}
```
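In the snippet above, each qualifier in the `edges` family is a destination vertex key and each cell value is the edge payload. The following self-contained sketch turns one row's family map into edge objects. A `TreeMap` ordered by unsigned bytes (matching how HBase orders qualifiers) stands in for `Result.getFamilyMap(edgesFamilyBytes)`. `EdgeReader`, `Edge`, and `edgesOf` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class EdgeReader {
    // Hypothetical edge object: source vertex, destination vertex, raw value.
    static final class Edge {
        final String from, to, value;
        Edge(String from, String to, String value) {
            this.from = from; this.to = to; this.value = value;
        }
    }

    static String utf8(byte[] b) { return new String(b, StandardCharsets.UTF_8); }
    static byte[] bytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    // familyMap plays the role of Result.getFamilyMap(edgesFamilyBytes):
    // qualifier (destination vertex key) -> cell value (edge payload).
    static List<Edge> edgesOf(byte[] rowKey, NavigableMap<byte[], byte[]> familyMap) {
        String from = utf8(rowKey);
        List<Edge> edges = new ArrayList<>();
        for (Map.Entry<byte[], byte[]> cell : familyMap.entrySet()) {
            edges.add(new Edge(from, utf8(cell.getKey()), utf8(cell.getValue())));
        }
        return edges;
    }

    public static void main(String[] args) {
        // Unsigned lexicographic order, as HBase sorts column qualifiers.
        NavigableMap<byte[], byte[]> family = new TreeMap<byte[], byte[]>(Arrays::compareUnsigned);
        family.put(bytes("t:danklynn"), bytes("..."));
        family.put(bytes("p:+13039316251"), bytes("..."));
        for (Edge e : edgesOf(bytes("e:dan@fullcontact.com"), family)) {
            System.out.println(e.from + " -> " + e.to);
        }
    }
}
```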
43. Adding edges to a vertex
```groovy
def put = new Put(vertexKeyBytes)
put.addColumn(              // Put.add(...) in HBase releases before 1.0
    edgesFamilyBytes,
    destinationVertexBytes,
    edgeValue.toBytes()     // your own implementation here
)
// if writing directly
table.put(put)
// if using TableReducer
context.write(NullWritable.get(), put)
```
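The slide 37 table is symmetric, so an undirected edge takes two writes, one cell in each endpoint's row. The following minimal in-memory sketch shows that pattern. Nested `TreeMap`s stand in for rows and the `edges` family. With real HBase, each `put` call here would correspond to its own `Put` on that row key. `AdjacencyList` and `addUndirectedEdge` are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

class AdjacencyList {
    // row key -> ("edges" family: destination vertex key -> edge value)
    final Map<String, TreeMap<String, String>> rows = new TreeMap<>();

    // One write = one cell in one row, like a single-column Put.
    void put(String rowKey, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(qualifier, value);
    }

    // An undirected edge is stored in both endpoints' rows, so a Get on
    // either vertex returns it.
    void addUndirectedEdge(String a, String b, String value) {
        put(a, b, value);
        put(b, a, value);
    }

    public static void main(String[] args) {
        AdjacencyList g = new AdjacencyList();
        g.addUndirectedEdge("e:dan@fullcontact.com", "p:+13039316251", "...");
        g.addUndirectedEdge("e:dan@fullcontact.com", "t:danklynn", "...");
        g.addUndirectedEdge("p:+13039316251", "t:danklynn", "...");
        g.rows.forEach((row, cols) -> System.out.println(row + " " + cols.keySet()));
    }
}
```

Running `main` reproduces the three rows of the slide 37 table, with each row holding the other two vertices as qualifiers.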
59. Do implement your own comparator
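```java
// nested inside VertexKeyWritable
public static class Comparator extends WritableComparator {
    public Comparator() {
        super(VertexKeyWritable.class); // the key class this comparator handles
    }

    @Override
    public int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
        // compare the serialized keys directly, without deserializing them
        // .....
        return compareBytes(b1, s1, l1, b2, s2, l2); // placeholder: plain byte order
    }
}
```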
60. Do implement your own comparator
```java
// inside VertexKeyWritable: register the raw comparator for this key type
static {
    WritableComparator.define(VertexKeyWritable.class,
        new VertexKeyWritable.Comparator());
}
```
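A raw comparator pays off because MapReduce can order serialized keys during the sort without deserializing them. The following self-contained sketch uses the same argument list as `WritableComparator.compare`. It assumes a hypothetical key layout of a 4-byte big-endian length followed by UTF-8 bytes. `RawVertexKeyComparator` and `serialize` are illustrative names, not Hadoop APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class RawVertexKeyComparator {
    static final int PREFIX = 4; // assumed: 4-byte big-endian length header

    // Hypothetical serialized form: int length, then UTF-8 bytes
    // (the same bytes DataOutput.writeInt followed by write would produce).
    static byte[] serialize(String key) {
        byte[] utf8 = key.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(PREFIX + utf8.length).putInt(utf8.length).put(utf8).array();
    }

    // Same shape as WritableComparator.compare(b1, s1, l1, b2, s2, l2):
    // skip the length header, then compare payload bytes as unsigned values.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n1 = l1 - PREFIX, n2 = l2 - PREFIX;
        int n = Math.min(n1, n2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + PREFIX + i] & 0xff;
            int b = b2[s2 + PREFIX + i] & 0xff;
            if (a != b) return a - b;
        }
        return n1 - n2; // a shorter key that is a prefix of the other sorts first
    }

    public static void main(String[] args) {
        byte[] x = serialize("e:dan@fullcontact.com");
        byte[] y = serialize("t:danklynn");
        System.out.println(compare(x, 0, x.length, y, 0, y.length) < 0); // prints true
    }
}
```

Comparing the bytes including the length header would put `t:danklynn` (length 10) before `e:dan@fullcontact.com` (length 21). The sketch skips the header to avoid that ordering.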
68. Elastic MapReduce
[Diagram: HFiles, SequenceFiles, Copy to S3, Elastic MapReduce; the job writes HFiles via:]
HFileOutputFormat.configureIncrementalLoad(job, outputTable)
69. Elastic MapReduce
[Same diagram, with the resulting HFiles then loaded into HBase:]
$ hadoop jar hbase-VERSION.jar completebulkload
70. Additional Resources
Google Pregel: BSP-based graph processing system
Apache Giraph: Implementation of Pregel for Hadoop
MultiScanTableInputFormat example
Apache Mahout: Distributed machine learning on Hadoop