2. Big Data Loading
● So you've processed your data...
● Now, how to get that to people quickly?
● Project Voldemort's Read-Only stores
● Simple key-value store
● Based upon Amazon Dynamo
● Simple Java interface and operation
● Immutable, read-only stores
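As a rough illustration of what "simple key-value interface" and "immutable, read-only" mean together, here is a minimal sketch. The names KeyValueStore and ImmutableStore are hypothetical and chosen for this example; this is not Voldemort's client API, only the shape of the idea.

```java
import java.util.Map;

// Hypothetical interface for illustration; not Voldemort's actual API.
interface KeyValueStore<K, V> {
    V get(K key);
    void put(K key, V value);
}

// A store built once from precomputed data and never modified afterwards.
final class ImmutableStore<K, V> implements KeyValueStore<K, V> {
    private final Map<K, V> data;

    ImmutableStore(Map<K, V> precomputed) {
        // Defensive, unmodifiable copy: later changes to the input are not visible.
        this.data = Map.copyOf(precomputed);
    }

    @Override
    public V get(K key) {
        return data.get(key); // null when the key is absent
    }

    @Override
    public void put(K key, V value) {
        // Writes are rejected; new data arrives only as a whole new store.
        throw new UnsupportedOperationException("read-only store");
    }
}
```

Because nothing changes after construction, reads need no locking, and updating the data means building a fresh store rather than mutating this one.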
3. Read Only Stores
● Precompute in Hadoop or elsewhere
● Creates an indexed key-value store
● One reducer (or file) per node
● Replicated data for failover
● Atomically loads into nodes
● Copy from HDFS or another HTTP source
● Very fast, limited by network or storage I/O
● Can be throttled so it does not affect live services
● Can also roll back to previous versions
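The atomic load and rollback behaviour listed above can be sketched as follows. This is a simplified in-memory model under stated assumptions, not how Voldemort implements it: the class name VersionedReadOnlyStore and its methods are hypothetical, and a real store would swap files on disk rather than in-memory maps.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of atomic swap and rollback; not Voldemort's code.
class VersionedReadOnlyStore {
    // The version currently served to readers; replaced in one atomic step.
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<>(Map.of());
    // Replaced versions, most recent first, kept so a bad load can be undone.
    private final Deque<Map<String, String>> previous = new ArrayDeque<>();

    String get(String key) {
        return current.get().get(key);
    }

    // Publish a fully built data set. Readers see the old or the new version, never a mix.
    synchronized void swap(Map<String, String> newVersion) {
        previous.push(current.getAndSet(Map.copyOf(newVersion)));
    }

    // Restore the most recently replaced version. Returns false if there is none.
    synchronized boolean rollback() {
        if (previous.isEmpty()) {
            return false;
        }
        current.set(previous.pop());
        return true;
    }
}
```

The key design point is that the new data set is built completely before it is published, so a failed or bad load never leaves readers with partial data.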
4. Example Hadoop Store Builder
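// Assumes json-simple (org.json.simple) as the JSON library.
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;
import voldemort.store.readonly.mr.AbstractHadoopStoreBuilderMapper;

public class JsonStoreBuilder
        extends AbstractHadoopStoreBuilderMapper<LongWritable, Text> {
    private final JSONParser parser = new JSONParser();
    @Override
    public Object makeKey(LongWritable lineNo, Text line) {
        try {
            JSONObject json = (JSONObject) parser.parse(line.toString());
            return json.get("name"); // store key: the record's "name" field
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid JSON record: " + line, e);
        }
    }
    @Override
    public Object makeValue(LongWritable lineNo, Text line) {
        return line.toString(); // store value: the whole JSON line, unchanged
    }
}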
6. Pig to JSON Index
● Output JSON from Pig
STORE bag INTO 'data.json' USING JsonStorage();
● JsonStoreBuilder
● Extends Voldemort StoreBuilder
● Easily index any field
● Code up here:
http://github.com/danharvey/pigJsonUtils