Aaron Cordova outlines how Accumulo helps provide the essential features of a "Data Lake": a system in which all types of data from all sources can be imported, secured, analyzed, and delivered to decision makers.
11. Record
• Flat
Field A   Field B   Field C   Field D
abc       123       2014      44,33
• Nested Semi-Structured
• {people: [{name: 'bob'}, {name: 'fred'}]}
• Unstructured
• Text: “the brown fox jumped over …”
12. Accumulo Records
• Each record -> blob (serialized)
• Each record -> set of key-value pairs
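The two storage options on this slide can be sketched as follows. This is a minimal illustration in plain Python, not Accumulo client code: the helper names `to_blob` and `to_key_values`, the record ID `rec1`, and the dotted field-path convention are all assumptions made for the example.

```python
import json


def to_blob(record_id, record):
    """Option 1: store the whole record as one serialized value under one key."""
    return [(record_id, json.dumps(record, sort_keys=True))]


def to_key_values(record_id, record, prefix=""):
    """Option 2: store each leaf field as its own (row, field path, value) entry."""
    pairs = []
    if isinstance(record, dict):
        for name, value in record.items():
            path = f"{prefix}.{name}" if prefix else name
            pairs.extend(to_key_values(record_id, value, path))
    elif isinstance(record, list):
        for i, value in enumerate(record):
            pairs.extend(to_key_values(record_id, value, f"{prefix}[{i}]"))
    else:
        # Leaf value: emit one key-value pair for this field.
        pairs.append((record_id, prefix, str(record)))
    return pairs


# The nested example from the previous slide (hypothetical record ID).
nested = {"people": [{"name": "bob"}, {"name": "fred"}]}
blob = to_blob("rec1", nested)
pairs = to_key_values("rec1", nested)
for row, field, value in pairs:
    print(row, field, value)
```

The blob form keeps a record together but hides its fields from the store; the key-value form exposes each field individually, which is what allows per-field labels and indexing.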
18. Access Control
• Each data set has a unique label
• Within a data set, records and fields can have different labels
• Apply labels from an external system at query time
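A rough sketch of label-based filtering at query time, under simplifying assumptions: each field carries exactly one label, and `lookup_authorizations` is a hypothetical stand-in for the external system that supplies a user's labels. Accumulo's own visibility expressions can combine labels with boolean operators, which this example does not model. All names and data below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    field: str
    value: str
    label: str  # security label attached to this field


def lookup_authorizations(user):
    """Hypothetical stand-in for an external system mapping users to labels."""
    directory = {
        "analyst": {"dataset_a"},
        "auditor": {"dataset_a", "dataset_a_restricted"},
    }
    return directory.get(user, set())


def visible_cells(cells, authorizations):
    """Return only the cells whose label the caller is authorized to see."""
    return [c for c in cells if c.label in authorizations]


cells = [
    Cell("rec1", "name", "bob", "dataset_a"),
    Cell("rec1", "salary", "1000", "dataset_a_restricted"),  # field-level label
    Cell("rec2", "name", "fred", "dataset_b"),
]

analyst_view = visible_cells(cells, lookup_authorizations("analyst"))
auditor_view = visible_cells(cells, lookup_authorizations("auditor"))
```

Here the analyst sees only the `name` field of `rec1`, the auditor also sees the restricted field, and an unknown user sees nothing.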
20. Query
• Indexing and access control support queries
• Point queries
• Lucene processing for text
• Ranges for numbers, dates
• Multi-dimensional
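Point and range queries both reduce to lookups in a sorted index. The sketch below assumes a single in-memory index of (value, record ID) entries; the class name `SortedIndex` and the sample data are hypothetical, and text search and multi-dimensional queries are not covered.

```python
import bisect


class SortedIndex:
    """A sorted list of (value, record_id) entries, searched by bisection."""

    def __init__(self, entries):
        self._entries = sorted(entries)
        self._keys = [value for value, _ in self._entries]

    def range(self, low, high):
        """Return record IDs whose value lies in the inclusive range [low, high]."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return [record_id for _, record_id in self._entries[start:end]]

    def point(self, value):
        """A point query is a range query whose bounds coincide."""
        return self.range(value, value)


# Hypothetical index over a numeric "year" field.
year_index = SortedIndex(
    [(2014, "rec1"), (2012, "rec2"), (2015, "rec3"), (2014, "rec4")]
)
exact = year_index.point(2014)
between = year_index.range(2013, 2015)
```

The same structure works for dates if they are stored in a form that sorts in chronological order, such as ISO 8601 strings.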