Apache Accumulo, originally developed by the National Security Agency and now an Apache Software Foundation project, builds upon Google's Bigtable design to provide a scalable, lightly structured database capability that complements the ubiquitous Hadoop environment. The core capabilities of Accumulo include cell-level security, flexible schemas, real-time analytics, bulk I/O, and linear scalability beyond trillions of entries and petabytes of data. These capabilities lead to techniques that unlock the power of Big Data but don't fit into traditional database design patterns. Learn about the advantages of Apache Accumulo and how it fits into the Hadoop and NoSQL ecosystem.
Presenter: Adam Fuchs, CTO, sqrrl
Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data
1. sqrrl data, INC.
Secure. Scale. Adapt.
Adam Fuchs, Chief Technology Officer
info@sqrrl.com | @sqrrl_inc | 617.520.4375 sqrrl data, INC., All Rights Reserved
2. Who We Are
sqrrl is the commercial provider of:
Mature Database Technology - Apache Accumulo
Fine-Grained Access Controls - Data Integration and Sharing
Proven Performance - Petabytes and Beyond
Advanced Analytics - Search, Statistics, and Graphs
4. Apache Accumulo Perspective
[Diagram: multiple data sets feeding multiple applications]
Integration across:
Multiple business lines
Multiple data sets
Multiple applications
Multiple security, privacy, legal, policy, regulatory, and compliance constraints
New demands
5. Accumulo Design Drivers
1) Cell-Level Security
Express common security requirements in the infrastructure, not just in the application
Data-centric approach encourages secure sharing
2) Scalability
Near-linear performance improvements at thousands of nodes
Durable and reliable under the increased failure rates that come with scale
3) Diverse, Interactive Analytics
Sorted key/value core performs well in a diverse set of domains
Information retrieval, statistics, graph analysis, geo indexing, and more
4) Flexible, Adaptive Schema
Start with universal structures and indexing
Refine the schema over time
7. Accumulo Key Structure
An Accumulo key is a 5-tuple, consisting of:
Row: Controls Atomicity
Column Family: Controls Locality
Column Qualifier: Controls Uniqueness
Visibility Label: Controls Access
Timestamp: Controls Versioning

Row        Col. Fam.      Col. Qual.      Visibility    Timestamp   Value
John Doe   Notes          PCP             PCP_JD        20120912    Patient suffers from an acute …
John Doe   Test Results   Cholesterol     JD|PCP_JD     20120912    183
John Doe   Test Results   Mental Health   JD|PSYCH_JD   20120801    Pass
John Doe   Test Results   X-Ray           JD|PHYS_JD    20120513    1010110110100…
Accumulo Key/Value Example
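To make the ordering of the 5-tuple concrete, here is a small Java sketch that sorts the example rows the way Accumulo orders keys: row, column family, column qualifier, and visibility ascending, then timestamp descending so the newest version of a cell is read first. The `Entry` record is a hypothetical stand-in, not Accumulo's `Key` class, and it compares Java strings rather than raw bytes, which yields the same order for ASCII data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class KeyOrderingDemo {
    // Hypothetical, simplified stand-in for an Accumulo key plus its value.
    // This is NOT org.apache.accumulo.core.data.Key.
    record Entry(String row, String colFam, String colQual,
                 String visibility, long timestamp, String value) {}

    // Row, family, qualifier, and visibility sort ascending;
    // timestamp sorts descending, so the newest version of a cell comes first.
    static final Comparator<Entry> KEY_ORDER = Comparator
            .comparing(Entry::row)
            .thenComparing(Entry::colFam)
            .thenComparing(Entry::colQual)
            .thenComparing(Entry::visibility)
            .thenComparing(Comparator.comparingLong((Entry e) -> e.timestamp()).reversed());

    // Returns a sorted copy, leaving the input untouched.
    static List<Entry> sorted(List<Entry> entries) {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(KEY_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        // Rows from the key/value example, deliberately listed out of order.
        List<Entry> entries = List.of(
                new Entry("John Doe", "Test Results", "X-Ray", "JD|PHYS_JD", 20120513L, "1010110110100..."),
                new Entry("John Doe", "Notes", "PCP", "PCP_JD", 20120912L, "Patient suffers from an acute ..."),
                new Entry("John Doe", "Test Results", "Mental Health", "JD|PSYCH_JD", 20120801L, "Pass"),
                new Entry("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120912L, "183"));
        for (Entry e : sorted(entries)) {
            System.out.println(e.colFam() + " / " + e.colQual() + " -> " + e.value());
        }
    }
}
```

Because all four cells share the row "John Doe", they sort by column family first ("Notes" before "Test Results") and then by qualifier; two versions of the same cell would differ only in timestamp, with the newer one first.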
8. Visibility Syntax & Semantics
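Visibility labels such as `JD|PCP_JD` in the key/value example are boolean expressions over authorization tokens: `&` requires every operand, `|` requires any one, and parentheses group sub-expressions. The sketch below evaluates such a label against a user's authorizations. It assumes unquoted terms made of ASCII letters, digits, and `_-.:/`, omits quoted terms, and, like Accumulo, rejects `&` and `|` mixed at one nesting level without parentheses. `VisibilityEvaluator` and `canSee` are illustrative names, not Accumulo's API.

```java
import java.util.Set;

public class VisibilityEvaluator {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilityEvaluator(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    // True if a user holding `auths` may read a cell labeled `expr`.
    // An empty label is treated as visible to everyone.
    public static boolean canSee(String expr, Set<String> auths) {
        if (expr.isEmpty()) return true;
        VisibilityEvaluator p = new VisibilityEvaluator(expr, auths);
        boolean result = p.parseExpression();
        if (p.pos != expr.length()) {
            throw new IllegalArgumentException("unexpected '" + expr.charAt(p.pos) + "' at " + p.pos);
        }
        return result;
    }

    // expression := operand (op operand)*, where every op at one level must match
    private boolean parseExpression() {
        boolean value = parseOperand();
        char op = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char next = expr.charAt(pos++);
            if (op != 0 && next != op) {
                throw new IllegalArgumentException("mixing & and | requires parentheses");
            }
            op = next;
            boolean rhs = parseOperand();
            value = (op == '&') ? (value && rhs) : (value || rhs);
        }
        return value;
    }

    // operand := '(' expression ')' | term
    private boolean parseOperand() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++; // consume '('
            boolean inner = parseExpression();
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("missing ')' at " + pos);
            }
            pos++; // consume ')'
            return inner;
        }
        int start = pos;
        while (pos < expr.length() && isTermChar(expr.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("expected a term at " + pos);
        return auths.contains(expr.substring(start, pos));
    }

    // Assumed unquoted term alphabet: ASCII letters, digits, and _ - . : /
    private static boolean isTermChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || "_-.:/".indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        Set<String> pcp = Set.of("PCP_JD");
        System.out.println(canSee("JD|PCP_JD", pcp));   // true
        System.out.println(canSee("JD|PSYCH_JD", pcp)); // false
        // PHI is a hypothetical extra token used only to show grouping.
        System.out.println(canSee("(JD|PCP_JD)&PHI", Set.of("PCP_JD", "PHI"))); // true
    }
}
```

With only the `PCP_JD` authorization, a user sees the cholesterol result but not the mental health result, which matches the intent of the labels in the key/value example.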
9. Tablets
Collections of KV pairs form Tables.
Tables are partitioned into Tablets.
Metadata tablets hold info about other tablets, forming a 3-level hierarchy.
A Tablet is a unit of work for a Tablet Server.
[Diagram: the Root Tablet (-∞ to ∞), found at a well-known location (ZooKeeper), indexes Metadata Tablet 1 (-∞ to “Encyclopedia:Ocelot”) and Metadata Tablet 2 (“Encyclopedia:Ocelot” to ∞). These index the data tablets of three tables: Adam’s Table (-∞ : thing, thing : ∞), Encyclopedia (-∞ : Ocelot, Ocelot : Yak, Yak : ∞), and Foo (-∞ to ∞).]
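Finding the tablet that holds a row is a sorted-map lookup. In Accumulo a tablet covers the row range (previous end row, end row], so a row belongs to the tablet with the smallest end row at or after it. The sketch below applies that rule to the Encyclopedia table's tablets; the class name and the ASCII `inf` labels are illustrative. The same ceiling lookup would be repeated at each level of the root, metadata, and data hierarchy, with metadata rows combining table and end row as in the "Encyclopedia:Ocelot" split.

```java
import java.util.Map;
import java.util.TreeMap;

public class TabletLocator {
    // End row -> tablet name, for every tablet with a finite end row.
    private final TreeMap<String, String> byEndRow = new TreeMap<>();
    // The tablet whose end row is +infinity has no finite key, so it is kept separately.
    private final String lastTablet;

    public TabletLocator(String lastTablet) {
        this.lastTablet = lastTablet;
    }

    public TabletLocator addTablet(String endRow, String name) {
        byEndRow.put(endRow, name);
        return this;
    }

    // A tablet covers (prevEndRow, endRow]: the owner is the tablet with the
    // smallest end row >= row, or the last tablet if no such end row exists.
    public String locate(String row) {
        Map.Entry<String, String> owner = byEndRow.ceilingEntry(row);
        return owner != null ? owner.getValue() : lastTablet;
    }

    public static void main(String[] args) {
        // Split points taken from the Encyclopedia table in the diagram.
        TabletLocator encyclopedia = new TabletLocator("Yak : +inf")
                .addTablet("Ocelot", "-inf : Ocelot")
                .addTablet("Yak", "Ocelot : Yak");
        for (String row : new String[] {"Aardvark", "Ocelot", "Penguin", "Zebra"}) {
            System.out.println(row + " -> " + encyclopedia.locate(row));
        }
    }
}
```

Note that "Ocelot" itself lands in "-inf : Ocelot" because end rows are inclusive.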
10. Accumulo Architecture
[Diagram: a three-node ZooKeeper ensemble, a Master, three Tablet Servers each hosting Tablets, three Applications, and Hadoop. Labeled relationships: ZooKeeper: Delegate Authority; Master to Tablet Servers: Assign/Balance; Applications to Tablet Servers: Read/Write; Tablet Servers to Hadoop: Store/Replicate.]
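The Master's Assign/Balance role can be illustrated with a simple count-based policy: move tablets from the most-loaded Tablet Server to the least-loaded one until tablet counts differ by at most one. This is a hypothetical sketch (the `TabletBalancer` class and `Migration` record are invented for illustration); Accumulo's balancing is pluggable and a real policy may weigh more than tablet counts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TabletBalancer {
    record Migration(String tablet, String from, String to) {}

    // Moves one tablet at a time from the most-loaded server to the least-loaded one
    // until counts differ by at most one. Mutates `assignment` and returns the moves.
    static List<Migration> balance(Map<String, List<String>> assignment) {
        List<Migration> moves = new ArrayList<>();
        while (true) {
            String most = null, least = null;
            for (var e : assignment.entrySet()) {
                int n = e.getValue().size();
                if (most == null || n > assignment.get(most).size()) most = e.getKey();
                if (least == null || n < assignment.get(least).size()) least = e.getKey();
            }
            if (most == null || assignment.get(most).size() - assignment.get(least).size() <= 1) {
                return moves; // empty cluster or already balanced
            }
            List<String> source = assignment.get(most);
            String tablet = source.remove(source.size() - 1);
            assignment.get(least).add(tablet);
            moves.add(new Migration(tablet, most, least));
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        assignment.put("tserver1", new ArrayList<>(List.of("t1", "t2", "t3", "t4")));
        assignment.put("tserver2", new ArrayList<>(List.of("t5", "t6")));
        assignment.put("tserver3", new ArrayList<>()); // hypothetical newly added server
        for (Migration m : balance(assignment)) {
            System.out.println("move " + m.tablet() + " from " + m.from() + " to " + m.to());
        }
        System.out.println(assignment);
    }
}
```

Each move strictly narrows the gap between the busiest and idlest servers, so the loop terminates; adding a server simply shows up as an empty list that attracts tablets.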
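11. Tablet Data Flow
[Diagram of a single Tablet: Writes are recorded in a Write-Ahead Log (for recovery) and inserted into an In-Memory Map. A Minor Compaction passes the In-Memory Map through an Iterator Tree and writes it out as a Sorted, Indexed File. A Merging / Major Compaction combines several Sorted, Indexed Files through an Iterator Tree into a new Sorted, Indexed File. Reads (Scans) combine the In-Memory Map and the Sorted, Indexed Files through an Iterator Tree.]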
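This data flow is a log-structured merge design. The Java sketch below models one tablet with a write-ahead log, a sorted in-memory map, minor compactions that freeze the map into an immutable sorted "file", major compactions that merge files, and scans that merge every source. It is a simplification under stated assumptions: files are in-memory `TreeMap`s, the iterator trees are reduced to "newest value wins" per key, and the log is simply cleared after a minor compaction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class TabletDataFlow {
    // Write-ahead log: would be replayed to rebuild the in-memory map after a crash.
    private final List<String> writeAheadLog = new ArrayList<>();
    // In-memory map: sorted and mutable; receives every write.
    private TreeMap<String, String> inMemoryMap = new TreeMap<>();
    // Sorted "files", oldest first; never modified after being added.
    private final List<SortedMap<String, String>> files = new ArrayList<>();

    public void write(String key, String value) {
        writeAheadLog.add(key + "=" + value); // log first so the write is recoverable
        inMemoryMap.put(key, value);
    }

    // Minor compaction: freeze the in-memory map as a new sorted file;
    // in this sketch its log entries are then discarded.
    public void minorCompact() {
        if (inMemoryMap.isEmpty()) return;
        files.add(Collections.unmodifiableSortedMap(inMemoryMap));
        inMemoryMap = new TreeMap<>();
        writeAheadLog.clear();
    }

    // Major compaction: merge all files into one; newer files overwrite older ones.
    public void majorCompact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (SortedMap<String, String> f : files) merged.putAll(f);
        files.clear();
        if (!merged.isEmpty()) files.add(Collections.unmodifiableSortedMap(merged));
    }

    // Scan: merge all files, then the in-memory map; the newest source wins per key.
    public SortedMap<String, String> scan() {
        TreeMap<String, String> view = new TreeMap<>();
        for (SortedMap<String, String> f : files) view.putAll(f);
        view.putAll(inMemoryMap);
        return view;
    }

    public int fileCount() { return files.size(); }
    public int walSize() { return writeAheadLog.size(); }

    public static void main(String[] args) {
        TabletDataFlow tablet = new TabletDataFlow();
        tablet.write("a", "1");
        tablet.write("b", "2");
        tablet.minorCompact(); // file 1: {a=1, b=2}
        tablet.write("a", "3");
        tablet.minorCompact(); // file 2: {a=3}
        tablet.write("c", "4"); // only in memory and in the log
        System.out.println(tablet.scan() + " files=" + tablet.fileCount() + " wal=" + tablet.walSize());
        tablet.majorCompact();
        System.out.println(tablet.scan() + " files=" + tablet.fileCount() + " wal=" + tablet.walSize());
    }
}
```

The scan result is identical before and after the major compaction; only the number of files a read must merge goes down, which is the point of compacting.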
23. acorn
Key/Value pairs are great! How do I construct a document partitioning key again?
Techniques should be built into an API.
Let the people have polyglot: Lucene, SQL, SPARQL, JAQL, Matlab (not just Key, Value, Range).
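One common answer to that question is a document-partitioned index in the style of Accumulo's `IntersectingIterator`: the row is a shard ID, the column family is a term, and the column qualifier is a document ID. Because every term of a document lands in the same row, an AND query can be answered inside each shard and the per-shard hits unioned. The sketch assumes a hash-mod shard rule, a `shard_NNNN` row format, and naive tokenization; all names and documents are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class DocPartitionKeys {
    // Hypothetical index entry: row = shard, column family = term, column qualifier = doc id.
    record IndexKey(String row, String colFam, String colQual) {}

    // Assumed partitioning rule: hash the document id into one of numShards rows.
    static String shardFor(String docId, int numShards) {
        return String.format("shard_%04d", Math.floorMod(docId.hashCode(), numShards));
    }

    // One entry per distinct term, all stored in the document's own shard row.
    static List<IndexKey> indexEntries(String docId, String text, int numShards) {
        String row = shardFor(docId, numShards);
        List<IndexKey> keys = new ArrayList<>();
        for (String term : new TreeSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+")))) {
            if (!term.isEmpty()) keys.add(new IndexKey(row, term, docId));
        }
        return keys;
    }

    // Within one shard: keep the doc ids that appear under every query term.
    static Set<String> docsWithAllTerms(List<IndexKey> shardEntries, Set<String> terms) {
        Set<String> result = null;
        for (String term : terms) {
            Set<String> docs = new HashSet<>();
            for (IndexKey k : shardEntries) {
                if (k.colFam().equals(term)) docs.add(k.colQual());
            }
            if (result == null) result = docs; else result.retainAll(docs);
        }
        return result == null ? Set.of() : result;
    }

    // Fan out to every shard, intersect locally, and union the per-shard hits.
    static Set<String> query(List<IndexKey> allEntries, Set<String> terms) {
        Map<String, List<IndexKey>> byShard = new TreeMap<>();
        for (IndexKey k : allEntries) byShard.computeIfAbsent(k.row(), r -> new ArrayList<>()).add(k);
        Set<String> hits = new TreeSet<>();
        for (List<IndexKey> shard : byShard.values()) hits.addAll(docsWithAllTerms(shard, terms));
        return hits;
    }

    public static void main(String[] args) {
        int numShards = 4; // hypothetical shard count
        List<IndexKey> entries = new ArrayList<>();
        // Hypothetical documents for illustration.
        entries.addAll(indexEntries("doc1", "acute cholesterol test", numShards));
        entries.addAll(indexEntries("doc2", "cholesterol test passed", numShards));
        entries.addAll(indexEntries("doc3", "chest xray test", numShards));
        System.out.println(query(entries, Set.of("cholesterol", "test"))); // [doc1, doc2]
        System.out.println(query(entries, Set.of("acute", "test")));       // [doc1]
    }
}
```

The query touches every shard, but each intersection stays within a single row, which is the property that lets it run next to the data.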
24. Combined IR + Graph Search
26. Get Involved
http://accumulo.apache.org
Help us make Accumulo even better!
27. Contact
Adam Fuchs, CTO
sqrrl data, Inc.
617-520-4375
www.sqrrl.com
@sqrrl_inc
info@sqrrl.com