Cloudera Federal Forum 2014: The REDDISK Big Data Architecture
1. Red Disk
AND SOME THOUGHTS ON BIG DATA IN THE GOVERNMENT
PAUL BROWN
KOVERSE, INC
2. Accumulo Origin Story
(Paul’s Version)
Thinking was:
We were way behind the curve
Data unification was the only way to survive
Google’s architecture is proven to scale and the design is available
Need to prove as soon as possible:
Scale/Unification in real world scenarios
Mission Impact
What we learned along the way:
Needed Secure Indexes across datasets
“Productization” is critical to scaling success
We are way ahead of the curve…
3. Why Accumulo and Hadoop
Interactive Query at Scale
Adaptive Schemas
Heterogeneous Data
Bulk Processing
Multiple Versions
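The properties listed on this slide can be illustrated with a toy model of a sorted, versioned key-value table. This is a minimal sketch, not Accumulo's API: `ToyTable`, `Cell`, and all sample data are hypothetical, and visibility is simplified to "the reader must hold every label on the cell", whereas Accumulo supports richer boolean visibility expressions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Cell:
    row: str
    family: str
    qualifier: str
    visibility: FrozenSet[str]  # simplified: ALL labels are required to read
    timestamp: int
    value: str


class ToyTable:
    """Hypothetical in-memory stand-in for a sorted, versioned table."""

    def __init__(self) -> None:
        self._cells: List[Cell] = []

    def put(self, row: str, family: str, qualifier: str,
            visibility: Iterable[str], timestamp: int, value: str) -> None:
        # No fixed schema: any row may carry any family/qualifier.
        self._cells.append(
            Cell(row, family, qualifier, frozenset(visibility), timestamp, value))
        # Keep cells sorted by key, newest timestamp first within a column.
        self._cells.sort(
            key=lambda c: (c.row, c.family, c.qualifier, -c.timestamp))

    def scan(self, start_row: str, end_row: str, auths: Iterable[str],
             max_versions: int = 1) -> List[Cell]:
        """Return readable cells in [start_row, end_row], newest versions first."""
        held = set(auths)
        seen = {}
        out = []
        for c in self._cells:
            if not (start_row <= c.row <= end_row):
                continue
            if not c.visibility <= held:
                continue  # reader lacks a required label
            col = (c.row, c.family, c.qualifier, c.visibility)
            seen[col] = seen.get(col, 0) + 1
            if seen[col] <= max_versions:
                out.append(c)
        return out


table = ToyTable()
table.put("doc1", "meta", "title", [], 1, "Draft")
table.put("doc1", "meta", "title", [], 2, "Final")
table.put("doc1", "entity", "person", ["secret"], 1, "J. Doe")
table.put("doc2", "geo", "lat", [], 1, "38.9")

public_view = [c.value for c in table.scan("doc1", "doc1", auths=[])]
cleared_view = [c.value for c in table.scan("doc1", "doc1", auths=["secret"])]
history = [c.value for c in table.scan("doc1", "doc1", auths=[], max_versions=2)]
```

Here `doc1` and `doc2` carry different columns (adaptive schema, heterogeneous data), the scan over a row range stands in for interactive query, and `max_versions` exposes older values of the same column.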
4. Adoption of Big Data
Home Grown (pre-2008)
Open Source
GOTS
COTS
6. Red Disk
Goals:
Lower the complexity and time associated with operationalizing data
A repeatable, documented, general-purpose “product”
Interoperability between systems
7. Red Disk
[Diagram. Labels: RPMs; Key (New Apps, Existing Apps); Node Types (Hadoop/Accumulo, Red Disk, JBoss, Storm); Hadoop and Accumulo]
8. Red Disk API -> UCD API
Pre-processing and data ingest: Storm
Bulk Analytics: MapReduce Input/Output Formats
CRUD and Query: REST services
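The three access paths on this slide (streaming ingest, bulk analytics, CRUD and query) can be pictured as three ways of reaching one shared store. The sketch below is purely illustrative: `ToyDataStore` and every method name are hypothetical, nothing here reflects the actual Red Disk or UCD API, and the in-memory loops only stand in for Storm, MapReduce, and REST services.

```python
import itertools
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, str]


class ToyDataStore:
    """Hypothetical in-memory store reached through three access paths."""

    def __init__(self) -> None:
        self._records: Dict[int, Record] = {}
        self._ids = itertools.count(1)

    # --- CRUD and query: operations a REST layer might expose ---
    def create(self, record: Record) -> int:
        rid = next(self._ids)
        self._records[rid] = dict(record)
        return rid

    def read(self, rid: int) -> Optional[Record]:
        rec = self._records.get(rid)
        return dict(rec) if rec is not None else None

    def update(self, rid: int, changes: Record) -> bool:
        if rid not in self._records:
            return False
        self._records[rid].update(changes)
        return True

    def delete(self, rid: int) -> bool:
        return self._records.pop(rid, None) is not None

    def query(self, field: str, value: str) -> List[int]:
        return sorted(rid for rid, rec in self._records.items()
                      if rec.get(field) == value)

    # --- Pre-processing and ingest: stands in for a streaming topology ---
    def ingest(self, stream: Iterable[str],
               parse: Callable[[str], Record]) -> int:
        count = 0
        for raw in stream:
            self.create(parse(raw))
            count += 1
        return count

    # --- Bulk analytics: stands in for a MapReduce job over all records ---
    def map_reduce(
        self, mapper: Callable[[Record], Iterable[Tuple[str, int]]]
    ) -> Counter:
        totals: Counter = Counter()
        for record in self._records.values():
            for key, n in mapper(record):
                totals[key] += n
        return totals


def parse_line(line: str) -> Record:
    # Hypothetical "source, text" input format, for illustration only.
    source, text = line.split(",", 1)
    return {"source": source.strip(), "text": text.strip()}


store = ToyDataStore()
ingested = store.ingest(["feedA, hello world", "feedB, hello again"], parse_line)
word_counts = store.map_reduce(lambda r: ((w, 1) for w in r["text"].split()))
feed_a_ids = store.query("source", "feedA")
```

The point of the sketch is only that all three paths read and write the same records, which is one plausible reading of a single API fronting ingest, analytics, and query.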
9. Red Disk
Kafka
DPF (UCD API)
Mission Apps
Storm
Ingest Analytics – NLP, etc
UCD Ingest / Query API
Raw Data
Indexing Providers: Koverse, GAIA, etc
Accumulo, HDFS, etc
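One plausible reading of the labels on this slide is a flow from a message queue, through an ingest analytic, into an indexing provider that keeps both raw data and an index for query. The sketch below assumes that reading. All names (`extract_terms`, `TermIndexProvider`, `run_pipeline`) are hypothetical, a `deque` stands in for Kafka, a plain loop for Storm, a trivial tokenizer for NLP, and dictionaries for Accumulo and HDFS.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Set

PUNCTUATION = ".,;:!?"


def extract_terms(text: str) -> List[str]:
    """Toy ingest analytic: lowercase word tokens with punctuation stripped."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [t for t in tokens if t]


class TermIndexProvider:
    """Hypothetical indexing provider: stores raw data plus a term index."""

    def __init__(self) -> None:
        self.raw: Dict[int, str] = {}
        self.index: Dict[str, Set[int]] = defaultdict(set)

    def ingest(self, doc_id: int, text: str, terms: List[str]) -> None:
        self.raw[doc_id] = text
        for term in terms:
            self.index[term].add(doc_id)

    def query(self, term: str) -> List[str]:
        # Look up matching ids in the index, then fetch the raw records.
        ids = sorted(self.index.get(term.lower(), ()))
        return [self.raw[i] for i in ids]


def run_pipeline(queue: Deque[str],
                 analytic: Callable[[str], List[str]],
                 provider: TermIndexProvider) -> int:
    """Drain the queue, run the analytic, and hand results to the provider."""
    doc_id = 0
    while queue:
        text = queue.popleft()
        doc_id += 1
        provider.ingest(doc_id, text, analytic(text))
    return doc_id


messages = deque([
    "Storm ingests raw data.",
    "Accumulo stores raw data and indexes.",
])
provider = TermIndexProvider()
processed = run_pipeline(messages, extract_terms, provider)
hits = provider.query("raw")
```

Because the provider is passed in as an argument, a different indexing strategy could be swapped in without touching the queue or the analytic, which is roughly what a layer of interchangeable "indexing providers" suggests.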