Cloudera Federal Forum 2014: The REDDISK Big Data Architecture

Cloudera, Inc.

Koverse CEO Paul Brown shares the story of Accumulo and how the project is applied to Hadoop and Big Data.

Red Disk

AND SOME THOUGHTS ON BIG DATA IN THE GOVERNMENT

PAUL BROWN
KOVERSE, INC

Accumulo Origin Story
(Paul’s Version)

Thinking was:
- We were way behind the curve
- Data unification was the only way to survive
- Google’s architecture is proven to scale, and the design is available
- Need to prove as soon as possible:
  - Scale/unification in real-world scenarios
  - Mission impact

What we learned along the way:
- Needed secure indexes across datasets
- “Productization” is critical to scaling success
- We are way ahead of the curve…
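The "secure indexes across datasets" lesson refers to indexing data from many sources while still enforcing who may see each entry. Below is a minimal sketch of that idea, assuming Accumulo-style visibility labels: each index entry carries a boolean expression such as `A&(B|C)`, and a query returns only entries whose expression is satisfied by the caller's authorizations. The names `visible` and `SecureIndex` are hypothetical, the label grammar is simplified (no quoted labels), and this is not Accumulo's own API.

```python
import re
from collections import defaultdict

# Simplified label syntax (an assumption of this sketch): unquoted labels joined
# by & (and) or | (or), grouped with parentheses. Quoted labels are not supported.
_TOKEN = re.compile(r"\s*([A-Za-z0-9_\-:./]+|[&|()])")


def _tokenize(expr):
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"bad visibility expression: {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def visible(expr, auths):
    """True if a user holding the set `auths` may read an entry labelled `expr`."""
    tokens = _tokenize(expr)
    if not tokens:
        return True  # an empty label is readable by everyone
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths  # a bare label is satisfied if the user holds it

    def parse_expr():
        nonlocal pos
        value, op = parse_term(), None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | needs parentheses")
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


class SecureIndex:
    """Hypothetical cross-dataset inverted index: term -> (dataset, record id, label)."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, term, dataset, record_id, visibility):
        self._postings[term.lower()].append((dataset, record_id, visibility))

    def lookup(self, term, auths):
        # Filter at query time so one index can serve users with different clearances.
        return [(ds, rid) for ds, rid, vis in self._postings.get(term.lower(), [])
                if visible(vis, auths)]
```

For example, after `idx.add("bob", "cables", "c7", "A&B")`, a user holding only `{"A"}` will not see `("cables", "c7")` in `idx.lookup("Bob", {"A"})`, while a user holding `{"A", "B"}` will. Keeping the label on every entry, rather than building one index per clearance level, is what lets a single index span datasets with different sensitivities.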

Why Accumulo and Hadoop
- Interactive query at scale
- Adaptive schemas
- Heterogeneous data
- Bulk processing
- Multiple versions
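"Multiple versions" and "interactive query at scale" both follow from Accumulo's data model: sorted key/value pairs whose key is made of a row, column family, column qualifier, column visibility and timestamp, with newer timestamps sorting first. The sketch below models that ordering in plain Python and keeps only the newest N versions of each cell, similar in spirit to Accumulo's versioning behaviour. `Key`, `sort_key` and `latest_versions` are hypothetical names for illustration, not Accumulo classes.

```python
from typing import NamedTuple


class Key(NamedTuple):
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_key(key):
    # Row, family, qualifier and visibility ascending; timestamp descending,
    # so the newest version of each cell comes first.
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


def latest_versions(entries, max_versions=1):
    """Keep at most `max_versions` newest values per (row, family, qualifier, visibility)."""
    kept, seen = [], {}
    for key, value in sorted(entries, key=lambda kv: sort_key(kv[0])):
        cell = key[:4]  # everything except the timestamp identifies the cell
        seen[cell] = seen.get(cell, 0) + 1
        if seen[cell] <= max_versions:
            kept.append((key, value))
    return kept
```

Because versions of a cell are adjacent and newest-first in sort order, returning "the latest value" is a single forward pass rather than a search, and older versions remain available by raising `max_versions`.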

Adoption of Big Data
- Home grown (pre-2008)
- Open source
- GOTS (government off-the-shelf)
- COTS (commercial off-the-shelf)

GOTS Phase

Goals:
- Lower complexity
- Mission impact
- Repeatability

[Diagram: Mission Impact; Sources and Methods; Technology; Core Principles]

Red Disk

Goals:
- Lower the complexity and time associated with operationalizing data
- A “product”: general-purpose, repeatable, documented
- Interoperability between systems

Red Disk
- RPMs
- Node types:
  - Hadoop/Accumulo
  - Red Disk: JBoss, Storm

[Diagram key: new apps, existing apps; Hadoop and Accumulo]

Red Disk API -> UCD API
- Pre-processing and data ingest: Storm
- Bulk analytics: MapReduce Input/Output Formats
- CRUD and query: REST services

Red Disk
[Diagram components:]
- Kafka
- DPF (UCD API)
- Mission apps
- Storm: ingest analytics (NLP, etc.)
- UCD Ingest / Query API
- Raw data
- Indexing providers: Koverse, GAIA, etc.
- Accumulo, HDFS, etc.
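To make the flow among these components concrete, here is a toy, single-process sketch under loud assumptions: a `queue.Queue` stands in for a Kafka topic, a regular expression that picks capitalised words stands in for Storm-hosted NLP ingest analytics, and an in-memory inverted index stands in for an indexing provider behind the UCD ingest/query API. `extract_entities`, `IndexingProvider` and `run_ingest` are hypothetical names; none of this is the Red Disk, Kafka, Storm or Koverse API.

```python
import queue
import re
from collections import defaultdict


def extract_entities(text):
    # Toy "ingest analytic": capitalised words stand in for real NLP entity extraction.
    return re.findall(r"\b[A-Z][a-z]+\b", text)


class IndexingProvider:
    """Hypothetical indexing provider sitting behind a UCD-style ingest/query API."""

    def __init__(self):
        self._index = defaultdict(set)

    def ingest(self, record_id, terms):
        for term in terms:
            self._index[term].add(record_id)

    def query(self, term):
        return sorted(self._index.get(term, set()))


def run_ingest(topic, provider):
    """Drain the topic (a stand-in for a Kafka topic), enrich, and index each record."""
    while True:
        try:
            record_id, text = topic.get_nowait()
        except queue.Empty:
            break
        provider.ingest(record_id, extract_entities(text))
```

Usage, with made-up records: put `("doc1", "Bob met Joe in Paris.")` and `("doc2", "Bobby flew to Paris.")` on a `queue.Queue()`, call `run_ingest(topic, provider)`, and `provider.query("Paris")` returns both record ids. Keeping analytics and indexing behind separate interfaces is what would let different indexing providers be swapped in under the same ingest path.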

UCD logical structure
- Term types: Person, Place, Organization, Artifacts
- Terms: Bob, Joe, Bobby
- Statements: “Bob Father Of Joe”, “Bobby AKA Bob”
- Objects
- UCD API
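The slide's example treats data as typed terms (Bob, Joe, Bobby as people) linked by statements such as "Bob Father Of Joe" and "Bobby AKA Bob". A minimal sketch of that structure follows, assuming statements are subject/predicate/object triples and that an `AKA` statement marks the subject as an alias of the object; `StatementStore` and its methods are hypothetical and not the actual UCD API.

```python
from typing import NamedTuple


class Term(NamedTuple):
    name: str
    kind: str  # e.g. "Person", "Place", "Organization", "Artifact"


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str


class StatementStore:
    """Hypothetical store of typed terms and subject/predicate/object statements."""

    def __init__(self):
        self.terms = {}
        self.statements = []
        self.aliases = {}  # alias -> the name it is "also known as"

    def add_term(self, name, kind):
        self.terms[name] = Term(name, kind)

    def add_statement(self, subject, predicate, obj):
        self.statements.append(Statement(subject, predicate, obj))
        if predicate == "AKA":
            self.aliases[subject] = obj

    def canonical(self, name):
        # Follow AKA links to a canonical name; `seen` guards against alias cycles.
        seen = set()
        while name in self.aliases and name not in seen:
            seen.add(name)
            name = self.aliases[name]
        return name

    def about(self, name):
        """Non-alias statements mentioning `name` or any of its aliases."""
        target = self.canonical(name)
        return [s for s in self.statements
                if s.predicate != "AKA"
                and target in (self.canonical(s.subject), self.canonical(s.obj))]
```

With the slide's example loaded, `store.about("Bobby")` returns `[Statement("Bob", "Father Of", "Joe")]`: the alias statement lets a query for one name find facts recorded under another, which is the kind of unification across sources the talk argues for.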

Review
- Questions…
- Red Disk
- Accumulo
- Anything else

