Securely explore your data

SQRRL ENTERPRISE +
APACHE ACCUMULO:
A secure, scalable, real-time
analysis framework

Adam Fuchs, CTO
Sqrrl Data, Inc.
August 21, 2013
OUTLINE
Two Halves of “Real-Time”
Accumulo and Sqrrl Technology
Data-Centric Security
Table Designs
Performance Benchmarks

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential
TWO HALVES OF REAL-TIME
Data-Driven Real-Time: reduce event-to-reaction time
Query-Driven Real-Time: reduce ingest-to-query latency
Data-Driven + Query-Driven Real-Time Ecosystem
[Diagram: Data, SPE, NoSQL+, Actions, Dashboards, and Interactive Analysis Tools (Discovery + Forensics), connected by the numbered flows below]
1. SPE queries NoSQL to enrich streaming data
2. SPE persists results in NoSQL for future query
3. SPE takes action automatically
4. SPE issues data-driven alerts
5. Sqrrl provides context for dashboards
6. Analysis tools use Sqrrl to search and manipulate historical data

This talk focuses on the database.
ACCUMULO DATA FORMAT
An Accumulo key is a 5-tuple, consisting of:
- Row: Controls Atomicity
- Column Family: Controls Locality
- Column Qualifier: Controls Uniqueness
- Visibility Label: Controls Access
- Timestamp: Controls Versioning

Accumulo Key/Value Example
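
As a rough illustration of the 5-tuple above, the following Python sketch models a key as a named tuple and sorts a few entries. The class name `Key`, the row "adam", the older age value, and the visibility strings are hypothetical; this is not the Accumulo client API. It assumes Accumulo's documented ordering: ascending by row, column family, column qualifier, and visibility, with newer timestamps first.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical stand-in for an Accumulo key (not the real client class)."""
    row: str
    column_family: str
    column_qualifier: str
    visibility: str
    timestamp: int


def sort_order(key: Key):
    # Ascending on the first four fields, descending on timestamp,
    # so the newest version of a cell sorts first.
    return (key.row, key.column_family, key.column_qualifier,
            key.visibility, -key.timestamp)


# Illustrative entries; the values are made up for this sketch.
entries = {
    Key("bill", "attribute", "age", "public", 1): "48",
    Key("bill", "attribute", "age", "public", 2): "49",
    Key("bill", "purchases", "sneakers", "private", 1): "$100",
    Key("adam", "attribute", "age", "public", 1): "30",
}

sorted_keys = sorted(entries, key=sort_order)
for k in sorted_keys:
    print(k, "->", entries[k])
```

Because all entries for one row sort together, a scan over a row range touches one contiguous run of keys.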
ACCUMULO TABLETS
Collections of KV pairs form Tables
Tables are partitioned into Tablets
Metadata tablets hold info about other tablets, forming a 3-level hierarchy
A Tablet is a unit of work for a Tablet Server

[Diagram: tablet hierarchy]
Well-Known Location (ZooKeeper)
Root Tablet: -∞ to ∞
Metadata Tablet 1: -∞ to “Encyclopedia:Ocelot”
Metadata Tablet 2: “Encyclopedia:Ocelot” to ∞
Table: Adam’s Table
  Data Tablet: -∞ : thing
  Data Tablet: thing : ∞
Table: Encyclopedia
  Data Tablet: -∞ : Ocelot
  Data Tablet: Ocelot : Yak
  Data Tablet: Yak : ∞
Table: Foo
  Data Tablet: -∞ to ∞

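
To make the partitioning concrete, here is a minimal Python sketch of how a row could be mapped to a data tablet by its split points. It covers only the last level of the hierarchy (data tablets of the Encyclopedia table), not the root and metadata lookups. The function name `tablet_for_row` is hypothetical, and the sketch assumes each tablet covers the range (previous split, split] with an inclusive end row.

```python
from bisect import bisect_left

# Split points for the Encyclopedia table, taken from the slide.
SPLITS = ["Ocelot", "Yak"]


def tablet_for_row(row: str, splits=SPLITS) -> str:
    """Return a label for the tablet whose range contains `row`.

    Assumes each tablet covers (previous split, split], i.e. the end row
    is inclusive and the start row is exclusive.
    """
    i = bisect_left(splits, row)
    low = splits[i - 1] if i > 0 else "-inf"
    if i < len(splits):
        return f"({low}, {splits[i]}]"
    return f"({low}, +inf)"


for row in ["Aardvark", "Ocelot", "Panda", "Zebra"]:
    print(row, "->", tablet_for_row(row))
```

A binary search over sorted split points keeps the lookup logarithmic in the number of tablets.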
ACCUMULO PROCESSES
[Diagram: Accumulo processes and their interactions]
Components: ZooKeeper (x3), Master, Tablet Servers (x3, each hosting Tablets), Applications (x3), HDFS
Interactions: Read/Write, Delegate Authority, Assign/Balance, Store/Replicate

TABLET DATA FLOW
[Diagram: data flow within a Tablet]
Writes: In-Memory Map and Write Ahead Log (For Recovery)
Minor Compaction: In-Memory Map to a Sorted, Indexed File (through an Iterator Tree)
Merging / Major Compaction: Sorted, Indexed Files to a Sorted, Indexed File (through an Iterator Tree)
Reads: Scan over the In-Memory Map and Sorted, Indexed Files (through an Iterator Tree)

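
The write and read paths of a tablet can be approximated with a toy model. The Python sketch below is a simplification under stated assumptions: the class `TinyTablet` and its method names are hypothetical, it is single-threaded, it ignores deletes and timestamps, and it never trims the write-ahead log. It only shows the shape of the flow: writes land in memory, minor compaction flushes to a sorted file, and scans merge all sources with newer data winning.

```python
import heapq


class TinyTablet:
    """Toy model of the write and read paths (hypothetical, single-threaded)."""

    def __init__(self):
        self.write_ahead_log = []   # append-only; would be replayed on recovery
        self.memory_map = {}        # in-memory map of recent writes
        self.files = []             # each file is a sorted list of (key, value)

    def write(self, key, value):
        self.write_ahead_log.append((key, value))
        self.memory_map[key] = value

    def minor_compaction(self):
        """Flush the in-memory map to a new sorted file."""
        if self.memory_map:
            self.files.append(sorted(self.memory_map.items()))
            self.memory_map = {}

    def major_compaction(self):
        """Merge all files into one, keeping the newest value per key."""
        merged = dict(self._merged_view(include_memory=False))
        self.files = [sorted(merged.items())] if merged else []

    def _merged_view(self, include_memory=True):
        # Sources are tagged with an age rank: a lower rank means newer data.
        sources = []
        if include_memory:
            sources.append(sorted(self.memory_map.items()))
        sources.extend(reversed(self.files))
        tagged = [[(k, rank, v) for k, v in src]
                  for rank, src in enumerate(sources)]
        last_key = object()
        for k, _rank, v in heapq.merge(*tagged):
            if k != last_key:       # first hit per key is the newest version
                yield k, v
                last_key = k

    def scan(self):
        return list(self._merged_view())


t = TinyTablet()
t.write("a", "1")
t.write("b", "1")
t.minor_compaction()
t.write("a", "2")
print(t.scan())
```

Here the scan sees the newer value for "a" from memory and the older value for "b" from the flushed file.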
WORD COUNT:
Summing Aggregating Iterator

Input Corpus
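
The idea behind a summing aggregating iterator can be sketched in a few lines of Python. This is an illustration only, not the Accumulo iterator API: the function names `ingest_words` and `summing_scan` are hypothetical and the input corpus is made up. Each word occurrence is written as a separate value of 1 with no read before the write, and the values are summed when the table is read.

```python
from collections import defaultdict


def ingest_words(table, corpus):
    """Write one ("word", 1) entry per occurrence; no read before write."""
    for word in corpus.lower().split():
        table[word].append(1)


def summing_scan(table):
    """Apply a summing combiner at read time: collapse each key's values."""
    for word in sorted(table):
        yield word, sum(table[word])


table = defaultdict(list)
ingest_words(table, "the quick brown fox jumps over the lazy dog the end")
counts = dict(summing_scan(table))
print(counts["the"])
```

Because ingest never reads existing counts, writers do not have to coordinate with one another; the aggregation cost is paid at scan or compaction time instead.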

ITERATOR FRAMEWORK
Iterator Operations:
- File Reads
- Block Caching
- Merging
- Deletion
- Isolation
- Locality Groups
- Range Selection
- Column Selection
- Cell-level Security
- Versioning
- Filtering
- Aggregation
- Partitioned Joins
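
Several of the operations listed above compose as a stack, with each iterator consuming the sorted output of the one below it. The following Python sketch shows that composition with generators. The function names and the sample data are hypothetical, and real Accumulo iterators are Java classes with a seek-based interface, so treat this only as a picture of the layering (versioning, then filtering).

```python
from itertools import groupby


def versioning_iterator(source, max_versions=1):
    """Keep only the newest `max_versions` entries for each (row, column)."""
    for _cell, versions in groupby(source, key=lambda kv: kv[0][:2]):
        for i, entry in enumerate(versions):
            if i < max_versions:
                yield entry


def filter_iterator(source, predicate):
    """Pass through only the entries accepted by `predicate`."""
    return (entry for entry in source if predicate(entry))


# Entries are ((row, column, timestamp), value), already sorted the way a
# tablet would present them: by row and column, newest timestamp first.
sorted_entries = [
    (("bill", "age", 2), "49"),
    (("bill", "age", 1), "48"),
    (("bill", "discount", 1), "40%"),
    (("george", "age", 1), "27"),
]

stack = filter_iterator(
    versioning_iterator(iter(sorted_entries)),
    lambda entry: entry[0][1] == "age",
)
result = list(stack)
print(result)
```

Each layer is lazy, so entries stream through the stack one at a time without materializing intermediate results.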

ACCUMULO LATENCIES
[Diagram: ingest and query pipeline with approximate latencies]
Ingest path: Input, Ingesters (Batch Writer), Tablet Servers (In-Memory Maps)
Storage path: Compaction Iterators, RFiles
Query path: Scan Iterators, Queriers (Scanner / Batch Scanner), Output
Latency labels shown: ~ms (three times), ms - min (once)

ACCUMULO THROUGHPUT
Scan: up to 1M entries/s per node
Ingest: up to 500K entries/s per node
Read-Modify-Write Latency: ~ms
>1K entries/s is challenging with R-M-W (at roughly 1 ms per round trip, a single serial writer tops out near 1,000 updates/s)
[Diagram: the ingest and query pipeline from the ACCUMULO LATENCIES slide]

SQRRL ENTERPRISE
Built on Apache Accumulo
[Diagram: layered architecture, top to bottom]
Bulk Processing Integration; Exploratory / Operational Apps; Graph + Document I/O
Sqrrl API over Apache Thrift RPC (JSON, Graph, Aggregation, Search, etc.)
Sqrrl Server
• Sqrrl proprietary
• Automated indexing
• Custom iterators
• Lucene integration
• Security extensions
Accumulo RPC (Sorted Key/Value I/O)
• Open source (including Sqrrl contributions)
Hadoop RPC (File I/O)
• Open source or commercial distributions

DATA-CENTRIC SECURITY
Definition: Data carries with it the information required to make policy decisions about its releasability.
[Diagram: User 1, User 2, and Sqrrl/Accumulo]

SECURITY
Example Accumulo Key/Value Pairs

Accumulo is the only NoSQL database with cell-level access controls (as of August 2013)

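
Cell-level access control comes down to evaluating each entry's visibility label against the authorizations a user holds. The Python sketch below evaluates a simplified label grammar: terms joined by `&` (and) or `|` (or), with parentheses required when the two are mixed. The function names and the sample labels are hypothetical, and this is not Accumulo's own parser, only a minimal sketch of the check.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z0-9_\-:./]+|[&|()])")


def tokenize(expression):
    tokens, pos = [], 0
    expression = expression.strip()
    while pos < len(expression):
        match = TOKEN.match(expression, pos)
        if not match:
            raise ValueError(f"bad character at {pos} in {expression!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expression, authorizations):
    """Return True if a user holding `authorizations` may read the cell."""
    tokens = tokenize(expression)
    if not tokens:
        return True  # an empty label is readable by everyone
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected {token!r}")
        return token in authorizations

    def parse_expression():
        nonlocal pos
        value = parse_term()
        operator = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if operator is None:
                operator = tokens[pos]
            elif tokens[pos] != operator:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            right = parse_term()
            value = (value and right) if operator == "&" else (value or right)
        return value

    result = parse_expression()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


print(is_visible("admin|(analyst&us)", {"analyst", "us"}))
print(is_visible("admin&audit", {"admin"}))
```

A scan would apply this check to every entry and silently drop the ones that fail, so two users issuing the same query can see different results.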
DATA-CENTRIC SECURITY ECOSYSTEM
[Diagram: components surrounding Sqrrl Enterprise]
Components: Key Mgmt, Audits, End Users, Labeler, Policies, Data, Policy Engine, Apps, Auth. Service, User Attributes

HIERARCHICAL DECOMPOSITION
Row:               <person>
Column Family:     attribute            purchases   returns
Column Qualifier:  age      discount    sneakers    hat
Value:             <age>    <rate>      <cost>      <cost>

MATERIALIZED TABLE
Each line below is one Key/Value Pair:

Row      Column Family   Column Qualifier   Value
bill     attribute       age                49
bill     attribute       discount           40%
bill     purchases       sneakers           $100
george   attribute       age                27
george   purchases       sneakers           $83

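
The step from a hierarchical record to the materialized table can be sketched as a flattening of nested maps into sorted key/value pairs. The Python below is a minimal illustration using the bill and george values from the slide; the function name `decompose` and the nested-dictionary input shape are assumptions made for this example.

```python
def decompose(people):
    """Flatten {row: {family: {qualifier: value}}} into sorted key/value pairs."""
    pairs = []
    for row, families in people.items():
        for family, qualifiers in families.items():
            for qualifier, value in qualifiers.items():
                pairs.append(((row, family, qualifier), value))
    return sorted(pairs)


people = {
    "george": {"attribute": {"age": "27"}, "purchases": {"sneakers": "$83"}},
    "bill": {
        "attribute": {"age": "49", "discount": "40%"},
        "purchases": {"sneakers": "$100"},
    },
}

pairs = decompose(people)
for key, value in pairs:
    print(*key, value)
```

Sorting by (row, column family, column qualifier) places each person's entries next to each other, which is what lets a single row scan retrieve the whole record.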
FORWARD AND INVERTED INDEX
Table:             Forward Index   Inverted Index
Row:               <UUID>          <Term>
Column Family:     <Type>          <UUID>
Column Qualifier:  <Field>         <Type+Field>
Value:             <Term>          <Digest of Event>

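
A small Python sketch can show how one event turns into entries for both tables and how a term lookup uses the inverted index. The function names, the sample events, the digest strings, and the literal "+" separator used to build the Type+Field qualifier are all assumptions for illustration; only the column layout comes from the slide.

```python
def index_event(uuid, event_type, fields, digest):
    """Return (forward_entries, inverted_entries) for one event.

    Each entry is (row, column_family, column_qualifier, value), following
    the layout on the slide. The "+" separator is an assumption.
    """
    forward, inverted = [], []
    for field, term in fields.items():
        forward.append((uuid, event_type, field, term))
        inverted.append((term, uuid, f"{event_type}+{field}", digest))
    return forward, inverted


def lookup(inverted_index, term):
    """Find the UUIDs of events containing `term` via the inverted index."""
    return sorted({cf for row, cf, _cq, _value in inverted_index if row == term})


# Hypothetical events: (uuid, type, fields, digest).
events = [
    ("uuid-1", "login", {"user": "bill", "host": "web01"}, "bill@web01"),
    ("uuid-2", "login", {"user": "george", "host": "web01"}, "george@web01"),
]

forward_index, inverted_index = [], []
for event in events:
    fwd, inv = index_event(*event)
    forward_index += fwd
    inverted_index += inv
forward_index.sort()
inverted_index.sort()

print(lookup(inverted_index, "web01"))
```

Since the term is the row of the inverted index, a real table would answer this lookup with a single row scan instead of the linear pass used here.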
CUSTOM INDEXING
Table:             Geo Index
Row:               <GeoHash>
Column Family:     <Event Type>
Column Qualifier:  <UUID>
Value:             <Digest of Event>

GeoHash example (the bits of each dimension are interleaved):
Latitude:   10110101001
Longitude:  00111010010
Depth:      11010110110
<GeoHash>:  101001110111010101011100001011100

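
The GeoHash row on this slide is consistent with a bitwise interleave of three 11-bit strings (10110101001, 00111010010, and 11010110110) for latitude, longitude, and depth. The Python sketch below reproduces the 33-bit value from those inputs. The function name `interleave_bits` is hypothetical, and how the original system quantizes coordinates into bits is not stated in the source, so the sketch starts from the bit strings themselves.

```python
def interleave_bits(*dimensions):
    """Interleave equal-length bit strings, taking one bit from each in turn."""
    if len({len(d) for d in dimensions}) != 1:
        raise ValueError("all dimensions must have the same number of bits")
    return "".join("".join(bits) for bits in zip(*dimensions))


latitude = "10110101001"
longitude = "00111010010"
depth = "11010110110"

geohash = interleave_bits(latitude, longitude, depth)
print(geohash)
```

Interleaving puts the most significant bit of every dimension first, so points that are close in all dimensions tend to share a long row prefix and can be fetched with a small number of range scans.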
D4M 2.0 SCHEMA FOR TWITTER DATA
Table:             Tedge
Row:               <UUID>
Column Family:     “stat”   “time”   “user”   “word”
Column Qualifier:  <stat>   <time>   <user>   <word>
Value:             “1”      “1”      “1”      “1”

Table:             TedgeT
Row:               <value>
Column Family:     “stat”   “time”   “user”   “word”
Column Qualifier:  <UUID>   <UUID>   <UUID>   <UUID>
Value:             “1”      “1”      “1”      “1”

D4M 2.0 SCHEMA FOR TWITTER DATA
Table:             TedgeDegT
Row:               <value>
Column Family:     “stat”     “time”     “user”     “word”
Column Qualifier:  “degree”   “degree”   “degree”   “degree”
Value:             <count>    <count>    <count>    <count>

Table:             Ttext
Row:               <UUID>
Column Family:     “text”
Column Qualifier:  -
Value:             <text>

D4M 2.0 SCHEMA FOR TWITTER DATA

Source: D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database, Kepner et al., HPEC 2013

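
To show how one record fans out across the four tables of this schema, here is a minimal Python sketch. It follows the row, column family, and column qualifier layout shown on these slides; the function name `explode`, the sample tweets, and the use of an in-memory `Counter` in place of a server-side summing combiner for the degree table are assumptions for illustration.

```python
from collections import Counter


def explode(uuid, record, text):
    """Build entries for Tedge, TedgeT, degree counts, and Ttext.

    Entries are (row, column_family, column_qualifier, value).
    """
    tedge, tedge_t, ttext = [], [], []
    degrees = Counter()
    for field, values in record.items():
        for value in values:
            tedge.append((uuid, field, value, "1"))     # record -> values
            tedge_t.append((value, field, uuid, "1"))   # value -> records
            degrees[(value, field)] += 1                 # how often a value occurs
    ttext.append((uuid, "text", "", text))               # raw text, stored once
    return tedge, tedge_t, degrees, ttext


# Hypothetical tweets: uuid -> (parsed fields, raw text).
tweets = {
    "0001": ({"user": ["alice"], "word": ["hello", "world"]}, "hello world"),
    "0002": ({"user": ["bob"], "word": ["hello"]}, "hello"),
}

tedge, tedge_t, ttext = [], [], []
degree_counts = Counter()
for uuid, (record, text) in tweets.items():
    e, et, deg, txt = explode(uuid, record, text)
    tedge += e
    tedge_t += et
    ttext += txt
    degree_counts += deg

tedge_deg_t = [(value, field, "degree", str(n))
               for (value, field), n in sorted(degree_counts.items())]
print(tedge_deg_t)
```

The degree table lets a query planner check how common a value is before scanning the transpose table for it.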
ACCUMULO WITH D4M 2.0 SCHEMA PERFORMANCE
Maximizing throughput on an 8-node, 192-core cluster:
[Chart: throughput results; figure not preserved]

Source: D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database, Kepner et al., HPEC 2013

ACCUMULO SCALABILITY: GRAPH500 BENCHMARK

source: http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf
ATOMIC INCREMENT PERFORMANCE COMPARISON
Read/Modify/Write (HBase) vs. Iterators/Combiners (Accumulo)

QUESTIONS?

Adam Fuchs, CTO
Sqrrl Data, Inc.


Contenu connexe

Tendances

Hunk: Splunk Analytics for Hadoop
Hunk: Splunk Analytics for HadoopHunk: Splunk Analytics for Hadoop
Hunk: Splunk Analytics for HadoopGeorg Knon
 
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & AlluxioAlluxio, Inc.
 
BlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData, Inc.
 
December 2013 HUG: Hunk - Splunk over Hadoop
December 2013 HUG: Hunk - Splunk over HadoopDecember 2013 HUG: Hunk - Splunk over Hadoop
December 2013 HUG: Hunk - Splunk over HadoopYahoo Developer Network
 
Hadoop : A Foundation for Change - Milind Bhandarkar Chief Scientist, Pivotal
Hadoop : A Foundation for Change - Milind Bhandarkar Chief Scientist, PivotalHadoop : A Foundation for Change - Milind Bhandarkar Chief Scientist, Pivotal
Hadoop : A Foundation for Change - Milind Bhandarkar Chief Scientist, PivotalBigDataCloud
 
DSD-INT 2015 - RSS Sentinel Toolbox - J. Manuel Delgado Blasco
DSD-INT 2015 - RSS Sentinel Toolbox - J. Manuel Delgado BlascoDSD-INT 2015 - RSS Sentinel Toolbox - J. Manuel Delgado Blasco
DSD-INT 2015 - RSS Sentinel Toolbox - J. Manuel Delgado BlascoDeltares
 
Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Enabling Exploratory Analytics of Data in Shared-service Hadoop ClustersEnabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Enabling Exploratory Analytics of Data in Shared-service Hadoop ClustersDataWorks Summit
 
How to go into production your machine learning models? #CWT2017
How to go into production your machine learning models? #CWT2017How to go into production your machine learning models? #CWT2017
How to go into production your machine learning models? #CWT2017Cloudera Japan
 
Deep recurrent neutral networks for Sequence Learning in Spark
Deep recurrent neutral networks for Sequence Learning in SparkDeep recurrent neutral networks for Sequence Learning in Spark
Deep recurrent neutral networks for Sequence Learning in SparkDataWorks Summit/Hadoop Summit
 
Differential data processing for energy efficiency of wireless sensor networks
Differential data processing for energy efficiency of wireless sensor networksDifferential data processing for energy efficiency of wireless sensor networks
Differential data processing for energy efficiency of wireless sensor networksDaniel Lim
 
PHISSUG S01E02: 99 way your data could die
PHISSUG S01E02: 99 way your data could diePHISSUG S01E02: 99 way your data could die
PHISSUG S01E02: 99 way your data could dieArgelo Royce Bautista
 
SplunkLive! Hunk Technical Deep Dive
SplunkLive! Hunk Technical Deep DiveSplunkLive! Hunk Technical Deep Dive
SplunkLive! Hunk Technical Deep DiveSplunk
 
Robert Meyer- pypet
Robert Meyer- pypetRobert Meyer- pypet
Robert Meyer- pypetPyData
 
Dr. Andreas Lattner- Setting up predictive services with Palladium
Dr. Andreas Lattner- Setting up predictive services with PalladiumDr. Andreas Lattner- Setting up predictive services with Palladium
Dr. Andreas Lattner- Setting up predictive services with PalladiumPyData
 
Elster falch-gpu-cse-sem-oct2013
Elster falch-gpu-cse-sem-oct2013Elster falch-gpu-cse-sem-oct2013
Elster falch-gpu-cse-sem-oct2013Anne Elster
 
Approximation algorithms for stream and batch processing
Approximation algorithms for stream and batch processingApproximation algorithms for stream and batch processing
Approximation algorithms for stream and batch processingGabriele Modena
 
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataThe Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataRobert Grossman
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data Visualizationbigdataviz_bay
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech ProjectsJody Garnett
 

Tendances (20)

Hunk: Splunk Analytics for Hadoop
Hunk: Splunk Analytics for HadoopHunk: Splunk Analytics for Hadoop
Hunk: Splunk Analytics for Hadoop
 
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio
 
BlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for Hadoop
 
December 2013 HUG: Hunk - Splunk over Hadoop
December 2013 HUG: Hunk - Splunk over HadoopDecember 2013 HUG: Hunk - Splunk over Hadoop
December 2013 HUG: Hunk - Splunk over Hadoop
 
Phissug s01 ep6, stretch database
Phissug s01 ep6, stretch databasePhissug s01 ep6, stretch database
Phissug s01 ep6, stretch database
 
Hadoop : A Foundation for Change - Milind Bhandarkar Chief Scientist, Pivotal
Hadoop : A Foundation for Change - Milind Bhandarkar Chief Scientist, PivotalHadoop : A Foundation for Change - Milind Bhandarkar Chief Scientist, Pivotal
Hadoop : A Foundation for Change - Milind Bhandarkar Chief Scientist, Pivotal
 
DSD-INT 2015 - RSS Sentinel Toolbox - J. Manuel Delgado Blasco
DSD-INT 2015 - RSS Sentinel Toolbox - J. Manuel Delgado BlascoDSD-INT 2015 - RSS Sentinel Toolbox - J. Manuel Delgado Blasco
DSD-INT 2015 - RSS Sentinel Toolbox - J. Manuel Delgado Blasco
 
Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Enabling Exploratory Analytics of Data in Shared-service Hadoop ClustersEnabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
 
How to go into production your machine learning models? #CWT2017
How to go into production your machine learning models? #CWT2017How to go into production your machine learning models? #CWT2017
How to go into production your machine learning models? #CWT2017
 
Deep recurrent neutral networks for Sequence Learning in Spark
Deep recurrent neutral networks for Sequence Learning in SparkDeep recurrent neutral networks for Sequence Learning in Spark
Deep recurrent neutral networks for Sequence Learning in Spark
 
Differential data processing for energy efficiency of wireless sensor networks
Differential data processing for energy efficiency of wireless sensor networksDifferential data processing for energy efficiency of wireless sensor networks
Differential data processing for energy efficiency of wireless sensor networks
 
PHISSUG S01E02: 99 way your data could die
PHISSUG S01E02: 99 way your data could diePHISSUG S01E02: 99 way your data could die
PHISSUG S01E02: 99 way your data could die
 
SplunkLive! Hunk Technical Deep Dive
SplunkLive! Hunk Technical Deep DiveSplunkLive! Hunk Technical Deep Dive
SplunkLive! Hunk Technical Deep Dive
 
Robert Meyer- pypet
Robert Meyer- pypetRobert Meyer- pypet
Robert Meyer- pypet
 
Dr. Andreas Lattner- Setting up predictive services with Palladium
Dr. Andreas Lattner- Setting up predictive services with PalladiumDr. Andreas Lattner- Setting up predictive services with Palladium
Dr. Andreas Lattner- Setting up predictive services with Palladium
 
Elster falch-gpu-cse-sem-oct2013
Elster falch-gpu-cse-sem-oct2013Elster falch-gpu-cse-sem-oct2013
Elster falch-gpu-cse-sem-oct2013
 
Approximation algorithms for stream and batch processing
Approximation algorithms for stream and batch processingApproximation algorithms for stream and batch processing
Approximation algorithms for stream and batch processing
 
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataThe Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data Visualization
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech Projects
 

Similaire à Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

Sqrrl Overview for Stac Research
Sqrrl Overview for Stac ResearchSqrrl Overview for Stac Research
Sqrrl Overview for Stac ResearchSqrrl
 
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...Matt Stubbs
 
HBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart MeterHBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart MeterCloudera, Inc.
 
Expand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big DataExpand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big Datajdijcks
 
Sqrrl February Webinar: Breaking Down Data Silos
Sqrrl February Webinar: Breaking Down Data SilosSqrrl February Webinar: Breaking Down Data Silos
Sqrrl February Webinar: Breaking Down Data SilosSqrrl
 
Big data oracle_introduccion
Big data oracle_introduccionBig data oracle_introduccion
Big data oracle_introduccionFran Navarro
 
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera
Cloudera, Inc.
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshopFang Mac
 
Pivotal HD and Spring for Apache Hadoop
Pivotal HD and Spring for Apache HadoopPivotal HD and Spring for Apache Hadoop
Pivotal HD and Spring for Apache Hadoopmarklpollack
 
Accumulo meetup 20130109
Accumulo meetup 20130109Accumulo meetup 20130109
Accumulo meetup 20130109Sqrrl
 
Solution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big DataSolution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big DataInfiniteGraph
 
Soa12c launch 5 event processing shmakov eng cr
Soa12c launch 5 event processing shmakov eng crSoa12c launch 5 event processing shmakov eng cr
Soa12c launch 5 event processing shmakov eng crVasily Demin
 
Part 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchPart 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchCloudera, Inc.
 
Splunk Enterprise 6.3 - Splunk Tech Day
Splunk Enterprise 6.3 - Splunk Tech DaySplunk Enterprise 6.3 - Splunk Tech Day
Splunk Enterprise 6.3 - Splunk Tech DayZivaro Inc
 
JoTechies - Azure SQL DB
JoTechies - Azure SQL DBJoTechies - Azure SQL DB
JoTechies - Azure SQL DBJoTechies
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
SplunkLive! Zurich 2018: Integrating Metrics and Logs
SplunkLive! Zurich 2018: Integrating Metrics and LogsSplunkLive! Zurich 2018: Integrating Metrics and Logs
SplunkLive! Zurich 2018: Integrating Metrics and LogsSplunk
 

Similaire à Adam Fuchs' Accumulo Talk at NoSQL Now! 2013 (20)

Sqrrl Overview for Stac Research
Sqrrl Overview for Stac ResearchSqrrl Overview for Stac Research
Sqrrl Overview for Stac Research
 
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
 
HBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart MeterHBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart Meter
 
Expand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big DataExpand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big Data
 
Sqrrl February Webinar: Breaking Down Data Silos
Sqrrl February Webinar: Breaking Down Data SilosSqrrl February Webinar: Breaking Down Data Silos
Sqrrl February Webinar: Breaking Down Data Silos
 
Big data oracle_introduccion
Big data oracle_introduccionBig data oracle_introduccion
Big data oracle_introduccion
 
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera

 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshop
 
Pivotal HD and Spring for Apache Hadoop
Pivotal HD and Spring for Apache HadoopPivotal HD and Spring for Apache Hadoop
Pivotal HD and Spring for Apache Hadoop
 
Accumulo meetup 20130109
Accumulo meetup 20130109Accumulo meetup 20130109
Accumulo meetup 20130109
 
Sqrrl and Accumulo
Sqrrl and AccumuloSqrrl and Accumulo
Sqrrl and Accumulo
 
Solution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big DataSolution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big Data
 
Soa12c launch 5 event processing shmakov eng cr
Soa12c launch 5 event processing shmakov eng crSoa12c launch 5 event processing shmakov eng cr
Soa12c launch 5 event processing shmakov eng cr
 
Part 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchPart 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science Workbench
 
Splunk Enterprise 6.3 - Splunk Tech Day
Splunk Enterprise 6.3 - Splunk Tech DaySplunk Enterprise 6.3 - Splunk Tech Day
Splunk Enterprise 6.3 - Splunk Tech Day
 
JoTechies - Azure SQL DB
JoTechies - Azure SQL DBJoTechies - Azure SQL DB
JoTechies - Azure SQL DB
 
Log I am your father
Log I am your fatherLog I am your father
Log I am your father
 
Greenplum Architecture
Greenplum ArchitectureGreenplum Architecture
Greenplum Architecture
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
SplunkLive! Zurich 2018: Integrating Metrics and Logs
SplunkLive! Zurich 2018: Integrating Metrics and LogsSplunkLive! Zurich 2018: Integrating Metrics and Logs
SplunkLive! Zurich 2018: Integrating Metrics and Logs
 

Plus de Sqrrl

Transitioning Government Technology
Transitioning Government TechnologyTransitioning Government Technology
Transitioning Government TechnologySqrrl
 
Leveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your HuntsLeveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your HuntsSqrrl
 
How to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkHow to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkSqrrl
 
Machine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedMachine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedSqrrl
 
Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Sqrrl
 
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior GraphUser and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior GraphSqrrl
 
Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)Sqrrl
 
Sqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl and IBM: Threat Hunting for QRadar UsersSqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl and IBM: Threat Hunting for QRadar UsersSqrrl
 
Threat Hunting for Command and Control Activity
Threat Hunting for Command and Control ActivityThreat Hunting for Command and Control Activity
Threat Hunting for Command and Control ActivitySqrrl
 
Modernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led TrainingModernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led TrainingSqrrl
 
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together Sqrrl
 
Leveraging DNS to Surface Attacker Activity
Leveraging DNS to Surface Attacker ActivityLeveraging DNS to Surface Attacker Activity
Leveraging DNS to Surface Attacker ActivitySqrrl
 
The Art and Science of Alert Triage
The Art and Science of Alert TriageThe Art and Science of Alert Triage
The Art and Science of Alert TriageSqrrl
 
Reducing Mean Time to Know
Reducing Mean Time to KnowReducing Mean Time to Know
Reducing Mean Time to KnowSqrrl
 
Sqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl Enterprise: Big Data Security Analytics Use CaseSqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl Enterprise: Big Data Security Analytics Use CaseSqrrl
 
The Linked Data Advantage
The Linked Data AdvantageThe Linked Data Advantage
The Linked Data AdvantageSqrrl
 
Sqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl Enterprise: Integrate, Explore, AnalyzeSqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl Enterprise: Integrate, Explore, AnalyzeSqrrl
 
Sqrrl Datasheet: Cyber Hunting
Sqrrl Datasheet: Cyber HuntingSqrrl Datasheet: Cyber Hunting
Sqrrl Datasheet: Cyber HuntingSqrrl
 
Benchmarking The Apache Accumulo Distributed Key–Value Store
Benchmarking The Apache Accumulo Distributed Key–Value StoreBenchmarking The Apache Accumulo Distributed Key–Value Store
Benchmarking The Apache Accumulo Distributed Key–Value StoreSqrrl
 
Scalable Graph Clustering with Pregel
Scalable Graph Clustering with PregelScalable Graph Clustering with Pregel
Scalable Graph Clustering with PregelSqrrl
 

Plus de Sqrrl (20)

Transitioning Government Technology
Transitioning Government TechnologyTransitioning Government Technology
Transitioning Government Technology
 
Leveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your HuntsLeveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your Hunts
 
How to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkHow to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your Network
 
Machine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedMachine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting Started
 
Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)
 
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior GraphUser and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
 
Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)Threat Hunting Platforms (Collaboration with SANS Institute)



Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

  • 1. Securely explore your data. SQRRL ENTERPRISE + APACHE ACCUMULO: A secure, scalable, real-time analysis framework. Adam Fuchs, CTO, Sqrrl Data, Inc. August 21, 2013
  • 2. OUTLINE: Two Halves of “Real-Time”; Accumulo and Sqrrl Technology; Data-Centric Security; Table Designs; Performance Benchmarks
  • 3. TWO HALVES OF REAL-TIME
      - Data-Driven Real-Time → reduce event-to-reaction time
      - Query-Driven Real-Time → reduce ingest-to-query latency
  • 4. Data-Driven + Query-Driven Real-Time Ecosystem
      [Diagram: Data, SPE, NoSQL+, Actions, Dashboards, and Interactive Analysis Tools (Discovery + Forensics), connected by six numbered arrows]
      1. SPE queries NoSQL to enrich streaming data
      2. SPE persists results in NoSQL for future query
      3. SPE takes action automatically
      4. SPE issues data-driven alerts
      5. Sqrrl provides context for dashboards
      6. Analysis tools use Sqrrl to search and manipulate historical data
  • 5. This talk focuses on the database. [Same ecosystem diagram and numbered steps as slide 4]
  • 6. OUTLINE: Two Halves of “Real-Time”; Accumulo and Sqrrl Technology; Data-Centric Security; Table Designs; Performance Benchmarks
  • 7. ACCUMULO DATA FORMAT
      An Accumulo key is a 5-tuple, consisting of:
      - Row: Controls Atomicity
      - Column Family: Controls Locality
      - Column Qualifier: Controls Uniqueness
      - Visibility Label: Controls Access
      - Timestamp: Controls Versioning
      [Figure: Accumulo Key/Value Example]
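The 5-tuple on slide 7 can be sketched as a small Python type. This is only an illustration, not Accumulo's client API: the `Key` class, the `sort_key` helper, and the sample labels `public` and `private` are hypothetical. It relies on Accumulo's documented ordering (row, column family, column qualifier, and visibility ascending, timestamp descending), so the newest version of a cell sorts first.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical stand-in for the five components of an Accumulo key."""
    row: str
    column_family: str
    column_qualifier: str
    visibility: str
    timestamp: int


def sort_key(key: Key):
    # Ascending on the first four fields, descending on timestamp,
    # so the most recent version of a cell comes first.
    return (key.row, key.column_family, key.column_qualifier,
            key.visibility, -key.timestamp)


# Sample entries are illustrative only.
entries = [
    (Key("bill", "purchases", "sneakers", "public", 1), "$100"),
    (Key("bill", "attribute", "age", "private", 1), "48"),
    (Key("bill", "attribute", "age", "private", 2), "49"),
]
ordered = sorted(entries, key=lambda entry: sort_key(entry[0]))
values_in_order = [value for _, value in ordered]
```

Real Accumulo compares raw bytes; Python string comparison gives the same order only for plain ASCII data such as the sample above.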
  • 8. ACCUMULO TABLETS
      - Collections of KV pairs form Tables
      - Tables are partitioned into Tablets
      - Metadata tablets hold info about other tablets, forming a 3-level hierarchy
      - A Tablet is a unit of work for a Tablet Server
      [Diagram: tablet hierarchy]
      - Well-Known Location (zookeeper) → Root Tablet (-∞ to ∞)
      - Metadata Tablet 1 (-∞ to “Encyclopedia:Ocelot”); Metadata Tablet 2 (“Encyclopedia:Ocelot” to ∞)
      - Table: Adam’s Table → Data Tablets (-∞ : thing), (thing : ∞)
      - Table: Encyclopedia → Data Tablets (-∞ : Ocelot), (Ocelot : Yak), (Yak : ∞)
      - Table: Foo → Data Tablet (-∞ to ∞)
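How a row maps to one of the Encyclopedia tablets on slide 8 can be sketched with a binary search over the split points. The function names are hypothetical, and the sketch assumes each tablet's range excludes its lower bound and includes its upper bound (Accumulo's convention; the slide itself does not say which end is inclusive).

```python
from bisect import bisect_left


def tablet_index(split_points, row):
    """Index of the tablet holding `row`, given sorted split points.

    Assumes tablet i covers (split_points[i-1], split_points[i]],
    with the first and last tablets open toward -inf and +inf.
    """
    return bisect_left(split_points, row)


def tablet_range(split_points, index):
    """Human-readable (low, high) bounds of tablet `index`."""
    low = split_points[index - 1] if index > 0 else "-inf"
    high = split_points[index] if index < len(split_points) else "+inf"
    return (low, high)


# Split points of the Encyclopedia table from the slide.
encyclopedia_splits = ["Ocelot", "Yak"]
penguin_tablet = tablet_range(
    encyclopedia_splits, tablet_index(encyclopedia_splits, "Penguin"))
```

The metadata tablets on the slide apply the same idea one level up: their rows are keyed by table name and end row, so locating a data tablet is itself a sorted lookup.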
  • 9. ACCUMULO PROCESSES
      [Diagram components: Zookeeper (x3), Master, Tablet Servers (x3, each hosting Tablets), Applications (x3), HDFS. Edge labels: Read/Write, Delegate Authority, Assign/Balance, Store/Replicate]
  • 10. TABLET DATA FLOW
      [Diagram of a Tablet]
      - Writes → In-Memory Map and Write Ahead Log (For Recovery)
      - Minor Compaction: In-Memory Map → Iterator Tree → Sorted, Indexed File
      - Merging / Major Compaction: Sorted, Indexed Files → Iterator Tree → Sorted, Indexed File
      - Scan: In-Memory Map and Sorted, Indexed Files → Iterator Tree → Reads
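The flow on slide 10 can be sketched as a toy log-structured store. This is a simplification under stated assumptions, not Accumulo's implementation: the `TabletSketch` class and its `flush_threshold` parameter are hypothetical, and the write-ahead log, deletes, and versioning are left out. Scans merge all sorted sources and let the newest source win for each key.

```python
import heapq


class TabletSketch:
    """Toy tablet: an in-memory map plus a list of sorted 'files'."""

    def __init__(self, flush_threshold=2):
        self.memory_map = {}   # newest data, unsorted
        self.files = []        # each file is a sorted list of (key, value)
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.memory_map[key] = value
        if len(self.memory_map) >= self.flush_threshold:
            self.minor_compact()

    def minor_compact(self):
        # Flush the in-memory map to a new sorted file.
        if self.memory_map:
            self.files.append(sorted(self.memory_map.items()))
            self.memory_map = {}

    def major_compact(self):
        # Merge all files into one, keeping the newest value per key.
        if self.files:
            self.files = [list(self._merge(self.files))]

    def scan(self):
        sources = self.files + [sorted(self.memory_map.items())]
        return list(self._merge(sources))

    @staticmethod
    def _merge(sources):
        # Later sources are newer; tagging with -age makes the newest
        # entry for a key come first in the merged stream.
        tagged = [[(key, -age, value) for key, value in source]
                  for age, source in enumerate(sources)]
        last_key = object()
        for key, _, value in heapq.merge(*tagged):
            if key != last_key:
                yield key, value
                last_key = key


tablet = TabletSketch(flush_threshold=2)
tablet.write("a", "1")
tablet.write("b", "2")   # reaches the threshold, triggers a minor compaction
tablet.write("a", "3")   # newer value, still in memory
scan_before_compaction = tablet.scan()
```

In Accumulo the merge step is where the iterator tree runs, which is why the same iterators can apply at scan time and at compaction time.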
  • 11. WORD COUNT: Summing Aggregating Iterator [Figure: Input Corpus]
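The word-count idea on slide 11 can be sketched as follows: ingest writes one `(word, 1)` entry per occurrence, and a summing step collapses adjacent entries that share a key. The function names and the sample corpus are hypothetical; a real Accumulo combiner runs inside the tablet server's iterator stack rather than in client code.

```python
from itertools import groupby


def summing_combiner(sorted_entries):
    """Collapse runs of equal keys in a sorted stream by summing values."""
    for key, group in groupby(sorted_entries, key=lambda entry: entry[0]):
        yield key, sum(count for _, count in group)


def word_count(corpus):
    # Ingest side: one (word, 1) entry per occurrence, kept in sorted order
    # the way a tablet would store them.
    entries = sorted((word, 1) for line in corpus for word in line.split())
    return dict(summing_combiner(entries))


# Illustrative corpus, not taken from the slide.
counts = word_count(["the quick brown fox", "the lazy dog", "the fox"])
```

Because the input is sorted, the combiner only ever looks at adjacent entries, so it can run as a streaming pass during scans and compactions.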
  • 12. ITERATOR FRAMEWORK
      Iterator Operations:
      - File Reads
      - Block Caching
      - Merging
      - Deletion
      - Isolation
      - Locality Groups
      - Range Selection
      - Column Selection
      - Cell-level Security
      - Versioning
      - Filtering
      - Aggregation
      - Partitioned Joins
  • 13. ACCUMULO LATENCIES
      [Diagram: Input → Ingesters → Batch Writer → Tablet Servers (In-Memory Map, Scan Iterators, Compaction Iterators, RFiles) → Scanner / Batch Scanner → Queriers → Output. Latency labels on the diagram: ~ms (three places), ms - min (one place)]
  • 14. ACCUMULO THROUGHPUT
      - Scan: up to 1M entries/s per node
      - Ingest: up to 500K entries/s per node
      - Read-Modify-Write Latency: ~ms → >1K entries/s challenging with R-M-W
      [Same ingest/query diagram and latency labels as slide 13]
  • 15. SQRRL ENTERPRISE (Built on Apache Accumulo)
      [Layered architecture diagram, top to bottom]
      - Exploratory / Operational Apps; Bulk Processing Integration
      - Sqrrl API over Apache Thrift RPC (JSON, Graph, Aggregation, Search, etc.): Graph + Document I/O
      - Sqrrl Server: Sqrrl proprietary; Automated indexing; Custom iterators; Lucene integration; Security extensions
      - Accumulo RPC (Sorted Key/Value I/O): Open source (including Sqrrl contributions)
      - Hadoop RPC (File I/O): Open source or commercial distributions
  • 16. OUTLINE: Two Halves of “Real-Time”; Accumulo and Sqrrl Technology; Data-Centric Security; Table Designs; Performance Benchmarks
  • 17. DATA-CENTRIC SECURITY
      Definition: Data carries with it information that is required to make policy decisions on its releasability.
      [Diagram: User 1 and User 2 accessing Sqrrl/Accumulo]
  • 18. SECURITY
      [Figure: Example Accumulo Key/Value Pairs]
      Accumulo is the only NoSQL database with cell-level access controls
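Cell-level access control on slide 18 comes down to evaluating each cell's visibility label against the authorizations a user holds. The sketch below is a minimal evaluator, not Accumulo's `ColumnVisibility` class: the function name and the sample labels are hypothetical, label characters are restricted to letters, digits, and underscores, and any other character is silently ignored. It follows Accumulo's rules that `&` and `|` may not be mixed without parentheses and that an empty label restricts nothing.

```python
import re

# Simplified tokenizer: labels, the two operators, and parentheses.
TOKEN = re.compile(r"[A-Za-z0-9_]+|[&|()]")
OPERATORS = ("&", "|")


def is_visible(expression, authorizations):
    """True if a user holding `authorizations` may read the labeled cell."""
    tokens = TOKEN.findall(expression)
    if not tokens:
        return True  # an empty label places no restriction on the cell
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token in OPERATORS or token == ")":
            raise ValueError(f"unexpected token {token!r}")
        return token in authorizations

    def parse_expression():
        nonlocal pos
        value = parse_term()
        operator = None
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            if operator is None:
                operator = tokens[pos]
            elif tokens[pos] != operator:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            right = parse_term()
            value = (value and right) if operator == "&" else (value or right)
        return value

    result = parse_expression()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


# Illustrative label and authorizations.
ops_admin_can_read = is_visible("admin&(audit|ops)", {"admin", "ops"})
```

In Accumulo this check runs server-side as part of the iterator stack (the "Cell-level Security" entry on slide 12), so cells a user may not see never leave the tablet server.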
  • 19. DATA-CENTRIC SECURITY ECOSYSTEM
      [Diagram components: Data, Labeler, Policies, Policy Engine, Sqrrl Enterprise, Key Mgmt, Audits, Auth. Service, User Attributes, Apps, End Users]
  • 20. OUTLINE: Two Halves of “Real-Time”; Accumulo and Sqrrl Technology; Data-Centric Security; Table Designs; Performance Benchmarks
  • 21. HIERARCHICAL DECOMPOSITION
      Row: <person>
        Column Family: attribute
          Column Qualifier: age → Value: <age>
          Column Qualifier: discount → Value: <rate>
        Column Family: purchases
          Column Qualifier: sneakers → Value: <cost>
        Column Family: returns
          Column Qualifier: hat → Value: <cost>
  • 22. MATERIALIZED TABLE (one line per Key/Value Pair)
      Row      Column Family   Column Qualifier   Value
      bill     attribute       age                49
      bill     attribute       discount           40%
      bill     purchases       sneakers           $100
      george   attribute       age                27
      george   purchases       sneakers           $83
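Slides 21 and 22 together describe flattening a nested record into sorted key/value pairs. A minimal sketch, using the sample values from slide 22; the `decompose` function and the nested-dictionary input shape are assumptions made for illustration.

```python
def decompose(row, record):
    """Flatten {family: {qualifier: value}} into (row, family, qualifier, value)."""
    for family, columns in sorted(record.items()):
        for qualifier, value in sorted(columns.items()):
            yield (row, family, qualifier, value)


# Values from slide 22, arranged hierarchically as on slide 21.
people = {
    "george": {"attribute": {"age": "27"}, "purchases": {"sneakers": "$83"}},
    "bill": {
        "attribute": {"age": "49", "discount": "40%"},
        "purchases": {"sneakers": "$100"},
    },
}

materialized = [
    entry for row in sorted(people) for entry in decompose(row, people[row])
]
```

Because the row sorts first, everything about one person is contiguous, and a single range scan over the row `bill` returns the whole record.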
  • 23. FORWARD AND INVERTED INDEX
      Table            Row      Column Family   Column Qualifier   Value
      Forward Index    <UUID>   <Type>          <Field>            <Term>
      Inverted Index   <Term>   <UUID>          <Type+Field>       <Digest of Event>
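The two layouts on slide 23 can be sketched by writing each event twice: once keyed by its UUID and once keyed by each term. The column assignment below follows one reading of the flattened slide (inverted index: row = term, column family = UUID, column qualifier = type plus field, value = digest), and the `:` separator, the function names, and the sample events are all hypothetical.

```python
def index_event(uuid, event_type, fields, digest):
    """Return (forward_entries, inverted_entries) for one event."""
    forward = [
        (uuid, event_type, field, term)
        for field, term in sorted(fields.items())
    ]
    inverted = [
        (term, uuid, f"{event_type}:{field}", digest)
        for field, term in sorted(fields.items())
    ]
    return forward, inverted


def lookup(inverted_index, term):
    """UUIDs of events containing `term`: a scan over the rows equal to it."""
    return sorted({uuid for row, uuid, _, _ in inverted_index if row == term})


forward_index, inverted_index = [], []
sample_events = [
    ("uuid-1", "login", {"user": "bill", "host": "alpha"}, "bill@alpha"),
    ("uuid-2", "login", {"user": "george", "host": "alpha"}, "george@alpha"),
]
for uuid, event_type, fields, digest in sample_events:
    forward, inverted = index_event(uuid, event_type, fields, digest)
    forward_index.extend(forward)
    inverted_index.extend(inverted)
forward_index.sort()
inverted_index.sort()

alpha_events = lookup(inverted_index, "alpha")
```

Storing a digest of the event in the inverted index lets a query show a summary without a second lookup in the forward index.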
  • 24. FORWARD AND INVERTED INDEX [figure only]
  • 25. CUSTOM INDEXING
      Table: Geo Index
      - Row: <GeoHash>
      - Column Family: <Event Type>
      - Column Qualifier: <UUID>
      - Value: <Digest of Event>
      GeoHash example (the row interleaves the bits of the three dimensions):
      - Latitude: 10110101001
      - Longitude: 00111010010
      - Depth: 11010110110
      - Row: 101001110111010101011100001011100
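The bit strings on slide 25 are consistent with a row formed by taking one bit at a time from latitude, longitude, and depth in turn. The sketch below reproduces that interleaving; the `quantize` helper, which turns a number into a fixed-width bit string, is a hypothetical addition and not something the slide specifies.

```python
def interleave_bits(*dimensions):
    """Interleave equal-length bit strings, one bit from each in turn."""
    if len({len(bits) for bits in dimensions}) != 1:
        raise ValueError("all dimensions must have the same number of bits")
    return "".join("".join(column) for column in zip(*dimensions))


def quantize(value, low, high, bits):
    """Hypothetical helper: map value in [low, high] to a `bits`-wide string."""
    cells = 1 << bits
    index = min(cells - 1, int((value - low) / (high - low) * cells))
    return format(index, f"0{bits}b")


# Bit strings taken from the slide.
latitude = "10110101001"
longitude = "00111010010"
depth = "11010110110"
geo_row = interleave_bits(latitude, longitude, depth)
```

Interleaving keeps points that are close in every dimension close in sort order, so a bounding-box query becomes a small number of range scans over the row key.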
  • 26. D4M 2.0 SCHEMA FOR TWITTER DATA
      Table    Row       Column Family                    Column Qualifier                 Value
      Tedge    <UUID>    “stat”, “time”, “user”, “word”   <stat>, <time>, <user>, <word>   “1”
      TedgeT   <value>   “stat”, “time”, “user”, “word”   <UUID>                           “1”
  • 27. D4M 2.0 SCHEMA FOR TWITTER DATA
      Table       Row       Column Family                    Column Qualifier   Value
      TedgeDegT   <value>   “stat”, “time”, “user”, “word”   “degree”           <count>
      Ttext       <UUID>    “text”                           -                  <text>
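The four tables on slides 26 and 27 can be sketched by exploding each record into one entry per field/value pair. This follows the column layout as the slides show it; the function names, the list-of-pairs record shape, and the sample tweets are hypothetical. The degree counts are computed here in a single batch pass for simplicity rather than maintained incrementally at ingest.

```python
from collections import Counter


def explode(uuid, pairs, text):
    """Build Tedge, TedgeT and Ttext entries for one record.

    `pairs` is a list of (field, value) tuples; a field such as "word"
    may appear several times.
    """
    tedge = [(uuid, field, value, "1") for field, value in pairs]
    tedge_t = [(value, field, uuid, "1") for field, value in pairs]
    ttext = [(uuid, "text", "", text)]
    return tedge, tedge_t, ttext


def degree_table(tedge_t):
    """TedgeDegT: how many records contain each (value, field) pair."""
    counts = Counter((value, field) for value, field, _, _ in tedge_t)
    return [
        (value, field, "degree", str(count))
        for (value, field), count in sorted(counts.items())
    ]


# Illustrative tweets, not from the source.
tweets = [
    ("t1", [("user", "adam"), ("word", "accumulo"), ("word", "rocks")],
     "accumulo rocks"),
    ("t2", [("user", "bill"), ("word", "accumulo")], "accumulo"),
]
tedge, tedge_t, ttext = [], [], []
for uuid, pairs, text in tweets:
    edges, transposed, raw = explode(uuid, pairs, text)
    tedge += edges
    tedge_t += transposed
    ttext += raw

degrees = degree_table(tedge_t)
```

Consulting the degree table first lets a query start from its rarest term, which keeps the subsequent lookups in the transposed table small.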
  • 28. D4M 2.0 SCHEMA FOR TWITTER DATA [figure only]
      Source: D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database, Kepner et al., HPEC 2013
  • 29. OUTLINE: Two Halves of “Real-Time”; Accumulo and Sqrrl Technology; Data-Centric Security; Table Designs; Performance Benchmarks
  • 30. ACCUMULO WITH D4M 2.0 SCHEMA PERFORMANCE
      Maximizing throughput on an 8-node, 192-core cluster [chart not included in the transcript]
      Source: D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database, Kepner et al., HPEC 2013
  • 31. ACCUMULO SCALABILITY: GRAPH500 BENCHMARK [chart not included in the transcript]
      Source: http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf
  • 32. ATOMIC INCREMENT PERFORMANCE COMPARISON: Read/Modify/Write (HBase) vs. Iterators/Combiners (Accumulo) [chart not included in the transcript]
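The contrast behind slide 32 can be sketched with two toy counters. Neither class is HBase or Accumulo code, and this does not reproduce the benchmark; both classes and their `reads` bookkeeping are hypothetical. The point is structural: read-modify-write pays one read per increment, while the combiner style writes deltas blindly and sums them only when the value is read.

```python
class ReadModifyWriteCounter:
    """Each increment reads the current value before writing the new one."""

    def __init__(self):
        self.store = {}
        self.reads = 0

    def increment(self, key, delta=1):
        self.reads += 1                      # the read half of R-M-W
        current = self.store.get(key, 0)
        self.store[key] = current + delta

    def get(self, key):
        return self.store.get(key, 0)


class CombinerCounter:
    """Each increment is a blind write; a summing step runs at read time."""

    def __init__(self):
        self.deltas = []
        self.reads = 0

    def increment(self, key, delta=1):
        self.deltas.append((key, delta))     # no read needed

    def get(self, key):
        self.reads += 1
        return sum(delta for k, delta in self.deltas if k == key)


rmw = ReadModifyWriteCounter()
combiner = CombinerCounter()
for _ in range(1000):
    rmw.increment("hits")
    combiner.increment("hits")
reads_during_ingest = (rmw.reads, combiner.reads)
```

This mirrors slide 14's note that more than about 1K entries/s is challenging when every write must first wait on a read with millisecond latency.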
  • 33. QUESTIONS? Adam Fuchs, CTO, Sqrrl Data, Inc.