BUILDING FAST DATA APPLICATIONS
WITH STREAMING DATA
Ryan Betts, CTO
VoltDB
© 2014 VoltDB PROPRIETARY
AGENDA
•  Fast Data Application Patterns
•  Digging Deeper: Looking at the Data
•  Streaming Approach
•  DB Approach
•  Summary
[Diagram: a cycle of Collect, Explore, Analyze, Act]
Data leads to applications
Applications create more data
DATA ARCHITECTURE FOR FAST + BIG DATA

[Diagram: Enterprise Apps (CRM, ERP, etc.) feed a Data Lake (HDFS, etc.) through ETL. BIG DATA side: SQL on Hadoop, MapReduce, Exploratory Analytics, BI Reporting. FAST DATA side: a Fast Operational Database providing Ingest / Interactive, Streaming Analytics, Fast Serve Analytics and Decisioning, with Export to the Data Lake.]
IN THE BIG CORNER
Systems facilitating exploration and analytics of large
data sets
Example Technologies
Columnar OLAP warehouses
Hadoop Ecosystem
•  MapReduce
•  Hive, Pig
•  SQL.next: Impala, Drill, Shark
Example Applications
•  User segmentation & pre-scoring
•  Seasonal trending
•  Recommendation matrix calculations
•  Building search indexes
•  Data Science: statistical clustering, machine learning
IN THE FAST CORNER
Systems facilitating real time ingest, analytics and
decisions against incoming event feeds
Example Technologies
•  Streaming frameworks
•  VoltDB
Example Applications
•  Micro-personalization
•  Recommendation serving
•  Alerting/alarming
•  Operational monitoring
•  Data enrichment (ETL elimination)
•  High throughput authorization
•  Ex: API quota enforcement
REAL TIME SCORING EXAMPLE
OLAP / Hadoop
User segmentation model: calculated on Big Side and cached in Fast Side
Personalization requests
Score based responses
Game play events and scoring decisions exported to Big
FAST AND BIG IN COMBINATION
•  Fast Profile
•  In memory: user segmentation, GB to TB (300M+ rows)
•  10k to 1M+ requests/sec
•  99th percentile latency under 5 ms (99.999th percentile under 50 ms)
•  VoltDB export to Vertica

•  Big Profile
•  TB to PB of historical data
•  Columnar analytics for fast
reporting.
•  Real time ingest of historical
data (possibly via VoltDB)
•  Vertica UDX to VoltDB
TYPICAL FAST QUESTIONS
[Diagram labels: Hadoop, SQL OLAP, Fast]
•  Is the fast layer streaming?
•  It is often more like OLTP
•  How do the pieces communicate?
•  OLAP analytics from Big -> Fast
•  New events from Fast -> Big
•  Where do “analytics” belong?
•  Analytics with decisions: with Fast
•  Analytics against history: with Big
•  Are streaming frameworks equivalent?
•  Traditional SQL CEP (Esper)
•  Tuple DAGs (Storm)
•  Window processors on Hadoop (Spark)
THREE FAST DATA APPLICATION PATTERNS
•  Real-Time Analytics
•  Real-time analytics for operations
•  Real-time KPI measurement
•  Real-time analytics for apps

•  Data Pipelines
•  Streaming data enrichment
•  Sessionization / re-assembly
•  Correlation (by time, by location, by id)
•  Filtering
•  Pre-aggregation
•  Fast Request/Response
•  Mobile Authorization
•  Campaign Authorization
•  Fast API Quota Enforcement
•  Micro-Personalization
•  Recommendation Serving
DATA FLOWS
[Diagram: Ingest feeds three flows. Real Time Analytics; a Pipeline with Export to the Data Lake (HDFS/OLAP/Queue); and Request/Response using real time data.]
THE INPUT FEED IS ONLY A SMALL PART OF FAST DATA
Data | Temporality | Examples
Input Feed | Event Stream | Click stream, tick stream, sensors, metrics
Real-Time Analytic Results | Persistent/Queryable | Counters, streaming aggregates, time-series rollups
Event metadata | Persistent (Look-Ups) | Device version, location, user profiles, point of interest data
OLAP Analytics Used in Real-Time Decisions | Persistent (Look-Ups) | Scoring models, seasonal usage, demographic trends
Responses | Event Stream | Policy enforcement decisions, personalization recommendations
Pipeline Output | Event Stream | Enriched, filtered, correlated transform of input feed
THREE REQUIREMENTS CREATE STATE

1.  RT analytics outputs must be queryable
2.  Metadata, dimension data, “lookup tables” to
create groupings for analytics and to supply
enrichment data
3.  Grouping, filtering and aggregating generate
intermediate state – open sessions, partially
assembled logical events
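The third requirement, intermediate state for open sessions, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name SessionizerSketch, its methods, and the rule that a session closes after a fixed idle gap are hypothetical and are not taken from VoltDB or from any streaming framework.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and the idle-gap rule are illustrative assumptions.
final class SessionizerSketch {
    /** One session; while it is open it is intermediate state. */
    static final class Session {
        final String userId;
        final long startMillis;
        long lastSeenMillis;
        int eventCount;

        Session(String userId, long startMillis) {
            this.userId = userId;
            this.startMillis = startMillis;
            this.lastSeenMillis = startMillis;
        }
    }

    private final long gapMillis;
    private final Map<String, Session> open = new HashMap<>();

    SessionizerSketch(long gapMillis) {
        this.gapMillis = gapMillis;
    }

    /** Folds one event into the open session for its user, creating it if needed. */
    void onEvent(String userId, long timestampMillis) {
        Session s = open.get(userId);
        if (s == null) {
            s = new Session(userId, timestampMillis);
            open.put(userId, s);
        }
        s.lastSeenMillis = Math.max(s.lastSeenMillis, timestampMillis);
        s.eventCount++;
    }

    /** Closes and returns every session idle for at least gapMillis at nowMillis. */
    List<Session> closeIdle(long nowMillis) {
        List<Session> closed = new ArrayList<>();
        Iterator<Session> it = open.values().iterator();
        while (it.hasNext()) {
            Session s = it.next();
            if (nowMillis - s.lastSeenMillis >= gapMillis) {
                closed.add(s);
                it.remove();
            }
        }
        return closed;
    }

    int openSessions() {
        return open.size();
    }

    public static void main(String[] args) {
        SessionizerSketch sz = new SessionizerSketch(30_000L);
        sz.onEvent("alice", 0L);
        sz.onEvent("alice", 10_000L);
        sz.onEvent("bob", 25_000L);
        List<Session> closed = sz.closeIdle(45_000L);
        System.out.println(closed.size() + " closed, " + sz.openSessions() + " still open");
    }
}
```

The map of open sessions is exactly the kind of state that has to live somewhere durable and queryable if the pipeline is to survive restarts.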
STORM: A COMMON ALTERNATIVE
•  Spouts and Bolts
•  Streaming computation
•  Run snippets of Java against each event
•  Connect queues to backends with
intermediate code
But…
1.  Need “lookup” database for dimension data.
2.  Need a “serving” database for analytic results.
3.  Need additional management clusters (ZooKeeper).
4.  No ad-hoc queries.
5.  Lots of custom code (rarely declarative).
STREAMING OPERATORS NEED STATE
Require State
•  Filter
•  Join
•  Aggregate
•  Group By
Stateless
•  Partition
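As a rough illustration of the split above, the sketch below contrasts a partitioning function, which depends only on the incoming key, with a Group By count, which must remember a running total per group. The class and method names are hypothetical and chosen only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: names are hypothetical, not a streaming-framework API.
final class OperatorStateSketch {
    /** Stateless: the target partition is a pure function of the key. */
    static int partition(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Stateful: a Group By count keeps a running total for every group seen. */
    static final class CountByKey {
        private final Map<String, Long> counts = new HashMap<>();

        /** Adds one event to its group and returns the group's new count. */
        long onEvent(String key) {
            return counts.merge(key, 1L, Long::sum);
        }

        long count(String key) {
            return counts.getOrDefault(key, 0L);
        }
    }

    public static void main(String[] args) {
        CountByKey clicks = new CountByKey();
        clicks.onEvent("page-a");
        clicks.onEvent("page-b");
        clicks.onEvent("page-a");
        System.out.println("page-a=" + clicks.count("page-a"));
        System.out.println("partition stable: "
                + (partition("page-a", 8) == partition("page-a", 8)));
    }
}
```

Losing the map in CountByKey loses the aggregate, while partition can be restarted anywhere with no recovery step.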
VOLTDB: REAL-TIME ANALYTICS
[Diagram: events are ingested into VoltDB, which holds metadata (dimension table) and session state (fact table) and serves results through SQL and views]
•  Operational analytics and monitoring
•  RT analytics enabling user-facing applications
•  KPI for internal BI/Dashboards
•  In-memory MPP SQL over ODBC/JDBC
•  Cheap + correct materialized views for streaming aggregations
VOLTDB: REQUEST/RESPONSE DECISIONS
•  Authorization
•  RT balance checks, quota
enforcement 
•  Personalization and
Recommendation Serving
•  Combine pre-score with immediate
context
•  Fully ACID transaction model.
•  Thousands to Millions per second
•  At less than 5ms latencies
[Diagram: metadata (dimension table) and session state (fact table), accessed through ACID transactions]
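Quota enforcement is a useful case for seeing why the transaction model matters: the balance check and the decrement have to happen as one step, or two concurrent requests can both spend the last call. The following is a minimal in-process sketch under stated assumptions. QuotaSketch, grant and authorize are hypothetical names, and a synchronized method merely stands in for what would be a single ACID transaction in a database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: QuotaSketch, grant and authorize are illustrative names,
// not VoltDB APIs. A synchronized method stands in for one ACID transaction.
final class QuotaSketch {
    private final Map<String, Integer> remainingCalls = new HashMap<>();

    /** Adds calls to the balance of an API key. */
    synchronized void grant(String apiKey, int calls) {
        remainingCalls.merge(apiKey, calls, Integer::sum);
    }

    /** Checks the balance and consumes one call as a single atomic step. */
    synchronized boolean authorize(String apiKey) {
        Integer left = remainingCalls.get(apiKey);
        if (left == null || left <= 0) {
            return false; // unknown key or quota exhausted
        }
        remainingCalls.put(apiKey, left - 1);
        return true;
    }

    public static void main(String[] args) {
        QuotaSketch quota = new QuotaSketch();
        quota.grant("key-1", 2);
        for (int i = 0; i < 3; i++) {
            System.out.println("request " + i + " authorized: " + quota.authorize("key-1"));
        }
    }
}
```

With a grant of two calls, the first two requests are authorized and the third is refused.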
  
VOLTDB: DATA PIPELINES WITH EXPORT
[Diagram: VoltDB holding metadata (dimension table) and session state (fact table), with an Export flow out]
•  Filtering (ex: only RFID /
iBeacon readings that show
change from previous location).
•  Sessionization
•  Common version re-writing
•  Data enrichment

•  MPP streaming Export
•  Row data, Thrift messages, CSV
•  OLAP, HDFS and message queues
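The filtering bullet (pass only readings that show a change from the previous location) can be sketched as a small stateful filter. This is an illustration under assumptions: the class name, the tagId and location parameters, and the choice to let the first reading of a tag through are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not part of VoltDB.
final class LocationChangeFilterSketch {
    // Last known location per tag: the state this filter has to keep.
    private final Map<String, String> lastLocation = new HashMap<>();

    /**
     * Records the reading and returns true only when the tag is seen at a
     * location different from its previous one. A tag's first reading passes.
     */
    boolean accept(String tagId, String location) {
        String previous = lastLocation.put(tagId, location);
        return !location.equals(previous);
    }

    public static void main(String[] args) {
        LocationChangeFilterSketch filter = new LocationChangeFilterSketch();
        String[][] readings = {
            {"tag-1", "dock"}, {"tag-1", "dock"}, {"tag-1", "aisle-4"}, {"tag-2", "dock"}
        };
        for (String[] r : readings) {
            System.out.println(r[0] + " @ " + r[1] + " forwarded: " + filter.accept(r[0], r[1]));
        }
    }
}
```

Only the rows that pass the filter would be handed to export, which is what keeps repeated readings out of the downstream systems.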
PIPELINE DEPLOYED: VOLTDB… 

Manages game state for online poker and archives completed games to Hadoop.

Ingests smart meter readings from concentrators, supports real time applications and buffers data for end of day billing mediation systems.
PIPELINE DEPLOYED: VOLTDB… 

Ingests RFID readings, supports real time applications that push social media updates based on VoltDB leaderboards.

Processes clickstream logs and exports correlated USERID records for use at CDN endpoints for advertising targeting.

Processes SKU catalogs from suppliers to produce correlated catalog that is exported to indexing and post-processing for an online retailer.
VOLTDB EXPORT ANSWERS THE QUESTIONS:

How do I stream filtered,
enriched, updated results
to OLAP/HDFS systems?
How do I send alerts,
alarms, SMS, or messages
to downstream
applications?
VOLTDB EXPORT UI
ddl.sql
CREATE TABLE events (
  EventID INTEGER,
  time TIMESTAMP,
  msg VARCHAR(128));
EXPORT TABLE events;

deployment.xml
<export enabled="true" target="file">

Application SQL
INSERT into TABLE values…
EXPORTING TO HDFS
<export enabled="true" target="http">
  <configuration>
    <property name="endpoint">
      http://hadoopserver/webhdfs/v1.0/%t/%p.%t.%g.csv
    </property>
  </configuration>
</export>
EXPORT FORMATS
•  CSV
•  TSV
•  Avro container
•  Raw data
EXTENSIBLE API
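public void onBlockStart() throws RestartBlockException {
    // Called at the start of each block of export rows.
}

public boolean processRow(int rowSize, byte[] rowData) throws RestartBlockException {
    // Called once per exported row; decode rowData and send it to the target.
    return true; // placeholder return value so the stub compiles
}

public void onBlockCompletion() throws RestartBlockException {
    // Called at the end of each block, for example to flush or commit.
}
All of these export connectors are hosted plugins to the VoltDB database. VoltDB manages HA, fault tolerance, configuration, and MPP scale-out.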
THANK YOU!
VOLTDB:
•  We say “Ingest & Export” vs. Spout and Bolt
•  Scale (ACID) snippets of Java for each incoming event.
AND…
•  Actually serve real time analytics via SQL
•  Metadata for lookup/enrichment implicit in DB function
•  Integrate with OLAP systems to use OLAP reports with event-based processing
•  Generate fast transactional responses
•  Support ad-hoc queryability
•  Declarative aggregations vs. code
•  Fast: no need to micro-batch
INTEGRATING DATA SOURCES WITH VOLTDB
BULK LOADERS
•  CSV loader
•  Kafka loader
•  JDBC loader
•  Vertica UDx
•  Extensible loader API
APPLICATION INTERFACES
•  JDBC
•  ODBC
•  HTTP JSON
•  Native client drivers / SDKs
INTEGRATING WITH CSV DATA
csvloader volttable -f data.csv
INTEGRATING WITH KAFKA
kafkaloader volttable \
  --zookeeper=zkserver:2181 \
  --topic=topicname
INTEGRATING WITH JDBC SOURCES
jdbcloader volttable \
  --jdbcurl=jdbc:postgresql://server/db \
  --jdbcdriver=org.postgresql.Driver \
  --jdbctable=table
INTEGRATING WITH HP VERTICA UDX
SELECT voltdbload(c1, c2, c3
  USING PARAMETERS voltservers='localhost',
  volttable='volttable')
FROM T;
EXTENSIBLE VOLTBULKLOADER: BULK SMASH
VoltBulkLoader loader =
    client.getNewBulkLoader(tableName, maxBatchSize, failureCallback);
for (…) {
    // Queue one row; handle identifies the row if failureCallback is invoked.
    loader.insertRow(handle, values);
}
loader.drain(); // flush any partially filled batches
loader.close();
All of these tools are built on an MIT licensed extensible API that provides performance optimizations, batching, load balancing.
NATIVE CLIENT LIBRARIES
•  Java
•  C++
•  PHP
•  Node.js
•  Go
•  Python
•  Erlang
•  Ruby
Or, just…
curl 'http://localhost:8080/api/1.0/?Procedure=Vote&Parameters=[1,1,0]'
VOLTDB EXPORT TOPOLOGIES
•  VoltDB -> Queue (Kafka, RabbitMQ)
•  VoltDB -> HDFS (for Pig/Hive/etc. processing)
•  VoltDB -> OLAP (Vertica, Netezza, etc.)
•  VoltDB -> HTTP Endpoint, e.g. ElasticSearch
VOLTDB EXPORT PROGRAMMING CONTRACT
•  Export data is durable until exported
•  MPP scale-out of export data flows
•  At-least-once delivery during HA events
•  Built-in row ids (for uniqueness filtering)
•  Extensible API for open source connectors
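At-least-once delivery means a downstream consumer may see the same exported row more than once after an HA event, which is why the row ids matter. The sketch below shows one way a consumer could filter duplicates. It assumes that every exported row carries an id that is unique across the stream, and it keeps every id it has seen in memory. The class name, the deliver method and the in-memory sink are hypothetical and are not part of any VoltDB connector.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical consumer-side sketch; assumes each row id is unique in the stream.
final class ExportDedupSketch {
    private final Set<Long> seenRowIds = new HashSet<>();
    private final List<String> sink = new ArrayList<>();

    /** Applies the row once; returns false when the row id was already applied. */
    boolean deliver(long rowId, String payload) {
        if (!seenRowIds.add(rowId)) {
            return false; // redelivered duplicate, ignored
        }
        sink.add(payload);
        return true;
    }

    List<String> applied() {
        return Collections.unmodifiableList(sink);
    }

    public static void main(String[] args) {
        ExportDedupSketch consumer = new ExportDedupSketch();
        consumer.deliver(1L, "event-a");
        consumer.deliver(2L, "event-b");
        consumer.deliver(1L, "event-a"); // replayed after a failover
        System.out.println("rows applied: " + consumer.applied().size());
    }
}
```

A real consumer would bound the set of remembered ids or push the uniqueness check into the target system, for example as a unique key.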
INTEGRATING VOLTDB WITH EXPORT TARGETS
•  Local file system export
•  JDBC export
•  Kafka export
•  RabbitMQ export
•  HDFS export
•  HTTP export
•  Extensible API
EXPORTING TO LOCAL FILE SYSTEM
<export enabled="true" target="file">
  <configuration>
    <property name="type">csv</property>
    <property name="nonce">MyExport</property>
  </configuration>
</export>
EXPORTING TO JDBC DESTINATIONS
<export enabled="true" target="jdbc">
  <configuration>
    <property name="jdbcurl">jdbc:postgresql://server/db</property>
    <property name="jdbcuser">guest</property>
  </configuration>
</export>
EXPORTING TO KAFKA
<export enabled="true" target="kafka">
  <configuration>
    <property name="metadata.broker.list">server1</property>
  </configuration>
</export>
EXPORTING TO RABBITMQ
<export enabled="true" target="rabbitmq">
  <configuration>
    <property name="broker.host">server1</property>
  </configuration>
</export>

Contenu connexe

Tendances

Curing the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerCuring the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerDataWorks Summit
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?DataWorks Summit
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...Hortonworks
 
Building Audi’s enterprise big data platform
Building Audi’s enterprise big data platformBuilding Audi’s enterprise big data platform
Building Audi’s enterprise big data platformDataWorks Summit
 
Lessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNLessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNDataWorks Summit
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?DataWorks Summit
 
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...DataWorks Summit
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
Data Centric Transformation in Telecom
Data Centric Transformation in TelecomData Centric Transformation in Telecom
Data Centric Transformation in TelecomDataWorks Summit
 
MiNiFi 0.0.1 MeetUp talk
MiNiFi 0.0.1 MeetUp talkMiNiFi 0.0.1 MeetUp talk
MiNiFi 0.0.1 MeetUp talkJoe Percivall
 
StampedeCon 2015 Keynote
StampedeCon 2015 KeynoteStampedeCon 2015 Keynote
StampedeCon 2015 KeynoteKen Owens
 
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data avanttic Consultoría Tecnológica
 
Modernizing your Application Architecture with Microservices
Modernizing your Application Architecture with MicroservicesModernizing your Application Architecture with Microservices
Modernizing your Application Architecture with Microservicesconfluent
 
Successes, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to HadoopSuccesses, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to HadoopDataWorks Summit/Hadoop Summit
 
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...Deepak Chandramouli
 
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...DataWorks Summit
 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDataWorks Summit
 
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDruid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDataWorks Summit
 

Tendances (18)

Curing the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerCuring the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging Manager
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
 
Building Audi’s enterprise big data platform
Building Audi’s enterprise big data platformBuilding Audi’s enterprise big data platform
Building Audi’s enterprise big data platform
 
Lessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNLessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARN
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
Data Centric Transformation in Telecom
Data Centric Transformation in TelecomData Centric Transformation in Telecom
Data Centric Transformation in Telecom
 
MiNiFi 0.0.1 MeetUp talk
MiNiFi 0.0.1 MeetUp talkMiNiFi 0.0.1 MeetUp talk
MiNiFi 0.0.1 MeetUp talk
 
StampedeCon 2015 Keynote
StampedeCon 2015 KeynoteStampedeCon 2015 Keynote
StampedeCon 2015 Keynote
 
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
 
Modernizing your Application Architecture with Microservices
Modernizing your Application Architecture with MicroservicesModernizing your Application Architecture with Microservices
Modernizing your Application Architecture with Microservices
 
Successes, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to HadoopSuccesses, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
 
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
 
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best Practices
 
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDruid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
 

Similaire à Building Fast Applications for Streaming Data

How to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contendersHow to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contendersAkmal Chaudhri
 
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB
 
How to Build Real-Time Streaming Analytics with an In-memory, Scale-out SQL D...
How to Build Real-Time Streaming Analytics with an In-memory, Scale-out SQL D...How to Build Real-Time Streaming Analytics with an In-memory, Scale-out SQL D...
How to Build Real-Time Streaming Analytics with an In-memory, Scale-out SQL D...VoltDB
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Dataconomy Media
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Mats Uddenfeldt
 
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...NoSQLmatters
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoDataWorks Summit
 
Combining Hadoop RDBMS for Large-Scale Big Data Analytics
Combining Hadoop RDBMS for Large-Scale Big Data AnalyticsCombining Hadoop RDBMS for Large-Scale Big Data Analytics
Combining Hadoop RDBMS for Large-Scale Big Data AnalyticsDataWorks Summit
 
Data Vault 2.0: Big Data Meets Data Warehousing
Data Vault 2.0: Big Data Meets Data WarehousingData Vault 2.0: Big Data Meets Data Warehousing
Data Vault 2.0: Big Data Meets Data WarehousingAll Things Open
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...ssuserd3a367
 
Ibis 2015 final template
Ibis 2015 final templateIbis 2015 final template
Ibis 2015 final templateSumit Sarkar
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksDataWorks Summit
 
Elasticsearch + Cascading for Scalable Log Processing
Elasticsearch + Cascading for Scalable Log ProcessingElasticsearch + Cascading for Scalable Log Processing
Elasticsearch + Cascading for Scalable Log ProcessingCascading
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksMapR Technologies
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudDataWorks Summit
 
Fast Data – the New Big Data
Fast Data – the New Big DataFast Data – the New Big Data
Fast Data – the New Big DataVoltDB
 
Couchbase Cloud No Equal (Rick Jacobs, Couchbase) Kafka Summit 2020
Couchbase Cloud No Equal (Rick Jacobs, Couchbase) Kafka Summit 2020Couchbase Cloud No Equal (Rick Jacobs, Couchbase) Kafka Summit 2020
Couchbase Cloud No Equal (Rick Jacobs, Couchbase) Kafka Summit 2020HostedbyConfluent
 
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...Data Con LA
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyAlluxio, Inc.
 

Similaire à Building Fast Applications for Streaming Data (20)

How to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contendersHow to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contenders
 
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
 
How to Build Real-Time Streaming Analytics with an In-memory, Scale-out SQL D...
How to Build Real-Time Streaming Analytics with an In-memory, Scale-out SQL D...How to Build Real-Time Streaming Analytics with an In-memory, Scale-out SQL D...
How to Build Real-Time Streaming Analytics with an In-memory, Scale-out SQL D...
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
 
Combining Hadoop RDBMS for Large-Scale Big Data Analytics
Combining Hadoop RDBMS for Large-Scale Big Data AnalyticsCombining Hadoop RDBMS for Large-Scale Big Data Analytics
Combining Hadoop RDBMS for Large-Scale Big Data Analytics
 
Data Vault 2.0: Big Data Meets Data Warehousing



Building Fast Applications for Streaming Data

  • 1. BUILDING FAST DATA APPLICATIONS WITH STREAMING DATA. Ryan Betts, CTO, VoltDB
  • 2. page © 2014 VoltDB PROPRIETARY page AGENDA •  Fast Data Application Patterns •  Digging Deeper: Looking at the Data •  Streaming Approach •  DB Approach •  Summary 2
  • 3. Collect, Explore, Analyze, Act. Data leads to applications. Applications create more data.
  • 4. DATA ARCHITECTURE FOR FAST + BIG DATA. [Diagram] Enterprise Apps (CRM, ERP, etc.); ETL; Data Lake (HDFS, etc.). BIG DATA: SQL on Hadoop, MapReduce, Exploratory Analytics, BI Reporting. FAST DATA: Fast Operational Database, Export, Ingest / Interactive, Streaming Analytics, Fast Serve Analytics, Decisioning.
  • 5. IN THE BIG CORNER. Systems facilitating exploration and analytics of large data sets. Example Technologies: Columnar OLAP warehouses; Hadoop Ecosystem • MapReduce • Hive, Pig • SQL.next: Impala, Drill, Shark. Example Applications: • User segmentation & pre-scoring • Seasonal trending • Recommendation matrix calculations • Building search indexes • Data Science: statistical clustering, machine learning
  • 6. IN THE FAST CORNER. Systems facilitating real time ingest, analytics and decisions against incoming event feeds. Example Technologies: • Streaming frameworks • VoltDB. Example Applications: • Micro-personalization • Recommendation serving • Alerting/alarming • Operational monitoring • Data enrichment (ETL elimination) • High throughput authorization • Ex: API quota enforcement
  • 7. REAL TIME SCORING EXAMPLE. OLAP / Hadoop: user segmentation model calculated on Big Side and cached in Fast Side. Personalization requests; score-based responses. Game play events and scoring decisions exported to Big.
  • 8. FAST AND BIG IN COMBINATION •  Fast Profile •  In memory: user segmentation - GB to TB (300M+ rows) •  10k to 1M+ requests/sec •  99th percentile latency under 5 ms (five nines under 50 ms) •  VoltDB export to Vertica •  Big Profile •  TB to PB of historical data •  Columnar analytics for fast reporting •  Real time ingest of historical data (possibly via VoltDB) •  Vertica UDx to VoltDB
  • 9. TYPICAL FAST QUESTIONS. [Diagram: Hadoop, SQL OLAP, Fast] • Is the fast layer streaming? It is often more like OLTP. • How do the pieces communicate? OLAP analytics from Big -> Fast; new events from Fast -> Big. • Where do "analytics" belong? Analytics with decisions: with Fast; analytics against history: with Big. • Are streaming frameworks equivalent? Traditional SQL CEP (Esper); tuple DAGs (Storm); window processors on Hadoop (Spark).
  • 10. THREE FAST DATA APPLICATION PATTERNS •  Real-Time Analytics: real-time analytics for operations; real-time KPI measurement; real-time analytics for apps •  Data Pipelines: streaming data enrichment; sessionization / re-assembly; correlation (by time, by location, by id); filtering; pre-aggregation •  Fast Request/Response: mobile authorization; campaign authorization; fast API quota enforcement; micro-personalization; recommendation serving
  • 11. DATA FLOWS. [Diagram] Ingest; Pipeline; Real Time Analytics; Request/Response Using Real Time Data; Export; Data Lake (HDFS/OLAP/Queue)
  • 12. THE INPUT FEED IS ONLY A SMALL PART OF FAST DATA
      Data | Temporality | Examples
      Input Feed | Event Stream | Click stream, tick stream, sensors, metrics
      Real-Time Analytic Results | Persistent/Queryable | Counters, streaming aggregates, time-series rollups
      Event metadata | Persistent (Look-Ups) | Device version, location, user profiles, point of interest data
      OLAP Analytics Used in Real-Time Decisions | Persistent (Look-Ups) | Scoring models, seasonal usage, demographic trends
      Responses | Event Stream | Policy enforcement decisions, personalization recommendations
      Pipeline Output | Event Stream | Enriched, filtered, correlated transform of input feed
  • 13. THREE REQUIREMENTS CREATE STATE 1.  RT analytics outputs must be queryable 2.  Metadata, dimension data, "lookup tables" to create groupings for analytics and to supply enrichment data 3.  Grouping, filtering and aggregating generate intermediate state: open sessions, partially assembled logical events
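The third requirement (open sessions as intermediate state) can be sketched in a few lines. This is a minimal Python illustration, not anything taken from the deck: the function name `sessionize`, the `(user_id, timestamp)` event shape and the inactivity-gap rule are all assumptions chosen for the example.

```python
def sessionize(events, gap_seconds):
    """Group time-ordered (user_id, timestamp_seconds) events into sessions.

    Hypothetical rule: a session closes when the same user is silent for
    more than gap_seconds. Returns (closed_sessions, open_sessions).
    """
    open_sessions = {}  # user_id -> [start, last_seen, event_count]; the intermediate state
    closed = []         # (user_id, start, last_seen, event_count)
    for user, ts in events:
        session = open_sessions.get(user)
        if session is not None and ts - session[1] > gap_seconds:
            # Gap exceeded: emit the finished session and start a new one.
            closed.append((user, session[0], session[1], session[2]))
            session = None
        if session is None:
            open_sessions[user] = [ts, ts, 1]
        else:
            session[1] = ts
            session[2] += 1
    return closed, open_sessions


events = [("a", 0), ("a", 10), ("b", 15), ("a", 100)]
closed, still_open = sessionize(events, gap_seconds=30)
```

The `open_sessions` dictionary is exactly the kind of state that has to live somewhere durable and queryable between events, which is the point the slide makes.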
  • 14. STORM: A COMMON ALTERNATIVE •  Spouts and Bolts •  Streaming computation •  Run snippets of Java against each event •  Connect queues to backends with intermediate code. But… 1.  Need "lookup" database for dimension data. 2.  Need a "serving" database for analytic results. 3.  Need additional management clusters (ZooKeeper). 4.  No ad-hoc queries. 5.  Lots of custom code (rarely declarative).
  • 15. STREAMING OPERATORS NEED STATE. Require State: •  Filter •  Join •  Aggregate •  Group By. Stateless: •  Partition
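The split between stateful and stateless operators can be made concrete with a small Python sketch. It assumes a hypothetical event shape `(tag_id, location)` and a filter that passes a reading only when the location changed, the kind of filter the deck mentions for RFID readings; the function names are illustrative only.

```python
from typing import Dict, Iterable, Iterator, Tuple

Reading = Tuple[str, str]  # (tag_id, location): hypothetical event shape


def partition_for(tag_id: str, num_partitions: int) -> int:
    """Stateless: the target partition depends only on the event itself."""
    return sum(tag_id.encode("utf-8")) % num_partitions


def changed_location(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Stateful filter: pass a reading only if the tag moved since its last reading."""
    last_seen: Dict[str, str] = {}  # state that must persist between events
    for tag_id, location in readings:
        if last_seen.get(tag_id) != location:
            last_seen[tag_id] = location
            yield (tag_id, location)


readings = [("tag1", "dock"), ("tag1", "dock"), ("tag2", "dock"), ("tag1", "aisle4")]
moves = list(changed_location(readings))
```

`partition_for` can run anywhere with no memory of earlier events, while `changed_location` is only correct if `last_seen` survives across events and restarts.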
  • 16. VOLTDB: REAL-TIME ANALYTICS. [Diagram: Ingest -> VoltDB (Metadata / dimension table; Session state / fact table) -> SQL, Views] •  Operational analytics and monitoring •  RT analytics enabling user-facing applications •  KPI for internal BI/Dashboards •  In-memory MPP SQL over ODBC/JDBC •  Cheap + correct materialized views for streaming aggregations
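The idea behind materialized views for streaming aggregations is that the aggregate is updated on every insert instead of being recomputed at query time. The Python sketch below is a toy model of that idea only; the class `CountSumView` and its methods are hypothetical and do not represent VoltDB's implementation or API.

```python
from collections import defaultdict


class CountSumView:
    """Toy stand-in for a view defined as GROUP BY key with COUNT(*) and SUM(value)."""

    def __init__(self):
        self._rows = defaultdict(lambda: [0, 0])  # key -> [count, total]

    def on_insert(self, key, value):
        # Constant-time maintenance per event: no rescan of the base data.
        row = self._rows[key]
        row[0] += 1
        row[1] += value

    def query(self, key):
        # Reads are cheap because the aggregate is already materialized.
        count, total = self._rows.get(key, (0, 0))
        return {"count": count, "sum": total}


view = CountSumView()
for region, amount in [("us", 3), ("eu", 5), ("us", 4)]:
    view.on_insert(region, amount)
us_totals = view.query("us")
```

In a database the same effect is declared once in SQL rather than hand-coded, which is the contrast the deck draws with custom streaming code.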
  • 17. VOLTDB: REQUEST/RESPONSE DECISIONS. [Diagram: ACID Transactions over Metadata (dimension table) and Session state (fact table)] •  Authorization •  RT balance checks, quota enforcement •  Personalization and Recommendation Serving •  Combine pre-score with immediate context •  Fully ACID transaction model •  Thousands to millions per second •  At less than 5 ms latencies
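Quota enforcement is a useful example of why the decision needs a transaction: the check ("is there quota left?") and the update ("consume one unit") must happen atomically. The following Python sketch imitates that with a lock; `QuotaStore`, `authorize` and the per-key limits are hypothetical names for illustration, not part of VoltDB.

```python
import threading


class QuotaStore:
    """In-memory sketch of an atomic check-and-decrement quota decision."""

    def __init__(self, limits):
        self._limits = dict(limits)  # api_key -> maximum calls allowed
        self._used = {}              # api_key -> calls consumed so far
        self._lock = threading.Lock()

    def authorize(self, api_key, cost=1):
        # The lock stands in for a transaction: read and write are one unit.
        with self._lock:
            limit = self._limits.get(api_key)
            if limit is None:
                return False  # unknown key: deny
            used = self._used.get(api_key, 0)
            if used + cost > limit:
                return False  # quota exhausted: deny without consuming
            self._used[api_key] = used + cost
            return True


store = QuotaStore({"client-1": 2})
decisions = [store.authorize("client-1") for _ in range(3)]
```

Without the atomic section, two concurrent requests could both read the same `used` value and both be approved, overshooting the limit.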
  • 18. VOLTDB: DATA PIPELINES WITH EXPORT. [Diagram: VoltDB (Metadata / dimension table; Session state / fact table) -> Export] •  Filtering (ex: only RFID / iBeacon readings that show change from previous location) •  Sessionization •  Common version re-writing •  Data enrichment •  MPP streaming Export •  Row data, Thrift messages, CSV •  OLAP, HDFS and message queues
  • 19. PIPELINE DEPLOYED: VOLTDB… Manages game state for online poker and archives completed games to Hadoop. Ingests smart meter readings from concentrators, supports real time applications and buffers data for end of day billing mediation systems.
  • 20. PIPELINE DEPLOYED: VOLTDB… Ingests RFID readings, supports real time applications that push social media updates based on VoltDB leaderboards. Processes clickstream logs and exports correlated USERID records for use at CDN endpoints for advertising targeting. Processes SKU catalogs from suppliers to produce correlated catalog that is exported to indexing and post-processing for an online retailer.
  • 21. VOLTDB EXPORT ANSWERS THE QUESTIONS: How do I stream filtered, enriched, updated results to OLAP/HDFS systems? How do I send alerts, alarms, SMS, or messages to downstream applications?
  • 22. VOLTDB EXPORT UI
      ddl.sql:
        CREATE TABLE events (
          EventID INTEGER,
          time TIMESTAMP,
          msg VARCHAR(128));
        EXPORT TABLE events;
      deployment.xml:
        <export enabled="true" target="file">
      Application SQL:
        INSERT into TABLE values…
  • 23. EXPORTING TO HDFS
      <export enabled="true" target="http">
        <configuration>
          <property name="endpoint">
            http://hadoopserver/webhdfs/v1.0/%t/%p.%t.%g.csv
          </property>
        </configuration>
      </export>
  • 24. EXPORT FORMATS •  CSV •  TSV •  Avro container •  Raw data
  • 25. EXTENSIBLE API
      public void onBlockStart() throws RestartBlockException {
      }

      public boolean processRow(int rowSize, byte[] rowData) throws RestartBlockException {
      }

      public void onBlockCompletion() throws RestartBlockException {
      }
      All of these export connectors are hosted plugins to the VoltDB database. VoltDB manages HA, fault tolerance, configuration, and MPP scale-out.
  • 26. DATA ARCHITECTURE FOR FAST + BIG DATA. [Diagram] Enterprise Apps (CRM, ERP, etc.); ETL; Data Lake (HDFS, etc.). BIG DATA: SQL on Hadoop, MapReduce, Exploratory Analytics, BI Reporting. FAST DATA: Fast Operational Database, Export, Ingest / Interactive, Streaming Analytics, Fast Serve Analytics, Decisioning.
  • 28. THANK YOU!
  • 29. VOLTDB: •  We say "Ingest & Export" vs. Spout and Bolt •  Scale (ACID) snippets of Java for each incoming event. AND… •  Actually serve real time analytics via SQL •  Metadata for lookup/enrichment implicit in DB function •  Integrate with OLAP systems to use OLAP reports with event-based processing •  Generate fast transactional responses •  Support ad-hoc queryability •  Declarative aggregations vs. code •  Fast: no need to micro-batch
  • 30. VOLTDB: DATA PIPELINES WITH EXPORT. [Diagram: VoltDB (Metadata / dimension table; Session state / fact table) -> Export] •  Filtering (ex: only RFID / iBeacon readings that show change from previous location) •  Sessionization •  Common version re-writing •  Data enrichment •  MPP streaming Export •  Row data, Thrift messages, CSV •  OLAP, HDFS and message queues
  • 31. INTEGRATING DATA SOURCES WITH VOLTDB. BULK LOADERS: •  CSV loader •  Kafka loader •  JDBC loader •  Vertica UDx •  Extensible loader API. APPLICATION INTERFACES: •  JDBC •  ODBC •  HTTP JSON •  Native client drivers / SDKs
  • 32. INTEGRATING WITH CSV DATA
      csvloader volttable -f data.csv
  • 33. INTEGRATING WITH KAFKA
      kafkaloader volttable --zookeeper=zkserver:2181 --topic=topicname
  • 34. INTEGRATING WITH JDBC SOURCES
      jdbcloader volttable --jdbcurl=jdbc:postgresql://server/db --jdbcdriver=org.postgresql.Driver --jdbctable=table
  • 35. INTEGRATING WITH HP VERTICA UDX
      SELECT voltdbload(c1, c2, c3
        USING PARAMETERS voltservers='localhost',
        volttable='volttable')
      FROM T;
  • 36. EXTENSIBLE VOLTBULKLOADER: BULK SMASH
      VoltBulkLoader loader =
          client.getNewBulkLoader(tableName, maxBatchSize, failureCallback);
      for (…) {
          loader.insertRow(handle, values);
      }
      loader.drain();
      loader.close();
      All of these tools are built on an MIT-licensed extensible API that provides performance optimizations, batching, load balancing.
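The batching behaviour the slide attributes to the loader API (buffer rows, send them in groups, flush the remainder on drain) can be illustrated generically. The Python class below is a hypothetical sketch of that pattern; `BatchingLoader`, `insert_row` and `send_batch` are made-up names and it does not reproduce VoltBulkLoader's actual behaviour, failure handling or load balancing.

```python
class BatchingLoader:
    """Buffers rows and hands them to send_batch in groups of max_batch_size."""

    def __init__(self, max_batch_size, send_batch):
        if max_batch_size < 1:
            raise ValueError("max_batch_size must be at least 1")
        self._max = max_batch_size
        self._send = send_batch  # callable that receives one list of rows
        self._buffer = []

    def insert_row(self, row):
        self._buffer.append(row)
        if len(self._buffer) >= self._max:
            self.drain()

    def drain(self):
        # Flush whatever is buffered, including a final partial batch.
        if self._buffer:
            self._send(list(self._buffer))
            self._buffer.clear()


sent_batches = []
loader = BatchingLoader(max_batch_size=2, send_batch=sent_batches.append)
for row in range(5):
    loader.insert_row(row)
loader.drain()
```

Calling `drain()` before shutdown matters for the same reason it appears in the slide's snippet: the last partial batch would otherwise stay in the buffer.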
  • 37. NATIVE CLIENT LIBRARIES •  Java •  C++ •  PHP •  Node.js •  Go •  Python •  Erlang •  Ruby. Or, just…
      curl 'http://localhost:8080/api/1.0/?Procedure=Vote&Parameters=[1,1,0]'
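The curl call shows that the HTTP JSON interface takes a procedure name and a JSON array of parameters in the query string. A small Python sketch of building such a URL with proper escaping follows; it only constructs the string and sends nothing, and the helper name `procedure_url` is an assumption for this example. The base URL and the `Procedure` and `Parameters` names are copied from the slide.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def procedure_url(base, procedure, params):
    """Build a query URL with the parameter list encoded as a JSON array."""
    query = urlencode({
        "Procedure": procedure,
        "Parameters": json.dumps(params, separators=(",", ":")),
    })
    return f"{base}?{query}"


url = procedure_url("http://localhost:8080/api/1.0/", "Vote", [1, 1, 0])
decoded = parse_qs(urlsplit(url).query)
```

Percent-encoding the brackets and commas avoids shell and proxy surprises that the raw form in the curl example can run into.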
  • 38. VOLTDB EXPORT TOPOLOGIES •  VoltDB -> Queue (Kafka, RabbitMQ) •  VoltDB -> HDFS (for Pig/Hive/etc. processing) •  VoltDB -> OLAP (Vertica, Netezza, ...) •  VoltDB -> HTTP endpoint, e.g. Elasticsearch
  • 39. VOLTDB EXPORT PROGRAMMING CONTRACT •  Export data is durable until exported •  MPP scale-out of export data flows •  At-least-once delivery during HA events •  Built-in row ids (for uniqueness filtering) •  Extensible API for open source connectors
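At-least-once delivery means a downstream consumer may see the same row twice after a failover, and the row ids exist so that it can filter the repeats. A minimal Python sketch of that consumer-side filtering is shown here; the `(row_id, payload)` tuple shape and the function name `apply_once` are assumptions, since the deck does not show how row ids appear in exported data.

```python
def apply_once(rows, seen_ids, sink):
    """Append each payload to sink only the first time its row id is seen."""
    for row_id, payload in rows:
        if row_id in seen_ids:
            continue  # duplicate from a redelivery: skip it
        seen_ids.add(row_id)
        sink.append(payload)


seen = set()
target = []
apply_once([(1, "a"), (2, "b")], seen, target)  # first delivery
apply_once([(2, "b"), (3, "c")], seen, target)  # redelivery overlaps on row 2
```

In practice `seen_ids` would need to be persisted with the sink (or the sink write made idempotent on the row id) for the filtering to survive a consumer restart.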
  • 40. INTEGRATING VOLTDB WITH EXPORT TARGETS •  Local file system export •  JDBC export •  Kafka export •  RabbitMQ export •  HDFS export •  HTTP export •  Extensible API
  • 41. EXPORTING TO LOCAL FILE SYSTEM
      <export enabled="true" target="file">
        <configuration>
          <property name="type">csv</property>
          <property name="nonce">MyExport</property>
        </configuration>
      </export>
  • 42. EXPORTING TO JDBC DESTINATIONS
      <export enabled="true" target="jdbc">
        <configuration>
          <property name="jdbcurl">jdbc:postgresql://server/db</property>
          <property name="jdbcuser">guest</property>
        </configuration>
      </export>
  • 43. EXPORTING TO KAFKA
      <export enabled="true" target="kafka">
        <configuration>
          <property name="metadata.broker.list">server1</property>
        </configuration>
      </export>
  • 44. EXPORTING TO RABBITMQ
      <export enabled="true" target="rabbitmq">
        <configuration>
          <property name="broker.host">server1</property>
        </configuration>
      </export>