© 2016 MapR Technologies, MapR Confidential
When Your Stream is
the System of Record
Seattle Kafka Meetup
Will Ochandarena, Sr Dir, Product
October 24, 2016
Agenda
• Streaming System of Record - What?
• A Little About MapR Streams
• Versioning a Real-time Data Pipeline
– Demo - MapR + StreamSets
Streaming System of Record
System of Record (n): information storage system that is
the authoritative data source for a given data element or
piece of information.
Who Does This Today?
[Diagram labels: Events, Processing, DB, More Processing, Long Term Storage]
Reprocessing is Hard
[Diagram labels: Events, Processing, DB, More Processing, Long Term Storage, Medium Term Storage, "?"]
Retention windows shown: 3d ago -> Now; 1 Year ago -> ~an hour ago
Easy Fix - Streaming System of Persistence
[Diagram labels: Events, Processing, DB, More Processing, Long Term Storage; a second Events / Long Term Storage pair also appears]
How Can a Stream Be a System of Record?
Imagine each event as a change to an entry in a database.
DMV_Updates
0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213}
3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
DL_ID    City                           Points
WillO    Mountain View, then San Jose   0
BradA    Atlanta                        0, then 2
Streams and Databases in Harmony
[Diagram labels: Inserts, Updates, "???"; database types Key-Val, Document, Graph, Wide Column, Time Series, Relational]
Which Makes a Better System of Record?
Which of these can be used to reconstruct the other?
0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213}
3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
DL_ID    City       Points
WillO    San Jose   0
BradA    Atlanta    2
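The stream can rebuild the table, but the table cannot rebuild the stream. A minimal sketch of the first direction, assuming the four DMV_Updates events are represented as Python dicts (the field names `key` and `change` and the function `rebuild_table` are illustrative, not part of any MapR API):

```python
from collections import defaultdict

# The four DMV_Updates events from the slide, in offset order.
events = [
    {"key": "WillO", "change": {"City": "Mountain View"}, "ts": "7/5/2009 04:01:01", "src": "dmv201"},
    {"key": "BradA", "change": {"City": "Atlanta"}, "ts": "5/11/2010 05:11:31", "src": "dmv1341"},
    {"key": "BradA", "change": {"Points": "+2"}, "ts": "6/22/2011 03:31:10", "src": "officer1213"},
    {"key": "WillO", "change": {"City": "San Jose"}, "ts": "11/1/2012 04:01:01", "src": "dmv1661"},
]

def rebuild_table(events):
    """Fold the events, in order, into the current DL_ID / City / Points table."""
    table = defaultdict(lambda: {"City": None, "Points": 0})
    for event in events:
        row = table[event["key"]]
        for field, value in event["change"].items():
            if field == "Points":
                row["Points"] += int(value)  # "+2" is a delta, not a new value
            else:
                row[field] = value           # City is overwritten by the latest event
    return dict(table)

table = rebuild_table(events)
as_of_2010 = rebuild_table(events[:2])  # replaying a prefix gives the table at an earlier time
```

Replaying a prefix of the stream yields the table as it stood at that point, which the finished table alone cannot provide.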
Other Benefits of Streaming System of Record
• Auditing - “how did BradA’s points get so high?”
• Lineage - “who added points to BradA’s license?”
• History - “where did WillO use to live?”
• Integrity - “can I trust this data hasn’t been tampered with?”
– Yup - Streams are immutable
0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213}
3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
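The auditing, lineage, and history questions can all be answered by filtering the event log. A rough sketch, assuming the same dict representation of the events (the helper `changes_for` and the field names are hypothetical):

```python
# The four DMV_Updates events from the slide, in offset order.
events = [
    {"key": "WillO", "change": {"City": "Mountain View"}, "ts": "7/5/2009 04:01:01", "src": "dmv201"},
    {"key": "BradA", "change": {"City": "Atlanta"}, "ts": "5/11/2010 05:11:31", "src": "dmv1341"},
    {"key": "BradA", "change": {"Points": "+2"}, "ts": "6/22/2011 03:31:10", "src": "officer1213"},
    {"key": "WillO", "change": {"City": "San Jose"}, "ts": "11/1/2012 04:01:01", "src": "dmv1661"},
]

def changes_for(events, key, field):
    """Return (timestamp, source, value) for every event that touched key.field."""
    return [
        (e["ts"], e["src"], e["change"][field])
        for e in events
        if e["key"] == key and field in e["change"]
    ]

# Auditing and lineage: every change to BradA's points, with when and by whom.
points_lineage = changes_for(events, "BradA", "Points")

# History: every city WillO has had, oldest first; all but the last are former cities.
city_history = [value for _, _, value in changes_for(events, "WillO", "City")]
previous_cities = city_history[:-1]
```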
What Do I Need For This to Work?
• Infinitely persisted events
• A way to query your persisted stream data
• An integrated security model across data services
• Applied Streaming System of Record @ Liaison Blog
About MapR & MapR Streams
MapR Streams:
Global Pub-sub Event Streaming System for Big Data
Producers publish billions of events/sec
to a topic in a stream.
Events persisted and immediately
delivered to all consumers, guaranteed.
Tie together geo-dispersed clusters.
Worldwide.
Standard real-time API (Kafka).
Integrates with Spark Streaming, Storm,
Apex, and Flink.
Direct data access (OJAI API) from
analytics frameworks.
[Diagram labels: Producers, Stream, Topic, Consumers, Remote sites and consumers, Batch analytics]
Streams Offers a Durable,
Persistent System of Record
[
  {"Topic1Part0Seq5001": {
    "timestamp" : 1456246886,
    "topic" : "Topic1",
    "partition" : 0,
    "producer" : "wochanda",
    "offset" : 5001,
    "key" : "MsgKey",
    "data" : {...}
  } },
  {"Topic2Part0Seq5002": { … } },
  …
]
● Reliable
● Secure
● Immutable
● Auditable
● Replayable
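"Immutable" and "Replayable" can be illustrated with a toy in-memory log. This is only a sketch of the idea, not the MapR Streams or Kafka API; the class `AppendOnlyLog` and its methods are hypothetical:

```python
class AppendOnlyLog:
    """Toy append-only log: records get increasing offsets and are never modified."""

    def __init__(self):
        self._records = []

    def append(self, key, data):
        """Store a record and return the offset it was assigned."""
        offset = len(self._records)
        self._records.append((key, data))
        return offset

    def read(self, from_offset=0):
        """Yield (offset, key, data) from from_offset onward; reading never consumes."""
        for offset in range(from_offset, len(self._records)):
            key, data = self._records[offset]
            yield offset, key, data

log = AppendOnlyLog()
for value in ("a", "b", "c"):
    log.append("MsgKey", value)

first_pass = [data for _, _, data in log.read()]
replay = [data for _, _, data in log.read(from_offset=0)]         # same events again
tail = [(offset, data) for offset, _, data in log.read(from_offset=1)]
```

Because reads do not remove records, any consumer can rewind its own cursor and reprocess from an earlier offset.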
Streams Enables Global Applications and Analytics
Provides
● Arbitrary topology of thousands of clusters
● Automatic loop prevention
● DNS-based discovery
● Globally synchronized message offsets
and consumer cursors
Enables
● Global applications & data collection
● Producer & consumer failover
● Analysis/filtering/aggregation at the edge
● “Occasional” connections
Producers
Consumers
Fun Facts: MapR Streams
Themes: Converged, Global Scale, Secure & Multi-Tenant
• Single cluster for files, tables, and streams.
• Global, IoT-scale “fabrics” with failover.
• Tenant-owned streams, logical grouping of topics and messages.
• Authentication, authorization, encryption.
• Unified policy with all other platform services.
• Infinite “system of record” persistence.
• Metadata tracked internally, no dependencies on ZK.
• Consumers, topics scale into millions.
MapR Converged Data Platform
[Architecture diagram labels, roughly top to bottom]
Applications: Open Source Engines & Tools, Commercial Engines & Applications, Custom Apps, Cloud and Managed Services, Data Processing, Search and Others
Unified Management and Monitoring
APIs: HDFS API, POSIX, NFS, HBase API, JSON API, Kafka API
MapR Data Platform Services: Web-Scale Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams)
Global Namespace | No Single Point of Failure | Data Protection | Multi-tenancy | Workload Management
Multi Temperature | Global Multi Datacenter | High Performance Low Latency | Security | Management & Monitoring
Commodity Hardware/Storage, Clouds, & Containers
Versioning a Real-time Data Pipeline
Challenges of a Streaming App Developer
Pre-Production
[Diagram labels: App Environment, Streaming System, Database, Hadoop Cluster; resources events, logs, /clicks shown next to v2 copies events2, logs2, /clicks2]
Challenges with Versioning
Post-Production
Input Data + App Logic = Output Data
Output data: Output Streams, Database Tables, Logs, Metrics
What if you deploy a new version of your application?
What happens to all of this?
Example: Versioning in Production
App versions: Calculate_Mean_3, Calculate_Median_3
Input_Stream: 45 40 60 30 37 39 72 79 60
Output_Stream: 45 35 70
Output_Table:
Time      Value
00:00:00  70
00:00:05  35
00:00:10  45
Versioning with Converged App Volumes
App versions: Calculate_Mean_3, Calculate_Median_3
Input_Stream: 45 40 60 30 37 39 72 79 60
Calculate_Mean_3 Volume
Output_Stream: 35 70
Output_Table:
Time      Value
00:00:00  70
00:00:05  35
00:00:10
Calculate_Median_3 Volume
Output_Stream: 45 37 72
Output_Table:
Time      Value
00:00:00  72
00:00:05  37
00:00:10  45
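The numbers in this example can be reproduced with a small sketch. Two things here are assumptions inferred from the slide values rather than stated on it: the input stream is drawn with the newest event on the left (so arrival order is the row read right to left), and the mean is truncated to an integer. The function names mirror the app names on the slide but are otherwise illustrative.

```python
from statistics import median

# Input_Stream read right to left, i.e. in assumed arrival order.
arrival_order = [60, 79, 72, 39, 37, 30, 60, 40, 45]

def tumbling_windows(values, size):
    """Split values into consecutive, non-overlapping windows of `size` items."""
    for start in range(0, len(values) - size + 1, size):
        yield values[start:start + size]

def calculate_mean_3(values):
    """Version 1 of the app: integer mean of each window of 3 events."""
    return [sum(window) // len(window) for window in tumbling_windows(values, 3)]

def calculate_median_3(values):
    """Version 2 of the app: median of each window of 3 events."""
    return [median(window) for window in tumbling_windows(values, 3)]

mean_output = calculate_mean_3(arrival_order)      # what the old version would write
median_output = calculate_median_3(arrival_order)  # new version replaying the full stream
```

With a shared output table, the rows written before the upgrade (70, 35) come from the mean and the row after it (45) from the median. Giving each version its own volume and replaying the input stream lets the median version produce a complete, consistent table (72, 37, 45).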
Versioning & A/B Testing
[Diagram: traffic split 80% / 10% / 10% across versions A, B, C]
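One common way to get an 80/10/10 split is to hash each event key into a bucket, so the same key always reaches the same version. A sketch under stated assumptions: the mapping A = 80%, B = 10%, C = 10% is inferred from the order of labels on the slide, and `assign_variant` is a hypothetical helper, not part of MapR Streams or StreamSets.

```python
import hashlib

# Assumed mapping of versions to traffic share (percentages must sum to 100).
SPLIT = [("A", 80), ("B", 10), ("C", 10)]

def assign_variant(key, split=SPLIT):
    """Deterministically map a key to a variant according to the split."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    upper = 0
    for variant, percent in split:
        upper += percent
        if bucket < upper:
            return variant
    raise ValueError("split percentages must sum to 100")

# Route 10,000 synthetic keys and count how many land on each version.
counts = {"A": 0, "B": 0, "C": 0}
for i in range(10000):
    counts[assign_variant(f"user-{i}")] += 1
```

Hashing the key, rather than choosing at random per event, keeps all events for one key on the same version, which makes the outputs of A, B, and C comparable.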
DEMO - MapR & StreamSets
Versioning a Production Data Pipeline
Rupal Shah - StreamSets
StreamSets Data Collector™
Adaptable Pipelines -> Efficiency
❑ Intent-driven ingest (minimal schema specification).
❑ Data drift handling.
Pipeline KPIs -> Visibility
❑ Real-time stage, edge and bad data metrics.
❑ Alerts via profiling, sampling and threshold-based rules.
Containerized Architecture -> Agility
❑ Flexible deployment: edge, cluster, embedded, pipeline, pub/sub.
❑ Zero-downtime upgrades due to logical component isolation.
StreamSets Data Collector™ is open source software for building and deploying individual any-to-any ingest pipelines in the face of data drift.
StreamSets Dataflow Performance Manager™
StreamSets Dataflow Performance Manager (DPM™) provides a single pane of glass to map, measure and master big data in motion.
MAP: Dataflow Lineage, Live Data Architecture
MEASURE: Any Path, Any Time
MASTER: Availability & Accuracy, Proactive Remediation
…helping you put data technology to work
● Find answers
● Ask technical questions
● Join on-demand training course discussions
● Follow release announcements
● Share and vote on product ideas
● Find Meetup and event listings
Connect with fellow Apache
Hadoop and Spark professionals
community.mapr.com
Backup
Find my slides & other related materials to this talk here:
bit.ly/tbd
or search:

Contenu connexe

Tendances

Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Kafka Streams vs. KSQL for Stream Processing on top of Apache KafkaKafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Kafka Streams vs. KSQL for Stream Processing on top of Apache KafkaKai Wähner
 
Message Driven and Event Sourcing
Message Driven and Event SourcingMessage Driven and Event Sourcing
Message Driven and Event SourcingPaolo Castagna
 
Kafka for data scientists
Kafka for data scientistsKafka for data scientists
Kafka for data scientistsJenn Rawlins
 
Enabling Smarter Cities and Connected Vehicles with an Event Streaming Platfo...
Enabling Smarter Cities and Connected Vehicles with an Event Streaming Platfo...Enabling Smarter Cities and Connected Vehicles with an Event Streaming Platfo...
Enabling Smarter Cities and Connected Vehicles with an Event Streaming Platfo...Kai Wähner
 
Unified Data Processing with Apache Flink and Apache Pulsar_Seth Wiesman
Unified Data Processing with Apache Flink and Apache Pulsar_Seth WiesmanUnified Data Processing with Apache Flink and Apache Pulsar_Seth Wiesman
Unified Data Processing with Apache Flink and Apache Pulsar_Seth WiesmanStreamNative
 
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Kai Wähner
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Summit
 
Express Scripts: Driving Digital Transformation from Mainframe to Microservices
Express Scripts: Driving Digital Transformation from Mainframe to MicroservicesExpress Scripts: Driving Digital Transformation from Mainframe to Microservices
Express Scripts: Driving Digital Transformation from Mainframe to Microservicesconfluent
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKai Wähner
 
Bridge to Cloud: Using Apache Kafka to Migrate to AWS
Bridge to Cloud: Using Apache Kafka to Migrate to AWSBridge to Cloud: Using Apache Kafka to Migrate to AWS
Bridge to Cloud: Using Apache Kafka to Migrate to AWSconfluent
 
Fully-Managed, Multi-Tenant Kafka Clusters: Tips, Tricks, and Tools (Christop...
Fully-Managed, Multi-Tenant Kafka Clusters: Tips, Tricks, and Tools (Christop...Fully-Managed, Multi-Tenant Kafka Clusters: Tips, Tricks, and Tools (Christop...
Fully-Managed, Multi-Tenant Kafka Clusters: Tips, Tricks, and Tools (Christop...confluent
 
Event streaming: A paradigm shift in enterprise software architecture
Event streaming: A paradigm shift in enterprise software architectureEvent streaming: A paradigm shift in enterprise software architecture
Event streaming: A paradigm shift in enterprise software architectureSina Sojoodi
 
Confluent Platform 5.4 + Apache Kafka 2.4 Overview (RBAC, Tiered Storage, Mul...
Confluent Platform 5.4 + Apache Kafka 2.4 Overview (RBAC, Tiered Storage, Mul...Confluent Platform 5.4 + Apache Kafka 2.4 Overview (RBAC, Tiered Storage, Mul...
Confluent Platform 5.4 + Apache Kafka 2.4 Overview (RBAC, Tiered Storage, Mul...Kai Wähner
 
Operational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in KafkaOperational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in Kafkaconfluent
 
Redis and Kafka - Advanced Microservices Design Patterns Simplified
Redis and Kafka - Advanced Microservices Design Patterns SimplifiedRedis and Kafka - Advanced Microservices Design Patterns Simplified
Redis and Kafka - Advanced Microservices Design Patterns SimplifiedAllen Terleto
 
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Kai Wähner
 
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it YourselfWhy Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it YourselfDATAVERSITY
 
IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X
IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X
IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X Kai Wähner
 
Etl is Dead; Long Live Streams
Etl is Dead; Long Live StreamsEtl is Dead; Long Live Streams
Etl is Dead; Long Live Streamsconfluent
 

Tendances (20)

Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Kafka Streams vs. KSQL for Stream Processing on top of Apache KafkaKafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
 
Message Driven and Event Sourcing
Message Driven and Event SourcingMessage Driven and Event Sourcing
Message Driven and Event Sourcing
 
Kafka for data scientists
Kafka for data scientistsKafka for data scientists
Kafka for data scientists
 
Enabling Smarter Cities and Connected Vehicles with an Event Streaming Platfo...
Enabling Smarter Cities and Connected Vehicles with an Event Streaming Platfo...Enabling Smarter Cities and Connected Vehicles with an Event Streaming Platfo...
Enabling Smarter Cities and Connected Vehicles with an Event Streaming Platfo...
 
Unified Data Processing with Apache Flink and Apache Pulsar_Seth Wiesman
Unified Data Processing with Apache Flink and Apache Pulsar_Seth WiesmanUnified Data Processing with Apache Flink and Apache Pulsar_Seth Wiesman
Unified Data Processing with Apache Flink and Apache Pulsar_Seth Wiesman
 
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
 
Streamline - Stream Analytics for Everyone
Streamline - Stream Analytics for EveryoneStreamline - Stream Analytics for Everyone
Streamline - Stream Analytics for Everyone
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike Freedman
 
Express Scripts: Driving Digital Transformation from Mainframe to Microservices
Express Scripts: Driving Digital Transformation from Mainframe to MicroservicesExpress Scripts: Driving Digital Transformation from Mainframe to Microservices
Express Scripts: Driving Digital Transformation from Mainframe to Microservices
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid Cloud
 
Bridge to Cloud: Using Apache Kafka to Migrate to AWS
Bridge to Cloud: Using Apache Kafka to Migrate to AWSBridge to Cloud: Using Apache Kafka to Migrate to AWS
Bridge to Cloud: Using Apache Kafka to Migrate to AWS
 
Fully-Managed, Multi-Tenant Kafka Clusters: Tips, Tricks, and Tools (Christop...
Fully-Managed, Multi-Tenant Kafka Clusters: Tips, Tricks, and Tools (Christop...Fully-Managed, Multi-Tenant Kafka Clusters: Tips, Tricks, and Tools (Christop...
Fully-Managed, Multi-Tenant Kafka Clusters: Tips, Tricks, and Tools (Christop...
 
Event streaming: A paradigm shift in enterprise software architecture
Event streaming: A paradigm shift in enterprise software architectureEvent streaming: A paradigm shift in enterprise software architecture
Event streaming: A paradigm shift in enterprise software architecture
 
Confluent Platform 5.4 + Apache Kafka 2.4 Overview (RBAC, Tiered Storage, Mul...
Confluent Platform 5.4 + Apache Kafka 2.4 Overview (RBAC, Tiered Storage, Mul...Confluent Platform 5.4 + Apache Kafka 2.4 Overview (RBAC, Tiered Storage, Mul...
Confluent Platform 5.4 + Apache Kafka 2.4 Overview (RBAC, Tiered Storage, Mul...
 
Operational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in KafkaOperational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in Kafka
 
Redis and Kafka - Advanced Microservices Design Patterns Simplified
Redis and Kafka - Advanced Microservices Design Patterns SimplifiedRedis and Kafka - Advanced Microservices Design Patterns Simplified
Redis and Kafka - Advanced Microservices Design Patterns Simplified
 
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
 
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it YourselfWhy Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
 
IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X
IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X
IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X
 
Etl is Dead; Long Live Streams
Etl is Dead; Long Live StreamsEtl is Dead; Long Live Streams
Etl is Dead; Long Live Streams
 

En vedette

Strata+Hadoop 2015 Keynote: Impacting Business as it Happens
Strata+Hadoop 2015 Keynote: Impacting Business as it HappensStrata+Hadoop 2015 Keynote: Impacting Business as it Happens
Strata+Hadoop 2015 Keynote: Impacting Business as it HappensMapR Technologies
 
SapientNitro: Multi-channel and the Convergence of Marketing, Commerce & Cust...
SapientNitro: Multi-channel and the Convergence of Marketing, Commerce & Cust...SapientNitro: Multi-channel and the Convergence of Marketing, Commerce & Cust...
SapientNitro: Multi-channel and the Convergence of Marketing, Commerce & Cust...Day Software
 
Redefining Perspectives 6 - Session 1 Jarlath Forde
Redefining Perspectives 6 - Session 1 Jarlath FordeRedefining Perspectives 6 - Session 1 Jarlath Forde
Redefining Perspectives 6 - Session 1 Jarlath Fordesapientindia
 
Hadoop Self-Service Data Prep Fuels Analytics
Hadoop Self-Service Data Prep Fuels AnalyticsHadoop Self-Service Data Prep Fuels Analytics
Hadoop Self-Service Data Prep Fuels AnalyticsSenturus
 
Digital Velocity Europe 2015 | Sapient Nitro Presentation
Digital Velocity Europe 2015 | Sapient Nitro PresentationDigital Velocity Europe 2015 | Sapient Nitro Presentation
Digital Velocity Europe 2015 | Sapient Nitro PresentationTealium
 
Databeers Dub #1 - Krithika Ram - Customer Journey Analytics
Databeers Dub #1 - Krithika Ram - Customer Journey AnalyticsDatabeers Dub #1 - Krithika Ram - Customer Journey Analytics
Databeers Dub #1 - Krithika Ram - Customer Journey AnalyticsDatabeers Dublin
 
Understanding Customer Buying Journey with Big Data
Understanding Customer Buying Journey with Big DataUnderstanding Customer Buying Journey with Big Data
Understanding Customer Buying Journey with Big DataAnalyticsWeek
 
Tamr | Strata hadoop 2014 Michael Stonebraker
Tamr | Strata hadoop 2014 Michael StonebrakerTamr | Strata hadoop 2014 Michael Stonebraker
Tamr | Strata hadoop 2014 Michael StonebrakerTamr_Inc
 
Primavera vision & roadmap collaborate13 april 2013
Primavera vision & roadmap collaborate13 april 2013Primavera vision & roadmap collaborate13 april 2013
Primavera vision & roadmap collaborate13 april 2013p6academy
 
Content Strategy for the Customer Journey: Personalization Done Right Confab ...
Content Strategy for the Customer Journey: Personalization Done Right Confab ...Content Strategy for the Customer Journey: Personalization Done Right Confab ...
Content Strategy for the Customer Journey: Personalization Done Right Confab ...Kevin Nichols
 
Consumer Decision Journey in the Digital Age
Consumer Decision Journey in the Digital AgeConsumer Decision Journey in the Digital Age
Consumer Decision Journey in the Digital AgeAlok Ranjan
 
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaExtend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaDataWorks Summit/Hadoop Summit
 
Tamr | cdo-summit
Tamr | cdo-summitTamr | cdo-summit
Tamr | cdo-summitTamr_Inc
 
Michael Stonebraker How to do Complex Analytics
Michael Stonebraker How to do Complex AnalyticsMichael Stonebraker How to do Complex Analytics
Michael Stonebraker How to do Complex AnalyticsMassTLC
 
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifactahuguk
 
Consumer decision journey HBR
Consumer decision journey HBR Consumer decision journey HBR
Consumer decision journey HBR Sameer Mathur
 
How PepsiCo's Big Data Strategy is Disrupting CPG Retail Analytics
How PepsiCo's Big Data Strategy is Disrupting CPG Retail AnalyticsHow PepsiCo's Big Data Strategy is Disrupting CPG Retail Analytics
How PepsiCo's Big Data Strategy is Disrupting CPG Retail AnalyticsHortonworks
 
Customer Journey Mapping
Customer Journey MappingCustomer Journey Mapping
Customer Journey MappingLenati
 
Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Cloudera, Inc.
 

En vedette (20)

Strata+Hadoop 2015 Keynote: Impacting Business as it Happens
Strata+Hadoop 2015 Keynote: Impacting Business as it HappensStrata+Hadoop 2015 Keynote: Impacting Business as it Happens
Strata+Hadoop 2015 Keynote: Impacting Business as it Happens
 
SapientNitro: Multi-channel and the Convergence of Marketing, Commerce & Cust...
SapientNitro: Multi-channel and the Convergence of Marketing, Commerce & Cust...SapientNitro: Multi-channel and the Convergence of Marketing, Commerce & Cust...
SapientNitro: Multi-channel and the Convergence of Marketing, Commerce & Cust...
 
Redefining Perspectives 6 - Session 1 Jarlath Forde
Redefining Perspectives 6 - Session 1 Jarlath FordeRedefining Perspectives 6 - Session 1 Jarlath Forde
Redefining Perspectives 6 - Session 1 Jarlath Forde
 
Hadoop Self-Service Data Prep Fuels Analytics
Hadoop Self-Service Data Prep Fuels AnalyticsHadoop Self-Service Data Prep Fuels Analytics
Hadoop Self-Service Data Prep Fuels Analytics
 
Digital Velocity Europe 2015 | Sapient Nitro Presentation
Digital Velocity Europe 2015 | Sapient Nitro PresentationDigital Velocity Europe 2015 | Sapient Nitro Presentation
Digital Velocity Europe 2015 | Sapient Nitro Presentation
 
Databeers Dub #1 - Krithika Ram - Customer Journey Analytics
Databeers Dub #1 - Krithika Ram - Customer Journey AnalyticsDatabeers Dub #1 - Krithika Ram - Customer Journey Analytics
Databeers Dub #1 - Krithika Ram - Customer Journey Analytics
 
Connecting the Dots: Analytics and the Customer Journey
Connecting the Dots: Analytics and the Customer JourneyConnecting the Dots: Analytics and the Customer Journey
Connecting the Dots: Analytics and the Customer Journey
 
Understanding Customer Buying Journey with Big Data
Understanding Customer Buying Journey with Big DataUnderstanding Customer Buying Journey with Big Data
Understanding Customer Buying Journey with Big Data
 
Tamr | Strata hadoop 2014 Michael Stonebraker
Tamr | Strata hadoop 2014 Michael StonebrakerTamr | Strata hadoop 2014 Michael Stonebraker
Tamr | Strata hadoop 2014 Michael Stonebraker
 
Primavera vision & roadmap collaborate13 april 2013
Primavera vision & roadmap collaborate13 april 2013Primavera vision & roadmap collaborate13 april 2013
Primavera vision & roadmap collaborate13 april 2013
 
Content Strategy for the Customer Journey: Personalization Done Right Confab ...
Content Strategy for the Customer Journey: Personalization Done Right Confab ...Content Strategy for the Customer Journey: Personalization Done Right Confab ...
Content Strategy for the Customer Journey: Personalization Done Right Confab ...
 
Consumer Decision Journey in the Digital Age
Consumer Decision Journey in the Digital AgeConsumer Decision Journey in the Digital Age
Consumer Decision Journey in the Digital Age
 
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaExtend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
 
Tamr | cdo-summit
Tamr | cdo-summitTamr | cdo-summit
Tamr | cdo-summit
 
Michael Stonebraker How to do Complex Analytics
Michael Stonebraker How to do Complex AnalyticsMichael Stonebraker How to do Complex Analytics
Michael Stonebraker How to do Complex Analytics
 
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
 
Consumer decision journey HBR
Consumer decision journey HBR Consumer decision journey HBR
Consumer decision journey HBR
 
How PepsiCo's Big Data Strategy is Disrupting CPG Retail Analytics
How PepsiCo's Big Data Strategy is Disrupting CPG Retail AnalyticsHow PepsiCo's Big Data Strategy is Disrupting CPG Retail Analytics
How PepsiCo's Big Data Strategy is Disrupting CPG Retail Analytics
 
Customer Journey Mapping
Customer Journey MappingCustomer Journey Mapping
Customer Journey Mapping
 
Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360
 

Similaire à Map r seattle streams meetup oct 2016

Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...MapR Technologies
 
How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications MapR Technologies
 
How Spark is Enabling the New Wave of Converged Applications
How Spark is Enabling  the New Wave of Converged ApplicationsHow Spark is Enabling  the New Wave of Converged Applications
How Spark is Enabling the New Wave of Converged ApplicationsMapR Technologies
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataCarol McDonald
 
Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteTed Dunning
 
The Keys to Digital Transformation
The Keys to Digital TransformationThe Keys to Digital Transformation
The Keys to Digital TransformationMapR Technologies
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainMapR Technologies
 
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016Mathieu Dumoulin
 
Map r seattle streams meetup oct 2016

  • 1. © 2016 MapR Technologies, MapR Confidential. When Your Stream is the System of Record. Seattle Kafka Meetup. Will Ochandarena, Sr Dir, Product. October 24 2016
  • 2. Agenda
      • Streaming System of Record - What?
      • A Little About MapR Streams
      • Versioning a Real-time Data Pipeline
          – Demo - MapR + StreamSets
  • 3. Streaming System of Record. System of Record (n): information storage system that is the authoritative data source for a given data element or piece of information.
  • 4. Who Does This Today? [Diagram labels: Events; Processing; DB; More Processing; Long Term Storage]
  • 5. Reprocessing is Hard [Diagram labels: Events; Processing; DB; More Processing; Long Term Storage; ?; Medium Term Storage; 3d ago -> Now; 1 Year ago -> ~an hour ago]
  • 6. Easy Fix - Streaming System of Persistence [Diagram labels: Events; Processing; DB; More Processing; Long Term Storage; Long Term Storage; Events]
  • 7. How Can a Stream Be a System of Record? Imagine each event as a change to an entry in a database.
      DMV_Updates:
      0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
      1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
      2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213 }
      3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
      DL_ID   City                           Points
      WillO   Mountain View, then San Jose   0
      BradA   Atlanta                        0, then 2
  • 8. Streams and Databases in Harmony [Diagram labels: Key-Val; Document; Graph; Wide Column; Time Series; Relational; ???; Inserts; Updates]
  • 9. Which Makes a Better System of Record? Which of these can be used to reconstruct the other?
      0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
      1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
      2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213 }
      3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
      DL_ID   City       Points
      WillO   San Jose   0
      BradA   Atlanta    2
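The question on this slide can be made concrete with a minimal sketch, assuming each event is held as a Python dict. The field names `id` and `change`, the `replay` function, and the choice to treat `Points` as an additive delta are illustrative assumptions, not part of the talk or of any MapR API. Replaying the four events in order rebuilds the table, whereas the table alone cannot recover the events.

```python
# Events copied from the slide; the dict layout is an assumption for illustration.
EVENTS = [
    {"id": "WillO", "change": {"City": "Mountain View"}, "ts": "7/5/2009 04:01:01", "src": "dmv201"},
    {"id": "BradA", "change": {"City": "Atlanta"}, "ts": "5/11/2010 05:11:31", "src": "dmv1341"},
    {"id": "BradA", "change": {"Points": 2}, "ts": "6/22/2011 03:31:10", "src": "officer1213"},
    {"id": "WillO", "change": {"City": "San Jose"}, "ts": "11/1/2012 04:01:01", "src": "dmv1661"},
]


def replay(events):
    """Fold the event stream, in order, into the current table state."""
    table = {}
    for event in events:
        # New licence holders start with no city and zero points.
        row = table.setdefault(event["id"], {"City": None, "Points": 0})
        for field, value in event["change"].items():
            if field == "Points":
                row["Points"] += value  # "+2" on the slide is read as a delta
            else:
                row[field] = value  # later events overwrite earlier values
    return table


table = replay(EVENTS)
for dl_id, row in table.items():
    print(dl_id, row["City"], row["Points"])
```

The reverse direction has no equivalent function: the final table no longer contains Mountain View, the timestamps, or the `src` values.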
  • 10. Other Benefits of Streaming System of Record
      • Auditing - “how did BradA’s points get so high?”
      • Lineage - “who added points to BradA’s license?”
      • History - “where did WillO use to live?”
      • Integrity - “can I trust this data hasn’t been tampered with?”
          • Yup - Streams are immutable
      0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
      1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
      2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213 }
      3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
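The lineage and history questions on this slide reduce to filters over the retained events. The sketch below is a hedged illustration only: the dict layout, the `changes_to` helper, and the use of list position as the offset are assumptions made for this example rather than anything shown in the talk.

```python
# Events copied from the slide; the dict layout is an assumption for illustration.
EVENTS = [
    {"id": "WillO", "change": {"City": "Mountain View"}, "ts": "7/5/2009 04:01:01", "src": "dmv201"},
    {"id": "BradA", "change": {"City": "Atlanta"}, "ts": "5/11/2010 05:11:31", "src": "dmv1341"},
    {"id": "BradA", "change": {"Points": 2}, "ts": "6/22/2011 03:31:10", "src": "officer1213"},
    {"id": "WillO", "change": {"City": "San Jose"}, "ts": "11/1/2012 04:01:01", "src": "dmv1661"},
]


def changes_to(events, dl_id, field):
    """Return (offset, event) pairs that touched `field` for `dl_id`, in stream order."""
    return [
        (offset, event)
        for offset, event in enumerate(events)
        if event["id"] == dl_id and field in event["change"]
    ]


# Lineage: who added points to BradA's license?
who_added_points = [event["src"] for _, event in changes_to(EVENTS, "BradA", "Points")]

# History: where did WillO use to live? Every city except the most recent one.
cities = [event["change"]["City"] for _, event in changes_to(EVENTS, "WillO", "City")]
previous_cities = cities[:-1]

print(who_added_points)
print(previous_cities)
```

Neither answer is available from the current-state table, which keeps only the latest value per field.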
  • 11. What Do I Need For This to Work?
      • Infinitely persisted events
      • A way to query your persisted stream data
      • An integrated security model across data services
      • Applied Streaming System of Record @ Liaison Blog
  • 12. About MapR & MapR Streams
  • 13. MapR Streams: Global Pub-sub Event Streaming System for Big Data
      • Producers publish billions of events/sec to a topic in a stream.
      • Events persisted and immediately delivered to all consumers, guaranteed.
      • Tie together geo-dispersed clusters. Worldwide.
      • Standard real-time API (Kafka). Integrates with Spark Streaming, Storm, Apex, and Flink.
      • Direct data access (OJAI API) from analytics frameworks.
      [Diagram labels: Producers; Stream; Topic; Topic; Consumers; Remote sites and consumers; Batch analytics]
  • 14. Streams Offers a Durable, Persistent System of Record
      [
        {"Topic1Part0Seq5001": {
            "timestamp" : 1456246886,
            "topic" : "Topic1",
            "partition" : 0,
            "producer" : "wochanda",
            "offset" : 5001,
            "key" : "MsgKey",
            "data" : {...} } },
        {"Topic2Part0Seq5002": { … } },
        …
      ]
      ● Reliable
      ● Secure
      ● Immutable
      ● Auditable
      ● Replayable
  • 15. Streams Enables Global Applications and Analytics
      Provides
      ● Arbitrary topology of thousands of clusters
      ● Automatic loop prevention
      ● DNS-based discovery
      ● Globally synchronized message offsets and consumer cursors
      Enables
      ● Global applications & data collection
      ● Producer & consumer failover
      ● Analysis/filtering/aggregation at the edge
      ● “Occasional” connections
      [Diagram labels: Producers; Consumers]
  • 16. MapR Streams. Section headings: Converged; Global Scale; Secure & Multi-Tenant; Fun Facts.
      • Single cluster for files, tables, and streams.
      • Global, IoT-scale “fabrics” with failover.
      • Tenant-owned streams, logical grouping of topics and messages.
      • Authentication, authorization, encryption.
      • Unified policy with all other platform services.
      • Infinite “system of record” persistence.
      • Metadata tracked internally, no dependencies on ZK.
      • Consumers, topics scale into millions.
  • 17. MapR Converged Data Platform [Diagram labels: Open Source Engines & Tools; Commercial Engines & Applications; Cloud and Managed Services; Custom Apps; Data Processing; Search and Others; Unified Management and Monitoring; MapR Data Platform Services; Web-Scale Storage (MapR-FS); Database (MapR-DB); Event Streaming (MapR Streams); HDFS API; POSIX, NFS; HBase API; JSON API; Kafka API; Global Namespace | No Single Point of Failure | Data Protection | Multi-tenancy | Workload Management; Multi Temperature | Global Multi Datacenter | High Performance Low Latency | Security | Management & Monitoring; Commodity Hardware/Storage, Clouds, & Containers]
  • 18. Versioning a Real-time Data Pipeline
  • 19. Challenges of a Streaming App Developer: Pre-Production [Diagram labels: Streaming System; Database; Hadoop Cluster; App Environment; events; logs; events2; logs2; v2; v2; /clicks; /clicks2; ...]
  • 20. Challenges with Versioning: Post-Production
      Input Data + App Logic = Output Data (Output Streams; Database Tables; Logs, Metrics)
      What if you deploy a new version of your application? What happens to all of this?
  • 21. Example: Versioning in Production
      Input_Stream: 45 40 60 30 37 39 72 79 60
      Application versions shown: Calculate_Mean_3, Calculate_Median_3
      Output_Stream: 45 35 70
      Output_Table:
      Time       Value
      00:00:00   70
      00:00:05   35
      00:00:10   45
  • 22. Versioning with Converged App Volumes
      Input_Stream: 45 40 60 30 37 39 72 79 60
      Calculate_Mean_3 Volume
      Output_Stream: 35 70
      Output_Table:
      Time       Value
      00:00:00   70
      00:00:05   35
      00:00:10
      Calculate_Median_3 Volume
      Output_Stream: 45 37 72
      Output_Table:
      Time       Value
      00:00:00   72
      00:00:05   37
      00:00:10   45
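The numbers on this slide can be reproduced with a small sketch, under assumptions that the slide does not state explicitly: the input stream is drawn newest-first, each application works on non-overlapping windows of three readings, and the mean is truncated to a whole number. The function names mirror the slide labels, but `tumbling_windows` and `run` are hypothetical helpers for illustration. Because the input stream is retained, the new version can replay it from the start into its own outputs.

```python
from statistics import median

# Readings as drawn on the slide, assumed to be newest-first (left = most recent).
INPUT_STREAM_AS_DRAWN = [45, 40, 60, 30, 37, 39, 72, 79, 60]


def tumbling_windows(values, size):
    """Split values into consecutive, non-overlapping windows of `size`."""
    return [values[i:i + size] for i in range(0, len(values) - size + 1, size)]


def calculate_mean_3(window):
    # Integer mean, an assumption chosen to match the whole numbers on the slide.
    return sum(window) // len(window)


def calculate_median_3(window):
    return median(window)


def run(app, stream):
    """Replay the full input stream through one application version."""
    return [app(window) for window in tumbling_windows(stream, 3)]


chronological = INPUT_STREAM_AS_DRAWN[::-1]  # oldest reading first

# Each version writes to its own output, so neither overwrites the other.
mean_volume = run(calculate_mean_3, chronological)
median_volume = run(calculate_median_3, chronological)

print("Calculate_Mean_3:  ", mean_volume)
print("Calculate_Median_3:", median_volume)
```

Under these assumptions the first two mean values and all three median values agree with the tables on the slide. The third mean value is not shown on the slide because the mean version had been replaced by then.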
  • 23. Versioning & A/B Testing [Diagram labels: A; B; C; 80%; 10%; 10%]
  • 24. DEMO - MapR & StreamSets: Versioning a Production Data Pipeline. Rupal Shah - StreamSets
  • 25. StreamSets Data Collector™
      StreamSets Data Collector™ is open source software for building and deploying individual any-to-any ingest pipelines in the face of data drift.
      Adaptable Pipelines -> Efficiency
      ❑ Intent-driven ingest (minimal schema specification).
      ❑ Data drift handling.
      Pipeline KPIs -> Visibility
      ❑ Real-time stage, edge and bad data metrics.
      ❑ Alerts via profiling, sampling and threshold-based rules.
      Containerized Architecture -> Agility
      ❑ Flexible deployment: edge, cluster, embedded, pipeline, pub/sub
      ❑ Zero-downtime upgrades due to logical component isolation.
  • 26. StreamSets Dataflow Performance Manager™
      StreamSets Dataflow Performance Manager (DPM™) provides a single pane of glass to map, measure and master big data in motion.
      MAP: Dataflow Lineage; Live Data Architecture
      MEASURE: Any Path; Any Time
      MASTER: Availability & Accuracy; Proactive Remediation
  • 27. …helping you put data technology to work
      ● Find answers
      ● Ask technical questions
      ● Join on-demand training course discussions
      ● Follow release announcements
      ● Share and vote on product ideas
      ● Find Meetup and event listings
      Connect with fellow Apache Hadoop and Spark professionals: community.mapr.com
  • 28. Backup
  • 29. Find my slides & other related materials to this talk here: bit.ly/tbd or search: