In this Strata 2018 presentation, Ted Malaska and Mark Grover discuss how to make the most of big data at speed.
https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/72396
7. How can we reduce the insight gap?
[Diagram labels: Users, User interface, Analytical interface, Decision maker, Insight Lag]
8. What contributes to the insight gap?
● Slow ingest and ETL
○ Derived data takes a while to become available.
● Slow human insights
○ Storage systems are not effective.
○ Tools for analyzing/gaining insights are not productive.
● Slow automated decisions
○ Developing and training models is hard.
9. Inside the “insight box” - historically
[Diagram labels: Source Systems A, B, and C; ETL Engine; Data Warehouse]
11. How Lyft is pushing the envelope
● Detecting driver scarcity (or abundance) and incentivizing drivers to be where the passengers are
○ Marketplace imbalance is not good
● Marketplace parameters consist of:
○ Drivers
○ Passengers
○ Geography
○ Time!
● Use data to decide if, when, and which incentive to deploy
● Deploy the right incentive automatically
20. Inside the “insight box” - Now
[Diagram labels: Source Systems A, B, and C; Pipes; Analytical Storage; Long Term Storage; Searchable Storage; Time Series Storage; In Memory Windowing State; Auditing & Governance]
21. Inside the “insight box” - Now
[Diagram labels: Source Systems A, B, and C; Pipes; Auditing & Governance; and the storage components below, each with its annotation]
● Long Term Storage: archival and storage
● Analytical Storage: managed storage, SQL queries
● Searchable Storage: for a user X
● Time Series Storage: Grafana, Wavefront-style dashboards
● In Memory Windowing State: sessionization, windowing, etc.
23. Importance of Auditing & Governance
● Protect against the disorder
● Isolate Kafka topics for different use cases
● Dynamic topic creation and routing is key
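One way to picture isolated topics with dynamic routing is a small router that derives a topic name from an event's use case and creates the topic the first time it is needed. The sketch below is a plain-Python illustration, not Kafka client code and not the presenters' implementation: `TopicRouter`, the `use_case` and `ride_id` fields, and the `<env>.<use_case>.events` naming scheme are all assumptions made for the example, and the in-memory dictionary stands in for a real broker.

```python
from collections import defaultdict


class TopicRouter:
    """Hypothetical in-memory stand-in for per-use-case topic routing."""

    def __init__(self, env="prod"):
        self.env = env
        # Topic name -> list of events; a real system would talk to a broker here.
        self.topics = defaultdict(list)

    def topic_for(self, event):
        # Assumed naming scheme: one topic per use case, so consumers stay isolated.
        use_case = event.get("use_case", "unrouted")
        return f"{self.env}.{use_case}.events"

    def route(self, event):
        topic = self.topic_for(event)
        # The defaultdict creates the topic on first use (dynamic creation).
        self.topics[topic].append(event)
        return topic


router = TopicRouter()
router.route({"use_case": "pricing", "ride_id": 1})
router.route({"use_case": "fraud", "ride_id": 2})
router.route({"ride_id": 3})  # no use case: lands in a catch-all topic
```

Keeping the naming rule in one function means an audit of which data went where only has to inspect `topic_for`, which is one plausible reading of how routing supports governance.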
27. How we take action: Learn and Act
[Diagram labels: Source Systems A, B, and C; Pipes; Analytical Storage; Analysis; Programmer; Actionable Systems]
28. How we take action: Batch Generated Actions
[Diagram labels: Source Systems A, B, and C; Pipes; Analytical Storage; Batch Job; Programmer; Automation; Actionable Systems]
29. How we take action: Stream Generated Actions
[Diagram labels: Source Systems A, B, and C; Pipes; Stream Processing; Pipes; Storage; Model; Reviewers; Actionable Systems]
30. Inside the “insight box” - Now
[Diagram labels: Source Systems A, B, and C; Pipes; Analytical Storage; Long Term Storage; Searchable Storage; Time Series Storage; Stream Processing; Auditing & Governance; Actionable Systems]
31. Faster Decisions
● Need to have a mindset of streaming data
○ Streams are tables
■ Tumbling
■ Sliding
■ Sessionization
■ Custom
● Train in Streams
● Output is Streams
● All the things are Streams
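The window types named above can be made concrete with a few small functions. This is a minimal sketch in plain Python rather than any particular streaming engine; the function names, integer timestamps, and half-open `[start, end)` convention are assumptions chosen for illustration. Custom windows are left out because their rules are application-specific.

```python
def tumbling_window(ts, size):
    """Return the one fixed, non-overlapping window [start, end) containing ts."""
    start = (ts // size) * size
    return (start, start + size)


def sliding_windows(ts, size, slide):
    """Return every overlapping window [start, end) of length `size` containing ts."""
    windows = []
    start = (ts // slide) * slide  # latest window start at or before ts
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def session_windows(timestamps, gap):
    """Group timestamps into sessions; silence longer than `gap` starts a new one."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions
```

For example, with 60-unit windows sliding every 30 units, an event at time 125 belongs to one tumbling window but two sliding windows, while session boundaries depend only on the gaps between events.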
43. Streams are Tables
● Feature creation based on windows
● Batch as Streaming
○ Partition by Entity
○ Sort By Time
○ Flatmap for every window trigger
● A batch model can be fed by streaming windows
● Output is a Stream as well
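The "batch as streaming" recipe above (partition by entity, sort by time, flatmap per window) can be sketched over an in-memory list of events. This is an illustration under stated assumptions, not the presenters' code: the event fields `entity`, `ts`, and `value`, the `driver-*` identifiers, the use of tumbling windows, and the `count`/`total` features are all hypothetical choices for the example.

```python
from itertools import groupby
from operator import itemgetter


def windowed_features(events, window_size):
    """Replay a batch as a stream: partition by entity, sort by time,
    then emit one feature row per (entity, tumbling window)."""
    # Sorting on (entity, ts) partitions by entity and orders by time in one pass.
    ordered = sorted(events, key=itemgetter("entity", "ts"))
    rows = []
    for entity, entity_events in groupby(ordered, key=itemgetter("entity")):
        # Flatmap step: each entity expands into one row per window it touches.
        by_window = groupby(entity_events, key=lambda e: e["ts"] // window_size)
        for window_index, window_events in by_window:
            values = [e["value"] for e in window_events]
            rows.append({
                "entity": entity,
                "window_start": window_index * window_size,
                "count": len(values),
                "total": sum(values),
            })
    return rows


events = [
    {"entity": "driver-2", "ts": 5, "value": 1.0},
    {"entity": "driver-1", "ts": 70, "value": 2.0},
    {"entity": "driver-1", "ts": 10, "value": 3.0},
    {"entity": "driver-1", "ts": 20, "value": 4.0},
]
features = windowed_features(events, window_size=60)
```

Because the rows come out ordered by entity and window start, the result can itself be read as a stream of feature records, which matches the point that the output is a stream as well.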
44. Journey From Input to Value
[Diagram labels: Source Systems A, B, and C; Pipes; Analytical Storage; Long Term Storage; Searchable Storage; Time Series Storage; Stream Processing; Auditing & Governance; Actionable Systems]