"Building Real-Time Data Pipelines with Kafka and MemSQL" by Rick Negrin, Director of Product Management at MemSQL for Orange County Roadshow March 17, 2017.
1. Rick Negrin, Director of Product Management, MemSQL
March 3, 2017
Enabling Real-Time Analytics for IoT
Building Real-Time Data Pipelines with Kafka and MemSQL
2. The Rise of Real-Time Analytics
On-demand economy
Internet of Things
New technologies
5. REAL-TIME ANALYTICS
Sensor Data
PMML Predictive Model
Oil rig sensor activity
Fortune 500 Oil Company
BUSINESS BENEFITS
▪ Streaming well drilling sensor data mitigates $1M per day of lost productivity and drill damage
▪ Met 20TB target environment TCO objective at a dramatically lower cost than SAP HANA
TECHNICAL BENEFITS
▪ Quickly moved existing processes from batch to real-time
▪ Enabled machine learning to score streaming data
▪ Repurposed existing SAS model using PMML
▪ Joined multiple data types and third-party sources including geospatial and weather data
6. Smart Grid
Enterprise Service Bus
Persistence
Ad-hoc data science
Smart Data Access
Fortune 500 Energy Utility
BUSINESS BENEFITS
▪ Using real-time and historical analytics of smart meters to improve energy efficiency
▪ Reduce grid outages for improved customer experience and maintain/extend service pricing
▪ Proactive maintenance reduces energy operating costs
▪ Lowers fossil fuel consumption
TECHNICAL BENEFITS
▪ Analyze usage trends across 1.6M smart meters and proactively manage the grid to reduce outages
▪ Data Warehouse for data scientists and grid analysis applications
8. MemEx: IoT Showcase Application
- Combines Apache Kafka, Spark, MemSQL, and OpenMaps for global supply chain management
- Enables enterprises to predict throughput of supply warehouses
- Processes 2 million data points, based on 2,000 sensors across 1,000 warehouses
15. Top Energy Firm
Real-time drilling sensor data to manage the high stakes of producing oil in a depressed market and to maximize productivity.
16. TECHNICAL BENEFITS
- Enabled machine learning scoring of streaming data for real-time predictive analytics
- Integrated SAS BI PMML for deep analytics
- Joined multiple data types and third-party sources including geospatial and weather data
17. Spark MLlib Predictive Model
REAL-TIME INPUTS
Raw Sensor 1 (S1) + Predictive Score 1 (P1)
BUSINESS LOGIC
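The flow on this slide (a raw sensor reading paired with a predictive score, then handed to business logic) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `toy_model` is a hypothetical stand-in for a trained Spark MLlib or PMML model, and the `ScoredReading` type, the sample values, and the 0.8 alert threshold are invented for the example rather than taken from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class ScoredReading:
    sensor_id: str
    raw_value: float  # S1: the raw sensor reading
    score: float      # P1: the predictive score attached to that reading


def toy_model(raw_value: float) -> float:
    # Hypothetical stand-in for a trained model: maps a reading to a
    # risk score clamped to the range [0, 1].
    return min(max((raw_value - 50.0) / 50.0, 0.0), 1.0)


def score_stream(
    readings: Iterable[Tuple[str, float]],
    model: Callable[[float], float],
) -> Iterator[ScoredReading]:
    # Pair each raw reading with its score as it arrives, so downstream
    # logic sees both values together.
    for sensor_id, raw_value in readings:
        yield ScoredReading(sensor_id, raw_value, model(raw_value))


def business_logic(reading: ScoredReading, threshold: float = 0.8) -> str:
    # Assumed rule for illustration: alert when the score crosses a threshold.
    return "ALERT" if reading.score >= threshold else "OK"


# Made-up sample readings for sensor S1.
readings = [("S1", 40.0), ("S1", 75.0), ("S1", 95.0)]
decisions = [
    (r.raw_value, r.score, business_logic(r))
    for r in score_stream(readings, toy_model)
]
```

In a real deployment the readings would arrive from a stream such as a Kafka topic and the scoring function would wrap the actual trained model; the generator here only mimics that one-record-at-a-time shape.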
18. Continued Rise of IoT
Sensor Array
PoS Systems
Connected Fleets
Mobile Apps
Security
Reporting Systems
Log Systems
Data Lake
Data Warehouse
Databases
“By 2020, over 20 billion connected things will be in use across a range of industries; the IoT will touch every role across the enterprise.”
Source: Gartner
19. Amazon Invests in Drones for 30 Minute Post-Order Deliveries
“These are highly automated drones. They have what is called sense-and-avoid technology. That means, basically, seeing and then avoiding obstacles.”
Yahoo, January 2016: https://www.yahoo.com/tech/exclusive-amazon-reveals-details-about-1343951725436982.html
20. FedEx Breaks Record With 317 Million Packages Shipped Over Christmas 2015
“FedEx Ground continues to advance the industry’s most automated hub network with investments in package sortation systems that enable flexible and reliable operations and six-sided scanning tunnels that boost data and image capture.”
FedEx, October 2015: http://about.van.fedex.com/newsroom/global-english/fedex-forecasts-record-volume-this-holiday-season/
21. The Evolution of Data Analytics
Descriptive Analytics
Predictive Analytics
Real-Time Analytics