SlideShare a Scribd company logo
1 of 16
Download to read offline
June 2018
Goodbye, Data Lake: Why Continuous
Analytics Yield Higher ROI
2
From Reactive to Proactive
The Data-Driven Business Challenge
Value
of Data
Time to Action
Real-time Minutes Days
Interactive
Event-Driven
Batch
Batch Layer
Real-time Layer
ETL Tools Change Log Batch Processing Data Lake
View 1
View 2
In-
Memory
NoSQL
Stream Stream Processing Serving
 Big data but slow
 Not up to date
 Complex
3
Too slow
Big and Slow or Small and Fast
Reports
Real-time
Dashboard
 Small amounts of data
 Expensive
 Lacks context
Limited context
OR
Data
Sources
4
How To Deliver Volume, Velocity and Variety ?
100TB NVMe Flash
(direct attached)
Apps, APIs,
and Functions
100 GbE fabric
Real-time Firewall
Real-Time DB
Decouple the APIs from
the DB Engine
Use NVMe Flash as an
extension of OS Memory
Traditional
Layered Approach
Rigid APIs
Database
File System
HCI / Storage Stack
External (NVMeOF / Object)
10 GbE fabric
10-100 GbE fabric
VM Hypervisor
• Slow
• Complex
• Expensive
• In-memory speed
• Simple
• 1/3rd the TCO
Optimized Approach
5
Ingest, Enrich, AI, and Serve on One DB Engine
External Data lakes
6
 Write code + local testing
 Build code and Docker image
 CI/CD pipeline
 Add logging and monitoring
 Harden security
 Provision servers + OS
 Handle data/event feed
 Handle failures/auto-scaling
 Handle rolling upgrades
 Configuration management
 Write code + local testing
 Provide spec, push deploy
Traditional Dev and Ops Model “Serverless” Development Model
Serverless, Eliminating 80% of The Work
1. Automated by the
serverless platform
2. Pay for what you use
80%
7
Open-Source Serverless, A Simpler Lock-free Alternative
Same APIs, Same User Experience, Anywhere
With native integration into each cloud platform
Laptop
Cloud
On-Prem or Edge
8
 Non-blocking, parallel
 Zero copy, buffer reuse
 Up to 400K events/sec/proc
Addressing Serverless Limitations With Nuclio
Function
Workers
Event
Listeners
Serverless for compute and data intensive tasks
100x faster than AWS Lambda !
Performance
Shard 1 Workers
Workers
Shard 2
Shard 3
Shard 4 Workers
Streaming and Batch
DB, MQ, File
Functions
 Auto-rebalance, checkpoints
 Any source: Kafka, NATS, Kinesis, event-
hub, iguazio, pub/sub, RabbitMQ, Cron
 Data bindings
 Shared volumes
 Context cache
Statefulness
nuclio processor
9
Evolve Into a Future Proof Cloud-Native Architecture
YARN
HbaseHDFS
Map
Reduce
Pig,
Hive, ..
DBaaS
S3 (object)
Once upon a time there was a beast …
You spent years feeding it with DevOps
And then it became cloudy !
Data
Orchestration
Middleware
Your Business Logic
Consume
Innovate
10
Delivering Intelligent Decisions in Real-Time
Ingested in real-time
(compressed to 10TB)
500TB of Raw Data
External
Context
Ingest Enrich AI Act
Unified Real-Time DB
Real-time
triggers
Real-time and historical
dashboards
Serve
ML Models
Zero Ops:
Via DBaaS, AIaaS, and Serverless
11
Cyber and Network Ops
 Processing high message throughput
from multiple streams at the rate of >
50K events/sec
 Cross correlating with historical and
external data in real-time
 AI predictions/inferencing conducted
on live data
 Small footprint to fit network locations
A leading telco needs to predict network behavior in real-time:
Traditional Continuous Analytics
Data Source Data Stores
Data Source Data Stores
Data Source Streaming Data Stores
Real-time and Batch
Streaming
ETL
Real-time
Batch
Build and Operationalize Proactive Systems Faster
Spectrum
Streaming
Netcool
Streaming
SMOD
Spectrum
Streaming
Netcool
Streaming
SMOD
REST
API
Visualization
Visualization
Visualization
• Complex, skill gaps, slow to productize
• No single view of ops, real-time, history
• Reactive (no actions)
• Simple, just a few weeks to a working app
• Unified view across ALL data
• AI driven, proactive
Demo: Voice Driven Real-Time Analytics
Voice
Query
SQL APIAI
Update
Locations SMART HOME
DEVICE
GOOGLE
MAP
SERVICE
WEB UI (REACT)
SQL Query
Demo Video
14
https://www.youtube.com/watch?v=vA8Uq7MvxL4
15
 Deliver real-time analytics on fresh, historical and operational data
 Optimize Flash usage to deliver in-memory speed at much lower costs
 Create a unified data layer for stream processing, AI and serving
 Adopt cloud-native and serverless approaches to gain agility
Build continuous, data-driven and proactive apps
Summary
info@iguazio.com | www.iguazio.com
Thank You

More Related Content

What's hot

Scaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on DatabricksScaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on DatabricksDatabricks
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudDatabricks
 
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache TezYahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache TezDataWorks Summit
 
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenDatabricks
 
Big data Lambda Architecture - Batch Layer Hands On
Big data Lambda Architecture - Batch Layer Hands OnBig data Lambda Architecture - Batch Layer Hands On
Big data Lambda Architecture - Batch Layer Hands Onhkbhadraa
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimDatabricks
 
Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16 Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16 PivotalOpenSourceHub
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark Summit
 
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...Spark Summit
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit
 
Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and...
Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and...Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and...
Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and...Databricks
 
SSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine LearningSSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine Learningfelixcss
 
Streams and Things
Streams and ThingsStreams and Things
Streams and Thingsdarach
 
Deploy data analysis pipeline with mesos and docker
Deploy data analysis pipeline with mesos and dockerDeploy data analysis pipeline with mesos and docker
Deploy data analysis pipeline with mesos and dockerVu Nguyen Duy
 
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...PROIDEA
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamFlink Forward
 
Fast Data with Apache Ignite and Apache Spark with Christos Erotocritou
Fast Data with Apache Ignite and Apache Spark with Christos ErotocritouFast Data with Apache Ignite and Apache Spark with Christos Erotocritou
Fast Data with Apache Ignite and Apache Spark with Christos ErotocritouSpark Summit
 
Introduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormIntroduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormBrandon O'Brien
 
Infrastructure-as-code: bridging the gap between Devs and Ops
Infrastructure-as-code: bridging the gap between Devs and OpsInfrastructure-as-code: bridging the gap between Devs and Ops
Infrastructure-as-code: bridging the gap between Devs and OpsMykyta Protsenko
 

What's hot (20)

Scaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on DatabricksScaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on Databricks
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache TezYahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
 
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
 
Big data Lambda Architecture - Batch Layer Hands On
Big data Lambda Architecture - Batch Layer Hands OnBig data Lambda Architecture - Batch Layer Hands On
Big data Lambda Architecture - Batch Layer Hands On
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
 
Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16 Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
 
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod Narasimha
 
Google cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache FlinkGoogle cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache Flink
 
Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and...
Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and...Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and...
Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and...
 
SSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine LearningSSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine Learning
 
Streams and Things
Streams and ThingsStreams and Things
Streams and Things
 
Deploy data analysis pipeline with mesos and docker
Deploy data analysis pipeline with mesos and dockerDeploy data analysis pipeline with mesos and docker
Deploy data analysis pipeline with mesos and docker
 
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
 
Fast Data with Apache Ignite and Apache Spark with Christos Erotocritou
Fast Data with Apache Ignite and Apache Spark with Christos ErotocritouFast Data with Apache Ignite and Apache Spark with Christos Erotocritou
Fast Data with Apache Ignite and Apache Spark with Christos Erotocritou
 
Introduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormIntroduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with Storm
 
Infrastructure-as-code: bridging the gap between Devs and Ops
Infrastructure-as-code: bridging the gap between Devs and OpsInfrastructure-as-code: bridging the gap between Devs and Ops
Infrastructure-as-code: bridging the gap between Devs and Ops
 

Similar to Stac summit june 14th - goodbye datalakes

HPE Hadoop Solutions - From use cases to proposal
HPE Hadoop Solutions - From use cases to proposalHPE Hadoop Solutions - From use cases to proposal
HPE Hadoop Solutions - From use cases to proposalDataWorks Summit
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Alluxio, Inc.
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overviewJames Serra
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSjavier ramirez
 
GPU Acceleration for Financial Services
GPU Acceleration for Financial ServicesGPU Acceleration for Financial Services
GPU Acceleration for Financial ServicesKinetica
 
Dataflow in 104corp - DataConTW2018
Dataflow in 104corp - DataConTW2018Dataflow in 104corp - DataConTW2018
Dataflow in 104corp - DataConTW2018Gavin Lin
 
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...nnakasone
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureLuan Moreno Medeiros Maciel
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...HostedbyConfluent
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...Alluxio, Inc.
 
Data analytics at scale implementing stateful stream processing - publish
Data analytics at scale implementing stateful stream processing - publishData analytics at scale implementing stateful stream processing - publish
Data analytics at scale implementing stateful stream processing - publishCodeValue
 
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...Amazon Web Services
 
Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Jim Dowling
 
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAmazon Web Services
 
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDATAVERSITY
 

Similar to Stac summit june 14th - goodbye datalakes (20)

HPE Hadoop Solutions - From use cases to proposal
HPE Hadoop Solutions - From use cases to proposalHPE Hadoop Solutions - From use cases to proposal
HPE Hadoop Solutions - From use cases to proposal
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS
 
Amazon Kinesis
Amazon KinesisAmazon Kinesis
Amazon Kinesis
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWS
 
GPU Acceleration for Financial Services
GPU Acceleration for Financial ServicesGPU Acceleration for Financial Services
GPU Acceleration for Financial Services
 
Dataflow in 104corp - DataConTW2018
Dataflow in 104corp - DataConTW2018Dataflow in 104corp - DataConTW2018
Dataflow in 104corp - DataConTW2018
 
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Data analytics at scale implementing stateful stream processing - publish
Data analytics at scale implementing stateful stream processing - publishData analytics at scale implementing stateful stream processing - publish
Data analytics at scale implementing stateful stream processing - publish
 
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
 
Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019
 
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
 
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
 
Big Data on AWS
Big Data on AWSBig Data on AWS
Big Data on AWS
 

More from iguazio

Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Productioniguazio
 
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to ProductionWebinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to Productioniguazio
 
The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018
The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018
The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018iguazio
 
Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...
Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...
Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...iguazio
 
Serverless and AI: Orit Nissan-Messing, Iguazio, Serverless NYC 2018
Serverless and AI: Orit Nissan-Messing, Iguazio, Serverless NYC 2018Serverless and AI: Orit Nissan-Messing, Iguazio, Serverless NYC 2018
Serverless and AI: Orit Nissan-Messing, Iguazio, Serverless NYC 2018iguazio
 
Serverless real use cases and best practices: Asavari Tayal, Microsoft, Serve...
Serverless real use cases and best practices: Asavari Tayal, Microsoft, Serve...Serverless real use cases and best practices: Asavari Tayal, Microsoft, Serve...
Serverless real use cases and best practices: Asavari Tayal, Microsoft, Serve...iguazio
 
The Serverless Native Mindset: Ben Kehoe, iRobot, Serverless NYC 2018
The Serverless Native Mindset: Ben Kehoe, iRobot, Serverless NYC 2018The Serverless Native Mindset: Ben Kehoe, iRobot, Serverless NYC 2018
The Serverless Native Mindset: Ben Kehoe, iRobot, Serverless NYC 2018iguazio
 

More from iguazio (7)

Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Production
 
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to ProductionWebinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
 
The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018
The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018
The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018
 
Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...
Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...
Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...
 
Serverless and AI: Orit Nissan-Messing, Iguazio, Serverless NYC 2018
Serverless and AI: Orit Nissan-Messing, Iguazio, Serverless NYC 2018Serverless and AI: Orit Nissan-Messing, Iguazio, Serverless NYC 2018
Serverless and AI: Orit Nissan-Messing, Iguazio, Serverless NYC 2018
 
Serverless real use cases and best practices: Asavari Tayal, Microsoft, Serve...
Serverless real use cases and best practices: Asavari Tayal, Microsoft, Serve...Serverless real use cases and best practices: Asavari Tayal, Microsoft, Serve...
Serverless real use cases and best practices: Asavari Tayal, Microsoft, Serve...
 
The Serverless Native Mindset: Ben Kehoe, iRobot, Serverless NYC 2018
The Serverless Native Mindset: Ben Kehoe, iRobot, Serverless NYC 2018The Serverless Native Mindset: Ben Kehoe, iRobot, Serverless NYC 2018
The Serverless Native Mindset: Ben Kehoe, iRobot, Serverless NYC 2018
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 

Recently uploaded (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Stac summit june 14th - goodbye datalakes

  • 1. June 2018 Goodbye, Data Lake: Why Continuous Analytics Yield Higher ROI
  • 2. 2 From Reactive to Proactive The Data-Driven Business Challenge Value of Data Time to Action Real-time Minutes Days Interactive Event-Driven Batch
  • 3. Batch Layer Real-time Layer ETL Tools Change Log Batch Processing Data Lake View 1 View 2 In- Memory NoSQL Stream Stream Processing Serving  Big data but slow  Not up to date  Complex 3 Too slow Big and Slow or Small and Fast Reports Real-time Dashboard  Small amounts of data  Expensive  Lacks context Limited context OR Data Sources
  • 4. 4 How To Deliver Volume, Velocity and Variety ? 100TB NVMe Flash (direct attached) Apps, APIs, and Functions 100 GbE fabric Real-time Firewall Real-Time DB Decouple the APIs from the DB Engine Use NVMe Flash as an extension of OS Memory Traditional Layered Approach Rigid APIs Database File System HCI / Storage Stack External (NVMeOF / Object) 10 GbE fabric 10-100 GbE fabric VM Hypervisor • Slow • Complex • Expensive • In-memory speed • Simple • 1/3rd the TCO Optimized Approach
  • 5. 5 Ingest, Enrich, AI, and Serve on One DB Engine External Data lakes
  • 6. 6  Write code + local testing  Build code and Docker image  CI/CD pipeline  Add logging and monitoring  Harden security  Provision servers + OS  Handle data/event feed  Handle failures/auto-scaling  Handle rolling upgrades  Configuration management  Write code + local testing  Provide spec, push deploy Traditional Dev and Ops Model “Serverless” Development Model Serverless, Eliminating 80% of The Work 1. Automated by the serverless platform 2. Pay for what you use 80%
  • 7. 7 Open-Source Serverless, A Simpler Lock-free Alternative Same APIs, Same User Experience, Anywhere With native integration into each cloud platform Laptop Cloud On-Prem or Edge
  • 8. 8  Non-blocking, parallel  Zero copy, buffer reuse  Up to 400K events/sec/proc Addressing Serverless Limitations With Nuclio Function Workers Event Listeners Serverless for compute and data intensive tasks 100x faster than AWS Lambda ! Performance Shard 1 Workers Workers Shard 2 Shard 3 Shard 4 Workers Streaming and Batch DB, MQ, File Functions  Auto-rebalance, checkpoints  Any source: Kafka, NATS, Kinesis, event- hub, iguazio, pub/sub, RabbitMQ, Cron  Data bindings  Shared volumes  Context cache Statefulness nuclio processor
  • 9. 9 Evolve Into a Future Proof Cloud-Native Architecture YARN HbaseHDFS Map Reduce Pig, Hive, .. DBaaS S3 (object) Once upon a time there was a beast … You spent years feeding it with DevOps And then it became cloudy ! Data Orchestration Middleware Your Business Logic Consume Innovate
  • 10. 10 Delivering Intelligent Decisions in Real-Time Ingested in real-time (compressed to 10TB) 500TB of Raw Data External Context Ingest Enrich AI Act Unified Real-Time DB Real-time triggers Real-time and historical dashboards Serve ML Models Zero Ops: Via DBaaS, AIaaS, and Serverless
  • 11. 11 Cyber and Network Ops  Processing high message throughput from multiple streams at the rate of > 50K events/sec  Cross correlating with historical and external data in real-time  AI predictions/inferencing conducted on live data  Small footprint to fit network locations A leading telco needs to predict network behavior in real-time:
  • 12. Traditional Continuous Analytics Data Source Data Stores Data Source Data Stores Data Source Streaming Data Stores Real-time and Batch Streaming ETL Real-time Batch Build and Operationalize Proactive Systems Faster Spectrum Streaming Netcool Streaming SMOD Spectrum Streaming Netcool Streaming SMOD REST API Visualization Visualization Visualization • Complex, skill gaps, slow to productize • No single view of ops, real-time, history • Reactive (no actions) • Simple, just a few weeks to a working app • Unified view across ALL data • AI driven, proactive
  • 13. Demo: Voice Driven Real-Time Analytics Voice Query SQL APIAI Update Locations SMART HOME DEVICE GOOGLE MAP SERVICE WEB UI (REACT) SQL Query
  • 15. 15  Deliver real-time analytics on fresh, historical and operational data  Optimize Flash usage to deliver in-memory speed at much lower costs  Create a unified data layer for stream processing, AI and serving  Adopt cloud-native and serverless approaches to gain agility Build continuous, data-driven and proactive apps Summary