SlideShare une entreprise Scribd logo
1  sur  25
Télécharger pour lire hors ligne
A H i g h Pe r f o r m a n c e D a t a P l a t f o r m U s i n g F l a s h
M A R C H 2 0 1 8
S u p e r C o m p u t i n g A s i a
• Edit Master text styles
• Second level
• Third level
• Fourth level
• Fifth level
$48M in Funding20+ Deployments
Industrial IoT
Financial Services
Telecommunications
Cyber Security
2
100+ Employees
NA
APAC
EMEA
iguazio
• Disk and Network were still slow
• Network was in general slower than disk
When Hadoop started
Data at scale
Compute
Data
Storage
Distribute data
• Disk is faster than Network
Compute is where data is
Compute
Data
StorageDistribute compute
• Scalable but inherently batch and slow
• Disk and Network I/O (Map, reduce, shuffle … )
• Runs on JVM
• Not suitable for all type of workloads (real-time, interactive, iterative M/L)
• Cannot scale compute independent of storage and vice versa
Big Data Workloads –
Performance Characteristics
Big Data Workloads –
Performance Characteristics
Big Data Workload Disk I/O Network I/O Compute
Iterative Machine Learning
Jobs
Interactive Analytics at
scale
Lambda & Kappa
Architecture at scale
Typically Disk or Network maxes out during a massively parallel job, before Compute
High
Medium
Low
Big Data Workloads –
Performance Characteristics
Big Data Workload Disk I/O Network I/O Compute
Iterative Machine Learning
Jobs
Interactive Analytics at
scale
Lambda & Kappa
Architecture at scale
Traditional Database
Workload
Compare this to traditional RDBBS – CPU or Disk I/O maxes out
Typically Disk or Network maxes out during a massively parallel job, before Compute
High
Medium
Low
Big Data Workloads –
Performance Characteristics
Big Data Workload Disk I/O Network I/O Compute
Iterative Machine Learning
Jobs
Interactive Analytics at
scale
Lambda & Kappa
Architecture at scale
Traditional Database
Workload
in-memory and high-speed networking ?
High
Medium
Low
• L1 cache reference 0.5 ns
• Branch mispredict 5 ns
• L2 cache reference 7 ns
• Mutex lock/unlock 100 ns
• Main memory reference 100 ns
• Send 2K bytes over 1 Gbps network 20,000 ns
• SSD seek 80,000 ns
• Read 1 MB sequentially from memory 250,000 ns
• Round trip within same datacenter 500,000 ns
• Disk seek 10,000,000 ns
• Read 1 MB sequentially from network 10,000,000 ns
• Read 1 MB sequentially from disk 30,000,000 ns
• Send packet CA->Netherlands->CA 150,000,000 ns
Designs, Lessons and Advice from Building Large Distributed Systems by Dr Jeff Dean of Google Source http://www.slideshare.net/ikewu83/dean-
keynoteladis2009-4885081
Optimizing Big Data
Performance
RAM is 10-20x more expensive than Flash in $Cost / GB
RAM is expensive
SSDs prices are falling
• L1 cache reference 0.5 ns
• Branch mispredict 5 ns
• L2 cache reference 7 ns
• Mutex lock/unlock 100 ns
• Main memory reference 100 ns
• Send 2K bytes over 1 Gbps network 20,000 ns
• SSD seek 80,000 ns
• Read 1 MB sequentially from memory 250,000 ns
• Round trip within same datacenter 500,000 ns
• Disk seek 10,000,000 ns
• Read 1 MB sequentially from network 10,000,000 ns
• Read 1 MB sequentially from disk 30,000,000 ns
• Send packet CA->Netherlands->CA 150,000,000 ns
Designs, Lessons and Advice from Building Large Distributed Systems by Dr Jeff Dean of Google Source http://www.slideshare.net/ikewu83/dean-
keynoteladis2009-4885081
Optimizing Big Data
Performance
0
1
2
3
4
5
6
7
8
9
DRAM NVMe SSD SATA SSD SAS HDD
8.5
1.2
0.85
0.04
Cost in $ per GB (Raw)
Cost in $ per GB (Raw)
Price of Memory
Technology 2016/17
• NVMe is a software based standard that was specifically optimized for SSDs connected through the
PCIe interface.
What is NVMe?
• Shorter hardware data access path – directly connected via PCIe, Faster compared to SATA
• NVMe Completely redesigned software - bypasses conventional block layer request queue –
• Asynchronous Submission Queue for requests
• Asynchronous Completion Queue
Why is NVMe faster
(than SATA) ?
Source
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=2CA58A08BFB2821F59A7996DAC8742F1?doi=10.1.1.697.1493&rep=rep1&type=pdf
17
Micro-Services &
Serverless Functions
Unified DB for
Real-Time & Analytics
EXTERNAL
EVENTS AND
SOURCES
IMMEDIATE
INSIGHTS &
ACTIONS
EXTERNAL
DATA
Enrich
Reinforce
Learning
iguazio
• Combined fresh data
• and historical data
• Immediate Insights
• Rapid time to production
• Supporting common APIs
• Deployment Everywhere
Unified DB for Real-Time & Analytics
FUNCTIONS &
MICRO-SERVICES
SQL & TS
QUERY APIS
ANALYTICS &
AI TOOLS AWS APIS
iguazio Data platform
for real time and analytics
Single platform
All in One
S3
Kinesis
DynamoDB
Secure Intelligent Multi-model
UNIFIED & REAL-TIME DATABASE ENGINE
EXTERNAL DATA LAKES & CLOUDS
Architecture
• In-mem DB performance with
Flash economies and density
• Access data concurrently through
multiple standard APIs
• Fully integrated PaaS
19
LOCAL NVMe
AND OS BYPASS
Tr a d i t i o n a l L a y e r e d A p p r o a c h i g u a z i o
How to take full
advantage of NVMe?
And Advanced
Networking
iguazio © 2016
21
The tests were performed using the Yahoo! Cloud Serving Benchmark (YCSB), an open
source framework for evaluating and comparing the performance of multiple types of
NoSQL database management systems, the de facto industry standard for this purpose.
Performance Results
Independently scaling
compute and storage
iguazio Scale Outx2
Storage
Data nodes
(Processing)
CPU/Drive ratio
Performance
Capacity
Smart Rebuild
Scale Out System
CPU/Drive ratio
Performance
Capacity
Smart Rebuild
Processing + Storage
Scale Up System
Processing
CPU/Drive ratio
Performance
Capacity
Smart Rebuild
Storage
iguazio © 2016
24
Why iguazio?
Very high ingestion rate
Low latency for faster dashboards – in memory speed at the cost of SSD
Real time analytics on both fresh and historical data
Scale out architecture – enabling real time analytics on large data sets
Platform as a service providing cloud experience
iguazio © 2016
25
Business benefits - what’s in it for me?
Increase operational efficiency
Reduce TCO
Faster time to market for new services
Reduce cost of traditional data center operations while constraining growth of expensive cloud services such as
Amazon DynamoDB, Kinesis , S3 , redshift and EMR
Bring new services on-line in less time with greater reliability & security
Optimize business processes
Increase data engineering efficiency
Simplify the overall data pipe-line helping engineering to focus on building applications
Thank You
@Santanu_Dey

Contenu connexe

Tendances

IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...In-Memory Computing Summit
 
Hardware Provisioning for MongoDB
Hardware Provisioning for MongoDBHardware Provisioning for MongoDB
Hardware Provisioning for MongoDBMongoDB
 
Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry...
Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry...Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry...
Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry...Redis Labs
 
GFS - Google File System
GFS - Google File SystemGFS - Google File System
GFS - Google File Systemtutchiio
 
WebLogic Stability; Detect and Analyse Stuck Threads
WebLogic Stability; Detect and Analyse Stuck ThreadsWebLogic Stability; Detect and Analyse Stuck Threads
WebLogic Stability; Detect and Analyse Stuck ThreadsMaarten Smeets
 
Capacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB ClusterCapacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB ClusterMongoDB
 
Redis for horizontally scaled data processing at jFrog bintray
Redis for horizontally scaled data processing at jFrog bintrayRedis for horizontally scaled data processing at jFrog bintray
Redis for horizontally scaled data processing at jFrog bintrayRedis Labs
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performancePostgreSQL-Consulting
 
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GCHadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GCErik Krogen
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQLlefredbe
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionSplunk
 
Володимир Цап "Constraint driven infrastructure - scale or tune?"
Володимир Цап "Constraint driven infrastructure - scale or tune?"Володимир Цап "Constraint driven infrastructure - scale or tune?"
Володимир Цап "Constraint driven infrastructure - scale or tune?"Fwdays
 
HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at Huawei
HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at HuaweiHBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at Huawei
HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at HuaweiMichael Stack
 
Performance Analysis and Troubleshooting Methodologies for Databases
Performance Analysis and Troubleshooting Methodologies for DatabasesPerformance Analysis and Troubleshooting Methodologies for Databases
Performance Analysis and Troubleshooting Methodologies for DatabasesScyllaDB
 
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair UpdatesScylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair UpdatesScyllaDB
 
Zero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationZero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationScott Miao
 

Tendances (20)

Spark Tips & Tricks
Spark Tips & TricksSpark Tips & Tricks
Spark Tips & Tricks
 
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
 
Hardware Provisioning for MongoDB
Hardware Provisioning for MongoDBHardware Provisioning for MongoDB
Hardware Provisioning for MongoDB
 
Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry...
Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry...Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry...
Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry...
 
GFS - Google File System
GFS - Google File SystemGFS - Google File System
GFS - Google File System
 
WebLogic Stability; Detect and Analyse Stuck Threads
WebLogic Stability; Detect and Analyse Stuck ThreadsWebLogic Stability; Detect and Analyse Stuck Threads
WebLogic Stability; Detect and Analyse Stuck Threads
 
Capacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB ClusterCapacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB Cluster
 
Redis for horizontally scaled data processing at jFrog bintray
Redis for horizontally scaled data processing at jFrog bintrayRedis for horizontally scaled data processing at jFrog bintray
Redis for horizontally scaled data processing at jFrog bintray
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GCHadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQL
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 
25 snowflake
25 snowflake25 snowflake
25 snowflake
 
Володимир Цап "Constraint driven infrastructure - scale or tune?"
Володимир Цап "Constraint driven infrastructure - scale or tune?"Володимир Цап "Constraint driven infrastructure - scale or tune?"
Володимир Цап "Constraint driven infrastructure - scale or tune?"
 
HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at Huawei
HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at HuaweiHBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at Huawei
HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at Huawei
 
Accordion HBaseCon 2017
Accordion HBaseCon 2017Accordion HBaseCon 2017
Accordion HBaseCon 2017
 
Performance Analysis and Troubleshooting Methodologies for Databases
Performance Analysis and Troubleshooting Methodologies for DatabasesPerformance Analysis and Troubleshooting Methodologies for Databases
Performance Analysis and Troubleshooting Methodologies for Databases
 
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair UpdatesScylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
 
Zero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationZero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter Migration
 
GFS
GFSGFS
GFS
 

Similaire à Building a High Performance Analytics Platform

Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed_Hat_Storage
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarKognitio
 
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and InfrastrctureRevolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and Infrastrcturesabnees
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutionssolarisyougood
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...DataStax
 
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...Splunk
 
Best storage engine for MySQL
Best storage engine for MySQLBest storage engine for MySQL
Best storage engine for MySQLtomflemingh2
 
Oracle real application_cluster
Oracle real application_clusterOracle real application_cluster
Oracle real application_clusterPrabhat gangwar
 
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
SQream DB - Bigger Data On GPUs: Approaches, Challenges, SuccessesSQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
SQream DB - Bigger Data On GPUs: Approaches, Challenges, SuccessesArnon Shimoni
 
How to Choose a Host for a Big Data Project
How to Choose a Host for a Big Data ProjectHow to Choose a Host for a Big Data Project
How to Choose a Host for a Big Data ProjectPeak Hosting
 
Healthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkHealthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkDatabricks
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8MongoDB
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native PlatformSunil Govindan
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native PlatformSunil Govindan
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...DATAVERSITY
 
Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810Boni Bruno
 
The Last Frontier- Virtualization, Hybrid Management and the Cloud
The Last Frontier-  Virtualization, Hybrid Management and the CloudThe Last Frontier-  Virtualization, Hybrid Management and the Cloud
The Last Frontier- Virtualization, Hybrid Management and the CloudKellyn Pot'Vin-Gorman
 

Similaire à Building a High Performance Analytics Platform (20)

Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and InfrastrctureRevolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutions
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
 
Best storage engine for MySQL
Best storage engine for MySQLBest storage engine for MySQL
Best storage engine for MySQL
 
Galaxy Big Data with MariaDB
Galaxy Big Data with MariaDBGalaxy Big Data with MariaDB
Galaxy Big Data with MariaDB
 
Oracle real application_cluster
Oracle real application_clusterOracle real application_cluster
Oracle real application_cluster
 
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
SQream DB - Bigger Data On GPUs: Approaches, Challenges, SuccessesSQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
 
How to Choose a Host for a Big Data Project
How to Choose a Host for a Big Data ProjectHow to Choose a Host for a Big Data Project
How to Choose a Host for a Big Data Project
 
Healthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkHealthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache Spark
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
 
Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810
 
The Last Frontier- Virtualization, Hybrid Management and the Cloud
The Last Frontier-  Virtualization, Hybrid Management and the CloudThe Last Frontier-  Virtualization, Hybrid Management and the Cloud
The Last Frontier- Virtualization, Hybrid Management and the Cloud
 

Dernier

%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Hararemasabamasaba
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...masabamasaba
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benonimasabamasaba
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxAnnaArtyushina1
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 

Dernier (20)

%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 

Building a High Performance Analytics Platform

  • 1. A H i g h Pe r f o r m a n c e D a t a P l a t f o r m U s i n g F l a s h M A R C H 2 0 1 8 S u p e r C o m p u t i n g A s i a
  • 2. • Edit Master text styles • Second level • Third level • Fourth level • Fifth level $48M in Funding20+ Deployments Industrial IoT Financial Services Telecommunications Cyber Security 2 100+ Employees NA APAC EMEA iguazio
  • 3. • Disk and Network were still slow • Network was in general slower than disk When Hadoop started
  • 5. • Disk is faster than Network Compute is where data is Compute Data StorageDistribute compute
  • 6. • Scalable but inherently batch and slow • Disk and Network I/O (Map, reduce, shuffle … ) • Runs on JVM • Not suitable for all type of workloads (real-time, interactive, iterative M/L) • Cannot scale compute independent of storage and vice versa Big Data Workloads – Performance Characteristics
  • 7. Big Data Workloads – Performance Characteristics Big Data Workload Disk I/O Network I/O Compute Iterative Machine Learning Jobs Interactive Analytics at scale Lambda & Kappa Architecture at scale Typically Disk or Network maxes out during a massively parallel job, before Compute High Medium Low
  • 8. Big Data Workloads – Performance Characteristics Big Data Workload Disk I/O Network I/O Compute Iterative Machine Learning Jobs Interactive Analytics at scale Lambda & Kappa Architecture at scale Traditional Database Workload Compare this to traditional RDBBS – CPU or Disk I/O maxes out Typically Disk or Network maxes out during a massively parallel job, before Compute High Medium Low
  • 9. Big Data Workloads – Performance Characteristics Big Data Workload Disk I/O Network I/O Compute Iterative Machine Learning Jobs Interactive Analytics at scale Lambda & Kappa Architecture at scale Traditional Database Workload in-memory and high-speed networking ? High Medium Low
  • 10. • L1 cache reference 0.5 ns • Branch mispredict 5 ns • L2 cache reference 7 ns • Mutex lock/unlock 100 ns • Main memory reference 100 ns • Send 2K bytes over 1 Gbps network 20,000 ns • SSD seek 80,000 ns • Read 1 MB sequentially from memory 250,000 ns • Round trip within same datacenter 500,000 ns • Disk seek 10,000,000 ns • Read 1 MB sequentially from network 10,000,000 ns • Read 1 MB sequentially from disk 30,000,000 ns • Send packet CA->Netherlands->CA 150,000,000 ns Designs, Lessons and Advice from Building Large Distributed Systems by Dr Jeff Dean of Google Source http://www.slideshare.net/ikewu83/dean- keynoteladis2009-4885081 Optimizing Big Data Performance
  • 11. RAM is 10-20x more expensive than Flash in $Cost / GB RAM is expensive
  • 12. SSDs prices are falling
  • 13. • L1 cache reference 0.5 ns • Branch mispredict 5 ns • L2 cache reference 7 ns • Mutex lock/unlock 100 ns • Main memory reference 100 ns • Send 2K bytes over 1 Gbps network 20,000 ns • SSD seek 80,000 ns • Read 1 MB sequentially from memory 250,000 ns • Round trip within same datacenter 500,000 ns • Disk seek 10,000,000 ns • Read 1 MB sequentially from network 10,000,000 ns • Read 1 MB sequentially from disk 30,000,000 ns • Send packet CA->Netherlands->CA 150,000,000 ns Designs, Lessons and Advice from Building Large Distributed Systems by Dr Jeff Dean of Google Source http://www.slideshare.net/ikewu83/dean- keynoteladis2009-4885081 Optimizing Big Data Performance
  • 14. 0 1 2 3 4 5 6 7 8 9 DRAM NVMe SSD SATA SSD SAS HDD 8.5 1.2 0.85 0.04 Cost in $ per GB (Raw) Cost in $ per GB (Raw) Price of Memory Technology 2016/17
  • 15. • NVMe is a software based standard that was specifically optimized for SSDs connected through the PCIe interface. What is NVMe?
  • 16. • Shorter hardware data access path – directly connected via PCIe, Faster compared to SATA • NVMe Completely redesigned software - bypasses conventional block layer request queue – • Asynchronous Submission Queue for requests • Asynchronous Completion Queue Why is NVMe faster (than SATA) ? Source http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=2CA58A08BFB2821F59A7996DAC8742F1?doi=10.1.1.697.1493&rep=rep1&type=pdf
  • 17. 17 Micro-Services & Serverless Functions Unified DB for Real-Time & Analytics EXTERNAL EVENTS AND SOURCES IMMEDIATE INSIGHTS & ACTIONS EXTERNAL DATA Enrich Reinforce Learning iguazio • Combined fresh data • and historical data • Immediate Insights • Rapid time to production • Supporting common APIs • Deployment Everywhere Unified DB for Real-Time & Analytics
  • 18. FUNCTIONS & MICRO-SERVICES SQL & TS QUERY APIS ANALYTICS & AI TOOLS AWS APIS iguazio Data platform for real time and analytics Single platform All in One S3 Kinesis DynamoDB Secure Intelligent Multi-model UNIFIED & REAL-TIME DATABASE ENGINE EXTERNAL DATA LAKES & CLOUDS Architecture • In-mem DB performance with Flash economies and density • Access data concurrently through multiple standard APIs • Fully integrated PaaS
  • 19. 19 LOCAL NVMe AND OS BYPASS Tr a d i t i o n a l L a y e r e d A p p r o a c h i g u a z i o How to take full advantage of NVMe?
  • 21. iguazio © 2016 21 The tests were performed using the Yahoo! Cloud Serving Benchmark (YCSB), an open source framework for evaluating and comparing the performance of multiple types of NoSQL database management systems, the de facto industry standard for this purpose. Performance Results
  • 22. Independently scaling compute and storage iguazio Scale Outx2 Storage Data nodes (Processing) CPU/Drive ratio Performance Capacity Smart Rebuild Scale Out System CPU/Drive ratio Performance Capacity Smart Rebuild Processing + Storage Scale Up System Processing CPU/Drive ratio Performance Capacity Smart Rebuild Storage
  • 23. iguazio © 2016 24 Why iguazio? Very high ingestion rate Low latency for faster dashboards – in memory speed at the cost of SSD Real time analytics on both fresh and historical data Scale out architecture – enabling real time analytics on large data sets Platform as a service providing cloud experience
  • 24. iguazio © 2016 25 Business benefits - what’s in it for me? Increase operational efficiency Reduce TCO Faster time to market for new services Reduce cost of traditional data center operations while constraining growth of expensive cloud services such as Amazon DynamoDB, Kinesis , S3 , redshift and EMR Bring new services on-line in less time with greater reliability & security Optimize business processes Increase data engineering efficiency Simplify the overall data pipe-line helping engineering to focus on building applications