SlideShare a Scribd company logo
1 of 19
Download to read offline
SPARK + FLASHBLADE
DELIVERING INSIGHTS FROM 5PB OF PRODUCT LOGS AT PURE STORAGE
Brian Gold
Pure Storage
© 2017 PURE STORAGE INC.
2
ALL-FLASH STORAGE FOR
DATA-INTENSIVE COMPUTING
© 2017 PURE STORAGE INC.
3
DATA POTENTIAL
IS GROWING EXPONENTIALLY
BIG
DATA
FAST
DATA
 X
© 2017 PURE STORAGE INC.
4
DATA POTENTIAL
IS GROWING EXPONENTIALLY
BIG
DATA
FAST
DATA
AGILE
DATA
X
 X
 =
 DATA
ADVANTAGE
© 2017 PURE STORAGE INC.
5
AGILE DATA MUST BE EFFORTLESS
TOP 1%: PURE STORAGE vs ALL OTHER BUSINESSES IN CUSTOMER SATISFACTION
The Pure Storage team brings issues to our
attention before we are even aware of them.” 

– 
Matt Harris, Head of IT



 
MERCEDES AMG PETRONAS Formula One™ Team

> 7700 incidents automatically ID’d & resolved
> 170 sev-1 issues avoided to date
(As measured by Satmetrix 2016 NPS Global B2B Benchmarks.)
© 2017 PURE STORAGE INC.
6
PURE’S ANALYTICS PIPELINE
CURRENT CHALLENGES
①  >10 ANALYSIS TOOLS, PLUS AD-HOC SCRIPTS
②  >5PB ACROSS 100s OF DATA SILOS
Engineering tickets
Internal QA tests
Product logs
CRM/Sales feed
Manufacturing
© 2017 PURE STORAGE INC.
7
LOG ANALYSIS TODAY
Raw logs
(grep, awk, etc.)
AD-HOC OR ENSEMBLE ANALYSIS
IS TOO SLOW (OR IMPOSSIBLE)
Extract & Aggregate
© 2017 PURE STORAGE INC.
8
IMPROVEMENTS WITH SPARK
100X+ PERFORMANCE IMPROVEMENTS
WITH LESS BESPOKE CODE
Raw logs
(archive only)
Extract
Transform
Aggregate
Load
© 2017 PURE STORAGE INC.
9
TOOL CONSOLIDATION
 SPARK ENABLES WORKFLOW CONSOLIDATION
•  Fewer compute resources
•  More code reuse

 CHALLENGES FOR SPARK & ECOSYSTEM
•  Increase performance visibility for tuning partitioning and ingest flows
•  Improve indexing options integrated in Spark for real-time queries
•  Eliminating OOMs and spurious exceptions
© 2017 PURE STORAGE INC.
10
PURE’S ANALYTICS PIPELINE
CURRENT CHALLENGES
①  >10 ANALYSIS TOOLS, PLUS AD-HOC SCRIPTS
②  >5PB ACROSS 100s OF DATA SILOS
Engineering tickets
Internal QA tests
Product logs
CRM/Sales feed
Manufacturing
© 2017 PURE STORAGE INC.
11
INFRASTRUCTURE SILOS
HDFS
POSTGRES
ELASTIC
KAFKA
© 2017 PURE STORAGE INC.
12
INFRASTRUCTURE SILOS
SPREAD OVER 100+ DATABASES
HDFS
POSTGRES
ELASTIC
KAFKA
HDFS
POSTGRES
MYSQL
HDFS
ELASTIC
POSTGRES
© 2017 PURE STORAGE INC.
13
INFRASTRUCTURE SILOS
SPREAD OVER 100+ DATABASES
HDFS
POSTGRES
ELASTIC
KAFKA
HDFS
POSTGRES
MYSQL
HDFS
ELASTIC
POSTGRES
RACKS OF EQUIPMENT, OFTEN
AT LOW AVERAGE UTILIZATION
© 2017 PURE STORAGE INC.
14
DECOUPLING COMPUTE & STORAGE
?
© 2017 PURE STORAGE INC.
15
INTRODUCING FLASHBLADE
ALL-FLASH FILE AND OBJECT STORAGE
BIG, FAST, SIMPLE
All-flash HW & SW
Up to 1.6PBs, 1M NFS OPS, and
16 GB/s in each 4U chassis
Designed to scale to 100s of blades

FILE + OBJECT
Access via multiple protocols
Ideal for consolidation of applications
and access patterns
BLADE
 ELASTICITY
 ELASTIC FABRIC
© 2017 PURE STORAGE INC.
16
INFRASTRUCTURE CONSOLIDATION
CLOUD-SCALE EFFICIENCY AND AGILITY
rsyslog
1000-CORE SPARK CLUSTER
ON 250TB FLASHBLADE
© 2017 PURE STORAGE INC.
17
INFRASTRUCTURE CONSOLIDATION
CLOUD-SCALE EFFICIENCY AND AGILITY
rsyslog
1000-CORE SPARK CLUSTER
ON 250TB FLASHBLADE
DENSITY: SHRINK 5 RACKS TO 1 (HDFS VS FLASHBLADE)
FLEXIBILITY: SCALE COMPUTE AND STORAGE INDEPENDENTLY
EFFICIENCY: CONSOLIDATE APPLICATIONS & CLUSTERS, SHARE DATA
© 2017 PURE STORAGE INC.
18
THIS IS JUST THE BEGINNING…
 THE FUTURE IS BRIGHT FOR SPARK + FLASHBLADE
•  Spark is a fast, flexible, swiss-army knife for consolidating analytics workflows
•  Running Spark on FlashBlade enables infrastructure consolidation and drives
flexibility and efficiency for data science and supporting teams
 COME VISIT US AT BOOTH 208 IN THE EXPO HALL!
 BIG
 FAST 
 SIMPLE
Spark + Flashblade: Spark Summit East talk by Brian Gold

More Related Content

What's hot

Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...Alluxio, Inc.
 
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...Spark Summit
 
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightDataWorks Summit
 
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoTApache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoTDenis Magda
 
Accelerate Your Big Data Analytics Efforts with SAS and Hadoop
Accelerate Your Big Data Analytics Efforts with SAS and HadoopAccelerate Your Big Data Analytics Efforts with SAS and Hadoop
Accelerate Your Big Data Analytics Efforts with SAS and HadoopDataWorks Summit
 
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data StoresPresto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data StoresAlluxio, Inc.
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...Spark Summit
 
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersHadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersDataWorks Summit/Hadoop Summit
 
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...DataStax
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAlluxio, Inc.
 
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3GoHigh Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3GoAlluxio, Inc.
 
RedisConf17 - Redis Enterprise on IBM Power Systems
RedisConf17 - Redis Enterprise on IBM Power SystemsRedisConf17 - Redis Enterprise on IBM Power Systems
RedisConf17 - Redis Enterprise on IBM Power SystemsRedis Labs
 
From limited Hadoop compute capacity to increased data scientist efficiency
From limited Hadoop compute capacity to increased data scientist efficiencyFrom limited Hadoop compute capacity to increased data scientist efficiency
From limited Hadoop compute capacity to increased data scientist efficiencyAlluxio, Inc.
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDataWorks Summit
 
Powering Data Science and AI with Apache Spark, Alluxio, and IBM
Powering Data Science and AI with Apache Spark, Alluxio, and IBMPowering Data Science and AI with Apache Spark, Alluxio, and IBM
Powering Data Science and AI with Apache Spark, Alluxio, and IBMAlluxio, Inc.
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...Dremio Corporation
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAlluxio, Inc.
 

What's hot (20)

Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
 
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...
 
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsight
 
Hadoop Everywhere
Hadoop EverywhereHadoop Everywhere
Hadoop Everywhere
 
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoTApache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
 
Accelerate Your Big Data Analytics Efforts with SAS and Hadoop
Accelerate Your Big Data Analytics Efforts with SAS and HadoopAccelerate Your Big Data Analytics Efforts with SAS and Hadoop
Accelerate Your Big Data Analytics Efforts with SAS and Hadoop
 
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data StoresPresto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
 
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersHadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
 
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
 
Tame that Beast
Tame that BeastTame that Beast
Tame that Beast
 
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3GoHigh Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
 
RedisConf17 - Redis Enterprise on IBM Power Systems
RedisConf17 - Redis Enterprise on IBM Power SystemsRedisConf17 - Redis Enterprise on IBM Power Systems
RedisConf17 - Redis Enterprise on IBM Power Systems
 
From limited Hadoop compute capacity to increased data scientist efficiency
From limited Hadoop compute capacity to increased data scientist efficiencyFrom limited Hadoop compute capacity to increased data scientist efficiency
From limited Hadoop compute capacity to increased data scientist efficiency
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
 
Powering Data Science and AI with Apache Spark, Alluxio, and IBM
Powering Data Science and AI with Apache Spark, Alluxio, and IBMPowering Data Science and AI with Apache Spark, Alluxio, and IBM
Powering Data Science and AI with Apache Spark, Alluxio, and IBM
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
 

Viewers also liked

BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...Spark Summit
 
Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang ...
Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang ...Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang ...
Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang ...Spark Summit
 
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...Spark Summit
 
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...Spark Summit
 
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Spark Summit
 
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...Spark Summit
 
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark Summit
 
Tuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache SparkTuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache SparkDatabricks
 
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Spark Summit
 
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Spark Summit
 
Data-Driven Water Security with Bluemix Apache Spark Service: Spark Summit Ea...
Data-Driven Water Security with Bluemix Apache Spark Service: Spark Summit Ea...Data-Driven Water Security with Bluemix Apache Spark Service: Spark Summit Ea...
Data-Driven Water Security with Bluemix Apache Spark Service: Spark Summit Ea...Spark Summit
 
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul MasterCornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul MasterSpark Summit
 
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...Spark Summit
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungScalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungSpark Summit
 
Trends for Big Data and Apache Spark in 2017 by Matei Zaharia
Trends for Big Data and Apache Spark in 2017 by Matei ZahariaTrends for Big Data and Apache Spark in 2017 by Matei Zaharia
Trends for Big Data and Apache Spark in 2017 by Matei ZahariaSpark Summit
 
Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...
Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...
Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...Spark Summit
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaSpark Summit
 
Global Empire-Building for Fun and Profit: Spark Summit East talk by Michelle...
Global Empire-Building for Fun and Profit: Spark Summit East talk by Michelle...Global Empire-Building for Fun and Profit: Spark Summit East talk by Michelle...
Global Empire-Building for Fun and Profit: Spark Summit East talk by Michelle...Spark Summit
 
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Spark Summit
 

Viewers also liked (20)

BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
 
Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang ...
Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang ...Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang ...
Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang ...
 
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
 
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
 
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...
 
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...
 
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
 
Tuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache SparkTuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache Spark
 
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
 
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
 
Data-Driven Water Security with Bluemix Apache Spark Service: Spark Summit Ea...
Data-Driven Water Security with Bluemix Apache Spark Service: Spark Summit Ea...Data-Driven Water Security with Bluemix Apache Spark Service: Spark Summit Ea...
Data-Driven Water Security with Bluemix Apache Spark Service: Spark Summit Ea...
 
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul MasterCornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
 
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungScalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
 
Trends for Big Data and Apache Spark in 2017 by Matei Zaharia
Trends for Big Data and Apache Spark in 2017 by Matei ZahariaTrends for Big Data and Apache Spark in 2017 by Matei Zaharia
Trends for Big Data and Apache Spark in 2017 by Matei Zaharia
 
Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...
Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...
Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
 
Global Empire-Building for Fun and Profit: Spark Summit East talk by Michelle...
Global Empire-Building for Fun and Profit: Spark Summit East talk by Michelle...Global Empire-Building for Fun and Profit: Spark Summit East talk by Michelle...
Global Empire-Building for Fun and Profit: Spark Summit East talk by Michelle...
 
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
 

Similar to Spark + Flashblade: Spark Summit East talk by Brian Gold

GTC Taiwan 2017 如何在充滿未知的巨量數據時代中建構一個數據中心
GTC Taiwan 2017 如何在充滿未知的巨量數據時代中建構一個數據中心GTC Taiwan 2017 如何在充滿未知的巨量數據時代中建構一個數據中心
GTC Taiwan 2017 如何在充滿未知的巨量數據時代中建構一個數據中心NVIDIA Taiwan
 
Data at the corner of SAP and AWS
Data at the corner of SAP and AWSData at the corner of SAP and AWS
Data at the corner of SAP and AWSOcean9, Inc.
 
Cloud Expo NYC 2017: Big Data in IoT
Cloud Expo NYC 2017: Big Data in IoTCloud Expo NYC 2017: Big Data in IoT
Cloud Expo NYC 2017: Big Data in IoTOcean9, Inc.
 
Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseAmazon Web Services
 
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudBring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudDataWorks Summit/Hadoop Summit
 
Demystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceDemystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceSnowflake Computing
 
Fanatics Ingests Streaming Data to a Data Lake on AWS
Fanatics Ingests Streaming Data to a Data Lake on AWSFanatics Ingests Streaming Data to a Data Lake on AWS
Fanatics Ingests Streaming Data to a Data Lake on AWSAmazon Web Services
 
NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data Federation
NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data FederationNRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data Federation
NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data FederationNRB
 
NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation
NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation
NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation NRB
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and QuboleAmazon Web Services
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and QuboleAmazon Web Services
 
Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...
Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...
Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...Amazon Web Services
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataMatt Stubbs
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataMatt Stubbs
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWKent Graziano
 
The new dominant companies are running on data
The new dominant companies are running on data The new dominant companies are running on data
The new dominant companies are running on data SnapLogic
 
Building a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveBuilding a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveGeekNightHyderabad
 
Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020
Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020
Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020Aerospike
 
Bridging Your Business Across the Enterprise and Cloud with MongoDB and NetApp
Bridging Your Business Across the Enterprise and Cloud with MongoDB and NetAppBridging Your Business Across the Enterprise and Cloud with MongoDB and NetApp
Bridging Your Business Across the Enterprise and Cloud with MongoDB and NetAppMongoDB
 
Accelerate Value from Big Data, AI, and IoT Initiatives with One-Tenth of the...
Accelerate Value from Big Data, AI, and IoT Initiatives with One-Tenth of the...Accelerate Value from Big Data, AI, and IoT Initiatives with One-Tenth of the...
Accelerate Value from Big Data, AI, and IoT Initiatives with One-Tenth of the...Amazon Web Services
 

Similar to Spark + Flashblade: Spark Summit East talk by Brian Gold (20)

GTC Taiwan 2017 如何在充滿未知的巨量數據時代中建構一個數據中心
GTC Taiwan 2017 如何在充滿未知的巨量數據時代中建構一個數據中心GTC Taiwan 2017 如何在充滿未知的巨量數據時代中建構一個數據中心
GTC Taiwan 2017 如何在充滿未知的巨量數據時代中建構一個數據中心
 
Data at the corner of SAP and AWS
Data at the corner of SAP and AWSData at the corner of SAP and AWS
Data at the corner of SAP and AWS
 
Cloud Expo NYC 2017: Big Data in IoT
Cloud Expo NYC 2017: Big Data in IoTCloud Expo NYC 2017: Big Data in IoT
Cloud Expo NYC 2017: Big Data in IoT
 
Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise
 
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudBring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
 
Demystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceDemystifying Data Warehouse as a Service
Demystifying Data Warehouse as a Service
 
Fanatics Ingests Streaming Data to a Data Lake on AWS
Fanatics Ingests Streaming Data to a Data Lake on AWSFanatics Ingests Streaming Data to a Data Lake on AWS
Fanatics Ingests Streaming Data to a Data Lake on AWS
 
NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data Federation
NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data FederationNRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data Federation
NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data Federation
 
NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation
NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation
NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 
Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...
Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...
Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFW
 
The new dominant companies are running on data
The new dominant companies are running on data The new dominant companies are running on data
The new dominant companies are running on data
 
Building a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveBuilding a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's Perspective
 
Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020
Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020
Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020
 
Bridging Your Business Across the Enterprise and Cloud with MongoDB and NetApp
Bridging Your Business Across the Enterprise and Cloud with MongoDB and NetAppBridging Your Business Across the Enterprise and Cloud with MongoDB and NetApp
Bridging Your Business Across the Enterprise and Cloud with MongoDB and NetApp
 
Accelerate Value from Big Data, AI, and IoT Initiatives with One-Tenth of the...
Accelerate Value from Big Data, AI, and IoT Initiatives with One-Tenth of the...Accelerate Value from Big Data, AI, and IoT Initiatives with One-Tenth of the...
Accelerate Value from Big Data, AI, and IoT Initiatives with One-Tenth of the...
 

More from Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Recently uploaded

How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 

Recently uploaded (20)

How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 

Spark + Flashblade: Spark Summit East talk by Brian Gold

  • 1. SPARK + FLASHBLADE DELIVERING INSIGHTS FROM 5PB OF PRODUCT LOGS AT PURE STORAGE Brian Gold Pure Storage
  • 2. © 2017 PURE STORAGE INC. 2 ALL-FLASH STORAGE FOR DATA-INTENSIVE COMPUTING
  • 3. © 2017 PURE STORAGE INC. 3 DATA POTENTIAL IS GROWING EXPONENTIALLY BIG DATA FAST DATA X
  • 4. © 2017 PURE STORAGE INC. 4 DATA POTENTIAL IS GROWING EXPONENTIALLY BIG DATA FAST DATA AGILE DATA X X = DATA ADVANTAGE
  • 5. © 2017 PURE STORAGE INC. 5 AGILE DATA MUST BE EFFORTLESS TOP 1%: PURE STORAGE vs ALL OTHER BUSINESSES IN CUSTOMER SATISFACTION The Pure Storage team brings issues to our attention before we are even aware of them.” – Matt Harris, Head of IT MERCEDES AMG PETRONAS Formula One™ Team > 7700 incidents automatically ID’d & resolved > 170 sev-1 issues avoided to date (As measured by Satmetrix 2016 NPS Global B2B Benchmarks.)
  • 6. © 2017 PURE STORAGE INC. 6 PURE’S ANALYTICS PIPELINE CURRENT CHALLENGES ①  >10 ANALYSIS TOOLS, PLUS AD-HOC SCRIPTS ②  >5PB ACROSS 100s OF DATA SILOS Engineering tickets Internal QA tests Product logs CRM/Sales feed Manufacturing
  • 7. © 2017 PURE STORAGE INC. 7 LOG ANALYSIS TODAY Raw logs (grep, awk, etc.) AD-HOC OR ENSEMBLE ANALYSIS IS TOO SLOW (OR IMPOSSIBLE) Extract & Aggregate
  • 8. © 2017 PURE STORAGE INC. 8 IMPROVEMENTS WITH SPARK 100X+ PERFORMANCE IMPROVEMENTS WITH LESS BESPOKE CODE Raw logs (archive only) Extract Transform Aggregate Load
  • 9. © 2017 PURE STORAGE INC. 9 TOOL CONSOLIDATION  SPARK ENABLES WORKFLOW CONSOLIDATION •  Fewer compute resources •  More code reuse  CHALLENGES FOR SPARK & ECOSYSTEM •  Increase performance visibility for tuning partitioning and ingest flows •  Improve indexing options integrated in Spark for real-time queries •  Eliminating OOMs and spurious exceptions
  • 10. © 2017 PURE STORAGE INC. 10 PURE’S ANALYTICS PIPELINE CURRENT CHALLENGES ①  >10 ANALYSIS TOOLS, PLUS AD-HOC SCRIPTS ②  >5PB ACROSS 100s OF DATA SILOS Engineering tickets Internal QA tests Product logs CRM/Sales feed Manufacturing
  • 11. © 2017 PURE STORAGE INC. 11 INFRASTRUCTURE SILOS HDFS POSTGRES ELASTIC KAFKA
  • 12. © 2017 PURE STORAGE INC. 12 INFRASTRUCTURE SILOS SPREAD OVER 100+ DATABASES HDFS POSTGRES ELASTIC KAFKA HDFS POSTGRES MYSQL HDFS ELASTIC POSTGRES
  • 13. © 2017 PURE STORAGE INC. 13 INFRASTRUCTURE SILOS SPREAD OVER 100+ DATABASES HDFS POSTGRES ELASTIC KAFKA HDFS POSTGRES MYSQL HDFS ELASTIC POSTGRES RACKS OF EQUIPMENT, OFTEN AT LOW AVERAGE UTILIZATION
  • 14. © 2017 PURE STORAGE INC. 14 DECOUPLING COMPUTE & STORAGE ?
  • 15. © 2017 PURE STORAGE INC. 15 INTRODUCING FLASHBLADE ALL-FLASH FILE AND OBJECT STORAGE BIG, FAST, SIMPLE All-flash HW & SW Up to 1.6PBs, 1M NFS OPS, and 16 GB/s in each 4U chassis Designed to scale to 100s of blades FILE + OBJECT Access via multiple protocols Ideal for consolidation of applications and access patterns BLADE ELASTICITY ELASTIC FABRIC
  • 16. © 2017 PURE STORAGE INC. 16 INFRASTRUCTURE CONSOLIDATION CLOUD-SCALE EFFICIENCY AND AGILITY rsyslog 1000-CORE SPARK CLUSTER ON 250TB FLASHBLADE
  • 17. © 2017 PURE STORAGE INC. 17 INFRASTRUCTURE CONSOLIDATION CLOUD-SCALE EFFICIENCY AND AGILITY rsyslog 1000-CORE SPARK CLUSTER ON 250TB FLASHBLADE DENSITY: SHRINK 5 RACKS TO 1 (HDFS VS FLASHBLADE) FLEXIBILITY: SCALE COMPUTE AND STORAGE INDEPENDENTLY EFFICIENCY: CONSOLIDATE APPLICATIONS & CLUSTERS, SHARE DATA
  • 18. © 2017 PURE STORAGE INC. 18 THIS IS JUST THE BEGINNING…  THE FUTURE IS BRIGHT FOR SPARK + FLASHBLADE •  Spark is a fast, flexible, swiss-army knife for consolidating analytics workflows •  Running Spark on FlashBlade enables infrastructure consolidation and drives flexibility and efficiency for data science and supporting teams  COME VISIT US AT BOOTH 208 IN THE EXPO HALL!  BIG  FAST  SIMPLE