DBG / June 5, 2018 / © 2018 IBM Corporation
Model Parallelism in Spark ML Cross-Validation
Nick Pentreath
Principal Engineer
Bryan Cutler
Software Engineer
About Nick
@MLnick on Twitter & Github
Principal Engineer, IBM
CODAIT - Center for Open-Source Data & AI Technologies
Machine Learning & AI
Apache Spark committer & PMC
Author of Machine Learning with Spark
Various conferences & meetups
About Bryan
Software Engineer, IBM CODAIT
Apache Spark committer
Apache Arrow committer
Python, Machine Learning OSS
@BryanCutler on Github
Center for Open Source Data and AI Technologies
CODAIT
codait.org
CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise
Relaunch of the Spark Technology Center (STC) to reflect expanded mission
Improving Enterprise AI Lifecycle in Open Source
Agenda
Model Tuning in Spark
Scaling Model Tuning
Performance Results
Best Practices
Future Directions in Optimizing Pipelines
Model Tuning in Spark
Model selection: workflow within a workflow
Model Tuning in Spark
Workflow: Ingest → Data Processing → Feature Engineering → Model Selection → Final Model
Model selection loop: Candidate models → Train → Evaluate → Adjust
Pipeline cross-validation
Model Tuning in Spark
Spark ML Pipeline: Tokenizer → CountVectorizer → LogisticRegression
Parameters:
• CountVectorizer # features: 10, 100
• LogisticRegression regParam: 0.001, 0.1
Pipeline cross-validation
Model Tuning in Spark
The parameter grid expands into four candidate pipelines:
• Tokenizer → CountVectorizer (# features: 10) → LogisticRegression (regParam: 0.001)
• Tokenizer → CountVectorizer (# features: 10) → LogisticRegression (regParam: 0.1)
• Tokenizer → CountVectorizer (# features: 100) → LogisticRegression (regParam: 0.001)
• Tokenizer → CountVectorizer (# features: 100) → LogisticRegression (regParam: 0.1)
Cross-validation is expensive!
Model Tuning in Spark
• 5 x 5 x 5 hyperparameters = 125 pipelines
• ... across 4 machine learning models = 500
• If training and evaluation do not fully utilize available cluster resources, that waste is compounded for each model
Based on XKCD comic: https://xkcd.com/303/
& https://github.com/mislavcimpersak/xkcd-excuse-generator
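The arithmetic behind those counts can be sketched in a few lines. This is only an illustration: the parameter names and candidate values below are hypothetical placeholders, chosen so that three hyperparameters have five values each.

```python
from itertools import product

# Hypothetical grid: three hyperparameters with five candidate values each.
grid = {
    "param_a": [1, 2, 3, 4, 5],
    "param_b": [1, 2, 3, 4, 5],
    "param_c": [1, 2, 3, 4, 5],
}
num_model_types = 4  # four different models tuned over a grid of this size

# Every combination of values is one candidate pipeline.
param_maps = [dict(zip(grid, values)) for values in product(*grid.values())]
pipelines_per_model = len(param_maps)
total_pipelines = pipelines_per_model * num_model_types

print(pipelines_per_model)  # 125
print(total_pipelines)      # 500
```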
Scaling Model Tuning
Parallel model evaluation
Scaling Model Tuning
[Diagram: the four candidate pipelines, evaluated in parallel]
• Tokenizer → CountVectorizer (# features: 10) → LogisticRegression (regParam: 0.001)
• Tokenizer → CountVectorizer (# features: 10) → LogisticRegression (regParam: 0.1)
• Tokenizer → CountVectorizer (# features: 100) → LogisticRegression (regParam: 0.001)
• Tokenizer → CountVectorizer (# features: 100) → LogisticRegression (regParam: 0.1)
Parallel model evaluation
Scaling Model Tuning
• Added in SPARK-19357 and SPARK-21911 (PySpark)
• Parallelism parameter governs the maximum number of models to be trained at once
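To illustrate what "maximum number of models trained at once" means, here is a minimal sketch using Python's standard thread pool rather than Spark itself. The function fit_and_evaluate is a hypothetical stand-in that only sleeps, and the key "n_features" is just a label for the slide's "# features" setting, not a real Spark parameter name.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

parallelism = 2  # upper bound on models trained at the same time

# The four parameter combinations from the example grid.
param_maps = [
    {"n_features": 10, "regParam": 0.001},
    {"n_features": 10, "regParam": 0.1},
    {"n_features": 100, "regParam": 0.001},
    {"n_features": 100, "regParam": 0.1},
]

lock = threading.Lock()
state = {"running": 0, "peak": 0}

def fit_and_evaluate(params):
    """Stand-in for fitting one pipeline; it only tracks concurrency."""
    with lock:
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
    time.sleep(0.05)  # pretend to train
    with lock:
        state["running"] -= 1
    return params

# The pool size is the cap: at most `parallelism` fits run concurrently.
with ThreadPoolExecutor(max_workers=parallelism) as pool:
    results = list(pool.map(fit_and_evaluate, param_maps))

peak_running = state["peak"]
print(len(results), peak_running)
```

All four candidates are still evaluated; the parameter only limits how many are in flight at any moment.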
Implementation considerations
Scaling Model Tuning
• Parallelism parameter sets the size of the thread pool under the hood
• Dedicated ExecutionContext created to avoid deadlocks from using the default thread pool
• Used Futures instead of parallel collections, which is more flexible
• Model-specific parallel fitting implementations not supported (see SPARK-22126)
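The pattern described here (a dedicated pool sized by the parallelism parameter, with one future per candidate model) can be sketched in Python as an analogy. This is not Spark's Scala implementation: fit_and_score is a hypothetical stand-in that returns a made-up deterministic number, and cross_validate is an illustrative helper name.

```python
from concurrent.futures import ThreadPoolExecutor

def fit_and_score(fold, reg_param):
    """Hypothetical stand-in: returns a fake, deterministic validation metric."""
    return round(1.0 - reg_param - 0.01 * fold, 4)

def cross_validate(reg_params, num_folds, parallelism):
    """Average a metric per candidate over all folds, fitting in parallel."""
    totals = [0.0] * len(reg_params)
    # A dedicated pool, created for this call and shut down afterwards, so the
    # tuning tasks do not compete with unrelated work on a shared default pool.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        for fold in range(num_folds):
            # One future per candidate; the pool runs at most `parallelism` at once.
            futures = [pool.submit(fit_and_score, fold, p) for p in reg_params]
            # Collect results in grid order so metrics line up with candidates.
            for i, future in enumerate(futures):
                totals[i] += future.result()
    return [total / num_folds for total in totals]

avg_metrics = cross_validate([0.001, 0.1], num_folds=5, parallelism=3)
print(avg_metrics)
```

Futures make it easy to collect each result individually, attach error handling per task, or process results as they complete, which is harder to express with a plain parallel map.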
Performance tests
Scaling Model Tuning
• Compared parallel CV to serial CV with varying number of samples
• Simple LogisticRegression with regParam and fitIntercept; parameter grid size 12
• Measure elapsed time for cross-validation
• Data size: 100,000 to 5,000,000 samples
• Number of features: 10
• Number of partitions: 10
• Number of CV folds: 5
• Parallelism: 3
• Standalone cluster with 30 cores
Results
Scaling Model Tuning
• Approximately 2.4x speedup
• Stays roughly constant as the number of samples increases
Best practices
Scaling Model Tuning
• Simple integer parameter is the only thing you can set (for now)
• Too low => under-utilizes resources
• Too high => could lead to memory issues or overload the cluster
• Rough rule: # cores / # partitions
• But depends on data and model sizes
• Mid-sized cluster: probably <= 10
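The rough rule can be written as a small helper. The function name suggest_parallelism, the integer division, the floor of 1, and the default cap of 10 are assumptions made for this sketch; the result is a starting point to adjust for data and model sizes, not a recommendation from Spark.

```python
def suggest_parallelism(num_cores, num_partitions, cap=10):
    """Rough starting point: cores / partitions, kept between 1 and `cap`."""
    if num_cores < 1 or num_partitions < 1:
        raise ValueError("num_cores and num_partitions must be positive")
    return max(1, min(cap, num_cores // num_partitions))

# 30 cores and 10 partitions, as in the first performance test setup.
print(suggest_parallelism(30, 10))  # 3
```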
Optimizing Tuning for Pipeline Models
Challenges
Optimizing Tuning for Pipeline Models
• Multi-stage, complex pipelines
• Parameter grid with hyperparameters from different stages
• Easy to have a huge number of candidate parameter combinations
• Model parallelism helps, but can we do better?
Duplicating work
Optimizing Tuning for Pipeline Models
• Each Pipeline is treated independently
• Depending on the parameter grid and pipeline stages, this can mean:
  • Fitting the same model multiple times
  • Performing the same transformations multiple times
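A quick way to see the duplication is to count how often each configured stage runs when every candidate pipeline is executed on its own. The sketch below uses the Tokenizer → CountVectorizer → LR example with two values per tuned stage; the tuple encoding of a stage is an illustrative choice.

```python
from collections import Counter
from itertools import product

n_features_values = [10, 100]
reg_values = [0.1, 0.01]

# Each candidate pipeline, written as the sequence of configured stages it runs.
# The LR stage carries nfeat too, because its input depends on the upstream stage.
pipelines = [
    [("Tokenizer",), ("CountVectorizer", nfeat), ("LR", nfeat, reg)]
    for nfeat, reg in product(n_features_values, reg_values)
]

# Count executions of each configured stage when pipelines are independent.
stage_runs = Counter(stage for pipeline in pipelines for stage in pipeline)
total_runs = sum(stage_runs.values())

print(stage_runs[("Tokenizer",)])            # 4: same transformation repeated
print(stage_runs[("CountVectorizer", 10)])   # 2: same model fitted twice
print(total_runs)                            # 12 stage executions overall
```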
Optimize with a DAG
Optimizing Tuning for Pipeline Models
• A node is an estimator/transformer with a set of hyperparameters
• A path in the graph is a single pipeline model
[Diagram: pipeline DAG]
Tokenizer
  CountVectorizer (nfeat=10)
    LR (reg=0.1)
    LR (reg=0.01)
  CountVectorizer (nfeat=100)
    LR (reg=0.1)
    LR (reg=0.01)
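One way to picture this structure is as a prefix tree built stage by stage, where every root-to-leaf path is one pipeline. The sketch below is an illustration only: build_dag, paths, and count_nodes are hypothetical helpers, and it assumes a fixed stage order with the full cross product of settings.

```python
def build_dag(stage_grids):
    """Build a prefix tree: one child per hyperparameter setting at each stage."""
    root = {"label": "root", "children": []}
    frontier = [root]
    for stage_name, settings in stage_grids:
        next_frontier = []
        for parent in frontier:
            for setting in settings:
                node = {"label": f"{stage_name}({setting})", "children": []}
                parent["children"].append(node)
                next_frontier.append(node)
        frontier = next_frontier
    return root

def paths(node, prefix=()):
    """Yield every root-to-leaf path; each one is a full pipeline."""
    if not node["children"]:
        yield prefix
        return
    for child in node["children"]:
        yield from paths(child, prefix + (child["label"],))

def count_nodes(node):
    """Count the nodes below `node` (the artificial root is not counted)."""
    return sum(1 + count_nodes(child) for child in node["children"])

dag = build_dag([
    ("Tokenizer", [""]),
    ("CountVectorizer", ["nfeat=10", "nfeat=100"]),
    ("LR", ["reg=0.1", "reg=0.01"]),
])
all_paths = list(paths(dag))

print(count_nodes(dag))  # 7 nodes: 1 Tokenizer, 2 CountVectorizer, 4 LR
print(len(all_paths))    # 4 pipelines
```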
Parallelize in breadth-first order
Optimizing Tuning for Pipeline Models
• Example with parallelism parameter set to 2
• Tokenizer is only a transform, so proceed to fit the CountVectorizer nodes
[Diagram: the pipeline DAG (Tokenizer → 2 CountVectorizer nodes → 4 LR nodes)]
Fit estimators
Optimizing Tuning for Pipeline Models
• Cache the result and proceed to fit the first 2 LogisticRegression models
[Diagram: the pipeline DAG (Tokenizer → 2 CountVectorizer nodes → 4 LR nodes), annotated "Cache result"]
Fit estimators
Optimizing Tuning for Pipeline Models
• Unpersist when child tasks done
• Fit final 2 LR models
[Diagram: the pipeline DAG (Tokenizer → 2 CountVectorizer nodes → 4 LR nodes), annotated "Unpersist cached dataframe" and "Cache result"]
Fit estimators
Optimizing Tuning for Pipeline Models
• All 4 LR models fitted
[Diagram: the pipeline DAG (Tokenizer → 2 CountVectorizer nodes → 4 LR nodes), annotated "Unpersist cached dataframe"]
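The fitting order walked through on these slides can be simulated with a short breadth-first scheduler. This is a sequential sketch that only records events, not the experimental implementation: the schedule function and its event names are hypothetical, batches stand in for tasks that would run in parallel, and the root (Tokenizer) output is not tracked because the slides only show caching for the CountVectorizer outputs.

```python
from collections import deque

# DAG from the slides: each node maps to its children.
dag = {
    "Tokenizer": ["CV(nfeat=10)", "CV(nfeat=100)"],
    "CV(nfeat=10)": ["LR(nfeat=10, reg=0.1)", "LR(nfeat=10, reg=0.01)"],
    "CV(nfeat=100)": ["LR(nfeat=100, reg=0.1)", "LR(nfeat=100, reg=0.01)"],
}

def schedule(dag, root, parallelism):
    """Return an event log of a breadth-first walk, `parallelism` fits per batch."""
    events = []
    remaining = {}  # cached parent -> number of its children still to fit
    queue = deque((child, root) for child in dag[root])
    while queue:
        batch = [queue.popleft() for _ in range(min(parallelism, len(queue)))]
        # Cache a parent's output the first time one of its children is fitted.
        for node, parent in batch:
            if parent != root and parent not in remaining:
                remaining[parent] = len(dag[parent])
                events.append(("cache", parent))
        events.append(("fit", tuple(node for node, _ in batch)))
        for node, parent in batch:
            # Children of a fitted node are queued for a later batch.
            queue.extend((child, node) for child in dag.get(node, []))
            if parent != root:
                remaining[parent] -= 1
                if remaining[parent] == 0:
                    events.append(("unpersist", parent))
    return events

events = schedule(dag, "Tokenizer", parallelism=2)
for event in events:
    print(event)
```

With parallelism 2 the log shows both CountVectorizer nodes fitted first, then each cached output used for its two LR fits and unpersisted once they finish.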
Evaluate models
Optimizing Tuning for Pipeline Models
• Evaluate models using a similar method
• CountVectorizerModel is now a transformer
• Cache transform result
[Diagram: the fitted DAG (Tokenizer → CVModel nfeat=10 and nfeat=100 → LRModel reg=0.1 and reg=0.01 under each), annotated "Cache result"]
Evaluate models
Optimizing Tuning for Pipeline Models
[Diagram: the fitted DAG (Tokenizer → 2 CVModel nodes → 4 LRModel nodes), annotated "Unpersist cached dataframe" and "Cache result"]
Metrics: 0.62, 0.62
Evaluate models
Optimizing Tuning for Pipeline Models
• All models evaluated for this fold
[Diagram: the fitted DAG (Tokenizer → 2 CVModel nodes → 4 LRModel nodes), annotated "Unpersist cached dataframe"]
Metrics: 0.62, 0.62, 0.72, 0.66
Select best model
Optimizing Tuning for Pipeline Models
• Average the metrics from all folds and select the best PipelineModel
[Diagram: the fitted DAG (Tokenizer → 2 CVModel nodes → 4 LRModel nodes)]
Avg metrics: 0.64, 0.64, 0.71, 0.65
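Selecting the winner from the averaged metrics is a simple arg-max. In the sketch below, pairing the four averages with the pipelines in left-to-right diagram order, and treating a larger metric as better, are both assumptions; the slide does not state either.

```python
# Averaged metrics from the slide, paired with the four pipelines in diagram
# order (assumed pairing; "higher is better" is also an assumption).
avg_metrics = {
    ("nfeat=10", "reg=0.1"): 0.64,
    ("nfeat=10", "reg=0.01"): 0.64,
    ("nfeat=100", "reg=0.1"): 0.71,
    ("nfeat=100", "reg=0.01"): 0.65,
}

# Pick the parameter combination with the best averaged metric.
best_params = max(avg_metrics, key=avg_metrics.get)
best_metric = avg_metrics[best_params]

print(best_params, best_metric)
```

For a metric where smaller is better (for example an error measure), min would replace max.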
Performance tests
Optimizing Tuning for Pipeline Models
• Compared to standard Spark CV with parallelism enabled
• Pipeline: MinMaxScaler → PCA → LinearRegression
• Measure elapsed time for cross-validation, varying the size of the parameter grid from 36 to 80 models to evaluate
• Data size: 1,000,000
• Number of features: 50
• Number of partitions: 16
• Number of CV folds: 4
• Parallelism: 3
• Standalone cluster with 30 cores
Results
Optimizing Tuning for Pipeline Models
• Up to 3.25x speedup
• Increases with more models …
• … and more complex pipelines
• Check out:
• https://github.com/BryanCutler/PipelineTuning
• Experimental!
• Watch SPARK-19071
[Chart: Elapsed time for DAG CV vs Simple Parallel CV; x-axis: # models (36, 48, 60, 80); y-axis: elapsed time (0 to 1100); series: Parallel, DAG Parallel]
Thank you!
codait.org
twitter.com/MLnick
github.com/MLnick
github.com/BryanCutler
developer.ibm.com/code
FfDL
Sign up for IBM Cloud and try Watson Studio!
https://datascience.ibm.com/
MAX
Date, Time, Location & Duration Session title and Speaker
Tue, June 5 | 11 AM
2010-2012, 30 mins
Productionizing Spark ML Pipelines with the Portable Format for Analytics
Nick Pentreath (IBM)
Tue, June 5 | 2 PM
2018, 30 mins
Making PySpark Amazing—From Faster UDFs to Dependency Management and Graphing!
Holden Karau (Google) Bryan Cutler (IBM)
Tue, June 5 | 2 PM
Nook by 2001, 30 mins
Making Data and AI Accessible for All
Armand Ruiz Gabernet (IBM)
Tue, June 5 | 2:40 PM
2002-2004, 30 mins
Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database System
Rajesh Bordawekar (IBM T.J. Watson Research Center)
Tue, June 5 | 3:20 PM
3016-3022, 30 mins
Dynamic Priorities for Apache Spark Application’s Resource Allocations
Michael Feiman (IBM Spectrum Computing) Shinnosuke Okada (IBM Canada Ltd.)
Tue, June 5 | 3:20 PM
2001-2005, 30 mins
Model Parallelism in Spark ML Cross-Validation
Nick Pentreath (IBM) Bryan Cutler (IBM)
Tue, June 5 | 3:20 PM
2007, 30 mins
Serverless Machine Learning on Modern Hardware Using Apache Spark
Patrick Stuedi (IBM)
Tue, June 5 | 5:40 PM
2002-2004, 30 mins
Create a Loyal Customer Base by Knowing Their Personality Using AI-Based Personality Recommendation Engine
Sourav Mazumder (IBM Analytics) Aradhna Tiwari (University of South Florida)
Tue, June 5 | 5:40 PM
2007, 30 mins
Transparent GPU Exploitation on Apache Spark
Dr. Kazuaki Ishizaki (IBM) Madhusudanan Kandasamy (IBM)
Tue, June 5 | 5:40 PM
2009-2011, 30 mins
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for Deep Neural Networks
Yonggang Hu (IBM) Chao Xue (IBM)
IBM Sessions at Spark+AI Summit 2018 (Tuesday, June 5)
Date, Time, Location & Duration Session title and Speaker
Wed, June 6 | 12:50 PM Birds of a Feather: Apache Arrow in Spark and More
Bryan Cutler (IBM) Li Jin (Two Sigma Investments, LP)
Wed, June 6 | 2 PM
2002-2004, 30 mins
Deep Learning for Recommender Systems
Nick Pentreath (IBM)
Wed, June 6 | 3:20 PM
2018, 30 mins
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer
Frederick Reiss (IBM) Vijay Bommireddipalli (IBM Center for Open-Source Data & AI Technologies)
IBM Sessions at Spark+AI Summit 2018 (Wednesday, June 6)
Meet us at IBM booth in the Expo area.
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制vexqp
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss ConfederationEfruzAsilolu
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxParas Gupta
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxVivek487417
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdftheeltifs
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjurptikerjasaptiker
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制vexqp
 

Recently uploaded (20)

Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 

Model Parallelism in Spark ML Cross-Validation with Nick Pentreath and Bryan Cutler

  • 1. DBG / June 5, 2018 / © 2018 IBM Corporation. Model Parallelism in Spark ML Cross-validation. Nick Pentreath, Principal Engineer; Bryan Cutler, Software Engineer
  • 2. About Nick
      • @MLnick on Twitter & Github
      • Principal Engineer, IBM CODAIT - Center for Open-Source Data & AI Technologies
      • Machine Learning & AI
      • Apache Spark committer & PMC
      • Author of Machine Learning with Spark
      • Various conferences & meetups
  • 3. About Bryan
      • Software Engineer, IBM CODAIT
      • Apache Spark committer
      • Apache Arrow committer
      • Python, Machine Learning OSS
      • @BryanCutler on Github
  • 4. CODAIT: Center for Open Source Data and AI Technologies (codait.org)
      • CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise
      • Relaunch of the Spark Technology Center (STC) to reflect expanded mission
      • Improving Enterprise AI Lifecycle in Open Source
  • 5. Agenda
      • Model Tuning in Spark
      • Scaling Model Tuning
      • Performance Results
      • Best Practices
      • Future Directions in Optimizing Pipelines
  • 6. Model Tuning in Spark
  • 7. Model Tuning in Spark: Model selection: workflow within a workflow
      • Workflow stages: Ingest, Data Processing, Feature Engineering, Model Selection, Final Model
      • Model selection steps: Candidate models, Train, Evaluate, Adjust
  • 8. Model Tuning in Spark: Pipeline cross-validation
      • Spark ML Pipeline: Tokenizer, CountVectorizer, LogisticRegression
      • Parameters: # features: 10, 100; regParam: 0.001, 0.1
  • 9. Model Tuning in Spark: Pipeline cross-validation
      • Tokenizer, CountVectorizer (# features: 10), LogisticRegression (regParam: 0.001)
      • Tokenizer, CountVectorizer (# features: 10), LogisticRegression (regParam: 0.1)
      • Tokenizer, CountVectorizer (# features: 100), LogisticRegression (regParam: 0.001)
      • Tokenizer, CountVectorizer (# features: 100), LogisticRegression (regParam: 0.1)
  • 10. Model Tuning in Spark: Pipeline cross-validation
      • Tokenizer, CountVectorizer, LogisticRegression
      • # features: 10, 100; regParam: 0.001, 0.1
  • 11-13. Model Tuning in Spark: Pipeline cross-validation
  • 14. Model Tuning in Spark: Cross-validation is expensive!
      • 5 x 5 x 5 hyperparameters = 125 pipelines
      • ... across 4 machine learning models = 500
      • If training & evaluation does not fully utilize available cluster resources then that waste is compounded for each model
      • Based on XKCD comic: https://xkcd.com/303/ & https://github.com/mislavcimpersak/xkcd-excuse-generator
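The grid arithmetic on slide 14 can be checked with a short sketch. This is plain Python rather than Spark code, and the parameter names and candidate values are placeholders invented for illustration; only the counts (three hyperparameters with five values each, four model types) come from the slide.

```python
from itertools import product

# Hypothetical grid: three hyperparameters with five candidate values each.
# Names and values are placeholders, not taken from the talk.
param_grid = {
    "param_a": [1, 2, 3, 4, 5],
    "param_b": [0.001, 0.01, 0.1, 1.0, 10.0],
    "param_c": [10, 50, 100, 500, 1000],
}


def expand_grid(grid):
    """Return one dict per combination of hyperparameter values."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


combinations = expand_grid(param_grid)
pipelines_per_model = len(combinations)  # 5 x 5 x 5
num_model_types = 4                      # "across 4 machine learning models"
total_pipelines = pipelines_per_model * num_model_types

print(pipelines_per_model, total_pipelines)
```

Each of these candidates is then trained and evaluated once per cross-validation fold, which is why idle cluster capacity during a single fit adds up quickly.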
  • 15. Scaling Model Tuning
  • 16-19. Scaling Model Tuning: Parallel model evaluation
      • Tokenizer, CountVectorizer (# features: 10), LogisticRegression (regParam: 0.001)
      • Tokenizer, CountVectorizer (# features: 10), LogisticRegression (regParam: 0.1)
      • Tokenizer, CountVectorizer (# features: 100), LogisticRegression (regParam: 0.001)
      • Tokenizer, CountVectorizer (# features: 100), LogisticRegression (regParam: 0.1)
  • 20. Scaling Model Tuning: Parallel model evaluation
      • Added in SPARK-19357 and SPARK-21911 (PySpark)
      • Parallelism parameter governs the maximum # models to be trained at once
  • 21. Scaling Model Tuning: Parallel model evaluation
      • Tokenizer, CountVectorizer, LogisticRegression
      • # features: 10, 100; regParam: 0.001, 0.1
  • 22-24. Scaling Model Tuning: Parallel model evaluation
  • 25. Scaling Model Tuning: Implementation considerations
      • Parallelism parameter sets the size of threadpool under the hood
      • Dedicated ExecutionContext created to avoid deadlocks with using the default threadpool
      • Used Futures instead of parallel collections (more flexible)
      • Model-specific parallel fitting implementations not supported
      • SPARK-22126
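The thread-pool idea on slide 25 can be sketched in plain Python. Spark's implementation is in Scala (an ExecutionContext with Futures); the version below is only an analogue using `concurrent.futures`, and `fit_all` and `fake_fit` are hypothetical names standing in for real model training.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fit_all(param_maps, fit_fn, parallelism):
    """Fit one model per param map, running at most `parallelism` at a time."""
    # A dedicated pool sized by the parallelism setting, not a shared default pool.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [pool.submit(fit_fn, pm) for pm in param_maps]
        # Collect in submission order so results line up with param_maps.
        return [f.result() for f in futures]


# Hypothetical stand-in for training: sleeps briefly and records concurrency.
_lock = threading.Lock()
_running = 0
max_running = 0


def fake_fit(param_map):
    global _running, max_running
    with _lock:
        _running += 1
        max_running = max(max_running, _running)
    time.sleep(0.01)  # pretend to train
    with _lock:
        _running -= 1
    return {"params": param_map, "fitted": True}


grid = [{"regParam": r, "fitIntercept": f}
        for r in (0.001, 0.01, 0.1) for f in (True, False)]
models = fit_all(grid, fake_fit, parallelism=3)
print(len(models), "models fitted; peak concurrency:", max_running)
```

Submitting futures rather than mapping over a parallel collection keeps control over how results are gathered, which is one plausible reading of the "more flexible" remark on the slide.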
  • 26. Scaling Model Tuning: Performance tests
      • Compared parallel CV to serial CV with varying number of samples
      • Simple LogisticRegression with regParam and fitIntercept; parameter grid size 12
      • Measure elapsed time for cross-validation
      • Data size: 100,000 to 5,000,000
      • Number of features: 10
      • Number of partitions: 10
      • Number of CV folds: 5
      • Parallelism: 3
      • Standalone cluster with 30 cores
  • 27. Scaling Model Tuning: Results
      • Approximately 2.4x speedup
      • Stays roughly constant as # samples increases
  • 28. Scaling Model Tuning: Best practices
      • A simple integer parameter is the only thing you can set (for now)
      • Too low => under-utilize resources
      • Too high => could lead to memory issues or overloading cluster
      • Rough rule: # cores / # partitions
      • But depends on data and model sizes
      • Mid-sized cluster probably <= 10
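The rough rule on slide 28 can be written as a small helper. This is a sketch, not part of Spark: the function name `suggest_parallelism`, the use of floor division, the minimum of 1, and treating "probably <= 10" as a default cap are all assumptions made here for illustration.

```python
def suggest_parallelism(num_cores, num_partitions, cap=10):
    """Starting point for the parallelism setting: cores / partitions.

    The floor division, the minimum of 1 and the default cap of 10 are
    assumptions of this sketch, not rules stated in the talk.
    """
    if num_cores < 1 or num_partitions < 1:
        raise ValueError("num_cores and num_partitions must be positive")
    return max(1, min(cap, num_cores // num_partitions))


# Setup from the first performance test: 30 cores, 10 partitions.
print(suggest_parallelism(30, 10))
```

With the first performance test's setup (30 cores, 10 partitions) this gives 3, which matches the parallelism used there. As the slide notes, data and model sizes also matter, so the result is a value to start tuning from rather than a final answer.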
  • 29. Optimizing Tuning for Pipeline Models
  • 30. Optimizing Tuning for Pipeline Models: Challenges
      • Multi-stage, complex pipelines
      • Parameter grid with hyperparameters from different stages
      • Easy to have huge number of candidate parameter combinations
      • Model parallelism helps, but can we do better?
  • 31. Optimizing Tuning for Pipeline Models: Duplicating work
      • Each Pipeline treated independently
      • Depending on parameter grid and pipeline stages:
          • Fit the same model multiple times
          • Perform same transformations multiple times
  • 32. Optimizing Tuning for Pipeline Models: Optimize with a DAG
      • A node is an estimator/transformer with a set of hyperparameters
      • A path in the graph is a single pipeline model
      • Graph: Tokenizer; CountVectorizer nfeat=10, CountVectorizer nfeat=100; under each CountVectorizer: LR reg=0.1, LR reg=0.01
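To make the saving concrete, the sketch below builds the graph from slide 32 out of the four candidate pipelines and counts stage executions with and without sharing. It is plain Python, not the experimental PipelineTuning code; identifying a node with the prefix of stages leading to it is an assumption of this sketch, and `pipeline_paths` and `build_dag` are hypothetical names.

```python
def pipeline_paths():
    """The four candidate pipelines from the slide, as tuples of (stage, params)."""
    paths = []
    for nfeat in (10, 100):
        for reg in (0.1, 0.01):
            paths.append((
                ("Tokenizer", ()),
                ("CountVectorizer", (("nfeat", nfeat),)),
                ("LR", (("reg", reg),)),
            ))
    return paths


def build_dag(paths):
    """Map each distinct prefix of stages (a node) to the set of its child nodes."""
    children = {}
    for path in paths:
        for depth in range(1, len(path) + 1):
            node = path[:depth]
            children.setdefault(node, set())
            if depth > 1:
                # The parent prefix was registered on the previous iteration.
                children[path[:depth - 1]].add(node)
    return children


paths = pipeline_paths()
dag = build_dag(paths)

naive_stage_runs = sum(len(p) for p in paths)  # every pipeline runs every stage
shared_stage_runs = len(dag)                   # each distinct node runs once
print(naive_stage_runs, shared_stage_runs)
```

Treating the pipelines independently runs 12 stages (4 pipelines with 3 stages each), while the shared graph has 7 nodes: one Tokenizer, two CountVectorizer nodes and four LR nodes.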
  • 33. Optimizing Tuning for Pipeline Models: Parallelize in breadth-first order
      • Example with parallelism parameter set to 2
      • Tokenizer is only a transform, proceed to fit CountVectorizer nodes
      • Graph as on slide 32
  • 34. Optimizing Tuning for Pipeline Models: Fit estimators
      • Cache the result and proceed to fit the first 2 LogisticRegression models
      • Graph annotation: Cache result
  • 35. Optimizing Tuning for Pipeline Models: Fit estimators
      • Unpersist when child tasks done
      • Fit final 2 LR models
      • Graph annotations: Unpersist cached dataframe; Cache result
  • 36. Optimizing Tuning for Pipeline Models: Fit estimators
      • All 4 LR models fitted
      • Graph annotation: Unpersist cached dataframe
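The breadth-first walk on slides 33 to 36 can be approximated by grouping each level of the graph into batches of at most `parallelism` nodes. This is a simplified sketch in plain Python: the node labels and the function `breadth_first_batches` are hypothetical, and a real scheduler would presumably start the next node as soon as a worker frees up rather than waiting for a whole batch.

```python
# Graph from the slides, as parent -> children. Labels are illustrative.
dag = {
    "Tokenizer": ["CV nfeat=10", "CV nfeat=100"],
    "CV nfeat=10": ["LR nfeat=10 reg=0.1", "LR nfeat=10 reg=0.01"],
    "CV nfeat=100": ["LR nfeat=100 reg=0.1", "LR nfeat=100 reg=0.01"],
}


def breadth_first_batches(dag, root, parallelism):
    """Return batches of nodes, level by level, with at most `parallelism` per batch."""
    batches = []
    level = [root]
    while level:
        # Split the current level into chunks that can run concurrently.
        for i in range(0, len(level), parallelism):
            batches.append(level[i:i + parallelism])
        # Children only become eligible once their parents' level is done;
        # in the talk, a parent's output is cached here and unpersisted
        # after its child tasks finish.
        level = [child for node in level for child in dag.get(node, [])]
    return batches


for batch in breadth_first_batches(dag, "Tokenizer", parallelism=2):
    print(batch)
```

With parallelism 2 this yields the Tokenizer alone, then both CountVectorizer nodes, then the LR models two at a time, which follows the order of the walkthrough.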
  • 37. Optimizing Tuning for Pipeline Models: Evaluate models
      • Evaluate models using similar method
      • CountVectorizerModel is now a transformer
      • Cache transform result
      • Graph: Tokenizer; CVModel nfeat=10, CVModel nfeat=100; under each CVModel: LRModel reg=0.1, LRModel reg=0.01
  • 38. Optimizing Tuning for Pipeline Models: Evaluate models
      • Same bullets and graph as slide 37
      • Graph annotations: Unpersist cached dataframe; Cache result
      • Metrics: 0.62, 0.62
  • 39. Optimizing Tuning for Pipeline Models: Evaluate models
      • All models evaluated for this fold
      • Graph annotation: Unpersist cached dataframe
      • Metrics: 0.62, 0.62, 0.72, 0.66
  • 40. Optimizing Tuning for Pipeline Models: Select best model
      • Average the metrics from all folds and select the best PipelineModel
      • Avg Metrics: 0.64, 0.64, 0.71, 0.65
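The averaging and selection step on slide 40 can be sketched as follows. The first row holds the per-fold metrics shown on slide 39; the second row is a made-up fold, chosen only so that the averages match the slide, since the talk does not list the other folds. Treating a larger metric as better is also an assumption here.

```python
fold_metrics = [
    [0.62, 0.62, 0.72, 0.66],  # fold metrics shown on the slide
    [0.66, 0.66, 0.70, 0.64],  # hypothetical second fold, invented for illustration
]


def average_metrics(fold_metrics):
    """Average each candidate's metric across folds (one row per fold)."""
    num_folds = len(fold_metrics)
    return [sum(per_candidate) / num_folds for per_candidate in zip(*fold_metrics)]


def select_best(avg_metrics, larger_is_better=True):
    """Return the index of the best candidate by average metric."""
    pick = max if larger_is_better else min
    return pick(range(len(avg_metrics)), key=avg_metrics.__getitem__)


avg = average_metrics(fold_metrics)
best_index = select_best(avg)
print([round(a, 2) for a in avg], "best candidate index:", best_index)
```

For an error-style metric where smaller is better, the same helper can be called with `larger_is_better=False`.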
  • 41. Optimizing Tuning for Pipeline Models: Performance tests
      • Compared to standard Spark CV with parallelism enabled
      • Pipeline: MinMaxScaler → PCA → LinearRegression
      • Measure elapsed time for cross-validation, varying size of parameter grid from 36 to 80 models to evaluate
      • Data size: 1,000,000
      • Number of features: 50
      • Number of partitions: 16
      • Number of CV folds: 4
      • Parallelism: 3
      • Standalone cluster with 30 cores
  • 42. Optimizing Tuning for Pipeline Models: Results
      • Up to 3.25x speedup
      • Increases with more models and more complex pipelines
      • Check out: https://github.com/BryanCutler/PipelineTuning (Experimental!)
      • Watch SPARK-19071
      • Figure: Elapsed time for DAG CV vs Simple Parallel CV, by # models (36, 48, 60, 80); series: Parallel, DAG Parallel
  • 43. Thank you!
      • codait.org
      • twitter.com/MLnick
      • github.com/MLnick
      • github.com/BryanCutler
      • developer.ibm.com/code
      • FfDL, MAX
      • Sign up for IBM Cloud and try Watson Studio! https://datascience.ibm.com/
  • 44. IBM Sessions at Spark+AI Summit 2018 (Tuesday, June 5). Columns: Date, Time, Location & Duration; Session title and Speaker
      • Tue, June 5 | 11 AM | 2010-2012, 30 mins: Productionizing Spark ML Pipelines with the Portable Format for Analytics. Nick Pentreath (IBM)
      • Tue, June 5 | 2 PM | 2018, 30 mins: Making PySpark Amazing—From Faster UDFs to Dependency Management and Graphing! Holden Karau (Google), Bryan Cutler (IBM)
      • Tue, June 5 | 2 PM | Nook by 2001, 30 mins: Making Data and AI Accessible for All. Armand Ruiz Gabernet (IBM)
      • Tue, June 5 | 2:40 PM | 2002-2004, 30 mins: Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database System. Rajesh Bordawekar (IBM T.J. Watson Research Center)
      • Tue, June 5 | 3:20 PM | 3016-3022, 30 mins: Dynamic Priorities for Apache Spark Application’s Resource Allocations. Michael Feiman (IBM Spectrum Computing), Shinnosuke Okada (IBM Canada Ltd.)
      • Tue, June 5 | 3:20 PM | 2001-2005, 30 mins: Model Parallelism in Spark ML Cross-Validation. Nick Pentreath (IBM), Bryan Cutler (IBM)
      • Tue, June 5 | 3:20 PM | 2007, 30 mins: Serverless Machine Learning on Modern Hardware Using Apache Spark. Patrick Stuedi (IBM)
      • Tue, June 5 | 5:40 PM | 2002-2004, 30 mins: Create a Loyal Customer Base by Knowing Their Personality Using AI-Based Personality Recommendation Engine. Sourav Mazumder (IBM Analytics), Aradhna Tiwari (University of South Florida)
      • Tue, June 5 | 5:40 PM | 2007, 30 mins: Transparent GPU Exploitation on Apache Spark. Dr. Kazuaki Ishizaki (IBM), Madhusudanan Kandasamy (IBM)
      • Tue, June 5 | 5:40 PM | 2009-2011, 30 mins: Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for Deep Neural Networks. Yonggang Hu (IBM), Chao Xue (IBM)
  • 45. IBM Sessions at Spark+AI Summit 2018 (Wednesday, June 6). Columns: Date, Time, Location & Duration; Session title and Speaker
      • Wed, June 6 | 12:50 PM: Birds of a Feather: Apache Arrow in Spark and More. Bryan Cutler (IBM), Li Jin (Two Sigma Investments, LP)
      • Wed, June 6 | 2 PM | 2002-2004, 30 mins: Deep Learning for Recommender Systems. Nick Pentreath (IBM)
      • Wed, June 6 | 3:20 PM | 2018, 30 mins: Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer. Frederick Reiss (IBM), Vijay Bommireddipalli (IBM Center for Open-Source Data & AI Technologies)
      • Meet us at IBM booth in the Expo area.