Microsoft Azure Machine Learning
Anatomy of a machine learning service
Sharat Chikkerur, Senior Software Engineer, Microsoft
(On behalf of AzureML team)
Microsoft Azure Machine Learning (AzureML)
• AzureML is a cloud-hosted tool for creating and deploying machine
learning models
• Browser-based, zero-installation and cross platform
• Describe workflows graphically
• Workflows are versioned and support reproducibility
• Models can be programmatically retrained
• Models can be deployed to Azure as a scalable web service
• Can be scaled to 1000+ end points x 200 response containers per service
• Supports versioning, collaboration & monetization
Outline
• Distinguishing features (functional components) of AzureML
• Architectural components of AzureML
• Implementation details
• Lessons learned
Distinguishing features
MLStudio: Graphical authoring environment
AzureML Entities
Workspaces
Experiments
Graphs
Datasets
Assets
Actions
Web services
Versioning
• Each run of an experiment is versioned
• Can go back in time and examine historical results
• Intermediate results cached across experiments in workspace
• Each dataset has a unique source transformation
Collaboration
• Workspaces can be shared between multiple users
• However, two users cannot edit the same experiment simultaneously
• Any experiment can be pushed to a common AzureML gallery
• Allows experiments, models and transforms to be easily shared with the
AzureML user community
External Language Support
• Full-fidelity support for R, Python and SQL (via SQLite)
• AzureML datasets marshalled transparently
• R models marshalled into AzureML models
• Scripts available as part of operationalized web services
• Code isolation
• External language modules are executed within a Drawbridge container
• “Batteries included”
• R 3.1.0 with ~500 packages, Anaconda Python 2.7 with ~120 packages
Operationalization
• An experiment to be operationalized must be converted into a “scoring” experiment
• Training and scoring experiments are “linked”
• A successful scoring experiment can be published as a web service
• Published web services are automatically managed, scaled out and load-balanced
• Web service available in two flavors
• Request/Response: Low-latency endpoint for scoring a single row at a time
• Batch: Endpoint for scoring a collection of records from Azure storage
Monetization
• Data marketplace (http://datamarket.azure.com) allows users to
monetize data models
• Supports
• Web services published through AzureML
• Stand alone web services
• Integration
• Python/R modules can query external web services (including marketplace
APIs) allowing functional composition
Architectural components
Component services
• Studio (UX)
• Experimentation Service (ES)
• Composed of microservices
• Job Execution Service (JES)
• Single Node Runtime (SNR)
• Request response service (RRS)
• Batch execution service (BES)
[Diagram: User, UX, ES, JES, SNR, RRS, BES]
Studio (UX)
• Primary UX layer
• Single page application
• Asset Palette
• Datasets
• Algorithms
• Trained models
• External language modules
• Experiment canvas
• DAG consisting of modules
• Module properties
• Parameters
• Action bar
• Commands to ES
Experimentation Service (ES)
• Primary backend
• Orchestrates all component services
• Handles events to/from UX
• Programmatic access
• RESTful API (UX communicates this way)
• Features
• Experiment introspection
• Experiment manipulation/creation
• Consists of microservices
• UX, assets, authentication, packing etc.
Job Execution Service (JES)
• Primary job scheduler
• Dependency tracking
• Experiment DAG defines dependencies between modules.
• Topological sort is used to determine the order of execution
• Parallel Execution
• Different experiments can be executed in parallel
• Modules at the same depth in the graph can be scheduled in parallel
• Note: JES itself does not execute the task payload. Tasks are dispatched to a task queue
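The dependency tracking described above can be sketched as a level-by-level topological sort. This is a minimal illustration, not the JES implementation: the function name `schedule_levels` and the module names in the sample graph are assumptions made for the example. Modules that land in the same level have no dependencies on each other, so a scheduler could dispatch them in parallel.

```python
from collections import defaultdict


def schedule_levels(nodes, edges):
    """Group DAG nodes into levels using Kahn's algorithm.

    Every node in a level depends only on nodes in earlier levels,
    so all nodes within one level could be dispatched in parallel.
    """
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    level = sorted(n for n in nodes if indegree[n] == 0)
    levels, scheduled = [], 0
    while level:
        levels.append(level)
        scheduled += len(level)
        next_level = []
        for node in level:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_level.append(child)
        level = sorted(next_level)

    if scheduled != len(nodes):
        raise ValueError("experiment graph contains a cycle")
    return levels


# Hypothetical experiment graph (module names are illustrative only).
modules = ["read", "clean", "select", "train", "score"]
deps = [("read", "clean"), ("read", "select"),
        ("clean", "train"), ("select", "train"), ("train", "score")]
levels = schedule_levels(modules, deps)
```

Here `clean` and `select` end up in the same level because both depend only on `read`.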
Single Node Runtime (SNR)
• Executes tasks dispatched from JES
• Consumes tasks from a queue
• Tasks consist of an input specification along with module parameters
• Stateless: data required for execution is copied over
• Each SNR contains a copy of Runtime + modules
• Runtime – DataTables, array implementation, I/O, base classes, etc.
• Modules – machine learning algorithms
• SNR pool shared across deployment
• Size of the pool can be scaled based on demand
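A stateless worker pool that consumes tasks from a queue can be sketched with the Python standard library. This is only an illustration of the pattern: `run_module`, the task dictionary layout, and the `scale` parameter are assumptions made for the example, and threads stand in for separate SNR nodes.

```python
import queue
import threading


def run_module(task):
    """Stand-in for module execution; all required data travels in the task."""
    scale = task["params"]["scale"]
    return [value * scale for value in task["inputs"]]


def worker(tasks, results):
    """Consume tasks until a None sentinel arrives; keeps no state between tasks."""
    while True:
        task = tasks.get()
        if task is None:
            tasks.task_done()
            break
        results.put((task["id"], run_module(task)))
        tasks.task_done()


def run_pool(task_list, pool_size=2):
    """Run tasks on a pool of workers; pool_size can be changed to scale."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(pool_size)]
    for thread in threads:
        thread.start()
    for task in task_list:
        tasks.put(task)
    for _ in threads:
        tasks.put(None)  # one shutdown sentinel per worker
    tasks.join()
    for thread in threads:
        thread.join()

    collected = {}
    while not results.empty():
        task_id, value = results.get()
        collected[task_id] = value
    return collected


demo = run_pool([
    {"id": "t1", "inputs": [1, 2], "params": {"scale": 10}},
    {"id": "t2", "inputs": [3], "params": {"scale": 2}},
])
```

Because each task carries its own inputs and parameters, any worker can pick up any task, which is what makes resizing the pool straightforward.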
Machine learning algorithms
• Sources of machine learning module assets
• Microsoft Research
• Infer.NET (http://research.microsoft.com/en-us/um/cambridge/projects/infernet/)
• Vowpal Wabbit (http://hunch.net)
• Open source
• LibSVM
• Pegasos
• OpenCV
• R
• Scikit-learn
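Category | Sub category | Module | Reference
Supervised | Binary classification | Average perceptron | (Freund & Schapire, 1999)
Supervised | Binary classification | Bayes point machine | (Herbrich, Graepel, & Campbell, 2001)
Supervised | Binary classification | Boosted decision tree | (Burges, 2010)
Supervised | Binary classification | Decision jungle | (Shotton et al., 2013)
Supervised | Binary classification | Locally deep SVM | (Jose & Goyal, 2013)
Supervised | Binary classification | Logistic regression | (Duda, Hart, & Stork, 2000)
Supervised | Binary classification | Neural network | (Bishop, 1995)
Supervised | Binary classification | Online SVM | (Shalev-Shwartz et al., 2011)
Supervised | Binary classification | Vowpal Wabbit | (Langford et al., 2007)
Supervised | Multiclass | Decision forest | (Criminisi, 2011)
Supervised | Multiclass | Decision jungle | (Shotton et al., 2013)
Supervised | Multiclass | Multinomial regression | (Andrew & Gao, 2007)
Supervised | Multiclass | Neural network | (Bishop, 1995)
Supervised | Multiclass | One-vs-all | (Rifkin & Klautau, 2004)
Supervised | Multiclass | Vowpal Wabbit | (Langford et al., 2007)
Supervised | Regression | Bayesian linear regression | (Herbrich et al., 2001)
Supervised | Regression | Boosted decision tree regression | (Burges, 2010)
Supervised | Regression | Linear regression (batch and online) | (Bottou, 2010)
Supervised | Regression | Decision forest regression | (Criminisi, 2011)
Supervised | Regression | Random forest based quantile regression | (Criminisi, 2011)
Supervised | Regression | Neural network based regression | (Bishop, 1995)
Supervised | Regression | Ordinal regression | (McCullagh, 1980)
Supervised | Regression | Poisson regression | (Nelder & Wedderburn, 1972)
Supervised | Recommendation | Matchbox recommender | (Stern et al., 2009)
Unsupervised | Clustering | K-means clustering | (Jain, 2010)
Unsupervised | Anomaly detection | One-class SVM | (Schölkopf, Platt, Shawe-Taylor, Smola, & Williamson, 2001)
Unsupervised | Anomaly detection | PCA based anomaly detection | (Duda et al., 2000)
Feature selection | Filter | Filter based feature selection | (Guyon & Elisseeff, 2003)
Text analytics | Topic modeling | Online LDA using Vowpal Wabbit | (Hoffman, Blei, & Bach, 2010)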
Request response service (RRS)
Batch Execution Service (BES)
• RRS
• Handles RESTful requests for single prediction
• Requests may execute full graph
• Can include data transformation before and after prediction
• Distinguishing feature compared to other web services
• Models and required datasets in graph are compiled to a static package
• Executes in-memory and on a single machine
• Can scale based on volume of requests
• BES
• Optimized for batch requests; similar to the training workflow
Implementation details
Implementation details : Data representation
• “DataTable”
• Similar to R/Pandas dataframe
• Column major organization with sliced and random access
• Has a rich schema
• Names: Allows re-ordering
• Purpose: Weights, Features, Labels etc.
• Stored as compressed 2D tiles
• “wide” tiles enable streaming access
• “narrow” tiles enable full column access
• Interoperability
• Can be marshalled in/out as R/Pandas dataframe
• Can be egressed out as CSV, TSV, SQL
[Figure: DataTable storage layout as index/block pairs (Index 1, Block 1; Index 2, Block 2; Index 3, Block 3)]
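The column-major organization and schema of the DataTable described above can be illustrated with a toy structure. This sketch is an assumption-laden stand-in: the class name `ColumnTable`, its methods, and the purpose labels are invented for the example, and it does not model the compressed 2D tiles.

```python
class ColumnTable:
    """Toy column-major table whose schema records a purpose per column."""

    def __init__(self, columns, purposes):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        # Each column is stored contiguously, so full-column access is cheap.
        self.columns = {name: list(values) for name, values in columns.items()}
        self.purposes = dict(purposes)  # e.g. {"churn": "label"}

    def column(self, name):
        """Full column access."""
        return self.columns[name]

    def row(self, index):
        """Random access to one row, assembled from every column."""
        return {name: values[index] for name, values in self.columns.items()}

    def slice_rows(self, start, stop):
        """Sliced access: a new table holding rows start..stop-1."""
        sliced = {name: values[start:stop]
                  for name, values in self.columns.items()}
        return ColumnTable(sliced, self.purposes)

    def names_with_purpose(self, purpose):
        """Columns with no declared purpose are treated as features."""
        return [name for name in self.columns
                if self.purposes.get(name, "feature") == purpose]


table = ColumnTable(
    {"age": [31, 45, 27], "income": [50, 80, 40], "churn": [0, 1, 0]},
    {"churn": "label"},
)
```

Keeping the purpose in the schema lets a learner ask for "the label column" without relying on column order.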
Implementation details: Modules
• Functional units in an experiment graph
• Encapsulates: data sources & sinks, models, algorithms,
scripts
• Categories
• Data ingress
• Supported sources: CSV, TSV, ARFF, LibSVM, SQL, Hive
• Type guessing for CSV, TSV (allows override)
• Data manipulation
• Cleaning missing values, SQL Transformation, R & Python scripts
• Modeling
• Machine learning algorithm
• Supervised: binary classification, multiclass classification, linear
regression, ordinal regression, recommendation
• Unsupervised: PCA, k-means
• Optimization
• Parameter sweep
Implementation details: Modules
• Ports
• Define input and output contracts
• Allows multiple input formats per port
• I/O handling is done externally to the
module through pluggable port handlers
• Allows UX to validate inputs at design
time
• Parameters
• Strongly typed
• Supports conditional parameters
• Can be marked as ‘web service’
parameter – substituted at query time
• Supports ranges (for parameter sweep)
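Strongly typed parameters with ranges and query-time substitution can be sketched as follows. The `Parameter` class, the `resolve` helper, and the parameter names are assumptions made for illustration and do not reflect the actual AzureML module interface.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class Parameter:
    name: str
    type_: type
    default: Any
    value_range: Optional[Tuple[float, float]] = None  # inclusive bounds
    web_service: bool = False  # may be substituted at query time

    def validate(self, value):
        """Check type and range; return the value unchanged if valid."""
        if not isinstance(value, self.type_):
            raise TypeError(f"{self.name}: expected {self.type_.__name__}")
        if self.value_range is not None:
            low, high = self.value_range
            if not low <= value <= high:
                raise ValueError(f"{self.name}: {value} outside [{low}, {high}]")
        return value


def resolve(params, overrides):
    """Apply query-time overrides, but only to web service parameters."""
    resolved = {}
    for param in params:
        if param.name in overrides:
            if not param.web_service:
                raise KeyError(f"{param.name} is not a web service parameter")
            resolved[param.name] = param.validate(overrides[param.name])
        else:
            resolved[param.name] = param.validate(param.default)
    return resolved


# Hypothetical parameters for a tree-based learner.
params = [
    Parameter("learning_rate", float, 0.1, (0.0, 1.0), web_service=True),
    Parameter("num_trees", int, 100, (1, 1000)),
]
settings = resolve(params, {"learning_rate": 0.05})
```

The same `value_range` could also bound the grid explored by a parameter sweep.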
Implementation detail: Testing
• Standard tests
• UX tests
• Web services penetration testing
• Services integration test
• AzureML Specific tests
• Module properties tests
• Schema propagation tests
• E2E experiment tests
• Operationalized experiment tests
• “Runners” test
• Machine learning tests
• Accuracy tests
• Fuzz testing (boundary values testing)
• Golden values tests
• Auto-generated tests
Lessons learned
Lesson: Data wrangling is important
• More time is spent on data wrangling than on model building
• “A data scientist spends nearly 80% of the time cleaning data” – NY Times
(http://nyti.ms/1t8IzfE)
• Data manipulation modules are very popular
• Internal ranking
• “Execute R script”, “SQL Transform” modules are more popular than machine learning modules.
• It is hard to anticipate all data pre-processing needs
• Need to provide custom processing support
• SQL Transform
• Execute R script
• Execute Python script
Lesson: Make big data possible, but small data efficient
• Distributed machine learning comes with a large overhead (Zaharia et al. 2010)
• Typical data science workflows enable exploration with small
amounts of data
• Should make this effortless and intuitive
• AzureML approach: “Make big data possible, but small data efficient”
• Make sure all experiment graphs can handle data size.
• Support ingress of large data – SQL, Azure
• Support features to pre-process big data
• Feature selection
• Feature hashing
• Learning by counts – reduces high dimensional data to lower dimensional historic
counts/rates
• Support streaming algorithms for big data (e.g. “Train Vowpal Wabbit”)
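Feature hashing, one of the pre-processing features listed above, maps an unbounded vocabulary to a fixed-length vector. The sketch below shows the general hashing trick, assuming an MD5-based bucket function and a bucket count of 16 chosen only for the example; it is not the AzureML module's implementation.

```python
import hashlib


def hash_features(tokens, num_buckets=16):
    """Map a bag of tokens to a fixed-length count vector (hashing trick)."""
    vector = [0] * num_buckets
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Use the first four bytes of the digest as a stable integer hash.
        index = int.from_bytes(digest[:4], "big") % num_buckets
        vector[index] += 1
    return vector


tokens = ["azure", "machine", "learning", "azure"]
hashed = hash_features(tokens)
```

The output length depends only on `num_buckets`, never on the vocabulary size, at the cost of occasional collisions between unrelated tokens.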
Lesson: Feature gaps are inevitable
• Cannot cover all possible pre-processing scenarios
• Cannot provide all algorithms
• Support for scripting (R, Python, SQL)
• Allow custom data manipulation
• Allow users to bring in external libraries
• Allow users to call into other web services
• Isolate user code
• Support during operationalization
• Support custom modules
• Allow user to author first class “modules”
• Allow users to mix custom modules in the workflow
Lesson: Data science workflows should be reproducible
• Data science workflows are iterative, explorative and collaborative
• Need to provide a way to version and capture the workflow, settings, inputs etc.
• Make it easy to repeat the same experiment
• Reproducibility
• Capture random number seeds as part of the experiment.
• Same settings should produce the same results
• Re-running parts of the graph should be efficient.
• “Determinism”
• Modules are tagged as deterministic (e.g. SQL transform) or non-deterministic (e.g. Hive query)
• A graph can also be labeled as deterministic or non-deterministic
• Caching
• Outputs from deterministic modules are cached to make re-runs efficient.
• Only changed parts of the graph are re-executed.
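The caching behaviour described above can be sketched by keying each module output on the module name, its parameters, and the keys of its inputs. Everything here is an illustrative assumption (the `ModuleCache` class, the `scale` and `total` modules, the dataset identifier); it only demonstrates why changing one parameter re-executes that module and everything downstream, while an unchanged graph is served from the cache.

```python
import hashlib
import json


class ModuleCache:
    """Cache outputs of deterministic modules, keyed by content-derived hashes."""

    def __init__(self):
        self.store = {}
        self.executions = 0

    @staticmethod
    def _key(name, params, input_keys):
        payload = json.dumps([name, params, input_keys], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, name, func, params, inputs, deterministic=True):
        """inputs is a list of (key, value) pairs from upstream modules."""
        input_keys = [key for key, _ in inputs]
        if deterministic:
            key = self._key(name, params, input_keys)
            if key in self.store:
                return key, self.store[key]
        else:
            # A fresh key per run forces downstream modules to re-execute.
            key = self._key(name, params, input_keys + [f"run-{self.executions}"])
        self.executions += 1
        value = func([value for _, value in inputs], **params)
        if deterministic:
            self.store[key] = value
        return key, value


def scale(inputs, factor):
    return [x * factor for x in inputs[0]]


def total(inputs):
    return sum(inputs[0])


def run_graph(cache, factor):
    source = ("dataset-v1", [1, 2, 3])  # hypothetical dataset id and contents
    scaled_key, scaled = cache.run("scale", scale, {"factor": factor}, [source])
    _, result = cache.run("total", total, {}, [(scaled_key, scaled)])
    return result


cache = ModuleCache()
first = run_graph(cache, factor=2)    # both modules execute
second = run_graph(cache, factor=2)   # both modules are served from the cache
third = run_graph(cache, factor=3)    # parameter change re-executes both
```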
Summary
• AzureML provides distinguishing features
• Visual authoring
• Versioning and reproducibility
• Collaboration
• Architecture
• Multiple scalable services
• Implementation details
• Extensible data format that can interoperate with R & Python
• Modules provide a way to package data & code
• Lessons learned
• Data wrangling is important
• Allow user code to mitigate feature gaps
• Support big data but make small data efficient
Logistics: Getting access to AzureML
• http://azure.com/ml
• https://studio.azureml.net
• Guest access w/o sign in
• Free access with sign-in ($200 credit)
• Paid access with azure subscription
• https://manage.windowsazure.com
• Manage end points, storage accounts and workspaces
Thanks
shchikke@microsoft.com
Developing a predictive model is hard
Challenges
• Data processing
• Different sources, formats, schemas
• Missing values, noisy data
• Modeling
• Modeling choice
• Feature engineering
• Parameter tuning
• Tracking & collaboration
• Deployment & Retraining
• Productionizing/deployment of the
model
• Replication, scaling out
Solutions
• Data processing
• Languages: SQL, R, Python
• Frameworks: dplyr, pandas
• Stacks: Hadoop, Spark, MapReduce
• Modeling
• Libraries: Weka, VW, MLlib, LibSVM
• Feature engineering: gensim, NLTK
• Tuning: Spearmint, Whetlab
• Tracking & collaboration: ipynb + GitHub
• Deployment & Retraining
• Machine learning web services
Implementation detail: Schema propagation
• Schema is associated with
datasets/learners
• Dataset attributes
• Required columns for learners etc.
• Design time validation
• Module execution has latency overhead
• Schema is computed and propagated before
executing module code.
• Method: pre-determined schema calculus
• Each module class has well defined modification
of the schema
• One-off modules are encoded as exceptions
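Design-time schema propagation can be sketched by giving each module a pure function from input schema to output schema and composing those functions without touching any data. The rule functions, the module names in the error messages, and the column names are all assumptions made for this example.

```python
def select_columns(schema, keep):
    """Schema rule: the output contains only the requested columns."""
    missing = [name for name in keep if name not in schema]
    if missing:
        raise ValueError(f"Select Columns: unknown columns {missing}")
    return {name: schema[name] for name in keep}


def add_scored_label(schema, label):
    """Schema rule: a scoring step requires the label and adds one column."""
    if label not in schema:
        raise ValueError(f"Score Model: required column '{label}' is missing")
    output = dict(schema)
    output["Scored Labels"] = schema[label]
    return output


def propagate(schema, steps):
    """Apply each module's schema rule in order; no module code is executed."""
    for rule, kwargs in steps:
        schema = rule(schema, **kwargs)
    return schema


# Hypothetical input schema: column name -> column type.
source_schema = {"age": "numeric", "city": "categorical", "churn": "boolean"}

valid_graph = [
    (select_columns, {"keep": ["age", "churn"]}),
    (add_scored_label, {"label": "churn"}),
]
final_schema = propagate(source_schema, valid_graph)

# Dropping the label before scoring is caught before anything runs.
invalid_graph = [
    (select_columns, {"keep": ["age"]}),
    (add_scored_label, {"label": "churn"}),
]
try:
    propagate(source_schema, invalid_graph)
    design_time_error = None
except ValueError as error:
    design_time_error = str(error)
```

Because the rules operate on schemas alone, a UX could surface the error while the graph is still being edited.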
JES SNR interaction
[Diagram: User, Workspace, Experimentation Service, Jobs Queue, JES FE, JES Worker, Jobs State, Tasks Queue, SNR FE, SNR Worker, Tasks State]
• Stateless design, easy scalability, failover simplicity
• Optimistic concurrency, scheduling/locking overhead
• Separate shared storage, holding transient job/tasks state
• Task cache management to speed up execution and facilitate iterative experimentation
• Throttling to limit the resource usage per customer/workspace
• Plugin architecture for task handlers and schedulers
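Optimistic concurrency on shared job state, mentioned for the JES/SNR interaction, is commonly implemented with a version check on write. The sketch below is a generic, single-process illustration: `JobStateStore`, `StaleVersionError`, and `update_with_retry` are hypothetical names, and a real shared store would need the compare-and-write step to be atomic.

```python
class StaleVersionError(Exception):
    """Raised when a writer's view of a record is out of date."""


class JobStateStore:
    """Toy shared store in which every record carries a version number."""

    def __init__(self):
        self._records = {}

    def read(self, job_id):
        """Return (version, state); unknown jobs start at version 0."""
        return self._records.get(job_id, (0, None))

    def write(self, job_id, expected_version, new_state):
        """Write only if nobody else has written since expected_version."""
        current_version, _ = self.read(job_id)
        if current_version != expected_version:
            raise StaleVersionError(job_id)
        self._records[job_id] = (current_version + 1, new_state)
        return current_version + 1


def update_with_retry(store, job_id, transform, attempts=3):
    """Read, transform, and write; re-read and retry on a version conflict."""
    for _ in range(attempts):
        version, state = store.read(job_id)
        try:
            return store.write(job_id, version, transform(state))
        except StaleVersionError:
            continue
    raise StaleVersionError(job_id)


store = JobStateStore()
store.write("job-1", 0, "queued")

# Two workers read the same version; only the first write succeeds.
version_a, _ = store.read("job-1")
version_b, _ = store.read("job-1")
store.write("job-1", version_a, "running")
try:
    store.write("job-1", version_b, "failed")
    conflict = False
except StaleVersionError:
    conflict = True
```

No lock is held between the read and the write, so workers never block each other; the cost is an occasional retry when two of them collide.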

Azure SQL Database
 
Oracle OpenWorld 2014 Review Part Four - PaaS Middleware
Oracle OpenWorld 2014 Review Part Four - PaaS MiddlewareOracle OpenWorld 2014 Review Part Four - PaaS Middleware
Oracle OpenWorld 2014 Review Part Four - PaaS Middleware
 
Developing SharePoint Framework Solutions for the Enterprise - SEF 2019
Developing SharePoint Framework Solutions for the Enterprise - SEF 2019Developing SharePoint Framework Solutions for the Enterprise - SEF 2019
Developing SharePoint Framework Solutions for the Enterprise - SEF 2019
 
Teradata Partners 2011 - Utilizing Teradata Express For Development And Sandb...
Teradata Partners 2011 - Utilizing Teradata Express For Development And Sandb...Teradata Partners 2011 - Utilizing Teradata Express For Development And Sandb...
Teradata Partners 2011 - Utilizing Teradata Express For Development And Sandb...
 
DSD-INT 2020 Scripting a Delft-FEWS configuration - Verkade
DSD-INT 2020 Scripting a Delft-FEWS configuration - VerkadeDSD-INT 2020 Scripting a Delft-FEWS configuration - Verkade
DSD-INT 2020 Scripting a Delft-FEWS configuration - Verkade
 
Making Data Scientists Productive in Azure
Making Data Scientists Productive in AzureMaking Data Scientists Productive in Azure
Making Data Scientists Productive in Azure
 
Software variability management - 2017
Software variability management - 2017Software variability management - 2017
Software variability management - 2017
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
8. Software Development Security
8. Software Development Security8. Software Development Security
8. Software Development Security
 
Azure Cosmos DB - The Swiss Army NoSQL Cloud Database
Azure Cosmos DB - The Swiss Army NoSQL Cloud DatabaseAzure Cosmos DB - The Swiss Army NoSQL Cloud Database
Azure Cosmos DB - The Swiss Army NoSQL Cloud Database
 
CosmosDB.pptx
CosmosDB.pptxCosmosDB.pptx
CosmosDB.pptx
 
Paa sing a java ee 6 application kshitiz saxena
Paa sing a java ee 6 application   kshitiz saxenaPaa sing a java ee 6 application   kshitiz saxena
Paa sing a java ee 6 application kshitiz saxena
 

Plus de PAPIs.io

Shortening the time from analysis to deployment with ml as-a-service — Luiz A...
Shortening the time from analysis to deployment with ml as-a-service — Luiz A...Shortening the time from analysis to deployment with ml as-a-service — Luiz A...
Shortening the time from analysis to deployment with ml as-a-service — Luiz A...PAPIs.io
 
Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017
Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017
Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017PAPIs.io
 
Extracting information from images using deep learning and transfer learning ...
Extracting information from images using deep learning and transfer learning ...Extracting information from images using deep learning and transfer learning ...
Extracting information from images using deep learning and transfer learning ...PAPIs.io
 
Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...
Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...
Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...PAPIs.io
 
Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...
Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...
Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...PAPIs.io
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...PAPIs.io
 
Building machine learning applications locally with Spark — Joel Pinho Lucas ...
Building machine learning applications locally with Spark — Joel Pinho Lucas ...Building machine learning applications locally with Spark — Joel Pinho Lucas ...
Building machine learning applications locally with Spark — Joel Pinho Lucas ...PAPIs.io
 
Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...
Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...
Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...PAPIs.io
 
A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv)...
A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv)...A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv)...
A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv)...PAPIs.io
 
Scaling machine learning as a service at Uber — Li Erran Li at #papis2016
Scaling machine learning as a service at Uber — Li Erran Li at #papis2016Scaling machine learning as a service at Uber — Li Erran Li at #papis2016
Scaling machine learning as a service at Uber — Li Erran Li at #papis2016PAPIs.io
 
Real-world applications of AI - Daniel Hulme @ PAPIs Connect
Real-world applications of AI - Daniel Hulme @ PAPIs ConnectReal-world applications of AI - Daniel Hulme @ PAPIs Connect
Real-world applications of AI - Daniel Hulme @ PAPIs ConnectPAPIs.io
 
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...PAPIs.io
 
Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...
Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...
Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...PAPIs.io
 
Demystifying Deep Learning - Roberto Paredes Palacios @ PAPIs Connect
Demystifying Deep Learning - Roberto Paredes Palacios @ PAPIs ConnectDemystifying Deep Learning - Roberto Paredes Palacios @ PAPIs Connect
Demystifying Deep Learning - Roberto Paredes Palacios @ PAPIs ConnectPAPIs.io
 
Predictive APIs: What about Banking? - Natalino Busa @ PAPIs Connect
Predictive APIs: What about Banking? - Natalino Busa @ PAPIs ConnectPredictive APIs: What about Banking? - Natalino Busa @ PAPIs Connect
Predictive APIs: What about Banking? - Natalino Busa @ PAPIs ConnectPAPIs.io
 
Microdecision making in financial services - Greg Lamp @ PAPIs Connect
Microdecision making in financial services - Greg Lamp @ PAPIs ConnectMicrodecision making in financial services - Greg Lamp @ PAPIs Connect
Microdecision making in financial services - Greg Lamp @ PAPIs ConnectPAPIs.io
 
Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...
Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...
Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...PAPIs.io
 
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...PAPIs.io
 
How to predict the future of shopping - Ulrich Kerzel @ PAPIs Connect
How to predict the future of shopping - Ulrich Kerzel @ PAPIs ConnectHow to predict the future of shopping - Ulrich Kerzel @ PAPIs Connect
How to predict the future of shopping - Ulrich Kerzel @ PAPIs ConnectPAPIs.io
 
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...PAPIs.io
 

Plus de PAPIs.io (20)

Shortening the time from analysis to deployment with ml as-a-service — Luiz A...
Shortening the time from analysis to deployment with ml as-a-service — Luiz A...Shortening the time from analysis to deployment with ml as-a-service — Luiz A...
Shortening the time from analysis to deployment with ml as-a-service — Luiz A...
 
Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017
Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017
Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017
 
Extracting information from images using deep learning and transfer learning ...
Extracting information from images using deep learning and transfer learning ...Extracting information from images using deep learning and transfer learning ...
Extracting information from images using deep learning and transfer learning ...
 
Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...
Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...
Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...
 
Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...
Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...
Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
 
Building machine learning applications locally with Spark — Joel Pinho Lucas ...
Building machine learning applications locally with Spark — Joel Pinho Lucas ...Building machine learning applications locally with Spark — Joel Pinho Lucas ...
Building machine learning applications locally with Spark — Joel Pinho Lucas ...
 
Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...
Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...
Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...
 
A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv)...
A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv)...A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv)...
A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv)...
 
Scaling machine learning as a service at Uber — Li Erran Li at #papis2016
Scaling machine learning as a service at Uber — Li Erran Li at #papis2016Scaling machine learning as a service at Uber — Li Erran Li at #papis2016
Scaling machine learning as a service at Uber — Li Erran Li at #papis2016
 
Real-world applications of AI - Daniel Hulme @ PAPIs Connect
Real-world applications of AI - Daniel Hulme @ PAPIs ConnectReal-world applications of AI - Daniel Hulme @ PAPIs Connect
Real-world applications of AI - Daniel Hulme @ PAPIs Connect
 
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
 
Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...
Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...
Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...
 
Demystifying Deep Learning - Roberto Paredes Palacios @ PAPIs Connect
Demystifying Deep Learning - Roberto Paredes Palacios @ PAPIs ConnectDemystifying Deep Learning - Roberto Paredes Palacios @ PAPIs Connect
Demystifying Deep Learning - Roberto Paredes Palacios @ PAPIs Connect
 
Predictive APIs: What about Banking? - Natalino Busa @ PAPIs Connect
Predictive APIs: What about Banking? - Natalino Busa @ PAPIs ConnectPredictive APIs: What about Banking? - Natalino Busa @ PAPIs Connect
Predictive APIs: What about Banking? - Natalino Busa @ PAPIs Connect
 
Microdecision making in financial services - Greg Lamp @ PAPIs Connect
Microdecision making in financial services - Greg Lamp @ PAPIs ConnectMicrodecision making in financial services - Greg Lamp @ PAPIs Connect
Microdecision making in financial services - Greg Lamp @ PAPIs Connect
 
Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...
Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...
Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...
 
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
 
How to predict the future of shopping - Ulrich Kerzel @ PAPIs Connect
How to predict the future of shopping - Ulrich Kerzel @ PAPIs ConnectHow to predict the future of shopping - Ulrich Kerzel @ PAPIs Connect
How to predict the future of shopping - Ulrich Kerzel @ PAPIs Connect
 
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...
 

Dernier

Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 

Dernier (20)

Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 

[Research] azure ml anatomy of a machine learning service - Sharat Chikkerur

  • 1. Microsoft Azure Machine Learning Anatomy of a machine learning service Sharat Chikkerur, Senior Software Engineer, Microsoft (On behalf of AzureML team)
  • 2. Microsoft Azure Machine Learning (AzureML) • AzureML is a cloud-hosted tool for creating and deploying machine learning models • Browser-based, zero-installation and cross-platform • Describe workflows graphically • Workflows are versioned and support reproducibility • Models can be programmatically retrained • Models can be deployed to Azure as a scalable web service • Can be scaled to 1000+ endpoints x 200 response containers per service • Supports versioning, collaboration & monetization
  • 3. Outline • Distinguishing features (functional components) of AzureML • Architectural components of AzureML • Implementation details • Lessons learned
  • 7. Versioning • Each run of an experiment is versioned • Can go back in time and examine historical results • Intermediate results cached across experiments in workspace • Each dataset has a unique source transformation
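The slide does not say how cached intermediate results are identified. One way to read "each dataset has a unique source transformation" is that a result can be keyed by the module that produced it, its parameters and its inputs. The sketch below is only an illustration under that assumption; `cache_key`, `ResultCache` and the module names are hypothetical and not part of AzureML.

```python
import hashlib
import json


def cache_key(module_name, params, input_keys):
    """Derive a deterministic key from a module, its parameters and its input keys."""
    payload = json.dumps(
        {"module": module_name, "params": params, "inputs": list(input_keys)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResultCache:
    """In-memory stand-in for a workspace-wide cache of intermediate results."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def run(self, module_name, params, input_keys, compute):
        """Return the cached result for this transformation, computing it only once."""
        key = cache_key(module_name, params, input_keys)
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = compute()
        return key, self._store[key]


cache = ResultCache()
raw = cache_key("read_csv", {"path": "data.csv"}, [])  # hypothetical source dataset
k1, v1 = cache.run("clean_missing", {"strategy": "mean"}, [raw], lambda: [1.0, 2.0, 1.5])
# Same module, parameters and input: the second compute function is never called.
k2, v2 = cache.run("clean_missing", {"strategy": "mean"}, [raw], lambda: [0.0])
```

Because the key of an output depends on the keys of its inputs, changing an upstream parameter changes every downstream key, so stale results are not reused.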
  • 8. Collaboration • Workspaces can be shared between multiple users • However, two users cannot edit the same experiment simultaneously • Any experiment can be pushed to a common AzureML gallery • Allows experiments, models and transforms to be easily shared with the AzureML user community
  • 9. External Language Support • Full-fidelity support for R, Python and SQL (via SQLite) • AzureML datasets marshalled transparently • R models marshalled into AzureML models • Scripts available as part of operationalized web services • Code isolation • External language modules are executed within Drawbridge (container) • “Batteries included” • R 3.1.0 with ~500 packages, Anaconda Python 2.7 with ~120 packages
  • 10. Operationalization • An experiment to be operationalized must be converted into a “scoring” experiment • Training and scoring experiments are “linked”
  • 11. Operationalization • A successful scoring experiment can be published as a web service • Published web services are automatically managed, scaled out and load-balanced • Web service available in two flavors • Request/Response: Low-latency endpoint for scoring a single row at a time • Batch: Endpoint for scoring a collection of records from Azure storage
  • 12. Monetization • Data marketplace (http://datamarket.azure.com) allows users to monetize data models • Supports • Web services published through AzureML • Standalone web services • Integration • Python/R modules can query external web services (including marketplace APIs), allowing functional composition
  • 14. Component services • Studio (UX) • Experimentation Service (ES) • Composed of micro-services • Job Execution Service (JES) • Single Node Runtime (SNR) • Request Response Service (RRS) • Batch Execution Service (BES) [Architecture diagram: User, UX, ES, JES, SNR, RRS, BES]
  • 15. Studio (UX) • Primary UX layer • Single page application • Asset Palette • Datasets • Algorithms • Trained models • External language modules • Experiment canvas • DAG consisting of modules • Module properties • Parameters • Action bar • Commands to ES
  • 16. Experimentation Service (ES) • Primary backend • Orchestrates all component services • Handles events to/from UX • Programmatic access • RESTful API (UX communicates this way) • Features • Experiment introspection • Experiment manipulation/creation • Consists of micro-services • UX, assets, authentication, packing etc.
  • 17. Job Execution Service (JES) • Primary job scheduler • Dependency tracking • Experiment DAG defines dependencies between modules • Topological sort used to determine the order of execution • Parallel execution • Different experiments can be executed in parallel • Modules that exist at the same depth in the DAG can be scheduled in parallel • Note: JES itself does not execute the task payload. Tasks are dispatched to a task queue
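The scheduling idea on this slide can be sketched with a level-by-level topological sort (Kahn's algorithm): every module in one level has all of its dependencies satisfied by earlier levels, so modules in the same level could be dispatched in parallel. This is a minimal illustration, not the JES implementation; `schedule_levels` and the module names are hypothetical.

```python
from collections import defaultdict


def schedule_levels(dependencies):
    """Group DAG nodes into levels whose members have no unmet dependencies.

    `dependencies` maps each module to the list of modules it depends on.
    """
    indegree = {node: len(deps) for node, deps in dependencies.items()}
    dependents = defaultdict(list)
    for node, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(node)
            indegree.setdefault(dep, 0)  # modules that only appear as dependencies

    ready = sorted(n for n, d in indegree.items() if d == 0)
    levels = []
    scheduled = 0
    while ready:
        levels.append(ready)
        scheduled += len(ready)
        next_ready = []
        for node in ready:
            for child in dependents[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)
    if scheduled != len(indegree):
        raise ValueError("experiment graph contains a cycle")
    return levels


# Hypothetical experiment graph: module -> modules it depends on.
experiment = {
    "read_train": [],
    "read_test": [],
    "clean_train": ["read_train"],
    "clean_test": ["read_test"],
    "train_model": ["clean_train"],
    "score_model": ["train_model", "clean_test"],
}
levels = schedule_levels(experiment)
# levels[0] holds both readers, levels[1] both cleaning steps, then training, then scoring.
```

In a real scheduler each level would be pushed to a task queue rather than returned, which matches the note that JES does not run the payload itself.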
  • 18. Single Node Runtime (SNR) • Executes tasks dispatched from JES • Consumes tasks from a queue • Tasks consist of an input specification along with module parameters • Stateless: data required for execution is copied over • Each SNR contains a copy of Runtime + modules • Runtime: DataTables, array implementation, I/O, base classes etc. • Modules: machine learning algorithms • SNR pool shared across deployment • Size of the pool can be scaled based on demand
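A stateless worker pool that consumes tasks from a queue can be sketched with the standard library as follows. Each task carries its own input and parameters, and the worker keeps nothing between tasks. The `MODULES` registry, the task fields and `run_pool` are assumptions made for illustration; the real SNR runs machine learning modules on separate nodes, not threads.

```python
import queue
import threading

# Hypothetical module registry standing in for the modules bundled with each SNR.
MODULES = {
    "scale": lambda data, factor: [x * factor for x in data],
    "sum": lambda data: sum(data),
}


def worker(tasks, results):
    """Consume tasks until a None sentinel arrives; keep no state between tasks."""
    while True:
        task = tasks.get()
        if task is None:
            tasks.task_done()
            break
        module = MODULES[task["module"]]
        # The task carries its own copy of the input data and the module parameters.
        output = module(list(task["input"]), **task["params"])
        results.put((task["id"], output))
        tasks.task_done()


def run_pool(task_list, pool_size=2):
    """Run all tasks on a pool of workers and collect outputs by task id."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(pool_size)
    ]
    for t in threads:
        t.start()
    for task in task_list:
        tasks.put(task)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    outputs = {}
    while not results.empty():
        task_id, output = results.get()
        outputs[task_id] = output
    return outputs


outputs = run_pool([
    {"id": "t1", "module": "scale", "input": [1, 2, 3], "params": {"factor": 2}},
    {"id": "t2", "module": "sum", "input": [1, 2, 3], "params": {}},
])
```

Because workers hold no state, changing `pool_size` is enough to scale the pool up or down.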
  • 19. Machine learning algorithms • Sources of machine learning module assets • Microsoft Research • Infer.NET (http://research.microsoft.com/en-us/um/cambridge/projects/infernet/) • Vowpal Wabbit (http://hunch.net) • Open source • LibSVM • Pegasos • OpenCV • R • Scikit-learn
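  • 20. Category | Sub category | Module | Reference
    Supervised | Binary classification | Average Perceptron | (Freund & Schapire, 1999)
    Supervised | Binary classification | Bayes point machine | (Herbrich, Graepel, & Campbell, 2001)
    Supervised | Binary classification | Boosted decision tree | (Burges, 2010)
    Supervised | Binary classification | Decision jungle | (Shotton et al., 2013)
    Supervised | Binary classification | Locally Deep SVM | (Jose & Goyal, 2013)
    Supervised | Binary classification | Logistic regression | (Duda, Hart, & Stork, 2000)
    Supervised | Binary classification | Neural network | (Bishop, 1995)
    Supervised | Binary classification | Online SVM | (Shalev-Shwartz et al., 2011)
    Supervised | Binary classification | Vowpal Wabbit | (Langford et al., 2007)
    Supervised | Multiclass | Decision Forest | (Criminisi, 2011)
    Supervised | Multiclass | Decision Jungle | (Shotton et al., 2013)
    Supervised | Multiclass | Multinomial regression | (Andrew & Gao, 2007)
    Supervised | Multiclass | Neural network | (Bishop, 1995)
    Supervised | Multiclass | One-vs-all | (Rifkin & Klautau, 2004)
    Supervised | Multiclass | Vowpal Wabbit | (Langford et al., 2007)
    Supervised | Regression | Bayesian linear regression | (Herbrich et al., 2001)
    Supervised | Regression | Boosted decision tree regression | (Burges, 2010)
    Supervised | Regression | Linear regression (batch and online) | (Bottou, 2010)
    Supervised | Regression | Decision Forest regression | (Criminisi, 2011)
    Supervised | Regression | Random forest based quantile regression | (Criminisi, 2011)
    Supervised | Regression | Neural network based regression | (Bishop, 1995)
    Supervised | Regression | Ordinal regression | (McCullagh, 1980)
    Supervised | Regression | Poisson regression | (Nelder & Wedderburn, 1972)
    Supervised | Recommendation | Matchbox recommender | (Stern et al., 2009)
    Unsupervised | Clustering | K-means clustering | (Jain, 2010)
    Unsupervised | Anomaly detection | One class SVM | (Schölkopf, Platt, Shawe-Taylor, Smola, & Williamson, 2001)
    Unsupervised | Anomaly detection | PCA based anomaly detection | (Duda et al., 2000)
    Feature selection | Filter | Filter based feature selection | (Guyon & Elisseeff, 2003)
    Text analytics | Topic modeling | Online LDA using Vowpal Wabbit | (Hoffman, Blei, & Bach, 2010)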
  • 21. Request Response Service (RRS) and Batch Execution Service (BES) • RRS • Handles RESTful requests for a single prediction • Requests may execute the full graph • Can include data transformation before and after prediction • Distinguishing feature compared to other web services • Models and required datasets in the graph are compiled to a static package • Executes in-memory and on a single machine • Can scale based on volume of requests • BES • Optimized for batch requests. Similar to the training workflow
  • 23. Implementation details: Data representation • “DataTable” • Similar to R/Pandas dataframe • Column major organization with sliced and random access • Has a rich schema • Names: Allows re-ordering • Purpose: Weights, Features, Labels etc. • Stored as compressed 2D tiles • “wide” tiles enable streaming access • “narrow” tiles enable full column access • Interoperability • Can be marshalled in/out as R/Pandas dataframe • Can be exported as CSV, TSV, SQL • [Figure: tile storage layout, Index 1/Block 1, Index 2/Block 2, Index 3/Block 3]
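A toy version of tiled, compressed, column-major storage might look like the following. It is a sketch under assumptions, not the DataTable format itself: the class `TiledTable`, the JSON + zlib encoding and the tile-size parameters are all chosen here for illustration. Setting `cols_per_tile` to the full width gives "wide" tiles that suit row streaming; setting `rows_per_tile` to the full height gives "narrow" tiles that suit whole-column reads.

```python
import json
import zlib


class TiledTable:
    """Hypothetical table stored as compressed 2D tiles of column slices."""

    def __init__(self, columns, rows_per_tile, cols_per_tile):
        # columns: dict mapping column name -> list of values (column-major).
        self.names = list(columns)
        self.n_rows = len(next(iter(columns.values()))) if columns else 0
        self.rows_per_tile = rows_per_tile
        self.cols_per_tile = cols_per_tile
        self.tiles = {}
        for r0 in range(0, self.n_rows, rows_per_tile):
            for c0 in range(0, len(self.names), cols_per_tile):
                block = [columns[name][r0:r0 + rows_per_tile]
                         for name in self.names[c0:c0 + cols_per_tile]]
                key = (r0 // rows_per_tile, c0 // cols_per_tile)
                self.tiles[key] = zlib.compress(json.dumps(block).encode("utf-8"))

    def _load(self, key):
        return json.loads(zlib.decompress(self.tiles[key]))

    def _n_row_tiles(self):
        return -(-self.n_rows // self.rows_per_tile)  # ceiling division

    def column(self, name):
        """Full column access: touches only the tiles holding this column."""
        tile_col, offset = divmod(self.names.index(name), self.cols_per_tile)
        values = []
        for tile_row in range(self._n_row_tiles()):
            values.extend(self._load((tile_row, tile_col))[offset])
        return values

    def iter_rows(self):
        """Streaming access: decompresses one band of row tiles at a time."""
        n_col_tiles = -(-len(self.names) // self.cols_per_tile)
        for tile_row in range(self._n_row_tiles()):
            slices = [col for tile_col in range(n_col_tiles)
                      for col in self._load((tile_row, tile_col))]
            yield from zip(*slices)


table = TiledTable(
    {"a": [1, 2, 3, 4, 5], "b": ["x", "y", "z", "w", "v"], "c": [0.5, 1.5, 2.5, 3.5, 4.5]},
    rows_per_tile=2,
    cols_per_tile=2,
)
column_c = table.column("c")
rows = list(table.iter_rows())
```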
  • 24. Implementation details: Modules • Functional units in an experiment graph • Encapsulate: data sources & sinks, models, algorithms, scripts • Categories • Data ingress • Supported sources: CSV, TSV, ARFF, LibSVM, SQL, Hive • Type guessing for CSV, TSV (allows override) • Data manipulation • Cleaning missing values, SQL Transformation, R & Python scripts • Modeling • Machine learning algorithms • Supervised: binary classification, multiclass classification, linear regression, ordinal regression, recommendation • Unsupervised: PCA, k-means • Optimization • Parameter sweep
  • 25. Implementation details: Modules • Ports • Define input and output contracts • Allow multiple input formats per port • I/O handling is done externally to the module through pluggable port handlers • Allow UX to validate inputs at design time • Parameters • Strongly typed • Support conditional parameters • Can be marked as a ‘web service’ parameter – substituted at query time • Support ranges (for parameter sweep)
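One way to picture strongly typed parameters that can be substituted at query time is sketched below. The `Parameter` class and `resolve` function are hypothetical names used only for this illustration; conditional parameters are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class Parameter:
    """Hypothetical typed module parameter."""
    name: str
    dtype: type
    default: Any
    web_service: bool = False  # may be overridden at query time
    value_range: Optional[Tuple[float, float]] = None  # also usable by a sweep

    def validate(self, value):
        if not isinstance(value, self.dtype):
            raise TypeError(f"{self.name}: expected {self.dtype.__name__}")
        if self.value_range is not None:
            low, high = self.value_range
            if not low <= value <= high:
                raise ValueError(f"{self.name}: {value} outside [{low}, {high}]")
        return value


def resolve(parameters, design_time, query_time):
    """Combine defaults, design-time settings and query-time overrides."""
    resolved = {}
    for param in parameters:
        value = design_time.get(param.name, param.default)
        if param.name in query_time:
            if not param.web_service:
                raise KeyError(f"{param.name} is not a web service parameter")
            value = query_time[param.name]
        resolved[param.name] = param.validate(value)
    return resolved


params = [
    Parameter("learning_rate", float, 0.1, web_service=True, value_range=(0.0, 1.0)),
    Parameter("iterations", int, 10),
]
settings = resolve(params, {"iterations": 20}, {"learning_rate": 0.5})
```

Only parameters flagged as web service parameters accept a query-time value, so the rest of the graph stays fixed once the experiment is published.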
  • 26. Implementation details: Testing • Standard tests • UX tests • Web services penetration testing • Services integration test • AzureML-specific tests • Module properties tests • Schema propagation tests • E2E experiment tests • Operationalized experiment tests • “Runners” test • Machine learning tests • Accuracy tests • Fuzz testing (boundary values testing) • Golden values tests • Auto-generated tests
  • 28. Lesson: Data wrangling is important • More time is spent on data wrangling than on model building • “A data scientist spends nearly 80% of the time cleaning data” – NY Times (http://nyti.ms/1t8IzfE) • Data manipulation modules are very popular • Internal ranking • “Execute R script” and “SQL Transform” modules are more popular than machine learning modules. • It is hard to anticipate all data pre-processing needs • Need to provide custom processing support • SQL Transform • Execute R script • Execute Python script
  • 29. Lesson: Make big data possible, but small data efficient • Distributed machine learning comes with a large overhead (Zaharia et al., 2010) • Typical data science workflows enable exploration with small amounts of data • Should make this effortless and intuitive • AzureML approach: “Make big data possible, but small data efficient” • Make sure all experiment graphs can handle the data size. • Support ingress of large data – SQL, Azure • Support features to pre-process big data • Feature selection • Feature hashing • Learning by counts – reduces high dimensional data to lower dimensional historic counts/rates • Support streaming algorithms for big data (e.g. “Train Vowpal Wabbit”)
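Two of the pre-processing ideas on this slide, feature hashing and learning by counts, are small enough to sketch directly. The functions below are illustrative assumptions, not the AzureML modules: `hash_features` maps tokens into a fixed number of buckets with CRC32, and `count_features` replaces each categorical value with its historic count and positive-label rate.

```python
import zlib
from collections import defaultdict


def hash_features(tokens, n_buckets):
    """Feature hashing: fixed-size count vector regardless of vocabulary size."""
    vector = [0] * n_buckets
    for token in tokens:
        # CRC32 is deterministic across runs, unlike Python's built-in hash().
        vector[zlib.crc32(token.encode("utf-8")) % n_buckets] += 1
    return vector


def count_features(values, labels):
    """Learning by counts: value -> (times seen, fraction with label 1)."""
    counts = defaultdict(lambda: [0, 0])  # value -> [n_negative, n_positive]
    for value, label in zip(values, labels):
        counts[value][label] += 1
    return {value: (neg + pos, pos / (neg + pos))
            for value, (neg, pos) in counts.items()}


hashed = hash_features(["red", "green", "red"], n_buckets=8)
rates = count_features(["x", "x", "y"], [1, 0, 1])
```

Both transforms bound the feature dimension up front, which is what lets a downstream learner work on a high-cardinality column without holding the full vocabulary in memory.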
  • 30. Lesson: Feature gaps are inevitable • Cannot cover all possible pre-processing scenarios • Cannot provide all algorithms • Support for scripting (R, Python, SQL) • Allow custom data manipulation • Allow users to bring in external libraries • Allow users to call into other web services • Isolate user code • Support during operationalization • Support custom modules • Allow users to author first-class “modules” • Allow users to mix custom modules in the workflow
  • 31. Lesson: Data science workflows should be reproducible • Data science workflows are iterative, exploratory and collaborative • Need to provide a way to version and capture the workflow, settings, inputs etc. • Make it easy to repeat the same experiment • Reproducibility • Capture random number seeds as part of the experiment. • Same settings should produce the same results • Re-running parts of the graph should be efficient. • “Determinism” • Modules are tagged as deterministic (e.g. SQL transform) or non-deterministic (e.g. Hive query) • A graph can also be labeled as deterministic or non-deterministic • Caching • Outputs from deterministic modules are cached to make re-runs efficient. • Only changed parts of the graph are re-executed.
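The caching rule on this slide can be illustrated with a content-addressed cache. This is a minimal sketch, assuming that a module's output is identified by its name, its parameters (including any random seed) and the keys of its upstream outputs; `run_module` and the key scheme are hypothetical, not the service's actual design.

```python
import hashlib
import json


def run_module(cache, name, func, params, upstream=(), deterministic=True):
    """Run one graph node; returns (cache_key, value).

    The key is None when the result cannot be cached, i.e. the module is
    non-deterministic or depends on a non-cacheable upstream output.
    """
    input_keys = [key for key, _ in upstream]
    cacheable = deterministic and all(key is not None for key in input_keys)
    key = None
    if cacheable:
        payload = json.dumps([name, params, input_keys], sort_keys=True)
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key in cache:
            return key, cache[key]
    value = func(*[value for _, value in upstream], **params)
    if cacheable:
        cache[key] = value
    return key, value


calls = []


def load(n):
    calls.append("load")
    return list(range(n))


def scale(data, factor):
    calls.append("scale")
    return [x * factor for x in data]


cache = {}
first_load = run_module(cache, "load", load, {"n": 3})
first_scale = run_module(cache, "scale", scale, {"factor": 2}, upstream=[first_load])

# Re-run the graph with one changed parameter: only "scale" executes again.
second_load = run_module(cache, "load", load, {"n": 3})
second_scale = run_module(cache, "scale", scale, {"factor": 3}, upstream=[second_load])
```

Because a non-deterministic module yields no key, everything downstream of it is also re-executed, which matches the idea of labeling a whole graph as non-deterministic.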
  • 32. Summary • AzureML provides distinguishing features • Visual authoring • Versioning and reproducibility • Collaboration • Architecture • Multiple scalable services • Implementation details • Extensible data format that can interoperate with R & Python • Modules provide a way to package data & code • Lessons learned • Data wrangling is important • Allow user code to mitigate feature gaps • Support big data but make small data efficient
  • 33. Logistics: Getting access to AzureML • http://azure.com/ml • https://studio.azureml.net • Guest access without sign-in • Free access with sign-in ($200 credit) • Paid access with Azure subscription • https://manage.windowsazure.com • Manage end points, storage accounts and workspaces
  • 36. Developing a predictive model is hard • Challenges • Data processing • Different sources, formats, schemas • Missing values, noisy data • Modeling • Modeling choice • Feature engineering • Parameter tuning • Tracking & collaboration • Deployment & Retraining • Productionizing/deployment of the model • Replication, scaling out • Solutions • Data processing • Languages: SQL, R, Python • Frameworks: dplyr, pandas • Stacks: Hadoop, Spark, MapReduce • Modeling • Libraries: Weka, VW, MLlib, LibSVM • Feature engineering: gensim, NLTK • Tuning: Spearmint, Whetlab • Tracking & collaboration: ipynb + GitHub • Deployment & Retraining • Machine learning web services
  • 37. Implementation details: Schema propagation • Schema is associated with datasets/learners • Dataset attributes • Required columns for learners etc. • Design time validation • Module execution has latency overhead • Schema is computed and propagated before executing module code. • Method: pre-determined schema calculus • Each module class has a well-defined modification of the schema • One-off modules are encoded as exceptions
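A schema calculus of this kind can be sketched as one pure function per module class that maps an input schema to an output schema, applied along the graph before any data is touched. The rule names, the `SCHEMA_RULES` table and the "Scored Labels" output column below are assumptions made for the example.

```python
def select_columns(schema, params):
    missing = [c for c in params["columns"] if c not in schema]
    if missing:
        raise ValueError(f"unknown columns: {missing}")
    return {c: schema[c] for c in params["columns"]}


def add_column(schema, params):
    return {**schema, params["name"]: params["dtype"]}


def score_model(schema, params):
    # A learner declares a required column; the check runs at design time.
    if params["label"] not in schema:
        raise ValueError(f"required label column missing: {params['label']}")
    return {**schema, "Scored Labels": schema[params["label"]]}


# Hypothetical registry: module class -> schema transformation.
SCHEMA_RULES = {
    "select_columns": select_columns,
    "add_column": add_column,
    "score_model": score_model,
}


def propagate(input_schema, steps):
    """Return the schema after each step without executing any module code."""
    schemas = [input_schema]
    for module_class, params in steps:
        schemas.append(SCHEMA_RULES[module_class](schemas[-1], params))
    return schemas


source = {"age": "int", "income": "float", "label": "bool"}
pipeline = [
    ("select_columns", {"columns": ["age", "label"]}),
    ("add_column", {"name": "ratio", "dtype": "float"}),
    ("score_model", {"label": "label"}),
]
schemas = propagate(source, pipeline)
```

Since only schemas flow through `propagate`, a mistake such as selecting a column that does not exist is reported immediately rather than after a long-running module has started.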
  • 38. JES SNR interaction • [Diagram: User, Workspace, Experimentation Service, JES FE, JES Worker, SNR FE, SNR Worker, Jobs Queue, Tasks Queue, Jobs State, Tasks State] • Stateless design, easy scalability, failover simplicity • Optimistic concurrency, scheduling/locking overhead • Separate shared storage, holding transient job/tasks state • Task cache management to speed up execution and facilitate iterative experimentation • Throttling to limit the resource usage per customer/workspace • Plugin architecture for task handlers and schedulers
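Optimistic concurrency over shared job/task state is commonly implemented with versioned writes: a worker reads a value together with its version and the write succeeds only if the version has not changed in the meantime. The sketch below assumes that pattern; `StateStore` and `update` are hypothetical names and the in-memory dict stands in for the shared storage.

```python
class StateStore:
    """Hypothetical shared store holding (version, value) per key."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key, (0, None))

    def write_if_unchanged(self, key, expected_version, value):
        version, _ = self.read(key)
        if version != expected_version:
            return False  # someone else wrote first; caller must retry
        self._data[key] = (version + 1, value)
        return True


def update(store, key, change, max_retries=5):
    """Read, modify and write back without holding a lock."""
    for _ in range(max_retries):
        version, value = store.read(key)
        if store.write_if_unchanged(key, version, change(value)):
            return True
    return False


store = StateStore()
update(store, "job-1", lambda _: "queued")

# Two workers read the same version; only the first write is accepted.
version_a, _ = store.read("job-1")
version_b, _ = store.read("job-1")
first_ok = store.write_if_unchanged("job-1", version_a, "running")
second_ok = store.write_if_unchanged("job-1", version_b, "failed")
final_state = store.read("job-1")
```

No lock is held while a worker computes, so the cost of a conflict is a retry rather than blocked workers.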