zekeLabs
Machine Learning at Scale
Development to Deployment
Skilling for the Future
www.zekeLabs.com
Visit : www.zekeLabs.com for more details
THANK YOU
Let us know how we can help your organization upskill its
employees to stay updated in the ever-evolving IT industry.
Get in touch:
www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com
Modules
1. Understanding Machine Learning Ecosystem
2. The Machine Learning Pipeline & Product stories
3. Data Challenges
4. Taking Machine Learning to Scale using Spark & Kafka
5. Knowing the Unknowns
Module 1
Understanding
Machine Learning
Ecosystem
● Black box Introduction to Machine Learning
● Types of Machine Learning
● Components of AI
● The AI Timeline
Black Box Introduction to ML
What is not Machine Learning ?
● Rule Based Approach
● Legacy Systems
What is Machine Learning ?
● Solve prediction problem
● Logic is learned from examples & not by rules
[Diagram labels: Training Data, Learning Algorithm, Prediction Function or Trained Model, Input Data]
Types of Machine Learning
Machine Learning
● Supervised - Task Driven
● Unsupervised - Data Driven
● Reinforcement - Environment Driven
Spam Mail Detection
● Input - Mail
● Output - Spam or Ham
● Supervised Machine Learning
● Binary Classification Problem
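The spam example can be sketched with scikit-learn. This is a minimal illustration, assuming a bag-of-words representation and a Multinomial Naive Bayes classifier; the four mails and their labels are invented for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (invented): input is the mail text, output is spam or ham.
mails = [
    "win free money now",
    "free prize claim now",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

# The logic is learned from labelled examples, not written as rules.
spam_model = make_pipeline(CountVectorizer(), MultinomialNB())
spam_model.fit(mails, labels)

predictions = spam_model.predict(["claim free money", "team meeting tomorrow"])
print(list(predictions))
```

A real spam filter would need far more data and a held-out test set; the point here is only the shape of a binary classification problem.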
Predicting Lift Failure
● Input - Sensor Data
● Output - Failure time
● Supervised Machine Learning
● Regression Problem
Predicting Insurance Amount
● Input - Accident details
● Output - Insurance amount
● Supervised Machine Learning
● Regression Problem
Medical Diagnosis
● Input - Patient Synopsis (fever, temperature, BP, etc.)
● Output - Diagnosis
● Supervised Machine Learning
● Multi-class classification Problem
Question - What is common between them ?
Market Segmentation
● Input - Customer Details
● Output - Clusters
● Unsupervised Machine Learning
● Clustering Problem
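A possible sketch of market segmentation as a clustering problem, assuming scikit-learn's KMeans and two made-up customer attributes (age and yearly spend). The data and the choice of two clusters are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer details: [age, yearly spend]. Values are invented.
customers = np.array([
    [22, 300], [25, 350], [24, 320],
    [58, 2100], [61, 2300], [60, 2200],
])

# No labels are given; the algorithm groups similar customers on its own.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(customers)
print(segments)
```

The cluster numbers themselves carry no meaning; only the grouping does. On real data the features would normally be scaled first so that one attribute does not dominate the distance.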
Robot playing Football
● Input - Player information,
Rewards
● Output - Action to score
● Reinforcement Learning
What does AI consist of ?
The A.I. Timeline
Module 2
Machine Learning
Pipeline
● Understanding Machine Learning Pipeline
● User Story - Automating customer support
● Implementation
● User Story - Fast Query Chatbots
● Implementation
Machine Learning Pipeline
Machine Learning Pipeline - Business Understanding
● Business understanding includes clarity on what you are trying to achieve.
● Machine learning is rarely effective with very small datasets.
● Consolidate the data pipeline to channel a continuous flow of data.
● Data sources: web scraping, data lake access, REST APIs etc.
Machine Learning Pipeline - Data Wrangling
● Production data is never clean.
● It needs a major effort (around 70% of total effort) to make it ready for the next stage.
● Transforming & mapping data from its raw format to another format ready for the next stage.
Machine Learning Pipeline - Data Visualization
● Visualization makes it easy to grasp difficult concepts
● Find useful patterns in the data
● Interactively drill down into charts for deeper details
Vectors - Fixed length array of numbers
● Text documents
● Image files
● CSV
● Audio
● Video
● Time Series data
● Many more ...
Machine Learning Pipeline - Data Preprocessing
Feature Extraction
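To make "fixed length array of numbers" concrete, here is a small pure-Python sketch of turning text documents into word-count vectors. The helper names `build_vocabulary` and `vectorize` and the sample sentences are hypothetical; libraries such as scikit-learn and Spark ship their own vectorizers.

```python
def build_vocabulary(documents):
    """Map each distinct lowercase word to a fixed column index."""
    words = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}


def vectorize(document, vocabulary):
    """Turn one document into a fixed-length list of word counts."""
    vector = [0] * len(vocabulary)
    for word in document.lower().split():
        if word in vocabulary:  # words not in the vocabulary are ignored
            vector[vocabulary[word]] += 1
    return vector


docs = ["the lift is stuck", "the lift is working", "refund the ticket"]
vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]
print(len(vocab), vectors[0])
```

Every document ends up with the same length regardless of how many words it contains, which is what a learning algorithm expects as input.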
Machine Learning Pipeline - Model Training
Learning Algorithm
Regression / Trees / SVM / Naive Bayes / Neural Networks
Prediction Function
or
Trained Model
Machine Learning Pipeline - Learning Algorithms
● Linear Regression
● Logistic Regression
● Naive Bayes
● Nearest Neighbors
● Decision Trees
● Ensemble Methods
● Clustering
● Support Vector Machines
● Neural Networks
● CNN
● RNN
● GAN
Prediction
Prediction Function
or
Trained Model
Machine Learning Pipeline - Model Validation
● Training different learning methods gives you different trained models.
● Also, each model has a large number of possible configurations (hyper-parameters).
● Finding the best model among all possibilities & the best configuration for it is done as a part
of Model Validation.
● If results are not satisfactory, one has to go back in the chain & fix a few things.
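One common way to search over hyper-parameters is cross-validated grid search. The sketch below assumes scikit-learn, its bundled iris dataset and a decision tree; the grid values are arbitrary choices for illustration, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate configurations (hyper-parameters) for one learning method.
param_grid = {"max_depth": [1, 2, 3, 5]}

# 5-fold cross-validation scores every configuration on held-out data.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

The same search can be repeated for each candidate learning method, and the winner of that comparison is the model that moves on to deployment.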
Machine Learning Pipeline - Deployment
Trained Model
Or
Interface Model
Consumers RESTful Interface
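A minimal sketch of putting a model behind a RESTful interface, using only the Python standard library. The `predict` function is a stand-in rule rather than a trained model, and the JSON field names (`features`, `prediction`) and port are assumptions made for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a trained model: flags a record when its feature sum exceeds 10."""
    return "failure" if sum(features) > 10 else "ok"


def handle_prediction_request(body: bytes) -> bytes:
    """Decode a JSON request, call the model, encode a JSON response."""
    payload = json.loads(body)
    return json.dumps({"prediction": predict(payload["features"])}).encode("utf-8")


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        response = handle_prediction_request(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port=8000):
    """Start the service; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictionHandler).serve_forever()
```

Calling `serve()` starts the service locally; consumers then POST JSON such as `{"features": [4, 5, 6]}` and receive the prediction as JSON. A production deployment would add input validation, error handling and a proper application server.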
1. User Story : Customer Service Industry
● Reduce manual effort of classifying reviews.
● Channelizing data from Web server to Analytics Engine.
● Getting data ready for visualization.
● Historical data shows past trends.
● Visualization of trend
● Text needs to be tokenized & vectorized
● Different models were trained: Naive Bayes, SGD Classifier
● Choose the best model with best hyper-parameter
● Naive Bayes (MultinomialNB) was chosen & put in deployment
1. Implementation : Customer Service Industry
2. User Story : Fast Query Chatbots
2. Implementation : Fast Query Chatbots
● Reduce manual effort understanding the text query
● Waiting for BI has a long turnaround time
● We are trying to do this using chatbot
● Getting data ready for visualization.
● Historical data shows past trends
● Visualization of trend of text & SQL
● Text cannot be used for ML; it needs to be tokenized & vectorized
● Deep learning models with different layer configuration
● Choosing the best model with best hyper-parameter
● Model with best config was chosen & put in deployment
3. User Story : Preventing System Failure
Module 3
Data Challenges
● Optimal data size
● Identify data sources
● Identify what is useful in data
● Cleaning data to extract useful information
● Tools & Libraries to clean & extract useful information
Optimal Data size for AI product
● Expectation from a predictor - Moderate Bias & Moderate Variance.
● Predictor validation is important.
● More data improves the model, but only up to a limit.
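One way to check how much data is enough is a learning curve: train on growing subsets and compare training and validation scores. This is a sketch assuming scikit-learn, its bundled iris dataset and a shallow decision tree; the subset fractions are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Train on 20%, 50% and 100% of the available training data, with 5-fold validation.
sizes, train_scores, valid_scores = learning_curve(
    DecisionTreeClassifier(max_depth=3, random_state=0),
    X, y,
    train_sizes=[0.2, 0.5, 1.0],
    cv=5,
    shuffle=True,
    random_state=0,
)

for n, train, valid in zip(sizes, train_scores.mean(axis=1), valid_scores.mean(axis=1)):
    # A large gap between the two scores suggests high variance (overfitting);
    # two low scores suggest high bias (underfitting).
    print(f"{n:4d} samples  train={train:.2f}  validation={valid:.2f}")
```

When the validation score stops improving as the subset grows, collecting more data of the same kind is unlikely to help much.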
Identify Data Sources
● No specific order in identifying problem statement & data sources.
● Innovation in this space can happen in both ways - Top-Down & Bottom-Up.
● Data can be historical batch data stored in RDBMS & NoSQL DBs.
● Live streamed data using Kafka.
Identify what is useful in data
Cleaning data to extract useful information
Tools vs Libraries
● Data cleaning tools are available in the market.
● Why don’t they work in the long run?
● Data cleaning libraries are available.
● Why are more and more enterprises embracing libraries?
Changes with change in volume of data
Spark vs Other technologies
● Big Data Compute Framework
● Do data cleaning at scale; capacity grows by adding nodes to the cluster
● Talk to different data sources
Module 4
Machine
Learning Pipeline
at Scale
● Machine Learning Pipeline using Spark
● Spark - A very social technology
● Spark for Big Data Cleaning & Wrangling
● Spark for building ML models at Scale
● Validation & monitoring of models
● Deployment through a REST interface using Apache Livy
Machine Learning Pipeline using Spark
Spark - A very social technology
Preprocessing Data at Scale
● Scaling
● CountVectorizer
● Binning
● … many things can be done at scale using Spark
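The logic behind scaling and binning is shown below on a single machine with NumPy; the sample ages and bin edges are invented. Spark's `pyspark.ml.feature` module offers distributed counterparts such as `MinMaxScaler` and `Bucketizer`, which are not used here.

```python
import numpy as np

ages = np.array([18.0, 30.0, 42.0, 66.0])

# Scaling: map values linearly onto the range [0, 1] (min-max scaling).
scaled = (ages - ages.min()) / (ages.max() - ages.min())

# Binning: replace each value by the index of the interval it falls into.
# Bin edges are invented for the example: below 25, 25 to 49, 50 and above.
bins = np.digitize(ages, [25, 50])

print(scaled, bins)
```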
Training Models using Spark
● Distributed Model Training using Spark
● Regression
● Classification
● Clustering
● Recommendation Engine
Building Data Pipeline in Spark
● Spark provides in-built Transformers & Estimators.
● Pipeline can be built to connect transformers & estimators.
● Machine Learning Pipeline can be automated.
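The Transformer / Estimator contract can be illustrated in plain Python. This is a conceptual sketch only: every class name below is hypothetical and none of it is Spark's API. It shows an estimator whose `fit` returns a transformer, and a pipeline that chains stages.

```python
class MeanCenterer:
    """Estimator: fit() learns a statistic from data and returns a transformer."""

    def fit(self, values):
        return MeanCentererModel(sum(values) / len(values))


class MeanCentererModel:
    """Transformer: transform() applies what was learned to any data."""

    def __init__(self, mean):
        self.mean = mean

    def transform(self, values):
        return [v - self.mean for v in values]


class Doubler:
    """Stateless transformer: nothing to learn, so it has no fit()."""

    def transform(self, values):
        return [2 * v for v in values]


class SimplePipeline:
    """Chains stages; fitting replaces each estimator by its fitted transformer."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, values):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(values)
            values = stage.transform(values)
            fitted.append(stage)
        return SimplePipeline(fitted)

    def transform(self, values):
        for stage in self.stages:
            values = stage.transform(values)
        return values


pipeline_model = SimplePipeline([Doubler(), MeanCenterer()]).fit([1, 2, 3])
print(pipeline_model.transform([4]))
```

Because the fitted pipeline carries every learned statistic with it, the same object can be applied to new data without repeating the individual steps by hand, which is what makes the pipeline automatable.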
REST Interface to Spark
Module 5
Knowing
the
Unknowns
● Implementing Transformers & Estimators on Spark
● Deep Learning using Spark
● Are models re-trainable?
● The skilling journey
● Introducing Apache Beam
Transformers & Estimators on Spark
● Building Custom Transformers
● Building Custom Estimators
What is Deep Learning ?
● Specialized Learning Technique.
● Rather than us choosing features by hand, this technique finds
important derived features.
● The objective is to learn the best derived features for prediction.
● It is loosely inspired by the way our brain learns.
● Very useful for natural language, computer vision, audio, video etc.
Do you always need Deep Learning ?
● More data is required for Deep Learning
● More Compute Power
● Models less interpretable
“Don’t kill a mosquito with a cannon ball”
Don’t use Deep Learning if you don’t need to
Deep Learning using Spark
● Which one to choose - distributed TensorFlow or deep learning on Spark?
● Libraries such as spark-dl & elephas
Are models re-trainable ?
● Online learning models in scikit-learn - SGDClassifier, Multinomial Naive Bayes
● Spark ML models are not online learning models
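In scikit-learn, online learning is exposed through `partial_fit`, which both `SGDClassifier` and `MultinomialNB` provide. The sketch below uses `MultinomialNB` on two tiny, invented batches of count features to show a model being updated rather than retrained.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()

# First batch: the full list of classes must be declared on the first call.
batch_1 = np.array([[3, 0], [0, 2]])
model.partial_fit(batch_1, [0, 1], classes=[0, 1])

# Later batch: the existing model is updated, not retrained from scratch.
batch_2 = np.array([[4, 0], [0, 5]])
model.partial_fit(batch_2, [0, 1])

print(model.predict(np.array([[5, 0], [0, 4]])))
```

This pattern suits streamed data, where batches arrive over time and keeping the full history for retraining is impractical.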
Skilling Journey
Apache Beam - Probably our next webinar
● Apache Beam is an evolution of the Dataflow model created by Google to
process massive amounts of data.
● The name Beam (Batch + strEAM) comes from the idea of having a unified
model for both batch and stream data processing.
● Programs written using Beam can be executed on different processing
frameworks such as Spark and Flink (via runners), using a set of different IO connectors.
Q & A
Additional
Components of any AI product
● Data
● Compute
● Talent
Where has AI entered business?
Important : Advice to executives about AI
● Everybody should embrace the modern capabilities of AI; on the other hand, they should
also think about business-specific problems. Not every tool that the AI
community develops will suit them.
● The biggest challenge is people change, not technology change; the biggest gap
now is people who can map technology to business problems.
● Insourcing vs outsourcing. Building a team vs using enterprise solutions.
● AI will change everything in the next few decades. Be a part of it.
Challenges - Data & Security
● Volume of data - Machine learning
on very small datasets is rarely effective.
● Accessibility of data - Important
data may be inaccessible or stored in an
encrypted format.
Compute, Storage & Network Power
● AI products need data gathered from sensors, servers etc.
● Once gathered, data needs to be stored for further processing.
● Learning algorithms & data processing activities need a lot of compute
power.
Infrastructure for development
● Finding the best model is an iterative
process.
● More experiments lead to a better model.
● Hyper-parameter Tuning
● Scalable infrastructure for developers is
important.
Infrastructure for deployment
● Speedy Deployment.
● Easy deployment
● Fluctuating Demand.
● Need of Elastic infrastructure.
● Cost optimization.
Summary of challenges
Cost optimization:
● Use Open Source alternatives
● Infrastructure optimization
● Don’t reinvent the wheel
Module 3
Impact of AI
● Will AI benefit humans ?
● AI in human computer interaction
● Impact of AI on business
● Impact on workplace
● Impact on society
How AI benefits humans - social, environmental
● Predicting diseases
● 60% of people would prefer AI assistance over humans as financial advisors
or tax preparers
● 71% of people believe that AI will help humans solve complex problems and
live more enriched lives
AI assistants
● Saves Time
● Calendar events reminder
● Helps get things done
Impact of AI on business
More
AI advisor & manager at workplace
Impact on Decision Makers
● Adoption of AI advisors
What can be outsourced to AI assistant
Impact of artificial intelligence on society
● People are averse to the idea of availing annual health check-ups
at home with a robotic smart kit (77%) or having chatbot
assistant teachers in universities/colleges that lower the cost
of overall tuition (61%).
● Responsible AI ensures that its workings are aligned to ethical
standards and social norms pertinent within its scope of
operations.
● Explainable AI is responsible for building AI models with
accountability and the ability to describe or depict why a certain
decision was made by the algorithm.
Module 4
Identify right tools
● Programming Language
● Open source libraries
● Infrastructure Optimizations
● Other alternatives
Choose the right Programming Language
Why does Python make life easy ?
● Easy to learn for ETL developers
● Integrates very well with other technologies
● Full-stack development -
○ Dashboard using Bokeh,
○ Web application using Django,
○ Machine learning models using scikit-learn,
○ Scaling using PySpark
Choose appropriate Libraries
- Statistical Modeling & Data Processing
Choose appropriate Libraries
- Visualization
Choose appropriate Libraries
- Machine Learning or Deep Learning
Infrastructure Optimization
Monolithic or Serverless
Monolithic Infrastructure - Preallocated Infra
Model Training
● Developers request access
whenever required
● Might incur delay in peak
working hours.
● Idle in non-working hours
Model Interfacing
● Idle in non-peak hours.
● May fall short in spikes.
● Pay even if infra is not used
Serverless Infrastructure - Elastic Allocation
Model Training
● No-preallocation
● Pay only for what you use
● Absolutely no idle time for infra
● No wait time for developers
Model Interfacing
● Allocate infra only when required
● Scales down during non-peak
hours
● Improved customer experience
even in peak hours
Serverless Infrastructure Solutions
● OpenFaaS (Open Function as a Service)
● AWS Lambda
● Google Cloud Functions
● Azure Functions
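As a rough idea of what model interfacing looks like on a serverless platform, here is a sketch in the style of an AWS Lambda Python handler (a function taking `event` and `context`). The event shape with a JSON string under `body`, the `features` field and the stand-in `predict` rule are assumptions for illustration; other platforms use different signatures.

```python
import json


def predict(features):
    """Stand-in for a trained model loaded at start-up (hypothetical rule)."""
    return "spam" if sum(features) > 1.0 else "ham"


def lambda_handler(event, context):
    """Entry point invoked once per request; no server is kept running in between."""
    payload = json.loads(event["body"])
    result = {"prediction": predict(payload["features"])}
    return {"statusCode": 200, "body": json.dumps(result)}


# Local invocation with a hand-built event, no cloud account needed.
response = lambda_handler({"body": json.dumps({"features": [0.9, 0.4]})}, None)
print(response)
```

Because the platform allocates compute per invocation, this style fits the pay-only-for-what-you-use model; the trade-off is start-up latency when a large model has to be loaded.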
Distributed Machine Learning using Spark
● Apache Spark is a distributed data
processing framework.
● Many machine learning algorithms are
implemented in Spark.
● Most of the APIs are similar to those of scikit-learn
● Scaled ETL & Machine Learning can be done
using Spark
Other alternatives
Google Cloud AI
Module 5
Build AI Team
● Adoption of AI
● Skills
● Hiring or upskilling
● Upskilling workforce
Adoption Strategy
● Build Business Case
● Scale Efficiently
● Create Data Driven Culture
Skills
Talent Acquisition
● Upskill your current team ?
Upskilling workforce
● It’s possible to make use of the people who have delivered for you in the
past.
Q & A
Repositories
● https://github.com/zekelabs/machine-learning-for-beginners
● https://github.com/zekelabs/tensorflow-tutorial/
● Dog breed prediction - https://www.edyoda.com/resources/watch/54AEA4CDC35394F1183A9DD17AA47/
● Python learning course - https://www.edyoda.com/resources/videolisting/98/
Grid search, pipeline, featureunion
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Essential NumPy
Essential NumPyEssential NumPy
Essential NumPy
 
Ensemble methods
Ensemble methods Ensemble methods
Ensemble methods
 
Dimentionality reduction
Dimentionality reductionDimentionality reduction
Dimentionality reduction
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Logistic Regression
Logistic RegressionLogistic Regression
Logistic Regression
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 

Dernier

Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniquesugginaramesh
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitterShivangiSharma879191
 

Dernier (20)

🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniques
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter
 

Machine learning at scale - Webinar By zekeLabs

  • 1. zekeLabs Machine Learning at Scale Development to Deployment Skilling for the Future www.zekeLabs.com
  • 2. Visit www.zekeLabs.com for more details. THANK YOU. Let us know how we can help your organization upskill its employees to stay updated in the ever-evolving IT industry. Get in touch: www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com
  • 4. Modules 1. Understanding Machine Learning Ecosystem 2. The Machine Learning Pipeline & Product stories 3. Data Challenges 4. Taking Machine Learning to Scale using Spark & Kafka 5. Knowing the Unknowns
  • 5. Module 1 Understanding Machine Learning Ecosystem ● Black box Introduction to Machine Learning ● Types of Machine Learning ● Components of AI ● The AI Timeline
  • 7. What is not Machine Learning ? ● Rule Based Approach ● Legacy Systems
  • 8. What is Machine Learning ? ● Solve prediction problem ● Logic is learned from examples & not by rules ● Diagram labels: Training Data, Learning Algorithm, Prediction Function or Trained Model, Input Data
  • 9. Types of Machine Learning ● Supervised (Task Driven) ● Unsupervised (Data Driven) ● Reinforcement (Environment Driven)
  • 10. Spam Mail Detection ● Input - Mail ● Output - Spam or Ham ● Supervised Machine Learning, ● Binary Classification Problem
  • 11. Predicting Lift Failure ● Input - Sensor Data ● Output - Failure time ● Supervised Machine Learning ● Regression Problem
  • 12. Predicting Insurance Amount ● Input - Accident details ● Output - Insurance amount ● Supervised Machine Learning ● Regression Problem
  • 13. Medical Diagnosis ● Input - Patient Synopsis (fever, temperature, BP, etc.) ● Output - Diagnosis ● Supervised Machine Learning ● Multi-class classification Problem
  • 14. Question - What is common between them ?
  • 15. Market Segmentation ● Input - Customer Details ● Output - Clusters ● Unsupervised Machine Learning ● Clustering Problem
  • 16. Robot playing Football ● Input - Player information, Rewards ● Output - Action to score ● Reinforcement Learning
  • 17. What does AI consist of ?
  • 19. Module 2 Machine Learning Pipeline ● Understanding Machine Learning Pipeline ● User Story - Automating customer support ● Implementation ● User Story - Fast Query Chatbots ● Implementation
  • 21. Machine Learning Pipeline - Business Understanding ● Business understanding includes clarity on what you are trying to achieve. ● Machine learning is not possible with a small data size. ● Consolidate the data pipeline to channel a continuous flow of data. ● Web scraping, data lake access, REST etc.
  • 22. Machine Learning Pipeline - Data Wrangling ● Production data is never clean. ● It needs a major effort (around 70% of total effort) to make it ready for the next stage. ● Transforming & mapping data from its raw format to another format ready for the next stage.
  • 23. Machine Learning Pipeline - Data Visualization ● Visualization makes it easy to grasp difficult concepts ● Find useful patterns in the data ● Interactively drill down into charts for deeper details
  • 24. Machine Learning Pipeline - Data Preprocessing ● Feature Extraction turns inputs such as: ● Text documents ● Image files ● CSV ● Audio ● Video ● Time Series data ● Many more ... ● into Vectors - Fixed length array of numbers
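To make "fixed-length array of numbers" concrete for the text case, here is a minimal bag-of-words sketch in plain Python. The function names (`build_vocabulary`, `vectorize`) and the sample sentences are invented for illustration and are not from the slides; a real pipeline would more likely use a library vectorizer.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map each distinct lowercase token to a fixed column index."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}

def vectorize(document, vocabulary):
    """Turn one document into a fixed-length list of token counts."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:  # tokens outside the vocabulary are dropped
            vector[vocabulary[tok]] = n
    return vector

# Hypothetical sample documents, for illustration only.
docs = ["free offer win money", "meeting at noon", "win a free meeting"]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]
```

Every document, whatever its length, ends up as a vector with one slot per vocabulary word, which is the shape a learning algorithm expects.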
  • 25. Machine Learning Pipeline - Model Training ● Learning Algorithm (Regression / Trees / SVM / Naive Bayes / Neural Networks) ● Prediction Function or Trained Model
  • 26. Machine Learning Pipeline - Learning Algorithms ● Linear Regression ● Logistic Regression ● Naive Bayes ● Nearest Neighbors ● Decision Trees ● Ensemble Methods ● Clustering ● Support Vector Machines ● Neural Networks ● CNN ● RNN ● GAN
  • 28. Machine Learning Pipeline - Model Validation ● Training with different learning methods gives you different trained models. ● Also, each model has a huge number of possible configurations (hyper-parameters). ● Finding the best model among all possibilities & the best configuration for it is done as a part of Model Validation. ● If results are not satisfactory, one has to go back in the chain & fix a few things.
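The search for the best configuration can be sketched as cross-validation over a small grid of hyper-parameter values. The example below is an assumption-laden toy: it uses a hand-written 1-D k-nearest-neighbours classifier, invented data, and the hypothetical helper names `knn_predict` and `cross_val_accuracy`, none of which come from the slides.

```python
def knn_predict(train, x, k):
    """Majority label among the k training points closest to x (1-D features)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def cross_val_accuracy(data, k, folds=4):
    """Average held-out accuracy of k-NN over `folds` interleaved folds."""
    scores = []
    for f in range(folds):
        test = data[f::folds]  # every folds-th point is held out
        train = [p for i, p in enumerate(data) if i % folds != f]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / folds

# Hypothetical toy data: (feature, label) pairs with one noisy point at 2.5.
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (2.5, 1), (3.0, 0),
        (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1), (8.0, 1), (8.5, 1)]

# The hyper-parameter grid: odd k avoids ties between the two labels.
candidates = [1, 3, 5]
results = {k: cross_val_accuracy(data, k) for k in candidates}
best_k = max(results, key=results.get)
```

Each candidate is scored only on data it was not trained on, and the configuration with the highest average score is kept.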
  • 29. Machine Learning Pipeline - Deployment Trained Model Or Interface Model Consumers RESTful Interface
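One way to picture a trained model behind a RESTful interface is the standard-library sketch below. The linear scorer stands in for a real trained model, and the weights, the JSON field name `features`, and the port are all hypothetical choices made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in for a trained model: a hypothetical fixed linear scorer."""
    weights = [0.5, -0.25]
    score = sum(w * x for w, x in zip(weights, features))
    return {"score": score, "label": int(score > 0)}

def handle_request_body(body):
    """Decode a JSON request body and return a JSON response string."""
    payload = json.loads(body)
    return json.dumps(predict(payload["features"]))

class PredictHandler(BaseHTTPRequestHandler):
    """Answers POST requests with the model's prediction as JSON."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        response = handle_request_body(self.rfile.read(length)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(response)

def serve(port=8000):
    """Start the server; not called here so the sketch has no side effects."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()

# The request-handling logic can be exercised without starting a server.
response = handle_request_body('{"features": [2.0, 1.0]}')
```

Keeping the prediction logic in a plain function, separate from the HTTP handler, lets model consumers and tests share the same code path.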
  • 30. 1. User Story : Customer Service Industry
  • 31. 1. Implementation : Customer Service Industry ● Reduce manual effort of classifying reviews. ● Channelizing data from Web server to Analytics Engine. ● Getting data ready for visualization. ● Historical data shows past trends. ● Visualization of trend ● Text needs to be tokenized & vectorized ● Different models were trained: Naive Bayes, SGD Classifier ● Choose the best model with best hyper-parameter ● Naive Bayes (MultinomialNB) was chosen & put in deployment
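The slide names MultinomialNB as the deployed model. As a rough illustration of what such a classifier computes, here is a from-scratch multinomial Naive Bayes with add-one smoothing in plain Python. It is not the authors' implementation; the class name `TinyMultinomialNB` and the sample reviews are invented for illustration.

```python
import math
from collections import Counter, defaultdict

class TinyMultinomialNB:
    """From-scratch multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labelled reviews, invented for illustration only.
reviews = ["great service very helpful", "helpful and quick support",
           "terrible delay no response", "no refund terrible service"]
labels = ["positive", "positive", "negative", "negative"]

model = TinyMultinomialNB().fit(reviews, labels)
prediction = model.predict("quick helpful response")
```

Tokenizing with `split()` plays the role of the "tokenized & vectorized" step; the per-class word counts are the only state the model needs.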
  • 32. 2. User Story : Fast Query Chatbots
  • 33. 2. Implementation : Fast Query Chatbots ● Reduce manual effort understanding the text query ● Waiting for BI has a long turnaround time ● We are trying to do this using chatbot ● Getting data ready for visualization. ● Historical data shows past trends ● Visualization of trend of text & sql ● Text cannot be used for ML directly; it needs to be tokenized & vectorized ● Deep learning models with different layer configuration ● Choosing the best model with best hyper-parameter ● Model with best config was chosen & put in deployment
  • 34. 3. User Story : Preventing System Failure
  • 35. Module 3 Data Challenges ● Optimal data size ● Identify data sources ● Identify what is useful in data ● Cleaning data to extract useful information ● Tools & Libraries to clean & extract useful information
  • 36. Optimal Data size for AI product ● Expectation from a predictor - Moderate Bias & Moderate Variance. ● Predictor validation is important. ● More data makes the model better, up to a limit.
  • 37. Identify Data Sources ● No specific order in identifying problem statement & data sources. ● Innovation in this space can happen in both ways - Top-Down & Bottom-Up. ● Data can be historical batch data stored in RDBMS & NoSQL DBs. ● Live streamed data using Kafka.
  • 38. Identify what is useful in data
  • 39. Cleaning data to extract useful information
  • 40. Tools vs Libraries ● Data cleaning tools available in market. ● Why don’t they work in the long run? ● Data cleaning libraries available. ● Why are more and more enterprises embracing libraries?
  • 41. Changes with change in volume of data
  • 42. Spark vs Other technologies ● Big Data Compute Framework ● Do data cleaning at scale, with performance that grows as nodes are added to the cluster ● Talk to different data sources
  • 43. Module 4 Machine Learning Pipeline at Scale ● Machine Learning Pipeline using Spark ● Spark - A very social technology ● Spark for Big Data Cleaning & Wrangling ● Spark for building ML models at Scale ● Validation & monitoring of models ● Deployment using REST interface using Apache Livy
  • 46. Spark - A very social technology
  • 47. Preprocessing Data at Scale ● Scaling ● CountVectorizer ● Binning ● … many things can be done at scale using Spark
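Scaling and binning are simple per-column computations; Spark's contribution is running them across a cluster. The single-machine sketch below only shows what the two transforms compute. The helper names, the age values, and the bucket edges are hypothetical, and this is not Spark code.

```python
from bisect import bisect_right

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def bin_values(values, edges):
    """Index of the bucket [edges[i], edges[i+1]) that holds each value."""
    last_bucket = len(edges) - 2
    return [min(max(bisect_right(edges, v) - 1, 0), last_bucket) for v in values]

# Hypothetical column of ages and hypothetical bucket edges.
ages = [18, 25, 40, 60, 90]
scaled = min_max_scale(ages)
buckets = bin_values(ages, edges=[0, 30, 50, 120])
```

Values outside the outermost edges are clamped into the first or last bucket here; a production pipeline would need an explicit policy for such cases.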
  • 48. Training Models using Spark ● Distributed Model Training using Spark ● Regression ● Classification ● Clustering ● Recommendation Engine
  • 49. Building Data Pipeline in Spark ● Spark provides in-built Transformers & Estimators. ● Pipeline can be built to connect transformers & estimators. ● Machine Learning Pipeline can be automated.
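The transformer/estimator split can be illustrated without Spark. In the sketch below a transformer only exposes `transform`, an estimator exposes `fit` and returns a fitted transformer, and a pipeline chains both kinds of stage. All class names are hypothetical and only mirror the pattern; they are not Spark's API.

```python
class Lowercase:
    """Transformer: stateless, only needs transform()."""
    def transform(self, rows):
        return [r.lower() for r in rows]

class VocabularyEstimator:
    """Estimator: fit() learns from data and returns a fitted transformer."""
    def fit(self, rows):
        vocab = sorted({tok for r in rows for tok in r.split()})
        return CountModel(vocab)

class CountModel:
    """Fitted transformer produced by VocabularyEstimator.fit()."""
    def __init__(self, vocab):
        self.vocab = vocab
    def transform(self, rows):
        return [[r.split().count(tok) for tok in self.vocab] for r in rows]

class Pipeline:
    """Chains stages; fitting replaces each estimator with its fitted model."""
    def __init__(self, stages):
        self.stages = stages
    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)
            rows = stage.transform(rows)  # feed transformed rows to the next stage
            fitted.append(stage)
        return PipelineModel(fitted)

class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""
    def __init__(self, stages):
        self.stages = stages
    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows

# Hypothetical training rows, for illustration only.
train = ["Spark ML pipeline", "ML at scale"]
model = Pipeline([Lowercase(), VocabularyEstimator()]).fit(train)
features = model.transform(["Spark at SCALE scale"])
```

Because the fitted pipeline is itself just a chain of transformers, the same object can be reused unchanged on new data, which is what makes the pipeline automatable.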
  • 51. Module 5 Knowing the Unknowns ● Implementing Transformers & Estimators on Spark ● Deep Learning using Spark ● Are models retrainable? ● The skilling journey ● Introducing Apache Beam
  • 52. Transformers & Estimators on Spark ● Building Custom Transformers ● Building Custom Estimators
  • 53. What is Deep Learning ? ● Specialized learning technique. ● Rather than us choosing features for learning, this technique finds important derived features itself. ● The objective is to learn the best derived features for prediction. ● It is loosely inspired by the way our brain learns. ● Very useful for natural language, computer vision, audio, video etc.
  • 54. Do you always need Deep Learning ? ● More data is required for Deep Learning ● More Compute Power ● Models less interpretable “Don’t kill a mosquito with a cannon ball” Don’t use Deep Learning if you don’t need to
  • 55. Deep Learning using Spark ● Which one to choose - Distributed TensorFlow or DL using Spark? ● Libraries like spark-dl & elephas
  • 56. Are models re-trainable ? ● Online learning models in scikit-learn (via partial_fit) - SGDClassifier, Multinomial Naive Bayes ● Spark ML models are not online learning models
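What "online learning" means in practice is that a model can be updated with new batches without retraining from scratch. The sketch below shows the idea with a hand-written perceptron whose `partial_fit` method is named after the scikit-learn convention; the class itself and the toy batches are invented for illustration and are not scikit-learn code.

```python
class OnlinePerceptron:
    """Binary perceptron that keeps learning as new batches arrive."""

    def __init__(self, n_features):
        self.weights = [0.0] * n_features
        self.bias = 0.0

    def predict(self, x):
        score = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if score > 0 else 0

    def partial_fit(self, batch):
        """Update the existing weights from one batch of (features, label) pairs."""
        for x, y in batch:
            error = y - self.predict(x)  # -1, 0 or +1
            if error:
                self.weights = [w + error * xi for w, xi in zip(self.weights, x)]
                self.bias += error
        return self

# Hypothetical batches arriving at different times.
batch_1 = [([2.0, 0.0], 1), ([0.0, 2.0], 0)]
batch_2 = [([3.0, 1.0], 1), ([1.0, 3.0], 0)]

model = OnlinePerceptron(n_features=2)
model.partial_fit(batch_1)  # initial training
model.partial_fit(batch_2)  # later update, earlier weights are kept
```

The second call starts from the weights learned in the first, so old data does not need to be stored or replayed.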
  • 58. Apache Beam - Probably our next webinar ● Apache Beam is an evolution of the Dataflow model created by Google to process massive amounts of data. ● The name Beam (Batch + strEAM) comes from the idea of having a unified model for both batch and stream data processing. ● Programs written using Beam can be executed on different processing frameworks via runners (Spark, Flink etc.) and can read and write data through a set of different IOs.
  • 59. Q & A
  • 62. Components of any AI product Data Compute Talent
  • 63. Where has AI entered business?
  • 64. Important : Advice to executives about AI ● Everybody should embrace the modern capabilities of AI; on the other hand, they should also think about business-specific problems. Not every tool the AI community develops will suit them. ● The biggest challenge is people change, not technology change; the biggest gap now is people who can map technology to business problems. ● Insourcing vs outsourcing. Building a team vs using enterprise solutions. ● AI will change everything in the next few decades. Be a part of it.
  • 65. Challenges - Data & Security ● Volume of data - Machine learning on smaller data is infeasible. ● Accessibility of data - Important data is not accessible & may be in encrypted format.
  • 66. Compute, Storage & Network Power ● AI products need data gathered from sensors, servers etc. ● Once gathered, data needs to be stored for further processing. ● Learning algorithms & data processing activities need a lot of compute power.
  • 67. Infrastructure for development ● Finding the best model is an iterative process. ● More experiments lead to a better model. ● Hyper-parameter Tuning ● Scaled infrastructure for developers is important.
  • 68. Infrastructure for deployment ● Speedy Deployment. ● Easy deployment ● Fluctuating Demand. ● Need of Elastic infrastructure. ● Cost optimization.
  • 70. Cost optimization: ● Use Open Source alternatives ● Infrastructure optimization ● Don’t reinvent the wheel
  • 71. Module 3 Impact of AI ● Will AI benefit humans ? ● AI in human computer interaction ● Impact of AI on business ● Impact on workplace ● Impact on society
  • 72. AI benefits to humans - social, environmental ● Predicting diseases ● 60% of people would prefer AI assistance over humans as financial advisors or tax preparers ● 71% of people believe that AI will help humans solve complex problems and help them live more enriched lives
  • 73. AI assistants ● Saves Time ● Calendar events reminder ● Helps get things done
  • 74. Impact of AI on business
  • 75. More
  • 76. AI advisor & manager at workplace
  • 77. Impact on Decision Makers ● Adoption of AI advisors
  • 78. What can be outsourced to AI assistant
  • 79. Impact of artificial intelligence on society ● People are averse to the idea of availing annual health check-ups at home with a robotic smart kit (77%) or having chatbot assistant teachers in universities/colleges that lower the cost of overall tuition (61%). ● Responsible AI ensures that its workings are aligned to ethical standards and social norms pertinent within its scope of operations. ● Explainable AI is responsible for building AI models with accountability and the ability to describe or depict why a certain decision was made by the algorithm.
  • 80. Module 4 Identify right tools ● Programming Language ● Open source libraries ● Infrastructure Optimizations ● Other alternatives
  • 81. Choose the right Programming Language
  • 82. Why Python makes life easy ? ● Easy to learn for ETL developers ● Integrates very well with other technologies ● Full-stack development - ○ Dashboard using bokeh, ○ Web application using django, ○ Machine learning models using scikit-learn, ○ Scaling using PySpark
  • 83. Choose appropriate Libraries - Statistical Modeling & Data Processing
  • 84. Choose appropriate Libraries - Visualization
  • 85. Choose appropriate Libraries - Machine Learning or Deep Learning
  • 86. Infrastructure Optimization: Monolithic or Serverless
  • 87. Monolithic Infrastructure - Preallocated Infra ● Model Training: ● Developers request access whenever required ● Might incur delay in peak working hours. ● Idle in non-working hours ● Model Interfacing: ● Idle in non-peak hours. ● May fall short in spikes. ● Pay even if infra is not used
  • 88. Serverless Infrastructure - Elastic Allocation ● Model Training: ● No preallocation ● Pay only for what you use ● Absolutely no idle time for infra ● No wait time for developers ● Model Interfacing: ● Allocate infra only when required ● Scales down during non-peak hours ● Improved customer experience even in peak hours
  • 89. Serverless Infrastructure Solutions ● Open Function as a Service (OpenFaaS) ● AWS Lambda ● Google Cloud Functions ● Azure Functions
  • 90. Distributed Machine Learning using Spark ● Apache Spark is a distributed data processing framework. ● Many machine learning algorithms are implemented in Spark. ● Most of the APIs are similar to those of scikit-learn ● Scaled ETL & Machine Learning can be done using Spark
  • 91. Other alternatives ● Google Cloud AI
  • 92. Module 5 Build AI Team ● Adoption of AI ● Skills ● Hiring or upskilling ● Upskilling workforce
  • 93. Adoption Strategy Build Business Case Scale Efficiently Create Data Driven Culture
  • 95. Talent Acquisition ● Upskill your current team ?
  • 96. Upskilling workforce ● It’s possible to make use of the people who have delivered for you in the past.
  • 97. Q & A
  • 98. Repositories ● https://github.com/zekelabs/machine-learning-for-beginners ● https://github.com/zekelabs/tensorflow-tutorial/ ● Dog breed prediction - https://www.edyoda.com/resources/watch/54AEA4CDC35394F1183A9D D17AA47/ ● Python learning course - https://www.edyoda.com/resources/videolisting/98/