A Microservices Framework for Real-Time Model Scoring Using Structured Streaming with Vedant Jain

•

1 j'aime•1,028 vues

Open-source technologies allow developers to build microservices framework to build myriad real-time applications. One such application is building the real-time model scoring. In this session, we will showcase how to architect a microservice framework, in particular how to use it to build a low-latency, real-time model scoring system. At the core of the architecture lies Apache Spark’s Structured Streaming capability to deliver low-latency predictions coupled with Docker and Flask as additional open source tools for model service. In this session, you will walk away with: * Knowledge of enterprise-grade model as a service * Streaming architecture design principles enabling real-time machine learning * Key concepts and building blocks for real-time model scoring * Real-time and production use cases across industries, such as IIOT, predictive maintenance, fraud detection, sepsis etc.

Données & analyses

A Microservices Framework
for
Real time Model Scoring
using
Structured Streaming
Vedant Jain
October 2018
#SAISStreaming1

2
About me
• Solutions Architect @ Databricks
• x-Hortonworks, JPMC

3
What will we talk about?
§ Machine learning
§ Model scoring
§ Microservices
§ Structured Streaming
§ Demo

4
Machine Learning
“…what we want is a machine that can learn from experience.”

Types of Machine Learning
• Unsupervised Learning

X Y Z
0.5 -3.4 9.2
-2.3 2.9 3.2
1.3 2.3 -4.5
• Supervised Learning
Types of Machine Learning
Activity
Biking
Standing
Sitting
Labels
X Y Z
0.5 -3.4 9.2

X Y Z
0.5 -3.4 9.2
-2.3 2.9 3.2
1.3 2.3 -4.5
• Supervised Learning
Types of Machine Learning
Activity
Biking
Standing
Sitting

Continuous Learning
Machine Learning Process
Define the
problem
Connect data
source
Acquire Data
Enrich, Transform
Data
Feature
Selection
Build/Train
Model
Serve/Score
Model

9
What will we talk about?
§ Machine learning
§ Model scoring
§ Microservices
§ Structured Streaming
§ Demo

11
Model Scoring on “Big Data”
Scale Streaming Dynamic

Machine Learning Pipeline
12
• Drop null values
• Calculate moving
averages on 10 minute
window
• Convert to single
parquet
• OneHotEncoder
• RF Classifier
• Cross Tabulation
• K-Means Clustering
• Hyperparameter Tuning
• Pipelines
• Deploy Pipelines to
Docker
• Serve the models using
Python Flask
• Phone/Watch
• Gyroscope/Accelerometer
• CSV

13
What will we talk about?
§ Machine learning
§ Model scoring
§ Microservices
§ Structured Streaming
§ Demo

Microservices
● Properties:
Ø Decomposition: Logic is broken down into multiple independent components
Ø Isolation: Component services are deployed and maintained independently of one another
● Benefits:
ü Reduced Regression Testing time
ü Organizational Autonomy
ü Cloud/On-premise Agnostic
ü Scalability/Reusability

Microservices
Example:
Company A tracks user’s activity through smart
devices and wants to provide tailored content to the
users based on their behavior.

Microservices
X Y Z Activity
0.5 -3.4 9.2 Biking
-2.3 2.9 3.2 Standing
1.3 2.3 -4.5 Sitting
‘Activity’
score
User Bike Stand Sit Cluster
A 0.1 0.3 0.6 0
B 0.0 0.3 0.7 1
Actionable Insights
Web services
Target
content
Affinity Score

Microservices
Scalability
• Consistent
• Scalable
• Fault tolerant
• Complex Event
Processing

21
What will we talk about?
§ Machine learning
§ Model scoring
§ Microservices
§ Structured Streaming
§ Demo

Structured Streaming
• High-level streaming API
built on Spark SQL engine
• Runs the same computation
as batch queries in
Datasets/DataFrames
• Event time, windowing,
sessions, sources & sinks
• End-to-end exactly once
semantics
• Late Data Handling

23
ML Limitations in Streaming
• Many models/transformers/estimators are not supported
• Limited to only models built in Spark MLLib
• Not ideal for Continuous Learning
23

24
What will we talk about?
§ Machine Learning
§ Model scoring
§ Microservices
§ Structured Streaming
§ Demo
https://github.com/vedantja/eu_summit_demo

Recommandé

What's New in Spring Boot 2.5ikeyat

Thesis Midterm presentationOana Sipos

Pa 298 measures of correlationMaria Theresa

benjamin kenwright webgpu api lecture 1.pptxAuthorised

Introduction to RNA-seqPaul Gardner

Introduction to HMMER - A biosequence analysis tool with Hidden Markov Models Anax Fotopoulos

猫に教えてもらうルベーグ可測Shuyo Nakatani

DNA Methylation: An Essential Element in Epigenetics Facts and TechnologiesQIAGEN

Recommandé

What's New in Spring Boot 2.5ikeyat

Thesis Midterm presentationOana Sipos

Pa 298 measures of correlationMaria Theresa

benjamin kenwright webgpu api lecture 1.pptxAuthorised

Introduction to RNA-seqPaul Gardner

Introduction to HMMER - A biosequence analysis tool with Hidden Markov Models Anax Fotopoulos

猫に教えてもらうルベーグ可測Shuyo Nakatani

DNA Methylation: An Essential Element in Epigenetics Facts and TechnologiesQIAGEN

[CB19] Recent APT attack on crypto exchange employees by Heungsoo KangCODE BLUE

Big Data AnalyticsEMC

Lect 1 introductionhktripathy

RNA-seq: A High-resolution View of the TranscriptomeSean Davis

Data science for everyonePranavathiyani G

Quantitative Methods for Lawyers - Class #19 - Regression Analysis - Part 2Daniel Katz

My PhD thesis defense presentationSuman Srinivasan

A Mashup with BackboneC/D/H Technology Consultants

Marwa_Ezzatt_Ahmed_CVMarwa Ezzat

Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati

Data Management is a Team Sport - IBMMongoDB

1,2,3 … Testing : Is this thing on(line)? with Mike MartinNETUserGroupBern

Cross Platform Angular 2 and TypeScript DevelopmentJeremy Likness

Building an ML model with zero codeNick Trogh

Canada DevOps Summit 2020 Presentation Nov_03_2020Varun Manik

Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...Bhakthi Liyanage

Accelerate ML Deployment with H2O Driverless AI on AWSSri Ambati

5 Steps To Deliver The Fastest Mobile Shopping Experience This Holiday SeasonG3 Communications

Danny Bickson - Python based predictive analytics with GraphLab Create PyData

motorized bike j2ee ppt explanation of projectprabhat kumar

Kong Summit 2018 - Microservices: decomposing applications for testability an...Chris Richardson

Getting Started With Dato - August 2015Turi, Inc.

Contenu connexe

Tendances

[CB19] Recent APT attack on crypto exchange employees by Heungsoo KangCODE BLUE

Big Data AnalyticsEMC

Lect 1 introductionhktripathy

RNA-seq: A High-resolution View of the TranscriptomeSean Davis

Data science for everyonePranavathiyani G

Quantitative Methods for Lawyers - Class #19 - Regression Analysis - Part 2Daniel Katz

My PhD thesis defense presentationSuman Srinivasan

Tendances (7)

[CB19] Recent APT attack on crypto exchange employees by Heungsoo Kang

Big Data Analytics

Lect 1 introduction

RNA-seq: A High-resolution View of the Transcriptome

Data science for everyone

Quantitative Methods for Lawyers - Class #19 - Regression Analysis - Part 2

My PhD thesis defense presentation

Similaire à A Microservices Framework for Real-Time Model Scoring Using Structured Streaming with Vedant Jain

A Mashup with BackboneC/D/H Technology Consultants

Marwa_Ezzatt_Ahmed_CVMarwa Ezzat

Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati

Data Management is a Team Sport - IBMMongoDB

1,2,3 … Testing : Is this thing on(line)? with Mike MartinNETUserGroupBern

Cross Platform Angular 2 and TypeScript DevelopmentJeremy Likness

Building an ML model with zero codeNick Trogh

Canada DevOps Summit 2020 Presentation Nov_03_2020Varun Manik

Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...Bhakthi Liyanage

Accelerate ML Deployment with H2O Driverless AI on AWSSri Ambati

5 Steps To Deliver The Fastest Mobile Shopping Experience This Holiday SeasonG3 Communications

Danny Bickson - Python based predictive analytics with GraphLab Create PyData

motorized bike j2ee ppt explanation of projectprabhat kumar

Kong Summit 2018 - Microservices: decomposing applications for testability an...Chris Richardson

Getting Started With Dato - August 2015Turi, Inc.

Migrating to Microservices Patterns and Technologies (edition 2023)Ahmed Misbah

Ahmed El Mawaziny CVAhmed El Mawaziny

Conclusion Connect state of IoT 2019 Review io t solutions world congress 2019Conclusion Connect enabling industry 4.0 with IoT

From Monoliths to Services: Paying Your Technical DebtTechWell

Role of artificial intellligence in construction engg & managementKundan Sanap

Similaire à A Microservices Framework for Real-Time Model Scoring Using Structured Streaming with Vedant Jain (20)

A Mashup with Backbone

Marwa_Ezzatt_Ahmed_CV

Building a Real-Time Security Application Using Log Data and Machine Learning...

Data Management is a Team Sport - IBM

1,2,3 … Testing : Is this thing on(line)? with Mike Martin

Cross Platform Angular 2 and TypeScript Development

Building an ML model with zero code

Canada DevOps Summit 2020 Presentation Nov_03_2020

Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...

Accelerate ML Deployment with H2O Driverless AI on AWS

5 Steps To Deliver The Fastest Mobile Shopping Experience This Holiday Season

Danny Bickson - Python based predictive analytics with GraphLab Create

motorized bike j2ee ppt explanation of project

Kong Summit 2018 - Microservices: decomposing applications for testability an...

Getting Started With Dato - August 2015

Migrating to Microservices Patterns and Technologies (edition 2023)

Ahmed El Mawaziny CV

Conclusion Connect state of IoT 2019 Review io t solutions world congress 2019

From Monoliths to Services: Paying Your Technical Debt

Role of artificial intellligence in construction engg & management

Plus de Databricks

DW Migration Webinar-March 2022.pptxDatabricks

Data Lakehouse Symposium | Day 1 | Part 1Databricks

Data Lakehouse Symposium | Day 1 | Part 2Databricks

Data Lakehouse Symposium | Day 2Databricks

Data Lakehouse Symposium | Day 4Databricks

5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks

Democratizing Data Quality Through a Centralized PlatformDatabricks

Learn to Use Databricks for Data ScienceDatabricks

Why APM Is Not the Same As ML MonitoringDatabricks

The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks

Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks

Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks

Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks

Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks

Sawtooth Windows for Feature AggregationsDatabricks

Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks

Re-imagine Data Monitoring with whylogs and SparkDatabricks

Raven: End-to-end Optimization of ML Prediction QueriesDatabricks

Processing Large Datasets for ADAS Applications using Apache SparkDatabricks

Massive Data Processing in Adobe Using Delta LakeDatabricks

Plus de Databricks (20)

DW Migration Webinar-March 2022.pptx

Data Lakehouse Symposium | Day 1 | Part 1

Data Lakehouse Symposium | Day 1 | Part 2

Data Lakehouse Symposium | Day 2

Data Lakehouse Symposium | Day 4

5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop

Democratizing Data Quality Through a Centralized Platform

Learn to Use Databricks for Data Science

Why APM Is Not the Same As ML Monitoring

The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix

Stage Level Scheduling Improving Big Data and AI Integration

Simplify Data Conversion from Spark to TensorFlow and PyTorch

Scaling your Data Pipelines with Apache Spark on Kubernetes

Scaling and Unifying SciKit Learn and Apache Spark Pipelines

Sawtooth Windows for Feature Aggregations

Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink

Re-imagine Data Monitoring with whylogs and Spark

Raven: End-to-end Optimization of ML Prediction Queries

Processing Large Datasets for ADAS Applications using Apache Spark

Massive Data Processing in Adobe Using Delta Lake

Dernier

Capstone Project on IBM Data Analytics ProgramMoniSankarHazra

Midocean dropshipping via API with DroFxolyaivanovalion

100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate

CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE9953056974 Low Rate Call Girls In Saket, Delhi NCR

CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion

Smarteg dropshipping via API with DroFx.pptxolyaivanovalion

Halmar dropshipping via API with DroFxolyaivanovalion

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083

FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor

Edukaciniai dropshipping via API with DroFxolyaivanovalion

Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Riyadh +966572737505 get cytotec

Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh9953056974 Low Rate Call Girls In Saket, Delhi NCR

Zuja dropshipping via API with DroFx.pptxolyaivanovalion

Sampling (random) method and Non random.pptDr. Soumendra Kumar Patra

Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls

{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal

Data-Analysis for Chicago Crime Data 2023ymrp368

Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila

CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823

Dernier (20)

Capstone Project on IBM Data Analytics Program

Midocean dropshipping via API with DroFx

100-Concepts-of-AI by Anupama Kate .pptx

CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE

CebaBaby dropshipping via API with DroFX.pptx

Smarteg dropshipping via API with DroFx.pptx

Halmar dropshipping via API with DroFx

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call

FESE Capital Markets Fact Sheet 2024 Q1.pdf

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130

Edukaciniai dropshipping via API with DroFx

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec

Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh

Zuja dropshipping via API with DroFx.pptx

Sampling (random) method and Non random.ppt

Best VIP Call Girls Noida Sector 22 Call Me: 8448380779

{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...

Data-Analysis for Chicago Crime Data 2023

Accredited-Transport-Cooperatives-Jan-2021-Web.pdf

CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online

A Microservices Framework for Real-Time Model Scoring Using Structured Streaming with Vedant Jain

1. A Microservices Framework for Real time Model Scoring using Structured Streaming Vedant Jain October 2018 #SAISStreaming1

2. 2 About me • Solutions Architect @ Databricks • x-Hortonworks, JPMC

3. 3 What will we talk about? § Machine learning § Model scoring § Microservices § Structured Streaming § Demo

4. 4 Machine Learning “…what we want is a machine that can learn from experience.”

5. Types of Machine Learning • Unsupervised Learning

6. X Y Z 0.5 -3.4 9.2 -2.3 2.9 3.2 1.3 2.3 -4.5 • Supervised Learning Types of Machine Learning Activity Biking Standing Sitting Labels X Y Z 0.5 -3.4 9.2

7. X Y Z 0.5 -3.4 9.2 -2.3 2.9 3.2 1.3 2.3 -4.5 • Supervised Learning Types of Machine Learning Activity Biking Standing Sitting

8. Continuous Learning Machine Learning Process Define the problem Connect data source Acquire Data Enrich, Transform Data Feature Selection Build/Train Model Serve/Score Model

9. 9 What will we talk about? § Machine learning § Model scoring § Microservices § Structured Streaming § Demo

10. Model Scoring Propensity: Likelihood of a user to commit a certain action Affinity: How similar are two products or users etc. Lead: How closely matched lead is to target profile Attrition/Churn: Likelihood of a customer to drop a service and/or start using a competitor’s service Credit: Ability of the user to keep promise if granted access Anomaly Detection: Identification of rare or invalid transaction

11. 11 Model Scoring on “Big Data” Scale Streaming Dynamic

12. Machine Learning Pipeline 12 • Drop null values • Calculate moving averages on 10 minute window • Convert to single parquet • OneHotEncoder • RF Classifier • Cross Tabulation • K-Means Clustering • Hyperparameter Tuning • Pipelines • Deploy Pipelines to Docker • Serve the models using Python Flask • Phone/Watch • Gyroscope/Accelerometer • CSV

13. 13 What will we talk about? § Machine learning § Model scoring § Microservices § Structured Streaming § Demo

14. Microservices ● Properties: Ø Decomposition: Logic is broken down into multiple independent components Ø Isolation: Component services are deployed and maintained independently of one another ● Benefits: ü Reduced Regression Testing time ü Organizational Autonomy ü Cloud/On-premise Agnostic ü Scalability/Reusability

15. Microservices Example: Company A tracks user’s activity through smart devices and wants to provide tailored content to the users based on their behavior.

16. Microservices X Y Z Activity 0.5 -3.4 9.2 Biking -2.3 2.9 3.2 Standing 1.3 2.3 -4.5 Sitting ‘Activity’ score User Bike Stand Sit Cluster A 0.1 0.3 0.6 0 B 0.0 0.3 0.7 1 Actionable Insights Web services Target content Affinity Score

17. Microservices

18. Microservices Isolation

19. Microservices Scalability • Consistent • Scalable • Fault tolerant • Complex Event Processing

20. Microservices

21. 21 What will we talk about? § Machine learning § Model scoring § Microservices § Structured Streaming § Demo

22. Structured Streaming • High-level streaming API built on Spark SQL engine • Runs the same computation as batch queries in Datasets/DataFrames • Event time, windowing, sessions, sources & sinks • End-to-end exactly once semantics • Late Data Handling

23. 23 ML Limitations in Streaming • Many models/transformers/estimators are not supported • Limited to only models built in Spark MLLib • Not ideal for Continuous Learning 23

24. 24 What will we talk about? § Machine Learning § Model scoring § Microservices § Structured Streaming § Demo https://github.com/vedantja/eu_summit_demo