MLOps and the Feature Store
with Hopsworks
Jim Dowling
CEO, Hopsworks
DC Data Science Meetup,
Sep 14th 2021
We all take different Journeys to arrive at the Feature Store
Data Engineer
“Gotta feed those data
‘scientists’ with data”
Data Scientist
“Hello!?! Hello!?!
Is there any data out there?”
ML Engineer
And then she said
“productionize this notebook”
Feature Store
We all take different Journeys to arrive at MLOps
Data Engineer
Orchestrated Pipelines,
baby!
Data Scientist
Notebooks as Jobs, yay!
ML Engineer
Containerize, kubernetize,
observerize!
Feature Store
triggers them
Feature
Store
Feature
Engineering
Model
Training
Model
serving
Model
monitoring
Validate
& Test
Input Data
MLOps with a Feature Store
SQL or Python or Spark for Feature Engineering?
SQL Features
(Table)
DB
DB
Python
Features
(Dataframe)
Msg Bus
Files
Extract,
Aggregate,
Transform
Spark
DBT
Extract,
Aggregate,
Transform
What Feature Engineering do we typically perform where?
Aggregations,
Data Validation
Training
Data
Serving
Raw Data
Feature
Store
Model
Repo
Transformations Input Data
Need to ensure no
skew between training
and serving
transformations
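One common way to avoid this skew is to route both the training path and the serving path through the same transformation code and the same stored statistics. The sketch below illustrates that idea in plain Python; the function names and the min/max values are hypothetical and are not part of the HSFS API.

```python
# Hypothetical example: one transformation shared by training and serving.
def scale_amount(amount, min_value, max_value):
    """Min-max scale a raw amount into the range [0, 1]."""
    return (amount - min_value) / (max_value - min_value)

# Statistics computed once on the training data and reused at serving time.
TRAIN_STATS = {"min_value": 0.0, "max_value": 200.0}

def build_training_rows(raw_amounts):
    """Training path: transform a batch of raw values."""
    return [scale_amount(a, **TRAIN_STATS) for a in raw_amounts]

def build_serving_row(raw_amount):
    """Serving path: transform one raw value with the same code and stats."""
    return scale_amount(raw_amount, **TRAIN_STATS)

training_rows = build_training_rows([0.0, 50.0, 200.0])
serving_value = build_serving_row(50.0)
```

Because both paths call `scale_amount` with `TRAIN_STATS`, the same raw input always yields the same model input.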
Feature Group
Primary Key | Feature 1 | Feature M
0           | ...       | ...
1           | ...       | ...
2           | ...       | ...
...         | ...       | ...
N           | ...       | ...
import hsfs
connection = hsfs.connection(...)
fs = connection.get_feature_store()
fg_meta = fs.create_feature_group(name="sales_fg",
    version=1,
    primary_key=["store", "date", "dept"],
    event_time="ts",
    description="customer features",
    online_enabled=True)
HSFS API - Create Feature Groups
sales_fg = fs.get_feature_group("sales_fg", version=1)
df = ...  # featurize some data to ingest into the feature store
sales_fg.insert(df)
Batch insert/backfilling features into the Feature Store
Spark Streaming insertion of features into the Feature Store
sales_fg = fs.get_feature_group("sales_fg", version=1)
streaming_df = ...  # get streaming dataframe to ingest into the feature store
sales_fg.insert_stream(streaming_df)
Data Validation for Feature Groups (using Deequ)
expectation_sales = fs.create_expectation(...,
    rules=[Rule(name="HAS_MIN", level="WARNING", min=0),
           Rule(name="HAS_MAX", level="ERROR", max=1000000)])
sales_fg = fs.get_feature_group("sales_fg", version=1)
sales_fg.attach_expectation(expectation_sales)
df = ...  # get some dataframe to ingest into the feature store
# Run Data Validation Rules when data is written
sales_fg.insert(df)
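To make the rule semantics concrete, here is a minimal plain-Python sketch of min/max rules checked at write time. It does not use Deequ or HSFS: `SketchRule`, `check`, and `insert` are hypothetical names, and the behavior that a WARNING-level failure is reported while an ERROR-level failure blocks the write is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class SketchRule:
    """Hypothetical stand-in for a validation rule; not the hsfs Rule class."""
    name: str
    level: str
    min: Optional[float] = None
    max: Optional[float] = None

def check(values: List[float], rules: List[SketchRule]) -> List[Tuple[str, str]]:
    """Return (level, message) for every rule the column violates."""
    findings = []
    for rule in rules:
        if rule.name == "HAS_MIN" and min(values) < rule.min:
            findings.append((rule.level, f"minimum {min(values)} is below {rule.min}"))
        if rule.name == "HAS_MAX" and max(values) > rule.max:
            findings.append((rule.level, f"maximum {max(values)} is above {rule.max}"))
    return findings

def insert(values, rules, store):
    """Append values to the store unless an ERROR-level rule fails (assumed)."""
    findings = check(values, rules)
    if any(level == "ERROR" for level, _ in findings):
        raise ValueError(f"validation failed: {findings}")
    store.extend(values)
    return findings

rules = [SketchRule(name="HAS_MIN", level="WARNING", min=0),
         SketchRule(name="HAS_MAX", level="ERROR", max=1000000)]
store = []
warnings = insert([-5, 10, 300], rules, store)
```

Here the negative value only produces a warning, so the rows are still written; a value above 1000000 would raise and leave the store unchanged.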
On-Demand Feature Groups (External Tables)
snowflake_conn = fs.get_storage_connector("telco_snowflake_cluster")
telco_on_dmd = fs.create_on_demand_feature_group(name="telco_snowflake",
    version=latest_version,
    query="select * from telco",
    description="On-demand FG",
    storage_connector=snowflake_conn,
    statistics_config=True)
telco_on_dmd.save()
You can also use connectors to any JDBC source or S3 source or ADLS on Azure
JOIN, Transform, Filter Features to create Training Datasets
Feature Group A
Primary Key | Feature 1 | Feature M
0           | ...       | ...
1           | ...       | ...
2           | ...       | ...
...         | ...       | ...
N           | ...       | ...

Feature Group B
Primary Key | Feature 1 | Feature J
0           | ...       | ...
1           | ...       | ...
2           | ...       | ...
...         | ...       | ...
N           | ...       | ...

Transform, Filter

Training Dataset
Primary Key | Feature 1 | Feature J | LABEL (CHURN_weekly)
0           | ...       | ...       | 1
1           | ...       | ...       | 0
2           | ...       | ...       | 0
...         | ...       | ...       | ...
N           | ...       | ...       | 1
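The join-then-filter step can be illustrated without a feature store. The following plain-Python sketch joins two feature groups on their primary key, filters the rows, and splits features from the label; all feature names and values are made up for illustration.

```python
# Hypothetical feature groups, keyed by primary key.
feature_group_a = {
    0: {"feature_1": 0.3, "feature_m": 12},
    1: {"feature_1": 0.8, "feature_m": 7},
    2: {"feature_1": 0.5, "feature_m": 3},
}
feature_group_b = {
    0: {"feature_j": 4, "churn_weekly": 1},
    1: {"feature_j": 9, "churn_weekly": 0},
    3: {"feature_j": 2, "churn_weekly": 0},
}

def join_on_primary_key(left, right):
    """Inner join: keep keys present in both groups and merge their columns."""
    return {pk: {**left[pk], **right[pk]} for pk in left if pk in right}

def to_training_rows(joined, feature_names, label_name, keep=lambda row: True):
    """Apply a row filter, then return (features, label) pairs."""
    rows = []
    for pk in sorted(joined):
        row = joined[pk]
        if keep(row):
            rows.append(([row[name] for name in feature_names], row[label_name]))
    return rows

joined = join_on_primary_key(feature_group_a, feature_group_b)
training_rows = to_training_rows(
    joined,
    feature_names=["feature_1", "feature_j"],
    label_name="churn_weekly",
    keep=lambda row: row["feature_m"] > 5,
)
```

Only primary keys 0 and 1 exist in both groups, and both pass the filter, so the training dataset has two rows.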
HSFS API - Transformation Functions
# Store in a Python module. More than one transformation function per file is allowed.
from datetime import datetime

def date_string_to_timestamp(date):
    date_format = "%Y%m%d%H%M%S"
    return int(float(datetime.strptime(date, date_format).timestamp()) * 1000)
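As a quick usage sketch, the transformation function can be called directly on date strings in the `%Y%m%d%H%M%S` format. Note that `strptime` returns a naive datetime here, so `timestamp()` interprets it in the local timezone; the absolute value therefore depends on where the code runs, while differences between two timestamps do not. The sample date strings are illustrative.

```python
from datetime import datetime

def date_string_to_timestamp(date):
    """Convert a 'YYYYmmddHHMMSS' string to epoch milliseconds (local time)."""
    date_format = "%Y%m%d%H%M%S"
    return int(float(datetime.strptime(date, date_format).timestamp()) * 1000)

# Two example inputs one minute apart.
start = date_string_to_timestamp("20210601210400")
one_minute_later = date_string_to_timestamp("20210601210500")
elapsed_ms = one_minute_later - start
```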
HSFS API - Create Training Datasets with Transformations
date_string_2_ts = fs.create_transformation_function(
    transformation_function=python_file.date_string_to_timestamp,
    output_type="long", version=1)
# JOIN the features together, then filter (filter written in the slide's shorthand)
query = sales_fg.select_all().join(exogeneous_fg.select(["fuel_price", "cpi"])) \
    .filter(state="DC")
td = fs.create_training_dataset(name="sales_dc_td",
    description="Dataset to train the Sales model for DC",
    data_format="tfrecord",
    transformation_functions={"sale_ts": date_string_2_ts},
    version=1,
    label=["label_col"])
td.save(query)
Feature Store
Batch
Inference
Report
Model
Serving Feature Store
Latency and availability are critical for user experience
High throughput important, latency not critical
Analytical Models Operational Models
Feature Vectors
Models retrieve pre-computed features (Feature Vectors) from the Feature Store
Primary Key | Feature 1 ... Feature N | CHURN_weekly
ID          | ... ...                 | N/A
From App    | From Feature Store      | No Label - Predict it
Lookup Features from Feature Store using “ID”
Note: these are the same features as in the Training Dataset, minus the label
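The lookup can be sketched as follows: the application supplies the ID plus any request-time values, the pre-computed features come from an online store, and the two are merged into a feature vector in the order used at training time. This is a plain-Python illustration with a dictionary standing in for the online feature store; `ONLINE_STORE`, `FEATURE_ORDER`, `build_feature_vector`, and the feature names are all hypothetical.

```python
# Dictionary standing in for the online feature store (hypothetical data).
ONLINE_STORE = {
    "user_42": {"avg_spend_7d": 31.5, "num_logins_7d": 4},
}

# Feature order must match the training dataset, minus the label.
FEATURE_ORDER = ["transaction_amount", "avg_spend_7d", "num_logins_7d"]

def build_feature_vector(entity_id, request_features):
    """Merge features sent by the app with features looked up by ID."""
    stored = ONLINE_STORE[entity_id]
    merged = {**stored, **request_features}
    return [merged[name] for name in FEATURE_ORDER]

vector = build_feature_vector("user_42", {"transaction_amount": 99.0})
```

The resulting list is what would be passed to the model; no label is present because the label is what the model predicts.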
HSFS API - Serving
td = fs.get_training_dataset("sales_dc_td", version=1)
td.init_prepared_statement()
# online transformation functions are transparently applied before returning
prediction_array = td.get_serving_vector({"date": "2021-06-01 21:04:00"})
# call model with 'prediction_array' as input
transaction_type
transaction_amount
user_id
user_nationality
user_gender
transactions_fg
users_fg
Feature Groups Training
Datasets
pk join
fraud_td
Descriptive
Statistics,
Feature
Correlations,
Histograms
...
Use for Drift
Detection
fraud_classifier
Models
Training Data
Features Models
Raw
Data
From Raw Data to Production Models in Hopsworks
Provenance Graph of Dependencies
Feature Groups Models
Training Datasets
Changes in upstream entities trigger actions that can cause downstream computations to run
Upstream Downstream
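A provenance graph like this can be modeled as a mapping from each entity to its direct downstream dependents; finding everything affected by a change is then a breadth-first traversal. The sketch below uses the entity names from the fraud example in this deck, but the graph representation and the `affected_by` function are illustrative assumptions, not the Hopsworks provenance API.

```python
from collections import deque

# Direct downstream dependents of each entity (names from the fraud example).
DOWNSTREAM = {
    "transactions_fg": ["fraud_td"],
    "users_fg": ["fraud_td"],
    "fraud_td": ["fraud_classifier"],
    "fraud_classifier": [],
}

def affected_by(changed):
    """Return downstream entities to recompute, nearest first (BFS order)."""
    seen, order, queue = set(), [], deque([changed])
    while queue:
        node = queue.popleft()
        for child in DOWNSTREAM.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order

to_recompute = affected_by("users_fg")
```

A change to `users_fg` would trigger a rebuild of `fraud_td` and then a retraining of `fraud_classifier`, while a change to the model itself triggers nothing further downstream.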
MLOps is Feature Pipelines, Training Pipelines, and Model Monitoring
transaction_type
transaction_amount
user_id
user_nationality
user_gender
transactions_fg
users_fg
Feature Groups Training
Datasets
pk join
fraud_td
Descriptive
Statistics,
Feature
Correlations,
Histograms
...
Use for Drift
Detection
fraud_classifier
Models
Feature Pipeline Training Pipeline
Model
Monitoring
Feature
Store
Feature
Engineering
Model
Training
Model
serving
Model
monitoring
ML Engineers
Data Scientists
Model
Testing
Data Engineers
Architects (Governance)
Roles and Responsibilities in a ML Pipeline
CI/CD Triggers and Orchestration of Pipelines in MLOps
Enterprise
Data
Model
Registry
Feature
Pipeline
Model
Serving
Training
Pipeline
Feature
Store
Orchestrator: Airflow, Github Actions, Jenkins
CI/CD Triggers: Code commit, New data, time trigger (e.g., daily)
Model
Monitoring
Orchestrate Feature and Training Pipelines with Airflow in Hopsworks
Feature Engineering
Notebook/Job
Validate on Data Slices
& Deploy Model
Run Experiment
to Train Model
Select Features, File Format
and Create Training Data
FEATURE
STORE
Data Science
Data Engineering Compliance & Regulatory
Feature Store
Teams use the tools of their choice,
integrated with the
Hopsworks Feature Store
Model Serving
Hopsworks is an Open, Modular Feature Store that can Plug into ML Pipelines
Feature Pipeline
Feature Store
Batch or Streaming
Feature Pipeline
Enterprise Datastores
Aggregations
Data Validation
Training Pipeline
Model
architecture
Select
target,
features
Find best
HParams
Train model
(distributed)
Validate
Model
Deploy
Model
Feature Store
Maggy - Experiments, Distributed ML, and write-once training logic
https://www.youtube.com/watch?v=1SHOwl37I5c
KubeFlow Model Serving (KFServing), the Feature Store, and Logging to Kafka
Local Remote
AI-Enabled
Application
KFServing Feature Store
1. Prediction Request
2. Request Features
3. Return Enriched Feature Vector
4. Predict, Log, & Return Result
class Transformer:
    def __init__(self):
        self.fs = ...  # connect to feature store
        self.td = self.fs.get_training_dataset("sales_dc_td")

    def preprocess(self, inputs):
        return self.td.get_serving_vector(inputs["some-key"])
2. Request Features from inside the KFServing Transformer
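The four numbered steps can be simulated end to end with stubs. In this sketch a small in-memory class stands in for the training dataset handle, a trivial function stands in for the model, and a Python list stands in for the Kafka log; `StubTrainingDataset`, `predict`, `handle_request`, and the `store` key are hypothetical names used only for illustration.

```python
class StubTrainingDataset:
    """In-memory stand-in for a feature store training dataset handle."""
    def __init__(self, rows):
        self._rows = rows

    def get_serving_vector(self, entry):
        return list(self._rows[entry["store"]])

class Transformer:
    def __init__(self, td):
        self.td = td

    def preprocess(self, inputs):
        # Steps 2 and 3: request features and return the enriched vector.
        return self.td.get_serving_vector(inputs)

def predict(vector):
    """Stub model: a real deployment would call the served model here."""
    return sum(vector)

def handle_request(transformer, inputs, log):
    """Step 1 arrives here; step 4 predicts, logs, and returns the result."""
    vector = transformer.preprocess(inputs)
    prediction = predict(vector)
    log.append({"inputs": inputs, "features": vector, "prediction": prediction})
    return prediction

prediction_log = []  # stands in for the Kafka topic
transformer = Transformer(StubTrainingDataset({"dc_1": [1.5, 2.0]}))
result = handle_request(transformer, {"store": "dc_1"}, prediction_log)
```

Logging the inputs, the enriched features, and the prediction together is what makes later monitoring possible.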
Kafka
Model Monitoring from KFServing Logs
Usage example
Windowed Outliers
Pipe
Windowed Drift Pipe
Stats Outliers Pipe
Stats Drift Pipe
Outliers Pipe
Drift Pipe
Monitor pipe Window pipe
Stats pipe
Sink Pipe
Alerts
Reports
Insights
Prediction
Requests
Kafka
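The pipe names above suggest a flow of windowing, statistics, outlier and drift checks, and a sink for alerts. A minimal sketch of that flow in plain Python follows; the z-score outlier test, the mean-shift drift test, the thresholds, and the baseline statistics are illustrative assumptions rather than the actual monitoring implementation.

```python
from statistics import mean

# Baseline statistics, assumed to come from the training dataset.
TRAINING_STATS = {"mean": 100.0, "stddev": 10.0}

def windows(values, size):
    """Window pipe: split logged values into fixed-size windows."""
    for start in range(0, len(values) - size + 1, size):
        yield values[start:start + size]

def outliers(window, stats, z_threshold=3.0):
    """Outliers pipe: values more than z_threshold stddevs from the baseline."""
    return [v for v in window
            if abs(v - stats["mean"]) / stats["stddev"] > z_threshold]

def drifted(window, stats, max_shift=1.0):
    """Drift pipe: window mean moved more than max_shift baseline stddevs."""
    return abs(mean(window) - stats["mean"]) / stats["stddev"] > max_shift

def monitor(values, size):
    """Sink pipe: collect alerts as (window index, kind, detail)."""
    alerts = []
    for index, window in enumerate(windows(values, size)):
        found = outliers(window, TRAINING_STATS)
        if found:
            alerts.append((index, "outliers", found))
        if drifted(window, TRAINING_STATS):
            alerts.append((index, "drift", mean(window)))
    return alerts

logged = [98, 101, 100, 99, 135, 100, 101, 99, 120, 122, 119, 121]
alerts = monitor(logged, size=4)
```

In this sample the second window contains a single outlier, and the third window shifts as a whole, which is reported as drift rather than as individual outliers.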
New Training Data from Prediction Logs and the Evaluation Store
Prediction
Requests
● Interactive Queries to debug the Model
● Interactive Queries to debug Inference Data
● Inspect Model KPIs Charts
● Inspect Model Serving Performance Charts
● Identify Model/Data Drift
● Interactive Queries to Audit Logs
Evaluation
Store
Feature
Store
ML Engineer
Data Scientist
● Understand Live Model Performance
● Use new Training Data
Kafka
End-to-End Example -
Anti-Money Laundering
https://github.com/logicalclocks/AMLend2end
CUSTOMER CASE STUDY SWEDBANK - ANTI-MONEY LAUNDERING (AML) WITH HOPSWORKS
THE CHALLENGE
Increase detection rate and reduce false positives and costs for AML.
GANs with
a ~40 TB
transaction dataset
Spark for Feature
Engineering
(including graph embeddings)
TensorFlow/GPUs to
train a GAN
Features, Scale-out
training, models, model
serving
Webinar, Thursday 16th, 9am PT:
https://info.nvidia.com/accelerate-financial-fraud-detection-webinar.html?ncid=so-link-610204-vt09&linkId=100000063386013
With Hopsworks, Swedbank reduced false positives by 99% compared to its previous rule-based system.
RULES-BASED AML vs DEEP LEARNING AML
CUSTOMER CASE STUDY SWEDBANK - ANTI-MONEY LAUNDERING (AML) WITH HOPSWORKS
Kafka
Teradata
Cloudera
AML
Application
Retrieve
Features
(<10 ms)
Real-Time Financial Features
Customer Credit Score / KYC
Historical Financial Transactions
Is this Money Transfer Suspicious?
Model
Train (40 TB)
The Hopsworks Feature Store is the central location where all the data (features) used by the AML application is stored and manipulated.
Hopsworks
Feature Store
Anti-Money Laundering End-to-End Example
transactions alert_transactions
party
trans_embeddings alert_trans_embeddings
training_data
user_id is the join key for party and (alert_)transactions
test_data
trans_id is the join key for (alert_)transactions and (alert_)trans_embeddings
user_id
trans_id trans_id
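The two join keys can be illustrated with a small plain-Python inner join: transactions are joined to party on `user_id`, and the result is joined to the embeddings on `trans_id`. The table and key names follow the slide; the other column names (`type`, `amount`, `embedding`) and all values are made up for illustration.

```python
# Tiny illustrative tables; only the key names come from the slide.
party = [
    {"user_id": 1, "type": "company"},
    {"user_id": 2, "type": "person"},
]
transactions = [
    {"trans_id": 10, "user_id": 1, "amount": 500.0},
    {"trans_id": 11, "user_id": 2, "amount": 20.0},
    {"trans_id": 12, "user_id": 3, "amount": 75.0},  # no matching party row
]
trans_embeddings = [
    {"trans_id": 10, "embedding": [0.1, 0.9]},
    {"trans_id": 11, "embedding": [0.7, 0.2]},
]

def inner_join(left, right, key):
    """Join two lists of dict rows on a shared key, keeping matches only."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

with_party = inner_join(transactions, party, "user_id")
training_data = inner_join(with_party, trans_embeddings, "trans_id")
```

The same two joins would apply to the `alert_transactions` and `alert_trans_embeddings` tables.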
MLOps Lifecycle with Hopsworks
Enterprise
Data
Model
Registry
Feature
Engineering
Model
Serving
Model
Training
Model
Deploy
Model
Monitoring
Log Predictions Statistics
CDC
Experiments
Feature Statistics
A/B Test
Model
Metadata
Serving
Statistics
Free-text Search,
Provenance API
RonDB
Feature Store
Elasticsearch
RonDB
Metastore
Feature Vectors
Demo
Anti-Money Laundering
https://github.com/logicalclocks/AMLend2end
Training
Development
Model Repo
Model Serving
Output
Feature
Store
Feature
Engineering
Sources
Feature
Store
Database
Application/ERP
Logs
3rd Party APIs
Object and File Storage
• • •
Dashboards
Batch Applications
Augmented Analytics
Applications
Microservices
• • •
Hopsworks - Design and Operate AI Applications
Python
Spark/SQL
Spark
Streaming
Flink
Any Python Library
HopsFS (S3 / Azure Blob Storage)
RonDB
www.hopsworks.ai
www.logicalclocks.com
github.com/logicalclocks/hopsworks
@hopsworks
@logicalclocks
Feature serving both online and batch
Offline Feature Store
OnlineFS-ClusterJ
OnlineFS-ClusterJ
HSFS
FG 2
FG 1
OnlineFS-ClusterJ
Meta Data (Avro Schema)
Online
Feature Store
Scalable stateless Online FS
upsert ingestion service
Kafka topic per Online
Feature Group
FG 1 FG 2 FG 3
Meta Data
Meta Data (Avro Schema)
Upsert
based on
Primary Key
Consume
and decode
Encode and
produce
Upsert
User/Application
fg.insert(df)
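The ingestion path in this diagram (encode and produce, consume and decode, upsert by primary key) can be sketched with standard-library stand-ins. Here a `deque` plays the role of the per-feature-group Kafka topic, JSON replaces the Avro encoding, and a dictionary replaces the online store; these substitutions and the function names are assumptions made to keep the example self-contained.

```python
import json
from collections import deque

topic = deque()    # stands in for the Kafka topic of one feature group
online_store = {}  # stands in for the online feature store table

def produce(rows):
    """Encode each row (JSON here, Avro in the diagram) and publish it."""
    for row in rows:
        topic.append(json.dumps(row).encode("utf-8"))

def consume_and_upsert(primary_key):
    """Decode each message and upsert it by primary key; latest write wins."""
    while topic:
        row = json.loads(topic.popleft().decode("utf-8"))
        key = tuple(row[name] for name in primary_key)
        online_store[key] = row

produce([
    {"store": 1, "dept": 7, "weekly_sales": 100.0},
    {"store": 2, "dept": 7, "weekly_sales": 55.0},
    {"store": 1, "dept": 7, "weekly_sales": 120.0},  # newer value, same key
])
consume_and_upsert(primary_key=["store", "dept"])
```

Because the consumer keeps no state of its own between messages, several such consumers could run in parallel, which matches the "scalable stateless" label in the diagram.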
RonDB powers the Hopsworks Platform
RonDB makes Hopsworks the only LATS Feature Store
< 1ms KV lookup
>10M KV Lookups/sec
>99.999% availability

Contenu connexe

Tendances

Feature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systemsFeature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systemsAndrzej Michałowski
 
The Feature Store in Hopsworks
The Feature Store in HopsworksThe Feature Store in Hopsworks
The Feature Store in HopsworksJim Dowling
 
Hopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AIHopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AIQAware GmbH
 
Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Jim Dowling
 
Berlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingBerlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingJim Dowling
 
Hopsworks data engineering melbourne april 2020
Hopsworks   data engineering melbourne april 2020Hopsworks   data engineering melbourne april 2020
Hopsworks data engineering melbourne april 2020Jim Dowling
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureData Science Milan
 
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...Databricks
 
Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and MaggyAsynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and MaggyJim Dowling
 
Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019
Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019
Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019Kim Hammar
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleJim Dowling
 
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlowTensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlowDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
mlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecyclemlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecycleDatabricks
 
Introducing apache prediction io (incubating) (bay area spark meetup at sales...
Introducing apache prediction io (incubating) (bay area spark meetup at sales...Introducing apache prediction io (incubating) (bay area spark meetup at sales...
Introducing apache prediction io (incubating) (bay area spark meetup at sales...Databricks
 
Building Intelligent Applications, Experimental ML with Uber’s Data Science W...
Building Intelligent Applications, Experimental ML with Uber’s Data Science W...Building Intelligent Applications, Experimental ML with Uber’s Data Science W...
Building Intelligent Applications, Experimental ML with Uber’s Data Science W...Databricks
 
Multi runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningMulti runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningStepan Pushkarev
 
Apply MLOps at Scale by H&M
Apply MLOps at Scale by H&MApply MLOps at Scale by H&M
Apply MLOps at Scale by H&MDatabricks
 

Tendances (20)

Feature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systemsFeature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systems
 
The Feature Store in Hopsworks
The Feature Store in HopsworksThe Feature Store in Hopsworks
The Feature Store in Hopsworks
 
Hopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AIHopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AI
 
Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021
 
Berlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingBerlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowling
 
Hopsworks data engineering melbourne april 2020
Hopsworks   data engineering melbourne april 2020Hopsworks   data engineering melbourne april 2020
Hopsworks data engineering melbourne april 2020
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
 
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
 
Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and MaggyAsynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
 
Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019
Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019
Hopsworks hands on_feature_store_palo_alto_kim_hammar_23_april_2019
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, Sunnyvale
 
Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
 
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlowTensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
mlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecyclemlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecycle
 
Introducing apache prediction io (incubating) (bay area spark meetup at sales...
Introducing apache prediction io (incubating) (bay area spark meetup at sales...Introducing apache prediction io (incubating) (bay area spark meetup at sales...
Introducing apache prediction io (incubating) (bay area spark meetup at sales...
 
Building Intelligent Applications, Experimental ML with Uber’s Data Science W...
Building Intelligent Applications, Experimental ML with Uber’s Data Science W...Building Intelligent Applications, Experimental ML with Uber’s Data Science W...
Building Intelligent Applications, Experimental ML with Uber’s Data Science W...
 
Multi runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningMulti runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learning
 
Apply MLOps at Scale by H&M
Apply MLOps at Scale by H&MApply MLOps at Scale by H&M
Apply MLOps at Scale by H&M
 

Similaire à Ml ops and the feature store with hopsworks, DC Data Science Meetup

KFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature StoreKFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature StoreDatabricks
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfJim Dowling
 
Hamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature StoreHamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature StoreMoritz Meister
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...James Anderson
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Databricks
 
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Provectus
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in ProductionDataWorks Summit
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Chester Chen
 
Netflix Machine Learning Infra for Recommendations - 2018
Netflix Machine Learning Infra for Recommendations - 2018Netflix Machine Learning Infra for Recommendations - 2018
Netflix Machine Learning Infra for Recommendations - 2018Karthik Murugesan
 
ML Infra for Netflix Recommendations - AI NEXTCon talk
ML Infra for Netflix Recommendations - AI NEXTCon talkML Infra for Netflix Recommendations - AI NEXTCon talk
ML Infra for Netflix Recommendations - AI NEXTCon talkFaisal Siddiqi
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...PAPIs.io
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleJim Dowling
 
PyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfPyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfJim Dowling
 
Productionalizing ML : Real Experience
Productionalizing ML : Real ExperienceProductionalizing ML : Real Experience
Productionalizing ML : Real ExperienceIhor Bobak
 
(Py)testing the Limits of Machine Learning
(Py)testing the Limits of Machine Learning(Py)testing the Limits of Machine Learning
(Py)testing the Limits of Machine LearningRebecca Bilbro
 
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...ScyllaDB
 
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...Piyush Kumar
 
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...Data Con LA
 

Similaire à Ml ops and the feature store with hopsworks, DC Data Science Meetup (20)

KFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature StoreKFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature Store
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
 
Hamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature StoreHamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature Store
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
 
Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
 
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
 
Netflix Machine Learning Infra for Recommendations - 2018
Netflix Machine Learning Infra for Recommendations - 2018Netflix Machine Learning Infra for Recommendations - 2018
Netflix Machine Learning Infra for Recommendations - 2018
 
ML Infra for Netflix Recommendations - AI NEXTCon talk
ML Infra for Netflix Recommendations - AI NEXTCon talkML Infra for Netflix Recommendations - AI NEXTCon talk
ML Infra for Netflix Recommendations - AI NEXTCon talk
 
Data Product Architectures
Data Product ArchitecturesData Product Architectures
Data Product Architectures
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData Seattle
 
PyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfPyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdf
 
Productionalizing ML : Real Experience
Productionalizing ML : Real ExperienceProductionalizing ML : Real Experience
Productionalizing ML : Real Experience
 
(Py)testing the Limits of Machine Learning
(Py)testing the Limits of Machine Learning(Py)testing the Limits of Machine Learning
(Py)testing the Limits of Machine Learning
 
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
 
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...
 
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...
 





MLOps and the Feature Store with Hopsworks, DC Data Science Meetup

  • 1. MLOps and the Feature Store with Hopsworks. Jim Dowling, CEO, Hopsworks. DC Data Science Meetup, Sep 14th 2021.
  • 2. We all take different journeys to arrive at the Feature Store. Data Engineer: “Gotta feed those data ‘scientists’ with data.” Data Scientist: “Hello!?! Hello!?! Is there any data out there?” ML Engineer: And then she said, “productionize this notebook.”
  • 3. We all take different journeys to arrive at MLOps. Data Engineer: Orchestrated pipelines, baby! Data Scientist: Notebooks as jobs, yay! ML Engineer: Containerize, kubernetize, observerize! Feature Store triggers them.
  • 5. SQL or Python or Spark for Feature Engineering? SQL Features (Table) DB DB Python Features (Dataframe) Msg Bus Files Extract, Aggregate, Transform Spark DBT Extract, Aggregate, Transform
  • 6. What Feature Engineering do we typically perform where? Aggregations, Data Validation Training Data Serving Raw Data Feature Store Model Repo Transformations Input Data Need to ensure no skew between training and serving transformations
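One common way to avoid the training/serving skew that slide 6 warns about is to define each transformation once and call the same function from both the training pipeline and the serving path. The sketch below is illustrative only: `min_max_scale`, `build_training_rows`, and `build_serving_vector` are hypothetical names, not part of HSFS, and the input values are made up.

```python
# Hypothetical sketch: one transformation shared by training and serving.

def min_max_scale(value, min_value, max_value):
    """Scale value into [0, 1] using statistics computed on the training data."""
    if max_value == min_value:
        return 0.0
    return (value - min_value) / (max_value - min_value)


def build_training_rows(raw_amounts):
    """Compute the statistics once, then transform every training row."""
    stats = {"min": min(raw_amounts), "max": max(raw_amounts)}
    rows = [min_max_scale(a, stats["min"], stats["max"]) for a in raw_amounts]
    return rows, stats


def build_serving_vector(raw_amount, stats):
    """Serving reuses the same function and the stored training statistics."""
    return min_max_scale(raw_amount, stats["min"], stats["max"])


training_rows, training_stats = build_training_rows([10.0, 20.0, 30.0])
serving_value = build_serving_vector(20.0, training_stats)
```

Because serving calls the same function with the statistics saved at training time, the value the model sees online matches what it saw for the same raw input during training.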
  • 7. Feature Group: a table with a Primary Key column (rows 0 ... N) and feature columns Feature 1 ... Feature M.
  • 8. HSFS API - Create Feature Groups
      import hsfs
      connection = hsfs.connection(...)
      fs = connection.get_feature_store()
      fg_meta = fs.create_feature_group(name="sales_fg",
                                        version=1,
                                        primary_key=['store', 'date', 'dept'],
                                        event_time="ts",
                                        description="customer features",
                                        online_enabled=True)
  • 9. Batch insert/backfilling features into the Feature Store
      sales_fg = fs.get_feature_group("sales_fg", version=1)
      df = ...  # featurize some data to ingest into the feature store
      sales_fg.insert(df)
  • 10. Spark Streaming insertion of features into the Feature Store
      sales_fg = fs.get_feature_group("sales_fg", version=1)
      streaming_df = ...  # get streaming dataframe to ingest into the feature store
      sales_fg.insert_stream(streaming_df)
  • 11. Data Validation for Feature Groups (using Deequ)
      expectation_sales = fs.create_expectation(...,
          rules=[Rule(name="HAS_MIN", level="WARNING", min=0),
                 Rule(name="HAS_MAX", level="ERROR", max=1000000)])
      sales_fg = fs.get_feature_group("sales_fg", version=1)
      sales_fg.attach_expectation(expectation_sales)
      df = ...  # get some dataframe to ingest into the feature store
      # Run Data Validation Rules when data is written
      sales_fg.insert(df)
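To make the rule semantics on slide 11 concrete, the sketch below checks one numeric column against a minimum and a maximum in plain Python. It is not the Deequ or HSFS implementation. `Rule` here is a hypothetical dataclass that only mirrors the names on the slide, and two behaviors are assumptions: that HAS_MIN and HAS_MAX bound the smallest and largest value of the column, and that a WARNING is reported while only an ERROR blocks the write.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Hypothetical stand-in for the Rule objects shown on the slide."""
    name: str                      # "HAS_MIN" or "HAS_MAX"
    level: str                     # "WARNING" or "ERROR"
    min: Optional[float] = None
    max: Optional[float] = None


def validate(values, rules):
    """Return (ok_to_write, messages) for one numeric feature column."""
    messages = []
    ok_to_write = True
    for rule in rules:
        if rule.name == "HAS_MIN":
            violated = min(values) < rule.min
        elif rule.name == "HAS_MAX":
            violated = max(values) > rule.max
        else:
            raise ValueError(f"unsupported rule: {rule.name}")
        if violated:
            messages.append(f"{rule.level}: {rule.name} violated")
            # Assumption: only ERROR-level violations block the write.
            if rule.level == "ERROR":
                ok_to_write = False
    return ok_to_write, messages


rules = [Rule(name="HAS_MIN", level="WARNING", min=0),
         Rule(name="HAS_MAX", level="ERROR", max=1000000)]

ok_low, msgs_low = validate([-5, 10, 200], rules)        # below the minimum
ok_high, msgs_high = validate([1, 2, 2000000], rules)    # above the maximum
```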
  • 12. On-Demand Feature Groups (External Tables)
      snowflake_conn = fs.get_storage_connector("telco_snowflake_cluster")
      telco_on_dmd = fs.create_on_demand_feature_group(name="telco_snowflake",
          version=latest_version,
          query="select * from telco",
          description="On-demand FG",
          storage_connector=snowflake_conn,
          statistics_config=True)
      telco_on_dmd.save()
    You can also use connectors to any JDBC source, S3 source, or ADLS on Azure.
  • 13. JOIN, Transform, Filter Features to create Training Datasets. Feature Group A (Primary Key, Feature 1 ... Feature M) and Feature Group B (Primary Key, Feature 1 ... Feature J) are joined, transformed, and filtered into a Training Dataset (Primary Key, Feature 1 ... Feature J, LABEL (CHURN_weekly)).
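The join, filter, and label-selection steps on slide 13 can be illustrated without a feature store. The sketch below uses plain dictionaries keyed by primary key. The feature names (`tenure`, `plan`, `support_calls`), the values, and the helper functions are invented for illustration and are not HSFS calls.

```python
# Hypothetical feature groups keyed by primary key.
feature_group_a = {
    0: {"tenure": 12, "plan": "basic"},
    1: {"tenure": 3, "plan": "premium"},
    2: {"tenure": 40, "plan": "basic"},
}
feature_group_b = {
    0: {"support_calls": 1, "churn_weekly": 1},
    1: {"support_calls": 0, "churn_weekly": 0},
    3: {"support_calls": 7, "churn_weekly": 1},
}


def join_on_primary_key(left, right):
    """Inner join: keep only primary keys present in both feature groups."""
    return {pk: {**left[pk], **right[pk]} for pk in left if pk in right}


def to_training_dataset(joined, label, row_filter):
    """Filter rows, then split each row into (features, label)."""
    dataset = []
    for pk in sorted(joined):
        row = joined[pk]
        if row_filter(row):
            features = {k: v for k, v in row.items() if k != label}
            dataset.append((features, row[label]))
    return dataset


joined = join_on_primary_key(feature_group_a, feature_group_b)
training_dataset = to_training_dataset(
    joined, label="churn_weekly", row_filter=lambda r: r["plan"] == "basic")
```

Only primary keys 0 and 1 appear in both groups, and the filter then keeps the single "basic" row, so the resulting dataset holds one (features, label) pair.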
  • 14. HSFS API - Transformation Functions
      # Store in a Python module. More than 1 transformation fn per file is allowed.
      from datetime import datetime
      def date_string_to_timestamp(date):
          date_format = "%Y%m%d%H%M%S"
          return int(float(datetime.strptime(date, date_format).timestamp()) * 1000)
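Note that `datetime.strptime` returns a naive datetime, so `.timestamp()` interprets it in the local timezone of the machine running `date_string_to_timestamp`. If training and serving run in different timezones, the same string could map to different values. A possible variant, assuming the input strings are in UTC (an assumption, not something the slide states), pins the timezone explicitly. The function name `date_string_to_timestamp_utc` is hypothetical.

```python
from datetime import datetime, timezone


def date_string_to_timestamp_utc(date):
    """Parse 'YYYYMMDDHHMMSS' as UTC and return epoch milliseconds."""
    date_format = "%Y%m%d%H%M%S"
    # Assumption: the string is in UTC, so attach that timezone explicitly.
    parsed = datetime.strptime(date, date_format).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


# One second after the Unix epoch.
epoch_plus_one_second = date_string_to_timestamp_utc("19700101000001")
```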
  • 15. HSFS API - Create Training Datasets with Transformations
      date_string_2_ts = fs.create_transformation_function(
          transformation_function=python_file.date_string_to_timestamp,
          output_type="long",
          version=1)
      # JOIN the features together
      query = sales_fg.select_all().join(exogeneous_fg.select(['fuel_price', 'cpi'])).filter(state="DC")
      td = fs.create_training_dataset(name="sales_dc_td",
          description="Dataset to train the Sales model for DC",
          data_format="tfrecord",
          transformation_functions={"sale_ts": date_string_2_ts},
          version=1,
          label=["label_col"])
      td.save(query)
  • 16. Analytical models (Feature Store, batch inference, report): high throughput is important, latency is not critical. Operational models (Feature Store, feature vectors, model serving): latency and availability are critical for user experience.
  • 17. Models retrieve pre-computed features (Feature Vectors) from the Feature Store. The feature vector has the columns Primary Key (ID), Feature 1 ... Feature N, and CHURN_weekly (N/A). Some values come from the app and the rest from the Feature Store; there is no label, so the model predicts it. Look up features from the Feature Store using “ID”. Note: these are the same features as in the Training Dataset, minus the label.
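The lookup on slide 17 can be sketched as a merge of two sources: pre-computed features fetched by ID and values supplied by the application at request time. Everything below is a hypothetical stand-in. `ONLINE_STORE` is an in-memory dictionary rather than a real online feature store, and `build_feature_vector`, `FEATURE_ORDER`, and the feature names are invented for illustration.

```python
# Hypothetical in-memory stand-in for an online feature store, keyed by ID.
ONLINE_STORE = {
    "cust-42": {"avg_weekly_spend": 31.5, "support_calls": 2},
}

# The feature order the model was trained with (label excluded).
FEATURE_ORDER = ["avg_weekly_spend", "support_calls", "basket_value"]


def build_feature_vector(entity_id, app_features):
    """Merge pre-computed features with values supplied by the application."""
    precomputed = ONLINE_STORE[entity_id]      # lookup by primary key
    merged = {**precomputed, **app_features}   # add the request-time values
    return [merged[name] for name in FEATURE_ORDER]


vector = build_feature_vector("cust-42", {"basket_value": 12.0})
```

Keeping the feature order in one place makes sure the vector handed to the model matches the column order of the training dataset.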
  • 18. HSFS API - Serving
      td = fs.get_training_dataset("sales_dc_td", version=1)
      td.init_prepared_statement()
      # online transformation functions are transparently applied before returning
      prediction_array = td.get_serving_vector({"date": "2021-06-01 21:04:00"})
      # call model with 'prediction_array' as input
  • 19. transaction_type transaction_amount user_id user_nationality user_gender transactions_fg users_fg Feature Groups Training Datasets pk join fraud_td Descriptive Statistics, Feature Correlations, Histograms ... Use for Drift Detection fraud_classifier Models Training Data Features Models Raw Data From Raw Data to Production Models in Hopsworks
  • 20. Provenance Graph of Dependencies Feature Groups Models Training Datasets Changes in upstream entities trigger actions that can cause downstream computations to run Upstream Downstream
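As a rough illustration of how a change to an upstream entity can determine which downstream computations to run, the sketch below walks a small dependency graph breadth-first. The entity names are taken from the fraud example on slide 19. The `DOWNSTREAM` mapping and the `affected_by` function are hypothetical and do not represent the Hopsworks provenance API.

```python
from collections import deque

# Hypothetical provenance graph: each entity maps to its direct downstream entities.
DOWNSTREAM = {
    "transactions_fg": ["fraud_td"],
    "users_fg": ["fraud_td"],
    "fraud_td": ["fraud_classifier"],
    "fraud_classifier": [],
}


def affected_by(changed_entity):
    """Breadth-first walk returning every entity downstream of a change."""
    seen = []
    queue = deque(DOWNSTREAM[changed_entity])
    while queue:
        entity = queue.popleft()
        if entity not in seen:
            seen.append(entity)
            queue.extend(DOWNSTREAM[entity])
    return seen


to_recompute = affected_by("users_fg")
```

A change to `users_fg` marks the training dataset first and then the model, which is the order in which an orchestrator would need to re-run them.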
  • 21. MLOps is Feature Pipelines, Training Pipelines, and Model Monitoring transaction_type transaction_amount user_id user_nationality user_gender transactions_fg users_fg Feature Groups Training Datasets pk join fraud_td Descriptive Statistics, Feature Correlations, Histograms ... Use for Drift Detection fraud_classifier Models Feature Pipeline Training Pipeline Model Monitoring
  • 23. CI/CD Triggers and Orchestration of Pipelines in MLOps Enterprise Data Model Registry Feature Pipeline Model Serving Training Pipeline Feature Store Orchestrator: Airflow, Github Actions, Jenkins CI/CD Triggers: Code commit, New data, time trigger (e.g., daily) Model Monitoring
  • 24. Orchestrate Feature and Training Pipelines with Airflow in Hopsworks Feature Engineering Notebook/Job Validate on Data Slices & Deploy Model Run Experiment to Train Model Select Features, File Format and Create Training Data FEATURE STORE
  • 25. Data Science Data Engineering Compliance & Regulatory Feature Store Teams use the tools of their choice, integrated with the Hopsworks Feature Store Model Serving Hopsworks is an Open, Modular Feature Store that can Plug into ML Pipelines
  • 26. Feature Pipeline Feature Store Batch or Streaming Feature Pipeline Enterprise Datastores Aggregations Data Validation
  • 27. Training Pipeline Model architecture Select target, features Find best HParams Train model (distributed) Validate Model Deploy Model Feature Store Maggy - Experiments, Distributed ML, and write-once training logic https://www.youtube.com/watch?v=1SHOwl37I5c
  • 28. KubeFlow Model Serving (KFServing), the Feature Store, and Logging to Kafka. Diagram labels: Local, Remote, AI-Enabled Application, KFServing, Feature Store, Kafka. Steps: 1. Prediction Request 2. Request Features 3. Return Enriched Feature Vector 4. Predict, Log, & Return Result. Step 2, Request Features, happens inside the KFServing Transformer:
      class Transformer:
          def __init__(self):
              self.fs = ...  # connect to feature store
              self.td = self.fs.get_training_dataset("sales_dc_td")
          def preprocess(self, inputs):
              return self.td.get_serving_vector(inputs["some-key"])
  • 29. Model Monitoring from KFServing Logs Usage example Windowed Outliers Pipe Windowed Drift Pipe Stats Outliers Pipe Stats Drift Pipe Outliers Pipe Drift Pipe Monitor pipe Window pipe Stats pipe Sink Pipe Alerts Reports Insights Prediction Requests Kafka
  • 30. New Training Data from Prediction Logs and the Evaluation Store Prediction Requests ● Interactive Queries to debug the Model ● Interactive Queries to debug Inference Data ● Inspect Model KPIs Charts ● Inspect Model Serving Performance Charts ● Identify Model/Data Drift ● Interactive Queries to Audit Logs Evaluation Store Feature Store ML Engineer Data Scientist ● Understand Live Model Performance ● Use new Training Data Kafka
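The window, stats, and drift stages named on slide 29 can be approximated in a few lines. This is a simplified sketch, not the Hopsworks monitoring pipes: it assumes fixed-size, non-overlapping windows and flags drift when a window mean moves more than a chosen number of training standard deviations from the training mean. The function names, threshold, and data are all hypothetical.

```python
from statistics import mean


def windows(values, size):
    """Split logged values into consecutive, non-overlapping fixed-size windows."""
    return [values[i:i + size] for i in range(0, len(values) - size + 1, size)]


def drifted(window, train_mean, train_std, threshold=3.0):
    """Flag a window whose mean is more than `threshold` training std devs away."""
    return abs(mean(window) - train_mean) > threshold * train_std


# Hypothetical logged feature values read from the prediction log.
logged_amounts = [10, 11, 9, 10, 50, 52, 49, 51]
flags = [drifted(w, train_mean=10.0, train_std=1.0)
         for w in windows(logged_amounts, 4)]
```

The first window stays close to the training mean, while the second is far outside it and would raise an alert.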
  • 31. End-to-End Example - Anti-Money Laundering https://github.com/logicalclocks/AMLend2end
  • 32. CUSTOMER CASE STUDY: SWEDBANK - ANTI-MONEY LAUNDERING (AML) WITH HOPSWORKS. THE CHALLENGE: Increase detection rate and reduce false positives and costs for AML. GANs with a ~40 TB transaction dataset. Spark for Feature Engineering (including graph embeddings). TensorFlow/GPUs to train a GAN. Features, scale-out training, models, model serving. Webinar, Thursday 16th, 9am PT: https://info.nvidia.com/accelerate-financial-fraud-detection-webinar.html?ncid=so-link-610204-vt09&linkId=100000063386013 With Hopsworks, Swedbank reduced false positives by 99% compared to their previous rule-based system.
  • 33. RULE-BASED AML vs DEEP LEARNING AML
  • 34. CUSTOMER CASE STUDY: SWEDBANK - ANTI-MONEY LAUNDERING (AML) WITH HOPSWORKS. Kafka Teradata Cloudera AML Application Retrieve Features (<10 ms) Real-Time Financial Features Customer Credit Score / KYC Historical Financial Transactions Is this Money Transfer Suspicious? Model Train (40 TB) Hopsworks Feature Store. The Hopsworks Feature Store is the central location where all the data (features) are stored and manipulated to be used for the AML application.
  • 35. Anti-Money Laundering End-to-End Example transactions alert_transactions party trans_embeddings alert_trans_embeddings training_data user_id is the join key for party and (alert_)transactions test_data trans_id is the join key for (alert_)transactions and (alert_)trans_embeddings user_id trans_id trans_id
  • 36. MLOps Lifecycle with Hopsworks Enterprise Data Model Registry Feature Engineering Model Serving Model Training Model Deploy Model Monitoring Log Predictions Statistics CDC Experiments Feature Statistics A/B Test Model Metadata Serving Statistics Free-text Search, Provenance API RonDB Feature Store Elasticsearch RonDB Metastore Feature Vectors
  • 38. Training Development Model Repo Model Serving Output Feature Store Feature Engineering Sources Feature Store Database Application/ERP Logs 3rd Party APIs Object and File Storage • • • Dashboards Batch Applications Augmented Analytics Applications Microservices • • • Hopsworks - Design and Operate AI Applications Python Spark/SQL Spark Streaming Flink Any Python Library HopsFS (S3 / Azure Blob Storage) RonDB
  • 41. Feature serving both online and batch Offline Feature Store OnlineFS-ClusterJ OnlineFS-ClusterJ HSFS FG 2 FG 1 OnlineFS-ClusterJ Meta Data (Avro Schema) Online Feature Store Scalable stateless Online FS upsert ingestion service Kafka topic per Online Feature Group FG 1 FG 2 FG 3 Meta Data Meta Data (Avro Schema) Upsert based on Primary Key Consume and decode Encode and produce Upsert User/Application fg.insert(df)
  • 42. RonDB powers the Hopsworks Platform. RonDB makes Hopsworks the only LATS Feature Store: < 1ms KV lookup, >10M KV lookups/sec, >99.999% availability.
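The "upsert based on primary key" step on slide 41 means the online store keeps one row per primary key, with later writes replacing earlier ones. The sketch below shows only that semantics with a dictionary. The Kafka topic, Avro encoding, and ClusterJ writer are left out, and `upsert`, `online_fg`, and the rows are hypothetical.

```python
def upsert(table, rows, primary_key):
    """Insert each row, or overwrite the existing row with the same primary key."""
    for row in rows:
        key = tuple(row[column] for column in primary_key)
        table[key] = row
    return table


# Hypothetical online feature group, keyed by (store, dept).
online_fg = {}
upsert(online_fg, [{"store": 1, "dept": 7, "sales": 100.0}], ["store", "dept"])
upsert(online_fg, [{"store": 1, "dept": 7, "sales": 120.0},
                   {"store": 2, "dept": 7, "sales": 80.0}], ["store", "dept"])
```

After the second call the table holds two rows, and the row for store 1, dept 7 carries the newer sales value.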