Scaling Data and ML with Feast and Apache Spark
Willem Pienaar
Data Science Platform Lead
Agenda
▪ Overview
▪ Data challenges in production ML
▪ What is Feast?
▪ Getting data into Feast
▪ Feature serving
▪ Feature statistics and validation
▪ Takeaways
▪ The road ahead
Gojek
■ Ride hailing
■ Food delivery
■ Digital payments
■ Logistics
■ Lifestyle services
100m+ app downloads
500k+ merchants
4 countries: Indonesia, Singapore, Thailand, Vietnam
1m+ drivers
100m+ monthly bookings
Machine learning at Gojek
■ Matchmaking
■ Dynamic pricing
■ Routing
■ Recommendation systems
■ Incentive optimization
■ Supply positioning
■ Fraud prevention
Machine learning life cycle prior to Feast
[Diagram labels: Jupyter Notebook, Features, "??", Model Serving, Production System]
Machine learning life cycle prior to Feast
[Diagram labels: Streams, Stream Processing, Data Lake, Spark, Transform Data, Train Model, Deploy Model, Model Serving, Production System, Features]
Problems with end-to-end ML systems
● Monolithic end-to-end systems are hard to iterate on
● Training code needs to be rewritten for serving
● Training and serving features are inconsistent
● Data quality monitoring and validation is absent
● Lack of feature reuse and sharing
Feast is a system that attempts to solve the key data challenges with productionizing machine learning
Feast background
▪ Feature store was a collaboration between Gojek and Google Cloud
▪ Open-sourced in January 2019
▪ Community driven with adoption/contribution from multiple tech companies
Machine learning life cycle with Feast
[Diagram labels: Streaming Data, Stream Processing, Data Lake, Spark, Model Serving, Production System; stages: Create Features, Train Model, Serve Model; "Feast" appears three times in the diagram]
What is Feast?
Feast is an ML-specific data system that attempts to solve the key challenges with productionizing ML
▪ Manages ingestion and storage of both streaming and batch data
▪ Allows for standardized definitions of features regardless of environment
▪ Encourages sharing and re-use of features through semantic references
▪ Ensures data consistency between training and serving
▪ Provides a point-in-time correct view of features for model training
▪ Ensures model performance by tracking, validating, and monitoring features
What is Feast not?
▪ A workflow scheduler (Airflow, Luigi)
▪ Just a data warehouse or data lake (Hive, BigQuery, Snowflake)
▪ A data transformation/processing tool (Pandas, Spark, DBT)
▪ A data discovery or cataloguing system (Amundsen, DataHub)
▪ Data version control or lineage (Dolt, Pachyderm)
▪ Model serving or metadata tracking (KFServing, Seldon, MLflow)
Getting data into Feast
Create entities and features using feature sets
name: driver_weekly
entities:
  - name: driver_id
    valueType: INT64
features:
  - name: acc_rate
    valueType: FLOAT
  - name: conv_rate
    valueType: FLOAT
  - name: avg_daily_trips
    valueType: FLOAT
▪ Feature sets allow for the definition of entities and features and their associated properties
▪ Allows for bulk definition of features as they occur in a data source, e.g., Kafka
▪ Feature sets are not a grouping for serving features
Ingesting a DataFrame into Feast
import pandas as pd
from feast import FeatureSet

# feast_client is assumed to be an already-configured Feast client
# Load dataframe
driver_df = pd.read_csv("driver_weekly_data.csv")
# Create feature set from dataframe
driver_fs = FeatureSet("driver_weekly")
driver_fs.infer_fields_from_df(driver_df)
# Register driver feature set
feast_client.apply(driver_fs)
# Load feature data into Feast
feast_client.ingest(driver_fs, driver_df)
name: driver_weekly
entities:
  - name: driver_id
    valueType: INT64
features:
  - name: acc_rate
    valueType: FLOAT
  - name: conv_rate
    valueType: FLOAT
  - name: avg_daily_trips
    valueType: FLOAT
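The idea of inferring a feature set from tabular data can be sketched in plain Python. This is only an illustration under stated assumptions, not Feast's `infer_fields_from_df`: the helper `infer_feature_set`, the sample row, and the rule that entity columns are named explicitly by the caller are all hypothetical.

```python
def infer_feature_set(name, rows, entity_columns):
    """Build a feature-set-like dict from a list of row dicts (hypothetical helper)."""
    # Map Python types to the value type names used in the spec above.
    type_names = {int: "INT64", float: "FLOAT", str: "STRING"}
    first = rows[0]
    spec = {"name": name, "entities": [], "features": []}
    for column, value in first.items():
        field = {"name": column, "valueType": type_names[type(value)]}
        # Columns named by the caller become entities, the rest features.
        key = "entities" if column in entity_columns else "features"
        spec[key].append(field)
    return spec


# Hypothetical sample row shaped like the driver_weekly data.
rows = [{"driver_id": 1001, "acc_rate": 0.9, "conv_rate": 0.4, "avg_daily_trips": 12.0}]
spec = infer_feature_set("driver_weekly", rows, entity_columns={"driver_id"})
```

The resulting `spec` has the same shape as the `driver_weekly` definition: one INT64 entity and three FLOAT features.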
Ingesting streams into Feast
from feast import Entity, Feature, FeatureSet, ValueType
from feast.source import KafkaSource

# Create feature set from a Kafka stream
driver_stream_fs = FeatureSet(
    name="driver_stream",
    entities=[Entity(name="driver_id", dtype=ValueType.INT64)],
    features=[Feature(name="trips_today", dtype=ValueType.INT64)],
    source=KafkaSource(brokers="kafka:9092", topic="driver-stream-topic"),
)
# Register driver stream feature set
feast_client.apply(driver_stream_fs)
Events on stream
What happens to the data?
Stream
Data Warehouse
Ingestion layer
(Apache Beam)
Data Lake
Jupyter Notebook
Historical Feature
Store
Online Feature Storage
(Redis, Cassandra)
Feast Serving
Feast Core
● Registry of features and entities
● Manages ingestion jobs
● Allows for search and discovery of features
● Allows for generation of feature statistics
● Retrieve point-in-time correct training datasets
● Retrieve consistent online features at low latency
● Unified ingestion ensures online/historical consistency
● Provides feature schema based statistics and alerting
Your data Ingestion Storage Serving Production
Model Training
Model Serving
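How a single ingestion path can keep an online store and a historical store consistent can be sketched with a toy in-memory model. This is an assumption-laden illustration, not Feast's ingestion layer: the class `FeatureStoreSketch`, its methods, and the integer timestamps are hypothetical.

```python
class FeatureStoreSketch:
    """Toy in-memory stand-in for an online store plus a historical store."""

    def __init__(self):
        self.historical = []  # append-only log of every ingested row
        self.online = {}      # latest row per entity key

    def ingest(self, entity_id, event_ts, features):
        # A single ingestion path writes to both stores.
        self.historical.append((entity_id, event_ts, dict(features)))
        current = self.online.get(entity_id)
        # Only overwrite the online value if this row is not older.
        if current is None or event_ts >= current[0]:
            self.online[entity_id] = (event_ts, dict(features))

    def get_online(self, entity_id):
        entry = self.online.get(entity_id)
        return entry[1] if entry else None


store = FeatureStoreSketch()
store.ingest(1001, 20, {"trips_today": 7})
store.ingest(1001, 10, {"trips_today": 3})  # late-arriving, older event
```

The historical log keeps both rows for later training queries, while the online view keeps only the most recent value per entity.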
Feature serving
Feature references and retrieval
Feast Serving
Model Training
features = [
avg_daily_trips,
conv_rate,
acc_rate,
trips_today,
target
]
Training
Dataset
Feast Serving
Model Serving
Online
features
< 10ms
■ Each feature is identified through a feature reference
■ Feature references allow clients to request either online or historical feature data from Feast
■ Models have a single consistent view of features in both training and serving
■ Feature references are persisted with model binaries, allowing full automation of online serving
features = [
avg_daily_trips,
conv_rate,
acc_rate,
trips_today
]
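One way to persist feature references next to a model binary is sketched below. The slides do not say how this is done in practice, so the file names `model.pkl` and `features.json`, the helpers `save_model` and `load_model`, and the stand-in model object are all assumptions made for illustration.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_model(directory, model, feature_refs):
    """Store the model binary and its feature references side by side."""
    directory = Path(directory)
    (directory / "model.pkl").write_bytes(pickle.dumps(model))
    (directory / "features.json").write_text(json.dumps(feature_refs))


def load_model(directory):
    """Load the model together with the feature references it was trained on."""
    directory = Path(directory)
    model = pickle.loads((directory / "model.pkl").read_bytes())
    feature_refs = json.loads((directory / "features.json").read_text())
    return model, feature_refs


refs = ["avg_daily_trips", "conv_rate", "acc_rate", "trips_today"]
with tempfile.TemporaryDirectory() as tmp:
    # A plain dict stands in for a trained model in this sketch.
    save_model(tmp, {"weights": [0.1, 0.2, 0.3, 0.4]}, refs)
    loaded_model, loaded_refs = load_model(tmp)
```

A serving process that loads the model this way knows which features to request without any hand-written lookup code.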
Events throughout time
[Figure: timeline of feature values (Acceptance rate, Average daily trips, Conversion rate, Trips Today) against time, with Rider booking and Booking outcome events; annotations: "Prediction made here" and "Outcome of prediction"]
Ensuring point-in-time correctness
[Figure: timeline of feature values (Acceptance rate, Average daily trips, Conversion rate, Trips Today) against time, with Rider booking and Booking outcome events; annotations: "Prediction made here", "Outcome of prediction", and "Point-in-time joins"]
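A point-in-time join can be sketched in plain Python as follows. This is an illustration only, not Feast's implementation: the function `point_in_time_join`, the sample rows, and the integer timestamps are hypothetical. For each entity row it picks the most recent feature value at or before the row's event timestamp, so no value recorded after the prediction leaks into the training data.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(entity_rows, feature_rows):
    """Attach to each entity row the latest feature value known at its timestamp.

    entity_rows: list of (driver_id, event_ts)
    feature_rows: list of (driver_id, feature_ts, value)
    """
    # Group feature observations per driver, sorted by timestamp.
    history = defaultdict(list)
    for driver_id, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        history[driver_id].append((ts, value))

    joined = []
    for driver_id, event_ts in entity_rows:
        observations = history.get(driver_id, [])
        timestamps = [ts for ts, _ in observations]
        # Number of observations with a timestamp at or before event_ts.
        idx = bisect_right(timestamps, event_ts)
        value = observations[idx - 1][1] if idx > 0 else None
        joined.append((driver_id, event_ts, value))
    return joined


# Hypothetical conv_rate observations for driver 1 at t=10 and t=20.
features = [(1, 10, 0.5), (1, 20, 0.7)]
# A booking at t=15 must only see the value from t=10, not the later one.
rows = point_in_time_join([(1, 15), (1, 25), (1, 5)], features)
```

The booking at t=5 precedes every observation, so it gets no value at all rather than a value from the future.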
Getting features for model training
features = [
    "acc_rate",
    "conv_rate",
    "avg_daily_trips",
    "trips_today",
]
# Fetch historical data
historic_features = client.get_batch_features(
    entity_rows=drivers,
    feature_ids=features
).to_dataframe()
# Train model
my_model = ml_framework.fit(historic_features)
Batch data Stream Target
Getting features during online serving
features = [
    "acc_rate",
    "conv_rate",
    "avg_daily_trips",
    "trips_today",
]
# Fetch online features
online_features = client.get_online_features(
    entity_rows=drivers,
    feature_ids=features
)
# Make prediction
result = trip_comp_model.predict(online_features)
Feature statistics and validation
Feature validation in Feast
▪ TFX: Feast has interoperability with TFDV as part of feature specifications
▪ Statistics: Allows users to generate feature statistics and visualize with Facets
▪ Dataset validation: Schemas can be used for validating data during training
▪ Monitoring & Alerting: Feast metrics and schemas can be used for monitoring and alerting
Infer TFDV schemas for features
import tensorflow_data_validation as tfdv
from tensorflow_metadata.proto.v0 import schema_pb2

# Get statistics based on source data inside of Feast
stats = feast_client.get_statistics(
    feature_set_ref='iris',
    start_date=start_date,
    end_date=end_date
)
# Infer schema using TFDV
schema = tfdv.infer_schema(statistics=stats)
# User tweaks schema
tfdv.set_domain(schema, 'petal_width', schema_pb2.FloatDomain(min=0))
# Retrieve the existing "iris" feature set from Feast
iris_feature_set = feast_client.get_feature_set('iris')
# Update the entities and features with constraints defined in the schema
iris_feature_set.import_tfx_schema(schema)
# Persist feature set with TFDV schema in Feast
feast_client.apply(iris_feature_set)
name: iris
entities:
  - name: class
    valueType: STRING
features:
  - name: sepal_length
    valueType: DOUBLE
    presence:
      minFraction: 1
      minCount: 1
    shape:
      dim:
        - size: 1
  - name: sepal_width
    valueType: DOUBLE
    presence:
      minFraction: 1
      minCount: 1
    shape:
      dim:
        - size: 1
...
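What a `presence` constraint with `minFraction` and `minCount` checks can be sketched in plain Python. This is a simplified illustration, not the TFDV or Feast validation code: the helper `check_presence`, its message format, and the sample rows are hypothetical, and a missing value is assumed to be represented as `None`.

```python
def check_presence(rows, feature, min_fraction, min_count):
    """Return a list of anomaly messages for one feature (hypothetical helper)."""
    # Count the rows in which the feature has a value.
    present = sum(1 for row in rows if row.get(feature) is not None)
    anomalies = []
    if present < min_count:
        anomalies.append(
            f"{feature}: present in {present} rows, expected at least {min_count}"
        )
    if rows and present / len(rows) < min_fraction:
        anomalies.append(
            f"{feature}: present in {present}/{len(rows)} rows, "
            f"expected fraction >= {min_fraction}"
        )
    return anomalies


rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5},
    {"sepal_length": 4.9, "sepal_width": None},  # missing value
]
ok = check_presence(rows, "sepal_length", min_fraction=1, min_count=1)
bad = check_presence(rows, "sepal_width", min_fraction=1, min_count=1)
```

With `minFraction: 1`, a single missing value is enough to flag the feature, which is why `sepal_width` fails here while `sepal_length` passes.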
Visualize and validate training dataset
import tensorflow_data_validation as tfdv

# Retrieve the training dataset from Feast
dataset = client.get_batch_features(
    entity_rows=drivers,
    feature_ids=features
)
# Get statistics based on training dataset
stats = dataset.get_statistics()
# Get schema based on training dataset
schema = dataset.export_tfx_schema()
# Use TFDV to validate statistics generated from training dataset
anomalies = tfdv.validate_statistics(statistics=stats, schema=schema)
# Use TFDV to visualize statistics with Facets for debugging
tfdv.visualize_statistics(stats)
Takeaways
What value does Feast unlock?
▪ Sharing: New projects start with feature selection and not creation
▪ Iteration speed: Stages of the ML life cycle can be iterated on independently
▪ Consistency: Improved model performance through consistency and point-in-time correctness
▪ Definitions: Feature creators can encode domain knowledge into feature definitions
▪ Quality: Ensures the quality of data that reaches models through validation and alerting
The road ahead
Roadmap
▪ Feast 0.6
  ▪ Statistics and validation functionality
  ▪ Improved discovery and metadata functionality
▪ Under development
  ▪ Databricks, Azure, AWS support (community driven)
  ▪ SQL based sources
  ▪ JDBC storage (MySQL, PostgreSQL, Snowflake)
▪ Planned
  ▪ Automated training-serving skew detection
  ▪ Derived features
  ▪ Feature discovery UI
Get involved!
▪ Homepage: feast.dev
▪ Source code: github.com/feast-dev/feast
▪ Slack: #Feast
▪ Mailing list: https://groups.google.com/d/forum/feast-discuss
▪ These slides: https://tinyurl.com/feast-spark-deck

Contenu connexe

Tendances

Drone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFiDrone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFiTimothy Spann
 
Building an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflowBuilding an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflowDatabricks
 
Pinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at PinterestPinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at PinterestAlluxio, Inc.
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드confluent
 
Productionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model ServingProductionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model ServingDatabricks
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionWes McKinney
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...GetInData
 
Wix's ML Platform
Wix's ML PlatformWix's ML Platform
Wix's ML PlatformRan Romano
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningProvectus
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayDataWorks Summit
 
Zipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkZipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkDatabricks
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Timothy Spann
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberXiang Fu
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used forAljoscha Krettek
 
Managed Feature Store for Machine Learning
Managed Feature Store for Machine LearningManaged Feature Store for Machine Learning
Managed Feature Store for Machine LearningLogical Clocks
 

Tendances (20)

Drone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFiDrone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFi
 
Building an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflowBuilding an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflow
 
Pinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at PinterestPinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at Pinterest
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Productionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model ServingProductionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model Serving
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
 
Real time data quality on Flink
Real time data quality on FlinkReal time data quality on Flink
Real time data quality on Flink
 
Wix's ML Platform
Wix's ML PlatformWix's ML Platform
Wix's ML Platform
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
Nifi workshop
Nifi workshopNifi workshop
Nifi workshop
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
 
Zipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkZipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering Framework
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
 
Observability at Spotify
Observability at SpotifyObservability at Spotify
Observability at Spotify
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
Managed Feature Store for Machine Learning
Managed Feature Store for Machine LearningManaged Feature Store for Machine Learning
Managed Feature Store for Machine Learning
 

Similaire à Scaling Data and ML with Apache Spark and Feast

Data Science in the Elastic Stack
Data Science in the Elastic StackData Science in the Elastic Stack
Data Science in the Elastic StackRochelle Sonnenberg
 
DevOps in the Cloud with Microsoft Azure
DevOps in the Cloud with Microsoft AzureDevOps in the Cloud with Microsoft Azure
DevOps in the Cloud with Microsoft Azuregjuljo
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleJim Dowling
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunk
 
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...Piyush Kumar
 
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...Data Con LA
 
Productionizing Real-time Serving With MLflow
Productionizing Real-time Serving With MLflowProductionizing Real-time Serving With MLflow
Productionizing Real-time Serving With MLflowDatabricks
 
SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunk
 
Feature drift monitoring as a service for machine learning models at scale
Feature drift monitoring as a service for machine learning models at scaleFeature drift monitoring as a service for machine learning models at scale
Feature drift monitoring as a service for machine learning models at scaleNoriaki Tatsumi
 
KFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature StoreKFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature StoreDatabricks
 
Data Models Breakout Session
Data Models Breakout SessionData Models Breakout Session
Data Models Breakout SessionSplunk
 
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Databricks
 
Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015Johann de Boer
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...PAPIs.io
 
ICML'16 Scaling ML System@Twitter
ICML'16 Scaling ML System@TwitterICML'16 Scaling ML System@Twitter
ICML'16 Scaling ML System@TwitterJack Xiaojiang Guo
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdfJim Dowling
 
Data models pivot with splunk break out session
Data models pivot with splunk break out sessionData models pivot with splunk break out session
Data models pivot with splunk break out sessionGeorg Knon
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
DevOps > CI + CD. A web developer's introduction to Application Insights
DevOps > CI + CD.  A web developer's introduction to Application InsightsDevOps > CI + CD.  A web developer's introduction to Application Insights
DevOps > CI + CD. A web developer's introduction to Application InsightsJohn Garland
 

Similaire à Scaling Data and ML with Apache Spark and Feast (20)

Data Science in the Elastic Stack
Data Science in the Elastic StackData Science in the Elastic Stack
Data Science in the Elastic Stack
 
DevOps in the Cloud with Microsoft Azure
DevOps in the Cloud with Microsoft AzureDevOps in the Cloud with Microsoft Azure
DevOps in the Cloud with Microsoft Azure
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData Seattle
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
 
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...
 
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...
 
Productionizing Real-time Serving With MLflow
Productionizing Real-time Serving With MLflowProductionizing Real-time Serving With MLflow
Productionizing Real-time Serving With MLflow
 
SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding Overview
 
Feature drift monitoring as a service for machine learning models at scale
Feature drift monitoring as a service for machine learning models at scaleFeature drift monitoring as a service for machine learning models at scale
Feature drift monitoring as a service for machine learning models at scale
 
KFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature StoreKFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature Store
 
Data Models Breakout Session
Data Models Breakout SessionData Models Breakout Session
Data Models Breakout Session
 
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
 
Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
 
ICML'16 Scaling ML System@Twitter
ICML'16 Scaling ML System@TwitterICML'16 Scaling ML System@Twitter
ICML'16 Scaling ML System@Twitter
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf
 
Data models pivot with splunk break out session
Data models pivot with splunk break out sessionData models pivot with splunk break out session
Data models pivot with splunk break out session
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
Mstr meetup
Mstr meetupMstr meetup
Mstr meetup
 
DevOps > CI + CD. A web developer's introduction to Application Insights
DevOps > CI + CD.  A web developer's introduction to Application InsightsDevOps > CI + CD.  A web developer's introduction to Application Insights
DevOps > CI + CD. A web developer's introduction to Application Insights
 

Plus de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Plus de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2



Scaling Data and ML with Apache Spark and Feast

  • 2. Scaling Data and ML with Feast and Apache Spark Willem Pienaar Data Science Platform Lead
  • 3. Agenda ▪ Overview ▪ Data challenges in production ML ▪ What is Feast? ▪ Getting data into Feast ▪ Feature serving ▪ Feature statistics and validation ▪ Takeaways ▪ The road ahead
  • 4. Gojek ■ Ride hailing ■ Food delivery ■ Digital payments ■ Logistics ■ Lifestyle services 100m+ app downloads +500k merchants 4 countries 1m+ drivers 100m+ monthly bookings Indonesia Singapore Thailand Vietnam
  • 5. Machine learning at Gojek ■ Matchmaking ■ Dynamic pricing ■ Routing ■ Recommendation systems ■ Incentive optimization ■ Supply positioning ■ Fraud prevention
  • 6. Machine learning life cycle prior to Feast [Diagram labels: Jupyter Notebook, Model Serving, Production System, Features, ??]
  • 7. Machine learning life cycle prior to Feast [Diagram labels: Spark, Transform Data, Train Model, Deploy Model, Model Serving, Production System, Features, Streams, Stream Processing, Data Lake]
  • 8. Problems with end-to-end ML systems ● Monolithic end-to-end systems are hard to iterate on ● Training code needs to be rewritten for serving ● Training and serving features are inconsistent ● Data quality monitoring and validation is absent ● Lack of feature reuse and sharing
  • 9. Feast is a system that attempts to solve the key data challenges with productionizing machine learning
  • 10. Feast background ▪ Feature store was a collaboration between Gojek and Google Cloud ▪ Open-sourced in January 2019 ▪ Community driven with adoption/contribution from multiple tech companies
  • 11. Machine learning life cycle prior to Feast [Diagram labels: Spark, Transform Data, Train Model, Deploy Model, Model Serving, Production System, Features, Streaming Data, Stream Processing, Data Lake]
  • 12. Machine learning life cycle with Feast [Diagram labels: Spark, Streaming Data, Stream Processing, Data Lake, Feast (shown three times), Create Features, Train Model, Serve Model, Model Serving, Production System]
  • 13. What is Feast? Feast is an ML-specific data system that attempts to solve the key challenges with productionizing ML ▪ Manages ingestion and storage of both streaming and batch data ▪ Allows for standardized definitions of features regardless of environment ▪ Encourages sharing and re-use of features through semantic references ▪ Ensures data consistency between training and serving ▪ Provides a point-in-time correct view of features for model training ▪ Ensures model performance by tracking, validating, and monitoring features
  • 14. What is Feast not? ▪ A workflow scheduler (Airflow, Luigi) ▪ Just a data warehouse or data lake (Hive, BigQuery, Snowflake) ▪ A data transformation/processing tool (Pandas, Spark, DBT) ▪ A data discovery or cataloguing system (Amundsen, DataHub) ▪ Data version control or lineage (Dolt, Pachyderm) ▪ Model serving or metadata tracking (KFServing, Seldon, MLflow)
  • 16. Create entities and features using feature sets ▪ Feature sets allow for the definition of entities and features and their associated properties ▪ Allows for bulk definition of features as they occur in a data source, e.g., Kafka ▪ Feature sets are not a grouping for serving features

      name: driver_weekly
      entities:
        - name: driver_id
          valueType: INT64
      features:
        - name: acc_rate
          valueType: FLOAT
        - name: conv_rate
          valueType: FLOAT
        - name: avg_daily_trips
          valueType: FLOAT
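To make the role of entities, features, and value types concrete, here is a minimal sketch of checking one row against a feature-set-like definition. It is plain Python rather than the Feast SDK: the dictionary layout, the `validate_row` helper, and the mapping of `INT64`/`FLOAT` to Python `int`/`float` are all assumptions made for illustration.

```python
# Hypothetical, simplified stand-in for the driver_weekly feature set.
# This is NOT the Feast SDK; INT64 and FLOAT are mapped to int and float.
DRIVER_WEEKLY = {
    "name": "driver_weekly",
    "entities": {"driver_id": int},
    "features": {"acc_rate": float, "conv_rate": float, "avg_daily_trips": float},
}


def validate_row(feature_set, row):
    """Return a list of problems in `row`; an empty list means it conforms."""
    problems = []
    # Entities identify the row, so they are treated as required.
    for name, py_type in feature_set["entities"].items():
        if name not in row:
            problems.append(f"missing entity: {name}")
        elif not isinstance(row[name], py_type):
            problems.append(f"bad type for entity {name}")
    # Features may be absent, but when present they must have the declared type.
    for name, py_type in feature_set["features"].items():
        if name in row and not isinstance(row[name], py_type):
            problems.append(f"bad type for feature {name}")
    # Anything not declared in the feature set is flagged.
    known = set(feature_set["entities"]) | set(feature_set["features"])
    for name in row:
        if name not in known:
            problems.append(f"unknown field: {name}")
    return problems


good_row = {"driver_id": 1001, "acc_rate": 0.9, "conv_rate": 0.5, "avg_daily_trips": 12.0}
bad_row = {"acc_rate": "high"}
print(validate_row(DRIVER_WEEKLY, good_row))
print(validate_row(DRIVER_WEEKLY, bad_row))
```

Treating entities as required and features as optional is a design choice of this sketch, not something the slides specify.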
  • 17. Ingesting a DataFrame into Feast

      import pandas as pd
      from feast import FeatureSet

      # feast_client is assumed to be an already-configured feast.Client

      # Load dataframe
      driver_df = pd.read_csv("driver_weekly_data.csv")

      # Create feature set from dataframe
      driver_fs = FeatureSet("driver_weekly")
      driver_fs.infer_fields_from_df(driver_df)

      # Register driver feature set
      feast_client.apply(driver_fs)

      # Load feature data into Feast
      feast_client.ingest(driver_fs, driver_df)

    Feature set shown alongside the code:

      name: driver_weekly
      entities:
        - name: driver_id
          valueType: INT64
      features:
        - name: acc_rate
          valueType: FLOAT
        - name: conv_rate
          valueType: FLOAT
        - name: avg_daily_trips
          valueType: FLOAT
  • 18. Ingesting streams into Feast [Figure label: Events on stream]

      # Import paths assume the Feast 0.5-era Python SDK; verify against your version
      from feast import Entity, Feature, FeatureSet, ValueType
      from feast.source import KafkaSource

      # Create feature set from a Kafka stream
      driver_stream_fs = FeatureSet(
          name="driver_stream",
          entities=[Entity(name="driver_id", dtype=ValueType.INT64)],
          features=[Feature(name="trips_today", dtype=ValueType.INT64)],
          source=KafkaSource(brokers="kafka:9092", topic="driver-stream-topic"),
      )

      # Register driver stream feature set
      feast_client.apply(driver_stream_fs)
  • 19. What happens to the data? ● Registry of features and entities ● Manages ingestion jobs ● Allows for search and discovery of features ● Allows for generation of feature statistics ● Retrieve point-in-time correct training datasets ● Retrieve consistent online features at low latency ● Unified ingestion ensures online/historical consistency ● Provides feature schema based statistics and alerting [Architecture diagram. Stages: Your data, Ingestion, Storage, Serving, Production. Components: Stream, Data Warehouse, Data Lake, Jupyter Notebook, Ingestion layer (Apache Beam), Historical Feature Store, Online Feature Storage (Redis, Cassandra), Feast Core, Feast Serving, Model Training, Model Serving]
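The idea that unified ingestion keeps the online and historical stores consistent can be sketched with a toy in-memory store. This is an illustration under stated assumptions, not how Feast is implemented: the `InMemoryFeatureStore` class and its methods are hypothetical, and the real system uses an ingestion layer with stores such as Redis or Cassandra.

```python
class InMemoryFeatureStore:
    """Toy store: a single ingest path feeds both historical and online views."""

    def __init__(self):
        self.historical = []  # append-only log of (entity_id, event_time, features)
        self.online = {}      # entity_id -> (event_time, latest features)

    def ingest(self, entity_id, event_time, features):
        # Every row is kept in the historical log for later training queries.
        self.historical.append((entity_id, event_time, dict(features)))
        # The online view keeps only the newest row per entity, so late or
        # out-of-order events never overwrite fresher values.
        current = self.online.get(entity_id)
        if current is None or event_time >= current[0]:
            self.online[entity_id] = (event_time, dict(features))

    def get_online(self, entity_id):
        entry = self.online.get(entity_id)
        return None if entry is None else entry[1]


store = InMemoryFeatureStore()
store.ingest(1001, 10, {"trips_today": 3})
store.ingest(1001, 20, {"trips_today": 5})
store.ingest(1001, 15, {"trips_today": 4})  # arrives late, out of order
print(store.get_online(1001))
print(len(store.historical))
```

Because both views are written by the same `ingest` call, the value served online is always one of the rows available for training.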
  • 21. Feature references and retrieval ■ Each feature is identified through a feature reference ■ Feature references allow clients to request either online or historical feature data from Feast ■ Models have a single consistent view of features in both training and serving ■ Feature references are persisted with model binaries, allowing full automation of online serving [Diagram: Model Training sends features = [avg_daily_trips, conv_rate, acc_rate, trips_today, target] to Feast Serving and receives a training dataset; Model Serving sends features = [avg_daily_trips, conv_rate, acc_rate, trips_today] to Feast Serving and receives online features in < 10ms]
  • 22. Events throughout time [Timeline figure. Axes: Time, Feature values. Series: Acceptance rate, Average daily trips, Conversion rate, Trips Today. Annotations: Rider booking, Booking outcome, Prediction made here, Outcome of prediction]
  • 23. Ensuring point-in-time correctness [Timeline figure. Axes: Time, Feature values. Series: Acceptance rate, Average daily trips, Conversion rate, Trips Today. Annotations: Rider booking, Booking outcome, Prediction made here, Outcome of prediction]
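Point-in-time correctness means that each training row only sees feature values that were already known when the prediction was made. The sketch below illustrates that rule in plain Python with a binary search over each feature's history. It is not Feast's implementation; the function names, the integer timestamps, and the sample values are assumptions for illustration.

```python
from bisect import bisect_right


def point_in_time_value(history, as_of):
    """Return the latest value with event_time <= as_of, or None if there is none.

    `history` is a list of (event_time, value) tuples sorted by event_time.
    """
    times = [t for t, _ in history]
    idx = bisect_right(times, as_of)
    return history[idx - 1][1] if idx > 0 else None


def build_training_rows(bookings, feature_histories):
    """Join each booking with the feature values known at booking time."""
    rows = []
    for booking in bookings:
        row = dict(booking)
        for name, per_driver in feature_histories.items():
            history = per_driver.get(booking["driver_id"], [])
            row[name] = point_in_time_value(history, booking["event_time"])
        rows.append(row)
    return rows


# Hypothetical data: acc_rate for driver 1001 changes at time 1 and time 8.
feature_histories = {"acc_rate": {1001: [(1, 0.80), (8, 0.90)]}}
bookings = [
    {"driver_id": 1001, "event_time": 0},  # before any feature value exists
    {"driver_id": 1001, "event_time": 5},  # must not see the value from time 8
    {"driver_id": 1001, "event_time": 9},
]
rows = build_training_rows(bookings, feature_histories)
print([r["acc_rate"] for r in rows])
```

A value stamped exactly at the prediction time is included here; whether that boundary should be inclusive is a design choice that depends on how event timestamps are assigned.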
  • 25. Getting features for model training [Slide annotations: Batch data, Stream, Target]

      features = [
          "acc_rate",
          "conv_rate",
          "avg_daily_trips",
          "trips_today",
      ]

      # Fetch historical data
      historic_features = client.get_batch_features(
          entity_rows=drivers,
          feature_ids=features
      ).to_dataframe()

      # Train model
      my_model = ml_framework.fit(historic_features)
  • 26. Getting features during online serving

      features = [
          "acc_rate",
          "conv_rate",
          "avg_daily_trips",
          "trips_today",
      ]

      # Fetch online features
      online_features = client.get_online_features(
          entity_rows=drivers,
          feature_ids=features
      )

      # Make prediction
      result = trip_comp_model.predict(online_features)
  • 28. Feature validation in Feast ▪ TFX: Feast has interoperability with TFDV as part of feature specifications ▪ Statistics: Allows users to generate feature statistics and visualize with Facets ▪ Dataset validation: Schemas can be used for validating data during training ▪ Monitoring & Alerting: Feast metrics and schemas can be used for monitoring and alerting
  • 29. Infer TFDV schemas for features

      import tensorflow_data_validation as tfdv
      from tensorflow_metadata.proto.v0 import schema_pb2

      # Get statistics based on source data inside of Feast
      stats = feast_client.get_statistics(
          feature_set_ref='iris',
          start_date=start_date,
          end_date=end_date
      )

      # Infer schema using TFDV
      schema = tfdv.infer_schema(statistics=stats)

      # User tweaks schema
      tfdv.set_domain(schema, 'petal_width', schema_pb2.FloatDomain(min=0))

      # Retrieve the existing Iris feature set from Feast
      iris_feature_set = feast_client.get_feature_set('iris')

      # Update the entities and features with constraints defined in the schema
      iris_feature_set.import_tfx_schema(schema)

      # Persist feature set with TFDV schema in Feast
      feast_client.apply(iris_feature_set)

    Feature set shown alongside the code:

      name: iris
      entities:
        - name: class
          valueType: STRING
      features:
        - name: sepal_length
          valueType: DOUBLE
          presence:
            minFraction: 1
            minCount: 1
          shape:
            dim:
              - size: 1
        - name: sepal_width
          valueType: DOUBLE
          presence:
            minFraction: 1
            minCount: 1
          shape:
            dim:
              - size: 1
        ...
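To show what constraints such as `presence.minFraction` and a float domain minimum mean in practice, here is a minimal plain-Python sketch that applies the same two checks to a list of values. It is only an illustration and does not use or reproduce TFDV; the `check_feature` helper and the sample values are hypothetical.

```python
def check_feature(values, min_fraction=1.0, min_value=None):
    """Return a list of anomaly messages for one feature column.

    `values` uses None for missing entries. `min_fraction` mirrors the idea of
    presence.minFraction, and `min_value` mirrors a domain minimum.
    """
    anomalies = []
    present = [v for v in values if v is not None]
    fraction = len(present) / len(values) if values else 0.0
    if fraction < min_fraction:
        anomalies.append(
            f"presence {fraction:.2f} below minFraction {min_fraction:.2f}"
        )
    if min_value is not None and any(v < min_value for v in present):
        anomalies.append(f"value below domain minimum {min_value}")
    return anomalies


clean = [0.2, 1.3, 0.4]
dirty = [0.2, None, -0.1, 0.4]  # one missing value, one negative width
print(check_feature(clean, min_fraction=1.0, min_value=0))
print(check_feature(dirty, min_fraction=1.0, min_value=0))
```

In a real pipeline these checks would come from the schema stored with the feature set rather than from hand-written arguments.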
  • 30. Visualize and validate training dataset

      import tensorflow_data_validation as tfdv

      # Retrieve the training dataset from Feast
      dataset = client.get_batch_features(entity_rows=drivers, feature_ids=features)

      # Get statistics based on training dataset
      stats = dataset.get_statistics()

      # Get schema based on training dataset
      schema = dataset.export_tfx_schema()

      # Use TFDV to validate statistics generated from training dataset
      anomalies = tfdv.validate_statistics(statistics=stats, schema=schema)

      # Use TFDV to visualize statistics with Facets for debugging
      tfdv.visualize_statistics(stats)
  • 32. What value does Feast unlock? ▪ Sharing: New projects start with feature selection and not creation ▪ Iteration speed: Stages of the ML life cycle can be iterated on independently ▪ Consistency: Improved model performance through consistency and point-in-time correctness ▪ Definitions: Feature creators can encode domain knowledge into feature definitions ▪ Quality: Ensures the quality of data that reaches models through validation and alerting
  • 34. Roadmap ▪ Feast 0.6: Statistics and validation functionality; Improved discovery and metadata functionality ▪ Under development: Databricks, Azure, AWS support (community driven); SQL based sources; JDBC storage (MySQL, PostgreSQL, Snowflake) ▪ Planned: Automated training-serving skew detection; Derived features; Feature discovery UI
  • 35. Get involved! ▪ Homepage: feast.dev ▪ Source code: github.com/feast-dev/feast ▪ Slack: #Feast ▪ Mailing list: https://groups.google.com/d/forum/feast-discuss ▪ These slides: https://tinyurl.com/feast-spark-deck