Proprietary + Confidential
Data Science in Cloud
Quick Tour
Priyanka Vergadia
Staff Developer Advocate
Google Cloud
Twitter: @pvergadia
Flow
1. Google Cloud Orientation
2. Data Science
3. Data Analytics
4. MLOps
5. Some Google Cloud Tools: BigQuery, BigQuery ML and Vertex AI
6. Wrap up
@pvergadia
01
Data Science
Orientation
Things I don’t
want to think
about...
1. Provisioning hardware
2. Installing software
3. Upgrading operating systems
4. Security patching
5. System and network admin
6. Scaling up/down
7. Paying for stuff I don’t use
8. Dealing with failures
9. Managing clusters
Things I want
to think
about...
1. Solving my problem
Getting things done using someone else’s computers, especially
where someone else worries about maintenance, provisioning, system
administration, security, networking, failure recovery, etc.
02
Data Science
6 steps
03
Data Analysis
04
ML Ops
“The real problems with a
ML system will be found
while you are continuously
operating it for the long term”
Launching is easy,
Operating is hard.
pixabay.com
Developing the model
is just the beginning...
Modeling Code
…a product requires so much more
Configuration
Data Collection
Data
Verification
Feature Extraction
Process Management Tools
Analysis Tools
Machine
Resource
Management
Serving
Infrastructure
Monitoring
ML Code
Why do things become harder in production?
(an incomplete list)
● data cleaning and processing is hard at scale
● scaling out training and serving; infrastructure issues
● tracking, monitoring, and reproducibility requirements
○ model or data drift
○ training/serving skew
● access control issues, security requirements
● (and lots more)
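As a rough illustration of the training/serving skew item above, the sketch below compares one numeric feature between training data and serving traffic. It is a minimal example under stated assumptions: the function names, the 0.5 threshold, and the choice of a standardized mean shift as the skew signal are all hypothetical, and a production monitor would likely compare full distributions rather than means.

```python
from statistics import mean, pstdev


def standardized_mean_shift(training_values, serving_values):
    """Return |mean(serving) - mean(training)| in units of the training std dev."""
    spread = pstdev(training_values)
    if spread == 0:
        raise ValueError("training values have zero variance")
    return abs(mean(serving_values) - mean(training_values)) / spread


def has_skew(training_values, serving_values, threshold=0.5):
    """Flag a feature whose serving mean moved more than `threshold` std devs.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return standardized_mean_shift(training_values, serving_values) > threshold


if __name__ == "__main__":
    train = [1, 2, 3, 4, 5]
    print(has_skew(train, [1, 2, 3, 4, 5]))  # same distribution
    print(has_skew(train, [4, 5, 6, 7, 8]))  # shifted distribution
```

Running a check like this per feature on a schedule is one simple way to turn the "tracking, monitoring" bullet into an automated alert.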
The level of automation defines the maturity of the
ML process
Level 0
Build and
deploy manually
Level 1
Automate
the training phase
Level 2
Automate training,
validation, and deployment
Production ML Experience
“On any given day there are thousands of TFX
pipelines running, which are processing exabytes
of data and producing tens of thousands of
models, which in turn are performing hundreds
of millions of inferences per second.”
BigQuery & BQML
05
BigQuery & BQML
Google BigQuery
Data warehouse with customers ranging from TB to 100+ PB
Insights for everyone
Cloud-scale enterprise
data warehouse
Unique
Serverless platform
Standard SQL (ANSI 2011)
with DML Support
Encrypted, durable,
highly available Unique
Built-in ML Unique
Real-time insights Unique
In ~15s:
● Read 2TB:
○ ~1k disks
● Run 50B regexps:
○ ~3k cores
Train and deploy ML models in
SQL
BigQuery ML
Execute ML workflows without
moving data from BigQuery
Automate common ML tasks
Built-in infrastructure
management, security &
compliance
BigQuery ML supported models and features
The data analyst’s onramp to AI and ML
Classification
Logistic regression
DNN classifier (TensorFlow)
XGBoost
Regression
Other Models
k-means clustering
Time series forecasting
Model ops and
explainability
Import/export TensorFlow models for
batch and online prediction
NDA
AutoML Tables
Linear regression
DNN regressor (TensorFlow)
XGBoost
AutoML Tables
Recommendation: Matrix factorization
NDA
Time series anomaly detection (Preview Q2’21, GA H2’21)
Hyperparameter tuning using Cloud AI Vizier (Preview H1’21, GA H2’21)
Model explainability using Cloud AI (Preview H1’21, GA H2’21)
Managed Kubernetes and TFX pipelines (Preview H2’21, GA 2022)
List models for comparison and online deployment in Cloud AI (Preview H2’21, GA 2022)
Model versioning, continuous monitoring (future)
Wide and Deep NNs (Preview, GA H1’21)
Wide and Deep NNs (Preview, GA H1’21)
Autoencoders
06
Vertex AI
Vertex AI is a
managed ML platform
to speed the rate of
experimentation and accelerate
deployment of AI models.
The End-To-End ML Journey through Vertex AI
Where can I find
training data?
Feature Store
Datasets
Where do I start with
model experiments?
Workbench
How can I track
the results of
experiments?
Experiments
How can I train at scale?
Training
How do I deploy?
Endpoints
And for production?
Monitoring
Pipelines
07
Learning Resources
● Introduction to Data Science blog:
https://goo.gle/dsintro
● Getting started docs:
cloud.google.com/vertex-ai/docs
● Get started in Cloud Console:
console.cloud.google.com/ai/platform
● Best practices:
cloud.google.com/architecture/ml-on-gcp-best-practices
Learn more
goo.gle/bqml-use-cases
BQML design patterns
https://github.com/priyankavergadia/google-cloud-4-words
Thank you!
Twitter, LinkedIn: @pvergadia
What is Dataplex?
NDA
BigQuery
Dataplex
Data Lifecycle Mgmt
(Ingest, discover, prep, monitor, serve, archive)
Logical data organization
Unified Security and Governance
Unified Metadata with auto-discovery
Dataproc AI Platform
Data
Studio
Structured Streaming Data*
Semi-Structured Unstructured
GCP On-premises*
Multi-Cloud*
Dataflow
Storage
Built for distributed data
Logically unify and organize your data without any data
movement.
Intelligent Data Management
Automatic data discovery, metadata harvesting,
lifecycle management, and data quality with built-in
AI-driven intelligence.
Centralized Security & Governance
Central policy management, monitoring and auditing for
data authorization, retention, and classification.
Data Classification and Data Quality
Data
Intelligence
Analytics
*future capabilities
Data Science On Google Cloud
A Guided Tour
Polong Lin & Marc Cohen
Developer Relations Engineers
Google Cloud
Slides: mco.fyi/ds
Lab
mco.fyi/mllab
or
mco.fyi/forecast
Feature Store: Data Model
Photo by Martin Olsen on
Complexity is a
barrier to adoption
HELLO CSECT The name of this program is 'HELLO'
* Register 15 points here on entry from OPSYS or caller.
STM 14,12,12(13) Save registers 14,15, and 0 thru 12 in caller's Save area
LR 12,15 Set up base register with program's entry point address
USING HELLO,12 Tell assembler which register we are using for pgm. base
LA 15,SAVE Now Point at our own save area
ST 15,8(13) Set forward chain
ST 13,4(15) Set back chain
LR 13,15 Set R13 to address of new save area
* -end of housekeeping (similar for most programs) -
WTO 'Hello World' Write To Operator (Operating System macro)
*
L 13,4(13) restore address to caller-provided save area
XC 8(4,13),8(13) Clear forward chain
LM 14,12,12(13) Restore registers as on entry
DROP 12 The opposite of 'USING'
SR 15,15 Set register 15 to 0 so that the return code (R15) is Zero
BR 14 Return to caller
*
SAVE DS 18F Define 18 fullwords to save calling program registers
END HELLO This is the end of the program
class HelloWorld
{
public static void main(String args[])
{
System.out.println("Hello, World");
}
}
print('Hello World')
Continuous Training for Production ML in the TFX Platform. OpML (2019).
Slice Finder: Automated Data Slicing for Model Validation. ICDE (2019).
Data Validation for Machine Learning. SysML (2019).
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform. KDD (2017).
Data Management Challenges in Production Machine Learning. SIGMOD (2017).
Rules of Machine Learning: Best Practices for ML Engineering. Google AI Web (2017).
Machine Learning: The High Interest Credit Card of Technical Debt. NeurIPS (2015).
Hidden Technical Debt in Machine Learning Systems. NIPS (2015).
Production ML Research
Serving and
Monitoring
Continuous
Training
Experimentation/
Development
Code
Repository
Training Pipeline
CI/CD
Code and
configurations
Artifact
Repository
Pipeline
artifacts
Model
Registry
Model Deployment
CI/CD
Serving
Infrastructure
Trained
model
Model
deployment
ML Metadata
Logs
Serving
logs
Putting it all together
End-to-end view
Code & Config
Training pipeline
Registered model
Deployed model
Serving logs
Focus of today
Vertex
Feature Store
Vertex Training and
Pipelines
Vertex Model
Monitoring
Vertex
Workbench
Cloud Build
Vertex ML Metadata
Vertex Endpoints
and Prediction
Feature Store in one picture
Our Solution
Feature
Store
Online Store
Feature
Management API
Batch
Ingestion API
Stream
Ingestion API
Feature Discovery
API
Online Serving API
Batch Serving API
Cache
Online Prediction
Model Training
Batch Feature
Engineering
Streaming Feature
Engineering
Data Lake
(BQ, GCS)
Kafka/Pubsub
Point-in-time
lookups
Registry
Feature Monitoring
Offline Store
How does the new SDK fit into the picture?
Our Solution
Feature
Store
Online Store
Feature
Management API
Batch
Ingestion API
Stream
Ingestion API
Feature Discovery
API
Online Serving API
Batch Serving API
Cache
Online Prediction
Model Training
Batch Feature
Engineering
Streaming Feature
Engineering
Data Lake
(BQ, GCS)
Kafka/Pubsub
Point-in-time
lookups
Registry
Feature Monitoring
Offline Store
Vertex AI SDK
Data engineer ML engineer
Data scientist
Scalable training and serving on Vertex AI
Train with
Data
Analyst
ML
Developer
Data
Scientist
Use when Serve with
Vertex
Training
• Your problem doesn’t match the criteria
listed below for BigQuery ML or AutoML.
• You’re already running training on-premises or on
another cloud, and you need consistency across
the platforms.
Vertex
Prediction
AutoML
• Your problem fits into one of the types AutoML
supports. Offers a point-and-click workflow.
• Natural Language and Video models are served from
Google Cloud, while Vision and Tables also support
edge / downloadable models.
BigQuery ML
• All your data is contained in BigQuery.
• Users are most comfortable with SQL.
• The set of models available in BigQuery ML
matches the problem you’re trying to solve.
Train with Use when Serve with
Data
Analyst
ML
Developer
Data
Scientist
Model deployment &
management (MLOps)
Explainable AI
Model development and
data science
BigQuery ML Roadmap for 2021
H1’21 H2’21
TF Wide and Deep NNs Preview
Autoencoders Preview
PCA Preview
P-values for linear models Preview
Hyperparameter Tuning Preview
Anomaly Detection Preview
AutoML Tables GA
NDA
TF Wide and Deep NNs GA
Autoencoders GA
PCA GA
P-values for linear models GA
Hyperparameter Tuning GA
Anomaly Detection GA
Multivariate Time Series (AutoML) Preview
Model Registry Preview
Managed Pipelines Preview
Explainable AI Preview
Preparing the training data
Mix of demographic & behavioural data
Each row
is a
different
user
Goal is to
create
cluster
labels
3
2
3
2
1
Developer Days
SELECT
* EXCEPT(userId)
FROM
mydataset.train
Build and train with
CREATE MODEL
CREATE OR REPLACE MODEL
  mydataset.kmeans_3
OPTIONS(
  model_type='KMEANS',
  kmeans_init_method='KMEANS++',
  num_clusters=3
) AS
SELECT
  * EXCEPT(userId)
FROM
  mydataset.train
Build and train with
CREATE MODEL
ML.PREDICT results
Compute cluster labels
using ML.PREDICT
SELECT
*
FROM
ML.PREDICT(MODEL mydataset.kmeans_3,
(
SELECT
*
FROM
mydataset.train ))
Inspecting the clusters "Evaluation" tab on the BigQuery UI
Inspecting the clusters
Anomaly detection with k-means
Fraud detection
Each row is a transaction
Which rows are anomalies?
CREATE MODEL - k-means clustering
#Query for model training
CREATE MODEL demo.kmeans_model
OPTIONS(
model_type='kmeans',
num_clusters= 8,
kmeans_init_method = 'kmeans++'
)
AS
SELECT * EXCEPT(Time, Class)
FROM
bigquery-public-data.ml_datasets.ulb_fraud_detection;
ML.DETECT_ANOMALIES with k-means clustering
#Query for creating anomaly detection results
SELECT
*
FROM
ML.DETECT_ANOMALIES(
MODEL demo.kmeans_model,
STRUCT(0.005 AS contamination),
TABLE bigquery-public-data.ml_datasets.ulb_fraud_detection
);
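To make the `contamination` parameter more concrete, here is a small Python sketch of the general idea: score each row by its distance to the nearest cluster centroid, then flag the most distant fraction as anomalies. This is an illustration under assumptions, not BigQuery ML's implementation; the function names are hypothetical, the distance is plain Euclidean with no normalization, and the cutoff is taken from the scored rows themselves.

```python
import math


def nearest_centroid_distance(point, centroids):
    """Euclidean distance from `point` to its closest centroid."""
    return min(math.dist(point, centroid) for centroid in centroids)


def flag_anomalies(points, centroids, contamination):
    """Return one boolean per point; True marks the most distant rows.

    `contamination` is the fraction of rows to flag (for example 0.005).
    """
    distances = [nearest_centroid_distance(p, centroids) for p in points]
    n_anomalies = math.ceil(contamination * len(points))
    # Rank row indices from farthest to nearest and keep the top slice.
    ranked = sorted(range(len(points)), key=lambda i: distances[i], reverse=True)
    flagged = set(ranked[:n_anomalies])
    return [i in flagged for i in range(len(points))]


if __name__ == "__main__":
    centroids = [(0.0, 0.0), (1.0, 1.0)]
    points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
    print(flag_anomalies(points, centroids, contamination=0.2))
```

A lower contamination value flags fewer rows, which is why fraud-style use cases with rare positives tend to use small values such as the 0.005 in the query above.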
Blogpost
https://cloud.google.com/blog/products/data-analytics/bigquery-ml-unsupervised-anomaly-detection
Docs
https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-detect-anomalies
Automated HP tuning
Have BigQuery ML automatically
search for the optimal
hyperparameters
Preview
Select number of trials
1
Don't need to be an expert in HPs
Save time from manually training
models with different HPs
Easy to use
CREATE MODEL
mydataset.my_logreg_model
OPTIONS(
model_type="logistic_reg",
input_label_cols=["mylabel"],
num_trials=20
) AS
SELECT
*
FROM
mydataset.my_training_data
Hyperparameter tuning with BigQuery ML
Automated HP tuning
Have BigQuery ML automatically
search for the optimal
hyperparameters
Preview
Select number of trials
1
Uses Vertex Vizier under-the-hood
Save time from manually training
models with different HPs
Easy to use
Inspect the trials info
2
SELECT
*
FROM
ML.TRIAL_INFO(MODEL mydataset.my_logreg_model)
Even while it's
still training!
Hyperparameter tuning with BigQuery ML
Evaluate your model
3
Hyperparameter tuning with BigQuery ML
Predict!
4
SELECT
*
FROM
ML.PREDICT(MODEL mydataset.my_logreg_model)
Hyperparameter tuning with BigQuery ML
How to import TensorFlow models to do
batch predictions in BigQuery
using BigQuery ML
Importing TensorFlow models into BigQuery
CREATE MODEL
PREDICT
https://towardsdatascience.com/how-to-do-batch-predictions-of-tensorflow-models-directly-in-bigquery-ffa843ebdba6
https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models
TensorFlow Hub - tfhub.dev
Question:
Can we do text
similarity based on
embeddings?
# The following are example embedding outputs of 20 dimensions per sentence
# Embedding for: The quick brown fox jumps over the lazy dog.
# [0.0560572519898, 0.0534118898213, -0.0112254749984, ...]
# Embedding for: I am a sentence for which I would like to get its embedding.
# [-0.0343746766448, -0.0529498048127, 0.0469399243593, ...]
Text similarity using an imported TensorFlow model
https://towardsdatascience.com/how-to-do-text-similarity-search-and-document-clustering-in-bigquery-75eb8f45ab65
Goal:
I want to search for
comments similar to:
"power line down on a home"
Step 1: Save the TensorFlow model to GCS
Step 2: CREATE MODEL using the GCS folder path
CREATE OR REPLACE MODEL
mydataset.swivel_text_embed
OPTIONS(
model_type='tensorflow',
model_path='gs://BUCKET/swivel/*')
Step 3: Use ML.PREDICT to get comment embeddings
SELECT
*
FROM
ML.PREDICT(MODEL mydataset.swivel_text_embed,
(SELECT
comments AS sentences
FROM
mydataset.mydata) );
Text converted into an
embedding of 20
floating-point values
Step 4: Calculate distance between embeddings to
compute text similarity
Input search term:
"power line down on a home"
Top 15 most similar comments to input
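The slide does not say which distance is used in Step 4, so the sketch below assumes cosine similarity, a common choice for comparing embeddings. It ranks comments by similarity to a query embedding and returns the top matches. The function names and the toy 3-dimensional vectors are hypothetical; the real embeddings in this example have 20 dimensions and come from ML.PREDICT.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_similar(query_embedding, comment_embeddings, k):
    """Return the k comment texts most similar to the query embedding.

    `comment_embeddings` maps comment text to its embedding vector.
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), text)
        for text, embedding in comment_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


if __name__ == "__main__":
    # Toy vectors standing in for real sentence embeddings.
    comments = {
        "tree fell on power line": [0.9, 0.1, 0.0],
        "road flooded": [0.0, 1.0, 0.0],
        "unrelated": [-1.0, 0.0, 0.0],
    }
    print(top_k_similar([1.0, 0.0, 0.0], comments, k=2))
```

With the search term embedded the same way as the comments, calling `top_k_similar(..., k=15)` would mirror the "top 15 most similar comments" result on the slide.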
Exporting BQML models for use with Vertex
Model trained with BigQuery ML Vertex Pipelines
Export to Cloud Storage
https://github.com/GoogleCloudPlatform/analytics-componentized-patterns/tree/master/retail/recommendation-system/bqml-mlops
Data
Labeling
AutoML
DL Environment (DL VM + DL Container)
Prediction
Feature
Store
Training
Experiments
Data
Readiness
Feature
Engineering
Training/
HP-Tuning
Model
Monitoring
Model
serving
Understanding/
Tuning
Edge
Model
Management
Notebooks
Pipelines (Orchestration)
Explainable
AI
Hybrid AI
Continuous
Monitoring
Metadata
Vision Translation Tables
Language
Video
AI
Accelerators
Vizier
Optimization
Datasets
What’s included in Vertex AI? NDA
Vertex Pipelines: Key capabilities
Python SDKs
Data Scientist friendly
Python SDKs
Serverless and
Scalable
Run as many pipelines
on as much data as you
want.
Metadata and lineage
Store metadata for
every artifact produced
by the pipeline.
Monitoring UIs
and APIs
Track and debug
pipeline executions
Security
Supports Cloud IAM,
VPC-SC, and CMEK.
Cost-effective
Only pay for the pipelines
you run and the
resources they use
Conditional triggers
Logging metrics
Experimentation management with Vertex Pipelines
Iterative Experimentation
Data
Prep
Development
datasets / Features
Source
Repository
Feature
Eng
Model
Training
Model
Eval
Experiment Tracking
Training Pipeline
Automation
Parameters, metrics, artifacts
Training
Pipeline
Source Code
Continuous Training with Vertex Pipelines
Orchestrated Training Pipeline
Data
Extraction
Development
datasets / Features
Model Registry &
Artifact Store
Data
Valid.
Data
Prep.
Model
Training
Training Pipeline Metadata
Trained
Model
Model
Eval.
Model
Valid.
Training Pipeline CI/CD
Training Pipeline Source Code
Evaluate and Understand Models
Tabular Text
What-If Tool (WIT)
Visually probe the behavior of trained machine
learning models, with minimal coding
Language Interpretability Tool (LIT)
Open-source platform for visualization and
understanding of NLP models.
A canonical ML workflow
Experimentation (Re) Training Model Deployment
Continuous Model
Monitoring
Training Serving
1 2 3 4
EDA /
Prototyping
Training
pipeline dev
Pipeline
CI/CD
Candidate
Model generation
Model
Serving
Canary & A/B
Testing
Model performance monitoring
Retrain Triggers
Data
Validation
Feature
Engineering
Model
Training
Model
Evaluation
Model
Registry
Model Cards
& Reporting
Model
Provenance
Compliance
Model Management & Governance
Learning Transferable Architectures for Scalable Image Recognition, Zoph et al. 2017, https://arxiv.org/abs/1707.07012
computational cost
Accuracy
(precision
@1)
accuracy
AutoML outperforms handcrafted models
https://cloud.google.com/architecture/ml-on-gcp-best-practices
Three Modalities
of Google Cloud
1. Cloud Console
2. Command Line
3. APIs

Contenu connexe

Tendances

Google Vertex AI
Google Vertex AIGoogle Vertex AI
Google Vertex AIVikasBisoi
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceDATAVERSITY
 
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...Edge AI and Vision Alliance
 
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML EngineersIntro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML EngineersDaniel Zivkovic
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptxWasm1953
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18Harvinder Atwal
 
Preparing, Piloting & Paths to Success with Microsoft Copilot
Preparing, Piloting & Paths to Success with Microsoft CopilotPreparing, Piloting & Paths to Success with Microsoft Copilot
Preparing, Piloting & Paths to Success with Microsoft CopilotRichard Harbridge
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
Ml ops past_present_future
Ml ops past_present_futureMl ops past_present_future
Ml ops past_present_futureNisha Talagala
 
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬VINCI Digital - Industrial IoT (IIoT) Strategic Advisory
 
Future Ready Enterprise Systems | Accenture
Future Ready Enterprise Systems | AccentureFuture Ready Enterprise Systems | Accenture
Future Ready Enterprise Systems | Accentureaccenture
 
MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.Knoldus Inc.
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOpsCarl W. Handlin
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationDATAVERSITY
 

Tendances (20)

Google Vertex AI
Google Vertex AIGoogle Vertex AI
Google Vertex AI
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
 
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
 
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML EngineersIntro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18
 
Preparing, Piloting & Paths to Success with Microsoft Copilot
Preparing, Piloting & Paths to Success with Microsoft CopilotPreparing, Piloting & Paths to Success with Microsoft Copilot
Preparing, Piloting & Paths to Success with Microsoft Copilot
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Ml ops past_present_future
Ml ops past_present_futureMl ops past_present_future
Ml ops past_present_future
 
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
 
Azure purview
Azure purviewAzure purview
Azure purview
 
Microsoft Purview
Microsoft PurviewMicrosoft Purview
Microsoft Purview
 
Future Ready Enterprise Systems | Accenture
Future Ready Enterprise Systems | AccentureFuture Ready Enterprise Systems | Accenture
Future Ready Enterprise Systems | Accenture
 
Data stewardship
Data stewardshipData stewardship
Data stewardship
 
MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOps
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital Transformation
 
Getting your enterprise ready for Microsoft 365 Copilot
Getting your enterprise ready for Microsoft 365 CopilotGetting your enterprise ready for Microsoft 365 Copilot
Getting your enterprise ready for Microsoft 365 Copilot
 

Similaire à GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google Cloud

Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentDatabricks
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...All Things Open
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)dtz001
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...HostedbyConfluent
 
EPAM ML/AI Accelerator - ODAHU
EPAM ML/AI Accelerator - ODAHUEPAM ML/AI Accelerator - ODAHU
EPAM ML/AI Accelerator - ODAHUDmitrii Suslov
 
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...UA DevOps Conference
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningEdunomica
 
Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapNeo4j
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLMárton Kodok
 
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...ScyllaDB
 
.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014Mark Tabladillo
 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019GoDataDriven
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshIanFurlong4
 
Production ML Systems and Computer Vision with Google Cloud
Production ML Systems and Computer Vision with Google CloudProduction ML Systems and Computer Vision with Google Cloud
Production ML Systems and Computer Vision with Google Cloudgdgsurrey
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningProvectus
 
Neo4j Vision and Roadmap
Neo4j Vision and Roadmap Neo4j Vision and Roadmap
Neo4j Vision and Roadmap Neo4j
 
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...Naoki (Neo) SATO
 
DICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made EasyDICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made EasyCloudify Community
 

GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google Cloud

  • 1. Proprietary + Confidential. Data Science in Cloud: Quick Tour. Priyanka Vergadia, Staff Developer Advocate, Google Cloud. Twitter: @pvergadia
  • 2. Flow: 1. Google Cloud Orientation; 2. Data Science; 3. Data Analytics; 4. MLOps; 5. Some Google Cloud Tools (BigQuery, BigQuery ML, and Vertex AI); 6. Wrap up. @pvergadia
  • 4. Things I don’t want to think about... 1. Provisioning hardware 2. Installing software 3. Upgrading operating systems 4. Security patching 5. System and network admin 6. Scaling up/down 7. Paying for stuff I don’t use 8. Dealing with failures 9. Managing clusters
  • 5. Things I want to think about... 1. Solving my problem
  • 6. Getting things done using someone else’s computers, especially where someone else worries about maintenance, provisioning, system administration, security, networking, failure recovery, etc.
  • 13. “The real problems with an ML system will be found while you are continuously operating it for the long term.” Launching is easy, operating is hard. (Image: pixabay.com)
  • 14. Developing the model is just the beginning... Modeling Code
  • 15. …a product requires so much more: Configuration, Data Collection, Data Verification, Feature Extraction, Process Management Tools, Analysis Tools, Machine Resource Management, Serving Infrastructure, Monitoring, ML Code
  • 16. Why do things become harder in production? (an incomplete list)
      ● data cleaning and processing is hard at scale
      ● scaling out training and serving; infrastructure issues
      ● tracking, monitoring, and reproducibility requirements
          ○ model or data drift
          ○ training/serving skew
      ● access control issues, security requirements
      ● (and lots more)
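One way to make "data drift" and "training/serving skew" concrete is to compare the distribution of a feature at training time with the same feature at serving time. The sketch below is not from the slides: it uses the Population Stability Index (PSI) as one possible drift score, and the function name `psi`, the bin count, and the sample data are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def psi(train: Sequence[float], serve: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature.

    Bins are equal-width over the training range (an assumption of this sketch);
    serving values outside that range are clamped into the first or last bin.
    """
    lo, hi = min(train), max(train)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all training values are equal

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = fractions(train), fractions(serve)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical data: the serving distribution is shifted relative to training.
train_values = [i / 100 for i in range(100)]
serve_values = [v + 0.5 for v in train_values]

print(psi(train_values, train_values))  # identical samples give a score of 0.0
print(psi(train_values, serve_values))  # a shifted sample gives a clearly positive score
```

A monitoring job could compute such a score per feature on a schedule and alert above a threshold chosen for the use case; the threshold itself is a judgment call, not something the slides specify.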
  • 17. The level of automation defines the maturity of the ML process. Level 0: build and deploy manually. Level 1: automate the training phase. Level 2: automate training, validation, and deployment.
  • 19. “On any given day there are thousands of TFX pipelines running, which are processing exabytes of data and producing tens of thousands of models, which in turn are performing hundreds of millions of inferences per second.”
  • 22. Google BigQuery: Data warehouse with customers ranging from TB to 100+ PB. Insights for everyone. Cloud-scale enterprise data warehouse Unique Serverless platform Standard SQL (ANSI 2011) with DML Support Encrypted, durable, highly available Unique Built-in ML Unique Real-time insights Unique
  • 23. In ~15s: ● Read 2TB: ○ ~1k disks ● Run 50B regexps: ○ ~3k cores
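The figures on this slide imply a per-disk and per-core rate that is easy to check. The arithmetic below is a back-of-the-envelope sketch; it assumes decimal units (1 TB = 10^12 bytes) and that work is spread evenly across disks and cores, neither of which the slide states.

```python
TB = 10**12  # decimal terabyte (assumption)


def per_unit_rate(total: float, units: int, seconds: float) -> float:
    """Average amount of work each unit must sustain per second."""
    return total / units / seconds


# 2 TB read in ~15 s across ~1k disks
disk_bytes_per_s = per_unit_rate(2 * TB, 1_000, 15)
# 50B regular-expression evaluations in ~15 s across ~3k cores
regex_per_core_s = per_unit_rate(50e9, 3_000, 15)

print(f"{disk_bytes_per_s / 1e6:.0f} MB/s per disk")            # about 133 MB/s
print(f"{regex_per_core_s / 1e6:.2f} M regexps/s per core")     # about 1.11 M/s
```

In other words, no single disk or core does anything unusual; the speed comes from running roughly a thousand disks and three thousand cores in parallel.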
  • 24. BigQuery ML: Train and deploy ML models in SQL. Execute ML workflows without moving data from BigQuery. Automate common ML tasks. Built-in infrastructure management, security & compliance.
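To illustrate what "train and deploy ML models in SQL" looks like, the sketch below builds a BigQuery ML `CREATE MODEL` statement and a matching `ML.PREDICT` query as plain strings. The helper names and the dataset, table, and column names are hypothetical; the statements would still need to be submitted to BigQuery (for example from the console or a client library) to do anything.

```python
def create_model_sql(model: str, source_table: str, label: str,
                     model_type: str = "logistic_reg") -> str:
    """Build a BigQuery ML CREATE MODEL statement (string only, nothing is executed)."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


def predict_sql(model: str, source_table: str) -> str:
    """Build a batch prediction query against a trained model."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model}`, TABLE `{source_table}`)"


# Hypothetical names, for illustration only.
train_stmt = create_model_sql("my_dataset.churn_model", "my_dataset.customers", "churned")
predict_stmt = predict_sql("my_dataset.churn_model", "my_dataset.new_customers")

print(train_stmt)
print(predict_stmt)
```

The helpers interpolate identifiers directly into the SQL text, so they are only suitable for trusted, hard-coded names as in this example.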
  • 25. BigQuery ML supported models and features: The data analyst’s onramp to AI and ML
      ● Classification: Logistic regression; DNN classifier (TensorFlow); XGBoost; AutoML Tables; Wide and Deep NNs (Preview, GA H1’21)
      ● Regression: Linear regression; DNN regressor (TensorFlow); XGBoost; AutoML Tables; Wide and Deep NNs (Preview, GA H1’21)
      ● Other models: k-means clustering; Time series forecasting; Recommendation: Matrix factorization; Autoencoders; Time series anomaly detection (Preview Q2’21, GA H2’21)
      ● Model ops and explainability: Import/export TensorFlow models for batch and online prediction; Hyperparameter tuning using Cloud AI Vizier (Preview H1’21, GA H2’21); Model explainability using Cloud AI (Preview H1’21, GA H2’21); Managed Kubernetes and TFX pipelines (Preview H2’21, GA 2022); List models for comparison and online deployment in Cloud AI (Preview H2’21, GA 2022); Model versioning, continuous monitoring (future)
  • 27. Vertex AI is a managed ML platform to speed the rate of experimentation and accelerate deployment of AI models.
  • 28. The End-To-End ML Journey through Vertex AI
      ● Where can I find training data? Feature Store, Datasets
      ● Where do I start with model experiments? Workbench
      ● How can I track the results of experiments? Experiments
      ● How can I train at scale? Training
      ● How do I deploy? Endpoints
      ● And for production? Monitoring, Pipelines
  • 30. Learn more
      ● Introduction to Data Science blog: https://goo.gle/dsintro
      ● Getting started docs: cloud.google.com/vertex-ai/docs
      ● Get started in Cloud Console: console.cloud.google.com/ai/platform
      ● Best practices: cloud.google.com/architecture/ml-on-gcp-best-practices
  • 34. What is Dataplex?
      ● Built for distributed data: Logically unify and organize your data without any data movement.
      ● Intelligent Data Management: Automatic data discovery, metadata harvesting, lifecycle management, and data quality with built-in AI-driven intelligence.
      ● Centralized Security & Governance: Central policy management, monitoring and auditing for data authorization, retention, and classification.
      Diagram labels:
      ● Dataplex: Data Lifecycle Mgmt (Ingest, discover, prep, monitor, serve, archive); Logical data organization; Unified Security and Governance; Unified Metadata with auto-discovery
      ● Services: BigQuery; Dataproc; Dataflow; AI Platform; Data Studio
      ● Data: Structured; Semi-Structured; Unstructured; Streaming Data*
      ● Storage: GCP; On-premises*; Multi-Cloud*
      ● Other labels: Data Classification and Data Quality; Data Intelligence; Analytics
      *future capabilities
  • 36. Data Science On Google Cloud: A Guided Tour. Polong Lin & Marc Cohen, Developer Relations Engineers, Google Cloud. Slides: mco.fyi/ds
  • 40. Complexity is a barrier to adoption (Photo by Martin Olsen)
  • 41.
      HELLO    CSECT               The name of this program is 'HELLO'
      *                            Register 15 points here on entry from OPSYS or caller.
               STM   14,12,12(13)  Save registers 14,15, and 0 thru 12 in caller's Save area
               LR    12,15         Set up base register with program's entry point address
               USING HELLO,12      Tell assembler which register we are using for pgm. base
               LA    15,SAVE       Now Point at our own save area
               ST    15,8(13)      Set forward chain
               ST    13,4(15)      Set back chain
               LR    13,15         Set R13 to address of new save area
      *                            -end of housekeeping (similar for most programs) -
               WTO   'Hello World' Write To Operator (Operating System macro)
      *
               L     13,4(13)      restore address to caller-provided save area
               XC    8(4,13),8(13) Clear forward chain
               LM    14,12,12(13)  Restore registers as on entry
               DROP  12            The opposite of 'USING'
               SR    15,15         Set register 15 to 0 so that the return code (R15) is Zero
               BR    14            Return to caller
      *
      SAVE     DS    18F           Define 18 fullwords to save calling program registers
               END   HELLO         This is the end of the program
  • 42. class HelloWorld { public static void main(String args[]) { System.out.println("Hello, World"); } }
  • 45. Production ML Research: Continuous Training for Production ML in the TFX Platform. OpML (2019). Slice Finder: Automated Data Slicing for Model Validation. ICDE (2019). Data Validation for Machine Learning. SysML (2019). TFX: A TensorFlow-Based Production-Scale Machine Learning Platform. KDD (2017). Data Management Challenges in Production Machine Learning. SIGMOD (2017). Rules of Machine Learning: Best Practices for ML Engineering. Google AI Web (2017). Machine Learning: The High Interest Credit Card of Technical Debt. NeurIPS (2015). Hidden Technical Debt in Machine Learning Systems. NIPS (2015).
  • 46. Serving and Monitoring Continuous Training Experimentation/ Development Code Repository Training Pipeline CI/CD Code and configurations Artifact Repository Pipeline artifacts Model Registry Model Deployment CI/CD Serving Infrastructure Trained model Model deployment ML Metadata Logs Serving logs Putting it all together End-to-end view
  • 47. Code & Config Training pipeline Registered model Deployed model Serving logs Focus of today Vertex Feature Store Vertex Training and Pipelines Vertex Model Monitoring Vertex Workbench Cloud Build Vertex ML Metadata Vertex Endpoints and Prediction
  • 48. Feature Store in one picture Our Solution Feature Store Online Store Feature Management API Batch Ingestion API Stream Ingestion API Feature Discovery API Online Serving API Batch Serving API Cache Online Prediction Model Training Batch Feature Engineering Streaming Feature Engineering Data Lake (BQ, GCS) Kafka/Pubsub Point-in-time lookups Registry Feature Monitoring Offline Store
  • 49. How does the new SDK fit into the picture? Our Solution Feature Store Online Store Feature Management API Batch Ingestion API Stream Ingestion API Feature Discovery API Online Serving API Batch Serving API Cache Online Prediction Model Training Batch Feature Engineering Streaming Feature Engineering Data Lake (BQ, GCS) Kafka/Pubsub Point-in-time lookups Registry Feature Monitoring Offline Store Vertex AI SDK Data engineer ML engineer Data scientist
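The "Point-in-time lookups" box in the Feature Store diagram can be illustrated with a small sketch. It is not the Vertex AI SDK: the in-memory `HISTORY` table, the entity ids, and the function name are hypothetical, chosen only to show the idea that a training example stamped at time t should see the latest feature value recorded at or before t, never a later one.

```python
from bisect import bisect_right

# Hypothetical feature history: entity id -> (timestamp, value) pairs,
# kept sorted by timestamp.
HISTORY = {
    "user_1": [(10, 0.2), (20, 0.5), (30, 0.9)],
    "user_2": [(15, 1.0)],
}


def point_in_time_lookup(history, entity_id, as_of):
    """Return the latest value recorded at or before `as_of`, else None."""
    rows = history.get(entity_id, [])
    timestamps = [ts for ts, _ in rows]
    # bisect_right counts how many rows have a timestamp <= as_of.
    idx = bisect_right(timestamps, as_of)
    return rows[idx - 1][1] if idx else None


# A training row stamped at t=25 must not see the value written at t=30.
value_at_25 = point_in_time_lookup(HISTORY, "user_1", 25)
```

Serving features this way for training is what keeps offline examples consistent with what the online store would have returned at the same moment.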
  • 51. Scalable training and serving on Vertex AI (columns: Train with, Use when, Serve with; audiences: Data Analyst, ML Developer, Data Scientist)
      Train with Vertex Training. Use when: your problem doesn't match the criteria listed below for BigQuery ML or AutoML; or you're already running training on-premises or on another cloud and you need consistency across the platforms. Serve with: Vertex Prediction.
      Train with AutoML. Use when: your problem fits into one of the types AutoML supports (offers a point-and-click workflow). Serving: Natural Language and Video models are served from Google Cloud, while Vision and Tables support edge / downloadable models.
      Train with BigQuery ML. Use when: all your data is contained in BigQuery; users are most comfortable with SQL; the set of models available in BigQuery ML matches the problem you're trying to solve.
  • 52. BigQuery ML Roadmap for 2021 (NDA). Areas: model development and data science; model deployment & management (MLOps); Explainable AI. H1'21: TF Wide and Deep NNs (Preview), Autoencoders (Preview), PCA (Preview), P-values for linear models (Preview), Hyperparameter Tuning (Preview), Anomaly Detection (Preview), AutoML Tables (GA). H2'21: TF Wide and Deep NNs (GA), Autoencoders (GA), PCA (GA), P-values for linear models (GA), Hyperparameter Tuning (GA), Anomaly Detection (GA), Multivariate Time Series (AutoML) (Preview), Model Registry (Preview), Managed Pipelines (Preview), Explainable AI (Preview).
  • 53. Preparing the training data Mix of demographic & behavioural data Each row is a different user
  • 54. Preparing the training data Each row is a different user Mix of demographic & behavioural data Goal is to create cluster labels 3 2 3 2 1
  • 55. Proprietary + Confidential Developer Days SELECT * EXCEPT(userId) FROM mydataset.train Build and train with CREATE MODEL
  • 56. Build and train with CREATE MODEL: CREATE OR REPLACE MODEL mydataset.kmeans_3 OPTIONS( model_type='KMEANS', kmeans_init_method = 'KMEANS++', num_clusters=3 ) AS SELECT * EXCEPT(userId) FROM mydataset.train
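The `kmeans_init_method = 'KMEANS++'` option refers to k-means++ seeding. The slides do not show how BigQuery ML implements it, so the following is only a textbook sketch in plain Python: the first centroid is drawn uniformly, and each later one is drawn with probability proportional to its squared distance from the nearest centroid already chosen. The function names and toy points are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centroids from `points` using k-means++ seeding.

    Assumes `points` holds at least k distinct points; otherwise all
    weights can become zero and random.choices raises ValueError.
    """
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Weight each point by squared distance to its nearest centroid,
        # so far-away points are more likely to seed a new cluster.
        weights = [
            min(squared_distance(p, c) for c in centroids) for p in points
        ]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


toy_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0),
              (5.1, 5.0), (10.0, 0.0), (10.0, 0.1)]
initial_centroids = kmeans_pp_init(toy_points, k=3, seed=1)
```

Points already chosen get weight zero, so with distinct inputs the same point is never picked twice.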
  • 57. Proprietary + Confidential Developer Days ML.PREDICT results
  • 58. Proprietary + Confidential Developer Days Compute cluster labels using ML.PREDICT SELECT * FROM ML.PREDICT(MODEL mydataset.kmeans_3, ( SELECT * FROM mydataset.train ))
  • 59. Proprietary + Confidential Developer Days Inspecting the clusters "Evaluation" tab on the BigQuery UI
  • 60. Proprietary + Confidential Developer Days Inspecting the clusters
  • 61. Anomaly detection with k-means Fraud detection Each row is a transaction Which rows are anomalies?
  • 62. CREATE MODEL - k-means clustering #Query for model training CREATE MODEL demo.kmeans_model OPTIONS( model_type='kmeans', num_clusters= 8, kmeans_init_method = 'kmeans++' ) AS SELECT * EXCEPT(Time, Class) FROM bigquery-public-data.ml_datasets.ulb_fraud_detection;
  • 63. ML.DETECT_ANOMALIES with k-means clustering #Query for creating anomaly detection results SELECT * FROM ML.DETECT_ANOMALIES( MODEL demo.kmeans_model, STRUCT(0.005 AS contamination), TABLE bigquery-public-data.ml_datasets.ulb_fraud_detection );
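The role of `contamination` in the query above can be sketched in plain Python. The slides do not describe how BigQuery ML derives its threshold, so this example assumes a simple rule: score each row by its Euclidean distance to the nearest centroid and flag the most distant fraction of rows given by `contamination`. The centroids, points, and function names are hypothetical.

```python
import math


def nearest_centroid_distance(point, centroids):
    """Euclidean distance from `point` to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)


def detect_anomalies(points, centroids, contamination):
    """Flag roughly the `contamination` fraction of rows farthest from any centroid.

    Returns a list of booleans aligned with `points`.
    """
    distances = [nearest_centroid_distance(p, centroids) for p in points]
    n_flagged = math.ceil(contamination * len(points))
    # Rank row indices from most to least distant and flag the top ones.
    ranked = sorted(range(len(points)), key=lambda i: distances[i], reverse=True)
    flagged = set(ranked[:n_flagged])
    return [i in flagged for i in range(len(points))]


toy_centroids = [(0.0, 0.0), (1.0, 1.0)]
toy_rows = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (0.9, 1.0), (8.0, 8.0)]
is_anomaly = detect_anomalies(toy_rows, toy_centroids, contamination=0.2)
```

A small value such as the 0.005 used on the slide would flag about 0.5% of rows under this assumption, which suits fraud data where anomalies are rare.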
  • 66. Automated HP tuning Have BigQuery ML automatically search for the optimal hyperparameters Preview Select number of trials 1 Don't need to be an expert in HPs Save time from manually training models with different HPs Easy to use CREATE MODEL mydataset.my_logreg_model OPTIONS( model_type="logistic_reg", input_label_cols=["mylabel"], num_trials=20 ) AS SELECT * FROM mydataset.my_training_data Hyperparameter tuning with BigQuery ML
  • 67. Automated HP tuning Have BigQuery ML automatically search for the optimal hyperparameters Preview Select number of trials 1 Uses Vertex Vizier under-the-hood Save time from manually training models with different HPs Easy to use Inspect the trials info 2 SELECT * FROM ML.TRIAL_INFO(MODEL mydataset.my_logreg_model) Even while it's still training! Hyperparameter tuning with BigQuery ML
  • 68. Automated HP tuning Have BigQuery ML automatically search for the optimal hyperparameters Preview Select number of trials 1 Uses Vertex Vizier under-the-hood Save time from manually training models with different HPs Easy to use Inspect the trials info 2 Evaluate your model 3 Hyperparameter tuning with BigQuery ML
  • 69. Automated HP tuning Have BigQuery ML automatically search for the optimal hyperparameters Preview Select number of trials 1 Uses Vertex Vizier under-the-hood Save time from manually training models with different HPs Easy to use Inspect the trials info 2 Evaluate your model 3 Predict! 4 SELECT * FROM ML.PREDICT(MODEL mydataset.my_logreg_model) Hyperparameter tuning with BigQuery ML
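The four steps on these slides (pick a number of trials, inspect the trials, evaluate, predict) can be mimicked with a toy search loop. This is not how Vertex Vizier searches; plain random sampling is used here as a stand-in, and the `l2_reg` parameter, its range, and the objective function are hypothetical.

```python
import random


def run_trials(objective, search_space, num_trials, seed=0):
    """Sample `num_trials` hyperparameter settings and score each one."""
    rng = random.Random(seed)
    trials = []
    for trial_id in range(1, num_trials + 1):
        # Draw every hyperparameter uniformly from its (low, high) range.
        params = {
            name: rng.uniform(low, high)
            for name, (low, high) in search_space.items()
        }
        trials.append(
            {"trial_id": trial_id, "params": params, "score": objective(params)}
        )
    return trials


def best_trial(trials):
    """Return the trial with the highest score."""
    return max(trials, key=lambda trial: trial["score"])


def toy_objective(params):
    # Hypothetical score that peaks when l2_reg is 0.3.
    return -(params["l2_reg"] - 0.3) ** 2


trial_info = run_trials(toy_objective, {"l2_reg": (0.0, 1.0)}, num_trials=20)
winner = best_trial(trial_info)
```

The `trial_info` list plays the role that `ML.TRIAL_INFO` plays on the slide: one record per trial, with its settings and score, from which the best trial is selected.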
  • 70. How to import TensorFlow models to do batch predictions in BigQuery using BigQuery ML
  • 71. Importing TensorFlow models into BigQuery CREATE MODEL PREDICT https://towardsdatascience.com/how-to-do-batch-predictions-of-tensorflow-models-directly-in-bigquery-ffa843ebdba6 https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models
  • 72. TensorFlow Hub - tfhub.dev
  • 73. Question: Can we do text similarity based on embeddings? # The following are example embedding outputs of 20 dimensions per sentence # Embedding for: The quick brown fox jumps over the lazy dog. # [0.0560572519898, 0.0534118898213, -0.0112254749984, ...] # Embedding for: I am a sentence for which I would like to get its embedding. # [-0.0343746766448, -0.0529498048127, 0.0469399243593, ...]
  • 74. Text similarity using an imported Tensorflow model https://towardsdatascience.com/how-to-do-text-similarity-search-and-document-clustering-in-bigquery-75eb8f45ab65 Goal: I want to search for comments similar to: "power line down on a home"
  • 75. Step 1: Save the TensorFlow model to GCS. Step 2: CREATE MODEL using the GCS folder path: CREATE OR REPLACE MODEL mydataset.swivel_text_embed OPTIONS( model_type='tensorflow', model_path='gs://BUCKET/swivel/*')
  • 76. Step 3: Use ML.PREDICT to get comment embeddings SELECT * FROM ML.PREDICT(MODEL mydataset.swivel_text_embed, (SELECT comments AS sentences FROM mydataset.mydata) );
  • 77. Step 3: Use ML.PREDICT to get comment embeddings SELECT * FROM ML.PREDICT(MODEL mydataset.swivel_text_embed, (SELECT comments AS sentences FROM mydataset.mydata) ); Each comment is converted into an embedding of 20 floating-point values
  • 78. Step 4: Calculate distance between embeddings to compute text similarity Input search term: "power line down on a home" Top 15 most similar comments to input
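Step 4 can be sketched as a ranking over embedding vectors. The slide says only "distance" without naming a metric, so cosine similarity is assumed here, and the two-dimensional vectors and comment texts are made up for illustration (the slides describe 20-dimensional embeddings).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query_embedding, corpus, k):
    """Return the k texts whose embeddings are most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), text)
        for text, embedding in corpus.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical comments with toy 2-dimensional embeddings.
toy_corpus = {
    "power line fell on a house": [0.9, 0.1],
    "tree blocking the road": [0.0, 1.0],
    "sunny with light wind": [-1.0, 0.0],
}
matches = top_k_similar([1.0, 0.0], toy_corpus, k=2)
```

With real embeddings the query vector would come from running the search term through the same model as the comments, so both live in the same vector space.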
  • 79. Exporting BQML models for use with Vertex Model trained with BigQuery ML Vertex Pipelines Export to Cloud Storage https://github.com/GoogleCloudPlatform/analytics-componentized-patterns/tree/master/retail/recommendation-system/bqml-mlops
  • 80. Proprietary + Confidential Data Labeling AutoML DL Environment (DL VM + DL Container) Prediction Feature Store Training Experiments Data Readiness Feature Engineering Training/ HP-Tuning Model Monitoring Model serving Understanding/ Tuning Edge Model Management Notebooks Pipelines (Orchestration) Explainable AI Hybrid AI Continuous Monitoring Metadata Vision Translation Tables Language Video AI Accelerators Vizier Optimization Datasets What’s included in Vertex AI? NDA
  • 81. Vertex Pipelines: Key capabilities. Python SDKs: data-scientist-friendly Python SDKs. Serverless and scalable: run as many pipelines on as much data as you want. Metadata and lineage: store metadata for every artifact produced by the pipeline. Monitoring UIs and APIs: track and debug pipeline executions. Security: supports Cloud IAM, VPC-SC, and CMEK. Cost-effective: only pay for the pipelines you run and the resources they use.
  • 87. Proprietary + Confidential Experimentation management with Vertex Pipelines Iterative Experimentation Data Prep Development datasets / Features Source Repository Feature Eng Model Training Model Eval Experiment Tracking Training Pipeline Automation Parameters, metrics, artifacts Training Pipeline Source Code
  • 88. Proprietary + Confidential Continuous Training with Vertex Pipelines Orchestrated Training Pipeline Data Extraction Development datasets / Features Model Registry & Artifact Store Data Valid. Data Prep. Model Training Training Pipeline Metadata Trained Model Model Eval. Model Valid. Training Pipeline CI/CD Training Pipeline Source Code
  • 89. Evaluate and Understand Models. Tabular: What-If Tool (WIT), to visually probe the behavior of trained machine learning models with minimal coding. Text: Language Interpretability Tool (LIT), an open-source platform for visualization and understanding of NLP models.
  • 90. A canonical ML workflow Experimentation (Re) Training Model Deployment Continuous Model Monitoring Training Serving 1 2 3 4 EDA / Prototyping Training pipeline dev Pipeline CI/CD Candidate Model generation Model Serving Canary & A/B Testing Model performance monitoring Retrain Triggers Data Validation Feature Engineering Model Training Model Evaluation Model Registry Model Cards & Reporting Model Provenance Compliance Model Management & Governance
  • 91. AutoML outperforms handcrafted models. [Plot: accuracy (precision @1) versus computational cost.] Source: Learning Transferable Architectures for Scalable Image Recognition, Zoph et al. 2017, https://arxiv.org/abs/1707.07012
  • 93. Proprietary + Confidential Three Modalities of Google Cloud