KFServing - Serverless Model Inferencing
Animesh Singh, Tommy Li

[Title slide: Kubeflow component map covering Jupyter Notebooks, workflow building (Kale, Fairing), pipelines (TFX, KF Pipelines), tools (HP Tuning, Tensorboard), serving (KFServing, Seldon Core, TFServing, +), training operators (Tensorflow, Pytorch, XGBoost, MXNet, MPI, +), metadata, and Prometheus]
Production Model Serving? How hard could it be?

[Diagram: ML lifecycle from Prepared and Analyzed Data → Untrained Model → Prepared Data → Trained Model → Deployed Model]

❑  Rollouts: Is this rollout safe? How do I roll back? Can I test a change without swapping traffic?
❑  Protocol Standards: How do I make a prediction? GRPC? HTTP? Kafka?
❑  Cost: Is the model over- or under-scaled? Are resources being used efficiently?
❑  Monitoring: Are the endpoints healthy? What is the performance profile and request trace?
❑  Frameworks: How do I serve on Tensorflow? XGBoost? Scikit Learn? Pytorch? Custom Code?
❑  Features: How do I explain the predictions? What about detecting outliers and skew? Bias detection? Adversarial detection?
❑  Processing: How do I wire up custom pre- and post-processing?
❑  Batch: How do I handle batch predictions?
❑  Portability: How do I leverage a standardized Data Plane protocol so that I can move my model across ML serving platforms?
Experts fragmented across industry
●  Seldon Core was pioneering graph inferencing.
●  IBM and Bloomberg were exploring serverless ML lambdas. IBM gave a talk on ML serving with Knative at the last KubeCon in Seattle.
●  Google had built a common Tensorflow HTTP API for models.
●  Microsoft was Kubernetizing their Azure ML stack.

Putting the pieces together
●  Kubeflow created the conditions for collaboration.
●  A promise of open code and open community.
●  Shared responsibilities and expertise across multiple companies.
●  Diverse requirements from different customer segments.
Introducing KFServing
KFServing
●  Founded by Google, Seldon, IBM, Bloomberg and Microsoft	
●  Part of the Kubeflow project
●  Focus on 80% use cases - single model rollout and update
●  KFServing 1.0 goals:
○  Serverless ML Inference
○  Canary rollouts
○  Model Explanations
○  Optional Pre/Post processing
KFServing Stack

Knative
Knative provides a set of building blocks that enable declarative, container-based, serverless workloads on Kubernetes. Knative Serving provides primitives for serving platforms such as:
•  Event-triggered functions on Kubernetes
•  Scale to and from zero
•  Queue-based autoscaling for GPUs and TPUs; Knative autoscaling by default scales on in-flight requests per pod
•  Traditional CPU autoscaling if desired; traditional scaling is hard for disparate devices (GPU, CPU, TPU)
IBM is the 2nd largest contributor.
Istio
An open service mesh platform to connect, observe, secure, and control microservices. Founded by Google, IBM and Lyft; IBM is the 2nd largest contributor.
•  Connect: Traffic Control, Discovery, Load Balancing, Resiliency
•  Observe: Metrics, Logging, Tracing
•  Secure: Encryption (TLS), Authentication, and Authorization of service-to-service communication
•  Control: Policy Enforcement
KFServing: Default and Canary Configurations
Manages the hosting aspects of your models:
•  InferenceService - manages the lifecycle of models
•  Configuration - manages the history of model deployments; two configurations, for default and canary
•  Revision - a snapshot of your model version
•  Route - endpoint and network traffic management

[Diagram: a KFService Route sends 90% of traffic to the Default Configuration (Revision 1 ... Revision M) and 10% to the Canary Configuration (Revision 1 ... Revision N)]
Supported Frameworks, Components and Storage Subsystems

Model Servers:
- TensorFlow
- Nvidia TRTIS
- PyTorch
- XGBoost
- SKLearn
- ONNX

Components:
- Predictor, Explainer, Transformer (pre-processor, post-processor)

Storage:
- AWS/S3
- GCS
- Azure Blob
- PVC
Inference Service Control Plane
The InferenceService architecture consists of a static graph of components which coordinate requests for a single model. Advanced features such as Ensembling, A/B testing, and Multi-Armed Bandits should compose InferenceServices together.
KFServing Deployment View
KFServing Data Plane Unification
-  Today's popular model servers, such as TFServing, ONNX, Seldon, and TRTIS, all communicate using similar but non-interoperable HTTP/gRPC protocols.
-  The KFServing v1 data plane protocol uses a TFServing-compatible HTTP API and introduces an explain verb to standardize across model servers; gRPC support and performance optimization are punted to v2.
KFServing Data Plane v1 protocol

API          Verb   Path                               Payload
List Models  GET    /v1/models                         [model_names]
Readiness    GET    /v1/models/<model_name>
Predict      POST   /v1/models/<model_name>:predict    Request: {instances:[]}  Response: {predictions:[]}
Explain      POST   /v1/models/<model_name>:explain    Request: {instances:[]}  Response: {predictions:[], explanations:[]}
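To make the protocol concrete, here is a minimal Python client sketch against the v1 data plane; the host, model name, and input values are hypothetical, and a real cluster's ingress may additionally require a Host header:

import requests

# Hypothetical URL of an InferenceService named "sklearn-iris".
HOST = "http://sklearn-iris.default.example.com"
MODEL = "sklearn-iris"

# Predict: POST /v1/models/<model_name>:predict
resp = requests.post(f"{HOST}/v1/models/{MODEL}:predict",
                     json={"instances": [[6.8, 2.8, 4.8, 1.4]]})
print(resp.json())  # {"predictions": [...]}

# Explain: POST /v1/models/<model_name>:explain returns explanations as well
resp = requests.post(f"{HOST}/v1/models/{MODEL}:explain",
                     json={"instances": [[6.8, 2.8, 4.8, 1.4]]})
print(resp.json())  # {"predictions": [...], "explanations": [...]}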
KFServing Examples

apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  default:
    sklearn:
      modelUri: "gs://kfserving-samples/models/sklearn/iris"

apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
spec:
  default:
    tensorflow:
      modelUri: "gs://kfserving-samples/models/tensorflow/flowers"

apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "InferenceService"
metadata:
  name: "pytorch-iris"
spec:
  default:
    pytorch:
      modelUri: "gs://kfserving-samples/models/pytorch/iris"
Canary/Pinned Examples

Canary:
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "my-model"
spec:
  default:
    # 90% of traffic is sent to this model
    tensorflow:
      modelUri: "gs://mybucket/mymodel-2"
  canaryTrafficPercent: 10
  canary:
    # 10% of traffic is sent to this model
    tensorflow:
      modelUri: "gs://mybucket/mymodel-3"

Pinned:
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "my-model"
spec:
  default:
    tensorflow:
      modelUri: "gs://mybucket/mymodel-2"
  # Defaults to zero, so can also be omitted or explicitly set to zero.
  canaryTrafficPercent: 0
  canary:
    # Canary is created but no traffic is directly forwarded.
    tensorflow:
      modelUri: "gs://mybucket/mymodel-3"
TensorFlow Serving (TFServing)
●  Flexible, high-performance serving system for TensorFlow
●  https://www.tensorflow.org/tfx/guide/serving
●  Stable; Google has been using it since 2016
●  Supports the SavedModel format and GraphDef
●  Written in C++; supports both REST and gRPC
●  KFServing allows you to easily spin up an Inference Service with TFServing to serve your TensorFlow model on CPU or GPU, with serverless features like canary rollout and autoscaling
Inference Service with Transformer and TFServing

apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: bert-serving
spec:
  default:
    transformer:        # Pre/Post Processing
      custom:
        container:
          image: bert-transformer:v1
    predictor:          # TensorFlow Model Server
      tensorflow:
        storageUri: s3://examples/bert
        runtimeVersion: 1.14.0-gpu
        resources:
          limits:
            nvidia.com/gpu: 1
from typing import Dict

import kfserving
# BertTokenizer is assumed to come from the model's BERT tokenization library.

class BertTransformer(kfserving.KFModel):
    def __init__(self, name):
        super().__init__(name)
        # vocab_file is assumed to be supplied via configuration.
        self.bert_tokenizer = BertTokenizer(vocab_file)

    def preprocess(self, inputs: Dict) -> Dict:
        # text_a / text_b are assumed to be extracted from `inputs`.
        encoded_features = self.bert_tokenizer.encode_plus(
            text=text_a, text_pair=text_b)
        return {"input_ids": encoded_features["input_ids"],
                "input_mask": encoded_features["attention_mask"],
                "segment_ids": encoded_features["segment_ids"],
                "label_ids": 1}

    def postprocess(self, inputs: Dict) -> Dict:
        return inputs
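A transformer like this is typically wrapped in KFServing's Python model server inside the bert-transformer image; a minimal sketch, assuming the kfserving 0.x SDK (exact signatures may differ across releases):

# Hypothetical entry point for the bert-transformer:v1 image.
if __name__ == "__main__":
    model = BertTransformer("bert-serving")
    kfserving.KFServer().start([model])  # exposes /v1/models/bert-serving:predict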
NVIDIA Triton Inference Server
●  NVIDIA's highly optimized model runtime for GPUs
●  https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs
●  Supports model repository and versioning
●  Dynamic batching
●  Concurrent model execution
●  Supports TensorFlow, PyTorch, ONNX models
●  Written in C++; supports both REST and gRPC
●  The TensorRT Optimizer can further bring down BERT inference latency
Inference Service with Triton Inference Server

apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: bert-serving
spec:
  default:
    transformer:        # Pre/Post Processing
      custom:
        container:
          image: bert-transformer:v1
          env:
            - name: STORAGE_URI
              value: s3://examples/bert_transformer
    predictor:          # Triton Inference Server
      tensorrt:
        storageUri: s3://examples/bert
        runtimeVersion: r20.02
        resources:
          limits:
            nvidia.com/gpu: 1
# Pre/post-processing side: call Triton via the inference client's InferContext API.
infer_ctx = InferContext(url, protocol, model_name, model_version)

unique_ids = np.int32([1])
segment_ids = features["segment_ids"]
input_ids = features["input_ids"]
input_mask = features["input_mask"]

result = infer_ctx.run({'unique_ids': (unique_ids,),
                        'segment_ids': (segment_ids,),
                        'input_ids': (input_ids,),
                        'input_mask': (input_mask,)},
                       {'end_logits': InferContext.ResultFormat.RAW,
                        'start_logits': InferContext.ResultFormat.RAW},
                       batch_size)
PyTorch Model Server
●  PyTorch model server maintained by KFServing
●  https://github.com/kubeflow/kfserving/tree/master/python/pytorchserver
●  Implemented in Python with the Tornado server
●  Loads the model state dict and the model class Python file
●  GPU inference is supported in the KFServing 0.3 release
●  Alternatively, you can export a PyTorch model in ONNX format and serve it on TensorRT Inference Server or ONNX Runtime Server (see the export sketch below).
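For the ONNX export path, a minimal sketch with torch.onnx; the model and input shape here are hypothetical stand-ins:

import torch
import torchvision

# Any torch.nn.Module works; resnet18 is just a placeholder model.
model = torchvision.models.resnet18(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)  # example input used to trace the graph

# Export to ONNX so the model can be served by Triton or ONNX Runtime Server.
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])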
ONNX Runtime Server
●  ONNX Runtime is a performance-focused inference engine for ONNX models
●  https://github.com/microsoft/onnxruntime
●  Supports TensorFlow and PyTorch models, which can be converted to ONNX
●  Written in C++; supports both REST and gRPC
●  ONNX Runtime optimized the BERT transformer network to further bring down latency:
https://github.com/onnx/tutorials/blob/master/tutorials/Inference-TensorFlow-Bert-Model-for-High-Performance-in-ONNX-Runtime.ipynb
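Before deploying, an exported model can be sanity-checked locally with the onnxruntime Python package; a minimal sketch, reusing the hypothetical file and input name from the export above:

import numpy as np
import onnxruntime as ort

# Load the exported model and run a single local inference.
sess = ort.InferenceSession("model.onnx")
x = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = sess.run(None, {"input": x})  # None -> return all model outputs
print(outputs[0].shape)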
Inference Service with PyTorch/ONNX Runtime

PyTorch Model Server:
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: bert-serving
spec:
  default:
    transformer:        # Pre/Post Processing
      custom:
        container:
          image: bert-transformer:v1
          env:
            - name: STORAGE_URI
              value: s3://examples/bert_transformer
    predictor:
      pytorch:
        storageUri: s3://examples/bert
        runtimeVersion: v0.3.0-gpu
        resources:
          limits:
            nvidia.com/gpu: 1

ONNX Runtime Server:
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: bert-serving-onnx
spec:
  default:
    transformer:        # Pre/Post Processing
      custom:
        container:
          image: bert-transformer:v1
          env:
            - name: STORAGE_URI
              value: s3://examples/bert_transformer
    predictor:
      onnx:
        storageUri: s3://examples/bert
        runtimeVersion: 0.5.1
GPU Autoscaling - Knative solution

[Diagram: API requests enter via Ingress; when scale == 0 or when handling burst capacity, the Activator buffers requests; when scale > 0, traffic flows directly to 0...N replicas, each a Queue Proxy in front of the Model server; the Autoscaler consumes metrics from both paths and sets the scale]

●  Scale based on # of in-flight requests against expected concurrency
●  Simple solution for heterogeneous ML inference autoscaling
KFServing: Default, Canary and Autoscaler
But the Data Scientist Sees...
●  A pointer to a Serialized Model File
●  9 lines of YAML
●  A live model at an HTTP endpoint

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
spec:
  default:
    predictor:
      tensorflow:
        storageUri: "gs://kfserving-samples/models/tensorflow/flowers"

...which is equivalent to getting, behind the scenes:
●  Scale to Zero
●  GPU Autoscaling
●  Safe Rollouts
●  Optimized Serving Containers
●  Network Policy and Auth
●  HTTP APIs (gRPC soon)
●  Tracing
●  Metrics

Production users include: Bloomberg
Summary
●  With KFServing, users can easily deploy a service for inference on GPUs with performant, industry-leading model servers, while benefiting from all the serverless features.
●  Autoscaling the inference workload based on your QPS gives much better resource utilization.
●  gRPC can provide better performance than REST: it allows multiplexing, and Protobuf is a more efficient, compact format than JSON.
●  Transformers work seamlessly with different model servers thanks to KFServing's data plane standardization.
●  GPUs benefit a lot from batching requests.
Model Serving is accomplished. Can the predictions be trusted?

[Diagram: the same ML lifecycle as before, from Prepared and Analyzed Data → Untrained Model → Prepared Data → Trained Model → Deployed Model]

●  Can the model explain its predictions?
●  Are there concept drifts?
●  Is there an outlier?
●  Is the model vulnerable to adversarial attacks?
Production ML Architecture

[Diagram: a Serving API fronts an InferenceService containing the Model and an Explainer; the logger publishes payloads to a Broker, and Triggers fan events out to Outlier Detection, Adversarial Detection, Concept Drift, and Alerting components]
Payload Logging
Why:
●  Capture payloads for analysis and future retraining of the model
●  Perform offline processing of the requests and responses

KFServing implementation (alpha):
●  Add to any InferenceService endpoint: Predictor, Explainer, Transformer
●  Log requests, responses, or both from the endpoint
●  Simply specify a URL to send the payloads to
●  The URL will receive CloudEvents:

POST /event HTTP/1.0
Host: example.com
Content-Type: application/json
ce-specversion: 1.0
ce-type: repo.newItem
ce-source: http://bigco.com/repo
ce-id: 610b6dd4-c85d-417b-b58f-3771e532

<payload>
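A minimal sketch of a logger sink that could stand behind that URL, using only the Python standard library; the port and the printed fields are illustrative, not part of KFServing itself:

from http.server import BaseHTTPRequestHandler, HTTPServer

class PayloadDumper(BaseHTTPRequestHandler):
    """Accepts CloudEvents POSTs from the KFServing logger and prints them."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # CloudEvents metadata arrives as ce-* HTTP headers.
        event_type = self.headers.get("ce-type")
        event_id = self.headers.get("ce-id")
        print(f"type={event_type} id={event_id} payload={body.decode()}")
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), PayloadDumper).serve_forever()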
Payload Logging

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  default:
    predictor:
      minReplicas: 1
      logger:
        url: http://message-dumper.default/
        mode: all
      sklearn:
        storageUri: "gs://kfserving-samples/models/sklearn/iris"
        resources:
          requests:
            cpu: 0.1
Payload Logging Architecture Examples

[Diagram: a Serving API fronts an InferenceService whose logger publishes payloads to a Broker; Triggers route events to an Outlier Detector and Alerting, and an HTTP-Kafka Bridge forwards payloads to a Kafka Cluster]
Machine Learning Explanations
KFServing Explanations

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "income"
spec:
  default:
    predictor:
      sklearn:
        storageUri: "gs://seldon-models/sklearn/income/model"
    explainer:
      alibi:
        type: AnchorTabular
        storageUri: "gs://seldon-models/sklearn/income/explainer"

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "moviesentiment"
spec:
  default:
    predictor:
      sklearn:
        storageUri: "gs://seldon-models/sklearn/moviesentiment"
    explainer:
      alibi:
        type: AnchorText
Seldon Alibi:Explain
https://github.com/SeldonIO/alibi
Giovanni Vacanti, Janis Klaise, Alexandru Coca, Arnaud Van Looveren

State-of-the-art implementations:
•  Anchors
•  Counterfactuals
•  Contrastive explanations
•  Trust scores
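For a sense of the Alibi API behind the alibi explainer type, a minimal AnchorTabular sketch; the dataset, labels, and classifier here are hypothetical stand-ins:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from alibi.explainers import AnchorTabular

# Hypothetical tabular task: 4 numeric features, binary label.
X_train = np.random.randn(1000, 4)
y_train = (X_train[:, 0] > 0).astype(int)
clf = RandomForestClassifier().fit(X_train, y_train)

explainer = AnchorTabular(lambda x: clf.predict(x),
                          feature_names=["f0", "f1", "f2", "f3"])
explainer.fit(X_train)  # learns feature percentiles used for perturbation

explanation = explainer.explain(X_train[0], threshold=0.95)
print(" AND ".join(explanation.anchor))  # the anchor rule, e.g. "f0 > 0.01"
print(explanation.precision)  # fraction of perturbed samples keeping the prediction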
Explanations: Resources

AI Explainability 360 (AIX360)
https://github.com/IBM/AIX360
The AIX360 toolkit is an open-source library to help explain AI and machine learning models and their predictions. It includes three classes of algorithms: local post-hoc, global post-hoc, and directly interpretable explainers for models that use image, text, and structured/tabular data.
The AI Explainability 360 Python package includes a comprehensive set of explainers, at both the global and local level.
Toolbox: Local post-hoc, Global post-hoc, Directly interpretable
http://aix360.mybluemix.net
AIX360 Explainability in KFServing
ML Inference Analysis
Don't trust predictions on instances outside of the training distribution!
•  Outlier Detection
•  Adversarial Detection
•  Concept Drift
Outlier Detection
Don't trust predictions on instances outside of the training distribution!
→ Outlier Detection
Detector types:
-  stateful online vs. pretrained offline
-  feature vs. instance level detectors
Data types:
-  tabular, images & time series
Outlier types:
-  global, contextual & collective outliers
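As an illustration of a pretrained-offline, instance-level detector on tabular data, a minimal alibi-detect sketch; the data and threshold are hypothetical:

import numpy as np
from alibi_detect.od import IForest

# Fit an isolation-forest outlier detector on (hypothetical) training data.
X_train = np.random.randn(1000, 4).astype(np.float32)
od = IForest(threshold=0.0)  # threshold on the outlier score
od.fit(X_train)

# Score a batch mixing in-distribution and shifted instances.
X_test = np.concatenate([np.random.randn(5, 4),
                         10 + np.random.randn(5, 4)]).astype(np.float32)
preds = od.predict(X_test)
print(preds["data"]["is_outlier"])  # 1 where the score exceeds the threshold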
Adversarial Detection
Don't trust predictions on instances outside of the training distribution!
→ Adversarial Detection
-  Outliers w.r.t. the model prediction
-  Detect small input changes with a big impact on predictions!
Concept Drift
Production data distribution != training distribution?
→ Concept Drift! Retrain!
Need to track the right distributions:
-  feature vs. instance level
-  continuous vs. discrete
-  online vs. offline training data
-  track streaming number of outliers
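A minimal feature-level drift check with alibi-detect's Kolmogorov-Smirnov detector might look like the following; the reference and production batches are hypothetical:

import numpy as np
from alibi_detect.cd import KSDrift

# Reference data drawn from the training distribution (hypothetical).
X_ref = np.random.randn(1000, 4).astype(np.float32)
cd = KSDrift(X_ref, p_val=0.05)  # per-feature K-S tests with multiplicity correction

# A production batch whose distribution has shifted.
X_prod = (0.5 + 1.5 * np.random.randn(200, 4)).astype(np.float32)
preds = cd.predict(X_prod)
print(preds["data"]["is_drift"])  # 1 if drift is detected
print(preds["data"]["p_val"])     # per-feature p-values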
Outlier Detection on CIFAR10

[Diagram: a Serving API fronts a CIFAR10 Model InferenceService; its logger publishes payloads to a Broker, and a Trigger routes events to a CIFAR10 Outlier Detector whose results land in a Message Dumper]
Adversarial Detection Demos
•  KFServing MNIST Model with Alibi:Detect VAE Adversarial Detector
https://github.com/SeldonIO/alibi-detect/tree/master/integrations/samples/kfserving/ad-mnist
•  KFServing Traffic Signs Model with Alibi:Detect VAE Adversarial Detector
Adversarial Robustness 360 (ART)
https://github.com/IBM/adversarial-robustness-toolbox
ART is a library dedicated to adversarial machine learning. Its purpose is to allow rapid crafting and analysis of attack, defense and detection methods for machine learning models. Applicable domains include finance, self-driving vehicles, etc.
The Adversarial Robustness Toolbox provides implementations of many state-of-the-art methods for attacking and defending classifiers.
Toolbox: Attacks, defenses, and metrics
•  Evasion attacks
•  Defense methods
•  Detection methods
•  Robustness metrics
https://art-demo.mybluemix.net/
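A minimal evasion-attack sketch with ART; the scikit-learn classifier and data are hypothetical stand-ins, and ART equally wraps TensorFlow and PyTorch models:

import numpy as np
from sklearn.linear_model import LogisticRegression
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import FastGradientMethod

# Train a (hypothetical) classifier and wrap it for ART.
X = np.random.randn(200, 4).astype(np.float32)
y = (X[:, 0] > 0).astype(int)
clf = LogisticRegression().fit(X, y)
classifier = SklearnClassifier(model=clf)

# Craft adversarial examples with the Fast Gradient Method.
attack = FastGradientMethod(estimator=classifier, eps=0.3)
X_adv = attack.generate(x=X)
print("accuracy on adversarial inputs:", (clf.predict(X_adv) == y).mean())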
KFServing – Demo One https://www.youtube.com/watch?v=xg5ar6vSAXY&t
KFServing Pipeline – Demo

[Architecture diagram: stores (Store 1, Store 2, ... Store n) send readings over MQTT and HTTP to a Data Collection layer (NiFi, Debezium, Postgres, Minio, CDC); an Event Backbone (Kafka) and Data Processing (Spark) combine device events with Weather Data; microservices handle alerts and temperature checks and predict near failure / schedule maintenance; Kubeflow/KFServing serves the model to the application, with model history kept in S3 and a Data Warehouse; phases: Connect & Collect → Organize & Integrate → Analyze & Infuse]
Demo Scenario: Intelligent Refrigeration Unit Monitoring

Problem Statement
•  The failure of a refrigeration unit is a common problem, with failures leading to spoiled goods which become a cost to the business. Performing manual checks on the units is time-consuming and costly, and typically amounts to a visual check that the unit is running and the temperature is in range.
•  More generally, being able to automatically monitor the operation of the units, and to predict the likelihood of a unit failing in advance of an actual failure, can save costs by reducing manual workload and, more significantly, by reducing the amount of spoiled goods, since preventative action can be taken before the unit fails.
•  Such an intelligent monitoring solution has value across different industries, from supermarkets to healthcare, but perhaps one of the most significant opportunities comes with container shipping: monitoring refrigerated containers through their life cycle of being loaded at a supplier, transported to the dock, loaded onto a container ship, through the sea voyage, and finally unloaded and transported to the importer.
•  Being able to predict a likely failure before loading a container onto a ship enables preventive maintenance to be done before loading; this reduces the risk of goods being spoiled while at sea, where manual actions are limited. Such "intelligently" monitored containers incur lower insurance premiums for the voyage, as the risk of loss of goods is reduced.

Proposed Intelligent Refrigeration Monitoring System
•  The IT team is proposing the development of an intelligent application for monitoring and management of the refrigeration units in the containers. To accelerate development, the IT team proposes to follow a cloud-native approach, building on a Kubernetes (OpenShift) based private cloud. The development team will follow the cloud-native architecture and application patterns for intelligent applications, which include:
1.  Device data collection
2.  Real-time intelligence with machine learning
3.  Event-driven applications with event-driven microservices
KFServing Pipeline – Demo Two
MNIST model end to end using Kubeflow Pipelines with Tekton. The notebook demonstrates how to compile and execute an end-to-end machine learning workflow that uses Katib, TFJob, KFServing, and a Tekton pipeline. The pipeline contains 5 steps: it finds the best hyperparameters using Katib, creates a PVC for storing models, processes the hyperparameter results, trains the model on TFJob with the best hyperparameters using more iterations, and finally serves the model using KFServing.

https://github.com/kubeflow/kfp-tekton/tree/master/samples/e2e-mnist
KFServing – Existing Features
❑  Crowd-sourced capabilities – contributions by AWS, Bloomberg, Google, Seldon, IBM, NVIDIA and others
❑  Support for multiple runtimes pre-integrated: TFServing, NVIDIA Triton (GPU optimization), ONNX Runtime, SKLearn, PyTorch, XGBoost, custom models
❑  Serverless ML inference and autoscaling: scale to zero (with no incoming traffic) and request-queue-based autoscaling
❑  Canary and pinned rollouts: control traffic percentage and direction, pinned rollouts
❑  Pluggable pre-processor/post-processor via Transformer: plug in pre-processing/post-processing implementations and control routing and placement (e.g. pre-processor on CPU, predictor on GPU)
❑  Pluggable analysis algorithms: explainability, drift detection, anomaly detection, adversarial detection (contributed by Seldon), enabled by payload logging (built on the CloudEvents standardized eventing protocol)
❑  Batch predictions: batch prediction support for ML frameworks (TensorFlow, PyTorch, ...)
❑  Integration with the existing monitoring stack around the Knative/Istio ecosystem: Kiali (service placements, traffic and graphs), Jaeger (request tracing), Grafana/Prometheus plug-ins for Knative
❑  Multiple clients: kubectl, Python SDK, Kubeflow Pipelines SDK (see the SDK sketch below)
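A minimal sketch of the Python SDK path, following the kfserving 0.x SDK's published v1alpha2 sample; treat exact class and argument names as assumptions that vary across releases:

from kubernetes import client
from kfserving import (KFServingClient, V1alpha2EndpointSpec,
                       V1alpha2InferenceService, V1alpha2InferenceServiceSpec,
                       V1alpha2PredictorSpec, V1alpha2TensorflowSpec, constants)

# Default endpoint: a TensorFlow predictor pointing at a model in GCS.
default_endpoint = V1alpha2EndpointSpec(
    predictor=V1alpha2PredictorSpec(
        tensorflow=V1alpha2TensorflowSpec(
            storage_uri="gs://kfserving-samples/models/tensorflow/flowers")))

isvc = V1alpha2InferenceService(
    api_version=constants.KFSERVING_GROUP + "/" + constants.KFSERVING_VERSION,
    kind=constants.KFSERVING_KIND,
    metadata=client.V1ObjectMeta(name="flowers-sample", namespace="kubeflow"),
    spec=V1alpha2InferenceServiceSpec(default=default_endpoint))

KFServing = KFServingClient()
KFServing.create(isvc)  # submit the InferenceService
KFServing.get("flowers-sample", namespace="kubeflow", watch=True)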
KFServing – Upcoming Features
❑  Standardized Data Plane v2 protocol for prediction/explainability and more: already implemented by NVIDIA Triton
❑  MMS: Multi-Model Serving for serving multiple models per KFService instance
❑  More Data Plane v2 API compliant servers: SKLearn, XGBoost, PyTorch, ...
❑  Multi-model graphs and pipelines: support chaining multiple models together in a pipeline
❑  PyTorch support via AWS TorchServe
❑  gRPC support for all model servers
❑  Support for multi-armed bandits
❑  Integration with IBM AIX360 for explainability, AIF360 for bias detection and ART for adversarial detection
Open Source Projects
●  ML Inference
○  KFServing: https://github.com/kubeflow/kfserving
○  Seldon Core: https://github.com/SeldonIO/seldon-core
●  Model Explanations
○  Seldon Alibi: https://github.com/seldonio/alibi
○  IBM AI Explainability 360: https://github.com/IBM/AIX360
●  Outlier and Adversarial Detection and Concept Drift
○  Seldon Alibi-detect: https://github.com/seldonio/alibi-detect
●  Adversarial Attack, Detection and Defense
○  IBM Adversarial Robustness 360: https://github.com/IBM/adversarial-robustness-toolbox
Contenu connexe

Tendances

MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
Databricks
 

Tendances (20)

End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and ManageEnd to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
 
Training And Serving ML Model Using Kubeflow by Jayesh Sharma
Training And Serving ML Model Using Kubeflow by Jayesh SharmaTraining And Serving ML Model Using Kubeflow by Jayesh Sharma
Training And Serving ML Model Using Kubeflow by Jayesh Sharma
 
Serving models using KFServing
Serving models using KFServingServing models using KFServing
Serving models using KFServing
 
ContainerDays Hamburg 2023 — Cilium Workshop.pdf
ContainerDays Hamburg 2023 — Cilium Workshop.pdfContainerDays Hamburg 2023 — Cilium Workshop.pdf
ContainerDays Hamburg 2023 — Cilium Workshop.pdf
 
Prometheus
PrometheusPrometheus
Prometheus
 
주니어의 쿠버네티스 생태계에서 살아남기
주니어의 쿠버네티스 생태계에서 살아남기주니어의 쿠버네티스 생태계에서 살아남기
주니어의 쿠버네티스 생태계에서 살아남기
 
Kubernetesの良さを活かして開発・運用!Cloud Native入門 / An introductory Cloud Native #osc19tk
Kubernetesの良さを活かして開発・運用!Cloud Native入門 / An introductory Cloud Native #osc19tkKubernetesの良さを活かして開発・運用!Cloud Native入門 / An introductory Cloud Native #osc19tk
Kubernetesの良さを活かして開発・運用!Cloud Native入門 / An introductory Cloud Native #osc19tk
 
AI & Machine Learning Pipelines with Knative
AI & Machine Learning Pipelines with KnativeAI & Machine Learning Pipelines with Knative
AI & Machine Learning Pipelines with Knative
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
 
MLOps with Kubeflow
MLOps with Kubeflow MLOps with Kubeflow
MLOps with Kubeflow
 
Kubernetes Networking
Kubernetes NetworkingKubernetes Networking
Kubernetes Networking
 
Helm intro
Helm introHelm intro
Helm intro
 
Designing a complete ci cd pipeline using argo events, workflow and cd products
Designing a complete ci cd pipeline using argo events, workflow and cd productsDesigning a complete ci cd pipeline using argo events, workflow and cd products
Designing a complete ci cd pipeline using argo events, workflow and cd products
 
Building a Streaming Microservice Architecture: with Apache Spark Structured ...
Building a Streaming Microservice Architecture: with Apache Spark Structured ...Building a Streaming Microservice Architecture: with Apache Spark Structured ...
Building a Streaming Microservice Architecture: with Apache Spark Structured ...
 
[GKE & Spanner 勉強会] GKE 入門
[GKE & Spanner 勉強会] GKE 入門[GKE & Spanner 勉強会] GKE 入門
[GKE & Spanner 勉強会] GKE 入門
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
 
Introducing github.com/open-cluster-management – How to deliver apps across c...
Introducing github.com/open-cluster-management – How to deliver apps across c...Introducing github.com/open-cluster-management – How to deliver apps across c...
Introducing github.com/open-cluster-management – How to deliver apps across c...
 
急速に進化を続けるCNIプラグイン Antrea
急速に進化を続けるCNIプラグイン Antrea 急速に進化を続けるCNIプラグイン Antrea
急速に進化を続けるCNIプラグイン Antrea
 
Kubernetesによる機械学習基盤への挑戦
Kubernetesによる機械学習基盤への挑戦Kubernetesによる機械学習基盤への挑戦
Kubernetesによる機械学習基盤への挑戦
 
Rancher and Kubernetes Best Practices
Rancher and  Kubernetes Best PracticesRancher and  Kubernetes Best Practices
Rancher and Kubernetes Best Practices
 

Similaire à KFServing - Serverless Model Inferencing

Operating Kafka on AutoPilot mode @ DBS Bank (Arpit Dubey, DBS Bank) Kafka Su...
Operating Kafka on AutoPilot mode @ DBS Bank (Arpit Dubey, DBS Bank) Kafka Su...Operating Kafka on AutoPilot mode @ DBS Bank (Arpit Dubey, DBS Bank) Kafka Su...
Operating Kafka on AutoPilot mode @ DBS Bank (Arpit Dubey, DBS Bank) Kafka Su...
confluent
 
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Chris Fregly
 
High Performance Distributed TensorFlow with GPUs and Kubernetes
High Performance Distributed TensorFlow with GPUs and KubernetesHigh Performance Distributed TensorFlow with GPUs and Kubernetes
High Performance Distributed TensorFlow with GPUs and Kubernetes
inside-BigData.com
 

Similaire à KFServing - Serverless Model Inferencing (20)

Service Mesh - kilometer 30 in a microservice marathon
Service Mesh - kilometer 30 in a microservice marathonService Mesh - kilometer 30 in a microservice marathon
Service Mesh - kilometer 30 in a microservice marathon
 
ApacheCon NA - Apache Camel K: connect your Knative serverless applications w...
ApacheCon NA - Apache Camel K: connect your Knative serverless applications w...ApacheCon NA - Apache Camel K: connect your Knative serverless applications w...
ApacheCon NA - Apache Camel K: connect your Knative serverless applications w...
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
 
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...
 
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model ServingKubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
 
What is a Service Mesh and what can it do for your Microservices
What is a Service Mesh and what can it do for your MicroservicesWhat is a Service Mesh and what can it do for your Microservices
What is a Service Mesh and what can it do for your Microservices
 
Envoy @ Lyft: developer productivity (kubecon 2.0)
Envoy @ Lyft: developer productivity (kubecon 2.0)Envoy @ Lyft: developer productivity (kubecon 2.0)
Envoy @ Lyft: developer productivity (kubecon 2.0)
 
Cloudify: Open vCPE Design Concepts and Multi-Cloud Orchestration
Cloudify: Open vCPE Design Concepts and Multi-Cloud OrchestrationCloudify: Open vCPE Design Concepts and Multi-Cloud Orchestration
Cloudify: Open vCPE Design Concepts and Multi-Cloud Orchestration
 
Operating Kafka on AutoPilot mode @ DBS Bank (Arpit Dubey, DBS Bank) Kafka Su...
Operating Kafka on AutoPilot mode @ DBS Bank (Arpit Dubey, DBS Bank) Kafka Su...Operating Kafka on AutoPilot mode @ DBS Bank (Arpit Dubey, DBS Bank) Kafka Su...
Operating Kafka on AutoPilot mode @ DBS Bank (Arpit Dubey, DBS Bank) Kafka Su...
 
Amazon EKS 그리고 Service Mesh (김세호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018
Amazon EKS 그리고 Service Mesh (김세호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018Amazon EKS 그리고 Service Mesh (김세호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018
Amazon EKS 그리고 Service Mesh (김세호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018
 
Openshift serverless Solution
Openshift serverless SolutionOpenshift serverless Solution
Openshift serverless Solution
 
Apache Big Data Europe 2015: Selected Talks
Apache Big Data Europe 2015: Selected TalksApache Big Data Europe 2015: Selected Talks
Apache Big Data Europe 2015: Selected Talks
 
Managing Microservices With The Istio Service Mesh on Kubernetes
Managing Microservices With The Istio Service Mesh on KubernetesManaging Microservices With The Istio Service Mesh on Kubernetes
Managing Microservices With The Istio Service Mesh on Kubernetes
 
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
 
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
 
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
 
Envoy @ Lyft: Developer Productivity
Envoy @ Lyft: Developer ProductivityEnvoy @ Lyft: Developer Productivity
Envoy @ Lyft: Developer Productivity
 
Open Source Networking Days- Service Mesh
Open Source Networking Days- Service MeshOpen Source Networking Days- Service Mesh
Open Source Networking Days- Service Mesh
 
High Performance Distributed TensorFlow with GPUs and Kubernetes
High Performance Distributed TensorFlow with GPUs and KubernetesHigh Performance Distributed TensorFlow with GPUs and Kubernetes
High Performance Distributed TensorFlow with GPUs and Kubernetes
 

Plus de Animesh Singh

Building a PaaS Platform like Bluemix on OpenStack
Building a PaaS Platform like Bluemix on OpenStackBuilding a PaaS Platform like Bluemix on OpenStack
Building a PaaS Platform like Bluemix on OpenStack
Animesh Singh
 
Automated Lifecycle Management - CloudFoundry on OpenStack
Automated Lifecycle Management - CloudFoundry on OpenStackAutomated Lifecycle Management - CloudFoundry on OpenStack
Automated Lifecycle Management - CloudFoundry on OpenStack
Animesh Singh
 

Plus de Animesh Singh (20)

Machine Learning Exchange (MLX)
Machine Learning Exchange (MLX)Machine Learning Exchange (MLX)
Machine Learning Exchange (MLX)
 
KFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AIKFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AI
 
KFServing and Feast
KFServing and FeastKFServing and Feast
KFServing and Feast
 
Defend against adversarial AI using Adversarial Robustness Toolbox
Defend against adversarial AI using Adversarial Robustness Toolbox Defend against adversarial AI using Adversarial Robustness Toolbox
Defend against adversarial AI using Adversarial Robustness Toolbox
 
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
 
Trusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open SourceTrusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open Source
 
AIF360 - Trusted and Fair AI
AIF360 - Trusted and Fair AIAIF360 - Trusted and Fair AI
AIF360 - Trusted and Fair AI
 
Fabric for Deep Learning
Fabric for Deep LearningFabric for Deep Learning
Fabric for Deep Learning
 
Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!
 
How to build a Distributed Serverless Polyglot Microservices IoT Platform us...
How to build a Distributed Serverless Polyglot Microservices IoT Platform us...How to build a Distributed Serverless Polyglot Microservices IoT Platform us...
How to build a Distributed Serverless Polyglot Microservices IoT Platform us...
 
How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...
 
As a Service: Cloud Foundry on OpenStack - Lessons Learnt
As a Service: Cloud Foundry on OpenStack - Lessons LearntAs a Service: Cloud Foundry on OpenStack - Lessons Learnt
As a Service: Cloud Foundry on OpenStack - Lessons Learnt
 
Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...
Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...
Introducing Cloud Native, Event Driven, Serverless, Micrsoservices Framework ...
 
Finding and-organizing Great Cloud Foundry User Groups
Finding and-organizing Great Cloud Foundry User GroupsFinding and-organizing Great Cloud Foundry User Groups
Finding and-organizing Great Cloud Foundry User Groups
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
 
Building a PaaS Platform like Bluemix on OpenStack
Building a PaaS Platform like Bluemix on OpenStackBuilding a PaaS Platform like Bluemix on OpenStack
Building a PaaS Platform like Bluemix on OpenStack
 
Cloud foundry Docker Openstack - Leading Open Source Triumvirate
Cloud foundry Docker Openstack - Leading Open Source TriumvirateCloud foundry Docker Openstack - Leading Open Source Triumvirate
Cloud foundry Docker Openstack - Leading Open Source Triumvirate
 
Build Scalable Internet of Things Apps using Cloud Foundry, Bluemix & Cloudant
Build Scalable Internet of Things Apps using Cloud Foundry, Bluemix & CloudantBuild Scalable Internet of Things Apps using Cloud Foundry, Bluemix & Cloudant
Build Scalable Internet of Things Apps using Cloud Foundry, Bluemix & Cloudant
 
Automated Lifecycle Management - CloudFoundry on OpenStack
Automated Lifecycle Management - CloudFoundry on OpenStackAutomated Lifecycle Management - CloudFoundry on OpenStack
Automated Lifecycle Management - CloudFoundry on OpenStack
 
Docker OpenStack Cloud Foundry
Docker OpenStack Cloud FoundryDocker OpenStack Cloud Foundry
Docker OpenStack Cloud Foundry
 

Dernier

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Dernier (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

KFServing - Serverless Model Inferencing

  • 1. Jupyter Notebooks Workflow Building Pipelines Tools Serving Metadata Kale Fairing TFX KF Pipelines HP Tuning Tensorboard KFServing Seldon Core TFServing, + Training Operators Pytorch XGBoost, + Tensorflow Prometheus KFServing Animesh Singh, Tommy Li MPI MXNet
  • 2. ❑  Rollouts: Is this rollout safe? How do I roll back? Can I test a change without swapping traffic? ❑  Protocol Standards: How do I make a prediction? GRPC? HTTP? Kafka? ❑  Cost: Is the model over or under scaled? Are resources being used efficiently? ❑  Monitoring: Are the endpoints healthy? What is the performance profile and request trace? Prepared and Analyzed Data Trained Model Deployed Model Prepared Data Untrained Model ❑  Frameworks: How do I serve on Tensorflow? XGBoost? Scikit Learn? Pytorch? Custom Code? ❑  Features: How do I explain the predictions? What about detecting outliers and skew? Bias detection? Adversarial Detection? ❑  How do I wire up custom pre and post processing Production Model Serving? How hard could it be? ❑  How do I handle batch predictions? ❑  How do I leverage standardized Data Plane protocol so that I can move my model across MLServing platforms?
  • 3. ●  Seldon Core was pioneering Graph Inferencing. ●  IBM and Bloomberg were exploring serverless ML lambdas. IBM gave a talk on the ML Serving with Knative at last KubeCon in Seattle ●  Google had built a common Tensorflow HTTP API for models. ●  Microsoft Kubernetizing their Azure ML Stack Experts fragmented across industry
  • 4. ●  Kubeflow created the conditions for collaboration. ●  A promise of open code and open community. ●  Shared responsibilities and expertise across multiple companies. ●  Diverse requirements from different customer segments Putting the pieces together
  • 6. KFServing ●  Founded by Google, Seldon, IBM, Bloomberg and Microsoft ●  Part of the Kubeflow project ●  Focus on 80% use cases - single model rollout and update ●  Kfserving 1.0 goals: ○  Serverless ML Inference ○  Canary rollouts ○  Model Explanations ○  Optional Pre/Post processing
  • 8. •  Event triggered functions on Kubernetes •  Scale to and from zero •  Queue based autoscaling for GPUs and TPUs. KNative autoscaling by default provides inflight requests per pod •  Traditional CPU autoscaling if desired. Traditional scaling hard for disparate devices (GPU, CPU, TPU) Knative provides a set of building blocks that enable declarative, container-based, serverless workloads on Kubernetes. Knative Serving provides primitives for serving platforms such as: KNative IBM is 2nd largest contributor
  • 9. Connect: Traffic Control, Discovery, Load Balancing, Resiliency Observe: Metrics, Logging, Tracing Secure: Encryption (TLS), Authentication, and Authorization of service-to-service communication Control: Policy Enforcement An open service mesh platform to connect, observe, secure, and control microservices. Founded by Google, IBM and Lyft. IBM is the 2nd largest contributor Istio
  • 10. Manages the hosting aspects of your models •  InferenceService - manages the lifecycle of models •  Configuration - manages history of model deployments. Two configurations for default and canary. •  Revision - A snapshot of your model version •  Route - Endpoint and network traffic management Route Default Configuration Revision 1 Revision M 90 % KFService Canary Configuration Revision 1 Revision N 10 % KFServing: Default and Canary Configurations
  • 11. Model Servers - TensorFlow - Nvidia TRTIS - PyTorch - XGBoost - SKLearn - ONNX Components: •  - Predictor, Explainer, Transformer (pre-processor, post-processor) Storage - AWS/S3 - GCS - Azure Blob - PVC Supported Frameworks, Components and Storage Subsystems
  • 12. The InferenceService architecture consists of a static graph of components which coordinate requests for a single model. Advanced features such as Ensembling, A/B testing, and Multi-Arm- Bandits should compose InferenceServices together. Inference Service Control Plane
  • 15. API Verb Path Payload List Models GET /v1/models [model_names] Readiness GET /v1/models/<model_name> Predict POST /v1/models/<model_name>:predict Request: {instances:[]} Response: {predictions:[]} Explain POST /v1/models<model_name>:explain Request: {instances:[]} Response: {predictions:[], explanations:[]} KFServing Data Plane v1 protocol
  • 16. apiVersion: "serving.kubeflow.org/v1alpha1" kind: "InferenceService" metadata: name: "sklearn-iris" spec: default: sklearn: modelUri: "gs://kfserving-samples/models/sklearn/iris" apiVersion: "serving.kubeflow.org/v1alpha1" kind: "InferenceService" metadata: name: "flowers-sample" spec: default: tensorflow: modelUri: "gs://kfserving-samples/models/tensorflow/flowers" apiVersion: "serving.kubeflow.org/v1alpha1" kind: "InferenceService" metadata: name: ”pytorch-iris" spec: default: pytorch: modelUri: "gs://kfserving-samples/models/pytorch/iris" KFServing Examples
  • 17. apiVersion: "serving.kubeflow.org/v1alpha1" kind: "KFService" metadata: name: "my-model" spec: default: # 90% of traffic is sent to this model tensorflow: modelUri: "gs://mybucket/mymodel-2" canaryTrafficPercent: 10 canary: # 10% of traffic is sent to this model tensorflow: modelUri: "gs://mybucket/mymodel-3" apiVersion: "serving.kubeflow.org/v1alpha1" kind: "KFService" metadata: name: "my-model" spec: default: tensorflow: modelUri: "gs://mybucket/mymodel-2" # Defaults to zero, so can also be omitted or explicitly set to zero. canaryTrafficPercent: 0 canary: # Canary is created but no traffic is directly forwarded. tensorflow: modelUri: "gs://mybucket/mymodel-3" Canary Pinned Canary/Pinned Examples
  • 18. ●  Flexible, high performance serving system for TensorFlow ●  https://www.tensorflow.org/tfx/guide/serving ●  Stable and Google has been using it since 2016 ●  Saved model format and graphdef ●  Written in C++, support both REST and gRPC ●  KFServing allows you to easily spin off an Inference Service with TFServing to serve your tensorflow model on CPU or GPU with serverless features like canary rollout, autoscaling. TensorFlow Serving(TFServing)
  • 19. Inference Service with Transformer and TFServing apiVersion: serving.kubeflow.org/v1alpha2 kind: InferenceService metadata: name: bert-serving spec: default transformer: custom: container: image: bert-transformer:v1 predictor: tensorflow: storageUri: s3://examples/bert runtimeVersion: 1.14.0-gpu resources: limits: nvidia.com/gpu: 1 Pre/Post Processing Tensorflow Model Server class BertTransformer(kfserving.KFModel): def __init__(self, name): super().__init__(name) self.bert_tokenizer = BertTokenizer(vocab_file) def preprocess(self, inputs: Dict) -> Dict: encoded_features = bert_tokenizer.encode_plus( text=text_a, text_pair=text_b) return {“input_ids”: encoded_features[“input_ids”], “input_mask”: encoded_features[“attention_mask”], “segment_ids”: encoded_features[“segment_ids”], “label_ids”: 1} def postprocess(self, inputs: Dict) -> Dict: return inputs
  • 20. ●  NVIDIA’s highly-optimized model runtime on GPUs ●  https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs ●  Supports model repository, versioning ●  Dynamic batching ●  Concurrent model execution ●  Supports TensorFlow, PyTorch, ONNX models ●  Written in C++, support both REST and gRPC ●  TensorRT Optimizer can further bring down the BERT inference latency NVIDIA Triton Inference Server
  • 21. Inference Service with Triton Inference Service apiVersion: serving.kubeflow.org/v1alpha2 kind: InferenceService metadata: name: bert-serving spec: default transformer: custom: container: image: bert-transformer:v1 env: name: STORAGE_URI value: s3://examples/bert_transformer predictor: tensorrt: storageUri: s3://examples/bert runtimeVersion: r20.02 resources: limits: nvidia.com/gpu: 1 Triton Inference Server infer_ctx = InferContext(url, protocol, model_name, model_version) unique_ids = np.int32([1]) segment_ids = features["segment_ids"] input_ids = features["input_ids"] input_mask = features["input_mask"] result = infer_ctx.run({ 'unique_ids' : (unique_ids,), 'segment_ids' : (segment_ids,), 'input_ids' : (input_ids,), 'input_mask' : (input_mask,) }, { 'end_logits' : InferContext.ResultFormat.RAW, 'start_logits' : InferContext.ResultFormat.RAW }, batch_size) Pre/Post Processing
  • 22. PyTorch Model Server ●  PyTorch model server maintained by KFServing ●  https://github.com/kubeflow/kfserving/tree/master/python/pytorchserver ●  Implemented in Python with Tornado server ●  Loads model state dict and model class python file ●  GPU Inference is supported in KFServing 0.3 release ●  Alternatively you can export PyTorch model in ONNX format and serve on TensorRT Inference Server or ONNX Runtime Server.
  • 23. ONNX Runtime Server ●  ONNX Runtime is a performance-focused inference engine for ONNX models ●  https://github.com/microsoft/onnxruntime ●  Supports Tensorflow, Pytorch models which can be converted to ONNX ●  Written in C++, support both REST and gRPC ●  ONNX Runtime optimized BERT transformer network to further bring down the latency https://github.com/onnx/tutorials/blob/master/tutorials/Inference-TensorFlow- Bert-Model-for-High-Performance-in-ONNX-Runtime.ipynb
  • 24. Inference Service with PyTorch/ONNX Runtime apiVersion: serving.kubeflow.org/v1alpha2 kind: InferenceService metadata: name: bert-serving spec: default transformer: custom: container: image: bert-transformer:v1 env: name: STORAGE_URI value: s3://examples/bert_transformer predictor: pytorch: storageUri: s3://examples/bert runtimeVersion: v0.3.0-gpu resources: limits: nvidia.com/gpu: 1 Pytorch Model Server apiVersion: serving.kubeflow.org/v1alpha2 kind: InferenceService metadata: name: bert-serving-onnx spec: default transformer: custom: container: image: bert-transformer:v1 env: name: STORAGE_URI value: s3://examples/bert_transformer predictor: onnx: storageUri: s3://examples/bert runtimeVersion: 0.5.1 Pre/Post Processing ONNX Runtime Server Pre/Post Processing
  • 25. GPU Autoscaling - KNative solution Ingress Activator (buffers requests) Autoscaler Queue Proxy Model server when scale == 0 or handling burst capacity when scale > 0 metrics ●  Scale based on # in-flight requests against expected concurrency ●  Simple solution for heterogeneous ML inference autoscaling scale metrics 0...N Replicas API Requests
  • 27. But the Data Scientist Sees... ●  A pointer to a Serialized Model File ●  9 lines of YAML ●  A live model at an HTTP endpoint = http ●  Scale to Zero ●  GPU Autoscaling ●  Safe Rollouts ●  Optimized Serving Containers ●  Network Policy and Auth ●  HTTP APIs (gRPC soon) ●  Tracing ●  Metrics apiVersion: "serving.kubeflow.org/v1alpha2" kind: "InferenceService" metadata: name: "flowers-sample" spec: default: predictor: tensorflow: storageUri: "gs://kfserving-samples/models/tensorflow/flowers" Production users include: Bloomberg
  • 28. Summary ●  With KFServing user can easily deploy the service for inference on GPU with performant industry leading model servers as well as benefiting from all the serverless features. ●  Autoscale the inference workload based on your QPS, much better resource utilization. ●  gRPC can provide better performance over REST which allows multiplexing and protobuf is a efficient and packed format than JSON. ●  Transformer can work seamlessly with different model servers thanks to KFServing’s data plane standardization. ●  GPUs benefit a lot from batching the requests.
  • 29. Model Serving is accomplished. Can the predictions be trusted? Prepared and Analyzed Data Trained Model Deployed Model Prepared Data Untrained Model Can the model explain its predictions? Are there concept drifts? Is there an outlier? Is the model vulnerable to adversarial attacks?
  • 32. Payload Logging Why: ●  Capture payloads for analysis and future retraining of the model ●  Perform offline processing of the requests and responses KfServing Implementation (alpha): ●  Add to any InferenceService Endpoint: Predictor, Explainer, Transformer ●  Log Requests, Responses or Both from the Endpoint ●  Simple specify a URL to send the payloads ●  URL will receive CloudEvents POST /event HTTP/1.0 Host: example.com Content-Type: application/json ce-specversion: 1.0 ce-type: repo.newItem ce-source: http://bigco.com/repo ce-id: 610b6dd4-c85d-417b-b58f-3771e532 <payload>
  • 34. Payload Logging Architecture Examples Model InferenceService logger Broker Trigger Outlier Detector Alerting API Http kafka Bridge Kafka Cluster Serving
  • 36. KfServing Explanations apiVersion: "serving.kubeflow.org/v1alpha2" kind: "InferenceService" metadata: name: "income" spec: default: predictor: sklearn: storageUri: "gs://seldon-models/sklearn/income/model" explainer: alibi: type: AnchorTabular storageUri: "gs://seldon-models/sklearn/income/ explainer" apiVersion: "serving.kubeflow.org/v1alpha2" kind: "InferenceService" metadata: name: "moviesentiment" spec: default: predictor: sklearn: storageUri: "gs://seldon-models/sklearn/moviesentiment" explainer: alibi: type: AnchorText
  • 37. https://github.com/SeldonIO/alibi State of the art implementations: Seldon Alibi:Explain Janis KlaiseGiovanni Vacanti Alexandru CocaArnaud Van Looveren •  Anchors •  Counterfactuals •  Contrastive explanations •  Trust scores
  • 38. 38 AIX360 toolkit is an open-source library to help explain AI and machine learning models and their predictions. This includes three classes of algorithms: local post-hoc, global post-hoc, and directly interpretable explainers for models that use image, text, and structured/tabular data. The AI Explainability360 Python package includes a comprehensive set of explainers, both at global and local level. Toolbox Local post-hoc Global post-hoc Directly interpretable http://aix360.mybluemix.net AI Explainability 360 ↳ (AIX360) https://github.com/IBM/AIX360 Explanations: Resources AIX360
• 41. ML Inference Analysis
Don’t trust predictions on instances outside of the training distribution!
•  Outlier Detection
•  Adversarial Detection
•  Concept Drift
• 42. Outlier Detection
Don’t trust predictions on instances outside of the training distribution! → Outlier Detection
Detector types:
-  stateful online vs. pretrained offline
-  feature vs. instance level detectors
Data types:
-  tabular, images & time series
Outlier types:
-  global, contextual & collective outliers
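As a concrete illustration, a hedged sketch of a stateful, feature-level detector using Seldon's alibi-detect (API details may differ across versions; the data here is synthetic):

import numpy as np
from alibi_detect.od import Mahalanobis

X_ref = np.random.randn(1000, 4)              # stand-in for training data
od = Mahalanobis()                            # online Mahalanobis-distance detector
od.infer_threshold(X_ref, threshold_perc=95)  # calibrate the outlier threshold

X_new = np.vstack([np.random.randn(5, 4), 10 + np.random.randn(5, 4)])
preds = od.predict(X_new)
print(preds["data"]["is_outlier"])            # 1 flags an outlying instance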
• 43. Adversarial Detection
Don’t trust predictions on instances outside of the training distribution! → Adversarial Detection
-  Outliers w.r.t. the model prediction
-  Detect small input changes with a big impact on predictions!
• 44. Concept Drift
Production data distribution != training distribution? → Concept drift! Retrain!
Need to track the right distributions:
-  feature vs. instance level
-  continuous vs. discrete
-  online vs. offline training data
-  track the streaming number of outliers
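For instance, a hedged sketch of feature-level drift detection with alibi-detect's Kolmogorov-Smirnov detector (assuming a recent alibi-detect API; the data is synthetic):

import numpy as np
from alibi_detect.cd import KSDrift

X_ref = np.random.randn(1000, 10)         # reference (training) distribution
cd = KSDrift(X_ref, p_val=0.05)           # feature-wise two-sample K-S tests

X_prod = np.random.randn(200, 10) + 1.0   # shifted production batch
preds = cd.predict(X_prod)
print(preds["data"]["is_drift"])          # 1 -> distributions differ; consider retraining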
• 45. Outlier Detection on CIFAR10
[Architecture diagram: a CIFAR10 Model InferenceService (Serving) with a logger publishes request payloads to a Knative Broker; a Trigger routes them to a CIFAR10 Outlier Detector, whose results are forwarded to a Message Dumper]
• 47. Adversarial Robustness 360 Toolbox (ART)
ART is a library dedicated to adversarial machine learning. Its purpose is to allow rapid crafting and analysis of attack, defense and detection methods for machine learning models. Applicable domains include finance, self-driving vehicles, etc. The Adversarial Robustness Toolbox provides implementations of many state-of-the-art methods for attacking and defending classifiers.
Toolbox (attacks, defenses, and metrics): evasion attacks, defense methods, detection methods, robustness metrics
https://github.com/IBM/adversarial-robustness-toolbox
https://art-demo.mybluemix.net/
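To make the attack side concrete, a hedged sketch of crafting an evasion attack with ART (assuming an ART 1.x API; the toy model and data are illustrative, not a recommended setup):

import numpy as np
from sklearn.linear_model import LogisticRegression
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import FastGradientMethod

X = np.random.randn(200, 4)
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Wrap the scikit-learn model so ART can compute loss gradients
classifier = SklearnClassifier(model=model)
attack = FastGradientMethod(estimator=classifier, eps=0.5)
X_adv = attack.generate(x=X)  # small perturbations aimed at flipping predictions
print("fraction flipped:", (model.predict(X) != model.predict(X_adv)).mean())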
• 48. Open Source Dojo: KFServing – Demo One
https://www.youtube.com/watch?v=xg5ar6vSAXY&t
• 50. Demo Scenario: Intelligent Refrigeration Unit Monitoring
Problem statement
•  The failure of a refrigeration unit is a common problem, with failures leading to spoiled goods that become a cost to the business. Performing manual checks on the units is time-consuming and costly, and is typically just a visual check that the unit is running and the temperature is in range.
•  Being able to automatically monitor the operation of the units, and to predict the likelihood of a unit failing in advance of an actual failure, can save costs by reducing manual workload and, more significantly, by reducing the amount of spoiled goods, since preventative action can be taken before the unit fails.
•  Such an intelligent monitoring solution has value across industries, from supermarkets to healthcare, but perhaps one of the most significant opportunities is in container shipping: monitoring refrigerated containers through their life cycle of being loaded at a supplier, transported to the dock, loaded onto a container ship, through the sea voyage, then unloaded and transported to the importer.
•  Predicting a likely failure before loading a container onto a ship enables preventive maintenance before loading; this reduces the risk of goods being spoiled at sea, where manual actions are limited. Such “intelligently” monitored containers incur lower insurance premiums for the voyage because the risk of loss of goods is reduced.
Proposed intelligent refrigeration monitoring system
•  The IT team is proposing the development of an intelligent application for monitoring and management of the refrigeration units in the containers. To accelerate development, the team proposes a cloud-native approach, building on a Kubernetes (OpenShift) based private cloud. The development team will follow the cloud-native architecture and application patterns for intelligent applications, which include:
1.  Device data collection
2.  Real-time intelligence with machine learning
3.  Event-driven applications with event-driven microservices
• 51. KFServing Pipeline – Demo Two
MNIST model end to end using Kubeflow Pipelines with Tekton. The notebook demonstrates how to compile and execute an end-to-end machine learning workflow that uses Katib, TFJob, KFServing, and a Tekton pipeline. The pipeline contains five steps: it finds the best hyperparameters using Katib, creates a PVC for storing models, processes the hyperparameter results, trains the model on TFJob with the best hyperparameters using more iterations, and finally serves the model using KFServing.
https://github.com/kubeflow/kfp-tekton/tree/master/samples/e2e-mnist
• 52. KFServing – Existing Features
❑  Crowd-sourced capabilities: contributions by AWS, Bloomberg, Google, Seldon, IBM, NVIDIA and others
❑  Multiple runtimes pre-integrated: TFServing, NVIDIA Triton (GPU optimization), ONNX Runtime, SKLearn, PyTorch, XGBoost, and custom models
❑  Serverless ML inference and autoscaling: scale to zero (with no incoming traffic) and request-queue based autoscaling
❑  Canary and pinned rollouts: control traffic percentage and direction, or pin a rollout
❑  Pluggable pre-processor/post-processor via Transformer: plug in pre-/post-processing implementations and control routing and placement (e.g. pre-processor on CPU, predictor on GPU)
❑  Pluggable analysis algorithms: explainability, drift detection, anomaly detection, adversarial detection (contributed by Seldon), enabled by payload logging (built on the CloudEvents standardized eventing protocol)
❑  Batch predictions: batch prediction support for ML frameworks (TensorFlow, PyTorch, ...)
❑  Integration with the existing monitoring stack around the Knative/Istio ecosystem: Kiali (service placements, traffic and graphs), Jaeger (request tracing), Grafana/Prometheus plug-ins for Knative
❑  Multiple clients: kubectl, Python SDK, Kubeflow Pipelines SDK
❑  Standardized Data Plane V2 protocol for prediction, explainability and more: already implemented by NVIDIA Triton
• 53. KFServing – Upcoming Features
❑  MMS: multi-model serving for serving multiple models per custom KFService instance
❑  More Data Plane V2 API compliant servers: SKLearn, XGBoost, PyTorch…
❑  Multi-model graphs and pipelines: support for chaining multiple models together in a pipeline
❑  PyTorch support via AWS TorchServe
❑  gRPC support for all model servers
❑  Support for multi-armed bandits
❑  Integration with IBM AIX360 for explainability, AIF360 for bias detection and ART for adversarial detection
• 54. Open Source Projects
●  ML Inference
○  KFServing: https://github.com/kubeflow/kfserving
○  Seldon Core: https://github.com/SeldonIO/seldon-core
●  Model Explanations
○  Seldon Alibi: https://github.com/seldonio/alibi
○  IBM AI Explainability 360: https://github.com/IBM/AIX360
●  Outlier and Adversarial Detection and Concept Drift
○  Seldon Alibi-detect: https://github.com/seldonio/alibi-detect
●  Adversarial Attack, Detection and Defense
○  IBM Adversarial Robustness 360: https://github.com/IBM/adversarial-robustness-toolbox