SlideShare une entreprise Scribd logo
1  sur  36
Télécharger pour lire hors ligne
Revolutionary
container based
hybrid cloud
solution for ML
Making ML simple portable and scalable
NextGenML
Platform
Radu Moldovan
Europe Lead Big Data & Machine Learning
Ness Machine Learning Platform (MLP)
Ø Robust pipeline container oriented to:
ü to train a model
ü provide metrics and generic evaluation
Ø Measure the performance of the trained model
Ø Deploy the model as a service
… of course all integrated into a Kubernetes container solution
Ness Machine Learning Platform (MLP)
Ness MLP act as a hub around which all data
science work takes place at enterprise scale level.
Ness' data science platform puts the
entire data modelling process in the hands of data
science teams so they can focus on gaining insights
from data and communicating them to key stakeholders in
your business.
• Infrastructure Agnostic (Docker)
•was deployed on prem but also on cloud (Azure Kubernetes Service)
•still easy to integrate with cloud services (Databricks – Spark)
•used cutting edge devOps technologies in container area (Dockerize, K8s)
• Azure heterogenous cluster (CPU&GPU) for dynamic GPU allocation for ML
trainig
In client’s production we run in parallel more than 20 ML models for organ-segmentation, each one took around 20-24
hours. The GPU nodes were allocated on the fly minimaxing the time and costs.
Not vendor locked, entire technology stack is opensource
NextGenML
Platform
• ML pipeline (Kubeflow)
Build, deploy and manage multi-step ML workflows based on Docker containers
• Pipeline creation is done programmatically using Python DSL (low level control)
• Real time perception of progress, execution, logs, in/outs of parameters
• Unified all the logs from pipeline containers and also system containers so it’s easy for all
the users(data engineers/scientist) to check progress
• Easy tool to compare in parallel different parameters and hyper-parameters for scientific
experiments.
• Also keep track of entire history of all runs and re-runs
NextGenML
Platform
• Asset Catalog
Build as ML/AI collaboration (Records, Models, Experiments, Pipelines, Deployments)
• built by Ness us from scratch as single source of truth built around ML concepts
• is the centric part for data scientist, the reference data integration also for data engineers
• data traceability UI/UX designed to easily drill down the data and ML execution
• REST API and SDK(Python) to manage records, experiment, pipelines during ML execution
• “experiment metadata” provided through UI => no need to ssh the machine for info
NextGenML
Platform
• Asset Catalog
Build as ML/AI collaboration (Records, Models, Experiments, Pipelines, Deployments)
• built by Ness us from scratch as single source of truth built around ML concepts
• is the centric part for data scientist, the reference data integration also for data engineers
• data traceability UI/UX designed to easily drill down the data and ML execution
• REST API and SDK(Python) to managerecords, experiment, pipelines during ML execution
• “experiment metadata” provided through UI => no need to ssh the machine for info
NextGenML
Platform
• ETL (Spark)
Build for easy integration with the ML platform
• full integration with Asset Catalog
• created own Spark adapters, Spark Operator, to deploy and run Spark on
Kubernetes
• created own Spark Python SDK to create Databricks Spark Cluster, deploy
and execute spark job on Databricks
• integration of Azure Data Factory with Databricks and Asset Catalog to
support Azure Service Bus(distributed queue)
NextGenML
Platform
• Automation (CI/CD)
Created for an easy code and flow integration with user’s demands
• smooth propagation of released code
• deploy easily “exact the same Docker image” in different environments like
development, integration system, production
• built on top of Jenkins and devOps best practices
NextGenML
Platform
Developing a cognitive interface between engineers and technology – data catalog
Data
Engineers
devOps
Data
Scientist
Data
Scientist
Data
Scientist
devOps
Data
Scientist
- unify the work for data scientists across different Line of Business
- to automate most of their manual process for data processing
- data discovery and tracing
- normalize different computer data
- visual navigation on experiments & deployments
Data
Engineers
NextGenML
Platform
import kfp.dsl as dsl
from kubernetes import client as k8s_client
training_op = dsl.ContainerOp(
name='training',
image=AZURE_ML___GPU_IMAGE,
command=['bash', '-c'],
arguments=[...],
file_outputs={'output': '....'})
training_op.add_pod_annotation('....', 'false')
training_op.set_gpu_limit("1")
Data
Engineers
devOps
Data
Scientist
CPU Cluster GPU Cluster Extended
TensorFlow
Keras
GPU
User implementation
System automation
TensorFlow
Keras
GPU
NextGenML
Platform
Data Ingestion NextGenML
Platform
Data PreparationTransformation NextGenML
Platform
Training Model
NextGenML
Platform
NextGenML
Platform
DataAsset Catalog
Data Catalog- Record NextGenML
Platform
Data Catalog- Experiment NextGenML
Platform
Data Catalog- Model NextGenML
Platform
Data Catalog- Record Set NextGenML
Platform
Data Catalog- Analytics NextGenML
Platform
Azure
Container
Registry
Azure DevOps
Team City
git
build
Azure
Kubernetes
Cluster
MASTER
git
DEV
PRODUCTION
Azure
Kubernetes
Cluster
DEV & UAT
Data
Engineers
devOps
Testing
Data
Scientist
Docker
Helm
kSonnet
User initiated
System initiated
Continuous Integration (CI) / Continuous Deployment (CD)
CI
CD CPU
GPU
CPU
GPU
2 x
2 x
NextGenML
Platform
Asset
Catalog
Gitlab/github/AD
dex
http://assetcatalog.rotsrlxdv29.ness.com/
ingress
1
2
3
4 (JWT –Authentication
& Authorisation)
Open ID Connect(OIDC), (OAuth2)
Identity provider
Is JWT signature
(encoded) valid? – see next page
Is JWT expired – see next page
Is User Authorized
Private Key Public Key
(PAYLOAD DATA: 'iat': 1565182393)
5
6
7
NextGenML
Platform
KUBEFLOW
K8 -CPU
BOARDING
K8-GPU
APPs
DATA
CATALOG
PORTAL
SYNC
SERVICE
Balancer
Active
Directory
Azure Storage
Datalake
Metrics
Logs
BlobFiles
Public
IP
SERVING
MODEL
Azure Vnet
MONITORING
TRAINING
Params
State Report
MODEL
Metrics
TRAINING
ND6s instance
NextGenML
Platform
BOARDING
NextGenML
Platform
VISUALISATION
NextGenML
Platform
TensorFlow Serving REST API (https://www.tensorflow.org/tfx/serving/api_rest)
POST http://host:port/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]:(classify|regress)
#preparation payload
encoded_image = base64.b64encode(THE_IMAGE)
payload_json{
“image”:encoded_image,
….
}
# post request to Tensorflow Serving
json_response = requests.post(url=rest_url, json= payload_json{})
# decode the response and extract marked organ
response = extract_from_json_THE_2D_PREDICTION (numpy)
organ_countur = png.Reader(bytes=y.tostring())
SERVING
NextGenML
Platform
VPN + KeyVault
AZ DevOps Pipelines (Brigade)
Kubernetes Services (AKS)
HDInsight (Kafka & Spark)
Storage Account
DataLake Storage Gen 1 & 2
Data Factory (ADF)
Function App(lambda)
Batch Account (VM Pools)
Databricks + external metastore
Service Bus (ASB)
Cache for Redis
VM ( GPU )
CI / CD
Kibana Logstash Filebit
GrayLog
Jenkins
Nexus
SonarQube**
Machine Learning
Heterogeneous cluster (CPU+GPU)
Kubeflow(on K8s)
Tensorboard(Keras)- training/boarding/serving*
Seldon*
K8s
Gangway+Dex(GitLab/AD)+RBAC
Ingress(Nginx)/Istio +Services
KubeFlow
Spark
Asset Catalog
GlusterFS
Graphana + Prometeus
Azure Service
Hybrid Cloud Topology NextGenML
Platform
NextGenML
Platform
NextGenML
Platform
NextGenML
Platform
Hybrid Cloud – ETL solution
Azure Data Factory NextGenML
Platform
ABDOMEN – markers
"Small Intestine", [200, 200, 100],
"Large Intestine", [255, 255, 0],
"Spinal Canal", [150, 100, 50],
"Gall Bladder", [40, 100, 240]
"Spleen", [0, 200, 120],
"Liver", [30, 200, 50],
"Stomach", [120, 80, 10],
"Pancreas", [200, 70, 30],
"Duodenum”, [20, 100, 50],.
NextGenML
Platform
• Stack Technology
Platform
Kubernetes (onPrem) + Docker
Azure Kubernetes Cluster (AKS)
Nexus
Azure Container Registry(ACR)
GlusterFS
Workflow
Argo->Kubeflow
DevOps
Helm
kSonnet
Kustomize
Azure DevOps
Code Management & CI/CD
Git
TeamCity
SonarQube
Jenkins
Security
MS Active Directory
Azure VPN
Dex (K8s) integrated with GitLab
Machine Learning
TensorFlow (model training, boarding, serving)
Keras
Seldon
Storage (Azure)
Storage Gen1 & Gen2
Data Lake
File Storage
ETL (Azure)
Databricks
Data Factory (ADF)
HDInsight (Kafka and Spark)
Service Bus (ASB)
Lambda functions & VMs
Spark on K8
Cache for Redis
Monitoring and Logging
Graphana
Prometeus
GrayLog
www.ness.comNess Timisoara
Thank you!
Radu.Moldovan@ness.com
Skype: r.moldovan
Linkedin: linkedin.com/in/raduadrianmoldovan
NextGenML
Platform

Contenu connexe

Tendances

How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML PlatformHow to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
Databricks
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
Data Science Milan
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
Databricks
 

Tendances (20)

Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep... Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & KubeflowMLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
 
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML PlatformHow to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
 
Model versioning done right: A ModelDB 2.0 Walkthrough
Model versioning done right: A ModelDB 2.0 WalkthroughModel versioning done right: A ModelDB 2.0 Walkthrough
Model versioning done right: A ModelDB 2.0 Walkthrough
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
 
Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Production
 
Multi runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningMulti runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learning
 
Simplifying AI integration on Apache Spark
Simplifying AI integration on Apache SparkSimplifying AI integration on Apache Spark
Simplifying AI integration on Apache Spark
 
Infrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
Infrastructure Solutions for Deploying AI/ML/DL Workloads at ScaleInfrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
Infrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
Overcoming Regulatory & Compliance Hurdles with Hybrid Cloud EKS and Weave Gi...
Overcoming Regulatory & Compliance Hurdles with Hybrid Cloud EKS and Weave Gi...Overcoming Regulatory & Compliance Hurdles with Hybrid Cloud EKS and Weave Gi...
Overcoming Regulatory & Compliance Hurdles with Hybrid Cloud EKS and Weave Gi...
 
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
 
Natalie Godec - AirFlow and GCP: tomorrow's health service data platform
Natalie Godec - AirFlow and GCP: tomorrow's health service data platformNatalie Godec - AirFlow and GCP: tomorrow's health service data platform
Natalie Godec - AirFlow and GCP: tomorrow's health service data platform
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
 
Whats new in_mlflow
Whats new in_mlflowWhats new in_mlflow
Whats new in_mlflow
 
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
 

Similaire à NextGenML

NLA Cloud Platform™ Datasheet | Infovista
NLA Cloud Platform™ Datasheet | InfovistaNLA Cloud Platform™ Datasheet | Infovista
NLA Cloud Platform™ Datasheet | Infovista
Infovista
 

Similaire à NextGenML (20)

ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
 
Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes
 
Bandwidth: Use Cases for Elastic Cloud on Kubernetes
Bandwidth: Use Cases for Elastic Cloud on Kubernetes Bandwidth: Use Cases for Elastic Cloud on Kubernetes
Bandwidth: Use Cases for Elastic Cloud on Kubernetes
 
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
Containerized architectures for deep learning
Containerized architectures for deep learningContainerized architectures for deep learning
Containerized architectures for deep learning
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using Kubernetes
 
1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 
Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019
 
Red hat's updates on the cloud & infrastructure strategy
Red hat's updates on the cloud & infrastructure strategyRed hat's updates on the cloud & infrastructure strategy
Red hat's updates on the cloud & infrastructure strategy
 
NLA Cloud Platform™ Datasheet | Infovista
NLA Cloud Platform™ Datasheet | InfovistaNLA Cloud Platform™ Datasheet | Infovista
NLA Cloud Platform™ Datasheet | Infovista
 
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, SmileOCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
 
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
 
Scalable Open-Source IoT Solutions on Microsoft Azure
Scalable Open-Source IoT Solutions on Microsoft AzureScalable Open-Source IoT Solutions on Microsoft Azure
Scalable Open-Source IoT Solutions on Microsoft Azure
 
OS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLOS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of ML
 
Open shift and docker - october,2014
Open shift and docker - october,2014Open shift and docker - october,2014
Open shift and docker - october,2014
 
AnilKumarT_Resume_latest
AnilKumarT_Resume_latestAnilKumarT_Resume_latest
AnilKumarT_Resume_latest
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 

NextGenML

  • 1. Revolutionary container based hybrid cloud solution for ML Making ML simple portable and scalable NextGenML Platform Radu Moldovan Europe Lead Big Data & Machine Learning
  • 2. Ness Machine Learning Platform (MLP) Ø Robust pipeline container oriented to: ü to train a model ü provide metrics and generic evaluation Ø Measure the performance of the trained model Ø Deploy the model as a service … of course all integrated into a Kubernetes container solution
  • 3. Ness Machine Learning Platform (MLP) Ness MLP act as a hub around which all data science work takes place at enterprise scale level. Ness' data science platform puts the entire data modelling process in the hands of data science teams so they can focus on gaining insights from data and communicating them to key stakeholders in your business.
  • 4. • Infrastructure Agnostic (Docker) •was deployed on prem but also on cloud (Azure Kubernetes Service) •still easy to integrate with cloud services (Databricks – Spark) •used cutting edge devOps technologies in container area (Dockerize, K8s) • Azure heterogenous cluster (CPU&GPU) for dynamic GPU allocation for ML trainig In client’s production we run in parallel more than 20 ML models for organ-segmentation, each one took around 20-24 hours. The GPU nodes were allocated on the fly minimaxing the time and costs. Not vendor locked, entire technology stack is opensource NextGenML Platform
  • 5. • ML pipeline (Kubeflow) Build, deploy and manage multi-step ML workflows based on Docker containers • Pipeline creation is done programmatically using Python DSL (low level control) • Real time perception of progress, execution, logs, in/outs of parameters • Unified all the logs from pipeline containers and also system containers so it’s easy for all the users(data engineers/scientist) to check progress • Easy tool to compare in parallel different parameters and hyper-parameters for scientific experiments. • Also keep track of entire history of all runs and re-runs NextGenML Platform
  • 6. • Asset Catalog Build as ML/AI collaboration (Records, Models, Experiments, Pipelines, Deployments) • built by Ness us from scratch as single source of truth built around ML concepts • is the centric part for data scientist, the reference data integration also for data engineers • data traceability UI/UX designed to easily drill down the data and ML execution • REST API and SDK(Python) to manage records, experiment, pipelines during ML execution • “experiment metadata” provided through UI => no need to ssh the machine for info NextGenML Platform
  • 7. • Asset Catalog Build as ML/AI collaboration (Records, Models, Experiments, Pipelines, Deployments) • built by Ness us from scratch as single source of truth built around ML concepts • is the centric part for data scientist, the reference data integration also for data engineers • data traceability UI/UX designed to easily drill down the data and ML execution • REST API and SDK(Python) to managerecords, experiment, pipelines during ML execution • “experiment metadata” provided through UI => no need to ssh the machine for info NextGenML Platform
  • 8. • ETL (Spark) Build for easy integration with the ML platform • full integration with Asset Catalog • created own Spark adapters, Spark Operator, to deploy and run Spark on Kubernetes • created own Spark Python SDK to create Databricks Spark Cluster, deploy and execute spark job on Databricks • integration of Azure Data Factory with Databricks and Asset Catalog to support Azure Service Bus(distributed queue) NextGenML Platform
  • 9. • Automation (CI/CD) Created for an easy code and flow integration with user’s demands • smooth propagation of released code • deploy easily “exact the same Docker image” in different environments like development, integration system, production • built on top of Jenkins and devOps best practices NextGenML Platform
  • 10. Developing a cognitive interface between engineers and technology – data catalog Data Engineers devOps Data Scientist Data Scientist Data Scientist devOps Data Scientist - unify the work for data scientists across different Line of Business - to automate most of their manual process for data processing - data discovery and tracing - normalize different computer data - visual navigation on experiments & deployments Data Engineers NextGenML Platform
  • 11. import kfp.dsl as dsl from kubernetes import client as k8s_client training_op = dsl.ContainerOp( name='training', image=AZURE_ML___GPU_IMAGE, command=['bash', '-c'], arguments=[...], file_outputs={'output': '....'}) training_op.add_pod_annotation('....', 'false') training_op.set_gpu_limit("1") Data Engineers devOps Data Scientist CPU Cluster GPU Cluster Extended TensorFlow Keras GPU User implementation System automation TensorFlow Keras GPU NextGenML Platform
  • 16. Data Catalog- Record NextGenML Platform
  • 17. Data Catalog- Experiment NextGenML Platform
  • 18. Data Catalog- Model NextGenML Platform
  • 19. Data Catalog- Record Set NextGenML Platform
  • 20. Data Catalog- Analytics NextGenML Platform
  • 21. Azure Container Registry Azure DevOps Team City git build Azure Kubernetes Cluster MASTER git DEV PRODUCTION Azure Kubernetes Cluster DEV & UAT Data Engineers devOps Testing Data Scientist Docker Helm kSonnet User initiated System initiated Continuous Integration (CI) / Continuous Deployment (CD) CI CD CPU GPU CPU GPU 2 x 2 x NextGenML Platform
  • 22. Asset Catalog Gitlab/github/AD dex http://assetcatalog.rotsrlxdv29.ness.com/ ingress 1 2 3 4 (JWT –Authentication & Authorisation) Open ID Connect(OIDC), (OAuth2) Identity provider Is JWT signature (encoded) valid? – see next page Is JWT expired – see next page Is User Authorized Private Key Public Key (PAYLOAD DATA: 'iat': 1565182393) 5 6 7 NextGenML Platform
  • 27. TensorFlow Serving REST API (https://www.tensorflow.org/tfx/serving/api_rest) POST http://host:port/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]:(classify|regress) #preparation payload encoded_image = base64.b64encode(THE_IMAGE) payload_json{ “image”:encoded_image, …. } # post request to Tensorflow Serving json_response = requests.post(url=rest_url, json= payload_json{}) # decode the response and extract marked organ response = extract_from_json_THE_2D_PREDICTION (numpy) organ_countur = png.Reader(bytes=y.tostring()) SERVING NextGenML Platform
  • 28. VPN + KeyVault AZ DevOps Pipelines (Brigade) Kubernetes Services (AKS) HDInsight (Kafka & Spark) Storage Account DataLake Storage Gen 1 & 2 Data Factory (ADF) Function App(lambda) Batch Account (VM Pools) Databricks + external metastore Service Bus (ASB) Cache for Redis VM ( GPU ) CI / CD Kibana Logstash Filebit GrayLog Jenkins Nexus SonarQube** Machine Learning Heterogeneous cluster (CPU+GPU) Kubeflow(on K8s) Tensorboard(Keras)- training/boarding/serving* Seldon* K8s Gangway+Dex(GitLab/AD)+RBAC Ingress(Nginx)/Istio +Services KubeFlow Spark Asset Catalog GlusterFS Graphana + Prometeus Azure Service Hybrid Cloud Topology NextGenML Platform
  • 32. Azure Data Factory NextGenML Platform
  • 33. ABDOMEN – markers "Small Intestine", [200, 200, 100], "Large Intestine", [255, 255, 0], "Spinal Canal", [150, 100, 50], "Gall Bladder", [40, 100, 240] "Spleen", [0, 200, 120], "Liver", [30, 200, 50], "Stomach", [120, 80, 10], "Pancreas", [200, 70, 30], "Duodenum”, [20, 100, 50],. NextGenML Platform
  • 34.
  • 35. • Stack Technology Platform Kubernetes (onPrem) + Docker Azure Kubernetes Cluster (AKS) Nexus Azure Container Registry(ACR) GlusterFS Workflow Argo->Kubeflow DevOps Helm kSonnet Kustomize Azure DevOps Code Management & CI/CD Git TeamCity SonarQube Jenkins Security MS Active Directory Azure VPN Dex (K8s) integrated with GitLab Machine Learning TensorFlow (model training, boarding, serving) Keras Seldon Storage (Azure) Storage Gen1 & Gen2 Data Lake File Storage ETL (Azure) Databricks Data Factory (ADF) HDInsight (Kafka and Spark) Service Bus (ASB) Lambda functions & VMs Spark on K8 Cache for Redis Monitoring and Logging Graphana Prometeus GrayLog
  • 36. www.ness.comNess Timisoara Thank you! Radu.Moldovan@ness.com Skype: r.moldovan Linkedin: linkedin.com/in/raduadrianmoldovan NextGenML Platform