Dr. Nadine Schöne is a Senior Solutions Architect at Dataiku in Berlin. In this role, she covers all aspects of the data value chain for all users – including integration of data sources, ETL, collaboration, statistics and modelling, as well as operationalization, monitoring, automation and security in production. She regularly speaks at conferences, hosts webinars and writes articles.
Speech Overview:
How can you get the most out of your data – while staying flexible in your choice of infrastructure and without having to integrate a multitude of tools for the different personas involved? Maximizing the value you get out of your data is a necessity today, and looking at the whole picture, combined with careful planning, is the key to success. We will look at the complete data value chain from end to end: from data stores, collaboration features, data preparation, visualization and automation capabilities, and external compute to scheduling, operationalization, monitoring and security.
3. ABOUT US
Gartner Leader
400+ Employees
30K+ Users
300+ Clients
#1 Insurance Brand
#1 Pharma Brand
#1 US Construction Company
#1 Financial Information Company
#1 Flash Sales Company
#1 Car Sharing Company
#1 Parking Device Company
#1 Cosmetics Company
#3 CPG Company
Funded By
6. The data value chain
[Diagram: the data value chain – from DATA through SCIENCE to DECISIONS – built on people, systems, automation, data preparation, analytics, data quality, machine learning, metrics and statistics]
7. Data
● data access
integration, security incl. impersonation
● data quality
● data preparation
filter, join, enrich, prepare, formats…
● changes in input data sets
● changes in data quality
● KPIs / metrics
● basic statistics
● dashboards
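The data-quality checks, KPIs and basic statistics listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function and column names are ours, not part of any Dataiku API:

```python
import statistics

def data_quality_report(rows, columns):
    """Simple per-column quality metrics: missing-value rate and,
    for numeric columns, mean and standard deviation."""
    report = {}
    for col in columns:
        values = [r.get(col) for r in rows]
        missing = sum(v in (None, "") for v in values)
        numeric = [v for v in values if isinstance(v, (int, float))]
        report[col] = {
            "missing_rate": missing / len(values),
            "mean": statistics.mean(numeric) if numeric else None,
            "stdev": statistics.stdev(numeric) if len(numeric) > 1 else None,
        }
    return report

# Toy input: one missing age, one empty city string
rows = [
    {"age": 34, "city": "Berlin"},
    {"age": None, "city": "Paris"},
    {"age": 28, "city": ""},
    {"age": 42, "city": "Madrid"},
]
report = data_quality_report(rows, ["age", "city"])
print(report)
```

In practice such checks would run automatically whenever an input data set changes, feeding the dashboards and metrics mentioned above.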
10. Why containers for Data Science?
DSS and containers
Resource Allocation & Management
• Leverage cloud-native technologies for extensible resource management
• Use different hardware configurations (such as GPUs)
• Pre-built images with the necessary library dependencies
Collaboration
• Control dependencies and isolate runtimes on the same host
• Share work by sharing containers
• Kubernetes makes orchestration of the containers simple
Reproducibility
• Simplify migration by copying containers
• Attach models to a container context so past work can easily be re-run
• Ensure old code/models continue running
Production
• Facilitate a self-service path to production
• Easily host models as APIs for downstream applications
• Deploy and monitor batch processes with reproducibility in mind
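A containerized batch process with reproducibility in mind typically reads its configuration from environment variables and records a run manifest. Below is a minimal stdlib sketch of such an entrypoint; the `JOB_NAME` and `IMAGE_TAG` variables are illustrative assumptions, not part of any specific platform:

```python
import json
import os
import platform
import sys
from datetime import datetime, timezone

def run_batch_job(score_fn, records):
    """Batch-scoring entrypoint sketch: read config from environment
    variables (the usual way to parameterize a container), score the
    records, and emit a run manifest capturing the runtime context so
    the job can be reproduced later."""
    manifest = {
        "job_name": os.environ.get("JOB_NAME", "batch-scoring"),
        "image_tag": os.environ.get("IMAGE_TAG", "unknown"),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "started_at": datetime.now(timezone.utc).isoformat(),
        "n_records": len(records),
    }
    results = [score_fn(r) for r in records]
    return results, manifest

# Toy run with a stand-in scoring function
results, manifest = run_batch_job(lambda r: r["x"] * 2, [{"x": 1}, {"x": 3}])
print(json.dumps(manifest, indent=2))
```

Baking the dependencies into the image and logging the image tag per run is what lets old code and models keep running unchanged.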
11. Leverage your infrastructure with containers
DSS and containers
● Run Python / R code in containers
● Machine learning in containers
12. Models
● automated machine learning
● coding (Python, R)
● model information
● model interpretation
● model performance
incl. monitoring of model drift
● data preparation
● feature engineering
● versioning
● expose trained models via APIs
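Monitoring of model drift, mentioned above, can be illustrated with a two-sample Kolmogorov–Smirnov statistic comparing training-time scores against live scores. This is a self-contained stdlib sketch with an illustrative alert threshold, not the specific drift test any particular product uses:

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap

# Score distribution at training time vs. in production
train_scores = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5]
live_scores = [0.5, 0.6, 0.7, 0.7, 0.8, 0.9]

drift = ks_statistic(train_scores, live_scores)
print(f"KS statistic: {drift:.2f}")
if drift > 0.3:  # illustrative alert threshold
    print("Possible model drift - consider retraining")
```

A scheduled check like this, tied to the versioning and retraining capabilities above, closes the loop between monitoring and redeployment.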
13. Data Scientists: focus talent on what counts
Code your way
● Full programmatic control: a full-fledged API to manage models, pipelines and automation
● Free coding: use any package with isolated environments
● Full Git integration: reuse and share code
Ensure impact
● Self-provisioning of compute resources
● Cloud-based elastic processing for large volumes of data, users or services
Don’t get distracted
● Expedited wrangling: facilitated connections to SQL, HDFS, cloud storage, NoSQL, APIs, …
● Use visual tools where they are faster
● Reuse work from other teams/analysts
Low-effort CI/CD
● Orchestrate pipelines with optional automatic checks
● Create deployment artifacts
● Deploy your models as containerized APIs
Showcase your insights
● Build insights, create webapps (Shiny, Flask, Bokeh) and deploy them in Kubernetes
● Package work for reuse by the target population
Tooling: Jupyter Notebooks or IDEs; SQL / Python / R / Scala
Security: LDAP, Kerberos, SSO
14. People (Collaboration)
● coders
code environments, git integration, tools etc.
● clickers
basic statistics, explore data, dashboards, download data
● communication in projects
● statistics
● visualizations
● documentation
● share data between projects
● export data and results
16. Models operationalization platform
Solution Overview: Architecture
[Architecture diagram: the Dataiku Design Node deploys models and analytics artifacts to the Dataiku Automation Node, which monitors workflows and models and retrains/scores workflows against the production DWH / DB (read/write/execute access). Models are deployed to Dataiku API Nodes, which fetch data from the production database and serve real-time scoring to business applications via HTTP queries. Compute runs on Hadoop, Spark, databases (JDBC), a Kubernetes cluster, etc. IT and application monitoring is provided by tools such as Nagios, Datadog or Zabbix.]
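The real-time scoring path in the architecture above – a business application sending an HTTP query and receiving a score – can be sketched end to end with the Python standard library. The linear "model", the `/score` route and the feature names are all illustrative stand-ins, not the actual API node protocol:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def score(features):
    """Stand-in model: a fixed linear score over two numeric features."""
    return 0.4 * features["f1"] + 0.6 * features["f2"]

class ScoringHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON feature payload and return the score as JSON
        body = self.rfile.read(int(self.headers["Content-Length"]))
        payload = json.dumps({"score": score(json.loads(body))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo output quiet
        pass

# Start the scoring endpoint on an ephemeral local port
server = HTTPServer(("127.0.0.1", 0), ScoringHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A "business application" sends an HTTP query with the features
url = f"http://127.0.0.1:{server.server_port}/score"
req = urllib.request.Request(
    url,
    data=json.dumps({"f1": 1.0, "f2": 2.0}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
print(result)
server.shutdown()
```

In production the same request/response shape would be served by a containerized, monitored API node rather than an in-process toy server.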
17. Concrete Steps toward Enterprise AI
Industrialization of Advanced Analytics Capabilities
[Maturity scale: from Big Data Day 0 (ML is for specialists, ad-hoc analytics, siloed approach) to Enterprise AI]
There is no shortcut to Enterprise AI. It is a journey that organisations need to undertake consciously, mastering each of the four key phases one after the other.
18. Concrete Steps toward Enterprise AI
Industrialization of Advanced Analytics Capabilities
[Maturity scale: from Big Data Day 0 (ML is for specialists, ad-hoc analytics, siloed approach) through Initiation, Impact, Acceleration and Systematization to Enterprise AI]
Goals per phase:
Initiation: Demonstrate Value
● Assemble a first team
● Data: quality, availability, accessibility, features
● Integration
● Minimal viable product
● Assessment of use cases
Impact: Deliver Business Value in Actual Operations
● Performance monitoring
● Improve continuously
● Operationalize models
● Get business acceptance and impact on the model
● Onboard analysts
Acceleration: Structure Execution and Self-Service
● Integrate technologies
● Make data available for all personas involved
● Maintain models in production
● New deployments
● Capitalize on previous projects
● Build up manpower to expand projects
Systematization: Fully Align Data, Organization and Processes
● Optimization of infrastructure
● Leveraging of new technologies
● Optimization of analytics processes and data management
19. Gradual Steps toward Enterprise AI:
Main Risks
Dataiku’s Maturity Model
[Maturity scale: from Big Data Day 0 (ML is for specialists, ad-hoc analytics, siloed approach) through Initiation, Impact, Acceleration and Systematization to Enterprise AI]
Initiation: Demonstrate Value
● Difficulty to assemble a first team
● Shifting data infrastructure / IT systems
● Lack of traction with business owners
Impact: Deliver Business Value in Actual Operations
● Difficulty to operationalize models
● Difficulty to get business acceptance and impact on the model
● Inability to onboard analysts
Acceleration: Structure Execution and Self-Service
● Fragmented technologies
● Data is limited to ‘experts’
● Maintaining models in production too costly, hindering new deployments
● Lack of capitalization on previous projects
● Fractionated initiatives difficult to reconcile
● Lack of manpower to expand projects
Systematization: Fully Align Data, Organization and Processes
● Accumulated obsolescence of deployed projects
● Lack of leveraging of new technologies
● Data projects remain fairly specific, lacking cultural pervasiveness
20. In a nutshell
Our experience
Operationalization / going into production
● initial focus on development and coders
● no initial focus on governance, data protection, auditing
● no initial focus on enterprise security
● difficulty to operationalize models
● maintaining models in production too costly, hindering new
deployments
● accumulated obsolescence of deployed projects
Missing value definition
● Difficulty to get business acceptance and impact on
model
● Lack of traction on business owners
● Lack of capitalization on previous projects
● Data projects remain too specific
Missing Collaboration
● Difficulty to assemble a first team
● Inability to onboard analysts
● Lack of traction on business owners
● Fractionated initiatives difficult to reconcile
● Lack of manpower to expand projects
● Data projects remain too specific
Siloed IT systems & data
● Shifting data infrastructure/IT systems
● Fragmented technologies
● Data is limited to ‘experts’
● Lack of leveraging of new technologies