Open Source AI - News and examples
1. Open Source @ IBM
Open source
AI/Machine Learning
2018 / © 2018 IBM Corporation 1
Luciano Resende
Data Science Platform Architect
2. About me - Luciano Resende
Data Science Platform Architect – IBM – CODAIT
• Have been contributing to open source at ASF for over 10 years
• Currently contributing to: Jupyter Notebook ecosystem, Apache Bahir, Apache Toree, and Apache Spark, among other projects related to AI/ML platforms
lresende@apache.org
https://www.linkedin.com/in/lresende
@lresende1975
https://github.com/lresende
3. Agenda
Open Source @ IBM
Center for Open Source Data & AI Technologies (CODAIT)
Model Asset eXchange (MAX)
Fabric for Deep Learning (FfDL)
Jupyter Enterprise Gateway
Q&A
4. Open Source @ IBM
Open Source participation and usage is simpler than ever
Learn: Program touches 78,000 IBMers annually
Consume: Virtually all IBM products contain some open source
• 40,363 pkgs per year
Contribute:
• >62K OS Certs per year
• ~10K IBM commits per month
Connect: >1000 active IBM contributors working in key OS projects
5. Open Source is essential to Developer Advocacy
IBM generated open source innovation
• 137 Code Open (dWO) projects with 1000+ GitHub projects
• 4 projects graduated to full open governance in the last year: Node-RED, OpenWhisk, SystemML, Blockchain Fabric
• developer.ibm.com/code/open/code/
Community
• IBM focused on 18 strategic communities
• Drive open governance in “Centers of Gravity”
• IBM Leaders drive key technologies and assure freedom
of action
The IBM OS Way is now open sourced
• Training, Recognition, Tooling
• Organization, Consuming, Contributing
6. Center for Open Source
Data and AI Technologies
(CODAIT)
7. IBM’s history of strong AI leadership
1997: Deep Blue
• Deep Blue became the first machine to beat a world chess
champion in tournament play
2011: Jeopardy!
• Watson beat two top
Jeopardy! champions
1968: 2001: A Space Odyssey
• IBM was a technical
advisor
• HAL is “the latest in
machine intelligence”
2018: Open Tech, AI & emerging
standards
• New IBM centers of gravity for AI
• OS projects increasing exponentially
• Emerging global standards in AI
8. Center for Open Source
Data and AI Technologies
CODAIT
codait.org
codait (French)
= coder/coded
https://m.interglot.com/fr/en/codait
CODAIT aims to make AI solutions
dramatically easier to create, deploy,
and manage in the enterprise
Relaunch of the Spark Technology
Center (STC) to reflect expanded
mission
9. CODAIT by the numb3rs
The team contributes to over 10 open source projects, including Spark, TensorFlow, Keras, SystemML, Arrow, Bahir, Toree, Livy, Zeppelin, R4ML, Stocator, and Jupyter Enterprise Gateway
17 committers and many contributors in Apache projects: Spark, Arrow, SystemML, Bahir, Toree, Livy
Over 980 JIRAs and 50,000 lines of code committed to Apache Spark itself, and over 65,000 LoC into SystemML
• Established IBM as the number 1 contributor to Spark Machine Learning in the Spark 2.0 release
Over 25 product lines within IBM leverage Apache Spark in some form or another. CODAIT engineers have interacted and interlocked with many of them.
Speakers at over 100 conferences, meetups, unconferences, etc.
[Figure: Spark code contribution growth by week]
10. Center for Open Source
Data and AI Technologies
Code – Build and improve practical frameworks to enable more developers to realize immediate value (e.g. FfDL, TensorFlow, Jupyter, Spark)
Content – Showcase solutions to complex, real-world AI problems
Community – Bring developers and data scientists to engage with IBM (e.g. MAX)
Improving Enterprise AI lifecycle in Open Source
[Figure: enterprise AI lifecycle stages (Gather Data, Analyze Data, Machine Learning, Deep Learning, Deploy Model, Maintain Model) with the open source projects that support them: Python Data Science Stack (Pandas, Scikit-Learn), Apache Spark, Jupyter, Keras + TensorFlow, Fabric for Deep Learning (FfDL), MLeap + PFA, Model Asset eXchange]
12. CODAIT: Enabling End-to-End AI in the Enterprise
[Figure: enterprise AI lifecycle stages (Gather Data, Analyze Data, Machine Learning, Deep Learning, Deploy Model, Maintain Model) with the open source projects that support them: Python Data Science Stack (Pandas, Scikit-Learn), Apache Spark, Jupyter, Keras + TensorFlow, Fabric for Deep Learning (FfDL), MLeap + PFA, Model Asset eXchange]
13. Making AI as Ubiquitous
as the Telephone
14. Q: What is deep learning?
A: Machine learning using
deep neural networks.
InceptionV3 Convolutional Neural Net
(A “medium-sized” deep learning model)
Image Source: https://github.com/tensorflow/models/blob/master/research/inception/g3doc/inception_v3_architecture.png
15. Characteristics of Deep
Learning (1)
State-of-the-Art prediction
quality in many domains
– Image classification
– Machine translation
– Facial recognition
– Time series prediction
– Many more
16. Characteristics of Deep
Learning (2)
Large, complex models
– Model size generally determined by “how big a
model can you fit on your device?”
Each box ≈ between
32 and 768 linear
regression models
17. Characteristics of Deep
Learning (3)
Poorly understood today
…even by experts
– Why do the models converge?
– Why do the models converge with low loss?
– Why do the models generalize?
18. Focus of this Talk
Incorporating well-understood deep learning models into enterprise applications.
20. The Components of a Deep Learning Model
[Figure: a small network, Input (3) → Dense (3×8) → Dense (8×6) → Dense (6×4) → Dense (4×2) → Output (2), producing the label “cat”; weights not to scale]
• Neural Network Graph
• Weights
• Driver Program
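To make the three components concrete, here is a minimal sketch in plain Python using the layer shapes from the slide. The weights are random rather than trained, bias terms are omitted, ReLU is an assumed activation, and the label list is hypothetical (the slide only shows “cat”), so the printed label is arbitrary.

```python
import random

# Layer shapes from the slide: Input (3) -> 3x8 -> 8x6 -> 6x4 -> 4x2 -> Output (2)
LAYER_SHAPES = [(3, 8), (8, 6), (6, 4), (4, 2)]
LABELS = ["cat", "dog"]  # hypothetical labels; the slide only shows "cat"


def make_weights(shapes, seed=0):
    """The 'weights': one matrix per dense layer (random here, not trained)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
            for rows, cols in shapes]


def dense(vector, matrix):
    """One dense layer: vector (n) times matrix (n x m), followed by ReLU."""
    cols = len(matrix[0])
    return [max(0.0, sum(vector[i] * matrix[i][j] for i in range(len(vector))))
            for j in range(cols)]


def forward(vector, weights):
    """The 'graph': apply the dense layers in order."""
    for matrix in weights:
        vector = dense(vector, matrix)
    return vector


def classify(features, weights):
    """The 'driver program': run the graph and map the output to a label."""
    scores = forward(features, weights)
    return LABELS[scores.index(max(scores))]


weights = make_weights(LAYER_SHAPES)
print(classify([0.2, 0.5, 0.1], weights))
```

The graph (`forward`) and the weights (`make_weights`) are separate objects, which is why the following slides treat finding graph code and finding pre-trained weights as separate steps.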
21. Example: Get an Image Classifier
Step 1: Find a suitable neural
network graph.
– Need to read some papers
22. Example: Get an Image Classifier
Step 2: Find code to generate
the neural network graph
TensorFlow code to build ResNet50 neural network graph
23. Example: Get an Image Classifier
Step 3: Find some pre-trained
weights for your graph
Caffe2 ResNet50 model weights
24. Example: Get an Image Classifier
Step 4: Find example code
that performs model
inference
TensorFlow code for training and batch inference on ResNet50
25. Example: Get an Image Classifier
Step 5: Write your own code to
perform model inference on one
image at a time
Step 6: Package your inference
code, graph creation code, and pre-
trained weights together
Step 7: Deploy your package
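Steps 5 and 6 can be sketched roughly as follows. This is an illustration only: `PackagedClassifier`, `build_graph`, and `toy_build_graph` are hypothetical names, and the toy scoring function stands in for a real framework graph such as ResNet50.

```python
class PackagedClassifier:
    """Bundles graph-building code, weights, and inference code (step 6)."""

    def __init__(self, build_graph, weights, labels):
        # build_graph is a hypothetical callable: weights -> batch predict function
        self._predict_batch = build_graph(weights)
        self._labels = labels

    def predict_one(self, image):
        """Step 5: inference on one image by wrapping it in a batch of size 1."""
        scores = self._predict_batch([image])[0]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self._labels[best], scores[best]


def toy_build_graph(weights):
    """Stand-in for real graph code: scores are weighted sums of pixel values."""
    def predict_batch(images):
        return [[sum(w * p for w, p in zip(row, image)) for row in weights]
                for image in images]
    return predict_batch


model = PackagedClassifier(toy_build_graph, [[1.0, 0.0], [0.0, 1.0]], ["cat", "dog"])
print(model.predict_one([0.9, 0.1]))
```

Example code found in step 4 is typically written for batches, so the single-image path is a thin wrapper around the batch function instead of a second implementation.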
26. Model Marketplaces
Collections of well-understood deep learning models
Provide a central place to find
known-good implementations
of these models
27. IBM Model Asset eXchange
MAX is a one-stop shop open source
ecosystem for data scientists and AI
developers to share and consume models that
use machine learning engines, such
as TensorFlow, PyTorch and Caffe2.
It also provides a standard approach to
classify, annotate, and deploy these models
for prediction and inferencing.
MAX
https://developer.ibm.com/code/exchanges/models/
28. Demo!
https://developer.ibm.com/code/exchanges/models/
29. Summary
Free, open-source models.
Wide variety of domains.
Multiple deep learning frameworks.
Vetted and tested code and IP.
Build and deploy a web service in 30
seconds.
Start training on Watson Studio in
minutes.
30. MAX: Future Plans
Many more models
– Train with Watson Studio/DLaaS
– Run inference on IBM infrastructure
Revamped website
Integration with Watson Catalog
IBMer-uploaded models
More IBM Code code patterns showing usage
https://developer.ibm.com/code/exchanges/models/
31. But if you can’t wait
MAX Models at DockerHub
MAX models are exposed as Docker containers and published to DockerHub under the CODAIT organization.
https://hub.docker.com/u/codait/dashboard/
32. MAX and Container Services
K8 Deployment Descriptor
apiVersion: v1
kind: Pod
metadata:
  name: image-caption-generator
  namespace: default
  labels:
    app: image-caption-generator
spec:
  restartPolicy: Always
  containers:
  - name: image-caption-generator
    image: codait/max-image-caption-generator
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: image-caption-generator
    component: image-caption-generator
  name: image-caption-generator
spec:
  ports:
  - name: http
    port: 5000
    targetPort: 5000
  selector:
    app: image-caption-generator
  sessionAffinity: None
  type: NodePort
MAX models require a Kubernetes deployment descriptor to enable easy deployment in IBM Cloud Container Service.
kubectl apply -f image-caption-generator.yaml
33. IBM Cloud Container Service
IBM Cloud CLI
IBM Cloud Container Service plug-in
Kubernetes CLI (kubectl)
Useful Commands
Pointing kubectl to IBM Cloud Container Service
export KUBECONFIG=/Users/lresende/.bluemix/plugins/container-service/clusters/lresende-kubernetes/kube-config-hou02-lresende-kubernetes.yml
Accessing Kubernetes dashboard via kubectl proxy
kubectl config view -o jsonpath='{.users[0].user.auth-provider.config.id-token}'
kubectl proxy
http://localhost:8001/ui
Deploying application
kubectl apply -f image-caption-generator.yaml
Accessing application
bx cs workers lresende-kubernetes
kubectl describe service image-caption-generator
curl -F "image=@assets/surfing.jpg" -X POST http://184.172.242.55:32229/model/predict
curl -F "image=@/Users/lresende/Pictures/375337.jpg" -X POST http://184.172.242.55:32229/model/predict
curl -F "image=@/Users/lresende/Pictures/362809.jpg" -X POST http://184.172.242.55:32229/model/predict
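The curl calls above can also be made from application code. The following is a sketch using only the Python standard library; the helper names `build_multipart` and `predict` are made up for illustration, and it assumes the endpoint answers with a JSON body.

```python
import json
import urllib.request
import uuid


def build_multipart(field, filename, data, content_type="image/jpeg"):
    """Encode one file as multipart/form-data, like curl -F "image=@file"."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def predict(url, image_path):
    """POST an image to a /model/predict endpoint and parse the reply as JSON."""
    with open(image_path, "rb") as f:
        filename = image_path.split("/")[-1]
        body, content_type = build_multipart("image", filename, f.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (address and port are the ones shown on the slide; yours will differ):
# predict("http://184.172.242.55:32229/model/predict", "assets/surfing.jpg")
```

The worker address comes from `bx cs workers` and the NodePort from `kubectl describe service`, so both values change from cluster to cluster.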
34. FfDL: Fabric for Deep Learning
FfDL provides a scalable, resilient, and fault-tolerant deep-learning framework
35. Fabric for Deep Learning
https://github.com/IBM/FfDL
FfDL provides a scalable, resilient, and fault
tolerant deep-learning framework
FfDL GitHub Page
https://github.com/IBM/FfDL
FfDL dwOpen Page
https://developer.ibm.com/code/open/projects/fabric-for-deep-learning-ffdl/
FfDL Announcement Blog
http://developer.ibm.com/code/2018/03/20/fabric-for-deep-learning
FfDL Technical Architecture Blog
http://developer.ibm.com/code/2018/03/20/democratize-ai-with-fabric-for-deep-learning
Deep Learning as a Service within Watson Studio
https://www.ibm.com/cloud/deep-learning
Research paper: “Scalable Multi-Framework Management of Deep Learning Training Jobs”
http://learningsys.org/nips17/assets/papers/paper_29.pdf
• Fabric for Deep Learning, or FfDL (pronounced “fiddle”), is an open source project that aims to make deep learning easily accessible to the people to whom it matters most, i.e. data scientists and AI developers.
• FfDL provides a consistent way to deploy, train, and visualize deep learning jobs across multiple frameworks such as TensorFlow, Caffe, PyTorch, and Keras.
• FfDL is being developed in close collaboration with IBM Research and IBM Watson. It forms the core of Watson’s Deep Learning service in open source.
36. Fabric for Deep Learning
https://github.com/IBM/FfDL
FfDL is built using Microservices architecture
on Kubernetes
• FfDL platform uses a microservices architecture to offer
resilience, scalability, multi-tenancy, and security without
modifying the deep learning frameworks, and with no or minimal
changes to model code.
• FfDL control plane microservices are deployed as pods on
Kubernetes to manage this cluster of GPU- and CPU-enabled
machines effectively
• Tested Platforms: Minikube, IBM Cloud Public, IBM Cloud Private, GPUs using both the Kubernetes Accelerators feature gate and NVIDIA device plugins
Try FfDL/DLaaS
https://ibm.biz/BdZtab
37. Access to elastic compute leveraging Kubernetes
[Figure labels: source code, training definition, Kubernetes container, training artifacts, compute cluster (NVIDIA Tesla K80, P100, V100), Cloud Object Storage]
• Auto-allocation means infrastructure is used only when needed
• Training assets are managed and tracked.
41. Jupyter Enterprise Gateway
Provides multi-tenant,
scalable and secure remote
Jupyter Notebook kernels
43. Jupyter Notebooks
Notebooks are interactive
computational
environments, in which
you can combine code
execution, rich text,
mathematics, plots and
rich media.
44. Jupyter Notebooks
• Notebook UI runs in the browser
• The Notebook Server serves the “Notebooks”
• Kernels interpret/execute cell contents
– Are responsible for code execution
– Abstract different languages
46. Building a Data Science Platform
Large pool of shared computing resources
• Enterprise Cloud, Public Cloud or Hybrid
• Data in the cloud (Data Lakes/Object Storage)
Distributed Consumers
• Notebooks running locally (user’s laptop) or as a service (e.g. JupyterHub)
Different Resource Utilization Patterns
• High number of idle resources
47. Vanilla Jupyter Notebooks
[Chart: maximum number of simultaneous kernels (4GB heap) by cluster size (32GB nodes): 4 nodes: 8; 8 nodes: 8; 12 nodes: 8; 16 nodes: 8]
Limitations of Jupyter Notebook Stack
• Security limitations
– Single user sharing the same privileges
– Users can see and control each other’s processes using Jupyter administrative utilities
• Scalability limitations
– Jupyter kernels run as local processes
– Resources are limited by what is available on the single node that runs all kernels and associated Spark drivers
48. Jupyter Enterprise
Gateway
Jupyter Enterprise Gateway at IBM Code
https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/
Jupyter Enterprise Gateway source code at GitHub
https://github.com/jupyter-incubator/enterprise_gateway
Jupyter Enterprise Gateway Documentation
http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark or Kubernetes cluster for Enterprise/Cloud use cases
[Figure: logos of supported kernels and supported platforms, including Spectrum Conductor]
49. Jupyter Enterprise Gateway
[Chart: maximum number of simultaneous kernels (4GB heap) by cluster size (32GB nodes): 4 nodes: 16; 8 nodes: 32; 12 nodes: 48; 16 nodes: 64]
Optimized Resource Allocation
– Utilize resources on all cluster nodes by running kernels as Spark
applications in YARN Cluster Mode.
– Pluggable architecture to enable support for additional Resource Managers
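The two kernel-capacity charts (8 kernels at every cluster size for vanilla notebooks; 16, 32, 48, and 64 with Enterprise Gateway) follow from simple arithmetic, sketched below. The node memory and heap size come from the chart labels. The figure of 4 kernels per node is an assumption inferred from the chart values, since the slides do not say how it was derived.

```python
NODE_MEMORY_GB = 32   # from the chart: "Cluster Size (32GB Nodes)"
KERNEL_HEAP_GB = 4    # from the chart: "Max Kernels (4GB Heap)"
KERNELS_PER_NODE = 4  # assumption inferred from the chart's 16/32/48/64 values


def vanilla_max_kernels(nodes):
    """All kernels run on the single notebook node, so cluster size is irrelevant."""
    return NODE_MEMORY_GB // KERNEL_HEAP_GB


def gateway_max_kernels(nodes, kernels_per_node=KERNELS_PER_NODE):
    """Kernels are spread across the cluster, so capacity grows with node count."""
    return nodes * kernels_per_node


for nodes in (4, 8, 12, 16):
    print(nodes, vanilla_max_kernels(nodes), gateway_max_kernels(nodes))
```

The point of the comparison is the shape of the two curves: flat when every kernel and Spark driver shares one node, linear when kernels run as Spark applications in YARN cluster mode.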
Enhanced Security
– End-to-End secure communications
• Secure socket communications
• Encrypted HTTP communication using SSL
Multiuser support with user impersonation
– Enhance security and sandboxing by enabling user impersonation when
running kernels (using Kerberos).
– Individual HDFS home folder for each notebook user.
– Use the same user ID for notebook and batch jobs.
50. Jupyter Enterprise Gateway – YARN
[Figure: YARN deployment. Labels: Bob and Alice (each via nb2kg), security layer, gateway node, YARN cluster, YARN workers; each YARN container holds a Jupyter kernel and Spark driver, with Spark executors]
Jupyter Enterprise Gateway
• Multitenancy
• Remote kernel lifecycle management via process proxies
Impersonation: Alice’s kernel runs under Alice’s user ID.
51. Enterprise Gateway & Kubernetes
Before Jupyter Enterprise Gateway …
• Scalability limitations
– Resources are limited, and the amount required by all kernels needs to be allocated during Notebook Server pod creation.
– Resources are limited by what is available on the single node that runs all kernels and associated Spark drivers
52. Jupyter Enterprise Gateway - Kubernetes
Container images defined in kernelspec
[Figure: Kubernetes deployment. Labels: nb2kg clients, Gateway, vanilla kernels, Spark-based kernels (Spark on K8), community image, distributed file system]
54. Summary
• Model Asset eXchange
– Curated set of models ready to use or embed in your application or solution
• Fabric for Deep Learning
– Provides a consistent way for AI developers and data scientists to train their models
• Jupyter Enterprise Gateway
– Enables your Jupyter Notebook stack to scale in order to build machine learning and AI models more resource-efficiently
MAX
https://developer.ibm.com/code/exchanges/models/