END-TO-END MACHINE LEARNING STACK

END-TO-END MACHINE LEARNING PIPELINE

Single machine, small data, ML hero

More Data + Bigger Models + More Computation = Better Results in Machine Learning

https://blog.openai.com/ai-and-compute

Single machine, big data, ML hero

Cluster, big data, ML hero

Single machine      Data center
1 user              Many users
Megabyte of data    Petabyte of data
Local filesystem    Distributed filesystem
Exclusive use       Resource sharing, scheduling, queueing, resource isolation
Scale up            Scale out
pip install ...     Automating deployment, operations, monitoring, ...

Development cycle for autonomous vehicles
1. Collect sensor data: Data Logger → Big Data → Data Center
2. Model Engineering: Big Data in the Data Center → Trained Model
3. Autonomous Driving: Trained Model → Control Unit

Sensor    Udacity Lincoln MKZ
Camera    3x Blackfly GigE camera, 20 Hz
Lidar     Velodyne HDL-32E, 9.5 Hz
IMU       Xsens, 400 Hz
GPS       2x fixed, 1 Hz
CAN bus   1.1 kHz
Software  Robot Operating System (ROS)
Data      3 GB per minute
https://github.com/udacity/self-driving-car
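The rates in this table add up quickly. The sketch below turns them into per-minute message counts and recording volume. It assumes that the camera rate is per camera ("3x" = three cameras), that "2x" for GPS means two receivers at 1 Hz each, and it uses an 8-hour drive purely as a hypothetical session length.

```python
"""Back-of-the-envelope sizing for the Udacity Lincoln MKZ sensor table."""

# Message rates in Hz taken from the table above.
# Assumption: "3x" and "2x" mean 3 cameras and 2 GPS receivers, each at the listed rate.
SENSOR_RATES_HZ = {
    "camera": 3 * 20.0,
    "lidar": 9.5,
    "imu": 400.0,
    "gps": 2 * 1.0,
    "can_bus": 1100.0,  # 1.1 kHz
}
DATA_RATE_GB_PER_MIN = 3.0  # from the table above


def messages_per_minute(rates_hz):
    """Return how many messages each sensor produces in one minute."""
    return {name: round(rate * 60) for name, rate in rates_hz.items()}


def recorded_gb(hours, gb_per_min=DATA_RATE_GB_PER_MIN):
    """Return the raw data volume in GB for a recording of the given length."""
    return hours * 60 * gb_per_min


if __name__ == "__main__":
    print(messages_per_minute(SENSOR_RATES_HZ))
    print(recorded_gb(1), "GB per hour")
    print(recorded_gb(8), "GB for a hypothetical 8-hour drive")
```

At 3 GB per minute, one hour of driving is 180 GB and a hypothetical 8-hour day is 1,440 GB (about 1.4 TB) from a single vehicle, which is why the data ends up in the data center.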

ROS bag data structure
https://github.com/valtech/ros_hadoop
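A ROS bag (format version 2.0) consists of a version line followed by a flat sequence of records. Each record is stored as a 4-byte little-endian header length, the header, a 4-byte little-endian data length, and the data. The header is a list of length-prefixed `name=value` fields, and its one-byte `op` field gives the record type. The minimal reader below is a sketch under these assumptions. It handles only uncompressed data, does no index lookup, and includes `make_record` as a hypothetical helper used only to build test input. Message-data records normally sit inside chunk records, so the same `iter_records` function is reused on a chunk's payload.

```python
import io
import struct

ROSBAG_MAGIC = b"#ROSBAG V2.0\n"

# Record types, stored in the one-byte "op" header field.
OP_NAMES = {
    0x02: "message_data",
    0x03: "bag_header",
    0x04: "index_data",
    0x05: "chunk",
    0x06: "chunk_info",
    0x07: "connection",
}


def parse_header(raw):
    """Split a record header into {field_name: raw_value_bytes}."""
    fields = {}
    pos = 0
    while pos < len(raw):
        (field_len,) = struct.unpack_from("<I", raw, pos)
        pos += 4
        name, _, value = raw[pos:pos + field_len].partition(b"=")
        fields[name.decode("ascii")] = value
        pos += field_len
    return fields


def iter_records(stream):
    """Yield (op_name, header_fields, data) for each record in a binary stream."""
    while True:
        prefix = stream.read(4)
        if not prefix:
            return  # clean end of stream
        (header_len,) = struct.unpack("<I", prefix)
        header = parse_header(stream.read(header_len))
        (data_len,) = struct.unpack("<I", stream.read(4))
        data = stream.read(data_len)
        op = header["op"][0]
        yield OP_NAMES.get(op, hex(op)), header, data


def read_bag(stream):
    """Check the version line, then yield the top-level records of the bag."""
    if stream.read(len(ROSBAG_MAGIC)) != ROSBAG_MAGIC:
        raise ValueError("not a ROS bag 2.0 stream")
    yield from iter_records(stream)


def make_record(fields, data):
    """Hypothetical helper: encode one record (used here only to build test input)."""
    header = b"".join(
        struct.pack("<I", len(name) + 1 + len(value)) + name + b"=" + value
        for name, value in fields.items()
    )
    return struct.pack("<I", len(header)) + header + struct.pack("<I", len(data)) + data


if __name__ == "__main__":
    msg = make_record({b"op": b"\x02", b"conn": struct.pack("<I", 0)}, b"hello")
    chunk = make_record({b"op": b"\x05", b"compression": b"none"}, msg)
    for op, header, data in read_bag(io.BytesIO(ROSBAG_MAGIC + chunk)):
        print(op, len(data))
        if op == "chunk" and header["compression"] == b"none":
            for inner_op, _, payload in iter_records(io.BytesIO(data)):
                print(" ", inner_op, payload)
```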

Robot Operating System
+ Popular open source robotics framework
+ Reliable distributed architecture
+ Widely used in the robotics research community
+ Huge selection of “off-the-shelf” software packages for hardware, algorithms, etc.
+ Used by Bosch, BMW, KUKA, Google, Siemens, etc.
https://roscon.ros.org/2015/presentations/ROSCon-Automated-Driving.pdf


Machine learning pipeline
Ingest data → Data preprocessing → Feature engineering → Model training → Model validation → Model deployment
+ Train/test loop: model training on training data, model validation on test data
+ Model feedback loop: simulation, reports, results
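The pipeline above can be read as a chain of functions with an inner train/test loop. The sketch below is a toy, stdlib-only illustration under stated assumptions. The records are hypothetical `x,y` text rows, the model is a one-feature ridge regression, and the train/test loop picks the regularization strength `lam` with the lowest test error before "deploying" the model as a plain function. Feature engineering is left out, and none of the names come from Flux.

```python
"""Toy pipeline: ingest -> preprocess -> train/test loop -> deploy (no feature engineering)."""


def ingest(raw_rows):
    """Ingest: parse hypothetical 'x,y' text rows; empty fields become None."""
    return [tuple(float(v) if v else None for v in row.split(",")) for row in raw_rows]


def preprocess(rows):
    """Data preprocessing: drop rows with missing values."""
    return [r for r in rows if None not in r]


def split(rows, test_every=5):
    """Deterministic train/test split: every `test_every`-th row goes to the test set."""
    train = [r for i, r in enumerate(rows) if i % test_every != 0]
    test = [r for i, r in enumerate(rows) if i % test_every == 0]
    return train, test


def train(rows, lam):
    """Model training: ridge regression y = a*x + b with an unpenalized intercept."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    a = sxy / (sxx + lam)
    return a, my - a * mx


def mse(model, rows):
    """Model validation: mean squared error on held-out rows."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in rows) / len(rows)


def train_test_loop(rows, lams):
    """Train one model per candidate `lam` and keep the one with the lowest test error."""
    train_rows, test_rows = split(rows)
    models = [train(train_rows, lam) for lam in lams]
    return min(models, key=lambda m: mse(m, test_rows))


def deploy(model):
    """Model deployment: expose the chosen parameters as a prediction function."""
    a, b = model
    return lambda x: a * x + b


if __name__ == "__main__":
    raw = [f"{i},{2 * i + 1}" for i in range(20)] + ["3,"]  # last row has a missing y
    predict = deploy(train_test_loop(preprocess(ingest(raw)), lams=[0.0, 1.0, 10.0]))
    print(predict(10.0))
```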

Train and evaluate machine learning models at scale
From single machine to data center:
How to run more experiments faster and in parallel?
How to share and reproduce research?
How to go from research to real products?

Distributed Machine Learning
(Diagram: data size vs. model size, from single machine to data center)
+ Data parallelism: speeds up the training
+ Model parallelism: training very large models
+ Exploring several model architectures, hyper-parameter optimization, training several independent models
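Data parallelism keeps a full copy of the model on every worker and splits the data. Each worker computes a gradient on its own shard, and the gradients are averaged before the shared parameters are updated. With equal-sized shards, that average equals the full-batch gradient, so the result matches single-machine training. The sketch below simulates the workers in one process for a one-feature linear model with mean squared error. The shard count, learning rate and data are illustrative assumptions.

```python
"""Synchronous data-parallel gradient descent, simulated in a single process."""


def gradient(w, b, shard):
    """Gradient of the mean squared error of y ~ w*x + b on one shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb


def make_shards(data, k):
    """Round-robin split into k shards (equal sizes when len(data) % k == 0)."""
    return [data[i::k] for i in range(k)]


def data_parallel_step(w, b, shards, lr):
    """One synchronous step: per-shard gradients, then an average (the 'all-reduce')."""
    grads = [gradient(w, b, shard) for shard in shards]  # one worker per shard in a cluster
    gw = sum(g[0] for g in grads) / len(grads)
    gb = sum(g[1] for g in grads) / len(grads)
    return w - lr * gw, b - lr * gb


def train(data, workers=4, lr=0.5, steps=1000):
    """Run synchronous data-parallel gradient descent from w = b = 0."""
    shards = make_shards(data, workers)
    w, b = 0.0, 0.0
    for _ in range(steps):
        w, b = data_parallel_step(w, b, shards, lr)
    return w, b


if __name__ == "__main__":
    data = [(i / 100, 2 * (i / 100) + 1) for i in range(100)]  # y = 2x + 1
    print(train(data))  # close to (2.0, 1.0)
```

A real cluster replaces the list comprehension in `data_parallel_step` with workers running in parallel and replaces the average with an all-reduce or a parameter server. The update rule stays the same.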

Compute Workload for Training and Evaluation
(Diagram: I/O workload vs. compute workload, single machine vs. data center)

I/O Workload for Simulation and Testing
(Diagram: I/O workload vs. compute workload, single machine vs. data center)

Flux – Open Machine Learning Stack
(Diagram: ML heroes work through ML Development & Catalog & REST API. The stack covers feature engineering, training, evaluation, re-simulation and testing (e.g. with CaffeOnSpark) on Compute + Network + Storage. Input: training & test data. Output: deployed model. Catalog items shown: Sample, Dataset, Model, Correlation, Prediction, Centroid, Batch, Anomaly, Regression, Test, Cluster, Scores.)
+ Mainly open source
+ No vendor lock-in
+ Scale-out architecture
+ Multi-user support
+ Resource management
+ Job scheduling
+ Speeds up training
+ Speeds up simulation
https://github.com/flux-project/flux

Feature Engineering
+ Hadoop InputFormat and RecordReader for ROS bags
+ Process ROS bags with Spark, YARN, MapReduce, Hadoop Streaming API, …
+ Spark RDDs are cached and optimized for analysis
(Diagram: ROS bag → RecordReader → RDD in the processing engine on Compute, Network, Storage → advanced analytics with DataFrame, Dataset, SQL, Spark APIs, NumPy and ROS msg)

Hadoop InputFormat for ROS bags
https://github.com/valtech/ros_hadoop
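An InputFormat has to cut a bag into splits that workers can read independently, and a split that starts in the middle of a chunk cannot be decoded on its own. One plausible approach is to take the chunk start offsets (which a bag stores in its index records) and group whole chunks into byte ranges close to a target size, such as the 128 MB default HDFS block size. This is an assumption, not the actual ros_hadoop implementation, and `plan_splits` is a hypothetical helper.

```python
"""Hypothetical chunk-aligned split planning for a ROS bag on HDFS."""

HDFS_DEFAULT_BLOCK = 128 * 1024 * 1024  # 128 MB, the default dfs.blocksize in Hadoop 2+


def plan_splits(chunk_offsets, end_of_chunks, target_bytes=HDFS_DEFAULT_BLOCK):
    """Group whole chunks into (start, end) byte ranges of roughly target_bytes.

    chunk_offsets: sorted start offsets of the chunk records.
    end_of_chunks: offset just after the last chunk (where the index section starts).
    No chunk is ever cut in two, so every split can be decoded independently.
    """
    bounds = list(chunk_offsets) + [end_of_chunks]
    splits = []
    start = bounds[0]
    for i in range(1, len(bounds)):
        end = bounds[i]
        is_last = i == len(bounds) - 1
        # Close the current split once it reaches the target size, or at the last chunk.
        if end - start >= target_bytes or is_last:
            splits.append((start, end))
            start = end
    return splits


if __name__ == "__main__":
    # Hypothetical bag: 5 chunks of 100 MB after a 4 KB header region.
    mb = 1024 * 1024
    offsets = [4096 + k * 100 * mb for k in range(5)]
    for start, end in plan_splits(offsets, offsets[-1] + 100 * mb):
        print(start, end, (end - start) // mb, "MB")
```

Because splits always start on a chunk boundary, a RecordReader can begin decoding at the split's first byte without scanning backwards.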

Training & Evaluation
+ TensorFlow ROSRecordDataset
+ Protocol Buffers to serialize records
+ Saves time because no data conversion is needed
+ Saves storage because no data duplication is needed
(Diagram: ROS bag → ROS Dataset → machine learning training engine on Compute, Network, Storage, reading ROS msg records)
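Reading bags directly lets training stream messages out of the original files instead of first converting them to another format and storing a second copy. The sketch below shows that pattern with Python generators; it does not use the ROSRecordDataset API. It assumes that records arrive as `(topic, timestamp, payload)` tuples (for example, decoded from message-data records) and that a hypothetical `/imu` payload is three little-endian float64 values.

```python
"""Lazy topic filter -> decode -> batch pipeline over bag records (no intermediate copy)."""
import struct
from itertools import islice

IMU_FORMAT = "<3d"  # hypothetical payload layout: three little-endian float64 values


def select_topic(records, topic):
    """Yield (timestamp, payload) for records on one topic, without materializing a list."""
    for rec_topic, stamp, payload in records:
        if rec_topic == topic:
            yield stamp, payload


def decode_imu(messages):
    """Decode each payload into a tuple of floats (stand-in for real message deserialization)."""
    for stamp, payload in messages:
        yield stamp, struct.unpack(IMU_FORMAT, payload)


def batched(items, batch_size):
    """Group an iterator into lists of batch_size; the last batch may be shorter."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


if __name__ == "__main__":
    records = [
        ("/imu" if i % 2 == 0 else "/gps", i * 0.0025, struct.pack(IMU_FORMAT, i, 0.0, 9.81))
        for i in range(10)
    ]
    for batch in batched(decode_imu(select_topic(records, "/imu")), batch_size=2):
        print([stamp for stamp, _ in batch])
```

Each stage pulls one item at a time from the previous one, so only the current batch is held in memory and no converted copy of the bag is written.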

Re-Simulation & Testing
+ Use Spark for preprocessing, transformation, cleansing, aggregation and time-window selection before publishing to ROS topics
+ Use the re-simulation framework of choice to subscribe to the ROS topics
(Diagram: ROS bag → engine on Compute, Network, Storage → publish to ROS topics through ROS core → re-simulation framework of choice subscribes)
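Re-simulation amounts to replaying a selected time window of recorded messages, in timestamp order, onto topics that the simulator subscribes to. The sketch below shows that flow in a single process. `TopicBus` is a hypothetical stand-in for ROS core and topics, not the rospy API, and messages are assumed to be `(topic, timestamp, payload)` tuples. In the stack described above, the window selection would run in Spark and publishing would go through a real ROS core.

```python
"""Time-window selection and replay onto topics, simulated in one process."""
from collections import defaultdict


class TopicBus:
    """Hypothetical in-process stand-in for ROS core and topics (not the rospy API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)


def select_window(messages, t_start, t_end):
    """Keep messages with t_start <= timestamp < t_end, sorted by timestamp."""
    in_window = (m for m in messages if t_start <= m[1] < t_end)
    return sorted(in_window, key=lambda m: m[1])


def replay(bus, messages, t_start, t_end):
    """Publish the selected window in timestamp order; returns the number published."""
    window = select_window(messages, t_start, t_end)
    for topic, stamp, payload in window:
        bus.publish(topic, (stamp, payload))
    return len(window)


if __name__ == "__main__":
    recorded = [("/camera", 2.0, "frame-b"), ("/lidar", 1.5, "scan-a"),
                ("/camera", 1.0, "frame-a"), ("/camera", 5.0, "frame-c")]
    bus = TopicBus()
    bus.subscribe("/camera", lambda msg: print("simulator got", msg))
    print(replay(bus, recorded, t_start=1.0, t_end=3.0), "messages published")
```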

Flux
Open Machine Learning Stack
Apache License 2.0
http://flux-project.org
