@PaaSDev
Apache Deep Learning 201 v1.00
(For Data Engineers)
Timothy Spann
https://github.com/tspannhw/ApacheDeepLearning201/
Disclaimer
• This is my personal integration and use of Apache software, not any company's vision.
• This document may contain product features and technology directions that are under
development, may be under development in the future or may ultimately not be
developed. These are Tim’s ideas only.
• Technical feasibility, market demand, user feedback, and the Apache Software
Foundation community development process can all affect timing and final delivery.
• This document’s description of these features and technology directions does not
represent a contractual commitment, promise or obligation from Hortonworks to
deliver these features in any generally available product.
• Product features and technology directions are subject to change, and must not be
included in contracts, purchase orders, or sales agreements of any kind.
• Since this document contains an outline of general product development plans,
customers should not rely upon it when making a purchase decision.
There are some who call him...
DZone Zone Leader and Big Data MVB;
Princeton Future of Data Meetup
https://github.com/tspannhw
https://community.hortonworks.com/users/9304/tspann.html
https://dzone.com/users/297029/bunkertor.html
https://www.meetup.com/futureofdata-princeton/
Hadoop {Submarine} Project: Running deep learning workloads on YARN,
Tim Spann (Cloudera)
IoT Edge Processing with Apache MiNiFi and Multiple Deep Learning Libraries
Deep Learning for Big Data Engineers
Multiple users, frameworks, languages, devices, data sources & clusters
BIG DATA ENGINEER
• Experience in ETL
• Coding skills in Scala,
Python, Java
• Experience with Apache
Hadoop
• Knowledge of database
query languages such as
SQL
• Knowledge of Hadoop tools
such as Hive, or Pig
• Expert in ETL (Eating, Ties
and Laziness)
• Social Media Maven
• Deep SME in Buzzwords
• No Coding Skills
• Interest in Pig and Falcon
CAT AI
• Will Drive your Car
• Will Fix Your Code
• Will Beat You At Q-Bert
• Will Not Be Discussed
Today
• Will Not Finish This Talk For
Me, This Time
http://gluon.mxnet.io/chapter01_crashcourse/preface.html
Why Apache NiFi?
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull
models
• Hundreds of processors
• Visual command and
control
• Over 200 sources
• Flow templates
• Pluggable/multi-role
security
• Designed for extension
• Clustering
• Version Control
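The buffering and queuing bullets above can be illustrated with a small toy model. This is only a sketch of the general idea of backpressure combined with prioritized queuing, written against Python's standard library; it is not how NiFi implements its connections, and the class name `BoundedPriorityConnection` and its methods are hypothetical.

```python
import queue


class BoundedPriorityConnection:
    """Toy model of a flow connection: bounded (backpressure) and prioritized."""

    def __init__(self, max_items):
        # maxsize makes the queue refuse new items once it is full
        self._queue = queue.PriorityQueue(maxsize=max_items)
        self._counter = 0  # tie-breaker so equal priorities keep arrival order

    def offer(self, item, priority):
        """Try to enqueue; return False when full so the producer can back off."""
        try:
            self._queue.put_nowait((priority, self._counter, item))
        except queue.Full:
            return False
        self._counter += 1
        return True

    def poll(self):
        """Return the item with the lowest priority number, or None when empty."""
        try:
            return self._queue.get_nowait()[2]
        except queue.Empty:
            return None
```

A producer that gets `False` back from `offer` would slow down or retry later, which is the essence of backpressure; lower priority numbers are served first.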
Aggregate all the Data!
Sensors, Drones, logs,
Geo-location devices
Photos, Images,
Results from running predictions on
Pre-trained models.
Collect: Bring Together
Mediate point-to-point and
Bidirectional data flows
Delivering data reliably to and from
Apache HBase, Druid, Apache Phoenix, Apache
Hive, Impala, Kudu, HDFS, Slack and Email.
Conduct: Mediate the Data Flow
Orchestrate, parse, merge, aggregate, filter, join,
transform, fork
Query, sort, dissect, store, enrich with weather, location,
Sentiment analysis, image analysis, object detection,
image recognition, …
Curate: Gain Insights
• Cloud ready
• Python, C++, Scala, R, Julia, Matlab, MXNet.js and Perl Support
• Experienced team (XGBoost)
• AWS, Microsoft, NVIDIA, Baidu, Intel
• Apache Incubator Project
• Run distributed on YARN and Spark
• In my early tests, faster than TensorFlow. (Try this yourself)
• Runs on Raspberry Pi, NVIDIA Jetson TX1 and other constrained devices
https://mxnet.incubator.apache.org/how_to/cloud.html
https://github.com/apache/incubator-mxnet/tree/1.3.1/example
https://gluon-cv.mxnet.io/api/model_zoo.html
• Great documentation
• Crash Course
• Gluon (Open API), GluonCV, GluonNLP
• Keras (One API Many Runtime Options)
• Great Python Interaction. Java and Scala APIs!
• Open Source Model Server Available
• ONNX (Open Neural Network Exchange Format) Support for AI Models
• Now in Version 1.4.0!
• Rich Model Zoo!
• Math Kernel Library and NVidia CUDA Optimizations
• TensorBoard compatible
http://mxnet.incubator.apache.org/
http://gluon.mxnet.io/
https://onnx.ai/
https://gluon-nlp.mxnet.io/
pip3.6 install -U keras-mxnet
pip3.6 install --upgrade mxnet
pip3.6 install gluonnlp
pip3.6 install gluoncv
pip3.6 install "mxnet-mkl>=1.3.0" --upgrade
Apache MXNet GluonCV Zoo
https://gluon-cv.mxnet.io/model_zoo/classification.html
• ResNet152_v2
• MobileNetV2_0.25
• VGG19_bn
• SqueezeNet1.1
• DenseNet201
• Darknet53
• InceptionV3
• CIFAR_ResNeXt29_16x64
• yolo3_darknet53_voc
• ssd_512_mobilenet1.0_coco
• faster_rcnn_resnet101_v1d_coco
• yolo3_darknet53_coco
• FCN model on PASCAL VOC
• Apache MXNet Running in Apache Zeppelin Notebooks
• Apache MXNet Running on YARN 3.1 In Hadoop 3.1 In Dockerized
Containers
• Apache MXNet Running on YARN
Apache NiFi Integration with Apache Hadoop Options
https://community.hortonworks.com/articles/176789/apache-deep-learning-101-using-apache-mxnet-in-apa.html
https://community.hortonworks.com/articles/174399/apache-deep-learning-101-using-apache-mxnet-on-apa.html
https://www.slideshare.net/Hadoop_Summit/deep-learning-on-yarn-running-distributed-tensorflow-etc-on-hadoop-cluster-v3
Object Detection: GluonCV YOLO v3 and Apache NiFi
https://community.hortonworks.com/articles/222367/using-apache-nifi-with-apache-mxnet-gluoncv-for-yo.html
Object Detection: Faster RCNN with GluonCV
net = gcv.model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True)
Faster RCNN model trained on Pascal VOC dataset with
ResNet-50 backbone
https://gluon-cv.mxnet.io/api/model_zoo.html
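The one-line model load on this slide can be extended into a full detection pass. The sketch below follows the pattern of the GluonCV Faster RCNN demo; the 0.5 score threshold and the helper names `keep_confident` and `detect` are assumptions for illustration and are not part of the talk.

```python
def keep_confident(class_ids, scores, boxes, class_names, threshold=0.5):
    """Return (label, score, box) tuples whose score meets the threshold.

    Inputs are plain sequences with one entry per detection. GluonCV pads
    its outputs with -1 class ids, so negative ids are skipped.
    """
    results = []
    for cid, score, box in zip(class_ids, scores, boxes):
        if cid < 0 or score < threshold:
            continue
        results.append((class_names[int(cid)], float(score), [float(v) for v in box]))
    return results


def detect(image_path, threshold=0.5):
    """Run the pretrained Faster RCNN model on one image (weights download on first use)."""
    # Imported here so keep_confident can be used without MXNet installed.
    from gluoncv import data, model_zoo

    net = model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True)
    # load_test resizes and normalizes the image and returns (network input, resized image)
    x, _ = data.transforms.presets.rcnn.load_test(image_path)
    class_ids, scores, boxes = net(x)
    # Outputs are batched; take the first (only) image and flatten ids and scores.
    return keep_confident(
        class_ids[0].asnumpy().ravel(),
        scores[0].asnumpy().ravel(),
        boxes[0].asnumpy(),
        net.classes,
        threshold,
    )
```

Calling `detect` on a local image path would return a list of labelled boxes that is easy to serialize as JSON for a downstream Apache NiFi flow.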
Instance Segmentation: Mask RCNN with GluonCV
net = model_zoo.get_model('mask_rcnn_resnet50_v1b_coco', pretrained=True)
Mask RCNN model trained on COCO dataset with ResNet-50 backbone
https://gluon-cv.mxnet.io/build/examples_instance/demo_mask_rcnn.html
https://arxiv.org/abs/1703.06870
https://github.com/matterport/Mask_RCNN
Semantic Segmentation: DeepLabV3 with GluonCV
model = gluoncv.model_zoo.get_model('deeplab_resnet101_ade', pretrained=True)
GluonCV DeepLabV3 model on ADE20K dataset
https://gluon-cv.mxnet.io/build/examples_segmentation/demo_deeplab.html
run1.sh demo_deeplab_webcam.py
http://groups.csail.mit.edu/vision/datasets/ADE20K/ https://arxiv.org/abs/1706.05587
https://www.cityscapes-dataset.com/
This one is a bit slower.
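For readers who want to go from the model load to an actual mask, here is a minimal sketch modelled on the GluonCV DeepLab demo linked on this slide. Running on CPU and the helper names `segment` and `class_pixel_counts` are assumptions made for illustration.

```python
from collections import Counter


def class_pixel_counts(mask):
    """Count pixels per class id in a 2-D mask (nested lists or a NumPy array)."""
    counts = Counter()
    for row in mask:
        for value in row:
            counts[int(value)] += 1
    return dict(counts)


def segment(image_path):
    """Return a 2-D array of ADE20K class ids for one image (weights download on first use)."""
    # Imported here so class_pixel_counts can be used without MXNet installed.
    import mxnet as mx
    import gluoncv
    from gluoncv.data.transforms.presets.segmentation import test_transform

    ctx = mx.cpu(0)  # assumption: CPU inference; switch to mx.gpu(0) if available
    img = mx.image.imread(image_path)
    img = test_transform(img, ctx)
    model = gluoncv.model_zoo.get_model('deeplab_resnet101_ade', pretrained=True)
    output = model.predict(img)
    # argmax over the class axis gives one class id per pixel
    return mx.nd.squeeze(mx.nd.argmax(output, 1)).asnumpy()
```

Passing the result of `segment` to `class_pixel_counts` gives a compact per-class summary that is cheaper to ship through a data flow than the full mask.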
Semantic Segmentation: Fully Convolutional Networks
model = gluoncv.model_zoo.get_model('fcn_resnet101_voc', pretrained=True)
GluonCV FCN model on PASCAL VOC dataset
https://gluon-cv.mxnet.io/build/examples_segmentation/demo_fcn.html
run1.sh demo_fcn_webcam.py
https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
Simple Pose Estimation
https://gluon-cv.mxnet.io/build/examples_pose/cam_demo.html
pip3.6 install gluoncv --pre --upgrade
https://github.com/dmlc/gluon-cv/tree/master/scripts/pose/simple_pose
yolo3_mobilenet1.0_coco + simple_pose_resnet18_v1b
Apache MXNet Model Server from Apache NiFi
https://community.hortonworks.com/articles/223916/posting-images-with-apache-nifi-17-and-a-custom-pr.html
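What NiFi does when it posts an image to the model server can be approximated from Python. The sketch below targets the MXNet Model Server `POST /predictions/{model_name}` inference endpoint; the base URL, the model name passed in, and the `application/octet-stream` content type are assumptions, and a given model handler may expect a different payload.

```python
import urllib.request


def build_prediction_request(image_bytes, model_name, base_url="http://127.0.0.1:8080"):
    """Build (but do not send) a POST request carrying raw image bytes."""
    url = f"{base_url.rstrip('/')}/predictions/{model_name}"
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},  # assumed content type
        method="POST",
    )


def predict(image_path, model_name, base_url="http://127.0.0.1:8080", timeout=30):
    """Send one image to a running model server and return the raw response body."""
    with open(image_path, "rb") as f:
        request = build_prediction_request(f.read(), model_name, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping request construction separate from sending makes it easy to check the URL and method without a server running.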
Apache MXNet Native Processor for Apache NiFi
This is a beta, community release by me using the new beta Java API for Apache MXNet.
https://github.com/tspannhw/nifi-mxnetinference-processor
https://community.hortonworks.com/articles/229215/apache-nifi-processor-for-apache-mxnet-ssd-single.html
https://www.youtube.com/watch?v=Q4dSGPvq
Edge Intelligence with Apache NiFi Subproject - MiNiFi
⬢ Guaranteed delivery
⬢ Data buffering
‒ Backpressure
‒ Pressure release
⬢ Prioritized queuing
⬢ Flow specific QoS
‒ Latency vs. throughput
‒ Loss tolerance
⬢ Data provenance
⬢ Recovery / recording a rolling
log of fine-grained history
⬢ Designed for extension
⬢ Java or C++ Agent
Different from Apache NiFi
⬢ Design and Deploy
⬢ Warm re-deploys
Key
Features
Apache MXNet Running on Edge Nodes (MiNiFi)
https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
https://github.com/tspannhw/OpenSourceComputerVision
https://github.com/tspannhw/ApacheDeepLearning101
https://github.com/tspannhw/mxnet-for-iot
Multiple IoT Devices with Apache NiFi and Apache MXNet
https://community.hortonworks.com/articles/203638/ingesting-multiple-iot-devices-with-apache-nifi-17.html
Using Apache MXNet on The Edge with Sensors and Intel Movidius
(MiNiFi)
https://community.hortonworks.com/articles/176932/apache-deep-learning-101-using-apache-mxnet-on-the.html
https://community.hortonworks.com/articles/146704/edge-analytics-with-nvidia-jetson-tx1-running-apac.html
Using Apache MXNet on The Edge with Sensors and Google Coral (MiNiFi)
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html
Storage Platform: HDFS in Apache Hadoop 3.1
Compute & GPU Platform: YARN in Apache Hadoop 3.1
HBase 2.0
Security & Governance: Atlas 1.0, Ranger 1.0, Knox 1.0
Hive 3.0, Spark 2.3, Phoenix 0.8
Operations: Ambari 2.7
Open Source Hadoop 3.1
Apache MXNet on Apache YARN 3.1 Native No Spark
yarn jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar \
  -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar \
  -shell_command python3.6 \
  -shell_args "/opt/demo/analyzex.py /opt/images/cat.jpg" \
  -container_resources memory-mb=512,vcores=1
Uses: Python Any
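When the same job has to be launched for many images, it can help to assemble the command programmatically. The following is a minimal sketch that rebuilds the distributed shell invocation shown on this slide; the function names `build_dshell_command` and `submit` are hypothetical, and the jar path is simply the one used above.

```python
DSHELL_JAR = "/usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar"


def build_dshell_command(script, script_args, memory_mb=512, vcores=1, interpreter="python3.6"):
    """Build the argument list for a YARN distributed shell job that runs a Python script."""
    # The script and its arguments travel as one -shell_args value, as on the slide.
    shell_args = " ".join([script] + list(script_args))
    return [
        "yarn", "jar", DSHELL_JAR,
        "-jar", DSHELL_JAR,
        "-shell_command", interpreter,
        "-shell_args", shell_args,
        "-container_resources", f"memory-mb={memory_mb},vcores={vcores}",
    ]


def submit(command):
    """Submit the job; requires the yarn CLI on PATH and a reachable cluster."""
    import subprocess

    return subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.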
Apache MXNet on Apache YARN 3.1 Native No Spark
https://community.hortonworks.com/content/kbentry/222242/running-apache-mxnet-deep-learning-on-yarn-31-hdp.html
https://github.com/tspannhw/ApacheDeepLearning101/blob/master/analyzehdfs.py
Apache MXNet on YARN 3.2 in Docker Using “Submarine”
https://github.com/apache/hadoop/tree/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine
yarn jar hadoop-yarn-applications-submarine-<version>.jar job run \
  --name xyz-job-001 --docker_image <your docker image> \
  --input_path hdfs://default/dataset/cifar-10-data \
  --checkpoint_path hdfs://default/tmp/cifar-10-jobdir \
  --num_workers 1 \
  --worker_resources memory=8G,vcores=2,gpu=2 \
  --worker_launch_cmd "shell for Apache MXNet"
Wangda Tan
(wangda@apache.org)
Hadoop {Submarine} Project: Running deep learning workloads on YARN
https://issues.apache.org/jira/browse/YARN-8135

Apache Deep Learning 201 - Barcelona DWS March 2019

  • 1. @PaaSDev Apache Deep Learning 201 v1.00 (For Data Engineers) Timothy Spann https://github.com/tspannhw/ApacheDeepLearning201/
  • 2. @PaaSDev Disclaimer • This is my personal integration and use of Apache software, no companies vision. • This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed. This is Tim’s ideas only. • Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all effect timing and final delivery. • This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. • Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. • Since this document contains an outline of general product development plans, customers should not rely upon it when making a purchase decision.
  • 3. @PaaSDev There are some who call him... DZone Zone Leader and Big Data MVB; Princeton Future of Data Meetup https://github.com/tspannhw https://community.hortonworks.com/users/9304/tspann.html https://dzone.com/users/297029/bunkertor.html https://www.meetup.com/futureofdata-princeton/
  • 4. @PaaSDev Hadoop {Submarine} Project: Running deep learning workloads on YARN, Tim Spann (Cloudera)
  • 7. @PaaSDev IoT Edge Processing with Apache MiNiFi and Multiple Deep Learning Libraries
  • 8. @PaaSDev Deep Learning for Big Data Engineers Multiple users, frameworks, languages, devices, data sources & clusters BIG DATA ENGINEER • Experience in ETL • Coding skills in Scala, Python, Java • Experience with Apache Hadoop • Knowledge of database query languages such as SQL • Knowledge of Hadoop tools such as Hive or Pig • Expert in ETL (Eating, Ties and Laziness) • Social Media Maven • Deep SME in Buzzwords • No Coding Skills • Interest in Pig and Falcon CAT AI • Will Drive your Car • Will Fix Your Code • Will Beat You At Q-Bert • Will Not Be Discussed Today • Will Not Finish This Talk For Me, This Time http://gluon.mxnet.io/chapter01_crashcourse/preface.html
  • 11. @PaaSDev Why Apache NiFi? • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Supports push and pull models • Hundreds of processors • Visual command and control • Over 200 sources • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering • Version Control
  • 12. @PaaSDev Aggregate all the Data! Sensors, drones, logs, geo-location devices, photos, images, results from running predictions on pre-trained models. Collect: Bring Together
  • 13. @PaaSDev Mediate point-to-point and bidirectional data flows, delivering data reliably to and from Apache HBase, Druid, Apache Phoenix, Apache Hive, Impala, Kudu, HDFS, Slack and Email. Conduct: Mediate the Data Flow
  • 14. @PaaSDev Orchestrate, parse, merge, aggregate, filter, join, transform, fork, query, sort, dissect, store, enrich with weather, location, sentiment analysis, image analysis, object detection, image recognition, … Curate: Gain Insights
  • 15. @PaaSDev • Cloud ready • Python, C++, Scala, R, Julia, Matlab, MXNet.js and Perl Support • Experienced team (XGBoost) • AWS, Microsoft, NVIDIA, Baidu, Intel • Apache Incubator Project • Run distributed on YARN and Spark • In my early tests, faster than TensorFlow. (Try this yourself) • Runs on Raspberry Pi, NVIDIA Jetson TX1 and other constrained devices https://mxnet.incubator.apache.org/how_to/cloud.html https://github.com/apache/incubator-mxnet/tree/1.3.1/example https://gluon-cv.mxnet.io/api/model_zoo.html
  • 16. @PaaSDev • Great documentation • Crash Course • Gluon (Open API), GluonCV, GluonNLP • Keras (One API Many Runtime Options) • Great Python Interaction. Java and Scala APIs! • Open Source Model Server Available • ONNX (Open Neural Network Exchange Format) Support for AI Models • Now in Version 1.4.0! • Rich Model Zoo! • Math Kernel Library and NVIDIA CUDA Optimizations • TensorBoard compatible http://mxnet.incubator.apache.org/ http://gluon.mxnet.io/ https://onnx.ai/ pip3.6 install -U keras-mxnet https://gluon-nlp.mxnet.io/ pip3.6 install --upgrade mxnet pip3.6 install gluonnlp pip3.6 install gluoncv pip3.6 install "mxnet-mkl>=1.3.0" --upgrade
  • 17. @PaaSDev Apache MXNet GluonCV Zoo https://gluon-cv.mxnet.io/model_zoo/classification.html • ResNet152_v2 • MobileNetV2_0.25 • VGG19_bn • SqueezeNet1.1 • DenseNet201 • Darknet53 • InceptionV3 • CIFAR_ResNeXt29_16x64 • yolo3_darknet53_voc • ssd_512_mobilenet1.0_coco • faster_rcnn_resnet101_v1d_coco • yolo3_darknet53_coco • FCN model on PASCAL VOC
  • 18. @PaaSDev • Apache MXNet Running in Apache Zeppelin Notebooks • Apache MXNet Running on YARN 3.1 In Hadoop 3.1 In Dockerized Containers • Apache MXNet Running on YARN Apache NiFi Integration with Apache Hadoop Options https://community.hortonworks.com/articles/176789/apache-deep-learning-101-using-apache-mxnet-in-apa.html https://community.hortonworks.com/articles/174399/apache-deep-learning-101-using-apache-mxnet-on-apa.html https://www.slideshare.net/Hadoop_Summit/deep-learning-on-yarn-running-distributed-tensorflow-etc-on-hadoop-cluster-v3
  • 19. @PaaSDev Object Detection: GluonCV YOLO v3 and Apache NiFi https://community.hortonworks.com/articles/222367/using-apache-nifi-with-apache-mxnet-gluoncv-for-yo.html
  • 20. @PaaSDev Object Detection: Faster RCNN with GluonCV net = gcv.model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True) Faster RCNN model trained on Pascal VOC dataset with ResNet-50 backbone https://gluon-cv.mxnet.io/api/model_zoo.html
  • 21. @PaaSDev Instance Segmentation: Mask RCNN with GluonCV net = model_zoo.get_model('mask_rcnn_resnet50_v1b_coco', pretrained=True) Mask RCNN model trained on COCO dataset with ResNet-50 backbone https://gluon-cv.mxnet.io/build/examples_instance/demo_mask_rcnn.html https://arxiv.org/abs/1703.06870 https://github.com/matterport/Mask_RCNN
  • 22. @PaaSDev Semantic Segmentation: DeepLabV3 with GluonCV model = gluoncv.model_zoo.get_model('deeplab_resnet101_ade', pretrained=True) GluonCV DeepLabV3 model on ADE20K dataset https://gluon-cv.mxnet.io/build/examples_segmentation/demo_deeplab.html run1.sh demo_deeplab_webcam.py http://groups.csail.mit.edu/vision/datasets/ADE20K/ https://arxiv.org/abs/1706.05587 https://www.cityscapes-dataset.com/ This one is a bit slower.
  • 23. @PaaSDev Semantic Segmentation: Fully Convolutional Networks model = gluoncv.model_zoo.get_model('fcn_resnet101_voc', pretrained=True) GluonCV FCN model on PASCAL VOC dataset https://gluon-cv.mxnet.io/build/examples_segmentation/demo_fcn.html run1.sh demo_fcn_webcam.py https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
  • 24. @PaaSDev Simple Pose Estimation https://gluon-cv.mxnet.io/build/examples_pose/cam_demo.html pip3.6 install gluoncv --pre --upgrade https://github.com/dmlc/gluon-cv/tree/master/scripts/pose/simple_pose yolo3_mobilenet1.0_coco + simple_pose_resnet18_v1b
  • 25. @PaaSDev Apache MXNet Model Server from Apache NiFi https://community.hortonworks.com/articles/223916/posting-images-with-apache-nifi-17-and-a-custom-pr.html
  • 26. @PaaSDev Apache MXNet Native Processor for Apache NiFi This is a beta, community release by me using the new beta Java API for Apache MXNet. https://github.com/tspannhw/nifi-mxnetinference-processor https://community.hortonworks.com/articles/229215/apache-nifi-processor-for-apache-mxnet-ssd-single.html https://www.youtube.com/watch?v=Q4dSGPvq
  • 27. @PaaSDev Edge Intelligence with Apache NiFi Subproject - MiNiFi Key Features: ⬢ Guaranteed delivery ⬢ Data buffering ‒ Backpressure ‒ Pressure release ⬢ Prioritized queuing ⬢ Flow specific QoS ‒ Latency vs. throughput ‒ Loss tolerance ⬢ Data provenance ⬢ Recovery / recording a rolling log of fine-grained history ⬢ Designed for extension ⬢ Java or C++ Agent Different from Apache NiFi: ⬢ Design and Deploy ⬢ Warm re-deploys
  • 28. @PaaSDev Apache MXNet Running on Edge Nodes (MiNiFi) https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html https://github.com/tspannhw/OpenSourceComputerVision https://github.com/tspannhw/ApacheDeepLearning101 https://github.com/tspannhw/mxnet-for-iot
  • 29. @PaaSDev Multiple IoT Devices with Apache NiFi and Apache MXNet https://community.hortonworks.com/articles/203638/ingesting-multiple-iot-devices-with-apache-nifi-17.html
  • 30. @PaaSDev Using Apache MXNet on The Edge with Sensors and Intel Movidius (MiNiFi) https://community.hortonworks.com/articles/176932/apache-deep-learning-101-using-apache-mxnet-on-the.html https://community.hortonworks.com/articles/146704/edge-analytics-with-nvidia-jetson-tx1-running-apac.html
  • 31. @PaaSDev Using Apache MXNet on The Edge with Sensors and Google Coral (MiNiFi) https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html
  • 32. @PaaSDev Storage Platform: HDFS in Apache Hadoop 3.1 Compute & GPU Platform: YARN in Apache Hadoop 3.1 HBase 2.0 Security & Governance: Atlas 1.0, Ranger 1.0, Knox 1.0 Hive 3.0 Spark 2.3 Phoenix 0.8 Operations: Ambari 2.7 Open Source Hadoop 3.1
  • 33. @PaaSDev Apache MXNet on Apache YARN 3.1 Native No Spark yarn jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command python3.6 -shell_args "/opt/demo/analyzex.py /opt/images/cat.jpg" -container_resources memory-mb=512,vcores=1 Uses: Python Any
  • 34. @PaaSDev Apache MXNet on Apache YARN 3.1 Native No Spark https://community.hortonworks.com/content/kbentry/222242/running-apache-mxnet-deep-learning-on-yarn-31-hdp.html https://github.com/tspannhw/ApacheDeepLearning101/blob/master/analyzehdfs.py
  • 35. @PaaSDev Apache MXNet on YARN 3.2 in Docker Using “Submarine” https://github.com/apache/hadoop/tree/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine yarn jar hadoop-yarn-applications-submarine-<version>.jar job run --name xyz-job-001 --docker_image <your docker image> --input_path hdfs://default/dataset/cifar-10-data --checkpoint_path hdfs://default/tmp/cifar-10-jobdir --num_workers 1 --worker_resources memory=8G,vcores=2,gpu=2 --worker_launch_cmd "shell for Apache MXNet" Wangda Tan (wangda@apache.org) Hadoop {Submarine} Project: Running deep learning workloads on YARN https://issues.apache.org/jira/browse/YARN-8135
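
The Submarine invocation on slide 35 is easier to reuse when it is assembled as an argument list rather than one long shell string. The following is a minimal Python sketch under stated assumptions: the function name `submarine_job_command` is hypothetical, the flags and sample values are copied from the slide, and the `<version>` and `<your docker image>` placeholders are left unfilled on purpose. It only builds and prints the command. It does not submit a job.

```python
import shlex


def submarine_job_command(jar_path, name, docker_image, input_path,
                          checkpoint_path, num_workers, worker_resources,
                          worker_launch_cmd):
    """Build the argument list for the `yarn jar ... job run` call from slide 35.

    Hypothetical helper for illustration only; it uses just the flags that
    appear on the slide and does not validate them against any Hadoop release.
    """
    return [
        "yarn", "jar", jar_path, "job", "run",
        "--name", name,
        "--docker_image", docker_image,
        "--input_path", input_path,
        "--checkpoint_path", checkpoint_path,
        "--num_workers", str(num_workers),
        "--worker_resources", worker_resources,
        "--worker_launch_cmd", worker_launch_cmd,
    ]


cmd = submarine_job_command(
    # Placeholders kept exactly as on the slide; fill them in for a real cluster.
    jar_path="hadoop-yarn-applications-submarine-<version>.jar",
    name="xyz-job-001",
    docker_image="<your docker image>",
    input_path="hdfs://default/dataset/cifar-10-data",
    checkpoint_path="hdfs://default/tmp/cifar-10-jobdir",
    num_workers=1,
    worker_resources="memory=8G,vcores=2,gpu=2",
    worker_launch_cmd="shell for Apache MXNet",
)

# shlex.join quotes arguments containing spaces or shell metacharacters,
# so the printed line can be pasted into a shell once placeholders are replaced.
print(shlex.join(cmd))
```

Keeping the command as a list means arguments with spaces, such as the worker launch command, stay intact if the list is later handed to `subprocess.run(cmd)` on a machine where the `yarn` client is installed.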