End-to-End Big Data AI with Analytics Zoo
Jason Dai – Sr. Principal Engineer
Fadi Zuhayri – Sr. Director
Intel Architecture, Graphics & Software
Outline
• Intel’s Transformation for Intelligence Era
• Analytics Zoo: Software Platform for Big Data AI
• Building Big Data AI Applications on Analytics Zoo
Technology-Led Disruptions: Compute Democratization
[Figure: compute growth on a log scale (10^2, 10^4, 10^9, 10^15, 10^18), 1980 to 2040]
• PC era: digitize everything, network everything; 1 billion internet-connected devices
• Mobile + cloud era: mobile everything, cloud everything; 10 billion cloud-connected devices
• Intelligence era: 100 billion intelligent connected devices
• Exascale for everyone
Industry inflections are fueling the growth of data
• Intelligent Edge
• Artificial Intelligence
• 5G Network Transformation
• Cloudification
Unleashing the Potential of Data
• Move Faster
• Store More
• Process Everything
Software & System Level Optimized
Intel® Silicon Photonics, Intel® Ethernet, Intel® Tofino
Analytics & AI Strategy
• Software: optimized software, E2E data science, unified APIs
• Hardware: CPU infused with AI, flexible acceleration, optimized platform
• Ecosystem: a thriving community, intelligent solutions, innovation & investment
XPU: DIVERSE INTEL HW PORTFOLIO – from edge to cloud
• CPU: general compute, AI inference & training
• Xe GPU: HPC & AI
• Habana: AI training & inference
• Movidius: low power vision
• GNA: low power NLP
Foundation for AI: CPU Infused with AI
More built-in AI acceleration & optimized topologies with each new gen
• 2017, 1st Gen: Intel® Advanced Vector Extensions 512 (Intel AVX-512)
• 2019, 2nd Gen: Intel Deep Learning Boost (with VNNI)
• 2020, 3rd Gen: Intel Deep Learning Boost (VNNI, BF16)
• 2021, Next Gen: Intel Deep Learning Boost (AMX)
AI performance: 24, 44, 100+, … optimized topologies
Optimized libraries and frameworks
[Figure: oneAPI specification timeline, 2020 to 2021 by quarter: industry initiative announced, then Spec. 0.6, 0.7, 0.8, 0.9 and 1.0; more soon]
Learn more at oneapi.com
Intel® oneAPI Product (Gold, available December 2020)
• Applications & Services
• Middleware, Frameworks & Runtimes
• Languages, Libraries, Compatibility Tool, Analysis & Debug Tools
• Hardware Abstraction Layer
• XPUs: CPU, GPU, FPGA, ...
Get started quickly: code samples, quick-start guides, webinars, training
software.intel.com/oneapi
Run locally: Downloads, Repositories, Containers
Run in the cloud: DevCloud
• One minute to code
• No hardware acquisition
• No download, install or configuration
• Support for Jupyter Notebooks, VS Code
• Easy access to samples and tutorials
ECOSYSTEM OF AI SOFTWARE STACK
For data scientists & data analysts
• AI/analytics solutions: Model Zoo, Analytics Zoo, OpenVINO™
• DL/ML/big data frameworks: TensorFlow, PyTorch, MXNet, TVM, Python/Numba, Spark (SQL + ML/DL, scale out), Modin, NumPy, XGBoost, scikit-learn, pandas
• Libraries & compilers: oneDNN, oneDAL, oneCCL
• HW (XPU): CPU, GPU, accelerators
Pervasive Analytics & AI
• Agriculture: achieving higher yields and efficiency
• Energy: increasing production and uptime
• Education: transforming learning
• Government: enhancing safety
• Finance: turning data into value
• Healthcare: revolutionizing patient outcomes
• Industrial: empowering industry 4.0
• Media: creating thrilling experiences
• Retail: modernizing shopping
• Smart Home: enabling homes to see, hear & respond
• Telecom: driving network efficiency
• Transport: fueling automated driving
intel.com/customerspotlight
Big Data AI Open-Source Software Platform
Distributed TensorFlow, PyTorch, Keras, BigDL, Ray, and Apache Spark
Reference Use Cases, AI Models, High-level APIs, Feature Engineering, etc.
https://github.com/intel-analytics/analytics-zoo
AI on Big Data
Simplifying End-to-End Big Data AI Solutions Development
Agenda
• Big Data AI
• Analytics Zoo overview
• Summary
Seamless Scaling from Laptop to Distributed Big Data
AI on Big Data
• BigDL: distributed, high-performance deep learning framework for Apache Spark
  https://github.com/intel-analytics/bigdl
• Analytics Zoo: unified Big Data AI platform for TensorFlow, PyTorch, Keras, BigDL, OpenVINO, Ray and Apache Spark
  https://github.com/intel-analytics/analytics-zoo
Transformation of Big Data
• Storing and processing more data
• Analyzing (querying) more data
• Real-time analysis
• Modelling and prediction (ML/DL)
AI is everywhere
• Moving from experimentation to production
• Applying to large-scale, distributed Big Data
Big Data AI
Case Study: Image Feature Extraction at JD.com
[Figure: query image and search results]
Source: “Bringing deep learning into big data analytics using BigDL”, Xianyan Jia and Zhenhua Wang, Strata Data Conference Singapore 2017
Image feature extraction applications:
• Similar image search
• Image deduplication
* https://software.intel.com/en-us/articles/building-large-scale-image-feature-extraction-with-bigdl-at-jdcom
* “BigDL: A Distributed Deep Learning Framework for Big Data”, ACM SoCC 2019, https://arxiv.org/abs/1804.05839
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
• End-to-end Big Data AI pipeline (using BigDL on Apache Spark)
• Efficiently scales out (3.83x speed-up vs. Nvidia GPU servers)*
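The two applications in this case study, similar image search and image deduplication, both come down to comparing the feature vectors that the extraction pipeline produces. The following is a minimal sketch in plain Python, assuming the features are already available as fixed-length vectors. The cosine-similarity measure, the 0.99 threshold, the function names and the toy catalog are illustrative assumptions, not details of the JD.com deployment.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, catalog, top_n=1):
    """Similar image search: rank catalog images by similarity to the query."""
    ranked = sorted(catalog,
                    key=lambda name: cosine_similarity(query, catalog[name]),
                    reverse=True)
    return ranked[:top_n]


def find_duplicates(catalog, threshold=0.99):
    """Deduplication: report image pairs whose features nearly coincide."""
    names = sorted(catalog)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if cosine_similarity(catalog[a], catalog[b]) >= threshold]


# Hypothetical 3-dimensional features; real features would be much longer.
catalog = {
    "shoe_front": [0.9, 0.1, 0.0],
    "shoe_copy": [0.9, 0.1, 0.0],
    "red_dress": [0.0, 0.2, 0.9],
}

matches = most_similar([0.8, 0.2, 0.1], catalog, top_n=2)
duplicates = find_duplicates(catalog)
```

At catalog scale the pairwise loop would be replaced by a distributed or approximate nearest-neighbour search; the sketch only shows the comparison itself.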
Analytics Zoo: Software Platform for Big Data AI
• Models (built-in models and algorithms)
• ML Workflow (automate tasks for building end-to-end pipelines)
• End-to-End Pipelines (seamlessly scale AI models to distributed Big Data)
• Compute Environment: Laptop, K8s Cluster, Hadoop Cluster, Cloud
• Python Libraries (Numpy/Pandas/sklearn/…), DL Frameworks (TF/PyTorch/BigDL/OpenVINO/…), Distributed Analytics (Spark/Flink/Ray/…)
Powered by oneAPI
https://github.com/intel-analytics/analytics-zoo
End-to-End Big Data Analytics and AI
Seamless Scaling from Laptop to Distributed Big Data
Big Data Pipeline
• Prototype on laptop using sample data
• Experiment on clusters with historical data
• Production deployment with distributed data pipeline
• Easily prototype end-to-end pipelines that apply AI models to big data
• “Zero” code change from laptop to distributed cluster
• Seamlessly deployed on production Hadoop/K8s clusters
• Automate the process of applying machine learning to big data
Analytics Zoo: Software Platform for Big Data AI
• Models & Algorithms: Recommendation, Time Series, Computer Vision, NLP
• ML Workflow: AutoML, Automatic Cluster Serving
• End-to-end Pipelines: Distributed TensorFlow & PyTorch on Spark, Spark Dataframes & ML Pipelines for DL, RayOnSpark, InferenceModel
• Compute Environment: Laptop, K8s Cluster, Hadoop Cluster, Cloud
• Python Libraries (Numpy/Pandas/sklearn/…), DL Frameworks (TF/PyTorch/BigDL/OpenVINO/…), Distributed Analytics (Spark/Flink/Ray/…)
Powered by oneAPI
https://github.com/intel-analytics/analytics-zoo
Distributed TensorFlow/PyTorch on Spark
Write TensorFlow/PyTorch inline with Spark code
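# Analytics Zoo API (shown in blue on the original slide): TFDataset, TFOptimizer
# Import paths follow the Analytics Zoo TFPark examples and may differ by release
import tensorflow as tf  # TensorFlow 1.x (tf.contrib is not available in 2.x)
from nets import lenet  # TF-Slim model library
from zoo.tfpark import TFDataset, TFOptimizer
from bigdl.optim.optimizer import Adam, MaxEpoch

# pyspark code
train_rdd = spark.hadoopFile(…).map(…)
dataset = TFDataset.from_rdd(train_rdd, …)

# tensorflow code
slim = tf.contrib.slim
images, labels = dataset.tensors
with slim.arg_scope(lenet.lenet_arg_scope()):
    logits, end_points = lenet.lenet(images, …)
loss = tf.reduce_mean(tf.losses.sparse_softmax_cross_entropy(
    logits=logits, labels=labels))

# distributed training on Spark
optimizer = TFOptimizer.from_loss(loss, Adam(…))
optimizer.optimize(end_trigger=MaxEpoch(5))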
Network Quality Prediction in SK Telecom
Distributed TensorFlow/PyTorch on Spark
https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
• Ray: distributed framework for emerging AI applications
• RayOnSpark
  • Directly run Ray programs on Big Data cluster
  • Integrate Ray programs into Spark data pipeline
https://medium.com/riselab/rayonspark-running-emerging-ai-applications-on-big-data-clusters-with-ray-and-analytics-zoo-923e0136ed6a
RayOnSpark
Run Ray Programs Directly on Big Data Platform
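# Analytics Zoo API (shown in blue on the original slide): init_spark_on_yarn, RayContext
import ray
from zoo import init_spark_on_yarn
from zoo.ray import RayContext  # import path may differ by release

# Start Ray on top of the Spark cluster running on YARN
sc = init_spark_on_yarn(...)
ray_ctx = RayContext(sc=sc, ...)
ray_ctx.init()

# Ray code
@ray.remote
class TestRay():
    def hostname(self):
        import socket
        return socket.gethostname()

# Create 100 actors and print the host each one runs on
actors = [TestRay.remote() for i in range(0, 100)]
print([ray.get(actor.hostname.remote()) for actor in actors])

ray_ctx.stop()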
Fast Food Recommendation in Burger King
End-to-End Training Pipeline w/ RayOnSpark
* https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d
* “Context-Aware Drive-thru Recommendation Service at Fast Food Restaurants”, https://arxiv.org/abs/2010.06197
[Figure: pipeline stages: data ingestion, feature engineering, training, inference]
Scalable AutoML for Time Series Prediction
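Automated feature generation, model selection and hyperparameter tuning
# Analytics Zoo API (shown in blue on the original slide): TimeSequencePredictor, RandomRecipe
# Import paths as in early Analytics Zoo releases; they may differ by release
from zoo.automl.regression.time_sequence_predictor import TimeSequencePredictor
from zoo.automl.config.recipe import RandomRecipe

# Search for the best pipeline, scoring each trial on the validation data
tsp = TimeSequencePredictor(
    dt_col="datetime",
    target_col="value")
pipeline = tsp.fit(train_df,
    val_df, metric="mse",
    recipe=RandomRecipe())
pipeline.predict(test_df)
https://medium.com/riselab/scalable-automl-for-time-series-prediction-using-ray-and-analytics-zoo-b79a6fd08139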
[Figure: AutoML framework for time series prediction]
• FeatureTransformer (with tunable parameters): rolling, scaling, feature generation, etc.
• Model (with tunable parameters)
• SearchEngine: takes search presets and launches trial jobs on Ray Tune; each trial runs a different combination of hyperparameters, and the search returns the best model/parameters
• Pipeline: configured with best parameters/model
• Runs on Spark + Ray
“Scalable AutoML for Time Series Forecasting using Ray”, USENIX OpML’20
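The search loop described here can be illustrated without Ray Tune. The following is a minimal sketch in plain Python, assuming a toy forecasting model (a blend of the last value and the window mean), a small hypothetical search space, and sequential random search in place of distributed trials. All function names and parameters are illustrative and are not part of the Analytics Zoo API.

```python
import random


def rolling_windows(series, lookback):
    """Feature generation: turn a series into (past window, next value) pairs."""
    return [(series[i:i + lookback], series[i + lookback])
            for i in range(len(series) - lookback)]


def forecast(window, weight_recent):
    """Toy model: blend the most recent value with the window mean."""
    mean = sum(window) / len(window)
    return weight_recent * window[-1] + (1.0 - weight_recent) * mean


def mse(series, lookback, weight_recent):
    """Mean squared error of the toy model over all rolling windows."""
    pairs = rolling_windows(series, lookback)
    errors = [(forecast(w, weight_recent) - y) ** 2 for w, y in pairs]
    return sum(errors) / len(errors)


def random_search(series, num_trials, seed=0):
    """Each trial evaluates one random hyperparameter combination."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_trials):
        config = {"lookback": rng.choice([2, 4, 8]),
                  "weight_recent": rng.choice([0.0, 0.5, 1.0])}
        trials.append((mse(series, **config), config))
    # Keep the trial with the lowest validation error.
    return min(trials, key=lambda trial: trial[0])


val_series = [float(i % 10) for i in range(60)]  # synthetic sawtooth data
best_score, best_config = random_search(val_series, num_trials=20)
```

The trials are independent of one another, which is what lets a scheduler such as Ray Tune run them in parallel across a cluster.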
TI-One ML Platform in Tencent Cloud
Scalable AutoML for Time Series Prediction
Using Analytics Zoo in Tencent Cloud TI-One ML Platform
Predicting NYC Taxi Passengers Using AutoML
https://software.intel.com/content/www/us/en/develop/articles/tencent-cloud-leverages-analytics-zoo-to-improve-performance-of-ti-one-ml-platform.html
“Zouwu”
Open Source Framework for Time Series on Analytics Zoo
Application framework for building end-to-end time series analysis
• Use case - reference time series use cases
• Models - built-in models for time series analysis
• AutoTS - AutoML support for building E2E time series analysis pipelines
[Figure: Project Zouwu (use-case, models, autots) on top of Analytics Zoo built-in models, ML workflow (AutoML workflow) and end-to-end pipelines]
https://github.com/intel-analytics/analytics-zoo/tree/master/pyzoo/zoo/zouwu
“Project Zouwu: Scalable AutoML for Telco Time Series Analysis using Ray and Analytics Zoo”, Ray Summit 2020
Analytics Zoo: Software Platform for Big Data AI
• E2E Big Data & AI pipeline (distributed TF/PyTorch/OpenVINO/Ray on Spark)
• Advanced AI workflow (AutoML, Time-Series, Cluster Serving, etc.)
GitHub
• Project repo: https://github.com/intel-analytics/analytics-zoo
• Use cases: https://analytics-zoo.github.io/master/#powered-by/
Technical paper/tutorials
• CVPR 2020 tutorial: https://jason-dai.github.io/cvpr2020/
• ACM SoCC 2019 paper: https://arxiv.org/abs/1804.05839
• AAAI 2019 tutorial: https://jason-dai.github.io/aaai2019/
• CVPR 2018 tutorial: https://jason-dai.github.io/cvpr2018/
Conclusion
Big Data AI Applications on Analytics Zoo
• Recommendation
• Time series analysis
• Computer vision
• Natural language processing (NLP)
Food Recommendation Using Analytics Zoo in Burger King
[Figure: drive-thru ordering flow at the ODMB menu board]
• Guest arrives
• Checks menu board
• Cashier enters order
• Checks menu board
• Guest completes order
* https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d
* “Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King”, Data + AI Summit Europe 2020
Food Recommendation Challenges
• Lack of user identifiers
• Same-session food compatibilities
• Other variables in our use case: location, weather, time, etc.
• Deployment challenges
Transformer Cross Transformer (TxT) Model
Model Components
• Sequence Transformer
• Taking item order sequence as input
• Context Transformer
• Taking multiple context features as input
• Latent Cross Joint Training
• Element-wise product for both transformer
outputs
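The latent cross step above combines the two transformer outputs with an element-wise product. A minimal sketch in plain Python, assuming both transformers emit vectors of the same length; the function name and the example values are hypothetical:

```python
def latent_cross(sequence_output, context_output):
    """Element-wise product of the two transformer output vectors."""
    if len(sequence_output) != len(context_output):
        raise ValueError("both outputs must have the same dimension")
    return [s * c for s, c in zip(sequence_output, context_output)]


# Hypothetical 3-dimensional outputs of the sequence and context transformers.
joint = latent_cross([1.0, 2.0, 3.0], [0.5, 0.0, 2.0])  # [0.5, 0.0, 6.0]
```

Because the product is multiplicative, a context dimension near zero suppresses the matching sequence dimension, which lets context such as weather or time gate the item-sequence signal.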
Unified Big Data Processing and Model Training on Analytics Zoo
[Figure: previous vs. current architecture]
Food Recommendation Using Analytics Zoo in Burger King
Offline Training Result
Model                        Top-1 Accuracy   Top-3 Accuracy
RNN                          29.98%           46.24%
Contextual ItemCF            32.18%           48.37%
RNN Latent Cross             33.10%           49.98%
TxT                          34.52%           52.37%
A/B Testing Result
Model                        Conversion Rate Gain   Add-on Sales Gain
RNN Latent Cross (control)   -                      -
TxT                          +7.5%                  +4.7%
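The Top-1 and Top-3 figures count how often the item actually ordered appears among the model's first one or three recommendations. A minimal sketch of that metric in plain Python; the menu items and rankings are made-up illustrations, not Burger King data:

```python
def top_k_accuracy(ranked_predictions, true_items, k):
    """Fraction of cases where the true item is within the top-k predictions."""
    hits = sum(1 for ranked, truth in zip(ranked_predictions, true_items)
               if truth in ranked[:k])
    return hits / len(true_items)


# Hypothetical ranked recommendations for four orders, best guess first.
ranked = [
    ["fries", "cola", "pie"],
    ["cola", "fries", "pie"],
    ["pie", "cola", "fries"],
    ["burger", "fries", "cola"],
]
truth = ["fries", "fries", "fries", "pie"]

top1 = top_k_accuracy(ranked, truth, k=1)  # 0.25
top3 = top_k_accuracy(ranked, truth, k=3)  # 0.75
```

Top-3 accuracy can never be lower than Top-1 accuracy for the same predictions, which matches the pattern in the table.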
Recommendation Using Analytics Zoo in Mastercard
https://software.intel.com/en-us/articles/deep-learning-with-analytic-zoo-optimizes-mastercard-recommender-ai-service
[Figure: Spark ML pipeline for the neural recommender using Analytics Zoo]
• Load Parquet files into Spark DataFrames; feature selection
• Train multiple models on sampled partitions of the training data: NCF, Wide & Deep, ALS
• Spark ML Pipeline stages: Estimator, Transformer, Model
• Test model candidates on test data to produce predictions; evaluation & fine tuning
Time Series Based Network Quality Prediction in SK Telecom
• Predict Network Quality Indicators (CQI, RSRP, RSRQ, SINR, …)* for anomaly detection and real-time management
* CQI: Channel Quality Indicator
* RSRP: Reference Signal Received Power
* RSRQ: Reference Signal Received Quality
* SINR: Signal to Interference plus Noise Ratio
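One common way to use such predictions for anomaly detection is to flag time steps where the measured indicator departs from the predicted value by more than a tolerance. A minimal sketch in plain Python under that assumption; the slides do not state SK Telecom's detection rule, and the threshold and sample values below are hypothetical:

```python
def flag_anomalies(actual, predicted, threshold):
    """Return indices where |actual - predicted| exceeds the threshold."""
    return [index
            for index, (a, p) in enumerate(zip(actual, predicted))
            if abs(a - p) > threshold]


# Hypothetical CQI readings against the model's forecasts.
measured = [10.0, 10.5, 3.0, 9.8]
forecast = [10.2, 10.4, 10.1, 10.0]

anomalies = flag_anomalies(measured, forecast, threshold=2.0)  # [2]
```

A fixed threshold is the simplest choice; in practice the tolerance would likely be tuned per indicator.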
* “Vectorized Deep Learning Acceleration from Preprocessing to Inference and Training on Apache Spark in SK Telecom”, Spark + AI Summit 2020
* https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
Memory Augmented Network – Test Result
Improved predictions for sudden change!
seq2seq
Mem-network
https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
Migrating to Analytics Zoo on Intel® Xeon
[Figure: legacy design vs. new architecture]
• Data layer: Data Loader; DRAM Store (tiering, forked); Flash Store (customized); Data Source APIs; Spark-SQL; SQL Queries (Web, Jupyter)
• Legacy design with GPU: Export (csv files), Preprocessing (pandas, dask), AI Training/Inference (GPU servers). Manually manage separate clusters, and segregated workflow
• New architecture, unified data analytics + AI platform: Preprocessing, RDD of Tensor and AI model code of TF, all on Spark, running on 2nd Generation Intel® Xeon® Scalable Processors. E2E architecture that atomically scales deep learning on Spark
https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
Inference Pipeline Speed-up with Analytics Zoo
* https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
Up to 6x speedup for end-to-end inference on Analytics Zoo in SK Telecom*
Training Pipeline Speed-up with Analytics Zoo
Up to 4x speedup for end-to-end training on Analytics Zoo in SK Telecom*
* https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality
Wind Power Prediction using Analytics Zoo in GoldWind
[Figure: pipeline from the historical power DB through ETL, training (LSTNet) and prediction, with deployment and update]
Wind Power Prediction
• Accuracy improved to 79% (from previous 59%)*
• 4x training speedup*
* https://www.intel.cn/content/www/cn/zh/analytics/artificial-intelligence/create-power-forecasting-solutions.html
Industrial Vision Inspection Using Analytics Zoo in Midea and KUKA
https://software.intel.com/en-us/articles/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics
Edge to Cloud architecture using Analytics Zoo
• 99.8% accuracy*
• <50ms image processing latency*
• >8x inference speedup*
* https://software.intel.com/en-us/articles/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics
* https://www.intel.cn/content/www/cn/zh/analytics/artificial-intelligence/midea-case-study.html
AI-Assisted Radiology Using Analytics Zoo in Dell EMC
Transfer learning using ResNet-50 trained with ImageNet
[Figure: chest X-rays; Patient A; Conditions A to E]
https://www.delltechnologies.com/resources/en-us/asset/white-papers/solutions/h17686_hornet_wp.pdf
Customer Service Chatbot Using Analytics Zoo in Microsoft Azure
*https://software.intel.com/en-us/articles/use-analytics-zoo-to-inject-ai-into-customer-service-platforms-on-microsoft-azure-part-1
*https://www.infoq.com/articles/analytics-zoo-qa-module/
Job Recommendation Using Analytics Zoo in Talroo
documents (resume, job description)
https://software.intel.com/content/www/us/en/develop/articles/talroo-uses-analytics-zoo-and-aws-to-leverage-deep-learning-for-job-recommendations.html
Big Data AI
Summary
INDUSTRY INFLECTIONS ARE FUELING THE GROWTH OF DATA
5G Network Transformation, Artificial Intelligence, Intelligent Edge, Cloudification
AI & ANALYTICS ARE THE DEFINING WORKLOADS OF THE NEXT DECADE
with growing demand for end-to-end AI pipelines
UNMATCHED PORTFOLIO BREADTH AND ECOSYSTEM SUPPORT
Intel delivers a silicon & software foundation designed for the
diverse range of use cases from the cloud to the edge
ANALYTICS ZOO OPEN-SOURCE SOFTWARE PLATFORM FOR BIG DATA AI
Simplifies End-to-End Big Data AI pipeline solutions development